It seems that my original dither was right all along, but the 16-bit conversion/rounding wasn't. My 37% value was correct for full rectangular dithering, not half. Half rectangular should never span more than two samples, which was the part I hadn't been sure about. The width of the 'rectangle' is one sample, but it will fractionally impinge on neighbours as the central value moves.
The end result is that I've got properly tested rectangular dithering, and triangular too (which is as wide as a 'full' rectangle, that is two samples, but is made from two random values added together like two dice rolls).
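To make the difference concrete, here is a minimal Python sketch of the three dither shapes and a round-to-nearest 16-bit conversion. It is not the Prometheus code: the function names, the [-1, 1) float range, the 32768 scale factor and the reading of 'half' as one step wide and 'full' as two steps wide are all my assumptions for illustration.

```python
import math
import random

def rect_dither(rng, width=1.0):
    """Rectangular (uniform) dither centred on zero, `width` 16-bit steps wide."""
    return (rng.random() - 0.5) * width

def tri_dither(rng):
    """Triangular dither: two independent one-step rectangles ADDED together.
    The sum spans two steps, like the total of two dice rolls."""
    return rect_dither(rng) + rect_dither(rng)

def quantize_16bit(x, dither=0.0):
    """Scale a float in [-1, 1) to 16-bit, add dither (in steps), round, clamp.
    floor(v + 0.5) is used because Python's round() rounds ties to even."""
    scaled = x * 32768.0 + dither
    q = math.floor(scaled + 0.5)
    return max(-32768, min(32767, q))

def output_codes(x, make_dither, trials=10000, seed=0):
    """Quantize the same input many times with fresh dither each time."""
    rng = random.Random(seed)
    return [quantize_16bit(x, make_dither(rng)) for _ in range(trials)]

# An input sitting 0.3 of a step above code 100 (hypothetical test value).
x = 100.3 / 32768.0

half = output_codes(x, rect_dither)                        # one step wide
full = output_codes(x, lambda rng: rect_dither(rng, 2.0))  # two steps wide
tri = output_codes(x, tri_dither)                          # two steps wide, triangular

for name, codes in (("half", half), ("full", full), ("tri", tri)):
    print(name, sorted(set(codes)), sum(codes) / len(codes))
```

With this input the half-width rectangle can only ever land on 100 or 101, while the full-width rectangle and the triangle can also reach 99. In all three cases the average of the output codes should settle close to 100.3, which is a handy sanity check on the rounding step.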
Tests now match the Sony dither and Prometheus is now updated to v3.63. I'll add these changes to SFXEngine at some point.