Sunday, January 24, 2021

Prometheus v2.65 Bug Hunting

A rather tedious and anxious day of programming. I was, and am, determined to fix this very rare, long-term bug in Prometheus. There are only 5 known bugs, and two of those are really due to experimental or rarely used code that I can easily fix if needed, so they don't count. The other three are all extremely rare and intermittent crash bugs. All are probably as old as the program itself, over 10 years, and sufficiently rare that they're not a vital problem. One didn't happen at all for an entire year of use, only to come back and reveal that it was still there.

The one I focused on last night and today is an occasional crash when I click stop during playback. I've put a series of traces in the code: a text file records where the program is at various points, so that by seeing the last entry I can, hopefully, focus in on the cause. Last night I had pinned it down to a single line, playing=0. The program is multithreaded and this line tells the player thread to end, so the error must be in that thread. So today I set lots of traps in there... but the results were strange and inconsistent.
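A trace of the kind described above can be sketched as follows. This is only an illustration in Python, not the Prometheus code; the file name and the function names trace and last_checkpoint are made up for the example. The one detail that matters is that each entry is forced to disk before the program carries on, otherwise the last entry before a crash may still be sitting in a buffer and never reach the file.

```python
import os
import time

TRACE_PATH = "trace.log"  # hypothetical file name, not taken from the post


def trace(label, path=TRACE_PATH):
    """Append one checkpoint line and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{time.time():.6f} {label}\n")
        f.flush()             # push Python's buffer to the operating system
        os.fsync(f.fileno())  # ask the operating system to write it to disk


def last_checkpoint(path=TRACE_PATH):
    """Return the label of the final line in the trace file, or None if empty."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    if not lines:
        return None
    # Each line is "<timestamp> <label>"; keep everything after the first space.
    return lines[-1].split(" ", 1)[1]
```

After a crash, the last entry in the file is the last checkpoint the program reached, which narrows the search to the code between that checkpoint and the next one.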

See this:

PS. Blogger won't let me use the 'code' or 'pre' HTML so it's an image.

Logic will tell you that [things] will never be called if 'playing' is not set but if it is set, then [things] will be called 10 times. But this is a multithreaded operation and 'playing' is a global, so it can be set to zero in the middle of [things] and this, apparently, is what causes my bug. The thing is, it seems to crash at the end of [things] but before [afterthings]... I find this mystifying.

The only obvious cause is that [things] itself is the problem. That is possible, since it constitutes the player code, but if so, it's only by coincidence that the crash comes at the last thing in the list, and it also only seems to occur at the instant playing is set to zero, which is odd. It seems that the 'for' loop itself is the problem... my guess is that the compiler, or some fancy logic in the internal workings of the computer (a multi-core processor?), is doing something strange in that 'for' loop... it might see the 'while' and make some other sort of intervention.

For now, I've deleted the second playing check altogether. It would have been faster, a micro bit faster, to drop out rather than do those 10 chunks, but in practice the timing is of no consequence.
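The shape of that change can be sketched roughly like this. It is a guess at the structure from the description alone, written in Python purely for illustration (the post does not say what Prometheus is written in), and every name here (Player, _things, _afterthings, CHUNKS_PER_PASS) is hypothetical. The flag is checked once per pass, the 10 chunks always run to completion, and the stopping thread waits for the player thread to finish before doing anything else. That last step is an assumption on my part and not something the post says the program does.

```python
import threading

CHUNKS_PER_PASS = 10  # the post mentions 10 chunks per pass


class Player:
    def __init__(self):
        self._playing = threading.Event()  # stands in for the global 'playing'
        self._thread = None
        self.chunks_done = 0
        self.passes_done = 0

    def _things(self):
        # Stand-in for the real per-chunk player work.
        self.chunks_done += 1

    def _afterthings(self):
        # Stand-in for whatever follows the 10 chunks.
        self.passes_done += 1

    def _run(self):
        # The flag is checked once per pass only; there is no second check
        # inside the 'for', so a pass that starts always finishes.
        while self._playing.is_set():
            for _ in range(CHUNKS_PER_PASS):
                self._things()
            self._afterthings()

    def start(self):
        self._playing.set()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def stop(self):
        # Ask the player thread to end, then wait until it really has ended
        # before the caller goes on to touch anything the thread uses.
        self._playing.clear()
        if self._thread is not None:
            self._thread.join()
            self._thread = None
```

In C or C++ a plain global that one thread writes and another reads without any synchronisation is a data race, and the compiler is allowed to assume the value does not change in the middle of the loop, so an atomic flag would be the usual tool there. Whether that has anything to do with this particular crash is not something the sketch can show.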

I hate these rare crashes, they make me anxious. I work hard to make sure that every game and program I design has zero bugs. None. The whole idea of 'updates' to fix errors is a sign of failure. My updates generally add new functionality and I never really update games. Like the boxed games (and tapes!) of old, my games are finished and bug-free with the first public version. Updates or changes are not up for debate, but I'll make them grudgingly if necessary. This is probably why I'm not a super-successful game developer; those people generally release popular bug-filled rubbish and add every change that every idiot suggests, and release new versions every week which become things to announce and publicise and desire.

So, this has taken all day. This week I hope to get back to the Sisyphus music. I have great music ideas, a 'new path'. I'd like to make some new structural units in new music... something like the three or four movements in a piano sonata... but more on this later.

The other two bugs are being watched.

I'll leave the program in a test phase for a week or so and, if it's stable, file this version.