Tuesday, March 20, 2018

Structures for 3D Audio

I've looked again at upgrading my sound software, Prometheus, to support surround sound, specifically 5.1, which is the prevalent option for multi-channel audio and has replaced the quadrophonic experiments of the 1970s. The whole area is still in flux and not well designed, even though digital technology has resolved many of the technical problems that made analogue quadrophonic sound difficult. 5.1 was developed for cinema and has proven popular primarily because of its native support on DVD. It uses four main speakers plus one central speaker for dialogue, and a sub-woofer (which is the 'point 1' in the name).

This might suit a cinema, but it is a poor choice for music. The sound remains two-dimensional, confined to one plane. Also, why have one speaker for dialogue, why not more? Or why not combine the dialogue with the music? Most songs are an inherent mix of voice and music, and the balance between vocals and music is part of the art. For music, a more universal standard would be useful, so I've explored some options to integrate into software.

Current music audio is stereo, left and right. Quadrophonic sound is (or was) normally made from four speakers placed at the corners of the sound area, but this seems irrational given that most conventional music is already stereo, so front and rear sound would instantly interfere with left and right. It would make the most logical sense to divide the space axially: left and right (LR), front and back (FB), up and down (UD), with six speakers placed in those locations.

Speakers below the listener, under the floor, are notably rare. The Microsoft WAV specification for multichannel audio does, at a pinch, include options in its WAVEFORMATEXTENSIBLE structure for a front speaker (SPEAKER_FRONT_CENTER), left and right (SPEAKER_SIDE_LEFT and SPEAKER_SIDE_RIGHT), rear (SPEAKER_BACK_CENTER), and up (SPEAKER_TOP_CENTER), but nothing for speakers below the listener. The speaker list seems to have been developed from current audio usage rather than from any rational scheme. There is, for example, support for back top left and front top left speakers, yet not plain top left or top right. There are odd allocations such as a "FRONT_LEFT_OF_CENTER" option, and a single low frequency channel sits somewhere in the middle of the list. Bass sounds are harder to locate spatially, so presumably they are assumed to be spatially ubiquitous, or unimportant.
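As a rough sketch of how close the existing format gets, the snippet below builds a channel mask for the axial layout from the speaker position flags. The flag values are the ones defined in the Windows SDK headers; the AXIAL mapping and the function name axial_channel_mask are hypothetical, made up for this illustration.

```python
# Speaker position flags as defined for the dwChannelMask field of
# WAVEFORMATEXTENSIBLE (Windows SDK). Only a subset is listed here.
SPEAKERS = {
    "SPEAKER_FRONT_LEFT": 0x1,
    "SPEAKER_FRONT_RIGHT": 0x2,
    "SPEAKER_FRONT_CENTER": 0x4,
    "SPEAKER_LOW_FREQUENCY": 0x8,
    "SPEAKER_BACK_LEFT": 0x10,
    "SPEAKER_BACK_RIGHT": 0x20,
    "SPEAKER_FRONT_LEFT_OF_CENTER": 0x40,
    "SPEAKER_FRONT_RIGHT_OF_CENTER": 0x80,
    "SPEAKER_BACK_CENTER": 0x100,
    "SPEAKER_SIDE_LEFT": 0x200,
    "SPEAKER_SIDE_RIGHT": 0x400,
    "SPEAKER_TOP_CENTER": 0x800,
}

# Hypothetical mapping from the six axial directions to the nearest flag.
# There is no flag for a speaker below the listener, so "D" maps to None.
AXIAL = {
    "L": "SPEAKER_SIDE_LEFT",
    "R": "SPEAKER_SIDE_RIGHT",
    "F": "SPEAKER_FRONT_CENTER",
    "B": "SPEAKER_BACK_CENTER",
    "U": "SPEAKER_TOP_CENTER",
    "D": None,
}


def axial_channel_mask():
    """Return (mask, missing): the combined flag mask and unsupported axes."""
    mask = 0
    missing = []
    for axis, flag in AXIAL.items():
        if flag is None:
            missing.append(axis)
        else:
            mask |= SPEAKERS[flag]
    return mask, missing


mask, missing = axial_channel_mask()
print(hex(mask), missing)
```

Five of the six directions can be expressed this way; the down channel has no flag to claim.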

It would be more logical to store data in 6 tracks for 3 dimensions: Left L, Right R, Front F, Back B, Up U, Down D. Sound could then be recalculated for different speaker arrangements, such as 50% left and 50% front for the front-left speaker of a traditional quadrophonic placement, or with different weights for the 'recommended' placement of a 5.1 music system.
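A minimal sketch of that recalculation, assuming simple linear gains: each speaker is a weighted sum of the axial tracks. The QUAD gain table and the downmix function are hypothetical names for illustration, and the 50/50 weights simply follow the example above rather than any measured panning law.

```python
AXES = ("L", "R", "F", "B", "U", "D")

# Hypothetical gain table for a corner-placed quadrophonic layout.
# Each speaker takes 50% of each of its two nearest axial channels.
QUAD = {
    "front_left": {"L": 0.5, "F": 0.5},
    "front_right": {"R": 0.5, "F": 0.5},
    "back_left": {"L": 0.5, "B": 0.5},
    "back_right": {"R": 0.5, "B": 0.5},
}


def downmix(frame, layout):
    """Mix one frame of axial samples (axis -> sample) to speaker samples."""
    return {
        speaker: sum(gain * frame[axis] for axis, gain in gains.items())
        for speaker, gains in layout.items()
    }


# One frame: full signal on the left, half signal at the front.
frame = {"L": 1.0, "R": 0.0, "F": 0.5, "B": 0.0, "U": 0.0, "D": 0.0}
print(downmix(frame, QUAD))
```

In this table the U and D tracks are simply dropped, since a flat quadrophonic layout has nowhere to put them; a different table could fold them into the other speakers instead.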

Perhaps dialogue or additional layers would be desirable; in cinema or television, for example, a separate volume control for background music, dialogue, and sound effects could be an option. These could be stored as an additional dimension: a new 6-track layer for each content type. A 3-layer system might hold speech, music, and sound effects, giving 18 audio tracks.
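One way to picture the 18 tracks is as a layer-major grid, with a per-layer volume applied when collapsing back to 6 channels. This is only a sketch: the track ordering, the layer names, and the functions track_index and mix_layers are assumptions for illustration.

```python
AXES = ("L", "R", "F", "B", "U", "D")
LAYERS = ("speech", "music", "effects")


def track_index(layer, axis):
    """Position of a track in the flat list: layer-major, 6 tracks per layer."""
    return LAYERS.index(layer) * len(AXES) + AXES.index(axis)


def mix_layers(tracks, volumes):
    """Collapse one frame of 18 samples to 6, scaling each layer by its volume."""
    return [
        sum(volumes[layer] * tracks[track_index(layer, axis)] for layer in LAYERS)
        for axis in AXES
    ]


# Example: every track at full level, music at half volume, effects muted.
frame = [1.0] * (len(LAYERS) * len(AXES))
print(mix_layers(frame, {"speech": 1.0, "music": 0.5, "effects": 0.0}))
```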

It's interesting to note that, according to Wikipedia, the SACD format supports 6 channels, which would suit a 3D spatial format. A 7.1 sound card could play the audio back with current technology. Monitoring the audio would require six speakers and a specially designed studio, with one speaker in the floor and another in the ceiling. Alternatively, headphones could be used, with contemporary virtual reality technology detecting the exact orientation of the listener's head.
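A very simplified sketch of the headphone idea follows. It assumes only head yaw (turning left or right) is tracked, and uses a plain linear pan between the ears rather than a real head-related model; the coordinate convention, ear_gains, and render_binaural are all hypothetical.

```python
import math

# Unit direction of each axial channel: x = right, y = front, z = up.
DIRECTIONS = {
    "L": (-1.0, 0.0, 0.0),
    "R": (1.0, 0.0, 0.0),
    "F": (0.0, 1.0, 0.0),
    "B": (0.0, -1.0, 0.0),
    "U": (0.0, 0.0, 1.0),
    "D": (0.0, 0.0, -1.0),
}


def ear_gains(yaw_radians):
    """Return channel -> (left_gain, right_gain) for a given head yaw.

    Positive yaw means the head has turned to the left (anticlockwise
    seen from above). Pitch and roll are ignored in this sketch.
    """
    # Direction the right ear points in after the head has turned.
    rx, ry = math.cos(yaw_radians), math.sin(yaw_radians)
    gains = {}
    for name, (x, y, _z) in DIRECTIONS.items():
        side = x * rx + y * ry  # -1 = fully at left ear, +1 = fully at right ear
        gains[name] = ((1.0 - side) / 2.0, (1.0 + side) / 2.0)
    return gains


def render_binaural(frame, yaw_radians):
    """Mix one frame of six axial samples down to a (left, right) pair."""
    gains = ear_gains(yaw_radians)
    left = sum(frame[name] * gains[name][0] for name in DIRECTIONS)
    right = sum(frame[name] * gains[name][1] for name in DIRECTIONS)
    return left, right


# Facing forward, the front channel is split evenly between the ears.
print(ear_gains(0.0)["F"])
# After turning the head 90 degrees left, the front channel is at the right ear.
print(ear_gains(math.pi / 2)["F"])
```

With yaw only, the up and down channels always sit midway between the ears; full head tracking would need pitch and roll as well.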

With the growth of virtual reality and immersive environments, new ways of storing multi-dimensional audio will be needed. The current 2D structures are simply not adequate for a 3D environment, and the most efficient system is to use 3 axes for 3 dimensions, and thus 6 channel audio.


I propose an audio data structure that interleaves 6 channels in this order: left, right, front, back, top, bottom.
For additional dimensions, a specifier for the content type would be needed: music, dialogue, sound effects, and others (ambient sound, other additional dimensions).
New virtual reality audio systems should be designed for 6 channels, with detection of the listener's head orientation.
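
A minimal sketch of the interleaved layout, assuming 16-bit signed little-endian samples (the sample format is my assumption here, as are the names CHANNEL_ORDER, interleave, and deinterleave). Each frame holds one sample per channel in the proposed order.

```python
import struct

CHANNEL_ORDER = ("left", "right", "front", "back", "top", "bottom")
FRAME_FORMAT = "<6h"  # six little-endian signed 16-bit samples per frame


def interleave(channels):
    """Pack per-channel sample lists into interleaved PCM bytes."""
    lengths = {len(channels[name]) for name in CHANNEL_ORDER}
    if len(lengths) != 1:
        raise ValueError("all six channels must have the same length")
    out = bytearray()
    for frame in zip(*(channels[name] for name in CHANNEL_ORDER)):
        out += struct.pack(FRAME_FORMAT, *frame)
    return bytes(out)


def deinterleave(data):
    """Unpack interleaved PCM bytes back into per-channel sample lists."""
    channels = {name: [] for name in CHANNEL_ORDER}
    for frame in struct.iter_unpack(FRAME_FORMAT, data):
        for name, sample in zip(CHANNEL_ORDER, frame):
            channels[name].append(sample)
    return channels


# Two frames of test data, one distinct value per channel.
source = {name: [i + 1, -(i + 1)] for i, name in enumerate(CHANNEL_ORDER)}
data = interleave(source)
print(len(data), deinterleave(data) == source)
```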