Last time we stepped through the process of loading and playing a sound using OpenAL. In that example we only played a single sound, once. Practical applications, games in particular, have a larger set of sounds, and they play these sounds multiple times. They might even play the same sound multiple times simultaneously.
In this post we are going to look at managing a larger set of sound effects.
The best place to start is the pipeline we developed last time. We can summarise this pipeline into a single image:
Each of the boxes represents a storage step in our process. Each arrow represents a processing step. We start with an encoded sound file on the file system. To get access to the sound, we load the binary data from the file into our application and convert it into something useful (decoding). Before we can play the data, we have to copy it to the sound card (note that presently computers often do not come with a dedicated sound card and use the CPU to emulate it).
Some of the operations in this pipeline are slow. Even for short sounds, reading the binary data and decoding it takes several milliseconds. If you only start running through the pipeline at the moment the sound is requested to be played, there will be a noticeable lag between the event on the screen and the sound. Humans are very sensitive to this lag, and therefore running through the entire pipeline every time a sound is played is not a viable approach.
Storing intermediate results
The solution is quite simple: make it so that we don’t have to go through the entire pipeline every single time. The image above shows three candidates for intermediate results we could store to shorten the amount of work we have to do at runtime:
- the encoded binary data (application memory);
- the decoded raw data (application memory);
- the decoded raw data (sound card memory).
The first candidate can be disregarded immediately: decoding the binary data is the slowest process in our pipeline, especially for compressed formats like MP3 and OGG, so storing the still-encoded data would save us very little work. There is a more practical problem as well: if we look at the way we loaded our audio data in the previous blog post, we notice that we decode while reading the binary data. The following image therefore better represents the actual pipeline:
This combination of reading the data from the file system and decoding it as one process is something that will be of vital importance when streaming audio files, which we will discuss in a future post. For now, it rules out the encoded binary data as a valid intermediate result to store.
Application memory vs sound card memory
This leaves the decoded raw data. The only remaining question is where to store it. In general we want to do as little work as possible at runtime, so storing results from as far along the pipeline as possible is beneficial. This makes the sound card the prime candidate for storing the audio data. Just as with graphics cards: application memory (RAM) is fast, but the card's internal memory is faster. Not having to move data across all the time could save us valuable time, especially considering the amount of data audio can contain.
There is one major issue with storing data on the sound card: the buffers have a very limited capacity. A few dozen completely filled buffers will be fine, but many applications require hundreds of different sound effects, and this will quickly exhaust the buffer memory.
The only remaining option is to store all the decoded data in the application memory and move it to the sound card when needed. The good news is: as long as your sound effects are reasonably short (even up to a few seconds should be fine), there will be no noticeable time lag. Even in high-paced action games like Roche Fusion there does not appear to be any lag between graphical and auditory feedback. We can easily demonstrate this by looking at the performance graph of this game, since the sound effect code in this game runs synchronously.
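The decode-once, copy-on-play approach can be sketched as follows. This is a minimal illustration, not real OpenAL code: `decode_file` and `upload_to_buffer` are hypothetical stand-ins for your decoder and for copying data to a sound card buffer (`alBufferData` in OpenAL).

```python
decode_calls = []

def decode_file(path):
    # Stand-in for reading and decoding an audio file; returns raw PCM bytes.
    decode_calls.append(path)
    return b"\x00\x01" * 1024

uploads = []

def upload_to_buffer(data):
    # Stand-in for alBufferData: copies decoded data to a sound card buffer
    # and returns a handle to it.
    uploads.append(len(data))
    return len(uploads) - 1

class SoundEffect:
    def __init__(self, path):
        # The slow step (decoding) happens exactly once, at load time.
        self.raw_data = decode_file(path)

    def play(self):
        # The fast step: copy the already-decoded data to the sound card
        # every time the sound is played.
        return upload_to_buffer(self.raw_data)
```

The key property is that `decode_file` runs once per sound, while only the cheap copy runs per play.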
Even in later stages of the game where many sounds play simultaneously, the copying of the sound data to the buffer does not appear to have any significant impact on the framerate. I should add however that Roche Fusion utilises the caching code that will be discussed in the following section, which cuts down on the performance impact even further.
Using the sound card memory after all
While using the application memory is a completely acceptable solution, we can do even better. We can still use the sound card memory, but we have to be really careful in doing so. We will build on the principle that if we play a sound, it is very likely that the sound will be played again very soon. If we have a game where the player can shoot weapons, it is likely that the player will only be using a subset of the available weapons. It makes no sense to upload all the weapon audio data to the sound card, since the player may never even fire most of them.
We will for now work with the assumption that the sound effect manager is not context aware, so it does not know anything about the sounds that may be played soon. Still, if the sound effect manager is told to play a sound, it can deduce that that sound is likely to be played again soon. Instead of copying the sound effect to the buffer, playing it, and then dropping the buffers again, we can make the buffers semi-persistent. That is: we let the buffers hang around for a few seconds after the sound is finished playing. If the sound has to be played again, we already have the buffers ready, so we don’t have to copy the audio data again. Since multiple sources can read from the same buffer simultaneously, even playing the same sound multiple times at the same time is no problem.
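The semi-persistent buffer idea can be sketched as a small cache keyed by sound name. This is an illustrative Python sketch, not real OpenAL code: the `upload` and `delete` callbacks stand in for creating and deleting sound card buffers, and the injectable `clock` makes the expiry logic easy to follow.

```python
import time

class BufferCache:
    """Keeps sound card buffers alive for `ttl` seconds after last use."""

    def __init__(self, ttl=5.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.buffers = {}  # sound name -> (buffer handle, last played time)

    def get_buffer(self, name, upload):
        # Reuse the cached buffer if we have one; otherwise upload the data.
        entry = self.buffers.get(name)
        handle = entry[0] if entry is not None else upload(name)
        # Either way, refresh the timestamp: playing a sound makes it likely
        # to be played again soon.
        self.buffers[name] = (handle, self.clock())
        return handle

    def drop_stale(self, delete):
        # Called regularly: drop buffers not used within the last `ttl` seconds.
        now = self.clock()
        for name, (handle, last_played) in list(self.buffers.items()):
            if now - last_played > self.ttl:
                delete(handle)
                del self.buffers[name]
```

A real implementation would call this from the game loop, passing wrappers around `alGenBuffers`/`alBufferData` and `alDeleteBuffers` as the callbacks.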
Variations & closing notes
This approach does add a bit of overhead: we have to check the buffers regularly to see if they have been played lately, and drop them if it has been too long since they were used. This can be circumvented by not dropping buffers after a certain amount of time, but instead using a queue of buffers. Every time a sound effect is played, it picks up a buffer from the front of the queue if it does not have one yet, empties it, and uses it for its own data. Whether the buffer is new or reused, it is then pushed to the back of the queue. This means that the buffers at the front of the queue belong to the sound effects that have not been played for the longest time. While this approach removes the need to continuously check the buffers, it adds extra overhead when playing a sound, so this is a trade-off that may balance out in either direction depending on the application.
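The queue variation can be sketched like this. Again a hypothetical illustration rather than real OpenAL code: handles are plain integers, and `acquire` returns whether the caller needs to re-upload its audio data (because it was handed a recycled buffer).

```python
from collections import deque

class BufferQueue:
    """Fixed pool of buffers, recycled in least-recently-played order."""

    def __init__(self, handles):
        self.queue = deque(handles)  # front = least recently played
        self.owner = {}              # buffer handle -> sound name
        self.assigned = {}           # sound name -> buffer handle

    def acquire(self, name):
        handle = self.assigned.get(name)
        if handle is None:
            # Take the least-recently-played buffer and evict its old owner.
            handle = self.queue.popleft()
            old_owner = self.owner.get(handle)
            if old_owner is not None:
                del self.assigned[old_owner]
            self.owner[handle] = name
            self.assigned[name] = handle
            needs_upload = True
        else:
            # Sound still owns a buffer: no copy needed, just requeue it.
            self.queue.remove(handle)
            needs_upload = False
        # New or reused, the buffer goes to the back: most recently played.
        self.queue.append(handle)
        return handle, needs_upload
```

Note the per-play cost here: `queue.remove` is linear in the pool size, which is the extra overhead the text mentions when playing a sound.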
For the sake of brevity, I will not include implementation details in this post. If you are interested, or something is not clear, feel free to ask and I will dedicate a complete post to it. Otherwise, there will be an implementation in my audio library available for use, though the library is still a work in progress.
Taking it even further & conclusion
During this post we have been focussing on improving the runtime performance of playing a sound, and to achieve this we introduced a caching mechanism. The system currently assumes that all sounds are cached when the application first starts. This significantly impacts loading times; in Roche Fusion, sound effect initialisation is one of the deciding factors in the loading time. Another major disadvantage is the memory requirement: because all sound effects are stored in application memory, memory usage increases with every sound you add. For applications with a lot of sounds, this may be very undesirable behaviour.
These issues can both be solved by deferred loading. Instead of decoding all sounds when the application starts, we only decode sounds once we need them. However, this puts us back at the start, where running through the entire pipeline introduces a noticeable time lag. This in turn can be solved by predicting which sounds will be required soon.
As a simple example, we could load the sounds required for the main menu when the application loads. The basic sounds required for the game can then be loaded in the background while the user navigates the menus.
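A minimal sketch of this background loading, assuming a hypothetical slow `decode` function; the loader decodes predicted sounds on a worker thread and falls back to synchronous decoding if a sound is requested before it is ready.

```python
import threading

class DeferredLoader:
    """Decodes predicted sounds in the background; decodes on demand otherwise."""

    def __init__(self, decode):
        self.decode = decode  # the slow load-and-decode step
        self.sounds = {}
        self.lock = threading.Lock()

    def preload(self, names):
        # Kick off background decoding of sounds we expect to need soon,
        # e.g. the basic game sounds while the user navigates the menus.
        thread = threading.Thread(target=self._load_all, args=(names,))
        thread.start()
        return thread

    def _load_all(self, names):
        for name in names:
            data = self.decode(name)
            with self.lock:
                self.sounds[name] = data

    def get(self, name):
        # If the prediction missed, decode synchronously and accept the lag.
        with self.lock:
            data = self.sounds.get(name)
        if data is None:
            data = self.decode(name)
            with self.lock:
                self.sounds[name] = data
        return data
```

A production version would also need to avoid decoding the same sound twice when `get` races with the background thread, but the structure stays the same.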
Actually predicting which sounds are required can be a very challenging problem, depending on the application. In any case, it requires a description of which sound effect is needed when. Doing this for a large number of sound effects adds up to a very complex system and is often not worth it.
In the end, the exact shape of your sound effect manager depends hugely on what it is needed for. I hope that in this post I gave you a rough outline of a generic system that can be adapted in many different ways. In case you have any questions, feel free to leave them as a comment, and don’t hesitate to get in touch if you have experience with these kinds of systems yourself and have something to add.
Until next time!