game audio

Oct 7, 2022

Audio in games is a surprisingly sparse subject when it comes to the technical side, especially when you’re trying to work with it independent of a game engine. The barrier to entry can be pretty high if you’ve never worked with it before and want to explore how it actually works. My goal with this article is to help get the ball rolling and to explain some concepts that can be hard to find real-world definitions for, the kind that more experienced developers often gloss over and that less acquainted ones might not have many resources for.

Tools like FMOD, Unity, and prebuilt engines in general expose similar concepts to one another: an audio source (think speaker in a room), an audio buffer (the data of the sound file), a listener (your ears), and basic controls for modifying playback, like filters and effects, on both the listener and source side. I’m going to start small and build towards some more complex ideas as this post goes on. Much of this is meant to be universal, but there will be a link to a sample audio framework written by a friend of mine in C99 at the bottom of the page! I’ve also made an example repository you can find on GitHub!

Buffer

Buffers are a relatively straightforward concept, right? Think of a big array where only a set number of elements are in memory at any moment. You can fetch more, continue down the buffer, or just straight up modify the data going in for effects. This data is the audio that’s getting played back. Beware though: modifying this data directly can lead to funky outcomes when combined with other layers of the audio system. Generally we try to limit modifying the buffers to just simple filters (I’ll explain filters more later, just think of it like cutting out certain bits). You pass these buffers to sources, which read them and play them back! Note: Generally the data in these buffers is called PCM. PCM stands for Pulse Code Modulation: the sound wave’s amplitude is measured (sampled) many thousands of times per second and each measurement is stored as a number. Frequency and phase aren’t stored separately; they fall out of how those amplitude values change over time. This data tells your speakers what to play.

Pseudo-Code:

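void create_buffer(unsigned char* file_data, int file_size) {
  // Ask the audio API for a new, empty buffer handle
  int buffer_id = buffer_create(); // in OpenAL, this is alGenBuffers(1, &buffer_id), with buffer_id being an ALuint
  // Copy the PCM data into that buffer
  buffer_data(buffer_id, file_data, file_size); // in OpenAL, this is alBufferData(buffer_id, format, file_data, file_size, sample_rate)
}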
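To make the PCM idea a bit more concrete, here is a minimal sketch in C that fills an array with 16-bit samples of a square wave. The name `fill_square_wave` and its parameters are made up for illustration and don't come from any particular library; the resulting array is the kind of raw data you would hand to a buffer upload call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill `out` with `frames` mono 16-bit PCM samples of a square wave.
 * Each sample is just the amplitude at one instant; `sample_rate` says
 * how many of those instants there are per second. */
void fill_square_wave(int16_t *out, size_t frames, int sample_rate,
                      int frequency, int16_t amplitude) {
  assert(out != NULL && sample_rate > 0 && frequency > 0);
  int period = sample_rate / frequency; /* samples per full cycle */
  if (period < 2) period = 2;           /* need at least one high and one low */
  for (size_t i = 0; i < frames; i++) {
    /* first half of each cycle is high, second half is low */
    int high = (int)(i % (size_t)period) < period / 2;
    out[i] = high ? amplitude : (int16_t)-amplitude;
  }
}
```

For example, one second of a 440 Hz tone at 44100 samples per second would be `fill_square_wave(samples, 44100, 44100, 440, 8000)`.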

Source

While a source could mean a few different things in a programming context, here you can think of it as a floating speaker at a given point in space. Whether that space is 2D or 3D doesn’t make too much of a difference except in what you hear. As mentioned earlier, you can attach a buffer to a source for it to play back. You can also generally attach filters and effects (think reverb, chorus, etc.; I’ll explain these more too). The source is the place where you dial in what you want to come out of that speaker if you were directly in front of it. Do you want that monster hit to sound super echo-y without making all sounds echo? You should probably attach the reverb effect to the source then.

Pseudo-Code:

void create_source(vec3 position) {
  int source_id = source_create(); // in OpenAL, this is alGenSources(1, &source_id);
  source_set_position(source_id, position); // in OpenAL, this is alSource3f(source_id, AL_POSITION, x, y, z);
}
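The relationship between a source and its buffer can be sketched roughly like this in C. The `audio_buffer` and `audio_source` structs and `source_read` are hypothetical names for illustration, not how OpenAL or any specific framework lays things out internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types: a buffer owns samples, a source points at one. */
typedef struct {
  const float *samples; /* mono PCM as floats in [-1, 1] */
  size_t length;        /* number of samples */
} audio_buffer;

typedef struct {
  const audio_buffer *buffer; /* attached buffer, can be shared by sources */
  size_t cursor;              /* next sample to play */
  float gain;                 /* per-source volume */
  float position[3];          /* where the "speaker" sits in the world */
} audio_source;

/* Copy up to `count` samples from the attached buffer into `out`,
 * scaled by the source gain. Returns how many samples were written;
 * fewer than `count` means the sound has reached its end. */
size_t source_read(audio_source *src, float *out, size_t count) {
  assert(src != NULL && src->buffer != NULL && out != NULL);
  size_t remaining = src->buffer->length - src->cursor;
  size_t n = count < remaining ? count : remaining;
  for (size_t i = 0; i < n; i++) {
    out[i] = src->buffer->samples[src->cursor + i] * src->gain;
  }
  src->cursor += n;
  return n;
}
```

Because the source only holds a pointer and a cursor, many sources can play the same buffer at different points in time without copying the audio data.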

Listener

The listener is one of the more straightforward concepts in game audio. Think of it as your ears. Some people consider it a microphone so it doesn’t have to be tied to a specific entity. Either works. The listener stage is generally where you want to mix everything that is being heard. Keep in mind you can also attach global effects and filters to the listener for when you want sounds to be modified uniformly.

Pseudo-Code:

// NOTE: Most audio frameworks assume there's always a listener so you shouldn't need to initialize it
void update_listener(vec3 position, float gain) {
  listener_set_position(position); // in OpenAL, this is alListener3f(AL_POSITION, x, y, z);
  listener_set_gain(gain); // in OpenAL, this is alListenerf(AL_GAIN, gain);
}
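As a rough sketch of what "mixing at the listener" can mean in practice, the C function below sums several streams, applies the listener gain, and keeps the result in range. `listener_mix` is a hypothetical helper, and it assumes every stream is mono float PCM of the same length; real frameworks also handle panning, distance, and channel layouts here.

```c
#include <assert.h>
#include <stddef.h>

/* Sum `source_count` mono streams into `out`, apply the listener gain,
 * then clamp to [-1, 1] so the result stays in the valid sample range. */
void listener_mix(const float *const *sources, size_t source_count,
                  size_t frames, float listener_gain, float *out) {
  assert(sources != NULL && out != NULL);
  for (size_t i = 0; i < frames; i++) {
    float sum = 0.0f;
    for (size_t s = 0; s < source_count; s++) {
      sum += sources[s][i];
    }
    sum *= listener_gain;
    if (sum > 1.0f) sum = 1.0f;
    if (sum < -1.0f) sum = -1.0f;
    out[i] = sum;
  }
}
```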

Mixing

Mixing can be a whole host of things. Like DJ mixing, mixing in game audio is generally dynamic. A good mix can be the difference between a game sounding great and sounding like complete garbage noise being spammed at all times without regard for the listener. There are a few steps to mixing, but by far the most important is loudness. There’s a difference between loudness and volume, by the way. Volume is generally the decibel level something is being played at. Loudness is how loud something is perceived to be. There are a few parts to it, but the one I find most important is the delta of volume. Something that hits high volume suddenly can appear louder than something winding up over time. The best analogy is boiling a frog, where volume is the heat. Loudness also includes the frequency range of the sound. Is it too high pitched? Might want to EQ (equalize) the high range down so it’s not so annoying. Does the sound just need some high frequencies taken out without being muffled? Add a low pass filter to it and tweak the values to match. An important topic on loudness can be found surrounding a metric called LUFS (Loudness Units relative to Full Scale) here.
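One simple way to tame the "delta of volume" is to wind a sound up instead of starting it at full gain. The sketch below is a plain linear fade-in in C; `apply_fade_in` is a made-up name, and a linear ramp is just the simplest choice, not necessarily the best sounding one.

```c
#include <assert.h>
#include <stddef.h>

/* Ramp the gain linearly from 0 to 1 over the first `fade_frames`
 * samples so the sound winds up instead of jumping to full volume. */
void apply_fade_in(float *samples, size_t frames, size_t fade_frames) {
  assert(samples != NULL);
  for (size_t i = 0; i < frames && i < fade_frames; i++) {
    float gain = (float)i / (float)fade_frames;
    samples[i] *= gain;
  }
}
```

At 44100 samples per second, a `fade_frames` of 441 gives a 10 millisecond ramp, which is usually short enough not to be noticed as a fade.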

Mixing, I think, is generally the art of layering things together well and with proper focus. What is relevant in a song that is crazy memorable on a journey thru an environment is very different from a background song that’s just meant to set the mood while you’re fighting enemies creating tons of sounds. In combat it’s a good idea to prioritize important sounds that might be cues to the player as to exactly what is going on. Another important thing is culling sounds when there are too many of them. If you have 600 enemies making the exact same sound in a given period of time, hearing it 600 times can be a little crazy. There’s a limit to how much we naturally perceive relative to how loud something already is. So when something is loud and near you, you won’t necessarily be able to hear the leaves falling 50 meters away.
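A very basic form of culling is to cap how many copies of the same sound may play at once. The C sketch below assumes each playing sound is tracked by an integer id; `should_play_sound` and that bookkeeping are hypothetical, and a real system would likely also weigh distance and priority.

```c
#include <assert.h>
#include <stddef.h>

/* Decide whether a new instance of `sound_id` should start, given the
 * ids of the sounds already playing. Anything past `max_instances`
 * copies of the same sound is dropped instead of stacked on top. */
int should_play_sound(int sound_id, const int *playing_ids,
                      size_t playing_count, size_t max_instances) {
  assert(playing_ids != NULL || playing_count == 0);
  size_t same = 0;
  for (size_t i = 0; i < playing_count; i++) {
    if (playing_ids[i] == sound_id) same++;
  }
  return same < max_instances;
}
```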

Effects

Effects can be thought of as mathematical formulas for modifying data. All an effect generally does is take some inputs and create an output. I like to think of them as guitar pedals: small black boxes I chain together to make the signal sound different. Reverb is perhaps one of the more prevalent effects. It generally has a few inputs: room size, wet/dry, damp, gain, etc. There’s a large amount of possibility when it comes to effects, but generally there’s a common set in use: reverb, chorus, distortion, echo, flanger, and equalizers.
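To show how small one of these "formulas" can be, here is a sketch of a single echo in C. `apply_echo` is a hypothetical name, and the `wet` parameter plays the role of the wet/dry control mentioned above; a real echo or reverb effect usually adds feedback and more parameters on top of this.

```c
#include <assert.h>
#include <stddef.h>

/* Add one echo in place: each output sample is the original sample plus
 * a quieter copy of the input from `delay_frames` earlier.
 * `wet` sets how loud the echo is (0 = none, 1 = as loud as the input). */
void apply_echo(float *samples, size_t frames, size_t delay_frames,
                float wet) {
  assert(samples != NULL && delay_frames > 0);
  /* Walk backwards so samples[i - delay_frames] is still the untouched
   * input when we read it, which keeps this to a single repeat. */
  for (size_t i = frames; i-- > delay_frames;) {
    samples[i] += wet * samples[i - delay_frames];
  }
}
```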

Filters

Filters are more like, well, filters. Think about audio this way: there are 3 main frequency bands we tend to consider: low, mid, and high. Think of low pretty much as bass/booming sounds, mid is roughly where some lower sounding instruments and much of human talking sit, while high is all the higher pitched ends of talking, instruments, metal sounds, etc. These categories directly correlate to the frequency of the sound being perceived. To help cull some of these bands, we generally use what are aptly named low pass, high pass, and band pass filters. You can take their names pretty literally. Low pass lets thru low end noises, high pass lets thru the higher end, and band pass lets thru only a band in the middle, cutting both the low and high ends around it. Filters like these might be important if you have a sound effect you know occupies a certain frequency range and you want to give it the ability to stand out in the playback. To do that, you can add filters that cut the other audio around the range the sound occupies, giving it a sort of solo effect. If you’re interested in learning more about filters, I’d recommend this article. TL;DR Low pass lets low frequencies thru, high pass lets high frequencies thru.

Pseudo-Code:

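void filter_out_annoying_sounds(song annoying_song) {
  // Create a lowpass filter that lets all frequencies under 22 kHz thru.
  // NOTE: 22 kHz is above what most people can hear, so lower the cutoff
  // (to a few kHz, say) if you want to actually hear the high end disappear.
  filter lowpass_filter = lowpass_filter_create(22000.0f);
  song_add_filter(annoying_song, lowpass_filter);
}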
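Under the hood, a low pass filter can be surprisingly little code. The sketch below is a one-pole low pass (the digital cousin of a resistor-capacitor circuit) in C. `lowpass_process` is a hypothetical name, and this is only one of many filter designs; frameworks often use steeper ones.

```c
#include <assert.h>
#include <stddef.h>

/* One-pole low-pass filter, processed in place.
 * Each output moves a fraction `alpha` of the way toward the input, so
 * fast wiggles (high frequencies) get smoothed out while slow ones pass. */
void lowpass_process(float *samples, size_t frames, float cutoff_hz,
                     float sample_rate) {
  assert(samples != NULL && cutoff_hz > 0.0f && sample_rate > 0.0f);
  const float pi = 3.14159265f;
  float rc = 1.0f / (2.0f * pi * cutoff_hz); /* time constant for the cutoff */
  float dt = 1.0f / sample_rate;             /* time between two samples */
  float alpha = dt / (rc + dt);
  float previous = 0.0f;
  for (size_t i = 0; i < frames; i++) {
    previous = previous + alpha * (samples[i] - previous);
    samples[i] = previous;
  }
}
```

A lower `cutoff_hz` gives a smaller `alpha`, which means more smoothing and a more muffled sound.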

Keyframes + Events

Effects and filters are important. What is also important is that you can update the parameters of each over time following keyframes/events. There might be specific bits of a song that need a low pass filter during combat but not during the rest of the song, or not once there are no longer other sounds playing in that frequency range. I strongly encourage you not to go overboard with this, but it is important to understand that there are many possible layers of complexity when it comes to playing with the sound you’re receiving. Keyframes and events should be able to trigger different things with regards to your sound; dynamic sound modulation can have really cool effects!

Pseudo-Code:

// Filter out most of the upper end noises so combat sound effects are heard clearly
void start_combat_effects() {
  filter combat_filter = lowpass_filter_create(1600.0f); // let everything below 1600 Hz thru
  for(song current_song : songs_playing) { // i.e. for every song currently playing
    song_add_filter(current_song, combat_filter);
    // Maybe lower the gain as well, unless we're playing an intense song!
  }
}

void enemy_attacks_us() {
  // ... update collision, game status, etc
  start_combat_effects();
}
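For the keyframe side of things, a parameter such as a filter cutoff can be interpolated between timed values rather than switched instantly. The C sketch below assumes a hypothetical `keyframe` struct and `keyframes_sample` function, with keys sorted by time and plain linear interpolation between them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical keyframe: "at `time` seconds the parameter is `value`". */
typedef struct {
  float time;
  float value;
} keyframe;

/* Linearly interpolate a parameter (for example a low-pass cutoff) from
 * keyframes sorted by time. Before the first key or after the last one,
 * the nearest key's value is held. */
float keyframes_sample(const keyframe *keys, size_t count, float t) {
  assert(keys != NULL && count > 0);
  if (t <= keys[0].time) return keys[0].value;
  for (size_t i = 1; i < count; i++) {
    if (t < keys[i].time) {
      float span = keys[i].time - keys[i - 1].time;
      float frac = (t - keys[i - 1].time) / span;
      return keys[i - 1].value + frac * (keys[i].value - keys[i - 1].value);
    }
  }
  return keys[count - 1].value;
}
```

Sampling this once per audio update and feeding the result into the filter lets the cutoff slide from, say, 20000 Hz down to 1600 Hz over a couple of seconds as combat starts.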

Audio Codecs

Audio can be a lot of data if it goes on for long enough. The way we store and transmit audio is generally compressed so that most audio that wouldn’t be human perceivable is cut out. The point of the codec is to help compress and cut out unnecessary (open to interpretation) data in audio files. In game development I’ve generally run across two main formats for audio, and I’ll explain them as such. Note: Codec stands for coder/decoder.

WAV / Wave Format: Wave Format is one of the most straightforward formats you can use. It’s good for short sounds that generally don’t need compression, so you can afford to let them stay close to their original size. Wave is generally uncompressed, meaning the file stores the raw audio data. The file is pretty much just a header at the start with the data following. That makes loading and streaming it from disk relatively straightforward: keep track of where you were and you know where to go next.
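Here is a rough sketch in C of reading that header from a file already loaded into memory. The `wav_info` struct and `wav_parse` function are names invented for this example, and it only accepts plain uncompressed PCM; WAV files can carry other encodings and extra chunks that a production loader would need to handle more carefully.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
  uint16_t channels;
  uint32_t sample_rate;
  uint16_t bits_per_sample;
  const unsigned char *data; /* first byte of the raw PCM */
  uint32_t data_size;        /* PCM size in bytes */
} wav_info;

/* WAV stores numbers little-endian regardless of the host CPU. */
static uint16_t read_u16(const unsigned char *p) {
  return (uint16_t)(p[0] | (p[1] << 8));
}
static uint32_t read_u32(const unsigned char *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the RIFF chunks, picking out "fmt " and "data".
 * Returns 1 on success, 0 if this is not an uncompressed PCM WAV. */
int wav_parse(const unsigned char *file, size_t size, wav_info *out) {
  assert(file != NULL && out != NULL);
  if (size < 12 || memcmp(file, "RIFF", 4) != 0 ||
      memcmp(file + 8, "WAVE", 4) != 0) {
    return 0;
  }
  int have_fmt = 0;
  size_t pos = 12;
  while (pos + 8 <= size) {
    const unsigned char *chunk = file + pos;
    uint32_t chunk_size = read_u32(chunk + 4);
    if (chunk_size > size - pos - 8) return 0; /* truncated file */
    if (memcmp(chunk, "fmt ", 4) == 0 && chunk_size >= 16) {
      if (read_u16(chunk + 8) != 1) return 0; /* format tag 1 = plain PCM */
      out->channels = read_u16(chunk + 10);
      out->sample_rate = read_u32(chunk + 12);
      out->bits_per_sample = read_u16(chunk + 22);
      have_fmt = 1;
    } else if (memcmp(chunk, "data", 4) == 0) {
      out->data = chunk + 8;
      out->data_size = chunk_size;
      return have_fmt;
    }
    /* chunks are padded to an even number of bytes */
    pos += 8 + (size_t)chunk_size + (chunk_size & 1u);
  }
  return 0;
}
```

Once `data` and `data_size` are known, streaming is just a matter of remembering an offset into that region and handing out the next slice each time.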

OGG / Vorbis: Ogg Vorbis is an open source audio format: Ogg is the container and Vorbis is the codec inside it. .ogg files can be deceptively small, as the algorithm is super efficient at cutting out imperceptible noise and storing only what needs to be played back. It has quite a bit of configurability, but the main benefit is the almost mind boggling level of compression you can get out of it. Here are some real world stats. My prized possession of a file, Rebecca Black - Friday.mp3, is roughly 5.3 MB. That’s pretty good if you ask me for a 3 minute 47 second absolute banger of a song. When I hook that up to ffmpeg and convert it to a WAV file, it’s now 39.1 MB. Man, mp3 must be insane… Running it thru ffmpeg again to convert that WAV file to OGG yields a file of 2.8 MB, almost half the size of the mp3 we started with. With that level of compression in mind, there are some drawbacks to implementing it. The way the format is laid out means there’s almost always a need to use a library that implements decoding (stb_vorbis, libvorbis), which has inherent overhead of its own. The main tradeoff here is disk space for CPU cycles. In my projects Vorbis is used exclusively for music files. The compression is so great, and the streaming implementation, when done correctly, can cost almost nothing on modern machines. Keep in mind that wave files are generally advisable for sound effects, because that way you’re not having to decode 60 smaller files, just a few big ones over time.
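The WAV size above is easy to sanity check, since uncompressed PCM size is just a multiplication. The sketch below assumes the converted file is 44100 Hz, stereo, 16-bit, which the post doesn't actually state; `pcm_size_bytes` is a made-up helper.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of raw PCM =
 *   seconds * samples per second * channels * bytes per sample. */
uint64_t pcm_size_bytes(uint32_t seconds, uint32_t sample_rate,
                        uint32_t channels, uint32_t bits_per_sample) {
  assert(bits_per_sample % 8 == 0);
  return (uint64_t)seconds * sample_rate * channels * (bits_per_sample / 8);
}
```

For 3 minutes 47 seconds (227 seconds), `pcm_size_bytes(227, 44100, 2, 16)` comes out to 40,042,800 bytes, or roughly 40 MB, which is in the same ballpark as the 39.1 MB file ffmpeg produced.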

After these two examples you might be wondering, why not just use MP3? It’s a good middle ground, and it’s almost the de facto standard for compressed audio these days. You can download them pretty easily. Well, historically, MP3 was a licensed technology, meaning you had to acquire a license (acquire as in pay for) to use the technology/standard in your software. Software patents are evil, and they made it complicated and tedious enough that it was straight up just not worth it. The MP3 patents have since expired (the licensing program was terminated in 2017), so this is less of a blocker today than it used to be. If you’re concerned about getting your files into WAV/OGG, I strongly encourage downloading ffmpeg and using it to convert your audio if your creation software doesn’t export to those formats directly (most do).

Pseudo-Code:

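void decode_ogg_file(unsigned char* file_data, int file_length) {
  // Decode the compressed Vorbis data into raw PCM samples
  // (with stb_vorbis, this is stb_vorbis_decode_memory)
  pcm_data usable_audio_data = vorbis_decoder_decode(file_data, file_length);

  // then buffer the data into a buffer for use or just print it for funsies
}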

Tips

Don’t master your sounds with 0 dB as the ceiling. Anything that crosses 0 dB clips and that data is lost, so leave some headroom; otherwise amplifying later leads to lower quality
A good atmosphere sound is better than an alright song, don’t be afraid to not be playing music constantly
Making music is hard, but it’s not the most important audio of your game. Sound effects alone can make a game sound good, bad music can tank it. Add good music and it can be a masterpiece. Find your capabilities and stretch them a little but be realistic when it comes to standards
Listen to your audio in different places. Headphones are always ideal, but not everyone will play every game with great headphones on. Not everyone will play your game inside. Listen to your creations in the car, in your bathroom on a speaker, on your TV, and with quiet headphones. Note: some games offer EQ options based on where you’re listening; not all these environments are made equal
Good audio doesn’t have to be loud to be heard correctly. Well-balanced audio shouldn’t need to have your headphones maxed out at 100% to listen to the beautiful noises you’ve spent so much time making.
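The first tip about the 0 dB ceiling can be checked with very little code. The C sketch below assumes float samples where 1.0 is full scale (0 dB); `count_clipped` is a hypothetical helper that reports how many samples would go over the ceiling if you applied a given gain.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many samples would clip if `gain` were applied.
 * Float samples use 1.0 as full scale; anything beyond that gets
 * flattened on output, and the flattened detail can't be recovered. */
size_t count_clipped(const float *samples, size_t frames, float gain) {
  assert(samples != NULL);
  size_t clipped = 0;
  for (size_t i = 0; i < frames; i++) {
    float v = samples[i] * gain;
    if (v > 1.0f || v < -1.0f) clipped++;
  }
  return clipped;
}
```

If a sound is already mastered right up to the ceiling, any gain above 1.0 in the game will push samples into that clipped region, which is the quality loss the tip warns about.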

Notes / Other good things to know

In the next section, I’ll be listing a few tools. Some of them are what are called DAWs (Digital Audio Workstations). Think game engine, but for creating music/sound. Many of them are so advanced now you can get by without outside equipment. VSTs (Virtual Studio Technology, think synths/instruments) are generally plugins for helping create sounds. If you’re interested in working in DAWs I strongly recommend starting small. You don’t need to go and buy a MIDI keyboard (piano kind) to start making music. You can get by making some pretty good stuff with just your computer keyboard, mouse, and decent headphones. Remember, a bad carpenter with a $5000 tool is still bad. Grow out of your tools first.

Tools

rxi/aq, audio framework in C - Audio framework that has many of the concepts discussed!

sfxia (windows, linux) - Generate sounds/sound effects

sfxr - The OG sound generator for game jams

ableton live - This is my preferred DAW, it’s not open source but it works damn well

fl studio - Also known as fruity loops, this is also a super capable DAW with tons of documentation/videos on using it

reaper - Another DAW but this one is cross platform (linux, mac, windows). I don’t have too much experience in it, I just remember it not quite feeling right for me

freesound - This is a giant repository of open source / creative commons sounds, I highly recommend contributing and using it!

and most importantly, patience. Patience will be your greatest tool when working with this kind of tech.