Part 2: Understanding auditory reaction time
My previous article highlighted the ways game audio designers can assist players in challenging gameplay situations through careful assessment of the function, purpose, and context of sounds. This one goes a level deeper, exploring whether we can leverage the speed of human auditory processing to help players make faster decisions based on the sounds they hear.
Before you get excited, I should mention two things. Firstly, while there are some interesting facts to be shared, we’ll ultimately see that designing for auditory reaction time only makes sense for solving very specific problems in a narrow context. Secondly, if you are new to my blog, keep in mind that I am an audio designer, not an academic researcher in the fields of psychoacoustics or cognitive psychology. If you have more expertise in these areas, please challenge me and correct any mistakes I make. For your convenience I’ve included numerous links to studies that underpin my arguments.
As an audio designer, I’ve frequently heard (and repeated) the phrase “Hearing is the fastest human sense.” But what exactly does this mean? Does it pertain solely to reaction time, or does it also encompass information processing? How does it affect our decision-making? And finally, just how much faster are we talking about? Is the difference significant enough for game audio designers to take into account? Let’s delve deeper into the topic and try to answer at least some of these questions.
Many studies show that our auditory reaction time is quicker than our visual reaction time. But there are various types of reaction time. Simple reaction time is the time it takes us to detect a sound, without yet understanding what it is. Recognition reaction time is how long it takes us to extract meaning from what we heard. Choice (or complex) reaction time is the total time between hearing something and making a decision based on the meaning we extracted.
Most studies I found focus on simple reaction times. They report different values for auditory vs. visual stimuli (such as 284 ms vs. 331 ms; 140–160 ms vs. 180–200 ms; 228 ms vs. 247 ms; 248 ms vs. 293 ms; 131 ms vs. 202 ms), but on average they show that our rapid, instinctive response to sounds is roughly 50 milliseconds quicker than our response to images. But this is just one piece of the puzzle.
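To see where the “roughly 50 milliseconds” comes from, here is a quick back-of-the-envelope average of the pairs above. It is a minimal sketch assuming the first value in each pair is the auditory one, and it represents the 140–160 ms and 180–200 ms ranges by their midpoints. It simply averages across studies with different methods, so treat it as a rough sanity check rather than a meta-analysis.

```python
# Simple reaction times (auditory, visual) in ms, as listed in the text.
# Assumption: the first value of each pair is auditory, the second visual.
# Ranges such as 140-160 ms are represented by their midpoints.
PAIRS_MS = [
    (284, 331),
    ((140 + 160) / 2, (180 + 200) / 2),
    (228, 247),
    (248, 293),
    (131, 202),
]


def mean_auditory_advantage(pairs):
    """Unweighted mean of (visual - auditory) across studies, in ms."""
    differences = [visual - auditory for auditory, visual in pairs]
    return sum(differences) / len(differences)


print(f"{mean_auditory_advantage(PAIRS_MS):.1f} ms")  # prints "44.4 ms"
```

With these assumptions the average gap comes out at about 44 ms, which is in the same ballpark as the 50 ms figure.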
While simple reaction times are interesting, they don’t tell the full story of how we react to sounds in games. It’s not enough to simply hear a sound; we also need to recognize its meaning in order to make informed decisions. Additionally, since sounds unfold over time, we need to spend some time listening before we can recognize them. On a positive note, there is evidence that humans can differentiate sounds as short as 4–8 ms, and that recognition takes between 30 and 200 ms after exposure. Unfortunately, I didn’t find any direct comparison of recognition reaction times between the auditory and visual modalities. Still, the study shows that less than 200 ms is enough not just to “startle” the player into a quick preconscious response, but to deliver a meaningful message that informs their choices.
The most relevant metric for game sound designers is choice reaction time, as it answers our key question: does sound enable faster decision-making? The answer is not straightforward, as different studies have shown different results. One study I referenced earlier mentions that visual choice reaction times are similar to auditory ones, despite auditory simple reaction times being shorter. Another study acknowledges that auditory choice reaction times are shorter than visual ones but doesn’t specify by how much (though their data suggests a difference of approximately 50 ms, the same as for simple reaction times). Two other studies show that auditory choice reactions are, on average, 100 ms faster than visual ones. At the same time, there is also evidence of the opposite effect.
Acoustic properties of sounds can also affect our reaction times, which may explain the variation in results across studies. Loudness is the obvious one: the louder the sound, the quicker our reaction to it. An older study from 1977 found that with quiet sounds, we tend to react more quickly to high-pitched and low-pitched tones than to mid-range ones. The effect disappears once the sound reaches a loudness of 60 phons, at which point we start to give equal attention to all frequency ranges. Additionally, sounds with a shorter attack time consistently result in shorter reaction times at all loudness levels. Another study shows that we process voices (vocal, not verbal information) faster than other types of sounds.
These findings align with evolutionary explanations from my earlier post on how certain acoustic features make sounds intrinsically unpleasant, possibly because they subconsciously signal danger¹. For instance, sharpness (a psychoacoustic measure of high-frequency content) naturally indicates proximity, since air absorbs high frequencies more strongly over distance. A quiet yet sharp sound can alert us to something nearby, prompting a fast response. Meanwhile, low pitch may be linked to large moving objects or predators that demand our immediate attention. Loudness communicates both proximity and scale. While there are more parallels to explore, it’s not surprising that we react faster to acoustic features that we instinctively associate with potential threats.
Let’s tentatively conclude that sounds indeed give us a 50–100 ms advantage over images in terms of reaction time. In practical terms, this amounts to approximately 3–6 frames at 60 FPS (one frame lasts about 16.7 ms), or slightly less than the duration of a blink. It doesn’t sound like a huge difference to me, although one might argue it matters a lot in a highly competitive context like esports, where split-second decisions determine the outcome of a match.
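The frame arithmetic is easy to check. This is a minimal sketch assuming a constant frame rate; `ms_to_frames` is a hypothetical helper, and 120 FPS is included only as an extra illustration alongside the 60 FPS case from the text.

```python
def ms_to_frames(ms: float, fps: float = 60) -> float:
    """Convert a duration in milliseconds to frames at a constant frame rate."""
    # Multiplying first avoids the rounding error of dividing by 16.666... ms.
    return ms * fps / 1000


for fps in (60, 120):
    low, high = ms_to_frames(50, fps), ms_to_frames(100, fps)
    print(f"50-100 ms at {fps} FPS = {low:.0f}-{high:.0f} frames")
# 50-100 ms at 60 FPS = 3-6 frames
# 50-100 ms at 120 FPS = 6-12 frames
```

At higher frame rates the same advantage spans more frames, but its length in real time stays the same.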
One study looked at how delayed audio affects player performance in first-person shooter games. The results showed that a delay of up to 200 ms had no negative effect on performance and went unnoticed. The time difference between visual and auditory processing could explain why the players did not notice the latency. More importantly, the added audio latency did not make them perform worse.
However, the study participants had no prior experience with that particular game. This means they had not learned its specific auditory cues and could not actively listen for them. In that sense, the study only shows that added audio latency has no effect on the performance of new players; it doesn’t indicate whether the same is true for experienced ones. Given how much attention competitive players pay to accurate audio representation, it seems likely that they learn to benefit from the speed of auditory processing as they gain mastery of the game.
Many players with hearing loss avoid games that require quick reactions, such as fighting games and first-person shooters. The main reason is likely that they miss important gameplay information presented through sound. Additionally, the difficulty of such games could be unintentionally balanced around auditory reaction time, making them too difficult for deaf players even if all necessary information is presented visually.
It’s worth noting that when we’re faced with high visual perceptual load (more on this in part 1), our reaction time to visual stimuli slows down, while our reaction to sounds is affected much less, if at all. So in perceptually challenging gameplay situations where we’re bombarded with visual stimuli, the difference in reaction time between sounds and visuals could become an even bigger advantage.
Finally, if you’ve read my article about video game sound effects and pleasant sensations, you may remember the idea that perceptual fluency (the ease of processing sensory information) makes things appear more pleasant and beautiful. Research suggests that the impact of perceptual fluency on pleasant sensations is stronger when the source of fluency is unexpected and we are surprised by how easily we can process the information. It’s a far-fetched guess, but maybe this is what happens when a sound delivers an important message through a faster sensory channel, enhancing the moment-to-moment gameplay experience.
Here is a quick summary of the main ideas up to this point:
- Humans tend to react to sounds faster than to visuals, with studies showing a difference of around 50 milliseconds.
- Some studies suggest that auditory information can help us make choices faster, with a difference of 50–100 milliseconds compared to visuals.
- Acoustic factors such as loudness and spectral and temporal qualities affect our reaction time.
- Humans can recognize and distinguish sounds as short as 4–8 ms, which is quite impressive!
- Despite these differences in reaction time, it’s unclear whether they really matter in video games, and some evidence suggests they may have no effect on new players.
- While I don’t have hard data to back it up, it’s possible that experienced players benefit from faster reaction times when they learn to interpret game sounds.
While it is true that our hearing is faster than our sight, the practical benefits of this advantage for game audio design appear limited. The pace of the gameplay needs to be fast enough for a difference of less than 100 ms to become significant. Players will likely benefit from shorter reaction times only if they actively listen for sounds they have already memorized. Given that auditory recognition memory is inferior to visual recognition memory, it is very likely that they memorize only a few sounds with the highest ludic value, i.e. those that convey the most useful information. It is also likely that memorizing a sound to the point of instant recognition requires spending a lot of time in the game. As a result, game audio design techniques that leverage auditory reaction time will only be useful for:
- Reaction-based competitive games.
- The most experienced players.
- A handful of sound effects with the highest ludic value.
If you don’t know how to assess the ludic value of a sound, I invite you to read my two other articles, where I describe the concept and provide some guidance on how to apply it.
Since a short attack time enables faster reactions, highly informative sounds benefit from noticeable transients at the beginning. Remember that humans can recognize very short sounds, so consider putting extra effort into the first 50 milliseconds of a sound effect to turn that onset into a high-level message of its own. For example, all sounds that signify taking damage could share an identical, noticeable transient that says “your character got hurt,” while the rest of the sound effect communicates the source of the damage, such as impact, fire, or poison. I don’t have an ideal number of sounds that could use such treatment, but given the limits of human memory, 7 ± 2 looks like a good start.
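The shared-transient idea can be sketched as a simple layering step. This is a minimal illustration in plain Python, not a production audio pipeline: it assumes mono float samples at 48 kHz, uses toy synthesized signals in place of real recordings, and the names (`make_shared_onset`, `make_tail`, `build_damage_sound`) as well as the 5 ms crossfade are hypothetical choices. In practice you would do this in your DAW or audio middleware.

```python
import math
import random

SAMPLE_RATE = 48_000  # assumed project sample rate (Hz)
ONSET_MS = 50         # length of the shared "you got hurt" transient
CROSSFADE_MS = 5      # hypothetical overlap between onset and tail


def ms_to_samples(ms: float) -> int:
    return round(ms * SAMPLE_RATE / 1000)


def make_shared_onset(seed: int = 0) -> list[float]:
    """Toy transient: a noise burst with a fast exponential decay."""
    rng = random.Random(seed)
    n = ms_to_samples(ONSET_MS)
    return [rng.uniform(-1.0, 1.0) * math.exp(-t / (0.2 * n)) for t in range(n)]


def make_tail(freq_hz: float, length_ms: float) -> list[float]:
    """Toy stand-in for a source-specific layer (impact, fire, poison...)."""
    n = ms_to_samples(length_ms)
    return [0.5 * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]


def build_damage_sound(onset: list[float], tail: list[float]) -> list[float]:
    """Join the shared onset and a specific tail with a linear crossfade."""
    overlap = min(ms_to_samples(CROSSFADE_MS), len(onset), len(tail))
    start = len(onset) - overlap
    crossfaded = [
        onset[start + i] * (1 - i / overlap) + tail[i] * (i / overlap)
        for i in range(overlap)
    ]
    return onset[:start] + crossfaded + tail[overlap:]


# Every damage type reuses the exact same onset samples.
shared_onset = make_shared_onset()
damage_sounds = {
    "impact": build_damage_sound(shared_onset, make_tail(80, 300)),
    "fire": build_damage_sound(shared_onset, make_tail(900, 400)),
    "poison": build_damage_sound(shared_onset, make_tail(300, 500)),
}
```

Because every damage sound reuses the same onset, the first 45 ms (the onset minus the crossfade) are bit-identical across variants, so a player can learn the “you got hurt” message once and recognize it regardless of the damage source.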
While high loudness can reliably make humans react to sound faster, important ludic sounds don’t have to be the loudest in the mix as long as they remain audible. Instead, designers can give them an acoustic niche at the higher or lower end of the spectrum. Frequently played sounds can be sharp, while rare and impactful sounds can have a powerful low-end layer. We can also introduce roughness or nonlinearity with targeted amplitude or frequency modulation. Since the ultimate goal is to increase the noticeability of a sound, these tricks should apply only to its onset phase, which will also help minimize listening fatigue.
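Here is one way to restrict a roughness effect to the onset. It is a sketch under assumptions: plain Python with mono float samples at 48 kHz, amplitude modulation only (no frequency modulation), and a hypothetical `add_onset_roughness` helper. The default 70 Hz modulation rate is chosen because perceived roughness is commonly reported to peak at modulation rates around 70 Hz; the depth and onset length are arbitrary starting points to tune by ear.

```python
import math

SAMPLE_RATE = 48_000  # assumed sample rate (Hz)


def add_onset_roughness(
    samples: list[float],
    onset_ms: float = 50,
    mod_hz: float = 70.0,
    depth: float = 0.8,
) -> list[float]:
    """Amplitude-modulate only the onset, fading the modulation out.

    The modulation depth ramps linearly from `depth` down to 0 across the
    onset, so everything after it is returned unchanged. The modulator stays
    between 1 - depth and 1, so it only attenuates and never boosts.
    """
    onset_len = min(len(samples), round(onset_ms * SAMPLE_RATE / 1000))
    out = list(samples)
    for n in range(onset_len):
        current_depth = depth * (1 - n / onset_len)
        lfo = 0.5 + 0.5 * math.sin(2 * math.pi * mod_hz * n / SAMPLE_RATE)  # 0..1
        out[n] = samples[n] * (1 - current_depth * lfo)
    return out


# Usage: a flat 100 ms test signal gets a "buzzy" first 50 ms.
test_signal = [0.5] * round(0.1 * SAMPLE_RATE)
rough = add_onset_roughness(test_signal)
```

Fading the depth to zero, rather than cutting it off, avoids an audible step at the end of the onset while keeping the rest of the sound clean.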