Unit 10.2: 3D Sound


Sound as Things in Space: Some Thoughts on 3D Audio

We imagine sound as things, as bounded and tangible physical objects in three-dimensional space: “3D” is the way we perceive sound. Spatial language, as a manifestation of spatial cognition, is common not only in describing the spatial position and movement of auditory, visual or audiovisual objects but also in the metaphorical description of pitch, timbre and intensity: a sound can be big or small, thick or thin, edgy, sharp, flat or round; a tone can be high or low, a melody rising or falling. Can a listener distinguish between these metaphorical and physical/tangible properties of a sound? And how are the spatial positions and trajectories of sounds, spatial cognition and emotional impact related? The interdependencies between physical and metaphorical space become particularly interesting in “3D” audio systems such as VR audio, Ambisonics or multichannel sound, where position and movement in space become creative devices.

Introduction: Perceiving the Physical Environment

In the spatial perception of the sonic environment we may distinguish between explicit and implicit space. Space is perceived explicitly when auditory objects span a three-dimensional space through their perceived distances and directions, and implicitly through reflections from room boundaries or non-sounding objects, which create the sensation of being immersed in space even in the absence of surrounding auditory objects. The former may be described as “sonic envelopment” or “surroundedness”, the latter as “room envelopment” (Berg, 2009). The first wavefront arriving at the listener’s ears creates the perception of an auditory object at the position from which this wavefront arrives, while all later instances of the same sound wave, arriving from various directions, create the perception of “auditory space” or “room envelopment” (law of the first wavefront, or precedence effect). The first wavefront defines not only the spatial position of the perceived source but also its complete sonic qualities (generalized precedence effect; Benade, 1985).
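Why the direct sound always wins the race can be illustrated with the image-source method, a standard room-acoustics technique that is not part of the text above: each first-order wall reflection is modeled as sound emitted by the source mirrored in that wall, so its path can never be shorter than the direct path. The sketch below is a minimal illustration assuming an empty rectangular room, a speed of sound of 343 m/s, and no attenuation or absorption; the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def first_order_arrivals(src, lst, room):
    """Arrival times of the direct sound and the six first-order wall
    reflections in an empty shoebox room (image-source method).

    src, lst: (x, y, z) source and listener positions in metres
    room:     (Lx, Ly, Lz) room dimensions in metres; walls at 0 and L on each axis
    Returns a list of (delay_seconds, label) tuples sorted by arrival time.
    """
    arrivals = [(math.dist(src, lst) / SPEED_OF_SOUND, "direct")]
    for axis, length in enumerate(room):
        for wall in (0.0, length):
            # Mirror the source in the wall plane: coordinate c -> 2 * wall - c
            image = list(src)
            image[axis] = 2.0 * wall - src[axis]
            path = math.dist(image, lst)
            arrivals.append((path / SPEED_OF_SOUND, f"wall {wall} m on axis {axis}"))
    return sorted(arrivals)


if __name__ == "__main__":
    for delay, label in first_order_arrivals((2.0, 2.0, 1.5), (4.0, 2.0, 1.5), (6.0, 4.0, 3.0)):
        print(f"{delay * 1000:6.2f} ms  {label}")
```

In this made-up 6 m by 4 m by 3 m room, the direct sound arrives after about 5.8 ms and the floor and ceiling reflections roughly 4.7 ms later; the direct sound determines the perceived direction, while the later arrivals contribute to the sense of room.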

Fahlenbrach notes:

“Perceived space is basically different from ‘physical space’. We are not ‘measuring’ physical properties of space, but perceive the environment with regard to our options of behaviour in or interaction with the environment. Specifically meaningful are e.g. sizes of objects and their relations and distances.” (Fahlenbrach, 2010)

The human auditory system is optimized for orientation in the horizontal plane. Accordingly, perceived space is largely built from horizontal spatial cues, namely interaural time differences (ITD), interaural level differences (ILD) and interaural spectral differences caused by the acoustical effects of the head, pinnae (outer ears) and torso. Spatial orientation in the vertical plane, in contrast, relies solely on the spectral filtering of pinnae and torso, because a source in the median plane produces practically no interaural differences at all.
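The size of the horizontal time cue can be estimated with Woodworth’s classic rigid-sphere approximation, ITD(θ) ≈ (a/c)(θ + sin θ), for a distant source at lateral angle θ. This formula is a textbook model rather than something taken from the text above, and the head radius of 8.75 cm and speed of sound of 343 m/s are assumed typical values, so treat the numbers as rough orders of magnitude.

```python
import math

HEAD_RADIUS = 0.0875    # m, assumed average adult head radius
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def woodworth_itd(azimuth_deg):
    """Interaural time difference in seconds for a distant source, using
    Woodworth's spherical-head approximation ITD = (a / c) * (theta + sin(theta)).

    azimuth_deg: lateral angle from 0 (median plane) to 90 (fully to one side).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("approximation defined here for 0 to 90 degrees only")
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))


if __name__ == "__main__":
    for az in (0, 15, 30, 45, 60, 90):
        print(f"{az:3d} deg: {woodworth_itd(az) * 1e6:5.0f} microseconds")
```

The model peaks at roughly 0.66 ms for a fully lateral source. It also shows why elevation cannot be read from the ITD: every source in the median plane, whether in front, above or behind, has a lateral angle of 0 and therefore an ITD of zero.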

The perceptual dominance of the horizontal plane is reflected in “2D” technologies for spatial audio recording and reproduction, namely stereophonic, quadraphonic, octophonic or surround systems.

Spatial Illusions Through Recorded Sound

Explicit and implicit spaces refer to perceiving the auditory scene “as it is”, the apparent physical world. The situation is different for recorded or synthesized (acousmatic) sound. Blesser and Salter note:

“When music is reproduced electronically, a listener actually experiences a hybrid comprising at least three sets of spatial attributes: the acoustics of the performance space where the music was recorded (recording studio or concert hall), the acoustics added during the mixing process when the music was prepared (spatial synthesizer), and the acoustics of the listening space where music [is] actually heard (living room).” (Blesser et al., 2007)

In horizontal surround systems such as quadraphonic or octophonic setups, the upper hemisphere may nevertheless be populated by auditory objects, depending on the specific sounds or composition; a famous example is John Chowning’s Turenas (1972). Even a well-made stereophonic recording can evoke the illusion of a three-dimensional aural space between the loudspeakers. A three-dimensional playback system with height loudspeakers grants complete control over the spatial positions, directions and trajectories of auditory objects or simulated reflections. The only limitation is the distance between the loudspeakers and the audience, which roughly marks the minimum distance of an auditory object (to a certain degree this can be overcome with Wave Field Synthesis systems).

“Binaural” technologies (dummy head recording, virtual binaural synthesis over headphones or transaural reproduction over loudspeakers, auditory VR/AR with or without head tracking) can potentially give access to the complete three-dimensional space, including the emotionally very effective personal and intimate distances to the body (Hall, 1990). However, the currently available tools still lack precision in simulating the spatial cues related to head and pinnae; in particular, the individualization of head-related transfer functions (HRTFs) remains an open issue.
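At its core, virtual binaural synthesis filters a mono source signal with a pair of head-related impulse responses (HRIRs), one per ear, measured or modeled for the desired direction. The sketch below shows only that convolution step in plain Python. The tiny HRIRs in the example are hypothetical placeholders; real ones are measured (ideally for the individual listener, which is where the individualization problem arises) and are hundreds of samples long.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution; output has len(signal) + len(kernel) - 1 samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono signal with one head-related impulse response per ear."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


if __name__ == "__main__":
    # Hypothetical toy HRIRs for a source on the right: the left ear receives the
    # sound two samples later and at half the amplitude (a crude ITD plus ILD).
    left, right = render_binaural([1.0, 0.5], [0.0, 0.0, 0.5], [1.0])
    print("left: ", left)
    print("right:", right)
```

Practical renderers use FFT-based (often partitioned) convolution for speed and, with head tracking, switch or interpolate HRIR pairs as the head turns; that machinery is outside this sketch.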

Nevertheless, technically reproduced sound can create the illusion of space: perceptual representations of physical space are evoked even if the physical simulation is limited. Our ability to recognize the auditory spaces of a composition during playback, within the acoustics of the actual listening space, can be regarded as a variation of Benade’s generalized precedence effect.

The Thingness of Sound

Classic psychoacoustic models do not explain the most peculiar feature of auditory perception, namely the perception of sound as a thing. We do not hear the sound but a hypothesis about its source or origin (a door, footsteps, a flute, the phone, a voice, …), externalized as a bounded physical object in space. Heidegger notes in The Origin of the Work of Art: “Much closer to us than all sensations are the things themselves. We hear the door shut in the house and never hear acoustical sensations or even mere sounds” (Heidegger, 1960). Pierre Schaeffer and Michel Chion call this identification of a sound with its source “causal listening” (Schaeffer, 1974) (Chion, 1994).

His Master's Voice

The physical nature of sound is different: the physical phenomenon is an unbounded wave field that pervades space and causes an alternating pressure at the eardrums, thus a tactile stimulus. Yet its perception as a distant, bounded object appears so natural and self-evident that amateur mixing tutorials, for example, commonly depict the spatial arrangement of a mix as objects placed in space.

D. Gibson, “The Art of Mixing” YouTube tutorial. Note that bass and kick drum are depicted larger and lower in space than, e.g., the cymbals.

One explanation is evolutionary-biological: decomposing a very complex multisensory environment into scenes composed of relatively few visual, auditory and audiovisual objects allows us to orient ourselves quickly in potentially dangerous surroundings. We can expect similar mechanisms in the auditory perception of other species. Another explanation, referring to consciousness and reasoning, is given by Lakoff and Johnson: “Once we can identify our experiences as entities or substances, we can refer to them, categorize them, group them, and quantify them – and, by this means, reason about them.” (Lakoff et al., 1980)

Speaking of Sound

Apart from “loud” and “quiet”, there are barely any words that describe the sensation of sound itself. When we talk about sound, we mainly use

  • onomatopoeia (shrill, hum, mumble, buzz, squeak, tweet, splash, sizzle, …)
  • descriptive terminology, referring to physical materiality (wooden, metallic, …), comparisons and references to familiar objects (“it sounds like …”), and terminologies of movement and interaction (vibrating, rolling, sliding, ripping, …)
  • metaphoric materiality and impersonations (solid, massive, fragile, strong, powerful, energetic, …)
  • specific technical, acoustical, musical, linguistic and other terminologies, and
  • crossmodal metaphors, referring to other sensory modalities (high/low, bright/dark, thick/thin, large/small, sharp, soft, rough, …)

Crossmodal metaphors are the most interesting here, as they refer to metaphoric properties of auditory objects. Carl Stumpf noted in his Tonpsychologie that we express the sensation of sound “with a certain psychological necessity” in metaphors like high/low and large/small (Stumpf, 1883). These metaphors are verbal expressions of the crossmodal correspondences of perception, which have been investigated mainly since the 1980s (for an overview see (Spence, 2011)). Crossmodal correspondences between auditory and visual or spatio-visual perception, due to innate neural connections or infant learning, have been demonstrated between pitch or intensity on the one hand and height, size, thickness, shape (edgy/round) and brightness on the other. The dominant metaphor is the height of a sound depending on its pitch, also referred to as the SPARC or SMARC effect (spatial-pitch or spatial-musical association of response codes); the corresponding linguistic metaphor can be found in numerous languages. Moreover, these crossmodal mappings interfere with basic object perception: the physical and metaphorical properties of an auditory object can be indistinguishable. “Pratt’s effect” or the “pitch height effect” describes the perceived spatial elevation of an auditory object according to its pitch (Pratt, 1930) (Roffler et al., 1968) (Ferguson et al., 2005).

To sum this up:

  • height and size of an auditory object each have both a (perceived) physical and a metaphoric dimension,
  • its movement can have a complex metaphoric dimension, according to pitch and temporal structure, besides its (perceived) physical dimension,
  • intensity as well as low-frequency content refer to the metaphor of size, evoking distance and proximity,
  • moreover, the auditory object may have a metaphoric shape and haptics.

Metaphoric Spaces, Metaphoric Properties of Auditory Objects and Conceptual Metaphors

In music, and specifically in 3D-reproduced acousmatic compositions, the virtual physical spaces created through the spatial positions and movements of auditory objects and by means of spatial synthesizers or reverb algorithms enter into a dialogue with the crossmodally metaphoric spatial properties of the sounds, specifically their metaphoric sizes and heights. Gernot Böhme identifies these metaphoric spaces as “musical spaces” that are “experienced affectively” (Böhme, 2017). An impressive example of this inherent spatiality of a complex sound can be found in the second movement, “Silentium”, of Arvo Pärt’s Tabula Rasa (1977): the seemingly endless melodic movements, simultaneously upwards and downwards, evoke associations of vast spaces, independent of the physical spatiality of the recording, which may be a humble stereo production.

Arvo Pärt: Tabula Rasa. Silentium (stereo recording, spectrogram created with Audacity). Note that the metaphoric spatial movement is visualized in the spectrogram's spatial mappings of time and frequency.

Beyond the mere experience of space through perceived physical or metaphorical spatial properties of sounds, one can expect a further impact: human cognition is strongly related to space, and mental representations are mapped along spatial dimensions (Lakoff et al., 2003) (Olson et al., 2009). Tversky states: “spatial thinking, rooted in perception of space and action in it, is the foundation for all thought.” (Tversky, 2019). Spatial concepts of abstract thought may therefore be evoked by spatial sound as well as by spatial crossmodal metaphors.

Space in Composition and Sound Design

Space as a parameter in 3D audio compositions can be differentiated into explicit, implicit and metaphoric spaces. The first and most obvious spatial concepts apply to explicitly composed space, i.e. the (virtual) spatial relations of auditory objects in a composition or sound design. As we experience space in relation to our body, relevant spatial relations can be derived from spatial cognition (cf. Olson et al., 2009):

  • up / down
  • close / distant
  • in front / behind
  • central / peripheral
  • inside / outside
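Of these relations, close/distant is the one most directly rendered by level. A common starting point, assumed here rather than taken from the text, is the free-field inverse-distance law: amplitude falls as 1/r, about 6 dB per doubling of distance. The hypothetical `min_distance_m` clamp loosely mirrors the observation that the loudspeaker distance limits how close an auditory object can appear; practical renderers usually add further distance cues such as the direct-to-reverberant ratio and high-frequency air absorption.

```python
import math


def distance_gain(distance_m, ref_distance_m=1.0, min_distance_m=1.0):
    """Free-field 1/r amplitude gain relative to ref_distance_m.

    Distances closer than min_distance_m (a hypothetical parameter, e.g. the
    loudspeaker radius) are clamped so the gain cannot grow without bound.
    """
    r = max(distance_m, min_distance_m)
    return ref_distance_m / r


def to_db(gain):
    """Convert an amplitude gain to decibels."""
    return 20.0 * math.log10(gain)


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0, 4.0, 8.0):
        g = distance_gain(d)
        print(f"{d:4.1f} m: gain {g:.3f} ({to_db(g):+.1f} dB)")
```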

Movement can be applied to individual objects, to meta-objects composed of several basic objects, or to complete auditory scenes (which might even induce vection, the illusion of self-movement, in the audience). A specific spatial concept may call for a specific 3D audio technology that facilitates it: channel-based (e.g. VBAP), object-based or scene-based audio (e.g. Ambisonics/HOA), presented over loudspeakers, binaurally or in VR.
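VBAP places a virtual source by weighting the loudspeakers that enclose its direction. The following is a minimal two-dimensional sketch for a single loudspeaker pair, following Ville Pulkki’s vector formulation: the source direction p is written as g1·l1 + g2·l2, the 2 by 2 system is solved for the gains, and the result is normalized to constant power. The azimuth convention (degrees, counterclockwise, 0 = front) and the function name are assumptions; full 3D VBAP works analogously with loudspeaker triplets and a 3 by 3 inverse.

```python
import math


def vbap_2d(source_az_deg, spk1_az_deg, spk2_az_deg):
    """Constant-power gains (g1, g2) for a source between two loudspeakers.

    Solves p = g1 * l1 + g2 * l2 for unit direction vectors p, l1, l2 and then
    normalises so that g1**2 + g2**2 == 1. A negative gain means the source
    lies outside the chosen loudspeaker pair.
    """
    def unit(az_deg):
        a = math.radians(az_deg)
        return math.cos(a), math.sin(a)

    px, py = unit(source_az_deg)
    l1x, l1y = unit(spk1_az_deg)
    l2x, l2y = unit(spk2_az_deg)
    det = l1x * l2y - l2x * l1y
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    # Inverse of the matrix with columns l1 and l2, applied to p
    g1 = (px * l2y - py * l2x) / det
    g2 = (py * l1x - px * l1y) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


if __name__ == "__main__":
    for az in (-30, -10, 0, 10, 30):
        g1, g2 = vbap_2d(az, 30.0, -30.0)
        print(f"source {az:+3d} deg: left {g1:.3f}, right {g2:.3f}")
```

For a stereo pair at ±30°, a centered source gets equal gains of about 0.707, and a source at a loudspeaker’s direction is reproduced by that loudspeaker alone. A renderer with more loudspeakers would first select the pair (or triplet) that encloses the source.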

Suggestions for spatial concepts of room envelopment (implicit space), which can be rendered in a full 3D system with tools such as delay, algorithmic reverb or convolution reverb, are:

  • large / small
  • wide / narrow
  • low / high
  • open / closed
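The large/small pair maps most directly onto reverberation time. As a rough guide when choosing reverb settings, Sabine’s classic formula RT60 ≈ 0.161 · V / A (V the room volume in m³, A the equivalent absorption area in m²) relates room size to decay time. This textbook approximation is not from the text above and assumes a diffuse sound field with moderate absorption; the room dimensions and the absorption coefficient in the example are made up for illustration.

```python
def sabine_rt60(volume_m3, surfaces):
    """Sabine reverberation time in seconds (time for a 60 dB decay).

    volume_m3: room volume in cubic metres
    surfaces:  iterable of (area_m2, absorption_coefficient) pairs
    """
    # Equivalent absorption area A in square metres
    absorption = sum(area * alpha for area, alpha in surfaces)
    if absorption <= 0.0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / absorption


def shoebox_rt60(lx, ly, lz, alpha):
    """RT60 of a rectangular room whose surfaces all share one absorption coefficient."""
    surface_area = 2.0 * (lx * ly + lx * lz + ly * lz)
    return sabine_rt60(lx * ly * lz, [(surface_area, alpha)])


if __name__ == "__main__":
    # Hypothetical rooms, all surfaces with absorption coefficient 0.3
    print(f"small room  5 x 4 x 3 m:  {shoebox_rt60(5, 4, 3, 0.3):.2f} s")
    print(f"large hall 40 x 25 x 15 m: {shoebox_rt60(40, 25, 15, 0.3):.2f} s")
```

With identical materials, the small room decays in about 0.34 s and the hall in about 2 s, so scaling the reverberation time is one way to move a scene along the large/small axis.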

Finally, the congruency of physical and metaphorical properties can be regarded as an important factor in spatial composition: since the spatial position, distance and size of an auditory object have both a perceived physical and a metaphoric dimension, the congruency of these dimensions becomes meaningful and thus a creative device. Likewise, the physical movements or spatial trajectories of auditory objects or meta-objects can be congruent or incongruent with their intrinsic movement due to pitch and rhythm. Would Tabula Rasa benefit from a congruent spatialisation?
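One way to experiment with this question is to derive elevation from pitch, so that spatial height either follows the pitch-height correspondence or deliberately contradicts it. The mapping below is a hypothetical sketch: its linear form, the MIDI note range and the elevation limits are arbitrary assumptions rather than a method proposed in the text, and the resulting angles would be handed to whatever 3D panner is in use.

```python
def pitch_to_elevation(midi_note, low_note=36, high_note=96,
                       min_elev_deg=-30.0, max_elev_deg=60.0, congruent=True):
    """Hypothetical linear mapping from pitch (MIDI note number) to elevation in degrees.

    congruent=True:  higher pitch -> higher elevation (follows pitch-height correspondence)
    congruent=False: mapping inverted, for deliberately incongruent spatialisation
    Notes outside [low_note, high_note] are clamped to the elevation limits.
    """
    if high_note <= low_note:
        raise ValueError("high_note must be above low_note")
    t = (midi_note - low_note) / (high_note - low_note)
    t = min(max(t, 0.0), 1.0)
    if not congruent:
        t = 1.0 - t
    return min_elev_deg + t * (max_elev_deg - min_elev_deg)


if __name__ == "__main__":
    melody = [57, 60, 64, 69, 64, 60]  # hypothetical melodic fragment
    print([round(pitch_to_elevation(n), 1) for n in melody])
    print([round(pitch_to_elevation(n, congruent=False), 1) for n in melody])
```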


  1. Benade, A.H. “From Instrument to Ear in a Room: Direct or via Recording”. Journ. Audio Eng. Soc. 33(4). 1985.
  2. Berg, J. “The contrasting and conflicting definitions of envelopment”. Convention Paper 7808, Audio Eng. Soc. 126th Conv., Munich. 2009.
  3. Blesser, B. & Salter, L.R. “Spaces Speak, Are You Listening? Experiencing Aural Architecture”. MIT Press. 2007.
  4. Böhme, G. “Atmosphäre. Essays zur neuen Ästhetik”. edition suhrkamp, 7th ed. 2017.
  5. Chion, M. “Audio-Vision: Sound on Screen”. Columbia University Press. 1994.
  6. Fahlenbrach, K. “Audiovisuelle Metaphern: Zur Körper- und Affektästhetik in Film und Fernsehen”. Schüren. 2010.
  7. Ferguson, S. & Cabrera, D. “Vertical Localization of Sound from Multiway Loudspeakers”. Journ. Audio Eng. Soc. 53(3). 2005.
  8. Hall, E.T. “The Hidden Dimension [1969]”. Anchor Books. 1990.
  9. Heidegger, M. “Der Ursprung des Kunstwerks (The Origin of the Work of Art)”. Reclam. 1960.
  10. Ihde, D. “Listening and Voice: Phenomenologies of Sound”. State Univ. of New York Press, 2nd ed. 2007.
  11. Lakoff, G. & Johnson, M. “The metaphorical structure of the human conceptual system”. Cogn. Science 4. 1980.
  12. Lakoff, G. & Johnson, M. “Metaphors We Live By [1980]”. Univ. of Chicago Press. 2003.
  13. Olson, D. R. & Bialystok, E. “Spatial Cognition. The Structure and Development of Mental Representations of Spatial Relations [1983]”. Psychology Press. 2009.
  14. Pratt, C.C. “The spatial character of high and low tones”. Journ. Experimental Psychology 13. 1930.
  15. Roffler, S.K. & Butler, R.A. “Localization of tonal stimuli in the vertical plane”. Journ. Acoust. Soc. Am. 43(6). 1968.
  16. Schaeffer, P. “Musique Concrète”. Ernst Klett Verlag. 1974.
  17. Spence, C. “Crossmodal correspondences: a tutorial review”. Attn. Percep. & Psychophys. 73(4). 2011.
  18. Stumpf, C. “Tonpsychologie, Vol. 1”. Hirzel. 1883.
  19. Tversky, B. “Mind in Motion. How Action Shapes Thought”. Basic Books. 2019.


  • Thomas Görne