Unit 11: Music and Language
Synopsis
Music and language are both systematic, organized uses of sound through which we communicate and express ourselves. While the two are certainly different, they share a number of features. For example, both exhibit syntactic structure, and both rely on the sensory, cognitive, and vocal-motor apparatus of the central nervous system. Although there is some evidence that music and language are processed by different parts of the brain (for example, it is possible to lose the ability to speak while retaining the ability to sing, and vice versa), recent evidence has shown some degree of overlap in brain activity between the two. Ongoing research suggests that these similarities may be due to certain general characteristics of how the brain processes sequential sounds.
Music and Language
Music and language are both fundamental to human experience. The two domains, while apparently different, share several notable similarities: both exhibit syntactic structure, and both rely on the sensory, cognitive, and vocal-motor apparatus of the central nervous system.
Evidence for shared processing between music and language
In the field of neuropsychology, the double dissociation between amusia and aphasia has been used to argue that music and language do not share common neural substrates. On the other hand, neuroimaging evidence (from functional and structural magnetic resonance imaging, as well as electrophysiological recordings) shows similar activations for musical and linguistic stimuli, which stands as evidence for overlap between music and language. Patel (2008) points out, however, that several reported cases of aphasia without amusia involve professional musicians, whose data may not generalize to ordinary individuals.
One methodological issue in characterizing musicality is that the tests used to assess musicality and musical competence are not very sensitive or well refined. An explanatory theory is needed to account for the discrepancies between the neuroimaging and neuropsychological data.
Phonology
Phonology refers to the sounds of speech and language. Abundant evidence suggests that phonological processing may be influenced by musical training. For example, phonological ability in a second language is better in people who have musical training (Slevc et al., 2006). Children who are better able to perceive and produce pitches also perform better at tasks of phonological awareness, which require manipulating speech sounds in the mind (Loui et al., 2011).
Syntax
Syntax refers to the ordering of entities to form coherent strings. In language, this refers to the order of words and phrases to form sentences. In music, syntax refers to the ordering of pitches, chords, and rhythms to form melodies and harmonies. The nature of the relationship between syntactic structure in language and music, and their underlying neural substrates, is a topic of intense interest to the cognitive and brain sciences community.
The Shared Syntactic Integration Resource Hypothesis (SSIRH; Patel, 2008) is an influential theoretical account of the similarities and differences between cognitive processing for music and language. The SSIRH posits that neural resources for music and language overlap at the level of syntax; in other words, processing of music and language should interact at the syntactic level, but not at other levels such as semantics or acoustic or phonemic structure. Support for the SSIRH comes from a variety of behavioral and neural studies (see Patel, 2008 for a review). One set of findings comes from EEG recordings in humans as they listen to chord progressions containing chords that are unexpected given the context, creating a sort of syntax-like violation. These violations in musical structure elicit characteristic patterns of brain potentials, specifically a negative waveform at frontal recording sites on the head around 200 ms after the unexpected chord occurs. This waveform is known as the Early Right Anterior Negativity (ERAN) (Koelsch et al., 2000). The ERAN has been observed in response to many syntactically unexpected musical contexts, even in alternative musical systems such as the Bohlen-Pierce scale (Loui et al., 2009). The ERAN can be elicited even when music is played in the background: directing attention away from the music reduces but does not completely abolish the effect, suggesting that the ERAN is generated by partially automatic neural processes (Loui et al., 2005).
Semantics
Semantics refers to the study of meaning. In the study of semantics in language, influential results have come from studies in which participants read sentences whose expected words are occasionally replaced by semantically unexpected words. Consider this example:
I take my coffee with cream and socks.
Since the word “socks” is unexpected in meaning, it is incongruous with the semantic expectation of most English speakers. By comparing brain responses to this semantically incongruous sentence against brain responses to an expected sentence, e.g. “I take my coffee with cream and sugar,” we can observe how the brain responds to semantics. From electrophysiology studies, the unexpected word “socks” is known to elicit a negative waveform around 400 milliseconds after its onset. This is known as the N400 effect. Koelsch and colleagues later found that when such unexpected words were preceded by short clips of music, the N400 was reduced. This suggests that musical information can affect, or prime, semantic processing of language.
Prosody
Prosody refers to the melodic and rhythmic aspects of speech: the stress patterns, the pitch changes, and the rhythm within speech. These are aspects of speech that are not usually written down. The prosody of speech can be appreciated when listening to low-pass filtered speech, or to sine wave speech [http://www.lifesci.sussex.ac.uk/home/Chris_Darwin/SWS/]. These speech samples, while not always comprehensible, bring out the prosodic elements (pitch and pitch changes, rhythm and timing) of speech.
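As a rough illustration, the following is a minimal sketch (in Python, using NumPy and SciPy) of how a speech recording can be low-pass filtered to bring out its prosody. The file name speech.wav and the 300 Hz cutoff are illustrative assumptions rather than values specified in the text.

```python
# Minimal sketch: low-pass filtering a speech recording so that the pitch
# contour and rhythm remain audible while most segmental detail is removed.
# Assumes a mono 16-bit WAV file called "speech.wav" (hypothetical file name);
# the 300 Hz cutoff is an illustrative choice.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

rate, speech = wavfile.read("speech.wav")        # sample rate (Hz) and samples
speech = speech.astype(np.float64)

# 4th-order Butterworth low-pass filter, applied forwards and backwards
# (sosfiltfilt) so that the timing of the speech is not shifted.
sos = butter(4, 300.0, btype="low", fs=rate, output="sos")
filtered = sosfiltfilt(sos, speech)

# Rescale to 16-bit range and write out the prosody-only version.
filtered = np.int16(filtered / np.max(np.abs(filtered)) * 32767)
wavfile.write("speech_lowpass.wav", rate, filtered)
```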
Another aspect of speech in which prosody comes across clearly is accents: stress patterns and pitch patterns differ markedly between different accents.
The rhythmic content of speech can be captured using the normalized Pairwise Variability Index (nPVI), which quantifies the durational contrast between successive syllables:
\begin{equation} \mathrm{nPVI} = \frac{100}{m-1} \sum_{k=1}^{m-1} \left| \frac{d_k - d_{k+1}}{\tfrac{1}{2}(d_k + d_{k+1})} \right| \label{eq:nPVI} \end{equation}
where $d_k$ is the duration of the $k$-th syllable and $m$ is the number of syllables in the utterance.
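Assuming syllable (or vowel) durations have already been measured, a minimal Python sketch of this formula might look as follows; the example duration sequences are purely illustrative.

```python
# Minimal sketch of the nPVI formula above; durations are assumed to be
# pre-measured syllable (or vowel) durations, and the example values are
# illustrative only.
def npvi(durations):
    """Normalized Pairwise Variability Index of a sequence of durations."""
    m = len(durations)
    if m < 2:
        raise ValueError("nPVI needs at least two durations")
    total = sum(abs(d1 - d2) / ((d1 + d2) / 2)
                for d1, d2 in zip(durations, durations[1:]))
    return 100 * total / (m - 1)

# Perfectly regular timing yields an nPVI of 0; strongly alternating
# long and short syllables yield a much higher value.
print(npvi([0.20, 0.20, 0.20, 0.20]))   # 0.0
print(npvi([0.30, 0.10, 0.30, 0.10]))   # 100.0
```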
By manipulating the duration of speech sounds, the pattern of durations across different syllables, and the direction of pitch changes, a speaker effectively conveys nationality or regionality, i.e. where she comes from.
In music, composers and performers may also employ certain compositional devices to convey a national or regional accent. Here is an example of very “French-sounding” music: Debussy, La Mer.
Here is an example of very “English-sounding” music: Elgar, Violin Concerto.
One example of a quantified difference between languages that can also be applied to music is the nPVI. The nPVI of English speech is higher than that of French speech, owing to the different syllabic structures of the two languages. Patel and Daniele (2003) extended this result to music, showing that the nPVI of music by English composers is higher than that of music by French composers.
Lolli et al. (2015) showed a correlation between pitch perception and the perception of emotional content in speech: people with better pitch discrimination skills, i.e. those better at discriminating small differences in pitch, were also better at identifying the emotional content of speech, i.e. telling “happy-sounding” from “sad-sounding” speech. This relationship was strongest when the speech was low-pass filtered. The finding has several implications: 1) prosody carries part of the emotional content of speech, 2) this emotional content is conveyed by low-frequency information within speech, and 3) people with musical training or musical ability may be more sensitive to the emotional content of speech.
Quiz
Describe the N400 effect, and what it reveals about the relationship between music and language.
What does the nPVI measure, and how has it been used to draw comparisons between music and speech?