In the late 1970s the FBI hired Sue Thomas, along with eight other deaf individuals, to analyze fingerprint patterns. Deaf people, the agency reasoned, might have an easier time staying focused during the notoriously meticulous task. From the first day, however, Thomas found the job unbearably monotonous. She complained to her superiors so often that she was prepared to walk away unemployed when her boss summoned her to a meeting with other agents in his office.
But Thomas was not fired—she was, in a sense, promoted. The agents showed her a silent video of two criminal suspects conversing and asked her to decipher their conversation.
In their own interactions with Thomas, the agents had noticed how deftly she read their lips. As her co-workers anticipated, Thomas easily interpreted the suspects’ dialogue, which implicated them in an illegal gambling ring. So began Thomas’s career as the FBI’s first deaf lipreading expert.
A lifetime’s dependence on lipreading to communicate had honed Thomas’s skill, but we all rely on the same talent more than we know. In fact, our ability to understand speech is diminished if we cannot see the lips of the speaker, especially in a noisy environment or when the speaker has a thick accent that is foreign to us. Learning to perceive speech with our eyes, as well as our ears, is an important part of typical speech development; as a consequence, blind infants—who cannot see the mouths of speakers around them—often take longer than average to learn certain aspects of speech. We simply cannot help but integrate the words we see on another’s lips with the words we hear. In recent years research on multisensory speech perception has helped bring about a revolution in our understanding of how the brain organizes the information it receives from our many different senses.
Neuroscientists and psychologists have largely abandoned early ideas of the brain as a Swiss Army knife, in which many distinct regions are dedicated to different senses. Instead scientists now think that the brain has evolved to encourage as much cross talk as possible between the senses—that the brain’s sensory regions are physically intertwined.
Our senses are always eavesdropping on one another and sticking their noses in one another’s business. Although the visual cortex is primarily concerned with vision, for example, it is perfectly capable of interpreting other sensory information as well. Within 90 minutes of being blindfolded, a seeing person becomes extra sensitive to touch via the visual cortex; likewise, brain scans have shown that blind people’s visual cortices rewire themselves for hearing. When we snack on potato chips, the crispness of our crunching partially determines how good we think the chips taste—and researchers can bias the results of taste tests by tweaking what people hear. Where we look when we stand still, and what we see, shapes our whole body posture. Put simply, research in the past 15 years demonstrates that no sense works alone. The multisensory revolution is also suggesting new ways to improve devices for the blind and deaf, such as cochlear implants.
Silent Syllables
One of the earliest and most robust examples of multisensory perception is known as the McGurk effect, first reported by Harry McGurk and John MacDonald in 1976. If you watch a video clip of someone silently and repeatedly mouthing the syllable “ga” while you listen to a recording of the same person speaking the syllable “ba,” you will hear them pronouncing “da.” The silent “ga” syllables change your perception of the audible “ba” syllables because the brain integrates what the body hears and sees. The McGurk effect works in all languages and continues to work even if you have been studying it for 25 years—I can vouch for that myself.
The speech you hear is also influenced by the speech you feel. In 1991 Carol Fowler, then at Dartmouth College, and her colleagues asked naive volunteers to try something called the Tadoma technique, in which you interpret someone’s speech by placing your fingers on their lips, cheek and neck. Before cochlear implants, many deaf-blind individuals (including Helen Keller) relied on Tadoma. The syllables the volunteers felt changed how they interpreted syllables coming from nearby loudspeakers.
In 1997 Gemma Calvert, then at the University of Oxford, mapped the areas of the brain that are most active during lipreading. Volunteers with no formal lipreading experience silently lipread a face that slowly articulated the numbers one through nine. Calvert and her colleagues found that lipreading fired up the auditory cortex—the region of the brain that processes sounds—as well as related brain regions known to be active when someone hears speech. This was one of the first demonstrations of cross-sensory influences on an area of the brain thought to be dedicated to a single sense. More recent studies have contributed further evidence of sensory synthesis. For example, scientists now know that the auditory brain stem responds to aspects of seen speech, whereas before they thought it was involved only in more rudimentary processing of sounds. Neuroimaging studies have shown that during the McGurk effect—hearing “da” even though the recorded sound is “ba”—the brain behaves as though the syllable “da” were falling on that person’s ears.
These findings suggest that the brain may give equal weight to speech gleaned from the ears, the eyes and even the skin. This is not to say that these distinct modalities provide an equal amount of information: clearly, hearing captures more articulatory detail than sight or touch. Rather the brain makes a concerted effort to consider and combine all the different types of speech information it receives, regardless of modality.
Written All over Your Face
In other instances, distinct senses help one another process the same type of information. The specific manner in which a person speaks, for example, provides information about who they are, regardless of whether their speech is seen or heard. My colleagues and I film people speaking and manipulate the resulting videos to remove all recognizable facial features—transforming faces into patterns of glowing dots that dart and bob like fireflies where someone’s cheeks and lips would have appeared. When we play the videos, our volunteers can lipread these faceless clusters of dots and recognize their friends.
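For readers curious how such point-light faces are built, the sketch below shows the general idea in Python: draw only the tracked landmark dots, frame by frame, so the motion survives while the face does not. It assumes landmark coordinates have already been extracted by some face-tracking tool and stored in an array; the array name, its shape and the frame rate are illustrative assumptions rather than details of our actual stimuli.

```python
# Minimal sketch of a point-light "talking face": only tracked landmark dots are
# drawn, so identifiable facial features are gone but the motion of the lips and
# cheeks remains. Assumes `landmarks` has shape (n_frames, n_points, 2), in image
# coordinates, produced by some external face tracker (an illustrative assumption).
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

def animate_point_light_face(landmarks, fps=30):
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.set_facecolor("black")
    ax.set_xticks([]); ax.set_yticks([])
    dots = ax.scatter(landmarks[0, :, 0], landmarks[0, :, 1], s=20, c="white")
    ax.set_xlim(landmarks[..., 0].min(), landmarks[..., 0].max())
    ax.set_ylim(landmarks[..., 1].max(), landmarks[..., 1].min())  # image y-axis runs downward

    def update(frame):
        dots.set_offsets(landmarks[frame])  # move the dots; no face is ever drawn
        return (dots,)

    return FuncAnimation(fig, update, frames=len(landmarks), interval=1000 / fps)
```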
Simple sounds derived from speech can also clue us in to a person’s identity. Robert Remez of Columbia University and his colleagues reduce normal speech recordings to sine waves that sound something like the whistles and bloops emitted by R2-D2 in Star Wars. Despite missing the typical qualities that distinguish voices, such as pitch and timbre, these sine waves retain speaking-style information that allows listeners to recognize their friends. Most strikingly, volunteers can match these sine waves to glowing dot videos of the same person talking.
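The basic recipe behind sine-wave speech can also be sketched briefly. The version below assumes that formant frequency and amplitude tracks have already been estimated by an external analysis tool; the function name, array layout and sample rate are illustrative choices, not the parameters Remez and his colleagues used.

```python
# Minimal sketch of sine-wave speech synthesis: replace the speech signal with a
# few pure tones that follow the formant tracks. Assumes `formant_freqs` (Hz) and
# `formant_amps` (linear amplitude) are arrays of shape (n_formants, n_frames)
# estimated elsewhere; these inputs and the rates below are illustrative assumptions.
import numpy as np

def sine_wave_speech(formant_freqs, formant_amps, frame_rate=100, sample_rate=16000):
    n_formants, n_frames = formant_freqs.shape
    samples_per_frame = sample_rate // frame_rate
    n_samples = n_frames * samples_per_frame
    sample_index = np.arange(n_samples)
    frame_starts = np.arange(n_frames) * samples_per_frame

    out = np.zeros(n_samples)
    for k in range(n_formants):
        # Interpolate each frame-rate track up to the audio sample rate.
        freq = np.interp(sample_index, frame_starts, formant_freqs[k])
        amp = np.interp(sample_index, frame_starts, formant_amps[k])
        # Integrate instantaneous frequency to get a smoothly varying phase.
        phase = 2 * np.pi * np.cumsum(freq) / sample_rate
        out += amp * np.sin(phase)

    return out / max(np.max(np.abs(out)), 1e-9)  # normalize to the range [-1, 1]
```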
The fact that stripped-down versions of both heard and seen speech preserve similar information about speech style suggests that these distinct modes of perception are entangled in the brain. Neuroimaging research supports this connection: listening to the voice of someone familiar induces neural activity in the fusiform gyrus, an area of the human brain involved in recognizing faces.
These findings inspired an even more outlandish prediction. If these forms of perception are mingled, then learning to read someone’s lips should simultaneously improve one’s ability to hear that person’s spoken words. We asked volunteers with no lipreading experience to spend an hour practicing lipreading silent videos of a single speaker. Afterward, the volunteers listened to a set of spoken sentences played against a background of random noise. Unbeknownst to them, half the participants heard sentences spoken by the same person they had just lipread, whereas the other half heard sentences from a different speaker. The volunteers who lipread and listened to the same person were more successful at picking the sentences out of the noise.
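For concreteness, this is roughly how sentences can be embedded in background noise at a chosen signal-to-noise ratio; the white-noise choice and the normalization here are illustrative, not the exact parameters of our test.

```python
# Minimal sketch of a speech-in-noise mixture at a requested signal-to-noise
# ratio (SNR). The noise type and scaling are illustrative assumptions.
import numpy as np

def mix_at_snr(speech, snr_db, rng=None):
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(len(speech))
    # Scale the noise so that speech power / noise power equals the target SNR.
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    noise *= np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    mixture = speech + noise
    return mixture / max(np.max(np.abs(mixture)), 1e-9)  # keep samples in [-1, 1]
```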
Promiscuous Perception
Research on multisensory speech perception has helped inspire scientists to investigate all kinds of previously unstudied interactions between the senses. For example, most of us know that smell is a big component of taste, but some research shows that sights and sounds also change flavor. In a particularly striking example, scientists found that an orange-flavored drink will taste of cherry if it is tinted red, and vice versa. In 2005 Massimiliano Zampini of the University of Trento in Italy and his teammates showed that altering the timbre of a crunching sound played to volunteers as they ate potato chips partially determined how fresh and crisp the chips tasted. Looking at a continuously descending visual texture—such as a waterfall—convinces people that certain textured surfaces they feel with their hands are ascending. Other evidence shows that cross-sensory input unconsciously changes our behaviors. Tom Stoffregen of the University of Minnesota and his colleagues asked volunteers to stand straight and shift their gaze from a nearby target to a distant one. This simple shift in visual focus induced subtle but systematic changes in body posture.
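The chip-crunch manipulation can be approximated with a simple audio filter that amplifies the high-frequency portion of the recorded crunch before it is played back. The cutoff frequency and gain below are illustrative guesses, not the settings published by Zampini’s team.

```python
# Minimal sketch of a "crispness" manipulation: boost the high-frequency content
# of a chewing sound. The cutoff and gain are illustrative, not published values.
import numpy as np
from scipy import signal

def boost_high_frequencies(audio, sample_rate, cutoff_hz=2000.0, gain_db=12.0):
    # Isolate the high-frequency (crispy-sounding) part of the recording.
    b, a = signal.butter(4, cutoff_hz / (sample_rate / 2), btype="highpass")
    highs = signal.filtfilt(b, a, audio)
    # Add it back with extra gain, then normalize to avoid clipping.
    boosted = audio + (10 ** (gain_db / 20) - 1) * highs
    return boosted / max(np.max(np.abs(boosted)), 1e-9)
```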
Similar findings have become so prevalent that many researchers now think of the sensory regions of the brain as inherently multisensory. This revised model of the brain is also consistent with evidence of the brain’s incredible plasticity—it can switch up a region’s primary function when faced with even short-term or subtle sensory deprivation. For example, imaging research in the past four years has confirmed that blindfolding a person for as little as one and a half hours primes their visual cortex to respond to touch. In fact, the visual cortex’s involvement actually heightens sensitivity to touch. In a related example, nearsightedness often enhances people’s auditory and spatial skills even if they wear glasses (which leave a good part of the visual periphery blurry). In general, cross-sensory compensation is much more prevalent than we previously thought.
The multisensory revolution has already started to help people who have lost one of their primary senses. Research has shown, for example, that cochlear implants are less effective if someone’s brain has had too much time to repurpose the neglected auditory cortex for other forms of perception, such as vision and touch. It is generally recommended, therefore, that congenitally deaf children receive cochlear implants as soon as possible. Similar research has encouraged the practice of having deaf children who have received cochlear implants watch videos of people speaking so that they learn how to integrate the speech they see on someone’s lips with the speech they hear.
Engineers working on face- and speech-recognition devices have benefited from research on multisensory perception, too. Speech-recognition systems often perform poorly when faced with even moderate levels of background noise. Teaching such systems to analyze video footage of someone’s mouth substantially increases accuracy—a strategy that works even with the types of cameras commonly installed in cell phones and laptops.
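One common way engineers combine the two streams is so-called late fusion: the audio model and the lip-video model each score the candidate words, and their scores are merged with a weight that shrinks the audio contribution as background noise rises. The sketch below is a generic illustration of that idea under assumed inputs, not the method of any particular product.

```python
# Minimal sketch of audiovisual late fusion: per-word log probabilities from an
# audio recognizer and a lipreading recognizer are combined, with the audio
# stream weighted down at low signal-to-noise ratios. The weighting rule and
# input format are illustrative assumptions.
import numpy as np

def fuse_audio_visual(audio_logprobs, visual_logprobs, snr_db):
    # Trust the microphone in quiet conditions, the camera in noisy ones:
    # the weight is 1.0 at or above 20 dB SNR and 0.0 at or below -5 dB.
    audio_weight = np.clip((snr_db + 5) / 25, 0.0, 1.0)
    combined = audio_weight * audio_logprobs + (1 - audio_weight) * visual_logprobs
    return int(np.argmax(combined))  # index of the most likely word
```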
In some ways, the notion of multisensory perception seems to contradict our everyday experiences. Our instinct is to organize the senses into types because each sense seems to apprehend a very different aspect of our world. We use our eyes to see others and our ears to hear them; we feel the firmness of an apple with our hands but taste it with our tongue. Once sensory information reaches the brain, however, such strict classification crumbles. The brain does not channel visual information from the eyes into one neural container and auditory information from the ears into another, discrete, container as though it were sorting coins. Rather our brains derive meaning from the world in as many ways as possible by blending the diverse forms of sensory perception.