Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments

Lars A. Ross; Dave Saint-Amour; Victoria M. Leavitt; Daniel C. Javitt; John J. Foxe

doi:10.1093/cercor/bhl024

Research output: Contribution to journal › Article › peer-review

486 Scopus citations

Abstract

Viewing a speaker's articulatory movements substantially improves a listener's ability to understand spoken words, especially under noisy environmental conditions. It has been claimed that this gain is most pronounced when auditory input is weakest, an effect that has been related to a well-known principle of multisensory integration: "inverse effectiveness." In keeping with the predictions of this principle, the present study showed substantial gain in multisensory speech enhancement at even the lowest signal-to-noise ratios (SNRs) used (-24 dB), but it was also evident that there was a "special zone" at a more intermediate SNR of -12 dB where multisensory integration was additionally enhanced beyond the predictions of this principle. As such, we show that inverse effectiveness does not strictly apply to the multisensory enhancements seen during audiovisual speech perception. Rather, the gain from viewing visual articulations is maximal at intermediate SNRs, well above the lowest auditory SNR where the recognition of whole words is significantly different from zero. We contend that the multisensory speech system is maximally tuned for SNRs between extremes, where the system relies on either the visual (speech-reading) or the auditory modality alone, forming a window of maximal integration at intermediate SNR levels. At these intermediate levels, the extent of multisensory enhancement of speech recognition is considerable, amounting to more than a 3-fold performance improvement relative to an auditory-alone condition.
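The abstract reports SNRs of -24 dB and -12 dB without spelling out what those levels mean physically. A minimal Python sketch, assuming the standard power-based definition SNR_dB = 10 * log10(P_signal / P_noise) (the abstract does not state which definition the study used), converts them into noise-to-speech power ratios:

```python
import math


def snr_db(p_signal: float, p_noise: float) -> float:
    """SNR in decibels from signal and noise power.

    Assumes the power definition SNR_dB = 10 * log10(P_signal / P_noise).
    """
    return 10 * math.log10(p_signal / p_noise)


def noise_to_signal_ratio(snr_in_db: float) -> float:
    """Noise power divided by signal power at a given SNR in dB.

    Inverts the power definition above: P_noise / P_signal = 10 ** (-SNR_dB / 10).
    """
    return 10 ** (-snr_in_db / 10)


# The two SNR levels highlighted in the abstract.
for level in (-24, -12):
    ratio = noise_to_signal_ratio(level)
    print(f"{level} dB: noise power is {ratio:.1f}x the speech power")
```

Under that assumption, the -12 dB "special zone" corresponds to noise roughly 16 times more powerful than the speech, and -24 dB to roughly 250 times. The helper names `snr_db` and `noise_to_signal_ratio` are illustrative and do not come from the paper.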

Original language	English (US)
Pages (from-to)	1147-1153
Number of pages	7
Journal	Cerebral Cortex
Volume	17
Issue number	5
DOIs	https://doi.org/10.1093/cercor/bhl024
State	Published - May 2007
Externally published	Yes

Keywords

Audiovisual
Crossmodal
Inverse effectiveness
Lip-reading
Multisensory
Speech perception
Speech-reading

ASJC Scopus subject areas

Cognitive Neuroscience
Cellular and Molecular Neuroscience

Cite this

@article{d4e993e1721042f6b08b10967073b658,

title = "Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments",

abstract = "Viewing a speaker's articulatory movements substantially improves a listener's ability to understand spoken words, especially under noisy environmental conditions. It has been claimed that this gain is most pronounced when auditory input is weakest, an effect that has been related to a well-known principle of multisensory integration - {"}inverse effectiveness.{"} In keeping with the predictions of this principle, the present study showed substantial gain in multisensory speech enhancement at even the lowest signal-to-noise ratios (SNRs) used (-24 dB), but it was also evident that there was a {"}special zone{"} at a more intermediate SNR of -12 dB where multisensory integration was additionally enhanced beyond the predictions of this principle. As such, we show that inverse effectiveness does not strictly apply to the multisensory enhancements seen during audiovisual speech perception. Rather, the gain from viewing visual articulations is maximal at intermediate SNRs, well above the lowest auditory SNR where the recognition of whole words is significantly different from zero. We contend that the multisensory speech system is maximally tuned for SNRs between extremes, where the system relies on either the visual (speech-reading) or the auditory modality alone, forming a window of maximal integration at intermediate SNR levels. At these intermediate levels, the extent of multisensory enhancement of speech recognition is considerable, amounting to more than a 3-fold performance improvement relative to an auditory-alone condition.",

keywords = "Audiovisual, Crossmodal, Inverse effectiveness, Lip-reading, Multisensory, Speech perception, Speech-reading",

author = "Ross, {Lars A.} and Saint-Amour, Dave and Leavitt, {Victoria M.} and Javitt, {Daniel C.} and Foxe, {John J.}",

note = "Funding Information: Support for this work was provided by grants to JJF from the National Institute of Mental Health (MH65350) and the National Institute on Aging (AG22696). The authors would like to express their sincere thanks to Dr Sophie Molholm for her ever-valuable comments on earlier versions. We would also like to thank our good friend Dr Alex Meredith for his challenging comments and 2 anonymous reviewers for their helpful suggestions. Conflict of Interest: None declared.",

year = "2007",

month = may,

doi = "10.1093/cercor/bhl024",

language = "English (US)",

volume = "17",

pages = "1147--1153",

journal = "Cerebral Cortex",

issn = "1047-3211",

publisher = "Oxford University Press",

number = "5",

}

TY  - JOUR
T1  - Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments
AU  - Ross, Lars A.
AU  - Saint-Amour, Dave
AU  - Leavitt, Victoria M.
AU  - Javitt, Daniel C.
AU  - Foxe, John J.
N1  - Funding Information: Support for this work was provided by grants to JJF from the National Institute of Mental Health (MH65350) and the National Institute on Aging (AG22696). The authors would like to express their sincere thanks to Dr Sophie Molholm for her ever-valuable comments on earlier versions. We would also like to thank our good friend Dr Alex Meredith for his challenging comments and 2 anonymous reviewers for their helpful suggestions. Conflict of Interest: None declared.
PY  - 2007/5
Y1  - 2007/5
N2  - Viewing a speaker's articulatory movements substantially improves a listener's ability to understand spoken words, especially under noisy environmental conditions. It has been claimed that this gain is most pronounced when auditory input is weakest, an effect that has been related to a well-known principle of multisensory integration - "inverse effectiveness." In keeping with the predictions of this principle, the present study showed substantial gain in multisensory speech enhancement at even the lowest signal-to-noise ratios (SNRs) used (-24 dB), but it was also evident that there was a "special zone" at a more intermediate SNR of -12 dB where multisensory integration was additionally enhanced beyond the predictions of this principle. As such, we show that inverse effectiveness does not strictly apply to the multisensory enhancements seen during audiovisual speech perception. Rather, the gain from viewing visual articulations is maximal at intermediate SNRs, well above the lowest auditory SNR where the recognition of whole words is significantly different from zero. We contend that the multisensory speech system is maximally tuned for SNRs between extremes, where the system relies on either the visual (speech-reading) or the auditory modality alone, forming a window of maximal integration at intermediate SNR levels. At these intermediate levels, the extent of multisensory enhancement of speech recognition is considerable, amounting to more than a 3-fold performance improvement relative to an auditory-alone condition.
AB  - Viewing a speaker's articulatory movements substantially improves a listener's ability to understand spoken words, especially under noisy environmental conditions. It has been claimed that this gain is most pronounced when auditory input is weakest, an effect that has been related to a well-known principle of multisensory integration - "inverse effectiveness." In keeping with the predictions of this principle, the present study showed substantial gain in multisensory speech enhancement at even the lowest signal-to-noise ratios (SNRs) used (-24 dB), but it was also evident that there was a "special zone" at a more intermediate SNR of -12 dB where multisensory integration was additionally enhanced beyond the predictions of this principle. As such, we show that inverse effectiveness does not strictly apply to the multisensory enhancements seen during audiovisual speech perception. Rather, the gain from viewing visual articulations is maximal at intermediate SNRs, well above the lowest auditory SNR where the recognition of whole words is significantly different from zero. We contend that the multisensory speech system is maximally tuned for SNRs between extremes, where the system relies on either the visual (speech-reading) or the auditory modality alone, forming a window of maximal integration at intermediate SNR levels. At these intermediate levels, the extent of multisensory enhancement of speech recognition is considerable, amounting to more than a 3-fold performance improvement relative to an auditory-alone condition.
KW  - Audiovisual
KW  - Crossmodal
KW  - Inverse effectiveness
KW  - Lip-reading
KW  - Multisensory
KW  - Speech perception
KW  - Speech-reading
UR  - http://www.scopus.com/inward/record.url?scp=34247172408&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34247172408&partnerID=8YFLogxK
U2  - 10.1093/cercor/bhl024
DO  - 10.1093/cercor/bhl024
M3  - Article
C2  - 16785256
AN  - SCOPUS:34247172408
SN  - 1047-3211
VL  - 17
SP  - 1147
EP  - 1153
JO  - Cerebral Cortex
JF  - Cerebral Cortex
IS  - 5
ER  - 