Lip-reading aids word recognition most in moderate noise: A Bayesian explanation using high-dimensional feature space

Wei Ji Ma, Xiang Zhou, Lars A. Ross, John J. Foxe, Lucas C. Parra

Research output: Contribution to journal › Article

95 Citations (Scopus)

Abstract

Watching a speaker's facial movements can dramatically enhance our ability to comprehend words, especially in noisy environments. From a general doctrine of combining information from different sensory modalities (the principle of inverse effectiveness), one would expect that the visual signals would be most effective at the highest levels of auditory noise. In contrast, we find, in accord with a recent paper, that visual information improves performance more at intermediate levels of auditory noise than at the highest levels, and we show that a novel visual stimulus containing only temporal information does the same. We present a Bayesian model of optimal cue integration that can explain these conflicts. In this model, words are regarded as points in a multidimensional space and word recognition is a probabilistic inference process. When the dimensionality of the feature space is low, the Bayesian model predicts inverse effectiveness; when the dimensionality is high, the enhancement is maximal at intermediate auditory noise levels. When the auditory and visual stimuli differ slightly in high noise, the model makes a counterintuitive prediction: as sound quality increases, the proportion of reported words corresponding to the visual stimulus should first increase and then decrease. We confirm this prediction in a behavioral experiment. We conclude that auditory-visual speech perception obeys the same notion of optimality previously observed only for simple multisensory stimuli.
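The model sketched in the abstract — words as points in a multidimensional feature space, with recognition as Bayes-optimal inference from noisy auditory and visual cues — can be illustrated with a small simulation. The sketch below is not the authors' model or code; it assumes Gaussian-distributed word features, isotropic Gaussian noise on both cues, a uniform prior over words, and arbitrary illustrative choices of lexicon size, dimensionality, and noise levels.

# Minimal sketch (illustrative assumptions, not the paper's implementation) of
# Bayesian word recognition in a D-dimensional feature space: each candidate
# word is a point, each cue is the true word's features plus Gaussian noise,
# and the listener reports the word with the highest posterior probability.
import numpy as np

rng = np.random.default_rng(0)

def simulate(words, sigma_a, sigma_v=None, n_trials=2000):
    """Fraction of trials on which the maximum-posterior word is the true word.

    With a uniform prior and equal Gaussian noise on every feature, the
    maximum-posterior word is the one nearest (in Euclidean distance) to the
    reliability-weighted combination of the auditory and visual cues.
    """
    n_words, D = words.shape
    correct = 0
    for _ in range(n_trials):
        true_idx = int(rng.integers(n_words))
        x_a = words[true_idx] + sigma_a * rng.standard_normal(D)   # auditory cue
        if sigma_v is None:                                        # auditory-only trial
            estimate = x_a
        else:                                                      # audiovisual trial
            x_v = words[true_idx] + sigma_v * rng.standard_normal(D)
            w_a, w_v = 1.0 / sigma_a**2, 1.0 / sigma_v**2          # cue reliabilities
            estimate = (w_a * x_a + w_v * x_v) / (w_a + w_v)       # optimal cue fusion
        decoded = int(np.argmin(np.sum((words - estimate) ** 2, axis=1)))
        correct += decoded == true_idx
    return correct / n_trials

# Two illustrative configurations: (label, dimensionality, lexicon size, visual noise).
configs = [
    ("low-dimensional", 1, 8, 0.5),
    ("high-dimensional", 20, 1000, 2.0),
]
for label, D, n_words, sigma_v in configs:
    words = rng.standard_normal((n_words, D))                     # candidate lexicon
    gains = []
    for factor in (0.25, 0.5, 1.0, 2.0, 4.0, 8.0):                # auditory noise sweep
        sigma_a = factor * sigma_v
        a_only = simulate(words, sigma_a)
        audiovisual = simulate(words, sigma_a, sigma_v=sigma_v)
        gains.append(round(audiovisual - a_only, 3))               # AV enhancement
    print(f"{label} (D={D}, {n_words} words): AV gain vs. increasing noise {gains}")

Under these toy assumptions, the low-dimensional configuration is expected to show the audiovisual gain growing with auditory noise (inverse effectiveness), whereas the high-dimensional configuration with a large lexicon is expected to show the gain rising and then falling as both conditions approach the chance floor, mirroring the pattern described in the abstract.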

Original language: English (US)
Article number: e4638
Journal: PLoS One
Volume: 4
Issue number: 3
DOIs: 10.1371/journal.pone.0004638
ISSN: 1932-6203
State: Published - Mar 4, 2009
Externally published: Yes

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Medicine (all)

Cite this

Lip-reading aids word recognition most in moderate noise: A Bayesian explanation using high-dimensional feature space. / Ma, Wei Ji; Zhou, Xiang; Ross, Lars A.; Foxe, John J.; Parra, Lucas C.

In: PLoS One, Vol. 4, No. 3, e4638, 04.03.2009.
