Temporal window of integration in auditory-visual speech perception

https://doi.org/10.1016/j.neuropsychologia.2006.01.001

Abstract

Forty-three normal-hearing participants were tested in two experiments, which focused on temporal coincidence in auditory-visual (AV) speech perception. In these experiments, audio recordings of /pa/ and /ba/ were dubbed onto video recordings of /ka/ or /ga/, respectively (ApVk, AbVg), to produce the illusory "fusion" percepts /ta/ or /da/ [McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748]. In Experiment 1, an identification task using McGurk pairs with asynchronies ranging from −467 ms (auditory lead) to +467 ms (auditory lag) was conducted. Fusion responses were prevalent over temporal asynchronies from −30 ms to +170 ms and were more robust for audio lags. In Experiment 2, simultaneity judgments for incongruent and congruent audiovisual tokens (AdVd, AtVt) were collected. McGurk pairs were more readily judged as asynchronous than congruent pairs. The temporal windows over which simultaneity and fusion responses were maximal were quite similar, suggesting the existence of an asymmetric bimodal temporal integration window approximately 200 ms in duration.
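To make the reported window concrete, the following is a minimal sketch (Python, not from the paper) that classifies an audiovisual stimulus-onset asynchrony (SOA) against the −30 ms to +170 ms range over which fusion responses were prevalent. The sign convention (negative = audio leads the video) follows the abstract; the constant and function names are illustrative assumptions, not the authors' analysis code.

    # Asymmetric AV temporal integration window from the abstract:
    # fusion was prevalent from -30 ms (audio lead) to +170 ms (audio lag).
    # Edge values come from the abstract; all names are illustrative.
    AUDIO_LEAD_LIMIT_MS = -30.0
    AUDIO_LAG_LIMIT_MS = 170.0

    def inside_integration_window(soa_ms: float) -> bool:
        """True if the SOA (audio onset minus video onset, in ms)
        falls inside the ~200 ms asymmetric window."""
        return AUDIO_LEAD_LIMIT_MS <= soa_ms <= AUDIO_LAG_LIMIT_MS

    for soa in (-467, -30, 0, 170, 200, 467):
        print(f"SOA {soa:+4d} ms -> inside window: {inside_integration_window(soa)}")

Note the asymmetry: the window tolerates nearly six times as much audio lag as audio lead, consistent with the visual articulation typically preceding the acoustic signal in natural speech.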

Section snippets

Participants

Participants (native speakers of American English) were recruited from the University of Maryland undergraduate population and provided informed consent. Two groups of participants took part in this study. The first group included 21 participants (11 female, mean age 21 years) who were run in the voiced AbVg condition (AbVg: audio /b/ and video /g/). The second group consisted of 22 participants (8 female, mean age 22.5 years) who were run in the voiceless ApVk condition (ApVk: audio /p/ and video /k/).

Voiced McGurk pair AbVg

Fig. 1 shows the distribution (in percent) of each of the three possible response categories (/ba/, /ga/, and /da/ or /ða/) as a function of SOA (N = 18). Auditory-visual "ga" responses (visually driven responses) were seldom given, whereas /ba/ (auditorily driven) responses and /da/ or /ða/ fusion responses formed the majority of responses. The overall trend shows that as the asynchrony between the AV utterances increases, /ba/ judgments increase, whereas /da/ or /ða/ judgments (fusion responses) decrease.
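As a hypothetical illustration of the tabulation behind Fig. 1 (the paper reports only the resulting percentages; the trial format, response counts, and SOA values below are invented for the example), trial-level responses can be reduced to a percent distribution per SOA:

    # Reduce (SOA, response) trials to percent response distributions per
    # SOA, in the style plotted in Fig. 1. The trials below are fabricated
    # placeholders; only the three response categories match the paper.
    from collections import Counter, defaultdict

    trials = [  # (SOA in ms, response category)
        (-467, "ba"), (-467, "ba"), (0, "da"), (0, "da"),
        (0, "ba"), (167, "da"), (467, "ba"), (467, "ga"),
    ]

    counts: dict[int, Counter] = defaultdict(Counter)
    for soa_ms, response in trials:
        counts[soa_ms][response] += 1

    for soa_ms in sorted(counts):
        total = sum(counts[soa_ms].values())
        percents = {r: round(100 * n / total, 1) for r, n in counts[soa_ms].items()}
        print(f"SOA {soa_ms:+5d} ms: {percents}")

Plotting such percentages against SOA yields the response curves summarized above: /ba/ rising and fusion falling as asynchrony grows.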

Discussion

Two experiments were conducted to examine the effects of audiovisual temporal asynchrony on syllable identification and simultaneity judgment. The major finding was that AV speech inputs are extremely tolerant of bimodal asynchrony: bimodal information separated in time by as much as 200 ms is usually perceived as simultaneous. Specifically, both the identification experiment and the subjective simultaneity judgment experiment revealed temporal windows of maximal AV integration of about 200 ms.

Acknowledgments

This work was supported by grants NIH DC 0463801 and NIH DC 05660 to DP. A preliminary report of this work was presented at the 31st Annual Meeting of the Society for Neuroscience, San Diego, November 15, 2001, and the 9th Annual Meeting of the Cognitive Neuroscience Society, San Francisco, April 14, 2002. The opinions or assertions contained herein are the private views of the authors [KG] and are not to be construed as official or as reflecting the views of the Department of the Army or the Department of Defense.
