
Developmental shifts in detection and attention for auditory, visual, and audiovisual speech

Research output: Contribution to journal › Article

Original language: English
Pages (from-to): 3095-3112
Number of pages: 18
Journal: Journal of Speech, Language, and Hearing Research
Issue number: 12
Early online date: 10 Dec 2018
Date Accepted/In press: 16 Jul 2018
Date E-pub ahead of print: 10 Dec 2018
Date Published (current): Dec 2018


Purpose: Successful speech processing depends on our ability to detect and integrate multisensory cues, yet there is minimal research on multisensory speech detection and integration by children. To address this need, we studied the development of speech detection for auditory (A), visual (V), and audiovisual (AV) input. Method: Participants were 115 typically developing children clustered into age groups between 4 and 14 years. Speech detection (quantified by response times [RTs]) was determined for 1 stimulus, /buh/, presented in A, V, and AV modes (articulating vs. static facial conditions). Performance was analyzed not only in terms of traditional mean RTs but also in terms of the faster versus slower RTs (defined by the 1st vs. 3rd quartiles of the RT distributions). These time regions were conceptualized, respectively, as reflecting optimal detection with efficient focused attention versus less optimal detection with inefficient focused attention due to attentional lapses. Results: Mean RTs indicated better detection (a) of multisensory AV speech than A speech only in 4- to 5-year-olds and (b) of A and AV inputs than V input in all age groups. The faster RTs revealed that AV input did not improve detection in any group. The slower RTs indicated that (a) the processing of silent V input was significantly faster for the articulating than the static face and (b) AV speech or facial input significantly minimized attentional lapses in all groups except 6- to 7-year-olds (a peaked U-shaped curve). Apparently, the AV benefit observed for mean performance in 4- to 5-year-olds arose from effects of attention. Conclusions: The faster RTs indicated that AV input did not enhance detection in any group, but the slower RTs indicated that AV speech and dynamic V speech (mouthing) significantly minimized attentional lapses and thus did influence performance. Overall, A and AV inputs were detected consistently faster than V input; this result endorsed stimulus-bound auditory processing by these children.

    Structured keywords

  • Language




  • Full-text PDF (accepted author manuscript)

    Rights statement: This is the author accepted manuscript (AAM). The final published version (version of record) is available online via ASHA. Please refer to any applicable terms of use of the publisher.

    Accepted author manuscript, 1 MB, PDF document

