The sight of a speaker’s facial movements during the perception of a spoken message can benefit speech processing through online predictive mechanisms. Recent evidence suggests that these predictive mechanisms can operate across sensory modalities, that is, vision and audition. However, to date, behavioral and electrophysiological demonstrations of cross-modal prediction in speech have considered only the speaker’s native language. Here, we address a question of current debate, namely whether the level of representation involved in cross-modal prediction is phonological or pre-phonological. We do this by testing participants in an unfamiliar language. If cross-modal prediction is predominantly based on phonological representations tuned to the phonemic categories of the native language of the listener, then it should be more effective in the listener’s native language than in an unfamiliar one. We tested Spanish and English native speakers in an audiovisual matching paradigm that allowed us to evaluate visual-to-auditory prediction, using sentences in the participant’s native language and in an unfamiliar language. The benefits of cross-modal prediction were only seen in the native language, regardless of the particular language or participant’s linguistic background. This pattern of results implies that cross-modal visual-to-auditory prediction during speech processing makes strong use of phonological representations, rather than low-level spatiotemporal correlations across facial movements and sounds.