Can visual capture of sound separate auditory targets from noise?
Speech recognition in noise improves when two competing sound sources are spatially separated. This phenomenon is termed Spatial Release from Masking (SRM), and it can arise from a combination of masking and attentional selection. As the physical distance between the two sound sources increases, the head-shadow effect progressively changes the signal-to-noise ratio at each ear, affecting energetic masking of the target sound. In addition, as distance in external space increases, the two concurrent auditory events can be segregated more easily for attentional selection. This cognitive mechanism prioritises information coming from the relevant source, while inhibiting the competing signal. Studies across sensory systems (e.g., vision or touch) have repeatedly shown that separating concurrent streams of information in space helps selective attention mechanisms. Here, we investigated to what extent illusory changes in the perceived position of the target sound in external space could influence the SRM phenomenon. Specifically, we used a multisensory paradigm to illusorily increase or decrease the perceived separation between speech and noise, exploiting a visual capture of sound phenomenon known as the ‘ventriloquist effect’. In each experimental trial, normal-hearing participants (N=20) performed two tasks: hearing-in-noise and sound localisation. The hearing-in-noise task entailed repeating aloud a sequence of 5 spoken digits, delivered from unseen speakers in frontal space, while ignoring concurrent noise delivered from a fixed visible speaker to the left of the apparatus. The sound localisation task entailed pointing to the perceived position of the sound, after digit identification. Crucially, all target sounds were delivered together with a visual stimulus that changed in brightness as a function of the target sound’s envelope.
With respect to the auditory target, the visual stimulus either originated from the same location (audio-visual congruent, AVcon) or was located 15 degrees to the left or right (audio-visual incongruent, AVinc). Results showed that AVinc conditions induced visual capture of sounds: target sounds were perceived as closer to the noise when the visual stimulus was presented leftwards, and farther away from the noise when the visual stimulus was presented rightwards. We also obtained SRM for auditory targets delivered to the right of the participant’s body midline (i.e., in the opposite hemispace with respect to the noise), compared to auditory targets delivered to the left. However, SRM was unaffected by AV conditions, revealing no measurable effect of the multisensory illusion on SRM. This indicates a greater role for energetic masking than for attentional selection in contributing to SRM in our experimental paradigm.