The influence of a physiologically inspired complex compression scheme on speech intelligibility in noise
Objective: The primary goal of this study was to assess whether speech intelligibility in speech-shaped background noise can be improved by processing the clean speech signal with a complex compression scheme that mimics the early stages of the healthy human auditory system ("Mimi-processing"). The scheme consists of an instantaneous feed-forward component and a delayed feedback component. Speech intelligibility was compared between processed and unprocessed sentences. A global equal-RMS constraint was imposed so that any benefit could not be attributed to an overall increase in level.
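The equal-RMS constraint amounts to applying one global gain to the processed signal so that its overall RMS matches that of the unprocessed signal. The sketch below assumes both signals are plain sequences of samples at the same sampling rate and that the constraint is applied per signal pair; the abstract does not state the exact scope, and the function names `rms` and `match_rms` are hypothetical.

```python
import math

def rms(signal):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def match_rms(processed, reference):
    """Scale `processed` so its overall RMS equals that of `reference`.

    A single gain is applied to every sample, so level differences
    introduced by the processing are removed while the relative
    spectral and temporal shape of the processed signal is preserved.
    """
    level = rms(processed)
    if level == 0.0:
        raise ValueError("processed signal is silent; cannot match RMS")
    gain = rms(reference) / level
    return [gain * x for x in processed]
```

Because the gain is global rather than time-varying, any remaining intelligibility difference between conditions reflects how the processing redistributes energy, not how much energy there is in total.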
Methods: Speech intelligibility was assessed in 35 native German speakers (aged 24-68 years) with an adaptive speech reception threshold (SRT) test, which estimates the signal-to-noise ratio (SNR) required for 50% correct word identification. Participants had a mean four-frequency pure-tone average (PTA4) of 9.9 dB HL (SD = 7.1 dB HL). SRTs were measured with the German Oldenburg Matrix sentence test (OLSA). Sounds were presented monaurally via Etymotic ER-1 insert earphones. For a sub-cohort of 12 participants, a third condition was assessed in which speech was processed with an ‘equivalent-equaliser’ (equivalent-EQ). For each participant and condition, a psychometric function was fitted to the data with the psignifit toolbox, which estimates psychometric-function parameters by maximum likelihood, and the SRT was obtained from the fitted function. Results are reported as the difference-SRT [dB] between the unprocessed condition and each test condition, averaged across participants.
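To illustrate how an SRT is read off a maximum-likelihood psychometric fit, the sketch below fits a logistic function to word-level counts by exhaustive grid search over the binomial likelihood. It does not use or reproduce the psignifit API: guess and lapse rates are fixed at zero, the slope is parameterised as the slope at the 50% point (in 1/dB), and all function names and grid ranges are hypothetical. The sign convention of `difference_srt` (unprocessed minus test, so a positive value is a benefit) is an assumption consistent with the positive improvement reported in the Results.

```python
import math

def logistic(snr, srt, slope):
    """Proportion of words correct at `snr` (dB).

    `srt` is the 50% point; `slope` is the slope at that point in 1/dB.
    Guess and lapse rates are fixed at zero in this sketch.
    """
    return 1.0 / (1.0 + math.exp(-4.0 * slope * (snr - srt)))

def neg_log_likelihood(data, srt, slope):
    """Binomial negative log-likelihood of (snr, n_correct, n_total) triples."""
    nll = 0.0
    for snr, k, n in data:
        # Clamp to avoid log(0) at the extremes of the function.
        p = min(max(logistic(snr, srt, slope), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll

def fit_srt(data, srt_grid=None, slope_grid=None):
    """Maximum-likelihood (srt, slope) by exhaustive grid search."""
    if srt_grid is None:
        srt_grid = [-20.0 + 0.05 * i for i in range(601)]  # -20 to +10 dB
    if slope_grid is None:
        slope_grid = [0.01 * j for j in range(1, 51)]  # 0.01 to 0.50 per dB
    best = min((neg_log_likelihood(data, s, b), s, b)
               for s in srt_grid for b in slope_grid)
    return best[1], best[2]

def difference_srt(srt_unprocessed, srt_test):
    """Positive values mean the test condition needs a lower SNR (a benefit)."""
    return srt_unprocessed - srt_test
```

A grid search is used only to keep the example dependency-free; a gradient-based optimiser would normally be preferred for speed and precision.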
Results: Full-cohort results: Compared with unprocessed speech, Mimi-processing significantly improved SRTs, with a mean improvement of 2.77 dB [two-sided one-sample t-test on the paired differences: t(34) = 19.78, p < 0.0001]. Sub-cohort results: In the sub-cohort (n = 12), SRTs differed significantly across the unprocessed, Mimi-processed and equivalent-EQ conditions [ANOVA: F(2,22) = 219.73, p < 0.0001]. Post-hoc t-tests with Bonferroni correction for multiple comparisons showed that SRTs in the equivalent-EQ condition were significantly worse than unprocessed SRTs (SRT = -1.31 dB), whereas SRTs in the Mimi-processed condition were significantly better than unprocessed SRTs.
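A one-sample t-test on paired observations is computed from the per-participant differences: t equals the mean difference divided by its standard error, with n - 1 degrees of freedom. The sketch below uses only the Python standard library and returns the t statistic rather than a p-value, since the t-distribution CDF is not in the standard library. The function names are hypothetical, and the example of three comparisons for the Bonferroni correction is an assumption, as the abstract does not state how many post-hoc tests were run.

```python
import math
import statistics

def paired_t(unprocessed, processed):
    """One-sample t statistic on paired differences (unprocessed - processed).

    Returns (t, degrees_of_freedom, mean_difference).
    """
    diffs = [u - p for u, p in zip(unprocessed, processed)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    if sd == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean / (sd / math.sqrt(n))
    return t, n - 1, mean

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under Bonferroni correction."""
    return alpha / n_comparisons
```

With SciPy available, `scipy.stats.ttest_rel(unprocessed, processed)` gives the same t statistic together with a two-sided p-value. Inverting the formula, the reported t(34) = 19.78 and mean improvement of 2.77 dB imply a standard deviation of the differences of roughly 2.77 × √35 / 19.78 ≈ 0.83 dB, subject to rounding of the reported values. For three pairwise comparisons at α = 0.05, `bonferroni_alpha(0.05, 3)` gives a per-test level of about 0.0167.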
Conclusions: These results indicate that the Mimi-processing algorithm can improve the intelligibility of speech presented in noise. The benefit appears to arise from its compression scheme rather than solely from the frequency-dependent energy shifting represented by the equivalent-EQ condition. This work provides a promising foundation for further optimising the processing parameters to increase speech intelligibility in noise.