Near-end listening enhancement in cars
Near-end listening enhancement is a growing field of research that aims to increase the intelligibility of speech signals in noisy environments. Voice transformation techniques are usually applied under the constraint that the signal-to-noise ratio (SNR) remains unchanged. In this work we propose a slightly different approach: several near-end listening enhancement methods are studied in noisy in-car environments under the constraint of a fixed perceived loudness.
The first method is an adaptive equalizer that reallocates energy across frequency bands to maximize the Speech Intelligibility Index (SII). Perceptual tests show that the performance of the algorithm varies with the shape of the noise spectrum. We also highlight limitations of perceptual tests based on the Speech Reception Threshold (SRT), which does not reflect real-life situations.
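The idea of reallocating band energy to raise the SII can be illustrated with a minimal sketch. It is not the equalizer evaluated in this work, and it rests on several assumptions: the SII is reduced to a sum of band importances weighted by a band audibility that grows linearly from 0 at -15 dB SNR to 1 at +15 dB SNR (masking spread and level distortion are ignored); the fixed perceived loudness constraint is replaced by a fixed total band power, which is only a crude stand-in; and the function names, band levels, and importance weights are hypothetical. The heuristic gives each band a power proportional to its importance, capped at the level where the band saturates, and falls back to the original spectrum if this does not help.

```python
import numpy as np

def band_audibility(speech_db, noise_db):
    # Simplified audibility: 0 at -15 dB SNR, 1 at +15 dB SNR, linear between.
    return np.clip((speech_db - noise_db + 15.0) / 30.0, 0.0, 1.0)

def simplified_sii(speech_db, noise_db, importance):
    # Importance-weighted sum of band audibilities (assumed simplification).
    return float(np.sum(importance * band_audibility(speech_db, noise_db)))

def reallocate(speech_db, noise_db, importance, iters=60):
    # Total speech power to preserve (stand-in for the loudness constraint).
    total = np.sum(10.0 ** (speech_db / 10.0))
    # Power above which a band is saturated (SNR = +15 dB) and gains nothing.
    cap = 10.0 ** ((noise_db + 15.0) / 10.0)

    if total >= cap.sum():
        # Enough power to saturate every band: scale the caps up uniformly.
        power = cap * total / cap.sum()
    else:
        # Bisection on mu so that sum(min(mu * importance, cap)) == total.
        lo, hi = 0.0, float(np.max(cap / importance))
        for _ in range(iters):
            mu = 0.5 * (lo + hi)
            if np.sum(np.minimum(mu * importance, cap)) < total:
                lo = mu
            else:
                hi = mu
        power = np.minimum(hi * importance, cap)
        power *= total / power.sum()  # remove residual bisection error

    candidate_db = 10.0 * np.log10(power)
    # Keep the original spectrum if the heuristic does not improve the score.
    if simplified_sii(candidate_db, noise_db, importance) < simplified_sii(
        speech_db, noise_db, importance
    ):
        return speech_db.copy()
    return candidate_db

# Hypothetical four-band example with low-frequency-dominated noise.
importance = np.array([0.2, 0.3, 0.3, 0.2])
noise_db = np.array([70.0, 60.0, 50.0, 45.0])
speech_db = np.array([65.0, 62.0, 55.0, 48.0])
enhanced_db = reallocate(speech_db, noise_db, importance)
```

In this toy setting the heuristic moves power away from the band where the noise dominates and toward bands that can still be made audible, which is one way to see why the benefit of such an equalizer would depend on the shape of the noise spectrum.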
The second method is based on deep parallel learning models that automatically learn the voice transformations from speech datasets. We introduce a novel duration modification feature and study the use of recurrent architectures combined with a wavelet description of the features. Objective results and preliminary listening tests show the merit of this approach.