Using automatic speech recognition to predict aided speech-in-noise intelligibility
As the main complaint of people with age-related hearing loss (ARHL) is difficulty understanding speech, the success of rehabilitation with hearing aids (HAs) is often assessed with speech intelligibility tests. These tests can be fairly lengthy and therefore cannot be run for every HA setting that might yield optimal speech intelligibility for a hearing-impaired listener.
Recent studies have shown that automatic speech recognition (ASR) can serve as an objective measure for predicting unaided speech intelligibility in quiet in people with real or simulated ARHL (Fontan et al., 2017; Fontan et al., in revision). The aim of the present study was to assess whether ASR can be applied to a wider range of listening conditions, namely unaided and aided speech-in-noise perception in older hearing-impaired (OHI) listeners.
Twenty-eight OHI participants (mean age = 73.3 years) were recruited for this study. They completed several speech-identification tasks involving logatoms, words, and sentences. All speech materials were mixed with background noise having the long-term average speech spectrum (LTASS) and presented monaurally through headphones at 60 dB SPL, at a signal-to-noise ratio of -1.5 dB. Participants completed the identification tasks both unaided and aided, using an HA simulator that implemented individual gains prescribed by the CAM2b fitting rule.
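To make the noise-mixing step concrete, the following sketch scales a masker so that the speech-to-noise ratio reaches a target such as -1.5 dB and then adds it to the speech. It assumes the SNR is defined from the long-term RMS levels of the whole speech and noise signals, which the abstract does not specify, and it uses a pure tone and white Gaussian noise as stand-ins for the real speech materials and the LTASS-shaped noise. The function names `rms`, `mix_at_snr` and `snr_db` are hypothetical. Presenting the mixture at 60 dB SPL would additionally require a headphone calibration, which is not modelled here.

```python
import math
import random


def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def snr_db(speech, noise):
    """SNR in dB, assumed here to be the ratio of long-term RMS levels."""
    return 20 * math.log10(rms(speech) / rms(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that snr_db(speech, scaled_noise) == target_snr_db,
    then add it sample by sample to `speech`.

    Assumes both signals have the same length and sampling rate.
    Returns (mixture, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # Noise RMS required for the target SNR: rms(speech) / 10^(SNR/20).
    target_noise_rms = rms(speech) / (10 ** (target_snr_db / 20))
    gain = target_noise_rms / rms(noise)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate in Hz
    # 1 s, 1 kHz tone as a stand-in for a speech token (hypothetical input).
    speech = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
    rng = random.Random(0)
    # White noise, NOT LTASS-shaped; a real setup would filter it first.
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
    mixture, scaled_noise = mix_at_snr(speech, noise, -1.5)
    print(round(snr_db(speech, scaled_noise), 2))  # prints -1.5
```

A negative SNR such as -1.5 dB simply means the noise RMS ends up about 1.19 times (10^(1.5/20)) the speech RMS.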
A speech-intelligibility prediction system was set up, consisting of: (1) the HA simulator used with the OHI participants (Moore et al., 2010); (2) an age-related hearing-loss simulator implementing the algorithms described by Nejime and Moore (1997); and (3) an HMM-GMM-based ASR system using the Julius decoder (Nagoya Institute of Technology, Japan), with acoustic models trained on speech in LTASS noise and a separate language model for each speech material. Human and machine intelligibility scores were calculated as the percentage of logatoms or words correctly identified.
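The scoring step can be sketched as a single function applied identically to listener responses and to ASR transcripts, so that human and machine scores are directly comparable. The abstract does not state how responses were matched to targets; this sketch assumes lower-cased whitespace tokenization and credits words through a longest common subsequence (LCS), so each target word counts at most once and only in order. A single-token logatom or isolated word is then scored as simply right or wrong. The names `normalize`, `lcs_length` and `percent_correct` are hypothetical.

```python
def normalize(text):
    """Hypothetical normalization: lower-case and split on whitespace."""
    return text.lower().split()


def lcs_length(ref, hyp):
    """Length of the longest common subsequence of two token lists
    (dynamic programming, keeping one row at a time)."""
    prev = [0] * (len(hyp) + 1)
    for r in ref:
        curr = [0]
        for j, h in enumerate(hyp, start=1):
            if r == h:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def percent_correct(references, responses):
    """Percentage of target words (or logatoms) correctly identified,
    pooled over all trials. Works the same for human and ASR responses."""
    if len(references) != len(responses):
        raise ValueError("one response is needed per reference item")
    total = 0
    correct = 0
    for ref, resp in zip(references, responses):
        ref_tokens = normalize(ref)
        total += len(ref_tokens)
        correct += lcs_length(ref_tokens, normalize(resp))
    return 100.0 * correct / total
```

For example, `percent_correct(["the cat sleeps", "apple"], ["the dog sleeps", "apple"])` returns `75.0`, since three of the four target words are recovered. LCS is used rather than position-by-position comparison so that an inserted or dropped word does not cause every following word to be marked wrong.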
The results show that, on average, applying the CAM2b gains significantly improved speech-in-noise intelligibility for both the OHI listeners and the ASR system.