Voice pitch perception in cochlear implant users with a spectro-temporally enhanced dual filter-bank sound coding strategy
Despite their great clinical success in partially restoring hearing and speech understanding in the deaf and hard-of-hearing, current cochlear implants (CIs) still do not provide adequate spectral and temporal cues for voice perception. These limitations could particularly affect speech recognition with competing talkers, and may be one of the key factors behind the substantial variability in the efficacy of CIs.
We present STEP, a novel experimental multi-channel sound coder with enhanced spectro-temporal processing. STEP employs an FFT-based approach with dual filter banks: the first is a bank of narrow, high-quality filters for spectral processing, and the second is a parallel bank of wide filters that reinforces temporal cues. STEP coding was assessed in two experiments investigating voice pitch perception in 16 Nucleus® CI users, in which we controlled the modulation characteristics and varied the carrier rate. We used a fixed set of threshold and comfortable stimulation levels for each subject, obtained from clinical MAPs. In the first experiment, we determined equivalence for voice pitch ranking and gender identification between the clinical ACE strategy and STEP for fundamental frequencies (F0) between 120 Hz and 250 Hz. In the second experiment, loudness as a function of the input amplitude of speech samples was determined for carrier rates of 1000, 500, and 250 pps per channel. Then, using equally loud sound coder programs, we evaluated the effect of carrier rate on voice pitch perception.
Voice pitch perception was heterogeneous across subjects, ranging from no ranking ability to good ranking. In contrast, nearly all subjects could identify voice gender at a level significantly above chance. Overall, carrier rate did not have a substantial effect on voice pitch ranking or gender identification as long as the carrier rate was at least twice the fundamental frequency or, for the lowest carrier rate of 250 pps, the stimulation pulses were aligned to F0 peaks. In the overlap region of male and female F0s, gender categorization also depended on speaker gender.
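As a rough illustration of the carrier-rate condition reported above (carrier rate at least twice F0), the following sketch lists, for each carrier rate tested, the highest F0 that satisfies the condition. It is a simple arithmetic check, not part of the STEP coder; the function names are hypothetical, and applying the 120 Hz to 250 Hz range from the first experiment to the carrier rates of the second is an assumption made only for illustration.

```python
# Illustrative check of the "carrier rate >= 2 x F0" condition.
# Function names are hypothetical; this is not the STEP implementation.

CARRIER_RATES_PPS = (1000, 500, 250)  # per-channel carrier rates in the study
F0_RANGE_HZ = (120, 250)              # F0 range of experiment 1 (assumed here)


def max_f0_hz(carrier_pps: float) -> float:
    """Highest F0 for which the carrier rate is at least twice F0."""
    return carrier_pps / 2.0


def carrier_supports_f0(carrier_pps: float, f0_hz: float) -> bool:
    """True if the carrier rate is at least twice the given F0."""
    return carrier_pps >= 2.0 * f0_hz


if __name__ == "__main__":
    low_f0, high_f0 = F0_RANGE_HZ
    for rate in CARRIER_RATES_PPS:
        print(
            f"{rate} pps: condition holds up to {max_f0_hz(rate):.0f} Hz; "
            f"{low_f0} Hz ok={carrier_supports_f0(rate, low_f0)}, "
            f"{high_f0} Hz ok={carrier_supports_f0(rate, high_f0)}"
        )
```

Under this check, 1000 and 500 pps satisfy the condition across the whole assumed F0 range, whereas 250 pps satisfies it only up to 125 Hz, which is consistent with pulse alignment to F0 peaks mattering specifically for the lowest carrier rate.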
These results indicate that carrier rates as low as 250 pps per channel are enough to support functional voice pitch perception based on temporal cues, at least when temporal modulation and pulse timing in the coder output are well controlled. The results also suggest that further investigation of the use of spectral cues for gender identification by CI subjects is warranted. Finally, we will present some data on speech-in-noise performance with STEP.
Funding: Research was supported by IIR Grants #2098 and #853 from Cochlear© to DK.