SpiN 2020 :: program

Exploring listeners' speech modification preferences

Olympia Simantiraki^(a)
University of the Basque Country, Spain

Martin Cooke^(b)
Ikerbasque (Basque Science Foundation), Spain

(a) Presenting
(b) Attending

Listening to synthetic or artificially produced speech under adverse conditions is an everyday phenomenon. Many algorithms have been proposed for modifying the speech signal before it reaches the listener. Near-end listening enhancement algorithms can achieve significant improvements in speech understanding compared to unprocessed speech in adverse conditions (Taal & Jensen, 2013; Schepker et al, 2015). Other factors such as listening effort and naturalness are also important when intelligibility is close to ceiling. One means to explore these supra-intelligibility factors is through listener preferences. Earlier studies have measured listener preferences via subjective scales (Moore et al, 2007; Adams & Moore, 2009) or by allowing listeners to modify speech properties in real time (Wingfield & Ducharme, 1999; Zhang & Shen, 2019; Simantiraki & Cooke, 2019).

Using the virtual adjustment tool proposed in Simantiraki & Cooke (2019), we conducted several experiments to explore the effects of speech properties on listening preferences and intelligibility. Participants were free to change a speech feature during an open-ended adjustment phase, followed by a test phase in which they identified speech presented with the feature value selected at the end of the adjustment phase. This technique yields three measures: the feature value that listeners subjectively feel allows comfortable speech recognition, the actual intelligibility at that value, and the time required to make the adjustment.
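The two-phase procedure can be illustrated with a minimal Python sketch. This is not the authors' tool: the function names, the key-press representation, the step size, the value range and the keyword-based scoring are all assumptions made for illustration, and no audio is synthesised or played.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass
class TrialResult:
    chosen_value: float       # feature value at the end of the adjustment phase
    adjustment_time_s: float  # time of the last adjustment, in seconds
    words_correct: float      # proportion of keywords identified in the test phase


def adjust(start: float, presses: Iterable[Tuple[float, int]],
           step: float, lo: float, hi: float) -> Tuple[float, float]:
    """Apply timestamped up/down presses (+1 or -1); return (value, elapsed)."""
    value, elapsed = start, 0.0
    for t, direction in presses:
        # Hypothetical rule: one fixed step per press, clipped to [lo, hi].
        value = min(hi, max(lo, value + direction * step))
        elapsed = t
    return value, elapsed


def score(response: Sequence[str], keywords: Sequence[str]) -> float:
    """Proportion of target keywords present in the listener's response."""
    said = {w.lower() for w in response}
    return sum(k.lower() in said for k in keywords) / len(keywords)


def run_trial(start, presses, step, lo, hi, response, keywords) -> TrialResult:
    # Adjustment phase: the listener changes the feature freely.
    value, elapsed = adjust(start, presses, step, lo, hi)
    # Test phase: a real experiment would present speech at `value` here;
    # this sketch only scores a response that is passed in.
    return TrialResult(value, elapsed, score(response, keywords))


result = run_trial(
    start=1.0, presses=[(0.8, -1), (1.9, -1), (3.5, +1)],
    step=0.25, lo=0.5, hi=2.0,
    response=["the", "dog", "ran"], keywords=["dog", "ran", "home"],
)
print(result)
```

Each trial in this sketch returns the three quantities named above: the selected feature value, the adjustment time and the intelligibility score at the selected value.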

Experiments with native normal-hearing listeners measured the consequences of allowing listeners to change spectral slope, the location of a spectral band of speech, speech rate and mean F0. Speech stimuli were presented in both quiet and masked conditions. Relative to the original values, as the noise level increased listeners (i) chose progressively flatter spectral tilts, (ii) moved spectral bands to higher frequencies and (iii) preferred slower speech rates. However, the preferred mean F0 was unaffected by noise and was always lower than the original value. These outcomes are largely consistent with earlier findings on the effect of the corresponding modifications on intelligibility, but provide additional information in cases where intelligibility is at ceiling.

References:
Taal and Jensen (2013) Interspeech 3582–3586
Schepker et al (2015) J. Acoust. Soc. Am. 138, 2692–2706
Moore et al (2007) Int. J. Audiol. 46, 154–160
Adams and Moore (2009) J. Am. Acad. Audiol. 20, 28–39
Wingfield and Ducharme (1999) J. Gerontol. B Psychol. Sci. Soc. Sci. 54B, P199–P202
Zhang and Shen (2019) Interspeech 1383–1387
Simantiraki and Cooke (2019) ICA 5736–5738

Last modified 2020-01-06 19:23:55