Age-Related Voice Disguise and its Impact in Speaker Verification Accuracy
Citation: Gonzalez Hautamäki, Rosa. Sahidullah, Md. Kinnunen, Tomi. Hautamäki, Ville. (2016). Age-Related Voice Disguise and its Impact in Speaker Verification Accuracy. Proceedings of Odyssey 2016: The Speaker and Language Recognition Workshop, June 21-24, 2016, Bilbao, Spain, 277-282. doi:10.21437/Odyssey.2016-40.
This study focuses on the impact of age-related intentional voice modification, or age disguise, on the performance of automatic speaker verification (ASV) systems. The data collected for this study include 60 native Finnish speakers (29 males, 31 females) aged between 18 and 73 years. The corpus consists of two sessions of read speech per speaker. Our experiments demonstrate the vulnerability of modern ASV systems when a person attempts to conceal his or her identity by modifying the voice to sound like an old or young person. For our i-vector PLDA system, the increase in equal error rate (EER) for male speakers was 7-fold for the old voice disguise and 11-fold for the young voice disguise. A similar degradation in performance is observed for female speakers, with a 5-fold increase in EER for old voice disguise and a 6-fold increase for young voice disguise. We further analyze the factors affecting the performance of ASV systems on the studied speech data. In our experiments, male speakers were found to be more successful in disguising their voices. The effect on fundamental frequency (F0) was also studied. The mean F0 distributions showed a shift towards higher frequencies when speakers attempted a young voice, which is consistent with the perception that younger speakers’ F0 values tend to be higher than those of older speakers.
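The abstract reports results as fold increases in equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER and a fold increase could be computed from verification scores. It is not the authors' evaluation code: the function names, the threshold sweep over observed scores, and the toy scores are all assumptions made for illustration, and the numbers are synthetic rather than taken from the paper.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (false rejection rate, false acceptance rate) at a threshold.

    A trial is accepted when its score is >= threshold.
    """
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return frr, far


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    With finitely many trials the two error rates rarely cross exactly,
    so the threshold with the smallest gap is used and the two rates
    are averaged (an assumption of this sketch).
    """
    candidates = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in candidates:
        frr, far = error_rates(target_scores, nontarget_scores, t)
        gap = abs(frr - far)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (frr + far) / 2
    return eer


def fold_increase(baseline_eer, disguised_eer):
    """Ratio of the EER under disguise to the EER with natural voice."""
    return disguised_eer / baseline_eer


# Synthetic toy scores (hypothetical, not from the paper).
nontarget = [0.0, 1.0, 1.5, 2.5]          # impostor trials
natural_target = [2.0, 3.0, 4.0, 5.0]     # same speaker, natural voice
disguised_target = [0.5, 1.2, 3.0, 4.0]   # same speaker, disguised voice

baseline = equal_error_rate(natural_target, nontarget)
disguised = equal_error_rate(disguised_target, nontarget)
print(f"baseline EER:  {baseline:.3f}")
print(f"disguised EER: {disguised:.3f}")
print(f"fold increase: {fold_increase(baseline, disguised):.1f}")
```

In this toy setup, disguise lowers the scores of genuine trials so that they overlap more with impostor scores, which raises the EER. This is the kind of degradation the abstract summarizes as a fold increase.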