UEF eREPOSITORY

On the limits of automatic speaker verification: Explaining degraded recognizer scores through acoustic changes resulting from voice disguise

Files
Article (1.160Mb)
Self archived version
published version
Date
2019
Author(s)
González Hautamäki, R
Hautamäki, V
Kinnunen, T
Unique identifier
10.1121/1.5119240
Self-archived article

Citation
González Hautamäki, R., Hautamäki, V., & Kinnunen, T. (2019). On the limits of automatic speaker verification: Explaining degraded recognizer scores through acoustic changes resulting from voice disguise. Journal of the Acoustical Society of America, 146(1), 693-704. 10.1121/1.5119240.
Rights
© Acoustical Society of America. This article may be downloaded for personal use only. Any other use requires prior permission of the author and the Acoustical Society of America. The following article appeared in The Journal of the Acoustical Society of
Licensed under
All rights reserved
Abstract

In speaker verification research, objective performance benchmarking of listeners and automatic speaker verification (ASV) systems is of key importance in understanding the limits of speaker recognition. While the adoption of common data and metrics has been instrumental to progress in ASV, there are two major shortcomings. First, the utterances lack intentional voice changes imposed by the speaker. Second, the standard evaluation metrics focus on average performance across all speakers and trials. As a result, a knowledge gap remains in how acoustic changes impact recognition performance at the level of individual speakers. This paper addresses the limits of speaker recognition in ASV systems under voice disguise, using a linear mixed effects model to analyze the impact of changes in long-term statistics of selected features (formants F1–F4, their bandwidths B1–B4, F0, and speaking rate) on the ASV log-likelihood ratio (LLR) score. The correlations between the proposed predictive model and the LLR scores are 0.72 for female and 0.81 for male speakers. Overall, the difference in long-term F0 between enrollment and test utterances was found to be the single most detrimental factor, even though the ASV system uses only spectral, rather than prosodic, features.
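
For orientation, a linear mixed effects model of the kind named in the abstract could be written roughly as follows. This is a generic textbook form, not the specification used in the article: the per-speaker random intercept, the index convention, and all symbols (y, Δx, β, u, ε) are assumptions introduced here for illustration only.

```latex
% Generic linear mixed effects model (illustrative sketch, assumed form).
% i indexes speakers, j indexes trials of speaker i, k indexes the selected features
% (e.g. F1-F4, B1-B4, F0, speaking rate).
\begin{align}
  y_{ij} &= \beta_0 + \sum_{k=1}^{K} \beta_k \, \Delta x_{ijk} + u_i + \varepsilon_{ij}, \\
  u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align}
% y_{ij}:          ASV LLR score for trial j of speaker i
% \Delta x_{ijk}:  change in the long-term statistic of feature k between
%                  enrollment and test utterances
% \beta_k:         fixed effect of feature k, shared across speakers
% u_i:             speaker-specific random intercept (assumed grouping)
```

Under this assumed form, a fixed effect \beta_k of large magnitude would mark feature k as strongly associated with LLR change across speakers, while u_i absorbs speaker-level offsets that average-performance metrics hide.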

URI
https://erepo.uef.fi/handle/123456789/7740
Link to the original item
http://dx.doi.org/10.1121/1.5119240
Publisher
Acoustical Society of America (ASA)
Collections
  • Faculty of Science and Forestry (Luonnontieteiden ja metsätieteiden tiedekunta)
University of Eastern Finland
OpenAccess
eRepo
erepo@uef.fi
OpenUEF
Service provided by
the University of Eastern Finland Library