Voice biometrics security: Extrapolating false alarm rate via hierarchical Bayesian modeling of speaker verification scores

Sholokhov, Alexey; Kinnunen, Tomi; Vestman, Ville; Aik Lee, Kong

dc.contributor.author	Sholokhov, Alexey
dc.contributor.author	Kinnunen, Tomi
dc.contributor.author	Vestman, Ville
dc.contributor.author	Aik Lee, Kong
dc.date.accessioned	2019-12-12T13:40:23Z
dc.date.available	2019-12-12T13:40:23Z
dc.date.issued	2019
dc.identifier.uri	https://erepo.uef.fi/handle/123456789/7869
dc.description.abstract	How secure is automatic speaker verification (ASV) technology? More concretely, given a specific target speaker, how likely is it to find another person who gets falsely accepted as that target? This question may be addressed empirically by studying naturally confusable pairs of speakers within a large enough corpus. To this end, one might expect to find at least some speaker pairs that are indistinguishable from each other in terms of ASV. To a certain extent, such an aim is mirrored in the standardized ASV evaluation benchmarks, for instance, the series of speaker recognition evaluations (SRE) organized by the National Institute of Standards and Technology (NIST). Nonetheless, the number of speakers in such evaluation benchmarks arguably represents only a small fraction of all possible human voices, making it challenging to extrapolate performance beyond a given corpus. Furthermore, the impostors used in performance evaluation are usually selected randomly. A potentially more meaningful definition of an impostor, at least in the context of security-driven ASV applications, would be the closest (most confusable) other speaker to a given target. We put forward a novel performance assessment framework that addresses both the inadequacy of the random-impostor evaluation model and the size limitation of evaluation corpora by assessing ASV security against closest impostors on arbitrarily large datasets. The framework allows one to predict the safety of a given ASV technology, in its current state, for an arbitrarily large speaker database consisting of virtual (sampled) speakers. As a proof of concept, we analyze the performance of two state-of-the-art ASV systems, based on i-vector and x-vector speaker embeddings (as implemented in the popular Kaldi toolkit), on the recent VoxCeleb 1 and 2 corpora, containing a total of 7365 speakers.
We fix the number of target speakers to 1000 and generate up to N = 100,000 virtual impostors sampled from the generative model. The model-based false alarm rates are in reasonable agreement with the empirical false alarm rates and, as predicted, increase substantially (values up to 98%) with N = 100,000 impostors. Neither the i-vector nor the x-vector system is immune to an increased false alarm rate at increased impostor database size, as predicted by the model.
dc.language.iso	English
dc.publisher	Elsevier BV
dc.relation.ispartofseries	Computer speech and language
dc.relation.uri	http://dx.doi.org/10.1016/j.csl.2019.101024
dc.rights	CC BY-NC-ND 4.0
dc.subject	speaker verification
dc.subject	population size
dc.subject	security
dc.subject	false alarm rate
dc.subject	random impostor
dc.subject	closest impostor
dc.subject	Bayesian score modeling
dc.subject	VoxCeleb
dc.title	Voice biometrics security: Extrapolating false alarm rate via hierarchical Bayesian modeling of speaker verification scores
dc.description.version	final draft
dc.contributor.department	School of Computing, activities
uef.solecris.id	66486541	en
dc.type.publication	Scientific journal articles
dc.relation.doi	10.1016/j.csl.2019.101024
dc.description.reviewstatus	peerReviewed
dc.relation.articlenumber	101024
dc.relation.issn	0885-2308
dc.relation.volume	60
dc.rights.accesslevel	openAccess
dc.type.okm	A1
uef.solecris.openaccess	No
dc.rights.copyright	© Elsevier Ltd.
dc.type.displayType	article	en
dc.type.displayType	artikkeli	fi
dc.rights.url	https://creativecommons.org/licenses/by-nc-nd/4.0/

Files in this item

Name: 1576158141648956748.pdf
Size: 1.259Mb
Format: PDF
Description: Article

This item appears in the following Collection(s)

Luonnontieteiden, metsätieteiden ja tekniikan tiedekunta / Faculty of Science, Forestry and Technology