ASVspoof 2019: The 3rd Automatic Speaker Verification Spoofing and Countermeasures Challenge database
Date
2019-06-17
Author(s)
Yamagishi, Junichi; Todisco, Massimiliano; Sahidullah, Md; Delgado, Héctor; Wang, Xin; Evans, Nicholas; Kinnunen, Tomi; Lee, Kong Aik; Vestman, Ville; Nautsch, Andreas
Unique identifier
urn:nbn:fi:att:4e69bd10-66b8-4b0a-9854-c2b738ef721a
Citation
Yamagishi, Junichi; Todisco, Massimiliano; Sahidullah, Md; Delgado, Héctor; Wang, Xin; Evans, Nicholas; Kinnunen, Tomi; Lee, Kong Aik; Vestman, Ville; Nautsch, Andreas. ASVspoof 2019: The 3rd Automatic Speaker Verification Spoofing and Countermeasures Challenge database, 2019, urn:nbn:fi:att:4e69bd10-66b8-4b0a-9854-c2b738ef721a.
Abstract
This is the database used for the Third Automatic Speaker Verification Spoofing and Countermeasures Challenge, ASVspoof 2019 for short (http://www.asvspoof.org), organized by Junichi Yamagishi, Massimiliano Todisco, Md Sahidullah, Héctor Delgado, Xin Wang, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Ville Vestman, and Andreas Nautsch in 2019. The ASVspoof challenge aims to encourage further progress through (i) the collection and distribution of a standard dataset with varying spoofing attacks implemented with multiple, diverse algorithms, and (ii) a series of competitive evaluations for automatic speaker verification.

The ASVspoof 2019 challenge follows on from three special sessions on spoofing and countermeasures for automatic speaker verification held during INTERSPEECH 2013, 2015, and 2017. While the first edition in 2013 was targeted mainly at increasing awareness of the spoofing problem, the 2015 edition included the first challenge on the topic, accompanied by commonly defined evaluation data, metrics, and protocols. The task in ASVspoof 2015 was to design countermeasure solutions capable of discriminating between bona fide (genuine) speech and spoofed speech produced using either text-to-speech (TTS) or voice conversion (VC) systems. The ASVspoof 2017 challenge focused on the design of countermeasures aimed at detecting replay spoofing attacks that could, in principle, be implemented by anyone using common consumer-grade devices.

The ASVspoof 2019 challenge extends the previous editions in several directions. The 2019 edition is the first to focus on countermeasures for all three major attack types, namely those stemming from TTS, VC, and replay spoofing attacks. Advances with regard to the 2015 edition include the addition of up-to-date TTS and VC systems that draw upon substantial progress made in both fields during the last four years. ASVspoof 2019 thus aims to determine whether the advances in TTS and VC technology pose a greater threat to automatic speaker verification and the reliability of spoofing countermeasures. Advances with regard to the 2017 edition concern the use of a far more controlled evaluation setup for the assessment of replay spoofing countermeasures. Whereas the 2017 challenge was created from recordings of real replay spoofing attacks, the uncontrolled setup made the results somewhat difficult to analyse. A controlled setup, in the form of replay attacks simulated using a range of real replay devices and carefully controlled acoustic conditions, is adopted in ASVspoof 2019 with the aim of bringing new insights into the replay spoofing problem.

Last but not least, the 2019 edition aligns ASVspoof more closely with the field of automatic speaker verification. Whereas the 2015 and 2017 editions focused on the development and assessment of stand-alone countermeasures, ASVspoof 2019 adopts for the first time a new ASV-centric metric in the form of the tandem detection cost function (t-DCF).
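To make the tandem metric concrete, the following is a minimal Python sketch of a normalized t-DCF computed at a single countermeasure (CM) operating point with the ASV error rates held fixed, in the spirit of the formulation described in the ASVspoof 2019 evaluation plan. The function name, cost values, and priors are illustrative assumptions rather than the official configuration; the organizers' scoring scripts should be used for actual evaluations.

    # Assumed cost-model parameters, for illustration only.
    PI_SPOOF = 0.05        # prior probability of a spoofing attack
    PI_TAR   = 0.9405      # prior probability of a target trial
    PI_NON   = 0.0095      # prior probability of a non-target trial
    C_MISS_ASV, C_FA_ASV = 1.0, 10.0   # ASV miss / false-alarm costs
    C_MISS_CM,  C_FA_CM  = 1.0, 10.0   # CM miss / false-alarm costs

    def normalized_t_dcf(p_miss_cm, p_fa_cm,
                         p_miss_asv, p_fa_asv, p_miss_spoof_asv):
        """Normalized t-DCF at one CM operating point, ASV error rates fixed.

        p_miss_cm / p_fa_cm: CM miss and false-alarm rates at the threshold.
        p_miss_asv / p_fa_asv: ASV miss and false-alarm rates, bona fide trials.
        p_miss_spoof_asv: fraction of spoofed trials the ASV already rejects.
        """
        # Weights on the two CM error types implied by the cost model.
        c1 = (PI_TAR * (C_MISS_CM - C_MISS_ASV * p_miss_asv)
              - PI_NON * C_FA_ASV * p_fa_asv)
        c2 = C_FA_CM * PI_SPOOF * (1.0 - p_miss_spoof_asv)
        t_dcf = c1 * p_miss_cm + c2 * p_fa_cm
        # Normalize by the cost of the better trivial CM
        # (accept every trial vs. reject every trial).
        return t_dcf / min(c1, c2)

Under this assumed cost model, a normalized t-DCF of 1.0 or more indicates a countermeasure that performs no better than the best trivial system that accepts or rejects every trial.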
The ASVspoof 2019 database encompasses two partitions for the assessment of logical access (LA) and physical access (PA) scenarios. Both are derived from the VCTK base corpus, which includes speech data captured from 107 speakers (46 male, 61 female). The LA and PA databases are each partitioned into three datasets, namely training, development, and evaluation, which comprise the speech of 20 (8 male, 12 female), 10 (4 male, 6 female), and 48 (21 male, 27 female) speakers respectively.

The three partitions are disjoint in terms of speakers, and the recording conditions for all source data are identical. While the training and development sets contain spoofing attacks generated with the same algorithms/conditions (designated as known attacks), the evaluation set also contains attacks generated with different algorithms/conditions (designated as unknown attacks). Reliable spoofing detection performance therefore calls for systems that generalise well to previously unseen spoofing attacks. Full descriptions are available in the ASVspoof 2019 evaluation plan.
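Because the evaluation set mixes known and unknown attacks, stand-alone countermeasure performance (summarised by the equal error rate, EER, in the earlier challenge editions) is usefully reported per attack condition to probe generalisation. The following is a minimal, self-contained sketch of an EER computation; the function name and the synthetic scores are illustrative assumptions, not part of the official tooling.

    import numpy as np

    def equal_error_rate(bonafide_scores, spoof_scores):
        """EER of a countermeasure whose scores are higher for bona fide speech.

        Both inputs are 1-D NumPy arrays of detection scores (illustrative).
        """
        thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
        # Miss rate: bona fide trials rejected; false-alarm rate: spoofs accepted.
        miss = np.array([(bonafide_scores < t).mean() for t in thresholds])
        fa = np.array([(spoof_scores >= t).mean() for t in thresholds])
        # The EER lies where the two error-rate curves cross.
        idx = int(np.argmin(np.abs(miss - fa)))
        return (miss[idx] + fa[idx]) / 2.0

    # Toy usage with synthetic Gaussian scores (assumed, for demonstration):
    rng = np.random.default_rng(0)
    eer = equal_error_rate(rng.normal(1.0, 1.0, 2000),
                           rng.normal(-1.0, 1.0, 2000))
    print(f"EER: {100 * eer:.2f}%")   # about 16% for these overlapping Gaussians

Applied separately to the known and unknown attack subsets of the evaluation data, such a measurement exposes how much a countermeasure overfits the attacks seen during training.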