Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks

Kato, Akihiro; Kinnunen, Tomi

Files

Article (804.8Kb)

Self-archived version

Published version

Date

2018

Author(s)

Kato, Akihiro

Kinnunen, Tomi

Unique identifier

10.21437/Interspeech.2018-1671


More information

Research Database SoleCris

Self-archived item

Citation

Kato, Akihiro; Kinnunen, Tomi (2018). Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks. In Proc. Interspeech 2018, pp. 327-331. doi:10.21437/Interspeech.2018-1671.

Rights

Licensed under

Abstract

The fundamental frequency (F0) represents the pitch of speech, determines its prosodic characteristics, and is needed in various speech analysis and synthesis tasks. Despite decades of research on this topic, F0 estimation at low signal-to-noise ratios (SNRs) in unexpected noise conditions remains difficult. This work proposes a new approach to noise-robust F0 estimation using a recurrent neural network (RNN) trained in a supervised manner. Recent studies employ deep neural networks (DNNs) for F0 tracking by treating it as a frame-by-frame classification into quantised frequency states; we instead propose waveform-to-sinusoid regression to achieve both noise robustness and accurate estimation with increased frequency resolution. Experimental results on the PTDB-TUG corpus contaminated by additive noise (NOISEX-92) demonstrate that the proposed method improves the gross pitch error (GPE) rate and the fine pitch error (FPE) by more than 35% at SNRs between -10 dB and +10 dB compared with the well-known noise-robust F0 tracker PEFAC. Furthermore, the proposed method also outperforms state-of-the-art DNN-based approaches by more than 15% in terms of both FPE and GPE rate over the same SNR range.
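Several ideas in the abstract can be made concrete with a small, illustrative sketch: a single-sinusoid target that follows an F0 contour, a way to read F0 back off such a sinusoid, mixing noise at a chosen SNR, and the GPE/FPE metrics. This is not the authors' implementation and contains no RNN; the "estimate" is read directly off the clean target. The phase-accumulated target, the zero-crossing read-out, the whole-utterance power-ratio SNR definition, the 20% gross-error threshold, FPE as the standard deviation of the relative deviation (%) over non-gross frames, and all function names are assumptions chosen for illustration.

```python
import math
import statistics


def sinusoid_target(f0_per_sample, fs):
    """Hypothetical regression target (assumption): a unit-amplitude sinusoid
    whose instantaneous frequency follows a per-sample F0 contour in Hz.
    Samples with F0 == 0 (unvoiced) are set to zero."""
    phase = 0.0
    target = []
    for f0 in f0_per_sample:
        if f0 > 0:
            phase += 2.0 * math.pi * f0 / fs  # accumulate phase for continuity
            target.append(math.sin(phase))
        else:
            target.append(0.0)
    return target


def f0_from_sinusoid(frame, fs):
    """Read F0 off a (near-)sinusoidal frame from the linearly interpolated
    positions of its upward zero crossings; 0.0 means 'unvoiced'."""
    crossings = []
    for n in range(1, len(frame)):
        a, b = frame[n - 1], frame[n]
        if a < 0.0 <= b:
            crossings.append(n - 1 + (-a) / (b - a))  # fractional sample index
    if len(crossings) < 2:
        return 0.0
    # (number of whole periods) / (time between first and last crossing)
    return (len(crossings) - 1) * fs / (crossings[-1] - crossings[0])


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(speech, noise)]


def gpe_fpe(ref, est, threshold=0.2):
    """GPE rate (%) and FPE (%) over frames that both tracks mark voiced.
    Gross error: relative deviation above `threshold` (assumed 20%);
    FPE: population std of the relative deviation (%) over the other frames."""
    pairs = [(r, e) for r, e in zip(ref, est) if r > 0 and e > 0]
    if not pairs:
        return 0.0, 0.0
    rel = [100.0 * (e - r) / r for r, e in pairs]
    fine = [d for d in rel if abs(d) <= 100.0 * threshold]
    gpe = 100.0 * (len(pairs) - len(fine)) / len(pairs)
    fpe = statistics.pstdev(fine) if fine else 0.0
    return gpe, fpe


# Demo: 3 voiced frames at 120 Hz followed by 1 unvoiced frame.
fs = 16000
frame_len = 640  # 40 ms frames (illustrative choice)
contour = [120.0] * (3 * frame_len) + [0.0] * frame_len
target = sinusoid_target(contour, fs)
ref = [contour[i] for i in range(0, len(contour), frame_len)]
est = [f0_from_sinusoid(target[i:i + frame_len], fs)
       for i in range(0, len(target), frame_len)]
print(ref)  # [120.0, 120.0, 120.0, 0.0]
print([round(e, 2) for e in est])  # voiced frames close to 120 Hz, last 0.0
print(gpe_fpe(ref, est))
```

The appeal of a sinusoidal target, under these assumptions, is that frequency is encoded continuously in the waveform's phase rather than in a fixed set of quantised classes, so the read-out resolution is not bounded by a class grid.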

URI

https://erepo.uef.fi/handle/123456789/7214

Link to the original item

https://doi.org/10.21437/Interspeech.2018-1671

Publisher

International Speech Communication Association

Collections

Faculty of Science, Forestry and Technology (Luonnontieteiden, metsätieteiden ja tekniikan tiedekunta)