
dc.contributor.author: Kato, Akihiro
dc.contributor.author: Kinnunen, Tomi H
dc.date.accessioned: 2019-12-18T14:09:29Z
dc.date.available: 2019-12-18T14:09:29Z
dc.date.issued: 2019
dc.identifier.uri: https://erepo.uef.fi/handle/123456789/7881
dc.description.abstract: The fundamental frequency (F0) of a speech signal, which corresponds to pitch, is one of the key features in a variety of speech processing tasks. Accurate F0 estimation has therefore remained an important problem for decades. The problem is difficult, however, especially in low signal-to-noise ratio (SNR) conditions with unknown noise. In this work, we propose new approaches to noise-robust F0 estimation using recurrent neural networks (RNNs). Recent F0 estimation studies exploit deep neural networks (DNNs), including RNNs, to classify acoustic features into quantized frequency states. In contrast to these classification approaches, we put forward a regression method for F0 tracking, implemented with RNNs. To this end, we propose two variants. Our first model predicts the (scalar) F0 value directly from a spectrum, while our second model predicts a target sinusoidal waveform (with the desired F0) from the raw speech waveform. Our experiments on the pitch tracking database from Graz University of Technology (PTDB-TUG), contaminated by additive noise (NOISEX-92), show that the proposed approaches improve the gross pitch error (GPE) and fine pitch error (FPE) rates by more than 35% relative to PEFAC, a well-known noise-robust F0 tracker, at SNRs between -10 dB and +10 dB. Furthermore, our methods outperform state-of-the-art neural network-based approaches by more than 15% in terms of both the FPE and GPE rates over the same SNR range.
dc.language.iso: English
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartofseries: IEEE/ACM transactions on audio, speech, and language processing
dc.relation.uri: http://dx.doi.org/10.1109/TASLP.2019.2945489
dc.rights: All rights reserved
dc.title: Statistical Regression Models for Noise Robust F0 Estimation Using Recurrent Deep Neural Networks
dc.description.version: final draft
dc.contributor.department: School of Computing, activities
uef.solecris.id: 66890948 (en)
dc.type.publication: Scientific journal articles
dc.rights.accessrights: © IEEE
dc.relation.doi: 10.1109/TASLP.2019.2945489
dc.description.reviewstatus: peerReviewed
dc.format.pagerange: 2336-2349
dc.relation.issn: 2329-9290
dc.relation.issue: 12
dc.relation.volume: 27
dc.rights.accesslevel: openAccess
dc.type.okm: A1
uef.solecris.openaccess: No
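The abstract reports results as gross pitch error (GPE) and fine pitch error (FPE) rates. The following minimal Python sketch shows one common way such scores are computed from two frame-aligned F0 tracks. Several details are assumptions based on widespread conventions, not taken from the paper: a frame counts as voiced when its F0 is greater than 0, a gross error is a relative deviation above 20%, and FPE is the standard deviation (in percent) of the relative deviation over frames without a gross error. The function name `pitch_errors` is hypothetical.

```python
from statistics import pstdev


def pitch_errors(ref_f0, est_f0, threshold=0.2):
    """Return (GPE in %, FPE in %) for frame-aligned F0 tracks given in Hz.

    Assumed conventions (illustrative, not necessarily the paper's protocol):
    - a frame is voiced when its F0 value is > 0;
    - only frames voiced in BOTH tracks are scored (voicing errors ignored);
    - a gross error is a relative deviation |est - ref| / ref > threshold;
    - FPE is the population standard deviation of the relative deviation,
      in percent, over the frames that are not gross errors.
    """
    if len(ref_f0) != len(est_f0):
        raise ValueError("tracks must have the same number of frames")

    # Relative deviation for every frame voiced in both tracks.
    rel_dev = [(e - r) / r for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    if not rel_dev:
        raise ValueError("no frames are voiced in both tracks")

    gross = [d for d in rel_dev if abs(d) > threshold]
    fine_pct = [100.0 * d for d in rel_dev if abs(d) <= threshold]

    gpe = 100.0 * len(gross) / len(rel_dev)
    fpe = pstdev(fine_pct) if fine_pct else 0.0
    return gpe, fpe


if __name__ == "__main__":
    ref = [0, 100, 120, 150, 200]  # reference F0 per frame (0 = unvoiced)
    est = [0, 101, 118, 200, 0]    # estimated F0 per frame
    gpe, fpe = pitch_errors(ref, est)
    print(f"GPE = {gpe:.2f}%, FPE = {fpe:.2f}%")
```

In this toy example, three frames are voiced in both tracks. The 150 Hz to 200 Hz frame deviates by about 33%, so it counts as a gross error. That gives a GPE of about 33.33%, and the two remaining frames give an FPE of about 1.33%. Published evaluations differ in how they treat voicing errors and whether FPE is reported in percent or cents, so the paper itself should be consulted for its exact definitions.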

