Local spectral variability features for speaker verification
Self archived versionfinal draft
MetadataShow full item record
CitationSahidullah, Md. Kinnunen, Tomi. (2015). Local spectral variability features for speaker verification. Digital Signal Processing, 50, 1-11. 10.1016/j.dsp.2015.10.011.
Speaker verification techniques neglect the short-time variation in the feature space even though it contains speaker related attributes. We propose a simple method to capture and characterize this spectral variation through the eigenstructure of the sample covariance matrix. This covariance is computed using sliding window over spectral features. The newly formulated feature vectors representing local spectral variations are used with classical and state-of-the-art speaker recognition systems. Results on multiple speaker recognition evaluation corpora reveal that eigenvectors weighted with their normalized singular values are useful in representing local covariance information. We have also shown that local variability features can be extracted using mel frequency cepstral coefficients (MFCCs) as well as using three recently developed features: frequency domain linear prediction (FDLP), mean Hilbert envelope coefficients (MHECs) and power-normalized cepstral coefficients (PNCCs). Since information conveyed in the proposed feature is complementary to the standard short-term features, we apply different fusion techniques. We observe considerable relative improvements in speaker verification accuracy in combined mode on text-independent (NIST SRE) and text-dependent (RSR2015) speech corpora. We have obtained up to 12.28% relative improvement in speaker recognition accuracy on text-independent corpora. Conversely in experiments on text-dependent corpora, we have achieved up to 40% relative reduction in EER. To sum up, combining local covariance information with the traditional cepstral features holds promise as an additional speaker cue in both text-independent and text-dependent recognition.