Some activities on Non-linear Speech Processing at ENST/CNRS-LTCI Gérard CHOLLET chollet@tsi.enst.fr ENST/CNRS-LTCI 46 rue Barrault 75634 PARIS cedex 13 http://www.tsi.enst.fr/~chollet
Outline What is ENST/CNRS-LTCI? Research and application topics related to COST-277: Speech production and perception, Speech analysis and synthesis, Speech coding: The SYMPATEX project, Automatic speech recognition: The SIROCCO project, Speaker characterisation and verification. Perspectives within COST-277
Our affiliations ENST: Ecole Nationale Supérieure des Télécommunications http://www.enst.fr CNRS: Centre National de la Recherche Scientifique http://www.cnrs.fr LTCI: Laboratoire de Traitement et Communication de l’Information http://www.enst.fr/externe/ura.html
What is ENST? Ecole Nationale Supérieure des Télécommunications, ranked among the ‘Grandes Ecoles d'Ingénieurs’. 250 state-certified engineers graduate each year. Part of the ‘Groupement des Ecoles de Télécommunications’
GET: Groupement des Ecoles de Télécommunications ENST-Paris ENST-Bretagne in Brest Institut National des Télécommunications in Evry EURECOM in Sophia-Antipolis ENIC (Ecole Nouvelle d’Ingénieurs en Télécoms) in Lille Internet school in Marseille
Speech Production and Perception Parametric Vocal Tract model (Shinji Maeda) Non-linear Production model using Distinctive Regions and Modes (René Carré) Quantal nature of speech (R. Carré and S. Maeda) Perceptual filter (Nicolas Moreau) Auditory prosthesis (Alain Goyé and Jacques Prado)
Speech analysis and synthesis Time-Frequency representations, Wavelets Time-dependent spectral models (Yves Grenier) HNM (Harmonics + Noise Model) (Olivier Cappé, Eric Moulines, Maurice Charbit) Glottal Excited LPC
Time-dependent Spectral Models Temporal Decomposition (B. Atal, 1983) Vectorial Autoregressive models with detection of model ruptures (A. DeLima, Y. Grenier) Segmental parameterisation using a time-dependent polynomial expansion (Y. Grenier)
Temporal Decomposition
HNM: Harmonics + Noise Model (block diagram of the analysis; labels translated from French) Input signal Pitch detection and energy Voicing decision (voiced / unvoiced) Harmonics estimation (frequencies f, amplitudes A) Harmonic envelope estimation AR estimation of the residual H+N (harmonics + noise) parameters
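A minimal sketch of the harmonic/noise split on one voiced frame may help to read the diagram. It is not the ENST implementation: the pitch `f0` is assumed to be known already, the frame is assumed to span a whole number of pitch periods (so that a simple projection onto cosines and sines matches a least-squares fit), and the AR modelling of the residual is left out. All function names are illustrative.

```python
import math

def estimate_harmonics(frame, f0, fs, n_harmonics):
    """Project the frame onto cos/sin at each multiple of f0.

    Returns one (cos_coeff, sin_coeff) pair per harmonic. Assumes the frame
    covers a whole number of pitch periods.
    """
    n = len(frame)
    coeffs = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k * f0 / fs
        c = 2.0 / n * sum(x * math.cos(w * t) for t, x in enumerate(frame))
        s = 2.0 / n * sum(x * math.sin(w * t) for t, x in enumerate(frame))
        coeffs.append((c, s))
    return coeffs

def harmonic_part(coeffs, f0, fs, n):
    """Resynthesise the harmonic component from the estimated coefficients."""
    out = []
    for t in range(n):
        value = 0.0
        for k, (c, s) in enumerate(coeffs, start=1):
            w = 2.0 * math.pi * k * f0 / fs
            value += c * math.cos(w * t) + s * math.sin(w * t)
        out.append(value)
    return out

def split_harmonics_and_noise(frame, f0, fs, n_harmonics):
    """Return harmonic amplitudes and the residual (frame minus harmonic part)."""
    coeffs = estimate_harmonics(frame, f0, fs, n_harmonics)
    harmonic = harmonic_part(coeffs, f0, fs, len(frame))
    residual = [x - h for x, h in zip(frame, harmonic)]
    amplitudes = [math.hypot(c, s) for c, s in coeffs]
    return amplitudes, residual

# Toy voiced frame: two harmonics of a 100 Hz pitch, 8 kHz sampling,
# 160 samples (exactly two pitch periods).
fs, f0, n = 8000, 100.0, 160
frame = [math.cos(2.0 * math.pi * f0 * t / fs)
         + 0.5 * math.sin(2.0 * math.pi * 2.0 * f0 * t / fs)
         for t in range(n)]
amplitudes, residual = split_harmonics_and_noise(frame, f0, fs, 3)
```

In a full analysis the residual would then be modelled, for instance by the AR estimation named in the diagram, and unvoiced frames would skip the harmonic branch entirely.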
ALISP: Automatic Language Independent Speech Processing Automatic discovery of segmental units for speech coding, synthesis, recognition, language identification and speaker verification.
Speech Coding by indexing SYMPATEX: SYstème de Messagerie unifiée avec présentation vocale des messages (PArole et TEXte), i.e. a unified messaging system with voice presentation of messages (speech and text) Thomson-CSF, ELAN TTS, Irius GET, ESIEE
Coding principle (block diagram; labels translated from French) Speech Spectral analysis Prosodic analysis HMM recognition Dictionary of HMM models of the ALISP units Representatives A1 … A8 of HMM A Determination of the synthesis units Choice of synthesis unit by DTW Prosody coding ALISP unit index Synthesis unit index Pitch, energy, timing
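The "choice of synthesis unit by DTW" step can be sketched as follows. This is an illustrative reading of the diagram, not the project code: each segment and each stored representative is assumed to be a sequence of feature vectors, the local cost is assumed to be the Euclidean distance, and the names `dtw_distance` and `choose_representative` are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def choose_representative(segment, representatives):
    """Index of the stored representative closest to the segment under DTW."""
    distances = [dtw_distance(segment, rep) for rep in representatives]
    return min(range(len(representatives)), key=distances.__getitem__)

# Toy example with 1-D features: three stored representatives of one unit.
representatives = [
    [[0.0], [0.0], [0.0]],   # flat
    [[0.0], [1.0], [2.0]],   # rising
    [[2.0], [1.0], [0.0]],   # falling
]
segment = [[0.0], [0.0], [1.1], [2.0], [2.0]]  # a slower rising contour
best = choose_representative(segment, representatives)
```

Under this reading, the index returned here would be sent alongside the ALISP unit index and the prosody parameters, so the decoder only needs the shared dictionary of representatives to rebuild the segment.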
Decoding (block diagram; labels translated from French) ALISP index Synthesis representative number Prosody parameters Representatives A1 … A8 Choice of synthesis unit Synthesis by concatenation Synthetic speech
Automatic Speech Recognition Recognition of proper names and spellings Keyword spotting, noise robustness, adaptation Large Vocabulary Speech Recognition (SIROCCO) http://perso.enst.fr/~sirocco/index-en.html Markov Random Fields, Bayesian Networks and Graphical Models
Markov Random Fields, Bayesian Networks and Graphical Models Speech modelling with a state-constrained Markov Random Field over frequency bands (Guillaume Gravier and Marc Sigelle) http://perso.enst.fr/~ggravier/recherche.html#these Comparative framework to study MRF, Bayesian Networks and Graphical Models. http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
Speaker Verification Typology of approaches (EAGLES Handbook) Text dependent Public password Private password Customized password Text prompted Text independent Incremental enrolment Evaluation
Speaker Verification (text independent) The ELISA consortium ENST, LIA, IRISA, ... http://www.lia.univ-avignon.fr/equipes/RAL/elisa/index_en.html NIST evaluations http://www.nist.gov/speech/tests/spk/index.htm
Support Vector Machines and Speaker Verification A hybrid GMM-SVM system is proposed. An SVM scoring model is trained on development data to separate true-target speaker accesses from impostor accesses, using a new feature representation based on GMMs. Modeling: GMM. Scoring: SVM.
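The slide does not specify the GMM-based feature representation, so the sketch below is only one possible reading, under loudly stated assumptions: the features are taken to be the average log-likelihoods of the test frames under a client GMM and a world GMM, the GMMs are 1-D, and the SVM is reduced to a linear decision function whose weights are set by hand rather than trained on development data. All names are hypothetical.

```python
import math

def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood under a 1-D GMM.

    gmm is a list of (weight, mean, variance) triples.
    """
    total = 0.0
    for x in frames:
        p = sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in gmm)
        total += math.log(p)
    return total / len(frames)

def gmm_features(frames, client_gmm, world_gmm):
    """Assumed feature vector: (client log-likelihood, world log-likelihood)."""
    return (gmm_avg_loglik(frames, client_gmm), gmm_avg_loglik(frames, world_gmm))

def svm_score(features, weights, bias):
    """Linear decision function; positive means 'accept as true target'."""
    return sum(w * f for w, f in zip(weights, features)) + bias

# Toy models (assumed values, not estimated from data).
client_gmm = [(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)]
world_gmm = [(1.0, 5.0, 4.0)]

# Hand-set weights; a real system would learn them from development data.
weights, bias = (1.0, -1.0), 0.0

target_score = svm_score(gmm_features([0.1, 1.9, 0.3], client_gmm, world_gmm),
                         weights, bias)
impostor_score = svm_score(gmm_features([5.0, 6.0, 4.5], client_gmm, world_gmm),
                           weights, bias)
```

With these particular weights the score reduces to the usual log-likelihood ratio between client and world models; the point of training an SVM on such features would be to learn a better decision surface than this fixed one.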
SVM principles (figure) An input X in the input space is mapped to y(X) in the feature space, where a separating hyperplane H, with the optimal hyperplane Ho, gives Class(X)
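The figure's idea can be illustrated with a small sketch: points that no straight line separates in the input space become separable by a hyperplane after a mapping y(X). The mapping, the four data points and the hyperplane below are assumptions chosen for illustration, and the hyperplane is set by hand rather than found by an SVM solver.

```python
import math

def feature_map(x):
    """Assumed mapping y(X): 2-D input -> 3-D feature space (x1, x2, x1 * x2)."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

def decision(x, w, b):
    """Signed value of w . y(X) + b; its sign gives the side of the hyperplane."""
    return sum(wi * yi for wi, yi in zip(w, feature_map(x))) + b

def classify(x, w, b):
    """Class(X): +1 on one side of the hyperplane, -1 on the other."""
    return 1 if decision(x, w, b) >= 0 else -1

def margin(points, labels, w, b):
    """Smallest signed distance from the mapped points to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(label * decision(x, w, b) / norm
               for x, label in zip(points, labels))

# XOR-like data: not linearly separable in the 2-D input space.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, -1, -1]

# Hand-set hyperplane in feature space: it only looks at the x1 * x2 coordinate.
w, b = (0.0, 0.0, 1.0), 0.0

predicted = [classify(p, w, b) for p in points]
smallest_margin = margin(points, labels, w, b)
```

For these four points the hand-set hyperplane is also the maximum-margin one, since the two classes lie at distance 1 on either side of it in feature space; on real data the optimal hyperplane Ho has to be found by solving the SVM optimisation problem.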
Results
Voice technology in Majordome Server side background tasks: continuous speech recognition applied to voice messages upon reception Detection of sender’s name and subject User interaction: Speaker identification and verification Speech recognition (receiving user commands through voice interaction) Text-to-speech synthesis (reading text summaries, E-mails or faxes)
Perspectives within COST-277 Text-book on Speech Processing Evaluation of parametric representations of speech for diverse applications Fundamental work on voice transformations with applications in coding, synthesis, recognition and speaker characterisation Fundamental work on noise robustness with applications in coding, recognition and speaker verification