Some activities on Non-linear Speech Processing at ENST/CNRS-LTCI

Gérard CHOLLET, ENST/CNRS-LTCI, 46 rue Barrault, PARIS cedex 13

Outline
What is ENST/CNRS-LTCI?
Research and application topics related to COST-277:
  Speech production and perception
  Speech analysis and synthesis
  Speech coding: the SYMPATEX project
  Automatic speech recognition: the SIROCCO project
  Speaker characterisation and verification
Perspectives within COST-277

Our affiliations
ENST: Ecole Nationale Supérieure des Télécommunications
CNRS: Centre National de la Recherche Scientifique
LTCI: Laboratoire de Traitement et Communication de l’Information

What is ENST?
Ecole Nationale Supérieure des Télécommunications
Ranked among the ‘Grandes Ecoles d'Ingénieurs’
250 state-certified engineers each year
Part of the ‘Groupement des Ecoles de Télécommunications’ (GET)

GET: Groupement des Ecoles de Télécommunications
ENST-Paris
ENST-Bretagne in Brest
Institut National des Télécommunications in Evry
EURECOM in Sophia-Antipolis
ENIC (Ecole Nouvelle d’Ingénieurs en Télécoms) in Lille
Internet school in Marseille

Speech Production and Perception
Parametric Vocal Tract model (Shinji Maeda)
Non-linear Production model using Distinctive Regions and Modes (René Carré)
Quantal nature of speech (R. Carré and S. Maeda)
Perceptual filter (Nicolas Moreau)
Auditory prosthesis (Alain Goyé and Jacques Prado)

Speech analysis and synthesis
Time-Frequency representations, Wavelets
Time-dependent spectral models (Yves Grenier)
HNM (Harmonics + Noise Model) (Olivier Cappé, Eric Moulines, Maurice Charbit)
Glottal Excited LPC

Time-dependent Spectral Models
Temporal Decomposition (B. Atal, 1983)
Vectorial Autoregressive models with detection of abrupt model changes (A. DeLima, Y. Grenier)
Segmental parameterisation using a time-dependent polynomial expansion (Y. Grenier)
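As an illustration of what a time-dependent polynomial expansion can look like, the sketch below writes an autoregressive model whose coefficients vary inside a segment and are expanded on powers of time. The notation (order p, degree m, coefficients a_{ik}, excitation e(t)) is assumed for this example and is not taken from the slides.

```latex
% Autoregressive model with time-varying coefficients (assumed notation)
y(t) = -\sum_{i=1}^{p} a_i(t)\, y(t-i) + e(t)

% Each coefficient trajectory is expanded on a polynomial basis in time
a_i(t) = \sum_{k=0}^{m} a_{ik}\, t^{k}, \qquad i = 1, \dots, p
```

Under this assumption a segment is described by the (m + 1) p constants a_{ik} rather than by one coefficient vector per frame.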

Temporal Decomposition

HNM: Harmonics + Noise Model
Analysis block diagram (labels recovered from the figure):
Input signal
Detection of pitch and energy
Voicing decision: voiced / unvoiced
Estimation of the harmonics
Estimation of the harmonic envelope
Summing node (+, -)
AR estimation of the residual
Outputs: H+N parameters (f, A) and AR
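The harmonics-plus-noise idea can be sketched on the synthesis side as follows. This is a minimal illustration and not the authors' implementation: the function name, the constant pitch within a frame, the zero-phase cosines and the all-pole noise filter convention are all assumptions made for the example.

```python
import math
import random


def synthesize_hnm_frame(f0, amplitudes, ar_coeffs, noise_gain, n_samples, fs, seed=0):
    """Hypothetical helper: one frame = harmonics of f0 + AR-filtered white noise."""
    rng = random.Random(seed)

    # Harmonic part: harmonic number k + 1 sits at (k + 1) * f0 with amplitude amplitudes[k].
    # An empty amplitude list models an unvoiced frame.
    harmonic = [
        sum(a * math.cos(2.0 * math.pi * (k + 1) * f0 * n / fs)
            for k, a in enumerate(amplitudes))
        for n in range(n_samples)
    ]

    # Noise part: white Gaussian noise w[n] through an all-pole (AR) filter,
    # x[n] = noise_gain * w[n] - sum_i ar_coeffs[i] * x[n - 1 - i].
    noise = []
    for n in range(n_samples):
        x = noise_gain * rng.gauss(0.0, 1.0)
        for i, a in enumerate(ar_coeffs):
            if n - 1 - i >= 0:
                x -= a * noise[n - 1 - i]
        noise.append(x)

    return [h + v for h, v in zip(harmonic, noise)]
```

A voiced frame would be requested with a non-empty amplitude list, for example `synthesize_hnm_frame(100.0, [1.0, 0.5], [-0.9], 0.1, 160, 8000.0)`, and an unvoiced frame with an empty one.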

ALISP: Automatic Language Independent Speech Processing
Automatic discovery of segmental units for speech coding, synthesis, recognition, language identification and speaker verification.

Speech Coding by indexing
SYMPATEX: SYstème de Messagerie unifiée avec présentation vocale des messages (PArole et TEXte), a unified messaging system with voice presentation of messages (speech and text)
Thomson-CSF, ELAN TTS, Irius GET, ESIEE

Coding principle
Block diagram labels (recovered from the figure):
Speech
Spectral analysis
Prosodic analysis
HMM recognition
Dictionary of HMM models of the ALISP units (HMM A)
Representative A1 … Representative A8
Determination of the synthesis units
Selection of the synthesis unit by DTW
Prosody coding
Transmitted: ALISP unit index; synthesis unit index; pitch, energy, time

Decoding
Block diagram labels (recovered from the figure):
ALISP index
Synthesis representative number
Prosody parameters
Representative A1 … Representative A8
Selection of the synthesis unit
Synthesis by concatenation
Synthetic speech
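The coding scheme above selects a synthesis unit by DTW among the representatives of the recognised ALISP unit. The sketch below shows one plain way such a selection could be written. It is an illustration under assumptions, not the SYMPATEX code: the function names, the Euclidean local distance and the unconstrained step pattern are choices made for this example.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of equal-dimension vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean local distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def choose_representative(segment, representatives):
    """Hypothetical helper: index of the representative closest to the segment under DTW."""
    distances = [dtw_distance(segment, rep) for rep in representatives]
    return min(range(len(distances)), key=distances.__getitem__)
```

In the scheme described on the slides, the returned index would play the role of the synthesis unit index that is transmitted together with the ALISP unit index and the prosody.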

Automatic Speech Recognition
Recognition of proper names and spellings
Keyword spotting, noise robustness, adaptation
Large Vocabulary Speech Recognition (SIROCCO)
Markov Random Fields, Bayesian Networks and Graphical Models

Markov Random Fields, Bayesian Networks and Graphical Models
Speech modelling with state-constrained Markov Random Fields over frequency bands (Guillaume Gravier and Marc Sigelle)
Comparative framework to study MRF, Bayesian Networks and Graphical Models

Speaker Verification: typology of approaches (EAGLES Handbook)
Text dependent
Public password
Private password
Customized password
Text prompted
Text independent
Incremental enrolment
Evaluation

Speaker Verification (text independent)
The ELISA consortium: ENST, LIA, IRISA, ...
NIST evaluations

Support Vector Machines and Speaker Verification
A hybrid GMM-SVM system is proposed.
An SVM scoring model is trained on development data to separate true-target speaker accesses from impostor accesses, using a new feature representation based on GMMs.
Diagram labels: Modeling (GMM), Scoring (SVM)

SVM principles
Mapping from input space to feature space: X → y(X)
Separating hyperplane H, with the optimal hyperplane Ho
Class(X)
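For reference, the optimal hyperplane Ho is usually taken to be the maximum-margin one. The formulation below is the standard hard-margin SVM, given here as background and not copied from the slides. The symbols are assumptions: \psi stands for the mapping to feature space (labelled y(X) on the slide), t_i \in \{-1, +1\} are the class labels of the training points X_i, and w, b define the hyperplane.

```latex
% Maximum-margin (optimal) hyperplane: standard hard-margin formulation
\min_{w,\, b} \ \tfrac{1}{2}\, \lVert w \rVert^{2}
\quad \text{subject to} \quad
t_i \left( w^{\top} \psi(X_i) + b \right) \ge 1, \qquad i = 1, \dots, N

% Decision rule for a new point X
\mathrm{Class}(X) = \operatorname{sign}\!\left( w^{\top} \psi(X) + b \right)
```

Any hyperplane H that satisfies the constraints separates the two classes; Ho is the one with the smallest norm of w, which corresponds to the largest margin.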

Results

Voice technology in Majordome
Server-side background tasks:
  Continuous speech recognition applied to voice messages upon reception
  Detection of sender’s name and subject
User interaction:
  Speaker identification and verification
  Speech recognition (receiving user commands through voice interaction)
  Text-to-speech synthesis (reading text summaries, e-mails or faxes)

Perspectives within COST-277
Textbook on Speech Processing
Evaluation of parametric representations of speech for diverse applications
Fundamental work on voice transformations with applications in coding, synthesis, recognition and speaker characterisation
Fundamental work on noise robustness with applications in coding, recognition and speaker verification
