Audio-Visual Speech Processing Gérard Chollet, Hervé Bredin, Thomas Hueber, Rémi Landais, Patrick Perrot, Leila Zouari NOLISP, Paris, March 23rd 2007.


NOLISP 2007, Paris, 23 May 2007

Who are the actors?
• Many groups are active: http://proget.int-evry.fr/portail/?AcceuilProGET
 – INT – ARTEMIS
 – INT – HANDICOM
 – ENST / CNRS-LTCI
 – ENST Bretagne / SID
• What are we doing: Maison Intelligente (smart home), OUISPER, InfoM@gic, UNDL, T@pa, TéDéVi (ClipVideo), LABIAO, MyLife3D, Livre_lu, Aide aux malentendants (assistance for the hearing-impaired)

Coding/compression by indexing
• Bit rate: toward 400 bits/s
• Listening examples: original; HNM analysis/synthesis; very-low-bit-rate coding
• Topics
 – Segmentation/indexing of speech units (ALISP, polyphones), HMM
 – HNM analysis/synthesis
 – Voice modification (prosody, timbre)
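The 400 bits/s target can be read as a simple budget: each transmitted segment costs the bits of its unit index plus the bits of its coded prosody. A minimal sketch of that arithmetic, where the inventory size, segment rate and prosody bits are hypothetical values picked only to land on 400 bits/s (they are not figures from the slides), and where recognition and synthesis indices are lumped into a single index:

```python
import math


def index_bit_rate(inventory_size: int, units_per_second: float,
                   prosody_bits_per_unit: int = 0) -> float:
    """Bit rate (bits/s) of a coder that sends one unit index per segment.

    inventory_size: number of distinct units an index can address (hypothetical).
    units_per_second: average number of segments per second (hypothetical).
    prosody_bits_per_unit: coded prosody bits per segment (hypothetical).
    """
    index_bits = math.ceil(math.log2(inventory_size))  # bits to address one unit
    return float(units_per_second * (index_bits + prosody_bits_per_unit))


# Hypothetical budget: 64 units (6 bits) + 14 prosody bits, 20 segments/s
print(index_bit_rate(64, 20, 14))  # 400.0
```

The point of the sketch is that the rate scales with the segment rate, so longer units (polyphones rather than phones) or coarser prosody coding are the main levers for going lower.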

Coding/compression by indexing: block diagram
• Encoder, prosody branch: input speech signal → prosody analysis → prosody parameter coding → coded prosody parameters
• Encoder, spectral branch: input speech signal → spectral analysis (LPCC coefficients) → HMM recognition of RAUs → RAU indices and segment boundaries → selection of synthesis units (SAU) from the encoder's speech corpus → SAU indices
• Decoder: RAU and SAU indices → SAU selection from the decoder's speech corpus → selected segment → HNM analysis → prosody modification (using the coded prosody parameters) → concatenative HNM synthesis → output synthetic speech signal
• Abbreviations: LPCC: Linear Prediction Cepstral Coefficient; HMM: Hidden Markov Model; RAU: recognition unit; SAU: synthesis unit; HNM: Harmonic plus Noise Model
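The core idea of the diagram (send only unit indices and durations, let the decoder fetch matching segments from its own corpus) can be illustrated with a toy sketch. It makes strong simplifying assumptions: frames are labelled by their nearest centroid (a crude stand-in for HMM recognition of RAUs), only (unit index, number of frames) pairs are transmitted, and the decoder simply tiles a stored waveform per unit instead of performing HNM prosody modification and concatenative synthesis. The names `encode`, `decode`, `unit_waveforms` and `hop` are hypothetical.

```python
import numpy as np


def encode(frames: np.ndarray, centroids: np.ndarray) -> list:
    """Label frames with the nearest unit centroid and merge runs into segments.

    Returns a list of (unit index, number of frames) pairs: the only data sent.
    """
    # Squared Euclidean distance of every frame to every unit: (n_frames, n_units)
    d = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    segments = []
    for lab in labels:
        if segments and segments[-1][0] == lab:
            segments[-1] = (segments[-1][0], segments[-1][1] + 1)  # extend current run
        else:
            segments.append((int(lab), 1))  # start a new segment
    return segments


def decode(segments, unit_waveforms, hop: int) -> np.ndarray:
    """Rebuild a signal by looking up each unit in the decoder's corpus.

    Each stored waveform is repeated/truncated to n_frames * hop samples
    (a placeholder for real duration/prosody modification).
    """
    out = []
    for idx, n_frames in segments:
        w = unit_waveforms[idx]
        length = n_frames * hop
        reps = -(-length // len(w))  # ceiling division
        out.append(np.tile(w, reps)[:length])
    return np.concatenate(out)


frames = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1], [0.0, 0.1]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
segs = encode(frames, centroids)
print(segs)  # [(0, 2), (1, 2), (0, 1)]
```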

Coding/compression by indexing
• Applications
 – Transmission (e.g., to mobile devices)
 – Compression (e.g., read-aloud books, "livre lu")
• Teams
 – Permanent staff: M. Charbit, G. Chollet, E. Moulines
 – PhD student: S. Renouard

Coding/compression by indexing
• Partnerships
 – RNRT project: Sympatex, Thalès, Elan, ESIEE
 – GET project: Maison Intelligente (assistance for people with disabilities): INT, ENST Bretagne
 – STRP project: MobiNews (Oct. 2003): Thalès, Elan, ESIEE, Radio France, Multitel, etc.

Coding/compression by indexing: references
• G. Baudoin, J. Cernocky, P. Gournay, G. Chollet, "Codage de parole à bas et à très bas débit" [Low and very-low bit-rate speech coding], Annales des Télécommunications, 1999.
• K. S. Lee, R. V. Cox, "A very low bit rate speech coder based on a recognition/synthesis paradigm", IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 482-491, July 2001.
• C. du Jeu, M. Charbit, G. Chollet, "Very-low-rate speech compression by indexation of polyphones", Eurospeech 2003.
• D. Cadic, O. Cappé, M. Charbit, G. Chollet, E. Moulines, "« Toolbox » d'analyse/synthèse vocale par HNM" [HNM speech analysis/synthesis toolbox], internship report, ENST.

Audiovisual identity verification
• Compulsory? For:
 – Homeland/corporate security: restricted access, …
 – Secure computer login
 – Secure online signing of contracts (e-commerce)

Audiovisual identity verification
• Available features
 – Face / facial features (lips, eyes) → face modality
 – Speech → speech modality
 – Lip/speech synchrony → synchrony modality

Audiovisual identity verification
• Face modality
 – Detection: generative models (MPT toolbox), temporal median filtering, eye detection within faces
 – Normalization: geometry + illumination
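Temporal median filtering of the detections can be sketched as a sliding median over per-frame face boxes, so that a single spurious detection does not survive. This is a minimal sketch assuming one detection per frame stored as (x, y, w, h) and an odd window length; `median_filter_boxes` is a hypothetical helper, not part of the MPT toolbox.

```python
import numpy as np


def median_filter_boxes(boxes: np.ndarray, k: int = 5) -> np.ndarray:
    """Sliding temporal median over per-frame face boxes.

    boxes: (n_frames, 4) array of (x, y, w, h), one detection per frame.
    k: odd window length in frames; borders are padded by repeating
       the first/last box.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd")
    half = k // 2
    padded = np.pad(boxes, ((half, half), (0, 0)), mode="edge")
    # Median of each coordinate over the k frames centred on frame t
    return np.stack([np.median(padded[t:t + k], axis=0) for t in range(len(boxes))])


boxes = np.array([[10, 10, 50, 50]] * 5, dtype=float)
boxes[2] = [200, 200, 50, 50]  # one spurious detection
smoothed = median_filter_boxes(boxes, k=3)
```

The median is preferred over a moving average here because an outlier box does not drag the estimate toward it at all as long as it occupies fewer than half the window.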

Audiovisual identity verification
• Face modality
 – Selection: keep only the most reliable detection results, based on the distance Rel between a detected zone and its projection onto the eigenface space
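Assuming Rel is the usual "distance from face space" (the norm of what is left of a centered face crop after projecting it onto the eigenfaces), the selection step can be sketched as follows. The slide does not say whether a threshold or a fixed number of crops is used; `keep` and both function names are hypothetical.

```python
import numpy as np


def distance_from_face_space(x, mean, eigenfaces) -> float:
    """Norm of the residual after projecting x onto the eigenface space.

    x: flattened, normalized face crop (d,); mean: mean face (d,);
    eigenfaces: (d, k) matrix with orthonormal columns.
    """
    centered = x - mean
    coeffs = eigenfaces.T @ centered           # coordinates in the eigenface space
    residual = centered - eigenfaces @ coeffs  # part the eigenfaces cannot explain
    return float(np.linalg.norm(residual))


def select_reliable(crops, mean, eigenfaces, keep=10):
    """Indices of the `keep` crops closest to the face space (most face-like)."""
    dists = np.array([distance_from_face_space(c, mean, eigenfaces) for c in crops])
    return np.argsort(dists)[:keep]
```

A small residual means the detected zone is well explained by the face model, which is why it is used as a reliability measure for discarding bad detections.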

Audiovisual identity verification
• Face modality: two verification strategies (global and local) and a single comparison framework
• Global = eigenfaces
 – Computation of a set of directions (eigenfaces) defining a projection space
 – Two faces are compared through their projections onto the eigenface space
 – Training data: BIOMET (130 persons) + BANCA (30 persons)
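The eigenface steps above can be sketched with a PCA computed by SVD: the principal directions of the centered training faces are the eigenfaces, and two faces are compared through their coordinates in that space. The Euclidean distance between coordinates is an assumption (the slide does not name the metric; cosine or Mahalanobis distances are common alternatives), and the function names are hypothetical.

```python
import numpy as np


def fit_eigenfaces(train: np.ndarray, k: int):
    """train: (n_faces, d) flattened, normalized faces.

    Returns the mean face (d,) and k eigenfaces as a (d, k) matrix.
    """
    mean = train.mean(axis=0)
    # Right singular vectors of the centered data = principal directions
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k].T


def project(face, mean, eigenfaces):
    """Coordinates of a face in the eigenface space."""
    return eigenfaces.T @ (face - mean)


def face_distance(f1, f2, mean, eigenfaces) -> float:
    """Compare two faces through their projections (Euclidean distance here)."""
    return float(np.linalg.norm(project(f1, mean, eigenfaces) - project(f2, mean, eigenfaces)))


rng = np.random.default_rng(0)
train = rng.normal(size=(50, 20))  # stand-in for real face images
mean, E = fit_eigenfaces(train, 5)
```

Using the SVD of the centered data matrix avoids forming the d × d covariance matrix, which matters when d is the number of pixels in a face crop.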

Audiovisual identity verification
• Face modality, local = SIFT descriptors
 – Keypoint extraction
 – Keypoint representation: 128-dimensional SIFT descriptor (gradient orientation histograms, …) + 4-dimensional position vector: position (x, y) + scale + orientation
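To show where the 128 dimensions come from (4 × 4 spatial cells × 8 gradient orientation bins), here is a simplified SIFT-style descriptor for a 16 × 16 patch. It deliberately omits parts of real SIFT: scale-space keypoint detection, Gaussian weighting, trilinear interpolation and rotation to the keypoint orientation. In practice an existing implementation (for example OpenCV's SIFT) would be used; `sift_like_descriptor` is a hypothetical name.

```python
import numpy as np


def sift_like_descriptor(patch: np.ndarray) -> np.ndarray:
    """Simplified SIFT-style descriptor of a 16x16 grayscale patch.

    4x4 cells of 4x4 pixels, each with an 8-bin histogram of gradient
    orientations weighted by gradient magnitude: 4 * 4 * 8 = 128 values.
    """
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))         # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)       # orientation in [0, 2*pi)
    bins = np.minimum((ori / (2 * np.pi) * 8).astype(int), 7)
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    v = desc.ravel()
    v = v / (np.linalg.norm(v) + 1e-12)
    v = np.minimum(v, 0.2)                            # clip large values, as in SIFT
    return v / (np.linalg.norm(v) + 1e-12)


ramp = np.tile(np.arange(16.0), (16, 1))  # intensity increases left to right
d = sift_like_descriptor(ramp)
```

Each keypoint would then be stored as this 128-dimensional vector together with its 4-dimensional (x, y, scale, orientation) vector.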

Audiovisual identity verification
• Face modality: SVD-based matching method
 – Compares two videos V1 and V2
 – Exclusivity principle: one-to-one correspondences between faces (global) and between descriptors (local)
 – Principle: compute a proximity matrix between faces or descriptors, then extract the good pairings (made easy by the SVD)
 – Scores: one matching score between global representations, one matching score between local representations
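The slide does not spell out the algorithm, but the SVD pairing method of Scott and Longuet-Higgins (as refined by Pilu) fits the description, so this sketch assumes it: build a Gaussian proximity matrix, replace its singular values by ones, and keep a pair (i, j) only when the entry is the maximum of both its row and its column, which enforces one-to-one correspondences. The width `sigma` and the score (mean proximity of the kept pairs) are hypothetical choices.

```python
import numpy as np


def svd_matching(A: np.ndarray, B: np.ndarray, sigma: float = 1.0):
    """One-to-one pairing between two feature sets via SVD.

    A: (m, d) features from video V1 (faces or SIFT descriptors).
    B: (n, d) features from video V2.
    Returns the list of (i, j) pairs and a matching score.
    """
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-d2 / (2.0 * sigma ** 2))           # proximity matrix (m, n)
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    P = U @ Vt                                      # singular values replaced by ones
    pairs = []
    for i in range(P.shape[0]):
        j = int(P[i].argmax())
        if int(P[:, j].argmax()) == i:              # maximal in its row AND its column
            pairs.append((i, j))
    score = float(np.mean([G[i, j] for i, j in pairs])) if pairs else 0.0
    return pairs, score


A = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
B = np.array([[0.0, 10.0], [0.0, 0.0], [10.0, 0.0]])  # same points, shuffled
pairs, score = svd_matching(A, B)
```

Running the same function once on global face representations and once on local descriptors yields the two matching scores listed on the slide.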

Audiovisual identity verification
• Speech modality
 – GMM-based approach with a single world model
 – Each speaker model is derived from the world model by MAP adaptation
 – Speech verification score derived from the likelihood ratio
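The world-model approach can be sketched with diagonal-covariance GMMs: the speaker model is obtained by mean-only MAP adaptation of the world model, and the score is the average per-frame log-likelihood ratio between the two. Assumptions not stated on the slide: only the means are adapted, the relevance factor is 16 (a common default), training the world model by EM is omitted, and the mean log-likelihood ratio is used as "the score derived from the likelihood ratio". Function names are hypothetical.

```python
import numpy as np


def _log_gauss_diag(X, means, variances):
    """log N(x_t | mu_k, diag(var_k)) for every frame t and component k: (T, K)."""
    d = X.shape[1]
    maha = (((X[:, None, :] - means[None, :, :]) ** 2) / variances[None, :, :]).sum(axis=2)
    log_det = np.log(variances).sum(axis=1)
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det[None, :] + maha)


def _frame_loglik_and_posteriors(X, weights, means, variances):
    """Per-frame GMM log-likelihood (T,) and component posteriors (T, K)."""
    lp = np.log(weights)[None, :] + _log_gauss_diag(X, means, variances)
    m = lp.max(axis=1, keepdims=True)                 # log-sum-exp for stability
    ll = m[:, 0] + np.log(np.exp(lp - m).sum(axis=1))
    return ll, np.exp(lp - ll[:, None])


def map_adapt_means(X, weights, world_means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the world model to one speaker's frames X."""
    _, post = _frame_loglik_and_posteriors(X, weights, world_means, variances)
    n_k = post.sum(axis=0)                                  # soft counts per component
    ex_k = (post.T @ X) / np.maximum(n_k, 1e-10)[:, None]   # first-order statistics
    alpha = (n_k / (n_k + relevance))[:, None]              # data-dependent weight
    return alpha * ex_k + (1.0 - alpha) * world_means


def llr_score(X, weights, world_means, speaker_means, variances) -> float:
    """Average per-frame log-likelihood ratio: speaker model vs world model."""
    ll_spk, _ = _frame_loglik_and_posteriors(X, weights, speaker_means, variances)
    ll_world, _ = _frame_loglik_and_posteriors(X, weights, world_means, variances)
    return float(np.mean(ll_spk - ll_world))


# Toy 1-D world model with two components
w = np.array([0.5, 0.5])
mu = np.array([[-2.0], [2.0]])
var = np.ones((2, 1))
rng = np.random.default_rng(0)
enroll = rng.normal(3.0, 0.5, size=(200, 1))
spk = map_adapt_means(enroll, w, mu, var)
```

Components that see little enrollment data keep their world-model means, which is what makes MAP adaptation robust with short enrollment recordings.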

Audiovisual identity verification
• Synchrony modality
 – Principle: the synchrony between lips and speech carries identity information
 – Process: for each person, compute a synchrony model (co-inertia analysis, CoIA) from DCT coefficients (visual signal) and MFCCs (speech signal); then compare the test sample with this synchrony model
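Co-inertia analysis looks for pairs of directions (one in the visual space, one in the audio space) that maximize the covariance between the projected features; these are the leading singular vectors of the cross-covariance matrix. A minimal sketch, assuming the DCT and MFCC features have already been brought to a common frame rate, and assuming the comparison step is the mean correlation of the test sample along the person's learned directions (the slide does not define it). Function names and `k` are hypothetical.

```python
import numpy as np


def coia_fit(V: np.ndarray, A: np.ndarray, k: int = 1):
    """Person-specific synchrony model.

    V: (T, p) visual DCT features, A: (T, q) audio MFCCs, frame-aligned.
    Returns means and the first k co-inertia axes for each modality.
    """
    mv, ma = V.mean(axis=0), A.mean(axis=0)
    C = (V - mv).T @ (A - ma) / (len(V) - 1)   # cross-covariance matrix (p, q)
    U, _, Wt = np.linalg.svd(C, full_matrices=False)
    return mv, ma, U[:, :k], Wt[:k].T


def synchrony_score(V: np.ndarray, A: np.ndarray, model) -> float:
    """Mean correlation between projected visual and audio features."""
    mv, ma, P, Q = model
    pv, pa = (V - mv) @ P, (A - ma) @ Q
    corrs = [np.corrcoef(pv[:, i], pa[:, i])[0, 1] for i in range(P.shape[1])]
    return float(np.mean(corrs))


# Synthetic data: a shared latent signal z drives both modalities
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
V = z @ np.array([[1.0, 0.5, 0.0]]) + 0.1 * rng.normal(size=(500, 3))
A = z @ np.array([[0.0, 1.0, 1.0, 0.2]]) + 0.1 * rng.normal(size=(500, 4))
model = coia_fit(V, A, k=1)
```

Shifting the audio stream against the video (as a desynchronized or replayed recording would be) destroys the correlation along the learned axes, which is what makes the score informative.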

Audiovisual identity verification
• Experiments
 – BANCA database: 52 persons divided into two groups (G1 and G2); 3 recording conditions; 8 recordings per person (4 client accesses, 4 impostor accesses)
 – Evaluation with protocol P: 234 client accesses and 312 impostor accesses
 – Scores: 4 scores per access (PCA face, SIFT face, speech, synchrony)
 – Score fusion with an RBF-kernel SVM: the separating hyperplane is learned on G1 and tested on G2, and conversely
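The cross-group fusion protocol can be sketched with scikit-learn: an RBF-kernel SVM is trained on the 4-score vectors of one group and its decision function is used as the fused score on the other group, then the roles are swapped. Standardizing the scores (fitted on the training group only) and the values of `C` and `gamma` are assumptions, not settings from the slides; `cross_group_fusion` is a hypothetical name.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def cross_group_fusion(X_g1, y_g1, X_g2, y_g2, C=1.0, gamma="scale"):
    """Fuse per-access score vectors with an RBF-SVM trained on the other group.

    X_g*: (n_accesses, 4) scores [PCA face, SIFT face, speech, synchrony].
    y_g*: 1 for client accesses, 0 for impostor accesses.
    Returns fused scores for G1 (model trained on G2) and G2 (trained on G1).
    """
    def make_model():
        # Modality scores live on different scales, so standardize first
        return make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma=gamma))

    fused_g2 = make_model().fit(X_g1, y_g1).decision_function(X_g2)
    fused_g1 = make_model().fit(X_g2, y_g2).decision_function(X_g1)
    return fused_g1, fused_g2


# Synthetic stand-in scores: 20 client and 30 impostor accesses per group
rng = np.random.default_rng(0)
y = np.r_[np.ones(20), np.zeros(30)].astype(int)
X1 = np.where(y[:, None] == 1, 1.0, -1.0) + 0.3 * rng.normal(size=(50, 4))
X2 = np.where(y[:, None] == 1, 1.0, -1.0) + 0.3 * rng.normal(size=(50, 4))
f1, f2 = cross_group_fusion(X1, y, X2, y)
```

Positive decision values favour the client class; a single threshold on the fused score then gives the accept/reject decision for each access.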

Audiovisual identity verification
• Experiments:

Audiovisual identity verification
[Figure: SIFT matching between frames N, N+1, N+2 and frames M, M+1, M+2 of videos V1 and V2]
