Thematic Alignment of Static Documents with Meeting Dialogs Dalila Mekhaldi Diva Group Department of Computer Science University of Fribourg.


1 Thematic Alignment of Static Documents with Meeting Dialogs
Dalila Mekhaldi
Diva Group, Department of Computer Science, University of Fribourg

2 Outline
- Introduction
- Thematic Alignment
- One-best Alignment
- Multiple Alignments
  o Meeting Thematic Segmentation
- Alignments Grouping
- Conclusion & Perspectives

3 Introduction
In document-centric meetings (lectures, teleconferencing, press reviews, etc.):
- Static documents are present
- They should be integrated in a common multimedia archive
- Links need to be built between documents and other media

4 Document Alignments
Several ways to link static documents with other meeting data:
- Document/Image alignment
- Document/Speech alignment

5 Document/Speech Alignment
- Links static data (documents) to temporal data (audio).
- Enriches the documents with temporal indexes and thematic links.
- Helps:
  - Building document-based browsing interfaces.
  - Improving document search and retrieval.

6 Document/Speech Alignment
3 alignment categories:
- Thematic: lexical similarity of document/speech parts
- Quotation of a document part
- Reference to a document part
Text decomposition into segments:
- Document: logical, syntactic
- Speech transcript: turns, utterances

7 Thematic Alignment
Similarity-based matching
Speech transcript:
… So uh.. Tuesday 15 July, yesterday, uh.. the parliamentary commission of inquiry in France delivered a report uh.. on the management of companies.. public companies. Uh.. Very critical of the management of France Telecom and EDF, their acquisition policies were carried out without the human resources,... …
Document:
Made public on Tuesday 15 July, the report of the parliamentary commission of inquiry on the management of public companies, chaired by Philippe Douste-Blazy, secretary general of the UMP. Very critical of the management of France Telecom and EDF - their acquisition policies were carried out without the human, technical and financial resources being adapted accordingly.. …

8 Thematic Alignment
Similarity-based matching
[Figure: similarities between speech transcript segments (S1, S2, S3, ...) and document segments (S1, S2, ...)]
Vectors of weighted terms: S1 -> V1 = {t1, t2, ...}; S2 -> V2 = {t1, t2, ...}
a. Stop-word removal, stemming
b. Similarity metrics between units:
   Jaccard = |V1 ∩ V2| / |V1 ∪ V2|
   Dice = 2 × |V1 ∩ V2| / (|V1| + |V2|)
   Cosine = |V1 · V2| / (|V1| × |V2|)
Two strategies: one-best and multiple alignments
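The three metrics can be illustrated with a minimal Python sketch. It assumes binary term weights (each term counts once), a tiny made-up stop-word list, and no stemming; the slides do not specify the weighting scheme or the stemmer, so `STOP_WORDS`, `terms` and the example sentences are hypothetical.

```python
import math
import re

# Hypothetical, tiny stop-word list used only for this illustration.
STOP_WORDS = {"the", "of", "on", "a", "and", "was", "by", "uh"}


def terms(segment):
    """Lower-case, tokenize and drop stop words (stemming is omitted here)."""
    tokens = re.findall(r"[a-z]+", segment.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def jaccard(v1, v2):
    """|V1 ∩ V2| / |V1 ∪ V2| on term sets."""
    union = v1 | v2
    return len(v1 & v2) / len(union) if union else 0.0


def dice(v1, v2):
    """2 × |V1 ∩ V2| / (|V1| + |V2|) on term sets."""
    total = len(v1) + len(v2)
    return 2 * len(v1 & v2) / total if total else 0.0


def cosine(v1, v2):
    """Cosine with binary weights: shared terms over the product of the norms."""
    denom = math.sqrt(len(v1) * len(v2))
    return len(v1 & v2) / denom if denom else 0.0


# Made-up speech utterance and document sentence.
utterance = "uh the commission delivered a report on the management of public companies"
sentence = "The report of the commission on the management of public companies was made public"

v1, v2 = terms(utterance), terms(sentence)
print(jaccard(v1, v2), dice(v1, v2), cosine(v1, v2))
```

With real weighted vectors (for example term frequencies), the set operations would be replaced by sums over the weights.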

9 One-best Alignment Evaluation
Manual ground truth for 8 meetings
N1: alignments to be found (manual)
N2: alignments found (automatic)
N3: correct alignments found (automatic)
Precision: N3 / N2
Recall: N3 / N1
[Figure: precision and recall (0 to 1) of the Cosine, Dice and Jaccard metrics for the sent/utt, utt/sent and turn/logic alignments]
=> Improve the similarity metrics with a semantic dictionary
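The precision and recall definitions can be sketched as follows, assuming each alignment is stored as a (document segment, speech segment) pair; the pair values are invented for illustration.

```python
def precision_recall(manual, automatic):
    """Compute precision (N3 / N2) and recall (N3 / N1) over alignment pairs."""
    n1 = len(manual)              # N1: alignments to be found (manual)
    n2 = len(automatic)           # N2: alignments found (automatic)
    n3 = len(manual & automatic)  # N3: correct alignments found
    precision = n3 / n2 if n2 else 0.0
    recall = n3 / n1 if n1 else 0.0
    return precision, recall


# Hypothetical ground truth and system output.
manual = {(1, 4), (2, 5), (3, 7), (4, 9)}
automatic = {(1, 4), (2, 6), (3, 7)}

p, r = precision_recall(manual, automatic)
print(p, r)
```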

10 Multiple Alignments
[Figure labels: Doc/Speech Thematic Alignment, Multiple Alignments, Meeting Thematic Segmentation, Evaluation; segments A1 to A5]
Speech transcript:
… So uh.. Tuesday 15 July, yesterday, uh.. the parliamentary commission of inquiry in France delivered a report uh.. on the management of companies.. public companies. Uh.. basically, it says that the French model of public companies no longer meets the new international and European req.. requirements. …
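The difference between the one-best and multiple alignment strategies can be sketched as below. This is an illustration under assumptions: the slides do not say how multiple alignments are selected, so the similarity threshold, the function names and the scores are hypothetical.

```python
def one_best(similarities):
    """Keep, for each source segment, only its most similar target segment."""
    best = {}
    for (source, target), score in similarities.items():
        if score > 0 and (source not in best or score > best[source][1]):
            best[source] = (target, score)
    return {(source, target) for source, (target, _) in best.items()}


def multiple(similarities, threshold):
    """Keep every pair whose similarity reaches the (assumed) threshold."""
    return {pair for pair, score in similarities.items() if score >= threshold}


# Hypothetical similarity scores between utterances (U) and sentences (S).
similarities = {
    ("U1", "S1"): 0.42,
    ("U1", "S2"): 0.30,
    ("U2", "S2"): 0.25,
    ("U2", "S3"): 0.05,
}

print(one_best(similarities))
print(multiple(similarities, threshold=0.2))
```

Swapping the roles of source and target gives the opposite alignment direction (for example sentences to utterances instead of utterances to sentences).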

11 Meeting Thematic Segmentation
Thematic alignment (e.g. sentences/utterances) as a graph:
- Sentences, utterances -> nodes
- Alignability -> arcs
- Similarity weights -> nodes size
- The most connected sub-graphs -> thematic regions
[Figure: graph linking document sentences and speech utterances, with similarity values on example pairs: (84, 9) 0.42 and (91, 8) 0.25]

12 Meeting Thematic Segmentation
a. Bi-graph representation of the multiple alignment pairs.
b. Densest regions extraction (using clustering)
c. Segments extraction (clusters projection)
[Figure: document sentences against speech utterances; meeting themes; segments A1 to A5 and S1 to S5]
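Steps a to c can be sketched in Python. This is a simplification, not the clustering used in the talk: the slides do not name the clustering algorithm, so connected components of the graph stand in for the densest regions, and the alignment pairs are invented.

```python
from collections import defaultdict


def connected_regions(pairs):
    """Step a + b (simplified): build the two-sided graph, return its components."""
    adjacency = defaultdict(set)
    for sentence, utterance in pairs:
        adjacency[("doc", sentence)].add(("speech", utterance))
        adjacency[("speech", utterance)].add(("doc", sentence))

    seen, regions = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        regions.append(component)
    return regions


def project(region):
    """Step c: project a region onto each axis as a (first, last) index range."""
    sentences = sorted(i for side, i in region if side == "doc")
    utterances = sorted(i for side, i in region if side == "speech")
    return (sentences[0], sentences[-1]), (utterances[0], utterances[-1])


# Hypothetical (document sentence, speech utterance) alignment pairs.
pairs = [(77, 3), (78, 4), (78, 5), (79, 5), (84, 9), (85, 10), (84, 10)]

segments = [project(region) for region in connected_regions(pairs)]
print(segments)
```

A density-based or hierarchical clustering could replace `connected_regions` without changing the projection step.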

13 Thematic Segmentation Evaluation
Manual ground truth for 22 meetings
1. Speech: 2 main sets
   - Stereotyped: 2.7 utterances/turn (ratio > 2)
   - Non-stereotyped: 1.3 utterances/turn (ratio <= 2)
2. Documents: 2 main sets
   - Mono-document
   - Multi-documents
Comparison with 2 mono-modal methods: TextTiling, Baseline
   - Speech Baseline: turn-based segmentation
   - Documents Baseline: reflexive alignment/clustering
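The split of the speech data into two sets can be written as a small helper, assuming the ratio is the total number of utterances divided by the total number of turns in a meeting; the function name and the counts are hypothetical.

```python
def speech_set(num_utterances, num_turns):
    """Classify a meeting by its utterances/turn ratio (threshold 2 from the slide)."""
    ratio = num_utterances / num_turns
    return "stereotyped" if ratio > 2 else "non-stereotyped"


# Made-up counts that reproduce the two average ratios quoted on the slide.
print(speech_set(270, 100))  # ratio 2.7
print(speech_set(130, 100))  # ratio 1.3
```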

14 Thematic Segmentation Evaluation
Pk (Beeferman) metric: 0 for a perfect segmentation.
[Figures: Pk per method (Bi-modal, TextTiling, Baseline) for a. Speech and b. Documents, over stereotyped and non-stereotyped meetings, with one panel further split into mono-document and multi-documents meetings]
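For readers unfamiliar with Pk, a minimal sketch of the usual formulation follows. It assumes each segmentation is given as one segment label per unit (utterance or sentence), with a distinct label per segment, and it sets the window size k to half the average reference segment length; the example segmentations are invented.

```python
def pk(reference, hypothesis, k=None):
    """Fraction of windows where reference and hypothesis disagree on
    whether the two window ends fall in the same segment."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same units")
    n = len(reference)
    if k is None:
        # Half the average reference segment length (labels are unique per segment).
        num_segments = len(set(reference))
        k = max(1, round(n / num_segments / 2))
    windows = n - k
    if windows <= 0:
        raise ValueError("sequence too short for window size k")
    errors = sum(
        (reference[i] == reference[i + k]) != (hypothesis[i] == hypothesis[i + k])
        for i in range(windows)
    )
    return errors / windows


# Hypothetical reference: three segments of four units each.
reference = [0] * 4 + [1] * 4 + [2] * 4
no_boundaries = [0] * 12  # a hypothesis that finds no boundary at all

print(pk(reference, reference))
print(pk(reference, no_boundaries))
```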

15 Analysis
Our bi-modal method outperforms standard mono-modal methods:
- bridges the gaps between documents and speech transcript
- detects the similar segments
Documents greatly help structuring meetings:
- more precise in computing the number of segments
[Figure: Document; segments A1 to A5 and S1 to S5]

16 Alignments Grouping
1. Implementation of a framework that:
a. Combines the various levels to correct the false alignment pairs, e.g. (sentences x utterances) & (logical blocks x turns)
[Figure: Speech (turns T1, T2; utterances U1, U2, U3) and Document (logical blocks L1, L2; sentences S1, S2)]
b. Combines the 3 alignment categories (Thematic, Quotations and References) to improve the document/speech alignment
[Figure: Speech, Document]
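One possible way to combine two levels is sketched below. The slides do not describe the actual combination rule, so this filter is purely an assumption: a sentence/utterance pair is kept only if the logical block containing the sentence is also aligned with the turn containing the utterance. The containment mappings and pairs are hypothetical.

```python
def filter_by_coarse_level(fine_pairs, coarse_pairs, sentence_block, utterance_turn):
    """Keep fine-level pairs that are confirmed by a coarse-level alignment."""
    return {
        (sentence, utterance)
        for sentence, utterance in fine_pairs
        if (sentence_block[sentence], utterance_turn[utterance]) in coarse_pairs
    }


# Hypothetical containment: which logical block / turn each unit belongs to.
sentence_block = {"S1": "L1", "S2": "L2"}
utterance_turn = {"U1": "T1", "U2": "T1", "U3": "T2"}

# Hypothetical alignments at both levels.
coarse_pairs = {("L1", "T1"), ("L2", "T2")}
fine_pairs = {("S1", "U1"), ("S1", "U3"), ("S2", "U3")}

kept = filter_by_coarse_level(fine_pairs, coarse_pairs, sentence_block, utterance_turn)
print(kept)
```

Here the pair (S1, U3) is discarded because block L1 is not aligned with turn T2.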

17 Alignments Grouping
2. A visualization tool that:
- Highlights the alignment categories (Thematic, Quotations, References)
- Represents the various structures of the documents/speech as layers.
[Figure: Speech, Document]

18 Conclusion
Thematic alignment of documents with meeting dialogs is a solution for integrating static documents into multimedia archives: conferences, lectures, etc.

19 Perspectives
- Automatic transcription of the speech
- Generalize the alignment to other:
  - document types with little text (e.g. slides, agenda)
  - meeting kinds where documents are discussed irregularly (e.g. conferences)

