
Statistical Machine Translation: Translation without Understanding (Colin Cherry)


1 Statistical Machine Translation: Translation without Understanding
Colin Cherry

2 Who is this guy?
- One of Dr. Lin's PhD students
- Did my Master's degree at the U of A
- Research area: Machine Translation
- Home town: Halifax, Nova Scotia
- Please ask questions!

3 Machine Translation
- Translation is easy for (bilingual) people
- Process:
  - Read the text in English
  - Understand it
  - Write it down in French

4 Machine Translation
- Translation is easy for (bilingual) people
- Process:
  - Read the text in English
  - Understand it
  - Write it down in French
- Hard for computers: the human process is invisible, intangible

5 One approach: Babelfish
- A rule-based approach to machine translation
- A 30-year-old feat of software engineering
- Programming the knowledge in by hand is difficult and expensive

6 Alternate Approach: Statistics
- What if we had a model for P(F|E)?
- We could use Bayes rule:
    P(E|F) = P(F|E) P(E) / P(F)
  so the best translation is argmax_E P(F|E) P(E)

7 Why Bayes rule at all?
- Why not model P(E|F) directly?
- The P(F|E)P(E) decomposition allows us to be sloppy:
  - P(E) worries about good English
  - P(F|E) worries about French that matches the English
  - The two can be trained independently

8 Crime Scene Analogy
- F is a crime scene; E is a person who may have committed the crime
- P(E|F): look at the scene - who did it?
- P(E): who had a motive? (Profiler)
- P(F|E): could they have done it? (CSI - transportation, access to weapons, alibi)
- Some people might have great motives, but no means - you need both!

9 On voit Jon à la télévision
Candidate translations, judged on "good English? P(E)" and "good match to the French? P(F|E)":
  Jon appeared in TV.
  Appeared on Jon TV.
  In Jon appeared TV.
  Jon is happy today.
  Jon appeared on TV.
  TV appeared on Jon.
  TV in Jon appeared.
  Jon was not happy.
Only "Jon appeared on TV." scores well on both. (Table borrowed from Jason Eisner.)
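To make the division of labour concrete, here is a minimal sketch of noisy-channel scoring. The two log-probability tables are invented toy numbers standing in for a trained language model and translation model, not anything from the slides:

```python
# A sketch of argmax_E P(F|E) P(E), computed in log space.
# The tables below are invented toy numbers, not a trained system.

def lm_logprob(english):
    # log P(E): does this look like good English?
    scores = {"Jon appeared on TV.": -2.0,
              "Appeared on Jon TV.": -9.0,
              "Jon is happy today.": -2.5}
    return scores.get(english, -20.0)

def tm_logprob(french, english):
    # log P(F|E): does the French match this English?
    scores = {("On voit Jon à la télévision", "Jon appeared on TV."): -3.0,
              ("On voit Jon à la télévision", "Appeared on Jon TV."): -3.5,
              ("On voit Jon à la télévision", "Jon is happy today."): -15.0}
    return scores.get((french, english), -20.0)

def channel_score(french, english):
    return lm_logprob(english) + tm_logprob(french, english)

french = "On voit Jon à la télévision"
candidates = ["Jon appeared on TV.", "Appeared on Jon TV.", "Jon is happy today."]
print(max(candidates, key=lambda e: channel_score(french, e)))
# "Jon appeared on TV.": the only candidate that scores well on BOTH axes.
```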

10 Where will we get P(F|E)?
Books in English + the same books in French -> machine learning magic -> a P(F|E) model
- We call collections stored in two languages parallel corpora or parallel texts
- Want to update your system? Just add more text!
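In code, a parallel corpus is nothing more than a list of sentence pairs. A minimal sketch, assuming two line-aligned files (the file names are hypothetical):

```python
def load_parallel(en_path, fr_path):
    # Assumes line i of each file is the same sentence in both
    # languages ("line-aligned" parallel text).
    with open(en_path, encoding="utf-8") as en_file, \
         open(fr_path, encoding="utf-8") as fr_file:
        return [(e.split(), f.split()) for e, f in zip(en_file, fr_file)]

# corpus = load_parallel("hansard.en", "hansard.fr")  # hypothetical files
```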

11 Our Inspiration:
- The Canadian Parliamentary Debates!
- Stored electronically in both French and English, and available over the Internet

12 Problem:
- How are we going to generalize from examples of translations?
- I'll spend the rest of this lecture telling you:
  - What makes a useful P(F|E)
  - How to obtain the statistics needed for P(F|E) from parallel texts

13 Strategy: Generative Story
- When modeling P(X|Y):
  - Assume you start with Y
  - Decompose the creation of X from Y into some number of operations
  - Track statistics of individual operations
  - For a new example (X, Y), P(X|Y) can be calculated from the probabilities of the operations needed to get X from Y

14 What if…?
  The quick fox jumps over the lazy dog
  Le renard rapide saut par - dessus le chien parasseux

15 New Information
- Call this new info a word alignment (A)
- With A, we can make a good story:
  The quick fox jumps over the lazy dog
  Le renard rapide saut par - dessus le chien parasseux

16 P(F,A|E) Story
Start with the English sentence, plus a special null word:
  null The quick fox jumps over the lazy dog

17 P(F,A|E) Story
Choose a length for the French sentence (here, 10 positions):
  f1 f2 f3 … f10

18 P(F,A|E) Story
Link each French position to the English word (or null) that will generate it:
  f1 f2 f3 … f10

19 P(F,A|E) Story
Generate each French word from the English word it is linked to:
  Le renard rapide saut par - dessus le chien parasseux

20 P(F,A|E) Story
The finished pair:
  null The quick fox jumps over the lazy dog
  Le renard rapide saut par - dessus le chien parasseux
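Under this story, once a word-translation table Pt(f|e) is available, P(F,A|E) is just a product of one Pt term per French word. A sketch (the constant terms for choosing the French length are dropped, as they are in the weight calculations on the later slides, and the alignment shown is one plausible reading of the figure):

```python
def p_f_a_given_e(french, english, alignment, p_t):
    # alignment[j] gives the index (into english, where index 0 is
    # the null word) of the English word that generated french[j].
    prob = 1.0
    for j, f in enumerate(french):
        prob *= p_t.get((f, english[alignment[j]]), 0.0)
    return prob

english = ["null", "The", "quick", "fox", "jumps",
           "over", "the", "lazy", "dog"]
french = ["Le", "renard", "rapide", "saut", "par",
          "-", "dessus", "le", "chien", "parasseux"]
# One plausible alignment: each French position names its generator.
alignment = [1, 3, 2, 4, 5, 5, 5, 6, 8, 7]
# p_t itself comes from counting aligned text (next slide).
```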

21 Getting Pt(f|e)
- We need numbers for Pt(f|e)
- Example: Pt(le|the)
- Count lines in a large collection of aligned text
[The slide shows six copies of the aligned pair "null The quick fox jumps over the lazy dog" / "Le renard rapide saut par - dessus le chien parasseux", standing in for a large aligned corpus.]
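Counting lines is a few lines of code. A sketch, assuming the corpus is stored as (english, french, alignment) triples in the format of the previous sketch:

```python
from collections import defaultdict

def estimate_p_t(aligned_corpus):
    # Pt(f|e) = #(e linked to f) / #(e linked to anything)
    link_counts = defaultdict(float)
    totals = defaultdict(float)
    for english, french, alignment in aligned_corpus:
        for j, f in enumerate(french):
            e = english[alignment[j]]
            link_counts[(f, e)] += 1.0
            totals[e] += 1.0
    return {(f, e): c / totals[e] for (f, e), c in link_counts.items()}
```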

22 Where do we get the lines?
- That sure looked like a lot of monkeys…
- Remember POS tagging with HMMs: you didn't need a tagged corpus to train a tagger
- We'll get alignments out of unaligned text by treating the alignment as a hidden variable
- A generalization of the ideas in HMM training, called EM

23 Where's "heaven" in Vietnamese?
  English:    In the beginning God created the heavens and the earth.
  Vietnamese: Ban đầu Đức Chúa Trời dựng nên trời đất.
  English:    God called the expanse heaven.
  Vietnamese: Đức Chúa Trời đặt tên khoảng không là trời.
  English:    … you are this day like the stars of heaven in number.
  Vietnamese: … các ngươi đông như sao trên trời.
(Example borrowed from Jason Eisner.)

24 Where's "heaven" in Vietnamese?
  English:    In the beginning God created the heavens and the earth.
  Vietnamese: Ban đầu Đức Chúa Trời dựng nên trời đất.
  English:    God called the expanse heaven.
  Vietnamese: Đức Chúa Trời đặt tên khoảng không là trời.
  English:    … you are this day like the stars of heaven in number.
  Vietnamese: … các ngươi đông như sao trên trời.
The word "trời" shows up every time "heaven(s)" does. (Example borrowed from Jason Eisner.)

25 EM: Expectation Maximization
- Assume a probability distribution (weights) over hidden events
- Take counts of events based on this distribution
- Use counts to estimate new parameters
- Use parameters to re-weight the examples
- Rinse and repeat
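One EM iteration, as a skeleton. The two callbacks are hypothetical placeholders for "enumerate the hidden alignments with their weights" and "list the (f, e) events an alignment implies"; they are made concrete in the worked example after slide 41:

```python
from collections import defaultdict

def em_step(corpus, params, hypotheses, events):
    # hypotheses(example, params) -> (hidden, weight) pairs  [placeholder]
    # events(example, hidden) -> (f, e) events to count      [placeholder]
    counts, totals = defaultdict(float), defaultdict(float)
    for example in corpus:
        # Weight each hidden event under the current parameters,
        # then take counts of events based on those weights.
        for hidden, weight in hypotheses(example, params):
            for f, e in events(example, hidden):
                counts[(f, e)] += weight
                totals[e] += weight
    # Use the counts to estimate new parameters; the caller loops
    # ("rinse and repeat"), re-weighting with the new parameters.
    return {(f, e): c / totals[e] for (f, e), c in counts.items()}
```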

26 Alignment Hypotheses
[The slide shows eight candidate alignments of "null I like milk" with "Je aime le lait", each carrying a weight, e.g. 0.65, 0.25, 0.05, 0.01, 0.001.]
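Since every French word links to exactly one English position (possibly null), the hypothesis space is a Cartesian product and easy to enumerate. A sketch:

```python
from itertools import product

english = ["null", "I", "like", "milk"]
french = ["Je", "aime", "le", "lait"]

# Every French position can link to any of the 4 English positions,
# so there are 4 ** 4 = 256 alignment hypotheses in total; the slide
# draws 8 of them with their current weights.
alignments = list(product(range(len(english)), repeat=len(french)))
print(len(alignments))  # 256
print(alignments[0])    # (0, 0, 0, 0): every French word linked to null
```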

27 Weighted Alignments
- What we'll do is:
  - Consider every possible alignment
  - Give each alignment a weight indicating how good it is
  - Count weighted alignments as normal

28 Good grief! We forgot about P(F|E)!
- No worries, a little more stats gets us what we need: sum the joint over all alignments,
    P(F|E) = Σ_A P(F,A|E)
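A sketch of that sum over alignments, with p_t a translation table as in the earlier sketches. The second function uses the model's independence assumptions to turn the exponential sum into a product of small sums, which is why slide 43 can call the enumeration "very easy":

```python
from itertools import product

def p_f_given_e_brute(french, english, p_t):
    # P(F|E) = sum over every alignment A of P(F,A|E)
    total = 0.0
    for a in product(range(len(english)), repeat=len(french)):
        prob = 1.0
        for j, f in enumerate(french):
            prob *= p_t.get((f, english[a[j]]), 0.0)
        total += prob
    return total

def p_f_given_e(french, english, p_t):
    # Identical result: the sum distributes over the product,
    # sum_A prod_j Pt(f_j|e_a(j)) = prod_j sum_i Pt(f_j|e_i).
    prob = 1.0
    for f in french:
        prob *= sum(p_t.get((f, e), 0.0) for e in english)
    return prob
```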

29 Big Example: Corpus
  1: fast car ↔ voiture rapide
  2: fast ↔ rapide

30 Possible Alignments
  1a: fast↔voiture, car↔rapide
  1b: fast↔rapide, car↔voiture
  2:  fast↔rapide

31 Parameters (start uniform)
  P(voiture|fast) = 1/2   P(rapide|fast) = 1/2   P(voiture|car) = 1/2   P(rapide|car) = 1/2

32 Weight Calculations
Current parameters: P(voiture|fast) = P(rapide|fast) = P(voiture|car) = P(rapide|car) = 1/2

  A    P(A,F|E)          P(A|F,E)
  1a   1/2 * 1/2 = 1/4   (1/4) / (2/4) = 1/2
  1b   1/2 * 1/2 = 1/4   (1/4) / (2/4) = 1/2
  2    1/2               (1/2) / (1/2) = 1
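These weight calculations can be checked mechanically. A sketch that reproduces the table above from the uniform parameters, with each alignment written as an explicit list of (f, e) links:

```python
p_t = {("voiture", "fast"): 0.5, ("rapide", "fast"): 0.5,
       ("voiture", "car"): 0.5, ("rapide", "car"): 0.5}

alignments = {"1a": [("voiture", "fast"), ("rapide", "car")],
              "1b": [("rapide", "fast"), ("voiture", "car")],
              "2":  [("rapide", "fast")]}

def joint(links):  # P(A,F|E): one Pt factor per link
    p = 1.0
    for f, e in links:
        p *= p_t[(f, e)]
    return p

# P(A|F,E): normalize over the alignments competing for one pair.
for pair in (["1a", "1b"], ["2"]):
    z = sum(joint(alignments[name]) for name in pair)
    for name in pair:
        print(name, joint(alignments[name]) / z)
# 1a 0.5 / 1b 0.5 / 2 1.0, matching the table.
```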

33 Count Lines
Alignment weights P(A|F,E): 1a = 1/2, 1b = 1/2, 2 = 1

34 Count Lines
Alignment weights P(A|F,E): 1a = 1/2, 1b = 1/2, 2 = 1

  #(voiture,fast) = 1/2   #(rapide,fast) = 1/2 + 1 = 3/2   #(voiture,car) = 1/2   #(rapide,car) = 1/2

35 Count Lines
Alignment weights P(A|F,E): 1a = 1/2, 1b = 1/2, 2 = 1

  #(voiture,fast) = 1/2   #(rapide,fast) = 1/2 + 1 = 3/2   #(voiture,car) = 1/2   #(rapide,car) = 1/2

Normalize:
  P(voiture|fast) = 1/4   P(rapide|fast) = 3/4   P(voiture|car) = 1/2   P(rapide|car) = 1/2

36 Parameters
  P(voiture|fast) = 1/4   P(rapide|fast) = 3/4   P(voiture|car) = 1/2   P(rapide|car) = 1/2

37 Weight Calculations
Current parameters: P(voiture|fast) = 1/4, P(rapide|fast) = 3/4, P(voiture|car) = 1/2, P(rapide|car) = 1/2

  A    P(A,F|E)          P(A|F,E)
  1a   1/4 * 1/2 = 1/8   (1/8) / (4/8) = 1/4
  1b   1/2 * 3/4 = 3/8   (3/8) / (4/8) = 3/4
  2    3/4               (3/4) / (3/4) = 1

38 Count Lines
Alignment weights P(A|F,E): 1a = 1/4, 1b = 3/4, 2 = 1

39 Count Lines
Alignment weights P(A|F,E): 1a = 1/4, 1b = 3/4, 2 = 1

  #(voiture,fast) = 1/4   #(rapide,fast) = 3/4 + 1 = 7/4   #(voiture,car) = 3/4   #(rapide,car) = 1/4

40 Count Lines
Alignment weights P(A|F,E): 1a = 1/4, 1b = 3/4, 2 = 1

  #(voiture,fast) = 1/4   #(rapide,fast) = 3/4 + 1 = 7/4   #(voiture,car) = 3/4   #(rapide,car) = 1/4

Normalize:
  P(voiture|fast) = 1/8   P(rapide|fast) = 7/8   P(voiture|car) = 3/4   P(rapide|car) = 1/4

41 After many iterations:
Alignment weights P(A|F,E): 1a ≈ 0, 1b ≈ 1, 2 = 1
  P(voiture|fast) = 0.001   P(rapide|fast) = 0.999   P(voiture|car) = 0.999   P(rapide|car) = 0.001
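The whole loop of slides 31-41 fits in a short program. A sketch that reproduces the numbers above: 1/4, 3/4, 1/2, 1/2 after one iteration; 1/8, 7/8, 3/4, 1/4 after two; near 0/1 after many:

```python
from collections import defaultdict

# The corpus of slide 29, with each pair's competing alignments
# written as (f, e) link lists, as on slide 30.
pairs = [
    [[("voiture", "fast"), ("rapide", "car")],    # 1a
     [("rapide", "fast"), ("voiture", "car")]],   # 1b
    [[("rapide", "fast")]],                       # 2
]

# Slide 31: start with uniform parameters.
p_t = {("voiture", "fast"): 0.5, ("rapide", "fast"): 0.5,
       ("voiture", "car"): 0.5, ("rapide", "car"): 0.5}

for iteration in range(25):
    counts, totals = defaultdict(float), defaultdict(float)
    for alternatives in pairs:
        # Weight calculations: P(A,F|E) for each alignment,
        # normalized within the pair to give P(A|F,E).
        joints = []
        for links in alternatives:
            p = 1.0
            for f, e in links:
                p *= p_t[(f, e)]
            joints.append(p)
        z = sum(joints)
        # Count lines, weighted by P(A|F,E).
        for links, p in zip(alternatives, joints):
            for f, e in links:
                counts[(f, e)] += p / z
                totals[e] += p / z
    # Normalize the counts into new parameters.
    p_t = {(f, e): c / totals[e] for (f, e), c in counts.items()}

print(p_t)  # P(rapide|fast) and P(voiture|car) -> ~1; the others -> ~0
```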

42 Seems too easy?
- What if you have no 1-word sentence?
- Words in shorter sentences will get more weight - fewer possible alignments
- Weight is additive throughout the corpus: if a word e shows up frequently with some other word f, P(f|e) will go up

43 Some things I skipped
- Enumerating all possible alignments:
  - Very easy with this model: the independence assumptions save us
- The model could be a lot better:
  - Word positions
  - Multiple f's generated by the same e
  - Can actually use an HMM!

44 The Final Product
- Now we have a model for P(F|E)
- Test it by aligning a corpus! That is, find argmax_A P(A|F,E)
- Use it for translation:
  - Combine with your favorite model for P(E)
  - Search the space of English sentences for the one that maximizes P(E)P(F|E) for a given F
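Aligning a corpus with the finished model is the easy half: because P(A|F,E) factors over French positions, each French word can independently pick the English word that best explains it. A sketch, using the converged table from slide 41 (the full translation search over English sentences is a much harder problem and is not shown):

```python
def best_alignment(french, english, p_t):
    # argmax_A P(A|F,E): the product factors over French positions,
    # so each f picks its best English generator independently.
    return [max(range(len(english)),
                key=lambda i: p_t.get((f, english[i]), 0.0))
            for f in french]

p_t = {("voiture", "fast"): 0.001, ("rapide", "fast"): 0.999,
       ("voiture", "car"): 0.999, ("rapide", "car"): 0.001}
print(best_alignment(["voiture", "rapide"], ["fast", "car"], p_t))
# [1, 0]: voiture <- car, rapide <- fast
```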

45 Questions?

