A. Cornuéjols IAA (basé sur Rob Schapire’s IJCAI’99 talk) Combiner des apprenants: le boosting.

A. Cornuéjols IAA (basé sur Rob Schapire’s IJCAI’99 talk) Combiner des apprenants: le boosting

Boosting 2 Prédiction de courses hippiques

Boosting 3 Comment gagner aux courses ? On interroge des parieurs professionnels Supposons: –Que les professionnels ne puissent pas fournir une règle de pari simple et performante –Mais que face à des cas de courses, ils puissent toujours produire des règles un peu meilleures que le hasard Pouvons-nous devenir riche?

Boosting 4 Idée Demander à l’expert des heuristiques Recueillir un ensemble de cas pour lesquels ces heuristiques échouent (cas difficiles) Ré-interroger l’expert pour qu’il fournisse des heuristiques pour les cas difficiles Et ainsi de suite… Combiner toutes ces heuristiques Un expert peut aussi bien être un algorithme d’apprentissage peu performant (weak learner)

Boosting 5 Questions Comment choisir les courses à chaque étape? å Se concentrer sur les courses les plus “difficiles” (celles sur lesquelles les heuristiques précédentes sont les moins performantes) Comment combiner les heuristiques (règles de prédiction) en une seule règle de prédiction ? å Prendre une vote (pondéré) majoritaire de ces règles

Boosting 6 Boosting boosting boosting = méthode générale pour convertir des règles de prédiction peu performantes en une règle de prédiction (très) performante Plus précisément : –Étant donné un algorithme d’apprentissage “faible” qui peut toujours retourner une hypothèse de taux d’erreur  1/2-  –Un algorithme de boosting peut construire (de manière prouvée) une règle de décision (hypothèse) de taux d’erreur  

Boosting 7 Soit X un espace d’entrée à 10 dimensions Les attributs sont indépendants et de distribution gaussienne L’étiquette est définie par : 2000 exemples d’apprentissages (1000+;1000-) 10000 exemples de test Apprentissage d’arbres de décision Illustration avec :

Boosting 8 Illustration (cont.) 0,5 0,4 0,3 0,2 0,1 0,0 0 100 200 300 400 Arbre à un noeud Arbre à 400 noeuds Arbres à un nœud avec boosting

Boosting 9 Plan Introduction au boosting (AdaBoost) Expériences Conclusion Analyse de l’erreur en apprentissage Analyse de l’erreur en généralisation basée sur la théorie des marges Extensions Bibliographie

Boosting 10 Étant donné l’échantillon d’apprentissage S = {(x 1,y 1 ),…,(x m,y m )} y i  {  } étiquette de l’exemple x i  S Pour t = 1,…,T : Construire la distribution D t sur {1,…,m} Trouver l’hypothese faible (“heuristique”) h t : S  {  } avec erreur petite  t sur D t : Retourner l’hypothèse finale h final Boosting : vue formelle

Boosting 11 Le principe général X h0h0 D0D0 X h1h1 D1D1 X h2h2 D2D2 X hThT DTDT Comment passer de D t à D t+1 ? Comment calculer la pondération  t ?

Boosting 12 AdaBoost [Freund&Schapire ’97] construire D t : Étant donnée D t et h t : où: Z t = constante de normalisation Hypothèse finale :

Boosting 13 AdaBoost en plus gros

Boosting 14 Exemple jouet

Boosting 15 Étape 1

Boosting 18 Hypothèse finale

Boosting 19 Une Applet Boosting http://www.research.att.com/~yoav/adaboost/index.html

Boosting 20 Avantages pratiques de AdaBoost (très) rapide simple + facile à programmer Une seul paramètre à régler : le nombre d’étapes de boosting (T) Applicable à de nombreux domaines par un bon choix de classifieur faible (neuro net, C4.5, …) Pas de sur-spécialisation par la maximisation des marges Peut être adapté au cas où ht : X  R ; la classe est définie par le signe de h t (x) ; la confiance est donnée par | h t (x) | Peut être adapté aux problèmes multi-classes où y i  {1,..,c} et aux problèmes multi-étiquettes Permet de trouver les exemples aberrants (outliers)

Boosting 21 Caveats performance depends on data & weak learner AdaBoost can fail if –weak hypothesis too complex (overfitting) –weak hypothesis too weak (  t  0 too quickly), underfitting Low margins  overfitting empirically, AdaBoost seems especially susceptible to noise

Boosting 22 Conclusion boosting useful tool for classification problems grounded in rich theory performs well experimentally often (but not always) resistant to overfitting many applications but slower classifiers result less comprehensible sometime susceptible to noise

Boosting 23 UCI Benchmarks Comparison with C4.5 (Quinlan’s Decision Tree Algorithm) Decision Stumps (only single attribute)

Boosting 24 UCI Results

Boosting 25 Analyzing the Training Error Theorem [Freund&Schapire ’97]: write  t as ½-  t then so if  t:  t   > 0 then Remark: AdaBoost is adaptive: does not need to know  or T a priori can exploit  t  

Boosting 26 Proof Intuition on round t : increase weight of examples incorrectly classified by h t if x i incorrectly classified by H final then x i incorrectly classified by weighted majority of h t ’s then x i must have “large” weight under final dist. D T+1 since total weight  1: number of incorrectly classified examples “small”

Boosting 27 A First Attempt at Analyzing Generalization Error we expect:  training error to continue to drop (or reach zero)  test error to increase when H final becomes “too complex” (Occam’s razor)

Boosting 28 A Typical Run Test error does not increase even after 1,000 rounds (~2,000,000 nodes) Test error continues to drop after training error is zero! Occam’s razor wrongly predicts “simpler” rule is better. (boosting on C4.5 on “letter” dataset)

Boosting 29 A Better Story: Margins Key idea: Consider confidence (margin): with define: margin of (x,y) =

Boosting 30 Margins for Toy Example

Boosting 31 The Margin Distribution epoch51001000 training error0.0 test error8.43.33.1 %margins  0.5 7.70.0 Minimum margin0.140.520.55

Boosting 32 Boosting Maximizes Margins Can be shown to minimize  to margin of (x i,y i )

Boosting 33 Analyzing Boosting Using Margins generalization error bounded by function of training sample margins:  larger margin  better bound  bound independent on # of epochs  boosting tends to increase margins of training examples by concentrating on those with smallest margin

Boosting 34 Relation to SVMs SVM: map x into high-dim space, separate data linearly

Boosting 35 Relation to SVMs (cont.)

Boosting 36 Relation to SVMs Both maximize margins: SVM:Euclidean norm ( L 2 ) AdaBoost:Manhattan norm ( L 1 ) Has implications for optimization, PAC bounds See [Freund et al ‘98] for details

Boosting 37 Extensions: Multiclass Problems Reduce to binary problem by creating several binary questions for each example: “does or does not example x belong to class 1?” “does or does not example x belong to class 2?” “does or does not example x belong to class 3?”...

Boosting 38 Extensions: Confidences and Probabilities Prediction of hypothesis h t : Confidence of hypothesis h t : Probability of H final : [Schapire&Singer ‘98], [Friedman, Hastie & Tibshirani ‘98]

Boosting 39 Text Categorization Decision stumps: presence of word or short phrase. Example: “If the word Clinton appears in the document predict document is about politics” database: Reutersdatabase: AP

Boosting 40 Many More Applications …in many recent papers on boosting!

Boosting 41 Background [Valiant’84] introduced theoretical PAC model for studying machine learning [Kearns&Valiant’88] open problem of finding a boosting algorithm [Schapire’89], [Freund’90] first polynomial-time boosting algorithms [Drucker, Schapire&Simard ’92] first experiments using boosting

Boosting 42 Backgroung (cont.) [Freund&Schapire ’95] –introduced AdaBoost algorithm –strong practical advantages over previous boosting algorithms experiments using AdaBoost: [Drucker&Cortes ’95][Schapire&Singer ’98] [Jackson&Cravon ’96][Maclin&Opitz ’97] [Freund&Schapire ’96][Bauer&Kohavi ’97] [Quinlan ’96][Schwenk&Bengio ’98] [Breiman ’96][Dietterich’98] continuing development of theory & algorithms: [Schapire,Freund,Bartlett&Lee ’97] [Schapire&Singer ’98] [Breiman ’97][Mason, Bartlett&Baxter ’98] [Grive and Schuurmans’98][Friedman, Hastie&Tibshirani ’98]

A. Cornuéjols IAA (basé sur Rob Schapire’s IJCAI’99 talk) Combiner des apprenants: le boosting.

Présentations similaires

Présentation au sujet: "A. Cornuéjols IAA (basé sur Rob Schapire’s IJCAI’99 talk) Combiner des apprenants: le boosting."— Transcription de la présentation:

Présentations similaires

Notre projet

Feed-back

Entrer

S'autoriser via un réseau social:

A. Cornuéjols IAA (basé sur Rob Schapire’s IJCAI’99 talk) Combiner des apprenants: le boosting.

Présentations similaires

Présentation au sujet: "A. Cornuéjols IAA (basé sur Rob Schapire’s IJCAI’99 talk) Combiner des apprenants: le boosting."— Transcription de la présentation:

Présentations similaires

Notre projet

Feed-back