Published by Jean-Marc Beauregard; modified more than 8 years ago.
1
SLVA Colloquium 15-17 MARCH 2002 1
2
Summary (French version)
Second language teaching has always been difficult, if not handicapped, by a lack of means to convey the large number of lexical items needed for basic use. Analysing how often the words in a corpus are used has begun to resolve this problem. An extremely useful approach to teaching English as a second language has recently been developed. It is based on word frequency and on a set of software tools that support both syllabus design and language learning. My contribution to this project has been to gather these concepts and tools, adapt them to the needs of researchers, teachers and learners, and put them online for universal access. The project is largely complete. I have begun the corpus analysis and software development for a similar approach to teaching and learning French. My presentation will address the methods and problems, both theoretical and practical, as well as the progress of the work.
3
Translating The Compleat Lexical Tutor into French
Is there a GSL (General Service List) and AWL (Academic Word List) in French?
4
Abstract
Second language teaching has always been hampered (if not crippled) by the lack of a means to teach the large number of vocabulary items needed for even basic functioning in a language. Computational corpus analysis has begun to solve this problem. A new approach to the teaching of English as a Second Language is based on word frequency and on the use of computational tools for both instructional design and learning. My role in this project has been to adapt these concepts and tools to the needs of researchers, teachers, and learners, and to put them online for universal access. This adaptation is now largely complete, and I have begun the corpus analysis and software development for a similar project in teaching and learning French. This presentation will discuss the procedures, problems (theoretical and practical) and progress of this project to date.
5
Key concept: text coverage
Words known    Text covered
10             24%
100            49%
1,000          72%
2,000          80%
3,000          84%
4,000          87%
5,000          88%
6,000          89%
44,000         99%
86,000         100%
Example words: THE, A, OF, I, BY…; HOUSE, BIG, WAY, GIRL…
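Each row of the table is a cumulative figure: the share of running text accounted for by the N most frequent words. Below is a minimal Python sketch of that calculation. It assumes plain word forms rather than the word families that the real lists are built on. The toy corpus and the helper names (`tokenize`, `top_n_list`, `coverage`) are hypothetical.

```python
import re
from collections import Counter
from typing import List, Set


def tokenize(text: str) -> List[str]:
    # Lower-cased runs of letters (accented letters included); digits and punctuation dropped.
    return re.findall(r"[^\W\d_]+", text.lower())


def top_n_list(corpus_tokens: List[str], n: int) -> Set[str]:
    # The n most frequent word forms in the corpus.
    return {word for word, _ in Counter(corpus_tokens).most_common(n)}


def coverage(text_tokens: List[str], word_list: Set[str]) -> float:
    # Percentage of running words (tokens) in the text that are on the list.
    if not text_tokens:
        return 0.0
    covered = sum(1 for token in text_tokens if token in word_list)
    return 100.0 * covered / len(text_tokens)


corpus = tokenize("the cat sat on the mat and the dog sat too")
text = tokenize("The dog and the cat.")
for n in (1, 2, 8):
    print(n, coverage(text, top_n_list(corpus, n)))
```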
6
Teaching implications
- Good news: there is a huge payoff for learning the first 2,000 words (80% text coverage).
- Bad news: after 2,000 words, the coverage gained through natural increase slows dramatically; only at about 6,000 words is 90% reached.
- Adequate comprehension relies on knowing 95% of the words in a text.
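The slowdown after 2,000 words can be read directly off the "Key concept" coverage table. The Python sketch below takes the table's 1,000 to 6,000 rows and computes how many coverage points each further 1,000 words adds. The function name is hypothetical.

```python
# Cumulative coverage from the "Key concept" table: words known -> % of text covered.
COVERAGE = {1000: 72, 2000: 80, 3000: 84, 4000: 87, 5000: 88, 6000: 89}


def marginal_gains(table):
    """Coverage points added by each successive band of the list."""
    sizes = sorted(table)
    return [(lo, hi, table[hi] - table[lo]) for lo, hi in zip(sizes, sizes[1:])]


for lo, hi, gain in marginal_gains(COVERAGE):
    print(f"{lo:>5} -> {hi:<5} +{gain} points")
```

Going from 1,000 to 2,000 words adds 8 points of coverage, while each of the last two steps adds only 1.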
7
Exploiting the good news
Pedagogical word lists in which the smallest amount of word learning…
… gives the largest amount of text coverage.
The General Service List (GSL; West, 1953) has served this function.
8
Countering the bad news
(1) AWL (Academic Word List): 570 post-2k word families that are frequent across the genre of 'academic texts' and can break through the 90% barrier.
(2) Domain-specific word lists: putative frequency zones within domains (law, medicine, economics) that may raise coverage to 95%.
Look for high-frequency items within genres and domains.
9
Targeting 95% … and frequency in domains
95% at 3,000 vs. 95% at c. 30,000
10
How to make use of this information
- Develop tests to locate a learner in the 1k-2k-AWL sequence?
- Give lists to learners? Out of context?
- Find reading texts for learners that target a specific lexical i+1?
- Give learners their own tools for corpus analysis?
The Compleat Lexical Tutor website attempts to make this information usable:
- for course designers
- for researchers
- for learners
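One way to read "find reading texts that target a specific lexical i+1" is as a screening step. Check each candidate text against the words a learner already knows, keep those at or above the 95% known-word threshold from the "Teaching implications" slide, and offer first the ones closest to that threshold. This is a hypothetical Python sketch: the learner's known-word set, the texts and the function names are all toy assumptions.

```python
import re
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Lower-cased runs of letters; digits and punctuation dropped.
    return re.findall(r"[^\W\d_]+", text.lower())


def known_coverage(text: str, known: Set[str]) -> float:
    # Percentage of the text's tokens that the learner already knows.
    tokens = tokenize(text)
    return 100.0 * sum(t in known for t in tokens) / len(tokens) if tokens else 0.0


def pick_texts(texts: Dict[str, str], known: Set[str],
               target: float = 95.0) -> List[Tuple[str, float]]:
    # Keep texts at or above the target; list the most challenging (lowest coverage) first.
    scored = [(title, known_coverage(body, known)) for title, body in texts.items()]
    return sorted([(t, c) for t, c in scored if c >= target], key=lambda pair: pair[1])


known_words = {"the", "cat", "sat", "on", "mat", "a", "dog", "ran"}
candidates = {
    "easy": "the cat sat on the mat",
    "stretch": "the cat sat on the mat " * 3 + "a zebra",  # 19 of 20 tokens known
    "hard": "the zebra ate the quince",
}
print(pick_texts(candidates, known_words))
```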
11
What can corpus analysis do for learners?
We have seen that corpus analysis can reduce the number of words to be learned.
The same tools can increase the number of contexts in which these words are presented.
The Compleat Lexical Tutor attempts to exploit both these capabilities.
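Presenting a word in many contexts is what a concordancer does. Below is a minimal keyword-in-context (KWIC) sketch in Python. It assumes pre-tokenized text and matches exact word forms only. It is an illustration, not the Lexical Tutor's own concordancer, and the names `kwic` and `show` are hypothetical.

```python
from typing import List, Tuple


def kwic(tokens: List[str], keyword: str, width: int = 4) -> List[Tuple[str, str, str]]:
    # One (left context, keyword, right context) triple per occurrence of the keyword.
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


def show(lines: List[Tuple[str, str, str]]) -> None:
    # Right-align the left context so the keyword forms a central column.
    for left, key, right in lines:
        print(f"{left:>30}  {key.upper()}  {right}")


text = "the committee will review the policy and the review will be public".split()
show(kwic(text, "review", width=3))
```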
12
To mention: research on learning from concordances
Multiple exposure to concordance contexts has the same effect as exposure to natural contexts, but in less time.
Cobb, T., & Horst, M. Reading academic English: Carrying learners across the lexical threshold. In J. Flowerdew & M. Peacock (Eds.), The English for Academic Purposes Curriculum (pp. 315-329). Cambridge: Cambridge University Press.
Cobb, T., Greaves, C., & Horst, M. Peut-on augmenter le rythme d'acquisition lexicale par la lecture ? Une expérience de lecture en français appuyée sur une série de ressources en ligne [Can the rate of lexical acquisition through reading be increased? A French reading experiment supported by a series of online resources]. In P. Raymond & C. Cornaire (Eds.), Regards sur la didactique des langues secondes. Montréal: Éditions logique.
13
Definitional vs. multicontextual learning with concordances
Classic off-line finding replicated
Example of cognitive efficiency (Cobb, 1997)
14
Could a similar approach be used for other languages (French)?
Open University (UK) French Project: Goodfellow, Jones & Lamy (2002)
- 1k and 2k lists developed by Selva
- French AWL developed by Jones
- Within the Nation and Laufer framework
- Purpose initially tutorial: to predict raters' assessments of student writing
- A hypothetical set of equivalent lists
15
Is there a French GSL and AWL?
Given the many uses found for Nation's text analysis program Vocabprofile, the question arises whether such a tool could be developed for other languages. The answer is not obvious, even for a language as lexically similar to English as French. For example, French may deploy its resources differently from English, say by using grammar where English uses lexis. Nor can it be taken for granted that anything resembling an Academic Word List (AWL) even exists in French, given that many of the medium-frequency items on the English AWL are common items in French.
(From the Horst & Cobb presentation in Leiden.)
16
Personal context for interest in French lists
Two questions come up without fail at the end of any demonstration of the Lexical Tutor website (which uses the GSL-AWL list framework):
- Do these lists exist in French?
- Is Français Fondamental the same as these lists?
A longstanding attempt to get research assistants to assemble such lists was always defeated by:
- the lack or inaccessibility of French corpora
- the lack of frequency studies in French
- laborious lemmatization (to flesh out families with huge membership)
17
Using the 1k-2k-AWL frame in a Lexical Tutor
18
Research context for interest in French lists
Alderson's (1982) "threshold" question is finally getting some sustained attention (Bernhardt, Bossers, Schoonen, Hulstijn).
Learners' L1 literacy could tell us a lot about how they can become literate in L2…
Need we teach "strategies" in L2 that already exist in L1 but are inactive because of a lack of L2 knowledge or ability?
There is a big need for validated comparative measures.
19
Now the first hurdle is cleared…
Thierry Selva and the OU group have finally got French lists over the initial hurdle:
- French frequency lists from the Parole corpus
- plus some automation of lemma building
The lists were used in an initial small study of OU students' writing:
- to predict raters' assessments of student writing
- finding: 2k-F can predict grade
- made into a pilot tutorial website
However, the power and behaviour of the OU French lists have not been extensively investigated.
20
Concepts
1. Frequency
2. Coverage
3. Specialist word list
4. Word family
5. Vocab profile
Words known    Cumulative text coverage
10             24%
100            49%
1,000          72%
2,000          80%
3,000          84%
4,000          87%
5,000          88%
6,000          89%
44,000         99%
86,000         100%
Example words: THE, A, OF, I, BY…; HOUSE, BIG, WAY, GIRL…
21
Lexical frequency profile of English
22
Frequency profile: L2 learning zone
23
Targeting 90%: frequency in genres…
90% at 2,500 vs. 90% after 6,000
24
Curves compared
25
GSL (1k + 2k) and AWL list samples
0-1000: A an; ABLE ability abler ablest ably abilities unable inability; ABOUT; ABOVE; ACCEPT acceptability acceptable unacceptable acceptance accepted accepting accepts; ACCORD accorded accordance according accordingly accords
1001-2000: ABROAD; ABSENT absence absences; ABSOLUTE absolutely; ACCIDENT accidents accidental accidentally; ACCUSE accusing accuses accused; ACCUSTOM accustoms accustoming accustomed; ACHE aching aches ached; ADMIRE admiring
AWL: ABANDON abandoned abandoning abandonment abandons; ABSTRACT abstraction abstractions abstractly abstracts; ACADEMY academia academic academically academics academies; ACCESS accessed accesses accessibility accessible accessing inaccessible; ACCOMMODATE accommodated
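In these samples, a word in capitals appears to open a family and the lower-case words after it are its members; the French samples later follow the same layout. Under that assumption, a list can be turned into a form-to-headword lookup, which a profiler needs in order to count families rather than forms. The parsing rule and the function name in this Python sketch are hypothetical.

```python
from typing import Dict, Optional


def parse_family_list(listing: str) -> Dict[str, str]:
    """Map each word form to its family headword (both lower-cased).

    Assumed layout: an all-capitals word opens a new family; the
    lower-case words that follow it are that family's members.
    """
    form_to_head: Dict[str, str] = {}
    head: Optional[str] = None
    for token in listing.replace(";", " ").split():
        if token.isupper():
            head = token.lower()
            form_to_head[head] = head
        elif head is not None:
            form_to_head[token] = head
    return form_to_head


sample = "A an ABLE ability abler ablest ably abilities unable inability ABOUT ABOVE"
families = parse_family_list(sample)
print(len(families), "forms in", len(set(families.values())), "families")
print(families["inability"])
```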
26
Visual Vocab Profile (1)
K1 words (1 to 1000): 69.37%
K2 words (1001 to 2000): 6.31%
AWL words (academic): 7.21%
Off-list words: 17.12%
Profile: Wireless technology giant Motorola Inc has introduced a range of new wireless telephones including one of the industry first third generation models that it hopes will add fuel to its recent resurgence in the global handset market On Thursday the Chicago based company unveiled five new wireless phones at a company design centre in Milan Italy as well as three existing models with enhanced technology and several accessories to provide additional entertainment and functionality Motorola which ranks far behind Finland Nokia as the world second largest maker of wireless phones has lost market share over the past decade to Nokia sleek and popular phones Motorola stock was off number cents at number number on the New York Stock Exchange on Friday In recent quarters Motorola has returned its wireless phone unit or Personal Communications Sector business to profitability and boosted market share with new models It said it had about number per cent of the global handset market in the fourth quarter of number half of Nokia leading share and has targeted number per cent long term Anyone who tracks our industry will recognize the new direction represented by our number portfolio Mike Zafirovski president of Motorola handset business said in a statement Design style elegance entertainment and overall experience these are all the hallmarks of a renewed and refocused Motorola PCS
How? Each word in the text is compared to three word lists…
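The "How?" step can be sketched as follows. Each token is checked against the K1, K2 and AWL lists in turn, and anything that matches none of them is counted as off-list. This Python sketch assumes the lists have already been expanded to all family members and uses tiny hypothetical lists. It is not VocabProfile's actual code.

```python
from typing import Dict, List, Set

BANDS = ("K1", "K2", "AWL")  # checked in this order; the first match wins


def vocab_profile(tokens: List[str], lists: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of tokens falling in K1, K2, AWL, or none of the lists."""
    counts = dict.fromkeys(BANDS + ("Off-List",), 0)
    for token in tokens:
        for band in BANDS:
            if token in lists[band]:
                counts[band] += 1
                break
        else:  # no list matched this token
            counts["Off-List"] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {band: round(100.0 * n / total, 2) for band, n in counts.items()}


# Hypothetical toy lists, already expanded to family members.
toy_lists = {
    "K1": {"the", "company", "a", "new"},
    "K2": {"phone"},
    "AWL": {"technology"},
}
tokens = "the company unveiled a new technology phone".split()
print(vocab_profile(tokens, toy_lists))
```

As on the slide, the rounded band percentages need not add up to exactly 100.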
27
Visual Vocab Profile (2)
K1 words (1 to 1000): 63.75%
K2 words (1001 to 2000): 6.25%
AWL words (academic): 18.33%
Off-list words: 11.67%
Profile: Automatic extraction of keywords from scientific text Application to the knowledge domain of protein families Abstract Annotation of the biological function of different protein sequences is a time consuming process currently performed by human experts Genome analysis tools encounter great difficulty in performing this task Database curators developers of genome analysis tools and biologists in general could benefit from the access to tools able to suggest functional annotations and facilitate access to functional information In the paper we present a prototype system for the automatic annotation of protein function The system is triggered by collections of abstracts related to a given protein and it is able to extract biological information directly from scientific literature i e MEDLINE abstracts Relevant keywords are selected by their relative accumulation in comparison with a domain specific background distribution Simultaneously the most representative sentences and MEDLINE abstracts are selected and presented to the end user Evolutionary information is considered as a predominant characteristic in the domain of protein function Our system consequently extracts domain specific information from the analysis of a set of protein families The system has been tested with different protein families of which three examples are discussed in detail in the paper ataxia telangiectasia associated protein ran GTPase and carbonic anhydrase We found generally good correlation between the amount of information provided to the system and the quality of the annotations The current limitations and future developments of the system are discussed
28
Sample lexical profiles (in quasi-academic school texts)
29
Concepts
1. Frequency: the extent to which words are repeated
2. Text coverage: the percentage of running text that a frequency list can (reliably) account for
3. Specialist word list: a frequency list drawn not from the language at large but from a text genre (e.g., the AWL), used to increase coverage
4. Word family: a word plus its inflections and derivations, where no extra learning is required
5. Vocab profile: sorting and counting the words of a text by frequency zone
30
Interesting questions about the OU French lists
1. How does their coverage profile compare to that of the English lists?
2. How similar are the new lists to the old FF (Français Fondamental), in terms of contents and in terms of text coverage?
3. Is there a French AWL? All of French is Greco-Latin…
4. How useful are the current lists for research? What are the potential uses of a validated French GSL-AWL?
5. How useful are the current lists for teaching? Can the French lists be adapted for an online Lexical Tutor?
31
Method
Feasibility/pilot-type study:
- comparison of small sets of parallel texts
- Vocabprofile analysis, by machine and by hand
- list sampling
- proper nouns not eliminated or reclassified, but roughly equal between languages
Findings are suggestive and require confirmation with larger texts.
32
Question 1. How does the text coverage profile compare to English?
33
French list samples
0-1000: A au aux; ABANDONNER abandonner abandonne abandonnes abandonnons abandonnez abandonnent abandonnais abandonnait abandonnions abandonniez abandonnaient abandonnai abandonnât abandonnassions abandonnassiez abandonnassent abandonnerais abandonnerait abandonnerions abandonneriez abandonneraient abandonnant abandonné abandonnée abandonnés abandonnées; ABORD abords
1001-2000: ABOUTIR aboutir aboutis aboutit aboutissons aboutissez aboutissent aboutissais aboutissait aboutissions aboutissiez aboutissaient aboutîmes aboutîtes aboutirent aboutirai aboutiras aboutira aboutirons aboutirez aboutiront aboutisse aboutisses aboutît aboutissions aboutissiez aboutissent aboutirais aboutirait aboutirions
AWL?: ABOLITION abolition abolitions; ABONDANCE abondance; ABONDANT abondant abondante abondants abondantes; ABSENT absent absente absents absentes; ABSOLUMENT absolument; ABSTRAIT abstrait abstraits abstraites abstraite; ACCÉDER accéder accède accédassions accédassiez accédassent
34
Comparison text samples
English: Traditional methods of teaching are no longer enough in this technological world. Currently there are more than 100,000 computers in schoolrooms in the United States. Students, mediocre and bright alike, from the first grade through high school, not only are not intimidated by computers, but have become enthusiastic users. Children are very good at using computers in their school curriculum. A music student can program musical notes so that the computer will play Beethoven or the Beatles. In a biology class, the computer can produce a picture of the complex actions of the body's organs, thus enabling today's students to understand human biology more deeply. A nuclear reactor is no longer a puzzle to students who can see its workings in minute detail on a computer. In Wisconsin, the Chippewa Indians are studying their ancient and almost forgotten language with the aid of a computer. The simplest computers aid the handicapped, who learn more rapidly from the computer than from humans. Once a source of irritation, practice and exercises on the computer are now helping children to learn because the machine responds to correct answers with praise and to incorrect answers with sad faces and even an occasional tear.
French: Les méthodes traditionnelles d'enseignement ne suffisent plus dans ce monde de technologie. Présentement, on compte plus de 100 000 ordinateurs dans les salles de classe aux États-Unis. Les étudiants, moyens et brillants pareils, de la première année jusqu'à la fin du secondaire, sont non seulement peu intimidés par les ordinateurs mais sont même devenus des utilisateurs enthousiastes. Les enfants sont très doués pour ce qui est d'utiliser les ordinateurs dans leur curriculum scolaire. Un étudiant en musique peut programmer des notes pour que l'ordinateur joue Beethoven ou les Beatles. Dans un cours de biologie, l'ordinateur peut produire une image du fonctionnement complexe des organes du corps, permettant ainsi à l'étudiant de comprendre plus en profondeur les principes de la biologie humaine. Un réacteur nucléaire n'a plus de mystères pour les étudiants qui peuvent observer son fonctionnement en détails sur l'ordinateur. Dans le Wisconsin, les indiens Chippewa s'en servent pour étudier leur langue, ancienne et presque oubliée. Les ordinateurs les plus simples aident les personnes handicapées qui apprennent plus rapidement d'une machine que d'une autre personne. Jadis une source d'irritation, la pratique et les exercices sur l'ordinateur aident maintenant les enfants à apprendre parce que la machine complimente les bonnes réponses et présente des visages tristes et même parfois une larme pour les mauvaises réponses.
Bilingual texts courtesy of N. Segalowitz.
35
VP 'Computer learning'
Words in text (tokens): 200 (100%)
Different words (types): 125
Type-token ratio: 0.62
K1 words (1 to 1000): 153 (76.50%)
K2 words (1001 to 2000): 10 (5.00%)
AWL words (academic): 22 (11.00%)
Off-list words: 15 (7.50%)
Profile: Traditional methods of teaching are no longer enough in this technological world Currently there are more than number number computers in schoolrooms in the United States Students mediocre and bright alike from the first grade through high school not only are not intimidated by computers but have become enthusiastic users Children are very good at using computers in their school curriculum A music student can program musical notes so that the computer will play Beethoven or the Beatles In a biology class the computer can produce a picture of the complex actions of the body organs thus enabling today students to understand human biology more deeply A nuclear reactor is no longer a puzzle to students who can see its workings in minute detail on a computer In Wisconsin the Chippewa Indians are studying their ancient and almost forgotten language with the aid of a computer The simplest computers aid the handicapped who learn more rapidly from the computer than from humans Once a source of irritation practice and exercises on the computer are now helping children to learn because the machine responds to correct answers with praise and to incorrect answers with sad faces and even an occasional tear
36
VP 'Apprentissage par ordinateur' (Computer learning)
Words in text (tokens): 226 (100%)
Different words (types): 131
Type-token ratio: 0.58
K1 words (1 to 1000): 178 (78.76%)
K2 words (1001 to 2000): 23 (10.18%)
Academic (AWL) words: 8 (3.54%)
Off-list words: 17 (7.52%)
Full profile: Les méthodes traditionnelles de enseignement ne suffisent plus dans ce monde de technologie Présentement on compte plus de nombre ordinateurs dans les salles de classe aux états Unis Les étudiants moyens et brillants pareils de la première année jusque la fin du secondaire sont non seulement peu intimidés par les ordinateurs mais sont même devenus des utilisateurs enthousiastes Les enfants sont très doués pour ce qui est de utiliser les ordinateurs dans leur curriculum scolaire Un étudiant en musique peut programmer des notes pour que le/la ordinateur joue Beethoven ou les Beatles Dans un cours de biologie le/la ordinateur peut produire une image du fonctionnement complexe des organes du corps permettant ainsi à le/la étudiant de comprendre plus en profondeur les principes de la biologie humaine Un réacteur nucléaire ne a plus de mystères pour les étudiants qui peuvent observer son fonctionnement en détails sur le/la ordinateur Dans le Wisconsin les indiens Chippewa se/si en servent pour étudier leur langue ancienne et presque oubliée Les ordinateurs les plus simples aident les personnes handicapées qui apprennent plus rapidement de une machine que de une autre personne Jadis une source de irritation la pratique et les exercices sur le/la ordinateur aident maintenant les enfants à apprendre parce que la machine complimente les bonnes réponses et présente des visages tristes et même parfois une larme pour les mauvaises réponses
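The percentages in the two VP outputs for 'Computer learning' follow directly from the raw counts. The Python sketch below recomputes them and adds a running total, so that the cumulative K1+K2 and K1+K2+AWL coverage of the English and French versions can be compared side by side. The function name is hypothetical.

```python
def profile_from_counts(tokens, types, bands):
    """Band percentages plus running (cumulative) coverage, from raw counts."""
    if sum(bands.values()) != tokens:
        raise ValueError("band counts should add up to the token total")
    result = {"TTR": round(types / tokens, 2)}
    running = 0
    for band, n in bands.items():
        running += n
        # (share of this band, cumulative share up to and including this band)
        result[band] = (round(100 * n / tokens, 2), round(100 * running / tokens, 2))
    return result


english = profile_from_counts(200, 125, {"K1": 153, "K2": 10, "AWL": 22, "Off-List": 15})
french = profile_from_counts(226, 131, {"K1": 178, "K2": 23, "AWL": 8, "Off-List": 17})
for name, prof in (("English", english), ("French", french)):
    print(name, prof)
```

For this pair of texts, French K1+K2 reaches 88.94% against 81.5% for English, while the AWL band runs the other way (3.54% v. 11.00%). The K1+K2+AWL totals are almost identical (92.48% v. 92.50%).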
37
Typical English lexical profiles (in quasi-academic school texts)
38
Coverage of the same texts in translation
39
Coverage comparison (2)
Answers to Q.1:
- The French 1000 and 2000 lists seem to give even better, more reliable coverage than their English equivalents.
- But the English AWL has better coverage.
- There is a hint of a crossover between the French 2000 list and the English AWL.
40
Question 2. How similar are the new corpus-based OU lists to the old FF?
1. Contents
2. Coverage
41
FF v. OU corpus lists: contents
E.g., FF-1 for "T" = 87 words; 43% overlap
Only on FF-1 (35): tabac tailler se taire tante tarder tas tasse taxi télégramme téléphoner tel quel tellement tendre tente terrible timbre thé théorie tissu toit tondre tonnerre tort tousser en train de tranquille traverser tribunal tricot tricoter triste se tromper trottoir trou truc
On both lists (26): table tableau tant tard téléphone temps tenir terrain terre tête théâtre tirer tomber toucher toujours tour tourner tout train travail travailler travailleur très trop trouver tuer
Only on corpus 1K (26): tandis technique tel télévision tenter terme terminer territoire texte thème titre ton tôt total toutefois tradition traditionnel traitement traiter transport travers trente trois troisième tu type
42
FF v. OU corpus lists: contents (2)
E.g., FF-2 for "D" = 152 words; 33% overlap
Only on FF-2 (66): dactylo date davantage de débarrasser décision déclaration déclarer décourager découvrir défaire défendre définition dégoût dégoûter degré delà demande déménager demoiselle dentiste département dépasser des dès désert désespérer désirer désolé désoler désordre détacher détester détour deuil développer deviner dévoué dévouer diable dictionnaire difficulté digérer digne dimanche diriger discussion disposer distinguer distraction distrait distribution divers diviser divorcer document domaine domestique dommage doré douanier doucement douceur douleur douter durée
On both lists (33): débuter décrire défaut défense déficit définir délai délégué délicat demeurer dépendre dépense déplacer déposer député destiner détail déterminer détruire dette dimension diminuer directement discours disparition disputer distance distribuer distribution domicile dominer drame dresser
Only on corpus 2K (53): dame danger dangereux danse dater débat déchet décor défaite défenseur défi définitif définitivement dégager délégation demain démarche démission démocrate démocratie démocratique démontrer dénoncer dépit déplacement dépôt dérouler descendre désigner désir désormais dessin dessiner dessous dessus détenir dieu diffuser diffusion discipline discret discuter disponible dispositif disposition disque documentaire documentation dos doubler drogue duquel durer
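The counts on the two "contents" slides fit one reading. The 87 and 152 totals are the union of both lists for that letter. Overlap is the share of the FF list that also appears on the corpus list: 26 / 61 ≈ 43% for "T" and 33 / 99 ≈ 33% for "D". This is an inference from the numbers, not something the slides state. The Python sketch below computes these measures from word sets and also from the printed counts. The function names and the small example sets are hypothetical.

```python
from typing import Dict, Set


def compare_lists(ff: Set[str], corpus: Set[str]) -> Dict[str, object]:
    both = ff & corpus
    return {
        "only_ff": sorted(ff - corpus),
        "both": sorted(both),
        "only_corpus": sorted(corpus - ff),
        "union_size": len(ff | corpus),
        # Overlap taken as the share of the FF list also found on the corpus list.
        "overlap_pct": round(100 * len(both) / len(ff)) if ff else 0,
    }


def overlap_from_counts(only_ff: int, both: int, only_corpus: int):
    # The same two measures, from the counts printed on the slides.
    return only_ff + both + only_corpus, round(100 * both / (only_ff + both))


print(overlap_from_counts(35, 26, 26))  # letter T: FF-1 v. corpus 1K
print(overlap_from_counts(66, 33, 53))  # letter D: FF-2 v. corpus 2K
```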
43
FF v. OU lists: coverage (by hand analysis)
Suggestion: corpus-based lists do not necessarily give much better coverage.
Analysis of many more texts is needed, which is a lot of labour since FF is not lemmatized.
44
FF v. OU lists (2): coverage (by hand analysis)
45
FF v. OU lists (3): the pattern seems consistent
46
Q.2 How similar are the corpus-based OU lists to the old FF?
Answer… The content is quite different, but coverage seems similar.
A large amount of corpus work may add only a few coverage points…
It is a pity that Français Fondamental was never developed further:
- lemmatized
- incorporated in something like VocabProfile
- used more to inform proficiency levels, materials development and testing
FF might have been the French GSL.
In 2002, FF does not exist on the WWW.
47
Question 3. Is there a French AWL?
We have already noted that French AWL coverage is low.
48
French v. English medical texts (CMAJ, translated, mainly E→F)
49
French v. English EU debates (10 × 2,000 words, translated F→E)
50
Findings
- Coverage totals are similar, but their composition is not: Fr 1+2 = Eng 1+2+AWL.
- AWL-Fr consistently accounts for only 3.5% of text.
- There are hints of crossovers: AWL-En / 2k-Fr and AWL-En / 1k-Fr.
51
Problem with AWL-Fr: a general or a specialist list?
Classic coverage profile of general English:
Words known    Text covered
10             24%
100            49%
1,000          72%
2,000          80%
3,000          84%
4,000          87%
5,000          88%
With 3.5% coverage, AWL-Fr appears to be merely the 2000-3000 stretch of a general frequency list (assuming a distribution similar to that of English).
52
How AWL-Fr seems to have been made
- A post-2k list of about 1,000 word families extracted from the large, academic Parole corpus
- But with no obvious breakdown by domains, hence no range considerations
- Odd family groupings
- Many items on AWL-Fr have already appeared in another guise in the 2k or even the 1k list
It may have been adequate for its original purpose…
53
Q.3 Is there a French AWL?
Answer… At this point we do not know.
To find out, we need:
- a French academic corpus broken down by domain (medicine, philosophy, economics…)
- to look for a post-2K high-frequency zone common across domains
Encouragement: the OU "AWL" seems to have some components of an AWL…
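The search described above can be sketched as a filter over per-domain frequency counts. Drop anything already on the 1k-2k lists, then keep the words that are both frequent overall and spread across domains (a range criterion). In this Python sketch, the thresholds `min_domains` and `min_total`, the domain names and all counts are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def awl_candidates(domain_counts: Dict[str, Counter], general_2k: Set[str],
                   min_domains: int, min_total: int) -> List[str]:
    """Post-2k words that are frequent overall and occur in enough domains."""
    totals: Counter = Counter()  # frequency summed over all domains
    spread: Counter = Counter()  # number of domains each word occurs in
    for counts in domain_counts.values():
        totals.update(counts)
        spread.update(word for word, n in counts.items() if n > 0)
    return sorted(
        word for word, total in totals.items()
        if word not in general_2k and total >= min_total and spread[word] >= min_domains
    )


domains = {
    "médecine": Counter({"analyse": 5, "patient": 9, "hypothèse": 2, "être": 30}),
    "philosophie": Counter({"analyse": 4, "hypothèse": 3, "concept": 8, "être": 40}),
    "économie": Counter({"analyse": 6, "marché": 10, "hypothèse": 1, "être": 25}),
}
print(awl_candidates(domains, {"être"}, min_domains=3, min_total=5))
```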
54
Question 4. How useful are the current lists for research?
Answer… The 1k + 2k lists appear stable and able to reliably predict 85% coverage of several text types.
They can probably be used for some interesting tasks:
- to compare the same students' formal writing and bulletin-board contributions (in progress)
- to investigate threshold issues: are our learners transferring language abilities from L1, or learning some abilities for the first time? (Link to interesting Dutch work)
55
Current research: comparison of the same francophone students' formal writing and bulletin-board contributions
- Graduate course in SLA
- Formal essays and bulletin-board contributions of the same adult francophone students (n = 16)
- Question: is French talk-written-down mainly 1k lexical items, as in English?
Profile:
56
Question 5. How useful are the French lists pedagogically?
57
Remembering the return-for-learning principle
We are looking for word lists in which the smallest number of items to learn…
… gives the largest amount of text coverage.
58
OU word "families": the most efficient (smallest) groupings? [1k-2k-AWL]
Headwords (basewords) across the frequency zones: a typical unbroken stretch from the combined OU 1k + 2k + AWL lists?
paradigme paradoxal paradoxalement paradoxe paraître parallèle parallèlement paramètre parc parce parcours pareil parent parenté parfait parfaitement parfois parisien parking parlement parler parmi parole part partager partenaire participant participation participer particularité particulier partiellement partir
59
Need for fewer, larger families
- OU families are built from inflectional forms; derivational forms are rarely included.
- But some derivational suffixes are extremely frequent and transparent in meaning, e.g. paradoxe n. / paradoxal adj. / paradoxalement adv.
- A Bauer & Nation (1993) treatment is needed: families, not lemmas.
- Producing 1k + 2k lists with fewer, larger families is a manageable job.
- It is not actually needed for the Lex Tutor application.
60
How much weight could the OU lists lose?
E.g., a stretch of "p" from AWL-Fr:
prétendre prétends prétendons prétendez prétendent prétendais prétendait prétendions prétendiez prétendaient prétendis prétendit prétendîmes prétendîtes prétendirent prétendrai prétendras prétendra prétendrons prétendrez prétendront prétende prétendes prétendisse prétendisses prétendît prétendissions prétendissiez prétendissent prétendrais prétendrait prétendrions prétendriez prétendraient prétendant prétendu prétendue prétendus prétendues
prétention prétentions
provisoire provisoires
psychologie psychologies
psychologique psychologiques
Five families => 3 if -tion and -ique can be assumed obvious.
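The five-to-three reduction can be approximated with a crude suffix-stripping heuristic. Strip an "obvious" suffix (here only -tion and -ique). If another headword that carries no such suffix starts with the remaining stem, merge the two families. This Python sketch is hypothetical throughout, including the minimum stem length and the exception list, which is taken from the politique/critique caveat on the following slide. It is not the Bauer & Nation procedure.

```python
from typing import Dict, List, Optional

OBVIOUS = ("tion", "ique")  # suffixes assumed transparent for this example
EXCEPTIONS = frozenset({"politique", "critique"})  # suffix treated as part of the root
MIN_STEM = 4  # hypothetical: shortest stem allowed after stripping a suffix


def find_base(word: str, headwords: List[str]) -> Optional[str]:
    # Return a suffix-free headword that starts with the word's stem, if any.
    if word in EXCEPTIONS:
        return None
    for suffix in OBVIOUS:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM:
            bases = [h for h in headwords
                     if h != word and h.startswith(stem) and not h.endswith(OBVIOUS)]
            if bases:
                return min(bases, key=len)
    return None


def collapse_families(headwords: List[str]) -> Dict[str, List[str]]:
    # Bases never end in an obvious suffix, so they are never absorbed themselves.
    bases = {word: find_base(word, headwords) for word in headwords}
    families = {word: [word] for word in headwords if bases[word] is None}
    for word in headwords:
        if bases[word] is not None:
            families[bases[word]].append(word)
    return families


p_stretch = ["prétendre", "prétention", "provisoire", "psychologie", "psychologique"]
for head, members in collapse_families(p_stretch).items():
    print(head, members)
```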
61
254 => 194 base words if we can assume these suffixes are obvious (a loss of 60 base words, roughly 24%):
-aire -ance -ant -ate -isation -ation -é -el -ence -ent -er -eté -eur -eux -ial -ible -ie -ième -ier -if -ion -ique -isme -iste -ité -ment -tion -ure (plus plural and feminine forms)
Loosely applying the Bauer & Nation (1993) family criteria, based on:
- frequency of the suffix
- obviousness
- non-distortion of the root
- likelihood that the learner knows it
Counting frequency:
Instances of -ique in the Le Monde corpus = 7,745; instances of -iques = 2,552.
Many instances relate neatly to a root (pornographique, scientifique), but others are not so neat or obvious (politique, critique). In these cases, -ique is an integral part of the root form...
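The counting step is the simple part. Deciding whether a suffix is actually detachable (scientifique v. politique) needs human judgement. The Python sketch below does only the counting, on a toy token list. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def suffix_counts(tokens: Iterable[str], suffixes: Iterable[str]) -> Counter:
    # A token is counted once for every listed suffix it ends with;
    # "-iques" is a separate entry because "...iques" does not end in "ique".
    suffixes = list(suffixes)
    counts: Counter = Counter()
    for token in tokens:
        for suffix in suffixes:
            if token.endswith(suffix):
                counts[suffix] += 1
    return counts


toy_tokens = ["scientifique", "politiques", "critique", "nation", "table"]
print(suffix_counts(toy_tokens, ["ique", "iques", "tion"]))
```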
62
Q.5 Are the 1k and 2k OU lists pedagogically useful?
With minor modifications, arguably quite useful:
- to rationalize the vocabulary learning task
- to grade materials lexically with VP
- since the lists are quite similar to FF, we can build on modifications of that work
- for frequency-based recognition tests
- for comparing recognition tests with active production, using VP
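Frequency-based recognition tests of the Yes-No kind mix real words with pseudowords so that guessing can be detected. The slides do not say how such tests are scored. The Python sketch below uses one common correction-for-guessing formula, (h - f) / (1 - f), where h is the hit rate on real words and f the false-alarm rate on pseudowords. Treat the formula choice, the function name and the invented pseudowords as assumptions.

```python
from typing import Dict, Set


def yes_no_score(responses: Dict[str, bool], real_words: Set[str]) -> float:
    """Correction-for-guessing score for a Yes/No vocabulary test.

    responses maps each test item to True ("I know this word") or False.
    Items not in real_words are treated as pseudowords.
    """
    real = [item for item in responses if item in real_words]
    pseudo = [item for item in responses if item not in real_words]
    if not real or not pseudo:
        raise ValueError("the test needs both real words and pseudowords")
    hits = sum(responses[item] for item in real) / len(real)
    false_alarms = sum(responses[item] for item in pseudo) / len(pseudo)
    if false_alarms == 1:
        return 0.0  # claimed every pseudoword: no usable signal
    return (hits - false_alarms) / (1 - false_alarms)


# "brinchard" and "polastre" are invented pseudowords for illustration.
answers = {"maison": True, "jardin": True, "tonnerre": True, "dentiste": False,
           "brinchard": True, "polastre": False}
print(yes_no_score(answers, {"maison", "jardin", "tonnerre", "dentiste"}))
```

A negative score is possible when a learner says yes to pseudowords more often than to real words.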
63
Using 1k-2k lists in a French Lexical Tutor
Practical problems:
- A suitable million-word learning corpus is hard to find.
- Frequency-based tests are hard to find.
- Complete click-on word lists are impractically large (families v. lemmas).
- The accent-handling powers of computers are erratic, a problem with user-entered text (VP): from what type of keyboard, etc.?
Live demo: realization to date (or slides)
64
Solutions
- Million-word corpora are hard to find: T. Selva donates 1 million words of Le Monde (1998).
- Frequency-based tests are hard to find: P. Meara donates FF-based Yes-No tests.
- Complete lemmatized click-on word lists are truly enormous. Strategy: word root + "starts-with" search, which picks up derivations and inflections.
- The accent-handling powers of computers are erratic at the beginnings and endings of words (é, ë, ç). Strategy: “_à “ agrav “_à “
Live show: realization to date
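The "starts-with" strategy can be sketched as a prefix filter over word forms: one root such as "prétend" retrieves its inflections and some derivations. The optional accent folding here uses Unicode NFD decomposition and then drops the combining marks. It is a standard, swapped-in technique for matching words regardless of diacritics, not the "_à" strategy on the slide, whose details cannot be recovered from the text. Function names are hypothetical.

```python
import unicodedata
from typing import Iterable, List


def fold_accents(word: str) -> str:
    # Decompose letters (é -> e + combining acute) and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def starts_with(forms: Iterable[str], root: str, fold: bool = False) -> List[str]:
    # Word forms beginning with the root, optionally ignoring accents.
    def key(s: str) -> str:
        return fold_accents(s.lower()) if fold else s.lower()

    prefix = key(root)
    return [form for form in forms if key(form).startswith(prefix)]


forms = ["prétendre", "prétendu", "prétention", "prendre", "élève", "élever"]
print(starts_with(forms, "prétend"))
print(starts_with(forms, "elev"), starts_with(forms, "elev", fold=True))
```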
65
Using 1k-2k lists in Lexical Tutor (1)
66
Using 1k-2k lists in Lexical Tutor (2)
First 1000 list so constituted = 902 families
67
Conclusions (1)
Is there a GSL in French?
- The French 1k and 2k lists seem to give coverage similar to the GSL's and can be used for the tasks the GSL was used for: tutorial and research.
- There may have been a GSL-like French list all along (FF).
- The final French 1k-2k lists with 80% coverage will probably contain fewer than 2,000 word families.
68
Conclusions (2)
Is there an AWL in French?
- The question cannot be answered at present.
- It awaits the development of a French academic corpus with a breakdown by domain.
- If there is a French AWL, it will not, as in English, be defined by general frequency.
69
Contact details
Lexical Tutor website: http://132.208.224.131/
Email addresses:
Marlise Horst: marlise@education.concordia.ca
Tom Cobb: cobb.tom@uqam.ca