Published by Anatole Beauregard; last modified more than 9 years ago
1
Comparable Corpora BootCaT (CCBC)
Adam Kilgarriff, Avinesh PVS
Lexical Computing Ltd
2
BootCaT: Bootstrapping Corpora and Terms
Translators
– Know the language
– Are not domain experts
– Can interpret domain terms but can't guess them
Instant domain corpus from the web
Marco Baroni and Silvia Bernardini (2004)
3
BootCaT method
Piggyback on a search engine
– Google, Yahoo, Bing
Start from a set of seed terms
Repeat:
– Take 3 random seeds
– Send them to the search engine
– Gather the pages returned as search hits
Remove duplicates, find terms
– Can iterate
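A minimal sketch of two steps on this slide: drawing random 3-seed tuples and turning each into a query string, then ranking candidate terms from the gathered pages. The search call itself is left out, since the slide names no API. The term-ranking function is an assumption: it uses a smoothed per-million frequency ratio against a reference corpus, which is one common keyword measure, not necessarily what BootCaT uses. All function names are hypothetical.

```python
import random
from collections import Counter


def make_tuples(seeds, n_tuples=10, tuple_size=3, rng=None):
    """Draw up to n_tuples distinct random combinations of seed terms."""
    if len(seeds) < tuple_size:
        raise ValueError("need at least tuple_size seed terms")
    rng = rng or random.Random()
    seen, tuples = set(), []
    # Cap the attempts so a small seed list cannot loop forever.
    for _ in range(n_tuples * 100):
        if len(tuples) == n_tuples:
            break
        combo = tuple(sorted(rng.sample(list(seeds), tuple_size)))
        if combo not in seen:
            seen.add(combo)
            tuples.append(combo)
    return tuples


def to_query(combo):
    """Join one tuple into a query string, quoting multiword terms as phrases."""
    return " ".join(f'"{t}"' if " " in t else t for t in combo)


def keyness(domain_counts, ref_counts, smoothing=1.0):
    """Rank words by smoothed per-million frequency ratio (assumed measure)."""
    dom_total = sum(domain_counts.values())
    ref_total = sum(ref_counts.values())
    scores = {}
    for word, freq in domain_counts.items():
        dom_pm = freq / dom_total * 1_000_000
        ref_pm = ref_counts.get(word, 0) / ref_total * 1_000_000 if ref_total else 0.0
        scores[word] = (dom_pm + smoothing) / (ref_pm + smoothing)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    seeds = ["volcanology", "tephra", "magma", "volcanic ash", "rhyolitic"]
    for combo in make_tuples(seeds, n_tuples=3, rng=random.Random(0)):
        # Each string would go to the (unspecified) search-engine step.
        print(to_query(combo))
    domain = Counter("magma tephra the magma of the".split())
    reference = Counter("the of the a magma the of".split())
    print(keyness(domain, reference))
```

Top-ranked words from one round can be added to the seed list for the next round, which is the "Can iterate" step.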
4
WebBootCaT
Web interface
Improved cleaning and duplicate removal
Integrated with a corpus tool (Sketch Engine)
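The slide does not say how WebBootCaT removes duplicates. As an illustration only, here is one standard approach: drop exact duplicates by hashing whitespace- and case-normalized text, then drop near-duplicates whose word 5-gram shingle sets overlap above a Jaccard threshold. The function names and the 0.8 default threshold are hypothetical choices.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text, n=5):
    """Set of word n-grams; very short texts become a single shingle."""
    words = normalize(text).split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dedupe(pages, threshold=0.8, n=5):
    """Keep the first copy of each page; drop exact and near duplicates."""
    kept, kept_shingles, seen_hashes = [], [], set()
    for page in pages:
        digest = hashlib.sha1(normalize(page).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        sh = shingles(page, n)
        if any(jaccard(sh, other) >= threshold for other in kept_shingles):
            continue  # near duplicate of a page already kept
        seen_hashes.add(digest)
        kept.append(page)
        kept_shingles.append(sh)
    return kept
```

The pairwise comparison is quadratic in the number of pages, which is acceptable for a few hundred search hits; larger collections would normally use a hashing scheme such as MinHash instead.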
9
Going multilingual
Google Translate the seed terms
– English: volcanology volcanologist "volcanic eruption" seismographs Eyjafjallajokull geodic "deformation monitoring" tephra magma stratigraphic tephrochronology geochronological "volcanic ash" ablation rhyolitic
– French: vulcanologue volcanologie "éruption volcanique" sismographes Eyjafjallajokull "surveillance de la déformation" géodiques tephra magma téphrochronologie stratigraphique géochronologiques "de cendres volcaniques" ablation rhyolitiques
And do the same thing for French
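A small sketch of preparing the seed lists shown above: split a line into terms while keeping double-quoted phrases together, then map each term into the other language. The code does not call Google Translate. `EN_FR` is a hypothetical stand-in glossary built from pairs on the slide, and names with no entry (such as Eyjafjallajokull) pass through unchanged.

```python
import re

# Hypothetical stand-in for the machine-translation step: pairs taken from the slide.
EN_FR = {
    "volcanology": "volcanologie",
    "volcanologist": "vulcanologue",
    "volcanic eruption": "éruption volcanique",
    "tephrochronology": "téphrochronologie",
}

# Either a double-quoted phrase or a run of non-space characters.
_SEED_RE = re.compile(r'"([^"]*)"|(\S+)')


def parse_seeds(line):
    """Split a seed line into terms; double-quoted phrases stay as one term."""
    terms = []
    for quoted, bare in _SEED_RE.findall(line):
        term = quoted.strip() if quoted else bare
        if term:
            terms.append(term)
    return terms


def translate_seeds(terms, glossary):
    """Map each term through the glossary, keeping unknown terms as-is."""
    return [glossary.get(t.lower(), t) for t in terms]
```

The translated list would then go through the same tuple, search, and cleaning steps, giving a French corpus comparable to the English one.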
11
By July 2011
– All steps integrated
– Propose bilingual terminology