A MULTIDISCIPLINARY PORTAL OF COLLABORATIVE ANNOTATION TOOLS
OVERVIEW OF THE ADNOTARE PROJECT
PRESENTED TO THE RESEARCH SOFTWARE DEVELOPERS’ WORKSHOP
BY ANDRÉ LAPOINTE AND PIERRE ANDRÉ MÉNARD
COMPUTER RESEARCH INSTITUTE OF MONTREAL
MAY 30TH, 2016
2 CONTEXT AND HISTORY
VESTA
  – Video Evaluation System for Task Analysis
  – Financed by CANARIE’s Research Software program in 2013
Custom-made for the LEADS research network (Learning Environment Across Disciplines)
  – Educational science: how do students learn?
  – 6 universities and 11 partner organisations (Canada)
  – 13 universities and 4 partner organisations (North America, Europe and Australia)
  – Led by Dr. Susanne Lajoie (McGill University)
3 VESTA PLATFORM
4 VESTA – SUPPORTED RESEARCH
LEADS
  – McGill University
  – Educational science
Canadian Centre of Ethnomusicology (CCE)
  – University of Alberta
  – Archiving and research on musical and cultural traditions
  – The musical transcription service is intended for their use
Collaborative Music And Movement Laboratory (CoMM Lab)
  – Cape Breton University
  – Analysis of creative and interactive video content
Tech3Lab
  – HEC Montreal (University of Montreal)
  – UX and neuromarketing research lab
5 ADNOTARE OVERVIEW
Adnotare
  – A portal of collaborative annotation tools
  – Composed of two platforms and component services
VESTA: Video Evaluation System for Task Analysis
  – Composed of 8 annotation services: 5 existing, 3 new
PACTE: Plateforme d’Annotation Collaborative de Textes Électroniques (collaborative annotation platform for electronic texts)
  – Composed of 4 new annotation services
6 ADNOTARE OVERVIEW
7 PACTE OVERVIEW
Web platform: unified management with Adnotare
Corpora management and search
Manual text annotation
Automated text annotation
Text mining and exploration
… …
Research teams
8 PACTE – SUPPORTED RESEARCH
Montréal youth protection centre, research institute
  – Around 13,000 children and teenagers under care annually
  – Manages cases of mistreatment, abuse, drug use, and behavior disorders
  – Weekly follow-up on significant events and behavior
  – Monthly risk evaluation to guide intervention
Needs
  – Manual and automatic annotation of various entities for text mining
  – Detection, identification and chronological reordering of events
  – Specialized vocabulary
  – Assessing the impact of risk evaluation on daily interventions
Restrictions
  – Secure storage and processing for sensitive data
  – Users without a computer science background
  – Mostly French documents and users
9 PACTE – SUPPORTED RESEARCH
Laboratory of cognitive and semantic engineering (Lincs, ÉTS Montréal)
  – Research on text and data mining applied to SE and medical data
  – Analysis of language alterations for dementia detection (e.g. Alzheimer's disease)
Needs
  – Manual creation of large annotated text corpora
  – Flexible annotation schemas
  – Interaction between the video and text annotation platforms
  – Bilingual or multilingual NLP toolset and interface
Restrictions
  – Secure storage and processing for sensitive data (e.g. medical data)
  – Must allow registration of external annotation services
10 PACTE – TECHNICAL ASPECTS
On-request asynchronous multilingual annotation pipelines
  – Encoding converter, language identifier, statistical corrector
  – Tokenizer, sentence splitter, part-of-speech tagger, chunker
  – Keyword extraction, lexical disambiguation, named entity tagger
  – Predefined pipelines with compatible tools, languages, tag sets and formats
  – Optimizes missing or deficient language-specific models
Semi-automated annotation with active learning
  – Helps manual annotation by interacting with a machine learning algorithm
  – Asks the user to annotate fewer, more relevant, non-redundant data points
  – Generates an optimized prediction model
  – Uses that model to automatically annotate the remaining text corpus
  – Uses the pipelines’ annotations as ML features
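
The active learning loop described on this slide can be illustrated with a minimal sketch. It is not PACTE's implementation: the nearest-centroid model, the margin-based uncertainty measure, the function names, the `EVENT`/`OTHER` labels and the 2-D feature vectors are all assumptions chosen to keep the example self-contained. In a real setting the feature vectors would presumably be derived from the pipelines' annotations, and `ask_annotator` would be a human annotating in the web interface.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = Tuple[float, ...]
Model = Dict[str, List[float]]


def centroids(labeled: List[Tuple[Vector, str]]) -> Model:
    """Average the feature vectors of each label (a nearest-centroid model)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def ranked_labels(model: Model, vec: Vector) -> List[Tuple[float, str]]:
    """Return (distance, label) pairs, closest centroid first."""
    return sorted((math.dist(vec, c), label) for label, c in model.items())


def margin(model: Model, vec: Vector) -> float:
    """A small margin means the two best labels are nearly tied (uncertain item)."""
    ranked = ranked_labels(model, vec)
    return ranked[1][0] - ranked[0][0]


def active_learning(
    seed: List[Tuple[Vector, str]],
    pool: List[Vector],
    ask_annotator: Callable[[Vector], str],
    budget: int,
) -> Tuple[Model, List[Tuple[Vector, str]], List[Tuple[Vector, str]]]:
    """Query the annotator on the most uncertain items, then label the rest.

    Assumes the seed set already contains at least two distinct labels.
    """
    labeled = list(seed)
    remaining = list(pool)
    for _ in range(min(budget, len(remaining))):
        model = centroids(labeled)                    # retrain on current labels
        query = min(remaining, key=lambda v: margin(model, v))
        remaining.remove(query)
        labeled.append((query, ask_annotator(query)))  # manual annotation step
    model = centroids(labeled)
    # Automatically annotate everything the annotator was not asked about.
    auto = [(vec, ranked_labels(model, vec)[0][1]) for vec in remaining]
    return model, labeled, auto


# Hypothetical demo data: 2-D feature vectors and a simulated annotator.
seed = [((0.0, 0.0), "OTHER"), ((1.0, 1.0), "EVENT")]
pool = [(0.1, 0.2), (0.9, 0.8), (0.5, 0.6), (0.45, 0.5), (0.2, 0.1), (0.8, 0.9)]


def simulated_annotator(vec: Vector) -> str:
    return "EVENT" if sum(vec) > 1.0 else "OTHER"


model, labeled, auto = active_learning(seed, pool, simulated_annotator, budget=2)
print("asked the annotator about:", [vec for vec, _ in labeled[len(seed):]])
print("annotated automatically:", auto)
```

With a budget of two queries, the loop sends the annotator the two items closest to the decision boundary and labels the four clear-cut items automatically, which is the "fewer, more relevant" behaviour the slide describes. Any classifier that exposes a confidence score could replace the nearest-centroid model used here.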
11 ADNOTARE TEAM
PIs and project management
Collaborations
Portal and platforms
Text annotation services
Speech annotation services
Vision annotation services
Component services
André Lapointe and Pierre André Ménard
CRIM – Centre de recherche informatique de Montréal
405, avenue Ogilvy, bureau 101, Montréal (Québec) H3N 1M3
www.crim.ca
© 2015 CRIM. All rights reserved.

CRIM is an applied IT research centre that develops, in collaboration with its clients and partners, innovative technologies and leading-edge know-how, and transfers them to Quebec businesses and organizations to make them more productive and more competitive locally and globally. CRIM has four world-class IT research teams, a testing and interoperability centre regarded as a neutral reference in Quebec, and an advanced IT training centre. CRIM works mainly in the fields of human-system interaction and interfaces, advanced analytics, and advanced development and testing architectures and technologies. CRIM is ISO 9001:2008 certified and operates within the policies and strategies led by the ministère de l'Économie, de l'Innovation et des Exportations (MEIE), its main financial partner.