User Interface Evaluation
N.B.: In these slides, “BGBG” refers to the 2nd edition of the book “Human-Computer Interaction” by Baecker, Grudin, Buxton and Greenberg (1995)

Interface design process (according to McGuffin)
– Task analysis, analysis of user needs
– Interviews with and observations of users
– Competitive analysis of other products
– Search for published guidelines (advice) for the target domain
– Sketches to generate several ideas in parallel
– Soliciting reactions from colleagues + users
– Paper prototypes, with user participation, and several prototypes in parallel
– Prototypes created with software tools (Balsamiq, Axure, JustInMind, Proto.io, etc.)
– Programming
– Evaluation (at more than one point in the process)

Analytic vs Empirical Evaluations (BGBG pp )
Analytic evaluations
– Do not involve actual users
– Focus on why things happen the way they do, and on the components of the system
– Produce interpretations and suggestions, not “solid facts”
– Better suited to formative evaluation than to summative evaluation
– Can be used early in the design process, before any high-fidelity prototype exists
– Examples: heuristic evaluation, walkthrough, claims analysis
Empirical evaluations
– Involve actual users
– Focus on what actually happens in practice
– Produce factual measurements and observations
– Good for summative evaluation, but may not clearly point to what changes to make
– Can produce a lot of data that is laborious to analyze
– Examples: experiments, usability testing, field studies

Usability Inspection Methods
– Heuristic evaluation: judgments by a panel of evaluators (e.g., 3 to 5) of the degree to which an interface satisfies a set of usability guidelines, followed by discussion and analysis
– Cognitive walkthroughs
Roles
– Evaluation without users (in contrast to usability tests, etc.)
– Elicit expert opinions about the user’s model, functionality, look & feel, etc.

Usability Inspection (cont’d)
Advantages
– Structured method of using the accumulated wisdom of experts
Disadvantages
– Doesn’t take advantage of real insights from real users
Example: heuristic evaluation with 10 usability guidelines (Nielsen, BGBG, Fig. 2.7, p. 83)
– Visibility of system status
– Match between system and the real world
– User control and freedom
– Consistency and standards
– Error prevention
– Recognition rather than recall
– Flexibility and efficiency of use
– Aesthetic and minimalist design
– Help users recognize, diagnose, and recover from errors
– Help and documentation

Demonstrations
Demonstrate the system to:
– Any random person
– Management, potential investors, journalists
– Potential customers
– Potential users
– Potential business partners
Take detailed notes
Elicit reactions to the user's model, functionality, interface
Advantages
– Get feedback early in prototype or system construction
– You're going to have to give demos anyway, so why not learn from them?
Disadvantages
– System still rough, which introduces noise into the process

Empirical Evaluation: Naturalistic Observation vs True Experiments (Example: Ray and Ravizza 1985)
Naturalistic observation (watching, recording)
– Noninterference with phenomena
– Observations of patterns and invariants
– High-level, big-picture insights
– Qualitative, descriptive
True experiments (manipulating, measuring)
– Manipulation, control
– Measurements of observed patterns
– Low-level, detailed results
– Quantitative

Empirical Evaluation: User Testing
Design and implement a scenario or prototype
Record user behaviour
– Typical usage, or critical incidents
– Keystroke and mouse event recording
– Thinking-aloud protocols
– Audio or video recording
Collect subjective impressions (questionnaire, interview)
Analyze recordings of user behaviour

Typical Steps in User Testing (Gomoll, in Laurel, 85-90)
– Set up the observation
– Describe the purpose of the study, and how the data collected will be used
– Tell the user (verbally and on paper) that it's OK to quit at any time
– Ask the participant whether they are willing to sign a form giving their permission to begin
– Pre-questionnaire (name, age, handedness, background, education, experience with computers, etc.)
– Talk about and demonstrate the equipment
– Explain how to “think aloud”
– Explain that you will not provide help
– Describe the task and introduce the system
– Ask if there are questions before you start; then begin the observation
– Post-questionnaire and/or interview to solicit opinions, impressions, etc.
– Conclude the observation and debrief participants
– Transcribe and tabulate the data and results
– Analyze and interpret the results
Key takeaway!

User Testing (BGBG, Fig. 2.8, p. 85, adapted from Nielsen, 1992)
Practical study design
– Reflect on the participants’ backgrounds and how they might affect the study
– Be aware of problems that arise when experimenters know the users personally
– Prepare for the study carefully (avoid last-minute panic)
– Select the tasks carefully, so that they are representative and fit the allotted time
– In general, start with an easier (but not frivolous) task
– Write down the features of the system not being tested, as well as those that are!
– Define the start-up state for the study precisely
– Define precise rules for when and how users can be helped during the study
– Plan the timing and cut-off procedure (if the participant gets stuck) for each part of the study
– Include provisions for data collection (e.g., audio, video, or keystroke capture)
– Plan data analysis techniques in advance
– Carry out an initial pilot study to test your protocol
Written materials
– Participant release (permission) form
– Pre-questionnaire covering prior experience, etc.
– Introduction to the study for users, including a scenario of use and a description of the tasks
– Checklist for experimenters, and paper for note-taking
– Post-questionnaire or survey
Key takeaway!

User Testing (BGBG, Fig. 2.8, p. 85, adapted from Nielsen, 1992)
Carrying out the study
– Let users know that complete anonymity will be preserved
– Let them know that they may quit at any time
– Stress that the system is being tested, not the participant (note: “participant” is the more modern term for “subject”)
– Indicate that you are only interested in their thoughts relevant to the system
– Demonstrate the thinking-aloud method by acting it out for a simple task, e.g., figuring out how to load a stapler
– Hand out instructions for each part of the study individually, not all at once
– Maintain a relaxed environment free of interruptions
– Occasionally encourage users to talk if they grow silent
– If users ask questions, try to get them to talk (e.g., “What do you think is going on?”), and follow predefined rules on when to help or to interrupt
– Debrief each user after the experiment
Key takeaway!

Thinking Aloud
Attempt to elicit the participant's thought processes, thereby yielding valuable insights (although the process is slowed down and may be changed)
The participant talks, while doing the task, about:
– Problems they are having
– Solutions they are considering
– Why they are having trouble
– Insights that they have
– Wishes that they have
Co-Discovery: pairs of participants conversing (Co-Discovery Learning, Kennedy paper in BGBG, pp )

Data Capture and Analysis
Keystroke + mouse logging
– Record precise user behaviour
– Record times to carry out actions
– Record user errors
Observation and note-taking by observers, especially of user problems and critical incidents
– Best if note-taking is done by a second observer
Audio and video recordings
– Can't observe and record all behaviour in real time
– Preserve behaviour for review (even non-verbal behaviour)
– Can produce a lot of data
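The keystroke and mouse logging bullets can be illustrated with a tiny in-memory event log that records timestamps and flags errors, from which action times and error counts follow. This is a hypothetical sketch, not a real logging toolkit: the class names, event kinds and the error flag are invented here, and a real study would capture events from the UI toolkit and save one log per participant.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class Event:
    timestamp: float        # seconds, from a monotonic clock (or a replayed log)
    kind: str               # e.g. "key", "click"; these names are hypothetical
    detail: str = ""        # e.g. which key or which widget
    is_error: bool = False  # flagged according to the study's predefined rules


@dataclass
class InteractionLog:
    events: list[Event] = field(default_factory=list)

    def record(self, kind: str, detail: str = "", is_error: bool = False,
               timestamp: float | None = None) -> None:
        # Read the clock now unless a timestamp is supplied (e.g. when replaying).
        ts = time.monotonic() if timestamp is None else timestamp
        self.events.append(Event(ts, kind, detail, is_error))

    def error_count(self) -> int:
        return sum(1 for e in self.events if e.is_error)

    def intervals(self) -> list[float]:
        # Seconds elapsed between each pair of consecutive events.
        return [b.timestamp - a.timestamp
                for a, b in zip(self.events, self.events[1:])]

    def task_time(self) -> float:
        # Seconds from the first to the last recorded event (0 if fewer than 2).
        if len(self.events) < 2:
            return 0.0
        return self.events[-1].timestamp - self.events[0].timestamp
```

A monotonic clock is used so that measured durations are not affected by changes to the system's wall-clock time.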

Asking Users in Addition to Observing Them
Methods
(Post-)questionnaire design
– Formulating and asking questions, and analyzing the answers
– Hard to avoid bias in the phrasing of questions
– Therefore requires pre-testing (“pilot testing”)
Surveys: (possibly large-scale) administration of questionnaires to appropriate samples of individuals chosen from a population
Administration of questions through interviews

Ethical Issues
Basic principles
– Do no harm
– Voluntary participation
– Informed consent
– Right to privacy
Use of research protocols and consent forms
– Explanation of the study and its purpose
– Anonymity
– Ability to withdraw at any time
– For example, see p. 256 of Rosson & Carroll

Controlled Experiments

Experiments
– A fundamental part of the scientific method
– Allow us to find causal relationships between conditions and their effects
– In HCI, allow us to determine whether an interface A is faster/causes fewer errors/etc. than an interface B

Experiments
– We vary (manipulate) at least one variable (example: the interface used). This is the independent variable. Each of its values corresponds to a condition.
– We measure at least one variable (examples: time, number of errors, subjective satisfaction). This is the dependent variable.
– We analyze the results to see whether there are significant differences.

Example experiment: “expanding targets”
Reference: M. McGuffin, R. Balakrishnan (2002). Acquisition of Expanding Targets. Proceedings of ACM Conference on Human Factors in Computing Systems (CHI) 2002, pages ,

Example: Mac OS X
Does this kind of magnification make selection easier?

Other examples
– Furnas, Generalized fisheye views, CHI 1986
– Bederson, Fisheye Menus, UIST 2000
– Mackinlay, Robertson, Card, The Perspective Wall, CHI 1991

Fitts’ Law
[Figure: a cursor at distance A from a target of width W]

Fitts’ Law: Same ID → Same Difficulty
[Figure: Target 1 and Target 2]

Fitts’ Law: Smaller ID → Easier
[Figure: Target 1 and Target 2]

Fitts’ Law: Larger ID → Harder
[Figure: Target 1 and Target 2]
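The slides name the distance A, the width W and the index of difficulty ID, but do not give the formula. The sketch below assumes the commonly used Shannon formulation ID = log2(A/W + 1) and the linear model MT = a + b·ID; the function names are hypothetical, and a and b have to be fitted from measured data for a given device and population.

```python
import math


def index_of_difficulty(A: float, W: float) -> float:
    """Index of difficulty in bits (Shannon formulation, assumed): log2(A/W + 1).

    A is the distance to the target and W its width, in the same units.
    """
    if A < 0 or W <= 0:
        raise ValueError("need A >= 0 and W > 0")
    return math.log2(A / W + 1)


def movement_time(A: float, W: float, a: float, b: float) -> float:
    """Predicted movement time MT = a + b * ID.

    a (seconds) and b (seconds per bit) are empirical constants, not universal.
    """
    return a + b * index_of_difficulty(A, W)


# Scaling A and W by the same factor leaves ID unchanged (both are log2(9)).
same_1 = index_of_difficulty(256, 32)
same_2 = index_of_difficulty(512, 64)
# A wider target at the same distance gives a smaller ID, i.e. an easier task.
easier = index_of_difficulty(256, 64)
```

Because ID depends only on the ratio A/W, two targets can differ in size and distance yet be predicted to be equally hard, which is what the “Same ID” slide illustrates.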

Fitts’ law
[Figure: speed vs distance during a pointing movement toward a target of width W, showing “undershoot” (stopping too short) and “overshoot” (going too far); open-loop movements (without feedback) vs closed-loop movements (with feedback)]

Expanding Targets
Basic idea: big targets can be acquired faster, but take up more screen space
So: keep targets small until the user heads toward them
[Figure: buttons “Cancel”, “Okay”, “Click Me !”]

Experimental Setup
[Figure: start position, and a target of width W at distance A]

Experimental Setup
Expansion: How? Animated expansion

Experimental Setup
Expansion: How? Fade-in expansion

Experimental Setup
Expansion: How? When? P = 0.25 (the target expands once the cursor has covered a fraction P of the distance A)

Experimental Setup
Expansion: How? When? P = 0.5

Experimental Setup
Expansion: How? When? P = 0.75
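The “When?” parameter P can be sketched in one dimension: the target keeps its width W until the cursor has covered a fraction P of the start-to-target distance A, and is then shown at twice its width (the factor used in the studies). This is an illustration under assumptions: the function names are hypothetical, and measuring progress as straight-line distance from the start position is a simplification chosen here.

```python
def current_width(start: float, target_center: float, cursor: float,
                  W: float, P: float, factor: float = 2.0) -> float:
    """Width of the target at the current cursor position (1-D sketch).

    The target has width W until the cursor has covered a fraction P of the
    distance A = |target_center - start|; after that it is factor * W wide.
    """
    A = abs(target_center - start)
    travelled = abs(cursor - start)  # assumption: progress = distance from start
    if A > 0 and travelled >= P * A:
        return factor * W
    return W


def is_hit(start: float, target_center: float, cursor: float,
           W: float, P: float, factor: float = 2.0) -> bool:
    """True if a click at `cursor` lands inside the (possibly expanded) target."""
    width = current_width(start, target_center, cursor, W, P, factor)
    return abs(cursor - target_center) <= width / 2
```

With a late expansion (large P) the user only sees the bigger target near the end of the movement, which is why the studies check whether the benefit survives at P = 0.9.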

Pilot Study
7 conditions:
– No expansion (to establish the Fitts’ law constants a and b)
– Expanding targets (2 × 3 = 6 conditions)
  – Either animated growth or fade-in
  – P is one of 0.25, 0.5, 0.75
(Expansion was always by a factor of 2)

Pilot Study
7 conditions × 16 (A, W) values × 5 repetitions × 2 blocks × 3 participants = 3360 trials

Pilot Study: Results
[Plot: time (seconds) vs ID (index of difficulty)]


Pilot Study: Results (P = 0.25)
[Plot: time (seconds) vs ID (index of difficulty)]

Pilot Study: Results (P = 0.5)
[Plot: time (seconds) vs ID (index of difficulty)]

Pilot Study: Results (P = 0.75)
[Plot: time (seconds) vs ID (index of difficulty)]

Implications
The pilot study suggests that the advantage of expansion doesn’t depend on P
So, set P = 0.9 and perform a more rigorous study

Full Study
2 conditions:
– No expansion (to establish the Fitts’ law constants a and b)
– Expanding targets, with animated growth, P = 0.9, and an expansion factor of 2

Full Study
2 conditions × 13 (A, W) values × 5 repetitions × 5 blocks × 12 participants = 7800 trials
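The trial counts of the pilot and full studies are products of the factor sizes; the sketch below also rebuilds the pilot study's 7 conditions (one control plus 2 expansion styles × 3 values of P). The labels such as "none" and "animated" are illustrative names chosen here, not identifiers from the original study.

```python
from itertools import product
from math import prod

# Pilot study: 1 control condition + 2 expansion styles x 3 values of P.
PILOT_CONDITIONS = [("none", None)] + list(
    product(["animated", "fade-in"], [0.25, 0.5, 0.75]))


def total_trials(*factors: int) -> int:
    """Number of trials when every factor level is crossed with every other."""
    return prod(factors)


# conditions x (A, W) values x repetitions x blocks x participants
pilot_trials = total_trials(len(PILOT_CONDITIONS), 16, 5, 2, 3)
full_trials = total_trials(2, 13, 5, 5, 12)
```

Writing the count as a product makes it easy to see which factor to reduce when a session would otherwise be too long for participants.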

Results
[Plot: time (seconds) for each combination of A, W values]

Results
[Plot: time (seconds) vs ID (index of difficulty)]


Results (P = 0.9)
[Plot: time (seconds) vs ID (index of difficulty)]

Implications
For a single-target selection task, expansion yields a significant advantage, even when P = 0.9
What about multiple targets?

(End of the slides on “expanding targets”)

Variables in an experiment
– Independent variables: those we manipulate (also called factors); they correspond to the conditions (or treatments, or levels)
– Dependent variables: those we measure
– Control variables: those we control, i.e., that we try to keep constant across conditions
– Random variables: those we let vary, as randomly as possible
  – Examples: age, sex, socio-economic profile, etc.
  – How do we ensure random variation across conditions? Random assignment of participants to conditions
  – Disadvantage: these variables introduce more variability into our results
  – Advantage: our results will be more general; our conclusions will apply to more situations
– Confounding variables: those that vary systematically across conditions. We want to eliminate these variables!
Key takeaway!
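Random assignment of participants to conditions can be sketched as shuffling the participant list and dealing participants out to the conditions in turn, which keeps the group sizes equal. The function name and participant IDs are hypothetical; a seed is accepted only so that an assignment can be reproduced and documented.

```python
import random


def assign_at_random(participants, conditions, seed=None):
    """Randomly assign each participant to exactly one condition (between-subjects).

    Shuffling first makes the assignment random; dealing the shuffled list out
    in turn keeps every group the same size.
    """
    if len(participants) % len(conditions) != 0:
        raise ValueError("number of participants must be a multiple of "
                         "the number of conditions")
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}
```

Because chance, not the experimenter, decides who ends up in which group, characteristics such as age or experience should not vary systematically across conditions, so they do not become confounding variables.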

Linear regression
Results of the computation: slope, intercept, and Pearson correlation coefficient r, which lies in the interval [-1, 1]
[Plot: Y vs X]
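The three results of a linear regression can be computed directly from sums of squared deviations, as sketched below. For example, fitting Fitts’ law MT = a + b·ID to the no-expansion data would give a as the intercept and b as the slope. Python 3.10+ also offers `statistics.linear_regression` and `statistics.correlation` in the standard library; the hand-written version is shown only to make the formulas visible.

```python
import math


def linear_regression(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least-squares line y = intercept + slope * x, plus Pearson's r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary for slope and r to be defined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # always within [-1, 1]
    return slope, intercept, r
```

The sign of r matches the sign of the slope, and |r| = 1 only when all points lie exactly on the fitted line.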

A causal link…
In a well-controlled experiment, if there are no confounding variables and we find that the dependent variables change when we change the independent variables, we can conclude that there is a causal link: the changes in the independent variables cause the changes in the dependent variables. In this case, a correlation would imply a causal link.

… versus a simple correlation
On the other hand, if we only observe a correlation between two variables X and Y, without controlling the conditions, this does not imply a causal link between them. It could be that
– X has an effect on Y
– Y has an effect on X
– A third variable, Z, has an effect on both X and Y
This is why we try to eliminate confounding variables in experiments

Example
Researchers wanted to know which variable could predict the chances that a motorcycle rider would have a motorcycle accident. They looked for correlations between the number of accidents and age, socio-economic level, etc. They found that the strongest correlation was with the number of tattoos the rider had. Obviously, tattoos do not cause accidents, nor the reverse.

Examples of questions we might try to answer with an experiment
– Among 3 interfaces A, B, C, which one allows a given task to be completed the fastest?
– Does Prozac have an effect on people's performance at tying shoelaces?
– Does the frequency of TV ads of type X have an effect on elections?
– Can casting a spell on dice affect the results of rolling the dice?

Elements of an experiment
Population
– The set of all possible participants
Sample
– Subset of the population chosen for a study; a set of participants
Participants (formerly called subjects)
– The people or users who perform tasks
Observations / Dependent variable(s)
– Data that are measured
– Examples: time to complete a task, number of errors made, subjective preferences
Condition(s) / Treatment(s) / Independent variable(s)
– Something that distinguishes the samples (example: taking a drug vs a placebo, or using interface A vs B)
– The goal of the experiment is often to determine whether the conditions have an effect on the observations

Steps in planning and running an experiment
Experimental design
– Choose the independent variable(s)
– Choose the dependent variable(s)
– Develop a hypothesis
– Choose a within-subjects (crossed) OR a between-subjects (nested) design
– Choose a way to control the variables
– Choose the sample size
Pilot experiment
– A first experiment, often to explore several conditions in order to probe the effect of each variable
The “real” experiment
– Focuses more on the suspected effect; tries to collect a lot of data under optimal conditions to obtain a pronounced effect and confident conclusions
Analyze the data
– Using a statistical test such as ANOVA (analysis of variance)
Interpret the results

Hypothesis
A statement, to be tested, about the relationship between the independent and dependent variables
The null hypothesis states that the independent variables have no effect on the dependent variables

Experimental designs
Between-subjects or within-subjects manipulation (nested vs crossed)
Example: designs with one independent variable
– Between-subjects design (nested)
  – One independent variable with 2 or more levels
  – Subjects randomly assigned to groups
  – Each subject tested under only 1 condition
– Within-subjects design (crossed)
  – One independent variable with 2 or more levels
  – Each subject tested under all conditions
  – Order of conditions randomized or counterbalanced (why?)

Things to control
Participant characteristics
– Sex, right- vs left-handedness, etc.
– Skill
– Experience (professional, life, or other)
The tasks
– Instructions given to participants
– Materials / equipment used
The environment
– Always the same room
– Noise, ambient light, etc.
Effects due to the ordering of conditions in a within-subjects experiment
– Improvement of the participant due to practice on the task
– Deterioration due to fatigue

How to control for ordering effects
Counterbalancing
– Factorial design (all possible orderings)
– Latin square design

Counterbalanced experimental designs for one independent variable with N levels A, B, …
Between-subjects (nested): each participant goes through only one condition. Advantage: no transfer effects.
– N = 2: 1/2: A; 1/2: B (each half of the participants does one condition)
– N = 3: 1/3: A; 1/3: B; 1/3: C
– N = 4: 1/4: A; 1/4: B; 1/4: C; 1/4: D
Within-subjects (crossed): each participant goes through all conditions. Advantage: more data collected for the same number of participants.
→ Factorial: all (N!) possible orderings of the conditions
– N = 2: 1/2: AB; 1/2: BA
– N = 3: 1/6: ABC; 1/6: ACB; 1/6: BAC; 1/6: BCA; 1/6: CAB; 1/6: CBA
– N = 4: 1/24: ABCD; 1/24: …
→ Latin square: each condition appears once in each possible position (column)
– N = 2: 1/2: AB; 1/2: BA (same as factorial)
– N = 3: 1/3: ABC; 1/3: BCA; 1/3: CAB
– N = 4: 1/4: ABCD; 1/4: BCDA; 1/4: CDAB; 1/4: DABC
Latin square versus factorial design: if the number of conditions is N, then with a Latin square the number of participants only needs to be a multiple of N, instead of a multiple of N!
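Both counterbalancing schemes can be generated mechanically: the factorial design uses every permutation of the conditions, and the Latin square shown above is a cyclic one in which row i is the condition list rotated by i. The function names are hypothetical; participants are assumed to be numbered from 0 and cycled through the orderings.

```python
from itertools import permutations


def factorial_orders(conditions: str) -> list[str]:
    """All N! orderings of the conditions (complete counterbalancing)."""
    return ["".join(p) for p in permutations(conditions)]


def latin_square_orders(conditions: str) -> list[str]:
    """N orderings forming a cyclic Latin square: row i is the list rotated by i.

    Each condition appears exactly once in each position (column).
    """
    n = len(conditions)
    return ["".join(conditions[(i + j) % n] for j in range(n)) for i in range(n)]


def order_for_participant(orders: list[str], participant_index: int) -> str:
    """Cycle through the orderings; use a multiple of len(orders) participants."""
    return orders[participant_index % len(orders)]
```

One design caveat: this cyclic square balances positions but not sequences. With ABC, BCA, CAB, A is always immediately followed by B whenever something follows it, so an asymmetric carry-over from A to B would not be balanced out.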

Example of an experimental design with one independent variable (slide 1/2)
Independent variable MENU with three levels (i.e., three conditions) A, B, C, namely three kinds of menus
Within-subjects (crossed) design with respect to MENU, counterbalanced with a 3×3 Latin square
– I.e.: one third of the participants do A, then B, then C; one third do B, C, A; one third do C, A, B
There will be transfer effects, but we hope they will be symmetric
Dependent variable: TEMPS (selection time)
At the end of the experiment, we can run an ANOVA to find out whether MENU has a significant effect (p < 0.05) on TEMPS

Example of an experimental design with one independent variable (slide 2/2)
We can also define a second variable ORDRE (order), which will be between-subjects (nested), with three levels (one for each third of the participants)
– 1/3: MENU = A, B, C; ORDRE = 1
– 1/3: MENU = B, C, A; ORDRE = 2
– 1/3: MENU = C, A, B; ORDRE = 3
So our design has two independent variables: MENU, which is within-subjects (crossed), and ORDRE, which is between-subjects (nested)
At the end of the experiment, we run an ANOVA to find out whether ORDRE has a significant effect on TEMPS (time)
– If not, we can treat the transfer effects as symmetric (good news!)
– If so, we can simply discard all the data collected after each participant's first level of MENU; our design then reduces to a between-subjects design with respect to MENU, with only one third of the data remaining (and therefore less statistical power for the ANOVA), but we no longer have transfer effects
We run an ANOVA to find out whether MENU has a significant effect on TEMPS
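The fallback described above (discarding everything after each participant's first MENU level) is a simple filter once each trial records its position in the participant's sequence. The record layout, the helper names and the idea of numbering participants from 0 are assumptions of this sketch; any TEMPS values used with it would come from the real experiment.

```python
from dataclasses import dataclass

# ORDRE level -> sequence of MENU levels (the 3x3 Latin square from the slide).
ORDERS = {1: ("A", "B", "C"), 2: ("B", "C", "A"), 3: ("C", "A", "B")}


@dataclass(frozen=True)
class Trial:
    participant: int
    ordre: int      # between-subjects variable ORDRE (1, 2 or 3)
    position: int   # 0 = first MENU level this participant used, 1 = second, ...
    menu: str       # within-subjects variable MENU
    temps: float    # dependent variable TEMPS (selection time, seconds)


def ordre_for(participant: int) -> int:
    """Cycle participants through the three orderings (one third each)."""
    return participant % 3 + 1


def first_level_only(trials: list[Trial]) -> list[Trial]:
    """Keep only the data from each participant's first MENU level.

    MENU then becomes a between-subjects variable, with a third of the data.
    """
    return [t for t in trials if t.position == 0]
```

Because the three orderings start with A, B and C respectively, the retained data still contain every MENU level, each from a different third of the participants.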

Example of an experimental design with two independent variables (1/2)
Say we want to evaluate two data visualization techniques (TECHNIQUE = A or B). We do not want to ask participants to do tasks with the same data set for both techniques, so we will have two data sets, JEU = J1 or J2
A possible design:
– 1/4: (TECHNIQUE, JEU) = (A, J1), (B, J2)
– 1/4: (TECHNIQUE, JEU) = (A, J2), (B, J1)
– 1/4: (TECHNIQUE, JEU) = (B, J1), (A, J2)
– 1/4: (TECHNIQUE, JEU) = (B, J2), (A, J1)
Our design is therefore within-subjects (crossed) with respect to TECHNIQUE and JEU, with factorial counterbalancing (which is equivalent to a Latin square in this case)
Perhaps, for each technique, we have a series of tasks TÂCHE = T1, T2, T3, T4. This adds a third variable. If we counterbalance the ordering of TÂCHE with a 4×4 Latin square, that gives 4 task orderings. Combined with our 4 orderings of (TECHNIQUE, JEU), that would give 4 × 4 = 16 orderings, which is a lot.
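The four (TECHNIQUE, JEU) orderings are exactly the combinations of a technique order (AB or BA) with a data-set order (J1 J2 or J2 J1), and crossing them with a cyclic 4×4 Latin square of task orders gives the 16 sequences mentioned above. The function names are hypothetical.

```python
from itertools import permutations, product


def technique_dataset_orders():
    """The four (TECHNIQUE, JEU) orderings: technique order x data-set order."""
    return [tuple(zip(techniques, datasets))
            for techniques in permutations(["A", "B"])
            for datasets in permutations(["J1", "J2"])]


def latin_square(items):
    """Cyclic Latin square: row i is `items` rotated left by i."""
    n = len(items)
    return [tuple(items[(i + j) % n] for j in range(n)) for i in range(n)]


TASK_ORDERS = latin_square(["T1", "T2", "T3", "T4"])
# Every (TECHNIQUE, JEU) ordering combined with every TÂCHE ordering.
ALL_SEQUENCES = list(product(technique_dataset_orders(), TASK_ORDERS))
```

Full counterbalancing of all 16 sequences would require a multiple of 16 participants, which is the practical cost the slide warns about.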

Example of an experimental design with two independent variables (2/2)
Another approach would be to fix the order of the tasks, for example from easiest to hardest. We could also define a variable ORDRE (order)
A possible design would then be:
– 1/4: (TECHNIQUE, JEU) = (A, J1), (B, J2); ORDRE = 1
– 1/4: (TECHNIQUE, JEU) = (A, J2), (B, J1); ORDRE = 2
– 1/4: (TECHNIQUE, JEU) = (B, J1), (A, J2); ORDRE = 3
– 1/4: (TECHNIQUE, JEU) = (B, J2), (A, J1); ORDRE = 4
… where TÂCHE = T1, T2, T3, T4 for each combination of (TECHNIQUE, JEU)
Our design would thus be within-subjects (crossed) with respect to TECHNIQUE and JEU, between-subjects (nested) with respect to ORDRE, and with a fixed ordering for TÂCHE. There will very likely be asymmetric transfer effects between the tasks, preventing us from comparing the tasks with an ANOVA, but this may be acceptable if our main objective is to compare visualization techniques A and B
Another approach would have been to make the ordering of TÂCHE random
At the end of the experiment, we run an ANOVA to find out whether ORDRE has a significant effect on TEMPS (time), and then another ANOVA to find out whether TECHNIQUE has a significant effect on TEMPS

ANOVA (“Analysis of Variance”)
A statistical test that compares the means of several samples (by analyzing their variances), and determines how likely the observed differences would be if they were due to chance alone
In other words, it computes the probability p of observing differences at least as large as those in the given samples if the null hypothesis is correct
If this probability is below 0.05 (i.e., 5%), we reject the null hypothesis and say that we have a (statistically) significant result
– Why 0.05? What are the dangers of using this value?
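The core of a one-way ANOVA is the F statistic: the variance between the group means divided by the variance within the groups. The sketch below computes F and its degrees of freedom for a between-subjects design; turning F into p needs the F distribution, e.g. `scipy.stats.f.sf(f, df_between, df_within)`, and `scipy.stats.f_oneway` returns both F and p directly. Within-subjects designs such as the MENU example call for a repeated-measures ANOVA instead, which this sketch does not implement.

```python
def one_way_anova_f(*groups: list[float]) -> tuple[float, int, int]:
    """F statistic and degrees of freedom for a one-way between-subjects ANOVA.

    Each argument is the list of observations (e.g. times) for one condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least 2 non-empty groups and more "
                         "observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Spread of the group means around the grand mean (the "effect").
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Spread of observations around their own group mean (the "noise").
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

A large F means the group means differ by much more than the noise within groups would explain, so the null hypothesis becomes implausible; identical group means give F = 0.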

Techniques for Making an Experiment More “Powerful” (i.e., Able to Detect Effects)
Reduce noise (i.e., reduce variance)
– Increase the sample size
– Control for random variables, e.g., psychologists often use inbred rats for experiments!
Increase the magnitude of the effect
– E.g., give a larger dosage of the drug

A small difference between the sample means. Is it significant, or simply due to chance?
A larger difference between the sample means. Is it significant, or simply due to chance?

With a smaller variance (than on the previous slide), we are more confident that the very small difference here is due to chance…
… and that the larger difference here is significant.

With a larger sample size (than on the previous slides), we are more confident that the very small difference here is due to chance…
… and that the larger difference here is significant.
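The effect of variance, sample size and effect magnitude on power can be seen in the t statistic for the difference between two independent means: t = difference / sqrt(sd1²/n1 + sd2²/n2). The sketch uses the unpooled standard error and hypothetical numbers; only the comparison between the calls matters. Quadrupling n, halving the standard deviation, or doubling the difference each doubles t.

```python
import math


def t_statistic(mean_diff: float, sd1: float, sd2: float, n1: int, n2: int) -> float:
    """t statistic for the difference between two independent sample means.

    Uses the unpooled standard error sqrt(sd1**2 / n1 + sd2**2 / n2).
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return mean_diff / standard_error


base = t_statistic(0.5, 1.0, 1.0, 10, 10)
more_participants = t_statistic(0.5, 1.0, 1.0, 40, 40)  # 4x the sample size
less_noise = t_statistic(0.5, 0.5, 0.5, 10, 10)         # half the standard deviation
bigger_effect = t_statistic(1.0, 1.0, 1.0, 10, 10)      # twice the difference
```

Since t grows only with the square root of n, doubling the sensitivity by recruiting more participants costs four times as many, which is why reducing noise or enlarging the effect is often attractive.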

Uses of Controlled Experiments within HCI
– Evaluate or compare existing systems/features/interfaces
– Discover and test useful scientific principles
  – Examples?
– Establish benchmarks/standards/guidelines
  – Examples?

Example of an experiment plan…
For each participant…
– For each major condition… *
  – Do some warm-up trials
  – Run a certain number of blocks, separated by breaks
  – For each block…
    – Repeat each minor condition a certain number of times *
* How should these things be ordered?
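The nested plan above maps directly onto nested loops. The sketch makes one possible choice for each ordering question marked *: major conditions follow a cyclic Latin square across participants, and the repetitions of the minor conditions are shuffled randomly within each block. These choices, the record fields and the function name are assumptions of the sketch, not the only valid answers.

```python
import random
from typing import Any, NamedTuple


class ScheduledTrial(NamedTuple):
    participant: str
    major: str   # major condition, e.g. an interaction technique
    phase: str   # "warm-up" (not analyzed) or "measured"
    block: int   # block index, or -1 for warm-up trials
    minor: Any   # minor condition, e.g. an (A, W) pair


def build_schedule(participants, major_conditions, minor_conditions,
                   n_blocks, n_repetitions, n_warmup, seed=0):
    """Nested schedule: participant > major condition > blocks > repetitions."""
    rng = random.Random(seed)
    k = len(major_conditions)
    schedule = []
    for p_index, participant in enumerate(participants):
        # * Order of major conditions: Latin square row chosen by participant index.
        order = [major_conditions[(p_index + j) % k] for j in range(k)]
        for major in order:
            for _ in range(n_warmup):
                schedule.append(ScheduledTrial(participant, major, "warm-up", -1,
                                               rng.choice(minor_conditions)))
            for block in range(n_blocks):
                # * Each minor condition repeated n_repetitions times, in random order.
                trials = [m for m in minor_conditions for _ in range(n_repetitions)]
                rng.shuffle(trials)
                schedule.extend(ScheduledTrial(participant, major, "measured", block, m)
                                for m in trials)
                # A break would be given to the participant here, between blocks.
    return schedule
```

With 12 participants, 2 major conditions, 13 minor conditions, 5 blocks and 5 repetitions, this produces 7800 measured trials, matching the size of the full expanding-targets study.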