2 Several evaluation methods covered in LOG 350: surveys, heuristic evaluation, usability tests, experiments, etc.
3 Experiments. A fundamental part of the scientific method. They make it possible to find causal relationships between conditions and their effects. In HCI, they make it possible to determine whether an interface A is faster, causes fewer errors, etc. than an interface B.
4 Experiments. We vary (manipulate) at least one variable (example: the interface to use). This is the independent variable. Each of its values corresponds to a condition. We measure at least one variable (examples: time, number of errors, subjective satisfaction). This is the dependent variable. We analyze the results to see whether there are significant differences.
5 Example of an experiment: “expanding targets”. Reference: M. McGuffin, R. Balakrishnan (2002). Acquisition of Expanding Targets. Proceedings of ACM Conference on Human Factors in Computing Systems (CHI) 2002, pages 57-64.
6 Example: Mac OS X. Does this really make acquisition easier? (This is a backup slide, in case the movie doesn’t work.)
7 Additional motivation. There are also less recent examples of schemes where a widget or some portion of a widget expands in response to the user’s focus. From left to right we have a fisheye calendar, the perspective wall, and a fisheye menu. The common theme for all of these strategies is an attempt to make better use of available screen space by displaying more information when and where it is needed. However, this talk will focus on the effect that this kind of expansion has on the selection of targets. References: Furnas, Generalized Fisheye Views, CHI 1986. Mackinlay, Robertson, Card, The Perspective Wall, CHI 1991. Bederson, Fisheye Menus, UIST 2000.
8 Fitts’ Law. [Figure: a cursor at distance A from a target of width W.] A good place for us to start is with Fitts’ Law. Fitts’ Law describes the average time required to select a target. … There are a few different formulations of Fitts’ Law; the one that is popular now, and generally accepted in the HCI community, is the Shannon formulation, which looks like this …
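The formula itself did not survive in these notes. As a minimal sketch, assuming the commonly cited Shannon formulation MT = a + b log2(A/W + 1), where A is the distance to the target and W is its width, the prediction could be computed as follows. The coefficients `a` and `b` below are hypothetical values chosen only for illustration; in practice they are fitted from measured data.

```python
import math

def index_of_difficulty(A: float, W: float) -> float:
    """Index of difficulty in bits, Shannon formulation: log2(A / W + 1)."""
    return math.log2(A / W + 1)

def movement_time(A: float, W: float, a: float, b: float) -> float:
    """Predicted average selection time: MT = a + b * ID."""
    return a + b * index_of_difficulty(A, W)

# Hypothetical coefficients, for illustration only (not measured values).
a, b = 0.2, 0.15  # seconds, and seconds per bit

example_id = index_of_difficulty(224, 32)   # A / W + 1 = 8, so ID = 3 bits
example_mt = movement_time(224, 32, a, b)   # a + 3 * b
print(example_id, round(example_mt, 3))
```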
9 Fitts’ Law. [Figure: Target 1 and Target 2. Same ID → Same Difficulty.] (This part could be skipped over.) So Fitts’ Law tells us that ID is scale invariant. How is this possible? How is it that a target farther away takes the same time to acquire? The answer is that, although the user has farther to travel to acquire the 2nd target, they also have more distance over which they can accelerate. Furthermore, because the target is bigger, the user doesn’t have to be as precise about when to stop.
10 Fitts’ Law. [Figure: Target 1 and Target 2. Smaller ID → Easier.] Now, on the other hand, if the target somehow covers more than the cone, …
11 Fitts’ Law. [Figure: Target 1 and Target 2. Larger ID → Harder.] Likewise, if the target is strictly within the cone, …
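The three cases on these slides (same, smaller, and larger ID) can be checked numerically. This is a small sketch assuming the Shannon formulation ID = log2(A/W + 1); the distances and widths are made-up values, not taken from the study.

```python
import math

def index_of_difficulty(A: float, W: float) -> float:
    """Index of difficulty in bits, assuming ID = log2(A / W + 1)."""
    return math.log2(A / W + 1)

# Target 1: hypothetical distance 200 and width 25.
base = index_of_difficulty(200, 25)

# Target 2 is twice as far away in every case below.
same = index_of_difficulty(400, 50)     # width scaled with distance: fills the cone exactly
easier = index_of_difficulty(400, 100)  # wider than the cone
harder = index_of_difficulty(400, 25)   # strictly within the cone

print(same == base, easier < base, harder > base)
```

Because only the ratio A/W enters the formula, scaling distance and width by the same factor leaves ID unchanged.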
12 Fitts’ Law. [Figure: velocity profile, speed vs. distance toward a target of width W, showing an open-loop phase, a closed-loop phase, and overshoot and undershoot cases.] As a final point about Fitts’ Law, I would like to show you a velocity profile of a user’s movement toward a target. Imagine the user starting on the left and having to move onto the target. Ideally, … However, in practice, the user may for example not move quite far enough … one or more small corrective movements … The average number of corrective movements increases as the target becomes smaller or harder to select. Now, 2 points: (1) The current prevailing model is that the initial movement is open-loop, while the corrective motions at the end are closed-loop. (2) If the initial movement really is open-loop, then the target size doesn’t matter initially, and we may be able to take advantage of this fact when designing targets that expand dynamically. (Actually, when I stated this at CHI 2002, someone pointed out that the width can be important, i.e. users may perform a shorter initial movement if they know that the width is large, since they won’t be expecting to have to move all the way to the centre of the target.)
13 Expanding Targets. Basic idea: big targets can be acquired faster, but take up more screen space. So: keep targets small until the user heads toward them. [Figure: buttons labelled “Click Me !”, “Okay” and “Cancel”.] Well, what exactly do I mean by an “expanding target”? The basic idea is that Fitts’ Law tells us bigger buttons are easier to select; however, if we make all of our buttons big we run out of screen space. So, as a compromise, let’s try to keep buttons small until the user wants to select one of them, somewhat like this … Unfortunately, Fitts’ Law does not tell us a priori that such a target would be easier to select, because the expansion occurs after the user has already started to move towards the target. So, as a first step, we wanted to establish that expanding targets are in fact easier to select.
14 Experimental Setup. [Figure: a start position at distance A from a target of width W.] First, we wanted to confirm that expanding targets were easier to acquire. To do this, we reduced the problem to a single-target, 1-dimensional selection task, to eliminate confounding factors. In our experiments, we have each user do the following: …
15 Experimental Setup. Expansion: How? Animated Expansion. Now, for expanding targets, there were a few different parameters that we wanted to explore. First, …
16 Experimental Setup. Expansion: How? Fade-in Expansion. Repeat the difference between the two. Point out that with fade-in expansion, the full target size is immediately available (in the motor domain); this is not the case with animated expansion.
17 Experimental Setup. Expansion: How? When? P = 0.25. (P is confusing; say “expansion point P”.)
19 Experimental Setup. Expansion: How? When? P = 0.75. Why do we care about P?
20 Pilot Study. 7 conditions: no expansion (to establish a, b values); expanding targets, with either animated growth or fade-in, and P one of 0.25, 0.5, 0.75. (Expansion was always by a factor of 2.) Mention why a factor of 2 was used: because we thought it would be a reasonable value for designers to use in a real UI.
21 Pilot Study. 7 conditions × 16 (A, W) values × 5 repetitions × 2 blocks × 3 participants = 3360 trials.
22 Pilot Study: Results. [Plot: time (seconds) vs. ID (index of difficulty).]
23 Pilot Study: Results. [Plot: time (seconds) vs. ID (index of difficulty).]
24 Pilot Study: Results. [Plot: time (seconds) vs. ID (index of difficulty).] I’ve shown you how long it took to select static targets. What about the expanding targets? Well, before I show you that data, let’s try to predict what the results might look like. By doubling the size of a target, we reduce its ID by approximately 1. This is approximately the same as shifting the baseline to the right by 1. So, at best, we should expect the time to select expanded targets to coincide with the dashed line. So the dashed line is a lower bound on selection time (i.e. the best possible performance) with expanding targets. Now, what we actually expected was for the selection time to fall somewhere in between these two lines. We expected the expansion to yield some advantage, but not achieve the lower bound. To our surprise, …
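The claim that doubling the width reduces ID by approximately 1 can be checked with a short sketch, again assuming the Shannon formulation ID = log2(A/W + 1). The function names are illustrative, and `a` and `b` stand for whatever coefficients were fitted on the no-expansion condition.

```python
import math

def index_of_difficulty(A: float, W: float) -> float:
    """Index of difficulty in bits, assuming ID = log2(A / W + 1)."""
    return math.log2(A / W + 1)

def id_reduction_from_doubling(A: float, W: float) -> float:
    """How many bits of difficulty are removed when the width goes from W to 2W."""
    return index_of_difficulty(A, W) - index_of_difficulty(A, 2 * W)

def lower_bound_time(A: float, W: float, a: float, b: float, factor: float = 2.0) -> float:
    """Predicted time if the target had its expanded width from the start."""
    return a + b * index_of_difficulty(A, factor * W)

# The reduction is always below 1 bit and approaches 1 as A / W grows.
for ratio in (2, 8, 32, 128):
    print(ratio, round(id_reduction_from_doubling(ratio, 1), 3))
```

Under this formulation the reduction equals 1 - log2((A + 2W) / (A + W)), which is why the notes say "approximately" rather than exactly 1.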
25 Pilot Study: Results. [Plot: time (seconds) vs. ID (index of difficulty), P = 0.25.] To do: find out if the measured red lines are for fade-in or animated expansion. There was a significant difference between the base condition and the expanding conditions. There was no significant difference between any of the expanding conditions (i.e. between animated growth and fade-in, and also between the 3 P values).
26 Pilot Study: Results. [Plot: time (seconds) vs. ID (index of difficulty), P = 0.5.]
27 Pilot Study: Results. [Plot: time (seconds) vs. ID (index of difficulty), P = 0.75.]
28 Implications. The pilot study suggests the advantage of expansion doesn’t depend on P. So, set P = 0.9 and perform a more rigorous study. If any P value will do, let’s choose a value close to 1. From a designer’s perspective, a large P value is better (because it allows us to delay expansion until the very end of the trajectory). Mention that we performed a small 1-person study that confirmed there was still an effect with P = 0.9.
29 Full Study. 2 conditions: no expansion (to establish a, b values); expanding targets, with animated growth, P = 0.9, and an expansion factor of 2. Quickly mention again why the factor of 2.
30 Full Study. 2 conditions × 13 (A, W) values × 5 repetitions × 5 blocks × 12 participants = 7800 trials.
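The trial counts quoted for the pilot and full studies are simple products of the design factors. A small sketch that recomputes them (the dictionary keys are illustrative names, the numbers come from the slides):

```python
from math import prod

# Design factors as listed on the pilot-study and full-study slides.
pilot = {"conditions": 7, "aw_values": 16, "repetitions": 5, "blocks": 2, "participants": 3}
full = {"conditions": 2, "aw_values": 13, "repetitions": 5, "blocks": 5, "participants": 12}

def total_trials(design: dict) -> int:
    """Total number of trials in a fully crossed design."""
    return prod(design.values())

def trials_per_participant(design: dict) -> int:
    """Number of trials each participant performs."""
    return total_trials(design) // design["participants"]

print(total_trials(pilot), total_trials(full))                       # 3360 7800
print(trials_per_participant(pilot), trials_per_participant(full))   # 1120 650
```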
31 Results. [Plot: time (seconds) for each (A, W) value; differences statistically significant.] For simplicity, refer to the x-axis as “different ID values”.
35 Results. [Plot: time (seconds) vs. ID (index of difficulty), P = 0.9.] Since our measured MT approximately coincides with the lower bound, we have essentially shown that the advantage of expansion is about as good as you could possibly expect. And this is with an expansion point P of 0.9, so the expansion only happens at the very end of the trajectory. Note that we can therefore use the lower bound as a predictive tool.
36 Implications. For a single-target selection task, expansion yields a significant advantage, even when P = 0.9. What about multiple targets? [Figure label: expansion point P.]
38 Variables in an experiment. Independent variables: those we manipulate (also called factors); their values correspond to the conditions (or treatments or levels). Dependent variables: those we measure. Control variables: those we control, i.e. that we try to keep constant across conditions. Random variables: those we allow to vary, as randomly as possible. Examples: age, sex, socio-economic profile, etc. How do we ensure random variation across conditions? Random assignment of participants to conditions. Disadvantage: these variables introduce more variability into our results. Advantage: our results will be more general; our conclusions will apply to more situations. Confounding variables: those that vary systematically between conditions. We want to eliminate these variables!
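Random assignment of participants to conditions could be sketched as follows. This is only one possible approach (shuffle, then deal participants out in turn so that group sizes stay balanced); the participant labels and the `seed` parameter are hypothetical.

```python
import random

def assign_randomly(participants, conditions, seed=None):
    """Shuffle the participants, then deal them out to the conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, participant in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(participant)
    return groups

participants = [f"P{i}" for i in range(1, 13)]   # 12 hypothetical participants
groups = assign_randomly(participants, ["interface A", "interface B"], seed=1)
for condition, members in groups.items():
    print(condition, members)
```

Because the assignment ignores age, sex, and so on, those variables should not differ systematically between groups, although they still add variability within each group.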
39 Linear regression. [Plot: Y vs. X.] Output: slope, intercept, and Pearson correlation coefficient r, which lies in the interval [-1, 1].
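A least-squares fit producing these three outputs can be written in a few lines. The sketch below uses made-up (ID, time) points that lie exactly on a line; in a Fitts' Law experiment, the intercept and slope would play the roles of the a and b values mentioned earlier.

```python
import math

def linear_regression(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical data, for illustration only: times generated from 0.2 + 0.15 * ID.
ids = [1, 2, 3, 4, 5]
times = [0.35, 0.50, 0.65, 0.80, 0.95]
slope, intercept, r = linear_regression(ids, times)
print(round(slope, 3), round(intercept, 3), round(r, 3))
```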
40 A causal link … In a well-controlled experiment, if there are no confounding variables and we find that the dependent variables change when we change the independent variables, we can conclude that there is a causal link: the change in the independent variables causes the change in the dependent variables. In this case, a correlation would imply a causal link.
41 … versus a simple correlation. However, if we merely observe a correlation between two variables X and Y, without controlling the conditions, this does not imply a causal link between them. It could be that: X has an effect on Y; Y has an effect on X; a third variable, Z, has an effect on both X and Y. This is why we try to eliminate confounding variables in experiments.
42 Example. Researchers wanted to know which variable could predict the chances that a motorcycle rider would have a motorcycle accident. They looked for correlations between the number of accidents and age, socio-economic level, etc. They found that the strongest correlation was with the rider’s number of tattoos. Obviously, tattoos do not cause accidents, nor the reverse.
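The third case (a variable Z affecting both X and Y) can be illustrated with a small simulation. Everything here is synthetic and assumed for illustration: X and Y never influence each other, yet they correlate strongly because both depend on Z.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]        # hidden common cause
x = [zi + rng.gauss(0, 0.5) for zi in z]       # X depends only on Z plus noise
y = [zi + rng.gauss(0, 0.5) for zi in z]       # Y depends only on Z plus noise

r_xy = pearson_r(x, y)                          # strong, despite no X-Y causal link

# Once Z's contribution is removed, the remaining parts of X and Y are unrelated.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = pearson_r(x_rest, y_rest)

print(round(r_xy, 2), round(r_rest, 2))
```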
44 Examples of Questions to Answer in an Experiment. Of 3 interfaces, A, B, C, which enables the fastest performance at a given task? Does Prozac have an effect on performance at tying shoelaces? How does the frequency of advertisements on television affect voting behaviour? Can casting a spell on a pair of dice affect what numbers appear on them?
45 Elements of an Experiment. Population: set of all possible subjects / observations. Sample: subset of the population chosen for study; a set of subjects / observations. Subjects: people/users under study. The more politically correct term within HCI is “participants”. Observations / dependent variable(s): individual data points that are measured/collected/recorded, e.g. time to complete a task, errors, etc. Condition / treatment / independent variable(s): something done to the samples that distinguishes them (e.g. giving a drug vs placebo, or using interface A vs B). The goal of an experiment is often to determine whether the conditions have an effect on observations, and what the effect is.
46 Tasks to Design and Run an Experiment. Choose independent variables. Choose dependent variables. Develop hypothesis. Choose design paradigm. Choose control procedures. Choose a sample size. Pilot experiment: often more exploratory, varying a greater number of variables to get a “feel” for where the effect(s) might be. Run experiment: focuses in on the suspected effect; tries to gather lots of data under key or optimal conditions to result in a strong conclusion. Analyze data: using statistical tests such as ANOVA. Interpret results.
47 Hypothesis. Statement, to be tested, of the relationship between independent and dependent variables. The null hypothesis is that the independent variables have no effect on the dependent variables.
48 Experimental Design Paradigms. Between-subjects or within-subjects manipulation (between participants vs. across all participants). Example: designs with one independent variable. Between-subjects design: one independent variable with 2 or more levels; subjects randomly assigned to groups; each subject tested under only 1 condition. Within-subjects design: each subject tested under all conditions; order of conditions randomized or counterbalanced (why?).
49 What To Control. Subject characteristics: gender, handedness, etc.; ability; experience. Task variables: instructions; materials used. Environmental variables: setting; noise, light, etc. Order effects: practice; fatigue.
50 How to Control for Order Effects. Counterbalancing. Factorial Design. Latin Square.
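As one illustration of the Latin square idea, the sketch below builds a simple cyclic square: each row is the order of conditions for one participant (or group), and every condition appears exactly once in each position. The condition names are placeholders.

```python
def latin_square(conditions):
    """Cyclic Latin square: row i is the condition list rotated left by i."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

square = latin_square(["A", "B", "C"])
for participant, order in enumerate(square, start=1):
    print(participant, order)
# 1 ['A', 'B', 'C']
# 2 ['B', 'C', 'A']
# 3 ['C', 'A', 'B']
```

A cyclic square balances the position of each condition but not which condition immediately precedes which; when carry-over between specific pairs of conditions is a concern, a balanced Latin square or full counterbalancing would be the safer choice.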
52 Data Analysis and Hypothesis Testing. Describe data: descriptive statistics (means, medians, standard deviations); graphs and tables. Perform statistical analysis of results: are results due to chance? (That is, with what probability?)
53 ANOVA: “Analysis of Variance”. A statistical test that compares the means of multiple samples, and determines how probable differences at least as large as those observed would be if they were due to chance alone. In other words, it gives the probability (the p-value) of obtaining such results if the null hypothesis were true; strictly speaking, this is not the probability that the null hypothesis is correct. If this probability is below 0.05 (i.e. 5 %), then we reject the null hypothesis, and we say that we have a (statistically) significant result. Why 0.05? Dangers of using this value?
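To make the idea concrete, here is a minimal sketch of a one-way ANOVA F statistic (between-group variance divided by within-group variance). To stay within the standard library, the p-value is estimated with a permutation test rather than looked up in the F distribution, which is a substitute technique and not what a statistics package would normally report. The timing data are invented for illustration.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F: mean square between groups / mean square within groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Share of random relabelings whose F is at least as large as the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        start, shuffled = 0, []
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            count += 1
    return (count + 1) / (n_permutations + 1)

# Hypothetical task times (seconds) for three interfaces; not real measurements.
times_a = [1.10, 1.25, 0.98, 1.30, 1.15, 1.22]
times_b = [0.85, 0.92, 0.78, 0.95, 0.88, 0.81]
times_c = [1.12, 1.20, 1.05, 1.28, 1.18, 1.09]

f_value = f_statistic([times_a, times_b, times_c])
p_value = permutation_p_value([times_a, times_b, times_c])
print(round(f_value, 1), p_value < 0.05)
```

A significant result here only says that at least one group mean differs; it does not say which one, and follow-up comparisons would be needed for that.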
54 Techniques for Making an Experiment more “Powerful” (i.e. able to detect effects). Reduce noise (i.e. reduce variance). Increase sample size. Control for confounding variables, e.g. psychologists often use in-bred rats for experiments! Increase the magnitude of the effect, e.g. give a larger dosage of the drug.
55 Uses of Controlled Experiments within HCI. Evaluate or compare existing systems/features/interfaces. Discover and test useful scientific principles. Examples? Establish benchmarks/standards/guidelines.
57 Example of an experimental plan … For each participant …: for each major condition … (*): we run warm-up trials; we have a certain number of blocks, separated by breaks; for each block …: we repeat each minor condition a certain number of times (*). (*) How should these be ordered?
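The nested plan above maps naturally onto nested loops. The sketch below is one possible answer to the ordering question, under stated assumptions: major conditions are rotated from one participant to the next, and minor conditions are shuffled within each block. The mapping of "major" to the expansion condition and "minor" to the (A, W) pairs, as well as the placeholder labels, are assumptions made for illustration.

```python
import random

def build_schedule(participant, majors, minors, n_blocks, n_reps, seed=0):
    """Return one participant's ordered list of (major, phase, block, minor) trials."""
    rng = random.Random(seed * 1000 + participant)
    n = len(majors)
    # Rotate the major conditions so successive participants start with a different one.
    ordered_majors = [majors[(participant + i) % n] for i in range(n)]
    schedule = []
    for major in ordered_majors:
        schedule.append((major, "warm-up", None, None))
        for block in range(n_blocks):
            trials = list(minors) * n_reps
            rng.shuffle(trials)  # random order of minor conditions within the block
            schedule.extend((major, "measured", block, minor) for minor in trials)
    return schedule

majors = ["no expansion", "expanding"]
minors = [f"AW{i}" for i in range(1, 14)]   # 13 placeholder (A, W) labels, hypothetical
plan = build_schedule(participant=0, majors=majors, minors=minors, n_blocks=5, n_reps=5)
measured = [trial for trial in plan if trial[1] == "measured"]
print(len(measured))   # 2 x 5 x 13 x 5 = 650 measured trials for this participant
```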