Measuring Task Time During Usability Testing

I design applications that are used all day, every day in a corporate setting. Because of this, I measure efficiency and time when I do usability studies to make sure that we are considering productivity as part of our design process. 

Although actual times gathered from real interactions via an analytics package are more reliable and quantifiable than those gathered in usability testing, they require you to have a lot of users or a live product. When you're in the design stage, you often don't have the ability to gather that kind of data, especially when you're using mockups or prototypes instead of a live application. Being able to gauge the relative times of actions within a process during usability testing can be helpful, and being able to compare the times of two new design options is also valuable. Gathering information about task times early in the design phase can save money and effort down the road. 


 

HOW TO CONDUCT A TIME STUDY

During a typical usability study, simply collect the time it takes to accomplish each task. The best way to do this is to measure time per screen or activity in addition to the duration of the whole task, so that you can isolate which step of a process is taking the most time or adding unnecessary seconds. This can be more illuminating from a usability perspective than simply knowing how long something takes.

Make a video screen recording of the session. Pick a trigger event to start and pause timing, such as clicking a link or a button. Gather the times via the timestamp when you replay the video. Don't try to time with a stopwatch during the actual usability test. You can make a screen recording with SnagIt, Camtasia, or Morae, or through any number of other tools.

When comparing two designs for time, test both designs in the same study and use the same participants. This means you'll have a within-subjects study, which produces results with less variation - a good thing if you have a small sample size. To reduce bias, rotate the order of designs so each option is presented first half of the time.  


 

COMMON QUESTIONS ABOUT TIME STUDIES

Should you count unsuccessful tasks?

Yes and no. If the user fails to complete the task, or the moderator intervenes, exclude that attempt from the time study. If the user heads in the wrong direction but eventually completes the task, include it.

What if my participant thinks aloud and goes on a tangent, but otherwise completes the task?

I leave "thinking aloud" in and let it average into the results. If the participant stops what they are doing to talk for an extended period of time (usually to ask a question or give an example), I exclude those seconds of discussion. But be conservative with the amount of time you exclude, and make a note of how long the excluded time was.

Should you tell participants they are being timed?

I don't. Sometimes I'll say that we're gathering information for benchmarking, but I generally only give them the usual disclaimer about participating in a usability test and being recorded.

How relevant are these results? 

People will ask if times gathered in an unnatural environment like usability testing or a simulation are meaningful. These times are valuable because some information is better than no information. However, it's important to caveat your results with the methodology and the environment in which the information was collected.


 

REPORTING RESULTS: AVERAGE TASK TIMES WITH CONFIDENCE INTERVALS

Report the confidence interval if you want to guesstimate how long an activity will take: "On average, this took users 33 seconds. With 95% confidence, this will take users between 20 and 46 seconds."

Report the mean if you want to make an observation that one segment of the task took longer than another during the study. A confidence interval may not be important if your usability results are presented informally to the team, or you're not trying to make a prediction. Consider the following scenario: you notice, based on your timings, that a confirmation page is adding an average of 9 seconds to the task, which end-to-end takes an average of 42 seconds. Does it matter that the confirmation screen may actually take 4-15 seconds? Not really. The value in the observation is whether you think the confirmation page is worth more than a fifth of the time spent on the task, and whether there's a better design solution that would increase speed.

When you're determining average task time, don't rely on the arithmetic mean (Excel: =AVERAGE); task times are positively skewed, so the arithmetic mean overstates the typical time. If the sample size is smaller than 25, report the geometric mean (Excel: =GEOMEAN). If the sample size is larger than 25, the median may be a better gauge (Excel: =MEDIAN).

If you're reporting the confidence interval, take the natural log of the values and calculate the confidence interval based on that. This is because time data is almost always positively skewed (not a normal distribution). Pasting your time values into this calculator from Measuring U is much easier than calculating in Excel. 
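If you want to sanity-check the calculator, or skip Excel altogether, here's a minimal sketch of the log-transform approach in Python. The task times are made-up example values, and the use of SciPy for the t critical value is my own assumption rather than anything from the books below.

  import math
  import statistics
  from scipy import stats

  times = [21, 25, 27, 33, 34, 38, 41, 55, 72]  # example task times in seconds, positively skewed
  n = len(times)

  geo_mean = statistics.geometric_mean(times)   # same idea as Excel's =GEOMEAN (small samples)
  median = statistics.median(times)             # =MEDIAN (larger samples)

  # Build the confidence interval in log space, then convert back to seconds.
  logs = [math.log(t) for t in times]
  log_mean = statistics.mean(logs)
  log_sd = statistics.stdev(logs)
  crit = stats.t.ppf(0.975, df=n - 1)           # 95%, two-sided critical value
  margin = crit * log_sd / math.sqrt(n)
  low, high = math.exp(log_mean - margin), math.exp(log_mean + margin)

  print(f"geometric mean: {geo_mean:.1f} s, median: {median:.1f} s")
  print(f"95% CI for the average task time: {low:.1f} to {high:.1f} s")

Because the geometric mean is just the back-transformed mean of the logs, it will always land inside that interval.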


 

REPORTING RESULTS: CALCULATING THE DIFFERENCE BETWEEN TWO DESIGNS

For a within-subjects study, you'll compare the mean from Design A to the mean of Design B. You'll use matched pairs, so if a participant completed the task for Design A, but did not complete the task for Design B, you will exclude both of her times from the results.

There are some issues with this, though. First, I've found it very difficult to actually get a decent p-value, so my comparison is rarely statistically significant. I suspect this is because my sample size is quite small (<15). I also have trouble with the confidence interval. Often my timings are very short, so I will have a situation where my confidence interval takes me into negative time values, which, though seemingly magical, calls my results into question.  

Here's the process (a Python sketch of these steps follows the list):

  1. Find the difference between Design A and B for each pair. (A-B=difference)

  2. Take the average of the differences (Excel: =AVERAGE).

  3. Calculate the standard deviation of the differences (Excel: =STDEV).

  4. Calculate the test statistic.
    t = average of the difference / (standard deviation / square root of the sample size)

  5. Look up the p-value to test for statistical significance.
    Excel: =TDIST(absolute value of the test statistic, sample size - 1, 2). If the result is less than 0.05 (or whatever alpha you've chosen), the difference is statistically significant.

  6. Calculate the confidence interval:

    1. Confidence interval = absolute value of the mean of the differences +/- critical value × (standard deviation of the differences / square root of the sample size).

    2. Excel critical value at 95% confidence: =TINV(0.05, sample size - 1)
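
If you'd rather script the comparison than run it in Excel, here's a minimal sketch of steps 1-6 in Python. The paired times are invented for illustration, and SciPy stands in for =TDIST and =TINV; treat it as a sketch of the process above, not a substitute for checking your own numbers.

  import math
  import statistics
  from scipy import stats

  # Matched pairs: design_a[i] and design_b[i] are the same participant (times in seconds).
  design_a = [38, 45, 29, 52, 41, 36, 47, 33]
  design_b = [31, 40, 27, 44, 39, 30, 42, 35]

  diffs = [a - b for a, b in zip(design_a, design_b)]      # step 1
  n = len(diffs)
  mean_diff = statistics.mean(diffs)                       # step 2 (=AVERAGE)
  sd_diff = statistics.stdev(diffs)                        # step 3 (=STDEV)

  se = sd_diff / math.sqrt(n)
  t_stat = mean_diff / se                                  # step 4
  p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)          # step 5, same as =TDIST(|t|, n-1, 2)

  crit = stats.t.ppf(0.975, df=n - 1)                      # step 6, same as =TINV(0.05, n-1)
  low, high = mean_diff - crit * se, mean_diff + crit * se

  print(f"mean difference: {mean_diff:.1f} s, t = {t_stat:.2f}, p = {p_value:.3f}")
  print(f"95% CI for the difference: {low:.1f} to {high:.1f} s")
  # If the interval crosses zero, the data haven't shown a reliable difference
  # between the designs at this sample size.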


 

REFERENCES

Both of these books are great resources. The Tullis/Albert book provides a good overview and is a little better at explaining how to use Excel. The Sauro/Lewis book gives many examples and step-by-step solutions, which I found more user-friendly. 

Measuring the User Experience by Tom Tullis and Bill Albert ©2008

Quantifying the User Experience by Jeff Sauro and James R. Lewis ©2012


Measuring Efficiency during Usability Testing

Recently, most of my work has been developing enterprise software and web applications. Because I'm building applications that employees spend their whole workday using, productivity and efficiency matter. This information can be uncovered during usability testing.

The simplest way to capture the amount of effort during usability testing is to keep track of the actions or steps necessary to complete a task, usually by counting page views or clicks. Whichever you count should be meaningful and easy to count, either with an automated tool or from video playback.

There are two ways to examine the data - comparing one system to another, and comparing users' average performance to the optimal performance.

Compare One App to Another

Use this when you're comparing how many steps it took in the new application vs. the old application. Here, you'll compare the "optimal paths" of both systems and see which one required fewer steps. This doesn't require usability test participants and can be gathered at any time. It can be helpful to present this information in conjunction with a comparison time study, as it may become obvious that App A was faster than App B because it had fewer page views.

Compare the Users' Average Path to the Optimal Path

To do this, you'll compare the average click count or page views per task of all of the users in your usability study to the optimal path for the system. The optimal path should be the expected "best" path for the task. 

More than simply reporting efficiency, comparing average performance to optimal performance can uncover usability issues. For example, is there a pattern of users deviating from the "optimal path" scenario in a specific spot? Was part of the process unaccounted for in the design, or could the application benefit from more informed design choices?

Here's the process I use to calculate efficiency against the optimal path benchmark (a Python sketch of these steps follows the list):

  1. Count the clicks or page views for the optimal path.

  2. Count the clicks or page views for a task for each user.

  3. Exclude failed tasks.

  4. Take the average of the users' values (Excel: =AVERAGE or Data > Data Analysis* > Descriptive Statistics).

  5. Calculate the confidence interval of the users' values (Excel: Data > Data Analysis* > Descriptive Statistics).

  6. Compare to the optimal path benchmark and draw conclusions.

*Excel for Mac does not include the Data Analysis package. I use StatPlus instead. 
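
If you'd rather not rely on the Data Analysis add-in (or StatPlus), here's a minimal sketch of steps 1-6 in Python. The click counts are made up for illustration, and SciPy is assumed only for the t critical value that Descriptive Statistics would otherwise give you.

  import math
  import statistics
  from scipy import stats

  optimal_clicks = 6                        # step 1: clicks along the expected "best" path
  user_clicks = [7, 9, 6, 12, 8, 10, 7]     # step 2: one count per user, failed tasks already excluded (step 3)

  n = len(user_clicks)
  mean_clicks = statistics.mean(user_clicks)          # step 4 (=AVERAGE)
  sd = statistics.stdev(user_clicks)
  crit = stats.t.ppf(0.975, df=n - 1)
  margin = crit * sd / math.sqrt(n)                   # step 5: 95% confidence interval

  print(f"optimal path: {optimal_clicks} clicks")
  print(f"observed: {mean_clicks:.1f} clicks on average "
        f"(95% CI: {mean_clicks - margin:.1f} to {mean_clicks + margin:.1f})")
  # Step 6: if even the low end of the interval sits well above the optimal path,
  # look for the spot where users consistently deviate from it.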

Reference

Measuring the User Experience by Tom Tullis and Bill Albert ©2008


Usability Testing Hack: Speed Through Videos in Half the Time

There are two reactions to this usability testing hack: 

  1. Doesn’t everybody do it that way? OR, 

  2. I can’t believe the hours I’ve squandered! 

Ready to find out which side you’re on?

Watch your usability testing videos at a playback speed of 1.5 or 2. An hour-long video will only take 30 to 40 minutes to watch.

When I usability test, I always record and re-watch each session to make sure that I see all the behaviors that were invisible to me at the time, as well as to back up my own notes and assumptions. (Ever finish the sessions feeling like “everybody” missed something, only to discover that fewer than half actually did? This is why I re-watch.) If you’re doing unmoderated remote usability testing through usertesting.com (or similar), you’re also faced with hours of video to watch. Re-watching, though valuable to the process, makes usability testing more expensive for the client, and also lengthens your turnaround time for reporting results. It’s in everyone’s best interest to recover some of this time by adjusting the video’s speed.

How to Adjust Playback Speed

Nearly every video player has a playback speed control. On a Mac, I like the VLC video player because it’s not obvious how to change playback speed in iTunes or QuickTime (or maybe it’s not possible anymore). If you’re using Windows Media Player on a PC, you can find playback speed if you right-click the video and click on “Enhancements” (I wish I were making this up).

A speed somewhere between 1.5 and 2 works well for me to be able to watch and take notes. It’s even possible to grab user quotes at this speed. If I’m grabbing timestamps for a time study, and I have already collected my general usability findings, I’ll set the video to play as fast as possible (8-16x) and only look for the clicks that correspond to what I’m timing.

Once you know about this hack, you’ll find yourself watching YouTube at 1.5, speeding through podcasts, and even taking online classes at warp speed - there are so many applications! 
