Metrics to get started with in your user testing

13 Oct

Photo by Mohamed Boumaiza https://unsplash.com/@mbuiux

Thinking about using metrics in your usability testing? Metrics can be great when used at the right time, but they’re not always appropriate.

What do we mean by metrics?

Metrics are measurements that can give numerical scores to aspects of a product’s performance. Numerical scores make it easier to quantify and compare different sets of data that would otherwise be qualitative.

Things to consider before using metrics in your testing

Gathering metrics can increase costs through longer analysis times, specialist tools and the need for more participants to obtain reliable quantitative data (you should be testing with 20 or more per design).

Metrics are great when you want to gain comparable data to:

See where you stand amongst competition: by measuring your product against competitor products
Track progress: by comparing your product with other iterations of the same product
Understand whether your product meets specific internal or external standards

Types of metrics

Performance metrics

These metrics are great to introduce into your study design when doing usability testing, as they are observable and give us extra information about how users carry out or ‘perform’ tasks.

Efficiency:

How long it takes users to complete tasks
- If you’re using the “think aloud” technique, it’s best to avoid time taken as a metric. The data will not be reliable because everyone takes a different amount of time “thinking aloud”.
The number of clicks users take to complete tasks
- This can be a better choice if you’re using “think aloud”, as it’s based on the decisions people are making, not the time taken.

Effectiveness:

Measure task success rates (how many people completed the task)
Measure the number of errors users make completing tasks
- This requires a robust definition of errors to ensure consistency in the data recording
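
As a rough illustration of the task success rate described above, here is a minimal Python sketch. The task names and pass/fail results are made up for the example, and the sample is far smaller than you would want in a real study.

```python
# Hypothetical results: True means the participant completed the task.
results = {
    "Task 1": [True, True, False, True, True],
    "Task 2": [True, False, False, True, False],
}


def success_rate(outcomes):
    """Return the share of participants who completed the task, as a percentage."""
    return 100 * sum(outcomes) / len(outcomes)


rates = {task: success_rate(outcomes) for task, outcomes in results.items()}

for task, rate in rates.items():
    print(f"{task}: {rate:.0f}% success")
```

Recording each attempt as a simple pass or fail keeps the calculation easy, but it does rely on agreeing up front what counts as "completed".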

Issue metrics

Issue metrics are useful for highlighting the most problematic areas of your product, and tracking the impact of design changes. Typically, these will be uncovered during the analysis. In most cases you will want to count the number of issues found across the whole platform, specific tasks/journeys, or specific interface elements.

Issue metrics include:

Number of usability problems identified during the testing
Severity of usability problems (e.g. minor, moderate, critical)
Types of usability problems (for example, issues with navigation)
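
If you log issues in a spreadsheet or similar, counting them is straightforward. The sketch below assumes a hypothetical issue log where each entry records the task, a severity and an issue type; the category names are only examples, so swap in your own definitions.

```python
from collections import Counter

# Hypothetical issue log from analysis: (task, severity, issue type).
issues = [
    ("Task 1", "minor", "navigation"),
    ("Task 1", "critical", "forms"),
    ("Task 2", "moderate", "navigation"),
    ("Task 3", "critical", "navigation"),
    ("Task 3", "minor", "content"),
]

# Count the same log three ways: per task, per severity and per type.
by_task = Counter(task for task, _, _ in issues)
by_severity = Counter(severity for _, severity, _ in issues)
by_type = Counter(kind for _, _, kind in issues)

print("Total issues:", len(issues))
print("By task:", dict(by_task))
print("By severity:", dict(by_severity))
print("By type:", dict(by_type))
```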

Experience metrics

Experience metrics help us to understand how a product makes users feel, which can be useful if you want to understand how well your design creates the intended reaction.

Single ease question

The single ease question (SEQ) is an easy way to gather a general sense of how users feel about a system. To get a score after each task, simply ask users:

Overall, how difficult or easy was the task to complete? Use a scale of 1-7 where 1 is hard and 7 is easy.

The SEQ is widely used, so scores can be compared with other systems easily. It has a global average response of 5.5, which is above the scale’s true midpoint of 4, so make sure you keep that in mind!
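
To show how that comparison might look in practice, here is a small Python sketch. The ratings are invented for illustration; only the 5.5 average comes from the text above.

```python
from statistics import mean

SEQ_BENCHMARK = 5.5  # average SEQ response quoted above

# Hypothetical SEQ ratings for one task (1 = hard, 7 = easy).
ratings = [6, 5, 7, 4, 6, 5, 6, 3]

# Guard against data-entry mistakes before averaging.
if not all(1 <= rating <= 7 for rating in ratings):
    raise ValueError("SEQ ratings must be between 1 and 7")

seq_score = mean(ratings)
difference = seq_score - SEQ_BENCHMARK

print(f"Mean SEQ: {seq_score:.2f} ({difference:+.2f} against the 5.5 average)")
```

In this made-up example the task scores above the midpoint of 4 but still slightly below the average, which is exactly the situation the benchmark helps you spot.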

Other metrics

A product that creates an appropriate feeling goes down better with users, but first you need a clear idea of what users will want from your product. This can be achieved through discovery research.

A design that gets the heart racing and feels exciting may not be the best fit for a banking app, but might work well for a fast-paced gaming platform.

The easiest way to gather these would be to ask participants to rate categories such as those below on a scale of 1-10:

Trust
Pleasure
Frustration

Be careful about trusting the ratings alone! Self-reported data can be misleading, as people find it difficult to accurately quantify their feelings. Make sure you gather qualitative data too, to validate your scores!

Results

Once you’ve got the data it’s time to figure out how to present it. In usability testing it is typical to record your chosen metrics for each user, task by task. Once you have all of the data for each user, and have double-checked it, it’s time to get your averages!

Use a “mean” calculation:

Add up the total values for each metric on each task.
Then divide by the number of data points.

For example: if you have used “number of errors”, add up each user’s number of errors on a particular task, then divide this number by the total number of users.
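
The mean calculation above can be sketched in a few lines of Python. The error counts and task times below are hypothetical, and the two helper functions (`to_seconds` and `to_mm_ss`) are just illustrative names for converting minutes-and-seconds times so they can be averaged.

```python
from statistics import mean

# Hypothetical records for one task, one entry per participant.
errors = [2, 0, 1, 3, 0]
times = ["4:30", "5:10", "6:05", "3:55", "5:20"]  # minutes:seconds


def to_seconds(mm_ss):
    """Convert a 'minutes:seconds' string into a number of seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mm_ss(total_seconds):
    """Format a number of seconds as 'minutes:seconds'."""
    minutes, seconds = divmod(round(total_seconds), 60)
    return f"{minutes}:{seconds:02d}"


# Add up the values, then divide by the number of data points.
mean_errors = sum(errors) / len(errors)

# Times need converting to seconds before they can be averaged.
mean_time = to_mm_ss(mean([to_seconds(t) for t in times]))

print(f"Mean errors: {mean_errors:.1f}")
print(f"Mean time: {mean_time}")
```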

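Example of metrics and data we used for a recent benchmarking project.

| Task   | Success | Time (min:sec) | Clicks | Errors | Satisfaction |
|--------|---------|----------------|--------|--------|--------------|
| Task 1 | 74%     | 5:05           | 9.2    | 1.2    | 6            |
| Task 2 | 77%     | 3:42           | 11.9   | 0.9    | 5.2          |
| Task 3 | 42%     | 3:37           | 10     | 7.3    | 4            |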

Once you have your averages, you can compare these figures with future iterations or other similar products.

Tips for repeat testing

You can use repeat rounds of testing to compare your product with future iterations or competitor products. To make sure differences in the product(s) are the only things affecting the results you gain, you should:

Make sure to use the same tasks
Gather the same metrics in the same way
- For example, use the same definition of severity
Recruit a participant panel that is as similar as possible to the group from the first round
- Tech literacy and prior knowledge of the system are the most important criteria to match
Use the same researchers if possible

Long term monitoring

If your product is designed for continued or repeated use over a long period, you may want to consider gathering data on engagement. This can be useful to track the impact of redesigns on long-term behaviour, as well as any changes to typical long-term behaviour that result from other real-world influences.

Engagement metrics:

Number of visits over a time period, for example: visits per week.
Duration of engagement activity: how long users keep coming back to the site for.
Drop-off rates: the proportion of users who do not return over a given time period.
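
As a rough sketch of how two of these could be calculated from an analytics export, the Python below uses a hypothetical visit log (user IDs and dates are invented). It also assumes one particular definition of drop-off: users seen in the first week who do not return in the second. Your analytics tool may define it differently.

```python
from datetime import date

# Hypothetical visit log: user id -> dates on which they visited.
visits = {
    "u1": [date(2023, 10, 2), date(2023, 10, 4), date(2023, 10, 10)],
    "u2": [date(2023, 10, 3)],
    "u3": [date(2023, 10, 2), date(2023, 10, 9), date(2023, 10, 11)],
    "u4": [date(2023, 10, 5)],
}

# Two-week observation window (both dates inclusive).
period_start = date(2023, 10, 2)
period_end = date(2023, 10, 15)
weeks = ((period_end - period_start).days + 1) / 7

# Visits per user per week, averaged across all users.
total_visits = sum(len(dates) for dates in visits.values())
visits_per_user_per_week = total_visits / len(visits) / weeks

# Assumed drop-off definition: active in week one, absent in week two.
week_two_start = date(2023, 10, 9)
first_week_users = {
    user for user, dates in visits.items()
    if any(d < week_two_start for d in dates)
}
returned = {
    user for user in first_week_users
    if any(d >= week_two_start for d in visits[user])
}
drop_off_rate = 100 * (1 - len(returned) / len(first_week_users))

print(f"Visits per user per week: {visits_per_user_per_week:.1f}")
print(f"Drop-off rate: {drop_off_rate:.0f}%")
```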

Engagement data is often gathered through analytics, which are great for answering surface-level questions about what is happening.

It is not advised to gather this information through longitudinal UX studies, such as diary studies, because they rely on self-reported behaviour, which can be unreliable!

Getting metrics right

There are many good reasons to use metrics in a study. They are great at providing comparable measurements and answering the ‘what’ and ‘how many’ questions you may have about your users’ behaviour more broadly.

Metrics alone are poor for answering “Why?” questions, as they cannot explain the reasons for user behaviour. With this in mind, when you are trying to explain behaviour, metrics are best used to complement qualitative insights and help triangulate your findings.

If you are on a tight budget and want to maximise value, we recommend sticking to qualitative methods, where a small number of participants can provide a huge number of findings.

To sum up, don’t use metrics for the sake of it - use them at the right time. If you focus on identifying and fixing usability issues first, your data will be more reliable and robust. Then get your metrics out to fine-tune the product once it’s reached a higher level!