How satisfied are you with your life?

How happy are you with your job or your marriage?

Are you extroverted or introverted?

It’s hard to capture the fickle nature of attitudes and constructs in any measure. It can be particularly hard to do that with just one question or item.

Consequently, psychology, education, marketing, and user experience have a long history of recommending multiple items to measure a construct.

Using only a single item to measure a construct is often greeted with skepticism—the Net Promoter Score being a recent example. It’s based on responses to a single item (how likely are you to recommend us to a friend?) to measure loyalty.

But is it ever acceptable to use just a single item to measure a construct like loyalty, satisfaction, or ease of use?

History of Multi-Item Scales

The classic text that has influenced much of scale development in the behavioral sciences is Psychometric Theory. Its author, Jum Nunnally, recommended multi-item instruments because “measurement error averages out when individual scores are summed to obtain a total score” (p. 67).

This thinking has influenced standardized testing and personality assessments. For example, a well-known assessment of personality, the 16PF, has 185 items.

Marketers also have...

Finding and fixing usability problems in an interface leads to a better user experience.

Beyond fixing problems with current functionality, participant behavior can also reveal important insights into needed new features.

These problems and insights are often best gleaned from observing participants interacting with a website, app, or hardware device during actual use or simulated use (during a usability test).

With the advent of remote testing platforms like MUIQ, you can collect videos of hundreds of people attempting tasks in a short amount of time. Actually viewing every video, though, can take quite a long time. You’ll therefore want to make the most of your time by systematically learning as much as you can and coding observations into insights. Here’s the process we follow:

1. Start with a random (or pseudo-random) sample of videos.

If you have a lot of videos to watch for a task (e.g., 50+), pick a subset to start with. I recommend picking every 5th or 10th video so the videos you watch aren’t all from the beginning or end of the study. If you have a smaller number of videos (say, fewer than 20), this step is less important and you can take the time to watch all of them.
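The every-5th/10th selection described above amounts to systematic sampling, which can be sketched in a few lines (the file names and counts here are illustrative):

```python
def systematic_sample(video_ids, step, start=0):
    """Select every `step`-th video, starting at `start`, so the
    subset spans the whole study rather than clustering at the
    beginning or end of the collection order."""
    return video_ids[start::step]

# Illustrative: 50 task videos, watch every 5th one first.
videos = [f"video_{i:03d}" for i in range(1, 51)]
subset = systematic_sample(videos, step=5)
print(len(subset))  # 10
```

Varying `start` on later passes lets you work through the remaining videos without re-watching the initial subset.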

2. Record the ID...

Finding and fixing problems encountered by participants through usability testing generally leads to a better user experience.

But not all participants are created equal. One of the major differentiating characteristics is prior experience.

People with more experience tend to complete more tasks successfully, complete them more quickly, and generally have a more positive attitude about the experience than inexperienced people.

But does testing with experienced users lead to uncovering fewer problems or different problems than with novices? As participants gain experience with an app do they find workarounds that mean problems are less detectable in usability tests? Or do expert users tend to uncover the same or even more issues because of their knowledge of the app?

Defining Experience

For quite some time there has been a discussion on the differences between novice and experienced users. For example, in Nielsen’s seminal book, Usability Engineering, he defines three types of experience levels that are worth noting:

  • Domain knowledge: Ignorant versus knowledgeable
  • Computer knowledge: Minimal versus extensive
  • System knowledge: Novice versus expert

While there’s general agreement that domain, tech, and app-specific experience are the aspects that separate novices from experts, it’s less clear where those thresholds lie. For example, Dumas and Redish offer this general guidance:

  • Novice: < 3 months...

Customer satisfaction is a staple of company measurement.

It’s been used for decades to understand how customers feel about a product or experience.

Poor satisfaction scores are an indication of unhappy customers, and unhappy customers generally won’t purchase again, leading to poor revenue growth.

But is satisfaction the wrong measure for most companies?

That’s certainly the claim Fred Reichheld has made; he advocates the Net Promoter Score as a better measure of company growth.

In his book, The Loyalty Effect, Reichheld makes a compelling claim against satisfaction as an effective measure of growth. He reported that 90% of car customers reported being satisfied or very satisfied when responding to a customer satisfaction survey, yet less than 40% actually repurchased the same brand of car (p. 236).

He strengthened that claim in his 2003 HBR article:

“Our research indicates that satisfaction lacks a consistently demonstrable connection to actual customer behavior and growth.”

In particular, he calls out the American Customer Satisfaction Index (ACSI) as being a poor predictor of growth:

“…it is difficult to discern a strong correlation between high customer satisfaction scores and outstanding sales growth.”

Some Evidence NPS Predicts Future Growth

In an earlier article,...

This Valentine’s Day around $2 billion will be spent on flowers.

A lot of that ordering will be online.

Poor online experiences mean shoppers will abandon an order and go somewhere else, or not return when they need to purchase flowers again.

Having a strong user experience will ensure customers can find the right arrangement, for the right price, and have the flowers delivered fresh and on time.

Benchmarking The Flower Ordering Experience

To understand the quality of the online experience, we collected UX benchmark metrics on four popular flower websites in both 2017 and 2018.

  • 1-800-Flowers
  • FTD
  • ProFlowers
  • Teleflora

A good benchmark indicates where a website falls relative to the competition and is an essential step to take to understand how any design changes contribute to a quantifiable improvement. See the introduction to UX benchmarking for more background on this essential UX method.

The Study

We conducted two benchmark studies: one retrospective and one task-based. In the retrospective study, we had 200 participants who recently visited or purchased from one of the flower websites reflect on their most recent experiences. In the task-based study, we had 120 participants who recently visited or purchased from any flower website attempt a predetermined task on one of the four websites (randomly assigned).

The data was collected in January 2018. Participants in the studies answered the 8-item SUPR-Q (including the Net...

UX research efforts should be driven by business questions and a good hypothesis.

Whether the research is a usability evaluation (unmoderated or moderated), survey, or an observational method like a contextual inquiry, decisions need to be made about question wording, response options, and tasks.

But in the process of working through study details, often the original intent of the study can get lost.

At its worst, study design can get bogged down by internal politics as multiple stakeholders provide input.

Decisions are made to satisfy multiple stakeholders rather than to most efficiently address the research goals. To help ensure a study design addresses the research questions and to guide decision-making, we’ve found that a grid aligning research questions to study components helps.

To create a research grid, follow these steps.

  1. List the research goals and hypotheses. Examples of research questions include:
    • What are the pain points users have with purchasing and installation of our enterprise software product?
    • Are participants noticing product placements when searching for computers on our website?
    • How is our brand perceived by our current and prospective customers?
    • How loyal is our customer base?
    • Do financial advisors use the product filter to find our mutual funds?
    • How accurate are the search results? Do key products appear on the first page of the search results page?
  2. Place the research goals, questions, or hypotheses at...

It’s the only number a company needs to grow.

Or at least that’s what was proclaimed in the title of the now-famous HBR article that helped popularize the Net Promoter Score (NPS). Lately, it’s been taking on more criticism.

The NPS is compelling to executives because of its simplicity and for what it purports to do: be the one number a company should track for revenue growth.

It’s hard to get simpler than the NPS. It’s only a single item, compared to multiple items in most customer satisfaction questionnaires. But while simplicity is good, it’s not the only thing. A good measure needs to be valid and reliable. Validity in this sense is best gauged by seeing how well the NPS lives up to its claim: to predict future growth.

NPS Validation

To establish the validity and make the claim that NPS predicts growth, Fred Reichheld, its creator, reported that the NPS was the best or second-best...

While UX research may be a priority for you, it probably isn’t for your participants. And participants are a pretty important ingredient in usability testing.

If people were predictable, reliable, and always did what they said, few of us would make a living in improving the user experience!

Unfortunately, people don’t always show up when they say they will for your usability test, in-depth interview, or other research study. They get busy, forget, or have other things they need to do other than participate in your study.

For in-person and remote studies, always plan for some percentage of participants not showing up. The “typical” no-show rate we see for moderated (in-person or remote) studies is between 10% and 20%, but we’ve seen as many as half the participants not show up.
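A sketch of the overrecruiting arithmetic this implies, assuming you pick a no-show rate from the 10% to 20% range above (the target sample size is illustrative):

```python
import math

def recruits_needed(target_completes, no_show_rate):
    """How many participants to schedule so that, at the expected
    no-show rate, you still reach the target number of sessions."""
    return math.ceil(target_completes / (1 - no_show_rate))

# Illustrative: 10 completed sessions needed.
print(recruits_needed(10, 0.10))  # 12
print(recruits_needed(10, 0.20))  # 13
```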

We’ve been conducting studies every week at MeasuringU for years and have implemented several steps to get no-show rates down to 0. Here are eight of them.

1. Establish both phone and email contact.

Don’t just email participants for scheduling; speak to them on the phone and let them know there are people waiting for them. A simple conversation lets participants know you’re real and counting on them.

2. Communicate it’s a dedicated slot.

Despite the popularity of usability testing and other one-on-one research, most people will have never participated in...

Benchmarking is an essential part of a plan to systematically improve the user experience.

A regular benchmark study is a great way to show how design improvements may or may not be improving the user experience of websites and products.

After you’ve decided you’re ready to conduct a benchmark, you’ll need to consider whether to conduct it internally within your company or outsource all or part of it to an external firm like MeasuringU.

Benchmark studies involve many decisions in study design, along with data collection and analysis. If you manage your own benchmark, I’ve included all the relevant details in my book Benchmarking the User Experience.

Here are major factors to consider when deciding whether to outsource or run your own UX benchmark.

  • Cost: Outsourcing a benchmark will generally cost more than doing it yourself. You can see the typical cost range for benchmark studies we provide. The cost covers the professional services time needed to design and execute the study, UX measurement expertise, and software/technical services, as well as participant recruiting costs. The number of competitors, the sample size, the number of platforms, and the complexity of the analysis are the biggest drivers of cost.
  • Time: Having an external company conduct the benchmark usually means it will get done faster than doing it yourself. Internal teams can focus on interpretation and buy-in instead of dealing with the minutiae...

As the world continues to go mobile, there’s a huge demand for both mobile devices and cellular plans from mobile carriers.

A lot of that buying happens online. Having a good online experience for wireless carriers ensures customers select the right device and plan.

If customers can’t find the right device or have questions about the already complicated plans, they need to visit a store, call the company, or go to a competitor—all undesirable outcomes.

Benchmarking Wireless Carriers’ Website Experiences

To understand the quality of the online experience, we collected UX benchmark metrics on five popular carrier websites:

  • AT&T
  • BT (British Telecom)
  • Sprint
  • T-Mobile
  • Verizon

A good benchmark indicates where a website falls relative to the competition and is an essential step to take to understand how any design changes contribute to a quantifiable improvement. See the introduction to UX benchmarking for more background on this essential UX method.

The Study

We conducted two benchmark studies: retrospective and task-based. In the retrospective study, we had 253 participants who recently visited or purchased from one of the carrier websites reflect on their most recent experiences. In the task-based study, we had 140 participants who recently visited or purchased from any wireless carrier website attempt a task on one of the five websites (randomly assigned).

The data was collected in November 2017. Participants in the studies answered the 8-item

It was another busy year at MeasuringU, with 50 new articles, a new website, a new unmoderated research platform (MUIQ), and our 5th UX Bootcamp.

In 2017 over 1.2 million people viewed our articles. Thank You!

The most common topics we covered include: usability testing, benchmarking, the 3Ms of methods, metrics and measurement, and working with online panels. Here’s a summary of the articles I wrote in 2017.

Usability Testing

Facilitation remains an important skill for conducting usability tests. We provided a resource for facilitators, covered the ten golden rules of facilitating, and showed how thinking aloud—a hallmark of usability testing—affects where people look. There’s evidence that observers may affect the data in usability tests, and I suggested some ways to mitigate the impact. I also provided guidance on when to assist in a usability test, how to best prepare for a moderated benchmark, how to determine task completion, and five less conventional interfaces that could use usability testing.

Prototypes are a staple of early design validation in usability testing. I provided six practical tips for testing with them and five metrics for detecting problems. Interestingly, there’s evidence that even low-fidelity prototypes serve as a good proxy (although not a substitute) for testing with actual websites and products.


Benchmarking is...

How usable is a website?

While most usability activities involve finding and fixing problems on websites (often called formative evaluations), it’s good practice to know how usable a website is (often called a summative evaluation).

Obtaining a quantitative usability benchmark allows you to understand how design and functional changes impacted the user experience. We will cover this in detail at the Denver UX Boot Camp.

Here are the steps we follow in designing, conducting, and analyzing benchmark usability tests.

Designing the Study

Identify the users to test: You probably have some idea about who the users or customers are that come to your website. The biggest variable we find is the users’ prior experience with a website. Users with more experience tend to perform better on tasks, have higher perceptions of usability, and have higher Net Promoter Scores.

Finding users: You can recruit users right off your website using a pop-up invite, email users from an existing customer list, or use a panel agency that finds users who meet your requirements (e.g., recently purchased a car). While it seems like pulling customers off your website or using a customer list would be the obvious lower-cost option, we often find users recruited this way are less motivated and less reliable in completing studies than panel-provided users.

Defining the...

A good measure of customer loyalty should be valid, reliable, and sensitive to changes in customer attitudes.

For the most part, the Net Promoter Score achieves this (although it does have its drawbacks).

One area where the Net Promoter Score falls short is its scoring approach, which adds “noise” to the customer loyalty signal.

The process of subtracting detractors from promoters may be “executive friendly,” but it has the unfortunate side effect of increasing measurement error.

In an earlier article we reviewed the effects of changing the NPS scale from the original 11 points to 10 or even 5 points. We collected data from 520 U.S.-based participants and had them reflect on one of 11 brands/products. Participants answered all three variations of the NPS question with different scales (11, 10, and 5) presented in randomized order.

The results showed that changing the scale indeed changed the Net Promoter Scores (but only a little). The differences were more noticeable for individual brands than in the aggregate. The different Net Promoter Scores by response option for each brand are shown in Table 1 below.

Table 1: Net Promoter Scores by number of scale points.

 Brand      NPS 11   NPS 10   NPS 5
 American     -8%      -6%     -8%
 Comcast     -55%     -53%    -63%
 Delta       -10%       4%     -8%
 DirecTV     -14%     -12%    -20%
 Dish         ...

Surveys are one of the most cost effective ways of collecting data from current or prospective users.

Gathering meaningful insights starts with summarizing raw responses, but how to summarize and interpret those responses isn’t always immediately obvious.

There are many approaches to summarizing and visually displaying quantitative data and it seems people always have a strong opinion on the “right” way.

Here are some of the most common survey questions and response options and some ways we’ve summarized them. We’ll cover many of these approaches at the Denver UX Bootcamp.

Binary Responses

If a question has only two possible response options (e.g., Male/Female, Yes/No, Agree/Disagree), then it has a binary (also called dichotomous) response option. The two percentages, when added, equal 100%. When summarizing just the sample of respondents, such as the percent of women who responded, you can use the ubiquitous pie graph.

Or you could go with something a bit more USA Today:

However, when you want to estimate the percent of users in your entire user population (or at least of those who are likely to participate in your survey) who would agree with a statement, you’ll want to use confidence intervals around the percentage. The graph below shows the percentage of 100 respondents who agreed with a statement.
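One common way to compute that interval is the adjusted-Wald (Agresti-Coull) method, which behaves well even at the small sample sizes typical of UX research. A minimal sketch, with illustrative counts:

```python
import math

def adjusted_wald_ci(successes, n, z=1.96):
    """Adjusted-Wald (Agresti-Coull) confidence interval for a
    proportion: add z^2/2 to the successes and z^2 to the trials,
    then apply the standard Wald formula to the adjusted values."""
    p_adj = (successes + z ** 2 / 2) / (n + z ** 2)
    margin = z * math.sqrt(p_adj * (1 - p_adj) / (n + z ** 2))
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Illustrative: 65 of 100 respondents agreed with the statement.
low, high = adjusted_wald_ci(65, 100)
print(f"95% CI: {low:.1%} to {high:.1%}")
```

Note that the interval is centered on the adjusted proportion, not the observed one, which is what gives the method its good coverage at small n.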

Are you sure you did that right?

When we put effort into making a purchase online, finding information, or attempting tasks in software, we want to know we’re doing things right.

Having confidence in our actions and the outcomes is an important part of the user experience.

That’s why we ask users how confident they are that they completed a task in a usability test or a tree test. To measure confidence we use the following seven-point rating scale.

Even if users are completing tasks or finding items in a navigation structure correctly, it doesn’t mean they are 100% sure that what they did was correct.

Understanding how confident users are that they completed a task is one of many ways of diagnosing interaction problems and providing a benchmark for comparisons between tasks or versions. (Note: This measure of confidence is different from a confidence interval, which is a statistical procedure that puts the most plausible range around a sample mean or proportion.)

Like many UX measures, it can be helpful to have a comparison to provide more meaning to the data. We’ve collected confidence data for a few years now and have compiled data from 21 studies representing 347 tasks, each with between 10 and 320 users....

The System Usability Scale (SUS) is a ten-item questionnaire administered to users for measuring the perceived ease of use of software, hardware, cell phones and websites.

It’s been around for more than a quarter century, and its wide usage has allowed us to study it extensively and write about it in this blog and in the book, A Practical Guide to the System Usability Scale.

If you are unfamiliar with the SUS, see the earlier blog for some background and fundamentals. Here are 10 things to know when using the SUS:

  1. The average SUS Score is a 68: When looking at scores from 500 products, we found the average SUS score to be a 68. It’s important to remember that even though the SUS ranges from 0 to 100, these scores are not percentages. A score of 68 is 68% of the maximum score, but it falls right at the 50th percentile. It’s best to express the raw number as a score and, when you want to express it as a percentage, convert the raw score to a percentile by comparing it to the database.

  2. SUS measures usability & learnability: Even though SUS was intended to be a measure of...
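As background to the 0 to 100 range discussed in point 1, the standard SUS scoring procedure can be sketched as follows (the responses are illustrative):

```python
def sus_score(responses):
    """Standard SUS scoring for 10 items rated 1-5.
    Odd (positively worded) items contribute (rating - 1);
    even (negatively worded) items contribute (5 - rating).
    The 0-40 total is multiplied by 2.5 to give a 0-100 score."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# All-neutral responses land at the scale midpoint.
print(sus_score([3] * 10))  # 50.0
```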

Is a beautiful website more usable?

Psychological literature has discussed, for some time, the “what is beautiful is good” phenomenon.

That is, we ascribe positive attributes to things that are more attractive.

This applies to people and likely to products and websites, as well. But does that positive halo also carry over to our impressions of website usability?

It’s a bit of an open research question, but first, we need to consider: how reliable are impressions of website beauty?

Forming Impressions Early

We form impressions of the visual appeal of websites in a fraction of a second. Gitte Lindgaard and her team at the appropriately named HOT Laboratory (Human Oriented Technology) found that participants in their studies could form reliable impressions of website visual appeal in as little as 50 milliseconds (Lindgaard et al., 2006)! It takes 250 milliseconds to blink! They also found participants’ ratings of the same 100 homepages were consistent over time (typically an R-squared of ~94%). That is, if users think a webpage has low attractiveness at one point in time, they feel the same way at a future point.

How to Measure Visual Appeal

In reviewing the literature on rating aesthetics, beauty and visual appeal, researchers often generate their own set of questions and scales to measure these somewhat fuzzy and overlapping constructs. There’s...

To understand problems on a website, nothing quite beats watching users. The process provides a wealth of information both about what users can or can’t do and what might be causing problems in an interface.

The major drawback to watching users live or watching session recordings is that it takes a lot of focused time. Watching 5 to 20 participants—the typical sample size in moderated studies—isn’t too much of a commitment.

But with unmoderated studies, the ability to collect data from hundreds of participants quickly means even a few tasks per study can require watching thousands of videos.

While there’s no replacement for watching session recordings and coding them (something our research team does regularly), we’re always looking for more systematic and quicker ways to identify patterns in behavior.

We’ve found that a few metrics automatically collected in unmoderated studies are good for diagnosing potential problems with websites. The MUIQ research platform collects these metrics for every page visited and summarizes them by page (see Figure 1 below and Figure 2 at the end of the article). Here are the five metrics in more detail.

Figure 1: Key metrics summarized on each page of an unmoderated study (in this example from a...

Benchmarking is an essential step in making quantitative improvements to the user experience of a website, app, or product.

In an earlier article and course, I discussed the benefits of regular benchmarking, and it’s the topic of my forthcoming book.

While technology platforms like MUIQ have made unmoderated benchmarks popular, moderated benchmarks are still essential for benchmarking physical products and desktop and enterprise software. A moderated benchmark can still be a better option for websites and mobile apps.

Benchmarks are all about the data and you’ll want to ensure your procedures are properly in place to collect reliable and representative data. Before running your first participant, here are five things to do to prepare for a moderated benchmarking study.

1. Review the Study Script

Have the facilitators go through the script in detail to ensure they understand all the steps participants can possibly take with the interface. Facilitators should pay particular attention to points where they will need to probe or intervene and ensure they know how to score each task.

Tip: To collect more reliable task time metrics, we don’t recommend facilitators probe or interrupt the participant mid-task. Instead, save questions and probing points for between tasks or at the beginning or end of the study.

2. Check the Technology (Twice)

If technology always worked as planned, there wouldn’t be...

The Net Promoter Score (NPS) is a popular metric for measuring customer loyalty.

For many companies, it’s THE only metric that matters.

With such wide usage across different industries, departments, and companies of various sizes, it’s no surprise many questions and controversies arise.

Some are systemic—should the NPS be used as a key metric?—and some are trivial—should the NPS be treated as a percentage?

In case you aren’t familiar with it, the NPS is based on a single question, “How likely are you to recommend a product to a friend or colleague?” Participants respond on an 11-point scale (0 = not at all likely to recommend and 10 = extremely likely to recommend).

Responses of 9 and 10 are considered “promoters,” 7 and 8 “passives,” and 0-6 “detractors.” Detractors are customers likely saying bad things about your product or service and even discouraging others from using it. Promoters are customers most likely to spread positive word of mouth.
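That categorization, and the subtraction described next, can be sketched in a few lines (the ratings are illustrative):

```python
def nps(ratings):
    """Net Promoter Score from 0-10 likelihood-to-recommend ratings:
    the percentage of promoters (9-10) minus the percentage of
    detractors (0-6); passives (7-8) only affect the denominator."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Illustrative: 4 promoters, 3 passives, 3 detractors.
print(nps([10, 9, 9, 10, 8, 7, 7, 6, 5, 3]))  # 10.0
```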

The “Net” in Net Promoter Score comes from subtracting the percentage of detractors from the percentage of promoters. A negative score means you have more detractors than promoters and a positive score means your promoters outweigh...