As the world continues to go mobile, there’s a huge demand for both mobile devices and cellular plans from mobile carriers.

A lot of that buying happens online. For wireless carriers, a good online experience helps ensure customers select the right device and plan.

If customers can’t find the right device or have questions about the already complicated plans, they need to visit a store, call the company, or go to a competitor—all undesirable outcomes.

Benchmarking Wireless Carriers’ Website Experiences

To understand the quality of the online experience, we collected UX benchmark metrics on five popular carrier websites:

  • AT&T
  • BT (British Telecom)
  • Sprint
  • T-Mobile
  • Verizon

A good benchmark indicates where a website falls relative to the competition and is an essential step to take to understand how any design changes contribute to a quantifiable improvement. See the introduction to UX benchmarking for more background on this essential UX method.

The Study

We conducted two benchmark studies: retrospective and task-based. In the retrospective study, we had 253 participants who recently visited or purchased from one of the carrier websites reflect on their most recent experiences. In the task-based study, we had 140 participants who recently visited or purchased from any wireless carrier website attempt a task on one of the five websites (randomly assigned).

The data was collected in November 2017. Participants in the studies answered the 8-item

It was another busy year, with 50 new articles, a new website, a new unmoderated research platform (MUIQ), and our 5th UX Bootcamp.

In 2017 over 1.2 million people viewed our articles. Thank You!

The most common topics we covered include: usability testing, benchmarking, the 3Ms of methods, metrics and measurement, and working with online panels. Here’s a summary of the articles I wrote in 2017.

Usability Testing

Facilitation remains an important skill for conducting usability tests. We provided a resource for facilitators, covered the ten golden rules of facilitating, and showed how thinking aloud—a hallmark of usability testing—affects where people look. There’s evidence that observers may affect the data in usability tests, and I suggested some ways to mitigate the impact. I also provided guidance on when to assist in a usability test, how to best prepare for a moderated benchmark, how to determine task completion, and five less conventional interfaces that could use usability testing.

Prototypes are a staple of early design validation in usability testing. I provided six practical tips for testing with them and five metrics for detecting problems. Interestingly, there’s evidence that even low-fidelity prototypes serve as a good proxy (although not a substitute) for testing with actual websites and products.


Benchmarking is...

How usable is a website?

While most usability activities involve finding and fixing problems on websites (often called formative evaluations), it’s good practice to know how usable a website is (often called a summative evaluation).

Obtaining a quantitative usability benchmark allows you to understand how design and functional changes impact the user experience. We will cover this in detail at the Denver UX Boot Camp.

Here are the steps we follow in designing, conducting and analyzing benchmark usability tests.

Designing the Study

Identify the users to test: You probably have some idea about who the users or customers are that come to your website. The biggest variable we find is the users’ prior experience with a website. Users with more experience tend to perform better on tasks, have higher perceptions of usability, and have higher Net Promoter Scores.

Finding users: You can recruit users right off your website using a pop-up invite, email users from an existing customer list, or use a panel agency that finds users who meet your requirements (e.g., recently purchased a car). While pulling customers off your website or using a customer list seems like the obvious lower-cost option, we often find users recruited this way are less motivated and less likely to complete studies than panel-provided users.

Defining the...

A good measure of customer loyalty should be valid, reliable, and sensitive to changes in customer attitudes.

For the most part, the Net Promoter Score achieves this (although it does have its drawbacks).

One area where the Net Promoter Score falls short is in how its scoring approach adds “noise” to the customer loyalty signal.

The process of subtracting detractors from promoters may be “executive friendly” but has the unfortunate side effect of increasing measurement error.

In an earlier article we reviewed the effects of changing the NPS scale from the original 11 points to 10 or even 5 points. We collected data from 520 U.S.-based participants and had them reflect on one of 11 brands/products. Participants answered all three variations of the NPS question with different scales (11, 10, and 5) presented in randomized order.

The results showed that changing the scale indeed changed the Net Promoter Scores (but only a little). The differences were more noticeable for individual brands than in the aggregate. The different Net Promoter Scores by response option for each brand are shown in Table 1 below.

| Brand    | NPS 11 | NPS 10 | NPS 5 |
|----------|--------|--------|-------|
| American | -8%    | -6%    | -8%   |
| Comcast  | -55%   | -53%   | -63%  |
| Delta    | -10%   | 4%     | -8%   |
| DirecTV  | -14%   | -12%   | -20%  |
| Dish     | …      | …      | …     |

Surveys are one of the most cost effective ways of collecting data from current or prospective users.

Gathering meaningful insights starts with summarizing raw responses, but how to summarize and interpret those responses isn’t always immediately obvious.

There are many approaches to summarizing and visually displaying quantitative data and it seems people always have a strong opinion on the “right” way.

Here are some of the most common survey questions and response options and some ways we’ve summarized them. We’ll cover many of these approaches at the Denver UX Bootcamp.

Binary Responses

If a question has only two possible response options (e.g., Male/Female, Yes/No, Agree/Disagree) then it is a binary (also called dichotomous) response option. Both options, when added, equal 100%. When summarizing just the sample of respondents, such as the percent of women who responded, you can use the ubiquitous pie graph.

Or you could go with something a bit more USA Today:

However, when you want to estimate the percent of users in your entire user population (or at least out of those who are likely to participate in your survey) who would agree with a statement, then you’ll want to use confidence intervals around the percentage. The graph below shows the percentage of the 100 respondents that agreed to a statement.
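One common way to compute such an interval is the adjusted Wald (Agresti-Coull) method, which behaves well even at small sample sizes. A minimal Python sketch (the function name and the 52-of-100 figures are illustrative, not the data behind the graph):

```python
import math

def adjusted_wald_ci(successes, n, z=1.96):
    """Adjusted Wald (Agresti-Coull) confidence interval for a proportion.

    Adds z^2/2 to the successes and z^2 to the trials before computing
    a standard Wald interval, which keeps coverage accurate even for
    small samples and extreme proportions.
    """
    p_adj = (successes + z * z / 2) / (n + z * z)
    margin = z * math.sqrt(p_adj * (1 - p_adj) / (n + z * z))
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# e.g., 52 of 100 respondents agreed with the statement
low, high = adjusted_wald_ci(52, 100)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

With 52 of 100 agreeing, the 95% interval spans roughly 42% to 62%, a useful reminder of how wide the plausible range is at this sample size.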

Are you sure you did that right?

When we put the effort into making a purchase online, finding information or attempting tasks in software, we want to know we’re doing things right.

Having confidence in our actions and the outcomes is an important part of the user experience.

That’s why we ask users how confident they are that they completed a task in a usability test or a tree test. To measure confidence, we use a seven-point rating scale.

Even if users are completing tasks or finding items in a navigation structure correctly, it doesn’t mean they are 100% sure that what they did was correct.

Understanding how confident users are that they completed a task is one of many ways of diagnosing interaction problems and providing a benchmark for comparisons between tasks or versions. (Note: This measure of confidence is different than a confidence interval, which is a statistical procedure to put the most plausible range around a sample mean or proportion).
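As a side note on that statistical sense of the word, a confidence interval around a mean rating can be sketched as follows (the ratings are hypothetical; for the small samples common in usability tests, a t critical value is the more accurate choice than the z used here):

```python
import math
import statistics

def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval around a sample mean.

    Uses a z critical value to stay dependency-free; with small
    samples, substitute a t critical value (e.g., from scipy.stats)
    for better accuracy.
    """
    n = len(values)
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf((1 + level) / 2)
    return m - z * se, m + z * se

# Hypothetical 7-point confidence ratings from ten participants
ratings = [6, 7, 5, 6, 7, 4, 6, 5, 7, 6]
low, high = mean_confidence_interval(ratings)
print(f"mean {statistics.mean(ratings):.1f}, 95% CI {low:.2f} to {high:.2f}")
```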

Like many UX measures, it can be helpful to have a comparison to provide more meaning to the data. We’ve collected confidence data for a few years now and have compiled data from 21 studies representing 347 tasks, each with between 10 and 320 users....

The System Usability Scale (SUS) is a ten-item questionnaire administered to users for measuring the perceived ease of use of software, hardware, cell phones and websites.

It’s been around for more than a quarter century, and its wide usage has allowed us to study it extensively and write about it in this blog and in the book, A Practical Guide to the System Usability Scale.

If you are unfamiliar with the SUS, see the earlier blog for some background and fundamentals. Here are 10 things to know when using the SUS:

  1. The average SUS Score is a 68: When looking at scores from 500 products, we found the average SUS score to be a 68. It’s important to remember that SUS scores are not percentages: even though the SUS ranges from 0 to 100, a score of 68 is 68% of the maximum score but falls right at the 50th percentile. It’s best to express the raw number as a score and, when you want to express it as a percentage, convert the raw score to a percentile by comparing it to the database.

  2. SUS measures usability & learnability: Even though SUS was intended to be a measure of...
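The score-to-percentile conversion mentioned in point 1 can be roughly approximated if you assume the comparison database is normally distributed. In this sketch, the mean of 68 comes from the article above, while the standard deviation of 12.5 is an assumed, illustrative figure rather than an official database value:

```python
import math

def sus_percentile(score, mean=68.0, sd=12.5):
    """Approximate percentile rank of a raw SUS score under a normal model.

    The normal CDF is computed with the error function; mean and sd
    describe the assumed distribution of the comparison database.
    """
    z = (score - mean) / sd
    return 50 * (1 + math.erf(z / math.sqrt(2)))

print(round(sus_percentile(68)))  # the average score sits at the 50th percentile
print(round(sus_percentile(80)))  # a raw 80 ranks far higher than "80 percent"
```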

Is a beautiful website more usable?

Psychological literature has discussed, for some time, the “what is beautiful is good” phenomenon.

That is, we ascribe positive attributes to things that are more attractive.

This applies to people and likely to products and websites, as well. But does that positive halo also carry over to our impressions of website usability?

It’s a bit of an open research question, but first, we need to consider: how reliable are impressions of website beauty?

Forming Impressions Early

We form impressions of the visual appeal of websites in a fraction of a second. Gitte Lindgaard and her team at the appropriately named HOT Laboratory (Human Oriented Technology) found that participants in their studies could form reliable impressions of website visual appeal in as little as 50 milliseconds (Lindgaard et al 2006)!  It takes 250 milliseconds to blink! They also found participants’ ratings of the same 100 homepages were consistent over time (typically R-Square of ~94%). That is, if users think a webpage has low attractiveness at one point in time, they feel the same way at a future point.

How to Measure Visual Appeal

In reviewing the literature on rating aesthetics, beauty and visual appeal, researchers often generate their own set of questions and scales to measure these somewhat fuzzy and overlapping constructs. There’s...

To understand problems on a website, nothing quite beats watching users. The process provides a wealth of information both about what users can or can’t do and what might be causing problems in an interface.

The major drawback to watching users live or reviewing session recordings is that it takes a lot of focused time. Watching the 5 to 20 participants typical of moderated studies isn’t too much of a commitment.

But with unmoderated studies, the ability to collect data from hundreds of participants quickly means even a few tasks per study can require watching thousands of videos.

While there won’t be a replacement for watching session recordings and coding them (something our research team does regularly), we’re always looking for more systematic and quicker ways to identify patterns in behavior.

We’ve found that a few metrics automatically collected in unmoderated studies are good for diagnosing potential problems with websites. The MUIQ research platform collects these metrics for every page visited and summarizes them by page (see Figure 1 below and Figure 2 at the end of the article). Here are the five metrics in more detail.

Figure 1: Key metrics summarized on each page of an unmoderated study (in this example from a...

Benchmarking is an essential step in making quantitative improvements to the user experience of a website, app, or product.

In an earlier article and course, I discussed the benefits of regularly benchmarking and it’s the topic of my forthcoming book.

While technology platforms like MUIQ have made unmoderated benchmarks popular, moderated benchmarks are still essential for benchmarking physical products and desktop and enterprise software. They can also be a better option for websites and mobile apps.

Benchmarks are all about the data and you’ll want to ensure your procedures are properly in place to collect reliable and representative data. Before running your first participant, here are five things to do to prepare for a moderated benchmarking study.

1. Review the Study Script

Have the facilitators go through the script in detail to ensure they understand all the steps participants can possibly take with the interface. Facilitators should pay particular attention to points where they will need to probe or intervene and ensure they know how to score each task.

Tip: To collect more reliable task time metrics, we don’t recommend facilitators probe or interrupt the participant mid-task. Instead, save questions and probing points for between tasks or at the beginning or end of the study.

2. Check the Technology (Twice)

If technology always worked as planned, there wouldn’t be...

The Net Promoter Score (NPS) is a popular metric for measuring customer loyalty.

For many companies, it’s THE only metric that matters.

With such wide usage across different industries, departments, and companies of various sizes, it’s no surprise many questions and controversies arise.

Some are systemic—should the NPS be used as a key metric?—and some are trivial—should the NPS be treated as a percentage?

In case you aren’t familiar with it, the NPS is based on a single question, “How likely are you to recommend a product to a friend or colleague?” Participants respond on an 11-point scale (0 = not at all likely to recommend and 10 = extremely likely to recommend).

Responses of 9 and 10 are considered “promoters,” 7 and 8 “passives,” and 0-6 “detractors.” Detractors are customers likely saying bad things about your product or service and even discouraging others from using it. Promoters are customers most likely to spread positive word of mouth.

The “Net” in Net Promoter Score comes from subtracting the percentage of detractors from the percentage of promoters. A negative score means you have more detractors than promoters and a positive score means your promoters outweigh...
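The calculation is simple enough to express directly in code. A minimal Python sketch (the function name and the sample ratings are illustrative):

```python
def net_promoter_score(ratings):
    """Compute the NPS from 0-10 likelihood-to-recommend ratings.

    Promoters rate 9 or 10, detractors 0 through 6; the score is the
    percentage of promoters minus the percentage of detractors, so it
    ranges from -100 to +100.
    """
    n = len(ratings)
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / n

# Seven hypothetical respondents: 3 promoters, 2 passives, 2 detractors
print(round(net_promoter_score([10, 9, 8, 7, 6, 3, 10]), 1))  # 14.3
```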

Many researchers are familiar with the Hawthorne Effect in which people act differently when observed.

It was named when researchers found workers at the Hawthorne Works factory performed better not because of increased lighting but because they were being watched.

This observer effect happens not only with people but also with particles. In physics, the mere act of observing a phenomenon (like subatomic particle movement) changes the outcome.

The Observer Effect

While there’s been some question about the actual details of the now infamous Hawthorne experiment (and people are not subatomic particles), there is strong evidence from other sources for this aptly named social facilitation, audience effect, or, more generally, the observer effect: people tend to act differently when being observed.

The “white-coat” effect shows that a patient’s blood pressure rises from the psychological effect of the office visit (who likes going to the doctor?). The effect also differs depending on the gender of the observer and participant. Research has found, for example, that women appear more physiologically affected by social rejection, whereas men react more to achievement challenges.

Interestingly, the behavior isn’t always consistent. There’s some evidence that when people perform rote and simple tasks, performances tend to improve when being observed. Conversely, performance tends to degrade when tasks are complex and less...

The System Usability Scale (SUS) is the most widely used questionnaire for measuring the perception of usability.

It’s been around for more than 30 years. While its original term “system” has fallen somewhat out of favor, its usage has not—with thousands of citations in the literature.

The system can be anything: business software, consumer software, websites, mobile apps, or hardware.

With such wide usage across industry and academia, there has been a lot of research into the SUS and many practitioners may not be familiar with some of it. Here are four recent advances with the System Usability Scale.

1. Drop the Learnability factor

While the SUS was designed to be unidimensional (measuring only the construct of perceived usability), there was some evidence it measured more than one thing (that it was multidimensional). In 2009, Jim Lewis and I published a paper using independent datasets that showed the SUS had a second factor. We called it the learnability factor; it comprises items 4 and 10, based on their wording:

  • I think that I would need the support of a technical person to be able to use this system.
  • I would imagine that most people would learn to use this system very quickly.
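Under this two-factor view, the Learnable subscale is built from items 4 and 10 and the Usable subscale from the remaining eight items, each rescaled to 0-100. A Python sketch of the scoring (the function name is ours; item contributions follow the standard SUS convention of response minus 1 for odd items and 5 minus response for even items):

```python
def sus_scores(responses):
    """Score one SUS questionnaire (ten responses, each 1-5).

    Returns (overall, usable, learnable). Learnable comes from items
    4 and 10; Usable from the other eight items. The multipliers
    simply rescale each score to the 0-100 range.
    """
    # i is 0-based, so even i means an odd-numbered item
    contrib = [(r - 1) if i % 2 == 0 else (5 - r)
               for i, r in enumerate(responses)]
    overall = sum(contrib) * 2.5
    learnable = (contrib[3] + contrib[9]) * 12.5
    usable = (sum(contrib) - contrib[3] - contrib[9]) * 3.125
    return overall, usable, learnable

# Best possible answers: 5s on odd items, 1s on even items
print(sus_scores([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # (100.0, 100.0, 100.0)
```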

While other papers since 2009 have consistently found more than one factor (good), there wasn’t consistency in which items loaded on the two...

Surveys often suffer from having too many questions.

Many items are redundant or don’t measure what they intend to measure.

Even worse, survey items are often the result of “design by committee” with more items getting added over time to address someone’s concerns.

Let’s say an organization uses the following items in a customer survey:

  • Satisfaction
  • Importance
  • Usefulness
  • Ease of use
  • Happiness
  • Delight
  • Net Promoter (Likelihood to recommend)

Are all those necessary? If you had to reduce them, which do you pick?

Instead of including or excluding items arbitrarily (first in, first out) or based on the most vocal stakeholder, you can take a more scientific approach to determine which variables to keep.

Clients often pose this question to us when examining their data. As is common with quantitative analysis, you can take multiple approaches and in this case they’re all based on correlations. Here are seven techniques we use to identify which items to remove or keep, progressing in sophistication (and needed skill and software).

  1. Correlation Between Items: Start with a simple correlation table between the items and look for items that don’t tend to correlate highly with the others. The figure below is a correlation matrix between nine items in a questionnaire (labeled A to I). Item D tends to correlate lower with the other items and, by this approach, is a good candidate for removal. Although it’s...
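The correlation-table approach can be illustrated with a small synthetic dataset (the item names, sample size, and noise structure below are invented for the example; a real analysis would use actual survey responses):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Items A, B, C, and E share a common factor; D is pure noise
factor = rng.normal(size=n)
items = {
    "A": factor + 0.5 * rng.normal(size=n),
    "B": factor + 0.5 * rng.normal(size=n),
    "C": factor + 0.5 * rng.normal(size=n),
    "D": rng.normal(size=n),  # the unrelated item
    "E": factor + 0.5 * rng.normal(size=n),
}

names = list(items)
corr = np.corrcoef([items[k] for k in names])

# Average correlation of each item with the others (off-diagonal mean)
mean_r = (corr.sum(axis=1) - 1) / (len(names) - 1)
weakest = names[int(np.argmin(mean_r))]
print(weakest)  # flags the candidate for removal
```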

There’s been a long heated debate within the industry about the need to certify User Experience practitioners.

After all, many professional organizations—accountants, realtors, attorneys, and even ergonomists—have some sort of official certification.

Certifications provide some indication that a minimum threshold of competence has been demonstrated, which in theory should help prospective employers, customers, and the industry as a whole. Certification may even be helpful with the perennial confusion between UI and UX.

While this debate will continue, several organizations already offer varying degrees of certifications, including HFI and Nielsen Norman Group, and there are certified university courses (including our own UX Boot Camp).

Certification is a challenging topic, as is higher education in general, because it involves multiple prongs that are debated:

  1. The content and experience one needs
  2. The credential one receives
  3. The cost—usually a non-trivial amount of money

While there will inevitably be some disagreement on what content should be included in a certification program, there’s likely high agreement on several core concepts and activities. So while few can argue against the benefits of the knowledge and experience gained from a certification, many do wonder whether the credential and its corresponding cost are worth it.

After all, a motivated person can read a few books and papers, watch free online videos, and conduct some projects...

We conduct unmoderated UX studies, surveys, and various forms of online research every week at MeasuringU.

Part of our process for delivering effective research is spending enough time up front on issues that affect the quality of results.

Here are our nine recommendations for conducting better online research.

  1. Use a Study Script. A study script is similar to a blueprint for online research or a prototype for a functioning product. It’s best to work through all the details while it’s easy to make changes. After a study is programmed with logic, conditions, and tasks, it takes much longer to make changes, and changes introduce opportunities for errors. Study scripts don’t need to be fancy; Word documents or Google Docs work well for online collaboration and tracking changes.
  2. Do You Really Need Every Demographic Question? There’s a tendency to want to ask anything and everything about the participants in an online study: age, income, gender, education, geography, and occupation, to name a few. This is especially the case when using paid participants from online panels, where little is known about the respondents. Demographic question-bloat can be particularly bad when multiple stakeholders want input and each takes a turn adding demographic questions. For every demographic question, ask:
    • How will you report on it?
    • Can you get the...

Prototypes are an effective method for incorporating early and frequent user feedback into the design process.

Even low-fidelity prototypes have been found to be good predictors of usability problems and perceptions of ease compared to fully functioning products.

We test client prototypes just about every week here at MeasuringU using our research platform MUIQ. They range from prototypes for major consumer brands to internal facing IT apps.

At one time, all high-fidelity prototypes seemed to come from Axure. Over the last couple of years we’ve seen a complete shift to InVision. Regardless of the prototyping solution you use, here are six recommendations we’ve found helpful for teams looking to evaluate prototypes with users.

    1. Plan for changes. Be sure you have someone with the right knowledge and skills to make changes or fix problems with your prototype. In our experience, the people with the skills to create clickable prototypes are in high demand. Be sure someone who can change both the look and the functionality of the prototype is available before and during evaluation.
    2. Use caution when comparing prototypes with live sites. Organizations often want to know whether their proposed designs are more effective than their existing websites or products. Running a head-to-head comparison is a natural step. However, by definition, fully...

Many researchers are familiar with the SUS, and for good reason.

It’s the most commonly used and widely cited questionnaire for assessing the perception of the ease of using a system (software, website, or interface).

Despite being short—10 items—the SUS has a fair amount of redundancy given it only measures one construct (perceived usability).

While some redundancy is good to improve reliability, shorter questionnaires are preferred when time in a usability study is limited or when a measure of usability is needed as part of a larger survey (which may already be too long).


In response to the need for a shorter questionnaire, Finstad introduced the Usability Metric for User Experience (UMUX) in 2010. It’s intended to be similar to the SUS but is shorter and targeted toward the ISO 9241 definition of usability (effectiveness, efficiency, and satisfaction). It contains two positive and two negative items with a 7-point response scale. The four items are:

[This system’s] capabilities meet my requirements.

Using [this system] is a frustrating experience.

[This system] is easy to use.

I have to spend too much time correcting things with [this system].
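UMUX is conventionally scored much like the SUS: the positive items (1 and 3) contribute their rating minus 1, the negative items (2 and 4) contribute 7 minus their rating, and the sum is rescaled to 0-100 for comparability with SUS scores. A Python sketch (the function name is ours):

```python
def umux_score(responses):
    """Score the four-item UMUX (each response on a 1-7 scale).

    Items 1 and 3 are positively worded, items 2 and 4 negatively
    worded; the maximum raw sum of 24 is rescaled to 100.
    """
    r1, r2, r3, r4 = responses
    total = (r1 - 1) + (7 - r2) + (r3 - 1) + (7 - r4)
    return total * 100 / 24

print(umux_score([7, 1, 7, 1]))  # best possible responses -> 100.0
print(umux_score([4, 4, 4, 4]))  # scale midpoint -> 50.0
```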

While reducing length is a good thing, it’s not the only thing to be concerned about when developing an effective questionnaire. Subsequent analyses by Lewis et al....

UX benchmarking is an effective method for understanding how people use and think about an interface, whether it’s for a website, software, or mobile app.

Benchmarking becomes an essential part of a plan to systematically improve the user experience.

A lot is involved in conducting an effective benchmark. I’m covering many of the details in a 4-week course sponsored by the UXPA this October 2017. More on benchmarking will be available in my forthcoming book, Benchmarking the User Experience.

To start, benchmarking the user experience effectively means first understanding what benchmarking is and what the user experience is, and then progressing to methods, metrics, and analysis.

What Is User Experience?

Few things seem to elicit more disagreement than the definition of user experience and how it may or may not differ from user interface design or usability testing. While I don’t intend to offer an official definition (there’s some value in the debate), here’s the definition I use, similar to Tullis & Albert’s: The user experience is the combination of all the behaviors and attitudes people have while interacting with an interface. These include but aren’t limited to:

  • Ability to complete tasks
  • The time it takes to complete tasks or find information
  • Ability to find products or information
  • Attitudes toward visual appearance
  • Attitudes toward trust and credibility
  • Perceptions of ease, usefulness and satisfaction

These are...

Facilitating a usability test is a skill.

With enough of the right practice you’ll get better at facilitating and running more effective usability test sessions.

A solid foundation in both the theory and practical application of facilitating a usability test will help you become an effective facilitator.

To help, here are ten resources for both beginners and intermediate usability test facilitators.

1. Read about the technique of usability facilitation.

While the best learning for facilitating comes from doing, it’s good to have a foundation in the techniques of facilitating. An excellent place to start is Dumas and Loring’s Moderating Usability Tests.

2. Understand the history and evolution of thinking aloud.

Having participants think aloud as they use an interface is a cornerstone technique of usability testing. To better understand both the method and how to best apply (or not apply) it in a usability test, it helps to know where think aloud came from and how it’s evolved, and even how it may affect a participant’s behavior.

3. Refine these five core techniques of facilitating.
  1. Reduce or avoid using “why” questions.
  2. Try not to plant ideas in the minds of the users.
  3. Minimize yes and no questions.
  4. Don’t rely too heavily on the “would you?” questions.
  5. Gently deflect questions back on the user to understand what they think (not you).
4. Know the ten golden rules...