A client recently asked: When should I use Remote Usability Testing rather than A/B Testing? While doing some research into when each method works best, I came across some data on the low success rates of A/B Testing.
According to this book, only 10% of Google’s A/B tests led to a success. (1)
Ronny Kohavi of Bing (Microsoft) quotes a failure rate of about 80%-90%. That is a success rate of 10%-20%. (2)
QualPro, who were one of the pioneers of multivariate testing, have published data on testing 150,000 ideas over 22 years. 25% of tests showed improvements, 53% had no significant impact, and 22% made things worse. (3)
A/B Testing from AppSumo.com showed only 1 out of 8 A/B tests drove significant change. Again, that is a success rate of only around 12.5%. (4)
Low success rates do not mean that the uplift will be big when a significant change is found. What these low numbers mean is that discovering an improvement will take far more time and investment than most people realize. In another post, I will discuss how using Remote Usability Testing for Interaction Testing can be a useful addition to the optimisation toolkit.
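To put the success rates quoted above in terms of effort, here is a minimal Python sketch. It assumes, purely for illustration, that every test is independent and has the same chance of winning; the function names are ours and not part of any tool.

```python
# Reported A/B test success rates quoted above (fraction of tests that "won").
success_rates = {
    "Google (1)": 0.10,
    "QualPro (3)": 0.25,
    "AppSumo (4)": 0.125,
}


def expected_tests_per_win(success_rate: float) -> float:
    """Average number of tests run per winning test (mean of a geometric distribution)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


def chance_of_at_least_one_win(success_rate: float, tests: int) -> float:
    """Probability that at least one of `tests` independent tests is a win."""
    return 1 - (1 - success_rate) ** tests


for source, rate in success_rates.items():
    print(
        f"{source}: about {expected_tests_per_win(rate):.0f} tests per win, "
        f"{chance_of_at_least_one_win(rate, 5):.0%} chance of a win in 5 tests"
    )
```

Under these assumptions, a 10% success rate means roughly ten tests for every win, which is where the extra time and investment comes from.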
We wanted to find out what the real impact of CAPTCHAs is on customer experience. To find out, we set up an online usability test using our own tool (this kind of research is sometimes also called an unmoderated remote usability test).
Impact on User Experience
In the research, people had to complete a CAPTCHA as part of a ticket registration process.
The result showed that:
The chart below shows how hard it was to complete the CAPTCHA. It illustrates the percentage of participants who failed to enter the correct code at each attempt. On the first attempt, 38% of participants failed to enter the CAPTCHA correctly. On the second attempt, over 80% of the participants who retried failed again. On each subsequent attempt, between 70% and 90% of the people who retried failed again. No participant tried more than 5 times.
Additionally, 36% of participants mentioned in the post-task questions that they had trouble with the CAPTCHA. To quote some of them:
“The CAPTCHA took about 4 attempts even though I’m sure I did it correctly. It’s never clear with these things whether to make a space between the words.”
“Captcha: it sucks”
“Captcha is hard on an iPhone”
“The captcha made me quit the task…”
Our research showed that CAPTCHAs have a significant negative impact on customer experience. They make a task hard, and for some people impossible to complete. The result is people abandoning their goal, and going elsewhere.
The challenge is that, rather than the business solving the issue of spam, the issue is pushed to the customer to solve. This creates more work and effort for the customer.
The cost of this is high. Reddit recently shared the results of removing the CAPTCHA from their registration process. They found that removing the CAPTCHA increased the account creation rates by nearly 8%. This means 4500-5000 new users a week, or over 200,000 new users a year.
While CAPTCHAs are trying to solve a business problem, they do not simply reduce spam. Instead, they significantly reduce the number of customers completing their online goal.
UXLX is happening this week in Lisbon. That can only mean that the legendary UxCocktail Hour is back as well! After four years, we’ll be bigger and better than ever!
UxCocktail Hour is the only unofficial fringe party during UXLX. If you are in Lisbon, come and join us for a drink as the sun goes down at the Miradouro São Pedro de Alcantara (pictures of the miradouro).
The idea of the UxCocktail Hour is to have a drink before dinner overlooking old Lisbon, after a day in a conference room in modern Expo. Don’t miss seeing in daylight a city that Charles Dickens described as even more beautiful than Venice.
There will be a bus from FIL, where UXLX is being held, to Miradouro São Pedro de Alcantara where the UxCocktail Hour takes place. The bus leaves FIL just after 18h15 to take you to the party.
For the local UX community, and anyone else who wants to make their own way to the UxCocktail Hour, the drinks start at 19h00.
If you want to join the UXLX official dinner, it is happening a short walk down the hill in Terreiro do Paço. Alternatively, you can join us for dinner (“ask us for details”).
Where and When
Thursday 5th of June 2014
Miradouro São Pedro de Alcantara (map)
Trype Oriente Hotel 18h15 | 18h30
Rotunda FIL 18h35 | 18h45
Miradouro 19h00 | 19h15
Save your seat on the bus. Sign up here: vista.webnographer.com
About the organizers
Webnographer is a usability agency that collects data, and generates insights to back up design decisions. It has offices in Lisbon and Ireland.
Web Made Good mix innovative technology with great design to make the web a better place. They have offices in Lisbon and London.
ONDACITY is the leading provider of relocation services to Lisbon. It is about to launch a new product offering “War Rooms”, or project rooms, to people wanting to escape to Lisbon where they can be creative. The founders of Ondacity are from Malta and Australia.
Think your idea’s that valuable? Then go try to sell it and see what you get for it. Not much is probably the answer. Until you actually start making something, your brilliant idea is just that, an idea. And everyone’s got one of those.
Jason Fried – Rework
One of the fastest growing research methodologies for startups is called customer discovery, which is part of the Lean startup movement. The method, if it is done right, is great for identifying potential user needs. But if it is done wrong, it can lead the startup in the wrong direction.
A friend of ours, Santiago was thinking of starting a company aimed at tourists in Lisbon. He had joined a startup accelerator. Accelerators are meant to do exactly what they are called, accelerate startups.
Santiago came to us very sad and depressed, as he had just tried Customer Discovery and had not really found any customers. Customer Discovery, he was told, is to go out and try to find customers. The issue was that after a day he had only found one potential customer for his service.
In fact, there was massive potential for his idea. The challenge was that he had been instructed the wrong way.
Just because you have only seen white swans does not mean that there are no Black Swans. (Black swans are native to Australia, and were not discovered until the 17th Century). The accelerator had just pushed him out of the door and told him to find tourists. He was given neither the method nor the time to think about who or where his customers were, so that he could test his idea.
If Santiago had started with a hypothesis, his life would have been simpler. A hypothesis is a prediction of the research outcome. Coming up with a hypothesis would have forced him to think through both who and where his potential customers were.
After we sat down with him and helped him come up with some hypotheses, he carried out the research in another part of town and found many potential customers. Even if he had come back with negative results, with a hypothesis he would have known how he was wrong, and therefore been able to change his assumptions for the next round of tests.
A useful method for coming up with a hypothesis is Jeff Gothelf’s Proto-Personas. Jeff’s method helps you to be fast, and to give some structure to your idea of who you think your users or customers are. You can very easily start with these proto-personas, and then use your research to try to discover them.
The takeaway from this is that if you use a hypothesis, or make assumptions before carrying out your research, even a negative result will be useful. Also be careful about whom you take advice from. Bad advice may mean ditching an idea that could work.
The status quo in User Experience (UX) is to test a website with only 5 users. We keep getting asked why, if this is true, you would need a tool like Webnographer to test with more users. The received wisdom is that 5 users will find 85% of a site’s usability issues. Leaving aside issues such as how the context of the lab affects users’ behavior, here is our answer: the idea that you only need 5 users is very old, and needs updating.
Jakob Nielsen, a usability guru, wrote a paper 21 years ago that said that you only need to test with 5 users to discover 85% of your usability issues. The claim is based on the formula N(1 - (1 - 0.31)^n), where N is the total number of usability issues and n is the number of test users. The formula is simple. Nearly every UX consultant quotes it. But the formula makes a huge assumption, namely that usability issues have a visibility of 31%, which means that the usability issues affect at least 31% of your users. The question is: does this still hold up today, after so many years? In the time since Nielsen discovered the “formula”, both the technology and the people using software have changed dramatically. I will provide evidence that we need to revisit the constants used in the formula, and show that we need to be testing with more than 5 users.
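The formula above can be sketched in a few lines of Python to show how much the result depends on the 31% constant. The function name and the alternative visibility values are our own choices for illustration; only the 0.31 figure comes from the formula.

```python
def proportion_found(n_users: int, visibility: float = 0.31) -> float:
    """Share of issues seen at least once by n_users: 1 - (1 - visibility) ** n_users."""
    if n_users < 0 or not 0 <= visibility <= 1:
        raise ValueError("n_users must be >= 0 and visibility must be in [0, 1]")
    return 1 - (1 - visibility) ** n_users


# Same 5 users, different assumptions about how visible an issue is.
for visibility in (0.31, 0.20, 0.10, 0.05):
    found = proportion_found(5, visibility)
    print(f"visibility {visibility:.0%}: 5 users find about {found:.0%} of such issues")
```

With the 31% constant, 5 users find roughly 84% of issues, which is usually rounded to the familiar 85%. If an issue only affects 5% of users, the same 5 users find it less than a quarter of the time.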
Back in 1993, when the formula was first introduced, the World Wide Web was only one year old. Nowadays, we take for granted that if you need to book a flight you just visit Expedia, Kayak, or the airline’s website. Back then, if you needed to book a flight you had to go to a travel agent, who had months of training to use the Computer Reservation system. Windows 3.1, the first popular Windows operating system, had also just been released. Before then most people used DOS, and interacted with the computer through a black screen. Just to copy a file you needed to remember and type a series of commands.
In these past 21 years the profession of the Interaction Designer has been created. It was in 1994 that Carnegie Mellon University started offering a degree in Interaction Design, followed by other universities around the world. In this time period, we have learned a lot more about what works and what does not in designing interfaces. Since then, computer systems have become easier to use, and people have become better at using them. What this means is that the obvious and big issues have probably already been discovered. Designers know about the obvious mistakes.
Since the journal article was written, usage of computer systems has exploded. Only 3 million people bought Windows 3.1 in the first 3 months in 1993. Recently Windows 8 sold 60 million copies in 2 months. Back then, a popular application would sell in the thousands, mainly in the USA or Europe. Now it is common for a website to be used by millions of people around the world. It is easy to dismiss the idea that 5 users in Texas can predict how users in England would use a website. In fact, Nielsen, on his website describing the method, goes on to say that “You need to test additional users when a website has several highly distinct groups of users”. Nielsen recommends testing 3 users from each category. How many distinct groups does a modern website have? Is it easier just to take a sample of users from the site, or is it easier to try to categorize users into different groups? Even small websites get tens of thousands of users a month. Therefore, even an issue that affects 10% or 20% of users impacts thousands of users. Though the issues are getting smaller, the number of people experiencing them is increasing.
Nielsen’s assumption is that the smallest issue will affect at least 31% of users. Using his formula, the chart below shows the likelihood of an issue being discovered by testing with 5 or 80 people. There is a large discoverability gap between the two. A test with 5 users only discovers 85% of issues that affect at least 31% of your users. This means that 5 users only identify major issues, and that smaller but still important issues are overlooked. In contrast, testing with 80 users will discover 98% of issues with a visibility as low as 5%.
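The gap described above can also be read the other way round: how many users are needed to reach a given discovery rate for issues of a given visibility? The sketch below inverts the same formula. The function name and the target values are illustrative assumptions, not figures from Nielsen’s paper.

```python
import math


def users_needed(visibility: float, target: float) -> int:
    """Smallest n for which 1 - (1 - visibility) ** n reaches the target discovery rate."""
    if not 0 < visibility < 1 or not 0 < target < 1:
        raise ValueError("visibility and target must both be strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - visibility))


print(users_needed(0.31, 0.85))  # issues that affect 31% of users, 85% discovery
print(users_needed(0.05, 0.85))  # issues that affect 5% of users, 85% discovery
print(users_needed(0.05, 0.98))  # issues that affect 5% of users, 98% discovery
```

For issues that affect 5% of users, reaching the same 85% discovery rate takes 37 users by this calculation, and reaching 98% takes 77, which is in line with the 80-user test described above.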
It may be easy to think that a 5% issue is not that important. However, a 5% issue on a website with 100,000 users means 5,000 users struggling to find information, or being unable to complete a purchase. This is ultimately a large amount of revenue lost. Additionally, data collected in the last 2 years through Remote Usability Testing backs up the importance of fixing smaller issues. It shows in the chart below that the majority of issues have a low probability of a user failing in any one place, and that issues are widely distributed across the sites. 56% of all issues have a visibility of less than 31% (they affect less than 31% of visitors). In practical terms, this means that because 5 users can only reliably find issues with a probability of more than 31%, they leave 56% of all issues undiscovered.
It is worth pointing out that Nielsen’s formula is about the occurrence of an issue. In other words, does an issue exist or not? For example, does a haystack have a needle in it? It does not estimate the frequency of an issue. In other words, how many needles does the haystack contain? We will deal with estimating frequency in another article. Far more users are needed to estimate the frequency of an issue with a low margin of error than to discover whether the issue exists. Quantifying an issue is important because, as we are faced with more issues, we need help in prioritizing them.
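As a rough illustration of why estimating frequency needs far more users than discovering an issue, here is a sketch using the standard normal-approximation sample size for a proportion. The 95% confidence level (z = 1.96), the margins of error, and the function name are our assumptions for the example, not figures from our data.

```python
import math


def sample_size_for_margin(expected_rate: float, margin: float, z: float = 1.96) -> int:
    """Participants needed to estimate a failure rate to within +/- margin.

    Uses the normal approximation n = z^2 * p * (1 - p) / margin^2,
    with z = 1.96 corresponding to a 95% confidence level.
    """
    if not 0 < expected_rate < 1 or margin <= 0:
        raise ValueError("expected_rate must be in (0, 1) and margin must be positive")
    return math.ceil(z ** 2 * expected_rate * (1 - expected_rate) / margin ** 2)


# An issue expected to affect about 5% of users.
print(sample_size_for_margin(0.05, 0.02))  # estimate within +/- 2 percentage points
print(sample_size_for_margin(0.05, 0.01))  # estimate within +/- 1 percentage point
```

Under these assumptions, measuring a 5% issue to within 2 percentage points takes several hundred participants, compared with a few dozen to simply see the issue occur once.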
For finding big issues early on in the development process, methods like Guerrilla UX testing can make sense, but for production other methods need to be considered. When Nielsen came up with the formula, testing with many users was expensive. Nowadays, with technology like Webnographer, testing with more users is cheaper than testing in the lab. More people are using technology and are experiencing more varied issues than 20 years ago. We now have the tools for finding smaller issues, so we should use them. Do get in touch if you have any questions on how to test with five or five thousand users.
» 23 Remote Usability Methods Webnographer
» Discovering WHY from numbers Webnographer
» 5 Reasons You Should And Should Not Test With 5 Users MeasuringUsability
Nielsen, Jakob, and Landauer, Thomas K.: “A mathematical model of the finding of usability problems,” Proceedings of ACM INTERCHI’93 Conference (Amsterdam, The Netherlands, 24-29 April 1993), pp. 206-213.