Put Your Web Site to the Test
“I don’t know who discovered water, but I’m pretty sure it wasn’t a fish.” Coined by the media guru and professor Marshall McLuhan, the aquatic metaphor sums up how we lose objectivity once we’re immersed. When building a Web site or graphical user interface, staying objective is difficult. As our familiarity with the design grows, it becomes increasingly hard to discern how our perceptions vary from those of the customer. This perception gap can create customer-experience blind spots the size of the Bermuda Triangle.
Web site testing is the best way to evaluate longstanding assumptions and get a better understanding of what your site visitors desire. Two methods of testing can help site owners learn from their customers: A/B testing and facilitated usability testing. A/B testing works best when you have a well-defined goal and need to choose between a few clear alternatives. Facilitated usability testing is the better choice for more complex design challenges requiring a deeper understanding of user behavior.
While there are numerous variations, traditional usability testing usually involves a facilitator providing actual customers with some tasks to achieve on the Web site. Participants are recruited from the actual customer base or from a carefully screened set of participants who match the target profile. Web site managers and the development and design team will often watch in real time, either behind one-way glass or via a Web cast. The facilitator will carefully ask questions before, during and after the tasks to understand the participants’ actions.
A/B testing involves comparing one version against another based on a desired outcome. For example, two registration pages might be created with different instructional text on each. Presumably the one that is successfully completed more often is better. Commonly tested elements include graphics, blocks of text, titles and form fields. A/B testing is a game of numbers, requiring a substantial sample size. It uses Web metrics to track performance data and draws comparisons between versions.
Choosing the Best Method
A common question is “which method is better?” The more appropriate question, however, is “which do I use when?” The information and decisions required should drive the selection of testing methods. For complex design challenges that require a richer understanding of customer motivation, a good usability facilitator can help ask the questions that unravel the hidden meaning behind user actions. Such insight can be hard to find in the reams of statistics provided by A/B testing.
This was the case for Blue Coat, a Sunnyvale, Calif.-based manufacturer of network products. Its challenge went beyond the browser to include the physical product, the user environment and more. In one study, team members took beta products to customer sites, observing and asking questions as they unpacked the product, reviewed (or ignored) the documentation, set up the security appliance online, and then configured it using their Web browser. Some of the findings during this process could never have been replicated through A/B testing of two versions of an interface component.
“Our best insight comes from watching and listening carefully to our customers. Through such interaction we get first-hand feedback, can find patterns and then optimize our products to meet our customer’s needs,” says Senior Vice President of Engineering Dave de Simone. “One-on-one dialogue is an essential part of this process and helps us to understand more than just their behavior, but also why they do things a certain way, while at the same time gaining insight regarding their application-level needs. Ultimately, this leads to a much better understanding of the problem and their business needs, and results in a better solution.”
With usability testing, you can ask contextual questions, such as “Why did you click there?” and “What were you expecting?” This ability to delve into the meaning of user actions is the reason usability consultant Jakob Nielsen suggests that A/B testing will never replace live-user testing. “In many cases, a Web site’s worst problems are not issues that you’d account for in an A/B test because you wouldn’t be aware of them unless you’ve carried out usability testing,” he explained in a 2006 interview with Matt Mickiewicz. “When you limit your research to the ideas you can generate yourself, you have closed your mind to all the unexpected things that users do.”
Making the Split Decision
If your challenge is one of optimization — deciding which option yields the best return — the trial-and-error method of A/B testing provides the controlled, quantitative feedback that you need to move the design through a series of successive improvements. Amazon.com was one of the early innovators with A/B testing, rolling out multiple home page versions for a short trial period. Because of the incredible volume of visitors, sometimes a few hours would provide a definitive sample size.
For A. Harrison Barnes, chief executive officer of Pasadena, Calif.-based Juriscape, A/B tests provided the tangible feedback needed for design decisions. His group tested extensively, including as many as 13 variations of a landing page. They also modified and tested banners using an A/B comparison. In one instance, changing a small portion of banner text from generic descriptive wording to promotional offers improved the click-through rate dramatically.
To determine the optimal landing page, Barnes and his team have moved to multivariable testing, an extension of A/B testing in which multiple parameters are simultaneously combined and measured rather than just one. “We defined control variables and set up an experiment,” Barnes explains. “We do statistical analysis to find out the best combination in a multivariable-testing experiment. Of course with traditional usability testing, this is not possible.”
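As a rough illustration of the kind of multivariable setup Barnes describes, the Python sketch below lists every combination of a few page elements and picks the one with the highest conversion rate. The element names, options and visit counts are invented for the example and are not Juriscape’s data. Choosing the winner by raw conversion rate is also a simplification; a real analysis would check that the difference is statistically meaningful.

```python
from itertools import product

# Hypothetical page elements and the options to test for each one.
factors = {
    "headline": ["generic", "promotional"],
    "button": ["Sign up", "Get started"],
}

def all_combinations(factors):
    """Return every combination of options as a tuple, in factor order."""
    return list(product(*factors.values()))

def best_combination(results):
    """results maps a combination to (conversions, visitors)."""
    return max(results, key=lambda combo: results[combo][0] / results[combo][1])

combos = all_combinations(factors)
print(len(combos), "page versions to test")

# Made-up tallies for each version: (conversions, visitors).
results = {
    ("generic", "Sign up"): (40, 1000),
    ("generic", "Get started"): (44, 1000),
    ("promotional", "Sign up"): (61, 1000),
    ("promotional", "Get started"): (57, 1000),
}
print("Best combination:", best_combination(results))
```

Note how quickly the number of versions grows: each added element multiplies the page count, which is one reason multivariable tests need more traffic than a simple A/B comparison.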
Choosing the right test can also depend on the types of decision-makers involved in your project. For a dysfunctional team where designers and developers don’t see eye-to-eye, bringing in the voice of the customer through a traditional usability test can help them find common ground by abandoning their opinions and instead adopting the viewpoint of the customer. The customer is truly a great unknowing arbiter of design disputes.
On the other hand, if quantifiable return-on-investment measurements dominate your company’s decision-making processes, a statistical evaluation, such as A/B testing, can point to performance improvements that may be easier to link to dollar amounts.
The steps involved in the two tests differ considerably. For traditional usability tests, a small number of participants (usually six to eight) are recruited. The test can be conducted in a lab setting or remotely using Web conferencing and Web cams. Before the test occurs, detailed scenarios and user tasks need to be created, along with a list of standard questions. If you outsource the work, it’s a good idea to know whether the report will include actual design recommendations or just a description of key findings.
For an A/B test, the first step is defining what will be tested and then designing a study that limits the test to one variable. Next, the team sets up analytical tracking to ensure that both versions will be measured. A/B testing is typically done in a live environment, and will run until the desired number of responses is obtained. Statistical analysis follows, and the outcome of that makes up the report.
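The statistical analysis step can be as simple as a two-proportion z-test, one common way to judge whether the gap between two completion rates is larger than chance would explain. The Python sketch below is a minimal version using only the standard library. The visitor and completion counts are hypothetical, and the choice of this particular test is an assumption, since the analysis method will vary from study to study.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in completion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both versions perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: completed registrations out of visitors shown each page.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between versions is unlikely to be noise. The sketch does not guard against the case where no visitors, or all visitors, complete the task, which would make the standard error zero.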
In terms of cost, a November 2006 report by Forrester Research in Cambridge, Mass., found that the surveyed usability vendors tested 13 users per engagement, conducted tests over the course of two to three weeks, and charged $20,000 or less for this work. Project size and cost varied widely. Similar data does not yet exist for A/B tests, but a typical A/B test costs less than that amount.
While most usability studies are outsourced, more firms are insourcing A/B testing using commercial software designed for this purpose. Google very recently introduced a new Web site optimizer service that allows you to design and run A/B and multivariate tests for your Web site. The service is in a beta version now and is free to use at http://services.google.com/websiteoptimizer/. The tools, of course, are only half the battle, as well-thought-out research design is critical.
Regardless of your testing method — getting back to the McLuhan metaphor — make sure you look for regular opportunities to re-evaluate old assumptions. The value of testing is in unlearning those things that you thought you knew.
Think Like a Searcher
If you’re exploring navigation options, always include the most literal description as an option. In an A/B study conducted for Skype, a few small changes, such as using “accessories” instead of “shop” in the navigation menu, resulted in an 18.75 percent increase in revenue per visit. In this case, shifting emphasis from what the user will do to what the user will find resulted in a terrific improvement.
Account for Date and Time Fluctuations
Variations based on time of day, week, month or year can make a huge difference in user behavior. This is an important variable to watch to ensure that your study findings are not skewed by normal fluctuations.
Vary Your Sample Size
Obtaining a valid sample is important, but there’s no point in hearing an obvious answer over and over. Sometimes the first few usability sessions all produce the same critical feedback. In this case, it may be worth cutting the sample short to do some rapid prototyping and test a new design. Similarly, with A/B testing you might set a minimum and maximum threshold for your test. If the findings are conclusive once the minimum has been reached, why wait to implement the winning solution?
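One way to picture the minimum and maximum thresholds for an A/B test is as a simple stopping rule, sketched below in Python. The threshold values and the z-score cutoff are assumptions chosen for illustration, not recommendations. The cutoff is deliberately strict because checking results repeatedly while a test runs raises the odds of a false positive.

```python
import math

def stopping_decision(conv_a, n_a, conv_b, n_b,
                      min_n=1000, max_n=20000, z_stop=3.0):
    """Decide whether to keep a test running, given counts per version."""
    smaller = min(n_a, n_b)
    if smaller < min_n:
        return "continue"  # minimum sample not yet reached

    pooled = (conv_a + conv_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    # With no variation at all, treat the versions as indistinguishable.
    z = 0.0 if variance == 0 else (conv_b / n_b - conv_a / n_a) / math.sqrt(variance)

    if abs(z) >= z_stop:
        return "stop: conclusive"
    if smaller >= max_n:
        return "stop: maximum reached"
    return "continue"

# Hypothetical counts checked at different points in a test.
print(stopping_decision(10, 500, 30, 500))
print(stopping_decision(40, 1000, 90, 1000))
print(stopping_decision(1000, 20000, 1010, 20000))
```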
Blend Quantitative With Qualitative
The best insight comes from having more than one view into user behavior. If you are using primary research, such as facilitated usability testing, to learn about the customer, consider an A/B test or a survey to test your key assumptions. Conversely, if your quantitative data suggests that users prefer “A” rather than “B,” find a way to talk with a few of them one-on-one to learn why.
Don’t Damage Your Brand
It’s great to be able to test different options, but what experience will a customer have if they view a new home page treatment on each visit? Many A/B tests will limit brand impact by only serving alternative versions to a small percentage of users, such as 1 percent. With usability tests, generic prototypes can actually help reduce brand bias and offer more objective feedback on the designs.
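A common way to keep a returning visitor from seeing a different treatment on each visit is to derive the assignment from a stable visitor identifier rather than a fresh random draw. The Python sketch below shows the idea. The experiment name, the visitor ID format and the 1 percent exposure are assumptions for the example, and it presumes the site already has some persistent identifier, such as a cookie value, to work from.

```python
import hashlib

def assign_version(visitor_id, experiment="home-page-test", exposure=0.01):
    """Return "B" for a fixed share of visitors and "A" for everyone else."""
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "B" if bucket < exposure else "A"

# The same visitor always lands in the same group.
print(assign_version("visitor-12345"))
print(assign_version("visitor-12345"))

# Roughly 1 percent of a hypothetical audience sees the alternative.
audience = [f"visitor-{i}" for i in range(10000)]
share_b = sum(assign_version(v) == "B" for v in audience) / len(audience)
print(f"Share seeing version B: {share_b:.2%}")
```

Because the experiment name is part of the hashed key, a visitor placed in the test group for one experiment is not automatically placed there for the next.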