How a Small Team Iterated Rapidly by Running Up to 12 User Tests Per Week

by Edmond Lau

Photo credit: Noom Peerapong

“Movie time!”

My team and I huddled on the couches to watch the latest batch of movies — videos really — of users testing out our product. For several weeks during the most recent product redesign at Quip, we ran tests on UserTesting — a virtual user testing lab that lets you get videos of real people sharing their thoughts as they test out your product. Watching these user videos became a core part of our workflow and quickly evolved into one of our highest-leverage activities.

Our project team developed a healthy cadence where we would iterate on a part of the user experience based on some hypothesis, deploy the changes to production, and then run a user test on the new version. Within the hour, we’d receive detailed feedback from real people and watch 30-minute videos of people making sense of, or failing to make sense of, our changes. Based on what we learned, we’d brainstorm solutions to eliminate sources of friction and confusion and repeat the cycle. At our peak, we ran up to 12 user tests in a single week.

This technique of continuous user testing, where we would aggressively and iteratively run user tests to validate hypotheses, played a key role in guiding our redesign. We’d ask ourselves, “What’s the next piece of functionality where we would learn the most by testing on users?” And then we’d home in on those areas. The tests confirmed promising design choices and surfaced areas, sometimes surprising ones, that people found confusing. Ultimately, the technique helped us confidently ship a redesign that simplified core aspects of Quip’s living documents, and the launch received largely positive feedback. [1] And it’s a technique that can benefit other teams as well.

The Power of Continuous User Testing

User testing is by no means a new concept. When I worked on search quality at Google, our team would periodically run tests in the usability lab. In a typical test, a researcher would guide a paid volunteer through a list of tasks, and the team could either observe through a two-way mirror or watch the recorded video afterwards. Eye-tracking lasers followed a participant’s gaze as it danced across pages, and overlaying this optical data later over the recorded screen provided valuable insights into where people focused their attention. Once, instead of running a lab test, we even drove out to volunteers’ homes to observe how people used Google in their natural habitat.

These user tests often surprised us. Who would have known that some people copied and pasted search results into Word documents and printed them out when doing online research? That’s not an insight any amount of team discussion would have revealed. But while useful, the overhead involved in scheduling and running a test often meant that user testing rarely became an integrated part of a team’s development workflow. Many teams would conduct tests only as an afterthought, after they had already invested substantial amounts of engineering and design work.

Outside of Google, the teams that actually run user tests continuously as part of their workflows tend to be scrappy ones with few users, traits that you’d find in early startup teams. For example, when Akshay Kothari and Ankit Gupta were initially building the Pulse News reader app (acquired by LinkedIn in 2013 for $90M), [2] they would set up camp in a Palo Alto café and invite visitors to test out new prototypes on an iPad. Based on the usability issues that people hit, they would then make hundreds of small iterations per day, from fixing interaction patterns to adjusting button sizes. Within a couple of weeks, people went from calling it “crap” to asking if it was preloaded on the iPad. [3]

The advent of virtual usability labs and the speed with which you can get results change this dynamic, making these types of workflows accessible to more teams. The tight feedback loop lets teams iterate more incrementally than a traditional build-and-launch approach allows and, when applicable, much faster than running A/B tests.

Continuous user testing also affords two other key benefits. First, it provides a powerful channel to quickly validate and build confidence that you’re moving in the right direction. Driving a product forward based on vision and intuition is important, but we’re oftentimes so immersed in our own mental models of how something works that we can become fairly disconnected from someone less familiar with the product. Gathering feedback from real users helps ground us in reality.

Second, the persuasiveness of data from real users helps resolve discussions. How often have we spent hours in heated design discussions over the hypothetical benefits of various approaches, only to move forward based on some line of argument that doesn’t necessarily have everyone’s buy-in? We often spend our energy debating because we don’t have enough data, and continuous user testing gives us another tool to collect that data. And that means we can focus on results and ship what actually works rather than what hypothetically works.

When to Leverage Continuous User Testing

Continuous user testing isn’t free — the time spent conducting and learning from user tests is time not spent writing software. But when we consider all the time we spend tweaking and perfecting features that never get the adoption we want, it’s clear that the earlier we can make course corrections, the less effort we’ll waste and the bigger the impact we’ll create. So when does it make sense to use this technique?

A good starting point is where A/B testing falls short. Dan McKinley, a former principal engineer at Etsy, has shared how some teams there have successfully used continuous experimentation to drive product development. [4] They decompose large product changes into a series of hypotheses (e.g., “casual users don’t understand that they can search different marketplaces using a dropdown menu next to the search box”) that they then verify with A/B tests of small, measurable changes (e.g., “does surfacing handmade and vintage items outside of the dropdown increase findability?”).

We’ve used continuous experimentation as well to make product changes at Quip, and I used it previously to direct growth projects at Quora. The strategy works very well when the desired impact is quantifiable and easy to measure. But sometimes the interactions we’re trying to understand are too complex to distill into a single number. Or perhaps measuring it would take a significant amount of time and traffic to detect any meaningful difference. How much of the core product do new users understand after going through a short tour? How obvious are the interactions required to accomplish a common task? It’s tough to get deep answers to these questions with an A/B test.
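To make the "time and traffic" cost concrete, here is a rough sketch of the standard sample-size approximation for comparing two conversion rates. The function name, the 10% and 11% rates, and the default significance and power levels are illustrative assumptions, not figures from Quip, Quora, or Etsy.

```python
import math
from statistics import NormalDist


def users_per_variant(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Approximate users needed in EACH variant of a two-sided A/B test.

    Uses the normal-approximation formula for two proportions:
        n = (z_alpha + z_power)^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    All inputs are hypothetical planning numbers chosen by the caller.
    """
    if baseline_rate == expected_rate:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical example: detecting a lift from a 10% to an 11% conversion rate.
print(users_per_variant(0.10, 0.11))
```

Under these assumptions the answer is on the order of 15,000 users per variant. At a hypothetical 1,000 new users a day split across two variants, that is about a month of traffic for a single small change, and halving the detectable lift roughly quadruples the requirement.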

Continuous user testing can help in these situations. By observing where users succeed and stumble as they work through a series of tasks, we can gain a deeper understanding of user behavior that isn’t captured through high-level metrics. That understanding can validate our intuition and our hypotheses, as well as surface unexpected sources of friction.

Interactions where users don’t require a significant amount of context to understand what’s going on generally work better for rapid user tests. For more complex, power-user features for Quip, we’ve relied more heavily on internal dogfooding or beta tests with customers — we’re building a new class of productivity tool, so we’ll iteratively deploy changes to employees or customers so that we can gather more data about what works and what doesn’t.

The many hours that we spent watching users understand and work with Quip gave us a level of confidence in our redesign that we wouldn’t have had otherwise. Many of the ideas we tried didn’t pan out, but continuous user testing let us focus on and ship the ones that did.

Special kudos to my teammates and fellow movie watchers Nate Botwick and Belinda Gu.

  1. Quip Inbox, Product Hunt.

  2. Frederic Lardinois, “LinkedIn Acquires Pulse For $90M In Stock And Cash”, TechCrunch, April 11th, 2013.

  3. Tom Kelley and David Kelley, Creative Confidence, pp. 109-114.

  4. Dan McKinley, “Design for Continuous Experimentation: Talk and Slides”, December 22nd, 2012.



