Introducing switchback testing: A/B testing for decision models with network effects

See the real-world impact of a candidate model with a switchback test that randomly assigns units of production runs to each model. Start a switchback test, analyze the results, and promote a new model to production, all in the Nextmv console.

Switchback testing is available on the Nextmv platform! You can get started for free today and join our engineers for a live demo. We’re proud to expand our decision model testing suite, and we’re excited by the response from customers and community members about the value that production testing brings to their organizations. Let’s take a look at what switchback testing is, where it fits into modeling workflows, and how to get started.

Compare decision models using random assignment

What is switchback testing? We dove into the concept in detail in an earlier post. In short, it’s similar to A/B testing, but not quite the same: switchback testing randomizes which model is applied to units of time and/or location to mitigate network effects (like pooled resources). That lets your team compare a candidate model to a baseline model in a true production environment, where the model is making operational decisions.

Take, for example, a delivery use case where you’d like to test a new model against the one already running in production. You can’t assign the same order to two different drivers just to compare outcomes: the driver who picks up an order must deliver it. A traditional A/B test is therefore ineffective, because there’s no way to isolate treatment and control within a single unit of time. Driver assignment is just one example of a network effect that switchback testing takes into account.
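To make the mechanics concrete, here’s a minimal sketch in Python of how a switchback plan could be generated. It’s illustrative only, not Nextmv’s implementation; the function name and fields are made up:

```python
import random
from datetime import datetime, timedelta

def build_switchback_plan(start, num_units, unit_minutes, seed=42):
    """Randomly assign each time unit to the baseline or the candidate model."""
    rng = random.Random(seed)  # fixed seed makes the plan reproducible
    plan = []
    for i in range(num_units):
        unit_start = start + timedelta(minutes=i * unit_minutes)
        treatment = rng.choice(["baseline", "candidate"])
        plan.append({"unit": i, "start": unit_start, "treatment": treatment})
    return plan

# Example: one day of 60-minute units.
plan = build_switchback_plan(datetime(2024, 1, 1), num_units=24, unit_minutes=60)
for unit in plan[:3]:
    print(unit)
```

During the test, every production run that falls inside a given unit is handled by that unit’s assigned model, which keeps network effects like pooled drivers contained within a single treatment at a time.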

In the video below, Dirk Schumacher (one of the engineers behind the feature) explains the concept of an experiment plan, including units and randomized assignments.

Switchback testing as part of the DecisionOps workflow

There’s a quote from the Principles of Chaos Engineering, “Chaos strongly prefers to experiment directly on production traffic.” We know from experience that confidently promoting a new model to production requires seeing the real-world impact that your candidate model has on your production workflow. After putting a new model through its paces in historical tests such as batch experiments and acceptance tests, the next step in the workflow is to see how the candidate model performs on live, production data. A shadow test (where the candidate model uses production data but doesn’t impact production workflows) is another way to build confidence. But what do you do when you want to see the impact of a new model on your operational environment? Enter switchback tests.

Nextmv makes it simple to kick off and analyze the results of switchback tests so you can quickly and confidently develop and deploy well-tuned models for your production use case.

Check out this demo to see switchback testing in action: 

Create an experiment directly in the Nextmv console

To kick off a switchback test, all you need is the following (sketched in code after the list):

  • Name of the experiment
  • Description (optional)
  • Baseline instance (likely the model that’s currently in production)
  • Candidate instance (the model you’d like to test against the production model)
  • Total number of units to use in the test (units are randomly assigned to each model)
  • Length of unit (in minutes)
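
As a rough sketch, that configuration could be captured in a small structure like the one below. The field names and values are illustrative, not the exact console or API fields:

```python
# Illustrative only; field names are hypothetical, not Nextmv's API schema.
switchback_experiment = {
    "name": "routing-candidate-vs-prod",
    "description": "Test new routing configuration",  # optional
    "baseline_instance": "prod",      # the model currently in production
    "candidate_instance": "staging",  # the model to test against it
    "total_units": 168,               # randomly assigned to each model
    "unit_length_minutes": 60,        # length of each unit
}
```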

Review the plan, then analyze and share results

Before diving into the metrics, review the Plan Summary on the results page. It provides a unit index so you can easily identify the experimental units and which treatment was applied to each. Here the unit duration is 60 minutes, a common time frame for a switchback unit, and the full plan summary shows the test running for a full week. These tests often run for a few days to a few weeks.
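The arithmetic behind a plan like that is straightforward. With 60-minute units, a one-week test works out to 168 units:

```python
# 60-minute units over a one-week test.
unit_minutes = 60
test_days = 7
total_units = test_days * 24 * 60 // unit_minutes
print(total_units)  # 168 units, randomly split between baseline and candidate
```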

After the test is completed, it’s time to check how the candidate model fared. Is it ready to be promoted to production? Let’s take a look at KPIs, including summary metrics like solution value and use-case-specific custom metrics like unplanned stops for routing. In the results below, we can see that the candidate model (staging) had fewer unplanned stops in production. We expect fewer unplanned stops to improve broader metrics like customer satisfaction, effects that only surface when a model runs in an operational environment.
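As a minimal sketch with made-up numbers, here’s one way to aggregate a per-unit metric like unplanned stops by treatment after exporting per-unit results:

```python
from statistics import mean

# Hypothetical per-unit results: (treatment, unplanned stops in that unit).
results = [
    ("baseline", 5), ("candidate", 3), ("baseline", 6),
    ("candidate", 2), ("baseline", 4), ("candidate", 3),
]

for treatment in ("baseline", "candidate"):
    stops = [value for name, value in results if name == treatment]
    print(f"{treatment}: mean unplanned stops = {mean(stops):.2f}")
```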

Experiment further or make a switch to the production instance

Once you’ve reviewed the results of your switchback test, you may decide to make further changes to your candidate model. In that case, you’ll likely run it through another set of tests to see how the updates affect your KPIs. 

If you’re happy with the results, you can push your new model to production directly from the Nextmv console. Simply switch your production instance to use the new version of your code. (This can also be done via the Nextmv CLI.)

Get started

Sign up for a free Nextmv account and start a free trial to get instant access to the full Nextmv testing suite, including batch experiments, acceptance tests, shadow tests, and switchback tests.

Check out the switchback testing documentation to learn more about how to design a plan and analyze results.

Have questions? Jump over to our forums and don’t hesitate to reach out to us directly. We love chatting about all things testing and decision science modeling.
