What are batch experiments for optimization models?

You need to change your decision model, but you’re not sure how it’ll impact your KPIs. Will there be unexpected effects? Batch experiments let you explore summary statistics so you can orient yourself to the metrics a change impacts.

This post is part of a series describing our thinking about the different types of optimization model testing. We welcome your feedback in our community forum, and be sure to watch our tech talk on this topic!

Batch experiments are used to analyze the output from one or more decision models. They are generally used as an exploratory test to understand the impact on business metrics (or KPIs) when updating a model with a new feature, such as an additional constraint. They can also be used to validate that a model is ready for further testing and likely to make the intended business impact.

You might be wondering, “What’s the difference between batch experiments and acceptance testing or scenario testing?” Glad you asked! In the Nextmv testing framework, batch experiments are the foundational layer for acceptance tests, scenario tests, and benchmarking. Fundamentally, all of these tests compare the output of one or more decision models. Batch experiments, also known as back tests or ad hoc tests, are the most basic type of testing: they simply return the output metrics of one or more models for comparison. The other tests build on the core concept of a batch test and add more explicit criteria or intentions. For example, an acceptance test is focused on evaluating the differences between exactly two models and assigning a pass/fail label based on predefined thresholds.

When you’re in the early stages of testing, it’s easiest to start with a batch experiment to orient yourself to the types of changes to expect. Let’s see what this looks like in practice.

What’s an example of a batch experiment?

Imagine that you work at a farm share delivery company – picking up produce boxes from local farmers and delivering them to customers’ homes using a fleet of 5 vans. Business has been growing lately and you’ve been adding new farms as well as new customers. With the additional stops, operators have been hearing from the drivers that they’re experiencing burnout. Driver health and happiness are of utmost importance, so the company is open to different approaches to tackling this problem. 

As your team brainstorms options, a few updates to the model come to mind: adding driver shifts, setting a maximum distance per route, or including driver break locations. Your operators have agreed that any of the three options will satisfy the needs of the drivers, but you aren’t sure what other impacts each of those model updates may have elsewhere.

Your team decides the smartest move is to do some initial testing using input files from 20 previous runs to account for realistic variability. 

You run a batch experiment with four models: 

  1. Current model
  2. Model with shifts
  3. Model with max route distance
  4. Model with break locations

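Conceptually, the experiment is just a nested loop: run every model against every input file and collect the metrics you care about. Here’s a minimal Python sketch of that idea. The model commands, directory layout, and statistics field names are illustrative assumptions, not a specific Nextmv interface.

```python
import json
import subprocess
from pathlib import Path

# Hypothetical commands for the four models; each is assumed to read a
# JSON input on stdin and write a JSON output with a "statistics"
# section to stdout. Adjust to however your models are invoked.
MODELS = {
    "current": ["./model-current"],
    "shifts": ["./model-shifts"],
    "max-route-distance": ["./model-max-distance"],
    "break-locations": ["./model-breaks"],
}

results = []
for input_file in sorted(Path("inputs").glob("*.json")):  # the 20 past runs
    for name, cmd in MODELS.items():
        with input_file.open() as f:
            run = subprocess.run(cmd, stdin=f, capture_output=True, check=True)
        stats = json.loads(run.stdout)["statistics"]
        results.append({
            "model": name,
            "input": input_file.name,
            # The field names below are assumptions about the output schema.
            "unassigned_stops": stats["unassigned_stops"],
            "total_distance_km": stats["total_distance_km"],
        })

# Persist the per-run metrics so they can be summarized and shared.
with open("batch_results.json", "w") as f:
    json.dump(results, f, indent=2)
```
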
As you browse through the summary stats of the output for each, you take note of two impacted KPIs. 

Unassigned stops

  • Current model: 0 unassigned stops
  • Model with shifts: 10 unassigned stops
  • Model with max route distance: 8 unassigned stops
  • Model with break locations: 0 unassigned stops

Total distance

  • Current model: 50km
  • Model with shifts: 40km
  • Model with max route distance: 45km
  • Model with break locations: 75km

The results of the experiment highlight that all three potential changes to the model have impacts that require further discussion and action. Your team takes these numbers to the operators and product managers so they can assess the impact and create acceptable thresholds for the affected metrics. They report back that there cannot be more than 2 unassigned stops and that the total distance cannot be more than 1.2x that of the current model.
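
With those thresholds in hand, the upcoming pass/fail checks are simple arithmetic. Here’s a hedged sketch that applies them to the per-run metrics collected earlier, reusing the hypothetical batch_results.json file and field names from the sketch above:

```python
import json
from statistics import mean

MAX_UNASSIGNED = 2       # operators: no more than 2 unassigned stops
DISTANCE_FACTOR = 1.2    # operators: at most 1.2x the current model's distance

with open("batch_results.json") as f:
    results = json.load(f)

# Group each metric per model across all inputs.
by_model = {}
for row in results:
    agg = by_model.setdefault(row["model"], {"unassigned": [], "distance": []})
    agg["unassigned"].append(row["unassigned_stops"])
    agg["distance"].append(row["total_distance_km"])

baseline = mean(by_model["current"]["distance"])

for model, agg in by_model.items():
    ok = (
        mean(agg["unassigned"]) <= MAX_UNASSIGNED
        and mean(agg["distance"]) <= DISTANCE_FACTOR * baseline
    )
    print(f"{model}: {'PASS' if ok else 'FAIL'}")
```

On the example numbers above, all three candidate models fail one threshold or the other, which is exactly why the next step is to retest while changing something else, like fleet size.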

It’s clear that your organization will need to take action in some way, like adding more drivers (and vehicles to your fleet). But which option gives you acceptable KPIs while adding the fewest vehicles? With thresholds defined, you can now run acceptance tests while incrementally adding vehicles to determine which model will meet the KPI requirements with the smallest fleet size.

Why perform batch experiments?

As we can see in the example, batch experiments are a great place to start when you’re trying to understand the impact of model changes but don’t have explicitly defined acceptance criteria or are not ready to systematically test the impact of scenario variability. 

Here are a few situations where batch experiments provide insight.

Comparing output metrics from one or more models (see the sketch after this list), including: 

  • Summary stats from the output of each model (e.g., solution value)
  • Details and comparison of specific metrics or custom metrics (e.g., the number of vehicles used for route optimization)
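
For example, with the per-run metrics in a table, those summary stats take only a few lines of pandas. This again assumes the hypothetical batch_results.json file and field names from the earlier sketch:

```python
import pandas as pd

# Per-run metrics collected earlier (hypothetical file and field names).
df = pd.read_json("batch_results.json")

# One row of summary stats per model, aggregated across all 20 inputs.
summary = df.groupby("model")[["unassigned_stops", "total_distance_km"]].agg(
    ["mean", "min", "max"]
)
print(summary)
```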

Identifying impacted KPIs and deciding which actions to take, like: 

  • Updating constraints (e.g., adding shift times to the model)
  • Updating solver configuration (e.g., decreasing run time)
  • Alerting stakeholders to the potential impacts of moving forward (e.g., including driver shifts will increase the number of unassigned stops if no additional changes are made to the model or config)

Preparing for acceptance testing by: 

  • Identifying which KPIs are impacted 
  • Understanding realistic thresholds for acceptance testing

Preparing for scenario testing by:

  • Identifying potential interactions between parameters that you’d like to vary (e.g., shift length and unassigned stops)
  • Understanding realistic ranges of parameters to test across

Preparing for benchmarking by:

  • Quickly assessing potential differences between models using different solvers or solving paradigms
  • Understanding realistic solver options to test across

When do you need batch experiments?

The need for updates can come from a number of teams across an organization, each focused on specific goals. So what are typical business situations that call for basic output comparison? Here are a few examples where you, as the developer, might want to leverage batch experiments as a first step:

  • Operators / Product managers are seeing issues or opportunities with the current model that impact user experience
  • Finance teams are updating budget requirements that require model changes
  • Decision scientists / OR teams are looking to improve solver performance
  • DevOps is seeing a high load on the infrastructure from the current solution due to long run durations

How are batch experiments performed?

We’ve frequently seen batch experiments performed manually by the team responsible for model development as ad hoc exploration to orient around the impacts of a change. Typically, runs are made against two or more models, and the output is then either parsed manually or pushed into another script or tool. This requires either creating bespoke tooling or piecing together different existing tools.

At Nextmv, we’ve integrated a decision model testing framework directly into our optimization platform. To get started with batch testing, you simply need the model(s) you’d like to compare and an input set: a collection of input files created from previous runs. Using input sets ensures your experiments account for realistic variability instead of relying on a single input file. Further, having a record of batch experiments makes it easier to collaborate on models and share results. With context and metadata tied to each experiment, it’s easy to share results with stakeholders and then hop right back into development.
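
The input set is the key ingredient. As a rough illustration, here’s how you might assemble one from archived production runs; the directory layout and manifest format below are placeholders, not the actual Nextmv format (see the Nextmv docs for that):

```python
import json
from pathlib import Path

# Assume past production inputs were archived, one JSON file per run.
archived = sorted(Path("production_runs").glob("*/input.json"))

# Keep the 20 most recent runs so the experiment reflects realistic
# variability rather than a single hand-picked input.
input_set = {
    "name": "last-20-production-runs",
    "inputs": [str(path) for path in archived[-20:]],
}

with open("input_set.json", "w") as f:
    json.dump(input_set, f, indent=2)
```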

How do I get started with batch experiments using Nextmv?

Batch experiments are available today! Sign up for a free 14-day trial for instant access to the entire Nextmv platform for building, testing, and deploying custom models.

Ready to dive into more about decision model testing? Have feedback about the framework or what you’d like to see us build out? We’d love to hear from you in our community forum – or reach out to us directly.
