Release: Shadow testing for decision models in Nextmv

Shadow testing isn’t just for the MLOps crowd. Teams building decision algorithms can use it too, for optimization models that solve routing, scheduling, allocation, and fulfillment problems across supply chain, retail, healthcare, and more.

We're thrilled to announce our first release of shadow testing in Nextmv! 

Shadow testing is a useful way to de-risk algorithm rollout and build confidence in deploying model changes to production systems. Nextmv users can now run a test model alongside a production model using the same online inputs and production conditions with the goal of analyzing and sharing model performance results.

Check out the video below for a quick walkthrough and read on for more details 👇👇👇

This latest release builds on a lot of Nextmv engineering effort and customer feedback from the last few months. In our experience, it's one thing to say you want to do shadow testing (or other types of tests); it's another to actually do it in a systematic and repeatable way. Following through usually means building and maintaining bespoke tooling, and managing model versions on top of that. With Nextmv, we're looking to help teams focus on building and shipping more decision models, not building and maintaining more decision tools.

So what *is* shadow testing? Imagine that you have a production model you're considering replacing with a new model you've developed. You've put the new model through its paces with some offline data sets using acceptance testing. It passed with flying colors. But you know that data can shift over time — data from today may not look like data in your standard test files from a month ago. This isn't unusual. Shadow testing is a great way to account for that by exposing your new model to current online data and running it in a live production environment alongside your production model — but without the production impact.
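To make the idea concrete, here is a minimal Python sketch of the shadow pattern itself. It is not Nextmv's implementation or API: the names `RunRecord`, `timed_run`, and `handle_request` are hypothetical, and both models are assumed to be plain functions that take and return a dictionary. The point is only that both models see the same input, both runs are recorded, and only the production output reaches the caller.

```python
import copy
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Assumption: a "model" is any function from an input dict to an output dict.
Model = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class RunRecord:
    model: str                         # "production" or "shadow"
    status: str                        # "succeeded" or "failed"
    duration_s: float                  # wall-clock run time in seconds
    output: Optional[Dict[str, Any]]   # None when the run failed


def timed_run(name: str, model: Model, payload: Dict[str, Any]) -> RunRecord:
    """Run one model, capturing its status and duration instead of raising."""
    start = time.perf_counter()
    try:
        output, status = model(payload), "succeeded"
    except Exception:
        output, status = None, "failed"
    return RunRecord(name, status, time.perf_counter() - start, output)


def handle_request(
    payload: Dict[str, Any],
    production: Model,
    shadow: Model,
    log: List[RunRecord],
) -> Dict[str, Any]:
    """Serve the production result; record the shadow result on the side."""
    prod = timed_run("production", production, copy.deepcopy(payload))
    log.append(prod)
    # The shadow model gets its own copy of the same input. Its output is
    # only logged, so a shadow failure never affects the caller.
    log.append(timed_run("shadow", shadow, copy.deepcopy(payload)))
    if prod.status != "succeeded":
        raise RuntimeError("production model failed")
    return prod.output
```

In this sketch the shadow model runs inline for simplicity. A real setup would more likely run it asynchronously so it cannot add latency to the production path.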

The results of shadow testing mainly surface two things:

  1. Operational metrics such as solution value or unassigned stops for routing
  2. Model stability and performance metrics such as run status or run duration
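As a rough illustration of how these two kinds of results might be compared side by side, the Python sketch below summarizes a list of run records per model. The record fields (`model`, `status`, `duration_s`, `value`) and the `summarize` function are assumptions made for this example, not the format Nextmv uses.

```python
from statistics import mean
from typing import Any, Dict, List


def summarize(runs: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Aggregate run status, run duration, and solution value per model."""
    summary: Dict[str, Dict[str, Any]] = {}
    for name in sorted({r["model"] for r in runs}):
        rows = [r for r in runs if r["model"] == name]
        ok = [r for r in rows if r["status"] == "succeeded"]
        summary[name] = {
            "runs": len(rows),
            # Stability: share of runs that finished successfully.
            "success_rate": len(ok) / len(rows),
            # Performance: average duration over all runs.
            "mean_duration_s": mean(r["duration_s"] for r in rows),
            # Operational metric: average solution value over successful runs.
            "mean_value": mean(r["value"] for r in ok) if ok else None,
        }
    return summary
```

Averaging the value only over successful runs keeps failed runs from distorting the operational metric, while the success rate reports them separately.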

In this shadow testing release, you can do things such as:

  • Create and run a shadow test
  • Define end criteria by date or number of runs (and an optional start criterion by date)
  • Complete or cancel a shadow test early
  • Review standard and custom operational metrics
  • Review performance data for run status and run duration
  • Export result details to a CSV for further analysis
  • Share the results of a shadow test in the Nextmv console with your team
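The start and end criteria in the list above can be pictured with a small Python sketch. This is only one plausible reading of those criteria, not a description of how Nextmv evaluates them; the function `is_active` and its parameter names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def is_active(
    now: datetime,
    runs_so_far: int,
    end_at: Optional[datetime] = None,
    max_runs: Optional[int] = None,
    start_at: Optional[datetime] = None,
) -> bool:
    """Return True while a shadow test should keep collecting runs."""
    if start_at is not None and now < start_at:
        return False  # optional start date not reached yet
    if end_at is not None and now >= end_at:
        return False  # end date reached
    if max_runs is not None and runs_so_far >= max_runs:
        return False  # run budget used up
    return True
```

Completing or canceling a test early would sit outside a check like this, as an explicit user action.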

Check out the documentation for more information on creating and running shadow tests via the Nextmv console, CLI, and HTTP API.

We're really excited to share this release with you today. There's more in the works, including new charts, time series visualizations, and additional statistical analyses. Join us for a tech talk about the role of offline and online testing, including shadow testing, in shipping decision models to production. You'll see these features in action and can ask our team questions live.

We want to hear your feedback on what we've built so far and where we're taking shadow testing (and Nextmv as a platform!) next. Drop comments into our community forum or contact us to chat live.

May your solutions be ever improving 🖖
