I recently heard a story about a NASA scientist and Nobel Laureate named John Mather. While he was working on his PhD research project in the 1970s (a high-altitude weather balloon experiment on cosmic background radiation), his team grew tired of testing. When they went to fly, the payload failed.
When Mather was asked to reflect on the experience many years later, he said two things that stood out to me: “Testing is tiresome, tedious, boring, and essential” and “if you do not test it, it will not work.”
The maxim “test early and often” permeates the software world. Most developers know their way around the software testing basics: acceptance testing, unit testing, integration testing, stress testing, and so on. Far fewer are familiar with how to go about decision testing. We’re looking to change that.
It’s easy for people to get stuck on decision testing. There are high-level questions. What do you test? How do you test? When do you test? How do you communicate test results effectively? And then there are lower-level questions. How do I know if all my constraints are being met? How does my value function change over time? How does the value function change from one solution to the next?
In our former lives at Grubhub, my cofounder, Ryan O’Neil, and I spent a lot of time wrestling with these exact challenges. In our experience, most people who test decisions write a lot of custom code. For example, DoorDash has written quite a bit about their switchback testing framework. Many companies mimic it internally. But there’s more to decision testing than just switchback (live production) tests. We think testing frameworks should be provided by decision automation tools, so we are building them into the Nextmv Decision Stack.
It is important that our customers can run decision apps locally on their machines or in their production infrastructure and have them behave the same way. Skip the surprises. The ability to reproduce a decision given the input is vital to transparent automation. Manual testing should always be possible!
Then (of course) we like automated testing too. In the Nextmv world, automated decision testing mirrors software testing:
- Unit tests | Nextmv works on states and transitions, not inequalities and math formulations. It is simple to write table-driven tests for things like constraints and apps.
- Functional tests with the test runner | Nextmv speaks JSON natively. This makes it easy to build golden file tests that ensure decisions remain the same as you update your code. You can use small data samples (2 cars and 5 stops, for example) as the golden inputs.
- Acceptance tests | Nextmv is easy to integrate with CI platforms like GitHub Actions to add business KPIs to your pull requests. (Post to follow here!!) Benchmark inputs that represent a cross section of your operations can be tested with each decision PR. Add a new constraint to the app? Modify the value function? Analysis of the resulting plans can give PMs and business stakeholders a great gut check on the output before a new decision hits production.
- Stress tests | Nextmv helps you test decisions for scale. Stress testing (or performance testing) with simulators helps you answer questions like "How does my routing app perform with 2x, 4x, 10x volume?" That lets you get ahead of the business with new decision strategies, parameters, or even infrastructure (e.g., switching from AWS Lambda to EC2).
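To make the unit and functional test ideas above a bit more concrete, here is a minimal sketch in Python. It is not Nextmv code: `capacity_ok`, `decide`, and the sample data are hypothetical stand-ins, and the golden output is kept inline as a string where a real setup would likely read it from a file checked into the repository.

```python
import json

# Hypothetical constraint: the total quantity on a route
# must not exceed the vehicle's capacity.
def capacity_ok(quantities, capacity):
    return sum(quantities) <= capacity

# Table-driven cases: (name, quantities, capacity, expected result).
CASES = [
    ("empty route", [], 10, True),
    ("exactly full", [4, 6], 10, True),
    ("over capacity", [4, 7], 10, False),
]

def run_table():
    """Return the names of any cases whose result differs from the table."""
    failures = []
    for name, quantities, capacity, want in CASES:
        if capacity_ok(quantities, capacity) != want:
            failures.append(name)
    return failures

# Toy stand-in for a decision app: assign stops to vehicles round-robin.
def decide(data):
    vehicles = data["vehicles"]
    routes = {v: [] for v in vehicles}
    for i, stop in enumerate(data["stops"]):
        routes[vehicles[i % len(vehicles)]].append(stop)
    return {"routes": routes}

def canonical(obj):
    """Serialize with sorted keys so the comparison ignores key order."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

# Small sample input (2 cars, 5 stops) and its recorded "golden" output.
SAMPLE_INPUT = {"vehicles": ["car-1", "car-2"], "stops": ["a", "b", "c", "d", "e"]}
GOLDEN = '{"routes":{"car-1":["a","c","e"],"car-2":["b","d"]}}'

def matches_golden():
    return canonical(decide(SAMPLE_INPUT)) == GOLDEN

if __name__ == "__main__":
    print("table failures:", run_table())
    print("golden match:", matches_golden())
```

The table keeps each constraint case to one line, so adding an edge case is cheap. The golden comparison fails on any change to the decision, intended or not, which is the point: an intended change means deliberately updating the golden output in the same pull request.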
More recently, we released new capabilities into Nextmv Cloud to help customers better understand existing output data for vehicle routing decisions (or solutions). It’s now easier for developers to understand how long to run their app before they see diminishing returns, whether more stops are being unassigned than expected, or what the routes in returned solutions actually look like on a map. These views also serve as a nice way to communicate with stakeholders (often operators) and get the all-important buy-in for the decision app they’re building. The factors discussed in that process can inform automated acceptance tests later on.
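As a rough illustration of the "diminishing returns" question, the sketch below scans a solver's improvement history and reports when the relative gain first drops below a threshold. It is an assumption-laden example, not how Nextmv Cloud computes anything: the function name, the 1% default threshold, and the history values are all made up, and it assumes a value function that is being minimized.

```python
def diminishing_returns_point(history, min_gain=0.01):
    """Find where further run time stops paying off.

    history: list of (elapsed_seconds, best_value) pairs in time order,
    for a value function that is minimized.
    Returns the elapsed time of the first improvement whose relative gain
    over the previous best is below min_gain, or None if there is none.
    """
    for (_, prev), (elapsed, cur) in zip(history, history[1:]):
        gain = (prev - cur) / prev if prev else 0.0
        if gain < min_gain:
            return elapsed
    return None

# Hypothetical improvement history: (seconds elapsed, best value so far).
HISTORY = [(1, 1000.0), (2, 800.0), (5, 760.0), (10, 757.0), (30, 756.0)]

if __name__ == "__main__":
    print(diminishing_returns_point(HISTORY))
```

With this made-up history, the gains are 20%, 5%, and then under 1% at the 10-second mark, so a run limit around that point would be a reasonable starting point to discuss with operators. A threshold like this could later become an assertion in an automated acceptance test.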
Testing is a journey. No matter what line of work you find yourself in — cooking, spacecraft production, software development, or decision engineering — this will always be true. There's certainly more to come with Nextmv and added testing structures for decisions!