Experiments

Experiments are used to evaluate changes to your models by running one or more executable binaries (i.e., different versions) and comparing their results. Experimentation is a key part of developing a good model, and Nextmv’s goal is to make experiments easier to run so you can focus on improving your model.

Nextmv Platform provides a suite of products to create and manage different types of experiments. The experiment types covered on this page are scenario, batch, acceptance, shadow, and switchback tests. Two related features, inputs and input sets, manage the input data used by experiments. Experiments are always created and managed in the context of an application; that is, each application has its own set of experiments. See the Apps core concepts page for more information about applications.

Experiments and input sets can be created and managed with Nextmv CLI, Nextmv Console, or the HTTP API endpoints. Created experiments are saved and can be accessed at any time. After experiments have been started, the results are aggregated and can be retrieved with the same tools. When viewing the result of an experiment, Console provides a visual interpretation of the results, while the API and Nextmv CLI provide the raw JSON.

The different types of experiments and input sets are summarized below.

Types of experiments

Scenario

Scenario tests compare the output from one or more scenarios. A scenario is composed of a model version, a collection of inputs, and any specific configuration that should be applied to the runs for that scenario. You can also configure repetitions to test for variability in the results.

You can use scenario tests as a way to explore impacts to business metrics (KPIs) based on model updates, different conditions (e.g. low demand vs. high demand), parameter tuning, and more. You can also use scenario tests as a way to validate that a model is ready for further testing and likely to make an intended business impact.
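To make the structure of a scenario concrete, here is a minimal Python sketch. The `Scenario` class, its field names, and `plan_runs` are hypothetical illustrations and not part of any Nextmv tool; the sketch only shows how inputs, option values, and repetitions might multiply into individual runs.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Scenario:
    """Hypothetical container for the pieces of a scenario described above."""
    name: str
    model_version: str
    input_ids: list
    options: dict = field(default_factory=dict)  # option name -> list of values to try
    repetitions: int = 1  # runs per input/option combination


def plan_runs(scenario):
    """Expand a scenario into one planned run per input, option combination, and repetition."""
    names = list(scenario.options)
    # product() with no iterables yields a single empty combination,
    # so a scenario without options still produces runs.
    combos = list(product(*(scenario.options[n] for n in names)))
    return [
        {
            "scenario": scenario.name,
            "version": scenario.model_version,
            "input": input_id,
            "options": dict(zip(names, combo)),
            "repetition": rep,
        }
        for input_id in scenario.input_ids
        for combo in combos
        for rep in range(scenario.repetitions)
    ]


high_demand = Scenario(
    name="high-demand",
    model_version="v2",
    input_ids=["input-a", "input-b"],
    options={"duration": ["5s", "10s"]},
    repetitions=3,
)
runs = plan_runs(high_demand)
print(len(runs))  # 2 inputs x 2 option values x 3 repetitions = 12
```

Repetitions greater than one give several results for the same input and configuration, which is what lets you see variability in the results.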

Batch

Batch experiments are used to analyze the output from one or more decision models. They are generally used as an exploratory test to understand the impacts to business metrics (or KPIs) when updating a model with a new feature, such as an additional constraint. They can also be used to validate that a model is ready for further testing and likely to make an intended business impact.

See the batch experiment reference guide for more information on batch experiments.

Acceptance

Acceptance tests build on the core concept of a batch experiment, with a focus on evaluating the differences between exactly two models and assigning a pass/fail label based on predefined thresholds. They are used to verify whether business or operational requirements (e.g., KPIs and OKRs) are being met. An acceptance test runs an existing production model and a new, updated model against a set of test data. You then look at the results and determine whether the new model is acceptable based on criteria identified beforehand.

See the acceptance tests reference guide for more information on acceptance tests.
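The pass/fail idea can be sketched in a few lines of Python. This is an illustration only: the `Criterion` class, the operator names, and the choice to compare means with a tolerance are assumptions made for the example, not the platform's actual acceptance logic.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Criterion:
    """Hypothetical acceptance criterion on one metric."""
    metric: str             # key to read from each run's metrics
    operator: str           # "le": candidate may not exceed baseline; "ge": may not fall below it
    tolerance: float = 0.0  # allowed slack around the baseline mean


def evaluate(baseline_runs, candidate_runs, criteria):
    """Compare mean metrics of two models and return (overall_pass, per_metric_pass)."""
    details = {}
    for criterion in criteria:
        base = mean(run[criterion.metric] for run in baseline_runs)
        cand = mean(run[criterion.metric] for run in candidate_runs)
        if criterion.operator == "le":
            details[criterion.metric] = cand <= base + criterion.tolerance
        elif criterion.operator == "ge":
            details[criterion.metric] = cand >= base - criterion.tolerance
        else:
            raise ValueError(f"unknown operator: {criterion.operator}")
    return all(details.values()), details


# Made-up metrics for the same two inputs run on each model.
baseline = [{"cost": 100, "unassigned": 2}, {"cost": 110, "unassigned": 1}]
candidate = [{"cost": 95, "unassigned": 2}, {"cost": 105, "unassigned": 3}]
criteria = [Criterion("cost", "le"), Criterion("unassigned", "le", tolerance=0.5)]

passed, details = evaluate(baseline, candidate, criteria)
print(passed, details)
```

In this made-up data the candidate lowers the mean cost but leaves more stops unassigned than the tolerance allows, so the overall label is a fail even though one metric passes.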

Shadow

A shadow test is an experiment that runs in the background and compares the results of a baseline instance against a candidate instance. When the shadow test has started, any run made on the baseline instance will trigger a run on the candidate instance using the same input and options. The results of the shadow test are often used to determine if a new version of a model is ready to be promoted to production.

Shadow tests can be created using Nextmv CLI, Nextmv Console, or the HTTP API. See the shadow test reference guide for more information on shadow tests.
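The mirroring behavior of a shadow test can be pictured with a small Python sketch. The `ShadowTest` class and the two stand-in model functions are hypothetical; on the platform the mirroring happens for you once the test has started.

```python
class ShadowTest:
    """Hypothetical illustration: mirror baseline runs onto a candidate."""

    def __init__(self, baseline, candidate):
        self.baseline = baseline    # callable(input_data, options) -> result
        self.candidate = candidate  # callable with the same signature
        self.started = False
        self.pairs = []             # (baseline_result, candidate_result) per mirrored run

    def start(self):
        self.started = True

    def run(self, input_data, options):
        """Run the baseline; once started, also run the candidate on the same input and options."""
        result = self.baseline(input_data, options)
        if self.started:
            shadow_result = self.candidate(input_data, options)
            self.pairs.append((result, shadow_result))
        # Callers only ever receive the baseline result.
        return result


def baseline_model(input_data, options):
    return sum(input_data)  # stand-in for the production model


def candidate_model(input_data, options):
    return sum(input_data) - options.get("discount", 0)  # stand-in for the new version


test = ShadowTest(baseline_model, candidate_model)
before = test.run([1, 2, 3], {"discount": 1})  # not mirrored: test has not started
test.start()
after = test.run([1, 2, 3], {"discount": 1})   # mirrored onto the candidate
print(before, after, test.pairs)
```

The point of the design is that the candidate never affects what the caller receives; its results are only collected for comparison.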

Switchback

Switchback tests let algorithm teams compare the performance of a candidate model against a baseline model using production data and conditions. Both models make real operational decisions, and the test randomizes which model (the treatment) is applied over units of time.

Switchback tests are related to general A/B tests, but they are not the same. Switchback tests allow you to account for network effects, whereas A/B tests do not.

Switchback tests can be created using Nextmv Console or the HTTP API. See the switchback test reference guide for more information on switchback tests.
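Randomizing the treatment over units of time can be sketched as follows. This is a minimal illustration under assumed names (`build_plan`, `instance_for`) and a simple uniform random choice per time unit; it is not the platform's actual assignment scheme.

```python
import random
from datetime import datetime, timedelta


def build_plan(start, unit, n_units, seed=None):
    """Assign 'baseline' or 'candidate' at random to each consecutive time unit."""
    rng = random.Random(seed)
    return [
        (start + i * unit, rng.choice(["baseline", "candidate"]))
        for i in range(n_units)
    ]


def instance_for(plan, unit, when):
    """Return the instance that handles a run arriving at time `when`."""
    for unit_start, instance in plan:
        if unit_start <= when < unit_start + unit:
            return instance
    raise ValueError("timestamp falls outside the test window")


start = datetime(2024, 1, 1, 8, 0)
unit = timedelta(hours=1)
plan = build_plan(start, unit, n_units=6, seed=42)

# Every run inside the same hour is routed to the same instance.
first = instance_for(plan, unit, start + timedelta(minutes=5))
second = instance_for(plan, unit, start + timedelta(minutes=55))
print(first == second)  # True
```

Because all runs within a time unit go to the same model, decisions made in that unit interact only with other decisions from the same model, which is how this design accounts for network effects.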

Inputs

Inputs are managed input data that can be used as part of an input set in an experiment. You can create an input in Nextmv Console or with the HTTP API endpoints.

You can create a managed input either by uploading an input or by referencing a previous run.

Input sets

Input sets are defined sets of input files to use for an experiment. You can create input sets with Nextmv CLI, in Nextmv Console, or with the HTTP API endpoints.

At the moment, inputs for an input set can only be retrieved from prior runs, so to “upload” an input you must first make a run with it. When you create the input set, you can reference the run ID, and the input used for that run becomes an input file in the set. Alternatively, you can specify a date range and an instance ID to gather inputs for the input set. Note that an input set can contain at most 20 inputs.
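The date-range option can be pictured as a filter over past runs. The following Python sketch is hypothetical: the run record keys (`id`, `instance_id`, `created_at`) and the choice to raise an error when more than 20 runs match are assumptions for illustration, since how the platform handles that case is not described here.

```python
from datetime import datetime

MAX_INPUTS = 20  # limit stated in the text above


def select_run_ids(runs, instance_id, start, end):
    """Pick the IDs of runs made on one instance within a date range."""
    selected = [
        run["id"]
        for run in runs
        if run["instance_id"] == instance_id and start <= run["created_at"] <= end
    ]
    if len(selected) > MAX_INPUTS:
        # Assumed behavior for this sketch only.
        raise ValueError(f"{len(selected)} runs match; an input set allows at most {MAX_INPUTS}")
    return selected


# Made-up run history.
history = [
    {"id": "run-1", "instance_id": "prod", "created_at": datetime(2024, 3, 1)},
    {"id": "run-2", "instance_id": "staging", "created_at": datetime(2024, 3, 2)},
    {"id": "run-3", "instance_id": "prod", "created_at": datetime(2024, 3, 9)},
]
picked = select_run_ids(history, "prod", datetime(2024, 3, 1), datetime(2024, 3, 5))
print(picked)  # ['run-1']
```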

Custom metrics (statistics convention)

It is often useful to define custom metrics to evaluate the results of an experiment. Custom metrics are defined as part of the run output in the statistics field. The statistics field is a JSON object that follows our statistics convention and can be fully customized to your needs. For more information on custom metrics, see the statistics convention or look at one of our apps.
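As a rough illustration, a model might attach custom metrics to its output like this. The nested keys shown (`schema`, `result`, `duration`, `value`, `custom`) and the `solution` key are assumptions made for this sketch, as are the metric names; check the statistics convention for the authoritative layout.

```python
import json


def build_output(solution, total_cost, unassigned_stops, solve_seconds):
    """Assemble a run output with a statistics field (assumed layout, see note above)."""
    return {
        "solution": solution,  # hypothetical key for the model's answer
        "statistics": {
            "schema": "v1",  # assumed convention version tag
            "result": {
                "duration": solve_seconds,
                "value": total_cost,
                # Custom metrics you want to compare across experiment runs.
                "custom": {"unassigned_stops": unassigned_stops},
            },
        },
    }


output = build_output(
    solution={"routes": []},
    total_cost=1234.5,
    unassigned_stops=2,
    solve_seconds=0.8,
)
print(json.dumps(output, indent=2))
```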

Review the results

After running an experiment from Nextmv CLI, navigate to Nextmv Console to view the results and compare the models. Note that large experiments can take a while to finish, so you may need to check back later to view the results.

Within Nextmv Console, you'll find your experiment under the Experiments section.
