Experiments and tests

An introduction to experiments and tests on Nextmv Platform.

Experiments are used to evaluate changes to your models by running and comparing the results of one or more executables (i.e. different instances). Experimentation is a key part of developing a good decision app and Nextmv’s goal is to make it easier to run experiments so you can focus on improving your model.

Experiments are always created and managed in the context of an application. That is, each application has its own set of experiments.

Our experimentation and testing framework is designed to be visualized in the Console. To view experiments, open the app in Console and go to the Experiments tab.

Experiments are classified into two categories, depending on the data they use.

  • Offline testing. Uses offline data, i.e., historical data. Data is stored in Input Sets.

    • Batch experiments. Offline test to analyze the output from one or more decision models.
    • Acceptance testing. Offline test to evaluate the differences between exactly two models and assign a pass / fail label based on predefined thresholds.
  • Online testing. Uses online data, i.e., data that is generated in real-time and fed to the model directly.

    • Shadow testing. Online test that runs in the background and compares the results of a baseline instance against a candidate instance (shadow).
    • Switchback testing. Online test to analyze the performance of a candidate model compared to a baseline model by switching back between them randomly.

Statistics convention

To support custom metrics on the Platform, you can include a statistics object in the run output that follows the convention described here. The convention applies to both runs and experiments. It is optional in general, but you have to use it if you want to use the experimentation system or communicate metrics to the system.

Let's start with an example. Below you see the JSON output of the meal allocation example.

{
  "options": {
    "solve": {
      "control": {
        "bool": [],
        "float": [],
        "int": [],
        "string": []
      },
      "duration": 10000000000,
      "mip": {
        "gap": {
          "absolute": 0.000001,
          "relative": 0.0001
        }
      },
      "verbosity": "off"
    }
  },
  "solutions": [
    {
      "meals": [
        {
          "name": "A",
          "quantity": 2
        },
        {
          "name": "B",
          "quantity": 3
        }
      ]
    }
  ],
  "statistics": {
    "result": {
      "custom": {
        "constraints": 2,
        "provider": "HiGHS",
        "status": "optimal",
        "variables": 2
      },
      "duration": 0.123,
      "value": 27
    },
    "run": {
      "duration": 0.123
    },
    "schema": "v1"
  },
  "version": {
    "go-mip": "VERSION",
    "sdk": "VERSION"
  }
}

The above statistics object is interpreted as follows:

  • The schema is the version of the convention. This is required and indicates which version of the convention is used.
  • The run object contains statistics about the run itself.
  • The result object contains statistics about the result of the run.

Your run output needs to meet the following requirements for the system to extract the statistics information:

  • The output of the run must be a valid JSON object.
  • The output must contain a statistics object.
  • The statistics object must contain a schema string with the value "v1". This version determines what schema is expected.
  • The content of the statistics object must conform to the v1 schema described below.
  • The maximum size of the statistics object, without unnecessary whitespace, is 10kb.
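The requirements above can be checked locally before submitting a run. The following is a minimal sketch, not part of the Nextmv SDK: `checkStatistics` and `maxStatisticsBytes` are hypothetical names, and reading "10kb" as 10 * 1024 bytes is an assumption. It does not validate the individual v1 fields.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// maxStatisticsBytes is an assumed interpretation of the "10kb" limit.
const maxStatisticsBytes = 10 * 1024

// checkStatistics is a hypothetical helper that mirrors the listed requirements.
func checkStatistics(output []byte) error {
	// The output of the run must be a valid JSON object.
	var root map[string]json.RawMessage
	if err := json.Unmarshal(output, &root); err != nil {
		return fmt.Errorf("output is not a JSON object: %w", err)
	}
	// The output must contain a statistics object.
	raw, ok := root["statistics"]
	if !ok {
		return errors.New("missing statistics object")
	}
	var stats map[string]json.RawMessage
	if err := json.Unmarshal(raw, &stats); err != nil {
		return fmt.Errorf("statistics is not a JSON object: %w", err)
	}
	// The statistics object must contain a schema string with the value "v1".
	var schema string
	if err := json.Unmarshal(stats["schema"], &schema); err != nil || schema != "v1" {
		return errors.New(`statistics.schema must be the string "v1"`)
	}
	// Measure the size after stripping insignificant whitespace.
	var compact bytes.Buffer
	if err := json.Compact(&compact, raw); err != nil {
		return err
	}
	if compact.Len() > maxStatisticsBytes {
		return fmt.Errorf("statistics is %d bytes, limit is %d", compact.Len(), maxStatisticsBytes)
	}
	return nil
}

func main() {
	out := []byte(`{"statistics": {"schema": "v1", "run": {"duration": 0.123}}}`)
	if err := checkStatistics(out); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("statistics look well-formed")
}
```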

Add custom statistics

If you have a custom app, you may wish to add custom statistics to your result. You can do this by assigning a computed statistic to output.statistics.result.custom.

Consider an example from vehicle routing. To add the unplanned stop count as a custom statistic, you can:

  1. Format the output using the nextroute factory formatter.
  2. Add custom unplanned stop count to the formatted solution output.

An example is shown below:

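// Format the solution with the nextroute factory formatter.
output := factory.Format(ctx, options, solver, last)
// Read the first solution, which holds the unplanned stops.
solution := output.Solutions[0].(schema.SolutionOutput)
// Attach the unplanned stop count as a custom result statistic.
output.Statistics.Result.Custom = map[string]any{"unplanned": len(solution.Unplanned)}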

V1 schema

In addition to the schema key, there can be up to three keys in the statistics object:

  • run: Information about the run itself. For example the total duration of the solver and number of iterations.
  • result: Information about the result of the run. For example the objective value.
  • series_data: Time series data for the run such as the progression of the value function over time.

All numerical values are interpreted as floating point (IEEE 754) values. This includes the special string values "nan", "inf", "+inf" and "-inf", as JSON does not support these values natively.

All keys except for schema are optional and can be omitted.

schema

A string that determines the structure of the statistics object. This is required and must be "v1".

run

key | type | description
duration | float | The runtime of the complete search in seconds. Optional; if present, it must be a finite, positive number.
iterations | int | The total number of iterations. Optional; if present, it must be a positive number.
custom | object | Optional. Contains an arbitrary structure defined by the user. By convention, all information here applies to the complete run; only numeric values are interpreted and the rest is ignored. Keys within the object may contain only alphanumeric characters, _ and -, and must be between 1 and 60 characters long.

result

key | type | description
value | float | The value of the result, usually the objective value of the best solution. Optional. For non-finite results, use the special string values "nan", "inf", "+inf" and "-inf".
duration | float | Time in seconds until that specific result was found. Optional; if present, it must be a finite, positive number.
custom | object | Optional. Contains an arbitrary structure defined by the user. By convention, all information here applies to the result; only numeric values are interpreted and the rest is ignored. Keys within the object may contain only alphanumeric characters, _ and -, and must be between 1 and 60 characters long.
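The key rules for the custom objects under run and result can be checked before a run is submitted. This is a minimal sketch: `invalidKeys` and `keyPattern` are hypothetical names, treating "alphanumeric" as ASCII letters and digits is an assumption, and so is applying the rule to nested keys.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// keyPattern encodes the key rule: letters, digits, _ and -, with a
// length of 1 to 60. ASCII-only matching is an assumption.
var keyPattern = regexp.MustCompile(`^[A-Za-z0-9_-]{1,60}$`)

// invalidKeys walks a decoded custom object, including nested objects,
// and returns the sorted dotted paths of all keys that break the rule.
func invalidKeys(custom map[string]any, prefix string) []string {
	var bad []string
	for key, value := range custom {
		path := prefix + key
		if !keyPattern.MatchString(key) {
			bad = append(bad, path)
		}
		if nested, ok := value.(map[string]any); ok {
			bad = append(bad, invalidKeys(nested, path+".")...)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	custom := map[string]any{
		"routing": map[string]any{
			"stops": map[string]any{"assigned": 10, "un assigned": 2},
		},
		"max-load_kg": 120.5,
	}
	// The key with a space is reported with its full path.
	fmt.Println(invalidKeys(custom, "")) // prints [routing.stops.un assigned]
}
```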

series_data

key | type | description
value | Series | The progression of the objective value over time. Optional.
custom | []Series | An optional array of Series. This can be used to communicate any type of two-dimensional data sequence.

A Series has the following structure:

type Series struct {
  name string
  data_points []struct {
    x float64
    y float64
  }
}

Complex example

Below is a more complex example that also contains time series data.

The time series below is a simple example of a value function progression over time. The custom series are examples of how you can communicate more complex data, e.g. the usage of a custom operator over time.

{
  "schema": "v1",
  "run": {
    "duration": 40,
    "iterations": 100
  },
  "result": {
    "value": 3500,
    "duration": 15.5,
    "custom": {
      "routing": {
        "stops": {
          "assigned": 10,
          "unassigned": 2
        }
      },
      "special_float_values": {
        "inf": "inf",
        "nan": "nan",
        "plus_inf": "+inf",
        "minus_inf": "-inf"
      }
    }
  },
  "series_data": {
    "value": {
      "name": "value",
      "data_points": [
        {
          "x": 5.5,
          "y": 10000
        },
        {
          "x": 15.5,
          "y": 3500
        }
      ]
    },
    "custom": [
      {
        "name": "simple_operator_success",
        "data_points": [
          {
            "x": 5,
            "y": 15
          },
          {
            "x": 10,
            "y": 10
          },
          {
            "x": 12,
            "y": 5
          },
          {
            "x": 15,
            "y": 0
          }
        ]
      },
      {
        "name": "simple_operator_avg_contribution",
        "data_points": [
          {
            "x": 5,
            "y": 300
          },
          {
            "x": 10,
            "y": 200
          },
          {
            "x": 12,
            "y": 150
          },
          {
            "x": 15,
            "y": "nan"
          }
        ]
      }
    ]
  }
}

How the statistics are used

If the output conforms to the statistics convention, the system extracts the statistics and makes them available for further processing. The main application right now is within the experimentation system. For example, all metrics from the statistics convention are displayed in the run table and are also summarized automatically.

Screenshot of a summary of custom metrics in Console

What if my output is not JSON?

To use the statistics convention, your output must be valid JSON. If your output is in another format (e.g. CSV), we suggest wrapping it as a base64-encoded string in a JSON object.

{
  "solutions": [
    {
      "output": "aHR0cHM6Ly95b3V0dS5iZS9kUXc0dzlXZ1hjUQo="
    }
  ],
  "statistics": {
    "result": {
      "duration": 0.000368916,
      "value": 27
    },
    "run": {
      "duration": 0.000368916
    },
    "schema": "v1"
  }
}
