Haversine vs. OSRM: Distance and cost experiments on a vehicle routing problem (VRP)

How do you test if you should use map data over Haversine for a vehicle routing problem? Let’s use batch experiments with Nextmv to find out!

When working on a vehicle routing problem (VRP), there may come a time when you need to decide whether to use map data or not. We’ve discussed the considerations for using map data in another post. It provided guidance on how to approach this decision and hinted at a future post that explored actual comparisons. Well, today is the day!

In this blog post, we will compare several VRPs with and without map data using Nextmv’s batch experiments feature. We selected different cities (and surrounding areas) to test different population densities, geographic features, and city layouts. For the map data, we will use the Nextmv Maps feature, which leverages OSRM map data. Let’s dive in!

The differences between map-based data and haversine

There are multiple ways to compute the distance and duration matrix for a routing problem and the best approach often depends on your specific use case and constraints. Figuring out what the best approach is can be a challenge, however. Experimentation is key — and without it, businesses are often left with taking a leap of faith and hoping for the best or not making any changes at all and potentially losing out on business value. 

As we define our experimentation space, let’s briefly recap a few concepts. When solving a VRP with Nextmv Routing app, the default distance measure is the haversine distance, which is the distance between two points on a sphere. This is a pure mathematical approach, similar to Euclidean or Manhattan distance measures. You can scale this distance by vehicle speeds to get to the duration it takes to travel between two locations. Haversine has the advantage of not being dependent upon other data sources other than longitude and latitude of locations. As a result, it’s simple to implement and not as computationally intensive as other approaches.  

That said, there are some drawbacks to the haversine approach. Haversine abstracts from the actual landscape (e.g., lakes) and infrastructure (e.g., low bridges) of the relevant area and computes the distance “as the crow flies.” Therefore, any distance computed with haversine will be more optimistic compared to a distance that considers the actual roads of the area.

Side-by-side comparison of a haversine display and a road-based display for the same stops.

What we’re looking at in this post: how much of a difference it makes to select map-based data over haversine? Sure, choosing map data will result in different road distances and travel times of vehicles. But in the context of a vehicle routing problem, where you must pick up and deliver packages, meet time windows, and make customers happy, does the difference warrant the investment in terms of cost, development effort, or model changes? 

A sample scenario with The Farm Share Company

In order to analyze this, let’s put ourselves into the shoes of decision modelers at Nextmv’s go-to example business: The Farm Share Company. This delivery business transports farm-based goods from local farms to customers’ homes. In our scenario, The Farm Share Company is already operating a VRP using haversine distance. Lately, our ops team has gotten consistent driver and customer feedback that drivers are finishing their shifts later than expected and not all orders are getting delivered on time, indicating a gap between model results and real-world results. We want to know: can we improve our business metrics if we use more precise distance and duration information?

To evaluate the benefits of upgrading to map information, we will look at two main metrics: 

  1. Mean gap per vehicle: The individual drivers and the differences in their return times at the end of their shifts. If there’s a higher gap between return times, then it’s likely the haversine approach is being too optimistic about a driver’s total travel time and impacting business performance. 
  2. Unplanned stops: Another factor to consider is the number of unplanned stops. In high-demand situations where vehicles are being used at their maximum capacity, operators will work on solutions to fulfill all unplanned (or not yet assigned) orders. That might mean staffing more drivers or outsourcing deliveries to third party drivers. If your VRP model’s haversine implementation is too optimistic, you could end up with drivers working overtime or returning with undelivered orders. 

Now that we’ve oriented ourselves to a relatable example, let’s define our problem space. 

Assumptions for comparing haversine vs. map-based data

So far we’ve recapped the basics of haversine vs. map data and we’ve settled on an example business scenario. What assumptions do we need to make? This section walks through two parts to define before we perform our comparison: regions of operation and vehicle speed. 

Characterizing regions of operation

The farm share business operates globally. Each operating region can be categorized in one of two ways: 

  • Small radius delivery: Delivery locations in the region are situated in a relatively small radius (radius = 12km), usually the downtown area of a larger city
  • Large radius delivery: Delivery locations in the region are placed over a larger area (radius = 150km), spanning multiple cities (e.g., Lakeland, Florida is part of the Orlando, Florida area)

For our experiment, we’ll look at 3 areas for each type of operating region — so 6 in total. The smaller radius areas are: Amsterdam. San Francisco, and London. The larger radius areas are: Orlando, Stuttgart, and Stockholm. 

The following characteristics are the same for each of these operating regions:

  • 15 vehicles
  • Vehicle capacity is 90 orders
  • 4.5 minutes stop duration
  • Shift length for drivers is 8 hours
  • All vehicles start and stop at the same location

Determining vehicle speed

For the haversine approach, we’ve defined a vehicle speed to determine the travel duration between two locations. This was not an easy task as the possible vehicle speed depends on several factors: the road type and their speed limits, the traffic conditions, and the vehicle type to name a few. For our experiment, we are only using a single vehicle type (cargo van) and we aren’t worrying about traffic data. That leaves us with the legal speed limits for consideration.  

For both areas (small and large radius), we’ll assume that we aren’t able to travel at the actual legal speed limit all the time. This is because streets are not straight lines without interruptions. You will encounter traffic lights. You will need to make turns. And, most importantly, you will encounter other vehicles in traffic — slowing you down because you’re behind someone else driving slowly or it’s rush hour. All these factors (and many more) decrease the average speed of a vehicle in traffic. Therefore, we’re going to assume the following: 

  • For the small radius area, the legal speed limit is 50km/h and average speed is 40% of the limit
  • For the large radius area, the legal speed limit is 100km/h and average speed is 60% of the limit

Of course, these assumptions miss nuances for each individual area of operation. Taking into account the distribution of road types to come up with a better speed assumption is not part of this blog post. 

To the experiment! Custom statistics and comparison 

As discussed before, we’re interested in the return time gaps (time between return to depot and driver shift end). As this is not a default statistic in our routing model output, we need to create a custom statistic for this KPI. Below you will find a code snippet that computes the sum of all return time gaps and the mean of the return time gaps. If you clone the Nextroute community app (nextroute community clone -a nextroute), you can simply add these snippets to the model.

Here is the definition of the custom struct that we’re using for the custom statistics.

type CustomStatistics struct {
    DefaultStatistics       schema.CustomResultStatistics `json:"default_statistics"`
    TotalReturnTimeGap      float64                       `json:"total_return_time_gap"`
    ReturnTimeGapPerVehicle float64                       `json:"return_time_gap_per_vehicle"`
    }

This is the part that computes the custom statistics:

vehicleIdToShiftEnd := make(map[string]time.Time)

    for _, v := range input.Vehicles {
        vehicleIdToShiftEnd[v.ID] = *v.EndTime
    }

    totalGap := 0.0

    for _, v := range last.Vehicles() {
        gap := vehicleIdToShiftEnd[v.ModelVehicle().ID()].Sub(v.End())
        totalGap += gap.Seconds()
    }

    customStatistics := CustomStatistics{
        DefaultStatistics:       factory.DefaultCustomResultStatistics(last),
        TotalReturnTimeGap:      totalGap,
        ReturnTimeGapPerVehicle: totalGap / float64(len(input.Vehicles)),
    }

    output.Statistics.Result.Custom = customStatistics

Comparison and results: Haversine or OSRM?

For each of our areas of operation, we have five historic runs that we’re going to use to evaluate the usage of OSRM data compared to the default haversine approach. We have set up two instances in our decision app, one uses Nextmv’s map feature (using OSRM) and the other one uses the default haversine distances. In order to compare both settings, we create a batch experiment for each area of operation on the historic inputs. 

The batch experiments will aggregate the individual runs and their custom statistics and display them in the experiment’s summary. Here is what the batch experiment for Amsterdam looks like in the Nextmv console: 

Batch experiment setup for Amsterdam in Nextmv console.
Example batch experiment results for Amsterdam. (Note: the KPIs are in seconds in this visual. In the summary tables below, everything is represented in minutes.)

Below you will find two tables summarizing the results returned to me in Nextmv for our six areas of operation:

Small radius

Large radius

What we’re looking for in these results is a signal that indicates that haversine is too optimistic in the specific area  and underestimates the time it takes to travel between two points because of road infrastructure and the landscape. 

The mean gap per vehicle (the gap between the return time and the vehicle shift end) can be seen as a buffer towards the end of the vehicle shifts. The smaller this gap is the tighter the buffer gets. We can expect to see larger gaps for the haversine approach than for the OSRM approach. How large the gap is for OSRM will indicate how “tight” the routing is when the actual road infrastructure is being considered.

Let’s take a closer look at each of these results and figure out whether we should consider switching from the default haversine approach to using OSRM data in solving our routing problems.

Amsterdam (small radius)

Our KPIs indicate that haversine is too optimistic about the travel times and if you consider the actual road network, you end up with an average of 27 unplanned stops. A switch to OSRM should be considered.

San Francisco (small radius)

There are no additional unplanned stops in the OSRM approach. The mean gap per vehicle is about 27.42% smaller for OSRM than for haversine. With the chaos of delivering in the actual world during different traffic periods, a gap of about 48 minutes might not be enough. So this could be an area that either needs additional analysis or more historical data to decide whether OSRM should be used.

London (small radius)

The OSRM approach indicates an additional 101.6 unplanned stops on average, so a switch to OSRM seems to be a good idea.

Orlando (large radius)

There are no additional unplanned stops and the mean gap is about 80 minutes. As this is an area with a larger radius, 80 minutes might be enough to mitigate any traffic disruptions throughout a typical business day. A switch to OSRM is probably not necessary.

Stuttgart (large radius)

There are no additional unplanned stops and the computed gap is fairly big. This also indicates that a switch to OSRM is probably not necessary.

Stockholm (large radius)

The data shows an additional 41.60 unplanned stops on average. This is a signal for switching to OSRM.

From these results, you can see that for some areas a switch to OSRM is preferable, while for other areas keeping the haversine approach should be okay based on the data we used in this experiment.

Takeaways and next steps

As you’ve seen, we’ve conducted a few model experiments using Nextmv to evaluate model changes and their business impact. These testing tools allowed us to play out and analyze scenarios using historical data without impacting production systems and operations.

For the Farm Share Company, we determined that 3 of our delivery regions would benefit from a switch to using map data with OSRM, 2 regions are fine to stay with haversine, and 1 needs further analysis. We used batch experiments with custom statistics to make these comparisons specifically for our farm share delivery business. 

Now, these results should not be considered a general proposal for each of these areas of operation. This blog post makes assumptions specific to the Farm Share Company’s operations that might not be true for your own operations. Maybe you have time windows you need to consider or other restrictions that create limitations on the routes you can create. This blog post did not look at individual stop ETAs, which would involve comparisons to other data sources or driver performance metrics. Obviously, these considerations could lead to different results and different interpretations of the results. Also, there might be other KPIs that you’re interested in. 

So if you think about this decision for your own business, I encourage you to create your own experiments and make your own conclusions. (And remember to check out the considerations for choosing a cost matrix.) As you have seen, the Nextmv platform is a great solution for this type of experimentation and others (e.g., should you cluster routes or change your objective function?). 

Try out the full Nextmv experimentation suite with a free account and trial — or reach out to us with questions and feedback. 

Video by:
No items found.