Should I use map or traffic data for my vehicle routing problem (VRP)?

When planning routes for vehicles, it’s reasonable to think you want to incorporate map or traffic data. But it’s not always needed, and it’s often a real investment if you decide to pursue it. We look at how to decide.

The optimization goal of a vehicle routing problem (VRP) is to sequence the stops a vehicle will service while minimizing, maximizing, or satisfying a specific objective, such as distance traveled or time on road. As part of your VRP, you’ll need to consider constraints that limit the ways stops are sequenced. For example, pickups must happen before dropoffs, or vehicles have limited capacity.

Some constraints depend on a distance matrix, a table that defines the distances between pairs of locations on a map. One example is a maximum limit on the distance a vehicle can travel. Others depend on travel time, such as hard time windows (e.g., a stop must be serviced between 9:00 and 9:30 AM) or the handling of early and late arrivals. In some cases, it’s useful to compute an estimated time of arrival (ETA): given a start time, a vehicle speed, and a distance matrix, the model can calculate the travel time between stops and, from that, the arrival time at each one.
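To make the ETA idea concrete, here is a minimal sketch in Python. It assumes a constant vehicle speed and ignores service time at each stop; the function names `route_etas` and `within_window` are ours, purely for illustration, and not part of any particular routing library.

```python
from datetime import datetime, timedelta


def route_etas(route, distance_m, start, speed_mps):
    """Return the estimated arrival time at each stop of `route`.

    route      -- stop indices in visiting order, e.g. [0, 2, 1]
    distance_m -- square matrix of distances in meters
    start      -- datetime at which the vehicle leaves the first stop
    speed_mps  -- assumed constant vehicle speed in meters per second
    """
    clock = start
    etas = [clock]
    for origin, destination in zip(route, route[1:]):
        # Travel time for this leg = distance / speed.
        clock += timedelta(seconds=distance_m[origin][destination] / speed_mps)
        etas.append(clock)
    return etas


def within_window(eta, earliest, latest):
    """True if `eta` falls inside a hard time window (bounds inclusive)."""
    return earliest <= eta <= latest
```

With a 6 km leg at 10 m/s and a 9:00 start, the second stop gets an ETA of 9:10, which you can then check against a 9:00 to 9:30 window.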

Regardless of how you constrain your VRP, it’s likely you’ll wonder what role map and traffic data should play. In this post, we’ll explore some options and how to decide whether they’re appropriate for your route optimization.

Calculating time and distance costs: a primer

The first step in determining whether you need map and traffic data is to get familiar with the ways to calculate distance and time. We’ve written about distance and duration costs in the past, but I’ll provide a quick summary for convenience.

There are two main ways to determine travel distance and travel time: mathematical computation and observation. Mathematical computation leverages geometry-based approaches such as Euclidean distance (flat-plane distance) and Haversine distance (curved-surface or “as the crow flies” distance). These approaches consider only the two points themselves, not the terrain (such as mountains and lakes) or the infrastructure (such as bridges and bike paths) that lies between them.
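For reference, both geometric measures fit in a few lines of Python. This is a sketch, not a production implementation: it uses a mean Earth radius of 6,371 km, which is itself an approximation, and the helper names are ours.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (approximation)


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def euclidean(a, b):
    """Straight-line distance between two (x, y) points on a flat plane."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def distance_matrix(points, dist=haversine_m):
    """Build a full matrix by applying `dist` to every ordered pair of points."""
    return [[dist(p, q) for q in points] for p in points]
```

Note that a matrix built this way is always symmetric, which is one of the ways it differs from a road network with one-way streets.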

In contrast, observational approaches leverage providers of road network data such as OpenStreetMap (OSM). This data is based on street surveys, crowdsourcing, and satellite imagery to provide accurate distance information, while accounting for available terrain and infrastructure information. For solutions that offer traffic data, it’s usually based on historical data and sometimes reported updates. 

Mathematical approaches tend to be faster and cheaper. At the same time, the estimates they produce can be too optimistic about travel distance and travel duration between locations. Observational approaches are usually more accurate, but also more resource intensive and more expensive. At the end of the day, however, both approaches provide estimates, and it’s possible to get useful estimates with either if you understand their strengths and weaknesses in the context of your problem.

Considerations for using map and traffic data

So, how do you start to determine which approach is suitable for your business? The first step is to take a closer look at the operational, technical, and financial requirements and restrictions of the decision problem you’re trying to solve. The following questions will help guide you. Keep in mind that this list is not exhaustive: it’s a starting point, and your business will probably have unique requirements of its own.

Operational: Location, population, infrastructure, etc.

What your operations look like, and what properties or requirements they have, will be a key driver in choosing the source of the distance and duration matrix for your decision model. Some questions to consider here include:

What is my area/region of operation? 

If you’re operating in a place with lakes and rivers (e.g., London or Copenhagen) or hills and mountains (e.g., San Francisco or Lima), map data might be helpful, as it accounts for bottlenecks like bridges and tunnels. In a place with a more uniform landscape and few geographical obstacles, like Texas, Haversine might be good enough.

That said, even within a place with few geographical obstacles where Haversine might seem best, there may be other regional attributes that would be better accounted for with map data. For example, culs-de-sac in a suburban neighborhood or travel obstacles such as golf courses, theme parks, wildlife preserves, airports, etc.   

What is the infrastructure and population density of this region? 

There are times in New York City when you’re stuck in traffic and it feels like nothing is moving. A 30-minute trip can quickly become a 90-minute trip. Having traffic information, or a scaled Haversine approach, for such a case could mitigate the risk of getting stuck. In a city of a couple of thousand people, this might not be a problem at all.
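One way to read “scaled Haversine” is to multiply straight-line distances by a detour factor and a time-of-day congestion factor before converting them to durations. The sketch below does exactly that. The multipliers in `CONGESTION_BY_HOUR` and the default detour factor of 1.3 are hypothetical placeholders, not measured values; you would calibrate them against trip times you have actually observed.

```python
# Hypothetical multipliers; calibrate these against your own observed trip times.
CONGESTION_BY_HOUR = {8: 1.8, 9: 1.6, 17: 2.0, 18: 1.8}


def congestion_factor(hour, table=CONGESTION_BY_HOUR, default=1.0):
    """Look up the slowdown multiplier for an hour of day (0-23)."""
    return table.get(hour, default)


def scaled_durations(distance_m, speed_mps, detour_factor=1.3, congestion=1.0):
    """Turn straight-line distances (meters) into travel-time estimates (seconds).

    detour_factor -- assumed ratio of road distance to straight-line distance
    congestion    -- assumed slowdown multiplier for the time of day
    """
    scale = detour_factor * congestion / speed_mps
    return [[d * scale for d in row] for row in distance_m]
```

For a 1 km straight-line leg at 10 m/s, a detour factor of 1.3 and a rush-hour factor of 2.0 give an estimate of about 260 seconds instead of 100.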

Another relevant aspect is the actual infrastructure and connectivity in your area of operations. Is it possible to bypass a congested downtown area because there are highways around it? If so, you might want to know when the detour is worth taking and when the direct connection is fine.

Are there any bottlenecks in the infrastructure (tunnels, bridges, a high number of one-way streets)? 

If you want to go from downtown New York City to the other side of the Hudson River, you need to take the Holland Tunnel or the Lincoln Tunnel. These will be bottlenecks slowing down your drivers. Bridges and tunnels can also come with limitations regarding the weight or height of allowed vehicles. Having map data is useful in these cases. Traffic data can also help.

What is the speed variability and the road complexity of your area of operations?

If you’re transporting goods between two warehouses that are directly connected by a highway, there is not much complexity to this connection, and Haversine distances with appropriate vehicle speeds are probably good enough.

But what if a city is located between the highway and one of your depots? Now you have a more complex situation where you encounter different road types, each with different speed limits. Furthermore, the inner-city setting itself leads to higher road complexity, as you will probably encounter culs-de-sac, one-way roads, and a range of speed limits depending on local regulations. The higher the complexity, the more map information helps in figuring out the best routes for your vehicles.

What does your fleet look like? What kind of vehicle types do you have? 

If you have a mixed fleet that includes bicycles, which can use different pathways than large vehicles, map data along with vehicle profile information can be appropriate. If you have a more uniform fleet consisting of electric vehicles and regular vehicles (but not bicycles), Haversine distance may be sufficient.

Do you consider time windows in your model? Do you communicate estimated times of arrival (ETAs) to drivers and/or customers?

If you need ETAs for your deliveries, either to communicate to customers or to check whether you’re meeting customer time windows, good estimates of your actual travel time will be crucial to the precision of these ETAs. Relying on Haversine alone will most likely lead to overly optimistic ETAs, which can lead to unhappy customers and also to unhappy drivers if they’re not able to meet those times. In this case, map data is worth exploring.

Technical: Integrations, technical staff, SLAs, etc.

Here it gets a bit more technical. If you’re using a third-party service such as HERE Maps, Google Maps, RoutingKit, or others to provide map or traffic data, you need to integrate it into your decision model. Some questions to consider from this perspective:

How dynamic are the locations that you need to solve for with your model? 

If you have a list of warehouses and a list of shops that are connected to those warehouses, the connections your drivers need to service are pretty steady. You might be able to cache travel information for these connections, which reduces the number of requests you send to your map or traffic provider. If you’re running a food delivery or ride sharing service, you will end up with new stop locations every day, which makes it almost impossible to cache any map or traffic information.
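For the steady warehouse-to-shop case, a cache can be as simple as a dictionary keyed by origin and destination. The sketch below is illustrative only: `TravelCache` is a name we made up, and `fetch` stands in for whatever call your provider actually exposes. It also never expires entries, which is only reasonable for map distances, not for live traffic.

```python
class TravelCache:
    """Remember travel information for location pairs that rarely change."""

    def __init__(self, fetch):
        # `fetch(origin, destination)` is a placeholder for the real provider call.
        self._fetch = fetch
        self._store = {}

    def get(self, origin, destination):
        # The key is directional: A -> B may differ from B -> A on real roads.
        key = (origin, destination)
        if key not in self._store:
            self._store[key] = self._fetch(origin, destination)
        return self._store[key]

    def __len__(self):
        return len(self._store)
```

Repeated lookups of the same pair hit the dictionary instead of the provider, so you only pay for each connection once.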

What is the size of a typical distance/duration matrix that you're going to consider? 

Many providers for map and traffic information have limits on the size of the matrices you’re requesting. If your problem becomes too big, you may need to split it up into chunks. Now, you may have the hardware and skillset necessary to self-host a map/traffic solution capable of supporting large matrices, but the larger the matrices, the longer it takes to compute them. Do you have time to wait in your operations?
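Splitting a large matrix into chunks might look like the sketch below. It assumes the provider’s limit is expressed as a maximum number of elements (origins times destinations) per request; real providers may instead cap origins and destinations separately, so check their documentation. `fetch_block` is a hypothetical stand-in for the provider call.

```python
import math


def matrix_blocks(n_origins, n_destinations, max_elements):
    """Yield (row_range, col_range) pairs that tile the full matrix.

    Each block holds at most `max_elements` entries, using square blocks.
    """
    side = math.isqrt(max_elements)  # largest square side that fits the limit
    if side < 1:
        raise ValueError("max_elements must be at least 1")
    for r in range(0, n_origins, side):
        for c in range(0, n_destinations, side):
            yield (range(r, min(r + side, n_origins)),
                   range(c, min(c + side, n_destinations)))


def assemble(n_origins, n_destinations, max_elements, fetch_block):
    """Request each block via `fetch_block(rows, cols)` and stitch the results."""
    matrix = [[None] * n_destinations for _ in range(n_origins)]
    for rows, cols in matrix_blocks(n_origins, n_destinations, max_elements):
        block = fetch_block(rows, cols)
        for i, r in enumerate(rows):
            for j, c in enumerate(cols):
                matrix[r][c] = block[i][j]
    return matrix
```

The number of requests grows quickly: a 250 by 250 matrix with a 10,000-element limit already needs nine separate calls, which is worth keeping in mind for both latency and cost.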

Are you ready to build the integration to your routing model? 

Whether you’re using a third-party provider or a self-hosted service for map and traffic information, you need to bring this information into your model. In most cases, this requires some engineering work to set everything up. Do you have an engineer for this? Do you have the time for this? Or would it be easier to start with Haversine and later expand to a provider?

How much effort does it take to integrate the travel distance and travel time matrix into the model?

If you’re relying on a third-party provider, it might be easy to integrate their service with your model. If you want to set up your own self-hosted service, there is work to do before you can even start to integrate.

What is the SLA of the third-party service providing the matrices? 

If you’re relying on a third-party service to get travel distance or duration matrices, you’re bound to their SLA. It’s important to understand what their promised uptime is over the year, because if they are down, so is your matrix computation. (And maybe you should prepare a fallback mechanism that uses Haversine so that your operations continue, even if the solutions are different from the primary service.) Additionally, you should check whether the provider covers the region for your operations. Not all providers cover all parts of the globe or with the same level of detail.  
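A fallback can be sketched as a thin wrapper around two interchangeable matrix functions. Both `primary` and `fallback` are placeholders here: the first would wrap your provider call, the second a Haversine-based computation. Returning the source alongside the matrix lets you flag plans that were built on fallback data.

```python
import logging

logger = logging.getLogger(__name__)


def matrix_with_fallback(points, primary, fallback):
    """Return (matrix, source), where source is "primary" or "fallback".

    primary  -- callable taking `points`, e.g. a wrapper around a provider API
    fallback -- callable taking `points`, e.g. a Haversine-based computation
    """
    try:
        return primary(points), "primary"
    except Exception as exc:  # broad on purpose: any provider failure triggers the fallback
        logger.warning("Primary matrix source failed (%s); using fallback", exc)
        return fallback(points), "fallback"
```

In practice you would probably also want a request timeout on the primary call, so that a slow provider fails fast instead of stalling your planning run.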

Financial: Licensing, frequency, limits, etc.

Now that we have a better understanding of operational and technical requirements, let’s talk about the financial considerations. If you’re working with a third-party service, you will probably need to pay for it. Often, these services have different tiers. By thinking about the scale of your operation, you should get a feeling for the best-fitting tier, as well as how the limits of these tiers match your operations in the short, medium, and long term.

How much does it cost to license a third-party service? 

Costs for map and traffic services can vary widely depending on hosting, API call frequency, and fidelity. For example, self-hosting an OSRM solution can cost a few hundred dollars a month whereas third-party providers can be significantly more expensive, especially if you want to include traffic data in addition to map data. It’s important to carefully review and evaluate costs to fully understand your investment in a service. You should also think about any future growth: How will the costs of the selected solution scale with your business?

What are the tiers and their limits of a third-party service? 

Different companies will tier their offerings differently. Limits to be aware of might include the number of API calls in a certain time frame, the size of the matrix you can request with a single call, and which vehicle profiles are available.

What is the best fitting tier? How much room is there before tier limits are exceeded? What happens if limits are temporarily exceeded? 

Selecting the right tier for you can be difficult, because of all the limits mentioned before. Some providers have hard caps on these limits and won’t service your requests if you are over them. So if you’re already close to the upper limits of a specific tier, does it make sense to select this tier or the next one, assuming your business grows? 
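A quick back-of-the-envelope calculation can show how close you are to a tier limit. The sketch below assumes the provider meters usage by matrix elements and that every run requests a full square matrix; both are assumptions, since some providers count API calls or bill per route instead. The example numbers are made up.

```python
def monthly_elements(locations_per_run, runs_per_day, days_per_month=30):
    """Matrix elements requested per month, assuming a full square matrix per run."""
    return locations_per_run ** 2 * runs_per_day * days_per_month


def headroom(usage, limit):
    """Fraction of the tier limit still unused (negative means over the limit)."""
    return 1.0 - usage / limit
```

For example, 100 locations per run at 10 runs a day comes to 3,000,000 elements a month. Against a hypothetical 5,000,000-element tier, that leaves 40% headroom. Because usage grows with the square of the number of locations, moving to around 130 locations per run would already exceed that tier.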

Do these limits match the model and operational requirements? 

Your model might require a bike profile to route your cargo bikes, but the service only offers truck or car profiles. What now? Make sure that your operational requirements match what your service provider can offer. Otherwise, you need to find a way to make it work (e.g., use a scale factor to approximate cargo bike travel times from truck travel times). Maybe you need to consider specific truck dimensions, or you have special requirements because you’re transporting hazardous materials. If this isn’t considered by your third-party service, you will need to invest some additional work before you can use the data for your operations.

Conclusion

When you’re thinking about how to incorporate map and traffic data into your decision model, there are several angles and questions to consider. The list of questions we showed in this post is by no means exhaustive and there will be many considerations unique to your business. We hope this gives you a good starting point for your internal conversations around this topic. If there are additional considerations we should add to this post or if you’d like to chat with us further on this topic, comment on our community forum or send us a message.  

In a future post, we will walk through an example comparing the different approaches. Stay tuned!
