Following on from my previous here, I wanted to present another use of linear optimisations: **grouping points on a plane into closest pairs**.

Imagine we have 4 points and we would like to group them into (unique) pairs such that the sum of distances (given the pairs) is minimised. If A is closest to B, this does necessarily mean that B is closest to A (perhaps B is next-door to C); however even if B is closest to C and C is closest to B it may still not be optimal to group them. For example:

Imagine AB = 10 miles, BC = 5 miles, CD = 10 miles. If we group BC then we have to group AD, thus our total distance is 5 + (10+10+5) = 30 miles.

However, if we group AB and CD then we have 10 + 10 miles = 20 miles. So the problem is not immediately obvious.

Let’s go through an example of grouping just 10 points:

I import my data into a list called “*shops*” to hold the co-ordinates and generate an ID which is equal to the row-number.

import csv from math import radians, cos, sin, asin, sqrt from pulp import * ################# # [A] Read-in list ################# shops = [] new_id = 1 with open('points_to_pair.csv') as f: reader = csv.reader(f) headers = next(f) for x in reader: # ID equal to row in CSV, lat, lon shops.append([new_id,x[0],x[1]]) new_id += 1 f.close() shops = shops[:10] print("We have %d points" % len(shops))

Since our distances are symmetric (A to B = B to A) we need to only calculate the lower triangle of the distance matrix (using the *haversine* formula) like so:

This means if we have n stores we will have n(n-1)*0.5 connections to consider, with 10 stores this means 45. To put it another way we want to minimise the sum of all of these 45 variables:

Such that each store is only connected to one (other) store:

And that our (45) variables are binary (0 = no connection or 1 = connect). The result of the optimisation should give us 5 variables (since we have 10 stores) equal to 1. Hence, we create a few helper functions to assist with generating the objective and constraints (I create generator functions using yield to avoid loading all the crosses into RAM):

def haversine(origin, destination): """ Find distance between a pair of lat/lng coordinates """ # convert decimal degrees to radians lat1, lon1, lat2, lon2 = map(radians, [float(origin[0]), float(origin[1]), float(destination[0]), float(destination[1])]) dlon = lon2 - lon1 dlat = lat2 - lat1 a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2 c = 2 * asin(sqrt(a)) r = 3959 # Radius of earth in miles # Miles rounded to 3dp return round((c * r),3) def store_connect(shops): """ Cross stores and return connection e.g. "v1-2" by ID """ for i in range(len(shops)-1): for j in range(i+1,len(shops)): yield 'v'+str(shops[i][0])+'-'+str(shops[j][0]) def distances(shops): """ Return distance in miles for crosses """ for i in range(len(shops)-1): for j in range(i+1,len(shops)): yield haversine([shops[i][1],shops[i][2]], [shops[j][1],shops[j][2]]) def all_constraints(shops): """ Return constraint row """ for a in shops: cons = [] for b in shops: if a[0]<b[0]: cons.append("v"+str(a[0])+"-"+str(b[0])) elif a[0]>b[0]: cons.append("v"+str(b[0])+"-"+str(a[0])) yield cons

Coupled with a function to help us visualise the end-result:

""" Helper function to plot map """ def plot_points(all_stores, connections, name='closest_point.html'): """ Helper function: Plot the network on an easy-to-read google-map """ map = [] for x in all_stores: map.append("<marker name = \"%s\" lat = \"%.5f\" lng = \"%.5f\" col = \"green\"/>" % (x[0],float(x[1]),float(x[2]))) for x in connections: map.append("<line latfrom = \"%.5f\" longfrom = \"%.5f\" latto = \"%.5f\" longto = \"%.5f\"/>" % (float(x[0]),float(x[1]),float(x[2]),float(x[3]))) # Map part 1 htmltext = """<!DOCTYPE html > <style type="text/css"> html, body { height: 100%; width: 100%; padding: 0; margin: 0; } .labels { color: black; background-color: white; font-family: "Lucida Grande", "Arial", sans-serif; font-size: 10px; text-align: center; width: 45px; border: 0 solid black; white-space: nowrap; } </style> <html> <head> <meta name="viewport" content="initial-scale=1.0, user-scalable=no" /> <meta http-equiv="content-type" content="text/html; charset=UTF-8"/> <title>Network</title> <xml id="myxml">""" # Map part 2 for x in map: htmltext += x + "\n" # Map part 3 htmltext += """ </xml> <script type="text/javascript" src="https://maps.googleapis.com/maps/api/js?&libraries=geometry"></script> <script type="text/javascript" src="http://google-maps-utility-library-v3.googlecode.com/svn/trunk/markerwithlabel/src/markerwithlabel.js"></script> <script> var XML = document.getElementById("myxml"); if(XML.documentElement == null) XML.documentElement = XML.firstChild; var MARKERS = XML.getElementsByTagName("marker"); var LINES = XML.getElementsByTagName("line"); function initialize() { var mapOptions = { center: new google.maps.LatLng(51.47010,-0.10617), zoom: 12, styles: [ {"featureType":"administrative","stylers":[{ "saturation":-80},{"visibility":"off"}]}, {"featureType":"landscape.natural","elementType":"geometry","stylers":[{"color":"#d0e3b4"}]}, {"featureType":"landscape.natural.terrain","elementType":"geometry","stylers":[{"visibility":"off"}]}, {"featureType":"poi","elementType":"labels","stylers":[{"visibility":"off"}]}, {"featureType":"poi.business","elementType":"all","stylers":[{"visibility":"off"}]}, {"featureType":"poi.medical","elementType":"geometry","stylers":[{"color":"#fbd3da"}]}, {"featureType":"poi.park","elementType":"geometry","stylers":[{"color":"#bde6ab"}]}, {"featureType":"road","elementType":"geometry.stroke","stylers":[{"visibility":"off"}]}, {"featureType":"road","elementType":"labels","stylers":[{"visibility":"off"}]}, {"featureType":"road.highway","elementType":"geometry.fill","stylers":[{"color":"#ffe15f"}]}, {"featureType":"road.highway","elementType":"geometry.stroke","stylers":[{"color":"#efd151"}]}, {"featureType":"road.arterial","elementType":"geometry.fill","stylers":[{"color":"#ffffff"}]}, {"featureType":"road.local","elementType":"geometry.fill","stylers":[{"color":"black"}]}, {"featureType":"transit","stylers":[{"visibility":"off"}]}, {"featureType":"water","elementType":"geometry","stylers":[{"color":"#a2daf2"}]} ] }; var map = new google.maps.Map(document.getElementById('map'), mapOptions); var bounds = new google.maps.LatLngBounds(); for (var i = 0; i < MARKERS.length; i++) { var name = MARKERS[i].getAttribute("name"); var point_i = new google.maps.LatLng( parseFloat(MARKERS[i].getAttribute("lat")), parseFloat(MARKERS[i].getAttribute("lng"))); var col = MARKERS[i].getAttribute("col") if (col == "green") { var ic = "https://storage.googleapis.com/support-kms-prod/SNP_2752129_en_v0" } else { var ic = "https://storage.googleapis.com/support-kms-prod/SNP_2752125_en_v0" } var marker = new MarkerWithLabel({ position: point_i, map: map, icon: ic, //labelContent: name, labelAnchor: new google.maps.Point(22, 0), labelClass: "labels", // the CSS class for the label labelStyle: {opacity: 0.75}, }) bounds.extend(point_i); }; for (var i = 0; i < LINES.length; i++) { var point_a = new google.maps.LatLng( parseFloat(LINES[i].getAttribute("latfrom")), parseFloat(LINES[i].getAttribute("longfrom"))); var point_b = new google.maps.LatLng( parseFloat(LINES[i].getAttribute("latto")), parseFloat(LINES[i].getAttribute("longto"))); var flightPlanCoordinates = [ point_a, point_b ]; var flightPath = new google.maps.Polyline({ path: flightPlanCoordinates, geodesic: true, strokeColor: 'black', strokeOpacity: 1, strokeWeight: 1 }); flightPath.setMap(map); }; } google.maps.event.addDomListener(window, 'load', initialize); </script> </head> <body> <center> <div id="map" style="width:95%; height:1200px;"></div> <center> </body> </html> """ with open(name, 'w') as f: f.write(htmltext) f.close()

We can now set-up our optimisation

################# # [B] Optimisation ################# prob = LpProblem("Minimise distance",LpMinimize) # Variables x = [] xnames = [] for one in store_connect(shops): xnames.append(one) x.append(LpVariable(one,0,1,LpInteger)) print("Created %d variables" % len(x)) # Objective prob += pulp.lpSum([i[1]*x[i[0]] for i in enumerate(distances(shops))]) print("Created objective") # Constraints counter = 0 for constraint_rw in all_constraints(shops): prob += pulp.lpSum(x[xnames.index(one_cn)] for one_cn in constraint_rw) == 1 counter += 1 if counter % 10 == 0: print("Added %d contraints out of %d" % (counter,len(shops))) # Save "LP" file for other solvers prob.writeLP("lp_for_cplex.lp") """ Using SCIP: read xxxxx.lp optimize write solution xxxx.sol Using CPLEX: display solution variables - """

Solving this example is fairly easy:

################# # [C] Solve ################# #Using PULP - if not too complicated ... otherwise CPLEX: prob.solve() # The status of the solution is printed to the screen print("Status:", LpStatus[prob.status]) # Optimum variable values sol = [] for v in prob.variables(): if v.varValue: sol.append(v.name) print("Total minimised distance: ", value(prob.objective)) shp_names = [row[0] for row in shops] edges = [] for x in sol: v_list = (x)[1:].split("_") #remove the first letter 'v' and split conn = [it for sub in [shops[shp_names.index(float(v_list[0]))][1:3], shops[shp_names.index(float(v_list[1]))][1:3]] for it in sub] edges.append(conn) # Print diagram plot_points(shops,edges)

We get a minimised distance of 190.6 Miles – our visualisation helper function shows the matches:

If we re-run with 50 points -> this ups our variables to 1225 and constraints to 50, our objective function is minimised with a value of 506.3 Miles. Here is a zoom on an interesting part of the new map:

We can now see that the matching is not as trivial as in the previous example! Some very close points are not matched (similar to the illustration at the very top of this page).

## Bigger: 1000 Points

With 1000 points we end up with nearly half a million variables. This requires a much more powerful solver (e.g CPLEX). I use the generated “.lp” file and submit it to CPLEX on the NEOS servers. I also submit a text-file to execute at the end to show me the variables which are equal to 1:

display solution variables –

It takes the solver around 660 seconds to solve the whole thing from start to finish (using 32 threads!). There are 55 solutions (since there is nothing to guarantee uniqueness) which minimise the objective function at a value of 2,273.5573 Miles

Solution pool: 55 solutions saved.

MIP – Integer optimal, tolerance (0.0001/1e-06): Objective = 2.2735572700e+03 Current MIP best bound = 2.2733393433e+03 (gap = 0.217927, 0.01%) Solution time = 660.74 sec. Iterations = 83345 Nodes = 18700 (6) Deterministic time = 279118.77 ticks (422.43 ticks/sec)

I save the 500 variable names to a CSV and load it into Python like so;

################# # [D] Solve - CPLEX SOLVER ################# sol = [] with open('cplex_out.csv') as f: reader = csv.reader(f) for x in reader: sol.append(x) f.close() shp_names = [row[0] for row in shops] edges = [] for x in sol: v_list = (x[0])[1:].split("_") #remove the first letter 'v' and split conn = [it for sub in [shops[shp_names.index(v_list[0])][1:3],shops[shp_names.index(v_list[1])][1:3]] for it in sub] edges.append(conn) # Print diagram plot_points(shops[:1000],edges)

Below is the diagram of all the connections:

Finally, a zoom on a sub-section:

Perhaps a way to push this past 1000 points (e.g. 10,000 points) is to find a way of dividing the points into 100 groups of 100 (provided that at the optimal solution the groups do not cross) … somehow: