The nearest neighbour functionality of a K-D tree naturally lends itself to the task of reverse-geocoding. Consider the blue point I randomly dropped into the Thames (51.500478, -0.122118) – we want to find the closest postcode to this point (perhaps within a maximum cut-off of 1 Mi)…

rev_geocoder

In a similar spirit to this: we use Numpy to load all the postcodes in the UK into a scipy.spatial.cKDTree and then query it with a list of points we are interested in. We then extracted the postcode for the closest match and save it as a CSV:

"""
Nearest Neighbour Search / Reverse Geocoding

Get nearest postcode for each point in list:
points(latitude,longitudes)
"""

import numpy as np
from pyproj import Proj, transform
from scipy.spatial import cKDTree

# Postcodes location (full 1.7 mill live UK postcodes)
# https://geoportal.statistics.gov.uk/geoportal/catalog/content/filelist.page?&search=onspd
postcodes_csv = '../_postcodes.csv'
# Points to get nearest postcode for (lat|lon)
in_csv = '.../nn_in.csv'
# Output
out_csv = '.../nn_out.csv'

# Projection: http://epsg.io/27700-5339 (accuracy: 1.0m)
uk = '+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 \
+x_0=400000 +y_0=-100000 +ellps=airy \
+towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs'

def proj_arr(points,proj_to):
    """
    Project geographic co-ordinates to get cartesian x,y
    points = [lat,lon]
    Transform(origin|destination|lon|lat)
    """
    inproj = Proj(init='epsg:4326')
    outproj = Proj(proj_to)
    func = lambda x: transform(inproj,outproj,x[1],x[0])
    return np.array(list(map(func, points)))


# Load postcodes
postcodes = np.genfromtxt(postcodes_csv,
                          delimiter=',',
                          names=True,
                          usecols=(0,1,2),
                          dtype=('U7',float,float))
proj_postcodes = proj_arr(postcodes[['lat','lon']],uk)
print("We have %d postcodes" % len(proj_postcodes))

# Load points to query:
points = np.genfromtxt(in_csv,
                       delimiter=',',
                       names=True,
                       usecols=(0,1),
                       dtype=(float,float))
proj_points = proj_arr(points[['lat','lon']],uk)
print("We have %d points to query" % len(proj_points))

# Create KD tree for nearest-neighbour
%time tree = cKDTree(proj_postcodes)
%time out_postcodes = tree.query(proj_points)

# Save the postcodes
np.savetxt(out_csv, postcodes['pcd'][out_postcodes[1]],fmt=('%s'))

The closest post-code returned for my test-point was SW1A2JH – I thought that SW1A0AA would be closer, fortunately it was .02 Miles further away. This slightly bastardises the notion that a postcode is a polygon and not a point (or a polygon centroid), which means that although Big Ben is closer, its postcode ‘SW10AA’ covers a large area and the ‘centre’ returned is actually further away – but the idea still stands.

To check some timings I ran this on a list of 10,000 co-ordinates and it took seconds:

time

2.54 seconds to build the KD Tree and 24ms to query it with 10,000 points