I wanted to use the geolocation data available on Flickr, Instagram, and Panoramio to find the most popular photographs for an area. In most cases an area is defined by a radius, for example: show me all photos within 200 metres of this latitude/longitude, and then return the most popular. I would then plot the returned photograph on a map, cropped to that bounding box.
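The radius query itself is just a great-circle distance check; a minimal sketch of the idea (a hypothetical helper, not part of the scripts below):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within_radius(photo_latlon, centre_latlon, radius_km):
    """True if a geotagged photo falls inside the search circle."""
    return haversine_km(*photo_latlon, *centre_latlon) <= radius_km
```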

The main issues I had were:

  1. Whether to go for a popular photograph (by views? by favourites? by shares?) or a ‘representative’ photograph (would I cluster by features? by colour buckets?). I decided to go by view count.
  2. Source of the photographs: Instagram had too many selfies (which were very popular – way to go!) and Flickr had too many glamour/nude photos. I realised what I wanted were cityscapes/landscapes and thus settled on Panoramio.
  3. It turns out that – in general – the probability of a photo being geotagged was negatively correlated with my rating of it as a photograph. The best photographs just weren’t geotagged (I suspect because the photographer would invest some time into post-processing them?)
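With view count chosen as the popularity metric, ranking is just a descending sort; a quick sketch with made-up data:

```python
# Each record pairs a photo URL with its view count (hypothetical values)
photos = [('a.jpg', 120.0), ('b.jpg', 950.0), ('c.jpg', 430.0)]
# Sort descending by views, so the first photo becomes "1.jpg" on disk
most_popular = sorted(photos, key=lambda p: p[1], reverse=True)
# most_popular[0] is ('b.jpg', 950.0)
```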

For example here is central London (with the river):

Panoramio of City

Here is all of Greater London (with the outline):

Panoramio of London

Here is all of Greater London using Flickr rather than Panoramio:

Flickr of London

For the full Python and R code see the end of the post. My methodology was as follows:

  1. Use the ‘createcoordinates’ function to tessellate a bounding box with circles (the search areas), returning a list of circle centres as coordinates
  2. Loop through the coordinates and download all the photos contained within each circle (get_pynoramio_photos). Usually it was not possible to both (i) get the most popular photos and (ii) get photos within a bounding circle, so I would also download the ‘views’ for each photo and then sort my list by view count
  3. Run some operations on the photos (prep_photos_for_map) to get them ready for the R-script (I felt it was easier to do the data-mining in Python and the graphing in R)
  4. The operations were to create a list of all the most-popular photos (“1.jpg” in each folder, since they were sorted), crop each rectangular photograph to a circle (circ_bounding_box), and embed the bottom, top, left, right coordinates of the circle in the file name – for example: “51.24800_51.28397_0.00535_0.06283.png”. All the R code then needed to do was position the (circular) photograph in that bounding box. This was run in parallel as it was somewhat processor-intensive (in my data-mining stage I also opted for the largest photo size possible, which was probably a mistake).
  5. I was able to create the visualisation in R using ggplot2 and the annotation_custom command i.e. plot an image on a latitude/longitude graph.
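The file-name convention in step 4 is easy to sanity-check in isolation; a minimal sketch of the encode/decode round trip (helper names are mine, not from the scripts):

```python
def encode_bbox(lat_min, lat_max, lon_min, lon_max):
    """Pack a circle's bounding box into a file name (5 d.p. is ~1.1 m)."""
    return '%.5f_%.5f_%.5f_%.5f.png' % (lat_min, lat_max, lon_min, lon_max)

def decode_bbox(name):
    """Recover (lat_min, lat_max, lon_min, lon_max) from the file name."""
    return tuple(float(v) for v in name[:-len('.png')].split('_'))

name = encode_bbox(51.248, 51.28397, 0.00535, 0.06283)
# name == '51.24800_51.28397_0.00535_0.06283.png'
box = decode_bbox(name)
```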

Step 1 – Python Data-Mining Script:

import flickrapi
import pynoramio
import requests
import shutil
import math
from bs4 import BeautifulSoup
from PIL import Image, ImageOps, ImageDraw
from multiprocessing import Pool
import os
import string
import time

api_key = '...'
secret = '...'
flickr = flickrapi.FlickrAPI(api_key, secret)
TRY_ATTEMPT_MAX = 10
master_list = []


class coordinates_box(object):
    """
    Initialise a coordinates_box class which will hold the produced coordinates and
    output a html map of the search area
    """

    def __init__(self, radius_km):
        """ Initialise with a radius in kilometres """
        self.coordset = []
        self.radius_km = radius_km

    def createcoordinates(self,
                          southwest_lat,
                          southwest_lng,
                          northeast_lat,
                          northeast_lng,
                          log_path=''):
        """
        Based on the input radius this tesselates a 2D space with circles in
        a hexagonal structure
        """
        earth_radius_km = 6371
        lat_start = math.radians(southwest_lat)
        lon_start = math.radians(southwest_lng)
        lat = lat_start
        lon = lon_start
        lat_level = 1
        while True:
            if (math.degrees(lat) <= northeast_lat) and (math.degrees(lon) <= northeast_lng):
                self.coordset.append([math.degrees(lat), math.degrees(lon)])
            parallel_radius = earth_radius_km * math.cos(lat)
            if math.degrees(lat) > northeast_lat:
                break
            elif math.degrees(lon) > northeast_lng:
                lat_level += 1
                lat += (self.radius_km / earth_radius_km) + (self.radius_km / earth_radius_km) * math.sin(
                    math.radians(30))
                if lat_level % 2 != 0:
                    lon = lon_start
                else:
                    lon = lon_start + (self.radius_km / parallel_radius) * math.cos(math.radians(30))
            else:
                lon += 2 * (self.radius_km / parallel_radius) * math.cos(math.radians(30))

        print('Coordinates-set contains %d coordinates' % len(self.coordset))
        # LOG MAP
        if log_path:
            self.htmlmaplog(log_path)

    def htmlmaplog(self,
                   map_save_path):
        """
        Outputs a HTML map
        """
        htmltext = """
        <!DOCTYPE html >
          <style type="text/css">
                    html, body {
                        height: 100%;
                        width: 100%;
                        padding: 0px;
                        margin: 0px;
                    }
        </style>
        <head>
        <meta name="viewport" content="initial-scale=1.0, user-scalable=no" />
        <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
        <title>Boundary Partitioning</title>
        <xml id="myxml">
        <markers>
        """
        # Content
        for coord in self.coordset:
            rowcord = '<marker name = "' + "photos" + '" lat = "' + \
                      '%.5f' % coord[0] + '" lng = "' + '%.5f' % coord[1] + '"/>\n'
            htmltext += rowcord
        # Bottom
        htmltext += """
        </markers>
        </xml>
        <script type="text/javascript" src="https://maps.googleapis.com/maps/api/js?&sensor=false&libraries=geometry"></script>
        <script type="text/javascript">
        var XML = document.getElementById("myxml");
        if(XML.documentElement == null)
        XML.documentElement = XML.firstChild;
        var MARKERS = XML.getElementsByTagName("marker");
        """
        htmltext += "var radius_km = " + str(self.radius_km) + ";"
        htmltext += """
        var map;
        var geocoder = new google.maps.Geocoder();
        var counter = 0
        function load() {
            // Initialize around City, London
            var my_lat = 51.518175;
            var my_lng = -0.129064;
            var mapOptions = {
                    center: new google.maps.LatLng(my_lat, my_lng),
                    zoom: 12
            };
            map = new google.maps.Map(document.getElementById('map'),
                mapOptions);
            var bounds = new google.maps.LatLngBounds();
            for (var i = 0; i < MARKERS.length; i++) {
                var name = MARKERS[i].getAttribute("name");
                var point_i = new google.maps.LatLng(
                    parseFloat(MARKERS[i].getAttribute("lat")),
                    parseFloat(MARKERS[i].getAttribute("lng")));
                var icon = {icon: 'http://labs.google.com/ridefinder/images/mm_20_gray.png'};
                var col = '#0033CC';
                var draw_circle = new google.maps.Circle({
                    center: point_i,
                    radius: radius_km*1000,
                    strokeColor: col,
                    strokeOpacity: 0.15,
                    strokeWeight: 2,
                    fillColor: col,
                    fillOpacity: 0.15,
                    map: map
                });
                var marker = new google.maps.Marker({
                    position: point_i,
                    map: map,
                    icon: 'https://maps.gstatic.com/intl/en_us/mapfiles/markers2/measle_blue.png'
                })
                bounds.extend(point_i);
            };
            map.fitBounds(bounds);
        }
        </script>
        </head>
        <body onload="load()">
        <center>
        <div style="padding-top: 20px; padding-bottom: 20px;">
        <div id="map" style="width:90%; height:1024px;"></div>
        </center>
        </body>
        </html>
        """
        # The with-block closes the file automatically
        with open(map_save_path, 'w') as f:
            f.write(htmltext)


def get_flickr_photos(lat='51.515932',
                      lon='-0.082484',
                      radius='0.5',
                      path='H:/flickrapi',
                      limit=50):
    """
    Mine photos from flickr API. Specify extra paramter (views) to then sort the photos in a list, where
    views are descending
    """
    try_count = 0
    while try_count < TRY_ATTEMPT_MAX:
        try:
            photos = flickr.photos_search(lat=lat, lon=lon, radius=radius, extras='views')
            print("Found %d photos" % len(photos[0]))
            break
        except Exception as err:
            time.sleep(5)
            print('Waiting 5 seconds ... retry as error submitting query: %s' % err)
            try_count += 1
    if try_count == TRY_ATTEMPT_MAX:
        print("Could not find any photos")
        return

    photos_to_download = []
    counter = 1
    for photo in photos[0]:
        try:
            photoSizes = flickr.photos_getSizes(photo_id=photo.attrib['id'])
            # Largest size available: index -1
            url = photoSizes[0][-1].attrib['source']
            # De-duplicate (arbitrarily)
            if url not in master_list:
                views = float(photo.get('views'))
                print('%d views: %d' % (counter, views))
                print([url, views])
                photos_to_download.append([url, views])
                counter += 1
            else:
                print("Photos already exists - %s" % url)
        except Exception as err:
            time.sleep(5)
            print('Waiting 5 seconds ... retry as error retrieving url: %s' % err)
            print('Skipping Photo')

    # Sort photos by views
    print('Sorting photos by views')
    sorted_photos = sorted(photos_to_download, key=lambda x: x[1], reverse=True)
    # Technically append only if downloaded rather than just 'viewed'
    for x in sorted_photos:
        master_list.append(x[0])

    # Find most popular 50 photos
    counter = 1
    for photo in sorted_photos:
        if counter <= limit:
            url = photo[0]
            time.sleep(1)
            try_count = 0
            while try_count < TRY_ATTEMPT_MAX:
                try:
                    response = requests.get(url, stream=True)
                    print("Downloading no. %d - %s - %d views" % (counter, url, photo[1]))
                    with open(path + '/%d.jpg' % counter, 'wb') as out_file:
                        shutil.copyfileobj(response.raw, out_file)
                    counter += 1
                    break
                except Exception as err:
                    time.sleep(5)
                    print('Waiting 5 seconds ... retry as error downloading photo: %s' % err)
                    try_count += 1
            if try_count == TRY_ATTEMPT_MAX:
                print("Could not download photo")


def get_instagram_photos(lat='',
                         lon='',
                         rad_m=2000,
                         path='H:/instapi',
                         limit=50,
                         access_token='...'):
    """
    Mine photos from Instagram API
    """
    try_count = 0
    while try_count < TRY_ATTEMPT_MAX:
        try:
            url = "https://api.instagram.com/v1/media/search?lat=%s&lng=%s&distance=%s&access_token=%s" % (
                lat, lon, rad_m, access_token)
            print(url)
            rep = requests.get(url)
            reps = rep.json()
            break
        except Exception as err:
            time.sleep(5)
            print('Waiting 5 seconds ... retry as error submitting query: %s' % err)
            try_count += 1
    if try_count == TRY_ATTEMPT_MAX:
        print("Could not find any photos")
        return

    photos_to_download = []
    counter = 1
    for i in range(len(reps['data'])):
        try:
            url = reps['data'][i]['images']['thumbnail']['url']
            likes = reps['data'][i]['likes']['count']
            # High-quality image:
            url = url.replace('s150x150/', '')
            if url not in master_list:
                print('%d likes: %d' % (counter, likes))
                print([url, likes])
                photos_to_download.append([url, likes])
                counter += 1
            else:
                print("Photos already exists - %s" % url)
        except Exception as err:
            time.sleep(5)
            print('Waiting 5 seconds ... retry as error retrieving url: %s' % err)
            print('Skipping Photo')

    # Sort photos by likes
    print('Sorting photos by likes')
    sorted_photos = sorted(photos_to_download, key=lambda x: x[1], reverse=True)
    for x in sorted_photos:
        master_list.append(x[0])

    # Download
    counter = 1
    for photo in sorted_photos:
        if counter <= limit:
            url = photo[0]
            time.sleep(1)
            try_count = 0
            while try_count < TRY_ATTEMPT_MAX:
                try:
                    response = requests.get(url, stream=True)
                    print("Downloading no. %d - %s - %d views" % (counter, url, photo[1]))
                    with open(path + '/%d.jpg' % counter, 'wb') as out_file:
                        shutil.copyfileobj(response.raw, out_file)
                    counter += 1
                    break
                except Exception as err:
                    time.sleep(5)
                    print('Waiting 5 seconds ... retry as error downloading photo: %s' % err)
                    try_count += 1
            if try_count == TRY_ATTEMPT_MAX:
                print("Could not download photo")


def get_pynoramio_photos(lat='',
                         lon='',
                         radius='',
                         path='',
                         limit=50):
    """
    Mine photos from pynoramia API
    """
    # Get bounding box (slightly different to the flickr and instagram APIs)
    ymax = float(select_destination([lat, lon], 0, radius)[0])
    xmax = float(select_destination([lat, lon], 90, radius)[1])
    ymin = float(select_destination([lat, lon], 180, radius)[0])
    xmin = float(select_destination([lat, lon], 270, radius)[1])
    print(ymin, ymax, xmin, xmax)
    try_count = 0
    while try_count < TRY_ATTEMPT_MAX:
        try:
            # Initiate
            pyn = pynoramio.Pynoramio()
            # Get photos
            photos = pyn.get_from_area(lat_min=ymin, lat_max=ymax, lon_min=xmin, lon_max=xmax)
            print(photos)
            break
        except Exception as err:
            print('Waiting 5 seconds ... retry as error submitting query: %s' % err)
            time.sleep(5)
            try_count += 1
    if try_count == TRY_ATTEMPT_MAX:
        print("Could not find any photos")
        return

    photos_to_download = []
    counter = 1
    for photo in photos['photos']:
        # Small hack to get the views -> is there a better way?
        try:
            photog_id = str(photo['owner_url']).split("/")[-1]
            photo_id = photo['photo_id']
            view_url = 'http://www.panoramio.com/photo_counter_snippet?photographer_id=%s&photo_id=%s&page=photo' % (
                photog_id, photo_id)
            soup = BeautifulSoup(requests.get(view_url).content, "lxml")
            views = float(soup.find('a').text.split()[0])
        except Exception as err:
            # 'url' is not defined at this point, so report the error itself
            print('Something went wrong getting views: %s' % err)
            views = 0.0

        try:
            # url = photo['photo_file_url']
            url = "http://static.panoramio.com/photos/original/%s.jpg" % photo_id
            title = photo['photo_title']
            # De-duplicate (arbitrarily)
            if url not in master_list:
                print('%s %d views: %d' % (title, counter, views))
                print([url, views])
                photos_to_download.append([url, views])
                counter += 1
            else:
                print("Photos already exists - %s" % url)
        except Exception as err:
            print('Waiting 5 seconds ... retry as error retrieving url: %s' % err)
            time.sleep(5)
            print('Skipping Photo')

    # Sort photos by views
    print('Sorting photos by views')
    sorted_photos = sorted(photos_to_download, key=lambda x: x[1], reverse=True)

    # Find most popular 50 photos
    counter = 1
    for photo in sorted_photos:
        if counter <= limit:
            url = photo[0]
            # time.sleep(1)
            try_count = 0
            while try_count < TRY_ATTEMPT_MAX:
                try:
                    response = requests.get(url, stream=True)
                    print("Downloading no. %d - %s - %d views" % (counter, url, photo[1]))
                    with open(path + '/%d.jpg' % counter, 'wb') as out_file:
                        shutil.copyfileobj(response.raw, out_file)
                    counter += 1
                    master_list.append(url)
                    break
                except Exception as err:
                    time.sleep(5)
                    print('Waiting 5 seconds ... retry as error downloading photo: %s' % err)
                    try_count += 1
            if try_count == TRY_ATTEMPT_MAX:
                print("Could not download photo")


def select_destination(origin_geocode='',
                       angle='',
                       radius_km=''):
    """
    Create latitude/longitude points based on distance from given point
    Supply radius im km -> Gets converted to miles
    Angle is in decimal degrees
    """
    radius = radius_km * 0.621371
    r = 3959  # Radius of the Earth in miles
    bearing = math.radians(angle)  # Bearing in radians converted from angle in degrees
    lat1 = math.radians(float(origin_geocode[0]))
    lng1 = math.radians(float(origin_geocode[1]))
    lat2 = math.asin(math.sin(lat1) * math.cos(radius / r) +
                     math.cos(lat1) * math.sin(radius / r) * math.cos(bearing))
    lng2 = lng1 + math.atan2(math.sin(bearing) * math.sin(radius / r) *
                             math.cos(lat1), math.cos(radius / r) - math.sin(lat1) * math.sin(lat2))
    lat2 = math.degrees(lat2)
    lng2 = math.degrees(lng2)
    # Check distances (miles)
    # print(haversine(origin_geocode,['%.6f' % lat2, '%.6f' % lng2]))
    # 5 Decimal Places: 1.11metres
    return ['%.5f' % lat2, '%.5f' % lng2]


def circ_bounding_box(x):
    """
    Parallel:
    x = zip(imlist, input_coordinates, radius_km, destination_main_path)
    """
    print(x)
    # select_destination returns [latitude, longitude]; the file name is
    # lat_min, lat_max, lon_min, lon_max - matching the R script's parsing
    lat_max = select_destination(x[1], 0, x[2])[0]
    lon_max = select_destination(x[1], 90, x[2])[1]
    lat_min = select_destination(x[1], 180, x[2])[0]
    lon_min = select_destination(x[1], 270, x[2])[1]
    image_name = "%s_%s_%s_%s" % (lat_min, lat_max, lon_min, lon_max)

    # Crop image top circle
    im = Image.open(x[0])
    print(im.size)  # Original
    size = (min(im.size), min(im.size))
    mask = Image.new('L', size, 0)
    draw = ImageDraw.Draw(mask)
    draw.ellipse((0, 0) + size, fill=255)

    output = ImageOps.fit(im, mask.size, centering=(0, 0))
    output.putalpha(mask)

    # Reduce size to 900 pixels (will be 900 x 900)
    basewidth = 900
    wpercent = (basewidth / float(output.size[0]))
    hsize = int((float(output.size[1]) * float(wpercent)))
    output = output.resize((basewidth, hsize), Image.ANTIALIAS)
    output.save(x[3] + "/" + image_name + ".png")


def prep_photos_for_map(photos_main_path='',
                        destination_main_path='',
                        radius_km=1):
    """
    Recursively scans the main path for photos labelled "1.jpg"
    Crops the photo to a circle
    Renames file to bounding-box (xmin, xmax, ymin, ymax) for plotting in R
    """
    if not os.path.exists(destination_main_path):
        os.makedirs(destination_main_path)
    else:
        print("Folder already exists")
        # raise

    # In this version use "1.jpg" which is the picture with the most views
    # Recursive (includes sub-folders)
    imlist = [os.path.join(root, f) for root, subdirs, files in os.walk(photos_main_path)
              for f in files
              if os.path.splitext(f)[1].lower() == '.jpg'
              and os.path.splitext(f)[0].lower() == "1"]

    print("Found %d images" % len(imlist))
    print(imlist[0])
    # Create bounding box
    # The folder name of each file encodes the circle centre ("lat_lon"),
    # so extract it from the parent directory name
    input_coordinates = [str(os.path.split((os.path.split(im)[0]))[-1]).split("_") for im in imlist]
    print(input_coordinates[0])

    # Run in parallel:
    pool = Pool()
    mp_in = zip(imlist, input_coordinates, [radius_km] * len(imlist), [destination_main_path] * len(imlist))
    pool.map(circ_bounding_box, mp_in)
    pool.close()  # This means that no more tasks will be added to the pool
    pool.join()  # This blocks the program till function is run on all the items
    print('Job Completed successfully ...')


if __name__ == '__main__':
    # 1. Initialise
    using_rad = 0.2  # km
    coord = coordinates_box(using_rad)
    coord.createcoordinates(51.492524,
                            -0.156292,
                            51.521105,
                            -0.065397,
                            log_path='H:/photo_circles.html')  # central London, 200m circles

    # 2. Loop through circles
    for coord in coord.coordset:
        use_lat = "%0.6f" % coord[0]
        use_lon = "%0.6f" % coord[1]
        # Convert latitude and longitude into valid Windows file-name
        valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
        my_loc = '%s_%s' % (use_lat, use_lon)
        my_loc_safe = ''.join(c for c in my_loc if c in valid_chars)
        print(my_loc_safe)
        big_path = 'C:/get_pynoramio_photos'
        path = big_path + '/%s/' % my_loc

        if not os.path.exists(path):
            os.makedirs(path)
            """
            Get Photos!
            Flickr and Instagram did not return the photos I wanted but pynormaio was the clear winner however
            """
            # get_flickr_photos(lat=float(use_lat), lon=float(use_lon), path=path, limit=50, radius=using_rad)  #FLICKR
            # get_instagram_photos(lat=float(use_lat), lon=float(use_lon), path=path, limit=50, rad_m=using_rad*1000)  #INSTAGRAM
            #get_pynoramio_photos(lat=float(use_lat), lon=float(use_lon), radius=using_rad, path=path, limit=50)
        else:
            print("Folder already exists - skipping")
            # Can re-run if stops half-way through
            pass

    # 3. Prepare photos for R-mapping
    prep_photos_for_map(big_path, big_path + '_out', radius_km=using_rad)
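The spacing constants in createcoordinates come from standard hexagonal packing: rows sit 1.5·r apart in the latitude direction and centres sit √3·r apart along a row, with alternate rows shifted by half a column. A standalone numeric check of those identities (not part of the script):

```python
import math

r = 0.2  # circle radius in km, as used above
# Vertical step between rows: r + r*sin(30 deg) = 1.5 * r
row_step = r + r * math.sin(math.radians(30))
# Horizontal step between centres within a row: 2 * r * cos(30 deg) = sqrt(3) * r
col_step = 2 * r * math.cos(math.radians(30))
# Alternate rows are offset by half a column step, closing the gaps
row_offset = r * math.cos(math.radians(30))
```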

Step 2 – R Data-Visualisation Script:

library(png)
library(grid)
library(ggplot2)
library(rgdal)

img_path <- 'C:/pano_vision_big_out/'
setwd(img_path)

# Initiate a blank data-frame
df <- data.frame(x=(0:0), 
                 y=(0:0))

# An input to this is for example: 51.24800_51.28397_0.00535_0.06283.png
# Create a list of the original file name and then a list of 4 (the bounding box)
# Split by the "_" character
plot_files <- lapply(list.files(img_path, pattern="*.png"),
                     function(x) list(unlist(strsplit(gsub(".png", "", x), "_")), x))

p <- ggplot(df, aes(x,y))
counter = 1
# Loop through the list of files
for (i in plot_files) {
  bb <- i[[1]]  # Bounding Box
  im_loc <- i[[2]]  # Image
  print(im_loc) 
  print(counter)
  counter <- counter + 1
  # LOAD the image
  img <- readPNG(im_loc)
  # Render the raster at the given location
  g <- try(rasterGrob(img, interpolate=TRUE, width=1, height=1))
  if ('try-error' %in% class(g)) next
  # Add to plot using "annotation_custom" 
  # Specical geometry for static annotations
  # This is why we needed the latitude/longitude in the file_name
  # For the the bounding box:
  p <- p + annotation_custom(g, xmin=as.numeric(unlist(bb)[3]),
                             xmax=as.numeric(unlist(bb)[4]),
                             ymin=as.numeric(unlist(bb)[1]),
                             ymax=as.numeric(unlist(bb)[2]))
}

# Base-map (Greater London) - load the shapefiles before adding them to the plot
london_shape_loc <- '.../london_sport'
london <- readOGR(london_shape_loc, layer="london_sport") 
london <- spTransform(london, CRS("+proj=longlat +datum=WGS84"))

# Water (clipped from OS) - can be added as another geom_polygon layer
rivers_loc <- '.../os_vector_map_merged(IK)'
rivers <- readOGR(rivers_loc, layer="TQ_TidalWater_Merge_Clip") 
rivers <- spTransform(rivers, CRS("+proj=longlat +datum=WGS84"))

# Add some background geometry (in this case an outline of London)
p <- p + geom_polygon(data=london, aes(long, lat, group=group),
                      color='white', fill=NA, size=1, alpha=1)
p <- p + ylim(51.27, 51.7) + xlim(-0.52, 0.34)
p <- p + theme(panel.background = element_rect(fill='#2C3539', colour='#2C3539')) 
p <- p + coord_equal(ratio=122/78)

# Remove axes
xquiet <- scale_x_continuous("", breaks=NULL)
yquiet <- scale_y_continuous("", breaks=NULL)
p <- p + list(xquiet, yquiet)

# Plot
p

ggsave(file='.../pano_london.png', width=20, height=10, dpi=360)