Working in lat long: great circle distance, bearing, midpoint and centroid calculations

For my work in species distribution modeling I'm mostly working with lat long coordinates so I needed some great circle functions to calculate the point-point distance, point-to-line distance and the centroid of a group of points. Starting from 2 excellent resource
and I created the following functions in Python, with some tests. As usual the full source code can be found online:

I started with importing some functions from the math module and defining a small Point class:

from math import radians, degrees, cos, sin, sqrt, atan2, asin, fabs, pi

class Point(object):
    def __init__(self,x,y):
        self.x = x
        self.y = y

Then came some very simple functions for calculating the distance (with the Haversine formula), the bearing and the midpoint. These are almost verbatim translations from the previously mentioned resources.

def distance_haversine(A, B, radius=6371000):
    """ note that the default distance is in meters """
    dLat = radians(B.y-A.y)
    dLon = radians(B.x-A.x)
    lat1 = radians(A.y)
    lat2 = radians(B.y)
    a = sin(dLat/2) * sin(dLat/2) + sin(dLon/2) * sin(dLon/2) * cos(lat1) * cos(lat2)
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    return c * radius

def bearing(A, B):
    dLon = radians(B.x-A.x)
    lat1 = radians(A.y)
    lat2 = radians(B.y)
    y = sin(dLon) * cos(lat2)
    x = cos(lat1)* sin(lat2) - sin(lat1) * cos(lat2)* cos(dLon)
    return atan2(y, x)

def bearing_degrees(A,B):
    return degrees(bearing(A,B))

def midpoint(A,B):
    ## little shortcut
    if A.x == B.x: return Point(A.x, (A.y+B.y)/2)
    if A.y == B.y: return Point((A.x+B.x)/2, A.y)
    lon1, lat1 = radians(A.x), radians(A.y)
    lon2, lat2 = radians(B.x), radians(B.y)
    dLon = lon2-lon1

    Bx = cos(lat2) * cos(dLon)
    By = cos(lat2) * sin(dLon)
    lat3 = atan2(sin(lat1)+sin(lat2), sqrt((cos(lat1)+Bx)*(cos(lat1)+Bx) + By*By))
    lon3 = lon1 + atan2(By, cos(lat1) + Bx)
    return Point(degrees(lon3),degrees(lat3))

Then came the more puzzling part where I implemented the cross track error to calculate the great circle distance from a point to a line .

def crosstrack_error(p,A,B, radius=6371000):
    """ distance (in meters) from a point to the closest point along a track """
    dAp = distance_haversine(A, p, radius=1)
    brngAp = bearing(A,p)
    brngAB = bearing(A,B)
    dXt = asin(sin(dAp)*sin(brngAp-brngAB))
    return fabs(dXt) * radius

But the results didn't always match the distances that I calculated with PostGIS so I wrote a recursive binary search function consisting of 3 parts: calculating the midpoint, calculating the distance to two end and the midpoint and continuing again with the halve that is closest to the point until the difference in distance between the point and the two ends of the line is within a certain tolerance. The disadvantage of this function is that the rounding errors in calculating the midpoint do add up and the function is not very fast. So if anyone knows what wrong with my cross track implementation or knows a better method for calculating the great circle distance between a point and a line, feel free to comment below or send me an e-mail at mail@<thisdomainhere>.com.

def point_line_distance(p, A,B, radius=6371000, tolerance=0.1): 
    # tolerance and radius in meters
    """ recursive function that halves the search space until result is within tolerance
        1) the rounding errors in midpoint do add up
        2) not very fast """
    def rec_point_line_distance(p,A,B,dA,dB):
        C = midpoint(A,B)
        dC = distance_haversine(p,C)
        if fabs(dC-dA) < tolerance or fabs(dC-dB) < tolerance:
            return dC
        elif dA < dB:
            return point_line_distance(p,A,C,dA, dC)
            return point_line_distance(p,C,B,dC, dB)
    dA = distance_haversine(p,A)
    dB = distance_haversine(p,B)
    return rec_point_line_distance(p,A,B,dA,dB)

The last function I needed was finding the centroid of a set of points in lat long coordinates. Below is an implementation from a response on First all coordinates are converted to cartesian coordinates. Then the average of the x, y and z coordinates are calculated and finally these average coordinates are converted back to longitude and latitude.
def get_centroid(points):
    sum_x,sum_y,sum_z = 0,0,0
    for p in points:
        lat = radians(p.y)
        lon = radians(p.x)
        ## convert lat lon to cartesian coordinates
        sum_x = sum_x + cos(lat) * cos(lon)
        sum_y = sum_y + cos(lat) * sin(lon)
        sum_z = sum_z + sin(lat)
    avg_x = sum_x / float(len(points))
    avg_y = sum_y / float(len(points))
    avg_z = sum_z / float(len(points))
    center_lon = atan2(avg_y,avg_x)
    hyp = sqrt(avg_x*avg_x + avg_y*avg_y) 
    center_lat = atan2(avg_z, hyp)
    return Point(degrees(center_lon), degrees(center_lat))

For the implementation and running of my tests I used the built-in unittest module in the following way.

import unittest
class Test_gc(unittest.TestCase):
    def test_distance_haversine(self):
        d = distance_haversine(Point(-10.0001, 80.0001), Point(7.935, 63.302))
        self.assertAlmostEqual(d, 1939037.0, places=0)
    def test_bearing(self):
        b = bearing(Point(-10.0001, 80.0001), Point(7.935, 63.302))
        self.assertAlmostEqual(b, 2.661709,places=6)

    def test_bearing_degrees(self):
        b = bearing_degrees(Point(-10.0001, 80.0001), Point(7.935, 63.302))
        self.assertAlmostEqual(b, 152.504694,places=6)

    def test_midpoint(self):
        m = midpoint(Point(-20.0,5.0), Point(-10.0,5.0))
        self.assertAlmostEqual(-15.0, m.x, places=6)
        m = midpoint(Point(7.0,5.0), Point(7.0,15.0))
        self.assertAlmostEqual(10.0, m.y, places=6)
    def test_crosstrack_error(self):
        p,A,B = Point(0.0, 1.0), Point(1.0, 1.0), Point(2.0, 1.0)
        d = crosstrack_error(p,A,B)
        self.assertAlmostEqual(d, distance_haversine(p,A), places=0)
        ## Apparently crosstrack_error returns incorrect results in some very specific cases (when p is completly off track)
    def test_point_line_distance(self):
        p,A,B = Point(0.0, 1.0), Point(1.0, 1.0), Point(2.0, 1.0)
        d = point_line_distance(p,A,B)
        self.assertAlmostEqual(d, distance_haversine(p,A), places=0)

        p,A,B = Point(2.0, 1.2), Point(1.0, 1.0), Point(3.0, 1.0)
        d = point_line_distance(p,A,B)
        self.assertAlmostEqual(d, distance_haversine(p,Point(2.0,1.0)), places=0)

        p,A,B = Point(0.0, 90.0), Point(-179.0, -21.0), Point(179.5, -22.0)
        d = point_line_distance(p,A,B)
        self.assertAlmostEqual(d, distance_haversine(p,A), places=0)

        p,A,B = Point(0.0, 89.0), Point(-179.0, -22.0), Point(179.5, -22.0)
        d = point_line_distance(p,A,B)
        self.assertAlmostEqual(d, distance_haversine(p,Point(0, -22)), places=0)
    def test_get_centroid(self):
        ## check crosses the north pole
        center = get_centroid([Point(0.0, 80.0), Point(180.0,80.0)])
        self.assertEqual(90.0, center.y)
        ## check multiple points
        center = get_centroid([Point(0.0, 0.0), Point(10.0,0.0), Point(5.0,10.0)])
        self.assertAlmostEqual(5.0, center.x)
        ## even more points
        center = get_centroid([Point(0.0, 0.0), Point(10.0,0.0),Point(10.0,30.0), Point(10.0,10.0), Point(20.0, 0.0)])
        self.assertAlmostEqual(10.0, center.x)
        ## not lat lon average
        center = get_centroid([Point(0.0, 30.0), Point(10.0,30.0),Point(10.0,60.0), Point(0.0,60.0)])
        self.assertNotAlmostEqual(45.0, center.y)
        self.assertAlmostEqual(5.0, center.x)
        ## crosses date line
        center = get_centroid([Point(170.0, 30.0), Point(-160.0,30.0),Point(170.0,60.0), Point(-160.0,60.0)])
        self.assertNotAlmostEqual(45.0, center.y)
        self.assertAlmostEqual(-175.0, center.x)

if __name__ == '__main__':

That's it for this post, the full code is on GitHub. If you'd like more posts like this on this blog then feel free to make some topic suggestions in the comments and make sure to subscribe to the updates of this site via RSS or via e-mail.


dBaap said...

Hi Samuel, nice post there some of these functions are really helpful. So going through it I came across the following case

lat,long = 34.076419095,-118.291352314

So when I call the function
returns (-145.923580905,-61.708647686)

Correct me if I am wrong, but guess with one point, the centroid should be the same as the given point right ?

dBaap said...

Dude, Thanks, I screwed up in using the lat and lon. It works Awesome. Thanks

Happy St Patrick's Day

Anup Pandey said...

What is A, to append since I have for values..lat1, lat2,long1, long2

Samuel Bosch said...

A and B are your points, in your case this could be:
A = Point(long1, lat1)
B = Point(long2, lat2)

Anup Pandey said...

Thanks Samuel thats really helpful. Need one help, I am trying to find point B(la/long) from another point A(lat/long) towards some distance and bearing.
Eg: 50 metres towards 45 degree ( w.r.t north) from a list of lat/long in csv

Samuel Bosch said...

Take a look at this equations, they should give you the result you need:

bombaygiraffe said...

Hi, sorry but I don't understand the comment
"A = Point(long1, lat1)
B = Point(long2, lat2)"

To measure distance I use the following:
#function to measure distance
def haversine_np(StatLng, StatLat, reallng, reallat):
#a = row.reallat + row.reallng
reallng , reallat , StatLng , StatLat = map(np.radians, [ reallng , reallat , StatLng , StatLat ])
dlon = reallng - StatLng
dlat = reallat - StatLat
a = np.sin(dlat/2.0)**2 + np.cos( StatLat ) * np.cos( reallat ) * np.sin(dlon/2.0)**2
c = 2 * np.arcsin(np.sqrt(a))
km = 6371 * c
return km

What I'm looking for is the same type of thing to return bearing, which I think is what you have written but the A & B format for points loses me sorry. can you please explain in very simple terms as I expect to pass 2 pairs of values (lat1, lng1 & lat2, lng2) Thx!