Samuel Bosch

Showing posts with label Python. Show all posts

Great circle calculations with numpy

In this very short post I want to point you to some code for calculating the centroid and distance to that centroid for a set of points in numpy. As bonus I also included some profiling. Note that I already blogged about the centroid function in a previous post.

from math import atan2, sqrt, degrees
import numpy as np
from math import radians, sin, cos

RADIUS = 6371.009


def get_centroid(points):
    xy = np.asarray(points)
    xy = np.radians(xy)
    lon, lat = xy[:, 0], xy[:, 1]
    avg_x = np.sum(np.cos(lat) * np.cos(lon)) / xy.shape[0]
    avg_y = np.sum(np.cos(lat) * np.sin(lon)) / xy.shape[0]
    avg_z = np.sum(np.sin(lat)) / xy.shape[0]
    center_lon = atan2(avg_y, avg_x)
    hyp = sqrt(avg_x * avg_x + avg_y * avg_y)
    center_lat = atan2(avg_z, hyp)
    return degrees(center_lon), degrees(center_lat)


def gc_distance_points(a, points):
    b = np.asarray(points)
    lat1, lng1 = radians(a[1]), radians(a[0])
    lat2, lng2 = np.radians(b[:, 1]), np.radians(b[:, 0])

    sin_lat1, cos_lat1 = sin(lat1), cos(lat1)
    sin_lat2, cos_lat2 = np.sin(lat2), np.cos(lat2)

    delta_lng = np.subtract(lng2, lng1)
    cos_delta_lng, sin_delta_lng = np.cos(delta_lng), np.sin(delta_lng)

    d = np.arctan2(np.sqrt((cos_lat2 * sin_delta_lng) ** 2 +
                 (cos_lat1 * sin_lat2 -
                  sin_lat1 * cos_lat2 * cos_delta_lng) ** 2),
                 sin_lat1 * sin_lat2 + cos_lat1 * cos_lat2 * cos_delta_lng)

    return RADIUS * d


def gc_dist(a, b):
        from math import radians, sin, cos, sqrt, atan2
        lat1, lng1 = radians(a[1]), radians(a[0])
        lat2, lng2 = radians(b[1]), radians(b[0])

        sin_lat1, cos_lat1 = sin(lat1), cos(lat1)
        sin_lat2, cos_lat2 = sin(lat2), cos(lat2)

        delta_lng = lng2 - lng1
        cos_delta_lng, sin_delta_lng = cos(delta_lng), sin(delta_lng)

        d = atan2(sqrt((cos_lat2 * sin_delta_lng) ** 2 +
                       (cos_lat1 * sin_lat2 -
                        sin_lat1 * cos_lat2 * cos_delta_lng) ** 2),
                  sin_lat1 * sin_lat2 + cos_lat1 * cos_lat2 * cos_delta_lng)

        return RADIUS * d


if __name__ == '__main__':

    import random
    random.seed(42)
    pts = [(random.uniform(-180, 180), random.uniform(-90, 90)) for _ in range(1000000)]
    
    centr = get_centroid(pts)

    import cProfile

    cProfile.runctx('gc_distance_points(centr, pts)', globals(), locals())
    cProfile.runctx('dists = [gc_dist(centr, b) for b in pts]', globals(), locals())

All code is also available at Github. Feel free to post a comment or email in case you have any questions.

Benchmarking reading binary values from a file with F#, Python, Julia, R, Go and OCaml

In one of my recent posts I showed some F# code for storing and reading integers as binary data. Since then I've created different versions of the reading part in Python, Julia, Go, OCaml and R to be able to compare the performance of the different languages on this simple task and to get a feeling on what it's like to program in Julia, Go and OCaml as I hadn't created any programs in these languages yet.

Below table shows the benchmark results for reading 10 times 10000 values and 10000 times 10 values with links to the source code files in the different programming languages:

Language	10 times 10000 values	10000 times 10 values
F#	6 seconds	20 seconds
Python	26 seconds	40 seconds
Julia	45 seconds	72 seconds
Go	8 seconds	25 seconds
OCaml	2.5 seconds	48 seconds
R	110 seconds	NA

The overall fastest version is the one written in F# but note that it's also the version I have tweaked the most. As I'm not very experienced in most of the languages so any performance tips are welcome. Note that I tried using memory mapped files in .NET and Python this improved performance when querying lots of values from the same file but also made it worse in other cases.

The implementation of the functionality is most of the times rather similar in the different languages. Some notable differences where:

Julia apparently doesn't have a null value so I refrained from checking whether the read integer value was equal to the int32 minimum value (-2147483648).
In Go converting the bytes to integers was faster with a custom function.
I didn't find a function in the OCaml Core library to convert bytes to a 32-bit integer, but luckily I found one on Stack Overflow.

F#:

open System
open System.IO

let readValue (reader:BinaryReader) cellIndex = 
    reader.BaseStream.Seek(int64 (cellIndex*4), SeekOrigin.Begin) |> ignore
    match reader.ReadInt32() with
    | Int32.MinValue -> None
    | v -> Some(v)
        
let readValues indices fileName = 
    use reader = new BinaryReader(File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    let values = Array.map (readValue reader) indices
    values

Python:

def read_values(filename, indices):
    # indices are sorted and unique
    values = []
    with open(filename, 'rb') as f:
        for index in indices:
            f.seek(index*4L, os.SEEK_SET)
            b = f.read(4)
            v = struct.unpack("@i", b)[0]
            if v == -2147483648:
                v = None
            values.append(v)
    return values

Julia:

function readvalue(stream, position)
    seek(stream, position)
    return read(stream, Int32)
end

function readvalues(filename::String, indices)
    stream = open(filename, "r")
    try
        return Int32[readvalue(stream, index*4) for index in indices]
    finally
        close(stream)
    end
end

Go:

import ("os")
func bytes2int(b []byte) int32 {
    v := int32(0)
    for  i := 0; i < 4; i++ {
        v = v | (int32(b[i]) << (uint(8*i)))
    }
    return v
}

func readValues(indices []int, filename string) []int32 {
    results := make([]int32, len(indices))
    b := make([]byte, 4)
    f,_ := os.Open(filename)
    for i, cellIndex := range indices {
        f.Seek(int64(cellIndex*4), os.SEEK_SET)
        f.Read(b)
        value := bytes2int(b) // around 10-20% faster then binary.Read
        if value != -2147483648 {
            results[i] = value
        } else {
            results[i] = 99999
        }
    }
    return results
}

OCaml:

let input_le_int32 inchannel = (* http://stackoverflow.com/a/6031286/477367 *)
  let res = ref 0l in
    for i = 0 to 3 do
      let byte = input_byte inchannel in
        res := Int32.logor !res (Int32.shift_left (Int32.of_int byte) (8*i))
    done;

    match !res with
      | -2147483648l -> None
      | v -> Some(v)

let readvalue inchannel index =
  seek_in inchannel (index*4);
  input_le_int32 inchannel

let readvalues (indices:int array) filename =
  let inchannel = open_in_bin filename in
    try
      let result = Array.map (readvalue inchannel) indices in
        close_in inchannel;
        result
    with e ->
      close_in_noerr inchannel;
      raise e

read.values <- function(filename, indices) {
  conn <- file(filename, "rb")
  read.value <- function(index) {
    seek(conn, where=index*4)
    readBin(conn, integer(), size = 4, n = 1, endian = "little")
  }
  r <- sapply(indices,read.value)
  close(conn)
  r[r==-2147483648] <- NA
  r
}

Any suggestions for improving the implementation in one of the above programming languages ? Which language would you like to compare my results with ? Which other language(s) do you expect to be faster than my benchmark results ? Can you help me out with a version in your favorite language or in C, fortran, Common Lisp, Scheme, Clojure, Java or J ?

Think Python

Several years ago, when I was learning about Python for the first time, I read the online book: "How to think like a computer scientist". Today I just finished Think Python: How To Think Like a Computer Scientist by Allen B. Downey (free e-version) which is an evolved version of this book.

You should read this book if you want to learn programming or if you want to learn python. But if you already have advanced programming skills then I would suggest to just skim the free online version and spend your money on a more advanced python book.

Think Python starts with the most basic things like operators, variables and assignment then functions, conditionals, recursion and iteration are introduced, followed up with strings, lists , dictionaries and tuples. The next chapter is a practical chapter on files and the book ends with 4 chapters introducing object oriented programming. Between the different chapters there are 4 case studies where the learned concepts and techniques are applied.

What you won't learn in this book:

what the different standard libraries are
web development in python
popular (scientific) libraries like numpy
writing (unit) tests for your programs
Python 3, except some small remarks

What I particularly liked:

the information about debugging your programs at the end of every chapter and the appendix
the step by step introduction to object-oriented programming
the case studies
the exercises
the glossary at the end of every chapter

As you might have inferred from the above, this book is not a reference book (we have the web for that) but a book that learns you to think like a programmer.

An alternative way to learn Python is Learn Python the Hard Way (html) (pdf + epub + video) by Zed Shaw.

Other books by Allen B. Downey:

Think Complexity: Exploring Complexity Science with Python (free e-version)
Think Stats: Probability and Statistics for Programmers (free e-version)
Think Bayes (free e-version)
Think OS: A Brief Introduction to Operating Systems (free e-version)

If you want to improve your Python skills then take a look at this list of advanced books. I especially liked Expert Python Programming by Tarek Ziadé.

Working in lat long: great circle distance, bearing, midpoint and centroid calculations

For my work in species distribution modeling I'm mostly working with lat long coordinates so I needed some great circle functions to calculate the point-point distance, point-to-line distance and the centroid of a group of points. Starting from 2 excellent resource http://www.movable-type.co.uk/scripts/latlong.html
and http://williams.best.vwh.net/ftp/avsig/avform.txt I created the following functions in Python, with some tests. As usual the full source code can be found online: GreatCircle.py.

I started with importing some functions from the math module and defining a small Point class:

from math import radians, degrees, cos, sin, sqrt, atan2, asin, fabs, pi

class Point(object):
    def __init__(self,x,y):
        self.x = x
        self.y = y

Then came some very simple functions for calculating the distance (with the Haversine formula), the bearing and the midpoint. These are almost verbatim translations from the previously mentioned resources.

def distance_haversine(A, B, radius=6371000):
    """ note that the default distance is in meters """
    dLat = radians(B.y-A.y)
    dLon = radians(B.x-A.x)
    lat1 = radians(A.y)
    lat2 = radians(B.y)
    a = sin(dLat/2) * sin(dLat/2) + sin(dLon/2) * sin(dLon/2) * cos(lat1) * cos(lat2)
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    return c * radius

def bearing(A, B):
    dLon = radians(B.x-A.x)
    lat1 = radians(A.y)
    lat2 = radians(B.y)
    y = sin(dLon) * cos(lat2)
    x = cos(lat1)* sin(lat2) - sin(lat1) * cos(lat2)* cos(dLon)
    return atan2(y, x)

def bearing_degrees(A,B):
    return degrees(bearing(A,B))

def midpoint(A,B):
    ## little shortcut
    if A.x == B.x: return Point(A.x, (A.y+B.y)/2)
    if A.y == B.y: return Point((A.x+B.x)/2, A.y)
    
    lon1, lat1 = radians(A.x), radians(A.y)
    lon2, lat2 = radians(B.x), radians(B.y)
    dLon = lon2-lon1

    Bx = cos(lat2) * cos(dLon)
    By = cos(lat2) * sin(dLon)
    lat3 = atan2(sin(lat1)+sin(lat2), sqrt((cos(lat1)+Bx)*(cos(lat1)+Bx) + By*By))
    lon3 = lon1 + atan2(By, cos(lat1) + Bx)
    return Point(degrees(lon3),degrees(lat3))

Then came the more puzzling part where I implemented the cross track error to calculate the great circle distance from a point to a line .

def crosstrack_error(p,A,B, radius=6371000):
    """ distance (in meters) from a point to the closest point along a track
        http://williams.best.vwh.net/avform.htm#XTE """
    dAp = distance_haversine(A, p, radius=1)
    brngAp = bearing(A,p)
    brngAB = bearing(A,B)
    dXt = asin(sin(dAp)*sin(brngAp-brngAB))
    return fabs(dXt) * radius

But the results didn't always match the distances that I calculated with PostGIS so I wrote a recursive binary search function consisting of 3 parts: calculating the midpoint, calculating the distance to two end and the midpoint and continuing again with the halve that is closest to the point until the difference in distance between the point and the two ends of the line is within a certain tolerance. The disadvantage of this function is that the rounding errors in calculating the midpoint do add up and the function is not very fast. So if anyone knows what wrong with my cross track implementation or knows a better method for calculating the great circle distance between a point and a line, feel free to comment below or send me an e-mail at mail@<thisdomainhere>.com.

def point_line_distance(p, A,B, radius=6371000, tolerance=0.1):

    # tolerance and radius in meters
    """ recursive function that halves the search space until result is within tolerance
        disadvantages:
        1) the rounding errors in midpoint do add up
        2) not very fast """
    def rec_point_line_distance(p,A,B,dA,dB):
        C = midpoint(A,B)
        dC = distance_haversine(p,C)
        if fabs(dC-dA) < tolerance or fabs(dC-dB) < tolerance:
            return dC
        elif dA < dB:
            return point_line_distance(p,A,C,dA, dC)
        else:
            return point_line_distance(p,C,B,dC, dB)
    dA = distance_haversine(p,A)
    dB = distance_haversine(p,B)
    return rec_point_line_distance(p,A,B,dA,dB)

The last function I needed was finding the centroid of a set of points in lat long coordinates. Below is an implementation from a response on gis.stackexchange.com. First all coordinates are converted to cartesian coordinates. Then the average of the x, y and z coordinates are calculated and finally these average coordinates are converted back to longitude and latitude.

def get_centroid(points):
    """ 
    http://www.geomidpoint.com/example.html 
    http://gis.stackexchange.com/questions/6025/find-the-centroid-of-a-cluster-of-points
    """
    sum_x,sum_y,sum_z = 0,0,0
    for p in points:
        lat = radians(p.y)
        lon = radians(p.x)
        ## convert lat lon to cartesian coordinates
        sum_x = sum_x + cos(lat) * cos(lon)
        sum_y = sum_y + cos(lat) * sin(lon)
        sum_z = sum_z + sin(lat)
    avg_x = sum_x / float(len(points))
    avg_y = sum_y / float(len(points))
    avg_z = sum_z / float(len(points))
    center_lon = atan2(avg_y,avg_x)
    hyp = sqrt(avg_x*avg_x + avg_y*avg_y) 
    center_lat = atan2(avg_z, hyp)
    return Point(degrees(center_lon), degrees(center_lat))

For the implementation and running of my tests I used the built-in unittest module in the following way.

import unittest
class Test_gc(unittest.TestCase):
    def test_distance_haversine(self):
        d = distance_haversine(Point(-10.0001, 80.0001), Point(7.935, 63.302))
        self.assertAlmostEqual(d, 1939037.0, places=0)
        
    def test_bearing(self):
        b = bearing(Point(-10.0001, 80.0001), Point(7.935, 63.302))
        self.assertAlmostEqual(b, 2.661709,places=6)

    def test_bearing_degrees(self):
        b = bearing_degrees(Point(-10.0001, 80.0001), Point(7.935, 63.302))
        self.assertAlmostEqual(b, 152.504694,places=6)

    def test_midpoint(self):
        m = midpoint(Point(-20.0,5.0), Point(-10.0,5.0))
        self.assertAlmostEqual(-15.0, m.x, places=6)
        m = midpoint(Point(7.0,5.0), Point(7.0,15.0))
        self.assertAlmostEqual(10.0, m.y, places=6)
        
    def test_crosstrack_error(self):
        p,A,B = Point(0.0, 1.0), Point(1.0, 1.0), Point(2.0, 1.0)
        d = crosstrack_error(p,A,B)
        self.assertAlmostEqual(d, distance_haversine(p,A), places=0)
        ## Apparently crosstrack_error returns incorrect results in some very specific cases (when p is completly off track)
        
    def test_point_line_distance(self):
        
        p,A,B = Point(0.0, 1.0), Point(1.0, 1.0), Point(2.0, 1.0)
        d = point_line_distance(p,A,B)
        self.assertAlmostEqual(d, distance_haversine(p,A), places=0)

        p,A,B = Point(2.0, 1.2), Point(1.0, 1.0), Point(3.0, 1.0)
        d = point_line_distance(p,A,B)
        self.assertAlmostEqual(d, distance_haversine(p,Point(2.0,1.0)), places=0)

        p,A,B = Point(0.0, 90.0), Point(-179.0, -21.0), Point(179.5, -22.0)
        d = point_line_distance(p,A,B)
        self.assertAlmostEqual(d, distance_haversine(p,A), places=0)

        p,A,B = Point(0.0, 89.0), Point(-179.0, -22.0), Point(179.5, -22.0)
        d = point_line_distance(p,A,B)
        self.assertAlmostEqual(d, distance_haversine(p,Point(0, -22)), places=0)
        
    def test_get_centroid(self):
        ## check crosses the north pole
        center = get_centroid([Point(0.0, 80.0), Point(180.0,80.0)])
        self.assertEqual(90.0, center.y)
        ## check multiple points
        center = get_centroid([Point(0.0, 0.0), Point(10.0,0.0), Point(5.0,10.0)])
        self.assertAlmostEqual(5.0, center.x)
        ## even more points
        center = get_centroid([Point(0.0, 0.0), Point(10.0,0.0),Point(10.0,30.0), Point(10.0,10.0), Point(20.0, 0.0)])
        self.assertAlmostEqual(10.0, center.x)
        ## not lat lon average
        center = get_centroid([Point(0.0, 30.0), Point(10.0,30.0),Point(10.0,60.0), Point(0.0,60.0)])
        self.assertNotAlmostEqual(45.0, center.y)
        self.assertAlmostEqual(5.0, center.x)
        ## crosses date line
        center = get_centroid([Point(170.0, 30.0), Point(-160.0,30.0),Point(170.0,60.0), Point(-160.0,60.0)])
        self.assertNotAlmostEqual(45.0, center.y)
        self.assertAlmostEqual(-175.0, center.x)

if __name__ == '__main__':
    unittest.main()

That's it for this post, the full code is on GitHub. If you'd like more posts like this on this blog then feel free to make some topic suggestions in the comments and make sure to subscribe to the updates of this site via RSS or via e-mail.

Array manipulation and booleans in numpy, hunting for what went wrong

Recently I was doing some raster processing with gdal_calc.py but I was getting some surprising results.
The goal was to convert some continuous value into a binary value based on a treshold and then detect places where only A was True (value: -1), where only B was True (value: 1) and where both A and B where True (value: 0). With a treshold set to 2 I came with the following formula:

(A > 2) - (B > 2)

But after calling the gdal_calc.py with my formula and inspecting the results I only got values of 0 and 1.
After inspecting gdal_calc.py I noticed that it uses numpy and more specifically numpy arrays for the raster manipulation.

This how my python shell session went (numpy_array_math.py):

>>> import numpy as np
>>> a = np.array([1,2,3,4])
>>> b = np.array([1,5,3,2])
>>> print(a > 2)
[False False  True  True]
>>> print(b > 2)
[False  True  True False]
>>> print(True-False)
1
>>> print(False-True)
-1
>>> print((a>2)-(b>2))
[False  True False  True]
>>> print((a>2)*1-(b>2))  # we got a winner
[ 0 -1  0  1]

The problem was that boolean substraction in Python does generate the expected numeric results but the results where converted back into a boolean array by numpy after the substraction. And indeed converting -1, 1 or any other non zero number to a boolean generates a True which when converted back to number for writing the raster to disk gives you the value 1.
The solution was to force at least one of the arrays to be numeric so that we substract numeric values.

>>> bool(1)
True
>>> bool(2)
True
>>> bool(-1)
True
>>> bool(0)
False
>>> print((a>2)*1)
[0 0 1 1]

If you want to try this yourself on Windows then the easiest way to install gdal is with the OSGeo4W installer. A windows installer for numpy can be found on the website by Christopher Golke but consider also installing the full SciPy stack with one of the Scientific Python distributions.

What surprising results have you encountered with numpy.array or gdal_calc ?

Minimal introduction to Python

A few years ago I gave a short introduction to Python to some of my former colleagues and this was the result (also on github):

# 1) data basics

# numbers
i = 1234    # integer
f = 1.234   # float

# operators
a = 12 + 5  # = 17
b = 3 * 4   # = 12
c = 6 / 3   # = 2
d = 46 % 5  # = 1 (=> modulo operator)

a += 1      # = 18
a -= 2      # = 16
a *= 2      # = 32
a /= 2      # = 16
a %= 5      # = 1

# text
s = 'string'                
s = s.upper()               # 'STRING'
s = s.lower()               # 'string'
s.startswith('s')           # true
s.endswith('ng')             # true
s = s.replace('str', 'bl')  # 'bling'
l = s.split('i')            # list ['bl', 'ng']
strings = ['s', "t", "r'i'n", 'g"s"tring', "3"]

## add the prefix r for e.g. paths to files to escape the backslash character

testdoc = r'test\test.txt' ## same as : 'test\\test.txt'

# list
l = ['a1']
l.append('b2')
l.append('c3')
l[0] # 'a1'


mixed = ['3', 'abc', 3, 4.56]

d = {} # dictionary (key-value pairs)

d = {'a' : 'blabla',
     'b' : 1,
     3 : 'ablabl',
     4 : 12.3,
     16.9 : 'dif3'}

d['a'] # 'blabla'

# 2) conditions, loops and functions

# indentation is required !!!

if 3 > 2:
    print('3 > 2')
elif 1 == 0:
    print('1 = 0')
else:
    print('else clause')

if 'a' in d:
    print(d['a'])
else:
    print('not found')

for x in range(0, 5):  # from 0 to 5 (0, 1, 2, 3 ,4)
    print(x)

letters = ['a', 'b', 'c']
for letter in letters:
    print(letter)

# list comprehension
upper_letters = [letter.upper() for letter in letters]
print(upper_letters)


d = { 1:'a', 2:'b', 3:'c'}
for key, value in d.iteritems():
    print(str(key), value)
for key in d.keys():
    print('key :', str(key))
for value in d.values():
    print('value :', value)

def special_sum(numberlist):
    total = 0
    for element in numberlist:
        if element < 5:
            continue # go to the next element
        elif total > 100:
            break # stop the loop
        else:
            total += element
    return total

print(special_sum([1,2,2,4,8,50, 60])) # = 118

# 3) using os and shutil

# import modules

import os
import shutil

def setup_test(directory):
    if not os.path.isdir(directory):
        os.mkdir(directory)
    file1 = os.path.join(directory, 'file1_test.dll')
    open(file1, 'a').close() # create empty file and close it
    file2 = os.path.join(directory, 'file2_test.txt')
    open(file2, 'a').close() # 'a' creates a file for appending (doesn't overwrite)
    
setup_test('test')

# looping over files in dir and subdirs and renaming some of them
def list_files(startdir):
    for element in os.listdir(startdir):
        path = os.path.join(startdir, element)
        if os.path.isfile(path):
            print(path)
            root, extension = os.path.splitext(path)
            if extension.lower() == '.dll': # add an extra extension to dll's
                shutil.copyfile(path, path + '.REMOVETHIS')
                # with os.rename you can replace the file
                ## os.rename(path, path + '.REMOVETHIS')
        else:
            list_files(path)

startdir = r'test'
list_files(startdir)

# or you can loop with os.walk

for root, directories, files in os.walk(startdir, topdown=True):
    print 'dir : %s' % root
    if files:
        print('files :')
        for f in files:
            print('\t', os.path.join(root, f))

# 4) reading, parsing and writing files

def create_tab_test(inputpath):
    with open(inputpath, 'w') as inputfile: # 'w' creates a file for (over)writing
        # create some dummy content separated by tabs and with newlines at the end
        lines = ['\t'.join([str(1*i),str(2*i)])+'\n' for i in range(5)]
        # write to the file
        inputfile.writelines(lines)
        
def tab_to_csv(inputpath, outputpath): # only for small files
    lines = []
    with open(inputpath) as inputfile:
        for line in inputfile:
            line = line.replace('\n', '')
            columns = line.split('\t')
            line = ';'.join(columns)
            lines.append(line)
    with open(outputpath, 'w') as outputfile: # overwrites the outputfile if it exists
        outputfile.write('\n'.join(lines))
    
inputpath = r'test\test.txt' ## or 'D:\temp\test.txt'
outputpath = r'test\test.csv'

create_tab_test(inputpath)
tab_to_csv(inputpath, outputpath)

Want to learn more or need some help ?
Contact me at mail@samuelbosch.com or take a look at one of the following resources:

https://docs.python.org/3/tutorial/
Learn Python The Hard Way by Zed A. Shaw
A Byte of Python by Swaroop Chitlur
Dive Into Python 3 by Mark Pilgrim

I you want a bigger list of Python and other CS related books to look at then the following free programming books list might be what you're looking for.

-- update --
Some suggestions by Meriam in the comments:

Type "import this" in a python prompt
Have a look at the style guide for python code
Learn about exceptions

Azimuthal equidistant projection: An implementation in F#, Python and Julia

Recently I needed to calculate distance from one point to another set of points in order to find the nearest point and its distance to the origin point. I opted to project all points to the azimuthal equidistant projection. The most important property of this projection is that all distances from the center of the projection to all other points represents the true distance (assuming a spheric earth) to all other projected points. To calculate the distance you just have to calculate the euclidean distance and multiply it with the average earth radius (6371000.0 meters).

The formulas for the azimuthal equidistant projection can be found at mathworld.wolfram.com. This formula can then be directly translated into F# like this:

open System
module AzimuthalEquidistantProjection =

    let inline degToRad d = 0.0174532925199433 * d; // (1.0/180.0 * Math.PI) * d

    let project centerlon centerlat lon lat =
        // http://mathworld.wolfram.com/AzimuthalEquidistantProjection.html
        // http://www.radicalcartography.net/?projectionref
        let t:float = degToRad lat
        let l:float = degToRad lon
        let t1 = degToRad centerlat // latitude center of projection
        let l0 = degToRad centerlon // longitude center of projection
        let c = acos ((sin t1) * (sin t) + (cos t1) * (cos t) * (cos (l-l0)))
        let k = c / (sin c)
        let x = k * (cos t) * (sin (l-l0))
        let y = k * ((cos t1) * (sin t) - (sin t1) * (cos t) * (cos (l-l0)))
        (x, y)

I also implemented the azimuthal equidistant projection by using units of measure. I create one set of measures to distinguish between degrees and radians and another one to distinguish between x and y.

open System
module AzimuthalEquidistantProjectionWithMeasures = 
    [<Measure>] type deg // degrees
    [<Measure>] type rad // radians
    [<Measure>] type x
    [<Measure>] type y
    type lon = float<x deg>
    type lat = float<y deg>
    
    let inline degToRad (d:float<'u deg>) = d*0.0174532925199433<rad/deg>
    let cos (d:float<'u rad>) = Math.Cos(float d)
    let sin (d:float<'u rad>) = Math.Sin(float d)
    
    let project (centerlon:lon) (centerlat:lat) (lon:lon) (lat:lat) =
        // http://mathworld.wolfram.com/AzimuthalEquidistantProjection.html
        // http://www.radicalcartography.net/?projectionref
        let t = degToRad lat
        let l = degToRad lon
        let t1 = degToRad centerlat // latitude center of projection
        let l0 = degToRad centerlon // longitude center of projection
        let c = acos ((sin t1) * (sin t) + (cos t1) * (cos t) * (cos (l-l0)))
        let k = c / (sin c)
        let x:float<x> = k * (cos t) * (sin (l-l0)) 
                         |> LanguagePrimitives.FloatWithMeasure
        let y:float<y> = k * ((cos t1) * (sin t) - (sin t1) * (cos t) * (cos (l-l0))) 
                         |> LanguagePrimitives.FloatWithMeasure
        (x, y)

In the past I've used the python OGR bindings and the ESRI Projection Engine for my map projection needs but this time I needed a pure python implementation so I translated the above code and optimized it a bit by precalculating some values that we'll need when we project the points during the initialization of my projection class. A similarly optimized version in F# is on fssnip.net.

from math import cos, sin, acos, radians

class Point(object):
    def __init__(self,x,y):
        self.x = x
        self.y = y       

class AzimuthalEquidistantProjection(object):
    """ 
        http://mathworld.wolfram.com/AzimuthalEquidistantProjection.html
        http://www.radicalcartography.net/?projectionref
    """
    def __init__(self, center):
        self.center = center
        self.t1 = radians(center.y) ## latitude center of projection
        self.l0 = radians(center.x) ## longitude center of projection
        self.cost1 = cos(self.t1)
        self.sint1 = sin(self.t1)
        
    def project(self, point):
        t = radians(point.y)
        l = radians(point.x)
        costcosll0 = cos(t) * cos(l-self.l0)
        sint = sin(t)
        
        c = acos ((self.sint1) * (sint) + (self.cost1) * costcosll0)
        k = c / sin(c)
        x = k * cos(t) * sin(l-self.l0)
        y = k * (self.cost1 * sint - self.sint1 * costcosll0)
        return Point(x, y)

import unittest
class Test_AzimuthalEquidistantProjection(unittest.TestCase):
    def test_project(self):
        p = AzimuthalEquidistantProjection(Point(0.0,0.0))
        r = p.project(Point(1.0,1.0))
        self.assertAlmostEqual(0.01745152022, r.x)
        self.assertAlmostEqual(0.01745417858, r.y)

        p = AzimuthalEquidistantProjection(Point(1.0,2.0))
        r = p.project(Point(3.0,4.0))
        self.assertAlmostEqual(0.03482860733, r.x)
        self.assertAlmostEqual(0.03494898734, r.y)

        p = AzimuthalEquidistantProjection(Point(-10.0001, 80.0001))
        r = p.project(Point(7.935, 63.302))
        self.assertAlmostEqual(0.1405127567, r.x)
        self.assertAlmostEqual(-0.263406547, r.y)

if __name__ == '__main__':
    unittest.main()

I've also implemented the straightforward projection code in Julia. Julia is a promising language built for technical computing. It's still only on version 0.2 but a lot of packages have already been created for the language.

function degToRad(d)
    d*0.0174532925199433
end

function project(centerlon, centerlat, lon, lat)
  t = degToRad(lat)
  l = degToRad(lon)
  t1 = degToRad(centerlat)
  l0 = degToRad(centerlon)
  c = acos(sin(t1) * sin(t) + cos(t1) * cos(t) * cos(l-l0))
  k = c / sin(c)
  x = k * cos(t) * sin(l-l0)
  y = k * (cos(t1) * sin(t) - sin(t1) * cos(t) * cos(l-l0))
  x, y
end

More Python, F#, Julia and geospatial code will be posted here so make sure to subscribe to my e-mail updates on the left. I also occasionally tweet as @gissolved.

Monte Carlo Experiment of the Orchard Game: What are the odds of winning ?

Update november 2021: Simon Naarmann noticed a bug in the worst scenario, I've updated this post to include the fix.

Recently my little son received the Orchard Game (Amazon). After playing it a few time I started wondering what the odds of winning the game are. To find this out I wrote a short Monte Carlo simulation of the game.

The rules

The orchard game is a very simple collaborative game with the following rules:

On the board 4 types of fruits with 10 pieces each.
Every player throws a six sided dice with 4 colors (for each fruit a color), a fruit basket and a raven.
If you throw a color you may harvest one piece of fruit with the same color.
If you throw the fruit basket you can take two pieces of the fruit of your choice.
If you throw the raven you have to put one of the 9 pieces of the raven on the board.
If all fruits are collected by the players before the raven is complete then the players have won the game.

Note that to play perfectly you have to pick up the most abundant fruits when you've thrown the fruit basket.

The program

I wrote the following simulation in Python. First thing I wrote was a simulation of the game. The actual strategy used to decide which fruit is harvested when the fruit basket is thrown is passed in as a function.

import random

def orchard(fruitbox_strategy):
    raven = 0
    fruits = 4 * [10]
    while sum(fruits) > 0 and raven < 9:
        n = random.randint(0,5)
        if n == 5:
            raven = raven+1
        elif n == 4:
            fruitbox_strategy(fruits)
            fruitbox_strategy(fruits)
        elif fruits[n] > 0:
            fruits[n] = fruits[n]-1
    return raven < 9

Next I implemented 3 different fruit basket strategies. The perfect playing strategy is decrease_max. It will always harvest the most abundant fruit(s).

def decrease_max(fruits):
    mi, mv = 0,0
    for i, v in enumerate(fruits):
        if v > mv:
            mi,mv = i,v
    fruits[mi] = mv-1

def decrease_min(fruits):
    mi, mv = 0, 11
    for i, v in enumerate(fruits):
        if v < mv and v > 0:
            mi,mv = i,v

    if mv < 11:
        fruits[mi] = mv-1

def decrease_random(fruits):
    mi = random.randint(0,3)
    v = fruits[mi]
    if v > 0:
        fruits[mi] = v-1
    elif sum(fruits) > 0:
        decrease_random(fruits)

After that I implemented the Monte Carlo simulation. It will repeatedly play the game with the given strategy and return the percentage of games it won. The number of times the game is simulated can be picked up front. To get an idea of the robustness of the result I repeated the simulation 10 times to be able to inspect the variability in the outcome. Increasing the number of games played within a simulation makes the results more stable.

def monte_carlo_simulation(game, count):
    won = 0
    for _ in range(count):
        if game():
            won = won + 1
    return (won*100) /count

def simulate_orchard_best(count):
    return monte_carlo_simulation(lambda:orchard(decrease_max), count)

def simulate_orchard_worst(count):
    return monte_carlo_simulation(lambda:orchard(decrease_min), count)

def simulate_orchard_random(count):
    return monte_carlo_simulation(lambda:orchard(decrease_random), count)


print('Winning rates of 10 runs of the best strategy with 50 games: \n%s' %
      ([str(simulate_orchard_best(50)) + '%' for _ in range(10)]))
    
print('Winning rates of 10 runs of the best strategy with 1000 games: %s' %
      ([str(simulate_orchard_best(1000)) + '%' for _ in range(10)])) 

print('Winning rates of 10 runs of the worst strategy with 50 games: \n%s' %
      ([str(simulate_orchard_worst(50)) + '%' for _ in range(10)]))

print('Winning rates of 10 runs of the worst strategy with 1000 games: \n%s' %
      [str(simulate_orchard_worst(1000)) + '%' for _ in range(10)])

print('Winning rates of 10 runs of the random strategy with 50 games: \n%s' % 
      ([str(simulate_orchard_random(50)) + '%' for _ in range(10)]))

print('Winning rates of 10 runs of the random strategy with 1000 games: \n%s' %
      [str(simulate_orchard_random(1000)) + '%' for _ in range(10)])

A typical result looks like this:

Winning rates of 10 runs of the best strategy with 50 games: 
['58%', '80%', '70%', '62%', '70%', '66%', '62%', '78%', '64%', '70%']
Winning rates of 10 runs of the best strategy with 1000 games:
['67%', '68%', '68%', '68%', '68%', '67%', '69%', '67%', '71%', '69%']
Winning rates of 10 runs of the worst strategy with 50 games: 
['60%', '44%', '62%', '42%', '50%', '50%', '62%', '54%', '64%', '58%']
Winning rates of 10 runs of the worst strategy with 1000 games: 
['53%', '51%', '54%', '54%', '53%', '52%', '54%', '57%', '50%', '54%']
Winning rates of 10 runs of the random strategy with 50 games: 
['72%', '58%', '60%', '66%', '60%', '58%', '60%', '70%', '60%', '72%']
Winning rates of 10 runs of the random strategy with 1000 games: 
['62%', '62%', '63%', '63%', '62%', '60%', '60%', '62%', '61%', '65%']

The full source code is in the github repository of this blog.

Pages

The rules

The program