Great circle calculations with numpy

In this very short post I want to point you to some code for calculating the centroid and distance to that centroid for a set of points in numpy. As bonus I also included some profiling. Note that I already blogged about the centroid function in a previous post.

from math import atan2, sqrt, degrees
import numpy as np
from math import radians, sin, cos

RADIUS = 6371.009


def get_centroid(points):
    xy = np.asarray(points)
    xy = np.radians(xy)
    lon, lat = xy[:, 0], xy[:, 1]
    avg_x = np.sum(np.cos(lat) * np.cos(lon)) / xy.shape[0]
    avg_y = np.sum(np.cos(lat) * np.sin(lon)) / xy.shape[0]
    avg_z = np.sum(np.sin(lat)) / xy.shape[0]
    center_lon = atan2(avg_y, avg_x)
    hyp = sqrt(avg_x * avg_x + avg_y * avg_y)
    center_lat = atan2(avg_z, hyp)
    return degrees(center_lon), degrees(center_lat)


def gc_distance_points(a, points):
    b = np.asarray(points)
    lat1, lng1 = radians(a[1]), radians(a[0])
    lat2, lng2 = np.radians(b[:, 1]), np.radians(b[:, 0])

    sin_lat1, cos_lat1 = sin(lat1), cos(lat1)
    sin_lat2, cos_lat2 = np.sin(lat2), np.cos(lat2)

    delta_lng = np.subtract(lng2, lng1)
    cos_delta_lng, sin_delta_lng = np.cos(delta_lng), np.sin(delta_lng)

    d = np.arctan2(np.sqrt((cos_lat2 * sin_delta_lng) ** 2 +
                 (cos_lat1 * sin_lat2 -
                  sin_lat1 * cos_lat2 * cos_delta_lng) ** 2),
                 sin_lat1 * sin_lat2 + cos_lat1 * cos_lat2 * cos_delta_lng)

    return RADIUS * d


def gc_dist(a, b):
        from math import radians, sin, cos, sqrt, atan2
        lat1, lng1 = radians(a[1]), radians(a[0])
        lat2, lng2 = radians(b[1]), radians(b[0])

        sin_lat1, cos_lat1 = sin(lat1), cos(lat1)
        sin_lat2, cos_lat2 = sin(lat2), cos(lat2)

        delta_lng = lng2 - lng1
        cos_delta_lng, sin_delta_lng = cos(delta_lng), sin(delta_lng)

        d = atan2(sqrt((cos_lat2 * sin_delta_lng) ** 2 +
                       (cos_lat1 * sin_lat2 -
                        sin_lat1 * cos_lat2 * cos_delta_lng) ** 2),
                  sin_lat1 * sin_lat2 + cos_lat1 * cos_lat2 * cos_delta_lng)

        return RADIUS * d


if __name__ == '__main__':

    import random
    random.seed(42)
    pts = [(random.uniform(-180, 180), random.uniform(-90, 90)) for _ in range(1000000)]
    
    centr = get_centroid(pts)

    import cProfile

    cProfile.runctx('gc_distance_points(centr, pts)', globals(), locals())
    cProfile.runctx('dists = [gc_dist(centr, b) for b in pts]', globals(), locals())

All code is also available at Github. Feel free to post a comment or email in case you have any questions.

Adding support for sftp to RCurl

While trying to download data from an FTP with the sftp protocol from R I encountered the folllowing error on my mac:

Error in function (type, msg, asError = TRUE)  : 
  Protocol "sftp" not supported or disabled in libcurl

The easiest way to fix this for me was performing the following steps:

brew install curl --with-libssh2

Then check whether sftp is now among the list of supported protocols:

/usr/local/opt/curl/bin/curl -V

Link to the new version of curl you installed:

brew link --force curl

Re-install RCurl from source:

install.packages("RCurl", type = "source")

And finally check if sftp is now supported:

RCurl::curlVersion()

A new fizzbuzz: generating sums for kids

With my oldest quizzing me for one sum after another for him to solve. I thought about writing a little piece of R code that would generate these for me. The code for doing so was less obvious then I imagined it to be.

Goal:

Generate all possible sums and subtractions for kids of the form

10 - 4 = ?
4 + 3 = ?

Constraints:

- Only numbers between 0-10
- Result should be >= 0

Solution:

generate_sums <- function(maxresult=10) {
  sums <- c()
  for (i in 0:(maxresult-1)) {
    for (j in 0:(maxresult-i)) {
      sums <- c(sums, paste0(i, ' + ', j, ' = ?'))
    }
  }
  sample(sums[!duplicated(sums)])
}
generate_subtractions <- function(maxresult=10) {
  substractions <- c()
  for (i in 0:maxresult) {
    for (j in i:maxresult) {
      substractions <- c(substractions, paste0(j, ' - ', i, ' = ?'))
    }
  }
  sample(substractions[!duplicated(substractions)])
}

I still believe that a more elegant approach should be possible but at least it works.

Fetching all records for a taxonomic group in OBIS


For example, if you want to download the records for all Nudibranchia then in R it is as simple as doing:

    install.packages("robis")

    data <- robis::occurrence("Nudibranchia")

    # get species data
    spdata <- data[!is.na(data$taxonRank) & data$taxonRank == "species",]

    # plot on a map
    robis::leafletmap(spdata)


Alternatively, you can download all data with the new mapper that is being developed:

http://iobis.org/mapper2?scientificname=Nudibranchia&download=true


Error in linbin2D: (list) object cannot be coerced to type 'double'

When calling bkde2D you might encounter the following error:

Error in linbin2D(x, gpoints1, gpoints2) : 
  (list) object cannot be coerced to type 'double'

This issue can manifest itself when the passed in data is a tibble instead of a data.frame.

Easy fix:

data <- as.data.frame(data)