Fetching all records for a taxonomic group in OBIS

For example, if you want to download the records for all Nudibranchia then in R it is as simple as doing:


    data <- robis::occurrence("Nudibranchia")

    # get species data
    spdata <- data[!is.na(data$taxonRank) & data$taxonRank == "species",]

    # plot on a map

Alternatively, you can download all data with the new mapper that is being developed:


Error in linbin2D: (list) object cannot be coerced to type 'double'

When calling bkde2D you might encounter the following error:

Error in linbin2D(x, gpoints1, gpoints2) : 
  (list) object cannot be coerced to type 'double'

This issue can manifest itself when the passed in data is a tibble instead of a data.frame.

Easy fix:

data <- as.data.frame(data)

Speeding up mean and standard deviation calculation for a stack of rasters in R

While modelling the distribution of invasive seaweeds during my PhD I wanted to calculate the mean and standard deviation of a set of SDMs that appeared to be equivalent in for the current climate but who gave different results for the future climate. However, the standard naive approach for calculating the mean and standard deviation of a RasterStack was rather slow. I managed to get a 20 fold speed up by calculating the mean and standard deviation by aggregating them in a stream. This rather obvious for calculating the mean, summing all rasters and then dividing by the number of rasters. For the standard deviation, the formula is only slightly more complicated and can be found both on Wikipedia and in code on StackOverflow. If your raster contains very large numbers, you might need to adapt this code to use Welford's method.

# Option 1: shorter but slower
rasterstack_meansd_slow <- function(x) {
mean <- raster::mean(x) sd <- raster::calc(x, sd)
  list(mean=mean, sd=sd)

# Option 2: faster but more code
rasterstack_meansd_fast <- function(x) {
  s0 <- nlayers(x)
  s1 <- raster(x, layer=1)
  s2 <- s1^2
  for(ri in 2:s0) {
    r <- raster(x, layer=ri)
    s1 <- s1 + r
    s2 <- s2 + r^2
  list(mean=s1/s0, sd=sqrt((s0 * s2 - s1 * s1)/(s0 * (s0 - 1))))

As a small example I calculate the mean and standard deviation of the sea surface temperature for different climate scenarios for 2050. The fast version only takes 10 seconds on my machine while the slow version takes 225 seconds to calculate the mean and standard deviation.


sstfuture <- load_layers(c('BO2_RCP26_2050_tempmean_ss', 'BO2_RCP45_2050_tempmean_ss', 
                           'BO2_RCP60_2050_tempmean_ss', 'BO2_RCP85_2050_tempmean_ss'))

system.time({ fast <- rasterstack_meansd_fast(sstfuture) }) # 10 seconds
system.time({ slow <- rasterstack_meansd_slow(sstfuture) }) # 225 seconds

Selecting comma separated data as multiple rows with SQLite

A while back I needed to split data stored in one column as a comma separated string into multiple rows in a SQL query from a SQLite database.

My table looked like this:

CREATE TABLE "predictor_sets" 
  (`id` INTEGER, `nvar` INTEGER, `predictors` TEXT, 
    `experiment` TEXT, PRIMARY KEY(`id`));

Insert some sample data:

INSERT INTO predictor_sets VALUES
  (1659, 5, 'BO_bathymax,BO_calcite,BO_parmax,BO_damean,BO_sstrange', 'bathymetry'),
  (1660, 5, 'BO_bathymin,BO_calcite,BO_parmax,BO_damean,BO_sstrange', 'bathymetry'),
  (1661, 5, 'BO_bathymean,BO_calcite,BO_parmax,BO_damean,BO_sstrange', 'bathymetry');

Splitting up the different predictors in seperate rows can be done with the following recursive common table expression:

WITH RECURSIVE split(predictorset_id, predictor_name, rest) AS (
  SELECT id, '', predictors || ',' FROM predictor_sets WHERE id
  SELECT predictorset_id, 
         substr(rest, 0, instr(rest, ',')),
         substr(rest, instr(rest, ',')+1)
    FROM split
   WHERE rest <> '')
SELECT predictorset_id, predictor_name 
  FROM split 
 WHERE predictor_name <> ''
 ORDER BY predictorset_id, predictor_name;

Check out the documentation for more info on writing your own common table expressions in SQLite, PostgreSQL or your favorite database.