Speeding up mean and standard deviation calculation for a stack of rasters in R

While modelling the distribution of invasive seaweeds during my PhD I wanted to calculate the mean and standard deviation of a set of SDMs that appeared to be equivalent in for the current climate but who gave different results for the future climate. However, the standard naive approach for calculating the mean and standard deviation of a RasterStack was rather slow. I managed to get a 20 fold speed up by calculating the mean and standard deviation by aggregating them in a stream. This rather obvious for calculating the mean, summing all rasters and then dividing by the number of rasters. For the standard deviation, the formula is only slightly more complicated and can be found both on Wikipedia and in code on StackOverflow. If your raster contains very large numbers, you might need to adapt this code to use Welford's method.

# Option 1: shorter but slower
rasterstack_meansd_slow <- function(x) {
mean <- raster::mean(x) sd <- raster::calc(x, sd)
  list(mean=mean, sd=sd)
}

# Option 2: faster but more code
rasterstack_meansd_fast <- function(x) {
  s0 <- nlayers(x)
  s1 <- raster(x, layer=1)
  s2 <- s1^2
  for(ri in 2:s0) {
    r <- raster(x, layer=ri)
    s1 <- s1 + r
    s2 <- s2 + r^2
  }
  list(mean=s1/s0, sd=sqrt((s0 * s2 - s1 * s1)/(s0 * (s0 - 1))))
}

As a small example I calculate the mean and standard deviation of the sea surface temperature for different climate scenarios for 2050. The fast version only takes 10 seconds on my machine while the slow version takes 225 seconds to calculate the mean and standard deviation.


library(sdmpredictors)

sstfuture <- load_layers(c('BO2_RCP26_2050_tempmean_ss', 'BO2_RCP45_2050_tempmean_ss', 
                           'BO2_RCP60_2050_tempmean_ss', 'BO2_RCP85_2050_tempmean_ss'))

system.time({ fast <- rasterstack_meansd_fast(sstfuture) }) # 10 seconds
system.time({ slow <- rasterstack_meansd_slow(sstfuture) }) # 225 seconds

2 comments:

Monika said...

Hi there,
what if there are NAs in the raster(s)?

Samuel Bosch said...

Then the resulting mean value will be NA, this for me ok as I was averaging out different models which had the same cells with value NA.