Showing posts with label BIOMOD. Show all posts
Showing posts with label BIOMOD. Show all posts

Geographic null models for species distribution modeling: An implementation combining BIOMOD2 and dismo [Code Dump]

Warning this is more of a code dump instead of a blogpost but I'm still putting it out there hoping it's still useful for someone else as it has been sitting in my draft folder for far too long. Anyways the paper by Robert Hijmans is really worth reading if you're interested in the evaluation of species distribution models. If you have questions, don't hesitate to contact me.

Geographic null model from paper by Robert Hijsmans (link to paper).
package dismo (maintained by Robert J. Hijmans). and the package BIOMOD2.

Load libraries:

library("biomod2")
library("dismo")

The null_model_evaluation function takes as input a BIOMOD2 model.

null_model_evaluation <- function(models) {
  ## flow
  ## 1) create geographic null model from the points used as training data
  ## 2) compare AUC of the model with AUC of the geographic null model (use same presence/absence data)
  
  ## extract test data from models
  data <- get_formal_data(models)
  calib.lines <- get(load(models@calib.lines@link))
  nbOfRuns <- (length(calib.lines) / length(data@data.species))
  calib.lines.mat <- as.matrix(calib.lines, nrow=length(data@data.species), ncol=nbOfRuns)

  evaluations <- get_evaluations(models)
  dims <- dimnames(evaluations)
  rocIndex <- which(dims[1][[1]] == "ROC")
  testingDataIndex <- which( dims[2][[1]] == "Testing.data")
  
  species.data <- data@coord[which(data@data.species == 1),]
  background.data <- data@coord[is.na(data@data.species),]
  
  result <- array(dim=c(nbOfRuns, 2, length(dims[3][[1]])), dimnames=list(dims[4][[1]],c("model", "geo_null_model"), dims[3][[1]]))
  
  ## calculate geographic null models
  for( methodIndex in 1:length(dims[3][[1]])){
    for (runIndex in 1:nbOfRuns){
      model.auc <- evaluations[rocIndex, testingDataIndex,methodIndex,runIndex,1]
      result[runIndex,1,methodIndex] <- model.auc
      
      train.species.data <- species.data[calib.lines[1:nrow(species.data),runIndex],]
      test.species.data <- species.data[!calib.lines[1:nrow(species.data),runIndex],]
      if (nrow(test.species.data) > 0 & !is.na(model.auc)){
        train.background.data <- background.data[calib.lines[nrow(species.data)+1:nrow(calib.lines)-nrow(species.data),runIndex],]
        test.background.data <- background.data[!calib.lines[nrow(species.data)+1:nrow(calib.lines)-nrow(species.data),runIndex],]
        gd <- geoDist(train.species.data, lonlat=TRUE)
        #p <- predict(gd, env)
        #plot(p)
        #points(train.background.data, pch=16, col="purple")
        #points(test.background.data, pch=16, col="blue")
        #points(train.species.data, pch=16, col="green")
        #points(test.species.data, pch=16, col="red")
        gd.evaluation <- evaluate(gd, p=test.species.data, a=test.background.data)
        result[runIndex,2,methodIndex] <- gd.evaluation@auc
      }
    }
  }
  result
}



BIOMOD2: invalid term in model formula

While trying out BIOMOD2 for species distribution modeling I encountered the following error when trying to create a random forest (RF) model:

Error in terms.formula(formula, data = data) : 
  invalid term in model formula
Error in predict(model.bm, Data[, expl_var_names, drop = FALSE], on_0_1000 = TRUE) : 
  error in evaluating the argument 'object' in selecting a method for function 'predict': Error: object 'model.bm' not found

After lots of digging in the source code of BIOMOD I discovered what happened and how to fix it.

The line where the error was occurring looked like this:

model.sp <- try(randomForest(formula = makeFormula(resp_name, 
    head(Data), "simple", 0), data = Data[calibLines, 
    ], ntree = Options@RF$ntree, importance = FALSE, 
    norm.votes = TRUE, strata = factor(c(0, 1)), 
    nodesize = Options@RF$nodesize, maxnodes = Options@RF$maxnodes))

As you can see a formula is created from our data. The generated formula looked like this:

654987634 ~ 1 + nitrate_mean + par_mean + sst_mean + sst_range


The number 654987634 is my response variable name that I set when calling BIOMOD_FormatingData which apparently has been added to the Data data.frame as the first column. 


Solution: adding a letter prefix to my response variable name fixed my issue.

Lessons learned:
  1. It's incredibly useful that you can take a look at external source code in an RSession by typing its name, for iternals you add the package name and 3 double dots e.g. biomod2:::.Biomod.Models. Generics functions can be fetched with getMethod e.g. getMethod(biomod2::predict, signature="RF_biomod2_model")
  2. Read the docs very carefully: this is the description for resp.name: response variable name (character). The species name.
  3. An assertion in BIOMOD_FormatingData would have saved me an hour or two. But note that a naive check like is.character(resp.name) wouldn't work in this case, because once the formula is made it isn't a character anymore.
  4. Adding more diagnostic information in your error messages saves people time. If the generated formula would have been printed as well then I would probably have noticed the issue directly.
  5. Libraries make you win time, but you lose some as well.
If you liked this content then don't forget to subscribe on the top left.