BIOMOD2: invalid term in model formula

While trying out BIOMOD2 for species distribution modeling, I ran into the following error when creating a random forest (RF) model:

Error in terms.formula(formula, data = data) : 
  invalid term in model formula
Error in predict(, Data[, expl_var_names, drop = FALSE], on_0_1000 = TRUE) : 
  error in evaluating the argument 'object' in selecting a method for function 'predict': Error: object '' not found

After a lot of digging in the BIOMOD source code, I discovered what happened and how to fix it.

The line where the error was occurring looked like this:

model.sp <- try(randomForest(formula = makeFormula(resp_name, 
    head(Data), "simple", 0), data = Data[calibLines, 
    ], ntree = Options@RF$ntree, importance = FALSE, 
    norm.votes = TRUE, strata = factor(c(0, 1)), 
    nodesize = Options@RF$nodesize, maxnodes = Options@RF$maxnodes))

As you can see, a formula is created from our data. The generated formula looked like this:

654987634 ~ 1 + nitrate_mean + par_mean + sst_mean + sst_range

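The number 654987634 is the response variable name that I passed to BIOMOD_FormatingData, which apparently gets added to the Data data.frame as the name of the first column. Because the name consists only of digits, R parses it in the formula as a numeric constant rather than as a column name, and terms.formula rejects it.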
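To see why the formula is rejected, it helps to look at how R parses it. The snippet below is a minimal sketch in base R that does not involve biomod2 at all; it simply rebuilds the formula shown above from a string and inspects its left-hand side.

```r
# Rebuild the generated formula from text, using the numeric response name.
f <- as.formula("654987634 ~ 1 + nitrate_mean + par_mean + sst_mean + sst_range")

# The left-hand side is parsed as a numeric constant, not as a variable name.
lhs <- f[[2]]
class(lhs)     # "numeric"
is.name(lhs)   # FALSE

# As a result, the response is missing from the variables of the formula.
all.vars(f)
# [1] "nitrate_mean" "par_mean"     "sst_mean"     "sst_range"
```

A response name that starts with a letter would show up as the first element of all.vars(f) instead.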

Solution: adding a letter prefix to my response variable name fixed my issue.
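As a rough illustration of the fix, here is a small base R sketch. The prefix "sp" and the shortened list of predictors are my own choices for the example, not something biomod2 prescribes.

```r
resp_name <- "654987634"

# Hypothetical prefix: any leading letter makes the name syntactically valid.
safe_name <- paste0("sp", resp_name)

# Build a formula with the prefixed name (predictor list shortened for brevity).
f <- as.formula(paste(safe_name, "~ 1 + nitrate_mean + sst_mean"))

is.name(f[[2]])    # TRUE: the response is now a symbol
all.vars(f)[1]     # "sp654987634"

# terms() accepts the formula and recognises the response.
tt <- terms(f)
attr(tt, "response")   # 1
```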

Lessons learned:
  1. It's incredibly useful that you can inspect a package's source code from an R session by typing a function's name. For internals, add the package name and three colons, e.g. biomod2:::.Biomod.Models. Methods of generic functions can be fetched with getMethod, e.g. getMethod(biomod2::predict, signature="RF_biomod2_model").
  2. Read the docs very carefully. This is the description of the response variable name: "response variable name (character). The species name."
  3. An assertion in BIOMOD_FormatingData would have saved me an hour or two. But note that a naive check like is.character() wouldn't catch this case, because the name is still a character string at that point; it only turns into a number once the formula is built from it.
  4. Adding more diagnostic information to your error messages saves people time. If the generated formula had been printed as well, I would probably have noticed the issue right away.
  5. Libraries save you time, but they cost you some as well.
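
Lessons 3 and 4 can be sketched in a few lines of base R. Both helpers below (check_resp_name and build_terms) are hypothetical names that I made up for illustration; they are not part of biomod2, and this is only one possible way to do it.

```r
# Hypothetical assertion (lesson 3): reject names that are not syntactically valid.
check_resp_name <- function(resp_name) {
  stopifnot(is.character(resp_name), length(resp_name) == 1L)
  if (!identical(make.names(resp_name), resp_name)) {
    stop(sprintf("'%s' is not a syntactically valid name; try '%s'",
                 resp_name, make.names(resp_name)), call. = FALSE)
  }
  invisible(resp_name)
}

# Hypothetical wrapper (lesson 4): include the formula text in the error message.
build_terms <- function(formula_text) {
  tryCatch(
    terms(as.formula(formula_text)),
    error = function(e) {
      stop("could not use formula '", formula_text, "': ",
           conditionMessage(e), call. = FALSE)
    }
  )
}

check_resp_name("sp654987634")               # passes silently
res <- try(check_resp_name("654987634"), silent = TRUE)
cat(res)                                     # message suggests "X654987634"
```

With a wrapper like build_terms, the failing formula would have appeared in the error message itself, which is the information I was missing.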

1 comment:

Pieter P. said...

The docs aren't really helping, as "12345" is in fact of type character and therefore not illegal. FormatingData already does some name checking and even conversion; maybe a check for syntactically valid names should be added:

> resp <- c("12345", "ABCDE")
> all(make.names(resp) == resp)
[1] FALSE
> resp <- make.names(resp)
> all(make.names(resp) == resp)
[1] TRUE
> resp
[1] "X12345" "ABCDE"