This document is a package vignette for ggBRT designed to explore and visualise the results of boosted regression trees fitted with the gbm.step routine in the dismo package. The vignette shows the use of every function in the package by using data from Jouffray et al. (2019), for which ggBRT was originally created.
The tutorial builds on data from 620 sites across the Hawaiian islands classified into four reef regimes. Each site is associated to a set of 20 biophysical and anthropogenic predictors. We use boosted regression trees (Elith et al. 2008) to model the occurence of each regime at each site based on those predictors. We start by computing boosted regression trees (BRT) with the gbm.step routine and then provide examples for every function included in ggBRT.
# Load ggBRT
library(ggBRT)
# Import data
dat<-Hawaii_RegimesPredictors
PredictorsCat<-Hawaii_PredictorsCategories
# Select the set of 20 predictors
pred<-c(15:34)
# Run the boosted regression trees analyses
brt1 <- gbm.step(data=dat, gbm.x = pred, gbm.y = which(names(dat) == 'Regime1'),
family = "bernoulli", tree.complexity = 3,
learning.rate = 0.01, bag.fraction = 0.75)
brt2 <- gbm.step(data=dat, gbm.x = pred, gbm.y = which(names(dat) == 'Regime2'),
family = "bernoulli", tree.complexity = 3,
learning.rate = 0.01, bag.fraction = 0.75)
brt3 <- gbm.step(data=dat, gbm.x = pred, gbm.y = which(names(dat) == 'Regime3'),
family = "bernoulli", tree.complexity = 3,
learning.rate = 0.01, bag.fraction = 0.75)
Function to extract the models performance of one or several boosted regression trees. Returns a data frame with mean total deviance, mean residual deviance, correlation, area under the receiver operating curve (AUC), percentage deviance explained, cross-validated deviance, cross-validated correlation, cross-validated AUC and cross-validated percentage deviance explained, calculated with the formula 1-(cross-validated deviance/Total.Deviance).
ggPerformance(Regime1=brt1,Regime2=brt2,Regime3=brt3)
#> Regime1 Regime2 Regime3
#> Total.Deviance 1.1778978 1.1809989 1.0801094
#> Residual.Deviance 0.4217460 0.5241492 0.3731044
#> Correlation 0.8551102 0.8088271 0.8657369
#> AUC 0.9751000 0.9572000 0.9837000
#> Per.Expl 64.1950265 55.6181480 65.4567942
#> cvDeviance 0.6947494 0.7500610 0.6546351
#> cvCorrelation 0.6896335 0.6542699 0.6577353
#> cvAUC 0.8966000 0.8818000 0.9008700
#> cvPer.Expl 41.0178546 36.4892719 39.3917820
Function to return a data frame with the relative influence of each predictor in the boosted regression trees and a bar plot. If signif= TRUE
, retains only the “significant” predictors (100/number of predictors, Muller et al. 2013) and rescale their influence to 100. There are several parameters to customise the aesthetics of the plot, see ?ggInfluence
.
ggInfluence(brt1)
#> rel.inf
#> Fishing_NonComm_Boat_Total 21.51630207
#> Complexity 20.06472410
#> Fishing_Comm_Total 11.60439750
#> Depth 8.84157882
#> Effluent 5.51566196
#> WAV_ANOM_F 4.88705511
#> WAV_CLIM_M 3.56151977
#> CHL_CLIM_M 3.46760599
#> PAR_STD 3.39546441
#> Fishing_NonComm_Shore_Line 2.62461046
#> SST_STD 2.59124138
#> Fishing_NonComm_Shore_Spear 2.48290879
#> Fishing_NonComm_Shore_Net 2.33254295
#> SST_CLIM_M 1.97082226
#> CHL_ANOM_F 1.87677332
#> Sedimentation 1.74553360
#> PAR_CLIM_M 0.80488729
#> New_Development 0.67903956
#> Invasive_Algae 0.02503618
#> Habitat_Modification 0.01229448
ggInfluence(brt1, signif=T,col.bar = "darkred",main = "Relative influence - Regime 1")
#> rel.inf
#> Fishing_NonComm_Boat_Total 31.855868
#> Complexity 29.706741
#> Fishing_Comm_Total 17.180841
#> Depth 13.090361
#> Effluent 8.166189
Function to return and plot (heatmap) the relative influence of predictors from several boosted regression trees. The models need to have the same set of predictors. There are several parameters to customise the aesthetics of the plot, see ?ggMultiInfluence
.
ggMultiInfluence(Regime1=brt1,Regime2=brt2,Regime3=brt3)
#> Predictor Regime1 Regime2 Regime3
#> 1 Fishing_NonComm_Boat_Total 21.5163 7.0412 4.282
#> 2 Complexity 20.0647 8.4717 7.623
#> 3 Fishing_Comm_Total 11.6044 2.5704 2.878
#> 4 Depth 8.8416 8.6967 8.872
#> 5 Effluent 5.5157 1.8023 2.630
#> 6 WAV_ANOM_F 4.8871 1.7214 3.590
#> 7 WAV_CLIM_M 3.5615 36.4942 8.220
#> 8 CHL_CLIM_M 3.4676 0.8984 13.556
#> 9 PAR_STD 3.3955 3.5059 8.998
#> 10 Fishing_NonComm_Shore_Line 2.6246 0.8620 2.162
#> 11 SST_STD 2.5912 7.8055 2.886
#> 12 Fishing_NonComm_Shore_Spear 2.4829 4.2890 4.834
#> 13 Fishing_NonComm_Shore_Net 2.3325 0.4240 1.029
#> 14 SST_CLIM_M 1.9708 9.3312 6.521
#> 15 CHL_ANOM_F 1.8768 1.7004 7.659
#> 16 Sedimentation 1.7455 0.9092 0.881
#> 17 PAR_CLIM_M 0.8049 1.5126 6.902
#> 18 New_Development 0.6790 1.9322 5.650
#> 19 Invasive_Algae 0.0250 0.0319 0.827
#> 20 Habitat_Modification 0.0123 0.0000 0.000
Function to return the relative influence per category of predictors. If signif= TRUE
, retains only the “significant” predictors (100/number of predictors, Muller et al. 2013) and rescale their influence to 100.
# all predictors
ggCatInfluence(brt1,brt2,brt3,category = "Category",data=PredictorsCat)
#> Category Model1 Model2 Model3
#> 1 Anthropogenic 48.53833 19.86211 25.17297
#> 2 Biophysical 51.46167 80.13789 74.82703
# significant predictors
ggCatInfluence(brt1,brt2,brt3,category = "Category",data=PredictorsCat, signif=T)
#> Category Model1 Model2 Model3
#> 1 Anthropogenic 57.2029 9.045641 7.634679
#> 2 Biophysical 42.7971 90.954359 92.365321
Function to draw partial dependency plots with fitted function. There are several parameters to customise the aesthetics of the plot: see ?ggPD
.
ggPD(brt1,rug = T)
Function to draw partial dependency plots with fitted values. There are several parameters to customise the aesthetics of the plot: see ?ggPDfit
.
ggPDfit(brt1)
Function to draw partial dependency plots with fitted function and confidence intervals obtained through bootstrapping (with replacement). There are several parameters to customise the aesthetics of the plot: see ?ggPDboot
.
# Boostrap the BRT 1000 times to build confidence intervals
brt1.prerun<- plot.gbm.4list(brt1)
brt1.boot <- gbm.bootstrap.functions(brt1, list.predictors=brt1.prerun, n.reps=100)
# Draw partial dependency plots a given predictor (i.e., "Complexity")
ggPD_boot(brt1, predictor="Complexity", list.4.preds=brt1.prerun, booted.preds=brt1.boot$function.preds, type.ci = "ribbon",rug = T)
Function to draw partial dependency plots with fitted functions from several boosted regression trees. The models need to have the same set of predictors. There are several parameters to customise the aesthetics of the plot: see ?ggMultiPD
.
ggMultiPD(brt1, brt3, predictor = "Complexity", legend.pos = "top", col.lines =c("blue","red"))
Function to draw partial dependency plots with fitted values from several boosted regression trees. The models need to have the same set of predictors. There are several parameters to customise the aesthetics of the plot: see ?ggMultiPDfit
.
ggMultiPDfit(brt1, brt3, predictor = "Complexity", legend.pos = "top", col.dots =c("blue","red"), cex.dot=1.5, cex.smooth = 0.8)
Function to return the interactions sizes for a given boosted regression trees.
ggInteract_list(brt2,index = F)
#> var1.names var2.names int.size
#> 1 WAV_CLIM_M SST_STD 64.82
#> 2 Depth WAV_CLIM_M 18.51
#> 3 SST_STD SST_CLIM_M 18.17
#> 4 Complexity SST_CLIM_M 9.96
#> 5 Depth SST_STD 7.55
#> 6 WAV_CLIM_M Fishing_NonComm_Boat_Total 7.17
#> 7 WAV_CLIM_M SST_CLIM_M 6.31
#> 8 WAV_CLIM_M PAR_STD 2.08
#> 9 Complexity WAV_CLIM_M 2.07
#> 10 Complexity Fishing_NonComm_Shore_Spear 1.52
#> 11 PAR_STD PAR_CLIM_M 1.19
#> 12 Depth SST_CLIM_M 0.87
#> 13 WAV_CLIM_M CHL_ANOM_F 0.84
#> 14 SST_CLIM_M Fishing_NonComm_Boat_Total 0.78
#> 15 Complexity SST_STD 0.71
#> 16 PAR_STD SST_STD 0.71
#> 17 WAV_CLIM_M Fishing_Comm_Total 0.57
#> 18 Depth PAR_STD 0.46
#> 19 WAV_ANOM_F WAV_CLIM_M 0.31
#> 20 WAV_CLIM_M Effluent 0.31
Function to randomize the response n times to generate a distribution under the null hypothesis of no interaction among predictors for one or several pairs of predictors (following Pinsky and Byler 2015). Model parameters should be the same as for the observed interaction strength. Results allow to compare with observed interaction strength.
# Randomization of the response to test significance of the two strongest interactions
Interact_boot_brt2<-ggInteract_boot(c('WAV_CLIM_M','SST_STD'),c('Depth','WAV_CLIM_M'),nboots = 10,data=dat, predictors = pred, response=which(names(dat) == 'Regime2'),family = "bernoulli", tc = 5, lr = 0.001, bf= 0.75,global.env=F)
Function to draw 2D interaction plot showing predicted values for two predictors from a boosted regression trees. Values for all other variables are set at their mean by default. There are several parameters to customise the aesthetics of the plot: see ?ggInteract_2D
.
ggInteract_2D(gbm.object = brt2,x="WAV_CLIM_M",y="SST_STD",col.gradient = c("white","#5576AF"),show.dot = T,col.dot = "grey20",alpha.dot = 0.5,cex.dot = 0.2,label.contour = T,col.contour = "#254376",show.axis = T,legend = T)
#> maximum value = 0.42
Function to draw 3D interaction plot showing predicted values for two predictors from a boosted regression trees. Values for all other variables are set at their mean by default. Possibility to export an interactive version of the plot in html format. There are several parameters to customise the aesthetics of the plot: see ?ggInteract_3D
.
ggInteract_3D(brt2,x="WAV_CLIM_M",y="SST_STD")
#> maximum value = 0.42
Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802-813.
Jouffray JB, Wedding L.M., Norstrom A.V., Donovan M.K., Williams G.J., Crowder L.B., Erickson A.L., Friedlander A.M., Graham N.A.J., Gove J.M., Kappel C.V., Kittinger J.N., Lecky J., Oleson K.L.L., Selkoe K.A., White C., Williams I.D., Nystrom M. (2019). Parsing Human and Biophysical Drivers of Coral Reef Regimes. Proc. R. Soc. B.
Müller, D., Leitão, P. J., & Sikor, T. (2013). Comparing the determinants of cropland abandonment in Albania and Romania using boosted regression trees. Agricultural Systems, 117, 66-77.
Pinsky, M. L., & Byler, D. (2015). Fishing, fast growth and climate variability increase the risk of collapse. In Proc. R. Soc. B (Vol. 282, No. 1813, p. 20151053).