Analysis of Mengsong Longhorn data using Hierarchical Modelling of Species Communities

All the species have zero-inflated distributions, so we apply a hurdle approach in which we define a presence-absence (probit) model and a conditional-on-presence abundance model. These two models are statistically independent of one another, so it is interesting to investigate to what extent the same factors determine presence-absence and abundance conditional on presence. With the zeros removed, all the abundance distributions still resembled a lognormal Poisson (reverse-J-shaped count histogram, with large numbers of low counts and a long tail of high counts), so this is the distribution we use.
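The hurdle decomposition can be illustrated with a minimal sketch. The counts below are hypothetical, and the function name `hurdle_split` is ours, not part of any library; the only point is to show how one count vector yields the two responses.

```python
import math

def hurdle_split(counts):
    """Split one species' counts into the two hurdle responses."""
    # Response 1: presence-absence (1 if at least one individual was caught).
    presence = [1 if c > 0 else 0 for c in counts]
    # Response 2: abundance conditional on presence; zeros become missing (NaN).
    conditional = [float(c) if c > 0 else math.nan for c in counts]
    return presence, conditional

# Hypothetical trap counts for a single species (not taken from the data set).
counts = [0, 3, 0, 1, 12, 0]
presence, conditional = hurdle_split(counts)
print(presence)
print(conditional)
```

The number of non-missing values in the conditional response always equals the number of presences, which is why the abundance model has less information for rarer species.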

We now consider the random structure for the model. As there are two height samples per plot, we should have a plot-level random intercept. In addition, we want to investigate species-to-species associations, so we should include a sample-level random intercept (each species count is a separate sample from the same trap). Species-to-species associations could represent a response to some important factor not included in the model, or they could represent species interactions such as competition or facilitation.

The model we developed related the Longhorn trap captures to potential determining factors: trap height, plot elevation, plot slope, and plot-level Leaf Area Index (LAI). Plot-level Basal Area was significantly correlated with LAI (Pearson’s r > 0.5), so we dropped Basal Area from the analysis. The initial model also included all two-way interactions. Model simplification involved removing parameters that had weak support across species and then comparing the predictive R^2 across species based on 5-fold cross-validation.
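The collinearity screen described above can be sketched as follows. The plot values are hypothetical and the helper names (`pearson_r`, `collinear_pairs`) are ours; the 0.5 threshold is the one stated in the text.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(predictors, threshold=0.5):
    """Return predictor pairs whose absolute correlation exceeds the threshold."""
    names = sorted(predictors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(predictors[a], predictors[b])
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical plot-level values (not taken from the Mengsong data).
plots = {
    "LAI": [2.1, 3.4, 4.0, 2.8, 5.1, 3.9],
    "BASAL_AREA": [18.0, 25.0, 31.0, 22.0, 40.0, 28.0],
    "SLOPE": [12.0, 30.0, 8.0, 25.0, 15.0, 20.0],
}
flagged = collinear_pairs(plots)
print(flagged)
```

With these illustrative values only the BASAL_AREA and LAI pair is flagged, so one of the two would be dropped before fitting.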

Probit model

The final incidence model was defined as follows, where Y is the presence-absence matrix for the 26 species with >9 individuals:

Y ~ Intercept + HEIGHT + ELEV + SLOPE + LAI + HEIGHT:ELEV + ELEV:LAI + (1|PLOT) + (1|SAMPLE)

We assumed a Probit distribution (similar to the Binomial, which is not implemented in this framework). We obtained 1000 samples from the posterior distribution for each of four chains, sampling every 20th iteration, and with the first 1000 samples discarded as burn-in.

Abundance model (lognormal poisson)

The final abundance model was defined as follows, where Y is the abundance-conditional-on-presence matrix for the 26 species with >9 individuals (i.e. Y[Y == 0] <- NA):

Y ~ Intercept + HEIGHT + ELEV + SLOPE + LAI + (1|PLOT) + (1|SAMPLE)

We assumed a lognormal Poisson distribution (similar to the negative binomial, which is not implemented in this framework). We plotted the species' abundances to examine which distribution best described the data before selecting the lognormal Poisson.

We obtained 1000 samples from the posterior distribution for each of four chains, sampling every 100th iteration, and with the first 5000 samples discarded as burn-in.
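As a rough guide to the computational cost of the two sampling schemes, the arithmetic can be sketched as below. This assumes the burn-in is counted in raw MCMC iterations rather than in thinned samples, which is our reading of the settings and not something stated explicitly; the function name `mcmc_budget` is hypothetical.

```python
def mcmc_budget(samples, thin, transient, n_chains):
    """Iterations implied by a sampling scheme.

    Assumption: `transient` (burn-in) is counted in raw iterations and
    every `thin`-th iteration after it is retained.
    """
    per_chain = transient + samples * thin
    return {
        "iterations_per_chain": per_chain,
        "total_iterations": per_chain * n_chains,
        "pooled_samples": samples * n_chains,
    }

# Settings reported in the text for the two models.
probit = mcmc_budget(samples=1000, thin=20, transient=1000, n_chains=4)
abundance = mcmc_budget(samples=1000, thin=100, transient=5000, n_chains=4)
print(probit)
print(abundance)
```

Under this assumption both models yield the same number of pooled posterior samples, but the abundance model runs five times as many iterations per chain, reflecting its slower mixing.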

Convergence diagnostics - Probit model

We checked the Effective Sample Size (ESS) and Potential Scale Reduction Factors (PSRF) for the fixed (Beta), trait (Gamma) and random (Omega) effects.

For all effects the ESS is high (near 4000, the theoretical maximum given 4 chains x 1000 samples) and the PSRF are near 1, so the diagnostics are good.
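To make the PSRF criterion concrete, the sketch below computes the basic Gelman-Rubin statistic for a single parameter from several chains. It is a simplified version: standard implementations apply additional corrections, so values will not match them exactly. The simulated chains are purely illustrative.

```python
import random
import statistics

def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor for one parameter."""
    n = len(chains[0])  # draws per chain (all chains assumed equal length)
    chain_means = [statistics.fmean(c) for c in chains]
    # Mean within-chain variance.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # Between-chain variance, scaled by the chain length.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

rng = random.Random(1)
# Four simulated chains drawing from the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Same chains, but the last one is shifted, as if stuck in another region.
stuck = [list(c) for c in mixed]
stuck[3] = [x + 3.0 for x in stuck[3]]

psrf_mixed = psrf(mixed)
psrf_stuck = psrf(stuck)
print(round(psrf_mixed, 3), round(psrf_stuck, 3))
```

Chains that agree give a value close to 1, whereas a chain exploring a different region inflates the between-chain variance and pushes the value well above 1.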

Convergence diagnostics - Lognormal Poisson

The ESS is poorer than for the presence-absence model, but the Beta and Gamma parameters are estimated with effective sample sizes >1000. Omega is less well sampled, but most values are still >1000, which is acceptable. The PSRF are near 1, which is good.

Model diagnostics - Probit model

Next we consider the RMSE, AUC and TjurR^2 for the statistical model (first block of output) and for model predictions using 5-fold cross-validation (second block of output).

## $RMSE
##  [1] 0.03738264 0.33666956 0.29942492 0.28998906 0.31889972 0.32048508
##  [7] 0.32288210 0.36507882 0.28733558 0.41767890 0.37390519 0.30690366
## [13] 0.39398894 0.36604643 0.36329282 0.36682273 0.42028607 0.31146595
## [19] 0.43065800 0.30864799 0.36845060 0.30492105 0.31371173 0.33182496
## [25] 0.28594237 0.23904362
## 
## $AUC
##  [1]        NA 0.8842975 0.7948718 0.9885057 0.9404467 0.9627792 0.9218750
##  [8] 0.8916667 0.9642857 0.8463158 0.9309524 0.9586207 0.8778468 0.9528536
## [15] 0.8906250 0.8904762 0.8321839 0.9419643 0.7841880 0.8880309 0.7873016
## [22] 0.9227799 0.8996139 0.8494208 0.9210526 0.9826389
## 
## $TjurR2
##  [1]       NaN 0.3637560 0.1144451 0.4609768 0.3871319 0.3650829 0.3751624
##  [8] 0.3812229 0.5324317 0.2176100 0.2283439 0.4386159 0.3128591 0.2427144
## [15] 0.2457139 0.2981793 0.1449588 0.5037532 0.1781346 0.2372772 0.1269533
## [22] 0.2231868 0.1909466 0.1177255 0.2154813 0.4629112
## $RMSE
##  [1] 0.05665705 0.40558780 0.33385356 0.43977047 0.44816322 0.46884088
##  [7] 0.37237020 0.43057174 0.36145290 0.49065054 0.47072612 0.44632146
## [13] 0.43122610 0.45110244 0.41379738 0.42443880 0.49158597 0.37157284
## [19] 0.50760513 0.36660147 0.41570769 0.42951040 0.36542594 0.39976926
## [25] 0.34639727 0.30921255
## 
## $AUC
##  [1]        NA 0.7327824 0.6051282 0.7356322 0.7245658 0.5880893 0.8151042
##  [8] 0.7979167 0.8973214 0.6168421 0.5642857 0.7425287 0.7805383 0.6253102
## [15] 0.7369792 0.7476190 0.5057471 0.8348214 0.6089744 0.7799228 0.4920635
## [22] 0.2393822 0.6679537 0.2779923 0.6096491 0.8680556
## 
## $TjurR2
##  [1]          NaN  0.224128082  0.023630009  0.162925502  0.132165098
##  [6]  0.044391200  0.278288188  0.271389898  0.410673720  0.060879717
## [11]  0.026233275  0.142474864  0.253382301  0.063252741  0.146039666
## [16]  0.175204434  0.005477916  0.399432005  0.038895036  0.104196334
## [21]  0.021767069 -0.115126862  0.065209197 -0.062589622  0.036823163
## [26]  0.299713440

TjurR2 is high for most species in the statistical model (range 11-52%). TjurR2 is poorer under cross-validation but still good for some species (Sp2, Sp4, Sp5, Sp7, Sp8, Sp9, Sp12, Sp13, Sp15, Sp16, Sp18, Sp20 and Sp26). For all other species the cross-validated TjurR2 is <10%.

Note that the parameters could not be estimated for the most abundant species. This is because it was present in nearly all samples so the Probit model was uninformative.
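For reference, Tjur's R2 is the mean predicted occurrence probability at sites where the species was present minus the mean at sites where it was absent. The sketch below uses hypothetical observations and predictions. Returning NaN when a species has no presences or no absences is a choice made in this sketch; it shows one situation in which the statistic is undefined, and we do not claim it is exactly what produced the NaN above.

```python
import math

def tjur_r2(observed, predicted):
    """Tjur's coefficient of discrimination for one species.

    observed: 0/1 presence-absence values
    predicted: predicted occurrence probabilities for the same samples
    """
    at_presences = [p for y, p in zip(observed, predicted) if y == 1]
    at_absences = [p for y, p in zip(observed, predicted) if y == 0]
    if not at_presences or not at_absences:
        # Undefined when one of the two classes is empty (sketch convention).
        return math.nan
    return (sum(at_presences) / len(at_presences)
            - sum(at_absences) / len(at_absences))

# Hypothetical data for illustration only.
observed = [1, 1, 0, 0, 1, 0]
predicted = [0.8, 0.6, 0.3, 0.1, 0.7, 0.2]
r2_example = tjur_r2(observed, predicted)
r2_all_present = tjur_r2([1, 1, 1], [0.9, 0.95, 0.99])
print(round(r2_example, 3), r2_all_present)
```

A value near 0 means the model assigns similar probabilities to occupied and unoccupied samples, and negative values (as for some species under cross-validation) mean it assigns higher probabilities to the absences.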

Model diagnostics - Lognormal Poisson

We now examine the RMSE and R2 of the statistical model (first block of output) and of the model predictions from 5-fold cross-validation (second block of output) for the lognormal Poisson model.

## $RMSE
##  [1] 14.6701284  2.9658591  7.8865138  4.3402884  6.8879824  4.1121043
##  [7]  5.9971149  3.7994938  2.6792982  1.6739970  1.4661018  2.8030736
## [13]  1.6940855  2.2225374  1.8271597  1.1347948  1.4793412  0.6734745
## [19]  0.6025217  0.7149377  0.7250983  0.4736526  0.8410159  0.8501699
## [25]  0.7485926  0.3251492
## 
## $SR2
##  [1] 0.67203738 0.93229948 0.80122250 0.82281944 0.22584935 0.77475115
##  [7] 0.51816562 0.52878653 0.67180527 0.60430961 0.36425399 0.59259834
## [13] 0.63500864 0.21308819 0.47437269 0.52829799 0.05092458 0.09955003
## [19] 0.24260652 0.03214286 0.45873016 0.89285714 0.64285714 0.43214286
## [25] 0.38095238 0.57142857
## 
## $O.AUC
##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [24] NA NA NA
## 
## $O.TjurR2
##  [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
## [18] NaN NaN NaN NaN NaN NaN NaN NaN NaN
## 
## $O.RMSE
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 
## $C.SR2
##  [1] 0.67203738 0.93229948 0.80122250 0.82281944 0.22584935 0.77475115
##  [7] 0.51816562 0.52878653 0.67180527 0.60430961 0.36425399 0.59259834
## [13] 0.63500864 0.21308819 0.47437269 0.52829799 0.05092458 0.09955003
## [19] 0.24260652 0.03214286 0.45873016 0.89285714 0.64285714 0.43214286
## [25] 0.38095238 0.57142857
## 
## $C.RMSE
##  [1] 14.6701284  2.9658591  7.8865138  4.3402884  6.8879824  4.1121043
##  [7]  5.9971149  3.7994938  2.6792982  1.6739970  1.4661018  2.8030736
## [13]  1.6940855  2.2225374  1.8271597  1.1347948  1.4793412  0.6734745
## [19]  0.6025217  0.7149377  0.7250983  0.4736526  0.8410159  0.8501699
## [25]  0.7485926  0.3251492
## $RMSE
##  [1] 26.8595775 12.1317089 13.4149576 13.6299532  8.6512394 10.2597545
##  [7]  7.8919027  5.1242898  5.8515252  4.8513248  2.2862911  4.3621035
## [13]  4.8679117  3.6664627  3.8947889  1.4402278  2.0085726  0.8708604
## [19]  0.8707304  1.1501093  1.1089216  0.8757301  1.6212410  1.2229929
## [25]  1.1276253  0.5789800
## 
## $SR2
##  [1]  2.968358e-01  2.809533e-01  3.650417e-01  6.124408e-01 -4.766549e-02
##  [6]  3.032537e-01  1.422189e-01  9.657842e-02  4.287878e-01  2.529064e-01
## [11]  1.868743e-02 -1.088251e-02  3.669172e-01 -1.808334e-01  9.675031e-02
## [16]  1.690192e-01 -9.529097e-02  9.074912e-04 -6.937760e-06 -3.571429e-01
## [21] -2.867063e-02  3.214286e-01 -1.984127e-03 -2.285714e-01 -1.152381e-01
## [26] -6.349206e-02
## 
## $O.AUC
##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [24] NA NA NA
## 
## $O.TjurR2
##  [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
## [18] NaN NaN NaN NaN NaN NaN NaN NaN NaN
## 
## $O.RMSE
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 
## $C.SR2
##  [1]  2.968358e-01  2.809533e-01  3.650417e-01  6.124408e-01 -4.766549e-02
##  [6]  3.032537e-01  1.422189e-01  9.657842e-02  4.287878e-01  2.529064e-01
## [11]  1.868743e-02 -1.088251e-02  3.669172e-01 -1.808334e-01  9.675031e-02
## [16]  1.690192e-01 -9.529097e-02  9.074912e-04 -6.937760e-06 -3.571429e-01
## [21] -2.867063e-02  3.214286e-01 -1.984127e-03 -2.285714e-01 -1.152381e-01
## [26] -6.349206e-02
## 
## $C.RMSE
##  [1] 26.8595775 12.1317089 13.4149576 13.6299532  8.6512394 10.2597545
##  [7]  7.8919027  5.1242898  5.8515252  4.8513248  2.2862911  4.3621035
## [13]  4.8679117  3.6664627  3.8947889  1.4402278  2.0085726  0.8708604
## [19]  0.8707304  1.1501093  1.1089216  0.8757301  1.6212410  1.2229929
## [25]  1.1276253  0.5789800

R2 values for the statistical model are high for most species, and R2 values under cross-validation are good for several species (Sp1, Sp2, Sp3, Sp4, Sp6, Sp7, Sp9, Sp10, Sp13, Sp16 and Sp22). As expected, the abundance model is not well estimated for most of the rarer species.

Final model simplification

To ensure the beta, gamma and omega estimates were based on species with reasonable R^2 values, and to simplify the graphical presentation of results, we removed all species with predictive TjurR^2 or R^2 of <0.10 from the Probit and Lognormal Poisson models, respectively. Hence, the two models have different sets of species.
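The filtering rule can be checked directly against the cross-validated values printed above. The sketch below copies the cross-validated TjurR2 and SR2 vectors from the output and keeps species at or above the 0.10 threshold; the `Sp1` to `Sp26` labels follow the position of each species in the output, and the function name `keep_species` is ours.

```python
import math

def keep_species(values, threshold=0.10):
    """Labels of species whose predictive fit is at least the threshold."""
    # NaN compares as False, so species without an estimate are dropped.
    return [f"Sp{i}" for i, v in enumerate(values, start=1) if v >= threshold]

# Cross-validated TjurR2 for the Probit model (copied from the output above).
cv_tjur = [
    math.nan, 0.224128082, 0.023630009, 0.162925502, 0.132165098,
    0.044391200, 0.278288188, 0.271389898, 0.410673720, 0.060879717,
    0.026233275, 0.142474864, 0.253382301, 0.063252741, 0.146039666,
    0.175204434, 0.005477916, 0.399432005, 0.038895036, 0.104196334,
    0.021767069, -0.115126862, 0.065209197, -0.062589622, 0.036823163,
    0.299713440,
]
# Cross-validated SR2 for the lognormal Poisson model (copied from the output).
cv_sr2 = [
    2.968358e-01, 2.809533e-01, 3.650417e-01, 6.124408e-01, -4.766549e-02,
    3.032537e-01, 1.422189e-01, 9.657842e-02, 4.287878e-01, 2.529064e-01,
    1.868743e-02, -1.088251e-02, 3.669172e-01, -1.808334e-01, 9.675031e-02,
    1.690192e-01, -9.529097e-02, 9.074912e-04, -6.937760e-06, -3.571429e-01,
    -2.867063e-02, 3.214286e-01, -1.984127e-03, -2.285714e-01, -1.152381e-01,
    -6.349206e-02,
]

probit_species = keep_species(cv_tjur)
abundance_species = keep_species(cv_sr2)
print(len(probit_species), probit_species)
print(len(abundance_species), abundance_species)
```

This retains 13 species for the Probit model and 11 for the Lognormal Poisson model, with only partial overlap between the two sets.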

Variance partitioning

Here we examine the variance explained by the different main effects in the models.

Random effects explained <10% of the variance in the Probit model but around 28% in the Lognormal Poisson model. HEIGHT, SLOPE and LAI were important in the Probit model, whereas ELEV was markedly more important in the abundance model; the other factors were less important there but still explained ~10% of the variance.

Examine the importance of estimated parameters across species (support >0.8)

We examine the beta coefficients for species with >0.8 support.
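The notion of support used here can be sketched as follows, assuming (as we understand the convention) that support is the posterior probability that a coefficient is positive, and that a coefficient is shown when this probability exceeds the level or falls below one minus the level. The posterior samples below are hypothetical and the function names are ours.

```python
def posterior_support(samples):
    """Proportion of posterior samples in which the coefficient is positive."""
    return sum(1 for s in samples if s > 0) / len(samples)

def supported_sign(samples, level):
    """+1 or -1 if the sign is supported at the given level, otherwise 0."""
    p_positive = posterior_support(samples)
    if p_positive > level:
        return 1
    if p_positive < 1 - level:
        return -1
    return 0

# Hypothetical posterior samples for two beta coefficients.
beta_a = [0.5] * 90 + [-0.5] * 10   # positive in 90% of samples
beta_b = [-1.0] * 97 + [1.0] * 3    # negative in 97% of samples

print(supported_sign(beta_a, 0.8), supported_sign(beta_a, 0.95))
print(supported_sign(beta_b, 0.8), supported_sign(beta_b, 0.95))
```

In this example the first coefficient counts as a positive association at the 0.8 level but drops out at 0.95, while the second is retained as a negative association at both levels.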

All the fixed effects were important in determining community composition, based on both incidence and abundance-conditional-on-presence.

There was no direct effect of HEIGHT in the incidence model but there was a positive interactive effect between HEIGHT and ELEV in several common species. In the abundance model 7 species were positively associated with HEIGHT.

ELEV was positively associated with five species in the incidence model, whereas in the abundance model it was positively associated with four species and negatively associated with three.

SLOPE was mostly positively associated with species’ incidences and abundances. However, Mego_cos was negatively associated with SLOPE in both models, and Xeno_bim was negatively associated with SLOPE in the abundance model.

LAI was positively associated with 7 species in the incidence model but there was also a negative interaction between ELEV and LAI for 11 species. In the abundance model 8 out of 11 species showed a negative association with LAI.

Examine the significance of the estimated parameters (support >0.95)

Here we restrict the beta coefficients to only species with >0.95 support.

When support is restricted to >0.95 in the incidence model, there is a substantial reduction in the number of coefficients and only SLOPE (5 species), LAI (2 species) and ELEV (1 species) are important. HEIGHT and the HEIGHT:ELEV interaction are no longer represented.

However, when support is restricted to >0.95 in the abundance model, a substantial number of beta coefficients are still represented: HEIGHT for six species, ELEV for three species, SLOPE for four species and LAI for three species.

Look at effect of subfamily

Subfamily may represent phylogenetically conserved traits at this taxonomic level that determine their responses to environmental factors. Hence, it is interesting to look at the overall responses at the Subfamily level. A similar analysis was not meaningful at the level of Tribe because there were too many Tribes represented (hence too few species per Tribe).

Cerambycinae is the baseline and hence is represented by the Intercept. Prioninae is represented by only one species (Mego_cos) in the models. Cerambycinae incidence was positively associated with SLOPE and, with less support, with LAI, and negatively associated with the interaction between ELEV and LAI. It was not associated with HEIGHT. Lamiinae incidence, however, did not show any strong associations with these environmental factors.

The abundance model produced somewhat different results. Cerambycinae abundance was positively associated with HEIGHT and SLOPE and negatively associated with LAI. As in the incidence model, Lamiinae abundance did not show any strong associations.

Look at species to species associations (support >0.8)

There were no strong associations among species at either the Plot or the Sample scale in the incidence model.

In the abundance model there are more associations among species at both the plot and sample levels. However, most of these do not have strong support. There is just one sample-level association with >0.95 support, between Mego_cos and Rhap_hor.

Community responses to environmental gradients

Species richness

## # weights:  3 (2 variable)
## initial  value 30.498476 
## final  value 30.498476 
## converged
## [1] 5e-04
## # weights:  3 (2 variable)
## initial  value 30.498476 
## final  value 30.498476 
## converged
## [1] 1
## # weights:  3 (2 variable)
## initial  value 30.498476 
## final  value 30.498476 
## converged
## [1] 0.01675

The incidence model shows a clear association between species richness and the environmental parameters. More species occurred in the canopy traps and in plots on steeper slopes, but fewer species occurred in plots at higher ELEV and with higher LAI.

Total abundance

## # weights:  3 (2 variable)
## initial  value 30.498476 
## final  value 30.498476 
## converged
## [1] 0.22775
## # weights:  3 (2 variable)
## initial  value 30.498476 
## final  value 30.498476 
## converged
## [1] 0.90875
## # weights:  3 (2 variable)
## initial  value 30.498476 
## final  value 30.498476 
## converged
## [1] 0.02875

The abundance model reveals that more individuals occur in the canopy traps, although the variance is higher than for incidence. For the other environmental variables the variance is high, and hence no clear pattern between total counts and ELEV, SLOPE or LAI can be discerned.