Modeling Insolation, Multi-spectral Imagery and LiDAR Point-cloud Metrics to Predict Plant Diversity in a Temperate Montane Forest

: Incident solar radiation (insolation) passing through the forest canopy to the ground surface is either absorbed or scattered. This phenomenon, known as radiation attenuation, is measured using the extinction coefficient (K). The amount of radiation at the ground surface of a given site is effectively controlled by the canopy ’s surface and structure, determining its suitability for plant species. Menhinick’s and Simpson biodiversity indexes were selected as spatially explicit response variables for the regression equation using canopy structure metrics as predictors. Independent variables include modeled area solar radiation, LiDAR derived canopy height, effective leaf area index data derived from multi-spectral imagery, and canopy strata metrics derived from LiDAR point-cloud data. The results support the hypothesis that, 1.) canopy surface and strata variability may be associated with understory species diversity due to habitat partitioning and radiation attenuation, and that, 2.) such a model can predict both this relationship and biodiversity clustering. The study data yielded significant correlations between predictor and response variables and was used to produce a multiple-linear model comprising canopy relief, texture of heights, and vegetation density to predict understory plant diversity. When analyzed for spatial autocorrelation, the predicted biodiversity data exhibited non-random spatial continuity.


Introduction
Spatial ecologists and biogeographers study the variation of plant species across spatial scales and latitudinal gradients. This spatial variability and its link to ecological and biogeochemical processes are considered by some researchers to be fundamental to biological inquiry [1], and understanding biological diversity a central research problem in modern biology. The spatial analysis of biota often focuses on either species distribution or species biodiversity within communities, landscapes and ecosystems, or genetic systems associated with spatial scales such as species populations. The discipline of phytogeography provides one such example, combining geography and botany to investigate the spatial distributions of plant species and their communities. Alexander von Humboldt and Aimé Jacques Bonpland's "Essai sur la géographie des plantes" [2], published in 1807, was groundbreaking work in this respect. In it, Humboldt and Bonpland considered the variation of ecological gradients and put forth a theory of the geographical repartition of species, visualizing these concepts in portraiture. Works such as the "Tableau physique des Andes et pays voisins" contributed significantly to the formation of modern biogeography [2].
Biodiversity measures describe species and trait richness and evenness on different scales. Alpha biodiversity (α-diversity), or species diversity in habitats at a local scale, is influenced both by the number of types of habitat and ecological processes [3]. Vegetation biodiversity in forest ecosystems has been positively correlated with productivity and resiliency in forest stands due to increased spatial, temporal and biogeochemical efficiencies in site utilization [4][5] [6]. These plant communities also tend to be less vulnerable to pathogens, the negative effects of high intensity wildfire and wind related disturbances [5]. It has been shown that the rates of such processes are affected by the spatial configuration of the environment. This study develops a hypothesis that α-diversity is influenced by two aspects of environmental heterogeneity: the range of environmental conditions (i.e., environmental variability) influencing the number of types of habitats available and the spatial configuration of those habitats [8].
The spatial variability of biodiversity on larger scales has been characterized by biogeographers with respect to latitudinal gradients. Meta-studies indicate that mechanisms such as solar energy, climate and area-specific processes are likely to contribute to species diversity on many spatial scales [7], and also within the vertical and horizontal structural variation in forest landscape. The asymmetric competition for light, for instance, is noted in a forest structure meta-study as a key negative influence on overall productivity [10].
A forest's structure is composed of a landscape scale terrestrial ecosystem including canopy flora, soil type and depth, subsurface biota and hydrology. The canopy is of considerable interest to researchers because of its functional interface with the atmosphere with respect to carbon, water and energy exchange and being the site of primary production [11]. On a global scale, forest canopies support approximately 40% of extant species, 10% of which are predicted to be canopy specialists [12]. Attributable to the complex threedimensional structure of the canopy affording niche diversification and vertical stratification, approximately 10% of all vascular plants are epiphytic canopy species [12].
Landscape scale meta-studies have observed that stand structure is a biodiversity indicator. Gao et al. [13] noted that mature stands with a stratified canopy had the highest plant species diversity of the 26 stand structure types and across the nine soil classes in the study, in particular stands comprising mixed conifer and broadleaved species with a semi-open canopy, whereas younger single-layered stands had consistently low species diversity. Age of canopy trees was closely associated with taxonomic diversity, followed by canopy stratification, tree species composition and canopy coverage [13].
Chronosequential field studies of forest succession related α-diversity near the study area indicate that post-disturbance taxonomic diversity trends upward until canopy closure. It then decreases for up to several decades before increasing again as canopy structure variability increases [14]. In contrast to more or less uniform mechanical stand replacement disturbances, Donato et al. [15] note that natural and compound disturbance regimes, such as mixed-severity fire, create alternate successional pathways and distinct early seral communities that contribute to canopy heterogeneity.
Due to its three-dimensional distribution of leaves, branches and stems, as well as its physical relationship to incoming solar radiation, canopy structure is a source of habitat niche partitioning for plant species [10]. Canopy structure can be considered in terms of both vertical components, such as strata, and horizontal ones, such as age related or other natural disturbance patches and fragments. Stands undisturbed by landscape level events such as fire or major windfall tend to develop an uneven-aged stand structure as natural disturbances generally provide open gaps in the canopy for younger trees [16]. Properties of this structure can be described by metrics such as stand height, density, distribution, and volume -each serve as a proxy for the more complex and unmeasurable distribution of forest canopy structure itself [17]. Derived point-cloud metrics measuring these variables with a high degree of accuracy can be developed with data sensed with laser arrays and subsequently 'gridded' to raster formats for spatial modeling in a geographic information system.
Remotely sensed and field data acquisition from randomly selected biodiversity plots located in a natural, temperate, montane conifer forest in Northern California were used to develop a multivariate regression model and clustering analysis of predicated points to visualize the spatial distribution and statistically significant clusters of plant species α-diversity. The results are believed to be significant and also adequate enough to initiate a more comprehensive study to aide forest management and the development of ecosystem services prescriptions with plant diversity components.

Materials and Methods
The purpose of this study is to evaluate spatially explicit area solar radiation and structural variables in the forest canopy that influence its distribution in order to develop a diversity prediction model using the most significant canopy metrics in the available data. Evaluating the statistically significant clustering of the predicted data is a secondary objective that assist in the observation of biodiversity hotspots.

Study Area
The study area is located in the center of the Klamath Ecoregion of Northwestern California and Southern Oregon, an area studied extensively for its geological antiquity and diversity of plant species. Geographically, the region is bounded on the north and south by lower elevation coast-range mountains and interior valleys, on the west by the Pacific Ocean, and on the east by montane valleys and desert plateaus. Due in part to its topographic variability and an abundance of alpine water sources, the study area contains a unique conifer diversity concentration within a region that represents a global maxima, comprising 30 native and endemic species [18].
The analysis extent and the study area are two distinct geographic areas due to solar radiation modeling requirements. The analysis extent is bounded by the topographic horizon captured in the bare earth digital elevation model (DEM). It consists of an arid eastslope canyon at the base of Russian Peak in the Upper Sugar Creek Watershed.
The study area used for canopy analysis is an approximately 18.5 hectare area located at 507090 E, 4572336 N (UTM) Zone 10, (NAD 1983) as shown in Figure 1. The study area elevation was between 1500 meters and 1700 meters height above ellipsoid (HAE), with the plots samples observed primarily near 1500 meters HAE. Loamy inceptisols are the dominant soil taxon in the study area. These soils are common on landscapes that are relatively active, such as mountain slopes, where erosional processes expose unweathered materials. All plots were sampled within a Klamath Mixed Conifer (KMC) plant community within an average canopy height of one standard deviation from the mean of the plant community boundary at a 30-meter resolution. Plots were selected from within a uniform climate, elevation, soil type and vegetation profile. Surface water quantity varied considerably, even outside excluded riparian habitat, due to water point-source locations and ground surface structure variability associated with biomass accumulation.

Biogeography
The Klamath Ecoregion is characterized by "complex biogeographic patterns, high endemism, and unusual community assemblages" [18]. It contains one of the four richest temperate coniferous forests in the world along with the Southeastern conifer forests of North America and those of the Primorye region of the Russian Far East and Sichuan region of China [18]. Relic, endemic and regionally dominant species coexist in among abrupt alternes and a diverse series of xeric, hydric and mesic habitats defined by variability in elevation and microclimate [19]. The taxonomic biodiversity present in the study area itself is thought to be the unique result of its topographic-climate variability and biogeographic history [19].

Forest Species
Larger forest reserves are generally found in the highest elevations of the region, with few significant areas of lower elevation habitat remaining undisturbed. The Dillon Creek watershed on the middle Klamath River reach is one of the last remaining unfragmented lowland forests of old-growth Klamath Mixed Conifer [18]. The montane Klamath Mixed Conifer forest type that defines the study extent in the Upper Sugar Creek Watershed is a regionally unique assemblage composed of tall, dense to moderately open conifer forests with patches of broad-leaved evergreen and deciduous low trees and shrubs typical of the assemblage [18]. It is dominated by evergreen conifers up to 60 meters in height and a rich shrub and herbaceous layer on undisturbed mesic sites. The overstory layer is characterized by a mixture of conifer species dominated by white fir (Abies concolor), Douglas-fir (Pseudotsuga menziesii), ponderosa pine (Pinus ponderosa), incense cedar (Calocedrus decurrens) and sugar pine (Pinus lambertiana). On xeric sites the forest canopy is less continuous, but the shrub layer is still abundant [21]. Figure 2 shows xeric and mesic understory plant communities observed in the study area.

Meteorology and Fire Regime
The region's weather patterns are a key source of canopy disturbance in its forest ecosystems. These conditions are related to high-velocity winds and low-humidity and represent a key physical component of natural forest dynamics and stand succession, including fire and windfall [22]. The Klamath ecoregion is described as having a mixed-severity fire regime, with a fire return interval of approximately 15 years in lower montane conifer forests. Return interval describes a fire frequency at the scale of a stand or relatively small landscape area, while the fire rotation interval presents a more nuanced view of the local study area fire regime in that it describes the fire cycle over the larger scale landscape with variable spatio-temporal frequency and intensity. Notably, the study area has missed some intervals in its natural fire regime due to suppression [23].

Area Solar Radiation, Canopy Structure, and Insolation Partitioning
The majority of incident solar radiation to Earth's vegetated surfaces is absorbed in the canopy layer, independent of the vegetation height above the surface [4]. In the context of the canopy surface radiation balance insolation is shortwave radiation influenced by atmospheric water vapor, aerosols and ozone, as shown in Table 1. The spatial distribution of insolation on the surface is governed by solar elevation, surface orientation and albedo as well as the screening or reflection effects from the surrounding terrain and the diffuse fraction of radiant flux [24,25]. Therefore, the spatial variability of annual photosynthetically active radiation (PAR) irradiation on the terrain surface is substantial in complex topography [24]. Seasonal variation due to atmospheric opaqueness also distributes radiation spatially due to topographic effects, with aspects receiving direct radiant fluxes contrasting with aspects that receive primarily diffuse solar radiation [26].
Canopy cover is the proportion of the forest floor covered by the vertical projection of the crown area, as contrasted with canopy closure, which is the proportion of sky hemisphere obscured by vegetation from a single point [27]. Sub-canopy radiant flux is not, however, inconsequential to either insolation attenuation or plant species persistence. While the upper strata of the canopy determine the type of vegetation on the forest floor [4], understory microclimate is determined by the topographic variability of the ground surface and availability of water, production in the overstory canopy, and to the distribution of understory species and the maintenance of subsurface processes [28].

Multispectral Imagery Data Specification
WorldView-2 is a high-resolution 8-band multispectral satellite operating at an altitude of 770 km. A corrected early season 'leaf-on' image from this system was used for the study, minimizing radiometric variability manifested as differences in the coloration of image features (trees and openings) and radial displacement (or the apparent elongation or displacement of objects having height) [29]. The imagery provides 1.85 m multispectral resolution (source scale), and a swath width is 16.4 km at nadir with a demonstrated geolocation accuracy of less than 3.5 m without ground control.

LiDAR Data Acquisition and Specification
The Sugar Creek LiDAR survey used the NAD83 (CORS96) datum (epoch 2002.00) and a Leica ALS50 system for acquisition in late season 'leaf-on' conditions. Actual first return average point density for the survey averaged 16.11 points/m2 for the analysis extent while ground classified point density, used for accuracy assessments with ground control survey data, was 2.36 points/m2. Absolute accuracy was determined using Fundamental Vertical Accuracy (FVA) methods outlined in the FGDC National Standard for Spatial Data Accuracy. FVA compares known real-time kinematic (RTK) ground control data collected on open, bare earth surfaces with level slope (less than 20°) to the triangulated surface generated by the Li-DAR points. Quantum Spatial both acquired and performed post acquisition processing to derive the LiDAR products used in the study. This included the high-resolution digital elevation model (DEM) and point-cloud dataset used to develop canopy metrics and surface models with the Boise Center Aerospace Laboratory (BCAL) library.

Data Processing Summary
Survey grade LiDAR products were used as inputs to georeference canopy and solar radiation models. The ERDAS Spatial Modeler was used to derive leaf area index (LAI) data from the multispectral image, and Boise Center Aerospace Laboratory (BCAL) Vegetation and Intensity tools developed in ENVI were used to derive canopy metrics from the LiDAR point-cloud data that were subsequently gridded to a 30-meter resolution raster format from raster data scales less than 30m.
The Klamath Mixed Conifer (KMC) forest type was selected as the focal plant community to delineate the study area. KMC is a California Wildlife Habitat Relationship System (CWHRS) attribute 'crosswalked' in the Classification and Assessment with LAND-SAT of Visible Ecological Groupings (CALVEG) data layer [30]. All LiDAR and imagery data were resampled to 30-meter resolution using bilinear or nearest neighbor methods for discrete and continuous data and then clipped to the KMC vegetation type within analysis extent and elevation boundary in ArcGIS, yielding study area 144 data points for each independent variable. The raster resolution and the corresponding centroids provided plot size and location coordinates for the field analysis.
The complete pre-processing of survey grade data products required to analyze the study area and select plots for biodiversity sampling is shown in a diagram ( Figure 3). Each 30-meter field plot references a pixel of gridded LiDAR data with the same resolution. All diversity values resulting from plot data are numerically ordinal and assumed to be spatially continuous. Geographic coordinate system and projected coordinate systems used in the processing were based on the survey grade bare earth DEM provided in the post-acquisition LiDAR products. The WorldView 2 imagery was transformed to these coordinate systems as well [31].
Forest dynamics (disturbance types and succession) are spatio-temporal phenomena that were not the focus of the study. There is a time-lag of five years between the LiDAR and multispectral data acquisition and the final field sampling of biodiversity plots. For this reason, sites evidencing plot scale canopy disturbance were excluded from the study to the extent that remotely sensed data did not align with the observed vegetation height.

Canopy Structure and Area Solar Radiation Metrics
The leaf area index (LAI) quantitatively characterizes plant canopy layers and is often used to model forest canopy structure. It is defined as the one-sided leaf area per unit ground surface area for broadleaf or half of the total needle surface area per unit ground surface area for conifer forests [32].
LAI is also a key variable for regional and global models of biosphere-atmosphere exchanges of energy, carbon dioxide, water vapor, and other materials [33]. Canopy photosynthesis and its equivalent, gross primary productivity (GPP), should theoretically reach a maximum as leaf area (LAI) increases to a value where spectral radiation in the .4 to .7 μm visible wavelength, is totally intercepted [34].
It can be challenging to quantify LAI accurately due to significant spatio-temporal variability and measurement limitations inherent in current methodologies [27]. These methods utilize 'direct' approaches employing field sampling or 'indirect' approaches involving optical instruments combined with modeling [27]. Optical instruments used to estimate LAI rely on the measurement of light transmittance and, because they are based on Beer's law, the resultant measures are termed 'effective LAI' [27].
LAI was derived in ERDAS Imagine Spatial Modeler by first deriving a Normalized Difference Vegetation Index (NDVI) from the multispectral image and then using a linear model to determine LAI [35]. NDVI is calculated from the visible red and near-infrared light (NIR) [36].
(2) NDVI uses normalization to minimize effects of variable irradiance and is commonly used to indicate the amount and vigor of vegetation and to differentiate vegetated and non-vegetated areas in an image. Plants appear relatively dark in the PAR and relatively bright in the NIR [34]. The biophysical interpretation of NDVI for Figure 4 is the fraction of absorbed photosynthetically active radiation (fPAR). Although derived index values can range from -1.0 to +1.0, values less than zero do not have any biological meaning and the range was adjusted for the model to exclude them. The empirical conversion from the NDVI described by Wulder et al. [35] was then used to derive the canopy leaf area in the ERDAS Spatial Modeler tool ( Figure 5) [25]. Photosynthetically available radiation (PAR) and its relationship to LAI is well established in the literature [27]. The theoretical basis for the absorption of light as it passes through a forest canopy is described by the Beer-Lambert law: t is the proportion of PAR incident at the top of the canopy that is transmitted to a given point x within the canopy. • LAI(x) is the total leaf area above point x. • K is the extinction coefficient.
The coefficient indicates that light intensity decreases exponentially as it passes through each canopy layer [4,33]. Also known as the attenuation coefficient, it describes the extent to which the radiant flux of a beam is reduced as it passes through a specific material, in this case the vegetation canopy. When a narrow (collimated) beam passes through canopy strata, the beam will lose intensity due to two processes: absorption and scattering. A detector can be used to measure a beam's directional path, or conversely using a non-narrow beam, one can measure how much of the lost radiant flux was scattered, and how much was absorbed. The extinction coefficient is, therefore, the sum of the absorption coefficient and the scattering coefficient [37].
The determination of the extinction coefficient requires direct measurement of LAI over consecutive seasons, as there is significant atmospheric and canopy structural variability in the determination of its value [38]. A meta-study of canopy light extinction showed significant intra-annual negative correlations between extinction coefficient (K) and seasonal changes in LAI in natural ecosystems [39]. In another study, a K value of .48 and LAI of 6.2 are thresholds at which 95% of incident solar radiation is intercepted by the forest canopy in a stand of Pseudotsuga menziesii [34], a dominant upper strata canopy species in the Klamath Mixed Conifer forest type.
The LAI variable defines the number of equivalent layers of leaves relative to a unit of ground area, but the fraction of PAR that is absorbed (fPAR) measures the proportion of available radiation in the photosynthetically active wavelengths that are absorbed by a canopy [40]. The basis for calculating fPAR using an exponential function is based on Beer's law: where P∞ is the asymptotically limiting value of PAR absorption for an infinitely thick canopy and was set to 0.94, with the assumption made that the canopy leaves are randomly distributed [25]. This assumption was made in the LAI calculation for this study as well. Both parameters are used for evaluating surface photosynthesis, evapotranspiration, and primary productivity for the analysis of ecosystem functions such as terrestrial energy, carbon, water cycle processes, and the biogeochemistry of vegetation [40].
In addition to LAI derived from satellite imagery, metrics were derived from LiDAR point-cloud and surface data to measure localized insolation and the canopy strata variables effecting solar radiation in the understory.
1. Canopy Height (CHM) in meters was determined from the LiDAR digital surface model (DSM) and the first returns data at 1 meter resolution. It was utilized to derive area incident solar radiation adjusted for topography as well as to provide data required for sample plot selection by using the average values at 30 meters resolution matched to the gridded canopy metrics raster [41,42,45].

2.
Area Solar Radiation (ASR) in WH/m2 was derived using the sum of 12 monthly values (2015 calendar year) for the analysis extent and resampled to 30 meters resolution. ASR was modeled in ArcGIS using the sum of the LiDAR bare earth surface product (DEM) and the canopy height model (CHM) product as the topographic parameter, including topographic elevation data for the entire watershed used as the horizon parameters for the ASR model equations [43,44].
Radiation parameters included diffuse model type (radiation flux varied with zenith angle in a non-uniform overcast sky condition), diffuse proportion (proportion of global normal radiation flux that is diffuse by month), and atmospheric transmissivity (fraction of radiation that passes through the atmosphere by month). Input parameters were derived from meteorological data acquired from a field station within close proximity to the analysis extent in 2015. ASR is a term used to describe irradiation, or the sum of downward area irradiance per unit area over a stated time interval expressed in WH/m 2 . Irradiance is the instantaneous density of solar radiation on a unit area expressed in W/m 2 . It comprises a global radiation value (Globaltot), or the sum of direct (Dirtot) and diffuse (Diftot) radiation of all sun map and sky map sectors as shown in the solar model equation [43,44]: Dirtot for a given location is the sum of the direct insolation (Dirθ,α) from all sun map sectors. Direct insolation from the sun map sector (Dirθ,α) with a centroid at zenith angle (θ) and azimuth angle (α) is calculated using the following equation: where: • SConst is the solar flux outside the atmosphere at the mean earth-sun distance, known as solar constant. The solar constant used in the analysis is 1367 W/m2. This is consistent with the World Radiation Center (WRC) solar constant. • β is the transmissivity of the atmosphere (averaged over all wavelengths) for the shortest path (in the direction of the zenith). • m(θ) is the relative optical path length, measured as a proportion relative to the zenith path length. • SunDurθ,α is the time duration represented by the sky sector. For most sectors, it is equal to the day interval (for example, a month) multiplied by the hour interval (for example, a half hour). For partial sectors (near the horizon), the duration is calculated using spherical geometry. • SunGapθ,α is the gap fraction for the sun map sector.
• AngInθ,α the angle of incidence between the centroid of the sky sector and the axis normal to the surface [43,44]. Total diffuse solar radiation for the location (Diftot) is calculated as the sum of the diffuse solar radiation (Dif) from all the sky map sectors. The diffuse radiation at its centroid (Dif) is calculated, integrated over the input time interval, and corrected by the gap fraction and angle of incidence using the following equation: where: • Rglb is the global normal radiation.
• Pdif is the proportion of global normal radiation flux that is diffused.
• Dur is the time interval for analysis.
• SkyGapθ,α is the gap fraction (proportion of visible sky) for the sky sector.
• Weightθ,α is the proportion of diffuse radiation originating in a given sky sector relative to all sectors. • AngInθ,α is the angle of incidence between the centroid of the sky sector and the intercepting surface [43,44].

3.
Intensity of return (IR) is a measure of amplitude describing the peak power ratio of the laser return to the emitted laser, calculated as a function of surface reflectivity. Values are corrected for variability between flight lines and preprocessed at a 0.5-meter pixel resolution before being processed using the BCAL vegetation intensity tools and output to a 30-meter resolution for the vegetation excluding bare earth data [41,42] Forest remote sensing research indicates that the returns of high-intensity and the low intensity peak count of the intensity distribution were predictive of live and dead tree biomass, respectively [48].

4.
Total Vegetation Density is a derived percent ratio of vegetation to ground returns per m 2 or,

5.
Canopy Relief Ratio is a derived mean height less the minimum height divided by the maximum height less the minimum height per m2 or,

Measures of Biodiversity
Biodiversity is the quantitative measure of the variety of distinct species in an area and their relative abundance, a biological phenomenon described with consideration of both spatial and temporal scales [4]. Alpha diversity (α-diversity) was developed by R. H. Whittaker during his study of plant communities in the Klamath region to describe habitat (local) level diversity characteristics. It was introduced with the complementary statistics, β-diversity and γ-diversity. Beta diversity is dimensionless, as a comparative statistic, i.e., the ratio between γ-diversity (regional) and α-diversity (local) diversities [49]. α-diversity and γ-diversity are limited to discrete units of space -roughly communities and ecosystems respectively [4], but the ratio describes the number of unique habitats utilized by species in a region [50].
Whittaker postulated that total species diversity in a landscape, described by γ-diversity, is determined by the mean species diversity at a community scale (α-diversity) and also by the differentiation among those communities (β-diversity) [49,51]. Whittaker's subsequent usage of α-diversity implies the application of the statistic across multiple sites in a landscape, strongly influencing its primary use at the assemblage and community scale [49,51].
Insofar as the describing an observation of diversity present in a discrete unit of space, three broad types of biodiversity measures in the literature indicate an evolution in the understanding of biodiversity and related ecological processes. These are taxa based (taxonomic), trait based (functional, phenotypical expression of traits) and phylogenetic diversity, defined as the minimum total length of phylogenetic branches required to span a given set of taxa on the phylogenetic tree [52].
We used taxonomic diversity indexes for relative ease of calculation and the extensive theoretical vetting that has occurred in the literature, but the others are mentioned in the discussion as they represent improvements on the usefulness of a spatial model. That being said, a fundamental ecological axiom is that habitats differ in the richness and abundance of species associated with them. Therefore, some issues arise in the most basic taxonomic diversity calculations which must be addressed. These are, 1.) that the total number of species in a sample varies both within habitats and between different habitats and that unless adjustments are made it is difficult to compare taxonomic α-diversity values, and, 2.) species of lesser abundance should not carry as much weight as those with higher abundance as their functional role is less [50]. Point samples were acquired within a uniform mixed-conifer habitat type, and Menhinick's Index was used to provide an adjustment to species richness data. It measures point diversity as the ratio of number of species (s) and the square root of the total number of individuals (N). It is notated: s is the number of different species in your sample and • N is the total number of individual organisms in the sample. To assess the impact of species abundances, the Simpson Index relates the contribution made by each species to the total number of individuals present. This is notated, where pi is the proportion of individuals found in species i. SDI is a non-parametric index value less sensitive to species richness in that species specific abundances are considered. As SDI's summation increases, evenness decreases. The difference between that value and 1 produces a range of 0-1, 1 being a monoculture.

Multiple Regression Modeling and Cluster Analysis Methods
Multiple linear regression was used to model the selected predictor variables and a biodiversity response variable. [53]. This is notated, = 0 + 1 1 , + 2 2 , + ⋯ + , + represents the random error Canopy metrics, radiation and diversity were evaluated using R language functions for correlation, predictor collinearity, regression, model fit and predictive modeling within the study area.
After producing biodiversity prediction maps, the global inferential statistics Moran's-I, Getis Global G and the local Getis-Ord Gi* statistics were used to evaluate spatial autocorrelation and clustering of the predicted data set. Global Moran's I statistic uses a z-score and p-value that test the null hypothesis that MDI is randomly distributed across the study area. Similarly, the Getis-Ord General G statistic uses a p-value and z-score to determine that there is no spatial clustering of feature values. However, whereas the Moran's I statistic evaluates high and low values together for purposes of spatial autocorrelation; the General G distinguishes between clustering of high and low values in the data set, determining which, if either, is significant and non-random. Therefore, if the p-value is statistically significant, the null hypothesis can be rejected, and the z-score is used to determine if the high or low values are contributing to the clustering. The Global Moran's I statistic is denoted, • zi is the deviation of an attribute for feature i from its mean, • wi,j is the spatial weight between feature i and j, • n is the number of features in the dataset and • 0 is the aggregate of spatial weights ∑ ∑ , =1 =1 . The Getis-Ord General G is denoted, • xi and xj are attribute values for features i and j, • wi,j is the spatial weight between feature i and j, • n is the number of features in the dataset and • ∀ ≠ Indicates that features i and j cannot be the same feature.

Sample Plot Selection
Six random plots were sampled without adjusting for variation in slope, aspect, or relative soil hydrology of the site (xeric, mesic, or hydric). Each sample plot provided 900 square meters of taxonomic field data with a total survey area of 5,400 m2. The survey was conducted by quantifying species richness and abundance within the plot perimeter. Absolute quantitative numbers were not the focus; species richness was estimated in areas of significant vegetation abundance. Surveyed taxa included visible vascular plants, bryophytes, and canopy lichens. Overall accuracy of species identification was high, with few visible species remaining unidentified or uncounted. The data are representative of the forest vegetation in the study area, but likely understate actual richness and abundance.
The plot map consisted of a web feature service published to a hosted cloud server, downloaded and synchronized to a field data collection tablet paired with a map grade GNSS receiver. The mobile field data collection device utilized a satellite based real-time kinematic (RTK) network for plot position corrections (or SBAS if the RTK network was unavailable). This method achieved an estimated horizontal plot accuracy of approximately 2 meters with respect to the LiDAR survey data used to produce the web-gis map service. Richness (number of specific taxa), abundance (quantity of each specific taxa) and two measures of biodiversity were computed for the model as shown in Table 2. Figure 6 shows the distribution of the data.

Study Area Predictor Variable Distributions
Area solar radiation derived from the DSM and the point-cloud and imagery metrics gridded to raster yielded 144 data points for analysis as predictors. The histograms in Figure 7 depict distributions of the predictor variable data points used in the regression equation.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 3 August 2021 doi:10.20944/preprints202108.0078.v1 A collinearity analysis of predictor variables (Figure 7) displays coefficient values in the upper half of the matrix and nonparametric trend lines (loess smoothed fits) and correlation ellipses below the diagonal. Modelled irradiation (ASR) within the study area data set exhibited a correlation with canopy height. A significant positive correlation coefficient (r) of .61 provided a credibility test of the irradiation data since canopy height should reasonably have some association to WH/m 2 due to the direct relationship between photosynthetic processes to the quantity of radiant energy in one area of space relative to another area of space. Effective leaf area and vegetation density relationships were not significant. It is worth noting that LAI and ASR were negatively correlated. This could be for several reasons, but TVD was also not strongly correlated either. The acquisition date for the multi-spectral data was early in the growth season, and although the forest canopy is dominated by evergreen species, the full lower strata canopy was not at a seasonal peak. Much of the analysis extent exhibited significant canopy height and structural variation with dead biomass in all strata. A seasonal analysis of LAI would be useful to validate data to use for further canopy studies. However, other controlling factors such as soil and water limitations would also require review.
Abundance of individuals (A) was correlated with the texture of heights (TH, r = .59) and the intensity of the returns (IR, r = .70). Taxanomic diversity (MDI), was significantly correlated with leaf area (LAI, r = .69). Given the size of the plot dataset, and the early seasonality of the multi-spectral image acquisition, additional plot data and peak 'leaf-on' imagery would assist in confirming the potential usefulness of this finding. Vegetation density was negatively correlated as well (TVD, r = -.43). Taxonomic diversity adjusted for evenness (SDI), was modestly correlated with canopy height (CH, r = .42) and the intensity of the returns (IR, r = .24).
The regression model was created by first identifying the best variable subset using cross-validation statistics (the 'regsubsets' function in the R 'leaps' library), and then selecting predictors with high r 2 . To summarize, the cross-validation statistical notation is: p is the specific predictor variable and • k is the total number of predictor variables.
If M0 is the null model containing 0 predictor values, the best model (Mk) from a series of models, denoted as M0…Mp. is derived by fitting ( ) models containing k predictors.
Biodiversity is analyzed as a function of all the variables in the canopy metrics data frame. The function evaluates the predictor variables using an exhaustive selection algorithm, selecting a maximum of four predictors for each variable subset.
Using R linear modeling functions, we evaluated several variable subsets with significant r 2 values to estimate the parameters of the linear model for all measures of diversity. A cross-validation analysis is shown in Figure 9 with the coefficient of determination (r 2 ) values on the y-axis, canopy parameters on the x-axis and values indicating the variables included in the validation. Results indicate that the greater portion of the variance (.9) is associated with two of the variables for richness (R). The result indicates a similar portion of the variance (.91) associated with two of the variables for abundance (A), but the variables differ. The MDI diversity cross-validation indicates that much of the variance (.99) is associated with three of the variables, while the SDI diversity analysis indicates that much of the variance (.99) is attributed to three of the predictors and a similar proportion (.94) is associated with two of the variables. Development of the prediction model focused the two diversity indexes, MDI and SDI.  Table 3 compares the results of the SDI model (Lm0) with an F statistic and p-value that do not indicate the model itself is significant, a second model (Lm1) with two variables having the best p-values from the first model, and a third model (Lm2) with as interaction term that increases the total variance associated with the predictors. Linear model 3 (Lm3) for MDI summary statistics present the best scenario for predictive modeling given the available data. The model p-value is below .05 and the adjusted r 2 value is greater than .96. In addition, each of the predictor variables strongly contributes to the significance of the model without adjustment.
The validity of linear regression modelling for prediction rests on four assumptions: • a linear relationship between the dependent and predictor variables, • the model errors are independent, • the model errors are normally distributed, and • the model errors have a constant variance with respect to the predictor variables [54].
The plot in Figure 10a graphs the model errors (residuals) vs. the predicted values (fitted) and tests for nonlinearity as well as heteroscedacity (non-constant variance) in errors. The plotted points should be symmetrically distributed around zero (a horizontal line) indicating that the model doesn't make systematic errors [54]. The QQ plot ( Figure  10b) displays a satisfactory positive diagonal trendline for the available data. Multicollinearity, such as that exhibited between CRR and TVD (r 2 = .80), makes it difficult to assess the relative importance of independent variables if they are both used in the model, but it does not impact the usefulness of the regression equation for prediction. Even when multicollinearity is great, the least-squares regression equation can be highly predictive.
Since predicting beyond the ranges of the original data will result in model extrapolating and subsequent prediction errors, areas where the model extrapolated beyond the range of observation were 'masked out'. Average CRR at a 30-meter resolution ranged in value from .27 to .40, the average TH from .23 to .31 and TVD from 656 to 1704 per meter square. The Con tool in ArcGIS creates a raster where '1' is the assigned value if the canopy parameters are met, and if false, '0' is the assigned value.
Combining three mask extrapolation layers -one for each variable -by deriving the product (using the Times tool) of raster layers with 0 or 1 pixel values creates an output raster consisting of pixels where variables are within the range of observations (a value of 1). Finally, this raster data is used to create a combined raster extrapolation mask applied to the original prediction file with the Set Null tool to derive a new predicted range of values [54] shown in Figure 12.  Table 4 provides a comparative summary for Moran's I and Getis-Ord General G values. A 100-meter Euclidian distance parameter was estimated for spatial statistics, although an estimated value based on nearest neighbor distances is considered as a default practice when not specified. These statistics imply that the data as a whole are non-random with respect to spatial autocorrelation -the Global Moran's I p-value indicates significance, so the null hypothesis is rejected. However, the Getis-Ord General G p-value affirms the null hypothesis that there is randomness in the data with respect to the clustering not being attributable to either high or low MDI biodiversity values.
The Getis-Ord Gi* statistic is used determine statistically significant spatial clustering of data with an influence threshold distance. The resulting data can be used to create an MDI biodiversity hot-spot analysis map from the predicted raster values (Figure 13). 'Hot spots derived from this statistic are both positive and negative, like the General G, and both types have significance for biodiversity prediction, based on the study parameters.
The notation for the local Gi-star (z-score) statistic is, where, • xj is the attribute for feature j, • wi,j is the spatial weight between feature i and j, • n is the number of features in the dataset and: The statistic in this case used a fixed distance band of 100 meters to impose a "sphere of influence" or "moving window" conceptual model of spatial interactions onto the data. Each feature is analyzed within the context of those neighboring features located within the specified distance for the 'distance band' and neighbors within that distance are weighted equally. The hot-spot map data frame contained 61 points each with 2-13 neighboring data points depending on its spatial position. These hotspots varied from 90-99% based on the concentration of nearest neighbors within 100 meters of each analyzed data point.

Discussion
Area solar radiation (ASR) and canopy height (CH) exhibited a significant positive correlation, and canopy height also exhibited positive correlation to total vegetation density (TVD). Statistically significant relationships were evident between the canopy structure and biodiversity data as well. While the available data did not indicate a relationship of significant magnitude between modeled radiant flux and observed biodiversity, there was a significant relationship between CH and diversity, depending on how diversity was measured.
The MDI linear model (Lm3) comprising the canopy relief ratio (CRR), the texture of heights (TH) and total vegetation density (TVD) was the most significant of the four models analyzed, given the available data. This set of predictors consists of metrics that would influence the variability of irradiance and the extinction of PAR in the lower strata. Unlike canopy height (CH) there was no significant positive correlation between ASR and vegetation density (TVD). The fact that TVD would have some relationship to MDI stands to reason, as does the relationship canopy variability represented by CRR and TH.
The MDI linear regression model had a p-value of 0.01874 and an adjusted r 2 of 0.9687. As a predictor variable, CRR had a p-value of 0.00853 (a 0.01 level of significance), TH a p-value of 0.01412 (a 0.05 level of significance), and TVD a value of 0.00630 (a 0.01 level of significance). This model was determined to be useful for predictive spatial modeling by providing over 60 data points after adjusting the prediction model for the range of canopy metrics surveyed. Spatial statistics at the global level indicated the predicted range of MDI was notrandom overall, and the local hot-spot map produced from this predicted data indicated several areas of both high and low biodiversity concentrations based on the prediction model data. The statistical significance of each of Since the forest stands in the study area are essentially undisturbed by human activity except for recent historical fire suppression, it is important to remember that natural succession and regeneration in managed stands can differ considerably from these conditions.
Modern silvicultural methods have effectively promoted biodiversity decline and subsequently generated interest in managing for increased structural complexity to enhance stand productivity by promoting complimentary resource utilization through spatial, temporal, and physiological differentiation [6]. Structural variation in the canopy should create habitat variability and niche partitioning, allowing greater potential for measurable variation in insolation and habitat and subsequently the diversity and richness of species. The model results indicated that the study method may yield useful biodiversity prediction models, especially with additional point samples and greater focus on horizontal and vertical vegetation density metrics.
The forest area sampled was nonetheless significant and varied in terms of area solar radiation and canopy structural diversity. Even in the relatively small number of plots surveyed, there were examples of hydric, mesic and dry-mesic soils which impacted vascular plant. Additional fieldwork could be considered in order to develop a truly significant point sample variation and determine a more precise relationship of structural metrics that can be associated with measurement of insolation extinction (K). This would serve to develop a more coherent understanding of canopy insolation partitioning of the radiant flux and its impact on both diversity and productivity.
Forest productivity research indicates that both functional and phylogenic diversity significantly influence biomass productivity on small scales, while taxonomic diversity evidenced "only indirect effects" at that scale [55]. Many researchers focused on the topic believe that the classification of species into functional types rather than higher taxonomic identity may improve the understanding of processes at the ecosystem scale, including vegetation responses and effects on climate, atmospheric chemistry, land use and disturbances [56]. Although there appears to be no consensus on a uniform definition for functional diversity in the literature, there are contrasts in areas of its measurement: those that use trait values directly and those that use distance-based and dendrogram-based constructs [57].
It has also been proposed that functional diversity is affected by the range of trait values (phenotype expressions) present as well as the distinct species in that range, and notably, that the trait measured is more important than the specific measure used [57]. Low functional richness, for instance, indicates that some alpha niches (i.e., resources) potentially available to the community remain unused, reducing productivity [58]. The number of traits included in future analysis must be adequate to capture the specific function of interest, continuous traits being more effective at capturing interspecific variability in trait values than categorical traits [59].
Given the many leaf traits available to research that impact productivity, such as specific leaf area (SLA) in mm 2 g -1 , they represent a practical area of focus for future application to this model. Given that trait-based information is more descriptive of biological phenomena than taxonomic information, this could have a significant impact on prediction of spatially continuous ecological phenomena.
Guisan and Zimmermann [53] describe general patterns in the geographical distribution of species that originated in the field of biogeography to characterize the usefulness of predictive models. They note that the importance of abiotic and biotic factors at the margins of a species' range of biological and physical stresses, respectively, as well as physical limits caused by environmental gradients and physiological constraints more generally determine the habitat preferences [50]. Our regression model focuses on generality with precision to predict and accurate response within a finite set of limitations (or a "simplified reality") that implies natural phenomena are too complex and heterogeneous to be predicted accurately in every spatio-temporal aspect [50].
This technique is especially useful where canopy structure and the physical attributes of a stand, such as total biomass, are tightly coupled relationships. Biodiversity is likely a less discrete spatial association, and consideration of other abiotic factors would likely improve the model's ability to predict. Borrowing from the field of spatial econometrics, it may be useful to consider spatial weights to adjust for abiotic proximity to temperature and precipitation gradients, or even soil moisture in future iterations of the linear model.