Real-Time Software for the Efﬁcient Generation of the Clumping Index and Its Application Based on the Google Earth Engine

: Canopy clumping index (CI) is a key structural parameter related to vegetation phenology and the absorption of radiation, and it is usually retrieved from remote sensing data based on an empirical relationship with the Normalized Difference between Hotspot and Darkspot (NDHD) index. A rapid production software was developed to implement the CI algorithm based on the Google Earth Engine (GEE) to update current CI products and promote the application of CI in different ﬁelds. Daily, monthly, and yearly global CI products are continuously generated and updated in real-time by the software. Users can directly download the product or work with CI without paying attention to data generation. For the application case study, a change detection algorithm, LandTrendr, was implemented on the GEE to examine the global CI trend from 2000 to 2020. The results indicate that the area of increase trend (28.7%, ∆ CI > 0.02) is greater than that of the decrease trend (17.1%, ∆ CI < − 0.02). Our work contributes toward the retrieval, application


Introduction
The canopy clumping index (CI) is a measurement of the spatial distribution pattern of foliage. Foliage commonly has a non-random distribution, which is aggregated at the canopy, branch, and shoot scales [1,2]. Theoretically, CI is >0. When CI < 1, the foliage has a clumping distribution. When CI > 1, the foliage has a regular distribution. The foliage has a random distribution when CI = 1. CI reflects the gap distribution within a vegetation canopy, thereby influencing light interception and evapotranspiration [3,4]. Previous studies have shown that the retrieval of other important vegetation indexes and the modeling of radiative transfer can be improved when the vegetation-clumping effect is taken into consideration [5][6][7]. Terrestrial ecological models also require CI as a key parameter for the better estimation of other variables, such as the Boreal Ecosystem Productivity Simulator [8], the soil temperature inversion procedure [9], and the canopy photosynthesis and evaporation model [10].
Currently, field and remote sensing methods are used to acquire CI [6]. The field method is categorized into direct and indirect methods. The direct method acquires CI by its straightforward definition as the ratio of effective leaf-area index to the leaf-area index [2,11,12]. The indirect method acquires CI using a series of optical instruments, such as digital hemispherical photography, the LAI-2200 plant canopy analyzer, and the tracing radiation and architecture of canopies, to manually capture the radiation above or below the canopy [6,13]. The field method is labor-intensive, and the obtained CI is limited to several plots wherein the model or relationship needs to be validated. The remote sensing method consists of the passive optical method, which is an estimation based on an empirical relationship between CI and indices constructed on reflectance values obtained from optical sensors [14,15]. The active light detection and ranging (LiDAR) method estimates CI using a gap-fraction-based method or a gap-size-based method using LiDAR cloud points because the laser pulse can penetrate the canopy [16,17].
Researchers have produced global CI products that are derived from optical images and process them locally. The spatial and temporal resolutions of these products range from 500 m to 6 km and daily to yearly, respectively [6,14,[18][19][20]. In a previous study, we generated global and long-term coverage (2001-2017) of CI products (hereafter referred to as CAS-CI) with a high temporal resolution based on an empirical relationship between the Normalized Difference between the Hotspot and Darkspot (NDHD) reflectance values derived from the moderate resolution imaging spectroradiometer (MODIS) dataset [18]. The product is freely available (http://www.geodata.cn/thematicView/modisCI.html accessed on 25 May 2022), and it has been supporting a series of research projects. However, producing the global CI requires intensive computing due to the large quantity of input data and the enormous image preprocessing, and it utilizes complex algorithms. Additionally, the current CI products are faced with difficulties such as inefficient data distribution and limitations in terms of computation and storage, especially on a global scale. These disadvantages restrict the timely updating of global products, and thus impede research on and applications of CI. Similarly, these obstacles also exist in CAS-CI products as they are not scaled correctly, and their distribution to the global scale leads to heavy data downloading, which wastes storage and increases the cost of computation during subsequent data analysis. Although CAS-CI has a long-term time series, its increasing application demands the update of the temporal coverage of the current product and an efficient dissemination method. Remote sensing images are spatiotemporally collected on a continuous basis, which necessitates the researcher to frequently update CI, which involves a tremendous effort to maintain previous works by downloading locally to process numerous images. Therefore, it is desirable to automatically fetch ongoing satellite observations and disseminate CI products universally for immediate applications.
A cloud computing platform equipped with high computational and data storage capacities will enable one to overcome the above-mentioned difficulties. Rapid computation services provided via the internet facilitate a variety of applications involving complex modeling and the analysis of large amounts of data [21][22][23]. The Google Earth Engine (GEE) is a cloud-based platform that integrates online computation with extensive earth observation datasets (https://earthengine.google.org/ accesed on 25 May 2022). The GEE also consists of tools with specialized processing and visualization application programming interfaces for geospatial data. The GEE provides free personal storage of 250 Gbit to each account, allowing users to expediently carry out research using their own data, together with its continuously archived satellite images and released data products [24]. The GEE enables the convenient performance of large-scale and long-term geospatial analysis by way of computation offloading and allows users to visualize and export results. Many widely used algorithms are deployed on the GEE, thereby enabling the successful implementation of a series of thematic methods such as land cover mapping, vegetation index retrieval, and change detection algorithms [25][26][27]. Thematic software that is specifically designed for different applications is also deployed on GEE. For example, GEE-enabled software is provided for processing Landsat images such that they can be converted into a collection of spatiotemporally high-quality Landsat images [28], and other software is used for the rapid analysis of floodwater data to estimate the floodwater's depth [29].
Consequently, our objective is to provide online software that completely leverages the strengths of image analysis and data synchronization in the GEE for the production of CI and its visualization and custom download, which enables fast utilization of the CI product and the quick receipt of feedback to improve future CI estimation. This GEEembedded software was developed to produce CI products in line with the methods and technologies that are available in some previous works [15,18]. Additionally, a change detection algorithm, namely, Landsat-based detection of trends in disturbance and recovery (LandTrendr), was implemented directly on the GEE to examine the characteristics of changes in global CI as a test case and track the temporal change of yearly CI during 2000-2020.

Materials
The input to the software is provided by the MODIS image datasets archived in the GEE and other required datasets that were not included in the GEE, such as the fraction of vegetation cover (fCover) and global land-cover classification product (GLC2000) ( Table 1). Kernel parameters such as f iso ,f vol , and f geo (MCD43A1) and observation information including MOD09A1 and MYD09A1 are MODIS-released products that are already archived in GEE. The normalized difference vegetation index (NDVI) as an input was calculated from the near-infrared band and the red band products from MCD43A4. MCD43A2 records the degree of contamination in pixels during the imaging process, which was used to indicate the quality of the final product. The GEOV fCover V2 was used as the input, and it was comprised of a collection with a 4-month interval that was released every 10 days. The fCover was uploaded into the GEE using monthly images, with the earliest start date of each month from March 2000 to May 2020. GLC2000 mapped the relatively detailed global vegetation categories, and it was also manually uploaded as a single image. All datasets can be filtered by the date specified by users, and the filtered image collection is sliced daily by matching their temporal resolution and then used in the software.

CI Retrieval Method
Optical remote sensing data are commonly used to estimate CI products by building an empirical relationship with reflectance or the vegetation index [6,14]. The retrieval algorithm adopted by Wei et al. [18] uses the linear relationship between CI and NDHD ( Figure 1): where θ s is the solar zenith angle (SZA), A(θ s ) and B(θ s ) are coefficients jointly determined by solar zenith and the shape of the vegetation canopy, previously modelled by Chen et al. [14]. The NDHD can be calculated using Equation (2) where ρ h and ρ d denote the reflectance at the hotspot and darkspot directions, respectively.
The Ross-Li model [30,31] is used to calculate ρ h and ρ d : where θ s , θ v , and ϕ are the solar zenith, observational, and relative azimuth angles, respectively. ρ h was calculated when θ s equaled θ v and ϕ was set to 0. ρ d was derived when θ s equaled θ v and ϕ was set to 180 • .  Table 1 for a description of input data.
The SZA is an average value between Terra and Aqua overpasses in their overlapping date obtained from observation information described in Section 2.1. f iso , f vol , and f geo are kernel parameters provided by the red band of MODIS43A1 products. k vol and k geo are parts of the bidirectional reflectance distribution function (BRDF) [30]. The expression to compute k vol is based on Roujean et al. [32].
k geo is called the LiSparse-R kernel, which is derived from Li et al. [33]. The expression and parameter settings are as follows: The variables θ s and θ v are theoretically in the range of 0-90 • ; ϕ was set to 0 • or 180 • for the calculation of reflectance corresponding to the hotspot and darkspot, respectively. The dimensionless parameters h/b = 2 and b/r = 1 were preselected from [30]. θ s was set to 60 • when the fCover was <25%, to reduce the overestimation of CI in sparsely vegetated areas.
Finally, the hotspot correction was conducted through an empirical relationship between the Ross-Li derived MODIS and the polarization and directionality of the Earth reflectance (POLDER) hotspots: ∆BRF hs is the difference in reflectance between POLDER and MODIS hotspots, to be parameterized by NDVI and θ s for the red band of MODIS BRDF.

Software Design
To ensure the consistency of the computed CI by the developed software with that of the initial version of the products, the retrieval process followed the previous method by Wei et al. [18], in which a lookup table was built between the hotspot/darkspot value and the observation information was matched according to the input. This software directly computes CI instead of matching based on criteria because computation at a global scale is not worth considering in the GEE.
The general steps for retrieval are listed briefly as follows: Step 1: Filter dataset by the user designated the date for new image collections.
Step 2: Select bands needed to be calculated using image collections to generate multi-band images for each day.
Step 3: The daily CI was retrieved based on band calculations, and low-quality pixels were excluded.
Step 4: The Savitzky-Golay smoothing filter (SG-filter) [34] was applied for daily CI collection, and then the monthly or yearly CI image was composited using quality indicators to provide the final CI for download.
When calculating NDHD, θ s and θ v were confined to the range of 0-60 • at an increment of 10 • and the pixels with an observation angle of >60 • or their corresponding fCover of <0.25 were set to 60 • . Therefore, k vol and k geo can also be calculated in advance based on the three parameters of angle and Equations (4)-(6) ( Table 2). During the determination of the final linear coefficients (A and B), θ s and θ v were confined to an increment of 5 • . Table 2. k vol and k geo , corresponding to hotspot (ϕ = 0 • ) and darkspot (ϕ = 180 • ), can be calculated against solar zenith angle (SZA) at an increment of 10 • . The quality and snow bands of MCD43A2 were used to exclude the contaminated pixels and indicate the retrieval quality of CI pixels, and based on the quality, the quality indicator band (QA) was generated and integrated into the final export. The main (QA = 0) and magnitude (QA = 2) inversions indicate the higher and lower qualities of CI, respectively. The generation of the QA was done according to Wei et al. [18]. For compositing, all the main inversions of the CI results within a month or year were selected based on quality information, and then their averages were computed and mosaiced with the average of the magnitude inversions to generate high-quality monthly or yearly CI images. Some settings of the final images for downloading are listed in Table 3. For convenience, a multiple band image was first generated by selecting corresponding bands from each remote sensing dataset at a given interval. The most recent image was repeatedly used when these datasets had no available image due to the difference in temporal resolution between remote sensing datasets, which proves that the retrieval of daily CI is possible. After all daily images consisting of input bands were ready, CI retrieval was conducted on a global scale. Software was devoted to provide a user-friendly interface to allow users to select the date and spatial coverage to export required products.

GEE-Based Software to Retrieve and Download CI
The software developed for GEE's JavaScript version mainly provides an online function of "generation and then download" for CI. The initial interface of this software is shown in Figure 2. After inputting the two dates (period) formatted as date strings, one of the buttons with text as daily, monthly, and yearly is clicked. The next screen will show the corresponding date list for selection. The daily CI image will be directly generated after smoothing. The monthly and yearly CI images are averaged from the daily smoothed CIs within the selected month and year, respectively. This software allows users to download the CI product via real-time online generation by a self-defined date and coverage, which is an improvement over the initially released product that preserves locally first and then provides for global download. The product containing CI and quality information is exported in the GeoTIFF format to the Google drive that shares the same account with that of the GEE. The image value is scaled by 1000 times to achieve a compact file size. The product can also be directly exported with only a part of the image by selecting only a small region to download. Because of constraints in the GEE, while exporting the global extent (60 • S-90 • N and 180 • E-180 • W), the product will be automatically clipped into nine equal-sized parts of the image. Figure 3 displays the global CI in 2020, and its quality distribution was computed by this software. The lower CI is concentrated in the forest region, e.g., the equatorial regions and boreal forest. The higher CI is primarily observed in arid or semiarid regions. The magnitude inversion is observed in South America, Central Africa, and Southeast Asia because of the continuous cloud cover over these areas. The results derived from this online retrieval software are very consistent with those of Wei et al. [18]. Additionally, the average CI in January, April, July, and October from 2000 to 2020 was composited using this software and exported to display the seasonal variation of CI ( Figure 4). In the Northern Hemisphere, the CI reaches the maximum in January, and the valid CI areas are relatively small because of low vegetation coverage in winter. In July, a low CI is observed in eastern Eurasia and North America. In the equatorial region, the CI is always low because of dense forest coverage.

Temporal Variation of CI
Variations in CI reflect specific temporal characteristics of vegetation growth, disturbances, and radiation budget and may indicate climate change in terms of long-term ecological dynamics [35,36]. Change detection in global CI can provide a rich and temporally consistent description for the trend, the change magnitude, and the date of change. A change detection algorithm (LandTrendr) was implemented for the time series of annual average CI as a test case to find the change trend. LandTrendr first segments the entire series from 2000 to 2020 into several segments based on the statistic method. Each segment indicates the monotonously increased, decreased, or stable trend in a specific period. In this study, the trend segment with the maximum change magnitude was selected to analyze and remove trend segments that covered <3 years because several segments can be detected from the time series of CI.
The CI change magnitude ( Figure 5) was derived from the difference between the two CI values, corresponding to the start and end dates of the trend. The results indicate that 54.2% of the pixels have no abrupt or gradual trend in the entire time series, and 28.7% of the pixels show a positive trend. A negative trend (17.1%) indicates that the foliage of vegetation is clumping, and this situation is mainly distributed in evergreen forests. The decreasing trend was tracked in northern India and central China, which is consistent with the results of Wei et al. [18]. Because the annual variation of CI is not distinct compared to its seasonal variation [20], the annual average value can cover the difference that exists in the specific period. Additionally, the range of CI is from 0.3 to 1.0, and its trajectories do not show variation as significantly as the LAIs [15,19]; therefore, the marginal trend is inadequate for detection by the LandTrendr. Here, for better comparison, the spatial distribution of duration and its percentage were drawn except for the 21-year segment (full-period trend) as most of the area exhibited a stable trend during the 21 years. Figure 6 spatially displays the duration of the segment trend through discrete coloring for 5-year intervals, and Figure 7 shows the statistics for the duration. These illustrate that the segment trend was maintained for a few years; however, the trend persisted for the full period, and those segments tended to concentrate in arid regions or those of very high latitude. The successive distribution of several year-long segments in those areas could have been induced by the bad retrieval of CI or the occurrence of strong disturbance of the vegetation.

Contribution of the Study
For the design of the CI retrieval software, many types of data, including the monthly fraction of vegetation coverage and the landcover map, were first uploaded to the GEE and then shared publicly so that others can directly obtain results using this visualization software. Our software generates and disseminates canopy CI effectively and conveniently and thus supports subsequent research to be conducted efficiently within a user-defined scale. First, our CI retrieval software runs on the internet, without installation and extra requirements for personal computation, and directly provides downloading of CI at three temporal scales (daily, monthly, and yearly). In terms of retrieval, the CI was smoothed using an SG-filter for the user-selected collection to ensure spatiotemporal consistency and continuity, which is otherwise inhibited due to the difference in satellite observation quality at different time points. The monthly or yearly export is averaged from the smoothed collection and then composited. A comparison of the offline and GEE outputs was added to Appendix A. The year 2019 was selected for the comparison. The two outputs show consistent fluctuation ( Figure A1). The GEE output shows more distinct seasonal variation but slightly underestimates the offline CI (bias <0.03). This is mainly attributed to a new version of fCover used in the GEE estimation. The GEE version uses the NDHD method directly, while the offline estimation retrieves multiple values first from the lookup table and outputs the averaged value.
Three different kinds of temporal-scale CI products with their quality indicators for the period March 2000 to May 2020 were released using this software. Users can also define their spatial or temporal scale to obtain CI products through a very simple modification of relative parameters in the shared code. fCover will be continually uploaded once the latest is updated on https://land.copernicus.vgt.vito.be/ (accesed on 25 May 2022) to enable our software to generate a real-time CI product as soon as the MODIS products are renewed in the GEE. Consequently, the developed CI downloading software with its code extends current product coverage and provides an opportunity to update CI products at the same pace at which data are input.

Applicability of the Software
A variety of key variables related to the ecological function of vegetation are influenced by the foliage clumping effect [3,12,37]. Our software is very promising as it is integrated with the global CI product to estimate global products such as vegetation cover, leaf-area index, and evapotranspiration. The GEE provides a series of basic operations for image or other geospatial data, together with thematic algorithms such as classification, deep learning, and change detection. Therefore, this software provides the possibility to conduct global-scale research related to CI relying on abundant remote sensing data archived in the GEE and its powerful computation ability. Consequently, we recommend that a large-scale analysis related to CI can be directly conducted on the GEE because it supports better online data-intensive processing than downloading data of several decades.
The variation in CI indicates the change in vegetation in terms of phenology, the receipt of radiation, and growth trends during ecological processes. The test case shows that the annual averaged CI has a relatively stable trend from 2000 to 2020, and this could be because of the relatively distinct variation in seasonal CI instead of the interannual average CI. The key characteristics derived from the change detection in seasonal CI have promising applications in terms of classification of vegetation, phenology monitoring, and climate change modeling.

Retrieval of CI in the GEE
The new version of monthly fCover was adopted in the production period. The fCover with a spatial resolution of 1 km was produced based on SPOT/VEGETATION data; however, it was not archived by the GEE [38]. Currently, the spatial matching fCover was produced based on a linear analysis of spectral reflectance derived from MODIS [39], which could be easily implemented in the GEE. The uploaded GLC2000 is a single image that determines the final retrieval coefficients based on the vegetation type. The global MODIS yearly land cover product was archived in the GEE. Therefore, all required data for CI retrieval were directly provided or further generated using the MODIS observation dataset. Additionally, the MODIS dataset has been continuously collected and accumulated in the GEE, which enables one to achieve a better spatiotemporally matched CI retrieval by avoiding a manual upload of the required data.
The CI retrieval by this software was achieved based on the generation of a look up table, indicating that the CI result is an approximation because input data were approximated. Therefore, the new retrieval algorithm is expected to directly link the input and output.

Conclusions
The increasing requirements for global CI pose significant challenges for data processing, storage, and dissemination. Current CI products need continuous updating to deal with the continuous generation of new data. To alleviate restrictions posed by previous products and make the application of CI convenient, visualization software that actualizes CI retrieval was developed by relying on the GEE to allow users to download and use real-time products that can be customized based on specific regions and dates. The CI is an important structural parameter of vegetation that influences the transfer of radiation by interacting with the canopy. Based on the developed software, global-scale research on CI can be directly carried out under the aegis of the GEE. The developed software will continuously update CI as soon as the latest materials are available in the GEE. Our retrieval program, which is coded using JavaScript, is shared such that many research contributions to and applications of CI can be achieved globally.

Acknowledgments:
The authors would like to thank Shanshan Wei and Yinghui Zhang for their support regarding the implementation of the algorithm, and Sijia Li and Miao Yu for their assistance with the software testing. We also thank the developers in the GEE community for sharing very useful sample codes.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
A comparison of CI derived from GEE with the offline CAS-CI product.