Efficient Estimation of Biomass from Residual Agroforestry

: Cost-effective sampling methods for the estimation of variables of interest that are time-consuming are a major concern. Ranked set sampling (RSS) is a sampling method that assumes that a set of sampling units drawn from the population can be ranked by other means without the actual measurement of the variable of interest. We used data on vegetation dynamics from satellite remote sensing as a means in which to rapidly rank sampling units across various land covers and to estimate their residual agroforestry biomass contribution for a small cogeneration facility located in the center of a study area in central Italy. A remote sensing map used as an auxiliary variable in RSS enabled us to cut down the photo-interpretation of the residual biomass present in sampling units from 745 to 139, increase the relative precision of the estimate over common simple random sampling, and avoid individual subjective bias being introduced. The photo-interpretation of the sampling units resulted in a 1.12 Mg ha −1 year −1 mean annual density of residual biomass supply, although unevenly distributed among land cover classes; this led to an estimate of a yearly supply of 132 Gg over the whole 2276 km 2 wide study area. Further applications of this study might include the spatial quantification of biomass supply-related ecosystem services.


Introduction
Mapping vegetation function is an important technical task for managing natural resources as well as related ecosystem services. The classification and mapping of ecosystem services and environmental variables at local to global scales provide valuable information for understanding both natural and anthropic environments at a given time point or over a continuous period [1]. Traditionally, agriculture and forest information is sampled through field samplings or by photointerpretation of very high resolution images [2,3]. However, in the absence of auxiliary data layers, simple random sampling may not be effective to quantify environmental data because the large number of samples needed to achieve sufficient precision is constrained by economic and time issues [4][5][6].
Bioenergy produced from the residues of crops and forest wood production has gained renewed attention in the context of carbon emissions reduction [7] and climate change mitigation strategies [8]. Using biomass residues from management practices such as prunings from orchards and forest cuts as input in novel supply chains may shift a waste problem into a resource [9]. Traditionally, in Mediterranean countries, residual biomass (RB) from prunings in olive groves, vineyards, and orchards is incinerated or left in place. Entering RB into an energy provision supply chain could achieve its promise as a sustainable alternative to short-rotation forestry, thereby avoiding land use change and its negative consequences [10][11][12].
Offsetting fossil fuel carbon emissions by burning agricultural and forestry RB in sustainable and short-distance supply chains to produce heat and power is among the many strategies to cope with climate change [13]. Estimating the RB on a spatially-explicit scale that can enter a sustainable energy provision chain is key to pinpointing the location of a potential heat and/or energy production plant [14][15][16] that can consume it. The potential RB availability per land cover of the regional area around the plant and the transportation time to the plant itself are among the most critical variables that can affect the sustainability of a short-distance supply chain.
Remote sensing technologies have proven to be good tools for assessing the availability of agriculture and forest resources such as biomass. Satellite-based data have proven to be effective both in terms of biomass estimation [17] and mapping [18,19]. Earth observation satellites record and collect spatial information regularly, with wide coverage and low cost, and therefore represents an advantageous tool for the detection of natural and agricultural resources over the last decades [20]. Remote sensing data are available from coarse (more than one kilometer) to fine (sub-meter) spatial resolution, and at variable temporal resolution, from daily to monthly [21].
Due to the general availability of high temporal and spatial frequency satellite remote sensing data, local and regional areas may be sampled by integrating suitable spatially explicit indicator maps into traditional sampling schemes as they are stratification layers. This is fundamentally opposite to the usual approaches to quantify ecosystem services such as aboveground productivity and RB by using remote sensing data on populations stratified by pre-existing ancillary data available on site [22,23].
Ranked set sampling (RSS) is an alternative to simple random sampling where the basic assumption is that a set of sampling units are randomly drawn from the population and are ranked by certain means without the actual measurement of the variable of interest, which is costly and/or time-consuming [24]. RSS is conceptually similar to the classical stratified sampling in that each population is divided into sub-populations so that the units within each sub-population are as homogenous as possible. In traditional applications of RSS, the ranking process requires visual comparison of the sampling units and, due to practical reasons, plots were clustered together. We removed the need for visual comparison by using a spatially explicit indicator map derived by optical satellite remote sensing as an auxiliary variate (auxiliary variable) to drive the ranking process of the sample units. While the introduction of spatially explicit maps as auxiliary variables avoids the bias that may be introduced by subjective ranking, its main advantage lies in the possibility of extensively widening the region of interest from local areas to regional and global areas while retaining or increasing the sampling precision of simple random sampling.
Our aim was to couple a spatially explicit map derived from optical satellite remote sensing as an auxiliary variable to a ranked set sampling design to sample the potential RB available in an area of 2200+ km 2 in order to evaluate the sustainability of the biomass supply for an heating power plant located north of Rome (Italy). To the best of our knowledge, this is the first time that (i) RSS has been applied in such a wide area context, and (ii) a remotely sensed spatially explicit map has been used as an auxiliary variable to rank the sample plots.

Study Area
The small scale supply chain is spatially located in the northeastern outskirts of Rome, central Italy ( Figure 1), approximately centered on a combined heat and power biomass plant (cogeneration facility) located at the geographical coordinates of 42.103 °N, 12.628 E. The cogeneration facility was built on the property of the Research Center for Engineering and Agro-Food Processing (CREA) of Monterotondo (Rome, Italy) to pursue academic investigations. The facility can be used for thermal energy production and for micro-cogeneration by using a steam generator, with a potential production of around 40 electric kWh. The potential residual biomass (RB) needed for the continuous (7200 operational hours per year) production of heat and power is 811 Mg year −1 . A RB storage and pre-processing facility is in operation close to the cogeneration plant. RB is then delivered to the preprocessing plant where it is chipped and stored to enable dehydration, and then fed to the plant. The Tiber River, which flows from the northern to southern side, cuts the area into two eastwest regions approximately equally spaced. The eastern part overlaps the Sabine Hills, an area historically devoted to olive cultivation and the production of olive oil [25]. The western side overlaps a low-lying area mainly used for mixed farming, interspersed with ditches and small rivers where the natural deciduous forest vegetation is still conserved and historically known as the "Roman Campagna" (e.g., [26]). The Tiber River flows toward and in the city of Rome, whose urban settlements are included in the southern part of the study area.
A transport-time constraint defined the study area's outermost boundary. Any spatial point located no more than 60 min by road transport away from the pre-processing facility was included in the study area, assuming direct delivery from the pick-up point to the facilities. The pick-upfacility route was selected as the shortest path on the publicly-available road network. Motorways such as the north/south-bound A1 were excluded from the selection process. Consequently, the shape of the study area was irregular and dependent on the spatial arrangements of the road network, its road categories, and speed limits. Hilliness or slope of terrain was not considered in the route selection. Rather, those components were indirectly taken into account in route selection according to the lower street categories and lower speed limits featured in hilly areas. Due to limitations of the publicly available dataset on the road network, load limits were not considered in the optimal route selection. Faster roads widened the study area (e.g., toward the north-east to the city of Rieti along 'Via Salaria', codenamed SS4dir and SS4bis in Figure 1, or toward the north-west along 'Via Cassia', codenamed a SR2), whereas slower roads narrowed it (e.g., toward the eastern hilly area). R package osrm (OpenStreetMap-Based Routing Service [27]) version 3.2.0 was used to compute five isochronous rings at 10' intervals, starting at 0'-20' (isochronous ring code IR5) to 50'-60' (isochronous ring code IR1) from the cogeneration facility ( Figure 2, Table 1).

Figure 2.
Spatial extent and distribution of the transportation time belts (isochronous rings) from pickup points to the cogeneration facility in the study area. Table 1. Minimum and maximum transportation times (in minutes) from the pick-up point to plant facility, and surface area (in km 2 ) of each isochronous ring (IR). IR1  50  60  821  IR2  40  50  542  IR3  30  40  433  IR4  20  30  354  IR5  0  20 126 All IRs 2276

Definition of the Sampling Populations
Potential RB supply was defined as 100% of the RB available in the study area. Actual RB that can be supplied to the cogeneration plant can vary greatly due to economic and logistic reasons, even for a tiny fraction of the potential biomass.
Potential RB is unevenly distributed across land use classes ranging from agriculture (providing close to zero RB) to highly stocked forested areas. Populations were defined according to specific land cover classes on the basis of the Corine Land Cover 2018 [28] classification (CLC). The CLC is a European program that provides information on land cover and land cover changes across Europe organized into 44 classes in a hierarchical 3-level nomenclature.
CLC classes that could supply potential RB included two classes among the urban category (  Each CLC class intersecting the study area was defined as a population. As a result, eight populations were defined ( Figure 3). Each population was sampled using ranked set sampling (RSS).

Auxiliary Variable from Remote Sensing
The auxiliary variable in ranked set sampling acts as the ranking variable of the random sample units. The normalized difference vegetation index (NDVI, [29]) aggregated at yearly level (NDVI2018) was used as the auxiliary variable. NDVI is directly related to the photosynthetic capacity, and hence the energy absorption of plant canopies [30,31] and is an excellent predictor of annual tree productivity [32], which in turn, is well correlated to RB. NDVI was estimated for all available visits over the study area of the satellites of the Copernicus Sentinel-2 (S-2) mission in 2018 as: where , , is the NDVI at time t and spatial coordinates x,y, 842 and 490 are the spectral reflectances of the central wavelengths of the near-infrared and red bands of S-2 recorded at time t and at the x,y coordinates. These spectral reflectances are themselves ratios of the reflected over the incoming radiation in each spectral band.
The S-2 satellite pair aims at providing multispectral data with a 5-day revisit frequency [33]. Cloud and cirrus formations were detected through the quality assurance metadata provided and the resulting pixels masked from the NDVI calculation. NDVI trajectories for all available pixels in the study area (at 10 m spatial resolution and 5-day temporal resolution) were collected; trajectories consisting of less than 13 time points were discarded. A harmonic model of time was fitted to each profile and NDVI harmonic trajectories were predicted every 10 days: , , where , , , and are harmonic model coefficients fitted to each (x,y) coordinate pair. An example of yearly NDVI trajectories for an olive grove and a broad-leaved forest in the study area is given in Figure 4. NDVI variability between successive values largely accounted for discrepancies between the spatial resolutions of the cloud mask (60 m) and the spectral bands [34], particularly evident at the boundary between the cloud and non-cloud mask. The raw NDVI trajectory of the broad-leaved forest appears to have been more severely affected by this issue, probably due to the higher frequency of cloudy conditions over mountain ranges where this land cover is prevalent.  Figure 5. Spatially explicit map of the normalized difference vegetation index for 2018 used as the auxiliary variable in RSS over the study area. The red-colored area corresponds to urban built areas (NDVI close to 0), greenest colors denote highly stocked vegetated areas such as broad-leaved forests (NDVI close to 300,000).
The estimation of the NDVI profiles and total NDVI2018 as the auxiliary variable was performed in Google Earth Engine [35].

Sampling Design
The essence of the ranked set sampling (RSS) is conceptually similar to stratified sampling where ranks are used to post-stratify the sample plots, resulting in a population being divided into subpopulations so that the plots within each sub-population are as homogeneous as possible.
RSS was first proposed by [36] in an effort to more efficiently estimate the yield of pastures by avoiding mowing and weighting the hay in every single sampling unit. Since then, it has been used in a variety of contexts where its efficiency has been proven such as for estimating shrub biomass in Appalachian Oak forests [37], sampling soil locations for estimating plutonium inventory [38], population censuses [39,40], genetic linkage analysis [41], horticultural surveys [42], and yield estimation in fruit tree orchards [43]. An overview of the original form of non-parametric ranked set sampling as well as balanced and unbalanced non-parametric design and RSS with auxiliary variables has been provided by [24].
A set of random sample plots was drawn independently from each population defined by the CLC class. The variable of interest was the yearly potential RB. The number of sample units varied according to population heterogeneity (estimated as the NDVI2018 standard deviation), and to spatial extension and fragmentation of the population area (Table 3). Table 3. RSS design parameters for the populations belonging to the CLC classes. k is the number of ranks (or sets); n is the number of cycles; RP is the relative precision of the estimator of the mean for RSS relative to the usual estimator for simple random sampling; the random sample is the sample size randomly drawn in each population; the ranked set sample is the sample size selected under RSS; and the discarded sample is the sample size that was discarded after selection and ranking. Each set of random sample plots in each population was ranked according to the total summation of NDVI2018 values of the pixels falling in. A ranked set sample was selected from each set of random sample plots. Ranked sets samples are usually described in terms of sets and cycles. A 'set' is a random sample of k units; the selection of k sets in sequence completes a 'cycle'. Balanced rank set sampling entails the selection of k sets in each of the n cycles of sampling. Thus, a total of nk 2 = nk + nk(k − 1) sample plots are selected from each population. From these nk 2 sample plots, a ranked set sample of nk plots is extracted for actual estimation of potential biomass. Of the k sample plots in the first set, only the plot with rank 1 is accepted in the ranked set sample. Next, a second set of k plots with rank 2 is accepted, and so on until the kth plot is accepted. This procedure is repeated in each of the n cycles. The unselected nk(k − 1) sample plots are discarded [4].

CLC
The optimal choice of n cycles and k sets varied according to the expert judgement of the heterogeneity of the variable of interest and the spatial extent of each CLC class. The higher heterogeneity and/or the wider spatial extent, the greater the k and n chosen ( Table 3).
The relative precision of the estimator of the mean for RSS relative to the usual estimator for simple random sampling (i.e., the ratio of the sampling variances) varies with set size k from 1 to (k + 1)/2, which means that RSS is always at least as precise as simple random sampling and increases with the number of ranks the population is divided by [4].
The unbiased estimator for the yearly potential supply of RB, per hectare is given by: where is the supply of RB density in the RSS sample plots, estimated by photo-interpretation (in Mg ha −1 yr −1 ). A consistent estimator for the variance of the mean is in [14] (p. 26): (5)

Photo-Interpretation of Potential Residual Biomass Density in the Sample Plots
Sample plots were projected onto Google Earth software and an expert photo-interpreter identified and quantified the potential RB density available in Mg ha −1 yr -1 . To aid in the process of RB density estimation, the photo-interpreter was given relevant RB density corner values to choose from. The RB density corner values were set according to the CLC class and tree density.
Four RB corner density values (Bc) were defined to cover maximum, ordinary, minimum, and no potential RB density that each CLC class could sustain in the study area (Table 4). Bc in urban and agriculture CLC classes (1.4.x and 2.x.x classes) was estimated as the total RB density available (in Mg ha −1 year −1 ) after each pruning operation averaged by the time between each operation: where bc is the average RB per tree (t tree −1 ); dc is the tree density (tree ha −1 ); and y is the number of years between each pruning operation (year). Minimum RB corner density value (c = 4) was set to 0 Mg ha −1 yr -1 .
Single tree estimates for bc are not available in broad-leaved forests. We used scaled up RB values at the hectare level ( ⋅ ) averaged over the typical rotation period for coppices (25 years) in central Italy.
The photo-interpreter could assign as many as four RB density corner values to different subareas of each (r, j) sample plot. The final potential RB density for each sample plot (in Mg ha −1 yr −1 ) was calculated as: where [ ] is the total area of the sample plot (r, j), in ha; [ ] is the area in ha of the c-th sub-area of the sample plot; and [ ] is the RB corner value assigned to the sub-area of the sample plot (in Mg ha −1 yr −1 ). Potential RB at the population level was estimated by Equations (4) and (5).
Examples of RSS sample plots in each CLC class are provided in Figure 6.

Results
The eight relevant land covers for potential residual biomass (RB) supply to the cogeneration plant can be grouped into three categories: urban (CLC class 1.4.1 and 1.4.2), agriculture (2.x.x classes), and forest (3.1.1).
Agriculture was by far the most spatially representative land-cover category in the study area, totaling 863 km 2 . Agricultural land was predominantly occupied with olive groves (41%) and fields were mainly aimed as crops, mixed with forestry land (35%). Deciduous forests extended for 414 km 2 , mainly distributed on the farthest rings away from the cogeneration plant. Urban land covers proved to be negligible in terms of extension and were mostly far away from the plant (Figure 7). The outermost area (IR1) was dominated by broad-leaved forests and agriculture crops, especially on the east sector of the study area (the Sabine Hills), while olive groves and complex cultivation patterns became increasingly dominant in the inner areas (Figures 3 and 7). The area closest to the cogeneration plant (IR5) was the poorest in terms of available RB.
The initial random sample units draw comprised a total of 745 sample plots across the eight CLC classes while the automatic ranking based on the NDVI auxiliary variables enabled us to discard 606 sample plots and to proceed to the photo-interpretation of 139 sample plots (19% of the original sample entity).
The photo-interpretation of the sample plots resulted in 1.12 Mg ha −1 yr −1 overall potential mean RB annual density across all land cover classes (Figure 8), although this was very unevenly distributed (standard deviation 0.457 Mg ha −1 yr −1 ). Green urban areas (1.4.1) such as natural parks and recreational areas carried the highest potentiality to supply RB (1.64 Mg ha −1 yr −1 ), although this varied widely from site to site and was the RB from vegetation operations on trees such as felling and pruning. At the other end of the spectrum, RB density from sport and leisure facilities (1.4.2) and complex cultivation patterns (2.4.2) was well below the average (<0.5 Mg ha −1 yr −1 ) among the land covers, signaling that they could be ignored while defining the economic sustainability of the supply to the plant.
A very high density in RB was also found in agricultural lands such as olive groves, orchards, crops mixed with forests, and, especially, vineyards (2.2.3, 2.2.2, 2.4.3, and 2.2.1), although sampling in the very low spatial extension of the latter land cover (0.16 km 2 ) probably drove the high variability around the mean density value.
Forests in the study area provided 0.767 Mg ha −1 yr −1 , although this is a long-term average (25 years rotation period, Table 4) and the actual yearly supply may be dependent on the spatial extent of the forests being cut year by year. Coupling the mean annual density of RB to CLC class area extensions provides an insight as to where and how far the RB reservoirs are (Figure 9). The total yearly potential RB available in the study area was 132 Gg. The western areas (the "Roman Campagna") could predominantly supply biomass from mixed agriculture/forest land cover (CLC 2.4.3) in moderate to high amounts each year (38 Gg available with a 30 to 60 min transportation time). The bulge area in IR1 was caused by the presence of a fast road toward north-west (SS2, Figures 1 and 2) making more heterogeneous land cover available, but does not represent an advantage in terms of supplying the plant due to the low potential RB quantities that these land covers can sustain and the high landscape fragmentation.
Moderately far away areas (a 20 to 40 min transportation time) can supply around 50 Gg year −1 predominantly from the potential RB of olive groves.
Overall, the broad-leaved forests provide 31.8 Gg year −1 RB supply, predominantly from the farthest areas to the east of the cogeneration plant (known as the Sabine Hills, a 40 to 60 min lorry drive time, Figure 3 and Table 1). The presence of a fast road (SS4 and SS4bis, Figure 1) from the north-east is a coincidental advantage to the supply chain as it makes RB from forests in IR1 ( Figure  2) available (i.e., no farther than a 60-minute drive time) to the cogeneration plant.
RB supply from green urban areas and leisure facilities was negligible and very far away from the cogeneration plant.

Discussion
The objective of this study was to propose an effective methodology for the evaluation and mapping of residual agroforestry biomass by coupling remote sensing and GIS to a traditional sampling design. In this framework, this study has introduced for the first time an auxiliary variable in the form of a spatially explicit indicator map from satellite remote sensing to augment the efficiency of the ranked sampling scheme. Although the study area was constrained to extend no further than a 60 min transport time from the pick-up location to the processing plant, the focus of our study was on the estimation of agroforestry residual biomass, rather than optimizing the logistics or the optimal facility localization [51,52]. The sampling design was applied on a large study area (2276 km 2 ) to estimate the potential residual biomass (RB) supply available to a cogeneration plant located in its center.
On the assumption that all yearly potential RB is available to the cogeneration plant, we estimated that the supply of biomass will greatly exceed the biomass demand of the plant by almost 200 times (132 Gg year −1 vs. 811 Mg year −1 ). A third of potential RB supply (39 Gg year −1 ) is accounted for by olive groves geographically located very close or moderately close to the plant (0 to 40 min lorry drive time) in the Sabine Hills. The second most overall supplier of RB (38 Gg year −1 ) lies in the crop/forest cultivation pattern that is widely represented in the Roman Campagna, west of the plant, and moderately far away from it (i.e., 30+ min transport drive time). Its high landscape fragmentation may pose a few constraints as far as the pick-up of RB operations is concerned and its economic sustainability will be investigated in a further study.
In this study, the auxiliary variable we selected to rank the samples was easily computed on cloud-computing platforms such as Google Earth Engine. The NDVI vegetation map of the study area contributed to define the total NDVI for each sample plot simply by summing each pixel NDVI value. Ranking of the sampling unit in each set/cycle based on their NDVI values is a straightforward not-intensive operation easily carried out via automatic routines. One further advantage of using an automatic ranking procedure is given by increasing the set sizes to improve RSS efficiency over simple random sampling, particularly on highly heterogeneous populations. Traditionally, set sizes of three or four are common due to the difficulty of individual judgement ordering them into a large number of ranks. In this study, we used a set size of seven in olive groves and complex cultivation patterns land covers, and of five in land principally occupied by agriculture land cover.
Traditional RSS ranking often involves judgment. Although erroneous ranking does not lead to biased estimates [53], it may fall into random assignment and may lower the efficiency of the RSS to the simple random sampling baseline [4]. Judgment may still lead to bias when the individual doing the ranking preferentially or purposively gives rank 1 to the sample plot they would most prefer to extract from the first sample, rank 2 to a preferred plot in the second sample, and so forth. The automatization of the ranking through a spatially explicit auxiliary variable enables us to easily overcome this potential bias. The cost-effectiveness of RSS is clear: in this study, a total of 745 sampling units were randomly drawn in all land cover populations. Ranking based on a priori defined set/cycles led to the estimation of the variable of interest (potential RB) in only 139 sampling units, 18% of those we would have sampled through simple random sampling.
A number of practical factors contribute to significantly lowering the available RB supply including the efficiency of the recovery from the field [54], the optimal quantity that must be left in the field or in the forest for agronomical and ecological reasons, or to maintain soil organic matter and prevent erosion [55]. In addition, it should be emphasized that the actual RB supply heavily depends on owners being economically rewarded by providing it to the plant. A further analysis will expand this study from an economic perspective to assess the economic sustainability of the cogeneration facility. The output of such studies may also represent a contribution in the framework of ecosystem services mapping, which is becoming mainstream in many sustainability assessments, but its impact on real world decision-making is still limited [56].