Anthropogenic and natural stressors and disturbances affect all levels of biotic organization in forest ecosystems and can thereby affect their resilience. Relationships among drivers, stressors, disturbances, and effects are complex, often non-linear, and multidimensional across temporal and spatial scales. Adaptation processes in forest species are both species-specific and region-specific, which makes causal stress responses and their effects on ecosystem resilience even more difficult to understand [1]. To gain a better understanding of the effects of different stressors and disturbances, and to support forest managers, decision makers, and politicians in their decisions on forest management, a holistic approach is imperative. This includes data recording, monitoring, analysis, and the assessment of forest health at all levels of forest health monitoring [2], the monitoring of biodiversity and forest ecosystem change through effective global coordination [3], and good observations, indicators, and scenarios of biodiversity and ecosystem services change [4].
There is still a large discrepancy between the information required by forest managers and scientists and the information available for understanding and assessing the complexity and multidimensionality of forest health drivers, stressors, disturbances, and effects. Decision makers require low-cost forest health information with high spatial and temporal accuracy, from the local to the global scale and for both short and long periods. Such information should be comparable among regions and based on harmonized methods. For this reason, forest inventory programs have largely been harmonized [5]. The forest health indicators used in forest inventory programs are accurate, but acquiring them is time- and cost-intensive, so in most cases plot-based monitoring can only be carried out at long intervals (one year or longer).
Today, an increasing amount of forest health data is available from national forest inventories and monitoring programs [8], from experimental studies of forest ecosystems [10], and from remote sensing (RS) data and RS data products [11]. In addition, RS technologies can be used to derive a wide range of forest health indicators [6]. Studies increasingly point to the importance of combining national forest inventory field plots with RS data for assessing forest health [18], but only a limited number of countries [20] and operational programs support plot-based national forest inventory monitoring with close-range, airborne, and spaceborne RS data.
A study on the significance of RS in national forest inventories [20] showed that approximately 71% (32 out of 45 countries) consider their national forest inventory (NFI) to depend fully or partly on earth observation data. Approximately 31% of respondents (14 countries) found the use of aerial images for NFI to be indispensable. Only around 13% of respondents considered RS data to be essential for NFI, whereas 9% found that a combination of aerial and satellite images was necessary. Another 56% believed that RS data cannot be used for NFI. No country reported the use of radar or light detection and ranging (LiDAR) data from airborne or spaceborne sensors or from unmanned aerial vehicles (UAV) [20]. In a further study of European countries conducted in 2015, 24 out of 34 countries reported “no changes” in their continued use of existing information instruments such as RS technologies, whereas eight countries reported “changes” to their national forest inventories related to the use of new RS technologies [21].
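The survey shares quoted above can be checked against the country counts stated in the text. The Python sketch below recomputes only the two shares for which counts are given (32 and 14 of 45 countries); the helper name `share_percent` is hypothetical, and rounding half up is our assumption about how the published percentages were rounded.

```python
def share_percent(count: int, total: int) -> int:
    """Return count/total as a whole-number percentage, rounded half up.

    Integer arithmetic is used so that exact .5 cases round predictably.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= count <= total:
        raise ValueError("count must lie between 0 and total")
    return (200 * count + total) // (2 * total)


RESPONDENTS = 45  # countries that answered the NFI survey [20]

depends_on_eo = 32     # NFI depends fully or partly on earth observation data
aerial_essential = 14  # aerial images considered indispensable

print(share_percent(depends_on_eo, RESPONDENTS))     # 71
print(share_percent(aerial_essential, RESPONDENTS))  # 31
```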
The reasons for the low integration of RS technologies into forest health monitoring are as follows. (i) Previously developed RS technologies either do not record certain forest inventory indicators or record them only with insufficient quality. (ii) Complex and large RS data often impose high technical and personnel requirements for data management, storage, processing, analysis, and the derivation of forest health indicators. (iii) The processing and analysis of RS data require specialized RS training, extensive expertise, and access to RS software and to RS data with high spatial and spectral resolution. (iv) The methodological approaches, variables, and recording parameters of forest inventories [6] differ from those of close-range, airborne, and spaceborne RS approaches [1]. Plot-based forest inventory is primarily carried out through field visits. The accuracy of national forest inventories requires long-term experience with indicators and easily enforceable standardization targets. Recording and assessing forest inventory indicators requires not only specialist competences but also physiological, species-specific, bioregional, or climate-related knowledge. The reasons underlying shifts in forest health indicators [22] might thus easily be misinterpreted by whoever processes the forest inventory data.
Plot-based forest inventory indicators are thus a combination of indicators that are primarily recorded by humans (human-driven) and partly recorded by instruments (machine-driven). (v) Conversely, close-range, airborne, and spaceborne RS techniques record machine-driven indicators of forest health, such as spectral traits (ST) and spectral trait variations (STV), in a spatially and temporally explicit way [1]. The accuracy of forest health indicators derived from RS depends “on the spatial, spectral, radiometric, angular, and temporal resolution of the RS techniques, the distribution of ST/STV in space and their temporal changes, the choice of the modeling method (classification, biophysical/chemical parameter estimation), the geographic data representation (pixel-based or geographic object-based), and the appropriateness of the RS algorithm and its assumptions for the given spectral traits” [25]. Applying data science methods to machine-driven forest health indicators (forest inventory, in situ, and RS indicators) is objective, quantitative, repeatable, and comparable, and thus forms the basis for a better understanding of the complex processes and drivers of forest health on different scales.
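To make the idea of a machine-driven spectral trait and its variation over time concrete, the sketch below computes the Normalized Difference Vegetation Index (NDVI) per pixel on two dates and flags pixels whose value dropped. NDVI is our choice of example trait, not one prescribed in this section; the reflectance values, the mapping to Sentinel-2 bands B4 and B8, and the 0.1 threshold are illustrative assumptions.

```python
from typing import List, Tuple


def ndvi(red: float, nir: float) -> float:
    """Normalized Difference Vegetation Index from red and near-infrared reflectance."""
    denom = nir + red
    if denom == 0:
        raise ValueError("red + nir must be non-zero")
    return (nir - red) / denom


def trait_variation(before: List[float], after: List[float]) -> List[float]:
    """Per-pixel change of a spectral trait between two acquisition dates."""
    if len(before) != len(after):
        raise ValueError("both dates must cover the same pixels")
    return [a - b for b, a in zip(before, after)]


# Hypothetical (red, NIR) surface reflectances of three pixels on two dates,
# e.g. from Sentinel-2 bands B4 (red) and B8 (NIR)
june: List[Tuple[float, float]] = [(0.04, 0.45), (0.05, 0.40), (0.06, 0.42)]
august: List[Tuple[float, float]] = [(0.04, 0.44), (0.09, 0.30), (0.06, 0.41)]

ndvi_june = [ndvi(r, n) for r, n in june]
ndvi_august = [ndvi(r, n) for r, n in august]
delta = trait_variation(ndvi_june, ndvi_august)

# Flag pixels whose NDVI fell by more than an illustrative threshold of 0.1
flagged = [i for i, d in enumerate(delta) if d < -0.1]
print(flagged)  # [1]
```

Because the result is a per-pixel, per-date quantity, the same computation can be repeated identically on every acquisition, which is what makes such indicators comparable in the sense described above.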
Even though appropriate data and methods are available, no uniform, comparable, multidimensional, and multi-source forest health monitoring network (MUSO-FH-MN) exists that meets the future requirements of data science in the 21st century. We think that such a network is necessary to enable timely data- and process-based decisions and thus to ensure better understanding, modeling, prediction, and assessment of healthy and resilient forest ecosystems.
The goal of this paper is to discuss:
What is required to bridge the gaps in information, data, models, and tools needed by forest managers and scientists for a better understanding of the complexity and multidimensionality of forest health drivers, stressors, disturbances, effects, and related processes?
Why are these requirements essential for a better understanding of forest health?
Which requirements are imperative to establish a multidimensional, multi-source forest health monitoring network in the future?
2. Forest Health
2.1. Understanding Forest Health
Forest ecosystems are exposed to multiple stressors that can destabilize functions, structures, and processes and impair the provision of ecosystem services [26]. These multiple stressors mostly interact and co-occur in different ways and on various temporal and organismic scales, which are not yet fully understood.
Stressors, disturbances, or resource limitations can act at every development stage or only at specific stages or time windows of development, e.g., during germination, budding, flowering, or seed formation, causing functions to become impaired or even destabilized. Moreover, the effects of multiple stressors vary across the levels of biological organization [27]. The effectiveness of stressors depends on the characteristics of the process (chemical, mechanical, or organismic driver), on the dose and dose combination of multiple stressors, and on characteristics of the effective processes such as range, duration, scope, intensity, continuity, dominance, or overlap [28].
Furthermore, over longer time frames, multiple stressors can lead to the development of adaptation strategies such as the CSR plant strategy types (C: competition, S: stress tolerance, R: ruderal) [29]. However, stressors only lead to a change in resilience when they cause irreversible changes that make it impossible for the system to return to its initial state (see Figure 1). In this context, forest biodiversity is a crucial component of the concepts of ecosystem health and integrity [31]. These concepts are linked to the self-organizing capacity of ecosystems in the presence of multiple stressors, disturbances, or resource limitations. Healthy forest ecosystems can be defined as vigorous, diverse systems characterized by high resilience at different levels of biotic organization, from the gene, molecular, individual, and community levels up to the forest ecosystem, with the ability to quickly return to an initial state after external stressors, disturbances, or resource limitations and to withstand negative external impacts [34].
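The definition above centers on the ability to return to an initial state. One common way to quantify this for individual trees, which we add here as an illustration and which is not proposed in this section, is a set of ratio-based indices from dendroecology that compare a performance variable (e.g., ring width) before, during, and after an event. The function name and the ring-width values below are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def resilience_indices(pre: Sequence[float], during: float,
                       post: Sequence[float]) -> Dict[str, float]:
    """Ratio-based indices for a performance variable (e.g. annual ring width)
    measured before, during, and after a stress or disturbance event."""
    pre_mean, post_mean = mean(pre), mean(post)
    if pre_mean <= 0 or during <= 0:
        raise ValueError("pre-event mean and event value must be positive")
    return {
        "resistance": during / pre_mean,     # share of performance kept during the event
        "recovery": post_mean / during,      # rebound relative to the event low
        "resilience": post_mean / pre_mean,  # ~1.0 means back to the initial state
    }


# Hypothetical ring widths (mm): three years before, the drought year, three years after
indices = resilience_indices(pre=[2.0, 2.2, 1.8], during=1.0, post=[1.6, 1.8, 2.0])
print({name: round(value, 2) for name, value in indices.items()})
```

A resilience value below 1.0, as in this example, would indicate that growth has not (yet) returned to its pre-event level; whether that reflects an irreversible change can only be judged from longer series.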
2.2. Indicator Requirements for Monitoring Forest Health
The effects of multiple stressors on forest ecosystem health are mostly non-linear, complex, and multidimensional. They depend on (i) the spatial or organismic characteristics and scale; (ii) the temporal characteristics and scale; and (iii) the process characteristics and scale. Indicators of forest health need to meet these requirements, i.e., they must be able to map various organismic, temporal, and process characteristics on different scales. Only then can forest health and its resilience be analyzed, compared, and predicted. With the increasing integration of freely available RS data [12], the indicators must also be suitable for combining different in situ and RS approaches. In summary, (sets of) forest health indicators must meet the following content-related and methodological criteria:
They must reflect status, stress, disturbances, and resource limitations.
They must be able to map different (i) organismic; (ii) temporal; and (iii) process characteristics, all across different scales.
They should be recordable by different in situ monitoring approaches (forest inventory monitoring, biological and morphological species concepts, concept of ecological integrity, monitoring of animal distribution, anthropogenic drivers, or abiotic indicators (soil, water, climate, air), as well as close-range and air and spaceborne RS approaches on different platforms).
It should be possible to standardize, objectively repeat, and thus compare indicators.
Due to the heterogeneity and the complexity of different indicators in big data, all indicators should be transferable into digital form through a human–computer communication interpretation language to meet the requirements of data science.
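The criteria above can be read as a minimal schema that every indicator in a digital system would have to fill in. The sketch below expresses that reading as a Python data class with a JSON serialization and a check of which of the three characteristics (organismic, temporal, process) a set of indicators leaves uncovered. The field names, the example indicators, and the protocol IDs are hypothetical and only illustrate the criteria; they are not a schema defined by any existing program.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class Characteristic(Enum):
    ORGANISMIC = "organismic"
    TEMPORAL = "temporal"
    PROCESS = "process"


@dataclass
class Indicator:
    name: str
    reflects: str                          # status, stress, disturbance, or resource limitation
    characteristics: List[Characteristic]  # which of the three characteristics it maps
    scales: List[str]                      # e.g. "tree", "stand", "landscape"
    methods: List[str]                     # in situ and/or RS approaches able to record it
    protocol_id: str                       # hypothetical ID of a standardized protocol

    def to_json(self) -> str:
        """Serialize to a machine-readable, platform-neutral representation."""
        return json.dumps({
            "name": self.name,
            "reflects": self.reflects,
            "characteristics": [c.value for c in self.characteristics],
            "scales": self.scales,
            "methods": self.methods,
            "protocol_id": self.protocol_id,
        }, sort_keys=True)


def missing_characteristics(indicators: List[Indicator]) -> Set[Characteristic]:
    """Characteristics not covered by any indicator of the set."""
    covered = {c for ind in indicators for c in ind.characteristics}
    return set(Characteristic) - covered


crown = Indicator("crown thinning", "status", [Characteristic.ORGANISMIC],
                  ["tree"], ["visual field assessment", "UAV imagery"], "PROTO-001")
phenology = Indicator("leaf-out date", "stress", [Characteristic.TEMPORAL],
                      ["tree", "stand"], ["phenocam", "satellite time series"], "PROTO-002")

print(missing_characteristics([crown, phenology]))
print(crown.to_json())
```

Here the check reports that no indicator in the set maps process characteristics, which is exactly the kind of gap the second criterion asks a monitoring design to avoid.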
2.3. Existing Standardized Approaches for Monitoring Forest Health
Forest health (FH) monitoring has a long tradition, with numerous monitoring programs at the local, regional, national, and international levels in which standardized FH indicators have been used and developed. In Germany, three levels of forest health monitoring exist (Table 1) that have become representative for Europe and for initiatives to unite forest monitoring such as the Forest Information System for Europe (FISE, http://data.jrc.ec.europa.eu/collection/FISE). Here, we describe these three levels of in situ monitoring in Germany.
The first is forest condition monitoring (FCM) (Level-I Monitoring). Here, the condition of the forest is recorded annually on permanent plots arranged in a systematic 16 km × 16 km grid, by visually assessing the canopy condition, as an indicator of tree vitality, of around 10,000 trees of the most important tree species. Crucial FH indicators include, e.g., crown thinning, fruiting intensity, yellowing of leaves or needles, infestation by insects or fungi, and damage to trunks or tree crowns. Manuals on Level-I Monitoring provide a comprehensive overview of all recorded FH indicators.
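The practical meaning of a systematic grid is that every plot stands for the same area. The sketch below generates plot positions on a square grid inside a bounding box and computes the area each plot represents (256 km² at 16 km spacing). The bounding box, the placement of the grid origin at its lower-left corner, and the use of projected coordinates in metres are assumptions for illustration; the actual Level-I grid origin is defined by the monitoring program.

```python
from typing import List, Tuple


def systematic_grid(xmin: float, ymin: float, xmax: float, ymax: float,
                    spacing_m: float = 16_000.0) -> List[Tuple[float, float]]:
    """Grid points of a square systematic sample inside a bounding box.

    Coordinates are assumed to be in a projected system with metre units;
    the grid origin is placed at (xmin, ymin).
    """
    if spacing_m <= 0 or xmax < xmin or ymax < ymin:
        raise ValueError("invalid bounding box or spacing")
    nx = int((xmax - xmin) // spacing_m) + 1
    ny = int((ymax - ymin) // spacing_m) + 1
    # Multiplying an integer index avoids drift from repeated float addition
    return [(xmin + i * spacing_m, ymin + j * spacing_m)
            for i in range(nx) for j in range(ny)]


def area_per_plot_km2(spacing_m: float) -> float:
    """Area that each plot of a square grid represents, in km²."""
    return (spacing_m / 1000.0) ** 2


plots = systematic_grid(0.0, 0.0, 64_000.0, 32_000.0)
print(len(plots))                   # 15 (5 columns x 3 rows)
print(area_per_plot_km2(16_000.0))  # 256.0
```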
Another level of in situ monitoring in Germany is intensive monitoring (Level-II Monitoring). Owing to the increasing deterioration in the vitality of numerous tree species, this level also covers indicators such as foliage condition, the solid phase of the soil, foliar nutrient content, deposition, soil solution, weather conditions, air quality, ground cover, phenology, leaf shedding, and visible ozone damage. For the first time, such intensive monitoring enables a better understanding of cause–effect relationships.
Finally, the third level of forest health monitoring is the national forest inventory (Level-III Monitoring). This inventory, which is conducted every 10 years, was initiated by the government as a long-term, sample-based national forest monitoring project and has so far been conducted in 1986, 2002, and 2012. One of its main objectives is to obtain information about forest structure, composition, and quantities of roundwood. Data are collected on a 4 km × 4 km grid. This grid is used in several German federal states, whereas other states have opted to densify the network of sample points. At each grid point, the cluster method is applied, with data collected at the corners of a 150 m × 150 m square [44]. Other major objectives of the German NFI include monitoring the sustainability and balance of forest ecosystems, forest growth and yield, and forest biodiversity and regeneration, which are essential components for understanding FH and its resilience.
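The cluster design described above turns each grid point into four sample plots. The sketch below builds that layout for a small 4 km grid. We assume, for illustration only, that the grid point is the south-west corner of the 150 m square and list the corners counter-clockwise; the authoritative geometry is given in the German NFI field instructions, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def tract_corners(grid_point: Point, side_m: float = 150.0) -> List[Point]:
    """Four sample-plot positions of a square cluster (tract).

    Assumption: the grid point is the south-west corner of the tract and the
    corners are listed counter-clockwise (SW, SE, NE, NW), in metres.
    """
    x, y = grid_point
    return [(x, y), (x + side_m, y), (x + side_m, y + side_m), (x, y + side_m)]


def nfi_sample_plots(spacing_m: float, n_cols: int, n_rows: int,
                     side_m: float = 150.0) -> Dict[Point, List[Point]]:
    """Map each point of an n_cols x n_rows grid to its tract corners."""
    grid = [(i * spacing_m, j * spacing_m)
            for i in range(n_cols) for j in range(n_rows)]
    return {gp: tract_corners(gp, side_m) for gp in grid}


plots = nfi_sample_plots(spacing_m=4_000.0, n_cols=3, n_rows=2)
n_plots = sum(len(corners) for corners in plots.values())
print(len(plots), n_plots)       # 6 24
print(plots[(4_000.0, 0.0)][2])  # (4150.0, 150.0)
```

Keeping the tract as a mapping from grid point to corners preserves the cluster structure, which matters later because plots of the same tract are not independent samples.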
There are comparable in situ monitoring programs in a range of countries that point the way toward standardized FH monitoring strategies (Table 1).
One of the central tasks of the United Nations Economic Commission for Europe (UNECE) is to deal with environmental protection issues, for example under the Geneva Convention on Long-Range Transboundary Air Pollution (https://www.unece.org/env/lrtap/welcome.html). The Working Group on Effects (WGE) was established under this convention and is also responsible for coordinating the international monitoring programs. Two of these programs are the International Co-operative Program (ICP) Forests (EU-Level II) and ICP Integrated Monitoring (ICP IM). Among the largest monitoring programs at the European level are the FOREST EUROPE program of the European Union and the International Co-operative Program on Assessment and Monitoring of Air Pollution Effects on Forests (ICP Forests), which was established in 1990 [21]. The FOREST EUROPE program defines seven criteria to describe the condition of European forests, three of which monitor the maintenance of forest ecosystem health and vitality [21] (see Figure 2).
European forest monitoring for the protection of forests, with annual reports to ICP Forests, was founded under the Convention on Long-Range Transboundary Air Pollution as a European monitoring program and has formed the basis for coordinating sustainable European forestry practices over the last 30 years. ICP Forests monitoring is subdivided into two levels. Level I serves as annual long-term monitoring of approximately 6000 sample plots on a 16 km × 16 km grid across Europe. The FH assessment is conducted via visual assessment of canopy condition, density, and leaf color for specific tree species. Within Level-II Monitoring, process-oriented studies are conducted at selected experimental sites equipped with different in situ sampling probes. In addition, air pollution is mapped extensively, and additional FH indicators are recorded, such as the development and decomposition of soil nutrients, water availability, and soil and vegetation cover [45].
All national and international in situ programs such as FOREST EUROPE or ICP Forests use sample-based and qualitative assessments of indicators that are either ascertained directly (e.g., species composition, quantities of deadwood) or assessed by experts (e.g., classes of crown thinning, yellowing of leaves/needles). A summary of key forest and tree stand parameters observed in different forest inventories can be found in Pause et al. [46]. The quality, consistency, and thus comparability of partially qualitative assessments often depend largely on the knowledge of the monitoring team. For numerous crucial FH indicators such as tree crown density, thinning, naturalness, or regeneration potential, there are no standardized in situ variables or direct measurement procedures. Such assessments introduce a high level of uncertainty into data collection and subsequently into the analysis, assessment, and prognosis of FH. To account for these uncertainties in in situ FH monitoring, extensive specifications have been compiled.
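How strongly expert-assessed classes depend on the monitoring team can be quantified with an inter-observer agreement statistic. The sketch below computes Cohen's kappa, a standard chance-corrected agreement measure that we choose here for illustration, for two hypothetical field teams rating crown thinning of the same ten trees. The class scheme (0 to 3) and all ratings are invented for the example.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal class frequencies
    classes = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[k] * counts_b[k] for k in classes) / (n * n)
    if expected == 1.0:
        # Both raters used one and the same class throughout; kappa is undefined,
        # so we return 1.0 by convention because agreement is total
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical crown-thinning classes (0 = none ... 3 = severe) assigned to
# the same ten trees by two field teams
team_a = [0, 1, 1, 2, 2, 2, 3, 0, 1, 2]
team_b = [0, 1, 2, 2, 2, 1, 3, 0, 1, 3]
print(round(cohens_kappa(team_a, team_b), 2))  # 0.59
```

The unweighted form treats every disagreement equally; for ordinal classes such as thinning levels, a weighted kappa that penalizes adjacent-class disagreements less would be a reasonable alternative.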
2.4. Initiatives to Unite Indicators, Data, and Approaches
Whereas in situ FH monitoring is applied at the plot level, RS techniques can record wall-to-wall information at multiple temporal and spatial scales [47]. There are numerous methodological approaches that use RS technologies for the long-term monitoring of FH [48]. The increasing number and availability of high-quality RS data [11], free access to RS data portals since 2008 (starting with the Landsat mission), and open-access research have opened a new era of FH monitoring for applications, stakeholders, and political decision makers [12].
However, if we are to monitor and understand FH in all its multifunctionality and complexity and to assess the effects of multiple stressors, it would make sense to have a monitoring system that combines in situ terrestrial observations, RS approaches, and other approaches [2].
Linking and establishing an MUSO-FH-MN requires initiatives that promote the standardization, harmonization, and digitalization of different approaches and techniques on a single platform. This goal is pursued by the Global Forest Observations Initiative (GFOI) (http://www.gfoi.org) and the Forest Information System for Europe (FISE).
The Global Forest Observations Initiative (GFOI) (http://www.gfoi.org/about-gfoi/) supports governments in setting up national forest monitoring systems by providing a platform for coordinating observations and by offering support and guidance on using them. It also works with national governments that report to international forest assessments (e.g., the global Forest Resources Assessment (FRA) of the FAO) and submit national greenhouse gas inventories to the United Nations Framework Convention on Climate Change (UNFCCC). Since 2011, GFOI has been supported with extensive RS data from the Committee on Earth Observation Satellites (CEOS). The availability of these data has provided numerous countries with complete RS-based forest cover data, which they require for annual forest monitoring and thus for monitoring global climate goals. Owing to open data policies and cooperation with various national space agencies and coordination units, RS data from the Landsat series (USGS, United States Geological Survey) and from Sentinel-1 (radar) and Sentinel-2 of the Copernicus program have been integrated into GFOI. Further RS data are supplied by space agencies in France (CNES, National Centre for Space Studies), Italy (ASI, Italian Space Agency), Canada (CSA, Canadian Space Agency), Germany (DLR, German Aerospace Center), Japan (JAXA, Japan Aerospace Exploration Agency), Brazil (INPE, Instituto Nacional de Pesquisas Espaciais), and China (CRESDA, Centre for Resources Satellite Data and Application). In the coming years, RS data will follow from Great Britain (NovaSAR, a synthetic aperture radar mission) and Argentina (SAOCOM, Satélite Argentino de Observación Con Microondas). By 2025, such cooperation should enable global FH monitoring supported by RS data.
The Forest Information System for Europe (FISE), hosted by the EU Joint Research Centre (JRC), is a harmonized forest data management and information system established by the EU in 2015 to store, monitor, assess, and predict progress toward the goals set by the EU and its member states in their forest strategies, such as preserving forest multifunctionality or forest biodiversity. Through this harmonization, FISE pursues the following objectives:
Set up a collection of harmonized Europe-wide information on the multi-functional role of forests and forest resources.
Provision of harmonized forest datasets that were developed from EU funding, including Horizon2020, LIFE +, Copernicus, etc.
Integration of diverse information systems (e.g., European Forest Fire Information System—EFFIS) and data platforms (e.g., European Forest Data Center, EFDAC) into a dynamic modular system that combines data (in situ forest data, forest data based on Copernicus RS information) and models into applications.
Linking existing forest-related policy fields of the EU to relevant networks and initiatives in the context of forest information.
Foster the availability of geo-data that are standardized throughout Europe, irrespective of national borders and with a high degree of spatial accuracy, in line with the INSPIRE directive (infrastructure for spatial information in Europe).
Provision of digital information on the distributions of the most important forest tree species and on changes in forest dynamics.
Assessment of the suitability of tree species and forest types for current and future climate conditions, biophysical mapping, and assessment of ecosystem services in the forest.
Measurement of changes to the provision of these services as a consequence of anthropogenic and natural disturbances; economic assessment and ecological and economic audits.
Several modules have been developed within FISE: (i) forests and natural disturbances such as fires and pests; (ii) forests and the bioeconomy; (iii) forests and climate change; and (iv) forests and ecosystem services.
The successful implementation of FISE constitutes the first crucial step toward the harmonization of data, approaches, and indicators to record, store, analyze, assess, and predict the effects of multiple drivers, and can be regarded as a pioneer for the digital harmonization of FH in Europe. Beyond these achievements, a future MUSO-FH-MN requires a critical consideration of the “resilience” of existing FH monitoring programs and initiatives in order to overcome the impending challenges of the 21st century in data management, analysis, or prognosis. Such an MUSO-FH-MN can provide a data-driven, rapid, cost-effective, flexible, and consistent assessment and decision-making support system for scientists, forest managers, or stakeholders. In order to meet these challenges, numerous requirements will be discussed in detail in the following chapters.
Furthermore, there are numerous other networks and activities, which should also be included in a future MUSO-FH-MN (see Table 2).
3. Requirements to Understand the Effects of Multiple Stressors on Forest Health
Our knowledge about the effects of stressors and disturbances on the biodiversity and functioning of forest ecosystems is still limited.
In order to understand the effects of stressors and disturbances on forest ecosystems, we need to investigate interactions between different trophic levels (e.g., soil organisms and plants), interactions between the biotic and abiotic environment (soil, water, trees), as well as different aspects of biodiversity (taxonomic, phylogenetic, (epi-)genetic, phenotypic/functional). A comprehensive understanding is crucial for Soil–Plant–Landscape modeling, forecasting stress effects, and implementing forest management strategies. All of the existing approaches, i.e., (i) tree and forest lysimeters; (ii) close-range RS approaches such as plant/forest phenomics facilities, controlled environmental facilities (Ecotrons), local tree experiments; and (iii) biodiversity–ecosystem functioning (BEF) experiments need to either be extended or re-established.
Level 1: Tree and forest lysimeters:
[…] was the first to indicate the role of soil moisture storage in identifying the link between precipitation and evapotranspiration. Investigating soil characteristics, hydrology, and the structural properties of the vegetation at small scales is important for understanding this link [69]. In this respect, tree or forest lysimeters are particularly suitable for long-term, high-frequency measurements and thus for understanding the effects of stress on tree physiology, such as water stress, high salinity, nutrient deficiency, or drought [69]. Indeed, our current knowledge about the interactions between soil matrix characteristics, soil moisture, and trees is derived from forest lysimeters, but information about how the spectral response relates to the phenotype of the trees is still lacking. To close this gap, tree and forest lysimeters need to be complemented with transmitters in a wireless sensor network [1] and with close-range RS techniques of different sensor characteristics (3D stereo cameras, LiDAR, 2D/3D thermal cameras, hyperspectral thermal cameras, or hyperspectral sensors) to record the phenotype–spectral response of the trees.
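The link between precipitation, evapotranspiration, and soil moisture storage that lysimeters resolve is the water balance ET = P − D − ΔS, where D is drainage (seepage) and ΔS the change in stored water. In a weighing lysimeter, ΔS follows from the mass change, because 1 kg of water spread over 1 m² equals 1 mm. The sketch below applies this to hypothetical daily readings; it neglects surface runoff and the (small) daily change in biomass, which are simplifying assumptions, and the function names are ours.

```python
from typing import List


def daily_et_mm(precip_mm: float, drainage_mm: float,
                mass_start_kg: float, mass_end_kg: float, area_m2: float) -> float:
    """Evapotranspiration of a weighing lysimeter from its water balance.

    ET = P - D - dS, with the storage change dS derived from the mass change:
    1 kg of water spread over 1 m² corresponds to 1 mm.
    Surface runoff and biomass change are neglected (simplifying assumption).
    """
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    delta_storage_mm = (mass_end_kg - mass_start_kg) / area_m2
    return precip_mm - drainage_mm - delta_storage_mm


def et_series(precip: List[float], drainage: List[float],
              masses: List[float], area_m2: float) -> List[float]:
    """Daily ET; masses holds one reading per day boundary (len = days + 1)."""
    if not (len(precip) == len(drainage) == len(masses) - 1):
        raise ValueError("need one mass reading more than daily P and D values")
    return [daily_et_mm(p, d, m0, m1, area_m2)
            for p, d, m0, m1 in zip(precip, drainage, masses[:-1], masses[1:])]


# Hypothetical two days on a 2 m² lysimeter: a rainy day, then a dry day
et = et_series(precip=[12.0, 0.0], drainage=[3.0, 0.0],
               masses=[5000.0, 5004.0, 4996.0], area_m2=2.0)
print(et)  # [7.0, 4.0]
```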
Level 2: Forest inventory plots:
Including the ICP Forests monitoring network (see Chapter 2.3), which assesses tree condition and vitality on a 16 km × 16 km grid across Europe [42], or other sample plot networks of national forest inventories is crucial for ground-truth information on forest health. Here, it is particularly important to establish and integrate different RS sensors, such as optical, thermal, or 3D sensors, on different close-range RS platforms such as towers or wireless sensor networks, in order to monitor high-frequency spectral information for different forest species in forest inventory plots. These are essential interfaces that provide important information for the calibration and validation of models and of airborne and spaceborne RS.
Level 3: Plant phenomics facilities, Ecotrons, spectral tree laboratory experiments:
One of the most important challenges in forest stress physiology research is the spectroscopic recording of plant species phenotypes in order to better understand the links among the phylogenetic relationships of species, their genotype, epigenetics, and phenotype. The genotype of a plant species comprises the hereditary information encoded in its DNA. Epigenetics provides information beyond DNA sequences, helping us understand how genetics, the environment, and anthropogenic drivers work together to shape forest ecosystems. The phenotype of a plant represents its anatomical, morphological, and physiological characteristics (traits) and reflects the interactions of trees with their environment, and thus with stressors and disturbances. The […] affects the phylogenetic, taxonomic, structural, and functional traits of trees and helps determine a specific phenotype [76]. A better understanding of tree stress physiology can be gained by recording the spectral traits (ST) of individual plants and combining this trait information with information on the phylogenetic relationships of species and on their genotype and epigenetics. This can be achieved by recording phenotypic plant traits in plant phenomics facilities [78] or in controlled environmental facilities (Ecotrons) [81]. Here, non-invasive measurement methods such as RS techniques can be applied in an automated manner, enabling holistic and quantitative recording of a plant's phenotype over its entire life cycle at reasonable cost [82]. Not only the phenotypic traits of aboveground parts can be recorded, but also those of seeds or roots [84], as well as the abiotic characteristics of soil and soil water. Complete interactions between soil and soil water, trees, and spectral response have so far only been measured in Ecotrons and partially in spectral tree laboratory experiments. Hence, it is imperative to set up forest phenomics facilities in the future, analogous to plant phenomics facilities but with additional instruments.
Level 4: Biodiversity–ecosystem functioning (BEF) experiments:
The expansion of monospecific plantations increases abiotic and biotic stress and disturbances in forest ecosystems. Hence, biodiversity–ecosystem functioning experiments are required to answer questions about how forest biodiversity (in its phylogenetic, functional, structural, and taxonomic aspects) interacts with ecosystem functions [61]. Furthermore, BEF experiments are crucial for answering questions about the responses of forest species to stressors and disturbances and about thresholds in these responses [87]. They can provide important insights into post-disturbance trajectories and into the individual and combined effects of stressors and disturbances on forest diversity [24], for example, on the resilience of a forest in the face of climate change. To investigate stress–response interactions, BEF experiments should include gradients such as latitude, climate, biomass, or experimental diversity [88]. Scherer-Lorenzen et al. [64], for example, established the large-scale, long-term BEF experiment BIOTREE (BIOdiversity and ecosystem processes in experimental TREE stands) at different sites with temperate tree species in order to explore BEF relationships along geological and climatological gradients.
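Placing plots along an experimental diversity gradient requires a number that expresses each plot's diversity. The sketch below uses the Shannon index H′ = −Σ pᵢ ln pᵢ, a widely used taxonomic diversity measure chosen here for illustration; the three plots and their stem lists are hypothetical and are not taken from BIOTREE or any other experiment.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_index(stems: Iterable[str]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over species proportions."""
    counts = Counter(stems)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no stems given")
    # p * ln(1/p) equals -p * ln(p) and stays non-negative when p == 1
    return sum((c / n) * math.log(n / c) for c in counts.values())


# Hypothetical plots along an experimental tree-diversity gradient
gradient = {
    "1 species": ["beech"] * 8,
    "2 species": ["beech"] * 4 + ["oak"] * 4,
    "4 species": ["beech", "oak", "spruce", "pine"] * 2,
}
for plot, stems in gradient.items():
    print(plot, round(shannon_index(stems), 3))
```

With equal stem numbers, H′ equals the natural logarithm of species richness (0, ln 2, ln 4 here), so the gradient is evenly spaced on a log scale; uneven mixtures would fall below these values.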
Level 5: Forest landscape types:
Long-term forest monitoring experiments for different forest landscape types and eco-regions are extremely helpful for understanding the relationships between global forest biodiversity and forest ecosystem functioning, as well as the underlying processes. TreeDivNet is a global network of tree diversity experiments that consists of 18 experiments on 36 sites in five eco-regions. TreeDivNet provides insights into the ecosystem functions (e.g., pest resistance potential, carbon sequestration) of diverse plantations [61]. The FunDivEUROPE Exploratory Platform is a network of forest plots along tree species diversity gradients in six major European forest types, which aims to understand the effects of species diversity on forest ecosystem functioning [63]. Further BEF experiments with tree diversity manipulations have been conducted in Europe, America, Africa, Asia, and Australia [10]. Bruelheide et al. [10], for example, established a BEF experiment with 40 tree species in subtropical China to explain the role of tree and shrub species richness in carbon storage and soil erosion. Establishing wireless sensor networks and long-term, high-frequency ground-truth monitoring data for airborne and spaceborne RS (see requirement 5) in BEF experiments is urgently required to understand the spectral response of forests (amplification factors, mutually exclusive factors) and the effects of phylogenetics, (epi-)genetics, and phenotype on that spectral response.
6. Requirements for Using Data Science as a Bridge
Statement: Although 90% of the world’s data was generated over two years, around “50% of all research and experiment data (corresponding to some US$28B/year) are not reproducible, and over 80% of it never makes it to a trusted and sustainable repository”.
6.1. Open Science
Worldwide research relies on long-standing, domain-specific, and domain-overarching research infrastructures, comprising disciplinary and interdisciplinary commissions and cooperation. Previously established infrastructures for storing and accessing data are important forerunners in the reorganization of data science. However, previous research structures, commissions, and data storage methods will by no means meet the current and future challenges of data science [175
]. Approximately 90% of the world’s data and information were generated within only two years [176
]. Owing to errors in research design and implementation, together with flawed practices in recording, saving, processing, and linking research data, approximately 50% (corresponding to some US$28B/year) of all research data and information can no longer be reproduced [177
]. It is very probable that over 80% of research data have never been saved in a sustainable repository that would enable access or analysis at a later stage [175
]. Read et al. [178
] reported that only 12% of the datasets, information, or experiments funded by the United States (US) National Institutes of Health have demonstrably been deposited in recognized repositories. The consequences of these undesirable developments in data science are a tremendous waste not only of capital but, in particular, of research data, which are the basis for gaining knowledge and, in our application case, for a better understanding of complex environmental issues such as forest health.
Furthermore, the multidimensionality and complexity of big data, the handling of syntactically and semantically heterogeneous data, and above all the lack of standardized approaches in data management, monitoring, protocols, models, analysis, and assessments will prove to be the greatest challenges. Currently, data science can be regarded as the best option for bridging the gap and solving these problems. Data science comprises numerous components, which are described in the following sections (see Figure 6).
6.1.1. Open Data Policy and Free Data Access
The costs of FH monitoring with RS data of high spatial and temporal resolution are an obstacle to achieving a future MUSO-FH-MN. Therefore, open access to data should form the basis for a future MUSO-FH-MN. Crucial factors for the success of data science in the context of forest health are an open data policy and free access to RS data and RS data products, GIS data, and other data acquired for monitoring forest health. However, imaging systems differ considerably with regard to open data policy and free data access. In most cases, only the large Earth observation programs provide all data free of charge.
Light detection and ranging (LiDAR) sensors placed on airborne or unmanned aerial vehicle (UAV) platforms can provide a range of forest inventory data [1
]. According to Barrett et al. [20
], 15 countries around the world have shown great interest in using LiDAR data in their future NFI monitoring schemes and as forest health indicators. However, LiDAR data are rarely available as open data, and even where they are, no coherent acquisition period is available for larger areas, which makes deriving regular, large-scale parameters difficult or even impossible. One reason for the rather low utilization rate is the cost of acquiring wall-to-wall LiDAR data: a single GPS receiver currently costs US$10,000 [20
], and airborne laser scanning costs between 50 and 150 Euro/km². Cheaper LiDAR-UAV alternatives can be used to record FH, but their inaccuracy is around 20 cm, compared to 5 cm with airborne laser scanning.
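The per-km² price range translates directly into survey budgets. As a rough, illustrative calculation (the two area sizes are hypothetical examples, not taken from the cited study), the 50 to 150 Euro/km² range for airborne laser scanning can be scaled to a study area:

```python
def als_cost_range_eur(area_km2: float,
                       low_eur_per_km2: float = 50.0,
                       high_eur_per_km2: float = 150.0) -> tuple[float, float]:
    """Return the (min, max) cost in Euro of airborne laser scanning for an area.

    The default per-km² prices are the 50-150 Euro/km² range quoted in the text.
    """
    if area_km2 < 0:
        raise ValueError("area must be non-negative")
    return area_km2 * low_eur_per_km2, area_km2 * high_eur_per_km2


# Hypothetical example areas: a 1,000 km² forest district and a 100,000 km² region.
for area in (1_000, 100_000):
    low, high = als_cost_range_eur(area)
    print(f"{area:>7,} km²: {low:,.0f} to {high:,.0f} Euro")
```

A single wall-to-wall campaign over a large region thus quickly reaches millions of Euro, before any repeat acquisition for monitoring is considered.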
To complement regional LiDAR-UAV and airborne laser scanning with cheaper spaceborne laser scanning or freely accessible data, it is expedient to build on the first satellite-based laser instrument, the Geoscience Laser Altimeter System (GLAS) (http://attic.gsfc.nasa.gov/glas, in operation 2003–2010). Spaceborne laser scanning will, for the first time, enable the quantification and assessment of 3D vegetation structures such as stand structure, volume, or biomass from satellites on a regional to global scale, with a spatial resolution of 1 m [1
]. The same applies to spatially high-resolution optical sensor data such as WorldView-3, which currently has high costs and limited availability [182].
With the opening of the Landsat archive in 2008 [183
], the use of RS for NFI and FH monitoring has increased dramatically [13
]. The Landsat missions, started in 1972, have provided almost 40 years of data continuity, contributing to robust design and engineering as well as data infrastructure and data archiving. These will prove to be of immense value for the long-term monitoring of processes and changes on Earth, as entire ecosystems face an increasing global population, growing demand for resources, and climate change [184
]. Following a change in data policy in 2008, all new and archived Landsat data held by the United States Geological Survey (USGS), which had previously been distributed commercially, were made freely available to science and applications, a first in the history of RS research [12].
The new data policy revolutionized the use of Landsat data in research and development, industry, and new applications [185
]. In this way, it was possible to integrate very early RS data into various monitoring programs for forest health [186
]. Moreover, open data access promoted the development of robust standard products and encouraged international cooperation and global networking in biodiversity and ecosystem health research, enabling complex, global human–environment issues to be addressed with innovative 21st-century Earth observation methods.
Following the lead of the opened Landsat archive, the open data policy was extended to free data access for other RS missions such as the Sentinels [187
], the EnMAP hyperspectral imager mission [188
], as well as for archived and newly recorded RS data from the IRS-1C, IRS-1D, Resourcesat-1, Resourcesat-2, and Cartosat-1 missions (https://www.gaf.de/content/irs-data-now-available-free-charge-scientific-users). Free and open-access satellite data, provided according to the principles of open science, the Research Data Alliance (RDA), and Linked Open Data (LOD), bridge the gap to a better understanding of forest health and are key to biodiversity conservation [14].
6.1.2. Open Science Clouds
Open science makes it possible to implement robust, sustainable open data management and practices that ensure the discoverability, accessibility, interoperability, and reusability of (forest health) data. For this reason, the establishment of Open Science Clouds such as the European Open Science Cloud (EOSC) [175
] is the first milestone for successful data sharing and Linked Open Data (LOD) approaches [189
], and promotes the sustainable use of information, models, software, protocols, and workflows. Open services connect various research disciplines as well as scientists, practitioners, and stakeholders. Open Science Clouds therefore lead to better science and a better understanding of complex interactions in ecosystems and forest health. The European Open Science Cloud: (i) integrates, uses, and promotes human expertise, resources, standards, best practices, and indispensable research infrastructures; (ii) supports the search for, access to, interoperability, and in particular the reuse of open as well as sensitive data with state-of-the-art storage standards; and (iii) supports the access to and use of data-related elements (standards, software, protocols, workflows) that enable reuse and data-assisted knowledge transfer and innovation [175
]. One important example of Open Science Clouds is the Thematic Exploitation Platform (TEP).
NASA has opened its Open Data Portal (https://data.nasa.gov/), which provides (i) access to NASA remote sensing data; (ii) a developer portal for building apps or visualizations; (iii) communication tools for open data, code, and application programming interfaces (APIs); (iv) a catalogue of NASA open-source projects available to the public; (v) NASA NEX, a collaboration and analytical platform combining state-of-the-art supercomputing, Earth system modeling, workflow management, and NASA remote sensing data; and (vi) a portal for access to documents or references.
6.2. Standardization in Data Analysis
6.2.1. Standardized Data Management Approaches
To link different approaches, data, or monitoring strategies, a future MUSO-FH-MN requires standardized measurement designs and temporal monitoring frequencies. Standardized data management is crucial for the “long-term care” of digital information, offering the possibility to combine recorded data with newly generated information, linking complex and multidimensional information with intelligent analysis approaches as well as knowledge discovery. The principles of Findability, Accessibility, Interoperability, and Reusability (FAIR) are crucial for “good data management” [191
]. The FAIR guiding principles refer not only to data but also to tools, workflows, algorithms, models, approaches, frameworks, scales, and indicators [191
]. The Research Data Alliance (RDA, https://www.rd-alliance.org) is based on the FAIR principles and was established with the goal of building a social–technical infrastructure for sharing data and information and for developing data standards without barriers across scientific, socio-economic, and governmental backgrounds. Furthermore, EOSCpilot, the European Open Science Cloud for Research Pilot Project (www.eoscpilot.eu), provides a multidisciplinary, standardized open platform for research data, knowledge, and services across all scientific disciplines.
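As a minimal sketch of how the FAIR principles might be turned into machine-checkable rules, the snippet below screens a metadata record for a few properties loosely mapped to F, A, I, and R. The field names (`identifier`, `keywords`, `access_url`, `vocabulary`, `license`, `provenance`), the placeholder DOI, and the mapping itself are illustrative assumptions, not an official FAIR metric or an RDA specification.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    # Hypothetical, simplified metadata fields for a forest health dataset.
    identifier: str = ""                                 # persistent ID, e.g. a DOI (F)
    keywords: list[str] = field(default_factory=list)    # descriptive metadata (F)
    access_url: str = ""                                 # retrievable via standard protocol (A)
    vocabulary: str = ""                                 # shared vocabulary/ontology used (I)
    license: str = ""                                    # clear usage license (R)
    provenance: str = ""                                 # how the data were produced (R)


def fair_gaps(record: MetadataRecord) -> dict[str, list[str]]:
    """Return, per FAIR letter, the checks this record fails (empty list = passed)."""
    gaps: dict[str, list[str]] = {"F": [], "A": [], "I": [], "R": []}
    if not record.identifier:
        gaps["F"].append("missing persistent identifier")
    if not record.keywords:
        gaps["F"].append("no descriptive keywords")
    if not record.access_url.startswith(("https://", "http://")):
        gaps["A"].append("no resolvable access URL")
    if not record.vocabulary:
        gaps["I"].append("no shared vocabulary declared")
    if not record.license:
        gaps["R"].append("no license")
    if not record.provenance:
        gaps["R"].append("no provenance description")
    return gaps


# Hypothetical record: identifier and URL are placeholders.
record = MetadataRecord(identifier="doi:10.xxxx/example", keywords=["defoliation"],
                        access_url="https://example.org/data", license="CC-BY-4.0")
print(fair_gaps(record))
```

Checks like these only test whether metadata are present; whether the content is actually meaningful and reusable still requires domain review.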
6.2.2. Standardization of Essential Variables of Forest Ecosystems
The standardization of forest health monitoring has been implemented continuously and successfully in numerous monitoring programs at the national and international level. This standardization, which started 30 years ago, laid the foundations for establishing standards not only in data management and monitoring but also in other domains.
However, because the time, staff, and financial resources available for in situ observations are inherently limited, it is essential to prioritize the planning, implementation, and maintenance of both human observers and technical observation systems that carry out continuous, comparable, and repeated measurements. Furthermore, this should be supported and facilitated by managing monitoring data along the entire chain, from the measurement of raw data to the processing and provision of the data, products, information, and services required by scientists, stakeholders, and decision makers.
The tremendous complexity and multidisciplinarity of forest health call for increasing interaction with big data and big analytics, as well as for the integration of data science approaches. To successfully integrate forest health indicators into data science approaches, the standardization of data management, protocols, and monitoring strategies is crucial to guarantee rapid implementation, minimal transfer errors and uncertainties, the use of cloud computing, and the application of machine learning algorithms in the analysis, prediction, and assessment of the status, changes, and effects of forest management on forest health.
The concept of Essential Variables (EV) focuses on recording those monitoring variables that can serve as indicators of the state of an ecosystem and its changes, for example with respect to forest health or biodiversity. The concept assumes that a small but essential number of variables can characterize the state and trend of a system without losing any significant information about it [192
]. Moreover, EVs are not so much individual variables or indicators as clusters made up of several indicators. Owing to the complexity and interactions of ecosystems, the respective EVs cannot be regarded independently of one another; instead, many indicators overlap or form cross-links (see Figure 7).
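The idea that a small set of variables can characterize a system without losing significant information can be illustrated, under strong simplifying assumptions, by pruning redundant indicators: if two indicators are almost perfectly correlated, one of them adds little information. The sketch below uses greedy Pearson-correlation pruning with a hypothetical threshold of 0.9 and made-up indicator values. It is a swapped-in illustration of redundancy reduction, not the procedure by which GCOS, GEO BON, or others define Essential Variables.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_redundant(indicators: dict[str, list[float]], threshold: float = 0.9) -> list[str]:
    """Greedily keep indicators whose |r| with every already-kept one is below threshold."""
    kept: list[str] = []
    for name, values in indicators.items():  # insertion order acts as priority
        if all(abs(pearson(values, indicators[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up plot-level indicator values, for illustration only.
indicators = {
    "crown_defoliation_pct": [10, 25, 40, 55, 70],
    "needle_loss_pct":       [12, 24, 43, 53, 72],  # nearly redundant with defoliation
    "soil_moisture_pct":     [30, 22, 35, 18, 28],
}
print(prune_redundant(indicators))
```

Because insertion order acts as a priority, indicators that are cheaper to measure or better standardized can be listed first. Correlation only captures linear redundancy, so the cross-links between EVs described above would need richer methods.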
The Global Climate Observing System (GCOS) was one of the first systems that developed a full set of Essential Climate Variables (ECV) [194
]. The Essential Climate Variables laid the foundations for establishing more specific EVs, such as the EVs for weather (led by the World Meteorological Organization (WMO) and Global Atmosphere Watch (GAW)) [195
], the Essential Ocean Variables (EOV) headed by the Global Ocean Observing System (GOOS) [193
], or, in the field of biodiversity, the Essential Biodiversity Variables (EBV) driven by GEO BON [192
]. The establishment of Essential GeoVariables (GEV) is currently being promoted by the EU-funded project GeoEssential (http://www.geoessential.net/). While the first EVs were developed as early as 2007 and were soon followed by others, fields such as agriculture, soils, disasters, ecosystems, health, and urban development are still in the initial stages of developing EVs.
Consequently, establishing “Essential Forest Health Variables (EFHV)” or Essential Variables for Ecosystem Health seems reasonable for a future MUSO-FH-MN. Preliminary approaches can be found in the Sourcebook of Methods and Procedures for Monitoring EBVs in Tropical Forests with RS [197].
6.2.3. Standardized Data Analysis (Thematic Exploitation Platforms)
The growing volume of global monitoring data from space, combined with information from in situ data networks, long-term RS archives, and different approaches, models, or scales, together with the inherent growth in big and complex information (structure, format, origin) and in error budgets, makes it imperative to use Thematic Exploitation Platforms (TEP) (https://tep.eo.esa.int). The principal idea behind TEP is to allow easier access to and exploitation of complex information and to move the processing, including the users’ models and approaches, to the data rather than the data to the users [198
]. These platforms, developed by the European Space Agency (ESA) for different purposes, use cloud and Hadoop computing with MapReduce environments for in situ forest health and RS data. So far, several thematically oriented TEPs exist, including TEP Forestry (the Forestry Thematic Exploitation Platform), TEP Coastal, TEP Urban, TEP Polar, TEP Hydrology, TEP Geohazards, and a TEP for food security.
The basic principles of TEP are to: (a) move the code, including tools and resources, to the data; (b) work in a virtual workspace environment with access to relevant RS and non-RS data, processing and analysis tools, platform services, and functions such as tools for data mining and visualization, the most relevant development tools (IDL, the Interactive Data Language; Python; R), or communication tools such as social networks; and (c) access and share data and collaborate among the user community, scientists, and governance. In the future, large-scale worldwide sensor networks and TEPs based on the Semantic Web, ontologies, and linked open data/spatially linked open data approaches [189
] will form the basis for archiving, intelligently processing, integrating, retrieving, modeling, and classifying all RS data as well as complex additional forest health data and their indicators [200].
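Principle (a), moving the code to the data, is what the MapReduce model on Hadoop-style clusters supports: a small function is shipped to the nodes that hold the data tiles, and only compact intermediate results travel back. The following in-memory sketch imitates the map, shuffle, and reduce phases in plain Python to compute a mean NDVI per forest stand. The tile layout, stand identifiers, and values are hypothetical, and this is not the Hadoop or TEP API.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_tile(tile: list[tuple[str, float]]) -> Iterator[tuple[str, tuple[float, int]]]:
    """Map phase: runs where the tile is stored; emits (stand_id, (ndvi_sum, count))."""
    partial: dict[str, list[float]] = defaultdict(lambda: [0.0, 0])
    for stand_id, ndvi in tile:
        partial[stand_id][0] += ndvi
        partial[stand_id][1] += 1
    for stand_id, (s, c) in partial.items():
        yield stand_id, (s, int(c))


def shuffle(pairs: Iterable[tuple[str, tuple[float, int]]]) -> dict[str, list[tuple[float, int]]]:
    """Shuffle phase: group intermediate values by key."""
    groups: dict[str, list[tuple[float, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_mean(values: list[tuple[float, int]]) -> float:
    """Reduce phase: combine partial sums and counts into a mean."""
    return sum(s for s, _ in values) / sum(c for _, c in values)


# Hypothetical pixels (stand_id, NDVI) held on two different storage nodes.
tiles = [
    [("stand_A", 0.80), ("stand_A", 0.70), ("stand_B", 0.40)],
    [("stand_A", 0.60), ("stand_B", 0.50)],
]
intermediate = (pair for tile in tiles for pair in map_tile(tile))
result = {k: round(reduce_mean(v), 3) for k, v in shuffle(intermediate).items()}
print(result)  # {'stand_A': 0.7, 'stand_B': 0.45}
```

Because each map call returns only one partial sum and count per stand, the amount of data moved over the network does not grow with the number of pixels per tile.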
6.3.1. Human-Driven Monitoring
Indicators of forest health on different spatial scales need to be assessed with respect to the extent to which they can be monitored by technical systems such as close-range, airborne, and spaceborne RS [1
]. Many indicators of forest health can be neither generated nor used nor understood by technical systems such as RS: to a computer, indicators that humans interpret are merely decimal or hexadecimal figures and remotely sensed spectral signals. Hence, the knowledge and expertise of forest managers will be decisive for successfully recording and assessing forest health. The following constraints, for example, influence whether FH indicators can be recorded, and thus digitalized, with RS. Generally speaking, forest plant traits, trait combinations, and trait variations can be digitalized using RS if (i) the shape, density, and distribution of forest plant traits in space and over time can be captured by the spatial, spectral, radiometric, directional, and temporal characteristics of the RS sensors; (ii) the classification method, pixel-based or (geographic) object-based, is chosen wisely; and (iii) “the RS algorithm and its assumptions fit the RS data and the spectral traits of the plant species” [25
]. Nevertheless, many FH indicators still cannot be recorded or monitored with remote sensing systems. These indicators must still be recorded using human-driven monitoring approaches and then integrated into the system.
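Criterion (i) can be made concrete as a simple screening rule. The sketch below checks whether a sensor's ground sampling distance, revisit interval, and spectral bands are compatible with the size, rate of change, and diagnostic wavelength of a trait. The factor-of-two sampling margins, the two unnamed sensors, and all numbers are hypothetical assumptions for illustration; criteria (ii) and (iii), concerning the classification method and algorithm, are not covered.

```python
from dataclasses import dataclass


@dataclass
class Sensor:
    name: str
    gsd_m: float                           # ground sampling distance
    revisit_days: float
    bands_nm: list[tuple[float, float]]    # spectral band ranges (min, max)


@dataclass
class Trait:
    name: str
    feature_size_m: float                  # smallest spatial extent to resolve
    change_period_days: float              # how quickly the trait changes
    diagnostic_nm: float                   # wavelength where the trait is expressed


def recordable(trait: Trait, sensor: Sensor) -> list[str]:
    """Return reasons why the sensor cannot record the trait (empty list = plausible)."""
    problems = []
    if sensor.gsd_m > trait.feature_size_m / 2:              # assumed: >= 2 pixels per feature
        problems.append("spatial resolution too coarse")
    if sensor.revisit_days > trait.change_period_days / 2:   # assumed: >= 2 looks per change
        problems.append("revisit interval too long")
    if not any(lo <= trait.diagnostic_nm <= hi for lo, hi in sensor.bands_nm):
        problems.append("no band covers the diagnostic wavelength")
    return problems


# Hypothetical values for illustration only.
crown_discoloration = Trait("crown discoloration", feature_size_m=5.0,
                            change_period_days=30.0, diagnostic_nm=705.0)
coarse = Sensor("coarse sensor", gsd_m=30.0, revisit_days=16.0,
                bands_nm=[(450, 515), (525, 600), (630, 680), (845, 885)])
fine = Sensor("fine sensor", gsd_m=2.0, revisit_days=5.0,
              bands_nm=[(450, 520), (630, 690), (695, 715), (770, 890)])
for s in (coarse, fine):
    print(s.name, recordable(crown_discoloration, s) or "plausible")
```

A trait that fails such a screen for every available sensor is a candidate for the human-driven monitoring described above.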
6.3.2. Digitalization and Big Data
The rapid development of global information processing, along with its integration into all fields of industry, economics, society, and the environment, reflects how human society has entered an age of “big data”. The countless controversial discussions about how best to deal with digitalization and big data emphasize that we are at a historic turning point for making decisions that will have a tremendous influence on the future development and orientation of the research community. On the one hand, the handling of digitalization and big data is seen as a risk; on the other, it is regarded as a great opportunity to solve complex, multidimensional environmental problems.
Critics of increasing digitalization believe that it leads to information and data overload, threatening data and information flows and potentially resulting in a data collapse and the “end of theory in science” [201
]. On the other hand, the aforementioned requirements show that it is not sufficient for a future MUSO-FH-MN to use a single measurement, platform, sensor, monitoring, or modeling approach, to focus on a single scale, or to consult only one discipline, especially if we aspire to understand forest health in all of its complexity. Only by linking multi-source approaches, which requires handling digitalization and big data, can the disadvantages of global informatization be compensated for by the advantages of other approaches.
Big data presents tremendous potential for understanding forest ecosystems and improving sustainable ecosystem management. The crucial scientific potential of monitoring short-term and long-term forest health data lies in revealing patterns and regularities that could lead to new questions, new measurement designs, new perspectives, and possibly new answers [202
]. The digital age of big Earth data enables us to recognize, record, and reassess patterns and relationships that we could not see before. In connection with Linked Open Data approaches, it also enables us to use machine learning technologies to make statistical predictions about process–response interactions in forest ecosystems that are not possible with traditional methods. Big data embodies the four Vs: the volume, velocity, variety, and veracity of data. RS data are justifiably described as “big data” with a high volume. In 2014, around 200 on-orbit satellite sensors covered the global atmosphere, land surface, and oceans. The volume of data archived in NASA’s Earth Observing System Data and Information System (EOSDIS) exceeds 7.5 petabytes. In 2012 alone, EOSDIS distributed around 4.5 million gigabytes of data [203
]. The Copernicus Sentinels currently in orbit produce approximately 12 TB of data per day (http://www.copernicus.eu/news/editorial-2017-achievements).
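To put the quoted volumes into perspective, a back-of-the-envelope calculation converts the Sentinel rate of about 12 TB per day into an annual volume and compares it with the 7.5 PB EOSDIS archive figure cited above. Decimal units (1 PB = 1,000 TB) are assumed, and the daily rate is treated as constant, which is a simplification; the two figures also refer to different years.

```python
TB_PER_PB = 1_000  # decimal units assumed


def annual_volume_pb(tb_per_day: float, days: int = 365) -> float:
    """Convert a constant daily data rate in TB into an annual volume in PB."""
    return tb_per_day * days / TB_PER_PB


sentinel_pb_per_year = annual_volume_pb(12.0)   # ~12 TB/day quoted in the text
eosdis_archive_pb = 7.5                         # EOSDIS archive volume quoted in the text
print(f"Sentinels: {sentinel_pb_per_year:.2f} PB/year")
print(f"Years to match the EOSDIS archive: {eosdis_archive_pb / sentinel_pb_per_year:.1f}")
```

At the quoted rate, the Sentinels alone would generate an amount of data comparable to the entire cited EOSDIS archive in less than two years.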
Increasing demands to spectrally record new traits and processes of the land surface, water, and atmosphere that could not previously be recorded, and that are necessary to assess forest health, are progressively increasing not only the volume but also the velocity and variety of RS data. The development of RS sensors is thus leading to the recording of new three- or four-dimensional traits, to sensors with improved spectral resolution (hyperspectral to ultraspectral, 2D to 3D thermal), to improved spatial and directional resolution, and to sensors with shorter revisit cycles and larger coverage that compensate for the limitations of a single sensor.
6.4. The Semantic Web
The digitalization as well as the linkage of complex, heterogeneous FH data must be built on the Semantic Web or web 4.0 technologies, using the following methodological approaches as a foundation.
For a long time, the lack of a community consensus on a language for human–computer communication and interpretation with which to describe and integrate FH indicators was the limiting factor and the most pressing shortfall for the digitalization, integration, and coupling of the various domains, indicators, and monitoring approaches of forest ecology.
When FH indicators cannot be replaced by technical systems, languages and tools need to be implemented to transfer human-gained information to machine interactions. In recent years, the World Wide Web Consortium (W3C, https://www.w3.org/W3C) has developed standards for more intelligent computer-controlled information systems, i.e., the Semantic Web, also called web 4.0, to enable more complex human–computer communication (Figure 8
). In the Semantic Web, digital information is complemented with formal significance, enabling computers to recognize meaningful relationships between data, record relevant information, and analyze complex information [206
]. The Semantic Web bridges the gap by linking human- and computer-driven data, indicators, models, approaches, and measuring instruments on different platforms and scales, and by connecting decision makers, forest managers, and others (see Figure 8). For representing forest health data, specific standards have been developed, and recommendations exist for publishing Earth observation data, e.g., using Resource Description Framework (RDF) Data Cubes in conjunction with standardized vocabularies for space, time, measurements, provenance, etc. (https://www.w3.org/TR/eo-qb/).
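As a minimal sketch of the RDF Data Cube approach, the snippet below serializes a single observation as N-Triples using only the standard library. The `qb:` and `sdmx-dimension:` namespaces are those of the W3C RDF Data Cube vocabulary and its SDMX-RDF companion. The `ex:` namespace, the dataset and plot IRIs, the `ex:year` dimension, and the `ex:crownDefoliation` measure property are hypothetical placeholders, and the Earth-observation-specific recommendations of the W3C note (e.g., for grid cells) are not reproduced here.

```python
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/fh#"  # hypothetical namespace for this illustration


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str, datatype: str) -> str:
    return f'"{value}"^^<{XSD}{datatype}>'


def observation_ntriples(obs_id: str, plot: str, year: int, defoliation_pct: float) -> list[str]:
    """Serialize one (hypothetical) crown condition observation as N-Triples lines."""
    s = iri(EX + obs_id)
    return [
        f"{s} {iri(RDF_TYPE)} {iri(QB + 'Observation')} .",
        f"{s} {iri(QB + 'dataSet')} {iri(EX + 'crownConditionSurvey')} .",
        f"{s} {iri(SDMX_DIM + 'refArea')} {iri(EX + plot)} .",
        f"{s} {iri(EX + 'year')} {literal(str(year), 'gYear')} .",
        f"{s} {iri(EX + 'crownDefoliation')} {literal(str(defoliation_pct), 'decimal')} .",
    ]


print("\n".join(observation_ntriples("obs1", "plot42", 2018, 27.5)))
```

In practice, an RDF library and a declared `qb:DataStructureDefinition` would be used, so that the dimensions and measures themselves become machine-readable.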
Metadata form the basis for describing and using Semantic Web technologies, and thus for linking data. Given rising costs and growing amounts of information, metadata concepts are required that can be uniformly generated, described, and stored by machines and efficiently optimized for electronic networks, so that data can ultimately be interpreted, linked, and efficiently analyzed. Furthermore, provenance data (see https://www.w3.org/TR/prov-overview/) are essential for the interpretation of data, as they allow data generation processes to be tracked and hence help the interpreting agent to judge the trustworthiness of the sources, resolve conflicting information, and propagate trust scores to the results of the analyses. There are good examples of using close-range laser scanning approaches in forests toward physically based semantics across scales [207
] and also approaches to combine Semantic Web data with the data mining and knowledge discovery processes [208
] to get a deeper understanding of FH processes and their resilience.
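The idea of propagating trust along provenance chains can be sketched with a PROV-style `wasDerivedFrom` graph. In this hypothetical scheme, each entity carries a trust score in [0, 1] for its own processing step, and a derived product inherits the weakest trust among its sources ("weakest link"), multiplied by its own step score. The scoring rule and all entity names are illustrative assumptions; PROV itself defines provenance relations, not a trust calculus.

```python
from functools import lru_cache

# Hypothetical provenance graph: entity -> entities it wasDerivedFrom (PROV term).
# The graph is assumed to be acyclic, as derivation chains normally are.
WAS_DERIVED_FROM = {
    "defoliation_map": ["ndvi_mosaic", "field_plots"],
    "ndvi_mosaic": ["raw_scenes"],
    "field_plots": [],
    "raw_scenes": [],
}
# Hypothetical trust scores for each entity's own generation step.
STEP_TRUST = {"raw_scenes": 0.95, "ndvi_mosaic": 0.9, "field_plots": 0.8, "defoliation_map": 0.9}


@lru_cache(maxsize=None)
def trust(entity: str) -> float:
    """Trust = own step trust x minimum trust of all sources (weakest-link rule)."""
    sources = WAS_DERIVED_FROM[entity]
    inherited = min((trust(s) for s in sources), default=1.0)
    return STEP_TRUST[entity] * inherited


print(round(trust("defoliation_map"), 3))
```

Here the field plots (0.8) rather than the satellite chain (0.9 × 0.95 = 0.855) limit the trust in the final map, which points the analyst to the part of the chain most worth improving.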
The development of semantic models can be illustrated as a so-called semantic ladder (Glossary → Taxonomy → Thesaurus → Topic Map → Ontology). The aforementioned thesauri can serve as starting points for creating more richly axiomatized ontologies. A thesaurus is defined as a “controlled vocabulary designed to clarify the definition and structuring of key terms and associated concepts” [173
]. For specific disciplines such as plant traits [172
] or soil invertebrates [209
], there are already comprehensive thesauri that form the basis for extremely complex semantic models (the ontologies).
Ontologies are the most powerful level of the semantic ladder, linking complex content and data such as forest health data with human expertise and thus enabling human–machine interaction. With the Web Ontology Language (OWL), specialists can contribute their expertise in the form of rules, from which, unlike with simple RDF technologies, conclusions can be derived by reasoning. Take the example of a spruce, which is always a coniferous tree. From the spruce–conifer relationship, the computer can conclude that a spruce is a coniferous tree that does not lose its needles during the growing season unless it is under stress. Such ontological knowledge can be represented as networked information, in so-called semantic networks or knowledge graphs [210
] (see Figure 9).
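The spruce example can be reproduced with a small forward-chaining rule engine over subject–predicate–object triples. This is a hand-rolled sketch, not OWL or an OWL reasoner: it implements two RDFS-style entailments (transitive `subClassOf` and type inheritance) plus one expert rule stating that a conifer losing needles during the growing season may be under stress. All names, including the `PossiblyUnderStress` class and `tree_17`, are hypothetical.

```python
Triple = tuple[str, str, str]


def infer(facts: set[Triple]) -> set[Triple]:
    """Apply the rules repeatedly until no new triple is produced (fixpoint)."""
    facts = set(facts)
    while True:
        new: set[Triple] = set()
        for a, p, b in facts:
            for c, q, d in facts:
                # Rule 1: subClassOf is transitive.
                if p == q == "subClassOf" and b == c:
                    new.add((a, "subClassOf", d))
                # Rule 2: instances inherit the types of superclasses.
                if p == "type" and q == "subClassOf" and b == c:
                    new.add((a, "type", d))
        # Rule 3 (expert knowledge): a conifer losing needles in the growing season.
        for x, p, o in facts:
            if p == "type" and o == "Conifer" and (x, "losesNeedlesIn", "growingSeason") in facts:
                new.add((x, "type", "PossiblyUnderStress"))
        if new <= facts:
            return facts
        facts |= new


kb = {
    ("Spruce", "subClassOf", "Conifer"),
    ("Conifer", "subClassOf", "Tree"),
    ("tree_17", "type", "Spruce"),
    ("tree_17", "losesNeedlesIn", "growingSeason"),
}
closure = infer(kb)
print(sorted(t for t in closure if t[0] == "tree_17" and t[1] == "type"))
```

The observation only states that `tree_17` is a spruce losing needles; that it is a conifer, a tree, and possibly under stress is derived knowledge, which is the step that distinguishes an ontology from a plain data table.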
There are already numerous ontology-based knowledge networks, such as ontologies for forest inventory and mensuration [211
] and for the classification and discrimination of forest species and forest types [211
], the ontology for biodiversity [163
], mammalian phenotype ontology [212
], plants [213
], the ontology for phenomics in plants [214
], disaster management ontology [215
], or crop ontology [216
]. Semantic networks are dynamic networks that are constantly updated with new knowledge and ontological networks in order to yield more precise analyses and results (Figure 10
and Figure 11
). At the same time, through methods of proof and trust in data science, uncertain statements can be altered, corrected, or deleted.
Based on semantic networks, different systems, such as in situ and RS monitoring approaches on all levels of biological organization, are able to communicate with each other through an explicit, formal interpretation of all network data. The German Federation for Biological Data (GFBio) platform (https://www.gfbio.org) pursues the goal of long-term provision, archiving, and linking of complex biological data from multidisciplinary archives, institutions, and projects, based on simple vocabularies and extending to complex ontologies and Semantic Web technologies following W3C standards [218
]. GFBio could act as an important role model for, or even form the basis of, the future MUSO-FH-MN.
6.4.3. Linked Open Data Approaches
The concept of semantic linking, which is also known as Linked Open Data (LOD) or the Spatial Linked Open Data (SLOD) approach, now enables the machine-generated linking, retrieval, and analysis of complex and heterogeneous data, information, methods, or platforms of forest health.
The advantage of semantically linked LOD over non-semantically linked data is that data can be retrieved with data mining, machine learning, or artificial intelligence techniques that are related to one another through associative semantic links. In LOD/SLOD data retrieval, it is not the analyst who determines in advance which data are entered into databases and which models are linked with one another and analyzed (as in traditional, non-semantically linked data retrieval); rather, the functional space alone searches through existing semantic LOD/SLOD links according to valid patterns, similarities, and rules to compile completely new relationships, patterns, insights, and model predictions.
Another tremendous advantage of the LOD/SLOD approach is the semantic coupling of multidimensional, complex, and heterogeneous data, experiments, models or platforms. A future MUSO-FH-MN should integrate information from the data of site surveys for species, species lists, metabarcoding, microgenomics [219
], data from museums, phenotyping information [160
], data from lysimeters, plant phenomic facilities [78
], controlled environmental facilities (Ecotrons) [81
], long-term ecological research [98
], spectral laboratory experiments, biodiversity ecosystem functioning experiments [10
], RS data such as optical (multispectral, hyperspectral), thermal, radar, LiDAR data, laboratory, tower, camera traps, wireless sensor networks, drones, close-range, airborne and spaceborne RS platforms, while also linking monitoring databases, networks, citizen science information, abiotic (soil, water, air) or social and economic information. Semantic LOD/SLOD links and retrieval can help better investigate, model, and forecast the high complexity, multidimensionality, and interdisciplinarity of different research fields based on new or advanced knowledge.
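The coupling of heterogeneous sources through shared identifiers can be illustrated with basic graph pattern matching, the core operation behind SPARQL queries. In the hand-rolled sketch below, triples from three hypothetical sources (a species survey, a LiDAR product, and a soil sensor network) share the same plot identifier, so a single pattern with variables (`?plot`, `?h`, `?m`) joins them without a predefined table schema. Plain strings stand in for IRIs, and no real SPARQL engine or endpoint is assumed.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple; terms starting with '?' are variables."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if result.setdefault(p, t) != t:  # variable already bound to another value
                return None
        elif p != t:
            return None
    return result


def query(patterns: list[Triple], graph: set[Triple]) -> list[Binding]:
    """Return all variable bindings that satisfy every pattern (a basic graph pattern)."""
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        extended: list[Binding] = []
        for b in bindings:
            for t in graph:
                m = match(pattern, t, b)
                if m is not None:
                    extended.append(m)
        bindings = extended
    return bindings


graph = {
    ("plot:7", "hasDominantSpecies", "Picea abies"),     # species survey (hypothetical)
    ("plot:9", "hasDominantSpecies", "Fagus sylvatica"),
    ("plot:7", "canopyHeight", "28.4"),                  # LiDAR product, m (hypothetical)
    ("plot:9", "canopyHeight", "31.0"),
    ("plot:7", "soilMoisture", "14.2"),                  # soil sensors, % (hypothetical)
    ("plot:9", "soilMoisture", "22.7"),
}
rows = query([("?plot", "hasDominantSpecies", "Picea abies"),
              ("?plot", "canopyHeight", "?h"),
              ("?plot", "soilMoisture", "?m")], graph)
print(rows)  # [{'?plot': 'plot:7', '?h': '28.4', '?m': '14.2'}]
```

Adding a fourth source only requires adding triples that reuse the plot identifiers; the query mechanism itself does not change.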
6.5. Proof, Trust, and Uncertainty in Data Science
Unlike in many other research areas, most methods used in data science introduce a certain level of uncertainty. The reasons for this are twofold. First, the data that we operate on have some level of uncertainty and/or noise. Hence, the methods we use need to be robust in the presence of noise and uncertain data. One design choice is to use an approach that assumes all of the data are certain. This approach runs the risk of overfitting the data at hand, i.e., drawing false conclusions from uncertain or wrong measurements. The alternative is to use approaches that yield an approximate fit to the given data. This does introduce uncertainty, but has proven to be more noise-tolerant.
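The trade-off between fitting every data point exactly and accepting an approximate fit can be shown with a toy example. Six noisy measurements of a linear relationship (synthetic values with hand-picked noise) are fitted once by an exact degree-5 interpolating polynomial and once by an approximate least-squares line. The exact fit reproduces the noise and predicts a new point at x = 6 badly, while the approximate fit stays close to the true value of 13.

```python
def lagrange(xs: list[float], ys: list[float], x: float) -> float:
    """Exact fit: evaluate the polynomial passing through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def least_squares_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Approximate fit: slope and intercept minimizing the squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.3, -0.4, 0.5, -0.3, 0.4, -0.5]            # synthetic measurement noise
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]      # true relation: y = 2x + 1
true_value = 2 * 6.0 + 1
exact_err = abs(lagrange(xs, ys, 6.0) - true_value)
slope, intercept = least_squares_line(xs, ys)
approx_err = abs(slope * 6.0 + intercept - true_value)
print(f"exact fit error: {exact_err:.2f}, approximate fit error: {approx_err:.2f}")
```

The exact fit errs by about 25 units, the approximate fit by about 0.24, although the least-squares line does not pass through any of the measured points.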
The second reason for introducing uncertainty by design is scalability. Even for problems for which exact methods exist, it is often the case that they cannot be scaled to big data problems. Hence, heuristics are often employed as approximations to the exact solution. This gives way to approaches that also solve large-scale problems in real time, but at the price of introducing uncertainty, i.e., we obtain a solution, but we cannot be fully certain of its validity. Hence, uncertainty is introduced by design as a trade-off for scalability.
In many cases, the approaches are aware of their own uncertainty, and can yield solutions and predictions together with a statement about their confidence. This helps users deal with the uncertainty in many ways: they get a better intuition of whether or not to trust the outcome of an algorithm, and they can decide to review borderline cases manually.
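An approximation that reports its own confidence can be as simple as estimating a mean from a random sample instead of scanning all data, and attaching a confidence interval. The sketch below uses the normal approximation (z = 1.96 for roughly 95%) and ignores the finite-population correction, assumptions that work reasonably for large samples from much larger populations. Cases in which the interval straddles a decision threshold are routed to manual review, as described above. The threshold and the simulated pixel values are hypothetical.

```python
import math
import random
import statistics


def sampled_mean(population: list[float], n: int, seed: int = 0) -> tuple[float, float, float]:
    """Estimate the population mean from n random values; return (estimate, low, high)."""
    sample = random.Random(seed).sample(population, n)
    est = statistics.fmean(sample)
    half = 1.96 * statistics.stdev(sample) / math.sqrt(n) if n > 1 else math.inf
    return est, est - half, est + half


def decide(low: float, high: float, threshold: float) -> str:
    """Classify a stand as damaged, healthy, or borderline (needs a human look)."""
    if low > threshold:
        return "damaged"
    if high < threshold:
        return "healthy"
    return "review manually"


# Hypothetical per-pixel defoliation values (%) for one large stand.
rng = random.Random(42)
pixels = [rng.gauss(35.0, 10.0) for _ in range(100_000)]
est, low, high = sampled_mean(pixels, n=500)
print(f"mean ~ {est:.1f}% (95% CI {low:.1f} to {high:.1f}):", decide(low, high, threshold=25.0))
```

Only 500 of 100,000 values are read, which is the scalability gain; the interval width is the price paid for it and tells the user how far to trust the answer.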
6.6. Data Science Analysis
6.6.1. Data Mining, Machine Learning, Artificial Intelligence
The increasing availability of open data repositories, combined with constantly growing data acquisition rates, makes it urgent to develop learning agents in the form of computer programs. Such agents can potentially speed up the pre-processing of data and increase the complexity and objectivity of data analyses. Pre-processing may include different aspects to be considered prior to mining a database for particular FH issues, e.g., the identification of hidden (as yet unknown) links between stressors and the functioning of forest ecosystems. It starts with appraising the general suitability of data found in remote open databases before any further processing step, e.g., before transferring the data to the local work environment and storing them there for further processing. This depends directly on the availability of suitable, non-purpose-specific metadata that remote data retrieval agents can use to judge the usefulness of available data. Pre-processing also requires identifying undesired data disturbances, e.g., those due to anthropogenic impacts on ecosystems (installations) or acquisition system malfunctions, before deriving filtered and/or transformed attributes that are considered more useful for further analyses than the actual measured data.
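The two pre-processing steps described above, judging the suitability of remote data from their metadata before transfer and flagging disturbed values before deriving attributes, can be sketched as follows. The metadata fields, the bounding-box and time-window test, and the use of a modified z-score based on the median absolute deviation (MAD) with a cut-off of 3.5 are assumptions chosen for illustration; the cut-off follows a common rule of thumb rather than anything prescribed in the text.

```python
import statistics


def suitable(meta: dict, roi: tuple[float, float, float, float], years: tuple[int, int]) -> bool:
    """Judge from metadata alone whether a remote dataset is worth transferring.

    roi and meta['bbox'] are (min_lon, min_lat, max_lon, max_lat); fields are hypothetical.
    """
    a = meta["bbox"]
    overlaps = a[0] <= roi[2] and roi[0] <= a[2] and a[1] <= roi[3] and roi[1] <= a[3]
    in_time = meta["start_year"] <= years[1] and years[0] <= meta["end_year"]
    return overlaps and in_time and bool(meta.get("license"))


def flag_disturbances(values: list[float], cutoff: float = 3.5) -> list[bool]:
    """Flag values whose modified z-score (median/MAD based) exceeds the cut-off."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [v != med for v in values]
    return [abs(0.6745 * (v - med) / mad) > cutoff for v in values]


# Hypothetical metadata record and region of interest.
meta = {"bbox": (7.0, 47.5, 9.0, 49.0), "start_year": 2010, "end_year": 2020, "license": "CC-BY-4.0"}
print(suitable(meta, roi=(8.0, 48.0, 8.5, 48.5), years=(2015, 2018)))
# Hypothetical tower NDVI series with one value distorted by an instrument malfunction.
series = [0.71, 0.72, 0.70, 0.73, 0.12, 0.71, 0.72]
print(flag_disturbances(series))
```

Median and MAD are used instead of mean and standard deviation because a single large disturbance would otherwise inflate the spread estimate and partly mask itself.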
Within a knowledge discovery in databases (KDD) framework, the pre-processed database may then be mined by learning agents for particular issues relevant to FH. The choice of algorithm generally depends on the question of interest. However, if the pre-processed databases are harmonized in some sense, so that the different sets of information can be labeled according to distinct data families, e.g., technically acquired data with physically defined resolution or subjective insights with unquantifiable trustworthiness, then data mining algorithms for a distinct task can be developed in a generalized manner that ensures performance, even if the specifically available information contains varying attributes for different times or locations.
Traditionally, the pre-processing and mining of the available data were carried out by trained human experts and were based on their experience and insight into the response of distinct ecosystems and experimental setups. However, the human ability to gain an overview of complex databases in a short time is limited and subject to an undesired learning bias. Here, learning agents may improve the breadth of data analyses, as well as the complexity of the explored and recognized inter-data relations (see Figure 12).
6.6.2. Handling Petabyte-Scale RS Data with Cloud and Web Services
The opening of the Landsat archives in 2008 [185
] led to an explosive increase in the use of RS data in forest monitoring [131
], as well as in various fields of science, economics, and applications [12
]. NASA (https://data.nasa.gov/
), the US Geological Survey, NOAA [183
], as well as the European Commission with its Copernicus regulation (http://data.europa.eu) followed suit and opened petabyte-scale archives of freely available RS data, from MODIS and ASTER to Landsat 4–8 and Sentinel, as well as numerous other sensors and geo-data.
To overcome the challenges posed by petabyte-scale geo-data archives, the development of open-source program libraries such as TerraLib was accelerated, as was the use of distributed computing power through Hadoop and tools such as GeoSpark and GeoMesa, which enable diverse and fast processing of big geo-data [221
]. At the same time, supercomputers and high-performance computing systems were promoted [222
], as was the new generation of quantum computers [223
]. In addition, cloud computing has started to become the norm in data science [224].
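The map/shuffle/reduce pattern that Hadoop builds on can be illustrated without a cluster. The sketch below is a single-machine, pure-Python imitation and does not use Hadoop; the input layout (each tile as a list of `(forest_class, value)` pairs) and the per-pixel "index value" are hypothetical. It computes the mean value per forest class across tiles, the kind of aggregation that distributed frameworks parallelize over petabyte-scale archives.

```python
from collections import defaultdict
from functools import reduce

def map_tile(tile):
    """Map step: emit (forest_class, (sum, count)) pairs for one tile."""
    return [(forest_class, (value, 1)) for forest_class, value in tile]

def shuffle(mapped):
    """Shuffle step: group all emitted partial results by key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, partial in pairs:
            groups[key].append(partial)
    return groups

def reduce_group(partials):
    """Reduce step: combine (sum, count) pairs; associative, so order is irrelevant."""
    return reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), partials)

def mean_by_class(tiles):
    # On a cluster, map_tile would run in parallel on the nodes holding each tile
    groups = shuffle(map(map_tile, tiles))
    result = {}
    for key, partials in groups.items():
        total, count = reduce_group(partials)
        result[key] = total / count
    return result
```

Emitting `(sum, count)` pairs instead of means is the usual design choice here: partial sums can be combined in any order, whereas averaging partial means would weight tiles incorrectly.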
To handle big geo-data more efficiently, Google developed the Google Earth Engine (GEE) [221
], which has been used, for example, to exploit petabyte-scale RS data archives [226
] or to map woody vegetation [227
]. GEE is also increasingly used for geo-data analyses in environmental research portals such as Open Foris (“openforis”), which provides free open-source solutions for environmental and forest monitoring (http://www.openforis.org/
). By setting up platforms such as the Copernicus Data and Information Access Services (DIAS), the European Commission is also trying to advance the possibilities for high-performance evaluation of large amounts of data in the cloud (http://copernicus.eu/news/upcoming-copernicus-data-and-information-access-services-dias).
If applications for processing, analyzing, or standardizing various data formats are not integrated into platforms such as GEE, web services usually make it possible to combine available systems and services across platforms and to make them available in one's own applications and services. Web services therefore enable an effective exchange of data and information between heterogeneous applications. In the context of geo-data analysis, a wide range of web services is already available, such as those based on the standards of the Open Geospatial Consortium (OGC). There are also several other web-based systems that provide tools to search, order, and download RS data; these include the USGS Earth Explorer [228
] for data analysis, the Climate WebGIS of the National Climatic Data Center (NCDC; http://www.climate.gov/#dataServices
), NASA Giovanni [229
], and the Earth Observation Data Service (http://www.earthserver.eu/services/eods
) for the RS community, the PlanetServer for the planetary science community [230
], and individual web services and free software solutions for automatically generating 3D point clouds from UAV imagery [231
]. Eberle et al. [232
] demonstrated how different web services can be embedded to build a geo-data toolbox for monitoring forests in Siberia using RS data. Other novel developments of geospatial web services are oriented toward server-based, on-demand access to and processing of big Earth data [230
]. Such web services are also suitable for mobile monitoring devices such as tablets, helping to manage forest health inventory data in situations where on-demand access to RS and geo-data is essential.
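As an illustration of on-demand access through an OGC standard, the sketch below builds a Web Map Service (WMS) 1.3.0 `GetMap` request URL, which a tablet application could use to fetch a map image for the current inventory plot. The server address and layer name are hypothetical placeholders; the query parameters are those defined for `GetMap` in WMS 1.3.0. Note that in WMS 1.3.0 with `EPSG:4326` the bounding box is given in latitude/longitude axis order.

```python
from urllib.parse import urlencode

def wms_getmap_url(base_url, layer, bbox, width=512, height=512,
                   crs="EPSG:4326", image_format="image/png"):
    """Build an OGC WMS 1.3.0 GetMap request URL.

    bbox: (min_lat, min_lon, max_lat, max_lon) for EPSG:4326 in WMS 1.3.0.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty string requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    # urlencode percent-encodes reserved characters such as ':' ',' and '/'
    return f"{base_url}?{urlencode(params)}"

# Hypothetical server and layer, for illustration only
url = wms_getmap_url("https://example.org/wms", "forest_cover", (50.0, 10.0, 51.0, 11.0))
```

Because the request is a plain HTTP URL, the same pattern works from any client, which is what makes such services easy to combine across heterogeneous applications.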
In addition to web services for geo-data, freely accessible web services for biodiversity research have also been developed, such as the “Biodiversity Virtual Laboratory” (BioVeL) [233
] for collecting, organizing, sharing, processing, analyzing, and modeling data in biodiversity science and ecology. BioVeL has integrated more than 60 web services for collecting, analyzing, modeling, simulating, and predicting in situ biodiversity monitoring data.
Web services have been implemented in numerous ways, yet they still do not meet the high demands of the Semantic Web and the data science of tomorrow for providing services for semantically linked data, models, or platforms. For this reason, Semantic Web Services (SWS) have been developed; beyond a purely syntactic interface description, they support functions such as the discovery, selection, execution, and analysis of data with semantic content and links [234
]. Here, Semantic Web techniques are employed, such as the Web Ontology Language for Web Services (OWL-S) [235
]. To make the technical functionality of a service semantically available, an additional ontology of the respective specialist domain is essential, e.g., the ontology for forest inventory and mensuration [211
] (see section Ontologization). Sudmann et al. [236
] demonstrated the successful use of Semantic Web approaches for the online processing of big RS data.
6.7. Easy-to-Handle Environmental Assessment and Decision-Making Support Systems
For environmental assessment and decision-making in the context of forest health, numerous analyses and assessments have to be carried out, such as suitability analyses for sites and new plantations, or investigations of the effects of climate change, deforestation, or forest management plans on future forest development. At the same time, the complexity and extent of data analyses are growing rapidly in forestry, biodiversity research, ecology, and remote sensing. It is becoming increasingly common to carry out complex analyses and assessments using hundreds of data files with different structures and data types (e.g., genetic data or data on species and communities), from the local to the global scale and at different points in time, in combination with numerous algorithms, and to visualize the results in innovative ways [233].
Kiker et al. [237
] argued that decision-making regarding the common use of resources and the understanding of existing environmental issues is tremendously complex and seemingly unsolvable because of the inherent trade-offs between ecological, socio-political, and economic factors. The Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) therefore calls for the progressive development of tools for environmental assessment and decision-making, covering analysis, modeling, simulation, and prediction, to gain a better understanding of complex ecosystems [238].
Environmental assessment and decision-making systems should therefore be tools that, on the one hand, meet the requirements of data science and, on the other hand, remain open, easy, and simple enough to serve as assessment and decision-making tools, in particular for scientists, data managers, and stakeholders without a computer science background. Accordingly, the following requirements should be imposed on environmental assessment and decision-making support systems:
Environmental assessment and decision-making must meet the requirements of data science. In particular, this means handling big data with high volume, velocity, variety, and veracity, and integrating open data.
Tools must enable extensive algorithms for data mining and machine learning.
Systems must contain interfaces to the Semantic Web, thus enabling the analysis of semantically linked heterogeneous information via linked open data.
Systems must allow for the assessment of uncertainties of in situ monitoring, RS data, and data science in the assessment and the resulting decision.
Data science analyses should be possible through simple coupling with and management of other systems such as Hadoop, databases, Web Services, Semantic Web Services, or cloud computing, as well as through the use of the Google Earth Engine.
Systems should be implementable on the simple computers that are often the only ones available to authorities, users, and decision-makers.
The very first data science and Semantic Web-oriented environmental assessment and decision-making support systems can be found in Jelokhani-Niaraki et al. [239
]. These systems fulfill initial requirements such as data science and Semantic Web integration. In addition to the developments already mentioned, there are other scientific workflow systems such as Kepler [240
], Pegasus [241
], KNIME [242
], RapidMiner [243
], Apache Taverna [244
], VisTrails [246
], or Galaxy [247
]. These are successful technologies for building easy-to-use, applicable, and feasible analysis and decision-making support systems for scientists, forest managers, and stakeholders [233
]. Such tools must form the basis of an easy-to-handle environmental assessment and decision-making support system for scientists, data managers, and stakeholders in a future MUSO-FH-MN.
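Workflow systems such as those listed above essentially represent an analysis as a directed acyclic graph of steps and execute each step once its inputs are available. The sketch below shows only that core idea using Python's standard `graphlib` module (Python 3.9+); it is not how Kepler, KNIME, Galaxy, or the other systems are implemented, and the step names are hypothetical. Each placeholder step records data lineage instead of processing real data.

```python
from graphlib import TopologicalSorter

# Hypothetical FH workflow: each step maps to the set of steps it depends on
WORKFLOW = {
    "download_rs": set(),
    "load_inventory": set(),
    "cloud_mask": {"download_rs"},
    "compute_index": {"cloud_mask"},
    "join_plots": {"compute_index", "load_inventory"},
    "assess_fh": {"join_plots"},
}

def make_step(name):
    """Placeholder step that records lineage instead of processing data."""
    def step(inputs):
        if not inputs:
            return name
        return f"{name}({', '.join(sorted(inputs.values()))})"
    return step

def run_workflow(graph, steps):
    results = {}
    # static_order() yields every step only after all of its dependencies
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph.get(name, ())}
        results[name] = steps[name](inputs)
    return results

steps = {name: make_step(name) for name in WORKFLOW}
results = run_workflow(WORKFLOW, steps)
```

Declaring dependencies rather than a fixed sequence is what lets non-programmers rearrange or extend such workflows, and it also exposes which independent steps (here `download_rs` and `load_inventory`) could run in parallel.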
7. Requirements for a Future Multi-Source Forest Health Monitoring Network (MUSO-FH-MN)
At present, there is no multi-source forest health monitoring network that meets the requirements of data science, and thus the future requirements for data monitoring, management, linkage and the assessment of FH data in the 21st century.
The requirements mentioned in this paper show that, so far, no monitoring approach, technique, model, or platform exists that is sufficient on its own to monitor, model, forecast, or assess forest health and its resilience. Increasing digitalization, the handling of petabytes of data for FH monitoring, and the promotion of human–computer communication, not to mention the transformation of FH monitoring information and approaches toward data science and Semantic Web technologies, will determine how receptive the analysis and communication processes are to FH monitoring. To facilitate this process, the following criteria will be crucial for a future MUSO-FH-MN (see also Figure 13).
(I) Suitable Indicators of FH: FH indicators should be (a) indicators of status, stress, disturbances, and resource limitations that are (b) recordable on different levels of biological organization of the forest ecosystem; (c) standardizable; (d) ascertainable for different in situ monitoring approaches as well as RS approaches on different platforms; and (e) transferable to digital form through a human–computer communication interpreting language, and therefore meet the requirements of data science and the Semantic Web.
(II) Integration of existing data, networks, and platforms:
These have to be integrated into the MUSO-FH-MN and should be able to be coupled with each other to enable a comprehensive assessment of FH that includes various drivers. The MUSO-FH-MN should integrate the following data and site survey platforms for animal and plant species and forest habitats: data from site surveys of species, species lists, species data from meta-barcoding, microgenomics, lysimeters, plant phenomics facilities [78
], controlled environment facilities (Ecotrons) [81
], spectral laboratory experiments, biodiversity ecosystem functioning experiments [10
], and long-term ecological research [98
]. In terms of remote sensing data, the following should be integrated: optical (multispectral, hyperspectral), thermal, radar, LiDAR data, laboratory data, tower, camera traps, wireless sensor networks, drones, close-range, airborne and spaceborne RS platforms, existing databases, networks, citizen science information, and abiotic (soil, water, air), as well as socio-economic information.
(III) Linkage of different FH monitoring approaches:
Future site-based long-term research and monitoring concepts will require the coupling of different approaches, namely: (1) for forest monitoring: the forest inventory approaches; (2) for vegetation monitoring beyond forest inventory monitoring: the phylogenetic species concept, the biological species concept, the morphological species concept [58
], and the concept of phenotyping [162
]; (3) for abiotic and process monitoring: the concept of ecological integrity [33
]; and (4) for remote sensing: the Spectral Trait/Spectral Trait Variation Concept (RS-ST/STV-C) [25
]. The coupling should be done with the help of techniques from data science.
(IV) Data science as a bridge:
When it comes to developing a MUSO-FH-MN, it is not surprising that big FH data with enormous complexity and with syntactic and semantic heterogeneity in data types and formats require new solutions to fulfill the requirements of the 21st century for monitoring, analysis, prognosis, and the assessment of FH. Data science can bridge these gaps. The following elements of data science are crucial in this respect: (1) digitalization; (2) standardization; (3) the Semantic Web; (4) data science analysis; (5) proof, trust, and uncertainties; and (6) easy-to-handle but data science-based environmental assessment and decision-making support systems for scientists, data managers, and stakeholders. The elements of these six groups are complementary and not always clearly attributable to a single group. Nevertheless, data science for a future MUSO-FH-MN can be described in detail by the following criteria (see Figure 13).
The digitalization of FH data and information will determine the receptiveness of the forest analysis process in the future MUSO-FH-MN. A crucial point is that not all elements can be monitored digitally; consequently, one must differentiate between human-driven and digitally driven monitoring elements. Furthermore, the following aspects are important for fostering digitalization: open access to tools, software, algorithms, instruments, and platforms; freely available data; a free data policy for species data, RS data, and geo-data [55
], as well as for abiotic, socio-economic, and other geo-data; the development and use of Open Science Clouds [175
] and Thematic Exploitation Platforms; the handling of big FH data with high volume, velocity, variety, and veracity; and the management of distributed repositories.
FH data, information, indicators, data management, various FH monitoring approaches, tools, algorithms, and models all have to be standardized, administered, stored, processed, and updated, as well as linked to and evaluated with other platforms and networks. The basis of standardization, and a basic element of data science, is effective metadata management based on the FAIR principles, i.e., the four criteria of Findability, Accessibility, Interoperability, and Reusability [191
]. Standards in data management, standards in forest inventory monitoring, in situ monitoring beyond forest inventory monitoring for animals and plant species, as well as standards in RS approaches are crucial elements in data science. Furthermore, it is imperative to implement and integrate various existing concepts of essential variables such as the Essential Climate Variables (ECV) [194
], the Essential Variables for Weather (EVW) [195
], the Essential Ocean Variables (EOV) [193
], and the Essential Biodiversity Variables (EBV) [192
], in order to develop the Essential GeoVariables (GEV) of GeoEssential (http://www.geoessential.net/
), as well as to develop essential variables for domains such as agriculture, soils, disasters, ecosystems, health, and urban development.
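One practical consequence of FAIR-based metadata management is that records can be checked automatically before they are published or harvested. The sketch below uses a deliberately simplified, hypothetical checklist that maps a few metadata fields to each FAIR principle; the FAIR principles themselves are broader and do not prescribe these field names, so the mapping is an assumption for illustration only.

```python
# Simplified, hypothetical checklist: metadata fields treated here as evidence
# for each FAIR principle (the actual FAIR principles are broader than this).
FAIR_CHECKLIST = {
    "Findable": ("identifier", "title", "keywords"),
    "Accessible": ("access_url", "access_protocol"),
    "Interoperable": ("format", "vocabulary"),
    "Reusable": ("license", "provenance"),
}

def fair_gaps(record: dict) -> dict:
    """Return, per FAIR principle, the checklist fields that are missing or empty."""
    return {
        principle: [field for field in fields if not record.get(field)]
        for principle, fields in FAIR_CHECKLIST.items()
    }

def is_fair_complete(record: dict) -> bool:
    return not any(fair_gaps(record).values())

# Hypothetical metadata record of an FH plot survey (no license given yet)
record = {
    "identifier": "urn:example:fh-plot-survey-2019",
    "title": "Crown condition survey 2019",
    "keywords": ["forest health", "defoliation"],
    "access_url": "https://example.org/data/fh-plot-survey-2019",
    "access_protocol": "HTTPS",
    "format": "text/csv",
    "vocabulary": "https://example.org/vocab/fh",
    "provenance": "Field survey, harmonized inventory protocol",
}
```

A report of this kind tells a data manager which gaps block reuse (here, the missing license) before the record enters a shared repository.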
The linking of complex, heterogeneous, and multidimensional FH information, indicators, data, the Internet of FH Things (IoT), monitoring approaches, tools, different scales, RS platforms, and models, as well as assessment and decision-making support systems for scientists, data managers, and stakeholders, in a semantically enabled way according to the standards of the World Wide Web Consortium [205
] is an important step to cope with the human–computer communication process and couple complex FH data, information, models and platforms. Important elements of the semantic web are: semantification, ontologization [206
], Linked Open Data approaches [248
], and Spatially Linked Open Data approaches [189].
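To make the idea of semantically linked FH data more concrete, the sketch below describes a single plot observation as RDF triples and serializes them in the W3C N-Triples format. It reuses terms from the W3C SOSA ontology (`sosa:Observation`, `sosa:hasFeatureOfInterest`, `sosa:observedProperty`, `sosa:hasSimpleResult`); the `example.org` namespace, the plot and indicator names, and the value are hypothetical placeholders. Plain string formatting is used to keep the example self-contained; a real system would typically rely on an RDF library and a domain ontology.

```python
# W3C vocabularies (SOSA, RDF, XSD); EX is a hypothetical namespace for FH resources
SOSA = "http://www.w3.org/ns/sosa/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/fh/"

def iri(value: str) -> str:
    return f"<{value}>"

def literal(value, datatype: str) -> str:
    # Escape backslashes and double quotes inside N-Triples string literals
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^<{datatype}>'

def observation_triples(obs_id, plot_id, indicator, value):
    """Describe one FH observation as (subject, predicate, object) triples."""
    obs = iri(EX + "observation/" + obs_id)
    return [
        (obs, iri(RDF_TYPE), iri(SOSA + "Observation")),
        (obs, iri(SOSA + "hasFeatureOfInterest"), iri(EX + "plot/" + plot_id)),
        (obs, iri(SOSA + "observedProperty"), iri(EX + "indicator/" + indicator)),
        (obs, iri(SOSA + "hasSimpleResult"), literal(value, XSD_DECIMAL)),
    ]

def to_ntriples(triples) -> str:
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

nt = to_ntriples(observation_triples("obs-1", "plot-42", "crownDefoliation", 35.5))
```

Because every resource is identified by a global IRI, observations published this way by different networks can be merged and queried together, which is the practical benefit of Linked Open Data for coupling heterogeneous FH sources.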
Proof, Trust, and Uncertainties: Most approaches, data, information, and models used in data science involve a certain level of uncertainty, e.g., uncertainties in forest inventory monitoring information, in situ uncertainties, RS uncertainties, and data science uncertainties.
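As one simple example of carrying uncertainties through an assessment, an in situ estimate and an RS-based estimate of the same indicator can be combined by inverse-variance weighting, which also yields the uncertainty of the combined value. This is a generic textbook technique, not a method proposed in this paper, and it assumes that the estimates are independent, unbiased, and have known standard uncertainties; the numbers in the usage line are hypothetical.

```python
import math

def combine_estimates(estimates):
    """Inverse-variance weighted mean of independent, unbiased estimates.

    estimates: list of (value, standard_uncertainty) pairs.
    Returns (combined_value, combined_standard_uncertainty).
    """
    if not estimates or any(u <= 0 for _, u in estimates):
        raise ValueError("need at least one estimate with positive uncertainty")
    weights = [1.0 / (u * u) for _, u in estimates]
    total = sum(weights)
    value = sum(w * x for w, (x, _) in zip(weights, estimates)) / total
    return value, math.sqrt(1.0 / total)

# Hypothetical crown defoliation (%): in situ 30 +/- 5, RS-based 40 +/- 10
value, uncertainty = combine_estimates([(30.0, 5.0), (40.0, 10.0)])
```

With these inputs the weights are 1/25 and 1/100, giving a combined value of 32% with a standard uncertainty of about 4.5%: the more precise in situ estimate dominates, and the combined uncertainty is smaller than either input. If the independence or unbiasedness assumptions do not hold, for example when the RS model was calibrated on the same plots, this simple formula understates the uncertainty.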
Data Science Analysis: The digitalization of the world and of forest ecosystem components requires handling Big Data with its four aspects: volume, velocity, variety, and veracity. Consequently, data science analysis requires methods of data mining, machine learning, and deep learning, as well as tools, systems, or platforms such as Hadoop, the Google Earth Engine, hosting services, Semantic Web Services, cloud computing, and Thematic Exploitation Platforms. Here, deep learning is central to identifying, processing, and analyzing patterns, and thus to automating and advancing FH analysis and assessment.
Tools for scientists, data managers, and stakeholders: Crucial elements for a data-driven, fast, objective, applicable, and implementable decision-making support system for forest managers, stakeholders, and politicians are open, easy-to-handle, data science-based environmental assessment and decision-making support systems; comprehensible and easy-to-operate scientific workflows; and easy, up-to-date data publishing tools.
Stress, disturbances, and reduced resilience in forest ecosystems are constantly increasing. Causes, drivers, and responses of FH are often complex, multidimensional, multi-scale, and non-linear. However, forest managers, decision makers, and policymakers need to be able to make decisions that are data-driven and based on short- and long-term monitoring. There is a great need to objectively assess the state, changes, and resilience of FH as a basis for the successful management of forest conversion and the stabilization of damaged forest ecosystems.
Previous FH monitoring approaches and initiatives for monitoring, as well as the standardization and digitalization of FH are good examples on which other monitoring strategies can be modeled. At the same time, the use and integration of numerous freely available RS data in FH are advancing at an unprecedented rate.
The goal of the paper was not to refine existing in situ and RS-based FH monitoring approaches, but rather to discuss which requirements are essential to bridge the gaps in linkage, information, data, models, and tools that are needed to fulfill the impending requirements of the 21st century for monitoring, data management, analysis, prognosis, and the assessment of FH. Five sets of requirements were discussed in detail, as was their relevance, necessity, and possible solutions that would be necessary for establishing a future MUSO-FH-MN that fulfills the requirements of the 21st century in ecology and information processing. Namely, these requirements include: (1) understanding the effects of multiple stressors on forest health; (2) using RS approaches to monitor forest health; (3) coupling different monitoring approaches; as well as (4) using data science as a bridge between complex and multidimensional big FH data; and (5) a future multi-source forest health monitoring network. We particularly elaborated on the requirements of data science, since this approach is regarded as expedient for a future MUSO-FH-MN.
No existing monitoring approach, technique, model, or platform is sufficient on its own to monitor, model, forecast, or assess forest health and its resilience. Therefore, to set up a future MUSO-FH-MN, the following main elements should be considered: (i) the selection and monitoring of suitable indicators of FH; (ii) the integration of existing data, networks, and platforms; (iii) the linkage of different FH monitoring approaches to monitor indicators of FH; and (iv) the use of data science as a bridge for handling and coupling the volume, velocity, variety, and veracity of big forest health data.
Very promising first environmental research infrastructures (RIs) for implementing data science and for assembling existing forest health networks into a MUSO-FH-MN already exist, such as ENVRIplus (http://www.envriplus.eu/
). ENVRIplus brings together environmental and Earth system research infrastructures, research networks, and projects to create a more coherent, interdisciplinary, standardized, and interoperable cluster of environmental research infrastructures across Europe [249].
We are living in a time of “ecological impoverishment”, but also one with novel and feasible ideas. However, we are also experiencing a time of “technical puberty” for human–machine interactions. On the one hand, we are disciplinary-focused, but on the other hand, we are still seeking re-orientation, and are aware that biodiversity and forest health require a holistic and interdisciplinary approach in measurement, coupling, modeling, prediction, and assessment. Holistic, complex, and multidisciplinary systems require the development, implementation, and application of novel concepts, methods, and tools that sometimes still appear to us as unrealistic and not very feasible, but still enable a new path to better understand our complex world.