Special Issue "Machine Learning Applications in Earth Science Big Data Analysis"

A special issue of Remote Sensing (ISSN 2072-4292).

Deadline for manuscript submissions: closed (31 October 2018).

Special Issue Editors

Dr. Sangram Ganguly
Guest Editor
NASA Ames Research Center and Bay Area Environmental Research Institute, Bldg. 566, Room 114, NASA Ames Research Center, Moffett Field, CA 94035, USA
Interests: radiative transfer theory; machine learning and data science; advanced remote sensing techniques for carbon modeling and vegetation structure; climate modeling; high performance computing and cloud computing; large-scale image processing and signal processing
Dr. Ramakrishna Nemani
Guest Editor
Earth Science Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
Interests: ecosystem modeling; vegetation-climate interactions; remote sensing of vegetation

Special Issue Information

Dear Colleagues,

Earth science research encompasses a wide gamut of study areas, ranging from the Earth’s interior to its atmosphere, hydrosphere, and biosphere. The research community has continuously strived to understand and model the physics governing the observed phenomena and processes that represent complex space–time dynamics. This work has benefited from an ever-growing suite of tools and data drawn from ground-based instruments, space-borne sensors, and model-generated insights. Theoretical advancements in physics-based modeling, information theory, image understanding, pattern recognition, and machine learning have seen applications in the Earth sciences over the last three decades. The open availability of consistently large datasets from multiple measurement systems now provides a unique opportunity to devise new ways of analyzing and processing these datasets to obtain valuable insights. These insights have implications for policymakers and stakeholders, and for engineering new approaches to managing natural resources and mitigating the impacts of climate change and other natural disasters.

This Special Issue focuses on applying novel machine learning algorithms and paradigms, including new methodologies such as deep neural networks, to the analysis of spatiotemporal datasets, either to derive new insights or to mimic the output of traditional physics-based models. The applicability of traditional machine learning models has been limited by several factors: insufficient model complexity for representing large non-linear systems, an inability to parse large volumes of data, computational runtime, a lack of training data, and limited use of domain knowledge in shaping the "input" states of the model architecture. To achieve wider applicability and acceptance, machine learning models will need to embed uncertainty quantification; encapsulate, or work in tandem with, known physics while offering new insights; and account for correlations, nonlinear dependence, and ideally persistence and teleconnections. They should remain open to scientific and data-driven interpretation and scale across problems solvable on desktop environments up to high-performance computing architectures. Novel machine learning models show promise across diverse disciplines, as evidenced by a plethora of recent publications outperforming established benchmarks in prediction, forecasting, classification, and recommendation. Authors are encouraged to evaluate the applicability of these approaches, including whether they can be combined with physics-based computer models to increase prediction accuracy while leveraging both the growing volume of data and evolving hardware and storage technologies to address optimization, scalability, and portability.

This Special Issue encourages submissions on the following topics, but is not limited to them:

  • Newer architectures for image semantic segmentation in Earth sciences (e.g., deep convolutional neural networks)
  • New approaches for data downscaling and data fusion from multiple sources (e.g., optical with radar)
  • Model reproducibility towards physics-based simulations and hybrid approaches for biophysical data retrieval
  • New methodologies for feature engineering, image pre-processing and feature ranking as relevant to machine learning models
  • Object identification, change detection and unknown pattern synthesis from both ground-based and overhead imagery systems
  • New unsupervised approaches for training data generation
  • New methods for time series forecasting, anomaly detection, precursor analysis and gap filling
  • Model scaling, generalization, ensemble learning and transfer learning approaches
  • Embedded AI for Unmanned Aircraft Systems (UAS), airborne missions or SmallSats

As part of this Special Issue, and during the manuscript acceptance process, authors are also encouraged to submit their own GitHub repositories and/or containers (e.g., Docker) so that other researchers can replicate model results and workflows, in essence encouraging a collaborative mechanism for sharing research results.

Authors are requested to check and follow the Instructions for Authors: https://www.mdpi.com/journal/remotesensing/instructions.

We look forward to receiving your submissions in this interesting area of specialization.

Dr. Sangram Ganguly
Dr. Ramakrishna Nemani
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (7 papers)

Research

Open Access Article
Arctic Vegetation Mapping Using Unsupervised Training Datasets and Convolutional Neural Networks
Remote Sens. 2019, 11(1), 69; https://doi.org/10.3390/rs11010069 - 02 Jan 2019
Abstract
Land cover datasets are essential for modeling and analysis of Arctic ecosystem structure and function and for understanding land–atmosphere interactions at high spatial resolutions. However, most Arctic land cover products are generated at a coarse resolution, often limited due to cloud cover, polar darkness, and poor availability of high-resolution imagery. A multi-sensor remote sensing-based deep learning approach was developed for generating high-resolution (5 m) vegetation maps for the western Alaskan Arctic on the Seward Peninsula, Alaska. The fusion of hyperspectral, multispectral, and terrain datasets was performed using unsupervised and supervised classification techniques over a ∼343 km2 area, and a high-resolution (5 m) vegetation classification map was generated. An unsupervised technique was developed to classify high-dimensional remote sensing datasets into cohesive clusters. We employed a quantitative method to add supervision to the unlabeled clusters, producing a fully labeled vegetation map. We then developed convolutional neural networks (CNNs) using the multi-sensor fusion datasets to map vegetation distributions using the original classes and the classes produced by the unsupervised classification method. To validate the resulting CNN maps, vegetation observations were collected at 30 field plots during the summer of 2016, and the resulting vegetation products were evaluated against these observations for accuracy. Our analysis indicates that the CNN models based on the labels produced by the unsupervised classification method provided the most accurate mapping of vegetation types, increasing the validation score (i.e., precision) from 0.53 to 0.83 when evaluated against field vegetation observations.
(This article belongs to the Special Issue Machine Learning Applications in Earth Science Big Data Analysis)
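The two-stage labeling strategy described in the abstract — cluster unlabeled multi-sensor features, then add supervision to the clusters — can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code; the feature values, cluster count, and majority-vote rule are assumptions made for the sketch.

```python
# Sketch: cluster unlabeled pixel features, then assign each cluster the
# majority class among the few labeled pixels that fall into it.
# All data here are synthetic placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic "multi-sensor" pixel features: two well-separated spectral groups
features = np.vstack([
    rng.normal(0.2, 0.05, size=(200, 8)),   # e.g., tundra-like spectra
    rng.normal(0.7, 0.05, size=(200, 8)),   # e.g., shrub-like spectra
])
true_class = np.array([0] * 200 + [1] * 200)

# Stage 1: unsupervised clustering of the high-dimensional features
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)

# Stage 2: label each cluster by majority vote over a small labeled subset
labeled_idx = rng.choice(len(features), size=40, replace=False)
cluster_label = {}
for c in np.unique(clusters):
    members = labeled_idx[clusters[labeled_idx] == c]
    if len(members):
        cluster_label[c] = np.bincount(true_class[members]).argmax()

predicted = np.array([cluster_label.get(c, -1) for c in clusters])
accuracy = (predicted == true_class).mean()
print(f"majority-vote label accuracy: {accuracy:.2f}")
```

The same idea scales to many clusters and classes; the small labeled subset stands in for the field-plot observations used for supervision.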

Open Access Article
Machine Learning-Based Slum Mapping in Support of Slum Upgrading Programs: The Case of Bandung City, Indonesia
Remote Sens. 2018, 10(10), 1522; https://doi.org/10.3390/rs10101522 - 22 Sep 2018
Abstract
The survey-based slum mapping (SBSM) program conducted by the Indonesian government to reach the national target of “cities without slums” by 2019 shows mapping inconsistencies due to several reasons, e.g., the dependency on the surveyor’s experience and the complexity of the set of slum indicators. Relying on such inconsistent maps makes it difficult to monitor the national slum upgrading program’s progress. Remote sensing imagery combined with machine learning algorithms could help reduce these inconsistencies. This study evaluates the performance of two machine learning algorithms, i.e., support vector machine (SVM) and random forest (RF), for slum mapping in support of the slum mapping campaign in Bandung, Indonesia. Recognizing the complexity of differentiating slum and formal areas in Indonesia, the study used a combination of spectral, contextual, and morphological features. In addition, sequential feature selection (SFS) combined with the Hilbert–Schmidt independence criterion (HSIC) was used to select significant features for classifying slums. Overall, the highest accuracy (88.5%) was achieved by the SVM with SFS using contextual, morphological, and spectral features, which is higher than the estimated accuracy of the SBSM. To evaluate the potential of machine learning-based slum mapping (MLBSM) in support of slum upgrading programs, interviews were conducted with several local and national stakeholders. Results show that local acceptance of a remote sensing-based slum mapping approach varies among stakeholder groups. A locally adapted framework is therefore required that combines ground surveys with robust and consistent machine learning methods, can deal with big data, and allows the rapid extraction of consistent information on the dynamics of slums at a large scale.
(This article belongs to the Special Issue Machine Learning Applications in Earth Science Big Data Analysis)
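The SVM-versus-RF comparison with sequential feature selection can be sketched with scikit-learn on synthetic data. This is not the study's pipeline (which also used the Hilbert–Schmidt independence criterion for selection); the dataset, feature counts, and selector settings are placeholders.

```python
# Sketch: compare SVM and random forest on synthetic "spectral + contextual"
# features, with forward sequential feature selection standing in for the
# SFS step described in the abstract.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class "slum vs. formal" problem with 20 candidate features
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           random_state=0)

scores = {}
for name, clf in [("SVM", make_pipeline(StandardScaler(), SVC())),
                  ("RF", RandomForestClassifier(n_estimators=50, random_state=0))]:
    # Forward selection of 6 features, scored by 3-fold CV accuracy
    sfs = SequentialFeatureSelector(clf, n_features_to_select=6, cv=3)
    X_sel = sfs.fit_transform(X, y)
    scores[name] = cross_val_score(clf, X_sel, y, cv=5).mean()
    print(f"{name}: accuracy with 6 selected features = {scores[name]:.2f}")
```

On real imagery the candidate set would be the spectral, contextual, and morphological features described above, and the winner would be compared against the SBSM baseline.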

Open Access Article
Machine Learning Using Hyperspectral Data Inaccurately Predicts Plant Traits Under Spatial Dependency
Remote Sens. 2018, 10(8), 1263; https://doi.org/10.3390/rs10081263 - 11 Aug 2018
Abstract
Spectral, temporal and spatial dimensions are difficult to model together when predicting in situ plant traits from remote sensing data. Therefore, machine learning algorithms solely based on spectral dimensions are often used as predictors, even when there is a strong effect of spatial or temporal autocorrelation in the data. A significant reduction in prediction accuracy is expected when algorithms are trained using a sequence in space or time that is unlikely to be observed again. The ensuing inability to generalise creates a necessity for ground-truth data for every new area or period, provoking the propagation of “single-use” models. This study assesses the impact of spatial autocorrelation on the generalisation of plant trait models predicted with hyperspectral data. Leaf Area Index (LAI) data generated at increasing levels of spatial dependency are used to simulate hyperspectral data using Radiative Transfer Models. Machine learning regressions to predict LAI at different levels of spatial dependency are then tuned (determining the optimum model complexity) using cross-validation as well as the NOIS method. The results show that cross-validated prediction accuracy tends to be overestimated when spatial structures present in the training data are fitted (or learned) by the model.
(This article belongs to the Special Issue Machine Learning Applications in Earth Science Big Data Analysis)
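The core evaluation issue — cross-validated accuracy being overestimated when spatial structure is shared between training and test folds — can be illustrated by comparing random k-fold CV with spatially blocked CV on synthetic data. The block layout, model, and data below are assumptions for the sketch, not the study's setup.

```python
# Sketch: random k-fold CV lets a model exploit spatial structure shared
# between train and test folds; grouping folds by spatial block does not.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(1)
n = 600
block = np.repeat(np.arange(6), n // 6)          # 6 contiguous spatial blocks
# Target with a strong per-block (spatially autocorrelated) component
y = 2.0 * block + rng.normal(0, 1, n)
# Features carry the spatial signal plus noise
X = np.column_stack([block + rng.normal(0, 0.5, n),
                     rng.normal(0, 1, n)])

model = RandomForestRegressor(n_estimators=50, random_state=0)
random_cv = cross_val_score(model, X, y,
                            cv=KFold(5, shuffle=True, random_state=0)).mean()
blocked_cv = cross_val_score(model, X, y, cv=GroupKFold(5),
                             groups=block).mean()
print(f"random CV R^2:  {random_cv:.2f}")
print(f"blocked CV R^2: {blocked_cv:.2f}")
```

The random-CV score looks excellent because every test fold interpolates within blocks seen during training; holding out whole blocks exposes the model's inability to generalise to unseen areas.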

Open Access Article
Mapping Burned Areas in Tropical Forests Using a Novel Machine Learning Framework
Remote Sens. 2018, 10(1), 69; https://doi.org/10.3390/rs10010069 - 06 Jan 2018
Abstract
This paper presents an application of a novel machine-learning framework on MODIS (moderate-resolution imaging spectroradiometer) data to map burned areas over tropical forests of South America and South-east Asia. The RAPT (RAre Class Prediction in the absence of True labels) framework is able to build data-adaptive classification models using noisy training labels. It is particularly suitable when expert-annotated training samples are difficult to obtain, as in the case of wildfires in the tropics. This framework has been used to build burned area maps from MODIS surface reflectance data as features and Active Fire hotspots as training labels, which are known to have high commission and omission errors due to the prevalence of cloud cover and smoke, especially in the tropics. Using the RAPT framework we report burned areas for 16 MODIS tiles from 2001 to 2014. The total burned area detected in the tropical forests of South America and South-east Asia during these years is 2,071,378 MODIS (500 m) pixels (approximately 520 K sq. km.), almost three times the estimate from the collection 5 MODIS MCD64A1 product (783,468 MODIS pixels). An evaluation using Landsat-based reference burned area maps indicates that our product has an average user’s accuracy of 53% and producer’s accuracy of 55%, while the collection 5 MCD64A1 burned area product has an average user’s accuracy of 61% and producer’s accuracy of 27%. Our analysis also indicates that the two products can be complementary, and a combination of the two approaches is likely to provide a more comprehensive assessment of tropical fires. Finally, we have created a publicly accessible web-based viewer that helps the community visualize the burned area maps produced using RAPT and examine various validation sources corresponding to every detected MODIS pixel.
(This article belongs to the Special Issue Machine Learning Applications in Earth Science Big Data Analysis)
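The accuracy metrics quoted in the abstract — user's accuracy (1 − commission error) and producer's accuracy (1 − omission error) — follow directly from a binary confusion matrix. A minimal sketch, with hypothetical pixel counts that are not taken from the paper:

```python
# Sketch: user's and producer's accuracy for the burned class from a
# binary confusion matrix of reference vs. mapped labels.
import numpy as np

def users_producers_accuracy(confusion):
    """confusion[i, j] = pixels of reference class i mapped to class j.
    Row/column 1 is the burned class."""
    tp = confusion[1, 1]
    users = tp / confusion[:, 1].sum()      # correct among pixels mapped as burned
    producers = tp / confusion[1, :].sum()  # correct among reference burned pixels
    return users, producers

# Hypothetical counts for illustration only
cm = np.array([[900, 40],    # reference unburned: 900 correct, 40 commission
               [30, 60]])    # reference burned: 30 omission, 60 correct
ua, pa = users_producers_accuracy(cm)
print(f"user's accuracy: {ua:.2f}, producer's accuracy: {pa:.2f}")
```

The contrast in the abstract (RAPT: balanced ~53%/55%; MCD64A1: 61% user's but 27% producer's) reflects exactly this trade-off between commission and omission errors.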

Open Access Article
Google Earth Engine, Open-Access Satellite Data, and Machine Learning in Support of Large-Area Probabilistic Wetland Mapping
Remote Sens. 2017, 9(12), 1315; https://doi.org/10.3390/rs9121315 - 14 Dec 2017
Abstract
Modern advances in cloud computing and machine-learning algorithms are shifting the manner in which Earth-observation (EO) data are used for environmental monitoring, particularly as we settle into the era of free, open-access satellite data streams. Wetland delineation represents a particularly worthy application of this emerging research trend, since wetlands are an ecologically important yet chronically under-represented component of contemporary mapping and monitoring programs, particularly at the regional and national levels. Exploiting Google Earth Engine and R statistical software, we developed a workflow for predicting the probability of wetland occurrence using a boosted regression tree machine-learning framework applied to digital topographic and EO data. Working in a 13,700 km2 study area in northern Alberta, our best models produced excellent results, with AUC (area under the receiver-operator characteristic curve) values of 0.898 and explained-deviance values of 0.708. Our results demonstrate the central role of high-quality topographic variables for modeling wetland distribution at regional scales. Incorporating optical and/or radar variables into the workflow substantially improved model performance, with optical data performing slightly better. Converting our wetland probability-of-occurrence model into a binary Wet-Dry classification yielded an overall accuracy of 85%, which is virtually identical to that derived from the Alberta Merged Wetland Inventory (AMWI): the contemporary inventory used by the Government of Alberta. However, our workflow contains several key advantages over that used to produce the AMWI, and provides a scalable foundation for province-wide monitoring initiatives.
(This article belongs to the Special Issue Machine Learning Applications in Earth Science Big Data Analysis)
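The probability-of-occurrence workflow — boosted trees producing a wetland probability that is scored with AUC and then thresholded into a binary Wet-Dry map — can be sketched as follows. This uses scikit-learn's gradient boosting on synthetic data rather than the authors' R-based boosted regression trees; all data and the 0.5 threshold are assumptions for the sketch.

```python
# Sketch: boosted trees -> probability of wetland occurrence -> AUC score
# -> thresholded binary Wet-Dry classification.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for topographic + optical/radar predictors
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
p_wet = model.predict_proba(X_te)[:, 1]          # probability of occurrence
auc = roc_auc_score(y_te, p_wet)
wet_dry = (p_wet >= 0.5).astype(int)             # binary Wet-Dry map
acc = accuracy_score(y_te, wet_dry)
print(f"AUC = {auc:.3f}, binary accuracy = {acc:.2f}")
```

Keeping the continuous probability surface, as the authors do, preserves information that a hard Wet-Dry classification discards; the threshold can then be tuned per application.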

Open Access Editor’s Choice Article
Exploring Subpixel Learning Algorithms for Estimating Global Land Cover Fractions from Satellite Data Using High Performance Computing
Remote Sens. 2017, 9(11), 1105; https://doi.org/10.3390/rs9111105 - 29 Oct 2017
Abstract
Land cover (LC) refers to the physical and biological cover present over the Earth’s surface in terms of the natural environment such as vegetation, water, bare soil, etc. Most LC features occur at finer spatial scales compared to the resolution of primary remote sensing satellites. Therefore, observed data are a mixture of spectral signatures of two or more LC features resulting in mixed pixels. One solution to the mixed pixel problem is the use of subpixel learning algorithms to disintegrate the pixel spectrum into its constituent spectra. Despite the popularity and existing research conducted on the topic, the most appropriate approach is still under debate. As an attempt to address this question, we compared the performance of several subpixel learning algorithms based on least squares, sparse regression, signal–subspace and geometrical methods. Analysis of the results obtained through computer-simulated and Landsat data indicated that fully constrained least squares (FCLS) outperformed the other techniques. Further, FCLS was used to unmix global Web-Enabled Landsat Data to obtain abundances of substrate (S), vegetation (V) and dark object (D) classes. Due to the sheer nature of data and computational needs, we leveraged the NASA Earth Exchange (NEX) high-performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into four classes, namely forest, farmland, water and urban areas (in conjunction with nighttime lights data) over California, USA using a random forest classifier. Validation of these LC maps with the National Land Cover Database 2011 products and North American Forest Dynamics static forest map shows a 6% improvement in unmixing-based classification relative to per-pixel classification. 
As such, abundance maps continue to offer a useful alternative to high-spatial-resolution classified maps for forest inventory analysis, multi-class mapping, multi-temporal trend analysis, etc.
(This article belongs to the Special Issue Machine Learning Applications in Earth Science Big Data Analysis)
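Fully constrained least squares (FCLS) unmixing solves for non-negative per-pixel abundances that sum to one. A common way to implement it is to fold the sum-to-one constraint into a non-negative least squares solve by appending a heavily weighted row of ones to the endmember matrix — a standard trick, sketched below on synthetic spectra, and not necessarily the paper's exact implementation.

```python
# Sketch of FCLS unmixing for one pixel: non-negative abundances that
# sum to one, via an augmented NNLS solve.
import numpy as np
from scipy.optimize import nnls

def fcls(endmembers, pixel, delta=1e3):
    """endmembers: (bands, n_endmembers); pixel: (bands,).
    delta weights the sum-to-one constraint row."""
    E = np.vstack([endmembers, delta * np.ones(endmembers.shape[1])])
    x = np.append(pixel, delta)
    abundances, _ = nnls(E, x)
    return abundances

# Three synthetic endmembers (substrate, vegetation, dark object) in 6 bands
rng = np.random.default_rng(0)
M = rng.uniform(0.1, 0.9, size=(6, 3))
true_a = np.array([0.5, 0.3, 0.2])
pixel = M @ true_a + rng.normal(0, 0.001, 6)     # mixed pixel + small noise

a = fcls(M, pixel)
print("estimated abundances:", np.round(a, 2), " sum =", round(a.sum(), 3))
```

Applied per pixel over global Landsat mosaics, this is the kind of embarrassingly parallel workload that the abstract's NEX high-performance computing architecture is used to scale.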

Open Access Article
Exploring the Potential of WorldView-2 Red-Edge Band-Based Vegetation Indices for Estimation of Mangrove Leaf Area Index with Machine Learning Algorithms
Remote Sens. 2017, 9(10), 1060; https://doi.org/10.3390/rs9101060 - 18 Oct 2017
Abstract
To accurately estimate leaf area index (LAI) in mangrove areas, the selection of appropriate models and predictor variables is critical. However, there is a major challenge in quantifying and mapping LAI using multi-spectral sensors due to the saturation effects of traditional vegetation indices (VIs) for mangrove forests. WorldView-2 (WV2) imagery has proven effective for estimating the LAI of grasslands and forests, but the sensitivity of its VIs has been uncertain for mangrove forests. Furthermore, a single model may exhibit a certain randomness and instability in model calibration and estimation accuracy. Therefore, this study aims to explore the sensitivity of WV2 VIs for estimating mangrove LAI by comparing artificial neural network regression (ANNR), support vector regression (SVR) and random forest regression (RFR). The results suggest that the RFR algorithm yields the best results (RMSE = 0.45, 14.55% of the average LAI), followed by ANNR (RMSE = 0.49, 16.04% of the average LAI) and then SVR (RMSE = 0.51, 16.56% of the average LAI), under 5-fold cross-validation (CV) using all VIs. Quantification of the variable importance shows that the VIs derived from the red-edge band consistently remain the most important contributors to LAI estimation. When the red-edge band-derived VIs are removed from the models, estimation accuracies measured in relative RMSE (RMSEr) decrease by 3.79%, 2.70% and 4.47% for the ANNR, SVR and RFR models respectively. VIs derived from the red-edge band also yield better accuracy compared with other traditional bands of WV2, such as the near-infrared-1 and near-infrared-2 bands. Furthermore, the estimated LAI values vary significantly across different mangrove species. The study demonstrates the utility of VIs of WV2 imagery and the selected machine-learning algorithms in developing LAI models in mangrove forests.
The results indicate that the red-edge band of WV2 imagery can help alleviate the saturation problem and improve the accuracy of LAI estimation in a mangrove area.
(This article belongs to the Special Issue Machine Learning Applications in Earth Science Big Data Analysis)
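The comparison idea — fit a random forest regression of LAI on vegetation indices, score it with cross-validated RMSE, and rank the indices by importance — can be sketched as follows. The data and VI names are synthetic placeholders; the red-edge VI is constructed to dominate purely for illustration.

```python
# Sketch: random forest regression of LAI on vegetation indices, with
# 5-fold CV RMSE and per-index importance ranking.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300
# Hypothetical VI predictors; the "red-edge" VI drives the response most
vi_red_edge = rng.uniform(0, 1, n)
vi_nir1 = rng.uniform(0, 1, n)
vi_nir2 = rng.uniform(0, 1, n)
X = np.column_stack([vi_red_edge, vi_nir1, vi_nir2])
lai = 4.0 * vi_red_edge + 0.5 * vi_nir1 + rng.normal(0, 0.2, n)

rfr = RandomForestRegressor(n_estimators=200, random_state=0)
rmse = -cross_val_score(rfr, X, lai, cv=5,
                        scoring="neg_root_mean_squared_error").mean()
rfr.fit(X, lai)
for name, imp in zip(["red-edge VI", "NIR-1 VI", "NIR-2 VI"],
                     rfr.feature_importances_):
    print(f"{name}: importance = {imp:.2f}")
print(f"5-fold CV RMSE = {rmse:.2f}")
```

Dropping the dominant index and refitting would reproduce, in miniature, the abstract's ablation test of removing the red-edge-derived VIs and measuring the loss in relative RMSE.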
