Editorial of Special Issue “Machine and Deep Learning for Earth Observation Data Analysis”

Earth observation and remote sensing technologies provide ample and comprehensive information regarding the dynamics and complexity of the Earth system. Currently, sizeable, high-velocity and heterogeneous data are products of the Earth's subsystems' continuous mapping and constitute new areas for exploitation in support of evidence-based science and informed policy making. In particular, open and free data provided by the European Union Copernicus Programme with the Sentinel fleet of satellites and the United States National Aeronautics and Space Administration, through its open data policy, have defined new challenges and form an outstanding blueprint that makes this wealth of information, together with the respective scientific developments, accessible to all levels of an inquiring society. The diversity, the reception frequency, the magnitude and, most importantly, the rich content of those data collections call for robust, high-performance and large-scale data analysis methods.
The Special Issue "Machine and Deep Learning for Earth Observation Data Analysis" aims to jointly present new developments in the fields of Earth observation, big data and automated data-driven modeling and analysis, with a focus on machine and deep learning techniques. The latter have proven to be a very efficient form of modeling, boosted by advances in distributed computing, cloud computing and optimized chips on graphics processing unit platforms. Both machine and deep learning tend to become the modus operandi for automatically revealing patterns, associations and valuable insights from big data collections. Data-driven approaches operate as complementary to classic physics-based approaches and reflect a methodological paradigm shift in Earth system data exploitation.
The present volume covers a wide range of applications emphasizing the role of data-driven modeling in Earth observation data analysis: image semantic segmentation by exploiting point clouds derived from tri-stereo satellite imagery; remote sensing-based monitoring of urban sprawl-related issues; rapid response to natural hazards (floods) through satellite image analysis; satellite-based precipitation data for hydrological modeling; support for sustainable land use management and ecosystem services through the monitoring of erosion in alpine grasslands; and effective remote sensing image information retrieval.
A more detailed description of the included works is as follows: The authors in [1] investigate the applicability of point clouds derived from tri-stereo satellite imagery for semantic segmentation by means of a generalized sparse convolutional neural network. In particular, a fully convolutional neural network that uses generalized sparse convolution is trained once solely on 3D geometric information without the use of class weights, and twice on combined 3D geometric and color information with the use of class weights. This model is compared against a fully convolutional neural network trained on a 2D orthophoto, and a decision tree trained once on hand-crafted 3D geometric features and once on hand-crafted 3D geometric and color features. The findings of this study are: (1) Geometric and color information only improves the performance of the generalized sparse convolutional neural network on the dominant class, which leads to a higher overall performance; (2) training the network with median class weighting partially reverts the effects of adding color, and the network also starts learning the classes with lower occurrences; (3) the fully convolutional neural network trained on the 2D orthophoto generally outperforms the other two models, with a kappa score of over 90% and an average per-class accuracy of 61%; however, the decision tree trained on colors and hand-crafted geometric features achieves a 2% higher accuracy for roads.
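The median class weighting mentioned above can be sketched generically; the snippet below implements standard median-frequency balancing with hypothetical class counts, not the authors' actual point-cloud data:

```python
import numpy as np

def median_frequency_weights(labels):
    """Median-frequency balancing: weight each class by
    median(class frequencies) / frequency(class), so that
    rare classes receive proportionally larger weights."""
    classes, counts = np.unique(labels, return_counts=True)
    freqs = counts / counts.sum()
    weights = np.median(freqs) / freqs
    return dict(zip(classes.tolist(), weights.tolist()))

# Hypothetical labels dominated by class 0 (e.g., ground points).
labels = np.array([0] * 900 + [1] * 80 + [2] * 20)
weights = median_frequency_weights(labels)
# The dominant class is down-weighted, the rarest class up-weighted.
```

Weights of this form are typically passed to the loss function during training, counteracting the class imbalance that otherwise lets the dominant class swamp the gradient.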
In [2], the authors study the efficiency of visible orthophotographs and photogrammetric dense point clouds in building detection with segmentation-based machine learning, using visible bands, texture information and spectral and morphometric indices in different variable sets. According to the reported results: (1) Random forest achieves the best overall accuracy (99.8%), whereas partial least squares scores the lowest (~60%); (2) recursive feature elimination turns out to be an efficient variable selection method, identifying the six most significant of the 31 available variables; (3) morphometric indices help to achieve 82% producer's and 85% user's accuracy, while their combination with spectral and texture indices improves the results further; (4) since morphometric indices are not always available, the combination of texture and spectral indices with RGB bands improves the producer's accuracy by 12% and the user's accuracy by 6%.
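The recursive feature elimination step can be illustrated with scikit-learn; the data below are synthetic stand-ins for the 31 candidate variables, and the forest size is an arbitrary choice rather than the authors' configuration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the 31 candidate variables (spectral,
# texture and morphometric indices); binary building labels.
X, y = make_classification(n_samples=500, n_features=31,
                           n_informative=6, random_state=0)

# Recursive feature elimination: repeatedly fit the forest and
# drop the least important variable until six remain.
selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
               n_features_to_select=6)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)  # indices of kept variables
```

The retained indices would then define the reduced variable set used for the final classification model.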
The study in [3] presents a fully automated approach based on a convolutional neural network that identifies image pixels indicating flood events in freely available Copernicus Sentinel-1 synthetic aperture radar imagery with minimal pre-processing. Several CNN architectures are tested, and flood masks generated via a combination of classical semiautomated techniques and extensive manual cleaning and visual inspection are employed for the model training. The paper concludes that the proposed methodology reduces the time required to develop a flood map by 80% while achieving high performance over a wide range of locations and environmental conditions.

The work in [4] presents a combination of the convolutional neural network and the autoencoder architecture (ConvAE) to correct the pixel-by-pixel bias between two satellite-based products: the Asian Precipitation-Highly-Resolved Observational Data Integration towards Evaluation (APHRODITE) and the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR). The performance of the bias correction methods is evaluated in terms of the probability distribution, temporal correlation and spatial correlation of precipitation. The findings suggest that: (1) ConvAE outperforms basic techniques such as the standard deviation method; (2) ConvAE exhibits high performance in capturing extreme rainfall events and distribution trends, and describes spatial relationships between adjacent grid cells well; (3) experimental results support ConvAE's potential to resolve the precipitation bias correction problem.
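The temporal-correlation criterion used to evaluate bias correction in [4] can be sketched as follows; the precipitation fields here are random stand-ins, not APHRODITE or PERSIANN-CDR values:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins: a "reference" precipitation series and a
# "corrected" product on a small grid, shaped (time, lat, lon).
reference = rng.gamma(shape=2.0, scale=3.0, size=(365, 4, 4))
corrected = reference + rng.normal(0.0, 0.5, size=reference.shape)

def temporal_correlation(a, b):
    """Pearson correlation over the time axis, per grid cell."""
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    num = (a_c * b_c).sum(axis=0)
    den = np.sqrt((a_c ** 2).sum(axis=0) * (b_c ** 2).sum(axis=0))
    return num / den

corr = temporal_correlation(reference, corrected)  # one value per cell
```

A per-cell correlation map of this kind makes it easy to see where a corrected product tracks the reference series well and where it diverges.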
The authors in [5] have developed a sophisticated erosion monitoring tool capable of large-scale analysis. A U-Net convolutional neural network is adapted to map different erosion processes in high-resolution aerial images that span a 16-year period, trained on labeled erosion sites mapped with object-based image analysis (OBIA). The experimental outcome confirms that: (1) Results obtained by OBIA and U-Net follow similar linear trends for the 16-year study period, exhibiting increases in total degraded area of 167% and 201%, respectively; (2) segmentations of eroded sites are generally in good agreement but also display method-specific differences (overall precision of 73%, recall of 84% and F1-score of 78%); (3) U-Net is transferable to spatially and temporally unseen data and capable of efficiently capturing the temporal trends and spatial heterogeneity of degradation in alpine grasslands.
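As a quick consistency check, the F1-score reported in [5] follows directly from the stated precision and recall, since F1 is their harmonic mean:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Agreement between OBIA and U-Net segmentations reported in [5]:
# precision 73%, recall 84% yield an F1-score of roughly 78%.
f1 = f1_score(0.73, 0.84)
```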
The work in [6] proposes a high-resolution remote sensing image retrieval method that leverages the following mechanisms: (i) a deep network that employs ResNet as a backbone, a Gabor filter for capturing the spatial frequency structure of the images and a channel attention component to detect discriminative features; and (ii) a split-based feature transform network that divides the extracted features into several segments and transforms them separately to reduce the dimensionality and the required storage space. Experimental results on four data sets show competitive performance compared to state-of-the-art techniques, particularly when the image retrieval task involves rare target objects and complex textures.
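The split-based feature transform can be sketched with random projections standing in for the learned per-segment transforms; the descriptor dimensionality and segment counts below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def split_transform(features, n_segments, out_dim, rng):
    """Split each feature vector into equal segments and project
    each segment separately to a lower dimension, then concatenate.
    Random projections stand in for the learned transform network."""
    segments = np.split(features, n_segments, axis=1)
    seg_dim = features.shape[1] // n_segments
    projections = [rng.standard_normal((seg_dim, out_dim)) / np.sqrt(seg_dim)
                   for _ in range(n_segments)]
    return np.concatenate([s @ p for s, p in zip(segments, projections)],
                          axis=1)

# Hypothetical 2048-D backbone descriptors for 10 images, compressed
# to 8 segments of 16 dimensions each (128-D total per image).
descriptors = rng.standard_normal((10, 2048))
compact = split_transform(descriptors, n_segments=8, out_dim=16, rng=rng)
```

Transforming segments independently keeps each projection matrix small, which is the storage advantage the split-based design is after.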