An Incongruence-Based Anomaly Detection Strategy for Analyzing Water Pollution in Images from Remote Sensing

The potential applications of computational tools, such as anomaly detection and incongruence, for analyzing data attract much attention from the scientific research community. However, there remains a need for more studies to determine how anomaly detection and incongruence applied to analyze data of static images from remote sensing will assist in detecting water pollution. In this study, an incongruence-based anomaly detection strategy for analyzing water pollution in images from remote sensing is presented. Our strategy semi-automatically detects occurrences of one type of anomaly based on the divergence between two image classifications (contextual and non-contextual). The results indicate that our strategy accurately analyzes the majority of images. Incongruence as a strategy for detecting anomalies in real-application (non-synthetic) data found in images from remote sensing is relevant for recognizing crude oil close to open water bodies or water pollution caused by the presence of brown mud in large rivers. It can also assist surveillance systems by detecting environmental disasters or performing mappings.

Anomaly detection [13] and incongruence [15-18] are two powerful computational tools from pattern recognition (PR) [3,11-18,34,35] and computer vision (CV) [3,36]. PR is a scientific area of study dedicated to analyzing patterns and regularities in data [3]. PR provides powerful tools [34] for many different applications and research areas [35], such as scientific research, private and public industries, and military activities [14,21-24,32,37-48]. For example, PR is important for geosciences because its tools are used to analyze geographical features of environments in digital images from remote sensing, i.e., scenes [11,12,49-54]. Additionally, PR provides powerful tools to support machine perception for CV [3,36]. CV is a scientific area of study committed to developing artificial systems responsible for obtaining information from multidimensional data, such as images [36]. A relevant example of machine perception is the ability of computers to recognize behaviors of geographical features of interest, such as rivers, lakes, fields, plantations, and forests. Our interest goes further than a PR- and CV-based solution that a human user applies to solve a specific and already-known remote sensing problem, such as only detecting water pollution [55-60] as an outlier in an image. For that narrower problem, we could use spectral analysis [61] (spectral index [62] or slope ratio [63]) to solve the problem directly. Our motivation has been to find a real application of the taxonomy [13], inspired by real-world situations captured in images from remote sensing [2,31-33].
When analyzing an image for the first time, a machine could apply each known solution already used in remote sensing (e.g., spectral analysis [61]), one at a time, until it detects a problem. However, this could be considered trial-and-error learning [35]. The use of the taxonomy [13] can bring to computational systems a more organized and structured machine learning approach [28,35,64-74] for detecting problems, i.e., more "intelligence." In this case, for example, a decision-making system should first identify a problem itself, with minimal human interaction, before offering an adequate solution that could be based on spectral analysis [61]. A decision-making system [13,16-18] is software installed on a computer that needs to identify different types of problems in order to automatically select an appropriate response to each type of problem [13]. In other words, the machine should choose the appropriate response to the detected problem based on the type of anomaly identified by the system, for example, unexpected structure and structural components [13]. Our research is applied in this context.
The innovation of this work is to introduce the first solution to a practical application of the taxonomy [13] to solve real-world problems [2,31-33]. What makes our study unique is that it allows computers to categorize real-world problems based on a thorough taxonomy [13], which is a "scaffolding" [75] to increase the machine's abilities [35]. In other words, beyond identifying problems, this study sets a precedent for machines to recognize the contexts in which the problems are inserted. Each type of anomaly represents a different context [13]. Therefore, the most appropriate response to a real-world problem depends on the context in which that problem is inserted [13]. Humans can learn how to solve problems inserted in different contexts much better than machines [75]. However, our study focuses on machines rather than humans. It provides conditions for a machine to learn how to distinguish and detect different real-world problems, taking the different contexts into consideration and relying on a single concept (the taxonomy [13]). This leads to a higher level of meaningful learning, i.e., an evolution in machine learning [28,35,64-74].
Among all relevant areas of research, we have found the first solution to this problem only in remote sensing. It opens up an entirely new perspective and a set of possibilities for future work in remote sensing and many other areas. In remote sensing, for example, the already known solutions for specific problems, such as those based on spectral analysis [61], can be used to extend the application of the taxonomy [13] to different problems. In this research, the real-world situation captured in images from remote sensing that inspired us to investigate solutions in PR and CV is water pollution, mainly regarding occurrences of disasters [31-33]. If this situation could be revealed by the disagreement between classifiers [13,15-18], then water pollution could be discovered by anomaly detection [13].
This paper is organized as follows: Section 2 presents a background; Section 3 comments on related work; Section 4 summarizes the proposed strategy; Section 5 describes the materials and methods used for performing the experiments; Section 6 reviews the achieved results; Section 7 discusses the results and Section 8 presents the conclusions.

The Differences between Outlier, Anomaly, and Incongruence
In a statistical analysis, an outlier [19-24] is an observation which deviates significantly from other observations [19]. The deviation makes the observation appear to be generated by a different mechanism. Therefore, the observation appears to be inconsistent with the remainder of the data series. For example, consider an image of a large river for which all pixels are blue except one red pixel. In this case, the red pixel is an outlier. Outlier, in the conventional meaning, is a term used to define any type of anomaly [19]. The term "detection of outlier" [19-24] must be used when it is not necessary to categorize the anomaly. The term "detection of anomaly" [13] must be used only when it is necessary to categorize the anomaly. Nevertheless, these terms are often used interchangeably, and inadequately, in the scientific literature [14,19].
Anomaly detection is the activity of discovering and categorizing non-conforming patterns and occurrences when there is a failure to relate observed sensory data to information we already know about the image [13]. For example, the detection of absence of water in the part of an image that is correspondent to a large river is an out-of-pattern occurrence and, therefore, an anomaly. Because there are different types of anomalies [13], their detection must distinguish each type of anomaly.
For example, anomalies of the type unexpected structure and structural components are those detected when the analyses take into consideration the features which allow classifiers to identify components of samples and how they are structured to compose the samples [13]. The same analyses are applied to detect where differences related to components or their structures can be found in the image, i.e., spatial analyses [13]. Nonetheless, they do not consider when the differences occurred, i.e., temporal analyses [13]. Therefore, no time-series [76-78], i.e., satellite images before and after the occurrence of an event [19] or disaster [31-33], is involved in the context. Moreover, detection of an anomaly of this type depends on three other conditions being satisfied: high sensory data quality [26]; contextual [9] and non-contextual [8,27-30] classification [5,13,15-18] of the same image; and incongruence [13,15-18]. An example of this type of anomaly is the detection of brown mud where there are high levels of turbidity in large rivers. In contrast, incongruent results from contextual and non-contextual classification of images qualified with low values indicate the presence of another type of anomaly [13]. In another example, incongruent results from contextual and non-contextual classification of time-series images [76-78] qualified with high values indicate the presence of a third type of anomaly [13].
Incongruence [13,15-18] is the presence of discordant results and occurs whenever there is contradictory evidence provided by the analyses of sensory data performed by different types of classifiers [13]. For example, given two parallel analyses of the same part of an image that corresponds to a large river, there is incongruence if water is detected by one analysis and not by the other.

Non-Contextual and Contextual Classifiers
An environment can be modeled by some computational tools, e.g., those based on machine perception, using samples (aggregation of pixels) from the application domain and previous information about the problem [15]. Models trained with different samples are able to be represented by classifiers [16]. In PR, classifiers are experts or learners which perform classifications, i.e., they assign objects to one class or category among predefined others [79]. While predicting classes for a common input, such as a scene, classifiers are expected to present similar probability estimates despite the environment [16]. Consequently, computational tools, which base their decision-making actions on the use of multiple classifiers, usually expect all classifiers to support the same hypothesis [17].
There are two different types of classifiers, non-contextual and contextual [13]. Non-contextual classification is a general-level task that results in a weak learner, but it is less constrained than the contextual one [79]. It is more powerful than a random classifier, despite being much less precise when compared to the contextual one [79]. Contextual classification is a specific-level task which leads to a strong learner, but it is more dependent on specific knowledge such as previous knowledge or training data [9]. It is composed of multiple weaker classifiers that work in synergy to reach a more robust classification [9].
On one hand, when either two non-contextual [27-30] or two contextual [9] classifiers diverge in their classifications [13,15-18], this condition only exposes the limited classification of one of the classifiers. It bears no relation to incongruence detection [13,15-18] for either classifier. On the other hand, models of an environment which are represented by non-contextual and contextual classifiers can be used for identifying incongruences if they are used concurrently [15]. After receiving input data, both types of classifiers produce class-posterior probabilities as results [15]. Incongruence happens when the input data presented to both classifiers induces a significantly large discrepancy between posterior probabilities, i.e., incongruence is a conflicting prediction that happens when the probability calculated by some non-contextual classifier is much larger than the probability calculated by some contextual classifier [15]. Therefore, the concurrent use of non-contextual and contextual classifiers allows systems to recognize the occurrence of conflicting classifications [15]. It enhances the capability of decision-making systems to detect incongruence [15-18]. Consequently, decision-making systems can have their control actions also conditioned by the monitoring and detection of incongruence [15-18]. Decision-making systems can guide computers to give adequate and fast responses after detecting problems [13], e.g., to prevent water pollution disasters or to analyze their extent when they occur. For this reason, incongruence is of considerable interest for decision-making systems, since it reveals the occurrence of some type of anomaly [13].
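The incongruence condition described above can be illustrated with a minimal sketch. It assumes that each classifier returns, for every sample, a posterior probability for the same class (here "water"), and it flags a sample when the non-contextual posterior exceeds the contextual one by more than a threshold. The function name, the example probabilities, and the threshold value of 0.5 are hypothetical and are not taken from the study; the incongruence measures of [15-18] are more elaborate than a plain difference.

```python
def incongruent_samples(p_noncontextual, p_contextual, threshold=0.5):
    """Return indices of samples whose non-contextual posterior exceeds
    the contextual posterior by more than `threshold` (an assumed value)."""
    if len(p_noncontextual) != len(p_contextual):
        raise ValueError("both classifiers must score the same samples")
    flagged = []
    for index, (p_nc, p_c) in enumerate(zip(p_noncontextual, p_contextual)):
        # A large positive gap means the weak (general) classifier accepts
        # the class while the strong (specific) classifier rejects it.
        if p_nc - p_c > threshold:
            flagged.append(index)
    return flagged


# Hypothetical posteriors for the class "water" on five samples.
p_nc = [0.95, 0.90, 0.10, 0.85, 0.50]
p_c = [0.92, 0.20, 0.05, 0.30, 0.45]
print(incongruent_samples(p_nc, p_c))  # [1, 3]
```

Samples 1 and 3 are flagged because only there does the non-contextual classifier report water with high probability while the contextual classifier does not.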

Research on Outliers Detection
Most current research focuses on outlier detection, such as [1,14,19-25,44,51-54,66,76-78,80-82]; however, we discuss below only the studies that are most relevant to this article. An outlier-based change detection method to detect abnormal points in multi-scale time-series images from remote sensing was proposed by Yin Shoujing et al. in [76]. The method analyzes time-series images from remote sensing and detects temporal and spatial changes. For that research, changes caused by weather condition variation, climate changes, sensor aging, vegetation phenology, human activities, and emergencies (such as drought, fire, pests, and insects) were considered abnormal. The research focused on land cover changes. These changes are relatively rare over large areas, even for a long period. The research concluded that the land cover changes registered in the images were outliers.
An algorithm capable of identifying outliers in the global data (distributed earth science databases) without moving the whole data to a single database was presented by Bhaduri et al. in [80]. The algorithm analyzes the data to search for extremes or outliers. The algorithm detects outliers which are missed when data is available only at a single database. Their achieved results were compared to the achievements of other algorithms which also detect outliers. An unsupervised outlier detection framework was proposed by Qi Liu et al. in [25]. The framework requires no prior knowledge to detect outliers and anomalous events. The research defined outliers as objects which have either low spatial or temporal coherence with their neighbors, and it defined an anomalous event as a group of outliers that share similar spatial and temporal anomalous behaviors. The research has defined four different categories of outliers or anomalous events and has reinforced the importance of basing anomaly detection studies on taxonomies of anomalies. Nonetheless, outlier detection is not enough for systems to categorize many types of problems. Anomaly detection can overcome this drawback.
A new daily snow cover dataset was developed by Bormann et al. in [81]. The dataset is a satellite-based observational record useful to characterize snow duration and cover extent. These snow cover observations are important to detect outliers such as anticipated melt of snow, declines in snow cover extent, and short season duration. The dataset was assessed using snow detection estimates derived from Landsat Thematic Mapper data and was compared to another snow cover dataset. Although the study focused on satellite data for the alpine region in Australia, the approach can be applied to other regions and other sensors to help assess snow monitoring. Consequently, the study contributes to research on water resource management and snow hydrology.
A method for detecting anomaly regions in each image of a satellite image time-series was proposed by Zhou et al. in [77]. The method identifies spatial-temporal dynamic processes of unexpected changes of land cover based on seasonal autocorrelation analysis. The detection of spatial-temporal processes of flooding in satellite data was used to assess the method. A deep-learning approach for change detection applied to satellite time-series images was presented by Sublime and Kalinicheva in [66]. Their approach was compared against other machine-learning methods. The achieved results are better than the results of other existing techniques because of the method's higher performance and relatively fast analysis. An online change detection algorithm was proposed by Chandola and Vatsavai in [78]. They used a Gaussian process based non-parametric time-series prediction model in an online mode. The process solves a large system of equations involving the associated covariance matrix, which grows with every time step. Their algorithm identifies changes by monitoring the difference between the predictions and previous observations within a statistical control chart framework.
An automatic clearance anomaly detection method was proposed by Chen et al. in [82]. The clearance anomaly detection measures the distance between power lines and other objects (e.g., trees and buildings) to evaluate whether the clearance is within the safe range. The clearance measurements were compared with the standard safe threshold to find the clearance anomalies. The results were validated through qualitatively visual inspection, quantitatively manual measurements in raw point clouds, and on-site field survey. The achieved results show that the proposed method detects the clearance hazards, such as tree encroachment, effectively, and the clearance measurement is accurate.

Research on Anomaly Detection
Anomaly detection can be used by systems to categorize problems based on the use of taxonomies, such as in [13]. Regarding identification of different types of anomalies useful to help decision-making systems, a unified framework for anomaly detection was proposed by Kittler et al. in [13]. According to them, the scientific literature conventionally understands an anomaly as an outlier. However, their framework has expanded the concept of anomaly beyond the conventional meaning of outlier. They claim that anomalies are related to many other occurrences, such as rare events, unexpected events, distribution drift, noise, and novelty detection of an object or an object primitive. The framework presented the multifaceted nature of anomalies and suggested effective mechanisms to identify and distinguish each of them. The research has provided a taxonomy of anomalies that includes, for example, unknown object, measurement model drift, unknown structure, unexpected structural component, component model drift, and unexpected structure and structural components. Because the practical application of the taxonomy is relatively new, and the detection and identification of the type of the anomaly involves complex aspects, it is appropriate to study the practical application of each type of anomaly individually. The research asserts that by identifying different types of anomalies, systems can select appropriate responses to deal with each type of anomaly. For decision-making systems, the application of the anomaly detection is potentially increased when it is based on incongruence.

Research on Incongruence
The use of incongruence for detecting the occurrences of some types of events was deeply discussed by Weinshall et al. in [15]. They identified distinct types of events based on the conflicting predictions given by weak and strong classifiers. Their study presented a framework for the representation and processing of those incongruent events. According to them, an incongruent event is an event for which the probability is divergent when computed based on the use of different types of classifiers. They applied their methodology on pictures of dogs and motorcycles, and indoor videos recorded to simulate common situations, such as people walking and talking. Similar to the other studies we have reviewed, this work also does not present the application of their methodology to solve real-world problems.
The decision cognizant Kullback-Leibler (DC-KL) divergence was proposed by Ponti et al. in [16]. The DC-KL divergence reduces the contribution of the minority classes which obscure the true degree of classifier incongruence. Simulations, i.e., synthetic data, were used to analyze the properties of the novel divergence measure. The measure is more robust to minority class clutter, less sensitive to estimation noise, and achieves much better statistics for discriminating between classifier congruence and incongruence than the classical KL divergence.
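To make the role of minority classes concrete, the sketch below computes the classical KL divergence between two classifier posteriors, and then recomputes it after pooling every class other than each classifier's top-ranked class into a single "clutter" bin. The pooling step is only an illustrative assumption about how the contribution of minority classes can be reduced; it is not the exact DC-KL definition given in [16], and the function names and example posteriors are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Classical KL divergence D(p || q) between two discrete posteriors."""
    return sum(
        p_i * math.log((p_i + eps) / (q_i + eps))
        for p_i, q_i in zip(p, q)
        if p_i > 0
    )


def pool_minority_classes(p, q):
    """Keep the top class of each classifier and pool the remaining
    probability mass into one clutter bin (illustrative assumption)."""
    top_p = max(range(len(p)), key=p.__getitem__)
    top_q = max(range(len(q)), key=q.__getitem__)
    keep = sorted({top_p, top_q})
    p_pooled = [p[i] for i in keep]
    q_pooled = [q[i] for i in keep]
    p_pooled.append(max(0.0, 1.0 - sum(p_pooled)))
    q_pooled.append(max(0.0, 1.0 - sum(q_pooled)))
    return p_pooled, q_pooled


# Hypothetical posteriors over four classes: both classifiers agree on the
# dominant class and differ mostly in how they spread the remaining mass.
p = [0.70, 0.10, 0.10, 0.10]
q = [0.60, 0.30, 0.05, 0.05]

full = kl_divergence(p, q)
pooled = kl_divergence(*pool_minority_classes(p, q))
print(f"classical KL: {full:.4f}, pooled KL: {pooled:.4f}")
```

In this example the two classifiers agree on the decision, yet the classical divergence is inflated by disagreement among minority classes; pooling removes most of that contribution.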
The delta divergence measure was proposed by Kittler and Zor in [17]. This measure focuses on the most probable hypotheses. Consequently, it reduces the effect of the probability mass distributed over the non-dominant hypotheses. The measure satisfies symmetry and independence of classifier confidence.
A new classifier incongruence measure was proposed by Kittler and Zor in [18]. The measure overcomes the shortcomings of previous measures and presents relatively low sensitivity to estimation noise, under the assumption of constrained Gaussian distribution. Moreover, the measure determines incongruence thresholds at given levels of statistical significance for different measured values corrupted with different levels of noise.

Research on Real-World Problems, which Inspired this Study
The studies discussed in this subsection inspired us to investigate the potential application of the taxonomy [13] to solve real-world problems. Spatially explicit long-term monitoring frameworks and priority mitigation measures to cope with acute and chronic risks were proposed by Fernandes et al. in [31]. The research focused on the dam failure in Mariana, Brazil, where a dam breach abruptly discharged between 55 and 62 million m³ of ore tailings into the Doce River. They claimed that environmental disasters like that of the Doce River would become more frequent in Brazil. The study is based on differential analyses of Landsat 8 scenes.
An automatic approach to enhance and detect river networks was proposed by Kang Yang in [2]. The research characterizes rivers in accordance with their Gaussian-like cross-sections and longitudinal continuity. River cross-sections were enhanced by a Gabor filter. The approach was applied to detect rivers in Landsat 8 images.
A study which focuses on the mapping and monitoring of the spatial extent of surfaces affected by mine waste was presented by Mielke et al. in [32]. They studied the influence of sensors on the ability to discriminate mine waste surfaces from their surroundings. The research explored the potential use of images from remote sensing to map and monitor the spatial extent of mine waste surface material in areas with mine tailings. They claimed that remote sensing analysis is very important to monitor the extent of mine waste surfaces and that mine waste sites have the potential to contain problematic contaminants, such as chrome, lead, uranium, etc. The research was based on the use of Landsat 8 data.
The impacts of oil extraction on the environment and health of indigenous communities in the Northern Peruvian Amazon (Marañon) were assessed by Rosell-Melé et al. in [33]. They investigated the occurrence of crude oil pollution in soils and river sediments caused by voluntary discharges or accidental spills. Their study is related to a very productive oil area in Peru, which is placed in the headwaters of the Amazon River. Their findings suggest that wildlife and indigenous populations in the Marañon region are exposed to the ingestion of oil, and local spillage of oil in the watercourses could have eventually reached the Amazon River.

Introduction to the Proposed Strategy
The flowchart in Figure 1 summarizes the methodology of the proposed strategy. More detailed descriptions of this methodology as well as of each parameter used here can be found in Section 5 and Appendix A. All chosen Landsat 8 images (see the fourth column in Table 1) were thoroughly evaluated by applying the proposed strategy. Each environment was evaluated separately. Many Landsat 8 scenes were previously checked visually to assess their conditions in order to select only those with low overlay contamination by cloud cover. This is critical to making geographical features as visible as possible. Afterwards, each Landsat 8 scene (see the fourth column in Table 1) was chosen according to its date of acquisition. In cases in which disaster had already occurred, the scene with the date of acquisition closest to the date of the disaster (see the fifth column in Table 1) was chosen in order to best evaluate the environment after the disaster. In the cases in which no environmental disaster had occurred, the most recent images with low overlay contamination by cloud cover were chosen (that is the case of the Athabasca, Elbe and Tietê Rivers).
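The scene selection rule described above can be sketched as follows. The sketch assumes each candidate scene is a record with an acquisition date and a cloud cover percentage; the field names, the scene identifiers, and the 10% cloud cover limit are hypothetical, since the study selected scenes by visual inspection rather than by a fixed threshold.

```python
from datetime import date


def select_scene(scenes, max_cloud_cover, disaster_date=None):
    """Pick one scene per environment.

    With a disaster date: the low-cloud scene acquired on or after the
    disaster and closest to it. Without one: the most recent low-cloud scene.
    Returns None when no scene qualifies.
    """
    candidates = [s for s in scenes if s["cloud_cover"] <= max_cloud_cover]
    if disaster_date is not None:
        candidates = [s for s in candidates if s["acquired"] >= disaster_date]
        if not candidates:
            return None
        return min(candidates, key=lambda s: s["acquired"] - disaster_date)
    if not candidates:
        return None
    return max(candidates, key=lambda s: s["acquired"])


# Hypothetical candidate scenes around a disaster on 5 November 2015.
scenes = [
    {"id": "scene-A", "acquired": date(2015, 10, 27), "cloud_cover": 2.0},
    {"id": "scene-B", "acquired": date(2015, 11, 12), "cloud_cover": 5.0},
    {"id": "scene-C", "acquired": date(2015, 11, 28), "cloud_cover": 1.0},
    {"id": "scene-D", "acquired": date(2015, 11, 6), "cloud_cover": 80.0},
]

after_disaster = select_scene(scenes, 10.0, disaster_date=date(2015, 11, 5))
most_recent = select_scene(scenes, 10.0)
print(after_disaster["id"], most_recent["id"])  # scene-B scene-C
```

Scene D is closest to the disaster date but is rejected for cloud cover, so scene B is chosen; without a disaster date the most recent clear scene is used.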

The Advantages of Our Strategy
The following are advantages of our strategy: (1) It takes into consideration the multifaceted nature of anomalies exposed by [13]. (2) It helps to offer an appropriate response to deal with anomalies of the type unexpected structure and structural components. (3) To the knowledge of the authors, this is the first of a series of studies dedicated to investigating the practical application of anomaly detection in accordance with the taxonomy published in [13] to solve real-world problems. (4) This research covers more categories than previous investigations, e.g., [25], which presented four different anomalies, because our strategy is based on the taxonomy [13], which presents ten different domain anomalies. (5) It processes just a single Landsat 8 image (scene), whereas other proposed anomaly detection methods use time-series of Landsat 8 images for each environment to be able to detect outliers. Time-series will only be used to detect anomalies of the types measurement model drift and component model drift in future papers.
Additionally, this research has established a relationship between the occurrence of anomalies of the type unexpected structure and structural components and the presence of brown mud in large rivers. Let us consider the example of a mining company which needs to monitor whether the brown mud of its reservoir is polluting a large river. In this example, the mining company does not need to deal with any pollutant other than brown mud; therefore, a system able to detect only brown mud is enough for the company. The fewer the images, the lower the demand for computational resources, which makes the system cheaper. Moreover, the fewer the images, the less data there is to be processed, which makes the system faster. Therefore, a system based on the proposed strategy, which deals with only one image per environment, will be advantageous. This system will save more money and time for the mining company than systems based on other proposed anomaly detection methods, because they generally deal with time-series.
For the case studies presented in this paper, the anomalies are related to brown mud or crude oil present as either potential contaminators or contaminators of large rivers. Our strategy identifies incongruences to detect oil spills in some areas where there is water, or brown mud, and where there are high levels of turbidity in the water. These findings are important because an incongruence-based anomaly detection strategy can greatly increase the ability of surveillance systems to detect environmental disasters and to perform mappings.

Materials
Landsat 8 images (scenes of environments) from the United States Geological Survey (USGS) data sets and benchmarks [83] were used to evaluate the proposed strategy. The USGS collections comprise Landsat images and many other important data sets, which are all freely available in [83] for non-commercial purposes. These data sets were chosen to perform the experiments in this study because they provide diversity regarding the images and geographical features for many different environments. Moreover, they have been frequently used to evaluate many recently published proposals, such as [2,31,32], which allows us to compare the proposed strategy with other studies. For this study, each scene was provided by the USGS through the Earth Explorer Platform.
All images used in this study are qualified with value "9," which represents the best quality for Landsat 8 images [26]. Moreover, all Landsat 8 data used in the present experiments were collected by the operational land imager (OLI), which is an instrument onboard the Landsat 8 satellite. The size of each image is approximately 170 km north-south by 183 km east-west, i.e., 106 mi by 114 mi. These images are available in the tagged image file format (TIFF). The three most common types of TIFF images are gray scale, indexed, and RGB.
For this study, each task of our strategy, including all classifications, was systematically performed using the QGIS 2.18.19 (Las Palmas) software with the Orfeo ToolBox. The software and the toolbox are freely available in [84] for non-commercial purposes. QGIS is a well-known and extensively used software package composed of many tool sets, which are useful for analyzing images, mainly from remote sensing, for geosciences. Figure 2 shows the scenes and the hydrographical maps of the six different areas studied in this research. The information about the studied environments, including the identification of the Landsat 8 images (scenes) used in the experiments, is listed in Table 1. The geographical settings are listed in Table 2. All data are related to areas of large river basins, except the area of the Arava Valley. The image of the Arava Valley in southern Israel, acquired by the Landsat 8 satellite on 14 December 2014, was chosen because its soil received an estimated 3-5 million liters of crude oil on 4 December 2014 (29°40ʹ29.28ʺN 35°0ʹ25.56ʺE) [85]. This image was used as a contrast to the experiments with the Marañon River, since there is no water in that part of the Arava Valley. This contrast is important because only two classes were defined for all experiments: "water," which categorizes any open water body, and "no-water," which categorizes anything other than open water bodies, for example land, trees, and cities (this approach is similar to the one found in [86]). The idea is to evaluate whether the proposed strategy would detect oil as if it were water. If oil were detected as water, then all experiments with images of the Marañon River basin would be invalidated. The image of the Marañon River in Peru, acquired by the Landsat 8 satellite on 27 August 2016, was chosen because its water received a large quantity of crude oil on 22 August 2016. This oil spill was the consequence of a Peruvian environmental disaster caused by a pipeline disruption (4°48ʹ45.6ʺS 75°23ʹ56.8ʺW) [87]. Crude oil is present along almost the entire extent of this pipeline, since it was the source of other similar environmental disasters in the preceding three years.

Study Areas
The image of the Doce River in Brazil, acquired by the Landsat 8 satellite on 12 November 2015, was chosen because its water received around 55-62 million m³ of iron ore tailings (brown mud) on 5 November 2015, as a result of Brazil's worst environmental disaster caused by a dam breach (20°12ʹ23.4ʺS 43°28ʹ01.6ʺW) [31]. The image of the Elbe River in Germany, acquired by the Landsat 8 satellite on 9 May 2018, was chosen because of the location of a dam close to its water (53°38ʹ20.4ʺN 9°25ʹ02.0ʺE). The river could receive many millions of m³ of bauxite tailings (red mud) in the case of an environmental disaster similar to that which occurred in Brazil. The image of the Athabasca River in Canada, acquired by the Landsat 8 satellite on 26 August 2017, was chosen because there is a set of dams located close to its water (57°03ʹ06.3ʺN 111°35ʹ18.0ʺW), which could cause the river to receive many millions of m³ of brown or red mud in the case of an environmental disaster similar to that which occurred in Brazil. The image of the Tietê River in Brazil, acquired by the Landsat 8 satellite on 27 August 2017, was chosen because its water receives significant quantities of sewage each year (23°23ʹ24.3ʺS 46°58ʹ36.5ʺW). For each Landsat 8 image used in this study, at least 100 samples were located and validated based on the topographic maps of the environments, adopted as ground truth for the purpose of this work. There are samples spread all over the scene. The numbers of samples used as training data to Each image presents different levels of difficulty for locating and analyzing geographical features in the environments. For example, in any of these environments it is possible to find parts of the image with an easy-to-detect, large, and singular geographical feature, which differ from other parts of the image with hard-to-detect, small, numerous, and geographically spread features.
By using diversified environments, it was possible to verify the accuracy of the proposed strategy to identify incongruences in order to detect anomalies, when it is applied to different scenes as well as to different types of geographical features present in images.

Data Preprocessing
Each image was processed performing tasks to identify incongruences in accordance with Figure 1. Second, a band composition was built in order to allow the user of the system to analyze the scene visually. A band composition is an image composed of different bands of a satellite image. The band composition uses pixels of different color values in order to represent various properties of objects from the real world. A band composition is represented by Equation (1), for which, considering each Landsat 8 band as a two-dimensional array of pixels, x and y are the coordinates of the pixels. The composition vr is a three-dimensional array composed of three overlapped elements (bands), namely b4(x, y) = Band 4, b5(x, y) = Band 5, and b6(x, y) = Band 6. Figure 3 shows the band composition R(4)G(5)B(6) as an example. Next, we applied histogram stretching to the composite R(4)G(5)B(6). The procedure was performed by a Qgis tool from the menu properties (style). Because Qgis needs to create a color table in order to render the band composite, the tool calculates the table based on the mean and standard deviation of the three selected bands.
The mean and the standard deviation were calculated respectively according to Equations (2) [88] and (3) [88], for which, x and y are coordinates of the pixels, gi(x, y) is an image, K is the total number of images g(x, y), i is the identifier of each image g(x, y), and f(x, y) is the image formed averaging K images g(x, y). In Equation (3), h(x, y) is the image formed calculating the standard deviation. Figure 4 shows an example of an enhanced image [89].
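The equations themselves did not survive in this text. As a hedged reconstruction from the symbol definitions given above (not a verbatim copy of the originals), Equations (1) to (3) plausibly take the following form. The use of 1/K rather than 1/(K − 1) in the standard deviation is an assumption.

```latex
% Band composition (Equation (1)): three bands stacked per pixel
v_r(x, y) = \bigl[\, b_4(x, y),\; b_5(x, y),\; b_6(x, y) \,\bigr]

% Mean image over K images (Equation (2))
f(x, y) = \frac{1}{K} \sum_{i=1}^{K} g_i(x, y)

% Standard deviation image (Equation (3)); 1/K normalization assumed
h(x, y) = \sqrt{\frac{1}{K} \sum_{i=1}^{K} \bigl( g_i(x, y) - f(x, y) \bigr)^{2}}
```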
The pan-sharpened image is obtained by Equation (4),

$$\widehat{MS}_k = \widetilde{MS}_k + g_k \left( P - I_L \right), \quad k = 1, \ldots, N, \tag{4}$$

in which MS is a multispectral image, $\widehat{MS}$ is a pan-sharpened image, $\widetilde{MS}$ is a multispectral image interpolated at the scale of the panchromatic image, the subscript k indicates the kth spectral band, g = [g1,…, gk,…, gN] is the vector of the injection gains, P is the histogram-matched panchromatic image, and $I_L$ is defined by Equation (5) [91],

$$I_L = \sum_{i=1}^{N} w_i \, \widetilde{MS}_i, \tag{5}$$

in which the weight vector w = [w1,…, wi,…, wN] is the first row of the forward transformation matrix and may be chosen, whenever possible, to measure the degrees of spectral overlap among the multispectral and panchromatic channels. For the pan-sharpening method, interpolation must guarantee the overlap of multispectral and panchromatic images at a finer scale. Depending on the acquisition geometry of the imaging instruments, interpolation with conventional zero-phase linear finite-impulse response filters may require a manual realignment, e.g., after a bi-cubic interpolation. Alternatively, linear nonzero-phase filters, having even numbers of coefficients, may be profitably used, according to [91,92].
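The injection scheme described here can be illustrated with a minimal pure-Python sketch. It assumes the multispectral bands are already interpolated to the panchromatic grid and that the weights and gains are given; the function name `cs_pansharpen` and the toy values are hypothetical, and this is not the Orfeo or Qgis implementation used in the study.

```python
def cs_pansharpen(ms_interp, pan, weights, gains):
    """Component-substitution style fusion (illustrative sketch).

    ms_interp: list of N bands, each a 2-D list, already on the pan grid
    pan:       2-D list, histogram-matched panchromatic image P
    weights:   N weights w_i used to build the intensity component I_L
    gains:     N injection gains g_k
    """
    rows, cols = len(pan), len(pan[0])
    # Intensity component: I_L = sum_i w_i * MS~_i
    intensity = [
        [sum(w * band[r][c] for w, band in zip(weights, ms_interp))
         for c in range(cols)]
        for r in range(rows)
    ]
    # Each band receives the spatial detail (P - I_L) scaled by its gain g_k
    return [
        [[band[r][c] + g * (pan[r][c] - intensity[r][c]) for c in range(cols)]
         for r in range(rows)]
        for g, band in zip(gains, ms_interp)
    ]


# Toy example with two bands of 1 x 2 pixels (hypothetical values)
ms = [[[1.0, 2.0]], [[3.0, 4.0]]]
pan = [[3.0, 3.0]]
fused = cs_pansharpen(ms, pan, weights=[0.5, 0.5], gains=[1.0, 1.0])
```

With unit gains, the detail image P − I_L is added unchanged to every band; other gain choices rescale the injected detail per band.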
Although the 30-meter original spectral bands are enough for dealing with large rivers, the pansharpening was necessary for dealing with lanes that support oil pipelines. Those lanes, although long, are much narrower than large rivers. Working with 15-meter images made possible the detection of the presence of oil on those narrow lanes. It is important because, for example, there is a very long lane, built in the vicinity of the Marañon River in order to support oil pipelines, that has already received high quantities of crude oil as a consequence of successive oil spills. Whereas those lanes should normally present either dry or humid soil around the pipelines, a significant oil spill turns the soil impermeable causing accumulation of water around the pipelines. The detection of water retained in the lane, therefore, is indicative of the occurrence of an oil spill on the soil of that area.
Next, the pan-sharpening resulting image had its histogram stretched. This step was performed exactly the way it was previously mentioned, including the calculation of Equations (2) [88] and (3) [88].
Then, the second-order image statistics were calculated using the Orfeo toolbox in order to compute the global mean and standard deviation for each band from a set of images. Their applications are related to geometric modeling, i.e., to create a model taking into consideration the spatial distribution of the pixels in the image. The model was created in order to statistically represent the enhanced image, and the statistics are not affected by the enhancing parameters. The second-order image statistics are those for which the slope of the power spectrum tends to be close to negative two. The power spectrum of an M by M image is represented by Equation (6) [93], for which F is the Fourier transform of the image, φ are directions, and u and v are two-dimensional frequencies represented in polar coordinates in accordance with Equations (7) and (8), $u = f \cos\phi$ and $v = f \sin\phi$, where f is the radial frequency. The power spectrum then follows a power law of the form $S(f) = A f^{\alpha}$ with $\alpha = -2 + \eta$. Here, α ≈ −2 is the spectral slope, η is its deviation from −2, and the constant A describes the overall image contrast. Contrast can be obtained by image statistics, for example, calculating the standard deviation of all pixel intensities divided by the mean intensity (σ/μ) in accordance with Equation (10).
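As a small illustration of the contrast measure σ/μ mentioned above, the following sketch computes it for a single band stored as a 2-D list. The function name `global_contrast` is hypothetical, and the use of the population standard deviation is an assumption; the Orfeo toolbox may normalize differently.

```python
from statistics import mean, pstdev


def global_contrast(image):
    """Contrast as standard deviation over mean intensity (sigma / mu)."""
    pixels = [p for row in image for p in row]
    mu = mean(pixels)
    if mu == 0:
        raise ValueError("contrast is undefined for a zero-mean image")
    # Population standard deviation assumed (divides by the pixel count)
    return pstdev(pixels) / mu


# Toy band: mean 2, standard deviation 1
contrast = global_contrast([[1, 3], [1, 3]])
```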
Afterward, in the image, at least 100 samples were selected and validated based on maps of the environments (ground truth). The details regarding the chosen samples were described in the Subsection 5.2. The selection of samples is important because they are used as training data to model each environment for contextual or non-contextual classifiers. Whenever users select samples for classifiers, their classifications are named supervised or semi-automatic. A supervised classification allows the user to collect samples from the image and to use a classifier trained with these samples to classify the image [90].
The preprocessing is essential to prepare the data to be used for the learning and classification approaches adequately. Our main contribution in the preprocessing is regarding the care for selecting the samples. The samples were not located on the large rivers, ore tailings reservoirs, oil pipelines, or lanes for oil pipelines, in order to avoid the experiments yielding biased results. In other words, no areas located on the large rivers were selected to be samples because these samples could bias the classifiers to classify the large rivers as "water." No areas located on ore tailings reservoirs, oil pipelines, or lanes for oil pipelines were selected to be samples because these samples could bias the classifiers to classify those structures as "no-water."

Learning and Classification Approaches
The classifier training step was performed based on the enhanced 15-meter image, the statistical model, and the selected samples. The training is important in order to create the models which will be used by the contextual or non-contextual classifiers. The classifiers use these models as references based on the selected samples to classify features of interest present in the enhanced 15-meter image [94]. Next, classifications were applied using various classifiers, such as Naïve Bayes [35], support vector machine (SVM) [8,65], decision tree (DT) [29,30], k-nearest-neighbor (kNN) [27,28], and boost [9]. Among the tested classifiers, the best results were achieved by kNN and decision tree (non-contextual classifiers), and boost (contextual classifier). Although the other tested classifiers were used the same way as kNN, DT, and boost, they were excessively time-consuming. Moreover, their results did not surpass those achieved by kNN, DT, and boost for the experiments of this study. Therefore, we decided to consider only kNN, DT, and boost as the classifiers of the experiments in this study.
The kNN classifier is useful to perform discriminant analysis in situations for which it is hard to determine parametric estimates of probabilities [27]. Let us consider that X = [x1,...,xN] is the training data with N points of dimensionality D, Xi = [xi1,...,xik] is the k nearest neighbors of xi, Xt is the testing data with Nt points, x0 is an arbitrary testing data point, X0 = [x01,...,x0k] contains its k nearest neighbors from training data, for which the labels are [l1,...lk], and Ω = [Ω1,…, ΩC] is the set composed of C classes present in the data. Additionally, the kNN considers that the k neighbors of a testing point have equal weights [28].
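The equal-weight vote over the k nearest neighbors can be sketched as follows. This is a minimal illustration assuming Euclidean distance and a hypothetical toy training set; it is not the implementation used in the experiments.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, x0, k=3):
    """Assign x0 to the most frequent class among its k nearest neighbors."""
    # Sort training points by Euclidean distance to the test point x0
    by_distance = sorted(
        (math.dist(point, x0), label)
        for point, label in zip(train_points, train_labels)
    )
    # Every neighbor contributes one vote (equal weights)
    votes = Counter(label for _, label in by_distance[:k])
    # On a tie, the tied class met first in distance order is returned
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples for the two classes used in this study
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["water", "water", "no-water", "no-water"]
predicted = knn_classify(points, labels, (0.2, 0.3), k=3)
```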
The classification performed based on the kNN finds the nearest neighbors regarding a testing point in the training data. Moreover, the classifier assigns the test point to the most frequently occurring class of its k neighbors. The kNN classifies x0 based on the majority voting rule presented by Equation (11) [28], for which δ is the Kronecker delta represented by Equation (12).

$$l^{*} = \arg\max_{c \in \{1, \ldots, C\}} \sum_{i=1}^{k} \delta(l_i, \Omega_c) \tag{11}$$

$$\delta(a, b) = \begin{cases} 1, & a = b \\ 0, & a \neq b \end{cases} \tag{12}$$

The decision tree classifier is useful to perform decision analysis by finding the most probable decision for achieving an objective [29]. Many algorithms are able to build decision trees. Among them, the Iterative Dichotomiser 3 (ID3) is a well-known algorithm used to generate decision trees from data sets. The ID3 is based on information entropy and information gain, which are represented respectively by Equations (13) and (14) [30].

The Boost classifier iteratively combines weak classifiers by taking into consideration a weight distribution on the training samples such that more weight is attributed to samples misclassified by the previous iterations [9]. The final strong classifier is a weighted combination of weak classifiers followed by a threshold. The steps performed by the Boost are described in detail as follows.
The initialization of weights is performed according to Equation (15) [9].
For t = 1,2,...,T (T is the maximum training number): (a) For each feature j, train a classifier hj which is a simple linear classifier, i.e., a classifier restricted to use a single feature. Equation (16) expresses this type of classifier [9].
In Equation (16), xi,j is the value of the jth feature of the sample xi, pj ∈ {−1, 1} determines the direction of the inequality sign, and θi,j denotes the threshold value of the jth feature of the sample xi.
Equation (17) shows that the error εt is evaluated with respect to Dt(xi,yi) [9].
The weight distribution is then updated, for which Zt is a normalization constant computed to ensure that Dt(xi,yi) represents a true distribution, established by Equation (20) [9].
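The boosting steps outlined above can be sketched with single-feature threshold classifiers (decision stumps). This is a minimal sketch of standard discrete AdaBoost, assuming labels in {−1, +1} and the usual weight αt = ½ ln((1 − εt)/εt); these conventions, the function names, and the toy data are assumptions, and the exact equations of [9] may differ.

```python
import math


def stump_predict(x, feature, threshold, polarity):
    """Weak classifier on one feature: +1 if p * x_j < p * theta, else -1."""
    return 1 if polarity * x[feature] < polarity * threshold else -1


def train_adaboost(X, y, rounds=5):
    n = len(X)
    weights = [1.0 / n] * n  # uniform initial weight distribution
    ensemble = []            # tuples (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            values = sorted({x[j] for x in X})
            # Candidate thresholds: below, between, and above observed values
            thresholds = ([values[0] - 1.0]
                          + [(a + b) / 2 for a, b in zip(values, values[1:])]
                          + [values[-1] + 1.0])
            for theta in thresholds:
                for p in (1, -1):
                    # Weighted error under the current distribution
                    err = sum(w for w, x, label in zip(weights, X, y)
                              if stump_predict(x, j, theta, p) != label)
                    if best is None or err < best[0]:
                        best = (err, j, theta, p)
        err, j, theta, p = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, theta, p))
        # Raise weights of misclassified samples, lower the others
        weights = [w * math.exp(-alpha * label * stump_predict(x, j, theta, p))
                   for w, x, label in zip(weights, X, y)]
        z = sum(weights)  # normalization so the weights stay a distribution
        weights = [w / z for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Strong classifier: sign of the weighted vote of the weak classifiers."""
    score = sum(a * stump_predict(x, j, t, p) for a, j, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical one-feature samples: +1 = "water", -1 = "no-water"
X = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
y = [1, 1, 1, -1, -1, -1]
model = train_adaboost(X, y, rounds=3)
predictions = [adaboost_predict(model, x) for x in X]
```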
The image classifications are important in order to group features of interest into classes according to the similarities of their characteristics. In the images resulting from the classifications, each class is represented by a different color. In the case of this research, because only two different classes were used for grouping features of interest, "water" and "no-water," the images resulting from the classification processes are binary [94].
At the end, a subtraction of images resultant from non-contextual and contextual classifications was performed, computing the difference of all pairs of corresponding pixels from both images [88]. The subtraction was calculated according to Equation (22) [88], for which, x and y are coordinates of the pixels, and f(x, y) and h(x, y) are respectively the images resultant from non-contextual and contextual classifications. Image g(x, y) results from the subtraction. This subtraction is important because it allows the automatic identification of congruence, in which case the subtraction results in zero, or automatic identification of incongruence, in which it results in at least one pixel equal to one. Incongruences can reveal the presence of anomalies as explained in [16]. When anomalies were identified, they were analyzed and categorized by type in accordance with [13].
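The pixel-wise comparison of the two binary classification images can be sketched as follows. The sketch takes the absolute difference so that the result does not depend on the order of subtraction; that choice, the function names, and the toy masks (1 = "water", 0 = "no-water") are assumptions made for illustration.

```python
def incongruence_map(non_contextual, contextual):
    """Pixel-wise absolute difference of two binary classification images."""
    return [
        [abs(a - b) for a, b in zip(row_f, row_h)]
        for row_f, row_h in zip(non_contextual, contextual)
    ]


def is_incongruent(difference):
    """Incongruence: at least one pixel on which the classifiers disagree."""
    return any(pixel == 1 for row in difference for pixel in row)


# Hypothetical 2 x 2 masks: the contextual classifier finds one extra water pixel
f = [[0, 1], [0, 0]]  # non-contextual classification
h = [[1, 1], [0, 0]]  # contextual classification
g = incongruence_map(f, h)
```

An all-zero difference image corresponds to congruence; any pixel equal to one flags the image as incongruent and therefore as a candidate anomaly.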
The learning and classification approaches are essential to make feasible the application of the taxonomy [13] to solve real-world problems. Our main contributions in the learning and classification approaches are as follows: (1) The use of a single Landsat 8 image previously assessed as high-quality sensory data. (2) The creation of statistical and classification models adequate enough to make PR tools learn how to classify the features present in the environments. (3) The subtraction of the results from contextual and non-contextual classifiers to indicate the occurrence of incongruence. (4) Finding the combination of the three previous conditions to establish a relation between the occurrence of anomalies of the type unexpected structure and structural components and the presence of brown mud in large rivers.
Regarding methodological limitations, it is recognized that the higher the resolution of the images, the longer the computational time needed to process them. Therefore, long processing times are inevitable when large images are evaluated by an anomaly detection strategy. This time depends on the computer being used and can currently vary from tens of minutes to many hours. However, this variation in the computational time spent by the proposed strategy to process the images is acceptable, since different computers have different computational resources.

Results
Three evaluation measure mechanisms (contextual and non-contextual classifiers; sensory data quality assessment; incongruence indicator) were used to reliably qualify the anomaly, as described in [13], because they are three successful mechanisms to qualify anomalies. According to Kittler et al. [13], three conditions need to be satisfied to guarantee the occurrence of the anomaly of the type unexpected structure and structural components. The three conditions are high sensory data quality, contextual and non-contextual classification of the same image, and incongruence. If any of the three conditions are not present, it is not possible to identify an anomaly of this type. In this study, the high sensory data quality was confirmed because all images are qualified with value "9," which represents the best quality for Landsat 8 images. The contextual and non-contextual classifications were confirmed by the results achieved applying respectively boost and kNN or DT classifiers. The presence of incongruence results was confirmed performing the last step of the strategy.
All assessments were carefully performed taking into consideration the application of the proposed strategy on Landsat 8 images. Since all those Landsat 8 images are large (170 km north-south by 183 km east-west, i.e., 106 mi by 114 mi, approximately), the image of each environment was precisely cropped in order to create a set of smaller image tiles. This was fitting, as the majority of assessment approaches found in the scientific literature are applied to smaller images. Table 3 shows how the cropping was done for each Landsat 8 scene. In order to quantitatively validate the proposed strategy, we used the metrics accuracy, precision, recall [15], and F-measure [82], for which TP are true positives, i.e., images in which incongruences are truly detected; TN are true negatives, i.e., images in which congruences are truly detected; FN are false negatives, i.e., images in which congruences are falsely detected; and FP are false positives, i.e., images in which incongruences are falsely detected. Accuracy measures the efficiency of results and is represented by Equation (23), for which M is the number of images. Precision measures the relevancy of results and is represented by Equation (24). Recall measures the quantity of truly relevant results and is represented by Equation (25). F-measure measures the balance between precision and recall and is represented by Equation (26).

$$\text{Accuracy} = \frac{TP + TN}{M} \tag{23}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{24}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{25}$$

$$\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{26}$$
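The four metrics can be computed from the counts of a confusion matrix as in the following sketch. The function name and the counts in the example are hypothetical and are not results of this study.

```python
def detection_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F-measure from image counts."""
    m = tp + tn + fp + fn  # total number of evaluated images
    if m == 0:
        raise ValueError("at least one image is required")
    accuracy = (tp + tn) / m
    # Guard against empty denominators when a class is never predicted or present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Hypothetical counts for a set of 20 image tiles
metrics = detection_metrics(tp=8, tn=10, fp=2, fn=0)
```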

[Tables 4 to 9: detection outcome versus incongruent or congruent event for each image set; cell values not recovered.]

Interpretation of the Results
Tables 4 to 9 present encouraging results. As can be seen, there is a significant quantity of true positives associated with the image sets that present brown mud, whereas this quantity is slightly lower for images presenting crude oil and considerably lower for images presenting red mud and sewage. False positives are present in almost all image sets, but their number is of little significance for this study.
The results regarding the accuracy, precision, recall, and F-measure of our strategy are shown in Table 10.
Table 10. Accuracy, precision, recall, and F-measure of the proposed strategy for different environments (Landsat 8 scenes).
Table 11 shows the comparison of the accuracy, precision, recall, and F-measure of this study, presented in the first row, with results achieved by other studies. It seems that the results are quantitatively consistent with those found in the scientific literature. On average, our strategy achieved the highest values of accuracy and recall among the presented studies. Although our strategy did not achieve the highest values of precision and F-measure compared to the other studies, this is of little significance because the achieved precision and F-measure are high.
Table 11. Comparison of the accuracy, precision, recall, and F-measure of this study with others.
In the first case, apart from other geographical features, Figure 6a shows the non-polluted water of a narrow and long lake, in black, in the bottom right corner and the water of part of the Doce River polluted with ore tailing waste (brown mud), also in black, starting on the left, making a sharp turn, and ending on the top. Figure 6c shows the capability of the contextual classifier (Boost) for recognizing both polluted and non-polluted water, whereas Figure 6d shows the capability of the non-contextual classifier (decision tree) for recognizing only non-polluted water.
In the second case, apart from other geographical features, Figure 7a shows part of a large reservoir polluted with ore tailing waste (brown mud), in black, in the top right corner and the non-polluted water of part of the Athabasca River, also in black, starting on the bottom left, making a sharp turn, and ending on the top. Figure 7c shows the capability of the contextual classifier (Boost) for recognizing both polluted and non-polluted water, whereas Figure 7d shows the capability of the non-contextual classifier (decision tree) for recognizing only non-polluted water.
In the third case, apart from other geographical features, Figure 8a shows part of a very long lane built in the vicinity of the Marañon River in order to support oil pipelines, in black and orange, starting on the top left and ending on the top right. Figure 8c shows the capability of the contextual classifier (Boost) for recognizing the polluted water concentrated on the impermeable soil in the lane, whereas Figure 8d shows that the non-contextual classifier (kNN) is less capable of recognizing the polluted water in the lane.
In the last step of our method, the images in Figures 6d, 7d, and 8d were subtracted respectively from the images in Figures 6c, 7c, and 8c. Figures 6b, 7b, and 8b show the respective results of these subtractions highlighted in white and overlapped on the real images. The small white points spread over Figures 6b and 7b can be easily removed by applying a morphological opening filter to the image. However, the same filter would remove the result present in Figure 8b. Therefore, we decided not to apply the filter.
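The trade-off described here can be illustrated with a minimal sketch of a binary morphological opening (erosion followed by dilation). A 3 x 3 structuring element and zero padding outside the image are assumptions, and the masks are hypothetical. The opening removes an isolated point but also erases a one-pixel-wide line, which is analogous to the narrow lane result.

```python
def _window(image, r, c):
    """Values in the 3 x 3 neighborhood of (r, c); outside pixels count as 0."""
    rows, cols = len(image), len(image[0])
    return [image[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
            for rr in range(r - 1, r + 2) for cc in range(c - 1, c + 2)]


def erode(image):
    # A pixel survives only if its whole neighborhood is foreground
    return [[1 if all(_window(image, r, c)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]


def dilate(image):
    # A pixel is set if any pixel in its neighborhood is foreground
    return [[1 if any(_window(image, r, c)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]


def opening(image):
    return dilate(erode(image))


# A 3 x 3 region plus one isolated point (hypothetical subtraction result)
speckled = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
# A one-pixel-wide horizontal line, standing in for a narrow lane
thin_line = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
cleaned = opening(speckled)
erased = opening(thin_line)
```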

Discussion
When the classes are very imbalanced, precision and recall measure the success of prediction. If the classifiers return correct results and the majority of all positive results, then precision and recall present high values. We achieved high values of accuracy, precision, and recall with the application of the incongruence-based anomaly detection strategy in this study. Therefore, a significant quantity of truly relevant results showed that the anomalies of the type unexpected structure and structural components were detected efficiently by our strategy. The values of precision would be even higher if we had limited the studied areas to the vicinity of the locations with pollutants instead of studying the whole Landsat 8 scene, since many false positives were located far from these locations. Moreover, the achieved results demonstrate that the application of the proposed strategy to detect water pollution is consistent with other studies, although this study has introduced an approach to reach results that is unlike the approaches commonly used.
The results presented in the first row of Table 10 are aligned with the results reached by [31], which reveal a high concentration of brown mud in the Doce River. Moreover, the results presented in the second row of Table 10 compare well with the results reached by [32], which highlight the ability to discriminate mineral mine waste surfaces. Therefore, the present achieved results suggest that the incongruence occurs because, contrary to the non-contextual classifier (decision tree), the contextual one (Boost) is able to detect the water, even for cases in which there is high turbidity in the water of the studied river or reservoirs because of the high concentration of brown mud. The results presented in the third row of Table 10 are parallel to the results reached by [87], which also correspond to detection of oil spills in optical satellite images. In the case of our study, instead of detecting oil spills on the surface of the water such as in [87], the detection was performed in lanes built in the vicinity of the Marañon River in order to support oil pipelines. A significant oil spill turns the soil impermeable, causing accumulation of water around the pipelines, whereas those lanes should normally present either dry or just humid soil around the pipelines. Therefore, the present achieved results suggest that the incongruence occurs because, contrary to the non-contextual classifier (kNN), the contextual one (Boost) presents high sensitivity to the water around the pipelines (see Figure 9), revealing the presence of an oil spill on the soil of that area. Additionally, the results presented in Table 8, which are related to the Arava Valley, have demonstrated that neither the non-contextual nor the contextual classifier classifies crude oil as water in a dry environment, which corroborates the causes of the incongruence previously mentioned.
Figure 9. Image from [87], showing the crude oil around the pipeline. Source: http://geoportal.regionloreto.gob.pe/mapa-de-derrame-de-petroleo-enel-tramo-i-del-oleoducto-norperuano/ (accessed on 9 July 2018).
Outcomes from this study align with those presented by Kittler et al. in [13] and Weinshall et al. in [15], because they perform research involving the use of incongruences and recognize their potential to be applied to anomaly detection. Additionally, the current findings expand these prior studies to real-world images from remote sensing, instead of only synthetic data, and apply to real-world applications, such as the detection of water pollution. For example, after detecting anomalies, decision-making systems can guide computers to give adequate and fast responses to prevent water pollution as an environmental disaster or to analyze the extent to which it occurs. Furthermore, the proposed strategy broadens a list of studies, such as [55][56][57][58][59][60]101], engaged in analyzing images from remote sensing to offer possible solutions to detect and monitor water pollution.
The results achieved by this study demonstrate higher accuracy and recall than previous studies' results, such as [25,66,[76][77][78][80][81][82][98][99][100]. Therefore, the results suggest that our strategy provides more efficient results and a higher quantity of truly relevant results than the referred studies. The smallest difference regarding accuracy in comparison to our study was reported by Che et al. in [98]. The high level of accuracy was achieved by applying a method based on machine learning. However, Che et al. did not report values in regard to recall.
Another small difference regarding accuracy in comparison to our study was reported by Bormann et al. in [81]. In that study, a method based on detection index allowed them to achieve a high level of accuracy. In contrast, [81] reported the largest difference regarding recall in comparison to our study. Therefore, our method provides much more truly relevant results than the one proposed by them.
Three other studies [76,80,99] reported accuracies, whose values are lower than the one achieved by our study, although they are still close. In [76], a method based on median absolute deviation provided high level of accuracy. However, our method provides more truly relevant results than the one proposed in [76]. Similarly to our study, SVM was a method applied for classification by Bhaduri et al. in [80]. In contrast, we achieved the best results by applying other classifiers in our study compared to the use of SVM. A significant result was provided by a deep-learning method based on convolutional neural network (CNN) in [99]. Although [80,99] achieved high levels of accuracy applying their methods, they did not report results in regard to recall.
Other studies, such as [66,77,78,100], have more significant differences regarding both accuracy and recall in comparison to our study. In these studies, the levels of accuracy and recall were achieved after applying a method based on: (1) a seasonal autoregressive integrated moving average (SARIMA) model for autocorrelation analysis in [77]; (2) Gaussian process (GP) based non-parametric time-series prediction in [78]; (3) maximum likelihood classification in [100]; and (4) a joint fully convolutional auto-encoders (FC AE) model in [66]. The results suggest that our method provides significantly higher levels of both accuracy and recall in comparison to the methods based on SARIMA, GP, and FC AE. Unfortunately, [100] did not report values in regard to recall.
Focusing the discussion specifically on the study presented by Sublime and Kalinicheva in [66], they proposed a deep-learning method for change detection (FC AE). Their method is useful for post-disaster damage mapping. Although they also deal with real-world problems, their study differs from ours because their method detects outliers rather than applying the taxonomy [13] (i.e., detecting anomalies) to solve real-world problems. According to Sublime and Kalinicheva, their outlier detection method achieved the best results when it was compared against other machine-learning methods. The levels of accuracy and recall achieved by our strategy are significantly higher than the ones achieved by the deep-learning method proposed in [66]. Additionally, the level of precision achieved by the deep-learning method proposed in [66] is lower than the one achieved by our strategy. Therefore, the results suggest that our method provides results that are more efficient and relevant, and a higher quantity of truly relevant results compared to machine-learning methods applied for post-disaster damage mapping, for example.
As previously mentioned, to the knowledge of the authors this is the first endeavor of a series of studies dedicated to investigating the detection of anomalies in accordance with the taxonomy published in [13] and their potential applications to remote sensing. This study proposed a strategy taking into consideration the multifaceted nature of anomalies [13], whereas other detection methods have been dealing with different types of anomalies as if all of them were outliers. Since the present strategy is based on this taxonomy [13], which presents ten different domain anomalies, this study builds on more categories than previous investigations, e.g., [25], which presented four different anomalies. The smallest differences regarding the recall in comparison to our study were reported by Qi Liu et al. in [25] and Chen et al. in [82]. However, Chen et al. did not report their achieved accuracy. Overall, our study provides relevant improvements over other studies and also important support and guidelines for future studies.
Regarding limitations, it should be noted that the accuracy of the proposed strategy varies depending on the environment of application. This is inevitable when many different geographical features are evaluated by an anomaly detection strategy. However, the differences in levels of accuracy of the proposed strategy are of little significance. Further work is planned to examine the effects of applying filters on the images before applying them to the proposed strategy to determine if the different levels of accuracy can be minimized among different environments. The results indicate that the proposed strategy accurately detects anomalies related to brown mud and crude oil in the majority of evaluated images from remote sensing.
Regarding negative results, it should also be noted that the proposed strategy did not detect incongruence when the water pollution is caused by either red mud or sewage (Tables 6 and 9). This limitation is probably caused by the use of seven bands as a pattern for all experiments of this study. However, it is of little significance, since the ability of the proposed strategy to detect incongruence was demonstrated when the water pollution is caused by either brown mud or crude oil. All in all, the achieved results indicate that the proposed strategy detects the presence of anomalies of the type unexpected structure and structural components, in accordance with [13].

Conclusions
This paper presented an incongruence-based anomaly detection strategy for analyzing images from remote sensing, with the aim of describing its practical application in detecting water pollution. Therefore, the practical application of incongruence was introduced as a strategy for detecting anomalies in real-world applications, i.e., non-synthetic data present in images from remote sensing.
The proposed strategy detected anomalies of the type unexpected structure and structural components, distinguishing them from outliers. All processing was performed on just a single Landsat 8 image, whereas other methods use time-series for each environment to be able to detect outliers. This means that systems dedicated to observing or monitoring occurrences of anomalies of this type will save time and use few computational resources because they will process a single image. Therefore, systems based on the application of our strategy will help by saving more money than systems based on other proposed detection methods.
This study has established a relationship between the occurrence of anomalies of this type and the presence of brown mud or crude oil as either potential contaminators or contaminators of large rivers. Our strategy identified incongruences to detect oil spills in some areas where there is the presence of water, or brown mud where there are high levels of turbidity in the water. These findings are relevant since an incongruence-based anomaly detection strategy can significantly increase the ability of surveillance systems for detecting environmental disasters and for performing mappings.
Although the achieved results are encouraging, future studies should investigate: (1) The effects of using Gaussian process-based filters on the images before applying them to the proposed strategy in order to know if it would be useful to minimize differences between the levels of the accuracy among different environments. (2) The effects of applying feature extractors on the images as a preprocessing stage before applying the proposed strategy in order to determine if the classification process can be improved for all environments. (3) Whether our strategy will be able to detect other pollutants not using pan-sharpening.
Future work will investigate: (1) Practical applications of other types of anomalies individually, such as unknown object, measurement model drift, unknown structure, unexpected structural component, and component model drift.
(2) A single system to deal with the practical application of all types of anomalies together.
It is expected that this study will open up an entirely new range of applications for computational tools on images from remote sensing in order to model and analyze environments. Moreover, this anomaly detection strategy can be applied to a wide range of studies in geosciences and other scientific areas.

A.2 Rendering
We used the default parameters of QGIS 2.18.19 for rendering, which means: multiband color for Render type, Band 4 for Red Band, Band 5 for Green Band, and Band 6 for Blue Band. The other parameters used were: stretch to MinMax for Contrast enhancement, 2%-98% for Cumulative count cut, 2 for mean +/- standard deviation, normal for blending mode, and off for Greyscale. More detailed descriptions of this tool as well as of each parameter used here can be found in [89].

A.3 UTM Projection
In the UTM projection, the terrestrial surface is mapped into 60 zones. Each zone spans 6° in longitude and ranges from 84°N to 80°S. The origin of the UTM coordinates (E for the east-west direction and N for the north-south direction) is the point where the equator crosses the central meridian of the zone. In order to avoid working with negative values, a false easting of 500,000 m is added to the coordinate E, both east and west of the central meridian. As for N, a false northing of 10,000,000 m is adopted in the southern hemisphere [102].
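The zone layout and false origin described above can be sketched in a few lines. This is an illustrative sketch, not the projection itself: it assumes the regular 6° zone grid (ignoring the special zones around Norway and Svalbard), and the function names `utm_zone`, `central_meridian`, and `apply_false_origin` are hypothetical helpers introduced only for this example.

```python
import math

FALSE_EASTING_M = 500_000.0            # added to E in every zone
FALSE_NORTHING_SOUTH_M = 10_000_000.0  # added to N only south of the equator


def utm_zone(lon_deg):
    """Zone number (1..60) of the regular 6-degree grid for a longitude in [-180, 180]."""
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    return min(zone, 60)  # longitude 180 is folded into zone 60


def central_meridian(zone):
    """Longitude (degrees) of the central meridian of a regular UTM zone."""
    return -183.0 + 6.0 * zone


def apply_false_origin(x_m, y_m, southern):
    """Shift offsets measured from (central meridian, equator) to UTM E and N.

    x_m and y_m are assumed to be already projected, in metres.
    """
    easting = x_m + FALSE_EASTING_M
    northing = y_m + (FALSE_NORTHING_SOUTH_M if southern else 0.0)
    return easting, northing
```

For example, a point 1 km west of the central meridian and 2 km south of the equator keeps positive coordinates after the shift: `apply_false_origin(-1000.0, -2000.0, southern=True)` gives an easting of 499,000 m and a northing of 9,998,000 m.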

A.4 Pan-Sharpening
First, the component substitution (CS) method uses a sensor model to zoom and register [90] the multispectral image onto the panchromatic image, i.e., to project one image into the geometry of the other. Second, CS fuses the co-registered pixels [90] of the multispectral image with the pixels of the panchromatic one by applying a pixel-by-pixel fusion operator. This pan-sharpening method operates in the same way on the whole image.
The advantages of this pan-sharpening method are the high fidelity with which it renders spatial details in the final product and its fast, easy implementation. Its limitation is the inability to account for local dissimilarities between the panchromatic and multispectral images that originate from the spectral mismatch between the panchromatic and multispectral channels of the instruments, which may produce significant spectral distortions. Another formalization of the CS method can solve this limitation [91]. It takes into consideration that the fusion can be obtained through a proper injection scheme, without the explicit calculation of the forward and backward transformations, if the substitution of a single component and the hypothesis of a linear transformation are considered. This CS method fuses the multispectral image with the panchromatic one, so that the resulting image combines the high spectral resolution of the multispectral image with the high spatial resolution of the panchromatic one.
The steps used to perform the pan-sharpening method based on this CS method are described in detail as follows. First, the method projects the multispectral image into another space. It is assumed that this transformation separates the spatial structure from the spectral information into different components. Next, it performs histogram matching of the panchromatic image to the component containing the spatial structure. The histogram matching is performed before substitution because greater correlation between the panchromatic image and the replaced component corresponds to lower levels of distortion introduced by the pan-sharpening method. Then, it replaces the selected component with the matched panchromatic image to enhance the transformed multispectral image. Finally, the data are brought back to the original space through the inverse transformation.
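The steps above can be illustrated with a minimal NumPy sketch of a generic component-substitution fusion written in its injection form, i.e., without explicit forward and backward transformations. It is not the Orfeo implementation. Several choices are assumptions made only for this example: the substituted component is taken as the equal-weight average of the bands, histogram matching is simplified to matching the mean and standard deviation, all injection gains are set to one, and the names `match_mean_std` and `cs_pansharpen` are hypothetical.

```python
import numpy as np


def match_mean_std(pan, component):
    """Simplified histogram matching: rescale pan to the mean/std of component."""
    pan_std = pan.std()
    if pan_std == 0:
        # A flat panchromatic image carries no detail; return a flat matched image.
        return np.full_like(pan, component.mean(), dtype=float)
    return (pan - pan.mean()) * (component.std() / pan_std) + component.mean()


def cs_pansharpen(ms, pan):
    """Fuse a multispectral stack with a panchromatic band.

    ms  : array (bands, H, W), assumed already resampled onto the pan grid
    pan : array (H, W)
    """
    ms = ms.astype(float)
    pan = pan.astype(float)
    # Assumed spatial-structure component: equal-weight average of the bands.
    intensity = ms.mean(axis=0)
    # Match the panchromatic image to that component before substitution.
    pan_matched = match_mean_std(pan, intensity)
    # Substituting the component equals injecting the difference into every band
    # (unit injection gains are an assumption of this sketch).
    detail = pan_matched - intensity
    return ms + detail[None, :, :]
```

With this formulation, the band average of the fused image equals the matched panchromatic image, which is the sense in which the spatial component has been substituted.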
In QGIS, two tools from the Orfeo ToolBox, superimpose sensor and pansharpening (RCS), were used to perform the projection and the fusion, respectively. Default values were chosen as parameters for the projection, i.e., the superimpose sensor tool used the values zero for default elevation, four for spacing of the deformation field, and two for radius for bi-cubic interpolation, and Nearest Neighbor interpolation. The projection was necessary in order to prepare the image for the pan-sharpening. Pan-sharpening was then performed in accordance with the algorithm of component substitution (RCS) in order to increase the spatial resolution of the image based on Band 8. More detailed descriptions of these tools as well as of each parameter used here can be found in [92].

A.5 Learning and Classification Approaches
This step was performed by an Orfeo tool named TrainImagesClassifiers, which automatically performs the training, validation, and testing of each classifier. Because the sample set is large enough to represent the whole scene statistically, the tool uses hold-out validation, for which it defines separately the percentage of each data set: training data set, validation data set, and test data set.
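The hold-out idea can be sketched as a per-class split of the labelled samples. This is a simplified illustration and not the tool's code: it produces only a training and a validation subset, the name `holdout_split` is hypothetical, and its default values (a ratio of 0.5 and a cap of 1000 samples per class) mirror the defaults reported for the tool in this appendix. Interpreting the ratio as the fraction of samples assigned to validation is an assumption.

```python
import numpy as np


def holdout_split(labels, ratio=0.5, max_train=1000, max_valid=1000, seed=0):
    """Split sample indices per class into disjoint training and validation sets.

    ratio is read here as the validation fraction (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_parts, valid_parts = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class so the split is random.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_valid = int(round(len(idx) * ratio))
        # Apply the per-class caps after splitting.
        valid_parts.append(idx[:n_valid][:max_valid])
        train_parts.append(idx[n_valid:][:max_train])
    return np.concatenate(train_parts), np.concatenate(valid_parts)
```

Splitting inside each class keeps the class proportions similar in both subsets, which matters when one class (for instance, water) covers far more pixels than another.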
Some default parameters were used for training all classifiers: zero for default elevation, 1000 for maximum training sample size per class, 1000 for maximum validation sample size per class, one for bound sample number by minimum, 0.5 for training and validation sample ratio, class for the name of the discrimination field, and zero for set user defined seed. Other default parameters, which were used for training a specific classifier, are listed as follows. For the kNN classifier: 32 for number of neighbors, knn for classifier to use for training, and edge pixel inclusion set to off. For the DT classifier: 65535 for maximum depth of the tree, 10 for minimum number of samples in each node, 0.01 for termination criteria for regression tree, 10 for cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split, 10 for K-fold cross-validations, use 1seRule flag set to false, TruncatePrunedTree flag set to false, dt for classifier to use for training, and edge pixel inclusion set to off. For the Boost classifier: one for maximum depth of the tree, 100 for weak count, 0.95 for weight trim rate, real for boost type, boost for classifier to use for training, and edge pixel inclusion set to off. More detailed descriptions of the TrainImagesClassifiers tool as well as of each parameter used here can be found in [94].
The advantage of the kNN is its fast and easy implementation. The limitation of the kNN is that the commonly employed Euclidean distance metric, which treats the data as isotropic or homogeneous, is generally not suitable for many real-world data sets. Nevertheless, learning a new distance metric can solve this problem. Therefore, it is possible to improve this classification if the kNN learns a distance metric derived from the training data, such as the one represented by Equation (27) [28], for which T represents a linear transformation.
The ID3 can be used to build the block diagram of the decision tree model presented in Figure A2a [30], starting from the root node, then generating internal nodes, and finishing each branch with a leaf node. For the ID3, the maximum information gain is used as a heuristic to choose the optimal decision attribute ai from A that will be assigned to the next internal node. The root and internal nodes correspond to the different test attributes, i.e., the root or each internal node tests the assigned attribute ai, as shown in Figure A2b. A branch is generated from the node for every value of ai. The sample sets Dv included in every node are assigned to the respective child node based on this heuristic. At the end of each branch, a leaf node shows a decision result that designates a classification regarding the class ct, for which t = 1, 2, …, n. The class ct ∈ C = {c1, c2, …, cn} is determined by a specific combination of features attributed to ct, e.g., ct = {green, square, rough}, that will be assigned to objects during the training and test procedures. The root node includes the whole sample set D. The path that goes from the root node to each leaf node represents a test rule.
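The attribute-selection heuristic of the ID3 can be sketched as follows. This is a minimal illustration of information gain computed from Shannon entropy, not the implementation used by the tool. The function names and the representation of each sample as a dictionary from attribute name to value are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(samples, labels, attribute):
    """Entropy reduction obtained by splitting the sample set on one attribute.

    samples: list of dicts mapping attribute name -> value (assumed layout).
    """
    total = len(labels)
    # Group the labels by the value the attribute takes (one subset per branch).
    subsets = {}
    for sample, label in zip(samples, labels):
        subsets.setdefault(sample[attribute], []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def best_attribute(samples, labels, attributes):
    """Attribute with maximum information gain, assigned to the next node."""
    return max(attributes, key=lambda a: information_gain(samples, labels, a))
```

Applying `best_attribute` recursively to each subset Dv, and stopping when a subset contains a single class, yields the tree structure described above.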
The advantages of the ID3 are its intuitive character and its easy decomposability. The limitations of the ID3 are the difficulty of controlling tree size, the time spent computing the logarithms in the information entropy expression, and the tendency to select the attribute that has more values when searching for the optimal attribute. Because these limitations can reduce performance, this version of the ID3 is not appropriate for real-time systems. Real-time systems are not addressed in our study. An improved version of the ID3 algorithm based on a simplification of the information entropy, such as the one in [30], can solve these limitations. However, the description of this other version is of little significance here, because this improvement has no impact on the quality or quantity of the results of this study.
The advantages of the Boost are the ability to integrate different classifiers that focus on different aspects of a problem and to assign more weight to features that can train more accurate weak classifiers. The limitation of the Boost is its slower and harder implementation compared to weak classifiers. This limitation was addressed by using the Boost as already coded to run in our tests.
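The way boosting combines weak classifiers can be sketched with depth-one trees (decision stumps). Note that this example uses discrete AdaBoost as a simpler stand-in: the tests in this study used the real boost type, and weight trimming is not implemented here. The function names are hypothetical, and class labels are assumed to be encoded as -1 and +1.

```python
import numpy as np


def fit_stump(X, y, w):
    """Best single-feature threshold rule under sample weights w (labels in {-1, +1})."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, sign)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, -sign, sign)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost(X, y, rounds=10):
    """Train a weighted ensemble of stumps with discrete AdaBoost."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)     # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)    # more accurate stumps weigh more
        pred = np.where(X[:, j] <= thr, -sign, sign)
        w = w * np.exp(-alpha * y * pred)        # emphasise misclassified samples
        w = w / w.sum()
        model.append((alpha, j, thr, sign))
    return model


def predict(model, X):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * np.where(X[:, j] <= t, -s, s) for a, j, t, s in model)
    return np.where(score >= 0, 1, -1)
```

A single stump cannot separate a class that occupies both ends of a feature range, but a few reweighted stumps together can, which illustrates why the ensemble is more expressive than its weak components.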
Figure A2. Block diagram of the decision tree: (a) general model, (b) example with decision attributes assigned to nodes. Adapted from [30].