A Fully Automatic, Interpretable and Adaptive Machine Learning Approach to Map Burned Area from Remote Sensing

: The paper proposes a fully automatic algorithm approach to map burned areas from remote sensing characterized by human interpretable mapping criteria and explainable results. This approach is partially knowledge-driven and partially data-driven. It exploits active ﬁre points to train the fusion function of factors deemed inﬂuential in determining the evidence of burned conditions from reﬂectance values of multispectral Sentinel-2 (S2) data. The fusion function is used to compute a map of seeds (burned pixels) that are adaptively expanded by applying a Region Growing (RG) algorithm to generate the ﬁnal burned area map. The fusion function is an Ordered Weighted Averaging (OWA) operator, learnt through the application of a machine learning (ML) algorithm from a set of highly reliable ﬁre points. Its semantics are characterized by two measures, the degrees of pessimism/optimism and democracy/monarchy. The former allows the prediction of the results of the fusion as affected by more false positives (commission errors) than false negatives (omission errors) in the case of pessimism, or vice versa; the latter foresees if there are only a few highly inﬂuential factors or many low inﬂuential ones that determine the result. The prediction on the degree of pessimism/optimism allows the expansion of the seeds to be appropriately tuned by selecting the most suited growing layer for the RG algorithm thus adapting the algorithm to the context. The paper illustrates the application of the automatic method in four study areas in southern Europe to map burned areas for the 2017 ﬁre season. Thematic accuracy at each site was assessed by comparison to reference perimeters to prove the adaptability of the approach to the context; estimated average accuracy metrics are omission error = 0.057, commission error = 0.068, Dice coefﬁcient = 0.94 and relative bias = 0.0046.


Introduction
Data science comprises methods and techniques such as machine learning, statistics, data mining, pattern recognition and "soft" computing for discovering knowledge in the form of both patterns and relationship from large volumes of data, in order to understand actual phenomena [1,2]. These technologies are particularly suited for processing satellite and aerial remote sensing imageries. Disturbance phenomena such as wildfires, flooding, landslides can be identified as abrupt changes in the surface conditions induced by a rapid and unforeseen event. Thus, thematic mapping of remotely sensed data relies on the spatio-temporal patterns of the target surface to be extracted from both spectral signatures and signatures' change. Operational monitoring of the environmental systems imposes quick and efficient methods based on large-scale data, readily available to the agencies [3], and it asks for automatic algorithms able to extract information from big data. In this work, we propose a fully automatic, interpretable and adaptive data fusion algorithm applied to the case study of burned area (BA) mapping from remotely sensed data. Interpretability is an important property of automatic systems in order to capture the trust of those who have to use their results: criteria of the mapping that lead to a certain result should be understandable to learn more about the phenomenon. This concern is expressed in the recent Presidential Address (February 2021) in the issue of the Geological Society of America (GSA) magazine "GSA Today" that cites four principles that data processing should satisfy, and one of them is "I can't use something I don't understand".
In recent years, machine learning (ML), and specifically deep learning, has become very successful in environmental sciences [4][5][6] including mapping and monitoring the status of the territory and changes induced by disturbances, such as wildfires [7], thanks to the high accuracy of the predictions [8][9][10]. Nevertheless, the criteria it learns and applies, besides requiring unbiased big data for training that are not always available, as in our case, can be hardly explainable to both experts and decision makers due to its black-box nature. Furthermore, ex-post explainability of ML is considered an essential aspect of the European Union's way to Artificial Intelligence [11], in line with the European General Data Protection Regulation (GDPR) [12][13][14] that restricted the use of black-box ML [15]. Specifically, GDPR promotes transparency of automated systems, by requiring that they provide meaningful information about the logics, and a justification of outcomes in order to enable understanding [16]. Although some attempts have been made to create methods for explaining black-box deep learning, the way forward is to design models that are inherently interpretable [17].
Therefore, the approach proposed here is an interpretable and explainable method based on a ML algorithm that exploits active fire points as a small training set for learning an Ordered Weighted Averaging (OWA) operator defined within fuzzy set theory [18]; this operator is used for the fusion of multiple partial evidence maps of burned areas, each one derived from diverse factors identified based on expert's knowledge.
The approach is applied to map burned areas in Mediterranean ecosystems. Wildfires are a complex process caused by the simultaneous occurrence of several interrelated factors (e.g., ignition source, fuel characteristics and composition, weather conditions and topography) [10]. Fire monitoring from remotely sensed data is mainly carried out by observing two distinct surface conditions: the presence of a fire front and the area affected by a fire. The fire front can be captured at the time of satellite overpass and detected as thermal hot spot; we refer to it as active fire (AF) or hot spot points. The area affected by a fire shows a change in the vegetation cover and/or in the ground surface: we refer to it as burned areas (BAs) [19]. Active fires are useful indicators of fire presence and fire timing but they do not provide a direct estimate for the total burned area [20].
Remote Sensing (RS) technology has been proven to bring key source data for monitoring and modelling complex disturbance phenomena affecting the environment, e.g., wildfires, insects and disease [21] for their capability of adapting to changing environmental conditions [22]. Several satellite missions carry on-board sensors specifically designed for monitoring fires and thermal anomalies (e.g., NOAA-AVHRR, NASA Terra&Aqua MODIS, NASA/NOAA VIIRS) which, combined with other systems for Earth Observation (e.g., Landsat and Sentinel missions), provide data with variable spatial, temporal and spectral resolutions. In particular, data acquired by both Landsat and Sentinel missions have recently been widely exploited for fire monitoring for their medium spatial resolution (10 to 30 m) that allows small and fragmented burned areas to be mapped. Moreover, their free availability opens unprecedented opportunities to the scientific community [23]. However, Landsat missions offer the longest archive of medium resolution RS data, the 16-day revisit cycle, that is often increased by cloud cover, which could be a limitation for fire monitoring. The multispectral instrument (MSI) onboard the Sentinel-2 A and B (S2) satellites offers enhanced spatial and temporal resolutions, suitable for burned area mapping [24]; in fact, if both S2 satellites are combined, global median revisit time interval has been estimated to be 3.7 days [25].
Our approach incorporates a region growing (RG) algorithm [26] adaptively tuned to automatically map burned areas from imagery by exploiting spectral change in surface reflectance induced by the fire in the visible (VIS) to near and short infrared (NIR, SWIR) wavelength domain. In this context, a fuzzy multi-criteria approach is applied to fuse degrees of partial evidence derived from pixel reflectance values in the S2 bands by means of OWA operators to derive a single value of global evidence of burn. OWAs can model distinct aggregation strategies through their weighting vector applied to the input values reordered by their magnitude. Different OWAs lead to distinct burned area maps depicting the phenomenon with variable rates of accuracy: underestimation (false negative/omission error) and/or overestimation (false positive/commission error) errors are a function of several factors, including site and fire characteristics. Hence, OWAs applying different fusion strategies can be used to extract seed and growing layers to tune the RG algorithm, which relies on binding conditions for the selection of the seeds while conditions for identifying candidate boundaries (i.e., the limits for the region growing) can be looser. A RG algorithm expands seed pixels over pixels with low burn signal but connected to more reliable seeds. Indeed, in digital image processing, RG is a segmentation algorithm that exploits spatial adjacency of pixels with similar characteristics to create clusters, and it has been widely used for thematic classification of satellite images [20,[27][28][29][30].
The approach proposed here is a hybridization of our previous proposal [31] by incorporating an adaptation mechanism based on a ML algorithm, partially knowledge and data driven defined in Goffi et al. [32] to map standing water areas, in order to fully automatize the mapping process. The novel contribution is the automated adaptation mechanism for the generation of both the seed and growing layers used by the RG algorithm. Hence, we propose: (i) to exploit a small training set of fire points to automatically define the OWA for computing the seed layer, and (ii) to automatically choose the OWA for the generation of the growing layer exploiting heuristic rules defined to minimize the final error in the resulting map of the RG algorithm based on the interpretation of the decision attitude of the OWAs. This attitude is modelled by quantifying two dimensions: pessimism/optimism and democracy/monarchy. Pessimism is the tendency of the OWA to generate more commission errors than omission errors, and vice versa in the case of optimism. Democracy foresees if there are only a few highly influential factors or many low influential ones that determine the fused result. In facts, democracy is related to considering as necessary all input data (i.e., factors deemed influential by the expert to map the undesired status of burned area) in order to determine the fused maps, or just a few of the inputs in the case of monarchy. A democratic fusion hints to the fact that the mapped burned areas are identified thanks to all factors exhibiting low evidence, while a monarchical fusion indicates that the burned areas are identified thanks to a few factors providing high evidence. We state that, when the fusion is democratic, the patterns of the burned areas are more homogeneous with respect to the factors than in the case of monarchy, when the influential factors may be very different from region to region.
Since the OWA is learnt from training data, we can explain ex-post its proneness to generating commission/omission errors, and in taking into account all low influential or a few high influential factors (i.e., partial evidence of burn). This information is exploited to generate the growing layer so as to tune the RG algorithm that expands the seeds to generate the final burned area map, thus adapting the processing to site characteristics. Furthermore, this automatic adaptation can take place even when reducing or changing the input factors, without the need of human intervention.
The algorithm is applied to map burned areas at four distinct sites in southern Europe during the 2017 summer fire season. The accuracy of the output BA map is assessed by comparison with reference fire perimeters. Validation, besides demonstrating the high accuracy achieved by the algorithm, also demonstrates that the predictions provided by the learnt OWA operators in terms of degrees of pessimism/democracy can be appropriately used to tune the RG algorithm so as to maximize accuracy of the final map.

Materials
As case study, for applying the BA mapping algorithm from Sentinel-2 (S2) satellite images, we selected four sites in southern Europe where large fires occurred in 2017, a severe fire year for the European continent, due to abnormal droughts and heat waves [33]. Extreme weather conditions led to large fires affecting, in particular, Portugal, Spain, southern France, Greece and Italy [34]. In particular, we selected sites in Spain and Greece as shown in Figure 1. The geospatial dataset is composed of (i) S2 images, (ii) active fire points and (iii) reference burned area perimeters. The algorithm, exploiting S2 spectral bands as input, delivers BA maps depicting the total area affected by fires. The S2 multispectral instrument (MSI) measures the Earth's reflected radiance in 13 spectral bands from VIS/NIR to SWIR with a spatial resolution ranging from 10 m to 60 m [35]. Table 1 reports the characteristics of the MSI reflectance bands of interest for this study. Spectral bands in Table 1 were selected as potential inputs for the BA algorithm; in a previous work [31] each reflectance band and the temporal difference ∆ between post-fire and pre-fire reflectance were analyzed to identify the most suitable ones for discriminating burned and unburned surfaces: post-fire S2 reflectance in Red-Edge and NIR (RE2, RE3, NIR) and temporal difference ∆ between pre-fire and post-fire S2 reflectance in the same bands and SWIR2 (∆RE2, ∆RE3, ∆NIR, ∆SWIR2).
Over each site, a pair of clear-sky S2 images were selected before (pre-fire date) and after (post-fire date) major fire events (Figure 2, Table 2). The sooner the image is acquired after the fire event the easier is the detection due to a stronger spectral signature of burn, as a consequence of fire on vegetation compound. For this reason, in general, we select the first available clear image after the event [36]. In this study, the temporal difference between pre-and post-fire images is on average 20 days. Table 1. MSI S2 spectral bands, spectral domain, central wavelength, spatial resolution and name used in this paper. Temporal difference (∆) is the reflectance difference between post-fire and pre-fire S2 images. In bold are the spectral bands and difference used as input features for the BA mapping algorithm in this study.    . The Sen2r toolbox makes available functions to process Level-1C images for atmospheric correction to deliver Bottom of Atmosphere (BOA) reflectance images in the VIS-NIR-SWIR wavelengths (S2 bands 2 to 12) at 10 m spatial resolution (after resampling of the lower spatial resolution SWIR spectral bands available at 20 m).

Band
In this study, a burned area represents the area affected by a fire that shows a change in vegetation cover and/or in ground surface that can be detected by RS data [40]; hence, a BA map is a geo-spatial product, generally representing a binary thematic information (burned/unburned) as grid/raster or vector/polygon format, and delivered by the algorithm that takes S2 images as input.
Active fires (AFs) were downloaded from the Fire Information for Resource Management System (FIRMS, https://earthdata.nasa.gov/earth-observation-data/near-realtime/firms, last access 1 July 2021) and used as training data. FIRMS distributes Near Real-Time (NRT) and archived active fire data from the NASA's Moderate Resolution Imaging Spectroradiometer (MODIS), aboard the Terra and Aqua satellites [41], and NASA's Visible Infrared Imaging Radiometer Suite (VIIRS), aboard the joint NASA/NOAA Suomi National Polar orbiting Partnership (Suomi NPP) [42]. Both MODIS and VIIRS fire datasets are accompanied by a layer of detection confidence of each individual fire, ranging in 0-100%. MODIS and VIIRS products were subset to extract fires detected between the S2 pre-fire and post-fire dates for each study site (Figure 2).
At the four sites, reference fire perimeters were downloaded from the Copernicus Emergency Management Service (EMS). EMS delivers on-demand and near-real time (hours/days) geospatial information in support of emergency management activities: this information is and derived from processing and analysis of satellite imagery acquired immediately after natural or man-made disasters such as floods, droughts and forest fires. EMS delivers ready-to-print maps and geographic dataset (vector package) for fire perimeter and fire damage grading (burn severity) derived from very high-resolution (VHR) multispectral images (https://emergency.copernicus.eu/mapping/list-of-activations-rapid, last access 1 July 2021). In the EMS products, reference date is the date of the VHR image data acquired for producing the delineation and damage mapping. Table 2 reports the dates of the S2 pre-and post-fire image pairs and fire reference perimeters. EMS fire perimeters were used as reference/ground truth for the validation of the BA maps produced by the algorithm.

Methods
The multi-criteria data-fusion approach proposed in this work builds on fuzzy sets theory to aggregate information provided by a set of input features, or contributing factors, by applying Ordered Weighted Averaging Operators (OWA) [18]. The approach implements a fusion function through an aggregation of multiple inputs to provide a reliable evaluation of the target phenomenon by modelling a reinforcement of evidence provided by redundant and complementary information.
In the case study of BA mapping, the aggregation of S2 bands and their temporal difference (∆) can provide a reliable evaluation of the occurrence of fire, based on both the spectral signature of burned areas and the spectral reflectance change induced by the effect of a fire on the vegetated surface. Hence, input features are the seven spectral bands and difference identified by Sali et al. [31] as providing the greatest separability between burned and unburned surfaces: RE2, RE3, NIR, ∆RE2, ∆RE3, ∆NIR, ∆SWIR2 (Table 1). Spectral bands and difference are interpreted by Membership functions (MF) of the fuzzy sets, which assign to each pixel a membership degree (MD) in [0, 1] that is the partial evidence of burn as brought by a single input: the closer the value to 1, the greater the evidence of burn. This step first normalizes the domains of all input factors to a common domain, so as to make them comparable and consistent, by, at the same time, enhancing the signal of burned conditions. MFs can be defined with different approaches according to available expertise and training data [43][44][45]; in this study, we exploited MFs from our previous work defined as parameterized sigmoid-shaped functions from training data [31].
The MDs values are aggregated by means of a fusion function defined as an OWA operator to provide a synthetic score of global evidence of burn, as brought by redundant/complementary inputs. OWAs can model different attitudes/semantics. Global evidence of burned areas obtained with different OWAs (e.g., ranging from extreme conditions of minimum and maximum operators) is used as seed (OWA seed ) and a grow (OWA grow ) layers by the Region Growing (RG) algorithm that exploits spatial connectivity of burned pixels. Here, we present a new formalization of the BA algorithm where the input to the RG algorithm can be automatically generated from training AF points.

Ordered Weighted Averaging Operators (OWA)
An OWA of dimension N and weighting vector W, with ∑i = 1, ... N w i = 1, aggregates N input values [d 1 , . . . , d N ] and computes an aggregated value a in [0, 1] as follows [46]: Input values d 1 , . . . , d N , are rearranged from the greatest to the smallest; reordering is a fundamental step of OWA operators, meaning that a specific weight w i is not univocally associated with the specific i-th input, but rather it is associated with the i-th position of the reordered inputs. In the case of BA mapping, OWAs can adapt to burned surface spectral characteristics, that can vary with site characterisitcs, by selecting pixel by pixel the input feature that brings the greatest evidence of burn. Different weighting vectors W lead to different OWAs, including Max, Min and arithmetic mean operators: It can be proved that OWA operators satisfy commutativity, monotonicity and idempotency and are bounded by Max and Min operators:

Semantics of Ordered Weighted Averaging Operators (OWA)
The semantics of an OWA operator with weighting vector W has been characterized by two measures [46]: the measures of orness and of dispersion. The measure of orness(W) ∈ [0, 1] is defined as follows: This measure characterizes the degree to which the aggregation is similar to an OR (Max) operator. Generally, in decision making, it is related to the tolerance of the decision maker, intended as his/her attitude to accept that only some criteria are satisfied, while intolerant decision makers demand that most or even all criteria are satisfied [47]. In other terms, orness measures the degree to which the OWA operator has a conjunctive behaviour.
It can be shown that, when the input values d 1 , ..., d N are degrees of partial evidence of an undesired phenomenon from N distinct sources, i.e., the greater they are, the more severe the evidence, we can assess the following interpretation of orness in relation to the fusion attitude, in which the fusion function is regarded as a decision maker agent [18,48,49]: orness[1, . . . , 0] = 1 indicates a pessimistic attitude of the fusion applied to minimize the risk of underestimating the spatial extent of a critical phenomenon (i.e., nothing is disregarded, any single source alone is trusted and taken into consideration to map the phenomenon extent); orness[0, . . . 1] = 0 indicates an optimistic attitude of the fusion, applied to minimize false positives due to overestimation of the effects of a critical phenomenon (i.e., one wants to prioritize anomalies pointed out by all sources since any source alone is not trusted by itself); orness[1/N, . . . , 1/N] = 0.5 indicates a balanced and neutral attitude towards over and under estimation of the phenomenon extent.
Notice that, in this interpretation, high values to aggregate are considered with a negative connotation.
The dispersion measure can also be defined to qualify the semantics of an OWA operator depending on the form of the weighting vector and representing how much of the information in all the input values is used by an OWA. The idea behind its definition is that the greater the dispersion, the more democratic the aggregation of the correspondent OWA, since it uses information from more sources/factors. Several dispersion measures have been proposed [50], the first of which is based on the concept of information entropy of W: This definition of dispersion is an entropy and satisfies the following properties: • Minimum value is obtained when w i = 1 for some i, then dispersion(W) = 0, • Maximum value is obtained when w i = 1/N for all i, then dispersion(W) = ln(N).

Fusion Attitude based on Optimism and Democracy
As stated in the previous section, a fusion function can be regarded as an automatic decision maker agent, which can be pessimistic or optimistic on the impact of a critical or anomalous event such as a wildfire. When it is fully pessimistic, it requires the worst scenario to be identified, thus tolerating more false positive alarms (commission greater than omission errors) in order to be cautious and not to neglect any possible critical situation. Conversely, when it is fully optimistic, it means that it tolerates false negative (omission greater than commission errors), in order to analyze only the priorities that demand intervention. In between these two extreme cases, there is a continuum of blending of optimistic and pessimistic attitudes. Perfectly in the middle, we have a neutral attitude that equally balances optimism and pessimism. We can define a variable ps in [0, 1] to quantify the desired degree of pessimism of a fusion attitude that assumes values 1 or 0 in the cases of full pessimism or full optimism, respectively, and 0.5 in case of neutral attitude.
Another dimension of the fusion attitude is the level of democracy that it applies among the multiple input factors to determine the result. Democracy depends on both the number of the factors and the degree of evidence they provide, to determine the final result. In the case of maximum democracy, all factors are considered equally influential, while in the case of full monarchy, the rule of one drives to the result. In between these two extremes, we can have a blend of democracy and monarchy.
We define a variable dm in {1/N, 2/N, . . . 1}, with N is the number of factors, to represent the degree of democracy of a fusion attitude the meaning of which is the percentage of factors it considers. When dm = 1/N, we have full monarchy (the rule of one) while dm=1 corresponds to full democracy (one head one vote), while intermediate values 1/N < x/N < 1, with 1 ≤ x ≤ N, specify a blend of the two extremes.
If we consider an OWA with weighting vector W that fuses partial evidence of burn as a decision maker agent, we can define its attitude by the pair pessimism ps in [0, 1] and democracy dm in {1/N, 2/N, . . . 1}. Pessimism is related to its orness (3) and democracy dm to the dispersion measure (4) as follows: In order to make it easier to understand the semantics of the OWA operators once we computed the pair (ps, dm), in Figure 3, we provide linguistic expressions that allow humans to interpret the correspondent attitudes of the OWA operators to generate more omissions/commissions errors in heterogeneous/homogenous areas. In the bi-dimensional space of pessimism (ps, rows) and democracy (dm, columns) distinct quadrants correspond to given attitudes.
Pessimism and optimism are determined by high and low degrees of evidence, respectively, of an undesired status of the environment due to wildfires: a high value is considered a pessimistic view of the status, i.e., something negative, while a low value is an optimistic view, something positive. Thus, an OR (AND) fusion is regarded as a pessimistic (optimistic) attitude since one trusts the most pessimistic (optimistic) criterion, and thus is prone to generate more commission then omission errors (vice versa).
Moreover, a democratic fusion function indicates that all factors are needed to capture the characteristics of all burned areas, meaning that these surfaces are likely to have homogeneous conditions. Conversely, if the fusion is monarchical, it means that each burned area may exhibit its own highly influential factor, an thus may have heterogeneous conditions.

Learning OWA Weighting Vector from Training Points
An OWA operator, i.e., its weighting vector W, can be learnt from data assumed as ground truth by applying a ML algorithm; training data can be used to this aim being a highly reliable evidence of the phenomenon under investigation. The OWA operator can be defined by iteratively minimizing errors between OWA results and training points.
Given K georeferenced training points in the map a 1 , . . . a K (in our case: points labelled as active fires-Afs), they are assumed as ground truth. By knowing their geographic coordinates, we can associate to each of them the MDs [a i1 , . . . a iN ] of the N input factors having the same coordinates such that we obtain the following antecedent-consequent rules that must be satisfied: a 11 , ...., a N1 → a 1 . . . a 1K , ...., a NK → a K In the case study of BA mapping, the values a 1 , . . . a K , are defined on a continuous scale [0, 1] to quantify the extent of the evidence of burn in the specific location (1 full evidence, 0 no evidence, and intermediate values mean partial evidence).
The learning mechanism starts at epoch L = 0 by assuming as initial OWA 0 operator the weighted average (balanced and neutral attitude), that is defined with weighting vector W 0 = [1/N, . . . 1/N]. Then, at each epoch L, it iteratively determines the weighting vector W L = [w 1L , . . . , w NL ] of OWA L that minimizes the error existing between the results of its application to all the antecedents of the rules in (7) and the values a 1 , . . . a K , of the training points (i.e., the consequents of the rules). Formally, this is equivalent to applying the following rule: where Λ i (L + 1) = Λ i (L) − βw iL (argmax i (a 1k , ...., a Nk ) − OWA L (a 1k , ...., a Nk )) * (OWA L (a 1k , ...., a Nk ) − a k ) (9) in which β ∈ (0, 1] is a learning rate parameter, and the i-th weighting vector element at epoch L is defined as follows: While in Goffi et al. [32], field observations of standing water were used as training points to learn the best OWA operator for a given site, directly to map flooded areas, in this work, by exploiting Afs points as training data, a vector W was learnt for each site and used (i) to quantify the fusion attitude site by site and (ii) to define the OWA for the generation of the seed layer for the RG algorithm.

Workflow of the Automatic BA Mapping Algorithm
The workflow of the automatic BA algorithm is shown in Figure 4 in which the grey box enhances the novel part that has been embedded in our previous proposal [31]. Since the algorithm is applied at the pixel level, one independently from the others, we can refer to input/output of any step of the workflow as rasters/layers/maps, intended as georeferenced and co-registered matrices of pixels. First, remote sensing data are collected for each study area. A set of features is computed, identified as contributing factors to determine partial evidence of burn. In this case, input features are spectral bands and temporal difference. These features are subjected to membership functions of fuzzy sets defined on the domain of the features and computing degrees of evidence of burn in [0, 1] (i.e., MD for membership degree). Both the identification of the most suited features and the definition of the membership functions have been carried out in a previous study [31]. Since the same features and functions have been tested over the sites of the present experiment, there was no need to readapt them. Figure 4. Workflow of the fully automatic algorithm proposed for burned area mapping with the multi-criteria adaptive approach. The grey box highlights the innovative step in the algorithm introduced to fully automatize the definitions of both OWA seed and OWA grow , generating the seed and grow layers, respectively, and used in input by the RG algorithm.
Once MDs are computed, the ML algorithm described in Section 3.4 is applied independently at each site by taking as training data AF points within the site, and by using them to learn the OWA operator. The learnt OWA is subsequently applied to aggregate MD values of the site to generate the seed layer for the RG algorithm. In order to choose the most appropriate growing layer, we exploit the information on the attitude of the learnt OWA operator by computing its degrees of pessimism (ps) and democracy (dm) as described in Section 3.3 by applying Formulae (3) (4) and (5) (6).
While the knowledge of dm can be useful to the expert, in order to have an idea of the homogeneity or heterogeneity of the BAs' features in each site, the degree of pessimism ps is used in the RG algorithm to select the growing layer (OWA grow ) that should minimize predicted errors by applying heuristic rules such as the following ones: Hence, rules in (11) formalize a simplified choice of the OWA grow to counter-balance the tendency to increase the commission/omission errors by containing the seeds' expansion depending on the learnt OWA seed . The rationale of this heuristic can be understood by ordering the OWA operators by the increasing value of their pessimism ps: OWA AND < OWA Almost_AND < OWA Average < OWA Almost_OR < OWA OR 0 < 0.08 < 0.5 < 0.92 < 1 When ps is closer to the neutral attitude (ps ≈ 0.5) than to the extreme ps = 1 corresponding to full pessimism, the prediction on the type of error is very uncertain, meaning that omission and commission are more or less balanced. In this case, to maintain a balanced behavior, OWA grow is chosen as the OWA Average . When ps is closer to full pessimism, (ps = 1), it is more likely that the seed layer contains more commission than omission errors, hence the expansion is contained by selecting an OWA grow that is more optimistic than OWA seed , i.e., OWA Almost_AND . On the other side, if ps is closer to full optimism (ps = 0), seeds are likely to be affected by more omission than commission, and thus they are maximally expanded in an attempt to decrease the omission by selecting OWA OR . Since in our experiment we assumed that it is preferable to have more commission than omission, we introduced a rule for values of ps in between 0.25 and 0.5 that would hint to a slight prevalence of omission, so as to relax the expansion in an attempt to decrease the omission thus selecting OWA Almost_OR . Notice that these rules have been set a priori, based on the rationale, and without any tuning on experimental data.
After the selection of the most appropriate OWA grow , the region growing (RG) algorithm is run in Harris IDL language (https://www.l3harrisgeospatial.com/docs/region_ grow.html, last access 1 July 2021). Region growing is a procedure that groups pixels or sub regions into larger regions based on pre-defined criteria [26]. The RG algorithm is an iterative algorithm that starting from initial seeds extracted from the OWA seed layer (OWA learn , in this case), it searches the eight-neighbor connected pixels and it includes in the new seed layer for the next iteration only those pixels that satisfy the constraint OWA grow > 0. Initial seeds are pixels with OWA learn > 0.9. The output is a raster map with pixel value, RG score , in [0, 1].

Validation Metrics
Validation is the assessment of thematic accuracy of burned area maps, derived from the output raster of the RG algorithm. Over each site, the output RG score rasters are converted to binary, burned/unburned maps, prior to comparison with reference EMS fire perimeters. Since EMS products are distributed as shapefile, rasterization is necessary for pixel by pixel comparison to build the error/confusion matrix (Table 3). An error matrix is a square array of numbers organized in rows and columns, which expresses the number of sample units (i.e., pixels, clusters of pixels, or polygons) assigned to a particular class relative to the actual class, as indicated by the reference data [51]. In RS literature, classification is generally a multi-class problem (e.g., land cover classification) and the error/confusion matrix is an array with the number of columns/rows > 2. Table 3. Sampled error/confusion matrix: n ij express the number of pixels of agreements (diagonal cells) or disagreements (off diagonal cells) between the BA product and the reference EMS. In the case of binary classification (burned areas and not burned areas), considering as target objective of the algorithm the identification of burned areas, we can make this equivalence: True Positives (TP = n 11 ), True Negatives (TN = N 22 ), False Positives (FP = n 12 ) and False Negatives (FN = n 21 ). In the modelling literature, the evaluation of model forecast generates a 2 × 2 square matrix, where columns and rows are labelled as false/negative or true/positive occurrence; in this case, predicted values can be true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP) [52].

EMS
BA mapping is a binary classification problem (burned/unburned) and the error/confusion matrix could be assimilated to the one proposed in the evaluation of modelling forecast, but we maintain the RS formalization as shown in Table 3 and widely used in the RS literature [53][54][55][56].
Various summary metrics can be derived from the error/confusion matrix; in this work we selected the following ones which are those commonly used in remote sensing: commission error, omission error, Dice Coefficient (DC) [57] and relative bias (Table 4).

Learning the OWA Operator for Seed Layer Computation
The algorithm described in Section 3.5 and depicted in Figure 4 was applied to the four study sites. As stated above, from S2 image pairs for each site, pre-and post-fire, we selected the seven input features, RE2, RE3, NIR, ∆RE2, ∆RE3, ∆NIR, ∆SWIR2, that are converted to membership degrees MDs by applying the MFs to generate seven partial evidence maps. These maps are fused by applying two distinct OWAs operators to generate OWA seed and OWA grow for the RG algorithm. In this case, OWA seed = OWA learn that is defined at each site by applying the ML algorithm described in Section 3.4 exploiting AF points: its weighting vectors, W, are reported in Table 5 for each site. Results show that at all sites except Kalamos, the learnt operators end towards a pessimistic attitude, generating more commission than omission errors. However, this prediction is highly uncertain due to the closer value of ps to the neutral attitude (ps = 0.5) than to the full pessimistic attitude 1 (ps = 0.55, ps = 0.70 and ps = 0.54, respectively). In Kalamos, the prediction is that there is slightly more omission, even if it is highly uncertain too (ps = 0.4). Furthermore, at all sites except Calar, the learnt operator is nearly monarchical, thus exploiting few partial evidence maps to determine the result (i.e., the global evidence). In Calar, the nearly democratic operator combines all partial evidence degrees. This is also apparent by looking at the elements of the single weighting vector W of Calar, which is never null (w i > 0, ∀i = 1 to 7), while for the other vectors, we have at least two or more null elements. Table 5. Weighting vectors of the OWA learn in each site, its degrees of pessimism ps and democracy dm, the correspondent attitude expressed linguistically, the expected type of error in the seed layer generated by OWA learn , the predicted OWA grow based on the heuristic rules in (11) and the best performing OWA grow that has been assessed based on the validation comparison. The OWA grow layer was then chosen by applying rules defined in (11), based on the values of pessimism ps in each site: at all the sites except Kalamos, predicted OWA grow = OWA Average , while in Kalamos predicted OWA grow = OWA almost_OR .

Burned Area Mapping Accuracy
The RG algorithm was applied to automatically generate the final map of burned areas by masking out not-vegetated (bare soil and urban classes) and agricultural regions based on the Corine Land Cover map [58]. Output BA maps are validated by comparison with reference fire perimeters from the EMS products.
We also compared the result of the proposed fully automatic algorithm with those yielded by using the semi-automatic algorithm proposed in Sali et al. [31], in which both OWA seed and OWA grow layers were manually selected with several different combinantions. In this comparative analysis, seeds were extracted from OWA AND (OWA AND > 0.9) to simulate the requirement of highest reliability of seed pixels and identified by the most restrictive operators (i.e., implanting an AND/Min condition) and from the OWA learn to simulate a fully automatic condition; for the grow layer, we tested OWA Average , OWA AlmostOR and OWA OR . All RG output maps were compared to EMS fire reference perimeters for accuracy assessment: the confusion matrices and metrics are reported in Tables A1 and A2. In the tables, the combination of OWA seed and OWA grow that achieves the greatest accuracy at each site is highlighted in bold; the same information is also reported in the last column of Table 5, together with the increase in the Dice coefficient (∆dc) brought by the best performing algorithm with respect to the fully automatic one, implemented with the predicted OWA grow .
Finally, we also performed an ablation study by removing the RG algorithm and considering as direct result the map generated by OWA learn in each site, and comparing it with the map obtained by applying other OWAs, specifically, OWA AND , OWA AlmostAND , OWA Average , OWA AlmostOR and OWA OR , with weighting vectors defined in Section 3.2. This study was performed to assess the utility of the RG algorithm, that is expected to reduce commission errors. Results are synthetized by the accuracy metrics (defined in the previous Section 3. 6), and used to compare the experiments; metrics are also reported in graphical form in Figure 5 and discussed in Section 5. By looking Table 5, we can observe that the fully automatic algorithm proposed in this paper using the predicted OWA grow (Table 5, Predicted OWA) together with the best-performing semi-automatic algorithm in two areas (Huelva and Zakynthos). At both sites, the predicted OWA grow = OWA Average , the one with manual selection yielding best performance (dc > 0.9).
In the Calar site, predicted OWA grow = OWA Average , while the best performing semiautomatic version corresponds to OWA grow = OWA Almost_OR . In Kalamos, predicted OWA grow = OWA Almost_OR , while the best performing one corresponds to OWA grow = OWA OR . Nevertheless, there is a negligible difference in mapping accuracy, as quantified by the Dice coefficient; indeed, accuracy loss (∆dc) is 0.01 and 0.007 for Calar and Kalamos, respectively.
We can also notice that, in both of these two cases, the predicted OWA grow operators have a smaller pessimism than the best performing ones. This suggests a revision of the heuristic rules which were set a priori, just based on the rationale and not on experimental tuning. We can conclude that the fully automatic algorithm performs equally or very close to the best semi-automatic algorithm with manual setting of both OWA seed and OWA grow for generating the seed and grow layers. Figure 5 depicts accuracy metrics bar plots (commission, omission, Dice coefficient and relative bias) for the tested combinations of OWAs with (RG) and without RG (noRG); numeric values are summarized in Tables A1 and A2.
Average dice coefficient of the full automatic algorithm over all sites is 0.94 ± 0.03 (±one standard deviation) that equals the average accuracy over all sites of the best performing semi-automatic algorithms in each site. They differ for the average relative bias that is slightly positive 0.004 (omission > commission) for the fully automatic algorithm, while the best performing algorithms, on average, produce a negative bias equal to −0.005. The balance between average omission and commission is greater for the full automatic algorithm (average oe = 0.057 and average ce = 0.068), while the best performing semi-automatic algorithm tends to generate more commission (average ce = 0.08) than omission (average oe = 0.044). These results show that the fully automatic algorithm has a more balanced behavior in terms of omission and commission than the best performing semiautomatic algorithm at each site and equal accuracy.
By analyzing the ablation study, i.e., by comparing the results obtained when using or not the RG algorithm, it is clear that the RG algorithm is very necessary to decrease commission errors. The first clear outcome is that using an RG algorithm in the full automatic algorithm leads to both: (i) a reduction in the errors and (ii) a lower variability in the accuracy metrics among selected OWAs. The former is quantified by the Dice coefficient ranging in [0.91, 0.97] and [0.73, 0.93] with and without RG algorithm, respectively.
At all the sites, we can observe a decrease in the Dice coefficient, dc, when not using the RG algorithm: the largest decrease occurs in the Calar site, Spain, where dc is 0.92 with RG, and 0.73 with noRG; in Huelva, dc is 0.91 with RG, and 0.88 with noRG; in Kalamos, dc is 0.97 with RG, and 0.93 with noRG; finally, in Zakynthos dc is 0.94 and 0.78 with RG and noRG, respectively. The reduced variability is confirmed by the lower standard deviation of the dc metric estimates, that decreases from 0.09, when not using RG, to 0.03 when RG is applied (Table A1).
Indeed, in the fully automatic version, the contextual conditions applied by the RG algorithm reduce commission error at all sites by a quantity that ranges between a minimum of 5.2% in Huelva, and 8.2% in Kalamos, to a maximum of 25.2% in Zakynthos, and 34% in Calar. This decrease in commission does not always affect the increase in omission: when applying RG at the Calar and Huelva sites, omission increases only for 0.4% and 0.1%, respectively, while when applying RG in Kalamos and Zakynthos, there is also a decrease in omission, with a reduction in ce equal to 0.3 and 0.1, respectively.
Results confirm that the best mapping accuracy is achieved when region growing is applied as a way of balancing omission and commission errors [59]; in fact, region growing and contextual approaches are largely used in thematic mapping.
Overall, the site with lowest ce is Kalamos (ce = 2%); also at this site the greatest values for the Dice coefficient and relative bias metrics are obtained for the fully automatic algorithm (dc = 0.97, relB = 0.005).
Finally, by observing the results obtained without the RG algorithm, extreme optimistic conditions are depicted, consistently over all sites, by the noRG_AND and noRG_AlmostAND algorithms which, not surprisingly, deliver the greatest omission errors due to the restrictive condition applied by AND-like operators. This result was largely expected since AND-like operators implement a fusion strategy based on the selection of the minimum value of the global evidence of burn. Despite leading to a significant underestimation, global evidence of these operators is highly reliable for the restrictive conditions applied to fuse input features. Hence, OWA AND could be chosen as alternative source for seed points when no training is available [31]. Indeed, it can be observed that, at all the sites, the fully automatic algorithm achieves the same results of the manually set algorithm generating the seed layer by OWA AND and having the same grow layer of the automatic algorithm. In seed-based region growing algorithms, selection of initial seed points is crucial since it influences the final accuracy [60,61]; our results confirmed that the proposed approach is robust with respect to the choice of seeds from both OWA AND and OWA learn . Nevertheless, when choosing OWA learn we can exploit the knowledge of its attitude to select the OWA grow for generating the grow layer adaptively in each site, so as to minimize errors. In fact, the combination OWA seed = OWA AND and OWA grow = OWA Average achieves the same accuracy of the fully automatic algorithm at all the sites. Nevertheless, if we regard the omission and commission, we can see that the balance is slightly different for the manual set algorithm at the Kalamos site, where oe = 5.2% and ce = 1.1, while with the fully automatic algorithm, a better balance is achieved; oe = 3.1 and ce = 2%. Nevertheless, to confirm the usefulness of the adaptability mechanism over the manual combination OWA seed = OWA AND and OWA grow = OWA Average new experiments are needed.

Discussion
In this paper, we propose an approach for automatically mapping burned areas from S2 imagery, exploiting reflectance values in the S2 spectral bands; spectral signal of burned surfaces in post-fire images as well as in temporal difference in reflectance values are fused by OWA operators. The proposed algorithm builds on our previous work where OWAs were exploited to map surfaces affected by disturbances such as wildfires [31] and flooding [32]. In this paper, we further confirm that OWAs are flexible operators for data fusion in multi-criteria evaluations and we also propose a fully automated version of the BA mapping algorithm. In this improved version, we apply a ML algorithm, trained over input active fire points operatively made available by RS data, to learn the weighting vector of the OWA operator (OWA learn ). This way, we can learn an OWA that is tuned over site and fire characteristics. The experimental tests carried out over four study sites in southern Europe (Spain and Greece) for the 2017 summer fire season showed that the weighting vector learnt from the training AFs changes from site to site, thus reflecting differences in the characteristics of the surfaces affected by fires.
We propose exploiting two measures that can be derived from the semantic of the OWA learn operator (orness and of dispersion) to formalize fusion attitude through pessimism (ps) and democracy (dm). In particular, pessimism (ps), is exploited to automatically identify OWA grow , i.e., the optimal growing layer, of the RG algorithm that is implemented in the approach ( Table 5).
Results of the experiments show that by adapting the choice of OWA grow depending on the degree of pessimism ps of OWA seed (where OWA seed = OWA learn ) and determined based on learning allows us to achieve accuracy levels of BA mapping equal or very close to the best performing OWA grow in all the four sites. This is because the adaptation mechanism actuated by the rules defined in (11) counterbalances the attitude of OWA learn to generate more or less commission/omissions.
If we compare the results of the automatic algorithm with all those obtained by the semi-automatic algorithm in which OWA seed = OWA AND combined with different OWA grow , we can observe that at two of the sites the automatic algorithm achieves equal or greater accuracy with respect to all semi-automatic versions. Only at the Calar site the greatest accuracy is achieved by the semi-automatic algorithm with OWA seed = OWA AND and OWA grow = OWA Almost_OR . The performance is certainly a function of the learning mechanism and of the ability of active fire points to represent the variability of the spectral characteristics of burned areas, as observed in S2 wavebands. Spectral characteristics of burned areas are largely variable as a function of pre-fire vegetation, soil properties, fire characteristics, fire severity and the age of the burned surface [62,63]. Fire severity is in fact one of the factors controlling post-fire vegetation recovery and regrowth. Although the learned OWA at this site may be inadequate to represent the actually burned areas, mapping accuracy, as quantified by the metrics, is more than satisfactory (dc > 0.8). Figure 6 shows the results of the automatic algorithms in each site: the RG score with seed points (first column) compared to EMS reference fire perimeters and the spatial distribution of the agreement between BA maps and reference (second column, distinct colors to mark agreement and disagreement classes). It can be immediately noticed that RG algorithm exploiting spatial connectivity allows more compact burned areas to be generated than by relying solely on the segmented RGscore maps obtained by applying a given threshold, which appear highly fragmented. It can also be appreciated that omission (False Negatives) and commission (False Positive) are located at the boundaries between the burned (TP) and unburned (TN) areas, meaning that the grow layer generation could be refined. It is, however, true that in these regions we can find the most critical detection conditions, such as partially burned pixels and low severity burned pixels. As far as the degree of democracy is concerned, it is an indicator of how many factors and how much influence they have in determining the final map of burned areas. Figures 7 and 8 depict the MDs of all factors at the Huelva and Kalamos sites, respectively; in both cases, the degree of democracy dm is below 0.5, that is the neutral attitude, thus corresponding to the nearly monarchical attitude (Table 5). Notwithstanding this, since their degrees of dm are noticeably different, being equal to 0.28 and 0.45 in Huelva and Kalamos, respectively, we can observe different patterns of the high influential partial evidence degrees (MDs) at the two sites. While in Huelva there are only a few highly influential factors, in line with lower value of dm, essentially the most influential is ∆SWIR2 and then ∆NIR and ∆RE3, in Kalamos, we have a more variable situation with two most influential factors ∆SWIR2 and ∆RE2, and then also ∆RE3 and ∆NIR, and finally RE2 and RE3. This means that at the two sites, burned areas are characterized by different spectral reflectance values in S2 bands probably due to different vegetation and burn severity. The algorithm is indeed able to adapt to site characteristics by flexibly selecting the most important layers in the fusion step. For example, in the Kalamos site, burned areas are mainly located in shrubland vegetation class (data not shown).

Conclusions
The fully automatic interpretable and adaptable algorithm presents several advantages over the literature both theoretical and practical.
First of all, it needs a small set of classified points for training (active fires) which allows fast learning; in this case study, in particular, we used a total of 300, 79, 327 and 189 AF points for the Calar, Huelva, Kalamos and Zakynthos sites, respectively. Deep learning approaches, on the contrary, typically Convolutional Neural Networks, need tens of thousands classified pixels and, as a consequence, longer training phases. Additionally, in many real cases of Earth Observation over large areas, representative and spatially distributed data sets for training are not available. As proposed here, the training phase relies on input active fires (MODIS and VIIRS from the FIRMS system) that are operationally available at the global scale from satellite-based products. Moreover, when changing the area, one generally needs to repeat the training phase with new ground truth data; in fact, transfer of a pre-trained models greatly depends on the choice of a proper CNN architecture for the target purpose.
Conversely, the proposed approach being partially knowledge-driven and partially data-driven has the second advantage of exportability of the knowledge mined in a different area: in fact, it exploits domain knowledge and data analysis performed in a study area to identify the influential factors and transfers it to new sites without any need of modification. Exportability with respect to membership functions has been proved in a previous paper in which we applied predefined MFs trained in a different site to new sites without any modification and achieving a high mapping accuracy at all sites [31]. The adaptation occurs at the level of factors fusion which allows both the selection of the most influential factors and the factors' contribution to be tuned depending on the characteristics of the site.
Average accuracy metrics values of the burned area maps delivered by the fully automatic approach at the four sites are oe = 0.057, ce = 0.068, dc = 0.94 and relB = 0.0046; these values refer to the implementation of the proposed approach incorporating the RG algorithm. In fact, validation clearly showed that RG provides the highest accuracy by reducing commission errors. Although a full comparison with published values is difficult due to the differences in input data and algorithms, our results are more than satisfactory and comparable to published reference values for accuracy metrics of burned area maps [64][65][66].
The third advantage is the interpretability of the fusion in terms of its attitude to generating a seed layer affected by more commission/omission errors. Deep learning approaches are black boxes which achieve high prediction accuracy at the expense of lack of transparency: there is no possibility of understanding the "why" of the prediction. Nevertheless, being able to understand the prediction criteria is important in order to increase experts' knowledge of the problem. For example, one important aspect when using a product generated from remote sensing data analysis, which is inevitably affected by some form of error, is the knowledge of the types of errors: when using a map of burned areas to estimate the loss in ecosystems, it is important to know whether one is underestimating or overestimating the damage/loss. Nevertheless, there are situations in which reference data are not available to assess the accuracy of the generated map. With our proposed approach, even in this situation, we can state if the product will be affected more by commission or omission errors. Furthermore, being able to identify the most influential factors that determine the result is a condition that increases the trust of domain experts and their knowledge of the context.
Finally, the approach is general: in this paper we presented it to map burned areas but it can be applied for different tasks of environmental status assessment in land and environment management and planning. A version without the RG algorithm was successfully applied for mapping standing water areas [32]. The approach could be used to deliver maps of critical situations and anomalies produced by disturbance phenomena such as wildfires, floods, desertification, erosion.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.
Acknowledgments: Authors wish to acknowledge the four anonymous reviewers who significantly contributed to the improvements of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest. Table A1. Confusion matrices (TP = true positive, TN = true negative, FP = false positive, FN = false negative) and the metrics (omission error = oe, commission error = ce, Dice coefficient = dc, relative bias = relbias) for the selected and tested combinations of OWAseed and OWAgrow in the RG algorithm. Labels are the same displayed in Figure 5. The lines in bold are the best performing combinations of the seed and grow layer in each site; the lines outlined in yellow are those corresponding with the full automatic algorithm.  Table A2. Confusion matrices (TP = true positive, TN = true negative, FP = false positive, FN = false negative) and the metrics (omission error = oe, commission error = ce, Dice coefficient = dc, relative bias = relbias) for the OWA layers used for mapping Bas without RG algorithm. Labels are the same displayed in Figure 5.