Detection of Geothermal Potential Zones Using Remote Sensing Techniques

The transition towards a new sustainable energy model—replacing fossil fuels with renewable sources—presents a multidisciplinary challenge. One of the major decarbonization issues is the question of to optimize energy transport networks for renewable energy sources. Within the range of renewable energies, the location and evaluation of geothermal energy is associated with costly processes, such as drilling, which limit its use. Therefore, the present research is aimed at applying different geomatic techniques for the detection of geothermal resources. The workflow is based on free/open access geospatial data. More specifically, remote sensing information (Sentinel-2A and Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER)), geological information, distribution of gravimetric anomalies, and geographic information systems have been used to detect areas of shallow geothermal potential in the northwest of the province of Orense, Spain. Due to the variety of parameters involved, and the complexity of the classification, a random forest classifier was employed, since this algorithm works well with large sets of data and can be used with categorical and numerical data. The results obtained allowed identifying a susceptible area to be operated on with a geothermal potential of 80 W·m−1 or higher.


Introduction
Fossil fuels (coal, petroleum, and natural gas) encompassed 81% of the total primary energy supply in 2016 [1]. These energy sources are characterized by their limited reserves and for producing greenhouse gases (GHG), which cause, as consequences, the increase of the average temperature and extreme changes in the climate, among others. Climate change stabilization requires us to impose limitations on GHG emissions from fossil fuel combustion, thus, to try to minimize the effects of climate change, governments of different countries have accepted and ratified agreements, including an energy transition from fossil fuels to renewable energies [2].
One of the least widespread renewable energies in the world is geothermal [3], which uses the Earth's internal heat to generate heat in buildings and to generate energy in geothermal plants. Geothermal energy is defined as the energy stored in the form of heat beneath the surface of solid Earth [4]. Among the different types of geothermal energy: very low, low, high, and very high enthalpy, the present research is focused in low or very-low enthalpy types, also known as shallow types [5]. This energy is characterized by being partially renewable, having high availability, reducing GHG emissions [6], and producing less waste than other energy sources.
The detection of areas with geothermal potential was initially carried out by means of perforations to measure the temperature in the subsoil. Despite the fact that this method is accurate, it relies on the estimation of the geothermal reservoir parameters (thickness, porosity, permeability, and temperature) to estimate the geothermal potential [7]. Another factor to consider are the significant costs for the The geographic data processing was carried out using three software. Firstly, QGIS (Quantum Geographic Information System) with the Semi-Automatic Classification (SPC) plugin [17] and the GDAL (Geospatial Data Abstraction Library) georeferencer plugin [18], were used, respectively, for the download and pre-processing of the Sentinel-2A images, and for the re-georeferencing of the geological map. Secondly, ENVI 5.0 (Exelis Visual Information Solutions) was employed for the classifications of the Sentinel-2A images and the processing of the thermal images. And, finally ArcGIS 10.4.1, which was used to perform auxiliary tasks (e.g., image clipping or conversion to different image formats), digitize raster maps and for the final classification.

Land Cover Creation
Through the use of the Sentinel-2A images, it is possible to create a mask in order to exclude non-relevant areas, which encompasses vegetation, water zones, and infrastructures. These last three classes were not to be taken into account in the classification and subsequent GIS analysis.
Though the use of QGIS' plugin Semi-Automatic Classification [17] Sentinel 2 Level 1C images can be downloaded for further processing and conversion at level 2A. The Semi-Automatic Classification plugin allows the ability to apply the dark object substation (DOS) to the images before converting to radiances and then reflectances. This method, according to [19], is based on the basic assumption that within the image some pixels are in full shadow and their satellite radiances are due to atmospheric scattering (path radiance). This assumption is combined with the fact that very few surface targets on Earth are absolute blacks, so a minimum reflectance of one percent better than zero percent is assumed. The path radiance (Lp) resulted from the interaction of the electromagnetic radiance with the atmospheric components is obtained by [20]: where Lmin is the radiance that corresponds to a digital count value for which the sum of all pixels with digital counts lower or equal to the 0.01% of all pixels from the image considered; whereas LD01% is the dark object radiance, assuming it has a reflectance value of 0.01. In the case of the Sentinel 2 images, these are converted to radiances before applying the DOS correction. The dark object radiance is given by [17,20]:

Land Cover Creation
Through the use of the Sentinel-2A images, it is possible to create a mask in order to exclude non-relevant areas, which encompasses vegetation, water zones, and infrastructures. These last three classes were not to be taken into account in the classification and subsequent GIS analysis.
Though the use of QGIS' plugin Semi-Automatic Classification [17] Sentinel 2 Level 1C images can be downloaded for further processing and conversion at level 2A. The Semi-Automatic Classification plugin allows the ability to apply the dark object substation (DOS) to the images before converting to radiances and then reflectances. This method, according to [19], is based on the basic assumption that within the image some pixels are in full shadow and their satellite radiances are due to atmospheric scattering (path radiance). This assumption is combined with the fact that very few surface targets on Earth are absolute blacks, so a minimum reflectance of one percent better than zero percent is assumed. The path radiance (L p ) resulted from the interaction of the electromagnetic radiance with the atmospheric components is obtained by [20]: where L min is the radiance that corresponds to a digital count value for which the sum of all pixels with digital counts lower or equal to the 0.01% of all pixels from the image considered; whereas L D01% is the dark object radiance, assuming it has a reflectance value of 0.01. In the case of the Sentinel 2 images, these are converted to radiances before applying the DOS correction. The dark object radiance is given by [17,20]: where T V and T Z are the atmospheric transmittance along the atmospheric trajectory from the ground to the sensor, E down is the downward irradiance due to the solar flux dispersed in the atmosphere, and ϑ S is the solar zenith angle. The mean solar exo-atmospheric irradiances (ESUN λ ) values were obtained from [17]. Since there are several DOS techniques, the simplest one, DOS1, was employed, where the values for the aforementioned coefficients are: T V = 1, T Z = 1, and E down = 0 [21]. From these corrected radiance values, surface reflectance can be calculated by the following equation: Remote Sens. 2019, 11, 2403 4 of 20 where L λ is the spectral radiance at the sensor's aperture (at-satellite radiance), and d is the Earth-Sun distance in astronomical units. Using ENVI 5.0., two indexes are created for each one of the satellite images to allow detecting vegetation and infrastructures zones with great ease. The first index is the normalized difference vegetation index (NDVI) [22]. This index is used to determine the amount of vegetation in an area. Its value varies from −1 to +1. The NDVI is calculated using the following equation with the near-infrared (NIR) and red (R) bands.
The second index employed is the normalized difference building index (NDBI) [23], which is used to determine built areas. The NDBI is calculated using the following equation according to a short wave infrared band (SWIR).
By the use of a vector file with the administrative limits of the analyzed region, a new file that delimits the study area was created, to crop the images. After the pre-processing of the images, the classification of the principal soil surfaces was carried out: vegetation, infrastructures, water, pasture, bare soil, shrubland, and outdoor. To carry out the classification, a maximum likelihood algorithm was used. This classifier assumes that the data sample follows a normal distribution to assign the probability that a pixel belongs to a predeterminate class. It offers robustness of calculation [24]. The chosen calculation coefficients were a scale factor of 1 and 4000 and a probability threshold of 0% in both cases. The scale factor was used to convert whole scale reflectance data into floating point values.
Once classified, the images were validated using both a visual check, to eliminate those combinations that exhibit significant problems when classifying, and a validation by a set of validation points on the image to calculate the confusion matrices. Additionally, the Cohen's kappa coefficient, overall accuracy, user accuracy, and producer accuracy were computed. For the two last parameters (user and producer accuracy) the confidence intervals at 95% confidence interval were computed using the adjusted Wald method [25,26]. This method is chosen since provides the best coverage when the sample size is less than 150 [27].
The combination of parameters that provided the better results were applied to the rest of the images.

Land Surface Temperature (LST) Map
The variation of temperature with depth is called a geothermal gradient, and raises with increasing depth in the Earth's crust. The thermal energy generated and stored in the Earth's core, mantle, and crust is transferred towards the surface, mostly by conduction. This conductive heat flow makes the temperature rise with increasing depth in the crust on average 25-30 • C/km [28]. With the aforementioned average thermal gradient, a 1 km geothermal drill in a dry rock formation would have a bottom temperature near 40 • C [28].
The first step to generate the soil temperature image is to calibrate the satellite images to obtain radiance images. In order to perform this step, it is necessary convert the digital levels (DL) to the spectral radiance at the sensor's aperture (L λ ) in radiance units (W·m −2 ·sr −1 ·µm −1 ) according to Equation (6). The parameters of the linear calibration (gain and offset) can be obtained from the image's metadata.
The LST map is carried out using ENVI software [29,30]. This software has implemented several algorithms to calculate the emissivity, but in the present research the emissivity alpha residual method was used [31]. This algorithm uses an approximation of the Wien law of the Plank function, so it offers results with a 1%-2% error for temperatures of 300 K (27 • C) at a wavelength of 10 µm [32]. The obtention of the LST is based on the law of Planck, which relates the energy emitted by a blackbody (emissivity = 1) with its temperature. However, while most of the natural elements are non-black bodies (0 < ε λ < 1), it is necessary to modify Planck's law according to its spectral emissivity (ε λ ) according to the following equation: where L(λ,T) is the spectral radiance of the natural object, L BB the blackbody radiance, λ the wavelength and c 1 and c 2 the first and second radiation constants. Based on Equation (7), if the radiance values have been corrected for atmospheric effects and the emissivity of the surface is known, the temperature value (T) can be obtained as follows: The above expression is based on the assumption that the surface cover behaves like a perfect Lambertian reflector. Please note that the result obtained from Equation (8) is expressed in Kelvin, so a transformation to Celsius degrees is required.

Geological Map
To add the geological map as a layer for the GIS analysis, it is necessary to digitize the national geological map. The first step is to reduce the number of informational classes present in the legend, which includes more than fifty different elements. This large number of elements will not improve the final analysis, in addition to complicating the digitization process and decision making. So, a reclassification of these classes into new ones was performed. The new classes were created according to the rock formation process [33]. A total of six informational classes were created, which are described below in Table 1 [34]. Table 1. Description of the reclassified geological classes from the Spanish national geological map.

Class Description
Intrusive plutonic rock Solidified igneous rock inside the Earth's crust, from a silicated magma.
Intrusive igneous rock Rock originated by the solidification of a silicated magma inside the Earth's crust at a shallow depth (not exceeding 3 km).
Extrusive or volcanic igneous rock Igneous rock produced by activity volcanic, usually of very fine or vitreous grain.

Metamorphic rock
Rock derived from a pre-existing rock, through mineralogical, chemical or structural changes, in response to physicochemical variations, mainly of pressure and temperature (there are excluded the rocks derived from processes of weathering or soil formation).
Detrital sedimentary rock Rock formed from a solid element that has been weathered by air or water, and may undergo changes when exposed to temperature or pressure.
Unconsolidated sediment Solid material that was formed by secondary sedimentation of previously weathered rocks, and/or by chemical and biochemical precipitation.
With the new classes defined, a digitization process was applied over the geological map. If necessary, a geo-referencing operation could be applied to ensure the correct fit of all individual map sheets. Once the informational classes (Table 1) were digitized, a topological revision was carried out to verify that the classes do not overlap each other or with themselves.

Geological Fault Map
By the use of ArcGIS, the faults are digitalized using the geological map (see previous subsection). They are split into two classes: one for known faults, and another one for supposed faults (Figure 2), With the new classes defined, a digitization process was applied over the geological map. If necessary, a geo-referencing operation could be applied to ensure the correct fit of all individual map sheets. Once the informational classes (Table 1) were digitized, a topological revision was carried out to verify that the classes do not overlap each other or with themselves.

Geological Fault Map
By the use of ArcGIS, the faults are digitalized using the geological map (see previous subsection). They are split into two classes: one for known faults, and another one for supposed faults ( Once the known and supposed faults are digitized as line features, the influence zones or buffers were computed using ArcGIS. The employed criterion for the selection of the distance thresholds is that proposed by Li et al., 2017 [14], in which seven influence zones separated from each other by a distance of one kilometer are created. Since the size of the study area of [14] encompasses 10,956 km 2 , an area greater than the present study case (2236.37 km 2 ), it is necessary to scale the distance between the influence zones. Therefore, for the present research, the distance interval is reduced to 250 m. After the buffer creation, it could appear that overlap among the influence zones of the known and supposed faults in certain areas occurred. So, a priority was given to the areas of known fault over the supposed ones. Finally, the different vector files generated were grouped into a single vector file that was later converted into raster for the final GIS analysis. Once the known and supposed faults are digitized as line features, the influence zones or buffers were computed using ArcGIS. The employed criterion for the selection of the distance thresholds is that proposed by Li et al., 2017 [14], in which seven influence zones separated from each other by a distance of one kilometer are created. Since the size of the study area of [14] encompasses 10,956 km 2 , an area greater than the present study case (2236.37 km 2 ), it is necessary to scale the distance between the influence zones. Therefore, for the present research, the distance interval is reduced to 250 m. After the buffer creation, it could appear that overlap among the influence zones of the known and supposed faults in certain areas occurred. So, a priority was given to the areas of known fault over the supposed ones. Finally, the different vector files generated were grouped into a single vector file that was later converted into raster for the final GIS analysis.

Bouguer Anomalies Map
Since satellite images only report the superficial behavior, additional data was employed, in this case related to the subsoil density distribution, by means of gravimetric anomaly data. More specifically, gravimetric anomalies were added to the classification, since they can be used to identify subsurface geological structures of interest. Gravimetric anomalies have an important role in geodesy and geophysics [35]. In geodesy they are used to define the figure of the Earth. The geophysical approach aims to highlight local anomalies, which are used to deduce variations in the mass and density of the geological surface, minimizing regional gravimetric anomalies, such as the presence of mountains. In this particular project, we focused on the geophysical approach, since Bouguer anomalies can be applied to the search for aquifers [36]. For more details related to the gravimetric anomalies, please refer to [37,38].

GIS Analysis
The multilayer analysis was carried out by a random forest decision classifier because of the complexity in decision-making and the large number of parameters to be taken into account. Random forest is an ensemble classifier that produces many classification and regression-like trees where each tree is generated from different bootstrapped samples of training data [39]. This algorithm has shown that one can obtain more accurate results than other classification techniques, such as maximum likelihood, when using data obtained from different sources [40]. The first step for the GIS analysis is the selection of the training zones. They are selected from the ground-truth information (Section 2.2.4.).
The training of the random forest in ArcGIS was carried out using a multilayer image, in which each of the layer corresponds to the maps generated (Sections 2.1.1-2.1.5). Finally, the mask of the area of interest was used to proceed with the final analysis of the occupied surface for each of the classes with geothermal potential, avoiding the non-interest areas. These areas correspond to surfaces classified as water, vegetation, or infrastructure.

Materials
Next, the materials employed are described. They encompass satellite imagery, gravimetric anomalies, geological maps, and resources from spatial data infrastructures.

Satellite Imagery
Due to the research requirements, there were different sources of satellite images considered due to the different spectral information provided.
Firstly, the Sentinel-2A sensor, which is part of the Copernicus program of the European Space Agency, is characterized by having a high-resolution camera with bands covering the visible, near infrared (NIR), and middle wave infrared (SWIR) regions, with spatial resolutions of 10, 20, or 60 m and with a radiometric resolution of 12 bits, and that allows the ability to obtain images with a revisit time of 10 days [41]. This sensor was employed to generate a land cover map (Section 2.1.1.). By means of this land cover it was possible to exclude areas where the exploitation of geothermal energy was not possible, such as water or infrastructure areas.
Secondly, the other type of images that were used were those of NASA's Advanced Spaceborne Thermal Emission Reflection Radiometer (ASTER) sensor [42], which allows the ability to obtain images from the same place every 16 days, with information in 14 bands that cover the visible-infrared (VNIR), SWIR, and thermal infrared (TIR), with spatial resolutions of 15, 30, and 90 m respectively. The thermal images of the ASTER sensor were used to make a map of land surface temperatures, of high spatial resolution (90 m), to seek the correspondence between soil and subsoil temperatures (Section 2.1.2.).

Geological Map
The geological model was obtained from the Spanish National Geological Map (MAGNA) [43], at a scale of 1:50,000. The MAGNA was used to create a map of rocks and minerals, since certain types of rocks and minerals are associated with geothermal activity (Section 2.1.3.). Due to the huge amount of types of rocks in the study area, all the available informational classes were simplified into six classes, to reduce the digitation and processing time. Once the vector file corresponding with the study area was loaded, a non-total correspondence with the administrative limits was detected. This indicates a georeferencing problem of the MAGNA map, therefore, all the individual map sheets were re-georeferenced. To carry it out, the first step was to combine all the individual map sheets into a single raster file, since this eases the subsequent process, and allows the ability to apply the corrections in a more homogeneous way. The georeferencing was carried out with the GDAL plugin [18] available in QGIS.
Another use of the MAGNA was to obtain information about geological faults located in the study area. If the rocks surrounding magma chambers are permeable or fractured, and there is Remote Sens. 2019, 11, 2403 8 of 20 underground water circulation, the latter captures the heat of the rocks, being able to ascend to the surface through cracks or faults, leading to the formation of water hot springs, geysers, fumaroles, and mud volcanoes [44]. So, faults are geological structures favorably correlated to the existence of geothermal reservoirs [45,46].
The MAGNA geological series classifies the faults in two groups, known and supposed faults. The faults were digitized using the re-georeferenced geological map; so, two vector files were created with information about them (Section 2.1.4.).

Gravimetric Anomalies
The data were obtained from the International Gravimetric Bureau [47]. The original format of 2 × 2 grid in geographical coordinates requires the reprojection to the coordinate reference system of the study case (in the present case European Terrestrial Reference System 1989 (ETRS89) Universal Transverse Mercator (UTM) 29N). Next, a resampling operation (nearest neighbor) was applied to homogenize the pixel size in relation to the temperature data obtained from the ASTER images, which have a spatial resolution of 90 m.

Ground-Truth Information
The ground truth for the classification was obtained by the International Renewable Energy Agency (IRENA) [48]. More specifically, the Spain map of geothermal potential distributed by the IRENA's spatial data infrastructure [49]. This map, compilated from the Spanish Institute for the Diversification and Saving of Energy (IDAE) [50], shows the accessible, available, and total potential of geothermal resources for Spain. This map was used to obtain training areas to carry out the classification using a random forest classifier. The index for defining the geothermal potential is the estimated linear heat extraction/injection capability of the ground, expressed in W·m

Results
Next, the description of the study case, data employed, and the experimental results and their interpretation are drawn.

Study Case
The study area is located in the province of Ourense, in the autonomous community of Galicia, Spain. Specifically, in the northwest area of the province of Ourense, as can be seen in Figure 4. The

Results
Next, the description of the study case, data employed, and the experimental results and their interpretation are drawn.

Study Case
The study area is located in the province of Ourense, in the autonomous community of Galicia, Spain. Specifically, in the northwest area of the province of Ourense, as can be seen in Figure 4. The province of Ourense is characterized by great orographic contrasts, with an altitudinal range between 60 and 2124 m. Among the mountains, there are a multitude of rivers, including the Sil and the Miño, which crosses the study area diagonally.

Results
Next, the description of the study case, data employed, and the experimental results and their interpretation are drawn.

Study Case
The study area is located in the province of Ourense, in the autonomous community of Galicia, Spain. Specifically, in the northwest area of the province of Ourense, as can be seen in Figure 4. The province of Ourense is characterized by great orographic contrasts, with an altitudinal range between 60 and 2124 m. Among the mountains, there are a multitude of rivers, including the Sil and the Miño, which crosses the study area diagonally.  In Galicia, in general, it is possible to find zones of low to medium enthalpy between 200 and 1500 m [51]. The suitability of this area for the application of the present methodology is due, in part, to the presence of several thermal sources that were already known in Celtic and Roman times. As for example, the Ourense city was built around the Burgas' springs of thermal water. In addition, there are other areas with geothermal potential in the towns of Celanova and Baños de Molgas.

Data
For the Sentinel-2, three images were downloaded from the Copernicus image server [52] with a 1C processing degree. The images were from the 17/06/2017 and 24/06/2017. The other satellite images employed were the ASTER. In total, four of daytime images were downloaded from Earthdata [53] were needed for the dates of 07/06 and 23/06 of 2017. The Bouguer anomaly data were obtained from the International Gravimetric Bureau (BGI) [47] in a grid format of 2 × 2 . The geological data were obtained from the geological map of Spain (MAGNA); more specifically the sheets 153, 154, 186, 187, 188, 224, 225, 226, 262, 263, and 264 were needed to cover the complete study area.

Classification Validation
The first validation stage of the surface covers was a visual check, to dismiss the combination of algorithms and coefficients that provided non-coherent results, as for example, the classification of infrastructure areas. The maximum likelihood (ML) classification shows no significant difference among the different coefficient tested (scale factor), so, only the ML classification for a scale factor of 4000 was validated.
Through the confusion matrix (Table 2) it was possible to verify that the classification accuracy is high, as stated by an overall accuracy higher than 90% (up to 94.7%), and a kappa coefficient higher than 0.88 (up to 0.94). Although the high accuracy values obtained by the three classifications, due to the limitations of the approach, water zones, in which clouds were reflected, could not be classified correctly, and were assigned to the infrastructure class. The validated classifier was applied to the satellite images to generate the land cover mosaic ( Figure 5). An overlap priority was given to the image with the higher kappa coefficient.

Results
Next, the different layers that will be used in the GIS analysis are described in Figures 6-9. The LST map in Celsius degrees ( Figure 6) was obtained according to the methodology described in Section 2.1.2. Additionally, a cloud mask was created, since these zones do not contain valid soil temperature values. On the map it can be seen that the higher temperature zones correspond mostly to industrial or urban areas, while the lower temperature zones mostly correspond to the cloud areas, covered by the cloud mask.

Results
Next, the different layers that will be used in the GIS analysis are described in Figures 6-9. The LST map in Celsius degrees ( Figure 6) was obtained according to the methodology described in Section 2.1.2. Additionally, a cloud mask was created, since these zones do not contain valid soil temperature values. On the map it can be seen that the higher temperature zones correspond mostly to industrial or urban areas, while the lower temperature zones mostly correspond to the cloud areas, covered by the cloud mask.
LST map in Celsius degrees ( Figure 6) was obtained according to the methodology described in Section 2.1.2. Additionally, a cloud mask was created, since these zones do not contain valid soil temperature values. On the map it can be seen that the higher temperature zones correspond mostly to industrial or urban areas, while the lower temperature zones mostly correspond to the cloud areas, covered by the cloud mask. The geological map (Figure 7) was created according the procedure described in Section 2.1.3. For the MAGNA re-georeferencing a total, thirty-six points were employed and distributed as homogeneously as possible for an optimal calculation and reprojection. The three-sigma criterion was applied to discard possible outliers [54]. Using the MAGNA data source, the fault (known and supposed) of the study area were digitized. Using the approached proposed in [14], the influences zones were computed with a distance interval of 250 m (Section 2.1.4.). The resulting layer is shown in Figure 8. The geological map (Figure 7) was created according the procedure described in Section 2.1.3. For the MAGNA re-georeferencing a total, thirty-six points were employed and distributed as homogeneously as possible for an optimal calculation and reprojection. The three-sigma criterion was applied to discard possible outliers [54].
Using the MAGNA data source, the fault (known and supposed) of the study area were digitized. Using the approached proposed in [14], the influences zones were computed with a distance interval of 250 m (Section 2.1.4.). The resulting layer is shown in Figure 8.
The last layer for the GIS analysis was the gravimetric anomalies map (Bouguer anomalies) as stated in Section 2.1.5. The resampled raster map, cropped for the study area extension, is shown in Figure 9. The low spatial resolution of the data source (grid of 2 × 2 ) is a significant constraint of the present layer. Using the MAGNA data source, the fault (known and supposed) of the study area were digitized. Using the approached proposed in [14], the influences zones were computed with a distance interval of 250 m (Section 2.1.4.). The resulting layer is shown in Figure 8. The last layer for the GIS analysis was the gravimetric anomalies map (Bouguer anomalies) as stated in Section 2.1.5. The resampled raster map, cropped for the study area extension, is shown in Figure 9. The low spatial resolution of the data source (grid of 2' × 2') is a significant constraint of the present layer. Once the five layers were computed and combined into a single raster file, the GIS analysis was carried out using a random forest classifier. The first step involves the classifier training using the IRENA map [49]. A total of 569 training polygons were distributed in the study area (Figure 3) for the four classes of the geothermal potential provided by IRENA, as stated in Table 3. The parameters of the RF classification are a depth = 8 and number of trees = 100. For the assessment of the RF, k-fold cross-validation classification is employed, which split the training data into k equal-sized partitions [55]. The classifier was trained using k-1 folds, and the error was computed by checking the obtained classifier with the remaining fold. As a result, the classifier's estimated error was the average value of the errors committed in each fold. For the study case, k was set to 10. The result of the final classification is shown in Figure 10.  Once the five layers were computed and combined into a single raster file, the GIS analysis was carried out using a random forest classifier. The first step involves the classifier training using the IRENA map [49]. A total of 569 training polygons were distributed in the study area (Figure 3) for the four classes of the geothermal potential provided by IRENA, as stated in Table 3. The parameters of the RF classification are a depth = 8 and number of trees = 100. For the assessment of the RF, k-fold cross-validation classification is employed, which split the training data into k equal-sized partitions [55]. The classifier was trained using k-1 folds, and the error was computed by checking the obtained classifier with the remaining fold. As a result, the classifier's estimated error was the average value of the errors committed in each fold. For the study case, k was set to 10. The result of the final classification is shown in Figure 10.  Figure 10. Geothermal potential map.
The classification results are evaluated by means of the overall accuracy, Cohen's Kappa index, and the receiving operating characteristic (ROC) curve area. The area under the ROC curve (AUC) provides a comparison between the predicted and actual instance values in a classification by measuring the precision and the recall [56]. The RF classification for the cross-validation achieved a Kappa index of 88.4%, an overall accuracy of 94.0% and an AUC equal to 98.7%. Please note that the last values correspond to the weighted averaged since the present classification employs a multi-class model [57].

Discussion
The result of the classification, as can be seen in Figure 11a, shows that areas with potential greater than 80 W·m −1 (highlighted in red) are found mainly in the southeast and east of the study area. Potential zones between 35 and 50 W·m −1 , as expected, are located near the potential area, greater than 80 W·m −1 , located in the east, in addition to a small area located near the center of the study area. The surface of each geothermal class is summed up in Table 4.
As can be seen in Figure 11b, there are a large number of common areas between the results obtained from the random forest classification and the IRENA map. Besides, it can be seen that the map obtained from the proposed classifier, the area related to a geothermal potential of 40-60 W·m −1 is larger than the stated by ground truth, which categorizes as 60-80 W·m −1 . Comparing this area with the Bouguer gravimetric anomalies map we can notice that the area of potential greater than 80 W·m −1 is located where the gravimetric anomalies are at its lowest, meaning that there is a relationship between them, as it was described in several works like [14,58,59].
Paying attention to the geological map ( Figure 7) and the geothermal potential map (Figure 10) there is a clear resemblance, as some areas look similar in both maps. For example, the area of potential greater than 80 W·m −1 located in the bottom east of the study area occupies the same area in the geological map as the loose sediments. Another example that stands out more is the case of the area of potential between 60 and 80 W·m −1 , which occupies the same area as the plutonic or intrusive igneous rocks. This relationship between some types of rocks and sediments is not unfamiliar as some other researchers have noticed this relationship before [10,12]. The classification results are evaluated by means of the overall accuracy, Cohen's Kappa index, and the receiving operating characteristic (ROC) curve area. The area under the ROC curve (AUC) provides a comparison between the predicted and actual instance values in a classification by measuring the precision and the recall [56]. The RF classification for the cross-validation achieved a Kappa index of 88.4%, an overall accuracy of 94.0% and an AUC equal to 98.7%. Please note that the last values correspond to the weighted averaged since the present classification employs a multi-class model [57].

Discussion
The result of the classification, as can be seen in Figure 11a, shows that areas with potential greater than 80 W·m −1 (highlighted in red) are found mainly in the southeast and east of the study area. Potential zones between 35 and 50 W·m −1 , as expected, are located near the potential area, greater than 80 W·m −1 , located in the east, in addition to a small area located near the center of the study area. The surface of each geothermal class is summed up in Table 4.
As can be seen in Figure 11b, there are a large number of common areas between the results obtained from the random forest classification and the IRENA map. Besides, it can be seen that the map obtained from the proposed classifier, the area related to a geothermal potential of 40-60 W·m −1 is larger than the stated by ground truth, which categorizes as 60-80 W·m −1 . Comparing this area with the Bouguer gravimetric anomalies map we can notice that the area of potential greater than 80 W·m −1 is located where the gravimetric anomalies are at its lowest, meaning that there is a relationship between them, as it was described in several works like [14,58,59].
was said before, the areas with the highest temperatures in the map correspond with areas of infrastructures, whereas in other areas the temperature variation is not enough to establish comparisons. The relationship between temperature in the subsoil and the surface using remote sensing techniques has been a goal of many works like [14,60] making it a reliable technique to apply in a geothermal study.  To refine the areas related to geothermal reservoirs, the land cover map was used to take into account those areas that cannot be exploited because they are occupied by infrastructures (roads, buildings, etc.) water and vegetation zones, and areas without reliable data, as the zones affected by the cloud cover during the image acquisition. As can be seen in Figure 12 and in Table 5, the number of potential zones has been reduced to almost half for the classes 35-50 W·m −1 and 40-60 W·m −1 .  Paying attention to the geological map ( Figure 7) and the geothermal potential map (Figure 10) there is a clear resemblance, as some areas look similar in both maps. For example, the area of potential greater than 80 W·m −1 located in the bottom east of the study area occupies the same area in the geological map as the loose sediments. Another example that stands out more is the case of the area of potential between 60 and 80 W·m −1 , which occupies the same area as the plutonic or intrusive igneous rocks. This relationship between some types of rocks and sediments is not unfamiliar as some other researchers have noticed this relationship before [10,12].
As for the geological faults map and the LST map, it is difficult to associate them with the geothermal potential map directly. In the case of the geological faults map (Figure 8), both known and unknown faults are distributed all over the study area in areas occupied by different types of geothermal potential classes. In the case of the LST map ( Figure 6) this is due to the fact that, as it was said before, the areas with the highest temperatures in the map correspond with areas of infrastructures, whereas in other areas the temperature variation is not enough to establish comparisons. The relationship between temperature in the subsoil and the surface using remote sensing techniques has been a goal of many works like [14,60] making it a reliable technique to apply in a geothermal study.
To refine the areas related to geothermal reservoirs, the land cover map was used to take into account those areas that cannot be exploited because they are occupied by infrastructures (roads, buildings, etc.) water and vegetation zones, and areas without reliable data, as the zones affected by the cloud cover during the image acquisition. As can be seen in Figure 12 and in Table 5  To assess the influence of the various elements of the presented workflow in the detecting ability of areas of geothermal potential, a sensitivity study was carried out. It aimed to highlight the importance the various features (see Figure 1) in detecting locations with the RF classifier.
It was considered a total of six combinations with three elements, namely: geological map, faults, and gravimetric anomalies. The land cover and LST map are included always in the assessment. A RF classification with a depth = 8 and number of trees = 100 trees was applied with a k-fold cross-validation with k = 10. In Table 6, the overall accuracy, the Kappa index, and the area under the ROC curve (AUC) for the six combinations, as well as the reference are reported.   To assess the influence of the various elements of the presented workflow in the detecting ability of areas of geothermal potential, a sensitivity study was carried out. It aimed to highlight the importance the various features (see Figure 1) in detecting locations with the RF classifier.
It was considered a total of six combinations with three elements, namely: geological map, faults, and gravimetric anomalies. The land cover and LST map are included always in the assessment. A RF classification with a depth = 8 and number of trees = 100 trees was applied with a k-fold cross-validation with k = 10. In Table 6, the overall accuracy, the Kappa index, and the area under the ROC curve (AUC) for the six combinations, as well as the reference are reported. Table 6. Sensitivity analysis. Results of a k-fold cross-validation (k = 10) of a random forest classification (depth = 8 and number of trees = 100). As it is shown in the previous table, there is a common behavior for the three variables considered. The faults' influence zone is the class with the lowest influence in the final results, lowering the Kappa index about 5.4% when it is absent. However, when the geological map or the gravimetric anomalies do not participate in the classification the Kappa index worsens between 14.2% and 16.0% respectively. In the last three rows of Table 6 the effect is shown when only three variables are considered: The land cover and LST map plus the highlighted variable. First of all, the results when only the faults are taken into account are very significant, yielding results similar to a random classification (Kappa = 0, and AUC = 0.5). Secondly, the geological map plays the most significant role, with a decrease of the Kappa index of only 16.6%, whereas the AUC remains approximately 90%. Finally, the inclusion of the gravimetric anomalies is not sufficient for an adequate classification, as stated by the low Kappa index (56%).

Geological
In order to assess the ability of the RF algorithm to forecast the geothermal potential, the proposed methodology is tested, training the RF classifier with a non-random subset of the training polygons. Next, the overall accuracy of the rest of the ground truth information, which has stayed completely unknown to the RF during training, is quantified. So, the RF is trained using only the training polygons located in the southeast quadrant of the study area, where the four classes of geothermal potential are present. A total of 171 polygons are considered, namely 30% of the original set. The RF configuration remains the same: depth = 8, number of trees = 100, and a k-fold validation with k = 10. The RF classification for the cross-validation achieved a Kappa index of 86.4%, an overall accuracy of 92.7%, and an AUC equal to 98.6%. These values are pretty similar to those obtained for the complete study area (see previous section). The subset RF classification is applied to the southeast quadrant of 529.8 km 2 (23.8% of the total area of the study region). Next, the trained classifier is applied to areas progressively larger up to the complete study region (2227.31 km 2 ). The overall accuracy and Kappa index for the different areas is evaluated in relation to the original RF classification (Figure 11a) as shown in Table 7. As shown in Table 7, the results of the subset RF classification diverges from the reference classification ( Figure 11a) proportionally to the distance from the training area. It is significant that the Kappa index decrease abruptly when the tested region is about 2.8 times the trained area and then the values decrease with a soft trend. For predictions up to 2.1 times the original training area, the proposed methodology is able to achieve an adequate classification with a Kappa > 75% [61,62].
As future lines of improvement, the enhancement of the initial dataset, as for example the generation of gravimetric anomalies by field observations, for a fine detection of geothermal potential zones, is recommended. Another improvement would be a geological survey of the study area, which would imply an enrichment of the geological classes as well as the reclassified informational classes. However, the potential improvement of the final precision will imply a significant increase in the cost of the approach; by these reasons, the present research, based on open/free data sources, provides an optimal baseline according to the authors.
The addition of data related to aquifer zones would allow obtaining the areas suitable for the exploitation of geothermal energy through vertical circuits, which take advantage of geothermal energy by taking and reinjecting water from aquifers. The thermal data would be improved by satellite night thermal images that would allow the ability to compare with the day images, and to derive thermal inertia maps.
Finally, it would be of interest to carry out geothermal drill tests in the areas identified with great geothermal potential, in addition to several samples in other areas spread over the study area, to validate the technique and the results obtained.

Conclusions
The use of geomatic techniques, such as remote sensing and geographical information systems allowed obtaining areas of geothermal potential in a simple way. The Sentinel-2A images allowed the creation of a land cover map in a detailed spatial resolution (between 10 and 20 m) and with zero cost. However, the use of these images together with the use of a spectral signature library did not allow the creation of a geological map as expected in the early stages of the work (probably due to the variation of conditions in the data acquisition and the library ones). Another factor related to this limitation is the scarce number of bands in the mid-infrared region, where it is easier to differentiate rocks [63]. The other image set employed (from ASTER), allowed the ability to obtain maps of soil temperatures at a fairly detailed spatial resolution (90 m) for a thermal sensor, and without any acquisition cost. The gravimetric anomalies, such as the Bouguer anomalies employed in the present work, are suitable for the search of geothermal potential areas due to their relationship with aquifer zones. Finally, the use of random forest eased the decision making in projects with a large amount of different and complex data, and where categorical and continuous data are mixed.
To incorporate geothermal resources into the energy network, it is necessary to optimize energy distribution and service organization. The availability of spatial information on the geothermal capacities of each zone allows the optimization of the energy network in terms of efficiency. Since the drilling costs are important, the identification of the reservoirs with higher potential will allow the reduction of these costs, such as the drilling length or the number of drill tests. The proposed workflow allows an easy generalization to different scenarios and regions and can be enriched with additional information layers.