Tools for Remote Exploration: A Lithium (Li) Dedicated Spectral Library of the Fregeneda–Almendra Aplite–Pegmatite Field

Abstract: The existence of diagnostic features in the visible and infrared regions makes it possible to use reflectance spectra not only to identify mineral assemblages but also for calibration and classification of satellite images, considering lithological and/or mineral mapping. For this purpose, a consistent spectral library with the target spectra of minerals and rocks is needed. Currently, there is big market pressure for raw materials including lithium (Li) that has driven new satellite image applications for Li exploration. However, there are no reference spectra for petalite (a Li mineral) in large, open spectral datasets. In this work, a spectral library was built exclusively dedicated to Li minerals and Li pegmatite exploration through satellite remote sensing. The database includes field and laboratory spectra collected in the Fregeneda–Almendra region (Spain–Portugal) from (i) distinct Li minerals (spodumene, petalite, lepidolite); (ii) several Li pegmatites and other outcropping lithologies to allow satellite-based lithological mapping; (iii) areas previously misclassified as Li pegmatites using machine learning algorithms to allow comparisons between these regions and the target areas. Ancillary data include (i) sample location and coordinates, (ii) sample conditions, (iii) sample color, (iv) type of face measured, (v) equipment used, and for the laboratory spectra, (vi) sample photographs, (vii) continuum removed spectra files, and (viii) statistics on the main absorption features automatically extracted. The potential future uses of this spectral library are reinforced by its major advantages: (i) data is provided in a universal file format; (ii) it allows users to compare field and laboratory spectra; (iii) a large number of complementary data allow the comparison of shape, asymmetry, and depth of the absorption features of the distinct Li minerals.
Dataset: http://doi.org/10.5281/zenodo.4575375.


Summary
The use of reflectance spectroscopy to identify minerals through diagnostic absorption features in the visible and infrared regions has been described by several authors in recent decades [1][2][3][4]. Due to these diagnostic features, the acquired mineral spectra can be employed in several knowledge-based satellite image classification approaches to delineate target areas for mineral occurrences [5,6]. Moreover, field spectra allow ground-checking of the remotely sensed data (e.g., [7][8][9]) and validation of the atmospheric corrections made to satellite images.
The recent growing economic importance of lithium (Li), mainly due to its application in batteries for electric cars, has triggered several attempts to use satellite images to target the occurrence of Li minerals and Li pegmatites [10][11][12][13][14]. Reference spectra for some of the most important Li minerals, such as spodumene and lepidolite, can be found in the United States Geological Survey (USGS) [15] and ECOSTRESS [16] spectral libraries. Nonetheless, there are no reference spectra for petalite in these open domain spectral libraries. The Geological Survey of Brazil (CPRM) has been trying to address this issue by compiling a spectral database with a Li minerals' dedicated section, but so far the petalite diagnostic features have not been identified [17].
To fill in this gap and to complement the development of image classification algorithms for Li exploration, a spectral database was built in this work based on samples collected in the Fregeneda-Almendra aplite-pegmatite field (Figure 1). In this region, spodumene, petalite, and lepidolite minerals occur in evolved pegmatites [18,19] that intruded metasedimentary rocks belonging to the "Complexo Xisto-Grauváquico" (CXG) [20].
Figure 1. Location of the Fregeneda-Almendra pegmatite field in the Iberian Peninsula. Diamonds represent pegmatites containing Li minerals: green-petalite; blue-spodumene; red-lepidolite; orange-spodumene + lepidolite [18,19]. The map projection is Universal Transverse Mercator zone 29N from the WGS84 datum.
The spectral library is composed of field and laboratory spectra not only of Li minerals (spodumene, petalite, lepidolite) but also of the main outcropping lithologies of the Fregeneda-Almendra area (granitoid rocks, CXG metasediments, Li pegmatite). These rock spectra were mainly acquired in areas of good exposure so they could be used as training areas for satellite-based lithological mapping. Additionally, to allow further investigation of the ability to discriminate Li minerals and Li pegmatites from other lithologies, the spectra from areas misclassified as Li pegmatites using machine learning algorithms [13] are also provided. Comparing these data allows users to evaluate the degree of spectral similarity between the target minerals/rocks and the remaining within-scene elements.
Complementary data such as (i) the sample location and coordinates (when available), (ii) degree of alteration of the sample, (iii) sample color, (iv) type of face measured, and (v) equipment used, are provided for each spectrum. Spectra acquisition and curation are described thoroughly. For the laboratory spectra, also available are (i) sample photographs, (ii) respective continuum removed spectra files, and (iii) details on the main absorption features automatically extracted.
This spectral database was established within the scope of the "Lightweight Integrated Ground and Airborne Hyperspectral Topological Solution" (LIGHTS) project, whose goal is to develop a tool that combines remote sensing data acquired at different scales with geological and geochemical data to rapidly identify target areas for Li exploration [21,22]. However, such a database could be useful for other ongoing research projects, namely: (i) the fiber laser plasma spectroscopy system for real-time element analysis (FLaPsys) project, which aims at developing an advanced spectroscopy system capable of real-time element identification and quantification mainly applied to Li mineralizations [23], and in which laser-induced breakdown spectroscopy (LIBS) will be correlated with the visible and infrared data acquired; (ii) the new exploration tools for European pegmatite greentech resources (GREENPEG) project, which aims at improving responsible exploration in Europe for pegmatites through the development of integrated, multi-method exploration toolsets that include satellite image processing and airborne and ground-based geophysics and geochemical approaches [24]. Part of this spectral database was already the basis for some publications [25,26], but more works are expected in the future since there is great potential in this dataset, whose main advantages include: (i) the data are provided in a universal text file format that is not dependent on software; (ii) field and laboratory spectra can be compared for 52 coincident spots; (iii) the details and statistics of the extracted features allow the user to compare the shape, asymmetry, and depth of the absorption features of the distinct Li minerals, including petalite.

Data Description
The spectral database is divided into two subsets: the first contains the spectra collected in the laboratory and the second the spectra acquired in the field. The spectra were organized into categories within each subset according to their application purpose (Table 1). To allow rapid and easy identification, logical codes are embedded in each spectrum name (Table 1). Consequently, information such as the location of the sample and the analyzed lithology/mineralogy can be readily extracted from the spectrum name alone. Besides the codes of Table 1, each spectrum has its associated measurement number. All spectra are provided in a universal UTF-8 text file format that can be read in any proprietary or open-source software, representing measurements made in the visible and near-infrared (VNIR) and shortwave infrared (SWIR) regions. From a total of 340 spectra collected in the laboratory, 84 represent Li minerals, 196 correspond to the outcropping lithologies of the Fregeneda-Almendra area that can be used as training areas for satellite image classification, and 60 spectra were collected from samples in Li pegmatite false-positive areas identified in previous satellite image classification attempts [13]. Additionally, 75 field spectra are presented in the spectral database (35 measurements of Li minerals and 40 measurements of distinct Li pegmatites). As mentioned in Table 1, two spectrometers were used for spectral measurements. For the data acquired with the SR-6500 equipment, the UTF-8 text files are composed of a 28-line header containing information about the equipment and acquisition settings, followed by two columns: the first with the wavelength (in nanometers, nm) and the second with the measured reflectance (in percentage). In the case of the spectra acquired with the ASD FieldSpec 4, there are just two columns, one with the wavelength (in nm) and the other with absolute reflectance values.
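The two file layouts described above can be read with a few lines of Python. The following is a minimal sketch under stated assumptions: the function name `read_spectrum`, the whitespace-delimited columns, and the synthetic file content are illustrative and not taken from the dataset; only the 28-line header and the percentage scale of the SR-6500 files come from the description.

```python
import io
import numpy as np

def read_spectrum(source, header_lines=0, percent=False):
    """Read a two-column (wavelength, reflectance) text spectrum.

    header_lines: number of leading lines to skip (28 for SR-6500 files,
                  0 for ASD FieldSpec 4 files, according to the text).
    percent:      True if reflectance is stored in percent (SR-6500 files).
    Assumes whitespace-delimited columns, which should be checked per file.
    """
    data = np.loadtxt(source, skiprows=header_lines, encoding="utf-8")
    wavelength, reflectance = data[:, 0], data[:, 1]
    if percent:
        reflectance = reflectance / 100.0  # convert to absolute reflectance
    return wavelength, reflectance

# Synthetic stand-in for an SR-6500 file: 28 header lines, then the data.
header = "".join(f"Header line {i}: value\n" for i in range(28))
body = "350.0\t12.5\n351.0\t12.7\n352.0\t13.0\n"
wl, refl = read_spectrum(io.StringIO(header + body), header_lines=28, percent=True)
```

Converting the SR-6500 values to absolute reflectance puts both instruments on the same scale before any comparison.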
For each subset (field and laboratory spectra) there is a *.xlsx table containing important ancillary data: (i) the sample location and coordinates, (ii) the degree of alteration of the sample, (iii) the sample color, (iv) the type of face measured, and (v) the equipment used. The coordinates are only available for samples collected in situ; samples collected in ore stockpiles, for example, do not have this kind of information registered. For the spectra acquired in the laboratory, additional information is also provided, namely: (i) sample photographs with the analyzed spots highlighted; (ii) a UTF-8 text file with the respective continuum removed spectra (with absolute reflectance values); (iii) PNG files showing the main absorption features of each spectrum and a CSV file summarizing the main statistics of each feature (Section 3.2).

Spectra Acquisition
Two field campaigns were carried out in February and July of 2020 to collect the data to build the spectral library. At the time of the first campaign, there was no portable spectrometer available at the University of Porto to collect spectra in situ; therefore, representative samples from 25 locations were taken to the GeoRessources laboratory (University of Lorraine), where the spectral measurements were performed with the SR-6500 equipment. Due to difficulties in accessing pegmatites containing spodumene in the field, samples from an existing Li mineral collection at the University of Porto were also analyzed. In the second field survey, favorable weather conditions (absence of clouds) allowed collection of spectra from several Li pegmatites and the different Li minerals in 35 locations dispersed along the Fregeneda-Almendra pegmatite field. In this case, the ASD FieldSpec 4 spectrometer was used. Samples were retrieved in 28 of these 35 locations to collect further spectra under a controlled environment at the University of Porto, using the same equipment. Both in the field and in the laboratory, the two spectrometers were calibrated using a Spectralon (Labsphere) plate with a maximum reflectance higher than 95% for the 250-2500 nm region and higher than 99% for the 400-1500 nm interval. In the field, the calibration was repeated every time the solar lighting conditions changed (about every 30 min). Regarding the acquisition settings, to improve the signal-to-noise ratio, each of the spectra acquired in the laboratory represented the average of 40 scans. To expedite spectra acquisition in the field, this value was lowered to 30 scans. However, when using the ASD FieldSpec 4 equipment, five spectra were acquired in each analyzed spot. These five spectra were averaged afterward (Section 3.2) and only the final spectrum was provided in the spectral library. The characteristics of each spectrometer are compared in Table 2.
The major differences are observed in terms of the spectral resolution, with the SR-6500 outperforming the ASD FieldSpec 4.
Table 2 (excerpt): 0.8 × 10⁻⁹ W/cm²/nm/sr @ 400 nm; 1.0 × 10⁻⁹ W/cm²/nm/sr @ 700 nm; 0.3 × 10⁻⁹ W/cm²/nm/sr @ 1500 nm; 1.2 × 10⁻⁹ W/cm²/nm/sr @ 1400 nm; 5.8 × 10⁻⁹ W/cm²/nm/sr @ 2100 nm; 1.9 × 10⁻⁹ W/cm²/nm/sr @ 2100 nm; Weight 4.99 kg (11 pounds

Spectra Curation and Treatment
The spectra acquired with the ASD FieldSpec 4 were post-processed using the ASD ViewSpec Pro™ software [29]. The first step consisted of averaging the five spectra collected for each spot. A splice correction was then applied to the averaged spectrum to remove the offset between detectors that is caused by differences in their sensitivity [29] (Figure 2). Finally, the spectra were exported to UTF-8 text files. For the spectra collected using the SR-6500, only a simple conversion from the SED format to UTF-8 text files was necessary. The SpectraGryph software [30] was used to inspect each spectrum individually and exclude those with measurement errors or with noticeable noise caused, for example, by the presence of vegetation.
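The averaging and splice-correction steps can be illustrated with a simplified sketch. This is not the ViewSpec Pro algorithm: it assumes a purely additive detector offset and simply shifts the VNIR and SWIR2 segments so that they meet the SWIR1 segment at the splice points. The splice wavelengths (1000 and 1800 nm) follow the Figure 2 caption; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def average_scans(scans):
    """Average repeated spectra of the same spot (one spectrum per row)."""
    return np.mean(np.asarray(scans, dtype=float), axis=0)

def splice_offset_correction(wavelength, reflectance, splices=(1000.0, 1800.0)):
    """Simplified additive splice correction anchored on the SWIR1 segment.

    Assumption: the detector mismatch is a constant offset per segment.
    The slope of the curve at the splice is ignored in this sketch.
    """
    wl = np.asarray(wavelength, dtype=float)
    corrected = np.asarray(reflectance, dtype=float).copy()
    vnir = wl <= splices[0]
    swir1 = (wl > splices[0]) & (wl <= splices[1])
    swir2 = wl > splices[1]
    # Shift VNIR so its last sample meets the first SWIR1 sample.
    corrected[vnir] += corrected[swir1][0] - corrected[vnir][-1]
    # Shift SWIR2 so its first sample meets the last SWIR1 sample.
    corrected[swir2] += corrected[swir1][-1] - corrected[swir2][0]
    return corrected

# Synthetic flat spectrum with artificial steps at both splice points.
wl = np.arange(350.0, 2501.0)
stepped = np.full(wl.shape, 0.40)
stepped[wl <= 1000.0] += 0.02
stepped[wl > 1800.0] -= 0.01
scans = [stepped + d for d in (-0.002, -0.001, 0.0, 0.001, 0.002)]

mean_spectrum = average_scans(scans)
spliced = splice_offset_correction(wl, mean_spectrum)
```

Anchoring on SWIR1 mirrors the rationale given in the Figure 2 caption, where this detector is described as the most stable to temperature changes.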
Figure 2. Splice correction process: (a) before and (b) after applying the correction. The VNIR and SWIR2 ranges are adjusted using the endpoints (~1000 to 1800 nm) and shape of the curve of the SWIR1 range since this detector is more stable to temperature changes [29].
To allow a more detailed analysis of each spectrum, a hull quotient procedure was applied to remove the continuum and normalize the spectra (Figure 3). This was accomplished in a Python environment using the pysptools library [31] based on the technique proposed by Clark and Roush [2]. First, a convex hull was fitted to the spectrum (Figure 3a) connecting the hull points (corresponding to local maxima) [32]. Afterwards, the convex hull (continuum) was removed through a hull-quotient process, in which the measured reflectance values were divided by the convex hull [2,33,34]. The continuum removal process is of great importance because it not only allows the comparison of the data obtained with different spectrometers but also enhances the absorption features and allows their correct determination [32][33][34]. The pysptools library was used to automatically extract the main absorption features (Figure 3b-d) and to calculate their associated statistics according to the work of Kokaly [35]. To avoid the extraction of non-significant features, a baseline value of 0.93 was employed: all features below the baseline were kept and the ones above it were rejected [31] (Figure 3). For each spectrum, a CSV file was automatically created summarizing the statistics of each absorption feature. Table 3 shows a list of the extracted statistics and the respective abbreviation used in the CSV files.
The Supplementary Materials provided show the adaptation of the pysptools functions to batch process this particular spectral library.
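The hull-quotient step can be sketched in plain NumPy. The code below is an illustration of the general technique (upper convex hull, then division of the reflectance by the interpolated hull), not the pysptools implementation or the authors' batch script; the function names and the synthetic spectrum with a single absorption at 2200 nm are assumptions made for the example.

```python
import numpy as np

def upper_hull_indices(x, y):
    """Indices of the upper convex hull of points sorted by increasing x."""
    hull = []
    for i in range(len(x)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            cross = (x[i2] - x[i1]) * (y[i] - y[i1]) - (y[i2] - y[i1]) * (x[i] - x[i1])
            if cross >= 0:  # middle point lies on or below the chord: drop it
                hull.pop()
            else:
                break
        hull.append(i)
    return np.array(hull)

def continuum_removed(wavelength, reflectance):
    """Divide the spectrum by its convex-hull continuum (hull quotient)."""
    wl = np.asarray(wavelength, dtype=float)
    refl = np.asarray(reflectance, dtype=float)
    idx = upper_hull_indices(wl, refl)
    continuum = np.interp(wl, wl[idx], refl[idx])
    return refl / continuum, continuum

# Synthetic spectrum: sloping continuum with one absorption at 2200 nm.
wl = np.arange(350.0, 2501.0)
slope = 0.3 + 1e-4 * (wl - 350.0)
spectrum = slope * (1.0 - 0.2 * np.exp(-0.5 * ((wl - 2200.0) / 20.0) ** 2))

cr, continuum = continuum_removed(wl, spectrum)
band_center = wl[np.argmin(cr)]   # wavelength of the deepest point
band_depth = 1.0 - cr.min()       # depth relative to the continuum
```

After the division, values on the hull equal 1 and absorptions appear as dips below 1, which is what makes spectra from different instruments comparable.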

User Notes
This spectral database has the advantage of presenting field and laboratory spectra for 52 analyzed spots, allowing comparisons and evaluation of the influence of the acquisition conditions on the final spectra. Additionally, field spectra are expected to better match the spectral signatures acquired by airborne sensors. However, as can be seen in Figure 4, field spectra present additional sharp bands around 1400 and 1800 nm caused by atmospheric water that should be removed before additional processing.
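A simple way to discard the atmospheric water bands before further processing is to mask them. The sketch below is only an assumption-laden example: the window limits are hypothetical values placed around the 1400 and 1800 nm positions mentioned above and should be adjusted after inspecting the actual field spectra; the function name is likewise illustrative.

```python
import numpy as np

def mask_water_bands(wavelength, reflectance,
                     windows=((1350.0, 1450.0), (1750.0, 1850.0))):
    """Return a copy of the spectrum with the given windows set to NaN.

    The default windows are hypothetical and centred on the band
    positions quoted in the text; they are not taken from the dataset.
    """
    wl = np.asarray(wavelength, dtype=float)
    masked = np.asarray(reflectance, dtype=float).copy()
    for low, high in windows:
        masked[(wl >= low) & (wl <= high)] = np.nan
    return masked

# Synthetic flat field spectrum sampled every 1 nm.
wl = np.arange(350.0, 2501.0)
field = np.full(wl.shape, 0.35)
masked = mask_water_bands(wl, field)
```

Using NaN instead of deleting samples keeps the wavelength axis aligned with the laboratory spectra of the same spot.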
A final note concerns the usage of the automatically extracted features that are available for all spectra acquired in the laboratory. As mentioned above, a baseline value was set to determine which absorption features to extract. Therefore, setting a different baseline may produce different results and change the features' statistics. Additionally, different results can be obtained depending on whether the features are extracted from the whole spectrum or from a spectral subset matching a given region of interest [34]. According to the user's objectives, one of the two methods may be more appropriate.
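The effect of the baseline can be seen on a synthetic continuum-removed spectrum. The sketch below is a simplified stand-in for the pysptools extraction: it treats each contiguous run of samples below the baseline as one feature, which is an interpretation adopted here for illustration. The function name, the two synthetic absorptions, and the alternative baseline of 0.97 are assumptions; only the value 0.93 comes from the text.

```python
import numpy as np

def features_below_baseline(wavelength, cr, baseline=0.93):
    """List contiguous runs of a continuum-removed spectrum below a baseline."""
    wl = np.asarray(wavelength, dtype=float)
    cr = np.asarray(cr, dtype=float)
    below = cr < baseline
    padded = np.concatenate(([False], below, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)   # first index of each run
    ends = np.flatnonzero(edges == -1)    # one past the last index of each run
    features = []
    for s, e in zip(starts, ends):
        k = s + int(np.argmin(cr[s:e]))
        features.append({
            "center_nm": float(wl[k]),
            "depth": float(1.0 - cr[k]),
            "width_nm": float(wl[e - 1] - wl[s]),  # width measured at the baseline
        })
    return features

# Synthetic continuum-removed spectrum: deep band at 2200 nm, shallow at 1400 nm.
wl = np.arange(350.0, 2501.0)
cr = (1.0 - 0.20 * np.exp(-0.5 * ((wl - 2200.0) / 20.0) ** 2)) \
   * (1.0 - 0.05 * np.exp(-0.5 * ((wl - 1400.0) / 30.0) ** 2))

strict = features_below_baseline(wl, cr, baseline=0.93)
loose = features_below_baseline(wl, cr, baseline=0.97)
```

With the stricter baseline only the deep band is retained, while the looser one also keeps the shallow band and reports a wider 2200 nm feature, so statistics extracted with different baselines are not directly comparable.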
Supplementary Materials: The source code adapted in this study to remove the continuum and automatically extract the main absorption features is available online at www.mdpi.com/xxx/s1.

Conflicts of Interest:
The authors declare no conflict of interest.