Global Inversion of Lunar Surface Oxides by Adding Chang’e-5 Samples

The chemical distribution on the lunar surface results from the combined effects of endogenic and exogenic geological processes. Global maps of chemical composition give insight into the compositional variation among the three major geological units and help unravel the geological evolution of the Moon. Existing oxide abundance maps were derived from remote sensing spectral images and geochemical data from samples returned by the Apollo and Luna missions, and therefore miss the chemical characteristics of the Moon’s late critical period. In this study, by adding geochemical data from Chang’e-5 (CE-5) lunar samples, we construct inversion models between the Christiansen feature (CF) and the oxide abundance of lunar samples using the particle swarm optimization–extreme gradient boosting (PSO-XGBoost) algorithm. New global oxide maps (Al2O3, CaO, FeO, and MgO) and a Mg# map with a resolution of 32 pixels/degree (ppd) were then produced, which reduce the space weathering effect to some extent. The PSO-XGBoost models were compared with partial least squares regression (PLSR) models and four previous results, indicating that the PSO-XGBoost models can effectively describe the nonlinear relationships between CF and oxide abundance. Furthermore, the average contents from our results and the Diviner results for 21 major maria are highly correlated, with R² values of 0.95, 0.82, 0.95, and 0.86 for Al2O3, CaO, FeO, and MgO, respectively. In addition, the new Mg# map reveals different magmatic evolutionary processes in the three geologic units.


Introduction
Al, Ca, Fe, and Mg, as major components of lunar regolith, reflect a variety of complex geological processes, including volcanism and impact events, providing insight into the petrological characteristics and geological evolution of the Moon. Although chemical abundance within lunar samples can be assessed through laboratory measurements, the limited geochemical information from lunar samples represents only a small range of lunar surface composition [1–3]. It is not yet possible to gain an overall understanding of the compositional distribution across the Moon from limited sample data alone.
Nowadays, remote sensing techniques, including high-energy and optical techniques, such as Lunar Prospector (LP) gamma ray spectroscopy (GRS) data [4–7], LP neutron spectroscopy data [8–12], Clementine ultraviolet/visible (UV/VIS) data [1,13–15], Lunar Reconnaissance Orbiter wide angle camera (WAC) ultraviolet/visible (UV/VIS) data [16], KAGUYA multiband imager (MI) data [2,17–20], and Chang'e-1 (CE-1) interference imaging spectrometer (IIM) data [3,21–24], are widely used in oxide abundance inversion on the lunar surface. Compared with high-energy techniques, optical techniques offer the advantage of higher resolution, which has led to their predominant use for lunar surface oxide abundance inversion in existing studies [13,22]. However, they also present some problems that need to be considered. (1) Sensitivity of visible near-infrared (VNIR) data: Only Fe and Ti, as transition metals, undergo ligand field transitions that produce diagnostic absorption features in the visible to near-infrared range [25–27]. Previous models for FeO and TiO2 have relied on reflectance and a two-band ratio targeting the 1 μm iron band [15]. (2) Limitations of previous models: Band ratio methods, which rely on only two bands to describe the nonlinear relationship between reflectance and oxide abundance, have limited capability to estimate oxide content accurately [23]. The accuracy of oxide content estimation is highly dependent on the inversion model [3,23]. (3) Photometric effects on reflectance values: Differences in solar elevation angle at different times of the lunar day result in varying terrain shadow effects, which affect the reflectance of the lunar surface and consequently increase the uncertainty in oxide content inversion [28]. Additionally, topographic effects contribute to lower coverage in optical images, particularly in polar regions [3,23].
To overcome these problems, many studies have turned their attention to thermal infrared (TIR) remote sensing [28–34]. The thermal infrared spectra of silicate minerals exhibit a prominent emissivity maximum near 8 μm, known as the Christiansen feature (CF). The CF can be used as an indicator for identifying the composition of silicate minerals, such as plagioclase, pyroxene, and olivine [35], compensating for the limitations of visible and NIR remote sensing [36]. Additionally, the wavelength position of the CF is closely related to the degree of polymerization of silicate minerals and the content of cations in the minerals [29,37,38]. Therefore, the CF map derived by Lucey et al. [32], with the correction of space weathering effects and a resolution of 32 pixels/degree (ppd), provides a different perspective for quantitative inversion of oxide abundance on the lunar surface.
In addition to remote sensing data, geochemical information from returned lunar samples and in situ measurements is also the basis of compositional inversion, providing a reliable ground truth for compositional modeling of the lunar surface. The lunar regolith samples used in previous studies were obtained from six Apollo and three Luna missions, which represent chemical features only from 3.1–4.3 Ga [39,40], missing the later stage of lunar evolution. In November 2020, the Chang'e-5 (CE-5) mission returned lunar regolith samples from the Mons Rümker region in northern Oceanus Procellarum (43.06°N, 51.92°W), with a radiometric age of 2.0 Ga [41–43]. The CE-5 samples represent the youngest volcanic eruption among the returned samples, and their inclusion is important for updating the lunar surface oxide maps.
The distribution of oxides on the lunar surface exhibits significant heterogeneity, which is particularly evident in samples from the highlands and maria, with corresponding differences in the number of samples from these regions [7]. This heterogeneity, combined with the unequal sample quantities, potentially limits the predictive ability of single-model algorithms. Ensemble machine learning algorithms can help alleviate these limitations by integrating multiple models to improve the accuracy and robustness of predictions [44]. Among these, the extreme gradient boosting (XGBoost) algorithm, an efficient ensemble learning technique, has demonstrated excellent performance in several fields [45–47]. Meanwhile, the partial least squares regression (PLSR) algorithm, a well-established linear regression method, is particularly suitable for datasets in which the number of predictor variables is larger than the number of observed samples, and it has been widely used in soil spectral analysis [48–50]. Therefore, we use the results of the PLSR model as the control group for the XGBoost model, aiming to explore the feasibility of machine learning algorithms for estimating oxide abundance and improving the accuracy of the inversion results.
Given the above limitations and challenges, this study aims to (1) employ the particle swarm optimization-extreme gradient boosting (PSO-XGBoost) algorithm to construct inversion models and produce updated maps of oxide (Al2O3, CaO, FeO, and MgO) abundance, (2) evaluate the predictive accuracy of our models in comparison to the PLSR models and other previous work, and (3) investigate the new Mg# map, which delineates the lunar surface into three geological units (maria, highlands, and SPA basin) with distinct Mg# features, and discuss the differences in the magmatic processes undergone in these units. The oxide distribution maps in this work provide valuable data support for understanding the geological evolution of the Moon.

Christiansen Feature
The CF refers to the phenomenon in the spectra of silicate samples involving a strong reflectance minimum and emission maximum near 8 μm [35]. The CF is mainly related to the Si-O stretching vibrations in the crystals. Although the fine grain sizes characteristic of lunar soil can somewhat suppress the spectral contrast of vibrational features [51], the CF and its wavelength sensitivity to composition persist, making the CF important in lunar remote sensing [32]. Plagioclase, pyroxene, olivine, and ilmenite are the major minerals on the Moon, all of which, except for ilmenite, are silicate minerals and exhibit distinctly different CF values. Among them, the CF values of plagioclase, pyroxene, and olivine are 7.84 μm, 8.25 μm, and 8.67 μm, respectively [31].

Diviner CF Map
Diviner is a nine-channel radiometer carried on the Lunar Reconnaissance Orbiter (LRO), covering a wavelength range of approximately 0.35-400 μm, from the ultraviolet to the far-infrared [52,53]. Diviner's three narrow channels near 8 μm were utilized to estimate the spectral position of maximum emission, known as the CF [33]. Because the CF maps are based on three-point spectral data from Diviner, the CF positions carry uncertainties that depend on the positions of the Diviner spectral filters. Recently, Lucey et al. [32] corrected the effects of space weathering and generated a new CF map with a resolution of 32 ppd (948 m/pixel at the equator), providing strong support for chemical (Al2O3, CaO, FeO, and MgO) inversion (Figure 1).
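The quoted pixel size follows from the map resolution by simple arithmetic. The sketch below reproduces it, assuming a mean lunar radius of 1737.4 km (a value not stated in the text); the helper name `equatorial_pixel_size_m` is hypothetical.

```python
import math

# Assumption: mean lunar radius of 1737.4 km (not given in the text).
MOON_MEAN_RADIUS_M = 1_737_400.0


def equatorial_pixel_size_m(pixels_per_degree, radius_m=MOON_MEAN_RADIUS_M):
    """Ground size of one pixel at the equator for a map in pixels/degree."""
    meters_per_degree = 2.0 * math.pi * radius_m / 360.0
    return meters_per_degree / pixels_per_degree


pixel_size = equatorial_pixel_size_m(32)
print(f"{pixel_size:.0f} m/pixel")  # 948 m/pixel
```

Away from the equator, the east-west pixel size of a map gridded in longitude shrinks with the cosine of latitude, so this figure is an upper bound.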

Lunar Sampling Sites
Lunar sample data are the basis of compositional inversion and also the standard for evaluating model accuracy. In this study, forty-nine lunar samples from the Apollo, Luna, and Chang'e projects [3,19,20,28] were utilized to represent the ground truth of oxide content (Table 1). Additionally, the CF value of CE-5 was extracted based on the criteria stated by Ma et al. [28], and the other CF values were taken from Ma et al. [28].

Oxide Inversion Model
We used the PSO-XGBoost and PLSR algorithms to construct mathematical models describing the relationship between the four oxide contents and CF values. The XGBoost algorithm is a supervised machine learning model proposed by Chen and Guestrin [44] and is essentially a gradient boosting decision tree (GBDT) model. The fundamental idea of the XGBoost algorithm is to integrate multiple weak estimators into a strong estimator [54], and it demonstrates excellent performance in parallel computing efficiency, handling missing values, controlling overfitting, and generalization ability [55,56].
There are two important parameters in the XGBoost model: n_estimators and learning_rate. The n_estimators parameter represents the number of trees built during training. The larger the n_estimators value, the greater the learning capacity of the model, but the more prone it is to overfitting. The other parameter is learning_rate, also known as shrinkage, which controls the step size of each iteration and can be used to prevent overfitting. After each boosting iteration, the weights of the new features are obtained and shrunk by learning_rate, which reduces the influence of each subtree and mitigates overfitting [47].
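To illustrate how the two parameters interact, the following is a minimal sketch of gradient boosting with one-split trees (stumps) in plain Python. It is not the XGBoost implementation and omits its regularization terms; the function names and the CF and abundance values are hypothetical and serve only as an illustration.

```python
def fit_stump(x, residuals):
    """Find the threshold on x that best splits the residuals (least squares)."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    return best[1:]


def fit_boosting(x, y, n_estimators, learning_rate):
    """Fit n_estimators stumps, each to the residuals left by the previous ones."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        thr, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((thr, left_mean, right_mean))
        # Shrinkage: only a fraction (learning_rate) of each stump is added.
        pred = [pi + learning_rate * (left_mean if xi <= thr else right_mean)
                for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps


def predict(model, xi):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lm if xi <= thr else rm)
                      for thr, lm, rm in stumps)


def training_mse(model, x, y):
    return sum((predict(model, xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)


# Hypothetical CF positions (um) and Al2O3 abundances (wt.%), illustration only.
cf = [7.9, 8.0, 8.1, 8.2, 8.3, 8.4]
al2o3 = [28.0, 26.0, 20.0, 15.0, 12.0, 11.0]
few = fit_boosting(cf, al2o3, n_estimators=1, learning_rate=0.3)
many = fit_boosting(cf, al2o3, n_estimators=50, learning_rate=0.3)
print(training_mse(few, cf, al2o3), training_mse(many, cf, al2o3))
```

More trees always lower the training error in this sketch, which is why a large n_estimators needs to be balanced against a held-out test set.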
The PSO algorithm, proposed by J. Kennedy and R. Eberhart [57], is a stochastic search algorithm based on collective collaboration, which was developed by simulating the foraging behavior of birds [57,58]. In the PSO algorithm, each potential solution is called a particle [54] and consists of a velocity vector and a geometric position vector [59]. The core of the PSO algorithm is to update the velocity and position of the particles, enabling a continuous search for the optimal solution within the search space [60]. The PSO algorithm helps to find the optimal combination of parameters, which can improve the performance of the XGBoost model.
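The velocity and position updates can be sketched as follows. This is a minimal, generic PSO in plain Python, not the configuration used in this study: the inertia and acceleration coefficients, the search bounds, and the `toy_error` objective (a stand-in for a model's validation error as a function of n_estimators and learning_rate) are all assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iterations=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize objective(position) within per-dimension (low, high) bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    g = min(range(n_particles), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[g][:], personal_score[g]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward personal and global bests.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d]))
                lo, hi = bounds[d]
                # Position: move by the velocity, clipped to the bounds.
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], lo), hi)
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


def toy_error(params):
    """Hypothetical error surface with its minimum at (120, 0.1)."""
    n_estimators, learning_rate = params
    return ((n_estimators - 120.0) / 100.0) ** 2 + (learning_rate - 0.1) ** 2


best_params, best_error = pso_minimize(toy_error, bounds=[(10.0, 500.0), (0.01, 1.0)])
print(best_params, best_error)
```

In a real hyperparameter search, the objective would round n_estimators to an integer, train a model, and return its validation error instead of evaluating a closed-form surface.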
In addition, the PLSR algorithm is a linear regression model proposed by de Jong [61]. The core idea of PLSR is to find a new space that can simultaneously explain the variation in both the predictor and response variables [62]. In the PLSR model, coefficients and intercepts are two important parameters. PLSR simplifies the model by extracting latent variables and captures the relationships between variables through coefficients and intercepts, thereby enhancing the model's predictive power.

Model Parameters and Evaluation Index
In this study, the oxide abundances and CF values of 49 lunar samples were input into the PSO-XGBoost and PLSR algorithms to construct oxide inversion models. To ensure effective training of the oxide inversion models, it was necessary to optimize the relevant model parameters. In each oxide model constructed by the PSO-XGBoost algorithm, the optimal values of the two hyperparameters were obtained by iteratively updating the position vectors and velocity vectors of the PSO (Table 2). The optimal parameters of the oxide models based on the PLSR algorithm are shown in Table 3. Based on these optimal parameters, the functional relationships between CF values and oxide abundances were constructed. Random sampling was adopted to select 80% of the 49 lunar samples (39 samples) as the training set and the remaining 20% (10 samples) as the testing set.
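The 80/20 random split can be sketched in a few lines. The seed and the function name `split_samples` are assumptions for illustration; the text does not state how the random sampling was seeded or implemented.

```python
import random


def split_samples(samples, train_fraction=0.8, seed=42):
    """Shuffle a copy of the samples and split it into training and testing sets."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


sample_ids = list(range(49))  # stand-ins for the 49 sampling sites
train_ids, test_ids = split_samples(sample_ids)
print(len(train_ids), len(test_ids))  # 39 10
```

With only 10 testing samples, the reported test metrics can vary noticeably from one random split to another, which is worth keeping in mind when comparing models.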
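To assess the performance of the oxide inversion models, the coefficient of determination (R²) and the root mean square error (RMSE) were utilized as evaluation indicators and calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (c_i - \hat{c}_i)^2}{\sum_{i=1}^{n} (c_i - \bar{c})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (c_i - \hat{c}_i)^2}$$

where $c_i$ and $\hat{c}_i$ denote the true and predicted values of the ith sample, respectively, $\bar{c}$ denotes the mean of the true values, and n denotes the number of samples.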
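Both indicators can be computed directly from measured and predicted abundances. The sketch below assumes the standard definitions (R² as one minus the ratio of the residual to the total sum of squares, RMSE as the square root of the mean squared residual); the function names and the numbers are hypothetical.

```python
import math


def r_squared(true_values, predicted_values):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(true_values) / len(true_values)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values))
    ss_tot = sum((t - mean_true) ** 2 for t in true_values)
    return 1.0 - ss_res / ss_tot


def rmse(true_values, predicted_values):
    """Root mean square error, in the same units as the inputs."""
    n = len(true_values)
    return math.sqrt(sum((t - p) ** 2
                         for t, p in zip(true_values, predicted_values)) / n)


# Hypothetical measured and predicted abundances (wt.%), illustration only.
measured = [10.0, 14.0, 18.0]
predicted = [11.0, 14.0, 17.0]
print(r_squared(measured, predicted))  # 0.9375
print(round(rmse(measured, predicted), 4))  # 0.8165
```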
Finally, the corrected CF values of the Moon (70°N/S) were used as input for predicting, and the Al2O3, CaO, FeO and MgO abundances of the Moon (70°N/S) were output, respectively.

Correlation Coefficients between Oxides and CF Values
Correlation analysis of CF values and oxide abundances at the 49 sampling sites showed high Pearson correlation coefficients (Figure 2). In general, Al2O3 and CaO exhibited negative correlations with CF values, whereas FeO and MgO exhibited positive correlations. The linear correlation between CF values and FeO abundance was 0.7737, and the nonlinear correlations between CF values and the abundances of Al2O3, CaO, and MgO were 0.8110, 0.7360, and 0.6641, respectively. However, the relationship between CF values and oxide content was not always well described by univariate linear regression. For some oxides (e.g., Al2O3, CaO, and MgO), it was difficult to describe their complex relationship with CF values using traditional univariate regression.
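For reference, the Pearson coefficient used in this kind of analysis can be computed as below. This is a minimal sketch with hypothetical CF and abundance values, not the data in Table 1; the function name `pearson_r` is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical values: abundance falls as the CF position increases.
cf_values = [7.9, 8.0, 8.1, 8.2, 8.3]
al2o3_values = [27.0, 25.5, 19.0, 15.5, 12.0]
print(pearson_r(cf_values, al2o3_values))  # negative, as for Al2O3 and CaO
```

A coefficient near -1 or 1 indicates a strong linear trend only; a curved relationship can give a moderate value even when one variable is a good predictor of the other.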

Model Accuracy Evaluation
The prediction accuracies of the PSO-XGBoost models are shown in Figure 3. The R² values of the oxide (Al2O3, CaO, FeO, and MgO) abundance prediction models were 0.8806, 0.754, 0.8755, and 0.79, respectively, all greater than 0.75. The RMSE values of these models were also low, at 2.2698, 1.0392, 2.1327, and 0.9922, respectively. In contrast, the PLSR models (Figure 4), while possessing some predictive capability, yielded lower R² values of 0.769, 0.5116, 0.6827, and 0.2569 and higher RMSE values of 3.1577, 1.4643, 2.2641, and 1.8664, respectively, with the largest gap for MgO. Therefore, compared with the PLSR models, the PSO-XGBoost models performed better in capturing nonlinear relationships, enabling more precise predictions of oxide abundance.

Global Maps of Oxides
The global distribution maps of the four oxides' abundances with a resolution of 32 ppd were derived from the CF map based on the PSO-XGBoost models, as shown in Figure 5. From a global perspective, the average abundances of Al2O3, CaO, FeO, and MgO were 23.63 wt.%, 14.42 wt.%, 7.78 wt.%, and 7.14 wt.%, respectively, and the corresponding standard deviations were 5.64, 2.10, 4.47, and 2.15 (Table 4). Al2O3 has the highest global average, indicating that it accounts for the largest proportion among the four oxides. The standard deviation quantifies the dispersion of a set of values, and the highest standard deviation, that of Al2O3, implies significant differences in Al2O3 abundance between different regions on the lunar surface.

Three Geological Units and Interesting Regions of Chemical Abundances
As shown in Table 4, the four oxides' average abundances in the Moon (70°N/S) and the SPA basin follow the order Al2O3 > CaO > FeO > MgO. In the maria, the order is FeO > Al2O3 > CaO > MgO, while in the highlands it is Al2O3 > CaO > MgO > FeO. The average abundances of Al2O3 (25.66 wt.%) and CaO (15.12 wt.%) in the highlands are higher than those in the maria (Al2O3: 14.71 wt.% and CaO: 11.45 wt.%), while the average abundances of FeO (6.16 wt.%) and MgO (6.46 wt.%) in the highlands are lower than those in the maria (FeO: 15.09 wt.% and MgO: 9.88 wt.%). MgO exhibited the lowest average abundance among the four oxides in the maria and the SPA basin, and FeO was the lowest in the highlands. In the SPA basin, the average content of each oxide was intermediate between those of the maria and highlands.
The average and STD of the four oxides' abundances in the major lunar maria are shown in Table 5. Except for the lowest value of CaO, which is in Mare Spumans, the maximum and minimum values of the other oxides are in Mare Australe and Mare Tranquillitatis. Among the 21 major maria, the STD values of Al2O3 and FeO abundances are larger than those of CaO and MgO, indicating large variations in the distribution of Al2O3 and FeO abundances, especially in Mare Orientale. The compositional distributions of three interesting regions are shown in Figure 6. The oxide compositions within the Kepler impact crater differ from those of its surroundings. The high Al2O3 and CaO contents inside the Kepler crater are similar to those in the highlands, which may be due to the exposure of plagioclase-rich material after the impact broke through the mare basalt. The higher FeO content in the interior of the Moscoviense Basin results from the filling of the basin with iron- and titanium-rich basaltic magma after its formation. In addition, the composition of the Helmet dome is relatively similar to the highland material, with higher Al2O3 and CaO and lower FeO and MgO.

Comparison with Previous Studies
In previous studies, the four oxides' distributions were estimated using LP GRS [7], CE-1 IIM (~200 m/pixel) [3,23], and LRO Diviner (32 ppd) data [28]. Based on the global maps of the four oxides, their average contents in the Moon (70°N/S) and its different terrains can be obtained (Figure 7). Figure 8 illustrates the spatial distribution of the maria, highlands, and SPA basin. Based on the boundaries of the different terrains, bar charts (Figure 7) were produced for comparative analysis by extracting the oxide abundances for the different geological units (global, maria, highlands, and SPA basin).
The average abundances of the four oxides in this study fell within the ranges of the other inversion results (Figure 7). For the Moon (70°N/S), the smallest differences in the average abundances of Al2O3, CaO, FeO, and MgO were 0.16, 0.10, 0.03, and 0.29, respectively. Apart from CaO derived from LP GRS in the maria and the SPA basin and FeO obtained from Diviner in the maria, the average abundances in this study were similar to the other results, most closely to those from CE-1 [23] and Diviner [28]. Relatively significant differences in CaO in the maria and the SPA basin were observed between this study and LP GRS, which may be due to the different resolutions. The LP GRS and Diviner data have different spatial resolutions and therefore differ in the smallest features they can resolve. Lower-resolution data may not adequately capture the fine-scale characteristics observable in higher-resolution datasets, which could lead to subtle variations in the CaO distribution being missed. Despite the differences in absolute abundances, the trends in relative abundances are consistent. We compared the average and standard deviation of oxide abundance in 21 major maria with previous work conducted by Ma et al. [28], and the scatter plots with error bars for the four oxide abundances are shown in Figure 9. For the four oxides (Al2O3, CaO, FeO, and MgO), the average contents in each mare determined in this study and from the LRO Diviner maps [28] lay relatively close to the 1:1 line, with R² values of 0.95, 0.82, 0.95, and 0.86, respectively.

Implications of the Mg# Map
From the oxides estimated in this study, the Mg# (Mg# = mole percent MgO/(MgO + FeO)) was obtained with a resolution of 32 ppd. The global distribution of Mg# is shown in Figure 10 and exhibits an obvious trichotomy. The average Mg# values across the Moon (70°N/S) and in the three geological units (maria, highlands, and SPA basin) were 0.50, 0.41, 0.52, and 0.48, which are close to the multiband imager (MI) Mg# values [19] of 0.53, 0.40, 0.58, and 0.46, respectively. The average Mg# value in the highlands was higher than in the maria, which is consistent with the fact that FeO was higher than MgO in the maria regions, while MgO was higher than FeO in the highland regions [3]. In addition, the regions with low Mg# values in the new Mg# map correspond to the distribution of mare basalts on the Moon [63], which indicates that mare basalts exhibit low Mg#.
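The molar definition stated above can be sketched in code by converting weight percent to moles before taking the ratio. The molar masses are standard values not given in the text, the function name `mg_number` is hypothetical, and the inputs are illustrative rather than taken from Table 4.

```python
# Standard molar masses in g/mol (assumed here; not stated in the text).
MOLAR_MASS_MGO = 40.304
MOLAR_MASS_FEO = 71.844


def mg_number(mgo_wt, feo_wt):
    """Molar MgO / (MgO + FeO) from abundances in wt.%."""
    mol_mgo = mgo_wt / MOLAR_MASS_MGO
    mol_feo = feo_wt / MOLAR_MASS_FEO
    return mol_mgo / (mol_mgo + mol_feo)


# Illustrative abundances (wt.%): an FeO-rich and an FeO-poor composition.
print(mg_number(8.0, 12.0))
print(mg_number(6.0, 5.0))
```

Because MgO is lighter than FeO per mole, equal weight percentages of the two oxides give a molar Mg# above 0.5.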
Mg# is an important indicator in lunar petrology for studying the compositional variation during magma crystallization [28,59], and a higher Mg# value represents an earlier magma ocean event with more primitive materials [2]. The different Mg# values observed in these three geological units are similar to those described by Otake et al. [2], indicating that they have undergone different magma evolution processes, resulting in lava flow fills of diverse composition and mineralogy [2,64]. Therefore, the maria have undergone a higher degree of evolution, primarily crystallized from iron- and magnesium-rich magma, while the formation of the highlands marks the early stages of the lunar magma crystallization process, formed by the crystallization of relatively primitive magma.

Limitations and Prospects
The limited quantity of lunar samples poses an obstacle to compositional modelling. A total of 49 samples were utilized, including samples from the Apollo and Luna missions, CE-3 in situ measurements, and CE-5 samples, all of which, except for the CE-5 samples, came from low-latitude regions on the lunar near side. The small sample size and limited geographic coverage introduce uncertainties into the prediction results. It is hoped that more samples can be obtained in the future, especially from the lunar far side and high-latitude regions, which is expected to improve the accuracy of the inversion results.
The study of minerals through spectral features, particularly the CF, has a long history in planetary science. Indeed, the CF has demonstrated capability in identifying silicate minerals and estimating lunar surface composition [29,30,52]. However, both space weathering and viewing geometry influence the CF values. Recently, Greenhagen et al. [31] performed terrain-dependent photometric correction to produce the latest normalized-to-equatorial-noon (NEN) CF map. Meanwhile, Lucey et al. [32] corrected the effects of space weathering to generate a new CF map based on the surface optical maturity parameter (OMAT). With the advancement of remote sensing technology, exploring new methods to mitigate the effects of space weathering and viewing geometry can help to reduce the uncertainty of inversion results.
In addition, machine learning and deep learning algorithms can be further applied to lunar oxide inversion. In this paper, the PSO-XGBoost algorithm was used to predict the abundance of oxides with satisfactory results, although there is still room for further improvement.

Conclusions
Building on previous lunar samples, this study incorporated samples from the CE-5 mission to produce new maps of oxide abundance and Mg# on the lunar surface. Except for FeO, the oxide (Al2O3, CaO, and MgO) abundances exhibited complex nonlinear relationships with CF values, making the PSO-XGBoost model a better choice for inversion. The higher R² values and lower RMSE in this study indicate the satisfactory performance and generalization ability of the models. The distinctive distribution of oxides and Mg# across the three geological units indicates differences in the source of the magma, resulting in mare basalts with different compositions. Incorporating data from the Chang'e-5 mission has supplemented and refined our understanding of the Moon's late-stage magmatic activities, enabling the lunar oxide distribution maps to be updated.

Figure 3. Scatter plots of the measured and predicted values for (a) Al2O3, (b) CaO, (c) FeO, and (d) MgO from PSO-XGBoost models. Error bars represent 95% confidence intervals around the oxides' predicted values.

Figure 7. Average abundances of four oxides in the (a) Moon (70°N/S), (b) maria, (c) highlands, and (d) SPA basin, according to five sets of results.

Figure 8. Sketch map showing the locations of the three geological units (maria are marked in blue, highlands are marked in orange, and the SPA basin is identified in green).

Figure 9. Scatter plots with error bars for (a) Al2O3, (b) CaO, (c) FeO, and (d) MgO abundances from Ma et al. [28] and this study, for 21 major maria. Error bars indicate the standard deviation.

Table 1. Four oxide (Al2O3, CaO, FeO, and MgO) contents and CF positions of 49 lunar sampling sites. Abundances are shown in wt.%, and CF values are shown in μm.

Table 2. Optimal values of parameters for four major oxides based on the PSO-XGBoost algorithm.

Table 3. Optimal values of parameters for four major oxides based on the PLSR algorithm.

Table 4. Average (AVG) and standard deviation (STD) values for the abundance of four oxides (wt.%) and Mg# in maria, highlands, and SPA basin.

Table 5. The AVG and STD of oxide abundances in major lunar maria (wt.%).