Comparison on Quantitative Analysis of Olivine Using MarSCoDe Laser-Induced Breakdown Spectroscopy in a Simulated Martian Atmosphere

The Mars Surface Composition Detector (MarSCoDe), mounted on the Zhurong rover of Tianwen-1, adopts Laser-Induced Breakdown Spectroscopy (LIBS), which requires no sample preparation or prior removal of dust and coatings, to conduct rapid multi-elemental analysis and characterization of minerals, rocks, and soils on the surface of Mars. To test the capability of MarSCoDe LIBS measurement and quantitative analysis, several multivariate analysis methods were examined on olivine samples with gradient concentrations, using spectra acquired in a Mars-simulated environment before the rover's launch in 2020. First, the LIBS spectra are preprocessed, including background subtraction, random signal denoising, continuum baseline removal, spectral drift correction and wavelength calibration, radiometric calibration, and merging of the multi-channel spectral subsets. Then, quantitative analysis with univariate linear regression (ULR) and multivariate linear regression (MLR) is performed on the characteristic lines, while principal component regression (PCR), partial least squares regression (PLSR), ridge, least absolute shrinkage and selection operator (LASSO), elastic net, and nonlinear back-propagation (BP) analysis are applied to the entire spectrum. Finally, the quantitative performance of MarSCoDe LIBS on olivine is compared between the mean spectrum and all spectra of each sample and evaluated with several statistical indicators.
The results show that: (1) the ULR calibration curves constructed from the characteristic lines of magnesium and iron indicate a linear relationship between the spectral signal and the element concentration, and the limits of detection of forsterite and fayalite are 0.9943 and 2.0536 (c%) using the mean spectra, and 2.3354 and 3.8883 (c%) using all spectra; (2) the R² values for calibration and validation of all methods are close to 1, and the concentrations predicted by these calibration models are close to the true concentrations; (3) the shrinkage (regularization) techniques of ridge, LASSO, and elastic net perform better than ULR and MLR, except that ridge overfits on the testing samples; the best results are obtained by the dimension reduction techniques PCR and PLSR, especially PLSR; and BP is more suitable when the samples are measured with a larger spectral dataset.


Introduction
The Mars Surface Composition Detector (MarSCoDe) is a remote sensing instrument suite mounted on the front deck of the Zhurong rover of Tianwen-1, China's first Mars exploration mission. MarSCoDe adopts Laser-Induced Breakdown Spectroscopy (LIBS). The ground data preprocessing and the quantitative analysis methods are presented in Sections 2.2 and 2.3. The performance of MarSCoDe LIBS in quantifying olivine is then compared and evaluated in Section 3. Finally, conclusions are drawn in Section 4.

Test Samples and Experiment Environment
LIBS is widely used for qualitative and quantitative analysis by comparing measured spectra with reference spectra of materials of known concentration. The LIBS spectrum of a complex sample containing several elements may have overlapping spectral lines, so high spectral resolution is usually needed to resolve the characteristic emission lines over the 240-850 nm range [17,35,36]. Prepared samples can be placed in a simulated Martian atmosphere and measured with the MarSCoDe LIBS system.

Instrument Description and Simulated Experiment
MarSCoDe is one of the six scientific payloads of the Zhurong rover and is mounted on its front deck; the rover was launched on 23 July 2020 and landed at the candidate site in Utopia Planitia on 15 May 2021. MarSCoDe uses LIBS to provide elemental composition information by active spectroscopy over 240-850 nm, an Acousto-optic Tunable Filter (AOTF) to collect passive short-wave infrared (SWIR) reflectance over 800-2400 nm, and a TMI to capture images of sample texture and morphology, performing in situ detection at stand-off distances of 1.6-7 m. With a two-dimensional (2D) pointing mirror, the instrument can point small observation footprints with fine-scale sampling, perform line scans on rock targets, and profile depth through surface coatings. These capabilities support sample classification, quantitative composition analysis, and even 3D characterization. Figure 1a shows the main units of the MarSCoDe instrument suite, and Table 1 lists the main technical parameters of LIBS. The composition and performance of the instrument are detailed in [1]. Three array CCDs with 2048 pixels each record the LIBS spectral response; 1800 pixels of each CCD are used for the three channels, with spectral sampling intervals of 0.067, 0.132, and 0.203 nm, respectively. The LIBS spectrometer is also sensitive enough to collect passive reflectance spectra over the same spectral range as a by-product of LIBS. This passive measurement is typically acquired after the laser shots and used as the background to assist LIBS calibration.

The olivine sample pellets were placed in a vacuum chamber and measured in a simulated Martian environment, as illustrated in Figure 1. The chamber was filled with a gas mixture simulating the Martian atmospheric composition (1.6% Ar, 2.7% N2, and 95.7% CO2) at a pressure of 700 ± 50 Pa and room temperature. Because most on-board detection targets are expected at about 3 m, where an ideal signal-to-noise ratio can be obtained, the LIBS spectra were recorded by MarSCoDe LIBS at a laser-to-sample distance of approximately 3 m on 9 October 2019. The laser was fired at 3 Hz after autofocusing on the center of each sample. One location on each sample was shot with 60 consecutive laser pulses, yielding 60 LIBS spectra; another 60 passive spectra without laser shots were collected, and their on-board mean was used as the background. All spectra were measured with an integration time of 1 ms.

Sample Pretreatment and Component Content
Although one of the main advantages of LIBS is that no sample preparation is required, pretreating the samples can improve the quality of the LIBS spectra. For solid samples, homogeneous powder of the standard material is usually pressed directly into pellets. Olivine is a magnesium-iron silicate and a major mineral of ultramafic rocks and of the igneous rocks of Mars. The composition of olivine usually lies between two end members: magnesium olivine (Mg2SiO4), referred to as forsterite (Fo), and iron-rich olivine (Fe2SiO4), referred to as fayalite (Fa). Different compositions indicate different physical conditions during geological formation. In natural igneous rocks, Mg and Fe are generally present as oxides, so their oxides can also be used to simulate Fo and Fa. To synthesize Mg2SiO4, MgO and SiO2 were mixed at a ratio of 2 to 1, and for Fe2SiO4, Fe2O3 and SiO2 were mixed at a ratio of 1 to 1. The chemical formula of the magnesium olivine end member (Mg2SiO4) matches the composition 2(MgO) + SiO2. The formula of the iron olivine end member (Fe2SiO4) differs slightly from the composition Fe2O3 + SiO2 in oxygen content, but because oxygen is also present in the air, oxygen is ignored in the analysis. To obtain olivine with graded Mg and Fe contents, homogeneous, well-ground oxide mixtures of MgO, Fe2O3, and SiO2 with different ratios were prepared in stoichiometric proportions.
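As a worked illustration of the stoichiometric mixing described above, the sketch below computes oxide mass fractions for a target forsterite fraction x. It assumes, per formula unit of (Mg,Fe)2SiO4, that Mg comes from 2x mol MgO, Fe from (1 - x) mol Fe2O3 (two Fe atoms each), and Si from 1 mol SiO2, which reproduces the 2:1 and 1:1 ratios in the text at x = 1 and x = 0. The molar masses are standard rounded values; the function name is hypothetical, and the c% concentration scale of Table 2 may be defined differently.

```python
# Approximate molar masses in g/mol (standard values, rounded).
MOLAR_MASS = {"MgO": 40.304, "Fe2O3": 159.687, "SiO2": 60.084}

def oxide_mass_fractions(fo_fraction):
    """Mass fractions of MgO, Fe2O3, and SiO2 for a Fo_x Fa_(1-x) target.

    Hypothetical helper. Per formula unit of (Mg,Fe)2SiO4:
    2x mol MgO supplies Mg, (1 - x) mol Fe2O3 supplies 2(1 - x) mol Fe,
    and 1 mol SiO2 supplies Si.
    """
    if not 0.0 <= fo_fraction <= 1.0:
        raise ValueError("fo_fraction must lie in [0, 1]")
    moles = {"MgO": 2.0 * fo_fraction, "Fe2O3": 1.0 - fo_fraction, "SiO2": 1.0}
    masses = {oxide: n * MOLAR_MASS[oxide] for oxide, n in moles.items()}
    total = sum(masses.values())
    return {oxide: m / total for oxide, m in masses.items()}

# Example: weigh-out fractions for an intermediate Fo50 mixture.
print(oxide_mass_fractions(0.5))
```

Multiplying these fractions by the desired pellet mass gives the weigh-out of each oxide.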

Preprocessing of LIBS Spectra
To improve the accuracy of quantitative analysis, several preprocessing steps must be applied to the MarSCoDe LIBS spectra before spectral analysis. Similar to ChemCam, the preprocessing of the measured spectra includes: (1) background subtraction, (2) random signal denoising, (3) continuum background removal (also called baseline correction), (4) spectral drift correction and wavelength calibration (also called on-board wavelength calibration), (5) radiometric calibration of the response, and (6) multi-channel spectra merging and normalization. However, the denoising, baseline correction, wavelength calibration, and radiometric calibration differ from those of ChemCam. In this study, the channels are merged to yield a complete spectrum over the full band.

Noise and Background Removal
The measured spectra are usually the sum of the analyte signals corresponding to specific atomic or ionic transitions [37]. Besides the characteristic emission signal, they also include background from the environment, noise from the detectors and other sources, bremsstrahlung radiation from free electrons, and recombination emission. To obtain clearer spectral signals, the noise and background radiation were removed.
(1) Background subtraction. For the dark background, the non-laser spectrum closest in time to the active spectrum of interest is subtracted to remove the influence of the background. Each sample has 60 LIBS spectra and one background spectrum, each with 5400 pixels across three channels (1800 pixels per channel). For each pixel, the background value is subtracted from the measured spectral value.
(2) Random signal denoising. The measured spectra often contain substantial environmental noise, which usually appears as random fluctuations of relatively low amplitude distributed over all frequencies (approximated as white Gaussian noise). Its presence and amplitude depend on the experimental conditions. Instead of the undecimated cubic spline wavelet transform used in [16], a wavelet transform with hard thresholding was applied to remove the white noise.
(3) Continuum baseline removal. LIBS spectra generally consist of a series of sharp peaks riding on a continuum background, which has a continuous spectral shape and is mainly due to bremsstrahlung emission, the recombination of ions and electrons [38,39], and some stray light during plasma radiation. Time-gating the detector can reduce the continuum, which usually decays much faster than the discrete line emission. However, MarSCoDe, like ChemCam, uses non-gated detectors, so its LIBS spectra may contain higher levels of continuum background, although the continuum is reduced at the low Martian atmospheric pressure. Instead of linear interpolation [40] or spline interpolation through minima estimated by wavelets [10,16], a method based on asymmetric least squares smoothing [41] was used to remove the continuum background in this investigation.
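The three steps above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the MarSCoDe pipeline: the background is a single on-board-averaged spectrum; denoising uses a multi-level Haar wavelet with a hard universal threshold (the paper does not state the wavelet family or threshold rule); and the baseline is an asymmetric least squares fit with a second-difference penalty, where `lam`, `p`, and `n_iter` are hypothetical defaults. The dense solver is intended for one channel (about 1800 pixels); a sparse solver would be preferable for longer spectra.

```python
import numpy as np

def subtract_background(spectra, background):
    """Subtract one background spectrum (on-board mean of the passive
    acquisitions) from every laser shot. spectra: (n_shots, n_pixels)."""
    return spectra - background[np.newaxis, :]

def haar_hard_threshold(signal, levels=3):
    """Multi-level Haar wavelet denoising with a hard universal threshold."""
    n = signal.size
    pad = (-n) % (2 ** levels)
    # Edge-pad so the length is divisible by 2**levels.
    approx = np.concatenate([signal, np.full(pad, signal[-1])]).astype(float)
    n_total = approx.size
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    # Noise sigma from the finest-scale coefficients (median absolute
    # deviation), then the universal threshold sigma * sqrt(2 ln N).
    sigma = np.median(np.abs(details[0])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(n_total))
    # Hard thresholding: keep coefficients above thr unchanged, zero the rest.
    details = [np.where(np.abs(d) > thr, d, 0.0) for d in details]
    # Inverse Haar transform, coarsest level first.
    for d in reversed(details):
        rebuilt = np.empty(2 * d.size)
        rebuilt[0::2] = (approx + d) / np.sqrt(2.0)
        rebuilt[1::2] = (approx - d) / np.sqrt(2.0)
        approx = rebuilt
    return approx[:n]

def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline: a smooth curve (second-difference
    penalty lam) fitted with weight p above it and 1 - p below it, so the
    fit follows the lower envelope and largely ignores emission peaks."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), 2, axis=0)  # (n - 2, n) second-difference operator
    H = lam * (D.T @ D)
    w = np.ones(n)
    z = y.copy()
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + H, w * y)
        w = np.where(y > z, p, 1.0 - p)
    return z
```

In the order used in the text, one channel of one shot would be processed as `den = haar_hard_threshold(raw - background)` followed by `den - als_baseline(den)`. Larger `lam` gives a stiffer baseline; smaller `p` pushes it further below the peaks.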

Wavelength and Radiation Calibration
(1) Spectral drift correction and wavelength calibration. The LIBS spectrometer uses 1800 pixels of each CCD to record the spectral response over 240-340 nm, 340-540 nm, and 540-850 nm. The pixel-wavelength relationship was calibrated with four standard lamps [1]: a mercury-argon lamp covering 253.6-922.5 nm, a zinc lamp covering 202.5-636.2 nm, a cadmium lamp covering 214.4-643.8 nm, and a neon lamp covering 337.0-1084.5 nm. Comparing the known peak positions with their pixel indices in the experimental data yields a 2nd-order polynomial calibration function for each of the three channels. In addition, the wavelength scale of the LIBS spectrometer may drift slightly with environmental conditions (e.g., temperature), and it has been shown that within each channel the drift is almost entirely a constant pixel offset [1]. Titanium provides multiple stable emission peaks over the main LIBS wavelength range, which facilitates accurate wavelength calibration [10]. Therefore, spectra of a Ti plate were collected in the same simulated environment and can be used as reference spectra for on-board calibration.
(2) Radiometric calibration of the response. Radiometric calibration establishes the relationship between the response signal of the spectrometer and the spectral radiance of the target. A relative radiometric calibration based on an AvaLight-DH-S deuterium-halogen light source and an absolute radiometric calibration based on a Labsphere integrating sphere were conducted to convert the spectrometer response into the spectral radiance of the targets. The MarSCoDe response measurements and corrections are described in [1]. The response of each pixel can be converted to intensity with the radiometric calibration coefficient.
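A minimal NumPy sketch of this per-channel calibration chain follows. It assumes that lamp-line pixel centres have already been identified, that the drift is a single integer pixel shift estimated by cross-correlating a Ti-plate spectrum against a reference Ti spectrum, and that the radiometric coefficients are given per pixel. The function names and the ±10-pixel search window are hypothetical, and sub-pixel drift would require interpolation.

```python
import numpy as np

def fit_pixel_to_wavelength(lamp_pixels, lamp_wavelengths, order=2):
    """Polynomial pixel -> wavelength map for one channel, fitted to
    identified lamp lines (pixel centres and their known wavelengths)."""
    return np.polyfit(lamp_pixels, lamp_wavelengths, order)

def estimate_pixel_shift(measured, reference, max_shift=10):
    """Integer pixel shift of a measured Ti spectrum relative to a reference
    Ti spectrum, chosen to maximize their cross-correlation."""
    lags = np.arange(-max_shift, max_shift + 1)
    # np.roll(reference, lag)[i] == reference[i - lag]
    scores = [float(np.dot(measured, np.roll(reference, lag))) for lag in lags]
    return int(lags[int(np.argmax(scores))])

def calibrate_channel(dn, coeffs, shift, radiometric_coeff):
    """Return (wavelengths, intensity) for one channel of background-corrected DN."""
    pixels = np.arange(dn.size) - shift      # undo the drift before applying the map
    wavelengths = np.polyval(coeffs, pixels)
    intensity = dn * radiometric_coeff       # per-pixel radiometric coefficient
    return wavelengths, intensity
```

The cross-correlation approach works because the drift is described above as a near-constant offset within each channel; if the offset varied across a channel, a shift per spectral window would be needed instead.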

Merging and Normalization
(1) Merging the channels into a complete spectrum. Adjacent channels of the LIBS spectrometer overlap, so the three channel subsets can be merged into one spectrum covering the full 240-850 nm band. The radiometric calibration ensures that the spectral counts of the three channels are consistent. First, the overlapping spectra of adjacent channels are selected within a wavelength range that extends the effective boundary of each channel by 2 nm on each side. Second, new spectral counts at the given wavelengths are computed by linear interpolation. Third, the counts at the overlapping wavelengths are averaged between the two channels. Finally, the complete spectrum is assembled over the full band.
(2) Spectral normalization. To stabilize the response and compensate for experimental effects, the analyte spectra can be normalized by a parameter representing the actual plasma conditions. To reduce the interference matrix effect and further improve the calibration model, Sarkar [44]. In brief, there are three main normalization approaches, based on the intensity of an internal standard line, a reference signal, or the plasma conditions [45]. For each spectral channel of ChemCam, normalization by the spectral profile area was used, i.e., each spectral set was normalized by dividing each pixel by the total integrated intensity [9], because the total integrated emission intensity approximately represents the total energy released by the plasma in each shot, correcting for shot-to-shot variations in laser energy, spot size, plasma geometry and brightness, collection geometry, and physical matrix effects. In this experiment, the experimental conditions (such as instrument parameters, measurement environment and distance, and sample matrix) were kept the same; hence, L2 normalization was used to normalize the MarSCoDe LIBS spectra.
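The merging and normalization steps can be sketched as below. Assumptions: each channel's wavelength array is increasing and already includes the 2 nm boundary extension, the output grid spacing `step` is a free choice, and overlapping wavelengths are averaged with equal weights. The total-intensity function shows the ChemCam-style alternative for comparison; the study itself used L2 normalization. All names are hypothetical.

```python
import numpy as np

def merge_channels(wavelength_list, spectrum_list, step):
    """Linearly interpolate every channel onto one uniform grid and average
    the channels wherever they overlap."""
    lo = min(wl[0] for wl in wavelength_list)
    hi = max(wl[-1] for wl in wavelength_list)
    grid = np.arange(lo, hi + step / 2, step)
    total = np.zeros_like(grid)
    count = np.zeros_like(grid)
    for wl, sp in zip(wavelength_list, spectrum_list):
        inside = (grid >= wl[0]) & (grid <= wl[-1])
        total[inside] += np.interp(grid[inside], wl, sp)
        count[inside] += 1
    # Mean over contributing channels (2 in overlaps, 1 elsewhere).
    merged = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
    return grid, merged

def l2_normalize(spectrum):
    """Scale the spectrum to unit Euclidean norm."""
    return spectrum / np.linalg.norm(spectrum)

def total_intensity_normalize(spectrum):
    """ChemCam-style alternative: divide by the total integrated intensity."""
    return spectrum / spectrum.sum()
```

Because every spectrum is divided by its own norm, L2 normalization removes overall shot-to-shot intensity scaling while preserving the relative line intensities within a spectrum.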

Quantitative Analysis and Evaluation
To test the performance of common statistical analysis methods on MarSCoDe data, univariate and multivariate linear regression on characteristic spectral lines are used to examine single-element quantitative analysis, while the multivariate methods PCR, PLSR, ridge, LASSO, and elastic net, and even the nonlinear BP network, are applied to the entire spectrum to quantify LIBS behavior over the full band. In these analyses, the spectra of the training samples form the training set, and the spectra of the test samples form the test set, to which the derived parameters are applied to estimate how well the model with those fixed parameters generalizes; the results based on the mean spectrum and on all of the 60 spectra are compared. Finally, the component concentrations predicted by these methods are evaluated against the true concentrations.

Quantitative Analysis
LIBS quantification is based on the relationship between the spectral signal and the analyte concentration; this relationship may be arbitrarily complex, but a linear correspondence is highly desirable. However, LIBS usually records high-resolution spectra and provides redundant spectral data beyond the characteristic lines. As a result, the number of independent variables (spectral channels) exceeds the number of samples, and the variables are partly correlated and carry overlapping information. This increases the computational cost and the complexity of the problem, leading to issues such as multicollinearity, overfitting, or underfitting. To handle the bias-variance trade-off in linear regression, there are generally three approaches: (a) increasing the number of samples or reducing the number of characteristic lines to remove irrelevant predictors, although in practice both are limited; (b) regularization, which constrains the coefficients of the model to reduce variance by limiting the size or number of parameters as much as possible to prevent overfitting; and (c) dimension reduction by extracting PCs, since a few principal components retain most of the information in the original variables and can replace them in the regression. In this investigation, methods corresponding to all three approaches were performed and compared.
(1) Calibration curve with univariate and multivariate linear regression. A linear regression model assumes a linear relationship between the spectral signals of the predictors (input variables) and the concentration of the analyte (output variable), and can then be used to estimate the concentration of unknown samples. Univariate analysis is the simplest method: only one predictor per sample is used, and only the concentration of one analyte is predicted. The normalized intensity of the predictor can be plotted against the concentration to build the calibration curve between spectral signal and concentration.
For LIBS, the emission lines of an element are usually not isolated and are correlated with one another. Multivariate analysis over several lines, referred to here as MLR, is used to reduce the interference of self-absorption and matrix effects. Ordinary least squares (OLS) is the most basic and commonly used method to fit a linear model, which can be expressed as:
$$\hat{y}_i = \beta_0 + \sum_{j=1}^{m} \beta_j x_{ij}$$
where $x_{ij}$ is the $j$th spectral feature of the $i$th sample, $\hat{y}_i$ is the predicted concentration, $i$ and $j$ index the samples and features, $\beta_j$ is the coefficient assigned to each feature, and $\beta_0$ is the intercept. OLS finds the best-fitting line by minimizing the sum of squared deviations of the data points from the line, i.e., the residual sum of squares (RSS) between predictions and actual values:
$$\mathrm{RSS} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$
where $y_i$ and $\hat{y}_i$ are the actual and predicted values for the $i$th observation, and $m$ and $n$ are the numbers of features and samples. The expected prediction error of a regression model can be decomposed into three parts, error due to variance, error due to bias, and irreducible error:
$$E\left[(y - \hat{y})^2\right] = \mathrm{Bias}[\hat{y}]^2 + \mathrm{Var}[\hat{y}] + \sigma_\varepsilon^2$$
where $\sigma_\varepsilon^2$ is the variance of the irreducible noise. In regression, bias and variance should be balanced so that both stay as low as possible. In practice, the predictor variables of LIBS spectra are highly correlated with each other; this multicollinearity can make the coefficient estimates of the model unreliable and inflate their variance.
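The OLS fit behind both ULR (one line intensity) and MLR (several line intensities) can be sketched with NumPy's least-squares solver, where a column of ones provides the intercept β0. The function names are hypothetical, and predicting concentration directly from intensity (rather than inverting an intensity-versus-concentration curve) follows the predictor and output roles stated above.

```python
import numpy as np

def fit_ols(X, y):
    """OLS with intercept, minimizing RSS = sum_i (y_i - yhat_i)^2.
    X: (n_samples, m_features) line intensities; y: concentrations."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

def predict(beta0, beta, X):
    """yhat_i = beta0 + sum_j beta_j x_ij."""
    return beta0 + X @ beta

def rss(y, y_hat):
    """Residual sum of squares."""
    return float(np.sum((y - y_hat) ** 2))
```

ULR is the special case `fit_ols(X[:, :1], y)` with a single intensity column; MLR passes several columns. When the columns are strongly collinear, as the text notes for LIBS, the coefficients returned by this solver become unstable, which motivates the penalized methods that follow.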
(2) Ridge, LASSO, and elastic net. To reduce model complexity and prevent overfitting of linear models, shrinkage (regularization) techniques improve on OLS by imposing constraints that reduce the high variance at the cost of introducing some bias. Ridge and LASSO regression are two of the most popular variants of linear regression; they are more stable than OLS when the predictors are highly correlated and offer better prediction accuracy and interpretability.
Ridge regression is an improved least squares estimator that adds an L2 penalty term equal to the sum of the squared coefficients. It introduces a little bias so that the variance can be greatly reduced, resulting in a lower overall mean squared error. It modifies the cost function of ordinary linear regression slightly to avoid overfitting the data, yielding an L2-regularized model:
$$\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{m} \beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{m} \beta_j^2 \right\}$$
where λ is the regularization penalty. LASSO regression adds an L1 penalty term equal to the sum of the absolute values of the coefficients to induce sparsity. Unlike L2-regularized models, the L1 regularizer performs automatic feature selection by shrinking some feature coefficients to exactly zero. Since broadband LIBS has many superfluous features, LASSO can eliminate some noisy features [23]. The LASSO coefficients are obtained by solving:
$$\hat{\beta}^{\mathrm{lasso}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{m} \beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{m} \lvert\beta_j\rvert \right\}$$
where λ controls the level of shrinkage of the coefficient vector β. The main difference between ridge and LASSO regression is that ridge regression shrinks coefficients toward 0 while retaining all predictor variables, whereas LASSO can shrink coefficients to exactly 0 and thus selects variables by discarding those whose coefficients are 0. However, LASSO introduces a small bias when the prediction depends on a particular variable. Elastic net regression combines ridge and LASSO, retaining the sparsity of LASSO and the stability of ridge. It can also select groups of correlated variables, performing variable selection and regularization simultaneously by smoothing the coefficient weights. The elastic net coefficients are obtained by solving:
$$\hat{\beta}^{\mathrm{enet}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{m} \beta_j x_{ij}\Big)^2 + \lambda \Big[ \alpha \sum_{j=1}^{m} \lvert\beta_j\rvert + (1-\alpha) \sum_{j=1}^{m} \beta_j^2 \Big] \right\}$$
where λ controls the strength of the combined penalty and α controls the mix of the two regularizers: α = 0 gives ridge regression and α = 1 gives LASSO regression.
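One compact way to see how the three penalties relate is a coordinate-descent solver for the elastic net objective, RSS + λ[α Σ|βj| + (1 - α) Σ βj²], which reduces to ridge at α = 0 and LASSO at α = 1. This NumPy sketch is illustrative, not the implementation used in the study: it centers X and y so that the intercept is not penalized, and includes the closed-form ridge solution as a cross-check. Library implementations such as scikit-learn scale the objective differently, so their regularization parameters are not numerically identical to λ here.

```python
import numpy as np

def _center(X, y):
    x_mean, y_mean = X.mean(axis=0), y.mean()
    return X - x_mean, y - y_mean, x_mean, y_mean

def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0): the L1 proximal step."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, lam, alpha, n_iter=10000, tol=1e-10):
    """Coordinate descent for
    sum_i (y_i - b0 - x_i . beta)^2 + lam * (alpha*||beta||_1 + (1 - alpha)*||beta||_2^2).
    alpha = 0 -> ridge, alpha = 1 -> LASSO. Returns (b0, beta)."""
    Xc, yc, x_mean, y_mean = _center(X, y)
    m = Xc.shape[1]
    beta = np.zeros(m)
    col_sq = np.sum(Xc ** 2, axis=0)
    resid = yc.copy()  # equals yc - Xc @ beta while beta is zero
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(m):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            rho = Xc[:, j] @ resid + col_sq[j] * old  # x_j . partial residual
            new = soft_threshold(rho, lam * alpha / 2.0) / (col_sq[j] + lam * (1.0 - alpha))
            if new != old:
                resid -= Xc[:, j] * (new - old)  # keep the residual up to date
                beta[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return y_mean - x_mean @ beta, beta

def ridge(X, y, lam):
    """Closed-form ridge: beta = (Xc^T Xc + lam*I)^-1 Xc^T yc."""
    Xc, yc, x_mean, y_mean = _center(X, y)
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta
```

With a large enough λ and α = 1, every coefficient is set exactly to zero, which illustrates the LASSO feature-selection behaviour described above; with α = 0 no coefficient is ever zeroed, only shrunk.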
(3) PCR and PLSR. LIBS provides redundant spectral data, and the spectral information is highly correlated and partly overlapping. In multivariate regression, reducing the dimensionality of the observations is therefore critical. Dimension reduction extracts the significant features, or uncorrelated principal components (PCs), of the spectra while still considering the whole dataset for each sample. Principal component analysis (PCA) and partial least squares (PLS) are two classical dimension reduction approaches, and PCR and PLSR are two alternatives to the simple linear model that usually give better model fits and higher accuracy. PCR is an unsupervised approach, whereas PLSR is a supervised approach based on the correlation with the response.
PCR applies PCA to obtain the PCs and then performs multiple regression on selected PCs. First, PCA decomposes the training spectral matrix of the original independent variables into an orthogonal basis, and an appropriate number of PCs is chosen from the eigenvalues and eigenvectors of the correlation matrix of the independent variables and from the variance contribution rate and cumulative contribution rate. Second, the selected PCs are used as predictors of the outcome variable in a least-squares multiple regression. Then, the results are transformed back to the scale of the original covariates to obtain the PCR estimator, so that the regression coefficients are estimated and the strongest possible correlations between the orthogonal PC scores and the elemental composition are established. The number of PCs must be selected carefully to ensure that the reduced data retain meaningful information.
PLSR combines the advantages of PCA, canonical correlation analysis, and MLR. PCA tries to extract the maximum information reflecting the variability of the spectra to explain the matrix X, without any guarantee that the PCs are related to Y; in contrast, PLS seeks directions in the X space that also explain the multi-dimensional directions in the Y space, establishing the fundamental relationship between the two matrices X and Y based on their covariance structure. First, it extracts a set of latent factors through a simultaneous, constrained decomposition of X and Y, where the latent factors explain as much of the covariance between the independent and dependent variables as possible. The X component scores are used to predict the Y component scores, which are then used to predict the actual values of the Y variables. PLS iteratively maximizes the strength of the relation between successive pairs of X and Y component scores by maximizing the covariance of each X score with the Y variables. Second, OLS regression is used to predict the dependent variables from the decomposition of the independent variables. PLS involves either a single dependent variable, such as the concentration of one element (PLS-1), or multiple dependent variables, such as the concentrations of several elements (PLS-2), regressed against the predictor variables [46,47]. Both algorithms explain the variance and covariance in both X and Y.
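The two estimators can be sketched in NumPy. PCR regresses on the leading k PCA scores obtained from an SVD of the centered training spectra and maps the coefficients back to the original variables; PLS-1 uses the NIPALS iteration with deflation, giving coefficients B = W (P^T W)^(-1) q. Both return a model of the same form (means plus a coefficient vector). The names and the fixed k are assumptions; in practice k would be chosen by cross-validation or explained variance.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression with k components."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                    # loadings of the first k PCs
    T = Xc @ V_k                      # PC scores (orthogonal columns)
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return x_mean, y_mean, V_k @ gamma  # coefficients on the original variables

def pls1_fit(X, y, k):
    """PLS-1 regression with k latent factors (NIPALS with deflation)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(k):
        w = E.T @ f
        w /= np.linalg.norm(w)        # direction of maximal covariance with y
        t = E @ w                     # X score
        tt = t @ t
        p = E.T @ t / tt              # X loading
        c = (f @ t) / tt              # y loading
        E = E - np.outer(t, p)        # deflate X
        f = f - c * t                 # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)

def predict_linear(model, X):
    x_mean, y_mean, B = model
    return y_mean + (X - x_mean) @ B
```

When k equals the number of (full-rank) variables, both methods reproduce the OLS fit; their benefit appears when k is much smaller than the number of spectral channels, which is the redundant-data situation described above.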
(4) Back-propagation. Back-propagation (BP) is a common supervised learning algorithm for feedforward multilayer artificial neural networks (ANNs), based on error back-propagation. It obtains a functional relationship between input and output by minimizing a loss function with gradient descent. The first and last layers are called the input and output layers, and those in between are the hidden layers.
Training a BP network involves forward propagation of the signal and backward propagation of the error. During forward propagation, the spectra are passed from the input layer through the hidden layers to the output layer, generating output signals through nonlinear transformations. If the actual output concentration differs from the expected concentration, the network switches to back propagation. Back-propagation transmits the output error backward through the hidden layers toward the input layer and allocates it to all nodes of each layer, giving an error for each node that serves as the basis for modifying its weights. By adjusting the connection weights between the nodes of adjacent layers, the error is reduced along the gradient direction. After repeated training, the network parameters (weights and thresholds) corresponding to the minimum error are obtained. In general, the output error is computed in the direction from input to output, while the weights and thresholds are adjusted in the direction from output to input. In the weight-update phase, the gradient of each weight is obtained from its input activation and the output delta.
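The forward and backward passes described above can be sketched as a one-hidden-layer network trained by full-batch gradient descent on the mean squared error. The tanh hidden activation, linear output, layer size, learning rate, and epoch count are all hypothetical choices, not the configuration used in the study; the biases play the role of the "thresholds" in the text.

```python
import numpy as np

def train_bp(X, y, n_hidden=8, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer BP network (tanh hidden, linear output) trained by
    full-batch gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n_hidden))
    b1 = np.zeros(n_hidden)           # hidden "thresholds" (biases)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, 1))
    b2 = np.zeros(1)
    t = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward propagation: input -> hidden (nonlinear) -> output.
        h = np.tanh(X @ W1 + b1)
        out = h @ W2 + b2
        err = out - t
        losses.append(float(np.mean(err ** 2)))
        # Backward propagation: output delta, then hidden delta (chain rule).
        delta_out = 2.0 * err / n
        delta_h = (delta_out @ W2.T) * (1.0 - h ** 2)
        # Weight gradient = input activation (transposed) times delta.
        W2 -= lr * (h.T @ delta_out)
        b2 -= lr * delta_out.sum(axis=0)
        W1 -= lr * (X.T @ delta_h)
        b1 -= lr * delta_h.sum(axis=0)
    return (W1, b1, W2, b2), losses

def predict_bp(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
```

Note that the hidden delta is computed with the output weights from before the update, so each step is a true gradient step on the current loss.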

Evaluation and Validation
The minimum detectable concentration of an element is one of the key characteristics for evaluating a method, technique, or instrument [48]. The limit of detection (LOD) can be calculated according to the IUPAC criterion [39,49]:
$$\mathrm{LOD} = \frac{3\sigma}{S} \qquad (11)$$
where σ is the standard deviation of the blank and S is the slope of the linear part of the calibration curve. In this experiment, because the element's emission lines are present in the spectra, the blank was taken as the background signal at the spectral position of the emission line or from the featureless regions on both sides of the analyte emission line.
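Equation (11) can be applied as in the sketch below. The function name, the use of the sample standard deviation (ddof=1) for σ, and taking S from a least-squares line of intensity versus concentration are assumptions; the absolute value guards against a negative slope.

```python
import numpy as np

def limit_of_detection(blank_signal, concentrations, intensities, k=3.0):
    """LOD = k * sigma_blank / |S|, with S the slope of intensity vs. concentration."""
    slope, _intercept = np.polyfit(concentrations, intensities, 1)
    sigma = np.std(blank_signal, ddof=1)  # standard deviation of the blank
    return k * sigma / abs(slope)
```

The LOD is expressed in the same concentration units as `concentrations` (c% in this study), since σ is in intensity units and S in intensity per unit concentration.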
In practice, reference samples with known composition are usually used to construct the relationship between spectral signal and analyte concentration, which can then be used to predict other samples. Several statistical indicators can evaluate the performance of the quantitative analysis, such as the coefficient of determination (R²), the mean absolute error (MAE), the standard error (Std), and the root mean square error (RMSE). An R² closer to 1 and MAE, Std, and RMSE values closer to 0 indicate a more accurate quantitative analysis. The accuracy and precision of the calibration model can also be checked by plotting predicted versus true concentrations, and the same statistical indicators can be computed on the residuals.
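The indicators can be computed as follows. The text does not define "Std" precisely; in this sketch it is assumed to be the sample standard deviation of the prediction residuals, and the function name is hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R2, MAE, Std (of residuals), and RMSE for predicted vs. true values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_pred - y_true
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "MAE": float(np.mean(np.abs(resid))),
        "Std": float(np.std(resid, ddof=1)),  # assumed: sample std of residuals
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
    }
```

For example, true values [1, 2, 3] predicted as [1, 2, 4] give R² = 0.5 and MAE = 1/3.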

Results and Discussion
Fourteen olivine pellets were probed and analyzed; eleven samples were used for training and three for testing. Each sample was measured with 60 laser shots and 60 passive (no-laser) acquisitions at one location, and the 60 passive spectra were averaged on board to form the background. To avoid impurity contamination on the sample surface, the first two spectra of each sample were removed, leaving 58 spectra per sample and a total of 812 spectra (14 × 58) for the quantitative analysis.
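The bookkeeping above, discarding the first two shots and then building the mean-spectrum and all-spectra datasets, can be sketched as below. The array layout `(n_samples, n_shots, n_pixels)` and the choice of which three samples are held out are assumptions; the text does not say which samples were used for testing. Splitting by sample rather than by shot keeps every spectrum of a test pellet out of the training set.

```python
import numpy as np

def build_datasets(shots, concentrations, test_samples, n_discard=2):
    """shots: (n_samples, n_shots, n_pixels) preprocessed spectra.
    Drops the first n_discard shots of every sample, then returns
    (X_train, y_train, X_test, y_test) for the mean-spectrum and
    all-spectra analyses, split by sample."""
    kept = shots[:, n_discard:, :]
    n_samples, n_kept, n_pixels = kept.shape
    concentrations = np.asarray(concentrations, dtype=float)
    is_test = np.isin(np.arange(n_samples), test_samples)
    # One mean spectrum per sample.
    mean_spectra = kept.mean(axis=1)
    # Every retained shot, labeled with its sample's concentration
    # (reshape is sample-major, matching np.repeat below).
    all_spectra = kept.reshape(n_samples * n_kept, n_pixels)
    all_labels = np.repeat(concentrations, n_kept)
    all_is_test = np.repeat(is_test, n_kept)
    return {
        "mean": (mean_spectra[~is_test], concentrations[~is_test],
                 mean_spectra[is_test], concentrations[is_test]),
        "all": (all_spectra[~all_is_test], all_labels[~all_is_test],
                all_spectra[all_is_test], all_labels[all_is_test]),
    }
```

With 14 samples of 60 shots, this yields 11 × 58 training and 3 × 58 test spectra in the all-spectra case (812 in total), and 11 and 3 mean spectra in the mean-spectrum case.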

Results of LIBS Spectra Preprocessing
In the experiment, each LIBS spectrum was first subtracted by the corresponding dark background and the white noise of the spectral signal was denoised. Figure 2 shows the recorded 58 LIBS spectra and one background for 5400 pixels with three channels and were reduced to obtain the signals. It indicates that the dark background within the three channels presents three steps and is randomly distributed separately; the signals have a high signal-to-noise ratio. The mean of the background within the three channels is 1339.3, 1474, and 1534.8, respectively. The range of the background within the three channels is 1318-1360.5, 1455-1493, and 1517.3-1552.3, calculated using 2.5 times the standard deviation. The recorded value of all the responded pixels within three channels are greater than the maximum value of the corresponding background. Figure 3 shows the denoised spectral signal with the method of wavelet with hard threshold. The noise distribution within the three channels indicates that the noise presents a Gaussian distribution and the denoising method retains the characteristic lines. The amount of noise ranges from −20 to 20, representing 0.3% of the maximum signal volume. Then, the continuum baseline of each spectral signal within each channel was estimated by the method of partial least squares and removed, as shown in        Figure 5 shows the 58 LIBS spectra with the wavelength and radiation calibration. A 2nd-order polynomial function for each channel from spectral calibration was used to convert each pixel to a wavelength value, where the amount of pixel drift within each channel was determined by the titanium plate spectrum and was used to correct the measured pixel of the sample before the wavelength calibration. Then the response digital number (DN) of each pixel was converted to the intensity with the radiation calibration coefficients.   Figure 5 shows the 58 LIBS spectra with the wavelength and radiation calibration. 
A 2nd-order polynomial function for each channel from spectral calibration was used to convert each pixel to a wavelength value, where the amount of pixel drift within each channel was determined by the titanium plate spectrum and was used to correct the measured pixel of the sample before the wavelength calibration. Then the response digital number (DN) of each pixel was converted to the intensity with the radiation calibration coefficients.   Figure 5 shows the 58 LIBS spectra with the wavelength and radiation calibration. A 2nd-order polynomial function for each channel from spectral calibration was used to convert each pixel to a wavelength value, where the amount of pixel drift within each channel was determined by the titanium plate spectrum and was used to correct the measured pixel of the sample before the wavelength calibration. Then the response digital number (DN) of each pixel was converted to the intensity with the radiation calibration coefficients.  The characteristic lines were found by a method of the find peak and then represented the element of these spectral lines, which were identified through comparison to the National Institute of Standards and Technology's atomic spectra database. The main characteristic lines of the elements contained in the sample can be distinctly identified, and the elements of Mg, Fe, Si, and O were labeled in the mean spectrum of Sample-A06, as shown in Figure 6a. Then, all the LIBS spectra were normalized by the L 2 normalization. Figure 6 shows the normalized spectra of the 14 samples, the intensity of the iron and magnesium spectral lines depends linearly on the concentration in these samples (as listed in Table 2) and the background of all the spectra presents a plane. The characteristic lines were found by a method of the find peak and then represented the element of these spectral lines, which were identified through comparison to the National Institute of Standards and Technology's atomic spectra database. 
The main characteristic lines of the elements contained in the samples can be clearly identified; the Mg, Fe, Si, and O lines are labeled in the mean spectrum of Sample-A06 in Figure 6a. All the LIBS spectra were then normalized by L2 normalization. Figure 6 shows the normalized spectra of the 14 samples: the intensities of the iron and magnesium lines depend linearly on the concentrations in these samples (listed in Table 2), and the background of all the spectra is flat.
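L2 normalization divides each spectrum by its Euclidean norm, so every spectrum has unit length. The sketch below pairs it with a simple local-maximum peak search and a nearest-line match. The paper matched peaks against the NIST atomic spectra database; here the reference table holds only the Mg and Fe lines listed in the MLR analysis of this paper, and the height threshold and the 0.2 nm tolerance are assumptions.

```python
import numpy as np

# Characteristic lines (nm) quoted in this paper for the MLR analysis.
REFERENCE_LINES = {
    "Mg": [294.20, 383.30, 517.40, 518.50],
    "Fe": [372.90, 404.70, 438.50, 440.60],
}


def l2_normalize(spectra):
    """Scale each row (one spectrum) to unit Euclidean norm."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    return spectra / np.linalg.norm(spectra, axis=1, keepdims=True)


def find_peaks_simple(y, min_height):
    """Indices of strict local maxima whose value exceeds min_height."""
    y = np.asarray(y, dtype=float)
    mid = y[1:-1]
    is_peak = (mid > y[:-2]) & (mid > y[2:]) & (mid > min_height)
    return np.flatnonzero(is_peak) + 1


def match_lines(peak_wavelengths, reference=REFERENCE_LINES, tol_nm=0.2):
    """Pair each peak with the nearest reference line within tol_nm (assumed tolerance)."""
    matches = []
    for wl in peak_wavelengths:
        dist, element, ref = min(
            (abs(wl - r), name, r) for name, refs in reference.items() for r in refs
        )
        if dist <= tol_nm:
            matches.append((float(wl), element, ref))
    return matches
```

Normalizing each spectrum to unit length removes shot-to-shot changes in total emitted intensity, so the relative line heights that carry the concentration information can be compared across spectra and samples.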

Calibration and Validation of Quantitative Analysis
There are usually many characteristic spectral lines (peak intensity or peak area) that indicate the concentration of an element. A calibration model can therefore be established between the concentration and either a single representative line or multiple lines by linear regression, that is, ULR and MLR, respectively.
In the ULR analysis, the magnesium line at 294.20 nm was used to build the calibration curve for Fo, and the iron line at 404.70 nm was used to build the calibration curve for Fa. A calibration curve is constructed from the measured line intensity against the elemental concentration, and its LOD can be calculated from the standard deviation of the blank and the slope of the calibration curve. To facilitate comparison with the other methods, the peak intensity of the spectral lines was used for the quantitative analysis, and calibration curves were established against the known concentrations using, respectively, the mean spectrum and all 58 LIBS spectra of each sample, as shown in Figure 7. The figure clearly shows a linear relationship between the spectral signal and the concentration, and the results obtained from the mean spectrum and from all the spectra are nearly the same. The R² values of the training set for Fo and Fa are 0.9650 and 0.9901 with the mean spectrum, and 0.9615 and 0.9829 with all spectra. This indicates that the LIBS signal intensities of the training set are linearly correlated with concentration at low concentrations. Table 3 lists the quantitative accuracy and detection limits for Fo and Fa obtained with the mean spectrum and with all spectra. The RMSE values of the training set for Fo and Fa are 5.9131 and 3.1495 with the mean spectrum, and 6.2079 and 4.1358 with all spectra. The LODs for Fo and Fa are 0.9943 and 2.0536 with the mean spectrum, and 2.3354 and 3.8883 with all spectra.
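The ULR step fits a straight line of intensity against concentration, inverts it to predict concentration, and derives the LOD from the blank. The text states only that the LOD uses the blank standard deviation and the slope; this sketch assumes the common 3σ convention, LOD = 3σ_blank/k, and uses made-up data rather than the values in Table 3.

```python
import numpy as np


def ulr_calibration(conc, intensity):
    """Least-squares line intensity = slope * conc + intercept, plus R^2."""
    conc = np.asarray(conc, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    slope, intercept = np.polyfit(conc, intensity, deg=1)
    fitted = slope * conc + intercept
    ss_res = np.sum((intensity - fitted) ** 2)
    ss_tot = np.sum((intensity - intensity.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot


def predict_conc(intensity, slope, intercept):
    """Invert the calibration line to estimate concentration."""
    return (np.asarray(intensity, dtype=float) - intercept) / slope


def rmse(pred, true):
    """Root-mean-square error between predicted and true concentrations."""
    diff = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def lod_3sigma(blank_intensities, slope):
    """LOD = 3 * std(blank) / slope (assumed 3-sigma convention)."""
    return 3.0 * np.std(blank_intensities, ddof=1) / slope


# Synthetic, perfectly linear calibration set (made-up numbers).
conc = np.arange(10.0, 100.0, 10.0)          # Fo or Fa content, c%
intensity = 0.002 * conc + 0.01
slope, intercept, r2 = ulr_calibration(conc, intensity)
blank = np.array([0.010, 0.012, 0.008, 0.011, 0.009])
lod = lod_3sigma(blank, slope)
```

A steeper slope or a quieter blank both lower the LOD, which is one way to read how a line with strong emission can give a different detection limit from a line that gives better overall accuracy.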
These results show that the mean spectrum gives better results than all spectra in the ULR analysis. The accuracy of Fa (from the iron line) is better than that of Fo (from the magnesium line), whereas the LOD of Fo is better than that of Fa. A possible reason is that iron is more active than magnesium in the laser-induced plasma and shows strong lines even at lower content.
In the ordinary MLR analysis, more characteristic lines of magnesium and iron (Mg at 294.20, 383.30, 517.40, and 518.50 nm and Fe at 372.90, 404.70, 438.50, and 440.60 nm) were used to establish the relationship between the spectral signal and the Fo and Fa contents. For the shrinkage or regularization techniques (ridge, LASSO, and elastic net), the principal-component-based methods (PCR and PLSR), and the BP artificial neural network (ANN), the entire spectral information was used as the independent variable matrix.
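MLR extends the single-line fit to several characteristic lines at once. The sketch below assumes an inverse calibration, c = b₀ + Σⱼ bⱼIⱼ, solved by ordinary least squares, where each row of the matrix holds the intensities of the selected lines (for example the four Mg lines) in one spectrum; the function names and data are hypothetical.

```python
import numpy as np


def fit_mlr(line_intensities, conc):
    """Least-squares fit of conc = b0 + X @ b, one row of X per spectrum."""
    X = np.asarray(line_intensities, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])   # intercept column first
    coef, *_ = np.linalg.lstsq(design, np.asarray(conc, dtype=float), rcond=None)
    return coef


def predict_mlr(coef, line_intensities):
    """Predict concentration from line intensities with fitted coefficients."""
    X = np.asarray(line_intensities, dtype=float)
    return coef[0] + X @ coef[1:]


# Synthetic intensities of four lines for 14 spectra (made-up numbers).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(14, 4))
true_coef = np.array([5.0, 40.0, -10.0, 25.0, 8.0])
conc = true_coef[0] + X @ true_coef[1:]
coef = fit_mlr(X, conc)
```

Using several lines averages out line-specific noise and self-absorption effects to some degree, which is consistent with MLR performing slightly better than ULR; with many strongly correlated lines, however, plain least squares becomes unstable, which motivates the regularized and component-based methods.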
The relationship between the analyte concentration and the spectral intensity was then estimated. In the PCR and PLSR analyses, three PCs were found sufficient to explain 95% of the variance of the LIBS spectra. In the BP analysis, three hidden layers were used (with 12, 8, and 4 neurons, respectively), and the hidden and output layers used hyperbolic tangent and linear transfer functions, respectively. Table 4 lists the quantitative accuracy for Fo and Fa obtained with the mean spectrum and with all the spectra of each sample, and Table 5 gives the validation of the predicted against the true concentrations. The R² values of both calibration and validation are close to 1 (above 0.99 for all methods except BP with the mean spectrum, which is approximately 0.89), and the R² values of validation are better than those of calibration. This indicates that the calibration models describe the linear relationship between the spectral signal and the analyte concentration well, and that the predicted concentrations are very close to the true ones. To compare the methods, histograms of the RMSE values of each method on calibration and validation for the training and testing samples are shown in Figure 8. For the accuracy of the calibration model on the training samples (Figure 8a), PLSR performs best with either the mean spectrum or all spectra, with RMSE values of 0.00 and 0.75, respectively. PCR also performs well with the mean spectrum, but with all spectra it is slightly worse than LASSO and elastic net; its RMSE values are 1.47 and 2.34, respectively.
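PCR and PLSR both compress the full spectrum into a few latent components before regression. The from-scratch NumPy sketch below assumes mean-centered data and one response at a time (Fo or Fa): PCR keeps the fewest PCs that reach a variance target (95% in the text), and PLSR uses the NIPALS PLS1 algorithm, which is one standard way to compute PLSR, not necessarily the implementation used in the paper. The synthetic spectra are exactly rank 3, so the demo raises the variance target to 0.999 to keep all three latent components; in practice a tested library implementation would normally be preferred.

```python
import numpy as np


def pcr_fit(X, y, var_target=0.95):
    """PCR keeping the fewest principal components that reach var_target."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(explained, var_target)) + 1, s.size)
    scores = Xc @ Vt[:k].T                         # projections on the first k PCs
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = Vt[:k].T @ gamma                        # map back to spectral coefficients
    return beta, x_mean, y_mean, k


def pls1_fit(X, y, n_components):
    """PLS1 regression with the NIPALS algorithm (single response)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)                     # weight vector
        t = E @ w                                  # score vector
        tt = t @ t
        p = E.T @ t / tt                           # X loading
        c = (f @ t) / tt                           # y loading
        E = E - np.outer(t, p)                     # deflate X
        f = f - c * t                              # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)         # B = W (P^T W)^-1 q
    return beta, x_mean, y_mean


def predict(beta, x_mean, y_mean, X):
    """Predict concentration from spectra with centered-regression coefficients."""
    return (np.asarray(X, dtype=float) - x_mean) @ beta + y_mean


# Synthetic "spectra": 14 samples x 50 channels built from 3 latent sources,
# with a concentration that depends linearly on the latent scores.
rng = np.random.default_rng(2)
T = rng.normal(size=(14, 3))
L = rng.normal(size=(3, 50))
X = T @ L
y = 50.0 + T @ np.array([10.0, -5.0, 2.0])

b_pcr, xm, ym, k = pcr_fit(X, y, var_target=0.999)
b_pls, xm2, ym2 = pls1_fit(X, y, n_components=3)
```

The key design difference is that PCR chooses components by spectral variance alone, whereas PLSR chooses directions that also covary with the concentration, so PLSR usually needs fewer components for the same prediction accuracy.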
For BP, all the spectra of each sample predict the concentration accurately, whereas the mean spectrum gives the worst performance: the RMSE values for Fo and Fa are 1.87 and 2.94 with all spectra, versus 10.34 and 10.72 with the mean spectrum. A likely reason is that the neural network needs more spectra per sample to train an accurate model. Ridge, LASSO, and elastic net also give better results with all spectra than with the mean spectrum, so these methods likewise require more data per sample to train the model. Of the three, elastic net generally performs best and ridge worst, which is consistent with elastic net combining the advantages of ridge and LASSO. ULR and MLR perform worse than the other algorithms, except BP with the mean spectrum; because MLR uses more characteristic lines, its results are slightly better than those of ULR. For the accuracy of the calibration model on the testing samples (Figure 8b), PLSR and PCR again perform best, with RMSE values of 1.33 and 1.23 for Fo with the mean spectrum, 1.81 and 1.95 for Fo with all spectra, 1.33 and 1.27 for Fa with the mean spectrum, and 1.81 and 1.95 for Fa with all spectra, respectively, followed by elastic net and LASSO. For validation on both the training and testing samples, the methods show consistent trends, and all spectra give better accuracy than the mean spectrum (compare Figure 8c with Figure 8d). In general, the shrinkage or regularization techniques (ridge, LASSO, and elastic net) perform better than ULR and MLR, while the best results are obtained with the principal component techniques PCR and PLSR, which use all the spectral information. Meanwhile, the neural network is more suitable for samples measured with a larger number of spectra.
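Ridge, LASSO, and elastic net differ only in the penalty added to the least-squares loss. The sketch below assumes the widely used parametrization with an overall strength `alpha` and a mixing weight `l1_ratio` (the same objective as scikit-learn's `ElasticNet`), solved by cyclic coordinate descent; the paper does not report its hyperparameters, so the values in the example are illustrative only.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the proximal step of the L1 penalty)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for the elastic-net objective

        (1 / 2n) * ||y - b0 - X b||^2
        + alpha * (l1_ratio * ||b||_1 + 0.5 * (1 - l1_ratio) * ||b||_2^2)

    l1_ratio = 1 gives LASSO and l1_ratio = 0 gives ridge. X and y are centered
    internally so that the intercept b0 is not penalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n, p = Xc.shape
    col_sq = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    resid = yc.copy()                               # residual for the current b
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                            # constant channel: leave at 0
            old = b[j]
            z = Xc[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(z, alpha * l1_ratio) / (col_sq[j] + alpha * (1.0 - l1_ratio))
            if new != old:
                resid -= Xc[:, j] * (new - old)
                b[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return y_mean - x_mean @ b, b


rng = np.random.default_rng(3)
X = rng.normal(size=(50, 3))
y = 2.0 + X @ np.array([1.5, -2.0, 0.5])

b0_ols, b_ols = elastic_net(X, y, alpha=0.0)                # no penalty: least squares
_, b_ridge = elastic_net(X, y, alpha=1.0, l1_ratio=0.0)     # ridge shrinks coefficients
_, b_lasso = elastic_net(X, y, alpha=100.0, l1_ratio=1.0)   # strong LASSO zeroes them
```

Because one function covers all three methods, comparing them reduces to changing `l1_ratio`: the L1 part drives unhelpful spectral channels to exactly zero, while the L2 part keeps groups of correlated channels (such as neighboring pixels of one line) together, which is the usual argument for elastic net over LASSO on spectra.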
In practice, using the entire spectral information of all the spectra, instead of the mean spectrum or the characteristic lines, avoids cumbersome preprocessing steps such as spectrum averaging and finding the peaks of the characteristic lines. In addition, the predicted content of each sample on calibration and validation for each method is detailed in Supplementary Material Tables S1 and S2. Supplementary Material Figure S1 plots the linear regression of the predicted against the true concentrations for each method, with the estimated 95% confidence interval and 95% prediction interval. The deviation between the predicted and true concentration of each sample is shown qualitatively and quantitatively, and the predicted concentrations are consistent with the true concentrations. PLSR again performs best with either the mean spectrum or all the spectra and yields narrower 95% confidence and prediction intervals, followed by elastic net and PCR. Nearly all the samples fall within the 95% confidence interval. Although LASSO performs similarly to elastic net, it shows larger deviations than elastic net on the testing samples T01 and T03, which fall outside the 95% confidence interval. This may indicate overfitting of the training set: a model can fit the training data well yet perform poorly on the testing data. ULR and MLR, based on the spectral lines, also show good linear relationships, with MLR performing better than ULR. Of all the methods, BP with the mean spectra performs worst, especially for Fa, which has the largest 95% confidence interval.
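The 95% confidence and prediction bands in Figure S1 correspond to the standard intervals for a simple linear regression of predicted on true concentration. The sketch below assumes ordinary least squares and takes the Student t quantile as an argument; the value 2.179 used here is the two-sided 95% quantile for n − 2 = 12 degrees of freedom, and the function name and data are hypothetical.

```python
import numpy as np


def regression_bands(true, pred, x0, t_quantile):
    """Fit pred = a + b * true by OLS; return the fitted value and the half-widths
    of the confidence interval (mean response) and the prediction interval
    (new observation) at the points x0."""
    x = np.asarray(true, dtype=float)
    y = np.asarray(pred, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n = x.size
    b, a = np.polyfit(x, y, deg=1)
    resid = y - (a + b * x)
    s = np.sqrt(resid @ resid / (n - 2))           # residual standard error
    sxx = np.sum((x - x.mean()) ** 2)
    leverage = 1.0 / n + (x0 - x.mean()) ** 2 / sxx
    fit = a + b * x0
    ci_half = t_quantile * s * np.sqrt(leverage)
    pi_half = t_quantile * s * np.sqrt(1.0 + leverage)
    return fit, ci_half, pi_half


# Synthetic predicted-vs-true values for 14 samples (made-up numbers).
true = np.linspace(10.0, 90.0, 14)
pred = true + 0.5 * np.sin(true)
T_975_DF12 = 2.179                                 # two-sided 95% t quantile, 12 dof
fit, ci, pi = regression_bands(true, pred, x0=[10.0, 50.0, 90.0], t_quantile=T_975_DF12)
```

The confidence band describes uncertainty in the fitted line and is narrowest at the mean concentration, while the prediction band adds the scatter of individual samples and is always wider; a sample lying outside the confidence band, as T01 and T03 do for LASSO, therefore signals a deviation larger than the line uncertainty alone explains.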
These interval results are consistent with the accuracy analysis above. Because the concentration gradients of Fo and Fa are opposite, the calibration curves and predicted concentration distributions obtained from the entire spectral information of all spectra are almost identical in the figure. For the calibration curves obtained with all spectra, the predicted concentrations of the training and testing samples agree closely with the true concentrations, although the confidence interval is larger than with the mean spectra, which may be due to the larger number of spectra per sample. In general, the principal component techniques PCR and PLSR give the best results and performance, especially PLSR. The shrinkage or regularization techniques (ridge, LASSO, and elastic net) perform better than ULR and MLR, except that LASSO and ridge show large deviations on the testing samples. Using all the spectral information of all spectra achieves better quantitative accuracy than using the mean spectrum or the characteristic lines.

Conclusions
This paper focuses on the preprocessing and quantitative analysis of MarSCoDe LIBS spectra, using common univariate linear regression (ULR), multivariate linear regression methods such as ordinary MLR, PCR, PLSR, ridge, LASSO, and elastic net, and the nonlinear BP neural network. Of these, ULR and MLR are applied to the characteristic lines, and the other methods to the entire spectral information. First, the performance of the instrument suite, the sample preparation, and the Mars simulation experiment are introduced. Second, the ground data preprocessing, including background subtraction, random signal denoising, continuum baseline removal, spectral drift correction and wavelength calibration, radiation calibration, multi-channel spectra merging, and normalization, is presented, and the qualitative and quantitative analyses of these methods are described. Third, quantitative analyses (i.e., PCR, PLSR, ridge, LASSO, elastic net, and BP) are conducted, and their results are compared and analyzed. Finally, the performance of olivine quantification with MarSCoDe LIBS is compared between the mean spectrum and all spectra of each sample and evaluated with statistical indicators (R², MAE, Std, and RMSE). The results show that (1) the ULR calibration curves constructed from the characteristic lines of iron and magnesium describe the linear relationship between the spectral signal and the element concentration, and the LODs for Fo and Fa are 0.9943 and 2.0536 with the mean spectrum and 2.3354 and 3.8883 with all spectra. (2) The R² values of calibration and validation are close to 1, the calibration models describe the linear relationship between spectral signal and elemental concentration well, and the predicted concentrations are very close to the true concentrations.
(3) The shrinkage or regularization techniques (ridge, LASSO, and elastic net) perform better than ULR and MLR, and the best results are obtained with the principal component techniques PCR and PLSR, which use all the spectral information, especially PLSR; BP is more suitable for samples measured with a larger spectral dataset. In addition, using all the spectral information of all the spectra, instead of the mean spectrum or the characteristic lines, avoids cumbersome preprocessing steps in practice.
Future work will apply the quantification model to the in situ olivine data, establish the transfer between the flight-model (positive sample) equipment and the backup equipment, and obtain more sample data for neural network analysis. In addition, the calculation of LOD in multivariate calibration will be studied.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs14215612/s1, Table S1. Predicted content of each sample on calibration and validation analyzed by the mean spectrum; Table S2. Predicted content of each sample on calibration and validation analyzed by all spectra; and Figure S1. Calibration curves for Fo and Fa with the mean spectrum and full spectra of the samples.