Ultraviolet Spectroscopic Detection of Nitrate and Nitrite in Seawater Simultaneously Based on Partial Least Squares

A direct, reagent-free ultraviolet spectroscopic method for the simultaneous determination of nitrate (NO3−), nitrite (NO2−), and salinity in seawater is presented. The method is based on measuring the absorption spectra of raw seawater in the range of 200–300 nm, combined with partial least squares (PLS) regression to resolve the spectral overlap of NO3−, NO2−, and sea salt (or salinity). The interference from chromophoric dissolved organic matter (CDOM) UV absorbance was reduced according to its exponential relationship between 275 and 295 nm. The results of the cross-validation of the calibration and prediction sets were used to select the number of factors (4 for NO3−, NO2−, and salinity) and to optimize the wavelength range (215–240 nm) with a 1 nm wavelength interval. The linear relationship between the predicted and the actual values of NO3−, NO2−, and salinity, together with the recovery of spiked water samples, suggests that the proposed PLS model can be a valuable alternative to the wet chemical methods. Due to its simplicity and fast response, the proposed PLS model can be used as an algorithm for building nitrate and nitrite sensors. A comparison of PLS with a classic least squares (CLS) model shows that both can give satisfactory results for predicting NO3− and salinity. However, for NO2− in some samples, PLS is superior to CLS, which may be due to the interference from unknown substances not included in the CLS algorithm. The proposed method was applied to the analysis of NO3−, NO2−, and salinity in Changjiang (Yangtze River) estuary water samples, and the results are comparable with those determined by the colorimetric Griess assay.


Introduction
Nitrate (NO3−) and nitrite (NO2−) are essential nutrients for marine phytoplankton growth and play a key role in many biogeochemical cycles [1,2]. NO3− and NO2− concentrations in seawater are also important indicators of water quality. Due to human activities, large amounts of nutrients are discharged into natural waters, thereby destroying the ecological balance and causing the eutrophication of water bodies [3]. Therefore, accurate quantification of NO3− and NO2− is critical for understanding the dynamics of marine ecosystems. Wet chemical analyses of NO3− and NO2− in seawater (e.g., the Griess assay) have been previously reviewed in the literature [4][5][6]. These chemical methods require the addition of chemical reagents and are therefore time-consuming and generate waste during measurement.
Ultraviolet (UV) spectroscopy is another well-known method for determining NO3−, which is based on the strong UV absorption spectrum of NO3− [7]. It is a standard method for NO3− analysis by the American Public Health Association [8]. The advantages of this method include its simplicity and speed of data acquisition. It avoids the use of any chemical reagents. Therefore, it can easily be developed into an underwater sensor for long-term monitoring. However, this method is susceptible to interference from high concentrations of Cl− and Br− (or sea salt, salinity) in seawater, which have strong UV absorbance in the NO3− absorption range [9]. Previously, multi-wavelength measurement and classic least squares (CLS) regression were used to separate the overlapping spectra and measure NO3− [9][10][11][12]. However, these approaches did not take NO2− into account. Although NO2− is the least abundant of the major inorganic nitrogen ions (NH4+, NO3−, and NO2−) [13], it can accumulate at concentrations up to 10 µM in low-oxygen estuary and coastal waters, oxygen-deficient zones, and upwelling regions [14][15][16][17][18]. In these areas, NO2− may influence the NO3− measurement, considering that NO2− has a similar UV absorption spectrum to that of NO3− (Figure 1). To date, there is no report on the simultaneous determination of NO3− and NO2− in seawater using UV spectroscopy. Langergraber et al. (2003) and Rieger et al. (2004, 2008) used a submersible UV/VIS spectrometer combined with partial least squares (PLS) regressions to monitor NO3−, NO2−, and even chemical oxygen demand in the effluent of municipal wastewater treatment plants [19][20][21]. However, this method cannot be applied to analyze NO3− and NO2− in seawater, because it does not eliminate the interference from sea salt and chromophoric dissolved organic matter (CDOM) in seawater.
Molecules 2021, 26, x FOR PEER REVIEW
In this study, we aimed to (1) develop a direct, reagent-free ultraviolet spectroscopic method to determine NO3−, NO2−, and salinity simultaneously based on the PLS model; (2) select an appropriate number of factors and optimal wavelength range for the PLS model; (3) evaluate the performance of the proposed method; (4) compare the results of the PLS and CLS models; (5) apply the proposed method to estuarine water samples.


PLS Regressions
PLS is a multivariate calibration method based on factor analysis. It combines principal component analysis, multiple linear regression analysis, and canonical correlation analysis. The theoretical basis for PLS regression can be found in several references [22][23][24][25][26]. PLS establishes a quantitative relationship (Equations (1) and (2)) between an n × m matrix (X) of independent variables (absorbance at each wavelength, in this case) and an n × k matrix (Y) of the variables to be predicted (NO3−, NO2−, and salinity, in this case). The PLS model can be written as follows:
$$X = T P^{\mathsf{T}} + E \qquad (1)$$
$$Y = U Q^{\mathsf{T}} + F \qquad (2)$$
where P and Q are the loading matrices of X and Y, which give information about weights for each predictor in X when calculating latent variables (factors); E and F are the matrices of X and Y residuals (both with the same dimensions as the original absorbance and concentration matrices, respectively); T and U represent the score matrices for X and Y, which can summarize X and predict Y with small errors in E and F. The decompositions of X and Y are made to maximize the covariance between T and U. The PLS model is built using a calibration or training set of samples that have known property values. Following the establishment of a satisfactory model, the property values in a prediction set of samples can be predicted. Here, the experimental spectra (matrix X) of single and mixed standards, with known concentrations of NO3−, NO2−, and salinity (matrix Y), were used as the calibration set.
PLS calibration of a multi-component system can be performed in two different ways, PLS1 and PLS2. In PLS1, a separate set of scores and loading vectors is tuned and calculated for each variable (NO3−, NO2−, and salinity). In PLS2, several variables (NO3−, NO2−, and salinity) are fitted simultaneously, and there is one common set of factors for NO3−, NO2−, and salinity [25,27]. Therefore, PLS1 should give more accurate predictions than PLS2, especially when one of the variables is influenced by a number of factors different to other variables in the mixture [26,28,29]. However, PLS2 can simplify the procedure and allows for simultaneous graphical inspection. Thus, PLS2 is faster to use than PLS1. However, it should be noted that PLS2 usually performs equally well or worse than PLS1 if there is a weak or no correlation between response variables [30,31]. In the present study, the results from PLS1 and PLS2 models are compared.
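As an illustration of the PLS1 variant described above, the following is a minimal sketch of a NIPALS-style PLS1 fit for a single response in plain Python. It is not the authors' MATLAB script; the function names `pls1_fit` and `pls1_predict` and the data layout (one spectrum per row) are assumptions made for this example.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_factors):
    """Fit PLS1 by NIPALS. X: list of spectra (rows), y: list of concentrations."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    factors = []
    for _ in range(n_factors):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]                        # scores
        tt = _dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = _dot(f, t) / tt
        # Deflate X and y before extracting the next factor.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, p, q))
    return x_mean, y_mean, factors


def pls1_predict(model, x):
    """Predict the response for one new spectrum x."""
    x_mean, y_mean, factors = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in factors:
        t = _dot(e, w)
        y_hat += q * t
        e = [ei - t * pj for ei, pj in zip(e, p)]
    return y_hat
```

In this sketch a PLS1 calibration for three analytes would simply call `pls1_fit` three times, once per response column, whereas PLS2 would share one set of factors across all responses.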

Interference from CDOM
CDOM comprises a significant fraction of the DOM pool in natural waters (~10-90%) [32]. It has strong absorption in the UV range [33,34]. Thus, the interference with NO3− and NO2− from CDOM is of particular concern. CDOM is a mixture of many organic compounds that differ spatially and temporally due to their origin. The spectral shapes of CDOM vary with the compositions of CDOM. It is difficult to link optical absorbance directly to CDOM concentrations or compositions [35,36]. Therefore, CDOM cannot be added to the predictor variable list for the model building. Many previous studies have suggested that the UV absorption spectrum of CDOM in seawater fits an exponential function of wavelength [36][37][38][39], which can be given as Equation (3):
$$A_{\mathrm{CDOM}}(\lambda) = A_{\mathrm{CDOM}}(\lambda_0)\, e^{-S(\lambda - \lambda_0)} + k \qquad (3)$$
where λ is the wavelength (nm); λ0 is a reference wavelength (nm); A_CDOM(λ) and A_CDOM(λ0) are the CDOM absorbance at the wavelengths λ and λ0; k is a background constant (m−1), which accounts for light scattering in the cuvette and drift of the instrument; and S is the spectral slope (nm−1), which describes the approximate exponential rate of decrease in absorption with increasing wavelength. In Equation (3), S and a are used to define differences among different samples. Usually, S is calculated over a broad wavelength range (e.g., 275-295, 350-400, and 300-600 nm) [36,38,39]. Several previous studies used a quadratic function [9,40] or linear function [41,42] to fit, approximately, the CDOM spectra. Given the comparatively high concentrations of CDOM in estuarine and coastal waters, in this study, we used an exponential function to fit the absorption spectrum of CDOM between 275 and 295 nm. The wavelength 300 nm was chosen as the reference wavelength (λ0). Then this function was applied to the wavelengths from 200 to 240 nm. Thus, the CDOM absorbance can be subtracted from the raw spectra in the developed PLS model.
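The CDOM correction described above can be sketched as a log-linear least-squares fit followed by extrapolation. This is an illustrative simplification, not the authors' implementation: it assumes the background constant k is zero (or has already been subtracted) and that all absorbances in the fitting window are positive. The function names are hypothetical.

```python
import math


def fit_cdom_exponential(wavelengths, absorbances, ref_wl=300.0):
    """Fit ln A = ln A0 - S * (wl - ref_wl) by least squares; return (A0, S).

    Assumes k = 0 and strictly positive absorbances (illustrative only).
    """
    xs = [wl - ref_wl for wl in wavelengths]
    ys = [math.log(a) for a in absorbances]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


def cdom_absorbance(wl, a0, s, ref_wl=300.0):
    """Evaluate the fitted exponential at wavelength wl (nm)."""
    return a0 * math.exp(-s * (wl - ref_wl))


def subtract_cdom(wavelengths, spectrum, a0, s, ref_wl=300.0):
    """Return the spectrum with the extrapolated CDOM absorbance removed."""
    return [a - cdom_absorbance(wl, a0, s, ref_wl)
            for wl, a in zip(wavelengths, spectrum)]
```

A typical use under these assumptions would fit `fit_cdom_exponential` on the 275-295 nm readings and then call `subtract_cdom` on the 200-240 nm part of the same spectrum before passing it to the regression model.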

CLS Regression
CLS is the simplest and most widely used technique for solving overdetermined systems. In its most important application, data fitting, it finds a hyperplane through a set of data points while minimizing the sum of squared errors [43,44]. For comparison with PLS, we also used CLS regression to fit the absorbance spectra of seawater samples and obtain NO3− and NO2− concentrations according to Equation (4), where b is the pathlength (cm) of the optical cell, ε is the absorption coefficient of the subscripted species (L mol−1 m−1 for NO3− and NO2−, PSU−1 m−1 for salinity), and C is the concentration of the subscripted species. Each ε value can be obtained by measuring the absorption in standard solutions of each chemical species. The concentrations of NO3−, NO2−, salinity, and all the CDOM coefficients (Equation (4)) were fitted together.
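To make the CLS idea concrete, here is a minimal sketch that solves the Beer-Lambert mixture model A(λ) = b Σ ε_j(λ) C_j for the concentrations by ordinary least squares (normal equations). It is a simplified stand-in for Equation (4): the CDOM terms that the authors fit together with the concentrations are omitted, and the function names and unit handling are assumptions for this example.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def cls_fit(pure_spectra, absorbance, pathlength=1.0):
    """Least-squares concentrations for a measured spectrum.

    pure_spectra[j][w] is the absorption coefficient of component j at
    wavelength index w; absorbance[w] is the measured mixture absorbance.
    """
    k = [[pathlength * e for e in spec] for spec in pure_spectra]
    gram = [[sum(u * v for u, v in zip(ki, kj)) for kj in k] for ki in k]
    rhs = [sum(u * a for u, a in zip(ki, absorbance)) for ki in k]
    return solve_linear(gram, rhs)
```

Because every absorbing species must appear in `pure_spectra`, an unmodelled absorber biases the fitted concentrations, which is the limitation of CLS that the comparison with PLS later in the paper examines.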

Reagents
All chemicals were of analytical reagent grade and supplied by the Sigma-Aldrich Company (Shanghai, China). The standard solutions of NO3− and NO2− were freshly prepared from NaNO3, NaNO2, and deionized water (Milli-Q water, 18.2 MΩ) before use.

Apparatus and Software
A UV-Vis spectrophotometer (Specord plus 210, Analytik Jena AG, Germany) was used to collect absorbance data from 200 to 300 nm. Due to the comparatively low concentrations and absorbance of NO2−, all the samples were measured in a 3.0 cm quartz cuvette. Milli-Q water was used as the reference. The spectral resolution was set to 1 nm. A higher resolution (e.g., 0.2-0.8 nm) yields similar results. All data-processing scripts, including PLS1, PLS2 (see the Supplementary Materials), and CLS regressions, were written in MATLAB for Windows (Mathworks, version 2019b).

Model Validation
The modelling error was evaluated from plots of predicted vs. actual concentrations using three quantities: the root mean square error of prediction (RMSEP), which quantifies the deviation between predicted and actual values; the correlation coefficient (R²) between predicted and actual concentration values of the prediction set; and the relative percentage error in concentration prediction (RE). These quantities are defined in Equations (5)-(7), where $C_i$ and $\hat{C}_i$ are the real and predicted concentrations in the ith sample, $\bar{C}$ and $\bar{\hat{C}}$ are the means of the real and the predicted concentrations of all the samples in the prediction set, and N is the number of samples in the prediction set.
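The three validation quantities can be computed as in the sketch below. The exact forms of Equations (5)-(7) are not reproduced in this text, so the definitions used here are assumptions based on common chemometric usage: RMSEP as the root of the mean squared prediction error, R² as the squared Pearson correlation between actual and predicted values, and RE as 100 × sqrt(Σ(Ĉi − Ci)² / ΣCi²).

```python
import math


def rmsep(actual, predicted):
    """Root mean square error of prediction."""
    n = len(actual)
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((c - mean_a) * (p - mean_p) for c, p in zip(actual, predicted))
    var_a = sum((c - mean_a) ** 2 for c in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


def relative_error_percent(actual, predicted):
    """Relative percentage error (assumed form, see lead-in)."""
    num = sum((p - c) ** 2 for c, p in zip(actual, predicted))
    den = sum(c ** 2 for c in actual)
    return 100.0 * math.sqrt(num / den)
```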

The Calibration and Prediction Sample Sets
The samples for the calibration and prediction sets were prepared using seawater samples collected in the Changjiang estuary with known concentrations of NO3−, NO2−, and salinity, spiked with NO3− and NO2− standard solutions. An experimental design was used to construct the calibration set to provide a good prediction. As shown in Table 1, 34 samples were selected as the training and the prediction set, which included one-, two-, and three-component mixtures of NO3−, NO2−, and salinity with various concentrations. For the prediction set of 20 samples, the compositions were randomly designed within similar ranges of NO3−, NO2−, and salinity to those in the training set.

Selection of the Optimal Number of Factors
The number of factors, or latent variables, is an important parameter governing the performance of the PLS model. The introduction of an unnecessary number of factors may result in the overfitting of the calibration curve. To select the number of latent variables in PLS regression, a cross-validation procedure of leaving out one sample at a time was employed for PLS in order to model the compositions without overfitting the data [22,26,45]. From the set of 34 calibration spectra, the PLS calibration was performed on 33 spectra. Using this calibration, the concentration of the compounds in the sample left out was predicted. This process was repeated 34 times until each calibration sample had been left out once during the calibration process. The concentration predicted for each sample was then compared with its known concentration. The sum of the squared concentration prediction errors for all calibration samples (prediction error sum of squares (PRESS)) was used to determine how well a particular PLS model fitted the concentration data. This is defined in Equation (7).
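The leave-one-out procedure described above can be sketched generically: each sample is held out in turn, a model is fitted to the rest, and the squared prediction error for the held-out sample is accumulated into PRESS. The `fit` and `predict` callables are placeholders for whatever calibration model is used (a PLS model with h factors in the paper); the trivial mean-value model below is purely a hypothetical stand-in to keep the example self-contained.

```python
def loo_press(samples, targets, fit, predict):
    """Leave-one-out prediction error sum of squares (PRESS)."""
    press = 0.0
    for i in range(len(samples)):
        # Calibrate on all samples except the ith one.
        train_x = samples[:i] + samples[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        model = fit(train_x, train_y)
        # Predict the sample that was left out and accumulate its error.
        error = targets[i] - predict(model, samples[i])
        press += error ** 2
    return press


def fit_mean(train_x, train_y):
    """Hypothetical stand-in model: always predicts the training mean."""
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model
```

With 34 calibration spectra, the loop runs 34 times, each time calibrating on 33 spectra, which matches the procedure in the text.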
One reasonable choice for the optimum number of factors (h) would be the number that yielded a minimum PRESS value. However, in many cases, the minimum PRESS value resulted in the overfitting of the data, given that it is based on a finite number of samples and thus subject to error [22,43]. A frequently used methodology to determine h is based on both the value of PRESS and a Q² threshold, defined as follows (Equations (8)-(10)) [46,47]:
$$\mathrm{PRESS}_h = \sum_{i=1}^{n} \left(C_i - \hat{C}_{h(-i)}\right)^2 \qquad (8)$$
$$\mathrm{RESS}_{h-1} = \sum_{i=1}^{n} \left(C_i - \hat{C}_{(h-1)i}\right)^2 \qquad (9)$$
$$Q_h^2 = 1 - \frac{\mathrm{PRESS}_h}{\mathrm{RESS}_{h-1}} \qquad (10)$$
where PRESS_h is the predictive residual error sum of squares when the number of components is equal to h; RESS_(h−1) is the residual sum of squares when the number of components is h−1; $C_i$ is the real concentration of the analyte; $\hat{C}_{h(-i)}$ is the fitted concentration of the analyte in the ith sample computed by the PLS regression after deleting the ith sample and using h factors; and $\hat{C}_{(h-1)i}$ is the fitted concentration of the analyte in the ith sample computed by the PLS regression using all the sample points and h−1 factors. The factor h is considered significant (p ≤ 0.05) for the prediction when $Q_h^2 \geq 0.0975$ (that is, 1 − 0.95²) [43,44].
When $Q_h^2$ is less than 0.0975, adding another factor does not improve model precision. For the PLS1 model, the cross-validation procedure was run three times, for NO3−, NO2−, and salinity separately; thus, the number of factors was also determined for NO3−, NO2−, and salinity separately. For the PLS2 model, the cross-validation was run only once, and the number of factors was determined only once for NO3−, NO2−, and salinity simultaneously. Taking the wavelength range of 215-240 nm as an example, the PRESS and $Q_h^2$ of the PLS1 and PLS2 models are shown in Table 2. It can be seen that both the PLS1 and PLS2 models give the same number of factors, 4, for NO3−, NO2−, and salinity. The cumulative contribution rate for 4 factors reached 99.99%.
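The stopping rule can be sketched as follows, assuming the usual cross-validation statistic Q²h = 1 − PRESSh / RESS(h−1) and the 0.0975 threshold quoted in the text. The function names and the input layout are assumptions for illustration, and the numbers in the usage note are hypothetical, not values from Table 2.

```python
def q2_values(press, ress_prev):
    """Q2 for h = 1..H.

    press[h - 1] holds PRESS_h and ress_prev[h - 1] holds RESS_(h-1),
    so ress_prev[0] is the residual sum of squares with zero factors.
    """
    return [1.0 - p / r for p, r in zip(press, ress_prev)]


def select_n_factors(press, ress_prev, threshold=0.0975):
    """Count leading factors whose Q2 meets the threshold."""
    n_factors = 0
    for q2 in q2_values(press, ress_prev):
        if q2 < threshold:
            break  # this factor no longer improves prediction
        n_factors += 1
    return n_factors
```

For example, with hypothetical values `press = [40, 12, 5, 2, 1.9]` and `ress_prev = [100, 30, 10, 4, 2]`, the fifth factor gives Q² = 0.05, below the threshold, so four factors would be retained.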

Wavelength Selection
The wavelength selection is carried out to choose a subset of spectral channels with which the established calibration model can give the minimum errors of the prediction. The optimal wavelength selection offers two clear benefits. Firstly, it has been shown that the inclusion of uninformative wavelengths in the training process negatively affects the accuracy of predictions and model interpretability [48,49]. Secondly, from a more practical point of view, the identification of a few wavelengths, or regions of the optical spectrum, that contain information about chemical species, significantly reduces the time and cost associated with their measurement and enables the development of portable and high-speed optical sensors.
We used interval partial least squares (iPLS), proposed by Norgaard et al. (2000) [50], to optimize the wavelength selection. This approach splits the spectrum into different intervals and treats each interval as a variable; the RMSEP, R², and RE are then calculated for each interval. The interval with maximal R², and minimal RMSEP and RE, was chosen as the optimal wavelength interval. Here, the 200-300 nm range was divided into subintervals of 16 wavelengths each: 200-215, 210-225, 220-235, ..., 285-300 nm. Then the wavelength range with the lowest RMSEP value was chosen for further optimization using one-sided symmetrical optimization. The results of the PLS2 model for several wavelength ranges are shown in Figure 2. This suggests that the optimal wavelength interval is 215-240 nm, which gives the maximal R², and minimal RE and RMSEP, for NO3−, NO2−, and salinity simultaneously. The plots of these predicted concentrations versus actual concentrations using the PLS2 model are shown in Figure 3. As can be seen, the predicted NO3−, NO2−, and salinity values are linearly correlated with the actual values, and all the correlation coefficients are > 0.98 (Figure 3). As both the PLS1 and PLS2 models have 4 factors, the results of PLS1 are the same as those of PLS2. To reduce the complexity and computation time of the model, we recommend using the PLS2 model and the wavelength range of 215-240 nm for calibration and prediction.
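The interval search can be sketched as generating overlapping windows and keeping the one with the lowest RMSEP. The window width of 16 wavelengths and the 10 nm step are inferred from the listed intervals, and clamping the last window so that it ends at 300 nm is an assumption made to reproduce the final "285-300" interval; the function names are hypothetical.

```python
def sliding_intervals(start, stop, width, step):
    """Overlapping (low, high) wavelength windows, inclusive bounds in nm."""
    intervals = []
    low = start
    while low + width - 1 <= stop:
        intervals.append((low, low + width - 1))
        low += step
    # Assumption: add a final window aligned to the end of the range.
    last = (stop - width + 1, stop)
    if intervals and intervals[-1] != last:
        intervals.append(last)
    return intervals


def best_interval(rmsep_by_interval):
    """Return the interval whose model gave the lowest RMSEP."""
    return min(rmsep_by_interval, key=rmsep_by_interval.get)
```

In practice each window returned by `sliding_intervals(200, 300, 16, 10)` would be used to build and validate its own PLS model, and the resulting RMSEP values would be passed to `best_interval` before the range is refined further.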

Comparison of PLS2 and CLS Regressions
For comparison, we also built a CLS model to fit NO3−, NO2−, and salinity based on Equation (4). The results obtained for the samples of the prediction set are shown in Figure 3. For NO3− and salinity, the results of CLS are also satisfactory. This is consistent with previous studies [9,40,42], in which different CLS algorithms were used to fit NO3− concentrations and salinity in seawater. However, for several samples with low NO2− concentrations, the CLS model is less predictive than the PLS2 model. The reason for this may be that NO2− concentrations in seawater are significantly lower than those of NO3−, and thus, NO2− is more susceptible to the interference of sea salt, CDOM, hydrogen sulfide, and other unknown substances. In contrast, PLS regression, as an indirect chemometric method, can lead to robust results even if not all the constituents are known [22,23].

Evaluation of the PLS2 Model
From the experimental data illustrated in Table 1, NO3−, NO2−, and salinity within the ranges of 0-85.62 µM, 0-20.12 µM, and 0-33.90 psu can be determined accurately by the PLS model. When the NO3− concentration is higher than 100 µM, we suggest using a cuvette with a 1.0 or 2.0 cm pathlength instead of 3.0 cm, in case the absorbance tends to saturate. To further evaluate the accuracy of the PLS2 model, recovery studies were carried out on seawater samples to which known amounts of NO3− and NO2− were added (Table 3). The percentage recovery for spiked samples ranged between 80 and 110%. The comparatively higher deviations from spiked concentrations were obtained from samples with low NO3− or NO2− concentrations. The detection limits of NO3− and NO2− in seawater were calculated as three times the standard deviation of 10 replicate analyses of a low-nutrient (surface) seawater. The standard deviations of the measurements were 0.07 and 0.10 µM for NO3− and NO2−, which gives NO3− and NO2− detection limits of 0.21 and 0.30 µM, respectively. The relative standard deviations for 10 repetitive analyses (n = 10) of a standard solution (3.71 µM NO3− + 1.48 µM NO2−) were 3.16% and 7.42%, and another solution of 20.75 µM NO3− and 6.28 µM NO2− gave relative standard deviations of 0.61% and 2.39%. Hence, the proposed method is quite precise for the quantitative determination of NO3− and NO2− in seawater, although the detection limits are comparatively higher than those of most colorimetric methods based on the Griess reaction [4][5][6]. Most importantly, it offers a simple, fast, and reagent-free method for the simultaneous determination of NO3− and NO2−.
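The figures of merit above follow from simple arithmetic, sketched here with hypothetical helper names. The sample standard deviation (n − 1 denominator) is assumed, since the text does not say which estimator was used, and the recovery formula (found in spiked sample minus found in unspiked sample, divided by the amount added) is the conventional definition rather than one stated in the paper.

```python
import statistics


def detection_limit(replicates, k=3.0):
    """k times the sample standard deviation of replicate measurements."""
    return k * statistics.stdev(replicates)


def relative_std_percent(replicates):
    """Relative standard deviation in percent."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)


def recovery_percent(found_unspiked, found_spiked, added):
    """Percentage of the spiked amount that the method recovers."""
    return 100.0 * (found_spiked - found_unspiked) / added
```

With the reported standard deviations of 0.07 and 0.10 µM, the factor of three gives the stated detection limits of 0.21 and 0.30 µM.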

Application and Comparison of the Predicted Results with Conventional Wet-Chemical Analyses
To evaluate the analytical applicability of the proposed PLS2 model, it was applied to the simultaneous determination of NO3−, NO2−, and salinity in water samples collected from the Changjiang estuary. These samples were filtered using 0.2 µm polycarbonate filters to eliminate the interference from turbidity. For comparison, the NO3− and NO2− concentrations were also measured by conventional wet-chemical analyses (colorimetric Griess assay). The results are shown in Table 4 and indicate good agreement between the two methods.
It should be noted that in situ UV absorption spectra, which are obtained at different temperatures, should be corrected according to the temperature dependence of bromide or sea salt, as suggested by Sakamoto et al. (2009, 2017) [41,51]. However, here, we measured the UV absorption spectra in a laboratory at room temperature (~25 °C). Therefore, there is no need to perform the temperature and pressure correction.

Conclusions
A direct, reagent-free ultraviolet spectroscopic method was introduced for the simultaneous determination of NO3−, NO2−, and salinity in seawater. A PLS model was built to resolve the highly overlapping spectra. This method has detection limits of 0.21 and 0.30 µM for NO3− and NO2−, respectively. It can be successfully used to determine NO3−, NO2−, and salinity, especially in estuarine and coastal waters with varying CDOM characteristics and different salinities. The simplicity, precision, and fast response time suggest that the proposed PLS model can be a valuable and cheap alternative to other chemical methods and can be used to build NO3− and NO2− sensors for seawater.
Supplementary Materials: All data-processing scripts for this article can be found online.
Funding: This work was supported by the National Natural Science Foundation of China (NSFC 42076062).

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.