Feasibility of Near-Infrared Spectroscopy for Rapid Detection of Available Nitrogen in Vermiculite Substrates in Desert Facility Agriculture

Abstract: Fast and precise estimation of the available nitrogen content in vermiculite substrates promotes prescription fertilization in desert facility agriculture. This study explored near-infrared spectroscopy for rapid detection of the available nitrogen content in vermiculite substrates in desert facility agriculture. The spectra of vermiculite matrices with different available nitrogen contents were collected with a self-assembled near-infrared spectrometer. Partial least squares regression (PLSR) was used to establish the spectral prediction model of available nitrogen, which was optimized using different pretreatments. After pretreatment, the prediction model was simplified by adopting three feature extraction methods. A comprehensive comparison of the prediction models showed that the model combining the first derivative with SG smoothing pretreatment was the best. The correlation coefficients of the corresponding calibration and prediction sets were 0.9972 and 0.9968, respectively. The root mean square errors of the calibration and prediction sets were 149.98 and 159.65 mg/kg, respectively, with an RPD of 12.57. These results provide a feasible method for rapidly detecting the available nitrogen content of vermiculite substrates in desert facility agriculture.


Introduction
Combining information technology with modern agricultural operation and management systems is a new agricultural practice for positioning, timing, and quantification. The rapid development of precision agriculture in facility farming demands rapid and accurate detection technology for the precise control and management of desert facilities [1,2]. Precision agriculture is a new system in desert facility agriculture. The high evaporation and scarce agricultural materials characteristic of deserts drove the development of facility agriculture, which combines and precisely controls low-cost substrates with high water and fertilizer conservation [3]. Vermiculite substrate is produced by expanding natural, inorganic, non-toxic vermiculite ore; it provides the nutrients and water necessary for long-term plant growth and effectively promotes crop root growth and stable seedling development. Vermiculite substrate is an alternative cultivation substrate in desert facility agriculture [4,5].
Notably, available nitrogen is easily absorbed and utilized by plants and is often used for short-term rapid nitrogen fertilizer supplementation to promote crop growth and ensure crop yield [6-8]. However, facility agriculture lacks the relevant technology and equipment to quickly detect the available nitrogen content of cultivation substrates. Thus, excessive application is generally adopted to ensure sufficient supplies, causing fertilizer overuse, serious waste, and agricultural pollution [9]. Presently, the detection of nitrogen in cultivation substrates mainly involves conventional chemical measurement of soil available nitrogen content, a time- and labor-intensive technique that pollutes the environment [10,11]. Therefore, methods for rapidly detecting the available nitrogen content in vermiculite substrates and determining in a timely manner the available nitrogen content in the cultivation environment are crucial for precise fertilization in desert facility agriculture and reducing agricultural pollution. Moreover, rapid and pollution-free methods for detecting nitrogen fertilizer levels in vermiculite substrates are urgently required.
Near-infrared (NIR) spectroscopy, widely used in food, medicine, agriculture, the chemical industry, and other fields, is a candidate technique. It is simple, rapid, nondestructive, and pollution-free [12]. The absorption bands in the NIR spectrum are all related to hydrogen-containing groups (such as C-H, N-H, and O-H) and can be used to analyze specific structures of chemical components. NIR spectroscopy has been widely used for detecting soil composition [13]. Many studies have shown that NIR spectroscopy can rapidly and non-destructively detect soil available nitrogen, phosphorus, and potassium [14-16]. This study adopted NIR spectroscopy to evaluate the available nitrogen content of vermiculite substrates in crop cultivation.
Despite the research progress in detecting available nitrogen content, most reports have focused on detection from soil, with only a few reports on detection in soil-less cultivation substrates. Vermiculite is an agricultural mineral matrix used in desert facilities. Therefore, rapid and accurate detection of available nitrogen content can facilitate the improvement of available nitrogen and water usage, further promoting and popularizing desert agricultural facilities. However, a systematic technical system for detecting the available nitrogen content of vermiculite is still lacking. Therefore, this study explored the possibility of using near-infrared spectroscopy to detect the available nitrogen content of vermiculite matrices rapidly. The research provides technical support for the agricultural application of vermiculite, specifically, a NIR spectral prediction model for rapidly detecting available nitrogen content in cultivation environments, thus promoting the modernization of desert agricultural facilities.
This study aimed at: (1) building a NIR spectroscopy system for vermiculite available nitrogen with a 940-1660 nm wavelength range; (2) establishing a prediction model from the full spectral data using partial least squares regression (PLSR) with different spectral pretreatments for detecting the nitrogen content of vermiculite substrate; (3) using the successive projections algorithm (SPA), competitive adaptive reweighted sampling (CARS), and synergy interval partial least squares (Si-PLS) to screen the characteristic NIR spectroscopic wavelengths of vermiculite available nitrogen; (4) analyzing and comparing the quantitative prediction models built from the full spectral data and from the different feature extraction methods. The best NIR spectroscopic model for predicting the available nitrogen content of vermiculite substrates was selected by comparing the modeling results of the characteristic variables and the full spectral data.

Experimental Materials
White vermiculite from the Qeganbulak Vermiculite mine in the northeast corner of the Taklimakan Desert, Xinjiang Uygur Autonomous Region (China) was used as the vermiculite substrate material. Before the experiment, the vermiculite substrates were immersed in a container of deionized water for 24 h, then filtered with gauze to remove impurities, and the substrates were dried naturally. Then, 10 g was weighed from each dried substrate using an analytical balance with a 0.001 g sensing accuracy and successively added to an aluminum box (40 mm diameter). The reference concentration gradient was the range of available nitrogen content in the actual cultivation environment. Then, different concentrations of nitrogen fertilizer solutions were artificially and successively added to the aluminum box in a way that simulated fertilization. The available nitrogen content of the substrate in the aluminum box was similar to the actual cultivation environment. The numbers were marked and recorded. Vermiculite substrate samples containing nitrogen fertilizer solutions were placed in a closed environment for 24 h to ensure that vermiculite substrates fully absorbed the nitrogen fertilizer. Next, the vermiculite samples were placed in a 105 °C thermostatic drying oven to eliminate moisture that might influence the spectral data. Dried vermiculite substrate samples were crushed successively by a crusher and screened through a 0.25 mm screen to form vermiculite substrate powder with uniform particle size. Before each grinding, the crusher was cleaned, and each screened sample of vermiculite substrate was weighed and added into a clean aluminum box of 40 mm diameter. Weighed samples were compacted with 500 g calibration weights to eliminate the influence of particle size heterogeneity on spectral data collection.

NIR Spectrum Measurement System and Spectral Data Acquisition
A self-assembled reflectance NIR spectroscopy collection system was used to acquire the reflectance spectral data of the vermiculite substrate samples (Figure 1). The system is composed of a Flame NIR spectrometer (FLAME-NIR-INTSMA25, Ocean Optics Co., Ltd., Dunedin, FL, USA), four 35 W halogen tungsten lamps (Philips halogen 12V, Shanghai Philips Co., Ltd., Shanghai, China), an optical fiber with a fiber probe, a ball screw, a sample chamber, and a computer control unit and display. The wavelength range of the collected spectral data was 940-1660 nm, with a 5.8 nm sampling wavelength interval and a 6000:1 signal-to-noise ratio. The tungsten halogen light source was preheated for 30 min before acquiring spectral data with the self-made detection system. The vermiculite substrate spectral data were acquired with an integration time of 386 ms, a smoothness (average sliding width) of 3 nm, and an average scanning time of 3 ms. The optical fiber probe and sample surface were 20 mm apart. All spectral acquisition and testing were completed with the Oceanview windows launcher 2.0.7, a professional testing software package provided by Ocean Optics Co., Ltd., Dunedin, FL, USA. After pre-warming, a standard Teflon whiteboard was used to generate the white reference to circumvent external environmental influence on the spectral data.
While obtaining the black reference spectral data, the halogen tungsten light source was wrapped in a black shield to form a dark environment and reduce interference from the unstable light source. Collecting the spectral data of vermiculite samples requires an equal distance between the surface of each sample and the optical fiber probe. Moreover, the position and brightness of the light source should remain constant to reduce measurement errors.
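The white and black references described above are typically combined with the raw sample reading to give relative reflectance. The text does not state the formula that was used, so the sketch below assumes the common correction R = (S - D) / (W - D), applied per wavelength channel; the function name and the counts are illustrative only.

```python
def relative_reflectance(sample, white, dark):
    """Convert raw detector counts to relative reflectance, channel by channel.

    Assumed correction (not stated in the text): R = (S - D) / (W - D).
    """
    if not (len(sample) == len(white) == len(dark)):
        raise ValueError("all three spectra must have the same number of channels")
    reflectance = []
    for s, w, d in zip(sample, white, dark):
        span = w - d  # usable signal range of this channel
        # A channel whose white and dark readings coincide carries no signal.
        reflectance.append((s - d) / span if span != 0 else float("nan"))
    return reflectance


# Hypothetical counts for three channels (illustrative values only).
raw = [600.0, 1100.0, 350.0]
white_ref = [1100.0, 2100.0, 600.0]
dark_ref = [100.0, 100.0, 100.0]
print(relative_reflectance(raw, white_ref, dark_ref))
```

Keeping the probe distance and lamp brightness fixed, as required above, matters here because the white reference is only valid for the geometry under which it was recorded.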

Laboratory Chemical Measurements
Based on the Chinese forestry standard LY/T1229-1999, the available nitrogen of vermiculite was determined by the alkali hydrolysis diffusion method with proper adjustments. Each of the 144 test samples was prepared by weighing 1 g of dried vermiculite substrate powder that had passed through a 0.25 mm sieve and spreading it evenly in the outer chamber of a Conway dish. Simultaneously, three blank vermiculite-free tests were set up, and 3 mL of 20 g/L boric acid indicator solution was added to the inner chamber of the Conway dish. The boric acid indicator solution was composed of 100 mL of 20 g/L boric acid (AR, Beijing Chemical Plant, Beijing, China) and 2 mL of methyl red-bromocresol green indicator, which was prepared by dissolving 0.1 g methyl red (IND, Tianjin Guangxia Fine Chemical Institute, Tianjin, China) and 0.5 g bromocresol green (IND, Tianjin Guangxia Fine Chemical Institute, China) in 100 mL of 95% ethanol (AR, Tianjin Beilian Fine Chemicals Development Co., Ltd., Tianjin, China); the solution was adjusted to pH 4.5. An alkaline gel was evenly smeared on the external edge of the Conway dish, which was then covered with frosted glass. After sealing, 10 mL of 1.2 mol/L NaOH solution was added to the outer chamber of the Conway dish from the edge of the frosted glass. The frosted glass cover was bound to the Conway dish with a rubber band in a cross bundle to form a closed space inside the dish. The bound Conway dish was placed in a thermostatic incubator at 40 °C for 24 h of alkaline hydrolysis diffusion reaction. After that, the mixed boric acid and indicator absorption solution in the inner chamber was titrated with 0.01 mol/L standard hydrochloric acid solution (Xiamen Science and Technology Co., Ltd., Xiamen, China). The titration was terminated when the absorption solution changed from blue-green to purplish-red.
The content of available nitrogen in the vermiculite substrate was calculated following the titration consumption volume of the standard 0.01 mol/L hydrochloric acid solution. The calculation is as shown in Equation (1).
where W_N is the available nitrogen content of the sample, mg/100 g; V is the consumption of standard titration acid solution by the test sample, mL; V_0 is the consumption of standard titration acid solution by the blank sample, mL; C is the concentration of the standard titration acid solution, mol/L; m is the weight of the sample, g; k is the water reduction coefficient of the air-dried sample.
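The body of Equation (1) is not reproduced in the text above. A form that is dimensionally consistent with the listed symbols is sketched below. It assumes the molar mass of nitrogen (14 g/mol, that is, 14 mg/mmol), a factor of 100 to express the result per 100 g, and that k scales the air-dried sample mass in the denominator; the placement of k in particular is an assumption and should be checked against LY/T1229-1999.

```latex
W_N = \frac{(V - V_0) \times C \times 14}{m \times k} \times 100
```

As a check on the units: with V - V_0 = 5 mL, C = 0.01 mol/L, m = 1 g, and k = 1 (illustrative values, not measurements from this study), the acid neutralizes 0.05 mmol of ammonia, corresponding to 0.7 mg of nitrogen per gram of sample, or 70 mg/100 g (700 mg/kg).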

Spectral Preprocessing
The acquired NIR data contained interference in the form of background noise and scattering [17], which could negatively influence the prediction accuracy of the model. Therefore, the first- and second-order derivatives, Savitzky-Golay smoothing (SG), multiplicative scatter correction (MSC), standard normal variable transformation (SNV), and derivative-integrated SG smoothing were used to preprocess the spectral data and reduce the noise and scattering interference. The first- and second-order derivatives reduced background interference, enhancing spectral characteristics. Savitzky-Golay smoothing eliminated high-frequency noise, smoothed the spectral data, increased the signal-to-noise ratio, and retained important information. The window size and degree of the polynomial were 12 and 2, respectively [18]. MSC eliminated the spectral differences caused by different scattering levels, enhancing the correlation between spectra and data [19]. SNV reduced the influence of solid particle differences on the spectral information of the measured substances [20]. Finally, combining the derivative with SG smoothing applied background reduction and noise suppression in a single pretreatment of the spectral data.
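As a minimal sketch of the derivative-plus-smoothing pretreatment, the code below uses the closed-form Savitzky-Golay weights for a quadratic fit. Several choices are assumptions rather than the authors' settings: the closed form needs an odd window, so a half-width of 6 (13 points) stands in for the reported window of 12; edge points without a full window are simply dropped; and the derivative is taken with the Savitzky-Golay derivative weights rather than by differencing followed by smoothing.

```python
def sg_weights(half_width):
    """Closed-form Savitzky-Golay weights (quadratic fit, window 2*half_width + 1)."""
    m = half_width
    offsets = range(-m, m + 1)
    denom = (2 * m - 1) * (2 * m + 1) * (2 * m + 3)
    smooth = [3 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom for i in offsets]
    sum_sq = m * (m + 1) * (2 * m + 1) / 3  # sum of i^2 over the window
    deriv = [i / sum_sq for i in offsets]
    return smooth, deriv


def apply_window(spectrum, weights):
    """Slide the weights along the spectrum; points without a full window are dropped."""
    width = len(weights)
    return [sum(w * v for w, v in zip(weights, spectrum[k:k + width]))
            for k in range(len(spectrum) - width + 1)]


def sg_smooth(spectrum, half_width=6):
    return apply_window(spectrum, sg_weights(half_width)[0])


def sg_first_derivative(spectrum, half_width=6, step=1.0):
    """First derivative per unit of `step` (for example, the wavelength interval in nm)."""
    weights = sg_weights(half_width)[1]
    return [v / step for v in apply_window(spectrum, weights)]


# A quadratic test signal is reproduced exactly by a quadratic Savitzky-Golay fit.
signal = [2.0 * x * x + 3.0 * x + 1.0 for x in range(21)]
smoothed = sg_smooth(signal)
slope = sg_first_derivative(signal)
```

Because a local quadratic fit is exact for a quadratic signal, the smoothed output matches the interior of `signal` and `slope` follows 4x + 3, which gives a quick correctness check before applying the filter to real spectra.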

Establishment and Evaluation of the Spectral Prediction Model
The absorption bands in the NIR spectral region were all related to hydrogen-containing groups. Therefore, the spectra of the vermiculite substrate contained wavelengths that were irrelevant for detecting the available nitrogen content. Moreover, the spectral data corresponding to these wavelengths significantly interfered with the available nitrogen content detection [21]. In this study, the characteristic variables closely related to available nitrogen content detection were selected to reduce the interference of irrelevant variables and enhance the prediction accuracy of the model.
SPA, CARS, and Si-PLS were used to screen the characteristic variables of the spectral data for the available nitrogen content of vermiculite substrate, simplifying the model and enhancing its detection accuracy. SPA screens the characteristic variables via a forward loop; it minimizes collinearity in the vector space of the spectral data and reduces redundancy in the substrate spectra [22]. CARS uses Monte Carlo sampling and partial least squares (PLS) regression coefficients to significantly reduce the amount of data and establish the optimal combination of variables [23]. However, both SPA and CARS are univariate screening methods, and the selected wavelength variables are a subset of individually relevant variables. Si-PLS, by contrast, screens wavelength intervals: it takes the root mean square error of each synergy model as the accuracy measure [22] and selects the sub-interval combination with the lowest root mean square error as the best combination.
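The forward loop of SPA can be illustrated with a small sketch. It shows only the projection chain for one starting wavelength: at each step the remaining columns are projected onto the subspace orthogonal to the last selected column, and the column with the largest remaining norm is taken next. The function name, the fixed starting index, and the toy matrix are assumptions for illustration; a full SPA run would also try different starting wavelengths and subset sizes and compare them by validation error.

```python
def spa_select(X, n_select, start=0):
    """Pick n_select column indices of X (samples x wavelengths) by successive projections.

    X is assumed to be pre-processed (for example mean-centred) by the caller.
    """
    n_vars = len(X[0])
    # Work on columns: each candidate wavelength is a vector over the samples.
    proj = [[row[j] for row in X] for j in range(n_vars)]
    selected = [start]
    while len(selected) < n_select:
        ref = proj[selected[-1]]
        ref_sq = sum(v * v for v in ref)
        if ref_sq == 0.0:
            break  # nothing left to project against
        best_j, best_norm = None, -1.0
        for j in range(n_vars):
            if j in selected:
                continue
            # Remove the component of column j that lies along the last selected column.
            coef = sum(a * b for a, b in zip(proj[j], ref)) / ref_sq
            proj[j] = [a - coef * b for a, b in zip(proj[j], ref)]
            norm = sum(v * v for v in proj[j])
            if norm > best_norm:
                best_j, best_norm = j, norm
        if best_j is None:
            break
        selected.append(best_j)
    return selected


# Toy data: column 1 is exactly twice column 0, so it adds no new information.
toy = [
    [1.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]
print(spa_select(toy, 3, start=0))
```

In the toy matrix the collinear column is passed over in favour of the two columns that carry independent information, which is the behaviour that reduces multicollinearity in the selected wavelengths.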
The change in peak values of the spectral curve reflects the change in the available nitrogen content of vermiculite substrates. A linear model relating the spectral data of vermiculite substrates to the measured available nitrogen content was established using PLSR to rapidly and quantitatively detect the available nitrogen content of vermiculite substrates [24]. PLSR is an effective linear modeling method that combines the advantages of principal component analysis, canonical correlation analysis, and multiple linear regression. It is especially suitable for establishing a prediction model when the number of NIR spectral variables is larger than the number of samples. Therefore, the PLSR algorithm is useful for establishing the spectral prediction model of the available nitrogen content of vermiculite. All modeling was performed in MATLAB R2018b (Mathworks Inc., MA, USA).
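To make the PLSR step concrete, the following is a minimal single-response PLS (PLS1) sketch using the NIPALS deflation scheme in plain Python. It is not the MATLAB code used in the study; the function names and the toy data are assumptions, and a production model would choose the number of components by validation rather than fixing it.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns what pls1_predict needs."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred spectra
    f = [v - y_mean for v in y]                                  # centred reference values
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with the residual y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # residual already explained
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by repeating the deflation on the new spectrum."""
    x_mean, y_mean, comps = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat


# Toy check: y is an exact linear function of three variables.
X_toy = [[1, 2, 0], [2, 1, 1], [3, 5, 2], [4, 3, 5], [5, 8, 3], [6, 4, 7]]
y_toy = [2 * a - b + 0.5 * c + 1 for a, b, c in X_toy]
model = pls1_fit(X_toy, y_toy, n_components=3)
print(pls1_predict(model, [2, 2, 2]))
```

With as many components as independent variables, PLS1 coincides with ordinary least squares, so the toy model recovers the generating relation; with NIR spectra the point of PLSR is to use far fewer components than wavelengths.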
The following indicators were applied to evaluate model performance: the calibration set correlation coefficient (RC), the prediction set correlation coefficient (RP), the calibration set root mean square error (RMSEC), the prediction set root mean square error (RMSEP), and the ratio of prediction to deviation (RPD). A model had the best prediction performance when the correlation coefficients and the root mean square errors of its calibration and prediction sets were similar. Formulas (2)-(4) show the calculation methods for these evaluation indicators. The larger RC and RP are and the smaller RMSEC and RMSEP are, the better the model; RPD represents the accuracy level of the prediction model. A model with RPD < 3 was considered inapplicable for accurate quantitative prediction, 3 ≤ RPD < 4 implied better prediction performance, and RPD ≥ 4 implied outstanding performance of the detection model [25,26].
where R is the correlation coefficient; n is the number of samples; y_i,actual is the measured reference available nitrogen content of the ith sample; y_i,predicted is the spectra-predicted available nitrogen content of the ith sample; y_average is the mean of the measured reference available nitrogen content of the samples; RMSE is the root mean square error; RPD is the ratio of prediction to deviation; RMSEP is the root mean square error of the prediction set; and SD is the standard deviation of the prediction set.
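The bodies of Formulas (2)-(4) are not reproduced in the text, so the sketch below uses the usual definitions that match the symbols listed above: R = sqrt(1 - SSE/SST), RMSE = sqrt(SSE/n), and RPD = SD/RMSEP. Using the sample standard deviation (n - 1 in the denominator) is an assumption, as are the function names and the toy values.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: sqrt(sum((y_pred - y_act)^2) / n)."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def correlation_coefficient(actual, predicted):
    """R = sqrt(1 - SSE / SST), with SST taken about the mean reference value."""
    mean = sum(actual) / len(actual)
    sse = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return math.sqrt(max(0.0, 1.0 - sse / sst))


def rpd(actual, predicted):
    """RPD = SD / RMSE, with SD the sample standard deviation of the reference values."""
    n = len(actual)
    mean = sum(actual) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in actual) / (n - 1))
    return sd / rmse(actual, predicted)


def rpd_grade(value):
    """Grades taken from the thresholds quoted in the text."""
    if value < 3:
        return "inapplicable for accurate quantitative prediction"
    if value < 4:
        return "better prediction performance"
    return "outstanding performance"


# Illustrative values only (not data from this study).
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
estimated = [1.1, 1.9, 3.2, 3.8, 5.0]
print(rmse(measured, estimated), correlation_coefficient(measured, estimated),
      rpd(measured, estimated))
```

The same functions serve for both sets: applied to the calibration samples they give RC and RMSEC, and applied to the prediction samples they give RP, RMSEP, and RPD.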

Grouping Statistics of the Available Nitrogen Content of Vermiculite
The available nitrogen content of 144 samples was determined, and their NIR spectra were used to establish a stable and reliable model for predicting the available nitrogen in vermiculite. The spectral data of the 144 vermiculite substrate samples were sorted in descending order of the chemically measured available nitrogen content. The sorted samples were divided into two groups at equal concentration intervals, and 25% of the 144 samples were selected as the prediction set. Table 1 shows the content range and distribution of available nitrogen in the calibration and prediction sets. Table 1 shows that the chemically measured available nitrogen values of the calibration set covered the whole range of the sample set and were strongly representative, and that the sample distributions of the two sets were similar. The division into two sets therefore met the requirements of spectral data grouping for establishing a model with better stability and robustness.
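The grouping can be sketched as follows: rank the samples by measured content in descending order and send one sample out of every four to the prediction set. Which position within each group of four is taken is not stated in the text; the sketch assumes the second one, so that the highest and lowest samples stay in the calibration set. The function name and the reference values are hypothetical.

```python
def split_calibration_prediction(reference_values, group_size=4, pick=1):
    """Return (calibration_indices, prediction_indices) after a descending sort.

    `pick` is the position inside each group of `group_size` ranked samples
    that goes to the prediction set (an assumption, not stated in the text).
    """
    order = sorted(range(len(reference_values)),
                   key=lambda i: reference_values[i], reverse=True)
    prediction = [idx for rank, idx in enumerate(order) if rank % group_size == pick]
    calibration = [idx for rank, idx in enumerate(order) if rank % group_size != pick]
    return calibration, prediction


# Hypothetical reference values in mg/kg for 144 samples.
values = [100.0 * (i + 1) for i in range(144)]
cal, pred = split_calibration_prediction(values)
print(len(cal), len(pred))
```

For 144 samples this gives 108 calibration and 36 prediction samples, and because every group of four contributes to both sets, the two sets span nearly the same concentration range.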

Analysis of the Spectral Data of Vermiculite
A self-made NIR spectroscopy system collected the reflectance spectral data of the vermiculite samples. The reflectance spectral data of the 144 vermiculite substrate samples were collected over a 940-1660 nm wavelength range. In Figure 2, the x-axis represents the wavelength and the y-axis the reflectance. As shown in Figure 2, the basic variation trend of the sample spectral curves is consistent. The spectral curves of the vermiculite substrate samples differ mainly in the reflectance amplitude at the absorption peaks. The shape of the spectral curves showed no significant change, implying that the samples had similar composition and chemical structure. The difference in reflectance amplitude at the absorption peaks is due to the different contents of the same substance in different samples.
The spectral curve showed three obvious absorption peaks near 1390, 1460, and 1570 nm [27-29]. The characteristic peak at approximately 1120 nm was related to the vibration absorption of secondary frequency doubling and combination frequency in C-H (CH2, CH3). In addition, that characteristic peak represented the vibration absorption of methylene CH in the secondary frequency doubling region [30]. Therefore, most of the characteristic peaks reflecting the available nitrogen content were weak absorption peaks.

NIR Spectroscopy of Available Nitrogen Content Based on All-Band Spectral Data of Vermiculite Substrates
The PLSR algorithm was used to establish the prediction model of available nitrogen content in vermiculite substrates from the full spectral data. The original spectral data were pretreated with different algorithms to further improve the performance of the model. The vermiculite samples were arranged according to their available nitrogen content, taking four samples as one group. Next, 25% and 75% of the samples in each group were extracted and combined into separate sample sets. Then, the calibration set models were established using the PLSR algorithm. The values of RC, RP, RMSEC, RMSEP, and RPD were used to select the optimal number of principal components, subject to RC > RP, RMSEC < RMSEP, and RMSEP < 2RMSEC. Finally, the optimal number of principal components was used to calculate the PLSR prediction model and the RPD value. Table 2 shows the prediction results. As shown in Table 2, different spectral pretreatment methods significantly affected the PLSR model of vermiculite available nitrogen. All pretreatments except MSC and SNV improved the prediction accuracy of the model. The MSC and SNV pretreatments are generally used to eliminate differences in solid particle size and the impact of sample surface scattering. All vermiculite substrate samples in this study were compacted after crushing and screening, so their particle size and surface scattering did not differ appreciably. Thus, the MSC and SNV pretreatments instead introduced irrelevant information, reducing model prediction accuracy. Combining the second derivative pretreatment with SG smoothing had the best effect and significantly improved the prediction performance of the model. The correlation coefficients of the calibration and the prediction sets of the optimal model were 0.9982 and 0.9977, respectively. The RPD was 12.14, indicating outstanding prediction performance.
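The three admissibility conditions for choosing the number of principal components can be written as a simple filter. The sketch below keeps only candidates satisfying RC > RP, RMSEC < RMSEP, and RMSEP < 2RMSEC, and then, as an assumption not stated in the text, takes the admissible candidate with the highest RPD. All numbers are hypothetical and are not taken from Table 2.

```python
def pick_components(results):
    """Return the admissible candidate with the highest RPD, or None if none qualifies."""
    admissible = [
        r for r in results
        if r["rc"] > r["rp"] and r["rmsec"] < r["rmsep"] < 2 * r["rmsec"]
    ]
    return max(admissible, key=lambda r: r["rpd"], default=None)


# Hypothetical results for three candidate component counts (illustrative only).
candidates = [
    {"n": 4, "rc": 0.990, "rp": 0.985, "rmsec": 280.0, "rmsep": 330.0, "rpd": 6.1},
    {"n": 8, "rc": 0.997, "rp": 0.996, "rmsec": 150.0, "rmsep": 170.0, "rpd": 11.8},
    # Rejected: RMSEP is more than twice RMSEC, a sign of overfitting.
    {"n": 14, "rc": 0.999, "rp": 0.990, "rmsec": 60.0, "rmsep": 270.0, "rpd": 7.4},
]
best = pick_components(candidates)
print(best["n"])
```

The RMSEP < 2RMSEC bound is what excludes models that fit the calibration set much more closely than they predict unseen samples.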

Spectroscopic Measurement and Analysis of the Available Nitrogen Content of the Vermiculite Substrate Based on SPA-Screened Characteristic Wavelengths
The characteristic variables of the original and the variously preprocessed spectral data were screened with SPA to eliminate the influence of multicollinearity in NIR spectroscopy, reduce the number of variables involved in spectral modeling, and enhance modeling effectiveness. PLSR was then used to establish the prediction model for the nitrogen content of vermiculite substrates from the spectral data after SPA feature extraction. The modeling results are listed in Table 3. As shown in Table 3, SPA screening of the characteristic variables greatly reduces multicollinearity and improves the prediction accuracy of the model. For the original, first-derivative, and second-derivative spectra, SPA eliminated the most data, but the RPD of the prediction models was low. SG smoothing alone and the derivative combined with SG smoothing performed better than the other pretreatments. The model built after combining the first derivative with SG smoothing had the best prediction accuracy in PLSR modeling. The modeling variables were reduced from 128 to 54, and the correlation coefficients of the calibration and prediction sets of the optimal model were 0.9969 and 0.9966, respectively. The RPD was 10.99, indicating that the prediction performance of the model was excellent. Figure 3 shows the variables selected when SPA was used to screen the characteristic variables of the spectral data pretreated with the first derivative combined with SG smoothing.

The screened characteristic variables were distributed near the 1390 and 1500 nm absorption peaks in the spectral band, as shown in Figure 3. The absorption peak near 1390 nm is related to the stretching vibration of the silanol O-H group in vermiculite, reflecting the characteristics of vermiculite substrates. The absorption peak near 1500 nm is related to the first frequency doubling vibration of the N-H group, closely related to the change in the available nitrogen content of vermiculite substrates.

Spectroscopic Measurement and Analysis of the Available Nitrogen Content of the Vermiculite Substrate Based on CARS-Screened Characteristic Wavelengths
The CARS algorithm was used to screen the original and pretreated spectral data to reduce the number of variables for spectral modeling and to obtain the main characteristic variables reflecting the change of available nitrogen content in vermiculite substrates. Subsequently, PLSR was used to establish a prediction model of the nitrogen content in vermiculite substrates from the spectral data after CARS feature extraction; the results are listed in Table 4. As shown in Table 4, the CARS screening algorithm greatly reduces modeling redundancy and improves the prediction accuracy of the model. In PLSR modeling, CARS screening showed the best prediction accuracy after combining the first derivative with SG smoothing. The modeling variables were reduced from 128 to 31, a reduction of approximately 75% in the amount of data. The correlation coefficients of the calibration and prediction sets of the optimal model were 0.9972 and 0.9968, respectively. The root mean square errors (RMSE) of the calibration and prediction sets were 149.98 and 159.65 mg/kg, respectively. The RPD was 12.57, indicating outstanding prediction performance. The CARS algorithm uses cross-validation to find the optimal combination of variables. Thus, CARS greatly reduces the multicollinearity of the vermiculite substrate spectral data and removes variables irrelevant to the spectral detection of available nitrogen content. Moreover, CARS accurately screens out the spectral data that directly reflect the available nitrogen content of vermiculite substrates, improving modeling efficiency.

Spectroscopic Measurement and Analysis of the Available Nitrogen Content of the Vermiculite Substrate Based on Si-PLS-Screened Characteristic Wavelengths
The chemical composition information in NIR spectral data may be carried by a wavelength region of a specific bandwidth rather than by isolated wavelengths. Therefore, the Si-PLS algorithm was adopted to screen the original and pretreated spectral data. The 128 detection wavelength variables were divided into 20 sub-intervals of approximately equal width. The best combination of four intervals was selected to establish the prediction model and to reduce the involvement of irrelevant spectral data in detecting available nitrogen content. In this way, the main characteristic variable regions reflecting the change in the available nitrogen content of vermiculite substrates were obtained. Subsequently, PLSR was used to establish a prediction model of nitrogen content in vermiculite substrates based on the spectral data after Si-PLS feature extraction. The modeling results are listed in Table 5. As displayed in Table 5, the Si-PLS algorithm effectively reduces the variables carrying information irrelevant to the detection of available nitrogen in vermiculite substrates. The input variables of the models were reduced from 128 to 28 or 24, a reduction of approximately 78% and 81%, respectively. The RPD values of the prediction models were all lower than that of the original spectral modeling because, compared with a univariate screening method, the selected Si-PLS intervals still contain irrelevant variables, which reduces modeling accuracy. Among the Si-PLS models, PLSR modeling performed best on the spectral data pretreated with the first derivative combined with SG smoothing. The correlation coefficients of the calibration and prediction sets were 0.9964 and 0.9899, respectively, and the RPD was 6.40. Figure 4 shows the characteristic variables screened by CARS from the data pretreated with the first derivative combined with SG smoothing.
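The interval search described above can be illustrated with a short Python sketch that splits 128 variable indices into 20 contiguous sub-intervals and enumerates every combination of four. The splitting rule (the first few intervals receive one extra variable) and the function name `split_intervals` are assumptions made for illustration; the section does not specify how the uneven division was handled. In an actual Si-PLS run, a PLSR model would be fitted for each candidate combination and the one with the lowest cross-validation error would be kept.

```python
from itertools import combinations


def split_intervals(n_variables, n_intervals):
    """Split indices 0..n_variables-1 into contiguous, near-equal intervals."""
    base, extra = divmod(n_variables, n_intervals)
    intervals, start = [], 0
    for i in range(n_intervals):
        # Assumption: the first `extra` intervals get one more variable.
        size = base + (1 if i < extra else 0)
        intervals.append(list(range(start, start + size)))
        start += size
    return intervals


intervals = split_intervals(128, 20)
sizes = sorted({len(iv) for iv in intervals})

# Every way of combining four sub-intervals into one candidate variable set.
candidates = list(combinations(range(len(intervals)), 4))
counts = {sum(len(intervals[i]) for i in combo) for combo in candidates}

print(sizes)                     # distinct interval widths
print(len(candidates))           # number of four-interval combinations
print(min(counts), max(counts))  # range of variables per candidate model
```

Because 128 is not divisible by 20, the intervals hold either 6 or 7 variables, so a four-interval model contains between 24 and 28 variables, which is consistent with the input variable counts reported in this section.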

Therefore, Si-PLS can find the optimal band region, greatly reducing the multicollinearity of the vermiculite substrate spectral data and accurately screening the spectral data that directly reflect the available nitrogen content of vermiculite substrates. Thus, PLSR modeling can establish an excellent prediction model.

Prediction Model Performance for the Available Nitrogen Content of Vermiculite Substrates Based on All-Band Spectral Data and Characteristic Variables
The prediction performance of the optimal models established from the full spectral data and from the spectral data screened by each of the three characteristic variable selection methods was compared (Table 6). The purpose was to achieve high-precision, rapid detection of the available nitrogen content in vermiculite substrates while simplifying the model. Screening spectral data using SPA, CARS, and Si-PLS reduces the influence of multicollinearity and irrelevant variables and improves the calculation speed of the prediction model. As shown in Table 6, the prediction models based on both the full and the variable-screened spectral data achieved rapid, nondestructive, quantitative detection of the available nitrogen content in vermiculite substrates. For the all-band spectral data, the optimal prediction model was established after pretreatment with the second derivative combined with SG smoothing. For the SPA, CARS, and Si-PLS feature extractions, the optimal prediction models were established after pretreatment with the first derivative combined with SG smoothing. All three feature variable screening methods can effectively eliminate redundant information in the spectral data. SPA simplified the model only slightly, whereas CARS and Si-PLS greatly reduced model complexity. The Si-PLS algorithm had the best elimination effect: the input variables of the model were reduced from 128 to 24, and the amount of spectral data decreased by approximately 81%.
Table 6. Comprehensive comparison of the best prediction models from different algorithms for determining the available nitrogen content of a vermiculite matrix.
The optimal characteristic variables extracted by all three screening methods included the spectral band near 1500 nm. Hence, the established prediction models of the available nitrogen content in vermiculite substrates are closely related to that characteristic band, which supports the reliability of the models. Compared with the optimal prediction model based on the full spectral data, the accuracy of the models established using PLSR after feature extraction decreased to some extent, except for CARS screening, and the Si-PLS screening method had the lowest accuracy. The correlation coefficient of the calibration set decreased from 0.9982 to 0.9972, while that of the prediction set decreased from 0.9977 to 0.9968. The RPD of the model decreased from 12.14 to 7.64, a decrease of nearly 40%. This trend is related to the irrelevant variables introduced by the block division of the Si-PLS selected regions. However, the RPD of the PLSR prediction model built after Si-PLS screening of the data pretreated with the first derivative and SG smoothing was >3, so the model still had reliable prediction performance. Among the feature extraction methods, the best result came from the PLSR prediction model established after CARS screening of the data pretreated with the first derivative combined with SG smoothing. This model used 31 variables; the 97 variables irrelevant to the available nitrogen content of vermiculite substrates were removed. The RPD of the model was 12.57, indicating excellent predictive performance. In summary, these results help fill the gap in research on technology for the rapid detection of the available nitrogen content of various cultivation substrates. The results also provide a reference and a theoretical basis for developing detection equipment for the available nitrogen content of facility substrates.
The best model, with an RPD of 12.57, was selected by comparing prediction models constructed using PLSR with various preprocessing and feature extraction algorithms. The selected model performs better than those previously reported. For example, in 2008, Bambangh et al. collected 210 soil samples at depths of 3.75 and 112.5 mm from seven field pastures in Taupo and Rotorua, New Zealand. The contents of soil total carbon and total nitrogen were determined with a LECO analyzer. After SG smoothing and first derivative pretreatment of the original spectra, a prediction model of soil total nitrogen was established by PLSR [31]. In 2013, Kodaira et al. established partial least squares regression models for ammonium nitrogen (NH4), nitrate nitrogen (NN), hydrolyzed nitrogen (HN), and total nitrogen (TN) in soils sampled from Shisheng plain, Jiangjianchuan District, Hokkaido. The determination coefficient, R2, of the total nitrogen (TN) prediction model was >0.89, the RPD was >2.0, and the prediction accuracy reached class A, with a good prediction effect [32]. In addition, Shao and He used near-infrared spectroscopy to detect soil nitrogen, phosphorus, and potassium [33]. Compared with soil, the vermiculite matrix assayed in this study has relatively simple properties; thus, spectral collection was subject to less interference and the prediction accuracy was higher.
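The variable reductions and the RPD decrease quoted in this section are simple relative differences and can be checked with a few lines of Python. This is only an arithmetic sketch; the helper name `percent_reduction` is hypothetical, and the inputs are the values reported in the text.

```python
def percent_reduction(before, after):
    """Percentage decrease from `before` to `after`."""
    return 100.0 * (before - after) / before


print(round(percent_reduction(128, 31), 2))      # CARS: 128 -> 31 variables
print(round(percent_reduction(128, 24), 2))      # Si-PLS: 128 -> 24 variables
print(round(percent_reduction(12.14, 7.64), 2))  # RPD: 12.14 -> 7.64
```

The three results are about 75.8%, 81.3%, and 37.1%, in line with the approximate figures given in the text.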

Conclusions
This work built an NIR spectroscopy system for vermiculite substrates operating in the 940-1660 nm wavelength range. By combining NIR spectroscopy with chemometrics, the correlation between the available nitrogen content of vermiculite substrates and the NIR spectral curves was quantitatively analyzed. The results indicated that different spectral data preprocessing methods differentially influenced the spectroscopic detection of the available nitrogen content of vermiculite substrates. The prediction model using the second derivative combined with SG smoothing pretreatment had the best effect in full-band spectral modeling. The corresponding correlation coefficients of the calibration and prediction sets were 0.9982 and 0.9976, respectively, while the RPD was 12.14. Subsequently, SPA, CARS, and Si-PLS were used to select the characteristic variables of the spectral data, and PLSR was used to establish predictive models for the available nitrogen content of the vermiculite substrate. The prediction accuracy of these models was highest with the combination of SG smoothing and first derivative pretreatment. Among the feature extraction methods, CARS performed best. The corresponding correlation coefficients of the calibration and prediction sets of the best prediction model were 0.9972 and 0.9968, respectively. The root mean square errors of the calibration and prediction sets were 149.98 and 159.65 mg/kg, respectively, and the RPD was 12.57. This model detected the available nitrogen content of vermiculite substrates with high accuracy. Nonetheless, the samples used in this experiment were synthetic; hence, in situ samples from desert facility agriculture should be used to update the model.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.