Quantitative Analysis of Organic Liquid Three-Component Systems: Near-Infrared Transmission versus Raman Spectroscopy, Partial Least Squares versus Classical Least Squares Regression Evaluation and Volume versus Weight Percent Concentration Units

The band shapes and band positions of near-infrared (NIR) and Raman spectra change depending on the concentrations of specific chemical functionalities in a multicomponent system. To elucidate these effects in more detail and clarify their impact on the analytical measurement techniques and evaluation procedures, NIR transmission spectra and Raman spectra of two organic liquid three-component systems with variable compositions were analyzed by two different multivariate calibration procedures, partial least squares (PLS) and classical least squares (CLS) regression. Furthermore, the effect of applying different concentration units (volume percent (%V) and weight percent (%W)) on the performance of the two calibration procedures has been tested. While the mixtures of benzene/cyclohexane/ethylbenzene (system 1) can be regarded as a blended system with comparatively low molecular interactions, hydrogen bonding plays a dominant role in the blends of ethyl acetate/1-heptanol/1,4-dioxane (system 2). Whereas system 1 yielded equally good calibrations by PLS and CLS regression, for system 2 acceptable results were only obtained by PLS regression. Additionally, for both sample systems, Raman spectra generally led to lower calibration performance than NIR spectra. Finally, volume and weight percent concentration units yielded comparable results for both chemometric evaluation procedures.


Introduction
Due to the different physical excitation mechanisms of mid-infrared (MIR)/NIR and Raman spectra, molecular interactions in multicomponent systems (e.g., hydrogen bonding) can affect these types of vibrational spectra to different extents. Hydrogen bonding plays a vital role in the molecular interaction of OH and NH groups with carbonyl or ether functionalities and leaves significant footprints in the MIR spectra of multicomponent systems with relevant chemistry [1][2][3][4][5]. These spectral effects, however, are not only observed with fundamental vibrations in the mid-infrared region but also occur in the overtone and combination band region of the near-infrared [6]. Using NIR transmission spectroscopy as the counterpart of Raman spectroscopy has two advantages: it circumvents the sample thickness limitations of MIR transmission spectroscopy for liquid mixtures, and it avoids the distortion and intensity changes of MIR absorption bands introduced by the attenuated total reflection (ATR) technique [7].
Calibrations by CLS regression have often proved to perform excellently for multicomponent systems with no or small molecular interactions (e.g., gas analytical applications) [8,9]. However, no reports are available on the CLS calibration performance for multicomponent systems with strong molecular interactions. Thus, apart from comparing the impact of molecular interactions in liquid multicomponent systems on the results obtained by the two types of spectroscopy, a further topic of this publication is the effect of these structural phenomena on the performance of two different multivariate evaluation routines (PLS and CLS regression) for the quantitative analysis of the investigated liquid three-component systems. Finally, with reference to previous studies by Mark et al. [10][11][12][13], we have tried to shed light on the consequences of using different concentration units (%V and %W) for the quantitative analysis of the described multicomponent systems.

NIR Spectra
The NIR spectra of the pure components of system 1 and system 2 are shown in Figure 1a,b, respectively. The NIR spectra of system 1 show a clear separation into aromatic and aliphatic/cycloaliphatic overtone and combination bands [14]. Thus, the bands between 6000-6200 cm⁻¹ and in the wavenumber range 8500-9000 cm⁻¹ can be assigned to the 1st and 2nd overtones, respectively, of the ν(CH)ar vibrations of benzene and ethylbenzene, whereas the 8000-8500 cm⁻¹ region is characteristic of the 2nd overtones of the ν(CH₂) and ν(CH₃) vibrations of cyclohexane and ethylbenzene. The more intense bands in the 7000 cm⁻¹ range (black and red spectra) belong to 2×ν(CH₂) + δ(CH₂) and 2×ν(CH₃) + δ(CH₃) combination vibrations of cyclohexane and ethylbenzene, whereas the very weak band complex (blue spectrum) can be assigned to combination bands of the 2×ν(CH)ar + δ(CH)ar out-of-plane vibrations of benzene.
The band assignment of the aliphatic functionalities in the NIR spectra of the pure components of system 2 is analogous to system 1 for 1,4-dioxane and ethyl acetate, whereas it becomes more complex for 1-heptanol due to the overlap with OH-specific absorption bands. Particularly noticeable are the intense, broad absorption bands of the 2×ν(OH) + δ(OH) combination (8000-8500 cm⁻¹) and 2×ν(OH) 1st overtone (6000-7000 cm⁻¹) vibrations of the hydrogen-bonded OH functionalities.
The NIR spectra of all solvent blends of system 1 and system 2 with variable compositions are shown for the total available wavenumber range in Figure 2a,b, respectively. For better visualization, enlargements of the wavenumber range 8600-8150 cm⁻¹ (system 1) and 7400-6100 cm⁻¹ (system 2) are included in Figure 2c,d, respectively. Here, the differences in molecular interaction between the individual components of system 1 and system 2, which are relevant to the dipole moment, lead to an interesting phenomenon. While the absorption bands of system 1 are more or less wavenumber invariant and only vary in intensity as a function of changing composition, the absorption bands of system 2 not only reflect intensity fluctuations but also undergo drastic band shifts as a function of varying composition. This can best be seen in Figure 2d, where the change in hydrogen bonding strength as a function of blend composition (the hydrogen bonding strength is different for the two hydrogen bonding acceptors) leads to drastic band shifts in the wavenumber range 6100-6600 cm⁻¹.

Figure 2. NIR spectra (11,000-6000 cm⁻¹) of the different three-component solvent mixtures of system 1 (a) and system 2 (b). In (c,d), enlargements for specific wavenumber ranges of (a,b) (see text) are shown.

Raman Spectra
The Raman spectra of the pure components of system 1 and system 2 are shown in Figure 3a,b, respectively. With the exception of 1-heptanol, and in contrast to the NIR spectra, the Raman spectra are characterized by comparatively narrow and well-separated signals. In system 1, the most intense signals originate from ring-breathing vibrations around 1000 cm⁻¹ (benzene and ethylbenzene) and 800 cm⁻¹ (cyclohexane), respectively. The last-mentioned signal is also observed in the spectrum of 1,4-dioxane (system 2). More detailed band assignments are available in a recently published book by G. G. Hoffmann [15]. The Raman spectra of all variable-composition solvent blends of system 1 and system 2 are shown in Figure 4a,b, respectively. Similar to the NIR spectra of system 1 and unlike the NIR spectra of system 2, the Raman spectra of both mixture systems reflect primarily composition-dependent intensity changes with only minor band shifts. This phenomenon is accentuated in the enlargement of Figure 4c for the wavenumber range 985-1015 cm⁻¹ (system 1) and the enlargement of Figure 4d for the wavenumber range 830-860 cm⁻¹ (system 2). Generally, the qualitative comparison of NIR and Raman spectra has to take into account their different excitation conditions. Thus, the dipole moment change of mechanically anharmonic oscillators with significant mass differences, such as the OH functionality of 1-heptanol, is susceptible to hydrogen bonding and leads to substantial spectral changes in the NIR spectra of the variable-composition blends. The corresponding Raman spectra, on the other hand, originate from changes in the polarizability, i.e., a measure of how easily the electron cloud of a molecule is deformed during vibration.
A separate publication will show that, although the impact of the composition-dependent polarizability changes of the skeletal, ring-breathing, and ring deformation vibrations of system 1 is less obvious in the Raman spectra at first inspection, closer examination also reveals intermolecular interactions for the cycloaliphatic and aromatic components of this system.

PLS/CLS Calibrations of NIR Spectra
As shown by the root mean square error (RMSE) and R square (R²) values for system 1 in Table 1, the NIR-based CLS calibrations are of similarly high quality as the PLS calibrations. With only two factors, the PLS calibrations also require the lowest possible number of factors for a three-component system. For system 2, with its strongly interacting components, however, CLS generally shows lower calibration performance than the corresponding PLS calibration. For the hydrogen bonding acceptors (ethyl acetate and 1,4-dioxane), the interactions with the hydrogen bonding donor (1-heptanol) are compensated by extra factors in the PLS calibrations. At this point, however, the lower number of factors required for the hydrogen bonding donor (only two) cannot be explained. In the CLS calibration, the spectrum is modeled as a weighted sum of the pure-component spectra and a baseline function. For many simple mixtures this method may be accurate enough, but it cannot model molecular interaction effects such as peak broadening and substantial peak shifts [16][17][18]. The CLS calibration method is therefore best applied to weakly or non-interacting mixture systems, where the NIR spectrum of any composition can be almost perfectly reconstructed from the pure-component spectra. Accordingly, the CLS model performance is high for system 1 and low for system 2.
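The CLS model described above can be illustrated with a small numerical sketch. It is not the PLS toolbox routine used in this work: the function names `cls_fit` and `gaussian_band`, the constant-offset baseline, and the synthetic Gaussian "spectra" are all assumptions made for the example. It fits a mixture spectrum as a weighted sum of pure-component spectra plus a baseline, first for a purely additive mixture and then for a mixture in which one band has shifted, as a crude stand-in for an interaction effect.

```python
import numpy as np

def gaussian_band(x, center, width, height=1.0):
    """Synthetic band used as a stand-in for a real pure-component spectrum."""
    return height * np.exp(-0.5 * ((x - center) / width) ** 2)

def cls_fit(mixture, pure_spectra):
    """Least-squares fit of a mixture spectrum as a weighted sum of the
    pure-component spectra plus a constant baseline offset (assumed form).
    Returns (weights, offset)."""
    n_points = pure_spectra.shape[1]
    # One column per pure spectrum, plus a column of ones for the baseline.
    design = np.column_stack([pure_spectra.T, np.ones(n_points)])
    coeffs, *_ = np.linalg.lstsq(design, mixture, rcond=None)
    return coeffs[:-1], coeffs[-1]

# Hypothetical axis and three hypothetical pure-component spectra.
x = np.linspace(0.0, 100.0, 501)
pure = np.vstack([
    gaussian_band(x, 30, 5),
    gaussian_band(x, 50, 8),
    gaussian_band(x, 70, 4),
])
true_fractions = np.array([0.2, 0.5, 0.3])

# Case 1: purely additive mixture with a small constant baseline.
additive = true_fractions @ pure + 0.01
est_additive, offset = cls_fit(additive, pure)

# Case 2: the second band is shifted by 5 axis units in the mixture,
# so the pure spectra no longer reproduce the mixture spectrum.
shifted_pure = pure.copy()
shifted_pure[1] = gaussian_band(x, 55, 8)
shifted = true_fractions @ shifted_pure + 0.01
est_shifted, _ = cls_fit(shifted, pure)

print("additive mixture:    ", np.round(est_additive, 3))
print("band-shifted mixture:", np.round(est_shifted, 3))
```

In the additive case the fitted weights reproduce the fractions used to build the mixture, whereas the band shift biases the estimates because a shifted band cannot be expressed as a linear combination of the unshifted pure spectra.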

PLS/CLS Calibrations of Raman Spectra
In Table 2, the calibration parameters obtained for the Raman spectra are summarized. Most strikingly, almost all PLS and CLS calibration parameters for both sample systems are of lower quality than those obtained from the NIR spectra. The reason may be that the S/N ratio of the NIR instrument used in the present investigations is much higher (10,000:1) than that of the Raman instrument (1000:1). Furthermore, unlike the results for the NIR spectra, the PLS calibrations of the Raman spectra clearly outperform the CLS calibrations for both sample systems. This is a consequence of the fact that, contrary to NIR spectroscopy, the Raman technique does slightly reflect spectral changes caused by molecular interactions of the mixture components of system 1, in the form of small band shifts of skeletal and ring-breathing vibrations. When the Raman spectra of system 1 are reconstructed from the pure-component spectra, following the approach described in detail in a previous publication for NIR spectra [5], these interaction effects are observed as dispersion-shaped signals.

As described in Section 2.2, the samples were prepared by mixing the individual components by volume percentage. Subsequently, the density of each sample was determined, the volume percentage concentrations (%V) were transformed into weight percentage concentrations (%W), and the volumes of the mixture solutions were calculated. From this procedure it was found that the differences between the actual volumes of the mixture samples and the sum of the volumes of the pure components used for sample preparation were minimal, and the coefficients of variation for the actual volumes of the mixture samples were 0.43% and 0.51% for system 1 and system 2, respectively. Furthermore, in their publications [10][11][12][13], Mark et al. claimed that for NIR calibrations the use of volume percentage, which corresponds to the scaled volume fraction concentration unit of Beer's law, is a better approach than weight percentage.
In what follows, we show that this statement is not supported by the results for the investigated multicomponent systems.
Tables 1 and 2 summarize not only the results for both spectroscopic techniques and calibration procedures but also the parameters derived for both concentration units. Although the rows with the concentration units that yielded the better calibration results have been highlighted in bold typeface in Tables 1 and 2, it has to be clearly stated that the differences in the RMSE and R² values derived with the different concentration units are rather small. Furthermore, the pattern of which compounds calibrate better with which unit does not allow general rules to be defined, based on specific chemical or physical phenomena, that could eventually be used to improve calibration performance. Therefore, volume and weight percent concentration units should be regarded as performing equally for the respective calibration procedures with both spectroscopic techniques and sample systems under investigation.

Chemicals
Six organic liquids were selected to prepare 21 calibration samples and 10 test samples (30 mL each) for each of the two mixture systems with variable concentrations: benzene, cyclohexane, and ethylbenzene (system 1), and ethyl acetate, 1-heptanol, and 1,4-dioxane (system 2). The six chemicals were purchased from Sinopharm Chemical Reagent Co. Ltd. (Shanghai, China).

Determination of Volume and Weight Percentage Concentrations
The densities of the pure components and the individual solvent mixtures of the two sample systems were determined with a calibrated 25 mL volumetric flask. Based on these values, the weight percentages (%W) of the individual components in the different solvent mixtures were calculated. In Table 3, the volume and weight percentage compositions of system 1 and system 2 are summarized.
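The conversion from volume to weight percent can be sketched as follows. This is an illustration rather than the exact calculation used in this work: it assumes that %V refers to the volumes of the pure components before mixing, so that the mass of each component is its volume times its pure-component density. The function name `volume_to_weight_percent` is hypothetical, and the densities in the usage example are approximate literature values at about 20 °C, not the values measured here.

```python
def volume_to_weight_percent(volume_percent, densities):
    """Convert volume percentages of the pure components (before mixing)
    into weight percentages, given the pure-component densities."""
    # Mass of each component per 100 volume units of pure components.
    masses = [v * rho for v, rho in zip(volume_percent, densities)]
    total_mass = sum(masses)
    return [100.0 * m / total_mass for m in masses]

# Usage example with approximate literature densities in g/mL
# (benzene, cyclohexane, ethylbenzene); assumed, not measured values.
densities_system1 = [0.8765, 0.7781, 0.8665]
weight_percent = volume_to_weight_percent([50.0, 30.0, 20.0], densities_system1)
print([round(w, 2) for w in weight_percent])
```

Because the pure-component densities of system 1 are similar, the two concentration scales differ by only a few percentage points for such a mixture.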

Instrumentation
The NIR spectra of the solvent mixtures were measured in the wavenumber range 11,117-5853 cm⁻¹ with a NIRQuest512 spectrometer (Ocean Optics, Inc., Orlando, FL, USA) based on a grating monochromator and an uncooled InGaAs array detector, equipped with an HL-2000 light source (Ocean Optics, Inc., Orlando, FL, USA) and coupled to an optical fiber. The instrument has a signal-to-noise (S/N) ratio of 10,000:1. The liquid samples were measured in a 2 mm path length transmission cell with an integration time of 14 ms by accumulating 20 scans with a mean spectral resolution of 20.6 cm⁻¹. Each sample was measured in triplicate, and the mean spectrum was used for further evaluations.
The Raman spectra of the solvent mixtures were measured with a QE65 Pro Raman spectrometer (Ocean Optics, Inc., Orlando, FL, USA) equipped with a grating monochromator and a cooled InGaAs array detector, and coupled by an optical fiber to a 180 mW Turnkey Raman Laser with 785 nm excitation (Innovative Photonic Solutions, Monmouth Junction, NJ, USA). The instrument has an S/N ratio of only 1000:1. The samples were placed in a GC glass bottle (ϕ: 10 mm, height: 32 mm, Zhejiang Aijiren Technology Co., Ltd., Zhejiang, China) and measured in the 136-2200 cm⁻¹ wavenumber range with an integration time of 6 s by accumulating five scans with a mean spectral resolution of 5.8 cm⁻¹. Each sample was measured in triplicate, and the mean spectrum was used for further evaluations.

Chemometric Data Analysis
The PLS toolbox 6.21 (Eigenvector Research, Inc., Manson, WA, USA) was used for chemometric data analysis. For this purpose, the original NIR and Raman spectra were truncated to the wavenumber ranges 11,117-6033 cm⁻¹ and 200-2000 cm⁻¹, respectively. As an additional data pretreatment, a baseline correction was applied. The calibration spectra were then subjected to PLS and CLS calibrations, with leave-one-out (LOO) internal cross-validation used to select the optimum number of factors.
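The selection of the number of PLS factors by leave-one-out cross-validation can be sketched in a few lines of NumPy. This is a hedged illustration, not the PLS toolbox procedure: the NIPALS PLS1 routine `pls1_fit`, the helper `loo_rmsecv`, the synthetic three-component spectra, and the rule of simply taking the minimum RMSECV are all assumptions made for the example.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """Single-response PLS regression via the NIPALS algorithm.
    Returns (regression coefficients, column means of X, mean of y)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # mean-centered data
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr                          # weight vector
        w = w / np.linalg.norm(w)
        t = Xr @ w                             # scores
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        c = (yr @ t) / tt                      # y loading
        Xr = Xr - np.outer(t, p)               # deflate X
        yr = yr - c * t                        # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)     # B = W (P'W)^-1 q
    return beta, x_mean, y_mean

def loo_rmsecv(X, y, n_factors):
    """Leave-one-out RMSECV: refit without sample i, then predict sample i."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, x_mean, y_mean = pls1_fit(X[keep], y[keep], n_factors)
        errors.append((X[i] - x_mean) @ beta + y_mean - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic stand-in data: 21 calibration mixtures of three components
# whose fractions sum to one, with hypothetical Gaussian pure spectra.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 100.0, 201)

def band(center, width, height):
    return height * np.exp(-0.5 * ((x - center) / width) ** 2)

pure = np.vstack([band(30, 5, 1.0), band(50, 8, 3.0), band(70, 4, 0.5)])
fractions = rng.dirichlet([1.0, 1.0, 1.0], size=21)
spectra = fractions @ pure + rng.normal(0.0, 0.002, size=(21, x.size))
y = fractions[:, 0]                            # calibrate the first component

rmsecv = [loo_rmsecv(spectra, y, a) for a in range(1, 5)]
best_n_factors = int(np.argmin(rmsecv)) + 1
print("RMSECV for 1-4 factors:", np.round(rmsecv, 4))
print("factors at minimum RMSECV:", best_n_factors)
```

For such an additive, closed three-component system the cross-validation error drops sharply from one to two factors, in line with the observation that two factors are the minimum for a three-component mixture. In practice a more conservative rule than the plain minimum is often preferred to avoid fitting noise.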

Calibration Statistics Analysis
Calibration statistics included the R square of calibration (R²C), the R square of cross-validation (R²CV), the R square of prediction (R²P), the root mean square error of calibration (RMSEC), the root mean square error of cross-validation (RMSECV), and the root mean square error of prediction (RMSEP). The R square describes the linear correlation between the predicted and the measured values. The higher R²P is and the closer it is to R²C, the higher the correlation between the predicted and the actual values and the more robust the model. The RMSEC, RMSECV, and RMSEP were used to evaluate the feasibility of the calibration model [19]. The lower the RMSEP and the closer it is to the RMSEC, the stronger the predictive ability and the robustness of the calibration model [20].
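The statistics listed above can be computed as in the following sketch. The exact formulas applied by the software are not given in the text, so two assumptions are made here: the RMSE is the plain root of the mean squared residual without a degrees-of-freedom correction, and R square is taken in the coefficient-of-determination form 1 − SS_res/SS_tot. The function names `rmse` and `r_squared` are hypothetical. The same functions yield RMSEC, RMSECV, or RMSEP (and the corresponding R square) depending on whether calibration, cross-validation, or test-set predictions are passed in.

```python
import numpy as np

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values
    (assumed form: no degrees-of-freedom correction)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - measured) ** 2)))

def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed form)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Usage with made-up reference and predicted concentrations (in %).
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [10.5, 19.5, 30.5, 39.5]
print(rmse(reference, predicted), r_squared(reference, predicted))
```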

Conclusions
The mixtures of benzene/cyclohexane/ethylbenzene (system 1) can be regarded as a blended system with comparatively low molecular interactions, whereas hydrogen bonding plays a dominant role in the blends of ethyl acetate/1-heptanol/1,4-dioxane (system 2).
The calibration results obtained by PLS and CLS regression with the NIR and Raman spectra of the investigated three-component systems 1 and 2, taking into consideration volume percent and weight percent concentration units, allow the following conclusions to be drawn:
(1) Multicomponent systems that do not induce significant band shifts in the NIR spectra of different blend compositions, such as system 1 in the present work, yield equally good calibrations by PLS and CLS regression.
(2) Multicomponent systems with large spectral signatures due to molecular interactions by hydrogen bonding, such as system 2 in the present work, should only be calibrated by PLS regression, because the negative effect of the molecular interactions can be efficiently compensated by increasing the number of factors.
(3) For both sample systems, Raman spectra led to lower calibration performance than NIR spectra. Specifically, for system 1, the aromatic and cycloaliphatic three-component system, a significant deterioration of the calibration results by PLS and CLS regression was observed.
(4) The hypothesis that volume percent should be preferentially used as the concentration unit for the calibration of liquid multicomponent systems could not be confirmed by the presented results. For both spectroscopic measurement techniques as well as both chemometric calibration procedures, volume and weight percent concentration units yielded comparable results.