Raman Calibration Models for Chemical Species Determination in CO2-Loaded Aqueous MEA Solutions Using PLS and ANN Techniques

The improvement in energy efficiency is recognized as one of the significant parameters for achieving our net-zero emissions target by 2050. One exciting area for development is conventional carbon capture technologies. Current amine absorption-based systems for carbon capture operate at suboptimal conditions, resulting in an efficiency loss and causing a high operational expenditure. Knowledge of qualitative and quantitative speciation of CO2-loaded alkanolamine systems and their interactions can improve the equipment design and define optimal operating conditions. This work investigates the potential of Raman spectroscopy as an in situ monitoring tool for determining chemical species concentration in CO2-loaded aqueous monoethanolamine (MEA) solutions. Experimental information on chemical speciation and vapour-liquid equilibrium was collected at a range of process parameters. Then, partial least squares (PLS) regression and an artificial neural network (ANN) were applied separately to develop two Raman species calibration models where the Kent-Eisenberg model correlated the species concentrations. The data were paired and randomly distributed into calibration and test datasets. A quantitative analysis based on the coefficient of determination (R²) and root mean squared error (RMSE) was performed to select the optimal model parameters for the PLS and ANN approaches. R² values above 0.90 are observed for both cases, indicating that both regression techniques can satisfactorily predict species concentration. ANN models are slightly more accurate than PLS. However, PLS (being a white box model) allows the analysis of spectral variables using a weight plot.

tion were acquired through the VLE experiment of the CO2-MEA-H2O system at varying total MEA concentration. The VLE data from the experiment were then further used to predict species concentration inside the CO2-MEA-H2O system using the regressed Kent-Eisenberg thermodynamic model.
A total of 21 Raman species calibration models were developed using each technique, utilizing smoothed Raman spectra and the estimated species concentration. The performance of each model was evaluated using the performance parameters R² and RMSE. The R² values from the developed calibration models were above 0.90, while the RMSE values improved throughout model optimization. Comparison of both techniques shows that ANN models were more accurate, with lower RMSE values observed compared to the PLS models. However, the small differences between the RMSE values indicate that PLS models were comparatively good in predicting species concentration. Additionally, the PLS model allows the significance of each variable to be studied by analysing the weight plots, which was not possible in ANN models, this being a black box technique. The regression plots show close data distribution along the regression line, with no trending observed from the data distribution for both calibration and test datasets. The result signifies that both techniques are applicable for developing Raman species calibration models to predict species concentration with good accuracy in the CO2-loaded aqueous MEA system. Notably, PLS provides an added advantage over ANN in terms of analysing the significance of each spectral variable in the developed models.


Introduction
Carbon dioxide (CO2) is deemed a global environmental concern because of its role as a greenhouse gas in global warming and climate change [1,2]. Industrial activities such as oil and gas processing, power generation, and iron and steel production contribute 64% of total CO2 emissions [3,4]. In the natural gas production industry, CO2 is among the main contaminants that need to be removed. In addition to the environmental concern, CO2 is corrosive to natural gas transmission pipelines: in the presence of water it forms carbonic acid, which poses a major safety risk and requires stringent regulation and maintenance during operation [5]. The gross heating value of sales gas is also affected, with the sale price per volume decreasing as the CO2 volume increases [6]. As more low-CO2 gas fields deplete rapidly and the demand

Application of a thermodynamic model for CO2-loaded amine speciation analysis can minimize human error and the overall time required, and provides a means to estimate the concentration of chemical species that are difficult to synthesize. The Kent-Eisenberg model is a thermodynamic model developed for acid gas-loaded amine systems [19]. The model is well known for its simplicity, which is achieved by lumping all the non-idealities into the equilibrium constants. These constants are then fitted to the experimental data, resulting in quick estimation while maintaining good correlative accuracy [20]. Other, more complex models use excess Gibbs energy or a combination of an equation of state and a Gibbs energy approach, requiring a substantial number of regression parameters for iteration compared to the much simpler Kent-Eisenberg model [21]. A modified version of the Kent-Eisenberg model has been shown to be accurate within the range of 0.2 to 0.8 mol of CO2 per mol of amine [22][23][24].
Developing a Raman calibration model requires the application of a multivariate calibration technique because of the multivariate nature of the Raman spectrum. Each Raman spectrum is a linear combination of the pure compound spectra of the chemical species in the system [12]. A Raman calibration model can be developed by first identifying the spectral peak of the species of interest. The spectral peak is identified by comparing the spectrum with that of the pure compound. The height or area of the identified peak is then quantified against the measured species concentration [6]. For a complex system with various chemical species, multiple spectral peaks may overlap, causing difficulties in identifying and measuring the respective peak areas. The variables within the Raman spectrum are also highly correlated with one another and need to be decorrelated before the species calibration model can be constructed. This requires the application of a dimensionality reduction technique.
Principal component analysis (PCA) is a dimensionality reduction technique that transforms the original variables into new, orthogonal variables called principal components. Each principal component is a linear combination of the original variables [25,26]. The principal components are constructed so that the first component explains the largest percentage of variance, and the percentage decreases with each additional component. This hierarchical structure is attained because the principal components are constructed to be orthogonal to each other. The orthogonality and hierarchical structure of the principal components allow the relevant components, those with a high percentage of explained variance, to be identified. These components can then be used for developing the Raman calibration model by applying a regression technique such as principal component regression (PCR) [27].
The concept of variable transformation towards orthogonality in PCA is well integrated in a more advanced regression technique called partial least squares (PLS) regression. A PLS regression model is developed using components that are constructed to maximize the correlation between the predictor and response variables while also maximizing the explained variance within the predictor and response variables [28]. This approach allows the construction of a simpler model compared to PCR. However, modelling a non-linear relationship using a linear regression technique such as PLS could impact accuracy. A non-linear relationship can exist in Raman spectral data due to overlapping signals, inconsistent particle sizes within the sample, contamination from external or ambient light, and the use of non-homogeneous or light-absorbing samples.
Modelling such a non-linear relationship might require the application of a non-linear regression technique. With the advancements made in the field of machine learning, the application of artificial neural networks (ANN) in chemometrics is also gaining traction [29]. ANN is a non-linear regression technique by virtue of its non-linear activation function. This capability can potentially improve the measurement accuracy of a process analyser in real-world applications compared to PCR-based and even PLS-based calibration models [30][31][32].
ChemEngineering 2021, 5, 87

Raman spectroscopy has been applied previously to quantify species concentration in CO2-loaded aqueous MEA systems. Souchon et al. studied species distribution in various amine types using Raman spectroscopy. Species concentration was derived from reference solutions that were measured by a 13C nuclear magnetic resonance (NMR) spectrometer. The work concluded that Raman spectroscopy can achieve a lower analysis time than NMR because the latter requires longer relaxation times per spectrum. The work also highlighted the importance of pairing a thermodynamic model with spectral measurement tools in estimating the quantities of carbamate ion, which are not measurable through the wet chemistry method [12]. Wong et al. investigated the chemical speciation in CO2-loaded aqueous MEA at varying pressure and temperature ranges using Raman spectroscopy. The authors developed calibration models by manually plotting the peak area ratio from the spectrum against species concentration, which was derived through the wet chemistry method. The models can predict species concentration with good accuracy [6]. Jinadasa et al. investigated the application of Raman spectroscopy for CO2-loaded, 30 wt% MEA solutions at room pressure and temperature. Using the PLS regression technique combined with offline titration analysis and Kent-Eisenberg thermodynamic models, Raman calibration models were developed and the in situ monitoring technique was tested with good results [33,34]. Zubair et al. utilized Raman spectroscopy for measuring CO2 concentration in aqueous diethanolamine (DEA), methyldiethanolamine (MDEA) and a combination of both. The calibration model in their work was developed using PLS regression and included a CO2 loading range above 0.5 for high-pressure applications [35].
Similarly, the author also investigated the application of Raman spectroscopy for monitoring species concentration in CO 2 -loaded aqueous DEA systems with a CO 2 loading range of up to 0.98, utilizing the PLS regression technique for developing a calibration model [36]. It can be established that previous research work within the area focused mainly on the application of PLS regression methods, while other multivariate regression methods remained unexplored.
In this work, Raman species calibration models for the CO2-loaded aqueous MEA system at 3, 4 and 5 molar MEA concentrations were developed using PLS regression and ANN. To develop the calibration models, Raman spectra of the system were first acquired using a vapour-liquid equilibrium (VLE) experimental setup with a Raman spectroscopy unit installed. Next, CO2 loading values were calculated using a pressure drop method, assuming an ideal system. The experimental VLE data together with the CO2 loading were used with the regressed modified Kent-Eisenberg model to calculate the species concentration. The smoothed and centred Raman spectra together with the respective species concentration data were then utilized for developing the Raman calibration models using PLS regression and ANN techniques. The developed models were compared on their predictive performance by evaluating the coefficient of determination (R²) and root mean squared error (RMSE).

Materials
Materials required for conducting the VLE experiment are detailed in Table 1. Monoethanolamine was sourced from Merck Sdn. Bhd, while deionized water was obtained in-house. Carbon dioxide gas was procured from Linde Malaysia Sdn. Bhd.

Experimental Setup and Method
The VLE experimental setup was constructed as depicted in Figure 1. The setup consists of three sub-units: a CO2-amine loading cell, a Raman spectroscopy unit and a data acquisition unit. The CO2-amine loading cell consists of a 435 mL feed vessel connected to a 460 mL stainless steel reaction vessel. Both vessels are built with integrated pressure and temperature sensors to measure conditions and gather experimental data. A CO2 feed line was connected to the feed vessel to supply CO2 gas into the experimental setup. A ceramic heater was installed around the reaction vessel to regulate system temperature. A CO2 vent line was installed at the top side of the reaction vessel to allow vessel degassing. A magnetic stirrer was used inside the reaction vessel throughout the experimental work to maintain mass transfer between the gas and liquid phases.
Aqueous MEA solutions at 3, 4 and 5 molar strength were prepared by first weighing the required quantity of MEA on a Sartorius BSA2245S-BW mass balance. The weighed MEA was added to a 250 mL volumetric flask. Deionized water was added to the volumetric flask to prepare the solution at the required concentration. Each solution was prepared at a controlled temperature of 298.15 K.

A Raman spectroscopy unit was procured from StellarNet. The unit consists of a Raman spectrometer and a laser source. The Raman spectrometer has a resolution of 4 cm⁻¹ with a signal-to-noise ratio of 1000:1, while the laser source uses 500 W of power to produce a laser beam with a wavelength of 785 nm. A probe connected to the Raman spectroscopy unit by an optical fibre cable was mounted at the side of the reaction vessel, where a sapphire-glass window was built. The probe directed the laser beam towards the system and retrieved the resulting Raman scattering for the Raman spectrometer. The data acquisition unit was pre-installed with SpectraWiz software and serves to process the data received from the Raman spectroscopy unit.
In a typical run, the prepared aqueous MEA solution was first loaded into the reaction vessel. The temperature for each run was set at 303.15 K. CO2 gas was then introduced into the reaction vessel and the aqueous MEA solution was continuously stirred until an equilibrium state was reached. The temperature and pressure of the system were continuously recorded using built-in temperature and pressure transmitters throughout the experimental run for CO2 loading calculation using Equation (1) [15,35].
After the equilibrium state was reached, more CO2 was introduced into the reaction vessel and the process was repeated until the targeted CO2 loading in the aqueous MEA solution was reached. The recorded Raman spectra were within the range of 0 to 2800 cm⁻¹. Forty (40), 67 and 66 datapoints for the Raman spectra were gathered for the 3, 4 and 5 molar aqueous MEA solutions, respectively.

Thermodynamic Framework and Kent-Eisenberg Model
The modified Kent-Eisenberg model was used to estimate the species concentration for the CO2-loaded aqueous MEA solutions in equilibrium [20]. The model is an improved version based on the work by Kent and Eisenberg [19]. The reaction of aqueous MEA with CO2 follows an acid-base reaction and produces numerous ions. Equations (2)-(6) describe CO2 dissolution into MEA.
The five reactions are, in order: dissociation of the protonated alkanolamine, reversion of carbamate to bicarbonate, hydrolysis and ionization of dissolved CO2, dissociation of bicarbonate, and ionization of water. Based on the defined chemical reactions, an equilibrium constant (K1 to K5) is written for each reaction in Equations (7)-(11). The values of the equilibrium constants K2, K3, K4 and K5 were calculated based on the work of Edwards et al. [37], whereas the value of K1 was treated as a parameter fitted to the experimental values. To fit the experimental loading data to the model, a parameter k_m1 is defined together with an associated factor f. The purpose of fitting the model to the experimental loading data is to obtain accurate speciation data. The defined k_m1 serves as an adjustable parameter that improves the prediction accuracy of the model with respect to the experimental parameters [6,20,38].
The unknowns were found by simultaneously solving the five reaction equations and three balance equations: the amine balance, the CO2 balance and the charge (electroneutrality) balance. Henry's law is applied to quantify the concentration of carbon dioxide dissolved in the liquid phase. The equations were combined into a polynomial form, Equation (18), which was solved using MATLAB to find the roots. Five numerical values were obtained, one of which is the concentration of hydrogen ion. The valid hydrogen ion concentration is within the range of 10⁻⁷ to 10⁻¹² molar, with respect to the pH value of between 7 and 11 in the loaded and unloaded MEA solution [22]. The concentration of each chemical species was then calculated using Equations (7)-(11).
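For orientation, the reaction set and balances named above are commonly written as follows in the Kent-Eisenberg framework for a primary amine RNH2 (here MEA). This is the standard textbook form and is assumed, not confirmed, to match Equations (2)-(11) and the balance equations of this work. The symbols α (CO2 loading), [RNH2]_t (total amine concentration) and H_CO2 (Henry's constant) are notation chosen for this sketch. The definitions of k_m1 and f and the coefficients of the polynomial in Equation (18) are not included.

```latex
\begin{align*}
\mathrm{RNH_3^+} &\rightleftharpoons \mathrm{RNH_2} + \mathrm{H^+}
  & K_1 &= \frac{[\mathrm{RNH_2}][\mathrm{H^+}]}{[\mathrm{RNH_3^+}]} \\
\mathrm{RNHCOO^-} + \mathrm{H_2O} &\rightleftharpoons \mathrm{RNH_2} + \mathrm{HCO_3^-}
  & K_2 &= \frac{[\mathrm{RNH_2}][\mathrm{HCO_3^-}]}{[\mathrm{RNHCOO^-}]} \\
\mathrm{CO_2} + \mathrm{H_2O} &\rightleftharpoons \mathrm{HCO_3^-} + \mathrm{H^+}
  & K_3 &= \frac{[\mathrm{HCO_3^-}][\mathrm{H^+}]}{[\mathrm{CO_2}]} \\
\mathrm{HCO_3^-} &\rightleftharpoons \mathrm{CO_3^{2-}} + \mathrm{H^+}
  & K_4 &= \frac{[\mathrm{CO_3^{2-}}][\mathrm{H^+}]}{[\mathrm{HCO_3^-}]} \\
\mathrm{H_2O} &\rightleftharpoons \mathrm{H^+} + \mathrm{OH^-}
  & K_5 &= [\mathrm{H^+}][\mathrm{OH^-}]
\end{align*}

\begin{align*}
[\mathrm{RNH_2}]_t &= [\mathrm{RNH_2}] + [\mathrm{RNH_3^+}] + [\mathrm{RNHCOO^-}]
  && \text{(amine balance)} \\
\alpha\,[\mathrm{RNH_2}]_t &= [\mathrm{CO_2}] + [\mathrm{HCO_3^-}] + [\mathrm{CO_3^{2-}}] + [\mathrm{RNHCOO^-}]
  && \text{(CO$_2$ balance)} \\
[\mathrm{RNH_3^+}] + [\mathrm{H^+}] &= [\mathrm{OH^-}] + [\mathrm{HCO_3^-}] + 2[\mathrm{CO_3^{2-}}] + [\mathrm{RNHCOO^-}]
  && \text{(electroneutrality)} \\
P_{\mathrm{CO_2}} &= H_{\mathrm{CO_2}}\,[\mathrm{CO_2}]
  && \text{(Henry's law)}
\end{align*}
```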

Calibration Models Development and Evaluation
The gathered Raman spectra and the species concentration estimated using the regressed modified Kent-Eisenberg model were paired and classified according to the total MEA concentration in the system from which the data originated. The classification of the dataset is presented in Table 2. Each dataset was then randomly divided into training and test datasets. The training dataset was used to develop a calibration model of the chemical species. The test dataset was excluded from model training and was used to validate the performance of the developed calibration model externally. For ANN, part of the training data was further set aside at random as a validation dataset, which was used during model training to pre-validate model performance for early stopping. Generally, 60% to 70% of the data were randomly assigned as training data while the remaining 30% to 40% were assigned as test data [39][40][41].

Before model development, Raman spectra were preprocessed through the application of Savitzky-Golay smoothing, followed by mean centring and scaling of the spectra. Savitzky-Golay smoothing was performed to reduce noise while preserving the information contained in the characteristic peaks within the spectrum. It was applied through a built-in filter in MATLAB, in which the smoothing degree and the filter span determine the smoothing effect on each spectrum. The optimal smoothing parameters were determined by observing the effect of the filter on the performance of the developed models. Mean centring was then performed in the MATLAB environment by subtracting the mean value of each variable from the respective variable in each observation.
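The preprocessing and splitting steps described above can be illustrated with a minimal Python sketch. It is not the MATLAB routine used in this work. The 5-point quadratic Savitzky-Golay window, the 70% training fraction, the fixed seed and all function names are assumptions chosen for illustration.

```python
import random

# Classical 5-point quadratic Savitzky-Golay smoothing weights.
# The window size and polynomial order are assumptions; the source does not state them.
SG_COEFFS = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol_smooth(spectrum):
    """Smooth one spectrum; the two points at each edge are left unchanged."""
    half = len(SG_COEFFS) // 2
    smoothed = list(spectrum)
    for i in range(half, len(spectrum) - half):
        window = spectrum[i - half:i + half + 1]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS, window))
    return smoothed


def mean_centre(spectra):
    """Subtract the mean of each variable (column) from every observation (row)."""
    n = len(spectra)
    means = [sum(col) / n for col in zip(*spectra)]
    centred = [[v - m for v, m in zip(row, means)] for row in spectra]
    return centred, means


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Randomly assign sample indices to a training set and a test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(train_fraction * n_samples)
    return sorted(idx[:n_train]), sorted(idx[n_train:])
```

In practice the column means computed on the training spectra would also be subtracted from the test spectra, so that the test data never influence the preprocessing.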
Raman calibration models were developed by applying PLS and ANN techniques to the preprocessed Raman spectra and the estimated species concentration, which served as the predictor and response data, respectively. While the form of the response data used for model development was the same for both techniques, the form of the predictor data differed. For the PLS model, the preprocessed Raman spectra were used directly, while for ANN, the PCA scores of the transformed Raman spectra were used. The full Raman spectrum range (160 to 2800 cm⁻¹) was utilized for the model development.
PLS-based Raman species calibration models were developed using the PLSREGRESS function in MATLAB, which utilizes the SIMPLS algorithm for fast iterations [28,42]. During the execution of the PLSREGRESS function, the Raman spectra and the species concentration data were each transformed into PLS components, each of which is a linear combination of the variables within the data. Each PLS component was constructed such that the variance explained by the component as well as the correlation between the predictor and the response variables were maximized. Each of the constructed PLS components was also orthogonal to the others. The orthogonality resulted from the deflation step performed at the end of each component construction. The deflation step removes the variance explained by the current PLS component from the error matrix while preserving the unexplained variance. Additional PLS components were then constructed using the error matrix. Initial PLS-based models were developed using 15 components. Such a large number of components ensures that all important information is considered when developing a calibration model for a particular response variable. Diagnostics from the developed model were then used to select an optimal number of components for the final model.
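The construct-then-deflate cycle described above can be sketched for a single response variable. The sketch below uses the NIPALS form of PLS1, not the SIMPLS algorithm behind PLSREGRESS, because its deflation step is explicit and easy to follow. The function names and the model dictionary are hypothetical.

```python
def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model by NIPALS with explicit deflation."""
    n, p = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # predictor residual (error matrix)
    f = [v - y_mean for v in y]                              # response residual
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the response residual
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                           # y loading
        # Deflation: remove the variance explained by this component
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def pls1_predict(model, x):
    """Predict the response for one spectrum by repeating the deflation on it."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, load, q in model["components"]:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return y_hat
```

Because each new component is built from the deflated residuals, its scores are orthogonal to those of all earlier components, which is the property the text attributes to the deflation step.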
Application of the PLSREGRESS function on the dataset produces several outputs, which include the loadings and scores for the predictor and response data, the PLS regression coefficients, and a matrix containing the percentage of variance explained by the developed model. Using the percentage-explained variance data, a PLS variance plot (a plot of cumulative percent explained variance against the number of PLS components) was drawn. The variance plot was used to estimate the potential number of PLS components to be used in developing the calibration model. PLS components were retained up to the point where the variance profile flattened. Beyond that point, the components do not contribute significant information to the model and lead to overfitting [27]. An example of a scree plot is depicted in Figure 2, revealing that the mean squared error (MSE) reduced as the number of components increased. However, the plotted trend flattens after 6 components. The area before the 6th component was defined as the scree region and was retained in the final model. A similar method was used in selecting the number of components for the PLS-based calibration models.
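The "retain components until the profile flattens" rule can be expressed as a simple threshold on the gain in cumulative explained variance. This is a sketch only. The 1 percentage point threshold and the function name are assumptions, and in this work the flattening point was judged from the plots.

```python
def select_components(cumulative_explained, min_gain=1.0):
    """Return how many components to keep.

    cumulative_explained[k - 1] is the cumulative percent variance explained
    by the first k components. Components are kept until adding one more
    gains less than min_gain percentage points.
    """
    for k in range(1, len(cumulative_explained)):
        gain = cumulative_explained[k] - cumulative_explained[k - 1]
        if gain < min_gain:
            return k  # the (k + 1)-th component adds too little
    return len(cumulative_explained)
```

For a hypothetical profile such as [60, 80, 90, 95, 97.5, 99, 99.2, 99.3], the function keeps six components, because the seventh adds only about 0.2 percentage points.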
ANN-based Raman species calibration models were developed in MATLAB using the built-in Neural Network Toolbox combined with manual parameter adjustments. An untrained neural network architecture was initiated with three layers as default, consisting of one input layer, one hidden layer and one output layer. A single hidden layer can act as a universal approximator capable of approximating any continuous function. Although there is a possibility of attaining quicker training time, additional hidden layers rarely improve the performance of calibration models versus using a single hidden layer [43][44][45].
For the input layer, raw Raman spectra were first compressed into principal components (PCs) using principal component analysis (PCA). The scores were then used as input for the ANN. This step was implemented to reduce the dimension and extract only the relevant features from the data, for a lower memory requirement and faster iteration time [26,46,47]. The number of input nodes was set based on the number of principal components extracted using PCA. The number of output nodes was set to one to account for the respective chemical species concentration. Similar to the consideration made for the PLS models, 15 latent variables (LVs) were selected for the initial training. The number was then lowered during optimization. Additionally, the iterative optimization of the neural network during training minimizes the influence of less relevant inputs on the resulting output [48]. One hidden layer with one node was set for the initial design to constrain model complexity and minimize overfitting. Additional nodes improve the prediction performance in the presence of slight non-linearity, which is common in multivariate datasets [48]. The number of nodes was subsequently adjusted based on the observed model performance. For the output layer, one node was used, matching the number of targeted outputs. The tan-sigmoid function was set as the activation function to allow flexibility for modelling non-linearity during training. MSE was implemented as the cost function to measure the performance and as a learning parameter for the optimization algorithm during ANN training. Random initial weights were assigned, with values constrained between −1 and 1 in line with the activation function assigned to the network.
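The network layout described above (PCA scores in, one tan-sigmoid hidden layer, one output node) can be sketched as a forward pass. This is an illustration and not the Neural Network Toolbox implementation. The linear output node, the uniform initialization and the function names are assumptions.

```python
import math
import random


def init_network(n_inputs, n_hidden, seed=0):
    """Create random initial weights and biases, each drawn from [-1, 1]."""
    rng = random.Random(seed)

    def draw():
        return rng.uniform(-1.0, 1.0)

    return {
        "W1": [[draw() for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [draw() for _ in range(n_hidden)],
        "w2": [draw() for _ in range(n_hidden)],
        "b2": draw(),
    }


def forward(net, scores):
    """Map one vector of PCA scores to a predicted species concentration."""
    # Hidden layer: tan-sigmoid (hyperbolic tangent) activation
    hidden = [
        math.tanh(sum(w * s for w, s in zip(row, scores)) + b)
        for row, b in zip(net["W1"], net["b1"])
    ]
    # Output layer: a single linear node (assumed; the source does not specify it)
    return sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
```

A network for the initial design described in the text would be created with `init_network(15, 1)`, that is, 15 inputs and a single hidden node.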
The initiated neural network architecture was then trained using the Levenberg-Marquardt optimization algorithm, which employed a backpropagation algorithm. Early stopping was implemented by using 10% to 15% of randomly selected training data as a monitoring set during training. To minimize overfitting, training was stopped if no improvement in the MSE value of the monitoring set was observed for six iterations. The training progression after each iteration is depicted in Figure 3 for the training of the CO2 loading calibration model using the 3 molar total MEA dataset. The MSE values for the validation and test datasets were observed to decrease from the start before increasing after the third iteration. The training was stopped based on the trend of the MSE values for the validation data, which did not improve after the third iteration. Network parameters at the third iteration were then used for the resulting model.
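The early-stopping rule can be sketched independently of the optimizer. The function below scans a sequence of monitoring-set MSE values and reports the iteration whose parameters would be kept. It assumes a patience of six iterations, as in the text. The function name and the example values are hypothetical.

```python
def early_stopping_epoch(validation_mse, patience=6):
    """Return (best_iteration, best_mse) under a patience-based stopping rule."""
    best_epoch, best_mse = 0, float("inf")
    for epoch, mse in enumerate(validation_mse):
        if mse < best_mse:
            best_epoch, best_mse = epoch, mse  # new best: remember these parameters
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` iterations: stop training
    return best_epoch, best_mse
```

With a monitoring curve that reaches its minimum at iteration 3 and then rises for six iterations, training halts and the parameters from iteration 3 are retained, even if a later iteration would have produced a lower value.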
Ten training runs were conducted per model using randomized initial weights to improve model parameters towards achieving the best possible solution. The trained models were then evaluated based on the calculated MSE values. Adjustments were made on the neural network parameters based on the training results. The model was then retrained with the implemented adjustment and the training process was monitored using validation data. The overall training process for the model concluded once the MSE values stopped improving.

The performance of each developed model was evaluated by assessing the RMSE and the coefficient of determination (R²) of the model on the training dataset. The R² value indicates how tightly the predicted and actual data cluster around the regression line, while the RMSE measures how close the predicted values are to the actual values. An R² value above 0.90 indicates excellent correlation between the Raman spectrum and the species concentration. Models with an R² value of 0.83 to 0.90 are still applicable within a limited scope of use, typically in research work [49].
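The two performance parameters can be computed as in the short sketch below. It assumes the common definition R² = 1 − SS_res/SS_tot, which the text does not state explicitly; the function names, the example values, and the wording of the returned labels are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def interpret(r2):
    """Map an R2 value onto the ranges quoted from [49]."""
    if r2 > 0.90:
        return "excellent correlation"
    if 0.83 <= r2 <= 0.90:
        return "applicable within a limited scope"
    return "outside the ranges quoted in the text"


# Hypothetical concentrations, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(actual, predicted), r_squared(actual, predicted))
print(interpret(r_squared(actual, predicted)))
```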
Selection of the final model parameters was conducted through external validation using the previously split test datasets. Externally validating the model with an independent test dataset allows for unbiased evaluation of the developed calibration model. The RMSE for the test datasets was compared with the corresponding RMSE plot for the training datasets, as shown in the examples in Figures 2 and 3 for PLS and ANN, respectively. The plot shows that the quality of the model does not improve when more than six components are used, which is consistent with the finding from the scree plot. Further evaluation of R² was conducted using a test dataset. Additionally, regression plots of the training and test datasets for each calibration model were constructed to analyse trends in the distribution of datapoints along the regression line. A total of 21 Raman calibration models were developed using each technique, with seven models each at 3, 4 and 5 molar MEA concentration. The seven models were for protonated MEA, bicarbonate, carbonate, unreacted MEA, carbamate, unreacted CO₂, and CO₂ loading.
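One way to turn the RMSE-versus-components comparison into a selection rule is sketched below. The rule itself (take the smallest number of components whose test RMSE lies within a tolerance of the minimum), the tolerance `tol`, and the RMSE values are assumptions made for illustration; the paper reads the choice off Figures 2 and 3 and the scree plot.

```python
def select_components(test_rmse, tol=0.005):
    """Return the smallest number of components whose test RMSE is
    within `tol` of the lowest test RMSE.

    test_rmse[i] is the test-set RMSE of the model with i + 1 components.
    """
    best = min(test_rmse)
    for k, value in enumerate(test_rmse, start=1):
        if value - best <= tol:
            return k


# Made-up test RMSE values for models with 1 to 9 components.
test_rmse = [0.42, 0.31, 0.24, 0.19, 0.17, 0.155, 0.154, 0.156, 0.158]
print(select_components(test_rmse))
```

With these illustrative numbers the rule returns six components, since the later models change the test RMSE by less than the tolerance.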

Characteristic Bands of Chemical Species in CO₂-Loaded Aqueous Monoethanolamine (MEA) System
Comparisons between the smoothed spectra of the unloaded and the CO₂-loaded MEA solutions were made, as exemplified in Figure 4. Significant peaks were observed in the Raman shift ranges between 800 cm⁻¹ and 1100 cm⁻¹ and between 1250 cm⁻¹ and 1400 cm⁻¹. The observed peaks were due to changes in species composition caused by CO₂ absorption and chemical reactions between species, as described in Equations (7)-(11). The ionic species present in the CO₂-loaded aqueous MEA solution were protonated MEA, bicarbonate, carbonate, and carbamate, while the non-ionic species were unreacted MEA and dissolved CO₂.
For the ionic species, the characteristic band for the bicarbonate ion is visible at 1017 cm⁻¹ (C-OH stretching), while the characteristic band for carbonate is located at 1065 cm⁻¹ (C-O bond stretching). The characteristic band for carbamate is observed at 1162 cm⁻¹ (C-N stretching) and the bands for protonated MEA at 1011 cm⁻¹ (CH₂ twisting), 1274 cm⁻¹ (C-NH stretch) and 1320 cm⁻¹ (C-C stretch). For the non-ionic species, the characteristic bands of dissolved CO₂ are observed at 1274 cm⁻¹ (CO₂ symmetric stretch) and 1383 cm⁻¹ (CO₂ bend), while the bands for unreacted MEA are observable at 845 cm⁻¹ (CH₂ rocking) and 873 cm⁻¹ (CN stretching) [4,6,12,50,51].
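The band assignments listed above can be collected into a small lookup table, sketched here. The band positions and modes are those quoted in the text; the `assign` function and its `window` tolerance are hypothetical helpers added for illustration.

```python
# (band centre in cm^-1, species, vibrational mode), as quoted in the text.
BANDS = [
    (845, "unreacted MEA", "CH2 rocking"),
    (873, "unreacted MEA", "CN stretching"),
    (1011, "protonated MEA", "CH2 twisting"),
    (1017, "bicarbonate", "C-OH stretching"),
    (1065, "carbonate", "C-O stretching"),
    (1162, "carbamate", "C-N stretching"),
    (1274, "protonated MEA", "C-NH stretch"),
    (1274, "dissolved CO2", "CO2 symmetric stretch"),
    (1320, "protonated MEA", "C-C stretch"),
    (1383, "dissolved CO2", "CO2 bend"),
]


def assign(shift, window=4.0):
    """List (species, mode) pairs whose band centre lies within
    `window` cm^-1 of the given Raman shift (window is an assumption)."""
    return [(species, mode) for centre, species, mode in BANDS
            if abs(centre - shift) <= window]


print(assign(1162))  # a single candidate: carbamate
print(assign(1274))  # two candidates share this position
print(assign(1015))  # protonated MEA and bicarbonate bands lie close together
```

The lookup makes the overlaps explicit: 1274 cm⁻¹ is assigned to both protonated MEA and dissolved CO₂, and the 1011 cm⁻¹ and 1017 cm⁻¹ bands are only 6 cm⁻¹ apart.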
It was observed that peak intensity varies in the spectral ranges containing the characteristic bands for the chemical species in the system. In contrast, minimal variations were observed in other spectral ranges. As an example, the Raman spectra in Figure 4 show that the spectral intensity at the characteristic bands for bicarbonate (1017 cm⁻¹), carbonate (1065 cm⁻¹) and aqueous CO₂ (1274 cm⁻¹) increases. The observation agrees with the understanding that the ionic species bicarbonate and carbonate increase in concentration as CO₂ loading increases. As the pressure inside the reaction vessel increases, CO₂ begins to dissolve physically into the solution. The physical dissolution increases the aqueous CO₂ concentration in the liquid phase. For carbamate, the concentration increases up to a certain point before decreasing as the pressure increases. This follows the theory that carbamate production from the reaction of MEA with CO₂ limits the absorption capacity to 0.5 mol of CO₂ per mol of amine. As the pressure increases, hydrolysis of carbamate forms free amine capable of reacting with CO₂, leading to loadings above 0.5 [23,24].

Evaluation of the Raman Species Calibration Models
Evaluation of the final Raman calibration models was performed by comparing the coefficient of determination (R²) and RMSE values from the training dataset against those from the test dataset. Closeness of the R² and RMSE values between the training and test datasets indicates that the model generalizes well to external data and that overfitting is minimal. The overall results of the performance evaluation and validation for the PLS and ANN-based calibration models are summarized in Tables 3 and 4, respectively. The R² values for all calibration models are above 0.90, indicating a good fit [49]. For the PLS models, the average R² value is 0.9442 for calibration and 0.9050 for validation, while the average RMSE value is 0.1301 for calibration and 0.1583 for validation. For the ANN models, the average R² value is 0.9505 for calibration and 0.9355 for validation, while the average RMSE value is 0.0965 for calibration and 0.1225 for validation. The average RMSE for validation (RMSEV) is higher than the RMSE for calibration (RMSEC), while the average R² for calibration is higher than the R² for validation, for both types of model. These observations are due to the use of test data for external validation, which carries variance that the model never observed during development. Such an approach was to ensure that the validated model is parsimonious [13,[52][53][54]. The differences between the average values were also small, indicating that both PLS and ANN-based calibration models can predict the species concentration in the CO₂-MEA-H₂O system with good accuracy.
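The size of the calibration-to-validation gap can be made explicit with a few lines of arithmetic on the averages reported above. The two summary quantities used here (the drop in R² and the ratio RMSEV/RMSEC) and all names in the snippet are choices made for this illustration, not metrics defined in the paper.

```python
# Average values as reported in the text for each technique.
reported = {
    "PLS": {"r2_cal": 0.9442, "r2_val": 0.9050, "rmsec": 0.1301, "rmsev": 0.1583},
    "ANN": {"r2_cal": 0.9505, "r2_val": 0.9355, "rmsec": 0.0965, "rmsev": 0.1225},
}


def generalization_gap(m):
    """Drop in R2 and ratio of RMSE from calibration to validation."""
    return {
        "r2_drop": m["r2_cal"] - m["r2_val"],
        "rmse_ratio": m["rmsev"] / m["rmsec"],
    }


gaps = {name: generalization_gap(m) for name, m in reported.items()}
for name, gap in gaps.items():
    print(f"{name}: R2 drop = {gap['r2_drop']:.4f}, "
          f"RMSEV/RMSEC = {gap['rmse_ratio']:.2f}")
# PLS: R2 drop = 0.0392, RMSEV/RMSEC = 1.22
# ANN: R2 drop = 0.0150, RMSEV/RMSEC = 1.27
```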

Comparative Analysis between Partial Least Squares (PLS) and Artificial Neural Network (ANN) Technique for the Development of Raman Calibration Model
Comparisons were made between the final PLS and ANN-based calibration models to determine which technique produced the better-performing models. Comparisons were based on the average values of the performance parameters for the calibration data. The results are summarized in Table 5. In terms of R², both techniques resulted in comparable calibration models, with R² values above 0.9 indicating a good fit. In terms of the average RMSE, the ANN-based calibration models have better prediction accuracy than the PLS-based calibration models, as indicated by the lower average RMSE values observed for all ANN models compared to the PLS models. The better performance recorded by ANN is due to the application of the non-linear transfer function, which transforms the original input into a non-linear projection onto a lower-dimensional subspace. Compared to the linear projection in PLS, the non-linear projection in ANN is more flexible, allowing the ANN model to better fit the datasets. The better fit translates to lower RMSE values for ANN models compared to PLS models [55].
An advantage of applying PLS over ANN is the ability to analyse the model parameters to understand the significance of each Raman variable within the modelled system. It was previously discussed that Raman spectra carry information regarding the chemical species within the monitored system. The information is contained within a specific range of the Raman spectrum called the characteristic band. The characteristic band for a particular chemical species contains spectral variables that are highly correlated with that species. These spectral variables thus contribute strongly towards explaining the variance of the species correlated with them in the model. For a PLS calibration model, the contribution is reflected in the PLS weight plot. An example of a weight plot is depicted in Figure 5. The plot was generated from the calibration model of bicarbonate at 4 molar total MEA concentration. The plot shows multiple peaks appearing between 500 and 1500 cm⁻¹, indicating that variables within that range are highly correlated with the response variable. The large PLS weight values of the highly correlated variables are evidence that they contribute the most towards the variance explained in the model. The observation reflects the literature finding that the characteristic band for bicarbonate is located at 1017 cm⁻¹. Other peaks are from weak intensity signals at 632 cm⁻¹, 1360 cm⁻¹ and 1630 cm⁻¹ due to the (OH)CO band, symmetric CO stretching and anti-symmetric CO₂ stretch, respectively [56,57].
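To illustrate why large weights point to informative spectral variables, the sketch below computes the first-component weight vector of a single-response PLS model (PLS1), which is the normalized covariance direction w = Xᵀy / ‖Xᵀy‖ on mean-centred data. This is a toy example under stated assumptions: the data are synthetic, only the first component is shown, and the weight plot in Figure 5 may have been produced with different software and settings.

```python
import math


def first_pls_weights(X, y):
    """First PLS1 weight vector: normalized X^T y on mean-centred data.

    X is a list of samples (rows), each a list of spectral variables;
    y is the list of matching response values (e.g. concentrations).
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [
        sum((X[i][j] - x_means[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


# Synthetic "spectra": 5 samples x 4 variables; only variable 2 tracks y.
X = [
    [1.0, 0.2, 0.1, 0.5],
    [1.1, 0.1, 0.9, 0.4],
    [0.9, 0.3, 2.1, 0.5],
    [1.0, 0.2, 2.9, 0.6],
    [1.0, 0.2, 4.0, 0.5],
]
y = [0.0, 1.0, 2.0, 3.0, 4.0]

w = first_pls_weights(X, y)
top = max(range(len(w)), key=lambda j: abs(w[j]))
print(top, [round(v, 3) for v in w])
```

The variable that covaries with the response receives almost all of the weight, which mirrors the way peaks in a weight plot line up with a characteristic band.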
The results of the performance evaluation are further supported by the regression plots. In a good model, the datapoints are distributed along the regression line. Examples of regression plots, taken from the calibration models for 3 molar MEA concentration, are depicted in Figure 6. The datapoints are well distributed along the straight line for all models. No noticeable trend was observed in the distribution of datapoints, indicating that the current model is sufficiently parametrized to explain the variance in the data. A small deviation was observed between the actual and predicted datapoints, which is reflected in the resulting RMSEC and RMSEV values.
It was also noted that the whole spectral range was utilized during the development of the calibration models. However, according to the literature, the characteristic bands for a chemical species are present only in specific spectral ranges within a Raman spectrum. Variables outside these ranges are considered noise. Developing calibration models using only the spectral ranges that are correlated with the particular species could reduce the noise that interferes with model development, thus improving performance [58,59].
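Restricting the model input to selected spectral ranges could be done as in the sketch below. This was not carried out in the present work; the function name, the example spectrum, and the choice of windows (the two ranges in which significant peaks were reported for Figure 4) are assumptions for illustration.

```python
# Raman shift windows in cm^-1, taken from the ranges with significant peaks.
WINDOWS = [(800.0, 1100.0), (1250.0, 1400.0)]


def select_windows(shifts, intensities, windows=WINDOWS):
    """Keep only the spectral variables whose Raman shift falls inside
    one of the given (low, high) windows, bounds included."""
    kept = [
        (shift, value)
        for shift, value in zip(shifts, intensities)
        if any(low <= shift <= high for low, high in windows)
    ]
    return [s for s, _ in kept], [v for _, v in kept]


# Hypothetical spectrum sampled at a few Raman shifts.
shifts = [500.0, 845.0, 1017.0, 1162.0, 1274.0, 1630.0]
intensities = [0.10, 0.40, 0.80, 0.30, 0.50, 0.05]

kept_shifts, kept_values = select_windows(shifts, intensities)
print(kept_shifts)  # [845.0, 1017.0, 1274.0]
```

Note that the carbamate band at 1162 cm⁻¹ falls outside these two windows, so in practice the windows would need to be chosen for each species model rather than fixed globally.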

Conclusions
The development of Raman calibration models using two different multivariate regression techniques, partial least squares (PLS) regression and an artificial neural network (ANN), was explored. The Raman spectra and respective species concentrations were acquired through the VLE experiment on the CO₂-MEA-H₂O system at varying total MEA concentration. The VLE data from the experiment were then used to predict the species concentrations in the CO₂-MEA-H₂O system using the regressed Kent–Eisenberg thermodynamic model. A total of 21 Raman species calibration models were developed with each technique using smoothed Raman spectra and the estimated species concentrations. The performance of each model was evaluated using R² and RMSE. The R² values of the developed calibration models were above 0.90, while the RMSE values improved throughout model optimization. Comparison of the two techniques shows that the ANN models were more accurate, with lower RMSE values than the PLS models. However, the small differences between the RMSE values indicate that the PLS models were comparably good at predicting species concentration. Additionally, the PLS model allows the significance of each variable to be studied by analysing the weight plots, which is not possible with ANN models because ANN is a black box technique. The regression plots show a close distribution of the data along the regression line, with no trend observed in the data distribution for either the calibration or the test datasets. The results signify that both techniques are applicable for developing Raman species calibration models that predict species concentration with good accuracy in the CO₂-loaded aqueous MEA system. Notably, PLS provides an added advantage over ANN in allowing analysis of the significance of each spectral variable in the developed models.