A Low-Cost System for Moisture Content Detection of Bagasse upon a Conveyor Belt with Multispectral Image and Various Machine Learning Methods

Abstract: This research aimed to propose an online system based on multispectral images for the real-time estimation of the moisture content (MC) of sugarcane bagasse. The system consisted of a conveyor belt, four halogen bulbs, and a multispectral camera. The MC models were developed using machine learning algorithms, i.e., multiple linear regression (MLR), principal component regression (PCR), artificial neural network (ANN), PCA-ANN, Gaussian process regression (GPR), PCA-GPR, random forest regression (RFR), and PCA-RFR. The models were developed using 150 samples (calibration set), while the remaining 50 samples were used as a validation set. A comparison of all developed models showed that the PCA-RFR model predicted MC most accurately, with a coefficient of determination of prediction (r²) of 0.72, a root mean square error of prediction (RMSEP) of 11.82 wt%, and a ratio of the standard error of prediction to standard deviation (RPD) of 1.85. The results show that this technique is useful for rapid MC screening of sugarcane bagasse.


Introduction
Sugarcane is one of the most important agricultural crops in Thailand and plays a significant role in the country's economy [1]. Sugarcane bagasse (SB) is a residue from the juice extraction process in sugar production and amounts to approximately 29% of the as-received sugarcane weight [2]. In the 2019/2020 harvesting season, around 130 million tons of sugarcane were sent to sugar mills [3], which produced approximately 37.7 million tons of SB. Sugarcane bagasse can be used as a major source of energy and as a raw material in combustion, gasification, and pyrolysis processes. Approximately 99.96% of the total SB was used in direct combustion to drive steam turbines for electricity generation [4]. The energy characteristics of SB are as follows: as-received SB contains about 50% water, and dried SB contains 50% cellulose, 25% lignocellulose, and 25% lignin [5]. The proximate analysis of SB showed that it contained 14.95% volatile matter (VM), 73.78% fixed carbon (FC), and 11.27% ash, and calorific measurement revealed a higher heating value (HHV) of 17.33 MJ/kg [6] and a lower heating value (LHV) of 19.37 MJ/kg on a dry basis [7]. Isabirye et al. [8] suggested that one ton of SB could produce up to two tons of steam, and five tons of steam could produce 1.0 MWh of electricity. Therefore, efficient use of SB is significant for the economy of power plants and sugar production.
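As a quick check of the figures quoted above, the following sketch reproduces the bagasse tonnage and converts the steam ratios attributed to [8] into electricity per ton of SB. The constant and variable names are illustrative, and the steam yield is the upper bound stated in the text.

```python
# Worked arithmetic for the figures quoted in the Introduction.
# All constants are values stated in the text; the names are illustrative.

CANE_TONS = 130e6          # sugarcane milled in 2019/2020 (tons) [3]
BAGASSE_FRACTION = 0.29    # bagasse share of as-received cane weight [2]
STEAM_PER_TON_SB = 2.0     # tons of steam per ton of bagasse, upper bound [8]
STEAM_PER_MWH = 5.0        # tons of steam needed for 1.0 MWh [8]

bagasse_tons = CANE_TONS * BAGASSE_FRACTION
mwh_per_ton_sb = STEAM_PER_TON_SB / STEAM_PER_MWH

print(f"bagasse: {bagasse_tons / 1e6:.1f} million tons")
print(f"electricity: up to {mwh_per_ton_sb:.1f} MWh per ton of bagasse")
```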
Moisture content (MC) is the main factor negatively impacting the combustion process [9]. A high MC reduces the energy potential of SB during combustion and requires a large amount of energy to release the water [7]. Generally, the MC of SB is around 50% after the milling process [10], and SB is normally fed to the boilers in this state. Therefore, SB with an MC ≤ 50% is considered to be a good-quality feedstock [1,10]. A high MC in SB reduces combustion degradation [10], which lowers the temperature and delays combustion [4]. The storage method is one of the factors affecting the MC of SB. Various initial properties of sugarcane, such as variety, harvesting age, and harvesting method (automatic sugarcane harvester, manual harvesting, and harvesting after burning the fields), lead to varying SB properties. During the extraction process, water is sprayed on the sugarcane to maintain high juice quality and yield.
Currently, MC is estimated by either an MC sensor probe or visual checking, with the latter being labor intensive and less accurate. When the operator observes SB with a high MC, other fuels such as wood chips and rice husk are mixed with the SB to lower the MC. For this reason, inline MC monitoring should be implemented as a real-time measurement in order to provide more reliable results within a shorter time.
The multispectral image camera, a low-cost device, provides three visible bands (blue, B; green, G; red, R) and two invisible bands (RedEdge and near-infrared, NIR); these wavelengths interact with the chemical substances i.e., carbon (C), hydrogen (H), nitrogen (N), oxygen (O), and sulfur (S). The reflectance value can be presented in a 2D image. The image provides the spatial information presented in the picture-pixel. The prediction of the organic matter was also presented in the 2D image. The information of the sample surface, MC, chemical properties, and physical properties interacting with VIS/NIR was reflected to the detector and recorded as a reflectance value. The optical information was used for modeling when using multivariate analysis.
The multispectral camera provided five bands, i.e., blue (430-520 nm), green (520-600 nm), red (630-690 nm), RedEdge (710-730 nm), and NIR (760-900 nm). According to the summary by Wu et al. [11], the wavebands of the blue, green, and NIR regions are related to the vibrational bands of water and hydrocarbons, which correspond to the O-H 2nd overtone (450 nm (blue) and 528 nm (green)) and the 3rd overtone of O-H stretching + O-H bending (815-985 nm). In addition, the wavelength range of 870-885 nm is a vibrational band of the 3rd overtone of -CH3 stretching. Spectral reflectance at wavelengths of approximately 800 nm was selected as an optimal band for predicting vegetation water content due to the deep penetration of radiation into the leaf surface [11]. Liu et al. [12] reported that combining NIR spectra with R, G, B data could increase model accuracy for the prediction of water-injected beef samples using multispectral imaging analysis.
The use of a multispectral camera has many advantages; for example, the 2D camera display can demonstrate the status of the sample on the conveyor which is useful to estimate the volume of the sample. The predicted value can be displayed via distribution mapping, and the camera can detect any impurities such as stones and pieces of wood.
Real-time measurement of MC in biomass (sawdust) using microwave reflection was reported with SEC values from 1.28 to 1.75% [13]. Imaging devices have also been applied to online MC measurement, for tea leaves using hyperspectral imaging (Rv² values between 0.918 and 0.951) [14] and for water-injected meat using multispectral imaging (r² = 0.946) [12]. However, there were no reports on the possible use of a multispectral camera for real-time measurement of MC or on its accuracy in the prediction of MC. The measurement of MC has mainly been performed via NIR spectroscopy, e.g., MC of bamboo chips (root mean square error of prediction (RMSEP) of 0.18%) [15], online measurement of the MC of biomass (relative standard deviation (RSD) of 9.04%) [16], and inline prediction of moisture content in tapioca starch during drying (SEP of 0.61%) [17].
To our knowledge, no research has reported the application of five-band multispectral imaging for detecting MC distribution in bagasse biomass during conveyor belt transportation. Therefore, low-cost multispectral imaging was applied in this research for the real-time estimation of MC. The prediction model of the MC of SB, based on real-time measurement, was then developed and improved using various machine learning techniques.

Sample
Each sample was approximately 3 kg and was randomly collected from different bagasse piles, pile positions, and depths and then stored in a plastic bag. In total, 70 bags of SB were collected from the sugarcane mill over 4 months, from December 2019 to March 2020, which is the normal sugarcane milling season in Thailand, and 3 samples were taken from each bag. The samples were brought to the Bio-Sensing and Field Robotic (BSFR) Laboratory, Khon Kaen University, Thailand, for further experiments.

Measurement System
The experimental unit (Figure 1a) consisted of (1) a conveyor belt (4 m long, 1 m wide, and 1.2 m high), (2) a measuring chamber (1 m long, 1 m wide, and 1 m high), (3) a multispectral camera (RedEdge, MicaSense, Wichita, KS, USA) providing five wavebands, i.e., R, G, B, NIR, and RedEdge, with center wavelengths of 668, 560, 475, 840, and 717 nm, respectively, and (4) four 75 W halogen lamps (SYLVANIA Halogen 12 V SA111, Feilo Sylvania company, Hungary). The halogen lamps were installed at each corner of the measuring chamber (Figure 1a) as the light source, to eliminate shadows and avoid uneven light intensity. The light source also ensured that the light intensity did not differ between experiments. In addition, the halogen lamps provided infrared radiation, which interacts with the moisture in the sample. The multispectral camera was installed on the top of the measuring chamber at a height of 1 m above the flat conveyor belt. The 20 cm-deep SB sample was poured onto the conveyor, as shown in Figure 1b.
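The band layout of the camera can be written down as a small lookup table. The sketch below is illustrative only: the dictionary and helper names are hypothetical, the center wavelengths are those listed for the camera, and the band ranges are those given in the Introduction.

```python
# Hypothetical lookup table for the five camera bands.
# Center wavelengths come from the Measurement System description;
# the (low, high) ranges in nm come from the Introduction.
BANDS = {
    "B":       {"center_nm": 475, "range_nm": (430, 520)},
    "G":       {"center_nm": 560, "range_nm": (520, 600)},
    "R":       {"center_nm": 668, "range_nm": (630, 690)},
    "RedEdge": {"center_nm": 717, "range_nm": (710, 730)},
    "NIR":     {"center_nm": 840, "range_nm": (760, 900)},
}

def center_within_range(band):
    """Check that a band's center wavelength lies inside its stated range."""
    low, high = band["range_nm"]
    return low <= band["center_nm"] <= high

for name, band in BANDS.items():
    print(name, band["center_nm"], center_within_range(band))
```

Such a table makes the band order explicit when the five images are later stacked into one array.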

Image Acquisition
The experimental procedure of real-time MC measurement of bagasse using multispectral images is illustrated in Figure 2. Before capturing, the sample was mixed thoroughly to ensure homogeneity. Images were acquired as follows: (1) the light source was turned on and left for 5 min to ensure stable light, (2) approximately 3 kg of SB was poured onto the conveyor belt, (3) the dark and white references were collected, (4) the conveyor belt was switched on (belt velocity of 20 cm/s), and (5) the bagasse image was captured while the sample was moving under the measuring chamber. The images, collected in five bands (R, G, B, NIR, and RedEdge), were arranged into a 3D format using MATLAB (license no: 40846673, MathWorks, Natick, MA, USA). The image resolution was 1280 × 960 pixels with 16-bit depth.

After capturing, three regions of interest (ROIs) in each image (sample) were cropped to 15 × 15 cm. Each ROI was treated as an individual sample, giving a total of 210 samples, and the reflectance values within each crop were averaged (see Figure 2). The SB at each ROI was then taken to determine the MC by oven-drying. Each ROI sample consisted of two replicates; MC was calculated on a wet basis. The relative reflectance value (R) was calculated using Equation (1), as described elsewhere [18,19]:

$$R = \frac{R_{\mathrm{sample}} - R_{\mathrm{dark}}}{R_{\mathrm{white}} - R_{\mathrm{dark}}} \quad (1)$$

where R_sample is the reflectance of the sample, R_dark is the reflectance value collected with the light source turned off, and R_white is the reflectance of the Teflon plate, which was placed on the conveyor belt at the same level as the sample and photographed together with the SB sample (R_white was the averaged ROI of the Teflon plate, which was assumed to have 100% reflection).
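The ROI averaging and the dark/white correction described above can be sketched as follows. This is a minimal illustration, not the processing code used in the study: the function names and the small example band are hypothetical, and the dark and white references are assumed to be single averaged values per band.

```python
def roi_mean(band, top, left, height, width):
    """Average the pixel values of a rectangular ROI in a 2D band (list of rows)."""
    pixels = [band[r][c]
              for r in range(top, top + height)
              for c in range(left, left + width)]
    return sum(pixels) / len(pixels)

def relative_reflectance(sample, dark, white):
    """Relative reflectance: (R_sample - R_dark) / (R_white - R_dark)."""
    span = white - dark
    if span <= 0:
        raise ValueError("white reference must exceed dark reference")
    return (sample - dark) / span

# Hypothetical 4 x 4 raw band (digital numbers), used only to show the calls.
band = [[100, 110, 120, 130],
        [105, 115, 125, 135],
        [110, 120, 130, 140],
        [115, 125, 135, 145]]

raw = roi_mean(band, top=1, left=1, height=2, width=2)
r = relative_reflectance(raw, dark=20.0, white=420.0)
print(raw, r)
```

Because the correction is linear, averaging the ROI before or after applying it gives the same result when one dark and one white value are used per band.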

Reference Method
After capturing and cropping, each cropped sample was treated as an individual sample. Approximately 5 g of each sample was put into an aluminum can (5 cm diameter and 4 cm height) for determining the MC using a hot air oven (Memmert, model ULM 500, Germany) at 105 °C for 24 h. After that, the samples were re-heated at 6 h intervals until the sample weights remained constant. The weighing was done using a digital balance (AE-ADAM digital balance, Adam Equipment Inc, Fox Hollow Road, Oxford, New York, USA, resolution of 0.001 g). The MC (wt%) was calculated using Equation (2):

$$MC\ (\mathrm{wt\%}) = \frac{m_i - m_f}{m_i} \times 100 \quad (2)$$

where MC (wt%) is the moisture content on a wet basis by weight, m is the mass of the sample in g, and the subscripts i and f denote the initial and final weighing, respectively. There were three replications per sample.
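The wet-basis calculation can be illustrated with a short sketch. The function name and the replicate masses below are hypothetical and serve only to show the arithmetic.

```python
def moisture_content_wet_basis(m_initial, m_final):
    """Wet-basis moisture content in wt%: (m_i - m_f) / m_i * 100."""
    if m_initial <= 0:
        raise ValueError("initial mass must be positive")
    if not 0 <= m_final <= m_initial:
        raise ValueError("final mass must lie between 0 and the initial mass")
    return (m_initial - m_final) / m_initial * 100.0

# Hypothetical (initial, final) masses in g for three replicates of one sample.
replicates = [(5.012, 2.498), (4.987, 2.520), (5.003, 2.471)]
values = [moisture_content_wet_basis(mi, mf) for mi, mf in replicates]
mc_mean = sum(values) / len(values)
print(f"mean MC = {mc_mean:.2f} wt% (w.b.)")
```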

Modeling and Performance Analysis
The samples were divided into calibration (70% for the training set) and validation sets (30% for the testing set) of 150 and 50 samples, respectively. The calibration set was used to develop the predictive model, while the validation set was used to test the performance of the created model and to confirm whether it could be applied in the future, because the samples of the test set were not included in the calibration set. Model development included three steps: (1) utilization of various machine learning algorithms to link waveband features to MC, (2) testing of the predictive model using the independent variables of the test set and examining its performance, and (3) selection of the optimal algorithm that could predict MC with the highest accuracy.
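A split of this kind can be sketched as below. The text does not state how the samples were assigned to each set, so the seeded random shuffle is an assumption, and the function name is hypothetical.

```python
import random

def split_calibration_validation(sample_ids, n_calibration, seed=0):
    """Shuffle sample IDs reproducibly and split them into two disjoint sets."""
    ids = list(sample_ids)
    if not 0 < n_calibration < len(ids):
        raise ValueError("n_calibration must be between 1 and len(ids) - 1")
    random.Random(seed).shuffle(ids)   # assumption: random assignment
    return ids[:n_calibration], ids[n_calibration:]

# 150 calibration and 50 validation samples, as stated in the text.
cal, val = split_calibration_validation(range(200), n_calibration=150)
print(len(cal), len(val))
```

Keeping the two sets disjoint is what allows the validation error to be read as an estimate of the error on future samples.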
The relationship between the feature of reflectance value and its corresponding MC was linked together using multivariate analyses including multiple linear regression (MLR), artificial neural networks (ANNs), Gaussian process regression (GPR), and random forest regression (RFR). Finally, their performances were compared.
The five variables, i.e., the R, G, B, NIR, and RedEdge bands, were assigned as independent variables (raw), while the measured MC of a sample was the dependent variable. The input data were either raw reflectance data or principal component scores (PC-scores) created by principal component analysis (PCA). The MLR, ANN, GPR, and RFR algorithms were also run using the PC-scores as independent variables; these models are called PCA-MLR or principal component regression (PCR), PCA-ANN, PCA-GPR, and PCA-RFR, respectively. MLR finds the regression coefficients that give the best fit to the data by the method of least squares. For ANN, we used the optimal number of errors approach. The number of hidden layers was varied between 1 and 10. The optimal number of hidden layer neurons was the one that achieved the minimum mean square error. For RFR, the minimum number of trees required was taken as the number at which the mean square error reached its minimum and then remained constant.
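The tree-number criterion (the smallest forest size after which the error has levelled off) can be sketched as follows. The function name, the tolerance, and the error curve are hypothetical and are not taken from this study.

```python
def smallest_stable_size(sizes, mse, rel_tol=0.01):
    """Return the first size from which every later MSE stays within
    rel_tol (relative) of the final MSE value."""
    if not sizes or len(sizes) != len(mse):
        raise ValueError("sizes and mse must be non-empty and equally long")
    final = mse[-1]
    for i, size in enumerate(sizes):
        if all(abs(e - final) <= rel_tol * final for e in mse[i:]):
            return size
    return sizes[-1]

# Hypothetical error curve: MSE versus number of trees.
tree_counts = [5, 10, 20, 30, 40, 50]
errors = [210.0, 180.0, 160.0, 141.0, 140.0, 140.0]
print(smallest_stable_size(tree_counts, errors))
```

The same plateau rule could be applied to the number of hidden neurons in the ANN, where the text selects the size with the minimum mean square error.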
After modeling, the performance of each model was examined using the coefficient of determination of calibration (R²) and of prediction (r²), the root mean square error of calibration (RMSEC), the root mean square error of prediction (RMSEP), and the ratio of the standard error of prediction to standard deviation (RPD). A model with a high r², low RMSEP, and high RPD has high accuracy, and the model with the lowest RMSEP was selected. The RMSEP is the error that can be expected for predictions of future samples [20]. The usefulness of a developed model in application is indicated by the performance on the validation set. Following practice in NIR spectroscopy, the r² value was used to judge the application level. An r² value of 0.00-0.49 indicates a poor correlation and the model is not recommended; 0.50-0.64 is suitable for rough screening, 0.66-0.81 for screening, 0.83-0.90 for most applications with caution, 0.92-0.96 for quality assurance, and >0.98 for any application. An RPD between 1.5 and 2 means that the model can discriminate between low and high values of the response variable; a value between 2 and 2.5 indicates that coarse quantitative predictions are possible; values between 2.5 and 3 and values above 3 correspond to good and excellent prediction accuracies, respectively [20].
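The statistics named above can be computed as in the following sketch. Two points are assumptions rather than statements from the text: RPD is taken here as the sample standard deviation of the reference values divided by the RMSEP, and the r² boundaries are treated as lower limits so that the gaps between the quoted ranges are covered. All function names and the example data are hypothetical.

```python
import math

def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(reference, predicted)) / n)

def coefficient_of_determination(reference, predicted):
    """1 - SS_res / SS_tot, computed against the mean of the reference values."""
    mean_y = sum(reference) / len(reference)
    ss_res = sum((y - p) ** 2 for y, p in zip(reference, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in reference)
    return 1.0 - ss_res / ss_tot

def rpd(reference, predicted):
    """Assumed definition: SD of reference values divided by the RMSE."""
    n = len(reference)
    mean_y = sum(reference) / n
    sd = math.sqrt(sum((y - mean_y) ** 2 for y in reference) / (n - 1))
    return sd / rmse(reference, predicted)

def r2_application_level(r2):
    """Map r2 to the application levels quoted in the text (lower bounds)."""
    levels = [(0.98, "any application"), (0.92, "quality assurance"),
              (0.83, "most applications with caution"), (0.66, "screening"),
              (0.50, "rough screening")]
    for lower_bound, label in levels:
        if r2 >= lower_bound:
            return label
    return "not recommended"

# Hypothetical reference and predicted MC values (wt%).
ref = [30.0, 40.0, 50.0, 60.0, 70.0]
pred = [32.0, 38.0, 50.0, 62.0, 68.0]
print(coefficient_of_determination(ref, pred), rmse(ref, pred), rpd(ref, pred))
print(r2_application_level(0.72))
```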

Measured MC
The relative reflectance values of the multispectral images (red, green, blue, NIR, and RedEdge) and the MC data used for model development are summarized in Table 1 as minimum, maximum, average, and standard deviation (SD). The range of MCs was 21.79 wt% (w.b.), indicating that MC varied to a large extent within the same pile. The histograms in Figure 3a-f show the pixel intensity distribution of the bagasse for the red, green, blue, NIR, and RedEdge bands, respectively. Figure 3g shows the relationship between the relative reflectance intensity and the MC of the bagasse samples: the relative reflectance decreased with increasing MC. This means that bagasse with a higher MC absorbed more light than a sample with a lower MC. The correlation between the relative reflectance of each band and MC is given in Table 2, which shows that the red band had the strongest correlation with MC, followed by RedEdge, green, NIR, and lastly the blue band. All image bands correlated negatively with MC, meaning that as the MC of bagasse increased, reflectance from the sample decreased. This indicates that water is a strong absorber of light in the visible and IR regions [21]. Figure 4 shows the principal components of the relative reflectance images for the bagasse samples. The first two principal components accounted for 96% (87% + 9%) of the variance and showed that the sample distribution was very wide, so these samples were good representative samples [22].
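A band-wise correlation of this kind can be computed with Pearson's correlation coefficient, as in the sketch below. The reflectance and MC values are hypothetical and only illustrate the negative relationship described above; they are not data from Table 1 or Table 2.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical MC values (wt%) and red-band relative reflectance.
mc = [20.0, 30.0, 40.0, 50.0, 60.0]
red = [0.42, 0.37, 0.33, 0.26, 0.22]
print(pearson_r(red, mc))
```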

Prediction Models for Estimation of Moisture Content in the Bagasse Samples
The performance of the MC models created using the different machine learning techniques (MLR, PCR, ANN, PCA-ANN, GPR, PCA-GPR, RFR, and PCA-RFR) is shown in Table 3 in terms of R², RMSEC, r², RMSEP, and RPD. Either the reflectance values of the five bands (R, G, B, NIR, and RedEdge) or their PC-scores from PCA were coupled with the various machine learning algorithms for model development, based on the same calibration and prediction sets. The features used for model development are listed in Table 3. The PCR, ANN, and PCA-ANN models had low accuracy, followed by MLR, RFR, and PCA-GPR, respectively. The performance of GPR and PCA-RFR was similar; however, the PCA-RFR model gave the highest accuracy. This moisture content model provided an R² of 0.83, RMSEC of 8.71 wt% (w.b.), r² of 0.72, RMSEP of 11.28 wt% (w.b.), and RPD of 1.85. A model shows excellent prediction if R² > 0.90, good prediction if 0.81 < R² < 0.90, approximate prediction if 0.66 < R² < 0.80, and poor prediction if R² < 0.66 [23]. An RPD between 1.5 and 2 means that the model can discriminate between low and high values of the response variable [20]. Therefore, this model was acceptable for screening. Figure 5a,b shows the scatter plots between the reference and predicted values of the calibration and validation sets, respectively, illustrating the coefficient of determination (R² = 0.83 and r² = 0.72) and Pearson's coefficient of determination (Rp² = 0.86 and rp² = 0.77). The use of the prediction model at lower moisture contents (MC < 20 wt%) is not recommended because the predicted values exceeded the measured values. This is supported by [23]: the absorption coefficient increases with increasing moisture percentage, so at lower moisture contents there is less absorption.
Figure 6a shows the importance of the features PC1, PC2, PC3, PC4, and PC5. The figure is a bar chart of the out-of-bag feature importance against PC-score, which shows that feature number 1 (PC1) was the most important in prediction, followed by PC3, PC2, PC4, and PC5. PC1 is the direction with the largest variance and was the most important. The optimal number of trees and the mean squared error (RME) are shown in Figure 6b. A tree number of 30 was optimal because from this point the RME became constant and remained low.

Conclusions
A fast and non-destructive inline detection method using five-band multispectral imaging was proposed to estimate the quality of bagasse in terms of moisture content. This method can be used for detecting the moisture content of bagasse moving on a conveyor belt. The RFR algorithm coupled with the PC-scores (PC1, PC2, PC3, PC4, and PC5) of the five-band multispectral images, called "PCA-RFR", was the most suitable for model creation because its prediction error was the lowest, with an RMSEP of 11.82 wt%. For measurement of the bagasse samples, four light bulbs were installed in the corners of the measurement chamber, and the belt speed was set to 20 cm/s for multispectral image acquisition. This system could be applied to bagasse obtained after juice extraction because the sample is homogeneous. The prediction model for the estimation of moisture content provided an RPD of 1.85, which corresponds to screening use. Therefore, the multispectral imaging system could be used as a low-cost system for screening moisture content, with room to improve its accuracy.