Intelligent Industrial Cleaning: A Multi-Sensor Approach Utilising Machine Learning-Based Regression

Effectively cleaning equipment is essential for the safe production of food but requires a significant amount of time and resources such as water, energy, and chemicals. To optimize the cleaning of food production equipment, innovative technologies are needed to monitor the removal of fouling from equipment surfaces. In this work, optical and ultrasonic sensors are used to monitor the removal of food fouling materials with different physicochemical properties in a benchtop rig. Tailored signal and image processing procedures are developed to monitor the cleaning process, and a neural network regression model is developed to predict the amount of fouling remaining on the surface. The results show that the three dissimilar food fouling materials investigated were removed from the test section via different cleaning mechanisms, and that the neural network models predicted the area and volume of fouling present during cleaning with accuracies as high as 98% and 97%, respectively. This work demonstrates that sensors and machine learning methods can be effectively combined to monitor cleaning processes.


Introduction
The cleaning of surface fouling is important in many industries to ensure that equipment, such as heat exchangers, operates under optimal conditions. Cleaning is even more important in food and drink manufacturing, as it ensures that equipment is hygienic and eliminates cross-contamination of ingredients between production batches. Historically, equipment was cleaned manually by factory operators, who had to disassemble it and spray the internal surfaces with water and cleaning chemicals. Today, most industrial equipment cleaning is performed by automated Clean-in-Place (CIP) systems. These systems feature process tanks for cleaning fluids, together with pumps, pipework, and spray balls that clean the inside of processing equipment without the need for disassembly. CIP systems clean equipment through a combination of mechanical flow, chemicals, temperature, and time, and they generally feature several steps, including an initial rinse, a detergent wash, a post-detergent rinse, and a sterilant step [1].
Although CIP is a significant improvement over manual cleaning, it comes at a cost. The economic cost of CIP is primarily associated with lost production time: every minute spent cleaning equipment is time that the equipment cannot be used to manufacture product. As the food and drink manufacturing sector operates on tight profit margins, reducing production downtime is extremely desirable. In addition, there are financial costs associated with the use of cleaning chemicals, water, and energy. The main environmental cost of cleaning is the vast amount of water and energy used. Statistics show that cleaning accounts for 30% of the energy used in dairy production [2] and 35% of the water used in beer brewing [3]. Because of these costs, research has been performed to optimize CIP. The vast majority of this work has focused either on the cleaning settings, such as fluid temperature and flow rate [4], or on the mechanisms of surface fouling and cleaning [5].

Sensor Research in Cleaning
An extensive literature and body of industrial practice exist on monitoring industrial cleaning processes with different sensor technologies. Most research efforts are directed at either (1) monitoring the properties of the cleaning fluid to determine whether it contains soiling material or cleaning agents, or (2) monitoring the internal surfaces of the processing equipment. Fluid properties have been monitored using electrical and optical methods [6][7][8]. These techniques offer valuable insight but are not sensitive to fouling that is still adhered to a surface, so they can produce misleading results on how clean the equipment actually is.
A variety of sensor methods have been used to monitor the presence of surface fouling, the main ones being electrical (e.g., [9][10][11][12]), optical (e.g., [13][14][15][16][17]), acoustic (e.g., [13,18,19]), and ultrasonic methods (e.g., [20][21][22][23]). Electrical methods use sets of probes and measure electrical properties such as resistance and conductance across pairs of probes. The presence of fouling affects heat transfer in pipes, which is detected by the electrical methods. Although promising, these techniques can produce noisy signals that are difficult to interpret under turbulent flow, which is nearly always the case in CIP. Optical methods using either fluorescence [13] or fibre-optic [17] techniques have been used to monitor surface fouling removal. Optical techniques can often provide spatial information on the location and degree of fouling; however, they require lighting, which is extremely challenging to provide inside pipework. Acoustic methods use mechanical instruments to excite a vibration in the equipment where the fouling is present. These vibrations are subsequently detected by a transducer (generally piezoelectric). Acoustic methods have been used to monitor the removal of materials, including dairy fouling [24] and shampoo residue [18], from a surface. In general, the presence of fouling damps the vibrations detected by the transducers. However, these methods are challenging to deploy in industrial environments where other vibrations exist.
Ultrasonic (US) sensor technologies employ high-frequency mechanical waves that propagate through the system under inspection. The benefits of US sensors are that they are often significantly cheaper than other techniques and can perform non-destructive, real-time measurements in a variety of industrial applications [19]. The application of US techniques to fouling monitoring was pioneered by Withers [13], who used two US transducers in a laboratory-scale rig filled with fluid to measure the thickness of various food and non-food fouling materials on the internal surfaces. Since then, research has been carried out using US technologies for fouling detection and cleaning monitoring [19,21,22,23,25,26,27]. The majority of previous works employed single US sensors operating in reflection mode and focused on dairy fouling, as it is especially difficult to clean and remains a current challenge for the sector.
These different sensor methods vary in terms of their cost, complexity, and operating parameters such as speed of data acquisition and spatial area monitored. It is clear that no single sensor method is suitable for monitoring all the different types of equipment used within food production, so a range of different sensor and data analysis methods should be deployed to monitor the different components in a CIP system.
With any sensor technique, appropriate data analysis methods are needed to link the recorded measurement to the system under inspection. Machine learning is an attractive data analysis method because it does not require the development of first-principles models, which are challenging for real-life applications. Supervised machine learning methods enable models to be developed from a labelled dataset and have been used successfully to monitor surface fouling removal during cleaning processes [20][21][22][23]. All of this previous work developed classification models that use ultrasonic measurements to determine whether the surface remains fouled or is clean, for dairy and other fouling materials. Most of this work was performed on flat test sections, but examples exist [20] of the technique being used in pipes of circular cross-section.

Signal and Image Processing
Recorded US waves require appropriate signal processing to extract key information. Wavelet decomposition (WD) decomposes a single time-domain signal instance (e.g., an ultrasonic signal) into a two-dimensional function, where each of the decomposed signals is a mixture of source signals [28]. It can be considered as a series of band-pass filters, whose outputs can be regarded as different mixtures of independent source signals [29]. Compared with the Fast Fourier Transform, WD provides combined time and frequency information, enabling more efficient feature extraction from a computational effort perspective. For this reason, such algorithms are very advantageous in real-time data processing, making WD one of the most widespread and powerful methods of signal analysis [30]. A variant of this method, the wavelet packet transform (WPT), which is a generalization of wavelet decomposition [31], is used in this work to process the ultrasonic signals and extract the features used in the neural networks.
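To make the band-pass view concrete, the sketch below performs one level of a discrete wavelet transform. It uses the Haar wavelet as a simple stand-in (the paper itself uses a Daubechies wavelet), and the function name `haar_step` is hypothetical. The approximation branch behaves as a low-pass filter and the detail branch as a high-pass filter; because the Haar basis is orthonormal, the signal energy is split between the two branches without loss.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail): scaled pairwise sums act as a low-pass
    filter and scaled pairwise differences act as a high-pass filter.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approximation = [scale * (a + b) for a, b in pairs]
    detail = [scale * (a - b) for a, b in pairs]
    return approximation, detail


if __name__ == "__main__":
    # A constant (lowest-frequency) signal ends up entirely in the approximation...
    print(haar_step([1.0, 1.0, 1.0, 1.0]))
    # ...whereas a signal alternating at the highest frequency ends up in the detail.
    print(haar_step([1.0, -1.0, 1.0, -1.0]))
```

Applying `haar_step` again to the approximation (or, in a wavelet packet transform, to both outputs) yields the next decomposition level.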
With reference to CIP processes, the literature reports a number of image segmentation procedures for food fouling detection. A review reported in [14] compares the Otsu, iterative, and 1D and 2D entropy thresholding methods. Fuzzy c-means (FCM) clustering [32,33] is a powerful clustering algorithm that allows each data point to belong to multiple clusters with varying degrees of membership. Its main advantages are its convergence properties and low complexity. Xiong et al. [34] used FCM with three classes to cluster the pixel values of RNAi fluorescence cellular images and segment them. The threshold is obtained by averaging the maximum value in the class with the smallest centre and the minimum value in the class with the middle centre. In terms of the image histogram, the resulting threshold lies at the demarcation between the first and second peaks.
Intelligent decision-making on the cleaning state is enabled by machine learning paradigms; in this respect, a commonly used technique is the Neural Network (NN), owing to its high performance in nonlinear modelling through parallel processing [35], which supports pattern recognition (classification), time series prediction, and data regression.
For classification purposes, the literature provides a variety of applications in cleaning processes, such as [20], in which ultrasonic measurements are used as input to a NN to monitor the fouling removal of food materials in plastic and metal cylindrical pipes. Moreover, [23] fed acoustic features, together with measured temperature and mass flow rate, into a NN to decide whether fouling was present or absent.
In [14], a time series prediction-based intelligent decision-making support system was developed, in which a nonlinear autoregressive neural network with exogenous inputs (NARX) was configured, trained, and tested to predict the cleaning time from the image processing results.
Research on the use of NNs for regression in cleaning processes is considerably more limited; one example is the assessment of heat exchanger fouling during ultrasonic cleaning using Convolutional Neural Networks (CNNs) [36]. This research monitors the cleaning of different food fouling materials in a bespoke benchtop rig using ultrasonic (US) and optical sensors. The US signals are analysed using wavelet methods, whilst novel image processing methods are developed to calculate the number of pixels in an image's Region of Interest (ROI) that contain fouling and the total fouling volume. Results from both sensors are assessed to determine their capability for monitoring cleaning processes, and features extracted from the ultrasonic sensor measurements are used to develop a Neural Network regression model that can predict the amount of fouling present and be used to determine when the cleaning process will be complete. This is the first time that US measurements have been used to develop regression ML models to predict the degree of fouling. The regression models were developed using the US measurements because optical measurements would not be possible in processing equipment geometries such as closed pipework.

Materials and Methods
This section introduces the experimental and computational methodologies utilized for the design and realization of the cleaning tests along with the signal and image processing procedures adopted to estimate the remaining fouling during the cleaning process.

Experimental Tests
A benchtop experimental flow rig was constructed for the fouling removal tests (Figure 1). The rig featured a 1.2 mm thick stainless steel (SS 430) bottom plate on which fouling removal was monitored. To allow images to be acquired during cleaning, the lateral and upper surfaces of the rig were made of transparent Perspex. The rig was 300 mm long with an internal width and height of 40 mm. Valves controlling mains water pressure were located at either end of the rig to allow fluids to flow through and perform the cleaning. The fouling materials selected for the experimental tests were tomato paste, gravy, and concentrated malt extract. This range of materials was used to investigate different fouling and cleaning mechanisms [37,38]. Full details of the fouling materials are available in [27].
Tomato paste, gravy, and concentrated malt extract were chosen because they are common materials in the food and drink manufacturing sector. In addition, they all have different compositions and therefore foul and clean from a surface differently. It has been shown in [39] that different materials yield different cleaning characteristics, and it is important to assess the potential of the developed sensing techniques for a range of industrially relevant scenarios. Such considerations highlight the need for more adaptive CIP systems that guarantee the removal of all fouling whilst avoiding over-cleaning.
The gravy fouling was prepared by mixing 10 g of granules with 10 mL of tap water in a beaker at 70 ℃ for 1 min with continuous stirring. The other fouling media did not require preparation. The fouling layer was obtained by applying 15 g of the material onto the centre of the bottom plate of the rig, directly above the US transducer, which was attached to the opposite side of the plate. The fouling was then manually spread with a spatula to form an even layer 5 mm thick. Each cleaning test was initiated by opening the inlet valve to let the rig slowly fill with water. Data acquisition was then started and the outlet valve was opened, so that water flowed through the rig and removed the fouling material. Each cleaning test continued until the fouling appeared to be completely removed, as determined from the camera images and visual observation. This ensured that sufficient data were recorded across a range of fouled and clean conditions.
The experimental rig was equipped with a US sensor, a camera, and a temperature sensor to acquire data during the cleaning tests. The placement of the sensors within the rig is shown in Figure 1. The US sensor was a 5 MHz magnetic contact transducer (Olympus®, Tokyo, Japan), mounted on the underside of the rig with a thin film of couplant fluid between the sensor and the SS 430 wall to enable transmission of US waves into the material. A Bayonet Neill-Concelman (BNC) cable connected the US transducer to a US box (Lecoeur Electronique®, Chuelles, France), which excited the transducer with an electrical voltage and recorded and digitized the received US waves. A Logitech® (Lausanne, Switzerland) C270 3MP web camera was used to acquire digital images, while the temperature was recorded with an RTD PT100 connected to a Pico Technology® (Cambridgeshire, UK) PT-104 data logger. The US box, PT-104 data logger, and camera were all connected to a laptop and controlled through bespoke software developed in MATLAB®. During the cleaning experiments, US and temperature data were sampled at 0.25 Hz, while images were acquired every 20 s (0.05 Hz).
An experimental programme was defined by varying and combining the fouling material and the temperature, as reported in Table 1. Each test was repeated several times to increase the statistical reliability of the experimental campaign.

Data Processing and Features Extraction
The estimation of the fouling surface and volume is carried out via intelligent regression based on features extracted from ultrasonic signals and information computed via image processing.
The framework adopted in this paper is illustrated in Figure 2 and is composed of three main parts: ultrasonic signal processing (yellow boxes), aimed at extracting significant information on the cleaning process; image processing (green boxes), aimed at computing the surface fouling and fouling volume for each acquired image; and machine learning-based regression (red boxes), aimed at estimating the surface fouling and fouling volume during the cleaning process.

Signal Processing
Because the digital camera and the ultrasonic sensor have different sampling rates, each signal instance was pre-processed before further processing.
Each raw US signal instance log includes four acquisitions distributed over a time span of 37.5 μs, as shown in Figure 3. The first pre-processing step therefore consists of averaging the four acquisitions; subsequently, to match the US and image acquisition sampling rates, the four signal instances recorded between consecutive image acquisitions are averaged as per Equation (1):

$$\bar{s}(t) = \frac{1}{N_s N_a} \sum_{j=1}^{N_s} \sum_{i=1}^{N_a} s_{j,i}(t) \tag{1}$$

where $s_{j,i}(t)$ is the $i$-th acquisition of the $j$-th signal instance, $N_a = 4$ indicates the four acquisitions, and $N_s = 4$ indicates the four signal instances recorded between each image acquisition.
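A minimal sketch of the averaging in Equation (1), assuming each image interval yields a nested list `blocks[j][i]` holding acquisition i of signal instance j (a hypothetical layout; the paper does not describe its data structures). Because every instance contains the same number of acquisitions, averaging the acquisitions first and then the instances is equivalent to a single mean over all of them.

```python
def average_instances(blocks):
    """Sample-wise mean over all signal instances and acquisitions (Equation 1).

    blocks[j][i] is the list of samples for acquisition i of signal instance j.
    """
    acquisitions = [acq for instance in blocks for acq in instance]
    if not acquisitions:
        raise ValueError("no acquisitions to average")
    length = len(acquisitions[0])
    if any(len(acq) != length for acq in acquisitions):
        raise ValueError("all acquisitions must have the same number of samples")
    count = len(acquisitions)
    # zip(*acquisitions) walks through the acquisitions one sample index at a time.
    return [sum(samples) / count for samples in zip(*acquisitions)]


if __name__ == "__main__":
    # Hypothetical data: 4 instances x 4 acquisitions x 3 samples.
    blocks = [[[float(j + i), 1.0, -1.0] for i in range(4)] for j in range(4)]
    print(average_instances(blocks))
```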
The averaged signal was then segmented to remove the saturated portion corresponding to the first 6000 samples. The result is a segmented signal of 2400 samples, shown in Figure 3. The WPT [40] of a sensor signal generates packets of coefficients computed by scaling and shifting a selected mother wavelet function. At the first level of the WPT, the original sensor signal S is split into two frequency-band packets, called the approximation cA1 and the detail cD1. At the second level, each approximation and detail packet is again split into further approximation and detail packets (cA2, cD2, and so on), and the process is repeated until the level required by the application is reached [41], generating a "tree" of decomposition packets, as shown in Figure 4. In this paper, the WPT is used to process the US signal instances with the aim of extracting significant features describing the cleaning state over time. A three-level wavelet decomposition was performed on the segmented signal using the order-3 Daubechies mother wavelet [42], and the coarse-scale approximation coefficients and the detail coefficients were extracted from each US signal. Figure 5 displays an example of a signal instance and its related detail and approximation coefficients. From the decomposition vector, the approximation coefficients were computed for all signal instances. A feature extraction procedure was then applied to the approximation coefficients to compute a number of statistical features [43], namely mean, standard deviation, minimum, maximum, skewness, kurtosis, and energy. Examples of the most significant features are reported for the various fouling materials in Figure 6 to show their trends during the cleaning process.
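The sketch below illustrates the segmentation and feature-extraction chain under stated simplifications: it uses the Haar filter instead of the order-3 Daubechies wavelet used in the paper (so coefficient values will differ), keeps only the approximation branch, whose level-3 node cA3 is the same in the DWT and WPT trees, and computes skewness and kurtosis from population moments with non-excess kurtosis, a convention the paper does not specify. All constant and function names are hypothetical.

```python
import math

SATURATED_SAMPLES = 6000  # leading saturated samples removed before analysis
LEVELS = 3                # decomposition depth used in the paper


def haar_approximation(signal):
    """Low-pass (approximation) branch of one orthonormal Haar level."""
    scale = 1.0 / math.sqrt(2.0)
    return [scale * (a + b) for a, b in zip(signal[0::2], signal[1::2])]


def approximation_coefficients(signal, levels=LEVELS):
    """Follow the approximation branch cA1 -> cA2 -> ... down `levels` levels."""
    coeffs = list(signal)
    for _ in range(levels):
        coeffs = haar_approximation(coeffs)
    return coeffs


def statistical_features(coeffs):
    """The seven features named in the text, computed from population moments."""
    n = len(coeffs)
    mean = sum(coeffs) / n
    m2 = sum((c - mean) ** 2 for c in coeffs) / n
    if m2 == 0:
        raise ValueError("constant coefficients: skewness and kurtosis are undefined")
    m3 = sum((c - mean) ** 3 for c in coeffs) / n
    m4 = sum((c - mean) ** 4 for c in coeffs) / n
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "min": min(coeffs),
        "max": max(coeffs),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # not excess kurtosis (a normal variable gives 3)
        "energy": sum(c * c for c in coeffs),
    }


def extract_features(raw_signal):
    """Drop the saturated samples, decompose, and summarise the approximation."""
    segment = raw_signal[SATURATED_SAMPLES:]
    return statistical_features(approximation_coefficients(segment))


if __name__ == "__main__":
    # Synthetic stand-in for an averaged US record of 6000 + 2400 samples.
    raw = [0.0] * SATURATED_SAMPLES + [math.sin(2 * math.pi * k / 80) for k in range(2400)]
    for name, value in extract_features(raw).items():
        print(f"{name:>8}: {value:.4f}")
```

With 2400 samples, three halvings leave 300 approximation coefficients per signal instance, from which the seven features form one input vector for the regression model.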
The results in Figure 6 display how three different statistical features (means, standard deviations, and energies) change during cleaning for the three fouling materials. For all materials, the energies and standard deviations follow similar trends, increasing toward the end of the process when the test section becomes clean. Although the starting values of the standard deviations and energies differ between the three fouling materials, they all reach a similar end point once clean: approximately 350 for the standard deviations and 1.03 × 10¹⁰ for the energies. The means showed a different trend, gradually increasing during the initial stage of the cleaning process before suddenly decreasing at the end. The means for the tomato did not show this trend, indicating that a single statistical feature may not be appropriate for monitoring the processes. It is not surprising that the largest change in the US statistical features occurred only at the end of the process, as previous work has shown that US reflection methods are sensitive only to whether fouling covers the area of the test section opposite the transducer, and not to the thickness of this fouling.

Image Processing
The quantification of the surface fouling and fouling volume required for the training dataset was carried out via image processing on the images acquired during the cleaning process.
The aim is to process the images to highlight relevant information about the fouling. An initial pre-processing step for all the digital images is channel separation. The raw image in Figure 7a was acquired in Red-Green-Blue (RGB) space, consisting of three layers, i.e., red, green, and blue, shown separately in Figure 7b-d.
The raw RGB image was then transformed into Hue-Saturation-Value (HSV) space [44]. The image channel breakdown is reported in Figure 7 for a Tomato Cold 1 image instance. The HSV image (Figure 7e) is characterized by three layers, corresponding respectively to the Hue (Figure 7f), i.e., the colour's position on the colour wheel; the Saturation (Figure 7g), i.e., the amount of hue or departure from neutral (zero denotes a neutral shade, whereas 1 indicates maximum saturation); and the Value (Figure 7h), i.e., the maximum among the red, green, and blue components of a specific colour. Taking into account the experimental conditions, in terms of light source, food fouling colours, and camera settings, a tailored image processing procedure was developed in this research to quantify the fouling volume. From the separated channels, a new image is computed as reported in Equation (2).
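As an illustration of the RGB-to-HSV mapping described above, Python's standard `colorsys.rgb_to_hsv` converts a single pixel whose channels are scaled to [0, 1]. The helper names below are hypothetical, and because the exact channel combination of Equation (2) is not reproduced here, the sketch stops at channel separation and an image complement (negative), taken as 1 minus the value for channels normalised to [0, 1].

```python
import colorsys


def split_hsv(rgb_image):
    """Convert rows of (R, G, B) pixels in [0, 1] into separate H, S, V channels."""
    hsv = [[colorsys.rgb_to_hsv(*pixel) for pixel in row] for row in rgb_image]
    hue = [[h for h, _, _ in row] for row in hsv]
    saturation = [[s for _, s, _ in row] for row in hsv]
    value = [[v for _, _, v in row] for row in hsv]
    return hue, saturation, value


def complement(channel):
    """Image complement (negative) of a channel normalised to [0, 1]."""
    return [[1.0 - x for x in row] for row in channel]


if __name__ == "__main__":
    # 1 x 3 hypothetical image: pure red, neutral grey, dark brown.
    image = [[(1.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.4, 0.2, 0.0)]]
    h, s, v = split_hsv(image)
    print(h, s, v)
    print(complement(s))
```

The grey pixel has zero saturation (a neutral shade), and each pixel's value equals its largest RGB component, matching the channel definitions given above.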
where Equation (2) combines the red, saturation, blue, and hue channels, and the operator ∁ indicates the image complement (negative image). In this image, the background is removed and the pixel intensity is proportional to the fouling volume, as can be seen in Figure 8. The fouling volume can be visualized as the pixel intensity (z-axis) as a function of the image pixel coordinates (x- and y-axes) in the 3D plot in Figure 9, where the pixel intensity was normalized between 0 and 1 for enhanced visualisation. The fouling volume for a single image instance is then computed as the sum of the pixel intensities $I(i,j)$ within the image $I$, as reported in Equation (3):

$$\text{Fouling Volume} = \sum_{i=1}^{M} \sum_{j=1}^{N} I(i,j) \tag{3}$$

where $M$ and $N$ are the vertical and horizontal resolution of the image, respectively.
To estimate the amount of fouling, a Fuzzy c-means clustering-based thresholding method [34] was applied to the processed image. FCM is carried out by minimizing the objective function in Equation (4):

$$J_m = \sum_{i=1}^{D} \sum_{j=1}^{C} \mu_{ij}^{m} \lVert x_i - c_j \rVert^2 \tag{4}$$

where $D$ is the number of data points, here the image pixels $x_i$; $C$ is the number of clusters; $m$ is the fuzzy partition exponent, which controls the degree of fuzzy overlap; $c_j$ is the centre of cluster $j$; and $\mu_{ij}$ is the degree of membership of pixel $x_i$ in cluster $j$. For the image segmentation proposed in this paper, FCM is implemented as follows:
• Set the number of clusters $C = 3$, i.e., "small", "medium", and "large", based on pixel intensity value, and set the fuzzy partition exponent $m = 2$.
• Initialize the cluster memberships $\mu_{ij}$ with random values.
• Compute the cluster centres according to Equation (5):
$$c_j = \frac{\sum_{i=1}^{D} \mu_{ij}^{m} x_i}{\sum_{i=1}^{D} \mu_{ij}^{m}} \tag{5}$$
• Update $\mu_{ij}$ as per Equation (6):
$$\mu_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}} \tag{6}$$
• Compute the value of the objective function $J_m$.
• Repeat the computation of $c_j$, $\mu_{ij}$, and $J_m$ until a termination criterion is met, such as a minimum improvement or a maximum number of iterations.
• Retrieve the final cluster centres and the final fuzzy membership degree of each pixel.
• Assign each pixel to one of the three clusters based on its maximum membership degree.
• Compute the maximum pixel intensity of the "small" cluster, $x_{\max}^{\text{small}}$, and the minimum pixel intensity of the "medium" cluster, $x_{\min}^{\text{medium}}$.
• Compute the threshold value as the average of these two values, i.e., $T = \left( x_{\max}^{\text{small}} + x_{\min}^{\text{medium}} \right) / 2$.
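The FCM thresholding steps above can be sketched in plain Python for a one-dimensional list of pixel intensities. This is a minimal illustration, not the authors' implementation: the function name, the seeded random initialisation, and the stopping parameters `max_iter` and `tol` are assumptions, and a practical version would normally use a vectorised library.

```python
import random


def fcm_threshold(values, c=3, m=2.0, max_iter=300, tol=1e-9, seed=0):
    """FCM-based threshold: mean of max('small') and min('medium') intensities."""
    rng = random.Random(seed)
    n = len(values)
    # Random initial memberships, each pixel's row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])

    previous = None
    for _ in range(max_iter):
        # Equation (5): centres as membership-weighted means.
        centres = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            centres.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))
        # Equation (6): membership update.
        for i, x in enumerate(values):
            dist = [abs(x - cj) for cj in centres]
            if 0.0 in dist:  # pixel coincides with a centre: crisp membership
                k = dist.index(0.0)
                u[i] = [1.0 if j == k else 0.0 for j in range(c)]
            else:
                u[i] = [1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                        for j in range(c)]
        # Equation (4): objective function, used here as the stopping criterion.
        objective = sum(u[i][j] ** m * (values[i] - centres[j]) ** 2
                        for i in range(n) for j in range(c))
        if previous is not None and abs(previous - objective) < tol:
            break
        previous = objective

    # Hard assignment by maximum membership; rank clusters by centre value.
    labels = [max(range(c), key=lambda j: u[i][j]) for i in range(n)]
    small, medium = sorted(range(c), key=lambda j: centres[j])[:2]
    small_max = max(x for x, lab in zip(values, labels) if lab == small)
    medium_min = min(x for x, lab in zip(values, labels) if lab == medium)
    return (small_max + medium_min) / 2.0


if __name__ == "__main__":
    # Hypothetical intensities forming three well-separated groups.
    pixels = [0.05] * 50 + [0.1] * 50 + [0.5] * 30 + [0.55] * 30 + [0.9] * 20 + [0.95] * 20
    print(fcm_threshold(pixels))
```

Because the threshold is built from actual pixel values at the cluster boundary rather than from the centres, it falls in the gap between the first two intensity groups, consistent with the histogram interpretation given earlier.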
An example of an image segmented using the FCM thresholding technique is reported in Figure 10a. To facilitate computation, the segmented image was then transformed into its negative, as shown in Figure 10b. From the negative segmented image $I_s$, the surface fouling is computed as the sum of the pixel intensities, as reported in Equation (8):

$$\text{Surface Fouling} = \sum_{i=1}^{M} \sum_{j=1}^{N} I_s(i,j) \tag{8}$$

where $N$ and $M$ are the horizontal and vertical image resolution, respectively.
Within the acquired image, a Region of Interest (ROI) was defined at the sensor position; the surface fouling and fouling volume were therefore computed only within the ROI, i.e., a circle of radius 45 pixels, as depicted in Figure 11. Figure 12 shows examples of surface fouling and fouling volume trends. The increasing trend occurring in some samples can be explained by the fact that, during the cleaning process, lumps of fouling material are progressively smeared over a larger surface [14].
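The ROI computation can be sketched as follows, assuming the processed image is available as a 2-D list of intensities in [0, 1] and that fouled pixels are those whose intensity exceeds the FCM threshold T (i.e., the white pixels of the negative segmented image). The circle centre, the function names, and the strict `>` comparison are illustrative assumptions; only the 45-pixel radius comes from the text.

```python
ROI_RADIUS = 45  # pixels, as stated in the text


def circular_roi(height, width, centre_row, centre_col, radius=ROI_RADIUS):
    """Boolean mask that is True inside a circle of the given radius."""
    r2 = radius * radius
    return [[(row - centre_row) ** 2 + (col - centre_col) ** 2 <= r2
             for col in range(width)] for row in range(height)]


def roi_fouling(intensity, threshold, mask):
    """Return (surface_fouling, fouling_volume) restricted to the ROI.

    surface_fouling counts pixels above the threshold (Equation 8 applied to a
    binary image); fouling_volume sums the raw intensities (Equation 3).
    """
    surface, volume = 0, 0.0
    for intensity_row, mask_row in zip(intensity, mask):
        for value, inside in zip(intensity_row, mask_row):
            if inside:
                volume += value
                if value > threshold:
                    surface += 1
    return surface, volume


if __name__ == "__main__":
    # Hypothetical 120 x 160 image whose left half is covered by fouling.
    mask = circular_roi(120, 160, 60, 80)
    image = [[0.8 if col < 80 else 0.0 for col in range(160)] for row in range(120)]
    print(roi_fouling(image, threshold=0.3, mask=mask))
```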
In the fouling volume charts, the fluctuations are due to the small size of the ROI and the transient nature of the fouling, i.e., during the cleaning process, fouling may temporarily increase in the ROI as it moves across the surfaces [14,16].
The results in Figure 12 show that, for all three fouling materials, the number of fouled pixels and the fouling volume in the ROI decrease during the cleaning process and reach zero when all fouling has been removed. Although the general decreasing trend is consistent between surface fouling and fouling volume for the different materials, differences exist within the results. These are most noticeable for tomato, where the fouling volume (Figure 12f) begins to decrease at the start of the cleaning process, whereas the surface fouling (Figure 12e) does not begin to decrease until approximately the seventh image instance. This result indicates that fouling volume is a more useful metric for monitoring cleaning processes; however, this trend was not identified for the gravy or malt (Figure 12a-d). For the gravy, both surface fouling and fouling volume decreased until the 80th image instance before increasing slightly. The fouling volume for the malt also appeared to increase marginally between image instances 15 and 25. Although the amount of fouling would not be expected to increase during cleaning, there are several possible explanations for this result. Only a small ROI is analysed, and fouling from the surrounding area may have moved into it owing to the mechanical action of the flowing fluid. In the case of the gravy, it is known that it swells as it absorbs water and also becomes partly detached from the surface [27]; the resulting lump of gravy would move slightly closer to the top of the rig and nearer the camera, increasing the apparent fouling within the ROI. It was not surprising to find different surface fouling and fouling volume results for the three fouling materials studied, as it is well known that materials with different physicochemical properties foul and are removed from surfaces differently [37].
Previous research has shown that tomato is removed primarily by mechanical force, whereas gravy initially swells owing to moisture absorption before being removed as a bulk lump, and malt dissolves gradually into the fluid [27].

Machine Learning Regression for Surface Fouling and Fouling Volume Estimation
The estimation of surface fouling and fouling volume from the ultrasonic signal was modelled in this research as a regression problem. The statistical features from the wavelet decomposition vector (partially illustrated in Figure 6) were fed into a Neural Network data fitting [25] decision-making support system for fouling quantification. Three-layer feed-forward neural networks were built with the following architecture and training settings:
• Input layer: seven nodes, corresponding to the seven features extracted from the wavelet approximation coefficients.
• Hidden layer nodes (HLN): 7.
• Target layer: one node, corresponding to the number of white pixels computed via image processing (surface fouling models) or to the fouling volume (fouling volume models).
• Training algorithm: Bayesian Regularization (BR) [27].
• Data partitioning: the dataset was partitioned into three sets using specified indices, alternating one instance for training, one for validation, and one for testing [22].
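To make the 7-7-1 architecture and the interleaved partitioning concrete, here is a minimal forward-pass sketch. It assumes a tanh hidden layer and a linear output node, a common choice for regression networks that the text does not confirm, uses random weights in place of trained ones, and does not implement Bayesian Regularization training; all names are hypothetical.

```python
import math
import random

N_FEATURES = 7  # wavelet statistical features (input layer)
N_HIDDEN = 7    # hidden layer nodes (HLN)


def interleaved_split(n_samples):
    """Alternate samples between training, validation, and test sets."""
    indices = range(n_samples)
    train = [i for i in indices if i % 3 == 0]
    validation = [i for i in indices if i % 3 == 1]
    test = [i for i in indices if i % 3 == 2]
    return train, validation, test


def init_params(seed=0):
    """Random weights for a 7-7-1 network (placeholder for trained weights)."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_FEATURES)] for _ in range(N_HIDDEN)]
    b_hidden = [0.0] * N_HIDDEN
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def predict(features, params):
    """Forward pass: tanh hidden layer followed by a single linear output node."""
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features")
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


if __name__ == "__main__":
    print(interleaved_split(9))
    print(predict([0.1] * N_FEATURES, init_params()))
```

With this interleaving, training plus validation samples make up two-thirds of the data and the test set the remaining third, matching the proportions reported in the results.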

Results and Discussion
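Regression results are characterized by two goodness-of-fit indicators, namely the Root Mean Squared Error (RMSE) and the correlation coefficient R, computed according to Equations (9) and (10):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{9}$$

$$R = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 \sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}} \tag{10}$$

where, for both surface fouling and fouling volume, $y_i$ are the actual values, $\hat{y}_i$ are the values predicted by the NN, $\bar{y}$ and $\bar{\hat{y}}$ are the corresponding average values, and $n$ is the number of samples.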
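Equations (9) and (10) translate directly into code; the short sketch below, with hypothetical function names and illustrative (not measured) data, can be used to check reported metrics on any pair of measured and predicted series.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, Equation (9)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def correlation_r(actual, predicted):
    """Pearson correlation coefficient R, Equation (10)."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


if __name__ == "__main__":
    measured = [1200.0, 950.0, 600.0, 150.0, 0.0]   # hypothetical pixel counts
    predicted = [1150.0, 990.0, 560.0, 210.0, 20.0]
    print(rmse(measured, predicted), correlation_r(measured, predicted))
```

R measures how well the predictions track the trend of the measured values, whereas RMSE is expressed in the units of the target (pixel count or summed intensity), which is why both indicators are reported.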
The results for all tests are reported in Table 2 and Table 3, in terms of both the R coefficient and RMSE, grouped by category. Results are shown for the three NN phases. The training results correspond to the NN performance on the training samples, including the validation samples, which together account for two-thirds of the whole dataset. The test results correspond to the remaining one-third of the samples, which were not used to train the system. The overall results are a weighted average of the training and test results.
The regression results for surface fouling ( Table 2) and fouling volume (Table 3) show acceptable results with R coefficient values above 0.8 for all models developed and above 0.9 for many. There was no clear difference in regression accuracy between the surface fouling and fouling volume models or in the models developed for the different fouling materials.
In general, the models developed for the cleaning experiments at the lower temperature had better model performance, but this is most likely because the fouling took a longer time to clean at the lower temperature, so more data were available to train the models.
It is difficult to directly compare these results with those from the literature, as the authors are not aware of previous research that has combined US measurements with neural networks to develop regression models that are capable of monitoring cleaning processes. However, previous work has used similar methods to monitor mixing progress with similar R values reported [45].
The model performance could be improved with more training data, which could be achieved by (1) performing more experiments, (2) increasing the frequency of data collection during the experiments, or (3) adding optical and ultrasonic sensors to the experimental rig. From an industrial perspective, large amounts of data would be collected during routine CIP cycles, allowing the development of highly accurate and reliable regression models. Indeed, for industrial implementation, datasets would need to be generated for each installation and fouling type to develop accurate monitoring capabilities, so performance would be expected to improve as data accumulate during operation. Figures 13 and 14 display the predicted and measured surface fouling and fouling volume for the three materials. In general, the predicted results agree well with, and follow the trend of, the measured results, highlighting the potential of the technique. There appears to be less error between the predicted and measured results for the fouling volume, whilst the predictions for tomato fouling have the most error, which could be attributed to less data being available to train the models. The analysis of the charts in Figures 13 and 14 highlights how different materials lead to different cleaning trends and different NN regression accuracy.
The main factor explaining the presence of outliers in these predictions is the number of available samples. The gravy absorbs water and requires a longer time to clean, resulting in a larger amount of data and therefore a more accurate regression. In contrast, the malt and tomato dissolve and therefore require a shorter cleaning time. Consequently, fewer samples are available and, although the regression is accurate overall, it shows a small number of outliers.
From a strictly computational point of view, this can be further explained by how the nature of the fouling material affects the cleaning trend. For the malt and tomato, the surface fouling plots show a two-phase cleaning trend, i.e., a "flat" phase and a "rapid descent" phase. As a result, the neural network is trained and tested with a large proportion of input signals that differ from one another but correspond to very similar target values, which lowers the regression performance.

Conclusions
This work has demonstrated how ultrasonic and optical sensors can be used to monitor surface fouling removal and therefore cleaning processes. Three different food fouling materials (tomato, gravy, and malt) were studied, and it was found that, although they were cleaned via different mechanisms, all of them could be effectively monitored with the sensing methods. The optical sensor provided spatial information on the area and volume of fouling in the region of interest, giving greater insight into the fouling removal process than the ultrasonic method. However, it would be extremely difficult to image the fouling effectively in industrial equipment such as pipework, so the ultrasonic method would be the more suitable industrial technology. Both sensors provide information on only a small area of the equipment, so care must be taken when deploying them in industrial environments to determine the most effective number of sensors and their precise locations. A larger number of sensors would provide more data but would incur additional costs.
The regression machine learning models were effective at predicting the area and volume of fouling present from the ultrasonic measurements. Specifically, the surface fouling was estimated with an average RMSE of 746.265 (from 156.967 to 1157.345) and an average R of 0.921 (from 0.832 to 0.9978), while the fouling volume was estimated with an average RMSE of 650.498 (from 112.961 to 1374.544) and an average R of 0.8927 (from 0.708 to 0.983). A detailed experimental validation would be required to establish a quantitative correspondence between pixel intensity and fouling thickness under the adopted experimental conditions (i.e., lighting and camera), which were finalized after determining the maximum detectable thickness.
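The summary statistics above (an average RMSE and R, each with a range across models) can be computed from per-model predictions along the following lines. This is a minimal sketch in Python with NumPy, assuming that R denotes the Pearson correlation coefficient between measured and predicted values and that each trained model contributes one pair of measured and predicted series; the function names and run labels are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - measured) ** 2)))


def pearson_r(measured, predicted):
    """Pearson correlation coefficient R (undefined, NumPy gives nan, if a series is constant)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.corrcoef(measured, predicted)[0, 1])


def summarise(runs):
    """Return (mean, min, max) of RMSE and R over a dict of {label: (measured, predicted)}."""
    rmses = [rmse(m, p) for m, p in runs.values()]
    rs = [pearson_r(m, p) for m, p in runs.values()]
    return {
        "rmse": (float(np.mean(rmses)), min(rmses), max(rmses)),
        "r": (float(np.mean(rs)), min(rs), max(rs)),
    }


if __name__ == "__main__":
    # Hypothetical labels, e.g. one model per material and water temperature.
    runs = {
        "model_a": ([100.0, 80.0, 40.0, 10.0], [95.0, 82.0, 45.0, 12.0]),
        "model_b": ([60.0, 55.0, 20.0, 0.0], [58.0, 50.0, 25.0, 3.0]),
    }
    print(summarise(runs))
```

The RMSE is expressed in the units of the target variable, so RMSE values for surface fouling and fouling volume are not directly comparable with each other, whereas R is dimensionless and can be compared across both targets.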
In industry, prediction accuracy would be improved by collecting larger training datasets, generated for each installation and fouling material.
In addition, outliers and missing data were manually removed in a preliminary data-screening phase. Future work will therefore include an automatic data-cleaning procedure to detect evident outliers and image artefacts that could affect the processing and NN performance and lead to misleading results.
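One possible starting point for the automatic screening of outliers is a Hampel identifier, which flags samples that deviate from their local median by more than a few robust standard deviations. The sketch below is an assumption for illustration, not the authors' procedure: the function name hampel_outliers and its default window and threshold are hypothetical, and it is intended for a one-dimensional time series such as a per-frame fouled-area or ultrasonic feature signal.

```python
import numpy as np


def hampel_outliers(x, half_window=3, n_sigmas=3.0):
    """Flag samples that deviate strongly from their local median (Hampel identifier).

    Returns a boolean array in which True marks a suspected outlier.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    flags = np.zeros(n, dtype=bool)
    k = 1.4826  # scales the MAD to a standard deviation for Gaussian noise
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = x[lo:hi]
        med = np.median(window)
        scale = k * np.median(np.abs(window - med))
        if scale > 0:
            flags[i] = abs(x[i] - med) > n_sigmas * scale
        else:
            # Most of the window equals the median exactly (e.g. a perfectly
            # flat segment), so any deviation from it is treated as suspect.
            flags[i] = x[i] != med
    return flags


if __name__ == "__main__":
    signal = np.array([10.0, 9.8, 9.9, 9.7, 30.0, 9.5, 9.4, 9.3, 9.2, 9.0])
    mask = hampel_outliers(signal)
    print(np.flatnonzero(mask), signal[~mask])
```

A local (rolling) median is used instead of a global one because the cleaning signals follow a decreasing trend; the window length and threshold would need to be tuned so that the genuine "rapid descent" phase observed for the malt and tomato is not mistaken for a set of outliers.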
The experimental limitations of the proposed monitoring systems include the number and small size of the regions of interest (ROI), i.e., a single ROI of roughly 6000 pixels, the limited range of water temperatures used for cleaning (12 °C and 45 °C), and the limited number of fouling materials studied. A further limitation is the use of a single-point measurement, which may not be sensitive to fouling elsewhere in the equipment.
A limitation to industrial implementation is the operational difficulty of properly positioning the US sensors within existing industrial equipment, especially where the pipes are fitted with heater jackets. Similar issues can be encountered in the image acquisition process, which is required to label the US data and must be carefully designed for implementation in existing facilities. Future research will aim to determine the exact number of sensors as well as their optimal positioning. Additionally, the cleaning performance of different types of fouling materials with various concentrations of fats, minerals, and proteins needs to be investigated. Finally, efforts will focus on integrating the sensing units with a control system.