1. Introduction
In aquatic environments such as intensive aquaculture systems, organic and inorganic matter, feed residues, and aquatic microorganisms accumulate [1]. This accumulation is associated with total suspended solids (TSS), defined as the mass of particulate matter present in a water column. Suspended particles scatter and absorb light, causing turbidity, i.e., a loss of transparency of the water [2]. TSS is an important parameter in determining water quality; in aquaculture, a high TSS level reduces the ability of fish to see and catch their feed. Mathematically, a TSS value can be correlated with turbidity [3], which quantifies the degree of loss of water transparency due to suspended solids. Therefore, the greater the amount of suspended solids in the liquid, the greater the turbidity [4].
For turbidity measurement, an established protocol is Method 180.1 of the U.S. EPA. Its measurement range is 0 to 40 NTU (nephelometric turbidity units); to measure higher values, the samples must be diluted in water and the reading rescaled. There are also several turbidimeter arrangements that use different light sources and detectors. However, none of them can be used as a low-cost alternative for monitoring water quality rapidly and noninvasively over a wide dynamic range. For example, some low-cost turbidimeters operate over short ranges: 2.2–54.2 NTU [5], 0–10 NTU [6], 0–12 NTU [7], and 0–100 NTU [8,9,10]. Furthermore, these methods do not allow the estimation of water turbidity from a single data record. Advanced turbidimeters with a wide operating range exist, but they are expensive and require two data records for turbidity estimation [6,11].
By contrast, recent advances in computer vision, software development, and other technologies are being applied to overcome some measurement limitations of conventional turbidimeters (narrow operating range, high cost, etc.). Recent turbidimeter developments focus on image analysis. For example, Gimenez et al. developed a turbidimeter based on image degradation analysis; however, their method requires reference samples and additional sample treatment (sonication) [12]. Gu et al. applied a random forest ensemble to space remote sensing data to measure river turbidity from hyperspectral images, obtaining a precision of 67% [13]. Mullins et al. carried out turbidity measurements in the range 10–250 NTU using image processing methods, analyzing image by image during the measurement process and reaching a precision of 90% under controlled environmental conditions [14]. In addition, with the advent of smartphones, new turbidimeters have been reported. Bayram et al. performed turbidity measurements using a smartphone; however, the precision between samples measured with a calibrated Hach colorimeter and their smartphone colorimeter was 48% [15]. Leeuw and Boss also measured turbidity with a smartphone by remotely detecting water reflectance under ambient environmental conditions, reaching a precision of 74% [16].
In this study, we measure turbidity and suspended solids using a convolutional neural network (CNN), which could serve as a low-cost alternative for monitoring water quality rapidly and noninvasively over a wide dynamic range. Our research proposes the use of this emerging technology, together with open-source tools, to develop a technique capable of measuring TSS and turbidity values simultaneously from a single image of the liquid sample to be measured. CNNs are inspired by the workings of the human brain and are able to analyze raw data without human intervention. They have been applied successfully in areas such as image classification [17,18], image segmentation [19], roughness measurement [20,21], and soft sensors [22,23,24]. Additionally, CNNs have gained popularity due to their ability to approximate any continuous function [25,26,27], so they could achieve better precision in the turbidity task than other machine learning methods such as extra trees, multilayer perceptron, naive Bayes, random forest, and support vector machine [13,28].
This research describes a soft sensor [22] model for dynamic processes based on a convolutional neural network, whose main feature is the measurement of TSS and turbidity values from a single image. This image is registered by a conventional smartphone (Android or iOS system). The measurements, acquired noninvasively, have high precision and a wide dynamic range suitable for aquatic environments.
The rest of this paper is organized as follows.
Section 2 describes the CNN architecture and the experiments;
Section 3 presents the results and discussion. Finally,
Section 4 reports some conclusions and discusses future work.
2. Materials and Methods
2.1. Proposed Classification CNN
This paper is based on the principles of artificial intelligence, more specifically on convolutional neural networks (CNNs): operating blocks and several layers of neurons that work together to mimic the functioning of the visual cortex of mammals. In an image classification task, a CNN first performs a feature extraction step on the input image; the extracted features are then passed to a neural network, and the output is the probability that the input image belongs to each category [29].
In practice, few people train a CNN from scratch, because it is rare to have a sufficiently large dataset; it is more efficient to take advantage of transfer learning, in which a CNN model previously trained on one task (e.g., the 1000-class ImageNet dataset) is reused and retrained to learn a new task without the need for a large database.
Figure 1 shows the transfer learning process carried out on our CNN model, whose classification task is changed.
A simple and elementary CNN model with powerful modeling capability is AlexNet [30]. The AlexNet architecture is easy to train and optimize, and has a proven ability to classify and recognize simple images with low visual complexity, such as low-resolution images [31]. For these reasons, only AlexNet was trained in this research, using an image dataset of liquid samples with different amounts of suspended solids. The RGB input image size for AlexNet is 224 × 224 pixels [32]. The AlexNet model used is shown in Figure 2.
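As an illustration, the transfer learning step can be sketched in a few lines with the Torchvision package mentioned in Section 2.5. The class count below is an assumption (one class per concentration level in Table 1) and is not taken from the original text.

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 9  # hypothetical: one class per suspended-solids concentration level

# Load AlexNet pre-trained on ImageNet (the transfer learning starting point).
model = models.alexnet(pretrained=True)

# Replace the last fully connected layer (originally 1000 ImageNet classes)
# so the classifier outputs one logit per liquid-sample class.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_CLASSES)
```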
2.2. Proposed Estimator Based on Multiple Linear Regression (MLR)
After training, we have a model capable of classifying the images in the dataset. The output vector of the trained model can be treated as a decoded version of the input image, since the model extracts implicit information from the liquid sample. Although the CNN model classifies discrete liquid-sample classes, it cannot classify images with intermediate values of suspended solids. Nevertheless, if the feature vectors (CNN output vectors) are used to fit an MLR, we can predict the TSS and turbidity values of any liquid sample. The main requirement is to train the CNN with a set of classes that covers the target dynamic range.
A convolutional neural network has two stages: the first is feature extraction, and the second is classification, which is built from layers of neurons. A single neuron multiplies its inputs (features) by weights and adds a bias term to compute a logit, a real value in the range (−∞, +∞), which is then passed through an activation function to obtain the neuron's output. A logit vector is therefore obtained by collecting the real values computed by a group of neurons. In CNN models, to obtain the class probabilities, the logit values of the last layer are fed into a softmax function, which generates a normalized vector of probabilities (0 to 1) with one value per predicted class.
The MLR is then fitted with the logit vector obtained from the last layer, and a linear equation is created to approximate new, unknown values.
Figure 3 shows how the MLR is fitted using the logit vector of the training dataset, and the general operation of the proposed method. When the CNN input is a new image (unknown sample), a new logit vector is created, and used in the MLR to obtain the estimated value. Hence, in this research, class probabilities are used to predict the TSS and turbidity values in a liquid sample.
Furthermore, the CNN and the MLR were trained separately. The CNN was trained with the cross-entropy loss function, so a simple classification task was carried out. The trained CNN was then used to obtain the output vector for each input image, and the fitted MLR was used to obtain the approximated value according to the reference values (obtained in the training process) used to fit the MLR. In other words, the MLR was used to predict the value of one variable based on the value of another. We then obtained the TSS and turbidity values based on the features found by the CNN.
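A minimal sketch of this two-stage procedure is shown below. It assumes a trained `model` and a hypothetical `regression_loader` that yields image batches together with their reference TSS and turbidity values; scikit-learn's LinearRegression is used here as one possible way to fit the MLR.

```python
import numpy as np
import torch
from sklearn.linear_model import LinearRegression

def collect_logits(model, loader, device="cpu"):
    """Pass every training image through the CNN and keep its logit vector."""
    model.eval()
    logits, tss, turb = [], [], []
    with torch.no_grad():
        for images, y_tss, y_turb in loader:
            out = model(images.to(device))      # raw logits of the last layer
            logits.append(out.cpu().numpy())
            tss.append(y_tss.numpy())
            turb.append(y_turb.numpy())
    return np.vstack(logits), np.concatenate(tss), np.concatenate(turb)

X, y_tss, y_turb = collect_logits(model, regression_loader)

# One multiple linear regression per target quantity.
mlr_tss = LinearRegression().fit(X, y_tss)
mlr_turbidity = LinearRegression().fit(X, y_turb)

# For an unknown sample: CNN output vector -> fitted MLR -> estimated value.
# estimated_tss = mlr_tss.predict(model(new_image).detach().cpu().numpy())
```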
2.3. CNN Validation as Classifier
One of the main advantages of using a CNN is that it automatically learns the most relevant features of an input image without human supervision. By viewing the convolutional feature maps of an image, we can see which regions of the image the CNN attends to in order to perform the classification. In this research, several sets of feature maps were analyzed to confirm that the CNN detects image changes due to the particles suspended in the liquid samples. This confirms that the CNN is responding only to differences in suspended solids and ignoring other factors such as optical aberrations, spurious radiation, mismatch compensated pulse effects, etc.
Figure 4 shows a set of convolutional feature maps extracted from the stacked results in each convolutional layer.
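As an illustration of how such feature maps can be extracted from the trained model, the following sketch registers a forward hook on each convolutional layer of the Torchvision AlexNet; `image` is assumed to be a preprocessed 1 × 3 × 224 × 224 tensor of a liquid sample.

```python
import torch

feature_maps = {}

def save_maps(name):
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()
    return hook

# Register a hook on every convolutional layer of the feature extractor.
for idx, layer in enumerate(model.features):
    if isinstance(layer, torch.nn.Conv2d):
        layer.register_forward_hook(save_maps(f"conv_{idx}"))

with torch.no_grad():
    model(image)

# Each entry is a stack of activation maps (one per filter) for that layer,
# which can be plotted to inspect the regions the CNN focuses on.
for name, maps in feature_maps.items():
    print(name, tuple(maps.shape))
```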
2.4. Samples
Two samples of commercial interest were used: fish feed and paprika. The study was mainly based on fish feed; to validate our results, paprika (with 2 extra classes) was also tested. Commercial fish feed was used as the suspended solid, given that fish feed is one of the components that accumulates the most in intensive aquaculture systems [33,34]. Additionally, because fish feed for tilapia is designed to remain in suspension [35], it is easy to dose. Twenty liquid samples were prepared by mixing one liter of distilled water with each mass (g) of fish feed or paprika shown in Table 1. The masses were weighed on a Denver Instruments PI-214 high-resolution balance (four decimal places). The sample concentration ranged from 0 to 0.8 g/L, chosen because this is the operating range of most commercially available turbidimeters and/or suspended solids meters, such as HANNA and Hach instruments. The concentrations (fish feed and paprika mass) used for each class are shown in Table 1.
2.5. Experimental Setup
A transparent container with a cubic shape of approximately 5 × 5 × 5 cm was filled with the liquid sample. A magnetic mixer was used to keep all the particles suspended, preventing them from settling and allowing the suspended particles to be recorded by the camera. The mixer speed was 60 rpm during image registration (images for the training and validation datasets). The experimental setup is shown in Figure 5. The illumination distance was selected to prevent shadows on the liquid sample. The camera distance was selected to avoid imaging the edges of the container, so that the model would not learn the morphology of the container during CNN training. The experiment was carried out in a dark room. The background was constant throughout the experiment; therefore, the CNN learned the information from the samples for each color and did not take information from the background. This is also corroborated in Figure 4, where the activations of the artificial neurons are concentrated on the regions of the sample from which the CNN takes the information to classify the suspended solids. The setup used an RGB LED lamp; red, green, blue, and white illumination were used. Liquid samples illuminated with the different colors are shown in Figure 6. The illumination of all colors was kept constant at an irradiance of 0.852. The spectral power distributions of the illumination used in this study are shown in Figure 7; they were measured with a spectrophotometer (USB2000, Ocean Optics, Orlando, FL, USA). The experimental setup was matched to the neural network by keeping constant all parameters (smartphone camera mode, LED lamp intensity, etc.) that were not under measurement. The neural network learned to identify the changing image characteristics; since its training included samples with only distilled water (class 0), and the only parameter that varied was the suspended solids, the effects of spurious light and other sources of noise were minimized.
To create and record the liquid-sample dataset, the rear-facing cameras of a Huawei Mate 20 Lite and an iPhone 6 were used. A total of 88,000 images were recorded: 39,600 images of the fish feed samples with the Android smartphone and 48,400 images of the paprika samples with the iOS smartphone. The images recorded by the smartphones were 2448 × 2448 pixels; however, they were center-cropped to 224 × 224 pixels, according to the input layer of the CNN. The smartphone cameras were used in manual mode with a capture speed of 20 FPS (frames per second) in burst mode. The ISO level and the focus (set on the container wall) were kept constant during the experiment. Of the 88,000 recorded images, 80,000 were randomly selected to create the training dataset. The remaining 8000 images were used to build the validation dataset; i.e., 90% of the dataset was used for training and the remaining 10% for validation, to test the operation of the CNN. The validation images were not used in the training process.
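A minimal sketch of this preprocessing and split, assuming the recorded frames are stored in one folder per class under a hypothetical `dataset_root` directory:

```python
import torch
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.CenterCrop(224),   # keep only the central 224 x 224 region
    transforms.ToTensor(),
])

full_set = datasets.ImageFolder("dataset_root", transform=preprocess)

# 90% / 10% random split into training and validation subsets.
n_train = int(0.9 * len(full_set))
train_set, val_set = torch.utils.data.random_split(
    full_set, [n_train, len(full_set) - n_train])

train_loader = torch.utils.data.DataLoader(train_set, batch_size=50, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=50)
```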
The training process was developed and implemented using Google Colaboratory, which is a free cloud service for machine learning education and provides a Python notebook (Jupyter) environment running in a dedicated virtual machine on an Nvidia Tesla K80 GPU with 2496 CUDA cores. The AlexNet model (CNN) was taken from the Torchvision package, which offers some popular pre-trained models and other image processing tools.
In artificial intelligence, the process of finding the “best” or “optimal” parameters for a CNN model is called optimization. The classic optimization method is stochastic gradient descent (SGD), a simple procedure that iteratively adjusts the parameters toward the values that yield the lowest possible error (loss) on the training dataset. Although newer and more powerful optimization algorithms exist, SGD provides consistency in the overall training process and results.
One of the most important hyperparameters is the “learning rate”, which scales the size of the update step taken along the gradient of the loss function. Therefore, the learning rate controls how much the model changes its parameters each time it updates them based on the model error. A high learning rate makes the model change its parameters quickly, while a low learning rate makes it change them slowly. The best option is a learning rate value that makes the error decrease steadily (not too quickly), finding the minimum error in the fewest number of epochs. An “epoch” is another relevant hyperparameter; it refers to the number of times the entire training dataset is passed through the CNN model. However, a model is trained in batches: within a single training epoch, the “batch size” is the amount of data processed by the CNN before the model parameters are updated, repeated until the epoch is complete. Larger batches allow for more computational parallelism and can often lead to better performance.
However, larger batches also require more memory and can cause latency when passed to the training function. Finally, the hyperparameter “momentum” is used to accelerate gradient descent by incorporating a fraction of the previous gradient into the current update.
For CNN training, the algorithm executed a total of 75 training epochs. The number of epochs was selected by analyzing the training loss in previous executions of the training process. The CNN was trained using the stochastic gradient descent algorithm with a momentum of 0.9 and a batch size of 50. The other hyperparameters used in the experiment are listed in
Table 2.
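A compact sketch of this training configuration in PyTorch, assuming the `model` and `train_loader` objects from the previous sketches; the learning rate value shown is a placeholder for the one reported in Table 2.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for epoch in range(75):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # cross-entropy classification loss
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch + 1}: loss = {running_loss / len(train_loader):.4f}")
```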
The hyperparameters were selected based on the image classification examples provided in the PyTorch documentation. In addition, several training runs were carried out to minimize errors, until a high accuracy of classification (100%) and estimation (98.24% and 97.20% for TSS and turbidity, respectively) was reached.
2.6. Performance Metrics
The performance of the proposed method was evaluated in two stages.
2.6.1. Performance Metrics for Classifier Evaluation
A confusion matrix is used to analyze the performance of a classification tool [36,37,38]. Four terms make up a confusion matrix, describing the following cases: elements that were predicted to belong to a class and actually belong to that class are true positives (TP). Similarly, elements that were predicted not to belong to a class and indeed do not belong to it are true negatives (TN). On the other hand, elements that were predicted to belong to a class but do not really belong to it are false positives (FP). Finally, elements that were predicted not to belong to a class but actually belong to it are false negatives (FN). The elements of the confusion matrix are used to calculate the following performance metrics for the evaluation of the classifier:
Accuracy is the percentage of the total number of predictions that were classified correctly and is calculated as:
Accuracy = (TP + TN)/N,
where N is the total number of elements to classify.
Precision is the ability of the classifier to predict a sample according to what it really is, and is defined as:
Precision = TP/(TP + FP).
Recall is the ability of the classifier to find all the positive samples, in other words, how many positive cases were correctly labeled. It can be written as:
Recall = TP/(TP + FN).
Similar to Recall, Specificity is the ability of the classifier to find all the negative samples, and is defined as:
Specificity = TN/(TN + FP).
F-Score is the harmonic mean of Precision and Recall, and provides a notion of how precise the classifier is. A high F-Score value indicates that the model performs well on positive cases. It is calculated as:
F-Score = 2 · (Precision · Recall)/(Precision + Recall).
The receiver operating characteristic (ROC) curve is a plot of the true positive rate (Recall) versus the false positive rate (1 − Specificity). This curve characterizes the ability of the CNN to identify positive cases as positive and negative cases as negative. Meanwhile, the area under the ROC curve (AUC) is the probability that a randomly chosen pair of positive and negative cases will be classified correctly.
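The following sketch illustrates, with scikit-learn, how these per-class metrics can be obtained from a multi-class confusion matrix; it is a generic illustration rather than the evaluation script used in this work.

```python
from sklearn.metrics import confusion_matrix

def classifier_metrics(y_true, y_pred):
    """Per-class accuracy, precision, recall, specificity, and F-score."""
    cm = confusion_matrix(y_true, y_pred)
    N = cm.sum()
    results = {}
    for c in range(cm.shape[0]):
        TP = cm[c, c]
        FP = cm[:, c].sum() - TP
        FN = cm[c, :].sum() - TP
        TN = N - TP - FP - FN
        precision = TP / (TP + FP) if TP + FP else 0.0
        recall = TP / (TP + FN) if TP + FN else 0.0
        results[c] = {
            "accuracy": (TP + TN) / N,
            "precision": precision,
            "recall": recall,
            "specificity": TN / (TN + FP) if TN + FP else 0.0,
            "f_score": (2 * precision * recall / (precision + recall)
                        if precision + recall else 0.0),
        }
    return results
```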
2.6.2. Performance Metrics for MLR Evaluation
The following metrics were used to evaluate the performance of the MLR, whose task is to estimate the correct measured value.
The coefficient of determination (R2) is a statistical measure of the goodness of fit or reliability of the model with respect to the data. This coefficient determines the ability of the model to replicate the results. The values of R2 are between 0 and 1: zero implies that there is no linear relationship, and a value of one means that there is a perfect linear relationship. The coefficient of determination is calculated as:
R2 = 1 − Σ(y_true − y_predicted)² / Σ(y_true − y_mean)²,
where y_predicted is the predicted value, y_true is the true value, and y_mean is the average of the y data.
Mean absolute error (MAE) is an evaluation metric used in regression models. It is the mean of the absolute differences between the original values (y_true) and the predicted values (y_predicted). Mathematically, it is described as:
MAE = (1/n) Σ |y_true − y_predicted|,
where n is the number of observations.
Mean square error (MSE) is the mean of the squared differences between the original values and the predicted values; the higher this value, the worse the model. Mathematically, it is represented as:
MSE = (1/n) Σ (y_true − y_predicted)².
3. Results and Discussion
This section shows the results of the evaluation of the CNN's performance metrics. The CNN model was independently trained four times; for each training, a different color dataset (white, red, green, or blue) was used. In addition, the efficiency of the MLR in estimating the TSS and turbidity values was evaluated. The validation dataset consisted of nine classes for each color, with 100 images per class. For the CNN evaluation, a confusion matrix was used in which the rows correspond to the true labels and the columns to the labels predicted by the CNN. Diagonal cells correspond to correctly classified observations; a perfect classification is reached when each diagonal cell contains 100. The confusion matrix obtained at the end of the training process for the four color datasets was the same for fish feed and paprika (without the extra classes) and is shown in
Figure 8. When the extra classes are included in the confusion matrix, each diagonal cell also contains 100. The success of this classification may be due both to the size of the dataset (which is relatively large) and to the low complexity of the classification objects. In addition, since the MLR evaluation metrics showed a percentage of error, we can be confident that the CNN did not memorize the dataset and was not overfitted. The trained CNN for each color dataset reached the maximum score on its performance evaluation metrics for accuracy, precision, recall, F-score, and ROC (see
Figure 9). The performance obtained for the trained CNN for each color dataset is shown in
Table 3.
Although the trained CNN achieved the same high score on the CNN performance metrics for each color dataset, it should be noted that the training time was different for each color dataset. The best training time was reached using the white light dataset. On the other hand, the worst training time was obtained using the green dataset. The difference between the CNNs trained with the white and green databases was 266% in training time. The CNN trained with the red dataset was the second-best model, which reached the maximum score in almost double the amount of time in comparison to the white one. These differences could be attributed to the fact that the CNN creates individual feature maps for each RGB color channel [
39]. Therefore, white light, which combines the three color channels, could generate more detailed feature maps, which could allow the CNN to classify all classes more effectively. Training time is a parameter that indicates which color dataset is best categorized by the CNN. Once the CNN has been trained, it calculates the TSS and turbidity values of a new input image in fractions of a second. The accuracy and loss curves of the training process are shown in
Figure 10. It is easy to see that the blue classifier (the CNN model trained with the blue dataset) was the first to achieve a high accuracy score; however, its training process started with a low accuracy of 42% in its first epoch, and it only reached 100% accuracy at the 15th epoch. Another important aspect concerns the green classifier, which started with the lowest accuracy score of 12% and reached a high accuracy value at the 12th epoch, but continued its training process with some fluctuations. The red classifier started with a low accuracy of 45% in its first epoch and reached 100% accuracy at the 12th epoch. Meanwhile, the white classifier obtained an accuracy of 90% in its first epoch and reached 100% in its seventh epoch.
Differences between the RGB colors in terms of accuracy and loss curves can be related to the spectral characteristics of the light. In particular, the fish feed sample is brown, and its spectrum has more red-light content, so the red channel of the image may carry more information, which may improve the red-channel analysis. Regarding the green and blue channels, the peak wavelength of the green LED illumination is displaced by about 40 nm from the peak wavelength of the green-pixel responsivity of the CMOS camera, while the peak wavelength of the blue LED lamp is displaced by about 10 nm. This spectral mismatch may reduce the information in the green channel compared with the blue and red channels. The situation is similar for the paprika samples, whose spectrum also has more red-light content.
MLR performance measures were calculated and the results obtained for the different color datasets are shown in
Table 4. When looking at the results in
Table 4, it is observed that the MLR for the white dataset had the highest value of the coefficient of determination (R2) and, therefore, the highest quality of the model in replicating the results. In addition, the MLR with the white dataset had the lowest MAE and MSE values. Therefore, the best MLR performance for the estimation task was achieved with the white dataset, unlike the green-illuminated samples, which had the worst MLR performance. The study was mainly focused on fish feed and, to validate our results, paprika was also tested. For both samples, the CNN + MLR achieved the performance listed in Table 4. The liquid samples created in this research have a TSS range of 0–800 mg/L. The TSS values estimated by the CNN + MLR are shown in Table 5 for the different color datasets.
Performance metrics shown in
Table 4 and
Table 5 indicate that the best illumination for the proposed method was white light for the fish feed sample: white showed an error of 2.53% compared to 3.16% for red, 4.16% for blue, and 9.57% for green. The proposed method with the white dataset had an operational range of 0 to 0.8 g/L and a high goodness of fit (R2 = 0.99). Therefore, the method with the white dataset obtained the lowest error of ±2.53% and a general standard deviation of ±0.018, which implies an accuracy of 97.46%. These are the measurement (observational) errors, calculated using the values of the reference samples listed in Table 1 (taken as the true values). In general, the results indicate that the measuring precision is reasonably good for the fish feed sample. However, the two smallest concentrations (classes 0 and 1) had lower precision. To address this problem and test repeatability, we repeated and expanded the measurement with a different sample, paprika, and a smartphone with a different operating system (iPhone 6, iOS).
In order to improve and validate our results, we analyzed a set of liquid samples made with paprika. This additional set incorporated two new classes (1/2 and 3/4) into the CNN training process, which significantly reduced the measuring error of classes 0 and 1. Additionally, Table 5 shows the TSS values for the paprika samples under each color illumination dataset. With the new classes trained, the CNN + MLR improved its accuracy to 98.24% for the TSS values with the white dataset.
It should be highlighted that TSS and turbidity are correlated through a coefficient of proportionality that relates them linearly. This coefficient of proportionality between TSS and turbidity depends on the geometric and optical properties of the suspended solids (i.e., size, shape, refractive index, mass density) [40,41,42]. In other words, the fish feed and paprika samples have the same TSS concentrations but different turbidity values, since they have different coefficients of proportionality.
In order to estimate the turbidity values of the liquid samples, the system was trained and validated using reference values measured with another instrument. These turbidity reference values were measured with a HACH DR900 colorimeter within the operating range 0–263 NTU and are shown in Table 6. The reference measurements were replicated six times to obtain the standard deviation of the device. Note that for measurements near 200 NTU with the HACH DR900, the standard deviations increased, because the instrument is adjusted with a calibration curve based on the reading obtained with the 200 NTU formazin standard. Additionally, according to the user manual, the instrument error is ±21 NTU [43].
Table 6 shows the TSS and turbidity reference values for the fish feed and paprika samples. The turbidity values estimated by our method are shown in
Table 7 for the different color datasets. The white dataset showed the lowest standard deviation values among all color datasets: it had a maximum standard deviation of ±13.68 NTU for the fifth class, which is lower than the standard deviation of the instrument used as reference. In addition, for the fish feed samples, the white color presented an error of 9.84% compared to 11.64% for red, 12.55% for blue, and 42.50% for green. Furthermore, the 0–263 NTU range is appropriate for aquatic environments, as the safe turbidity level for aquatic life should not exceed 25 NTU [44]. For the turbidity measurement, the proposed method had a standard deviation of ±6.98 NTU and an accuracy of 90.16% with the white dataset, which was the best color dataset for the fish feed samples. Additionally,
Table 7 shows the turbidity values estimated by our method for paprika samples, and it can be noted that by training the new classes, the CNN + MLR improved its accuracy to 97.20% for turbidity values using the white dataset. In order to test our method, eight new samples with paprika were prepared with fractional concentrations of TSS. To validate the proposed method, these samples were estimated without training the CNN or the MLR for these TSS values. The TSS values estimated by the CNN + MLR are shown in
Table 8, with an accuracy of 96.88% for the white dataset. The turbidity values estimated by the proposed method are shown in
Table 9, with an accuracy of 96.14% for the white dataset. The CNN + MLR were validated for extra concentration samples.
The results showed that the proposed method can be improved by including more classes with small concentrations in the CNN training. This can be seen by comparing the test results for the fish feed and paprika samples: for example, with white light illumination, the measurement error is reduced from 2.53% to 1.76% for TSS estimation, and from 9.84% to 2.79% for turbidity estimation. These results provide evidence of the effectiveness of the proposed method and indicate high resolution and accuracy. Nevertheless, despite its high performance, the method's limitations should be recognized. Among these is the type of samples that can be measured: the proposed method was tested with commercial fish feed and paprika as suspended solids, which do not represent all types of suspended solids. In other aquatic environments, such as river water, domestic and industrial wastewater, and drinking water, the particle sizes may be smaller than those of the fish feed or paprika. However, as the CNN analyzes the images in depth, with the high-level convolutional layers capturing all the differences between the images, its potential for application to other types of suspended solids is promising. This is because the CNN image analysis differentiates not only particle distribution, but also contrast, brightness, and color. Further research should therefore explore other types of samples, such as river water, domestic and industrial wastewater, and potable water. In addition, this research is expected to evolve into a prototype consisting of an enclosure box with LED lighting included, in which the sample and the smartphone would be placed in fixed positions to perform the measurement. In future work, it could be tested whether the LED flash (white light) of the smartphone could be used, since the white dataset presented the best performance.