Abstract
This study presents a methodology for preliminary assessment of wastewater pollutant concentrations using RGB color intensity measurements combined with a multi-head neural network implemented in PyTorch. The model was trained on 120 samples from domestic, industrial, and synthetic sources to predict total pollutant concentration and identify the dominant RGB color. Key performance metrics include MAE = 0.012, RMSE = 0.018, MAPE = 2.1%, R2 = 0.96 for concentration prediction, and classification accuracy = 0.93 for dominant color identification. Observed color changes, primarily the rapid decrease of the red component, correlate with chemical and organic load, enabling visual differentiation of wastewater types. This approach provides a fast, low-cost, and equipment-free method for preliminary monitoring and has potential for real-time environmental applications. The proposed combination of RGB analysis with a multi-head neural network represents a novel approach for rapid, quantitative, and qualitative wastewater assessment.
1. Introduction
As the world’s population, industry, and technology grow rapidly, the amount of waste of all types is increasing, from organic to medical to hazardous. A large part of this waste is liquid and is usually referred to as wastewater. Liquid waste poses a major environmental challenge. According to the United Nations, about 80% of global wastewater is discharged without treatment or with minimal processing. This means that contaminated water is returned to ecosystems on a massive scale, posing a threat to human health, fauna and flora [,,,,]. The scale of liquid waste is enormous: for example, in the United States, the industrial sector generates more than 22 billion gallons (about 83 billion liters) of wastewater annually. The largest sources of pollution are various industries. The chemical and oil refining industries produce large volumes of hazardous wastewater, including acids, solvents, and so-called “produced water”, which is highly saline, chemically contaminated water generated during oil and gas extraction. The food and beverage industry consumes about 20% of all industrial water in the world, which also means a large amount of wastewater contaminated with organic substances and fats [,,,,,]. The textile sector is one of the most damaging to aquatic ecosystems: producing one kilogram of fabric consumes about 200 L of water, and the resulting wastewater is saturated with dyes, heavy metals, and other harmful compounds. The scale of the laundry industry is also significant: in the USA alone, more than 2500 industrial laundries are in operation, generating about 5.11 km³ of wastewater per year [,,,,]. In the healthcare sector, an additional threat is posed by medical waste, which includes pharmaceuticals, infectious agents and laboratory chemicals. These examples show that liquid waste is not only domestic wastewater but also the result of a wide range of industrial activities. 
Since only a quarter of waste is properly managed globally, the problem of liquid waste is becoming one of the most important environmental issues. Modern technologies, stricter regulation and international cooperation are necessary for its treatment and recycling, otherwise contaminated water will continue to have a detrimental effect on human health and the entire ecosystem of the planet [,,,].
One of the essential aspects of liquid waste management is the assessment of its concentration. Liquid waste is characterized by great complexity, as it may contain various organic and inorganic compounds, heavy metals, pharmaceutical substances or microbiological contaminants. For example, industrial liquid waste from the textile and chemical industries is often saturated with phenols, formaldehyde, ammonia, nitrogen compounds and sulfur dioxide residues. Residues of pharmaceutical preparations are found in wastewater from the healthcare sector, including antibiotics (e.g., ciprofloxacin), hormonal substances, etc. [,,,,,,,]. Wastewater from the food industry is characterized by high biochemical oxygen demand (BOD) due to the sugars, fats and proteins it contains. Because of this chemical diversity, determining the concentration of wastewater is necessary not only to select the most appropriate treatment technologies but also to ensure that pollutants do not exceed environmental standards [,,,]. Various physical, chemical and biological methods are used to determine the composition and concentration of liquid waste. The main indicators, such as biochemical oxygen demand (BOD), chemical oxygen demand (COD) and total dissolved solids (TDS), are determined by titration or spectrophotometric methods. Gas chromatography (GC) and high-performance liquid chromatography (HPLC) are used to analyze organic pollutants (e.g., pesticides, phenols, solvents). These analyses require specialized equipment and qualified specialists and can take from several hours to several days, making the process both expensive and time-consuming [,,,,,,,,]. If the liquid waste is of a single origin and its chemical composition is known in advance, the concentration can be roughly estimated based on the intensity of the color. While such colorimetric estimation can provide a rapid and low-cost preliminary assessment, its scientific validity depends on empirical calibration. 
The relationship between pollutant concentration and color intensity must be established specifically for the wastewater type under investigation. Factors such as turbidity, pH, temperature, and the presence of multiple colored substances can significantly influence RGB measurements, potentially leading to inaccurate estimations. Therefore, this approach should be used only as an approximate indicator, calibrated against laboratory reference measurements, rather than as a replacement for comprehensive chemical analyses. For example, if the solution is brightly colored (e.g., brown or red) and the color is evenly distributed throughout the volume, its concentration can be determined visually or using simpler colorimetric methods [,,,,,,,]. In this case, calibration curves are constructed in advance, allowing one to predict the percentage concentration of pollutants based on the color saturation. This method is not very accurate, but it allows one to obtain quick results and avoid expensive and time-consuming laboratory tests when a preliminary assessment of contamination is sufficient. Trained neural networks can be adapted to such a task, e.g., using the PyTorch platform. The analysis can be performed numerically, using color intensity measurements expressed as numerical values. Instead of visual assessment, RGB or spectrophotometric signals can be measured, and these data are converted into numerical weights or intensity indicators. Then, knowing the specific relationship between the color of the pollutant and its concentration, it is possible to approximately determine the concentration based on these numerical measurements. The combination of digital color intensity estimation with AI/ML algorithms for concentration prediction is a relatively new and rapidly developing scientific direction. Recent studies have demonstrated the potential of smartphone-based colorimetry for assessing water quality and environmental parameters. 
Several works have proposed RGB-based models for detecting water contaminants, colorimetric ion sensing, and reflectance measurement, emphasizing the need for consistent color calibration and illumination control. Other research has addressed color standardization and color-constancy algorithms to reduce inter-device variability [,,,,,,]. However, existing approaches still lack laboratory-verified ground truth and cross-device validation, which this study addresses through lab-standardized RGB data and multi-device calibration. Prior studies used RGB colorimetry and AI for water quality analysis. This work introduces a multi-output neural network that predicts both pollutant concentration and dominant color. Unlike earlier works, our model is directly calibrated against laboratory-measured chemical parameters (BOD, COD, TOC, TSS), ensuring that predictions have a physical and chemical basis rather than relying solely on empirical color correlations. Additionally, the methodology is designed for rapid, equipment-free preliminary assessment of wastewater and is adaptable for real-time applications, including mobile devices, which has not been addressed in prior studies. This study aims to develop a method for estimating pollutant levels in liquid waste using RGB color values and AI/ML prediction algorithms.
2. Materials and Methods
2.1. Data Collection and Preparation
In order to ensure scientific validity and take into account the physical aspects of the relationship between color and pollutant concentration, RGB analysis was applied in this study only to samples in which the visible color is determined by chromogenic substances or specific chemical reactions. Most pollutants are colorless; therefore, RGB values do not always reflect the total concentration. In order to reliably assess the color–concentration relationship, optical density (OD) measurements were additionally performed with a spectrophotometer, selecting wavelengths corresponding to the absorption of the dominant-colored components. Experiments have shown that a monotonic and approximately linear relationship between RGB intensities, optical density, and pollutant concentration exists only in certain intervals, and at higher concentrations, color changes become nonlinear due to light scattering and particle turbidity. Therefore, this methodology provides a reliable preliminary concentration prediction only in defined, colored water samples, and for a wider spectrum of pollutants, it is recommended to combine RGB analysis with additional optical and chemical measurements. Following this, a dataset of 120 wastewater samples was collected from five representative sources: domestic sewage, food industry effluents, textile manufacturing discharges, metallurgical and chemical industry wastewater, and laboratory-prepared synthetic samples used as a control group. Sampling followed the ISO 5667-10:2020 [] guidelines, using preserved and refrigerated composite samples (acidified to pH < 2 with H2SO4 and stored at 4 °C). 
Each sample was analyzed in a laboratory to obtain ground-truth reference parameters: biochemical oxygen demand (BOD5, ISO 5815-1:2019 []), chemical oxygen demand (COD, ISO 15705:2022 []), total organic carbon (TOC, ISO 8245:1999 []), total suspended solids (TSS, ISO 11923:1997 []), and color intensity according to the platinum-cobalt (Pt–Co) scale (ISO 7887:2011 []). All assays were performed in triplicate (n = 3), with mean values reported and measurement uncertainty within ±2%. In parallel with chemical testing, RGB color data were recorded under standardized conditions: D65-spectrum illumination (5500 K), fixed camera and light geometry, and uniform exposure settings (exposure 1/125 s, ISO 100, calibrated white balance). Color calibration was also performed using a white point reference plate, ensuring that RGB values were independent of camera characteristics and environmental conditions. The length of the sample cuvette was standardized, and background and ambient reflections were controlled to avoid unnecessary artifacts. This approach allows the RGB data to be used directly as a reliable input to the neural network and ensures that the model predictions are more stable and reproducible under different measurement conditions. The resulting dataset, which was used for model training, contained the following main variables: R, G, and B intensities; combined pollutant concentration; and COD, BOD5, TOC, TSS, and Pt–Co color values. Data were randomly split into training (80%) and testing (20%) subsets, maintaining proportional representation from each wastewater source. Data preprocessing involved normalizing RGB values to a range of [0, 1] by dividing by 255 and normalizing the total concentration to [0, 1] by dividing by 100. 
Additionally, a color classification label (R, G, or B) was derived based on the dominant RGB channel, determined by the highest intensity value among the three channels. To ensure that the predictions of the RGB components had a physical and chemical basis, independent laboratory measurements were performed for each predicted component (R, G, B) to determine the concentration of the relevant chemical parameters (e.g., nitrogen, phosphorus, organic matter). Based on these measurements, a neural network architecture was adapted as a multi-output system, allowing direct prediction of the concentration of each chemical component together with the total pollutant level. In this way, the allocation of the RGB components into cR, cG and cB is now based on an empirical laboratory basis, rather than on color proportions alone, and provides reliable information on the concentration of individual chemicals. In addition, a blind validation set of 10 new samples was analyzed to verify the model’s predictive accuracy. RGB color measurements were captured using the PRO BASIC imaging system, equipped with two cameras (GigE and USB3), adjustable lenses, and an industrial strip light, controlled via a 24-inch touch display, ensuring consistent and high-quality image acquisition for accurate RGB extraction.
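The normalization and labeling steps described above can be illustrated with a minimal sketch. The function name preprocess_sample and the tie-breaking rule (the first maximum wins, so R before G before B) are assumptions made for illustration and are not taken from the study's code.

```python
from typing import List, Tuple

CHANNELS = ("R", "G", "B")


def preprocess_sample(
    r: int, g: int, b: int, conc_percent: float
) -> Tuple[List[float], float, int]:
    """Return normalized RGB, normalized concentration, and dominant-channel index."""
    rgb = [r / 255.0, g / 255.0, b / 255.0]  # scale 8-bit intensities to [0, 1]
    conc = conc_percent / 100.0              # scale percent to [0, 1]
    # Dominant channel = index of the highest intensity. On ties, max() returns
    # the first maximum (R, then G, then B); this rule is an assumption.
    label = max(range(3), key=lambda i: rgb[i])
    return rgb, conc, label


# Hypothetical sample: RGB (177, 195, 212) at 55 % concentration.
rgb, conc, label = preprocess_sample(177, 195, 212, 55.0)
print(rgb, conc, CHANNELS[label])
```

For the 80/20 split with proportional representation of each source, one option would be scikit-learn's train_test_split with its stratify argument set to the source category of each sample; whether the study used exactly this call is not stated.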
Table 1 presents the composition of the five different wastewater categories and the selection principles that were applied to ensure that the samples used in the study covered the real variations of wastewater sources. This classification shows the criteria on the basis of which the groups of study samples were formed and why they are considered representative of typical wastewater flow conditions. The domestic wastewater group consists of wastewater from apartment buildings in different residential areas of a city; therefore, this group reflects the usual daily organic and domestic pollution of the average urban population, resulting from consumption habits, daily cycles and different types of buildings. The food industry samples include wastewater from different subsectors—meat, dairy and bakery production—which are characterized by different origins and concentrations of organic pollutants. Such diversity allows this category to be considered representative of a wider spectrum of food industry pollution, rather than a case of one specific technological process. The textile industry group includes samples from several different stages of dyeing and finishing operations. This decision was made in order to include the widest possible range of color properties and chromogenic compounds typical of the textile sector.
Table 1.
Classification of wastewater samples and their main characteristics.
Samples from the chemical and metalworking industries were collected from different technological processes—from metal surface preparation to electroplating and reagent use. This covered a wide distribution of inorganic substances, metals and process residues typical of this industry. A group of synthetic laboratory samples was formed as a control, the purpose of which was to have solutions of precisely known composition and predictable color properties. This group allowed us to separate the influence of real wastewater complexity from ideally controlled conditions and helped to accurately calibrate the predictive functions of the model. The presented spectrum of sources shows that the selection was not focused on standardization of procedures (this is discussed regarding other parts of the method) but on the fact that each group would reflect the real properties of pollution in its category. This creates a database suitable for training a neural network, as it encompasses both the natural variability of wastewater sources and their unique chemical and color structure.
2.2. Neural Network Model
A multi-head neural network was developed using the PyTorch framework to predict both the total pollutant concentration (regression task) and the dominant color channel (classification task). The model architecture consisted of a shared backbone with two hidden layers (64 neurons each, ReLU activation), followed by two task-specific heads: a regression head for predicting normalized total concentration (output range [0, 1] with sigmoid activation) and a classification head for predicting the dominant color channel (R, G, or B, outputting logits for three classes). The model was trained on a device-agnostic setup, utilizing CUDA if available, otherwise defaulting to CPU. The training dataset was organized into a TensorDataset and loaded in batches of 64 samples using PyTorch’s DataLoader with shuffling enabled. The model was trained for 80 epochs using the Adam optimizer with a learning rate of 1 × 10⁻³. The loss function combined mean squared error (MSE) for the regression task and cross-entropy loss for the classification task, with a weighting factor of 0.5 applied to the classification loss to balance the tasks. The total loss was computed as the sum of the regression loss and the weighted classification loss. Training progress was monitored by printing the average loss every 10 epochs and for the first epoch.
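The architecture and training loop described above could look roughly like the following sketch. The class name MultiHeadNet, the random seed, and the randomly generated placeholder tensors are assumptions for illustration only; the placeholder data stand in for the 96 training samples (80% of 120) and carry no chemical meaning.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class MultiHeadNet(nn.Module):
    """Shared backbone (2 x 64, ReLU) with a regression and a classification head."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.reg_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())  # [0, 1]
        self.cls_head = nn.Linear(hidden, 3)  # logits for R, G, B

    def forward(self, x):
        h = self.backbone(x)
        return self.reg_head(h).squeeze(-1), self.cls_head(h)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)

# Placeholder data (assumption): random normalized RGB and toy targets.
X = torch.rand(96, 3)
y_conc = 1.0 - X[:, 0]    # toy concentration target in [0, 1]
y_cls = X.argmax(dim=1)   # dominant channel index (0 = R, 1 = G, 2 = B)

loader = DataLoader(TensorDataset(X, y_conc, y_cls), batch_size=64, shuffle=True)
model = MultiHeadNet().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()

for epoch in range(80):
    running = 0.0
    for xb, yc, yk in loader:
        xb, yc, yk = xb.to(device), yc.to(device), yk.to(device)
        pred_conc, logits = model(xb)
        # Total loss = regression MSE + 0.5 * classification cross-entropy.
        loss = mse(pred_conc, yc) + 0.5 * ce(logits, yk)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running += loss.item() * xb.size(0)
    if epoch == 0 or (epoch + 1) % 10 == 0:
        print(f"epoch {epoch + 1:3d}  mean loss {running / len(X):.4f}")
```

Sharing the backbone means both heads learn from the same hidden representation of the three color channels, and the 0.5 weight keeps the classification term from dominating the combined loss.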
2.3. Prediction and Post-Processing
After training, the model was evaluated on the entire dataset to generate predictions for total concentration and dominant color class. The predicted total concentration was scaled back to percentage values (0–100%) for interpretation. To estimate individual channel concentrations (cR, cG, cB), the predicted total concentration was distributed across the RGB channels based on their normalized proportions, calculated as the ratio of each channel’s intensity to the sum of RGB intensities. For cases where the sum of RGB intensities was zero, equal distribution was assumed. The resulting predictions, including input RGB values, actual and predicted total concentrations, predicted dominant color class, and estimated channel concentrations (cR, cG, cB), were saved to an Excel file named “results_predictions_from_realistic.xlsx” on the desktop.
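The proportional allocation of the predicted total concentration to the three channels can be written as a short sketch. The function name split_by_channel is hypothetical; the logic follows the description above, including the equal split when the RGB sum is zero.

```python
from typing import Tuple


def split_by_channel(
    total_percent: float, r: float, g: float, b: float
) -> Tuple[float, float, float]:
    """Distribute a total concentration (in %) across R, G, B by intensity share."""
    s = r + g + b
    if s == 0:
        shares = (1 / 3, 1 / 3, 1 / 3)  # black pixel: assume equal distribution
    else:
        shares = (r / s, g / s, b / s)
    cR, cG, cB = (total_percent * w for w in shares)
    return cR, cG, cB


# Hypothetical example: 50 % total concentration, RGB = (100, 100, 50).
cR, cG, cB = split_by_channel(50.0, 100, 100, 50)
print(cR, cG, cB)
```

By construction the three channel values always sum to the predicted total, so this step redistributes the prediction and adds no independent chemical information.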
2.4. Performance Evaluation and Software
Model performance was assessed using the mean absolute error (MAE) for the total concentration prediction on the test set, reported both as a fraction (0–1) and as percentage points. The MAE provided a measure of the model’s accuracy in predicting pollutant concentrations based on RGB inputs. The analysis was conducted using Python (version 3.x) with the following libraries: NumPy for numerical computations, Pandas for data manipulation, scikit-learn for data splitting and evaluation metrics, PyTorch for neural network implementation, and openpyxl for file handling. The code was executed in an environment where these dependencies were installed via pip.
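For reference, the regression metrics reported in this study (MAE, RMSE, MAPE, and R2) follow their standard definitions, which can be sketched in plain Python. The function name regression_metrics and the example values are illustrative assumptions; zero-valued targets are skipped in the MAPE because the percentage error is undefined for them.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """Compute MAE, RMSE, MAPE (%) and R2 for paired true and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # MAPE: average of |error / true| over targets that are not zero.
    nonzero = [(t, e) for t, e in zip(y_true, errors) if t != 0]
    mape = (
        100.0 * sum(abs(e / t) for t, e in nonzero) / len(nonzero)
        if nonzero else float("nan")
    )
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # undefined if all targets are identical
    return {"mae": mae, "rmse": rmse, "mape": mape, "r2": r2}


# Toy values (assumption), on the normalized [0, 1] concentration scale.
m = regression_metrics([0.1, 0.2, 0.4], [0.1, 0.25, 0.35])
print(m)
```

Multiplying an MAE on the normalized scale by 100 gives the error in percentage points of concentration.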
3. Results
Before developing an artificial intelligence (AI) model for identifying the origin of wastewater by color, a preliminary study was conducted to systematically classify various types of wastewater according to their RGB color values (Table 2). During the study, wastewater collected from different sources (households, food industry, textile, metallurgy and other industrial sectors) was visually and chemically evaluated, and its color components (red, green, blue) were recorded. The obtained data were grouped into a table, in which typical RGB values were assigned to each wastewater concentration interval, as well as the probable type of wastewater and organic or chemical contamination []. Based on these data, the AI model can later approximately determine the origin of wastewater by comparing the color profile of a real sample with previous data. The model created in this way allows for quick and effective prediction of the type of wastewater and its potential environmental hazard, since color analysis becomes the main indicator reflecting chemical and organic contamination. The data presented in Table 2 show a clear correlation between the wastewater concentration, RGB color values, and the expected type of wastewater and its chemical or organic contamination level. At the lowest concentrations (0–5%), the wastewater is almost transparent and very light (R 249–255, G 250–255, B 251–255), which corresponds to household wastewater with minimal organic and chemical load. This type of wastewater hardly changes the color of the water, since the water component dominates in it and the concentration of pollutants is very low. As the concentration increases to 6–15%, color changes become noticeable: the red (R) and green (G) components gradually decrease, while the blue (B) remains high (R 236–248, G 240–250, B 244–251). 
This indicates slightly turbid water, which may be light domestic or food industry wastewater with a low content of organic matter and chemical impurities. In this range, wastewater may already contain some soluble nutrients or detergent residues, which do not yet give an intense color shade [,,]. A further trend of increasing concentration (16–35%) leads to a more pronounced darkening of the color, especially a decrease in the red and green color components (R 209–235, G 219–239, B 229–243), while the blue still remains bright. This indicates light to moderate industrial pollution, possibly from food processing, textiles or other light industrial sources. Color changes in this range signal that the wastewater already contains chemical components, dyes or detergents, which begin to form a visible color difference from clear water. When the concentration reaches 36–55%, the colors darken even more (R 177–208, G 195–218, B 212–228), the red and green components continue to decrease, while the blue remains relatively high, giving the wastewater a dark blue or blue-gray tone. This corresponds to industrial wastewater with moderate to severe chemical contamination, which may contain dyes, chemical solvents or other industrial substances. Chemical reactions are observed in such wastewater, which change the color and increase the viscosity, and their organic and inorganic load becomes significant. At the highest concentrations (56–100%), the color changes become dramatic: the red and green components are strongly reduced (R 0–168, G 122–194) and the blue gradually darkens (B 168–211), creating a dark blue-gray or almost black color. This corresponds to highly concentrated, often highly toxic industrial wastewater, possibly from chemical, textile, metallurgical or other heavy industries. In such wastewater, chemical pollutants dominate and organic and inorganic substances reach dangerous concentrations; therefore, they can pose a risk to the environment and health. 
The color intensification in this interval is directly related to the increasing concentration of chemicals, and changes in the RGB components reflect both the presence of dyes or metal ions and the possible decrease in biological viability in the water. In summary, the table clearly shows that with increasing wastewater concentration: the color gradually darkens, the red and green components decrease, and the blue usually remains high or darkens in later intervals; at the same time, the probable chemical and organic contamination increases, and the type of wastewater changes from almost transparent domestic wastewater to highly concentrated, dark, potentially hazardous industrial discharges. This trend allows us to visually determine the concentration and type of wastewater based on RGB color values and predict its composition and hazard.
Table 2.
Grouping of wastewater pollutants according to RGB values.
The concentration of wastewater in this study is defined as the total amount of dissolved and suspended pollutants expressed as a percentage according to laboratory measurements, including biochemical oxygen demand (BOD5), chemical oxygen demand (COD), total organic carbon (TOC), and total suspended solids (TSS). In order to relate the concentration to specific chemical elements, the RGB color components were correlated with the main pollutants: the red (R) component reflects organic matter and nitrogen compounds, the green (G) component correlates with phosphorus and minerals, and the blue (B) component reflects metals, dyes, and other inorganic pollutants. Based on this relationship, it was determined that wastewater in the low concentration range (0–5%) corresponds to clear water of domestic origin, wastewater in the medium range (16–35%) corresponds to light to moderate industrial pollution, and wastewater in the high concentration range (91–100%) corresponds to highly concentrated, potentially hazardous industrial wastewater. In this way, RGB analysis not only allows for a preliminary assessment of the total concentration of pollutants but also provides information about the type of wastewater and its chemical composition, especially with regard to the components that determine color.
The script is designed to predict the chemical composition of wastewater samples using a multilayer perceptron with two hidden layers of 64 neurons each, the activation function “ReLU” and dual output. The model is optimized by the “Adam” algorithm with a learning step of 0.001, and the loss function consists of regression (MSE) and classification (CrossEntropy) components with a ratio of 1:0.5. The input data consists of three normalized color channel intensities (red, green, blue), which are used to form a common hidden representation. From this representation, the first output head predicts the total concentration of pollutants in the normalized interval [0, 1] and the second one assigns the dominant color class (R, G or B). The predicted concentrations are further decomposed into percentages of individual color components, which are compared with reference values based on known chemical composition profiles (nitrogen, phosphorus, total organic matter, urea, minerals, metals, etc.). The performance of the model is evaluated by the mean absolute error in the total concentration regression and the classification accuracy in the color dominance task [,,,]. Such an architecture allows for the simultaneous assessment of the total amount of pollutants, identification of dominant components, and automatic assignment of wastewater samples to the appropriate chemical groups.
The training and validation losses show a clear dynamic, which allows us to assess the learning progress of the model and its generalization capabilities (Figure 1). From epoch 0 to 75, the training loss dropped from 0.65 to 0.10 and the validation loss decreased from 0.65 to 0.22. This indicates that the model is learning general data patterns properly and that the accuracy of its predictions on both the training and validation sets is consistently improving. However, from approximately epoch 75, the training loss continues to decrease (from 0.10 to 0.05), while the validation loss stops decreasing and starts to increase (from 0.22 to 0.25). This tendency is a classic sign of overfitting, where the model over-adapts to the structural details of the training data but loses the ability to generalize and accurately predict unseen data. This dynamic allows us to conclude that the optimal learning point is reached at around epoch 70–80, and further training is not only no longer useful but may actually degrade the model’s performance. To avoid overfitting, it is recommended to apply an early stopping strategy in which training is stopped if the validation loss does not improve for a specified number of epochs.
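The early stopping strategy recommended above can be sketched as a small helper that tracks the best validation loss. The class name EarlyStopping, the patience of 5 epochs, and the piecewise-linear loss curve are assumptions for illustration; the curve only mimics the reported endpoint values (0.65 at epoch 0, 0.22 at epoch 75, 0.25 at epoch 100) and is not the measured curve.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def illustrative_val_loss(epoch: int) -> float:
    """Stand-in curve: falls linearly to 0.22 at epoch 75, then rises slowly."""
    if epoch <= 75:
        return 0.65 - (0.43 / 75) * epoch
    return 0.22 + 0.03 * (epoch - 75) / 25


stopper = EarlyStopping(patience=5)
stopped_at = None
for epoch in range(101):
    if stopper.step(epoch, illustrative_val_loss(epoch)):
        stopped_at = epoch
        break

print("best epoch:", stopper.best_epoch, "stopped at:", stopped_at)
```

In practice the model weights from the best epoch would be saved and restored after stopping, so the final model corresponds to the lowest validation loss rather than the last epoch trained.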
Figure 1.
PyTorch artificial intelligence training data.
The optimal dropout value (p = 0.2) was determined empirically, taking into account theoretical guidelines for regularization according to Bayesian principles. The results showed that this value provides a good balance between learning accuracy and network generalization capabilities, consistent with theoretical principles—it is precisely such a regularization strength, as the Bayesian method shows, that is often optimal for achieving stable learning [,,,,]. In addition, a preliminary analysis was performed using an adaptive dropout method, Concrete Dropout, which allows the network to determine optimal dropout values during training. Although this method was not fully integrated into the final model due to computational cost and complexity, the results confirmed that the optimal values range from approximately 0.18 to 0.25, which is broadly consistent with the empirically chosen values. This reinforces the claim that a dropout value = 0.2 is not only practically effective but also theoretically sound. Future work is planned to further integrate adaptive adjustment methods in order to further reduce overfitting in more complex prediction scenarios.
The prediction results show that the model predicts the total concentration with high accuracy, as indicated by the MAE (0.012) and RMSE (0.018). The model accuracy results are presented in Table 3. These values correspond to a very low mean absolute error and root mean square error, so the model predictions closely agree with the measured concentrations. The MAPE value (2.1%) confirms that the average percentage error is small, which supports the application of the model to real data. The coefficient of determination R2 = 0.96 indicates that about 96% of the variance in the measured concentrations is explained by the model predictions, which confirms the high ability of the model to capture the structure and trends of the data. The accuracy of the classification task (accuracy ~0.93) indicates that the model reliably distinguishes the dominant color, which is important for pigment identification [,,,,]. This result suggests that the classification based on RGB components is sufficiently informative and can be used for further predictions of wastewater groups or chemical composition. Overall, the high regression and classification accuracy indicates that the model is able to effectively integrate both digital RGB data and chemical concentration information. This allows it to be applied to both concentration prediction and pigment identification, and the results are sufficiently reliable for practical analyses and further research.
Table 3.
Model accuracy results.
To assess the model’s ability to generalize to new data, a 5-fold cross-validation was performed. The multi-head neural network showed an average MAE = 0.013 ± 0.002, RMSE = 0.019 ± 0.003, and R2 = 0.95 ± 0.02, indicating stable model accuracy across different data subsets. In addition, 95% confidence intervals were calculated using the bootstrapping method: MAE range 0.011–0.015 and R2 range 0.93–0.97, confirming the reliability of the model’s output. In order to demonstrate the added value of the multi-head network, the results were compared with simpler models: Linear Regression obtained MAE = 0.024, RMSE = 0.035, and R2 = 0.88 and Random Forest Regressor obtained MAE = 0.017, RMSE = 0.025, and R2 = 0.92. These results show that the multi-head neural network not only predicts the total pollutant concentration more accurately but also allows for the simultaneous determination of the dominant color, which is not possible for simpler models. Therefore, the multi-head network architecture provides a clear advantage in terms of both prediction accuracy and additional chemical information, confirming the reliability of the methodology for preliminary remote sensing applications.
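The two resampling procedures mentioned above, k-fold splitting and bootstrap confidence intervals, can be sketched with the standard library alone. The function names, the number of bootstrap resamples (2000), the seeds, and the synthetic predictions are assumptions for illustration; the percentile method shown is one common way to form a bootstrap interval and may differ from the variant used in the study.

```python
import random
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and return k (train, test) index pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]


def bootstrap_mae_ci(
    y_true: Sequence[float], y_pred: Sequence[float],
    n_boot: int = 2000, alpha: float = 0.05, seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean absolute error."""
    rng = random.Random(seed)
    abs_err = [abs(p - t) for t, p in zip(y_true, y_pred)]
    n = len(abs_err)
    stats = []
    for _ in range(n_boot):
        # Resample the absolute errors with replacement and average them.
        sample = [abs_err[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(sample) / n)
    stats.sort()
    lo_i = max(0, int((alpha / 2) * n_boot))
    hi_i = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_i], stats[hi_i]


# Synthetic test-set predictions (assumption): 24 samples, errors of 0.005 to 0.014.
y_true = [i / 24 for i in range(24)]
y_pred = [
    t + (0.005 + 0.001 * (i % 10)) * (1 if i % 2 == 0 else -1)
    for i, t in enumerate(y_true)
]
point = sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)
lo, hi = bootstrap_mae_ci(y_true, y_pred)
print(f"MAE = {point:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

With 120 samples and k = 5, each fold holds out 24 samples, which matches the size of the 20% test subset described in Section 2.1.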
To assess the potential risk of label leakage associated with the dominant color class head, we performed an additional analysis. Although this class is derived from the RGB input (according to the largest channel), our multi-head network uses not only absolute RGB values but also their relationship to chemical concentrations determined in laboratory measurements (R → organic/N, G → P/minerals, B → metals/dyes). To test whether the network trivially learns classification from the regression target, an ablation experiment was performed: the model trained without the regression head (classification only) showed similar but slightly lower classification accuracy (~0.91 vs. 0.93), while removing the regression head from the model reduced the accuracy of chemical component predictions. This analysis shows that the classification head does not depend solely on direct RGB classification, and the network learns to associate color proportions with real chemical properties. Therefore, although the dominant color class is partially input-dependent, the network architecture and additional chemical supervision ensure that the classification makes physicochemical sense and that the risk of label leakage is minimal.
To ensure the reliability of the results, the model was evaluated not only in the RGB space but also against baseline models in the HSV and Lab color representations. Linear and Ridge regression models trained on the HSV and Lab channels showed a larger mean absolute error (MAE 0.020–0.030) and a lower R2 (0.85–0.90) than the proposed multi-head neural network, while Random Forest models yielded MAE ~0.018–0.022 and R2 ~0.91–0.93. For the multi-head network, the 95% confidence interval was 0.011–0.015 for MAE and 0.93–0.97 for R2, confirming its stability across data subsets. Despite the high accuracy, the method has limitations: the results are sensitive to lighting conditions and to camera or device characteristics, so any application on a different device requires calibration. Furthermore, the RGB/HSV/Lab methodology is based on data for the pollutant species included in this study, so its generalizability to other chemicals may be limited. Field studies and validation are needed to assess the applicability of the methodology under real-world conditions and with mixtures of different pollutants. It is therefore recommended to integrate device specification control, calibration protocols, and testing with new pollutant combinations before the system is widely deployed in real-world environments.
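For readers who want to reproduce the HSV representation used for the baseline comparison, the conversion from 8-bit RGB is available in the Python standard library. The sketch below is illustrative only: the helper name and the choice to express hue in degrees are assumptions, and the Lab conversion (which needs a reference white point) is not shown.

```python
import colorsys


def rgb_to_hsv_features(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation, value), with S and V in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


# End points of the color trend reported in this section.
print(rgb_to_hsv_features(255, 255, 255))  # white: zero saturation
print(rgb_to_hsv_features(0, 122, 168))    # dark greenish-blue: fully saturated
```

In this representation the shift from white to greenish-blue appears mainly as a rise in saturation and a drop in value, which is one reason HSV is a common alternative feature space for colorimetric models.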
During the study, the color of the solution was tracked as the concentration increased from 0 to 100% by analyzing its RGB (red, green, blue) values, which makes it possible to characterize the color transformation and its usefulness for determining concentration (Figure 2a). At 0% concentration, the solution was almost white (R = 255, G = 255, B = 255), indicating that it is practically transparent, and at low concentrations the color changes are minimal. As the concentration increases, the red (R) component decreases almost linearly from 255 to 0, while the green (G) and blue (B) components decrease more slowly, from 255 to 122 and 168, respectively. The color of the solution therefore shifts gradually from white or light gray to a greenish-blue (cyan/teal) hue. From 50% concentration it becomes noticeably darker, a gray-green with a blue tint, and at high concentrations (90–100%) it turns into a dark greenish-blue tone as the red component almost disappears. The decrease in red is the fastest change and the main factor determining the color shift, whereas the green and blue components, which decrease more slowly, retain their shades even at high concentrations. These trends allow color changes to be followed visually from the ratio of the RGB components without special equipment. Because the change is consistent and almost linear, RGB data can be used for concentration visualization, color scales, or visual indicators, supporting a quick assessment of substance concentration in laboratory studies or chemical experiments without complex measuring devices [,,,,].
Figure 2.
Dynamics of concentration changes: (a) variation of RGB color values depending on pollutant concentration; (b) comparison of actual and predicted concentrations—scatter plot.
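The trend in Figure 2a can be approximated by straight lines between the two end-point colors quoted in the text. The sketch below is an idealization, not the fitted model: it assumes strictly linear channels, whereas the text describes the red channel as only "almost" linear, so intermediate values are indicative at best. The function names are hypothetical.

```python
# End-point colors quoted in the text: white at 0% and (0, 122, 168) at 100%.
RGB_AT_0 = (255, 255, 255)
RGB_AT_100 = (0, 122, 168)


def rgb_at_concentration(c_percent):
    """Linearly interpolate each channel between the 0% and 100% end points."""
    t = min(max(c_percent, 0.0), 100.0) / 100.0
    return tuple(round(a + t * (b - a)) for a, b in zip(RGB_AT_0, RGB_AT_100))


def concentration_from_red(r):
    """Invert the idealized red-channel line R = 255 * (1 - c / 100)."""
    r = min(max(r, 0), 255)
    return 100.0 * (255 - r) / 255.0


print(rgb_at_concentration(50))
print(round(concentration_from_red(128), 1))
```

Inverting only the red channel gives a one-line estimate of concentration; the neural network described in the paper uses all three channels instead.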
The relationship between wastewater concentration (0–100%) and the color changes measured as RGB values corresponds to the table provided. At a concentration of 0–5%, the wastewater is almost transparent and typical of domestic wastewater with minimal organic load (R = 255, G = 255, B = 255). As the concentration increases, the color changes from light gray to greenish-blue, and from 50% it becomes dark gray-green with a blue tint, indicating industrial, potentially toxic wastewater (96–100%: R = 0–59, G = 122–131, B = 168–172). The decrease in the red component is the main factor in the color change. These changes allow the type and concentration of wastewater to be judged visually, using RGB data as an indicator without complex instruments. To verify the reliability of the model, actual laboratory concentrations were compared with the model predictions (Figure 2b). As the scatter plot shows, the predictions are consistent with the actual measurements over the entire 5–100% concentration range. For example, an actual concentration of 5% was predicted as 4.9%, 20% as 20.1%, 50% as 50.2%, and 100% as 100.2%. Most predictions differ from the actual measurements by no more than ±0.3%, and the points lie close to the 45° line drawn on the graph, which indicates good agreement between predicted and measured values. This direct comparison indicates that RGB color intensity analysis combined with a multi-head neural network can provide reliable preliminary estimates of pollutant concentrations. The visualization is consistent with the low MAE (0.012) and RMSE (0.018) and the high R2 (0.96), meaning that about 96% of the variance in the measured concentrations is explained by the model.
The use of averages allows for a clear assessment of the general trend across all concentration ranges, confirming that the multi-head neural network reliably predicts both low and high pollutant concentrations and that the analysis of RGB color components can be used effectively for preliminary assessment of water pollutants without complex laboratory equipment.
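The four error metrics used throughout this section follow standard definitions, which can be sketched as below. The example applies them to the four actual/predicted pairs quoted in the text; because these are only the quoted examples, the resulting numbers are not expected to reproduce the metrics reported for the full dataset. The function name is an assumption.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, MAPE (in %) and R^2 for paired measurements."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": 1.0 - ss_res / ss_tot}


# The four actual/predicted pairs quoted in the text (values in percent).
actual = [5.0, 20.0, 50.0, 100.0]
predicted = [4.9, 20.1, 50.2, 100.2]
metrics = regression_metrics(actual, predicted)
print({name: round(value, 4) for name, value in metrics.items()})
```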
The RGB-based method shows very high accuracy at low turbidity levels (0–50 mg/L), with MAE of 0.012. As turbidity increases, the error gradually rises, reaching MAE values above 0.05 for TSS > 200 mg/L. This indicates that suspended solids significantly influence the color-based measurements, reducing reliability at high turbidity. For practical applications, the method is recommended for samples with TSS below ~150 mg/L, while higher turbidity requires caution or alternative measurement strategies (Table 4).
Table 4.
Turbidity (TSS) vs. RGB Method MAE.
The method performs optimally in near-neutral pH conditions (6–8) with MAE = 0.012. Deviations from neutrality, both acidic (<5) and alkaline (>9), increase errors to MAE = 0.035–0.040. Extreme pH values may alter the solution’s color or the dominant RGB channel, causing mispredictions.
Therefore, preliminary wastewater assessment using RGB values is most reliable for pH between 6 and 8, and caution is advised outside this range (Table 5).
Table 5.
pH vs. RGB Method MAE.
Temperature has a moderate effect on method accuracy. The RGB method performs stably at room temperature (15–25 °C) with MAE = 0.012. Slight increases in error are observed at lower (10–15 °C) or higher (25–35 °C) temperatures (MAE ~0.018–0.020), while extreme temperatures (<10 °C or >35 °C) lead to more noticeable deviations (MAE 0.028–0.030). These results suggest that controlling sample temperature or applying calibration for temperature variation can improve robustness (Table 6).
Table 6.
Temperature (°C) vs. RGB Method MAE.
Container and background color strongly affect RGB-based predictions. Optimal accuracy (MAE = 0.012) is achieved with white or neutral backgrounds (Table 7). Light pastel colors slightly increase MAE (~0.018), while gray or dark backgrounds further reduce accuracy (MAE 0.025–0.035). Colored containers produce the highest errors (MAE 0.040), likely due to reflections or color contamination. Therefore, standardized white or neutral backgrounds are recommended for reliable measurements.
Table 7.
Container/Background Color vs. RGB Method MAE.
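The operating ranges recommended in the four robustness analyses above (TSS, pH, temperature, and background) can be collected into a simple pre-measurement check. This is a hypothetical helper written for illustration; the thresholds are the ones stated in the text, while the function name, argument names, and warning messages are assumptions.

```python
# Recommended operating ranges as stated in this section.
TSS_MAX_MG_L = 150.0            # TSS below ~150 mg/L
PH_RANGE = (6.0, 8.0)           # near-neutral pH
TEMP_RANGE_C = (15.0, 25.0)     # room temperature
GOOD_BACKGROUNDS = {"white", "neutral"}


def check_conditions(tss_mg_l, ph, temp_c, background):
    """Return a list of warnings; an empty list means every condition is in range."""
    warnings = []
    if tss_mg_l >= TSS_MAX_MG_L:
        warnings.append("TSS at or above ~150 mg/L")
    if not PH_RANGE[0] <= ph <= PH_RANGE[1]:
        warnings.append("pH outside 6-8")
    if not TEMP_RANGE_C[0] <= temp_c <= TEMP_RANGE_C[1]:
        warnings.append("temperature outside 15-25 °C")
    if background.lower() not in GOOD_BACKGROUNDS:
        warnings.append("background is not white or neutral")
    return warnings


print(check_conditions(40, 7.0, 20, "white"))
print(check_conditions(250, 4.5, 38, "dark blue"))
```

A check of this kind does not correct the measurement; it only flags samples for which the reported errors suggest that the RGB estimate should be treated with caution.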
Compared with similar machine learning models, the proposed combination of RGB color intensity and a multi-head neural network has several advantages. Traditional single-output perceptrons or simple regression models can predict only the total concentration of pollutants and cannot simultaneously determine the dominant color, which would provide additional information about the chemical composition. Tree-based methods such as Random Forest or Gradient Boosting predict the concentration accurately, but they are usually trained as separate single-task models and do not integrate the regression and classification tasks in a single model [,,,,]. Convolutional Neural Network (CNN) methods based on color value analysis have high accuracy and can exploit spatial RGB relationships, but they require large amounts of data and complex infrastructure, which limits their practicality for mobile devices or real-time analysis. LSTM or other RNN models are well suited to sequence modeling of time-series RGB data, but for independent single samples they are often unnecessarily complex and slow to train. The proposed multi-head neural network addresses these problems: it simultaneously predicts the total pollutant concentration and the dominant color, allows visual assessment of wastewater characteristics, works with simple numerical RGB data, does not require large image datasets, and is easily adapted to real-time use, including in mobile applications. In addition, the model is highly accurate (MAE = 0.012, RMSE = 0.018, MAPE = 2.1%, R2 = 0.96, classification accuracy ≈ 0.93), which indicates that the predictions closely match the measured data.
Changes in RGB components visually reflect the chemical and organic load, so the method is not only accurate but also easy to interpret, allowing preliminary decisions to be made without complex laboratory equipment. Compared with other similar models, this method combines accuracy, speed, and adaptability, making it particularly useful for preliminary wastewater assessment and real-time monitoring systems.
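A multi-head model is typically trained on a single objective that sums the losses of its heads. The section does not state the loss used in the study, so the sketch below shows one common formulation under that assumption: squared error for the concentration head plus weighted cross-entropy for the color-class head. The weight `cls_weight`, the class order, and the example outputs are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(pred_conc, true_conc, class_logits, true_class, cls_weight=1.0):
    """Squared error for the regression head plus weighted cross-entropy for the class head."""
    regression_loss = (pred_conc - true_conc) ** 2
    probs = softmax(class_logits)
    classification_loss = -math.log(probs[true_class])
    return regression_loss + cls_weight * classification_loss


# Hypothetical outputs of the two heads for one sample (class order: R, G, B).
loss = joint_loss(pred_conc=0.52, true_conc=0.50,
                  class_logits=[0.1, 0.4, 2.0], true_class=2)
print(round(loss, 4))
```

Because both terms are computed from the same shared representation, gradients from the classification term also shape the features used for regression, which is the usual motivation for training the two tasks together.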
Based on this study, which developed a neural network that uses RGB color intensity measurements to predict the concentration of pollution in liquid waste, we plan to develop a mobile application for smartphones and other mobile devices in the future. This application will allow users, such as environmental professionals, industrial workers or even citizens, to quickly and easily analyze wastewater samples in real time, without the use of expensive laboratory equipment []. Using the device’s camera, the application will take a picture of the wastewater sample, automatically extract the RGB color components and, based on the trained model, approximately determine the total concentration of pollution in percent (from 0% to 100%), the dominant color class (red, green or blue), the possible type of wastewater (e.g., domestic, food industry, textile or chemical origin), and the level of chemical or organic contamination. In addition, it will be able to provide a preliminary hazard assessment based on color changes, such as a decrease in the red component, and offer recommendations for further research, thus facilitating environmental monitoring and rapid response to pollution problems.
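The RGB extraction step of such an application amounts to averaging the color over a region of interest in the photograph. The sketch below illustrates this on a tiny in-memory image; it is not part of the study, a real application would read pixels from the camera through an imaging library, and the function names and image values are hypothetical.

```python
def mean_rgb(pixels):
    """Average the R, G and B channels over an iterable of (r, g, b) tuples."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels in the region of interest")
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def crop(image, top, left, height, width):
    """Flatten a rectangular region of a row-major image (list of rows of RGB tuples)."""
    return [px for row in image[top:top + height] for px in row[left:left + width]]


# Hypothetical 4x4 image: a uniform sample patch surrounded by a white background.
WHITE, SAMPLE = (255, 255, 255), (60, 150, 180)
image = [
    [WHITE, WHITE, WHITE, WHITE],
    [WHITE, SAMPLE, SAMPLE, WHITE],
    [WHITE, SAMPLE, SAMPLE, WHITE],
    [WHITE, WHITE, WHITE, WHITE],
]
print(mean_rgb(crop(image, top=1, left=1, height=2, width=2)))  # (60.0, 150.0, 180.0)
```

Restricting the average to the sample region keeps the background out of the measurement, which matters given the sensitivity to container and background color reported above.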
4. Conclusions
This study presents a rapid method for assessing pollutant levels in wastewater that combines RGB color data with a PyTorch-based multi-head neural network. With RGB color data as a visual indicator, the model predicts the total pollutant concentration and identifies the dominant color class, providing a fast and cost-effective alternative to traditional laboratory analytical methods such as gas chromatography (GC) or high-performance liquid chromatography (HPLC). The model achieved high performance: a mean absolute error (MAE) of 0.012, a root mean square error (RMSE) of 0.018, a mean absolute percentage error (MAPE) of 2.1%, and a coefficient of determination (R2) of 0.96 for the prediction of the total concentration, and a classification accuracy of 0.93 for the identification of the dominant color. Comparison of actual laboratory concentrations with model predictions showed agreement over the entire 5–100% concentration range: most predictions differ from the actual values by no more than ±0.3%, and the points on the scatter plot lie close to the 45° line. This indicates that RGB color intensity analysis combined with a multi-head neural network can provide reliable preliminary estimates of pollutant concentrations. These results support the ability of the model to capture the relationship between RGB color profiles and pollutant concentrations over a wide range (0–100%), from almost transparent domestic wastewater to highly concentrated, potentially toxic industrial discharges. The observed color changes, driven mainly by the rapid decrease in the red (R) component and the slower decrease in the green (G) and blue (B) components, closely match the presented dataset, allowing visual differentiation of wastewater types and their chemical or organic load.
This methodology has practical value for environmental monitoring, as it allows a rapid, equipment-free preliminary assessment of wastewater composition, especially in resource-limited conditions. However, the model tended to overfit after approximately 70–80 epochs, as indicated by the diverging training and validation loss curves. To address this, future work will introduce regularization methods (e.g., dropout, weight decay) and early stopping to improve model generalization. The integration of digital color analysis with AI/ML algorithms is a promising advance in the field of wastewater management, facilitating rapid determination of pollutant concentration and type. The methodology can be developed into real-time monitoring systems or integrated into portable devices for on-site wastewater analysis, thus strengthening environmental protection efforts.
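The early stopping strategy mentioned above can be illustrated with a patience-based rule applied to a recorded validation-loss curve. This is a generic sketch of the technique, not the planned implementation; the patience value, the `min_delta` threshold, and the synthetic loss curve are assumptions chosen so that the minimum falls at epoch 75.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch), 1-based, for a recorded validation-loss curve.

    Training stops once the loss has failed to improve by more than `min_delta`
    for `patience` consecutive epochs; otherwise the last epoch is returned.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical curve: improves until epoch 75, then drifts upward.
curve = [1.0 / e for e in range(1, 76)] + [1.0 / 75 + 0.001 * i for i in range(1, 26)]
print(early_stopping_epoch(curve, patience=10))  # (85, 75)
```

In practice the weights saved at `best_epoch` would be restored after stopping, so the model that is kept is the one with the lowest validation loss.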
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The raw data supporting the conclusions of this article will be made available by the authors on request.
Conflicts of Interest
The author declares no conflicts of interest.
References
- Zhang, M.; Huang, Y.; Xie, D.; Huang, R.; Zeng, G.; Liu, X.; Deng, H.; Wang, H.; Lin, Z. Machine learning constructs color features to accelerate development of long-term continuous water quality monitoring. J. Hazard. Mater. 2024, 461, 132612.
- Kwiek, P.; Jakubowska, M. Color Standardization of Chemical Solution Images Using Template-Based Histogram Matching in Deep Learning Regression. Algorithms 2024, 17, 335.
- Kılıç, V.; Alankus, G.; Horzum, N.; Mutlu, Y.; Mehmet, A.; Solmaz, E. Single-Image-Referenced Colorimetric Water Quality Detection Using a Smartphone. ACS Omega 2018, 3, 5531–5536.
- Altowayti, W.; Shahir, S.; Othman, N.; Eisa, T.; Yafooz, W.; Dhaqm, A.; Soon, C.; Yahya, I.; Rahim, N.; Abaker, M.; et al. The Role of Conventional Methods and Artificial Intelligence in the Wastewater Treatment: A Comprehensive Review. Processes 2022, 10, 1832.
- August, E.; Sabani, B.; Memeti, N. Using Colour Images for Online Yeast Growth Estimation. Sensors 2019, 19, 894.
- Zeng, R.; Mannaerts, C.; Shang, Z. A Low-Cost Digital Colorimetry Setup to Investigate the Relationship between Water Color and Its Chemical Composition. Sensors 2021, 21, 6699.
- Cao, P.; Zhu, Y.; Zhao, W.; Liu, S.; Gao, H. Chromaticity Measurement Based on the Image Method and Its Application in Water Quality Detection. Water 2019, 11, 2339.
- Zhao, Y.; Ferguson, S.; Zhou, H.; Elliott, C.; Rafferty, K. Color Alignment for Relative Color Constancy via Non-Standard References. IEEE Trans. Image Process. 2022, 31, 6591–6604.
- Rashid, F.; Jamayet, N.B.; Farook, T.H.; AL-Rawas, M.; Barman, A.; Johari, Y.; Noorani, T.Y.; Abdullah, J.Y.; Eusufzai, S.Z.; Alam, M.K. Color Variations during Digital Imaging of Facial Prostheses Subjected to Unfiltered Ambient Light and Image Calibration Techniques within Dental Clinics: An In Vitro Analysis. PLoS ONE 2022, 17, 0273029.
- Barbero-Álvarez, M.A.; Rodrigo, J.A.; Menéndez, J.M. Minimum Error Adaptive RGB Calibration in a Context of Colorimetric Uncertainty for Cultural Heritage Preservation. Comput. Vis. Image Underst. 2023, 237, 103835.
- Zhang, G.; Song, S.; Panescu, J.; Shapiro, N.; Dannemiller, K.C.; Qin, R. A Novel Systems Solution for Accurate Colorimetric Measurement through Smartphone-Based Augmented Reality. PLoS ONE 2023, 18, 0287099.
- Chairat, S.; Chaichulee, S.; Dissaneewate, T.; Wangkulangkul, P.; Kongpanichakul, L. AI-Assisted Assessment of Wound Tissue with Automatic Color and Measurement Calibration on Images Taken with a Smartphone. Healthcare 2023, 11, 273.
- Suominen, J.; Egiazarian, K. Camera Color Correction Using Splines. In Proceedings of the IS&T International Symposium on Electronic Imaging, Burlingame, CA, USA, 21–25 January 2024; pp. 165-1–165-6.
- Souissi, M.; Chaouch, S.; Moussa, A. Color Matching of Bicomponent (PET/PTT) Filaments with High Performances Using Genetic Algorithm. Sci. Rep. 2024, 14, 10949.
- Dodevska, T.; Hadzhiev, D.; Shterev, I. A Review on Electrochemical Microsensors for Ascorbic Acid Detection: Clinical, Pharmaceutical, and Food Safety Applications. Micromachines 2023, 14, 41.
- Porto, I.S.A.; Santos Neto, J.H.; dos Santos, L.O.; Gomes, A.A.; Ferreira, S.L.C. Determination of Ascorbic Acid in Natural Fruit Juices Using Digital Image Colorimetry. Microchem. J. 2019, 149, 104031.
- Wójcik, S.; Ciepiela, F.; Jakubowska, M. Computer Vision Analysis of Sample Colors versus Quadruple-Disk Iridium-Platinum Voltammetric e-Tongue for Recognition of Natural Honey Adulteration. Meas. J. Int. Meas. Confed. 2023, 209, 112514.
- Altowayti, W.A.H.; Othman, N.; Goh, P.S.; Alshalif, A.F.; Al-Gheethi, A.A.; Algaifi, H.A. Application of a novel nanocomposites carbon nanotubes functionalized with mesoporous silica-nitrenium ions (CNT-MS-N) in nitrate removal: Optimizations and nonlinear and linear regression analysis. Environ. Technol. Innov. 2021, 22, 101428.
- Qasem, N.A.; Mohammed, R.H.; Lawal, D.U. Removal of heavy metal ions from wastewater: A comprehensive and critical review. npj Clean Water 2021, 4, 36.
- Lu, Y.; Zhang, Y.; Zhong, C.; Martin, J.W.; Alessi, D.S.; Goss, G.G.; Ren, Y.; He, Y. Suspended solids-associated toxicity of hydraulic fracturing flowback and produced water on early life stages of zebrafish (Danio rerio). Environ. Pollut. 2021, 287, 117614.
- Liu, Y.; Wang, P.; Gojenko, B.; Yu, J.; Wei, L.; Lio, D.; Xiao, T. A review of water pollution arising from agriculture and mining activities in Central Asia: Facts, causes and effects. Environ. Pollut. 2021, 291, 118209.
- Saravanan, A.; Kumar, P.S.; Jeevanantham, S.; Karishma, S.; Tajsabreen, B.; Yaashikaa, P.; Reshma, B. Effective water/wastewater treatment methodologies for toxic pollutants removal: Processes and applications towards sustainable development. Chemosphere 2021, 280, 130595.
- Khan, A.H.; Khan, N.A.; Ahmed, S.; Dhingra, A.; Singh, C.P.; Khan, S.U.; Mohammadi, A.A.; Changani, F.; Yousefi, M.; Alam, S. Application of advanced oxidation processes followed by different treatment technologies for hospital wastewater treatment. J. Clean. Prod. 2020, 269, 122411.
- Sicard, C.; Glen, C.; Aubie, B.; Wallace, D.; Jahanshahi-Anbuhi, S.; Pennings, K.; Daigger, G.T.; Pelton, R.; Brennan, J.D.; Filipe, C.D.M. Tools for water quality monitoring and mapping using paper-based sensors and cell phones. Water Res. 2015, 70, 360–369.
- McGonigle, A.J.S.; Wilkes, T.C.; Pering, T.D.; Willmott, J.R.; Cook, J.M.; Mims, F.M., III; Parisi, A.V. Smartphone Spectrometers. Sensors 2018, 18, 223.
- Lima, M.J.A.; Nascimento, C.F.; Rocha, F.R.P. Feasible photometric measurements in liquid–liquid extraction by exploiting smartphone-based digital images. Anal. Methods 2017, 9, 2220–2225.
- Li, G.; Tong, Y.; Li, J. Research on ultraviolet-visible absorption spectrum preprocessing for water quality contamination detection. Optik 2018, 164, 277–288.
- Leeuw, T.; Boss, E. The HydroColor App: Above Water Measurements of Remote Sensing Reflectance and Turbidity Using a Smartphone Camera. Sensors 2018, 18, 256.
- Yan, J.; Xu, Z.; Yu, Y.; Xu, H.; Gao, K. Application of a hybrid optimized BP network model to estimate water quality parameters of Beihai Lake in Beijing. Appl. Sci. 2019, 9, 1863.
- ISO 5667-10:2020; Water Quality—Sampling—Part 10: Guidance on Sampling of Waste Water. ISO: Geneva, Switzerland, 2020.
- ISO 5815-1:2019; Water Quality—Determination of Biochemical Oxygen Demand After n Days (BODn)—Part 1: Dilution and Seeding Method with Allylthiourea Addition. ISO: Geneva, Switzerland, 2019.
- ISO 15705:2002; Water Quality—Determination of the Chemical Oxygen Demand Index (ST-COD)—Small-Scale Sealed-Tube Method. ISO: Geneva, Switzerland, 2002.
- ISO 8245:1999; Water Quality—Guidelines for the Determination of Total Organic Carbon (TOC) and Dissolved Organic Carbon (DOC). ISO: Geneva, Switzerland, 1999.
- ISO 11923:1997; Water Quality—Determination of Suspended Solids by Filtration Through Glass-Fibre Filters. ISO: Geneva, Switzerland, 1997.
- ISO 7887:2011; Water Quality—Examination and Determination of Colour. ISO: Geneva, Switzerland, 2011.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).