1. Introduction
The increasing amount of pollutants in surface water, ranging from industrial effluents and agricultural runoff to urban and domestic waste, poses a serious threat to the environment and public health. These stressors degrade aquatic ecosystems and compromise access to clean water, which is a critical resource for human well-being and economic resilience. Climate-related factors, such as prolonged droughts, rising temperatures, and changes in hydrological patterns, further exacerbate these issues by affecting water quality and encouraging the growth of harmful algal blooms [
1,
2].
As these challenges become increasingly complex and pressing, there is an urgent need for reliable and rapid on-site monitoring techniques that can detect critical water quality parameters. However, accurately characterizing contaminated water systems remains a significant scientific and technical challenge, particularly in dynamic environmental conditions. Optical methods such as ultraviolet-visible (UV/Vis) and fluorescence spectroscopy have therefore emerged as powerful tools for rapidly and non-invasively analyzing water quality. These spectroscopic techniques enable a non-invasive and rapid assessment of various water quality parameters and are particularly valuable for both qualitative and quantitative evaluations. Their ability to detect a wide range of organic and inorganic substances, dissolved solids, and even indicators for microbial contamination makes them essential tools in environmental monitoring. UV/Vis and fluorescence spectroscopy are characterized by their high sensitivity and relatively low operational costs, making them suitable for continuous and large-scale monitoring scenarios [
3,
4,
5]. In particular, fluorescence spectroscopy offers enhanced selectivity for specific organic compounds, while UV/Vis spectroscopy is frequently used to derive surrogate parameters based on characteristic absorption patterns [
4]. These technologies are widely used in field-based applications, including the deployment of submersible sensors for in situ analysis. This facilitates real-time monitoring and decision-making, especially in the context of wastewater treatment, surface water observation, and ecological studies [
3,
6]. Moreover, recent research has demonstrated the potential of combining spectroscopic data with data-driven approaches such as fuzzy logic and machine learning. These methods allow for improved interpretation of complex spectral datasets and have been successfully applied in scenarios such as algal detection and classification of water contaminants [
7,
8]. Overall, UV/Vis and fluorescence spectroscopy represent robust and cost-efficient techniques for water quality monitoring. Their integration with intelligent evaluation strategies is expected to further expand their capabilities in automated and adaptive environmental monitoring systems [
5].
While UV/Vis and fluorescence spectroscopy offer significant advantages in terms of sensitivity, speed, and cost-effectiveness, their accuracy and reliability are strongly influenced by various sources of uncertainty. This is particularly relevant in in situ applications, where measurements are performed under real-world environmental conditions. Factors such as temperature variations, turbidity, biofouling, sensor drift, and matrix effects lead to considerable uncertainties that often do not occur in laboratory environments or are easier to control.
Table 1 provides an overview of selected scientific publications that have already addressed uncertainties in UV/Vis and fluorescence spectroscopy to varying degrees. However, it also becomes evident that these uncertainties are often not comprehensively considered. In numerous instances, they are only partially discussed or not systematically integrated into the methodological approaches. This highlights the need for a more holistic and structured treatment of uncertainty in spectroscopic water analysis, particularly when aiming for robust in situ applications.
Addressing these challenges, this study explores the potential of in situ water quality monitoring to detect and evaluate algal contamination, focusing particularly on blue-green (cyanobacteria) and green algae. Although the classification system can theoretically identify other parameters, such as turbidity and chemical pollutants, this paper exclusively evaluates its performance using water samples affected by algal blooms. The study also considers the impact of measurement uncertainties, which are a critical factor in real-world monitoring applications. The main objective is to develop and evaluate a methodology for the in situ detection and classification of algae in real environmental conditions while explicitly accounting for uncertainty in spectroscopic measurements. This approach combines UV/Vis and fluorescence spectroscopy with robust, data-driven evaluation techniques to improve the reliability of sensor-based algal detection. The results demonstrate that incorporating uncertainty-aware methods improves the interpretability of spectral data and enables more accurate algal type identification. To ensure both methodical validation and real-world applicability, measurements were initially conducted using a controlled laboratory setup. Based on the same measurement principles, a developed submersible probe is also used for the measurement of the results. Ultimately, these findings will contribute to the development of practical, scalable tools for the early warning and monitoring of harmful algal blooms, an issue that is becoming increasingly relevant in the context of growing ecological pressures and climate-related changes.
The paper is structured as follows: The Introduction outlines the background and summarizes relevant research in the field. This is followed by a detailed description of the proposed methodology for water assessment, including data acquisition, pre-processing, feature extraction, and classification with consideration of uncertainty handling. The
Section 3 presents the findings of a targeted measurement campaign, which involved recording data from algae-affected waters using laboratory and in situ measurements. The subsequent discussion addresses the differences between laboratory and diving probe data collection methods and interprets the results. The article concludes with a summary of the most significant findings and their implications for future research.
2. Water Assessment Methodology
The methodological framework follows a structured workflow that is common in machine learning, covering the following steps: data acquisition, preprocessing, feature extraction, and classification [
21,
22]. Each step is customized to enable precise in situ water quality monitoring, with a focus on detecting algae and harmful substances. A key feature of the approach is the explicit incorporation of measurement uncertainties into the classification process via fuzzy pattern classification. This allows the influence of varying environmental conditions on measurement accuracy to be considered. This approach enhances robustness by effectively handling ambiguous data.
2.1. Data Acquisition
A custom-designed experimental setup was developed to record the absorbance and fluorescence spectra of water samples. This setup was later integrated into a mobile submersible probe for in situ measurements. It includes a miniaturized deuterium-tungsten light source (FiberLight D2, Heraeus, Hanau, Germany) for UV/Vis absorbance measurements, as well as a specially developed LED array for fluorescence and scattered light detection. The measuring cell, housed within a stainless-steel body, features two 90° collimators (PL-25-12-90SS-SLIM2-CO, Plasus, Mering, Germany) positioned opposite each other for precise optical alignment. The UV/Vis light source and spectrometer are connected to the collimators via solarization-resistant fiber optic cables (FG600AEA, Thorlabs, Newton, NJ, USA). The LED array is mounted at a 90° angle to the spectrometer to enable accurate fluorescence measurements. Light signals are detected using a Broadcom Qmini Wide UV spectrometer (San José, CA, USA), which covers a spectral range of 225–900 nm with a resolution of 1.5 nm.
All components are controlled by an embedded Raspberry Pi 4 Model B system, which manages the LED driver (TLC59108F, Texas Instruments, Dallas, TX, USA) via an I
2C bus and communicates with the spectrometer via USB. The entire system is designed to minimize external interference. The probe can be submerged directly in bodies of water, where it pumps water through the measuring cell, takes a reading, and then drains the sample. Measurements can be conducted with the pump turned off to ensure stability. The configuration of the probe and the experimental setup are illustrated schematically in
Figure 1, with the blue LED denoting excitation at 440 nm, the yellow LED representing excitation at 590 nm, and the black LED indicating excitation at 850 nm. It is important to note that these light-emitting diodes can also be modified. However, this study requires these excitation light sources. Further technical and software details are provided by Goblirsch et al. [
23].
Reference substances for algal detection were created using dilution series prepared with spinach extract (as a source of chlorophyll) and phycocyanin (a pigment found in blue-green algae). Spinach extract was prepared using the method described in DIN EN 17899, Annex C.3, except that water was used instead of ethanol as the solvent. As the extract contains various undefined compounds, a stock solution was first prepared and subsequently diluted to achieve concentrations ranging from 0.25% to 40% of the stock solution. Spectra were recorded using UV/Vis absorbance, fluorescence (excitation at 440 nm) and scattered light (excitation at 850 nm). To ensure robust classifier development, the measurements were repeated over several days. The variations observed in the spectra, even with identical solvent compositions, were attributed to the heterogeneous nature of the spinach extract. This variability mirrors the natural variation in the composition of real-world algal samples. Furthermore, phycocyanin fluorescence spectra were recorded at an excitation wavelength of 590 nm. As cyanobacteria contain both chlorophyll and phycocyanin, the latter serves as a specific marker.
2.2. Data Preprocessing
The first step is to perform a temperature correction. Temperature fluctuations can occur continuously in both the laboratory and on-site in the water, which directly affects the experimental setup and measurements. To analyze the impact of temperature, the entire experimental setup was placed in a climate chamber (FTD 100), where different temperatures can be set to observe the effects of temperature changes. To compensate for the effect of temperature, a factor is determined that normalizes the spectra to a reference temperature of 21.3 °C. The influence of temperature also depends significantly on the specific light source used. Therefore, a temperature correction factor is calculated for each light source and applied automatically during data preprocessing. This correction has been verified as valid up to approximately 30 °C. Beyond this range, however, deviations may occur due to non-linear thermal effects.
In addition, the UV/Vis spectra are converted to absorbance values using Lambert’s law (see Equation (
1)). At the start of each measurement, a blank spectrum of pure water is recorded. The absorbance spectrum is then obtained by subtracting this blank from the measured spectrum as part of the preprocessing step to improve the accuracy of the results. The absorbance
A is defined as
where
is the intensity of the incident light, and
I is the transmitted light intensity. According to Lambert’s law, the transmitted intensity depends on the molar absorptivity
, the concentration
c, and the path length
d. For fluorescence measurements, the blank spectrum is also subtracted from the measured intensity
I during preprocessing to yield the fluorescence intensity
F:
where
is the blank intensity and
I is the sample intensity.
To reduce noise in the spectra and improve data quality, a pre-processing filter is applied to the data. Due to the desired linear phase characteristics, a 30th-order FIR low-pass filter is used. To avoid phase shifts in the spectra, zero-phase filtering is performed using bidirectional filtering. The FIR filter is configured with a passband frequency of and a stopband frequency of , where is the sampling rate.
An important aspect of data pre-processing is turbidity compensation. In real-world measurements, turbidity is always present and has a significant effect on the UV/Vis spectra. The turbidity compensation method used here has been described in detail in a previous publication by Penzel et al. [
24]. To account for turbidity, scattered light spectra are recorded using an excitation wavelength of 850 nm. This spectral region is highly sensitive to turbidity-induced scattered light, making it ideal for estimating turbidity levels. The intensity of the scattered light spectrum is used as a quantitative measure of turbidity (expressed in FNU) to adjust the UV/Vis spectra. Subtracting the turbidity-related contribution from the UV/Vis spectrum effectively isolates the true absorbance. The exact procedure and algorithmic implementation can be found in the referenced publication. In this study, turbidity compensation is applied to UV/Vis spectra automatically. This compensation method is valid up to turbidity levels of around 20 FTU. For higher turbidity values, the accuracy of the correction decreases due to the effects of multiple scattering.
Figure 2 exemplarily shows the fully preprocessed spectra of spinach extract at different dilution levels, recorded using three optical measurement methods. Displayed are (a) UV/Vis absorbance spectra, (b) fluorescence spectra with excitation at 440 nm, and (c) scattered light spectra with excitation at 850 nm. These spectra represent the final processed data used for feature extraction and subsequent classification.
2.3. Feature Extraction
Following preprocessing, feature extraction is performed to derive characteristic parameters from the spectra. For feature extraction, a reference spectrum is first created for each substance at a specific concentration or dilution series value. This reference spectrum
is composed of a sum of Gaussian functions.The reference spectrum is defined as follows:
where
is the wavelength,
N is the number of Gaussian components,
represents the amplitude (peak height),
the center wavelength, and
the width of each Gaussian function. The scaling factor
k accounts for the intensity variations between the spectra. These parameters are optimized iteratively by identifying local maxima in the absorbance or difference spectra and then subtracting fitted Gaussians until the residual spectrum is almost flat or contains no significant peaks. This approach enables a mathematically defined reference spectrum to be constructed for each substance (e.g., algae substance, turbidity, pure water, etc.). Consequently, each recorded spectrum can be described by its respective Gaussian-based reference spectrum. A key extracted feature is the determination coefficient
, which quantifies the match between the measured spectrum and the reconstructed reference. This serves as a measure of the conformity of the substance. This procedure is illustrated in
Figure 3, using spinach spectra at a specific dilution level as an example. The Figure demonstrates how the fitted Gaussian functions approximate the original measurement and how the reference spectrum is constructed.
2.4. Fuzzy Classification
To classify the extracted spectral features while accounting for measurement uncertainties, a fuzzy pattern classification approach is applied. This method, developed by Bocklisch and colleagues [
25,
26], allows uncertainties (particularly those caused by varying environmental conditions) to be explicitly integrated into the classification process.
The fuzzy pattern classifier is a supervised learning method consisting of two phases: a learning phase and a working phase. During the learning phase, the extracted features are initially grouped using a conventional classification method, such as cluster analysis or an expert-based approach. These sharp classes are then transformed into fuzzy, multidimensional sets using the Aizerman potential function:
where
is the point of maximum membership (i.e., the representative value), and
b,
c, and
d define the shape, width, and slope of the membership function on each side. The fuzziness is determined by
, and the scope of uncertainty is described by
. The objects are unified into one-dimensional sets and then transformed into multidimensional fuzzy pattern classes using an N-fold compensatory Hamacher operator [
27]:
During the working phase, the resulting classifier maps new, unknown samples to all defined classes and assigns them membership values. Final classification is based on the class with the highest membership. The general structure and functionality of the fuzzy pattern classification system for assessing water quality parameters have already been tested and validated in previous studies [
7,
28]. These publications demonstrated the model’s capability to handle uncertainty and imprecision in environmental data, confirming its suitability for real-world water monitoring applications.
To train the classifier to assess algae and harmful substances in water bodies while taking uncertainties into account, spectra were recorded for various representative substances, including spinach (as an algae analogue), phycocyanin (a blue-algae pigment), turbidity standards, uranine (a foreign substance) and pure water. The spectroscopic data is collected under controlled laboratory conditions (temperature = 22 °C, pressure = 1.013 bar). All spectra were recorded using four measurement methods: UV/Vis absorbance, fluorescence with excitation at 440 nm, fluorescence with excitation at 590 nm, and scattered light with excitation at 850 nm.
The dataset includes the following measurements:
40 spectra from spinach dilution series ranging from 0.25% to 40%,
22 spectra of phycocyanin solutions with concentrations between 10 mg L−1 and 700 mg L−1,
12 spectra of samples with moderate turbidity (2–10 FNU),
12 spectra of samples with high turbidity (10–20 FNU),
11 spectra of pure water,
22 spectra of uranine solutions at various concentrations, recorded as a foreign substance.
These spectra were preprocessed and the features extracted, as previously described. The resulting training dataset comprises 400 of these features. The extracted
characteristics were then categorised into distinct groups based on the expertise of specialists in water quality assessment and spectral analysis. This expert-driven classification initially resulted in six sharply defined categories: the presence of blue algae, the presence of green algae, moderate turbidity, high turbidity and/or the presence of foreign substances (such as intensity residuals in absorbance), the presence of another unknown substance, and cases in which no detectable substance could be identified. The result is shown in
Figure 4.
To model the fuzzy pattern classes, the Aizerman membership function was applied using the parameter set
,
, and
for each object individually. These parameter values are widely recommended in the literature for the construction of normalized multivariate fuzzy membership functions, as they have proven effective in practical applications [
7,
29]. The corresponding class-specific fuzzy membership functions are then automatically derived based on the distribution of objects within each class. For
an analysis of the measurement uncertainties during data acquisition and evaluation was carried out in accordance with the
Guide to the Expression of Uncertainty in Measurement (GUM) [
30,
31]. The expanded uncertainty
U, calculated following GUM guidelines, was used as the basis for setting the parameter
in the membership function. Specifically, the expanded uncertainty was determined to be
for UV/Vis measurements and
for fluorescence and scattered light measurements. The detailed calculation procedure using the example of absorption is documented in [
32].
The result is a five-dimensional fuzzy pattern classifier. The corresponding calculated parameters are stored in the
Appendix A in
Table A1. Since a complete visualization of the fuzzy classifier in five dimensions is not feasible, a breakdown into four 3D projections is presented in
Figure 5, allowing for clearer interpretation and evaluation.
3. Results
To validate the proposed assessment method, a water sample was taken from Auensee Leipzig (Leipzig, Germany), a lake prone to cyanobacterial blooms during the summer months. These blooms have led to repeated usage restrictions over the years, posing ecological and health risks to humans and animals. Given this well-documented issue, lake Auensee is an ideal location in which to evaluate the classifier’s ability to detect harmful algae [
33].
In August 2024, water samples were collected and analyzed in the laboratory. The measurement setup included UV/Vis spectroscopy, fluorescence measurements at excitation wavelengths of 440 nm and 590 nm, and scattered light measurements at 850 nm. Although the measurements were taken in a laboratory, the experimental setup closely resembles the intended in situ measurements that will be taken using the submersible probe. This enables the classifier to be tested and performed under controlled conditions before deployment in the field. It is worth noting that the sample was collected near the bank of the sea and contained a high number of particulate components, resulting in considerable turbidity. This high turbidity affected the measurements, even after applying the turbidity compensation algorithm. In extreme cases, such turbidity can mask spectral peaks entirely, making the detection of characteristic features difficult or impossible.
The acquired spectra were fully preprocessed by the described steps. The results are shown in
Figure 6. Although multiple spectra were recorded, they exhibited highly similar features across all measurements. Therefore, only the first three representative spectra are presented and discussed here. The remaining spectra showed nearly identical profiles and did not provide additional information. Notably, the turbidity level displayed in
Figure 6c reaches the upper limit of the measurement system (approximately 40,000 counts), which highlights the intensity of scattered light present in the sample.
Subsequently, the spectra were analyzed using the feature extraction method specific to each measurement technique. As previously discussed, despite the preliminary turbidity compensation, the UV/Vis spectra showed only low detection performance for algae-like substances. This is reflected in the low coefficients of determination (
values between 0.3 and 0.4). In contrast, the fluorescence and scattered light spectra enabled clear identification of the corresponding reference spectra, with
values ranging from 0.7 to 0.9. The resulting dataset (see
Table 2) comprises three objects and twelve features, which can be input into the five-dimensional fuzzy pattern classifier introduced previously.
Object identification is performed again with the corresponding inherent fuzziness, derived from the calculated extended uncertainty. The classification results are shown in
Figure 7 as split three-dimensional top-view representations. The precise membership values are listed in
Table 3.
All three objects are assigned to Class 1 (presence of blue algae), with membership values exceeding 0.5. Compared to the values for the other classes, this clearly indicates contamination by blue algae. Therefore, the performance of the fuzzy pattern classifier can be considered validated. The consistent assignment of all objects to Class 1 confirms the fundamental validity of the classification model. As Lake Auensee is officially recognized as being contaminated with blue algae in the summer, this correct classification demonstrates that the model can reliably detect such pollution. This highlights the system’s potential to provide targeted warnings and actionable recommendations, enabling early intervention and minimizing risk.
In September 2024, water samples from Lake Auensee in Leipzig were analyzed using both the laboratory-based measurement setup and a submersible probe. For the laboratory setup, UV/Vis spectroscopy and fluorescence measurements were carried out at excitation wavelengths of 440 nm, 590 nm, and 850 nm. Each sample was measured in triplicate to ensure reproducibility. Additionally, on 24 September 2024, the same lake was sampled again using a submersible probe, applying identical spectral techniques. The resulting spectra from both measurement approaches were preprocessed and are shown in
Figure 8. As in the previous case, several spectra were acquired. However, due to their high similarity, only the first three are shown and used as representative examples. The top row of
Figure 8 displays the four spectral groups (440 nm, 590 nm, 850 nm fluorescence, and UV/Vis absorbance) obtained from the laboratory setup, while the bottom row contains the corresponding spectra captured by the submersible probe.
Subsequently, identical feature extraction methods were applied to all eight spectra. The results for feature
are zero. This is because the feature extraction algorithm is programmed to automatically assign a value of zero in cases of poor fit. The resulting
-based features from both measurement types are consolidated in
Table 4. This dataset includes three objects (samples) for each method and a total of 36 extracted features.
These features, incorporating their respective elemental uncertainties, were evaluated using the fuzzy pattern classifier. The resulting membership values (see
Table 5) show that, for both laboratory and in situ probe data, all three samples from Lake Auensee were consistently assigned to Class 2 (green algae contamination). Membership values for Class 1 (blue-green algae presence) were also notably high in both cases, indicating potential co-occurrence of both algal types.
Given these findings, a clear algal contamination can be assumed. Although the classifier assigns the dominant presence to green algae, the elevated values for blue-green algae warrant a warning and further investigation to verify the exact nature and extent of the contamination.
5. Conclusions
This study presents an approach for in situ water quality monitoring, focusing on the detection and classification of algal and harmful substances in water bodies, with explicit consideration of measurement uncertainties. Although the study focused specifically on blue-green (cyanobacteria) and green algae, this approach can also be used to detect other substances, such as turbidity and harmful or unknown substances. The system combines UV/Vis and fluorescence spectroscopy with a fuzzy pattern classifier to enable reliable operation in laboratory and real-world environments. Comprehensive pre-processing steps are applied to the data, including spectral normalization, noise reduction, and correction of systematic deviations, to minimize measurement uncertainty and enhance signal quality. These measures ensure that the input data is well prepared for accurate analysis. Furthermore, a feature extraction strategy that evaluates the full spectral profile is used, rather than focusing on individual wavelengths or peak intensities. This allows subtle yet meaningful spectral patterns associated with different types of algae to be identified. To evaluate the system’s transferability to field applications, measurements were first conducted using a laboratory setup and subsequently with a developed submersible probe designed for in situ deployment. Although minor deviations were observed between the two systems due to environmental and optical differences, the classification results remained consistent. Explicitly addressing measurement uncertainties throughout the data processing and classification stages enables the system to achieve greater robustness and reliability when interpreting complex spectral data. The results from laboratory and in situ measurement systems confirm that the classifier can adapt to environmental variability and deliver accurate, consistent results, even under challenging conditions.
Overall, combining a mobile submersible probe with refined spectral pre-processing, comprehensive feature analysis, and uncertainty-aware classification creates a robust basis for modern water quality monitoring. This system is particularly valuable for environmental surveillance and the early detection of harmful algae, providing a practical tool for the sustainable management of aquatic ecosystems in the face of growing ecological and climate-related challenges.
While this study demonstrates the potential of the proposed system, further research is needed to validate its performance in different bodies of water and environmental conditions. In particular, more in situ measurements from varied, ecologically diverse sites are essential to improve the robustness and generalizability of the classification.
Future work should also include long-term field studies, particularly during different bloom stages, as well as expanding the training dataset to enhance classification accuracy. Additionally, integrating advanced algorithms and exploring additional spectral features could enhance sensitivity and reliability further. Continued development and field validation are therefore crucial steps towards establishing a scalable, practical tool for the early detection of harmful algal blooms and improved water quality monitoring.