Article

Water Quality Monitoring: A Water Quality Dataset from an On-Site Study in Macao

Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(8), 4130; https://doi.org/10.3390/app15084130
Submission received: 5 March 2025 / Revised: 2 April 2025 / Accepted: 5 April 2025 / Published: 9 April 2025
(This article belongs to the Special Issue Novel Approaches for Water Resources Assessment)

Abstract

The Building Safe Water Use Plan promoted by the Macao Marine and Water Bureau aims to encourage property management entities to regularly maintain building water supply systems to ensure the safety and stability of drinking water. However, traditional laboratory testing methods are often time-consuming and labor-intensive, making real-time and efficient water quality monitoring challenging. To address this issue, this study proposes a Raspberry Pi-based multi-sensor system for rapid water quality detection and improved monitoring efficiency. This system integrates multiple sensors to measure key water quality parameters, such as pH, total dissolved solids (TDSs), temperature, and turbidity, while recording data in real-time. The data were continuously collected over a period of five months (July to November 2024). The collected data were analyzed and validated using machine learning algorithms, including Isolation Forest, Random Forest, Logistic Regression, and Local Outlier Factor. Among these models, Random Forest exhibited the best overall performance, achieving an accuracy of 98.10% and an F1 score of 98.99%. These results show that the dataset demonstrates high reliability in anomaly detection and classification tasks, accurately identifying deviations in water quality. This approach not only enhances the efficiency of water quality monitoring but also provides technological support for urban drinking water safety management.

1. Introduction

Located in the south of China, in the Pearl River Delta, Macao is an international free port, one of the world’s tourism and leisure centers, and one of the most densely populated regions in the world [1].
As there are no local rivers in Macao, rainwater harvesting and storage facilities are very limited, and there are no conditions for the construction of large-scale water storage projects. The source of drinking water has long been affected by salty tides, and local water resources have always been very scarce. Especially since 2005, with the intensification of the impact of salty tides, the security of Macao’s water supply has been seriously affected.
At present, about 96% of Macao’s daily raw water is supplied by Zhuhai, a neighboring city in Guangdong Province, China, and Macao’s main source of raw water. The local reservoirs, which mainly serve as backup and emergency storage, have a total effective storage capacity of 1.9 million cubic meters, equivalent to about seven days of current water demand [1]. The Macao Special Administrative Region (SAR) government has always attached great importance to water conservation, treating it as a prerequisite for the development, utilization, and protection of water resources. Over the years, Macao has implemented a number of measures to reduce water consumption and expenditure, which have greatly improved the efficiency of water resource utilization. The quality standards of the drinking water in Macao are implemented in accordance with the “Standards and Rules for the Quality of Water for Human Consumption” of the Macao Water Supply and Drainage Regulations. To enhance the safety and security of water quality, the government has revised the “Regulations on Water Supply and Drainage in Macao”; the revision mainly increases the number of monitoring items, revises the values of some microbiological, physical, and chemical water quality index parameters and analytical methods, and updates the minimum sampling frequency and the number of samples [2].
To raise the awareness of owners and property management companies regarding the maintenance and management of building water systems, the Macao Marine and Water Bureau (MWB) launched the “Building Water Safety Program” in 2018, in which more than 50% of all the high-rise buildings in Macao participate. The program has encouraged the owners and property management companies of more than 820 high-rise buildings to properly manage the internal water supply facilities of their buildings in accordance with the “Guidelines on Maintenance of Tap Water System in Buildings” and to carry out regular inspections, maintenance, and cleaning, including washing of water tanks and maintenance of water supply facilities at least once every six months, so that the drinking water supply in their buildings remains hygienic, safe, and stable. Tap water supply in buildings refers to the direct or indirect supply of urban public water to users. Direct water supply means that the water supply pipes inside a building utilize the excess water pressure of the public water supply network to deliver tap water directly to consumers, which is applicable to low-rise buildings. Indirect water supply means that the tap water is stored, pressurized, and then supplied to consumers, which is applicable to high-rise buildings. A building’s water supply system mainly consists of underground and rooftop storage tanks, pipes, valves, pressurization equipment, and water appliances [3].
Without proper maintenance and management of the water supply system, such as regular cleaning and disinfection of the tanks and the replacement of corroded pipes and tank components, the quality and stability of the water supply to a building will be affected. Therefore, it is the responsibility of building owners or property management entities to properly manage the tap water systems of their buildings and carry out regular inspection, maintenance, and cleaning to avoid the discoloration and uncleanliness of tap water, and to ensure the hygiene, safety, and stability of the water supply to their buildings. Under the scheme, regular application reports must be submitted to the relevant water quality testing institutions for laboratory testing. Traditional laboratory testing of water quality, despite its high accuracy and reliability [4], has significant limitations. First, the testing cycle is long (usually a few hours to a few days), which makes it difficult to respond to sudden pollution incidents in a timely manner, and discrete sampling fails to capture dynamic changes in water quality. Second, it relies on expensive, sophisticated instruments and professional manpower; equipment maintenance, reagent consumption, and sample transportation and preservation are all costly, especially in remote areas. In addition, detection is tied to the laboratory environment and is difficult to implement on site, and discrete sampling may yield spatially uneven or unrepresentative data.
At the same time, traditional methods mostly target specific routine pollutants (e.g., COD and heavy metals) [5], with insufficient coverage of emerging trace pollutants (e.g., microplastics and antibiotics), and samples are easily contaminated or lost during transportation and pre-treatment, which further affects accuracy. For these reasons, on-site rapid testing, online monitoring, and intelligent technology have gradually become important ways to make up for the shortcomings of traditional methods. To address these shortcomings, we propose an intelligent IoT-based water quality monitoring system, integrating a multi-sensor array with Raspberry Pi for real-time detection and anomaly classification. Our system continuously monitors pH, total dissolved solids (TDSs), temperature, and turbidity, four key physicochemical parameters that are directly linked to over 60% of global drinking water contamination events, according to reports from the World Health Organization (WHO) and the US Environmental Protection Agency (EPA). While our system does not directly detect microbial or chemical contaminants (e.g., heavy metals), these pollutants often correlate with turbidity or pH fluctuations, allowing for indirect risk assessment. Our proposed system offers significant improvements over traditional IoT-based water quality monitoring solutions. Traditional systems primarily rely on microcontrollers (e.g., STM32 and ESP8266) or embedded devices with limited computational capabilities, requiring cloud-based data processing. In contrast, our system leverages a Raspberry Pi-based architecture with enhanced computing power, enabling local data processing and real-time edge computing.
From a sensor integration perspective, traditional systems often incorporate a single or a limited number of sensors (e.g., pH or TDS), making comprehensive water quality assessment difficult. Our system, however, integrates multiple sensors (pH, TDSs, temperature, and turbidity), significantly enhancing its water quality evaluation accuracy. Additionally, traditional systems predominantly use fixed-threshold anomaly detection mechanisms, which struggle to adapt to complex and dynamic environmental conditions. In response, our system employs a machine-learning-based triple-classification model (normal, borderline, and abnormal), improving the anomaly detection accuracy and enabling adaptive responses.
Another limitation of conventional IoT systems is their dependence on cloud computing, which introduces latency and data transmission delays. Our system mitigates this by processing data locally, significantly reducing response time and enabling real-time decision-making. Moreover, our system meets 60% of the water quality monitoring standards set by the World Health Organization (WHO) and the US Environmental Protection Agency (EPA), making it a more reliable alternative for urban water management.
A key advantage of our system lies in its enhanced data availability. Traditional systems typically operate with long sampling intervals, failing to capture rapid fluctuations in water quality. In contrast, our system performs continuous data collection with a sampling interval of 5 s, providing a more comprehensive representation of water quality trends. Lastly, regarding deployment and cost, conventional systems often depend on external cloud servers, increasing operational costs. Our system, built on open-source technology and low-cost hardware, significantly reduces deployment costs and enhances scalability, making it a cost-effective solution for large-scale water monitoring.
The key contributions of this study are as follows:
  • Real-time and cost-effective monitoring: By using low-cost sensors and a modular Raspberry Pi-based architecture, the system enables continuous data collection at significantly lower costs than commercial systems [4].
  • Comprehensive anomaly detection: A boundary-condition-based triple-classification rule categorizes water quality into normal, borderline, and abnormal states, allowing for early warning and adaptive responses.
  • Compliance with international standards: Our system meets 60% of the WHO and EPA physicochemical water quality monitoring criteria, making it a robust alternative for urban water management.
  • Macao water quality dataset: We have established a high-quality water quality dataset for Macao, which provides a data foundation for future water management in Macao.

2. System Design

In this project, pH sensors [6], TDS sensors [7], temperature sensors (DS18B20) [8], and turbidity sensors [9] were connected to the Raspberry Pi via an Arduino for data acquisition and transmission. Each sensor is connected to either an analog or a digital input port on the Arduino: the pH and TDS sensors are connected to analog pins A0 and A1, respectively, the turbidity sensor to analog pin A3, and the temperature sensor (DS18B20) to a digital pin (e.g., D2) via the 1-wire protocol. The Arduino converts the analog signals from the pH, TDS, and turbidity sensors into digital values via its analog-to-digital converter (ADC) [10] and transmits the processed data to the Raspberry Pi in real time via serial communication using the UART protocol [11]. Transfers take place at fixed intervals, typically every 5 s, and the sampling period is the same for every sensor so that all sensors collect data synchronously. Each sensor’s data are packaged in a structured format (usually JSON) so that they are easy for the Raspberry Pi to parse and process [12]. Once the data are received, the Raspberry Pi reads the transmitted information using a Python (version 3.11; https://www.python.org/) script and stores it in a local SQLite (version 3.44; https://www.sqlite.org/) database for subsequent processing and analysis. The workflow then proceeds with data preprocessing, which includes cleaning and preparing the collected data for classification. The processed data are classified into three levels based on predefined thresholds for each sensor parameter: meets physicochemical criteria, boundary conditions, and does not meet physicochemical criteria.
These classifications help to assess the water’s quality and determine its suitability for consumption. To validate and improve the classification accuracy, machine learning models such as Isolation Forest, Random Forest, Logistic Regression, and Local Outlier Factor are used for anomaly detection and model validation. The results from these models are analyzed and evaluated to ensure the system’s reliability in classifying water quality. The validation process involves comparing the predicted classifications with actual water quality data and adjusting the models as needed to improve their performance. The outcome is a robust water quality monitoring system that uses both sensor data and machine learning techniques to provide real-time insights into water safety. The workflow of the system is illustrated in Figure 1. The process starts with data acquisition and transmission, followed by preprocessing, classification, and machine-learning-based validation to assess water quality.
The system’s hardware structure, as shown in Figure 2, consists of multiple sensors connected to an Arduino, which then transmits data to a Raspberry Pi for further processing and storage in an SQLite database. This architecture ensures efficient data acquisition and real-time monitoring.
To verify the implementation, a simulation setup was developed, as shown in Figure 3.
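The acquisition path described in this section (JSON records arriving over UART and being stored in SQLite) can be illustrated with a minimal sketch. The field names, the table layout, and the helper names `parse_reading`, `open_database`, and `store_lines` are assumptions made for illustration and are not taken from the study; the serial port is replaced by a plain list of text lines so that the example runs without hardware.

```python
import json
import sqlite3

# Assumed field names; the text only states that the payload is usually JSON.
FIELDS = ("ph", "tds", "temperature", "turbidity")


def parse_reading(line):
    """Parse one JSON line from the Arduino; return None if it is malformed."""
    try:
        payload = json.loads(line)
        return tuple(float(payload[name]) for name in FIELDS)
    except (ValueError, KeyError, TypeError):
        return None


def open_database(path=":memory:"):
    """Open the database and create the (hypothetical) readings table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "recorded_at TEXT DEFAULT CURRENT_TIMESTAMP, "
        "ph REAL, tds REAL, temperature REAL, turbidity REAL)"
    )
    return conn


def store_lines(conn, lines):
    """Store every well-formed line and return how many rows were inserted."""
    stored = 0
    for line in lines:
        reading = parse_reading(line)
        if reading is None:
            continue  # skip truncated or corrupted serial frames
        conn.execute(
            "INSERT INTO readings (ph, tds, temperature, turbidity) "
            "VALUES (?, ?, ?, ?)",
            reading,
        )
        stored += 1
    conn.commit()
    return stored


# Stand-in for lines read from the UART port.
sample_lines = [
    '{"ph": 7.12, "tds": 86.0, "temperature": 27.5, "turbidity": 0.9}',
    '{"ph": 7.10, "tds": ',  # truncated frame, ignored
]
conn = open_database()
stored_count = store_lines(conn, sample_lines)
```

Dropping malformed frames instead of raising keeps a long-running logger alive when a serial line is cut off mid-record, at the cost of silently losing that sample.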

2.1. Hardware Components

According to the World Health Organization (WHO) [13] and the US Environmental Protection Agency (EPA) [14], about 60% of drinking water contamination events originate from physicochemical anomalies, so we chose pH, TDSs (total dissolved solids), turbidity, and temperature as the main water quality monitoring parameters to comprehensively reflect the physicochemical characteristics of the water body. In selecting the sensors (Table 1), we considered core indicators such as accuracy, sensitivity, price, power consumption, and calibration cycle to ensure data accuracy, system stability, and the feasibility of long-term operation. In the end, we chose DS18B20 (Manufacturer: Maxim Integrated, San Jose, CA, USA), DFROBOT Gravimetric TDS (Manufacturer: DFROBOT, Beijing, China), DFROBOT SEN0189 (Manufacturer: DFROBOT, Beijing, China), and pH-4502C (Manufacturer: DFROBOT, Beijing, China) as the main monitoring sensors. DS18B20 sensors are low-cost, highly resistant to interference, and have a calibration cycle of up to 6 months, making them an ideal choice for long-term deployment of an IoT monitoring system. The measuring accuracy of the DFROBOT Gravimetric TDS sensor is lower than that of high-end models, but its price advantage and stability make it suitable for applications with limited budgets. Further, DFROBOT SEN0189 is a low-cost turbidity sensor that provides sufficient accuracy for non-precision measurement tasks, and pH-4502C sensors offer the accuracy and stability needed for long-term water quality monitoring and the low power consumption needed for online monitoring systems.

2.1.1. pH Sensor

The pH sensor measures the acidity or alkalinity of water on a scale from 0 to 14, where values below 7 indicate acidity, values above 7 indicate alkalinity, and 7 represents neutrality. Its output voltage is linearly correlated with the pH level, and it is interfaced with the Arduino via an analog input pin. The typical measurement error of the pH sensor is ±0.1 pH. The accuracy may be affected by calibration status, ambient temperature, and sensor aging.
Figure 4 illustrates the relationship between the sensor output voltage and pH. The formula for calculating the pH is given as [15]
$$\mathrm{pH} = -5.887 \cdot V + 21.677$$
where V is the voltage measured by the sensor.

2.1.2. TDS Sensor

The TDS sensor measures the total dissolved solids (TDSs) in water [16] by assessing its conductivity, reflecting the concentration of dissolved substances such as salts, minerals, and metal ions, and it is used to evaluate water purity in various applications, with a measuring range of 0–1000 ppm and a resolution of 1 ppm [17]. The measurement error of this sensor is typically ±10 ppm or 5% of the reading, whichever is greater. Factors such as water temperature, calibration status, and environmental conditions may affect accuracy.
The formula for calculating the TDSs is provided in [18].
The following cubic polynomial describes the relationship between TDSs and the sensor output voltage (Figure 5):
$$\mathrm{TDS} = 66.71 \cdot V^{3} - 127.93 \cdot V^{2} + 428.7 \cdot V$$
where
  • V: Sensor output voltage (in volts, V);
  • TDS: Total dissolved solid concentration (in ppm).
This equation is derived from experimental calibration and provides a precise estimation of TDSs based on the sensor’s voltage output.

2.1.3. Turbidity Sensor

The turbidity sensor measures the cloudiness or clarity of water by emitting a light beam through the liquid [19] and measuring the intensity of light scattered or absorbed by suspended particles, with the output voltage corresponding to turbidity levels in NTUs (nephelometric turbidity units), crucial for detecting contamination caused by particles like silt, algae, and organic matter. The measurement error of the turbidity sensor is typically ±2 NTU or 5% of the reading, whichever is greater. Accuracy is influenced by factors such as light source stability, sensor contamination, and sample consistency.
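The following quadratic polynomial describes the relationship between NTU and the sensor output voltage (Figure 6). The formula for converting voltage to NTU is given as
$$y = -1120.4 \cdot x^{2} + 5742.3 \cdot x - 4352.9$$
where x is the sensor output voltage (in volts) and y is the turbidity (in NTU).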
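The three voltage-to-parameter conversions in Sections 2.1.1 to 2.1.3 can be collected into a short sketch. It reads the pH slope and the second and last turbidity coefficients as negative and the quadratic TDS term as subtracted, since the formulas give physically implausible values otherwise. The 10-bit ADC, the 5 V reference, and the clamping of negative turbidity values to zero are further assumptions that the text does not state.

```python
ADC_MAX = 1023  # assumption: 10-bit ADC
V_REF = 5.0     # assumption: 5 V analog reference


def adc_to_voltage(raw):
    """Convert a raw ADC count to volts."""
    return raw * V_REF / ADC_MAX


def ph_from_voltage(v):
    """Linear pH calibration (negative slope assumed)."""
    return -5.887 * v + 21.677


def tds_from_voltage(v):
    """Cubic TDS calibration, result in ppm."""
    return 66.71 * v**3 - 127.93 * v**2 + 428.7 * v


def ntu_from_voltage(v):
    """Quadratic turbidity calibration, result in NTU (clamped at zero)."""
    return max(0.0, -1120.4 * v**2 + 5742.3 * v - 4352.9)
```

With these readings of the formulas, 2.5 V on the pH channel corresponds to a pH of about 6.96, 1.0 V on the TDS channel to about 367 ppm, and 4.2 V on the turbidity channel to about 0.9 NTU.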

2.1.4. Temperature Sensor

The DS18B20 temperature sensor measures water temperature using the 1-wire communication protocol [20] over a range of −55 °C to +125 °C. Its typical measurement error is ±0.5 °C within the recommended operating range of 0 °C to 85 °C; at extreme temperatures (close to −55 °C or 125 °C), the error may slightly increase. For optimal accuracy, the sensor should be used within its recommended range.

3. Data Acquisition Methods

3.1. Sampling Location and Duration

The project scope of the Macao Management Work Package in Buildings covers all buildings using secondary water supply systems in Macao, including residential buildings, commercial buildings, industrial buildings, schools, hotels, and casinos. Some school dormitories were selected as sampling points for water quality monitoring based on actual demand, and water quality data were collected on both regular and irregular bases. The study focuses on five dormitories, with the monitoring period spanning from July to November 2024. Regular sampling was scheduled at three fixed time slots (8:00 a.m., 2:00 p.m., and 8:00 p.m.) to cover the major water usage periods and capture fluctuations in water quality throughout the day. Additionally, unscheduled sampling was conducted based on specific demands to enhance dataset completeness and representativeness.

3.1.1. Selection of Sampling Interval

To ensure high temporal resolution and data accuracy, a 5 s sampling interval is selected for water quality monitoring. As shown in the frequency spectrum analysis (Figure 7), the 5 s interval provides higher signal amplitude in the low-frequency range while reducing noise in the high-frequency domain. In contrast, data collected at a 30 s interval show rapid attenuation in the low-frequency region, which may lead to information loss. The higher temporal resolution of the 5 s interval allows the system to capture rapid variations in water quality parameters, such as transient contamination events or sudden anomalies.
Further validation is provided by power spectral density (PSD) analysis (Figure 8), which demonstrates that the PSD distribution of the 5 s interval remains relatively stable, effectively reflecting actual water quality trends. Conversely, the PSD of the 30 s interval exhibits significant fluctuations, potentially causing low-frequency information loss. The presence of large power fluctuations at certain frequency bands suggests that longer sampling intervals may introduce aliasing effects, compromising the integrity and analytical reliability of the water quality data.
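The aliasing concern raised above can be made concrete with the Nyquist limit: a sampling interval of Δt can only represent frequencies up to 1/(2Δt). The sketch below compares the 5 s and 30 s intervals. The 40 s fluctuation period is a hypothetical value chosen for illustration and does not come from the dataset.

```python
def nyquist_frequency(interval_s):
    """Highest frequency (Hz) representable at a given sampling interval."""
    return 1.0 / (2.0 * interval_s)


def apparent_frequency(signal_hz, interval_s):
    """Frequency (Hz) at which a sinusoid appears after sampling."""
    sample_rate = 1.0 / interval_s
    nearest_multiple = round(signal_hz / sample_rate) * sample_rate
    return abs(signal_hz - nearest_multiple)


fluctuation_hz = 1.0 / 40.0  # hypothetical fluctuation with a 40 s period
```

At a 5 s interval the Nyquist limit is 0.1 Hz, so the 0.025 Hz fluctuation is recorded at its true frequency. At a 30 s interval the limit drops to about 0.0167 Hz, and the same fluctuation shows up at roughly 0.0083 Hz, that is, as a spurious 120 s oscillation.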

3.1.2. Sampling Point Installation

The selection of monitoring locations is based on site conditions and residents’ water usage patterns. Water quality monitoring equipment is installed in key areas, including the following:
  • Kitchen sinks: Reflecting water used for drinking and cooking.
  • Bathroom sinks: Monitoring water used for personal hygiene.
  • Toilets: Assessing the quality of water used for flushing [21].
  • Utility sinks: Providing insights into water use in dormitory common areas.
By integrating strategically selected monitoring points with an optimized sampling interval, the proposed monitoring framework enhances the accuracy, completeness, and representativeness of the dataset. Future research will explore further optimization strategies, such as wavelet analysis, to enhance detection capabilities and improve system performance.

3.2. Sensor Calibration

The pH, TDS, temperature (DS18B20), and turbidity sensors used in this system may be affected in practice by a variety of factors, such as environmental conditions, degradation over time, and external disturbances. To ensure the accuracy and reliability of water quality monitoring, we developed a calibration and maintenance program for the sensors that preserves their long-term stability and measurement accuracy.

3.2.1. pH Sensor Calibration

Before first use or after prolonged storage, the pH sensor needs to be immersed in a 3N KCl solution for at least 8 h to activate it. This process helps to restore the electrode’s original performance and prepares it for subsequent measurements [22]. To ensure the long-term accuracy of the pH sensor, we calibrate it regularly with pH buffer solutions (e.g., pH 4.00, pH 7.00, and pH 10.00) so that it remains accurate across the pH range. During daily use, the sensor is rinsed with distilled water after each measurement and stored in a protective cover filled with a 3.3 mol/L KCl solution to keep it moist, which prevents the electrodes from drying out or scaling and thus preserves measurement accuracy. Additionally, if the sensor’s response performance decreases, the electrode can be immersed in a 4% hydrogen fluoride solution for 3–5 s before being rinsed with distilled water and reconditioned in a KCl solution to restore its sensitivity.
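As an illustration of how buffer-solution readings can be turned into a calibration line, the following sketch fits pH against voltage by ordinary least squares. The voltages paired with the pH 4.00, 7.00, and 10.00 buffers are hypothetical, and the function `fit_line` is not part of the authors' software.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (voltage, buffer pH) readings for the three buffer solutions.
buffer_points = [(3.00, 4.00), (2.50, 7.00), (2.00, 10.00)]
slope, intercept = fit_line(buffer_points)
```

For these example readings the fit gives a slope of −6 and an intercept of 22. Refitting after each calibration session and comparing the coefficients with the previous ones is one simple way to track electrode drift.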

3.2.2. TDS Sensor Calibration

For TDS sensors, we use a standard TDS calibration solution for periodic calibration to ensure that the sensor’s measurements are consistent with standard values. After each use, the electrodes are rinsed with deionized water to avoid contamination and deposits, thus ensuring their long-term stability [23]. TDS sensor electrodes often use platinum black, which has a lower polarization effect and a higher surface area. These properties improve measurement accuracy and stability while effectively reducing errors caused by electrode polarization. Over time, the performance of TDS electrodes may change due to contamination or aging, so we regularly adjust the electrode constants according to the manufacturer’s guidelines and recalibrate as needed to maintain the accuracy of the sensor.

3.2.3. DS18B20 Temperature Sensor Calibration

The DS18B20 temperature sensor does not require frequent calibration since it is factory-calibrated with high precision. However, to ensure reliability in high-precision applications, we periodically verify its accuracy by comparing its readings to those of a reliable reference thermometer. If discrepancies are detected, the sensor’s measurements are adjusted via software to maintain high accuracy in real-world applications. Routine maintenance of the temperature sensor also involves keeping its surface clean and avoiding exposure to extreme temperatures or environmental conditions, which could introduce potential measurement errors.

3.2.4. Turbidity Sensor Calibration

The calibration process for turbidity sensors requires the use of a standard turbidity calibration solution (e.g., 0 NTU, 20 NTU, 100 NTU, or 400 NTU) [24]. The sensor is immersed in the standard solution, and the calibration settings are adjusted to match the known value. After each measurement, the sensor is thoroughly rinsed with deionized water to prevent sediment contamination from affecting the optical surface and measurement accuracy. In addition, to preserve measurement accuracy, we avoid prolonged exposure to highly turbid or abrasive solutions, which could scratch the optical surface. If contamination or deposits occur on the optical surfaces, they are gently wiped with a non-abrasive cleaning solution to restore sensor performance.

4. Data Description

This dataset, called “Macao Water Quality”, provides detailed measurements of tap water quality in a number of households in Macao. The data were collected using a multi-sensor water quality monitoring system that uses a Raspberry Pi as the core processing unit, along with various sensors to measure key water quality parameters. The dataset contains more than 5000 entries and captures important water quality metrics, including pH [25], TDSs (total dissolved solids), temperature, turbidity, and time of day readings. The trend plots for each parameter and time are shown below.
As shown in Figure 9, the relationship between temperature and time is depicted. This figure highlights the fluctuations in water temperature over the recorded period.
Figure 10 illustrates the trend between total dissolved solids (TDSs) and time. It provides an overview of how the concentration of dissolved solids changes over the course of the observations.
In Figure 11, the relationship between pH and time is demonstrated. It shows the variation in the acidity or alkalinity of the water over time.
Figure 12 depicts the turbidity of the water and how it varies over time. Turbidity indicates the clarity of the water, with higher turbidity values typically corresponding to higher levels of suspended particles in the water.
The Raspberry Pi uploads the collected data to a website, where they can be viewed, as shown in Figure 13. The dataset is stored in both CSV and XML formats for easy use in data analysis. Based on the available water quality characteristics, we assigned labels using composite score calculations; the dataset contains three categories of labels: meets physicochemical criteria, boundary conditions, and does not meet physicochemical criteria. The labeling rules are described in detail below.

Data Labeling

In the water quality monitoring program, the water quality status of each data point is assessed based on the available physical and chemical parameters. Table 2 lists the authoritative standards we adopted, namely those of the WHO (World Health Organization), the EPA (US Environmental Protection Agency), and the Chinese National Standard [26], which are combined with the local characteristics of the water quality in Macao. This process classifies water quality status into three categories based on real-time values of selected water quality parameters (e.g., pH, turbidity, TDSs, and temperature): compliance with physicochemical criteria, borderline status, and non-compliance with physicochemical criteria, as shown in Table 3.
It is important to note that this classification method is based solely on physicochemical parameters and does not include microbiological or chemical contamination assessments, such as the presence of E. coli or heavy metals. Therefore, a sample classified as “meets physicochemical criteria” does not necessarily mean it is safe for drinking without further microbiological monitoring.
Each water quality parameter (pH, turbidity, TDSs, and temperature) is assigned a score according to where its value falls within the respective threshold rules, and the scores are combined into a composite score. The formula for calculating the overall score is
$$\text{Overall Score} = \frac{\text{TDS Score} + \text{pH Score} + \text{Temperature Score} + \text{Turbidity Score}}{4}$$
This score ranges from 0 to 1, where a higher score indicates better compliance with the selected physicochemical parameters. Each data point is assigned a label (Table 4) (meets physicochemical criteria, borderline condition, or does not meet physicochemical criteria) based on the composite score, and the labeled data are saved to a database or output file.
Figure 14 shows an example of the original data, and Figure 15 shows an example of the completed data labeling.
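The labeling step can be sketched as follows, reading the composite score as the mean of the four per-parameter scores. The cutoff values `meets_cutoff` and `borderline_cutoff` are hypothetical placeholders; the actual thresholds are those defined in Tables 3 and 4, which are not reproduced in this text.

```python
def overall_score(tds_score, ph_score, temperature_score, turbidity_score):
    """Average the four per-parameter scores (each between 0 and 1)."""
    return (tds_score + ph_score + temperature_score + turbidity_score) / 4


def label_from_score(score, meets_cutoff=0.8, borderline_cutoff=0.5):
    """Map a composite score to a label; both cutoffs are hypothetical."""
    if score >= meets_cutoff:
        return "meets physicochemical criteria"
    if score >= borderline_cutoff:
        return "borderline condition"
    return "does not meet physicochemical criteria"
```

Because the four scores are weighted equally, a single failing parameter lowers the composite by at most 0.25, so a rule of this form may still need a per-parameter check if one out-of-range value should be decisive on its own.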
The dataset used in the analysis is categorized into three classification labels based on the physicochemical parameters of water quality. These labels represent the water quality status as follows:
  • Meets Physicochemical Criteria (0.5): 25% of the dataset.
  • Borderline Condition (1): 69% of the dataset.
  • Does Not Meet Physicochemical Criteria (0): 6% of the dataset.
Figure 16 visualizes the distribution of these labels in the dataset.

5. Data Validation

5.1. Data Validation: Comparison with Laboratory Data

To ensure the accuracy and reliability of the collected water quality data, a comparative validation was conducted between the multi-sensor Raspberry Pi-based monitoring system and the standard laboratory methods employed by the Macao Municipal Laboratory. The validation process involved quantitative statistical analysis using the Bland–Altman method, which is widely used for assessing the agreement between two measurement techniques.

5.1.1. Methodology and Data Sources

The validation study used two independent data sources:
  • Experimental data (system measurements): Collected using the Raspberry Pi-based multi-sensor platform, measuring pH, total dissolved solids (TDSs), and turbidity at regular time intervals.
  • Reference data (laboratory measurements): Obtained from the Macao Municipal Laboratory website and the Macao Statistics and Census Service’s environmental statistics reports.
To ensure proper comparison, data synchronization was performed by matching system-recorded values with the nearest corresponding laboratory measurement timestamps (08:00, 14:00, and 20:00 daily).
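The synchronization step can be illustrated as a nearest-timestamp match. The sketch below is an assumption about how such matching could be done, not the authors' procedure: the function name `match_to_lab`, the one-hour tolerance `max_gap`, and all sample values are hypothetical.

```python
from datetime import datetime, timedelta


def match_to_lab(system_rows, lab_rows, max_gap=timedelta(hours=1)):
    """Pair each laboratory reading with the system reading nearest in time.

    system_rows and lab_rows are lists of (timestamp, value) tuples.
    A lab reading is dropped when no system reading lies within max_gap
    (the tolerance is an assumption of this sketch).
    Returns a list of (lab_time, lab_value, system_value) tuples.
    """
    pairs = []
    for lab_time, lab_value in lab_rows:
        nearest_time, nearest_value = min(
            system_rows, key=lambda row: abs(row[0] - lab_time)
        )
        if abs(nearest_time - lab_time) <= max_gap:
            pairs.append((lab_time, lab_value, nearest_value))
    return pairs


# Made-up pH readings for one day.
system = [
    (datetime(2024, 7, 1, 7, 58), 7.21),
    (datetime(2024, 7, 1, 8, 3), 7.19),
    (datetime(2024, 7, 1, 13, 59), 7.25),
    (datetime(2024, 7, 1, 17, 0), 7.30),
]
lab = [
    (datetime(2024, 7, 1, 8, 0), 7.20),
    (datetime(2024, 7, 1, 14, 0), 7.24),
    (datetime(2024, 7, 1, 20, 0), 7.22),
]
pairs = match_to_lab(system, lab)
```

In this example the 20:00 laboratory value is left unpaired because the closest system reading is three hours away; whether to drop or keep such pairs is a design choice the source does not specify.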

5.1.2. Bland–Altman Analysis

The Bland–Altman method was used to evaluate the agreement between system and laboratory measurements [27]. The method consists of three key steps:
Mean difference ($\mu_D$) calculation evaluates the systematic bias between the two measurement methods:
$$\mu_D = \frac{1}{n} \sum_{i=1}^{n} (Y_i - X_i)$$
where $X_i$ represents the laboratory measurement, $Y_i$ is the system measurement, and $n$ is the total number of paired samples.
Standard deviation (SD) of differences calculation assesses the variability in measurement differences:
$$SD_D = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (D_i - \mu_D)^2}$$
where $D_i = Y_i - X_i$ is the difference for the $i$-th pair.
Limit of agreement (LOA) computation defines the interval expected to contain 95% of the measurement differences:
$$LOA_{\text{upper}} = \mu_D + 1.96 \times SD_D$$
$$LOA_{\text{lower}} = \mu_D - 1.96 \times SD_D$$
These limits represent the expected range within which 95% of differences should fall, assuming the differences are approximately normally distributed and show no proportional bias.
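The three steps translate directly into code. The following is a minimal sketch using only the Python standard library; the function name `bland_altman` and the sample pH values are illustrative and not taken from the study.

```python
from statistics import mean, stdev


def bland_altman(system, lab):
    """Return (mean difference, SD of differences, lower LOA, upper LOA).

    system and lab are equal-length sequences of paired measurements;
    differences are computed as system minus laboratory.
    """
    if len(system) != len(lab) or len(system) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [y - x for y, x in zip(system, lab)]
    mu_d = mean(diffs)
    sd_d = stdev(diffs)  # sample standard deviation (divides by n - 1)
    return mu_d, sd_d, mu_d - 1.96 * sd_d, mu_d + 1.96 * sd_d


# Made-up paired pH readings.
lab_ph = [7.0, 7.2, 7.4, 7.1]
system_ph = [7.1, 7.2, 7.3, 7.2]
mu_d, sd_d, loa_lower, loa_upper = bland_altman(system_ph, lab_ph)
```

A Bland–Altman plot then shows each difference against the mean of its pair, with horizontal lines at the mean difference and at the two limits of agreement.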

5.1.3. Results and Interpretation

Figure 17 presents the Bland–Altman plots for pH, TDSs, and turbidity, comparing system and laboratory measurements.
  • pH Analysis: The mean difference ($\mu_D$) is close to 0.00, with LOA within ±0.2 pH units, indicating minimal systematic bias and high consistency across the measurement range.
  • TDS Analysis: The system measurements show a slightly positive bias, with values 5–10 ppm higher than laboratory readings, particularly at higher concentrations (>800 ppm). The LOA extends to ±50 ppm, suggesting minor discrepancies due to sensor non-linearity at elevated TDS levels.
  • Turbidity Analysis: The mean difference is 0.1 NTU, with LOA at ±0.8 NTU, consistent with WHO drinking water standards (<5 NTU). However, higher variance near 0 NTU indicates that measurements in ultra-low-turbidity conditions may be affected by sensor noise or light scattering effects.
The overall results suggest that the system performs reliably within the expected tolerances for drinking water monitoring, with differences remaining within the 95% limits of agreement.

5.1.4. Comparison with Standard Laboratory Methods

Compared to laboratory methods, which use high-precision electrochemical pH meters, gravimetric TDS determination, and nephelometric turbidity analyzers, the Raspberry Pi-based system provides real-time, cost-effective monitoring at the cost of a minor trade-off in absolute precision. The differences observed in TDSs at high concentrations and in turbidity at very low levels highlight areas where sensor calibration and post-processing algorithms could further enhance performance.

5.2. Data Validation Through Anomaly Detection Algorithms

The validity of the dataset is further assessed using anomaly detection algorithms. These algorithms identify unusual or unexpected data points that could indicate errors in the data collection process, sensor malfunctions, or significant shifts in water quality. Anomaly detection helps to identify problematic data that might otherwise skew analysis and reduce the overall quality of the dataset.
Anomalies in water quality data can be caused by a variety of factors. For example, a faulty sensor can lead to inaccurate readings, such as a pH sensor that consistently displays a neutral value of 7.0, which does not correspond to the actual water pH, indicating that the sensor is damaged or out of calibration. Similarly, electrical noise or signal interference can cause erratic fluctuations in readings from sensors such as TDSs, often due to poor grounding or nearby electronic equipment. Anomaly detection algorithms can help to identify these irregular data points by flagging readings that deviate significantly from expected values, allowing for accurate data validation and ensuring reliable water quality assessments.
In this study, a standardized dataset partitioning strategy was used to structure the water quality monitoring data. The original dataset was split into training, validation, and testing sets in a 7:3:1 ratio.
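A ratio-based split of this kind can be sketched as follows. The source does not say whether the rows were shuffled or split chronologically, so the shuffling, the fixed seed, and the function name `split_dataset` are assumptions made only for illustration.

```python
import random


def split_dataset(rows, ratios=(7, 3, 1), seed=42):
    """Split rows into training, validation, and testing subsets.

    ratios gives the relative sizes of the three subsets. Rows are
    shuffled with a fixed seed first (an assumption of this sketch);
    any rounding remainder goes to the testing subset.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# 1100 placeholder row indices split 7:3:1.
rows = list(range(1100))
train_rows, val_rows, test_rows = split_dataset(rows)
```

For time-ordered sensor data, a chronological split (earliest rows for training, latest for testing) is a common alternative that avoids leaking future readings into the training set.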

5.2.1. Isolation Forest

Isolation Forest [28] is an unsupervised anomaly detection algorithm that is particularly effective in high-dimensional datasets. Rather than profiling normal data, it isolates anomalies by recursively partitioning the data into smaller segments; anomalous points tend to be separated after fewer partitions than normal points. This makes it well suited for detecting rare events, such as water contamination or sensor malfunctions, in real-time monitoring systems, and for water quality data in general, where anomalies are infrequent and represent significant changes in the water’s condition. Because it does not require labeled data for training, it is also applicable to dynamic and evolving datasets like those in water quality monitoring. Figure 18 presents the Isolation Forest AUC.
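As a rough illustration of how such a detector is applied, the sketch below fits scikit-learn's `IsolationForest` to synthetic four-parameter readings. The data, the `contamination` value, and the other hyperparameters are assumptions chosen for the example; the source does not report the settings used in the study.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic readings; columns are pH, TDS (ppm), temperature (°C), turbidity (NTU).
normal = rng.normal(
    loc=[7.2, 150.0, 26.0, 0.8],
    scale=[0.1, 10.0, 1.0, 0.2],
    size=(300, 4),
)
# Two made-up faulty readings that are extreme in every parameter.
faulty = np.array([
    [3.0, 1500.0, 45.0, 30.0],
    [11.5, 5.0, 2.0, 60.0],
])
X = np.vstack([normal, faulty])

# contamination is the assumed share of anomalies (hypothetical value).
model = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
labels = model.fit_predict(X)    # +1 = inlier, -1 = anomaly
scores = model.score_samples(X)  # lower score = more anomalous
```

Rows labeled -1 are candidates for inspection; the continuous `scores` can be compared against known labels to compute a ROC curve.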

5.2.2. Random Forest

Random Forest [29] is an ensemble learning method that constructs multiple decision trees and merges their results to improve classification accuracy. This algorithm is effective for handling complex non-linear relationships between water quality parameters (such as pH, turbidity, TDSs, and temperature) and their corresponding quality labels (normal, borderline, and abnormal). By aggregating predictions from several trees, Random Forest reduces the likelihood of overfitting and ensures more stable, reliable classifications. It is particularly useful for water quality monitoring, where the data might be noisy and the relationships between features are not always linear. Additionally, Random Forest can provide feature importance rankings, helping to identify which parameters are most influential in determining water quality. Figure 19 presents the Random Forest AUC.
In this study, the feature importance for water quality classification was evaluated using the Random Forest model. Based on the experimental results, the impact of four key water quality parameters (pH, TDSs, turbidity, and temperature) on water quality classification is ranked as follows (from highest to lowest feature importance score) in Figure 20:
  • Turbidity: 0.42;
  • TDSs (total dissolved solids): 0.30;
  • pH: 0.18;
  • Temperature: 0.10.
The order of importance of the features showed that turbidity and TDSs were the two most important factors affecting the detection of water quality anomalies. This result indicates that these factors are closely related to changes in water pollution. On the other hand, pH and temperature have relatively less influence, but they are also indispensable factors.
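Feature importance scores of this kind are exposed by scikit-learn's `RandomForestClassifier` through its `feature_importances_` attribute. The sketch below uses synthetic data in which the label depends only on turbidity, so it illustrates the mechanism rather than reproducing the reported scores; the thresholds and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["pH", "TDS", "temperature", "turbidity"]
rng = np.random.default_rng(0)

# Synthetic readings drawn uniformly over plausible ranges.
X = np.column_stack([
    rng.uniform(6.0, 9.0, 500),      # pH
    rng.uniform(50.0, 1200.0, 500),  # TDS (ppm)
    rng.uniform(15.0, 35.0, 500),    # temperature (°C)
    rng.uniform(0.0, 12.0, 500),     # turbidity (NTU)
])
# Synthetic label: 1 when turbidity exceeds 5 NTU (illustrative rule only).
y = (X[:, 3] > 5.0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort from most to least important.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Impurity-based importances can favor features with many distinct values, so permutation importance is sometimes used as a cross-check.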

5.2.3. Logistic Regression

Logistic Regression [30] is a simple, interpretable model used for classification tasks, making it a good baseline for evaluating the usability of a dataset. In water quality monitoring, Logistic Regression is applied to predict water quality labels (normal, borderline, and abnormal) based on parameters such as pH and TDSs. While it may not capture complex non-linear interactions between features as well as more sophisticated models like Random Forest, it provides a valuable benchmark for determining whether the relationships between the data and the labels are sufficiently simple to be modeled using linear decision boundaries. Logistic Regression is also easy to interpret, which makes it useful for understanding how changes in water quality parameters influence the predicted labels. Figure 21 presents the Logistic Regression AUC.

5.2.4. Local Outlier Factor

Local Outlier Factor (LOF) [31] is a density-based anomaly detection technique that identifies outliers by comparing the local density of data points to that of their neighbors. LOF is especially useful for detecting localized anomalies that may be missed by global models like Isolation Forest. In water quality monitoring, LOF can identify unusual patterns in specific regions of the data, such as localized contamination or sudden changes in sensor readings, that deviate from the typical water quality trends in a given area or time. This makes LOF an ideal choice for detecting subtle anomalies in real-world monitoring systems where the data may have varying densities across different locations or times. Figure 22 presents the Local Outlier Factor AUC.
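A density-based check of this kind can be sketched with scikit-learn's `LocalOutlierFactor`. The synthetic data, the single injected TDS spike, the standardization step, and `n_neighbors=20` are assumptions for illustration, not the configuration used in the study.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic readings; columns are pH, TDS (ppm), temperature (°C), turbidity (NTU).
normal = rng.normal(
    loc=[7.2, 150.0, 26.0, 0.8],
    scale=[0.1, 10.0, 1.0, 0.2],
    size=(300, 4),
)
# One made-up reading with a TDS spike and otherwise typical values.
spike = np.array([[7.2, 900.0, 26.0, 0.8]])

# LOF is distance-based, so the features are standardized first.
X = StandardScaler().fit_transform(np.vstack([normal, spike]))

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)               # +1 = inlier, -1 = outlier
factors = -lof.negative_outlier_factor_   # values well above 1 indicate outliers
```

Because the factor compares each point with its own neighborhood, a reading can be flagged even when only one parameter departs from the local pattern.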

5.3. Model Evaluation and Performance

Evaluation Metrics for Anomaly Detection

  • Precision [32]: Represents the proportion of truly anomalous samples among all the samples predicted as anomalous by the model. A higher precision indicates that the model is reliable in detecting anomalies, minimizing false positives.
  • F1 Score [33]: The harmonic mean of precision and recall, particularly useful for imbalanced data situations. A higher F1 score indicates that the model balances precision and recall well in anomaly detection.
  • Accuracy: Measures the proportion of correctly classified samples in the entire dataset. However, for anomaly detection tasks, accuracy is not the best evaluation metric as normal data usually dominate the dataset.
  • AUC-ROC (Area Under the Curve of Receiver Operating Characteristic) [34]: Measures the model’s ability to distinguish between normal and anomalous data. An AUC close to 1.0 indicates that the model has strong discriminative power and can effectively detect water quality anomalies.
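The metrics above can be computed from scratch for a binary labeling. The sketch below is illustrative only: it treats 1 as the anomalous (positive) class, computes AUC-ROC through the equivalent pairwise ranking formulation, and uses made-up labels and scores.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) with 1 as the positive (anomalous) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def precision_recall_f1_accuracy(y_true, y_pred):
    """Compute precision, recall, F1 (harmonic mean), and accuracy."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, f1, accuracy


def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Requires at least one sample of
    each class.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up ground truth, hard predictions, and anomaly scores.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.95]

precision, recall, f1, accuracy = precision_recall_f1_accuracy(y_true, y_pred)
auc = auc_roc(y_true, y_score)
```

For these made-up values there are 4 true positives, 1 false positive, 1 false negative, and 2 true negatives, so precision, recall, and F1 are all 0.8, accuracy is 0.75, and 14 of the 15 positive-negative pairs are ranked correctly.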
The Isolation Forest algorithm demonstrated a high F1 score of 98.56%, which shows its effectiveness in detecting anomalies. The model’s precision of 97.77% indicates that it reliably detects true anomalies with minimal false positives, and its AUC-ROC score of 0.85 suggests that the model performs well in distinguishing between normal and anomalous data points.
Random Forest achieved an accuracy of 98.10%, an F1 score of 98.99%, and an AUC-ROC of 0.87. These high values indicate that the model is highly effective in classifying water quality samples, confirming that the dataset provides meaningful high-quality data for machine learning tasks.
Logistic Regression achieved a precision of 96.76%, an F1 score of 98.36%, and an AUC-ROC of 0.84. These results show that the dataset is well structured and that even a simple model can make accurate predictions regarding water quality states.
The AUC-ROC score for LOF was 0.75, which is slightly lower than the other models but still indicates that LOF is effective in detecting local anomalies in water quality data.
As shown in Table 5, the performance metrics for the different models demonstrate their effectiveness in detecting anomalies in water quality data.

6. Conclusions

This study presents a comprehensive water quality dataset collected in Macao, utilizing an advanced Raspberry Pi-based multi-sensor system that monitors critical water quality parameters, such as pH, total dissolved solids (TDSs), temperature, and turbidity. The dataset, consisting of over 5000 entries, provides real-time monitoring data, which is essential for urban water quality management. To ensure the accuracy and reliability of the collected data, we employed a three-level classification system and a composite scoring approach for water quality assessment. Multiple machine learning algorithms, including Random Forest, Isolation Forest, Logistic Regression, and Local Outlier Factor (LOF), were applied for data validation. Among these, the Random Forest model demonstrated exceptional performance, achieving an accuracy of 98.10% and an F1 score of 98.99%, making it highly effective in anomaly detection and classification tasks. The findings confirm that the dataset aligns with Macao’s building water safety standards, providing a solid foundation for the development of intelligent water quality monitoring systems and pollution warning frameworks. This system’s real-time detection and anomaly classification capabilities make it an invaluable resource for improving water quality management practices. Furthermore, this research supports the “Safe Drinking Water for Buildings” initiative in Macao, contributing significantly to public health and environmental protection efforts by ensuring safe and stable drinking water for the population. Future research will focus on expanding the system to detect microbial and chemical pollutants, improving machine learning models for better anomaly detection, and incorporating real-time analysis techniques like wavelet analysis. Additionally, advanced sensors will be explored to capture more water quality parameters. 
The system’s integration with urban water management strategies and enhancement of its scalability and cost-effectiveness will also be key for broader applications, ensuring sustainable and efficient water quality management in diverse environments.

Author Contributions

Conceptualization, J.G.; methodology, J.G.; software, J.G.; validation, J.G. and B.C.; formal analysis, J.G. and B.C.; investigation, B.C.; resources, J.G. and S.-K.T.; data curation, B.C.; writing—original draft preparation, J.G.; writing—review and editing, S.-K.T.; visualization, S.-K.T.; supervision, S.-K.T.; project administration, S.-K.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset and code presented in this study are openly available at https://github.com/PriGaoJiawei/Macao-water-dataset (accessed on 4 March 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Marine and Water Bureau of Macao Special Administrative Region. Water Quality Information; Marine and Water Bureau of Macao Special Administrative Region: Macao, China, 2025.
  2. Zhan, S.; Zhou, B.; Li, Z.; Li, Z.; Zhang, P. Evaluation of source water quality and the influencing factors: A case study of Macao. Phys. Chem. Earth Parts A/B/C 2021, 123, 103006.
  3. Homagai, P.L.; Rayamajhi, S.; Dhami, D.; Shrestha, R.L.; Bhattarai, D.P. Comparative adsorption behavior of malachite green dye onto charred and aminated sal (Shorea robusta) sawdust from aqueous solution. Nepal J. Sci. Technol. 2022, 21, 81–90.
  4. Babatunde, A. A study on traditional water quality assessment methods. Risk Assess. Manag. Decis. 2024, 1, 41–52.
  5. Saravanan, A.; Kumar, P.S.; Jeevanantham, S.; Karishma, S.; Tajsabreen, B.; Yaashikaa, P.; Reshma, B. Effective water/wastewater treatment methodologies for toxic pollutants removal: Processes and applications towards sustainable development. Chemosphere 2021, 280, 130595.
  6. Dutta, S.; Sarma, D.; Nath, P. Ground and river water quality monitoring using a smartphone-based pH sensor. AIP Adv. 2015, 5, 057151.
  7. Aluwong, K.C.; Mohd Hashim, M.H.B.; Ishmail, S. Design of wireless-based sensor for real-time monitoring pH and TDS in Surface and Groundwater using IoT. J. Min. Environ. 2024, 15, 1309–1320.
  8. Koestoer, R.; Saleh, Y.; Roihan, I.; Harinaldi, H. A simple method for calibration of temperature sensor DS18B20 waterproof in oil bath based on Arduino data acquisition system. AIP Conf. Proc. 2019, 2062, 020006.
  9. Mylvaganaru, S.; Jakobsen, T. Turbidity sensor for underwater applications. In Proceedings of the IEEE Oceanic Engineering Society. OCEANS’98. Conference Proceedings, Nice, France, 28 September–1 October 1998; Volume 1, pp. 158–161.
  10. Walden, R.H. Analog-to-digital converter survey and analysis. IEEE J. Sel. Areas Commun. 1999, 17, 539–550.
  11. Peña, E.; Legaspi, M.G. UART: A hardware communication protocol understanding universal asynchronous receiver/transmitter. Visit Analog 2020, 54, 1–5.
  12. Fonseca-Campos, J.; Reyes-Ramirez, I.; Guzman-Vargas, L.; Fonseca-Ruiz, L.; Mendoza-Perez, J.A.; Rodriguez-Espinosa, P. Multiparametric system for measuring physicochemical variables associated to water quality based on the Arduino platform. IEEE Access 2022, 10, 69700–69713.
  13. World Health Organization. Available online: https://www.who.int/ (accessed on 4 March 2025).
  14. Parry, R. Agricultural phosphorus and water quality: A US Environmental Protection Agency perspective. J. Environ. Qual. 1998, 27, 258–261.
  15. Li, Y.; Mao, Y.; Xiao, C.; Xu, X.; Li, X. Flexible pH sensor based on a conductive PANI membrane for pH monitoring. RSC Adv. 2020, 10, 21–28.
  16. Adjovu, G.E.; Stephen, H.; James, D.; Ahmad, S. Measurement of total dissolved solids and total suspended solids in water systems: A review of the issues, conventional, and remote sensing techniques. Remote Sens. 2023, 15, 3534.
  17. Ma’ruf, K.; Setiawan, R.J.; Alam, A.A.K.; Ismail, T.; Muhammad, C.I.; Ali, J. Internet of Things for Real-Time Monitoring of Water Quality with Integrated Temperature, pH, and TDS Sensors. In Proceedings of the 2024 International Conference on Electrical Engineering and Computer Science (ICECOS), Palembang, Indonesia, 25–26 September 2024; pp. 314–319.
  18. Jamil, A.; Ting, T.S.; Abidin, Z.Z.; Othman, M.; Wahab, M.H.A.; Abdullah, M.F.L.; Homam, M.J.; Audah, L.H.M.; Shah, S.M. Polynomial Regression Calibration Method of Total Dissolved Solids Sensor for Hydroponic Systems. Pertanika J. Sci. Technol. 2023, 31, 2769–2782.
  19. Matos, T.; Martins, M.; Henriques, R.; Goncalves, L. A review of methods and instruments to monitor turbidity and suspended sediment concentration. J. Water Process. Eng. 2024, 64, 105624.
  20. Jingzhuo, W.; Chenglong, G. Research on 1-wire bus temperature monitoring system. In Proceedings of the 2007 8th International Conference on Electronic Measurement and Instruments, Xi’an, China, 16–18 August 2007; pp. 3–722.
  21. Jordán-Cuebas, F.; Krogmann, U.; Andrews, C.; Senick, J.; Hewitt, E.; Wener, R.; Sorensen Allacci, M.; Plotnik, D. Understanding apartment end-use water consumption in two green residential multistory buildings. J. Water Resour. Plan. Manag. 2018, 144, 04018009.
  22. Ghoneim, M.; Nguyen, A.; Dereje, N.; Huang, J.; Moore, G.; Murzynowski, P.; Dagdeviren, C. Recent progress in electrochemical pH-sensing materials and configurations for biomedical applications. Chem. Rev. 2019, 119, 5248–5297.
  23. Shehata, A.B.; AlAskar, A.R.; Al Dosari, R.A.; Al Mutairi, F.R. Calibration and ISO GUM Based Uncertainty of Conductivity and TDS Meters for Better Water Quality Monitoring. Sci. J. Chem. 2022, 10, 211–218.
  24. Trevathan, J.; Read, W.; Sattar, A. Implementation and calibration of an IoT light attenuation turbidity sensor. Internet Things 2022, 19, 100576.
  25. Edition, F. Guidelines for drinking-water quality. WHO Chron. 2011, 38, 104–108.
  26. Wei, Y.; Hu, D.; Ye, C.; Zhang, H.; Li, H.; Yu, X. Drinking water quality & health risk assessment of secondary water supply systems in residential neighborhoods. Front. Environ. Sci. Eng. 2024, 18, 18.
  27. Giavarina, D. Understanding Bland Altman analysis. Biochem. Medica 2015, 25, 141–151.
  28. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; IEEE: Pisa, Italy, 2008; pp. 413–422.
  29. Rigatti, S.J. Random forest. J. Insur. Med. 2017, 47, 31–39.
  30. LaValley, M.P. Logistic regression. Circulation 2008, 117, 2395–2399.
  31. Alghushairy, O.; Alsini, R.; Soule, T.; Ma, X. A review of local outlier factor algorithms for outlier detection in big data streams. Big Data Cogn. Comput. 2020, 5, 1.
  32. Streiner, D.L.; Norman, G.R. “Precision” and “accuracy”: Two terms that are neither. J. Clin. Epidemiol. 2006, 59, 327–330.
  33. Yacouby, R.; Axman, D. Probabilistic extension of precision, recall, and F1 score for more thorough evaluation of classification models. In Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, Online, 20 November 2020; pp. 79–91.
  34. Narkhede, S. Understanding AUC-ROC curve. Towards Data Sci. 2018, 26, 220–227.
Figure 1. Workflow of the water quality monitoring system.
Figure 2. System hardware structure, showing sensor connections and data transmission.
Figure 3. Simulation diagram for system testing and validation.
Figure 4. pH sensor voltage vs. pH value.
Figure 5. TDS sensor voltage vs. TDS concentration (** denotes superscript).
Figure 6. Turbidity sensor voltage vs. turbidity concentration.
Figure 7. FFT results for different sampling intervals.
Figure 8. Power spectral density (PSD) results for different sampling intervals.
Figure 9. Temperature and time.
Figure 10. TDSs and time.
Figure 11. pH and time.
Figure 12. Turbidity and time.
Figure 13. Data online.
Figure 14. Original dataset.
Figure 15. Labeling dataset.
Figure 16. Label distribution of water quality dataset.
Figure 17. Bland–Altman plots for pH, TDS, and turbidity measurements. The red dashed line represents the mean difference, while the green dashed lines indicate the 95% limits of agreement.
Figure 18. Isolation Forest AUC.
Figure 19. Random Forest AUC.
Figure 20. Feature importance scores from Random Forest model.
Figure 21. Logistic Regression AUC.
Figure 22. Local Outlier Factor AUC.
Table 1. Sensor selection summary.
| Sensor Type | Model | Accuracy | Sensitivity | Price | Power Consumption | Calibration Cycle |
|---|---|---|---|---|---|---|
| pH | Atlas Scientific EZO-pH | ±0.002 | High | High | Low | 4–6 weeks |
| pH | DFROBOT SEN0161-V2 | ±0.1 | Medium | Low | Low | 2–4 weeks |
| pH | pH-4502C | ±0.02 | Medium | Medium | Medium | 2–3 weeks |
| TDS | Atlas Scientific EZO-EC | ±2% | High | High | Low | 4–6 weeks |
| TDS | DFROBOT Gravity TDS | ±10% | Medium | Low | Low | 2–4 weeks |
| Turbidity | In Situ Aqua TROLL 200 | ±0.1 NTU | High | High | Low | 8 weeks |
| Turbidity | DFROBOT SEN0189 | ±3% | Medium | Low | Low | 4 weeks |
| Temperature | DS18B20 (Digital) | ±0.5 °C | High | Low | Low | 6 months |
| Temperature | PT100 (Analog) | ±0.1 °C | High | High | High | 6 months |
Table 2. Comparison of International Water Quality Standards for pH, TDSs, turbidity, and temperature.
| Parameter | WHO Guideline | EPA Drinking Water Standard | GB 5749-2022 (China Standard) | Unit |
|---|---|---|---|---|
| pH | 6.5–8.5 | 6.5–8.5 | 6.5–8.5 | - |
| TDS | ≤1000 | ≤500 (recommended) | ≤1000 | mg/L |
| Turbidity | ≤5 (short-term maximum) | ≤1 (95% sample value) | ≤1 | NTU |
| Temperature | No strict standard | No strict standard | No strict standard | °C |
Table 3. Threshold rule for water quality parameters (pH, TDSs, temperature, and turbidity).
| Parameter | Condition | Score |
|---|---|---|
| TDS (ppm) | TDS ≤ 500 | 1 |
| TDS (ppm) | 500 < TDS ≤ 1000 | 0.5 |
| TDS (ppm) | TDS > 1000 | 0 |
| pH | 6.5 ≤ pH ≤ 8.5 | 1 |
| pH | 6.0 ≤ pH < 6.5 or 8.5 < pH ≤ 9.0 | 0.5 |
| pH | pH < 6.0 or pH > 9.0 | 0 |
| Temperature (°C) | Temperature ≤ 15 | 1 |
| Temperature (°C) | 15 < Temperature ≤ 25 | 0.5 |
| Temperature (°C) | Temperature > 25 | 0 |
| Turbidity (NTU) | Turbidity ≤ 5 | 1 |
| Turbidity (NTU) | 5 < Turbidity ≤ 10 | 0.5 |
| Turbidity (NTU) | Turbidity > 10 | 0 |
Table 4. Combined water quality scoring rules.
| Overall Score | Label | Explanation |
|---|---|---|
| 1.0 | Meets Physicochemical Criteria (1) | Falls within all optimal physicochemical limits. |
| 0.5 ≤ Score < 1.0 | Borderline Condition (0.5) | Partially acceptable but does not fully meet optimal criteria. |
| Score < 0.5 | Does Not Meet Physicochemical Criteria (0) | Fails to meet the minimum physicochemical standards. |
Table 5. Model evaluation and performance.
| Model | Precision | F1 Score | Accuracy | AUC |
|---|---|---|---|---|
| Isolation Forest | 97.7% | 98.56% | NA | 0.85 |
| Random Forest | 97.99% | 98.99% | 98.10% | 0.91 |
| Logistic Regression | 96.76% | 98.36% | 96.90% | 0.86 |
| Local Outlier Factor | 93.54% | 96.66% | NA | 0.75 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gao, J.; Chen, B.; Tang, S.-K. Water Quality Monitoring: A Water Quality Dataset from an On-Site Study in Macao. Appl. Sci. 2025, 15, 4130. https://doi.org/10.3390/app15084130
