Assessment of Small-Settlement Wastewater Discharges on the Irtysh River Using Tracer-Based Mixing Diagnostics and Regularized Predictive Models

Anapyanova, Samal; Kolpakova, Valentina; Kulisz, Monika; Nabiollina, Madina; Yeremeyeva, Yuliya; Nurbayeva, Nailya; Sherov, Anvar

doi:10.3390/w18020232


by Samal Anapyanova 1,2, Valentina Kolpakova 2, Monika Kulisz 3,*, Madina Nabiollina 1, Yuliya Yeremeyeva 2,*, Nailya Nurbayeva 4 and Anvar Sherov 5

1 Department of Water Resources and Melioration, Faculty of Water Resources and Information Technology, Non-Profit Joint Stock Company Kazakh National Agrarian Research University, Almaty 050010, Kazakhstan

2 Center for Competence and Technology Transfer in Water Management and Water Use, School of Architecture, Construction and Energy, Non-Profit Joint Stock Company “D. Serikbayev East Kazakhstan Technical University”, Ust-Kamenogorsk 070004, Kazakhstan

3 Department of Organization of Enterprise, Faculty of Management, Lublin University of Technology, 20-618 Lublin, Poland

4 Department of Ecology, Institute of Forestry and Environment, Non-Profit Joint Stock Company “S. Seifullin Kazakh Agro Technical Research University”, Astana 010011, Kazakhstan

5 Department of Use of Hydromelioration Systems, Faculty of Hydromelioration, Tashkent Institute of Irrigation and Agricultural Mechanization Engineers, National Research University, Tashkent 100000, Uzbekistan

* Authors to whom correspondence should be addressed.

Water 2026, 18(2), 232; https://doi.org/10.3390/w18020232

Submission received: 18 November 2025 / Revised: 22 December 2025 / Accepted: 9 January 2026 / Published: 15 January 2026

(This article belongs to the Special Issue Eco-Engineered Solutions for Industrial Wastewater)


Abstract

An integrated field–analytical framework was applied to quantify the impact of two small-settlement treatment facilities (TF1 and TF2) on the Irtysh River (East Kazakhstan). The main objective of this study was to quantify effluent-driven dilution and non-conservative changes in key water-quality indicators downstream of TF1 and TF2 and to evaluate parsimonious models for predicting effluent-outlet BOD and COD from upstream measurements. Paired upstream–downstream control sections were sampled in 2024–2025 for 22 indicators, and plant influent–effluent records were compiled for key wastewater variables. Chloride-based conservative mixing indicated very strong dilution ($D \approx 2.0 \times 10^{3}$ for TF1 and $D \approx 4.2 \times 10^{2}$ for TF2). Deviations from the mixing line were summarized using a transformation diagnostic $\theta$. At TF1, several constituents exceeded mixing expectations ($\theta \approx 13$ for COD, $\theta \approx 42$ for ammonium, and $\theta \approx 6$ for phosphates), while nitrate showed net attenuation ($\theta < 0$). At TF2, $\theta$ values clustered near unity, indicating modest deviations. Under a small-sample regime ($N = 10$) and leave-one-out cross-validation (LOOCV), regularized regression provided accurate forecasts of effluent-outlet BOD and COD. Lasso under LOOCV performed best (BOD_after: RMSE = 0.626, MAE = 0.459, and $R^{2} = 0.976$; COD_after: RMSE = 0.795, MAE = 0.634, and $R^{2} = 0.997$). The results reconcile strong reach-scale dilution with constituent-specific local departures and support targeted modernization and operational forecasting for water-quality management in small facilities.

Keywords:

Irtysh River; anthropogenic load; water quality; domestic wastewater; small sewage treatment facilities

1. Introduction

Monitoring the environmental status of water bodies in Kazakhstan is becoming increasingly important, as it helps ensure the sustainability of aquatic ecosystems and the country's water security.

The Irtysh is one of the largest rivers in Kazakhstan. Its total length, including the Kara-Irtysh, is about 4200 km, and it flows through China, Kazakhstan, and Russia. The Irtysh River plays a significant role in the development of industry and agriculture in the region and, as a result, is subject to anthropogenic impact, which leads to numerous environmental and social problems [1]. The river is also used for fisheries. With the increasing anthropogenic load, continuous monitoring of its ecological condition becomes more urgent [2,3,4]. One source of anthropogenic impact is the discharge of domestic wastewater from cities and small settlements. As a rule, treated domestic wastewater is discharged into nearby water bodies; its reuse is not widespread in Kazakhstan because of technical, regulatory, economic, and social constraints, including public acceptance.

As of 1 January 2024, the Republic of Kazakhstan (RK) comprises 17 regions, 188 districts, 89 cities (including 3 of republican significance), 29 towns, 2169 rural districts, and 6256 villages [5,6]. According to the Bureau of National Statistics, the urban population of the republic amounts to 12,727,404 people and the rural population to 7,516,577; thus, about 63% of the population resides in urban areas and about 37% in rural areas. The number of small settlements in Kazakhstan is approximately 6324. In accordance with the strategic development plan “Kazakhstan-2050” [7], state programs are currently being implemented to protect water resources and to modernize (reconstruct and build) housing and communal infrastructure, including heating, water supply, and wastewater systems, in small settlements as well. Furthermore, with the development and expansion of urbanized areas and the growth of population and industrial facilities, the volume of wastewater in Kazakhstan is also rising [8,9].

It should be noted that most existing treatment facilities are both obsolete and physically worn out, are in critical condition, and create an unfavorable sanitary-epidemiological and environmental situation for receiving water bodies [10,11,12,13]. As noted in the report of the Minister of Industry and Infrastructural Development of the Republic of Kazakhstan, sewage treatment plants are absent in 27 of the republic's 89 cities, and in 41 cities their wear is 60% or higher [14]. This paper focuses on small settlements, which are economically weaker and less developed than cities. Most of the existing WWTPs in small settlements were built in the 1970s, have been in operation for a long time, and typically follow the standard treatment scheme (mechanical and biological) with discharge into water bodies. Vilson et al. [15] propose two ways to address this problem: first, building new wastewater treatment plants where the existing structures were built before the 1970s; second, retrofitting structures built in the second quarter of the 20th century. When conventional wastewater treatment technology (mechanical and artificial biological treatment) is used, key indicators in the treated water can vary within the following ranges: BOD_full (total biochemical oxygen demand) 10–20 mg/L; suspended solids 12–20 mg/L; ammonium nitrogen 5–7 mg/L; nitrate nitrogen 12–15 mg/L; and petroleum hydrocarbons 1 mg/L [16].

Despite significant hydrological dilution in the Irtysh River, local effects of effluent discharges remain noticeable as elevated values of several parameters (BOD, COD, nitrites, ammonium, and phosphates) near the outfalls [1].

A review of the literature shows that some authors focus on analyzing the condition and operation of treatment facilities, emphasizing proper operation and maintenance features of treatment facilities in small settlements [17,18,19,20]. Other authors address issues related to the intensification and improvement of treatment facility performance [15,21,22,23,24], as well as the reuse of treated domestic wastewater [25]. There are also studies that specifically focus on the Irtysh River. The river was studied by M. Burlybayev [26] over the period from 1986 to 2011, D. Burlybayeva [27] from 1947 to 2013, and I.V. Shenberger [28] from 2006 to 2015. Kolpakova et al. [29] conducted a hydrological assessment of the Kazakhstani section of the Irtysh River under conditions of industrial development and climate change from 2019 to 2023. In the aforementioned studies, the water quality of the Irtysh River was assessed at control sections near major cities (Ust-Kamenogorsk, Semey, Pavlodar).

Beyond conventional compliance-based assessments, hydrological and geochemical research has developed a broad family of tracer-aided approaches to quantify contributions from different water sources to observed stream chemistry [30,31]. A key concept in this family is End-Member Mixing Analysis (EMMA), a statistical and geochemical method widely used to estimate the contribution of various source waters to a mixture and to explain the formation of its current chemical composition [32,33,34,35,36]. EMMA is based on linear mixing theory and multidimensional statistical analysis: it combines water and tracer mass balance with principal component analysis (PCA) to treat several chemical variables jointly as a single multidimensional structure [32,33]. This allows one to reduce the dimensionality of large heterogeneous data sets, identify latent structure, and separate signal from noise in a way that is mathematically rigorous and easily reproducible [30,32].

In classical EMMA, pollutants or conservative solutes (e.g., major ions such as HCO₃⁻, SO₄²⁻, Cl⁻, Ca²⁺, Mg²⁺, Na⁺, K⁺), selected trace parameters, and stable isotopes are treated as tracers that distinguish between end-members, and PCA is used to define a low-dimensional mixing space [32,33,35]. Within this framework, stream chemistry is interpreted as a mixture of a finite number of end-members, and their contributions are computed from linear mixing equations [32,33]. The fundamental EMMA equations can be derived directly from the conservation of water and tracer mass. For example, for two components with discharges $Q_{1}$ and $Q_{2}$ and concentrations $C_{1}$ and $C_{2}$ mixing to produce a stream with discharge $Q_{r}$ and concentration $C_{r}$, the balances are [32,33]:

Q_{r} = Q_{1} + Q_{2}; \quad C_{r} Q_{r} = C_{1} Q_{1} + C_{2} Q_{2}

The method requires that end-members exhibit sufficiently distinct tracer signatures relative to analytical uncertainty, that the system can be approximated as quasi-stationary over the analysis period, and that no additional unaccounted sources or sinks significantly affect the water or tracer mass balance. Despite practical challenges in defining representative end-member concentrations in space and time, EMMA has been widely applied in small and medium-sized catchments to separate flow components, identify dominant sources of runoff (e.g., rainwater, soil water, groundwater, snowmelt, glacial melt), and test hydrological hypotheses in a way that is relatively independent of specific model structures [34,35,36].
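As an illustration of the two-component balance above, substituting $Q_{1} = Q_{r} - Q_{2}$ into the tracer balance gives $Q_{2} = Q_{r}(C_{r} - C_{1})/(C_{2} - C_{1})$. The Python sketch below is a minimal example under that assumption only; the function name two_component_split and the input values are hypothetical and are not taken from the study.

```python
def two_component_split(q_r, c1, c2, c_r):
    """Solve Q_r = Q1 + Q2 and C_r*Q_r = C1*Q1 + C2*Q2 for (Q1, Q2).

    Hypothetical helper for illustration; the end-members must have
    distinct tracer concentrations, as EMMA itself requires.
    """
    if c1 == c2:
        raise ValueError("end-members must have distinct tracer concentrations")
    # Substituting Q1 = Q_r - Q2 into the tracer balance isolates Q2.
    q2 = q_r * (c_r - c1) / (c2 - c1)
    q1 = q_r - q2
    return q1, q2


# Hypothetical example: discharge 100, end-member tracer 10 and 50, mixture 14.
q1, q2 = two_component_split(100.0, 10.0, 50.0, 14.0)
print(q1, q2)  # 90.0 10.0
```

A mixture concentration outside the interval spanned by $C_{1}$ and $C_{2}$ would yield a negative component discharge, which is one practical symptom of a missing end-member or a non-conservative tracer.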

Studies based on EMMA and related tracer-aided modeling are widespread internationally and increasingly used to understand how climate change and human activity modify water resources and water quality [31,35,36,37,38]. Applications include small mountain catchments in Asia and Europe; experimental basins in Russia (e.g., Laninsky Stream in the Baikal region, small mountain catchments in Central Sikhote-Alin); and snow-melt-dominated basins in North America, where EMMA has been used to quantify the proportions of different source waters in river runoff and to develop ensemble approaches to hydrograph separation [36,37,39]. Recent work has also emphasized that tracer-aided modeling (TAM) continues to expand its scope of application and that EMMA remains one of the core tools of hydrograph separation for quantifying the impact of surface- and groundwater sources on streamflow [31,38].

However, the use of EMMA-type approaches in the context of local wastewater plumes discharged from small-settlement treatment facilities into large rivers is still limited. Conceptually, the problem is similar to catchment-scale mixing: downstream river water can be represented as a mixture of upstream river water and wastewater effluent. Yet, compared to classical catchment applications, the system is often simpler in terms of the number of end-members, while effluent concentrations may be highly variable and key indicators such as BOD and COD may undergo rapid in-stream transformations that violate strict conservative-tracer assumptions. This creates a need for simplified, EMMA-inspired mixing diagnostics that can be applied with limited data, explicitly account for dilution and mixing, and provide a bridge between process-based understanding and operational predictive tools for effluent management [36,39].

A literature review showed that the impact of discharges from small industrial facilities on the ecological condition of the Irtysh River has not been sufficiently studied, which limits the completeness and objectivity of assessments of the river's actual ecological status. Studying discharges from small treatment facilities will support a more objective assessment of the ecological condition of the Irtysh River and allow a more comprehensive consideration of anthropogenic pressures from smaller-scale sources.

Small settlements are numerous along the Kazakhstani reach of the Irtysh, and many rely on aging facilities operating conventional mechanical-biological schemes; assessing their discharges is therefore essential for basin-scale water-quality management. This study addresses a gap in assessments of the Irtysh River, which have focused largely on control sections near major cities, by quantifying the impact of discharges from small-settlement treatment facilities that are widespread along the river corridor. In methodological terms, the work integrates three complementary components:

A conservative-tracer mixing analysis to estimate site-specific effluent fractions and dilution;
A transformation diagnostic (θ) that tests whether reactive indicators depart from the mixing line;
Low-complexity, cross-validated predictive models (regularized regressions and PLS) to forecast BOD and COD at facility effluents from upstream water-quality indicators.

Together, these components provide a physically grounded baseline, reveal biogeochemical departures from simple entrainment, and deliver practical, uncertainty-aware forecasts for operational use. The objective is to construct and compare parsimonious predictive models for BOD and COD under small-sample conditions and to interpret their performance against dilution-based expectations derived from the tracer analysis.

This study integrates conservative-tracer mixing (f, D), a transformation diagnostic (θ), and low-complexity, cross-validated predictive models to evaluate two small-settlement facilities on the Irtysh River. The novelty lies in combining physically grounded dilution estimates with a statistical diagnostic of departures from the mixing line and with sparse, uncertainty-aware forecasts for BOD and COD under a small-sample regime (N = 10). The aim is to quantify dilution, identify reactive behavior, and select operationally useful predictors for effluent quality.

2. Materials and Methods

2.1. Study Area

The study examines the impact of domestic wastewater discharges from two small settlements on the Irtysh River in the East Kazakhstan region. Figure 1 shows a schematic map of the study area.

The objects of the study are sewage treatment plants TF1 and TF2. The design capacity of the wastewater treatment facilities (TF1) in Settlement No. 1 (constructed in 1980) is 5000 m³/day, while the actual capacity is 370.9 m³/day.

The design capacity of the wastewater treatment facilities (TF2) in Settlement No. 2 (constructed in 1964) is 1824 m³/day, while the actual capacity is 625 m³/day.

For the treatment of domestic wastewater from the two settlements, mechanical and artificial biological treatment is provided. The treatment facilities are composed as follows:

The composition of the TF1 in Settlement No. 1 includes a sewage pumping station; a receiving chamber; a horizontal grit chamber with circular water flow (85% wear); a sand hopper; a Venturi-type flow measuring flume; a primary clarifier (50% wear); two-lane aeration tanks; a secondary clarifier (50% wear); an aerobic mineralizer; a contact tank; a sludge digester; a chlorination unit; a discharge point for treated water into the watercourse; and sludge drying beds.
The composition of the TF2 in Settlement No. 2 includes a screening unit (70% wear); a grit chamber (85% wear); primary two-tier and vertical clarifiers (50% wear); biological reactors No. 1 (100% wear) and No. 2 (50% wear); secondary vertical clarifiers (50% wear); sludge drying beds; a chlorination unit; and a contact tank.

2.2. Chemical Water Quality Parameters

The study used laboratory data from the treatment facilities of two small settlements in the East Kazakhstan region (No. 1 and No. 2), covering the last five years (2020–2024) for 12 indicators.

Water quality was assessed for 22 indicators using samples taken in June 2024 and June 2025 at four locations on the Irtysh River: for each of settlements No. 1 and No. 2, a background point 500 m upstream and a control point 500 m downstream of the treatment facility's discharge point. The selected period reflects the current state of the treatment facilities, the technologies used, and the applicable regulatory requirements, so the data for this period reflect the actual degree of impact of sewage treatment plant discharges on the ecological state of the Irtysh River.

During the field studies under real operating conditions, pollutants were examined according to the following conditional groups:

Biogenic substances—ammonium salts (NH₄⁺), nitrite (NO₂⁻) and nitrate (NO₃⁻) salts, phosphates (PO₄³⁻);
Organic substances—petroleum products, synthetic surfactants;
Major ions—sulfates (SO₄²⁻), chlorides (Cl⁻), calcium (Ca²⁺), magnesium (Mg²⁺);
Heavy metals—copper (Cu²⁺), zinc (Zn²⁺), lead (Pb²⁺), chromium (Cr⁶⁺), total iron (Fe²⁺), cadmium (Cd²⁺), manganese (Mn²⁺).

An analysis was conducted on the following indicators of wastewater before and after discharge from the studied treatment facilities: temperature, total mineralization (dry residue), pH, suspended solids, permanganate oxidizability, dissolved oxygen, BOD, and COD.

Laboratory-analytical studies to determine the hydrochemical composition of the water were carried out at the accredited laboratory of LLP “Testing Laboratory of NGO EK-ECO” in the city of Ust-Kamenogorsk. Analytical, systematic, and comparative methods were applied for data processing.

Treatment efficiency for each pollutant was determined by Formula (1):

E = \frac{C_{inf} - C_{efl}}{C_{inf}} \times 100\%,

(1)

where E is the treatment (removal) efficiency, %; C_inf is the pollutant concentration at the inlet of the sewage treatment plant; and C_efl is the pollutant concentration at the effluent of the sewage treatment plant.
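Formula (1) is a simple percentage reduction. The sketch below is illustrative only; the function name and the example concentrations are hypothetical, not values from Tables 1–3.

```python
def treatment_efficiency(c_inf, c_efl):
    """Formula (1): percentage reduction from influent to effluent concentration."""
    if c_inf <= 0:
        raise ValueError("influent concentration must be positive")
    # A negative result would mean the effluent is more concentrated than the influent.
    return (c_inf - c_efl) * 100.0 / c_inf


# Hypothetical BOD values in mg/L (not study data).
print(treatment_efficiency(200.0, 10.0))  # 95.0
```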

2.3. Mathematical Methods/Models

Measurements at the wastewater treatment plant (WWTP; influent/effluent) and in the river (upstream/downstream) reflect mixing (dilution of wastewater with river water) and transformation processes (decrease or increase in indicators in the river section downstream of the discharge) simultaneously, so on their own they do not distinguish between the two. A model-based analysis was therefore undertaken to separate the contribution of hydraulic mixing from biogeochemical transformation along the river reach affected by the discharge. An algebraic framework was employed to quantify (i) the mixing fraction and dilution attributable to effluent entrainment; (ii) deviations from conservative mixing consistent with net attenuation (downstream concentrations below the mixing prediction) or net generation/additional inputs (downstream concentrations above the mixing prediction) of reactive water-quality indicators; and (iii) the overall magnitude of multi-indicator change in a unitless form suitable for comparison across sites and periods. The approach complements compliance-based assessment by providing interpretable diagnostics of process dominance.

An overview of the complete field-to-model workflow—covering sampling, data preprocessing/QC, conservative-tracer mixing diagnostics (f, D), θ-based deviations from the mixing line, and the predictive modelling/validation pipeline—is provided in Figure 2.

For each indicator, triplets of concentrations were compiled: upstream river concentration ($C_{up}$), downstream river concentration ($C_{down}$), and WWTP effluent concentration ($C_{eff}$). Indicator names were normalized to consistent labels (e.g., Chlorides, Dry residue/TDS, COD, Ammonium, Phosphates, Nitrite, Nitrate). The analytical endpoints $\mathrm{BOD}_{full}$ (effluent) and $\mathrm{BOD}_{5}$ (river) were treated as different measures and were not combined within the same model calculations.

For a conservative tracer (i.e., negligible reaction over the inter-station distance), the downstream concentration was represented as a linear mixture of upstream river water and effluent:

C_{down} = (1 - f) C_{up} + f C_{eff},

(2)

where $f \in [0, 1]$ denotes the effluent mixing fraction in the downstream cross-section. The fraction and the dilution factor were obtained as follows:

f = \frac{C_{down} - C_{up}}{C_{eff} - C_{up}}, \quad D = \frac{1}{f}.

(3)

Tracer selection prioritized chlorides. Feasibility was enforced by the convexity (mixture) condition $C_{down} \in [\min(C_{up}, C_{eff}), \max(C_{up}, C_{eff})]$ and by the physical bounds $0 \leq f \leq 1$. For each site (and date, if applicable), a single reference fraction $f^{*}$ was selected from feasible tracers following the stated priority; non-feasible records (e.g., sulfate cases violating convexity) were not used for $f^{*}$ estimation but were retained descriptively.
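The mixing fraction of Equation (3), the convexity and bound checks, and the priority-based choice of $f^{*}$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the dictionary layout, the second tracer in the priority list, and all concentrations are hypothetical.

```python
def mixing_fraction(c_up, c_down, c_eff):
    """Effluent fraction f from Eq. (3), or None if the record fails the feasibility checks."""
    if c_eff == c_up:
        return None  # no tracer contrast between the two end-members
    lo, hi = min(c_up, c_eff), max(c_up, c_eff)
    if not lo <= c_down <= hi:
        return None  # convexity (mixture) condition violated
    f = (c_down - c_up) / (c_eff - c_up)
    return f if 0.0 <= f <= 1.0 else None  # physical bounds


def dilution_factor(f):
    """D = 1/f; a zero fraction means no detectable effluent signal."""
    return float("inf") if f == 0.0 else 1.0 / f


def reference_fraction(records, priority):
    """Return (tracer, f*) for the first feasible tracer in priority order, else (None, None)."""
    for name in priority:
        if name in records:
            f = mixing_fraction(*records[name])
            if f is not None:
                return name, f
    return None, None


# Hypothetical triplets (C_up, C_down, C_eff) in mg/L; not study data.
records = {
    "Chlorides": (12.0, 12.1, 62.0),
    "Sulfates": (40.0, 35.0, 80.0),  # C_down outside [C_up, C_eff]: infeasible
}
tracer, f_star = reference_fraction(records, priority=("Chlorides", "Sulfates"))
print(tracer, round(dilution_factor(f_star)))  # Chlorides 500
```

Infeasible records simply return None here, which mirrors the rule that they are excluded from $f^{*}$ estimation while remaining available for descriptive use.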

Chloride (Cl⁻) was selected as the primary conservative tracer because, over the short inter-station distance considered here, it behaves quasi-conservatively in river water: it is a major dissolved ion that is not subject to rapid biological uptake or redox transformation, and it does not readily sorb or precipitate under typical oxic, circumneutral river conditions. In domestic wastewater, chloride is largely derived from household inputs (e.g., dietary salt) and therefore tends to be elevated relative to upstream river water, providing a strong signal-to-noise ratio for mixing calculations. Consequently, changes in Cl⁻ between the upstream and downstream cross-sections are attributed primarily to physical mixing, enabling estimation of the effluent fraction f and dilution factor D from the linear mass-balance model.

Given $f^{*}$ from Model 1 (the conservative mixing model, Equations (2) and (3)), the extent to which mixing alone explained the observed downstream concentration of a reactive indicator was evaluated as follows:

\theta = \frac{C_{down} - (1 - f^{*}) C_{up}}{f^{*} C_{eff}}.

(4)

Values of $\theta \approx 1$ indicate consistency with mixing; $\theta < 1$ indicates net removal (attenuation) between stations in excess of dilution; $\theta > 1$ indicates net generation or additional inputs (i.e., downstream concentrations above the conservative mixing prediction), an additional source term, or a possible underestimation of $f^{*}$ under the sampling conditions; and $\theta < 0$ indicates downstream concentrations below even the diluted upstream contribution $(1 - f^{*}) C_{up}$, implying strong removal or data misalignment. The diagnostic was not evaluated for the tracer used to obtain $f^{*}$, for which $\theta$ equals 1 by construction. Indicators considered included COD, ammonium, phosphates, nitrite, and nitrate, with matching definitions and units across river and effluent.
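Equation (4) and its interpretation bands can be illustrated with a short Python sketch. It assumes a reference fraction $f^{*}$ is already available; the function names, the ±0.2 tolerance band used to label "θ ≈ 1", and all concentrations are hypothetical choices for illustration, not values or thresholds from the study.

```python
def theta(c_up, c_down, c_eff, f_star):
    """Eq. (4): downstream excess over the diluted upstream term, relative to f* * C_eff."""
    expected_effluent_term = f_star * c_eff
    if expected_effluent_term == 0.0:
        raise ValueError("f* * C_eff must be non-zero")
    return (c_down - (1.0 - f_star) * c_up) / expected_effluent_term


def classify(value, tol=0.2):
    """Rough reading of theta; the tolerance band around 1 is a hypothetical choice."""
    if value < 0.0:
        return "below diluted upstream level"
    if value < 1.0 - tol:
        return "net attenuation"
    if value > 1.0 + tol:
        return "net generation / extra input"
    return "consistent with mixing"


f_star = 0.002  # hypothetical reference fraction from the tracer step
print(classify(theta(0.10, 0.1398, 20.0, f_star)))  # consistent with mixing
print(classify(theta(0.10, 0.30, 20.0, f_star)))    # net generation / extra input
print(classify(theta(2.0, 1.9, 10.0, f_star)))      # below diluted upstream level
```

The first call uses a downstream value constructed exactly from Equation (2), so it returns θ = 1 up to rounding, matching the statement that the tracer itself gives θ = 1 by construction.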

To summarize multi-indicator upstream-to-downstream change without requiring effluent data, relative changes were computed for each indicator $i$ in a site/date group as follows:

R_{i} = \frac{C_{down,i} - C_{up,i}}{|C_{up,i}| + \varepsilon},

(5)

with $\varepsilon = 10^{-6}$ to avoid division by zero. Let $\bar{R}$ denote the arithmetic mean of the available $R_{i}$ and $R_{max}$ the maximum (most positive) value. The composite, unitless index was defined as follows:

I = \sqrt{\frac{R_{max}^{2} + \bar{R}^{2}}{2}}.

(6)

Larger $I$ corresponds to greater overall downstream deviation (averaged and in the extreme positive direction). The sign of $\bar{R}$ conveys the dominant direction of change. As a sensitivity alternative, $\max_{i} |R_{i}|$ may replace $R_{max}$ to obtain a symmetric variant with equal sensitivity to large negative and positive departures; the primary analysis used the one-sided form above.
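Equations (5) and (6), including the symmetric sensitivity variant, can be computed as in the Python sketch below. It uses $\varepsilon = 10^{-6}$ as stated; the function names and the three-indicator example are hypothetical.

```python
import math


def relative_changes(c_up, c_down, eps=1e-6):
    """Eq. (5): relative upstream-to-downstream change for each indicator."""
    return [(d - u) / (abs(u) + eps) for u, d in zip(c_up, c_down)]


def composite_index(r, symmetric=False):
    """Eq. (6): I = sqrt((R_max^2 + mean(R)^2) / 2).

    symmetric=True replaces R_max by max|R_i| (the sensitivity variant).
    """
    if not r:
        raise ValueError("at least one relative change is required")
    r_bar = sum(r) / len(r)
    r_ext = max(abs(x) for x in r) if symmetric else max(r)
    return math.sqrt((r_ext ** 2 + r_bar ** 2) / 2.0)


# Hypothetical upstream/downstream values for three indicators.
r = relative_changes([1.0, 2.0, 4.0], [1.1, 2.0, 1.0])
print(round(composite_index(r), 4), round(composite_index(r, symmetric=True), 4))  # 0.1687 0.552
```

In this example one indicator drops sharply (R ≈ −0.75) while the largest increase is small (R ≈ 0.1), so the one-sided index stays low and the symmetric variant is much larger, which is exactly the asymmetry the sensitivity check is meant to expose.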

The objective of further analysis was to build and compare low-complexity predictive models for the treated effluent indicators BOD and COD under a very small sample size (N = 10), with a preference for methods that enforce strong regularization and limit degrees of freedom. The input variables comprised upstream physicochemical measurements recorded before treatment: BOD (before), COD (before), Suspended solids, Ammonium salt, Nitrites, Nitrates, Chlorides, Sulfates, Phosphates, Synthetic surfactants, Petroleum hydrocarbons, Total mineralization (dry residue), and site indicator (point). Point was treated as a categorical predictor and encoded with indicator variables after dropping the reference level. The output variables were the two downstream responses measured after treatment: BOD (after) and COD (after). All numeric predictors were standardized to zero mean and unit variance prior to multivariate modelling.
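The preprocessing described above (standardized numeric predictors plus indicator columns for the site, with the reference level dropped) can be sketched with NumPy. This is an assumption-laden illustration: the function name build_design, the use of the sample standard deviation (ddof = 1), taking the first sorted label as the reference level, and leaving the dummy columns unscaled are all hypothetical choices. Statistics are computed on all rows, matching the description; computing them inside each LOOCV training fold would be a stricter alternative.

```python
import numpy as np


def build_design(numeric, point, reference=None):
    """Standardize numeric predictors and append 0/1 columns for the site factor.

    numeric: (n, k) upstream indicators; point: length-n site labels.
    The reference level (default: first sorted label) is dropped.
    """
    numeric = np.asarray(numeric, dtype=float)
    mean = numeric.mean(axis=0)
    std = numeric.std(axis=0, ddof=1)
    std[std == 0] = 1.0               # guard against constant columns
    z = (numeric - mean) / std        # zero mean, unit variance
    levels = sorted(set(point))
    ref = levels[0] if reference is None else reference
    dummies = [np.array([1.0 if p == lev else 0.0 for p in point])
               for lev in levels if lev != ref]
    if dummies:
        z = np.column_stack([z] + dummies)
    return z


# Hypothetical four records from two sites.
X = build_design([[1, 10], [2, 20], [3, 30], [4, 40]], ["TF1", "TF1", "TF2", "TF2"])
print(X.shape)  # (4, 3)
```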

Let $y \in \mathbb{R}^{n}$ denote the response (either $\mathrm{BOD}_{after}$ or $\mathrm{COD}_{after}$), and let $X \in \mathbb{R}^{n \times p}$ be the standardized predictor matrix (upstream indicators and, where applicable, site indicator variables), with coefficient vector $\beta \in \mathbb{R}^{p}$ and intercept $b$. The general linear formulation is

\hat{y} = b \mathbf{1} + X \beta.

(7)

For the linear families, estimation can be written as penalized least squares:

\min_{b, \beta} \frac{1}{2n} \lVert y - b \mathbf{1} - X \beta \rVert_{2}^{2} + P(\beta),

(8)

where the intercept $b$ is not penalized. The penalty term $P(\beta)$ differentiates the methods:

Baseline-Linear (OLS): $P(\beta) = 0$.
Ridge: $P(\beta) = \lambda \lVert \beta \rVert_{2}^{2}$.
Lasso: $P(\beta) = \lambda \lVert \beta \rVert_{1}$.
Elastic-Net: $P(\beta) = \lambda [\alpha \lVert \beta \rVert_{1} + \frac{1 - \alpha}{2} \lVert \beta \rVert_{2}^{2}]$, with $\alpha \in [0, 1]$ controlling the L1/L2 balance.
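Equation (8) with the Elastic-Net penalty can be minimized by cyclic coordinate descent with soft-thresholding; setting $\alpha = 1$ gives the Lasso penalty. The NumPy sketch below is written directly from the objective as stated and is not a reproduction of the lasso routine used in the study; the function names are hypothetical. Centering $X$ and $y$ lets the unpenalized intercept be recovered afterwards as $b = \bar{y} - \bar{x}^{\top}\beta$.

```python
import numpy as np


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net_cd(X, y, lam, alpha=1.0, n_iter=1000, tol=1e-10):
    """Minimize (1/2n)||y - b1 - X beta||^2 + lam*[alpha*||beta||_1 + (1-alpha)/2*||beta||_2^2]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean      # centering handles the unpenalized intercept
    beta = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    resid = yc.copy()                    # yc - Xc @ beta, with beta = 0 initially
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                 # constant column: coefficient stays at zero
            old = beta[j]
            # correlation of column j with the partial residual (own contribution added back)
            rho = Xc[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            if new != old:
                resid -= Xc[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return y_mean - x_mean @ beta, beta


x = np.array([[-1.0], [0.0], [1.0]])
y = np.array([-1.0, 1.0, 3.0])          # exactly y = 1 + 2x
print(elastic_net_cd(x, y, lam=0.0))    # no penalty: intercept 1, slope 2
print(elastic_net_cd(x, y, lam=10.0))   # strong L1 penalty: slope shrunk to 0
```

Because the L1 term enters through soft-thresholding, a sufficiently large $\lambda$ sets coefficients exactly to zero, which is the sparsity property that motivates Lasso for selecting a minimal predictor set.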

For the Fractional-Logit (Ratio) model, the response is the truncated removal ratio $r_{i} \in (0, 1)$, with $\mu_{i} = E(r_{i} \mid X_{i})$ and $\operatorname{logit}(\mu_{i}) = b + X_{i} \beta$; parameters are estimated by maximizing the binomial quasi-log-likelihood (equivalently, minimizing its negative):

\ell(b, \beta) = \sum_{i = 1}^{n} [r_{i} \log(\mu_{i}) + (1 - r_{i}) \log(1 - \mu_{i})].

(9)
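The quasi-log-likelihood in Equation (9) can be maximized by Fisher scoring (which coincides with Newton's method for the logit link). The NumPy sketch below is a minimal illustration, not the GLM routine used in the study; the function names are hypothetical. Ratios are clipped to $[10^{-6}, 1 - 10^{-6}]$ as a close stand-in for the open-interval truncation described in this section, and level-scale predictions use the back-transformation $\widehat{\mathrm{after}} = \hat{r} \cdot \mathrm{before}$.

```python
import numpy as np


def fit_fractional_logit(X, r, eps=1e-6, n_iter=100, tol=1e-10):
    """Maximize the binomial quasi-log-likelihood of Eq. (9) by Fisher scoring.

    Returns the coefficient vector [b, beta_1, ..., beta_p].
    """
    r = np.clip(np.asarray(r, dtype=float), eps, 1.0 - eps)          # keep ratios inside (0, 1)
    Z = np.column_stack([np.ones(len(r)), np.asarray(X, dtype=float)])  # intercept column
    coef = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(Z @ coef)))           # inverse logit
        score = Z.T @ (r - mu)                           # gradient of the quasi-log-likelihood
        info = Z.T @ (Z * (mu * (1.0 - mu))[:, None])    # expected information matrix
        step = np.linalg.solve(info, score)
        coef += step
        if np.max(np.abs(step)) < tol:
            break
    return coef


def predict_after(coef, X, before):
    """Back-transform to the level scale: after_hat = r_hat * before."""
    Z = np.column_stack([np.ones(len(before)), np.asarray(X, dtype=float)])
    r_hat = 1.0 / (1.0 + np.exp(-(Z @ coef)))
    return r_hat * np.asarray(before, dtype=float)


# Synthetic check: a constant ratio of 0.1 should give b = logit(0.1) and slope 0.
X = np.array([[-1.0], [0.0], [1.0]])
coef = fit_fractional_logit(X, [0.1, 0.1, 0.1])
print(coef, predict_after(coef, X, [100.0, 100.0, 100.0]))
```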

For PLS with $K$ latent components ($K = 1$ or $2$), $X$ is projected to a lower-dimensional score matrix $T = XW$, and the final regression is obtained by least squares in the latent space:

\min_{b, \gamma} \lVert y - b \mathbf{1} - T \gamma \rVert_{2}^{2},

(10)

with $T$ constructed to capture directions in $X$ that are most informative for predicting $y$.
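A NIPALS-style PLS1 sketch illustrates how the latent scores are built from directions of maximal covariance with $y$ and then mapped back to coefficients on the original predictors. It is not necessarily the algorithm implemented by plsregress, and its weight matrix W refers to the deflated predictor matrices, so it is a rotated version of the W in $T = XW$; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS-style PLS1; returns intercept b and coefficients beta on the original X scale."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    p = X.shape[1]
    W = np.zeros((p, n_components))   # weight vectors
    P = np.zeros((p, n_components))   # X loadings
    q = np.zeros(n_components)        # y loadings
    for k in range(n_components):
        w = Xk.T @ yk                 # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm == 0.0:
            raise ValueError("no covariance left; use fewer components")
        w /= norm
        t = Xk @ w                    # latent scores (column k of T)
        tt = t @ t
        P[:, k] = Xk.T @ t / tt
        q[k] = (yk @ t) / tt
        W[:, k] = w
        Xk = Xk - np.outer(t, P[:, k])  # deflate X
        yk = yk - q[k] * t              # deflate y
    beta = W @ np.linalg.solve(P.T @ W, q)  # map the latent regression back to X space
    return y_mean - x_mean @ beta, beta


X = np.array([[0, 1], [1, 0], [2, 3], [3, 1], [4, 5]], dtype=float)
y = 1.0 + 2.0 * X[:, 0] - X[:, 1]       # synthetic, noise-free
b, beta = pls1_fit(X, y, n_components=2)
print(round(b, 6), np.round(beta, 6))   # with K = p, PLS reproduces the least-squares fit
```

With $K$ smaller than the number of predictors (for example $K = 1$), the fit is restricted to a low-dimensional latent space, which is the regularizing effect relied on under $N = 10$.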

Five model families were estimated. Because the dataset is very small (N = 10) relative to the number of candidate predictors, and because strong collinearity is expected among routine water-quality indicators, we compared a small set of complementary low-complexity frameworks that represent different trade-offs aligned with the study objectives. The baseline linear model serves as an interpretable benchmark; Ridge regression improves prediction stability under multicollinearity; LASSO provides sparse variable selection to identify a minimal predictor set; Elastic Net balances stability and sparsity when predictors are correlated and may act in groups; and PLS offers a latent-variable alternative that reduces dimensionality while preserving covariance with the response. All model types were evaluated under the same LOOCV protocol to select the most accurate yet parsimonious specification for operational forecasting.

First, baseline linear level models related the outputs to their corresponding inputs with an intercept and, when present, a fixed effect for site, that is, BOD (after) ~ BOD (before) (+site) and COD (after) ~ COD (before) (+site). Second, a fractional specification modeled the removal ratio $r = \mathrm{after}/\mathrm{before}$ using a binomial generalized linear model with a logit link; ratios were truncated to the open interval $(10^{-6}, 1 - 10^{-6})$ to stabilize estimation near the boundaries, and predictions on the level scale were obtained by back-transformation $\widehat{\mathrm{after}} = \hat{r} \cdot \mathrm{before}$. Third, Ridge regression was fit with the MATLAB function fitrlinear using least-squares loss, $L_{2}$ regularization, an estimated bias term, and the L-BFGS solver; the regularization strength $\lambda$ was selected from a logarithmic grid of 60 points spanning $10^{-3}$ to $10^{3}$ based on leave-one-out performance. Fourth, Lasso and Elastic Net were fit with the MATLAB function lasso using $\alpha = 1$ and $\alpha = 0.5$, respectively, cross-validated with leave-one-out by setting the number of folds equal to the sample size, with external standardization disabled because predictors had already been scaled, and with the final $\lambda$ chosen at the minimum cross-validated mean squared error while retaining the intercept returned by the routine. Fifth, PLS regression with one and two latent components was estimated with plsregress, and predictions were formed as $[1, x]\hat{\beta}$ on the standardized design.

Model selection and hyperparameter tuning were carried out using leave-one-out cross-validation (LOOCV) because of the small sample size (N = 10). In each LOOCV fold, the model was trained on N − 1 observations and evaluated on the single held-out observation. Hyperparameters (e.g., the regularization strength for Ridge/Lasso/Elastic-Net and the number of latent components for PLS) were selected to minimize the LOOCV RMSE. Accuracy was quantified by the mean absolute error (MAE), the root mean squared error (RMSE), and the coefficient of determination ($R^{2}$) computed on the held-out observations, and the primary criterion for selecting the best model for each endpoint was the lowest LOOCV RMSE. In the next stage of the analyses, the best-performing model for each endpoint will be used to generate operational predictions accompanied by prediction intervals; residual diagnostics will be performed with attention to leverage and influential points; sensitivity to the regularization strength and the number of latent components will be assessed; and robustness will be checked with a leave-one-out-of-cluster strategy if site structure is present. If additional observations become available, external validation on new measurements will be conducted, hierarchical specifications with location-level random effects will be considered, and a simple mechanistic mixing benchmark will be used to contextualize level predictions of BOD (after) and COD (after).
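The LOOCV protocol and the three accuracy metrics can be sketched in NumPy using Ridge as the example model and the 60-point logarithmic grid from $10^{-3}$ to $10^{3}$. Assumptions: the Ridge fit uses the closed-form minimizer of Equation (8) with $P(\beta) = \lambda \lVert \beta \rVert_{2}^{2}$ (other software may scale $\lambda$ differently); $R^{2}$ is taken as $1 - \sum (y_i - \hat{y}_{(-i)})^{2} / \sum (y_i - \bar{y})^{2}$ over the pooled held-out predictions; function names and data are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Minimize Eq. (8) with P(beta) = lam * ||beta||_2^2; the intercept is not penalized."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Stationarity condition: (Xc^T Xc + 2 n lam I) beta = Xc^T yc
    beta = np.linalg.solve(Xc.T @ Xc + 2.0 * n * lam * np.eye(p), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta


def loocv_metrics(X, y, lam):
    """Leave-one-out RMSE, MAE and R^2 computed on the held-out predictions."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i           # train on N - 1 rows
        b, beta = ridge_fit(X[train], y[train], lam)
        preds[i] = b + X[i] @ beta          # predict the held-out row
    err = y - preds
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2))
    return rmse, mae, r2


def select_lambda(X, y, grid=np.logspace(-3, 3, 60)):
    """Pick the grid value with the lowest LOOCV RMSE."""
    rmses = [loocv_metrics(X, y, lam)[0] for lam in grid]
    best = int(np.argmin(rmses))
    return float(grid[best]), rmses[best]


# Synthetic illustration only (N = 10, three roughly standardized predictors).
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.1 * rng.standard_normal(10)
lam, rmse = select_lambda(X, y)
print(lam, loocv_metrics(X, y, lam))
```

Each held-out prediction comes from a model that never saw that observation, so all three metrics are out-of-sample even though every observation is eventually used for testing.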

3. Results and Discussion

Biological treatment facilities serve as a barrier protecting surface water bodies from the entry of pollutants, thereby contributing to the preservation of natural hydrobiocenoses. Only under artificially created conditions in biological treatment facilities can the waste products generated by biocenosis communities and large volumes of pollutants in wastewater be rapidly and effectively processed [40]. The average annual indicators of wastewater treatment efficiency at these treatment facilities are presented in Table 1 and Table 2.

From the influent and effluent values of the water-quality parameters at TF1 and TF2, the purification effects were determined (Table 3).

The data in Table 3 show that the purification effect differs among parameters over the 2020–2024 operating period, whereas for any given parameter it remains similar from year to year. For example, the purification effect for BOD_full ranged from 95.9% (2020) to 94.61% (2024) at TF1 and from 82.79% (2020) to 76.42% (2024) at TF2. The decline in treatment efficiency for some indicators may be attributable to the long service life of the facilities.
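The purification effect can be reproduced from Tables 1 and 2, assuming it is the usual percentage removal E = (C_in − C_out)/C_in × 100. With the BOD_full values below, this formula returns the percentages quoted above. A negative E (as for nitrates, whose effluent concentration exceeds the influent concentration) signals net production within the plant rather than removal.

```python
def purification_effect(c_in: float, c_out: float) -> float:
    """Percentage removal E = (C_in - C_out) / C_in * 100 (negative if C_out > C_in)."""
    if c_in <= 0:
        raise ValueError("influent concentration must be positive")
    return (c_in - c_out) / c_in * 100.0


if __name__ == "__main__":
    # BOD_full influent/effluent pairs (mg/L) from Tables 1 and 2
    pairs = {
        ("TF1", 2020): (137.99, 5.66),
        ("TF1", 2024): (128.01, 6.9),
        ("TF2", 2020): (70.35, 12.11),
        ("TF2", 2024): (72.1, 17.0),
    }
    for key, (c_in, c_out) in pairs.items():
        print(key, round(purification_effect(c_in, c_out), 2))
    # prints 95.9, 94.61, 82.79 and 76.42 for the four cases
    # Nitrates at TF1 in 2020 (1.21 -> 53.0 mg/L): negative, i.e. net production
    print(round(purification_effect(1.21, 53.0), 1))
```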

Discharge standards for pollutants, approved by the Department of Natural Resources and Environmental Regulation of the East Kazakhstan Region, are established at the discharge points. Based on the data from Table 1 and Table 2, diagrams were constructed showing pollutant concentrations that exceed the permissible concentrations (PC) at the discharge points according to the established discharge standards. Figure 3 presents data on BOD_full, Figure 4 on ammonium salt, Figure 5 on nitrates, Figure 6 on chlorides, Figure 7 on sulfates, and Figure 8 on phosphates. The graphs show data from treatment facilities TF1 and TF2 for the years 2020–2024.

For the BOD_full indicator at treatment facilities No. 1 and No. 2, no exceedances of permissible concentrations (PC) were recorded from 2020 to 2022. At facility No. 1, exceedances were observed in 2023 and 2024 by 1.16 and 1.21 times, respectively. At facility No. 2, exceedances occurred in 2023 and 2024 by 1.21 and 1.39 times, respectively.

For the ammonium salt indicator at treatment facilities No. 1 and No. 2, no exceedances of permissible concentrations (PC) were recorded from 2020 to 2022. At facility No. 1, exceedances were observed in 2023 and 2024 by 1.41 and 1.63 times, respectively. At facility No. 2, exceedances occurred in 2023 and 2024 by 1.15 and 1.26 times, respectively.

For nitrate levels at treatment facility No. 1, no exceedances of permissible concentrations (PC) were recorded from 2020 to 2023, with an exceedance of 1.03 times noted in 2024. At facility No. 2, no exceedances of PC for nitrates were observed.

For chloride levels at treatment facility No. 1, no exceedances of permissible concentrations (PC) were recorded from 2020 to 2022, while exceedances of 1.27 and 1.22 times were noted in 2023 and 2024, respectively. At facility No. 2, no exceedances of PC for chlorides were observed.

For sulfate levels at treatment facility No. 1, no exceedances of permissible concentrations (PC) were recorded from 2020 to 2022, while exceedances of 1.14 and 1.21 times were observed in 2023 and 2024, respectively. At facility No. 2, a sulfate PC exceedance of 1.05 times was recorded in 2024.

For phosphate levels at treatment facility No. 1, no exceedances of permissible concentrations (PC) were recorded from 2020 to 2022, while exceedances of 1.83 and 2.06 times were observed in 2023 and 2024, respectively. At facility No. 2, no exceedances of PC for phosphates were noted.

For monitored parameters at the discharge points of treatment facilities No. 1 and No. 2, such as nitrites, suspended solids, synthetic surfactants, and petroleum hydrocarbons, no exceedances of permissible concentrations (PC) were observed.
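The exceedance multiples quoted above are ratios of the effluent concentration to the PC. The approved permissible concentrations are not listed in this section, so the PC in the sketch below is a hypothetical value of 5.7 mg/L for TF1 BOD_full, chosen only because it is consistent with the reported multiples of 1.16 (2023) and 1.21 (2024).

```python
def exceedance_multiple(c_effluent: float, pc: float) -> float:
    """Effluent concentration expressed as a multiple of the permissible concentration (PC)."""
    if pc <= 0:
        raise ValueError("PC must be positive")
    return c_effluent / pc


def exceedances(series: dict[int, float], pc: float) -> dict[int, float]:
    """Years in which the PC is exceeded, mapped to the exceedance multiple (2 d.p.)."""
    return {year: round(exceedance_multiple(c, pc), 2)
            for year, c in series.items() if c > pc}


if __name__ == "__main__":
    # TF1 effluent BOD_full (mg/L) from Table 1
    tf1_bod_full = {2020: 5.66, 2021: 5.6, 2022: 5.52, 2023: 6.6, 2024: 6.9}
    PC_HYPOTHETICAL = 5.7  # hypothetical PC (mg/L); the approved value is not stated here
    print(exceedances(tf1_bod_full, PC_HYPOTHETICAL))  # {2023: 1.16, 2024: 1.21}
```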

Table 4 and Table 5 present the background concentrations at the control sections upstream and downstream of the discharge points of the studied treatment facilities.

The data analysis shows that pollutant concentrations at the control sections do not exceed the maximum permissible concentrations for water bodies designated for fisheries. This can be attributed to the high water volume of the Irtysh River, dilution of discharges, and the river’s self-purification capacity.

The author of [24] notes that during 2001–2011 the water of the Irtysh River was at a “normatively clean” level according to domestic and household standards. Krupa et al. [1] also report that the maximum concentrations of phosphates and nitrate nitrogen in the river in 2023 were low. The monitoring results for 2024–2025 presented in this article likewise classify the water of the Irtysh River as “normatively clean.” This is explained by the assimilative capacity of the river water and by the sedimentation of pollutants in the cascade of reservoirs.

Discharging domestic wastewater from small settlements, treated by conventional methods at facilities with a long service life, into low-flow water bodies may lead to adverse environmental conditions. It is therefore important to reconstruct existing treatment facilities or introduce new, modern wastewater treatment technologies. This will enhance treatment efficiency, reduce anthropogenic impact, and ultimately help preserve the ecological state of the Irtysh River.

Effluent fractions inferred from conservative tracers indicated very strong dilution at both sites. Using chloride as the conservative tracer, the reference fractions yielded D ≈ 2.0 × 10³ (facility TF1) and D ≈ 4.2 × 10² (facility TF2). Candidate tracers were screened for convexity and 0 ≤ f ≤ 1; infeasible records (e.g., sulfates at facility TF1) were excluded from the reference fraction (Table 6).
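A minimal sketch of the two-end-member calculation behind f and D, assuming complete mixing of upstream water and effluent, C_down = (1 − f)·C_up + f·C_eff, and D = 1/f. The feasibility screen mirrors the 0 ≤ f ≤ 1 criterion; the concentrations in the usage lines are hypothetical.

```python
from typing import Optional


def effluent_fraction(c_up: float, c_eff: float, c_down: float,
                      eps: float = 1e-12) -> Optional[float]:
    """Two-end-member effluent fraction f = (C_down - C_up) / (C_eff - C_up).

    Returns None for infeasible records: indistinguishable end-members, or a
    downstream value outside the segment between C_up and C_eff (f outside [0, 1]).
    """
    contrast = c_eff - c_up
    if abs(contrast) < eps:
        return None
    f = (c_down - c_up) / contrast
    return f if 0.0 <= f <= 1.0 else None


def dilution_factor(f: float) -> float:
    """Dilution factor D = 1 / f (assumed definition); infinite when f = 0."""
    return float("inf") if f == 0.0 else 1.0 / f


if __name__ == "__main__":
    # Hypothetical chloride concentrations (mg/L): upstream, effluent, downstream
    f = effluent_fraction(10.0, 50.0, 12.0)
    print(f, dilution_factor(f))
    # Downstream below upstream gives f < 0, so the record is rejected
    print(effluent_fraction(10.0, 50.0, 9.0))
```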

Using the chloride-derived f*, θ was computed for reactive indicators (Table 7). For facility TF1, values spanned from −47.67 (nitrates) to 51.53 (nitrites); selected results were COD = 13.32, ammonium = 42.14, phosphates = 6.18, and dry residue = 2.04. For facility TF2, the range was −37.72 (nitrites) to 2.59 (dry residue); selected results were COD = 1.56, ammonium = 0.15, phosphates = 0.09, and nitrates = −35.62. Values for the tracer used to obtain f* are not reported (for that indicator, θ ≡ 1 by definition).
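The diagnostic θ can be sketched as the ratio of the observed downstream increment to the increment predicted by conservative mixing at f*. This exact form is an assumption here (the defining equation is given in the Methods, outside this section); it is chosen because it yields θ ≡ 1 for the tracer and allows negative values when downstream concentrations fall below upstream ones, as reported for nitrates and nitrites. All concentrations in the usage lines are hypothetical.

```python
def theta(c_up: float, c_eff: float, c_down: float, f_star: float) -> float:
    """Observed downstream increment divided by the increment predicted by
    conservative mixing at f*: theta = (C_down - C_up) / (f* (C_eff - C_up)).

    theta = 1: conservative behaviour; theta > 1: excess over mixing;
    0 < theta < 1: net attenuation; theta < 0: downstream below upstream.
    """
    predicted_increment = f_star * (c_eff - c_up)
    if predicted_increment == 0:
        raise ValueError("mixing predicts no change; theta is undefined")
    return (c_down - c_up) / predicted_increment


if __name__ == "__main__":
    # Hypothetical chloride end-members (mg/L) give the reference fraction f*
    f_star = (12.0 - 10.0) / (50.0 - 10.0)  # 0.05
    print(theta(10.0, 50.0, 12.0, f_star))  # the tracer itself: theta = 1
    # Hypothetical reactive indicator: upstream 1.0 mg/L, effluent 41.0 mg/L
    for c_down in (5.0, 2.0, 0.5):  # excess, attenuation, below upstream
        print(c_down, theta(1.0, 41.0, c_down, f_star))
```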

Using all indicators with paired upstream–downstream values (19 per site), the composite index reached I = 0.51 for facility TF1 and I = 0.53 for facility TF2. The mean relative change was R̄ = 0.09 (facility TF1) and R̄ = 0.03 (facility TF2), indicating small net increases downstream on average. The largest relative increase among indicators reached R_max = 0.71 at facility TF1 and R_max = 0.75 at facility TF2. By definition,

I = \sqrt{\frac{R_{max}^{2} + \bar{R}^{2}}{2}},

so that, for facility TF1, I = √((0.71² + 0.09²)/2) ≈ 0.51.
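A short sketch of the composite index. It uses the Nemerow-type form I = √((R_max² + R̄²)/2), which reproduces the reported values (0.51 for TF1, 0.53 for TF2) from the R_max and R̄ given above; the per-indicator relative change R_i = (C_down − C_up)/C_up is an assumed definition.

```python
import math
from statistics import mean


def relative_changes(upstream, downstream):
    """Per-indicator relative change R_i = (C_down - C_up) / C_up (assumed definition).

    Upstream concentrations must be non-zero.
    """
    return [(d - u) / u for u, d in zip(upstream, downstream)]


def composite_index(r_max, r_mean):
    """Nemerow-type composite index I = sqrt((R_max**2 + R_mean**2) / 2)."""
    return math.sqrt((r_max ** 2 + r_mean ** 2) / 2.0)


def summarize(upstream, downstream):
    """Return (R_max, R_mean, I) for paired upstream/downstream concentrations."""
    r = relative_changes(upstream, downstream)
    r_max, r_mean = max(r), mean(r)
    return r_max, r_mean, composite_index(r_max, r_mean)


if __name__ == "__main__":
    # Summary statistics reported in the text
    print(round(composite_index(0.71, 0.09), 2))  # TF1 -> 0.51
    print(round(composite_index(0.75, 0.03), 2))  # TF2 -> 0.53
```

Because R_max enters squared with the same weight as R̄, a single strongly increasing indicator keeps I near 0.5 even when the average change is close to zero, which is the pattern discussed below.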

The tracer-based mixing/dilution calculation indicates that the effluent signal reached the downstream cross-sections under very strong dilution at both facilities. Using chloride as the preferred conservative tracer, the reference effluent fractions (f*) corresponded to dilution factors of approximately D ≈ 2.0 × 10³ (facility TF1) and D ≈ 4.2 × 10² (facility TF2).

Against this dilution background, departures of reactive indicators from the mixing line were quantified using the transformation diagnostic θ, which compares observed downstream concentrations with the values predicted by mixing at the site-specific f*. At facility TF1, several constituents substantially exceeded the mixing expectation, with θ ≈ 13 for COD, θ ≈ 4 × 10¹ for ammonium, and θ ≈ 6 for phosphates. These values imply downstream concentrations higher than can be explained by mixing alone and are consistent with additional in-reach inputs (e.g., lateral inflows, bank returns), short-term storage–release dynamics (e.g., sediment–water exchange), or incomplete transverse mixing at the sampling distance. In contrast, nitrate exhibited θ < 0, indicating concentrations below the mixing prediction, consistent with strong net removal over the monitored reach or with temporal misalignment between river and effluent samples. At facility TF2, values clustered nearer unity: COD showed only a modest excess over mixing (θ ≳ 1), whereas ammonium and phosphates had θ < 1, indicating net attenuation along the reach; nitrite and nitrate displayed negative θ, again suggestive of removal. Two numerical aspects merit attention. First, a very small f* (large D) enters the denominator of θ, so small concentration differences are strongly amplified; nonetheless, the consistent directional signals across multiple indicators support a process-based interpretation. Second, indicators with mismatched analytical definitions between effluent and river (e.g., BOD_full vs. BOD₅) were excluded from this diagnostic to avoid bias.
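The first numerical point can be made concrete. Assuming θ is the observed downstream increment divided by the mixing-predicted increment (a form consistent with θ ≡ 1 for the tracer and with the negative values reported), an analytical error δ in the downstream concentration propagates to θ as shown below. C_u, C_e and C_d denote upstream, effluent and downstream concentrations (notation introduced here), and the numbers in the last line are hypothetical, chosen only to match the order of magnitude of D at TF1.

```latex
% C_u, C_e, C_d: upstream, effluent and downstream concentrations (assumed notation)
\theta = \frac{C_d - C_u}{f^{*}\,(C_e - C_u)} = D\,\frac{C_d - C_u}{C_e - C_u}, \qquad D = \frac{1}{f^{*}}

% An error \delta in the measured downstream concentration propagates as
\Delta\theta = \frac{\partial\theta}{\partial C_d}\,\delta = \frac{D\,\delta}{C_e - C_u}

% Hypothetical numbers: D = 2000, \delta = 0.01 mg/L, C_e - C_u = 40 mg/L
\Delta\theta = \frac{2000 \times 0.01}{40} = 0.5
```

Under these assumptions, a 0.01 mg/L analytical uncertainty already shifts θ by half a unit, which is why single large θ values are best read together with their sign and with the behaviour of other indicators.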

A multi-indicator summary of upstream-to-downstream change, expressed by the unitless composite index I, yielded I = 0.51 for facility TF1 and I = 0.53 for facility TF2. Despite the large θ excursions for selected indicators, the mean relative changes were modest (R̄ = 0.09 and 0.03, respectively), while the largest relative increases (R_max = 0.71 and 0.75) explain why I settles near 0.5. This reconciliation is informative: the plume is strongly diluted overall, yet individual reactive constituents can deviate appreciably, positively or negatively, from the mixing line at local scales. These findings are coherent with a system in which hydrodynamic dilution and self-purification dominate the aggregate signal, while site-specific sources/sinks or incomplete mixing shape the behavior of particular indicators. Practical implications follow. Verification of full mixing (e.g., by short-range logging of conductivity/chloride and, if possible, an intermediate station within the nominal mixing zone) would reduce uncertainty in f* and, consequently, in θ. Improved temporal synchrony between effluent and river sampling would clarify the negative θ cases for oxidized nitrogen. Where flows are available, converting f* to discharge ratios would strengthen the physical interpretation of dilution differences between facilities. Overall, the dilution calculation provides a physically grounded baseline; the transformation diagnostic identifies indicators and sites where biogeochemical processes or additional inputs are likely; and the composite index confirms that, despite notable departures for select constituents, the aggregate downstream impact remains modest.
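Converting f* to a discharge ratio is straightforward under the complete-mixing assumption f* = Q_eff/(Q_up + Q_eff), which gives Q_eff/Q_up = f*/(1 − f*) and Q_up/Q_eff = D − 1. The sketch below assumes that relation and D = 1/f*; the upstream discharge in the usage line is hypothetical, not a measured Irtysh flow.

```python
def discharge_ratio(f_star: float) -> float:
    """Q_eff / Q_up implied by f* = Q_eff / (Q_up + Q_eff) under complete mixing."""
    if not 0.0 < f_star < 1.0:
        raise ValueError("f* must lie strictly between 0 and 1")
    return f_star / (1.0 - f_star)


def implied_effluent_discharge(f_star: float, q_up: float) -> float:
    """Effluent discharge implied by f* for a given upstream discharge (same units)."""
    return q_up * discharge_ratio(f_star)


if __name__ == "__main__":
    for label, d in (("TF1", 2.0e3), ("TF2", 4.2e2)):
        f_star = 1.0 / d  # assumes D = 1 / f*
        print(label, "Q_up/Q_eff ~", round(1.0 / discharge_ratio(f_star), 1))
    # Hypothetical upstream discharge of 500 m^3/s
    print(implied_effluent_discharge(1.0 / 2.0e3, 500.0))
```

Read in reverse, a measured effluent flow and river discharge would give an independent check on the chloride-derived f*.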

In this stage, seven model specifications were evaluated under leave-one-out cross-validation: Baseline-Linear, Fractional-Logit (Ratio), Ridge, Lasso (α = 1), Elastic-Net (α = 0.5), PLS-1, and PLS-2, where α denotes the L1/L2 mixing weight of the penalty. The full set of performance metrics for BOD and COD is reported in Table 8. Briefly, for BOD the Lasso model achieved the lowest cross-validated error (RMSE = 0.626, MAE = 0.459, R² = 0.976), closely followed by Elastic-Net (RMSE = 0.655, MAE = 0.411, R² = 0.974), with Ridge and PLS yielding intermediate accuracy and Baseline-Linear and Fractional-Logit performing worse. For COD, Lasso again ranked first (RMSE = 0.795, MAE = 0.634, R² = 0.997), Elastic-Net placed second, Ridge and PLS formed a middle tier, the Baseline-Linear model was weaker, and the Fractional-Logit specification performed poorly after back-transformation to the level scale (RMSE = 18.978, R² < 0). All R² values reported in Table 8 are based on LOOCV (out-of-sample) predictions, not in-sample fits.
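One plausible reading of the Fractional-Logit (Ratio) specification, stated here as an assumption rather than the authors' exact formulation, is that it models the ratio r = C_after/C_before on the logit scale and back-transforms levels as C_after = C_before · expit(η). The sketch shows why such a model can fit the ratio well yet perform poorly on levels: the same logit-scale error is multiplied by C_before, so high-influent samples dominate the level RMSE. All numbers are hypothetical.

```python
import math


def expit(eta: float) -> float:
    """Inverse logit: maps a linear predictor to a ratio in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p: float) -> float:
    """Logit of a ratio p in (0, 1)."""
    return math.log(p / (1.0 - p))


def back_transform(c_before: float, eta: float) -> float:
    """Level-scale prediction C_after = C_before * expit(eta) (assumed form)."""
    return c_before * expit(eta)


def level_error(c_before: float, eta_true: float, eta_error: float) -> float:
    """Level-scale error caused by an error of eta_error on the logit scale."""
    return back_transform(c_before, eta_true + eta_error) - back_transform(c_before, eta_true)


if __name__ == "__main__":
    eta = logit(0.1)  # hypothetical true effluent/influent ratio of 0.1
    for c_before in (30.0, 300.0):  # hypothetical influent levels, mg/L
        print(c_before, round(level_error(c_before, eta, 0.2), 3))
```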

In the winning Lasso models, only a small subset of upstream indicators carried non-zero coefficients, which clarifies where most of the predictive signal resides. For the BOD model, the retained predictors were phosphates (β = 3.1202), chlorides (β = −1.0255), and COD (β = 0.0088). On the standardized input scale, this pattern suggests that higher phosphate content co-varies with higher treated BOD, whereas higher chloride, acting as a conservative tracer of dilution, is associated with lower BOD; the small positive weight on COD likely reflects residual correlation between organic load measures. For the COD model, the active predictors were nitrites (β = 9.8004), sulfates (β = −3.2327), suspended solids (β = −2.0245), petroleum hydrocarbons (β = 1.9323), and COD (β = −1.5202). The strong positive coefficient for nitrites is consistent with episodes of incomplete nitrification co-occurring with elevated oxygen demand, while the negative coefficients for sulfates and suspended solids point to conditions in which dilution or efficient solids removal reduces effluent COD; the positive petroleum-hydrocarbons coefficient indicates co-movement with COD, as expected for hydrophobic organic fractions. Because predictors were standardized before fitting, coefficient magnitudes provide a relative importance ranking (nitrites ≫ sulfates > suspended solids for COD; phosphates > chlorides for BOD), but given the very small sample (N = 10) and the use of penalization, these signs should be interpreted as stable associations rather than causal effects.
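To illustrate how an L1 penalty produces sparse coefficients on the standardized scale, here is a minimal cyclic coordinate-descent Lasso on z-scored predictors, minimizing (1/2n)‖y − β₀ − Zβ‖² + λ‖β‖₁, i.e., the pure-L1 case (α = 1) of the glmnet-style elastic-net objective. It is a sketch under stated assumptions: the study's software, λ value, and convergence settings are not reported in this section, and the usage data are synthetic.

```python
from statistics import mean, pstdev


def standardize(X):
    """Z-score each column (population SD); constant columns become all zeros."""
    p = len(X[0])
    mus = [mean(row[j] for row in X) for j in range(p)]
    sds = [pstdev([row[j] for row in X]) or 1.0 for j in range(p)]
    return [[(row[j] - mus[j]) / sds[j] for j in range(p)] for row in X]


def soft_threshold(z, gamma):
    """Proximal operator of gamma * |b|."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for (1/2n)||y - b0 - Z b||^2 + lam * ||b||_1.

    Coefficients are returned on the standardized (z-score) scale, so their
    magnitudes can be compared across predictors.
    """
    Z = standardize(X)
    n, p = len(Z), len(Z[0])
    b0 = mean(y)                 # intercept: columns of Z are centred
    beta = [0.0] * p
    resid = [v - b0 for v in y]  # residual for beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            col = [row[j] for row in Z]
            # (1/n) z_j'(partial residual); (1/n)||z_j||^2 = 1 for standardized columns
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b0, beta


if __name__ == "__main__":
    # Synthetic data: y depends only on the first of three predictors
    x1 = [float(v) for v in range(1, 11)]
    x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
    x3 = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0, 2.0, 8.0]
    X = [list(row) for row in zip(x1, x2, x3)]
    y = [5.0 + 2.0 * v for v in x1]
    b0, beta = lasso_cd(X, y, lam=0.5)
    print(b0, [round(b, 4) for b in beta])  # the two irrelevant predictors are exactly zero
```

The soft-threshold step sets a coefficient exactly to zero whenever its correlation with the current residual is below λ, which is the mechanism behind the small retained subsets reported for BOD and COD.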

The observed-versus-predicted scatterplot (Figure 9) shows a compact cloud of points aligned with the 1:1 identity line, indicating accurate level predictions across the observed range. The cloud widens slightly at higher concentrations, which is consistent with the uncertainty expected for a sample size of ten, but no systematic bias is apparent.

Sensitivity checks confirmed the ranking: Elastic-Net trailed Lasso by a small margin, PLS-1 and PLS-2 showed only modest differences, and the Fractional-Logit (ratio) specification underperformed for COD after back-transformation to levels.

Taken together, the mixing analysis and the statistical modeling tell a coherent story. The river’s large assimilative capacity and high dilution factors dominate the aggregate downstream signal, which is why background concentrations remain below regulatory thresholds. At the same time, several reactive constituents exhibit clear deviations from the mixing expectation at facility 1 and, to a lesser extent, at facility 2, consistent with local sources/sinks, short-term storage–release, or incomplete transverse mixing over the sampling distance. The sparse Lasso solutions reinforce this process view by isolating a few chemically meaningful predictors (e.g., oxidized nitrogen species for COD, phosphates and chlorides for BOD) that track effluent variability most closely.

Two limitations should be noted. First, the predictive sample is small (N = 10), so effect signs and magnitudes should be interpreted as stable associations rather than causal mechanisms; the chosen models control variance through regularization and cross-validation, but additional data would sharpen the estimates and allow external validation. Second, a very small reference fraction (f*) enters the denominator of θ and therefore amplifies small concentration differences; nonetheless, the directional consistency across multiple indicators supports the qualitative interpretation.

From a management perspective, three implications follow. Verification of full mixing at the downstream cross-sections, for example by short-reach logging of conductivity or chloride, ideally with an intermediate station, would reduce uncertainty in f* and, consequently, in θ. Synchronizing effluent and river sampling in time would clarify the negative θ cases for oxidized nitrogen. Where discharge data are available, converting f* to flow ratios would strengthen the physical interpretation of dilution and help compare facilities. In parallel, modernizing small, long-serving treatment plants remains important: even under strong river dilution, localized departures from the mixing line suggest that upgrades enhancing ammonium oxidation and phosphorus removal would further reduce the potential for reach-scale excursions. Overall, the dilution calculation provides a physically grounded baseline, the transformation diagnostic identifies where biogeochemical processes or additional inputs are likely, and the predictive models deliver practical tools for forecasting effluent quality with quantified uncertainty.

4. Conclusions

The article presents an assessment of the impact of discharges from two operating treatment facilities with a conventional treatment scheme serving small settlements. The research results showed that although the anthropogenic load on the water body from small treatment facilities is relatively minor, it is still present.

For the treatment facilities, permissible concentrations for discharges are established in accordance with the approved pollutant discharge standards, which are authorized by the Department of Natural Resources and Environmental Regulation of the East Kazakhstan Region. Exceedances of permissible concentrations were recorded for BOD_full, ammonium salt, nitrates, chlorides, sulfates, and phosphates. No exceedances were observed for nitrites, suspended solids, synthetic surfactants, or petroleum hydrocarbons. This indicates insufficient efficiency of the existing wastewater treatment technologies used in small-scale sewerage systems.

For BOD_full, suspended solids, ammonium salt, phosphates, synthetic surfactants, nitrates, and COD, the purification effect is more than 90% for TF1 and more than 70% for TF2. For nitrites, the purification effect ranges from 20 to 56% for TF1 and from 64 to 77% for TF2. For TF1 and TF2, the purification effect ranges from 10 to 50% for chlorides and from 12 to 58% for sulfates. In terms of total mineralization, the purification effect ranges from 9 to 20% for TF1 and from 65 to 67% for TF2. Chlorides and sulfates are present in dissolved form, and their concentrations depend largely on their content in the drinking water supplied to consumers. For petroleum hydrocarbons, the purification effect ranges from 25 to 50% for TF1 and from 30 to 67% for TF2; although these plants have no dedicated process for retaining petroleum hydrocarbons, effluent concentrations do not exceed the PC.

A comparative analysis of background concentrations before and after the discharges confirms that the high water volume and significant dilution capacity of the Irtysh River allow for partial compensation of localized impacts. However, despite the river’s natural self-purification ability, maintaining a stable ecological condition in the river basin requires a reduction in anthropogenic load. This can be achieved through the reconstruction of existing treatment facilities with the implementation of modern technological solutions aimed at removing biogenic parameters (phosphates, nitrogen compounds) and organic pollutants.

Thus, the modernization of small wastewater treatment facilities is a key condition for the sustainable development of coastal settlements and the preservation of ecological stability in the Irtysh River basin under increasing anthropogenic load.

This study shows that the downstream signal of the two investigated small-settlement facilities arrived under very strong dilution, as indicated by the dilution factors derived from chloride-based reference fractions (D ≈ 2.0 × 10³ and 4.2 × 10²). Against this background, several reactive indicators at facility 1, most notably COD, ammonium, and phosphates, exceeded mixing expectations, whereas nitrate exhibited net removal; facility 2 displayed θ values closer to unity, indicating modest excess or attenuation. These outcomes reconcile a strongly diluted plume at the reach scale with constituent-specific departures that likely reflect in-reach sources/sinks, short-term storage–release, or incomplete transverse mixing at the sampling distance.

From an operational perspective, low-complexity predictive models trained on upstream indicators performed strongly. Lasso regression yielded the most accurate forecasts for both BOD_after and COD_after, closely followed by Elastic-Net, while PLS and Ridge delivered intermediate accuracy. The winning models were sparse, relying on a small subset of upstream measurements, which simplifies routine monitoring and supports compact early-warning tools.

Three practical recommendations follow. First, tracer logging (e.g., conductivity/chloride) and, where feasible, an intermediate station would reduce uncertainty in effluent fractions and sharpen θ interpretation. Second, synchronizing river and effluent sampling times would improve diagnostics for oxidized nitrogen, which showed negative θ. Third, targeted modernization of small facilities—especially steps that enhance removal of biogenic elements and organic fractions—remains warranted, consistent with the manuscript’s broader argument on technology upgrades for small systems.

Author Contributions

Conceptualization, S.A. and V.K.; methodology, V.K.; validation, V.K. and A.S. and M.N.; formal analysis, M.N.; investigation, S.A. and M.K.; resources, S.A.; data curation, S.A. and N.N.; writing—original draft preparation, S.A. and Y.Y.; writing—review and editing, V.K. and M.K.; visualization, Y.Y. and N.N.; supervision, M.K. and Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original results presented in this study are included in the current paper. For any further inquiries, please contact the corresponding author.

Acknowledgments

We would like to express our deepest gratitude to the reviewers for their detailed reviews and comments, which helped to significantly improve this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Krupa, E.; Romanova, S.; Serikova, A.; Shakhvorostova, L. A Comprehensive Assessment of the Ecological State of the Transboundary Irtysh River (Kazakhstan, Central Asia). Water 2024, 16, 973.
Kolpakova, V.P.; Yeremeyeva, Y.N.; Nurekenova, R.T.; Mamyrbekova, G.K.; Anapyanova, S.B. Assessment of the Water Quality Indicators of the Irtysh River under Conditions of Industrial Development and Global Processes. In Monograph; EKTU: Ust-Kamenogorsk, Kazakhstan, 2024; pp. 10–16.
Nadelyaeva, N.N. Protection of Water Bodies from Pollution by Bioorganic Material of Treated Wastewater in Regions with a Sharply Continental Climate. Ph.D. Dissertation, Transbaikal State University, Chita, Russia, 2009.
Stoyashcheva, N.V. Anthropogenic Load on Water Bodies of the Tom River Basin. Geogr. Nat. Resour. 2018, 3, 95–103. Available online: https://elibrary.ru/item.asp?id=35574511 (accessed on 3 July 2025).
Bureau of National Statistics of the Agency for Strategic Planning and Reforms of the Republic of Kazakhstan. Available online: https://stat.gov.kz/ru/industries/social-tatistics/demography/publications/117681/ (accessed on 3 July 2025).
Official Information Resource of the Prime Minister of the Republic of Kazakhstan. Available online: https://primeminister.kz/ru/news/17-trln-tenge-potratyat-do-2027-goda-na-razvitie-10-ti-monogorodov-v-kazakhstane-25472 (accessed on 3 July 2025).
Strategic Development Plan “Kazakhstan-2050” (Approved by Decree of the President of the Republic of Kazakhstan No. 636, 15 February 2018). Available online: https://www.akorda.kz/ru/official_documents/strategies_and_programs (accessed on 3 July 2025).
Zhasybaev, A.; Ospanov, K.T. Analysis of the Current State of Sewage Sludge Treatment in Cities of Republican Significance. Vestn. KazNTU 2013, 5, 102–104. Available online: http://wemag.ru/arhiv/2013 (accessed on 3 July 2025).
Zhumartov, Y.B. Improvement of Equipment and Technology for Wastewater Treatment in Small Sewerage Systems. Ph.D. Dissertation, Kazakh National Technical University named after K.I. Satpayev, Almaty, Kazakhstan, 2010; p. 40.
Zhalmagambetova, U.; Assanov, D.; Neftissov, A.; Biloshchytskyi, A.; Radelyuk, I. Implications of Water Quality Index and Multivariate Statistics for Improved Environmental Regulation in the Irtysh River Basin (Kazakhstan). Water 2024, 16, 2203.
Kalmakhanova, M.S.; Diaz de Tuesta, J.L.; Malakar, A.; Gomes, H.T.; Snow, D.D. Wastewater Treatment in Central Asia: Treatment Alternatives for Safe Water Reuse. Sustainability 2023, 15, 4949.
Ospanov, K.; Rakhimov, T.; Myrzakhmetov, M.; Andraka, D. Assessment of the Impact of Sewage Storage Ponds on the Water Environment in Surrounding Area. Water 2020, 12, 2483.
Petrov, M.P.; Shagidullin, R.R. Anthropogenic Load on Water Bodies and Problems of Operation of Biological Treatment Facilities. Georesursy 2011, 2, 14–20.
Kazinform International News Agency. Sewerage Treatment Facilities Are Absent in 27 Cities—Ministry of Industry of the Republic of Kazakhstan. 2023. Available online: https://www.inform.kz/ru/kanalizacionnye-ochistnye-sooruzheniya-otsutstvuyut-v-27-gorodah-minindustrii-rk_a4017202 (accessed on 28 October 2025).
Vilson, E.V.; Butko, D.A. Updating Wastewater Treatment Technology Based on the Best Available Techniques. Bull. Eurasian Sci. 2019, 4. Available online: https://esj.today/PDF/39SAVN419.pdf (accessed on 22 July 2025).
Pupyrev, E.I.; Shelomkov, A.S. Economic Justification of Environmentally Safe Wastewater Treatment Technologies. Okhrana Okruzhayushchey Sredy. Water Supply Sanit. Tech. 2014, 1, 5–12.
Kulikov, N.I.; Omelchenko, V.V.; Kulikova, E.N.; Prikhodko, L.N. Wastewater Disposal: Textbook; LENAND: Moscow, Russia, 2018; pp. 110–118.
Shtonda, Y.I.; Panina, V.G. Studies of the Efficiency of Wastewater Treatment at Existing Small and Local Sewerage Treatment Facilities in the Republic of Crimea. In Proceedings of the International Scientific Forum: Science and Innovations—Modern Concepts; Infinity Publishing: Moscow, Russia, 2022.
Kulakov, A.A. Small Wastewater Treatment Plants: The Problem of Choosing a Technical Solution. Ways of Effectively Eliminating the Consequences of an Unsuccessful Choice. In Best Available Technologies of Water Supply and Wastewater Disposal; Springer: Berlin/Heidelberg, Germany, 2019; Volume 6, pp. 12–22. Available online: https://www.elibrary.ru/item.asp?id=45695539 (accessed on 5 July 2025).
Boronina, L.V.; Abuova, G.B. Ecological Assessment of the Efficiency of Water Treatment for Small Settlements. Eng. Constr. Bull. Casp. Reg. 2019, 4, 38–42.
Kolpakova, V.; Yeremeyeva, Y.; Anapyanova, S.; Shevtsov, M.; Utepbergenova, L.; Abdukalikova, G.; Abduova, A.; Sarypbekova, N.; Shakhmov, Z. Design and Construction of Wastewater Treatment Facilities for Small Sewerage Facilities. Case Stud. Chem. Environ. Eng. 2024, 9, 100774.
Shuvalov, M.V.; Strelkov, A.K.; Shuvalov, R.M. Transformation of Biological Wastewater Treatment Technology Using Disc Biofilters. In Traditions and Innovations in Construction and Architecture. Construction and Building Technologies, Proceedings of the 80th Anniversary All-Russian Scientific and Technical Conference, Samara, Russia, 28–31 March 2023; Available online: https://www.elibrary.ru/item.asp?id=54366366 (accessed on 5 July 2025).
Pervov, A.G. Creation of Wastewater Treatment Systems Using Bioreactors Manufactured by “Raifil”. Water Treatment. Water Conditioning. Water Supply 2014, 3, 58–67.
Titov, E.A. Intensification of Domestic Wastewater Treatment at Compact Plants Using Attached Biocenoses and Flocculants. Ph.D. Dissertation, Penza State University, Penza, Russia, 2006. Available online: https://www.dissercat.com/content/intensifikatsiya-ochistki-khozyaistvenno-bytovykh-stochnykh-vod-na-kompaktnykh-ustanovkakh-s (accessed on 5 July 2025).
Anapyanova, S.B.; Nabiollina, M.S.; Mamyrbekova, G.K.; Kolpakova, V.P.; Yeremeyeva, Y.; Frolova, G.P. Assessment of the Suitability of Treated Wastewater from Small Settlements Located in the Irtysh River Basin for Irrigation. Izděnıster Natızheler Issled. Rezul’taty 2024, 3, 361–371.
Burlibaev, M.Z.; Amirgaliyev, N.A.; Sherberger, I.V.; Skolsky, V.A.; Burlibaeva, D.M.; Uvasov, D.V.; Smirnova, D.A.; Efimenko, A.V.; Milyukov, D.Y. Problems of Pollution of Major Transboundary Rivers of Kazakhstan; Kaganat: Almaty, Kazakhstan, 2014; Volume 1, pp. 112–130.
Burlibaeva, D.M. Hydroecological Principles of Water Allocation on Transboundary Rivers of Kazakhstan; Qanaǵat Publishing House: Almaty, Kazakhstan, 2017; pp. 76–128.
Shenberger, I.V. The Nature of Transformation of the Chemical Composition and Toxicological Indicators of the Irtysh River Runoff. Sci. New Technol. Innov. Kyrg. 2018, 3, 115–119.
Kolpakova, V.; Yeremeyeva, Y.; Anapyanova, S.; Mamyrbekova, G.; Nurekenova, R.; Utepbergenova, L.; Shakhmov, Z.; Aniskin, A. Hydroecological assessment of the Kazakh part of the Yertis River under conditions of industrial development. Results Eng. 2024, 24, 103578.
Kendall, C.; McDonnell, J.J. (Eds.) Isotope Tracers in Catchment Hydrology; Elsevier Science: Amsterdam, The Netherlands, 1998.
McGuire, K.J.; McDonnell, J.J. Tracer advances in catchment hydrology. Hydrol. Process. 2015, 29, 5135–5138.
Christophersen, N.; Hooper, R.P. Multivariate analysis of stream water chemical data: The use of principal components analysis for the end-member mixing problem. Water Resour. Res. 1992, 28, 99–107.
Hooper, R.P. Diagnostic tools for mixing models of stream water chemistry. Water Resour. Res. 2003, 39, 1055.
Burns, D.A.; McDonnell, J.J.; Hooper, R.P.; Peters, N.E.; Freer, J.E.; Kendall, C.; Beven, K. Quantifying contributions to storm runoff through end-member mixing analysis and hydrologic measurements at the Panola Mountain Research Watershed (Georgia, USA). Hydrol. Process. 2001, 15, 1903–1924.
Barthold, F.K.; Tyralla, C.; Schneider, K.; Vaché, K.B.; Frede, H.-G.; Breuer, L. How many tracers do we need for end member mixing analysis (EMMA)? A sensitivity analysis. Water Resour. Res. 2011, 47, W08519.
Bugaets, A.N.; Gartsman, B.I.; Gubareva, T.S.; Lupakov, S.M.; Kalugin, A.S.; Shamov, V.V.; Gonchukov, L.S. Comparing the runoff decompositions of small experimental catchments: End-member mixing analysis (EMMA) versus hydrological modelling. Water 2023, 15, 752.
Zhou, J.; Wu, J.; Liu, S.; Zeng, G.; Qin, J.; Wang, X.; Zhao, Q. Hydrograph separation in the headwaters of the Shule River Basin: Combining water chemistry and stable isotopes. Adv. Meteorol. 2015, 2015, 830306.
Yang, X.; Tetzlaff, D.; Müller, C.; Knöller, K.; Borchardt, D.; Soulsby, C. Upscaling tracer-aided ecohydrological modeling to larger catchments: Implications for process representation and heterogeneity in landscape organization. Water Resour. Res. 2023, 59, e2022WR033033.
Thompson, A.N.; Bickmore, B.R.; Carling, G.T.; Evans, E.J.; Nelson, S.T.; LeMonte, J.J.; Rey, K.A.; Fernandez, D.P.; Caskey, K.L. Improved endmember mixing analysis (EMMA): Application to a snowmelt-dominated stream in northern Utah. EGUsphere 2025, 2025, 1–32.
Petrov, A.M.; Knyazev, I.V.; Yakimova, T.V. The effectiveness of the «small» treatment facilities in the Republic of Tatarstan: Conditions that ensure stabilization and improve the depth of municipal wastewater treatment. J. Ecol. Ind. Saf. 2010, 2, 64–66.

Figure 1. Schematic map of settlements located along the Kazakhstani section of the Irtysh river.

Figure 2. Overview of the integrated workflow used in this study.

Figure 3. Dynamics of changes in BOD_full (mg/L) concentration.

Figure 4. Dynamics of changes in ammonium salt (mg/L) concentration.

Figure 5. Dynamics of changes in nitrate (mg/L) concentration.

Figure 6. Dynamics of changes in chloride (mg/L) concentration.

Figure 7. Dynamics of changes in sulfate (mg/L) concentration.

Figure 8. Dynamics of changes in phosphate (mg/L) concentration.

Figure 9. Observed versus LOOCV-predicted values for the best model (Lasso, α = 1): (a) BOD, (b) COD.

Table 1. Indicators of wastewater treatment quality at the treatment facilities (TF1) of small settlement No. 1 (parameter values in mg/L).

Indicator	2020	2021	2022	2023	2024
BOD_full	137.99/5.66	132.88/5.6	134.1/5.52	127.6/6.6	128.01/6.9
Suspended solids	105.66/4.34	101.58/4.33	101.58/4.0	105.7/4.2	116.4/4.5
Ammonium salt	45.66/0.69	45.33/0.61	49.33/0.63	53.3/1.0	48.33/1.16
Nitrites	0.05/0.04	0.07/0.04	0.08/0.04	0.09/0.04	0.09/0.04
Nitrates	1.21/53.0	1.23/53.58	1.25/53.51	1.28/53.7	1.3/55.4
Chlorides	52.37/36.3	53.75/36.36	51.17/36.3	51.67/46.1	49.5/44.3
Sulfates	33.25/24.15	30.83/24.28	32.1/23.97	34.17/27.58	33.3/29.3
Phosphates	11.56/0.46	12.5/0.46	10.88/0.4	13.55/0.88	13.3/0.99
Synthetic surfactants	1.09/0.04	1.1/0.04	0.55/0.04	0.34/0.02	0.3/0.02
Petroleum hydrocarbons	0.16/0.03	0.04/0.03	0.05/0.03	0.02/0.01	0.02/0.01
COD	ND/31.75	309.5/30.83	322.9/31.83	ND/32.5	ND/31.1
Total mineralization (dry residue)	309.3/274.1	298.2/260.3	295.9/237.3	300/272.7	315.2/262

Note: The numerator gives the quality of the influent entering TF1, and the denominator gives the quality of the effluent after complete biological treatment.

Table 2. Indicators of wastewater treatment quality at the treatment facilities (TF2) of small settlement No. 2 (parameter values in mg/L).

Indicator	2020	2021	2022	2023	2024
BOD_full	70.35/12.11	72.21/12.2	70.42/12.18	71.6/14.8	72.1/17
Suspended solids	80.87/17.24	79.41/17.29	78.87/16.99	79.6/16.5	81.2/17.02
Ammonium salt	29.6/4.74	28.76/4.69	29.1/4.6	29.23/5.6	29.13/6.1
Nitrites	0.32/0.09	0.31/0.08	0.28/0.1	0.3/0.07	0.3/0.1
Nitrates	0.31/1.14	0.3/1.18	0.31/1.2	0.3/0.96	0.3/0.9
Chlorides	43.1/24.4	44.05/24.59	41.58/24.46	37.81/18.9	39.27/18.4
Sulfates	29.1/12.9	26.13/12.31	28.28/12.17	29.6/12.4	28.16/13.76
Phosphates	18.3/11.36	18.74/11.48	19.03/11.48	21.03/10.12	21.7/9.9
Synthetic surfactants	2.03/0.3	2.09/0.27	1.94/0.27	1.89/0.3	1.7/0.3
Petroleum hydrocarbons	0.67/0.29	0.39/0.27	0.42/0.28	0.42/0.14	0.3/0.11
COD	277.9/63.2	272.2/65.23	270.9/57.76	268.3/59.1	273.1/59.3
Total mineralization (dry residue)	337.9/115.1	337.9/115.8	333.3/114.2	331.9/112.5	336.7/113.8

Note: The numerator gives the quality of the influent entering TF2, and the denominator gives the quality of the effluent after complete biological treatment.

Table 3. Removal efficiency (%) of treatment for water-quality parameters at TF1 and TF2.

Indicator	2020 TF1	2020 TF2	2021 TF1	2021 TF2	2022 TF1	2022 TF2	2023 TF1	2023 TF2	2024 TF1	2024 TF2
BOD_full	95.9	82.79	95.79	83.1	95.88	82.7	94.83	79.33	94.61	76.42
Suspended solids	95.9	78.68	95.74	78.23	96.06	78.46	96.03	79.27	96.13	79.04
Ammonium salt	98.49	83.99	98.65	83.69	98.72	84.19	98.12	80.84	97.6	79.06
Nitrites	20	71.88	42.86	74.19	50	64.29	55.56	76.67	55.56	66.67
Nitrates	97.72	72.81	97.7	74.58	97.66	74.17	97.62	68.75	97.65	66.67
Chlorides	44.27	43.39	32.35	44.18	29.06	41.17	10.78	50.01	10.51	53.14
Sulfates	27.37	55.67	21.25	52.89	25.33	56.97	19.29	58.11	12.01	53.14
Phosphates	96.02	37.92	96.32	38.74	96.32	39.67	93.51	51.88	92.56	54.38
Synthetic surfactants	96.33	85.22	96.36	87.08	92.73	86.08	94.12	84.13	93.33	82.35
Petroleum hydrocarbons	81.25	56.72	25	30.77	40	33.33	50	66.67	50	63.33
COD	90.2	77.26	90.04	76.04	90.14	78.68	90.27	77.97	90.55	78.29
Total mineralization (dry residue)	11.38	65.94	12.71	65.73	19.8	65.74	9.1	66.1	16.88	66.2
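The percentages in Table 3 are consistent with the conventional removal-efficiency ratio $E = (C_{in} - C_{out})/C_{in} \times 100$, where $C_{in}$ is the influent (numerator) and $C_{out}$ the effluent (denominator) value in Tables 1 and 2; for example, TF1 BOD_full in 2020 gives (137.99 − 5.66)/137.99 × 100 ≈ 95.90%. The minimal Python sketch below assumes that definition; `removal_efficiency` and `pairs_2020` are illustrative names, not part of the study's code.

```python
# Sketch: percentage removal between influent (numerator) and effluent
# (denominator) values of Tables 1 and 2. Helper names are hypothetical.

def removal_efficiency(c_in: float, c_out: float) -> float:
    """Return E = (C_in - C_out) / C_in * 100, in percent."""
    if c_in <= 0:
        raise ValueError("influent concentration must be positive")
    return (c_in - c_out) / c_in * 100.0


# 2020 influent/effluent pairs in mg/L, copied from Tables 1 and 2.
pairs_2020 = {
    ("TF1", "BOD_full"): (137.99, 5.66),
    ("TF1", "Ammonium salt"): (45.66, 0.69),
    ("TF2", "BOD_full"): (70.35, 12.11),
    ("TF2", "Phosphates"): (18.3, 11.36),
}

for (facility, indicator), (c_in, c_out) in pairs_2020.items():
    print(f"{facility} {indicator}: {removal_efficiency(c_in, c_out):.2f} %")
```

Running it prints 95.90, 98.49, 82.79 and 37.92 %, matching the corresponding 2020 cells of Table 3.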

Table 4. Concentrations of substances in the Irtysh River water before and after the discharge from treatment facility (TF1).

No.	Water Quality Indicators, mg/L	Before (background), 2024	Before (background), 2025	After, 2024	After, 2025
1	Suspended solids	3.5	6.3	3.7	6.5
2	Chlorides	3.62	4.28	3.64	4.32
3	Sulfates	36.4	38.4	24.5	32.5
4	Magnesium	7.6	7.4	7.6	7.4
5	Calcium	27.2	26.4	27.6	26.6
6	Total mineralization (dry residue)	128.6	130.7	128.8	131.8
7	COD	7.5	7.3	7.7	7.5
8	BOD₅	1.62	1.68	1.64	1.72
9	Ammonium salt	0.062	0.074	0.086	0.082
10	Nitrites	0.027	0.015	0.028	0.018
11	Nitrates	3.14	5.62	1.84	6.24
12	Phosphates	0.014	0.012	0.017	0.012
13	Phenols	ND	ND	ND	ND
14	Anionic surfactants	ND	ND	ND	ND
15	Petroleum hydrocarbons	0.009	0.006	0.012	0.006
16	Copper	<0.001	<0.001	<0.001	<0.001
17	Zinc	0.007	0.004	0.012	0.006
18	Lead	0.002	0.001	0.003	0.001
19	Chromium (VI)	<0.0002	<0.0002	<0.0002	<0.0002
20	Total iron	0.024	0.028	0.026	0.032
21	Cadmium	<0.0001	<0.0001	<0.0001	<0.0001
22	Manganese	0.006	0.005	0.006	0.007

Table 5. Concentrations of substances in the Irtysh River water before and after the discharge from treatment facility (TF2).

No.	Water Quality Indicators, mg/L	Before (background), 2024	Before (background), 2025	After, 2024	After, 2025
1	Suspended solids	8.2	10.6	4.6	8.4
2	Chlorides	1.46	1.58	1.5	1.86
3	Sulfates	20.6	23.5	18.2	24.3
4	Magnesium	7.3	7.8	7.4	8.2
5	Calcium	26.5	26.0	26.9	26.2
6	Total mineralization (dry residue)	124.8	126.8	125.2	127.4
7	COD	7.6	6.6	7.8	6.8
8	BOD₅	1.55	1.72	1.57	1.74
9	Ammonium salt	0.054	0.056	0.056	0.062
10	Nitrites	0.04	0.032	0.031	0.037
11	Nitrates	1.82	2.25	1.74	2.84
12	Phosphates	0.016	0.009	0.018	0.009
13	Phenols	ND	ND	ND	ND
14	Anionic surfactants	ND	ND	ND	ND
15	Petroleum hydrocarbons	0.008	0.004	0.01	0.006
16	Copper	<0.001	<0.001	<0.001	<0.001
17	Zinc	0.008	0.005	0.009	0.005
18	Lead	0.002	0.001	0.002	0.001
19	Chromium (VI)	<0.0002	<0.0002	<0.0002	<0.0002
20	Total iron	0.025	0.032	0.026	0.036
21	Cadmium	<0.0001	<0.0001	<0.0001	<0.0001
22	Manganese	0.004	0.002	0.007	0.002

Table 6. Indicators used to estimate $f^{*}$ and dilution $D$.

Point	Water Quality Indicators, mg/L	$C_{up}$	$C_{down}$	$C_{eff}$	$f$	$D$
facility TF1	Suspended solids	3.500	3.700	4.500	0.2	5
facility TF1	Chlorides	3.620	3.640	44.300	0.000492	2034
facility TF1	Sulfates	36.400	24.500	29.300	1.676056	0.596639
facility TF1	Total mineralization (dry residue)	128.600	128.800	262.000	0.001499	667
facility TF1	COD	7.500	7.700	31.100	0.008475	118
facility TF1	Ammonium salt	0.062	0.086	1.160	0.021858	45.75
facility TF1	Nitrites	0.027	0.028	0.040	0.076923	13
facility TF1	Nitrates	3.140	1.840	55.400	−0.02488	−40.2
facility TF1	Phosphates	0.014	0.017	0.990	0.003074	325.3333
facility TF1	Petroleum hydrocarbons	0.009	0.012	0.010	3	0.333333
facility TF2	Suspended solids	8.200	4.600	17.020	−0.40816	−2.45
facility TF2	Chlorides	1.460	1.500	18.400	0.002361	423.5
facility TF2	Sulfates	20.600	18.200	13.760	0.350877	2.85
facility TF2	Total mineralization (dry residue)	124.800	125.200	113.800	−0.03636	−27.5
facility TF2	COD	7.600	7.800	59.300	0.003868	258.5
facility TF2	Ammonium salt	0.054	0.056	6.100	0.000331	3023
facility TF2	Nitrites	0.040	0.031	0.100	−0.15	−6.66667
facility TF2	Nitrates	1.820	1.740	0.900	0.086957	11.5
facility TF2	Phosphates	0.016	0.018	9.900	0.000202	4942
facility TF2	Petroleum hydrocarbons	0.008	0.010	0.110	0.019608	51
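The $f$ and $D$ columns of Table 6 are reproduced by the two-end-member mass balance $f = (C_{down} - C_{up})/(C_{eff} - C_{up})$ with $D = 1/f$; for chlorides at TF1, (3.64 − 3.62)/(44.30 − 3.62) ≈ 0.000492 and D ≈ 2034. The sketch below assumes these expressions (inferred from the tabulated rows rather than quoted from the methods); `Sample`, `mixing_fraction`, `dilution` and `is_physical` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One indicator at one site (hypothetical container), all in mg/L."""
    c_up: float    # upstream (background) concentration
    c_down: float  # downstream concentration after mixing
    c_eff: float   # treated-effluent concentration


def mixing_fraction(s: Sample) -> float:
    """Two-end-member effluent fraction f = (C_down - C_up) / (C_eff - C_up)."""
    denom = s.c_eff - s.c_up
    if denom == 0:
        raise ValueError("effluent equals background; f is undefined")
    return (s.c_down - s.c_up) / denom


def dilution(f: float) -> float:
    """Dilution factor D = 1 / f."""
    if f == 0:
        raise ValueError("f = 0 means no detectable effluent signal")
    return 1.0 / f


def is_physical(f: float) -> bool:
    """f can be read as a mixing share only when 0 < f <= 1."""
    return 0.0 < f <= 1.0


# Rows copied from Table 6.
rows = {
    ("TF1", "Chlorides"): Sample(3.62, 3.64, 44.30),
    ("TF1", "Sulfates"): Sample(36.4, 24.5, 29.3),
    ("TF2", "Chlorides"): Sample(1.46, 1.50, 18.40),
    ("TF2", "Suspended solids"): Sample(8.2, 4.6, 17.02),
}

for (site, indicator), s in rows.items():
    f = mixing_fraction(s)
    print(f"{site} {indicator}: f = {f:.6f}, D = {dilution(f):.1f}, "
          f"physical = {is_physical(f)}")
```

Values outside 0 < f ≤ 1, such as sulfates at TF1 (f ≈ 1.68) or suspended solids at TF2 (f ≈ −0.41), cannot be interpreted as mixing shares, which is consistent with relying on chloride, a commonly used conservative tracer, for the site-specific $f^{*}$ in Table 7.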

Table 7. $\theta$ values for reactive indicators computed with the site-specific $f^{*}$ derived from chlorides.

Point	Water Quality Indicators, mg/L	$C_{up}$	$C_{down}$	$C_{eff}$	$\theta$
Facility TF1	COD	7.500	7.700	31.100	13.32
	Ammonium salt	0.062	0.086	1.160	42.14
	Nitrites	0.027	0.028	0.040	51.53
	Nitrates	3.140	1.840	55.400	−47.67
	Phosphates	0.014	0.017	0.990	6.18
Facility TF2	COD	7.600	7.800	59.300	1.56
	Ammonium salt	0.054	0.056	6.100	0.15
	Nitrites	0.040	0.031	0.100	−37.72
	Nitrates	1.820	1.740	0.900	−35.62
	Phosphates	0.016	0.018	9.900	0.09

Table 8. Cross-validated performance for BOD and COD for all models.

Model	BOD MAE	BOD RMSE	BOD R²	COD MAE	COD RMSE	COD R²
Baseline-Linear	1.402	1.8239	0.79877	2.0632	2.5767	0.96968
Fractional-Logit (Ratio)	1.5276	1.8532	0.79227	15.195	18.978	−0.64453
Ridge	0.80739	1.2279	0.9088	2.6277	3.1643	0.95428
Lasso (α = 1)	0.45894	0.62611	0.97629	0.63449	0.79474	0.99712
Elastic-Net (α = 0.5)	0.41135	0.65492	0.97405	1.2952	1.6175	0.98805
PLS-1	1.6559	1.9043	0.78064	2.4617	2.9608	0.95997
PLS-2	1.1456	1.4603	0.871	2.9708	3.2014	0.9532
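Table 8 reports cross-validated MAE, RMSE and R², and Figure 9 indicates leave-one-out cross-validation (LOOCV) for the best model. The sketch below shows how such metrics can be computed from out-of-fold predictions. As a hypothetical stand-in for the multi-predictor regularized models, `lasso_1d` fits a one-feature Lasso in closed form (soft-thresholded slope) using the objective scaling documented for scikit-learn's `Lasso`, (1/(2n))·RSS + α·|b₁|; all helper names are illustrative and the data are toy values, not study data.

```python
import math
from typing import Callable, Sequence

Predictor = Callable[[float], float]


def lasso_1d(xs: Sequence[float], ys: Sequence[float], alpha: float) -> Predictor:
    """Closed-form one-feature Lasso with an unpenalized intercept.

    Minimizes (1 / (2n)) * sum((y - b0 - b1 * x)^2) + alpha * |b1|.
    After centering, b1 is the soft-thresholded least-squares slope.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    sxx = sum((x - x_mean) ** 2 for x in xs) / n
    slope = 0.0 if sxx == 0 else math.copysign(max(abs(sxy) - alpha, 0.0), sxy) / sxx
    intercept = y_mean - slope * x_mean
    return lambda x: intercept + slope * x


def loocv_predictions(xs: list[float], ys: list[float],
                      fit: Callable[[list[float], list[float]], Predictor]) -> list[float]:
    """Leave-one-out: refit without sample i, then predict sample i."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds


def metrics(y_true: list[float], y_pred: list[float]) -> tuple[float, float, float]:
    """MAE, RMSE and R^2 = 1 - SS_res / SS_tot (R^2 can be negative)."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    y_bar = sum(y_true) / n
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    ss_res = sum(r * r for r in residuals)
    return mae, rmse, 1.0 - ss_res / ss_tot


# Toy values for illustration only; they are not data from this study.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
preds = loocv_predictions(x, y, lambda xs, ys: lasso_1d(xs, ys, alpha=0.1))
mae, rmse, r2 = metrics(y, preds)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

Because R² = 1 − SS_res/SS_tot is evaluated on held-out predictions, it turns negative whenever the model predicts worse than the sample mean, which is consistent with the Fractional-Logit result for COD (R² = −0.64453).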

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
