2. Background of Research
In recent years, climate change and increasingly unstable hydrological regimes have significantly complicated the operating conditions of hydropower facilities [4,5,6,7]. In particular, seasonal and long-term variability of the Ili River flow directly affects the operating regime of the Kapshagay Hydropower Plant, increasing the load on generating units and creating additional risks to the stability of electricity generation. According to the International Energy Agency, the average capacity factor of hydropower plants worldwide decreased from approximately 38% in the 2000s to about 36% in 2020–2022. This decline has led to an annual global shortfall of approximately 240 TWh of electricity generation, thereby reducing the overall share of hydropower in the global energy system. The trend is also highly relevant for large stations that depend on water resources, such as the Kapshagay Hydropower Plant.
To illustrate the operational context of the study, Figure 1 presents the structural layout of the Kapshagay Hydropower Plant and the distribution of maintenance types in hydropower systems. These panels provide background on the technical structure of the station and on typical maintenance practices relevant to the reliability analysis performed in this study. Figure 1a shows the structural layout of the Kapshagay HPP, while Figure 1b illustrates the global role of hydropower in the energy system and its contribution to electricity generation worldwide [2]. The Kapshagay Hydropower Plant thus serves as a practical example of the importance of hydropower in ensuring the stability and reliability of the energy system.
Another important factor that further enhances the relevance of this topic is the aging of hydropower equipment. According to assessments by the International Hydropower Association, approximately 630 GW of installed hydropower capacity worldwide has been in operation for more than 30 years, while about 490 GW has been operating for over 40 years [8,9]. Data from the U.S. Department of Energy indicate that the average age of many hydropower plants exceeds 60 years, and that the wear of key equipment leads to an increase in unplanned outages [10,11,12]. This situation highlights the need to reconsider existing approaches to maintenance and repair.
Modern energy science shows a clear transition from scheduled and post-failure repair models toward reliability-oriented, optimized maintenance strategies. The scale of the assets involved makes this transition significant: according to IRENA, by the end of 2023 global hydropower capacity reached 1268 GW excluding pumped storage and 1408 GW including pumped storage [13,14,15]. For infrastructure of such scale and complexity, linking maintenance decisions directly to equipment reliability indicators is of exceptional scientific and practical importance.
Moreover, the scientific optimization of maintenance strategies has a substantial impact on economic efficiency. According to World Bank estimates, improvements in operation and maintenance (O&M) practices at hydropower plants can reduce overall operating costs by up to 40% in certain cases [16]. This evidence indicates that the correct selection of maintenance interventions not only enhances equipment reliability but also significantly improves its life-cycle efficiency.
Thus, climate instability, the aging of hydropower equipment, the large scale of the infrastructure, and increasingly stringent reliability requirements together make the identification of optimal maintenance interventions for the Kapshagay Hydropower Plant highly relevant to the current scientific agenda. The scientific optimization of maintenance actions aimed at ensuring the reliable operation of Kapshagay HPP equipment therefore represents a relevant research direction with substantial practical significance.
3. Problem Statement
Improving the reliability of hydroelectric power plant (HPP) equipment has become one of the most relevant scientific directions in the energy sector in recent years. Using the example of a power transformer at a hydroelectric power plant, Martinez-Monseco [17] demonstrated that optimizing reliability-centered maintenance (RCM) strategies can reduce unplanned outages by 15–20%. This effect is directly associated with a decrease in the time-dependent failure (hazard) rate of the equipment, defined as

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},$$

where T denotes the random time to failure. The hazard function characterizes the instantaneous rate of failure at time t, given that the equipment has survived up to t, and serves as a fundamental quantitative indicator for evaluating the effectiveness of maintenance strategies. This formulation follows classical reliability theory [17,18].
Based on the hazard function, the reliability function of the equipment is expressed as

$$R(t) = \exp\left(-\int_0^t \lambda(\tau)\,d\tau\right),$$

where R(t) is the reliability function, λ(t) is the hazard rate, and t denotes operating time.
The cumulative probability of failure is given by

$$F(t) = 1 - R(t),$$

where F(t) is the cumulative probability of failure.
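For hazard models without a closed-form integral, R(t) = exp(−∫₀ᵗ λ(τ)dτ) and F(t) = 1 − R(t) can be evaluated numerically. The following is a minimal sketch using only the Python standard library and the composite trapezoidal rule; it is not the authors' implementation, and the function names and example hazard values are hypothetical.

```python
import math
from typing import Callable

Hazard = Callable[[float], float]  # lambda(t), failures per hour (assumed unit)


def cumulative_hazard(hazard: Hazard, t: float, steps: int = 1000) -> float:
    """Integral of hazard(tau) over [0, t] by the composite trapezoidal rule."""
    if t <= 0.0:
        return 0.0
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, steps):
        total += hazard(k * h)
    return total * h


def reliability(hazard: Hazard, t: float, steps: int = 1000) -> float:
    """R(t) = exp(-cumulative hazard up to t)."""
    return math.exp(-cumulative_hazard(hazard, t, steps))


def cumulative_failure_probability(hazard: Hazard, t: float, steps: int = 1000) -> float:
    """F(t) = 1 - R(t)."""
    return 1.0 - reliability(hazard, t, steps)
```

For example, `cumulative_failure_probability(lambda tau: 1.0 / 2200.0, 730.0)` gives the one-month failure probability for a hypothetical constant hazard of 1/2200 h⁻¹. The trapezoidal rule is exact for constant and linear hazards; for strongly curved aging laws, a larger `steps` value may be needed.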
In practice, for equipment with an operating lifetime exceeding 30 years, accurate estimation of λ(t) becomes problematic due to incomplete failure statistics and the nonlinear nature of aging processes. Under such conditions, the hazard rate is often approximated by a wear-dependent model of the form:
The parameters λ₀, α, and β were identified from the Kapshagay HPP operational dataset (2020–2025) using nonlinear least-squares regression implemented in Python. The parameters λ₀ and α have the dimension of time⁻¹, while β is dimensionless and controls the curvature (acceleration) of the aging process. Parameter estimation was performed by fitting the model λ(t) to the empirical failure intensity derived from recorded failure events and total operating time. This degradation-based hazard model is adapted from the aging reliability models described in [18].
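The fitting step can be illustrated with a profile (variable-projection) least-squares sketch. Because the exact wear-dependent law is not reproduced in this text, the code assumes, purely for illustration, the hypothetical form λ(t) = λ₀ + α(t/t_ref)^β, which is dimensionally consistent with the units stated above (t_ref is a hypothetical reference age). For a fixed β the model is linear in λ₀ and α, so those two are solved in closed form, and β is chosen by grid search. The empirical intensities would be failures per operating hour in each age interval; all data in the test are synthetic.

```python
from typing import List, Optional, Sequence, Tuple


def empirical_intensity(failures: int, operating_hours: float) -> float:
    """Observed failure intensity (1/h): failures divided by operating time in an interval."""
    return failures / operating_hours


def _ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b, sum of squared errors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, sse


def fit_wear_hazard(ages: Sequence[float], intensities: Sequence[float],
                    t_ref: float = 1.0,
                    betas: Optional[List[float]] = None) -> Tuple[float, float, float]:
    """Fit the HYPOTHETICAL law lam(t) = lam0 + alpha * (t / t_ref) ** beta.

    For each candidate beta the model is linear in (lam0, alpha), so OLS gives
    the optimum; the beta with the smallest SSE is returned with its (lam0, alpha).
    """
    if betas is None:
        betas = [0.5 + 0.01 * k for k in range(351)]  # grid from 0.5 to 4.0
    best = None
    for beta in betas:
        x = [(t / t_ref) ** beta for t in ages]
        lam0, alpha, sse = _ols(x, intensities)
        if best is None or sse < best[3]:
            best = (lam0, alpha, beta, sse)
    return best[0], best[1], best[2]
```

Profiling out the linear parameters keeps the search one-dimensional and avoids the starting-value sensitivity of a general nonlinear solver; a finer grid or a local refinement around the best β can follow if needed.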
Zhang et al. [19] showed that biologically inspired intelligent methods improve the accuracy of predicting the technical condition of hydro units by 10–25% compared with conventional approaches. These methods represent the equipment condition in a multidimensional diagnostic space defined by a state vector

$$\mathbf{x}(t) = \left[x_1(t), x_2(t), \ldots, x_n(t)\right]^{\mathsf{T}},$$

where x₁(t) corresponds to vibration amplitude, x₂(t) to winding temperature, x₃(t) to the dielectric loss factor of insulating oil, and the remaining components to other relevant parameters. On this basis, the probability of failure is estimated using a nonlinear prognostic model:

$$P_f(t) = 1 - \exp\left(-\int_0^t g\big(\mathbf{x}(\tau)\big)\,d\tau\right),$$
where g(x) denotes a generalized hazard function identified using intelligent algorithms such as neural networks, genetic algorithms, or other evolutionary optimization techniques.
However, the computational complexity of such models increases rapidly with the number of diagnostic parameters and the model depth, which can be approximately expressed as
where n is the number of monitored parameters and k reflects the structural complexity of the model. This expression is a generalized approximation of the growth in computational complexity for nonlinear prognostic models and is provided for qualitative assessment only. Consequently, despite their high predictive accuracy, biologically inspired intelligent methods remain difficult to implement at aging hydroelectric power plants because of limited computational resources and insufficient historical data.
Chen and Zhong [20] proposed a probabilistic multi-time-scale power prediction model for hydropower stations that accounts for multiple sources of uncertainty. In this framework, the predicted power output of a hydropower plant is treated as a stochastic variable that depends on the variability of water inflow Q(t), load demand fluctuations L(t), and hydrological forecasting errors εₕ(t):

$$P(t) = f\big(Q(t), L(t), \varepsilon_h(t)\big).$$

The authors demonstrated that incorporating several uncertainty factors simultaneously significantly improves prediction accuracy compared with deterministic approaches. However, the effectiveness of the model depends strongly on the quality of the input data and the accuracy of parameter calibration, which may limit its direct applicability under real operating conditions. The sensitivity of the predicted power output to the model parameters can be expressed as
$$\Delta P \approx \sum_{i} \frac{\partial P}{\partial \theta_i}\,\Delta\theta_i,$$

where θᵢ denotes the model parameters and Δθᵢ their associated uncertainties. Consequently, additional model adaptation and validation using real hydropower plant data are required to ensure reliable performance in practical industrial applications.
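First-order sensitivity of this kind can be estimated numerically. The sketch below is an illustration, not the model of [20]: it computes each parameter's contribution |∂P/∂θᵢ|·Δθᵢ with central finite differences and sums them into a linear worst-case bound. As a stand-in model it uses the textbook hydropower relation P = ηρgQH; the mean discharge Q = 75 m³/s is taken from the Methods section, while the head H = 40 m, the efficiency, and all uncertainty magnitudes are hypothetical.

```python
from typing import Callable, Dict

Params = Dict[str, float]


def sensitivity_contributions(model: Callable[[Params], float], params: Params,
                              deltas: Params, rel_step: float = 1e-6) -> Params:
    """Return |dP/dtheta_i| * delta_theta_i for every parameter (first-order terms).

    Derivatives are approximated by central finite differences.
    """
    contributions: Params = {}
    for name, value in params.items():
        h = rel_step * max(abs(value), 1.0)
        up = {**params, name: value + h}
        down = {**params, name: value - h}
        derivative = (model(up) - model(down)) / (2.0 * h)
        contributions[name] = abs(derivative) * deltas.get(name, 0.0)
    return contributions


def hydro_power_mw(p: Params) -> float:
    """Illustrative stand-in model: P = eta * rho * g * Q * H, converted to MW."""
    rho, g = 1000.0, 9.81  # water density (kg/m^3), gravitational acceleration (m/s^2)
    return p["eta"] * rho * g * p["Q"] * p["H"] / 1e6


params = {"eta": 0.90, "Q": 75.0, "H": 40.0}  # eta and H are hypothetical
deltas = {"eta": 0.02, "Q": 10.0, "H": 1.0}   # hypothetical uncertainties
contrib = sensitivity_contributions(hydro_power_mw, params, deltas)
worst_case = sum(contrib.values())  # linear (worst-case) bound on |delta P|, MW
```

Summing absolute contributions gives a conservative bound; if the parameter errors are independent, a root-sum-square combination is a common, less conservative alternative.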
de Santis et al. [21], through a systematic literature review, analyze condition-based maintenance (CBM) strategies in hydropower plants and demonstrate that this approach can increase equipment availability and reduce unplanned outages. The authors also emphasize that the effectiveness of CBM depends directly on the implementation of continuous sensor-based monitoring. However, the high initial cost of such systems, the complexity of the data acquisition and processing infrastructure, and the need for skilled personnel make CBM strategies economically constrained or difficult to implement at many hydropower plants.
Figure 2 presents the structural and functional architecture of a condition-based maintenance (CBM) system applied to hydropower equipment. The diagram illustrates the interaction between monitoring sensors, diagnostic data processing modules, reliability evaluation algorithms, and maintenance decision units. Continuous monitoring of vibration, temperature, and electrical parameters enables early detection of equipment degradation and supports predictive maintenance planning. The framework is based on CBM methodologies described in previous studies [21,22].
However, deploying continuous monitoring sensors and the associated data processing infrastructure typically increases the initial investment by approximately 25–40%, which requires additional techno-economic justification under real operating conditions.
Caricimi et al. [22] propose applying multi-criteria decision-making methods to optimize the selection and operation of hydropower turbines, developing an integrated methodology that combines the Analytical Hierarchy Process (AHP) with the fuzzy VIKOR approach. The authors demonstrate that simultaneously accounting for technical parameters, economic indicators, and operating conditions significantly enhances the robustness and quality of decision-making (Table 1). However, the stochastic nature of equipment failures and the temporal variability of hydrological regimes are considered only to a limited extent, which complicates the application of the approach to long-term maintenance strategy planning.
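As an illustration of how AHP criterion weights such as those in Table 1 are obtained, the following minimal sketch uses the geometric-mean approximation of Saaty's eigenvector method and reports the consistency ratio using Saaty's random index. The pairwise judgments in the example are hypothetical and are not taken from [22].

```python
import math
from typing import List, Tuple

# Saaty's random consistency index (RI) for matrix orders 3..7
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}


def ahp_weights(matrix: List[List[float]]) -> Tuple[List[float], float]:
    """Priority weights (geometric-mean method) and consistency ratio CR.

    matrix[i][j] states how much more important criterion i is than j
    on Saaty's 1-9 scale, with matrix[j][i] == 1 / matrix[i][j].
    """
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    weights = [g / total for g in geo]
    # Estimate the principal eigenvalue lambda_max from (A w)_i / w_i
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    cr = ci / RANDOM_INDEX[n] if n in RANDOM_INDEX else 0.0
    return weights, cr


# Hypothetical judgments: efficiency vs. installation cost vs. maintenance interval
judgments = [
    [1.0, 2.0, 3.0],
    [1 / 2, 1.0, 2.0],
    [1 / 3, 1 / 2, 1.0],
]
ahp_w, ahp_cr = ahp_weights(judgments)  # CR < 0.10 is the usual acceptance rule
```

The geometric-mean method requires `math.prod` (Python 3.8+) and gives the exact eigenvector for perfectly consistent matrices; for inconsistent judgments it is a close approximation.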
The values presented in Table 1 are illustrative ranges synthesized from published studies on turbine evaluation methods, including [22], and are provided to demonstrate the typical decision-making parameters considered in hydropower turbine selection.
According to Table 1, turbine efficiency varies within 0.86–0.94 and has the highest AHP weight (0.28), while installation cost ranges from 1.2 to 3.8 million € with a decision weight of 0.22. In the fuzzy VIKOR evaluation, the turbine option with the lowest integral index (Q = 0.29) is identified as the optimal alternative; however, the variation in maintenance intervals (2–6 years) and in the hydrological adaptability coefficient (0.65–0.92) introduces additional uncertainty into long-term planning.
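The VIKOR ranking step can be sketched in its crisp (non-fuzzy) form. The study in [22] uses fuzzy VIKOR, which applies the same logic to fuzzy numbers, so this sketch illustrates only the underlying compromise-ranking idea, and it omits the acceptable-advantage and stability checks of the full method. The criterion values of the three alternatives lie within the Table 1 ranges but are otherwise hypothetical, as are all weights other than 0.28 (efficiency) and 0.22 (installation cost); v = 0.5 is the conventional consensus weight.

```python
from typing import List, Sequence


def vikor(scores: List[Sequence[float]], weights: Sequence[float],
          benefit: Sequence[bool], v: float = 0.5) -> List[float]:
    """Crisp VIKOR compromise index Q for each alternative (lower is better).

    scores[i][j] is the value of alternative i on criterion j; benefit[j] is
    True when larger values are better and False for cost-type criteria.
    """
    m, n = len(scores), len(weights)
    best = [max(r[j] for r in scores) if benefit[j] else min(r[j] for r in scores) for j in range(n)]
    worst = [min(r[j] for r in scores) if benefit[j] else max(r[j] for r in scores) for j in range(n)]
    s_vals, r_vals = [], []
    for i in range(m):
        # Weighted, normalized distance from the best value on each criterion
        terms = [0.0 if best[j] == worst[j]
                 else weights[j] * (best[j] - scores[i][j]) / (best[j] - worst[j])
                 for j in range(n)]
        s_vals.append(sum(terms))  # group utility S_i
        r_vals.append(max(terms))  # individual regret R_i
    s_lo, s_hi = min(s_vals), max(s_vals)
    r_lo, r_hi = min(r_vals), max(r_vals)
    q = []
    for s, r in zip(s_vals, r_vals):
        qs = 0.0 if s_hi == s_lo else (s - s_lo) / (s_hi - s_lo)
        qr = 0.0 if r_hi == r_lo else (r - r_lo) / (r_hi - r_lo)
        q.append(v * qs + (1.0 - v) * qr)
    return q


# Hypothetical alternatives: [efficiency, installation cost (M EUR),
#                             maintenance interval (years), hydrological adaptability]
alternatives = [
    [0.94, 3.8, 6.0, 0.92],  # A
    [0.90, 2.0, 4.0, 0.80],  # B
    [0.86, 1.2, 2.0, 0.65],  # C
]
criterion_weights = [0.28, 0.22, 0.25, 0.25]  # last two weights are hypothetical
benefit = [True, False, True, True]           # installation cost is a cost criterion
q = vikor(alternatives, criterion_weights, benefit)
```

With these hypothetical inputs, alternative B has the lowest Q, which shows how VIKOR can favour a balanced compromise over the option that is best on most individual criteria.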
Özcan et al. [23] propose a risk-based maintenance approach for hydropower plants, demonstrating that jointly considering failure probability and consequence severity can enhance overall system reliability. The authors show that prioritizing maintenance actions according to risk level enables more efficient allocation of maintenance resources. However, in complex electromechanical systems, the diversity of failure scenarios and their interdependencies complicate accurate risk assessment, which remains an open scientific challenge. The parameters summarized in Table 2 and Table 3 illustrate the typical risk assessment indicators used in hydropower maintenance studies and provide a conceptual basis for the reliability analysis performed in this research.
The values presented in Table 2 illustrate typical ranges of parameters used in risk-based maintenance evaluation and are derived from the general methodological framework described in [23].
Based on the data presented in Table 2, the failure probability varies within 0.01–0.18 and has the highest weight coefficient (0.30), while the consequence severity index ranges from 0.20 to 0.85 with a weight of 0.25; these two parameters therefore play a decisive role in determining maintenance priorities. The integrated risk index has an average value of 0.036. By contrast, repair duration (2–18 days), maintenance cost (3.5–22.0 million KZT), and availability reduction (1.5–12.0%) contribute relatively less and are treated as secondary factors in risk-based decision-making.
Based on the data in Table 3, equipment in the high-risk category has an average risk value of 0.11 and requires immediate maintenance, whereas the medium-risk category, with an average value of 0.05, is addressed through scheduled maintenance. The low-risk category, with an average risk value of 0.018, requires only monitoring and therefore has the lowest priority for maintenance interventions.
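The prioritization logic of Tables 2 and 3 can be sketched as follows. The sketch assumes the common convention that the integrated risk is the product of failure probability and consequence severity; the category thresholds (0.08 and 0.03) are hypothetical values chosen only so that the category averages reported in Table 3 (0.11, 0.05, 0.018) fall into the expected classes. Component names and inputs are also hypothetical.

```python
from typing import Dict, List, Tuple


def risk_index(failure_probability: float, severity: float) -> float:
    """Integrated risk, assumed here to be probability x consequence severity."""
    return failure_probability * severity


def risk_category(risk: float, high: float = 0.08, medium: float = 0.03) -> str:
    """Map a risk value to a maintenance class (thresholds are hypothetical)."""
    if risk >= high:
        return "high: immediate maintenance"
    if risk >= medium:
        return "medium: scheduled maintenance"
    return "low: monitoring only"


def prioritize(components: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float, str]]:
    """Rank components by descending risk; values are (probability, severity)."""
    ranked = []
    for name, (p, s) in components.items():
        r = risk_index(p, s)
        ranked.append((name, r, risk_category(r)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical components with values inside the Table 2 ranges
plan = prioritize({
    "turbine bearing": (0.18, 0.80),
    "stator winding": (0.10, 0.50),
    "cooling pump": (0.05, 0.36),
})
```

A weighted multi-parameter index (using the Table 2 weights for repair duration, cost, and availability) could replace `risk_index` without changing the ranking and classification steps.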
da Silva et al. [18] comprehensively analyzed the impact of aging processes on the reliability of power generation systems, including hydropower generators and their auxiliary systems. The study shows that the failure probability of equipment rises significantly as its operational lifespan increases, and that reliability degradation becomes more pronounced once the equipment reaches several decades of service. However, the authors focus solely on identifying the effects of aging and do not propose a comprehensive methodological approach for selecting specific maintenance or repair actions based on these results.
Figure 3 shows the structural diagram of the speed governing, control, and power transmission systems of a hydropower turbine–generator unit. The diagram illustrates the interaction between the turbine rotor, generator shaft, control system, and electrical power transmission components. These subsystems operate jointly to maintain stable rotational speed, regulate mechanical torque, and ensure reliable electrical energy generation. Understanding the structure of these systems is essential for identifying critical components that influence equipment reliability and maintenance planning.
Overall, [18] demonstrates that the aging of hydropower generators decreases reliability and increases the probability of failure, highlighting the need for a comprehensive methodological approach to selecting long-term maintenance actions that ensure the effective operation of the systems shown in Figure 3.
Other researchers have also investigated ways to enhance equipment reliability and improve maintenance methods at hydroelectric power plants. For example, Alvarez-Alvarado et al. [24] proposed data-driven approaches to improve the reliability of power systems; however, their application remains challenging in certain cases. Kovalev et al. [25] explored new approaches to optimizing maintenance and repair work at hydropower facilities, although this process is complicated by the variety of equipment types and characteristics. Sartor et al. [26] focused on the effectiveness of asset management systems for hydroelectric plants and showed that comprehensive approaches are needed to improve their performance. Additionally, Adhikari et al. [27] emphasized the importance of managing investment risks and ensuring financial stability in hydropower projects, highlighting the need to improve the sustainability of renewable energy sources for long-term economic effectiveness. Taken together, these studies indicate the need for effective methods of optimizing equipment maintenance at the Kapshagay HPP.
In recent years, reliability-centered maintenance (RCM) has increasingly been integrated with digital technologies and data-driven approaches within the framework of Industry 4.0. Modern predictive maintenance systems use artificial intelligence and industrial Internet of Things (IIoT) technologies to improve the reliability and operational efficiency of industrial equipment. Wu [28] demonstrated that AI-driven predictive maintenance models significantly enhance fault detection capabilities and support more effective maintenance decision-making in complex industrial systems.
In addition, the concept of RCM 4.0 has been proposed as a digital framework that integrates reliability analysis with smart industrial systems and advanced monitoring technologies. Gomaa [29] emphasized that applying digital RCM frameworks enables improved asset management, increased system reliability, and reduced maintenance costs in modern industrial environments.
Furthermore, recent studies highlight the importance of integrating reliability-centered maintenance with Industry 4.0 technologies to support sustainable manufacturing processes. Jena et al. [30] showed that combining RCM principles with digital monitoring and intelligent maintenance systems can significantly enhance equipment performance, reduce operational risks, and improve the sustainability of industrial production systems.
The literature review indicates that significant scientific progress has been made in improving the reliability of hydroelectric equipment. However, several issues remain unresolved: data deficiencies, high implementation costs, incomplete accounting for aging factors, and the difficulty of directly linking maintenance decisions to reliability indicators. One way to overcome these difficulties is to develop an optimized approach that directly links maintenance impacts to reliability indicators while taking into account actual operating conditions and the wear level of equipment.
All of these factors demonstrate the scientific and practical relevance of conducting a specialized study aimed at identifying optimal maintenance impacts to enhance the operational reliability of the Kapshagay Hydroelectric Power Plant’s equipment.
Based on the identified methodological limitations and research gaps, the following section presents the proposed modeling framework and analytical approach.
4. Materials and Methods
The aim of this study is to develop and validate an integrated reliability-centered maintenance (RCM) framework for hydropower equipment based on operational data from the Kapshagay HPP.
To achieve this aim, the study pursues four objectives: first, to analyze failure statistics and operational performance for 2020–2025; second, to develop reliability models for failure probability and MTBF estimation; third, to validate baseline and optimized maintenance strategies using simulation tools (Python and SMath Solver); and finally, to quantify the techno-economic impact of the optimized maintenance interventions. Reliability-based maintenance models and mathematical modeling methods were used to assess the failure probability of the equipment. Simulations of Kapshagay HPP equipment operation were conducted in Python 3.11 and SMath Solver 0.99.7920, while sensors and measurement instruments were used to collect real-time data. The validity of the proposed models was verified against experimental data and real operating conditions, which confirmed their effectiveness.
In this study, reliability-centered maintenance (RCM) models were applied to assess the failure probability and reliability of equipment over time. These models account for aging processes and external factors and help identify optimal maintenance strategies.
The reliability assessment in this study is based on standard reliability theory and uses two complementary model families: (1) time-to-failure models expressed via the hazard rate, and (2) condition-based prognostic models expressed via a generalized hazard estimated from monitored state variables.
4.1. Time-Dependent Reliability Model
Let T denote the random time to failure. The hazard rate λ(t) (units: time⁻¹) is defined as the instantaneous conditional failure intensity. The reliability and cumulative failure probability are then

$$R(t) = \exp\left(-\int_0^t \lambda(\tau)\,d\tau\right), \qquad F(t) = 1 - R(t).$$
To represent aging-driven degradation of long-operated hydro equipment, the hazard rate is approximated by a wear-dependent law:
where λ₀ and α have units of time⁻¹, and β is dimensionless and controls the acceleration of aging.
4.2. Condition-Based Prognostic Model
The equipment condition is represented by a monitored state vector x(t) (e.g., vibration RMS, temperature, etc.). The generalized hazard g(x) (units: time⁻¹) maps the condition vector to an instantaneous failure intensity, and the failure probability is

$$P_f(t) = 1 - \exp\left(-\int_0^t g\big(\mathbf{x}(\tau)\big)\,d\tau\right).$$

In this study, g(·) is estimated from the operational dataset (2020–2025) using a data-driven regression/ML model. For example, a practical parametric form is
where the model parameters are identified by maximum likelihood (classification of failure/non-failure intervals) or by nonlinear least squares using historical failure events and synchronized sensor data. Alternative implementations (e.g., neural networks) can be used within the same framework by replacing the parametric mapping with a learned approximation of g(x).
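To make the condition-based model concrete, the sketch below assumes a log-linear (proportional-hazards-style) mapping g(x) = exp(θ₀ + θᵀx). This is a common choice but not necessarily the parametric form used in the study. The failure probability P_f = 1 − exp(−∫g(x(τ))dτ) is approximated by a left Riemann sum over sampled state vectors, using a step equal to the 10-min sampling interval of the data acquisition system described in this section. The coefficients and readings are hypothetical.

```python
import math
from typing import Sequence

DT_HOURS = 10.0 / 60.0  # 10-min sampling interval expressed in hours


def log_linear_hazard(x: Sequence[float], theta0: float, theta: Sequence[float]) -> float:
    """HYPOTHETICAL generalized hazard g(x) = exp(theta0 + theta . x), in 1/h."""
    return math.exp(theta0 + sum(w * xi for w, xi in zip(theta, x)))


def condition_failure_probability(states: Sequence[Sequence[float]], theta0: float,
                                  theta: Sequence[float], dt: float = DT_HOURS) -> float:
    """P_f = 1 - exp(-sum_k g(x_k) * dt), a left Riemann approximation of the
    integrated hazard over the sampled state trajectory."""
    cumulative = sum(log_linear_hazard(x, theta0, theta) for x in states) * dt
    return 1.0 - math.exp(-cumulative)


# Hypothetical 30-day trajectories: [vibration RMS (mm/s), winding temperature (deg C)]
normal = [[2.0, 70.0]] * (30 * 24 * 6)
degraded = [[4.5, 85.0]] * (30 * 24 * 6)
coeffs = dict(theta0=math.log(1e-5), theta=[0.4, 0.02])  # hypothetical coefficients
pf_normal = condition_failure_probability(normal, **coeffs)
pf_degraded = condition_failure_probability(degraded, **coeffs)
```

Because the hazard is evaluated per sample, gradual drifts in vibration or temperature raise the predicted failure probability continuously rather than only when a fixed alarm limit is crossed.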
Optimization criterion. In this study, optimal maintenance is defined as the strategy that minimizes a composite performance index combining reliability and economic factors. The objective function is formulated as

$$J = w_1 C_{\mathrm{maint}} + w_2 D + w_3 P_f,$$

where C_maint is the total annual maintenance cost, D is the annual unplanned downtime, Pf is the failure probability over the considered operating period, and w₁, w₂, and w₃ are weighting coefficients reflecting the relative importance of the cost and reliability criteria.
The optimal maintenance strategy is determined by minimizing J subject to operational constraints and reliability limits.
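A minimal sketch of evaluating such a criterion for candidate strategies follows. Because cost, downtime, and failure probability have different units, the sketch assumes that each criterion is normalized by a reference (baseline) strategy before weighting. The weights and the two alternative strategies are hypothetical; the baseline values (150 million KZT, 90 h as the midpoint of 80–100 h, Pf = 0.12) are taken from the Results section only to set the scale.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Strategy:
    cost: float      # annual maintenance cost, million KZT
    downtime: float  # annual unplanned downtime, h
    pf: float        # failure probability over the considered period


def objective(s: Strategy, ref: Strategy, w: Tuple[float, float, float]) -> float:
    """Weighted sum of criteria, each normalized by the reference strategy (assumption)."""
    return (w[0] * s.cost / ref.cost
            + w[1] * s.downtime / ref.downtime
            + w[2] * s.pf / ref.pf)


def best_strategy(candidates: Dict[str, Strategy], ref: Strategy,
                  w: Tuple[float, float, float]) -> str:
    """Name of the candidate with the smallest objective value J."""
    return min(candidates, key=lambda name: objective(candidates[name], ref, w))


baseline = Strategy(cost=150.0, downtime=90.0, pf=0.12)
candidates = {
    "baseline": baseline,
    "reduced_preventive": Strategy(cost=130.0, downtime=120.0, pf=0.18),  # hypothetical
    "condition_triggered": Strategy(cost=160.0, downtime=60.0, pf=0.10),  # hypothetical
}
weights = (0.4, 0.3, 0.3)  # hypothetical w1, w2, w3
choice = best_strategy(candidates, baseline, weights)
```

Normalizing by the baseline makes J dimensionless and equal to 1 for the current practice, so values below 1 can be read directly as an improvement; operational constraints and reliability limits would be applied as filters on `candidates` before the minimization.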
During the study, software programs such as Python and SMath Solver were used for mathematical modeling and simulation, allowing the modeling of Kapshagay Hydroelectric Power Plant (HPP) equipment’s operational conditions and optimizing maintenance schedules based on reliability indicators. Vibration sensors, temperature gauges, and electrical parameter monitoring devices were used to collect data, enabling real-time monitoring of the equipment’s condition.
Industrial-grade piezoelectric vibration sensors (measurement range: 0–20 mm/s, accuracy ±0.05 mm/s), thermocouple-based temperature sensors (measurement range: 0–150 °C, accuracy ±1 °C), and digital electrical parameter monitoring units integrated into the plant SCADA system were used to collect operational data. Data acquisition was performed with a sampling interval of 10 min, ensuring sufficient temporal resolution for reliability modeling and statistical analysis.
Vibration sensors were installed on the turbine bearing housings and generator shaft support structures to monitor mechanical imbalance, bearing degradation, and rotor misalignment. Temperature sensors were mounted on stator windings, bearing lubrication systems, and cooling circuits to detect overheating and insulation deterioration. Electrical monitoring devices were integrated into generator output circuits and transformer connections to measure voltage, current, and active power parameters.
The vibration sensors had a measurement range of 0–10 mm/s with an accuracy of ±0.05 mm/s. Temperature sensors operated within a range of 0–150 °C with an accuracy of ±1 °C. Electrical measurement devices provided voltage measurement up to 10 kV and current measurement up to 5 kA with a sampling interval of 1 min. All sensors were connected to the plant's data acquisition system for continuous monitoring and statistical analysis.
The structural diagram of the software and hardware tools used to model Kapshagay HPP equipment and to optimize maintenance schedules based on reliability indicators is presented in Figure 4.
Figure 4 illustrates the software and hardware infrastructure used for reliability modeling and maintenance optimization of Kapshagay HPP equipment. The modeling framework integrates Python-based numerical simulation and SMath Solver analytical calculations with real operational data collected from vibration, temperature, and electrical monitoring sensors installed on turbines and generators. These tools enable simulation of equipment operating conditions and support optimization of maintenance schedules based on reliability indicators.
The experiment was conducted at the Kapshagay HPP under real operating conditions. The study considered the impact of changes in water flow, hydrological conditions, and climatic fluctuations. The average annual discharge of the HPP was Q = 75 m³/s, with seasonal variation reaching Qmax = 120 m³/s. Temperature and vibration data from turbines and generators were used to assess the operational health of the equipment. The structural elements and operating principles of the hydroelectric systems of the Kapshagay HPP are presented below (Figure 5).
Figure 5 presents the main structural elements and operating principles of the Kapshagay Hydropower Plant.
Figure 5a shows an aerial view of the hydropower station and reservoir infrastructure.
Figure 5b illustrates the main dam structures and hydraulic facilities.
Figure 5c provides an interior view of the turbine hall with generators and associated electromechanical equipment.
Figure 5d presents a schematic diagram of a hydropower generator system, highlighting the key components involved in mechanical-to-electrical energy conversion.
Validity of the Proposed Solutions: The validity of the proposed maintenance strategies and models was verified by comparing them with real data obtained from the hydroelectric power station. The suitability of the models was checked by comparing the predicted results for failure probability and reliability indicators with the historical failure data of the equipment. These comparisons confirmed the effectiveness and accuracy of the studied models.
5. Results
The research work was carried out during 2020–2025 at the Department of Electric Power Engineering of K. I. Satbayev Kazakh National Technical Research University. The main objective of the study was to scientifically optimize maintenance and servicing strategies aimed at improving the reliability of the equipment of the Kapshagay Hydropower Plant (HPP). During the study, the existing maintenance systems were analyzed and their effectiveness was evaluated. Taking into account the actual operating conditions and reliability indicators, a reliability-based maintenance model was developed. The obtained results are focused on enhancing the operational performance of the equipment and reducing unplanned outages.
5.1. Analysis of the Existing Maintenance Strategies at the Kapshagay Hydropower Plant and Evaluation of Their Effectiveness
To conduct an in-depth assessment of the effectiveness of the maintenance and servicing strategies applied at the Kapshagay Hydropower Plant, operational data collected during the period 2020–2025 were analyzed. The evaluation focused on equipment failure frequency, the duration of unplanned outages, reliability performance indicators, and the associated economic costs, which were examined in an integrated manner to assess the overall efficiency of the existing maintenance practices.
The analysis showed that the failure frequency of the main power equipment, including hydraulic turbines and generators, ranged on average from 3.8 to 4.2 failures per year. These values exceed the reliability levels commonly accepted in industry practice. The average duration of a single unplanned outage ranged from 18 to 26 h, resulting in a total annual downtime of approximately 80–100 h. Such operating conditions led to an average reduction in electricity generation of 2.5–3.2%. Table 4 summarizes the reliability and operational performance indicators of the Kapshagay Hydropower Plant equipment for the period 2020–2025.
According to the data presented in the table, the average failure frequency of the equipment is 4.0 times per year, while the total annual unplanned downtime ranges from 80 to 100 h, leading to a 2.5–3.2% reduction in electricity generation. In addition, annual maintenance costs of approximately 150 million KZT (≈301,500 USD, based on the exchange rate of 1 USD = 497.56 KZT, National Bank of Kazakhstan, April 2026) and an unscheduled maintenance share of 40% indicate the insufficient effectiveness of the existing maintenance strategies.
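The currency conversions and the downtime range can be cross-checked with simple arithmetic. The sketch below uses the exchange rate quoted in the text; the helper names are illustrative.

```python
KZT_PER_USD = 497.56  # exchange rate quoted in the text (National Bank of Kazakhstan)


def kzt_to_usd(amount_kzt: float, rate: float = KZT_PER_USD) -> float:
    """Convert an amount in KZT to USD at the given KZT-per-USD rate."""
    return amount_kzt / rate


def annual_downtime_hours(failures_per_year: float, hours_per_outage: float) -> float:
    """Total unplanned downtime per year = failure frequency x outage duration."""
    return failures_per_year * hours_per_outage


annual_cost_usd = kzt_to_usd(150e6)  # about 301,470 USD, quoted as about 301,500 USD
downtime_range = (annual_downtime_hours(4.0, 18.0), annual_downtime_hours(4.0, 26.0))
```

With 4.0 failures per year, outages of 18–26 h give 72–104 h of annual downtime, which brackets the reported 80–100 h.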
The results show that the reliability characteristics of the equipment depend directly on changes in operating regimes. In this study, the failure process of the equipment was assumed to have a constant failure rate, and reliability was described using an exponential model. Under this assumption, the probability of failure over a given time interval is

$$P_f(t) = 1 - e^{-\lambda t},$$

where λ denotes the failure rate and t is the operating time. Based on this model, the probability of failure over a monthly interval under normal operating conditions was determined to be approximately Pf ≈ 0.12. During periods of high water flow and increased hydrological stress, the higher failure rate raised the failure probability to Pf ≈ 0.17–0.18.
In reliability theory, the mean time between failures (MTBF) is inversely related to the failure rate and is expressed as

$$\mathrm{MTBF} = \int_0^\infty R(t)\,dt = \int_0^\infty e^{-\lambda t}\,dt = \frac{1}{\lambda},$$

where R(t) is the reliability function, λ denotes the failure rate, and t represents the operating time.
According to this relationship, under normal operating conditions the MTBF of the equipment ranges from 2100 to 2300 h, whereas under high-load conditions, due to the increased failure rate, this value decreases to 1600–1700 h.
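The exponential-model quantities can be reproduced in a few lines of Python. This is a sketch under the constant-failure-rate assumption stated above, and the helper names are hypothetical. With the average of 4.0 failures per year reported for Table 4, λ = 4/8760 h⁻¹ and MTBF = 8760/4 = 2190 h, which lies within the 2100–2300 h range reported for normal operation. The inverse helper returns the λ implied by a failure probability observed over a given interval.

```python
import math

HOURS_PER_YEAR = 8760.0


def failure_rate_from_count(failures_per_year: float) -> float:
    """Constant failure rate lambda (1/h) from an annual failure count."""
    return failures_per_year / HOURS_PER_YEAR


def failure_probability(lam: float, t_hours: float) -> float:
    """Exponential model: P_f(t) = 1 - exp(-lambda * t)."""
    return 1.0 - math.exp(-lam * t_hours)


def mtbf(lam: float) -> float:
    """For a constant failure rate, MTBF = integral of R(t) dt = 1 / lambda."""
    return 1.0 / lam


def rate_from_probability(pf: float, t_hours: float) -> float:
    """Invert P_f = 1 - exp(-lambda * t) for lambda, given P_f over t hours."""
    return -math.log(1.0 - pf) / t_hours


lam_normal = failure_rate_from_count(4.0)  # about 4.57e-4 per hour
mtbf_normal = mtbf(lam_normal)             # 2190 h
```

The same helpers apply to the high-load regime by substituting the corresponding failure count or interval probability.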
Wear detection was primarily associated with turbine bearings, rotor assemblies, stator windings, and lubrication systems. Increased wear was identified based on abnormal vibration amplitude, temperature rise, or deviation of electrical parameters from nominal values.
When bearing vibration exceeded acceptable thresholds, bearing clearance inspection and lubrication replacement were performed. Rotor imbalance led to dynamic balancing procedures. Elevated stator temperatures initiated insulation diagnostics and cooling system inspection. Maintenance actions were limited to the affected components rather than complete turbine replacement.
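The component-level response rules described above can be expressed as a simple lookup. In this sketch, the indicator names and alarm thresholds are hypothetical (the plant's actual limits are not given in the text); only the mapping from symptom to action follows the description.

```python
from typing import Dict, List

# Hypothetical alarm thresholds; the plant's actual limits are not stated in the text.
THRESHOLDS: Dict[str, float] = {
    "bearing_vibration_mm_s": 4.5,
    "rotor_1x_vibration_mm_s": 7.1,  # once-per-revolution vibration as an imbalance indicator
    "stator_temperature_c": 120.0,
}

# Component-level actions, following the rules described in the text
ACTIONS: Dict[str, str] = {
    "bearing_vibration_mm_s": "inspect bearing clearance and replace lubricant",
    "rotor_1x_vibration_mm_s": "perform dynamic balancing of the rotor",
    "stator_temperature_c": "run insulation diagnostics and inspect cooling system",
}


def maintenance_actions(readings: Dict[str, float]) -> List[str]:
    """Return the action for every monitored indicator that exceeds its threshold."""
    return [ACTIONS[name] for name, value in readings.items()
            if name in THRESHOLDS and value > THRESHOLDS[name]]
```

Keeping thresholds and actions in separate tables lets the limits be tuned per unit without changing the decision logic, and limits the response to the affected components, as in the described practice.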
The results of the economic assessment indicate that the annual costs of maintenance and servicing amount to approximately 150 million KZT (≈301,500 USD) (see Table 5). Of this total, 90 million KZT (≈181,000 USD) is allocated to planned maintenance, while 60 million KZT (≈120,600 USD) is spent on unplanned repairs. The high share of unplanned maintenance is associated with increased costs related to the urgent delivery of spare parts, the involvement of additional labor resources, and temporary interruptions in electricity generation. The direct and indirect costs incurred during a single unplanned outage amount on average to 12–15 million KZT (≈24,100–30,150 USD).
Table 5 shows that the annual costs of maintenance and restoration activities at the Kapshagay Hydropower Plant amount to approximately 150 million KZT, of which 60 million KZT (40%) is attributable to unplanned repairs. The cost of a single unplanned outage, reaching 12–15 million KZT (≈24,100–30,150 USD), indicates the economic inefficiency of the existing maintenance strategies and highlights the need for reliability-based optimization.
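The cost structure in Table 5 can be summarized with simple arithmetic. In the sketch below, the exchange rate is back-calculated from the conversion used in the text (150 million KZT ≈ 301,500 USD), and the "outage equivalents" figure is a hypothetical bound that treats the entire unplanned-repair budget as the cost of average unplanned outages. Neither quantity is reported as a result of the study, and the names are illustrative.

```python
# Table 5 values, in million KZT per year.
PLANNED_MKZT = 90.0
UNPLANNED_MKZT = 60.0
OUTAGE_COST_MKZT = (12.0, 15.0)  # direct + indirect cost of one unplanned outage

# Assumption: exchange rate implied by the text's own conversion
# (150 million KZT ~ 301,500 USD).
KZT_PER_USD = 150e6 / 301_500


def cost_summary() -> dict:
    total = PLANNED_MKZT + UNPLANNED_MKZT
    return {
        "total_mkzt": total,
        "unplanned_share": UNPLANNED_MKZT / total,
        # Hypothetical bound: how many average outages the unplanned budget
        # would cover if all unplanned spending were outage costs
        # (most expensive outage first, cheapest second).
        "outage_equivalents": (UNPLANNED_MKZT / OUTAGE_COST_MKZT[1],
                               UNPLANNED_MKZT / OUTAGE_COST_MKZT[0]),
        "total_usd": total * 1e6 / KZT_PER_USD,
    }


if __name__ == "__main__":
    for key, value in cost_summary().items():
        print(key, value)
```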
The results of the analysis indicate that the maintenance strategies currently applied at the Kapshagay Hydropower Plant are not sufficient to ensure stable and reliable operation of the equipment. The high level of unplanned outages, the increase in failure probability depending on operating conditions, and the significant economic impact of unplanned repairs highlight the relevance of implementing reliability-based maintenance approaches. Such approaches can enhance equipment performance, significantly reduce annual downtime, and optimize maintenance costs in the long term.
5.2. Development of a Reliability-Based Maintenance Model
During the development of the reliability-based maintenance model, computational and simulation studies were conducted with the aim of reducing equipment downtime by taking into account failure probability, reliability indicators, and operating regimes. The proposed model incorporates the actual operating conditions of the Kapshagay Hydropower Plant, including equipment load levels, failure rates, and technical condition, and enables the optimization of maintenance and servicing schedules.
The research results demonstrated that implementing the reliability-centered maintenance model significantly improves equipment reliability. Applying the model reduced the probability of equipment failure by approximately 15% relative to its initial value. At a reliability indicator level of 0.1, the downtime per failure was reduced from 6 days to 4 days, leading to a substantial reduction in annual downtime. The results are summarized in Table 6 below.
The data presented in Table 6 indicate that implementing reliability-centered maintenance reduced the relative equipment failure probability (baseline normalized to 1.00) to 0.85 (−15%), while the downtime per failure decreased from 6 days to 4 days. In addition, the annual downtime was reduced from 18 days to 12 days, and the mean time between failures increased from 120 days to 165 days (+37.5%), demonstrating a significant improvement in the overall operational efficiency of the equipment.
In addition, a maintenance schedule developed from reliability indicators increased equipment operational efficiency by 12% and reduced maintenance and technical service costs by approximately 10%. The results are illustrated in Figure 6.
Figure 6 presents a quantitative comparison of key operational performance indicators under baseline maintenance and the proposed RCM-based strategy. The results demonstrate a reduction in annual downtime from 6.2 to 4.1 days and a decrease in maintenance expenditures from approximately 200 million KZT to 175 million KZT. These findings confirm that the implementation of reliability-centered maintenance improves operational efficiency while simultaneously optimizing maintenance-related costs.
The observed reductions in downtime (−34%) and maintenance costs (−12.5%) indicate that integrating reliability modeling into maintenance planning enhances both technical performance and economic sustainability of hydropower operations.
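The relative changes quoted for Table 6 and Figure 6 follow from the reported before/after values as (after − before)/before. The short Python sketch below recomputes them; for example, the annual downtime reduction from 6.2 to 4.1 days evaluates to −33.9%, which the text rounds to −34%. The dictionary labels and function name are illustrative.

```python
def relative_change(before: float, after: float) -> float:
    """Signed relative change in percent: (after - before) / before * 100."""
    return (after - before) / before * 100.0


# Before/after values as reported in Table 6 and Figure 6.
REPORTED = {
    "relative failure probability (Table 6)": (1.00, 0.85),
    "downtime per failure, days (Table 6)": (6.0, 4.0),
    "annual downtime, days (Table 6)": (18.0, 12.0),
    "MTBF, days (Table 6)": (120.0, 165.0),
    "annual downtime, days (Figure 6)": (6.2, 4.1),
    "maintenance cost, million KZT (Figure 6)": (200.0, 175.0),
}

if __name__ == "__main__":
    for name, (before, after) in REPORTED.items():
        print(f"{name}: {relative_change(before, after):+.1f}%")
```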
5.3. Identification of Effective Maintenance Interventions
The results of reliability-based modeling made it possible to optimize maintenance interventions for the equipment of the Kapshagay Hydroelectric Power Plant.
During the study, maintenance strategies were revised based on the actual operational parameters of turbines and generators. At the Kapshagay HPP, the average annual water flow rate is Q_avg = 75 m³/s, while the seasonal maximum reaches Q_max = 120 m³/s. The nominal rotational speed of the turbine rotor is n = 125 rpm, and the generator operating power is approximately P = 364 MW.
During the assessment of the technical condition of the equipment, the vibration amplitude ranged from 2.5 to 4.8 mm/s, while the temperature varied within the range of 65–95 °C. Under normal operating conditions, it was determined that turbine vibration should not exceed 3.0 mm/s, and the temperature should not exceed 80 °C.
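The threshold logic used in the condition assessment can be sketched as a simple rule check. This is an illustrative example, not the plant's monitoring software: the limits (3.0 mm/s and 80 °C) are those reported above, the mapping to actions follows the targeted maintenance measures described earlier, and treating the measured temperature as the stator temperature is an assumption. Rotor imbalance, which triggered dynamic balancing, is not modeled because no numerical criterion is given. The `Reading` class and unit labels are hypothetical.

```python
from dataclasses import dataclass

# Limits reported for normal turbine operation at the Kapshagay HPP.
VIBRATION_LIMIT_MM_S = 3.0
TEMPERATURE_LIMIT_C = 80.0


@dataclass
class Reading:
    unit: str              # hypothetical unit label, e.g. "G1"
    vibration_mm_s: float  # measured vibration, mm/s
    temperature_c: float   # assumption: treated as stator winding temperature, deg C


def recommended_actions(reading: Reading) -> list[str]:
    """Return the targeted maintenance actions triggered by one reading."""
    actions = []
    if reading.vibration_mm_s > VIBRATION_LIMIT_MM_S:
        actions.append("inspect bearing clearance and replace lubricant")
    if reading.temperature_c > TEMPERATURE_LIMIT_C:
        actions.append("run insulation diagnostics and inspect cooling system")
    return actions


if __name__ == "__main__":
    # Readings at the ends of the measured ranges (2.5-4.8 mm/s, 65-95 deg C).
    for r in [Reading("G1", 2.5, 65.0), Reading("G2", 4.8, 95.0)]:
        print(r.unit, recommended_actions(r) or ["no action"])
```

Keeping each rule tied to one component mirrors the practice of limiting interventions to the affected parts rather than replacing the whole unit.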
In the reliability model, the operational performance of the equipment was described using the Weibull reliability function:
$$R(t) = \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right], \qquad F(t) = 1 - R(t),$$
where F(t) is the corresponding failure probability. The Weibull parameters η (scale, hours) and β (shape, dimensionless) were identified from the Kapshagay HPP operational failure-time dataset (2020–2025). Parameter estimation was performed using maximum likelihood estimation (MLE) for time-to-failure data, with right-censoring considered where applicable. The reported values of η (in hours) and β correspond to the best-fit parameters obtained for the baseline operating regime and were subsequently used for comparative reliability evaluation before and after maintenance optimization.
The annual downtime was reduced from 6.2 days to 4.1 days, and electricity generation losses decreased by approximately 18–22 GWh per year. A comparative description of these results is presented in Table 7 below. The MTBF values presented in Table 7 were calculated using Equation (16) based on the recorded failure frequency and total operating time.
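The MLE step described above can be illustrated for right-censored time-to-failure data. The sketch below uses a standard profile-likelihood approach, assuming the Weibull model R(t) = exp[−(t/η)^β] with shape β and scale η: the scale is eliminated analytically (η^β = Σt_i^β / r, with r observed failures), and the remaining score equation for β, which is strictly increasing, is solved by bisection. It uses only the Python standard library; the function name, bracket limits, and synthetic data are hypothetical and do not reproduce the Kapshagay dataset.

```python
import math
import random


def weibull_mle(times, failed, beta_lo=0.01, beta_hi=50.0, tol=1e-10):
    """Maximum likelihood estimates (beta, eta) for right-censored Weibull data.

    times  -- positive operating times in hours (failures and censored alike)
    failed -- 1 if the time ended in a failure, 0 if right-censored
    """
    r = sum(failed)
    if r == 0:
        raise ValueError("at least one observed failure is required")
    t_max = max(times)
    u = [t / t_max for t in times]  # rescale so u**beta cannot overflow
    logs = [math.log(x) for x in u]
    mean_log_fail = sum(l for l, d in zip(logs, failed) if d) / r

    def score(beta):
        # Profile score for beta after eliminating eta; strictly increasing.
        w = [x ** beta for x in u]
        return (sum(wi * li for wi, li in zip(w, logs)) / sum(w)
                - 1.0 / beta - mean_log_fail)

    lo, hi = beta_lo, beta_hi
    if score(lo) > 0 or score(hi) < 0:
        raise ValueError("no root inside the shape-parameter bracket")
    while hi - lo > tol:  # bisection on the monotone score
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # Closed-form scale given beta, undoing the rescaling by t_max.
    eta = t_max * (sum(x ** beta for x in u) / r) ** (1.0 / beta)
    return beta, eta


if __name__ == "__main__":
    # Synthetic illustration only: hypothetical parameters, censoring at 60,000 h.
    random.seed(1)
    true_beta, true_eta, censor_h = 1.8, 40_000.0, 60_000.0
    draws = [random.weibullvariate(true_eta, true_beta) for _ in range(500)]
    times = [min(d, censor_h) for d in draws]
    failed = [1 if d <= censor_h else 0 for d in draws]
    beta_hat, eta_hat = weibull_mle(times, failed)
    print(f"beta = {beta_hat:.2f}, eta = {eta_hat:,.0f} h "
          f"({sum(failed)} failures, {len(failed) - sum(failed)} censored)")
```

Censored units contribute only through the Σt_i^β terms, which is how the likelihood accounts for equipment that had not yet failed at the end of the observation window. Confidence intervals for β and η, which the Discussion identifies as future work, are not computed here.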
The results in Table 7 demonstrate that the implementation of reliability-based maintenance strategies significantly improved the operational performance of the equipment. The failure probability decreased from 0.10 to 0.07 (approximately a 30% reduction), the annual downtime decreased from 6.2 days to 4.1 days (approximately a 34% reduction), and MTBF increased from 41,000 h to 55,000 h (approximately a 34% increase). In addition, the reliability coefficient increased from 0.89 to 0.94, and maintenance costs decreased from 180–220 million KZT (≈362,000–442,000 USD) to 160–195 million KZT (≈322,000–392,000 USD), confirming the high technical and economic efficiency of the proposed maintenance approach.
Analysis of components with a high wear level showed that the degradation rate of rotor and stator insulation is approximately 3.5–4.2% per year. Under traditional scheduled maintenance, the replacement interval was 5 years, whereas under condition-based monitoring this interval was extended to 6–7 years. These results are presented in Table 8.
The results in Table 8 show that condition-based maintenance increases the replacement interval from 5 years to 6–7 years and extends the equipment operating lifetime from 43,800 h to 56,900 h, a significant increase in service life. In addition, relative maintenance costs decrease from 100% (baseline) to 88–91%, while equipment reliability increases by approximately 25%, confirming the effectiveness of wear-based maintenance.
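The interval figures in Table 8 can be checked by converting years to operating hours. Assuming continuous operation (8760 h per year), a 5-year interval gives 43,800 h, and the midpoint of the 6–7-year range gives 56,940 h, close to the reported 56,900 h. The sketch also applies the 3.5–4.2%/year insulation degradation rate over each interval under an assumed linear (non-compounding) degradation model; that assumption and the helper names are illustrative.

```python
HOURS_PER_YEAR = 8760.0  # assumption: continuous operation, 365 d x 24 h


def interval_hours(years: float) -> float:
    """Convert a replacement interval in years to operating hours."""
    return years * HOURS_PER_YEAR


def cumulative_degradation(rate_pct_per_year: float, years: float) -> float:
    """Insulation degradation accumulated over an interval, in percent.

    Assumes linear (non-compounding) degradation at a constant annual rate.
    """
    return rate_pct_per_year * years


if __name__ == "__main__":
    for strategy, years in [("scheduled", 5.0), ("condition-based (midpoint)", 6.5)]:
        low = cumulative_degradation(3.5, years)
        high = cumulative_degradation(4.2, years)
        print(f"{strategy}: {interval_hours(years):,.0f} h, "
              f"degradation {low:.1f}-{high:.1f}%")
```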
The change in the reliability model over time is shown in Figure 7 below. In the optimized model, the failure probability increases more slowly, indicating higher operational reliability of the equipment.
Figure 7 illustrates the variation in equipment failure probability as a function of operating time for two maintenance scenarios: the baseline maintenance model and the optimized reliability-centered maintenance model. The baseline model represents the existing maintenance strategy applied at the Kapshagay HPP prior to optimization, while the optimized model reflects the strategy proposed in this study, which integrates reliability modeling and predictive diagnostics. The horizontal axis represents cumulative operating time (hours), and the vertical axis represents the probability of equipment failure. The orange (upper) curve corresponds to the baseline maintenance strategy, whereas the blue (lower) curve represents the optimized reliability-centered maintenance model. The optimized model shows a slower increase in failure probability over time, indicating improved reliability and reduced operational risk.
For example, at 50,000 h, the probability is approximately 0.13 in the baseline model, while in the optimized model it is about 0.09; similarly, at 20,000 h, the values are 0.06 and 0.04, respectively, indicating that reliability-based maintenance reduces the failure risk by approximately 30–35%.
In general, the implementation of reliability-based maintenance strategies resulted in a 20–30% reduction in failure probability, a 10–12% decrease in maintenance costs, and a 5–7% increase in the reliability coefficient. The generator efficiency increased from 94.5% to 96.2%, and the annual electricity generation volume increased by 1.8–2.4%. Assuming an average electricity price of 18–22 KZT/kWh, the economic benefit was estimated at approximately 320–480 million KZT (≈643,000–965,000 USD) per year.
The annual economic benefit E was calculated as the sum of reduced maintenance costs and additional revenue from reduced electricity production losses:
$$E = \Delta C_{\mathrm{m}} + \Delta W \cdot c_{\mathrm{e}},$$
where ΔC_m is the reduction in annual maintenance costs, ΔW is the reduction in annual electricity losses, and c_e is the average electricity price.
Based on operational data, the reduction in electricity losses was estimated at 18–22 GWh per year. Assuming an average electricity price of 20 KZT/kWh (i.e., 20,000,000 KZT per GWh), the additional revenue from increased electricity generation was calculated as follows:
$$\Delta W \cdot c_{\mathrm{e}} = (18\text{–}22)\ \mathrm{GWh} \times 2 \times 10^{7}\ \mathrm{KZT/GWh} = (3.6\text{–}4.4) \times 10^{8}\ \mathrm{KZT}.$$
Thus, the additional revenue due to reduced energy losses ranges from 360 to 440 million KZT per year.
If the annual maintenance cost reduction is estimated at approximately 20–40 million KZT, the total annual economic benefit is
$$E \approx (20\text{–}40) + (360\text{–}440) \approx 380\text{–}480\ \text{million KZT per year}.$$
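The benefit estimate reduces to a few lines of arithmetic: benefit = maintenance-cost reduction + recovered energy × electricity price. Because 1 GWh = 10⁶ kWh, an energy figure in GWh multiplied by a price in KZT/kWh is already expressed in millions of KZT. The Python sketch below, with hypothetical function and parameter names, evaluates the lower and upper bounds from the stated ranges (20–40 million KZT maintenance saving, 18–22 GWh recovered generation, 20 KZT/kWh).

```python
def economic_benefit(cost_saving_mkzt: float, energy_saved_gwh: float,
                     price_kzt_per_kwh: float) -> float:
    """Annual benefit in million KZT: maintenance saving + recovered energy x price.

    1 GWh = 1,000,000 kWh, so GWh x (KZT/kWh) is already in million KZT.
    """
    revenue_mkzt = energy_saved_gwh * price_kzt_per_kwh
    return cost_saving_mkzt + revenue_mkzt


if __name__ == "__main__":
    low = economic_benefit(20.0, 18.0, 20.0)
    high = economic_benefit(40.0, 22.0, 20.0)
    print(f"Annual benefit: {low:.0f}-{high:.0f} million KZT")
```

Pairing the low ends and the high ends of both ranges gives the widest plausible interval; a sensitivity study could vary the electricity price independently.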
6. Discussion of the Results of the Study
The results of this study provide clear and quantitative answers to the formulated research questions and demonstrate the practical feasibility of implementing reliability-centered maintenance (RCM) in real hydropower operating conditions.
Regarding RQ1, the proposed RCM framework significantly reduced equipment failure probability and improved operational reliability indicators. The observed decrease in failure probability (from 0.10 to 0.07), reduction in failure rate (−28%), and increase in MTBF (from 120 to 165 days and from 41,000 to 55,000 h) confirm that linking maintenance decisions directly to reliability indicators leads to measurable performance improvement. The slower growth of failure probability over time, as shown in Figure 7, indicates improved degradation control under optimized maintenance planning.
Concerning RQ2, the techno-economic assessment confirms that reliability-based maintenance produces substantial economic benefits. The reduction in annual downtime (from 6.2 to 4.1 days) and decrease in maintenance costs (from 180–220 million KZT to 160–195 million KZT) demonstrate that reliability optimization is not only technically effective but also economically justified. The estimated annual economic benefit of 320–480 million KZT highlights the importance of integrating reliability modeling with financial evaluation in maintenance decision-making.
With respect to RQ3, the application of mathematical modeling and simulation tools (Python and SMath Solver) proved effective for supporting maintenance optimization under real operating conditions. Unlike purely theoretical reliability studies, the present framework integrates exponential and Weibull reliability models with actual plant-specific operational data, including vibration levels, temperature regimes, hydrological variability, and failure statistics. This integration enhances practical applicability compared to models that rely solely on generalized assumptions.
The obtained results are consistent with previous studies. Martinez-Monseco [17] reported a 15–20% reduction in unplanned outages using RCM approaches, which aligns with the reductions observed in this study. Similarly, de Santis et al. [21] demonstrated that condition-based maintenance (CBM) can increase availability by 8–15%, although such systems often require high initial investment. The present approach combines reliability modeling with economic justification, thereby addressing the implementation constraints highlighted in earlier research.
At the same time, the proposed method differs from biologically inspired predictive models [19], which require extensive datasets and high computational resources. The framework presented here achieves substantial reliability improvements with comparatively moderate computational complexity, making it suitable for implementation at aging hydropower facilities with limited digital infrastructure.
Despite these positive outcomes, several limitations must be acknowledged. The model calibration was performed using operational data specific to the Kapshagay HPP, characterized by an average annual water flow of 75 m³/s and seasonal maxima of 120 m³/s. Changes in hydrological regimes or equipment modernization may require recalibration of reliability parameters. Furthermore, while the model captures system-level reliability, detailed component-level degradation mechanisms were not fully modeled. Future research should incorporate advanced statistical parameter estimation techniques (e.g., maximum likelihood estimation), confidence interval analysis for reliability parameters, and integration with risk-based maintenance and AI-driven predictive diagnostics.
Overall, the study demonstrates that integrating reliability theory, operational diagnostics, and economic analysis within a unified framework provides a robust foundation for maintenance optimization in hydropower systems. The results confirm that reliability-centered maintenance can reduce failure probability by 20–30%, decrease maintenance costs by 10–12%, and increase annual electricity generation by approximately 1.8–2.4%, thereby supporting sustainable and economically efficient hydropower operation.
7. Conclusions
This study developed and validated an integrated reliability-centered maintenance (RCM) framework for hydropower equipment based on operational data from the Kapshagay HPP. The proposed approach combines reliability modeling, diagnostic monitoring, and techno-economic evaluation within a unified analytical structure.
The findings demonstrate that maintenance strategies based on actual reliability indicators significantly improve operational performance compared to traditional scheduled maintenance. The implementation of the RCM framework resulted in a measurable reduction in failure probability and unplanned downtime, an increase in mean time between failures, and a decrease in maintenance expenditures. In addition, the optimized maintenance strategy contributed to reduced electricity generation losses and improved economic efficiency.
The results confirm that equipment reliability at aging hydropower facilities strongly depends on real operating regimes, hydrological variability, and degradation processes. Integrating reliability theory with plant-specific operational data enables more accurate maintenance planning and supports informed decision-making under real operating conditions.
Compared with conventional condition-based and risk-based approaches, the proposed framework demonstrates higher adaptability to site-specific constraints and aging infrastructure, while maintaining moderate computational complexity. This enhances its practical applicability for hydropower plants undergoing digital transformation.
Overall, the study confirms that reliability-centered maintenance can simultaneously improve technical performance, economic sustainability, and operational stability of hydropower systems. Future research should extend the proposed framework to multi-unit systems and incorporate advanced statistical parameter estimation and predictive analytics to further enhance maintenance optimization.
The study provides comprehensive answers to RQ1–RQ3, confirming both the technical effectiveness and economic justification of the proposed reliability-centered maintenance framework.