Article

Explainable Prediction of Power Generation for Cascaded Hydropower Systems Under Complex Spatiotemporal Dependencies

College of Electrical Engineering, Sichuan University, Chengdu 610065, China
* Author to whom correspondence should be addressed.
Energies 2026, 19(6), 1540; https://doi.org/10.3390/en19061540
Submission received: 4 February 2026 / Revised: 10 March 2026 / Accepted: 16 March 2026 / Published: 20 March 2026
(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)

Abstract

Hydropower plays a key regulating role in new-type power systems, and both forecasting accuracy and interpretability are critical for power dispatch. However, cascade hydropower forecasting is constrained by strong spatiotemporal coupling among multi-dimensional features, flow propagation delays, as well as the limited transparency of deep learning models. To tackle these issues, this paper develops a hybrid framework integrating Maximal Information Coefficient (MIC), the Long- and Short-term Time-series Network (LSTNet), and the SHapley Additive exPlanations (SHAP) interpretability method. First, an MIC-based nonlinear screening mechanism is employed to remove redundant noise and construct a high-quality input space. Second, an LSTNet model is developed to deeply extract spatiotemporal coupling features among cascade stations and flow evolution patterns, achieving high-accuracy forecasting of both system-level and station-level outputs. Finally, SHAP is used for global and local interpretability analysis to perform physics-consistency verification with respect to the model’s decision-making rationale. Experimental results indicate that the proposed approach achieves low errors in total output forecasting, reducing error levels by approximately 57–88% compared with Recurrent Neural Network (RNN), Gated Recurrent Unit (GRU), and Informer. Moreover, SHAP feature-dependence analysis reveals a nonlinear response change of station D around 7.8 MW, providing evidence for the physical consistency of the model outputs and improving model interpretability.

1. Introduction

In contemporary power systems, hydropower is playing a growing role by providing both flexible regulation capability and steady baseload support [1]. However, for cascade hydropower systems with limited overall regulation capacity, power output characteristics are deeply coupled with the strong randomness and volatility of natural runoff, posing severe challenges to the secure integration of electricity into the power grid. In such systems, although some hydropower stations possess a limited degree of regulation capability, it is insufficient to fundamentally alter the system’s high dependence on hydrometeorological conditions. As a result, the power generation process exhibits pronounced run-of-river characteristics [2]. Consequently, developing high-accuracy forecasting techniques under such complex hydraulic conditions is of great scientific significance and practical value for ensuring real-time power balance, optimizing coordinated cascade operation, and improving overall operational efficiency [3].
Cascade hydropower systems with weak regulation capacity are typically composed of multiple run-of-river stations connected in series along a river channel. The system output is governed not only by the generation characteristics of individual stations but also constrained by inter-station hydraulic coupling and flow propagation delays. Viewed as the elemental units of cascade schemes, run-of-river hydropower stations exhibit two defining features: station-level randomness and spatiotemporal heterogeneity. Station-level randomness arises because these plants respond acutely to hydrometeorological perturbations; it is this sensitivity that makes their generation series stochastic, with abrupt swings and clear nonstationarity. Spatiotemporal heterogeneity, in turn, reflects the influence of geographic dispersion: terrain, land-surface attributes, and watershed runoff generation and concentration processes each leave a distinct signature at a given site. As a result, meteorological forcing and inflow-formation pathways differ sharply among stations, yielding generation capability that is distinctly uneven—both through time and from one location to another along the cascade and across seasons.
Given the station characteristics described earlier, the existing literature on hydropower output forecasting is commonly organized into two broad lines of work: physics-based approaches and data-driven approaches. Physics-based methods are generally anchored in hydrological and hydraulic process modeling, within which the conversion mechanisms of hydropower are specified in an explicit, mechanistic manner. By jointly accounting for reservoir stage, river inflow/discharge, and basin-scale precipitation, these models calculate and forecast power production. Owing to their clear physical interpretability and strong explanatory capacity, physics-based models have long served as a foundational tool for hydropower prediction. Yet their performance is tightly coupled to the fidelity with which hydrological and hydraulic dynamics are represented. When catchment environments grow complex, observations are limited, or extreme climatic events arise, meeting practical operational demands can become challenging for such models.
With the steady enrichment of online monitoring and the continued growth of computing power, interest has gradually moved toward data-driven forecasting. Instead of relying solely on explicit physical equations, these methods learn statistical patterns and functional mappings directly from historical records, enabling them to reflect nonlinearity and uncertainty more effectively. The related body of work spans multiple methodological tiers, ranging from time-series analysis to machine learning and deep learning techniques. In the time-series and machine learning stream, Ref. [4] addressed nationwide hydropower generation forecasting in Ecuador using the Autoregressive Integrated Moving Average (ARIMA) model and its variant with exogenous inputs (ARIMAX) as baseline models for monthly output, showing that the resulting parameterization remains interpretable while imposing relatively low computational cost, an advantage where computing resources are limited. Ref. [5] employed gradient-boosting decision trees (GBDT) to forecast hydropower generation in Turkey’s Ceyhan River Basin; by integrating upstream station production, meteorological variables, and electricity market prices, the Light Gradient-Boosting Machine (LightGBM) framework achieved a pronounced improvement in predictive accuracy. Turning to deep learning, RNNs, including variants such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), have been widely used in hydropower and hydrological prediction because they can represent temporal dependence effectively. For example, Ref. [6] mitigates non-stationarity via seasonal-trend decomposition using LOESS (STL) and employs models such as LSTM to forecast the generation of Cameroon’s Songloulou hydropower plant, demonstrating effectiveness in handling periodic and non-stationary hydropower time-series.
Building on the generation behavior of individual stations, weakly regulated cascade hydropower systems exhibit strong hydraulic coupling between neighboring stations via river-flow propagation, with hydrological processes and meteorological drivers intertwined across space and time [7]. In addition to the environmental stochasticity affecting an individual station, variations in upstream discharge do not influence downstream conditions instantaneously; instead, they travel downstream through a finite flow-routing delay, which then alters downstream inflows and, consequently, downstream generation potential. As a result, the production time-series of upstream and downstream stations exhibits pronounced phase shifts along the temporal axis. Such complex spatiotemporal interdependence—arising from the combined effects of spatial coupling, time-lagged transmission, and ambient uncertainty—significantly increases the challenge of representing cascade dynamics and producing reliable output forecasts. It is for this reason that simply extending single-station forecasting techniques to cascade systems often fails to capture the system-level spatiotemporal dependencies introduced by inter-station coupling and propagation delays. In many existing studies employing traditional RNN-, LSTM-, or GRU-based models for multi-station forecasting, inputs from different stations and environmental variables are typically concatenated into a single feature vector and treated as parallel variables, with modeling efforts still primarily focused on temporal evolution along a single time dimension. Due to the lack of explicit structural mechanisms for characterizing local multivariate coupling and dynamic time-delay effects, such approaches struggle to fully disentangle the complex nonlinear interactions inherent in weakly regulated run-of-river cascade systems, thereby limiting their potential performance gains in cascade-level forecasting tasks.
Motivated by the above analysis, this study introduces the architecture of the Long- and Short-term Time-series Network (LSTNet) to jointly model variable coupling relationships and multi-scale temporal characteristics among upstream and downstream stations in cascade hydropower systems. By coupling a convolutional neural network (CNN) with a recurrent neural network (RNN) within a single architecture, LSTNet is able to uncover nonlinear cross-variable interactions as well as propagation delays whose magnitudes vary over time. In addition, an Autoregressive (AR) component is introduced to capture linear trends and to mitigate the model’s limited sensitivity to scale, thereby enhancing predictive accuracy. Nevertheless, when deployed in practical settings, two central challenges still emerge:
  • Key input feature selection in high-dimensional nonlinear settings. Cascade hydropower systems are jointly influenced by a wide range of environmental drivers (precipitation, evaporation, soil moisture, and upstream inflows, among others), so the resulting input space is markedly high-dimensional and characterized by tightly intertwined cross-dependencies. Because run-of-river cascades offer only limited regulation, total output responds sharply to environmental perturbations. In turn, most meteorological and hydrological variables relate to power output in strongly nonlinear and nonstationary ways, while the coupling intensity among variables shifts with hydrological seasons. Conventional linear association metrics, exemplified by the Pearson correlation coefficient, cannot reliably uncover these nonlinear dependencies; they may either inject redundant noise or miss crucial signals, thereby capping forecasting performance.
  • The black-box characteristics of deep learning models may undermine the perceived credibility of their decision making. In practical hydropower dispatch, operators care about more than numerical accuracy; what matters as well is why a forecast is produced, for instance whether a particular upstream station or an environmental driver is pushing downstream output [8,9]. Yet LSTNet, as a deep neural network, relies on highly intricate weight-update dynamics and feature-mapping transformations [10], making its decision pathway difficult to interpret in physically meaningful terms. Such limited transparency can, in many engineering applications, undermine both the acceptance of and confidence in deep-learning-based forecasting models [11].
To address these concerns, this work incorporates the Maximal Information Coefficient (MIC) and the SHapley Additive exPlanations (SHAP) interpretability framework, thereby constructing an interpretable forecasting scheme tailored to cascade hydropower systems. MIC is employed to characterize nonlinear associations between environmental drivers and power output, enabling complex dependencies to be identified efficiently without pre-specifying functional forms; in this way, it facilitates feature selection in high-dimensional settings. SHAP, grounded in cooperative game theory, assigns each feature a marginal influence on the final prediction and provides a consistent explanation pathway—from global importance evaluation to local, single-sample interpretation—thus substantially improving model transparency. In this paper, our contributions are as follows:
  • An MIC-guided nonlinear feature-selection strategy is developed to overcome the constraints of conventional linear correlation analysis, thereby constructing a high-quality input feature space for cascade hydropower forecasting.
  • A spatiotemporally coupled LSTNet-based forecasting model is constructed to capture nonlinear coupling relations and dynamic propagation delays among multiple stations in cascade hydropower systems.
  • A two-level, global-to-local interpretability framework based on SHAP is established, significantly improving the reliability and interpretability of deep learning models for cascade hydropower engineering applications.
The paper is structured as follows. Section 2 presents the proposed methodology and its theoretical basis. Section 3 describes the experimental setup and summarizes the results. Section 4 provides discussion, and Section 5 offers concluding remarks.

2. Materials and Methods

2.1. MIC-Based Input Feature Selection

In cascade hydropower systems, the generation capability of each station shows marked differences across space and time. These disparities are mainly attributable to local variations in terrain and geomorphic settings, runoff behavior, and meteorological forcing. Relative to forecasting for a single plant, systems made up of multiple distributed stations face not only more changeable environmental conditions, but also stronger and more explicit hydraulic coupling. Although many environmental indicators are linked to hydropower production, their influence on realized output is far from uniform. For this reason, it is necessary to analyze the associations between environmental factors and generation so as to pinpoint and retain the key input features that act as primary drivers of output fluctuations, as illustrated in Figure 1.
Conventional correlation measures, including the Pearson correlation coefficient [12] and the Spearman rank correlation coefficient [13], have long been used to quantify linear or rank-based monotonic dependence between variables. When combined with regression analysis to construct mathematical expressions, they can represent linear relations and a limited set of basic nonlinear patterns. Still, when data exhibit pronounced nonlinearity together with multi-scale behavior, traditional linear correlation models are often unable to describe the underlying relationships effectively. To capture such intricate nonlinear dependence with greater fidelity, a range of newer correlation metrics has been proposed and studied. Among them, MIC [14], a nonparametric measure derived from mutual-information theory, has received considerable attention because of its distinctive advantages.
MIC, grounded in information theory, adaptively partitions the two-dimensional scatter space of paired observations $(x, y)$ into candidate grids and identifies the maximum normalized mutual information. In doing so, it provides an accurate, model-free quantification of the strength of potentially arbitrary dependence between two variables, without pre-specifying any functional form. Given two variables X and Y, the MIC is defined as:

$$\mathrm{MIC}(X,Y)=\max_{a\cdot b<B}\frac{I(X;Y)}{\log_2\left(\min(a,b)\right)}$$

where a and b denote the numbers of grid rows and columns, respectively, and B is the resolution parameter. The grid resolution is sample-size dependent and is commonly set as $B=n^{0.6}$, with optional adjustment for specific datasets. Here, $I(X;Y)$ denotes the mutual information between X and Y.
$$I(X;Y)=\iint p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy$$

where $p(x,y)$ denotes the joint probability distribution of X and Y. For a dataset with two attributes distributed in a two-dimensional space, if the space is divided into an $a\times b$ grid, then the joint probability $p(x,y)$ and the corresponding marginals $p(x)$ and $p(y)$ can be estimated from the observed frequencies of samples falling into each grid cell.
Building on MIC, this study quantifies the latent association strength between environmental indicators and hydropower generation. The pool of candidate drivers spans a broad set of hydrometeorological and geographic variables, including precipitation, evaporation, relative humidity, dew-point temperature, wind speed, elevation, and air temperature. An MIC-guided nonlinear feature-selection strategy is developed to overcome the constraints of conventional linear correlation analysis, thereby constructing a high-quality input feature space for cascade hydropower forecasting.
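To make the screening step concrete, the following is a minimal Python sketch of the MIC computation described above. It is an illustrative simplification: the helper `mic` is a hypothetical name, equal-width grids are used instead of the optimized dynamic-programming grid search of the original MINE algorithm, and the joint distribution is estimated from grid-cell frequencies as in the equations above.

```python
import numpy as np

def mic(x, y, alpha=0.6):
    """Simplified MIC estimate: search grid resolutions (a, b) with
    a*b <= B = n**alpha and return the maximum normalized mutual
    information. Equal-width bins only; a coarse sketch, not the
    full MINE search."""
    n = len(x)
    B = int(n ** alpha)
    best = 0.0
    for a in range(2, B + 1):
        for b in range(2, B + 1):
            if a * b > B:
                continue
            # Estimate the joint distribution from grid-cell frequencies
            p_xy, _, _ = np.histogram2d(x, y, bins=(a, b))
            p_xy /= n
            p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X
            p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y
            nz = p_xy > 0
            i_xy = np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz]))
            best = max(best, i_xy / np.log2(min(a, b)))
    return best
```

Features whose MIC score against total output falls below a chosen threshold would then be dropped, keeping only the primary drivers in the input space.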

2.2. An LSTNet-Based Cascade Hydropower Output Forecasting Model

2.2.1. CNN Front-End Component

In practical generation-forecasting scenarios for cascade hydropower systems, multiple reservoirs and power stations are hydraulically connected through river reaches and coordinated via joint dispatch, resulting in pronounced spatial coupling. Meanwhile, inflow processes are further modulated by inter-annual and seasonal variability in hydrological conditions as well as by extreme meteorological events, causing the output time-series to exhibit strong nonlinearity, superposition across multiple temporal scales, and complex multi-source forcing. Under such conditions, conventional linear models or single-station statistical models struggle to simultaneously characterize the intricate cross-station dependence structure and the effects of meteorological drivers. Even when multivariate time-series models such as RNNs/LSTMs are adopted, limited historical records and potential regime shifts in inflow dynamics or the occurrence of extreme events can still lead to insufficient learning of long-range dependencies, high sensitivity to the training data distribution, and limited generalization, thereby hindering robust adaptation to the operational characteristics of complex cascade hydropower systems. To address these challenges, we introduce a convolutional neural network (CNN) [15], which exploits local receptive fields and weight sharing to perform convolutions over a two-dimensional input space jointly formed by the temporal and feature dimensions, thereby automatically extracting engineering-relevant local patterns and achieving efficient representation of complex cascade structures with relatively few parameters. As the generic CNN architecture is well established, its standard composition and principles are not reiterated; the basic backbone is illustrated in Figure 2. In this study, the CNN front-end consists of convolutional layers and rectified linear units (ReLU). 
To preserve the original temporal granularity and avoid the loss of fine-scale temporal details, no pooling layer is employed, and the resulting outputs are subsequently fed into the recurrent module and the recurrent skip-connection module.
In this study, cascade hydropower generation forecasting is formulated as a multivariate time-series prediction problem. Let the input vector at time step t be $x_t \in \mathbb{R}^{F}$, where the feature dimension F is jointly constructed from the generation outputs of all cascade hydropower stations and the meteorological factors selected via MIC (precipitation, temperature, etc.). The historical sequence of length T is denoted as $X=[x_1,x_2,\dots,x_T] \in \mathbb{R}^{T\times F}$. The CNN front-end component takes this two-dimensional input of “time step × feature dimension” as its operating object and extracts short-term local patterns by sliding a window along the temporal dimension using one-dimensional convolution. Specifically, it learns local dependencies from the multivariate information within several adjacent time steps. For the k-th convolutional kernel $W^{(k)}\in\mathbb{R}^{s\times F}$ (with temporal width s and height matching the feature dimension F) and bias $b^{(k)}$, the convolution output at position t can be written as:

$$h_t^{(k)}=\sigma\left(b^{(k)}+\sum_{i=0}^{s-1}\sum_{j=1}^{F}W_{i,j}^{(k)}\,x_{t-i,j}\right),\quad t=s,\dots,T$$

where s denotes the temporal kernel length, $\sigma(\cdot)$ denotes the ReLU nonlinear activation function, and $h_t^{(k)}$ constitutes the output sequence of the k-th short-term feature channel along the temporal dimension. Through parallel convolution and activation with multiple kernels, the CNN can, under a limited number of parameters, automatically aggregate the joint variation of the outputs of upstream and downstream stations in the cascade and the key meteorological factors within a short time window, thereby forming a feature sequence that captures both local temporal patterns and multivariate correlation characteristics.
Specifically tailored for cascade scenarios, the convolutional component effectively mines the correlations between the output features of individual stations and the total system output. It maps the MIC-screened multivariate time-series into the time-step × feature space, directly yielding a localized representation. Such a mapping brings to the fore evolutionary motifs between neighboring steps while tightening the mutually constrained links of multi-station outputs with meteorological drivers. With short-horizon local cues and cross-variable correlations distilled, the CNN delivers a sharper, information-rich, suitably sized input to later recurrent modules for capturing long-range dependence. In turn, forecasts of cascade hydropower output gain accuracy and robustness.
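The convolution above can be sketched in a few lines of numpy. This is an illustrative, loop-based rendering of the equation for $h_t^{(k)}$ (function name and argument layout are our own): each kernel of shape (s, F) spans the full feature dimension, slides along time in "valid" mode, and is followed by ReLU with no pooling, so the temporal granularity is preserved.

```python
import numpy as np

def cnn_frontend(X, kernels, biases):
    """1-D convolution along time with kernels spanning the full
    feature dimension, followed by ReLU. X has shape (T, F); each
    kernel has shape (s, F). Returns an array of shape (T - s + 1, K)."""
    T, F = X.shape
    s = kernels[0].shape[0]
    out = np.empty((T - s + 1, len(kernels)))
    for k, (W, b) in enumerate(zip(kernels, biases)):
        for t in range(s - 1, T):
            # window of the s most recent steps; row i of W multiplies x_{t-i}
            out[t - s + 1, k] = np.sum(W * X[t - s + 1 : t + 1][::-1]) + b
    return np.maximum(out, 0.0)  # ReLU; no pooling, fine-scale detail kept
```

Each output column is one short-term feature channel; stacking K kernels gives the multivariate feature sequence passed on to the recurrent modules.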

2.2.2. RNN Recurrent and Skip Components

In cascade hydropower generation forecasting systems, the data typically exhibit large sample volumes, long temporal spans, and high-dimensional feature spaces. If feature extraction relies solely on a single-level CNN, the same set of convolutional kernels must simultaneously capture short-term fluctuations and long-term evolution patterns, leading to an inherent and difficult trade-off in scale balancing. Therefore, this study introduces a recurrent neural network (RNN) [16] after the CNN module to dynamically memorize and propagate historical information via the recursive update of hidden states, thereby enhancing the ability to characterize long-term temporal evolution and serial dependence. The resulting outputs are then fed into a fully connected layer to generate forecasts for future time steps. Figure 3 illustrates the structure of the RNN.
To simultaneously account for continuous temporal dependencies and periodic long-range dependencies, this study adopts a strategy that combines standard recurrent components with Recurrent-Skip components. The mathematical principles are defined as follows:
Let the input time-series be $\{x_t^{(S)}\}_{t=1}^{T}$, the corresponding hidden states be $\{h_t^{(S)}\}_{t=1}^{T}$, the outputs be $\{y_t^{(S)}\}_{t=1}^{T}$, and the skip step size be p. At each time step t, the RNN performs the following computational steps:
The current input $x_t^{(S)}$ and the hidden state $h_{t-p}^{(S)}$ from p time steps earlier are received, and a new hidden state is obtained through a nonlinear transformation:

$$h_t^{(S)}=f\left(W_{xh}^{(S)}x_t^{(S)}+W_{hh}^{(S)}h_{t-p}^{(S)}+b_h^{(S)}\right)$$

where $W_{xh}^{(S)}$ and $W_{hh}^{(S)}$ denote the weight matrices from input to hidden state and from hidden state to hidden state, respectively, $b_h^{(S)}$ denotes the bias term, and $f(\cdot)$ typically denotes an activation function such as tanh or ReLU.
Subsequently, the output is generated based on the current hidden state:
$$y_t^{(S)}=g\left(\sum_{i=1}^{p}W_{hy_i}^{(S)}h_{t-i}^{(S)}+b_y\right)$$

where $W_{hy_i}^{(S)}$ represents the weight matrix mapping the hidden layer to the output layer, $b_y$ is the output bias, and $g(\cdot)$ is typically set to the identity function in time-series forecasting.
When the skip length p is large, this design creates dilated links along the temporal axis, so that long-range dependencies with strong periodic structure (e.g., daily regulation effects and seasonal cycles) can be accessed more directly. The model therefore becomes more responsive to periodic variability; this setting is termed the Recurrent-Skip component. When p = 1 , the model simply forwards the hidden state from the previous time step, allowing information from all earlier inputs to be carried and recursively updated in a continuous manner; this is the conventional recurrent component.
In the LSTNet forecasting model constructed in this work, the RNN module described above functions as the back-end temporal modeling unit and is placed directly after the front-end CNN. Its operating logic can be viewed as a layer-wise decomposition of representation learning: CNN modules are particularly effective at identifying structural cues within short, localized contexts, whereas the RNN component carries these cues forward via recurrent updating along the time axis. Specifically, the standard recurrent unit fuses within-window continuity with trend-related information, while the Recurrent-Skip branch deliberately separates cross-period seasonal regularities—such as wet–dry season transitions and the sustained effects induced by reservoir-level regulation. Thus, when CNN-based extraction of local spatial dependencies is combined with RNN-based modeling of long-horizon temporal evolution, the resulting mechanism remains responsive to fine-scale fluctuations yet achieves stronger predictive capability for longer-term tendencies in complex cascade hydropower systems.
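The recurrent-skip update can be sketched as follows. This is a minimal illustration of the hidden-state recursion $h_t^{(S)} = f(W_{xh}x_t + W_{hh}h_{t-p} + b_h)$ (function and variable names are our own, states before the sequence start are taken as zero): with a large skip length p the state is linked to the state p steps earlier, forming dilated temporal links, while p = 1 recovers the conventional recurrent component.

```python
import numpy as np

def skip_rnn(X, W_xh, W_hh, b_h, p):
    """Recurrent-skip hidden-state recursion: each state depends on the
    state p steps earlier (dilated link); p=1 gives the standard RNN.
    X has shape (T, d_in); returns hidden states of shape (T, d_h)."""
    T = X.shape[0]
    d_h = b_h.shape[0]
    H = np.zeros((T + 1, d_h))  # H[0] is the zero initial state
    for t in range(1, T + 1):
        # state p steps earlier, or zero before the sequence start
        prev = H[t - p] if t - p >= 0 else np.zeros(d_h)
        H[t] = np.tanh(W_xh @ X[t - 1] + W_hh @ prev + b_h)
    return H[1:]
```

With p set to the length of the dominant cycle (e.g., a daily regulation period), periodic long-range dependencies are reached in a single hop instead of through p successive updates.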

2.2.3. Autoregressive Component and Fully Connected Layer

The fully connected (FC) layer is one of the most commonly used fundamental modules in deep neural networks. In this study, the FC layer is employed to aggregate the hidden states output by the recurrent module and the recurrent skip-connection module at the current forecasting time. By uniformly performing a weighted integration of features extracted at different temporal scales and by different structural components, it realizes a nonlinear mapping from the high-dimensional temporal feature space to the target generation-forecasting space and produces the final output. In general, given an input vector $h\in\mathbb{R}^{n}$, the FC layer yields a linear output via an affine transformation:

$$z=Wh+b,\quad z\in\mathbb{R}^{m}$$

where $W\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^{m}$ are trainable parameters, and m denotes the output dimension.
If this layer is a hidden layer or nonlinearity needs to be introduced, an activation function $\phi(\cdot)$ (e.g., ReLU or tanh) is applied element-wise to z to produce the layer output:

$$a=\phi(z)=\phi(Wh+b)$$
In this study, the input to the fully connected (FC) layer consists of the hidden state of the recurrent component at time step t, denoted as $h_t^{R}$, together with the p hidden states of the skip-recurrent component over the interval from $t-p+1$ to $t$, namely $\{h_{t-i}^{S}\}_{i=0}^{p-1}$ [17].

$$h_t^{D}=W^{R}h_t^{R}+\sum_{i=0}^{p-1}W_i^{S}h_{t-i}^{S}+b$$

where $h_t^{D}$ denotes the H-step-ahead prediction vector produced by a single forward pass of the neural network anchored at time step t.
Considering that cascade hydropower generation exhibits pronounced nonstationarity and amplitude-scale drift in real-world operational data, whereas the CNN and RNN components are only moderately sensitive to changes in input signal scale during feature extraction and thus may fail to capture amplitude-level dynamics in a timely manner, the model’s final prediction is decomposed into a linear part and a nonlinear part. The linear component adopts the classical autoregressive (AR) model [18]; Figure 4 illustrates the AR component. Its general representation is formulated as follows:

$$y_t=c+\sum_{i=1}^{q}\phi_i\,y_{t-i}+\varepsilon_t$$

where c is the constant term, q is the model order, $\phi_i$ denotes the AR coefficients, and $\varepsilon_t$ denotes the white-noise disturbance term.
We adopt a direct multi-horizon forecasting strategy, where at each anchor time t the model produces, in a single forward pass, an H-step-ahead prediction vector $\hat{y}_t\in\mathbb{R}^{H}$. Accordingly, the output of the nonlinear branch is denoted by $h_t^{D}\in\mathbb{R}^{H}$, and the output of the AR branch is denoted by $h_t^{L}\in\mathbb{R}^{H}$. The AR branch employs a fixed order q for all samples. Let $y_t^{(q)}=[y_t,y_{t-1},\dots,y_{t-q+1}]\in\mathbb{R}^{q}$ denote the most recent q target values prior to the anchor time t; the AR component then produces an H-dimensional output directly via a linear mapping:

$$h_t^{L}=W^{AR}y_t^{(q)}+b^{AR},\quad W^{AR}\in\mathbb{R}^{H\times q},\; b^{AR}\in\mathbb{R}^{H}$$

Finally, the multi-step prediction vector of LSTNet is obtained by additively combining the nonlinear and linear components:

$$\hat{y}_t=h_t^{D}+h_t^{L}$$
This parallel framework, characterized by nonlinearity-dominant learning with linear correction, enables the deep model to capture inter-station temporal dependence and higher-order dynamics, while the linear pathway compensates for short-term trend variations and amplitude-scale drift. Consequently, it better accommodates non-stationary conditions such as abrupt inflow changes, seasonal transitions, and operational adjustments, thereby mitigating forecast bias and improving predictive stability. Figure 5 illustrates the structure of the LSTNet.
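The output combination can be written out compactly. The sketch below (names of our own choosing; the deep-branch output is supplied as a precomputed vector) shows the AR correction of the equations above: a linear map from the q most recent target values to an H-step vector, added to the nonlinear branch.

```python
import numpy as np

def lstnet_output(h_D, y_recent, W_ar, b_ar):
    """Additive combination of LSTNet's two branches:
    h_D      -- nonlinear (CNN/RNN) branch output, shape (H,)
    y_recent -- q most recent target values [y_t, ..., y_{t-q+1}], shape (q,)
    W_ar     -- AR weight matrix, shape (H, q); b_ar -- bias, shape (H,)
    Returns the H-step forecast h_D + h_L."""
    h_L = W_ar @ y_recent + b_ar   # linear, scale-sensitive correction
    return h_D + h_L
```

Because the AR term is linear in the raw target values, it tracks amplitude-scale drift directly, while the deep branch remains responsible for the nonlinear spatiotemporal structure.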

2.2.4. Interpretability Method Based on SHAP

In cascade hydropower output forecasting, complex hydraulic and operational coupling exists between upstream and downstream stations, and the magnitude as well as the mechanisms by which individual stations affect the overall system output can differ substantially. In practical applications, providing only accurate point predictions is often insufficient to support risk-aware safety assessment and operational strategy optimization. It is therefore necessary to explain which stations the model relies on and how such information is leveraged for prediction, and to further evaluate the reliability, applicability, and physical interpretability of the developed model. Accordingly, in this study, model interpretability tools are introduced to diagnose the model’s internal decision logic and to make the inference process more transparent [19,20].
Nevertheless, deep learning approaches—such as LSTNet—are frequently viewed as black boxes because their complex nonlinear structure makes the decision process difficult to interpret. Their internal predictive logic is not readily interpretable in an intuitive way, which to a large extent hinders broad acceptance in engineering practice. To address this issue, the present paper introduces interpretable learning techniques so as to uncover the underlying rationale of the hydropower output forecasting model.
At present, model interpretability approaches can be broadly categorized into global and local interpretability. Global interpretability aims to characterize the overall behavioral patterns of a model across the entire dataset, for example, via statistical analyses of feature importance [21]. In contrast, local interpretability focuses on explaining the prediction for a specific sample or a given time step by analyzing how the model derives the corresponding output from particular inputs. Representative model-agnostic local explanation methods include local interpretable model-agnostic explanations (LIME). LIME approximates a complex model by a local linear surrogate within a neighborhood of interest, which is attractive due to its simplicity; however, the approximation error may become substantial under strongly nonlinear conditions. By comparison, SHAP is grounded in cooperative game theory and quantifies feature importance by computing marginal contributions (Shapley values) to the model output, thereby supporting both local and global explanations [22,23]. Because SHAP can provide more fine-grained explanations than LIME and is well suited for analyzing complex dependence structures among multi-dimensional features, it is considered more advantageous for the cascade hydropower forecasting task in this study [24].
In the setting of this paper, let $L(\cdot)$ denote a trained LSTNet model. The goal is to construct an explanatory model $g(\cdot)$ that approximates the original predictor. In Ref. [25], this idea is expressed as:
$$ g(z') \approx L\left(h_x^{-1}(z')\right) $$
where $z'$ denotes the simplified input, which is obtained via the invertible mapping $z' = h_x(x)$.
In practical applications, this mapping transforms the original input into a 0/1 indicator vector that specifies whether each feature is included. When an entry equals 0, the corresponding feature is treated as absent. In this work, SHAP is adopted to build an additive explanation model, where the prediction is decomposed into the sum of feature-wise contribution values. Accordingly, the explanation can be formulated as a linear function of these binary indicators. In Ref. [25], this idea is expressed as:
$$ g(z') = \Phi_0 + \sum_{i=1}^{M} \Phi_i z'_i $$
where $M$ denotes the number of feature types, $z'_i \in \{0,1\}$, and $\Phi_i$ represents the contribution of the $i$-th feature type to the model output. The approach attributes an effect $\Phi_i$ to each feature and aggregates these effects to approximate the original model output.
An ideal explanation model should satisfy three properties, namely local accuracy, missingness, and consistency. Local accuracy requires that the sum of attribution scores agrees with the model output for the given instance. Missingness requires that unused (absent) features have zero attribution. Consistency requires that, if a feature’s marginal contribution increases in the model, its attribution should not decrease. Under these properties, the value of $\Phi_i$ can be uniquely characterized. In Refs. [25,26], these properties are successively described as:
$$ L\left(h_x^{-1}(z')\right) = g(z') = \Phi_0 + \sum_{i=1}^{M} \Phi_i z'_i $$

$$ z'_i = 0 \;\Rightarrow\; \Phi_i = 0 $$

$$ L'_x(z') - L'_x(z' \setminus i) \;\ge\; L_x(z') - L_x(z' \setminus i) \;\; \forall z' \;\Longrightarrow\; \Phi_i(L', X) \ge \Phi_i(L, X) $$
where $L'$ denotes the surrogate (modified) model, $L_x(z') = L(h_x^{-1}(z'))$, and $z' \setminus i$ indicates the feature vector with $z'_i$ set to 0.
Ref. [27] proves that, for a given simplified input mapping, there exists a unique additive feature-attribution scheme satisfying the above three properties; this unique set of attributions is given by the Shapley values. This result follows from cooperative game theory, in which $\Phi_i$ is referred to as the Shapley value [28]. By examining the magnitude of $\Phi_i$, the contribution of each hydropower station to the overall system can be quantified.
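As a minimal numerical illustration of this definition, the sketch below enumerates all feature subsets to compute exact Shapley values for a hypothetical two-feature model (the model function and baseline values are invented for illustration; absent features are replaced by baseline values, a common approximation) and verifies the local-accuracy property:

```python
from itertools import combinations
from math import factorial

def model(x):
    # toy stand-in for the trained predictor L(.): nonlinear in two inputs
    return x[0] + 2 * x[1] + x[0] * x[1]

def shapley_values(f, x, baseline):
    """Exact Shapley values by subset enumeration (feasible only for small M)."""
    M = len(x)
    def f_subset(S):
        # features outside S are "absent" and replaced by their baseline value
        z = [x[i] if i in S else baseline[i] for i in range(M)]
        return f(z)
    phi = []
    for i in range(M):
        others = [j for j in range(M) if j != i]
        val = 0.0
        for r in range(M):
            for S in combinations(others, r):
                # classical Shapley weight |S|! (M-|S|-1)! / M!
                w = factorial(len(S)) * factorial(M - len(S) - 1) / factorial(M)
                val += w * (f_subset(set(S) | {i}) - f_subset(set(S)))
        phi.append(val)
    return phi

x, base = [3.0, 1.0], [0.0, 0.0]
phi = shapley_values(model, x, base)
# local accuracy: base value plus the sum of attributions equals the output
assert abs(model(base) + sum(phi) - model(x)) < 1e-9
```

In practice, SHAP libraries approximate these values efficiently rather than enumerating subsets.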
To balance computational cost and representativeness, SHAP attributions are computed using subsets rather than the full dataset. For GradientExplainer, the background distribution is constructed by randomly sampling 50 continuous windows of length T from the training set, while the explained samples are obtained by randomly drawing (without replacement) up to 1000 windows from the test set with a fixed random seed. Sampling is performed at the window level: the temporal order within each window is preserved and the continuous segments are not shuffled. Because SHAP is computed independently for each window, the relative ordering across windows does not affect the attribution results; thus, random window selection does not introduce an issue of “temporal order disruption”.
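The window-level sampling scheme described above can be sketched as follows (the window length `T = 24` and the integer stand-in series are illustrative assumptions; in practice each window is a multivariate array passed to `shap.GradientExplainer`):

```python
import numpy as np

T = 24                                # window length (assumed)
rng = np.random.default_rng(42)       # fixed random seed, as in the text

train = np.arange(10000)              # stand-in for the training series
test = np.arange(3000)                # stand-in for the test series

# background distribution: 50 continuous windows drawn from the training set
starts_bg = rng.choice(len(train) - T + 1, size=50, replace=False)
background = np.stack([train[s:s + T] for s in starts_bg])

# explained samples: up to 1000 windows drawn from the test set without replacement
n_expl = min(1000, len(test) - T + 1)
starts_ex = rng.choice(len(test) - T + 1, size=n_expl, replace=False)
explained = np.stack([test[s:s + T] for s in starts_ex])

# sampling is at the window level: order inside each window is preserved
assert np.all(np.diff(background, axis=1) > 0)
```

Only the window start indices are randomized; the continuous segment inside each window is never shuffled, which matches the statement that no temporal order disruption is introduced.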
In addition, the MIC introduced above can also be employed to quantify, from the perspective of statistical dependence, the association strength between each hydropower station and the total system output, which constitutes a conventional interpretability pathway. In the following, the MIC-based feature-selection results are systematically compared with the SHAP attribution outcomes to elucidate their similarities and differences, thereby providing a comprehensive validation of the proposed method in revealing the intrinsic operational mechanisms of cascade hydropower systems.

3. Results

3.1. Study Area and Cascade Configuration

3.1.1. Study Basin and Hydrographic Setting

The study area is located in the Mudan River basin in Northeast China, which is an important sub-basin of the Songhua River system. The investigated reach is situated in the upper Dunhua section of the Mudan River main stem. The basin has a cold-temperate continental monsoon climate, with precipitation concentrated in June–September; floods are predominantly triggered by heavy rainfall and occur mainly in July–August. In addition to the propagation of main-stem inflow signals through the cascade system, reach-scale inflow is jointly affected by tributary inputs and engineered water transfers, leading to spatially heterogeneous water availability and dynamic responses among stations. The basin/hydrological characteristics, main hydrographic network, and catchment area information used in this study are summarized in Table 1, Table 2 and Table 3.

3.1.2. Cascade Layout and Hydropower Stations

The studied hydropower cascade is developed in series along the Mudan River main stem. From upstream to downstream, the stations are A, B, C, D, E, F, and G. The system is predominantly run-of-river or weakly regulated, while D exhibits limited regulation capability. In addition to the stepwise propagation of main-stem inflow signals through the cascade, Shanggou is influenced by an inter-basin diversion from the Sha River, and reach-scale inflow to the Hongshi segment includes tributary contributions such as the Huangni River. These factors jointly affect downstream reservoir inflow conditions and generation responses. The main station characteristics used in this study are summarized in Table 4 and the schematic diagram of the cascade layout is shown in Figure 6.

3.2. Data Sources and Experimental Setup

This study investigates an operational cascade hydropower system in the Songhua River Basin, Jilin Province, Northeast China, and uses nearly three years of measured operating records for validation. The dataset consists of hourly hydropower output time-series and contemporaneous meteorological driving factors. To alleviate scale heterogeneity across multi-source variables and improve model convergence and training stability, all input variables are standardized and normalized. The hydropower output data were automatically collected by the monitoring system as AC-side active power (MW). Variability analysis indicates that the system-level aggregate output (tot) exhibits pronounced hourly dynamics and is not dominated by long constant-output regimes: tot varies approximately within 10–55 MW with a standard deviation of about 13.41 MW. Using the hour-to-hour difference Δtot between consecutive hours as the metric, the maximum length of a consecutive near-constant period is 9 h under the criterion |Δtot| ≤ 0.01 MW. The dataset was partitioned into training, validation, and test subsets using a 7:2:1 split. Moreover, to reduce the disturbance introduced by redundant information, a nonlinear feature-screening module based on MIC was placed at the input stage. Only those variables showing strong coupling with total system power output were preserved as effective inputs.
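A leakage-free version of the 7:2:1 chronological split and standardization can be sketched as below (the array shapes and values are placeholders; scaler parameters are fit on the training split only, which is the standard precaution for time-series data, though the paper does not spell this detail out):

```python
import numpy as np

def chronological_split(X, ratios=(0.7, 0.2, 0.1)):
    """Split a time series chronologically (no shuffling) into train/val/test."""
    n = len(X)
    n_tr = int(n * ratios[0])
    n_va = int(n * ratios[1])
    return X[:n_tr], X[n_tr:n_tr + n_va], X[n_tr + n_va:]

def standardize(train, *others):
    """Fit z-score parameters on the training split only, to avoid leakage."""
    mu, sigma = train.mean(axis=0), train.std(axis=0) + 1e-8
    scale = lambda a: (a - mu) / sigma
    return (scale(train),) + tuple(scale(a) for a in others)

# placeholder data: 1000 hourly samples, 8 input variables
X = np.random.default_rng(0).normal(30.0, 13.4, size=(1000, 8))
tr, va, te = chronological_split(X)
tr_s, va_s, te_s = standardize(tr, va, te)
assert len(tr) == 700 and len(va) == 200 and len(te) == 100
```

Keeping the split chronological preserves the temporal structure that the forecasting task depends on.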
The hyperparameter settings and training procedure of the LSTNet model are given below. Adaptive Moment Estimation (Adam) was used for parameter updates. The initial learning rate was set to 0.001, with a batch size of 64 and a maximum of 200 training epochs. To enhance generalization and reduce overfitting, L2 regularization and Dropout (rate 0.3) were embedded in the network. In addition, a validation-loss-driven learning-rate decay schedule and an early-stopping criterion were applied to further reinforce training robustness.
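The validation-loss-driven learning-rate decay and early stopping can be sketched as follows (the patience values and decay factor are assumptions for illustration, not values reported by the authors; only the initial learning rate of 0.001 is from the text):

```python
def train_schedule(val_losses, lr0=0.001, patience=10, decay=0.5, decay_patience=5):
    """Sketch of validation-loss-driven LR decay plus early stopping.
    Returns (stop_epoch, final_lr)."""
    best, best_epoch, lr, since_improve = float("inf"), 0, lr0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - 1e-6:
            best, best_epoch, since_improve = loss, epoch, 0
        else:
            since_improve += 1
            if since_improve >= decay_patience:
                lr *= decay            # halve the LR after a plateau
                since_improve = 0
        if epoch - best_epoch >= patience:
            return epoch, lr           # early stop: no improvement for `patience` epochs
    return len(val_losses) - 1, lr

# toy validation-loss curve: improves for four epochs, then plateaus
losses = [1.0, 0.8, 0.7, 0.65] + [0.66] * 20
stop, lr = train_schedule(losses)
assert stop < len(losses) - 1 and lr < 0.001
```

In a framework such as PyTorch the same behavior is typically obtained with `ReduceLROnPlateau` plus a separate early-stopping callback.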

Evaluation Metrics

To ensure a thorough and impartial evaluation of predictive capability, this study adopts three standard metrics widely used in hydropower forecasting research: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). MAE measures the mean absolute difference between predicted and observed values. Because it shares the same physical unit as the target variable, MAE is easy to interpret in engineering terms and provides a direct indication of the absolute error scale. RMSE reflects the spread of prediction errors; by accumulating squared deviations, it imposes a heavier penalty on larger misses, and therefore better reveals robustness and stability when confronted with extreme or abnormal fluctuations. MAPE, in turn, assesses accuracy on a relative basis by computing the mean percentage deviation of the forecasts from the observed values. Since it reduces the influence of magnitude, it enables consistent comparisons of relative forecasting accuracy across different scales.
$$ \mathrm{MAE} = \frac{1}{N_h} \sum_{h=1}^{N_h} \left| y_h - \hat{Y}_h \right| $$

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{N_h} \sum_{h=1}^{N_h} \left( y_h - \hat{Y}_h \right)^2 } $$

$$ \mathrm{MAPE} = \frac{1}{N_h} \sum_{h=1}^{N_h} \left| \frac{y_h - \hat{Y}_h}{y_h} \right| \times 100\% $$

where $N_h$ denotes the number of hours in the test set, $y_h$ denotes the ground-truth value, and $\hat{Y}_h$ denotes the predicted value.
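The three metrics translate directly into numpy (the sample values below are arbitrary and serve only to exercise the formulas):

```python
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))          # same unit as the target (MW)

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))  # penalizes large misses more

def mape(y, yhat):
    return np.mean(np.abs((y - yhat) / y)) * 100.0  # scale-free, in percent

y    = np.array([10.0, 20.0, 40.0])
yhat = np.array([12.0, 18.0, 40.0])
# errors are 2, -2, 0: MAE = 4/3, RMSE = sqrt(8/3), MAPE = 10%
assert abs(mae(y, yhat) - 4.0 / 3.0) < 1e-12
assert abs(rmse(y, yhat) - np.sqrt(8.0 / 3.0)) < 1e-12
assert abs(mape(y, yhat) - 10.0) < 1e-12
```

Note that MAPE is undefined when any observed value is zero; the 10–55 MW range of the target variable avoids this issue here.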

3.3. MIC-Based Feature Screening

In a high-dimensional nonlinear setting, we employ the MIC to screen meteorological input features. To quantitatively justify the use of MIC for feature screening, Pearson’s linear correlation coefficient and Spearman’s rank correlation coefficient are further computed under the same feature-construction and sample-alignment strategy as MIC. The comparison results are presented in Figure 7 and Table 5.
Figure 7 presents a bar-chart comparison across meteorological features, while Table 5 reports the corresponding values of the three coefficients. The results indicate that most meteorological factors yield consistent or near-consistent conclusions under the three metrics. When the relationship between a given variable and hydropower generation is approximately linear or monotonic, MIC and correlation-based measures exhibit comparable magnitudes and broadly similar rankings; consequently, MIC does not alter the screening outcome implied by conventional correlation analysis. For example, for total precipitation, MIC = 0.694, r = 0.625, and ρ = 0.629; for specific humidity, MIC = 0.538, r = 0.527, and ρ = 0.498.
Meanwhile, several variables show markedly higher MIC values than correlation coefficients, suggesting that relying solely on linear or monotonic correlation may underestimate the dependence between certain meteorological drivers and hydropower generation. For instance, for air temperature, MIC = 0.565 whereas r = 0.128 and ρ = 0.348; for dew-point temperature and wet-bulb temperature, MIC ≈ 0.517 while r ≈ 0.235 and ρ ≈ 0.375. These discrepancies imply that the relationship between some meteorological factors and generation may involve nonlinear responses or conditional dependence modulated by other variables, under which Pearson’s correlation can be substantially attenuated and Spearman’s rank correlation, although improved, may still fail to fully capture the dependence strength. In contrast, MIC, as a more general nonparametric measure of dependence, can more robustly identify potentially relevant drivers and is therefore better suited to the feature-screening task in this study.
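The attenuation of correlation-based measures under non-monotonic dependence can be reproduced with a minimal synthetic example (MIC itself requires an external implementation such as `minepy` and is therefore not computed here; the quadratic relation below is illustrative, not the paper's data):

```python
import numpy as np

def pearson(x, y):
    return np.corrcoef(x, y)[0, 1]

def spearman(x, y):
    # rank transform (ties broken by position), then Pearson on the ranks
    return pearson(np.argsort(np.argsort(x)), np.argsort(np.argsort(y)))

x = np.linspace(-1.0, 1.0, 2001)
y = x ** 2   # y is a deterministic function of x, yet non-monotonic

# both linear and rank correlation nearly vanish despite perfect dependence
assert abs(pearson(x, y)) < 0.05
assert abs(spearman(x, y)) < 0.05
```

A dependence measure such as MIC would score this relation highly, which is exactly the gap between the air-temperature MIC and its near-zero Pearson coefficient described above.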

3.4. Prediction Performance of LSTNet

In this work, the aggregate power output variable tot is used as the forecasting target for 24 h ahead prediction. To examine the effectiveness of LSTNet, three widely adopted reference models—RNN, GRU, and Informer—are introduced for side-by-side comparison. Meanwhile, generation records from Hydropower Station A and Hydropower Station D are further included as auxiliary targets, enabling performance checks under different spatial settings. The corresponding outcomes are reported in Table 6.
Table 6 compares the forecasting accuracy of several models for both the cascade-system total output (tot) and the single-station outputs (A and D). Under the evaluated settings, LSTNet shows a distinct overall advantage, yielding substantially lower errors in most cases while remaining highly competitive when the task becomes more challenging.
  • LSTNet achieves the lowest errors among all approaches, with MAE = 1.0756 MW, RMSE = 1.1391 MW, and MAPE = 3.5559%. Compared with the RNN baseline, these metrics drop by 87.05%, 87.88%, and 73.94%, respectively. Relative to GRU, the reductions are 77.73%, 83.54%, and 57.41%; relative to Informer, the gains reach 79.42%, 85.34%, and 63.92%. In other words, it is LSTNet, not the recurrent baselines or Informer, that most effectively constrains both absolute and relative errors for the system-level target.
  • Even at the station scale, LSTNet’s advantage remains apparent. At the upstream Station A, it reports MAE = 0.3431 MW, RMSE = 1.7919 MW, and MAPE = 13.8712%. At the midstream Station D, the corresponding values are MAE = 1.4671 MW, RMSE = 1.8704 MW, and MAPE = 11.1815%. Considered jointly, these results substantiate the robustness and transferability of LSTNet across spatially heterogeneous targets within the cascade system.
Overall, LSTNet exhibits stable performance in both the aggregated total-output forecasting task and the single-station forecasting tasks with greater spatial heterogeneity; nevertheless, relatively large errors may still arise under a small number of operating conditions. For example, although LSTNet attains a lower MAE at Station A, its RMSE is comparatively higher. This apparent discrepancy can be attributed to the different sensitivities of MAE and RMSE to the tail of the error distribution: MAE is more robust to a small number of extreme residuals, whereas RMSE disproportionately magnifies such extremes through squared penalization. The subsequent error-distribution diagnostics, including the empirical cumulative distribution function (ECDF) and the complementary cumulative distribution function (CCDF) on a logarithmic scale, consistently suggest a heavier tail for LSTNet, thereby explaining why a small fraction of extreme-error instances can inflate RMSE (see Figure 8 and Figure 9).
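The differing tail sensitivity of MAE and RMSE invoked above can be made concrete with two synthetic error sets (illustrative numbers, not the model residuals): both have the same MAE, but concentrating the error mass in a single extreme residual inflates RMSE dramatically.

```python
import numpy as np

# two error sets with identical total absolute error but different tails
errs_even = np.full(100, 1.0)                          # every miss is 1 MW
errs_tail = np.concatenate([np.zeros(99), [100.0]])    # one extreme 100 MW miss

mae_  = lambda e: np.mean(np.abs(e))
rmse_ = lambda e: np.sqrt(np.mean(e ** 2))

assert mae_(errs_even) == mae_(errs_tail) == 1.0       # identical MAE
assert rmse_(errs_even) == 1.0 and rmse_(errs_tail) == 10.0  # RMSE inflated 10x
```

This is the mechanism behind Station A's low MAE paired with a comparatively high RMSE: a small fraction of extreme residuals dominates the squared-error average.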
To further validate the robustness of the performance differences summarized in Table 6, we examine the horizon-wise MAE behavior together with its uncertainty. All models adopt the same multi-step forecasting strategy. As shown in Figure 10 and Figure 11, LSTNet yields consistently positive MAE improvements over Informer across the full 24-h horizon, with many lead times remaining significant after multiple-comparison control, indicating a systematic advantage rather than an effect confined to a few horizons. In comparison with GRU, the improvements are more lead-time dependent (Figure 12 and Figure 13): LSTNet performs better for most horizons and is significant at several lead times, while the gap narrows around a subset of mid-range horizons where the uncertainty bands approach zero, suggesting comparable accuracy in those cases. Overall, these horizon-resolved results provide complementary evidence that the superiority reported in Table 6 is broadly consistent across lead times and not driven by a small number of atypical cases.
Figure 14 further illustrates predicted total output against observations on randomly selected test dates. The trajectories produced by RNN and GRU display evident lag and mismatch around peaks and troughs; by contrast, LSTNet follows the observed series (Real) more closely and captures the intra-day fluctuation structure with greater fidelity. The locally zoomed-in segments strengthen this observation: near extremes, LSTNet responds more accurately, highlighting its stronger ability to model nonlinear dynamics and rapid output variations.

3.5. Correlation Analysis of Hydropower Station Outputs

Figure 15 presents the MIC bar chart linking the generation features of individual stations to the cascade total output (tot). Because MIC quantifies nonlinear dependence in a rigorous, numerical form, several patterns emerge. The upstream Station A exhibits the tightest coupling with the system total, reaching an MIC of 0.65, which signals a strong statistical dependence on overall cascade generation. Stations C and B follow, with MIC values of 0.42 and 0.40, respectively, reflecting a moderate association with tot. By comparison, Stations D, E, F, and G are only weakly related to the total output, with MIC values of 0.15, 0.29, 0.24, and 0.16, respectively. Notably, Station D records the lowest MIC (0.15) among all candidate inputs. Viewed pairwise, this indicates that Station D has an exceptionally weak direct statistical correspondence with the cascade total output.

3.6. Model Interpretability Results

3.6.1. Global Feature Importance Ranking Based on SHAP

Figure 16 summarizes the global feature-importance order derived from mean absolute SHAP values. Here, bar length reflects a feature’s average marginal impact on the model output, providing an intuitive indication of how heavily each station is emphasized when predicting cascade total generation.
Among all inputs, the first-stage upstream station (Feature A) shows an unequivocally dominant influence: its mean SHAP value reaches +2.58, exceeding all other variables by a wide margin. Station D emerges as the second most influential factor, with a SHAP value of +1.03. Although this is still far lower than Station A, it remains noticeably larger than the contributions of the other downstream stations, suggesting that Station D serves as a key secondary driver within the model’s decision mechanism. By contrast, the remaining stations—B, C, E, and G—each have SHAP values below 0.3. This implies that, under the joint forecasting configuration for the cascade system, these stations offer relatively little marginal information when treated as stand-alone predictors, and their individual effect on the total-output estimate is therefore limited.
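The global ordering in Figure 16 corresponds to sorting features by their mean absolute SHAP value; a sketch with a hypothetical attribution matrix (the per-station scales loosely mimic the reported magnitudes but are otherwise invented):

```python
import numpy as np

def global_importance(shap_values, names):
    """Rank features by mean |SHAP| over all explained samples."""
    mean_abs = np.abs(shap_values).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]
    return [(names[i], float(mean_abs[i])) for i in order]

rng = np.random.default_rng(1)
names = ["A", "B", "C", "D", "E", "F", "G"]
# hypothetical attribution matrix: 500 samples x 7 stations,
# with per-station spreads echoing the dominance of A and D
spread = [2.58, 0.20, 0.25, 1.03, 0.15, 0.30, 0.10]
sv = rng.normal(0.0, spread, size=(500, 7))

ranking = global_importance(sv, names)
assert ranking[0][0] == "A" and ranking[1][0] == "D"
```

Averaging absolute values rather than signed values is what makes this a magnitude-of-influence ranking, independent of whether a station pushes predictions up or down.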

3.6.2. SHAP Dependence Analysis

Unlike global importance rankings, which capture only average effects, SHAP dependence plots provide a finer, sample-level perspective. By preserving the nonlinear mapping between feature values and their SHAP attributions, these plots can reveal local structural patterns that global aggregation tends to obscure, thereby shedding light on the model’s more nuanced response behavior.
Figure 17 illustrates the nonlinear association between Feature D (the output of Station D) and its SHAP values. The influence of Feature D displays a distinct threshold-like pattern, with 7.8 MW serving as a critical inflection point. When Station D’s output remains below 7.8 MW, SHAP values concentrate on the negative side (approximately −2.0 to 0.0), implying that in this interval Feature D predominantly pushes the predicted cascade total downward. Once the output exceeds 7.8 MW, however, the SHAP values increase sharply in a strongly nonlinear manner, rapidly turning positive and rising above +1.0. This segmented response suggests that Feature D’s marginal contribution to the cascade total is inherently regime-dependent and nonlinear, rather than being governed by a simple linear relationship.
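The threshold-like pattern can be summarized numerically by comparing mean attributions on either side of the 7.8 MW break; the data below are synthetic and merely mimic the reported shape, not the paper's attribution values:

```python
import numpy as np

def regime_means(feature_vals, shap_vals, threshold=7.8):
    """Mean SHAP contribution below vs above a candidate threshold (MW)."""
    below = shap_vals[feature_vals < threshold].mean()
    above = shap_vals[feature_vals >= threshold].mean()
    return below, above

# synthetic dependence mimicking Figure 17: negative push below 7.8 MW,
# strongly positive contribution above it, plus attribution noise
rng = np.random.default_rng(7)
d_out = rng.uniform(0.0, 15.0, 2000)
shap_d = np.where(d_out < 7.8, -1.0, 1.5) + rng.normal(0.0, 0.2, 2000)

lo, hi = regime_means(d_out, shap_d)
assert lo < 0 < hi   # sign flip across the threshold, as in the dependence plot
```

Scanning candidate thresholds for the largest regime contrast is one simple way such an inflection point can be located from a SHAP dependence plot.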

3.6.3. Local SHAP Explanation for a Single Sample

To investigate the model’s microscopic decision-making mechanism at the individual sample level, SHAP decision plots are employed to provide a precise attribution of the prediction outcome. As illustrated in Figure 18, the decision plot visually depicts the complete path through which the model’s predicted value evolves from the base value (i.e., the mean prediction over the training set, indicated by the gray vertical line) to the final model output, via the cumulative aggregation of marginal feature contributions. In the plot, the polyline extends from bottom to top, where each turning point corresponds to the inclusion of a specific feature. At each feature node, the sign and extent of the horizontal shift quantitatively indicate whether the feature promotes or suppresses the prediction, as well as the magnitude of its contribution to the final output. By examining the evolution of the decision path, the key driving factors governing the prediction at the current time step can be clearly identified.
Contrasting this single-sample decision path with the global ordering in Figure 16 makes the extent of local heterogeneity immediately evident. To begin with, Feature D emerges as the strongest positive contributor at this time point: it yields the largest rightward shift, indicating that under the current hydro-meteorological boundary conditions, Station D’s output is the main factor elevating the predicted cascade total. At the same step, the upstream Station A triggers a marked negative correction. Although Feature A is globally the most influential variable, it does not retain that dominance locally; once incorporated, it produces a clear leftward displacement, reflecting a dampening effect on the final estimate. As for the remaining stations (e.g., E, B, and G), they serve primarily as secondary adjusters, making only modest refinements: Feature E contributes a slight rightward increment, while Features B and G contribute slight leftward offsets.
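The decision plot is, at bottom, a cumulative sum from the base value to the final prediction; a sketch with hypothetical single-sample attributions (all numbers invented for illustration):

```python
import numpy as np

def decision_path(base_value, contributions):
    """Cumulative trajectory from the base value to the final prediction,
    as traced bottom-to-top by a SHAP decision plot."""
    return base_value + np.cumsum(contributions)

# hypothetical attributions (MW) in plotting order; signs echo the local
# pattern described in the text: D strongly positive, A negative
base = 32.0
contribs = {"G": -0.1, "B": -0.2, "E": 0.3, "A": -1.2, "D": 2.4}
path = decision_path(base, np.array(list(contribs.values())))

# by local accuracy, the path's endpoint is the model's final prediction
assert abs(path[-1] - (base + sum(contribs.values()))) < 1e-9
```

Each intermediate point of `path` is one turning point of the polyline; the horizontal jump at a node is exactly that feature's attribution.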

4. Discussion

4.1. Mechanistic Analysis of the Superiority of the LSTNet Model

Table 6 and Figure 14 indicate that LSTNet outperforms the benchmark models—RNN, GRU, and Informer—on every reported metric (RMSE, MAE, MAPE), and it does so with striking consistency. This advantage is not accidental; rather, it underscores how effectively LSTNet’s hybrid architecture aligns with the spatiotemporal dynamics that characterize cascade hydropower operations across the full set of test scenarios.
Within LSTNet, the built-in CNN plays a central role by capturing local signatures that are crucial for short-horizon forecasting. In cascade hydropower systems, particularly when total regulating storage is constrained, generation can respond abruptly to hydro-meteorological forcing, leading to pronounced moment-to-moment variability and an output profile that is distinctly stochastic. Many RNN-family baselines, when asked to digest long histories, implicitly average away these rapid variations, and thus can miss sudden yet information-rich shifts even when those jumps carry cues for near-term forecasting. In contrast, the convolutional layers in LSTNet apply sliding windows along the temporal dimension, enabling the model to attend to local subsequences within the input matrix and thereby extract short-term coupling patterns between station outputs and MIC-selected environmental variables. Consequently, LSTNet can promptly capture the immediate impacts of environmental factors (e.g., sudden heavy rainfall) or rapid upstream output variations on the total system generation, significantly enhancing its responsiveness to short-term dynamics and improving prediction accuracy.
Next, compared with the benchmark models, the recurrent and Recurrent-Skip components of LSTNet substantially strengthen the model’s capacity to learn temporal dependencies at both short and long horizons. Cascade hydropower systems are inherently characterized by flow propagation delays: changes in upstream discharge propagate downstream through river channels and affect downstream generation only after a certain time lag. When confronted with long input sequences, conventional RNN and GRU architectures often suffer from vanishing gradients, so key signals embedded in distant history can fade during training. LSTNet addresses this weakness through its Recurrent-Skip mechanism: by inserting temporal skip connections, the model can directly learn long-horizon periodicity and persistent time dependencies. As a result, it describes more faithfully the lagged propagation from upstream inflow variations to downstream power output, thereby offsetting the limited “memory” inherent to a single recurrent pathway.
In essence, LSTNet unifies short- and long-range temporal learning in a tightly coupled manner. The CNN module focuses on extracting fine-grained local oscillations, whereas the recurrent backbone, augmented by skip links, tracks the slower, longer-term evolution of the system. Because these parts complement rather than duplicate each other, LSTNet avoids the structural bottlenecks of single-paradigm models and consequently produces forecasts of cascade hydropower total generation that are not only more accurate, but also more robust and stable.
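The Recurrent-Skip idea can be illustrated by the index selection it performs: hidden states spaced one period apart (here a 24 h period is assumed, matching the hourly data) are gathered so that day-scale dependencies reach the output through short gradient paths. This is a schematic sketch of the indexing only, not the authors' implementation:

```python
import numpy as np

def skip_states(hidden, p=24):
    """Recurrent-skip selection: gather hidden states spaced p steps apart,
    ending at the most recent step, so periodic history feeds the output
    directly instead of through hundreds of recurrent transitions."""
    T = hidden.shape[0]
    idx = np.arange(T - 1, -1, -p)[::-1]   # ..., t-2p, t-p, t
    return hidden[idx]

# one week of hourly hidden states (feature dimension 1 for clarity)
h = np.arange(168).reshape(168, 1)
sel = skip_states(h, p=24)
assert sel.shape[0] == 7                   # one state per daily period
assert (np.diff(sel[:, 0]) == 24).all()    # exactly p steps apart
```

In the full model these selected states are processed by a dedicated recurrent layer whose output is combined with the main recurrent pathway, which is what shortens the path from lagged upstream inflow signals to the downstream output estimate.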

4.2. Analysis of the Low-Correlation–High-Importance Paradox

4.2.1. Theoretical Analysis

The global SHAP importance ranking in Figure 16 indicates that the marginal contribution of hydropower station D is second only to that of the first-stage station A, ranking second in the entire system, which suggests that D plays a core driving role in the predictive rationale of the LSTNet model; however, when the same stations are examined from the MIC-based statistical perspective shown in Figure 15, a pronounced cognitive divergence is observed.
According to the MIC bar chart in Figure 15, the first-stage station A exhibits an MIC value as high as 0.65 with respect to the total output, confirming its role as the dominant system variable. By contrast, station D has an MIC value of only 0.15, ranking last among all input variables. Under classical feature-selection theory, station D would typically be regarded as a redundant feature or a noise variable, and its marginal influence on the total output would be expected to be negligible.
The pronounced divergence of Station D between the MIC statistical score (ranked last) and the SHAP-based model attribution (ranked second) reveals the low-correlation–high-importance paradox, which profoundly highlights the essential mechanistic divide between bivariate statistical metrics and high-dimensional model attribution. MIC is essentially an indicator that measures the pairwise dependence strength between a single feature and the target variable; its limitation lies in the difficulty of capturing complex interaction effects when multiple variables coexist. From an isolated statistical perspective, because Station D has a certain degree of regulation capability, its output is significantly influenced by dispatch strategies and reservoir-storage operation, thereby weakening its instantaneous synchrony with the cascade total output and resulting in a relatively low coupling strength in correlation-based measurements. However, within the high-dimensional nonlinear feature space constructed by LSTNet, Station D does not exist in isolation; rather, it forms a strongly coupled feature combination with variables such as the first-stage station A, namely a complementary mechanism in which Station A establishes the upstream flow baseline and Station D performs local correction. Specifically, at the beginning of the cascade, the first-stage station A’s output series effectively establishes the initial hydraulic boundary of the entire power-generation system. Meanwhile, Station D acts as a key relay node in the physical topology: it not only receives the lagged releases from upstream Station A, but also aggregates tributary inflow information from the intermediate reach. In this reach, the flow regime is further influenced by an inter-basin diversion of Sha River water and by tributary inflows such as the Huangni River, which contribute to reach-scale increments superimposed on the propagated upstream signal. 
Therefore, Station D’s output data effectively quantify the reach increment generated when the initial signal from Station A traverses the midstream reach (i.e., tributary confluences and rainfall recharge), providing key informational supplementation for the water-quantity evolution process of the reach. Precisely because Station D fills the information gap of the first-stage station signal during the evolution process, it yields an extremely high marginal contribution in the joint forecasting task and is thus identified by the model as an indispensable key driving factor. In contrast, the remaining stations such as B and C, because most of the hydrological evolution information they contain has largely been covered by the two key nodes A and D, exhibit a high degree of informational redundancy and homogeneity; consequently, their marginal contribution rates as independent predictive factors are substantially reduced.

4.2.2. Physical Mechanism Verification Based on Nonlinear Analysis

The threshold behavior of feature D identified in Section 3.6.2 (Figure 17) offers compelling station-level evidence for the baseline-correction hypothesis advanced in Section 4.2.1, and it clarifies, at the level of the underlying mechanism, the apparent low-correlation–high-importance paradox. In Figure 17, the reversal in the marginal contribution of Station D around 7.8 MW indicates that LSTNet is not merely performing a linear fit; instead, it has internalized a condition-dependent rule that switches with hydrological operating regimes:
  • Baseline confirmation during dry periods: When Station D’s output remains below the critical threshold of 7.8 MW, the corresponding SHAP values fall predominantly on the negative side (−2.0 to 0.0). From a physical standpoint, this pattern implies that during dry or near-normal flow conditions, when appreciable tributary inflows along the midstream reach are largely absent, Station D serves mainly as a stabilizing, bias-correcting signal. In this regime, it applies a sustained negative offset to the forecast, curbing any tendency to overpredict the cascade total and lowering the risk of generating unrealistically high-output estimates.
  • Gain indication during wet periods: Once Station D’s output exceeds 7.8 MW, its marginal contribution (SHAP value) rises sharply and in a distinctly nonlinear manner, rapidly surpassing +1.0. What this reveals is a concealed hydraulic-coupling cue embedded in the cascade response. Under wet conditions, elevated generation at Station D is no longer merely a manifestation of plant-level operation; instead, it becomes a sensitive surrogate for strengthened midstream inflows induced by basin-scale precipitation or tributary recharge. Station D thus operates as an indicator of heightened water availability at the watershed scale.
Interpreted through this SHAP dependence perspective, the physical basis for Station D’s low-correlation–high-importance characteristic becomes clear. Whereas the upstream Station A supplies a system-wide, near-linear baseline reference, Station D acts as a nonlinear gain marker that tracks basin-scale hydrological variability. By injecting key interval-increment information precisely at the dry-wet transition around the threshold, Station D enables the model to correct total-output estimates dynamically and, in turn, improves forecasting accuracy. This result substantiates the theoretical claim that Station D plays a pivotal role in physical correction and information supplementation during flow evolution; it also closes the loop between statistical patterns, attribution-based interpretation, and hydrological mechanism, strengthening the coherence and interpretability of the proposed framework.

4.3. Micro-Level Decision Mechanism Analysis Based on Local Explanations

The SHAP decision plot in Section 3.6.3 (Figure 18) juxtaposes single-sample feature attributions with the global importance ordering reported in Figure 16. Tracing this sample-specific decision trajectory makes it possible to see, in much finer detail, how LSTNet responds to spatiotemporal asynchrony and multi-scale coupling in cascade hydropower systems. Equally important, the local SHAP view counteracts a common limitation of global rankings: the tendency to obscure information that is distinctive to particular samples.
First, the influence of the principal drivers is reconfigured dynamically at the local scale. Although the global analysis suggests that the upstream Station A typically governs the prediction, the selected sample follows a markedly different path: Station A does not remain dominant, and its contribution appears instead as a negative correction, while Station D provides the strongest positive gain. Such a reversal points to intricate hydraulic interdependencies within the cascade, especially the mismatch that can arise between upstream and downstream hydrological states. In this case, the pronounced rightward shift attributed to Station D reflects a sizable replenishment of midstream inflow—consistent with episodes such as localized heavy rainfall—thereby making Station D the primary driver that lifts the forecast upward. In contrast, the leftward shift attributable to Station A captures the limiting role of upstream boundary conditions, including weakened inflow or low reservoir storage, which suppress the achievable total output. By learning this sample-level inversion of importance, the model exhibits a nonlinear compensation strategy under “dry upstream–wet downstream” conditions: decision weights are adjusted according to real-time boundary signals rather than being rigidly dictated by global statistical regularities.
Next, Figure 18 also makes the marginal regulatory effects of secondary stations explicit. Stations E, B, and G show comparatively small yet directionally diverse displacements (for example, positive for E and negative for B and G). These mixed signs are consistent with spatially heterogeneous forcing, such as uneven rainfall distribution across the basin. Rather than constituting redundant inputs, the secondary stations cooperate with the core stations (A and D) to form a multi-scale coupling structure that blends “macroscopic tendencies” with “microscopic perturbations.” Concretely, A and D establish the primary trajectory of the forecast, while stations such as E and B contribute fine-scale hydrological cues from local sub-basins, helping to explain residual variability that dominant drivers alone do not fully resolve. This targeted, low-amplitude adjustment enhances the physical completeness of the modeled output.
Taken together, the single-sample decomposition suggests that LSTNet does not merely adhere to a global statistical blueprint. Rather, it redistributes decision weight on the fly in accordance with contemporaneous hydrological boundary conditions. Through the positive amplification conveyed by Station D, the negative constraint introduced by Station A, and the fine-grained corrections contributed by stations such as E and B, the model effectively assembles a nonlinear dynamic equilibrium that rectifies forecasts under complex operating conditions. This further highlights the distinctive utility of micro-level SHAP explanations: by precisely quantifying individualized response patterns, local interpretability mitigates the masking effects of global importance summaries and, in turn, reinforces both the transparency and the hydrological plausibility of model decisions in the presence of nonstationary processes.
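The additive attributions discussed above rest on the Shapley value from cooperative game theory. As a self-contained illustration only (a brute-force enumeration over a hypothetical two-station surrogate model, not the explainer actually used for LSTNet), the sketch below computes exact Shapley values and checks the efficiency property that attributions sum to the gap between the prediction and the baseline prediction; the surrogate's threshold loosely echoes the 7.8 MW response change seen in the dependence analysis:

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, baseline, x):
    """Exact Shapley values by enumerating all feature coalitions.

    Features absent from a coalition are held at their baseline value,
    mirroring the interventional convention commonly used in SHAP.
    """
    n = len(x)
    phi = [0.0] * n
    features = list(range(n))
    for i in features:
        others = [j for j in features if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in features]
                without_i = [x[j] if j in subset else baseline[j] for j in features]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Hypothetical surrogate: "Station D" contributes with a larger gain
# above a threshold, a toy stand-in for the nonlinear dependence.
def toy_total_output(x):
    a, d = x
    return 1.0 * a + (2.0 * d if d > 7.8 else 0.5 * d)

baseline = [0.0, 0.0]
sample = [10.0, 9.0]
phi = shapley_values(toy_total_output, baseline, sample)
# Efficiency property: attributions sum to f(x) - f(baseline).
assert abs(sum(phi) - (toy_total_output(sample) - toy_total_output(baseline))) < 1e-9
```

For the sample above, the attribution of the threshold-crossing feature exceeds its marginal linear contribution, which is the mechanism by which a decision plot can show a locally dominant driver that a global ranking does not.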

5. Conclusions

To tackle the key challenges in cascade run-of-river hydropower forecasting—complex interactions among high-dimensional features, nonstationary hydrological dynamics, and the limited interpretability of deep learning predictors—this study proposes an integrated forecasting and interpretation framework that combines MIC-based feature screening, the LSTNet deep neural network, and SHAP-driven game-theoretic attribution. Experiments are conducted on an operational cascade system in a northeastern province of China, and the main results are summarized as follows.
  • High-precision spatiotemporal coupled modeling: A hybrid prediction strategy is constructed by pairing MIC-driven feature screening with LSTNet-based spatiotemporal learning. MIC is first used to filter redundant, noise-dominated variables from high-dimensional micrometeorological inputs and to pinpoint the nonlinear drivers with the greatest explanatory power, thereby forming a more reliable and information-dense feature set. Building on this input space, LSTNet—through its convolutional-recurrent composite structure—captures short-horizon local coupling signatures while simultaneously representing the longer-term evolutionary behavior of the cascade system. Experiments indicate that the proposed approach achieves lower RMSE, MAE, and MAPE than typical baseline models (e.g., RNN and GRU), delivering the level of accuracy required for forecasting under complex hydraulic interactions.
  • An interpretable explanation architecture for model decisions: By incorporating SHAP explanations at both global and local levels, the proposed framework substantially enhances the physical intelligibility of the model’s decision-making process. At the global scale, SHAP reveals that LSTNet has internalized a form of hydrological complementarity: Station A delivers a basin-level baseline cue, while Station D provides focused midstream adjustments. Moreover, the 7.8 MW threshold behavior identified in the SHAP dependence analysis reinforces Station D’s function as a sensitive indicator of basin-scale hydrological variability, thereby elucidating the low-correlation–high-importance effect. At the local scale, single-sample attributions show that the model can accommodate upstream-downstream inconsistencies by dynamically reweighting features to establish a nonlinear balance, which directly reduces the information masking inherent in purely global summaries. In this manner, the network’s internal rationale is translated into interpretable hydrological mechanisms, strengthening the empirical foundation for engineering operation and dispatch decisions.
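The MIC screening step in the first point can be illustrated with a simplified estimator. The sketch below is an assumption-laden approximation (equal-frequency grids scanned under the constraint nx·ny ≤ n^0.6, rather than the exact grid optimization of Reshef et al.), so it lower-bounds the true statistic; it nonetheless shows why a strongly nonlinear dependence scores high under MIC even when its Pearson correlation is near zero:

```python
import numpy as np

def mic_approx(x, y, alpha=0.6):
    """Simplified MIC estimate on equal-frequency grids.

    Scans all grid sizes with nx * ny <= n**alpha, bins each variable by
    its quantiles, and keeps the best normalized mutual information.
    Assumes continuous inputs (heavy ties would degrade the quantile bins).
    """
    n = len(x)
    best = 0.0
    max_cells = int(n ** alpha)
    for nx in range(2, max_cells + 1):
        for ny in range(2, max_cells // nx + 1):
            xb = np.searchsorted(np.quantile(x, np.linspace(0, 1, nx + 1)[1:-1]), x)
            yb = np.searchsorted(np.quantile(y, np.linspace(0, 1, ny + 1)[1:-1]), y)
            joint = np.histogram2d(xb, yb, bins=[np.arange(nx + 1), np.arange(ny + 1)])[0] / n
            px, py = joint.sum(axis=1), joint.sum(axis=0)
            nz = joint > 0
            mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz]))
            best = max(best, mi / np.log(min(nx, ny)))
    return float(best)

rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, 500)
# Quadratic dependence: near-zero Pearson correlation, high MIC score.
print(round(mic_approx(t, t**2 + rng.normal(0, 0.05, 500)), 3))
```

In a screening workflow of the kind described above, such a score would be computed between each candidate meteorological input and the output series, and low-scoring variables discarded before training.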
Overall, the MIC-LSTNet-SHAP framework offers a workable pathway for alleviating the long-standing trade-off between forecasting accuracy and interpretability in cascade hydropower prediction, pointing toward a paradigm in which hydrological and hydropower studies advance from purely data-driven approaches to physically interpretable artificial intelligence. Nevertheless, as this study is based on a single cascade system and does not yet explicitly incorporate hydrodynamic governing equations, future work will extend the validation to multi-basin settings and investigate incorporating Physics-Informed Neural Networks (PINNs) to strengthen physical consistency and improve robustness under extreme operating conditions.

Author Contributions

Z.L.: Conceptualization, Methodology. X.S.: Funding acquisition, Supervision, Writing—Reviewing and Editing. Y.H.: Data curation, Writing—Original Draft Preparation. Y.R.: Model Implementation, Writing—Reviewing and Editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of Sichuan Province (2025ZNSFSC0452).

Data Availability Statement

The authors do not have permission to share the data.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LSTNet: Long- and Short-term Time-series Network
MIC: Maximal Information Coefficient
SHAP: SHapley Additive exPlanations
CNN: Convolutional Neural Network
RNN: Recurrent Neural Network
ARIMA: Autoregressive Integrated Moving Average
ARIMAX: Autoregressive Integrated Moving Average with Exogenous Inputs
GBDT: Gradient-Boosting Decision Trees
LightGBM: Light Gradient-Boosting Machine
LSTM: Long Short-Term Memory
GRU: Gated Recurrent Unit
PINNs: Physics-Informed Neural Networks
FC: Fully Connected Layer
AR: Autoregressive
LIME: Local Interpretable Model-Agnostic Explanations
Adam: Adaptive Moment Estimation
MAE: Mean Absolute Error
RMSE: Root Mean Square Error
MAPE: Mean Absolute Percentage Error

Figure 1. Environmental drivers of generation variability in cascade hydropower systems.
Figure 2. Basic structure of the CNN.
Figure 3. Basic structure of the RNN.
Figure 4. Autoregressive component.
Figure 5. Overall architecture of LSTNet.
Figure 6. Schematic diagram of the cascade layout.
Figure 7. MIC vs. correlation analysis.
Figure 8. ECDF of absolute error.
Figure 9. Tail comparison via CCDF.
Figure 10. MAE improvement with 95% CI, LSTNet vs. Informer.
Figure 11. Per-horizon MAE of LSTNet vs. Informer.
Figure 12. MAE improvement with 95% CI, LSTNet vs. GRU.
Figure 13. Per-horizon MAE of LSTNet vs. GRU.
Figure 14. Prediction curves of different models.
Figure 15. Correlation analysis of power outputs across cascade hydropower stations.
Figure 16. Global feature importance ranking plot.
Figure 17. SHAP dependence plot for feature D.
Figure 18. SHAP decision plot.
Table 1. Hydrological and basin characteristics summary.
Item | Content
River basin hierarchy | Located within the Songhua River basin; the Mudan River basin is a sub-basin of the Songhua River system.
Study basin/reach | Located along the Mudan River main stem in the upper Dunhua reach (upstream of Jingbo Lake), with stations developed in a serial configuration along the main stem.
Mudan River basin area (total) | 39,090 km²
Upper Dunhua reach catchment area (study reach) | 10,547 km²
Upper Dunhua reach channel length | 233.60 km
Climate type | Cold-temperate continental monsoon climate
Mean annual precipitation | 633.2 mm
Precipitation seasonality | June–September (about 73.6% of annual precipitation)
Mean annual evaporation | 699.9 mm (E601 pan)
Mean annual air temperature | Approximately 3.8 °C
Flood seasonality | Floods are primarily triggered by heavy rainfall and occur mainly in July–August; events are generally more frequent and larger in August.
Flood process time scales (example at Dunhua gauging station) | Direct runoff duration: 6–9 d; rising limb to peak: 1–2 d (wet antecedent) or 3–4 d (dry antecedent); peak lag: 3–6 h; recession duration: 5–7 d.
Runoff response (qualitative) | Mountainous catchment with steep slopes and a dense river network; storm events tend to produce rapid runoff concentration and flashy hydrographs.
Table 2. Main hydrographic network summary.
Network Element | Key Information
Main stem | Mudan River main stem (pilot cascade developed in series along the main stem).
Upstream network pattern | The hydrographic network upstream of Jingbo Lake exhibits a fan-shaped pattern.
Major tributaries (examples) | Major tributaries in the upper reach include the Sha River and the Zhuerduo River, among others.
Hydrometric stations (above Jingbo Lake) | Main-stem gauges: Dunhua and Dashanjuzi; tributary gauges: Qiuligou, Dongchang, and Emu.
Key reach-scale inflow/engineering pathway 1 | Inter-basin diversion into Shanggou (from the Sha River; ∼18 m³/s), increasing annual inflow to downstream cascade stations.
Key reach-scale inflow/engineering pathway 2 | Reach-scale inflow to Hongshi includes tributary contributions (e.g., the Huangni River).
Table 3. Catchment area summary.
Object/Station | Type | Catchment Area (km²)
Mudan River basin (total) | Study basin | 39,090
Upper Dunhua reach (study reach) | Study reach | 10,547
A | Plant control section | -
B | Plant control section | 2112
C | Plant control section | 2813.15
D | Plant control section | 2930
E | Plant control section | 4835
F | Plant control section | -
G | Plant control section | 4861
Note: “-” indicates data not disclosed due to confidentiality.
Table 4. Main hydropower station summary.
Station | Installed Capacity (MW) | Catchment Area (km²) | Total Storage (10⁴ m³) | Regulation Type | Location (River/Reach) | Special Hydrology/Engineering Notes
A | - | - | - | No regulation | Main stem of the Mudan River |
B | 15.8 | 2112 | 1539.9 | Low storage regulation | Main stem of the Mudan River | Inter-basin diversion from Sha River (∼18 m³/s)
C | 11 | 2813.15 | 2645.9 | Low storage regulation | Main stem of the Mudan River | Reach inflow includes Huangni River
D | 8.7 | 2930 | 5300 | Some regulation capability | Main stem of the Mudan River |
E | 8 | 4835 | 750 | Low storage regulation | Main stem of the Mudan River |
F | - | - | - | No regulation | Main stem of the Mudan River |
G | 1.99 | - | - | No regulation | Main stem of the Mudan River |
Note: “-” indicates that data are not disclosed due to confidentiality.
Table 5. Comparison of MIC, Pearson correlation coefficient (r), and Spearman rank correlation coefficient (ρ).
Feature | MIC | Pearson r | Spearman ρ
Total precipitation | 0.694 | 0.625 | 0.629
Air temperature | 0.565 | 0.128 | 0.348
Specific humidity | 0.538 | 0.527 | 0.498
Land evaporation | 0.534 | 0.517 | 0.473
Dew-point temperature | 0.517 | 0.236 | 0.375
Wet-bulb temperature | 0.517 | 0.235 | 0.375
Zonal wind speed | 0.447 | −0.481 | −0.488
Surface pressure | 0.375 | −0.365 | −0.416
Wind speed | 0.363 | −0.361 | −0.373
Meridional wind speed | 0.247 | 0.169 | 0.155
Table 6. Performance metrics across models and targets.
Target | Model | MAE (MW) | RMSE (MW) | MAPE (%)
Total | RNN | 8.3028 | 9.4011 | 13.6432
Total | GRU | 4.8295 | 6.9225 | 8.3488
Total | Informer | 5.2258 | 7.7698 | 9.8552
Total | LSTNet | 1.0756 | 1.1391 | 3.5559
A | RNN | 1.5094 | 1.6048 | 50.7965
A | GRU | 0.7334 | 0.8167 | 27.0509
A | Informer | 0.5568 | 1.0112 | 36.8222
A | LSTNet | 0.3431 | 1.7919 | 13.8712
D | RNN | 4.5386 | 5.1689 | 25.0625
D | GRU | 2.2900 | 2.3617 | 20.0410
D | Informer | 2.3596 | 3.3444 | 16.9539
D | LSTNet | 1.4671 | 1.8704 | 11.1815

Li, Z.; Shen, X.; Huang, Y.; Ren, Y. Explainable Prediction of Power Generation for Cascaded Hydropower Systems Under Complex Spatiotemporal Dependencies. Energies 2026, 19, 1540. https://doi.org/10.3390/en19061540