1. Introduction
Modeling dynamical systems is a fundamental problem in control engineering [1,2], as it underpins system analysis, prediction, supervision, and the synthesis of control strategies for industrial processes. Classical modeling approaches are typically based on first principles, in which the system's behavior is described by differential equations derived from physical laws [3]. While effective in many applications, this paradigm becomes limited in the presence of strong nonlinearities and complex coupling effects, or when accurate physical descriptions are unavailable or impractical to obtain. These limitations motivate the development of data-driven modeling approaches [4].
In many contemporary industrial systems, increasing process complexity leads to unmodeled dynamics, parametric uncertainty, and time-varying operating conditions, which significantly complicate the development of accurate first-principles models [5]. As system dimensionality grows, purely physics-based representations may exhibit limited accuracy or reduced validity, particularly in large-scale nonlinear MIMO systems [6].
In this context, data-driven modeling has emerged as a compelling alternative, enabled by advances in sensing, computation, and system identification methodologies [7,8]. Unlike first-principles approaches, data-driven methods construct approximate models directly from input–output data, making them well suited for complex systems where physical modeling is infeasible or insufficiently accurate.
Among data-driven approaches, operator-theoretic methods based on the Koopman operator have gained considerable attention in recent years. The Koopman operator provides a linear, infinite-dimensional representation of nonlinear dynamical systems through the evolution of observable functions of the state [9,10]. This linearity has enabled the development of numerical techniques that approximate the Koopman operator from measured data, allowing nonlinear dynamics to be analyzed through linear models in well-chosen feature spaces.
Of the many methods for estimating the Koopman operator, Dynamic Mode Decomposition (DMD) stands out as one of the most prominent computational approaches for approximating the operator from measurements [11,12]. DMD identifies dominant spatiotemporal modes and their associated spectral properties, offering insight into the underlying system dynamics. However, the classical DMD formulation is restricted to autonomous systems, limiting its direct applicability to systems subject to external forcing or control inputs [12].
To address this limitation, Dynamic Mode Decomposition with control (DMDc) was introduced, explicitly incorporating control inputs into the identification process [13]. This extension enables the separation of autonomous system dynamics from the effects of exogenous inputs, making the resulting models more suitable for control-oriented analysis and prediction. Furthermore, the use of time-delay embeddings via Hankel matrices has proven effective in enriching the dynamic representation by capturing temporal correlations and latent states, leading to approaches such as Hankel-DMD [14].
Despite these advances, the accurate modeling of forced nonlinear multiple-input multiple-output (MIMO) systems remains an open and challenging problem. Strong nonlinearities and coupling effects often limit the effectiveness of Koopman-based representations constructed solely from delay embeddings or linear observables [15]. To overcome these limitations, lifting functions have emerged as an effective mechanism for enriching the observable space. By mapping measured variables into higher-dimensional feature spaces, lifting functions enable the approximation of complex nonlinear dynamics via linear evolution in an extended coordinate system, thereby enhancing the expressive power of Koopman-based models [16,17].
This paper introduces a fully data-driven modeling framework for forced nonlinear MIMO dynamical systems that integrates Hankel Dynamic Mode Decomposition with control and lifting functions (HDMDc+Lift). The proposed approach leverages Hankel embeddings to encode temporal correlations and lifting functions to enrich the observable space, enabling the identification of an augmented-order linear state-space representation directly from measured input–output data. Importantly, the methodology does not rely on explicit knowledge of the underlying physical equations, making it applicable to complex systems for which first-principles modeling is impractical or incomplete.
The effectiveness of the proposed framework is demonstrated through a case study based on a real multivariable tank system, using operational data that are independent of those employed during the identification stage. The results indicate that the HDMDc+Lift model provides accurate multi-step predictions while preserving the dominant dynamical structure of the physical system. Furthermore, spectral analysis of the identified linear operator suggests that the main dynamical modes are correctly captured, along with additional modes associated with nonlinear interactions. These characteristics highlight the potential of the proposed approach as a reliable foundation for subsequent tasks, including system analysis, monitoring, and fault diagnosis, in complex industrial processes.
2. Methodology
2.1. Problem Setting and Modeling Objective
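Consider a forced nonlinear MIMO dynamical system described in discrete time by
$$\mathbf{x}_{k+1} = f(\mathbf{x}_k, \mathbf{u}_k), \qquad \mathbf{y}_k = h(\mathbf{x}_k),$$
where $\mathbf{x}_k \in \mathbb{R}^{n}$ denotes the (possibly unmeasured) system state, $\mathbf{u}_k \in \mathbb{R}^{m}$ represents the control inputs or external perturbations, and $\mathbf{y}_k \in \mathbb{R}^{p}$ corresponds to the measured outputs. The nonlinear mappings $f: \mathbb{R}^{n} \times \mathbb{R}^{m} \to \mathbb{R}^{n}$ and $h: \mathbb{R}^{n} \to \mathbb{R}^{p}$ govern the state evolution and output generation, respectively.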
In many practical applications, the functions f and h are unknown, partially known, or too complex to be accurately derived from first principles. Strong nonlinearities, multivariable coupling effects, and unmodeled dynamics further complicate the development of reliable physics-based models. Consequently, the objective of this work is to identify an approximate dynamic representation directly from measured input–output data that enables accurate prediction while preserving the dominant dynamical structure of the underlying nonlinear system.
2.2. Data-Driven Approximations and DMD with Control
Dynamic Mode Decomposition (DMD) was originally introduced as a data-driven method for extracting coherent structures and dominant dynamics from time-series data [11,12]. From a Koopman perspective, DMD can be interpreted as a finite-dimensional approximation of the Koopman operator based on snapshot data. The resulting linear model captures the evolution of measured observables between consecutive time steps.
However, the classical formulation of DMD assumes autonomous system dynamics and does not explicitly account for the influence of external inputs. As a consequence, when applied to forced systems, the effect of control actions is implicitly absorbed into the identified dynamics, limiting the interpretability and applicability of the resulting model.
To overcome this limitation, Dynamic Mode Decomposition with control (DMDc) was proposed, extending the original framework by incorporating control inputs as exogenous variables [13]. In this case, the system is approximated by a linear model of the form
$$\mathbf{x}_{k+1} \approx \mathbf{A}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k,$$
where the matrices $\mathbf{A} \in \mathbb{R}^{n \times n}$ and $\mathbf{B} \in \mathbb{R}^{n \times m}$ are identified directly from data. This formulation explicitly separates the intrinsic system dynamics from the effects of control inputs, making it more suitable for control-oriented analysis, prediction, and monitoring applications.
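The following minimal NumPy sketch shows the least-squares fit behind DMDc. It stacks states and inputs and applies a pseudo-inverse to estimate $\mathbf{A}$ and $\mathbf{B}$. The sketch assumes noise-free snapshots from a small, hypothetical linear system: `A_true` and `B_true` are illustrative values, and `dmdc` is a hypothetical helper name. Under these conditions, the recovered matrices should match the true ones up to round-off.

```python
import numpy as np

def dmdc(X, Xp, U):
    """Least-squares DMDc: find A, B such that Xp ≈ A @ X + B @ U.

    X, Xp : (n, N) state snapshots at times k and k+1
    U     : (m, N) inputs applied at times k
    """
    n = X.shape[0]
    Omega = np.vstack([X, U])          # stacked data matrix [X; U]
    G = Xp @ np.linalg.pinv(Omega)     # G = [A B] via the Moore-Penrose pseudo-inverse
    return G[:, :n], G[:, n:]

# Synthetic, noise-free data from a hypothetical controllable linear system
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.5], [1.0]])
N = 200
x = np.zeros((2, N + 1))
u = rng.standard_normal((1, N))
for k in range(N):
    x[:, k + 1] = A_true @ x[:, k] + B_true @ u[:, k]

A_est, B_est = dmdc(x[:, :-1], x[:, 1:], u)
print(np.allclose(A_est, A_true), np.allclose(B_est, B_true))
```

With measured (noisy) data, the same pseudo-inverse gives the least-squares estimate rather than an exact recovery.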
From a robustness and disturbance-rejection perspective, the DMDc formulation can be interpreted as an implicit separation between intrinsic system dynamics and exogenous inputs, a concept that is also central to several advanced control and estimation frameworks. For example, equivalent-input disturbance (EID) approaches explicitly augment the system dynamics to estimate and compensate unknown perturbations acting on the system [18]. Although DMDc does not explicitly model disturbances, the identification of separate operators $\mathbf{A}$ and $\mathbf{B}$ enables the data-driven isolation of the forced and autonomous components of the dynamics, which is conceptually aligned with disturbance-decoupling objectives.
Similarly, robust zeroing neural network formulations employ augmented dynamic structures to enforce convergence properties in the presence of uncertainties [19]. In comparison, DMDc obtains a linear representation directly from data, where robustness emerges implicitly from the quality and richness of the measured input–output trajectories rather than from explicitly designed nonlinear feedback terms. This highlights the suitability of DMDc-based models as a lightweight yet effective alternative for data-driven analysis of forced nonlinear systems.
2.3. Hankel Embedding and Motivation for Lifting
While DMDc improves the modeling of forced systems, its ability to capture complex nonlinear dynamics remains limited when only instantaneous measurements are used. To address this limitation, time-delay embeddings based on Hankel matrices are employed, incorporating delayed versions of the measured signals into the data representation.
The use of Hankel embeddings allows the model to encode temporal memory and latent dynamics, effectively enriching the observable space without requiring direct access to the full system state. This approach has led to variants such as Hankel-DMD and Hankel-DMDc (HDMDc), which have demonstrated improved performance in the identification of nonlinear and high-dimensional systems.
In addition to time-delay embedding, lifting functions provide a complementary mechanism for enhancing model expressiveness. Lifting maps the measured variables into a higher-dimensional feature space, where nonlinear relationships can be more effectively approximated by linear dynamics. From a Koopman perspective, lifting corresponds to enriching the observable dictionary, increasing the likelihood that the chosen subspace captures invariant or approximately invariant dynamics.
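The effect of lifting can be illustrated with a standard toy example from the Koopman literature. It is sketched here in discrete time with hypothetical parameters `a`, `b`, and `c`. For $x_{1,k+1} = a x_{1,k}$ and $x_{2,k+1} = b x_{2,k} + c x_{1,k}^2$, the dictionary $(x_1, x_2, x_1^2)$ spans an invariant subspace, because $x_{1,k+1}^2 = a^2 x_{1,k}^2$. A linear least-squares fit on the raw state therefore leaves a residual, whereas the fit on the lifted observables is exact up to round-off.

```python
import numpy as np

a, b, c = 0.9, 0.5, 1.0   # hypothetical parameters of the toy system

def step(x):
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2])

def lift(x):
    # observable dictionary: identity plus the quadratic monomial x1^2
    return np.array([x[0], x[1], x[0] ** 2])

rng = np.random.default_rng(1)
X, Xp = [], []
for _ in range(20):                      # 20 short trajectories from random initial states
    x = rng.uniform(-1, 1, size=2)
    for _ in range(10):
        xn = step(x)
        X.append(x)
        Xp.append(xn)
        x = xn
X, Xp = np.array(X).T, np.array(Xp).T    # shapes (2, 200)

def fit_residual(Z, Zp):
    K = Zp @ np.linalg.pinv(Z)           # least-squares linear operator on the given observables
    return np.linalg.norm(Zp - K @ Z) / np.linalg.norm(Zp), K

res_raw, _ = fit_residual(X, Xp)
Psi = np.apply_along_axis(lift, 0, X)    # lift every snapshot (column)
Psip = np.apply_along_axis(lift, 0, Xp)
res_lift, K = fit_residual(Psi, Psip)
print(f"relative residual raw: {res_raw:.3f}, lifted: {res_lift:.1e}")
```

In the lifted coordinates, the identified `K` coincides with the exact linear operator `[[a, 0, 0], [0, b, c], [0, 0, a**2]]`.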
The combination of Hankel embedding and lifting constitutes the conceptual foundation of the HDMDc+Lift approach proposed in this work. By jointly exploiting temporal delays and lifted observables, the resulting framework enables the identification of extended-order linear models capable of accurately approximating forced nonlinear MIMO dynamics, as detailed in the following section.
The objective of the proposed methodology is to identify a finite-dimensional linear approximation of a nonlinear MIMO system directly from input–output data, while preserving its dominant dynamic structure and predictive capability. To this end, a data-driven modeling approach based on the Koopman operator is employed, combining three key elements:
Hankel time-delay embedding to capture temporal dependencies and latent dynamics;
Dynamic Mode Decomposition with control (DMDc) to explicitly incorporate control inputs;
Lifting of observables to enhance the representation of nonlinear effects.
The resulting method, referred to as HDMDc+Lift, produces a linear state-space model suitable for prediction, dynamic analysis, and fault detection tasks.
The use of Hankel time-delay embeddings can also be interpreted through the lens of state augmentation strategies commonly employed in robust estimation and fault-tolerant control. In dynamic high-gain and decentralized fault-tolerant control schemes, augmented states are often introduced to capture hidden coupling effects, internal dynamics, or fault signatures [20]. Analogously, Hankel embeddings construct an extended state representation from measured data that implicitly captures latent dynamics and nonlinear interactions without requiring explicit physical modeling.
From a mathematical standpoint, delay-embedded observables approximate higher-order Markov representations of the underlying nonlinear system. This property enables the lifted linear model to retain memory of past system behavior, which is particularly beneficial in the presence of unmodeled dynamics or slowly varying disturbances. As a result, the Hankel-based formulation enhances the robustness of the identified model while preserving its linear structure.
2.4. Hankel-Based Data Organization
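Consider a set of measured output data of length T, $\{\mathbf{y}_k\}_{k=1}^{T}$ with $\mathbf{y}_k \in \mathbb{R}^{p}$, and the corresponding control inputs $\{\mathbf{u}_k\}_{k=1}^{T}$ with $\mathbf{u}_k \in \mathbb{R}^{m}$. To incorporate temporal memory into the model, Hankel matrices of order q are constructed for both outputs and inputs.
The output Hankel matrix is defined as
$$\mathbf{H}_y = \begin{bmatrix} \mathbf{y}_1 & \mathbf{y}_2 & \cdots & \mathbf{y}_{T-q+1} \\ \mathbf{y}_2 & \mathbf{y}_3 & \cdots & \mathbf{y}_{T-q+2} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{y}_q & \mathbf{y}_{q+1} & \cdots & \mathbf{y}_{T} \end{bmatrix} \in \mathbb{R}^{pq \times (T-q+1)}.$$
Similarly, the input Hankel matrix is constructed as
$$\mathbf{H}_u = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_{T-q+1} \\ \mathbf{u}_2 & \mathbf{u}_3 & \cdots & \mathbf{u}_{T-q+2} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{u}_q & \mathbf{u}_{q+1} & \cdots & \mathbf{u}_{T} \end{bmatrix} \in \mathbb{R}^{mq \times (T-q+1)}.$$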
This representation allows the model to encode temporal correlations and delayed nonlinear effects using only measured data.
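The following short NumPy sketch builds such a Hankel matrix. It assumes signals are stored as arrays of shape (channels, samples), and the helper name `hankel` is hypothetical. Each column stacks q consecutive samples, so its first block holds the sample at that column's time index.

```python
import numpy as np

def hankel(data, q):
    """Stack q consecutive samples of a (d, T) signal into a (d*q, T-q+1) Hankel matrix.

    Column j is [s_j; s_{j+1}; ...; s_{j+q-1}], so its first d rows hold s_j.
    """
    d, T = data.shape
    cols = T - q + 1
    return np.vstack([data[:, i:i + cols] for i in range(q)])

y = np.arange(1, 7, dtype=float).reshape(1, 6)   # toy scalar output y_1..y_6
print(hankel(y, 3))
# [[1. 2. 3. 4.]
#  [2. 3. 4. 5.]
#  [3. 4. 5. 6.]]
```

The same helper applies unchanged to the multichannel input signal, producing $\mathbf{H}_u$.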
2.5. Incorporation of Lifting Functions
To further enhance the expressive power of the model, the measured outputs are mapped to a lifted observable space through a lifting function $\psi(\cdot)$.
In this work, the lifting is defined as a linear augmentation based on delayed measurements: the lifted observables $\psi(\mathbf{y}_k)$ stack the current output $\mathbf{y}_k$ together with delayed copies of the measured outputs.
Although nonlinear lifting functions may be employed, empirical evaluation showed that delay-based lifting provides a favorable balance between model accuracy, numerical stability, and computational cost for the multitank system. This choice is consistent with the Koopman framework, where delay coordinates are known to approximate invariant subspaces under mild observability conditions.
The lifted Hankel matrix $\mathbf{H}_{\psi}$ is therefore constructed from the lifted observables $\psi(\mathbf{y}_k)$ in the same manner as $\mathbf{H}_y$.
2.6. HDMDc Formulation in the Lifted Space
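The lifted output and input Hankel matrices, $\mathbf{H}_{\psi}$ and $\mathbf{H}_u$, are concatenated to form the extended data matrix
$$\boldsymbol{\Omega} = \begin{bmatrix} \mathbf{H}_{\psi}^{-} \\ \mathbf{H}_{u}^{-} \end{bmatrix},$$
where the superscript $-$ indicates that the last column has been removed. The corresponding future-state matrix is defined as
$$\mathbf{X}' = \mathbf{H}_{\psi}^{+},$$
where the superscript $+$ indicates that the first column has been removed, so that each column of $\mathbf{X}'$ lies one time step ahead of the corresponding column of $\boldsymbol{\Omega}$.
The objective is to identify a linear operator $\mathbf{G}$ such that
$$\mathbf{X}' \approx \mathbf{G}\,\boldsymbol{\Omega}.$$
This operator is estimated via truncated singular value decomposition (SVD):
$$\boldsymbol{\Omega} \approx \mathbf{U}_r \boldsymbol{\Sigma}_r \mathbf{V}_r^{\top},$$
where r denotes the truncation rank.
The system operator is obtained as
$$\mathbf{G} = \mathbf{X}' \mathbf{V}_r \boldsymbol{\Sigma}_r^{-1} \mathbf{U}_r^{\top}.$$
The matrix $\mathbf{G}$ is partitioned as
$$\mathbf{G} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \end{bmatrix},$$
leading to the lifted linear state-space model
$$\mathbf{z}_{k+1} = \mathbf{A}\mathbf{z}_k + \mathbf{B}\bar{\mathbf{u}}_k, \qquad \hat{\mathbf{y}}_k = \mathbf{C}\mathbf{z}_k,$$
where $\mathbf{z}_k$ and $\bar{\mathbf{u}}_k$ denote the k-th columns of $\mathbf{H}_{\psi}$ and $\mathbf{H}_u$, respectively. As demonstrated in (9), the first p rows of $\mathbf{z}_k$ correspond to the measured output at time instant k. Because these rows reproduce the p system outputs directly, the output matrix $\mathbf{C}$ can be constructed from a p × p identity block and a zero block, as follows:
$$\mathbf{C} = \begin{bmatrix} \mathbf{I}_p & \mathbf{0} \end{bmatrix}.$$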
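The estimate of $\mathbf{G}$, its partition, and the construction of $\mathbf{C}$ can be sketched in NumPy as follows. This is an illustrative implementation that assumes $\boldsymbol{\Omega}$ and $\mathbf{X}'$ have already been assembled. The function name `hdmdc_operator` and the synthetic test system are hypothetical. With noise-free data and r equal to the rank of $\boldsymbol{\Omega}$, the known $\mathbf{A}$ and $\mathbf{B}$ are recovered.

```python
import numpy as np

def hdmdc_operator(Omega, Xp, r, p):
    """Estimate G = [A B] from Xp ≈ G @ Omega with a rank-r truncated SVD,
    then build C = [I_p 0] so that the first p lifted coordinates are the outputs.

    Omega : (nz + nu, N) extended data matrix [H_psi; H_u] (current columns)
    Xp    : (nz, N) lifted states shifted one step ahead
    """
    nz = Xp.shape[0]
    U, s, Vt = np.linalg.svd(Omega, full_matrices=False)
    Ur, sr, Vr = U[:, :r], s[:r], Vt[:r, :].T        # keep the r dominant singular triplets
    G = Xp @ Vr @ np.diag(1.0 / sr) @ Ur.T           # G = X' V_r Sigma_r^{-1} U_r^T
    A, B = G[:, :nz], G[:, nz:]
    C = np.hstack([np.eye(p), np.zeros((p, nz - p))])
    return A, B, C

# Hypothetical check: data generated by a known lifted linear model
rng = np.random.default_rng(2)
A_true = np.diag([0.9, 0.7, 0.3])
B_true = np.array([[1.0], [0.5], [0.2]])
Z = rng.standard_normal((3, 100))
Uin = rng.standard_normal((1, 100))
Zp = A_true @ Z + B_true @ Uin
A, B, C = hdmdc_operator(np.vstack([Z, Uin]), Zp, r=4, p=1)
print(np.allclose(A, A_true), np.allclose(B, B_true), C)
```

Choosing r smaller than the rank of $\boldsymbol{\Omega}$ discards the weakest directions of the data. This regularizes the estimate at the cost of an approximate fit.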
The identification of the lifted operator $\mathbf{G}$ can be interpreted as constructing a linear surrogate of the nonlinear input–output dynamics in an augmented coordinate system. This perspective is closely related to hybrid modeling approaches that combine data-driven components with structured estimators. For instance, hybrid neural network- and physics-based estimators aim to balance expressive power with interpretability by embedding nonlinear effects into extended state representations [21]. In contrast, the HDMDc+Lift framework embeds nonlinearities through delay coordinates and lifted observables, yielding a fully linear evolution model in the lifted space.
An important distinction is that the spectral properties of the operator can be directly analyzed, enabling stability assessment, modal interpretation, and integration with observer-based monitoring schemes. This characteristic contrasts with purely neural-network-based approaches, where stability and robustness analysis often remain challenging. Consequently, the lifted Koopman-based representation provides a mathematically transparent and computationally efficient alternative for modeling complex nonlinear MIMO systems under external forcing.
2.7. Algorithmic Summary
The complete HDMDc+Lift procedure is summarized in Algorithm 1.
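Algorithm 1 HDMDc with Lifting for Forced Nonlinear MIMO Systems.
Require: Output data $\{\mathbf{y}_k\}_{k=1}^{T}$, input data $\{\mathbf{u}_k\}_{k=1}^{T}$, Hankel order q, truncation rank r
Ensure: Lifted linear state-space model $(\mathbf{A}, \mathbf{B}, \mathbf{C})$
1: Construct output Hankel matrix $\mathbf{H}_y$
2: Construct input Hankel matrix $\mathbf{H}_u$
3: Define lifting function $\psi$ using delayed measurements
4: Form lifted Hankel matrix $\mathbf{H}_{\psi}$
5: Construct extended data matrix $\boldsymbol{\Omega}$
6: Compute truncated SVD of $\boldsymbol{\Omega}$
7: Estimate operator $\mathbf{G} = [\mathbf{A}\;\;\mathbf{B}]$
8: return Linear lifted model $(\mathbf{A}, \mathbf{B}, \mathbf{C})$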
The proposed algorithm identifies a finite-dimensional linear approximation of the Koopman operator restricted to a lifted and delay-embedded observable space, enabling accurate prediction of forced nonlinear dynamics using only input–output data.
3. Case Study and Experimental Setup
3.1. Multitank System Description
The proposed methodology is validated using a laboratory-scale multitank system with three control inputs and two measured outputs. This system constitutes a well-established benchmark in control engineering research due to its nonlinear dynamics, multivariable interactions, and strong coupling effects among state variables.
The system dynamics are governed by nonlinear flow relationships between interconnected tanks, where variations in pump actuation affect multiple liquid levels simultaneously. These characteristics make the multitank system particularly suitable for evaluating data-driven modeling techniques and Koopman-based linear representations, as they challenge both model expressiveness and generalization capability.
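To make this kind of nonlinearity concrete, the following sketch simulates a hypothetical two-tank toy model. It is not the laboratory multitank rig, and the geometry and parameter values are illustrative assumptions. The model uses Torricelli outflow $q = a\sqrt{2gh}$, the square-root nonlinearity typical of interconnected tanks. Each pump affects both levels because tank 1 drains into tank 2.

```python
import numpy as np

def simulate_tanks(u, h0=(0.1, 0.1), dt=0.5, area=0.01, a_out=(5e-5, 5e-5), g=9.81):
    """Explicit-Euler simulation of two interconnected tanks with Torricelli outflow.

    Hypothetical toy model (not the laboratory rig): pump i injects u[i, k] (m^3/s)
    into tank i, tank 1 drains into tank 2, and tank 2 drains to the reservoir.
    u has shape (2, T); the returned levels h (m) have shape (2, T + 1).
    """
    T = u.shape[1]
    h = np.zeros((2, T + 1))
    h[:, 0] = h0
    for k in range(T):
        q1 = a_out[0] * np.sqrt(2 * g * h[0, k])   # tank 1 -> tank 2, nonlinear in the level
        q2 = a_out[1] * np.sqrt(2 * g * h[1, k])   # tank 2 -> reservoir
        dh = np.array([u[0, k] - q1, u[1, k] + q1 - q2]) / area
        h[:, k + 1] = np.maximum(h[:, k] + dt * dh, 0.0)   # levels cannot become negative
    return h

# Constant pumping: levels settle where inflow balances Torricelli outflow
h = simulate_tanks(np.full((2, 4000), 1e-4))
print(h[:, -1])
```

With both pumps at $10^{-4}$ m³/s, the levels should settle near $(u_1/a)^2/(2g) \approx 0.204$ m and $((u_1+u_2)/a)^2/(2g) \approx 0.815$ m. Because the equilibrium depends quadratically on the inflow, a single linearization holds only near one operating point.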
The variables considered in this study are summarized as follows:
Due to inherent cross-couplings, saturation effects, and unmodeled disturbances, deriving accurate physics-based models for this system typically requires complex parameter identification procedures. As such, it provides a representative test case for assessing the effectiveness of fully data-driven modeling approaches. Further details of this experimental system can be found in [12,22,23].
3.2. Data Acquisition and Preprocessing
The data used for model identification and validation were acquired directly from the real multitank system via a supervisory control and data acquisition (SCADA) platform. Several hours of operation under nominal conditions were recorded, ensuring sufficient excitation of the system dynamics across relevant operating regimes.
Prior to model construction, the collected data were subjected to a preprocessing pipeline consisting of the following steps:
Removal of outliers and segments affected by sensor anomalies or operational transients.
Normalization of input and output signals to mitigate scaling effects and improve numerical conditioning.
Partitioning of the dataset into two disjoint subsets:
– Training data, used exclusively for model identification.
– Validation data, not employed during training and reserved for performance assessment.
This strict separation between training and validation datasets enables an objective evaluation of the generalization capability of the identified model and allows overfitting to be detected rather than masked.
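The normalization and partitioning steps can be sketched as below. The sketch assumes a chronological split with an illustrative 70/30 ratio and z-score scaling whose statistics come from the training block only. Outlier removal is not shown, and `split_and_normalize` is a hypothetical helper.

```python
import numpy as np

def split_and_normalize(Y, U, train_frac=0.7):
    """Chronological train/validation split with z-score normalization.

    Statistics are computed on the training block only and reused for validation,
    so no information from the validation data leaks into identification.
    Y: (p, T) outputs, U: (m, T) inputs. train_frac is an illustrative choice.
    """
    T = Y.shape[1]
    n_tr = int(train_frac * T)
    out = {}
    for name, S in (("y", Y), ("u", U)):
        mu = S[:, :n_tr].mean(axis=1, keepdims=True)
        sd = S[:, :n_tr].std(axis=1, keepdims=True)
        sd[sd == 0] = 1.0                          # guard against constant channels
        out[name + "_train"] = (S[:, :n_tr] - mu) / sd
        out[name + "_val"] = (S[:, n_tr:] - mu) / sd
        out[name + "_stats"] = (mu, sd)            # kept to de-normalize predictions later
    return out
```

A chronological split, rather than random shuffling, keeps the temporal structure that the Hankel embedding relies on.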
3.3. HDMDc+Lift Model Configuration
The HDMDc+Lift model was constructed following the methodology described in Algorithm 1. The performance of the proposed HDMDc+Lift method depends on two key parameters: the Hankel matrix order q, which determines the temporal memory depth, and the SVD truncation rank r, which controls the number of retained dynamical modes.
A systematic parameter sweep was conducted to evaluate the trade-off between predictive accuracy and model complexity within the HDMDc+Lift framework. This analysis considered multiple combinations of the Hankel order q and truncation rank r, assessing both prediction performance and computational cost. The obtained results, summarized in Table 1, include the coefficient of determination ($R^2$) for both system outputs, together with the corresponding training and prediction times.
The results indicate that increasing either the Hankel order q or the truncation rank r generally improves prediction accuracy up to a certain point. However, this improvement is accompanied by a significant increase in computational burden due to the higher-dimensional lifted state representation. This behavior is directly linked to the growth in the dimensions of the identified state-space matrices $\mathbf{A}$ and $\mathbf{B}$, which scale with both the number of delays and the dimension of the lifted observable space. Although several configurations with larger values of q and r achieve predictive performance comparable to the selected configuration (, ), the associated training and prediction times grow markedly, reducing their practical suitability.
The selected values were found to provide an effective balance between predictive accuracy, numerical stability, and computational efficiency.
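A parameter sweep of this kind can be sketched as follows. The sketch is a simplified, hypothetical stand-in: it uses plain Hankel coordinates without an additional lifting dictionary and a toy nonlinear system in place of the multitank data. It evaluates open-loop multi-step prediction seeded with the first Hankel column and measures wall-clock time via `time.perf_counter`. All helper names are hypothetical, and the printed timings depend on the machine.

```python
import time
import numpy as np

def hankel(S, q):
    """(d, T) signal -> (d*q, T-q+1) Hankel matrix; column j starts with sample j."""
    cols = S.shape[1] - q + 1
    return np.vstack([S[:, i:i + cols] for i in range(q)])

def fit(Y, U, q, r):
    """HDMDc operator [A B] from Hankel data with a rank-r truncated SVD (no extra lifting)."""
    Hy, Hu = hankel(Y, q), hankel(U, q)
    Omega, Xp = np.vstack([Hy[:, :-1], Hu[:, :-1]]), Hy[:, 1:]
    W, s, Vt = np.linalg.svd(Omega, full_matrices=False)
    r = min(r, int(np.sum(s > 1e-10 * s[0])))   # never invert numerically zero singular values
    G = Xp @ Vt[:r].T @ np.diag(1.0 / s[:r]) @ W[:, :r].T
    nz = Hy.shape[0]
    return G[:, :nz], G[:, nz:]

def rollout(A, B, Y, U, q):
    """Open-loop multi-step prediction seeded only with the first Hankel column of Y."""
    p, Hu = Y.shape[0], hankel(U, q)
    z, preds = hankel(Y, q)[:, 0], []
    for k in range(Hu.shape[1]):
        preds.append(z[:p])
        z = A @ z + B @ Hu[:, k]
    return np.array(preds).T

def r2(y, yhat):
    ss_res = np.sum((y - yhat) ** 2, axis=1)
    ss_tot = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)
    return 1.0 - ss_res / ss_tot

def sweep(Ytr, Utr, Yva, Uva, qs, rs):
    """Return {(q, r): (R2 per output, fit time [s], prediction time [s])}."""
    results = {}
    for q in qs:
        for r in rs:
            t0 = time.perf_counter()
            A, B = fit(Ytr, Utr, q, r)
            t_fit = time.perf_counter() - t0
            t0 = time.perf_counter()
            Yhat = rollout(A, B, Yva, Uva, q)
            t_pred = time.perf_counter() - t0
            n = Yhat.shape[1]
            # score only samples that are not part of the seeding Hankel column
            results[(q, r)] = (r2(Yva[:, q:n], Yhat[:, q:]), t_fit, t_pred)
    return results

# Hypothetical mildly nonlinear system with 1 input and 2 outputs
rng = np.random.default_rng(3)
T = 600
u = rng.uniform(0.0, 1.0, (1, T))
y = np.zeros((2, T))
for k in range(T - 1):
    y[0, k + 1] = 0.8 * y[0, k] + 0.2 * u[0, k]
    y[1, k + 1] = 0.7 * y[1, k] + 0.2 * y[0, k] ** 2
res = sweep(y[:, :400], u[:, :400], y[:, 400:], u[:, 400:], qs=(2, 4), rs=(4, 8))
for key, (score, t_fit, t_pred) in res.items():
    print(key, np.round(score, 3), f"fit {1e3 * t_fit:.2f} ms, predict {1e3 * t_pred:.2f} ms")
```

A Bayesian optimizer, as used in the study, would replace the exhaustive double loop with a guided search over the same (q, r) space.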
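The resulting model is expressed in discrete-time linear state-space form as
$$\mathbf{z}_{k+1} = \mathbf{A}\mathbf{z}_k + \mathbf{B}\bar{\mathbf{u}}_k, \qquad \hat{\mathbf{y}}_k = \mathbf{C}\mathbf{z}_k,$$
where $\mathbf{z}_k$ denotes the lifted state vector, $\bar{\mathbf{u}}_k$ the delay-embedded input vector, and $\hat{\mathbf{y}}_k$ the estimated outputs. The matrices $\mathbf{A}$ and $\mathbf{B}$ are identified directly from data, while $\mathbf{C}$ selects the first p lifted coordinates, which coincide with the measured outputs.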
3.4. Validation Strategy and Performance Metrics
Model validation was conducted by comparing the estimated outputs with the corresponding measured outputs of the multitank system using validation data that were not employed during the identification stage.
The performance of the identified model was assessed using the following criteria:
Mean squared error (MSE), to quantify prediction accuracy.
Coefficient of determination ($R^2$), to evaluate the proportion of explained output variance.
Visual inspection of time-domain trajectories, to assess dynamic consistency and transient behavior.
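Both numerical criteria can be computed per output channel as below; the helper names are hypothetical. In the small worked example, the residuals are ±0.1 and ±0.2, giving MSE = 0.10/4 = 0.025. The total sum of squares about the mean 2.5 is 5.0, so $R^2 = 1 - 0.10/5.0 = 0.98$.

```python
import numpy as np

def mse(y, yhat):
    """Mean squared error per output channel; y and yhat have shape (p, N)."""
    return np.mean((y - yhat) ** 2, axis=1)

def r2(y, yhat):
    """Coefficient of determination per output channel: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - yhat) ** 2, axis=1)
    ss_tot = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)
    return 1.0 - ss_res / ss_tot

y = np.array([[1.0, 2.0, 3.0, 4.0]])
yhat = np.array([[1.1, 1.9, 3.2, 3.8]])
print(mse(y, yhat), r2(y, yhat))
```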
In addition to predictive accuracy, the stability properties of the identified model were examined through spectral analysis of the state transition matrix $\mathbf{A}$. The dominant eigenvalues, i.e., those of largest magnitude, were verified to lie strictly inside the unit circle. Since all remaining eigenvalues have smaller magnitude, the spectral radius is therefore below one, which is the necessary and sufficient condition for asymptotic stability of a discrete-time linear system. This analysis provides further insight into the dynamical consistency of the learned model and its suitability for subsequent analysis and monitoring tasks.
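The spectral check can be sketched as follows, assuming the identified state matrix is available as a NumPy array. The helper `stability_report` and its tolerance `tol` are hypothetical choices. For the illustrative upper-triangular matrix, the eigenvalues are its diagonal entries 0.95 and 0.5, so the spectral radius is 0.95 and the model is stable.

```python
import numpy as np

def stability_report(A, tol=1e-9):
    """Eigen-analysis of a discrete-time state matrix.

    Returns the eigenvalues sorted by decreasing magnitude, the spectral radius,
    and whether the model is asymptotically stable (all |lambda| < 1).
    """
    lam = np.linalg.eigvals(A)
    lam = lam[np.argsort(-np.abs(lam))]   # dominant (largest-magnitude) eigenvalues first
    rho = np.abs(lam[0])
    return lam, rho, bool(rho < 1.0 - tol)

A = np.array([[0.95, 0.10], [0.0, 0.5]])
lam, rho, stable = stability_report(A)
print(lam, rho, stable)
```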
4. Results and Discussion
4.1. Predictive Performance of the HDMDc+Lift Model
The model identified using the HDMDc+Lift methodology was evaluated on a dataset completely independent of the one employed during the identification process. This strategy enabled an objective assessment of the model's generalization capability and ensured that any overfitting to the training data would be exposed rather than masked.
Figure 1 and Figure 2 show the comparison between the real outputs of the multitank system and the outputs estimated by the model for both measured variables. A high correspondence between the real and estimated trajectories is observed in both transient and steady-state regimes.
Quantitatively, the model achieved an average coefficient of determination of 87%, demonstrating strong predictive capability given the nonlinear and coupled nature of the system. This level of accuracy was achieved without relying on physical models or explicit nonlinear functions, reinforcing the effectiveness of the data-driven approach.
4.2. Selection of the Lifting Dictionary
The choice of the observable dictionary used for lifting is a critical design aspect of the HDMDc+Lift framework, as it directly determines the expressiveness and numerical properties of the resulting lifted linear model. An inadequate selection may lead to poor generalization, overfitting, or excessive computational cost, despite achieving acceptable short-term prediction accuracy.
To address this issue, a systematic evaluation was conducted using several candidate dictionaries belonging to different functional classes, including polynomial, radial basis, trigonometric, logarithmic–rational, truncated Fourier, and delay-based functions. For each dictionary, the identified model was assessed using a joint criterion that accounts for multi-step prediction accuracy, training time, and prediction time.
The comparative results, summarized in Table 2, indicate that the delay-based lifting functions provide the most favorable trade-off between predictive performance and computational efficiency. While some alternative dictionaries achieve comparable accuracy levels, they exhibit significantly higher training or prediction times, which limits their suitability for monitoring-oriented applications. Based on these results, the delay-based dictionary was selected for the remainder of this study, as it offers robust predictive performance while maintaining low computational complexity.
4.3. Hyperparameter Sensitivity and Selection
To assess the sensitivity of the HDMDc+Lift model to its hyperparameters and to justify their selection, a systematic optimization-based search was conducted over the Hankel embedding order q and the SVD truncation rank r. In particular, a Bayesian optimization strategy was employed to explore the two-dimensional hyperparameter space within predefined bounds. The search used a composite cost function based on multi-step prediction errors evaluated on independent validation data, as shown in Figure 3.
The optimization results indicate that larger values of q and r, such as and , can achieve prediction accuracies comparable to those obtained with the originally selected parameters (, ), yielding coefficients of determination close to for both outputs. However, this increase in hyperparameter complexity does not result in a significant improvement in predictive performance.
In contrast, the computational cost associated with both model training and multi-step prediction increases substantially as the hyperparameters grow. Specifically, the average prediction time increases from approximately for , to for the optimized configuration of , . At the same time, the dimensions of the identified state-space matrices $\mathbf{A}$ and $\mathbf{B}$ increase considerably, leading to higher memory requirements and reduced scalability in real-time or monitoring-oriented applications.
Based on these results, the parameters and were selected, as they provide a favorable trade-off between predictive accuracy, numerical robustness, and computational efficiency. These findings indicate that, although the HDMDc+Lift framework exhibits limited sensitivity to moderate variations in hyperparameters in terms of prediction accuracy, practical considerations related to computational cost and model size are decisive for parameter selection in real-world deployments.
4.4. Evaluation Under Non-Nominal Operating Conditions
To assess the robustness and generalization capability of the proposed HDMDc+Lift model beyond nominal operating conditions, its predictive performance was evaluated under multiple non-nominal scenarios. Three representative operating regimes were considered: stochastic disturbances acting on the actuators, a persistent actuator bias introducing a steady offset, and a change in the operating regime that places the system dynamics outside the training distribution.
For each scenario, independent validation datasets were generated, and the multi-step prediction performance was quantified using the coefficient of determination for both output variables. The resulting performance metrics are reported in Table 3. In addition, the corresponding time-domain predictions are shown in Figure 4, Figure 5 and Figure 6, allowing a qualitative comparison of transient and steady-state behavior across the different operating regimes.
The results indicate that the HDMDc+Lift model preserves satisfactory predictive accuracy under stochastic disturbances and moderate regime changes, whereas its performance degrades in the presence of persistent actuator bias. Stochastic disturbances primarily introduce zero-mean fluctuations that are partially attenuated by the identified Koopman subspace, resulting in only a moderate reduction in prediction accuracy. In contrast, a persistent actuator bias induces a systematic shift in the system trajectories that is not represented in the training data. This leads to a structural mismatch in the learned linear operator and a pronounced collapse of the $R^2$ metric. Nevertheless, the model produces stable and bounded predictions in all considered scenarios, including those in which structural deviations degrade its accuracy.
4.5. Predictive Performance and Uncertainty Quantification
In addition to pointwise prediction accuracy, the uncertainty associated with the proposed HDMDc+Lift model was quantified through confidence bounds derived from the training residuals. Specifically, the modeling error was computed as the difference between the measured outputs and the reconstructed outputs obtained during the identification phase. From these residuals, an empirical standard deviation was estimated for each output channel.
Assuming a Gaussian distribution of the modeling error, confidence bounds were defined around the predicted output trajectories as symmetric intervals proportional to the empirical standard deviation σ of each output channel. These bounds provide a statistical characterization of the uncertainty induced by truncation effects, finite data length, and modeling approximations inherent to the Koopman-based formulation. It is important to emphasize that the confidence bounds were computed exclusively from the training data and subsequently propagated to the validation phase, without recalibration during prediction.
Figure 7 and Figure 8 illustrate the results for both the training and validation datasets, including the corresponding confidence bounds. As observed, the measured outputs remain largely contained within the estimated bounds. This behavior confirms that the identified HDMDc+Lift model captures the dominant system dynamics while providing meaningful error envelopes for prediction-based analysis and monitoring applications.
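The residual-based bounds can be sketched as follows. The multiplier k = 3 is an illustrative assumption (roughly 99.7% coverage for Gaussian errors), the residuals are synthetic, and the helper names are hypothetical. The key point is that σ is estimated once on the training residuals and then reused, unchanged, on the validation data.

```python
import numpy as np

def residual_bounds(y_train, yhat_train, k=3.0):
    """Half-width k * sigma per output channel, with sigma from training residuals.

    k = 3 is an illustrative choice (about 99.7% coverage under a Gaussian error model).
    """
    sigma = np.std(y_train - yhat_train, axis=1, keepdims=True)
    return k * sigma

def coverage(y, yhat, half_width):
    """Fraction of samples, per channel, lying inside yhat ± half_width."""
    return np.mean(np.abs(y - yhat) <= half_width, axis=1)

rng = np.random.default_rng(4)
scales = [[0.1], [0.3]]                               # hypothetical per-channel error levels
yhat_tr = np.zeros((2, 5000))
y_tr = yhat_tr + rng.normal(0.0, scales, (2, 5000))
hw = residual_bounds(y_tr, yhat_tr)                   # computed once, on training data only
yhat_va = np.zeros((2, 5000))
y_va = yhat_va + rng.normal(0.0, scales, (2, 5000))
print(hw.ravel(), coverage(y_va, yhat_va, hw))        # reused on validation without recalibration
```

If the validation coverage falls well below the nominal level, the Gaussian, stationary-error assumption behind the bounds is likely violated.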
4.6. Comparison with the Classical Linearized Model, HAVOK and HDMDc
To contextualize the performance of the proposed model, a direct comparison was conducted with a classical linearized model derived from the physical equations of the multitank system around a nominal operating point. This model serves as a baseline representing the minimum performance that an identified model should achieve.
The results show that the HDMDc+Lift model consistently outperforms the linearized model, particularly during:
While the linearized model exhibits significant errors under large deviations from the operating point, the data-driven model maintains a coherent response, effectively capturing the dynamic coupling between variables.
This comparison highlights a key advantage of the proposed approach: the ability to represent complex nonlinear dynamics using an extended-order linear model.
Table 4 and Table 5 present a statistical analysis of prediction errors for both models.
To assess the relative predictive performance of the proposed HDMDc+Lift framework, a comparative analysis was conducted against HAVOK and standard HDMDc using the same validation dataset and evaluation metrics. The results are summarized in Table 6.
As shown in the table, the HAVOK model yields the lowest prediction accuracy for both outputs. This behavior is consistent with its linear representation of delay-embedded dynamics, which limits its ability to capture nonlinear input–output relationships, particularly in forced MIMO systems.
The HDMDc approach improves prediction performance by explicitly incorporating control inputs within a Hankel-embedded formulation. However, since the model operates in the original measurement space without nonlinear lifting, its representational capacity remains constrained.
In contrast, the proposed HDMDc+Lift model achieves the highest prediction accuracy for both outputs, with coefficients of determination exceeding .
The corresponding time-domain predictions, shown in Figure 9, visually confirm these quantitative results, highlighting the superior tracking capability and reduced prediction error of the lifted formulation over the entire prediction horizon.
4.7. Multi-Step Ahead Prediction Performance
To further evaluate the predictive robustness of the proposed HDMDc+Lift model, a multi-step ahead prediction analysis was conducted under a constant input assumption. The prediction performance was quantified using the mean squared error (MSE) for increasing prediction horizons, as reported in Table 7.
As expected, the prediction error gradually increases with the length of the prediction horizon due to the accumulation of modeling inaccuracies and truncation effects inherent to data-driven approximations. Nevertheless, the error growth remains bounded and does not exhibit divergence, indicating that the identified lifted Koopman model preserves the dominant dynamics and stability properties of the multitank system.
These results show that the HDMDc+Lift formulation maintains reliable predictive performance across the prediction horizons considered in Table 7, a desirable property for monitoring and forecasting applications. This behavior highlights the ability of the proposed approach to capture the essential system dynamics while limiting long-term error propagation.
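Under the constant-input assumption, an h-step prediction is an open-loop rollout of the identified linear model. The sketch below assumes that the matrices A, B, C and an initial (lifted) state are already available. This section does not state how the MSE values in Table 7 are aggregated, so the sketch averages the squared error over the first h steps as one plausible choice. Function names are hypothetical.

```python
import numpy as np

def rollout(A, B, C, x0, u_const, h):
    """Open-loop outputs y_1..y_h of x_{k+1} = A x_k + B u, y_k = C x_k, with u held constant."""
    x = np.asarray(x0, dtype=float)
    u = np.asarray(u_const, dtype=float)
    ys = []
    for _ in range(h):
        x = A @ x + B @ u
        ys.append(C @ x)
    return np.array(ys)

def horizon_mse(A, B, C, x0, u_const, y_true, horizons):
    """MSE over the first h steps for each h; y_true[k] is the measured output k+1 steps ahead."""
    y_hat = rollout(A, B, C, x0, u_const, max(horizons))
    return {h: float(np.mean((y_true[:h] - y_hat[:h]) ** 2)) for h in horizons}
```

Consider a stable model with a slightly wrong pole. Its error grows with the horizon but saturates at the steady-state mismatch instead of diverging, which is the qualitative behavior described above.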
4.8. Modal Analysis and Stability of the Identified Model
Beyond numerical accuracy, it is essential to verify that the HDMDc+Lift model preserves the dynamic structure of the original system. To this end, a pole analysis was conducted for both the linearized model and the data-driven model.
In control theory, system poles correspond to the eigenvalues of the system matrix and are directly associated with stability, response speed, and transient behavior. Therefore, if the HDMDc+Lift model provides a structurally accurate approximation, it should reproduce at least the dominant poles of the linearized model.
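For a discrete-time model x_{k+1} = A x_k + B u_k, the poles are the eigenvalues of A. Asymptotic stability requires every pole to lie strictly inside the unit circle. The NumPy sketch below uses hypothetical helper names. It sorts poles by modulus so that the dominant ones come first, and it maps them to the continuous-time plane via s = ln(λ)/Ts, where the sampling time Ts is an assumed parameter.

```python
import numpy as np

def discrete_poles(A):
    """Eigenvalues of A sorted by decreasing modulus (dominant poles first)."""
    lam = np.linalg.eigvals(A)
    return lam[np.argsort(-np.abs(lam))]

def is_schur_stable(A):
    """True if every eigenvalue of A lies strictly inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(A))) < 1.0)

def to_continuous(lam, Ts):
    """Map discrete poles to s = ln(lambda) / Ts (Ts is an assumed sampling time).

    A pole exactly at the origin maps to -inf, i.e. an infinitely fast mode.
    """
    return np.log(np.asarray(lam, dtype=complex)) / Ts
```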
Unlike the linearized model, which has a fixed number of poles determined by the physical system order, the HDMDc+Lift model can capture additional poles corresponding to nonlinear, coupled, or resonant dynamics not present in the classical formulation.
It is worth noting that, even for larger Hankel embedding orders (up to q = 20), no eigenvalues were observed outside the unit circle. The poles closest to the unit circle correspond to the dominant dynamical modes of the system and are consistently preserved across different values of q. As the embedding order increases, additional poles appear; however, these additional modes describe weak or hidden dynamics and remain clustered near the origin of the complex plane. Since a mode associated with a pole λ decays as |λ|^k, poles of small modulus vanish within a few samples, and their contribution to the overall system response is negligible. Consequently, the truncated SVD preserves stability while retaining the dominant spectral content.
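This section does not state the truncation rank used in the SVD. One common heuristic, shown here as an assumption rather than the authors' rule, keeps the smallest number of singular values that captures a prescribed fraction of the total energy, sum(s²). The function name and the default energy level are hypothetical.

```python
import numpy as np

def svd_rank(s, energy=0.999):
    """Smallest rank r whose leading singular values capture `energy` of sum(s**2)."""
    s = np.asarray(s, dtype=float)
    frac = np.cumsum(s**2) / np.sum(s**2)      # cumulative energy fraction
    r = int(np.searchsorted(frac, energy) + 1)
    return min(r, s.size)                       # guard against round-off when energy is close to 1
```

The returned rank could be passed as the truncation rank of the regression. Raising the energy level keeps more directions, and with them more of the weakly excited modes.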
Figure 10 compares the distances between the poles of the linearized model and those of the HDMDc+Lift model.
Figure 11 shows the complete set of poles estimated by the HDMDc+Lift model, highlighting the presence of additional dynamic modes.
Based on the spectral analysis presented in Figure 10, the HDMDc+Lift model not only achieves a better temporal fit than the linearized model but also accurately captures the essential poles of the system. As shown in Table 8, the Euclidean distance between the dominant poles of the two models is small, indicating that the data-driven model reproduces the fundamental system dynamics with high fidelity.
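Table 8 reports Euclidean distances between the dominant poles of the two models, but this section does not describe how the poles are paired. A simple assumption is a greedy nearest-neighbour assignment in the complex plane. The reference (linearized) poles are processed in order of decreasing modulus, so the dominant poles are matched first, and each estimated pole is used at most once. The function name is hypothetical.

```python
import numpy as np

def match_poles(ref, est):
    """Greedily pair each reference pole with its closest unused estimated pole.

    Returns a list of (reference_pole, estimated_pole, euclidean_distance).
    """
    est = list(np.asarray(est, dtype=complex))
    if len(est) < len(ref):
        raise ValueError("need at least as many estimated poles as reference poles")
    pairs = []
    for p in sorted(np.asarray(ref, dtype=complex), key=lambda z: -abs(z)):
        d = [abs(p - e) for e in est]          # distances in the complex plane
        j = int(np.argmin(d))
        pairs.append((p, est.pop(j), float(d[j])))
    return pairs
```

An optimal one-to-one assignment, such as the Hungarian algorithm, could replace the greedy rule when poles are closely spaced.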
4.9. Implications for Monitoring and Diagnosis
Although the primary focus of this work is data-driven modeling, the proposed framework has direct implications for system monitoring and fault diagnosis. The high fidelity of the identified model, together with its demonstrated structural stability, makes it well suited for integration into state estimation, observer-based monitoring, and residual generation schemes.
In particular, the linear representation in the lifted space enables the direct application of well-established diagnostic tools, such as Luenberger observers, Kalman filters, and model-based residual evaluators, while preserving the ability to capture nonlinear system dynamics. This constitutes a key advantage over purely black-box models, which often lack interpretability and robustness when deployed in diagnostic settings.
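Because the lifted model is linear, a standard observer can run on it directly. The sketch below is an illustration, not the authors' diagnostic scheme. It assumes identified matrices A, B, C and obtains a steady-state Kalman predictor gain by iterating the discrete Riccati equation, with hypothetical noise covariances Q and R. It then generates the output residual r_k = y_k − C x̂_k. Helper names are hypothetical.

```python
import numpy as np

def kalman_gain(A, C, Q, R, iters=500):
    """Steady-state predictor gain K from fixed-point iteration of the discrete Riccati equation."""
    P = np.eye(A.shape[0])
    for _ in range(iters):
        S = C @ P @ C.T + R                    # innovation covariance
        K = A @ P @ C.T @ np.linalg.inv(S)
        P = A @ P @ A.T - K @ S @ K.T + Q
    return K

def observer_residuals(A, B, C, K, u, y, x0_hat):
    """Run x_hat_{k+1} = A x_hat_k + B u_k + K r_k and return the residuals r_k = y_k - C x_hat_k."""
    x = np.asarray(x0_hat, dtype=float)
    res = []
    for uk, yk in zip(u, y):
        r = yk - C @ x
        res.append(r)
        x = A @ x + B @ uk + K @ r
    return np.array(res)
```

With an exact, fault-free model, the estimation error evolves as e_{k+1} = (A − K C) e_k. The residual therefore decays to zero, and a persistent nonzero residual signals a mismatch between the plant and the model.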
Moreover, the model’s ability to generalize beyond the training dataset is a critical requirement for reliable fault detection and isolation (FDI) in real-world systems. In practical operation, faults and abnormal conditions typically manifest as deviations from nominal behavior under operating regimes that may not have been explicitly observed during identification. The observed generalization capability therefore suggests that the proposed approach could help distinguish normal variability from abnormal behavior, potentially reducing false alarms and improving diagnostic sensitivity.
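One simple way to turn residuals into alarms is stated here purely as an assumption. A per-output threshold is set from fault-free residuals as the mean plus a multiple of the standard deviation of |r|. An alarm is raised only after the threshold has been exceeded for several consecutive samples, which trades a short detection delay for fewer false alarms. The function names and default parameters are hypothetical.

```python
import numpy as np

def fit_threshold(r_nominal, k_sigma=3.0):
    """Per-output threshold: mean + k_sigma * std of |r| on fault-free residuals."""
    a = np.abs(np.asarray(r_nominal, dtype=float))
    return a.mean(axis=0) + k_sigma * a.std(axis=0)

def detect(r, threshold, persistence=3):
    """Alarm at step k once |r| has exceeded the threshold (on any output)
    for `persistence` consecutive steps."""
    exceed = np.any(np.abs(np.asarray(r, dtype=float)) > threshold, axis=1)
    alarm = np.zeros(exceed.shape, dtype=bool)
    run = 0
    for k, e in enumerate(exceed):
        run = run + 1 if e else 0              # length of the current exceedance run
        alarm[k] = run >= persistence
    return alarm
```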
5. Conclusions
This work presented a fully data-driven modeling framework for forced nonlinear multiple-input multiple-output (MIMO) dynamical systems, grounded in finite-dimensional approximations of the Koopman operator through Hankel time-delay embeddings, explicit incorporation of control inputs, and observable lifting. The proposed HDMDc+Lift methodology enables the identification of an extended-order linear state-space representation directly from input–output data, without requiring explicit knowledge of the underlying physical equations.
The effectiveness of the approach was demonstrated on a real multitank system, where the identified model exhibited accurate multi-step predictions when evaluated on independent validation data. These results indicate that the proposed framework captures both transient dynamics and nonlinear coupling effects while maintaining numerical stability and structural coherence. The preservation of dominant dynamical modes, as confirmed by spectral analysis, further supports the validity of the Koopman-based approximation using delay-coordinate observables.
In addition to its modeling accuracy, the proposed method exhibits favorable computational performance, making it suitable for practical deployment in industrial monitoring and analysis settings. Overall, the results establish HDMDc+Lift as a flexible and effective data-driven modeling tool for forced nonlinear systems. Future work will focus on extending the framework toward state estimation, fault detection, and health monitoring applications, leveraging the identified linear representations within advanced control and diagnostic architectures.