1. Introduction
Efficient representation of high-dimensional spatiotemporal flow fields and accurate capture of their dynamic evolution have long posed significant challenges in computational fluid dynamics (CFD). This is especially true for flow systems, such as cylinder wake flows [1,2], which exhibit both classical and complex behaviors. Although traditional CFD methods deliver high computational accuracy, their substantial computational costs and storage requirements severely limit their applicability in real-time prediction and optimization.
To overcome these limitations, a wide range of model reduction strategies—broadly referred to as reduced-order models (ROMs) [3,4]—have been used to construct low-dimensional representations that capture dominant flow features. The fundamental strength of such approaches lies in establishing a robust mapping between high-dimensional dynamical systems and their low-dimensional surrogates—a task that involves not only compressing data but also uncovering the essential physical mechanisms underlying fluid motion.
Traditional ROM methods can be broadly categorized into three types:
- (1) Modal decomposition projection methods (e.g., POD-Galerkin [5]), which construct low-dimensional subspaces by extracting dominant modes and projecting the governing equations onto these subspaces.
- (2) Balanced truncation methods [6], which use controllability and observability analysis to retain the state variables most significant to input–output behavior.
- (3) Harmonic balance methods [7], which address periodic flows by approximating steady-state solutions in the frequency domain using a finite set of Fourier basis functions.
These approaches are physically grounded and interpretable, offering energy-optimal modal selection and high-fidelity reconstruction. However, they often rely on linear subspace assumptions and require precise numerical integration, which can be limiting for nonlinear or geometrically complex flows. Beyond traditional projection-based approaches, data-driven techniques have enabled a broader exploration of flow structures without reliance on governing equations. These can be classified into:
- (1) Data-driven modal methods, such as snapshot POD [8] and dynamic mode decomposition (DMD) [9], which extract coherent structures directly from simulation or experimental data. They are equation-free but often sensitive to noise and require extensive datasets.
- (2) Neural-network-based approaches, such as autoencoders [10] and their derivatives [11], which learn nonlinear latent representations but may suffer from overfitting and lack interpretability.
- (3) Hybrid strategies combining ROM and machine learning, such as POD-RBF, POD-LSTM [12], and physics-informed neural networks (PINNs) [13,14], which aim to balance physical relevance and learning flexibility.
In addition to standard POD, various studies have expanded its framework:
Colanera [15] proposed a robust spectral POD (SPOD) integrating robust principal component analysis for improved noise resilience. Gu [16] introduced frequency-domain POD (FD-POD) to reconstruct unsteady flows using embedded frequency information. Bui [17] developed POD-ISAT, combining POD with in situ adaptive tabulation for efficient steady-state PDE approximation.
Snapshot POD remains a popular tool due to its algorithmic simplicity and modal resolution, but its linear nature limits its ability to capture nonlinear interactions, such as turbulence. This motivates the search for more expressive and adaptive dimensionality reduction techniques.
Among these, clustering-based methods have emerged as powerful alternatives for representing nonlinear, multi-regime flow behaviors. Instead of relying on orthogonal basis functions, they group data points by similarity in feature space, making them well-suited to reflect local state transitions and complex dynamics. This is particularly useful in transitional flows, where traditional modal decompositions fall short.
While many clustering-based methods are inspired by ROM principles, in the strictest sense, ROMs refer to low-dimensional dynamical systems derived via projection (e.g., Galerkin or Petrov-Galerkin) from governing equations. The method proposed in this study does not involve such a projection and does not construct time-evolving models. Instead, it focuses on extracting and organizing spatial flow structures directly from data. Therefore, our approach is best categorized as a data-driven dimensionality reduction method, guided by physical insights from POD.
Building on these foundations, clustering-based dimensionality reduction methods have shown increasing promise. Burkardt et al. [18] proposed a CVT-based approach but did not address mode importance. Iqbal et al. [19] introduced modified pole clustering for improved initialization, with emphasis on clustering rather than reduction. Huera-Huarte and D’Adamo [20,21] combined clustering and POD to analyze vortex dynamics, yet lacked an effective mode-ranking strategy. Wei et al. [22] presented CROM, which models transitions via a Markov matrix, but its dimensionality and sparsity hinder application to complex systems.
To tackle the dual challenges of unstable clustering center initialization and the absence of mode ranking, we propose a clustering-based dimensionality reduction method guided by POD structures (C-POD). This framework leverages POD to enhance initialization robustness and introduces a novel entropy-controlled Euclidean-to-probability mapping (ECEPM) to provide probabilistic, interpretable mode ranking.
We validate the C-POD framework on the one-dimensional Burgers’ equation and a two-dimensional cylinder wake flow. We further test its performance in an inverse reconstruction task using sparse sensor data.
To enhance conceptual clarity, Figure 1 summarizes the taxonomy of the methods discussed.
In summary, this study offers the following contributions:
- A clustering-based dimensionality reduction framework guided by POD, improving stability in nonlinear mode extraction.
- A novel ECEPM-based probabilistic mode-ranking method.
- Empirical validation demonstrating improved interpretability and reconstruction accuracy over traditional POD methods.
2. C-POD Method and Modal Sorting Method
2.1. C-POD Method
The proper orthogonal decomposition (POD) method, based on singular value decomposition (SVD), is a well-established, data-driven model reduction technique. POD generates a set of orthogonal basis functions that encapsulate the dominant features of a system. By employing linear combinations of selected basis functions, one can effectively approximate the original system, thereby transforming a high-dimensional problem into a low-dimensional representation. To conserve computational resources, the snapshot POD method proposed by Sirovich is frequently used for dimensionality reduction of the dataset [23]. However, POD is inherently a linear reduction technique. Although it captures the primary dynamic features of linear systems, it may not fully represent the complexities of nonlinear flows, particularly when nonlinear behaviors vary across multiple time steps and spatial domains. In such cases, the low-rank approximation afforded by POD may not sufficiently capture these nonlinear effects.
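Before moving to the clustering variant, the snapshot/SVD workflow described above can be made concrete with a minimal numpy sketch; the matrix sizes and the synthetic rank-5 data below are illustrative assumptions, not values from this study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy snapshot matrix: m spatial points (rows), n snapshots (columns).
# The data are built to have exact rank 5 so the truncation behavior is visible.
m, n = 200, 30
D = rng.standard_normal((m, 5)) @ rng.standard_normal((5, n))

# POD via the thin SVD: D = U S V^T; columns of U are the spatial modes.
U, S, Vt = np.linalg.svd(D, full_matrices=False)

# Rank-r approximation keeps only the r most energetic modes.
r = 5
D_r = (U[:, :r] * S[:r]) @ Vt[:r, :]

# Relative reconstruction error; exact to round-off here since rank(D) = 5.
err = np.linalg.norm(D - D_r) / np.linalg.norm(D)
```

With r below the true rank, `err` grows according to the discarded singular values, which is exactly the energy-based truncation criterion POD relies on.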
Clustering methods exhibit robust capabilities in extracting nonlinear features, effectively capturing patterns and regularities in data, especially within complex nonlinear systems. Moreover, these methods can autonomously adapt to diverse data patterns, mapping both linear and nonlinear datasets into a low-dimensional space that elucidates underlying dynamic characteristics. Consequently, this paper integrates clustering concepts with model reduction techniques by proposing a clustering-based dimensionality reduction framework, referred to as C-POD, to enhance the effectiveness of proper orthogonal decomposition (POD) in addressing nonlinear model reduction challenges. The construction process is detailed as follows:
- (1) Arrange the responses of m data points under n different operating conditions as columns to form the database matrix D:

D = [d1, d2, …, dn] ∈ Rm×n (1)

Here, di ∈ Rm (i = 1, 2, …, n) represents the data snapshot under the i-th operating condition.
- (2) Determine the initial cluster centers based on the snapshot POD method [23]. First, the correlation matrix of the snapshot matrix D is constructed:

C = (1/n) DᵀD ∈ Rn×n (2)

Next, the eigenvalue problem of the correlation matrix C is solved:

Cφi = λiφi, i = 1, 2, …, n (3)

Here, λi represents the eigenvalue, and φi is the corresponding eigenvector. The eigenvalues are arranged in descending order:

λ1 ≥ λ2 ≥ … ≥ λn ≥ 0 (4)

The corresponding eigenvectors φi form an orthogonal basis, and the first K eigenvectors, mapped back to snapshot space, are selected as the initial cluster centers:

μi(0) = Dφi, i = 1, 2, …, K (5)
- (3) Sample assignment. Each column vector dj (j = 1, 2, …, n) from the data matrix D is assigned to the nearest cluster center, forming K clusters:

Si(t) = { dj : ||dj − μi(t)|| ≤ ||dj − μl(t)||, l = 1, 2, …, K } (6)

Here, Si(t) represents the set of vectors in the i-th cluster, μi denotes the i-th cluster centroid, t is the current iteration count, and ||·|| represents the Euclidean distance norm.
- (4) Updating the cluster centers. The cluster centers are updated by calculating the mean of all vectors in each cluster Si, and this mean becomes the new cluster centroid:

μi(t+1) = (1/|Si(t)|) Σ dj∈Si(t) dj (7)

where Si(t) denotes the set of snapshots assigned to cluster i. Each μi can be interpreted as a local average attractor in the trajectory space of the flow field, analogous to a mean coherent structure in physical space.
- (5) Repeat the assignment and update steps in Equations (6) and (7) until the following convergence criterion is met:

Σi ||μi(t+1) − μi(t)||² < ε (8)

where ε is a small prescribed convergence tolerance.
- (6) To obtain the modal coefficients, project the original data matrix D onto the cluster bases μ = [μ1, μ2, …, μK] by solving the following least squares problem [24]:

ar = argmin a∈RK ||dr − μa||₂, r = 1, 2, …, n (9)

Here, ar ∈ RK is the modal coefficient vector corresponding to the r-th snapshot. Physically, solving the least squares problem in Equation (9) provides a projection of each snapshot dr onto the low-dimensional basis formed by the cluster centers μi. These cluster centers are not arbitrary vectors but statistically representative patterns extracted from the dataset. As such, the projection yields modal coefficients ar that quantify how much each typical flow pattern (i.e., cluster mode) contributes to the reconstruction of the current state. The full data reconstruction expression is:

D̃ = μA (10)
where A = [a1, a2, …, an] ∈ RK×n is the modal coefficient matrix, and D̃ is the reconstructed data matrix. Through the C-POD method, the database matrix D is decomposed into the low-dimensional clustering bases μ and the modal coefficient matrix A. The construction of the C-POD method is illustrated in Algorithm 1, which presents the pseudocode of the entire procedure.
Algorithm 1: C-POD Method
Input:
  D  // snapshot matrix
  K  // number of clusters
Output:
  μ  // cluster basis
  A = [a1 a2 … an]  // modal coefficients
Steps:
1. POD initialization
   - Solve the eigenvalue problem of the correlation matrix: Cφi = λiφi
   - Select the first K eigenvectors to form the initial cluster centers μi(0)
2. Clustering
   - Assign each snapshot vector to its closest cluster center
   - Recompute each cluster centroid as the mean of its assigned snapshots
   - Repeat assignment and update until convergence
3. Compute modal coefficients
   - Solve the least squares problem in Equation (9) for each modal coefficient vector ar ∈ RK
4. Return μ, A
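For readers who prefer runnable code to pseudocode, Algorithm 1 can be sketched in numpy as follows. This is our illustrative reading of the procedure (POD-initialized k-means followed by a least-squares projection); the function name `c_pod`, the toy data, and the convergence tolerance are assumptions, not values from this study.

```python
import numpy as np

def c_pod(D, K, n_iter=100, tol=1e-8):
    """Illustrative sketch of C-POD: POD-initialized k-means + least squares."""
    m, n = D.shape
    # POD initialization: eigen-decompose the snapshot correlation matrix.
    C = D.T @ D / n
    eigvals, eigvecs = np.linalg.eigh(C)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # reorder to descending
    phi = eigvecs[:, order[:K]]                    # first K eigenvectors
    mu = D @ phi                                   # initial centers in snapshot space
    for _ in range(n_iter):
        # Assignment: nearest center for each snapshot (squared distances).
        d2 = ((D[:, :, None] - mu[:, None, :]) ** 2).sum(axis=0)   # (n, K)
        labels = d2.argmin(axis=1)
        # Update: each center becomes the mean of its assigned snapshots.
        mu_new = np.column_stack([
            D[:, labels == k].mean(axis=1) if np.any(labels == k) else mu[:, k]
            for k in range(K)
        ])
        if np.linalg.norm(mu_new - mu) < tol:      # convergence criterion
            mu = mu_new
            break
        mu = mu_new
    # Modal coefficients: least-squares projection of D onto the cluster bases.
    A, *_ = np.linalg.lstsq(mu, D, rcond=None)
    return mu, A

rng = np.random.default_rng(1)
D = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))   # rank-3 toy data
mu, A = c_pod(D, K=3)
recon_err = np.linalg.norm(D - mu @ A) / np.linalg.norm(D)
```

Because the cluster centers are means of snapshots, they stay in the column space of D; for this rank-3 toy example the least-squares reconstruction is therefore essentially exact.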
2.2. Entropy-Controlled Euclidean-to-Probability Mapping Modal Sorting Method
One notable advantage of the POD method is its capacity to quantify the contribution of each mode to the reconstruction of the original data. This contribution is indicated by the singular values, which represent the projection strength of the original data matrix onto the orthogonal modes. Larger singular values imply a more significant contribution from the corresponding mode in reconstructing the data, thereby identifying it as a dominant mode that captures the primary operational characteristics of the system. In contrast, the C-POD method does not involve singular value decomposition during its construction, rendering it infeasible to assess mode contributions based on singular value magnitudes. However, in the initialization phase, we solve the eigenvalue problem of the snapshot correlation matrix—whose eigenvalues are mathematically related to the singular values of the original data matrix. These eigenvalues could theoretically be used to rank modes by importance. Yet, because C-POD replaces orthogonal POD modes with clustering centroids—which are neither orthogonal nor ordered—the original eigenvalue-based ranking becomes inconsistent and inapplicable. To address this, we propose a novel probabilistic mode sorting method: the entropy-controlled Euclidean-to-probability mapping (ECEPM), which offers a robust and interpretable way to evaluate mode significance within the C-POD framework. This probabilistic mode sorting framework incorporates both the spatial proximity between modes and snapshots, and the consistency of modal contributions. It effectively replaces energy-based sorting in POD by offering a statistical interpretation of mode importance, thereby enabling robust feature extraction and mode separation in the clustering-based dimensionality reduction method.
Euclidean distance: The Euclidean distance quantifies the average proximity between each clustering mode and the dataset samples. A smaller distance suggests that the corresponding mode plays a significant role across a broader range of system states, thus representing a dominant dynamic feature. This measure offers clear geometric and physical interpretability.
- (1) Euclidean distance matrix construction: Assume the database matrix D ∈ Rm×n and the clustering bases μ = [μ1, μ2, …, μK] ∈ Rm×K, with column vectors dj and μi, where j = 1, 2, …, n and i = 1, 2, …, K.
- (2) Compute the column-wise sums of squares for D and μ, as shown in Equations (11) and (12):

sD = ( ||d1||², ||d2||², …, ||dn||² ) (11)

sμ = ( ||μ1||², ||μ2||², …, ||μK||² ) (12)
- (3) Construct the inner product matrix G:

G = μᵀD ∈ RK×n (13)
- (4) Construct the squared Euclidean distance matrix:

C(2) = sμᵀ1(1×n) + 1(K×1)sD − 2G (14)

where 1 is a matrix with all elements equal to 1 (with the indicated dimensions), so the Euclidean distance matrix is obtained element-wise:

Cij = √(C(2)ij) (15)
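The construction in steps (2)–(4) rests on the identity ||μi − dj||² = ||μi||² + ||dj||² − 2⟨μi, dj⟩, which avoids forming pairwise differences explicitly. This can be checked numerically; the small random matrices below are illustrative choices of our own.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, K = 8, 6, 3
D = rng.standard_normal((m, n))       # snapshots as columns
U = rng.standard_normal((m, K))       # cluster bases as columns

# Column-wise sums of squares and the inner product matrix.
sD = (D ** 2).sum(axis=0)             # shape (n,)
sU = (U ** 2).sum(axis=0)             # shape (K,)
G = U.T @ D                           # shape (K, n)

# Squared-distance matrix via the expansion ...
C2 = sU[:, None] + sD[None, :] - 2.0 * G

# ... agrees with the direct pairwise computation.
C2_direct = ((U[:, :, None] - D[:, None, :]) ** 2).sum(axis=0)
max_dev = np.abs(C2 - C2_direct).max()
```

The expansion needs only one matrix product instead of K·n difference vectors, which is why it is the standard vectorized form of pairwise distance computation.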
Probability mapping function construction: Relying solely on distance may lead to overestimating the importance of modes with sporadic proximity. To enhance robustness, we introduce information entropy as a regulatory factor, penalizing modes with dispersed or unstable contributions. A lower entropy value indicates that the mode has a focused and consistent influence on specific samples, making it more reliable for mode ranking. After obtaining the Euclidean distance matrix C, the goal is to transform it into a weight probability mapping matrix W ∈ RK×n, where Wij represents the probability value of the i-th clustering base for the j-th sample. A higher probability value indicates a greater contribution of that clustering base to the current sample value, while satisfying Σi Wij = 1 for every sample j.
- (1) Take the negative of the distance matrix C and shift each column, to avoid overflow in the exponential calculations, resulting in the matrix Z:

Z = −C + 1K ⊗ m (16)

where m = (m1, m2, …, mn) with mj = mini Cij, 1K ∈ RK is a vector of ones, and ⊗ represents the outer product.
- (2) Perform the exponential operation on the shifted matrix and introduce the temperature parameter T, resulting in:

E = exp(Z/T) (17)

where the exponential is applied element-wise.
- (3) Normalize the exponential matrix column-wise to obtain the final probability weight matrix W:

W = E ⊘ (1K ⊗ s) (18)

where ⊘ represents element-wise division, s = (s1, s2, …, sn) with sj = Σi Eij is the vector of column sums, and 1K ⊗ s extends the column sums to a matrix in which every row equals s. In the construction of the probability mapping function, the temperature parameter T is used to adjust the entropy of the probability distribution and plays a critical role in shaping the entropy level of the probability mapping. A smaller T (e.g., T ≈ 0.1) leads to a more peaked distribution, favoring one dominant mode per snapshot, which increases mode separability but may result in instability. In contrast, a larger T (e.g., T > 5) produces a flatter distribution, balancing contributions across multiple modes and improving robustness but potentially reducing interpretability.
To balance these trade-offs, we recommend choosing T based on the desired sharpness of mode assignment. In this study, T was selected via grid search over the range [0.1, 5], and the optimal value was determined by evaluating the reconstruction performance using the Cr index. This approach ensures that the mode-ranking mechanism remains both physically meaningful and numerically stable.
Through the above entropy-controlled Euclidean-to-probability mapping method, the probability weight matrix W was obtained. For the different samples in the database matrix, Wij represents the probability value of the i-th clustering base for the j-th sample, and a higher probability value indicates a greater contribution of that clustering base to the current sample. Therefore, by examining the values in W, the importance of each clustering base in the C-POD method can be distinguished. The construction process of the C-POD method and the entropy-controlled Euclidean-to-probability mapping method is shown in Figure 2. In Section 3 and Section 4, to validate the effectiveness of the C-POD method, we applied it to two classic fluid dynamics problems: the one-dimensional Burgers’ equation and the two-dimensional cylinder wake flow. These problems exhibit distinct nonlinear characteristics, making them ideal for testing the practical applicability of the method.
The full procedure of the ECEPM is summarized in Algorithm 2.
Algorithm 2: ECEPM
Input:
  D  // snapshot data matrix
  μ  // clustering bases (K modes)
  T  // temperature parameter for entropy control
Output:
  W  // probability weight matrix
Steps:
1. Compute the squared column norms of D and μ: sD and sμ
2. Compute the inner product matrix: G = μᵀD
3. Compute the squared Euclidean distance matrix C(2) with entries (sμ)i + (sD)j − 2Gij, and take the element-wise square root to obtain C
4. Shift the distance matrix column-wise to prevent numerical overflow: Z = −C + 1K ⊗ m
5. Compute exponentials with entropy scaling: E = exp(Z/T)
6. Normalize E column-wise to obtain the probability matrix: W = E ⊘ (1K ⊗ s)  // columns sum to 1
Return: W  // each column Wj contains the probabilities of the modes for sample j
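A direct numpy transcription of Algorithm 2 might look as follows; the function name `ecepm` and the toy inputs are our own, and the shift and normalization follow the column-wise convention described above.

```python
import numpy as np

def ecepm(D, U, T=1.0):
    """Illustrative sketch of ECEPM: distances -> temperature-scaled probabilities.

    Returns W with W[i, j] = weight of mode i for snapshot j; columns sum to 1.
    """
    sD = (D ** 2).sum(axis=0)                  # squared norms of snapshots
    sU = (U ** 2).sum(axis=0)                  # squared norms of cluster bases
    G = U.T @ D                                # inner product matrix
    C = np.sqrt(np.maximum(sU[:, None] + sD[None, :] - 2.0 * G, 0.0))
    Z = -C + C.min(axis=0, keepdims=True)      # column-wise shift: entries <= 0
    E = np.exp(Z / T)                          # temperature-scaled exponentials
    return E / E.sum(axis=0, keepdims=True)    # column-wise normalization

rng = np.random.default_rng(3)
D = rng.standard_normal((20, 10))              # 10 toy snapshots, 20 points each
U = rng.standard_normal((20, 4))               # 4 toy cluster bases

W_sharp = ecepm(D, U, T=0.1)   # small T: near one-hot columns
W_flat = ecepm(D, U, T=10.0)   # large T: nearly uniform columns
```

Lowering T sharpens each column toward a single dominant mode, while raising it flattens the distribution; the column-wise shift leaves the probabilities unchanged because the constant cancels in the normalization.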
3. Burgers’ Equation and C-POD Method
3.1. Introduction to the One-Dimensional Burgers’ Equation
As a nonlinear partial differential equation, the Burgers’ equation holds significant research value across various domains, including fluid mechanics [25] and meteorological forecasting [26]. In addition to characterizing the propagation of nonlinear waves, it encapsulates essential physical phenomena, such as viscosity and diffusion. Its inherent nonlinearity facilitates the simulation of complex flow phenomena, particularly in convection-dominated fields where highly nonlinear behaviors are prevalent. Dimensionality reduction methods aim to minimize computational costs while preserving the fundamental characteristics of the system. Consequently, using the Burgers’ equation as a test case provides an effective means to evaluate the efficacy and accuracy of these models in addressing nonlinear and transient flow processes. In this study, the accuracy of the POD and C-POD methods was comparatively assessed using a one-dimensional Burgers’ equation as the benchmark case. The mathematical formulation of the Burgers’ equation is presented as follows:

∂u/∂t + c ∂u/∂x = β ∂²u/∂x² (19)
Let u(x, t) denote the wave amplitude, c represent the wave speed (i.e., the coefficient of the convection term), and β denote the dissipation coefficient. The one-dimensional Burgers’ equation incorporates both convection and diffusion characteristics, reflecting aspects of the heat conduction equation and the Navier–Stokes equations. Consequently, it establishes a conceptual link to both heat conduction and convection phenomena.
Consider the initial condition in the form of a Gaussian pulse, as shown in Equation (20):

u(x, 0) = exp( −(x − x0)² / (2σ²) ) (20)
Under this initial condition, the analytical solution to the one-dimensional Burgers’ equation is given by [27]:

u(x, t) = σ/√(σ² + 2βt) · exp( −(x − x0 − ct)² / (2(σ² + 2βt)) ) (21)

Here, x0 denotes the initial peak position of the wave, σ represents the width of the wave packet, and c is the propagation speed (consistent with the meaning and value specified in Equation (19)). The analytical solution describes a Gaussian wave packet propagating in the x-direction with velocity c. Notably, the width of the wave packet increases gradually over time, indicative of a diffusion phenomenon, while its amplitude decreases, reflecting a dissipation phenomenon that leads to gradual attenuation during propagation. A schematic illustration of the wave evolution in the one-dimensional Burgers’ equation is presented in
Figure 3. It is worth noting that although no artificial noise was explicitly added to the test data, numerical disturbances may naturally arise during simulation and data preprocessing. In light of this, the C-POD method framework incorporated proper orthogonal decomposition (POD) as the initialization step for clustering. By extracting dominant energy modes, POD inherently suppressed high-frequency perturbations and served as a natural denoising mechanism. Consequently, C-POD demonstrated a degree of robustness to noise throughout the training and reconstruction processes, enhancing its reliability in unsupervised reduced-order modeling.
3.2. Database Construction
To evaluate the performance of the POD and C-POD methods, a dataset based on the Burgers’ equation was first constructed. The initial conditions and parameters were set as follows:
Initial wave peak position: x0 = 0
Initial wave packet width: σ = 1
Spatial range: x ∈ [0, 10], with Nx = 1000 uniformly distributed sampling points
Temporal range: t ∈ [0, 5], with waveforms sampled at intervals of Δt = 0.5
Wave speed c range: c ∈ [0.1, 2]
Dissipation coefficient β range: β ∈ [0, 1]
For each pair of (c, β), the Burgers’ equation was solved at all time steps within the defined temporal range. At each time step, the spatial field u(x,t) was recorded. Each such spatial field—corresponding to a unique combination of (c, β, t)—was treated as an independent snapshot. As a result, the snapshot matrix was constructed by stacking spatial fields obtained from different (c, β, t) combinations as column vectors. This resulted in a final snapshot matrix D1 of size 1000 × 303, where 1000 is the number of spatial points and 303 is the total number of snapshot instances across all parameters and time steps. This construction strictly followed the traditional POD framework, ensuring that each snapshot represented a single spatial field, with variations across snapshots arising from changes in either parameters or time—but not a mixture within a snapshot.
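The snapshot-stacking procedure can be sketched as follows. The pulse formula is our reading of the spreading-Gaussian solution from Section 3.1 (an assumption), and the (c, β) sampling here is deliberately coarser than the full dataset, so the toy matrix is 1000 × 99 rather than 1000 × 303.

```python
import numpy as np

def burgers_gaussian(x, t, c, beta, x0=0.0, sigma=1.0):
    """Advection-diffusion Gaussian pulse (our reading of Eq. (21))."""
    s2 = sigma ** 2 + 2.0 * beta * t
    return sigma / np.sqrt(s2) * np.exp(-(x - x0 - c * t) ** 2 / (2.0 * s2))

Nx = 1000
x = np.linspace(0.0, 10.0, Nx)                   # spatial grid, x in [0, 10]
times = np.arange(0.0, 5.0 + 1e-9, 0.5)          # t = 0, 0.5, ..., 5
cs = [0.1, 1.0, 2.0]                             # a few sampled wave speeds
betas = [0.0, 0.5, 1.0]                          # a few dissipation coefficients

# Each (c, beta, t) combination contributes one spatial field as a column.
cols = [burgers_gaussian(x, t, c, b) for c in cs for b in betas for t in times]
D1 = np.column_stack(cols)                       # shape (1000, 99) for this sampling
```

Densifying the (c, β) sampling while keeping one spatial field per column reproduces the 1000 × 303 matrix described above.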
3.3. C-POD Method Accuracy Testing and Modal Decomposition
To evaluate the performance of the POD and C-POD methods, we conducted order reduction on the dataset and reconstructed the original data at various modal orders. The correlation coefficient and root mean square error (RMSE) were employed as evaluation metrics. While these two indicators capture different aspects of reconstruction quality (trend similarity and numerical deviation, respectively), their combined use is not only less intuitive but also somewhat cumbersome in comparative analysis. Therefore, we introduced a new metric, Cr, defined in Equation (22), which integrates both correlation and error information to provide a more concise and comprehensive assessment of model performance:
where the root mean square error (RMSE) and the Pearson correlation coefficient r are defined as shown in Equations (23) and (24):

RMSE = √( (1/n) Σi (yi − ŷi)² ) (23)

r = Σi (yi − ȳ)(ŷi − ŷ̄) / √( Σi (yi − ȳ)² · Σi (ŷi − ŷ̄)² ) (24)

where n is the number of data points, yi is the actual value, ŷi is the predicted value, ȳ is the mean of the actual data, and ŷ̄ is the mean of the predicted data. A higher Cr value corresponds to a larger RMSE or a weaker correlation coefficient, indicating poorer reduced-order modeling performance.
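The two ingredients of Cr are standard and can be computed as below; the exact combination in Equation (22) is not reproduced here, and the helper names and toy vectors are our own.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error (Eq. (23))."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (Eq. (24))."""
    yt = np.asarray(y_true) - np.mean(y_true)
    yp = np.asarray(y_pred) - np.mean(y_pred)
    return float(yt @ yp / np.sqrt((yt @ yt) * (yp @ yp)))

# Toy actual vs. predicted values: small deviations, well-tracked trend.
y = np.array([1.0, 2.0, 3.0, 4.0])
yhat = np.array([1.1, 1.9, 3.2, 3.8])
e = rmse(y, yhat)        # small numerical deviation
r = pearson_r(y, yhat)   # close to 1 for a well-tracked trend
```

Any Cr-style composite can then be built from `e` and `r`, with smaller values indicating better reconstruction under the convention used in this paper.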
Under varying levels of modal truncation, the datasets were reduced using both the POD and C-POD methods, and the resulting modal coefficients and bases were then employed to reconstruct the original datasets. Figure 4 and Figure 5 present the average and maximum values of Cr, computed using Equation (22); the average Cr value reflects the overall reduction capability, while the maximum value indicates the lower bound of this capability. As illustrated in Figure 4, both methods exhibited enhanced reduction performance as more modes were retained, stabilizing at higher modal orders. For modal orders below 6, the reduction capabilities of the two methods were comparable; above this threshold, the C-POD method significantly outperformed POD. Figure 5 further shows that, regarding the lower bound of reduction capability, C-POD was superior at modal orders 7, 8, and 9, whereas POD maintained an advantage at the other modal orders. These findings preliminarily highlight the superior reduction accuracy of the C-POD method.
To compare the modal extraction capabilities of the C-POD and POD methods, we analyzed the reduced-order modeling of the Burgers’ equation under the conditions c = 0.1, β = 0 and c = 2, β = 1. At times t = 0 and 2 s, both methods were applied to extract reduced-order modes, and the solutions reconstructed using only the first mode were analyzed. The temporal evolution of these first-mode-reconstructed solutions was tracked using the ECEPM method, as illustrated in Figure 6 and Figure 7.
As observed from Figure 6 and Figure 7, the first-mode-reconstructed solutions based on the C-POD method clearly exhibited time-dependent characteristics, including a gradual reduction in peak amplitude and forward propagation along the positive x-axis. In contrast, the first-mode reconstruction using the POD method showed minimal temporal variation, failing to capture the evolving features of the waveform.
Further comparison revealed that, in the case of C-POD, the rate of peak amplitude decay was positively correlated with the parameter β, while the propagation distance along the x-axis increased with wave speed c. These results suggested that, compared to POD, the C-POD method provided more physically meaningful first-mode reconstructions that better reflected the temporal evolution of the system.
5. Application Value of C-POD ROM in Inverse Problems
The Gappy POD method [31,32] is a derivative of the POD reduced-order model, primarily used to solve the full-field reconstruction inverse problem when only sparse sensor data are available. The accuracy of this method depends heavily on the precision of the reduced-order model under low-order conditions. In both the POD and C-POD methods, low-order modes represent the primary dynamics of the system, typically carrying most of the energy or important information, while high-order modes often represent finer details or noise. Low-order modes are typically associated with the dominant frequencies and significant physical behaviors, making them more reliable for data reconstruction. Through the analysis in Section 4, we found that, in the cylinder wake problem, the C-POD method had a greater advantage over POD at lower modal orders. This low-order advantage should be reflected even more strongly in inverse problems based on reduced-order methods. Continuing with the cylinder wake case described in Section 4.4.3, the inverse problem in this scenario is posed as follows: place a specified number of sparse sensors in the vorticity field of the cylinder wake and reconstruct the vorticity field in real time from the sparse sensor data alone, so as to capture the complete vortex evolution of the cylinder wake.
Here, 80% of the original dataset was randomly selected as the training set, and the remaining 20% was used as the validation set. Using the correlation coefficient filtering method proposed by Yuan [33], seven optimal sparse sensor locations were selected in the vorticity field behind the cylinder. Specifically, this method computes the Pearson correlation coefficient between each candidate sensor point and the low-order POD modal coefficients obtained from the training set, then ranks and selects the sensor locations with the highest absolute correlations. This ensures that the selected sensors are most sensitive to the dominant dynamic features of the flow field. These sparse sensors output real-time vorticity information at their respective positions. Based on the sparse sensor data, the gappy reconstruction methods based on POD (Gappy POD) and C-POD (Gappy C-POD) were used to reconstruct the cylinder wake flow field at different modal orders, and their reconstruction capabilities were measured, as shown in Figure 20 and Figure 21.
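The core of a gappy reconstruction—fitting modal coefficients to the sensor readings by least squares and then evaluating the full basis—can be sketched as follows. The basis, sensor indices, and synthetic field below are assumptions for illustration; the same routine applies whether the columns are POD modes (Gappy POD) or C-POD cluster bases (Gappy C-POD).

```python
import numpy as np

def gappy_reconstruct(basis, sensor_rows, sensor_values):
    """Fit modal coefficients to sparse sensor data, then rebuild the full field.

    basis: (m, K) modes (POD modes or C-POD cluster bases)
    sensor_rows: indices of the grid points carrying sensors
    sensor_values: readings at those points
    """
    M = basis[sensor_rows, :]                       # restrict modes to sensor locations
    a, *_ = np.linalg.lstsq(M, sensor_values, rcond=None)
    return basis @ a                                # full-field estimate

rng = np.random.default_rng(4)
m, K = 300, 4
basis = np.linalg.qr(rng.standard_normal((m, K)))[0]   # toy orthonormal basis
a_true = np.array([2.0, -1.0, 0.5, 0.3])
field = basis @ a_true                                  # "true" field lives in the basis

sensors = np.array([5, 40, 77, 120, 188, 231, 290])    # seven sparse sensor locations
field_hat = gappy_reconstruct(basis, sensors, field[sensors])
err = np.linalg.norm(field - field_hat) / np.linalg.norm(field)
```

With at least as many sensors as retained modes and a well-conditioned restricted matrix, the coefficients are recovered accurately; this is why the low-order accuracy of the underlying reduced basis dominates the quality of the inverse reconstruction.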
From Figure 20 and Figure 21, it can be observed that the reconstruction accuracy of the Gappy POD method first increased and then decreased as the modal order increased. At modal order 5, Gappy POD achieved its optimal reconstruction accuracy, with an average Cr value of 0.81 and a maximum Cr value of 0.97. For modal orders below 12, the reconstruction accuracy of the Gappy C-POD method followed a similar trend, first increasing and then decreasing; at modal order 6, its average Cr value was 0.65 and its maximum Cr value was 0.84. However, once the modal order exceeded 12, the reconstruction accuracy of Gappy C-POD improved again as the modal order increased: at modal order 20, the average Cr value dropped to 0.49, lower than the value of 0.65 at modal order 6. Since a lower Cr indicates better reconstruction, these results demonstrate the significant advantage of the Gappy C-POD method over the Gappy POD method in inverse reconstruction problems: the average reconstruction accuracy of Gappy C-POD was improved by 19.75% compared to Gappy POD ((0.81 − 0.65)/0.81), and the lower bound of reconstruction capability was improved by 13.4% ((0.97 − 0.84)/0.97). Furthermore, the Gappy C-POD method can achieve higher field reconstruction accuracy at higher modal orders, where more complex flow behaviors can be captured. Therefore, the Gappy C-POD method is better suited to field reconstruction in complex nonlinear systems.
Based on the above analysis, a modal order of 5 was chosen for the Gappy POD method and a modal order of 6 for the Gappy C-POD method. The reconstruction of the cylinder wake vorticity field at t = 141 s is shown in Figure 22 and Figure 23. From left to right, the three images in Figure 22 and Figure 23 show the cylinder wake vorticity field at t = 141 s, the reconstructed field with the sparse sensor layout, and the distribution of the relative reconstruction error. From Figure 22 and Figure 23, it can be seen that the maximum error of the field reconstruction using the Gappy C-POD method was 25%, smaller than the maximum error of 35% for the Gappy POD method.
Limitations and Future Improvements
While the proposed C-POD ROM achieved improved reconstruction accuracy and enhanced interpretability compared to traditional POD-based methods, certain limitations remain, particularly in practical applications.
The integration of clustering and entropy-based ranking introduces additional computational burden. When dealing with large-scale datasets or increasing the number of clusters, both memory usage and computation time grow considerably.
For tasks involving sparse sensor reconstruction, the spatial placement of sensors significantly affects the performance. If the sensors fail to cover dynamically important regions, the model may suffer from decreased reconstruction fidelity. This highlights the necessity of coupling reduced-order modeling with sensor placement optimization.
The performance of clustering and entropy-controlled ranking is sensitive to several hyperparameters, such as the number of clusters and entropy scaling coefficients. Inappropriate parameter settings can degrade the model’s effectiveness or lead to unstable results.
To address these limitations, future work may focus on:
- (1) Developing lightweight clustering algorithms or incremental learning schemes to reduce computational overhead.
- (2) Exploring adaptive sensor deployment strategies.
- (3) Investigating robust, data-driven hyperparameter tuning frameworks.
These improvements will enhance the scalability and robustness of C-POD ROM in more complex and data-limited fluid dynamics scenarios.
6. Conclusions
This work introduced a clustering-based dimensionality reduction method guided by POD structures (C-POD), aiming to address the instability of clustering center initialization and the lack of effective mode ranking in conventional clustering-based approaches. By integrating the orthogonality and interpretability of POD with the flexibility of unsupervised clustering, and by incorporating an entropy-controlled Euclidean-to-probability mapping (ECEPM) for robust mode evaluation, the proposed method enhanced both the stability and transparency of dimensionality reduction.
The C-POD method was validated through applications to convective flow problems, including the one-dimensional Burgers’ equation and the two-dimensional cylinder wake flow. The results demonstrated that C-POD effectively identified dominant flow structures and preserved essential dynamics under reduced-dimensional settings. Moreover, it showed strong resilience in sparse reconstruction tasks, delivering better performance when only limited sensor data or a small number of modes were available.
Beyond improved numerical accuracy, the method provided meaningful insights into flow physics, making it well-suited for real-time prediction, control, and inverse analysis of complex unsteady systems.
Future work will extend this framework to three-dimensional turbulent convective flows and explore adaptive clustering mechanisms that dynamically optimize cluster configurations, further enhancing model scalability and generalization.