1. Introduction
Efficient representation of high-dimensional spatiotemporal flow fields and accurate capture of their dynamic evolution have long posed significant challenges in computational fluid dynamics (CFD). This is especially true for flow systems, such as cylinder wake flows [1,2], which exhibit both classical and complex behaviors. Although traditional CFD methods deliver high computational accuracy, their substantial computational costs and storage requirements severely limit their applicability in real-time prediction and optimization.
To overcome these limitations, a wide range of model reduction strategies—broadly referred to as reduced-order models (ROMs) [3,4]—have been used to construct low-dimensional representations that capture dominant flow features. The fundamental strength of such approaches lies in establishing a robust mapping between high-dimensional dynamical systems and their low-dimensional surrogates—a task that involves not only compressing data but also uncovering the essential physical mechanisms underlying fluid motion.
Traditional ROM methods can be broadly categorized into three types:
- (1) Modal decomposition projection methods (e.g., POD-Galerkin [5]), which construct low-dimensional subspaces by extracting dominant modes and projecting the governing equations onto these subspaces.
- (2) Balanced truncation methods [6], which use controllability and observability analysis to retain the state variables most significant to input–output behavior.
- (3) Harmonic balance methods [7], which address periodic flows by approximating steady-state solutions in the frequency domain using a finite set of Fourier basis functions.
These approaches are physically grounded and interpretable, offering energy-optimal modal selection and high-fidelity reconstruction. However, they often rely on linear subspace assumptions and require precise numerical integration, which can be limiting for nonlinear or geometrically complex flows. Beyond traditional projection-based approaches, data-driven techniques have enabled a broader exploration of flow structures without reliance on governing equations. These can be classified into:
- (1) Data-driven modal methods, such as snapshot POD [8] and dynamic mode decomposition (DMD) [9], which extract coherent structures directly from simulation or experimental data. They are equation-free but often sensitive to noise and require extensive datasets.
- (2) Neural-network-based approaches, such as autoencoders [10] and their derivatives [11], which learn nonlinear latent representations but may suffer from overfitting and lack interpretability.
- (3) Hybrid strategies combining ROM and machine learning, such as POD-RBF, POD-LSTM [12], and physics-informed neural networks (PINNs) [13,14], which aim to balance physical relevance and learning flexibility.
In addition to standard POD, various studies have expanded its framework:
Colanera [15] proposed a robust spectral POD (SPOD) integrating robust principal component analysis for improved noise resilience. Gu [16] introduced frequency-domain POD (FD-POD) to reconstruct unsteady flows using embedded frequency information. Bui [17] developed POD-ISAT, combining POD with in situ adaptive tabulation for efficient steady-state PDE approximation.
Snapshot POD remains a popular tool due to its algorithmic simplicity and modal resolution, but its linear nature limits its ability to capture nonlinear interactions, such as turbulence. This motivates the search for more expressive and adaptive dimensionality reduction techniques.
Among these, clustering-based methods have emerged as powerful alternatives for representing nonlinear, multi-regime flow behaviors. Instead of relying on orthogonal basis functions, they group data points by similarity in feature space, making them well-suited to reflect local state transitions and complex dynamics. This is particularly useful in transitional flows, where traditional modal decompositions fall short.
While many clustering-based methods are inspired by ROM principles, in the strictest sense, ROMs refer to low-dimensional dynamical systems derived via projection (e.g., Galerkin or Petrov-Galerkin) from governing equations. The method proposed in this study does not involve such a projection and does not construct time-evolving models. Instead, it focuses on extracting and organizing spatial flow structures directly from data. Therefore, our approach is best categorized as a data-driven dimensionality reduction method, guided by physical insights from POD.
Building on these foundations, clustering-based dimensionality reduction methods have shown increasing promise. Burkardt et al. [18] proposed a CVT-based approach but did not address mode importance. Iqbal et al. [19] introduced modified pole clustering for improved initialization, with emphasis on clustering rather than reduction. Huera-Huarte and D’Adamo [20,21] combined clustering and POD to analyze vortex dynamics, yet lacked an effective mode-ranking strategy. Wei et al. [22] presented CROM, which models transitions via a Markov matrix, but its dimensionality and sparsity hinder application to complex systems.
To tackle the dual challenges of unstable clustering center initialization and the absence of mode ranking, we propose a clustering-based dimensionality reduction method guided by POD structures (C-POD). This framework leverages POD to enhance initialization robustness and introduces a novel entropy-controlled Euclidean-to-probability mapping (ECEPM) to provide probabilistic, interpretable mode ranking.
We validate the C-POD framework on the one-dimensional Burgers’ equation and a two-dimensional cylinder wake flow. We further test its performance in an inverse reconstruction task using sparse sensor data.
To enhance conceptual clarity, Figure 1 summarizes the taxonomy of the methods discussed.
In summary, this study offers the following contributions:
- A clustering-based dimensionality reduction framework guided by POD, improving stability in nonlinear mode extraction.
- A novel ECEPM-based probabilistic mode-ranking method.
- Empirical validation demonstrating improved interpretability and reconstruction accuracy over traditional POD methods.
2. C-POD Method and Modal Sorting Method
2.1. C-POD Method
The proper orthogonal decomposition (POD) method, based on singular value decomposition (SVD), is a well-established, data-driven model reduction technique. POD generates a set of orthogonal basis functions that encapsulate the dominant features of a system. By employing linear combinations of selected basis functions, one can effectively approximate the original system, thereby transforming a high-dimensional problem into a low-dimensional representation. To conserve computational resources, the snapshot POD method proposed by Sirovich is frequently used for dimensionality reduction of the dataset [23]. However, POD is inherently a linear reduction technique. Although it captures the primary dynamic features of linear systems, it may not fully represent the complexities of nonlinear flows, particularly when nonlinear behaviors vary across multiple time steps and spatial domains. In such cases, the low-rank approximation afforded by POD may not sufficiently capture these nonlinear effects.
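Before moving to the clustering variant, the snapshot/SVD workflow described above can be made concrete with a minimal numpy sketch; the matrix sizes and the synthetic rank-5 data below are illustrative assumptions, not values from this study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy snapshot matrix: m spatial points (rows), n snapshots (columns).
# The data are built to have exact rank 5 so the truncation behavior is visible.
m, n = 200, 30
D = rng.standard_normal((m, 5)) @ rng.standard_normal((5, n))

# POD via the thin SVD: D = U S V^T; columns of U are the spatial modes.
U, S, Vt = np.linalg.svd(D, full_matrices=False)

# Rank-r approximation keeps only the r most energetic modes.
r = 5
D_r = (U[:, :r] * S[:r]) @ Vt[:r, :]

# Relative reconstruction error; exact to round-off here since rank(D) = 5.
err = np.linalg.norm(D - D_r) / np.linalg.norm(D)
```

With r below the true rank, `err` grows according to the discarded singular values, which is exactly the energy-based truncation criterion POD relies on.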
Clustering methods exhibit robust capabilities in extracting nonlinear features, effectively capturing patterns and regularities in data, especially within complex nonlinear systems. Moreover, these methods can autonomously adapt to diverse data patterns, mapping both linear and nonlinear datasets into a low-dimensional space that elucidates underlying dynamic characteristics. Consequently, this paper integrates clustering concepts with model reduction techniques by proposing a clustering-based dimensionality reduction framework, referred to as C-POD, to enhance the effectiveness of proper orthogonal decomposition (POD) in addressing nonlinear model reduction challenges. The construction process is detailed as follows:
- (1) Arrange the responses of m data points under n different operating conditions as columns to form the database matrix D:

D = [d1, d2, …, dn] ∈ Rm×n (1)

Here, di ∈ Rm (i = 1, 2, …, n) represents the data snapshot under the i-th operating condition.
- (2) Determine the initial cluster centers based on the snapshot POD method [23]. First, the correlation matrix of the snapshot matrix D is constructed:

C = (1/n) DᵀD ∈ Rn×n (2)

Next, the eigenvalue problem of the correlation matrix C is solved:

Cφi = λiφi, i = 1, 2, …, n (3)

Here, λi represents the eigenvalue, and φi is the corresponding eigenvector. The eigenvalues are arranged in descending order:

λ1 ≥ λ2 ≥ … ≥ λn ≥ 0 (4)

The corresponding eigenvectors φi form an orthogonal basis, and the first K eigenvectors, mapped back to snapshot space, are selected as the initial cluster centers:

μi(0) = Dφi, i = 1, 2, …, K (5)
- (3) Sample assignment. Each column vector dj (j = 1, 2, …, n) from the data matrix D is assigned to the nearest cluster center, forming K clusters:

Si(t) = { dj : ||dj − μi(t)|| ≤ ||dj − μl(t)||, l = 1, 2, …, K } (6)

Here, Si(t) represents the set of vectors in the i-th cluster, μi denotes the i-th cluster centroid, t is the current iteration count, and ||·|| represents the Euclidean distance norm.
- (4) Updating the cluster centers. The cluster centers are updated by calculating the mean of all vectors in each cluster Si, and this mean becomes the new cluster centroid:

μi(t+1) = (1/|Si(t)|) Σ dj∈Si(t) dj (7)

where Si(t) denotes the set of snapshots assigned to cluster i. Each μi can be interpreted as a local average attractor in the trajectory space of the flow field, analogous to a mean coherent structure in physical space.
- (5) Repeat the assignment and update steps in Equations (6) and (7) until the following convergence criterion is met:

Σi ||μi(t+1) − μi(t)||² < ε (8)

where ε is a small prescribed convergence tolerance.
- (6) To obtain the modal coefficients, project the original data matrix D onto the cluster bases μ = [μ1, μ2, …, μK] by solving the following least squares problem [24]:

ar = argmin a∈RK ||dr − μa||₂, r = 1, 2, …, n (9)

Here, ar ∈ RK is the modal coefficient vector corresponding to the r-th snapshot. Physically, solving the least squares problem in Equation (9) provides a projection of each snapshot dr onto the low-dimensional basis formed by the cluster centers μi. These cluster centers are not arbitrary vectors but statistically representative patterns extracted from the dataset. As such, the projection yields modal coefficients ar that quantify how much each typical flow pattern (i.e., cluster mode) contributes to the reconstruction of the current state. The full data reconstruction expression is:

D̃ = μA (10)
where A = [a1, a2, …, an] ∈ RK×n is the modal coefficient matrix, and D̃ is the reconstructed data matrix. Through the C-POD method, the database matrix D is decomposed into the low-dimensional clustering bases μ and the modal coefficient matrix A. The construction of the C-POD method is illustrated in Algorithm 1, which presents the pseudocode of the entire procedure.
Algorithm 1: C-POD Method
Input:
  D  // snapshot matrix
  K  // number of clusters
Output:
  μ  // cluster basis
  A = [a1 a2 … an]  // modal coefficients
Steps:
1. POD initialization
   - Solve the eigenvalue problem of the correlation matrix: Cφi = λiφi
   - Select the first K eigenvectors to form the initial cluster centers μi(0)
2. Clustering
   - Assign each snapshot vector to its closest cluster center
   - Recompute each cluster centroid as the mean of its assigned snapshots
   - Repeat assignment and update until convergence
3. Compute modal coefficients
   - Solve the least squares problem in Equation (9) for each modal coefficient vector ar ∈ RK
4. Return μ, A
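For readers who prefer runnable code to pseudocode, Algorithm 1 can be sketched in numpy as follows. This is our illustrative reading of the procedure (POD-initialized k-means followed by a least-squares projection); the function name `c_pod`, the toy data, and the convergence tolerance are assumptions, not values from this study.

```python
import numpy as np

def c_pod(D, K, n_iter=100, tol=1e-8):
    """Illustrative sketch of C-POD: POD-initialized k-means + least squares."""
    m, n = D.shape
    # POD initialization: eigen-decompose the snapshot correlation matrix.
    C = D.T @ D / n
    eigvals, eigvecs = np.linalg.eigh(C)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # reorder to descending
    phi = eigvecs[:, order[:K]]                    # first K eigenvectors
    mu = D @ phi                                   # initial centers in snapshot space
    for _ in range(n_iter):
        # Assignment: nearest center for each snapshot (squared distances).
        d2 = ((D[:, :, None] - mu[:, None, :]) ** 2).sum(axis=0)   # (n, K)
        labels = d2.argmin(axis=1)
        # Update: each center becomes the mean of its assigned snapshots.
        mu_new = np.column_stack([
            D[:, labels == k].mean(axis=1) if np.any(labels == k) else mu[:, k]
            for k in range(K)
        ])
        if np.linalg.norm(mu_new - mu) < tol:      # convergence criterion
            mu = mu_new
            break
        mu = mu_new
    # Modal coefficients: least-squares projection of D onto the cluster bases.
    A, *_ = np.linalg.lstsq(mu, D, rcond=None)
    return mu, A

rng = np.random.default_rng(1)
D = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))   # rank-3 toy data
mu, A = c_pod(D, K=3)
recon_err = np.linalg.norm(D - mu @ A) / np.linalg.norm(D)
```

Because the cluster centers are means of snapshots, they stay in the column space of D; for this rank-3 toy example the least-squares reconstruction is therefore essentially exact.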
2.2. Entropy-Controlled Euclidean-to-Probability Mapping Modal Sorting Method
One notable advantage of the POD method is its capacity to quantify the contribution of each mode to the reconstruction of the original data. This contribution is indicated by the singular values, which represent the projection strength of the original data matrix onto the orthogonal modes. Larger singular values imply a more significant contribution from the corresponding mode in reconstructing the data, thereby identifying it as a dominant mode that captures the primary operational characteristics of the system. In contrast, the C-POD method does not involve singular value decomposition during its construction, rendering it infeasible to assess mode contributions based on singular value magnitudes. However, in the initialization phase, we solve the eigenvalue problem of the snapshot correlation matrix—whose eigenvalues are mathematically related to the singular values of the original data matrix. These eigenvalues could theoretically be used to rank modes by importance. Yet, because C-POD replaces orthogonal POD modes with clustering centroids—which are neither orthogonal nor ordered—the original eigenvalue-based ranking becomes inconsistent and inapplicable. To address this, we propose a novel probabilistic mode sorting method: the entropy-controlled Euclidean-to-probability mapping (ECEPM), which offers a robust and interpretable way to evaluate mode significance within the C-POD framework. This probabilistic mode sorting framework incorporates both the spatial proximity between modes and snapshots, and the consistency of modal contributions. It effectively replaces energy-based sorting in POD by offering a statistical interpretation of mode importance, thereby enabling robust feature extraction and mode separation in the clustering-based dimensionality reduction method.
Euclidean distance: The Euclidean distance quantifies the average proximity between each clustering mode and the dataset samples. A smaller distance suggests that the corresponding mode plays a significant role across a broader range of system states, thus representing a dominant dynamic feature. This measure offers clear geometric and physical interpretability.
- (1) Euclidean distance matrix construction: Assume the database matrix D ∈ Rm×n and the clustering bases μ = [μ1, μ2, …, μK] ∈ Rm×K, with column vectors dj and μi, where j = 1, 2, …, n and i = 1, 2, …, K.
- (2) Compute the column-wise sums of squares for D and μ, as shown in Equations (11) and (12):

sD = ( ||d1||², ||d2||², …, ||dn||² ) (11)

sμ = ( ||μ1||², ||μ2||², …, ||μK||² ) (12)
- (3) Construct the inner product matrix G:

G = μᵀD ∈ RK×n (13)
- (4) Construct the squared Euclidean distance matrix:

C(2) = sμᵀ1(1×n) + 1(K×1)sD − 2G (14)

where 1 is a matrix with all elements equal to 1 (with the indicated dimensions), so the Euclidean distance matrix is obtained element-wise:

Cij = √(C(2)ij) (15)
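The construction in steps (2)–(4) rests on the identity ||μi − dj||² = ||μi||² + ||dj||² − 2⟨μi, dj⟩, which avoids forming pairwise differences explicitly. This can be checked numerically; the small random matrices below are illustrative choices of our own.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, K = 8, 6, 3
D = rng.standard_normal((m, n))       # snapshots as columns
U = rng.standard_normal((m, K))       # cluster bases as columns

# Column-wise sums of squares and the inner product matrix.
sD = (D ** 2).sum(axis=0)             # shape (n,)
sU = (U ** 2).sum(axis=0)             # shape (K,)
G = U.T @ D                           # shape (K, n)

# Squared-distance matrix via the expansion ...
C2 = sU[:, None] + sD[None, :] - 2.0 * G

# ... agrees with the direct pairwise computation.
C2_direct = ((U[:, :, None] - D[:, None, :]) ** 2).sum(axis=0)
max_dev = np.abs(C2 - C2_direct).max()
```

The expansion needs only one matrix product instead of K·n difference vectors, which is why it is the standard vectorized form of pairwise distance computation.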
Probability mapping function construction: Relying solely on distance may lead to overestimating the importance of modes with sporadic proximity. To enhance robustness, we introduce information entropy as a regulatory factor, penalizing modes with dispersed or unstable contributions. A lower entropy value indicates that the mode has a focused and consistent influence on specific samples, making it more reliable for mode ranking. After obtaining the Euclidean distance matrix C, the goal is to transform it into a weight probability mapping matrix W ∈ RK×n, where Wij represents the probability value of the i-th clustering base for the j-th sample. A higher probability value indicates a greater contribution of that clustering base to the current sample value, while satisfying Σi Wij = 1 for every sample j.
- (1) Take the negative of the distance matrix C and shift each column, to avoid overflow in the exponential calculations, resulting in the matrix Z:

Z = −C + 1K ⊗ m (16)

where m = (m1, m2, …, mn) with mj = mini Cij, 1K ∈ RK is a vector of ones, and ⊗ represents the outer product.
- (2) Perform the exponential operation on the shifted matrix and introduce the temperature parameter T, resulting in:

E = exp(Z/T) (17)

where the exponential is applied element-wise.
- (3) Normalize the exponential matrix column-wise to obtain the final probability weight matrix W:

W = E ⊘ (1K ⊗ s) (18)

where ⊘ represents element-wise division, s = (s1, s2, …, sn) with sj = Σi Eij is the vector of column sums, and 1K ⊗ s extends the column sums to a matrix in which every row equals s. In the construction of the probability mapping function, the temperature parameter T is used to adjust the entropy of the probability distribution and plays a critical role in shaping the entropy level of the probability mapping. A smaller T (e.g., T ≈ 0.1) leads to a more peaked distribution, favoring one dominant mode per snapshot, which increases mode separability but may result in instability. In contrast, a larger T (e.g., T > 5) produces a flatter distribution, balancing contributions across multiple modes and improving robustness but potentially reducing interpretability.
To balance these trade-offs, we recommend choosing T based on the desired sharpness of mode assignment. In this study, T was selected via grid search over the range [0.1, 5], and the optimal value was determined by evaluating the reconstruction performance using the Cr index. This approach ensures that the mode-ranking mechanism remains both physically meaningful and numerically stable.
Through the above entropy-controlled Euclidean-to-probability mapping method, the probability weight matrix W was obtained. For the different samples in the database matrix, Wij represents the probability value of the i-th clustering base for the j-th sample, and a higher probability value indicates a greater contribution of that clustering base to the current sample. Therefore, by examining the values in W, the importance of each clustering base in the C-POD method can be distinguished. The construction process of the C-POD method and the entropy-controlled Euclidean-to-probability mapping method is shown in Figure 2. In Section 3 and Section 4, to validate the effectiveness of the C-POD method, we applied it to two classic fluid dynamics problems: the one-dimensional Burgers’ equation and the two-dimensional cylinder wake flow. These problems exhibit distinct nonlinear characteristics, making them ideal for testing the practical applicability of the method.
The full procedure of the ECEPM is summarized in Algorithm 2.
Algorithm 2: ECEPM
Input:
  D  // snapshot data matrix
  μ  // clustering bases (K modes)
  T  // temperature parameter for entropy control
Output:
  W  // probability weight matrix
Steps:
1. Compute the squared column norms of D and μ: sD and sμ
2. Compute the inner product matrix: G = μᵀD
3. Compute the squared Euclidean distance matrix C(2) with entries (sμ)i + (sD)j − 2Gij, and take the element-wise square root to obtain C
4. Shift the distance matrix column-wise to prevent numerical overflow: Z = −C + 1K ⊗ m
5. Compute exponentials with entropy scaling: E = exp(Z/T)
6. Normalize E column-wise to obtain the probability matrix: W = E ⊘ (1K ⊗ s)  // columns sum to 1
Return: W  // each column Wj contains the probabilities of the modes for sample j
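A direct numpy transcription of Algorithm 2 might look as follows; the function name `ecepm` and the toy inputs are our own, and the shift and normalization follow the column-wise convention described above.

```python
import numpy as np

def ecepm(D, U, T=1.0):
    """Illustrative sketch of ECEPM: distances -> temperature-scaled probabilities.

    Returns W with W[i, j] = weight of mode i for snapshot j; columns sum to 1.
    """
    sD = (D ** 2).sum(axis=0)                  # squared norms of snapshots
    sU = (U ** 2).sum(axis=0)                  # squared norms of cluster bases
    G = U.T @ D                                # inner product matrix
    C = np.sqrt(np.maximum(sU[:, None] + sD[None, :] - 2.0 * G, 0.0))
    Z = -C + C.min(axis=0, keepdims=True)      # column-wise shift: entries <= 0
    E = np.exp(Z / T)                          # temperature-scaled exponentials
    return E / E.sum(axis=0, keepdims=True)    # column-wise normalization

rng = np.random.default_rng(3)
D = rng.standard_normal((20, 10))              # 10 toy snapshots, 20 points each
U = rng.standard_normal((20, 4))               # 4 toy cluster bases

W_sharp = ecepm(D, U, T=0.1)   # small T: near one-hot columns
W_flat = ecepm(D, U, T=10.0)   # large T: nearly uniform columns
```

Lowering T sharpens each column toward a single dominant mode, while raising it flattens the distribution; the column-wise shift leaves the probabilities unchanged because the constant cancels in the normalization.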
3. Burgers’ Equation and C-POD Method
3.1. Introduction to the One-Dimensional Burgers’ Equation
As a nonlinear partial differential equation, the Burgers’ equation holds significant research value across various domains, including fluid mechanics [25] and meteorological forecasting [26]. In addition to characterizing the propagation of nonlinear waves, it encapsulates essential physical phenomena, such as viscosity and diffusion. Its inherent nonlinearity facilitates the simulation of complex flow phenomena, particularly in convection-dominated fields where highly nonlinear behaviors are prevalent. Dimensionality reduction methods aim to minimize computational costs while preserving the fundamental characteristics of the system. Consequently, using the Burgers’ equation as a test case provides an effective means to evaluate the efficacy and accuracy of these models in addressing nonlinear and transient flow processes. In this study, the accuracy of the POD and C-POD methods was comparatively assessed using a one-dimensional Burgers’ equation as the benchmark case. The mathematical formulation of the Burgers’ equation is presented as follows:

∂u/∂t + c ∂u/∂x = β ∂²u/∂x² (19)
Let u(x, t) denote the wave amplitude, c represent the wave speed (i.e., the coefficient of the convection term), and β denote the dissipation coefficient. The one-dimensional Burgers’ equation incorporates both convection and diffusion characteristics, reflecting aspects of the heat conduction equation and the Navier–Stokes equations. Consequently, it establishes a conceptual link to both heat conduction and convection phenomena.
Consider the initial condition in the form of a Gaussian pulse, as shown in Equation (20):

u(x, 0) = exp( −(x − x0)² / (2σ²) ) (20)
Under this initial condition, the analytical solution to the one-dimensional Burgers’ equation is given by [27]:

u(x, t) = σ/√(σ² + 2βt) · exp( −(x − x0 − ct)² / (2(σ² + 2βt)) ) (21)

Here, x0 denotes the initial peak position of the wave, σ represents the width of the wave packet, and c is the propagation speed (consistent with the meaning and value specified in Equation (19)). The analytical solution describes a Gaussian wave packet propagating in the x-direction with velocity c. Notably, the width of the wave packet increases gradually over time, indicative of a diffusion phenomenon, while its amplitude decreases, reflecting a dissipation phenomenon that leads to gradual attenuation during propagation. A schematic illustration of the wave evolution in the one-dimensional Burgers’ equation is presented in
Figure 3. It is worth noting that although no artificial noise was explicitly added to the test data, numerical disturbances may naturally arise during simulation and data preprocessing. In light of this, the C-POD method framework incorporated proper orthogonal decomposition (POD) as the initialization step for clustering. By extracting dominant energy modes, POD inherently suppressed high-frequency perturbations and served as a natural denoising mechanism. Consequently, C-POD demonstrated a degree of robustness to noise throughout the training and reconstruction processes, enhancing its reliability in unsupervised reduced-order modeling.
3.2. Database Construction
To evaluate the performance of the POD and C-POD methods, a dataset based on the Burgers’ equation was first constructed. The initial conditions and parameters were set as follows:
Initial wave peak position: x0 = 0
Initial wave packet width: σ = 1
Spatial range: x ∈ [0, 10], with Nx = 1000 uniformly distributed sampling points
Temporal range: t ∈ [0, 5], with waveforms sampled at intervals of Δt = 0.5
Wave speed c range: c ∈ [0.1, 2]
Dissipation coefficient β range: β ∈ [0, 1]
For each pair of (c, β), the Burgers’ equation was solved at all time steps within the defined temporal range. At each time step, the spatial field u(x,t) was recorded. Each such spatial field—corresponding to a unique combination of (c, β, t)—was treated as an independent snapshot. As a result, the snapshot matrix was constructed by stacking spatial fields obtained from different (c, β, t) combinations as column vectors. This resulted in a final snapshot matrix D1 of size 1000 × 303, where 1000 is the number of spatial points and 303 is the total number of snapshot instances across all parameters and time steps. This construction strictly followed the traditional POD framework, ensuring that each snapshot represented a single spatial field, with variations across snapshots arising from changes in either parameters or time—but not a mixture within a snapshot.
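The snapshot-stacking procedure can be sketched as follows. The pulse formula is our reading of the spreading-Gaussian solution from Section 3.1 (an assumption), and the (c, β) sampling here is deliberately coarser than the full dataset, so the toy matrix is 1000 × 99 rather than 1000 × 303.

```python
import numpy as np

def burgers_gaussian(x, t, c, beta, x0=0.0, sigma=1.0):
    """Advection-diffusion Gaussian pulse (our reading of Eq. (21))."""
    s2 = sigma ** 2 + 2.0 * beta * t
    return sigma / np.sqrt(s2) * np.exp(-(x - x0 - c * t) ** 2 / (2.0 * s2))

Nx = 1000
x = np.linspace(0.0, 10.0, Nx)                   # spatial grid, x in [0, 10]
times = np.arange(0.0, 5.0 + 1e-9, 0.5)          # t = 0, 0.5, ..., 5
cs = [0.1, 1.0, 2.0]                             # a few sampled wave speeds
betas = [0.0, 0.5, 1.0]                          # a few dissipation coefficients

# Each (c, beta, t) combination contributes one spatial field as a column.
cols = [burgers_gaussian(x, t, c, b) for c in cs for b in betas for t in times]
D1 = np.column_stack(cols)                       # shape (1000, 99) for this sampling
```

Densifying the (c, β) sampling while keeping one spatial field per column reproduces the 1000 × 303 matrix described above.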
3.3. C-POD Method Accuracy Testing and Modal Decomposition
To evaluate the performance of the POD and C-POD methods, we conducted order reduction on the dataset and reconstructed the original data at various modal orders. The correlation coefficient and root mean square error (RMSE) were employed as evaluation metrics. While these two indicators capture different aspects of reconstruction quality (trend similarity and numerical deviation, respectively), their combined use is not only less intuitive but also somewhat cumbersome in comparative analysis. Therefore, we introduced a new metric, Cr, defined in Equation (22), which integrates both correlation and error information to provide a more concise and comprehensive assessment of model performance:
where the root mean square error (RMSE) and the Pearson correlation coefficient r are defined as shown in Equations (23) and (24):

RMSE = √( (1/n) Σi (yi − ŷi)² ) (23)

r = Σi (yi − ȳ)(ŷi − ŷ̄) / √( Σi (yi − ȳ)² · Σi (ŷi − ŷ̄)² ) (24)

where n is the number of data points, yi is the actual value, ŷi is the predicted value, ȳ is the mean of the actual data, and ŷ̄ is the mean of the predicted data. A higher Cr value corresponds to a larger RMSE or a weaker correlation coefficient, indicating poorer reduced-order modeling performance.
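The two ingredients of Cr are standard and can be computed as below; the exact combination in Equation (22) is not reproduced here, and the helper names and toy vectors are our own.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error (Eq. (23))."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (Eq. (24))."""
    yt = np.asarray(y_true) - np.mean(y_true)
    yp = np.asarray(y_pred) - np.mean(y_pred)
    return float(yt @ yp / np.sqrt((yt @ yt) * (yp @ yp)))

# Toy actual vs. predicted values: small deviations, well-tracked trend.
y = np.array([1.0, 2.0, 3.0, 4.0])
yhat = np.array([1.1, 1.9, 3.2, 3.8])
e = rmse(y, yhat)        # small numerical deviation
r = pearson_r(y, yhat)   # close to 1 for a well-tracked trend
```

Any Cr-style composite can then be built from `e` and `r`, with smaller values indicating better reconstruction under the convention used in this paper.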
Under varying levels of modal truncation, the datasets were reduced using both the POD and C-POD methods, and the resulting modal coefficients and bases were then employed to reconstruct the original datasets. Figure 4 and Figure 5 present the average and maximum values of Cr, computed using Equation (22); the average Cr value reflects the overall reduction capability, while the maximum value indicates the lower bound of this capability. As illustrated in Figure 4, both methods exhibited enhanced reduction performance as more modes were retained, stabilizing at higher modal orders. For modal orders below 6, the reduction capabilities of the two methods were comparable; above this threshold, the C-POD method significantly outperformed POD. Figure 5 further shows that, regarding the lower bound of reduction capability, C-POD was superior at modal orders 7, 8, and 9, whereas POD maintained an advantage at the other modal orders. These findings preliminarily highlight the superior reduction accuracy of the C-POD method.
To compare the modal extraction capabilities of the C-POD and POD methods, we analyzed the reduced-order modeling of the Burgers’ equation under the conditions c = 0.1, β = 0 and c = 2, β = 1. At times t = 0 and 2 s, both methods were applied to extract reduced-order modes, and the solutions reconstructed using only the first mode were analyzed. The temporal evolution of these first-mode-reconstructed solutions was tracked using the ECEPM method, as illustrated in Figure 6 and Figure 7.
As observed from Figure 6 and Figure 7, the first-mode-reconstructed solutions based on the C-POD method clearly exhibited time-dependent characteristics, including a gradual reduction in peak amplitude and forward propagation along the positive x-axis. In contrast, the first-mode reconstruction using the POD method showed minimal temporal variation, failing to capture the evolving features of the waveform.
Further comparison revealed that, in the case of C-POD, the rate of peak amplitude decay was positively correlated with the parameter β, while the propagation distance along the x-axis increased with wave speed c. These results suggested that, compared to POD, the C-POD method provided more physically meaningful first-mode reconstructions that better reflected the temporal evolution of the system.
5. Application Value of C-POD ROM in Inverse Problems
The Gappy POD method [31,32] is a derivative of the POD reduced-order model, primarily used to solve the full-field reconstruction inverse problem when only sparse sensor data are available. The accuracy of this method depends heavily on the precision of the reduced-order model under low-order conditions. In both the POD and C-POD methods, low-order modes represent the primary dynamics of the system, typically carrying most of the energy or important information, while high-order modes often represent finer details or noise. Low-order modes are typically associated with the dominant frequencies and significant physical behaviors, making them more reliable for data reconstruction. Through the analysis in Section 4, we found that, in the cylinder wake problem, the C-POD method had a greater advantage over POD at lower modal orders. This low-order advantage should be reflected even more strongly in inverse problems based on reduced-order methods. Continuing with the cylinder wake case described in Section 4.4.3, the inverse problem in this scenario is posed as follows: place a specified number of sparse sensors in the vorticity field of the cylinder wake and reconstruct the vorticity field in real time from the sparse sensor data alone, so as to capture the complete vortex evolution of the cylinder wake.
Here, 80% of the original dataset was randomly selected as the training set, and the remaining 20% was used as the validation set. Using the correlation coefficient filtering method proposed by Yuan [33], seven optimal sparse sensor locations were selected in the vorticity field behind the cylinder. Specifically, this method computes the Pearson correlation coefficient between each candidate sensor point and the low-order POD modal coefficients obtained from the training set, then ranks and selects the sensor locations with the highest absolute correlations. This ensures that the selected sensors are most sensitive to the dominant dynamic features of the flow field. These sparse sensors output real-time vorticity information at their respective positions. Based on the sparse sensor data, the gappy reconstruction methods based on POD (Gappy POD) and C-POD (Gappy C-POD) were used to reconstruct the cylinder wake flow field at different modal orders, and their reconstruction capabilities were measured, as shown in Figure 20 and Figure 21.
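The core of a gappy reconstruction—fitting modal coefficients to the sensor readings by least squares and then evaluating the full basis—can be sketched as follows. The basis, sensor indices, and synthetic field below are assumptions for illustration; the same routine applies whether the columns are POD modes (Gappy POD) or C-POD cluster bases (Gappy C-POD).

```python
import numpy as np

def gappy_reconstruct(basis, sensor_rows, sensor_values):
    """Fit modal coefficients to sparse sensor data, then rebuild the full field.

    basis: (m, K) modes (POD modes or C-POD cluster bases)
    sensor_rows: indices of the grid points carrying sensors
    sensor_values: readings at those points
    """
    M = basis[sensor_rows, :]                       # restrict modes to sensor locations
    a, *_ = np.linalg.lstsq(M, sensor_values, rcond=None)
    return basis @ a                                # full-field estimate

rng = np.random.default_rng(4)
m, K = 300, 4
basis = np.linalg.qr(rng.standard_normal((m, K)))[0]   # toy orthonormal basis
a_true = np.array([2.0, -1.0, 0.5, 0.3])
field = basis @ a_true                                  # "true" field lives in the basis

sensors = np.array([5, 40, 77, 120, 188, 231, 290])    # seven sparse sensor locations
field_hat = gappy_reconstruct(basis, sensors, field[sensors])
err = np.linalg.norm(field - field_hat) / np.linalg.norm(field)
```

With at least as many sensors as retained modes and a well-conditioned restricted matrix, the coefficients are recovered accurately; this is why the low-order accuracy of the underlying reduced basis dominates the quality of the inverse reconstruction.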
From Figure 20 and Figure 21, it can be observed that the reconstruction accuracy of the Gappy POD method first increased and then decreased as the modal order increased. At modal order 5, Gappy POD achieved its optimal reconstruction accuracy, with an average Cr value of 0.81 and a maximum Cr value of 0.97. For modal orders below 12, the reconstruction accuracy of the Gappy C-POD method followed a similar trend, first increasing and then decreasing; at modal order 6, its average Cr value was 0.65 and its maximum Cr value was 0.84. However, once the modal order exceeded 12, the reconstruction accuracy of Gappy C-POD improved again as the modal order increased: at modal order 20, the average Cr value dropped to 0.49, lower than the value of 0.65 at modal order 6. Since a lower Cr indicates better reconstruction, these results demonstrate the significant advantage of the Gappy C-POD method over the Gappy POD method in inverse reconstruction problems: the average reconstruction accuracy of Gappy C-POD was improved by 19.75% compared to Gappy POD ((0.81 − 0.65)/0.81), and the lower bound of reconstruction capability was improved by 13.4% ((0.97 − 0.84)/0.97). Furthermore, the Gappy C-POD method can achieve higher field reconstruction accuracy at higher modal orders, where more complex flow behaviors can be captured. Therefore, the Gappy C-POD method is better suited to field reconstruction in complex nonlinear systems.
Based on the above analysis, a modal order of 5 was chosen for the Gappy POD method and a modal order of 6 for the Gappy C-POD method. The reconstruction of the cylinder wake vorticity field at t = 141 s is shown in Figure 22 and Figure 23. From left to right, the three images in Figure 22 and Figure 23 show the cylinder wake vorticity field at t = 141 s, the reconstructed field with the sparse sensor layout, and the distribution of the relative reconstruction error. From Figure 22 and Figure 23, it can be seen that the maximum error of the field reconstruction using the Gappy C-POD method was 25%, smaller than the maximum error of 35% for the Gappy POD method.
Limitations and Future Improvements
While the proposed C-POD ROM achieved improved reconstruction accuracy and enhanced interpretability compared to traditional POD-based methods, certain limitations remain, particularly in practical applications.
The integration of clustering and entropy-based ranking introduces additional computational burden. When dealing with large-scale datasets or increasing the number of clusters, both memory usage and computation time grow considerably.
For tasks involving sparse sensor reconstruction, the spatial placement of sensors significantly affects the performance. If the sensors fail to cover dynamically important regions, the model may suffer from decreased reconstruction fidelity. This highlights the necessity of coupling reduced-order modeling with sensor placement optimization.
The performance of clustering and entropy-controlled ranking is sensitive to several hyperparameters, such as the number of clusters and entropy scaling coefficients. Inappropriate parameter settings can degrade the model’s effectiveness or lead to unstable results.
To address these limitations, future work may focus on:
- (1) Developing lightweight clustering algorithms or incremental learning schemes to reduce computational overhead.
- (2) Exploring adaptive sensor deployment strategies.
- (3) Investigating robust, data-driven hyperparameter tuning frameworks.
These improvements will enhance the scalability and robustness of C-POD ROM in more complex and data-limited fluid dynamics scenarios.
6. Conclusions
This work introduced a clustering-based dimensionality reduction method guided by POD structures (C-POD), aiming to address the instability of clustering center initialization and the lack of effective mode ranking in conventional clustering-based approaches. By integrating the orthogonality and interpretability of POD with the flexibility of unsupervised clustering, and by incorporating an entropy-controlled Euclidean-to-probability mapping (ECEPM) for robust mode evaluation, the proposed method enhanced both the stability and transparency of dimensionality reduction.
The C-POD method was validated through applications to convective flow problems, including the one-dimensional Burgers’ equation and the two-dimensional cylinder wake flow. The results demonstrated that C-POD effectively identified dominant flow structures and preserved essential dynamics under reduced-dimensional settings. Moreover, it showed strong resilience in sparse reconstruction tasks, delivering better performance when only limited sensor data or a small number of modes were available.
Beyond improved numerical accuracy, the method provided meaningful insights into flow physics, making it well-suited for real-time prediction, control, and inverse analysis of complex unsteady systems.
Future work will extend this framework to three-dimensional turbulent convective flows and explore adaptive clustering mechanisms that dynamically optimize cluster configurations, further enhancing model scalability and generalization.