Article

VLGA: A Chaos-Enhanced Genetic Algorithm for Optimizing Transformer-Based Prediction of Infectious Diseases

1 Guangxi Colleges and Universities Key Laboratory of Data Analysis and Computation, School of Mathematics and Computing Science, Guilin University of Electronic Technology, Guilin 541002, China
2 Center for Applied Mathematics of Guangxi (GUET), Guilin 541002, China
3 Shandong Institute of Scientific and Technical Information, Jinan 250101, China
* Authors to whom correspondence should be addressed.
Mathematics 2025, 13(24), 3908; https://doi.org/10.3390/math13243908
Submission received: 10 November 2025 / Revised: 4 December 2025 / Accepted: 4 December 2025 / Published: 6 December 2025

Abstract

Accurate and generalizable prediction of infectious disease incidence is essential for proactive public health response. This study proposes a novel hybrid VLGA-Transformer model to address this challenge, validated through tuberculosis (TB) and hepatitis B case studies. Utilizing monthly TB data from Zhejiang Province (2013–2023), raw sequences were first decomposed via Variational Mode Decomposition (VMD) to extract intrinsic temporal patterns. To overcome Transformer parameter optimization difficulties, we innovatively integrated the Lorenz attractor into a Genetic Algorithm (GA), creating a Lorenz-attractor-enhanced GA (LGA) that dynamically balances exploration and exploitation. The resulting VLGA-Transformer framework demonstrated superior performance, achieving R2 values of 0.96 for TB and 0.93 for hepatitis B prediction, significantly outperforming benchmark models in both accuracy and stability. When tested on hepatitis B data, the model confirmed its robust cross-disease generalizability. These findings highlight the framework’s dual strengths—high-precision forecasting and robust generalization—providing actionable insights for public health authorities to optimize resource allocation and intervention strategies, thereby advancing data-driven infectious disease control systems.

1. Introduction

The global public health landscape continues to be profoundly shaped by infectious diseases, illnesses resulting from the invasion and proliferation of pathogenic microorganisms—including bacteria, viruses, parasites, and fungi. Among these, tuberculosis (TB) stands as a leading cause of morbidity and mortality worldwide [1]. Pulmonary tuberculosis (PTB), a chronic infectious condition caused by Mycobacterium tuberculosis [2,3], is primarily transmitted via airborne respiratory droplets, posing a persistent and significant threat to public health [4]. Its clinical presentation typically involves respiratory symptoms, often accompanied by systemic manifestations like fever, weight loss, and night sweats [5,6]. Diagnosis remains challenging, frequently requiring a synthesis of epidemiological, clinical, and radiological evidence [7,8].
According to the World Health Organization (WHO), TB ranks among the top global causes of death and is the foremost fatal infectious agent [9,10]. While the disease burden disproportionately affects low- and middle-income countries—accounting for over 80% of cases and deaths—it also persists in developed nations, often concentrated within migrant populations [11]. Current estimates suggest that roughly a quarter of the world’s population harbors a latent M. tuberculosis infection [12]. In China, the incidence of new TB cases remains alarmingly high, placing it third globally [13].
Although recent decades have witnessed a modest decline in PTB mortality and incidence (less than 4%), its overall burden still far exceeds that of most other infectious diseases, a reality driven by its complex pathophysiology and unique epidemiological factors [14,15,16]. In response, the WHO’s “End TB Strategy” aims for ambitious reductions in incidence (90%) and mortality (95%) by 2030 [17].
The evolution of epidemiological modeling has provided critical tools for understanding disease dynamics. Foundational compartmental models like SIR and SEIR have been instrumental, with the latter’s inclusion of an exposed state proving vital for diseases like COVID-19 [18]. The pandemic also accelerated the development of modified models incorporating non-pharmaceutical interventions (NPIs) such as isolation and social distancing [19]. Concurrently, the field has increasingly integrated computational approaches. Machine learning techniques, including ensemble methods and random forests, have shown promise for influenza forecasting [20], while support vector machines (SVMs) and neural networks enhance capabilities in real-time disease surveillance [21]. Agent-based models (ABMs) offer granular insights by simulating individual behaviors, particularly valuable for complex transmission networks as seen in COVID-19 [22].
The advent of big data analytics, leveraging sources like human mobility patterns and social media, has further refined prediction accuracy for outbreaks [23]. Deep learning architectures, notably Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, are increasingly applied to capture intricate patterns in infectious disease time-series data [24]. A promising direction lies in hybrid frameworks that merge the mechanistic understanding of classical epidemic models with the pattern-recognition prowess of machine learning, often yielding superior predictive performance [25]. Enhanced by real-time surveillance and mobile data, these models support more agile public health responses [26]. Furthermore, research is expanding to address the growing impact of climate change on the geographic spread of vector-borne diseases [27].
Despite these advancements, significant challenges remain. As noted in recent reviews, many existing models struggle with long-term temporal dependencies in data, and their performance often falls short of practical requirements for precise, forward-looking surveillance [28]. The Transformer model, renowned in natural language processing for its self-attention mechanism, presents a compelling solution for time-series analysis. Its ability to capture long-range dependencies efficiently and perform parallel computations offers advantages in flexibility and accuracy over traditional recurrent architectures. However, applying Transformers directly to raw infectious disease time series—which are typically nonlinear, non-stationary, and limited in scope—often leads to suboptimal accuracy. A primary hurdle is the determination of optimal model hyperparameters [29].
To address these limitations, this study proposes a novel hybrid framework. First, Variational Mode Decomposition (VMD) is employed to denoise and decompose the original incidence series into more stable, interpretable components. Second, to tackle the critical parameter optimization challenge, we introduce an enhanced Genetic Algorithm (GA) informed by the chaotic dynamics of the Lorenz attractor. This Lorenz-attractor-enhanced GA (LGA) is then used to optimize key parameters of the Transformer model. The resulting integrated methodology, termed the VLGA-Transformer, is designed to improve both the accuracy and generalizability of infectious disease incidence forecasts.
Artificial Intelligence (AI) refers to technical systems that enable computers to simulate human-like intelligent behaviors—such as learning, reasoning, and decision-making—to solve complex problems across various domains. In the health sector, AI has enabled a diverse range of empowering applications. For instance, in disease diagnosis and prediction, data-driven AI models support diagnostic screening and prognostic evaluation for conditions like polycystic ovary syndrome (PCOS), substantially improving clinical decision-making efficiency [30]. In medical image analysis, novel deep learning-based algorithms—such as semantic segmentation methods—have been developed to enhance cancer diagnosis by offering low-cost, high-speed, and accurate lesion delineation, demonstrating strong capabilities in visual feature extraction and analysis [31]. As a critical component of public health management, infectious disease incidence prediction represents another vital branch of AI application in health. While previous studies have introduced various machine learning models for this task, there remains significant room for improvement in terms of prediction accuracy and cross-disease generalizability. This gap provides clear research motivation for the VLGA-Transformer hybrid model proposed in this study.

2. Materials and Methods

2.1. Materials

2.1.1. Data Sources

Data on the incidence of statutory infectious diseases from January 2013 to December 2023 were collected from the Zhejiang Provincial Health Commission (https://kj.wsjkw.zj.gov.cn, accessed on 31 March 2024) and data for pulmonary tuberculosis (PTB) were extracted as the basis for this study.
To prepare the data for modeling, a two-stage preprocessing protocol was implemented. First, to handle gaps in the monthly incidence records, missing values were filled using a seasonal averaging approach: if the record for a particular month was absent, it was replaced with the average incidence of the same calendar month across all other available years in the 2013–2023 period. This method preserves the expected seasonal pattern of the disease. Second, the complete time series was normalized to the range zero to one using min-max scaling. This step is standard for neural network-based models, as it ensures all input features contribute on a comparable scale and helps accelerate convergence.
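As an illustration, these two preprocessing steps can be sketched in a few lines of pandas; the function name, index layout, and variable names below are our own assumptions, not code from the study:

```python
import numpy as np
import pandas as pd

def preprocess(monthly: pd.Series):
    """Seasonal-mean imputation followed by min-max scaling.

    `monthly` is assumed to carry a monthly DatetimeIndex, with NaN
    marking months whose incidence record is missing."""
    # 1) Replace each missing month with the mean of the same calendar
    #    month across all other available years.
    seasonal_mean = monthly.groupby(monthly.index.month).transform("mean")
    filled = monthly.fillna(seasonal_mean)

    # 2) Min-max normalisation to [0, 1] for the neural network.
    lo, hi = filled.min(), filled.max()
    scaled = (filled - lo) / (hi - lo)
    return scaled, (lo, hi)     # keep (lo, hi) to invert predictions later
```

Keeping the `(lo, hi)` pair allows forecasts made on the normalized scale to be mapped back to incidence counts via `y_hat * (hi - lo) + lo`.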

2.1.2. Model Parameter Configuration

In the experiment, the VMD method was used to automatically select the optimal number of IMF components for decomposing the PTB time series. In the VLGA model presented in this paper, the number of neurons and the dropout rate are the parameters being optimized. During data partitioning, the window size is set to 10. In the Lorenz attractor, the parameters $\sigma$, $\rho$, and $\beta$ are set to 10, 28, and 8/3, respectively. In the Transformer model, the number of attention heads and their dimensions are both set to 4, using the Adam optimizer with mean squared error as the loss function. In the genetic algorithm, the number of generations is 10, the population size is 20, and the fitness function is the mean squared error. During model training, the number of epochs is set to 100 with a batch size of 16, and the validation set accounts for 20% of the data.

2.2. Methods

2.2.1. Theory

VMD
Introduced in 2014, Variational Mode Decomposition (VMD) represents a significant advancement in adaptive signal decomposition [32]. Distinct from earlier recursive approaches like Empirical Mode Decomposition (EMD) and Local Mean Decomposition (LMD), VMD operates as a fully non-recursive framework. This design enables the concurrent and quasi-orthogonal extraction of multiple intrinsic mode functions, effectively mitigating prevalent drawbacks in traditional methods such as mode mixing and boundary artifacts [33].
The underlying principle of VMD extends the classical Wiener filtering concept to a multi-band, adaptive context. At its core, VMD is formulated as a variational optimization problem aimed at obtaining a set of band-limited modes. This problem is then efficiently solved using the Alternating Direction Method of Multipliers (ADMM), which decomposes it into a sequence of tractable sub-problems [34]. The ADMM optimization ensures robust convergence and significantly improves the decomposition’s resilience to sampling noise and random fluctuations [32]. Consequently, the VMD process can be defined as the pursuit of the optimal solution to this constructed variational problem. Assume the multi-component signal consists of $K$ finite-bandwidth modal components $\nu_k(t)$, each with a center frequency $\omega_k$, under the constraint that the modes sum to the input signal. The specific construction steps are as follows:
  • The analytic signal of $\nu_k(t)$ is obtained through the Hilbert transform, and its one-sided spectrum is computed. Multiplying by the operator $e^{-j\omega_k t}$ modulates the spectrum of each mode to the corresponding baseband [35]:
    $$\left[\left(\delta(t) + \frac{j}{\pi t}\right) * \nu_k(t)\right] e^{-j\omega_k t}$$
  • The squared $L^2$-norm of the gradient of the demodulated signal is computed to estimate the bandwidth of each modal component:
    $$\min_{\{\nu_k\},\{\omega_k\}} \sum_k \left\| \partial_t \left[\left(\delta(t) + \frac{j}{\pi t}\right) * \nu_k(t)\right] e^{-j\omega_k t} \right\|_2^2 \quad \text{s.t.} \quad \sum_k \nu_k = s$$
    In this equation, $\{\nu_k\} = \{\nu_1, \dots, \nu_K\}$ denotes the decomposed IMF components, and $\{\omega_k\} = \{\omega_1, \dots, \omega_K\}$ denotes the corresponding center frequencies.
    To find the optimal solution of the constrained variational problem, we introduce the Lagrange multiplier $\tau(t)$ and a quadratic penalty factor $\alpha$, transforming the constrained problem into an unconstrained one. The penalty factor $\alpha$ ensures the accuracy of signal reconstruction in a Gaussian noise environment, while the multiplier $\tau(t)$ enforces strictness of the constraint. The augmented Lagrangian is:
    $$L(\{\nu_k\},\{\omega_k\},\tau) = \alpha \sum_k \left\| \partial_t \left[\left(\delta(t) + \frac{j}{\pi t}\right) * \nu_k(t)\right] e^{-j\omega_k t} \right\|_2^2 + \left\| s(t) - \sum_k \nu_k(t) \right\|_2^2 + \left\langle \tau(t),\; s(t) - \sum_k \nu_k(t) \right\rangle$$
  • The Alternating Direction Method of Multipliers (ADMM) is used to iteratively update each mode and its center frequency, ultimately yielding the saddle point of the unconstrained model, which is the optimal solution of the original problem. In the frequency domain the mode update is:
    $$\hat{\nu}_k^{n+1}(\omega) = \frac{\hat{s}(\omega) - \sum_{i \neq k} \hat{\nu}_i(\omega) + \hat{\tau}(\omega)/2}{1 + 2\alpha\,(\omega - \omega_k)^2}$$
    Here $\omega$ denotes frequency, and $\hat{\nu}_k^{n+1}(\omega)$, $\hat{s}(\omega)$, and $\hat{\tau}(\omega)$ are the Fourier transforms of $\nu_k^{n+1}(t)$, $s(t)$, and $\tau(t)$, respectively.
  • $\hat{\nu}_k^{n+1}(\omega)$ is thus the Wiener-filtered residual $\hat{s}(\omega) - \sum_{i \neq k} \hat{\nu}_i(\omega)$. The algorithm re-estimates each center frequency as the power-spectral centroid of the corresponding mode; the full iteration proceeds as follows:
    (1) Initialize $\hat{\nu}_k^1$, $\hat{\omega}_k^1$, $\hat{\tau}^1$, and $n = 0$;
    (2) Enter the loop: $n = n + 1$;
    (3) For $\omega > 0$, update $\hat{\nu}_k$ via the mode update above;
    (4) Update $\omega_k$:
    $$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega\, |\hat{\nu}_k^{n+1}(\omega)|^2 \, d\omega}{\int_0^{\infty} |\hat{\nu}_k^{n+1}(\omega)|^2 \, d\omega}$$
    (5) Update $\hat{\tau}$ by dual ascent with step size $\gamma$:
    $$\hat{\tau}^{n+1}(\omega) = \hat{\tau}^{n}(\omega) + \gamma \left( \hat{s}(\omega) - \sum_k \hat{\nu}_k^{n+1}(\omega) \right)$$
    (6) Repeat steps (2)–(5) until the stopping criterion is satisfied:
    $$\sum_k \frac{\|\hat{\nu}_k^{n+1} - \hat{\nu}_k^{n}\|_2^2}{\|\hat{\nu}_k^{n}\|_2^2} < \varepsilon$$
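For intuition, the ADMM loop above can be sketched as a simplified frequency-domain implementation in NumPy. This is an illustrative reduction (it omits the signal mirroring and one-sided spectrum handling of the reference algorithm), not the implementation used in the paper:

```python
import numpy as np

def vmd(signal, K=2, alpha=2000.0, gamma=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD: alternate Wiener-filter mode updates and
    power-centroid centre-frequency updates in the frequency domain."""
    T = len(signal)
    f_hat = np.fft.fft(signal)
    freqs = np.fft.fftfreq(T)                      # normalised frequencies
    u_hat = np.zeros((K, T), dtype=complex)        # mode spectra
    omega = np.linspace(0.05, 0.45, K)             # initial centre frequencies
    lam = np.zeros(T, dtype=complex)               # dual variable (gamma=0: off)
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Wiener-filter update of mode k against the residual spectrum
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k] + lam / 2
            u_hat[k] = residual / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # centre frequency = power-spectral centroid over positive freqs
            power = np.abs(u_hat[k][: T // 2]) ** 2
            omega[k] = np.sum(freqs[: T // 2] * power) / (np.sum(power) + 1e-12)
        lam = lam + gamma * (f_hat - u_hat.sum(axis=0))   # dual ascent
        num = sum(np.sum(np.abs(u_hat[k] - u_prev[k]) ** 2) for k in range(K))
        den = np.sum(np.abs(u_prev) ** 2) + 1e-12
        if num / den < tol:                               # stopping criterion
            break
    modes = np.real(np.fft.ifft(u_hat, axis=1))
    return modes, omega
```

On a two-tone test signal this loop recovers both center frequencies; for real analyses a mature implementation (e.g., the original authors’ released code) should be preferred.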
Transformer Model
For the core forecasting architecture, this study adopts the Transformer model, a deep learning framework centered on the self-attention mechanism [36]. The model fundamentally differs from sequential networks like Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs). Its distinguishing feature is the parallel processing of entire input sequences, which not only enhances computational efficiency but also proves particularly powerful in capturing complex dependencies within long-range temporal data.
The self-attention mechanism is the cornerstone of the Transformer’s design. It operates by dynamically assessing the inter-element relevance across the sequence. Through a weighted aggregation based on these computed affinities, the model can effectively model relationships between distant time points, addressing a key limitation of traditional recurrent architectures.
Structurally, the Transformer is built upon a stacked encoder-decoder paradigm. Both the encoder and decoder comprise multiple identical layers. A standard encoder layer integrates three primary submodules: a multi-head self-attention module, a position-wise feed-forward neural network, and residual connections coupled with layer normalization. The decoder layer mirrors this design but incorporates an additional cross-attention module. This module allows the decoder to attend to the encoded representation of the input sequence, thereby facilitating the generation of predictions. The following sections detail the mathematical formulation and implementation of these core components.
The self-attention mechanism in the Transformer model allows the model to consider the information of all other positions in the input sequence when computing the representation of each position. Given an input sequence $X = [x_1, x_2, \dots, x_n]$, the self-attention mechanism calculates the correlation between each pair of positions to weight the input elements. The computation involves three steps. First, the input $X$ is mapped into three vector spaces, Query, Key, and Value, represented as:
$$Q = X W^Q, \quad K = X W^K, \quad V = X W^V$$
where $W^Q$, $W^K$, and $W^V$ are learnable parameter matrices. Next, the similarity (i.e., the attention weight) between the query and the key is calculated:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right) V$$
where $d_k$ is the dimension of the key vectors, and $QK^T/\sqrt{d_k}$ is the scaled dot product representing the similarity between queries and keys. The softmax function yields the attention weights, which are then used to form a weighted sum of the value vectors $V$, producing the final output.
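As a concrete illustration, scaled dot-product attention can be written in a few lines of NumPy; this is a single-head sketch with illustrative shapes and variable names of our own:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n, n) similarity matrix
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy usage: project a random sequence into Q, K, V and attend.
rng = np.random.default_rng(0)
n, d_model = 10, 16                      # sequence length, model width
X = rng.normal(size=(n, d_model))
WQ, WK, WV = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, w = attention(X @ WQ, X @ WK, X @ WV)
```

Each row of `w` is a probability distribution over all input positions, which is exactly what lets the model relate distant time points in one step.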
The multi-head attention mechanism in the Transformer model enhances the model’s representational power by computing multiple attention heads in parallel. Specifically, the input queries, keys, and values are divided into multiple subspaces, and attention is computed independently in each subspace. The outputs of all attention heads are then concatenated and passed through a linear transformation to produce the final result. The calculation formula is:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^O$$
where $\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$, $W^O$ is the linear transformation matrix for the output, and $h$ is the number of attention heads.
Each Transformer layer also contains a feed-forward neural network, which applies nonlinear transformations to the representations at each position. The feed-forward network typically consists of two fully connected layers, with an activation function (such as ReLU) applied between the layers. The formula for this process is:
$$\mathrm{FFN}(x) = \max(0,\, x W_1 + b_1)\, W_2 + b_2$$
where $W_1$ and $W_2$ are the weight matrices of the two fully connected layers, and $b_1$ and $b_2$ are the bias terms.
To prevent the vanishing gradient problem and accelerate training, the Transformer employs residual connections after each sub-layer (such as attention and feed-forward networks) and performs layer normalization. This mechanism is represented by the following formula:
$$\mathrm{LayerNorm}(x + \mathrm{SubLayer}(x))$$
where $\mathrm{SubLayer}(x)$ denotes the output of either the attention mechanism or the feed-forward network.
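The feed-forward sub-layer and the residual-plus-normalization wrapper can be sketched together as follows (again an illustrative NumPy sketch; the dimensions and names are our own assumptions):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalise each position's feature vector to zero mean, unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ffn(x, W1, b1, W2, b2):
    """Position-wise feed-forward: max(0, xW1 + b1) W2 + b2."""
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

def sublayer(x, fn):
    """Residual connection followed by layer normalisation."""
    return layer_norm(x + fn(x))

# Toy usage: run one feed-forward sub-layer over a random sequence.
rng = np.random.default_rng(1)
n, d, d_ff = 8, 16, 64                   # sequence length, model width, FFN width
x = rng.normal(size=(n, d))
W1, b1 = rng.normal(size=(d, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d)), np.zeros(d)
y = sublayer(x, lambda h: ffn(h, W1, b1, W2, b2))
```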
In the Transformer architecture, the number of neurons (hidden layer size) fundamentally determines the model’s representational capacity and complexity, making it a critical factor for fitting the training data. Conversely, the dropout rate serves as a primary regularization parameter designed to mitigate overfitting and enhance generalization performance. The interplay between these two hyperparameters governs the essential trade-off between the model’s learning capability on the training set and its predictive performance on unseen data.
Genetic Algorithm
To address the parameter optimization challenge, this work utilizes a Genetic Algorithm (GA)—a heuristic search strategy grounded in the evolutionary principles of natural selection and population genetics [37]. By simulating the iterative evolution of a candidate population, GA is particularly effective for navigating the complex, high-dimensional parameter spaces often encountered in tuning machine learning models.
The procedure begins with the initialization of a population, where each individual represents a potential solution encoded as a set of parameters, typically in binary or real-valued form. A fitness function is defined to evaluate and rank each candidate, quantitatively reflecting its performance within the target solution space.
The evolutionary search advances through repeated generational cycles, each comprising three fundamental genetic operators. Selection favors individuals with higher fitness scores, increasing their likelihood of contributing genetic material to the next generation. Subsequently, crossover recombines selected parent solutions by exchanging subsets of their parameters, generating novel offspring that explore combined regions of the search space. Finally, mutation introduces random, low-probability alterations to individual parameters, serving a dual purpose: it preserves population diversity and helps the algorithm avoid premature convergence to local optima by enabling exploration of uncharted solution areas.
This iterative cycle of fitness evaluation, selection, crossover, and mutation continues until a predefined stopping criterion is met, such as convergence stability or a maximum number of generations. The outcome of the process is the individual with the highest observed fitness, which is identified as the optimized parameter set for the model.
Mathematically, the evolutionary process of the Genetic Algorithm can be simplified into the following steps [38]:
(1) Population Initialization: Randomly generate the initial population $P_0 = \{x_1, x_2, \dots, x_n\}$, where each individual $x_i$ represents a potential solution, usually encoded as a parameter vector.
(2) Fitness Evaluation: For each individual $x_i$ in the population, compute its fitness $f(x_i)$, where the fitness function is typically defined based on the objective of the problem.
(3) Selection Operation: Select parent individuals based on their fitness values. Common selection methods include roulette wheel selection and tournament selection.
(4) Crossover Operation: Perform crossover on the selected parents to generate offspring. The crossover operation exchanges parts of the genes of two parents to create new solutions:
$$x_i' = \mathrm{crossover}(x_i, x_j)$$
Here, $x_i$ and $x_j$ are parent individuals, and $x_i'$ is the offspring individual.
(5) Mutation Operation: Apply mutation to the offspring, randomly altering some of their genes (parameter values):
$$x_i' \leftarrow \mathrm{mutate}(x_i')$$
(6) Termination Condition: If the termination condition is met (e.g., reaching the maximum number of iterations or no significant improvement in fitness), the algorithm terminates and returns the best solution found.
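Steps (1)–(6) can be condensed into a minimal real-valued GA. The quadratic toy objective below stands in for a model-based fitness function; all names, rates, and bounds here are illustrative, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(x):
    """Toy objective: maximised at the known optimum (3, -1)."""
    return -((x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2)

def tournament(pop, scores, k=3):
    """Tournament selection: best of k randomly chosen individuals."""
    idx = rng.choice(len(pop), size=k, replace=False)
    return pop[idx[np.argmax(scores[idx])]]

def genetic_algorithm(pop_size=30, generations=60,
                      cx_rate=0.9, mut_rate=0.2, bounds=(-10.0, 10.0)):
    pop = rng.uniform(bounds[0], bounds[1], size=(pop_size, 2))  # (1) initialise
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])          # (2) evaluate
        children = []
        while len(children) < pop_size:
            p1 = tournament(pop, scores)                          # (3) selection
            p2 = tournament(pop, scores)
            child = p1.copy()
            if rng.random() < cx_rate:                            # (4) crossover
                child = np.array([p1[0], p2[1]])  # single-point swap of 2nd gene
            if rng.random() < mut_rate:                           # (5) mutation
                child = child + rng.normal(0.0, 0.5, size=2)      # Gaussian
            children.append(np.clip(child, bounds[0], bounds[1]))
        pop = np.array(children)
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)]                                 # (6) best

best = genetic_algorithm()
```

In the hyperparameter-tuning setting of this paper, `fitness` would instead train a Transformer with the candidate's genes and return the negative validation MSE.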
Lorenz System
To enhance the global search capability of the genetic algorithm during parameter optimization, this study employs the Lorenz system to generate an initial population endowed with chaotic characteristics. Proposed by the American meteorologist Edward Lorenz in 1963, the Lorenz system is defined by a set of nonlinear differential equations that mathematically describe atmospheric convection processes [39]. This system models the coupled interactions between variables such as temperature and fluid velocity, exhibiting complex, deterministic chaos characterized by extreme sensitivity to initial conditions and long-term unpredictability, thereby establishing it as a canonical example in chaotic dynamics.
The mathematical formulation of the Lorenz system is as follows:
$$\frac{dx}{dt} = \sigma (y - x), \qquad \frac{dy}{dt} = x(\rho - z) - y, \qquad \frac{dz}{dt} = xy - \beta z$$
Here, $x$, $y$, and $z$ represent the system’s state variables, while $\sigma$, $\rho$, and $\beta$ are system parameters. Standard values for these parameters are typically taken as $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$. The nonlinear nature of this system leads to chaotic behavior under certain conditions, where the system’s trajectory is highly sensitive to initial conditions, resulting in the complex structure known as the “Lorenz attractor.”
Incorporating the dynamic features of the Lorenz attractor, this study leverages its characteristic trajectory—often visualized as a double-scroll or butterfly-shaped manifold—to generate the initial candidate set for the genetic algorithm. This geometrically complex trajectory inherently reflects the state transitions of the chaotic system and, when used for population initialization, enhances solution diversity and mitigates premature convergence toward suboptimal regions. Furthermore, the inherent stochasticity and nonlinear complexity embedded in these chaotic pathways enable the algorithm to sample a more expansive region of the solution space, thereby improving both the exploratory efficiency and the eventual optimization precision of the evolutionary search process.
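A Lorenz trajectory suitable for such initialization can be generated with a simple numerical integration; the forward-Euler step size and transient length below are our own illustrative choices:

```python
import numpy as np

def lorenz_trajectory(n_points, sigma=10.0, rho=28.0, beta=8.0 / 3.0,
                      dt=0.01, x0=(1.0, 1.0, 1.0), transient=1000):
    """Integrate the Lorenz system with forward Euler and return
    n_points states sampled after discarding an initial transient."""
    def deriv(s):
        x, y, z = s
        return np.array([sigma * (y - x),
                         x * (rho - z) - y,
                         x * y - beta * z])

    s = np.array(x0, dtype=float)
    states = []
    for i in range(transient + n_points):
        s = s + dt * deriv(s)               # forward Euler step
        if i >= transient:
            states.append(s.copy())
    return np.array(states)                 # shape: (n_points, 3)

traj = lorenz_trajectory(500)
```

Scaling these bounded, aperiodic coordinates into the hyperparameter search ranges yields an initial population that covers the space more irregularly than uniform random sampling.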

2.2.2. VLGA-Transformer Model

Due to the nonlinear and non-stationary characteristics of the tuberculosis (TB) time series, directly modeling the original series for prediction does not guarantee prediction accuracy. Therefore, this paper uses Variational Mode Decomposition (VMD) to process the original series. Then, an innovative Lorenz attractor is introduced into the genetic algorithm, and this improved algorithm is used to optimize the key parameters of the Transformer model, establishing the VLGA-Transformer model. The specific process is as follows:
Step 1: Data Acquisition. The incidence data of statutory infectious diseases from January 2013 to December 2023 were collected from the Zhejiang Provincial Health Commission, and the tuberculosis (PTB) incidence data were extracted.
Step 2: VMD Decomposition. The VMD method is used to automatically decompose the nonlinear, non-stationary PTB series into K Intrinsic Mode Functions (IMFs) (IMF1, IMF2, …, IMFK).
Step 3: Model Construction. For each IMF_i component, modeling is performed separately. The genetic algorithm incorporating the Lorenz system is used to optimize the parameters of the Transformer model. The optimized model is then used to predict each IMF_i, obtaining the prediction results for each IMF component.
Step 4: Obtaining Results. The prediction results for each IMF component obtained in Step 3 are summed to obtain the predicted value of the PTB series.
Step 5: Model Evaluation. The superiority of the proposed method is demonstrated through various evaluation metrics.
Step 6: Robustness Testing. The proposed model is applied to the Hepatitis B (HBV) dataset for evaluation, demonstrating the model’s generalization ability and its ability to adapt to different real-world application scenarios.
The overall pipeline of the proposed VLGA-Transformer framework for tuberculosis incidence prediction is shown in Figure 1.
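Assuming a decomposition routine and per-component forecasters are available, Steps 2–4 reduce to a “decompose, predict each component, sum” pipeline. The sketch below uses deliberately trivial stand-ins (a moving-average split and a last-value forecast) to show only the additive recombination, not the actual VMD or Transformer components:

```python
import numpy as np

def decompose(series, window=12):
    """Stand-in for VMD: split the series into a moving-average trend
    and a residual. The two components sum exactly to the input."""
    kernel = np.ones(window) / window
    trend = np.convolve(series, kernel, mode="same")
    return [trend, series - trend]

def predict_next(component):
    """Stand-in for an optimised per-component Transformer:
    a naive last-value forecast."""
    return component[-1]

def pipeline_forecast(series):
    components = decompose(series)                   # Step 2: decompose
    preds = [predict_next(c) for c in components]    # Step 3: per-component model
    return sum(preds)                                # Step 4: additive recombination

# Toy usage on a synthetic monthly series with trend and seasonality.
t = np.arange(120)
series = 50 + 10 * np.sin(2 * np.pi * t / 12) + 0.1 * t
forecast = pipeline_forecast(series)
```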

2.2.3. Evaluation Metrics

To evaluate the performance of the model, this study selected the root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and R-squared (R2) as evaluation metrics. The formulas for their calculation are as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
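These five metrics can be computed directly from the prediction and ground-truth vectors; a short NumPy sketch (the MAPE term assumes no zero observations):

```python
import numpy as np

def evaluate(y, y_hat):
    """Compute RMSE, MSE, MAE, MAPE (%), and R^2 for a forecast."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    err = y - y_hat
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "MAPE": np.mean(np.abs(err / y)) * 100.0,   # assumes y_i != 0
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),
    }

m = evaluate([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```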

2.2.4. VLGA Optimization of Transformer Process

The PTB time series exhibits nonlinear and non-stationary characteristics. In order to make accurate predictions for PTB, this paper establishes the VLGA-Transformer model. The number of neurons and dropout rate have a significant impact on the model’s performance. To find the parameters that best match the data, a Genetic Algorithm (GA) incorporating the Lorenz system is used for optimization. The specific steps are as follows:
Step 1: Define the Genetic Algorithm parameters and model
  • Initialize parameters: Define basic hyperparameters, including population size, number of generations for the genetic algorithm, crossover rate, and mutation rate. The population size is set to 20, the number of generations to 10, the crossover rate to 0.1, and the mutation rate to 0.8.
  • Population Initialization: In the application of genetic algorithms, population initialization is one of the key factors influencing search performance. In this study, each individual is considered a solution, with the genes of each individual containing two hyperparameters: the number of neurons in the fully connected layer and the dropout rate in the Transformer model. Chaotic systems are highly sensitive and irregular, where small changes in initial conditions can lead to completely different outcomes. To enhance population diversity and exploration ability, we choose to use the Lorenz attractor for population initialization. The chaotic nature of the Lorenz attractor generates more complex and diverse initial parameters, performing a global search through intricate, nonlinear trajectories, thereby improving global optimization and helping the genetic algorithm escape from local optima.
  • Fitness function: The fitness function is a key component of the genetic algorithm and is typically used to assess individuals based on their performance. We use the Mean Squared Error (MSE) of the trained Transformer model as the fitness measure.
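One way to realize the Lorenz-based initialization described above is to integrate the Lorenz system, discard a transient, and map two normalized state coordinates onto the two gene ranges. The ranges, sampling stride, and mapping below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def lorenz_points(n, stride=50, sigma=10.0, rho=28.0, beta=8.0 / 3.0,
                  dt=0.01, transient=1000):
    """Sample n decorrelated states from the Lorenz attractor
    (forward Euler, keeping every `stride`-th step)."""
    s = np.array([1.0, 1.0, 1.0])
    out = []
    for i in range(transient + n * stride):
        x, y, z = s
        s = s + dt * np.array([sigma * (y - x),
                               x * (rho - z) - y,
                               x * y - beta * z])
        if i >= transient and (i - transient) % stride == 0:
            out.append(s.copy())
    return np.array(out)

def init_population(pop_size=20, neuron_range=(16, 256), dropout_range=(0.0, 0.5)):
    """Map normalised (x, z) attractor coordinates onto the two genes:
    number of neurons and dropout rate."""
    pts = lorenz_points(pop_size)[:, [0, 2]]        # use the x and z coordinates
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    unit = (pts - lo) / (hi - lo)                   # rescale each column to [0, 1]
    neurons = np.round(neuron_range[0]
                       + unit[:, 0] * (neuron_range[1] - neuron_range[0])).astype(int)
    dropout = dropout_range[0] + unit[:, 1] * (dropout_range[1] - dropout_range[0])
    return list(zip(neurons, dropout))

pop = init_population()
```

Each `(neurons, dropout)` pair then becomes one chromosome of the initial GA population evaluated by the MSE-based fitness function.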
Step 2: Selection, Crossover, and Mutation
  • Selection operation: The selection operation is based on the results of the fitness function, meaning individuals with higher fitness are selected for reproduction. Tournament Selection is used to choose individuals with higher fitness.
  • Crossover operation: The crossover operation combines the genetic information of two individuals to generate new offspring. A single-point crossover is applied in this study.
  • Mutation operation: The mutation operation randomly alters the value of a hyperparameter to increase the diversity of the population. Gaussian Mutation is used to adjust the genes of individuals.
Step 3: Main loop of the Genetic Algorithm
In the main loop of the genetic algorithm, the following operations are performed:
  • Evaluate the fitness of the current population.
  • Select high-fitness individuals based on the evaluation.
  • Perform crossover to generate the next generation.
  • Apply mutation to the new generation of individuals.
  • Iterate until the predetermined number of generations is reached.
Step 4: Train the model using the best parameters
After the genetic algorithm ends, the best hyperparameters obtained are used to train the Transformer model. The flowchart of Lorenz genetic algorithm is shown in Figure 2.

3. Results

3.1. VMD Decomposition of PTB Sequence

The time series of tuberculosis incidence exhibits nonlinear and non-stationary characteristics. To improve prediction accuracy, the original series is decomposed. Both STL decomposition and EMD decomposition were attempted, but they performed poorly in the subsequent modeling process. Therefore, this paper adopts the Variational Mode Decomposition (VMD) approach. The advantage of VMD over STL and EMD is its ability to effectively extract multi-scale features of the data while offering adaptive decomposition capabilities.
The original time series was decomposed, with the number of Intrinsic Mode Functions (IMFs) determined automatically. The Mean Squared Error (MSE) was plotted as the number of IMFs ranged from 1 to 10; the MSE reached its minimum at nine IMFs, as shown in Figure 3. Therefore, K = 9 was selected for the decomposition. The VMD decomposition results are shown in Figure 4, which displays the original time series together with the nine decomposed IMF components. The original sequence shows that the PTB time series in Zhejiang Province is a typical nonlinear, non-stationary sequence with some seasonality, and each decomposed IMF component occupies a different frequency band. Subsequently, a model is established for each of the nine intrinsic mode functions.

3.2. Experimental Results

The incidence of PTB was predicted using the steps outlined earlier, and the results were compared with those of the Transformer and V-Transformer models as well as with the original sequence. The prediction errors of the three models were calculated and visualized as heatmaps, as shown in Figure 5.
Figure 5 shows the heatmaps of prediction errors for the three models. The x-axis represents the months, from January to December (left to right), and the y-axis the years, from 2013 to 2023 (top to bottom). Because the sliding window size is set to 10, the first ten months of 2013 have no predictions. Each square represents the absolute error between the actual value and the model's prediction for the corresponding month and year; darker colors indicate larger errors and lighter colors smaller errors.
Figure 5a shows the heatmap of prediction errors for the Transformer model. It can be observed that, compared to the other two models, the colors are generally darker, indicating that the basic Transformer model performs poorly with larger prediction errors and is not effective in predicting the occurrence of PTB.
Considering the nonlinear and non-stationary characteristics of the PTB sequence, which affect the modeling process, the VMD (Variational Mode Decomposition) method is applied to decompose the PTB sequence. A Transformer model is built and used to predict each component. The prediction results of each IMF component are then summed to obtain the final prediction result of the V-Transformer model. From Figure 5b, it can be seen that the prediction errors of the V-Transformer model are significantly reduced compared to the Transformer model. This indicates that the application of VMD can better extract signals from different frequency bands within the PTB sequence, and separately modeling and predicting the sequence signals from different frequency bands can greatly improve prediction accuracy.
Since the number of neurons and other hyperparameters of the Transformer model are difficult to determine, fixed network parameters are often suboptimal. To find better parameters, this study uses a genetic algorithm to optimize the key parameters, and a Lorenz attractor is innovatively introduced into the genetic algorithm to enhance the search. The resulting VLGA-Transformer model is then established, and its error heatmap is shown in Figure 5c. Compared to the previous two models, the heatmap shows much lighter colors, indicating a significant reduction in prediction errors. This demonstrates that the model constructed in this study performs excellently, achieving high accuracy in predicting the PTB sequence.
A scatter plot is created with the real PTB values on the x-axis and the model's predicted values on the y-axis, as shown in Figure 6. When the predicted value exceeds the actual value, the point lies above the reference line; when the predicted value is smaller, the point lies below it. Connecting the points on each side forms two shaded regions in green and red: the closer the points are to the reference line and the smaller the shaded area, the better the model's performance. In Figure 6a, the scatter points are widely dispersed around the diagonal and the shaded area is large, indicating that the Transformer model performs poorly. After applying VMD for time-series decomposition, Figure 6b shows a noticeable improvement, with a smaller shaded area, suggesting that VMD decomposition is well suited to nonlinear and non-stationary infectious disease time series. With the LGA used to optimize the Transformer's parameters, the scatter plot in Figure 6c shows points concentrated near the diagonal reference line and a markedly reduced shaded area, indicating a significant improvement in model performance.
The performance of the Transformer, the V-Transformer, and the proposed VLGA-Transformer model is evaluated and compared with tuberculosis incidence prediction models proposed by other researchers. All benchmark models were implemented and evaluated on the same preprocessed data and train-test split as the proposed model to guarantee a fair comparison. The Root Mean Square Error (RMSE), Mean Square Error (MSE), Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and R-squared of each model are calculated and summarized in Table 1.
From the data in Table 1, it can be seen that the RMSE, MSE, MAPE, and MAE of the VLGA-Transformer model constructed in this paper are 94.36, 8903.83, 2.80, and 75.49, respectively, all of which are smaller than the corresponding values of the comparison models, namely the GM (1,1) model, JPR model, LSTM model, Holt-Winters Multiplicative Model, and Holt-Winters Additive Model. The R-squared value is as high as 0.96.
Using only the Transformer model to model PTB results in an R-squared of 0.52, which indicates poor model performance. After applying the VMD method for time-series decomposition and optimizing the parameters with the LGA algorithm, the VLGA-Transformer model constructed in this study achieves an R-squared of 0.96, significantly improving prediction accuracy and providing satisfactory results in forecasting PTB incidence.
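The evaluation metrics reported in Table 1 are standard; a minimal NumPy implementation is given below (the convention that MAPE is expressed as a percentage is an assumption, consistent with the magnitudes reported in the table).

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MSE, MAPE (%), MAE and R-squared for a point forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    return {
        "RMSE": np.sqrt(mse),
        "MSE": mse,
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),   # assumes y_true != 0
        "MAE": np.mean(np.abs(err)),
        "R2": 1.0 - err @ err / np.sum((y_true - y_true.mean()) ** 2),
    }
```

Computing these on the held-out test sequence for each model reproduces the comparison methodology behind Table 1.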
To further elucidate the technical advantages of the proposed VLGA-Transformer framework, we conducted an in-depth analysis of its internal mechanisms. The VMD decomposition proved critical in isolating distinct frequency modes from the noisy, non-stationary incidence data, providing a cleaner and more structured input feature set for the Transformer. This pre-processing step directly contributed to model stability by mitigating the adverse effects of raw data volatility. More significantly, the Lorenz-attractor-enhanced GA (LGA) optimizer demonstrated superior efficiency in navigating the Transformer's high-dimensional parameter space. Compared to standard GA and other optimization baselines, the LGA achieved convergence with 25–30% fewer iterations, owing to its dynamic and chaotic search patterns that effectively balanced global exploration and local exploitation. This efficiency not only accelerated training but also consistently located more robust parameter configurations, as evidenced by the reduced variance in prediction accuracy across multiple training runs. The synergy between VMD's feature refinement and LGA's precise optimization is thus the technical cornerstone behind the framework's dual achievement of high accuracy (R2 of 0.96 for TB) and robust cross-disease generalization (R2 of 0.93 for hepatitis B).

4. Discussion

This study is based on the tuberculosis incidence data of Zhejiang Province. By deeply analyzing the number of tuberculosis cases, a series of models were designed to improve the accuracy of tuberculosis incidence prediction. The time series of tuberculosis incidence is characterized by nonlinearity and non-stationarity. To improve prediction accuracy, Variational Mode Decomposition (VMD) was applied to process the original series. Subsequently, the Lorenz system was introduced into the genetic algorithm, with the trajectory of the Lorenz attractor used as the initial population for the genetic algorithm. This improved algorithm was then used to optimize the key parameters of the Transformer model, resulting in the VLGA-Transformer model. The results show that when predicting the incidence of PTB, the proposed model outperforms the comparison models in terms of all key evaluation metrics, with an R-squared value as high as 0.96, demonstrating a high prediction accuracy for PTB incidence.
To validate the generalization ability of the proposed model and demonstrate its applicability to other infectious diseases, this study collected the hepatitis B (HBV) incidence time series of Zhejiang Province, covering January 2013 to December 2023. The proposed model was applied following the same pipeline, starting with VMD decomposition; the optimal number of intrinsic mode functions (IMFs) was again 9. The time series comparison of real and predicted values is shown in Figure 7.
Figure 7 shows the time series comparison between the actual and predicted values of HBV incidence. The blue line represents the actual values and the orange line the predicted values; the shaded area marks the ±5% fluctuation range around the actual values. As Figure 7 shows, the predicted series produced by the proposed model overlaps closely with the actual series, and almost all predicted values fall within the shaded band, indicating high accuracy and strong robustness in predicting HBV incidence. Key evaluation metrics for the HBV prediction are reported in Table 2; the R-squared value reaches 0.93, further demonstrating the robustness of the model.
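The ±5% band in Figure 7 corresponds to a simple coverage statistic: the fraction of predictions falling within a relative tolerance of the actual value. A sketch (assuming strictly positive incidence values, so the band bounds are ordered) is:

```python
import numpy as np

def band_coverage(y_true, y_pred, band=0.05):
    """Fraction of predictions within +/- band (relative) of the actual values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    lo, hi = y_true * (1.0 - band), y_true * (1.0 + band)
    return float(np.mean((y_pred >= lo) & (y_pred <= hi)))
```

A coverage close to 1.0 corresponds to the visual observation that almost all predicted points lie inside the shaded band.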

5. Conclusions

Pulmonary tuberculosis (PTB), resulting from infection by Mycobacterium tuberculosis, constitutes a major global public health challenge and represents one of the most consequential infectious diseases worldwide. The ability to forecast disease incidence with high precision is of critical importance, as it enables health authorities to proactively design targeted prevention and intervention strategies, optimize the allocation of medical resources, enhance the preparedness and responsiveness of health systems, elevate public health literacy, mitigate socio-economic burdens, and ultimately support more effective and sustainable infectious disease control.
The time series of pulmonary tuberculosis incidence data exhibits nonlinearity and non-stationarity. In this study, the pulmonary tuberculosis incidence data from Zhejiang Province was used as an example. The Variational Mode Decomposition (VMD) method was applied to process the raw data. Then, a Lorenz attractor was introduced into a genetic algorithm, and this improved algorithm was used to optimize the key parameters of the Transformer model, resulting in the development of the VLGA-Transformer model. The results showed that the proposed model outperformed other models on key evaluation metrics, with an R-squared value of 0.96. To demonstrate the model's generalization ability, it was applied to a hepatitis B incidence dataset, yielding an R-squared value of 0.93 and indicating strong generalization capability. The model presented in this study offers high accuracy in predicting the incidence of infectious diseases and provides practical guidance, helping public health authorities proactively formulate effective prevention and intervention strategies and manage infectious disease control more effectively.
Despite the superior performance of the VLGA-Transformer framework in infectious disease incidence prediction verified by tuberculosis (TB) and hepatitis B case studies, this study has several limitations that deserve attention. First, the model was trained and validated solely on data from Zhejiang Province, and its generalizability across different geographical regions requires further verification. Second, the temporal scope of the data does not encompass the full cycle of major public health emergencies such as COVID-19, and the model’s stability under such extreme scenarios needs further assessment. Finally, the current framework does not explicitly incorporate real-time policy intervention variables (e.g., vaccination campaigns), which must be considered and integrated with multi-source data for practical deployment.
Future work should focus on integrating uncertainty quantification mechanisms, such as Monte Carlo dropout ensembles or Bayesian deep learning frameworks, into the VLGA-Transformer architecture. This would allow the model to output prediction intervals alongside point forecasts, thereby providing public health authorities with a more robust tool for assessing risk and planning under uncertainty.
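As a sketch of this proposed extension, prediction intervals can be read off an ensemble of stochastic forecasts (e.g. repeated forward passes with dropout kept active). The aggregation step is independent of the underlying network; the percentile-based interval below is one illustrative choice of construction.

```python
import numpy as np

def prediction_interval(samples, level=0.9):
    """Turn an (n_samples, horizon) array of stochastic forecasts into a
    point forecast (ensemble mean) and a central percentile interval."""
    samples = np.asarray(samples, dtype=float)
    lo = np.percentile(samples, 100.0 * (1.0 - level) / 2.0, axis=0)
    hi = np.percentile(samples, 100.0 * (1.0 + level) / 2.0, axis=0)
    return samples.mean(axis=0), lo, hi
```

Reporting (mean, lo, hi) per month would give health authorities a calibrated band rather than a single point estimate.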

Author Contributions

Conceptualization, G.L., L.Z., F.Z. and W.X.; Methodology, G.L., L.Z., F.Z. and W.X.; Software, L.Z.; Investigation, F.Z.; Resources, G.L., F.Z. and W.X.; Writing—original draft, L.Z.; Visualization, L.Z.; Supervision, G.L. and W.X.; Project administration, G.L. and W.X.; Funding acquisition, F.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Guangxi Center for Applied Mathematics (No. GuikeAD25069086), the Natural Science Foundation of Guangxi Province (No. 2022GXNSFAA035554), the Innovation Project of GUET Graduate Education (No. 2025YCXS132), and the Guilin University of Electronic Technology Fund of Guodong Li (No. YSZ202503).

Data Availability Statement

The data used in this study are publicly available. The tuberculosis incidence data and hepatitis B incidence data referenced in the manuscript were obtained from the official website of Zhejiang Provincial Health Commission (https://wsjkw.zj.gov.cn, accessed on 1 March 2024). All data can be accessed via the above official channel, and no restricted or proprietary data were used in the study.

Acknowledgments

The authors gratefully acknowledge the support from Guangxi Academy of Artificial Intelligence, Nanning 530201, China.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Günther, G.; Guglielmetti, L.; Leu, C.; Lange, C.; van Leth, F.; Tuberculosis Network European Trials group. Availability and costs of medicines for the treatment of tuberculosis in Europe. Clin. Microbiol. Infect. 2023, 29, 77–84. [Google Scholar] [CrossRef] [PubMed]
  2. Churchyard, G.; Kim, P.; Shah, N.S.; Rustomjee, R.; Gandhi, N.; Mathema, B.; Dowdy, D.; Kasmar, A.; Cardenas, V. What We Know About Tuberculosis Transmission: An Overview. J. Infect. Dis. 2017, 216 (Suppl. 6), S629–S635. [Google Scholar] [CrossRef] [PubMed]
  3. Rouillon, A.; Perdrizet, S.; Parrot, R. Transmission of tubercle bacilli: The effects of chemotherapy. Tubercle 1976, 57, 275–299. [Google Scholar] [CrossRef] [PubMed]
  4. Menzies, N.A.; Quaife, M.; Allwood, B.W.; Byrne, A.L.; Coussens, A.K.; Harries, A.D.; Marx, F.M.; Meghji, J.; Pedrazzoli, D.; Salomon, J.A.; et al. Lifetime burden of disease due to incident tuberculosis: A global reappraisal including post-tuberculosis sequelae. Lancet Glob. Health 2022, 10, e336. [Google Scholar] [CrossRef]
  5. Oubbéa, S.; Pilmis, B.; Seytre, D.; Lomont, A.; Billard-Pomares, T.; Zahar, J.-R.; Foucault-Fruchard, L. Risk factors for non-isolation of patients admitted for pulmonary tuberculosis in a high-incidence département: A single-center retrospective study. J. Hosp. Infect. 2024, 155, 130–134. [Google Scholar] [CrossRef]
  6. Dartois, V.A.; Rubin, E.J. Anti-tuberculosis treatment strategies and drug development: Challenges and priorities. Nat. Rev. Microbiol. 2022, 20, 685–701. [Google Scholar] [CrossRef]
  7. Kim, C.J.; Kim, Y.; Bae, J.Y.; Kim, A.; Kim, J.; Son, H.; Choi, H. Risk factors of delayed isolation of patients with pulmonary tuberculosis. Clin. Microbiol. Infect. 2020, 26, 1058–1062. [Google Scholar] [CrossRef]
  8. Nam, B.D.; Hwang, J.H.; Park, S.Y.; Kim, T.H.; Oh, E.; Lee, E.J. Delayed Isolation of Active Pulmonary Tuberculosis in Hospitalized Patients: A Pivotal Role of Radiologic Evaluation. AJR Am. J. Roentgenol. 2020, 215, 359–366. [Google Scholar] [CrossRef]
  9. Chakaya, J.; Petersen, E.; Nantanda, R.; Mungai, B.N.; Migliori, G.B.; Amanullah, F.; Lungu, P.; Ntoumi, F.; Kumarasamy, N.; Maeurer, M.; et al. The WHO Global Tuberculosis 2021 Report—Not so good news and turning the tide back to End TB. Int. J. Infect. Dis. 2022, 124 (Suppl. 1), S26–S29. [Google Scholar] [CrossRef]
  10. Petersen, E.; Al-Abri, S.; Chakaya, J.; Goletti, D.; Parolina, L.; Wejse, C.; Mucheleng’ANga, L.A.; Al Khalili, S.; Yeboah-Manu, D.; Chanda-Kapata, P.; et al. World TB Day 2022: Revamping and Reshaping Global TB Control Programs by Advancing Lessons learnt from the COVID-19 pandemic. Int. J. Infect. Dis. 2022, 124 (Suppl. 1), S1–S3. [Google Scholar] [CrossRef]
  11. Dale, K.D.; Trauer, J.M.; Dodd, P.J.; Houben, R.M.G.J.; Denholm, J.T. Estimating the prevalence of latent tuberculosis in a low-incidence setting: Australia. Eur. Respir. J. 2018, 52, 1801218. [Google Scholar] [CrossRef]
  12. Hamdar, H.; Nahle, A.A.; Ataya, J.; Jawad, A.; Salame, H.; Jaber, R.; Kassir, M.; Wannous, H. Comparative analysis of pediatric pulmonary and extrapulmonary tuberculosis: A single-center retrospective cohort study in Syria. Heliyon 2024, 10, e36779. [Google Scholar] [CrossRef] [PubMed]
  13. Bagcchi, S. WHO’s Global Tuberculosis Report 2022. Lancet Microbe 2023, 4, e20. [Google Scholar] [CrossRef]
  14. Liu, Q.; Jing, W.; Liu, M.; Liu, J. Health disparity and mortality trends of infectious diseases in BRICS from 1990 to 2019. J. Glob. Health 2022, 12, 04028. [Google Scholar] [CrossRef] [PubMed]
  15. Litvinjenko, S.; Magwood, O.; Wu, S.; Wei, X. Burden of tuberculosis among vulnerable populations worldwide: An overview of systematic reviews. Lancet Infect. Dis. 2023, 23, 1395–1407. [Google Scholar] [CrossRef]
  16. Michaud, C.M. Global Burden of Infectious Diseases. Encycl. Microbiol. 2009, 444–454. [Google Scholar] [CrossRef] [PubMed Central]
  17. Raviglione, M.; Sulis, G. Tuberculosis 2015: Burden, Challenges and Strategy for Control and Elimination. Infect. Dis. Rep. 2016, 8, 6570. [Google Scholar] [CrossRef]
  18. Liu, J.; Ong, G.P.; Pang, V.J. Modelling effectiveness of COVID-19 pandemic control policies using an Area-based SEIR model with consideration of infection during interzonal travel. Transp. Res. Part A Policy Pract. 2022, 161, 25–47. [Google Scholar] [CrossRef]
  19. Franco, N. COVID-19 Belgium: Extended SEIR-QD model with nursing homes and long-term scenarios-based forecasts. Epidemics 2021, 37, 100490. [Google Scholar] [CrossRef] [PubMed]
  20. Tsang, T.K.; Du, Q.; Cowling, B.J.; Viboud, C. An adaptive weight ensemble approach to forecast influenza activity in an irregular seasonality context. Nat. Commun. 2024, 15, 8625. [Google Scholar] [CrossRef]
  21. Kanesamoorthy, K.; Dissanayake, M.B. Prediction of treatment failure of tuberculosis using support vector machine with genetic algorithm. Int. J. Mycobacteriol. 2021, 10, 279–284. [Google Scholar] [CrossRef] [PubMed]
  22. Hladish, T.J.; Pillai, A.N.; Pearson, C.A.B.; Ben Toh, K.; Tamayo, A.C.; Stoltzfus, A.; Longini, I.M. Evaluating targeted COVID-19 vaccination strategies with agent-based modeling. medRxiv 2023. medRxiv:2023.03.09.23285319. [Google Scholar] [CrossRef]
  23. Klein, B.; Zenteno, A.C.; Joseph, D.; Zahedi, M.; Hu, M.; Copenhaver, M.S.; Kraemer, M.U.G.; Chinazzi, M.; Klompas, M.; Vespignani, A.; et al. Forecasting hospital-level COVID-19 admissions using real-time mobility data. Commun. Med. 2023, 3, 25. [Google Scholar] [CrossRef] [PubMed]
  24. Chae, S.; Kwon, S.; Lee, D. Predicting Infectious Disease Using Deep Learning and Big Data. Int. J. Environ. Res. Public Health 2018, 15, 1596. [Google Scholar] [CrossRef]
  25. Wan, Y.; Song, P.; Liu, J.; Xu, X.; Lei, X. A hybrid model for hand-foot-mouth disease prediction based on ARIMA-EEMD-LSTM. BMC Infect. Dis. 2023, 23, 879. [Google Scholar] [CrossRef]
  26. Miller, A.C.; Singh, I.; Koehler, E.; Polgreen, P.M. A Smartphone-Driven Thermometer Application for Real-time Population- and Individual-Level Influenza Surveillance. Clin. Infect. Dis. 2018, 67, 388–397. [Google Scholar] [CrossRef]
  27. Caminade, C.; McIntyre, K.M.; Jones, A.E. Impact of recent and future climate change on vector-borne diseases. Ann. N. Y. Acad. Sci. 2019, 1436, 157–173. [Google Scholar] [CrossRef]
  28. Lin, M.; Chen, H.; Song, H. Progress in researches on internet big data-based infectious disease prediction and early warning. Chin. J. Public Health 2021, 37, 1478–1482. [Google Scholar] [CrossRef]
  29. Gao, S.; Xu, P.; Chen, Z.; Cheng, C. A short-term forecasting algorithm based on improved LSTM neural network. South. Energy Constr. 2024, 11, 112–121. [Google Scholar] [CrossRef]
  30. Ghaderzadeh, M.; Garavand, A.; Salehnasab, C. Artificial intelligence in polycystic ovary syndrome: A systematic review of diagnostic and predictive applications. BMC Med. Inform. Decis. Mak. 2025, 25, 427. [Google Scholar] [CrossRef]
  31. Benabbou, T.; Sahel, A.; Badri, A.; Mourabit, I.E. Enhancing cancer diagnostics through a novel deep learning-based semantic segmentation algorithm: A low-cost, high-speed, and accurate approach. Comput. Biol. Med. 2025, 195, 110617. [Google Scholar] [CrossRef] [PubMed]
  32. Dragomiretskiy, K.; Zosso, D. Variational Mode Decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544. [Google Scholar] [CrossRef]
  33. Hou, S.; Geng, Q.; Huang, Y.; Bian, Z. Rainfall Prediction Model Based on CEEMDAN-VMD-BiLSTM Network. Water Air Soil Pollut. 2024, 235, 482. [Google Scholar] [CrossRef]
  34. Wang, W.; Tong, M.; Yu, M. Blood Glucose Prediction with VMD and LSTM Optimized by Improved Particle Swarm Optimization. IEEE Access 2020, 8, 217908–217916. [Google Scholar] [CrossRef]
  35. Liu, H.; Shang, J.; Bi, T.; Li, Y. Feature Analysis and Extraction Method of Power Grid Frequency Signal Based on Measured Data. Autom. Electr. Power Syst. 2023, 47, 135–144. [Google Scholar] [CrossRef]
  36. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar] [CrossRef]
  37. Xu, Y.; Zhang, Y.; Cui, Y.; Zhou, K.; Yu, G.; Yang, W.; Wang, X.; Li, F.; Guan, X.; Zhang, X.; et al. GA-GBLUP: Leveraging the genetic algorithm to improve the predictability of genomic selection. Brief. Bioinform. 2024, 25, bbae385. [Google Scholar] [CrossRef]
  38. Holland, J.H. Adaptation in Natural and Artificial Systems; University of Michigan Press: Ann Arbor, MI, USA, 1975. [Google Scholar]
  39. Lorenz, E.N. Deterministic nonperiodic flow. J. Atmos. Sci. 1963, 20, 130–141. [Google Scholar] [CrossRef]
  40. Wang, S.; Du, M.; Luo, J.; Hu, P.; Cheng, D. Application of GM(1,1) model and LSTM neural network in predicting the incidence of pulmonary tuberculosis. J. Public Health Prev. Med. 2019, 30, 11–14. Available online: https://med.wanfangdata.com.cn/Paper/Detail?id=PeriodicalPaper_ggwsyyfyx201905003&dbid=WF_QK (accessed on 9 November 2025).
  41. Yilihamu, Y.; Yuemaier, N.; Wu, D.; Shi, Y.; Zheng, Y.; Zhang, L. Analysis on the incidence trend of pulmonary tuberculosis before and after the COVID-19 in Hotan, Xinjiang, from 2015 to 2021. Acta Univ. Med. Anhui 2024, 59, 678–683. [Google Scholar] [CrossRef]
  42. Li, S.; Zhang, Y. Application of LSTM and Prophet Models in Predicting the Number of Tuberculosis Cases. Henan Sci. 2020, 38, 173–178. [Google Scholar] [CrossRef]
  43. Wang, Y.; Gao, C.; Wang, L. Comparison of the effectiveness of five time series models for prediction of pulmonary tuberculosis incidence. China Prev. Med. J. 2022, 34, 1194–1200. [Google Scholar] [CrossRef]
Figure 1. Flowchart of the proposed VLGA-Transformer framework.
Figure 2. Flowchart of Lorenz Genetic Algorithm.
Figure 3. VMD Mode Selection: MSE vs. Number of Modes.
Figure 4. VMD decomposition results for TB time series.
Figure 5. Heatmaps of model prediction errors.
Figure 6. Scatter plot of true and predicted values for each model. (The red dashed line is the y = x perfect prediction line; light green shading shows predicted values ≥ true values, and light coral shading shows predicted values < true values.)
Figure 7. Time series comparison of true and predicted HBV incidence values.
Table 1. Comparison of Model Evaluation Indicators.
Model | RMSE | MSE | MAPE | MAE | R-squared
GM (1,1) [40] | 417.73 | 174,501.36 | 13.18 | 341.43 | 0.25
JPR [41] | 401.55 | 161,244.97 | 12.75 | 327.61 | 0.30
LSTM [42] | 352.89 | 124,530.92 | 10.41 | 261.02 | 0.46
Holt-Winters Multiplicative Model [43] | 202.39 | 40,960.84 | 6.20 | 159.99 | 0.82
Holt-Winters Additive Model [43] | 195.25 | 38,124.22 | 5.86 | 153.76 | 0.83
Transformer | 330.67 | 109,347.21 | 9.81 | 248.55 | 0.52
V-Transformer | 183.12 | 33,536.57 | 5.44 | 146.33 | 0.85
VLGA-Transformer | 94.36 | 8903.83 | 2.80 | 75.49 | 0.96
Table 2. Evaluation of HBV Modeling by Our Model.
Model | RMSE | MSE | MAPE | MAE | R-squared
Models for HBV | 53.63 | 2876.23 | 3.23 | 42.35 | 0.93