A Dual-Branch Typhoon-Gated Axial Transformer for Accurate Tropical Cyclone Path Forecasting

Huang, Xiaoyang; Fan, Kenan; Zhu, Xiaolin; Lv, Wei

doi:10.3390/atmos17040339

School of Big Data, Zhuhai College of Science and Technology, Zhuhai 519000, China

Atmosphere 2026, 17(4), 339; https://doi.org/10.3390/atmos17040339

Submission received: 5 February 2026 / Revised: 20 March 2026 / Accepted: 23 March 2026 / Published: 27 March 2026

(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)

Abstract

Typhoon track prediction is an important research direction in weather forecasting. Although deep learning methods have achieved some progress in this field, challenges remain, including insufficient fusion of meteorological features, limited capability in modeling temporal and spatial evolution, and high computational cost of some models. To address these issues, this paper proposes a dual-path, multi-modal typhoon track prediction model that incorporates a gated axial Transformer to enhance the modeling of deep structural features in the meteorological environment. Numerical experimental results show that the proposed model achieves higher prediction accuracy than comparative methods in typhoon track prediction tasks across multiple time scales, demonstrating the effectiveness of the approach.

Keywords: typhoon track prediction; deep learning; multi-modal data fusion; gated axial transformer; LSTM

1. Introduction

A typhoon is a low-pressure cyclone occurring over tropical or subtropical oceans. It behaves like a vortex in the atmosphere, causing surrounding air to rotate rapidly around its center and move along with the ambient atmospheric flow. Each year in summer and autumn, rising sea surface temperatures lead to evaporation and the formation of a low-pressure center. Air from surrounding high-pressure areas continuously flows toward the low-pressure center and, under the influence of the Coriolis force, forms a tropical cyclone. If the sea surface remains warm, the tropical cyclone continues to intensify, eventually developing into a powerful typhoon. When a typhoon makes landfall, the surrounding high-speed winds can cause injuries, property damage, and the uprooting of trees. Additionally, typhoons are often accompanied by heavy rainfall and secondary hazards such as landslides. According to statistics from the Typhoon Committee under the United Nations Economic and Social Commission for Asia and the Pacific (ESCAP) and the World Meteorological Organization (WMO), typhoons in China alone cause an average of approximately 505 deaths and 5.6 billion USD in economic losses annually [1].

With the rapid development of deep learning, its potential in natural disaster prediction and meteorological forecasting has increasingly been demonstrated. In particular, for typhoon track and intensity prediction, deep neural networks have significantly improved accuracy thanks to the adoption of architectures such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformers in recent years. To capture the temporal characteristics of meteorological sequences, RNNs are commonly employed. For instance, Chao et al. (2024) utilized LSTM models with offshore buoy data to enhance forecast accuracy [2]. Park et al. (2024) proposed the TPTNet model [3], which treats multi-station spatiotemporal data as spatial climate fields and integrates CNNs, Transformers, and graph neural networks, achieving superior performance in short-term forecasting tasks compared to traditional numerical weather prediction. However, such methods are typically based on point data recorded by meteorological observation stations, resulting in a limited receptive field and difficulties in capturing long-range dependencies and global variations in complex meteorological systems.

To address this issue, many researchers have attempted to combine remote sensing meteorological imagery with station observation data to construct multi-modal temporal inputs, and multi-modal models are then employed for joint modeling. For example, Zhou et al. (2020) proposed the GC-LSTM model [4], which integrates CNNs and LSTMs to model satellite cloud imagery, improving categorical forecast accuracy. Xu et al. (2022) proposed SAF-Net [5] and AM-ConvGRU [6]; SAF-Net employs a wide-and-deep dual-path structure for joint modeling of typhoon features, while AM-ConvGRU introduces residual channel attention and multi-scale convolutions, achieving lower path prediction error. Qin et al. (2022) developed the Trj-DMFMG model [7], which combines multi-modal fusion and multi-task generative modules to enhance multi-source data modeling capabilities. He et al. (2024) proposed ACFN [8], incorporating attention mechanisms into convolutional structures to enhance key feature extraction and fusion. Park et al. (2024) developed the Transformer-based LT3P model [9], which excels in full-time-step forecasting, especially in short-term predictions. Tian et al. (2024) designed the lightweight dual-branch AWL-Net [10], improving multiple accuracy metrics under low computational cost. Qiao et al. (2024) proposed AT-ResNeXt-50 [11], integrating self-attention mechanisms to enhance recognition in complex scenarios. Ren et al. (2025) introduced the dual-encoder spatiotemporal fusion model DESF-Typhoon [12], which significantly outperforms existing methods in both path prediction accuracy and year-round stability. However, meteorological disasters often exhibit broad spatial impacts and significant regional variability, and point-based predictions alone cannot fully support the assessment and mitigation of such disasters.

To simulate regional future meteorological evolution trends, some studies attempt to learn meteorological evolution patterns and generate corresponding future meteorological images. For example, Andrychowicz et al. (2023) proposed MetNet-3 [13], introducing a key densification mechanism to enhance spatially dense prediction capabilities, showing significant potential in short-term high-resolution forecasting. Gao et al. (2023) proposed the diffusion generative model PreDiff [14], combining Transformer and U-Net architectures and introducing a physics-guided mechanism to enhance the physical consistency of the model. Hu et al. (2023) proposed SwinVRNN [15], a variational RNN model based on Swin Transformers, achieving high forecast accuracy while maintaining reasonable ensemble diversity. Ling et al. (2024) proposed the CDDPM model [16], combining convolution and upsampling convolution structures to improve generative accuracy and perceptual quality. Ren et al. (2024) introduced SAM-Net [17] by incorporating a self-attention memory module into PredRNN-v2. Lin et al. (2024) proposed the multivariate spatiotemporal hybrid convolution-attention network StHCFormer [18], excelling in wind evolution feature modeling. Kochkov et al. (2024) developed NeuralGCM [19], a hybrid model integrating physical modeling and machine learning methods. Li et al. (2025) proposed SwinNowcast [20], a deep learning model based on the Swin Transformer architecture, achieving low false alarm rates in regional precipitation forecasting. Xu et al. (2025) proposed the Fourier near-term forecasting model FourCastLSTM [21], significantly improving B-MAE and B-MSE metrics in precipitation nowcasting tasks. While these models enable relatively accurate and comprehensive small-scale regional meteorological image forecasting, they still fall short of providing a full understanding of global climate evolution.

In recent years, with substantial improvements in computational power and the availability of high-quality real-time meteorological datasets, some research teams have begun training large-scale meteorological models using supercomputers. For example, Zhang et al. (2023) proposed NowcastNet [22], which consists of an evolution network and a generation network, incorporating physics-based evolution mechanisms to improve prediction accuracy. Bi et al. (2023) introduced Pangu-Weather [23], a Transformer-based model integrating hierarchical prediction and physical priors for fine-grained modeling of multi-level atmospheric variables. Chen et al. (2023) developed FengWu [24], an advanced data-driven global medium-range weather forecasting system, demonstrating superior performance in 80% of 880 predicted meteorological variables while reducing errors in long-term forecasts. Lam et al. (2023) proposed GraphCast [25], a graph neural network-based model capable of efficiently predicting multiple meteorological variables over the next 10 days in one minute. Niu et al. (2024) developed Pangu_SP [26], improving the stability of typhoon track and intensity prediction through a spectral perturbation mechanism. Bodnar et al. (2025) proposed Aurora [27], a large foundational model trained on one million hours of multi-source geospatial data, outperforming forecasts from seven meteorological centers in high-resolution weather prediction. Despite their excellent performance, these models require substantial computational resources for training and inference, which limits their deployment on most computing platforms.

In summary, although deep learning models have achieved significant progress in typhoon track prediction, local meteorological evolution modeling, and global weather forecasting, existing approaches still face challenges such as insufficient multi-modal feature fusion, limited temporal and spatial evolution modeling, or excessive computational demands. To address these issues, this paper proposes a dual-path, multi-modal typhoon track prediction model that employs a gated axial Transformer to capture deeper structural features and a dual-branch prediction architecture to strengthen temporal dependencies and spatial evolution modeling, thereby improving the model’s predictive performance.

2. Materials and Methods

2.1. Data

This study utilized the Typhoon Best Track dataset released by the China Meteorological Administration (CMA) and the CDAS (Climate Data Assimilation System) data provided by the Climate Forecast System Version 2 (CFSv2) developed by the National Centers for Environmental Prediction (NCEP) in the United States. The CMA dataset, published by the Tropical Cyclone Data Center of the CMA, contains six-hourly typhoon position and intensity data over the Northwest Pacific since 1949. The dataset has been corrected and integrated by multiple meteorologists from various data sources (Ying et al., 2014 [28]; Lu et al., 2021 [29]). Due to its high accuracy, this dataset is considered the best record of tropical cyclones. The variables contained in the dataset and their descriptions are listed in Table 1. In this study, the best-track position data provided by the CMA dataset were used to determine the typhoon center locations. Pressure and wind speed were selected as input variables, while datetime information and typhoon intensity level were not used. No external vortex detection algorithms were applied in this work.

The CFSv2 provided by NCEP is an integrated atmosphere-ocean-land coupled prediction system, designed to generate high-precision analysis fields and seasonal-to-interannual climate forecasts. Compared with its predecessor CFS, CFSv2 has significantly improved spatial resolution, vertical levels, data assimilation methods, and coupling mechanisms. Its Climate Data Assimilation System (CDAS) automatically integrates multi-source observational data from satellites, ground stations, and ocean buoys worldwide to generate high-quality reanalysis data. The system has been operational four times daily since 2011 (at 0, 6, 12, and 18 UTC), with a spatial resolution of 0.5° for atmospheric variables and 0.25° for oceanic variables (Saha et al., 2014 [30]). Thanks to its long temporal coverage, rich variable types, and multiple vertical levels, CDAS data from CFSv2 have been widely applied in climate monitoring, model-driven studies, and tropical cyclone trajectory analysis. In this study, the isobaric surface data from the CDAS dataset were used. These data include wind components, temperature, humidity, and circulation-related parameters, which collectively characterize the dynamic and thermodynamic properties of the atmospheric environment influencing typhoon movement. Therefore, these variables were selected as meteorological environmental input features for the model. In addition, these variables are consistently available in the CDAS dataset and have been widely used in previous tropical cyclone studies. The names of the selected variables and their descriptions are summarized in Table 2.

In this study, tropical cyclone data from 2011 to 2024 were selected as the research subjects. For the CMA dataset, observations at 6 h intervals starting from 00:00 daily were used, i.e., 00:00, 06:00, 12:00, and 18:00 each day. For the CDAS dataset, in order to align the time and coordinates with the CMA dataset, the reanalysis meteorological fields corresponding to the same timestamps (00:00, 06:00, 12:00, and 18:00) in the CDAS were extracted. Based on the typhoon center coordinates at each time step from the CMA dataset, a 24° × 24° region centered on the typhoon location was selected from the CDAS dataset as the input data. Through this approach, the meteorological field data and typhoon track data remain strictly aligned in time, with no additional time lag. The data resolution was 0.5° × 0.5°, and multiple isobaric levels (225 mbar, 500 mbar, and 750 mbar) were used as input feature maps.

Furthermore, to address the issue of insufficient typhoon samples and the difficulty for models to fully learn atmospheric evolution features, an atmospheric evolution dataset was randomly constructed under the same spatial range, resolution, and isobaric layer configuration as the typhoon feature maps, and was used for model pretraining. The two datasets used in this study are illustrated in Figure 1. In this study, the first five time steps were used as model inputs, and the subsequent four time steps were used as prediction targets.
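The five-in, four-out sample layout can be sketched as follows. This is only an illustration: the function name `make_windows`, the stride of one time step, and the toy storm record are assumptions, since the text does not state how consecutive samples overlap.

```python
def make_windows(track, n_in=5, n_out=4):
    """Split one storm's 6-hourly record into (input, target) pairs.

    `track` is a list of per-time-step records of any type. n_in and n_out
    follow the five-input / four-target setup described in the text; the
    stride of one step between samples is an assumption.
    """
    samples = []
    for start in range(len(track) - n_in - n_out + 1):
        inputs = track[start:start + n_in]
        targets = track[start + n_in:start + n_in + n_out]
        samples.append((inputs, targets))
    return samples


# Hypothetical storm with 10 six-hourly fixes, labelled by hours since the first fix.
storm = [6 * k for k in range(10)]
windows = make_windows(storm)
```

With this layout, a storm needs at least nine consecutive six-hourly records to yield one sample.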

In addition, to handle the differences in data structures between the CMA and CDAS datasets and the large differences in value ranges among different feature maps in the CDAS dataset, min–max normalization was applied to all data. In the CMA dataset, tropical cyclone positions are recorded using latitude and longitude coordinates, where the latitude ranges from 10° N to 78° N and the longitude from 102.5° E to 106.3° W. The model predicts changes in latitude and longitude, which are mostly less than 10°, so the range of the raw input coordinates is large by comparison. Therefore, min–max normalization was applied to the CMA inputs.

For the CDAS dataset, the value ranges of different feature maps vary greatly: some feature maps contain values on the order of tens of thousands, while others only have values in the single digits. Such large discrepancies could lead the model to ignore features with smaller values. Therefore, min–max normalization was applied separately to each feature map to ensure all inputs were scaled appropriately.
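A minimal sketch of per-feature-map min–max scaling is given below, assuming the fields are stored as a NumPy array of shape (C, H, W). The function name, the `eps` guard, and the toy values are illustrative. The statistics are computed here from the array itself; the text does not state whether the minimum and maximum were taken per sample or over the whole training set.

```python
import numpy as np


def minmax_per_channel(fields, eps=1e-12):
    """Scale each feature map (channel) of a (C, H, W) array to [0, 1]."""
    lo = fields.min(axis=(1, 2), keepdims=True)
    hi = fields.max(axis=(1, 2), keepdims=True)
    # eps avoids division by zero for a constant feature map.
    return (fields - lo) / np.maximum(hi - lo, eps)


# Two hypothetical feature maps with very different value ranges.
fields = np.stack([
    np.linspace(50000.0, 58000.0, 16).reshape(4, 4),  # values in the tens of thousands
    np.linspace(0.0, 5.0, 16).reshape(4, 4),          # values in the single digits
])
scaled = minmax_per_channel(fields)
```

After scaling, both maps span the same interval, so neither dominates purely because of its units.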

2.2. Method

With the introduction of CDAS reanalysis data, the diversity of feature maps contained in the dataset increases significantly. To enable the model to perceive and integrate multiple types of meteorological information, we combine a gated axial Transformer with other neural network components to construct a dual-branch model termed the Typhoon-Gated Axial Transformer (TGAT). This model fuses multi-channel data to improve the prediction accuracy of the typhoon track over the next 24 h. The overall architecture of the TGAT model is illustrated in Figure 2.

As shown in Figure 2, the proposed model consists of four main components: the Atmospheric Reanalysis Feature Encoding Module, the Typhoon Core Dynamics Encoding Module, the Typhoon Track Prediction Branch, and the Environmental Field Prediction Branch. The typhoon core dynamics encoding module feeds the input historical track data into an LSTM network and outputs a one-dimensional vector of length 48 that represents the typhoon’s movement tendency. The atmospheric reanalysis feature encoding module is responsible for extracting features from the input CDAS environmental feature maps, and its final output is a feature tensor with a shape of (512, 3, 3).

In the typhoon track prediction branch, the feature maps generated by the atmospheric reanalysis feature encoding module are first flattened and then fed into a Transformer module. The Transformer output is subsequently fused with the output of the typhoon core dynamics encoding module and passed to a decoder composed of an LSTM and fully connected layers to generate the final track prediction. In the environmental field prediction branch, a recurrent module consisting of linear layers and an LSTM first transforms the feature maps from five time steps to four time steps. The transformed features are then spatially reconstructed through convolutional upsampling to generate predicted future meteorological environment maps, enabling the model to be pretrained using meteorological data.

In the following subsections, we provide a detailed description of the atmospheric reanalysis feature encoding module and elaborate on the implementation mechanisms and functional roles of the LSTM and Transformer modules.

2.2.1. Long Short-Term Memory Network

The historical trajectory data of tropical cyclones is a multi-feature sequential dataset containing hidden temporal information, such as typhoon movement direction and intensity trends. To enable the model to capture temporal dynamics, researchers often employ RNNs or their variants to process such data. In this study, we adopt a Long Short-Term Memory (LSTM) network to process the historical trajectories of tropical cyclones.

The structure of an LSTM unit is illustrated in Figure 3. Compared with standard recurrent neural networks, its key feature is the presence of a cell state and three gating mechanisms: the input gate, forget gate, and output gate. The cell state is responsible for preserving important information from previous time steps. The forget gate $f_{t}$ controls the retention of the previous cell state $C_{t-1}$; the input gate $i_{t}$ regulates the incorporation of the current candidate cell state $\hat{C}_{t}$; the current cell state $C_{t}$ is obtained by combining $C_{t-1}$ and $\hat{C}_{t}$; finally, the output gate $O_{t}$ determines the hidden state $h_{t}$ at the current time step. The computation formulas are as follows:

f_{t} = σ (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})

(1)

i_{t} = σ (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i})

(2)

{\hat{C}}_{t} = \tanh (W_{C} \cdot [h_{t - 1}, x_{t}] + b_{C})

(3)

C_{t} = f_{t} ⊙ C_{t - 1} + i_{t} ⊙ {\hat{C}}_{t}

(4)

O_{t} = σ (W_{O} \cdot [h_{t - 1}, x_{t}] + b_{O})

(5)

h_{t} = O_{t} ⊙ \tanh (C_{t})

(6)

Compared with other recurrent neural networks, LSTM can more effectively handle long-term dependencies. Its unique gating mechanisms enable the network to learn how to selectively retain or discard information in the cell state, thus mitigating issues such as information loss over long sequences and the vanishing gradient problem.
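To make Equations (1)–(6) concrete, the following NumPy sketch performs one LSTM step at a time over a short input sequence. The layer sizes, the random weights, and the dictionary layout of the parameters are illustrative assumptions and do not reflect the configuration used in this study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Equations (1)-(6).

    W maps a gate name to a (hidden, hidden + input) matrix and
    b maps a gate name to a (hidden,) bias vector.
    """
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])      # Eq. (1): forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # Eq. (2): input gate
    c_hat = np.tanh(W["C"] @ z + b["C"])    # Eq. (3): candidate cell state
    c_t = f_t * c_prev + i_t * c_hat        # Eq. (4): new cell state
    o_t = sigmoid(W["O"] @ z + b["O"])      # Eq. (5): output gate
    h_t = o_t * np.tanh(c_t)                # Eq. (6): hidden state
    return h_t, c_t


rng = np.random.default_rng(0)
n_in, n_hid = 4, 8  # hypothetical sizes
W = {g: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for g in "fiCO"}
b = {g: np.zeros(n_hid) for g in "fiCO"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):  # five input time steps
    h, c = lstm_step(x_t, h, c, W, b)
```

The element-wise products in Equation (4) are what allow the cell state to carry information forward largely unchanged when the forget gate is close to one.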

2.2.2. Reanalysis Data Encoder Branch

The reanalysis data encoder branch consists of a CNN-based feature extraction layer, a global–local gated axial Transformer module, and a Transformer-based feature extraction layer. The CNN feature extraction layer is composed of a single convolutional layer, which performs preliminary feature extraction and adjusts the spatial shape and channel dimensions of the input images to facilitate subsequent processing.

To enable the model to simultaneously capture global and local information, a global–local gated axial Transformer module is employed to process the image data. This module processes the input through two parallel branches, namely a global branch and a local branch, and the final output is obtained by concatenating the outputs of both branches. The two branches are described as follows:

Global branch: This branch directly applies the gated axial Transformer to the entire input image to model long-range dependencies and capture global contextual information.

Local branch: In this branch, the input image is divided into nine non-overlapping patches of size $\frac{s}{3} \times \frac{s}{3}$, where $s$ denotes the spatial dimension of the original image. A gated axial Transformer is then applied to each patch independently. Finally, the outputs from the nine spatial locations are concatenated to form the output of the local branch.

Subsequently, the output of the global–local gated axial Transformer module is flattened into a one-dimensional vector and fed into a multi-layer Transformer module for further feature extraction, yielding the final output of the reanalysis data encoder branch.
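The global–local split described above can be sketched in NumPy as follows. The helper names are hypothetical, and an identity function stands in for the gated axial Transformer. Two details are assumptions, because the text does not specify them: the nine processed patches are reassembled at their original spatial locations, and the two branches are concatenated along the channel axis.

```python
import numpy as np


def split_into_patches(x, grid=3):
    """Split a (C, s, s) map into grid*grid non-overlapping (C, s/grid, s/grid) patches."""
    c, s, s2 = x.shape
    assert s == s2 and s % grid == 0, "s must be divisible by the grid size"
    p = s // grid
    # (C, grid, p, grid, p) -> (grid, grid, C, p, p) -> (grid*grid, C, p, p)
    return x.reshape(c, grid, p, grid, p).transpose(1, 3, 0, 2, 4).reshape(grid * grid, c, p, p)


def merge_patches(patches, grid=3):
    """Inverse of split_into_patches: place each patch back at its location."""
    _, c, p, _ = patches.shape
    return patches.reshape(grid, grid, c, p, p).transpose(2, 0, 3, 1, 4).reshape(c, grid * p, grid * p)


def global_local(x, attend):
    """Apply `attend` to the whole map and to each patch, then concatenate on channels."""
    global_out = attend(x)
    local_out = merge_patches(np.stack([attend(patch) for patch in split_into_patches(x)]))
    return np.concatenate([global_out, local_out], axis=0)


x = np.arange(2 * 6 * 6, dtype=float).reshape(2, 6, 6)
out = global_local(x, attend=lambda t: t)  # identity as a stand-in for the attention block
```

Because each patch is processed independently, the local branch only sees context within its own patch, while the global branch sees the full field.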

2.2.3. Self-Attention Mechanism and Transformer

Self-attention is a key technique used to model dependencies between different positions within an input sequence. It maps the input features into queries (Query), keys (Key), and values (Value) through three learnable linear transformations, which can be formulated as follows:

q_{i j} = x_{i j} W^{q}; k_{i j} = x_{i j} W^{k}; v_{i j} = x_{i j} W^{v}

(7)

where $W^{q}$, $W^{k}$, and $W^{v}$ are trainable parameters, and $q_{ij}$, $k_{ij}$, and $v_{ij}$ denote the query, key, and value at an arbitrary spatial position $(i, j)$ in the image, respectively.

When an input image $x \in \mathbb{R}^{C_{in} \times H \times W}$ with height $H$, width $W$, and $C_{in}$ channels is provided, the self-attention layer computes an output $Y \in \mathbb{R}^{C_{out} \times H \times W}$. The self-attention operation is formulated as follows:

y_{ij} = \sum_{h = 1}^{H} \sum_{w = 1}^{W} \mathrm{softmax} (q_{ij}^{T} k_{hw}) v_{hw}

(8)

As shown in the above formulation, the attention weights are determined by the similarity between $q_{ij}$ and $k_{hw}$, both of which are dynamically generated from the input $x$. This property enables the self-attention mechanism to adaptively adjust its weights according to different inputs, allowing each spatial position in an image to capture global contextual information.
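As an illustration of Equations (7) and (8), the sketch below computes global self-attention over every spatial position of a small feature map with NumPy. The projection sizes and random inputs are assumptions; as in Equation (8), no scaling factor is applied to the similarity scores.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_2d(x, Wq, Wk, Wv):
    """Global self-attention over all H*W positions of a (C_in, H, W) map.

    Wq, Wk: (C_in, d) projections; Wv: (C_in, C_out).
    Returns the (C_out, H, W) output and the (H*W, H*W) attention weights.
    """
    c_in, H, W = x.shape
    tokens = x.reshape(c_in, H * W).T                # one row per position (i, j)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv  # Eq. (7)
    weights = softmax(q @ k.T, axis=-1)              # normalised over all (h, w)
    y = weights @ v                                  # Eq. (8)
    return y.T.reshape(-1, H, W), weights


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4, 5))
Wq, Wk, Wv = (rng.normal(size=(3, d)) for d in (8, 8, 6))
y, weights = self_attention_2d(x, Wq, Wk, Wv)
```

The weight matrix has one row and one column per spatial position, which is why the cost of full 2D self-attention grows quadratically with the number of pixels.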

Building upon the self-attention mechanism, Vaswani et al. (2017) [31] proposed the Transformer architecture, which stacks multiple self-attention layers and feed-forward neural networks. In particular, the multi-head attention mechanism employs multiple independent attention heads to compute attention in parallel, with each head learning different feature distributions. Owing to the introduction of multi-head attention, Transformers are capable of effectively modeling diverse dependency patterns within sequences and capturing deep contextual relationships. In this study, Transformer modules are also adopted for feature extraction.

2.2.4. Gated Axial Attention Mechanism

With the strong representation capability demonstrated by Transformer architectures, researchers have increasingly explored their application in computer vision tasks. Dosovitskiy et al. (2021) [32] proposed the Vision Transformer (ViT), which partitions an image into patches, flattens them, and feeds them into a Transformer to extract image features. Studies have shown that when trained on sufficiently large datasets, ViT can outperform traditional convolutional neural networks. However, despite its advantages—such as parallel computation and powerful modeling capacity—ViT suffers from several limitations, including high computational complexity, large memory consumption, and insufficient capability to explicitly capture spatial positional information. Moreover, ViT tends to overfit on small-scale datasets, leading to inferior performance compared with conventional CNNs in data-limited scenarios.

To address the high computational cost of self-attention and its limited ability to capture positional information, Wang et al. proposed an axial attention mechanism with relative positional encoding [33]. Specifically, axial attention abandons full 2D attention computation at each spatial location and instead decomposes attention along the height and width axes independently.

The gated axial attention module adopted in this study is illustrated in Figure 4. As shown, the module consists of a $1 \times 1$ convolutional layer, a normalization layer, and multiple layers of gated multi-head axial attention operating along horizontal and vertical directions. The $1 \times 1$ convolution is mainly responsible for channel projection and dimensional adjustment, the normalization layer stabilizes the training process, and the stacked gated axial multi-head attention layers efficiently extract long-range dependency features from the image while significantly reducing computational overhead.

In addition, Wang et al. [33] introduced positional bias terms into the queries (Q), keys (K), and values (V), enabling the attention mechanism to more accurately capture positional information within the sequence. While their approach addresses the issues of high computational cost and the lack of positional awareness in self-attention, their experiments were conducted on relatively large datasets. However, studies have shown that although self-attention mechanisms can effectively capture data features when trained on large-scale datasets, they may fail to realize their advantages on smaller datasets. This limitation is mainly attributed to the difficulty of learning effective positional encodings in small-scale datasets, which leads to reduced accuracy in encoding long-range dependencies.

To address the poor performance of self-attention mechanisms on small-scale datasets, Valanarasu et al. proposed the gated axial attention module [34], which incorporates multiple learnable gating parameters to control the influence of positional encodings in global context modeling. The structure of the gated attention mechanism is illustrated in Figure 5.

Here, $G_{Q}$, $G_{K}$, $G_{V1}$, and $G_{V2}$ are learnable gating parameters. They can learn the relative importance of the positional encodings at different locations, assigning higher weights to positions where the positional encoding is more accurate. The computation formula for the axial attention with the gated mechanism is as follows:

y_{ij} = \sum_{w = 1}^{W} \mathrm{softmax} (q_{ij}^{T} k_{iw} + G_{Q} q_{ij}^{T} r_{iw}^{q} + G_{K} k_{iw}^{T} r_{iw}^{k}) \cdot (G_{V1} v_{iw} + G_{V2} r_{iw}^{v})

(9)
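A single-head NumPy sketch of Equation (9) along the width axis is shown below. It is an illustration under stated assumptions rather than the implementation used in this study: the gates are plain scalars, the positional encodings are built from a hypothetical table of relative offsets, and one table is shared by queries, keys, and values for brevity.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def gated_axial_attention_width(q, k, v, r_q, r_k, r_v, g_q, g_k, g_v1, g_v2):
    """Gated axial attention along the width axis, following Equation (9).

    q, k: (H, W, d); v: (H, W, d_v).
    r_q, r_k: (W, W, d) and r_v: (W, W, d_v) hold the positional encoding for
    each (query column j, key column w) pair; g_* are scalar gates.
    """
    content = np.einsum("ijd,iwd->ijw", q, k)       # q_ij^T k_iw
    q_pos = np.einsum("ijd,jwd->ijw", q, r_q)       # q_ij^T r^q
    k_pos = np.einsum("iwd,jwd->ijw", k, r_k)       # k_iw^T r^k
    attn = softmax(content + g_q * q_pos + g_k * k_pos, axis=-1)  # normalise over w
    out = np.einsum("ijw,iwd->ijd", attn, v)        # sum over w of attn * v_iw
    out_pos = np.einsum("ijw,jwd->ijd", attn, r_v)  # sum over w of attn * r^v
    return g_v1 * out + g_v2 * out_pos


rng = np.random.default_rng(0)
H, W, d = 3, 4, 5
q, k, v = (rng.normal(size=(H, W, d)) for _ in range(3))
rel = rng.normal(size=(2 * W - 1, d))  # one vector per relative offset w - j
offsets = np.arange(W)[None, :] - np.arange(W)[:, None] + W - 1
r = rel[offsets]                       # (W, W, d)
y = gated_axial_attention_width(q, k, v, r, r, r, g_q=0.5, g_k=0.5, g_v1=1.0, g_v2=0.5)
```

Setting the positional gates to zero reduces the layer to plain row-wise attention, which is how the gates let the model discount positional encodings that are poorly learned on small datasets. Attention along the height axis can be obtained by applying the same function to inputs with the two spatial axes swapped.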

2.2.5. Environmental Field Prediction Branch

The environmental field prediction branch consists of a temporal transformation module and a convolutional upsampling module. The temporal transformation module is responsible for extracting and reconstructing features along the temporal dimension to match the input requirements of subsequent modules. To prevent an excessive number of model parameters, a fully connected layer is first applied to scale the feature dimensions, followed by an LSTM network to reconstruct the features along the temporal dimension.

The convolutional upsampling module adopts a progressive upsampling strategy to spatially reconstruct the features generated by the temporal transformation module. The detailed architecture is illustrated in Figure 6. In this module, two image upsampling blocks are employed, each of which first applies convolutional layers for feature extraction, followed by bilinear interpolation to increase the spatial resolution. Bilinear interpolation is adopted because it effectively mitigates the mosaic effect, thereby preserving spatial continuity and smoothness while restoring spatial resolution. Finally, a $1 \times 1$ convolutional layer is applied to compress and integrate the feature channels.

3. Results and Discussion

The numerical experiments were conducted on a workstation running the Windows 11 24H2 operating system, with hardware configurations including an NVIDIA GeForce RTX 4090 GPU (NVIDIA Corporation, Santa Clara, CA, USA), an Intel Core i7-13700KF CPU (Intel Corporation, Santa Clara, CA, USA), and 32 GB of RAM. The Python 3.12.3 environment was installed via Conda 24.11.3. The deep learning models were implemented using PyTorch 2.6.0 with CUDA 12.4. The implemented loss function and optimizer were L1Loss and Adam, respectively.

3.1. Experimental Details

3.1.1. Evaluation Metrics

The model in this study outputs a vector sequence $\{(\Delta \mathrm{lat}_{i}, \Delta \mathrm{lon}_{i}) \mid i = 1, 2, 3, 4\}$, representing the differences between the typhoon coordinates at 6 h, 12 h, 18 h, and 24 h after the last time step of the input sequence and the coordinates at the last input time step. Let the last input in the sequence be the coordinates at time $n$, $(\mathrm{lat}_{n}, \mathrm{lon}_{n})$. The model’s predicted coordinates $\mathrm{pre}$ are then calculated as:

\mathrm{pre} = \{(\mathrm{lat}_{n} + \Delta \mathrm{lat}_{i}, \mathrm{lon}_{n} + \Delta \mathrm{lon}_{i}) \mid i = 1, 2, 3, 4\}

We use the Average Position Error (APE) as the primary evaluation metric. APE is widely used in typhoon track prediction because of its accurate and intuitive measurement of prediction precision. In practice, the spherical distance between the predicted and actual coordinates is first calculated using spherical trigonometry. APE reflects the average great-circle distance between the predicted typhoon center and the true center, calculated as:

\mathrm{APE} = \frac{1}{N} \sum_{i = 1}^{N} d ((\mathrm{lat}_{i}^{\mathrm{pred}}, \mathrm{lon}_{i}^{\mathrm{pred}}), (\mathrm{lat}_{i}^{\mathrm{true}}, \mathrm{lon}_{i}^{\mathrm{true}}))

(10)

d = 2 R \times \arcsin (\sqrt{\sin^{2} (\frac{\mathrm{lat}^{\mathrm{pred}} - \mathrm{lat}^{\mathrm{true}}}{2}) + \cos (\mathrm{lat}^{\mathrm{pred}}) \cos (\mathrm{lat}^{\mathrm{true}}) \sin^{2} (\frac{\mathrm{lon}^{\mathrm{pred}} - \mathrm{lon}^{\mathrm{true}}}{2})})

(11)

where $N$ in Equation (10) is the number of samples at the current prediction time, $d$ is the great-circle distance, and $R$ in Equation (11) is the radius of the Earth (usually 6371 km).
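The metric can be sketched in plain Python as follows: predicted displacements are added to the last input position, and the APE is the mean haversine distance of Equation (11). The function names and the sample coordinates are illustrative, and angles are assumed to be given in degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Haversine distance of Equation (11); inputs in degrees, output in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))  # clamp guards rounding above 1


def predicted_positions(last_fix, deltas):
    """Add the predicted (dlat, dlon) displacements to the last input position."""
    lat_n, lon_n = last_fix
    return [(lat_n + dlat, lon_n + dlon) for dlat, dlon in deltas]


def average_position_error(pred, true):
    """APE of Equation (10): mean great-circle distance over N samples."""
    dists = [great_circle_km(p[0], p[1], t[0], t[1]) for p, t in zip(pred, true)]
    return sum(dists) / len(dists)


# Hypothetical 6 h forecast for a storm last observed at 20.0 N, 130.0 E.
pred = predicted_positions((20.0, 130.0), [(0.5, -0.5)])
ape = average_position_error(pred, [(20.4, 129.6)])
```

In the evaluation described here, the APE would be computed separately for each forecasting horizon, with $N$ counting the samples at that horizon.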

3.1.2. Experimental Setup

In this study, data from 2011 to 2018 were used as the training set, 2019 to 2020 as the validation set, and 2021 to 2024 as the test set. The initial learning rate was set to 0.0005, the batch size to 32, and the number of training epochs to 150. L1 loss was used to compute prediction errors, and the Adam optimizer was employed to update model parameters. During the model pretraining stage, all other settings remained the same, but the number of training epochs was reduced to 50.
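A small sketch of the chronological split is given below. The sample layout (a dictionary with a `year` key) and the configuration key names are assumptions for illustration; the year ranges and hyperparameter values are those stated above.

```python
def split_by_year(samples, train=(2011, 2018), val=(2019, 2020), test=(2021, 2024)):
    """Partition samples chronologically; each sample is a dict with a 'year' key."""
    ranges = {"train": train, "val": val, "test": test}
    splits = {name: [] for name in ranges}
    for sample in samples:
        for name, (first, last) in ranges.items():
            if first <= sample["year"] <= last:
                splits[name].append(sample)
                break
    return splits


# Hyperparameters quoted in the text; the dictionary keys are illustrative.
CONFIG = {"lr": 5e-4, "batch_size": 32, "epochs": 150, "pretrain_epochs": 50}

samples = [{"year": year, "storm": f"storm-{year}"} for year in range(2011, 2025)]
splits = split_by_year(samples)
```

Splitting by year rather than at random keeps every time step of a given storm in a single subset, so no test storm is partially seen during training.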

3.2. Comparative Experiments

To verify the effectiveness of the proposed method, we compared it with several models proposed by other researchers in recent years. The baseline methods include ViT-LSTM, CNN-LSTM, AM-ConvGRU, and a Spatio-temporal model. In the numerical experiments, these models used the same loss function, batch size, learning rate, and other hyperparameters as the proposed TGAT model.

Based on this setup, we compared the APE between the predicted and true tropical cyclone coordinates at forecasting horizons of 6 h, 12 h, 18 h, and 24 h. The numerical experimental results are summarized in Table 3.

As shown in Table 3, compared with the Spatio-temporal model, which achieves the lowest error among the baseline methods, the proposed TGAT model reduces the prediction error by 2.51%, 2.14%, 3.12%, and 3.48% at the four forecasting horizons within the 24 h prediction task. These results indicate that TGAT outperforms the baselines in tropical cyclone track prediction, with particularly notable improvements at the longer lead times.

In addition, we evaluated the cross-year stability of the models by comparing their average errors in each test year (Table 4). The TGAT model achieves the lowest error in all years except 2024, indicating strong adaptability and robustness. This suggests that TGAT can reliably predict the tracks of most tropical cyclones, rather than performing well only under specific conditions.

To more intuitively demonstrate the effectiveness of the TGAT model in tropical cyclone track prediction, we randomly selected four intense typhoons from the test set and generated their predicted tracks based on the 24 h forecast positions, which were then compared with the corresponding observed tracks. As shown in Figure 7, the TGAT model is generally able to accurately capture the overall movement trends of tropical cyclones. However, larger prediction errors tend to occur when cyclones undergo abrupt changes in direction. This is likely due to the increased complexity of the atmospheric environment under such conditions, making it more difficult for the model to fully capture and represent the key influencing features.

Furthermore, to evaluate the practical performance of the proposed model, TGAT was compared with CMA-GFS, the operational global forecast system of the China Meteorological Administration. According to the 2023 typhoon forecast evaluation results (Yang et al., 2025) [35], the 24 h mean track forecast error of CMA-GFS was 77.4 km, whereas the 24 h APE of TGAT on the 2021 to 2024 test set is 167.23 km (Table 3). The proposed model has therefore not yet reached the accuracy of operational numerical weather prediction systems, but as a deep learning-based approach it offers advantages in computational cost and data dependency. Under the same input data conditions, TGAT can achieve reasonable track prediction performance with lower computational resources, providing an efficient technical framework for tropical cyclone track forecasting. Future work will focus on incorporating additional environmental variables and further improving the model architecture to narrow the gap with operational systems while maintaining computational efficiency.

3.3. Ablation Study

To verify the effectiveness of each component in the proposed model, we conducted a series of ablation experiments. In the ablation study, four models with different structural completeness were compared in terms of prediction accuracy to evaluate the contribution of each module. Among them, CNN + LSTM serves as the most basic prediction model. Its overall structure is similar to that of TGAT, but it only employs a CNN module to encode atmospheric reanalysis features and does not introduce the Transformer architecture. The CNN + Transformer + LSTM model further incorporates a standard Transformer module to perform additional modeling and fusion of the features extracted by the CNN. For the complete TGAT model, two configurations were evaluated, namely with and without pretraining, in order to assess the impact of the axial Transformer structure and the dual-branch pretraining strategy on model performance. The results of the ablation experiments are presented in Table 5.

It can be observed that the Average Position Error (APE) gradually decreases as the model architecture becomes more sophisticated. Specifically, incorporating a Transformer module to further extract features from the CNN outputs leads to a significant improvement in prediction accuracy, indicating that the Transformer is effective in capturing informative representations from convolutional features. Building upon this, the adoption of the TGAT architecture results in further improvements in prediction accuracy, demonstrating its superior capability in modeling spatial dependencies and their dynamic evolution. In addition, the introduction of a pretraining strategy further improves the model’s data-fitting ability.
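The size of each step can be quantified directly from the tabulated values. The sketch below computes the relative APE reduction of each variant over the previous one; the numbers are copied from Table 5, while the helper names `relative_reduction` and `stepwise_gains` are illustrative.

```python
# APE values (km) copied from Table 5; columns are the 6, 12, 18 and 24 h horizons.
ABLATION_APE = [
    ("CNN + LSTM", [59.21, 119.02, 185.91, 255.35]),
    ("CNN + Transformer + LSTM", [51.32, 102.20, 158.77, 231.29]),
    ("TGAT without pretrain", [38.73, 77.67, 122.50, 177.71]),
    ("TGAT", [36.85, 73.53, 115.67, 167.23]),
]


def relative_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before


def stepwise_gains(rows):
    """Relative APE reduction of each variant over the previous one, per horizon."""
    gains = []
    for (_, prev), (name, cur) in zip(rows, rows[1:]):
        gains.append((name, [relative_reduction(p, c) for p, c in zip(prev, cur)]))
    return gains


if __name__ == "__main__":
    for name, per_horizon in stepwise_gains(ABLATION_APE):
        print(name, [round(g, 1) for g in per_horizon])
```

With these values, the largest step is the move from the standard Transformer variant to the gated axial architecture (roughly 23 to 25% at each horizon), while pretraining contributes a further reduction of about 5 to 6%.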

3.4. Model Interpretability Analysis

To further investigate the contribution of different input features, feature ablation and gradient-based saliency analyses were conducted, as shown in Figure 8 and Figure 9. In the computation of the saliency maps in Figure 9, the results of each feature were averaged over three pressure levels. In addition, a smoothing operation was applied to reduce noise and enhance interpretability.
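The post-processing of the saliency maps can be sketched as follows. The example assumes that the absolute input gradients of one feature are already available as one 2D grid per pressure level; the gradient computation itself is not shown. The smoothing filter is not specified in the text, so a simple mean (box) filter is used here as a stand-in, and all function names are hypothetical.

```python
def average_levels(level_maps):
    """Average a list of equally sized 2D maps (one per pressure level)."""
    n = len(level_maps)
    rows, cols = len(level_maps[0]), len(level_maps[0][0])
    return [[sum(m[r][c] for m in level_maps) / n for c in range(cols)]
            for r in range(rows)]


def box_smooth(grid, radius=1):
    """Mean filter; border cells average over the neighbours that exist."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [grid[i][j]
                    for i in range(max(0, r - radius), min(rows, r + radius + 1))
                    for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            out[r][c] = sum(vals) / len(vals)
    return out


def feature_saliency(abs_grads_per_level):
    """Combine per-level |gradient| maps of one feature into a smoothed map."""
    return box_smooth(average_levels(abs_grads_per_level))


if __name__ == "__main__":
    # Toy 3x3 gradient magnitudes for three pressure levels.
    levels = [
        [[0.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.0, 6.0, 0.0], [0.0, 0.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]],
    ]
    for row in feature_saliency(levels):
        print([round(v, 3) for v in row])
```

Because border cells are averaged over fewer neighbours, values near the edge of a map should be interpreted with some care under this particular filter.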

As illustrated in Figure 8, removing variables such as W, V, and VP leads to a significant increase in prediction error, with the maximum increase reaching up to 36%, indicating that these variables play a dominant role in model performance. In contrast, GH, RH, and Q have relatively smaller impacts on the prediction results.
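A feature ablation of this kind can be sketched as a loop that removes one variable at a time and records the relative change in APE. The text does not state how a feature is removed, so the example assumes it is replaced by a constant; `ablation_importance` and the toy evaluator are hypothetical and do not reproduce the values in Figure 8.

```python
def ablation_importance(evaluate, features, fill_value=0.0):
    """Relative APE increase (%) when each feature is replaced by a constant.

    `evaluate` maps a dict of feature sequences to an APE value in km.
    """
    base = evaluate(features)
    increase = {}
    for name in features:
        ablated = dict(features)  # shallow copy; only one entry is swapped
        ablated[name] = [fill_value] * len(features[name])
        increase[name] = 100.0 * (evaluate(ablated) - base) / base
    return base, increase


# Toy stand-in for a trained model: the error grows when a feature is zeroed.
TOY_WEIGHTS = {"strong": 30.0, "weak": 5.0}


def toy_evaluate(features):
    penalty = 0.0
    for name, weight in TOY_WEIGHTS.items():
        mean_value = sum(features[name]) / len(features[name])
        penalty += weight * (1.0 - mean_value)
    return 100.0 + penalty


if __name__ == "__main__":
    feats = {"strong": [1.0, 1.0], "weak": [1.0, 1.0]}
    base_ape, gains = ablation_importance(toy_evaluate, feats)
    print(base_ape, gains)
```

Ranking the resulting percentages from largest to smallest gives an importance ordering analogous to the bar chart described above.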

The spatial saliency results in Figure 9 further indicate that the model tends to focus on localized key regions while maintaining a moderate level of response across most areas. Combined with Figure 8, it can be observed that features with higher importance generally exhibit more concentrated and prominent high-response regions in Figure 9. In contrast, features with lower importance, such as GH, RH, and Q, tend to show more dispersed regions with moderate responses. This suggests that the former contain more concentrated and informative patterns that are easier for the model to capture, whereas the latter exhibit more complex or less distinct spatial structures that are harder for the model to learn.

It is also noteworthy that, although most high-saliency regions are concentrated around the typhoon center, relatively strong responses are also observed in the upper-right regions for features such as T, Q, STRF, and VP. Considering the geographical characteristics of the Western North Pacific, this may indicate that the model attends to variations in these features when the typhoon approaches coastal or land areas.

Overall, the combination of feature importance analysis and spatial saliency analysis demonstrates that the proposed model is capable of not only identifying key input variables but also learning their spatial distributions, thereby enhancing model interpretability. However, for certain features, the model still shows limitations in representation capability, suggesting that its ability to capture complex patterns could be further improved.

4. Conclusions

This study investigates tropical cyclone track prediction using deep learning-based approaches. Specifically, to enable long-term tropical cyclone track forecasting, we constructed a hybrid dataset with high temporal and spatial coverage spanning 2011 to 2024 by integrating CMA and CFSv2 datasets. A sliding-window strategy was employed to generate the model inputs and outputs. In addition, to improve data quality and enhance model fitting capability, normalization was applied separately to different data sources.
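The sliding-window construction and per-source normalization can be illustrated with a short sketch. The window lengths used below (four input steps and four output steps) and the min-max scaling are assumptions made for illustration only, and the function names are hypothetical.

```python
def sliding_windows(sequence, n_in, n_out):
    """Split one track into (input, target) pairs of consecutive time steps."""
    pairs = []
    for start in range(len(sequence) - n_in - n_out + 1):
        mid = start + n_in
        pairs.append((sequence[start:mid], sequence[mid:mid + n_out]))
    return pairs


def min_max_normalize(values):
    """Scale one data source to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    track = list(range(10))  # stand-in for 10 consecutive observations
    for inputs, targets in sliding_windows(track, n_in=4, n_out=4):
        print(inputs, "->", targets)
    # Each source is normalized with its own statistics.
    print(min_max_normalize([2.0, 4.0, 6.0]))
```

Normalizing each source separately keeps variables with very different physical ranges, such as track coordinates and pressure-level fields, on comparable scales.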

To address the challenge of insufficient spatial feature extraction in long-term forecasting, we proposed the Typhoon-Gated Axial Transformer (TGAT) model. The proposed model combines the efficiency of convolutional neural networks in local feature extraction with the ability of Transformers to model global dependencies. Furthermore, a gated axial attention mechanism was introduced to effectively control parameter redundancy in Transformer-based image modeling, thereby improving computational efficiency and generalization performance. A pretraining strategy was also incorporated to enhance the model’s capability to perceive and model future environmental changes.
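The efficiency argument for axial attention can be made concrete by counting query-key pairs. Full self-attention over an H × W feature map compares every position with every other position, whereas axial attention attends along the height axis and then along the width axis. The grid size below is illustrative and not taken from the model, and the gating mechanism does not change these counts.

```python
def full_attention_pairs(height, width):
    """Query-key pairs for self-attention over all height * width positions."""
    n = height * width
    return n * n


def axial_attention_pairs(height, width):
    """Pairs for one height-axis pass plus one width-axis pass."""
    n = height * width
    return n * height + n * width


if __name__ == "__main__":
    h, w = 32, 32  # illustrative feature-map size
    full = full_attention_pairs(h, w)
    axial = axial_attention_pairs(h, w)
    print(full, axial, full / axial)
```

In general the ratio between the two counts is H·W / (H + W), so the saving grows with the spatial resolution of the feature map.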

Results from the numerical experiments demonstrate that TGAT achieves lower Average Position Error (APE) than all comparison models in 6–24 h track prediction tasks and exhibits superior robustness in cross-year stability evaluations. In addition, ablation experiments confirm the contributions of individual model components, indicating that both the Transformer module and the gated axial attention mechanism play critical roles in improving prediction accuracy.

Although TGAT achieves strong performance on most test samples, we observe that prediction errors increase when tropical cyclones undergo abrupt directional changes. This limitation may arise from the difficulty of learning complex atmospheric features under special environmental conditions from limited samples. Moreover, while the gated axial attention mechanism effectively reduces the computational cost of Transformer-based image modeling, the overall computational overhead of the model remains higher than that of other comparable methods. Future work will focus on improving the model’s fitting capability for rare and extreme cases, as well as exploring more efficient architectural designs to further reduce computational complexity and enhance operational efficiency.

Author Contributions

Conceptualization, W.L. and X.Z.; methodology, X.H.; validation, X.H., X.Z. and K.F.; software, X.H. and K.F.; investigation, X.H. and K.F.; data curation, X.H. and K.F.; writing—original draft preparation, X.H.; writing—review and editing, W.L., X.Z. and X.H.; visualization, X.H. and K.F.; supervision, X.Z. and W.L.; project administration, X.Z. and W.L.; funding acquisition, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Key Research Platforms and Projects of Ordinary Universities under the Guangdong Provincial Department of Education (grant number: 2023ZDZX1049, applicant: Wei Lv).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Lee, T.-C.; Knutson, T.R.; Nakaegawa, T.; Ying, M.; Cha, E.J. Third Assessment on Impacts of Climate Change on Tropical Cyclones in the Typhoon Committee Region—Part I: Observed Changes, Detection and Attribution. Trop. Cyclone Res. Rev. 2020, 9, 1–22.
2. Chao, W.-T.; Kuo, T.-J. Long Short-Term Memory Networks’ Application on Typhoon Wave Prediction for the Western Coast of Taiwan. Sensors 2024, 24, 4305.
3. Park, J.; Lee, C. TPTNet: A Data-Driven Temperature Prediction Model Based on Turbulent Potential Temperature. Earth Space Sci. 2024, 11, e2024EA003523.
4. Zhou, J.; Xiang, J.; Huang, S. Classification and Prediction of Typhoon Levels by Satellite Cloud Pictures through GC–LSTM Deep Learning Model. Sensors 2020, 20, 5132.
5. Xu, G.; Lin, K.; Li, X.; Ye, Y. SAF-Net: A Spatio-Temporal Deep Learning Method for Typhoon Intensity Prediction. Pattern Recognit. Lett. 2022, 155, 121–127.
6. Xu, G.; Xian, D.; Fournier-Viger, P.; Li, X.; Ye, Y.; Hu, X. AM-ConvGRU: A Spatio-Temporal Model for Typhoon Path Prediction. Neural Comput. Appl. 2022, 34, 5905–5921.
7. Qin, W.; Tang, J.; Lu, C.; Lao, S. A Typhoon Trajectory Prediction Model Based on Multimodal and Multitask Learning. Appl. Soft Comput. 2022, 122, 108804.
8. He, H.; Shi, B.; Hao, Y.; Feng, L.; Lyu, X.; Ling, Z. Forecasting Sea Surface Temperature during Typhoon Events in the Bohai Sea Using Spatiotemporal Neural Networks. Atmos. Res. 2024, 309, 107578.
9. Park, Y.-J.; Seo, M.; Kim, D.; Kim, H.; Choi, S.; Choi, B.; Ryu, J.; Son, S.; Jeon, H.-G.; Choi, Y. Long-Term Typhoon Trajectory Prediction: A Physics-Conditioned Approach Without Reanalysis Data. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024.
10. Tian, W.; Song, P.; Chen, Y.; Xu, H.; Jin, C.; Sian, K.T.C.L.K. Short-Term Intensity Prediction of Tropical Cyclones Based on Multi-Source Data Fusion with Adaptive Weight Learning. Remote Sens. 2024, 16, 984.
11. Qiao, B.; Wang, Y.; Yao, L.; Han, D.; Wu, G. Attention Mechanism Fusion Neural Network for Typhoon Path Prediction. Appl. Intell. 2024, 55, 244.
12. Ren, S.; Zhong, R.; Guo, Z.; Zhang, Z. Dual-Encoder Model for Typhoon Path Prediction with Multiscale Spatiotemporal Data Fusion. Earth Sci. Inform. 2025, 18, 385.
13. Andrychowicz, M.; Espeholt, L.; Li, D.; Merchant, S.; Merose, A.; Zyda, F.; Agrawal, S.; Kalchbrenner, N. Deep Learning for Day Forecasts from Sparse Observations. arXiv 2023, arXiv:2306.06079.
14. Gao, Z.; Shi, X.; Han, B.; Wang, H.; Jin, X.; Maddix, D.; Zhu, Y.; Li, M.; Wang, Y. PreDiff: Precipitation Nowcasting with Latent Diffusion Models. Adv. Neural Inf. Process. Syst. 2023, 36, 78621–78656.
15. Hu, Y.; Chen, L.; Wang, Z.; Li, H. SwinVRNN: A Data-Driven Ensemble Forecasting Model via Learned Distribution Perturbation. J. Adv. Model. Earth Syst. 2023, 15, e2022MS003211.
16. Ling, Z.; Nath, P.; Quilodrán-Casas, C. Estimating Atmospheric Variables from Digital Typhoon Satellite Images via Conditional Denoising Diffusion Models. arXiv 2024, arXiv:2409.07961.
17. Ren, Y.; Ye, J.; Wang, X.; Xiao, F.; Liu, R. SAM-Net: Spatio-Temporal Sequence Typhoon Cloud Image Prediction Net with Self-Attention Memory. Remote Sens. 2024, 16, 4213.
18. Lin, L.; Zhang, Z.; Yu, H.; Wang, J.; Gao, S.; Zhao, H.; Zhang, J. StHCFormer: A Multivariate Ocean Weather Predicting Method Based on Spatiotemporal Hybrid Convolutional Attention Networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 3600–3614.
19. Kochkov, D.; Yuval, J.; Langmore, I.; Norgaard, P.; Smith, J.; Mooers, G.; Klöwer, M.; Lottes, J.; Rasp, S.; Düben, P.; et al. Neural General Circulation Models for Weather and Climate. Nature 2024, 632, 1060–1066.
20. Li, Z.; Lu, Z.; Li, Y.; Liu, X. SwinNowcast: A Swin Transformer-Based Model for Radar-Based Precipitation Nowcasting. Remote Sens. 2025, 17, 1550.
21. Xu, C.; Liu, J.; Han, S.; Duan, X.; Xiang, L.; Zhang, T. FourCastLSTM: A Precipitation Nowcasting Model Integrating Global and Local Spatiotemporal Features. Comput. Geosci. 2025, 204, 105966.
22. Zhang, Y.; Long, M.; Chen, K.; Xing, L.; Jin, R.; Jordan, M.I.; Wang, J. Skilful Nowcasting of Extreme Precipitation with NowcastNet. Nature 2023, 619, 526–532.
23. Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; Tian, Q. Accurate Medium-Range Global Weather Forecasting with 3D Neural Networks. Nature 2023, 619, 533–538.
24. Chen, K.; Han, T.; Gong, J.; Bai, L.; Ling, F.; Luo, J.-J.; Chen, X.; Ma, L.; Zhang, T.; Su, R.; et al. FengWu: Pushing the Skillful Global Medium-Range Weather Forecast beyond 10 Days Lead. arXiv 2023, arXiv:2304.02948.
25. Lam, R.; Sanchez-Gonzalez, A.; Willson, M.; Wirnsberger, P.; Fortunato, M.; Alet, F.; Ravuri, S.; Ewalds, T.; Eaton-Rosen, Z.; Hu, W.; et al. Learning Skillful Medium-Range Global Weather Forecasting. Science 2023, 382, 1416–1421.
26. Niu, Z.; Huang, W.; Zhang, L.; Deng, L.; Wang, H.; Yang, Y.; Wang, D.; Li, H. Improving Typhoon Predictions by Integrating Data-Driven Machine Learning Model with Physics Model Based on the Spectral Nudging and Data Assimilation. Earth Space Sci. 2025, 12, e2024EA003952.
27. Bodnar, C.; Bruinsma, W.P.; Lucic, A.; Stanley, M.; Allen, A.; Brandstetter, J.; Garvan, P.; Riechert, M.; Weyn, J.A.; Dong, H.; et al. A Foundation Model for the Earth System. Nature 2025, 641, 1180–1187.
28. Ying, M.; Zhang, W.; Yu, H.; Lu, X.; Feng, J.; Fan, Y.; Zhu, Y.; Chen, D. An Overview of the China Meteorological Administration Tropical Cyclone Database. J. Atmos. Ocean. Technol. 2014, 31, 287–301.
29. Lu, X.; Yu, H.; Ying, M.; Zhao, B.; Zhang, S.; Lin, L.; Bai, L.; Wan, R. Western North Pacific Tropical Cyclone Database Created by the China Meteorological Administration. Adv. Atmos. Sci. 2021, 38, 690–699.
30. Saha, S.; Moorthi, S.; Wu, X.; Wang, J.; Nadiga, S.; Tripp, P.; Behringer, D.; Hou, Y.-T.; Chuang, H.; Iredell, M.; et al. The NCEP Climate Forecast System Version 2. J. Clim. 2014, 27, 2185–2208.
31. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
32. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
33. Wang, H.; Zhu, Y.; Green, B.; Adam, H.; Yuille, A.; Chen, L.-C. Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 108–126.
34. Valanarasu, J.M.J.; Oza, P.; Hacihaliloglu, I.; Patel, V.M. Medical Transformer: Gated Axial-Attention for Medical Image Segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2021, Strasbourg, France, 27 September–1 October 2021; de Bruijne, M., Cattin, P.C., Cotin, S., Padoy, N., Speidel, S., Zheng, Y., Essert, C., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 36–46.
35. Yang, M.; Chen, G.; Chen, C.; Zhang, X.; Tang, L.; Bai, L.; Guo, R. Evaluation of the Forecast Accuracy of Tropical Cyclones over the Western North Pacific and the South China Sea in 2023. Available online: http://qxqk.nmc.cn/html/2025/12/20251208.html#outline_anchor_17 (accessed on 8 March 2026).

Figure 1. Example of image data and typhoon trajectory inputs at each time step.

Figure 2. Architecture of the TGAT model.

Figure 3. Structure of an LSTM unit. The orange dashed box represents the forget gate, the blue dashed box represents the input gate, and the yellow dashed box represents the output gate.

Figure 4. Structure of the Gated Axial Transformer.

Figure 5. Structure of the Gated Attention Mechanism.

Figure 6. Structure of an upsampling module.

Figure 7. Trajectory comparison of four selected typhoons: (a) Typhoon Muifa, (b) Typhoon Guchol, (c) Typhoon Lan, and (d) Typhoon Son-Tinh. The blue solid line represents the ground truth, while the orange dashed line indicates the predicted results.

Figure 8. Feature importance analysis of the TGAT model. The bar chart presents the Average Position Error (APE) when each meteorological feature is removed during prediction, compared with the full-feature baseline (TGAT). The abbreviations of the features are as follows: GH—Geopotential Height, T—Temperature, RH—Relative Humidity, Q—Specific Humidity, W—Vertical Velocity, U—u-component of wind, V—v-component of wind, ABSV—Absolute Vorticity, O3MR—Ozone Mixing Ratio, STRF—Stream Function, and VP—Potential Vorticity. A higher bar indicates a larger increase in APE after removing the corresponding feature, suggesting greater importance in typhoon track prediction. The first bar represents the TGAT baseline using the complete feature set.

Figure 9. Spatial saliency maps of different input features. The saliency maps are computed based on gradient-based methods and averaged over three pressure levels. A smoothing operation is applied to reduce noise and improve interpretability. Each subfigure corresponds to a specific atmospheric variable: (a) Geopotential Height, (b) Temperature, (c) Relative Humidity, (d) Specific Humidity, (e) Vertical Velocity, (f) u-component of wind, (g) v-component of wind, (h) Absolute Vorticity, (i) Ozone Mixing Ratio, (j) Stream Function, and (k) Potential Vorticity. The color bar indicates the magnitude of saliency values, with colors ranging from blue to red representing increasing response intensity.

Table 1. Variables and descriptions in the CMA dataset.

Variable	Description
Datetime	Observation time, formatted as YYYYMMDDHH
Typhoon level	Typhoon intensity level, ranging from 0 to 6
Latitude	Latitude × 10
Longitude	Longitude × 10
Pressure	Minimum central pressure of the typhoon
Wind speed	Maximum wind speed at the typhoon center

Table 2. Variables and descriptions in CDAS data.

Variable	Description
Geopotential Height	Height of the isobaric surface relative to sea level, representing its three-dimensional undulations
Temperature	Reflects the thermal state of the air, a fundamental variable affecting atmospheric changes
Relative Humidity	Ratio of water vapor content in the air to saturation
Specific Humidity	Mass of water vapor per unit mass of air
Vertical Velocity	Intensity of vertical motion of air
u-component of wind	Wind speed component in the east–west direction; together with v-component, determines wind speed and direction
v-component of wind	Wind speed component in the north–south direction; together with u-component, determines wind speed and direction
Absolute Vorticity	Physical quantity measuring air rotation intensity, used to analyze cyclonic systems
Ozone Mixing Ratio	Proportion of ozone in the air
Stream Function	Describes non-divergent horizontal flow; contour lines reflect large-scale rotational structures
Potential Vorticity	Integrates air rotation and stratification stability; an important indicator for analyzing weather system evolution

Table 3. Comparison of APE (km) among different models.

Methods	6 h_APE	12 h_APE	18 h_APE	24 h_APE
ViT-LSTM	49.71	98.59	158.21	226.74
CNN-GRU	62.97	121.47	182.44	246.51
AM-ConvGRU (2022)	44.44	82.565	124.07	173.56
Spatio-temporal model (2024)	44.10	91.55	148.21	214.25
TGAT	36.85	73.53	115.67	167.23

Table 4. Average APE of different models across different years.

Methods	2021_APE	2022_APE	2023_APE	2024_APE	AVG_APE
ViT-LSTM	134.55	151.03	110.06	140.83	133.31
CNN-GRU	156.53	167.72	134.89	156.61	153.35
AM-ConvGRU (2022)	129.25	156.65	103.42	136.62	147.05
Spatio-temporal model (2024)	129.52	142.43	101.80	127.17	124.53
TGAT	104.16	120.98	86.69	108.11	104.15

Table 5. Results of the ablation experiments.

Methods	6 h_APE	12 h_APE	18 h_APE	24 h_APE
CNN + LSTM	59.21	119.02	185.91	255.35
CNN + Transformer + LSTM	51.32	102.20	158.77	231.29
TGAT without pretrain	38.73	77.67	122.50	177.71
TGAT	36.85	73.53	115.67	167.23
