Analysis of Generalization Performance of Tornado Detection Models: A Cross-Domain Evaluation from U.S. to Chinese Weather Radar Observations

Jiang, Biao; Zhang, Shuai; Chen, Yubao; Li, Xuehua; Wang, Yancheng

doi:10.3390/rs18060948


by Biao Jiang ¹, Shuai Zhang ²,³, Yubao Chen ³,*, Xuehua Li ¹ and Yancheng Wang ¹

¹ College of Electronic Engineering, Chengdu University of Information Technology, Chengdu 610225, China
² Tornado Key Laboratory, China Meteorological Administration, Foshan 528000, China
³ Meteorological Observation Center, China Meteorological Administration, Beijing 100081, China
* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(6), 948; https://doi.org/10.3390/rs18060948

Submission received: 26 January 2026 / Revised: 18 March 2026 / Accepted: 19 March 2026 / Published: 20 March 2026

(This article belongs to the Special Issue State-of-the-Art Remote Sensing in Precipitation and Thunderstorm)


Highlights

What are the main findings?

A systematic evaluation of the transferability of automated tornado detection algorithms trained on U.S. tornado cases and radar observations but applied to Chinese tornado cases and radar observations.
The decrease in model accuracy across different regions is mainly caused by differences in radar systems and naturally weaker storm characteristics in China.

What are the implications of the main findings?

Traditional tree-based models proved more stable and reliable than deep learning methods when tested on a new, unfamiliar radar network.
Adjusting the intensity of specific radar features can help models correctly identify tornadoes that would otherwise be missed in the new region.

Abstract

Tornadoes pose severe threats, yet their low frequency in China creates a labeled data scarcity that hinders training robust detection models. Leveraging abundant U.S. data offers a solution, though cross-domain generalization remains challenging due to distinct climatic environments and heterogeneous radar systems. This study systematically evaluates the generalization capability of three representative models—TORP, TORP-XGB, and TDA-CNN—trained on the U.S. TorNet dataset and applied to Chinese CINRAD observations (2020–2024) via a zero-shot transfer strategy. The results indicate that while all models demonstrated robust performance in the source domain (with POD values of 0.75, 0.72, and 0.71 for TORP, TORP-XGB, and TDA-CNN, respectively), they experienced varying degrees of performance attenuation in the target domain (with POD values dropping to 0.56, 0.48, and 0.41, respectively). Notably, the TORP model exhibited superior robustness with minimal performance degradation. Further analysis primarily attributes this cross-domain degradation to three factors: disparities in radar systems, magnitude differences in tornado rotational features, and data quality issues. Crucially, sensitivity experiments confirm that linear feature enhancement substantially improves the detection rate and effectively mitigates the cross-domain performance gap, albeit at the cost of increased false alarms. These findings provide a reference for the cross-domain deployment of tornado identification models and future improvements in transfer learning strategies.

Keywords: tornado detection; weather radar; cross-domain generalization

1. Introduction

Tornadoes are small-scale but highly destructive extreme weather phenomena that typically develop near the base of convective storms and are often accompanied by severe winds, heavy precipitation, and other hazardous conditions [1]. Although their lifetimes are generally short—ranging from several minutes to tens of minutes—their peak near-surface wind speeds can exceed 150 m/s under extreme circumstances, with catastrophic impacts on lives and infrastructure. Consequently, tornadoes have become one of the most threatening natural hazards to human society in the context of ongoing global climate change [2,3]. Statistically, the United States experiences more than 1200 tornadoes annually on average, whereas the annual frequency in China accounts for only approximately 5–10% of that in the United States [4]. Nevertheless, tornadoes in China are predominantly concentrated in densely populated and economically developed regions, such as the Jianghuai Plain, the Huang–Huai Plain, and parts of South China. Moreover, their peak occurrence period (June–August, 14:00–20:00 local time) coincides with periods of intensive human activity, substantially amplifying societal vulnerability and disaster losses [5]. Recent high-impact events, including the 2016 Funing EF4 tornado and the 2020 Gaoyou EF3 tornado in Jiangsu Province, resulted in severe casualties and enormous economic losses, underscoring the urgent need for accurate and timely tornado detection and warning systems [6].

Weather radars, particularly the Doppler weather radar, have long been recognized as the most effective observational tool for tornado monitoring and early warning, owing to their high temporal and spatial resolution and their ability to resolve storm-scale dynamical structures [7]. One of the most prominent radar signatures associated with tornadoes is the Tornado Vortex Signature (TVS), which appears in Doppler radial velocity fields as a region of intense azimuthal shear characterized by adjacent inbound and outbound velocity extrema [8,9]. The identification of TVS has therefore served as a cornerstone for radar-based tornado detection. In addition, the deployment of dual-polarization radar has substantially enhanced tornado monitoring capabilities by providing microphysical information on hydrometeor shape, size, and phase. These additional observables enable improved discrimination between meteorological and non-meteorological echoes and facilitate the detection of Tornado Debris Signatures (TDS), which are strongly indicative of ongoing surface damage caused by tornadoes [10]. More recently, the advancement of phased-array weather radar technology has further improved temporal resolution, allowing for rapid-scan observations that capture the continuous evolution of tornadoes. Such high-frequency measurements provide unprecedented opportunities to resolve fine-scale structural features and dynamical processes during tornado formation and evolution [11].

The development of tornado detection algorithms has evolved from traditional empirical approaches toward modern machine learning and deep learning methodologies. Early tornado identification relied heavily on the subjective visual interpretation of radar imagery by forecasters, as well as on rule-based algorithms employing empirically defined thresholds [12]. These conventional approaches primarily focused on Doppler velocity data and the identification of TVS to infer tornado presence [13]. Within the algorithm generation framework of the U.S. Next Generation Weather Radar (NEXRAD) system, more than 40 operational algorithms have been developed, including the Tornado Detection Algorithm (TDA) and mesocyclone detection algorithms [14,15,16]. Although these methods provide valuable guidance, they are typically constrained by strict physical and threshold-based criteria, which can result in elevated false alarm rates. In operational environments, radar noise, non-meteorological echoes, and data quality issues further degrade algorithm performance, limiting their reliability and robustness [17].

In recent years, the rapid advancement of artificial intelligence and machine learning has ushered tornado detection research into a new stage. Data-driven models are capable of automatically learning discriminative features from large volumes of labeled radar observations and have demonstrated superior performance in tornado detection tasks. A wide range of algorithms, including support vector machines [18], decision trees [19], random forests (RF) [17,20], extreme gradient boosting (XGBoost) [21], and convolutional neural networks (CNN) [22,23], have been successfully applied, yielding substantial improvements over traditional methods. Representative examples include the tornado probability model TORP developed by the U.S. National Severe Storms Laboratory (NSSL) based on RF [20], the CNN-based tornado detection baseline model (TDA-CNN) proposed by the MIT Lincoln Laboratory [22], and the XGBoost-based approach (TDA-XGB) introduced by Zeng et al. [21].

Despite these advances, the practical deployment of machine learning–based tornado detection models in China faces a fundamental challenge: the relatively low frequency of tornado occurrences limits the availability of high-quality, labeled tornado samples for independent model training and optimization. In contrast, the central United States—often referred to as “Tornado Alley”—experiences exceptionally frequent tornado activity and has accumulated extensive, high-quality radar datasets over decades, providing a robust data foundation for model development. Motivated by this disparity in data availability, this study adopts a cross-domain research framework in which models are trained in a source domain (the United States) and evaluated in a target domain (China). Specifically, using comprehensive U.S. tornado radar datasets [23], we construct three representative detection models—TORP, TORP-XGB, and TDA-CNN—and systematically assess their performance using observations from the China New Generation Weather Radar (CINRAD) network and a limited set of operationally confirmed tornado cases. The primary objectives of this study are to evaluate the adaptability and robustness of different algorithmic paradigms under the Chinese radar system, to quantify the impact of radar system discrepancies on model performance, and to analyze the mechanisms underlying cross-domain performance degradation. These findings aim to provide a theoretical basis for the cross-regional application of tornado detection models and to offer technical guidance for the future development of transfer learning-based automated tornado detection systems.

The remainder of this paper is organized as follows. Section 2 introduces the data sources employed in this study. Section 3 details the data processing methodologies and the associated baseline models. Section 4 presents a comparative performance analysis of the trained models on both the U.S. and Chinese test datasets. Finally, Section 5 and Section 6 provide the discussion and conclusions.

2. Data

2.1. U.S. Tornado Radar Dataset (TorNet)

To ensure rigorous model validation and facilitate robust cross-domain applicability, this study leverages two distinct radar-based tornado datasets. The primary foundational dataset, serving as the source domain, is the Tornado Network (TorNet) database, a comprehensive benchmark developed by MIT Lincoln Laboratory in collaboration with multiple research institutions [23]. The genesis of TorNet represents a critical response to the historical scarcity of high-quality, pixel-level annotated tornado observations—a bottleneck that has long impeded the application of advanced data-driven paradigms in severe weather research. By synthesizing a standardized, large-scale, and reproducible dataset, TorNet provides the necessary infrastructure for the systematic development and benchmarking of machine learning and deep learning algorithms, moving the field beyond ad hoc case studies toward statistically significant performance evaluations.

The TorNet archive is extensive, comprising 203,133 radar samples derived from the U.S. NEXRAD network, spanning a decadal period from 2013 to 2022. Spatially, each cropped sample encapsulates a local bounding box (data patch) with a radial extent of 60 km and an azimuthal width of 60°. It is important to note that the actual tornadoes are captured somewhere within these data patches rather than strictly at their geometric centers. Furthermore, the absolute distances of these tornadic events from the radars vary, with the vast majority located within a 150 km radius. Crucially, these data are retained in their native polar coordinate system (r, ϕ) rather than being interpolated onto a Cartesian grid. This preservation of raw geometry is vital for deep learning models, as it prevents the introduction of interpolation artifacts and maintains the full, range-dependent spatial resolution inherent to radar sensing. To characterize the rapid temporal evolution and genesis of tornadic vortices, the dataset incorporates a temporal dimension, providing four consecutive volume scans for each event: the reference time T and three antecedent observations at T − 5, T − 10, and T − 15 min. Vertically, the dataset provides data from the two lowest elevation angles (0.5° and 0.9°), which are physically the most significant layers for resolving the boundary-layer circulation features and low-level mesocyclones directly associated with tornadogenesis [23]. Table 1 presents the relevant basic information of the dataset.

Regarding event categorization, TorNet employs a rigorous labeling protocol that classifies samples into three distinct groups: confirmed tornado events (TOR), tornado warnings issued without verified touchdowns (WRN), and null events (NUL). The inclusion of the WRN category, which represents strong mesocyclones that triggered operational alerts but failed to produce tornadoes, is particularly valuable for training models to discriminate between subtle non-tornadic rotation and actual tornadic dynamics, thereby helping to reduce the high false-alarm rate common in legacy algorithms. While these three categories exist, the samples are ultimately assigned a binary label (tornado vs. non-tornado) for detection tasks. The feature space for each radar volume is high-dimensional, consisting of six fundamental spectral and dual-polarization variables: horizontal reflectivity factor (DBZ), radial velocity (VEL), spectrum width (WIDTH), differential reflectivity (ZDR), correlation coefficient (RHOHV), and specific differential phase (KDP). This comprehensive variable set ensures that the models can learn not only from kinematic shear signatures but also from microphysical fingerprints, such as the distinct lowering of ρ_hv associated with lofted debris.

2.2. Chinese Tornado Case Dataset

To rigorously assess the cross-domain generalization capability and operational transferability of the trained models, we constructed a distinct target-domain dataset from radar observations in China. This dataset focuses specifically on tornado events that occurred in Jiangsu Province between 2020 and 2024. Jiangsu is recognized as a primary hotspot for severe convective weather in China and is, climatologically, the most tornado-prone province in the country. The meteorological background of these events matters for the comparison: according to extensive regional studies [24,25], the vast majority of tornadoes in Jiangsu, particularly those occurring in summer around the Meiyu (plum rain) season, are triggered by typical mid-latitude synoptic systems (e.g., the Meiyu front, cold fronts, and extratropical cyclones) rather than tropical cyclones. Consequently, these tornadic events share fundamentally similar environmental conditions, thermodynamic profiles, and dynamic forcing with the classic mid-latitude systems dominating the U.S. source datasets. The ground truth for these events was derived from the “Chinese Tornado Case Database,” a repository compiled by the Foshan Tornado Key Open Laboratory of the China Meteorological Administration (CMA) [26]. Developed through a multi-institutional collaboration involving the CMA Meteorological Observation Center, Peking University, and the Foshan Tornado Research Center, this database archives verified tornado occurrences across China from 2006 to 2024. Confirmed cases are verified through a combination of on-site damage surveys, Doppler radar analysis, and corroborated public meteorological reports, ensuring high reliability.
Crucially, the database ensures rigorous quality control by providing granular metadata for each event, including precise spatiotemporal coordinates, Enhanced Fujita (EF) scale intensity ratings, and associated confidence levels, thereby serving as a reliable benchmark for scientific validation.

Leveraging the spatiotemporal anchors provided by this historical archive, we retrieved and aggregated volumetric base data from the CINRAD network. Crucially, all observation data utilized in this target-domain dataset were collected exclusively by S-band radars. This selection perfectly matches the wavelength of the U.S. NEXRAD system, thereby eliminating potential wavelength-dependent attenuation biases during cross-domain evaluation. The data extraction process was strictly confined to radar stations located within a radial distance of 150 km from the documented tornado touchdowns. This spatial constraint was imposed to mitigate the effects of beam broadening and sensitivity loss at extended ranges, ensuring that the input data retained sufficient spatial resolution to resolve mesoscale and storm-scale signatures. The resulting Chinese test dataset comprises 32 confirmed tornado events, representing a diverse array of convective modes and intensities. A detailed synopsis of these events, delineating their occurrence times, geographic coordinates, and the corresponding operational radar station identifiers, is presented in Table 2, and the spatial distribution of the confirmed tornado events included in this study is illustrated in Figure 1.

It is imperative to note that this dataset is employed exclusively for independent performance evaluation (inference only) and was not involved in any stage of the model training process. By applying the TORP, TORP-XGB, and TDA-CNN models—which were trained entirely on U.S. NEXRAD data—directly to these CINRAD observations, we aim to evaluate the robustness of the algorithmic paradigms against the “domain shift” inherent in differing radar hardware specifications, scanning strategies, and regional climatic characteristics. This experimental design allows for a critical assessment of whether the learned physical features and deep representations are universally applicable or sensor-dependent.

3. Method

3.1. Data Preprocessing

The azimuthal resolution of the radar base data differs between the two datasets used in this study: the TorNet dataset provides radar observations at an azimuthal resolution of 0.5°, whereas the Chinese CINRAD base data have an azimuthal resolution of 1°. Such discrepancies in angular resolution can lead to inconsistencies in gradient-based feature estimation when using the Linear Least Squares Derivative (LLSD) method [27], thereby affecting the comparability of extracted shear-related features across datasets.

To mitigate this issue, the TorNet radar data were downsampled from 0.5° to 1° in azimuth, ensuring consistency with the CINRAD observations. The downsampling procedure was implemented using a reflectivity-weighted averaging scheme for radial velocity fields [28]. Specifically, two adjacent radar gates sampled at a 0.5° azimuthal spacing were grouped into a single averaging window, within which weighted averaging was applied to obtain the final radar value at 1° resolution. This approach preserves physically meaningful velocity information while reducing resolution-induced biases in subsequent gradient calculations. The formulation is expressed as

\bar{V} = \frac{\sum_{i = 1}^{n} V_{i} Z_{i}}{\sum_{i = 1}^{n} Z_{i}}

(1)

where V_i and Z_i denote the radial velocity and reflectivity of the ith gate in the averaging window, and n is the number of azimuthally adjacent gates in that window, one from each neighboring radial (n = 2 in this study). Figure 2 presents a comparative visualization of a representative tornado sample from the TorNet dataset, contrasting the raw data with the downsampled version. As is evident from the figure, the salient vortex signatures are effectively preserved despite the downsampling process. This downsampling procedure was consistently applied to all other radar variables utilized in this study.
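As an illustration of Equation (1), the following minimal Python sketch averages n adjacent radials gate by gate. The function name `downsample_azimuth`, the list-of-rays layout, the use of NaN for missing gates, and the treatment of reflectivity as a non-negative weight (for example, in linear units) are assumptions made for this example and are not specified in the text.

```python
import math

def downsample_azimuth(vel, refl, n=2):
    """Reflectivity-weighted averaging of n adjacent radials (cf. Eq. 1).

    vel, refl: lists of rays, one list of gate values per azimuth.
    refl is used directly as a non-negative weight (assumption).
    Missing gates are float('nan').
    """
    out = []
    for start in range(0, len(vel) - n + 1, n):
        ray = []
        for g in range(len(vel[start])):
            num = den = 0.0
            for i in range(start, start + n):
                v, z = vel[i][g], refl[i][g]
                if math.isnan(v) or math.isnan(z) or z <= 0.0:
                    continue  # gate carries no usable weight
                num += v * z
                den += z
            # NaN when no gate in the window contributed
            ray.append(num / den if den > 0.0 else float("nan"))
        out.append(ray)
    return out

# Two 0.5-degree radials collapse into one 1-degree radial.
vel = [[10.0, 4.0], [20.0, float("nan")]]
refl = [[1.0, 2.0], [3.0, 1.0]]
coarse = downsample_azimuth(vel, refl)
```

In this toy input the first gate is pulled toward the radial with the larger weight, while the second gate falls back to the only valid observation.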

3.2. Feature Extraction

For machine learning-based tornado detection, models typically require carefully designed statistical features derived from radar observations. In this study, LLSD was employed to compute shear-related gradient features. Compared with traditional finite-difference schemes, LLSD estimates gradients by fitting a local plane through least-squares optimization, providing more robust and noise-resistant gradient estimates in Doppler velocity fields [27].

Six fundamental radar variables were used as inputs: DBZ, VEL, WIDTH, ZDR, RHOHV, and KDP. For each radar variable field, only valid echo regions with reflectivity exceeding 20 dBZ were retained. To suppress small-scale, physically insignificant isolated echoes, two successive median filtering operations were applied. Subsequently, a single morphological dilation was performed to enhance echo connectivity and improve the completeness of target structures.

Using the LLSD method, three types of gradients were computed for each radar variable: azimuthal gradients, radial gradients, and total gradients. These gradients characterize spatial variations in radar observables along azimuthal and radial directions and are particularly effective for capturing rotational signatures associated with tornadoes. In the LLSD formulation, the kernel size defines the local neighborhood used for plane fitting; radar observations within the kernel are weighted according to their spatial proximity, with radial and azimuthal offsets measured relative to the kernel center. The fitted plane yields estimates of the radial and azimuthal gradient components, along with a constant offset term.

\begin{bmatrix} \sum_{k = 1}^{m \times n} w_{k} \Delta r_{k} \Delta \theta_{k} & \sum_{k = 1}^{m \times n} w_{k} \Delta r_{k}^{2} & \sum_{k = 1}^{m \times n} w_{k} \Delta r_{k} \\ \sum_{k = 1}^{m \times n} w_{k} \Delta \theta_{k}^{2} & \sum_{k = 1}^{m \times n} w_{k} \Delta r_{k} \Delta \theta_{k} & \sum_{k = 1}^{m \times n} w_{k} \Delta \theta_{k} \\ \sum_{k = 1}^{m \times n} w_{k} \Delta \theta_{k} & \sum_{k = 1}^{m \times n} w_{k} \Delta r_{k} & \sum_{k = 1}^{m \times n} w_{k} \end{bmatrix} \begin{bmatrix} u_{\theta} \\ u_{r} \\ u_{0} \end{bmatrix} = \begin{bmatrix} \sum_{k = 1}^{m \times n} w_{k} \Delta r_{k} u_{k} \\ \sum_{k = 1}^{m \times n} w_{k} \Delta \theta_{k} u_{k} \\ \sum_{k = 1}^{m \times n} w_{k} u_{k} \end{bmatrix}

(2)

where (m × n) denotes the size of the LLSD computational kernel, defined as the local radar-data neighborhood used to fit the least-squares plane (a kernel size of 750 m × 2500 m was employed in our experiments to balance noise suppression and small-scale vortex preservation); u_k represents the radar-variable observation (e.g., radial velocity or reflectivity) at the kth grid cell within the kernel; w_k is the corresponding weight assigned to the kth grid cell; and Δr_k and Δθ_k denote the radial and azimuthal offsets of the kth grid cell relative to the kernel center, respectively. The parameter u_0 is the intercept term of the linear least-squares fit, while u_r and u_θ represent the estimated gradients in the radial and azimuthal directions, respectively.
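The plane fit behind Equation (2) can be sketched as a weighted least-squares solve for a single kernel. The sketch below builds the 3 × 3 system from the weighted sums and solves it with Cramer's rule. The names `llsd_fit` and `det3`, the flat-list inputs, and the default of uniform weights are assumptions for illustration; the text does not specify the weighting function itself.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def llsd_fit(u, dr, dth, w=None):
    """Fit u ~ u_0 + u_r*dr + u_theta*dth over one kernel.

    u, dr, dth, w: flat lists over the kernel cells.
    Returns (u_theta, u_r, u_0), ordered as the unknowns in Eq. (2).
    """
    if w is None:
        w = [1.0] * len(u)  # uniform weights (assumption)
    s_w = s_r = s_t = s_rr = s_tt = s_rt = s_u = s_ru = s_tu = 0.0
    for uk, rk, tk, wk in zip(u, dr, dth, w):
        s_w += wk
        s_r += wk * rk
        s_t += wk * tk
        s_rr += wk * rk * rk
        s_tt += wk * tk * tk
        s_rt += wk * rk * tk
        s_u += wk * uk
        s_ru += wk * rk * uk
        s_tu += wk * tk * uk
    # Rows follow Eq. (2): derivative w.r.t. u_r, u_theta, u_0.
    A = [[s_rt, s_rr, s_r],
         [s_tt, s_rt, s_t],
         [s_t, s_r, s_w]]
    b = [s_ru, s_tu, s_u]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("degenerate kernel: plane is not determined")
    solution = []
    for col in range(3):  # Cramer's rule, one unknown per column
        M = [row[:] for row in A]
        for row in range(3):
            M[row][col] = b[row]
        solution.append(det3(M) / d)
    return tuple(solution)

# Synthetic 3x3 kernel sampled from the plane u = 2 + 3*dr + 5*dth.
offsets = [(r, t) for r in (-1.0, 0.0, 1.0) for t in (-1.0, 0.0, 1.0)]
dr = [o[0] for o in offsets]
dth = [o[1] for o in offsets]
u = [2.0 + 3.0 * r + 5.0 * t for r, t in offsets]
u_theta, u_r, u_0 = llsd_fit(u, dr, dth)
```

For velocity input, the recovered azimuthal component plays the role of the azimuthal shear used later for candidate identification, provided the azimuthal offsets are expressed as distances.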

After obtaining the gradient fields, azimuthal shear derived from the radial velocity field (Azshear) was selected as the primary indicator for tornado candidate identification. A threshold of 0.006 s⁻¹ was applied based on a statistical analysis of U.S. storm reports from 2011 to 2018, for which approximately 92% of confirmed tornado samples exceeded this threshold [20]. It is worth noting that downsampling the 0.5° U.S. data to 1.0° acts as a spatial smoothing filter, which naturally dampens the extreme gradient peaks. This inherently makes the U.S.-derived threshold slightly more stringent when applied to the coarser target domain data. Radar gates satisfying the threshold condition were spatially clustered using a depth-first search (DFS) algorithm. Starting from an initial gate, the DFS procedure recursively searches neighboring gates and aggregates all spatially connected gates that simultaneously meet the threshold criterion into a single candidate object, thereby forming preliminary tornado targets. Figure 3 presents a comparative visualization between the raw radial velocity field and the derived Azshear distribution; the Azshear values within the tornado vortex region are significantly amplified compared to the surrounding background flow, exhibiting a distinct high-intensity core.
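The thresholding and clustering step can be sketched as a connected-component search. The example below is a minimal sketch that uses an explicit stack instead of recursion to avoid Python's recursion limit. The function name `cluster_gates`, the 8-neighbor connectivity, and the strict `>` comparison are assumptions, since the text does not state how neighbors or ties are defined.

```python
def cluster_gates(azshear, threshold=0.006):
    """Group connected gates whose Azshear exceeds the threshold.

    azshear: list of rays (azimuth x range) in s^-1; NaN marks missing gates.
    Returns a list of objects, each a list of (azimuth_idx, range_idx).
    """
    n_az = len(azshear)
    n_rng = len(azshear[0]) if n_az else 0
    labels = [[0] * n_rng for _ in range(n_az)]
    objects = []
    for i0 in range(n_az):
        for j0 in range(n_rng):
            # NaN > threshold is False, so missing gates are skipped.
            if labels[i0][j0] or not azshear[i0][j0] > threshold:
                continue
            label = len(objects) + 1
            labels[i0][j0] = label
            stack, members = [(i0, j0)], []
            while stack:  # depth-first search over connected gates
                i, j = stack.pop()
                members.append((i, j))
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < n_az and 0 <= nj < n_rng
                                and not labels[ni][nj]
                                and azshear[ni][nj] > threshold):
                            labels[ni][nj] = label
                            stack.append((ni, nj))
            objects.append(members)
    return objects

field = [
    [0.007, 0.008, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.009],
    [0.0, 0.0, 0.0, 0.010],
]
objects = cluster_gates(field)
```

The sketch treats the data patch as a sector and does not wrap around in azimuth; a full 360° sweep would need the first and last radials to be treated as neighbors.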

To reduce the influence of random noise and long-range observational uncertainties, candidate objects consisting of fewer than four radar gates were discarded, and targets with centroid distances from the radar site exceeding 150 km were excluded. Furthermore, candidate objects with centroid separations smaller than 9 km were merged to avoid multiple detections of the same tornado. Once tornado target objects were finalized, a circular region with a radius of 2.5 km centered on the location of maximum Azshear within each object was defined as the feature-extraction domain. Within this region, five statistical descriptors—minimum, 25th percentile, median, 75th percentile, and maximum—were computed for each radar variable and its corresponding LLSD-derived gradient fields. All extracted features were concatenated to form the input feature vectors for the TORP and TORP-XGB models.
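The five statistical descriptors for one field can be computed as in the following minimal sketch. The function names, the polar-grid arguments (`ranges_km`, `azimuths_deg`), and the linear-interpolation percentile rule are assumptions for illustration; the text specifies only the 2.5 km radius and the five statistics.

```python
import math

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def five_descriptors(field, ranges_km, azimuths_deg, center, radius_km=2.5):
    """Min, 25th percentile, median, 75th percentile, and max of `field`
    within `radius_km` of the gate at index `center` = (az_idx, rng_idx)."""
    def to_xy(i, j):
        a = math.radians(azimuths_deg[i])
        return ranges_km[j] * math.sin(a), ranges_km[j] * math.cos(a)

    cx, cy = to_xy(*center)
    vals = []
    for i in range(len(azimuths_deg)):
        for j in range(len(ranges_km)):
            v = field[i][j]
            if math.isnan(v):
                continue  # ignore missing gates
            x, y = to_xy(i, j)
            if math.hypot(x - cx, y - cy) <= radius_km:
                vals.append(v)
    if not vals:
        return None
    vals.sort()
    return [percentile(vals, q) for q in (0, 25, 50, 75, 100)]

# Toy field whose value equals the gate range, centered at 15 km.
ranges_km = [10.0 + 0.5 * k for k in range(21)]
azimuths_deg = [0.0, 1.0, 2.0]
field = [list(ranges_km) for _ in azimuths_deg]
stats = five_descriptors(field, ranges_km, azimuths_deg, center=(1, 10))
```

Repeating this for each radar variable and each LLSD gradient field, then concatenating the results, would give a feature vector of the kind described above.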

3.3. Baseline Models

To evaluate the performance of different algorithmic paradigms in tornado detection, three representative models were selected as baseline approaches: RF, XGBoost, and CNN. RF and XGBoost represent feature-driven machine learning methods that rely on manually extracted physical features, whereas CNNs exemplify data-driven deep learning approaches capable of learning features directly from raw radar observations.

3.3.1. TORP

The RF algorithm serves as a robust baseline, constructing an ensemble of mutually independent decision trees via bootstrap aggregation (bagging) [29]. Unlike single estimators, RF introduces dual stochasticity—randomness in both the training sample space (via bootstrapping) and the feature space (via random subset selection at each split node). This mechanism effectively decorrelates the individual trees, thereby reducing the variance of the model and mitigating the risk of overfitting, which is particularly prevalent in high-dimensional meteorological datasets containing outliers.

In this study, the TORP model is instantiated within the RF framework. It ingests a vector of manually engineered features derived from the LLSD algorithm, encompassing both statistical moments of radar variables (e.g., DBZ cores, VEL couplets) and the morphological attributes of storm cells. By aggregating the probabilistic outputs of the entire forest, TORP achieves a stable consensus prediction, making it highly resilient to the heterogeneous noise often present in operational radar data. Figure 4 illustrates the integrated operational workflow of the TORP and TORP-XGB algorithms, delineating the end-to-end process from raw radar data preprocessing and feature extraction to the final probabilistic detection output.

3.3.2. TORP-XGB

To explore the potential of boosting strategies, we employ XGBoost, a scalable and highly optimized implementation of gradient boosted decision trees [30]. Unlike the parallel bagging strategy of RF, XGBoost employs a gradient boosting framework, where decision trees are trained sequentially. Each new tree is optimized to fit the residuals of the preceding ensemble, thereby incrementally improving model accuracy.

Crucially, XGBoost integrates a sophisticated objective function that combines a convex loss term with a regularization term (penalizing model complexity), alongside second-order Taylor approximation for precise optimization. Its implementation of weighted quantile sketches and sparsity-aware split finding makes it exceptionally efficient for handling sparse meteorological data and subtle feature interactions. The TORP-XGB model developed herein retains the feature extraction pipeline of TORP but leverages the gradient boosting framework to capture complex, non-linear dependencies among kinematic and microphysical predictors that standard bagging methods might overlook. The operational flowchart for TORP-XGB is detailed in the lower portion of Figure 4.
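For reference, the regularized objective and its second-order approximation take the following standard form in the XGBoost formulation [30]. The symbols are those of the XGBoost literature rather than of this study: l is the convex loss, f_k is the kth tree, T is the number of leaves of a tree, w is its vector of leaf weights (unrelated to the LLSD weights w_k), and γ and λ are regularization hyperparameters.

```latex
\mathcal{L} = \sum_{i} l\left(y_{i}, \hat{y}_{i}\right) + \sum_{k} \Omega\left(f_{k}\right),
\qquad
\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2}

\mathcal{L}^{(t)} \approx \sum_{i} \left[ g_{i} f_{t}(x_{i}) + \frac{1}{2} h_{i} f_{t}^{2}(x_{i}) \right] + \Omega\left(f_{t}\right),
\qquad
g_{i} = \frac{\partial\, l\left(y_{i}, \hat{y}_{i}^{(t-1)}\right)}{\partial\, \hat{y}_{i}^{(t-1)}},
\quad
h_{i} = \frac{\partial^{2} l\left(y_{i}, \hat{y}_{i}^{(t-1)}\right)}{\partial \left(\hat{y}_{i}^{(t-1)}\right)^{2}}
```

The second expression is what each new tree f_t minimizes at boosting round t, with constant terms dropped: the gradients g_i and Hessians h_i summarize how the loss responds to a change in the current prediction.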

3.3.3. TDA-CNN

CNNs are end-to-end deep learning models that automatically learn hierarchical spatial feature representations directly from raw observational data. Through local receptive fields and parameter sharing, CNNs efficiently extract spatial structures while maintaining a relatively compact parameter space. Convolutional layers capture local spatial correlations, pooling layers enhance translational invariance through downsampling, and fully connected layers map high-dimensional feature representations to target classification or regression outputs. Benefiting from their automatic feature learning capability, CNNs have been widely applied to image recognition, object detection, and radar-based meteorological analysis [31].

The TDA-CNN model employed in this study is based on the architecture released by the MIT Lincoln Laboratory, which was specifically designed for tornado detection using the TorNet dataset [23]. The model input is a 14-channel tensor comprising six radar variables (DBZ, VEL, WIDTH, ZDR, RHOHV, KDP) observed at two elevation angles (0.5° and 0.9°), along with a range-folding gate mask. The network backbone consists of four hierarchical convolutional blocks. To address the non-uniform spatial geometry inherent in radar polar coordinates, a CoordConv mechanism is introduced in place of standard two-dimensional convolutions by explicitly incorporating spatial positional information—namely, the radial coordinate and its reciprocal—prior to convolution. This design enhances sensitivity to near-radar tornado signatures and effectively integrates radar geometry with deep feature representation. Figure 5 depicts the data processing pipeline of the TDA-CNN. The architecture enables the automatic mapping from raw multi-channel radar observations to a final tornado probability score via successive convolution and pooling operations.
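The coordinate-augmentation idea can be illustrated by appending the radial coordinate and its reciprocal as two extra input channels. This minimal sketch shows only the data layout, not the released TDA-CNN implementation; the function names and the absence of any normalization are assumptions.

```python
def radial_coord_channels(ranges_km, n_azimuth):
    """Build two (azimuth x range) channels holding r and 1/r."""
    r_row = [float(r) for r in ranges_km]
    inv_row = [1.0 / r if r > 0.0 else 0.0 for r in r_row]  # guard r = 0
    r_chan = [r_row[:] for _ in range(n_azimuth)]
    inv_chan = [inv_row[:] for _ in range(n_azimuth)]
    return r_chan, inv_chan

def with_coords(channels, ranges_km):
    """Append the r and 1/r channels to a list of radar channels."""
    n_azimuth = len(channels[0])
    r_chan, inv_chan = radial_coord_channels(ranges_km, n_azimuth)
    return list(channels) + [r_chan, inv_chan]

# One dummy radar channel with 2 radials and 2 range gates.
radar = [[[0.0, 0.0], [0.0, 0.0]]]
augmented = with_coords(radar, ranges_km=[2.0, 4.0])
```

Because the 1/r channel is largest close to the radar, a convolution that sees it can weight near-range structure differently from far-range structure, which is the stated motivation for the design.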

A notable challenge in applying this architecture to CINRAD is the discrepancy in volume coverage patterns (VCP), specifically the absence of the 0.9° elevation scan. To address this, we implemented a data adaptation strategy in which the lowest-elevation data (0.5°) are duplicated to populate the 0.9° input channels. Although this removes the independent vertical information that a second tilt would provide, it preserves the channel dimensionality required by the predefined architecture, allowing the deep feature extractors to operate on the polarimetric signatures that are available.
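The tilt-duplication strategy can be sketched as follows. The dictionary input, the function name `stack_cinrad_tilts`, and the variable-major channel ordering are assumptions for illustration; the ordering expected by the released model may differ.

```python
VARIABLES = ("DBZ", "VEL", "WIDTH", "ZDR", "RHOHV", "KDP")

def stack_cinrad_tilts(sweep_05):
    """Build 12 radar channels from a single 0.5-degree sweep.

    sweep_05: dict mapping each variable name to its 2D field.
    Each variable contributes its 0.5-degree field followed by a copy
    of the same field standing in for the missing 0.9-degree tilt.
    """
    channels = []
    for name in VARIABLES:
        field = sweep_05[name]
        channels.append(field)                      # observed 0.5-degree tilt
        channels.append([row[:] for row in field])  # duplicate for 0.9 degrees
    return channels

sweep = {name: [[float(k), float(k) + 1.0]] for k, name in enumerate(VARIABLES)}
channels = stack_cinrad_tilts(sweep)
```

The duplicate is stored as a copy rather than a reference so that any later in-place normalization of one channel cannot silently alter the other.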

4. Results

To ensure a rigorously objective and unbiased comparison of algorithmic performance across the divergent operational environments of the United States and China, a unified evaluation framework was enforced. All three candidate models (TORP, TORP-XGB, and TDA-CNN) were subjected to identical testing protocols and spatiotemporal matching criteria within their respective domains.

Given the inherent spatial uncertainty in radar sampling and the rapid translational motion of tornadic supercells, rigid point-to-point matching is physically impractical. Consequently, we adopted a spatiotemporal proximity criterion rooted in mesoscale dynamics. A prediction is classified as a true positive (TP) if the model-inferred tornado centroid falls within a 5 km radial tolerance and a ±6 min temporal matching window (two consecutive radar volume scans) of a verified tornado report. The 5 km threshold was selected to approximate the typical diameter of a mesocyclone or the representative scale of a TVS, accounting for potential navigational errors and grid discretization. Conversely, a false positive (FP) is recorded when the model triggers a detection with no verified event within the 5 km neighborhood, while a false negative (FN) denotes a failure to generate a detection signal for a confirmed tornado.
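The matching rule above can be sketched in a few lines. The code below is an illustrative reading of the criterion, not the evaluation code used in this study: it assumes detections and reports are dictionaries with `lat`, `lon`, and `time` keys, uses the haversine great-circle distance, counts at most one TP per verified report, and treats any detection that matches no report in both space and time as an FP.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_hit(det, report, max_km=5.0, max_dt=timedelta(minutes=6)):
    """True if a detection lies within 5 km and +/-6 min of a report."""
    close = distance_km(det["lat"], det["lon"], report["lat"], report["lon"]) <= max_km
    timely = abs(det["time"] - report["time"]) <= max_dt
    return close and timely


def count_outcomes(detections, reports):
    """Return (TP, FP, FN).

    TP and FN count verified reports (matched or missed); FP counts
    detections that match no report. This bookkeeping is an assumption.
    """
    tp = sum(any(is_hit(d, r) for d in detections) for r in reports)
    fn = len(reports) - tp
    fp = sum(not any(is_hit(d, r) for r in reports) for d in detections)
    return tp, fp, fn
```

Counting hits per report rather than per detection keeps several detections of the same tornado (for example, from different radars) from inflating the TP total; other conventions are possible and would change the absolute counts.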

To quantify detection skill, we employed four standard contingency table metrics that are widely adopted in severe weather verification: Probability of Detection (POD), False Alarm Ratio (FAR), Critical Success Index (CSI), and Frequency Bias (BIAS) [32]. These are mathematically defined as follows:

\begin{equation}
\begin{aligned}
\mathrm{POD} &= \frac{TP}{TP + FN}, \\
\mathrm{FAR} &= \frac{FP}{TP + FP}, \\
\mathrm{CSI} &= \frac{TP}{TP + FP + FN}, \\
\mathrm{BIAS} &= \frac{TP + FP}{TP + FN}
\end{aligned}
\tag{3}
\end{equation}

Each metric elucidates a distinct aspect of model efficacy. The POD characterizes the model’s sensitivity, that is, its capability to correctly identify observed events, which is paramount for safety-critical warning systems that must minimize missed detections. In contrast, the FAR quantifies operational reliability: lower values indicate fewer false alarms, thereby preserving public confidence. However, given the extreme class imbalance inherent in tornado datasets, where non-tornado events vastly predominate, relying solely on POD or FAR can be misleading. Therefore, the CSI, which integrates hits, false alarms, and misses while excluding the dominant true negatives, serves as the most robust indicator of overall detection performance and is treated as the primary ranking metric in this study. Complementing these skill scores, the BIAS provides insight into systematic model tendencies; a value of unity indicates perfect frequency consistency, whereas deviations reveal systematic over-forecasting (BIAS > 1) or under-forecasting (BIAS < 1). In this analysis, POD, FAR, and CSI are utilized to evaluate detection accuracy, while BIAS serves as an auxiliary criterion for calibrating optimal probability thresholds [33].
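For reference, the four scores in Equation (3) can be computed directly from the contingency counts, as in the minimal sketch below. The function names and the example counts are hypothetical and chosen only for illustration. The second helper uses the identity BIAS = POD / (1 − FAR), which follows from the definitions because POD = TP/(TP + FN) and 1 − FAR = TP/(TP + FP).

```python
def contingency_scores(tp, fp, fn):
    """POD, FAR, CSI and BIAS from hits, false alarms and misses (Eq. 3)."""
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("scores are undefined without events or detections")
    return {
        "POD": tp / (tp + fn),
        "FAR": fp / (tp + fp),
        "CSI": tp / (tp + fp + fn),
        "BIAS": (tp + fp) / (tp + fn),
    }


def bias_from_pod_far(pod, far):
    """BIAS = POD / (1 - FAR), which follows directly from Eq. (3)."""
    return pod / (1.0 - far)


# Hypothetical counts: 30 hits, 10 false alarms, 10 misses.
scores = contingency_scores(tp=30, fp=10, fn=10)
print(scores)  # POD 0.75, FAR 0.25, CSI 0.6, BIAS 1.0
```

The same identity allows an approximate BIAS to be recovered from any reported POD and FAR pair, subject to the rounding of the tabulated values.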

4.1. Evaluation on the U.S. Dataset

For source-domain evaluation, tornado samples from the TorNet test set during 2020–2022 were used, comprising 464 confirmed tornado events, along with an equal number of WRN and NUL samples. This subset comprises the entire available test dataset for the 2020–2022 period. To maintain class balance and avoid evaluation bias, the equal-sized WRN and NUL samples were randomly selected from the corresponding available cases during the same period. These data were independently input into the TORP, TORP-XGB, and TDA-CNN models for tornado detection. Each model outputs a tornado probability, and a detection is declared when the probability exceeds a predefined threshold.

To systematically examine model sensitivity to threshold selection, probability thresholds were varied from 0.1 to 0.9 in increments, and the corresponding POD, FAR, CSI, and BIAS values were computed. The resulting performance curves are shown in Figure 6. Based on these curves, optimal probability thresholds were determined for each model. Importantly, these thresholds were subsequently fixed and directly applied to the Chinese dataset, ensuring a consistent evaluation baseline across regions and enhancing the comparability and operational relevance of cross-domain results.

Overall, all three models exhibit the characteristic trade-off between detection capability and reliability: as FAR decreases (i.e., reliability increases), POD gradually declines. The general trends of the curves are similar across models. However, TORP and TORP-XGB consistently outperform TDA-CNN on the U.S. dataset. At low to moderate thresholds, TORP and TORP-XGB maintain relatively high POD and CSI values, whereas TDA-CNN achieves comparable performance only at very low thresholds and exhibits a more rapid performance degradation as the threshold increases.

Considering CSI and BIAS jointly (favoring high CSI while maintaining BIAS close to unity), the optimal probability thresholds were determined as 0.4 for TORP, 0.4 for TORP-XGB, and 0.1 for TDA-CNN. These thresholds were held constant in subsequent evaluations on Chinese radar data. Table 3 summarizes the performance of the three models at their respective optimal thresholds. Among them, TORP achieves the highest POD and CSI, indicating the strongest overall detection performance on the U.S. test dataset, followed closely by TORP-XGB, while TDA-CNN performs slightly worse (CSI of 0.55, 0.54, and 0.53, respectively).
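One way to formalize the joint CSI and BIAS criterion is sketched below. The exact selection rule is not specified in the text, so the rule used here (take the highest-CSI threshold among those whose BIAS lies within a tolerance of unity) and the tolerance `bias_tol` are assumptions, as are the function names and the toy probabilities used to exercise them.

```python
def scores_at(probs, labels, threshold):
    """CSI and BIAS when a detection is declared for prob > threshold."""
    pairs = [(p > threshold, y == 1) for p, y in zip(probs, labels)]
    tp = sum(d and e for d, e in pairs)
    fp = sum(d and not e for d, e in pairs)
    fn = sum(e and not d for d, e in pairs)
    if tp == 0:
        return None  # no hits at this threshold, so it is skipped
    return {"CSI": tp / (tp + fp + fn), "BIAS": (tp + fp) / (tp + fn)}


def pick_threshold(probs, labels, thresholds, bias_tol=0.2):
    """Highest-CSI threshold whose BIAS lies within bias_tol of unity.

    bias_tol is a hypothetical tolerance; if no threshold satisfies it,
    the best-CSI threshold overall is returned instead.
    """
    scored = []
    for t in thresholds:
        s = scores_at(probs, labels, t)
        if s is not None:
            scored.append((t, s))
    if not scored:
        raise ValueError("no threshold produced a hit")
    near_unity = [(t, s) for t, s in scored if abs(s["BIAS"] - 1.0) <= bias_tol]
    pool = near_unity or scored
    return max(pool, key=lambda ts: ts[1]["CSI"])[0]


# Toy example: eight samples, four of them tornadic (label 1).
probs = [0.95, 0.85, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
thresholds = [i / 10 for i in range(1, 10)]
best = pick_threshold(probs, labels, thresholds)
```

In this toy case the unconstrained CSI maximum occurs at a lower threshold with BIAS of 1.5, so the BIAS constraint moves the operating point to a higher threshold with fewer false alarms, which is the kind of trade-off the joint criterion is meant to control.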

4.2. Evaluation on the Chinese Dataset

To comprehensively assess model applicability under different regional and radar-system conditions, the evaluation procedure for the Chinese dataset strictly followed that used for the U.S. dataset. The probability thresholds determined from the U.S. TorNet evaluation were fixed and applied without modification to the Chinese test dataset, ensuring consistency in cross-regional assessment.

Figure 7 presents a multi-perspective visualization of a correctly detected tornado event in China, which was simultaneously captured by three distinct radar stations. Consistently across these observations, the reflectivity and radial velocity fields exhibit canonical tornadic signatures. In the reflectivity fields, compact hook echo structures are evident near the vortex locations, characterized by intensity values exceeding 40 dBZ. Correspondingly, the radial velocity fields reveal pronounced inbound–outbound velocity couplets with velocity differences surpassing 50 m/s and minimal spatial separation between extrema, indicating intense rotational shear. All three models (TORP, TORP-XGB, and TDA-CNN) successfully identified the tornado target across the scans from all participating radars.

Table 4 summarizes the predictive performance of the three models when applied to the target domain (Chinese dataset). Comparison against the source-domain (U.S.) benchmarks in Table 3 reveals a consistent trend of performance attenuation across all models, confirming the challenge posed by domain shifts inherent in differing radar sensing environments. This degradation manifests primarily as a contraction in POD (from 0.75 to 0.56 for TORP, 0.72 to 0.48 for TORP-XGB, and 0.71 to 0.41 for TDA-CNN), which suppresses the CSI scores. The FAR response is less uniform: it rises markedly for TDA-CNN (0.31 to 0.39), changes little for TORP (0.33 to 0.34), and decreases slightly for TORP-XGB (0.31 to 0.29). The magnitude of the skill deterioration also varies considerably among the architectures. The TORP model exhibited the greatest stability, with the smallest drop in detection metrics, followed by TORP-XGB. Conversely, TDA-CNN experienced the most substantial decline in performance. This disparity suggests that although the deep learning model achieves skill comparable to the tree-based models in the source domain, it may be more susceptible to overfitting specific sensor characteristics (e.g., NEXRAD data distribution). In contrast, the feature-driven TORP framework demonstrates superior cross-domain generalization capability, indicating that explicit physical features retain higher transferability across heterogeneous radar networks than the latent representations learned by current deep learning architectures.
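The size of the cross-domain change can be read off Tables 3 and 4 with simple arithmetic, as in the short sketch below. The dictionaries merely transcribe the tabulated (POD, FAR, CSI) values; the helper names are introduced here for illustration.

```python
# (POD, FAR, CSI) transcribed from Table 3 (U.S.) and Table 4 (China).
SOURCE = {"TORP": (0.75, 0.33, 0.55), "TORP-XGB": (0.72, 0.31, 0.54), "TDA-CNN": (0.71, 0.31, 0.53)}
TARGET = {"TORP": (0.56, 0.34, 0.43), "TORP-XGB": (0.48, 0.29, 0.40), "TDA-CNN": (0.41, 0.39, 0.33)}


def deltas(model):
    """Target minus source for (POD, FAR, CSI), rounded to two decimals."""
    return tuple(round(t - s, 2) for s, t in zip(SOURCE[model], TARGET[model]))


def relative_csi_loss(model):
    """Fraction of the source-domain CSI lost in the target domain."""
    return (SOURCE[model][2] - TARGET[model][2]) / SOURCE[model][2]


for name in SOURCE:
    print(name, deltas(name), f"{relative_csi_loss(name):.0%}")
```

On these tabulated values the CSI falls by roughly 22% for TORP, 26% for TORP-XGB, and 38% for TDA-CNN relative to the source domain, which gives the same ordering of robustness as the POD changes.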

Figure 8 exemplifies a representative FN scenario, highlighting the limitations of current algorithmic paradigms in identifying weak or disorganized systems. In the horizontal reflectivity field, the convective structure appears notably fragmented, exhibiting relatively attenuated and diffuse echoes. Crucially, the storm lacks the classic supercellular morphology; specifically, there is an absence of a coherent hook echo or a distinct inflow notch, which serves as a primary visual anchor for identifying low-level rotation. Kinematically, the radial velocity signature presents further ambiguity. While a weak inbound–outbound velocity couplet is discernible, the rotational dynamics are ill-defined. The velocity gradients are spatially delocalized rather than exhibiting the sharp, gate-to-gate shear characteristic of a compact mesocyclone or Tornado Vortex Signature. Consequently, this event falls into a “gray zone” for the automated detectors. For the feature-driven models (TORP and TORP-XGB), the calculated physical descriptors—such as rotational velocity and shear intensity—likely fell below the critical decision thresholds required to trigger a positive classification. Similarly, the TDA-CNN failed to resolve the event; the indistinct spatial topology and lack of sharp gradients prevented the convolutional layers from extracting salient high-level feature representations, resulting in the network classifying the pattern as non-tornadic background noise.

A prevalent source of false alarms within the target domain is radar data quality degradation, specifically contamination from non-meteorological scatterers and signal processing artifacts. Figure 9 illustrates a representative false positive case driven by range folding and ground clutter. In the radial velocity field, these artifacts do not manifest as random noise but rather generate fragmented, alternating inbound–outbound velocity patterns. Due to the aliasing or phase ambiguity inherent in range-folded gates, these discontinuities can inadvertently create spurious azimuthal shear signatures that spatially mimic the velocity couplet of a mesocyclone. Compounding this issue, the corresponding reflectivity field exhibits localized core values exceeding 30 dBZ. This coincidence creates a “numerical mimicry” in the feature space: the artifacts possess both the shear intensity (from chaotic velocity patterns) and the reflectivity intensity (from clutter or second-trip echoes) required to satisfy the detection criteria. Consequently, during the feature extraction phase, these non-tornadic signatures map to a region of feature space that overlaps with the manifold of genuine tornadic vortices. The algorithms, unable to distinguish the texture of biological or terrain clutter from that of meteorological hydrometeors, erroneously interpret these high-gradient features as tornadic dynamics, resulting in a high-confidence false alarm.

4.3. Feature Distribution Analysis and Sensitivity Experiment

To elucidate the physical mechanisms driving the observed cross-domain performance attenuation, we conducted a comparative statistical analysis of radar-derived feature distributions between the source (U.S.) and target (China) domains. Figure 10 identifies the top-five most significant features based on their Gini importance ranking, derived from the TORP framework during the training phase. These features represent the primary “digital footprints” used by the models to distinguish tornadic vortices from non-tornadic shear. Detailed definitions and physical interpretations of these variables are provided in Table A1 of Appendix A. To ensure statistical validity, the samples used for this characterization were drawn from a held-out validation set, and were strictly independent of the test samples used in previous sections.

The comparative statistics, summarized in Table 5, reveal a systematic “intensity gap” between the two regions. Specifically, radar signatures associated with Chinese tornadoes exhibit systematically lower magnitudes than their American counterparts across all five key features. For instance, the median Azshear_max for Chinese samples is 0.0080 s⁻¹, about 30% lower than the 0.0115 s⁻¹ observed in U.S. samples (a U.S.-to-China ratio of roughly 1.4). Similar reductions are evident in the velocity gradients (Vrgrad_min and Vrgrad_25th) and in the median azimuthal shear, suggesting that Chinese tornadic signatures are inherently fainter within the radar reflectivity and velocity fields.
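The per-feature gap can be quantified directly from the Table 5 entries. The sketch below simply transcribes the tabulated values and computes the U.S.-to-China ratio for each feature; the variable names are introduced here for illustration.

```python
# Values transcribed from Table 5 (unit: s^-1).
FEATURES = ["Azshear_max", "Vrgrad_min", "Vrgrad_25th", "Azshear_median", "Azshear_min"]
CHINA = [0.0080, 0.0057, 0.0069, 0.0060, 0.0045]
USA = [0.0115, 0.0072, 0.0090, 0.0089, 0.0064]

# Ratio of the U.S. value to the Chinese value for each feature.
ratios = {name: us / cn for name, us, cn in zip(FEATURES, USA, CHINA)}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

The ratios range from about 1.26 (Vrgrad_min) to about 1.48 (Azshear_median), so every one of the five features is weaker in the Chinese samples, and the 1.35 factor used in the sensitivity experiment falls inside this range.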

To validate whether this magnitude disparity is the primary driver of performance degradation, we performed a sensitivity experiment using a linear gain adjustment strategy. Based on the statistically derived ratio of feature means ($\mu_{\mathrm{source}} / \mu_{\mathrm{target}} \approx 1.35$), a scalar enhancement factor of 1.35 was applied to the shear-related feature vectors of the Chinese dataset during the inference phase. As presented in Table 6, this recalibration led to a marked improvement in POD for both TORP (0.69) and TORP-XGB (0.54), confirming that aligning the feature manifolds partially mitigates the cross-domain gap.
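The linear gain adjustment amounts to multiplying selected features by a constant before inference. The sketch below is a minimal illustration: the text does not enumerate which predictors count as shear-related, so the set `SHEAR_FEATURES` (taken here to be the five features of Table A1), the function name `enhance`, and the non-shear feature `DBZ_max` in the example are all assumptions.

```python
# Assumed set of shear-related predictors (the five features of Table A1).
SHEAR_FEATURES = {"Azshear_max", "Azshear_median", "Azshear_min", "Vrgrad_min", "Vrgrad_25th"}


def enhance(sample, gain=1.35, keys=SHEAR_FEATURES):
    """Return a copy of a feature dict with the selected features scaled by gain."""
    return {k: v * gain if k in keys else v for k, v in sample.items()}


# Hypothetical target-domain sample; DBZ_max is an illustrative non-shear feature.
sample = {"Azshear_max": 0.0080, "DBZ_max": 45.0}
scaled = enhance(sample)
```

Applied to the Chinese median Azshear_max of 0.0080 s⁻¹, the factor gives 0.0108 s⁻¹, which is much closer to the U.S. median of 0.0115 s⁻¹, while features outside the selected set are left unchanged.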

The systematic attenuation of Chinese tornadic signatures compared to U.S. samples can be primarily attributed to distinct climatological storm environments and storm-scale dynamics. Unlike the classic Great Plains supercells in the U.S., which frequently develop in environments with extreme convective available potential energy (CAPE) and produce deep, long-lived, and intense mesocyclones, Chinese tornadoes—particularly those in Jiangsu Province during the Meiyu season—typically occur in environments characterized by lower local instability (lower CAPE) but strong low-level wind shear. Consequently, the parent storms and associated vortices in the Chinese dataset are generally physically smaller in diameter and shallower in vertical extent, and exhibit inherently weaker rotational velocities. These meteorological realities translate directly to the fainter “digital footprints” observed in the target-domain radar feature space.

5. Discussion

The “intensity gap” documented in Section 4.3 introduces a fundamental challenge of covariate shift, as the decision boundaries of tree-based models (RF and XGBoost) calibrated on the high-intensity manifold of U.S. supercells effectively learned a “high activation threshold.” When applied to the systematically weaker Chinese tornado signatures, these source-trained models underestimate event probabilities and contract the Probability of Detection (POD). While our linear enhancement experiment (Table 6) serves as a preliminary sensitivity analysis—demonstrating that a 1.35 scalar multiplier can recover a significant portion of the POD—this heuristic approach inherently magnifies non-tornadic shear features and environmental noise. The resulting concurrent rise in the FAR yields only a marginal improvement in the overall CSI, demonstrating that simple magnitude scaling cannot fully resolve the cross-domain challenge. Rather, the performance gap is driven by complex physical discrepancies: Chinese mesocyclones are generally smaller in diameter, shallower in vertical height, and more transient than classic U.S. supercells, evolving within environments characterized by fundamentally different local instability and vertical shear profiles. Future work should explore more sophisticated, non-linear mapping and feature alignment methods to achieve robust domain adaptation.
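The effect of a fixed split under this kind of shift can be illustrated with a deliberately simplified one-split classifier. Everything in the sketch is a toy assumption except the two medians taken from Table 5: the split value of 0.0100 s⁻¹ and the non-tornadic sample are hypothetical, and real tree ensembles combine many such splits across many features.

```python
SPLIT = 0.0100  # hypothetical split on Azshear_max (s^-1) learned on U.S. data
GAIN = 1.35     # scalar enhancement factor from the sensitivity experiment


def stump(azshear_max, split=SPLIT):
    """Toy one-split classifier: tornadic if the feature exceeds the split."""
    return azshear_max > split


us_median, cn_median = 0.0115, 0.0080  # median Azshear_max from Table 5
cn_null = 0.0078                       # hypothetical non-tornadic Chinese sample

print(stump(us_median))        # True: typical U.S. tornado passes the split
print(stump(cn_median))        # False: typical Chinese tornado is missed
print(stump(cn_median * GAIN)) # True: scaling recovers the detection
print(stump(cn_null * GAIN))   # True: scaling also creates a false alarm
```

The last two lines show in miniature why the enhancement raises POD and FAR together: a single multiplicative factor cannot separate a weak tornadic signature from a non-tornadic shear feature of similar magnitude.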

The varying degrees of degradation between TORP and TORP-XGB stem from their internal logic. TORP, based on a bagging mechanism, aggregates decisions from independent trees, making it more robust to absolute scaling shifts. In contrast, XGBoost’s boosting framework is highly sensitive to the precise split thresholds learned during sequential optimization. When the target-domain feature distributions shift, these learned thresholds become sub-optimal, causing residual errors to accumulate and distorting the decision boundary.

For the CNN-based model (TDA-CNN), the degradation is linked to the activation of convolutional kernels. CNNs learn to detect local spatial textures and gradient structures. Because Chinese tornadoes often manifest as smaller, more transient, and less coherent rotational structures, the deep convolutional layers—trained to recognize the well-organized patterns of U.S. tornadoes—are insufficiently activated. This lack of high-level feature response prevents the final classifier from forming high-confidence decisions, leading to the observed drop in CSI.

Regarding future operational deployment, it is important to emphasize that the performance metrics reported in this study are strictly objective algorithmic outputs. These results are derived using fixed probability thresholds optimized on the U.S. dataset without human intervention. In actual practice at the CMA, these thresholds are not static parameters. To align with public safety priorities and the “over-warning” social preference observed in regions like the U.S., forecasters can dynamically lower the decision thresholds. Such an adjustment would successfully increase the operational POD to minimize missed events, albeit at the cost of a higher operational FAR. This study provides the baseline algorithmic capability, which can be further refined through local adaptation and operational threshold tuning.

6. Conclusions

In this study, three representative tornado identification models—TORP, TORP-XGB, and TDA-CNN—were developed based on the U.S. TorNet dataset and systematically evaluated for their cross-domain generalization capability using CINRAD observations from China (2020–2024). This evaluation strategy, based on fixed optimal operating points, ensured consistency across domains and enabled an objective assessment of zero-shot model transferability. The main conclusions are summarized as follows:

1. Generalization Gap and Model Robustness: While all models demonstrated strong detection capability in the U.S. source domain, varying degrees of performance degradation were observed when applied to the Chinese target domain, primarily reflected in the decreased POD and CSI. Among the three paradigms, the bagging-based TORP model exhibited the smallest degradation, indicating greater robustness to cross-domain variability compared to boosting-based and deep learning approaches.

2. Drivers of Degradation: The primary causes of this performance attenuation are identified as a triad of factors: (i) systematic differences in radar hardware and scanning strategies (e.g., resolution, VCPs); (ii) data quality issues, specifically the higher prevalence of noise and urban interference in the target domain; and (iii) morphological discrepancies, where Chinese tornadoes typically exhibit shorter lifetimes, smaller spatial scales, and more fragmented organization than their U.S. counterparts.

3. Future Perspectives: Although zero-shot transfer remains challenging, the results demonstrate that machine learning models retain considerable potential for cross-regional applications. Preliminary feature enhancement experiments provided initial evidence that reducing feature distribution discrepancies can improve generalization. Future research will focus on three strategic directions:

(a) Developing domain-invariant feature alignment methods to mathematically harmonize radar-feature discrepancies.
(b) Stratifying training datasets by tornado EF scale to select source-domain samples that better match the dominant intensity characteristics of the target region.
(c) Implementing transfer learning strategies (e.g., few-shot learning) that leverage a limited number of target-domain samples to fine-tune model parameters while preserving transferable source-domain representations.

These efforts aim to improve model robustness, ultimately supporting the operational deployment of tornado identification systems across heterogeneous radar networks and climatic regimes.

Author Contributions

Conceptualization, B.J. and S.Z.; methodology and investigation, B.J. and S.Z.; data curation, S.Z. and Y.C.; visualization, B.J. and Y.W.; writing—review and editing, B.J. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been supported by the National Natural Science Foundation of China (grant nos. 42405141 and U2542207) and Open Foundation of the China Meteorological Administration Tornado Key Laboratory (grant no. TKL202310).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Definitions of the top-five features.

Feature	Definition
Azshear_max	The maximum value of azimuthal shear
Vrgrad_min	The minimum value of the radial velocity gradient
Vrgrad_25th	The 25th percentile of the radial velocity gradient
Azshear_median	The median value of azimuthal shear
Azshear_min	The minimum value of azimuthal shear

References

1. Fan, W.; Yu, X. Spatial and Temporal Distribution Characteristics of Tornadoes in China. Meteorol. Mon. 2015, 41, 793–805.
2. Niloufar, N.; Naresh, D.; Valerie, W.; Reza, K. Explaining the trends and variability in the United States tornado records using climate teleconnections and shifts in observational practices. Sci. Rep. 2021, 11, 1741.
3. He, J.; Zeng, Q.; Wang, H.; Shi, C. Research Progress in Radar Detection of Tornadoes. J. Chengdu Univ. Inf. Technol. 2018, 33, 477–489.
4. Zhou, R.; Meng, Z.; Bai, L. Differences in tornado activities and key tornadic environments between China and the United States. Int. J. Climatol. 2021, 42, 367–384.
5. Yu, X.; Zhao, J.; Fan, W. Spatial-Temporal Distribution and Characteristics of Key Environmental Parameters of Tornadoes in China. J. Trop. Meteorol. 2021, 37, 681–692.
6. Zhang, X.; Yang, B.; Zhu, W.; Fang, C.; Liu, X.; Zhou, K.; Lan, Y.; Tian, F. Weather Analysis of an EF4-Rated Tornado in Funing, Jiangsu on 23 June 2016. Meteorol. Mon. 2016, 42, 1304–1314.
7. Simmons, K.M.; Sutter, D. WSR-88D radar, tornado warnings, and tornado casualties. Am. Meteorol. Soc. 2005, 20, 301–310.
8. Lemon, L.R.; Donaldson, R.J.; Burgess, D.W. Doppler Radar Application to Severe Thunderstorm Study and Potential Real-Time Warning. Bull. Am. Meteorol. Soc. 1977, 58, 1187–1193.
9. Wakimoto, R.M.; Wilson, J.W. Non-supercell tornadoes. Mon. Weather. Rev. 1989, 117, 1113–1140.
10. Ryzhkov, A.V.; Schuur, T.J.; Burgess, D.W.; Zrnic, D.S. Polarimetric Tornado Detection. Am. Meteorol. Soc. 2005, 44, 557–570.
11. Mei, Y.; Chen, S.; Liu, C.; Lei, Z. Radar Observation-Based Analysis of a Waterspout Event in the Pearl River Estuary in June 2021. J. Trop. Meteorol. 2022, 38, 825–832.
12. Zheng, Y.; Yang, B.; Zhou, K.; Lan, Y.; Sheng, J.; Cao, Y.; Tian, F.; Zhou, X. Progress in Tornado Monitoring, Forecasting and Early Warning Technologies. J. Mar. Meteorol. 2025, 45, 1–13.
13. Trafalis, T.B.; White, A. Data Mining Techniques for Pattern Recognition: Tornado Signatures in Doppler Weather Radar Data. Int. J. Smart Eng. Syst. Des. 2003, 5, 347–359.
14. Mitchell, E.D.W.; Vasiloff, S.V.; Stumpf, G.J.; Witt, A.; Eilts, M.D.; Johnson, J.; Thomas, K.W. The national severe storms laboratory tornado detection algorithm. Weather. Forecast. 1998, 13, 352–366.
15. Desrochers, P.R.; Donaldson, R.J., Jr. Automatic tornado prediction with an improved mesocyclone-detection algorithm. Weather. Forecast. 1992, 7, 373–388.
16. Wang, Y.; Yu, T.Y.; Yeary, M.; Shapiro, A.; Nemati, S.; Foster, M.; Andra, D.L., Jr.; Jain, M. Tornado detection using a neuro–fuzzy system to integrate shear and spectral signatures. J. Atmos. Ocean. Technol. 2008, 25, 1136–1148.
17. Zeng, Q.; Qing, Z.; Zhu, M.; Zhang, F.; Wang, H.; Liu, Y.; Shi, Z.; Yu, Q. Application of Random Forest Algorithm on Tornado Detection. Remote Sens. 2022, 14, 4909.
18. Trafalis, T.B.; Ince, H.; Richman, M.B. Tornado detection with support vector machines. In Proceedings of the International Conference on Computational Science; Springer: Berlin/Heidelberg, Germany, 2003; pp. 289–298.
19. Gagne, D.J.; McGovern, A.; Brotzge, J. Classification of Convective Areas Using Decision Trees. J. Atmos. Ocean. Technol. 2009, 26, 1341–1353.
20. Sandmæl, T.N.; Smith, B.R.; Reinhart, A.E.; Schick, I.M.; Ake, M.C.; Madden, J.G.; Steeves, R.B.; Williams, S.S.; Elmore, K.L.; Meyer, T.C. The Tornado Probability Algorithm: A Probabilistic Machine Learning Tornadic Circulation Detection Algorithm. Weather. Forecast. 2023, 38, 445–466.
21. Zeng, Q.; Zhang, G.; Huang, S.; Song, W.; He, J.; Wang, H.; Liu, Y. A Novel Tornado Detection Algorithm Based on XGBoost. Remote Sens. 2025, 17, 167.
22. Xie, J.; Zhou, K.; Chen, H.; Han, L.; Guan, L.; Wang, M.; Zheng, Y.; Chen, H.; Mao, J. Multi-Task Learning for Tornado Identification Using Doppler Radar Data. Geophys. Res. Lett. 2024, 51, e2024GL108809.
23. Veillette, M.S.; Kurdzo, J.M.; Stepanian, P.M.; Cho, J.Y.; Reis, T.; Samsi, S.; McDonald, J.; Chisler, N. A Benchmark Dataset for Tornado Detection and Prediction using Full-Resolution Polarimetric Weather Radar Data. Artif. Intell. Earth Syst. 2025, 4, e240006.
24. Meng, Z.; Bai, L.; Zhang, M.; Wu, Z.; Li, Z.; Pu, M.; Zheng, Y.; Wang, X.; Yao, D.; Xue, M.; et al. The deadliest tornado (EF4) in the past 40 years in China. Weather. Forecast. 2018, 33, 693–713.
25. Xu, F.; Zheng, Y.; Sun, K. Spatiotemporal distribution and storm morphological characteristics of tornadoes in Jiangsu. Meteorol. Mon. 2021, 47, 517–528.
26. China Tornado Database. 2025. Available online: https://www.fs121.com/tornado/#/list?code=11100 (accessed on 10 November 2025).
27. Mahalik, M.C.; Smith, B.R.; Elmore, K.L.; Kingfield, D.M.; Ortega, K.L.; Smith, T.M. Estimates of Gradients in Radar Moments Using a Linear Least Squares Derivative Technique. Weather. Forecast. 2019, 34, 415–434.
28. Cohen, B.K.; Bodine, D.J.; Yeary, M.B.; Snyder, J.C.; Bluestein, H.B. Examining Meteorological Benefits of Rapid-Scan, Fully Digital Phased Array Radar Observations for Detecting Tornado Formation and Intensification. J. Atmos. Ocean. Technol. 2025, 42, 909–933.
29. Zeng, Q.; Qing, Z.; Chen, Y.; Wang, H.; Zhou, H.; Liu, Y. Random Forest-Based Tornado Detection Algorithm for Networked Radar. J. Trop. Meteorol. 2023, 39, 825–837.
30. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. arXiv 2016, arXiv:1603.02754.
31. Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 2020, 53, 5455–5516.
32. Yang, Y.H.; King, P. Investigating the Potential of Using Radar Echo Reflectivity to Nowcast Cloud-to-Ground Lightning Initiation over Southern Ontario. Weather. Forecast. 2010, 25, 1235–1248.
33. Xu, Y.; Tang, G.; Li, L.; Wan, W. Multi-source precipitation estimation using machine learning: Clarification and benchmarking. J. Hydrol. 2024, 635, 131195.

Figure 1. Spatial distribution of the confirmed tornado events. (Note: The map displays the broader historical tornado records in the region to provide geographic context. The specific 32 events evaluated in the target domain are a subset of these records).

Figure 2. Comparison of a tornado radar sample before and after downsampling. (a) Original raw data. (b) Downsampled data.

Figure 3. Visual representation of shear extraction using the LLSD method, where the highlighted regions denote the tornado occurrence areas.

Figure 4. Schematic representation of the TORP and TORP-XGB.

Figure 5. Architectural overview of the TDA-CNN model.

Figure 6. Performance benchmarks of the three tornado detection models on the TorNet dataset, where the yellow points denote the selected optimal probability thresholds.

Figure 7. Multi-radar visualization of the Lianshui tornado event (12:21 UTC, 19 September 2023) and model detection results. The event was cross-referenced by three radar stations. The subpanels display the horizontal reflectivity and radial velocity fields from: (a,b) Station Z9515; (c,d) Station Z9518; and (e,f) Station Z9517. Overlaid circles denote the detection centroids generated by the three models: blue circles for TORP, black circles for TORP-XGB, and yellow circles for TDA-CNN. The adjacent numerical values indicate the predicted tornado probability associated with each detection.

Figure 8. Visualization of a false negative (missed detection) case occurring in Suining, Xuzhou (17 September 2024, 04:26–04:38 UTC). The panels display (a) the horizontal reflectivity factor and (b) the radial velocity. The center of the white circle indicates the ground truth location of the verified tornado. Note that no detection markers were generated by any of the models for this event.

Figure 9. Visualization of a false alarm (false positive) case in Suining, Xuzhou (17 September 2024, 04:26–04:38 UTC). The panels display (a) the horizontal reflectivity factor and (b) the radial velocity. The black circle indicates the location of the spurious detection generated by the model (false alarm), where no tornado occurrence was verified.

Figure 10. Ranking of the top-five feature importance scores derived from the training dataset.

Table 1. Overview of the TorNet Dataset.

Attribute Category	Detailed Description
Coverage Period	2013–2022
Sample Type	TOR/WRN/NUL
Basic Variables	DBZ, VEL, WIDTH, ZDR, RHOHV, KDP
Sample Resolution	Angular Resolution (0.5°)/Range Bin Resolution (250 m)
Sample Size	120 × 240 (120 radials, 240 range bins)
Other Sample Information	Occurrence Time, EF Scale, Radar Station Information, etc.

Table 2. Radar tornado observations.

Occurrence Time (UTC+8)	Occurrence Location	Radar
11 September 2024 10:30	Jiangyin, Wuxi, Jiangsu	Z9519, Z9523, Z9513, Z9572
16 September 2024 9:00	Dongtai, Yancheng, Jiangsu	Z9523, Z9513, Z9515
17 September 2024 12:26–12:38	Suining, Xuzhou	Z9527, Z9539, Z9517
17 September 2024 14:07–14:12	Tongshan, Xuzhou	Z9516, Z9527, Z9539, Z9537
17 September 2024 14:33–14:41	Tongshan, Xuzhou	Z9516, Z9527, Z9539, Z9537
17 September 2024 15:30–15:33	Taierzhuang, Zaozhuang	Z9516, Z9527, Z9539, Z9518
17 September 2024 15:31–15:33	Peixian, Xuzhou	Z9516, Z9527, Z9537
17 September 2024 15:47–16:10	Donghai, Lianyungang	Z9516, Z9527, Z9539, Z9518
17 September 2024 16:02–16:09	Peixian, Xuzhou	Z9516, Z9537
17 September 2024 16:58–17:09	Sihong County, Suqian	Z9517, Z9527, Z9552
13 August 2023 15:15–15:18	Dafeng District, Yancheng	Z9513, Z9515, Z9523
13 August 2023 15:50	Dafeng District, Yancheng	Z9513, Z9515, Z9523
19 September 2023 15:40	Guanyun, Lianyungang	Z9517, Z9518, Z9527
19 September 2023 16:41	Suining County, Xuzhou	Z9516, Z9517, Z9527, Z9552
19 September 2023 17:20	Sucheng District, Suqian	Z9516, Z9517, Z9518, Z9527
19 September 2023 17:34–17:50	Suyu District, Suqian	Z9516, Z9517, Z9518, Z9527
19 September 2023 18:04	Siyang County, Suqian	Z9516, Z9517, Z9518, Z9527
19 September 2023 18:50	Xiangshui County, Yancheng	Z9515, Z9518, Z9527
19 September 2023 20:04–20:35	Lianshui County, Huai’an	Z9515, Z9517, Z9518, Z9527
19 September 2023 20:35	Binhai County, Yancheng	Z9515, Z9518
19 September 2023 21:31	Jianhu County, Yancheng	Z9515, Z9517, Z9518, Z9523
20 July 2022 8:30	Shuyang County, Suqian	Z9517, Z9518, Z9527, Z9539
20 July 2022 11:40	Guanyun, Lianyungang	Z9517, Z9518, Z9527, Z9539
20 July 2022 12:50–13:10	Xiangshui County, Yancheng	Z9515, Z9517, Z9518, Z9527
20 July 2022 12:20	Huaiyin District, Huai’an	Z9515, Z9517, Z9518, Z9527
26 July 2022 15:00	Gusu District, Suzhou	Z9002, Z9513, Z9519, Z9572
26 July 2022 13:00	Sheyang County, Yancheng	Z9515, Z9518
15 June 2021 0:00	Tongshan District, Xuzhou	Z9516, Z9527
20 August 2021 17:15	Huai’an District, Huai’an	Z9515, Z9517, Z9518, Z9527
20 August 2021 18:30	Yandu District, Yancheng	Z9515, Z9517, Z9523
23 August 2021 15:30	Huai’an District, Huai’an	Z9515, Z9516, Z9517, Z9527
22 July 2020 21:48	Gaoyou, Yangzhou	Z9515, Z9517, Z9519, Z9523

Table 3. Performance metrics of the three models on the U.S. TorNet dataset at optimal thresholds.

Model	POD	FAR	CSI
TORP	0.75	0.33	0.55
TORP-XGB	0.72	0.31	0.54
TDA-CNN	0.71	0.31	0.53
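
The three scores in Table 3 are not independent. If FAR is read as the false alarm ratio FP / (TP + FP), which is an assumption here since this section does not restate the definition, then CSI follows from POD and FAR through 1/CSI = 1/POD + 1/(1 − FAR) − 1. The minimal sketch below uses that identity to cross-check the tabulated values; the function name `csi_from_pod_far` is illustrative and not taken from the paper.

```python
def csi_from_pod_far(pod: float, far: float) -> float:
    """CSI implied by POD and FAR, assuming FAR = FP / (TP + FP)."""
    success_ratio = 1.0 - far
    return 1.0 / (1.0 / pod + 1.0 / success_ratio - 1.0)


# Values copied from Table 3: (POD, FAR, reported CSI).
table3 = {
    "TORP": (0.75, 0.33, 0.55),
    "TORP-XGB": (0.72, 0.31, 0.54),
    "TDA-CNN": (0.71, 0.31, 0.53),
}

for name, (pod, far, csi_reported) in table3.items():
    csi_implied = csi_from_pod_far(pod, far)
    print(f"{name}: implied CSI = {csi_implied:.3f}, reported = {csi_reported:.2f}")
```

Because the inputs are rounded to two decimals, the implied and reported CSI can differ by about 0.01 without indicating an inconsistency.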

Table 4. Performance evaluation of the three tornado detection models on the Chinese target-domain dataset.

Model	POD	FAR	CSI
TORP	0.56	0.34	0.43
TORP-XGB	0.48	0.29	0.40
TDA-CNN	0.41	0.39	0.33
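
One way to compare Table 4 with the source-domain results is the relative loss in POD for each model. The sketch below is an illustrative calculation on the rounded POD values from Tables 3 and 4, not a metric defined by the authors; the helper name `relative_drop` is hypothetical.

```python
# POD values copied from Table 3 (U.S. source domain) and Table 4 (Chinese target domain).
source_pod = {"TORP": 0.75, "TORP-XGB": 0.72, "TDA-CNN": 0.71}
target_pod = {"TORP": 0.56, "TORP-XGB": 0.48, "TDA-CNN": 0.41}


def relative_drop(source: float, target: float) -> float:
    """Fraction of the source-domain score lost in the target domain."""
    return (source - target) / source


drops = {m: relative_drop(source_pod[m], target_pod[m]) for m in source_pod}

for model, drop in drops.items():
    print(f"{model}: POD falls by {drop:.1%} relative to the source domain")
```

On these rounded figures TORP loses roughly a quarter of its source-domain POD, TORP-XGB about a third, and TDA-CNN a little over 40%, which matches the ordering of robustness reported in the abstract.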

Table 5. Statistical comparison of key radar feature values between U.S. and Chinese tornado samples (Unit: s⁻¹).

Region	Azshear_max	Vrgrad_min	Vrgrad_25th	Azshear_median	Azshear_min
China Region	0.0080	0.0057	0.0069	0.0060	0.0045
USA Region	0.0115	0.0072	0.0090	0.0089	0.0064
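
The magnitude gap in Table 5 can be summarized as a per-feature U.S.-to-China ratio. The sketch below computes only that ratio from the tabulated values. It is not the authors' enhancement procedure: the scaling factor used in the sensitivity experiment is not stated in this section, so these ratios should be read as a rough indication of the gap.

```python
# Feature values copied from Table 5 (unit: s^-1).
features = ["Azshear_max", "Vrgrad_min", "Vrgrad_25th", "Azshear_median", "Azshear_min"]
china = [0.0080, 0.0057, 0.0069, 0.0060, 0.0045]
usa = [0.0115, 0.0072, 0.0090, 0.0089, 0.0064]

# Ratio > 1 means the U.S. value is larger than the Chinese value.
ratios = {name: u / c for name, c, u in zip(features, china, usa)}

for name, ratio in ratios.items():
    print(f"{name}: U.S./China ratio = {ratio:.2f}")
```

Every ratio is above 1, ranging from about 1.26 (Vrgrad_min) to about 1.48 (Azshear_median), so the Chinese samples are weaker on each listed feature.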

Table 6. Performance evaluation of the TORP and TORP-XGB models on the Chinese test dataset following feature enhancement.

Model	POD	FAR	CSI
TORP	0.69	0.44	0.44
TORP-XGB	0.54	0.32	0.43
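
The trade-off introduced by feature enhancement is easiest to see as a change relative to the unenhanced target-domain scores. The sketch below subtracts the Table 4 values from the Table 6 values for the two models listed in both tables; it is an illustrative comparison on rounded figures, not part of the authors' analysis.

```python
# (POD, FAR, CSI) copied from Table 4 (before) and Table 6 (after feature enhancement).
baseline = {"TORP": (0.56, 0.34, 0.43), "TORP-XGB": (0.48, 0.29, 0.40)}
enhanced = {"TORP": (0.69, 0.44, 0.44), "TORP-XGB": (0.54, 0.32, 0.43)}

# Per-metric change, rounded to the two decimals used in the tables.
deltas = {
    model: tuple(round(after - before, 2)
                 for before, after in zip(baseline[model], enhanced[model]))
    for model in baseline
}

for model, (d_pod, d_far, d_csi) in deltas.items():
    print(f"{model}: POD {d_pod:+.2f}, FAR {d_far:+.2f}, CSI {d_csi:+.2f}")
```

For TORP the POD gain of 0.13 comes with a FAR increase of 0.10, leaving CSI nearly unchanged (+0.01); for TORP-XGB the gains are smaller (POD +0.06, FAR +0.03) but CSI rises by 0.03.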
