Given the highly complex nonlinear relationships between lightning occurrence and various atmospheric physical parameters, which cannot be fully captured by traditional physical formulas or linear models, we adopted a data-driven deep learning approach. By employing the Multi-modal Decoding Enhanced UNet (MDE-UNet), this work establishes a same-size mapping between radar and satellite observations and lightning probability distribution, enabling end-to-end identification of lightning initiation timing.
3.1. Model Architecture
3.1.1. Channel Attention Mechanism Block (CA Block)
To address multi-source data fusion and noise interference in lightning monitoring, we introduce a lightweight channel attention mechanism for feature recalibration, which is designed to adaptively capture and amplify critical channel information while suppressing irrelevant noise channels.
Given an input feature map $X \in \mathbb{R}^{C \times H \times W}$ (where $C$ represents the number of channels, $H$ denotes the height of the feature map, and $W$ is the width of the feature map), we first perform global average pooling to obtain a channel-wise descriptor $s \in \mathbb{R}^{C}$ that aggregates global spatial information for each channel, enabling the adaptive identification of key channel characteristics:
$$s_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_c(i, j)$$
In the above equation, the operator $\sum_{i=1}^{H}\sum_{j=1}^{W}$ implements global spatial summation over all spatial positions of the feature map, and the scaling factor $\frac{1}{H \times W}$ normalizes the summation result to the average value, ensuring the magnitude of the channel-wise descriptor $s$ is invariant to the spatial dimensions of the input feature map.
We then learn adaptive channel weights $z \in \mathbb{R}^{C}$ via a two-layer fully connected network, where the weights are automatically optimized to highlight channels carrying discriminative information for lightning monitoring (e.g., deep convection and cloud-top microphysics) and weaken redundant ones:
$$z = \sigma\left(W_2\,\delta\left(W_1 s + b_1\right) + b_2\right)$$
In the above equation, the parameters are defined as follows: $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $b_1 \in \mathbb{R}^{\frac{C}{r}}$: weight matrix and bias vector of the first fully connected layer, where $r$ (reduction ratio, set to 2 in this study) is the dimensionality reduction factor used to reduce computational complexity. Specifically, a smaller reduction ratio is adopted to capture finer-grained channel correlations inherent in radar echo features; $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ and $b_2 \in \mathbb{R}^{C}$: weight matrix and bias vector of the second fully connected layer, which restores the dimensionality to the original number of channels $C$; $\delta$: rectified linear unit (ReLU) activation function, introduced to add non-linearity and enhance the model’s ability to learn complex channel correlations; $\sigma$: sigmoid activation function, which maps the output of the fully connected network to the range $(0, 1)$, ensuring the channel weights can act as scaling factors to modulate the original feature map.
The core advantage of this design is that the weights are not manually predefined but adaptively learned from the data, enabling the model to dynamically focus on task-critical channels.
Finally, we recalibrate $X$ by element-wise multiplication with $z$, where the adaptive weights directly modulate the contribution of each channel to achieve targeted enhancement of key channel information:
$$\tilde{X} = z \odot X$$
In the above equation, $\tilde{X}$ is the recalibrated feature map, and the operator ⊙ denotes element-wise multiplication (Hadamard product) between the channel weight vector $z$ and the original feature map $X$; specifically, each element $z_c$ (the c-th element of $z$) is multiplied with all spatial elements of the c-th channel of $X$, thus scaling the entire channel by the learned weight.
This mechanism adaptively captures and reinforces key channels related to deep convection and cloud-top microphysics, suppresses redundant noise channels that interfere with lightning detection, and achieves efficient multi-source data fusion with low computational cost, improving lightning activity identification accuracy. The module structure is shown in Figure 4.
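The three steps of the CA Block (global average pooling, a two-layer fully connected network, and channel-wise rescaling) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `channel_attention`, the random weights, and the toy sizes are hypothetical, and only the reduction ratio r = 2 comes from the text.

```python
import math
import random


def channel_attention(x, w1, b1, w2, b2):
    """Recalibrate x (nested lists, C x H x W) with learned channel weights."""
    channels = len(x)
    height, width = len(x[0]), len(x[0][0])

    # Global average pooling: one descriptor s_c per channel.
    s = [sum(sum(row) for row in x[c]) / (height * width) for c in range(channels)]

    # First fully connected layer (C -> C/r) followed by ReLU.
    hidden = [
        max(0.0, sum(w * v for w, v in zip(w1[k], s)) + b1[k])
        for k in range(len(w1))
    ]

    # Second fully connected layer (C/r -> C) followed by sigmoid: weights in (0, 1).
    z = [
        1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(w2[c], hidden)) + b2[c])))
        for c in range(channels)
    ]

    # Scale every spatial element of channel c by its weight z[c].
    recalibrated = [[[z[c] * v for v in row] for row in x[c]] for c in range(channels)]
    return recalibrated, z


random.seed(0)
C, H, W, r = 4, 3, 3, 2  # r = 2 follows the text; the other sizes are arbitrary
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]  # untrained stand-ins
b1 = [0.0] * (C // r)
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
b2 = [0.0] * C

x_tilde, z = channel_attention(x, w1, b1, w2, b2)
```

In a trained network the two weight matrices would be learned jointly with the rest of the model; here they are random so that only the data flow is shown.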
3.1.2. Enhanced Decoding Unit Based on Weighted Sliding Window Multilayer Perceptron (WWMLP Block)
To address the core challenge of inversely mapping high-level semantic features to a high-resolution, spatially continuous physical field when reconstructing lightning-affected areas from radar and multi-channel Himawari satellite data (including Band 9, TBB 13, TBB 15, and their derived indices), we propose an enhanced decoding unit based on a weighted sliding window multilayer perceptron (WWMLP Block), placed along the critical path of the decoder. Traditional decoding methods reconstruct the spatial distribution of lightning by upsampling the fused high-level features that have incorporated radar structural information and satellite cloud-top microphysical states; however, they often produce fragmented reconstructions, blurred boundaries, and numerous spatially isolated false-positive artifacts, leading to poor physical consistency. To solve this problem, the module employs a weighted sliding window mechanism to dynamically aggregate contextual information from neighboring fused features at each spatial position during reconstruction. By jointly considering the satellite brightness temperature gradient features at the current point and the surrounding radar reflectivity features, the module explicitly models and reinforces the inherent spatial continuity and physical correlation of lightning-affected areas in the real atmosphere. This suppresses isolated false signals that deviate from meteorological principles and improves the model’s sensitivity to signals indicating strong lightning, which in turn improves the extraction of lightning-related features.
Simultaneously, the core multilayer perceptron (MLP) within the module—consisting of two fully-connected (FC) layers with the ReLU activation function—leverages its robust nonlinear fitting capability to precisely learn the complex mapping function from aggregated local multi-source feature vectors to the final lightning occurrence probability. This inherently deciphers the deep nonlinear physical relationship between composite indices (e.g., TBB15-TBB13), which reflect cloud-top ice crystallization processes, and lightning activity.
In terms of computational overhead, when the number of input channels (total channels of fused radar-satellite features) is greater than the number of output channels (channels of lightning physical field features), the parameter counts of the WWMLP Block and a standard double convolution block are as follows (derived under the assumption of a fair comparison for decoding scenarios):
Total parameters of WWMLP (2-layer MLP with ReLU activation):
Total parameters of standard double convolution:
Based on these formulas, the parameter count of this module is significantly reduced compared to the standard double convolution block. Experiments demonstrate that this decoding unit enables the model to generate spatially smooth, sharply bounded, and physically credible reconstructions of lightning-affected areas. It significantly reduces false alarms while maintaining a high detection rate, achieving precise and robust decoding from multi-source features to a high-quality lightning physical field. The module design is illustrated in Figure 5.
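To give a feel for the scale of the saving, the sketch below counts parameters for both blocks under explicit assumptions that are not taken from the paper: 3 × 3 convolutions with bias and no batch-norm parameters for the double convolution, and, for the WWMLP side, a single shared 3 × 3 window weight map followed by a two-layer MLP whose hidden width equals the output channel count. The function names and channel sizes are hypothetical.

```python
def double_conv_params(c_in, c_out, kernel=3):
    """Weights and biases of two stacked kernel x kernel convolutions (no BatchNorm)."""
    first = kernel * kernel * c_in * c_out + c_out
    second = kernel * kernel * c_out * c_out + c_out
    return first + second


def wwmlp_params(c_in, c_out, c_hidden, window=3):
    """Assumed layout: one shared window x window weight map plus a 2-layer MLP."""
    window_weights = window * window
    fc1 = c_in * c_hidden + c_hidden
    fc2 = c_hidden * c_out + c_out
    return window_weights + fc1 + fc2


# Illustrative decoder stage with more input than output channels.
conv = double_conv_params(128, 64)
mlp = wwmlp_params(128, 64, c_hidden=64)
print(conv, mlp)  # 110720 12425
```

Under these assumptions the MLP-based block needs roughly one ninth of the parameters, because the per-channel-pair 3 × 3 kernels are replaced by pointwise weights; the exact ratio for the published model depends on its actual layer sizes.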
3.1.3. Multi-Scale Feature Fusion Module (Fusion Block)
To address the operational challenges in traditional lightning identification, such as fragmented continuity of convective signals and loss of key microphysical features caused by dispersed and poorly coordinated multi-scale reconstruction features, we designed a multi-scale feature fusion module. This module serves as an integration and optimization hub for the outputs of each decoder level. First, interpolation combined with cross-layer skip connections is used to perform spatial dimension alignment and channel concatenation on feature maps of different resolutions and levels, enabling the preliminary fusion of high-level and low-level features at the same scale. Then, a Dual Convolution Block ((Conv2d + Batch Norm + ReLU) × 2) is employed to extract local cross-scale convective organization features and reconstruct vertical structural coherence, addressing operational issues such as incoherent vertical structures in mesoscale convective systems and blurred weak-echo meteorological signals. Finally, the channel attention mechanism (CA Block) is introduced to apply physics-aware weighting based on the multi-scale consistency of feature channels in regions of convective available potential energy release, ice-phase particle scattering enhancement, and charge enrichment. This helps separate deep convective from stratiform precipitation features, suppressing non-convective interference signals and enhancing key meteorological signals. The design resolves the physical decoupling of multi-source observation features during cross-scale transmission, constructing a multi-scale fusion feature field with clear meteorological consistency. The module design is illustrated in Figure 6.
3.1.4. Radar Information Enhancement Module (RE Block)
Existing research indicates that regions of high radar reflectivity typically correspond to deep convective clouds and strong updrafts. In mature thunderstorms, updrafts lift a significant amount of supercooled water droplets above the freezing level (0 °C). At this altitude, frequent collisions, friction, and fragmentation occur between supercooled droplets and particles such as ice crystals and graupel. According to the classical non-inductive charging theory, under specific temperature and liquid water content conditions (typically within the “mixed-phase region” of −10 °C to −25 °C), collisions between ice crystals and graupel result in charge transfer. This process causes lighter ice crystals to become positively charged and be carried upward by updrafts to the upper part of the cloud, while heavier graupel becomes negatively charged and accumulates in the middle to lower parts, forming dipole or tripole charge structures. This large-scale charge separation is a prerequisite for generating strong electric fields and ultimately triggering lightning discharges. Studies clearly demonstrate a strong correlation between lightning activity and radar reflectivity exceeding 31.29 dBZ at the −20 °C level [37,38,39]. This is precisely because this altitude range is the most active region for charge separation, and high reflectivity reflects the abundance of ice-phase particles in this area. The relationship between radar echoes and lightning distribution is illustrated in Figure 7.
Single-satellite data struggle to capture the microphysical processes during the initial stage of severe convection, and weak radar features are easily overshadowed by dominant features in traditional multimodal fusion [40,41]. To address both problems, we propose a softly physics-guided bidirectional synergistic neural network framework. This framework takes the large-scale lightning-related physical priors embedded in satellite data as global physical guidance to construct a highly complementary physical mapping space. It further introduces an “identification-correction” mechanism to achieve functional decoupling: the satellite-dominant backbone network focuses on learning macroscopic precipitation patterns, while the specially designed Physics-Guided Radar Enhancement Module generates an adaptive residual probability map (denoted as RSI). This map is based on the explicit physical relationship between radar reflectivity and precipitation intensity (e.g., the Z-R relationship) [42], together with spatial continuity, temporal evolution patterns, and other physical properties that are implicitly learned within the module. The residual probability map corrects the initial inversion results (denoted as BO) in the form of residual compensation, and the corrected results are passed through a sigmoid layer to constrain them to a reasonable physical range. This design establishes a physical consistency verification mechanism at the output level, allowing radar physical features to serve as additional supervisory signals and guiding the model to focus on weak feature regions overlooked in traditional training. It reactivates the fine-scale local severe convective structures lost to the global smoothing characteristic of satellite data, thereby recalling missed detection samples while maintaining a low false-alarm rate. This mechanism is consistent with the findings of He et al. (2025) [43], indicating that the introduction of physical guidance can effectively restore the smoothed weak echo features and improve the physical consistency of prediction results. The proposed framework avoids the information loss or feature conflict inherent in deep feature fusion, integrates the global contextual information of satellite data with the local physical guidance of radar observations, and achieves efficient synergy and precise alignment of different modal features under physical guidance, providing reliable support for improving the accuracy of lightning identification. The module structure is illustrated in Figure 8.
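The residual compensation step described above (adding the radar-derived residual map RSI to the backbone output BO and constraining the sum with a sigmoid) can be sketched as follows. This is a minimal illustration that assumes BO and RSI are both pre-sigmoid score maps of equal size; the function names and the toy values are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def residual_correction(bo, rsi):
    """Add the radar residual map to the backbone output, then squash to (0, 1)."""
    return [
        [sigmoid(b + r) for b, r in zip(bo_row, rsi_row)]
        for bo_row, rsi_row in zip(bo, rsi)
    ]


# Toy 2 x 2 maps: BO is the satellite-dominant output, RSI the radar residual.
bo = [[-2.0, 0.5], [-1.0, -3.0]]
rsi = [[0.0, 0.0], [2.5, 0.0]]  # radar evidence raises only the lower-left pixel

corrected = residual_correction(bo, rsi)
```

In this toy case the lower-left pixel, which the backbone alone would leave below 0.5, is lifted above it by the radar residual, while pixels with zero residual keep the backbone probability.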
3.1.5. Overall Model Architecture
To address the core operational challenges in lightning monitoring (difficulties in fusing multi-source data, strong noise interference, fragmented reconstruction, and missed detection of local severe convection), we propose the MDE-UNet model. Based on the U-Net skeleton, the model employs a systematic modular design to achieve synergistic enhancement of multi-source information. At the encoding stage, a Channel Attention Block (CA Block) is introduced to adaptively enhance critical satellite and radar channel features related to deep convection while suppressing redundant noise. At the core of the decoding stage, a weighted sliding window MLP block (WWMLP Block) ensures spatial continuity and physical consistency in reconstruction results by aggregating neighborhood context, suppressing isolated spurious signals. Crucially, the network leverages the inherent skip connections of the U-Net architecture to directly inject encoder-extracted features, which are rich in spatial detail, into the corresponding levels of the decoder. This fusion mechanism, which combines “abstract semantic guidance with fine-grained detail supplementation,” mitigates the vanishing gradient problem during network training and provides the decoder with precise spatial positional references for high-resolution feature reconstruction. As a result, it enhances the model’s ability to preserve and restore key spatial details, such as the boundaries and morphology of lightning-prone areas, when inverting lightning probability. Furthermore, a Multi-scale Feature Fusion Block (Fusion Block) integrates satellite and radar features at different resolutions, addressing scale decoupling issues in feature transmission and constructing a highly consistent feature field.
Finally, a dedicated Radar Enhancement Block (RE Block) performs physics-guided local refinement of satellite-dominated preliminary inversion results, effectively recalling missed detection areas. Through hierarchical encoding, adaptive decoding, and cross-modal deep fusion, the model leverages the complementary advantages of multi-source data. It significantly reduces false alarm rates while improving the spatial continuity, physical credibility, and hit rate of lightning area reconstruction, enhancing the model’s robustness and operational applicability in complex weather scenarios. The model architecture is illustrated in Figure 9.
3.2. Loss Function
Lightning monitoring data are extremely sparse and severely class-imbalanced: positive samples (lightning areas) are scarce. Under the traditional binary cross-entropy loss (BCELoss), gradients are dominated by negative samples, which biases model optimization, raises the false negative rate, and makes it difficult to balance missed detections against false alarms in operational use. To address these issues, we propose an asymmetric weighted BCE-Dice composite loss function.
This loss function is specifically designed to align with the model’s optimization objectives. The Dice loss term (Dice Loss) calculates the overlap between predicted and ground truth regions, effectively mitigating class imbalance while constraining the overall spatial continuity and localization accuracy of the regions. Meanwhile, the improved weighted binary cross-entropy term (weighted BCE) assigns higher penalty weights to sparse positive samples, compelling the model to focus more on the challenging-to-learn features of lightning pixels during training.
The weighted fusion of these two components enables the model to not only perform fine-grained pixel-level probability calibration during optimization but also ensure overall spatial consistency between predicted regions and actual lightning areas, thereby prioritizing the avoidance of high-risk missed detection errors. From the perspective of optimization objectives, this design fundamentally addresses a series of challenges, including model learning failure under sparse samples, fragmented prediction regions, and the difficulty of embedding operational risk preferences. It lays the theoretical foundation for the model to ultimately achieve a balance between high detection rates and low false alarm rates, serving as a key factor in enhancing the physical consistency and operational usability of lightning area reconstruction. The formula is as follows:
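$$\mathcal{L} = \lambda_{\mathrm{BCE}} \, \mathcal{L}_{\mathrm{WeightedBCE}} + \lambda_{\mathrm{Dice}} \, \mathcal{L}_{\mathrm{Dice}}$$
Here, $\lambda_{\mathrm{BCE}}$ and $\lambda_{\mathrm{Dice}}$ are the weighting coefficients for the weighted BCE loss and Dice loss, respectively, used to balance the contributions of the two loss components.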
The Dice loss leverages its focus on the intersection of positive samples to correct gradient bias, compelling the model to emphasize sparse lightning pixels. The formula is as follows:
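$$\mathcal{L}_{\mathrm{Dice}} = 1 - \frac{2 \sum_{b,h,w} p_{b,h,w} \, y_{b,h,w}}{\sum_{b,h,w} p_{b,h,w} + \sum_{b,h,w} y_{b,h,w}}$$
Here, $p_{b,h,w}$ represents the probability predicted by the model for the pixel at position (h,w) in the b-th sample; $y_{b,h,w}$ denotes the ground truth label of the pixel at position (h,w) in the b-th sample.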
Additionally, asymmetric weights for false positives and false negatives are introduced to precisely align with operational cost requirements. By combining pixel-level loss preservation and region-level overlap measurement, this approach addresses the issue of gradient dominance by negative samples while balancing pixel accuracy and regional localization precision, thereby ensuring effective learning of sparse lightning features.
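The asymmetric weights $w_{b,h,w}$ and the weighted BCE loss $\mathcal{L}_{\mathrm{WeightedBCE}}$ are formulated as follows:
$$w_{b,h,w} = \begin{cases} w_{FN}, & y_{b,h,w} = 1 \\ w_{FP}, & y_{b,h,w} = 0 \end{cases}$$
$$\mathcal{L}_{\mathrm{WeightedBCE}} = -\frac{1}{B H W} \sum_{b,h,w} w_{b,h,w} \left[ y_{b,h,w} \log p_{b,h,w} + \left(1 - y_{b,h,w}\right) \log\left(1 - p_{b,h,w}\right) \right]$$
In the above equation, $w_{FP}$ denotes the penalty weight for false-positive (false alarm) samples, and $w_{FN}$ denotes the penalty weight for false-negative (missed detection) samples; $p_{b,h,w}$ and $y_{b,h,w}$ are the predicted probability and ground truth label of the pixel at position (h,w) in the b-th sample, and the sum runs over all $B$ samples and $H \times W$ pixels.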
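The composite loss can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions rather than the authors' code: predictions and labels are flattened over (b, h, w), the small constant `eps` is a numerical safeguard added here, and the default values (false-negative to false-positive weights of 4 and 3, BCE and Dice coefficients of 0.7 and 0.3) only mirror the ratios reported in this section, since the absolute weight values are not restated.

```python
import math


def weighted_bce(pred, target, w_fp, w_fn, eps=1e-7):
    """Asymmetric BCE: missed lightning is scaled by w_fn, false alarms by w_fp."""
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so that log() stays finite
        total -= w_fn * y * math.log(p) + w_fp * (1 - y) * math.log(1.0 - p)
    return total / len(pred)


def dice_loss(pred, target, eps=1e-7):
    """One minus the soft Dice overlap between prediction and ground truth."""
    intersection = sum(p * y for p, y in zip(pred, target))
    return 1.0 - (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def composite_loss(pred, target, w_fp=3.0, w_fn=4.0, lam_bce=0.7, lam_dice=0.3):
    """Weighted sum of the asymmetric BCE term and the Dice term."""
    bce = weighted_bce(pred, target, w_fp, w_fn)
    return lam_bce * bce + lam_dice * dice_loss(pred, target)


# A confident miss is penalized more than an equally confident false alarm.
miss = composite_loss([0.1], [1])
false_alarm = composite_loss([0.9], [0])
perfect = composite_loss([1.0, 0.0], [1, 0])
```

With these defaults a missed lightning pixel costs more than a false alarm of the same confidence, which is the operational preference the asymmetric weights are meant to encode.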
In this paper, based on UNet, we investigate the effect of the loss function with different ratios of the false-negative weight to the false-positive weight, $w_{FN} : w_{FP}$
(fixing
). Based on the Critical Success Index (CSI), the optimal combination (4:3) is selected through comparison, and the results are shown in Table 1. The results indicate that a higher ratio of the false-negative weight to the false-positive weight ($w_{FN} : w_{FP}$) makes the loss function more inclined to penalize missed detections, thus increasing the Probability of Detection (POD); however, the False Alarm Ratio (FAR) also increases accordingly.
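For reference, POD, FAR, and CSI follow from the standard contingency counts of hits, misses, and false alarms. The sketch below computes them from flattened probabilities and binary labels; the function name and the 0.5 decision threshold are assumptions made for illustration, not values stated in this section.

```python
def contingency_scores(pred, target, threshold=0.5):
    """Return (POD, FAR, CSI) for thresholded probabilities against binary labels."""
    hits = misses = false_alarms = 0
    for p, y in zip(pred, target):
        detected = p >= threshold
        if detected and y == 1:
            hits += 1
        elif not detected and y == 1:
            misses += 1
        elif detected and y == 0:
            false_alarms += 1

    # Guard against empty denominators when a category never occurs.
    pod = hits / (hits + misses) if hits + misses else 0.0
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else 0.0
    csi = hits / (hits + misses + false_alarms) if hits + misses + false_alarms else 0.0
    return pod, far, csi


# Two hits, one miss, one false alarm, one correct negative.
pod, far, csi = contingency_scores([0.9, 0.8, 0.2, 0.7, 0.1], [1, 1, 1, 0, 0])
```

Because CSI penalizes both misses and false alarms, it is a natural single criterion for choosing among weight ratios that trade POD against FAR.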
In addition, based on UNet, we studied the effect of different weightings of the BCE and Dice loss terms (fixing the sum of the two weights to 1). Based on the Critical Success Index (CSI), the optimal combination (7:3) is selected through comparison, and the results are shown in Table 2.