A Few-Shot Optical Classification Approach for Meteorological Lightning Monitoring: Leveraging Frame Difference and Triplet Network

Xiao, Mengmeng; Yan, Yulong; Zhang, Qilin; Liu, Yan; Pan, Xingke; Dai, Bingzhe; Duan, Chunxu

doi:10.3390/rs18030386

Open AccessArticle

A Few-Shot Optical Classification Approach for Meteorological Lightning Monitoring: Leveraging Frame Difference and Triplet Network

by

Mengmeng Xiao

¹

,

Yulong Yan

¹,

Qilin Zhang

^1,*,

Yan Liu

^1,2,

Xingke Pan

¹,

Bingzhe Dai

¹

and

Chunxu Duan

¹

State Key Laboratory of Climate System Prediction and Risk Management/Key Laboratory of Meteorological Disaster, Ministry of Education/Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science and Technology, Nanjing 210044, China

²

Key Laboratory of Big Data & Artificial Intelligence in Transportation, Ministry of Education, Beijing Jiaotong University, Beijing 100044, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(3), 386; https://doi.org/10.3390/rs18030386

Submission received: 26 November 2025 / Revised: 11 January 2026 / Accepted: 21 January 2026 / Published: 23 January 2026

Download

Browse Figures

Versions Notes

Highlights

What are the main findings?

A frame difference triplet network (FD-TripletNet) is proposed, which achieves 94.8% classification accuracy for few-shot lightning optical recognition.
Multi-scale frame difference input and Triplet Loss effectively reduce background noise, with FNR of 3.2% and FPR of 7.4% for lightning/non-lightning classification.

What are the implications of the main findings?

The model addresses the bottleneck of scarce labeled samples and strong instantaneity in lightning classification, outperforming traditional and baseline deep learning methods.
It provides a reliable technical solution for real-time lightning monitoring in meteorological applications, enhancing the efficiency of disaster prevention and control.

Abstract

To address the challenges of scarce labeled samples, strong instantaneity, and variable morphology in lightning optical classification—issues that traditional methods struggle to handle efficiently and often require extensive manual intervention—we propose a frame difference triplet network (FD-TripletNet) tailored for few-shot lightning recognition. The lightning optical dataset used in this study was collected from two observation stations over six months, comprising 459 video samples that include lightning events with diverse morphologies (e.g., branched, spherical) and non-lightning events prone to misclassification (e.g., strong light interference, moving objects). Considering the critical feature of lightning—abrupt single-frame changes—we introduce adjacent frame difference matrices as model input to explicitly capture transient brightness variations, reducing noise from static backgrounds. To enhance discriminative ability in few-shot scenarios, the model leverages Triplet Loss to compact intra-class features and separate inter-class features, combined with a dynamic sample matching strategy to focus on challenging cases. The experimental results show that FD-TripletNet achieves a classification accuracy of 94.8% on the dataset, outperforming traditional methods and baseline deep learning models. It effectively reduces the False Negative Rate (FNR) to 3.2% and False Positive Rate (FPR) to 7.4%, successfully distinguishing between lightning and non-lightning events, thus providing an efficient solution for real-time lightning monitoring in meteorological applications.

Keywords:

lightning optical classification; few-shot learning; frame difference matrix; triplet network

1. Introduction

Lightning, a transient yet high-impact atmospheric discharge phenomenon, serves as a pivotal indicator of severe convective weather systems (e.g., thunderstorms, tornadoes) and a critical variable for long-term climate analysis. For short-term disaster prevention, real-time lightning detection enables early warnings of extreme weather, reducing casualties and economic losses caused by lightning-induced wildfires, power grid failures, and structural damage [1]. In climate research, decadal-scale lightning pattern analysis provides insights into atmospheric convection dynamics, global warming-induced changes in storm intensity, and variations in atmospheric electrical activity [2]. Among diverse lightning-monitoring technologies, optical detection—relying on high-speed cameras or ground-based optical sensors—offers unique advantages: it captures direct visual information of discharge morphology (e.g., channel structure, brightness dynamics) that complements electromagnetic or radar-based methods, making it indispensable for fine-grained lightning classification [3].

However, practical optical lightning monitoring faces three intertwined challenges that have long hindered the development of robust classification systems. The first is lightning’s instantaneity: a typical cloud-to-ground lightning discharge lasts only 10–100 milliseconds, manifesting as abrupt brightness spikes across 3–5 consecutive frames in high-speed video. This “single-frame abrupt change” contrasts sharply with the gradual feature variations targeted by most image recognition models, often leading to confusion with transient noise (e.g., sunlight glints on clouds, fast-moving birds) [4]. The second is morphological diversity: lightning discharges exhibit highly variable forms—branched (tree-like channels), spherical (rare ball lightning), and cloud-to-ground (single thick channels)—each with distinct spatial features (e.g., edge sharpness, brightness distribution). This diversity renders manually designed feature sets (e.g., edge density, texture statistics) ineffective, as a feature optimized for branched lightning fails to generalize to spherical events [5]. Third, and most restrictive, is scarce labeled samples: lightning events are spatially scattered and unpredictable, making large-scale labeled dataset construction logistically costly and time-consuming. For example, collecting over 1000 annotated lightning video clips requires years of continuous observation across multiple sites, leaving most studies with limited samples that exacerbate overfitting in deep learning models [6].

Remote sensing and meteorological communities have long sought efficient methods to mitigate such sample scarcity, and recent advances in few-shot learning (FSL) have shown promise in remote sensing scene classification—Wang et al. evaluated a meta-transfer approach for few-shot remote sensing scene classification, demonstrating that transferable meta-learners can mitigate data scarcity issues even with extremely limited samples (e.g., 1–5 labeled samples per class) [7]. However, direct adaptation of these FSL methods to lightning optical data remains underexplored: most existing FSL frameworks for remote sensing focus on static scenes (e.g., forests, deserts) and fail to account for lightning’s unique “millisecond-scale instantaneity”—a critical constraint for capturing the transient brightness changes that define true discharges. This gap means even state-of-the-art FSL models struggle to generalize to lightning’s dynamic morphology, let alone meet the real-time inference demands of field-deployed monitoring systems.

Existing solutions have struggled to address these challenges comprehensively. Traditional optical classification methods rely on handcrafted features and shallow classifiers: edge detection (e.g., Canny operator) identifies lightning channels but confuses jagged cloud edges with branched lightning [4]; optical flow tracks motion but is disrupted by fast-moving cloud masses, leading to false positives in stormy weather [5]; and frame difference methods (e.g., Otsu-thresholded frame subtraction) reduce static background noise but fail to capture the discriminative temporal patterns of dim or short-duration lightning [2]. These methods lack adaptability to complex environments (e.g., fog, heavy rain) and cannot scale to diverse lightning morphologies.

Deep learning has advanced optical recognition by automating feature extraction, but its application to lightning classification is constrained by the few-shot dilemma. Convolutional Neural Networks (CNNs) like MobileNetV2 [8] achieve high accuracy on large datasets but overfit severely when labeled samples are scarce—for instance, a CNN trained on 50 branched lightning samples may misclassify spherical lightning as non-lightning due to limited exposure to diverse morphologies [6]. Few-shot learning (FSL) frameworks, designed to generalize from limited examples, have shown promise in remote sensing but remain underadapted to lightning optical data: Prototypical Networks [9] learn class centroids but assume uniform feature distribution, which fails for lightning’s scattered morphological features; adaptive metric learning [10] improves inter-class separation but ignores lightning’s transient temporal cues; and self-supervised pre-training [3] (e.g., Masked Autoencoders on waveforms) focuses on 1D electromagnetic signals, not 2D optical frame dynamics critical for classification [4].

To bridge these gaps, this study proposes a Frame Difference Triplet Network (FD-TripletNet), a novel deep learning framework tailored for few-shot optical lightning classification. The core innovations address the three key challenges: (1) frame difference matrices—computed as the absolute pixel-wise difference between consecutive frames—explicitly capture lightning’s “single-frame abrupt brightness change,” suppressing noise from static backgrounds or slow-moving clouds [2]; (2) Triplet Loss with dynamic hard example mining enhances discriminative feature learning by compacting intra-class features (e.g., different branched lightning events) and separating inter-class features (e.g., lightning vs. strong light glare), even with limited labeled samples [3]; (3) non-consecutive frame selection balances efficiency and robustness by retaining only the most informative frames (optimal K = 4), avoiding redundancy while ensuring coverage of transient discharge processes [5]. Additionally, the framework adopts a lightweight MobileNetV2 backbone [11], enabling real-time inference on edge devices (e.g., remote weather stations) to fill the ground-based monitoring gap in sparsely instrumented areas [1].

The remainder of this manuscript is structured as follows: Section 2 reviews related work on lightning detection, few-shot learning, and efficient network architectures; Section 3 details the design of FD-TripletNet, including data preprocessing, network architecture, loss function, and training strategies; Section 4 presents experimental results and comparative analyses with baseline methods; Section 5 discusses the model’s strengths, limitations, and practical implications; and Section 6 concludes with future research directions.

2. Related Work

2.1. Lightning Detection and Classification

Lightning detection and classification have evolved from manual feature engineering to deep learning-driven solutions, with a focus on high-speed data processing, multi-source validation, and environmental robustness. Studies in this domain span optical video, satellite imagery, and single-site signals, laying the foundation for automated meteorological monitoring.

2.1.1. Optical Video-Based Lightning Segmentation and Classification

High-speed lightning footage (>1000 fps) captures microsecond-scale discharge processes but requires automated analysis to replace labor-intensive manual labeling. Cross et al. [12] evaluated five semantic segmentation networks (DeepLab3+, SegNet, FCN8s, U-Net, AlexNet) on 48,381 labeled frames of South African lightning footage, designing a pipeline of “time-series denoising → frame-level segmentation → sequence classification.” DeepLab3+ achieved the highest performance due to its atrous spatial pyramid pooling, which preserves edge details of lightning channels. However, this method relies on large-scale labeled data (48 k frames) and fails to handle dim lightning or lens scattering in rain—limitations attributed to its static frame difference strategy and lack of adaptive noise reduction.

For label-scarce scenarios, Lu et al. [13] proposed a self-supervised framework based on Masked Autoencoders (MAEs). They pre-trained the model on 100 k unlabeled lightning waveforms from the Beijing Broadband Lightning Network (BLNET) and fine-tuned it with only 3 k labeled samples (covering PCG, NCG, NBE, PB, IC lightning types). This work demonstrated the potential of self-supervised learning in reducing label dependence, but its focus on 1D waveforms limits direct application to 2D optical video—where lightning’s spatial morphology (e.g., branched vs. spherical) is critical for classification.

Qian et al. [14] further advanced deep learning for optical lightning identification, using a modified ResNet50 to extract spatial features from raw frames and a temporal attention module to model frame sequences. Their model achieved 89.5% accuracy on a dataset of 5 k video clips, but it still requires thousands of labeled samples to avoid overfitting—unsuitable for regions with sparse observation networks.

2.1.2. Satellite and Multi-Sensor Lightning Validation

Spaceborne platforms expand lightning detection coverage, but require precise geolocation and cross-validation with ground sensors. Schultz et al. [15] developed an automated method for the International Space Station (ISS) METEOR camera (260 m pixel resolution), using RGB color distinction (lightning exhibits blue hues, city lights red/orange) and bilateral filtering to separate lightning from background noise. However, the method relies on manual radial mask adjustment for geolocation and fails to integrate real-time atmospheric data (e.g., cloud cover) to reduce false positives in fog.

Single-site lightning location addresses remote areas with sparse sensor networks. The study optimized single-site positioning algorithms (likely via VLF/LF signal time-delay inversion), reducing reliance on dense multi-site deployments. While specific results were not fully disclosed, it highlights the need for low-cost, single-station solutions—though it lacks integration with optical data for multi-modal verification [16].

2.1.3. Remote Sensing-Assisted Meteorological Context

Remote sensing data provides contextual support for lightning analysis (e.g., cloud top temperature, precipitation intensity). The study explored meteorological parameter inversion from multi-spectral satellite images, which can be paired with optical lightning signals to distinguish cloud-to-ground (CG) and intra-cloud (IC) lightning (e.g., linking deep convection to CG lightning). However, it does not explicitly address lightning detection, leaving a gap in multi-source data fusion for end-to-end classification [17].

2.2. Few-Shot Learning for Lightning and Related Visual Tasks

Few-shot learning (FSL) addresses labeled lightning scarcity by enabling generalization from limited examples. Existing FSL methods are grouped into prototype learning, adaptive metric learning, and cross-modal/self-supervised FSL, with limited adaptation to meteorological scenarios.

2.2.1. Prototype-Based Few-Shot Learning

Snell et al. [9] proposed Prototypical Networks, a foundational FSL framework that learns class “prototypes” (cluster centers in embedding space) and classifies queries by distance to prototypes. They achieved 99.7% five-shot accuracy on Omniglot, demonstrating embedding-based generalization. However, this method assumes uniform class distribution, which fails for lightning, for which morphologies (branched, spherical) vary drastically, leading to scattered prototypes and poor generalization.

2.2.2. Adaptive Metric and Loss for Heterogeneous Data

To enhance discriminability for similar classes, adaptive metric learning adjusts distance metrics based on sample characteristics. Shi et al. [10] proposed an adaptive deep metric learning framework that dynamically selects Euclidean or cosine distance based on intra-class variance, improving separation between “dim lightning” and “glare.” Li et al. [18] further developed an Adaptive Margin Loss that tunes class margins (e.g., larger margins between weak lightning and bright non-lightning), boosting the five-shot F1-score by 4.2% on UCF-101. While effective for general tasks, these methods ignore lightning’s transient brightness changes, leading to suboptimal performance on dim events.

2.2.3. Cross-Model and Self-Supervised Augmentation

Cross-modal fusion enriches FSL representations by integrating complementary data. Wang et al. [19] introduced CLIP-guided prototype modulation for few-shot action recognition, using text semantics to refine visual prototype. This thought is transferable to lightning (e.g., pairing optical frames with meteorological text like “thunderstorm with branched lightning”) but has not been applied to meteorological tasks.

Self-supervised learning (SSL) leverages unlabeled data for FSL. Lu et al. [13] combined MAE-based SSL with FSL, pre-training on 100 k unlabeled lightning waveforms and fine-tuning with 3 k labels—achieving 98.30% accuracy on BLNET data. This bridges SSL and FSL for lightning but focuses on 1D waveforms, not 2D optical frames, and fails to handle open-world unknown lightning types. Zeng et al. [3] applied SSL with Triplet Loss to medical image denoising, showing that pre-training on unlabeled data improves few-shot generalization by 9.5%. Their work highlights the potential of Triplet Loss for FSL, but it targets static medical images and does not adapt to lightning’s temporal dynamics.

2.2.4. Open-World Few-Shot Learning Challenges

Open-world FSL challenges (unknown classes, distribution shift) are critical for lightning—where rare events (e.g., ball lightning) are unlabeled. It has been highlighted that most FSL methods assume closed-set scenarios, failing to generalize to real-world meteorological data. This gap underscores the need for FSL frameworks tailored to open-world lightning detection [20].

2.3. Efficient Network Architectures and Discriminative Loss Functions

Efficient architectures and loss functions enable lightning classification on edge devices (e.g., remote weather stations) and high-speed video. Research in this domain focuses on lightweight design, global feature capture, and discriminative loss optimization.

2.3.1. From Deep to Lightweight: Architecture Evolution

He et al. [21] proposed ResNet with residual connections to solve depth degradation, enabling 152-layer networks with lower complexity than VGG19. ResNet became a backbone for meteorological tasks (e.g., lightning feature extraction) but its parameter count limits edge deployment.

Sandler et al. [11] introduced MobileNetV2 with inverted residuals and linear bottlenecks, reducing parameters to 3.5M while maintaining accuracy. Duklan et al. [22] compared five CNNs (ResNet50V2, ResNet152V2, InceptionV3, Xception, MobileNetV2) on the Uttaranchal University dataset, finding Xception had the highest accuracy (99.82%) while MobileNetV2 (93.71%) suited edge devices for low computational cost. Modzianowski [8] validated MobileNetV2 for six-class weather classification, confirming its suitability for resource-constrained meteorological devices.

2.3.2. Transformer for Global Feature Modeling

Traditional CNNs struggle with global feature capture (e.g., large-scale lightning channels). Vision Transformer (ViT) was explored for weather scene identification, using self-attention to model long-range dependencies between cloud and lightning features [23]. While ViT showed potential for global integration, it was not applied to lightning detection—missing opportunities to model the spatial extent of discharges.

2.3.3. Discriminative Loss for Feature Separation

Schroff et al. [24] proposed Triplet Loss in FaceNet, minimizing intra-class distance and maximizing inter-class distance—achieving 99.63% accuracy on LFW. This metric learning loss was adapted for lightning but does not account for lightning’s dynamic brightness changes, leading to poor performance on dim events.

3. Model Design and Research

This study designs a deep learning model that integrates multimodal information and triplet learning mechanisms for lightning detection tasks. The model is deeply optimized regarding aspects from data processing, network architecture, and loss function to training strategies, and the following details the design ideas and key technical implementations of the model.

3.1. Dataset Construction and Preprocessing

To achieve accurate lightning detection, this study constructs a video dataset with two types of labels: “with lightning“ and “without lightning.” In the data preprocessing stage, a unique multi-frame processing strategy is designed for the temporal characteristics of video data.

Optical lightning discharges are characterized by abrupt and transient brightness variations embedded within complex and dynamic backgrounds. To emphasize discriminative temporal changes while suppressing static scene content, temporal difference is adopted as the core preprocessing and representation strategy.

Given an input optical video sequence

I = {I_{1}, I_{2}, \dots, I_{T}},

(1)

each frame

I_{t}

is first converted to grayscale and normalized to the range [0, 1] to reduce sensor-dependent variability and stabilize network training. These preprocessing steps are intentionally lightweight and do not introduce explicit denoising operations, allowing the subsequent temporal representation to remain physically interpretable. Noise suppression is primarily achieved through temporal differencing and frame selection rather than explicit spatial denoising, ensuring that physically meaningful brightness variations are preserved.

To highlight lightning-included brightness changes, the frame difference representation is defined as the absolute pixel-wise intensity difference between frames separated by a temporal interval

Δ t

:

D_{t, Δ t} = | I_{t} - I_{t - Δ t} | .

(2)

Adjacent-frame differencing (

Δ t = 1

) effectively captures abrupt intensity changes associated with return strokes. However, formulation above implicitly assumes that discriminative temporal information is concentrated within a single-frame interval and may fail to adequately represent gradually evolving discharge processes, such as leader development or decay phases, which often exhibit weaker but temporally extended optical emissions.

To accommodate the heterogeneous temporal dynamics of lightning discharges, the frame difference operation is extended to a multi-scale temporal formulation by computing difference maps over multiple temporal intervals:

Δ t \in {1, 2, 3} .

(3)

This multi-scale representation preserves sensitivity to instantaneous brightness spikes while improving temporal coverage for weaker and gradually evolving discharge processes.It should be noted that the selected temporal intervals (

Δ t = 1, 2, 3

) do not impose strict or exclusive assumptions on lightning discharge phases. Instead, they provide complementary temporal coverage for heterogeneous discharge dynamics, allowing the model to flexibly capture both abrupt and gradual brightness variations without explicitly segmenting physical stages.

3.2. Network Architecture Design

The proposed separated-path model innovatively processes and fuses frame difference information and original frame information independently. As shown in Figure 1, its architecture includes four core modules: frame difference feature-extraction module, original frame feature-extraction module, temporal aggregation module, and classification decision module.

3.3. Frame Difference Feature-Extraction Module

To effectively extract features of the frame difference matrix, a 1 × 1 convolutional layer is first introduced to convert the single-channel frame difference matrix into a three-channel one. This operation can not only increase the number of data channels but also extract local features through convolutional operations. Subsequently, MobileNetV2 is connected as the backbone network, whose core structure of depthwise separable convolution decomposes traditional convolution operations into depthwise convolution and pointwise convolution. Depthwise convolution independently performs spatial feature extraction for each input channel, and pointwise convolution (1 × 1 convolution) realizes information fusion between channels. Taking a convolutional layer with input feature map size and output channel number as an example, the computational complexity of traditional convolution is as follows:

{Complexity}_{traditional} = H \times W \times C_{in} \times C_{out} \times k \times k,

(4)

where k denotes the convolution kernel size. The computational complexity of depthwise separable convolution is as follows:

{Complexity}_{depthwise} = H \times W \times C_{in} \times (k \times k + C_{out}) .

(5)

This significantly reduces the computational complexity.

In addition, MobileNetV2 adopts a linear bottleneck structure, consisting of three parts: 1 × 1 convolution for dimension expansion, depthwise separable convolution for feature extraction, and 1 × 1 convolution for dimension reduction. A linear activation function is used during dimension reduction to avoid information loss in low-dimensional spaces. In this model, MobileNetV2 is initialized with ImageNet pre-trained weights and fine-tuned for lightning detection tasks. The extracted feature maps are compressed into fixed-length vectors by an adaptive average pooling layer.

Table 1 shows the parameter settings of each module in the model for this experiment, and Figure 2 shows the model structure.

3.4. Loss Function Design

To enhance the model’s discriminative ability and classification accuracy, a joint loss function integrating Triplet Loss and Binary Cross-Entropy Loss is designed to supervise model training from different perspectives.

3.4.1. Adaptive Triplet Loss

In the sample construction stage, triplet samples are generated through the custom TripletLightningDataset class. For each anchor sample (a), the positive sample (p) is selected from samples of the same class with the closest frame difference statistics, and the negative sample (n) is selected from samples of different classes with the most significant frame difference statistics to ensure that the sample group has high discriminability.

Regarding distance metrics, traditional Triplet Loss mostly uses a single distance calculation. The adaptive Triplet Loss proposed in this study introduces a learnable weight parameter

α

, which is constrained to the interval [0, 1] through the Sigmoid function to dynamically fuse Euclidean distance and Cosine distance. The specific calculation process is as follows:

1.: Euclidean Distance Calculation: Measures the absolute distance in the feature vector space.

$d_{euclidean} (x, y) = \sqrt{\sum_{i = 1}^{n} {(x_{i} - y_{i})}^{2}}$

(6)

Calculate the Euclidean distances $d_{euclidean} (a, p)$ and $d_{euclidean} (a, n)$ between the anchor sample and the positive/negative samples, respectively.
2.: Cosine Distance Calculation: Measures the direction difference of feature vectors.

$d_{cosine} (x, y) = 1 - \frac{x \cdot y}{∥ x ∥ \cdot ∥ y ∥}$

(7)

Calculate $d_{cosine} (a, p)$ and $d_{cosine} (a, n)$ .
3.: Distance Fusion: The fusion of Euclidean Distance and Cosine Distance.

$d_{pos} = σ (x) \cdot d_{euclidean} (a, p) + (1 - σ (x)) \cdot d_{cosine} (a, p)$

(8)

$d_{neg} = σ (x) \cdot d_{euclidean} (a, n) + (1 - σ (x)) \cdot d_{cosine} (a, n)$

(9)

This allows the model to automatically learn the importance of the two distances.

The final loss function is

L_{triplet} = max (0, d_{pos} - d_{neg} + m),

(10)

where the margin value m = 0.8. By minimizing this loss, the model learns more discriminative feature representations.

3.4.2. Binary Cross-Entropy Loss

For the classification task, the Binary Cross-Entropy Loss function (BCEWithLogitsLoss) is adopted:

L_{cls} = - [y \cdot log (σ (\hat{y})) + (1 - y) \cdot log (1 - σ (\hat{y}))],

(11)

where y is the true label,

\hat{y}

is the original output of the model, and

σ

is the Sigmoid function, which is used to measure the difference between the predicted probability and the true label.

3.4.3. Joint Loss Function

The final joint loss function is

L = λ_{triplet} \cdot L_{triplet} + λ_{cls} \cdot L_{cls},

(12)

In this study, both

λ_{triplet}

and

λ_{cls}

are set to 1.0, and the weights are adjusted to balance the impact of the two losses on training.

3.5. Temporal Frame Selection and Sequence Construction

Lightning events are temporally sparse and typically occupy only a limited subset of frames within a continuous video stream. Therefore, temporal selection is preformed at two complementary levels: frame-level screening and sequence-level construction.

3.5.1. Frame-Level Screening Based on Intensity Variation

In practical implementation, a statistical screening strategy based on the average pixel intensity variation

Δ I

of the frame-difference matrix is adopted to identify informative frames. For each multi-scale frame-difference map,

Δ I

is computed as the mean pixel value of the corresponding matrix.

According to the magnitude of

Δ I

, frame-difference maps are categorized into three intensity intervals:

Low-intensity (

20 \leq Δ I < 60

), corresponding to weak brightness changes typically observed during leader or decay phases;

Medium-intensity (

60 \leq Δ I < 80

), representing transitional discharge states;

High-intensity (

Δ I \geq 80

), associated with abrupt brightness spikes of return stroke phases.

Within each intensity interval, frame-difference maps are sorted in descending order of

Δ I

while preserving their original temporal order to maintain the integrity of discharge dynamics. For a target sequence length K, frames are sampled a proportional allocation strategy based on a 2:1:1 intensity template (high:medium:low). The exact number of frames selected from each interval is scaled with K, prioritizing high-intensity responses while maintaining coverage of medium-intensity and low-intensity discharge phases. This design emphasizes phase representation rather than enforcing fixed frame counts and therefore generalizes naturally across different sequence lengths.

For rare lightning events, such as dim intra-cloud flashes that lack high-intensity responses, the required frames are supplemented by selected the highest-

Δ I

samples from adjacent intensity intervals to ensure sufficient temporal coverage.

To further suppress background interference, an isolate-frame exclusion mechanism based on temporal continuity is introduced. Low-intensity frame-difference maps are retained only if temporally adjacent frames within a

\pm 3

frames window exhibit consistent intensity responses. Slow and continuous wind-include interference, such as swaying vegetation, typically generates consecutive low-intensity frame differences with stable spatial distributions, whereas lightning discharge produce transient and spatially dispersed brightness changes. By exploiting this difference in temporal continuity characteristic, the proposed mechanism provides a principled way to distinguish slow background motion from lightning-included variations.

The selected frames preserve their relative temporal order to ensure logical continuity of the discharge process.

3.5.2. Sequence-Level Temporal Length Determination

After frame-level screening, the selected frame-difference maps are assembled into temporal sequences of fixed length K, which serve as the input to the network. The choice of sequence length directly affects the balance between capturing complete discharge dynamics and avoiding redundant background information.

From a physical perspective, optical lightning discharges typically manifest over several consecutive frame in high-speed video. Based on this observation, candidate sequence length within the range K = 3 to 7 are evaluated. Shorter sequences may fail to capture complete discharge signatures, while longer sequences tend to introduce redundant background fluctuations.

The optical sequence length is determined empirically through comparative experiments, ensuring a balance between physical interpretability and data-driven optimization.

For short video segments containing fewer than K valid frames, or in degenerate cases where only a single frame is available, a default padding mechanism is applied. All-zero matrices are used to fill missing positions, ensuring consistency of input dimensions without introducing artificial brightness patterns.

Figure 3 shows the experiment process of sequence length optimization.

3.6. Training Strategies

To achieve efficient model training and stable convergence, a combination of mixed-precision training, gradient accumulation, and optimizer and learning rate adjustment strategies is used.

3.6.1. Mixed-Precision Training

The autocast and GradScaler of the torch.cuda.amp library are used to implement mixed-precision training. Autocast automatically switches part of the computation to half-precision to reduce memory usage and accelerate computation. GradScaler dynamically adjusts the gradient scaling factor to avoid numerical instability caused by half-precision computation, ensuring training stability and accuracy.

3.6.2. Gradient Accumulation

To solve the problem of small batches under hardware resource constraints, gradient accumulation technology is adopted. Gradients of multiple small batches are accumulated to achieve the effect of equivalent large-batch training. In this study, the gradient accumulation steps are set to 2; that is, model parameters are updated once after calculating the gradients of two small batches, thereby improving training stability and convergence speed without increasing memory consumption.

3.6.3. Optimizer and Learning Rate Adjustment

The Adam optimizer is selected to update model parameters, which combines the advantages of Adagrad and RMSProp and dynamically adjusts the learning rate according to the gradient history, with an initial learning rate set to 0.0001. A learning rate decay strategy is adopted during training: when the validation set accuracy no longer improves for several consecutive epochs, the learning rate is reduced to 0.1 times the original. Meanwhile, an early stopping mechanism is introduced: if the validation set accuracy does not improve for 10 consecutive epochs, training is stopped to avoid overfitting and balance the model’s generalization ability and fitting ability.

4. Experiments and Results

4.1. Dataset

The data used in this study are all from two optical observation stations built by the research team, located in Nanchang, Jiangxi (28.7°N, 115.5°E), and Zhongshan, Guangdong (22.6°N, 113.4°E), respectively. The video acquisition system deployed at both stations is standardized to ensure data consistency: high-speed industrial cameras were used, operating at a frame rate of 1000 fps (frames per second) to capture the microsecond-scale dynamics of lightning discharges. To ensure stability in outdoor meteorological environments, each camera was installed 2 m above the ground on a weatherproof platform, equipped with a dustproof and waterproof housing to prevent lens fogging in high-humidity condition. The raw video data is stored in uncompressed raw format to preserve maximum brightness details, and the data is transferred to a local storage server via Ethernet. This study collects 459 optical video data from May 2024 to October 2024. The video files collected by the camera have a resolution of 1280 × 1024. The dataset includes lightning video samples under different lighting conditions and different cloud occlusion degrees, and the non-lightning dataset includes video data of strong light irradiation, moving objects, and other events easily mistaken for lightning. Using OpenCV 4.8.0, we processed raw uncompressed videos: we loaded files via cv2.VideoCapture(), validated frame count/1000 fps frame rate, retained 5–10-s valid segments. We then computed pixel-wise absolute difference matrices for consecutive frames to highlight lightning’s transient brightness changes, finally archiving data as compressed .npz files.

The dataset structure is visualized in Figure 4. The dataset is divided into Lightning and Non-lightning categories: for lightning samples, we further distinguish strong lightning (sharp edges) and weak lightning (blurred edges); the Non-lightning category includes no lightning (background only) and living (interference like moving objects). To address the issue that lightning channels are not easily distinguishable in original grayscale images, we use a lightning channel-enhanced visualization to highlight the discharge paths, which is only for better visual presentation (no temporal compositing methods involved in the model).

In this dataset, 319 video files are used for training, and 140 video files are used for testing. The divided dataset is shown in the Table 2.

4.2. Evaluation Metrics

This study uses accuracy, precision, recall, F1-score, False Positive Rate (FPR), False Negative Rate (FNR), and confusion matrix to measure the experimental results of the lightning recognition method. The specific calculation formulas are as follows:

1.: Accuracy: Measures the overall correctness of predictions.

$Accuracy = \frac{TN + TP}{TP + TN + FP + FN}$

(13)
2.: Precision: Reflects the proportion of correctly predicted lightning events among all predicted lightning events.

$Precision = \frac{TP}{TP + FP}$

(14)
3.: Recall: Indicates the proportion of actual lightning events that are correctly identified.

$Recall = \frac{TP}{TP + FN}$

(15)
4.: F1-Score: A balanced measure that combines precision and recall.

$F 1 - Score = 2 \times \frac{Precision \times Recall}{Precision + Recall}$

(16)
5.: FPR: Quantifies the proportion of non-lightning events incorrectly classified as lightning.

$FPR = \frac{FP}{FP + TN}$

(17)
6.: FNR: Quantifies the proportion of actual lightning events incorrectly classified as non-lightning.

$FNR = \frac{FN}{TP + FN}$

(18)
7.: Confusion Matrix: A tabular representation of prediction outcomes.

$Confusion Matrix = [\begin{matrix} TN & FP \\ FN & TP \end{matrix}]$

(19)

From these formulas, the following variables are defined:

TP (True Positives): Number of lightning events correctly classified as lightning.
TN (True Negatives): Number of non-lightning events correctly classified as non-lightning.
FP (False Positives): Number of non-lightning events incorrectly predicted as lightning.
FN (False Negatives): Number of lightning events incorrectly predicted as non-lightning.

These metrics collectively provide a comprehensive assessment of the model’s ability to distinguish between lightning and non-lightning events, particularly critical for minimizing false alarms in real-world applications.

4.3. Experimental Settings

The experiments in this study are implemented in Python 3.12 on a Windows 10 system with an NVIDIA GeForce RTX 4060 GPU. The training configuration includes an initial learning rate of 0.001, a batch size of 32, and 50 training epochs with cosine decay scheduling. The joint loss function combines Triplet Loss and cross-entropy loss to optimize the lightning recognition model, effectively addressing the challenges of few-shot learning and instantaneous feature extraction in lightning optical classification.

4.4. Baseline Methods

The proposed method is compared with the following baselines:

1.: Traditional Frame Difference with SVM [5]: Uses Otsu threshold segmentation to extract frame difference features, followed by classification with SVM.
2.: MobileNetV2 [8]: Directly inputs original frames into MobileNetV2 without frame difference or Triplet Loss.
3.: TripletNet (Static Input) [24]: Uses Triplet Loss but input static single frames instead of frame difference matrices.
4.: Single-Scale FD-TripletNet ( $Δ t = 1$ ): Uses only adjacent-frame differences.
5.: Multi-Scale FD-TripletNet ( $Δ t = 1, 2, 3$ ): Proposes variant incorporating multi-scale temporal differencing.

In addition, we discuss the work of Qian et al. [14], which achieves competitive lightning identification performance on large-scale datasets using deep learning. However, their approach relies on extensive labeled data (over 10,000 samples), whereas FD-TripletNet targets few-shot scenarios (fewer than 300 labeled samples per class), which better reflects realistic constraints for rare lightning events. Therefore, direct performance comparison under identical data regimes is non-trivial.

4.5. Overall Performance Comparison

We first compare the overall classification performance of all methods. Table 3 summarizes the results on the test set in terms of accuracy, precision, recall, F1-score, false positive rate (FPR), and false negative rate (FNR).

Traditional methods perform poorly due to their reliance on manual features, which are easily disturbed by cloud movement and noise; MobileNetV2 shows improvement but lacks robustness in few-shot scenarios with lower recall for weak lightning; TripletNet (Static Input) benefits from Triplet Loss yet fails to capture lightning’s temporal dynamics, resulting in lower recall than both FD-TripletNet variants; the Single-Scale FD-TripletNet (

Δ t = 1

) excels at capturing instantaneous return stroke signals but misses gradual discharge processes, limiting its performance on weak or complex lightning; while the Multi-Scale FD-TripletNet (

Δ t = 1, 2, 3

) achieves comprehensive improvement—with accuracy increasing by 2.5%, recall rising by 4.4%, and FNR dropping by 2.7% compared to the single-scale variant—thanks to its adaptive multi-scale fusion design:

Δ t = 1

captures instantaneous return stroke signals,

Δ t = 2, 3

retains gradual leader phase dynamics, and the model automatically weights discriminative features across scales to avoid redundancy, additionally demonstrating stronger robustness to weak lightning and low-SNR scenarios and addressing the single-scale variant’s blind spot in gradual discharge processes.

It is worth emphasizing that the proposed multi-scale fusion does not rely on an explicit attention or weighting mechanism. Instead, the relative contribution of different temporal scales is learned implicitly through end-to-end training within the metric learning framework. This design avoids introducing additional model complexity while allowing the network to adaptively emphasize discriminative temporal information.

4.6. Ablation Study

4.6.1. Impact of Temporal Scale Selection

To investigate the effect of temporal scale selection, we compare the Single-Scale FD-TripletNet (

Δ t = 1

) with the Multi-Scale FD-TripletNet (

Δ t = 1, 2, 3

).

The results show that incorporating multiple temporal intervals significantly improves recall and reduces FNR. This improvement indicates that larger temporal gaps (

Δ t = 2, 3

) effectively capture cumulative brightness variations associated with gradual leader phases, complementing the instantaneous return stroke signals captured by

Δ t = 1

.

4.6.2. Impact of Frame Selection Strategies

Figure 5 illustrates the performance difference between frame selection strategies for both single-scale and multi-scale settings.

For the Single-Scale FD-TripletNet (

Δ t = 1

), the classification accuracy of the non-consecutive frame strategy reaches its maximum of 92.3% at K = 4, and remains stable at approximately 91.3% for

K \geq 5

, indicating strong resistance to temporal redundancy. Precision further improves with increasing K, peaking at 91.2% at K = 6, while recall remains consistently around 90.5% across all tested sequence lengths—an important property for minimizing missed lightning detections. The corresponding F1-score, which jointly reflects precision and recall, also demonstrates the clear advantage of the non-consecutive frame selection strategy over consecutive sampling.

Error rate analysis shows that non-consecutive frame selection maintains a stable false positive rate below 0.08 and a false negative rate no higher than 0.09, substantially outperforming the larger fluctuations observed under consecutive frame selection. This behavior is consistent with the physical characteristics of lightning discharges, which typically span 3–5 frames. By avoiding redundant adjacent frames, non-consecutive sampling preserves salient abrupt optical changes while effectively suppressing background noise.

For the Multi-Scale FD-TripletNet (

Δ t = 1, 2, 3

), accuracy reaches a peak of 94.8% at K = 5 and only exhibits a mild decline to 93.7% at K = 7, reflecting enhanced robustness to temporal redundancy and improved adaptability to multi-scale discharge dynamics. Precision achieves its highest value of 93.5% at K = 6, while recall remains consistently high at approximately 94.1% across all K values, which is particularly critical for safety-sensitive lightning detection tasks. The resulting F1-score further confirms the superiority of the non-consecutive strategy under multi-scale settings.

From an error-rate perspective, non-consecutive frame selection stabilizes the false positive rate below 0.08 and reduces the false negative rate to ≤0.04, outperforming consecutive sampling. This outcome aligns well with the physical structure of lightning discharges: abrupt return strokes are effectively captured at

Δ t = 1

, while more gradual leader-phase dynamics are retained at

Δ t = 2, 3

. When combined with non-consecutive frame selection, the multi-scale differencing strategy aggregates discriminative temporal features across scales while filtering redundant background responses and isolated interference, thereby maximizing the synergy between multi-scale temporal representation and efficient frame sampling.

This superiority stems from the multi-scale strategy’s ability to distinguish “lightning dynamics” from “background redundancy”:

Δ t = 1

captures core return stroke signals,

Δ t = 2, 3

supplements leader phase features, and non-consecutive selection filters redundant background frames without losing critical temporal information.

4.6.3. Confusion Matrix Analysis

Figure 6 and Figure 7 present the confusion matrices of the Single-Scale and Multi-Scale FD-TripletNet, respectively.

For the Single-Scale FD-TripletNet (

Δ t = 1

), the model correctly classifies 81 lightning events and 49 non-lightning events, with 5 false negatives and 5 false positives. Among the false negatives, three cases correspond to weak lightning dominated by prolonged leader phases, which are insufficiently captured by adjacent-frame differencing alone.

For the Multi-Scale FD-TripletNet (

Δ t = 1, 2, 3

), the number of false negatives is reduced from 5 to 2, and true positive detections increase accordingly. Notably, only two false negative corresponds to a weak lightning event, indicating that the inclusion of larger temporal intervals effectively captures gradual discharge dynamics.

Overall, the multi-scale variant demonstrates improved discriminative ability, particularly in reducing missed detections of weak and complex lightning events.

5. Discussion

The experimental results on the optical lightning dataset demonstrate the effectiveness of the proposed Frame Difference Triplet Network (FD-TripletNet) and provide insight into its design rationale, practical applicability, and remaining limitations. Rather than attributing the performance gains to network complexity, the advantages of FD-TripletNet stem from several design choices that align with the physical and observational characteristics of lightning discharges.

5.1. Effectiveness of Design Choices for Lightning Classification

Lightning discharges exhibit extreme temporal instantaneity, often manifesting as abrupt brightness changes within a single or very few frames. By adopting adjacent frame difference matrices as input, FD-TripletNet directly targets this “frame abruptness” characteristics. This design choice is consistent with previous findings that frame difference representations are particularly effective for detecting abrupt temporal changes in dynamic scenes. In the context of lightning, the proposed adaptation further aligns with millisecond-scale discharge dynamics, enabling reliable capture of rapid optical transitions.

In addition, the use of Triplet Loss with dynamic hard example mining addresses the challenge of scarce labeled samples. By optimizing relative distances in the embedding space, the network learns compact intra-class representations while enforcing clear separation between lightning and non-lightning samples. Compared with a TripletNet using static frame input, this strategy improves features generalization and contributes to a notable increase in classification performance, particularly in terms of F1-score.

The non-consecutive frame selection strategy further enhances robustness and efficiency. By reducing temporal redundancy, it stabilizes classification performance and avoids the erratic behavior that can arise when consecutive frames are dominated by background fluctuations. At the same time, this strategy significantly reduces computational cost, which is essential for deployment on resource-constrained edge devices.

Nevertheless, the effectiveness of multi-scale temporal differencing is influenced by dataset characteristics and noise conditions, and should be interpreted as a conditional improvement rather than a universally optimal choice for all optical lightning monitoring scenarios.

5.2. Practical Value for Real-World Monitoring

From an application perspective, FD-TripletNet’s lightweight architecture enables real-time inference on low-power edge platforms such as remote weather observation stations. This capability is particularly valuable in regions where ground-based lightning monitoring infrastructure remain sparse. The achieved low false positive rate reduces unnecessary alarms and operational costs, while the low false negative rate ensures that critical lightning events are unlikely to be missed, supporting applications such as wildfire prevention, power grid protection, and severe weather monitoring.

Rather than replacing space-based lightning detection systems, the proposed framework complements them by providing high-resolution, ground-level validation and localized monitoring, thereby enhancing overall situation awareness.

5.3. Limitations Related to Signal Strength and Environmental Conditions

Despite its effectiveness, FD-TripletNet inherits inherent limitations associated with single-model optical sensing. Extremely dim lightning events produce weak frame difference signals that can be submerged by sensor noise, accounting for the majority of false negative cases observed in the experiments. This limitation reflects a fundamental constraint of optical-only approaches rather than deficiencies in the network design.

Performance degradation is also observed under adverse weather conditions such as heavy rain or fog. Atmospheric scattering and attenuation distort brightness patterns and reduce contrast, directly affecting the reliability of frame difference representations. While adaptive preprocessing may alleviate this issue to some extent, it cannot be fully resolved within a purely optical framework.

5.4. Background Interference and Motion-Induced False Positives

The model demonstrates strong robustness against slow wind-included interference, such as swaying tree branches or shaking weeds. These background motions typically generate low-intensity, temporally continuous frame differences with stable spatial distributions, which can be effectively suppressed through intensity-based screening and non-consecutive frame selection.

However, fast-moving small-scale objects, including birds or high-speed insects, remain a challenging source of false positives. Such objects produce localized, transient, and high-intensity brightness changes that partially overlap with lightning features. Their discrete motion trajectories do not satisfy temporal continuity assumptions, making them difficult to filter using frame-level screening alone. This limitation highlights the intrinsic ambiguity of short-duration optical signals in complex outdoor environments.

5.5. Adaptability to Different Optical Configurations

Another practical limitation concerns adaptability to different focal length lenses, which is crucial for large-scale deployment across heterogeneous monitoring stations. In this study, experimental validation of focal length adaptability was not feasible due to objective constraints. Lightning events are highly seasonal and concentrated within a limited collection window, making coordinated sampling across multiple lens types impractical. Moreover, capturing synchronized recordings of the same lightning event with different focal lengths at the same location and time is extremely challenging under natural conditions.

5.6. Future Directions

Future work will focus on integrating multi-modal data sources, such as atmospheric electric field measurements and radar observations, to improve detection of extremely dim lightning. Weather-aware adaptive preprocessing strategies will be explored to mitigate scattering effects under adverse conditions. In addition, multi-scale feature fusion and attention-based spatial-temporal modeling will be investigate to further suppress motion-induced false positives.

To address focal length adaptability, we plan to collaborate with regional meteorological observatories to establish a multi-focal-length lightning monitoring network. By collecting synchronized optical recordings of the same lightning events across different focal lengths, the robustness and generalization of FD-TripletNet under varying optical configurations can be systematically evaluated.

Overall, FD-TripletNet demonstrates that aligning representation learning with lightning’s physical characteristics enables a practical and deployable solution for optical lightning classification.

6. Conclusions

This study proposes a Frame Difference Triplet Network (FD-TripletNet) to address the challenges of scarce labeled samples, extreme temporal instantaneity, and diverse discharge morphologies in optical lightning classification. By leveraging adjacent frame difference representations, metric learning with Triplet Loss, and non-consecutive temporal sampling, the proposed framework effectively alleviates the limitations of traditional handcrafted methods and baseline deep learning models under few-shot conditions.

Experimental evaluation on a self-built dataset of 459 optical lightning samples demonstrates that FD-TripletNet achieves a classification accuracy of 94.8%, a false positive rate of 7.4%, and a false negative rate of 3.2%, outperforming frame difference plus SVM, MobileNetV2, and static-input TripletNet baselines. The lightweight MobileNetV2-based architecture further enables real-time inference on edge devices, supporting practical deployment in remote observation stations and severe convective weather monitoring systems.

At the same time, the study highlights inherent limitations of single-modal optical approaches, including reduced sensitivity to extremely dim lightning, performance degradation under heavy rain or fog, and susceptibility to fast-moving non-lightning objects. Addressing these challenges will require multi-modal sensing, adaptive temporal modeling, and more comprehensive interference-aware training strategies.

In summary, FD-TripletNet provides a physically informed, data-efficient, and deployable framework for optical lightning classification, laying a solid foundation for advancing real-time ground-based meteorological monitoring and early warning applications.

Author Contributions

Conceptualization, M.X. and Q.Z.; methodology, M.X., B.D. and Q.Z.; validation, M.X. and B.D.; formal analysis, B.D., M.X. and X.P.; resources, Q.Z.; data curation, Y.Y. and C.D.; writing—original draft preparation, M.X.; writing—review and editing, M.X. and Q.Z.; visualization, M.X. and Y.Y.; supervision, Q.Z.; project administration, M.X. and Y.Y.; funding acquisition, Q.Z. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Foundation of Key Laboratory of Meteorological Disaster (KLME), Ministry of Education and Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters (CIC-FEMD) under Grant No. KLME202311; and the Foundation of Key Laboratory of Big Data and Artificial Intelligence in Transportation (Beijing Jiaotong University), Ministry of Education under Grant No. BATLAB202402.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Acknowledgments

The authors would like to thank the staff of the two optical observation stations in Nanchang (Jiangxi Province, 28.7°N, 115.5°E) and Zhongshan (Guangdong Province, 22.6°N, 113.4°E) for collecting and organizing the lightning optical video dataset, all members of the research team at Nanjing University of Information Science and Technology for valuable discussions and technical support in data preprocessing and network training, the developers of open-source tools for facilitating the study, and the anonymous reviewers for constructive comments that improved the manuscript quality.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

FD-TripletNet	Frame Difference Triplet Network
FPR	False Positive Rate
FNR	False Negative Rate
FSL	Few-shot Learning
CNN	Convolutional Neural Networks
MAE	Masked Autoencoders
BLNET	Broadband Lightning Network
ISS	International Space Station
CG	Cloud-to-ground
IC	Intra-cloud
SSL	Self-supervised Learning
ViT	Vision Transformer
BN	Batch Normalization
TP	True Positives
TN	True Negatives
FP	False Positives
FN	False Negatives

References

Zhu, S.; Miao, M. SCNet: A lightweight and efficient object detection network for remote sensing. IEEE Geosci. Remote Sens. Lett. 2023, 21, 6001605. [Google Scholar] [CrossRef]
Simsek, E.; Ozyer, B. Selected Three Frame Difference Method for Moving Object Detection. Int. J. Intell. Syst. Appl. Eng. 2021, 9, 48–54. [Google Scholar] [CrossRef]
Zeng, X.; Guo, Y.; Li, L.; Liu, Y. Continual medical image denoising based on triplet neural networks collaboration. Comput. Biol. Med. 2024, 179, 108914. [Google Scholar] [CrossRef] [PubMed]
Asery, R.; Sunkaria, R.K.; Sharma, L.D.; Kumar, A. Fog detection using GLCM based features and SVM. In 2016 Conference on Advances in Signal Processing (CASP), Pune, India, 9–11 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 72–76. [Google Scholar] [CrossRef]
Sasithradevi, A.; Roomi, S.M.M.; Mareeswari, M. A vision based method for detecting lightning in surveillance videos. In 2016 International Conference on Emerging Technological Trends (ICETT), Kollam, India, 21–22 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–5. [Google Scholar] [CrossRef]
Li, Z.; Kamnitsas, K.; Glocker, B. Overfitting of Neural Nets Under Class Imbalance: Analysis and Improvements for Segmentation. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2019; Shen, D., Ed.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11766, p. 45. [Google Scholar] [CrossRef]
Cheng, K.; Scott, G.J. Evaluation of a Meta-Transfer Approach for Few-Shot Remote Sensing Scene Classification. In IGARSS 2023—2023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 16–21 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 5002–5005. [Google Scholar] [CrossRef]
Młodzianowski, P. Weather Classification with Transfer Learning—InceptionV3, MobileNetV2 and ResNet50. In Digital Interaction and Machine Intelligence; Biele, C., Kacprzyk, J., Kopeć, W., Owsiński, J.W., Romanowski, A., Sikorski, M., Eds.; Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2022; Volume 440, pp. 3–11. [Google Scholar] [CrossRef]
Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
Shi, W.; Shi, D.; Orouskhani, M.; Tian, F. Adaptive few-shot deep metric learning. Int. J. Electr. Inform. Eng. 2021, 15, 289–295. [Google Scholar]
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4510–4520. [Google Scholar] [CrossRef]
Cross, T.; Smit, J.R.; Schumann, C.; Warner, T.A.; Hunt, H.G.P. Deep Learning for High-Speed Lightning Footage—A Semantic Segmentation Network Comparison. Atmosphere 2024, 15, 873. [Google Scholar] [CrossRef]
Lu, J.; Li, J.; Liu, Y.; Yuan, S.; Pu, Y.; Bian, Q.; Jiang, R. An efficient lightning classifier using a self-supervised learning neural network. Geophys. Res. Lett. 2025, 52, e2025GL115067. [Google Scholar] [CrossRef]
Qian, Z.; Wang, D.; Shi, X.; Yao, J.; Hu, L.; Yang, H.; Ni, Y. Lightning Identification Method Based on Deep Learning. Atmosphere 2022, 13, 2112. [Google Scholar] [CrossRef]
Schultz, C.J.; Lang, T.J.; Leake, S.; Runco, M.; Stefanov, W. A technique for automated detection of lightning in images and video from the International Space Station for scientific understanding and validation. Earth Space Sci. 2021, 8, e2020EA001085. [Google Scholar] [CrossRef]
Dai, B.; Zhang, Q.; Li, J.; Liu, Y.; Zhao, M. New Method for Single-Site Cloud-to-Ground Lightning Location Based on Tri-Pre Processing. Remote Sens. 2025, 17, 1766. [Google Scholar] [CrossRef]
Wang, J.; Song, L.; Zhang, Q.; Li, J.; Ge, Q.; Yan, S.; Wu, G.; Yang, J.; Zhong, Y.; Li, Q. A Lightning Optical Automatic Detection Method Based on a Deep Neural Network. Remote Sens. 2024, 16, 1151. [Google Scholar] [CrossRef]
Li, A.; Huang, W.; Lan, X.; Feng, J.; Li, Z.; Wang, L. Boosting few-shot learning with adaptive margin loss. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12576–12584. [Google Scholar] [CrossRef]
Wang, X.; Zhang, S.; Cen, J. CLIP-guided Prototype Modulating for Few-shot Action Recognition. Int. J. Comput. Vis. 2024, 132, 1899–1912. [Google Scholar] [CrossRef]
Xue, H.; An, Y.; Qin, Y.; Li, W.; Wu, Y.; Che, Y.; Fang, P.; Zhang, M.L. Toward Few-Shot Learning in the Open World: A Review and Beyond. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 10420–10440. [Google Scholar] [CrossRef] [PubMed]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Duklan, N.; Kumar, S.; Maheshwari, H.; Singh, R.; Sharma, S.D.; Swami, S. CNN Architectures for Image Classification: A Comparative Study Using ResNet50V2, ResNet152V2. SSRG Int. J. Electr. Commun. Eng. 2024, 11, 11–21. [Google Scholar] [CrossRef]
Dewi, C.; Arshed, M.A.; Christanto, H.J.; Rehman, H.A.; Muneer, A.; Mumtaz, S. Enhancing Weather Scene Identification Using Vision Transformer. World Electr. Veh. J. 2024, 15, 373. [Google Scholar] [CrossRef]
Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]

Figure 1. Overall Architecture of Lightning Classification Network.

Figure 2. Network structure.

Figure 3. Sequence length optimization experiment process.

Figure 4. Lightning Optical Dataset Structure: Raw Grayscale and Lightning Channel-Enhanced Images.

Figure 5. Comparison of ingle-scale and multi-scale FD-TripletNet under different sequence lengths K.

Figure 6. Confusion matrix of the single-scale FD-TripletNet.

Figure 7. Confusion matrix of the multi-scale FD-TripletNet.

Table 1. Network parameter settings.

Module	Input Channels	Output Channels	Parameters	Number of Parameters
Conv2d_1	1	32	$3 \times 3$ convolution, stride = 2, padding = 1, bias = False	$1 \times 32 \times 3 \times 3 = 288$
BatchNorm2d_1	32	32	Batch normalization (BN)	∼64
ReLU6_1	32	32	Activation function	0
InvertedResidual Block 1	32	16	t = 1, n = 1, s = 1	∼1.2 K
InvertedResidual Block 2 ( $\times 2$ )	16	24	t = 6, n = 2, s = 2	∼9.0 K $\times 2$
InvertedResidual Block 3 ( $\times 3$ )	24	32	t = 6, n = 3, s = 2	∼14.1 K $\times 3$
InvertedResidual Block 4 ( $\times 4$ )	32	64	t = 6, n = 4, s = 2	∼30.6 K $\times 4$
InvertedResidual Block 5 ( $\times 3$ )	64	96	t = 6, n = 3, s = 1	∼56.1 K $\times 3$
InvertedResidual Block 6 ( $\times 3$ )	96	160	t = 6, n = 3, s = 2	∼86.4 K $\times 3$
InvertedResidual Block 7	160	320	t = 6, n = 1, s = 1	∼153.6 K
Conv2d	320	1280	$1 \times 1$ convolution, bias = False	$320 \times 1280 = 409, 600$
BatchNorm2d	1280	1280	Batch normalization (BN)	∼2560
ReLU6	1280	1280	Activation function	0
AdaptiveAvgPool2d	$(1280, H, W)$	$(1280, 1, 1)$	Global average pooling	0
Flatten	$(1280, 1, 1)$	1280	Flattening operation	0
Dropout	1280	1280	p = 0.2	0
Linear (Embedding)	1280	128	Fully connected layer	$1280 \times 128 + 128 = 163, 968$
Reshape and Permute	$(B \times L, 128)$	$(B, 128, L)$	Temporal dimension reorganization	0
AdaptiveAvgPool1d	$(128, L)$	$(128, 1)$	Temporal average pooling	0
Squeeze	$(128, 1)$	128	Remove dimension of length 1	0
Linear (Classifier)	128	2	Fully connected layer	$128 \times 2 + 2 = 258$

Note: B = batch size; L = sequence length; “K” denotes 1000.

Table 2. Sample distribution of the dataset.

Video Category	Total Videos	Training Set	Test Set
Lightning	250	164	86
Non-lightning	209	155	54

Table 3. Comprehensive performance of all the methods.

Method	Accuracy (%)	Precision (%)	Recall (%)	F1-Score	FPR (%)	FNR (%)
Traditional Frame Difference + SVM	76.5	72.3	70.1	0.712	28.6	29.9
CNN (MobileNetV2)	85.2	82.7	81.5	0.821	16.3	18.5
TripletNet (Static Input)	90.1	87.6	85.9	0.867	10.2	14.1
Single-Scalse FD-TripletNet ( $Δ t = 1$ )	92.3	88.7	89.2	0.908	9.1	5.9
Multi-Scalse FD-TripletNet ( $Δ t = 1, 2, 3$ )	94.8	91.3	93.6	0.924	7.4	3.2

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xiao, M.; Yan, Y.; Zhang, Q.; Liu, Y.; Pan, X.; Dai, B.; Duan, C. A Few-Shot Optical Classification Approach for Meteorological Lightning Monitoring: Leveraging Frame Difference and Triplet Network. Remote Sens. 2026, 18, 386. https://doi.org/10.3390/rs18030386

AMA Style

Xiao M, Yan Y, Zhang Q, Liu Y, Pan X, Dai B, Duan C. A Few-Shot Optical Classification Approach for Meteorological Lightning Monitoring: Leveraging Frame Difference and Triplet Network. Remote Sensing. 2026; 18(3):386. https://doi.org/10.3390/rs18030386

Chicago/Turabian Style

Xiao, Mengmeng, Yulong Yan, Qilin Zhang, Yan Liu, Xingke Pan, Bingzhe Dai, and Chunxu Duan. 2026. "A Few-Shot Optical Classification Approach for Meteorological Lightning Monitoring: Leveraging Frame Difference and Triplet Network" Remote Sensing 18, no. 3: 386. https://doi.org/10.3390/rs18030386

APA Style

Xiao, M., Yan, Y., Zhang, Q., Liu, Y., Pan, X., Dai, B., & Duan, C. (2026). A Few-Shot Optical Classification Approach for Meteorological Lightning Monitoring: Leveraging Frame Difference and Triplet Network. Remote Sensing, 18(3), 386. https://doi.org/10.3390/rs18030386

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Few-Shot Optical Classification Approach for Meteorological Lightning Monitoring: Leveraging Frame Difference and Triplet Network

Highlights

Abstract

1. Introduction

2. Related Work

2.1. Lightning Detection and Classification

2.1.1. Optical Video-Based Lightning Segmentation and Classification

2.1.2. Satellite and Multi-Sensor Lightning Validation

2.1.3. Remote Sensing-Assisted Meteorological Context

2.2. Few-Shot Learning for Lightning and Related Visual Tasks

2.2.1. Prototype-Based Few-Shot Learning

2.2.2. Adaptive Metric and Loss for Heterogeneous Data

2.2.3. Cross-Model and Self-Supervised Augmentation

2.2.4. Open-World Few-Shot Learning Challenges

2.3. Efficient Network Architectures and Discriminative Loss Functions

2.3.1. From Deep to Lightweight: Architecture Evolution

2.3.2. Transformer for Global Feature Modeling

2.3.3. Discriminative Loss for Feature Separation

3. Model Design and Research

3.1. Dataset Construction and Preprocessing

3.2. Network Architecture Design

3.3. Frame Difference Feature-Extraction Module

3.4. Loss Function Design

3.4.1. Adaptive Triplet Loss

3.4.2. Binary Cross-Entropy Loss

3.4.3. Joint Loss Function

3.5. Temporal Frame Selection and Sequence Construction

3.5.1. Frame-Level Screening Based on Intensity Variation

3.5.2. Sequence-Level Temporal Length Determination

3.6. Training Strategies

3.6.1. Mixed-Precision Training

3.6.2. Gradient Accumulation

3.6.3. Optimizer and Learning Rate Adjustment

4. Experiments and Results

4.1. Dataset

4.2. Evaluation Metrics

4.3. Experimental Settings

4.4. Baseline Methods

4.5. Overall Performance Comparison

4.6. Ablation Study

4.6.1. Impact of Temporal Scale Selection

4.6.2. Impact of Frame Selection Strategies

4.6.3. Confusion Matrix Analysis

5. Discussion

5.1. Effectiveness of Design Choices for Lightning Classification

5.2. Practical Value for Real-World Monitoring

5.3. Limitations Related to Signal Strength and Environmental Conditions

5.4. Background Interference and Motion-Induced False Positives

5.5. Adaptability to Different Optical Configurations

5.6. Future Directions

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI