1. Introduction
Recent years have seen frequent wildfires around the globe. Transmission lines are a major component of the power system and a crucial channel for the delivery of electrical energy [1]. However, transmission line routes often inevitably traverse forests, grasslands, and other regions where combustible materials are concentrated, so when a wildfire occurs, the lines are at risk of being engulfed or damaged by the fire [2]. Wildfire detection along transmission lines is therefore particularly important [3]. Regular wildfire inspections of transmission corridors make it possible to monitor and assess the fire risk around power infrastructure, to discover and address fire sources promptly, and thereby to reduce the economic losses caused by fires and to protect the surrounding environment and ecosystems from fire-induced ecological damage. Achieving this requires advanced technologies such as artificial intelligence, remote sensing, and drone inspection to improve the accuracy and efficiency of wildfire detection so that potential fire risks can be addressed in a timely manner [4].
Traditional sensor-based wildfire detection requires the fusion of multi-sensor data [5], including readings from temperature, humidity, and smoke sensors, to extract fire source characteristics. Multi-sensor fusion has certain advantages for fire detection, but its monitoring range is limited, its deployment cost is high, and it must handle multiple data types, making data processing and analysis complex [6]. With the continuous development of image processing technology, image-based forest fire detection has become the mainstream approach to forest fire monitoring, and computer vision-based wildfire detection has been widely applied in recent years [7]. These methods require less human intervention and offer lower costs, a wider monitoring range, and real-time monitoring. Traditional machine learning methods, such as SVMs and decision trees, rely on manually extracted features (hue, brightness, texture, etc.) that are then classified, so their performance depends heavily on the selected features and the classifier [1]. Because feature extraction is performed manually, feature selection strongly affects model performance; if the selected features cannot effectively characterize a fire, detection performance may be poor [8]. Moreover, fire scenes vary widely, and fixed feature extraction schemes may fail to adapt to changes in environmental conditions [5].
In recent years, deep learning-based methods have developed rapidly, enabling more efficient wildfire detection [9]. The most commonly used object detection algorithms include the region-based convolutional neural network (R-CNN), the single-shot multi-box detector (SSD) [10], Faster R-CNN [11], and You Only Look Once (YOLO) [12]. These CNN-based methods automatically learn and extract image features, avoiding the problem of manual feature selection. Wildfire monitoring requires fires to be detected and contained quickly, and the YOLO family, as a single-stage detector, offers better real-time performance than other algorithms [13]. Moreover, YOLOv11 has been widely used for wildfire detection due to its high accuracy and real-time performance, as demonstrated in several studies [7,14,15]. These works show that YOLOv11 can effectively handle complex backgrounds and small-scale objects, making it well suited for wildfire monitoring. Therefore, this study builds on YOLOv11 for wildfire object detection.
However, YOLOv11 requires a large amount of labeled data for training, and real wildfire images are often too scarce for target detection tasks [16]. To address the small-sample problem in wildfire detection, some researchers have explored meta-learning approaches [17]. Wildfire images also have complex backgrounds with many distractions and significant scale differences [18], and the original YOLO algorithm lacks the ability to fuse features across multiple scales, making it easily affected by complex backgrounds [19]. To address these issues, this paper proposes a transmission line wildfire target detection model called MA-YOLO (where M refers to meta-learning and A refers to the attention module). The model is based on the convolutional neural network detector YOLOv11 [12] and incorporates several improvements. The first is a meta-learning mechanism: to overcome the shortage of wildfire image samples, we propose a meta-learning-based feature extractor that combines a meta-feature extractor with a re-weighting module to improve the model's learning ability in small-sample scenarios [20]. The second is an adaptive feature fusion module: to reduce interference from complex backgrounds, we replace the first convolutional module in the feature extraction process with a spatial and channel reconstruction module (SCConv) and adopt an adaptive feature fusion (AFF) module with pruning operations that dynamically adjusts the fusion ratios between feature layers and removes redundant channels, enhancing multi-scale feature fusion while reducing the computational overhead [21]. Moreover, we include virtual data augmentation: we use the AirSim plugin for Unreal Engine to generate a large number of virtual wildfire image samples, which form a support set that strengthens detection on small-sample wildfire data [22]. Lastly, we add a small target detection head: to improve the network's ability to detect small-scale wildfires, a specialized small target detection head is added to YOLOv11 [15].
By incorporating these improvements, MA-YOLO addresses the limitations of existing wildfire detection models. Specifically, it enhances the model’s generalization abilities in small-sample conditions and its robustness against complex backgrounds through the introduction of meta-learning mechanisms and adaptive feature fusion techniques.
Our contributions are as follows.
- (1) We designed a meta-learning-based target detection framework for small-sample wildfire detection and used a virtual engine to build a support set of wildfire image samples, thereby improving the detection accuracy of the target detection network on small-sample wildfire data.
- (2) To further reduce the interference from complex backgrounds in wildfire detection, we constructed an adaptive feature fusion module with pruning operations, which adaptively adjusts the fusion ratios between different feature layers and removes redundant channels to enhance multi-scale feature fusion and reduce the computational overhead.
- (3) To reduce the redundancy of CNN features when extracting support set features, we performed spatial and channel reconstruction in the feature extraction process of the re-weighting module, reducing both spatial and channel redundancy.
- (4) We constructed a real wildfire image dataset and conducted extensive testing on the NUC13ANK i7000 embedded device carried by the inspection drone, thereby verifying the accuracy and detection speed of the proposed method.
3. The MA-YOLO Network
Because real wildfire image samples are scarce, accurately recognizing wildfire targets in small-sample scenarios is a challenging task. In this study, a meta-learning-based wildfire target detection network (MA-YOLO) is proposed for small-sample wildfire detection. The proposed network can perform target detection on images from the unseen class given only a small number of annotated samples. Two types of data are available for training, i.e., seen and unseen classes. For small-sample wildfire detection, the large number of annotated samples constructed from the virtual dataset is defined as the seen class, while the real wildfire samples, of which only a few are annotated, are defined as the unseen class. In this section, the network structure of MA-YOLO is described in detail. As shown in Figure 2, the network is a modification of the YOLOv11 algorithm, and the model consists of three parts: (1) a meta-feature extractor, which extracts meta-features of the wildfire target from the input query image; (2) a re-weighting module, which learns support information, embeds it into re-weighting vectors, and adjusts the contribution of each meta-feature of the query image accordingly; and (3) a detection head, which performs wildfire target detection based on the final representation.
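As an overview, the following PyTorch sketch shows how the three parts fit together: the meta-feature extractor produces features at three scales, the re-weighting module maps a support image to one re-weighting vector per scale, and each detection head operates on the re-weighted features. This is our own simplified stand-in; the classes `SimpleEncoder`, `SimpleReweighter`, and `MAYOLO` and all layer sizes are illustrative assumptions, not the authors' implementation.

```python
# Minimal PyTorch sketch of the three-part MA-YOLO structure described above.
# All module internals are illustrative stand-ins, not the paper's implementation.
import torch
import torch.nn as nn

class SimpleEncoder(nn.Module):
    """Stand-in meta-feature extractor returning features at three scales."""
    def __init__(self, c=32):
        super().__init__()
        self.stem = nn.Conv2d(3, c, 3, stride=2, padding=1)   # 1/2
        self.p2 = nn.Conv2d(c, c, 3, stride=2, padding=1)     # 1/4  (P2)
        self.p3 = nn.Conv2d(c, c, 3, stride=2, padding=1)     # 1/8  (P3)
        self.p4 = nn.Conv2d(c, c, 3, stride=2, padding=1)     # 1/16 (P4)

    def forward(self, x):
        x = torch.relu(self.stem(x))
        f2 = torch.relu(self.p2(x))
        f3 = torch.relu(self.p3(f2))
        f4 = torch.relu(self.p4(f3))
        return [f2, f3, f4]

class SimpleReweighter(nn.Module):
    """Stand-in re-weighting module: maps a support image (+ mask channel) to one vector per scale."""
    def __init__(self, c=32, scales=3):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(4, c, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1))
        self.proj = nn.ModuleList([nn.Linear(c, c) for _ in range(scales)])

    def forward(self, support):
        z = self.enc(support).flatten(1)                       # (B, c)
        return [torch.sigmoid(p(z)) for p in self.proj]        # three re-weighting vectors

class MAYOLO(nn.Module):
    def __init__(self, c=32, num_outputs=6):
        super().__init__()
        self.extractor = SimpleEncoder(c)
        self.reweighter = SimpleReweighter(c)
        self.heads = nn.ModuleList([nn.Conv2d(c, num_outputs, 1) for _ in range(3)])

    def forward(self, query, support):
        feats = self.extractor(query)                          # meta-features F_i
        vectors = self.reweighter(support)                     # re-weighting vectors V_i
        outs = []
        for f, v, head in zip(feats, vectors, self.heads):
            outs.append(head(f * v[:, :, None, None]))         # channel-wise re-weighting
        return outs

q = torch.randn(1, 3, 640, 640)                                # query image
s = torch.randn(1, 4, 640, 640)                                # support image + mask channel
print([o.shape for o in MAYOLO()(q, s)])
```

With a 640 × 640 input, the three outputs correspond to the 160 × 160, 80 × 80, and 40 × 40 scales used later in this section.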
3.1. Meta-Feature Extractor
The meta-feature extractor network is designed to extract robust feature representations from the input query images. Because wildfire targets appear at varying scales in wildfire images, a multi-scale feature extraction network is used to better extract wildfire features. The feature extractor proposed in this study is designed based on YOLOv11. For each input query image, the meta-feature extraction module generates meta-features at three different scales. Let $I_q \in Q$ ($q \in \{1, 2, \ldots, N_q\}$) be one of the input query images; the meta-features generated by the meta-feature extractor can be written as

$$ F_i = \mathcal{F}_{\theta}(I_q) \in \mathbb{R}^{h_i \times w_i \times m_i}, \quad i \in \{1, 2, 3\}, $$

where $\mathcal{F}_{\theta}$ represents the feature encoding network with parameters $\theta$, $i$ represents the scale level, and $h_i$, $w_i$, and $m_i$ represent the size of the feature map at scale $i$. To address the poor small-object detection performance of the original YOLOv11, this study removes the P5 (20 × 20) feature level and adds the P2 (160 × 160) feature level, improving the network's ability to detect small wildfire targets. The features extracted at the three final scales are P2 (160 × 160), P3 (80 × 80), and P4 (40 × 40). The adaptive feature fusion (AFF) module [28], shown in Figure 3a, is then used to adaptively fuse the features at the three scales, and channel pruning is introduced into the AFF module to reduce redundant features and improve computational efficiency.
For example, in AFF-3, two rounds of upsampling are required to ensure that the outputs of P2 and P3 have the same size as P4. First, P2 and P3 are compressed to the same number of channels as P4 through 1 × 1 convolutions and then upsampled by factors of four and two, respectively, to match the spatial dimensions of P4, yielding the resized feature maps $\tilde{F}_{2}$ and $\tilde{F}_{3}$. After this, for each feature map $\tilde{F}_{n}$, the importance measure of each channel $j$ is calculated as the L1 norm of that channel's activations:

$$ s_{j} = \left\| \tilde{F}_{n}(:,:,j) \right\|_{1}. $$

The importance score $s_{j}$ reflects the contribution of a specific channel to the overall detection performance. A larger value of $s_{j}$ indicates that the corresponding channel plays a more significant role in identifying the key features necessary for accurate object detection. Specifically, this importance is measured from the activation levels across channels; channels with higher activation levels are considered more important because they capture more relevant information about the objects being detected. This scoring mechanism helps to dynamically adjust the fusion weights, enhancing the model's ability to focus on critical features while filtering out less relevant information [29]. By defining importance in terms of the contribution to detection accuracy, the model can prioritize essential features, improving its robustness and effectiveness, especially in challenging scenarios such as detecting small-scale wildfires or handling complex backgrounds.

Based on the L1-norm scores, a pruning threshold $\theta$ is set and the channel index set $\mathcal{J} = \{ j \mid s_{j} > \theta \}$ is selected. The larger the value of $s_{j}$, the greater the contribution of channel $j$ to the feature map and the "more important" it is; only the channels with $s_{j} > \theta$ are retained, and their indices form the set $\mathcal{J}$. The pruned feature map $\hat{F}_{n}$ is then reconstructed from the retained channel indices:

$$ \hat{F}_{n} = \tilde{F}_{n}(:,:,\mathcal{J}). $$
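As a concrete illustration of the pruning step just described, the following snippet scores each channel by the L1 norm of its activations and keeps only the channels above the threshold. It is a minimal sketch; the tensor names, threshold choice, and batch handling are our own assumptions rather than the paper's implementation.

```python
# Minimal sketch of L1-norm channel scoring and pruning for one feature map.
import torch

def prune_channels(feat: torch.Tensor, threshold: float):
    """feat: (B, C, H, W). Returns the pruned map and the kept channel indices."""
    # Importance score per channel: L1 norm of its activations, averaged over the batch.
    scores = feat.abs().sum(dim=(0, 2, 3)) / feat.shape[0]     # (C,)
    keep = torch.nonzero(scores > threshold, as_tuple=False).squeeze(1)
    # Reconstruct the feature map from the retained channel indices.
    return feat[:, keep, :, :], keep

x = torch.randn(2, 64, 40, 40)
scores = x.abs().sum(dim=(0, 2, 3)) / x.shape[0]
pruned, kept = prune_channels(x, threshold=scores.mean().item())
print(pruned.shape, kept.numel())   # fewer channels than the original 64
```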
Finally, the weighted sum is computed; $\hat{F}_{2}$, $\hat{F}_{3}$, and $\hat{F}_{4}$ denote the features from P2, P3, and P4, respectively. The three features are multiplied by their respective weight parameters and summed to obtain the fused feature after AFF-3. This process is shown in Equation (5):

$$ F^{\mathrm{AFF}} = \alpha \hat{F}_{2} + \beta \hat{F}_{3} + \gamma \hat{F}_{4}, $$

where $F^{\mathrm{AFF}}$ represents the new feature map obtained through AFF, and $\alpha$, $\beta$, and $\gamma$ are the weight parameters of the three feature layers. Because they are produced by a Softmax function, they satisfy Equation (6):

$$ \alpha + \beta + \gamma = 1, \qquad \alpha, \beta, \gamma \in [0, 1]. $$

The weights $\alpha$, $\beta$, and $\gamma$ are calculated via the following formula:

$$ [\alpha, \beta, \gamma] = \mathrm{Softmax}\big(\tanh\big(W_{2}\,\mathrm{SiLU}\big(\mathrm{BN}\big(W_{1}\,[\hat{F}_{2}, \hat{F}_{3}, \hat{F}_{4}]\big)\big)\big)\big), $$

where [·] represents channel-wise concatenation, $W_{1}$ and $W_{2}$ represent pointwise convolutions, BN stands for batch normalization [30], and SiLU(·) and tanh(·) represent the Sigmoid linear unit (SiLU) and tanh activation functions, respectively.
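To make the fusion step concrete, the sketch below computes per-branch fusion weights with pointwise convolutions, batch normalization, SiLU, tanh, and a Softmax, then blends the three resized feature maps. It is our own simplified reconstruction, not the authors' code; the channel counts, layer names, and the per-pixel weighting are assumptions.

```python
# Simplified sketch of adaptive feature fusion (AFF-3 style) with Softmax weights.
import torch
import torch.nn as nn

class AFF3(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.w1 = nn.Conv2d(3 * channels, channels, kernel_size=1)  # W1: pointwise conv
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU()
        self.w2 = nn.Conv2d(channels, 3, kernel_size=1)             # W2: one logit per branch

    def forward(self, f2, f3, f4):
        # f2, f3, f4 are already resized to the same (B, C, H, W) shape.
        logits = torch.tanh(self.w2(self.act(self.bn(self.w1(torch.cat([f2, f3, f4], dim=1))))))
        weights = torch.softmax(logits, dim=1)                      # alpha + beta + gamma = 1
        a, b, g = weights[:, 0:1], weights[:, 1:2], weights[:, 2:3]
        return a * f2 + b * f3 + g * f4                             # Equation (5)

fused = AFF3(64)(torch.randn(1, 64, 40, 40),
                 torch.randn(1, 64, 40, 40),
                 torch.randn(1, 64, 40, 40))
print(fused.shape)   # (1, 64, 40, 40)
```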
The AFF module is integrated into the wildfire detection model in this study. It improves the multi-scale fusion of wildfire target features, making full use of the fine-grained details of the low-level features and the semantic information of the high-level features. This strengthens the model's ability to represent wildfire target features in complex forest environments, suppresses the interference of irrelevant features during wildfire detection, and improves the accuracy of MA-YOLO in detecting wildfire targets in complex forest environments.
3.2. Feature Re-Weighting Module
The re-weighting module M is designed as a lightweight CNN to improve the efficiency and simplify its learning. The feature re-weighting module extracts meta-knowledge from support images and guides wildfire detection in query images. The re-weighting module maps each support image to a set of re-weighting vectors, tailored to different detection heads. These re-weighting vectors are used to adjust the contributions of the meta-features, highlighting those that are meaningful for new object detection.
The support samples are drawn from virtual wildfire images. Our feature re-weighting module receives the virtual wildfire images and their annotations as input [31]. It first uses a feature encoding network $G_{\varphi}$ (with parameters $\varphi$) to extract features for each object, obtaining a representation vector of wildfire features, as shown in the following formula:

$$ V_{k} = G_{\varphi}(I_{k}, M_{k}), \quad k = 1, 2, \ldots, K, $$

where $(I_{k}, M_{k})$ represent the $K$ support images and the annotations corresponding to their bounding boxes, respectively.
The re-weighting vectors $V$ are used to re-weight the meta-features. The feature encoding module $G_{\varphi}$, shown in Figure 3b, is adapted from YOLOv11's backbone. Between the first and second convolutional modules, a spatial and channel reconstruction module (SCConv) [32] is added, which improves the re-weighting module's feature extraction while reducing redundant features. The SCConv module comprises two components: a spatial reconstruction unit (SRU) and a channel reconstruction unit (CRU). The SRU suppresses spatial redundancy through a separation–reconstruction method, while the CRU reduces channel redundancy through a split–transform–fuse strategy. Together, they reduce feature redundancy, speed up the feature extraction of the re-weighting module, and improve the accuracy of the extracted wildfire features. Finally, vectors with the same scales as the meta-features are generated.
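The following sketch shows one way such a block could be slotted between the first and second convolutional modules of the encoder. The `SCConvPlaceholder` class merely marks where the published SCConv (SRU + CRU) would go, and all layer sizes are illustrative assumptions rather than the paper's configuration.

```python
# Sketch of inserting an SCConv-style block between the first and second conv modules.
import torch
import torch.nn as nn

class SCConvPlaceholder(nn.Module):
    """Placeholder keeping the channel count, standing in for the real SCConv (SRU + CRU)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.BatchNorm2d(channels), nn.SiLU())

    def forward(self, x):
        return self.body(x)

class ReweightingEncoder(nn.Module):
    def __init__(self, in_ch: int = 4, c: int = 32):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(in_ch, c, 3, stride=2, padding=1), nn.SiLU())
        self.scconv = SCConvPlaceholder(c)           # inserted between conv1 and conv2
        self.conv2 = nn.Sequential(nn.Conv2d(c, 2 * c, 3, stride=2, padding=1), nn.SiLU())
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        x = self.conv2(self.scconv(self.conv1(x)))
        return self.pool(x).flatten(1)               # one re-weighting vector per support image

v = ReweightingEncoder()(torch.randn(1, 4, 256, 256))
print(v.shape)  # (1, 64)
```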
After obtaining the re-weighting vectors $V_{i}$, the generated meta-features $F_{i}$ are re-weighted by channel-wise multiplication, implemented as a 1 × 1 convolution whose kernel is the re-weighting vector:

$$ F_{i}' = F_{i} \otimes V_{i}, \quad i \in \{1, 2, 3\}, $$

where $\otimes$ denotes channel-wise multiplication. After the channel-wise multiplication, three sets of re-weighted features are obtained, one for each scale. For each scale, the feature re-weighting module produces the corresponding re-weighted feature map. Finally, the features at the three scales are passed to the detection head to obtain the target recognition results.
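The snippet below (a minimal sketch with illustrative tensor shapes) demonstrates that applying the re-weighting vector as a per-channel 1 × 1 convolution is equivalent to broadcasting a channel-wise multiplication.

```python
# Channel-wise re-weighting: broadcast multiply vs. depthwise 1x1 convolution.
import torch
import torch.nn.functional as F

B, C, H, W = 1, 8, 40, 40
meta_feat = torch.randn(B, C, H, W)      # meta-features F_i from the query image
v = torch.rand(C)                        # re-weighting vector V_i from the support set

# (a) Broadcast channel-wise multiplication.
out_mul = meta_feat * v.view(1, C, 1, 1)

# (b) The same operation as a depthwise (groups=C) 1x1 convolution whose kernel is v.
kernel = v.view(C, 1, 1, 1)
out_conv = F.conv2d(meta_feat, kernel, groups=C)

print(torch.allclose(out_mul, out_conv, atol=1e-6))   # True
```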
4. Experiment and Discussion
This study designed and implemented a drone-based wildfire inspection platform to verify the accuracy and effectiveness of MA-YOLO in forest fire detection. As shown in Figure 4, the drone platform is equipped with an NUC13ANK i7000 embedded device and an Intel RealSense camera (both from Intel Corporation, Santa Clara, CA, USA) and a Pixhawk 6C flight controller from Holybro (Hong Kong, China). The NUC13ANK i7000 is especially suitable for real-time image processing and complex algorithm execution owing to its strong computational performance and efficient energy management, and all subsequent test experiments were carried out on this platform. This section describes in detail the experimental environment and settings, evaluation metrics, performance analysis, comparative analysis between different models, ablation study, and comparison of the visualization results.
4.1. Dataset
The real wildfire dataset was derived from wildfire images collected from Baidu and Google, wildfire images collected by State Grid, and the aerial-image forest fire dataset FLAME [33], which is provided by universities such as Northern Arizona University.
The virtual dataset is rendered by Unreal Engine. Unreal Engine can simulate and render realistic wildfire scenes and their characteristics, such as the size, shape, and color of the flames, the concentration and diffusion of smoke, and their interactions with the environment (such as mountains, trees, and weather). The wildfire images generated by Unreal Engine not only greatly enhance the scale of the training set but also cover various possible wildfire situations, satisfying the diversity requirement of meta-learning.
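As a sketch of how such virtual frames can be collected, the AirSim Python client can grab rendered scene images from a running Unreal Engine environment with the AirSim plugin enabled. The camera name, frame count, and output path below are illustrative assumptions, not the paper's pipeline.

```python
# Sketch: capturing rendered frames from an Unreal Engine scene via the AirSim Python API.
# Assumes an AirSim-enabled environment is already running.
import numpy as np
import airsim
import cv2

client = airsim.MultirotorClient()
client.confirmConnection()

for idx in range(10):  # grab a handful of frames for the support set
    responses = client.simGetImages([
        airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)  # uncompressed scene image
    ])
    resp = responses[0]
    img = np.frombuffer(resp.image_data_uint8, dtype=np.uint8)
    img = img.reshape(resp.height, resp.width, 3)   # 3-channel BGR in recent AirSim versions
    cv2.imwrite(f"virtual_wildfire_{idx:04d}.png", img)
```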
The model is trained on the dataset generated with Unreal Engine and then applied to detect real wildfire targets. The Unreal Engine dataset consists of 1800 wildfire images covering 6 different scenes [34], together with 150 images of wildfire-like scenes (sunsets, red lights, etc.). Although the virtual images differ noticeably from real wildfire photographs, their large number provides sufficient support for the model to ensure its performance. A total of 1830 images are used as the support set.
4.2. Experimental Setup
The hardware and software configurations of the experimentally deployed platform are shown in
Table 1. These were used to validate the accuracy and effectiveness of the improved YOLOv11 model in forest fire detection. The training phase of the platform was conducted on a PC equipped with a GeForce RTX 4070 graphics card and running the Windows 10 operating system. The Python 3.7 and PyTorch 1.8.1 deep learning frameworks were used for modeling and training in the training environment. After the training was completed, the weights of the model were deployed in the NUC13ANK i7000, an embedded device in the inspection UAV, for testing. With this platform, we were able to efficiently perform the real-time detection and prediction of hill fire images, providing an effective technical solution for forest fire monitoring in real applications. Considering the GPU memory size and time cost, we set the batch size of the model to 64, the momentum to 0.93, and the training process to 600 rounds using the Adam optimizer, with an initial value of 1 × 10
−4 for the learning rate and 0.001 for the weight decay.
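A minimal PyTorch sketch of this training configuration is given below, assuming `model` is the detection network, `train_loader` yields batches of size 64, and `compute_loss` is the detection loss; interpreting the reported momentum of 0.93 as Adam's beta1 is our own assumption.

```python
# Sketch of the training setup: Adam, lr 1e-4, weight decay 1e-3, 600 epochs.
import torch

def train(model, train_loader, compute_loss, device="cuda"):
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=1e-4, betas=(0.93, 0.999), weight_decay=1e-3)
    for epoch in range(600):
        for images, targets in train_loader:          # batches of size 64
            images = images.to(device)
            loss = compute_loss(model(images), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch + 1}: loss {loss.item():.4f}")
```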
4.3. Evaluation Metrics
In this experiment, multiple object detection networks were compared, and several evaluation metrics were used to assess their performance in the small-sample wildfire detection task: the precision (P), the mean average precision at an IoU threshold of 0.5 (mAP50, %), and the mean average precision at an IoU threshold of 0.95 (mAP95, %). P reflects the model's ability to classify samples, AP reflects the overall performance of the model in detecting and classifying targets, and mAP is the average of the AP values over all categories. The calculation formulas are as follows (Formulas (10)–(13)):

$$ P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, $$
$$ AP = \int_{0}^{1} P(R)\,\mathrm{d}R, \qquad mAP = \frac{1}{N}\sum_{n=1}^{N} AP_{n}, $$

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively, and N is the number of categories.
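As a small illustration of these formulas (a sketch with made-up counts, not results from the paper), precision, recall, and mAP can be computed as follows:

```python
# Toy illustration of precision, recall, and mAP from hypothetical detection counts.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def mean_ap(per_class_ap):
    return sum(per_class_ap) / len(per_class_ap)

# Hypothetical numbers for a single "wildfire" class at IoU 0.5.
tp, fp, fn = 80, 10, 20
print(precision(tp, fp))      # 0.888...
print(recall(tp, fn))         # 0.8
print(mean_ap([0.85]))        # with one class, mAP equals that class's AP
```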
4.4. Ablation Study
To verify the impact of each improvement on the network's performance, we used YOLOv11 as a baseline and gradually added each improvement module to the model for an ablation study. The best-performing model from these experiments was then evaluated on the test set. The experimental results are shown in Table 2.
It can be seen from Table 2 that the improved MA-YOLO significantly improved the detection accuracy for forest fire targets; in particular, the mAP95 improved by 9.8% compared to YOLOv11. At the same time, the FPS (measured on a test set containing images of different resolutions) decreased slightly but still reached 32.45, i.e., MA-YOLO can process roughly 32 forest fire images per second. Real-time monitoring videos usually run at 25 to 30 frames per second, so the detection speed of MA-YOLO exceeds the requirements of real-time detection.
4.5. Comparison Experiment with Different Networks
To further verify the effectiveness of the proposed algorithm, the proposed network was compared with classic one-stage and two-stage target detection models, including Faster R-CNN, SSD, YOLOX, and YOLOv11, under the same training configuration and dataset conditions. The visualization results in Figure 5 show that the proposed network misses fewer detections: relative to the other methods, MA-YOLO leaves fewer wildfire targets undetected and produces more accurate bounding boxes, demonstrating its superiority. The evaluation metrics were the mAP95 (%), mAP50 (%), precision, and FPS, enabling a comprehensive comparison of the performance of each network. The detailed comparison results are shown in Table 3.
As can be seen from Table 3, the algorithm proposed in this paper has clear advantages over mainstream target detection algorithms. Its detection speed is the fastest among the compared models, fully meeting the real-time requirements of the forest fire detection task. In addition, unless otherwise specified, the datasets and processing methods used in the comparison experiments were the same as those used elsewhere in this paper, ensuring fairness and accuracy.
5. Conclusions
In this work, based on YOLOv11, we first replaced the feature extraction module with a meta-feature extraction module and adjusted the detection head scales to detect smaller forest fire targets. To enhance the model's ability to learn fire features against complex backgrounds, an adaptive feature fusion (AFF) module was integrated into the feature extraction process of YOLOv11, improving feature fusion and filtering out unnecessary information, thus reducing interference from complex backgrounds. Subsequently, the re-weighting module was used to learn scale-specific fire feature re-weighting vectors from the support set samples, which were used to recalibrate the meta-features. The experimental results show that MA-YOLO's mAP95 improved by 9.8% in the small-sample scenario, and that MA-YOLO misses fewer forest fire targets across different scenarios and is less affected by complex backgrounds. Despite the promising results achieved by MA-YOLO, especially in small-sample scenarios, several limitations need to be acknowledged. First, while the model shows significant improvements in accuracy and robustness against complex backgrounds, its performance relies heavily on the quality and representativeness of the virtual dataset. The synthetic data might not fully capture all of the nuances and variations present in real-world wildfire scenarios, which could limit the model's generalization under diverse conditions.
Looking towards future work, one promising direction is the integration of Internet of Things (IoT) technologies to address wildfire problems more comprehensively. Additionally, further studies should investigate how to effectively combine AI-driven image analysis with IoT sensor data to create a more holistic wildfire monitoring system. This would involve developing algorithms capable of fusing multi-source data and dynamically adjusting their behavior based on the real-time environmental conditions.