Deep Learning-Based Real-Time Vehicle Tire and Tank Temperature Monitoring Using Thermal Cameras

Hu, Yaoyao; Li, Jiaxin; Ma, Chuanyi; Cheng, Shuai; Zheng, Ruolin; Zhang, Xingang

doi:10.3390/app16062656


by Yaoyao Hu ¹, Jiaxin Li ², Chuanyi Ma ¹, Shuai Cheng ²,*, Ruolin Zheng ³ and Xingang Zhang ²

¹ Shandong Hi-Speed Group Co., Ltd., Jinan 250098, China

² School of Qilu Transportation, Shandong University, Jinan 250061, China

³ Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(6), 2656; https://doi.org/10.3390/app16062656

Submission received: 17 December 2025 / Revised: 18 February 2026 / Accepted: 25 February 2026 / Published: 11 March 2026


Abstract

Ensuring the driving safety of hazardous chemical vehicles is a critical priority. High temperatures in tires and tanks can lead to catastrophic accidents, including fires and road damage, particularly in bridge and tunnel sections. Therefore, the purpose of this study is to utilize deep learning to obtain the temperature of vehicle tires and tanks in real time. We constructed a comprehensive dataset by combining the FLIR infrared vehicle dataset, the SPT visible tire dataset, and self-collected thermal video frames captured in various environments. State-of-the-art object detection models, including different scales of YOLOv8, YOLOv9, and YOLOv10, were evaluated for the multi-target detection of vehicles, tires, and tanks. Comparative analysis reveals that the YOLOv8-L model optimized with the GIoU loss function delivers the best performance. Specifically, it achieves a mean Average Precision (mAP) of 97.9% with an average inference time of 6.9 ms per frame, effectively balancing accuracy and real-time efficiency. Finally, by mapping the detection bounding boxes to the radiometric temperature matrix, the system achieves precise, real-time temperature monitoring of the vehicle components.

Keywords:

deep learning; thermal camera; YOLO; object detection

1. Introduction

China has one of the most extensive road networks in the world, and its vast and complex terrain makes bridges and tunnels an important part of its expressway network. With the continuous development of construction technology, bridges and tunnels are now widely used in China’s highway construction to cross mountains, rivers, and the seabed, shortening route mileage and raising road technical standards. They have become an important part of contemporary transportation infrastructure. With the rapid development of highways and the growing number of vehicles, the operation and maintenance pressure on bridge and tunnel sections increases year by year. Fires caused by factors such as the spontaneous combustion of vehicle tires and the explosion of tanks seriously affect expressway traffic safety. Because of the geographical characteristics of bridge and tunnel sections, a traffic accident there is more difficult to handle than one on an ordinary road section. Preventing major driving risks such as road fires and abnormal road surface temperatures in bridge and tunnel sections is therefore an important part of highway operation and maintenance.

Because of their long and narrow structures, bridge and tunnel sections are relatively enclosed, with limited space and poor ventilation. In the event of a fire, the smoke generated by the combustibles spreads rapidly along the road section and quickly fills the entire space, threatening the lives of people nearby. It also greatly increases the difficulty of firefighting, rescue, and safe evacuation [1]. Real-time temperature monitoring of both the bridge and tunnel environment and the vehicles, together with early fire detection, is an important means of preventing fires. Existing fire detection technologies include smoke-sensitive, temperature-sensitive, and light-sensitive devices. Under the environmental conditions of bridge and tunnel sections, their false alarm rate is high, and it is difficult for them to give accurate and timely fire warnings [2]. With the development of machine vision and image recognition technology, image-based fire detection has become widely used [3]. Infrared thermal imaging can provide both real-time display of the monitoring video and online monitoring of vehicle temperature. It also offers strong environmental adaptability and a long detection distance [4], which is of great significance for the accurate early warning of fires in bridge and tunnel sections. As mentioned above, current fire detection technology is far from meeting the requirements for accurate early warning. Thermal cameras are usually used for safety applications or for identifying overheated equipment [5]. Deep learning algorithms are used to identify vehicles because they can learn complex features. Vehicles generate heat while driving, and existing research has used thermal cameras and deep learning to detect moving vehicles [6].
However, there are few real-time temperature detection methods for vehicle tires and tanks. This paper aims to use deep learning and thermal cameras to measure vehicle tire and tank temperatures in real time. The first step is to use an object detection algorithm to locate the temperature regions of the vehicle tires and tanks. The second step is to calculate the temperature of the vehicle and of its tires and tanks from the positions returned by the detection boxes and the temperature matrix.

The main contributions of this paper are summarized as follows:

Dataset Construction: We constructed a specialized multi-source dataset for thermal vehicle monitoring by integrating the FLIR infrared vehicle dataset, the SPT visible tire dataset, and self-collected thermal video frames. We manually annotated key temperature-sensitive regions (tires and tanks) to address the domain gap and improve model generalization in specific thermal scenarios.

Algorithm Optimization: We evaluated and optimized state-of-the-art detection models (YOLOv8, YOLOv9, YOLOv10) for thermal imagery. The YOLOv8-L model, enhanced with the GIoU loss function, was identified as the optimal solution, achieving a superior balance between detection accuracy (97.9% mAP) and real-time performance (6.9 ms inference time).

Real-time Temperature Monitoring System: We developed an end-to-end pipeline that maps detection bounding boxes directly to the radiometric temperature matrix. This enables the automatic, real-time extraction and monitoring of temperature data for hazardous chemical vehicles, effectively filling the gap in automated thermal safety monitoring for bridge and tunnel sections.

The rest of this article is organized as follows. Section 2 discusses the relevant literature and Section 3 discusses the data set and the proposed method. Section 4 introduces the results and analysis obtained in this paper. Section 5 discusses our work and the efficiency of the detector. Finally, this paper ends with the conclusions of Section 6 and considerations for future research.

2. Literature Review

2.1. Deep Learning Object Detection in Thermal Imagery

While traditional handcrafted features [7,8,9,10], statistical traffic analysis methods [11,12,13,14], and early detection models [15,16] laid the foundation, two-stage detectors [17,18,19,20,21] further advanced computer vision but often struggle to balance accuracy and latency in complex traffic environments. Consequently, one-stage detectors, particularly the YOLO family [22], have become the de facto standard for real-time engineering deployments. The continuous evolution from YOLOv2–v5 [23,24,25,26] to YOLOX, YOLOv6, YOLOv7, and PP-YOLO [27,28,29,30,31,32] demonstrates the rapid progress in this field. Recent iterations, such as YOLOv8 [33], YOLOv9 [34], and YOLOv10 [35], have further optimized the speed-accuracy trade-off, making them highly suitable for edge computing devices in bridge and tunnel monitoring. However, directly applying these visible-light-optimized models to thermal infrared (TIR) imagery presents unique challenges due to the lack of texture and color information, as well as susceptibility to specific weather phenomena [36].

2.2. Challenges in Quantitative Temperature Monitoring

Importantly, vehicle safety monitoring with infrared thermal cameras goes beyond simple object detection. Quantitative temperature readout is sensitive to radiometric calibration, emissivity assumptions, reflected ambient radiation, viewing distance, and environmental conditions (e.g., rain, fog, and background thermal clutter). Therefore, recent thermal-imaging studies increasingly couple deep learning-based region localization with explicit temperature extraction and quality control procedures, highlighting the need to report measurement uncertainty and potential error sources.

2.3. Recent Advances in Vehicle Thermal Monitoring (2022–2025)

Recent work proposed an automatic non-contact temperature measurement system for truck tires and brakes using a thermal camera and an HD camera. In this setup, YOLO detects tires from RGB images, and the corresponding thermal pixels are mapped to read temperature values, with KCF tracking used to reduce computation time [37]. This study reported high detection accuracy (F1 = 95.3%) and improved efficiency. Nevertheless, such a pipeline depends on reliable RGB-thermal registration and stable installation geometry, providing limited analysis of temperature readout errors introduced by distance and emissivity settings.

Another study combined dual-light (visible + infrared) fusion with deep learning for roadside brake-drum and tire temperature monitoring. A YOLOv5-based multimodal fusion detector and a segmentation module were used to localize key regions, suppress background interference, and obtain a temperature field matrix from infrared imagery [38]. This line of work highlights the benefit of multimodal fusion, but it introduces additional sensor dependency and calibration complexity, primarily focusing on tires/brakes rather than hazardous tank regions.

For fire risk assessment, roadside thermal imaging has been explored to provide early warnings for vehicle spontaneous combustion hazards in tunnels. For example, Fourier domain semantic augmentation and attention modules were integrated into YOLOv8 to improve detection in low-resolution tunnel infrared images, demonstrating gains in recall and mAP [39]. Such studies emphasize robustness under low contrast; however, the temperature estimation step is often detection-driven (region-level monitoring), and physical temperature validation is not always rigorously reported.

Related anomaly monitoring research indicates that thermal pipelines may require temporal modeling. For instance, gas leak early warning in utility tunnels leverages infrared imaging with video super-resolution and recurrent neural networks to classify leakage risks [40]. Additionally, quantitative fire studies used IR thermography with inverse modeling to estimate heat flux profiles, illustrating that reliable thermal quantification depends on both imaging quality and physical assumptions.

2.4. Gap Analysis

Overall, as summarized in Table 1, existing studies either (i) use thermal imaging mainly for object recognition without a formalized temperature extraction procedure, or (ii) perform temperature extraction for limited components (e.g., tire/brake) under constrained setups. In contrast, this study focuses on bridge and tunnel scenes for dangerous goods vehicles, jointly localizes tires and tank regions, and provides an end-to-end pipeline from detection to temperature readout with real-time reporting.

3. Data and Methods

This section mainly introduces the data acquisition, processing, and use of the algorithm.

3.1. Data with Thermal Camera

In northern China, the daytime is shorter in winter and longer in summer. In winter, snow can persist for three to four months, causing color cameras to face problems of insufficient light and snow interference. In rainy and snowy weather, the laser beam of LiDAR is often blocked by raindrops or snowflakes, resulting in a high false detection rate, poor real-time performance, and inaccurate positioning data [36]. The thermal infrared imager can better adapt to complex environments such as rain, snow, and fog, which has attracted much attention in recent years.

The vehicle temperature detection data were collected using a FLIR A615 infrared thermal imager (FLIR Systems, Wilsonville, OR, USA). The layout of the thermal imager needs to consider the ability to monitor the vehicle temperature within 20 m at the entrance of bridges and tunnels. It is necessary to collect a large amount of high-risk vehicle temperature information while accommodating the deceleration of vehicles as they are about to enter these sections, ensuring the accuracy and reliability of monitoring results as well as the safety of construction and maintenance.

Based on the above requirements for data acquisition, the entrance of the Qihe Service Area on the Jiliao Expressway was selected for monitoring. This location serves as a critical transportation hub with high daily traffic volume, particularly for heavy logistics and hazardous chemical transport, ensuring the collected data possesses strong representativeness and general applicability. Considering the actual deceleration behavior of vehicles, the stage where tankers decelerate into the service area was selected. The lens range was fixed at about 20 m, taking into account pixel clarity and sample diversity.

To ensure data diversity, collection was carried out during both day and night, in line with actual operating conditions. Since the goal of our research was to detect the temperature of the vehicle body, tank, and tires, we started recording when the tanker entered the service area. Between 3:00 p.m. and 9:00 p.m., we recorded dozens of videos containing tankers. We extracted frames from these videos with a resolution of 640 × 480 pixels. These frames include not only the tankers of interest but also cars, buses, and other vehicles. Nineteen videos of vehicles, tanks, and tires were selected to construct the temperature performance dataset. We extracted key frames from the infrared video at an interval of 8 frames per second and used the image structure similarity method to compare and remove adjacent frames with high similarity. After processing the original infrared video, the dataset contains a total of 2557 infrared vehicle image frames, with each frame containing target objects such as vehicles and tanks. Figure 1 shows the thermal infrared images during the day and night, from which we can observe that the vehicle temperature at night is lower than during the day, while the clarity remains consistent.
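The de-duplication step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the paper does not state which SSIM variant or similarity threshold was used, so the single-window (global) SSIM, the 0.95 threshold, the comparison against the last kept frame, and the function names `global_ssim` and `drop_similar_frames` are all assumptions made for illustration. In practice a windowed SSIM, such as `structural_similarity` in scikit-image, would more likely be used.

```python
from typing import List, Sequence


def global_ssim(a: Sequence[float], b: Sequence[float],
                dynamic_range: float = 255.0) -> float:
    """Single-window SSIM of two equally sized, flattened grayscale frames.

    Simplification (assumption): statistics are computed over the whole
    frame instead of over local sliding windows.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("frames must be non-empty and have the same size")
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((q - mu_b) ** 2 for q in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    # Standard SSIM stabilizing constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


def drop_similar_frames(frames: Sequence[Sequence[float]],
                        threshold: float = 0.95) -> List[int]:
    """Return the indices of the frames that are kept.

    A frame is kept when its SSIM to the most recently kept frame is
    below `threshold` (hypothetical value, not taken from the paper).
    """
    kept: List[int] = []
    for idx, frame in enumerate(frames):
        if not kept or global_ssim(frames[kept[-1]], frame) < threshold:
            kept.append(idx)
    return kept


# Tiny 4-pixel "frames": the second is a duplicate, the third differs.
frames = [
    [0, 50, 100, 150],
    [0, 50, 100, 150],
    [150, 100, 50, 0],
]
kept_indices = drop_similar_frames(frames)
```

Comparing each candidate with the last kept frame, rather than with its immediate predecessor, prevents a slow drift of near-identical frames from all being retained; either choice is consistent with the description in the text.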

3.2. Measurement Reliability and Calibration

To ensure the trustworthiness of the thermal readings, we utilized a high-performance FLIR A615 thermal infrared camera. The camera features a high resolution of 640 × 480 pixels and a frame rate of 50 Hz, which effectively minimizes motion blur when capturing moving vehicles.

To address the specific factors affecting temperature accuracy, the following configurations and corrections were applied:

Emissivity Settings: The emissivity (ε) was set to 0.95, which is the standard characteristic value for rubber tires and automotive paint, ensuring accurate radiation-to-temperature conversion.

Spatial Resolution and Distance: The camera was equipped with a 24.6 mm lens, providing a spatial resolution (IFOV) of 0.68 mrad. This high spatial resolution ensured that even at typical monitoring distances, the target area (e.g., a tire surface) occupies sufficient pixels to avoid the “spot size effect,” thereby guaranteeing measurement accuracy.

Calibration and Environmental Correction: The camera operates within a calibrated temperature range of −20 °C to 650 °C. To mitigate environmental interference, parameters such as ambient temperature and relative humidity were input into the camera’s correction algorithm. Additionally, the camera was positioned at an oblique angle to the road surface to avoid direct reflection from the camera itself or perpendicular reflection from the vehicle body.
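As a rough check of the spot-size argument, the footprint of one pixel on the target can be estimated from the IFOV with the small-angle approximation (footprint ≈ IFOV × distance). The sketch below uses the 0.68 mrad IFOV and the roughly 20 m monitoring distance given in the text; the 1 m tire diameter and the function names are hypothetical and chosen only for illustration.

```python
def pixel_footprint_mm(ifov_mrad: float, distance_m: float) -> float:
    """Approximate width on the target covered by one pixel.

    Uses the small-angle approximation: footprint = IFOV [rad] * distance.
    """
    ifov_rad = ifov_mrad * 1e-3
    return ifov_rad * distance_m * 1e3  # metres -> millimetres


def pixels_across(target_mm: float, ifov_mrad: float, distance_m: float) -> float:
    """Approximate number of pixels spanning a target of the given width."""
    return target_mm / pixel_footprint_mm(ifov_mrad, distance_m)


# Values from the text: IFOV = 0.68 mrad, distance of about 20 m.
footprint = pixel_footprint_mm(0.68, 20.0)

# Hypothetical tire diameter of 1 m (assumption for illustration only).
tire_pixels = pixels_across(1000.0, 0.68, 20.0)
```

Under these assumptions one pixel covers about 13.6 mm at 20 m, so a 1 m tire spans roughly 73 pixels, which is consistent with the claim that the target occupies enough pixels for a reliable reading.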

3.3. Infrared and Visible Light Public Dataset

The FLIR Free ADAS Thermal Dataset v2 is a richly labeled dataset of infrared and visible video frames published by FLIR (FLIR Systems, Wilsonville, OR, USA), as shown in Figure 2, which can be used to train object detection models. It contains a total of 26,442 video frames annotated with detection boxes for 15 object categories.

From the 15 categories of the dataset, we selected 4 types—cars, buses, trucks, and other vehicles—that are related to vehicle detection to train the infrared vehicle detection model. The label categories and the number of target detection objects in the dataset are shown in Table 2.

The Side Profile Tires (SPT) dataset is a visible light tire dataset containing 500 side-view images, as shown in Figure 3. Each image is labeled with the tire object and its corresponding bounding box. We used this dataset to pre-train the tire detection model. Although the SPT dataset consists of visible light images, it is valuable for infrared detection tasks due to the geometric consistency of vehicle components. The morphological features of tires (e.g., circular shape, edge contours, and relative position) remain invariant across visible and thermal modalities. Incorporating this dataset allows the model to learn robust structural representations and generalize better, effectively bridging the domain gap between visible and infrared images.

3.4. Labelling the Dataset

We use the aforementioned FLIR infrared vehicle dataset and SPT visible light tire dataset to train the vehicle detection model. However, since there was no pre-labeled thermal infrared tire and tank dataset available for our specific needs, we manually annotated the frames we collected. Key temperature detection areas, such as the vehicle body, tires, and tanks, were labeled. This was done to address the poor generalization performance of target detection models pre-trained on large general datasets when applied to the specific scenarios of vehicle tire and tank detection.

We extracted 785 frames from the collected thermal infrared videos to annotate the vehicle body, tires, and tanks. The annotation style is shown in Figure 4: the yellow box represents the vehicle body, the green box represents the tires, and the red box represents the tank. In total, more than 6000 detection bounding boxes were annotated. To ensure the accuracy and consistency of the dataset, a strict quality control mechanism was implemented during the annotation process. We adopted a cross-verification strategy where labeled images underwent a secondary review by a different expert. Any discrepancies in bounding box precision or category classification were resolved through discussion to reach a consensus, ensuring high inter-annotator agreement.

Since our objective was to detect the temperature of vehicle tires and tanks, in addition to labeling the bounding boxes, we also needed to annotate the temperature information for these frames. We processed the temperature data for all 2557 infrared vehicle images in the temperature performance dataset. The temperature matrix of the corresponding frame was extracted from the derived video temperature matrix data, as shown in Figure 5, to serve as the temperature ground truth. The dimensions of the temperature matrix are consistent with the size of the infrared image (640 × 480). The value at each position in the temperature matrix represents the measured infrared temperature (in Celsius) of the corresponding pixel in the infrared image.

3.5. Temperature Extraction and Outlier Filtering Strategy

After the YOLO model successfully localizes the tires and tanks, the final step is to quantify their thermal status. Following the object detection stage, we implemented a post-processing module to extract precise temperature readings from the detected regions. The bounding boxes generated by the YOLO model are directly mapped to the raw radiometric temperature matrix of the thermal image, where each pixel value corresponds to a specific temperature reading.

However, a direct mapping often introduces inaccuracies due to the rectangular nature of bounding boxes, which inevitably include background pixels (e.g., road surface, vehicle body, or air) that are significantly cooler than the target components. To ensure measurement reliability and filter out these outliers, we adopted a robust “Top-K Average” strategy. Instead of averaging all pixels within the box, which would be skewed by the cooler background, we extract the Region of Interest (ROI) and sort the pixel temperatures in descending order. We then calculate the average temperature of the top 10% hottest pixels within the ROI. This approach effectively eliminates cold background interference and smooths out potential sensor noise (hot pixel anomalies), ensuring that the extracted temperature represents the critical heat signature of the tires and tanks for effective risk warning.

3.6. Detectors

Before deep learning became popular, object detection relied mainly on hand-designed features (color, shape, texture, etc.), as in HOG and DPM, which were then classified with linear classifiers. Such methods are effective in scenes with few target categories, a single task, and highly distinguishable targets. In complex scenes, however, they lack robustness, which greatly limits their use in engineering applications. With the development of deep learning and the growth in hardware computing power, two families have become the mainstream frameworks for object detection: two-stage methods represented by the R-CNN series (R-CNN, Fast R-CNN, Faster R-CNN) and single-stage methods represented by SSD and the YOLO series. Among them, the YOLO series has gradually become the preferred framework for most industrial applications because of its better overall performance, and it has evolved through many variants, from YOLOv1 to YOLOv10.

YOLOv8 draws on the design strengths of YOLOv5, YOLOv6, YOLOX, and other models, with a focus on engineering practice. It provides a new SOTA model family, including object detection networks at P5 640 and P6 1280 resolutions and an instance segmentation model based on YOLACT. Based on a scaling factor, models at N/S/M/L/X scales are provided to meet the needs of different deployment platforms and application scenarios. YOLOv8 abandons the earlier IoU-based or single-side ratio assignment methods and instead adopts the Task-Aligned Assigner strategy for positive and negative sample assignment. In this paper, we choose the small-scale YOLOv8-N and the large-scale YOLOv8-L for model training. After training, the performance of these two models on the verification set is comprehensively evaluated and compared.

YOLOv9 builds on the YOLOv8 network and focuses on the information loss that occurs in deep neural networks. Its design centers on the information bottleneck principle and reversible functions, which help YOLOv9 maintain high efficiency and accuracy. YOLOv9 introduces two key concepts: Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN). PGI addresses the information bottleneck problem by ensuring that essential information is preserved across the layers of a deep network. This yields reliable gradients, promotes accurate model updates, and improves overall detection performance. GELAN is an architectural advance that enables YOLOv9 to achieve superior parameter utilization and computational efficiency. Its design allows flexible integration of various computing blocks, so YOLOv9 can adapt to a wide range of applications without sacrificing speed or accuracy. We selected the small-scale YOLOv9-T and the large-scale YOLOv9-E for model training and compared their performance.

The architecture of YOLOv10 builds on the strengths of previous YOLO models. By eliminating non-maximum suppression (NMS) and optimizing various model components, it achieves state-of-the-art performance and significantly reduces computational overhead. The backbone network in YOLOv10 is responsible for feature extraction and uses an enhanced version of CSPNet (Cross Stage Partial Network) to improve gradient flow and reduce computational redundancy. We selected the small-scale YOLOv10-N and YOLOv10-S and the large-scale YOLOv10-X for model training and compared their performance.

For the detection of vehicles, tires, and tanks, the YOLO network is adapted by increasing the feature map size to improve detection. To handle targets of multiple and varying scales, a feature pyramid structure is adopted, with a feature fusion strategy that fuses deep abstract features with shallow specific features. Detection heads are applied to the different feature maps for regression and classification to obtain the detection results. To handle target occlusion in the video stream, temporal information from the preceding and following frames is fused to assist detection. In addition, the focal loss gamma setting and the random flip, Mosaic, and Mixup data augmentation methods are used to improve detection accuracy and stability.

3.7. Temperature Extraction Algorithm

Although the YOLO detector provides the bounding box coordinates of the tires and tanks, directly using the raw temperature matrix within the box is prone to errors. The rectangular bounding boxes inevitably include background pixels (e.g., road surface, air) which are colder than the target, and the thermal sensor may contain random high-temperature noise (hot pixels). To ensure robust quantitative monitoring, we propose a statistical extraction algorithm.

Let B be the detected bounding box defined by coordinates (x₁, y₁, x₂, y₂), and M be the raw radiometric temperature matrix of the thermal image. The set of temperature values S_{ROI} within the region of interest is defined as

S_{ROI} = \{ M(i, j) \mid x_1 \leq i \leq x_2, \; y_1 \leq j \leq y_2 \}

(1)

To eliminate the interference of the cold background included in the rectangular box, we first apply a filtering threshold T_{th}. Since the tires and tanks of hazardous chemical vehicles in operation exhibit significantly higher temperatures than the environment, pixels below the ambient temperature baseline are discarded. The filtered pixel set S_{obj} is

S_{obj} = \{ t \in S_{ROI} \mid t > T_{th} \}

(2)

Furthermore, to mitigate the impact of sensor noise, we avoid using the single maximum value. Instead, we adopt the Top-K% Average strategy. Let S'_{obj} be the subset containing the top k% (set to 10% in this study) highest temperature values from S_{obj}. The final representative temperature T_{final} is computed as

T_{final} = \frac{1}{|S'_{obj}|} \sum_{t \in S'_{obj}} t

(3)

This method effectively filters out background interference while maintaining sensitivity to high-temperature anomaly regions, ensuring that the output temperature accurately reflects the heat status of the vehicle components.
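Equations (1)–(3) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `extract_temperature`, the clamping of the box to the image, the rounding rule for the top-k% count (ceiling, with at least one pixel), and the `None` return for an empty filtered set are all assumptions. The temperature matrix is assumed to be stored row by row, so the index i of Equation (1) (horizontal, x) selects the column and j (vertical, y) selects the row.

```python
import math
from typing import Optional, Sequence, Tuple


def extract_temperature(
    matrix: Sequence[Sequence[float]],
    box: Tuple[int, int, int, int],
    t_th: float,
    top_fraction: float = 0.10,
) -> Optional[float]:
    """Top-K% average temperature inside a detection box.

    matrix: per-pixel temperatures in Celsius, indexed as matrix[y][x].
    box: (x1, y1, x2, y2) with inclusive pixel coordinates.
    t_th: ambient baseline; pixels at or below it are discarded.
    """
    x1, y1, x2, y2 = box
    height, width = len(matrix), len(matrix[0])
    # Clamp the box to the image bounds (assumption: boxes may overshoot).
    x1, x2 = max(0, x1), min(width - 1, x2)
    y1, y2 = max(0, y1), min(height - 1, y2)

    # Eq. (1): all temperature values inside the region of interest.
    s_roi = [matrix[y][x] for y in range(y1, y2 + 1) for x in range(x1, x2 + 1)]

    # Eq. (2): discard pixels not above the ambient threshold.
    s_obj = [t for t in s_roi if t > t_th]
    if not s_obj:
        return None  # nothing warmer than the background in this box

    # Eq. (3): mean of the top k% hottest remaining pixels.
    k = max(1, math.ceil(top_fraction * len(s_obj)))
    hottest = sorted(s_obj, reverse=True)[:k]
    return sum(hottest) / len(hottest)


# Toy 4 x 5 temperature matrix: a warm object on a 10 °C background.
frame = [
    [10, 10, 10, 10, 10],
    [10, 40, 50, 60, 10],
    [10, 45, 55, 65, 10],
    [10, 10, 10, 10, 10],
]
t_top50 = extract_temperature(frame, (0, 0, 4, 3), t_th=15.0, top_fraction=0.5)
```

In the toy example the box covers the whole frame, the threshold removes the 10 °C background, and the mean of the three hottest of the six remaining pixels (65, 60, 55) is 60 °C. A plain mean over the whole box would instead be pulled toward the background temperature.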

3.8. Evaluation Metrics

In this paper, seven models (YOLOv8-N, YOLOv9-T, YOLOv10-N, YOLOv10-S, YOLOv8-L, YOLOv9-E, and YOLOv10-X) are trained on the dataset introduced above for comparative analysis. Each model is trained for 100 epochs. The models are compared on Precision, Recall, mAP50, and mAP50–95 to select the one best suited to this dataset.

Precision measures the correctness of the positive predictions. It is defined as the proportion of true positives among all samples predicted as positive (i.e., predicted to contain the target).

\mathrm{Precision} = \frac{TP}{TP + FP}

(4)

In (4), TP (True Positive) is the number of correctly predicted positives, and FP (False Positive) is the number of incorrectly predicted positives.

Recall measures the ability of the model to detect all actual positive cases, that is, the proportion of actual positive cases that are correctly predicted as positive.

\mathrm{Recall} = \frac{TP}{TP + FN}

(5)

In (5), TP (True Positive) is the number of correctly predicted positive cases, and FN (False Negative) is the number of actual positive cases that the model missed.

mAP50 (mean Average Precision at an Intersection over Union threshold of 0.5) is a key indicator in object detection. It measures the average precision of the model when the IoU threshold is 0.5. IoU measures the degree of overlap between the predicted bounding box and the ground-truth bounding box. The average precision is computed for each category and then averaged over the categories to obtain mAP50.

mAP50–95 measures the performance of the detection model under different IoU thresholds. It averages the model's average precision over IoU thresholds ranging from 0.5 to 0.95.

Precision focuses on the correctness of the predictions, that is, on reducing false detections (FP). Recall focuses on the completeness of detection, that is, on reducing missed detections (FN). mAP50 balances precision and recall while taking into account the performance of the model across categories. mAP50–95 reflects the performance of the model under different matching strictness, which is important for evaluating its generalization ability in practical applications.
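The quantities behind these metrics can be sketched as follows. The snippet is a minimal illustration, not the evaluation code used in the study: the function names and the example counts are hypothetical, and the 0.05 step between IoU thresholds for mAP50–95 follows the common COCO convention, which the text does not state explicitly.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision(tp: int, fp: int) -> float:
    """Eq. (4): share of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Eq. (5): share of actual positives that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# IoU thresholds averaged by mAP50-95 (assumed COCO-style 0.05 step).
MAP_THRESHOLDS: List[float] = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# A prediction shifted by half a box width overlaps the ground truth by 1/3,
# so it does not count as a true positive at the mAP50 threshold of 0.5.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
is_match_at_50 = overlap >= 0.5

# Hypothetical counts: 90 correct detections, 10 false alarms, 30 misses.
p = precision(tp=90, fp=10)
r = recall(tp=90, fn=30)
```

With these hypothetical counts the precision is 0.9 and the recall is 0.75, which shows how a detector can raise few false alarms while still missing a quarter of the targets.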

3.9. Workflow of the Paper

As shown in Figure 6, the workflow of this study includes data acquisition, annotation, model training, and temperature extraction.

At present, there is no research on detecting the tanks and tires of moving vehicles using object detection. We first conducted pre-training using the public FLIR thermal infrared dataset and the visible light tire dataset. We then recorded several videos of vehicles entering the service area with a thermal imager and extracted key frames from these videos to form a thermal infrared dataset of vehicle tanks and tires. The key temperature detection areas, namely the vehicle itself and its tires and tanks, were annotated to address the poor generalization of detection models pre-trained on large general datasets when applied to the specific scenario of vehicle tire and tank detection. Seven YOLO models of different scales were trained and verified on our data, and their performance was analyzed. Based on the positions of the vehicle tank and tires in the image, the values in the corresponding temperature matrix are extracted, and the temperatures of the tank and tires in each image are finally determined.

4. Results

This section shows the experimental results of vehicle, tire, and tank temperature detection.

4.1. The Experimental Results of Thermal Infrared Vehicle Detection

The vehicle detection results on the FLIR infrared vehicle dataset are shown in Table 3 (training set) and Table 4 (validation set). On the validation set, the car class reaches a precision of 83.4% and a recall of 87.6%, so the model detects cars well. The other categories (bus, truck, other vehicle) are detected less reliably than the car class, which we attribute to insufficient training caused by the imbalanced number of samples. The total processing time per frame on the validation set is 6.9 ms, which meets the requirements of real-time analysis.

The visual results of infrared vehicle detection are shown in Figure 7, with the dataset annotation (ground truth) on the left and the model prediction on the right. The vast majority of vehicles are accurately localized and correctly classified. In a few cases the vehicle category is misclassified, but the detection box still matches the actual vehicle location, so the overall detection quality remains good.

4.2. Thermal Infrared Tire Test Results

The tire detection model is pre-trained on the visible-light tire dataset and tested on the infrared temperature performance dataset. The visual results are shown in Figure 8. The model pre-trained on visible-light data still detects tires well in infrared images. However, because the visible-light dataset is small, some tires are still missed.

4.3. Multi-Target Detection Results of Thermal Infrared Vehicles, Tanks and Tires

In this experiment, a total of 785 labeled thermal infrared frames were selected. To prevent data leakage and keep the evaluation independent, the dataset was de-duplicated: we used the Structural Similarity Index (SSIM) to remove highly similar adjacent frames from the original video sequences before splitting. From this filtered dataset, 549 frames were allocated to the training set and 236 frames to the validation set. The validation set contains a total of 2028 vehicle, tire, and tank labels. We chose precision (P), recall (R), mAP50, and mAP50–95 as the evaluation metrics. We compared the latest small-scale models (YOLOv8-N, YOLOv9-T, YOLOv10-N, YOLOv10-S) and large-scale models (YOLOv8-L, YOLOv9-E, YOLOv10-X) of the YOLO series.
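The SSIM-based de-duplication step can be illustrated with a short sketch. It is not the authors' implementation: it uses a simplified single-window (global) SSIM over flattened grayscale frames rather than the usual sliding-window variant, and the 0.95 threshold is a hypothetical value, since the threshold used in the study is not stated.

```python
def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM between two equally sized grayscale frames.

    a and b are flat sequences of pixel intensities. This is a simplified
    global variant; library implementations average SSIM over local windows.
    """
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def drop_similar_frames(frames, threshold=0.95):
    """Keep a frame only if it differs enough from the last kept frame.

    threshold is a hypothetical cut-off, not a value from the paper.
    """
    kept = []
    for frame in frames:
        if not kept or global_ssim(kept[-1], frame) < threshold:
            kept.append(frame)
    return kept
```

Comparing each frame against the last kept frame, rather than against its immediate predecessor, prevents a slow drift of near-identical frames from passing the filter one by one.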

The validation results are shown in Table 5. All seven models perform well on the validation set and can reliably detect vehicles, tires, and tanks. YOLOv8-L, YOLOv9-E, and YOLOv10-X are larger models, which achieve higher accuracy than the small models but also require more computing resources. Overall, YOLOv8 and YOLOv9 perform better than YOLOv10. The best models are YOLOv8-L and YOLOv9-E, whose results are similar (mAP50–95 of 0.828 and 0.829, respectively), but YOLOv8-L has fewer layers (268 versus 687) and a lower computational cost (164.8 versus 189.1 GFLOPs), which better suits our requirements for real-time detection.
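The selection reasoning can be expressed as a small sketch: among models whose mAP50–95 is within a tolerance of the best score, prefer the one with the lowest computational cost. The figures are copied from Table 5; the selection rule and the 0.005 tolerance are illustrative assumptions, not a procedure described by the authors.

```python
# (model name, mAP50-95 for class "all", GFLOPs), copied from Table 5
MODELS = [
    ("YOLOv8-N", 0.772, 8.1),
    ("YOLOv8-L", 0.828, 164.8),
    ("YOLOv9-T", 0.777, 7.6),
    ("YOLOv9-E", 0.829, 189.1),
    ("YOLOv10-N", 0.727, 8.2),
    ("YOLOv10-S", 0.782, 24.5),
    ("YOLOv10-X", 0.807, 169.8),
]


def pick_model(models, tolerance=0.005):
    """Return the cheapest model whose score is within tolerance of the best.

    tolerance is a hypothetical value chosen for illustration.
    """
    best_score = max(score for _, score, _ in models)
    candidates = [m for m in models if best_score - m[1] <= tolerance]
    return min(candidates, key=lambda m: m[2])[0]
```

Under these assumptions `pick_model(MODELS)` returns `"YOLOv8-L"`, while a tolerance of zero would return `"YOLOv9-E"`, which shows how closely the two models are matched.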

The loss function plays a key role in object detection. It is the optimization objective during model training and directly affects the performance of the model. Each subtask is learned through its corresponding loss function. IoU Loss is defined as the negative logarithm of the intersection-over-union ratio between the predicted box and the ground-truth box; in practice, however, it is often written as 1 − IoU. If the two boxes coincide, IoU equals 1 and the loss is 0. Since IoU takes values in [0, 1], the loss 1 − IoU also lies in [0, 1].
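Both forms of the IoU loss can be sketched in a few lines. Boxes are assumed to be given as `(x1, y1, x2, y2)` tuples with the upper-left corner first; the function names and the small `eps` guard against a zero IoU are illustrative choices.

```python
import math


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss_linear(a, b):
    """The commonly used form: 1 - IoU, which lies in [0, 1]."""
    return 1.0 - box_iou(a, b)


def iou_loss_log(a, b, eps=1e-7):
    """The logarithmic form: -ln(IoU), clamped to avoid log(0)."""
    return -math.log(max(box_iou(a, b), eps))
```

For two disjoint boxes the linear loss is exactly 1 regardless of how far apart the boxes are, which is the limitation that the GIoU and DIoU variants address.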

This paper analyzes the performance of six loss functions in the YOLO model: CIoU_Loss, GIoU_Loss, DIoU_Loss, EIoU_Loss, SIoU_Loss, and Focal_EIoU_Loss. GIoU considers not only the overlapping region but also the non-overlapping region within the smallest enclosing box, so it better reflects how the two boxes intersect. DIoU builds on IoU by directly regressing the Euclidean distance between the center points of the two boxes; by considering both the overlap area and the center-point distance, it addresses the slow convergence of GIoU. CIoU adds a further term to DIoU that enforces consistency of the aspect ratio between the predicted box and the ground-truth box. EIoU splits the aspect-ratio term of CIoU and explicitly measures the differences in three geometric factors: overlap area, center point, and side length. Focal_EIoU additionally introduces Focal loss to address the problem of imbalanced samples. SIoU_Loss takes into account the angle of the vector between the predicted box and the target box and redefines the angle penalty metric. It lets the prediction box move quickly to the nearest axis, after which only one coordinate (X or Y) needs to be regressed, which effectively reduces the total number of degrees of freedom.

YOLOv8-L, the better-performing model identified above, was tested with each loss function, and the results are shown in Table 6. The mAP50 and mAP50–95 of the six loss functions are similar. GIoU_Loss offers the best overall balance: it attains the highest mAP50 (0.981) with a precision of 0.953 and a recall of 0.938, whereas DIoU_Loss reaches the highest precision (0.962) at the cost of a lower recall (0.919). In summary, we consider YOLOv8-L with GIoU_Loss the best configuration; some of its detection results are shown in Figure 9. Figure 9a,b show two more complex scenes. In (a), the tanker is not facing the thermal imager directly, which makes the target scale vary. In (b), there are many vehicle targets with severe occlusion, caused by complex road conditions. The detection remains highly accurate in both of these more complex cases.
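The difference between GIoU and DIoU can be sketched as follows, using their standard published definitions (GIoU subtracts the fraction of the enclosing box not covered by the union; DIoU subtracts the squared center distance normalized by the squared enclosing-box diagonal). The helper names are illustrative, boxes are assumed to be `(x1, y1, x2, y2)` with positive area, and this is not the training code used in the study.

```python
def _overlap_terms(a, b):
    """Return IoU, union area, and enclosing-box width/height for two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    enc_w = max(a[2], b[2]) - min(a[0], b[0])  # smallest enclosing box
    enc_h = max(a[3], b[3]) - min(a[1], b[1])
    return inter / union, union, enc_w, enc_h


def giou_loss(a, b):
    """1 - GIoU: penalizes empty space inside the enclosing box."""
    iou, union, enc_w, enc_h = _overlap_terms(a, b)
    enc_area = enc_w * enc_h
    return 1.0 - (iou - (enc_area - union) / enc_area)


def diou_loss(a, b):
    """1 - DIoU: penalizes the normalized distance between box centers."""
    iou, _, enc_w, enc_h = _overlap_terms(a, b)
    dx = (a[0] + a[2]) / 2.0 - (b[0] + b[2]) / 2.0
    dy = (a[1] + a[3]) / 2.0 - (b[1] + b[3]) / 2.0
    center_dist_sq = dx * dx + dy * dy
    diagonal_sq = enc_w * enc_w + enc_h * enc_h
    return 1.0 - iou + center_dist_sq / diagonal_sq
```

Unlike the plain IoU loss, both values keep growing as two disjoint boxes move further apart, so the regression still receives a useful signal when the prediction does not overlap the target.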

The temperature detection program was deployed on the server and tested with the data in the temperature performance dataset; the results are shown in Figure 10. Each group of four values in the output tensor gives, in order, the coordinates (x₁, y₁) of the upper-left corner and the coordinates (x₂, y₂) of the lower-right corner of a detection box. Using these coordinates, the temperature values in the corresponding region are extracted from the temperature matrix, and the vehicle temperature is calculated and returned.
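The mapping from a detection box to the temperature matrix can be sketched as below. The sketch assumes the temperature matrix is a list of rows in degrees Celsius with the same resolution as the image, and it reports the maximum and mean of the region; which statistic and which outlier filtering the deployed program uses is not specified here, so these are illustrative choices, as is the function name.

```python
import math


def roi_temperature(temp_matrix, box):
    """Return (max, mean) temperature inside a detection box.

    temp_matrix: list of rows; temp_matrix[y][x] is the temperature at
                 pixel (x, y).
    box: (x1, y1, x2, y2), upper-left and lower-right corner in pixels.
         The lower-right corner is treated as exclusive.
    """
    height, width = len(temp_matrix), len(temp_matrix[0])
    # Convert to integer pixel indices and clamp to the matrix bounds.
    x1 = min(max(int(math.floor(box[0])), 0), width)
    y1 = min(max(int(math.floor(box[1])), 0), height)
    x2 = min(max(int(math.ceil(box[2])), 0), width)
    y2 = min(max(int(math.ceil(box[3])), 0), height)
    values = [temp_matrix[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    if not values:
        raise ValueError("detection box does not overlap the temperature matrix")
    return max(values), sum(values) / len(values)
```

Clamping the coordinates keeps boxes that extend slightly beyond the image border from raising an index error.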

4.4. Analysis of Model Effectiveness

Based on the comparative results presented in Table 5, we further analyzed the impact of two model components, namely model scale and architecture, on the detection performance. This analysis serves to identify which elements truly enhance the monitoring capability in thermal scenarios.

Impact of Model Scale (Depth and Width): By comparing the small-scale models (YOLOv8-N, v9-T, v10-N/S) with the large-scale models (YOLOv8-L, v9-E, v10-X), we observed that increasing the model depth and width improves the Mean Average Precision (mAP), most visibly mAP50–95. For instance, YOLOv8-L outperforms YOLOv8-N in mAP50–95 by 5.6 percentage points (0.828 versus 0.772). This indicates that for thermal images, where texture details are less distinct than in visible light, greater network capacity helps to capture the geometric features of distant tires and tanks.

Impact of Model Architecture: The comparison across YOLOv8, YOLOv9, and YOLOv10 isolates the influence of network architecture. Although YOLOv10 introduces a newer design, it did not show a decisive advantage over YOLOv8 and YOLOv9 in this specific dataset. YOLOv8-L and YOLOv9-E achieved similar top-tier performance, suggesting that the established feature extraction backbone of YOLOv8 is already highly robust for thermal object detection.

Conclusion: The analysis confirms that the performance gain is primarily driven by the model scale (Large) rather than the architectural shift to v10. Consequently, YOLOv8-L is selected as the optimal solution because it provides high accuracy comparable to v9-E but with a more mature deployment ecosystem and efficient inference speed (6.9 ms), satisfying the real-time requirements of the monitoring system.

5. Discussion

In this paper, a thermal imager is used to detect the temperature of dangerous goods vehicles and their tires and tanks in real time. At present, there are few studies on temperature detection of dangerous goods vehicles while driving. We use the public FLIR Free ADAS Thermal Dataset V2 to pre-train the vehicle detection model. The dataset contains 26,442 frames, from which we select the vehicle-related categories for training. The Side Profile Tires visible-light tire dataset, which contains 500 side-view tire images, was selected for pre-training the tire detector. However, no pre-labeled thermal infrared tire and tank dataset is available, so we labeled the key temperature detection regions (the vehicle itself, its tires, and its tank) ourselves, because detectors pre-trained on large general-purpose datasets generalize poorly to the specific scenario of vehicle tire and tank detection. Of the 785 labeled thermal infrared images, 549 frames form the training set and 236 frames form the validation set.

We select the latest YOLO series of small-scale models (YOLOv8-N, YOLOv9-T, YOLOv10-N, YOLOv10-S) and large-scale models (YOLOv8-L, YOLOv9-E, YOLOv10-X) for experiments. The experimental results are shown in Table 5. For YOLOv8 and YOLOv9, precision exceeds 0.9 for every target class and recall exceeds 0.9 for vehicles and tanks, while tire recall ranges from 0.874 to 0.916; the detection results are very satisfactory. The precision and recall of the two larger models are roughly one to two percentage points higher than those of the two smaller models, so either a small-scale model that favors efficiency or a large-scale model that favors accuracy would be an acceptable choice. Among them, YOLOv8-L has the best detection effect on the tank, which is our main focus. YOLOv9-E has the best detection effect on the vehicle class and higher precision for tire detection, but the proportion of false positives is also relatively high. The overall performance of YOLOv10 is not satisfactory, even for YOLOv10-X, the largest YOLOv10 variant tested (169.8 GFLOPs). To sum up, we choose YOLOv8-L as our detection model. We then experiment with different loss functions on the YOLOv8-L model and find that GIoU Loss gives the best overall balance of precision and recall.

Beyond detection accuracy, real-time performance is a critical prerequisite for safety monitoring systems in high-speed traffic environments. To verify the system’s efficiency, we analyzed the inference speed of the selected YOLOv8-L model. The experimental results indicate an average inference time of 6.9 ms per frame. Standard thermal monitoring cameras typically operate at a frame rate of 25 to 30 Frames Per Second (FPS), which corresponds to a time interval of approximately 33 ms to 40 ms per frame, so our model’s processing speed is significantly faster than the video input rate (6.9 ms < 33 ms). This margin leaves at least 26 ms of each frame cycle for temperature extraction and data transmission, so the system can complete all three steps within a single frame cycle without dropping frames. Consequently, the proposed method demonstrates high computational efficiency and meets the real-time requirements for online hazard monitoring in bridge and tunnel sections.
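The frame-budget argument reduces to simple arithmetic, sketched below with the 6.9 ms inference time and the 25 to 30 FPS range quoted above. The function names are illustrative.

```python
def frame_interval_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def budget_report(inference_ms: float, fps: float) -> dict:
    """Compare the per-frame inference time with the frame interval."""
    interval = frame_interval_ms(fps)
    return {
        "interval_ms": interval,
        "headroom_ms": interval - inference_ms,  # time left for other steps
        "utilization": inference_ms / interval,  # share of the frame cycle used
    }
```

At 25 FPS the interval is 40 ms and the headroom is 33.1 ms; at 30 FPS the interval is about 33.3 ms, so inference uses roughly a fifth of the frame cycle.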

Despite the promising performance of the proposed method, practical deployment in real-world scenarios presents certain challenges. First, radiometric calibration is essential for maintaining temperature measurement accuracy over time, as infrared sensors may drift due to ambient temperature fluctuations. Periodic calibration using blackbody references is recommended to mitigate this issue. Second, extreme weather conditions, such as heavy rainfall or dense fog, can attenuate infrared radiation, potentially affecting detection range and accuracy. Additionally, physical accumulation of snow or mud on the lens necessitates the implementation of robust protective housing and automated cleaning systems to ensure long-term reliability.

6. Conclusions and Future Work

In this paper, we utilized deep learning on thermal imagery to perform real-time temperature detection of vehicle tires and tanks. Specifically, the YOLO architecture was selected to meet real-time processing requirements. We evaluated seven recent YOLO models of different scales and compared several loss functions. The experimental results demonstrate that YOLOv8-L combined with GIoU Loss yields the best overall performance. By mapping the detected bounding boxes to the raw radiometric temperature matrix, we successfully achieved precise temperature monitoring.

Despite these encouraging results, several limitations must be acknowledged regarding both the measurement system and the experimental scope. First, regarding the measurement principle, the system exhibits a certain degree of sensor dependency and is subject to potential error sources, such as emissivity variations (particularly for metal tanks) and atmospheric attenuation. We also acknowledge the lack of physical temperature verification (ground truth), as intercepting moving vehicles for contact-based measurement is operationally unfeasible; thus, our method focuses on detecting relative thermal anomalies rather than absolute metrological precision. Second, regarding the dataset, the current study relies on a relatively small dataset dominated by open-road scenarios. Generalization to complex environments, such as tunnels or bridges, and robustness under varying weather conditions (e.g., rain, fog), require further validation.

Based on this, our future work will focus on the following three aspects:

Dataset Expansion: We will collect more thermal infrared images to enrich our dataset, specifically targeting complex scenarios like bridge and tunnel entrances as well as diverse weather conditions. This will allow us to rigorously test and improve the model’s generalization capabilities.

Model Exploration: In addition to YOLO, we will experiment with other deep learning architectures and backbones, such as GoogLeNet, ResNet, MobileNet, and NASNet, to further optimize the balance between accuracy and speed.

System Integration: We will integrate the detection and temperature extraction modules into a unified real-time warning system. Ultimately, the deployment of this technology holds significant potential for enhancing social and infrastructure safety. By enabling the early detection of thermal anomalies in hazardous chemical vehicles, the system serves as a proactive measure to prevent catastrophic accidents (e.g., fires or explosions), thereby safeguarding public lives and protecting critical transportation assets such as tunnels and bridges.

Author Contributions

Conceptualization, S.C. and X.Z.; methodology, Y.H. and J.L.; software, Y.H.; validation, C.M. and R.Z.; formal analysis, Y.H. and J.L.; investigation, C.M.; resources, X.Z.; data curation, Y.H.; writing—original draft preparation, Y.H.; writing—review and editing, S.C. and X.Z.; visualization, J.L.; supervision, S.C.; project administration, S.C.; funding acquisition, S.C. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (Grant Nos. 2022YFC3005602 and 2022YFC3005603) and the National Natural Science Foundation of China (Grant No. 51991394).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data, models, or code used during the study were generated by the authors and are not publicly available due to privacy and security restrictions. However, the data and related materials can be obtained from the corresponding author upon reasonable request.

Conflicts of Interest

Authors Yaoyao Hu and Chuanyi Ma were employed by the company Shandong Hi-Speed Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

YOLO	You Only Look Once
NMS	Non-Maximum Suppression
FPS	Frames Per Second
AP	Average Precision
IoU	Intersection over Union
mAP50	mean Average Precision at IoU = 0.5
mAP50–95	mean Average Precision averaged over IoU thresholds 0.50–0.95
TP	True Positive
FP	False Positive
FN	False Negative
GFLOPs	Giga Floating-Point Operations

References

Zhang, W.Q.; Sha, J. Research on fire detection of urban rail transit tunnel based on infrared thermal imaging technology. Sci. Innov. 2023, 79–81+84. (In Chinese)
Sun, D.X.; Yao, B. Study on the applicability of linear temperature-sensitive fire detection system for super-large section highway tunnel. Fire Sci. 2021, 30, 165–172. (In Chinese)
Chen, W.H. Research on Emergency Evacuation System of Subway Section Based on Image Processing. Master’s Thesis, Shenyang Aerospace University, Shenyang, China, 2015. (In Chinese)
Shen, D.D. Research on Fire Identification and Location Technology of Long Tunnel Based on Infrared Thermal Imaging. Master’s Thesis, Chang’an University, Chang’an, China, 2017. (In Chinese)
Robert, K. Night-time traffic surveillance: A robust framework for multi-vehicle detection, classification and tracking. In Proceedings of the 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance; IEEE: New York, NY, USA, 2009; pp. 1–6.
Chang, C.W.; Srinivasan, K.; Chen, Y.Y.; Cheng, W.-H.; Hua, K.-L. Vehicle detection in thermal images using deep neural network. In Proceedings of the 2018 IEEE Visual Communications and Image Processing (VCIP); IEEE: New York, NY, USA, 2018; pp. 1–4.
Bao, P.; Zhang, L.; Wu, X. Canny edge detection enhancement by scale multiplication. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1485–1490.
Maini, R.; Aggarwal, H. Study and comparison of various image edge detection techniques. Int. J. Image Process. (IJIP) 2009, 3, 1–11.
Owais, M.; Moussa, G.S.; Hussain, K.F. Robust Deep Learning Architecture for Traffic Flow Estimation from a Subset of Link Sensors. J. Transp. Eng. Part A Syst. 2020, 146, 04019055.
Mao, L.; Xie, M.; Huang, Y.; Zhang, Y. Preceding vehicle detection using histograms of oriented gradients. In Proceedings of the 2010 International Conference on Communications, Circuits and Systems (ICCCAS); IEEE: New York, NY, USA, 2010; pp. 354–358.
Li, P.; Abdel-Aty, M. Real-Time Crash Likelihood Prediction Using Temporal Attention-Based Deep Learning and Trajectory Fusion. J. Transp. Eng. Part A Syst. 2022, 148, 04022043.
Iwasaki, Y.; Misumi, M.; Nakamiya, T. Robust vehicle detection under various environmental conditions using an infrared thermal camera and its application to road traffic flow monitoring. Sensors 2013, 13, 7756–7773.
Morris, C.; Yang, J.J.; Chorzepa, M.G.; Kim, S.S.; Durham, S.A. Self-Supervised Deep Learning Framework for Anomaly Detection in Traffic Data. J. Transp. Eng. Part A Syst. 2022, 148, 04022020.
Assi, K.; Ratrout, N.; Nemer, I.; Rahman, S.M.; Jamal, A. Framework of Big Data and Deep Learning for Simultaneously Solving Space Allocation and Signal Timing Problem. J. Transp. Eng. Part A Syst. 2023, 149, 04022126.
Lan, W.; Dang, J.; Wang, Y.; Wang, S. Pedestrian detection based on YOLO network model. In Proceedings of the 2018 IEEE International Conference on Mechatronics and Automation (ICMA); IEEE: New York, NY, USA, 2018; pp. 1547–1551.
Amato, G.; Carrara, F.; Falchi, F.; Gennaro, C.; Meghini, C.; Vairo, C. Deep learning for decentralized parking lot occupancy detection. Expert Syst. Appl. 2017, 72, 327–334.
Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 142–158.
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 26 July 2017; pp. 936–944.
Deepika, H.C. An Overview of You Only Look Once: Unified, Real-Time Object Detection. Int. J. Res. Appl. Sci. Eng. Technol. 2020, 8, 607–609.
Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 6517–6525.
Redmon, J. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
Jocher, G.; Stoken, A.; Borovec, J.; NanoCode012; Stan, C.; Liu, C.; Laughing; Tkianai; Adam, H.; Mammana, L.; et al. ultralytics/yolov5: v3.1 - Bug Fixes and Performance Improvements; Zenodo: Geneva, Switzerland, 2020.
Ge, Z.; Liu, S.; Wang, F.; Li, Z. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430.
Li, C.; Li, L.; Jiang, H.; Weng, K. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
Long, X.; Deng, K.; Wang, G.; Zhang, Y. PP-YOLO: An effective and efficient implementation of object detector. arXiv 2020, arXiv:2007.12099.
Huang, X.; Wang, X.; Lv, W.; Bai, X. PP-YOLOv2: A practical object detector. arXiv 2021, arXiv:2104.10419.
Xu, S.; Wang, X.; Lv, W.; Chang, Q. PP-YOLOE: An evolved version of YOLO. arXiv 2022, arXiv:2203.16250.
Sohan, M.; Sai Ram, T.; Rami Reddy, C.V. A review on yolov8 and its advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics; Springer: Singapore, 2024; pp. 529–545.
Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning what you want to learn using programmable gradient information. arXiv 2024, arXiv:2402.13616.
Wang, A.; Chen, H.; Liu, L.; Chen, K. YOLOv10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458.
Rasshofer, R.H.; Spies, M.; Spies, H. Influences of weather phenomena on automotive laser radar systems. Adv. Radio Sci. 2011, 9, 49–60.
Pangkreung, S.; Pijarnvanit, N.; Yodrak, N.; Janya-Anurak, C. Automated Non-Contact Temperature Measurement System for Truck Tires and Brakes Using Thermal Imaging and YOLO. In Proceedings of the 2024 19th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP); IEEE: New York, NY, USA, 2024; pp. 1–5.
He, Y.; Wang, Y.; Wu, F.; Yang, R.; Wang, P.; She, S.; Ren, D. Temperature monitoring of vehicle brake drum based on dual light fusion and deep learning. Infrared Phys. Technol. 2023, 133, 104823.
Han, K.; Tang, B.; Liu, D. FSA-FCA: Fourier semantic augmentation with attention for enhanced vehicle temperature monitoring in roadside thermal imaging. Infrared Phys. Technol. 2025, 153, 106305.
Jiang, Z.; Zhang, C.; Xu, Z.; Song, W. Gas Leak Detection and Leakage Rate Identification in Underground Utility Tunnels Using a Convolutional Recurrent Neural Network. Appl. Sci. 2025, 15, 8022.

Figure 1. (a) Daytime thermal infrared image, (b) Nighttime thermal infrared image.

Figure 2. The FLIR Free ADAS Thermal Dataset v2 dataset.

Figure 3. Side Profile Tires visible light tire inspection dataset.

Figure 4. Thermal infrared image vehicle, tire, and tank labeling. The yellow, green, and red boxes indicate the labeled bounding boxes for vehicles, tires, and tanks, respectively.

Figure 5. Thermal infrared image temperature labeling results.

Figure 6. Overview of the process.

Figure 7. Visualization results of thermal infrared vehicle detection.

Figure 8. The test results of the tire detection model on the infrared dataset.

Figure 9. YOLOv8-L vehicle, tire, and tank test results: (a) detection result of test sample 1; (b) detection result of test sample 2.

Figure 10. Temperature detection test program running results.

Table 1. Comparison of recent thermal imaging-based studies for vehicle safety monitoring and temperature readout (2022–2025).

Study (Year)	Scenario & Targets	Method	Temperature Readout	Real-Time Evidence	Key Limitations
Automated non-contact temperature measurement for truck tires and brakes (2024)	Roadside truck; tires & brakes	Thermal imaging + YOLO detection + KCF tracking	Non-contact ROI temperature from thermal image	0.165 s/frame with tracking (vs. ~1.08 s/frame without)	Component scope limited to tires/brakes; temperature affected by geometry/emissivity
Vehicle brake drum temperature monitoring via dual-light fusion (2023)	Roadside; wheel/brake region	Visible-thermal fusion + YOLOv5 + segmentation	Automatic extraction of wheel temperature (temperature field)	All-weather roadside monitoring (latency not stated)	Fusion/calibration details limited; focuses on wheel/brake
FSA-FCA for vehicle temperature monitoring in roadside thermal imaging (2025)	Roadside/tunnel; vehicle hotspots for fire-risk monitoring	Fourier semantic augmentation + attention integrated into YOLOv8	Hotspot/anomaly detection from thermal imagery (vehicle-level)	Improved recall/F1/mAP50 vs. baselines (deployment details not stated)	Component-level temperature readout not emphasized; sensor/scene dependence
This work	Bridge-tunnel sections; tire & tank	Thermal camera + YOLO-based detection + temperature extraction from radiometric matrix	ROI temperature (pixel-level) within detected tire/tank regions	Designed for real-time monitoring (see Results for runtime)	Distance/angle/reflection/emissivity may bias; sensor dependency

Table 2. Label distribution of the public infrared dataset.

Label	Train	Val
person	50,478	4470
bike	7237	170
car	73,623	7133
motor	1116	55
bus	2245	179
train	5	0
truck	829	46
light	16,198	2005
hydrant	1095	94
sign	20,770	2472
dog	4	0
deer	8	0
skateboard	29	3
stroller	15	6
scooter	15	0
other vehicle	1373	63
Total	175,040	16,696

Table 3. Vehicle detection training set results.

Class	Images	Instances	Precision	Recall	mAP50	mAP50–95
all	10,741	78,054	0.889	0.809	0.895	0.698
car	10,741	73,608	0.905	0.836	0.923	0.714
bus	10,741	2244	0.902	0.86	0.934	0.736
truck	10,741	829	0.87	0.793	0.888	0.741
other	10,741	1373	0.877	0.747	0.836	0.602

Table 4. Vehicle detection validation set results.

Class	Images	Instances	Precision	Recall	mAP50	mAP50–95
all	1143	7394	0.623	0.561	0.592	0.446
car	1143	7109	0.834	0.876	0.922	0.725
bus	1143	176	0.798	0.631	0.753	0.599
truck	1143	46	0.442	0.5	0.467	0.339
other	1143	63	0.416	0.238	0.228	0.119

Table 5. Detection results of deep learning networks.

Detector	Class	Images	Instances	Precision	Recall	mAP50	mAP50–95
YOLOv8-N	all	236	2028	0.933	0.929	0.971	0.772
	vehicle	236	559	0.94	0.957	0.982	0.824
	tank	187	208	0.925	0.957	0.961	0.77
	tyre	232	1261	0.936	0.874	0.97	0.724
YOLOv8-N summary (fused): 168 layers, 3,006,233 parameters, 0 gradients, 8.1 GFLOPs
YOLOv8-L	all	236	2028	0.944	0.939	0.979	0.828
	vehicle	236	559	0.958	0.95	0.987	0.879
	tank	187	208	0.938	0.952	0.97	0.814
	tyre	232	1261	0.938	0.916	0.979	0.79
YOLOv8-L summary (fused): 268 layers, 43,608,921 parameters, 0 gradients, 164.8 GFLOPs
YOLOv9-T	all	236	2028	0.932	0.922	0.969	0.777
	vehicle	236	559	0.94	0.952	0.977	0.82
	tank	187	208	0.92	0.938	0.961	0.784
	tyre	232	1261	0.937	0.878	0.971	0.726
YOLOv9-T summary (fused): 486 layers, 1,971,369 parameters, 0 gradients, 7.6 GFLOPs
YOLOv9-E	all	236	2028	0.948	0.942	0.978	0.829
	vehicle	236	559	0.968	0.961	0.989	0.878
	tank	187	208	0.928	0.966	0.966	0.819
	tyre	232	1261	0.947	0.898	0.979	0.791
YOLOv9-E summary (fused): 687 layers, 57,378,713 parameters, 0 gradients, 189.1 GFLOPs
YOLOv10-N	all	236	2028	0.857	0.835	0.919	0.727
	vehicle	236	559	0.849	0.83	0.917	0.769
	tank	187	208	0.857	0.837	0.911	0.72
	tyre	232	1261	0.865	0.84	0.929	0.691
YOLOv10-N summary (fused): 285 layers, 2,695,586 parameters, 0 gradients, 8.2 GFLOPs
YOLOv10-S	all	236	2028	0.886	0.891	0.954	0.782
	vehicle	236	559	0.871	0.907	0.956	0.827
	tank	187	208	0.895	0.875	0.947	0.774
	tyre	232	1261	0.892	0.892	0.96	0.745
YOLOv10-S summary (fused): 293 layers, 8,037,282 parameters, 0 gradients, 24.5 GFLOPs
YOLOv10-X	all	236	2028	0.935	0.882	0.96	0.807
	vehicle	236	559	0.926	0.875	0.962	0.848
	tank	187	208	0.941	0.885	0.948	0.794
	tyre	232	1261	0.939	0.886	0.971	0.779
YOLOv10-X summary (fused): 503 layers, 31,589,858 parameters, 0 gradients, 169.8 GFLOPs

Table 6. Experimental results of YOLOv8-L using different loss functions.

YOLOv8-L	Class	Images	Instances	Precision	Recall	mAP50	mAP50–95
CIoU_Loss	all	236	2028	0.944	0.939	0.979	0.828
	vehicle	236	559	0.958	0.95	0.987	0.879
	tank	187	208	0.938	0.952	0.97	0.814
	tyre	232	1261	0.938	0.916	0.979	0.79
GIoU_Loss	all	236	2028	0.953	0.938	0.981	0.829
	vehicle	236	559	0.967	0.953	0.989	0.883
	tank	187	208	0.939	0.971	0.974	0.811
	tyre	232	1261	0.952	0.89	0.979	0.791
DIoU_Loss	all	236	2028	0.962	0.919	0.98	0.827
	vehicle	236	559	0.969	0.943	0.988	0.876
	tank	187	208	0.951	0.942	0.971	0.82
	tyre	232	1261	0.965	0.873	0.979	0.786
EIoU_Loss	all	236	2028	0.946	0.939	0.978	0.83
	vehicle	236	559	0.95	0.953	0.986	0.877
	tank	187	208	0.94	0.966	0.967	0.822
	tyre	232	1261	0.948	0.899	0.981	0.792
SIoU_Loss	all	236	2028	0.948	0.938	0.978	0.83
	vehicle	236	559	0.958	0.953	0.987	0.88
	tank	187	208	0.947	0.947	0.97	0.819
	tyre	232	1261	0.938	0.913	0.979	0.789
Focal_EIoU_Loss	all	236	2028	0.954	0.93	0.979	0.823
	vehicle	236	559	0.97	0.948	0.985	0.873
	tank	187	208	0.946	0.947	0.972	0.812
	tyre	232	1261	0.946	0.895	0.98	0.785
