Article

Fast Helmet Detection in Low-Resolution Surveillance via Super-Resolution and ROI-Guided Inference

Taiming He, Ziyue Wang and Lu Yang
School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(2), 967; https://doi.org/10.3390/app16020967
Submission received: 29 November 2025 / Revised: 12 January 2026 / Accepted: 14 January 2026 / Published: 17 January 2026
(This article belongs to the Section Transportation and Future Mobility)

Abstract

Reliable detection of safety helmets is essential for ensuring personnel protection in large-scale outdoor operations. However, recognition becomes difficult when monitoring relies on low-resolution or compressed video streams captured by fixed or mobile platforms such as UAVs—conditions commonly encountered in intelligent transportation and urban surveillance. This study proposes a super-resolution-enhanced detection framework that integrates video super-resolution with ROI-guided inference to improve the visibility of small targets while reducing computational cost. Focusing on a single, carefully selected VSR module (BasicVSR++), the framework achieves an F1-score of 0.904 in helmet detection across multiple low-quality surveillance scenarios. This demonstrates the framework’s effectiveness for robust helmet monitoring in low-resolution and compressed surveillance scenarios.

1. Introduction

Ensuring the safety of field workers is a fundamental requirement in industrial operations and transportation infrastructure maintenance [1]. Outdoor tasks—such as road construction, bridge and tunnel inspection, utility facility maintenance, and unmanned aerial vehicles (UAV)-assisted field operations—often expose personnel to hazardous environments. Safety helmets play a crucial role in preventing head injuries [2], yet monitoring their proper usage remains challenging in large and dynamic worksites.
Automated video-based safety monitoring systems have been increasingly adopted to improve management efficiency [3]. In intelligent transportation–related environments, surveillance relies on fixed cameras, mobile platforms, and UAVs to cover extensive outdoor areas [4]. However, video captured in these real-world settings frequently suffers from low spatial resolution, motion blur, and compression artifacts due to long-distance imaging, platform vibration, or bandwidth limitations. Beyond these data-quality degradations, the perceptual performance of vision systems is further challenged by inherent sensor limitations [5] (e.g., noise, dynamic range) and adverse weather conditions [6] (e.g., fog, rain) that introduce additional image distortions. The resulting visual degradation significantly reduces the detectability of small safety-critical objects such as helmets, leading to unreliable monitoring performance.
Although modern deep learning–based detectors perform well on high-quality datasets, their accuracy drops considerably when applied to low-quality video streams [7,8]. Existing enhancement-based and lightweight detection approaches offer partial improvements but often fail to restore sufficient structural detail for small-object recognition or cannot achieve the computational efficiency required for real-time deployment on resource-constrained platforms. This challenge is actively being addressed by the research community, with the development roadmaps of future detection models (e.g., the anticipated model YOLOv26 [9]) explicitly targeting efficient inference on embedded edge devices. While promising, these advancements underscore the persistent and critical nature of the efficiency problem that this work aims to solve with currently available technologies.
To address these limitations, this paper proposes a super-resolution–enhanced detection framework designed for visually degraded outdoor environments. The framework incorporates a video super-resolution (VSR) module to enhance fine-grained visual cues of helmets and employs a region-of-interest (ROI) guided strategy to reduce redundant computation. This combination improves recognition reliability while maintaining near real-time performance suitable for UAVs and other mobile monitoring systems.
The proposed method is evaluated on diverse low-quality surveillance videos captured from representative outdoor work environments. These scenarios share the common challenge of visually degraded imagery, making them relevant to intelligent transportation infrastructure inspection and field-operation monitoring. Experimental results demonstrate significant improvements in detection confidence and recall, validating the effectiveness of the proposed framework for safety-critical monitoring tasks.

2. Related Work

2.1. Traditional and CNN-Based Object Detection

Object detection has experienced rapid development alongside advances in deep learning. Earlier approaches relied on handcrafted features—such as Haar-like descriptors [10], Histogram of Oriented Gradients (HOG) [11], and Local Binary Patterns (LBP) [12]—combined with conventional classifiers including Support Vector Machines (SVMs) [13] or Multi-Layer Perceptrons (MLPs) [14]. While these methods achieved reasonable results in controlled environments, they suffered from limited adaptability and degraded performance on low-resolution or noisy images.
The advent of Convolutional Neural Networks (CNNs) fundamentally transformed detection pipelines. Two-stage frameworks, including Region-based Convolutional Neural Network (R-CNN) [15] and Fast R-CNN, first propose candidate regions before classification, achieving high accuracy but relatively slow inference. In contrast, one-stage architectures, such as the You Only Look Once (YOLO) series [16], integrate detection and classification in a single step, greatly improving efficiency. Successive YOLO versions have incorporated stronger backbone networks and advanced feature fusion strategies, improving performance on small targets; however, results remain sensitive to image quality.

2.2. Image and Video Super-Resolution for Detection Enhancement

To mitigate the sensitivity of detectors to low-quality inputs, image and video super-resolution has been explored as a complementary technique. Classical interpolation-based approaches often struggle to recover fine details, whereas learning-based methods—such as Super-Resolution Convolutional Neural Network (SRCNN) [17], Very Deep Super-Resolution (VDSR) [18], and residual-based multi-scale networks [19]—learn mappings from low- to high-resolution images, achieving more faithful spatial reconstruction. In video contexts, Basic Video Super-Resolution++ (BasicVSR++) [20] leverages second-order grid propagation and flow-guided deformable alignment to exploit temporal continuity, producing sharper and more consistent super-resolved frames. These methods are particularly valuable in industrial monitoring, where small targets like helmets in compressed or blurred footage are difficult to identify.

2.3. Helmet Detection in Complex and UAV-Based Scenarios

Helmet detection has been extensively studied due to its significance in occupational safety. Traditional methods combined handcrafted features with simple classifiers, such as Haar-like features with Circle Hough Transform [21]. Modern approaches predominantly rely on CNN-based detectors. For example, RBFPDet [22] treats helmet detection as semantic feature point localization, while EH-DETR [23] integrates re-parameterized blocks and channel attention to improve small-object detection.
Applications have also expanded to UAV-based and harsh environments. Liang and Seo [24] proposed a UAV-based low-altitude helmet inspection system, while Kumari et al. [25] developed an intelligent vision system integrating thermal imaging and CNNs for mining operations under fog. Huang et al. [26] enhanced YOLOv3 for helmet detection, optimizing feature maps and prior dimensions. Other studies have focused on improved YOLO variants for helmet detection [27,28,29,30,31,32], demonstrating adaptability across various industrial scenarios. Despite these advances, performance deteriorates under low-resolution or heavily compressed inputs, motivating the integration of super-resolution and efficient detection strategies [33].

3. Method

3.1. Helmet Detection Framework

The performance of a standard YOLOv8 model was first evaluated on a dedicated helmet dataset, where it achieved a high mAP@0.5 of 0.928 under standard-resolution conditions. However, as shown in Figure 1, its performance dropped markedly when applied to original, low-quality frames from long-distance cameras or UAV streams, with many helmets either missed or detected with low confidence. This empirical observation clearly identified the limitation of direct detection in visually degraded surveillance and motivated the development of super-resolution-enhanced pipelines. Ultimately, this led to the formulation of the ROI-guided strategy, which is designed to ensure both high accuracy and real-time applicability.
To achieve accurate and efficient helmet detection in low-quality videos, a progressive framework is proposed that integrates video super-resolution with object detection through three distinct yet complementary pipelines, as illustrated in Figure 2. The design and selection of these pipelines are based on an analysis of the hardware and imaging conditions encountered in practical surveillance. The core challenge stems from videos typically captured by long-distance surveillance cameras or UAVs. For instance, in the experiments, videos were recorded at an outdoor construction site with a resolution of 4K (3840 × 2160 pixels) at 30 fps, where the camera-to-target distance was approximately 210 m, resulting in workers’ heights averaging around 150 pixels and helmets occupying only about 10 × 10 pixels. Under such conditions, the severe feature degradation necessitates specialized processing strategies.
Figure 2 delineates the workflow and the two core image representations involved. $I^{LR}$ (Low-Resolution Image) refers to the original input captured by long-distance surveillance or UAVs. Due to transmission bandwidth constraints or optical resolution limits of the lens, the target objects occupy an extremely small pixel area (e.g., approximately 10 × 10 pixels). This results in severe feature degradation, making the targets undetectable by standard object detectors. Conversely, $I^{SR}$ (Super-Resolution Image) refers to the high-resolution image restored via the Super-Resolution module. By compensating for pixels in the Region of Interest, the target size is significantly increased (e.g., to 60 × 60 pixels or larger). This process restores the structural features and edge details necessary for the detector to function accurately.
The framework comprises three processing pathways. Pipeline A (Direct Detection) serves as the baseline, applying the YOLOv8 helmet detector directly to $I^{LR}$ frames, highlighting the inherent performance limitation of modern detectors on degraded streams. Pipeline B (Full-Frame Super-Resolution) addresses the quality issue by first using BasicVSR++ to enhance the entire video frame into $I^{SR}$, which is then fed to the detector. This pipeline validates the upper bound of accuracy improvement achievable through visual enhancement but incurs a prohibitively high computational cost for full-frame processing, making it unsuitable for real-time applications. To bridge the gap between the accuracy of Pipeline B and the efficiency needs of mobile platforms, Pipeline C (ROI-Guided Super-Resolution) introduces a novel two-stage strategy. It first employs a person detector on the $I^{LR}$ frame to locate workers. The resulting bounding boxes are used to crop head regions as Regions of Interest. Only these semantically critical ROIs are subsequently super-resolved by BasicVSR++ before the final helmet detection. This design strategically focuses computational resources on areas most likely to contain helmets, aiming to recover essential details while drastically reducing redundant processing on non-informative background regions.
The core motivation for introducing Pipeline C is to overcome the computational bottleneck of full-frame VSR, making super-resolution-enhanced detection feasible for near real-time applications such as UAV and edge surveillance.
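The following minimal Python sketch illustrates the ROI-guided flow of Pipeline C. It assumes the Ultralytics YOLOv8 API, a placeholder `super_resolve` wrapper around the VSR model, a hypothetical fine-tuned weight file name, and a simple top-quarter head-crop heuristic; it is an illustration of the strategy, not the authors' exact implementation, and omits the temporal batching of ROIs across frames that a real BasicVSR++ call would require.

```python
# Sketch of Pipeline C (ROI-guided super-resolution); the weight path
# "helmet_yolov8n.pt" and the top-quarter head-crop rule are assumptions.
from ultralytics import YOLO

person_detector = YOLO("yolov8n.pt")         # stage 1: locate workers on the LR frame
helmet_detector = YOLO("helmet_yolov8n.pt")  # stage 2: helmet/head detector (hypothetical fine-tuned weights)

def detect_helmets_roi_guided(frame_lr, super_resolve):
    """Detect persons, crop head ROIs, super-resolve only the ROIs, then detect helmets."""
    detections = []
    persons = person_detector(frame_lr, classes=[0], verbose=False)[0]  # COCO class 0 = person
    for box in persons.boxes.xyxy.cpu().numpy():
        x1, y1, x2, y2 = box.astype(int)
        head_h = max(1, (y2 - y1) // 4)           # assume the head sits in the top quarter of the body box
        roi_lr = frame_lr[y1:y1 + head_h, x1:x2]  # crop the head region from the low-resolution frame
        roi_sr = super_resolve(roi_lr)            # VSR (e.g., BasicVSR++) applied to the ROI only
        detections.append(helmet_detector(roi_sr, verbose=False)[0])
    return detections
```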

3.2. Direct Super-Resolution Preprocessing

The initial approach validates the potential of super-resolution in enhancing small-object detection under low-quality surveillance conditions. From the perspective of Convolutional Neural Networks like the YOLO series, increasing the pixel size of a target addresses two core limitations: Feature Vanishing and Receptive Field Mismatch. In YOLO-based architectures, feature maps undergo progressive downsampling. Let the total downsampling factor be S (e.g., S = 32 for YOLOv8):
$W_{\mathrm{feat}} = \frac{W_{\mathrm{img}}}{S}, \qquad H_{\mathrm{feat}} = \frac{H_{\mathrm{img}}}{S}$
A small target (e.g., 10 × 10 pixels) may shrink to sub-pixel size in deep layers, causing its signal to vanish before semantic extraction. Super-resolution mitigates this by preserving a viable feature area (e.g., ~2 × 2 pixels after S = 32 downsampling). Furthermore, a standard 3 × 3 convolutional kernel covering a minuscule target captures predominantly background content, diluting intra-class features. Enlarging the target aligns it better with the kernel’s receptive field, defined recursively as:
$RF_{l} = RF_{l-1} + (k - 1) \times \prod_{i=1}^{l-1} s_{i}$
allowing for cleaner feature extraction of patterns like helmet contours.
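As a quick numeric check of the two relations above, the snippet below computes the deep-feature footprint of a small target after a total stride of S = 32 and evaluates the receptive-field recursion for a generic stack of 3 × 3 convolutions; the five-stage, stride-2 schedule is illustrative rather than YOLOv8's exact backbone configuration.

```python
# Numeric check of the downsampling and receptive-field relations; the
# five-stage stride-2 stack of 3x3 convolutions is illustrative, not
# YOLOv8's exact backbone.
def feature_size(img_size, total_stride=32):
    return img_size / total_stride

def receptive_field(kernels, strides):
    rf, jump = 1, 1
    for k, s in zip(kernels, strides):
        rf += (k - 1) * jump   # RF_l = RF_{l-1} + (k-1) * prod(s_1..s_{l-1})
        jump *= s
    return rf

print(feature_size(10))                   # ~0.31 px: a 10x10 helmet vanishes after /32
print(feature_size(60))                   # ~1.9 px: still marginal, motivating the tiling in Section 3.3
print(receptive_field([3] * 5, [2] * 5))  # 63 px for the illustrative stack
```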
Finally, SR enhances the effective Signal-to-Noise Ratio (SNR) of the target. In the LR domain, target features can be commensurate with sensor noise. SR leverages spatio-temporal priors to amplify discriminative features, which directly strengthens the model’s confidence score:
$C = P(\mathrm{Object}) \times \mathrm{IoU}_{\mathrm{pred}}^{\mathrm{truth}}$
through more precise localization (higher $\mathrm{IoU}$) and a stronger object presence probability $P(\mathrm{Object})$ due to recovered details.
In this framework, BasicVSR++ [20] is employed as the core enhancement module due to its proven capability in recovering temporally consistent details, which is crucial for video-based monitoring. While lightweight or edge-oriented SR alternatives like ESPCN or FSRCNN offer superior inference speed and lower computational cost, they are primarily designed for single-image restoration and lack explicit temporal modeling. In video surveillance, consistency across frames is critical to suppress flickering artifacts and stabilize detection results over time. BasicVSR++’s bidirectional recurrent structure, with dedicated propagation and alignment modules, explicitly aggregates information across multiple frames. This architecture effectively reduces inter-frame noise and compensates for motion blur—both of which are common challenges in dynamic surveillance scenes. Its use of a hidden state $\hat{h}_t$ to propagate information facilitates the learning of long-range dependencies, which is essential for recovering coherent details in sequential frames. However, the model’s computational load is non-negligible. For instance, applying BasicVSR++ to entire 4K frames (3840 × 2160 pixels) resulted in GPU memory overflow on the experimental platform, necessitating the ROI-guided processing strategy described in Section 3.3. The selection of BasicVSR++ is justified by its inherent design advantages for video-based enhancement, despite its higher computational complexity. Rather than developing a new SR model, the contribution lies in the strategic application of this advanced technique specifically for boosting helmet detection performance.
Formally, let $I_t^{LR}$ denote the low-resolution frame at time t, and $I_t^{SR}$ the reconstructed high-resolution frame. In keeping with the notation in BasicVSR++, the frame is generated by:
$\hat{h}_t = f_{\mathrm{prop}}\left(I_t^{LR}, h_{t-1}\right), \qquad I_t^{SR} = f_{\mathrm{recon}}\left(\hat{h}_t\right)$
where $f_{\mathrm{prop}}$ is the propagation module, $f_{\mathrm{recon}}$ is the reconstruction module, and $h_{t-1}$ is the hidden state from the previous frame.
This framework is trained to minimize the discrepancy between the super-resolved output and the ground-truth high-resolution frame. The training loss is the pixel-wise L1 loss, ensuring the super-resolved frames are close to the high-resolution ground truth $I_i^{HR}$:
$\mathcal{L}_{SR} = \frac{1}{N} \sum_{i=1}^{N} \left\lVert I_i^{SR} - I_i^{HR} \right\rVert_1$
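A schematic PyTorch sketch of this recurrence and the L1 objective is given below; the two small convolutions merely stand in for BasicVSR++'s propagation/alignment and reconstruction blocks and do not reproduce the actual architecture.

```python
# Schematic PyTorch sketch of the recurrence and the L1 loss; the two small
# convolutions stand in for f_prop and f_recon, not BasicVSR++'s real blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRecurrentVSR(nn.Module):
    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.scale = scale
        self.channels = channels
        self.f_prop = nn.Conv2d(3 + channels, channels, 3, padding=1)        # stand-in for f_prop
        self.f_recon = nn.Conv2d(channels, 3 * scale * scale, 3, padding=1)  # stand-in for f_recon

    def forward(self, lr_seq):                       # lr_seq: (T, 3, H, W)
        t_len, _, h, w = lr_seq.shape
        hidden = lr_seq.new_zeros(1, self.channels, h, w)
        outputs = []
        for t in range(t_len):
            x = torch.cat([lr_seq[t:t + 1], hidden], dim=1)
            hidden = torch.relu(self.f_prop(x))                                 # h_hat_t = f_prop(I_t^LR, h_{t-1})
            outputs.append(F.pixel_shuffle(self.f_recon(hidden), self.scale))   # I_t^SR = f_recon(h_hat_t)
        return torch.cat(outputs, dim=0)

model = TinyRecurrentVSR()
lr_seq = torch.rand(8, 3, 64, 64)        # eight low-resolution frames
hr_seq = torch.rand(8, 3, 256, 256)      # matching ground-truth frames (4x larger)
loss = F.l1_loss(model(lr_seq), hr_seq)  # L_SR: mean |I^SR - I^HR|
```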
In the target surveillance scenarios, workers’ helmets often occupy only a few pixels and exhibit weak contrast against complex backgrounds. By recovering fine-grained visual cues through super-resolution, the method provides more discriminative input for the subsequent detector. This enables YOLO to capture finer gradients, contours, and textures essential for accurate classification and localization, thereby reducing false negatives and improving overall detection robustness.
Furthermore, the temporal recurrence in BasicVSR++ enables information aggregation across frames, which helps suppress noise and stabilize the appearance of small objects throughout the video sequence. This property is particularly valuable in surveillance applications where motion blur or compression artifacts frequently cause detection instability. The combined effect of spatial detail recovery and temporal consistency significantly improves the reliability of helmet detection in challenging environments.

3.3. ROI-Guided Super-Resolution and Detection

Since full-frame super-resolution involves processing a large number of pixels per frame, the inference time grows significantly with frame size. When applied to continuous surveillance video, this approach introduces considerable latency, which limits the feasibility of real-time deployment in outdoor operational scenarios. To address this issue, a two-stage processing scheme was designed to reduce redundant computation. In the first stage, YOLOv8 was applied to the original surveillance frames to detect workers, whose larger body size makes them easier to localize than helmets. The detected bounding boxes were then used to define candidate ROIs. Rather than performing super-resolution on the entire frame, BasicVSR++ was applied only to these cropped regions, focusing computational effort on areas most relevant to helmet analysis. The super-resolved ROIs were subsequently processed by YOLOv8 for helmet detection. This design effectively reduces the workload of the super-resolution module while maintaining detection accuracy in critical regions.
However, the cropped ROIs are often smaller than the typical input size expected by YOLO, which can degrade detection performance due to insufficient feature resolution. A preliminary experiment confirmed that feeding small ROIs (e.g., 60 × 60 pixels) directly to YOLOv8 resulted in unstable detections and missed helmets due to the drastic downsampling in deep layers. To mitigate this issue, as shown in Figure 3, the ROIs can be repeated and tiled to form a grid-like image before inference. This tiling strategy aims to address two challenges: (1) It enlarges the effective input size to match the network’s downsampling architecture, preserving more spatial features after convolution and pooling operations. (2) It implicitly increases the spatial support for small objects without altering the detector’s receptive field design. Compared to simply resizing a small ROI to a larger resolution—which introduces interpolation artifacts and blur—the tiling approach maintains the original high-frequency details recovered by SR while providing YOLO with sufficient spatial context for feature extraction. This tiling approach effectively enlarges the input area, allowing YOLO to extract more spatial features from the small objects and increasing the detector’s sensitivity to helmets. By preserving relative spatial relationships within the ROI while providing a larger receptive field, this strategy improves the likelihood of accurate detection without increasing the overall computational load significantly.
To balance feature preservation and computational efficiency, different grid sizes were systematically evaluated. Given that the target helmet size after super-resolution is approximately 60 × 60 pixels and YOLOv8’s downsampling factor is S = 32, a single 60 × 60 ROI would be reduced to merely ≈1.9 × 1.9 pixels in the deepest feature map—insufficient for reliable feature extraction. By tiling the ROI into an N × N grid, the effective input dimension becomes (N·H) × (N·W), where H and W are the ROI height and width. For N = 3, the gridded input is 180 × 180 pixels, downsampled to ≈5.6 × 5.6 pixels in deep layers; for N = 5, it becomes 300 × 300 pixels, corresponding to ≈9.4 × 9.4 pixels; for N = 7, it reaches 420 × 420 pixels, yielding ≈13.1 × 13.1 pixels in the feature map. Although larger N preserves more spatial detail, it also increases pixel processing overhead quadratically. N = 5 was selected as a practical trade-off, providing nearly 10 × 10 feature-map support (adequate for modern detectors) while keeping the total pixel count within a manageable range for subsequent detection stages. This choice ensures that helmets retain discernible structural representation throughout the network without introducing excessive computational burden.
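The tiling step and the grid-size arithmetic described in the two preceding paragraphs can be summarized in a few lines of NumPy; the helper name and the 60 × 60 ROI size are illustrative.

```python
# Tiling a super-resolved ROI into an N x N grid and checking the resulting
# deep-feature footprint; the 60x60 ROI size and helper name are illustrative.
import numpy as np

def tile_roi(roi, n=5):
    """Replicate an (H, W, 3) ROI into an (n*H, n*W, 3) grid before detection."""
    return np.tile(roi, (n, n, 1))

roi_sr = np.zeros((60, 60, 3), dtype=np.uint8)  # helmet ROI after 4x super-resolution
grid = tile_roi(roi_sr, n=5)                    # 300 x 300 input for the helmet detector

for n in (1, 3, 5, 7):
    side = 60 * n
    print(n, side, round(side / 32, 1))  # feature-map side length after /32 downsampling
# 1 ->  60 ->  1.9  (too small to survive downsampling)
# 3 -> 180 ->  5.6
# 5 -> 300 ->  9.4  (chosen trade-off)
# 7 -> 420 -> 13.1  (pixel count grows quadratically)
```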

4. Experiments and Results

4.1. Dataset and Experimental Setup

To comprehensively evaluate the system performance, a custom-built helmet detection dataset alongside real-world industrial surveillance videos was employed.
The helmet detection dataset was specifically collected and constructed for this study to train and validate the helmet/head detection model. It comprises images from diverse sources, including public safety databases and controlled industrial environments. The dataset contains 7581 images partitioned into training, validation, and test sets with 5457, 607, and 1517 images, respectively (approximately a 7:1:2 ratio). All images were manually annotated with pixel-level bounding boxes using the labelImg tool, following a standard object detection annotation protocol. Two categories were defined: helmet (any standard safety helmet worn by a worker) and head (a visible human head without a helmet). To enhance model robustness and address the relatively lower sample count for the head class, standard data augmentation strategies were applied during training, including random horizontal flipping (±15°), minor angle rotation (±10°), and brightness/contrast adjustment (±20%). This resulted in an approximately balanced training set (helmet: ~48%, head: ~52%).
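A minimal sketch of these augmentations using torchvision transforms is shown below; the composition is illustrative (in practice the Ultralytics trainer applies its own augmentation pipeline, and bounding boxes must be transformed consistently with the images in a detection setting).

```python
# Illustrative image-level augmentations matching the reported ranges;
# bounding boxes would need the same geometric transforms in a real pipeline.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),                 # minor rotation (about +/-10 degrees)
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # +/-20% brightness/contrast
])
```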
For model initialization and training, the helmet detection model was initialized with the officially released YOLOv8n pre-trained weights from Ultralytics and fine-tuned on the dataset for 100 epochs, achieving a mean Average Precision (mAP@0.5) of 0.928 on the test set. The human detection model utilized the same official YOLOv8n pre-trained weights without further fine-tuning to evaluate its out-of-the-box generalization. The video super-resolution component employed the official BasicVSR++ pre-trained model (trained on standard datasets including REDS and Vimeo-90K), configured with a channel width of 64, seven residual blocks, and eight-frame temporal propagation with one intermediate feature fusion after 600,000 training iterations.
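For reference, the fine-tuning setup described above corresponds to a short Ultralytics training call of the following form, where `helmet.yaml` is a hypothetical dataset configuration listing the train/validation/test splits and the two classes.

```python
# Fine-tuning and evaluating YOLOv8n as described above; "helmet.yaml" is a
# hypothetical dataset configuration for the two classes (helmet, head).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                              # official pre-trained weights
model.train(data="helmet.yaml", epochs=100, imgsz=640)  # fine-tune on the helmet dataset
metrics = model.val(data="helmet.yaml", split="test")   # evaluate on the held-out test split
print(metrics.box.map50)                                # mAP@0.5 (0.928 reported in the paper)
```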
Representative testing videos, including the sequence shown in Figure 4, were captured at an outdoor construction site in Bazhong, Sichuan Province, China, with a resolution of 4K at 30 fps. The camera-to-target distance was approximately 210 m, and the average height of persons in the frames was around 150 pixels. Furthermore, to more rigorously assess generalization capability, additional testing was conducted using publicly available video sequences sourced from the internet, encompassing a wider variety of scenarios such as open-pit mines and highway work zones.
All experiments were conducted on a notebook PC equipped with an NVIDIA RTX 4070 8GB GPU. The deep learning framework used was PyTorch 2.1 with CUDA 11.8 support.

4.2. Performance of Super-Resolution

The performance of the proposed super-resolution-enhanced helmet detection framework was quantitatively evaluated using test video sequences captured at a mining site in Bazhong, Sichuan Province, China. Each cropped ROI was repeated in a 5 × 5 grid to mitigate the negative impact of small spatial dimensions on detection accuracy. Both the original low-resolution ROIs and the super-resolved ROIs were analyzed under identical conditions to ensure a fair comparison.
Figure 5 and Figure 6 illustrate the comparison of average confidence and recall for a selected segment of the test video. Figure 5 presents the average confidence of helmet detection, while Figure 6 shows the corresponding detection rates. The results indicate that both metrics improve after super-resolution, suggesting that the details restored by BasicVSR++ facilitate the recognition of helmets occupying only a few pixels in the original frames, thereby enhancing detection reliability.
Table 1 summarizes the overall detection statistics for the test video sequence. Quantitative analysis focuses on four metrics: average confidence, recall, precision, and the F1-score. In the present experiments, no false positives were observed under the applied detection confidence threshold (>0.5), resulting in a precision of 1.000 for all strategies. This outcome can be attributed to the two-stage YOLO detection architecture: while the first stage may introduce false positives, most are effectively filtered out by the second-stage detector. Furthermore, the relatively controlled background in this test set and the distinct visual appearance of safety helmets compared to common headwear (e.g., ordinary caps) may have contributed to the absence of false alarms.
The inclusion of super-resolution processing led to a significant increase in the number of correctly detected helmets and a corresponding decrease in missed detections. This improvement is reflected in the substantial gains in both recall and F1-score. The rise in average confidence further demonstrates that the fine-grained features recovered through super-resolution can be effectively leveraged by YOLO for enhanced feature extraction and classification. It should be noted, however, that the high confidence threshold, while ensuring perfect precision in this context, may reduce the recall for low-quality targets (e.g., extremely blurred helmets). In more complex deployment environments, such as scenarios involving severe occlusions, non-standard helmet designs, or extreme motion blur, careful threshold tuning will be essential to effectively balance precision and recall, as such conditions may introduce significantly more failure scenarios.
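The metrics reported in Table 1 follow the standard definitions below; the raw detection counts in the example are illustrative, chosen only so that the resulting ratios match the best configuration.

```python
# Standard precision/recall/F1 definitions used in Table 1; the counts below
# are illustrative and chosen only so the ratios match the best configuration.
def prf1(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

print(prf1(tp=825, fp=0, fn=175))  # -> (1.0, 0.825, ~0.904), as for ROI Grid & BasicVSR++ & YOLOv8
```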
A comparative baseline is included in Table 1 to contextualize the performance gain of the proposed framework. This baseline shares a similar “super-resolution followed by detection” structure with the proposed method and represents a moderate-complexity solution with practical relevance. It should be noted that the use of YOLOv5 in the baseline, versus YOLOv8 in the proposed research, may introduce some degree of uncertainty in the comparison. However, the substantial performance margin observed in Table 1—particularly in recall—suggests that the improvement stems largely from the proposed ROI-guided strategy and the integration of an advanced VSR module, rather than from detector differences alone.
To visually validate the efficacy of the BasicVSR++ super-resolution module prior to its integration into the detection pipeline, a comprehensive analysis was conducted on representative test cases. The following figures provide both qualitative and quantitative evidence of its performance in enhancing image detail and structural integrity.
Figure 7 provides a holistic visual assessment of the BasicVSR++ algorithm on a low-resolution input image (64 × 147). Figure 7a establishes the baseline, showing the original low-quality image with limited detail. The direct output of BasicVSR++ (Figure 7b) demonstrates a significant 4× spatial resolution increase (256 × 588), resulting in a visibly sharper image. To enable a fair, pixel-aligned comparison, the super-resolved image is downsampled back to the original resolution in Figure 7c. A zoomed side-by-side comparison (Figure 7d) between this downsampled SR result and the original (Figure 7a) reveals that the SR version contains richer local details, confirming the algorithm’s ability to inject plausible high-frequency information rather than merely performing interpolation.
The Difference Heatmap (Figure 7e), where red indicates larger pixel-wise deviations, visually quantifies the changes introduced by BasicVSR++. The concentration of changes in textured and edge areas aligns with the expected behavior of a detail-enhancing algorithm. Further analysis through Edge Detection (Figure 7f) and High-Frequency Detail extraction (Figure 7g) confirms that BasicVSR++ not only preserves but also sharpens structural edges and amplifies genuine texture details, in contrast to the blurred or noisy artifacts often produced by simple upscaling. Finally, the direct comparison in Figure 7h between a naïve Nearest-Neighbor ×4.0 upscaling of the original and the BasicVSR++ output conclusively demonstrates the superiority of the learned SR model in generating a more detailed, photorealistic, and visually coherent high-resolution image.
Figure 8 offers a granular, region-specific analysis to dissect the performance of BasicVSR++ across three fundamental image component types: texture, edges, and smooth areas. Each row is dedicated to one region type, comparing the naïve Nearest-Neighbor ×4.0 upscaling (baseline) against the BasicVSR ×4.0 result, supported by a difference map with objective metrics (PSNR, SSIM) and a frequency spectrum analysis.
Row 1 (Texture Region): The complex texture area (Figure 8a,b) shows that BasicVSR++ reconstructs a more natural and coherent texture pattern compared to the blocky and artifact-prone result from nearest-neighbor interpolation. The extremely high PSNR (54.04 dB) and SSIM (0.999) values in (Figure 8c) indicate that the downsampled SR result is nearly identical to the original LR input in this region, proving excellent information preservation in the low-frequency domain. The frequency spectra (Figure 8d) are highly similar, confirming that the algorithm adds high-frequency details in a controlled manner consistent with the original signal’s characteristics.
Row 2 (Edge Region): For sharp edges (Figure 8e,f), BasicVSR++ produces clean, continuous lines, effectively mitigating the “jagged” staircase artifacts characteristic of simple interpolation. The near-perfect PSNR (57.72 dB) and SSIM (1.000) in Figure 8g signify almost flawless structural alignment upon downsampling. The spectrum comparison (Figure 8h) shows that BasicVSR enhances the mid-to-high frequency components associated with edges, leading to superior visual sharpness.
Row 3 (Smooth Region): In homogeneous areas (Figure 8i,j), BasicVSR++ generates a clean, noise-free surface, whereas nearest-neighbor interpolation can amplify sensor noise or compression artifacts. The high PSNR (53.65 dB) and SSIM (1.000) in Figure 8k again demonstrate remarkable consistency with the original LR image after downsampling. The spectra (Figure 8l) reveal that BasicVSR suppresses spurious high-frequency noise, resulting in a spectrally cleaner output.
Collectively, Figure 7 and Figure 8 provide concrete visual evidence that the BasicVSR++ algorithm successfully addresses the core challenges of super-resolution: it enhances fine details in textures, sharpens and preserves structural edges, and suppresses noise in smooth regions. The consistently high PSNR/SSIM values when comparing the downsampled SR output to the original LR image confirm the algorithm’s high fidelity and stability, ensuring that the enhancement process does not distort the original scene content. It should be noted that this analysis is based on relatively static test frames. Performance may vary under more challenging real-world conditions such as severe motion blur, extreme lighting variations (e.g., heavy shadows or overexposure), or highly complex, non-repetitive textures, where the model’s ability to hallucinate accurate high-frequency details could be constrained. Nonetheless, the demonstrated robust visual improvement under typical surveillance degradation establishes a solid foundation for its subsequent role in boosting the performance of downstream detection tasks in low-resolution surveillance scenarios.
Figure 9 provides a qualitative comparison of helmet detection in a sample frame before and after super-resolution. The detection improvement is evident, and these qualitative results, together with the quantitative enhancements in Table 1, indicate that super-resolution as a preprocessing step effectively improves detection performance in low-quality surveillance scenarios.
Building upon the initial tests, the framework was further deployed across diverse outdoor operational scenarios to validate its generalizability. As illustrated by the additional test result from a road construction site in Figure 10, the method consistently enhances helmet visibility and detection reliability. Extensive qualitative evaluations conducted on footage from construction sites, open-pit mining operations, and highway maintenance zones—environments that share the common challenges of low-resolution and small targets but differ in background clutter and lighting conditions—confirm the robust performance of the SR-enhanced detection pipeline. This demonstrates the framework’s potential for broad applicability in various safety-critical monitoring tasks.

4.3. Impact of ROI-Strategy on Processing Time

The processing time of full-frame super-resolution and ROI-guided strategies was evaluated to assess the feasibility of real-time helmet detection. Applying BasicVSR++ to entire 4K frames (3840 × 2160 pixels) resulted in GPU memory overflow on the experimental platform. When frames were cropped to a size of 462 × 260 pixels, processing 25 frames required approximately 50 s, which remains insufficient for real-time requirements.
To reduce computational load, an ROI-guided strategy was introduced. Worker bodies were first detected using YOLOv8, and the resulting bounding boxes defined the regions for super-resolution. Applying BasicVSR++ exclusively to these cropped ROIs significantly reduced the per-frame processing time. Table 2 summarizes the comparison of average processing times between full-frame and ROI-based super-resolution methods.
It should be noted that the processing times reported in Table 2 were measured on a desktop GPU (NVIDIA RTX 4070). While the ROI-guided strategy significantly reduces computational load compared to full-frame super-resolution, real-time deployment on UAVs or other edge devices would require further optimization. Current UAV platforms with onboard AI capabilities (e.g., DJI Matrice 30 Series (DJI, Shenzhen, China), Autel EVO II (Autel Robotics, Shenzhen, China)) typically support lightweight detection models but may not yet accommodate complex super-resolution networks without customized acceleration. Nevertheless, the demonstrated efficiency gain from ROI guidance provides a viable pathway toward edge deployment, especially as embedded AI processors continue to evolve in performance and programmability.

4.4. End-to-End Latency and Real-Time Applicability Analysis

To comprehensively evaluate the real-time processing capability of the proposed framework in practical deployment scenarios, end-to-end latency tests were conducted for the complete processing pipeline on a unified experimental platform. The tests encompassed the entire process from low-resolution video frame input to final detection output, including four main stages: person detection, ROI cropping, super-resolution reconstruction, and helmet detection. The test video was the aforementioned 4K resolution (3840 × 2160), 30 fps outdoor construction site surveillance footage, from which a continuous 100-frame segment containing three workers was extracted for analysis.
As shown in Table 3, directly using YOLOv8 for helmet detection yields the lowest latency (28.3 ms/frame) with a frame rate of 35.3 fps. However, as mentioned earlier, its detection performance on low-resolution videos is severely inadequate. The full-frame super-resolution strategy, while not executable due to GPU memory overflow with BasicVSR++ on 4K frames, demonstrates the computational infeasibility of processing entire high-resolution frames in real-time. The baseline super-resolution method from prior work, although operational, incurs a high latency of 1325.2 ms/frame (0.75 fps), which is still far from real-time requirements.
In contrast, the ROI-Guided Super-Resolution Strategy substantially reduces the end-to-end latency to approximately 229.7 ms/frame (equivalent to 4.35 fps) while maintaining high detection performance. This strategy incorporates a person detection stage (33.9 ms/frame) to locate workers, followed by ROI cropping (0.4 ms/frame). Super-resolution is then applied only to these cropped regions (187.6 ms/frame), significantly reducing the pixel count processed compared to full-frame approaches. The final helmet detection on the enhanced ROIs is efficient (7.8 ms/frame). This pipeline effectively avoids the computational redundancy associated with full-frame processing and demonstrates a viable trade-off between accuracy and latency.
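The per-stage figures in Table 3 can be verified with a quick budget check:

```python
# Budget check of the ROI-guided pipeline latency (values from Table 3, ms/frame).
stages = {
    "person_detection": 33.9,
    "roi_cropping": 0.4,
    "super_resolution": 187.6,
    "helmet_detection": 7.8,
}
total_ms = sum(stages.values())
print(round(total_ms, 1), round(1000.0 / total_ms, 2))  # 229.7 ms/frame -> ~4.35 fps
```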
It is noteworthy that the current tests were conducted on a high-performance laptop GPU (NVIDIA RTX 4070 Laptop). Further optimization would be required for deployment on actual edge devices, such as UAV embedded platforms. Nevertheless, a processing speed of 4.35 fps approaches the “near real-time” performance thresholds acceptable in many industrial monitoring contexts (often 5–10 fps), particularly in scenarios with slow-moving targets or relatively static backgrounds. Future work involving model quantization, lightweight super-resolution networks tailored for edge devices, and hardware-specific acceleration (e.g., via TensorRT [34]) could further improve the frame rate, enabling stricter real-time performance.
In practical deployments, adaptive strategies such as dynamic ROI selection, variable-resolution processing based on target size, or frame-skipping for stationary scenes could be employed to reduce the computational load further. For instance, reusing ROIs across consecutive frames when worker movement is minimal could enhance system responsiveness without significantly compromising detection reliability.

5. Conclusions

This work introduces a stepwise framework for helmet detection in challenging outdoor environments, such as construction sites, transportation infrastructure maintenance zones, and open-pit mining areas. The framework is designed to address the limitations of low-resolution and compressed video obtained from both stationary cameras and UAV platforms. By integrating video super-resolution with ROI-guided detection, the method effectively recovers fine details of small objects while reducing unnecessary computation, thereby supporting near real-time performance essential for mobile surveillance systems.
Experiments conducted across multiple low-quality surveillance scenarios demonstrate that the proposed approach consistently improves both detection confidence and recall, particularly when helmets occupy only a few pixels in the original footage. Quantitative evaluations confirm that super-resolution enhances feature extraction for YOLO, while the ROI-based strategy significantly reduces inference time compared to full-frame super-resolution processing. Qualitative results further validate the framework’s ability to improve visibility and detection reliability under visually degraded conditions.
Despite these advantages, the framework exhibits certain limitations. The super-resolution module relies heavily on CUDA acceleration, which imposes considerable computational demands. The experiments were conducted on a high-end NVIDIA RTX 4070 Laptop GPU, and the current implementation achieves near real-time performance only on such hardware. This may challenge direct deployment on resource-constrained embedded devices commonly used in UAVs or edge monitoring systems. However, the progressive reduction in processing time from full-frame SR to ROI-guided SR demonstrates the potential for further optimization towards edge deployment. Notably, modern UAV platforms such as the DJI Matrice series are increasingly equipped with dedicated onboard computing modules capable of running lightweight detection models in real-time. While the current super-resolution module remains computationally intensive, its ROI-guided variant represents a step toward feasible edge integration. Future work will focus on developing lightweight super-resolution networks, model quantization, and hardware-aware optimization to improve operational efficiency on edge devices, thereby better supporting real-time helmet monitoring in UAV-based and mobile surveillance scenarios.
Looking forward, further research may explore adaptive ROI selection mechanisms driven by scene dynamics, multi-scale feature aggregation strategies, and system-level implementation on optimized embedded hardware. Furthermore, while this study is focused on helmet detection, the underlying framework of enhancing low-resolution inputs and focusing computation is conceptually extensible. A promising direction is its adaptation to multi-class detection of various Personal Protective Equipment (PPE), such as safety vests and gloves, which would involve addressing new challenges like curated multi-class dataset construction and model adaptation for heterogeneous objects. Such enhancements will strengthen the framework’s applicability for real-time helmet monitoring across a broader range of large-scale outdoor industrial and transportation scenarios.

Author Contributions

Conceptualization, T.H. and Z.W.; methodology, T.H. and Z.W.; software, Z.W.; validation, T.H. and Z.W.; formal analysis, Z.W.; investigation, T.H.; resources, T.H.; data curation, Z.W.; writing—original draft preparation, Z.W.; writing—review and editing, T.H., Z.W. and L.Y.; visualization, Z.W.; supervision, L.Y.; project administration, L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors would like to express their sincere gratitude to the School of Automation Engineering, University of Electronic Science and Technology of China, for providing the resources that were essential for this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CNN: Convolutional Neural Network
EH-DETR: Enhanced two-wheeler Helmet Detection TRansformer
HR: High-Resolution
LR: Low-Resolution
ROI: Region Of Interest
SR: Super-Resolution
UAV: Unmanned Aerial Vehicle
VSR: Video Super-Resolution
YOLO: You Only Look Once

References

1. Dhillon, B.S. Mining equipment safety: A review, analysis methods and improvement strategies. Int. J. Min. Reclam. Environ. 2009, 23, 168–179.
2. Park, J.; Nehad, E.; Zhu, Z. Hardhat-wearing detection for enhancing on-site safety of construction workers. J. Constr. Eng. Manag. 2015, 141, 04015024.
3. Park, J.; Kang, D. Artificial intelligence and smart technologies in safety management: A comprehensive analysis across multiple industries. Appl. Sci. 2024, 14, 11934.
4. Zhang, C.; Zhang, J.; Bai, T.; Gao, X. Monitoring and Identification of Road Construction Safety Factors via UAV. Sensors 2022, 22, 8797.
5. Nowakowski, M. Operational Environment Impact on Sensor Capabilities in Special Purpose Unmanned Ground Vehicles. In Proceedings of the 2024 21st International Conference on Mechatronics-Mechatronika (ME), Prague, Czech Republic, 10–13 December 2024; IEEE: Piscataway, NJ, USA, 2024.
6. Munir, A.; Wang, K.; Li, J.; Chen, Z.; Zhang, Y. Impact of Adverse Weather and Image Distortions on Vision-Based UAV Detection: A Performance Evaluation of Deep Learning Models. Drones 2024, 8, 638.
7. Wang, Y.; Wei, X.; Shen, H.; Hu, J.; Luo, L. Performance evaluation of low resolution visual tracking for unmanned aerial vehicles. Neural Comput. Appl. 2020, 33, 2229–2248.
8. Fang, Y.; Yuan, Y.; Li, L.; Wu, J.; Lin, W.; Li, Z. Performance Evaluation of Visual Tracking Algorithms on Video Sequences With Quality Degradation. IEEE Access 2017, 5, 2430–2441.
9. Ultralytics. YOLOv26: The Future of Real-Time State-of-the-Art Object Detection. Available online: https://docs.ultralytics.com/models/yolo26/ (accessed on 25 November 2025).
10. Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Kauai, HI, USA, 8–14 December 2001; Volume 1, p. I.
11. Gholami, M.; Varshosaz, M.; Pirasteh, S.; Shamsipour, G. Optimizing Sector Ring Histogram of Oriented Gradients for human injured detection from drone images. Geomat. Nat. Hazards Risk 2021, 12, 581–604.
12. Ojala, T.; Pietikäinen, M.; Mäenpää, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987.
13. Wang, H.; Li, G.; Wen, Z. Fast SVM classifier for large-scale classification problems. Inf. Sci. 2023, 642, 119136.
14. Tolstikhin, I.O.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.; et al. Mlp-mixer: An all-mlp architecture for vision. Adv. Neural Inf. Process. Syst. 2021, 34, 24261–24272.
15. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv 2013, arXiv:1311.2524.
16. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. arXiv 2015, arXiv:1506.02640.
17. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Computer Vision–ECCV 2014; Springer: Cham, Switzerland, 2014; pp. 184–199.
18. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654.
19. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Computer Vision–ECCV 2018; Springer: Cham, Switzerland, 2018; pp. 286–301.
20. Chan, K.C.K.; Zhou, S.; Xu, X.; Loy, C.C. BasicVSR++: Improving video super-resolution with enhanced propagation and alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 5972–5981.
21. Doungmala, P.; Klubsuwan, K. Helmet wearing detection in Thailand using Haar like feature and circle hough transform on image processing. In Proceedings of the 2016 IEEE International Conference on Computer and Information Technology (CIT), Nadi, Fiji, 8–10 December 2016; pp. 611–614.
22. Song, R.; Wang, Z. RBFPDet: An anchor-free helmet wearing detection method. Appl. Intell. 2023, 53, 5013–5028.
23. Liu, L.; Yue, X.; Lu, M.; He, P. EH-DETR: Enhanced two-wheeler helmet detection transformer for small and complex scenes. J. Electron. Imaging 2025, 34, 013035.
24. Liang, H.; Seo, S. UAV low-altitude remote sensing inspection system using a small target detection network for helmet wear detection. Remote Sens. 2022, 15, 196.
25. Kumari, S.; Choudhary, M.; Mishra, R.; Chaulya, S.K.; Prasad, G.M.; Mandal, S.K.; Banerjee, G. Artificial intelligent based smart system for safe mining during foggy weather. Concurr. Comput. Pract. Exp. 2022, 34, e6631.
26. Huang, L.; Fu, Q.; He, M.; Jiang, D.; Hao, Z. Detection algorithm of safety helmet wearing based on deep learning. Concurr. Comput. Pract. Exp. 2021, 33, e6234.
27. Yang, G.; Hong, X.; Sheng, Y.; Sun, L. YOLO-Helmet: A novel algorithm for detecting dense small safety helmets in construction scenes. IEEE Access 2024, 12, 107170–107180.
28. Zhang, Y.; Huang, S.; Qin, J.; Li, X.; Zhang, Z.; Fan, Q.; Tan, Q. Detection of helmet use among construction workers via helmet-head region matching and state tracking. Autom. Constr. 2025, 171, 105987.
29. Han, D.; Ying, C.; Tian, Z.; Dong, Y.; Chen, L.; Wu, X.; Jiang, Z. YOLOv8s-SNC: An improved safety-helmet-wearing detection algorithm based on YOLOv8. Buildings 2024, 14, 3883.
30. Wang, S.; Wu, P.; Wu, Q. Safety helmet detection based on improved YOLOv7-tiny with multiple feature enhancement. J. Real-Time Image Process. 2024, 21, 120.
31. Song, X.; Zhang, T.; Yi, W. An improved YOLOv8 safety helmet wearing detection network. Sci. Rep. 2024, 14, 17550.
32. Chen, X.; Xie, Q. Safety Helmet-Wearing Detection System for Manufacturing Workshop Based on Improved YOLOv7. J. Sens. 2023, 2023, 7230463.
33. Liu, Y.; Li, Z.; Zhan, B.; Han, J.; Liu, Y. A super-resolution reconstruction driven helmet detection workflow. Appl. Sci. 2022, 12, 545.
34. Ultralytics. Optimizing Ultralytics YOLO Models with the TensorRT Integration. Ultralytics Blog. 2024. Available online: https://www.ultralytics.com/zh/blog/optimizing-ultralytics-yolo-models-with-the-tensorrt-integration (accessed on 8 January 2026).
Figure 1. Performance evaluation of the helmet detection model: (a) Precision–Recall (PR) curve of the helmet detection model trained on the dedicated helmet dataset; (b) Detection performance: the upper panel shows results on standard test targets, while the lower panel illustrates recognition results from actual low-resolution surveillance footage.
Figure 2. Workflow of the different detection strategies.
Figure 3. Example of a tiled ROI for detection enhancement. A single super-resolved ROI (left) is replicated into a grid (here arranged as 3 × 5 for visualization; the actual pipeline uses a 5 × 5 grid) to ensure adequate feature-map resolution after network down-sampling. The tiled grid (right) shows how this single ROI is replicated to form the input batch for the network, preserving spatial details through the down-sampling stages.
Figure 4. Example of the test video captured in a real scene.
Figure 5. Comparison of confidence before and after super-resolution of test cases.
Figure 6. Comparison of detection rates before and after super-resolution of test cases.
Figure 7. Comprehensive Visual Comparison of Super-Resolution Effects.
Figure 8. Fine-Grained Detail Enhancement Analysis Across Different Image Regions.
Figure 9. Qualitative comparison of helmet detection before and after super-resolution preprocessing: (a) Detection results on the original low-resolution frame (individual panel dimensions: 64 × 147 pixels); (b) Detection results on the BasicVSR++ super-resolved frame (individual panel dimensions: 256 × 588 pixels).
Figure 10. Validation of the proposed method in complex outdoor environments: a case study at a road construction site. (a) Helmet detection results under original low-resolution conditions (dimensions: 232 × 138 pixels); (b) Helmet detection results after BasicVSR++ super-resolution preprocessing (dimensions: 928 × 552 pixels).
Table 1. Detection performance data of the test target in three different modes.
Detection Strategies | Average Confidence | Recall | F1 Score | Precision
YOLOv8 | 0.210 | 0.066 | 0.124 | 1.0
Proposed SR & YOLOv5 [33] | 0.493 | 0.629 | 0.773 | 1.0
ROI Grid & YOLOv8 | 0.406 | 0.598 | 0.748 | 1.0
ROI Grid & BasicVSR++ & YOLOv8 | 0.632 | 0.825 | 0.904 | 1.0
Table 2. Comparison of super-resolution time performance data under different strategies.
SR Strategies | Image Size (pixels) | Total Time (s) | Average Time per Frame (s) | Frames per Second
Full-Frame SR | 3840 × 2160 | (GPU overflow) | - | -
Cropped SR | 462 × 260 | 49.86 | 1.994 | 0.502
ROI-Guided SR | around 150 × 64 ¹ | 4.69 | 0.188 | 5.330
¹ The minimum input size for BasicVSR++ is 64 × 64 pixels. A larger value is used when the target width exceeds this dimension.
Table 3. End-to-end processing latency under different strategies (unit: milliseconds per frame).
Processing Strategy | Person Detection | ROI Cropping | Super-Resolution | Helmet Detection | Total Latency | FPS
YOLOv8 | - | - | - | 28.3 | 28.3 | 35.3
Full-Frame SR & YOLOv8 | - | - | (GPU overflow) | - | - | -
Proposed SR & YOLOv5 [33] | - | - | 1286.3 | 38.9 | 1325.2 | 0.75
ROI-Guided SR & YOLOv8 | 33.9 | 0.4 | 187.6 | 7.8 | 229.7 | 4.35
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
