Search Results (267)

Search Parameters:
Journal = Electronics
Section = Electronic Multimedia

27 pages, 2835 KB  
Article
Textile Defect Detection Using Artificial Intelligence and Computer Vision—A Preliminary Deep Learning Approach
by Rúben Machado, Luis A. M. Barros, Vasco Vieira, Flávio Dias da Silva, Hugo Costa and Vitor Carvalho
Electronics 2025, 14(18), 3692; https://doi.org/10.3390/electronics14183692 - 18 Sep 2025
Viewed by 523
Abstract
Fabric defect detection is essential for quality assurance in textile manufacturing, where manual inspection is inefficient and error-prone. This paper presents a real-time deep learning-based system leveraging YOLOv11 for detecting defects such as holes, color bleeding, and creases on solid-colored, patternless cotton and linen fabrics using edge computing. The system runs on an NVIDIA Jetson Orin Nano platform and supports real-time inference, Message Queuing Telemetry Transport (MQTT)-based defect reporting, and optional Real-Time Messaging Protocol (RTMP) video streaming or local recording storage. Each detected defect is logged with class, confidence score, location, and unique ID in a Comma-Separated Values (CSV) file for further analysis. The proposed solution operates with two RealSense cameras placed approximately 1 m from the fabric under controlled lighting conditions, tested in a real industrial setting. The system achieves a mean Average Precision (mAP@0.5) exceeding 82% across multiple synchronized video sources while maintaining low latency and consistent performance. The architecture is designed to be modular and scalable, supporting plug-and-play deployment in industrial environments. Its flexibility in integrating different camera sources, deep learning models, and output configurations makes it a robust platform for further enhancements, such as adaptive learning mechanisms, real-time alerts, or integration with Manufacturing Execution System/Enterprise Resource Planning (MES/ERP) pipelines. This approach advances automated textile inspection and reduces dependency on manual processes. Full article
(This article belongs to the Special Issue Deep/Machine Learning in Visual Recognition and Anomaly Detection)
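The per-defect logging step described above (class, confidence score, location, and unique ID written to a CSV file) can be sketched minimally as follows; the field names, confidence threshold, and record layout are assumptions for illustration, not the authors' implementation.

```python
import csv
import io
import uuid

def log_defects(detections, min_conf=0.5):
    """Filter detections by confidence and render them as CSV rows
    (id, class, confidence, bounding box) -- a toy stand-in for the
    per-defect logging the abstract describes."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "class", "confidence", "x", "y", "w", "h"])
    kept = [d for d in detections if d["confidence"] >= min_conf]
    for d in kept:
        writer.writerow([uuid.uuid4().hex[:8], d["class"],
                         f"{d['confidence']:.2f}", *d["box"]])
    return kept, buf.getvalue()

# Hypothetical detector output for one frame:
detections = [
    {"class": "hole", "confidence": 0.91, "box": (120, 44, 18, 20)},
    {"class": "crease", "confidence": 0.37, "box": (300, 80, 60, 12)},
]
kept, csv_text = log_defects(detections)
```

In a deployment like the one described, each row would also be published over MQTT; here only the thresholding and CSV rendering are shown.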

16 pages, 27727 KB  
Article
Prompt Self-Correction for SAM2 Zero-Shot Video Object Segmentation
by Jin Lee, Ji-Hun Bae, Dang Thanh Vu, Le Hoang Anh, Zahid Ur Rahman, Heonzoo Lee, Gwang-Hyun Yu and Jin-Young Kim
Electronics 2025, 14(18), 3602; https://doi.org/10.3390/electronics14183602 - 10 Sep 2025
Viewed by 464
Abstract
Foundation models, exemplified by the Segment Anything Model (SAM), have revolutionized object segmentation with their impressive zero-shot capabilities. The recent SAM2 extended these abilities to the video domain, utilizing an object pointer and memory attention to maintain temporal segment consistency. However, a critical limitation of SAM2 is its vulnerability to error accumulation, where an initial incorrect mask can propagate through subsequent frames, leading to tracking failure. To address this, we propose a novel method that actively monitors the temporal segment consistency of masks by evaluating the distance of object pointers across frames. When a potential error is detected via a sharp increase in distance, our method triggers a particle filter based re-inference module. This framework models the object's motion to predict a corrected bounding box, effectively guiding the model to recover the valid mask and preventing error propagation. Extensive zero-shot evaluations on DAVIS, LVOS v2, and YouTube-VOS, together with qualitative results, show that the proposed, parameter-free procedure consistently improves temporal coherence, raising mean IoU by 0.1 on DAVIS, by 0.13 on the LVOS v2 train split and 0.05 on the LVOS v2 validation split, and by 0.02 on YouTube-VOS, thereby offering a simple and effective route to more robust video object segmentation with SAM2. Full article
(This article belongs to the Collection Image and Video Analysis and Understanding)
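The error-monitoring idea — watch the distance between object pointers across frames and trigger re-inference on a sharp jump — can be sketched as below. The pointer vectors, the ratio threshold, and the flagging rule are illustrative assumptions, not SAM2's actual internals.

```python
import math

def pointer_distance(a, b):
    """Euclidean distance between two object-pointer vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def detect_drift(pointers, ratio=2.0):
    """Flag frame indices where the frame-to-frame pointer distance
    jumps sharply relative to the running mean of earlier distances.
    A flagged frame is where re-inference would be triggered."""
    flagged, dists = [], []
    for t in range(1, len(pointers)):
        d = pointer_distance(pointers[t - 1], pointers[t])
        if dists and d > ratio * (sum(dists) / len(dists)):
            flagged.append(t)  # hand off to the correction module here
        dists.append(d)
    return flagged

# Toy 2-D pointers: small drift, then a sudden jump at frame 3.
pointers = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (1.5, 0.0)]
flagged = detect_drift(pointers)
```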

15 pages, 3434 KB  
Article
Incremental Spatio-Temporal Augmented Sampling for Power Grid Operation Behavior Recognition
by Lingwen Meng, Di He, Guobang Ban and Siqi Guo
Electronics 2025, 14(18), 3579; https://doi.org/10.3390/electronics14183579 - 9 Sep 2025
Viewed by 249
Abstract
Accurate recognition of power grid operation behaviors is crucial for ensuring both safety and operational efficiency in smart grid systems. However, this task presents significant challenges due to dynamic environmental variations, limited labeled training data availability, and the necessity for continuous model adaptation. To overcome these limitations, we propose an Incremental Spatio-temporal Augmented Sampling (ISAS) method for power grid operation behavior recognition. Specifically, we design a spatio-temporal Feature-Enhancement Fusion Module (FEFM) which employs multi-scale spatio-temporal augmented fusion combined with a cross-scale aggregation mechanism, enabling robust feature learning that is resilient to environmental interference. Furthermore, we introduce a Selective Replay Mechanism (SRM) that implements a dual-criteria sample selection strategy based on error variability and feature-space divergence metrics, ensuring optimal memory bank updates that simultaneously maximize information gain while minimizing feature redundancy. Experimental results on the power grid behavior dataset demonstrate significant advantages of the proposed method in recognition robustness and knowledge retention compared to other methods. For example, it achieves an accuracy of 89.80% on sunny days and maintains exceptional continual-learning stability, with a forgetting rate of merely 2.74% across three meteorological scenarios. Full article
(This article belongs to the Special Issue Applications and Challenges of Image Processing in Smart Environment)

20 pages, 2600 KB  
Article
Multi-Radar Track Fusion Method Based on Parallel Track Fusion Model
by Jiadi Qi, Xiaoke Lu and Jinping Sun
Electronics 2025, 14(17), 3461; https://doi.org/10.3390/electronics14173461 - 29 Aug 2025
Viewed by 568
Abstract
With the development of multi-sensor collaborative detection technology, radar track fusion has become a key means of improving target tracking accuracy. Traditional fusion methods based on Kalman filtering and weighted averaging suffer from insufficient adaptability in complex environments. This paper proposes an end-to-end deep learning track fusion method that achieves high-precision track reconstruction through residual extraction and parallel network fusion. The method combines an attention mechanism and a long short-term memory network in parallel and optimizes the computational complexity. Through an uncertainty weighting mechanism, the fusion weight is dynamically adjusted according to the reliability of the track features. Experimental results show that the mean absolute error of this method is 79% lower than that of the Kalman filter algorithm and about 87% lower than that of a mainstream deep learning model, providing an effective way forward for multi-radar track fusion in complex scenarios. Full article
(This article belongs to the Special Issue Applications of Computational Intelligence, 3rd Edition)
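For context, the classical baseline that a learned uncertainty-weighting mechanism generalizes is inverse-variance fusion, in which each sensor's estimate is weighted by the reciprocal of its error variance. A minimal one-dimensional sketch (the variances below are made up):

```python
def fuse_tracks(estimates):
    """Inverse-variance fusion of scalar position estimates.

    `estimates` is a list of (value, variance) pairs, one per radar;
    lower-variance (more reliable) sensors receive larger weights."""
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    return sum(w * x for w, (x, _) in zip(weights, estimates)) / total

# Radar A: position 10.0 with variance 1.0 (reliable);
# Radar B: position 12.0 with variance 4.0 (noisy).
fused = fuse_tracks([(10.0, 1.0), (12.0, 4.0)])
```

The fused value lands much closer to the reliable sensor's estimate; the paper's contribution is to learn such weights from track features rather than assume known variances.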

11 pages, 2091 KB  
Article
Underwater Image Enhancement Method Based on Vision Mamba
by Yongjun Wang, Zhuo Chen, Maged Al-Barashi and Zeyu Tang
Electronics 2025, 14(17), 3411; https://doi.org/10.3390/electronics14173411 - 27 Aug 2025
Viewed by 434
Abstract
To address issues like haze, blurring, and color distortion in underwater images, this paper proposes a novel underwater image enhancement model called U-Vision Mamba, built on the Vision Mamba framework. The core innovation lies in a U-shaped network encoder for multi-scale feature extraction, combined with a novel multi-scale sparse attention fusion module to effectively aggregate these features. This fusion module leverages sparse attention to capture global context while preserving fine details. The decoder then refines these aggregated features to generate high-quality underwater images. Experimental results on the UIEB dataset demonstrate that U-Vision Mamba significantly reduces image blurring and corrects color distortion, achieving a PSNR of 25.65 dB and an SSIM of 0.972. Both comprehensive subjective evaluation and objective metrics confirm the model’s superior performance and robustness, making it a promising solution for improving the clarity and usability of underwater imagery in applications like marine exploration and environmental monitoring. Full article
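The PSNR figure quoted above (25.65 dB) follows the standard definition, 10·log10(MAX²/MSE). A minimal sketch over flat pixel lists (toy data, not the UIEB evaluation):

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized
    images, represented here as flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)

# Two-pixel toy example: each pixel off by 1 level -> MSE = 1.
value = psnr([0, 255], [1, 254])
```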

14 pages, 6474 KB  
Article
Mobile-Based Deep Learning Approach for Multi-Object Aiwen Mango Grade Classification
by Yi-Chao Wu, Hung-Wei Hsu and Shyh-Wei Chen
Electronics 2025, 14(16), 3188; https://doi.org/10.3390/electronics14163188 - 11 Aug 2025
Viewed by 427
Abstract
In this paper, a mobile-based deep learning approach for multi-object Aiwen mango grade classification, MDLMAGC, is proposed to instantly identify the quality grade of Aiwen mangoes through smart mobile devices. New Aiwen mango training-set images can be uploaded to the cloud database for model training to improve model accuracy. Through the proposed MDLMAGC, the labor cost of manual identification can be reduced and the grade of Aiwen mangoes can be classified accurately. Different grades of Aiwen mangoes can thus adopt corresponding sales strategies to achieve the most effective sales benefits. Full article
(This article belongs to the Special Issue AI in Signal and Image Processing)

21 pages, 2267 KB  
Article
Dual-Branch Network for Blind Quality Assessment of Stereoscopic Omnidirectional Images: A Spherical and Perceptual Feature Integration Approach
by Zhe Wang, Yi Liu and Yang Song
Electronics 2025, 14(15), 3035; https://doi.org/10.3390/electronics14153035 - 30 Jul 2025
Viewed by 338
Abstract
Stereoscopic omnidirectional images (SOIs) have gained significant attention for their immersive viewing experience by providing binocular depth with panoramic scenes. However, evaluating their visual quality remains challenging due to their unique spherical geometry, binocular disparity, and viewing conditions. To address these challenges, this paper proposes a dual-branch deep learning framework that integrates spherical structural features and perceptual binocular cues to assess the quality of SOIs without reference. Specifically, the global branch leverages spherical convolutions to capture wide-range spatial distortions, while the local branch utilizes a binocular difference module based on discrete wavelet transform to extract depth-aware perceptual information. A feature complementarity module is introduced to fuse global and local representations for final quality prediction. Experimental evaluations on two public SOIQA datasets—NBU-SOID and SOLID—demonstrate that the proposed method achieves state-of-the-art performance, with PLCC/SROCC values of 0.926/0.918 and 0.918/0.891, respectively. These results validate the effectiveness and robustness of our approach in stereoscopic omnidirectional image quality assessment tasks. Full article
(This article belongs to the Special Issue AI in Signal and Image Processing)

19 pages, 650 KB  
Article
LEMAD: LLM-Empowered Multi-Agent System for Anomaly Detection in Power Grid Services
by Xin Ji, Le Zhang, Wenya Zhang, Fang Peng, Yifan Mao, Xingchuang Liao and Kui Zhang
Electronics 2025, 14(15), 3008; https://doi.org/10.3390/electronics14153008 - 28 Jul 2025
Viewed by 1576
Abstract
With the accelerated digital transformation of the power industry, critical infrastructures such as power grids are increasingly migrating to cloud-native architectures, leading to unprecedented growth in service scale and complexity. Traditional operation and maintenance (O&M) methods struggle to meet the demands for real-time monitoring, accuracy, and scalability in such environments. This paper proposes a novel service performance anomaly detection system based on large language models (LLMs) and multi-agent systems (MAS). By integrating the semantic understanding capabilities of LLMs with the distributed collaboration advantages of MAS, we construct a high-precision and robust anomaly detection framework. The system adopts a hierarchical architecture, where lower-layer agents are responsible for tasks such as log parsing and metric monitoring, while an upper-layer coordinating agent performs multimodal feature fusion and global anomaly decision-making. Additionally, the LLM enhances the semantic analysis and causal reasoning capabilities for logs. Experiments conducted on real-world data from the State Grid Corporation of China, covering 1289 service combinations, demonstrate that our proposed system significantly outperforms traditional methods in terms of the F1-score across four platforms, including customer services and grid resources (achieving up to a 10.3% improvement). Notably, the system excels in composite anomaly detection and root cause analysis. This study provides an industrial-grade, scalable, and interpretable solution for intelligent power grid O&M, offering a valuable reference for the practical implementation of AIOps in critical infrastructures. Evaluated on real-world data from the State Grid Corporation of China (SGCC), our system achieves a maximum F1-score of 88.78%, with a precision of 92.16% and recall of 85.63%, outperforming five baseline methods. Full article
(This article belongs to the Special Issue Advanced Techniques for Multi-Agent Systems)
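The headline numbers above are internally consistent: the F1-score is the harmonic mean of precision and recall, and plugging in the reported 92.16% precision and 85.63% recall reproduces the reported 88.78%:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported: precision 92.16%, recall 85.63% -> F1 should be ~88.78%.
f1 = f1_score(0.9216, 0.8563)
```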

17 pages, 1327 KB  
Article
MA-HRL: Multi-Agent Hierarchical Reinforcement Learning for Medical Diagnostic Dialogue Systems
by Xingchuang Liao, Yuchen Qin, Zhimin Fan, Xiaoming Yu, Jingbo Yang, Rongye Shi and Wenjun Wu
Electronics 2025, 14(15), 3001; https://doi.org/10.3390/electronics14153001 - 28 Jul 2025
Viewed by 737
Abstract
Task-oriented medical dialogue systems face two fundamental challenges: the explosion of state-action space caused by numerous diseases and symptoms and the sparsity of informative signals during interactive diagnosis. These issues significantly hinder the accuracy and efficiency of automated clinical reasoning. To address these problems, we propose MA-HRL, a multi-agent hierarchical reinforcement learning framework that decomposes the diagnostic task into specialized agents. A high-level controller coordinates symptom inquiry via multiple worker agents, each targeting a specific disease group, while a two-tier disease classifier refines diagnostic decisions through hierarchical probability reasoning. To combat sparse rewards, we design an information entropy-based reward function that encourages agents to acquire maximally informative symptoms. Additionally, medical knowledge graphs are integrated to guide decision-making and improve dialogue coherence. Experiments on the SymCat-derived SD dataset demonstrate that MA-HRL achieves substantial improvements over state-of-the-art baselines, including +7.2% diagnosis accuracy, +0.91% symptom hit rate, and +15.94% symptom recognition rate. Ablation studies further verify the effectiveness of each module. This work highlights the potential of hierarchical, knowledge-aware multi-agent systems for interpretable and scalable medical diagnosis. Full article
(This article belongs to the Special Issue Advanced Techniques for Multi-Agent Systems)
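The entropy-based reward favors queries that make the disease-probability distribution more certain. As an illustration of the underlying quantity only (the actual reward shaping in MA-HRL is more involved), here is Shannon entropy and the entropy reduction from one informative symptom answer:

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A symptom answer that collapses a uniform belief over 4 candidate
# diseases down to 2 equally likely diseases gains exactly 1 bit:
gain = entropy([0.25] * 4) - entropy([0.5, 0.5])
```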

20 pages, 1647 KB  
Article
Research on the Enhancement of Provincial AC/DC Ultra-High Voltage Power Grid Security Based on WGAN-GP
by Zheng Shi, Yonghao Zhang, Zesheng Hu, Yao Wang, Yan Liang, Jiaojiao Deng, Jie Chen and Dingguo An
Electronics 2025, 14(14), 2897; https://doi.org/10.3390/electronics14142897 - 19 Jul 2025
Viewed by 373
Abstract
With the advancement in the "dual carbon" strategy and the integration of high proportions of renewable energy sources, AC/DC ultra-high-voltage power grids are facing new security challenges such as commutation failure and multi-infeed coupling effects. Fault diagnosis, as an important tool for assisting power grid dispatching, is essential for maintaining the grid's long-term stable operation. Traditional fault diagnosis methods encounter challenges such as limited samples and data quality issues under complex operating conditions. To overcome these problems, this study proposes a fault sample data enhancement method based on the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP). Firstly, a simulation model of the AC/DC hybrid system is constructed to obtain the original fault sample data. Then, through the adoption of the Wasserstein distance measure and the gradient penalty strategy, an improved WGAN-GP architecture suitable for feature learning of the AC/DC hybrid system is designed. Finally, by comparing the fault diagnosis performance of different data models, the proposed method achieves up to 100% accuracy on certain fault types and improves the average accuracy by 6.3% compared to SMOTE and vanilla GAN, particularly under limited-sample conditions. These results confirm that the proposed approach can effectively extract fault characteristics from complex fault data. Full article
(This article belongs to the Special Issue Applications of Computational Intelligence, 3rd Edition)

23 pages, 22543 KB  
Article
Sketch Synthesis with Flowpath and VTF
by Junho Kim, Heekyung Yang and Kyumgha Min
Electronics 2025, 14(14), 2861; https://doi.org/10.3390/electronics14142861 - 17 Jul 2025
Viewed by 493
Abstract
We present a novel sketch generation scheme from an image using the flowpath and the value through the flow (VTF). The first stage of our scheme is to produce a grayscale noisy sketch using a deep learning-based approach. In the second stage, the unclear contours and unwanted noise in the grayscale noisy sketch are resolved using our flowpath and VTF-based schemes. We build a flowpath by integrating the tangent flow extracted from the input image. The integrated tangent flow produces a strong clue for the salient contour of the shape in the image. We further compute the VTF by sampling values through the flowpath to extract line segments that correspond to sketch strokes. By combining the deep learning-based approach and the VTF, we can extract salient sketch strokes from various images while suppressing unwanted noise. We demonstrate the excellence of our scheme by generating sketches from various images including portraits, landscapes, objects, animals, and animation scenes. Full article
(This article belongs to the Special Issue New Trends in Computer Vision and Image Processing)

25 pages, 1669 KB  
Article
Zero-Shot Infrared Domain Adaptation for Pedestrian Re-Identification via Deep Learning
by Xu Zhang, Yinghui Liu, Liangchen Guo and Huadong Sun
Electronics 2025, 14(14), 2784; https://doi.org/10.3390/electronics14142784 - 10 Jul 2025
Viewed by 570
Abstract
In computer vision, the performance of detectors trained under optimal lighting conditions is significantly impaired when applied to infrared domains due to the scarcity of labeled infrared target domain data and the inherent degradation in infrared image quality. Progress in cross-domain pedestrian re-identification is hindered by the lack of labeled infrared image data. To address the degradation of pedestrian recognition in infrared environments, we propose a framework for zero-shot infrared domain adaptation. This integrated approach is designed to mitigate the challenges of pedestrian recognition in infrared domains while enabling zero-shot domain adaptation. Specifically, an advanced reflectance representation learning module and an exchange–re-decomposition–coherence process are employed to learn illumination invariance and to enhance the model’s effectiveness, respectively. Additionally, the CLIP (Contrastive Language–Image Pretraining) image encoder and DINO (Distillation with No Labels) are fused for feature extraction, improving model performance under infrared conditions and enhancing its generalization capability. To further improve model performance, we introduce the Non-Local Attention (NLA) module, the Instance-based Weighted Part Attention (IWPA) module, and the Multi-head Self-Attention module. The NLA module captures global feature dependencies, particularly long-range feature relationships, effectively mitigating issues such as blurred or missing image information in feature degradation scenarios. The IWPA module focuses on localized regions to enhance model accuracy in complex backgrounds and unevenly lit scenes. Meanwhile, the Multi-head Self-Attention module captures long-range dependencies between cross-modal features, further strengthening environmental understanding and scene modeling. 
The key innovation of this work lies in the skillful combination and application of existing technologies to new domains, overcoming the challenges posed by vision in infrared environments. Experimental results on the SYSU-MM01 dataset show that, under the single-shot setting, Rank-1 Accuracy (Rank-1) and mean Average Precision (mAP) values of 37.97% and 37.25%, respectively, were achieved, while in the multi-shot setting, values of 34.96% and 34.14% were attained. Full article
(This article belongs to the Special Issue Deep Learning in Image Processing and Computer Vision)

22 pages, 9809 KB  
Article
Real-Time Multi-Camera Tracking for Vehicles in Congested, Low-Velocity Environments: A Case Study on Drive-Thru Scenarios
by Carlos Gellida-Coutiño, Reyes Rios-Cabrera, Alan Maldonado-Ramirez and Anand Sanchez-Orta
Electronics 2025, 14(13), 2671; https://doi.org/10.3390/electronics14132671 - 1 Jul 2025
Viewed by 937
Abstract
In this paper we propose a novel set of techniques for real-time Multi-Target Multi-Camera (MTMC) tracking of vehicles in congested, low-speed environments, such as those of drive-thru scenarios, where metrics such as the number of vehicles, time of stay, and interactions between vehicles and staff are needed and must be highly accurate. Traditional methods of tracking based on Intersection over Union (IoU) and basic appearance features produce fragmented trajectories and misidentifications under these conditions. Furthermore, detectors, such as YOLO (You Only Look Once) architectures, exhibit different types of errors due to vehicle proximity, lane changes, and occlusions. Our methodology introduces a new tracker algorithm, Multi-Object Tracker based on Corner Displacement (MTCD), that improves the robustness against bounding box deformations by analysing corner displacement patterns and several other contributing factors. The proposed solution was validated on real-world drive-thru footage, outperforming standard IoU-based trackers like the Nvidia Discriminative Correlation Filter (NvDCF) tracker. By maintaining accurate cross-camera trajectories, our framework enables the extraction of critical operational metrics, including vehicle dwell times and person–vehicle interaction patterns, which are essential for optimizing service efficiency. This study tackles persistent tracking challenges in constrained environments, showcasing practical applications for real-world surveillance and logistics systems where precision is critical. The findings underscore the benefits of incorporating geometric resilience and delayed decision-making into MTMC architectures. Furthermore, our approach offers the advantage of seamless integration with existing camera infrastructure, eliminating the need for new deployments. Full article
(This article belongs to the Special Issue New Trends in Computer Vision and Image Processing)
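For reference, below is the IoU cue that the baseline trackers rely on, next to a crude corner-displacement measure in the spirit of MTCD. The real tracker's corner analysis is more elaborate; this is only an assumption-laden sketch of the two association cues.

```python
import math

def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

def mean_corner_displacement(box_a, box_b):
    """Mean Euclidean displacement of the top-left and bottom-right
    corners -- unlike IoU, a one-sided box deformation only moves one
    corner, so the cue degrades gracefully."""
    (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) = box_a, box_b
    d1 = math.hypot(bx1 - ax1, by1 - ay1)
    d2 = math.hypot(bx2 - ax2, by2 - ay2)
    return (d1 + d2) / 2
```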

21 pages, 15478 KB  
Review
Small Object Detection in Traffic Scenes for Mobile Robots: Challenges, Strategies, and Future Directions
by Zhe Wei, Yurong Zou, Haibo Xu and Sen Wang
Electronics 2025, 14(13), 2614; https://doi.org/10.3390/electronics14132614 - 28 Jun 2025
Viewed by 1428
Abstract
Small object detection in traffic scenes presents unique challenges for mobile robots operating under constrained computational resources and highly dynamic environments. Unlike general object detection, small targets often suffer from low resolution, weak semantic cues, and frequent occlusion, especially in complex outdoor scenarios. This study systematically analyses the challenges, technical advances, and deployment strategies for small object detection tailored to mobile robotic platforms. We categorise existing approaches into three main strategies: feature enhancement (e.g., multi-scale fusion, attention mechanisms), network architecture optimisation (e.g., lightweight backbones, anchor-free heads), and data-driven techniques (e.g., augmentation, simulation, transfer learning). Furthermore, we examine deployment techniques on embedded devices such as Jetson Nano and Raspberry Pi, and we highlight multi-modal sensor fusion using Light Detection and Ranging (LiDAR), cameras, and Inertial Measurement Units (IMUs) for enhanced environmental perception. A comparative study of public datasets and evaluation metrics is provided to identify current limitations in real-world benchmarking. Finally, we discuss future directions, including robust detection under extreme conditions and human-in-the-loop incremental learning frameworks. This research aims to offer a comprehensive technical reference for researchers and practitioners developing small object detection systems for real-world robotic applications. Full article
(This article belongs to the Special Issue New Trends in Computer Vision and Image Processing)

24 pages, 19576 KB  
Article
Evaluating HAS and Low-Latency Streaming Algorithms for Enhanced QoE
by Syed Uddin, Michał Grega, Mikołaj Leszczuk and Waqas ur Rahman
Electronics 2025, 14(13), 2587; https://doi.org/10.3390/electronics14132587 - 26 Jun 2025
Viewed by 2078
Abstract
The demand for multimedia traffic over the Internet is growing exponentially. HTTP adaptive streaming (HAS) is the leading video delivery system that delivers high-quality video to the end user. The adaptive bitrate (ABR) algorithms running on the HTTP client select the highest feasible video quality by adjusting the quality according to the fluctuating network conditions. Recently, low-latency ABR algorithms have been introduced to reduce the end-to-end latency commonly experienced in HAS. However, a comprehensive study of the low-latency algorithms remains limited. This paper investigates the effectiveness of low-latency streaming algorithms in maintaining a high quality of experience (QoE) while minimizing playback delay. We evaluate these algorithms in the context of both Dynamic Adaptive Streaming over HTTP (DASH) and the Common Media Application Format (CMAF), with a particular focus on the impact of chunked encoding and transfer mechanisms on the QoE. We perform both objective and subjective evaluations of low-latency algorithms and compare their performance with traditional DASH-based ABR algorithms across multiple QoE metrics, various network conditions, and diverse content types. The results demonstrate that low-latency algorithms consistently deliver high video quality across various content types and network conditions, whereas traditional ABR algorithms exhibit performance variability under fluctuating network conditions and diverse content characteristics. Although traditional ABR algorithms download higher-quality segments in stable network environments, their effectiveness significantly declines under unstable conditions. Furthermore, the low-latency algorithms maintained a high user experience regardless of segment duration. In contrast, the performance of traditional algorithms varied significantly with changes in segment duration.
In summary, the results underscore that no single algorithm consistently achieves optimal performance across all experimental conditions. Performance varies depending on network stability, content characteristics, and segment duration, highlighting the need for adaptive strategies that can dynamically respond to varying streaming environments. Full article
(This article belongs to the Special Issue Video Streaming Service Solutions)
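As background for the ABR behaviour compared above, the simplest rate-based rule picks the highest rung of the bitrate ladder that fits within a safety fraction of the measured throughput; real HAS and low-latency clients also weigh buffer occupancy and latency targets. The ladder and safety factor below are made-up values for illustration:

```python
def select_bitrate(ladder_kbps, throughput_kbps, safety=0.8):
    """Return the highest bitrate-ladder rung that fits within a
    safety fraction of the measured throughput; fall back to the
    lowest rung when nothing fits."""
    budget = throughput_kbps * safety
    fitting = [r for r in sorted(ladder_kbps) if r <= budget]
    return fitting[-1] if fitting else min(ladder_kbps)

ladder = [500, 1000, 2500, 5000]  # hypothetical encoding ladder (kbps)
choice = select_bitrate(ladder, throughput_kbps=3000)
```

With 3000 kbps measured throughput the budget is 2400 kbps, so the client settles on the 1000 kbps rung rather than risking the 2500 kbps one — the kind of conservative behaviour whose QoE cost the paper quantifies.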
