MDPI - Publisher of Open Access Journals

19 pages, 2675 KB

Open Access Article

Fast Intra-Coding Unit Partitioning for 3D-HEVC Depth Maps via Hierarchical Feature Fusion

by Fangmei Liu, He Zhang and Qiuwen Zhang

Electronics 2025, 14(18), 3646; https://doi.org/10.3390/electronics14183646 - 15 Sep 2025

Viewed by 311

As a new generation 3D video coding standard, 3D-HEVC offers highly efficient compression. However, its recursive quadtree partitioning mechanism and frequent rate-distortion optimization (RDO) computations lead to a significant increase in coding complexity. Particularly, intra-frame coding in depth maps, which incorporates tools like depth modeling modes (DMMs), substantially prolongs the decision-making process for coding unit (CU) partitioning, becoming a critical bottleneck in compression encoding time. To address this issue, this paper proposes a fast CU partitioning framework based on hierarchical feature fusion convolutional neural networks (HFF-CNNs). It aims to significantly accelerate the overall encoding process while ensuring excellent encoding quality by optimizing depth map CU partitioning decisions. This framework synergistically captures CU’s global structure and local details through multi-scale feature extraction and channel attention mechanisms (SE module). It introduces the wavelet energy ratio designed for quantifying the texture complexity of depth map CU and the quantization parameter (QP) that reflects the encoding quality as external features, enhancing the dynamic perception ability of the model from different dimensions. Ultimately, it outputs depth-corresponding partitioning predictions through three fully connected layers, strictly adhering to HEVC’s quad-tree recursive segmentation mechanism. Experimental results demonstrate that, across eight standard test sequences, the proposed method achieves an average encoding time reduction of 48.43%, significantly lowering intra-frame encoding complexity with a BDBR increment of only 0.35%. The model exhibits outstanding lightweight characteristics with minimal inference time overhead. 
Compared with representative existing methods, this method achieves a better balance between cross-resolution adaptability and computational efficiency, providing a feasible optimization path for real-time 3D-HEVC applications.
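The abstract does not define the wavelet energy ratio, so the following is only a minimal sketch of one plausible form: the share of a CU block's energy that falls in the detail subbands of a one-level 2D Haar transform. The Haar basis, the single decomposition level, and the function names are assumptions for illustration, not the paper's definition.

```python
def haar_dwt2(block):
    """One-level orthonormal 2D Haar transform of a 2D list with even dimensions.

    Returns four flat lists of coefficients: (LL, LH, HL, HH).
    """
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(block), 2):
        for j in range(0, len(block[0]), 2):
            a, b = block[i][j], block[i][j + 1]
            c, d = block[i + 1][j], block[i + 1][j + 1]
            ll.append((a + b + c + d) / 2.0)  # local average (approximation)
            lh.append((a + b - c - d) / 2.0)  # change between the two rows
            hl.append((a - b + c - d) / 2.0)  # change between the two columns
            hh.append((a - b - c + d) / 2.0)  # diagonal change
    return ll, lh, hl, hh


def wavelet_energy_ratio(block):
    """Hypothetical texture measure: detail-subband energy over total energy."""
    ll, lh, hl, hh = haar_dwt2(block)
    detail = sum(v * v for band in (lh, hl, hh) for v in band)
    total = detail + sum(v * v for v in ll)
    return detail / total if total > 0 else 0.0
```

Under this assumed definition, a flat depth block yields a ratio of zero, while a block with sharp vertical stripes yields a larger value, which is the kind of signal a partitioning network could use alongside QP.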

20 pages, 4093 KB

Open Access Article

CNN Input Data Configuration Method for Fault Diagnosis of Three-Phase Induction Motors Based on D-Axis Current in D-Q Synchronous Reference Frame

by Yeong-Jin Goh

Appl. Sci. 2025, 15(15), 8380; https://doi.org/10.3390/app15158380 - 28 Jul 2025

Viewed by 360

Abstract

This study proposes a novel approach to input data configuration for the fault diagnosis of three-phase induction motors. Convolutional neural network (CNN)-based diagnostic methods often employ three-phase current signals and apply various image transformation techniques, such as RGB mapping, wavelet transforms, and short-time Fourier transform (STFT), to construct multi-channel input data. While such approaches outperform 1D-CNNs or grayscale-based 2D-CNNs due to their rich informational content, they require multi-channel data and involve increased computational complexity. Accordingly, this study transforms the three-phase currents into the D-Q synchronous reference frame and utilizes the D-axis current (Id) for image transformation. The Id is used to generate input data using the same image processing techniques, allowing for a direct performance comparison under identical CNN architectures. Experiments were conducted under consistent conditions using both three-phase-based and Id-based methods, each applied to RGB mapping, discrete wavelet transform (DWT), and STFT. The classification accuracy was evaluated using a ResNet50-based CNN. Results showed that the Id-STFT achieved the highest performance, with a validation accuracy of 99.6% and a test accuracy of 99.0%. While the RGB representation of three-phase signals has traditionally been favored for its information richness and diagnostic performance, this study demonstrates that high-performance CNN-based fault diagnosis is achievable even with grayscale representations of a single current.
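As a rough illustration of the transformation into the D-Q synchronous reference frame, the sketch below applies an amplitude-invariant Clarke transform followed by a Park rotation. The scaling convention, the alignment of the d-axis with phase a at theta = 0, and the function name are assumptions; the abstract does not state which convention the study uses.

```python
import math


def abc_to_dq(ia, ib, ic, theta):
    """Map three-phase currents to (Id, Iq) for a given electrical angle theta.

    Assumes the amplitude-invariant Clarke transform and a d-axis aligned
    with phase a when theta is zero.
    """
    # Clarke transform: three phases -> stationary alpha/beta frame
    i_alpha = (2.0 / 3.0) * (ia - 0.5 * ib - 0.5 * ic)
    i_beta = (ib - ic) / math.sqrt(3.0)
    # Park transform: rotate into the frame turning with angle theta
    i_d = i_alpha * math.cos(theta) + i_beta * math.sin(theta)
    i_q = -i_alpha * math.sin(theta) + i_beta * math.cos(theta)
    return i_d, i_q
```

For a balanced set of sinusoidal currents with amplitude I, this convention gives a constant Id equal to I and an Iq of zero, so deviations in Id over time are what a single-channel image representation would capture.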

(This article belongs to the Special Issue Advanced AI and Machine Learning Techniques for Time Series Analysis and Pattern Recognition)

21 pages, 5889 KB

Open Access Article

Mobile-YOLO: A Lightweight Object Detection Algorithm for Four Categories of Aquatic Organisms

by Hanyu Jiang, Jing Zhao, Fuyu Ma, Yan Yang and Ruiwen Yi

Fishes 2025, 10(7), 348; https://doi.org/10.3390/fishes10070348 - 14 Jul 2025

Viewed by 825

Abstract

Accurate and rapid aquatic organism recognition is a core technology for fisheries automation and aquatic organism statistical research. However, due to absorption and scattering effects, images of aquatic organisms often suffer from poor contrast and color distortion. Additionally, the clustering behavior of aquatic organisms often leads to occlusion, further complicating the identification task. This study proposes a lightweight object detection model, Mobile-YOLO, for the recognition of four representative aquatic organisms, namely holothurian, echinus, scallop, and starfish. Our model first utilizes the Mobile-Nano backbone network we proposed, which enhances feature perception while maintaining a lightweight design. Then, we propose a lightweight detection head, LDtect, which achieves a balance between lightweight structure and high accuracy. Additionally, we introduce Dysample (dynamic sampling) and HWD (Haar wavelet downsampling) modules, aiming to optimize the feature fusion structure and achieve lightweight goals by improving the processes of upsampling and downsampling. These modules also help compensate for the accuracy loss caused by the lightweight design of LDtect. Compared to the baseline model, our model reduces Params (parameters) by 32.2%, FLOPs (floating point operations) by 28.4%, and weights (model storage size) by 30.8%, while improving FPS (frames per second) by 95.2%. Accuracy also improves, with the mAP (mean average precision) increased by 1.6%, which can benefit practical applications such as marine species monitoring, conservation efforts, and biodiversity assessment.
Compared with YOLO (You Only Look Once) series (YOLOv5-12), SSD (Single Shot MultiBox Detector), EfficientDet (Efficient Detection), RetinaNet, and RT-DETR (Real-Time Detection Transformer), our model achieves leading comprehensive performance in terms of both accuracy and lightweight design. The results indicate that our research provides technological support for precise and rapid aquatic organism recognition.

(This article belongs to the Special Issue Technology for Fish and Fishery Monitoring)

17 pages, 7786 KB

Open Access Article

Video Coding Based on Ladder Subband Recovery and ResGroup Module

by Libo Wei, Aolin Zhang, Lei Liu, Jun Wang and Shuai Wang

Entropy 2025, 27(7), 734; https://doi.org/10.3390/e27070734 - 8 Jul 2025

Viewed by 442

Abstract

With the rapid development of video encoding technology in the field of computer vision, the demand for tasks such as video frame reconstruction, denoising, and super-resolution has been continuously increasing. However, traditional video encoding methods typically focus on extracting spatial or temporal domain information, often facing challenges of insufficient accuracy and information loss when reconstructing high-frequency details, edges, and textures of images. To address this issue, this paper proposes an innovative LadderConv framework, which combines discrete wavelet transform (DWT) with spatial and channel attention mechanisms. By progressively recovering wavelet subbands, it effectively enhances the video frame encoding quality. Specifically, the LadderConv framework adopts a stepwise recovery approach for wavelet subbands, first processing high-frequency detail subbands with relatively less information, then enhancing the interaction between these subbands, and ultimately synthesizing a high-quality reconstructed image through inverse wavelet transform. Moreover, the framework introduces spatial and channel attention mechanisms, which further strengthen the focus on key regions and channel features, leading to notable improvements in detail restoration and image reconstruction accuracy. To optimize the performance of the LadderConv framework, particularly in detail recovery and high-frequency information extraction tasks, this paper designs an innovative ResGroup module. By using multi-layer convolution operations along with feature map compression and recovery, the ResGroup module enhances the network’s expressive capability and effectively reduces computational complexity. The ResGroup module captures multi-level features from low level to high level and retains rich feature information through residual connections, thus improving the overall reconstruction performance of the model. 
In experiments, the combination of the LadderConv framework and the ResGroup module demonstrates superior performance in video frame reconstruction tasks, particularly in recovering high-frequency information, image clarity, and detail representation.

(This article belongs to the Special Issue Rethinking Representation Learning in the Age of Large Models)

23 pages, 12771 KB

Open Access Article

Design and Simulation of a Bio-Inspired Deployable Mechanism Achieved by Mimicking the Folding Pattern of Beetles’ Hind Wings

by Hongyun Chen, Xin Li, Shujing Wang, Yan Zhao and Yu Zheng

Biomimetics 2025, 10(5), 320; https://doi.org/10.3390/biomimetics10050320 - 15 May 2025

Cited by 1 | Viewed by 941

Abstract

In this paper, a beetle with excellent flight ability and a large hind-wing folding ratio is selected as the biomimetic model. We mimicked the geometric patterns formed during the folding process of the hind wings to construct a deployable mechanism and calculated the sector angles and dihedral angles of the origami mechanism. In the thick-plate version of the deployable structure, hinge-like steps are added to the plates to avoid the motion interference caused by folding thick plates. The kinematic characteristics of two deployable mechanisms were simulated in ADAMS 2018 software to verify the feasibility of the mechanism design. The finite element method is used to analyze the structural performance of the deployable mechanism, and its modal response is analyzed in both unfolded and folded configurations. The aerodynamics of a spatially deployable wing are characterized by computational fluid dynamics (CFD) to study the vortex characteristics at different frame rates. Based on the aerodynamic parameters obtained from the CFD simulation, a wavelet neural network is introduced and trained on these parameters.

17 pages, 4114 KB

Open Access Article

Biomimetic Computing for Efficient Spoken Language Identification

by Gaurav Kumar and Saurabh Bhardwaj

Biomimetics 2025, 10(5), 316; https://doi.org/10.3390/biomimetics10050316 - 14 May 2025

Viewed by 758

Abstract

Spoken Language Identification (SLID)-based applications have become increasingly important in everyday life, driven by advancements in artificial intelligence and machine learning. Multilingual countries utilize the SLID method to facilitate speech detection. This is accomplished by determining the language of the spoken parts using language recognizers. On the other hand, when working with multilingual datasets, the presence of multiple languages that have a shared origin presents a significant challenge for accurately classifying languages using automatic techniques. Further, one more challenge is the significant variance in speech signals caused by factors such as different speakers, content, acoustic settings, language differences, changes in voice modulation based on age and gender, and variations in speech patterns. In this study, we introduce the DBODL-MSLIS approach, which integrates biomimetic optimization techniques inspired by natural intelligence to enhance language classification. The proposed method employs Dung Beetle Optimization (DBO) with Deep Learning, simulating the beetle’s foraging behavior to optimize feature selection and classification performance. The proposed technique integrates speech preprocessing, which encompasses pre-emphasis, windowing, and frame blocking, followed by feature extraction utilizing pitch, energy, Discrete Wavelet Transform (DWT), and Zero crossing rate (ZCR). Further, the selection of features is performed by DBO algorithm, which removes redundant features and helps to improve efficiency and accuracy. Spoken languages are classified using Bayesian optimization (BO) in conjunction with a long short-term memory (LSTM) network. The DBODL-MSLIS technique has been experimentally validated using the IIIT Spoken Language dataset. The results indicate an average accuracy of 95.54% and an F-score of 84.31%. 
This technique surpasses various other state-of-the-art models, such as SVM, MLP, LDA, DLA-ASLISS, HMHFS-IISLFAS, GA base fusion, and VGG-16. We have evaluated the accuracy of our proposed technique against state-of-the-art biomimetic computing models such as GA, PSO, GWO, DE, and ACO. While ACO achieved up to 89.45% accuracy, our Bayesian Optimization with LSTM outperformed all others, reaching a peak accuracy of 95.55%, demonstrating its effectiveness in enhancing spoken language identification. The suggested technique demonstrates promising potential for practical applications in the field of multi-lingual voice processing.
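The preprocessing chain named in this abstract (pre-emphasis, frame blocking, windowing) and one of its features (zero crossing rate) can be sketched in a few lines. The pre-emphasis coefficient of 0.97, the Hamming window, and all function names are common defaults assumed here for illustration; the abstract does not give the values used by the authors.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n - 1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def frame_blocks(signal, frame_len, hop):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def hamming(n):
    """Hamming window of length n (assumed window type)."""
    if n < 2:
        return [1.0] * n
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for x, y in zip(frame, frame[1:]) if (x >= 0) != (y >= 0))
    return crossings / (len(frame) - 1)
```

A per-frame feature vector could then combine the zero crossing rate with energy, pitch, and wavelet coefficients before any feature selection step is applied.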

18 pages, 8414 KB

Open Access Article

Fish Body Pattern Style Transfer Based on Wavelet Transformation and Gated Attention

by Hongchun Yuan and Yixuan Wang

Appl. Sci. 2025, 15(9), 5150; https://doi.org/10.3390/app15095150 - 6 May 2025

Viewed by 540

Abstract

To address the temporal jitter with low segmentation accuracy and the lack of high-precision transformations for specific object classes in video generation, we propose the fish body pattern sync-style network for ornamental fish videos. This network innovatively integrates dynamic texture transfer with instance segmentation, adopting a two-stage processing architecture. First, high-precision video frame segmentation is performed using Mask2Former to eliminate background elements that do not participate in the style transfer process. Then, we introduce the wavelet-gated styling network, which reconstructs a multi-scale feature space via discrete wavelet transform, enhancing the granularity of multi-scale style features during the image generation phase. Additionally, we embed a convolutional block attention module within the residual modules, not only improving the realism of the generated images but also effectively reducing boundary artifacts in foreground objects. Furthermore, to mitigate the frame-to-frame jitter commonly observed in generated videos, we incorporate a contrastive coherence preserving loss into the training process of the style transfer network. This enhances the perceptual loss function, thereby preventing video flickering and ensuring improved temporal consistency. In real-world aquarium scenes, compared to state-of-the-art methods, FSSNet effectively preserves localized texture details in generated videos and achieves competitive SSIM and PSNR scores. Moreover, temporal consistency is significantly improved. The flow warping error index decreases to 1.412. We chose FNST (fast neural style transfer) as our baseline model and demonstrate improvements in both model parameter count and runtime efficiency. According to user preferences, 43.75% of participants preferred the dynamic effects generated by this method. Full article

(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)

27 pages, 43447 KB

Open Access Article

Vibration-Based Non-Contact Activity Classification for Home Cage Monitoring Using a Tuned-Beam IMU Sensing Device

by Pieter Try, René H. Tolba and Marion Gebhard

Sensors 2025, 25(8), 2549; https://doi.org/10.3390/s25082549 - 17 Apr 2025

Cited by 1 | Viewed by 2884

Abstract

This work presents a vibration-based non-contact monitoring method to classify the physical activity of a mouse inside a home cage. A novel tuned-beam sensing device is developed to measure low-amplitude activity-induced cage vibrations. The sensing device uses a mechanical beam structure to enhance a six-axis IMU that increases the signal-to-noise ratio (SNR) by 20 to 40 times in a relevant environment. A sophisticated classification algorithm is developed to process vibration sequences with a variable time frame that utilizes multi-level discrete wavelet transformation (MLDWT) to extract time–frequency features and optimize signal properties. The extracted features are classified by a convolutional neural network–long short-term memory (CNN-LSTM) machine learning model to determine the activity class. The ground truth is obtained with a camera-based system using EthoVision XT from Noldus and a custom post-processor. The method is developed on a dataset containing 300 h of vibration measurements with camera-based reference and includes two separate home cages and two individual mice. The method classifies the activity types Resting, Stationary Activity, Walking, Activity in Feeder, and Drinking with an accuracy of 86.81% and an average F1 score of 0.798 using a 9 s time frame. In long-term monitoring, the proposed method reproduces behavioral patterns such as sleep and acclimatization as accurately as the reference method, enabling home cage monitoring in the husbandry environment with a low-cost sensor. Full article
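To make the idea of multi-level discrete wavelet feature extraction concrete, the sketch below decomposes a 1D vibration window with a Haar wavelet and returns the energy in each band. The Haar basis, the use of band energies as features, and the function names are assumptions for illustration; the abstract does not specify the wavelet or the exact features passed to the CNN-LSTM.

```python
import math


def haar_step(signal):
    """One orthonormal Haar level: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def mldwt_band_energies(signal, levels):
    """Energies of the detail band at each level, then the final approximation.

    The signal length is assumed to be divisible by 2 ** levels.
    """
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies
```

Because the transform is orthonormal, the band energies sum to the energy of the input window, so each value can be read as the share of vibration energy in one frequency range.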

(This article belongs to the Section Intelligent Sensors)

23 pages, 9740 KB

Open Access Article

Rip Current Identification in Optical Images Using Wavelet Transform

by Hsu-Min Wang, Dong-Jiing Doong and Jian-Wu Lai

J. Mar. Sci. Eng. 2025, 13(4), 707; https://doi.org/10.3390/jmse13040707 - 2 Apr 2025

Cited by 1 | Viewed by 1016

Abstract

Rip currents are fast-moving, narrow channels of water that flow seaward from the shoreline, typically forming within the surf zone and extending beyond the wave-breaking region. These currents pose significant hazards to swimmers, contributing to numerous drowning incidents, especially with the increasing popularity of ocean recreation. Despite their prevalence, rip currents remain difficult to detect visually, and no universally reliable method exists for their identification by beachgoers. To address this challenge, this study presents a novel approach for detecting rip currents in optical images using wavelet-based edge detection and image convolution techniques. Five identification criteria were established based on previous literature and expert observations. The proposed program incorporates image augmentation, averaging, and frame aggregation to enhance generalization and accuracy. Experimental analysis involving four iterations and four wavelet bases demonstrated that using two iterations with the Daubechies wavelet yielded the highest interpretation accuracy (88.3%). Performance evaluation using a confusion matrix further confirmed an accuracy rate of 83.0%. The results indicate that the proposed method identifies rip currents in images, offering a valuable tool for researchers studying rip current patterns. This approach lays the groundwork for future advancements in rip current detection and related research. Full article

(This article belongs to the Special Issue Storm Tide and Wave Simulations and Assessment, 3rd Edition)

24 pages, 2050 KB

Open Access Article

An Autoregressive-Based Motor Current Signature Analysis Approach for Fault Diagnosis of Electric Motor-Driven Mechanisms

by Roberto Diversi, Alice Lenzi, Nicolò Speciale and Matteo Barbieri

Sensors 2025, 25(4), 1130; https://doi.org/10.3390/s25041130 - 13 Feb 2025

Cited by 1 | Viewed by 1861

Abstract

Maintenance strategies such as condition-based maintenance and predictive maintenance of machines have gained importance in industrial automation firms as key concepts in Industry 4.0. As a result, online condition monitoring of electromechanical systems has become a crucial task in many industrial applications. Motor current signature analysis (MCSA) is an interesting noninvasive alternative to vibration analysis for the condition monitoring and fault diagnosis of mechanical systems driven by electric motors. The MCSA approach is based on the premise that faults in the mechanical load driven by the motor manifest as changes in the motor’s current behavior. This paper presents a novel data-driven, MCSA-based CM approach that exploits autoregressive (AR) spectral estimation. A multiresolution analysis of the raw motor currents is first performed using the discrete wavelet transform with Daubechies filters, enabling the separation of noise, disturbances, and variable torque effects from the current signals. AR spectral estimation is then applied to selected wavelet details to extract relevant features for fault diagnosis. In particular, a reference AR power spectral density (PSD) is estimated using data collected under healthy conditions. The AR PSD is then continuously or periodically updated with new data frames and compared to the reference PSD through the Symmetric Itakura–Saito spectral distance (SISSD). The SISSD, which serves as the health indicator, has proven capable of detecting fault occurrences through changes in the AR spectrum. The proposed procedure is tested on real data from two different scenarios: (i) an experimental in-house setup where data are collected during the execution of electric cam motion tasks (imbalance faults are emulated); (ii) the Korea Advanced Institute of Science and Technology testbed, whose data set is publicly available (bearing faults are considered). 
The results demonstrate the effectiveness of the method in both fault detection and isolation. In particular, the proposed health indicator exhibits strong detection capabilities, as its values under fault conditions exceed those under healthy conditions by one order of magnitude.
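The comparison between a reference PSD and an updated PSD can be sketched with a symmetrized Itakura–Saito distance. The Itakura–Saito divergence between spectra P and Q averages P/Q − ln(P/Q) − 1 over frequency bins; adding the divergence in both directions cancels the logarithms. The factor of one half and the function name below are assumptions, since the abstract does not give the exact normalization used for the SISSD.

```python
def symmetric_itakura_saito(psd_ref, psd_new):
    """Symmetrized Itakura-Saito distance between two sampled PSDs.

    Both inputs must have the same length and strictly positive values.
    Uses 0.5 * (D(P, Q) + D(Q, P)), which reduces to the mean of
    (P/Q + Q/P - 2) / 2 over the frequency bins (assumed normalization).
    """
    if len(psd_ref) != len(psd_new) or not psd_ref:
        raise ValueError("PSDs must be non-empty and of equal length")
    total = 0.0
    for p, q in zip(psd_ref, psd_new):
        if p <= 0 or q <= 0:
            raise ValueError("PSD values must be strictly positive")
        total += p / q + q / p - 2.0
    return 0.5 * total / len(psd_ref)
```

The distance is zero when the two spectra coincide and grows as they diverge, which is the behavior a health indicator needs when each new data frame is compared against the healthy reference.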

(This article belongs to the Special Issue Advanced Fault Diagnosis and Health Monitoring Techniques for Complex Engineering Systems: 2nd Edition)

20 pages, 917 KB

Open Access Article

Developing a Dataset of Audio Features to Classify Emotions in Speech

by Alvaro A. Colunga-Rodriguez, Alicia Martínez-Rebollar, Hugo Estrada-Esquivel, Eddie Clemente and Odette A. Pliego-Martínez

Computation 2025, 13(2), 39; https://doi.org/10.3390/computation13020039 - 5 Feb 2025

Cited by 2 | Viewed by 3352

Abstract

Emotion recognition in speech has gained increasing relevance in recent years, enabling more personalized interactions between users and automated systems. This paper presents the development of a dataset of features obtained from RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) to classify emotions in speech. The paper highlights audio processing techniques such as silence removal and framing to extract features from the recordings. The features are extracted from the audio signals using spectral techniques, time-domain analysis, and the discrete wavelet transform. The resulting dataset is used to train a neural network and the support vector machine learning algorithm. Cross-validation is employed for model training. The developed models were optimized using a software package that performs hyperparameter tuning to improve results. Finally, the emotional classification outcomes were compared. The results showed an emotion classification accuracy of 0.654 for the perceptron neural network and 0.724 for the support vector machine algorithm, demonstrating satisfactory performance in emotion classification. Full article

(This article belongs to the Section Computational Engineering)

16 pages, 8462 KB

Open Access Article

Wavelet-Based, Blur-Aware Decoupled Network for Video Deblurring

by Hua Wang, Pornntiwa Pawara and Rapeeporn Chamchong

Appl. Sci. 2025, 15(3), 1311; https://doi.org/10.3390/app15031311 - 27 Jan 2025

Viewed by 1354

Abstract

Video deblurring faces a fundamental challenge, as blur degradation comprehensively affects frames by not only causing detail loss but also severely distorting structural information. This dual degradation across low- and high-frequency domains makes it challenging for existing methods to simultaneously restore both structural and detailed information through a unified approach. To address this issue, we propose a wavelet-based, blur-aware decoupled network (WBDNet) that innovatively decouples structure reconstruction from detail enhancement. Our method decomposes features into multiple frequency bands and employs specialized restoration strategies for different frequency domains. In the low-frequency domain, we construct a multi-scale feature pyramid with optical flow alignment. This enables accurate structure reconstruction through bottom-up progressive feature fusion. For high-frequency components, we combine deformable convolution with a blur-aware attention mechanism. This allows us to precisely extract and merge sharp details from multiple frames. Extensive experiments on benchmark datasets demonstrate the superior performance of our method, particularly in preserving structural integrity and detail fidelity. Full article

(This article belongs to the Special Issue Deep Learning and Transformer Technologies for Image/Video Enhancement and Restoration)

29 pages, 17674 KB

Open Access Article

Noise-Perception Multi-Frame Collaborative Network for Enhanced Polyp Detection in Endoscopic Videos

by Haoran Li, Guoyong Zhen, Chengqun Chu, Yuting Ma and Yongnan Zhao

Electronics 2025, 14(1), 62; https://doi.org/10.3390/electronics14010062 - 27 Dec 2024

Cited by 1 | Viewed by 1047

Abstract

The accurate detection and localization of polyps during endoscopic examinations are critical for early disease diagnosis and cancer prevention. However, the presence of artifacts and noise, along with the high similarity between polyps and surrounding tissues in color, shape, and texture complicates polyp detection in video frames. To tackle these challenges, we deployed multivariate regression analysis to refine the model and introduced a Noise-Suppressing Perception Network (NSPNet) designed for enhanced performance. NSPNet leverages wavelet transform to enhance the model’s resistance to noise and artifacts while improving a multi-frame collaborative detection strategy for dynamic polyp detection in endoscopic videos, efficiently utilizing temporal information to strengthen features across frames. Specifically, we designed a High-Low Frequency Feature Fusion (HFLF) framework, which allows the model to capture high-frequency details more effectively. Additionally, we introduced an improved STFT-LSTM Polyp Detection (SLPD) module that utilizes temporal information from video sequences to enhance feature fusion in dynamic environments. Lastly, we integrated an Image Augmentation Polyp Detection (IAPD) module to improve performance on unseen data through preprocessing enhancement strategies. Extensive experiments demonstrate that NSPNet outperforms nine SOTA methods across four datasets on key performance metrics, including F1Score and recall. Full article

17 pages, 9263 KB

Open AccessArticle

HHS-RT-DETR: A Method for the Detection of Citrus Greening Disease

by Yi Huangfu, Zhonghao Huang, Xiaogang Yang, Yunjian Zhang, Wenfeng Li, Jie Shi and Linlin Yang

Agronomy 2024, 14(12), 2900; https://doi.org/10.3390/agronomy14122900 - 4 Dec 2024

Cited by 8 | Viewed by 1737

Abstract

Background: Given the severe economic burden that citrus greening disease imposes on fruit farmers and related industries, rapid and accurate disease detection is particularly crucial. This not only effectively curbs the spread of the disease, but also significantly reduces reliance on manual detection within extensive citrus planting areas. Objective: In response to this challenge, and to address the issues posed by resource-constrained platforms and complex backgrounds, this paper designs and proposes a novel method for the recognition and localization of citrus greening disease, named the HHS-RT-DETR model. The goal of this model is to achieve precise detection and localization of the disease while maintaining efficiency. Methods: Based on the RT-DETR-r18 model, the following improvements are made. The HS-FPN (high-level screening-feature pyramid network) is used to improve the feature fusion and feature selection part of the RT-DETR model: low-level features are filtered and the filtered feature information is merged with the high-level features, so as to enhance the feature selection ability and multi-level feature fusion ability of the model. In the feature fusion and feature selection sections, the HWD (hybrid wavelet-directional filter banks) downsampling operator is introduced to prevent the loss of effective information in the channel and reduce the computational complexity of the model. Using the ShapeIoU loss function lets the model focus on the shape and scale of the bounding box itself, so that its bounding-box predictions are more accurate. Conclusions and Results: This study has successfully developed an improved HHS-RT-DETR model which exhibits efficiency and accuracy on resource-constrained platforms and offers significant advantages for the automatic detection of citrus greening disease.
Experimental results show that the improved model, when compared to the RT-DETR-r18 baseline model, has achieved significant improvements in several key performance metrics: the precision increased by 7.9%, the frame rate increased by 4 frames per second (f/s), the recall rose by 9.9%, and the average accuracy also increased by 7.5%, while the number of model parameters was reduced by $0.137 \times 10^{7}$. Moreover, the improved model has demonstrated outstanding robustness in detecting occluded leaves within complex backgrounds. This provides strong technical support for the early detection and timely control of citrus greening disease. Additionally, the improved model has showcased advanced detection capabilities on the PASCAL VOC dataset. Discussions: Future research plans include expanding the dataset to encompass a broader range of citrus species and different stages of citrus greening disease. In addition, the plans involve incorporating leaf images under various lighting conditions and different weather scenarios to enhance the model’s generalization capabilities, ensuring the accurate localization and identification of citrus greening disease in diverse complex environments. Lastly, the integration of the improved model into an unmanned aerial vehicle (UAV) system is envisioned to enable the real-time, regional-level precise localization of citrus greening disease.

(This article belongs to the Section Precision and Digital Agriculture)

15 pages, 2137 KB

Open AccessArticle

Research on Abnormal State Detection of CZ Silicon Single Crystal Based on Multimodal Fusion

by Lei Jiang, Haotan Wei and Ding Liu

Sensors 2024, 24(21), 6819; https://doi.org/10.3390/s24216819 - 23 Oct 2024

Viewed by 1171

Abstract

The Czochralski method is the primary technique for single-crystal silicon production. However, anomalous states such as crystal loss, twisting, swinging, and squareness frequently occur during crystal growth, adversely affecting product quality and production efficiency. To address this challenge, we propose an enhanced multimodal fusion classification model for detecting and categorizing these four anomalous states. Our model initially transforms one-dimensional signals (diameter, temperature, and pulling speed) into time–frequency domain images via continuous wavelet transform. These images are then processed using a Dense-ECA-SwinTransformer network for feature extraction. Concurrently, meniscus images and inter-frame difference images are obtained from the growth system’s meniscus video feed. These visual inputs are fused at the channel level and subsequently processed through a ConvNeXt network for feature extraction. Finally, the time–frequency domain features are combined with the meniscus image features and fed into fully connected layers for multi-class classification. The experimental results show that the method can effectively detect the various abnormal states and help staff make more accurate judgments and formulate a treatment plan tailored to each abnormal state, which can improve production efficiency, save production resources, and protect the extraction equipment.
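The abstract above mentions computing inter-frame difference images and fusing them with meniscus images at the channel level, without giving implementation details. The sketch below shows one plausible reading under stated assumptions: frames are small grayscale grids, the difference is the per-pixel absolute change between consecutive frames, and fusion means stacking planes as channels. The helper names `frame_difference` and `fuse_channels` are hypothetical and do not come from the paper.

```python
def frame_difference(prev_frame, curr_frame):
    """Per-pixel absolute difference between two equally sized frames."""
    if len(prev_frame) != len(curr_frame):
        raise ValueError("frames must have the same height")
    diff = []
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        if len(prev_row) != len(curr_row):
            raise ValueError("frames must have the same width")
        diff.append([abs(c - p) for p, c in zip(prev_row, curr_row)])
    return diff


def fuse_channels(*planes):
    """Stack equally sized 2D planes into a height x width x channels grid."""
    height, width = len(planes[0]), len(planes[0][0])
    for plane in planes:
        if len(plane) != height or any(len(row) != width for row in plane):
            raise ValueError("all planes must have the same shape")
    return [
        [[plane[r][c] for plane in planes] for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    previous = [[10, 10], [10, 10]]
    current = [[10, 12], [7, 10]]
    motion = frame_difference(previous, current)
    # Each pixel now carries two channels: intensity and frame-to-frame change.
    fused = fuse_channels(current, motion)
    print(motion)
    print(fused)
```

Stacking the difference image as an extra channel lets a single convolutional backbone see both appearance and short-term motion at each pixel; whether the authors stack, concatenate after an embedding, or weight the channels is not stated in the abstract.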

(This article belongs to the Special Issue Feature Papers in Physical Sensors 2024)

Search Results (91)