Search Results (2,257)

Search Parameters:
Keywords = lightweight deep learning

29 pages, 12453 KB  
Article
A Lightweight Drainage Pipe Defect Detection Method Based on an Improved YOLO11 Network
by Rui Xue, Hongtao Fu, Hui Zhao and Chongquan Wang
Information 2026, 17(6), 613; https://doi.org/10.3390/info17060613 (registering DOI) - 21 Jun 2026
Abstract
Drainage pipe defect detection is essential for maintaining the normal operation of urban infrastructure. In recent years, deep learning-based object detection methods have provided an effective technical solution for drainage pipe defect recognition. Among them, YOLO-series models have demonstrated strong potential in visual detection tasks due to their end-to-end architecture and high inference efficiency. However, directly applying baseline YOLO models may still face challenges such as limited detection accuracy, relatively high model complexity, and insufficient adaptability for lightweight deployment scenarios. To address these issues, this paper proposes a lightweight drainage pipe defect detection method based on an improved YOLO11 network. Rather than treating detection enhancement and model compression as two separate procedures, the proposed method integrates feature enhancement, adaptive pruning, and distillation-based recovery into a unified lightweight detection framework. Specifically, an improved SimAM attention mechanism is introduced into the backbone and integrated with the C3k2 module to construct the C3K2_SWS module, aiming to enhance the representation capability of critical defect features. In the neck network, a focused diffusion pyramid network with a dimension-aware selective fusion structure, termed FDPN-DASI, is designed to strengthen multi-scale feature interactions. In addition, an adaptive-threshold focal loss (ATFL) is introduced to improve the learning capability for hard samples. For efficient deployment, the LAMP pruning algorithm is further improved, and an entropy-guided entropy-adaptive magnitude-based pruning method (EA-LAMP) is proposed to enable adaptive allocation of pruning ratios across different network layers. Moreover, BCKD knowledge distillation is applied after pruning to mitigate the accuracy degradation caused by model compression. 
Experimental results indicate that the proposed lightweight YOLO11-SFA+EA+BCKD framework achieves a precision of 92.4%, a recall of 88.5%, and an mAP50 of 93.3%, while maintaining a compact model size of 1.6 M parameters and 4.5 G FLOPs. Compared with the baseline model, the proposed method improves precision, recall, and mAP50 by 5.9%, 5.0%, and 4.7%, respectively, while reducing the number of parameters, FLOPs, and model size by 1.0 M, 1.8 G, and 2.1 M, respectively. These results suggest that the proposed framework can improve detection performance while reducing model complexity under the current experimental setting, indicating its potential for lightweight drainage pipe defect detection tasks. Full article
(This article belongs to the Section Artificial Intelligence)
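The abstract above builds its EA-LAMP method on the LAMP pruning algorithm. As a rough illustration of the baseline idea only, the sketch below computes the standard LAMP score (each squared weight divided by the sum of squared weights of equal or larger magnitude in the same layer) and prunes globally by that score. It does not reproduce the entropy-adaptive ratio allocation described by the authors; the function names `lamp_scores` and `global_prune` and the toy layers are hypothetical.

```python
def lamp_scores(weights):
    """Return LAMP scores for a flat list of weights, in the original order.

    score(u) = w_u^2 / sum of w_v^2 over all v whose magnitude is >= |w_u|.
    """
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    scores = [0.0] * len(weights)
    tail = 0.0
    # Walk from the largest magnitude down, accumulating the tail sum.
    for i in reversed(order):
        sq = weights[i] ** 2
        tail += sq
        scores[i] = sq / tail if tail > 0 else 0.0
    return scores


def global_prune(layers, sparsity):
    """Zero out the `sparsity` fraction of weights with the lowest LAMP score.

    `layers` maps a (hypothetical) layer name to a flat list of weights.
    """
    scored = []
    for name, w in layers.items():
        for i, s in enumerate(lamp_scores(w)):
            scored.append((s, name, i))
    scored.sort(key=lambda t: t[0])
    n_prune = int(sparsity * len(scored))
    pruned = {name: list(w) for name, w in layers.items()}
    for _, name, i in scored[:n_prune]:
        pruned[name][i] = 0.0
    return pruned


# Toy example: layer "b" has much smaller weights than layer "a",
# yet its largest weight survives because scores are normalised per layer.
toy_layers = {"a": [3.0, 1.0, 2.0], "b": [0.1, 0.2]}
toy_pruned = global_prune(toy_layers, sparsity=0.4)
```

Because the largest weight of every layer always scores 1, a single global threshold cannot remove an entire layer, which is the property that makes per-layer pruning ratios emerge automatically.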
25 pages, 5240 KB  
Article
Monocular Estimation of Grape Berry Size (Caliber) Distributions Using Geometry-Aware Representations and Structured Prediction
by Matias Soto, Pablo Ormeño-Arriagada and Jorge Vasquez
Appl. Sci. 2026, 16(12), 6225; https://doi.org/10.3390/app16126225 (registering DOI) - 20 Jun 2026
Abstract
Grape caliber distributions are critical for packing, grading, yield estimation, and post-harvest logistics. However, estimating reliable caliber histograms from single images remains challenging due to occlusion and dense bunch structure. This work presents a two-stage monocular pipeline that integrates instance segmentation, geometry-aware representations, residual quantity correction, and structured histogram prediction. In the first stage, a YOLO-based model detects grape instances and a calibration object, enabling the construction of geometry-aware auxiliary channels and a segmentation-derived counting prior. In the second stage, these representations are used to estimate total grape count and caliber distributions. Results show that RGBDT consistently outperforms RGB, indicating that geometry-aware cues improve both histogram fidelity and counting accuracy. The framework achieves stable performance under realistic conditions while maintaining low runtime, supporting practical deployment in agricultural environments. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
34 pages, 22401 KB  
Article
Sensor-Driven Short-Term Forecasting on the Metropolitan LA Traffic Dataset: A Comparative Study for Multi-Step Prediction
by Bowen Dong, Xinyu Zhang, Weiyan Zhu, Lingmin Hou, Chaoya Yan, Yifan Feng and Lixing Lin
Sensors 2026, 26(12), 3917; https://doi.org/10.3390/s26123917 (registering DOI) - 20 Jun 2026
Abstract
Short-term traffic forecasting is a critical component of intelligent transportation systems. While deep learning architectures for this task have proliferated rapidly, the sensor-level data characteristics—zero-value prevalence, distributional heterogeneity, and cross-sensor correlation structure—that drive architecture-specific failure modes remain insufficiently understood, and their implications for evidence-based model selection in real deployments have not been systematically addressed. This study addresses that question through a sensor-network diagnostic framework applied to the METR-LA dataset (Metropolitan Los Angeles; 207 inductive loop detectors, 5-min resolution). The framework integrates systematic characterization of sensor data properties, a controlled benchmark of four representative architectures—Transformer, Spatio-Temporal Graph Convolutional Network (STGCN), Diffusion Convolutional Recurrent Neural Network (DCRNN), and Gated Temporal Convolutional Network (Gated TCN)—under a unified 12→3 prediction setting, and a novel per-sensor regression analysis that quantitatively links zero-value ratios to model-specific prediction errors across all 207 sensors. Building on these findings, this study further proposes Graph-Enhanced Transformer (GETFormer), a lightweight hybrid architecture that augments the Transformer with a single-hop Graph Convolutional Network (GCN) layer and a gated residual fusion module. The diagnostic findings and condition-dependent model-selection guidelines provide an empirically grounded foundation for principled hybrid architecture development in urban traffic sensing. Full article
28 pages, 8358 KB  
Article
Deep Climate Model Distillation for Localized Flood Forecasting in Low-Resource Areas
by Julius Olaniyan, Deborah Olaniyan, Ibidun C. Obagbuwa and Madison N. Ngafeeson
Meteorology 2026, 5(2), 16; https://doi.org/10.3390/meteorology5020016 (registering DOI) - 19 Jun 2026
Viewed by 50
Abstract
Floods remain among the most devastating natural disasters globally, disproportionately impacting low-resource regions where real-time flood forecasting is constrained by limited computational infrastructure and the scarcity of fine-resolution predictive models. Although state-of-the-art global climate models achieve high predictive accuracy, their scale and computational complexity restrict their applicability in localized and resource-constrained settings. This study proposes a deep climate model distillation framework that transfers knowledge from a high-capacity Fourier Neural Operator (FNO)-based global climate model inspired by FourCastNet into lightweight, regionally adaptive student networks suitable for edge deployment. The framework combines climate variables, satellite observations, and hydrological measurements to improve localized flood prediction. Knowledge transfer is achieved through a multi-objective distillation strategy that combines supervised learning, soft-target alignment, and intermediate feature matching. Experimental evaluation across multiple flood-prone regions in Sub-Saharan Africa and South Asia shows that the distilled student model achieves an average classification accuracy of 0.89, an AUC of 0.91, and an F1-score of 0.88, retaining approximately 96.7% of the teacher model’s predictive performance. In continuous discharge estimation, the model attains a mean absolute error of 0.17, an RMSE of 0.24, and an R² score of 0.85. The proposed distillation approach yields an 8× reduction in inference latency and over a 20× reduction in model size, enabling real-time execution on low-power edge devices such as the Raspberry Pi 4 and NVIDIA Jetson Nano. The student model further demonstrates robust regional and temporal generalization, with limited performance degradation in unseen geographic areas and during extreme flood years. Full article
(This article belongs to the Special Issue Early Career Scientists’ (ECS) Contributions to Meteorology (2026))
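The abstract above names three distillation terms: supervised learning, soft-target alignment, and intermediate feature matching. A minimal sketch of how such terms are commonly combined for a single classification sample is given below. The weights `alpha`, `beta`, `gamma`, the temperature value, and the function names are assumptions for illustration; the abstract does not state the authors' actual loss formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      student_feat, teacher_feat,
                      alpha=1.0, beta=0.5, gamma=0.1, temperature=4.0):
    """Combine three terms (all weights are illustrative assumptions).

    1. supervised cross-entropy against the ground-truth label
    2. KL divergence between temperature-softened teacher and student outputs
    3. mean squared error between intermediate feature vectors
    """
    ce = -math.log(softmax(student_logits)[label])

    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)

    mse = sum((a - b) ** 2 for a, b in zip(student_feat, teacher_feat))
    mse /= len(student_feat)

    # The T^2 factor keeps soft-target gradients on a comparable scale.
    return alpha * ce + beta * temperature ** 2 * kl + gamma * mse
```

When the student reproduces the teacher's logits and features exactly, the second and third terms vanish and only the supervised cross-entropy remains.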
20 pages, 4527 KB  
Article
A Re-Parameterized Lightweight Residual Attention Framework for Resource-Constrained Edge Computing
by Yuze Gao, Jiamin Zhu, Xiaoxiao Liu and Wei Wu
Computers 2026, 15(6), 395; https://doi.org/10.3390/computers15060395 (registering DOI) - 19 Jun 2026
Viewed by 125
Abstract
Edge vision systems require convolutional neural networks (CNNs) that preserve recognition accuracy under strict storage, computation, and latency constraints. Although ResNet18 is a compact residual backbone, direct deployment on resource-constrained devices remains costly, whereas simple channel reduction weakens representation capacity. This study aims to build a deployable ResNet18-based classifier that reduces model complexity while recovering the accuracy lost during compression. We propose a lightweight framework that combines global channel scaling, a re-parameterized attention residual block, and teacher–student knowledge distillation. The proposed block uses multi-branch convolution and squeeze-and-excitation attention during training, then folds the linear branches into a single 3-by-3 convolution for inference. Experiments on CIFAR-100 show that the final model reduces parameters from 11.220 M to 2.841 M, retains comparable Top-1 accuracy (0.7579 vs. 0.7606), improves Top-5 accuracy (0.9340 vs. 0.9253), and reduces graphics processing unit (GPU) batch inference latency from 3.279 ms to 2.161 ms. Deployment on PYNQ-Z2 verifies the complete camera-based CPU-side inference workflow, with an average end-to-end latency of 421.467 ms/frame. The results indicate that residual topology preservation, re-parameterized feature enhancement, and distillation form a practical route for edge-oriented lightweight CNN deployment. Full article
(This article belongs to the Topic Smart Edge Devices: Design and Applications)
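The abstract above describes folding linear training-time branches into a single 3-by-3 convolution for inference. The sketch below shows the underlying idea on a deliberately simplified case: one channel, zero padding, no batch normalisation, and three parallel branches (a 3x3 convolution, a 1x1 convolution, and an identity shortcut). The branch layout and all function names are assumptions for illustration rather than the authors' block; non-linear parts such as squeeze-and-excitation attention cannot be folded this way.

```python
def conv3x3(image, kernel):
    """'Same' 3x3 cross-correlation with zero padding on a 2D list."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch (scalar k1) + identity."""
    y3 = conv3x3(image, k3)
    return [[y3[r][c] + k1 * image[r][c] + image[r][c]
             for c in range(len(image[0]))]
            for r in range(len(image))]


def fold_branches(k3, k1):
    """Inference-time form: add the 1x1 weight and the identity (weight 1)
    to the centre tap of a copy of the 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1 + 1.0
    return merged


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.5, -1.0, 0.25], [0.0, 2.0, -0.5], [1.0, 0.0, -0.25]]
k1 = 0.75
train_out = multi_branch(image, k3, k1)
infer_out = conv3x3(image, fold_branches(k3, k1))
```

Both forms give the same output because convolution is linear in the kernel: a 1x1 kernel and the identity are each a 3x3 kernel that is zero everywhere except at the centre.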
40 pages, 1911 KB  
Article
Monocular 3D Position Estimation of a Moving Vehicle Based on a Kalman-Goldschmidt Adaptive Filter
by Diana Kalita, Pavel Lyakhov, Valery Andreev and Denis Butusov
J. Sens. Actuator Netw. 2026, 15(3), 48; https://doi.org/10.3390/jsan15030048 (registering DOI) - 18 Jun 2026
Viewed by 63
Abstract
Determining the 3D position of a vehicle from a 2D image plays a key role in video surveillance, autonomous driving, and spatial localization. However, localization accuracy can degrade significantly under conditions of incomplete or synthetic measurement noise and keypoint jitter. In this paper, we propose a new iterative 3D position estimation algorithm (KGA). This algorithm includes geometric correction and calibration steps for converting from 2D to 3D coordinates; trajectory prediction and correction using a Kalman filter; and adaptive tuning of the filter parameters using the Goldschmidt algorithm. Experiments confirm that KGA outperforms the standard (FK) and modified (MFK) Kalman filters in accuracy and convergence speed, demonstrating robustness to various camera angles and noise levels. The novelty of this approach lies in the integration of the Goldschmidt algorithm into the Kalman filter to create an adaptation mechanism that dynamically adjusts the measurement noise covariance based on instantaneous innovation magnitude. Unlike end-to-end deep learning trackers or nonlinear filters (EKF/UKF), KGA is designed as a lightweight post-processing stage that can be seamlessly integrated into existing detection pipelines while maintaining the low computational footprint required for UAV-based edge deployment. The algorithm is of practical value for computer vision systems requiring accurate and robust tracking under varying observational conditions; the current implementation is suitable for offline or buffered processing, with clear pathways to real-time deployment through code optimization. Full article
(This article belongs to the Section Big Data, Computing and Artificial Intelligence)
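The abstract above combines a Kalman filter with the Goldschmidt algorithm and adapts the measurement noise covariance to the innovation magnitude. The abstract does not say how the two are coupled, so the sketch below is only one plausible reading: a scalar random-walk Kalman filter whose measurement noise grows with the squared innovation, with the gain division carried out by Goldschmidt iteration. The adaptation rule `r = r0 + c * innovation**2`, the parameter values, and the function names are all assumptions, not the KGA algorithm itself.

```python
import math


def goldschmidt_divide(n, d, iterations=6):
    """Approximate n / d for d > 0 by Goldschmidt iteration.

    Numerator and denominator are multiplied by the same factor
    f = 2 - d until the denominator converges to 1.
    """
    if d <= 0:
        raise ValueError("denominator must be positive")
    # Scale so the denominator lies in [0.5, 1), where convergence is fast.
    mantissa, exponent = math.frexp(d)
    n_i = math.ldexp(n, -exponent)
    d_i = mantissa
    for _ in range(iterations):
        f = 2.0 - d_i
        n_i *= f
        d_i *= f
    return n_i


def adaptive_kalman_1d(measurements, q=1e-3, r0=0.1, c=1.0):
    """Scalar Kalman filter with innovation-dependent measurement noise.

    q, r0, and c are illustrative values, not taken from the paper.
    """
    x, p = measurements[0], 1.0
    estimates = [x]
    for z in measurements[1:]:
        p = p + q                          # predict (random-walk model)
        innovation = z - x
        r = r0 + c * innovation ** 2       # hypothetical adaptation rule
        k = goldschmidt_divide(p, p + r)   # Kalman gain without '/' operator
        x = x + k * innovation
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# A single large outlier (50.0) is heavily down-weighted by the adapted noise.
track = adaptive_kalman_1d([1.0, 1.0, 1.0, 50.0, 1.0])
```

Goldschmidt iteration uses only multiplications and subtractions, which is one reason it is attractive on hardware where division is expensive.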
34 pages, 2338 KB  
Review
A Taxonomy of Machine Learning for UAV-Enabled Precision Agriculture: A Structured Survey
by Wan D. Bae, Shayma Alkobaisi, Muhammad Farhan Safdar and Prachitee Chouhan
AgriEngineering 2026, 8(6), 249; https://doi.org/10.3390/agriengineering8060249 - 18 Jun 2026
Viewed by 211
Abstract
Precision agriculture increasingly relies on machine learning applied to high-resolution data acquired by unmanned aerial vehicles (UAVs) to support crop monitoring, stress detection, and yield forecasting. This survey presents a structured review of machine learning methods for UAV-enabled precision agriculture and organizes over 100 peer-reviewed studies within a unified four-dimensional taxonomy defined by sensing modality, data type, model family, and analytical task. The taxonomy enables systematic comparison across RGB, multispectral, hyperspectral, LiDAR, and IoT data sources and across classical machine learning, deep learning, hybrid sequential models, and emerging transformer-based architectures. We analyze how modeling choices interact with data characteristics to influence robustness, cross-environment generalization, computational efficiency, and deployment feasibility on UAV and edge platforms. Recurring challenges include limited labeled data, domain shift across seasons and fields, multimodal heterogeneity, occlusion, and real-time processing constraints. We identify emerging research directions, including data-efficient learning, representation-level multimodal fusion, domain adaptation, lightweight architectures for embedded deployment, and uncertainty aware decision support. By formalizing the landscape through a unified taxonomy, this survey provides a foundation for designing scalable, robust, and deployable machine learning systems for next-generation precision agriculture. Full article
26 pages, 13171 KB  
Article
A Deep Learning Approach for Pixel-Level Material Classification via Hyperspectral Imaging
by Savvas Sifnaios, George Arvanitakis, Fotios K. Konstantinidis, Georgios Tsimiklis, Angelos Amditis and Panayiotis Frangos
J. Imaging 2026, 12(6), 267; https://doi.org/10.3390/jimaging12060267 - 18 Jun 2026
Viewed by 171
Abstract
Recent advancements in computer vision, particularly in detection, segmentation, and classification, have significantly impacted various domains. However, these advancements are still strongly tied to RGB-based systems, which are insufficient for applications in industries such as waste sorting, pharmaceuticals, and defence, where material characterization beyond shape or visible colour is necessary. Hyperspectral (HS) imaging captures spatial and spectral information for each pixel and therefore offers a promising route for material-level classification. This study evaluates the potential of combining HS imaging with deep learning for plastic material classification. The work includes: (i) the design of an experimental setup with a HS line-scan camera, conveyor, and controlled illumination; (ii) the construction of an object-disjoint dataset of HDPE, PET, PP, and PS samples with semi-automated mask generation and Raman spectroscopy-based labelling; and (iii) the development of P1CH, a lightweight pixel-wise 1D convolutional hyperspectral classifier. On object-disjoint test images, P1CH achieved 97.44% all-pixel accuracy. A boundary sensitivity analysis, reported separately because semi-automated labels are uncertain at material/background interfaces, yielded 99.94% accuracy after excluding a pre-defined two-pixel border band. Additional ablation, baseline, and robustness analyses show that the proposed pixel-wise spectral approach is effective for small fragments, visually similar plastics, and overlapping materials, while black or very dark plastics remain challenging under the present camera and illumination configuration. Full article
(This article belongs to the Special Issue Advancement in Hyperspectral Image Processing with Machine Learning)
19 pages, 13879 KB  
Article
An Integrated Framework for Multi-UAV Trajectory Prediction and Handover Optimization in 5G Networks
by Ahmed Lateef Salih Al-Karawi and Rafet Akdeniz
Electronics 2026, 15(12), 2702; https://doi.org/10.3390/electronics15122702 - 18 Jun 2026
Viewed by 151
Abstract
The proliferation of Unmanned Aerial Vehicles (UAVs) in various applications has created a pressing need for robust and efficient communication systems. Fifth-generation (5G) networks can support UAV connectivity through high bandwidth and low-latency communication; however, rapid three-dimensional UAV mobility creates handover-management challenges that can increase signalling overhead, service interruption, and Quality of Service (QoS) degradation. This paper presents an integrated framework that combines LSTM-based multi-UAV trajectory prediction with proactive handover optimization using an Advantage Actor–Critic (A2C) Deep Reinforcement Learning (DRL) agent. The LSTM predictor is evaluated on a real-world UAV trajectory dataset and reports a root mean square error (RMSE) of 4.37 m over a 5 s prediction horizon after conversion to a local East–North–Up coordinate frame. A lightweight simulation-level coordination mechanism is included to reduce simultaneous target-cell contention among multiple UAVs; it is not claimed as a new standardized 3GPP signalling procedure. Handover performance is evaluated by replaying 180 held-out flight trajectories in a controlled 5G simulation across ten independent random seeds. Under these stated assumptions, the proposed framework achieves a handover success rate of 94.2±0.8%, an average SINR of 15.8±0.2 dB, a handover delay of 45.2±1.1 ms, and a handover frequency of 0.85±0.05 HOs/min, outperforming the tuned 3GPP A3, reactive SINR, and CASH baselines in the reported simulation results (Wilcoxon signed-rank test, p<0.01, Bonferroni-corrected). The experimental setup is described in detail to support methodological transparency and facilitate future replication, but the handover results should be interpreted as simulation-based evidence rather than live-network validation. Full article
23 pages, 2110 KB  
Article
A Lightweight LCGRU–Wave-SkipConvNet Framework for Speech–Noise Separation in Urban Acoustic Environments and Performing-Arts Spaces Toward Sustainable and Equitable Acoustic Communication
by Baoli Zhang, Yanping Lu, Dandan Wang and Hongyan Liu
Sustainability 2026, 18(12), 6242; https://doi.org/10.3390/su18126242 - 17 Jun 2026
Viewed by 164
Abstract
Urban acoustic environments and performing-arts spaces strongly influence speech communication quality, acoustic comfort, and public wellbeing, particularly in noise-exposed shared environments such as transport hubs, campuses, healthcare spaces, public service facilities, music-education settings, and rehearsal or performance-related spaces. To address speech–noise separation in low signal-to-noise ratio and acoustically complex scenarios, this study proposes a lightweight two-stage deep learning framework termed LCGRU–Wave-SkipConvNet. In the preprocessing stage, a Lightweight Convolutional Gated Recurrent Unit (LCGRU) model is employed to achieve preliminary separation of target speech and background noise by capturing both spatial and temporal acoustic features. In the post-processing stage, a Wave-SkipConvNet model is introduced to further suppress residual noise and enhance speech quality. Experimental results demonstrate that the proposed framework achieves superior performance under different signal-to-noise ratios, sound-source angles, and target angle errors. For example, in the preprocessing stage, the LCGRU model achieved a perceptual evaluation of speech quality (PESQ) score of 2.64 at source angles between 0° and 30°, outperforming the convolutional neural network-long short-term memory (CNN-LSTM) model by 1.17. In the post-processing stage, the Wave-SkipConvNet model achieved higher short-time objective intelligibility (STOI) and segmental signal-to-noise ratio (segSNR) values than the comparison models under different SNR conditions. The proposed framework provides an effective and deployment-oriented AI solution for improving speech accessibility and acoustic comfort in urban acoustic environments and performing-arts spaces. Beyond speech enhancement, it offers practical potential for supporting healthier, more inclusive, and more equitable acoustic environments in noise-sensitive public and educational spaces. 
It should be noted that this study focuses on the objective acoustic environment and signal-level speech enhancement, rather than subjective soundscape perception, musical perception, or human perceptual evaluation. Full article
33 pages, 8848 KB  
Article
A Fault Identification Method for EHA Multivariate Time Series Based on Multi-View Heterogeneous Ensemble Learning
by Guozhu Zhi, Kelin Zhong, Zhen Jia, Weijun Yan, Zhihao Gao, Baodong Wang, Qingqing Dang and Zhenbao Liu
Machines 2026, 14(6), 694; https://doi.org/10.3390/machines14060694 - 17 Jun 2026
Viewed by 184
Abstract
Accurate fault classification of electro-hydrostatic actuators (EHAs) remains challenging because multivariate fault signals contain local transient variations, inter-variable coupling, and dynamic temporal dependencies that are difficult to capture simultaneously using a single model. To address this problem, this paper proposes a multi-view temporal feature collaborative heterogeneous ensemble learning model (MTF-HEM) for EHA multivariate time series fault classification. MTF-HEM integrates a representative subsequence-guided time series forest (RSG-TSF), XGBoost, and a lightweight LSTM to extract local morphological, global statistical, and temporal dependency features, respectively. The outputs of these heterogeneous base learners are fused using a bootstrap-driven out-of-bag probability binning stacking (BOPB-stacking) strategy. The proposed method was evaluated on an AMESim-based simulated EHA plunger pump fault dataset containing one normal condition and six fault conditions. Under the present simulation setting, MTF-HEM achieved an accuracy of 99.52% and outperformed the tested deep time series classification models, ensemble models, and individual base learners. These results suggest that multi-view heterogeneous feature fusion can improve the classification of simulated EHA fault time series and provide a methodological reference for intelligent actuator fault diagnosis. However, the current validation is based on data generated from a single AMESim simulation model, and further evaluation on real EHA systems is needed to assess the practical applicability and generalizability of the proposed approach. Full article
(This article belongs to the Special Issue Fault Diagnosis and Fault Tolerant Control in Mechanical System)
22 pages, 3156 KB  
Article
A Lightweight Fish Detection Method for Complex Underwater Scenes
by Xiaojing Guo, Yuan Liu, Minghui Wang, Guangyu Zuo, Liwei Kou and Yinke Dou
J. Mar. Sci. Eng. 2026, 14(12), 1114; https://doi.org/10.3390/jmse14121114 - 17 Jun 2026
Viewed by 177
Abstract
Fish observation is a key component of marine ecological monitoring and is valuable for understanding ecological processes and fish population dynamics. In practical applications, observation equipment is often constrained by limited memory and computational resources, making it difficult to deploy visual detection models with large parameter counts and high computational complexity. Under limited computational resources, existing deep-learning-based fish detection models struggle to balance detection accuracy, model lightweighting, and real-time edge deployment. To address this issue, a lightweight GEM-YOLOv8n model based on YOLOv8n is proposed for fish detection. To address high model complexity, insufficient feature representation, and limited bounding box regression accuracy in complex underwater observation scenarios, the model replaces the C2f modules and some Conv modules with GhostC2f and GhostConv, introduces the EMA attention mechanism, and adopts the MPDIoU loss instead of CIoU. Experimental results show that, compared with YOLOv8n, GEM-YOLOv8n improves Precision, Recall, mAP50, and mAP50–95 by 0.53%, 2.16%, 0.52%, and 0.34%, respectively, while reducing parameters and FLOPs by 48.0% and 39.5%, respectively. These results demonstrate that the proposed model improves detection performance while reducing model complexity. Tests on Jetson Xavier NX demonstrate real-time performance and deployment feasibility, providing a lightweight deployment solution for resource-constrained underwater fish detection. Full article
(This article belongs to the Section Ocean Engineering)
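The abstract above swaps the CIoU loss for MPDIoU. As a hedged sketch of the commonly cited MPDIoU definition (IoU minus the squared distances between the two boxes' top-left corners and bottom-right corners, each normalised by the squared diagonal of the input image), the function below computes the loss for one box pair. The box format `(x1, y1, x2, y2)` and the function name are assumptions; the authors' training code may differ.

```python
def mpdiou_loss(pred, target, img_w, img_h):
    """MPDIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    loss = 1 - (IoU - d1^2 / (w^2 + h^2) - d2^2 / (w^2 + h^2)),
    where d1 and d2 are the distances between matching top-left and
    bottom-right corners, and w, h are the input image dimensions.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection and union areas.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0

    # Corner-point distances, normalised by the squared image diagonal.
    diag_sq = img_w ** 2 + img_h ** 2
    d1_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2
    d2_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2

    return 1.0 - (iou - d1_sq / diag_sq - d2_sq / diag_sq)
```

Unlike plain IoU, the corner-distance terms keep the loss informative when the boxes do not overlap: two disjoint boxes still produce a value that shrinks as the predicted corners move toward the target corners.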
29 pages, 7383 KB  
Article
A Lightweight Transformer-Based Network for Image Deraining with Feature-Wise Attention and Cross-Level Feature Refinement
by Baozhu Li, Wanci Dai and Chao He
Appl. Sci. 2026, 16(12), 6108; https://doi.org/10.3390/app16126108 - 17 Jun 2026
Viewed by 182
Abstract
The aim in single-image deraining tasks is to remove rain streaks from degraded images while preserving scene structures and fine details. However, existing deep learning-based methods often face a trade-off between restoration quality and computational efficiency, and many models struggle to capture hierarchical information effectively under complex rain conditions. To address these limitations, we propose a lightweight cross-gated hierarchical transformer for image deraining. The proposed network adopts a five-stage encoder–decoder architecture with Multi-head Feature-wise Attention (MFA) to efficiently model channel-wise dependencies while reducing the computational burden associated with conventional self-attention. In addition, an Enhanced Gated Depthwise Feed-Forward Network (EGDFN) is introduced to obtain refined feature representations with improved efficiency, and a Cross-Level Feature Refinement (CLFR) module is designed to enhance information exchange between corresponding encoder and decoder stages, thereby strengthening hierarchical feature integration and preserving structural details. The network is trained using a single SSIM-based loss, which enhances the structural fidelity of the restored results. Extensive experiments on four synthetic datasets, two real-world datasets, and a downstream semantic segmentation benchmark demonstrate that the proposed method consistently achieves strong restoration performance, producing cleaner outputs with sharper details and improved effectiveness for subsequent vision tasks. Full article

19 pages, 2057 KB  
Article
Research on Human Sitting Posture Recognition Based on an Improved LeNet-5 Optimization Algorithm
by Wei Li, Bowen Yang, Dawen Sun, Shijun Sun, Zhenyang Qin and Qianjin Liu
Processes 2026, 14(12), 1964; https://doi.org/10.3390/pr14121964 - 17 Jun 2026
Viewed by 159
Abstract
Human sitting posture recognition is critical for smart seating, ergonomic monitoring, and healthcare systems. However, existing deep learning approaches typically rely on highly complex network architectures that are computationally expensive, hindering their lightweight deployment on edge devices. Furthermore, current methods frequently struggle with indistinct boundaries among multi-class postures and are highly prone to overfitting when constrained by small-sample pressure sensor datasets. To bridge this gap, this paper proposes a novel, lightweight posture recognition framework specifically tailored for pressure distribution maps. First, sitting pressure data is collected using a thin-film pressure array sensor and uniformly mapped into an [M × N] image representation, establishing an effective sample format for Convolutional Neural Network (CNN) inputs. Second, as our primary architectural contribution, we fundamentally optimize the classic LeNet-5 network to enhance complex feature representation without inflating model complexity. Specifically, the depth of the convolutional layers is increased with a progressively increasing channel configuration. Batch Normalization (BN) is introduced to accelerate convergence and ensure training stability, while a Dropout mechanism is embedded within the fully connected layers to strictly penalize overfitting under small-sample constraints. These architectural improvements are synergistically combined with targeted data augmentation strategies (random translation, rotation, and intensity perturbation) to further strengthen the model’s generalization capability. Experimental results demonstrate that the proposed method achieves a classification accuracy of 95.5% in a five-class sitting posture recognition task, significantly outperforming baseline models such as the traditional LeNet-5, AlexNet-Lite, and VGG-Small. The findings indicate that this approach achieves an optimal balance among recognition accuracy, training stability, and low model complexity, providing a robust algorithmic baseline and proof-of-concept for smart healthcare perception systems, paving the way for future large-scale subject-independent validation.

64 pages, 5039 KB  
Review
Deep Learning-Based Fruit Tree Pest and Disease Recognition Technology: Model Evolution, Challenges, and Edge Intelligence Deployment
by Yuxin Wang, Yawei Li, Wenhao Zhang, Zhihao Zhang, Chao Wang, Shuo Li, Kaiming Wang, Xiangzuo Huo and Xiaoju Yin
Agriculture 2026, 16(12), 1329; https://doi.org/10.3390/agriculture16121329 - 16 Jun 2026
Viewed by 182
Abstract
The early and accurate recognition of fruit tree pests and diseases is essential for safeguarding fruit yield, quality, and sustainable agricultural production. Conventional manual inspection methods are inadequate for meeting the demands of continuous, objective, and real-time monitoring in large-scale orchards. Following the framework of “model evolution–key challenges–edge-intelligent deployment,” this review systematically summarizes advances in deep learning-based recognition of fruit tree pests and diseases, and compares the effectiveness and limitations of representative methods from the perspectives of data complexity, model generalization and robustness, real-time inference, cross-modal fusion, and trustworthy diagnosis. Existing studies indicate that CNNs, attention mechanisms, Transformers, multimodal fusion, and lightweight networks have promoted the transition of fruit tree pest and disease recognition from image classification to object detection, lesion segmentation, and edge deployment; however, sample scarcity, class imbalance, insufficient cross-domain generalization, black-box decision-making, energy constraints, and long-term robustness remain major bottlenecks for field application. Future research should focus on open orchard environments and develop data-efficient, interpretable, low-power, and continuously updatable edge-intelligent recognition systems, thereby advancing precision agriculture and smart orchards.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)