Search Results (282)

Search Parameters:
Keywords = multi-GPU

31 pages, 1452 KB  
Article
A User-Centric Context-Aware Framework for Real-Time Optimisation of Multimedia Data Privacy Protection, and Information Retention Within Multimodal AI Systems
by Ndricim Topalli and Atta Badii
Sensors 2025, 25(19), 6105; https://doi.org/10.3390/s25196105 - 3 Oct 2025
Abstract
The increasing use of AI systems for face, object, action, scene, and emotion recognition raises significant privacy risks, particularly when processing Personally Identifiable Information (PII). Current privacy-preserving methods lack adaptability to users’ preferences and contextual requirements, and obfuscate user faces uniformly. This research proposes a user-centric, context-aware, and ontology-driven privacy protection framework that dynamically adjusts privacy decisions based on user-defined preferences, entity sensitivity, and contextual information. The framework integrates state-of-the-art models for recognising faces, objects, scenes, actions, and emotions in real time on data acquired from vision sensors (e.g., cameras). Privacy decisions are directed by a contextual ontology based on Contextual Integrity theory, which classifies entities into private, semi-private, or public categories. Adaptive privacy levels are enforced through obfuscation techniques and a multi-level privacy model that supports user-defined red lines (e.g., “always hide logos”). The framework also proposes a Re-Identifiability Index (RII) using soft biometric features such as gait, hairstyle, clothing, skin tone, age, and gender, to mitigate identity leakage and to support fallback protection when face recognition fails. The experimental evaluation relied on sensor-captured datasets, which replicate real-world image sensors such as surveillance cameras. User studies confirmed that the framework was effective: 85.2% of participants rated the obfuscation operations as highly effective, and the remaining 14.8% rated them adequately effective. Amongst these, 71.4% considered the balance between privacy protection and usability very satisfactory and 28% found it satisfactory. GPU acceleration enabled real-time performance, reducing frame processing time from 1200 ms (CPU) to 198 ms. This ontology-driven framework employs user-defined red lines, contextual reasoning, and dual metrics (RII/IVI) to dynamically balance privacy protection with scene intelligibility. Unlike current anonymisation methods, it provides a real-time, user-centric, and GDPR-compliant approach that operationalises privacy-by-design while preserving scene intelligibility. These features make the framework appropriate for a variety of real-world applications, including healthcare, surveillance, and social media.
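As a concrete illustration of the RII idea, here is a minimal Python sketch, assuming the index is a weighted sum of per-cue match scores in [0, 1]; the cue weights, the combination rule, and the function name are illustrative assumptions, not the paper's specification.

```python
# Hypothetical sketch of a Re-Identifiability Index (RII): a weighted
# combination of soft-biometric match scores, used as fallback protection
# when face recognition fails. Weights below are assumed, not from the paper.
SOFT_BIOMETRIC_WEIGHTS = {
    "gait": 0.25, "hairstyle": 0.15, "clothing": 0.20,
    "skin_tone": 0.10, "age": 0.15, "gender": 0.15,
}

def re_identifiability_index(match_scores: dict) -> float:
    """Combine per-cue match scores (each in [0, 1]) into a single RII."""
    total = sum(
        SOFT_BIOMETRIC_WEIGHTS[cue] * score
        for cue, score in match_scores.items()
        if cue in SOFT_BIOMETRIC_WEIGHTS
    )
    return min(max(total, 0.0), 1.0)

# Strong gait/clothing matches raise the RII even when no face is matched;
# a user-defined threshold would then trigger obfuscation.
print(re_identifiability_index({"gait": 0.9, "clothing": 0.8, "age": 0.6}))
```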
(This article belongs to the Section Intelligent Sensors)

24 pages, 4022 KB  
Article
Dynamic Vision Sensor-Driven Spiking Neural Networks for Low-Power Event-Based Tracking and Recognition
by Boyi Feng, Rui Zhu, Yue Zhu, Yan Jin and Jiaqi Ju
Sensors 2025, 25(19), 6048; https://doi.org/10.3390/s25196048 - 1 Oct 2025
Abstract
Spiking neural networks (SNNs) have emerged as a promising model for energy-efficient, event-driven processing of asynchronous event streams from Dynamic Vision Sensors (DVSs), a class of neuromorphic image sensors with microsecond-level latency and high dynamic range. Nevertheless, challenges persist in optimising training and in handling spatio-temporal complexity, limiting their potential for real-time applications on embedded sensing systems such as object tracking and recognition. Targeting this neuromorphic sensing pipeline, this paper proposes the Dynamic Tracking with Event Attention Spiking Network (DTEASN), a novel framework designed to address these challenges by employing a pure SNN architecture, bypassing conventional convolutional neural network (CNN) operations, and reducing GPU resource dependency, while tailoring the processing to DVS signal characteristics (asynchrony, sparsity, and polarity). The model incorporates two innovative, self-developed components: an event-driven multi-scale attention mechanism and a spatio-temporal event convolver, both of which significantly enhance spatio-temporal feature extraction from raw DVS events. An Event-Weighted Spiking Loss (EW-SLoss) is introduced to optimise the learning process by prioritising informative events and improving robustness to sensor noise. Additionally, a lightweight event tracking mechanism and a custom synaptic connection rule are proposed to further improve model efficiency for low-power, edge deployment. The efficacy of DTEASN is demonstrated through empirical results on event-based (DVS) object recognition and tracking benchmarks, where it outperforms conventional methods in accuracy, latency, event throughput (events/s), spike rate (spikes/s), memory footprint, spike-efficiency (energy proxy), and overall computational efficiency under typical DVS settings. By virtue of its event-aligned, sparse computation, the framework is amenable to highly parallel neuromorphic hardware, supporting on- or near-sensor inference for embedded applications.
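For readers unfamiliar with SNN dynamics, a minimal sketch of a leaky integrate-and-fire (LIF) update driven by sparse, DVS-like events follows. It illustrates only the generic event-driven mechanism, not DTEASN's architecture; all constants are assumptions.

```python
# Minimal LIF neuron update over sparse event input (not the paper's DTEASN).
import numpy as np

def lif_step(v, input_current, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """One leaky integrate-and-fire update; returns (new voltage, spike mask)."""
    v = v + (dt / tau) * (-v + input_current)   # leaky integration toward input
    spikes = v >= v_thresh                      # spike where threshold is crossed
    v = np.where(spikes, v_reset, v)            # reset neurons that fired
    return v, spikes

rng = np.random.default_rng(0)
v = np.zeros(8)
for t in range(100):
    # Sparse "events": most channels silent each step, mirroring DVS asynchrony.
    events = (rng.random(8) < 0.05) * 25.0
    v, spikes = lif_step(v, events)
    if spikes.any():
        print(f"t={t}: neurons {np.flatnonzero(spikes)} fired")
```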
(This article belongs to the Section Intelligent Sensors)

28 pages, 32809 KB  
Article
LiteSAM: Lightweight and Robust Feature Matching for Satellite and Aerial Imagery
by Boya Wang, Shuo Wang, Yibin Han, Linfeng Xu and Dong Ye
Remote Sens. 2025, 17(19), 3349; https://doi.org/10.3390/rs17193349 - 1 Oct 2025
Abstract
We present a (Light)weight (S)atellite–(A)erial feature (M)atching framework (LiteSAM) for robust UAV absolute visual localization (AVL) in GPS-denied environments. Existing satellite–aerial matching methods struggle with large appearance variations, texture-scarce regions, and limited efficiency for real-time UAV applications. LiteSAM integrates three key components to address these issues. First, efficient multi-scale feature extraction optimizes representation, reducing inference latency for edge devices. Second, a Token Aggregation–Interaction Transformer (TAIFormer) with a convolutional token mixer (CTM) models inter- and intra-image correlations, enabling robust global–local feature fusion. Third, a MinGRU-based dynamic subpixel refinement module adaptively learns spatial offsets, enhancing subpixel-level matching accuracy and cross-scenario generalization. The experiments show that LiteSAM achieves competitive performance across multiple datasets. On UAV-VisLoc, LiteSAM attains an RMSE@30 of 17.86 m, outperforming state-of-the-art semi-dense methods such as EfficientLoFTR. Its optimized variant, LiteSAM (opt., without dual softmax), delivers inference times of 61.98 ms on standard GPUs and 497.49 ms on NVIDIA Jetson AGX Orin, which are 22.9% and 19.8% faster than EfficientLoFTR (opt.), respectively. With 6.31M parameters, which is 2.4× fewer than EfficientLoFTR’s 15.05M, LiteSAM proves to be suitable for edge deployment. Extensive evaluations on natural image matching and downstream vision tasks confirm its superior accuracy and efficiency for general feature matching.
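The "(opt., without dual softmax)" variant refers to dropping a dual-softmax matching layer of the kind used by semi-dense matchers such as LoFTR. A generic sketch of dual-softmax mutual matching over descriptor similarities (not LiteSAM's exact implementation; threshold and sizes are assumptions):

```python
import torch

def dual_softmax_match(feats_a, feats_b, temperature=0.1):
    """feats_a: (N, D), feats_b: (M, D), L2-normalized descriptor sets."""
    sim = feats_a @ feats_b.T / temperature        # (N, M) cosine similarities
    # Softmax over rows and columns; the product keeps mutually confident pairs.
    return sim.softmax(dim=1) * sim.softmax(dim=0)

a = torch.nn.functional.normalize(torch.randn(512, 256), dim=1)
b = torch.nn.functional.normalize(torch.randn(600, 256), dim=1)
conf = dual_softmax_match(a, b)
matches = (conf > 0.2).nonzero()   # thresholded candidate correspondences
print(conf.shape, matches.shape)
```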
20 pages, 2084 KB  
Article
Automatic Sparse Matrix Format Selection via Dynamic Labeling and Clustering on Heterogeneous CPU–GPU Systems
by Zheng Shi, Yi Zou and Xianfeng Song
Electronics 2025, 14(19), 3895; https://doi.org/10.3390/electronics14193895 - 30 Sep 2025
Abstract
Sparse matrix–vector multiplication (SpMV) is a fundamental kernel in high-performance computing (HPC) whose efficiency depends heavily on the storage format across central processing unit (CPU) and graphics processing unit (GPU) platforms. Conventional supervised approaches often use execution time as training labels, but our experiments on 1786 matrices reveal two issues: labels are unstable across runs due to execution-time variability, and single-label assignment overlooks cases where multiple formats perform similarly well. We propose a dynamic labeling strategy that assigns a single label when the fastest format shows clear superiority, and multiple labels when performance differences are small, thereby reducing label noise. We further extend feature analysis to multi-dimensional structural descriptors and apply clustering to refine label distributions and enhance prediction robustness. Experiments demonstrate 99.2% accuracy in hardware (CPU/GPU) selection and up to 98.95% accuracy in format prediction, with up to 10% robustness gains over traditional methods. Under cost-aware, end-to-end evaluation that accounts for feature extraction, prediction, conversion, and kernel execution, CPUs achieve speedups up to 3.15× and GPUs up to 1.94× over a CSR baseline. Cross-round evaluations confirm stability and generalization, providing a reliable path toward automated, cross-platform SpMV optimization.
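The dynamic labeling rule can be sketched directly from the description above: one label when the fastest format is clearly superior, several when runtimes are close. The 10% tolerance below is an assumed value, not the paper's tuned threshold.

```python
def dynamic_labels(times_by_format, tol=0.10):
    """Map {format name: measured SpMV time} to a set of training labels."""
    best = min(times_by_format.values())
    # Single label when one format clearly wins; multiple when times are close.
    return {f for f, t in times_by_format.items() if t <= best * (1.0 + tol)}

print(dynamic_labels({"CSR": 1.00, "ELL": 1.04, "COO": 1.90}))  # {'CSR', 'ELL'}
print(dynamic_labels({"CSR": 1.00, "ELL": 1.50, "COO": 1.90}))  # {'CSR'}
```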
21 pages, 4397 KB  
Article
Splatting the Cat: Efficient Free-Viewpoint 3D Virtual Try-On via View-Decomposed LoRA and Gaussian Splatting
by Chong-Wei Wang, Hung-Kai Huang, Tzu-Yang Lin, Hsiao-Wei Hu and Chi-Hung Chuang
Electronics 2025, 14(19), 3884; https://doi.org/10.3390/electronics14193884 - 30 Sep 2025
Abstract
As Virtual Try-On (VTON) technology matures, 2D VTON methods based on diffusion models can now rapidly generate diverse and high-quality try-on results. However, with rising user demands for realism and immersion, many applications are shifting towards 3D VTON, which offers superior geometric and spatial consistency. Existing 3D VTON approaches commonly face challenges such as barriers to practical deployment, substantial memory requirements, and cross-view inconsistencies. To address these issues, we propose an efficient 3D VTON framework with robust multi-view consistency, whose core design is to decouple the monolithic 3D editing task into a four-stage cascade as follows: (1) We first reconstruct an initial 3D scene using 3D Gaussian Splatting, integrating the SMPL-X model at this stage as a strong geometric prior. By computing a normal-map loss and a geometric consistency loss, we ensure the structural stability of the initial human model across different views. (2) We employ the lightweight CatVTON to generate 2D try-on images that provide visual guidance for the subsequent personalized fine-tuning tasks. (3) To accurately represent garment details from all angles, we partition the 2D dataset into three subsets—front, side, and back—and train a dedicated LoRA module for each subset on a pre-trained diffusion model. This strategy effectively mitigates the issue of blurred details that can occur when a single model attempts to learn global features. (4) An iterative optimization process then uses the generated 2D VTON images and specialized LoRA modules to edit the 3DGS scene, achieving 360-degree free-viewpoint VTON results. All our experiments were conducted on a single consumer-grade GPU with 24 GB of memory, a significant reduction from the 32 GB or more typically required by previous studies under similar data and parameter settings. Our method balances quality and memory requirements, significantly lowering the adoption barrier for 3D VTON technology.
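Stage (3)'s view decomposition can be illustrated with a small helper that bins camera azimuths into the front/side/back subsets used to train the per-view LoRA modules; the angle boundaries are assumptions for illustration.

```python
def view_subset(azimuth_deg):
    """Map a camera azimuth (0 = directly in front) to a LoRA training subset."""
    a = abs((azimuth_deg + 180.0) % 360.0 - 180.0)  # fold angle into [0, 180]
    if a <= 60.0:
        return "front"
    if a <= 120.0:
        return "side"
    return "back"

for az in (0, 45, 90, 170, -100):
    print(az, "->", view_subset(az))   # front, front, side, back, side
```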
(This article belongs to the Special Issue 2D/3D Industrial Visual Inspection and Intelligent Image Processing)

37 pages, 3163 KB  
Article
TurkerNeXtV2: An Innovative CNN Model for Knee Osteoarthritis Pressure Image Classification
by Omer Esmez, Gulnihal Deniz, Furkan Bilek, Murat Gurger, Prabal Datta Barua, Sengul Dogan, Mehmet Baygin and Turker Tuncer
Diagnostics 2025, 15(19), 2478; https://doi.org/10.3390/diagnostics15192478 - 27 Sep 2025
Abstract
Background/Objectives: Lightweight CNNs for medical imaging remain limited. We propose TurkerNeXtV2, a compact CNN that introduces two new blocks: a pooling-based attention with an inverted bottleneck (TNV2) and a hybrid downsampling module. These blocks improve stability and efficiency. The aim is to achieve transformer-level effectiveness while keeping the simplicity, low computational cost, and deployability of CNNs. Methods: The model was first pretrained on the Stable ImageNet-1k benchmark and then fine-tuned on a collected plantar-pressure OA dataset. We also evaluated the model on a public blood-cell image dataset. Performance was measured by accuracy, precision, recall, and F1-score. Inference throughput (images per second) was recorded on an RTX 5080 GPU. Grad-CAM was used for qualitative explainability. Results: During pretraining on Stable ImageNet-1k, the model reached a validation accuracy of 87.77%. On the OA test set, the model achieved 93.40% accuracy (95% CI: 91.3–95.2%) with balanced precision and recall above 90%. On the blood-cell dataset, the test accuracy was 98.52%. The average inference time was 0.0078 s per image (≈128.8 images/s), comparable to strong CNN baselines and faster than the transformer baselines tested under the same settings. Conclusions: TurkerNeXtV2 delivers high accuracy with low computational cost. The pooling-based attention (TNV2) and the hybrid downsampling enable a lightweight yet effective design. The model is suitable for real-time and clinical use. Future work will include multi-center validation and broader tests across imaging modalities.
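A hedged PyTorch sketch of what a pooling-based attention block with an inverted bottleneck might look like follows, in the spirit of the TNV2 block described above (the token mixing mirrors PoolFormer-style pooling attention). The layer order, normalization choice, and expansion ratio are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class PoolingAttentionInvertedBottleneck(nn.Module):
    """Assumed TNV2-like block: pooling token mixer + inverted bottleneck MLP."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(dim)
        self.pool = nn.AvgPool2d(3, stride=1, padding=1)   # pooling "attention"
        self.norm2 = nn.BatchNorm2d(dim)
        self.pw1 = nn.Conv2d(dim, dim * expansion, 1)      # expand channels
        self.act = nn.GELU()
        self.pw2 = nn.Conv2d(dim * expansion, dim, 1)      # project back

    def forward(self, x):
        y = self.norm1(x)
        x = x + (self.pool(y) - y)                         # residual token mixing
        return x + self.pw2(self.act(self.pw1(self.norm2(x))))  # inverted bottleneck

out = PoolingAttentionInvertedBottleneck(64)(torch.randn(1, 64, 56, 56))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```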

31 pages, 3788 KB  
Article
Multi-Scale Feature Convolutional Modeling for Industrial Weld Defects Detection in Battery Manufacturing
by Waqar Riaz, Xiaozhi Qi, Jiancheng (Charles) Ji and Asif Ullah
Fractal Fract. 2025, 9(9), 611; https://doi.org/10.3390/fractalfract9090611 - 21 Sep 2025
Abstract
Defect detection in lithium-ion battery (LIB) welding presents unique challenges, including scale heterogeneity, subtle texture variations, and severe class imbalance. We propose a multi-scale convolutional framework that integrates EfficientNet-B0 for lightweight representation learning, PANet for cross-scale feature aggregation, and a YOLOv8 detection head augmented with multi-head attention. Parallel dilated convolutions are employed to approximate self-similar receptive fields, enabling simultaneous sensitivity to fine-grained microstructural anomalies and large-scale geometric irregularities. The approach is validated on three datasets including RIAWELC, GC10-DET, and an industrial LIB defects dataset, where it consistently outperforms competitive baselines, achieving 8–10% improvements in recall and F1-score while preserving real-time inference on GPU. Ablation experiments and statistical significance tests isolate the contributions of attention and multi-scale design, confirming their role in reducing false negatives. Attention-based visualizations further enhance interpretability by exposing spatial regions driving predictions. Limitations remain regarding fixed imaging conditions and partial reliance on synthetic augmentation, but the framework establishes a principled direction toward efficient, interpretable, and scalable defect inspection in industrial manufacturing.
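The parallel dilated convolutions can be sketched as a small PyTorch module in which each branch applies the same 3×3 kernel at a different dilation rate before fusion; the specific rates and the 1×1 fusion layer are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ParallelDilatedBlock(nn.Module):
    """Parallel 3x3 convs at increasing dilation, fused by a 1x1 conv."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        # Each branch sees the input at a different receptive-field scale.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

out = ParallelDilatedBlock(32)(torch.randn(1, 32, 128, 128))
print(out.shape)  # torch.Size([1, 32, 128, 128])
```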

17 pages, 4643 KB  
Article
Deep Learning Emulator Towards Both Forward and Adjoint Modes of Atmospheric Gas-Phase Chemical Process
by Yulong Liu, Meicheng Liao, Jiacheng Liu and Zhen Cheng
Atmosphere 2025, 16(9), 1109; https://doi.org/10.3390/atmos16091109 - 21 Sep 2025
Abstract
Gas-phase chemistry has been identified as a major computational bottleneck in both the forward and adjoint modes of chemical transport models (CTMs). Although previous studies have demonstrated the potential of deep learning models to simulate and accelerate this process, few studies have examined the applicability and performance of these models in adjoint sensitivity analysis. In this study, a deep learning emulator for gas-phase chemistry is developed and trained on a diverse set of forward-mode simulations from the Community Multiscale Air Quality (CMAQ) model. The emulator employs a residual neural network (ResNet) architecture referred to as FiLM-ResNet, which integrates Feature-wise Linear Modulation (FiLM) layers to explicitly account for photochemical and non-photochemical conditions. Validation within a single timestep indicates that the emulator accurately predicts concentration changes for 74% of gas-phase species with a coefficient of determination (R²) exceeding 0.999. After embedding the emulator into the CTM, a multi-timestep simulation over one week shows close agreement with the numerical model. For the adjoint mode, we compute the sensitivities of ozone (O3) with respect to O3, nitric oxide (NO), nitrogen dioxide (NO2), hydroxyl radical (OH), and isoprene (ISOP) using automatic differentiation, with the emulator-based adjoint results achieving a maximum R² of 0.995 in single-timestep evaluations compared to the numerical adjoint sensitivities. A 24 h adjoint simulation reveals that the emulator maintains spatially consistent adjoint sensitivity distributions compared to the numerical model across most grid cells. In terms of computational efficiency, the emulator achieves speed-ups of 80×–130× in the forward mode and 45×–102× in the adjoint mode, depending on whether inference is executed on a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU). These findings demonstrate that, once the emulator is accurately trained to reproduce forward-mode gas-phase chemistry, it can be effectively applied in adjoint sensitivity analysis, offering a promising alternative to numerical adjoint frameworks in CTMs.
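Feature-wise Linear Modulation itself is compact enough to show directly: a conditioning vector produces a per-feature scale and shift. The sketch below uses assumed dimensions and an assumed one-hot photochemical flag as the conditioning input; it is not the paper's FiLM-ResNet.

```python
import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    """FiLM: features modulated by a learned scale and shift from a condition."""
    def __init__(self, cond_dim, feat_dim):
        super().__init__()
        self.to_gamma = nn.Linear(cond_dim, feat_dim)  # scale from condition
        self.to_beta = nn.Linear(cond_dim, feat_dim)   # shift from condition

    def forward(self, features, condition):
        # features: (batch, feat_dim); condition: (batch, cond_dim)
        return self.to_gamma(condition) * features + self.to_beta(condition)

film = FiLMLayer(cond_dim=2, feat_dim=128)
feats = torch.randn(8, 128)
cond = torch.tensor([[1.0, 0.0]] * 8)  # e.g., one-hot "photochemical" flag
print(film(feats, cond).shape)         # torch.Size([8, 128])
```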
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)

17 pages, 3136 KB  
Article
MS Mamba: Spectrum Forecasting Method Based on Enhanced Mamba Architecture
by Dingyin Liu, Donghui Xu, Guojie Hu and Wang Zhang
Electronics 2025, 14(18), 3708; https://doi.org/10.3390/electronics14183708 - 18 Sep 2025
Abstract
Spectrum prediction is essential for cognitive radio, enabling dynamic management and enhanced utilization, particularly in multi-band environments. Yet, its complex spatiotemporal nature and non-stationarity pose significant challenges for achieving high accuracy. Motivated by this, we propose a multi-scale Mamba-based multi-band spectrum prediction method. The core Mamba module combines Bidirectional Selective State Space Models (SSMs) for long-range dependencies and dynamic convolution for local features, efficiently extracting spatiotemporal characteristics. A multi-scale pyramid and adaptive prediction head select appropriate feature levels per prediction step, avoiding full-sequence processing to ensure accuracy while reducing computational cost. Experiments on real-world datasets across multiple frequency bands demonstrate effective handling of spectrum non-stationarity. Compared to baseline models, the method reduces root mean square error (RMSE) by 14.9% (indoor) and 7.9% (outdoor) while cutting GPU memory by 17%.
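The multi-scale pyramid idea can be illustrated with a short sketch that builds progressively downsampled views of a spectrum sequence, from which a prediction head could read the scale matching its horizon; the pooling choice and level count are assumptions, not the paper's design.

```python
import torch
import torch.nn.functional as F

def multiscale_pyramid(x, levels=3):
    """x: (batch, channels, time). Returns progressively downsampled views."""
    pyramid = [x]
    for _ in range(levels - 1):
        x = F.avg_pool1d(x, kernel_size=2)  # halve temporal resolution
        pyramid.append(x)
    return pyramid

seq = torch.randn(4, 8, 64)                # batch of multi-band power sequences
for i, level in enumerate(multiscale_pyramid(seq)):
    print(f"scale {i}: {tuple(level.shape)}")
# Short horizons could read scale 0 (fine); longer horizons, coarser scales.
```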
(This article belongs to the Special Issue Cognitive Radio Networks: Recent Developments and Emerging Trends)

22 pages, 6378 KB  
Article
LU-Net: Lightweight U-Shaped Network for Water Body Extraction of Remote Sensing Images
by Chengzhi Deng, Ruqiang He, Zhaoming Wu, Xiaowei Sun and Shengqian Wang
Water 2025, 17(18), 2763; https://doi.org/10.3390/w17182763 - 18 Sep 2025
Abstract
Deep learning-based water body extraction methods generally focus on maximizing accuracy while neglecting inference speed, which can make them challenging to apply in real-time applications. To address this problem, this paper proposes a lightweight u-shaped network (LU-Net), which improves inference speed while maintaining comparable accuracy. To reduce inference latency, a lightweight decoder block (LDB) is designed, which employs a depthwise separable convolution structure to accelerate the decoding process. To enhance accuracy, a lightweight convolutional block attention module (LCBAM) is designed, which effectively captures water-specific spectral and spatial characteristics through a dual-attention mechanism. To improve multi-scale water boundary extraction, a structurally re-parameterized multi-scale fusion prediction module (SRMFPM) is designed, which integrates multi-scale water boundary information through convolutions of different sizes. Comparative experiments are conducted on the GID and LoveDA datasets, with model performance assessed using the MIoU metric and inference latency. The results demonstrate that LU-Net achieves the lowest GPU latency of 3.1 ms and the second-lowest CPU latency of 36 ms in the experiments. On the GID, LU-Net achieves an MIoU of 91.36%, outperforming the other tested methods. On the LoveDA dataset, LU-Net achieves the second-highest MIoU of 86.32% among the evaluated models, 0.08% lower than the top-performing CGNet. Considering both latency and MIoU, LU-Net demonstrates commendable efficiency among all compared networks on the GID and LoveDA datasets.
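The depthwise separable structure underlying the LDB is standard and easy to show: a per-channel spatial convolution followed by a 1×1 pointwise convolution, which cuts multiply-accumulates relative to a full convolution. A minimal PyTorch sketch (layer sizes assumed):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv (one filter per channel) + 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

block = DepthwiseSeparableConv(64, 32)
print(block(torch.randn(1, 64, 128, 128)).shape)  # torch.Size([1, 32, 128, 128])
```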

15 pages, 3677 KB  
Article
Contextual Feature Expansion with Superordinate Concept for Compositional Zero-Shot Learning
by Soohyeong Kim and Yong Suk Choi
Appl. Sci. 2025, 15(17), 9837; https://doi.org/10.3390/app15179837 - 8 Sep 2025
Abstract
Compositional Zero-Shot Learning (CZSL) seeks to enable machines to recognize objects and attributes (i.e., primitives), learn their associations, and generalize to novel compositions, enabling systems to exhibit a human-like ability to infer and generalize. The existing approaches, multi-label and multi-class classification, face inherent trade-offs: the former suffers from biases against unrelated compositions, while the latter struggles with exponentially growing search spaces as the number of objects and attributes increases. To overcome these limitations and address the exponential complexity in CZSL, we introduce Concept-oriented Feature ADjustment (CoFAD), a novel method that extracts superordinate conceptual features based on primitive relationships and expands label feature boundaries. By incorporating spectral clustering and fuzzy membership functions, CoFAD achieves state-of-the-art performance while using 2×–4× less GPU memory and reducing training time by up to 50× on a large-scale dataset.
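The superordinate-concept step can be illustrated with spectral clustering over primitive similarities, grouping attributes and objects into broader concepts; the toy similarity matrix, primitive names, and cluster count below are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

primitives = ["wet", "dry", "cat", "dog", "sliced", "peeled"]
# Toy pairwise similarity (symmetric; higher = more related primitives).
S = np.array([
    [1.0, 0.8, 0.1, 0.1, 0.2, 0.2],
    [0.8, 1.0, 0.1, 0.1, 0.2, 0.2],
    [0.1, 0.1, 1.0, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.9, 1.0, 0.1, 0.1],
    [0.2, 0.2, 0.1, 0.1, 1.0, 0.85],
    [0.2, 0.2, 0.1, 0.1, 0.85, 1.0],
])
labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(S)
for concept in range(3):
    members = [p for p, l in zip(primitives, labels) if l == concept]
    print(f"superordinate concept {concept}: {members}")
```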

20 pages, 9291 KB  
Article
BGWL-YOLO: A Lightweight and Efficient Object Detection Model for Apple Maturity Classification Based on the YOLOv11n Improvement
by Zhi Qiu, Wubin Ou, Deyun Mo, Yuechao Sun, Xingzao Ma, Xianxin Chen and Xuejun Tian
Horticulturae 2025, 11(9), 1068; https://doi.org/10.3390/horticulturae11091068 - 4 Sep 2025
Abstract
China is the world’s leading producer of apples. However, the current classification of apple maturity is predominantly reliant on manual expertise, a process that is both inefficient and costly. In this study, we utilize a diverse array of apples of varying ripeness levels as the research subjects. We propose a lightweight target detection model, termed BGWL-YOLO, which is based on YOLOv11n and incorporates the following specific improvements. To enhance the model’s ability for multi-scale feature fusion, a bidirectional weighted feature pyramid network (BiFPN) is introduced in the neck. In response to the problem of redundant computation in convolutional neural networks, GhostConv is used to replace the standard convolution. The Wise-Inner-MPDIoU (WIMIoU) loss function is introduced to improve the localization accuracy of the model. Finally, the LAMP pruning algorithm is utilized to further compress the model size. The experimental results demonstrate that the BGWL-YOLO model attains a precision of 83.5%, a recall of 81.7%, and a mean average precision of 90.1% on the test set. A comparative analysis reveals that the number of parameters has been reduced by 65.3%, the computational demands have been decreased by 57.1%, the frames per second (FPS) have been boosted by 5.8% on the GPU and 32.8% on the CPU, and, most notably, the model size has been reduced by 74.8%. This substantial reduction in size is highly advantageous for deployment on compact smart devices, thereby facilitating the advancement of smart agriculture.
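GhostConv, referenced above, reduces the cost of producing a given channel count by computing only some feature maps with a regular convolution and deriving the rest with a cheap depthwise convolution. A minimal PyTorch sketch of the idea (kernel sizes and the 50/50 split assumed):

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Half the outputs from a 1x1 conv, half as cheap depthwise 'ghost' maps."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        intrinsic = out_ch // 2
        self.primary = nn.Conv2d(in_ch, intrinsic, 1)            # costly half
        self.cheap = nn.Conv2d(intrinsic, out_ch - intrinsic, 3,
                               padding=1, groups=intrinsic)      # cheap half

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

print(GhostConv(64, 128)(torch.randn(1, 64, 40, 40)).shape)  # (1, 128, 40, 40)
```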

24 pages, 4538 KB  
Article
CNN–Transformer-Based Model for Maritime Blurred Target Recognition
by Tianyu Huang, Chao Pan, Jin Liu and Zhiwei Kang
Electronics 2025, 14(17), 3354; https://doi.org/10.3390/electronics14173354 - 23 Aug 2025
Abstract
In maritime blurred image recognition, ship collision accidents frequently result from three primary blur types: (1) motion blur from vessel movement in complex sea conditions, (2) defocus blur due to water vapor refraction, and (3) scattering blur caused by sea fog interference. This paper proposes a dual-branch recognition method specifically designed for motion blur, the most prevalent blur type in maritime scenarios. Conventional approaches exhibit constrained computational efficiency and limited adaptability across different modalities. To overcome these limitations, we propose a hybrid CNN–Transformer architecture: the CNN branch captures local blur characteristics, while the enhanced Transformer module models long-range dependencies via attention mechanisms. The CNN branch employs a lightweight ResNet variant in which conventional residual blocks are substituted with Multi-Scale Gradient-Aware Residual Blocks (MSG-ARB). This architecture employs learnable gradient convolution for explicit local gradient feature extraction and utilizes gradient content gating to strengthen blur-sensitive region representation, significantly improving computational efficiency compared to conventional CNNs. The Transformer branch incorporates a Hierarchical Swin Transformer (HST) framework with Shifted Window-based Multi-head Self-Attention for global context modeling. The proposed method incorporates blur-invariant Positional Encoding (PE) to enhance blur spectrum modeling capability, while employing a DyT (Dynamic Tanh) module with learnable α parameters to replace traditional normalization layers. This architecture achieves a significant reduction in computational costs while preserving feature representation quality. Moreover, it efficiently computes long-range image dependencies using a compact 16 × 16 window configuration. The proposed feature fusion module synergistically integrates CNN-based local feature extraction with Transformer-enabled global representation learning, achieving comprehensive feature modeling across scales. To evaluate the model’s performance and generalization ability, we conducted comprehensive experiments on four benchmark datasets: VAIS, GoPro, Mini-ImageNet, and Open Images V4. Experimental results show that our method achieves superior classification accuracy compared to state-of-the-art approaches, while simultaneously improving inference speed and reducing GPU memory consumption. Ablation studies confirm that the DyT module effectively suppresses outliers and improves computational efficiency, particularly when processing low-quality input data.
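The DyT module admits a compact sketch: a tanh squashing with a learnable input scale α replaces the normalization layer, bounding outliers without computing batch statistics. The parameterization below follows the commonly published DyT formulation; the paper's exact variant may differ.

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Dynamic Tanh: gamma * tanh(alpha * x) + beta, as a norm-layer stand-in."""
    def __init__(self, dim, alpha_init=0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha_init))   # learnable input scale
        self.weight = nn.Parameter(torch.ones(dim))           # per-channel scale
        self.bias = nn.Parameter(torch.zeros(dim))            # per-channel shift

    def forward(self, x):
        return torch.tanh(self.alpha * x) * self.weight + self.bias

x = torch.randn(2, 16, 768) * 10       # input containing large activations
print(DyT(768)(x).abs().max())         # output is bounded by tanh squashing
```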

18 pages, 2181 KB  
Article
MPCTF: A Multi-Party Collaborative Training Framework for Large Language Models
by Ning Liu and Dan Liu
Electronics 2025, 14(16), 3253; https://doi.org/10.3390/electronics14163253 - 16 Aug 2025
Abstract
The demand for high-quality private data in large language models is growing significantly. However, private data is often scattered across different entities, leading to significant data silo issues. To alleviate such problems, we propose a novel multi-party collaborative training framework for large language models, named MPCTF. MPCTF consists of several components to achieve multi-party collaborative training: (1) a one-click launch mechanism with multi-node and multi-GPU training capabilities, significantly simplifying user operations while enhancing automation and optimizing the collaborative training workflow; (2) four data partitioning strategies for splitting client datasets during training, namely a fixed-size strategy, a percentage-based strategy, a maximum-data-volume strategy, and a strategy based on total data volume and available GPU memory; (3) multiple aggregation strategies; and (4) multiple privacy protection strategies. We conducted extensive experiments to validate the effectiveness of the proposed framework. The results demonstrate that MPCTF achieves superior performance: for example, it reached an accuracy of 65.43 in our experiments, compared with 14.25 for existing work. We hope that MPCTF can promote the development of collaborative training for large language models.
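One of the four partitioning strategies, the percentage-based split, can be sketched in a few lines; the shard percentages and the rounding rule below are illustrative assumptions, not MPCTF's implementation.

```python
def percentage_partition(samples, percentages):
    """Split samples into len(percentages) shards; percentages must sum to 1."""
    assert abs(sum(percentages) - 1.0) < 1e-9
    shards, start = [], 0
    for p in percentages:
        end = start + round(p * len(samples))
        shards.append(samples[start:end])
        start = end
    shards[-1].extend(samples[start:])  # sweep rounding remainder into last shard
    return shards

for shard in percentage_partition(list(range(10)), [0.5, 0.3, 0.2]):
    print(shard)   # [0..4], [5..7], [8, 9]
```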
(This article belongs to the Special Issue Advances in Information Processing and Network Security)

29 pages, 2185 KB  
Article
Calculating the Singular Values of Many Small Matrices on GPUs
by Amedeo Capozzoli, Claudio Curcio, Salvatore Di Donna and Angelo Liseno
Electronics 2025, 14(16), 3217; https://doi.org/10.3390/electronics14163217 - 13 Aug 2025
Abstract
This paper presents a fast and robust approach to evaluating the singular values of small (e.g., 4×4, 5×5) matrices on single- and multi-Graphics Processing Unit (GPU) systems, enabling the modulation of the accuracy–speed trade-off. Targeting applications that require only the singular values, such as Multiple-Input Multiple-Output (MIMO) link capacity optimization in electromagnetics and emerging deep-learning kernels, our method contrasts with existing GPU singular value decomposition (SVD) routines by computing singular values only, thereby reducing overhead compared to full-SVD libraries such as cuSOLVER’s gesvd and MKL’s dgesvd. The method uses four steps: interlaced storage of the matrices in GPU global memory, bidiagonalization via Householder transformations, symmetric tridiagonalization, and root finding by bisection using Sturm sequences. We implemented the algorithm in CUDA and evaluated it on different single- and multi-GPU systems. The approach is particularly suited to the analysis and design of MIMO communication links, where thousands of tiny SVDs must be computed rapidly. For large matrix batches, the speed-up over cuSOLVER’s gesvd has been around 20× for 4×4 matrices. Furthermore, near-linear scaling across multi-GPU systems has been reached, while maintaining root mean square errors below 2.3×10⁻⁷ in single precision and below 2.3×10⁻¹³ in double precision. Tightening the tolerance from δ=10⁻⁷ to δ=10⁻⁹ increased the total runtime by only about 10%.
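The final root-finding step admits a scalar sketch: a Sturm-sequence negcount gives the number of eigenvalues of a symmetric tridiagonal matrix below a point, and bisection isolates each eigenvalue (singular values then follow as square roots of the eigenvalues of the tridiagonalized BᵀB). This is a CPU illustration of the numerical method under assumed tolerances, not the paper's CUDA implementation.

```python
# Sturm-sequence bisection for eigenvalues of a symmetric tridiagonal matrix.
import numpy as np

def sturm_count(d, e, x):
    """Number of eigenvalues of tridiag(e, d, e) strictly less than x."""
    count, q = 0, 1.0
    for i in range(len(d)):
        q = d[i] - x - (e[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = 1e-20          # standard safeguard against division by zero
        if q < 0.0:
            count += 1
    return count

def kth_eigenvalue(d, e, k, lo, hi, tol=1e-12):
    """Bisect until the k-th smallest eigenvalue is bracketed within tol."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sturm_count(d, e, mid) <= k:
            lo = mid           # k-th eigenvalue lies at or above mid
        else:
            hi = mid           # more than k eigenvalues lie below mid
    return 0.5 * (lo + hi)

# Demo on a random 5x5 symmetric tridiagonal matrix, checked against NumPy.
rng = np.random.default_rng(0)
d, e = rng.standard_normal(5), rng.standard_normal(4)
T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
bound = np.abs(d).max() + 2 * np.abs(e).max()        # Gershgorin-style bracket
ours = [kth_eigenvalue(d, e, k, -bound, bound) for k in range(5)]
print(np.allclose(ours, np.linalg.eigvalsh(T)))      # True
```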
