Search Results (2,370)

Search Parameters:
Keywords = scene experience

20 pages, 2103 KiB  
Article
Federated Multi-Stage Attention Neural Network for Multi-Label Electricity Scene Classification
by Lei Zhong, Xuejiao Jiang, Jialong Xu, Kaihong Zheng, Min Wu, Lei Gao, Chao Ma, Dewen Zhu and Yuan Ai
J. Low Power Electron. Appl. 2025, 15(3), 46; https://doi.org/10.3390/jlpea15030046 - 5 Aug 2025
Abstract
Privacy-sensitive electricity scene classification requires robust models under data localization constraints, making federated learning (FL) a suitable framework. Existing FL frameworks face two critical challenges in multi-label electricity scene classification: (1) Label correlations and their strengths significantly impact classification performance. (2) Electricity scene data and labels show distributional inconsistencies across regions. However, current FL frameworks lack explicit modeling of label correlation strengths, and locally trained regional models naturally capture these differences, leading to regional differences in their model parameters. In this scenario, the server’s standard single-stage aggregation often over-averages the global model’s parameters, reducing its discriminative ability. To address these issues, we propose FMMAN, a federated multi-stage attention neural network for multi-label electricity scene classification. The main contributions of FMMAN lie in label correlation learning and stepwise model aggregation. It splits the client–server interaction into multiple stages: (1) Clients train models locally to encode features and label correlation strengths after receiving the server’s initial model. (2) The server clusters these locally trained models into K groups to ensure that models within a group have more consistent parameters, and generates K prototype models via intra-group aggregation to reduce over-averaging. The K models are then distributed back to the clients. (3) Clients refine their models using the K prototypes with contrastive group-specific consistency regularization to further mitigate over-averaging, and send the refined models back to the server. (4) Finally, the server aggregates the models into a global model. Experiments on multi-label benchmarks verify that FMMAN outperforms baseline methods.
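
To make the staged aggregation concrete, here is a minimal server-side sketch in Python. It assumes each client model is flattened into a parameter vector and uses k-means for the grouping step; both the flat-vector representation and the choice of k-means are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def stage2_prototypes(client_params: np.ndarray, k: int):
    """Cluster locally trained client models and aggregate within groups.

    client_params: (num_clients, num_params), one row per client model.
    Returns (k, num_params) prototype models plus each client's group label.
    """
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(client_params)
    prototypes = np.stack([client_params[labels == g].mean(axis=0)
                           for g in range(k)])
    return prototypes, labels

def stage4_global(refined_params: np.ndarray) -> np.ndarray:
    """Final aggregation of the refined client models into one global model."""
    return refined_params.mean(axis=0)
```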

28 pages, 21813 KiB  
Article
Adaptive RGB-D Semantic Segmentation with Skip-Connection Fusion for Indoor Staircase and Elevator Localization
by Zihan Zhu, Henghong Lin, Anastasia Ioannou and Tao Wang
J. Imaging 2025, 11(8), 258; https://doi.org/10.3390/jimaging11080258 - 4 Aug 2025
Abstract
Accurate semantic segmentation of indoor architectural elements, such as staircases and elevators, is critical for safe and efficient robotic navigation, particularly in complex multi-floor environments. Traditional fusion methods struggle with occlusions, reflections, and low-contrast regions. In this paper, we propose a novel feature fusion module, Skip-Connection Fusion (SCF), that dynamically integrates RGB (Red, Green, Blue) and depth features through an adaptive weighting mechanism and skip-connection integration. This approach enables the model to selectively emphasize informative regions while suppressing noise, effectively addressing challenging conditions such as partially blocked staircases, glossy elevator doors, and dimly lit stair edges, which improves obstacle detection and supports reliable human–robot interaction in complex environments. Extensive experiments on a newly collected dataset demonstrate that SCF consistently outperforms state-of-the-art methods, including PSPNet and DeepLabv3, in both overall mIoU (mean Intersection over Union) and challenging-case performance. Specifically, our SCF module improves segmentation accuracy by 5.23% in the top 10% of challenging samples, highlighting its robustness in real-world conditions. Furthermore, we conduct a sensitivity analysis on the learnable weights, demonstrating their impact on segmentation quality across varying scene complexities. Our work provides a strong foundation for real-world applications in autonomous navigation, assistive robotics, and smart surveillance.
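
A minimal PyTorch sketch of the adaptive fusion idea: per-pixel blending weights are predicted from the concatenated RGB and depth features, and a skip connection preserves the RGB stream. The layer sizes and exact gating form are assumptions for illustration, not the paper's definition of SCF.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel weight in [0, 1] for blending RGB vs. depth features.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        fused = w * rgb_feat + (1.0 - w) * depth_feat
        return fused + rgb_feat  # skip connection keeps the RGB pathway intact

out = AdaptiveFusion(64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```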

22 pages, 6628 KiB  
Article
MCA-GAN: A Multi-Scale Contextual Attention GAN for Satellite Remote-Sensing Image Dehazing
by Sufen Zhang, Yongcheng Zhang, Zhaofeng Yu, Shaohua Yang, Huifeng Kang and Jingman Xu
Electronics 2025, 14(15), 3099; https://doi.org/10.3390/electronics14153099 - 3 Aug 2025
Abstract
With the growing demand for ecological monitoring and geological exploration, high-quality satellite remote-sensing imagery has become indispensable for accurate information extraction and automated analysis. However, haze reduces image contrast and sharpness, significantly impairing quality. Existing dehazing methods, primarily designed for natural images, struggle with remote-sensing images due to their complex imaging conditions and scale diversity. To this end, we propose a novel Multi-Scale Contextual Attention Generative Adversarial Network (MCA-GAN), specifically designed for satellite image dehazing. Our method integrates multi-scale feature extraction with global contextual guidance to enhance the network’s comprehension of complex remote-sensing scenes and its sensitivity to fine details. MCA-GAN incorporates two self-designed key modules: (1) a Multi-Scale Feature Aggregation Block, which employs multi-directional global pooling and multi-scale convolutional branches to bolster the model’s ability to capture land-cover details across varying spatial scales; (2) a Dynamic Contextual Attention Block, which uses a gated mechanism to fuse three-dimensional attention weights with contextual cues, thereby preserving global structural and chromatic consistency while retaining intricate local textures. Extensive qualitative and quantitative experiments on public benchmarks demonstrate that MCA-GAN outperforms other existing methods in both visual fidelity and objective metrics, offering a robust and practical solution for remote-sensing image dehazing.
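
An illustrative sketch of multi-directional pooling combined with multi-scale convolutional branches, in the spirit of the Multi-Scale Feature Aggregation Block; the kernel sizes and the 1×1 fusion are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiScaleAggregation(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, k, padding=k // 2) for k in (1, 3, 5)
        )
        self.fuse = nn.Conv2d(3 * c, c, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h_ctx = x.mean(dim=3, keepdim=True)   # pool along width  -> per-row context
        w_ctx = x.mean(dim=2, keepdim=True)   # pool along height -> per-column context
        ctx = x + h_ctx + w_ctx               # broadcast directional context back
        return self.fuse(torch.cat([b(ctx) for b in self.branches], dim=1))
```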

34 pages, 5777 KiB  
Article
ACNet: An Attention–Convolution Collaborative Semantic Segmentation Network on Sensor-Derived Datasets for Autonomous Driving
by Qiliang Zhang, Kaiwen Hua, Zi Zhang, Yiwei Zhao and Pengpeng Chen
Sensors 2025, 25(15), 4776; https://doi.org/10.3390/s25154776 - 3 Aug 2025
Abstract
In intelligent vehicular networks, the accuracy of semantic segmentation in road scenes is crucial for vehicle-mounted artificial intelligence to achieve environmental perception, decision support, and safety control. Although deep learning methods have made significant progress, two main challenges remain: first, the difficulty in balancing global and local features leads to blurred object boundaries and misclassification; second, conventional convolutions have limited ability to perceive irregular objects, causing information loss and affecting segmentation accuracy. To address these issues, this paper proposes a global–local collaborative attention module and a spider web convolution module. The former enhances feature representation through bidirectional feature interaction and dynamic weight allocation, reducing false positives and missed detections. The latter introduces an asymmetric sampling topology and six-directional receptive field paths to effectively improve the recognition of irregular objects. Experiments on the Cityscapes, CamVid, and BDD100K datasets, collected using vehicle-mounted cameras, demonstrate that the proposed method performs excellently across multiple evaluation metrics, including mIoU, mRecall, mPrecision, and mAccuracy. Comparative experiments with classical segmentation networks, attention mechanisms, and convolution modules validate the effectiveness of the proposed approach. The proposed method demonstrates outstanding performance in sensor-based semantic segmentation tasks and is well-suited for environmental perception systems in autonomous driving.
(This article belongs to the Special Issue AI-Driving for Autonomous Vehicles)
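
A hedged sketch of a global–local attention block with dynamic weight allocation: a channel-wise (global) branch and a spatial (local) branch are blended by a learned scalar. This mirrors the idea described in the abstract, not the paper's exact module.

```python
import torch
import torch.nn as nn

class GlobalLocalAttention(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, kernel_size=1), nn.Sigmoid()
        )
        self.spatial = nn.Sequential(
            nn.Conv2d(c, 1, kernel_size=7, padding=3), nn.Sigmoid()
        )
        self.alpha = nn.Parameter(torch.tensor(0.0))  # dynamic balance weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        glob = x * self.channel(x)       # global, channel-wise emphasis
        loc = x * self.spatial(x)        # local, pixel-wise emphasis
        a = torch.sigmoid(self.alpha)    # learned blend in (0, 1)
        return a * glob + (1 - a) * loc
```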

22 pages, 2498 KiB  
Article
SceEmoNet: A Sentiment Analysis Model with Scene Construction Capability
by Yi Liang, Dongfang Han, Zhenzhen He, Bo Kong and Shuanglin Wen
Appl. Sci. 2025, 15(15), 8588; https://doi.org/10.3390/app15158588 - 2 Aug 2025
Abstract
How do humans analyze the sentiments embedded in text? When attempting to analyze a text, humans construct a “scene” in their minds through imagination based on the text, generating a vague image. They then synthesize the text and the mental image to derive the final analysis result. However, current sentiment analysis models lack such imagination; they can only analyze based on existing information in the text, which limits their classification accuracy. To address this issue, we propose the SceEmoNet model. This model endows text classification models with imagination through Stable Diffusion, enabling the model to generate corresponding visual scenes from input text, thus introducing a new modality of visual information. We then use the Contrastive Language-Image Pre-training (CLIP) model, a multimodal feature extraction model, to extract aligned features from different modalities, preventing significant feature differences caused by data heterogeneity. Finally, we fuse information from different modalities using late fusion to obtain the final classification result. Experiments on six datasets with different classification tasks show improvements of 9.57%, 3.87%, 3.63%, 3.14%, 0.77%, and 0.28%, respectively. Additionally, we set up experiments to deeply analyze the model’s advantages and limitations, providing a new technical path for follow-up research.
(This article belongs to the Special Issue Advanced Technologies and Applications of Emotion Recognition)
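
A minimal late-fusion sketch matching the pipeline described above: aligned text and image embeddings (e.g., from CLIP) are classified by separate heads and the class scores are blended. The embedding dimension, head design, and fusion weight are placeholder assumptions; the Stable Diffusion and CLIP stages themselves are omitted.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, dim: int = 512, num_classes: int = 2, w_text: float = 0.5):
        super().__init__()
        self.text_head = nn.Linear(dim, num_classes)
        self.image_head = nn.Linear(dim, num_classes)
        self.w_text = w_text  # weight given to the text modality

    def forward(self, text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
        # Late fusion: each modality is classified first, then scores are combined.
        return (self.w_text * self.text_head(text_emb)
                + (1.0 - self.w_text) * self.image_head(image_emb))

logits = LateFusionClassifier()(torch.randn(4, 512), torch.randn(4, 512))
```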

22 pages, 24173 KiB  
Article
ScaleViM-PDD: Multi-Scale EfficientViM with Physical Decoupling and Dual-Domain Fusion for Remote Sensing Image Dehazing
by Hao Zhou, Yalun Wang, Wanting Peng, Xin Guan and Tao Tao
Remote Sens. 2025, 17(15), 2664; https://doi.org/10.3390/rs17152664 - 1 Aug 2025
Abstract
Remote sensing images are often degraded by atmospheric haze, which not only reduces image quality but also complicates information extraction, particularly in high-level visual analysis tasks such as object detection and scene classification. State-space models (SSMs) have recently emerged as a powerful paradigm for vision tasks, showing great promise due to their computational efficiency and robust capacity to model global dependencies. However, most existing learning-based dehazing methods lack physical interpretability, leading to weak generalization. Furthermore, they typically rely on spatial features while neglecting crucial frequency domain information, resulting in incomplete feature representation. To address these challenges, we propose ScaleViM-PDD, a novel network that enhances an SSM backbone with two key innovations: a Multi-scale EfficientViM with Physical Decoupling (ScaleViM-P) module and a Dual-Domain Fusion (DD Fusion) module. The ScaleViM-P module synergistically integrates a Physical Decoupling block within a Multi-scale EfficientViM architecture. This design enables the network to mitigate haze interference in a physically grounded manner at each representational scale while simultaneously capturing global contextual information to adaptively handle complex haze distributions. To further address detail loss, the DD Fusion module replaces conventional skip connections by incorporating a novel Frequency Domain Module (FDM) alongside channel and position attention. This allows for a more effective fusion of spatial and frequency features, significantly improving the recovery of fine-grained details, including color and texture information. Extensive experiments on nine publicly available remote sensing datasets demonstrate that ScaleViM-PDD consistently surpasses state-of-the-art baselines in both qualitative and quantitative evaluations, highlighting its strong generalization ability.
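
An illustrative frequency-domain skip connection: the feature map's FFT amplitude is refined and recombined with the phase, then fused with the spatial features. Operating on the amplitude spectrum and using 1×1 convolutions are assumptions about how such an FDM can be built, not the paper's design.

```python
import torch
import torch.nn as nn

class FrequencyDomainModule(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.amp_conv = nn.Conv2d(c, c, kernel_size=1)      # refine amplitude spectrum
        self.out_conv = nn.Conv2d(2 * c, c, kernel_size=1)  # fuse spatial + frequency

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft2(x, norm="ortho")
        amp, phase = spec.abs(), spec.angle()
        amp = self.amp_conv(amp)
        freq = torch.fft.irfft2(amp * torch.exp(1j * phase),
                                s=x.shape[-2:], norm="ortho")
        return self.out_conv(torch.cat([x, freq], dim=1))
```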

12 pages, 3315 KiB  
Article
NeRF-RE: An Improved Neural Radiance Field Model Based on Object Removal and Efficient Reconstruction
by Ziyang Li, Yongjian Huai, Qingkuo Meng and Shiquan Dong
Information 2025, 16(8), 654; https://doi.org/10.3390/info16080654 - 31 Jul 2025
Abstract
High-quality green gardens can markedly enhance the quality of life and mental well-being of their users. However, health and lifestyle constraints make it difficult for people to enjoy urban gardens, and traditional methods struggle to offer the high-fidelity experiences they need. This study introduces a 3D scene reconstruction and rendering strategy based on implicit neural representation through an efficient and removable neural radiance fields model (NeRF-RE). Leveraging neural radiance fields (NeRF), the model incorporates a multi-resolution hash grid and a proposal network to improve training efficiency and modeling accuracy, while integrating a segment-anything model to safeguard public privacy. As a test case, we take the crabapple tree, which is extensively utilized in urban garden design across temperate regions of the Northern Hemisphere: a dataset comprising 660 images of crabapple trees exhibiting three distinct geometric forms was collected to assess the NeRF-RE model’s performance. The results demonstrated that the ‘harvest gold’ crabapple scene had the highest reconstruction accuracy, with PSNR, LPIPS and SSIM of 24.80 dB, 0.34 and 0.74, respectively. Compared to the Mip-NeRF 360 model, the NeRF-RE model not only showed an up to 21-fold increase in training efficiency for the three types of crabapple trees, but also exhibited a less pronounced impact of dataset size on reconstruction accuracy. This study reconstructs real scenes with high fidelity for virtual reality, allowing people to enjoy the beauty of natural gardens at home while also contributing to the publicity and promotion of urban landscapes.
(This article belongs to the Special Issue Extended Reality and Its Applications)
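
For readers unfamiliar with the hash-grid component, below is a compact sketch of multi-resolution hash encoding (the Instant-NGP idea the model builds on): each resolution level hashes grid corners into a small learnable feature table. The level count, table size, and hash primes are standard choices rather than values from the paper, and trilinear interpolation is omitted for brevity (floor-corner lookup only).

```python
import torch
import torch.nn as nn

class HashEncoding(nn.Module):
    PRIMES = torch.tensor([1, 2654435761, 805459861])  # common spatial-hash primes

    def __init__(self, levels=8, table_size=2**14, feat_dim=2, base_res=16):
        super().__init__()
        self.res = [base_res * 2**i for i in range(levels)]
        self.tables = nn.ModuleList(
            nn.Embedding(table_size, feat_dim) for _ in range(levels)
        )
        self.table_size = table_size

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:  # xyz in [0, 1]^3
        feats = []
        for res, table in zip(self.res, self.tables):
            idx = (xyz * res).long()                           # floor to a grid corner
            h = (idx * self.PRIMES).sum(-1) % self.table_size  # hash corner to table slot
            feats.append(table(h))
        return torch.cat(feats, dim=-1)  # (N, levels * feat_dim)

enc = HashEncoding()(torch.rand(1024, 3))
```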

20 pages, 3729 KiB  
Article
Can AIGC Aid Intelligent Robot Design? A Tentative Research of Apple-Harvesting Robot
by Qichun Jin, Jiayu Zhao, Wei Bao, Ji Zhao, Yujuan Zhang and Fuwen Hu
Processes 2025, 13(8), 2422; https://doi.org/10.3390/pr13082422 - 30 Jul 2025
Abstract
Artificial intelligence (AI)-generated content (AIGC) is fundamentally transforming multiple sectors, including materials discovery, healthcare, education, scientific research, and industrial manufacturing. Given the complexities and challenges of intelligent robot design, AIGC has the potential to offer a new paradigm, assisting in conceptual and technical design, functional module design, and the training of perception abilities to accelerate prototyping. Taking the design of an apple-harvesting robot as an example, we demonstrate a basic framework for an AIGC-assisted robot design methodology that leverages the generation capabilities of available multimodal large language models, together with human intervention to alleviate AI hallucination and hidden risks. We then study the enhancement of the robot perception system using apple images generated by large vision-language models to expand the dataset of real apple images. Further, an apple-harvesting robot prototype based on the AIGC-aided design is demonstrated; a pick-up experiment in a simulated scene indicates that it achieves a harvesting success rate of 92.2% and good terrain traversability with a maximum climbing angle of 32°. According to this tentative research, although not an autonomous design agent, the AIGC-driven design workflow can alleviate the significant complexities and challenges of intelligent robot design, especially for beginners and young engineers.
(This article belongs to the Special Issue Design and Control of Complex and Intelligent Systems)

21 pages, 2267 KiB  
Article
Dual-Branch Network for Blind Quality Assessment of Stereoscopic Omnidirectional Images: A Spherical and Perceptual Feature Integration Approach
by Zhe Wang, Yi Liu and Yang Song
Electronics 2025, 14(15), 3035; https://doi.org/10.3390/electronics14153035 - 30 Jul 2025
Abstract
Stereoscopic omnidirectional images (SOIs) have gained significant attention for their immersive viewing experience by providing binocular depth with panoramic scenes. However, evaluating their visual quality remains challenging due to their unique spherical geometry, binocular disparity, and viewing conditions. To address these challenges, this paper proposes a dual-branch deep learning framework that integrates spherical structural features and perceptual binocular cues to assess the quality of SOIs without reference. Specifically, the global branch leverages spherical convolutions to capture wide-range spatial distortions, while the local branch utilizes a binocular difference module based on the discrete wavelet transform to extract depth-aware perceptual information. A feature complementarity module is introduced to fuse global and local representations for final quality prediction. Experimental evaluations on two public SOIQA datasets—NBU-SOID and SOLID—demonstrate that the proposed method achieves state-of-the-art performance, with PLCC/SROCC values of 0.926/0.918 and 0.918/0.891, respectively. These results validate the effectiveness and robustness of our approach in stereoscopic omnidirectional image quality assessment tasks.
(This article belongs to the Special Issue AI in Signal and Image Processing)
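
A small sketch of the binocular-difference idea: a one-level 2-D discrete wavelet transform of the left/right difference image yields sub-bands that could feed the local branch. The 'haar' wavelet and the plain difference signal are assumptions for illustration.

```python
import numpy as np
import pywt

def binocular_difference_subbands(left: np.ndarray, right: np.ndarray) -> dict:
    """left, right: (H, W) grayscale views. Returns one-level DWT sub-bands."""
    diff = left.astype(np.float32) - right.astype(np.float32)
    cA, (cH, cV, cD) = pywt.dwt2(diff, "haar")
    return {"approx": cA, "horizontal": cH, "vertical": cV, "diagonal": cD}

subbands = binocular_difference_subbands(np.random.rand(64, 64),
                                         np.random.rand(64, 64))
```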

21 pages, 5817 KiB  
Article
UN15: An Urban Noise Dataset Coupled with Time–Frequency Attention for Environmental Sound Classification
by Yu Shen, Ge Cao, Huan-Yu Dong, Bo Dong and Chang-Myung Lee
Appl. Sci. 2025, 15(15), 8413; https://doi.org/10.3390/app15158413 - 29 Jul 2025
Abstract
With the increasing severity of urban noise pollution, its detrimental impact on public health has garnered growing attention. However, accurate identification and classification of noise sources in complex urban acoustic environments remain major technical challenges for achieving refined noise management. To address this issue, this study presents two key contributions. First, we construct a new urban noise classification dataset, namely the urban noise 15-category dataset (UN15), which consists of 1620 audio clips from 15 representative categories, including traffic, construction, crowd activity, and commercial noise, recorded from diverse real-world urban scenes. Second, we propose a novel deep neural network architecture based on a residual network and integrated with a time–frequency attention mechanism, referred to as residual network with temporal–frequency attention (ResNet-TF). Extensive experiments conducted on the UN15 dataset demonstrate that ResNet-TF outperforms several mainstream baseline models in both classification accuracy and robustness. These results not only verify the effectiveness of the proposed attention mechanism but also establish the UN15 dataset as a valuable benchmark for future research in urban noise classification.
(This article belongs to the Section Acoustics and Vibrations)
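
A hedged sketch of time–frequency attention for spectrogram features shaped (batch, channels, frequency, time): separate attention maps are computed over the frequency and time axes and applied multiplicatively. The pooling and 1×1 convolutions are illustrative, not the exact ResNet-TF module.

```python
import torch
import torch.nn as nn

class TimeFrequencyAttention(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.freq = nn.Sequential(nn.Conv2d(c, c, kernel_size=1), nn.Sigmoid())
        self.time = nn.Sequential(nn.Conv2d(c, c, kernel_size=1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, F, T)
        f_att = self.freq(x.mean(dim=3, keepdim=True))     # (B, C, F, 1)
        t_att = self.time(x.mean(dim=2, keepdim=True))     # (B, C, 1, T)
        return x * f_att * t_att                           # reweight both axes

y = TimeFrequencyAttention(32)(torch.randn(2, 32, 128, 64))
```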

30 pages, 92065 KiB  
Article
A Picking Point Localization Method for Table Grapes Based on PGSS-YOLOv11s and Morphological Strategies
by Jin Lu, Zhongji Cao, Jin Wang, Zhao Wang, Jia Zhao and Minjie Zhang
Agriculture 2025, 15(15), 1622; https://doi.org/10.3390/agriculture15151622 - 26 Jul 2025
Abstract
During the automated picking of table grapes, the automatic recognition and segmentation of grape pedicels, along with the positioning of picking points, are vital for all subsequent operations of the harvesting robot. In the actual scene of a grape plantation, however, it is extremely difficult to accurately and efficiently identify and segment grape pedicels and then reliably locate the picking points. This is attributable to the low distinguishability between grape pedicels and the surrounding environment, such as branches, as well as the impact of conditions like weather, lighting, and occlusion, coupled with the requirement to deploy models on edge devices with limited computing resources. To address these issues, this study proposes a novel picking point localization method for table grapes based on an instance segmentation network, Progressive Global-Local Structure-Sensitive Segmentation (PGSS-YOLOv11s), and a simple combination strategy of morphological operators. More specifically, PGSS-YOLOv11s is composed of the original backbone of YOLOv11s-seg, a spatial feature aggregation module (SFAM), an adaptive feature fusion module (AFFM), and a detail-enhanced convolutional shared detection head (DE-SCSH). PGSS-YOLOv11s was trained with a new grape segmentation dataset called Grape-⊥, which includes 4455 grape pixel-level instances annotated with ⊥-shaped regions. After PGSS-YOLOv11s segments the ⊥-shaped regions of grapes, morphological operations such as erosion, dilation, and skeletonization are combined to extract grape pedicels and locate picking points. Finally, several experiments confirm the validity, effectiveness, and superiority of the proposed method. Compared with other state-of-the-art models, PGSS-YOLOv11s reached an F1 score of 94.6% and a mask mAP@0.5 of 95.2% on the Grape-⊥ dataset, as well as 85.4% and 90.0% on the Winegrape dataset. Multi-scenario tests indicated that the success rate of positioning the picking points reached up to 89.44%. Real-time tests on an edge device in orchards demonstrated the practical performance of our method. Nevertheless, for grapes with short or occluded pedicels, the designed morphological algorithm sometimes failed to compute picking points. In future work, we will enrich the grape dataset by collecting images under different lighting conditions, from various shooting angles, and with more grape varieties to improve the method’s generalization performance.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
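
A simplified version of the morphological post-processing described above, assuming a binary mask of the segmented ⊥-shaped region: opening cleans the mask, skeletonization reduces it to a centerline, and the picking point is taken as the skeleton's median-height pixel. The median rule is an assumption; the paper's exact operator combination may differ.

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

def picking_point(mask: np.ndarray):
    """mask: binary (H, W) uint8 segmentation of the pedicel region."""
    kernel = np.ones((3, 3), np.uint8)
    clean = cv2.dilate(cv2.erode(mask, kernel), kernel)  # morphological opening
    skel = skeletonize(clean > 0)                        # 1-pixel-wide centerline
    ys, xs = np.nonzero(skel)
    if len(ys) == 0:
        return None                                      # e.g., occluded pedicel
    order = np.argsort(ys)                               # walk the skeleton top-down
    mid = order[len(order) // 2]                         # median-height skeleton pixel
    return int(xs[mid]), int(ys[mid])
```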

16 pages, 5703 KiB  
Article
Document Image Shadow Removal Based on Illumination Correction Method
by Depeng Gao, Wenjie Liu, Shuxi Chen, Jianlin Qiu, Xiangxiang Mei and Bingshu Wang
Algorithms 2025, 18(8), 468; https://doi.org/10.3390/a18080468 - 26 Jul 2025
Abstract
Due to diverse lighting conditions and photo environments, shadows are almost ubiquitous in images, especially document images captured with mobile devices. Shadows not only seriously affect the visual quality and readability of a document but also significantly hinder image processing. Although shadow removal research has achieved good results in natural scenes, specific studies on document images are lacking. To effectively remove shadows in document images, the dark illumination correction network is proposed, which mainly consists of two modules: shadow detection and illumination correction. First, a simplified shadow-corrected attention block is designed to combine spatial and channel attention, which is used to extract the features, detect the shadow mask, and correct the illumination. Then, the shadow detection block detects shadow intensity and outputs a soft shadow mask to determine the probability of each pixel belonging to shadow. Lastly, the illumination correction block corrects dark illumination with a soft shadow mask and outputs a shadow-free document image. Our experiments on five datasets show that the proposed method achieved state-of-the-art results, proving the effectiveness of illumination correction.
(This article belongs to the Section Combinatorial Optimization, Graph, and Network Algorithms)
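
A minimal stand-in for the mask-then-correct pipeline: pixels are brightened in proportion to their shadow probability, with a gain estimated from the lit-region mean. This hand-rolled correction only makes the two-module structure concrete; the paper's correction block is learned, not rule-based.

```python
import numpy as np

def correct_illumination(img: np.ndarray, soft_mask: np.ndarray) -> np.ndarray:
    """img: (H, W) grayscale in [0, 1]; soft_mask: per-pixel shadow probability."""
    lit = img[soft_mask < 0.5]
    shadowed = img[soft_mask >= 0.5]
    if lit.size == 0 or shadowed.size == 0:
        return img                                    # nothing to correct
    gain = lit.mean() / max(shadowed.mean(), 1e-6)    # target brightness ratio
    corrected = img * (1.0 + soft_mask * (gain - 1.0))
    return np.clip(corrected, 0.0, 1.0)
```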

23 pages, 4467 KiB  
Article
Research on Indoor Object Detection and Scene Recognition Algorithm Based on Apriori Algorithm and Mobile-EFSSD Model
by Wenda Zheng, Yibo Ai and Weidong Zhang
Mathematics 2025, 13(15), 2408; https://doi.org/10.3390/math13152408 - 26 Jul 2025
Abstract
With the advancement of computer vision and image processing technologies, scene recognition has gradually become a research hotspot. However, in practical applications, it is necessary to detect the categories and locations of objects in images while recognizing scenes. To address these issues, this paper proposes an indoor object detection and scene recognition algorithm based on the Apriori algorithm and the Mobile-EFSSD model, which can simultaneously obtain object category and location information while recognizing scenes. The specific research contents are as follows: (1) To address complex indoor scenes and occlusion, this paper proposes an improved Mobile-EFSSD object detection algorithm. An optimized MobileNetV3 with ECA attention is used as the backbone. Multi-scale feature maps are fused via FPN. The localization loss includes a hyperparameter, and focal loss replaces confidence loss. Experiments show that the method achieves stable performance, effectively detects occluded objects, and accurately extracts category and location information. (2) To improve classification stability in indoor scene recognition, this paper proposes a naive Bayes-based method. Object detection results are converted into text features, and the Apriori algorithm extracts object associations. Prior probabilities are calculated and fed into a naive Bayes classifier for scene recognition. Evaluated using the ADE20K dataset, the method outperforms existing approaches by achieving a better accuracy–speed trade-off and enhanced classification stability. The proposed algorithm is applied to indoor scene images, enabling the simultaneous acquisition of object categories and location information while recognizing scenes. Moreover, the algorithm has a simple structure, with an object detection average precision of 82.7% and a scene recognition average accuracy of 95.23%, making it suitable for practical detection requirements.
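
A toy end-to-end sketch of the detection-to-scene idea: detected object labels become transactions, mlxtend's Apriori mines frequent object itemsets, and a naive Bayes classifier over object presence predicts the scene. The dataset contents and thresholds below are invented purely for illustration.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori
from sklearn.naive_bayes import BernoulliNB

# Hypothetical detection results (object labels per image) and scene labels.
transactions = [{"bed", "lamp"}, {"bed", "wardrobe"}, {"sofa", "tv"}, {"sofa", "lamp"}]
scenes = ["bedroom", "bedroom", "living_room", "living_room"]

items = sorted(set().union(*transactions))
X = pd.DataFrame([[obj in t for obj in items] for t in transactions], columns=items)

print(apriori(X, min_support=0.5, use_colnames=True))  # frequent object itemsets

clf = BernoulliNB().fit(X, scenes)
query = pd.DataFrame([[obj in {"bed", "lamp"} for obj in items]], columns=items)
print(clf.predict(query))  # -> ['bedroom']
```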

20 pages, 3528 KiB  
Article
High-Precision Optimization of BIM-3D GIS Models for Digital Twins: A Case Study of Santun River Basin
by Zhengbing Yang, Mahemujiang Aihemaiti, Beilikezi Abudureheman and Hongfei Tao
Sensors 2025, 25(15), 4630; https://doi.org/10.3390/s25154630 - 26 Jul 2025
Abstract
The integration of Building Information Modeling (BIM) and 3D Geographic Information System (3D GIS) models provides high-precision spatial data for digital twin watersheds. To tackle the challenges of large data volumes and rendering latency in integrated models, this study proposes a three-step framework that uses Industry Foundation Classes (IFCs) as the base model and Open Scene Graph Binary (OSGB) as the target model: (1) geometric optimization through an angular weighting (AW)-controlled Quadric Error Metrics (QEM) algorithm; (2) Level of Detail (LOD) hierarchical mapping to establish associations between the IFC and OSGB models, and redesign scene paging logic; (3) coordinate registration by converting the IFC model’s local coordinate system to the global coordinate system and achieving spatial alignment via the seven-parameter method. Applied to the Santun River Basin digital twin project, experiments with 10 water gate models show that the AW-QEM algorithm reduces average loading time by 15% compared to traditional QEM, while maintaining 97% geometric accuracy, demonstrating the method’s efficiency in balancing precision and rendering performance.
(This article belongs to the Section Intelligent Sensors)
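
A compact sketch of the Quadric Error Metric at the heart of the AW-QEM step: each face contributes a plane quadric, scaled here by a per-face weight standing in for the angular weight (a constant placeholder below). Simplification proceeds by collapsing the edge whose optimal vertex has minimal error; only the error computation is shown.

```python
import numpy as np

def face_quadric(v0, v1, v2, weight: float = 1.0) -> np.ndarray:
    """4x4 quadric of the triangle's supporting plane, scaled by `weight`."""
    n = np.cross(v1 - v0, v2 - v0)
    n = n / np.linalg.norm(n)
    p = np.append(n, -np.dot(n, v0))   # plane coefficients [a, b, c, d]
    return weight * np.outer(p, p)

def vertex_error(Q: np.ndarray, v: np.ndarray) -> float:
    """Sum of weighted squared distances from v to the planes folded into Q."""
    vh = np.append(v, 1.0)             # homogeneous coordinates
    return float(vh @ Q @ vh)

Q = face_quadric(np.zeros(3), np.array([1., 0., 0.]), np.array([0., 1., 0.]))
print(vertex_error(Q, np.array([0.0, 0.0, 0.5])))  # 0.25 = squared distance to z=0
```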

20 pages, 77932 KiB  
Article
Image Alignment Based on Deep Learning to Extract Deep Feature Information from Images
by Lin Zhu, Yuxing Mao and Jianyu Pan
Sensors 2025, 25(15), 4628; https://doi.org/10.3390/s25154628 - 26 Jul 2025
Abstract
To overcome the limitations of traditional image alignment methods in capturing deep semantic features, a deep feature information image alignment network (DFA-Net) is proposed. This network aims to enhance image alignment performance through multi-level feature learning. DFA-Net is based on a deep residual architecture and introduces spatial pyramid pooling to achieve cross-scale feature fusion, effectively enhancing the features’ adaptability to scale. A feature enhancement module based on the self-attention mechanism is designed; through a dynamic weight allocation strategy, it emphasizes key features that exhibit geometric invariance and high discriminative power, improving the network’s robustness to multimodal image deformation. Experiments on two public datasets, MSRS and RoadScene, show that the method performs well in terms of alignment accuracy: compared with the benchmark model, RMSE is reduced by 0.661 and 0.473 on the two datasets, while SSIM, MI, and NCC improve by 0.155, 0.163, and 0.211 (MSRS) and 0.108, 0.226, and 0.114 (RoadScene), respectively. The visualization results validate the significant improvement in the features’ visual quality and confirm the method’s advantages in the stability and discriminative properties of deep feature extraction.
(This article belongs to the Section Sensing and Imaging)
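
A brief sketch of spatial pyramid pooling as used for cross-scale feature fusion: the feature map is pooled to several fixed grids, and the results are flattened and concatenated into a fixed-length multi-scale descriptor. The (1, 2, 4) grid sizes are the common choice, assumed here.

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(x: torch.Tensor, grids=(1, 2, 4)) -> torch.Tensor:
    """x: (B, C, H, W) -> (B, C * sum(g * g for g in grids))."""
    pooled = [F.adaptive_max_pool2d(x, g).flatten(1) for g in grids]
    return torch.cat(pooled, dim=1)

desc = spatial_pyramid_pool(torch.randn(2, 64, 32, 32))  # -> (2, 64 * 21)
```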