
Search Results (515)

Search Parameters:
Keywords = enhanced attention to large-scale features

30 pages, 25011 KB  
Article
Multi-Level Contextual and Semantic Information Aggregation Network for Small Object Detection in UAV Aerial Images
by Zhe Liu, Guiqing He and Yang Hu
Drones 2025, 9(9), 610; https://doi.org/10.3390/drones9090610 - 29 Aug 2025
Abstract
In recent years, detection methods for generic object detection have achieved significant progress. However, due to the large number of small objects in aerial images, mainstream detectors struggle to achieve a satisfactory detection performance. The challenges of small object detection in aerial images are primarily twofold: (1) Insufficient feature representation: The limited visual information for small objects makes it difficult for models to learn discriminative feature representations. (2) Background confusion: Abundant background information introduces more noise and interference, causing the features of small objects to easily be confused with the background. To address these issues, we propose a Multi-Level Contextual and Semantic Information Aggregation Network (MCSA-Net). MCSA-Net includes three key components: a Spatial-Aware Feature Selection Module (SAFM), a Multi-Level Joint Feature Pyramid Network (MJFPN), and an Attention-Enhanced Head (AEHead). The SAFM employs a sequence of dilated convolutions to extract multi-scale local context features and combines a spatial selection mechanism to adaptively merge these features, thereby obtaining the critical local context required for the objects, which enriches the feature representation of small objects. The MJFPN introduces multi-level connections and weighted fusion to fully leverage the spatial detail features of small objects in feature fusion and enhances the fused features further through a feature aggregation network. Finally, the AEHead is constructed by incorporating a sparse attention mechanism into the detection head. The sparse attention mechanism efficiently models long-range dependencies by computing the attention between the most relevant regions in the image while suppressing background interference, thereby enhancing the model’s ability to perceive targets and effectively improving the detection performance. 
Extensive experiments on four datasets, VisDrone, UAVDT, MS COCO, and DOTA, demonstrate that the proposed MCSA-Net achieves excellent detection performance, particularly in small object detection, surpassing several state-of-the-art methods.
(This article belongs to the Special Issue Intelligent Image Processing and Sensing for Drones, 2nd Edition)
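The SAFM's core idea, multi-scale context from dilated sampling merged by a per-pixel spatial selection gate, can be sketched in NumPy. `dilated_context` and the softmax-of-responses gating below are illustrative stand-ins for the paper's learned convolutions, not its implementation:

```python
import numpy as np

def dilated_context(x, d):
    """Mean over a 3x3 neighbourhood sampled at dilation d (edge padding)."""
    p = np.pad(x, d, mode="edge")
    h, w = x.shape
    shifts = [p[d + dy : d + dy + h, d + dx : d + dx + w]
              for dy in (-d, 0, d) for dx in (-d, 0, d)]
    return np.mean(shifts, axis=0)

def spatial_select(x, dilations=(1, 2, 3)):
    """Merge multi-dilation context maps with per-pixel softmax gates.

    The paper learns the gates with convolutions; here the gates are a
    softmax over the context responses themselves, as a stand-in.
    """
    ctx = np.stack([dilated_context(x, d) for d in dilations])  # (scales, H, W)
    e = np.exp(ctx - ctx.max(axis=0, keepdims=True))
    gates = e / e.sum(axis=0, keepdims=True)  # per-pixel weights, sum to 1 over scales
    return (gates * ctx).sum(axis=0)
```

Because the gates form a convex combination at every pixel, the output always lies between the smallest and largest per-scale context responses.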

24 pages, 1689 KB  
Article
Safeguarding Brand and Platform Credibility Through AI-Based Multi-Model Fake Profile Detection
by Vishwas Chakranarayan, Fadheela Hussain, Fayzeh Abdulkareem Jaber, Redha J. Shaker and Ali Rizwan
Future Internet 2025, 17(9), 391; https://doi.org/10.3390/fi17090391 - 29 Aug 2025
Abstract
The proliferation of fake profiles on social media presents critical cybersecurity and misinformation challenges, necessitating robust and scalable detection mechanisms. Such profiles weaken consumer trust, reduce user engagement, and ultimately harm brand reputation and platform credibility. As adversarial tactics and synthetic identity generation evolve, traditional rule-based and machine learning approaches struggle to detect evolving and deceptive behavioral patterns embedded in dynamic user-generated content. This study aims to develop an AI-driven, multi-modal deep learning-based detection system for identifying fake profiles that fuses textual, visual, and social network features to enhance detection accuracy. It also seeks to ensure scalability, adversarial robustness, and real-time threat detection capabilities suitable for practical deployment in industrial cybersecurity environments. To achieve these objectives, the current study proposes an integrated AI system that combines the Robustly Optimized BERT Pretraining Approach (RoBERTa) for deep semantic textual analysis, ConvNeXt for high-resolution profile image verification, and Heterogeneous Graph Attention Networks (Hetero-GAT) for modeling complex social interactions. The extracted features from all three modalities are fused through an attention-based late fusion strategy, enhancing interpretability, robustness, and cross-modal learning. Experimental evaluations on large-scale social media datasets demonstrate that the proposed RoBERTa-ConvNeXt-HeteroGAT model significantly outperforms baseline models, including Support Vector Machine (SVM), Random Forest, and Long Short-Term Memory (LSTM). Performance achieves 98.9% accuracy, 98.4% precision, and a 98.6% F1-score, with a per-profile speed of 15.7 milliseconds, enabling real-time applicability. Moreover, the model proves to be resilient against various types of attacks on text, images, and network activity. 
This study advances the application of AI in cybersecurity by introducing a highly interpretable, multi-modal detection system that strengthens digital trust, supports identity verification, and enhances the security of social media platforms. This alignment of technical robustness with brand trust highlights the system’s value not only in cybersecurity but also in sustaining platform credibility and consumer confidence. This system provides practical value to a wide range of stakeholders, including platform providers, AI researchers, cybersecurity professionals, and public sector regulators, by enabling real-time detection, improving operational efficiency, and safeguarding online ecosystems.
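The attention-based late fusion step, scoring each modality embedding and taking a softmax-weighted sum, can be illustrated with a small NumPy sketch. `query` stands in for the learned attention vector; all names are assumptions, not the authors' API:

```python
import numpy as np

def attention_late_fusion(text_emb, image_emb, graph_emb, query):
    """Fuse one embedding per modality by softmax-scored attention.

    A sketch of attention-based late fusion: each modality is scored
    against a (here fixed) query vector and the embeddings are combined
    with the resulting weights.
    """
    feats = np.stack([text_emb, image_emb, graph_emb])   # (3, D)
    scores = feats @ query / np.sqrt(feats.shape[1])     # one score per modality
    w = np.exp(scores - scores.max())
    w = w / w.sum()                                      # modality weights, sum to 1
    return w, (w[:, None] * feats).sum(axis=0)
```

The weights expose which modality drove a decision, which is one source of the interpretability the abstract emphasizes.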

16 pages, 306 KB  
Article
Adaptive Cross-Scale Graph Fusion with Spatio-Temporal Attention for Traffic Prediction
by Zihao Zhao, Xingzheng Zhu and Ziyun Ye
Electronics 2025, 14(17), 3399; https://doi.org/10.3390/electronics14173399 - 26 Aug 2025
Abstract
Traffic flow prediction is a critical component of intelligent transportation systems, playing a vital role in alleviating congestion, improving road resource utilization, and supporting traffic management decisions. Although deep learning methods have made remarkable progress in this field in recent years, current studies still face challenges in modeling complex spatio-temporal dependencies, adapting to anomalous events, and generalizing to large-scale real-world scenarios. To address these issues, this paper proposes a novel traffic flow prediction model. The proposed approach simultaneously leverages temporal and frequency domain information and introduces adaptive graph convolutional layers to replace traditional graph convolutions, enabling dynamic capture of traffic network structural features. Furthermore, we design a frequency–temporal multi-head attention mechanism for effective multi-scale spatio-temporal feature extraction and develop a cross-multi-scale graph fusion strategy to enhance predictive performance. Extensive experiments on real-world datasets, PeMS and Beijing, demonstrate that our method significantly outperforms state-of-the-art (SOTA) baselines. For example, on the PeMS20 dataset, our model achieves a 53.6% lower MAE, a 12.3% lower NRMSE, and a 3.2% lower MAPE than the best existing method (STFGNN). Moreover, the proposed model achieves competitive computational efficiency and inference speed, making it well-suited for practical deployment.
(This article belongs to the Special Issue Graph-Based Learning Methods in Intelligent Transportation Systems)
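The adaptive graph-convolution idea, learning the adjacency from node embeddings instead of fixing it from the road network, reduces to a few lines. This is a sketch of the concept under assumed shapes, not the paper's layer:

```python
import numpy as np

def adaptive_gcn_layer(x, e1, e2, w):
    """One adaptive graph-convolution step.

    The adjacency is derived from two learnable node-embedding matrices
    (e1, e2) via ReLU similarity + row softmax, then used to propagate
    node features: a common construction, assumed here for illustration.
    """
    logits = np.maximum(e1 @ e2.T, 0.0)                  # ReLU similarity
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a = a / a.sum(axis=1, keepdims=True)                 # row-normalised adjacency
    return a, np.maximum(a @ x @ w, 0.0)                 # propagate + ReLU
```

Because the adjacency is generated from trainable embeddings, the effective graph can change as traffic patterns shift, which is the point of replacing a static road-network adjacency.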

30 pages, 10140 KB  
Article
High-Accuracy Cotton Field Mapping and Spatiotemporal Evolution Analysis of Continuous Cropping Using Multi-Source Remote Sensing Feature Fusion and Advanced Deep Learning
by Xiao Zhang, Zenglu Liu, Xuan Li, Hao Bao, Nannan Zhang and Tiecheng Bai
Agriculture 2025, 15(17), 1814; https://doi.org/10.3390/agriculture15171814 - 25 Aug 2025
Abstract
Cotton is a globally strategic crop that plays a crucial role in sustaining national economies and livelihoods. To address the challenges of accurate cotton field extraction in the complex planting environments of Xinjiang’s Alaer reclamation area, a cotton field identification model was developed that integrates multi-source satellite remote sensing data with machine learning methods. Using imagery from Sentinel-2, GF-1, and Landsat 8, we performed feature fusion using principal component, Gram–Schmidt (GS), and neural network techniques. Analyses of spectral, vegetation, and texture features revealed that the GS-fused blue bands of Sentinel-2 and Landsat 8 exhibited optimal performance, with a mean value of 16,725, a standard deviation of 2290, and an information entropy of 8.55. These metrics improved by 10,529, 168, and 0.28, respectively, compared with the original Landsat 8 data. In comparative classification experiments, the endmember-based random forest classifier (RFC) achieved the best traditional classification performance, with a kappa value of 0.963 and an overall accuracy (OA) of 97.22% based on 250 samples, resulting in a cotton-field extraction error of 38.58 km2. By enhancing the deep learning model, we proposed a U-Net architecture that incorporated a Convolutional Block Attention Module and Atrous Spatial Pyramid Pooling. Using the GS-fused blue band data, the model achieved significantly improved accuracy, with a kappa coefficient of 0.988 and an OA of 98.56%. This advancement reduced the area estimation error to 25.42 km2, representing a 34.1% decrease compared with that of the RFC. Based on the optimal model, we constructed a digital map of continuous cotton cropping from 2021 to 2023, which revealed a consistent decline in cotton acreage within the reclaimed areas. This finding underscores the effectiveness of crop rotation policies in mitigating the adverse effects of large-scale monoculture practices. 
This study confirms that the synergistic integration of multi-source satellite feature fusion and deep learning significantly improves crop identification accuracy, providing reliable technical support for agricultural policy formulation and sustainable farmland management.
(This article belongs to the Special Issue Computers and IT Solutions for Agriculture and Their Application)
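The information entropy the authors report for the fused blue band (8.55) is the Shannon entropy of the grey-level histogram in bits. A minimal implementation of that metric:

```python
import numpy as np

def band_entropy(band, levels=256):
    """Shannon entropy (bits) of an integer-valued band's histogram.

    Higher entropy means the fused band spreads information across more
    grey levels: the quality metric quoted alongside mean and std above.
    """
    hist = np.bincount(band.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]                        # drop empty bins; 0*log(0) := 0
    return float(-(p * np.log2(p)).sum())
```

A perfectly uniform 8-bit band attains the maximum of 8 bits; a constant band scores 0.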

23 pages, 2967 KB  
Article
Ultra-Short-Term Wind Power Prediction Based on Spatiotemporal Contrastive Learning
by Jie Xu, Tie Chen, Jiaxin Yuan, Youyuan Fan, Liping Li and Xinyu Gong
Electronics 2025, 14(17), 3373; https://doi.org/10.3390/electronics14173373 - 25 Aug 2025
Abstract
With the accelerating global energy transition, wind power has become a core pillar of renewable energy systems. However, its inherent intermittency and volatility pose significant challenges to the safe, stable, and economical operation of power grids, making ultra-short-term wind power prediction a critical technical link in optimizing grid scheduling and promoting large-scale wind power integration. Current forecasting techniques suffer from inadequate feature representation, poor feature disentanglement, and limited interpretability of deep learning models. This study introduces a wind power prediction method based on spatiotemporal contrastive learning, employing seasonal-trend decomposition to capture the diverse characteristics of the time series. A contrastive learning framework and a feature disentanglement loss function are employed to effectively decouple spatiotemporal features. Geographical position data are integrated to model spatial correlations, and a spatiotemporal graph convolutional network combined with a multi-head attention mechanism is designed to improve interpretability. The proposed method is validated using operational data from two actual wind farms in Northwestern China. The results indicate that, compared with typical baselines (e.g., STGCN), this method reduces the RMSE by up to 38.47% and the MAE by up to 44.71% for ultra-short-term wind power prediction, markedly enhancing prediction precision and offering a more efficient way to forecast wind power.
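The seasonal-trend decomposition applied before contrastive learning can be sketched with a moving average: trend from a centered moving mean, seasonal from per-phase averages of the detrended series, residual as the remainder. This is a generic STL-style approximation, not the paper's exact decomposition:

```python
import numpy as np

def seasonal_trend_decompose(x, period):
    """Split a 1-D series into trend, seasonal, and residual components.

    Moving-average sketch of seasonal-trend decomposition; the three
    components sum back to the original series by construction.
    """
    n = len(x)
    pad = period // 2
    padded = np.pad(x, pad, mode="edge")
    trend = np.convolve(padded, np.ones(period) / period, mode="same")[pad:pad + n]
    detrended = x - trend
    # average the detrended series at each phase of the period
    seasonal = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = np.tile(seasonal, n // period + 1)[:n]
    resid = x - trend - seasonal
    return trend, seasonal, resid
```

Contrastive views can then be built per component (e.g., augmenting trend and seasonal parts separately), which is the kind of multi-characteristic encoding the abstract describes.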

39 pages, 4783 KB  
Article
Sparse-MoE-SAM: A Lightweight Framework Integrating MoE and SAM with a Sparse Attention Mechanism for Plant Disease Segmentation in Resource-Constrained Environments
by Benhan Zhao, Xilin Kang, Hao Zhou, Ziyang Shi, Lin Li, Guoxiong Zhou, Fangying Wan, Jiangzhang Zhu, Yongming Yan, Leheng Li and Yulong Wu
Plants 2025, 14(17), 2634; https://doi.org/10.3390/plants14172634 - 24 Aug 2025
Abstract
Plant disease segmentation has achieved significant progress with the help of artificial intelligence. However, deploying high-accuracy segmentation models in resource-limited settings faces three key challenges, as follows: (A) Traditional dense attention mechanisms incur quadratic computational complexity growth (O(n²d)), rendering them ill-suited for low-power hardware. (B) Naturally sparse spatial distributions and large-scale variations in the lesions on leaves necessitate models that concurrently capture long-range dependencies and local details. (C) Complex backgrounds and variable lighting in field images often induce segmentation errors. To address these challenges, we propose Sparse-MoE-SAM, an efficient framework based on an enhanced Segment Anything Model (SAM). This deep learning framework integrates sparse attention mechanisms with a two-stage mixture of experts (MoE) decoder. The sparse attention dynamically activates key channels aligned with lesion sparsity patterns, reducing self-attention complexity while preserving long-range context. Stage 1 of the MoE decoder performs coarse-grained boundary localization; Stage 2 achieves fine-grained segmentation by leveraging specialized experts within the MoE, significantly enhancing edge discrimination accuracy. The expert repository—comprising standard convolutions, dilated convolutions, and depthwise separable convolutions—dynamically routes features through optimized processing paths based on input texture and lesion morphology. This enables robust segmentation across diverse leaf textures and plant developmental stages. Further, we design a sparse attention-enhanced Atrous Spatial Pyramid Pooling (ASPP) module to capture multi-scale contexts for both extensive lesions and small spots.
Evaluations on three heterogeneous datasets (PlantVillage Extended, CVPPP, and our self-collected field images) show that Sparse-MoE-SAM achieves a mean Intersection-over-Union (mIoU) of 94.2%—surpassing standard SAM by 2.5 percentage points—while reducing computational costs by 23.7% compared to the original SAM baseline. The model also demonstrates balanced performance across disease classes and enhanced hardware compatibility. Our work validates that integrating sparse attention with MoE mechanisms sustains accuracy while drastically lowering computational demands, enabling the scalable deployment of plant disease segmentation models on mobile and edge devices.
(This article belongs to the Special Issue Advances in Artificial Intelligence for Plant Research)
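The sparse attention idea, letting each query attend only to its most relevant keys rather than all n of them, can be sketched as top-k attention. The `top_k` selection below is one common sparsification and an assumption here; the paper's channel-activation scheme may differ:

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Scaled dot-product attention where each query keeps only its
    top_k highest-scoring keys; the rest are masked out before softmax.
    Illustrative sketch of sparse attention, not the paper's module.
    """
    scores = q @ k.T / np.sqrt(q.shape[1])
    kth = np.sort(scores, axis=1)[:, -top_k][:, None]    # per-row k-th largest score
    masked = np.where(scores >= kth, scores, -np.inf)    # drop everything below it
    w = np.exp(masked - masked.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    return w, w @ v
```

Each row of the weight matrix has exactly `top_k` nonzero entries, so the cost of the weighted sum scales with k instead of the full key count.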

40 pages, 10467 KB  
Article
Cascaded Hierarchical Attention with Adaptive Fusion for Visual Grounding in Remote Sensing
by Huming Zhu, Tianqi Gao, Zhixian Li, Zhipeng Chen, Qiuming Li, Kongmiao Miao, Biao Hou and Licheng Jiao
Remote Sens. 2025, 17(17), 2930; https://doi.org/10.3390/rs17172930 - 23 Aug 2025
Abstract
Visual grounding for remote sensing (RSVG) is the task of localizing the referred object in remote sensing (RS) images by parsing free-form language descriptions. However, RSVG faces the challenge of low detection accuracy due to unbalanced multi-scale grounding capabilities, where grounding accuracy is markedly higher for large objects than for small ones. Based on Faster R-CNN, we propose Faster R-CNN in Visual Grounding for Remote Sensing (FR-RSVG), a two-stage method for grounding RS objects. Building on this foundation, to enhance the ability to ground multi-scale objects, we propose Faster R-CNN with Adaptive Vision-Language Fusion (FR-AVLF), which introduces a layered Adaptive Vision-Language Fusion (AVLF) module. Specifically, this method can adaptively fuse deep or shallow visual features according to the input text (e.g., location-related or object characteristic descriptions), thereby optimizing semantic feature representation and improving grounding accuracy for objects of different scales. Given that RSVG is essentially an expanded form of RS object detection, and considering the knowledge the model acquired in prior RS object detection tasks, we propose Faster R-CNN with Adaptive Vision-Language Fusion Pretrained (FR-AVLFPRE). To further enhance model performance, we propose Faster R-CNN with Cascaded Hierarchical Attention Grounding and Multi-Level Adaptive Vision-Language Fusion Pretrained (FR-CHAGAVLFPRE), which introduces a cascaded hierarchical attention grounding mechanism, employs a more advanced language encoder, and improves upon AVLF by proposing Multi-Level AVLF, significantly improving localization accuracy in complex scenarios. Extensive experiments on the DIOR-RSVG dataset demonstrate that our model surpasses most existing advanced models.
To validate the generalization capability of our model, we conducted zero-shot inference experiments on shared categories between DIOR-RSVG and both Complex Description DIOR-RSVG (DIOR-RSVG-C) and OPT-RSVG datasets, achieving performance superior to most existing models.
(This article belongs to the Section AI Remote Sensing)
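The AVLF idea, blending shallow (detail) and deep (semantic) visual features under control of the text embedding, reduces conceptually to a learned gate. `w_gate` below is a stand-in for the learned projection, and the scalar sigmoid gate is a deliberate simplification of the layered module:

```python
import numpy as np

def adaptive_vl_fusion(shallow, deep, text_emb, w_gate):
    """Text-conditioned blend of shallow and deep visual features.

    A location-style description can push the gate toward shallow
    (spatial-detail) features, an object-characteristic description
    toward deep (semantic) ones. Sketch only; names are illustrative.
    """
    g = 1.0 / (1.0 + np.exp(-(text_emb @ w_gate)))   # scalar gate in (0, 1)
    return g, g * deep + (1.0 - g) * shallow
```

Because the output is a convex combination, the fused feature always lies between the two inputs elementwise.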

22 pages, 17793 KB  
Article
Small Object Detection in Agriculture: A Case Study on Durian Orchards Using EN-YOLO and Thermal Fusion
by Ruipeng Tang, Tan Jun, Qiushi Chu, Wei Sun and Yili Sun
Plants 2025, 14(17), 2619; https://doi.org/10.3390/plants14172619 - 22 Aug 2025
Abstract
Durian is a major tropical crop in Southeast Asia, but its yield and quality are severely impacted by a range of pests and diseases. Manual inspection remains the dominant detection method but suffers from high labor intensity, low accuracy, and difficulty in scaling. To address these challenges, this paper proposes EN-YOLO, a novel enhanced YOLO-based deep learning model that integrates the EfficientNet backbone and multimodal attention mechanisms for precise detection of durian pests and diseases. The model removes redundant feature layers and introduces a large-span residual edge to preserve key spatial information. Furthermore, a multimodal input strategy incorporating RGB, near-infrared, and thermal imaging is used to enhance robustness under variable lighting and occlusion. Experimental results on real orchard datasets demonstrate that EN-YOLO outperforms YOLOv8 (You Only Look Once version 8), YOLOv5-EB (You Only Look Once version 5—Efficient Backbone), and Fieldsentinel-YOLO in detection accuracy, generalization, and small-object recognition. It achieves a 95.3% counting accuracy and shows superior performance in ablation and cross-scene tests. The proposed system also supports real-time drone deployment and integrates an expert knowledge base for intelligent decision support. This work provides an efficient, interpretable, and scalable solution for automated pest and disease management in smart agriculture.
(This article belongs to the Special Issue Plant Protection and Integrated Pest Management)
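The multimodal input strategy can be sketched as channel-wise stacking of the three streams before the backbone. This minimal sketch assumes channel-first layout and omits the per-modality attention weighting the paper applies:

```python
import numpy as np

def fuse_modalities(rgb, nir, thermal):
    """Stack RGB (3ch), near-infrared (1ch), and thermal (1ch) frames
    into a single 5-channel input tensor (channel-first layout).

    Early-fusion sketch only; the detector's first convolution would
    then take 5 input channels instead of 3.
    """
    for m in (nir, thermal):
        assert m.shape == rgb.shape[1:], "modalities must share spatial size"
    return np.concatenate([rgb, nir[None], thermal[None]], axis=0)
```

In practice NIR and thermal frames must first be registered (aligned) to the RGB frame; the stacking itself is trivial once that holds.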

45 pages, 2283 KB  
Review
Agricultural Image Processing: Challenges, Advances, and Future Trends
by Xuehua Song, Letian Yan, Sihan Liu, Tong Gao, Li Han, Xiaoming Jiang, Hua Jin and Yi Zhu
Appl. Sci. 2025, 15(16), 9206; https://doi.org/10.3390/app15169206 - 21 Aug 2025
Abstract
Agricultural image processing technology plays a critical role in enabling precise disease detection, accurate yield prediction, and various smart agriculture applications. However, its practical implementation faces key challenges, including environmental interference, data scarcity and imbalanced datasets, and the difficulty of deploying models on resource-constrained edge devices. This paper presents a systematic review of recent advances in addressing these challenges, with a focus on three core aspects: environmental robustness, data efficiency, and model deployment. The study identifies that attention mechanisms, Transformers, multi-scale feature fusion, and domain adaptation can enhance model robustness under complex conditions. Self-supervised learning, transfer learning, GAN-based data augmentation, SMOTE improvements, and focal loss optimization effectively alleviate data limitations. Furthermore, model compression techniques such as pruning, quantization, and knowledge distillation facilitate efficient deployment. Future research should emphasize multi-modal fusion, causal reasoning, edge–cloud collaboration, and dedicated hardware acceleration. Integrating agricultural expertise with AI is essential for promoting large-scale adoption and achieving intelligent, sustainable agricultural systems.
(This article belongs to the Special Issue Pattern Recognition Applications of Neural Networks and Deep Learning)
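Focal loss, one of the class-imbalance remedies the review surveys, down-weights easy examples by the factor (1 - p_t)^gamma so that rare classes are not swamped. A standard NumPy version of the binary form:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss over predicted probabilities p and labels y.

    With gamma = 0 and alpha = 0.5 this reduces to half the ordinary
    binary cross-entropy; larger gamma suppresses well-classified
    examples more strongly.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)               # numerical safety
    pt = np.where(y == 1, p, 1 - p)              # prob. of the true class
    a = np.where(y == 1, alpha, 1 - alpha)       # class-balance factor
    return float(np.mean(-a * (1 - pt) ** gamma * np.log(pt)))
```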

17 pages, 1733 KB  
Article
Synergistic Remote Sensing and In Situ Observations for Rapid Ocean Temperature Profile Forecasting on Edge Devices
by Jingpeng Shi, Yang Zhao and Fangjie Yu
Appl. Sci. 2025, 15(16), 9204; https://doi.org/10.3390/app15169204 - 21 Aug 2025
Abstract
Regional rapid forecasting of vertical ocean temperature profiles is increasingly important for marine aquaculture, as these profiles directly affect habitat management and the physiological responses of farmed species. However, observational temperature profile data with sufficient temporal resolution are often unavailable, limiting their use in regional rapid forecasting. In addition, traditional numerical ocean models suffer from intensive computational demands and limited operational flexibility, making them unsuitable for regional rapid forecasting applications. To address this gap, we propose PICA-Net (Physics-Inspired CNN–Attention–BiLSTM Network), a hybrid deep learning model that coordinates large-scale satellite observations with local-scale, continuous in situ data to enhance predictive fidelity. The model also incorporates weak physical constraints during training that enforce temporal–spatial diffusion consistency, mixed-layer homogeneity, and surface heat flux consistency, enhancing physical consistency and interpretability. The model uses hourly historical inputs to predict temperature profiles at 6 h intervals over a period of 24 h, incorporating features such as sea surface temperature, sea surface height anomalies, wind fields, salinity, ocean currents, and net heat flux. Experimental results demonstrate that PICA-Net outperforms baseline models in terms of accuracy and generalization. Additionally, its lightweight design enables real-time deployment on edge devices, offering a viable solution for localized, on-site forecasting in smart aquaculture.
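A weak physical constraint such as the mixed-layer homogeneity term can be added to the training loss as a soft penalty on the predicted profile. `mixed_layer_idx`, the variance penalty, and the weight `lam` are illustrative assumptions about how one such constraint might look; the paper combines several:

```python
import numpy as np

def physics_informed_loss(pred, target, mixed_layer_idx, lam=0.1):
    """Data-fit MSE plus a weak mixed-layer homogeneity penalty.

    Temperatures above the mixed-layer depth should be nearly uniform,
    so the variance of the predicted values in that slice acts as a
    soft physics penalty (zero for a perfectly homogeneous layer).
    """
    mse = np.mean((pred - target) ** 2)
    ml = pred[:mixed_layer_idx]          # depths inside the mixed layer
    penalty = np.var(ml)
    return float(mse + lam * penalty)
```

Unlike a hard constraint, the penalty only nudges the network toward physically plausible profiles, which is what "weak" constraint means here.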

20 pages, 10705 KB  
Article
EMFE-YOLO: A Lightweight Small Object Detection Model for UAVs
by Chengjun Yang, Yan Shen and Lutao Wang
Sensors 2025, 25(16), 5200; https://doi.org/10.3390/s25165200 - 21 Aug 2025
Abstract
Small object detection in Unmanned Aerial Vehicle (UAV) aerial images faces challenges such as low detection accuracy and complex backgrounds. Meanwhile, object detection models with many parameters are difficult to deploy on resource-constrained UAVs. Therefore, a lightweight small object detection model, EMFE-YOLO, is proposed, built on YOLOv8s around efficient multi-scale feature enhancement. Firstly, the Enhanced Attention to Large-scale Features (EALF) structure is applied in EMFE-YOLO to focus on large-scale features, improve the detection of small objects, and reduce the parameter count. Secondly, the efficient multi-scale feature enhancement (EMFE) module is integrated into the backbone of EALF for feature extraction and enhancement. The EMFE module reduces the computational cost, obtains richer contextual information, and mitigates interference from complex backgrounds. Finally, DySample is employed in the neck of EALF to optimize the upsampling of features. EMFE-YOLO is validated on the VisDrone2019-val dataset. Experimental results show that it improves mAP50 and mAP50:95 by 8.5% and 6.3%, respectively, and reduces the parameters by 73% compared to YOLOv8s. These results demonstrate that EMFE-YOLO achieves a good balance between accuracy and efficiency, making it suitable for deployment on UAVs with limited computational resources.
(This article belongs to the Section Sensing and Imaging)
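The mAP50 and mAP50:95 figures quoted above rest on box Intersection-over-Union: a detection counts as correct at IoU ≥ 0.5 for mAP50, and mAP50:95 averages over thresholds 0.5 to 0.95. The underlying overlap computation for axis-aligned boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes.

    Returns 0 for disjoint boxes and 1 for identical boxes; this is
    the overlap measure behind the mAP50 / mAP50:95 metrics.
    """
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

For the tiny boxes typical of UAV imagery, a few pixels of localization error already moves IoU below 0.5, which is why small-object mAP is so sensitive.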

25 pages, 7900 KB  
Article
Multi-Label Disease Detection in Chest X-Ray Imaging Using a Fine-Tuned ConvNeXtV2 with a Customized Classifier
by Kangzhe Xiong, Yuyun Tu, Xinping Rao, Xiang Zou and Yingkui Du
Informatics 2025, 12(3), 80; https://doi.org/10.3390/informatics12030080 - 14 Aug 2025
Abstract
Deep-learning-based multi-label chest X-ray classification has achieved significant success, but existing models still have three main issues: fixed-scale convolutions fail to capture both large and small lesions, standard pooling fails to attend to important regions, and linear classifiers cannot model complex dependencies among features. To address these obstacles, we propose CONVFCMAE, a lightweight yet powerful framework built on a partially frozen backbone (77.08% of the initial layers are fixed) to preserve complex, multi-scale features while decreasing the number of trainable parameters. Our architecture adds (1) a learnable global pooling module in which 1×1 convolutions dynamically weight spatial locations, (2) a multi-head attention block dedicated to channel re-calibration, and (3) a two-layer MLP enhanced with ReLU, batch normalization, and dropout to increase the non-linearity of the feature space. To further reduce label noise and the class imbalance inherent to the NIH ChestX-ray14 dataset, we utilize a combined loss based on BCEWithLogits and focal loss, together with extensive data augmentation. On ChestX-ray14, the average ROC-AUC of CONVFCMAE is 0.852, 3.97% higher than the previous state of the art. Ablation experiments demonstrate the individual and collective effectiveness of each component, and Grad-CAM visualizations localize pathological regions well, increasing the interpretability of the model. Overall, CONVFCMAE provides a practical, generalizable approach to feature extraction from medical images.
(This article belongs to the Section Medical and Clinical Informatics)
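The learnable global pooling described here, per-location scores from a 1×1 convolution softmaxed over spatial positions, can be sketched as attention pooling. The score map is assumed given; with uniform scores the operation reduces to global average pooling:

```python
import numpy as np

def attention_pool(features, scores):
    """Pool a (C, H, W) feature map with per-location softmax weights.

    `scores` is an (H, W) map (e.g., from a 1x1 convolution); the
    channels are preserved while the spatial dimensions collapse to a
    weighted average. Sketch only, not the paper's exact module.
    """
    c, h, w = features.shape
    s = scores.reshape(-1)
    wts = np.exp(s - s.max())
    wts = wts / wts.sum()                       # spatial weights, sum to 1
    return features.reshape(c, -1) @ wts        # (C,) pooled descriptor
```

Unlike plain average pooling, high-scoring regions (e.g., lesion areas) dominate the pooled descriptor, which is the intended attention effect.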

31 pages, 13384 KB  
Article
Physics-Informed and Explainable Graph Neural Networks for Generalizable Urban Building Energy Modeling
by Rudai Shan, Hao Ning, Qianhui Xu, Xuehua Su, Mengjin Guo and Xiaohan Jia
Appl. Sci. 2025, 15(16), 8854; https://doi.org/10.3390/app15168854 - 11 Aug 2025
Viewed by 597
Abstract
Urban building energy prediction is a critical challenge for sustainable city planning and large-scale retrofit prioritization, yet traditional data-driven models struggle to capture the spatial and morphological complexity of real urban environments. In this study, we systematically benchmark a range of graph-based neural networks (GNNs)—including the graph convolutional network (GCN), GraphSAGE, and several physics-informed graph attention network (GAT) variants—against conventional artificial neural network (ANN) baselines, using both shape-coefficient and energy use intensity (EUI) stratification across three distinct residential districts. Extensive ablation and cross-district generalization experiments reveal that models explicitly incorporating interpretable physical edge features, such as inter-building distance and angular relation, achieve significantly better prediction accuracy and robustness than standard approaches. Among all models, GraphSAGE demonstrates the best overall performance and generalization capability, while the effectiveness of specific GAT edge features is district-dependent, reflecting variations in local morphology and spatial logic. Furthermore, explainability analysis shows that integrating domain-relevant spatial features enhances model interpretability and provides actionable insight for urban retrofit and policy intervention. The results highlight the value of physics-informed GNNs as a scalable, transferable, and transparent tool for urban energy modeling, supporting evidence-based decision making in the context of aging residential building upgrades and sustainable urban transformation. Full article
(This article belongs to the Special Issue AI-Assisted Building Design and Environment Control)
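A minimal sketch of the two physical edge features named in the abstract (inter-building distance and angular relation) and a GraphSAGE-style mean-aggregation layer. Scalar node features, hypothetical building centroids, and hand-set weights are simplifying assumptions, not the paper's architecture:

```python
import math

def edge_features(b1, b2):
    # Hypothetical physical edge features between two building centroids:
    # Euclidean distance and bearing angle.
    dx, dy = b2[0] - b1[0], b2[1] - b1[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def sage_mean_layer(features, adj, w_self, w_neigh):
    # One GraphSAGE layer with mean aggregation over neighbors,
    # reduced to scalar features for brevity; ReLU activation.
    out = []
    for i, h in enumerate(features):
        neigh = [features[j] for j in adj[i]]
        mean = sum(neigh) / len(neigh) if neigh else 0.0
        out.append(max(0.0, w_self * h + w_neigh * mean))
    return out
```

In a full model the edge features would condition the aggregation (e.g. as GAT attention inputs) rather than being computed in isolation; the sketch only separates the two ingredients the abstract compares.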
14 pages, 8017 KB  
Article
Fast Rice Plant Disease Recognition Based on Dual-Attention-Guided Lightweight Network
by Chenrui Kang, Lin Jiao, Kang Liu, Zhigui Liu and Rujing Wang
Agriculture 2025, 15(16), 1724; https://doi.org/10.3390/agriculture15161724 - 10 Aug 2025
Viewed by 410
Abstract
The yield and quality of rice are severely affected by rice diseases, which can result in crop failure. Early and precise identification of rice plant diseases enables timely action, minimizing potential economic losses. Deep convolutional neural networks (CNNs) have significantly advanced image classification accuracy by leveraging powerful feature extraction capabilities, outperforming traditional machine learning methods. In this work, we propose a dual-attention-guided lightweight network for fast and precise recognition of rice diseases with small lesions and high inter-class similarity. First, to extract features efficiently while reducing computational redundancy, we incorporate FasterNet with partial convolution (PConv). Furthermore, to enhance the network's ability to capture fine-grained lesion details, we introduce a dual-attention mechanism that aggregates long-range contextual information in both the spatial and channel dimensions. Additionally, we construct a large-scale rice disease dataset, named RD-6, which contains 2196 images across six categories, to support model training and evaluation. Finally, the proposed method is evaluated on the RD-6 dataset, demonstrating superior performance over other state-of-the-art methods, especially in recognition efficiency: it achieves an average accuracy of 99.9%, recall of 99.8%, precision of 100%, specificity of 100%, and an F1-score of 99.9% with only 3.6 M parameters, delivering higher efficiency without sacrificing accuracy. Full article
(This article belongs to the Section Crop Protection, Diseases, Pests and Weeds)
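FasterNet's partial convolution applies the convolution to only a fraction of the channels and passes the remainder through unchanged, which is where the FLOP savings come from. A 1-D toy sketch; the channel ratio and kernel are illustrative, not the paper's configuration:

```python
def partial_conv(x, kernel, ratio=0.25):
    # PConv-style op: convolve only the first `ratio` fraction of channels,
    # pass the rest through untouched (cuts FLOPs and memory access).
    n = len(x)                      # number of channels, each a list of values
    k = max(1, int(n * ratio))

    def conv1d(chan):
        # Same-padded 1-D convolution of one channel.
        pad = len(kernel) // 2
        padded = [0.0] * pad + chan + [0.0] * pad
        return [sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
                for i in range(len(chan))]

    return [conv1d(c) for c in x[:k]] + x[k:]
```

A subsequent pointwise (1×1) layer, as in FasterNet, would then mix the convolved and untouched channels.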
31 pages, 8113 KB  
Article
An Autoencoder-like Non-Negative Matrix Factorization with Structure Regularization Algorithm for Clustering
by Haiyan Gao and Ling Zhong
Symmetry 2025, 17(8), 1283; https://doi.org/10.3390/sym17081283 - 10 Aug 2025
Viewed by 405
Abstract
Clustering plays a crucial role in data mining and knowledge discovery, where non-negative matrix factorization (NMF) has attracted widespread attention due to its effective data representation and dimensionality-reduction capabilities. However, standard NMF has inherent limitations when processing sampled data embedded in low-dimensional manifold structures within high-dimensional ambient spaces: it fails to capture the complex structural information hidden in the feature and sampling manifolds, and it neglects the learning of global structure. To address these issues, a novel structure-regularized autoencoder-like non-negative matrix factorization for clustering (SRANMF) is proposed. First, based on a non-negative symmetric encoder-decoder framework, we construct an autoencoder-like NMF model to enhance the characterization of latent information in the data. Then, by fully considering high-order neighborhood relationships in the data, an optimal graph regularization strategy is introduced to preserve multi-order topological structure. Additionally, principal component analysis (PCA) is employed to capture the global data structure by maximizing the variance of the projected data. Comparative experiments on 11 benchmark datasets demonstrate that SRANMF exhibits excellent clustering performance. Specifically, on the large-scale complex datasets MNIST and COIL100, ACC improved by an average of 35.31% and 46.17%, and NMI by 47.12% and 18.10%, respectively. Full article
(This article belongs to the Section Computer)
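The NMF core that SRANMF extends can be illustrated with the classic Lee–Seung multiplicative updates for X ≈ WH in plain Python; the graph-regularization and autoencoder-like decoder terms of the paper are omitted for brevity:

```python
def matmul(A, B):
    # Dense matrix product on nested lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf_step(X, W, H, eps=1e-9):
    # One round of Lee–Seung multiplicative updates; non-negativity is
    # preserved because every factor in the updates is non-negative.
    Wt = transpose(W)
    num_H, den_H = matmul(Wt, X), matmul(matmul(Wt, W), H)
    H = [[H[i][j] * num_H[i][j] / (den_H[i][j] + eps) for j in range(len(H[0]))]
         for i in range(len(H))]
    Ht = transpose(H)
    num_W, den_W = matmul(X, Ht), matmul(W, matmul(H, Ht))
    W = [[W[i][j] * num_W[i][j] / (den_W[i][j] + eps) for j in range(len(W[0]))]
         for i in range(len(W))]
    return W, H
```

A graph-regularized variant would add λAH to the numerator and λDH to the denominator of the H update, where A is the neighborhood graph and D its degree matrix.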