Search Results (793)

Search Parameters:
Keywords = multi-label approaches

24 pages, 9603 KiB  
Article
Label-Efficient Fine-Tuning for Remote Sensing Imagery Segmentation with Diffusion Models
by Yiyun Luo, Jinnian Wang, Jean Sequeira, Xiankun Yang, Dakang Wang, Jiabin Liu, Grekou Yao and Sébastien Mavromatis
Remote Sens. 2025, 17(15), 2579; https://doi.org/10.3390/rs17152579 - 24 Jul 2025
Abstract
High-resolution remote sensing imagery plays an essential role in urban management and environmental monitoring, providing detailed insights for applications ranging from land cover mapping to disaster response. Semantic segmentation methods are among the most effective techniques for comprehensive land cover mapping, and they commonly rely on ImageNet-based pre-training. However, traditional fine-tuning processes exhibit poor transferability across different downstream tasks and require large amounts of labeled data. To address these challenges, we introduce Denoising Diffusion Probabilistic Models (DDPMs) as a generative pre-training approach for semantic feature extraction in remote sensing imagery. We pre-trained a DDPM on extensive unlabeled imagery, obtaining features at multiple noise levels and resolutions. To integrate and optimize these features efficiently, we designed a multi-layer perceptron module with residual connections. It performs channel-wise optimization to suppress feature redundancy and refine representations. Additionally, we froze the feature extractor during fine-tuning. This strategy significantly reduces computational cost and facilitates fast transfer and deployment across various interpretation tasks on homogeneous imagery. Our comprehensive evaluation on the sparsely labeled MiniFrance-S dataset and the fully labeled Gaofen Image Dataset achieved mean intersection over union scores of 42.7% and 66.5%, respectively, outperforming previous works. This demonstrates that our approach effectively reduces reliance on labeled imagery and increases transferability to downstream remote sensing tasks. Full article
(This article belongs to the Special Issue AI-Driven Mapping Using Remote Sensing Data)
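The frozen-backbone recipe above is easy to picture in code. Below is a minimal sketch, assuming a hypothetical ddpm_features helper that returns frozen multi-level diffusion features; the layer sizes and the 1x1-convolution form of the residual MLP are illustrative choices, not the authors' implementation.

    import torch
    import torch.nn as nn

    class ResidualMLPHead(nn.Module):
        """Per-pixel residual MLP (1x1 convs) that fuses concatenated
        diffusion features channel-wise; only this head is trained."""
        def __init__(self, in_ch, hidden, n_classes):
            super().__init__()
            self.proj = nn.Conv2d(in_ch, hidden, 1)
            self.block = nn.Sequential(
                nn.Conv2d(hidden, hidden, 1), nn.ReLU(),
                nn.Conv2d(hidden, hidden, 1),
            )
            self.cls = nn.Conv2d(hidden, n_classes, 1)

        def forward(self, x):
            h = self.proj(x)
            h = h + self.block(h)  # residual connection refines representations
            return self.cls(h)

    # ddpm_features(img) is a stand-in for the frozen DDPM extractor: a list
    # of feature maps from several noise levels, upsampled to one resolution.
    # feats = torch.cat(ddpm_features(img), dim=1)
    # logits = ResidualMLPHead(feats.shape[1], 256, n_classes=16)(feats)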

35 pages, 4256 KiB  
Article
Automated Segmentation and Morphometric Analysis of Thioflavin-S-Stained Amyloid Deposits in Alzheimer’s Disease Brains and Age-Matched Controls Using Weakly Supervised Deep Learning
by Gábor Barczánfalvi, Tibor Nyári, József Tolnai, László Tiszlavicz, Balázs Gulyás and Karoly Gulya
Int. J. Mol. Sci. 2025, 26(15), 7134; https://doi.org/10.3390/ijms26157134 - 24 Jul 2025
Abstract
Alzheimer’s disease (AD) involves the accumulation of amyloid-β (Aβ) plaques, whose quantification plays a central role in understanding disease progression. Automated segmentation of Aβ deposits in histopathological micrographs enables large-scale analyses but is hindered by the high cost of detailed pixel-level annotations. Weakly supervised learning offers a promising alternative by leveraging coarse or indirect labels to reduce the annotation burden. We evaluated a weakly supervised approach to segment and analyze thioflavin-S-positive parenchymal amyloid pathology in AD and age-matched brains. Our pipeline integrates three key components, each designed to operate under weak supervision. First, robust preprocessing (including retrospective multi-image illumination correction and gradient-based background estimation) was applied to enhance image fidelity and support training, as weakly supervised models rely heavily on image features. Second, class activation maps (CAMs), generated by a compact deep classifier (SqueezeNet), were used to identify and coarsely localize amyloid-rich parenchymal regions from patch-wise image labels, serving as spatial priors for subsequent refinement without requiring dense pixel-level annotations. Third, a patch-based convolutional neural network, U-Net, was trained on synthetic data generated from micrographs based on CAM-derived pseudo-labels via an extensive object-level augmentation strategy, enabling refined whole-image semantic segmentation and generalization across diverse spatial configurations. To ensure robustness and unbiased evaluation, we assessed the segmentation performance of the entire framework using patient-wise group k-fold cross-validation, explicitly modeling generalization across unseen individuals, which is critical in clinical scenarios. Despite relying on weak labels, the integrated pipeline achieved strong segmentation performance, with an average Dice similarity coefficient of ≈0.763 and Jaccard index of ≈0.639, widely accepted metrics for assessing segmentation quality in medical image analysis. The resulting segmentations were also visually coherent, demonstrating that weakly supervised segmentation is a viable alternative in histopathology, where acquiring dense annotations is prohibitively labor-intensive and time-consuming. Subsequent morphometric analyses of the automatically segmented Aβ deposits revealed differences in size, structural complexity, and global geometry across brain regions and cognitive status. These findings confirm that deposit architecture exhibits region-specific patterns and reflects underlying neurodegenerative processes, thereby highlighting the biological relevance and practical applicability of the proposed image-processing pipeline for morphometric analysis. Full article
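The CAM step in the pipeline above follows the classic recipe of weighting the last convolutional feature maps by the classifier weights of the target class. A minimal sketch, with the threshold and variable names as illustrative assumptions rather than the study's settings:

    import torch

    def class_activation_map(features, fc_weight, class_idx):
        """Classic CAM: features (C, H, W) from the last conv layer of a
        global-average-pooled classifier, fc_weight (n_classes, C)."""
        cam = torch.einsum('c,chw->hw', fc_weight[class_idx], features)
        cam = torch.relu(cam)
        return cam / (cam.max() + 1e-8)  # normalize to [0, 1]

    # Thresholding gives the coarse pseudo-label used to supervise the U-Net;
    # 0.4 is a placeholder cutoff, not the study's value.
    # pseudo_mask = class_activation_map(feats, W, amyloid_cls) > 0.4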

22 pages, 2952 KiB  
Article
Raw-Data Driven Functional Data Analysis with Multi-Adaptive Functional Neural Networks for Ergonomic Risk Classification Using Facial and Bio-Signal Time-Series Data
by Suyeon Kim, Afrooz Shakeri, Seyed Shayan Darabi, Eunsik Kim and Kyongwon Kim
Sensors 2025, 25(15), 4566; https://doi.org/10.3390/s25154566 - 23 Jul 2025
Abstract
Ergonomic risk classification during manual lifting tasks is crucial for the prevention of workplace injuries. This study addresses the challenge of classifying lifting task risk levels (low, medium, and high risk, labeled as 0, 1, and 2) using multi-modal time-series data comprising raw facial landmarks and bio-signals (electrocardiography [ECG] and electrodermal activity [EDA]). Classifying such data presents inherent challenges due to multi-source information, temporal dynamics, and class imbalance. To overcome these challenges, this paper proposes the Multi-Adaptive Functional Neural Network (Multi-AdaFNN), a novel method that integrates functional data analysis with deep learning techniques. The proposed model introduces a novel adaptive basis layer composed of micro-networks tailored to each individual time-series feature, enabling end-to-end learning of discriminative temporal patterns directly from raw data. The Multi-AdaFNN approach was evaluated across five distinct dataset configurations: (1) facial landmarks only; (2) bio-signals only; (3) full fusion of all available features; (4) a reduced-dimensionality set of 12 selected facial landmark trajectories; and (5) the same reduced set combined with bio-signals. Performance was rigorously assessed using 100 independent stratified splits (70% training and 30% testing), and the model was optimized via a weighted cross-entropy loss function to manage class imbalance effectively. The results demonstrated that the integrated approach, fusing facial landmarks and bio-signals, achieved the highest classification accuracy and robustness. Furthermore, the adaptive basis functions revealed specific phases within lifting tasks critical for risk prediction. These findings underscore the efficacy and transparency of the Multi-AdaFNN framework for multi-modal ergonomic risk assessment, highlighting its potential for real-time monitoring and proactive injury prevention in industrial environments. Full article
(This article belongs to the Special Issue (Bio)sensors for Physiological Monitoring)
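The adaptive basis layer described above can be sketched as one micro-network per time series that learns a basis function over the time grid and returns a functional inner product. Layer sizes, activations, and the class weights are illustrative assumptions, not the paper's configuration:

    import torch
    import torch.nn as nn

    class AdaptiveBasis(nn.Module):
        """Micro-network for one time-series feature: learns w(t) on the
        observation grid and returns the inner product <x, w> (sketch)."""
        def __init__(self, grid, hidden=32):
            super().__init__()
            self.register_buffer('grid', grid.view(-1, 1))  # (T, 1) time points
            self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                                     nn.Linear(hidden, 1))

        def forward(self, x):                       # x: (batch, T) raw series
            w = self.net(self.grid).squeeze(-1)     # (T,) learned basis
            return (x * w).mean(dim=1, keepdim=True)  # Riemann-sum inner product

    # One AdaptiveBasis per landmark/bio-signal channel feeds a classifier
    # trained with class-weighted cross-entropy against imbalance, e.g.:
    # loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 4.0]))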

33 pages, 2512 KiB  
Article
Evolutionary Framework with Binary Decision Diagram for Multi-Classification: A Human-Inspired Approach
by Boyuan Zhang, Wu Ma, Zhi Lu and Bing Zeng
Electronics 2025, 14(15), 2942; https://doi.org/10.3390/electronics14152942 - 23 Jul 2025
Abstract
Current mainstream classification methods predominantly employ end-to-end multi-class frameworks. These approaches face inherent challenges, including high-dimensional feature space complexity, decision boundary ambiguity that escalates with increasing class cardinality, sensitivity to label noise, and limited adaptability to dynamic model expansion. Humans, however, tend to avoid these mistakes naturally. Research indicates that humans subconsciously employ a decision-making process favoring binary outcomes, particularly when responding to questions requiring nuanced differentiation: answering binary inquiries such as “yes/no” often proves easier than addressing queries of “what/which”. Inspired by this human decision-making hypothesis, we propose a decision paradigm named the evolutionary binary decision framework (EBDF), centered on binary classification and evolving from the traditional multi-class classifiers of deep learning. To facilitate this evolution, we leverage the top-N outputs of a traditional multi-class classifier to dynamically steer subsequent binary classifiers, thereby constructing a cascaded decision-making framework that emulates the hierarchical reasoning of a binary decision tree. Theoretically, we prove that once the binary classifiers surpass a certain performance threshold, our framework can outperform the traditional multi-class framework. Furthermore, we conduct experiments with several prominent deep learning models across various image classification datasets. The experimental results indicate significant potential for our strategy to surpass the performance ceiling of multi-class classification. Full article
(This article belongs to the Special Issue Advances in Machine Learning for Image Classification)
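The cascade idea, top-N multi-class candidates arbitrated by binary "a or b?" classifiers, can be sketched as a knockout loop. The binary_clfs lookup and the 0.5 cutoff are hypothetical scaffolding, not the paper's exact framework:

    def ebdf_predict(x, multi_probs, binary_clfs, top_n=3):
        """Sketch of the evolutionary binary decision framework (EBDF):
        multi_probs comes from a conventional multi-class head;
        binary_clfs[(a, b)](x) returns P(class a | input is a or b)."""
        candidates = sorted(range(len(multi_probs)),
                            key=lambda c: -multi_probs[c])[:top_n]
        winner = candidates[0]
        for c in candidates[1:]:     # binary-decision-tree style knockout
            if binary_clfs[(winner, c)](x) < 0.5:
                winner = c
        return winner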

35 pages, 954 KiB  
Article
Beyond Manual Media Coding: Evaluating Large Language Models and Agents for News Content Analysis
by Stavros Doropoulos, Elisavet Karapalidou, Polychronis Charitidis, Sophia Karakeva and Stavros Vologiannidis
Appl. Sci. 2025, 15(14), 8059; https://doi.org/10.3390/app15148059 - 20 Jul 2025
Abstract
The vast volume of media content, combined with the costs of manual annotation, challenges scalable codebook analysis and risks reducing decision-making accuracy. This study evaluates the effectiveness of large language models (LLMs) and multi-agent teams in structured media content analysis based on codebook-driven annotation. We construct a dataset of 200 news articles on U.S. tariff policies, manually annotated using a 26-question codebook encompassing 122 distinct codes, to establish a rigorous ground truth. Seven state-of-the-art LLMs, spanning low- to high-capacity tiers, are assessed under a unified zero-shot prompting framework incorporating role-based instructions and schema-constrained outputs. Experimental results show weighted global F1-scores between 0.636 and 0.822, with Claude-3-7-Sonnet achieving the highest direct-prompt performance. To examine the potential of agentic orchestration, we propose and develop a multi-agent system using Meta’s Llama 4 Maverick, incorporating expert role profiling, shared memory, and coordinated planning. This architecture improves the overall F1-score over the direct prompting baseline from 0.757 to 0.805 and demonstrates consistent gains across binary, categorical, and multi-label tasks, approaching commercial-level accuracy while maintaining a favorable cost–performance profile. These findings highlight the viability of LLMs, both in direct and agentic configurations, for automating structured content analysis. Full article
(This article belongs to the Special Issue Natural Language Processing in the Era of Artificial Intelligence)
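Codebook-driven zero-shot annotation of the kind evaluated above typically pairs a role-based instruction with a schema-constrained answer format. A minimal sketch; the question, codes, and call_llm stand-in are invented for illustration:

    import json

    QUESTION = {
        "id": "Q7",
        "text": "What is the article's overall stance on the tariff policy?",
        "codes": ["supportive", "critical", "neutral", "mixed"],
    }

    def build_prompt(article_text, q):
        """Role-based, schema-constrained zero-shot prompt (illustrative)."""
        return (
            "You are an expert media-content coder.\n"
            f"Question {q['id']}: {q['text']}\n"
            f"Allowed codes: {q['codes']}\n"
            'Reply with JSON only: {"code": "<one allowed code>"}\n\n'
            f"Article:\n{article_text}"
        )

    # call_llm is a placeholder for whichever model endpoint is used;
    # parsing with json.loads enforces the output schema.
    # answer = json.loads(call_llm(build_prompt(text, QUESTION)))["code"]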

20 pages, 688 KiB  
Article
Multi-Modal AI for Multi-Label Retinal Disease Prediction Using OCT and Fundus Images: A Hybrid Approach
by Amina Zedadra, Mahmoud Yassine Salah-Salah, Ouarda Zedadra and Antonio Guerrieri
Sensors 2025, 25(14), 4492; https://doi.org/10.3390/s25144492 - 19 Jul 2025
Abstract
Ocular diseases can significantly affect vision and overall quality of life, with diagnosis often being time-consuming and dependent on expert interpretation. While previous computer-aided diagnostic systems have focused primarily on medical imaging, this paper proposes VisionTrack, a multi-modal AI system for predicting multiple retinal diseases, including Diabetic Retinopathy (DR), Age-related Macular Degeneration (AMD), Diabetic Macular Edema (DME), drusen, Central Serous Retinopathy (CSR), and Macular Hole (MH), as well as normal cases. The proposed framework integrates a Convolutional Neural Network (CNN) for image-based feature extraction, a Graph Neural Network (GNN) to model complex relationships among clinical risk factors, and a Large Language Model (LLM) to process patient medical reports. By leveraging diverse data sources, VisionTrack improves prediction accuracy and offers a more comprehensive assessment of retinal health. Experimental results demonstrate the effectiveness of this hybrid system, highlighting its potential for early detection, risk assessment, and personalized ophthalmic care. Experiments were conducted using two publicly available datasets, RetinalOCT and RFMID, which provide diverse retinal imaging modalities: OCT images and fundus images, respectively. The proposed multi-modal AI system demonstrated strong performance in multi-label disease prediction. On the RetinalOCT dataset, the model achieved an accuracy of 0.980, F1-score of 0.979, recall of 0.978, and precision of 0.979. Similarly, on the RFMID dataset, it reached an accuracy of 0.989, F1-score of 0.881, recall of 0.866, and precision of 0.897. These results confirm the robustness, reliability, and generalization capability of the proposed approach across different imaging modalities. Full article
(This article belongs to the Section Sensing and Imaging)
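For multi-label results like those reported above, predictions and ground truth are binary indicator matrices with one column per disease. A small sketch of how such scores are computed with scikit-learn; the 'macro' averaging choice is an assumption, since the abstract does not state it:

    import numpy as np
    from sklearn.metrics import f1_score, precision_score, recall_score

    # Columns: DR, AMD, DME, drusen, CSR, MH, normal (toy data).
    y_true = np.array([[1, 0, 0, 0, 0, 0, 0],
                       [0, 1, 1, 0, 0, 0, 0]])
    y_pred = np.array([[1, 0, 0, 0, 0, 0, 0],
                       [0, 1, 0, 0, 0, 0, 0]])

    print(f1_score(y_true, y_pred, average='macro', zero_division=0))
    print(precision_score(y_true, y_pred, average='macro', zero_division=0))
    print(recall_score(y_true, y_pred, average='macro', zero_division=0))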

24 pages, 3474 KiB  
Article
Research on Unsupervised Domain Adaptive Bearing Fault Diagnosis Method Based on Migration Learning Using MSACNN-IJMMD-DANN
by Xiaoxu Li, Jiahao Wang, Jianqiang Wang, Jixuan Wang, Qinghua Li, Xuelian Yu and Jiaming Chen
Machines 2025, 13(7), 618; https://doi.org/10.3390/machines13070618 - 17 Jul 2025
Abstract
To address the problems of feature extraction, the cost of obtaining labeled samples, and large differences in domain distribution in bearing fault diagnosis under variable operating conditions, an unsupervised domain-adaptive bearing fault diagnosis method based on transfer learning, MSACNN-IJMMD-DANN (multi-scale and attention-based convolutional neural network, MSACNN; improved joint maximum mean discrepancy, IJMMD; domain adversarial neural network, DANN), is proposed. First, to extract fault-type features from the source and target domains, we establish an MSACNN based on multi-scale and attention mechanisms. Second, to reduce the feature distribution difference between the source and target domains, the joint maximum mean discrepancy and correlation alignment approaches are used to create the metric criterion. Then, the adversarial loss mechanism of DANN is introduced to reduce the interference of weakly correlated domain features for better fault diagnosis and identification. Finally, the method is validated using bearing datasets from Case Western Reserve University, Jiangnan University, and our laboratory. The experimental results demonstrate that the method achieves higher accuracy across different transfer tasks, providing an effective solution for bearing fault diagnosis in industrial environments with varying operating conditions. Full article
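Of the two metric criteria named above, correlation alignment (CORAL) is the simpler to sketch: it penalizes the gap between the source and target feature covariances. The JMMD term, which needs kernel machinery, is omitted here, and the loss weighting is a tunable assumption:

    import torch

    def coral_loss(source, target):
        """Deep CORAL: align second-order statistics of source/target
        feature batches, each of shape (n, d)."""
        d = source.size(1)
        def cov(x):
            x = x - x.mean(dim=0, keepdim=True)
            return (x.t() @ x) / (x.size(0) - 1)
        return ((cov(source) - cov(target)) ** 2).sum() / (4 * d * d)

    # Total transfer loss (sketch): classification + metric + adversarial:
    # loss = ce + lam1 * (jmmd + coral_loss(fs, ft)) + lam2 * dann_loss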

24 pages, 890 KiB  
Article
MCTGNet: A Multi-Scale Convolution and Hybrid Attention Network for Robust Motor Imagery EEG Decoding
by Huangtao Zhan, Xinhui Li, Xun Song, Zhao Lv and Ping Li
Bioengineering 2025, 12(7), 775; https://doi.org/10.3390/bioengineering12070775 - 17 Jul 2025
Abstract
Motor imagery (MI) EEG decoding is a key application in brain–computer interface (BCI) research. In cross-session scenarios, the generalization and robustness of decoding models are particularly challenging due to the complex nonlinear dynamics of MI-EEG signals in both temporal and frequency domains, as well as distributional shifts across different recording sessions. While multi-scale feature extraction is a promising approach for generalized and robust MI decoding, conventional classifiers (e.g., multilayer perceptrons) struggle to perform accurate classification when confronted with high-order, nonstationary feature distributions, which have become a major bottleneck for improving decoding performance. To address this issue, we propose an end-to-end decoding framework, MCTGNet, whose core idea is to formulate the classification process as a high-order function approximation task that jointly models both task labels and feature structures. By introducing a group rational Kolmogorov–Arnold Network (GR-KAN), the system enhances generalization and robustness under cross-session conditions. Experiments on the BCI Competition IV 2a and 2b datasets demonstrate that MCTGNet achieves average classification accuracies of 88.93% and 91.42%, respectively, outperforming state-of-the-art methods by 3.32% and 1.83%. Full article
(This article belongs to the Special Issue Brain Computer Interfaces for Motor Control and Motor Learning)
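The GR-KAN classifier mentioned above builds on learnable rational functions in place of fixed activations. A toy rational unit, with polynomial degrees and initialization chosen for illustration only, not the paper's exact GR-KAN parameterization:

    import torch
    import torch.nn as nn

    class RationalActivation(nn.Module):
        """Learnable rational function P(x)/Q(x), the KAN-style building
        block used for high-order function approximation (sketch)."""
        def __init__(self, p_deg=3, q_deg=2):
            super().__init__()
            self.p = nn.Parameter(torch.randn(p_deg + 1) * 0.1)
            self.q = nn.Parameter(torch.randn(q_deg) * 0.1)

        def forward(self, x):
            num = sum(c * x ** i for i, c in enumerate(self.p))
            den = 1 + sum(c * x ** (i + 1) for i, c in enumerate(self.q)).abs()
            return num / den  # |.| keeps the denominator away from zero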

24 pages, 4383 KiB  
Article
Predicting Employee Attrition: XAI-Powered Models for Managerial Decision-Making
by İrem Tanyıldızı Baydili and Burak Tasci
Systems 2025, 13(7), 583; https://doi.org/10.3390/systems13070583 - 15 Jul 2025
Abstract
Background: Employee turnover poses a multi-faceted challenge to organizations, undermining productivity, morale, and financial stability while rendering recruitment, onboarding, and training investments wasteful. Traditional machine learning approaches often struggle with class imbalance and lack transparency, limiting actionable insights. This study introduces an Explainable AI (XAI) framework to achieve both high predictive accuracy and interpretability in turnover forecasting. Methods: Two publicly available HR datasets (IBM HR Analytics, Kaggle HR Analytics) were preprocessed with label encoding and MinMax scaling. Class imbalance was addressed via GAN-based synthetic data generation. A three-layer Transformer encoder performed binary classification, and SHapley Additive exPlanations (SHAP) analysis provided both global and local feature attributions. Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC AUC metrics. Results: On the IBM dataset, the Generative Adversarial Network (GAN)-Transformer model achieved 92.00% accuracy, 96.67% precision, 87.00% recall, 91.58% F1, and 96.32% ROC AUC. On the Kaggle dataset, it reached 96.95% accuracy, 97.28% precision, 96.60% recall, 96.94% F1, and 99.15% ROC AUC, substantially outperforming classical resampling methods (ROS, SMOTE, ADASYN) and recent literature benchmarks. SHAP explanations highlighted JobSatisfaction, Age, and YearsWithCurrManager as the top predictors in the IBM dataset, and number_project, satisfaction_level, and time_spend_company in the Kaggle dataset. Conclusion: The proposed GAN-Transformer-SHAP pipeline delivers state-of-the-art turnover prediction while furnishing transparent, actionable insights for HR decision-makers. Future work should validate generalizability across diverse industries and develop lightweight, real-time implementations. Full article
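The attribution step reported above can be reproduced in outline with the shap package. Here model and X are hypothetical stand-ins for a fitted classifier (with predict_proba) and the scaled feature matrix; KernelExplainer is a generic model-agnostic choice, not necessarily what the authors used:

    import shap

    # Model-agnostic SHAP values: a background sample anchors expectations,
    # then per-instance attributions explain individual predictions.
    explainer = shap.KernelExplainer(model.predict_proba, X[:100])
    shap_values = explainer.shap_values(X[:500])
    shap.summary_plot(shap_values, X[:500])  # global view: top features first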

21 pages, 3826 KiB  
Article
UAV-OVD: Open-Vocabulary Object Detection in UAV Imagery via Multi-Level Text-Guided Decoding
by Lijie Tao, Guoting Wei, Zhuo Wang, Zhaoshuai Qi, Ying Li and Haokui Zhang
Drones 2025, 9(7), 495; https://doi.org/10.3390/drones9070495 - 14 Jul 2025
Abstract
Object detection in drone-captured imagery has attracted significant attention due to its wide range of real-world applications, including surveillance, disaster response, and environmental monitoring. Although the majority of existing methods are developed under closed-set assumptions, and some recent studies have begun to explore open-vocabulary or open-world detection, their application to UAV imagery remains limited and underexplored. In this paper, we address this limitation by exploiting the relationship between images and textual semantics to extend object detection in UAV imagery to an open-vocabulary setting. We propose a novel and efficient detector named Unmanned Aerial Vehicle Open-Vocabulary Detector (UAV-OVD), specifically designed for drone-captured scenes. To facilitate open-vocabulary object detection, we propose improvements from three complementary perspectives. First, at the training level, we design a region–text contrastive loss to replace the conventional classification loss, allowing the model to align visual regions with textual descriptions beyond fixed category sets. Second, structurally, we introduce a multi-level text-guided fusion decoder that integrates visual features across multiple spatial scales under language guidance, thereby improving overall detection performance and enhancing the representation and perception of small objects. Finally, from the data perspective, we enrich the original dataset with synonym-augmented category labels, enabling more flexible and semantically expressive supervision. Experiments conducted on two widely used benchmark datasets demonstrate that our approach achieves significant improvements in both mAP and recall. For instance, for zero-shot detection on xView, UAV-OVD achieves 9.9 mAP and 67.3 recall, 1.1 and 25.6 points higher than YOLO-World, respectively. In terms of speed, UAV-OVD runs at 53.8 FPS, nearly twice as fast as YOLO-World and five times faster than DetrReg, demonstrating its strong potential for real-time open-vocabulary detection in UAV imagery. Full article
(This article belongs to the Special Issue Applications of UVs in Digital Photogrammetry and Image Processing)
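The region–text contrastive loss described first can be sketched as a symmetric InfoNCE objective over matched region and category-text embeddings; the temperature and one-to-one pairing scheme are illustrative assumptions, since UAV-OVD's exact formulation may differ:

    import torch
    import torch.nn.functional as F

    def region_text_contrastive(region_emb, text_emb, tau=0.07):
        """region_emb, text_emb: (N, d); row i of each describes the same
        object. Pulls matched pairs together, pushes others apart."""
        region_emb = F.normalize(region_emb, dim=1)
        text_emb = F.normalize(text_emb, dim=1)
        logits = region_emb @ text_emb.t() / tau   # (N, N) similarities
        labels = torch.arange(len(logits))
        return (F.cross_entropy(logits, labels) +
                F.cross_entropy(logits.t(), labels)) / 2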

14 pages, 6691 KiB  
Article
Remote Sensing Extraction of Damaged Buildings in the Shigatse Earthquake, 2025: A Hybrid YOLO-E and SAM2 Approach
by Zhimin Wu, Chenyao Qu, Wei Wang, Zelang Miao and Huihui Feng
Sensors 2025, 25(14), 4375; https://doi.org/10.3390/s25144375 - 12 Jul 2025
Abstract
In January 2025, a magnitude 6.8 earthquake struck Dingri County, Shigatse, Tibet, causing severe damage. Rapid and precise extraction of damaged buildings is essential for emergency relief and rebuilding efforts. This study proposes an approach integrating YOLO-E (Real-Time Seeing Anything) and the Segment Anything Model 2 (SAM2) to extract damaged buildings with multi-source remote sensing images, including post-earthquake Gaofen-7 imagery (0.80 m), Beijing-3 imagery (0.30 m), and pre-earthquake Google satellite imagery (0.15 m), over the affected region. In this hybrid approach, YOLO-E functions as the preliminary segmentation module for initial segmentation. It leverages its real-time detection and segmentation capability to locate potential damaged building regions and generate coarse segmentation masks rapidly. Subsequently, SAM2 follows as a refinement step, incorporating shapefile information from pre-disaster sources to apply precise, pixel-level segmentation. The dataset used for training contained labeled examples of damaged buildings, and the model optimization was carried out using stochastic gradient descent (SGD), with cross-entropy and mean squared error as the selected loss functions. Upon evaluation, the model reached a precision of 0.840, a recall of 0.855, an F1-score of 0.847, and an IoU of 0.735. It successfully extracted 492 suspected damaged building patches within a radius of 20 km from the earthquake epicenter, clearly showing the distribution characteristics of damaged buildings concentrated in the earthquake fault zone. In summary, this hybrid YOLO-E and SAM2 approach, leveraging multi-source remote sensing imagery, delivers precise and rapid extraction of damaged buildings with a precision of 0.840, recall of 0.855, and IoU of 0.735, effectively supporting targeted earthquake rescue and post-disaster reconstruction efforts in the Dingri County fault zone. Full article
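The coarse-to-fine flow above reduces to a short loop. yoloe_detect and sam2_refine below are hypothetical wrappers (the real YOLO-E and SAM2 APIs differ), and the shapefile footprints are assumed to expose a shapely-style intersects test:

    def extract_damaged_buildings(image, footprints):
        """Sketch: coarse YOLO-E proposals refined by SAM2 under
        pre-disaster footprint priors."""
        candidates = yoloe_detect(image, prompt="damaged building")
        refined = []
        for box in candidates:
            # Pre-disaster footprints constrain SAM2 to plausible buildings.
            priors = [fp for fp in footprints if fp.intersects(box)]
            refined.extend(sam2_refine(image, box=box, priors=priors))
        return refined  # pixel-level masks of suspected damaged buildings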

18 pages, 4631 KiB  
Article
Semantic Segmentation of Rice Fields in Sub-Meter Satellite Imagery Using an HRNet-CA-Enhanced DeepLabV3+ Framework
by Yifan Shao, Pan Pan, Hongxin Zhao, Jiale Li, Guoping Yu, Guomin Zhou and Jianhua Zhang
Remote Sens. 2025, 17(14), 2404; https://doi.org/10.3390/rs17142404 - 11 Jul 2025
Abstract
Accurate monitoring of rice-planting areas underpins food security and evidence-based farm management. Recent work has advanced along three complementary lines—multi-source data fusion (to mitigate cloud and spectral confusion), temporal feature extraction (to exploit phenology), and deep-network architecture optimization. However, even the best fusion- and time-series-based approaches still struggle to preserve fine spatial details in sub-meter scenes. Targeting this gap, we propose an HRNet-CA-enhanced DeepLabV3+ that retains the original model’s strengths while resolving its two key weaknesses: (i) detail loss caused by repeated down-sampling and feature-pyramid compression and (ii) boundary blurring due to insufficient multi-scale information fusion. The Xception backbone is replaced with a High-Resolution Network (HRNet) to maintain full-resolution feature streams through multi-resolution parallel convolutions and cross-scale interactions. A coordinate attention (CA) block is embedded in the decoder to strengthen spatially explicit context and sharpen class boundaries. We constructed a rice dataset of 23,295 images (11,295 rice + 12,000 non-rice) via preprocessing and manual labeling and benchmarked the proposed model against classical segmentation networks. Our approach boosts boundary segmentation accuracy to 92.28% MIoU and raises texture-level discrimination to 95.93% F1, without extra inference latency. Although this study focuses on architecture optimization, the HRNet-CA backbone is readily compatible with future multi-source fusion and time-series modules, offering a unified path toward operational paddy mapping in fragmented sub-meter landscapes. Full article
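The coordinate attention block embedded in the decoder pools along height and width separately, so the attention maps keep positional information. A compact sketch after Hou et al. (2021), with normalization and the original h-swish activation dropped for brevity:

    import torch
    import torch.nn as nn

    class CoordinateAttention(nn.Module):
        """Coordinate attention: directional pooling yields one attention
        factor per row and one per column (simplified sketch)."""
        def __init__(self, ch, reduction=32):
            super().__init__()
            mid = max(8, ch // reduction)
            self.conv1 = nn.Conv2d(ch, mid, 1)
            self.act = nn.ReLU()
            self.conv_h = nn.Conv2d(mid, ch, 1)
            self.conv_w = nn.Conv2d(mid, ch, 1)

        def forward(self, x):
            b, c, h, w = x.shape
            x_h = x.mean(dim=3, keepdim=True)                       # (b, c, h, 1)
            x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (b, c, w, 1)
            y = self.act(self.conv1(torch.cat([x_h, x_w], dim=2)))
            y_h, y_w = torch.split(y, [h, w], dim=2)
            a_h = torch.sigmoid(self.conv_h(y_h))                   # (b, c, h, 1)
            a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (b, c, 1, w)
            return x * a_h * a_w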

22 pages, 2320 KiB  
Review
Use of Radiomics in Characterizing Tumor Hypoxia
by Mohan Huang, Helen K. W. Law and Shing Yau Tam
Int. J. Mol. Sci. 2025, 26(14), 6679; https://doi.org/10.3390/ijms26146679 - 11 Jul 2025
Abstract
Tumor hypoxia involves limited oxygen supply within the tumor microenvironment and is closely associated with aggressiveness, metastasis, and resistance to common cancer treatment modalities such as chemotherapy and radiotherapy. Traditional methodologies for hypoxia assessment, such as the use of invasive probes and clinical biomarkers, are generally not very suitable for routine clinical applications. Radiomics provides a non-invasive approach to hypoxia assessment by extracting quantitative features from medical images. Thus, radiomics is important in diagnosis and the formulation of a treatment strategy for tumor hypoxia. This article discusses the various imaging techniques used for the assessment of tumor hypoxia including magnetic resonance imaging (MRI), positron emission tomography (PET), and computed tomography (CT). It introduces the use of radiomics with machine learning and deep learning for extracting quantitative features, along with its possible clinical use in hypoxic tumors. This article further summarizes the key challenges hindering the clinical translation of radiomics, including the lack of imaging standardization and the limited availability of hypoxia-labeled datasets. It also highlights the potential of integrating radiomics with multi-omics to enhance hypoxia visualization and guide personalized cancer treatment. Full article

18 pages, 1667 KiB  
Article
Multi-Task Deep Learning for Simultaneous Classification and Segmentation of Cancer Pathologies in Diverse Medical Imaging Modalities
by Maryem Rhanoui, Khaoula Alaoui Belghiti and Mounia Mikram
Onco 2025, 5(3), 34; https://doi.org/10.3390/onco5030034 - 11 Jul 2025
Abstract
Background: Clinical imaging is an important part of health care, providing physicians with great assistance in patient treatment. In fact, segmentation and grading of tumors can help doctors assess the severity of the cancer at an early stage and increase the chances of cure. Although deep learning for cancer diagnosis has achieved clinically acceptable accuracy, challenging tasks remain, especially in the context of insufficient labeled data and the attendant need for expensive computational resources. Objective: This paper presents a lightweight classification and segmentation deep learning model to assist in the identification of cancerous tumors with high accuracy despite the scarcity of medical data. Methods: We propose a multi-task architecture for classification and segmentation of cancerous tumors in the brain, skin, prostate, and lungs. The model is based on the U-Net architecture with different pre-trained deep learning models (VGG16 and MobileNetV2) as backbones. The multi-task model is validated on relatively small datasets (slightly exceeding 1200 images) that are diverse in terms of modalities (MRI, X-ray, dermoscopy, and digital histopathology), number of classes, and the shapes and sizes of cancer pathologies, using accuracy and the Dice coefficient as statistical metrics. Results: Experiments show that the multi-task approach improves learning efficiency and prediction accuracy for the segmentation and classification tasks, compared to training the individual models separately. The multi-task architecture reached classification accuracies of 86%, 90%, 88%, and 87% for skin lesion, brain tumor, prostate cancer, and pneumothorax, respectively. For the segmentation tasks, it achieved precisions of 95% and 98% for skin lesion and brain tumor segmentation, respectively, and 99% for both prostate cancer and pneumothorax, showing that the multi-task solution is more efficient than single-task networks. Full article
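The shared-encoder, two-head design above can be sketched in a few lines; the tiny encoder stands in for the VGG16/MobileNetV2 backbones, and all layer widths are illustrative:

    import torch
    import torch.nn as nn

    class MultiTaskUNet(nn.Module):
        """Shared encoder with two heads: a U-Net-style decoder for masks
        and a pooled linear head for labels (simplified sketch)."""
        def __init__(self, n_classes):
            super().__init__()
            self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
            self.enc2 = nn.Sequential(nn.MaxPool2d(2),
                                      nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
            self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
            self.seg = nn.Conv2d(64, 1, 1)       # binary tumor mask
            self.cls = nn.Linear(64, n_classes)  # tumor grade/type

        def forward(self, x):
            e1 = self.enc1(x)
            e2 = self.enc2(e1)
            d = torch.cat([self.up(e2), e1], dim=1)   # skip connection
            mask = self.seg(d)
            label = self.cls(e2.mean(dim=(2, 3)))     # global average pool
            return mask, label

    # Joint training sums both losses, so the shared encoder learns from
    # classification labels even when masks are scarce:
    # loss = bce(mask, mask_gt) + ce(label, label_gt)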

16 pages, 3611 KiB  
Article
Study on the Effectiveness of Multi-Dimensional Approaches to Urban Flood Risk Assessment
by Hyung Jun Park, Su Min Song, Dong Hyun Kim and Seung Oh Lee
Appl. Sci. 2025, 15(14), 7777; https://doi.org/10.3390/app15147777 - 11 Jul 2025
Abstract
Increasing frequency and severity of urban flooding, driven by climate change and urban population growth, present major challenges. Traditional flood control infrastructure alone cannot fully prevent flood damage, highlighting the need for a comprehensive and multi-dimensional disaster management approach. This study proposes the Flood Risk Index for Building (FRIB)—a building-level assessment framework that integrates vulnerability, hazard, and exposure. FRIB assigns customized risk levels to individual buildings and evaluates the effectiveness of a multi-dimensional method. Compared to traditional indicators like flood depth, FRIB more accurately identifies high-risk areas by incorporating diverse risk factors. It also enables efficient resource allocation by excluding low-risk buildings, focusing efforts on high-risk zones. For example, in a case where 5124 buildings were targeted based on 1 m flood depth, applying FRIB excluded 24 buildings with “low” risk and up to 530 with “high” risk, reducing unnecessary interventions. Moreover, quantitative metrics like entropy and variance showed that as FRIB levels rise, flood depth distributions become more balanced—demonstrating that depth alone does not determine risk. In conclusion, while qualitative labels such as “very low” to “very high” aid intuitive understanding, FRIB’s quantitative, multi-dimensional approach enhances precision in urban flood management. Future research may expand FRIB’s application to varied regions, supporting tailored flood response strategies. Full article
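A toy version of a building-level composite index and of the distribution-balance check makes the method concrete; the equal weights, bin edges, and entropy base below are placeholder choices, not the paper's FRIB definitions:

    import numpy as np

    def frib_level(vulnerability, hazard, exposure, weights=(1/3, 1/3, 1/3)):
        """Composite per-building risk score in [0, 1], binned to a label."""
        score = np.dot(weights, [vulnerability, hazard, exposure])
        bins = [0.2, 0.4, 0.6, 0.8]
        labels = ["very low", "low", "moderate", "high", "very high"]
        return labels[np.searchsorted(bins, score)]

    def depth_entropy(depths, n_bins=10):
        """Shannon entropy of the flood-depth distribution within one risk
        level; higher entropy = more balanced depths, the kind of metric
        used to show that depth alone does not determine risk."""
        p, _ = np.histogram(depths, bins=n_bins)
        p = p[p > 0] / p.sum()
        return float(-(p * np.log2(p)).sum())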
