Search Results (4,327)

Search Parameters:
Keywords = visual localization

24 pages, 5632 KiB  
Article
Classification of Rockburst Intensity Grades: A Method Integrating k-Medoids-SMOTE and BSLO-RF
by Qinzheng Wu, Bing Dai, Danli Li, Hanwen Jia and Penggang Li
Appl. Sci. 2025, 15(16), 9045; https://doi.org/10.3390/app15169045 - 16 Aug 2025
Abstract
Precise forecasting of rockburst intensity categories is vital to safeguarding operational safety and refining design protocols in deep underground engineering. This study proposes an intelligent forecasting framework that integrates k-medoids-SMOTE with a BSLO-optimized Random Forest (BSLO-RF) algorithm. A curated dataset of 351 rockburst instances, stratified into four intensity grades, was compiled via systematic literature synthesis. To mitigate data imbalance and outlier interference, z-score normalization and k-medoids-SMOTE oversampling were applied, with t-SNE visualization confirming improved inter-class distinguishability. The BSLO algorithm was used to tune the Random Forest's hyperparameters, strengthening its global search and local refinement capabilities. Comparative analyses showed that the optimized BSLO-RF framework outperformed the other machine learning methods tested (e.g., BSLO-SVM, BSLO-BP), achieving an average prediction accuracy of 89.16% on the balanced dataset, with a recall of 87.5% and an F1-score of 0.88. It performed especially well on the extreme grades, reaching 93.3% accuracy for Level I (no rockburst) and 87.9% for Level IV (severe rockburst), exceeding BSLO-SVM (75.8% for Level IV) and BSLO-BP (72.7% for Level IV). Field validation on the Zhongnanshan Tunnel project further corroborated its reliability, yielding 80% prediction accuracy (four of five cases correctly classified) and verifying its adaptability to complex geological settings. This research introduces a robust intelligent classification approach for rockburst intensity, offering actionable insights for risk assessment and mitigation in deep mining and tunneling projects.
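
As a rough illustration of the kind of pipeline this abstract describes, the sketch below balances a four-class dataset with SMOTE-style oversampling after z-score normalization and then tunes a Random Forest. Plain SMOTE stands in for the paper's k-medoids-SMOTE variant and RandomizedSearchCV for the BSLO metaheuristic; the synthetic data and parameter grid are assumptions for illustration only.

```python
# Hedged sketch: approximates the described pipeline with off-the-shelf tools.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(351, 6))                           # stand-in for the 351-case dataset
y = rng.choice(4, size=351, p=[0.1, 0.4, 0.35, 0.15])   # four imbalanced intensity grades

X = StandardScaler().fit_transform(X)                   # z-score normalization
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y) # balance the four grades

X_tr, X_te, y_tr, y_te = train_test_split(X_res, y_res, stratify=y_res, random_state=0)
search = RandomizedSearchCV(                            # generic search in place of BSLO
    RandomForestClassifier(random_state=0),
    {"n_estimators": [100, 300, 500], "max_depth": [None, 10, 20],
     "min_samples_leaf": [1, 2, 4]},
    n_iter=10, cv=5, random_state=0)
search.fit(X_tr, y_tr)
print("held-out accuracy:", search.score(X_te, y_te))
```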

24 pages, 2703 KiB  
Article
Unsupervised Person Re-Identification via Deep Attribute Learning
by Shun Zhang, Yaohui Xu, Xuebin Zhang, Boyang Cheng and Ke Wang
Future Internet 2025, 17(8), 371; https://doi.org/10.3390/fi17080371 - 15 Aug 2025
Abstract
Driven by growing public security demands and the advancement of intelligent surveillance systems, person re-identification (ReID) has emerged as a prominent research focus in computer vision. The primary objective of person ReID is to retrieve individuals with the same identity across different camera views. The task is challenging because of its high sensitivity to variations in visual appearance caused by factors such as body pose and camera parameters. Although deep learning-based methods have achieved marked progress in ReID, the high cost of annotation remains a challenge that cannot be overlooked. To address this, we propose an unsupervised attribute learning framework that eliminates the need for costly manual annotations while maintaining high accuracy. The framework learns mid-level human attributes (such as clothing type and gender) that are robust to substantial visual appearance variations and can hence boost attribute accuracy with a small amount of labeled data. To realize our framework, we present a part-based convolutional neural network (CNN) architecture that consists of two components: image and body attribute learning at a global level, and upper- and lower-body image and attribute learning at a local level. The proposed architecture is trained to learn attribute-semantic and identity-discriminative feature representations simultaneously. For model learning, we first train our part-based network in a supervised manner on a labeled attribute dataset. Then, we apply an unsupervised clustering method to assign pseudo-labels to unlabeled images in a target dataset using the trained network. To improve feature compatibility, we introduce an attribute consistency scheme for unsupervised domain adaptation on this unlabeled target data. During training on the target dataset, we alternately perform three steps: extracting features with the updated model, assigning pseudo-labels to unlabeled images, and fine-tuning the model. Through a unified framework that fuses complementary attribute-label and identity-label information, our approach achieves considerable improvements of 10.6% and 3.91% mAP on the Market-1501→DukeMTMC-ReID and DukeMTMC-ReID→Market-1501 unsupervised domain adaptation tasks, respectively.
(This article belongs to the Special Issue Advances in Deep Learning and Next-Generation Internet Technologies)
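
The alternating three-step scheme the abstract describes (extract features, assign pseudo-labels, fine-tune) can be sketched in a few lines. The toy backbone, k-means clustering, cluster count, and optimizer settings below are illustrative assumptions, not the paper's part-based architecture or configuration.

```python
# Hedged sketch of the alternating pseudo-label loop.
import torch, torch.nn as nn
from sklearn.cluster import KMeans

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 32, 128), nn.ReLU(),
                         nn.Linear(128, 64))           # toy stand-in for the part-based CNN
images = torch.randn(200, 3, 64, 32)                   # unlabeled target-domain images

for round_ in range(3):
    with torch.no_grad():
        feats = backbone(images)                       # 1) extract features
    pseudo = KMeans(n_clusters=10, n_init=10).fit_predict(feats.numpy())  # 2) pseudo-labels
    head = nn.Linear(64, 10)                           # fresh head: cluster IDs change each round
    opt = torch.optim.SGD(list(backbone.parameters()) + list(head.parameters()), lr=0.01)
    targets = torch.as_tensor(pseudo).long()
    for _ in range(5):                                 # 3) fine-tune on pseudo-labels
        loss = nn.functional.cross_entropy(head(backbone(images)), targets)
        opt.zero_grad(); loss.backward(); opt.step()
    print(f"round {round_}: loss {loss.item():.3f}")
```
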
18 pages, 2055 KiB  
Article
Language-Driven Cross-Attention for Visible–Infrared Image Fusion Using CLIP
by Xue Wang, Jiatong Wu, Pengfei Zhang and Zhongjun Yu
Sensors 2025, 25(16), 5083; https://doi.org/10.3390/s25165083 - 15 Aug 2025
Abstract
Language-guided multimodal fusion, which integrates information from both visible and infrared images, has shown strong performance in image fusion tasks. In low-light or complex environments, a single modality often fails to fully capture scene features, whereas fused images enable robots to obtain a multidimensional understanding of the scene for navigation, localization, and environmental perception. This capability is particularly important in applications such as autonomous driving, intelligent surveillance, and search-and-rescue operations, where accurate recognition and efficient decision-making are critical. To enhance the effectiveness of multimodal fusion, we propose a text-guided infrared and visible image fusion network. The framework consists of two key components: an image fusion branch, which employs a cross-domain attention mechanism to merge multimodal features, and a text-guided module, which leverages the CLIP model to extract semantic cues from image descriptions of the visible content. These semantic parameters are then used to guide feature modulation during fusion. By integrating visual and linguistic information, our framework generates high-quality color-fused images that both enhance visual detail and enrich semantic understanding. On benchmark datasets, our method achieves strong quantitative performance: SF = 2.1381, Qab/f = 0.6329, MI = 14.2305, SD = 45.1842, VIF = 0.8527 on LLVIP, and SF = 1.3149, Qab/f = 0.5863, MI = 13.9676, SD = 94.7203, VIF = 0.7746 on TNO. These results highlight the robustness and scalability of our model, making it a promising solution for real-world multimodal perception applications.
(This article belongs to the Section Sensors and Robotics)
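
A minimal sketch of text-guided feature modulation in the spirit of the text-guided module described above: a text embedding (which the paper obtains from CLIP; here a random placeholder) predicts per-channel scale and shift applied to the fused visible/infrared features. The layer sizes and the FiLM-style conditioning form are assumptions, not the paper's exact mechanism.

```python
# Hedged sketch: semantic conditioning of fused features by a text embedding.
import torch, torch.nn as nn

fused = torch.randn(1, 64, 32, 32)        # fused visible+infrared feature map
text_emb = torch.randn(1, 512)            # stand-in for a CLIP text embedding

to_scale = nn.Linear(512, 64)             # predict per-channel modulation parameters
to_shift = nn.Linear(512, 64)
gamma = to_scale(text_emb).view(1, 64, 1, 1)
beta = to_shift(text_emb).view(1, 64, 1, 1)
modulated = fused * (1 + gamma) + beta    # FiLM-style scale-and-shift conditioning
print(modulated.shape)                    # torch.Size([1, 64, 32, 32])
```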

20 pages, 4041 KiB  
Article
Enhancing Cardiovascular Disease Detection Through Exploratory Predictive Modeling Using DenseNet-Based Deep Learning
by Wael Hadi, Tushar Jaware, Tarek Khalifa, Faisal Aburub, Nawaf Ali and Rashmi Saini
Computers 2025, 14(8), 330; https://doi.org/10.3390/computers14080330 - 15 Aug 2025
Abstract
Cardiovascular Disease (CVD) remains the leading cause of morbidity and mortality, accounting for 17.9 million deaths every year. Precise and early diagnosis is therefore critical to improving patient outcomes and easing the burden on healthcare systems. This work presents an innovative approach using the DenseNet architecture for the automatic recognition of CVD from clinical data. A heterogeneous dataset of cardiovascular images, including angiograms, echocardiograms, and magnetic resonance images, is preprocessed and augmented. A custom DenseNet architecture is fine-tuned, with rigorous hyperparameter tuning and dedicated strategies for handling class imbalance, to optimize the learned features for robust model performance. After training, the DenseNet model shows high accuracy, sensitivity, and specificity in identifying CVD compared to baseline approaches. Beyond the quantitative measures, detailed visualizations show that the model is able to localize and classify pathological areas within an image. The model achieved an accuracy of 0.92, a precision of 0.91, and a recall of 0.95 for class 1, with an overall weighted-average F1-score of 0.93, which establishes its efficacy. This research has clear clinical applicability: accurate detection of CVD enables timely, personalized interventions. The DenseNet-based approach advances CVD diagnosis with state-of-the-art technology for use by radiologists and clinicians. Future work will focus on improving the model's interpretability and its generalization to broader patient populations.
(This article belongs to the Special Issue Machine Learning and Statistical Learning with Applications 2025)
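
A hedged sketch of the general recipe the abstract outlines: fine-tuning a torchvision DenseNet with a replaced classifier head and a class-weighted loss for imbalance. The frozen trunk, binary head, class weights, and toy batch are illustrative assumptions rather than the paper's exact setup.

```python
# Hedged sketch: DenseNet fine-tuning with class-imbalance weighting.
import torch, torch.nn as nn
from torchvision import models

net = models.densenet121(weights=None)            # pretrained weights could be used instead
for p in net.features.parameters():
    p.requires_grad = False                       # freeze the convolutional trunk
net.classifier = nn.Linear(net.classifier.in_features, 2)  # CVD vs. no CVD

x = torch.randn(4, 3, 224, 224)                   # toy batch of preprocessed images
y = torch.tensor([0, 1, 1, 1])
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([2.0, 1.0]))  # up-weight the minority class
opt = torch.optim.Adam(net.classifier.parameters(), lr=1e-3)
loss = loss_fn(net(x), y)
opt.zero_grad(); loss.backward(); opt.step()
print(f"one fine-tuning step, loss {loss.item():.3f}")
```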

21 pages, 9031 KiB  
Article
A Pyramid Convolution-Based Scene Coordinate Regression Network for AR-GIS
by Haobo Xu, Chao Zhu, Yilong Wang, Huachen Zhu and Wei Ma
ISPRS Int. J. Geo-Inf. 2025, 14(8), 311; https://doi.org/10.3390/ijgi14080311 - 15 Aug 2025
Abstract
Camera tracking plays a pivotal role in augmented reality geographic information systems (AR-GIS) and location-based services (LBS), serving as a crucial component for accurate spatial awareness and navigation. Current learning-based camera tracking techniques, while achieving superior accuracy in pose estimation, often overlook changes in scale. This oversight results in less stable localization performance and difficulty coping with dynamic environments. To address these challenges, we propose a pyramid convolution-based scene coordinate regression network (PSN). Our approach leverages a pyramidal convolutional structure, integrating kernels of varying sizes and depths alongside grouped convolutions that reduce computational demands while capturing multi-scale features from the input imagery. The network then applies a novel randomization strategy that reduces correlated gradients and markedly improves training efficiency. A final regression layer maps 2D pixel coordinates to their corresponding 3D scene coordinates. Experimental results show that the proposed method achieves centimeter-level accuracy in small-scale scenes and decimeter-level accuracy in large-scale scenes after only a few minutes of training. It offers a favorable balance between localization accuracy and efficiency, and effectively supports augmented reality visualization in dynamic environments.
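
The pyramidal convolution idea can be sketched as parallel grouped convolutions with different kernel sizes feeding a 1×1 regression head that outputs per-pixel 3D scene coordinates. Channel counts, kernel sizes, and group numbers below are assumptions, not the paper's architecture.

```python
# Hedged sketch of a pyramidal convolution block with a scene-coordinate head.
import torch, torch.nn as nn

class PyramidBlock(nn.Module):
    def __init__(self, c_in=32, c_out=96):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(c_in, c_out // 3, k, padding=k // 2, groups=g)
            for k, g in [(3, 1), (5, 2), (7, 4)]   # larger kernels use more groups
        ])
    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

net = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                    PyramidBlock(), nn.ReLU(),
                    nn.Conv2d(96, 3, 1))           # 3 channels = (x, y, z) scene coordinates
coords = net(torch.randn(1, 3, 120, 160))
print(coords.shape)                                # torch.Size([1, 3, 120, 160])
```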

21 pages, 8328 KiB  
Article
Three-Dimensional Morphometric Analysis of the Columbretes Grande Turbidite Channel (Ebro Continental Margin, NW Mediterranean)
by José Luis Casamor
Geosciences 2025, 15(8), 318; https://doi.org/10.3390/geosciences15080318 - 15 Aug 2025
Abstract
Turbidite channels are the final conduits for the transfer of terrigenous detritus to deep-sea depositional systems. Studying their morphology and geometric parameters can provide information on density flow characteristics and sedimentary processes, offering an objective and quantitative way to differentiate the deep-sea deposits they feed, which are of special interest to the oil industry. In this work, the morphology of a turbidite channel, the Columbretes Grande channel on the Ebro continental margin (NW Mediterranean Sea), is studied; its main geometric parameters are calculated; and its potential sedimentary fill is reconstructed and visualized in 3D. This morphometric analysis shows a concave, smooth channel profile in equilibrium, with local evidence of erosion. Considering the height of the flanks (<150 m), the well-developed levees, the high sinuosity of some reaches, and the relatively low slopes, the channel can be classified as depositional. The sinuosity index is close to 2 in some reaches, and the gentle slopes suggest that the fine-grained turbidity currents that episodically flow through the channel reach its end.
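
One of the geometric parameters mentioned, the sinuosity index, is simply the along-channel length divided by the straight-line distance between the channel's endpoints. The sketch below computes it for a made-up digitized thalweg; the coordinates are purely illustrative.

```python
# Hedged sketch: sinuosity index of a digitized channel thalweg.
import numpy as np

thalweg = np.array([[0, 0], [2, 1], [3, 3], [5, 4], [6, 6]], float)  # (x, y) in km
along = np.sum(np.linalg.norm(np.diff(thalweg, axis=0), axis=1))     # along-channel length
straight = np.linalg.norm(thalweg[-1] - thalweg[0])                  # straight-line distance
print(f"sinuosity index = {along / straight:.2f}")  # ~1 is straight, ~2 is highly sinuous
```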

26 pages, 663 KiB  
Article
Multi-Scale Temporal Fusion Network for Real-Time Multimodal Emotion Recognition in IoT Environments
by Sungwook Yoon and Byungmun Kim
Sensors 2025, 25(16), 5066; https://doi.org/10.3390/s25165066 - 14 Aug 2025
Abstract
This paper introduces EmotionTFN (Emotion-Multi-Scale Temporal Fusion Network), a novel hierarchical temporal fusion architecture that addresses key challenges in IoT emotion recognition by processing diverse sensor data while maintaining accuracy across multiple temporal scales. The architecture integrates physiological signals (EEG, PPG, and GSR), visual, and audio data using hierarchical temporal attention across short-term (0.5–2 s), medium-term (2–10 s), and long-term (10–60 s) windows. Edge computing optimizations, including model compression, quantization, and adaptive sampling, enable deployment on resource-constrained devices. Extensive experiments on MELD, DEAP, and G-REx datasets demonstrate 94.2% accuracy on discrete emotion classification and 0.087 mean absolute error on dimensional prediction, outperforming the best baseline (87.4%). The system maintains sub-200 ms latency on IoT hardware while achieving a 40% improvement in energy efficiency. Real-world deployment validation over four weeks achieved 97.2% uptime and user satisfaction scores of 4.1/5.0 while ensuring privacy through local processing.
(This article belongs to the Section Internet of Things)
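
A rough sketch of hierarchical multi-scale temporal fusion as described above: features are summarized over short, medium, and long windows and combined with learned attention weights. The window lengths follow the abstract; the sampling rate, feature dimension, and attention form are assumptions.

```python
# Hedged sketch: attention-weighted fusion of multi-scale temporal summaries.
import torch, torch.nn as nn

fs = 32                                          # assumed sampling rate (Hz)
signal = torch.randn(1, 60 * fs, 16)             # one minute of 16-dim sensor features

def summarize(x, seconds):
    w = int(seconds * fs)                        # window length in samples
    windows = x.unfold(1, w, w)                  # (batch, n_windows, feat, w)
    return windows.mean(dim=-1).mean(dim=1)      # average within, then across, windows

scorer = nn.Linear(16, 1)
scales = torch.stack([summarize(signal, s) for s in (2, 10, 60)], dim=1)  # (1, 3, 16)
weights = torch.softmax(scorer(scales), dim=1)   # attention over the three scales
fused = (weights * scales).sum(dim=1)            # fused multi-scale representation
print(fused.shape)                               # torch.Size([1, 16])
```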

36 pages, 8425 KiB  
Article
Multifactorial Analysis of Defects in Oil Storage Tanks: Implications for Structural Performance and Safety
by Alexandru-Adrian Stoicescu, Razvan George Ripeanu, Maria Tănase, Costin Nicolae Ilincă and Liviu Toader
Processes 2025, 13(8), 2575; https://doi.org/10.3390/pr13082575 - 14 Aug 2025
Abstract
This article investigates the combined effects of common defects on the structural integrity and on the operational and environmental safety of an existing Light Cycle Oil (LCO) storage tank. The study correlates all of the tank's defects (such as corrosion and local plate thinning, deformations, and local stress concentrators) with the loads, and combinations of loads, that occur during the tank's lifetime. All of the information gathered by the various inspection techniques is combined to create a digital twin of the equipment, which is then analyzed by finite element analysis (FEA). A tank condition assessment is a complex activity that rests on the experience of the engineer performing it. The available methods range from basic visual inspection (the most important step) with selected measurements and analytical calculations, up to full wall-thickness surveys, 3D scanning of deformations, and FEA of the tank's digital twin; it falls to the evaluating engineer to choose the best method for each particular case from a technical and economic point of view. The goal of this article is to demonstrate that the analytical and FEA methods yield the same result, and to establish a well-defined standard calculation model for future applications.
(This article belongs to the Section Materials Processes)

25 pages, 15383 KiB  
Article
SplitGround: Long-Chain Reasoning Split via Modular Multi-Expert Collaboration for Training-Free Scene Knowledge-Guided Visual Grounding
by Xilong Qin, Yue Hu, Wansen Wu, Xinmeng Li and Quanjun Yin
Big Data Cogn. Comput. 2025, 9(8), 209; https://doi.org/10.3390/bdcc9080209 - 14 Aug 2025
Abstract
Scene Knowledge-guided Visual Grounding (SK-VG) is a multi-modal detection task built upon conventional visual grounding (VG) for human–computer interaction scenarios. It utilizes an additional passage of scene knowledge apart from the image and context-dependent textual query for referred object localization. Due to the inherent difficulty in directly establishing correlations between the given query and the image without leveraging scene knowledge, this task imposes significant demands on a multi-step knowledge reasoning process to achieve accurate grounding. Off-the-shelf VG models underperform in such a setting due to the requirement of detailed description in the query and a lack of knowledge inference based on implicit narratives of the visual scene. Recent Vision–Language Models (VLMs) exhibit improved cross-modal reasoning capabilities. However, their monolithic architectures, particularly in lightweight implementations, struggle to maintain coherent reasoning chains across sequential logical deductions, leading to error accumulation in knowledge integration and object localization. To address the above-mentioned challenges, we propose SplitGround, a collaborative framework that strategically decomposes complex reasoning processes by fusing the input query and image with knowledge through two auxiliary modules. Specifically, it implements an Agentic Annotation Workflow (AAW) for explicit image annotation and a Synonymous Conversion Mechanism (SCM) for semantic query transformation. This hierarchical decomposition enables VLMs to focus on essential reasoning steps while offloading auxiliary cognitive tasks to specialized modules, effectively splitting long reasoning chains into manageable subtasks with reduced complexity. Comprehensive evaluations on the SK-VG benchmark demonstrate the significant advancements of our method. Remarkably, SplitGround attains an accuracy improvement of 15.71% on the hard split of the test set over the previous training-required SOTA, using only a compact VLM backbone without fine-tuning, which provides new insights for knowledge-intensive visual grounding tasks.

24 pages, 1735 KiB  
Article
A Multi-Sensor Fusion-Based Localization Method for a Magnetic Adhesion Wall-Climbing Robot
by Xiaowei Han, Hao Li, Nanmu Hui, Jiaying Zhang and Gaofeng Yue
Sensors 2025, 25(16), 5051; https://doi.org/10.3390/s25165051 - 14 Aug 2025
Abstract
To address the decline in the localization accuracy of magnetic adhesion wall-climbing robots operating on large steel structures, caused by visual occlusion, sensor drift, and environmental interference, this study proposes a simulation-based multi-sensor fusion localization method that integrates an Inertial Measurement Unit (IMU), Wheel Odometry (Odom), and Ultra-Wideband (UWB). An Extended Kalman Filter (EKF) is employed to integrate IMU and Odom measurements through a complementary filtering model, while a geometric residual-based weighting mechanism is introduced to optimize raw UWB ranging data. This enhances the accuracy and robustness of both the prediction and observation stages. All evaluations were conducted in a simulated environment, including scenarios on flat plates and spherical tank-shaped steel surfaces. The proposed method maintained a maximum localization error within 5 cm in both linear and closed-loop trajectories and achieved over 30% improvement in horizontal accuracy compared to baseline EKF-based approaches. The system exhibited consistent localization performance across varying surface geometries, providing technical support for robotic operations on large steel infrastructures.
(This article belongs to the Section Navigation and Positioning)
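
The predict/update structure described above can be sketched with a toy planar EKF: odometry drives the prediction, and a UWB range to a fixed anchor drives the correction, with the range noise inflated by the residual as a crude stand-in for the paper's geometric residual weighting. All values are illustrative.

```python
# Hedged sketch: planar EKF fusing odometry prediction with a UWB range update.
import numpy as np

x = np.array([0.0, 0.0])          # robot position estimate (m)
P = np.eye(2) * 0.1               # state covariance
Q = np.eye(2) * 0.01              # process noise (IMU + odometry uncertainty)
anchor = np.array([5.0, 3.0])     # fixed UWB anchor position

def step(x, P, odom_delta, uwb_range):
    x = x + odom_delta                     # predict with wheel odometry
    P = P + Q
    diff = x - anchor                      # update with the UWB range measurement
    pred = np.linalg.norm(diff)
    H = (diff / pred).reshape(1, 2)        # Jacobian of range w.r.t. position
    resid = uwb_range - pred
    R = 0.05 + resid ** 2                  # residual-inflated measurement noise
    S = H @ P @ H.T + R
    K = P @ H.T / S                        # Kalman gain (scalar innovation)
    return x + (K * resid).ravel(), (np.eye(2) - K @ H) @ P

x, P = step(x, P, odom_delta=np.array([0.1, 0.0]), uwb_range=5.7)
print(np.round(x, 3))
```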

28 pages, 19126 KiB  
Article
Digital Geospatial Twinning for Revaluation of a Waterfront Urban Park Design (Case Study: Burgas City, Bulgaria)
by Stelian Dimitrov, Bilyana Borisova, Antoaneta Ivanova, Martin Iliev, Lidiya Semerdzhieva, Maya Ruseva and Zoya Stoyanova
Land 2025, 14(8), 1642; https://doi.org/10.3390/land14081642 - 14 Aug 2025
Abstract
Digital twins play a crucial role in linking data with practical solutions. They convert raw measurements into actionable insights, enabling spatial planning that addresses environmental challenges and meets the needs of local communities. This paper presents the development of a digital geospatial twin for a residential district in Burgas, the largest port city on Bulgaria’s southern Black Sea coast. The aim is to provide up-to-date geospatial data quickly and efficiently, and to merge available data into a single, accurate model. This model is used to test three scenarios for revitalizing coastal functions and improving a waterfront urban park in collaboration with stakeholders. The methodology combines aerial photogrammetry, ground-based mobile laser scanning (MLS), and airborne laser scanning (ALS), allowing for robust 3D modeling and terrain reconstruction across different land cover conditions. The current topography, areas at risk from geological hazards, and the vegetation structure with detailed attribute data for each tree are analyzed. These data are used to evaluate the strengths and limitations of the site concerning the desired functionality of the waterfront, considering urban priorities, community needs, and the necessity of addressing contemporary climate challenges. The carbon storage potential under various development scenarios is assessed. Through effective visualization and communication with residents and professional stakeholders, collaborative development processes have been facilitated through a series of workshops focused on coastal transformation. The results aim to support the design of climate-neutral urban solutions that mitigate natural risks without compromising the area’s essential functions, such as residential living and recreation.

18 pages, 3894 KiB  
Article
Validation of Acoustic Emission Tomography Using Lagrange Interpolation in a Defective Concrete Specimen
by Katsuya Nakamura, Mikika Furukawa, Kenichi Oda, Satoshi Shigemura and Yoshikazu Kobayashi
Appl. Sci. 2025, 15(16), 8965; https://doi.org/10.3390/app15168965 - 14 Aug 2025
Abstract
Acoustic Emission tomography (AET) has the potential to visualize damage in existing structures, contributing to structural health monitoring. Further, AET requires only the arrival times of elastic waves at sensors to identify velocity distributions, as source localization based on ray-tracing is integrated into its algorithm. Thus, AET offers the advantage of easy acquisition of measurement data. However, accurate source localization requires a large number of elastic wave source candidate points, and increasing these candidates significantly raises the computational resource demand. Lagrange Interpolation has the potential to reduce the number of candidate points, optimizing computational resources, and this potential has been validated numerically. In this study, AET incorporating Lagrange Interpolation is applied to identify the velocity distribution in a defective concrete plate, validating its effectiveness using measured wave data. The validation results show that the defect location in the concrete plate is successfully identified using only 36 source candidates, compared to the 121 candidates required in conventional AET. Furthermore, when using 36 source candidates, the percentage error in applying Lagrange Interpolation is 8.4%, which is significantly more accurate than the 25% error observed in conventional AET. Therefore, it is confirmed that AET with Lagrange Interpolation has the potential to identify velocity distributions in existing structures using optimized resources, thereby contributing to the structural health monitoring of concrete infrastructure.
(This article belongs to the Special Issue Advances in Structural Health Monitoring in Civil Engineering)
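
The core computational idea, evaluating many source-candidate locations from a few sampled ones via interpolation, can be illustrated in 1D with SciPy's Lagrange polynomial. The synthetic travel-time curve below is an assumption, and the 6 coarse and 121 dense points only loosely echo the candidate counts quoted in the abstract.

```python
# Hedged sketch: Lagrange interpolation of travel times over source candidates.
import numpy as np
from scipy.interpolate import lagrange

coarse_x = np.linspace(0.0, 1.0, 6)                      # 6 coarse source candidates (m)
coarse_t = np.sqrt(0.04 + (coarse_x - 0.3) ** 2) / 4.0   # synthetic travel times, v = 4 km/s

poly = lagrange(coarse_x, coarse_t)                      # polynomial through the coarse samples
dense_x = np.linspace(0.0, 1.0, 121)                     # dense evaluation grid
dense_t = poly(dense_x)
true_t = np.sqrt(0.04 + (dense_x - 0.3) ** 2) / 4.0
print(f"max interpolation error: {np.abs(dense_t - true_t).max():.2e} s")
```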

25 pages, 7900 KiB  
Article
Multi-Label Disease Detection in Chest X-Ray Imaging Using a Fine-Tuned ConvNeXtV2 with a Customized Classifier
by Kangzhe Xiong, Yuyun Tu, Xinping Rao, Xiang Zou and Yingkui Du
Informatics 2025, 12(3), 80; https://doi.org/10.3390/informatics12030080 - 14 Aug 2025
Abstract
Deep-learning-based multi-label chest X-ray classification has achieved significant success, but existing models still have three main issues: fixed-scale convolutions fail to capture both large and small lesions, standard pooling fails to attend to important regions, and linear classifiers lack the capacity to model complex dependencies between features. To circumvent these obstacles, we propose CONVFCMAE, a lightweight yet powerful framework built on a partially frozen backbone (77.08% of the initial layers are fixed) in order to preserve complex, multi-scale features while decreasing the number of trainable parameters. Our architecture adds (1) a learnable global pooling module, with 1×1 convolutions that are dynamically weighted by spatial location; (2) a multi-head attention block dedicated to channel re-calibration; and (3) a two-layer MLP enhanced with ReLU, batch normalization, and dropout to increase the non-linearity of the feature space. To further reduce label noise and the class imbalance inherent to the NIH ChestXray14 dataset, we use a combined loss based on BCEWithLogits and focal loss, together with extensive data augmentation. On ChestXray14, the average ROC-AUC of CONVFCMAE is 0.852, a 3.97% improvement over the state of the art. Ablation experiments demonstrate the individual and collective effectiveness of each component. Grad-CAM visualizations localize pathological regions well, which increases the interpretability of the model. Overall, CONVFCMAE provides a practical, generalizable solution for feature extraction from medical images.
(This article belongs to the Section Medical and Clinical Informatics)
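
The combined objective mentioned above can be sketched directly: BCE-with-logits mixed with a focal term that down-weights easy examples. The mixing weight and focal gamma below are illustrative assumptions, not the paper's tuned values.

```python
# Hedged sketch: combined BCE-with-logits + focal loss for multi-label classification.
import torch
import torch.nn.functional as F

def bce_focal_loss(logits, targets, gamma=2.0, alpha=0.5):
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                       # probability assigned to the true label
    focal = (1 - p_t) ** gamma * bce            # down-weight easy, confident examples
    return alpha * bce.mean() + (1 - alpha) * focal.mean()

logits = torch.randn(8, 14)                     # 14 ChestXray14 pathology labels
targets = torch.randint(0, 2, (8, 14)).float()
print(bce_focal_loss(logits, targets))
```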

21 pages, 691 KiB  
Article
The High Prevalence of Oncogenic HPV Genotypes Targeted by the Nonavalent HPV Vaccine in HIV-Infected Women Urgently Reinforces the Need for Prophylactic Vaccination in Key Female Populations Living in Gabon
by Marcelle Mboumba-Mboumba, Augustin Mouinga-Ondeme, Pamela Moussavou-Boundzanga, Jeordy Dimitri Engone-Ondo, Roseanne Mounanga Mourimarodi, Abdoulaye Diane, Christ Ognari Ayoumi, Laurent Bélec, Ralph-Sydney Mboumba Bouassa and Ivan Mfouo-Tynga
Diseases 2025, 13(8), 260; https://doi.org/10.3390/diseases13080260 - 14 Aug 2025
Abstract
Background/Objectives. Women living with human immunodeficiency virus (WLWH) have a six-fold higher risk of developing cervical cancer associated with high-risk human Papillomavirus (HR-HPV) than HIV-negative women. We herein assessed HR-HPV genotype distribution and plasma levels of the cancer antigen 125 (CA-125) in WLWH in a rural town in Gabon, in Central Africa. Methods. Adult WLWH attending the local HIV outpatient center were prospectively enrolled and underwent cervical visual inspection and cervicovaginal and blood sampling. HIV RNA load and CA-125 levels were measured from plasma using the Cepheid® Xpert® HIV-1 Viral Load kit and BioMérieux VIDAS® CA-125 II assay, respectively. HPV detection and genotyping were performed via a nested polymerase chain reaction (MY09/11 and GP5+/6+), followed by sequencing. Results. Fifty-eight WLWH (median age: 52 years) were enrolled. Median CD4 count was 547 cells/µL (IQR: 412.5–737.5) and HIV RNA load 4.88 Log10 copies/mL (IQR: 3.79–5.49). HPV prevalence was 68.96%, with HR-HPV detected in 41.37% of women. Among HR-HPV-positive samples, 87.5% (21/24) were genotypes targeted by the Gardasil vaccine, while 12.5% (3/24) were non-vaccine types. Predominant HR-HPV types included HPV-16 (13.8%), HPV-33 (10.34%), HPV-35 (5.17%), HPV-31, and HPV-58 (3.45%). Most participants had normal cervical cytology (62.07%), and a minority (14.29%) had elevated CA-125 levels, with no correlation to cytological abnormalities. Conclusions. In the hinterland of Gabon, WLWH are facing an unsuspected yet substantial burden of cervical HR-HPV infection and a neglected risk for cervical cancer. Strengthening cervical cancer prevention through targeted HPV vaccination, sexual education, and accessible screening strategies will help in mitigating associated risk.

27 pages, 5515 KiB  
Article
Optimizing Multi-Camera Mobile Mapping Systems with Pose Graph and Feature-Based Approaches
by Ahmad El-Alailyi, Luca Morelli, Paweł Trybała, Francesco Fassi and Fabio Remondino
Remote Sens. 2025, 17(16), 2810; https://doi.org/10.3390/rs17162810 - 13 Aug 2025
Abstract
Multi-camera Visual Simultaneous Localization and Mapping (V-SLAM) increases spatial coverage through multi-view image streams, improving localization accuracy and reducing data acquisition time. Despite its speed and general robustness, V-SLAM often struggles to achieve the precise camera poses necessary for accurate 3D reconstruction, especially in complex environments. This study introduces two novel multi-camera optimization methods to enhance pose accuracy, reduce drift, and ensure loop closures. These methods refine multi-camera V-SLAM outputs within existing frameworks and are evaluated in two configurations: (1) multiple independent stereo V-SLAM instances operating on separate camera pairs; and (2) multi-view odometry processing all camera streams simultaneously. The proposed optimizations are (1) a multi-view feature-based optimization that integrates V-SLAM poses with rigid inter-camera constraints and bundle adjustment, and (2) a multi-camera pose graph optimization that fuses multiple trajectories using relative pose constraints and robust noise models. Validation is conducted through two complex 3D surveys using the ATOM-ANT3D multi-camera fisheye mobile mapping system. Results demonstrate survey-grade accuracy comparable to traditional photogrammetry, with reduced computational time, advancing toward near real-time 3D mapping of challenging environments.
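
A toy version of pose graph optimization with relative-pose constraints: 2D (x, y, heading) nodes are refined by least squares so that odometry edges and a loop-closure edge agree. Real systems work in SE(3) with robust kernels and covariance weighting; this sketch, including its edge values, is illustrative only.

```python
# Hedged sketch: least-squares pose graph optimization over 2D poses.
import numpy as np
from scipy.optimize import least_squares

# constraints: (i, j, dx, dy, dtheta), the motion from node i to j in i's frame
edges = [(0, 1, 1.0, 0.0, 0.0), (1, 2, 1.0, 0.0, 1.57),
         (2, 3, 1.0, 0.0, 1.57), (3, 0, 1.0, 0.0, 1.57)]  # last edge: loop closure

def residuals(flat):
    poses = flat.reshape(-1, 3)
    res = [poses[0]]                       # gauge constraint: pin the first pose at the origin
    for i, j, dx, dy, dth in edges:
        xi, yi, thi = poses[i]
        c, s = np.cos(thi), np.sin(thi)
        pred = [xi + c * dx - s * dy, yi + s * dx + c * dy, thi + dth]
        res.append(poses[j] - pred)        # disagreement with the relative-pose constraint
    return np.concatenate(res)

x0 = np.zeros(12) + 0.1 * np.random.default_rng(0).normal(size=12)  # noisy initial guess
sol = least_squares(residuals, x0)
print(np.round(sol.x.reshape(-1, 3), 2))   # refined (x, y, heading) for the four nodes
```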
