MDPI - Publisher of Open Access Journals

13 pages, 4544 KB

Open Access Article

Anodic Catalytic Oxidation of Sulfamethoxazole: Efficiency and Mechanism on Co₃O₄ Nanowire Self-Assembled CoFe₂O₄ Nanosheet Heterojunction

by Han Cui, Qiwei Zhang and Shan Qiu

Catalysts 2025, 15(9), 854; https://doi.org/10.3390/catal15090854 - 4 Sep 2025

Abstract

By modulating the mass ratio of hydrothermal agents to cobalt/iron precursors, Co₃O₄ nanowires were successfully integrated into spinel-type Co/Fe@NF, forming a heterojunction anode for alkaline water electrolysis (AWE) hydrogen production. This Co₃O₄ nanowire-assembled CoFe₂O₄ nanosheet anode (Co/Fe(5:1)@NF) exhibits exceptional electrochemical oxygen evolution reaction (OER) performance, requiring an overpotential of only 221 mV to reach 10 mA cm⁻². Sulfamethoxazole (SMX) was employed as a model pollutant to investigate its role as a sacrificial agent at the anode; approximately 95% of the SMX was degraded, and the OER potential at 10 mA cm⁻² was reduced by 50 mV. SMX oxidation on the Co/Fe heterojunction structure thus partially substitutes for the OER. The Co/Fe heterojunction generates an internal magnetic field, which induces the formation of novel active species within the system. The superoxide radical (·O₂⁻) is the newly formed active oxygen species, and it increases the proportion of indirect SMX oxidation. Quantitative analysis reveals that superoxide radical-mediated indirect oxidation accounts for approximately 38.5% of SMX oxidation, Fe(VI) for 9.4%, other active species for 6.1%, and direct oxidation for 46.0%. The nanowire–nanosheet assembly stabilizes a high-spin configuration on the catalyst surface, redirecting oxygen intermediate pathways toward triplet oxygen (³O₂) generation. Subsequent electron transfer from nanowire tips facilitates rapid ³O₂ reduction, forming superoxide radicals (·O₂⁻). This study demonstrates SMX degradation effectively driven by indirect oxidation and coupled with cathodic hydrogen production, providing a novel strategy for utilizing renewable electricity and reducing the OER while offering insights into the design of Co/Fe-based catalysts. Full article

(This article belongs to the Section Electrocatalysis)

15 pages, 1690 KB

Open Access Article

OTB-YOLO: An Enhanced Lightweight YOLO Architecture for UAV-Based Maize Tassel Detection

by Yu Han, Xingya Wang, Luyan Niu, Song Shi, Yingbo Gao, Kuijie Gong, Xia Zhang and Jiye Zheng

Plants 2025, 14(17), 2701; https://doi.org/10.3390/plants14172701 - 29 Aug 2025

Viewed by 338

Abstract

To tackle the challenges posed by substantial variations in target scale, intricate background interference, and the likelihood of missing small targets in multi-temporal UAV maize tassel imagery, an optimized lightweight detection model derived from YOLOv11 is introduced, named OTB-YOLO. Here, “OTB” is an acronym derived from the initials of the model’s core improved modules: Omni-dimensional dynamic convolution (ODConv), Triplet Attention, and Bi-directional Feature Pyramid Network (BiFPN). This model integrates the PaddlePaddle open-source maize tassel recognition benchmark dataset with the public Multi-Temporal Drone Corn Dataset (MTDC). Traditional convolutional layers are substituted with omni-dimensional dynamic convolution (ODConv) to mitigate computational redundancy. A triplet attention module is incorporated to refine feature extraction within the backbone network, while a bidirectional feature pyramid network (BiFPN) is engineered to enhance accuracy via multi-level feature pyramids and bidirectional information flow. Empirical analysis demonstrates that the enhanced model achieves a precision of 95.6%, recall of 92.1%, and mAP@0.5 of 96.6%, marking improvements of 3.2%, 2.5%, and 3.1%, respectively, over the baseline model. Concurrently, the model’s computational complexity is reduced to 6.0 GFLOPs, rendering it appropriate for deployment on UAV edge computing platforms. Full article

(This article belongs to the Special Issue Application of Remote Sensing in Crop Production and Farmland Soil Monitoring)

27 pages, 2379 KB

Open Access Article

Dual-Branch EfficientNet Model with Hybrid Triplet Loss for Architectural Era Classification of Traditional Dwellings in Longzhong Region, Gansu Province

by Shangbo Miao, Yalin Miao, Chenxi Zhang and Yushun Piao

Buildings 2025, 15(17), 3086; https://doi.org/10.3390/buildings15173086 - 28 Aug 2025

Viewed by 326

Abstract

Traditional vernacular architecture is an important component of historical and cultural heritage, and the accurate identification of its construction period is of great significance for architectural heritage conservation, historical research, and urban–rural planning. However, traditional methods for period identification are labor-intensive, potentially damaging to buildings, and lack sufficient accuracy. To address these issues, this study proposes a deep learning-based method for classifying the construction periods of traditional vernacular architecture. A dataset of traditional vernacular architecture images from the Longzhong region of Gansu Province was constructed, covering four periods: before 1911, 1912–1949, 1950–1980, and from 1981 to the present, with a total of 1181 images. Through comparative analysis of three mainstream models—ResNet50, EfficientNet-b4, and Vision Transformer—we found that EfficientNet demonstrated optimal performance in the classification task, achieving Accuracy, Precision, Recall, and F1-scores of 85.1%, 81.6%, 81.0%, and 81.1%, respectively. These metrics surpassed ResNet50 by 1.4%, 1.3%, 0.5%, and 1.2%, and outperformed Vision Transformer by 8.1%, 9.1%, 9.5%, and 9.1%, respectively. To further improve feature extraction and classification accuracy, we propose the “local–global feature joint learning network architecture” (DualBranchEfficientNet). This dual-branch design, comprising a global feature branch and a local feature branch, effectively integrates global structure with local details and significantly enhances classification performance. The proposed architecture achieved Accuracy, Precision, Recall, and F1-scores of 89.6%, 87.7%, 86.0%, and 86.7%, respectively, with DualBranchEfficientNet exhibiting a 2.0% higher Accuracy than DualBranchResNet. 
To address sample imbalance, a hybrid triplet loss function (Focal Loss + Triplet Loss) was introduced, and its effectiveness in identifying minority class samples was validated through ablation experiments. Experimental results show that the DualBranchEfficientNet model with the hybrid triplet loss outperforms traditional models across all evaluation metrics, particularly in the data-scarce 1950–1980 period, where Recall increased by 7.3% and F1-score by 4.1%. Finally, interpretability analysis via Grad-CAM heat maps demonstrates that the DualBranchEfficientNet model incorporating the hybrid triplet loss accurately pinpoints the key discriminative regions of traditional dwellings across different eras, and that its focus closely aligns with the regions identified by conventional methods. This study provides an efficient, accurate, and scalable deep learning solution for the period identification of traditional vernacular architecture. Full article
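
The "Focal Loss + Triplet Loss" combination mentioned in this abstract can be illustrated with a minimal per-sample sketch. The abstract does not say how the two terms are weighted, so the weight `lam`, the focusing parameter `gamma`, the `margin`, and the use of Euclidean distance are all assumptions made here for illustration, not the authors' settings.

```python
import math

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample; probs is a list of class probabilities."""
    p_t = probs[target]
    # (1 - p_t)^gamma down-weights samples the classifier already gets right.
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def hybrid_loss(probs, target, anchor, positive, negative,
                lam=0.5, gamma=2.0, margin=1.0):
    """Assumed combination: focal term plus lam times the triplet term."""
    return (focal_loss(probs, target, gamma)
            + lam * triplet_loss(anchor, positive, negative, margin))
```

In this sketch the focal term handles class imbalance at the classifier output, while the triplet term acts on the embeddings by pulling same-period images together and pushing different-period images apart.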

(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

26 pages, 7962 KB

Open Access Article

IntegraPSG: Integrating LLM Guidance with Multimodal Feature Fusion for Single-Stage Panoptic Scene Graph Generation

by Yishuang Zhao, Qiang Zhang, Xueying Sun and Guanchen Liu

Electronics 2025, 14(17), 3428; https://doi.org/10.3390/electronics14173428 - 28 Aug 2025

Viewed by 398

Abstract

Panoptic scene graph generation (PSG) aims to simultaneously segment both foreground objects and background regions while predicting object relations for fine-grained scene modeling. Despite significant progress in panoptic scene understanding, current PSG methods face two challenging problems: relation prediction often relies only on visual representations, and it is hindered by imbalanced relation category distributions. Accordingly, we propose IntegraPSG, a single-stage framework that integrates large language model (LLM) guidance with multimodal feature fusion. IntegraPSG introduces a multimodal sparse relation prediction network that efficiently integrates visual, linguistic, and depth cues to identify the subject–object pairs most likely to form relations, filtering dense candidates into sparse, effective pairs. To alleviate the long-tail distribution problem of relations, we design a language-guided multimodal relation decoder in which an LLM generates language descriptions for relation triplets, which are cross-modally attended with visual pair features. This design enables more accurate relation predictions for sparse subject–object pairs and effectively improves discriminative capability for rare relations. Experimental results show that IntegraPSG achieves steady and strong performance on the PSG dataset, with R@100, mR@100, and mean reaching 38.7%, 28.6%, and 30.0%, respectively, supporting the validity of the proposed method. Full article

19 pages, 9845 KB

Open Access Article

TriQuery: A Query-Based Model for Surgical Triplet Recognition

by Mengrui Yao, Wenjie Zhang, Lin Wang, Zhongwei Zhao and Xiao Jia

Sensors 2025, 25(17), 5306; https://doi.org/10.3390/s25175306 - 26 Aug 2025

Viewed by 510

Abstract

Artificial intelligence has shown great promise in advancing intelligent surgical systems. Among its applications, surgical video action recognition plays a critical role in enabling accurate intraoperative understanding and decision support. However, the task remains challenging due to the temporal continuity of surgical scenes and the long-tailed, semantically entangled distribution of action triplets composed of instruments, verbs, and targets. To address these issues, we propose TriQuery, a query-based model for surgical triplet recognition and classification. Built on a multi-task Transformer framework, TriQuery decomposes the complex triplet task into three semantically aligned subtasks using task-specific query tokens, which are processed through specialized attention mechanisms. We introduce a Multi-Query Decoding Head (MQ-DH) to jointly model structured subtasks and a Top-K Guided Query Update (TKQ) module to incorporate inter-frame temporal cues. Experiments on the CholecT45 dataset demonstrate that TriQuery achieves improved overall performance over existing baselines across multiple classification tasks. Attention visualizations further show that task queries consistently attend to semantically relevant spatial regions, enhancing model interpretability. These results highlight the effectiveness of TriQuery for advancing surgical video understanding in clinical environments. Full article

(This article belongs to the Section Intelligent Sensors)

28 pages, 4317 KB

Open Access Article

Multi-Scale Attention Networks with Feature Refinement for Medical Item Classification in Intelligent Healthcare Systems

by Waqar Riaz, Asif Ullah and Jiancheng (Charles) Ji

Sensors 2025, 25(17), 5305; https://doi.org/10.3390/s25175305 - 26 Aug 2025

Viewed by 535

Abstract

The increasing adoption of artificial intelligence (AI) in intelligent healthcare systems has elevated the demand for robust medical imaging and vision-based inventory solutions. For an intelligent healthcare inventory system, accurate recognition and classification of medical items, including medicines and emergency supplies, are crucial for ensuring inventory integrity and timely access to life-saving resources. This study presents a hybrid deep learning framework, EfficientDet-BiFormer-ResNet, that integrates three specialized components: EfficientDet’s Bidirectional Feature Pyramid Network (BiFPN) for scalable multi-scale object detection, BiFormer’s bi-level routing attention for context-aware spatial refinement, and ResNet-18 enhanced with triplet loss and Online Hard Negative Mining (OHNM) for fine-grained classification. The model was trained and validated on a custom healthcare inventory dataset comprising over 5000 images collected under diverse lighting, occlusion, and arrangement conditions. Quantitative evaluations demonstrated that the proposed system achieved a mean average precision (mAP@0.5:0.95) of 83.2% and a top-1 classification accuracy of 94.7%, outperforming conventional models such as YOLO, SSD, and Mask R-CNN. The framework excelled in recognizing visually similar, occluded, and small-scale medical items. This work advances real-time medical item detection in healthcare by providing an AI-enabled, clinically relevant vision system for medical inventory management. Full article
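
As a rough illustration of the hard negative mining idea named in this abstract, the sketch below picks, for a given anchor in a batch, the closest embedding that carries a different label. The function names, the Euclidean distance, and the batch layout are assumptions for illustration; the abstract does not describe the authors' mining procedure in detail.

```python
import math

def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hardest_negative(anchor_idx, embeddings, labels):
    """Return the index of the closest sample whose label differs from the anchor's.

    Returns None when the batch contains no sample of another class.
    """
    anchor = embeddings[anchor_idx]
    candidates = [i for i, lab in enumerate(labels) if lab != labels[anchor_idx]]
    if not candidates:
        return None
    return min(candidates, key=lambda i: distance(anchor, embeddings[i]))
```

The selected negative would then be paired with the anchor and a same-class positive in a triplet loss, so training concentrates on the items that are easiest to confuse.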

(This article belongs to the Section Intelligent Sensors)

17 pages, 3553 KB

Open Access Article

Comparative Evaluation of Computational Methods for Validating Housekeeping Gene RT-qPCR Data in 3T3-L1 Cells

by Zhenya Ivanova, Natalia Grigorova, Valeria Petrova, Ekaterina Vachkova and Georgi Beev

Biomedicines 2025, 13(8), 2036; https://doi.org/10.3390/biomedicines13082036 - 21 Aug 2025

Viewed by 506

Abstract

Background: Postbiotics with anti-adipogenic properties can significantly modify adipocyte metabolism by influencing key cellular pathways involved in lipid accumulation. In preliminary in vitro studies, it is essential to monitor various cellular and subcellular variables, including gene expression and protein synthesis potential, through RT-qPCR analysis. It is also crucial to select internal controls carefully and evaluate their stability for effective normalization and accurate interpretation of the results. Methods: In this study, we assessed the stability of six commonly used housekeeping genes: GAPDH, Actb, HPRT, HMBS, 18S, and 36B4. We analyzed their variability in mature 3T3-L1 adipocytes treated with supernatants from newly isolated Lacticaseibacillus paracasei strains. Our analysis combined classical statistical methods, a ∆Ct analysis, and software algorithms such as geNorm, NormFinder, BestKeeper, and RefFinder. Results: Our stepwise, multiparameter strategy for selecting reference genes led to the exclusion of Actb and 18S as the most variable reference genes. We identified HPRT as the most stable internal control. Additionally, HPRT and HMBS emerged as a stable pair, while the recommended triplet of genes for reliable normalization consists of HPRT, 36B4, and HMBS. Conclusions: The widely used putative genes in similar studies—GAPDH and Actb—did not confirm their presumed stability, which once again emphasizes the need for experimental validation of internal controls to increase the accuracy and reliability of gene expression. Combining a unique biological model—postbiotic-treated adipocytes—with multiple algorithms integrated into a single workflow allows us to provide a methodological template applicable to similar nutritional and metabolic research settings. Full article

(This article belongs to the Section Molecular Genetics and Genetic Diseases)

23 pages, 1291 KB

Open Access Article

(Oxidopyridyl)Porphyrins of Different Lipophilicity: Photophysical Properties, ROS Production and Phototoxicity on Melanoma Cells Under CoCl₂-Induced Hypoxia

by Martina Mušković, Martin Lončarić, Ivana Ratkaj and Nela Malatesti

Antioxidants 2025, 14(8), 992; https://doi.org/10.3390/antiox14080992 - 13 Aug 2025

Viewed by 432

Abstract

One of the main limitations of photodynamic therapy (PDT) is hypoxia, which is caused by increased tumour proliferation creating a hypoxic tumour microenvironment (TME), as well as oxygen consumption by PDT. Hypoxia-activated prodrugs (HAPs), such as molecules containing aliphatic or aromatic N-oxide functionalities, are non-toxic prodrugs that are activated in hypoxic regions, where they are reduced into their cytotoxic form. The (oxido)pyridylporphyrins tested in this work were synthesised as potential HAPs from their AB₃ pyridylporphyrin precursors, using m-chloroperbenzoic acid (m-CPBA) as an oxidising reagent. Their ground-state and excited-state spectroscopic properties, singlet oxygen (¹O₂) production by the photodegradation of 1,3-diphenylisobenzofurane (DPBF) and theoretical lipophilicity were determined. In vitro analyses included cellular uptake, localisation and (photo)cytotoxicity under normoxia and CoCl₂-induced hypoxia. The CoCl₂ hypoxia model was used to reveal their properties, as related to HIF-1 activation and HIF-1α accumulation. (Oxido)pyridylporphyrins showed promising properties, such as the long lifetime of the excited triplet state, a high quantum yield of intersystem crossing, and high production of ROS/¹O₂. Lower cellular uptake resulted in an overall lower phototoxicity of these N-oxide porphyrins in comparison to their N-methylated analogues, and both porphyrin series were less active on CoCl₂-treated cells. (Oxido)pyridylporphyrins showed higher selectivity for pigmented melanoma cells, and the antioxidant activity of melanin pigment seemed to have a lower impact on their PDT activity compared to their N-methylated analogues in both CoCl₂-induced hypoxia and normoxia. Their potential HAP activity will be evaluated under conditions of reduced oxygen concentration in our future studies. Full article

(This article belongs to the Section ROS, RNS and RSS)

24 pages, 16454 KB

Open Access Article

Enhanced Wavelet-Convolution and Few-Shot Prototype-Driven Framework for Incremental Identification of Holstein Cattle

by Weijun Duan, Fang Wang, Honghui Li, Buyu Wang, Yuan Wang and Xueliang Fu

Sensors 2025, 25(16), 4910; https://doi.org/10.3390/s25164910 - 8 Aug 2025

Viewed by 357

Abstract

Individual identification of Holstein cattle is crucial for the intelligent management of farms. The existing closed-set identification models are inadequate for breeding scenarios where new individuals continually join, and they are highly sensitive to obstructions and alterations in the cattle’s appearance, such as back defacement. The current open-set identification methods exhibit low discriminatory stability for new individuals. These limitations significantly hinder the application and promotion of the model. To address these challenges, this paper proposes a prototype network-based incremental identification framework for Holstein cattle to achieve stable identification of new individuals under small sample conditions. Firstly, we design a feature extraction network, ResWTA, which integrates wavelet convolution with a spatial attention mechanism. This design enhances the model’s response to low-level features by adjusting the convolutional receptive field, thereby improving its feature extraction capabilities. Secondly, we construct a few-shot augmented prototype network to bolster the framework’s robustness for incremental identification. Lastly, we systematically evaluate the effects of various loss functions, prototype computation methods, and distance metrics on identification performance. The experimental results indicate that utilizing ResWTA as the feature extraction network achieves a top-1 accuracy of 97.43% and a top-5 accuracy of 99.54%. Furthermore, introducing the few-shot augmented prototype network enhances the top-1 accuracy by 4.77%. When combined with the Triplet loss function and the Manhattan distance metric, the identification accuracy of the framework can reach up to 94.33%. Notably, this combination reduces the incremental learning forgetfulness by 4.89% compared to the baseline model, while improving the average incremental accuracy by 2.4%. 
The proposed method not only facilitates incremental identification of Holstein cattle but also significantly bolsters the robustness of the identification process, thereby providing effective technical support for intelligent farm management. Full article
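
A prototype network with the Manhattan distance, as named in this abstract, can be sketched as nearest-prototype classification. Everything here is an illustrative assumption: the prototype is taken as the plain mean of a few support embeddings, and the identifiers (`cow_1` and so on) are hypothetical. The authors' few-shot augmentation and the ResWTA feature extractor are not reproduced.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def manhattan(a, b):
    """Manhattan (L1) distance between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def build_prototypes(support):
    """Map each individual's ID to the mean of its support embeddings."""
    return {cid: mean_vector(vecs) for cid, vecs in support.items()}

def classify(query, prototypes):
    """Return the ID whose prototype is nearest to the query embedding."""
    return min(prototypes, key=lambda cid: manhattan(query, prototypes[cid]))

# Enrolling a new individual only adds one prototype; nothing is retrained.
prototypes = build_prototypes({
    "cow_1": [[0.0, 0.0], [2.0, 0.0]],
    "cow_2": [[10.0, 10.0], [12.0, 10.0]],
})
prototypes["cow_3"] = mean_vector([[5.0, 5.0]])
```

This design is what makes the approach incremental: a new animal is registered from a handful of images by computing its prototype, while existing prototypes stay unchanged.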

(This article belongs to the Special Issue Sensor and AI Technologies in Intelligent Agriculture: 2nd Edition)

18 pages, 640 KB

Open Access Article

Fine-Tuning Methods and Dataset Structures for Multilingual Neural Machine Translation: A Kazakh–English–Russian Case Study in the IT Domain

by Zhanibek Kozhirbayev and Zhandos Yessenbayev

Electronics 2025, 14(15), 3126; https://doi.org/10.3390/electronics14153126 - 6 Aug 2025

Viewed by 458

Abstract

This study explores fine-tuning methods and dataset structures for multilingual neural machine translation using the No Language Left Behind model, with a case study on Kazakh, English, and Russian. We compare single-stage and two-stage fine-tuning approaches, as well as triplet versus non-triplet dataset configurations, to improve translation quality. A high-quality, 50,000-triplet dataset in the information technology domain, manually translated and expert-validated, serves as the in-domain benchmark, complemented by out-of-domain corpora such as KazParC. Evaluations using BLEU, chrF, METEOR, and TER metrics reveal that single-stage fine-tuning excels for low-resource pairs (e.g., 0.48 BLEU, 0.77 chrF for Kazakh → Russian), while two-stage fine-tuning benefits high-resource pairs (Russian → English). Triplet datasets improve cross-linguistic consistency compared with non-triplet structures. Our reproducible framework offers practical guidance for adapting neural machine translation to technical domains and low-resource languages. Full article

(This article belongs to the Special Issue Natural Language Processing Based on Neural Networks and Large Language Models)

15 pages, 1241 KB

Open Access Article

Triplet Spatial Reconstruction Attention-Based Lightweight Ship Component Detection for Intelligent Manufacturing

by Bocheng Feng, Zhenqiu Yao and Chuanpu Feng

Appl. Sci. 2025, 15(15), 8676; https://doi.org/10.3390/app15158676 - 5 Aug 2025

Viewed by 301

Abstract

Automatic component recognition plays a crucial role in intelligent ship manufacturing, but existing methods suffer from low recognition accuracy and high computational cost in industrial scenarios involving small samples, component stacking, and diverse categories. To address the requirements of shipbuilding industrial applications, a Triplet Spatial Reconstruction Attention (TSA) mechanism that combines threshold-based feature separation with triplet parallel processing is proposed, and a lightweight You Only Look Once Ship (YOLO-Ship) detection network is constructed. Unlike existing attention mechanisms that focus on either spatial reconstruction or channel attention independently, the proposed TSA integrates triplet parallel processing with spatial feature separation–reconstruction techniques to achieve enhanced target feature representation while significantly reducing parameter count and computational overhead. Experimental validation on a small-scale actual ship component dataset demonstrates that the improved network achieves 88.7% mean Average Precision (mAP), 84.2% precision, and 87.1% recall, representing improvements of 3.5%, 2.2%, and 3.8%, respectively, over the original YOLOv8n algorithm. The network requires only 2.6 M parameters and a computational cost of 7.5 giga floating-point operations (GFLOPs), achieving a good balance between detection accuracy and lightweight model design. Future research directions include developing adaptive threshold learning mechanisms for varying industrial conditions and integrating surface defect detection capabilities to enhance comprehensive quality control in intelligent manufacturing systems. Full article

(This article belongs to the Special Issue Artificial Intelligence on the Edge for Industry 4.0)

21 pages, 9010 KB

Open Access Article

Dual-Branch Deep Learning with Dynamic Stage Detection for CT Tube Life Prediction

by Zhu Chen, Yuedan Liu, Zhibin Qin, Haojie Li, Siyuan Xie, Litian Fan, Qilin Liu and Jin Huang

Sensors 2025, 25(15), 4790; https://doi.org/10.3390/s25154790 - 4 Aug 2025

Viewed by 433

Abstract

CT scanners are essential tools in modern medical imaging. Sudden failures of their X-ray tubes can lead to equipment downtime, affecting healthcare services and patient diagnosis. However, existing prediction methods based on a single model struggle to adapt to the multi-stage variation characteristics of tube lifespan and have limited modeling capabilities for temporal features. To address these issues, this paper proposes an intelligent prediction architecture for the remaining useful life of CT tubes based on a dual-branch neural network. This architecture consists of two specialized branches: a residual self-attention BiLSTM (RSA-BiLSTM) and a multi-layer dilation temporal convolutional network (D-TCN). The RSA-BiLSTM branch extracts multi-scale features and also enhances the long-term dependency modeling capability for temporal data. The D-TCN branch captures multi-scale temporal features through multi-layer dilated convolutions, effectively handling non-linear changes in the degradation phase. Furthermore, a dynamic phase detector is applied to integrate the prediction results from both branches. In terms of optimization strategy, a dynamically weighted triplet mixed loss function is designed to adjust the weight ratios of different prediction tasks, effectively solving the problems of sample imbalance and uneven prediction accuracy. Experimental results using leave-one-out cross-validation (LOOCV) on six different CT tube datasets show that the proposed method achieved significant advantages over five comparison models, with an average MSE of 2.92, MAE of 0.46, and R² of 0.77. The LOOCV strategy ensures robust evaluation by testing each tube dataset independently while training on the remaining five, providing reliable generalization assessment across different CT equipment. Ablation experiments further confirmed that the collaborative design of multiple components is significant for improving the accuracy of remaining-life prediction for X-ray tubes. Full article
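
The leave-one-out protocol described in this abstract (test on one tube dataset, train on the remaining five) can be sketched as a simple split generator. The dataset identifiers below are hypothetical placeholders, and model training and scoring are deliberately left out.

```python
def leave_one_out_splits(dataset_ids):
    """Yield (training_ids, held_out_id), holding out each dataset exactly once."""
    for i, held_out in enumerate(dataset_ids):
        training = dataset_ids[:i] + dataset_ids[i + 1:]
        yield training, held_out

# Hypothetical identifiers for the six tube datasets.
tubes = ["tube_1", "tube_2", "tube_3", "tube_4", "tube_5", "tube_6"]
splits = list(leave_one_out_splits(tubes))
```

Each of the six folds would produce one set of MSE, MAE, and R² values on its held-out tube, and averaging over the folds gives summary figures of the kind the abstract reports.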

(This article belongs to the Special Issue Intelligent Sensors for Condition Monitoring, Diagnosis, and Prognostics)

24 pages, 3121 KB

Open Access Article

SG-RAG MOT: SubGraph Retrieval Augmented Generation with Merging and Ordering Triplets for Knowledge Graph Multi-Hop Question Answering

by Ahmmad O. M. Saleh, Gokhan Tur and Yucel Saygin

Mach. Learn. Knowl. Extr. 2025, 7(3), 74; https://doi.org/10.3390/make7030074 - 1 Aug 2025

Viewed by 864

Abstract

Large language models (LLMs) often tend to hallucinate, especially in domain-specific tasks and tasks that require reasoning. Previously, we introduced SubGraph Retrieval Augmented Generation (SG-RAG) as a novel Graph RAG method for multi-hop question answering. SG-RAG leverages Cypher queries to search a given knowledge graph and retrieve the subgraph necessary to answer the question. The results from our previous work showed the higher performance of our method compared to the traditional Retrieval Augmented Generation (RAG). In this work, we further enhanced SG-RAG by proposing an additional step called Merging and Ordering Triplets (MOT). The new MOT step seeks to decrease the redundancy in the retrieved triplets by applying hierarchical merging to the retrieved subgraphs. Moreover, it provides an ordering among the triplets using the Breadth-First Search (BFS) traversal algorithm. We conducted experiments on the MetaQA benchmark, which was proposed for multi-hop question-answering in the movies domain. Our experiments showed that SG-RAG MOT provided more accurate answers than Chain-of-Thought and Graph Chain-of-Thought. We also found that merging (up to a certain point) highly overlapping subgraphs and defining an order among the triplets helped the LLM to generate more precise answers. Full article
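
The ordering idea in this abstract can be sketched as a breadth-first traversal over retrieved (head, relation, tail) triplets. This is a simplified stand-in: exact-duplicate removal replaces the hierarchical subgraph merging the abstract mentions, edges are assumed to be directed from head to tail, and the entity and relation names are hypothetical examples, not taken from MetaQA.

```python
from collections import deque

def order_triplets_bfs(triplets, start):
    """Drop exact duplicate triplets, then order the rest by BFS from `start`."""
    unique = list(dict.fromkeys(triplets))  # keeps first-seen order
    by_head = {}
    for triplet in unique:
        by_head.setdefault(triplet[0], []).append(triplet)

    ordered = []
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for triplet in by_head.get(node, []):
            ordered.append(triplet)
            tail = triplet[2]
            if tail not in visited:
                visited.add(tail)
                queue.append(tail)
    return ordered

retrieved = [
    ("Inception", "directed_by", "Nolan"),
    ("Nolan", "directed", "Dunkirk"),
    ("Inception", "directed_by", "Nolan"),  # duplicate from an overlapping subgraph
    ("Inception", "starred", "DiCaprio"),
    ("Dunkirk", "release_year", "2017"),
]
ordered = order_triplets_bfs(retrieved, "Inception")
```

Ordered this way, every triplet at hop k appears before any triplet at hop k + 1, so the context handed to the LLM follows the reasoning path outward from the question entity.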

(This article belongs to the Special Issue Knowledge Graphs and Large Language Models)
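
The merge-then-order idea in the SG-RAG MOT abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract mentions hierarchical merging, whereas the hypothetical `merge_subgraphs` below is a plain duplicate-dropping union, and the hypothetical `bfs_order` assumes triplets are `(head, relation, tail)` tuples traversed as undirected edges from a chosen root entity. All names and the sample entities are assumptions made for illustration.

```python
from collections import deque


def merge_subgraphs(subgraphs):
    """Union the triplets of several subgraphs, dropping exact duplicates.

    Simplification: a flat union, not the hierarchical merging named in the abstract.
    """
    seen = set()
    merged = []
    for subgraph in subgraphs:
        for triplet in subgraph:
            if triplet not in seen:
                seen.add(triplet)
                merged.append(triplet)
    return merged


def bfs_order(triplets, root):
    """Return the triplets reachable from `root`, in breadth-first order.

    Edges are treated as undirected for traversal (an assumption).
    """
    adjacency = {}
    for index, (head, _relation, tail) in enumerate(triplets):
        adjacency.setdefault(head, []).append((index, tail))
        adjacency.setdefault(tail, []).append((index, head))

    ordered = []
    used = set()          # indices of triplets already emitted
    visited = {root}      # entities already queued
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for index, neighbour in adjacency.get(node, []):
            if index not in used:
                used.add(index)
                ordered.append(triplets[index])
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return ordered


# Hypothetical subgraphs sharing one triplet.
subgraph_1 = [("director_x", "directed", "movie_c"), ("movie_a", "directed_by", "director_x")]
subgraph_2 = [("director_x", "directed", "movie_b"), ("movie_a", "directed_by", "director_x")]

merged = merge_subgraphs([subgraph_1, subgraph_2])
ordered = bfs_order(merged, root="movie_a")
```

With `movie_a` as the root, the triplet touching the root comes first and the second-hop triplets follow, which is the kind of ordering the abstract reports as helpful to the LLM.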

30 pages, 37977 KB

Open AccessArticle

Text-Guided Visual Representation Optimization for Sensor-Acquired Video Temporal Grounding

by Yun Tian, Xiaobo Guo, Jinsong Wang and Xinyue Liang

Sensors 2025, 25(15), 4704; https://doi.org/10.3390/s25154704 - 30 Jul 2025

Viewed by 478

Abstract

Video temporal grounding (VTG) aims to localize a semantically relevant temporal segment within an untrimmed video based on a natural language query. The task continues to face challenges arising from cross-modal semantic misalignment, which is largely attributed to redundant visual content in sensor-acquired video streams, linguistic ambiguity, and discrepancies in modality-specific representations. Most existing approaches rely on intra-modal feature modeling, processing video and text independently throughout the representation learning stage. However, this isolation undermines semantic alignment by neglecting the potential of cross-modal interactions. In practice, a natural language query typically corresponds to spatiotemporal content in video signals collected through camera-based sensing systems, encompassing a particular sequence of frames and its associated salient subregions. We propose a text-guided visual representation optimization framework tailored to enhance semantic interpretation over video signals captured by visual sensors. This framework leverages textual information to focus on spatiotemporal video content, thereby narrowing the cross-modal gap. Built upon the unified cross-modal embedding space provided by CLIP, our model leverages video data from sensing devices to structure representations and introduces two dedicated modules to semantically refine visual representations across spatial and temporal dimensions. First, we design a Spatial Visual Representation Optimization (SVRO) module to learn spatial information within frames. It selects salient patches related to the text, capturing more fine-grained visual details. Second, we introduce a Temporal Visual Representation Optimization (TVRO) module to learn temporal relations across frames. A temporal triplet loss is employed in TVRO to enhance attention on text-relevant frames and capture clip semantics. Additionally, a self-supervised contrastive loss is introduced at the clip–text level to improve inter-clip discrimination by maximizing semantic variance during training. Experiments on the widely used benchmark datasets Charades-STA, ActivityNet Captions, and TACoS demonstrate that our method outperforms state-of-the-art methods across multiple metrics.

(This article belongs to the Section Sensing and Imaging)
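
The abstract above does not define its temporal triplet loss, so the following is only a sketch of the standard margin-based triplet loss it presumably builds on. The assumptions: the text query embedding is the anchor, a text-relevant frame is the positive, an irrelevant frame is the negative, and distance is cosine distance. The function names, the margin value, and the toy vectors are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(
        0.0,
        cosine_distance(anchor, positive) - cosine_distance(anchor, negative) + margin,
    )


# Toy 2-D embeddings (illustrative only).
text_query = [1.0, 0.0]
relevant_frame = [1.0, 0.0]
irrelevant_frame = [0.0, 1.0]

loss_well_separated = triplet_loss(text_query, relevant_frame, irrelevant_frame)
loss_swapped = triplet_loss(text_query, irrelevant_frame, relevant_frame)
```

The loss is zero once the relevant frame is closer to the query than the irrelevant one by at least the margin, and positive otherwise, which is what pushes attention toward text-relevant frames.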

26 pages, 16392 KB

Open AccessArticle

TOSD: A Hierarchical Object-Centric Descriptor Integrating Shape, Color, and Topology

by Jun-Hyeon Choi, Jeong-Won Pyo, Ye-Chan An and Tae-Yong Kuc

Sensors 2025, 25(15), 4614; https://doi.org/10.3390/s25154614 - 25 Jul 2025

Viewed by 509

Abstract

This paper introduces a hierarchical object-centric descriptor framework called TOSD (Triplet Object-Centric Semantic Descriptor). The goal of this method is to overcome the limitations of existing pixel-based and global feature embedding approaches. To this end, the framework adopts a hierarchical representation that is explicitly designed for multi-level reasoning. TOSD combines shape, color, and topological information without depending on predefined class labels. The shape descriptor captures the geometric configuration of each object. The color descriptor focuses on internal appearance by extracting normalized color features. The topology descriptor models the spatial and semantic relationships between objects in a scene. These components are integrated at both object and scene levels to produce compact and consistent embeddings. The resulting representation covers three levels of abstraction: low-level pixel details, mid-level object features, and high-level semantic structure. This hierarchical organization makes it possible to represent both local cues and global context in a unified form. We evaluate the proposed method on multiple vision tasks. The results show that TOSD performs competitively compared to baseline methods, while maintaining robustness in challenging cases such as occlusion and viewpoint changes. The framework is applicable to visual odometry, SLAM, object tracking, global localization, scene clustering, and image retrieval. In addition, this work extends our previous research on the Semantic Modeling Framework, which represents environments using layered structures of places, objects, and their ontological relations.

(This article belongs to the Special Issue Event-Driven Vision Sensor Architectures and Application Scenarios)

Search Results (438)
