Search Results (3,033)

Search Parameters:
Keywords = visual transformers

21 pages, 21564 KiB  
Article
Remote Visualization and Optimization of Fluid Dynamics Using Mixed Reality
by Sakshi Sandeep More, Brandon Antron, David Paeres and Guillermo Araya
Appl. Sci. 2025, 15(16), 9017; https://doi.org/10.3390/app15169017 - 15 Aug 2025
Abstract
This study presents an innovative pipeline for processing, compressing, and remotely visualizing large-scale numerical simulations of fluid dynamics in a virtual wind tunnel (VWT), leveraging virtual and augmented reality (VR/AR) for enhanced analysis and high-end visualization. The workflow addresses the challenges of handling massive databases generated using Direct Numerical Simulation (DNS) while maintaining visual fidelity and ensuring efficient rendering for user interaction. Fully immersive visualization of supersonic (Mach number 2.86) spatially developing turbulent boundary layers (SDTBLs) over strong concave and convex curvatures was achieved. The comprehensive DNS data provides insights on the transport phenomena inside turbulent boundary layers under strong deceleration or an Adverse Pressure Gradient (APG) caused by concave walls as well as strong acceleration or a Favorable Pressure Gradient (FPG) caused by convex walls under different wall thermal conditions (i.e., Cold, Adiabatic, and Hot walls). The process begins with a .vts file input from a DNS, which is visualized using ParaView software. These visualizations, representing different fluid behaviors based on a DNS with a high spatial/temporal resolution and employing millions of “numerical sensors”, are treated as individual time frames and exported in GL Transmission Format (GLTF), which is a widely used open-source file format designed for efficient transmission and loading of 3D scenes. To support the workflow, optimized Extract–Transform–Load (ETL) techniques were implemented for high-throughput data handling. Conversion of exported Graphics Library Transmission Format (GLTF) files into Graphics Library Transmission Format Binary files (typically referred to as GLB) reduced the storage by 25% and improved the load latency by 60%. This research uses Unity’s Profile Analyzer and Memory Profiler to identify performance limitations during contour rendering, focusing on the GPU and CPU efficiency. Further, immersive VR/AR analytics are achieved by connecting the processed outputs to Unity engine software and Microsoft HoloLens Gen 2 via Azure Remote Rendering cloud services, enabling real-time exploration of fluid behavior in mixed-reality environments. This pipeline constitutes a significant advancement in the scientific visualization of fluid dynamics, particularly when applied to datasets comprising hundreds of high-resolution frames. Moreover, the methodologies and insights gleaned from this approach are highly transferable, offering potential applications across various other scientific and engineering disciplines. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

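The GLTF-to-GLB packaging step described above is easy to reproduce outside the authors' pipeline. Below is a minimal sketch, assuming the pygltflib Python package and a frames/ directory of exported .gltf files (both assumptions; the paper's own ETL tooling is not published in this listing):

```python
# Hedged sketch: fold each exported .gltf (JSON + external buffers) into a
# single binary .glb, the packaging the abstract credits with ~25% smaller
# storage and faster loads. Paths and package choice are assumptions.
from pathlib import Path

from pygltflib import GLTF2, BufferFormat

def gltf_to_glb(src: Path, dst: Path) -> None:
    gltf = GLTF2().load(str(src))
    # Embed external .bin buffers as one binary blob before saving as GLB.
    gltf.convert_buffers(BufferFormat.BINARYBLOB)
    gltf.save_binary(str(dst))

for frame in sorted(Path("frames").glob("*.gltf")):  # one file per time frame
    gltf_to_glb(frame, frame.with_suffix(".glb"))
```
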
16 pages, 7955 KiB  
Article
Development and Validation of a Computer Vision Dataset for Object Detection and Instance Segmentation in Earthwork Construction Sites
by JongHo Na, JaeKang Lee, HyuSoung Shin and IlDong Yun
Appl. Sci. 2025, 15(16), 9000; https://doi.org/10.3390/app15169000 - 14 Aug 2025
Abstract
Construction sites report the highest rate of industrial accidents, prompting the active development of smart safety management systems based on deep learning-based computer vision technology. To support the digital transformation of construction sites, securing site-specific datasets is essential. In this study, raw data were collected from an actual earthwork site. Key construction equipment and terrain objects primarily operated at the site were identified, and 89,766 images were processed to build a site-specific training dataset. This dataset includes annotated bounding boxes for object detection and polygon masks for instance segmentation. The performance of the dataset was validated using representative models—YOLO v7 for object detection and Mask R-CNN for instance segmentation. Quantitative metrics and visual assessments confirmed the validity and practical applicability of the dataset. The dataset used in this study has been made publicly available for use by researchers in related fields. This dataset is expected to serve as a foundational resource for advancing object detection applications in construction safety. Full article
(This article belongs to the Section Civil Engineering)

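For readers who want to probe a dataset like this one, instance-segmentation inference of the kind used for validation can be sketched with torchvision; the COCO-pretrained Mask R-CNN below is a stand-in, since the site-specific earthwork weights are not bundled here:

```python
# Sketch of instance-segmentation inference of the kind used to validate the
# dataset, with torchvision's COCO-pretrained Mask R-CNN standing in for the
# site-specific model; the image path and score cutoff are assumptions.
import torch
from torchvision.io import read_image
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import convert_image_dtype

model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

img = convert_image_dtype(read_image("site_frame.jpg"), torch.float)
with torch.no_grad():
    pred = model([img])[0]  # dict with "boxes", "labels", "scores", "masks"

keep = pred["scores"] > 0.5  # confidence threshold for a quick visual check
print(pred["boxes"][keep].shape, pred["labels"][keep])
```
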
32 pages, 6394 KiB  
Article
Neuro-Bridge-X: A Neuro-Symbolic Vision Transformer with Meta-XAI for Interpretable Leukemia Diagnosis from Peripheral Blood Smears
by Fares Jammal, Mohamed Dahab and Areej Y. Bayahya
Diagnostics 2025, 15(16), 2040; https://doi.org/10.3390/diagnostics15162040 - 14 Aug 2025
Abstract
Background/Objectives: Acute Lymphoblastic Leukemia (ALL) poses significant diagnostic challenges due to its ambiguous symptoms and the limitations of conventional methods like bone marrow biopsies and flow cytometry, which are invasive, costly, and time-intensive. Methods: This study introduces Neuro-Bridge-X, a novel neuro-symbolic hybrid model designed for automated, explainable ALL diagnosis using peripheral blood smear (PBS) images. Leveraging two comprehensive datasets, ALL Image (3256 images from 89 patients) and C-NMC (15,135 images from 118 patients), the model integrates deep morphological feature extraction, vision transformer-based contextual encoding, fuzzy logic-inspired reasoning, and adaptive explainability. To address class imbalance, advanced data augmentation techniques were applied, ensuring equitable representation across benign and leukemic classes. The proposed framework was evaluated through 5-fold cross-validation and fixed train-test splits, employing Nadam, SGD, and Fractional RAdam optimizers. Results: Results demonstrate exceptional performance, with SGD achieving near-perfect accuracy (1.0000 on ALL, 0.9715 on C-NMC) and robust generalization, while Fractional RAdam closely followed (0.9975 on ALL, 0.9656 on C-NMC). Nadam, however, exhibited inconsistent convergence, particularly on C-NMC (0.5002 accuracy). A Meta-XAI controller enhances interpretability by dynamically selecting optimal explanation strategies (Grad-CAM, SHAP, Integrated Gradients, LIME), ensuring clinically relevant insights into model decisions. Conclusions: Visualizations confirm that SGD and RAdam models focus on morphologically critical features, such as leukocyte nuclei, while Nadam struggles with spurious attributions. Neuro-Bridge-X offers a scalable, interpretable solution for ALL diagnosis, with potential to enhance clinical workflows and diagnostic precision in oncology. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

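As a point of reference, the 5-fold cross-validation protocol mentioned above looks roughly like the following sketch; the random vectors stand in for image embeddings and a logistic regression stands in for Neuro-Bridge-X, since neither the model nor the PBS datasets are distributed with this listing:

```python
# Sketch of the 5-fold evaluation protocol on synthetic stand-ins (assumptions,
# not the paper's data or model).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 64))    # placeholder embeddings of PBS images
y = rng.integers(0, 2, size=400)  # placeholder labels: 0 = benign, 1 = leukemic

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (tr, te) in enumerate(skf.split(X, y)):
    clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    print(f"fold {fold}: accuracy = {clf.score(X[te], y[te]):.4f}")
```
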
25 pages, 15383 KiB  
Article
SplitGround: Long-Chain Reasoning Split via Modular Multi-Expert Collaboration for Training-Free Scene Knowledge-Guided Visual Grounding
by Xilong Qin, Yue Hu, Wansen Wu, Xinmeng Li and Quanjun Yin
Big Data Cogn. Comput. 2025, 9(8), 209; https://doi.org/10.3390/bdcc9080209 - 14 Aug 2025
Abstract
Scene Knowledge-guided Visual Grounding (SK-VG) is a multi-modal detection task built upon conventional visual grounding (VG) for human–computer interaction scenarios. It utilizes an additional passage of scene knowledge apart from the image and context-dependent textual query for referred object localization. Due to the inherent difficulty in directly establishing correlations between the given query and the image without leveraging scene knowledge, this task imposes significant demands on a multi-step knowledge reasoning process to achieve accurate grounding. Off-the-shelf VG models underperform under such a setting due to the requirement of detailed description in the query and a lack of knowledge inference based on implicit narratives of the visual scene. Recent Vision–Language Models (VLMs) exhibit improved cross-modal reasoning capabilities. However, their monolithic architectures, particularly in lightweight implementations, struggle to maintain coherent reasoning chains across sequential logical deductions, leading to error accumulation in knowledge integration and object localization. To address the above-mentioned challenges, we propose SplitGround—a collaborative framework that strategically decomposes complex reasoning processes by fusing the input query and image with knowledge through two auxiliary modules. Specifically, it implements an Agentic Annotation Workflow (AAW) for explicit image annotation and a Synonymous Conversion Mechanism (SCM) for semantic query transformation. This hierarchical decomposition enables VLMs to focus on essential reasoning steps while offloading auxiliary cognitive tasks to specialized modules, effectively splitting long reasoning chains into manageable subtasks with reduced complexity. Comprehensive evaluations on the SK-VG benchmark demonstrate the significant advancements of our method. Remarkably, SplitGround attains an accuracy improvement of 15.71% on the hard split of the test set over the previous training-required SOTA, using only a compact VLM backbone without fine-tuning, which provides new insights for knowledge-intensive visual grounding tasks. Full article

28 pages, 19126 KiB  
Article
Digital Geospatial Twinning for Revaluation of a Waterfront Urban Park Design (Case Study: Burgas City, Bulgaria)
by Stelian Dimitrov, Bilyana Borisova, Antoaneta Ivanova, Martin Iliev, Lidiya Semerdzhieva, Maya Ruseva and Zoya Stoyanova
Land 2025, 14(8), 1642; https://doi.org/10.3390/land14081642 - 14 Aug 2025
Abstract
Digital twins play a crucial role in linking data with practical solutions. They convert raw measurements into actionable insights, enabling spatial planning that addresses environmental challenges and meets the needs of local communities. This paper presents the development of a digital geospatial twin for a residential district in Burgas, the largest port city on Bulgaria’s southern Black Sea coast. The aim is to provide up-to-date geospatial data quickly and efficiently, and to merge available data into a single, accurate model. This model is used to test three scenarios for revitalizing coastal functions and improving a waterfront urban park in collaboration with stakeholders. The methodology combines aerial photogrammetry, ground-based mobile laser scanning (MLS), and airborne laser scanning (ALS), allowing for robust 3D modeling and terrain reconstruction across different land cover conditions. The current topography, areas at risk from geological hazards, and the vegetation structure with detailed attribute data for each tree are analyzed. These data are used to evaluate the strengths and limitations of the site concerning the desired functionality of the waterfront, considering urban priorities, community needs, and the necessity of addressing contemporary climate challenges. The carbon storage potential under various development scenarios is assessed. Through effective visualization and communication with residents and professional stakeholders, collaborative development processes have been facilitated through a series of workshops focused on coastal transformation. The results aim to support the design of climate-neutral urban solutions that mitigate natural risks without compromising the area’s essential functions, such as residential living and recreation. Full article

23 pages, 2744 KiB  
Article
CASF: Correlation-Alignment and Significance-Aware Fusion for Multimodal Named Entity Recognition
by Hui Li, Yunshi Tao, Huan Wang, Zhe Wang and Qingzheng Liu
Algorithms 2025, 18(8), 511; https://doi.org/10.3390/a18080511 - 14 Aug 2025
Abstract
As the content of social media platforms grows richer, Multimodal Named Entity Recognition (MNER) faces the dual challenges of fusing heterogeneous features and recognizing entities accurately. To address the key problems of inconsistently distributed textual and visual information, insufficient feature alignment, and noise-contaminated fusion, this paper proposes CASF-MNER, a multimodal named entity recognition model based on a dual-stream Transformer. The model applies cross-modal cross-attention to visual and textual features, building a bidirectional interaction mechanism between single-layer features that models higher-order semantic correlations and aligns the modalities; it constructs a saliency-aware perception mechanism based on multiscale pooling, whose entropy weighting over the global feature distribution adaptively suppresses redundant noise and strengthens key features; and it establishes a deep semantic fusion method based on a hybrid isomorphic model, whose progressive cross-modal interaction structure, combined with contrastive learning, achieves global fusion of the deep semantic space and optimizes representational consistency. Experimental results show that CASF-MNER achieves excellent performance on both the Twitter-2015 and Twitter-2017 public datasets, verifying the effectiveness of the proposed method. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

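The bidirectional cross-modal cross-attention at the heart of CASF-MNER can be illustrated with a short PyTorch sketch; the dimensions, head count, and module name below are assumptions, not the paper's exact design:

```python
# Illustrative sketch of bidirectional cross-modal cross-attention: text tokens
# attend over visual patches and vice versa. Sizes are assumed.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text: torch.Tensor, vision: torch.Tensor):
        # Text queries attend over visual patches; vision queries over tokens.
        text_enh, _ = self.t2v(query=text, key=vision, value=vision)
        vis_enh, _ = self.v2t(query=vision, key=text, value=text)
        return text_enh, vis_enh

txt = torch.randn(2, 32, 256)  # (batch, text tokens, dim)
img = torch.randn(2, 49, 256)  # (batch, 7x7 visual patches, dim)
t, v = CrossModalAttention()(txt, img)
print(t.shape, v.shape)
```
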
36 pages, 9430 KiB  
Article
Numerical Method for Internal Structure and Surface Evaluation in Coatings
by Tomas Kačinskas and Saulius Baskutis
Inventions 2025, 10(4), 71; https://doi.org/10.3390/inventions10040071 - 13 Aug 2025
Abstract
This study introduces a MATrix LABoratory (MATLAB, version R2024b, update 1 (24.2.0.2740171))-based automated system for the detection and measurement of indication areas in coated surfaces, enhancing the accuracy and efficiency of quality control processes in metal, polymeric and thermoplastic coatings. The developed code identifies various indication characteristics in the image and provides numerical results, assesses the size and quantity of indications and evaluates conformity to ISO standards. A comprehensive testing method, involving non-destructive penetrant testing (PT) and radiographic testing (RT), allowed for an in-depth analysis of surface and internal porosity across different coating methods, including aluminum-, copper-, polytetrafluoroethylene (PTFE)- and polyether ether ketone (PEEK)-based materials. Initial findings clearly indicated a non-homogeneous surface in the coatings obtained, which were manufactured using different technologies and materials. Whereas researchers using non-destructive testing (NDT) methods typically rely on visual inspection and manual counting, the system under study automates this process. Each sample image is loaded into MATLAB and analyzed using the Image Processing Toolbox, Computer Vision Toolbox, and Statistics and Machine Learning Toolbox. The custom code performs essential tasks such as image conversion, filtering, boundary detection, layering operations and calculations. These processes are integral to rendering images with developed indications according to NDT method requirements, providing a detailed visual and numerical representation of the analysis. RT also validated the observations made through surface indication detection, revealing either the absence of hidden defects or, conversely, internal porosity correlating with surface conditions. Matrix and graphical representations were used to facilitate the comparison of test results, highlighting more advanced methods and materials as the superior choice for achieving optimal mechanical and structural integrity. This research contributes to addressing challenges in surface quality assurance, advancing digital transformation in inspection processes and exploring more advanced alternatives to traditional coating technologies and materials. Full article
(This article belongs to the Section Inventions and Innovation in Advanced Manufacturing)

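The paper's implementation is MATLAB-based; a rough open-source analogue of its core steps (grayscale conversion, thresholding, region labeling, area measurement) can be sketched with scikit-image for readers without MATLAB. The file name, threshold polarity, and minimum-area cutoff below are assumptions:

```python
# Rough Python analogue of the MATLAB workflow: convert to grayscale, threshold,
# label connected indication regions, and report their areas.
from skimage import color, filters, io, measure

img = io.imread("coating_sample.png")        # assumed RGB photograph of the coating
gray = color.rgb2gray(img[..., :3])          # drop an alpha channel if present
mask = gray < filters.threshold_otsu(gray)   # assume indications are darker than the coating
labels = measure.label(mask)                 # connected-component labeling

for region in measure.regionprops(labels):
    if region.area >= 20:                    # ignore tiny noise blobs (assumed cutoff)
        print(f"indication at {region.centroid}: area = {region.area} px")
```
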
20 pages, 6570 KiB  
Article
Autonomous Vehicle Maneuvering Using Vision–LLM Models for Marine Surface Vehicles
by Tae-Yeon Kim and Woen-Sug Choi
J. Mar. Sci. Eng. 2025, 13(8), 1553; https://doi.org/10.3390/jmse13081553 - 13 Aug 2025
Abstract
Recent advances in vision–language models (VLMs) have transformed the field of robotics: researchers are combining the reasoning capabilities of large language models (LLMs) with the visual information processing capabilities of VLMs across many domains. However, most efforts have focused on terrestrial robots and transfer poorly to volatile environments such as ocean surfaces and underwater settings, where real-time judgment is required. We propose a system that integrates the cognition, decision making, path planning, and control of autonomous marine surface vehicles in the ROS2–Gazebo simulation environment, using a multimodal vision–LLM system with zero-shot prompting for real-time adaptability. Experiments were conducted to verify the effectiveness of the proposed system and to evaluate its performance with and without a path-planning step. Across 30 experiments, adding the path-planning mode increased the success rate from 23% to 73%, while the average distance traveled increased from 39 m to 45 m and the task-completion time from 483 s to 672 s; the final algorithm with the path-planning sub-process therefore trades some efficiency for a much higher success rate. We achieve real-time environmental adaptability through prompt engineering and the addition of a path-planning sub-process within a constrained structure in which the LLM state is re-initialized with every application programming interface call (zero-shot prompting). Additionally, the developed system is independent of the underlying vision–LLM, making it scalable and adaptable to future models. Full article
(This article belongs to the Special Issue Intelligent Measurement and Control System of Marine Robots)

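The zero-shot prompting structure described above, where the LLM keeps no state between calls, can be sketched as follows. This assumes an OpenAI-style vision chat API and a hypothetical four-action vocabulary; the paper's system is deliberately agnostic to the underlying model:

```python
# Hedged sketch of the per-step decision loop: each call sends only the current
# frame and a fixed instruction, so the model keeps no state between calls
# (zero-shot prompting). Model name and action set are illustrative assumptions.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def decide(frame_path: str) -> str:
    with open(frame_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any vision-capable chat model (assumption)
        messages=[{
            "role": "user",  # fresh, single-message history on every call
            "content": [
                {"type": "text",
                 "text": "You pilot a marine surface vehicle. From this view, "
                         "answer with exactly one of: FORWARD, LEFT, RIGHT, STOP."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()
```
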
26 pages, 4766 KiB  
Article
RetinoDeep: Leveraging Deep Learning Models for Advanced Retinopathy Diagnostics
by Sachin Kansal, Bajrangi Kumar Mishra, Saniya Sethi, Kanika Vinayak, Priya Kansal and Jyotindra Narayan
Sensors 2025, 25(16), 5019; https://doi.org/10.3390/s25165019 - 13 Aug 2025
Abstract
Diabetic retinopathy (DR), a leading cause of vision loss worldwide, poses a critical challenge to healthcare systems due to its silent progression and the reliance on labor-intensive, subjective manual screening by ophthalmologists, especially amid a global shortage of eye care specialists. Addressing the pressing need for scalable, objective, and interpretable diagnostic tools, this work introduces RetinoDeep—deep learning frameworks integrating hybrid architectures and explainable AI to enhance the automated detection and classification of DR across seven severity levels. Specifically, we propose four novel models: an EfficientNetB0 combined with an SPCL transformer for robust global feature extraction; a ResNet50 ensembled with Bi-LSTM to synergize spatial and sequential learning; a Bi-LSTM optimized through genetic algorithms for hyperparameter tuning; and a Bi-LSTM with SHAP explainability to enhance model transparency and clinical trustworthiness. The models were trained and evaluated on a curated dataset of 757 retinal fundus images, augmented to improve generalization, and benchmarked against state-of-the-art baselines (including EfficientNetB0, Hybrid Bi-LSTM with EfficientNetB0, Hybrid Bi-GRU with EfficientNetB0, ResNet with filter enhancements, Bi-LSTM optimized using Random Search Algorithm (RSA), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and a standard Convolutional Neural Network (CNN)), using metrics such as accuracy, F1-score, and precision. Notably, the Bi-LSTM with Particle Swarm Optimization (PSO) outperformed other configurations, achieving superior stability and generalization, while SHAP visualizations confirmed alignment between learned features and key retinal biomarkers, reinforcing the system’s interpretability. By combining cutting-edge neural architectures, advanced optimization, and explainable AI, this work sets a new standard for DR screening systems, promising not only improved diagnostic performance but also potential integration into real-world clinical workflows. Full article

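One of the hybrids listed above, EfficientNetB0 feeding a Bi-LSTM head, can be sketched in Keras as follows; the layer sizes and the patch-to-sequence reshape are assumptions rather than the paper's exact configuration:

```python
# Illustrative Keras sketch of an EfficientNetB0 + Bi-LSTM hybrid: backbone
# feature maps are flattened into a sequence of patches that a bidirectional
# LSTM then reads. Sizes are assumed, not the paper's.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB0

backbone = EfficientNetB0(include_top=False, input_shape=(224, 224, 3))
x = layers.Reshape((49, 1280))(backbone.output)  # 7x7 spatial grid -> 49-step sequence
x = layers.Bidirectional(layers.LSTM(128))(x)    # sequential pass over the patches
out = layers.Dense(7, activation="softmax")(x)   # seven DR severity levels

model = models.Model(backbone.input, out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```
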
22 pages, 10765 KiB  
Article
Exploring the Cognitive Reconstruction Mechanism of Generative AI in Outcome-Based Design Education: A Study on Load Optimization and Performance Impact Based on Dual-Path Teaching
by Qidi Dong, Jiaxi He, Nanxin Li, Binzhu Wang, Heng Lu and Yingyin Yang
Buildings 2025, 15(16), 2864; https://doi.org/10.3390/buildings15162864 - 13 Aug 2025
Abstract
Undergraduate design education faces a structural contradiction characterized by high cognitive load (CL) and relatively low innovation output. Meanwhile, existing generative AI tools predominantly emphasize the generation of visual outcomes, often overlooking the logical guidance mechanisms inherent in design thinking. This study proposes a Dual-Path teaching model integrating critical reconstruction behaviors to examine how AI enhances design thinking. It adopts structured interactions with the DeepSeek large language model, CL theory, and Structural Equation Modeling for analysis. Quantitative results indicate that AI-assisted paths significantly enhance design quality (72.43 vs. 65.60 in traditional paths). This improvement is attributed to a “direct effect + multiple mediators” model: specifically, AI reduced the mediating role of Extraneous Cognitive Load from 0.907 to 0.017, while simultaneously enhancing its investment in Germane Cognitive Load to support deep, innovative thinking. Theoretically, this study is among the first to integrate AI-driven critical reconstruction behaviors (e.g., iteration count, cross-domain terms) into CL theory, validating the “logical chain externalization → load optimization” mechanism in design education contexts. Practically, it provides actionable strategies for the digital transformation of design education, fostering interdisciplinary thinking and advancing a teaching paradigm where low-order cognition is outsourced to reinforce high-order creative thinking. Full article
(This article belongs to the Topic Architectural Education)

29 pages, 12262 KiB  
Article
3D Heritage Reconstruction Through HBIM and Multi-Source Data Fusion: Geometric Change Analysis Across Decades
by Przemysław Klapa, Andrzej Żygadło and Massimiliano Pepe
Appl. Sci. 2025, 15(16), 8929; https://doi.org/10.3390/app15168929 - 13 Aug 2025
Abstract
The reconstruction of historic buildings requires the integration of diverse data sources, both geometric and non-geometric. This study presents a multi-source data analysis methodology for heritage reconstruction using 3D modeling and Historic Building Information Modeling (HBIM). The proposed approach combines geometric data, including point clouds acquired via Terrestrial Laser Scanning (TLS), with architectural documentation and non-geometric information such as photographs, historical records, and technical descriptions. The case study focuses on a wooden Orthodox church in Żmijowiska, Poland, analyzing geometric changes in the structure over multiple decades. The reconstruction process integrates modern surveys with archival sources and, in the absence of complete geometric data, utilizes semantic, topological, and structural information. Geometric datasets from the 1990s, 1930s, and the turn of the 20th century were analyzed, supplemented by intermediate archival photographs and technical documentation. This integrated method enabled the identification of transformation phases and verification of discrepancies between historical records and the building’s actual condition. The findings confirm that the use of HBIM and multi-source data fusion facilitates accurate reconstruction of historical geometry and supports visualization of spatial changes across decades. Full article

20 pages, 4191 KiB  
Article
A Deep Transfer Contrastive Learning Network for Few-Shot Hyperspectral Image Classification
by Gan Yang and Zhaohui Wang
Remote Sens. 2025, 17(16), 2800; https://doi.org/10.3390/rs17162800 - 13 Aug 2025
Abstract
Over recent decades, the hyperspectral image (HSI) classification landscape has undergone significant transformations driven by advances in deep learning (DL). Despite substantial progress, few-shot scenarios remain a significant challenge, primarily due to the high cost of manual annotation and the unreliability of visual interpretation. Traditional DL models require massive datasets to learn sophisticated feature representations, hindering their full potential in data-scarce contexts. To tackle this issue, a deep transfer contrastive learning network is proposed. A spectral data augmentation module is incorporated to expand limited sample pairs. Subsequently, a spatial–spectral feature extraction module is designed to fuse the learned feature information. The weights of the spatial feature extraction network are initialized with knowledge transferred from source-domain pretraining, while the spectral residual network acquires rich spectral information. Furthermore, contrastive learning is integrated to enhance discriminative representation learning from scarce samples, effectively mitigating obstacles arising from the high inter-class similarity and large intra-class variance inherent in HSIs. Experiments on four public HSI datasets demonstrate that our method achieves competitive performance against state-of-the-art approaches. Full article

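The contrastive component can be illustrated with a generic NT-Xent (InfoNCE) loss over two augmented views of the same batch; this is the standard formulation commonly used for representation learning, not necessarily the paper's exact loss:

```python
# Minimal NT-Xent / InfoNCE sketch: each sample's positive is its other
# augmented view; all remaining embeddings in the batch act as negatives.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """z1, z2: (N, d) embeddings of two augmented views of the same N samples."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, d), unit norm
    sim = z @ z.t() / tau                               # cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

a, b = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent(a, b))
```
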
11 pages, 1650 KiB  
Article
A RUBY Reporter for Efficient Banana Transformation and Development of Betalain-Rich Musa Germplasm
by Weidi He, Huoqing Huang, Shuxian Wang, Dalin Wang, Yanling Xie and Chunhua Hu
Int. J. Mol. Sci. 2025, 26(16), 7805; https://doi.org/10.3390/ijms26167805 - 13 Aug 2025
Abstract
Bananas are economically important crops valued for both their nutritional and dietary uses. However, the global banana industry suffers from a narrow base dominated by a single variety. Developing novel varieties enriched in health-promoting compounds such as betalains can help diversify banana germplasm and meet evolving consumer demands. In this study, the RUBY reporter system was employed to produce betalain-rich bananas via stable and transient genetic transformations. Transient transformation by injecting 3 mL of Agrobacterium suspension into immature fruits produced vivid red-purple pulp containing up to 1.78 mg/g of betalains. For stable transformation, embryonic cell suspensions expressing RUBY exhibited a red-purple coloration after the first screening, reducing the selection period from 45 to 15 days. These findings demonstrate that RUBY is a reliable visual reporter for efficient screening and can be used to develop nutritionally enhanced bananas. Full article
(This article belongs to the Special Issue Plant Breeding and Genetics: New Findings and Perspectives)

18 pages, 1534 KiB  
Article
TSGformer: A Unified Temporal–Spatial Graph Transformer with Adaptive Cross-Scale Modeling for Multivariate Time Series
by Yan Chen, Cheng Li and Xiaoli Zhao
Systems 2025, 13(8), 688; https://doi.org/10.3390/systems13080688 - 12 Aug 2025
Abstract
Multivariate time series forecasting requires modeling complex and evolving spatio-temporal dependencies as well as frequency-domain patterns; however, the existing Transformer-based approaches often struggle to effectively capture dynamic inter-series correlations and disentangle relevant spectral components, leading to limited forecasting accuracy and robustness under non-stationary conditions. To address these challenges, we propose TSGformer, a Transformer-based architecture that integrates multi-scale adaptive graph learning, adaptive spectral decomposition, and cross-scale interactive fusion modules to jointly model temporal, spatial, and spectral dynamics in multivariate time series data. Specifically, TSGformer constructs dynamic graphs at multiple temporal scales to adaptively learn evolving inter-variable relationships, applies an adaptive spectral enhancement module to emphasize critical frequency components while suppressing noise, and employs interactive convolution blocks to fuse multi-domain features effectively. Extensive experiments across eight benchmark datasets show that TSGformer achieves the best results on five datasets, with an MSE of 0.354 on Exchange, improving upon the best baselines by 2.4%. Ablation studies further verify the effectiveness of each proposed component, and visualization analyses reveal that TSGformer captures meaningful dynamic correlations aligned with real-world patterns. Full article

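One ingredient named above, an adaptively learned inter-series graph, is often built from trainable node embeddings (as in Graph WaveNet / MTGNN); a minimal sketch of that construction follows, with sizes as assumptions rather than TSGformer's actual multi-scale design:

```python
# Sketch of adaptive graph learning: a directed adjacency matrix is derived
# from two trainable node-embedding tables, so inter-series relationships are
# learned end to end instead of being fixed in advance.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraph(nn.Module):
    def __init__(self, num_series: int, emb_dim: int = 16):
        super().__init__()
        self.e1 = nn.Parameter(torch.randn(num_series, emb_dim))
        self.e2 = nn.Parameter(torch.randn(num_series, emb_dim))

    def forward(self) -> torch.Tensor:
        # Row-normalized adjacency: softmax over ReLU similarity scores.
        return F.softmax(F.relu(self.e1 @ self.e2.t()), dim=1)

adj = AdaptiveGraph(num_series=8)()
print(adj.shape, adj.sum(dim=1))  # rows sum to 1
```
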
21 pages, 6057 KiB  
Article
PFSKANs: A Novel Pixel-Level Feature Selection Model Based on Kolmogorov–Arnold Networks
by Rui Yang, Michael V. Basin, Guangzhe Yao and Hongzheng Zeng
Sensors 2025, 25(16), 4982; https://doi.org/10.3390/s25164982 - 12 Aug 2025
Abstract
Inspired by the interpretability of Kolmogorov–Arnold Networks (KANs), a novel Pixel-level Feature Selection (PFS) model based on KANs (PFSKANs) is proposed as a fundamentally distinct alternative to trainable Convolutional Neural Networks (CNNs) and transformers in computer vision tasks. We modify the simplification techniques of KANs to detect key pixels with high contribution scores directly in the input image. Specifically, the trainable selection procedure is intuitively visualized and performed only once, since the resulting interpretable pixels can subsequently be identified and dimensionally standardized using the proposed mathematical approach. Experiments on image classification tasks using the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets demonstrate that PFSKANs achieve performance comparable to CNNs in terms of accuracy, parameter efficiency, and training time. Full article

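The pixel-selection idea, ranking pixels once by contribution score and reusing a fixed top-k subset as a standardized input vector, can be sketched generically; the random scores below are stand-ins for the KAN-derived contributions, and k is an assumption:

```python
# Generic sketch of pixel-level feature selection: score every pixel position
# once, keep the k highest-scoring positions, and reduce each image to that
# fixed k-dimensional vector.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random((28, 28))               # per-pixel contribution scores (stand-in)
k = 64
flat_idx = np.argsort(scores.ravel())[-k:]  # indices of the k highest-scoring pixels

def select_pixels(img: np.ndarray) -> np.ndarray:
    """Reduce a (28, 28) image to the fixed k-dimensional selected-pixel vector."""
    return img.ravel()[flat_idx]

img = rng.random((28, 28))
print(select_pixels(img).shape)             # (64,)
```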