MDPI - Publisher of Open Access Journals

18 pages, 2950 KB

Open Access Article

A Target-Free Vision-Based Method for Measuring Girder Rigid-Body Displacement Under Long-Distance Imaging Conditions

by Guangyu Li, Hai-Bin Huang, Shengzhi Ai, Yuan Cheng and Dong Liang

Infrastructures 2026, 11(5), 161; https://doi.org/10.3390/infrastructures11050161 - 6 May 2026

Viewed by 255

The rigid-body displacement of bridge girders, particularly the lateral displacement of curved girder bridges, is a critical indicator reflecting the structural safety reserve and durability of bridges. However, under long-distance imaging conditions, the inherent scale ambiguity and perspective distortion in monocular vision measurement, coupled with environmental interferences such as weakened natural edges and varying illumination, pose severe challenges to target-free, high-precision, and real-time displacement measurement. To this end, this paper proposes a target-free visual method for measuring rigid-body displacement of bridge girders under long-distance imaging. By fusing optical flow and Hough transform to extract seismic block edges and adopting hierarchical NCC matching for stable girder tracking, the method achieves millimeter-level accuracy, real-time performance, and strong illumination robustness. Model tests and field validation confirm its effectiveness for low-cost bridge health monitoring. Full article

(This article belongs to the Special Issue Sustainable Bridge Engineering)
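
The hierarchical NCC matching mentioned in this abstract can be illustrated with a small sketch. It assumes a two-level search (a strided coarse scan followed by an exhaustive scan around the coarse hit) using zero-mean normalized cross-correlation; the abstract does not describe the authors' hierarchy, so the function names, the `step` parameter, and the two-level layout are illustrative assumptions rather than the published method.

```python
import numpy as np


def ncc(patch: np.ndarray, template: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equally sized arrays."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return float((p * t).sum() / denom) if denom > 0 else 0.0


def search(image, template, rows, cols):
    """Return (score, row, col) of the best NCC match among candidate offsets."""
    h, w = template.shape
    best = (-2.0, 0, 0)  # NCC is always >= -1, so any candidate beats this
    for r in rows:
        for c in cols:
            score = ncc(image[r:r + h, c:c + w], template)
            if score > best[0]:
                best = (score, r, c)
    return best


def coarse_to_fine_match(image, template, step=4):
    """Hypothetical two-level search: strided scan, then local exhaustive scan."""
    h, w = template.shape
    max_r, max_c = image.shape[0] - h, image.shape[1] - w
    # Coarse level: evaluate only every `step`-th offset.
    _, r0, c0 = search(image, template,
                       range(0, max_r + 1, step), range(0, max_c + 1, step))
    # Fine level: every offset within +/- step of the coarse hit.
    rows = range(max(0, r0 - step), min(max_r, r0 + step) + 1)
    cols = range(max(0, c0 - step), min(max_c, c0 + step) + 1)
    return search(image, template, rows, cols)
```

The strided coarse pass only helps when the image is smooth enough that a near miss still scores well; on fine texture, an image pyramid would be the more usual coarse level.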

37 pages, 6776 KB

Open Access Article

Semantic Mapping and Cross-Model Data Integration in BIM: A Lightweight and Scalable Schedule-Level Workflow

by Tianjiao Zhao and Ri Na

Buildings 2026, 16(7), 1347; https://doi.org/10.3390/buildings16071347 - 28 Mar 2026

Viewed by 679

Abstract

Despite the widespread adoption of BIM, information exchange across disciplines remains hindered by heterogeneous structures at the tabular data level, particularly when integrating data across multiple discipline-specific models. Manual mapping, rigid templates, or one-off programming scripts are labor-intensive and difficult to scale, limiting automated querying, cross-model aggregation, and schedule-level analytics. This study proposes a lightweight, workflow-driven approach for semantic normalization and cross-model integration of BIM schedule data, with optional script-supported workflow configuration used only to assist the configuration of deterministic, rule-guided mapping logic, rather than serving as a core analytical method. By introducing a customizable subcategory layer, the workflow enables fine-grained semantic alignment and efficient normalization across diverse schedule datasets, implemented through lightweight Python scripting and rule-guided semantic matching used solely as a supporting mechanism for deterministic field mapping. Using structural, architectural, and HVAC models, we demonstrate a stepwise process including data cleaning, hierarchical classification, consistency checking, batch analytics, and automated computation of cross-model metrics such as opening-to-wall ratios. Sample-based validation confirms the workflow’s reliability, achieving semantic mapping agreement rates above 95% and reducing manual processing time by more than 85%. The workflow is readily extensible to other disciplines and modeling conventions, supporting high-throughput data integration for tasks such as design coordination, semantic alignment, RFI reduction, accelerated design reviews, and data-driven decision making. 
Overall, rather than introducing a new algorithm, the contribution of this work lies in formalizing a reusable, schedule-level workflow abstraction that enables consistent semantic alignment and automated cross-model aggregation without relying on rigid ontologies or training-intensive learning-based models. Any optional tooling used during workflow configuration is auxiliary and does not constitute a standalone learning-based method requiring model training or performance benchmarking. This provides a reusable methodological foundation for scalable, schedule-level BIM data integration and cross-model analytics. Full article

(This article belongs to the Special Issue Building Information Modelling (BIM) Applications in Construction Management: 2nd Edition)
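
A minimal sketch of the kind of deterministic, rule-guided field mapping and cross-model metric this abstract describes might look as follows. The rule table, the subcategory layer, and the column names are all hypothetical; the abstract does not list the actual schedule fields or mapping rules.

```python
# Hypothetical rule table: canonical field -> header variants seen in exports.
FIELD_RULES = {
    "category": {"category", "type", "element type"},
    "area_m2": {"area", "area (m2)", "surface area"},
}

# Hypothetical subcategory layer: raw category -> analysis bucket.
SUBCATEGORY = {"wall": "wall", "window": "opening", "door": "opening"}


def normalize_row(row):
    """Map heterogeneous schedule headers onto the canonical field names."""
    out = {}
    for header, value in row.items():
        key = header.strip().lower()
        for canonical, variants in FIELD_RULES.items():
            if key in variants:
                out[canonical] = value
    return out


def opening_to_wall_ratio(rows):
    """Aggregate areas across models and return opening area / wall area."""
    totals = {"wall": 0.0, "opening": 0.0}
    for row in map(normalize_row, rows):
        category = str(row.get("category", "")).strip().lower()
        bucket = SUBCATEGORY.get(category)
        if bucket is not None:  # rows outside the subcategory layer are skipped
            totals[bucket] += float(row["area_m2"])
    if totals["wall"] == 0:
        raise ValueError("no wall area found")
    return totals["opening"] / totals["wall"]
```

Keeping the rules in plain dictionaries is one way to make the mapping auditable and deterministic, in line with the abstract's emphasis on rule-guided rather than learned matching.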

22 pages, 8563 KB

Open Access Article

Computer Simulation-Guided Rational Design of Sulfadiazine-Imprinted Polymers for High-Efficiency Adsorption of Antibiotics in Complex Aquatic Matrices

by Mengfan Xu, Yanhong Wang, Mingfen Niu, Qiang Zhou and Wang Yang

Membranes 2026, 16(4), 118; https://doi.org/10.3390/membranes16040118 - 28 Mar 2026

Viewed by 487

Abstract

To address the limited selectivity of conventional membrane materials toward sulfonamide antibiotics, this study employed a DFT calculation approach to optimize the design of a molecularly imprinted system for sulfadiazine (SDZ). A hierarchical set of template molecules—aniline (ANL), sulfanilamide (SNM), and SDZ—was introduced to systematically elucidate structure-dependent template–monomer matching mechanisms in sulfonamide imprinting systems. Through rational screening, trifluoroethyl methacrylate (TFEMAA) was identified as the optimal functional monomer, with an optimal imprinting molar ratio of 1:4 (SDZ to TFEMAA). Guided by the simulation results, SDZ molecularly imprinted polymers (MIPs) were synthesized via precipitation polymerization and systematically characterized for their morphology and recognition properties. The MIPs exhibited a well-defined spherical morphology with abundant imprinted cavities, achieving adsorption equilibrium within 1.5 h. The adsorption kinetics followed a pseudo-second-order model, indicating a chemisorption-dominated process. Scatchard analysis revealed the presence of both high- and low-affinity binding sites in the MIPs. Selectivity experiments, quantified by distribution coefficients (K_d) and selectivity coefficients (k), demonstrated a significantly higher adsorption capacity for SDZ than for structural analogs and non-analogs. In real water samples, the MIPs outperformed conventional HLB sorbents and showed strong anti-interference capability (RSD < 3%). This work provides a material foundation for developing highly selective SDZ-imprinted membranes and advances the application of molecular imprinting technology in membrane separation systems. Full article

(This article belongs to the Special Issue Advances in Reverse Osmosis Membrane Research Through Computer Simulation)
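
For readers unfamiliar with the pseudo-second-order model named in this abstract, the sketch below fits its standard linearized form, t/q_t = 1/(k2 · q_e²) + t/q_e, by ordinary least squares. The function name and any data passed to it are illustrative; no values from the paper are used.

```python
def fit_pseudo_second_order(times, uptakes):
    """Fit t/q_t = 1/(k2*q_e**2) + t/q_e and return (q_e, k2).

    `times` must be strictly positive and `uptakes` non-zero, because the
    linearized form divides t by q_t.
    """
    xs = list(times)
    ys = [t / q for t, q in zip(times, uptakes)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / q_e
    intercept = mean_y - slope * mean_x  # equals 1 / (k2 * q_e**2)
    q_e = 1.0 / slope
    k2 = slope ** 2 / intercept
    return q_e, k2
```

A straight line in the t/q_t versus t plot is the usual evidence that the pseudo-second-order model describes the data; the slope gives the equilibrium capacity and the intercept the rate constant.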

18 pages, 3239 KB

Open Access Article

LPA-Tuning CLIP: An Improved CLIP-Based Classification Model for Intestinal Polyps

by Zumin Wang, Jun Gao, Wenhao Ping, Jing Qin and Changqing Ji

Sensors 2026, 26(6), 1764; https://doi.org/10.3390/s26061764 - 11 Mar 2026

Viewed by 444

Abstract

Background and Objective: Accurate classification of intestinal polyps is crucial for preventing colorectal cancer but is hindered by visual similarity among subtypes and endoscopic variability. While deep learning aids in diagnosis, single-modal models face efficiency–accuracy trade-offs and ignore pathological semantics. We propose a multimodal framework that integrates endoscopic images with structured pathological descriptions to bridge this gap. Methods: We propose LPA-Tuning CLIP, which incorporates three key innovations: replacing CLIP’s instance-level contrastive loss with cross-modal projection matching (CMPM) with ID loss to explicitly optimize intraclass compactness and interclass separation through label-aware image-text similarity matrices; introducing structured clinical semantic templates that encode WHO diagnostic criteria into hierarchical text prompts for consistent pathology annotations; and developing medical-aware augmentation that preserves lesion features while reducing domain shifts. Results: The experimental results demonstrate that our proposed method achieves an accuracy of 85.8% and an F1 score of 0.862 on the internal test set, establishing a new state-of-the-art performance for intestinal polyp classification. Conclusions: This study proposes a multimodal polyp classification paradigm that achieves 85.8% accuracy on three-subtype classification via endoscopic image-pathology text joint representation learning, outperforming unimodal baselines by 8.7% and a multimodal baseline by 4.3%. Full article

(This article belongs to the Special Issue AI and Intelligent Sensors for Medical Imaging)

27 pages, 4721 KB

Open Access Article

A Template-Based Approach for Industrial Title Block Compliance Check

by Olivier Laurendin, Khwansiri Ninpan, Quentin Robcis, Richard Lehaut, Hélène Danlos, Nicolas Bureau and Robert Plana

Algorithms 2026, 19(2), 105; https://doi.org/10.3390/a19020105 - 29 Jan 2026

Viewed by 830

Abstract

Title block compliance checking requires interpreting irregular tabular layouts and reporting structural inconsistencies, not only extracting metadata. This paper introduces a user-in-the-loop, template-based method that leverages a graphical annotation workflow to encode title block structure as a hierarchical annotation graph combining detected primitives (cells/text) with user-defined semantic entities (key–value pairs, tables, headers). The resulting template is matched onto target title blocks using relative positional constraints and category-specific rules that distinguish acceptable variability from non-compliance (e.g., variable-size tables versus missing fields). The system outputs extracted key–value information and localized warning logs for end-user correction. On a real industrial example from the nuclear domain, the approach achieves 98–99% compliant annotation matching and 84% accuracy in flagging structural/content deviations, while remaining tolerant to moderate layout changes. Limitations and extensions are discussed, including support for additional fields, improved key similarity metrics, operational deployment with integrated feedback and broader benchmarking. Full article

19 pages, 1533 KB

Open Access Article

Multi-Chain of Thought Prompt Learning for Aspect-Based Sentiment Analysis

by Yating He, Zhenzhen He, Tiquan Gu, Bowen Gu, Yaling Wan and Min Li

Appl. Sci. 2025, 15(22), 12225; https://doi.org/10.3390/app152212225 - 18 Nov 2025

Cited by 1 | Viewed by 1442

Abstract

Due to their extensive common-sense knowledge and linguistic understanding, large language models (LLMs) have demonstrated remarkable capabilities in text comprehension and logical reasoning for natural language processing tasks. Traditional prompt-based learning methods, which rely on contextual pattern matching, have proven to be effective in extracting knowledge from LLMs. However, these approaches are constrained by training data pattern matching, overlook reasoning processes, and consequently suffer from suboptimal prompt performance and limited interpretability. Moreover, considering that the intermediate steps generated by single-chain reasoning may not effectively assist LLMs in identifying the sentiment polarity of aspect terms, and that multiple reasoning paths often exist for complex reasoning tasks to reach correct conclusions, this paper proposes a Multi-Chain Thought Prompt Learning framework (MT-CPL). Starting from fundamental concepts, this method simulates human multi-path reasoning patterns to progressively construct comprehensive thought processes and deeply explore sentiment cues. Based on syntactic structures and the semantic logic of text, the framework incorporates four distinct perspectives of text comprehension: hierarchical reading, experiential reading, keyword-based reading, and analogical reading. It establishes a multi-chain prompt template and employs voting mechanisms to select correct reasoning path outcomes. The MT-CPL approach aims to guide LLMs in mining multi-dimensional textual information from different perspectives, gradually uncovering hidden contextual sentiment clues, while mitigating issues caused by irrelevant sentiment cues in intermediate reasoning steps. By decomposing main tasks incrementally, the method achieves progressive reasoning, effectively reduces the difficulty of direct analysis, and further enhances model interpretability through the integration of inherent common-sense knowledge. Full article

(This article belongs to the Section Computing and Artificial Intelligence)
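
The voting step over multiple reasoning chains can be sketched as a simple majority vote. This assumes each chain has already been reduced to one polarity label; the function name and tie-breaking rule are illustrative, since the abstract does not specify how votes are counted.

```python
from collections import Counter


def majority_vote(chain_outputs):
    """Return (label, vote_share) for the most frequent label across chains.

    Ties are resolved in favor of the label that appears first, because
    Counter.most_common keeps first-seen order among equal counts.
    """
    if not chain_outputs:
        raise ValueError("at least one chain output is required")
    counts = Counter(label.strip().lower() for label in chain_outputs)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(chain_outputs)
```

With four chains (for example, one per reading perspective), a call such as `majority_vote(["positive", "negative", "positive", "neutral"])` yields the label `"positive"` with a vote share of 0.5.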

27 pages, 1949 KB

Open Access Article

Hierarchical Prompt Engineering for Remote Sensing Scene Understanding with Large Vision–Language Models

by Tianyang Chen and Jianliang Ai

Remote Sens. 2025, 17(22), 3727; https://doi.org/10.3390/rs17223727 - 16 Nov 2025

Viewed by 1933

Abstract

Vision–language models (VLMs) show strong potential for remote-sensing scene classification but still struggle with fine-grained categories and distribution shifts. We introduce a hierarchical prompting framework that decomposes recognition into a coarse-to-fine decision process with structured outputs, combined with parameter-efficient adaptation using LoRA/QLoRA. To evaluate robustness without depending on external benchmarks, we construct five protocol variants of the AID (V₀–V₄) that systematically vary label granularity, class consolidation, and augmentation settings. Each variant is designed to align with a specific prompting style and hierarchy. The data pipeline follows a strict split-before-augment strategy, in which augmentation is applied only to the training split to avoid train-test leakage. We further audit leakage using rotation/flip–invariant perceptual hashing across splits to ensure reproducibility. Experiments on all five AID variants show that hierarchical prompting consistently outperforms non-hierarchical prompts and matches or exceeds full fine-tuning, while requiring substantially less compute. Ablation studies on prompt design, adaptation strategy, and model capacity—together with confusion matrices and class-wise metrics—indicate improved recognition at both coarse and fine levels, as well as robustness to rotations and flips. The proposed framework provides a strong, reproducible baseline for remote-sensing scene classification under constrained compute and includes complete prompt templates and processing scripts to support replication. Full article
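
The leakage audit described above can be illustrated with a rotation- and flip-invariant hash. The sketch below swaps in a plain average hash, taken as the minimum over the eight rotations and mirror images of a small grayscale grid; the abstract does not say which perceptual hash the authors use, so this choice, the function names, and the assumption that images are already downsampled to small square integer grids are all illustrative.

```python
def rot90(grid):
    """Rotate a 2D list of pixel values 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip(grid):
    """Mirror a 2D list of pixel values left to right."""
    return [row[::-1] for row in grid]


def average_hash(grid):
    """Bit string with 1 where a pixel is above the grid mean (integer math)."""
    flat = [v for row in grid for v in row]
    total, n = sum(flat), len(flat)
    return "".join("1" if v * n > total else "0" for v in flat)


def invariant_hash(grid):
    """Smallest average hash over the 8 rotations/flips of the grid."""
    variants = []
    current = grid
    for _ in range(4):
        variants.append(average_hash(current))
        variants.append(average_hash(flip(current)))
        current = rot90(current)
    return min(variants)


def find_leaks(train, test):
    """Return sorted (train_name, test_name) pairs sharing an invariant hash."""
    by_hash = {}
    for name, grid in train.items():
        by_hash.setdefault(invariant_hash(grid), []).append(name)
    return sorted((tr, te) for te, grid in test.items()
                  for tr in by_hash.get(invariant_hash(grid), []))
```

Taking the minimum over all eight transforms gives every rotated or mirrored copy of an image the same key, so a single dictionary lookup per test image is enough to flag cross-split duplicates.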

21 pages, 14861 KB

Open Access Article

Feature Equalization and Hierarchical Decoupling Network for Rotated and High-Aspect-Ratio Object Detection

by Wenbin Gao, Jinda Ji and Donglin Jing

Symmetry 2025, 17(9), 1491; https://doi.org/10.3390/sym17091491 - 9 Sep 2025

Viewed by 1099

Abstract

Current mainstream remote sensing target detection algorithms mostly estimate the rotation angle of targets by designing different bounding box descriptions and loss functions. However, they fail to consider the symmetry–asymmetry duality (anisotropy) in the distribution of the key features required for target localization. Moreover, because shared convolutional kernels extract features in the same way for every output, they may struggle to predict parameters with different attributes accurately, which reduces detector performance. In this paper, we propose the Feature Equalization and Hierarchical Decoupling Network (FEHD-Net), which comprises three core components: a Symmetry-Enhanced Parallel Interleaved Convolution Module (PICM), a Parameter Decoupling Module (PDM), and a Critical Feature Matching Loss Function (CFM-Loss). PICM captures diverse long-range spatial features by integrating square convolution with multi-branch sequences of continuous orthogonal large-kernel strip convolutions, thereby enhancing the network’s ability to process long-distance spatial information. PDM decomposes feature maps with different properties and assigns them to different regression branches to estimate the parameters of the target’s rotated bounding box. Finally, to stabilize the training of anchors of different quality that have captured the key features required for detection, CFM-Loss uses the intersection ratio between anchors and ground-truth labels, together with the uncertainty of convolutional regression during training, to define an alignment criterion (symmetry-aware alignment) that evaluates the regression ability of each anchor. This enables the network to adjust how it handles templates of different quality and to train stably. Extensive experiments demonstrate that FEHD-Net achieves state-of-the-art performance on the DOTA, HRSC2016, and UCAS-AOD datasets. Full article

(This article belongs to the Special Issue Symmetry and Asymmetry Study in Object Detection)

25 pages, 13698 KB

Open Access Editor’s Choice Article

Self-Supervised Foundation Model for Template Matching

by Anton Hristov, Dimo Dimov and Maria Nisheva-Pavlova

Big Data Cogn. Comput. 2025, 9(2), 38; https://doi.org/10.3390/bdcc9020038 - 11 Feb 2025

Cited by 5 | Viewed by 3749

Abstract

Finding a template location in a query image is a fundamental problem in many computer vision applications, such as localization of known objects, image registration, image matching, and object tracking. Currently available methods fail when training data are insufficient or when the images contain large texture variations, different modalities, or weak visual features, which limits their use in real-world tasks. We introduce the Self-Supervised Foundation Model for Template Matching (Self-TM), a novel end-to-end approach to self-supervised learning of template matching. The idea behind Self-TM is to learn hierarchical features that incorporate localization properties from images without any annotations. In deeper layers of a convolutional neural network (CNN), the filters respond to more complex structures and their receptive fields grow. As a result, the deeper layers lose localization information that the early layers retain. Hierarchically propagating the output of the last layers back to the first layer results in precise template localization. Due to its zero-shot generalization capabilities on tasks such as image retrieval, dense template matching, and sparse image matching, our pre-trained model can be classified as a foundation model. Full article

(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)

24 pages, 8460 KB

Open Access Article

Combining Higher-Order Statistics and Array Techniques to Pick Low-Energy P-Seismic Arrivals

by Giovanni Messuti, Mauro Palo, Silvia Scarpetta, Ferdinando Napolitano, Francesco Scotto di Uccio, Paolo Capuano and Ortensia Amoroso

Appl. Sci. 2025, 15(3), 1172; https://doi.org/10.3390/app15031172 - 24 Jan 2025

Cited by 1 | Viewed by 1381

Abstract

We propose the HOSA algorithm to pick P-wave arrival times on seismic arrays. HOSA comprises two stages: a single-trace stage (STS) and a multi-channel stage (MCS). STS seeks deviations in higher-order statistics from background noise to identify sets of potential onsets on each trace. STS employs various thresholds and identifies an onset only for solutions that are gently variable with the threshold. Uncertainty is assigned to onsets based on their variation with the threshold. MCS verifies that detected onsets are consistent with the array geometry. It groups onsets within an array by hierarchical agglomerative clustering and selects only groups whose maximum differential times are consistent with the P-wave travel time across the array. HOSA needs a set of P-onsets to be calibrated. These sets may be already available (e.g., preliminary catalogs) or retrieved from picking (manually/automatically) a subset of traces in the target area. We tested HOSA on 226 microearthquakes recorded by 20 temporary arrays of 10 stations each, deployed in the Irpinia region (Southern Italy), which, in 1980, experienced a devastating 6.9 Ms earthquake. HOSA parameters were calibrated using a preliminary catalog of onsets obtained using an automatic template-matching approach. HOSA solutions are more reliable, less prone to false detection, and show higher inter-array consistency than template-matching solutions. Full article

(This article belongs to the Special Issue Advanced Research in Seismic Monitoring and Activity Analysis)
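
The multi-channel stage in this abstract groups onsets by hierarchical agglomerative clustering and keeps only groups whose maximum differential time fits the array. The sketch below shows one way to do this for scalar onset times, using complete linkage with a spread cutoff. The function names, `max_dt` (for example, array aperture divided by an assumed P-wave velocity), and `min_picks` are hypothetical; the abstract does not give the authors' linkage rule or thresholds.

```python
def cluster_onsets(onsets, max_dt):
    """Agglomerative (complete-linkage) clustering of onset times in seconds.

    Clusters are merged greedily while the merged group's total spread
    (latest minus earliest onset) stays within max_dt.
    """
    clusters = [[t] for t in sorted(onsets)]
    while len(clusters) > 1:
        # In 1D, the complete-linkage distance between neighbouring clusters
        # is the spread of their union.
        spans = [clusters[i + 1][-1] - clusters[i][0]
                 for i in range(len(clusters) - 1)]
        i = min(range(len(spans)), key=spans.__getitem__)
        if spans[i] > max_dt:
            break
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


def consistent_groups(onsets, max_dt, min_picks=3):
    """Keep only clusters with enough picks to count as an array detection."""
    return [c for c in cluster_onsets(onsets, max_dt) if len(c) >= min_picks]
```

Because the onsets are sorted first, only neighbouring clusters ever need to be compared, which keeps the sketch short; a general implementation would work on a full pairwise distance matrix.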

34 pages, 15986 KB

Open Access Article

A Comprehensive Framework for Transportation Infrastructure Digitalization: TJYRoad-Net for Enhanced Point Cloud Segmentation

by Zhen Yang, Mingxuan Wang and Shikun Xie

Sensors 2024, 24(22), 7222; https://doi.org/10.3390/s24227222 - 12 Nov 2024

Cited by 2 | Viewed by 2082

Abstract

This research introduces a cutting-edge approach to traffic infrastructure digitization, integrating UAV oblique photography with LiDAR point clouds for high-precision, lightweight 3D road modeling. The proposed method addresses the challenge of accurately capturing the current state of infrastructure while minimizing redundancy and optimizing computational efficiency. A key innovation is the development of the TJYRoad-Net model, which achieves over 85% mIoU segmentation accuracy by including a traffic feature computing (TFC) module composed of three critical components: the Regional Coordinate Encoder (RCE), the Context-Aware Aggregation Unit (CAU), and the Hierarchical Expansion Block. Comparative analysis segments the point clouds into road and non-road categories, achieving centimeter-level registration accuracy with RANSAC and ICP. Two lightweight surface reconstruction techniques are implemented: (1) algorithmic reconstruction, which delivers a 6.3 mm elevation error at 95% confidence in complex intersections, and (2) template matching, which replaces road markings, poles, and vegetation using bounding boxes. These methods ensure accurate results with minimal memory overhead. The optimized 3D models have been successfully applied in driving simulation and traffic flow analysis, providing a practical and scalable solution for real-world infrastructure modeling and analysis. These applications demonstrate the versatility and efficiency of the proposed methods in modern traffic system simulations. Full article

(This article belongs to the Special Issue UAVs Revolutionizing Smart City Transportation: Innovations, Challenges, and Potential)

16 pages, 13027 KB

Open Access Article

A Real-Time Global Re-Localization Framework for a 3D LiDAR-Based Navigation System

by Ziqi Chai, Chao Liu and Zhenhua Xiong

Sensors 2024, 24(19), 6288; https://doi.org/10.3390/s24196288 - 28 Sep 2024

Viewed by 3124

Abstract

Place recognition is widely used to re-localize robots in pre-built point cloud maps for navigation. However, current place recognition methods can only be used to recognize previously visited places. Moreover, these methods are limited by the requirement of using the same types of sensors in the re-localization process and the process is time consuming. In this paper, a template-matching-based global re-localization framework is proposed to address these challenges. The proposed framework includes an offline building stage and an online matching stage. In the offline stage, virtual LiDAR scans are densely resampled in the map and rotation-invariant descriptors can be extracted as templates. These templates are hierarchically clustered to build a template library. The map used to collect virtual LiDAR scans can be built either by the robot itself previously, or by other heterogeneous sensors. So, an important feature of the proposed framework is that it can be used in environments that have never been visited by the robot before. In the online stage, a cascade coarse-to-fine template matching method is proposed for efficient matching, considering both computational efficiency and accuracy. In the simulation with 100 K templates, the proposed framework achieves a 99% success rate and around 11 Hz matching speed when the re-localization error threshold is 1.0 m. In the validation on The Newer College Dataset with 40 K templates, it achieves a 94.67% success rate and around 7 Hz matching speed when the re-localization error threshold is 1.0 m. All the results show that the proposed framework has high accuracy, excellent efficiency, and the capability to achieve global re-localization in heterogeneous maps. Full article

(This article belongs to the Special Issue Simultaneous Localization and Mapping (SLAM) and Artificial Intelligence (AI) Based Localization for Positioning Applications and Mobile Robot Navigation—Second Edition)

16 pages, 483 KB

Open Access Article

Query-Based Object Visual Tracking with Parallel Sequence Generation

by Chang Liu, Bin Zhang, Chunjuan Bo and Dong Wang

Sensors 2024, 24(15), 4802; https://doi.org/10.3390/s24154802 - 24 Jul 2024

Cited by 3 | Viewed by 2398

Abstract

Query decoders have been shown to achieve good performance in object detection. However, they suffer from insufficient object tracking performance. Sequence-to-sequence learning in this context has recently been explored, with the idea of describing a target as a sequence of discrete tokens. In this study, we experimentally determine that, with appropriate representation, a parallel approach for predicting a target coordinate sequence with a query decoder can achieve good performance and speed. We propose a concise query-based tracking framework for predicting a target coordinate sequence in a parallel manner, named QPSTrack. A set of queries are designed to be responsible for different coordinates of the tracked target. All the queries jointly represent a target rather than a traditional one-to-one matching pattern between the query and target. Moreover, we adopt an adaptive decoding scheme including a one-layer adaptive decoder and learnable adaptive inputs for the decoder. This decoding scheme assists the queries in decoding the template-guided search features better. Furthermore, we explore the use of the plain ViT-Base, ViT-Large, and lightweight hierarchical LeViT architectures as the encoder backbone, providing a family of three variants in total. All the trackers are found to obtain a good trade-off between speed and performance; for instance, our tracker QPSTrack-B256 with the ViT-Base encoder achieves a 69.1% AUC on the LaSOT benchmark at 104.8 FPS. Full article

(This article belongs to the Special Issue Computer Vision for Object Detection and Tracking with Sensor-Based Applications)

19 pages, 1736 KB

Open Access Feature Paper Article

Motivations and Tools Relevant to Personalized Workspaces in VR Environments

by Ildikó Horváth and Ádám B. Csapó

Electronics 2023, 12(9), 2059; https://doi.org/10.3390/electronics12092059 - 29 Apr 2023

Cited by 14 | Viewed by 2175

Abstract

In this paper, we propose a new virtual reality (VR) concept referred to as ‘context control’, which we use to describe VR workspaces that are dynamically reconfigurable based on the task at hand and the user’s individual learning and working style. To demonstrate the viability of the concept as well as how it could be applied in practical applications, we present an implementation framework that, at its foundations, relies on Kolb’s learning styles taxonomy, consisting of Assimilators, Accommodators, Convergers and Divergers. We propose a layout schema for each of these categories of learning style, and validate them based on an experiment involving 52 university students, showing that the test subjects preferred content layouts that represent cognitive profiles matching their own to a greater extent. We also propose a hierarchical schema template language with which the schemas can be formalized and made amenable to further dynamic customization. Full article

(This article belongs to the Special Issue Virtual Reality, Augmented Reality and the Metaverse for Enhanced Human Cognitive Capabilities)

13 pages, 818 KB

Open Access Article

Automatic Electronic Invoice Classification Using Machine Learning Models

by Chiara Bardelli, Alessandro Rondinelli, Ruggero Vecchio and Silvia Figini

Mach. Learn. Knowl. Extr. 2020, 2(4), 617-629; https://doi.org/10.3390/make2040033 - 30 Nov 2020

Cited by 22 | Viewed by 12270

Abstract

Electronic invoicing has been mandatory for Italian companies since January 2019. All invoices are structured in a predefined XML template, which facilitates the extraction of information. The main aim of this paper is to exploit the information contained in electronic invoices to build an intelligent system that can simplify accountants’ work. More precisely, this contribution shows how part of the accounting process can be automated: all the invoices of a company are classified into specific codes that represent the economic nature of the financial transactions. To accomplish this classification task, a multiclass classification algorithm is proposed to predict two different target variables, the account and the VAT codes, which are part of the general ledger entry. To apply this model to real datasets, a multi-step procedure is proposed: first, a matching algorithm is used to reconstruct the training set; then, input data are processed and prepared for the training phase; and finally, a classification algorithm is trained. Different classification algorithms, including ensemble models and neural networks, are compared in terms of prediction accuracy. The models under comparison show optimal results in the prediction of the target variables, meaning that machine learning classifiers succeed in translating the complex rules of the accounting process into an automated model. A final study suggests that the best performance can be achieved by considering the hierarchical structure of the account codes and splitting the classification task into smaller sub-problems. Full article

Search Results (18)
