Search Results (171)

Search Parameters:
Keywords = bag-of-words model

23 pages, 978 KiB  
Article
Emotional Analysis in a Morphologically Rich Language: Enhancing Machine Learning with Psychological Feature Lexicons
by Ron Keinan, Efraim Margalit and Dan Bouhnik
Electronics 2025, 14(15), 3067; https://doi.org/10.3390/electronics14153067 - 31 Jul 2025
Viewed by 38
Abstract
This paper explores emotional analysis in Hebrew texts, focusing on improving machine learning techniques for depression detection by integrating psychological feature lexicons. Hebrew’s complex morphology makes emotional analysis challenging, and this study seeks to address that by combining traditional machine learning methods with sentiment lexicons. The dataset consists of over 350,000 posts from 25,000 users on the health-focused social network “Camoni” from 2010 to 2021. Various machine learning models—SVM, Random Forest, Logistic Regression, and Multi-Layer Perceptron—were used, alongside ensemble techniques like Bagging, Boosting, and Stacking. TF-IDF was applied for feature selection, with word and character n-grams, and pre-processing steps like punctuation removal, stop word elimination, and lemmatization were performed to handle Hebrew’s linguistic complexity. The models were enriched with sentiment lexicons curated by professional psychologists. The study demonstrates that integrating sentiment lexicons significantly improves classification accuracy. Specific lexicons—such as those for negative and positive emojis, hostile words, anxiety words, and no-trust words—were particularly effective in enhancing model performance. Our best model classified depression with an accuracy of 84.1%. These findings offer insights into depression detection, suggesting that practitioners in mental health and social work can improve their machine learning models for detecting depression in online discourse by incorporating emotion-based lexicons. The societal impact of this work lies in its potential to improve the detection of depression in online Hebrew discourse, offering more accurate and efficient methods for mental health interventions in online communities. Full article
(This article belongs to the Special Issue Techniques and Applications of Multimodal Data Fusion)
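A loose illustration of the feature setup this abstract describes (TF-IDF over word and character n-grams, enriched with lexicon-based features): the sketch below combines the two in scikit-learn. The tiny lexicons and toy posts are invented placeholders, not the psychologist-curated lexicons used in the paper.

```python
# Minimal sketch: TF-IDF over word and character n-grams, plus counts of
# lexicon hits appended as extra features. Lexicons here are hypothetical
# stand-ins for the curated psychological lexicons described in the paper.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

ANXIETY_WORDS = {"afraid", "worried"}      # placeholder lexicon entries
HOSTILE_WORDS = {"hate", "angry"}          # placeholder lexicon entries

def lexicon_counts(texts):
    """Count lexicon hits per document -> shape (n_docs, n_lexicons)."""
    rows = []
    for t in texts:
        tokens = t.lower().split()
        rows.append([sum(w in ANXIETY_WORDS for w in tokens),
                     sum(w in HOSTILE_WORDS for w in tokens)])
    return np.array(rows, dtype=float)

features = FeatureUnion([
    ("word_tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("char_tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ("lexicons", FunctionTransformer(lexicon_counts)),
])

model = Pipeline([("features", features),
                  ("clf", LogisticRegression(max_iter=1000))])

texts = ["i feel worried and alone", "great day with friends"]
labels = [1, 0]                            # 1 = depression-related (toy labels)
model.fit(texts, labels)
print(model.predict(["so afraid all the time"]))
```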

22 pages, 579 KiB  
Article
Automated Classification of Crime Narratives Using Machine Learning and Language Models in Official Statistics
by Klaus Lehmann, Elio Villaseñor, Alejandro Pimentel, Javiera Preuss, Nicolás Berhó, Oswaldo Diaz and Ignacio Agloni
Stats 2025, 8(3), 68; https://doi.org/10.3390/stats8030068 - 30 Jul 2025
Viewed by 300
Abstract
This paper presents the implementation of a language model–based strategy for the automatic codification of crime narratives for the production of official statistics. To address the high workload and inconsistencies associated with manual coding, we developed and evaluated three models: an XGBoost classifier with bag-of-words and word-embedding features, an LSTM network using pretrained Spanish word embeddings as a language model, and a fine-tuned BERT language model (BETO). Deep learning models outperformed the traditional baseline, with BETO achieving the highest accuracy. The new ENUSC (Encuesta Nacional Urbana de Seguridad Ciudadana) workflow integrates the selected model into an API for automated classification, incorporating a certainty threshold to distinguish between cases suitable for automation and those requiring expert review. This hybrid strategy led to a 68.4% reduction in manual review workload while preserving high-quality standards. This study represents the first documented application of deep learning for the automated classification of victimization narratives in official statistics, demonstrating its feasibility and impact in a real-world production environment. Our results demonstrate that deep learning can significantly improve the efficiency and consistency of crime statistics coding, offering a scalable solution for other national statistical offices. Full article
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
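The certainty-threshold routing this abstract describes lends itself to a short sketch: score each narrative, auto-code the confident cases, and queue the rest for expert review. The classifier, the 0.9 threshold, and the toy narratives are assumptions for illustration, not the ENUSC production setup.

```python
# Sketch of certainty-threshold routing: narratives whose top-class
# probability clears the threshold are coded automatically, the rest go to
# expert review. Threshold, model, and crime codes are illustrative only.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

THRESHOLD = 0.9  # assumed certainty cut-off

pipeline = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
train_texts = ["robaron mi auto", "me golpearon en la calle"]
train_codes = ["vehicle_theft", "assault"]        # toy crime codes
pipeline.fit(train_texts, train_codes)

def route(narratives):
    """Return (auto_coded, needs_review) partitions of the input."""
    probs = pipeline.predict_proba(narratives)
    labels = pipeline.classes_[np.argmax(probs, axis=1)]
    auto, review = [], []
    for text, label, p in zip(narratives, labels, probs.max(axis=1)):
        (auto if p >= THRESHOLD else review).append((text, label, p))
    return auto, review

auto, review = route(["me robaron el celular"])
print(len(auto), "auto-coded,", len(review), "sent to expert review")
```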

32 pages, 465 KiB  
Article
EsCorpiusBias: The Contextual Annotation and Transformer-Based Detection of Racism and Sexism in Spanish Dialogue
by Ksenia Kharitonova, David Pérez-Fernández, Javier Gutiérrez-Hernando, Asier Gutiérrez-Fandiño, Zoraida Callejas and David Griol
Future Internet 2025, 17(8), 340; https://doi.org/10.3390/fi17080340 - 28 Jul 2025
Viewed by 123
Abstract
The rise in online communication platforms has significantly increased exposure to harmful discourse, presenting ongoing challenges for digital moderation and user well-being. This paper introduces the EsCorpiusBias corpus, designed to enhance the automated detection of sexism and racism within Spanish-language online dialogue, specifically sourced from the Mediavida forum. By means of a systematic, context-sensitive annotation protocol, approximately 1000 three-turn dialogue units per bias category are annotated, ensuring the nuanced recognition of pragmatic and conversational subtleties. Here, annotation guidelines are meticulously developed, covering explicit and implicit manifestations of sexism and racism. Annotations are performed using the Prodigy tool (v1.16.0), resulting in moderate to substantial inter-annotator agreement (Cohen’s Kappa: 0.55 for sexism and 0.79 for racism). Models including logistic regression, SpaCy’s baseline n-gram bag-of-words model, and transformer-based BETO are trained and evaluated, demonstrating that contextualized transformer-based approaches significantly outperform baseline and general-purpose models. Notably, the single-turn BETO model achieves an ROC-AUC of 0.94 for racism detection, while the contextual BETO model reaches an ROC-AUC of 0.87 for sexism detection, highlighting BETO’s superior effectiveness in capturing nuanced bias in online dialogues. Additionally, lexical overlap analyses indicate a strong reliance on explicit lexical indicators, highlighting limitations in handling implicit biases. This research underscores the importance of contextually grounded, domain-specific fine-tuning for effective automated detection of toxicity, providing robust resources and methodologies to foster socially responsible NLP systems within Spanish-speaking online communities. Full article
(This article belongs to the Special Issue Deep Learning and Natural Language Processing—3rd Edition)
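A minimal sketch of the single-turn versus contextual setups compared above, assuming each labeled example is a three-turn dialogue unit: the contextual variant simply concatenates the two preceding turns with the target turn. A scikit-learn bag-of-words baseline stands in for the paper's SpaCy and BETO models, and the snippets are invented.

```python
# Build single-turn vs. contextual training strings from three-turn dialogue
# units, then fit a simple n-gram bag-of-words baseline on either variant.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

SEP = " [SEP] "  # assumed turn separator

dialogues = [  # ((turn1, turn2, target_turn), label): 1 = biased, 0 = not
    (("hola", "que tal", "las mujeres no saben conducir"), 1),
    (("hola", "que tal", "hoy llueve bastante"), 0),
]

single_turn = [turns[-1] for turns, _ in dialogues]
contextual = [SEP.join(turns) for turns, _ in dialogues]
labels = [label for _, label in dialogues]

baseline = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                         LogisticRegression(max_iter=1000))
baseline.fit(contextual, labels)  # swap in `single_turn` for the no-context run
print(baseline.predict([SEP.join(("hola", "que tal", "vaya dia"))]))
```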

18 pages, 591 KiB  
Article
Active Learning for Medical Article Classification with Bag of Words and Bag of Concepts Embeddings
by Radosław Pytlak, Paweł Cichosz, Bartłomiej Fajdek and Bogdan Jastrzębski
Appl. Sci. 2025, 15(14), 7955; https://doi.org/10.3390/app15147955 - 17 Jul 2025
Viewed by 247
Abstract
Systems supporting systematic literature reviews often use machine learning algorithms to create classification models to assess the relevance of articles to study topics. The proper choice of text representation for such algorithms may have a significant impact on their predictive performance. This article presents an in-depth investigation of the utility of the bag of concepts representation for this purpose, which can be considered an enhanced form of the ubiquitous bag of words representation, with features corresponding to ontology concepts rather than words. Its utility is evaluated in the active learning setting, in which a sequence of classification models is created, with training data iteratively expanded by adding articles selected for human screening. Different versions of the bag of concepts are compared with bag of words, as well as with combined representations, including both word-based and concept-based features. The evaluation uses the support vector machine, naive Bayes, and random forest algorithms and is performed on datasets from 15 systematic medical literature review studies. The results show that concept-based features may have additional predictive value in comparison to standard word-based features and that the combined bag of concepts and bag of words representation is the most useful overall. Full article
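A compact sketch of the active learning loop described above, under simple assumptions: TF-IDF bag-of-words features (concept features would just be extra columns), a linear SVM, and least-confident sampling by distance to the decision boundary; the toy articles and labels are invented.

```python
# Active learning loop: train an SVM on the labeled pool, query the unlabeled
# articles closest to the decision boundary, reveal their labels as simulated
# human screening, and retrain.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

def active_learning(texts, labels, seed_idx, rounds=3, batch=2):
    X = TfidfVectorizer().fit_transform(texts)   # bag-of-words features;
    labeled = list(seed_idx)                     # concept features would add columns
    clf = None
    for _ in range(rounds):
        clf = SVC(kernel="linear").fit(X[labeled], [labels[i] for i in labeled])
        pool = [i for i in range(len(texts)) if i not in labeled]
        if not pool:
            break
        margins = np.abs(clf.decision_function(X[pool]))  # distance to boundary
        picked = [pool[j] for j in np.argsort(margins)[:batch]]
        labeled.extend(picked)                   # "human screening" reveals labels
    return clf, labeled

texts = ["randomized trial of drug A", "case report of a rash",
         "systematic review of trials", "editorial on health policy",
         "cohort study of drug A", "letter to the editor"]
labels = [1, 0, 1, 0, 1, 0]                      # 1 = relevant to the review (toy)
clf, screened = active_learning(texts, labels, seed_idx=[0, 1])
print("articles screened:", sorted(screened))
```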

34 pages, 5774 KiB  
Article
Approach to Semantic Visual SLAM for Bionic Robots Based on Loop Closure Detection with Combinatorial Graph Entropy in Complex Dynamic Scenes
by Dazheng Wang and Jingwen Luo
Biomimetics 2025, 10(7), 446; https://doi.org/10.3390/biomimetics10070446 - 6 Jul 2025
Viewed by 408
Abstract
In complex dynamic environments, the performance of SLAM systems on bionic robots is susceptible to interference from dynamic objects or structural changes in the environment. To address this problem, we propose a semantic visual SLAM (vSLAM) algorithm based on loop closure detection with combinatorial graph entropy. First, based on the dynamic feature detection results of YOLOv8-seg, feature points at the edges of dynamic objects are further assessed by calculating the mean absolute deviation (MAD) of their pixel depths. Then, a high-quality keyframe selection strategy is constructed by combining the semantic information, the average coordinates of the semantic objects, and the degree of variation in the dense region of feature points. Subsequently, the unweighted and weighted graphs of keyframes are constructed according to the distribution of feature points, characterization points, and semantic information, and then a high-performance loop closure detection method based on combinatorial graph entropy is developed. The experimental results show that our loop closure detection approach exhibits higher precision and recall in real scenes compared to the bag-of-words (BoW) model. Compared with ORB-SLAM2, the absolute trajectory accuracy in high-dynamic sequences improved by an average of 97.01%, while the number of extracted keyframes decreased by an average of 61.20%. Full article
(This article belongs to the Special Issue Artificial Intelligence for Autonomous Robots: 3rd Edition)
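The depth check on dynamic-object edges can be sketched as follows, assuming per-point depths are available from the RGB-D frame: compare each edge point's depth with the median depth inside the YOLOv8-seg mask via the mean absolute deviation (MAD). The threshold factor is an assumption, not the paper's value.

```python
# Edge points whose depth is consistent with the mask's depth distribution are
# treated as belonging to the dynamic object; outliers are kept as static
# background. The factor k = 2.5 is an assumed threshold.
import numpy as np

def split_edge_points(depths_in_mask, edge_point_depths, k=2.5):
    """Return boolean array: True where an edge point is judged dynamic."""
    med = np.median(depths_in_mask)
    mad = np.mean(np.abs(depths_in_mask - med))   # mean absolute deviation
    return np.abs(edge_point_depths - med) <= k * mad

mask_depths = np.array([2.0, 2.1, 1.9, 2.05, 2.2])   # depths inside the mask (m)
edge_depths = np.array([2.1, 4.8, 2.0])              # depths of edge features (m)
print(split_edge_points(mask_depths, edge_depths))   # [ True False  True]
```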

18 pages, 839 KiB  
Article
From Narratives to Diagnosis: A Machine Learning Framework for Classifying Sleep Disorders in Aging Populations: The sleepCare Platform
by Christos A. Frantzidis
Brain Sci. 2025, 15(7), 667; https://doi.org/10.3390/brainsci15070667 - 20 Jun 2025
Viewed by 974
Abstract
Background/Objectives: Sleep disorders are prevalent among aging populations and are often linked to cognitive decline, chronic conditions, and reduced quality of life. Traditional diagnostic methods, such as polysomnography, are resource-intensive and limited in accessibility. Meanwhile, individuals frequently describe their sleep experiences through unstructured narratives in clinical notes, online forums, and telehealth platforms. This study proposes a machine learning pipeline (sleepCare) that classifies sleep-related narratives into clinically meaningful categories, including stress-related, neurodegenerative, and breathing-related disorders. The proposed framework employs natural language processing (NLP) and machine learning techniques to support remote applications and real-time patient monitoring, offering a scalable solution for the early identification of sleep disturbances. Methods: The sleepCare platform consists of a three-tiered classification pipeline for analyzing narrative sleep reports. First, a baseline model used a Multinomial Naïve Bayes classifier with n-gram features from a Bag-of-Words representation. Next, a Support Vector Machine (SVM) was trained on GloVe-based word embeddings to capture semantic context. Finally, a transformer-based model (BERT) was fine-tuned to extract contextual embeddings, using the [CLS] token as input for SVM classification. Each model was evaluated using stratified train-test splits and 10-fold cross-validation. Hyperparameter tuning via GridSearchCV optimized performance. The dataset contained 475 labeled sleep narratives, classified into five etiological categories relevant for clinical interpretation. Results: The transformer-based model utilizing BERT embeddings and an optimized Support Vector Machine classifier achieved an overall accuracy of 81% on the test set. Class-wise F1-scores ranged from 0.72 to 0.91, with the highest performance observed in classifying normal or improved sleep (F1 = 0.91). The macro average F1-score was 0.78, indicating balanced performance across all categories. GridSearchCV identified the optimal SVM parameters (C = 4, kernel = ‘rbf’, gamma = 0.01, degree = 2, class_weight = ‘balanced’). The confusion matrix revealed robust classification with limited misclassifications, particularly between overlapping symptom categories such as stress-related and neurodegenerative sleep disturbances. Conclusions: Unlike generic large language model applications, our approach emphasizes the personalized identification of sleep symptomatology through targeted classification of the narrative input. By integrating structured learning with contextual embeddings, the framework offers a clinically meaningful, scalable solution for early detection and differentiation of sleep disorders in diverse, real-world, and remote settings. Full article
(This article belongs to the Special Issue Perspectives of Artificial Intelligence (AI) in Aging Neuroscience)
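A minimal sketch of the third tier described above: extract the [CLS] embedding from a pretrained BERT and classify it with the reported SVM settings (C = 4, rbf kernel, gamma = 0.01, class_weight = 'balanced'). The checkpoint name and toy narratives are placeholders, and the paper fine-tunes BERT before extracting embeddings.

```python
# Extract the [CLS] token embedding from BERT and feed it to an SVM.
import torch
from sklearn.svm import SVC
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder checkpoint
bert = AutoModel.from_pretrained("bert-base-uncased")

def cls_embeddings(texts):
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = bert(**enc)
    return out.last_hidden_state[:, 0, :].numpy()   # [CLS] token vector

narratives = ["I wake up gasping for air at night",
              "I sleep fine since I stopped evening coffee"]
labels = ["breathing_related", "normal"]            # toy categories

clf = SVC(kernel="rbf", C=4, gamma=0.01, class_weight="balanced")
clf.fit(cls_embeddings(narratives), labels)
print(clf.predict(cls_embeddings(["loud snoring and choking at night"])))
```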

25 pages, 2920 KiB  
Article
Compiler Identification with Divisive Analysis and Support Vector Machine
by Changlan Liu, Yingsong Zhang, Peng Zuo and Peng Wang
Symmetry 2025, 17(6), 867; https://doi.org/10.3390/sym17060867 - 3 Jun 2025
Viewed by 445
Abstract
Compilers play a crucial role in software development, as most software must be compiled into binaries before release. Analyzing the compiler version from binary files is of great importance in software reverse engineering, maintenance, traceability, and information security. In this work, we propose a novel framework for compiler version identification. Firstly, we generated 1000 C language source codes using CSmith and subsequently compiled them into 16,000 binary files using 16 distinct versions of compilers. The symmetric distribution of the dataset among different compiler versions may ensure unbiased model training. Then, IDA Pro was used to decompile the binary files into assembly instruction sequences. From these sequences, we extracted frequency-based features via the Bag-of-Words (BOW) model and sequence-based features derived from the grey-level co-occurrence matrix (GLCM). Finally, we introduced a divide-and-conquer framework (DIANA-SVM) to effectively classify compiler versions. The experimental results demonstrate that traditional Support Vector Machine (SVM) models struggle to accurately identify compiler versions using compiled executable files. In contrast, DIANA-SVM’s symmetric data separation approach enhances performance, achieving an accuracy of 94% (±0.375%). This framework enables precise identification of high-risk compiler versions, offering a reliable tool for software supply chain security. Theoretically, our GLCM-based sequence modeling and divide-and-conquer framework advance feature extraction methodologies for binary files, offering a scalable solution for similar classification tasks beyond compiler identification. Full article
(This article belongs to the Special Issue Advanced Studies of Symmetry/Asymmetry in Cybersecurity)
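The two feature families named in this abstract can be sketched on a toy opcode sequence: a bag-of-words frequency histogram, and a GLCM-style matrix counting how often one opcode immediately follows another. The vocabulary and sequence are invented; in the paper the input is IDA Pro disassembly.

```python
# Frequency (BOW) features plus GLCM-style co-occurrence features over a toy
# opcode sequence. Real input would come from disassembled binaries.
import numpy as np

VOCAB = {"mov": 0, "add": 1, "cmp": 2, "jne": 3, "call": 4}

def bow_features(opcodes):
    hist = np.zeros(len(VOCAB))
    for op in opcodes:
        hist[VOCAB[op]] += 1
    return hist / max(len(opcodes), 1)              # normalized frequencies

def glcm_features(opcodes, offset=1):
    glcm = np.zeros((len(VOCAB), len(VOCAB)))
    for a, b in zip(opcodes, opcodes[offset:]):
        glcm[VOCAB[a], VOCAB[b]] += 1               # co-occurrence at distance 1
    total = glcm.sum()
    return (glcm / total if total else glcm).ravel()

seq = ["mov", "add", "cmp", "jne", "mov", "call"]
features = np.concatenate([bow_features(seq), glcm_features(seq)])
print(features.shape)   # (30,) -> 5 frequencies + 5x5 co-occurrence matrix
```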

40 pages, 3224 KiB  
Article
A Comparative Study of Image Processing and Machine Learning Methods for Classification of Rail Welding Defects
by Mohale Emmanuel Molefe, Jules Raymond Tapamo and Siboniso Sithembiso Vilakazi
J. Sens. Actuator Netw. 2025, 14(3), 58; https://doi.org/10.3390/jsan14030058 - 29 May 2025
Viewed by 1893
Abstract
Defects formed during the thermite welding of two rail sections require the welded joints to be inspected for quality, and the most widely used non-destructive inspection method is radiography testing. However, the conventional defect investigation process from the obtained radiography images is costly, lengthy, and subjective as it is conducted manually by trained experts. Additionally, it has been shown that most rail breaks occur due to a crack initiated from a weld joint defect that was either misclassified or undetected. To improve the condition monitoring of rails, the railway industry requires an automated defect investigation system capable of detecting and classifying defects automatically. Therefore, this work proposes a method based on image processing and machine learning techniques for the automated investigation of defects. Histogram Equalization methods are first applied to improve image quality. Then, the extraction of the weld joint from the image background is achieved using the Chan–Vese Active Contour Model. A comparative investigation is carried out between Deep Convolution Neural Networks, Local Binary Pattern extractors, and Bag of Visual Words methods (with the Speeded-Up Robust Features extractor) for extracting features in weld joint images. Classification of features extracted by local feature extractors is achieved using Support Vector Machines, K-Nearest Neighbor, and Naive Bayes classifiers. The highest classification accuracy of 95% is achieved by the Deep Convolution Neural Network model. A Graphical User Interface is provided for the onsite investigation of defects. Full article
(This article belongs to the Special Issue AI-Assisted Machine-Environment Interaction)
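A rough sketch of the bag-of-visual-words step described above. ORB stands in for the Speeded-Up Robust Features extractor (SURF is patented and often absent from OpenCV builds), and the vocabulary size is arbitrary; the resulting histograms would feed the SVM, k-NN, or naive Bayes classifiers the paper compares.

```python
# Bag of visual words: extract local descriptors per image, cluster them into
# a visual vocabulary, and describe each image as a histogram of word counts.
# Input: a list of grayscale uint8 images with enough texture to yield many
# ORB descriptors (needed for the 50-word vocabulary).
import cv2
import numpy as np
from sklearn.cluster import KMeans

def bovw_histograms(images, vocab_size=50):
    orb = cv2.ORB_create()
    per_image = []
    for img in images:
        _, desc = orb.detectAndCompute(img, None)
        desc = np.float32(desc) if desc is not None else np.zeros((0, 32), np.float32)
        per_image.append(desc)
    vocab = KMeans(n_clusters=vocab_size, n_init=10).fit(np.vstack(per_image))
    hists = []
    for desc in per_image:
        words = vocab.predict(desc) if len(desc) else []
        hist, _ = np.histogram(words, bins=np.arange(vocab_size + 1))
        hists.append(hist / max(len(desc), 1))       # normalized word counts
    return np.array(hists)
# The histograms then feed an SVM / k-NN / naive Bayes classifier.
```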

14 pages, 1656 KiB  
Article
A Hybrid Learning Framework for Enhancing Bridge Damage Prediction
by Amal Abdulbaqi Maryoosh, Saeid Pashazadeh and Pedram Salehpour
Appl. Syst. Innov. 2025, 8(3), 61; https://doi.org/10.3390/asi8030061 - 30 Apr 2025
Cited by 1 | Viewed by 626
Abstract
Bridges are crucial structures for transportation networks, and their structural integrity is paramount. Deterioration and damage to bridges can lead to significant economic losses, traffic disruptions, and, in severe cases, loss of life. Traditional methods of bridge damage detection, often relying on visual inspections, can be challenging or impossible in critical areas such as roofing, corners, and heights. Therefore, there is a pressing need for automated and accurate techniques for bridge damage detection. This study aims to propose a novel method for bridge crack detection that leverages a hybrid supervised and unsupervised learning strategy. The proposed approach combines the pixel-level local binary pattern (LBP) descriptor with mid-level bag-of-visual-words (BoVW) features for feature extraction, followed by the Apriori algorithm for dimensionality reduction and optimal feature selection. The selected features are then used to train the MobileNet model. The proposed model demonstrates exceptional performance, achieving accuracy rates ranging from 98.27% to 100%, with error rates between 1.73% and 0% across multiple bridge damage datasets. This study contributes a reliable hybrid learning framework for minimizing error rates in bridge damage detection, showcasing the potential of combining LBP–BoVW features with MobileNet for image-based classification tasks. Full article
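The pixel-level part of this pipeline can be sketched briefly: uniform LBP codes are computed over a grayscale patch and pooled into a histogram, the kind of descriptor the paper combines with BoVW features. The parameters P and R and the random test patch are illustrative.

```python
# Uniform LBP histogram for one grayscale patch (skimage implementation).
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1.0):
    codes = local_binary_pattern(gray, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=np.arange(P + 3))   # P + 2 uniform bins
    return hist / hist.sum()

gray = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in crack patch
print(lbp_histogram(gray).round(3))
```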

20 pages, 3071 KiB  
Article
A Keyframe Extraction Method for Assembly Line Operation Videos Based on Optical Flow Estimation and ORB Features
by Xiaoyu Gao, Hua Xiang, Tongxi Wang, Wei Zhan, Mengxue Xie, Lingxuan Zhang and Muyu Lin
Sensors 2025, 25(9), 2677; https://doi.org/10.3390/s25092677 - 23 Apr 2025
Viewed by 897
Abstract
In modern manufacturing, cameras are widely used to record the full workflow of assembly line workers, enabling video-based operational analysis and management. However, these recordings are often excessively long, leading to high storage demands and inefficient processing. Existing keyframe extraction methods typically apply uniform strategies across all frames, which are ineffective in detecting subtle movements. To address this, we propose a keyframe extraction method tailored for assembly line videos, combining optical flow estimation with ORB-based visual features. Our approach adapts extraction strategies to actions with different motion amplitudes. Each video frame is first encoded into a feature vector using the ORB algorithm and a bag-of-visual-words model. Optical flow is then calculated using the DIS algorithm, allowing frames to be categorized by motion intensity. Adjacent frames within the same category are grouped, and the appropriate number of clusters, k, is determined based on the group’s characteristics. Keyframes are finally selected via k-means++ clustering within each group. The experimental results show that our method achieves a recall rate of 85.2%, with over 90% recall for actions involving minimal movement. Moreover, the method processes an average of 274 frames per second. These results highlight the method’s effectiveness in identifying subtle actions, reducing redundant content, and delivering high accuracy with efficient performance. Full article
(This article belongs to the Section Sensing and Imaging)
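Two pieces of the pipeline described above, sketched under assumed thresholds: DIS optical flow yields a mean motion magnitude per frame pair, frames are bucketed by motion intensity, and keyframes within a bucket are chosen by k-means++ over frame feature vectors (ORB bag-of-visual-words vectors in the paper).

```python
# Bucket frames by DIS optical-flow magnitude, then pick per-bucket keyframes
# via k-means++ clustering of frame feature vectors. The low/high magnitude
# thresholds are assumed, not the paper's values. Frames: grayscale uint8.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def motion_levels(gray_frames, low=0.5, high=2.0):
    dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_FAST)
    levels = [0]                                    # first frame: assume low motion
    for prev, cur in zip(gray_frames, gray_frames[1:]):
        flow = dis.calc(prev, cur, None)
        mag = np.linalg.norm(flow, axis=2).mean()   # mean flow magnitude
        levels.append(0 if mag < low else 1 if mag < high else 2)
    return levels

def keyframes_for_group(frame_vectors, k):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10).fit(frame_vectors)
    # pick the frame closest to each cluster centre as the keyframe
    dists = np.linalg.norm(frame_vectors[:, None] - km.cluster_centers_, axis=2)
    return sorted(set(dists.argmin(axis=0)))

frames = [np.random.randint(0, 256, (64, 64), dtype=np.uint8) for _ in range(4)]
print(motion_levels(frames))
```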

29 pages, 4979 KiB  
Article
Land Cover Classification Model Using Multispectral Satellite Images Based on a Deep Learning Synergistic Semantic Segmentation Network
by Abdorreza Alavi Gharahbagh, Vahid Hajihashemi, José J. M. Machado and João Manuel R. S. Tavares
Sensors 2025, 25(7), 1988; https://doi.org/10.3390/s25071988 - 22 Mar 2025
Cited by 1 | Viewed by 1771
Abstract
Land cover classification (LCC) using satellite images is one of the rapidly expanding fields in mapping, highlighting the need for updating existing computational classification methods. Advances in technology and the increasing variety of applications have introduced challenges, such as more complex classes and a demand for greater detail. In recent years, deep learning and Convolutional Neural Networks (CNNs) have significantly enhanced the segmentation of satellite images. Since the training of CNNs requires sophisticated and expensive hardware and significant time, using pre-trained networks has become widespread in the segmentation of satellite images. This study proposes a hybrid synergistic semantic segmentation method based on the Deeplab v3+ network and a clustering-based post-processing scheme. The proposed method accurately classifies various land cover (LC) types in multispectral satellite images, including Pastures, Other Built-Up Areas, Water Bodies, Urban Areas, Grasslands, Forest, Farmland, and Others. The post-processing scheme includes a spectral bag-of-words model and K-medoids clustering to refine the Deeplab v3+ outputs and correct possible errors. The simulation results indicate that combining the post-processing scheme with deep learning improves the Matthews correlation coefficient (MCC) by approximately 5.7% compared to the baseline method. Additionally, the proposed approach is robust to data imbalance cases and can dynamically update its codewords over different seasons. Finally, the proposed synergistic semantic segmentation method was compared with several state-of-the-art segmentation methods in satellite images of Italy’s Lake Garda (Lago di Garda) region. The results showed that the proposed method outperformed the best existing techniques by at least 6% in terms of MCC. Full article
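One plausible reading of the post-processing scheme described above, sketched with invented sizes: a small k-medoids pass over pixel spectra plays the role of the spectral bag-of-words codebook, and each codeword's pixels are relabeled by the majority class Deeplab v3+ assigned to them. This is an illustrative simplification, not the paper's exact procedure.

```python
# Tiny alternating k-medoids over pixel spectra, followed by majority-vote
# relabeling per codeword. Sizes and data are illustrative only.
import numpy as np

def kmedoids(X, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - X[medoids][None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):                       # medoid = member minimizing
            members = np.where(assign == j)[0]   # intra-cluster distance
            if len(members):
                within = np.linalg.norm(X[members][:, None] - X[members][None], axis=2)
                medoids[j] = members[within.sum(axis=1).argmin()]
    return assign, medoids

def refine_labels(spectra, cnn_labels, k=4):
    assign, _ = kmedoids(spectra, k)
    refined = cnn_labels.copy()
    for j in range(k):
        members = np.where(assign == j)[0]
        if len(members):
            vals, counts = np.unique(cnn_labels[members], return_counts=True)
            refined[members] = vals[counts.argmax()]   # majority vote per codeword
    return refined

spectra = np.random.rand(200, 4)                 # 200 pixels, 4 spectral bands
cnn_labels = np.random.randint(0, 3, 200)        # Deeplab v3+ class per pixel
print(refine_labels(spectra, cnn_labels)[:10])
```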

32 pages, 1286 KiB  
Article
Real-Time Fuzzy Record-Matching Similarity Metric and Optimal Q-Gram Filter
by Ondřej Rozinek, Jaroslav Marek, Jan Panuš and Jan Mareš
Algorithms 2025, 18(3), 150; https://doi.org/10.3390/a18030150 - 6 Mar 2025
Cited by 1 | Viewed by 1037
Abstract
In this paper, we introduce an advanced Fuzzy Record Similarity Metric (FRMS) that improves approximate record matching and models human perception of record similarity. The FRMS utilizes a newly developed similarity space with favorable properties combined with a metric space, employing a bag-of-words model with general applications in text mining and cluster analysis. To optimize the FRMS, we propose a two-stage method for approximate string matching and search that outperforms baseline methods in terms of average time complexity and F-measure on various datasets. In the first stage, we construct an optimal Q-gram count filter as an optimal lower bound for fuzzy token similarities such as FRMS. The approximated Q-gram count filter achieves a high accuracy rate, filtering over 99% of dissimilar records, with a constant time complexity of O(1). In the second stage, FRMS runs in approximately O(n⁴) polynomial time and models human perception of record similarity by maximum weight matching in a bipartite graph. The FRMS architecture has widespread applications in structured document storage such as databases and has already been commercialized by one of the largest IT companies. As a side result, we explain the behavior of the singularity of the Q-gram filter and the advantages of a padding extension. Overall, our method provides a more accurate and efficient approach to approximate string matching and search with real-time runtime. Full article
(This article belongs to the Section Analysis of Algorithms and Complexity Theory)
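A toy version of the two-stage idea described above: a cheap q-gram overlap check (with the padding extension the paper mentions) filters out clearly dissimilar records, and only survivors reach the expensive fuzzy matcher. The 0.3 overlap threshold is an arbitrary stand-in for the optimal bound derived in the paper.

```python
# Stage 1: cheap q-gram overlap filter; stage 2 (the fuzzy matcher) only sees
# the pairs that survive. Threshold and padding character are illustrative.
from collections import Counter

def qgrams(s, q=2):
    s = "#" * (q - 1) + s + "#" * (q - 1)          # padding extension
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))

def qgram_filter(a, b, q=2, threshold=0.3):
    """Linear-time pre-check: share of common q-grams (multiset overlap)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    common = sum((ga & gb).values())
    return common / max(sum(ga.values()), sum(gb.values())) >= threshold

pairs = [("John Smith", "Jon Smith"), ("John Smith", "Maria Lopez")]
candidates = [p for p in pairs if qgram_filter(*p)]
print(candidates)   # only the first pair survives to the expensive FRMS stage
```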

19 pages, 3143 KiB  
Article
Non-Convex Metric Learning-Based Trajectory Clustering Algorithm
by Xiaoyan Lei and Hongyan Wang
Mathematics 2025, 13(3), 387; https://doi.org/10.3390/math13030387 - 24 Jan 2025
Viewed by 561
Abstract
To address the issue of suboptimal clustering performance arising from the limitations of distance measurement in traditional trajectory clustering methods, this paper presents a novel trajectory clustering strategy that integrates the bag-of-words model with non-convex metric learning. Initially, the strategy extracts motion characteristic parameters from trajectory points. Subsequently, based on the minimum description length criterion, trajectories are segmented into several homogeneous segments, and statistical properties for each segment are computed. A non-convex metric learning mechanism is then introduced to enhance similarity evaluation accuracy. Furthermore, by combining the bag-of-words model with a non-convex metric learning algorithm, segmented trajectory fragments are transformed into fixed-length feature descriptors. Finally, the K-means method and the proposed non-convex metric learning algorithm are utilized to analyze the feature descriptors, and hence effective clustering of trajectories is achieved. Experimental results demonstrate that the proposed method exhibits superior clustering performance compared to the state-of-the-art trajectory clustering approaches. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
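A simplified sketch of the descriptor construction described above: trajectory segments are summarized by motion statistics, a k-means codebook stands in for the bag-of-words vocabulary (the paper additionally learns a non-convex metric), and each trajectory becomes a fixed-length histogram of codeword counts.

```python
# Segment statistics -> codebook -> per-trajectory codeword histogram.
# Each segment is an (n, 2) array of points; all sizes are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def segment_stats(segment):
    """Mean speed and mean heading change of one (n, 2) point segment."""
    v = np.diff(segment, axis=0)
    speed = np.linalg.norm(v, axis=1)
    heading = np.arctan2(v[:, 1], v[:, 0])
    turn = np.abs(np.diff(heading)).mean() if len(heading) > 1 else 0.0
    return [speed.mean(), turn]

def trajectory_histograms(trajectories, vocab_size=3, seed=0):
    segs = [segment_stats(s) for traj in trajectories for s in traj]
    codebook = KMeans(n_clusters=vocab_size, n_init=10, random_state=seed).fit(segs)
    hists = []
    for traj in trajectories:
        words = codebook.predict([segment_stats(s) for s in traj])
        hists.append(np.bincount(words, minlength=vocab_size) / len(traj))
    return np.array(hists)   # fixed-length descriptors, one per trajectory

rng = np.random.default_rng(0)
trajectories = [[rng.random((5, 2)) for _ in range(4)] for _ in range(3)]
print(trajectory_histograms(trajectories))
```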

20 pages, 3018 KiB  
Article
Global Semantic Localization from Abstract Ellipse-Ellipsoid Model and Object-Level Instance Topology
by Heng Wu, Yanjie Liu, Chao Wang and Yanlong Wei
Remote Sens. 2024, 16(22), 4187; https://doi.org/10.3390/rs16224187 - 10 Nov 2024
Viewed by 1208
Abstract
Robust and highly accurate localization using a camera is a challenging task when appearance varies significantly. In indoor environments, changes in illumination and object occlusion can have a significant impact on visual localization. In this paper, we propose a visual localization method based on an ellipse-ellipsoid model, combined with object-level instance topology and alignment. First, we develop a CNN-based (Convolutional Neural Network) ellipse prediction network, DEllipse-Net, which integrates depth information with RGB data to estimate the projection of ellipsoids onto images. Second, we model environments using 3D (Three-dimensional) ellipsoids, instance topology, and ellipsoid descriptors. Finally, the detected ellipses are aligned with the ellipsoids in the environment through semantic object association, and 6-DoF (Degree of Freedom) pose estimation is performed using the ellipse-ellipsoid model. In the bounding box noise experiment, DEllipse-Net demonstrates higher robustness compared to other methods, achieving the highest prediction accuracy for 11 out of 23 objects in ellipse prediction. In the localization test with 15 pixels of noise, we achieve ATE (Absolute Translation Error) and ARE (Absolute Rotation Error) of 0.077 m and 2.70° in the fr2_desk sequence. Additionally, DEllipse-Net is lightweight and highly portable, with a model size of only 18.6 MB, and a single model can handle all objects. In the object-level instance topology and alignment experiment, our topology and alignment methods significantly enhance the global localization accuracy of the ellipse-ellipsoid model. In experiments involving lighting changes and occlusions, our method achieves more robust global localization compared to the classical bag-of-words based localization method and other ellipse-ellipsoid localization methods. Full article

21 pages, 1242 KiB  
Article
A Bag-of-Words Approach for Information Extraction from Electricity Invoices
by Javier Sánchez and Giovanny A. Cuervo-Londoño
AI 2024, 5(4), 1837-1857; https://doi.org/10.3390/ai5040091 - 8 Oct 2024
Viewed by 1586
Abstract
In the context of digitization and automation, extracting relevant information from business documents remains a significant challenge. It is typical to rely on machine-learning techniques to automate the process, reduce manual labor, and minimize errors. This work introduces a new model for extracting key values from electricity invoices, including customer data, bill breakdown, electricity consumption, and marketer data. We evaluate several machine learning techniques, such as Naive Bayes, Logistic Regression, Random Forests, and Support Vector Machines. Our approach relies on a bag-of-words strategy and custom-designed features tailored for electricity data. We validate our method on the IDSEM dataset, which includes 75,000 electricity invoices with eighty-six fields. The model converts PDF invoices into text and processes each word separately using a context of eleven words. The results of our experiments indicate that Support Vector Machines and Random Forests perform exceptionally well in capturing numerous values with high precision. The study also explores the advantages of our custom features and evaluates performance on unseen documents. The precision obtained with Support Vector Machines is 91.86% on average, peaking at 98.47% for one document template. These results demonstrate the effectiveness of our method in accurately extracting key values from invoices. Full article
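A minimal sketch of the per-word setup this abstract describes: every word is classified from a context window of eleven words (five on each side), here hashed into bag-of-words features for a linear SVM. The label set, padding token, and invoice lines are invented, and the paper adds custom features tailored to electricity data.

```python
# Classify each word of an invoice from its 11-word context window.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.svm import LinearSVC

PAD, HALF = "<pad>", 5          # 5 + 1 + 5 = 11-word context

def windows(words):
    padded = [PAD] * HALF + words + [PAD] * HALF
    return [" ".join(padded[i:i + 2 * HALF + 1]) for i in range(len(words))]

vec = HashingVectorizer(n_features=2**12)
words = "Total amount due 48.27 EUR for contract 0012".split()
labels = ["O", "O", "O", "total_amount", "currency", "O", "O", "contract_id"]

clf = LinearSVC().fit(vec.transform(windows(words)), labels)
new = "Invoice total 19.90 EUR".split()
print(list(zip(new, clf.predict(vec.transform(windows(new))))))
```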
