Search Results (168)

Search Parameters:
Keywords = bag of words model

18 pages, 591 KiB  
Article
Active Learning for Medical Article Classification with Bag of Words and Bag of Concepts Embeddings
by Radosław Pytlak, Paweł Cichosz, Bartłomiej Fajdek and Bogdan Jastrzębski
Appl. Sci. 2025, 15(14), 7955; https://doi.org/10.3390/app15147955 - 17 Jul 2025
Viewed by 140
Abstract
Systems supporting systematic literature reviews often use machine learning algorithms to create classification models to assess the relevance of articles to study topics. The proper choice of text representation for such algorithms may have a significant impact on their predictive performance. This article presents an in-depth investigation of the utility of the bag of concepts representation for this purpose, which can be considered an enhanced form of the ubiquitous bag of words representation, with features corresponding to ontology concepts rather than words. Its utility is evaluated in the active learning setting, in which a sequence of classification models is created, with training data iteratively expanded by adding articles selected for human screening. Different versions of the bag of concepts are compared with bag of words, as well as with combined representations, including both word-based and concept-based features. The evaluation uses the support vector machine, naive Bayes, and random forest algorithms and is performed on datasets from 15 systematic medical literature review studies. The results show that concept-based features may have additional predictive value in comparison to standard word-based features and that the combined bag of concepts and bag of words representation is the most useful overall. Full article
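A rough sketch of the combined representation discussed in this abstract follows: word-count features and concept-count features are vectorized separately and concatenated. The TOY_ONTOLOGY mapping and the extract_concepts() helper are hypothetical stand-ins for a real medical ontology annotator, which the abstract does not name.

```python
# A rough sketch of the combined representation: word-count features and
# concept-count features are vectorized separately and concatenated.
# TOY_ONTOLOGY and extract_concepts() are hypothetical stand-ins for a
# real medical ontology annotator, which the abstract does not name.
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

TOY_ONTOLOGY = {"diabetes": "C0011849", "insulin": "C0021641"}  # invented IDs

def extract_concepts(text):
    # Map words to concept identifiers; a real annotator would do far more.
    return " ".join(TOY_ONTOLOGY[w] for w in text.lower().split()
                    if w in TOY_ONTOLOGY)

docs = ["insulin therapy in diabetes", "graph entropy for robot SLAM"]
labels = [1, 0]  # relevant / irrelevant to the review topic

word_vec = CountVectorizer()
concept_vec = CountVectorizer(token_pattern=r"\S+")
X_words = word_vec.fit_transform(docs)
X_concepts = concept_vec.fit_transform([extract_concepts(d) for d in docs])
X = hstack([X_words, X_concepts])  # combined bag of words + bag of concepts

model = LinearSVC().fit(X, labels)
```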

34 pages, 5774 KiB  
Article
Approach to Semantic Visual SLAM for Bionic Robots Based on Loop Closure Detection with Combinatorial Graph Entropy in Complex Dynamic Scenes
by Dazheng Wang and Jingwen Luo
Biomimetics 2025, 10(7), 446; https://doi.org/10.3390/biomimetics10070446 - 6 Jul 2025
Viewed by 359
Abstract
In complex dynamic environments, the performance of SLAM systems on bionic robots is susceptible to interference from dynamic objects or structural changes in the environment. To address this problem, we propose a semantic visual SLAM (vSLAM) algorithm based on loop closure detection with combinatorial graph entropy. First, based on the dynamic feature detection results of YOLOv8-seg, feature points at the edges of dynamic objects are judged precisely by calculating the mean absolute deviation (MAD) of pixel depths. Then, a high-quality keyframe selection strategy is constructed by combining semantic information, the average coordinates of semantic objects, and the degree of variation in dense regions of feature points. Subsequently, unweighted and weighted graphs of keyframes are constructed according to the distribution of feature points, characterization points, and semantic information, and a high-performance loop closure detection method based on combinatorial graph entropy is developed on top of them. The experimental results show that our loop closure detection approach achieves higher precision and recall in real scenes than the bag-of-words (BoW) model. Compared with ORB-SLAM2, absolute trajectory accuracy in highly dynamic sequences improved by an average of 97.01%, while the number of extracted keyframes decreased by an average of 61.20%. Full article
(This article belongs to the Special Issue Artificial Intelligence for Autonomous Robots: 3rd Edition)
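For context, the bag-of-words baseline that this approach is compared against typically scores loop candidates by the similarity of visual-word histograms. A minimal sketch follows, with a random vocabulary, random word assignments, and an illustrative threshold standing in for real descriptors and a trained vocabulary.

```python
# Each keyframe is summarized as a normalized histogram over a visual
# vocabulary; a loop closure candidate is the most similar past keyframe.
import numpy as np

def bow_histogram(word_ids, vocab_size):
    hist = np.bincount(word_ids, minlength=vocab_size).astype(float)
    return hist / max(hist.sum(), 1.0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
vocab_size = 1000  # illustrative vocabulary size
current = bow_histogram(rng.integers(0, vocab_size, 500), vocab_size)
past = [bow_histogram(rng.integers(0, vocab_size, 500), vocab_size)
        for _ in range(50)]

scores = [cosine(current, h) for h in past]
best = int(np.argmax(scores))
if scores[best] > 0.8:  # illustrative acceptance threshold
    print(f"loop closure candidate: keyframe {best}")
```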

18 pages, 839 KiB  
Article
From Narratives to Diagnosis: A Machine Learning Framework for Classifying Sleep Disorders in Aging Populations: The sleepCare Platform
by Christos A. Frantzidis
Brain Sci. 2025, 15(7), 667; https://doi.org/10.3390/brainsci15070667 - 20 Jun 2025
Viewed by 915
Abstract
Background/Objectives: Sleep disorders are prevalent among aging populations and are often linked to cognitive decline, chronic conditions, and reduced quality of life. Traditional diagnostic methods, such as polysomnography, are resource-intensive and limited in accessibility. Meanwhile, individuals frequently describe their sleep experiences through unstructured narratives in clinical notes, online forums, and telehealth platforms. This study proposes a machine learning pipeline (sleepCare) that classifies sleep-related narratives into clinically meaningful categories, including stress-related, neurodegenerative, and breathing-related disorders. The proposed framework employs natural language processing (NLP) and machine learning techniques to support remote applications and real-time patient monitoring, offering a scalable solution for the early identification of sleep disturbances. Methods: sleepCare consists of a three-tiered classification pipeline for analyzing narrative sleep reports. First, a baseline model used a Multinomial Naïve Bayes classifier with n-gram features from a Bag-of-Words representation. Next, a Support Vector Machine (SVM) was trained on GloVe-based word embeddings to capture semantic context. Finally, a transformer-based model (BERT) was fine-tuned to extract contextual embeddings, using the [CLS] token as input for SVM classification. Each model was evaluated using stratified train-test splits and 10-fold cross-validation. Hyperparameter tuning via GridSearchCV optimized performance. The dataset contained 475 labeled sleep narratives, classified into five etiological categories relevant for clinical interpretation. Results: The transformer-based model utilizing BERT embeddings and an optimized Support Vector Machine classifier achieved an overall accuracy of 81% on the test set. Class-wise F1-scores ranged from 0.72 to 0.91, with the highest performance observed in classifying normal or improved sleep (F1 = 0.91). The macro average F1-score was 0.78, indicating balanced performance across all categories. GridSearchCV identified the optimal SVM parameters (C = 4, kernel = ‘rbf’, gamma = 0.01, degree = 2, class_weight = ‘balanced’). The confusion matrix revealed robust classification with limited misclassifications, particularly between overlapping symptom categories such as stress-related and neurodegenerative sleep disturbances. Conclusions: Unlike generic large language model applications, our approach emphasizes the personalized identification of sleep symptomatology through targeted classification of the narrative input. By integrating structured learning with contextual embeddings, the framework offers a clinically meaningful, scalable solution for early detection and differentiation of sleep disorders in diverse, real-world, and remote settings. Full article
(This article belongs to the Special Issue Perspectives of Artificial Intelligence (AI) in Aging Neuroscience)
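A minimal sketch of the first tier described in the abstract, a Multinomial Naive Bayes classifier over n-gram bag-of-words features; the two narratives and labels below are invented placeholders for the study's 475 labeled reports.

```python
# Tier 1: Multinomial Naive Bayes over unigram/bigram bag-of-words counts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

narratives = [
    "I wake up gasping for air several times a night",
    "Racing thoughts keep me awake until 3 a.m.",
]
labels = ["breathing-related", "stress-related"]

baseline = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # unigram and bigram features
    MultinomialNB(),
)
baseline.fit(narratives, labels)
print(baseline.predict(["short of breath when lying down"]))
```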

25 pages, 2920 KiB  
Article
Compiler Identification with Divisive Analysis and Support Vector Machine
by Changlan Liu, Yingsong Zhang, Peng Zuo and Peng Wang
Symmetry 2025, 17(6), 867; https://doi.org/10.3390/sym17060867 - 3 Jun 2025
Viewed by 428
Abstract
Compilers play a crucial role in software development, as most software must be compiled into binaries before release. Analyzing the compiler version from binary files is of great importance in software reverse engineering, maintenance, traceability, and information security. In this work, we propose a novel framework for compiler version identification. First, we generated 1000 C source files using CSmith and subsequently compiled them into 16,000 binary files using 16 distinct compiler versions. The symmetric distribution of the dataset among different compiler versions may ensure unbiased model training. Then, IDA Pro was used to disassemble the binary files into assembly instruction sequences. From these sequences, we extracted frequency-based features via the Bag-of-Words (BOW) model and sequence-based features derived from the grey-level co-occurrence matrix (GLCM). Finally, we introduced a divide-and-conquer framework (DIANA-SVM) to effectively classify compiler versions. The experimental results demonstrate that traditional Support Vector Machine (SVM) models struggle to accurately identify compiler versions from compiled executable files. In contrast, DIANA-SVM’s symmetric data separation approach enhances performance, achieving an accuracy of 94% (±0.375%). This framework enables precise identification of high-risk compiler versions, offering a reliable tool for software supply chain security. Theoretically, our GLCM-based sequence modeling and divide-and-conquer framework advance feature extraction methodologies for binary files, offering a scalable solution for similar classification tasks beyond compiler identification. Full article
(This article belongs to the Special Issue Advanced Studies of Symmetry/Asymmetry in Cybersecurity)
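The frequency-based features can be pictured as opcode counts over the disassembled instruction stream. The sketch below is a generic illustration, with a made-up instruction sequence and opcode vocabulary rather than the paper's IDA Pro output.

```python
# Count opcode mnemonics over a fixed vocabulary to form a BoW vector.
from collections import Counter

def opcode_bow(instructions, vocabulary):
    counts = Counter(ins.split()[0] for ins in instructions)
    return [counts.get(op, 0) for op in vocabulary]

asm = ["mov eax, 1", "add eax, ebx", "mov ecx, eax", "ret"]  # invented
vocab = ["mov", "add", "sub", "call", "ret"]  # illustrative opcode set
print(opcode_bow(asm, vocab))  # [2, 1, 0, 0, 1]
```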

40 pages, 3224 KiB  
Article
A Comparative Study of Image Processing and Machine Learning Methods for Classification of Rail Welding Defects
by Mohale Emmanuel Molefe, Jules Raymond Tapamo and Siboniso Sithembiso Vilakazi
J. Sens. Actuator Netw. 2025, 14(3), 58; https://doi.org/10.3390/jsan14030058 - 29 May 2025
Viewed by 1694
Abstract
Because defects can form during the thermite welding of two rail sections, welded joints must be inspected for quality, and the most widely used non-destructive inspection method is radiography testing. However, the conventional defect investigation process from the obtained radiography images is costly, lengthy, and subjective, as it is conducted manually by trained experts. Additionally, it has been shown that most rail breaks occur due to a crack initiated from a weld joint defect that was either misclassified or undetected. To improve the condition monitoring of rails, the railway industry requires an automated defect investigation system capable of detecting and classifying defects automatically. Therefore, this work proposes a method based on image processing and machine learning techniques for the automated investigation of defects. Histogram Equalization methods are first applied to improve image quality. Then, the extraction of the weld joint from the image background is achieved using the Chan–Vese Active Contour Model. A comparative investigation is carried out between Deep Convolution Neural Networks, Local Binary Pattern extractors, and Bag of Visual Words methods (with the Speeded-Up Robust Features extractor) for extracting features in weld joint images. Classification of features extracted by local feature extractors is achieved using Support Vector Machines, K-Nearest Neighbor, and Naive Bayes classifiers. The highest classification accuracy of 95% is achieved by the Deep Convolution Neural Network model. A Graphical User Interface is provided for the onsite investigation of defects. Full article
(This article belongs to the Special Issue AI-Assisted Machine-Environment Interaction)
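As one concrete example of the compared extractors, a Local Binary Pattern histogram can be computed with scikit-image as sketched below; the radius, neighbor count, and the random stand-in image are assumptions, not the paper's settings.

```python
# Uniform LBP codes are pooled into a normalized histogram per image.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_image, points=8, radius=1):
    lbp = local_binary_pattern(gray_image, points, radius, method="uniform")
    n_bins = points + 2  # "uniform" LBP yields P + 2 distinct codes
    hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
    return hist

weld = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # stand-in image
features = lbp_histogram(weld)  # input to SVM / KNN / Naive Bayes
```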

14 pages, 1656 KiB  
Article
A Hybrid Learning Framework for Enhancing Bridge Damage Prediction
by Amal Abdulbaqi Maryoosh, Saeid Pashazadeh and Pedram Salehpour
Appl. Syst. Innov. 2025, 8(3), 61; https://doi.org/10.3390/asi8030061 - 30 Apr 2025
Cited by 1 | Viewed by 588
Abstract
Bridges are crucial structures for transportation networks, and their structural integrity is paramount. Deterioration and damage to bridges can lead to significant economic losses, traffic disruptions, and, in severe cases, loss of life. Traditional methods of bridge damage detection, often relying on visual inspections, can be challenging or impossible in critical areas such as roofing, corners, and heights. Therefore, there is a pressing need for automated and accurate techniques for bridge damage detection. This study proposes a novel method for bridge crack detection that leverages a hybrid supervised and unsupervised learning strategy. The proposed approach combines the pixel-level local binary pattern (LBP) descriptor with mid-level bag of visual words (BoVW) features for feature extraction, followed by the Apriori algorithm for dimensionality reduction and optimal feature selection. A MobileNet model is then trained on the selected features. The proposed model demonstrates exceptional performance, achieving accuracy rates ranging from 98.27% to 100%, with error rates between 0% and 1.73% across multiple bridge damage datasets. This study contributes a reliable hybrid learning framework for minimizing error rates in bridge damage detection, showcasing the potential of combining LBP–BoVW features with MobileNet for image-based classification tasks. Full article
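A minimal sketch of the bag-of-visual-words step used here: cluster local descriptors into a codebook, then describe each image by its histogram of codeword assignments. The random descriptors and codebook size are illustrative, not the paper's configuration.

```python
# Learn a codebook from training descriptors, then histogram assignments.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(5000, 32))  # stand-in local descriptors
codebook = KMeans(n_clusters=64, n_init=10).fit(train_descriptors)

def bovw_vector(descriptors, codebook):
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

image_descriptors = rng.normal(size=(300, 32))
x = bovw_vector(image_descriptors, codebook)  # per-image BoVW feature vector
```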

20 pages, 3071 KiB  
Article
A Keyframe Extraction Method for Assembly Line Operation Videos Based on Optical Flow Estimation and ORB Features
by Xiaoyu Gao, Hua Xiang, Tongxi Wang, Wei Zhan, Mengxue Xie, Lingxuan Zhang and Muyu Lin
Sensors 2025, 25(9), 2677; https://doi.org/10.3390/s25092677 - 23 Apr 2025
Viewed by 845
Abstract
In modern manufacturing, cameras are widely used to record the full workflow of assembly line workers, enabling video-based operational analysis and management. However, these recordings are often excessively long, leading to high storage demands and inefficient processing. Existing keyframe extraction methods typically apply uniform strategies across all frames, which are ineffective in detecting subtle movements. To address this, we propose a keyframe extraction method tailored for assembly line videos, combining optical flow estimation with ORB-based visual features. Our approach adapts extraction strategies to actions with different motion amplitudes. Each video frame is first encoded into a feature vector using the ORB algorithm and a bag-of-visual-words model. Optical flow is then calculated using the DIS algorithm, allowing frames to be categorized by motion intensity. Adjacent frames within the same category are grouped, and the appropriate number of clusters, k, is determined based on the group’s characteristics. Keyframes are finally selected via k-means++ clustering within each group. The experimental results show that our method achieves a recall rate of 85.2%, with over 90% recall for actions involving minimal movement. Moreover, the method processes an average of 274 frames per second. These results highlight the method’s effectiveness in identifying subtle actions, reducing redundant content, and delivering high accuracy with efficient performance. Full article
(This article belongs to the Section Sensing and Imaging)
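A sketch of the motion-intensity bucketing described above, using OpenCV's DIS optical flow; the category thresholds are invented for illustration and are not the paper's values.

```python
# Bucket frames by mean DIS optical-flow magnitude between frame pairs.
import cv2
import numpy as np

dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_FAST)

def motion_category(prev_gray, gray, low=0.5, high=2.0):
    flow = dis.calc(prev_gray, gray, None)           # H x W x 2 flow field
    magnitude = np.linalg.norm(flow, axis=2).mean()  # mean motion intensity
    if magnitude < low:
        return "subtle"
    return "moderate" if magnitude < high else "large"

prev_gray = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
gray = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
print(motion_category(prev_gray, gray))
```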

29 pages, 4979 KiB  
Article
Land Cover Classification Model Using Multispectral Satellite Images Based on a Deep Learning Synergistic Semantic Segmentation Network
by Abdorreza Alavi Gharahbagh, Vahid Hajihashemi, José J. M. Machado and João Manuel R. S. Tavares
Sensors 2025, 25(7), 1988; https://doi.org/10.3390/s25071988 - 22 Mar 2025
Cited by 1 | Viewed by 1672
Abstract
Land cover classification (LCC) using satellite images is one of the rapidly expanding fields in mapping, highlighting the need for updating existing computational classification methods. Advances in technology and the increasing variety of applications have introduced challenges, such as more complex classes and a demand for greater detail. In recent years, deep learning and Convolutional Neural Networks (CNNs) have significantly enhanced the segmentation of satellite images. Since the training of CNNs requires sophisticated and expensive hardware and significant time, using pre-trained networks has become widespread in the segmentation of satellite images. This study proposes a hybrid synergistic semantic segmentation method based on the Deeplab v3+ network and a clustering-based post-processing scheme. The proposed method accurately classifies various land cover (LC) types in multispectral satellite images, including Pastures, Other Built-Up Areas, Water Bodies, Urban Areas, Grasslands, Forest, Farmland, and Others. The post-processing scheme includes a spectral bag-of-words model and K-medoids clustering to refine the Deeplab v3+ outputs and correct possible errors. The simulation results indicate that combining the post-processing scheme with deep learning improves the Matthews correlation coefficient (MCC) by approximately 5.7% compared to the baseline method. Additionally, the proposed approach is robust to data imbalance and can dynamically update its codewords over different seasons. Finally, the proposed synergistic semantic segmentation method was compared with several state-of-the-art segmentation methods on satellite images of Italy’s Lake Garda (Lago di Garda) region. The results showed that the proposed method outperformed the best existing techniques by at least 6% in terms of MCC. Full article
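A rough sketch of the post-processing idea: quantize per-pixel spectra into a small codebook and relabel pixels by majority vote within each codeword. KMeans stands in here for the paper's K-medoids, and the data shapes and class count are illustrative.

```python
# Quantize pixel spectra into codewords, then relabel each codeword's
# pixels by the majority of the CNN predictions inside it.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
spectra = rng.normal(size=(10000, 8))        # pixels x spectral bands
cnn_labels = rng.integers(0, 8, size=10000)  # stand-in Deeplab v3+ output

codebook = KMeans(n_clusters=32, n_init=10).fit(spectra)
words = codebook.labels_

refined = cnn_labels.copy()
for w in range(32):
    members = words == w
    # Majority vote within a codeword smooths isolated CNN errors.
    refined[members] = np.bincount(cnn_labels[members], minlength=8).argmax()
```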

32 pages, 1286 KiB  
Article
Real-Time Fuzzy Record-Matching Similarity Metric and Optimal Q-Gram Filter
by Ondřej Rozinek, Jaroslav Marek, Jan Panuš and Jan Mareš
Algorithms 2025, 18(3), 150; https://doi.org/10.3390/a18030150 - 6 Mar 2025
Viewed by 984
Abstract
In this paper, we introduce an advanced Fuzzy Record Similarity Metric (FRMS) that improves approximate record matching and models human perception of record similarity. The FRMS utilizes a newly developed similarity space with favorable properties combined with a metric space, employing a bag-of-words model with general applications in text mining and cluster analysis. To optimize the FRMS, we propose a two-stage method for approximate string matching and search that outperforms baseline methods in terms of average time complexity and F-measure on various datasets. In the first stage, we construct an optimal Q-gram count filter as an optimal lower bound for fuzzy token similarities such as FRMS. The approximated Q-gram count filter achieves a high accuracy rate, filtering out over 99% of dissimilar records with a constant time complexity of O(1). In the second stage, FRMS runs in polynomial time, approximately O(n⁴), and models human perception of record similarity by maximum weight matching in a bipartite graph. The FRMS architecture has widespread applications in structured document storage such as databases and has already been commercialized by one of the largest IT companies. As a side result, we explain the behavior of the singularity of the Q-gram filter and the advantages of a padding extension. Overall, our method provides a more accurate and efficient approach to approximate string matching and search in real time. Full article
(This article belongs to the Section Analysis of Algorithms and Complexity Theory)
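The first stage can be pictured as a cheap count filter over shared q-grams, as sketched below; this generic lower-bound rule and threshold are illustrative and do not reproduce the paper's optimal filter construction.

```python
# Keep a record pair only if its strings share enough q-grams.
def qgrams(s, q=2):
    s = "#" * (q - 1) + s + "#" * (q - 1)  # padding extension
    return [s[i:i + q] for i in range(len(s) - q + 1)]

def passes_filter(a, b, q=2, min_shared=2):
    shared = set(qgrams(a, q)) & set(qgrams(b, q))
    return len(shared) >= min_shared

print(passes_filter("Jon Smith", "John Smith"))  # True: kept for FRMS
print(passes_filter("Jon Smith", "Acme Corp."))  # False: filtered out
```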

19 pages, 3143 KiB  
Article
Non-Convex Metric Learning-Based Trajectory Clustering Algorithm
by Xiaoyan Lei and Hongyan Wang
Mathematics 2025, 13(3), 387; https://doi.org/10.3390/math13030387 - 24 Jan 2025
Viewed by 550
Abstract
To address the issue of suboptimal clustering performance arising from the limitations of distance measurement in traditional trajectory clustering methods, this paper presents a novel trajectory clustering strategy that integrates the bag-of-words model with non-convex metric learning. Initially, the strategy extracts motion characteristic parameters from trajectory points. Subsequently, based on the minimum description length criterion, trajectories are segmented into several homogeneous segments, and statistical properties for each segment are computed. A non-convex metric learning mechanism is then introduced to enhance similarity evaluation accuracy. Furthermore, by combining a bag-of-words model with a non-convex metric learning algorithm, segmented trajectory fragments are transformed into fixed-length feature descriptors. Finally, the K-means method and the proposed non-convex metric learning algorithm are utilized to analyze the feature descriptors, and hence, the effective clustering of trajectories can be achieved. Experimental results demonstrate that the proposed method exhibits superior clustering performance compared to the state-of-the-art trajectory clustering approaches. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)

20 pages, 3018 KiB  
Article
Global Semantic Localization from Abstract Ellipse-Ellipsoid Model and Object-Level Instance Topology
by Heng Wu, Yanjie Liu, Chao Wang and Yanlong Wei
Remote Sens. 2024, 16(22), 4187; https://doi.org/10.3390/rs16224187 - 10 Nov 2024
Viewed by 1187
Abstract
Robust and highly accurate localization using a camera is a challenging task when appearance varies significantly. In indoor environments, changes in illumination and object occlusion can have a significant impact on visual localization. In this paper, we propose a visual localization method based on an ellipse-ellipsoid model, combined with object-level instance topology and alignment. First, we develop a CNN-based (Convolutional Neural Network) ellipse prediction network, DEllipse-Net, which integrates depth information with RGB data to estimate the projection of ellipsoids onto images. Second, we model environments using 3D (Three-dimensional) ellipsoids, instance topology, and ellipsoid descriptors. Finally, the detected ellipses are aligned with the ellipsoids in the environment through semantic object association, and 6-DoF (Degree of Freedom) pose estimation is performed using the ellipse-ellipsoid model. In the bounding box noise experiment, DEllipse-Net demonstrates higher robustness compared to other methods, achieving the highest prediction accuracy for 11 out of 23 objects in ellipse prediction. In the localization test with 15 pixels of noise, we achieve an ATE (Absolute Translation Error) of 0.077 m and an ARE (Absolute Rotation Error) of 2.70° in the fr2_desk sequence. Additionally, DEllipse-Net is lightweight and highly portable, with a model size of only 18.6 MB, and a single model can handle all objects. In the object-level instance topology and alignment experiment, our topology and alignment methods significantly enhance the global localization accuracy of the ellipse-ellipsoid model. In experiments involving lighting changes and occlusions, our method achieves more robust global localization compared to the classical bag-of-words based localization method and other ellipse-ellipsoid localization methods. Full article

21 pages, 1242 KiB  
Article
A Bag-of-Words Approach for Information Extraction from Electricity Invoices
by Javier Sánchez and Giovanny A. Cuervo-Londoño
AI 2024, 5(4), 1837-1857; https://doi.org/10.3390/ai5040091 - 8 Oct 2024
Viewed by 1547
Abstract
In the context of digitization and automation, extracting relevant information from business documents remains a significant challenge. It is typical to rely on machine-learning techniques to automate the process, reduce manual labor, and minimize errors. This work introduces a new model for extracting key values from electricity invoices, including customer data, bill breakdown, electricity consumption, and marketer data. We evaluate several machine learning techniques, such as Naive Bayes, Logistic Regression, Random Forests, and Support Vector Machines. Our approach relies on a bag-of-words strategy and custom-designed features tailored for electricity data. We validate our method on the IDSEM dataset, which includes 75,000 electricity invoices with eighty-six fields. The model converts PDF invoices into text and processes each word separately using a context of eleven words. The results of our experiments indicate that Support Vector Machines and Random Forests perform exceptionally well in capturing numerous values with high precision. The study also explores the advantages of our custom features and evaluates performance on unseen documents. The precision obtained with Support Vector Machines is 91.86% on average, peaking at 98.47% for one document template. These results demonstrate the effectiveness of our method in accurately extracting key values from invoices. Full article
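The per-word processing can be pictured as a sliding context window, sketched below; the padding scheme and example text are assumptions, with only the eleven-word context taken from the abstract.

```python
# Slide an eleven-word window over the invoice text; each target word is
# classified from its surrounding context.
def context_windows(words, size=11):
    half = size // 2
    padded = ["<PAD>"] * half + words + ["<PAD>"] * half
    for i, word in enumerate(words):
        yield word, padded[i:i + size]  # target word plus its context

words = "Total amount due 48.35 EUR on 2023-05-01".split()  # invented text
for target, window in context_windows(words):
    features = " ".join(window)  # to be vectorized and fed to an SVM / RF
    print(target, "->", features)
```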

18 pages, 489 KiB  
Article
Maximizing Profitability and Occupancy: An Optimal Pricing Strategy for Airbnb Hosts Using Regression Techniques and Natural Language Processing
by Luca Di Persio and Enis Lalmi
J. Risk Financial Manag. 2024, 17(9), 414; https://doi.org/10.3390/jrfm17090414 - 18 Sep 2024
Cited by 1 | Viewed by 3729
Abstract
In the competitive landscape of Airbnb hosting, optimizing pricing strategies for properties is a complex challenge that requires balancing revenue maximization with high occupancy rates. This research aimed to introduce a solution that leverages big data and machine learning techniques to help hosts improve their property’s market performance. Our primary goal was to introduce a solution that can augment property owners’ understanding of their property’s market value within their urban context, thereby optimizing both the utilization and profitability of their listings. We employed a multi-faceted approach with diverse models, including support vector regression, XGBoost, and neural networks, to analyze the influence of factors such as location, host attributes, and guest reviews on a listing’s financial performance. To further refine our predictive models, we integrated natural language processing techniques for in-depth listing review analysis, focusing on term frequency-inverse document frequency (TF-IDF), bag-of-words, and aspect-based sentiment analysis. These techniques provided nuanced insights into guest preferences and satisfaction. Our findings demonstrated that Airbnb hosts can effectively utilize both state-of-the-art and traditional machine learning algorithms to better understand customer needs and preferences, more accurately assess their listings’ market value, and recognize the importance of dynamic pricing strategies. By adopting this data-driven approach, hosts can achieve a balance between maintaining competitive pricing and ensuring high occupancy rates. This method not only enhances revenue potential but also contributes to improved guest satisfaction and the growing field of data-driven decision-making in the sharing economy, specifically tailored to the challenges of short-term rentals. Full article
(This article belongs to the Section Mathematics and Finance)
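A minimal sketch of the TF-IDF review features mentioned above, concatenated with numeric listing attributes before regression; the reviews and attribute columns are placeholders.

```python
# TF-IDF review features joined with numeric listing attributes.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "Great location, spotless apartment, very responsive host",
    "Noisy street and the heating barely worked",
]
numeric = np.array([[2.0, 1.0, 4.9], [1.0, 1.0, 3.2]])  # beds, baths, rating

tfidf = TfidfVectorizer(stop_words="english")
X_text = tfidf.fit_transform(reviews)
X = hstack([X_text, csr_matrix(numeric)])  # joint input for the regressors
```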

28 pages, 2653 KiB  
Article
How Does Digital Transformation Moderate Green Culture, Job Satisfaction, and Competitive Advantage in Sustainable Hotels?
by Gul Coskun Degirmen, Derya Ozilhan Ozbey, Emine Sardagı, Ilknur Cevik Tekin, Durmus Koc, Pınar Erdogan, Feden Koc and Emel Arık
Sustainability 2024, 16(18), 8072; https://doi.org/10.3390/su16188072 - 15 Sep 2024
Cited by 1 | Viewed by 3125
Abstract
Target groups within an organization adopt its culture, reflecting it in all internal and external business processes. Adopting a green organizational culture in hotels with sustainability certificates plays an important role in reshaping business processes by developing sustainability awareness among employees. Digital transformation, which facilitates corporate culture and business processes, plays a role in employee job satisfaction while also supporting environmental, social, and economic sustainability. This research aims to determine the relationships between green organizational culture, job satisfaction, and competitive advantage and to examine the moderating role of digital transformation in these relationships. Surveys and semi-structured interviews were used for data collection. While Amos software (Version 24) was used to test the hypothesized model in the analysis of survey data, a Hayes Process macro was used to determine the moderating effect. The interview data were analyzed using a bag-of-words model. According to the research results, there is a positive relationship between the participation, consistency, and adaptability sub-dimensions of green organizational culture and job satisfaction, while there is no significant relationship between the mission sub-dimension and job satisfaction. Furthermore, the study reveals the moderating role of digital transformation in the effect of job satisfaction on competitive advantage. Full article

26 pages, 1413 KiB  
Article
Active Learning for Biomedical Article Classification with Bag of Words and FastText Embeddings
by Paweł Cichosz
Appl. Sci. 2024, 14(17), 7945; https://doi.org/10.3390/app14177945 - 6 Sep 2024
Viewed by 1590
Abstract
In several applications of text classification, training document labels are provided by human evaluators, and therefore, gathering sufficient data for model creation is time consuming and costly. The labeling time and effort may be reduced by active learning, in which classification models are created based on relatively small training sets, which are obtained by collecting class labels provided in response to labeling requests or queries. This is an iterative process with a sequence of models being fitted, and each of them is used to select query articles to be added to the training set for the next one. Such a learning scenario may pose different challenges for machine learning algorithms and text representation methods used for text classification than ordinary passive learning, since they have to deal with very small, often imbalanced data, and the computational expense of both model creation and prediction has to remain low. This work examines how classification algorithms and text representation methods that have been found particularly useful by prior work handle these challenges. The random forest and support vector machines algorithms are coupled with the bag of words and FastText word embedding representations and applied to datasets consisting of scientific article abstracts from systematic literature review studies in the biomedical domain. Several strategies are used to select articles for active learning queries, including uncertainty sampling, diversity sampling, and strategies favoring the minority class. Confidence-based and stability-based early stopping criteria are used to generate active learning termination signals. The results confirm that active learning is a useful approach to creating text classification models with limited access to labeled data, making it possible to save at least half of the human effort needed to assign relevant or irrelevant class labels to training articles. Two of the four examined combinations of classification algorithms and text representation methods were the most successful: the SVM algorithm with the FastText representation and the random forest algorithm with the bag of words representation. Uncertainty sampling turned out to be the most useful query selection strategy, and confidence-based stopping was found more universal and easier to configure than stability-based stopping. Full article
(This article belongs to the Special Issue Data and Text Mining: New Approaches, Achievements and Applications)
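A minimal sketch of active learning with uncertainty sampling, in the spirit of the setup described above: fit on the labeled pool, query the articles whose predicted relevance probability is closest to 0.5, and grow the training set with their human-provided labels. The oracle array, batch size, and round count are illustrative.

```python
# Grow the training set by querying the articles the model is least sure
# about; y_oracle stands in for the human reviewer's labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def active_learning(X, y_oracle, seed_idx, rounds=5, batch=10):
    labeled = list(seed_idx)  # assumes the seed contains both classes
    for _ in range(rounds):
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(X[labeled], y_oracle[labeled])
        unlabeled = np.setdiff1d(np.arange(X.shape[0]), labeled)
        proba = model.predict_proba(X[unlabeled])[:, 1]
        uncertainty = -np.abs(proba - 0.5)  # closest to 0.5 is most uncertain
        queries = unlabeled[np.argsort(uncertainty)[-batch:]]
        labeled.extend(queries.tolist())  # human screens and labels these
    return model, labeled

rng = np.random.default_rng(0)
X = rng.random((500, 40))                      # stand-in feature vectors
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)      # stand-in relevance labels
seed = [int(np.argmax(y)), int(np.argmin(y))]  # one article of each class
model, used = active_learning(X, y, seed)
print(len(used))  # 2 seed articles + 5 rounds x 10 queries = 52
```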
