Search Results (51)

Search Parameters:
Keywords = Skip-gram

27 pages, 2599 KiB  
Article
AdaGram in Python: An AI Framework for Multi-Sense Embedding in Text and Scientific Formulas
by Arun Josephraj Arokiaraj, Samah Ibrahim, André Then, Bashar Ibrahim and Stephan Peter
Mathematics 2025, 13(14), 2241; https://doi.org/10.3390/math13142241 - 10 Jul 2025
Abstract
The Adaptive Skip-gram (AdaGram) algorithm extends traditional word embeddings by learning multiple vector representations per word, enabling the capture of contextual meanings and polysemy. Originally implemented in Julia, AdaGram has seen limited adoption due to ecosystem fragmentation and the scarcity of Julia's machine learning tooling compared with Python's mature frameworks. In this work, we present a Python-based reimplementation of AdaGram that facilitates broader integration with modern machine learning tools. Our implementation expands the model's applicability beyond natural language, enabling the analysis of scientific notation—particularly chemical and physical formulas encoded in LaTeX. We detail the algorithmic foundations, preprocessing pipeline, and hyperparameter configurations needed for interdisciplinary corpora. Evaluations on real-world texts and LaTeX-encoded formulas demonstrate AdaGram's effectiveness in unsupervised word sense disambiguation. Comparative analyses highlight the importance of corpus design and parameter tuning. This implementation opens new applications in formula-aware literature search engines, ambiguity reduction in automated scientific summarization, and cross-disciplinary concept alignment.
(This article belongs to the Section E: Applied Mathematics)
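For orientation, the skip-gram objective that AdaGram generalizes trains on (center, context) word pairs drawn from a sliding window over the corpus; AdaGram keeps several vectors per center word and infers which sense each context supports. A minimal sketch of the shared pair-generation step (the toy corpus and window size are illustrative, not the authors' implementation):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in the Skip-gram model.

    AdaGram departs from this only after pair generation, by maintaining
    multiple sense vectors per center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

# "bank" is a classic polysemy example AdaGram is designed to disambiguate.
pairs = skipgram_pairs(["the", "bank", "of", "the", "river"], window=1)
```

Each pair then drives one update of the center and context vectors during training.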

26 pages, 2931 KiB  
Article
CB-MTE: Social Bot Detection via Multi-Source Heterogeneous Feature Fusion
by Meng Cheng, Yuzhi Xiao, Tao Huang, Chao Lei and Chuang Zhang
Sensors 2025, 25(11), 3549; https://doi.org/10.3390/s25113549 - 4 Jun 2025
Abstract
Social bots increasingly mimic real users and collaborate in large-scale influence campaigns, distorting public perception and making their detection both critical and challenging. Traditional bot detection methods, constrained by single-source features, often fail to capture the complete behavioral and contextual characteristics of social bots, especially their dynamic behavioral evolution and group coordination tactics, resulting in feature incompleteness and reduced detection performance. To address this challenge, we propose CB-MTE, a social bot detection framework based on multi-source heterogeneous feature fusion. CB-MTE adopts a hierarchical architecture: user metadata is used to construct behavioral portraits, deep semantic representations are extracted from textual content via DistilBERT, and community-aware graph embeddings are learned through a combination of random walk and Skip-gram modeling. To mitigate feature redundancy and preserve structural consistency, manifold learning is applied for nonlinear dimensionality reduction, ensuring both local and global topology are maintained. Finally, a CatBoost-based collaborative reasoning mechanism enhances model robustness through ordered target encoding and symmetric tree structures. Experiments on the TwiBot-22 benchmark dataset demonstrate that CB-MTE significantly outperforms mainstream detection models in recognizing dynamic behavioral traits and detecting collaborative bot activities. These results confirm the framework's capability to capture the complete behavioral and contextual characteristics of social bots through multi-source feature integration.
(This article belongs to the Section Sensors and Robotics)
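The community-aware embedding step described above follows the familiar DeepWalk recipe: random walks over the user graph yield node sequences that a skip-gram model then treats exactly like sentences of words. A minimal sketch of the walk generation (the adjacency structure and parameters are hypothetical):

```python
import random

def random_walks(adj, num_walks=2, walk_len=4, seed=0):
    """DeepWalk-style uniform random walks over an adjacency dict.

    The returned node sequences are fed to a skip-gram model as if they
    were sentences, so co-visited nodes get similar embeddings.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_len:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Toy follower graph; node names are invented for illustration.
adj = {"u1": ["u2", "u3"], "u2": ["u1"], "u3": ["u1", "u2"]}
walks = random_walks(adj)
```

Biased walk strategies (e.g. node2vec-style return/in-out parameters) slot into the same loop by changing how the next neighbor is sampled.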

17 pages, 2975 KiB  
Article
A Topology Identification Strategy of Low-Voltage Distribution Grids Based on Feature-Enhanced Graph Attention Network
by Yang Lei, Fan Yang, Yanjun Feng, Wei Hu and Yinzhang Cheng
Energies 2025, 18(11), 2821; https://doi.org/10.3390/en18112821 - 29 May 2025
Abstract
Accurate topological connectivity is critical for the safe operation and management of low-voltage distribution grids (LVDGs). However, due to the complexity of the structure and the lack of measurement equipment, obtaining and maintaining these topological connections has become a challenge. This paper proposes a topology identification strategy for LVDGs based on a feature-enhanced graph attention network (F-GAT). First, the topology of the LVDG is represented as a graph structure using measurement data collected from intelligent terminals, with a feature matrix encoding the basic information of each entity. Second, the meta-path form of the heterogeneous graph is designed according to the connection characteristics of the LVDG, and the walking sequence is enhanced using a heterogeneous skip-gram model to obtain an embedded representation of the structural characteristics of each node. Then, the F-GAT model is used to learn potential association patterns and structural information in the graph topology, achieving a joint low-dimensional representation of electrical attributes and graph semantics. Finally, case studies on five urban LVDGs in the Wuhan region are conducted to validate the effectiveness and practicality of the proposed F-GAT model.

24 pages, 3598 KiB  
Article
Information Disclosure in the Context of Combating Climate Change: Evidence from the Chinese Natural Gas Industry
by Xufei Pang, Peidong Zhang, Zhen Guo, Xiaoping Jia, Raymond R. Tan, Yanmei Zhang and Xiaohan Qu
Sustainability 2025, 17(10), 4315; https://doi.org/10.3390/su17104315 - 9 May 2025
Abstract
Natural gas (NG) is a key transitional energy source for clean energy transition. Against the backdrop of a grim climate change situation, the sustainable development of the Chinese NG industry is emphasized. Climate change disclosure (CCD) has become an important way for corporations to fulfill their social responsibility and demonstrate their capacity for sustainable development. To understand the current status of CCD in the Chinese NG industry and to address its deficiencies, this paper assesses the quality of CCD in the Chinese NG industry. Climate change information is not fully covered by the existing quality evaluation systems. This study establishes a highly applicable system for evaluating the quality of CCD based on the theory pillar perspective. It includes the following five dimensions: completeness, balance, reliability, comparability, and understandability. This study evaluates the quality of CCD of 58 NG corporations using content analysis and quality evaluation index methods, incorporating Skip-Gram and CRITIC models. The evaluation results indicate that the quality of climate reports in the Chinese NG industry has shown general improvement over time; however, inconsistencies remain, making comparisons challenging. The quality of CCD also varies across the Chinese NG industry. Policy incentives with clear guidance and regional economic development conditions have a notable impact on the quality of CCD. For Chinese NG corporations themselves, disclosing climate change information related to risk management is the focus of narrowing the reporting gap. The CCD quality evaluation system constructed in this paper provides a theoretical reference for all industries to accurately promote disclosure quality. It also provides practical guidelines for corporations to identify weak links in CCD.
(This article belongs to the Section Energy Sustainability)
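Of the two models named above, CRITIC is the standard objective-weighting scheme: a disclosure-quality dimension receives more weight when it shows high contrast across corporations and low correlation with the other dimensions. A sketch under the usual formulation (the score matrix below is invented for illustration):

```python
import numpy as np

def critic_weights(X):
    """CRITIC weighting: weight_j ∝ std_j * sum_k (1 - corr(j, k)).

    Rows are alternatives (e.g. corporations), columns are criteria
    (e.g. the five CCD quality dimensions).
    """
    X = np.asarray(X, dtype=float)
    # Min-max normalize each criterion (column) to [0, 1].
    Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    sigma = Xn.std(axis=0)                # contrast intensity per criterion
    R = np.corrcoef(Xn, rowvar=False)     # inter-criteria correlations
    C = sigma * (1.0 - R).sum(axis=0)     # information content
    return C / C.sum()                    # normalize to weights summing to 1

# Hypothetical normalized scores for 4 corporations on 3 dimensions.
scores = [[0.2, 0.9, 0.4],
          [0.7, 0.3, 0.8],
          [0.5, 0.6, 0.1],
          [0.9, 0.2, 0.6]]
w = critic_weights(scores)
```

The weighted scores are then aggregated into a single quality index per corporation.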

25 pages, 1964 KiB  
Article
Hate Speech Detection and Online Public Opinion Regulation Using Support Vector Machine Algorithm: Application and Impact on Social Media
by Siyuan Li and Zhi Li
Information 2025, 16(5), 344; https://doi.org/10.3390/info16050344 - 24 Apr 2025
Abstract
Detecting hate speech in social media is challenging due to its rarity, high-dimensional complexity, and implicit expression via sarcasm or spelling variations, rendering linear models ineffective. In this study, the SVM (Support Vector Machine) algorithm is used to map text features from low-dimensional to high-dimensional space using kernel function techniques to meet complex nonlinear classification challenges. By maximizing the margin between classes to locate the optimal hyperplane and combining kernel techniques to implicitly adjust the data distribution, the classification accuracy of hate speech detection is significantly improved. Data collection leverages social media APIs (Application Programming Interfaces) and customized crawlers with OAuth2.0 authentication and keyword filtering, ensuring relevance. Regular expressions validate data integrity, followed by preprocessing steps such as denoising, stop-word removal, and spelling correction. Word embeddings are generated using Word2Vec's Skip-gram model, combined with TF-IDF (Term Frequency–Inverse Document Frequency) weighting to capture contextual semantics. A multi-level feature extraction framework integrates sentiment analysis via lexicon-based methods and BERT for advanced sentiment recognition. Experimental evaluations on two datasets demonstrate the SVM model's effectiveness, achieving accuracies of 90.42% and 92.84%, recall rates of 88.06% and 90.79%, and average inference times of 3.71 ms and 2.96 ms. These results highlight the model's ability to detect implicit hate speech accurately and efficiently, supporting real-time monitoring. This research contributes to creating a safer online environment by advancing hate speech detection methodologies.
(This article belongs to the Special Issue Information Technology in Society)
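The kernel-based pipeline described above can be sketched with scikit-learn, whose `SVC` applies the kernel trick implicitly; the tiny dataset and parameter values below are illustrative only, not the paper's setup (which adds Word2Vec embeddings, lexicon features, and BERT sentiment signals):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Invented toy samples; labels: 1 = hateful, 0 = benign.
texts = ["I hate those people", "have a nice day",
         "those people are awful", "what a lovely morning"]
labels = [1, 0, 1, 0]

# The RBF kernel maps sparse TF-IDF features into a higher-dimensional
# space implicitly, allowing a nonlinear decision boundary.
clf = make_pipeline(TfidfVectorizer(), SVC(kernel="rbf", C=10.0))
clf.fit(texts, labels)
pred = clf.predict(["those people are awful"])
```

In a real system the vectorizer would be fit on a large preprocessed corpus, and class imbalance (hate speech is rare) would be handled via class weights or resampling.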

30 pages, 42410 KiB  
Article
The Application of Lite-GRU Embedding and VAE-Augmented Heterogeneous Graph Attention Network in Friend Link Prediction for LBSNs
by Ziteng Yang, Boyu Li, Yong Wang and Aoxue Liu
Appl. Sci. 2025, 15(8), 4585; https://doi.org/10.3390/app15084585 - 21 Apr 2025
Abstract
Friend link prediction is an important issue in recommendation systems and social network analysis. In Location-Based Social Networks (LBSNs), predicting potential friend relationships faces significant challenges due to the diversity of user behaviors, along with the high dimensionality, sparsity, and complex noise in the data. To address these issues, this paper proposes a Heterogeneous Graph Attention Network (GEVEHGAN) model based on Lite Gated Recurrent Unit (Lite-GRU) embedding and Variational Autoencoder (VAE) enhancement. The model constructs a heterogeneous graph with two types of nodes and three types of edges; combines Skip-Gram and Lite-GRU to learn Point of Interest (POI) and user node embeddings; introduces VAE for dimensionality reduction and denoising of the embeddings; and employs edge-level attention mechanisms to enhance information propagation and feature aggregation. Experiments are conducted on the publicly available Foursquare dataset. The results show that the GEVEHGAN model outperforms other comparative models in evaluation metrics such as AUC, AP, and Top@K accuracy, demonstrating its superior performance in the friend link prediction task.

22 pages, 7770 KiB  
Article
Advancing Arabic Word Embeddings: A Multi-Corpora Approach with Optimized Hyperparameters and Custom Evaluation
by Azzah Allahim and Asma Cherif
Appl. Sci. 2024, 14(23), 11104; https://doi.org/10.3390/app142311104 - 28 Nov 2024
Abstract
The expanding Arabic user base presents a unique opportunity for researchers to tap into vast online Arabic resources. However, the lack of reliable Arabic word embedding models and the limited availability of Arabic corpora pose significant challenges. This paper addresses these gaps by developing and evaluating Arabic word embedding models trained on diverse Arabic corpora, investigating how varying hyperparameter values impact model performance across different NLP tasks. To train our models, we collected data from three distinct sources: Wikipedia, newspapers, and 32 Arabic books, each selected to capture specific linguistic and contextual features of Arabic. By using advanced techniques such as Word2Vec and FastText, we experimented with different hyperparameter configurations, such as vector size, window size, and training algorithms (CBOW and skip-gram), to analyze their impact on model quality. Our models were evaluated using a range of NLP tasks, including sentiment analysis, similarity tests, and an adapted analogy test designed specifically for Arabic. The findings revealed that both the corpus size and hyperparameter settings had notable effects on performance. For instance, in the analogy test, a larger vocabulary size significantly improved outcomes, with the FastText skip-gram models excelling in accurately solving analogy questions. For sentiment analysis, vocabulary size was critical, while in similarity scoring, the FastText models achieved the highest scores, particularly with smaller window and vector sizes. Overall, our models demonstrated strong performance, achieving 99% and 90% accuracies in sentiment analysis and the analogy test, respectively, along with a similarity score of 8 out of 10. These results underscore the value of our models as a robust tool for Arabic NLP research, addressing a pressing need for high-quality Arabic word embeddings.
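The hyperparameters swept in the study (vector size, window size, CBOW vs. skip-gram) all parameterize the same underlying update. A minimal NumPy sketch of one skip-gram negative-sampling step, with a toy vocabulary, dimensions, and learning rate that are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 5          # toy vocabulary; real models use tens of thousands of words
vector_size = 8         # cf. the paper's vector-size sweeps
W_in = rng.normal(0, 0.1, (vocab_size, vector_size))   # center-word vectors
W_out = rng.normal(0, 0.1, (vocab_size, vector_size))  # context-word vectors

def sgns_step(center, context, negatives, lr=0.05):
    """One skip-gram negative-sampling update (indices into the vocabulary).

    Pushes the center vector toward the true context vector (label 1)
    and away from randomly sampled negative words (label 0).
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    for idx, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        g = sigmoid(W_in[center] @ W_out[idx]) - label  # logistic-loss gradient
        W_out[idx] -= lr * g * W_in[center]
        W_in[center] -= lr * g * W_out[idx]

before = W_in[2] @ W_out[1]
for _ in range(50):
    sgns_step(center=2, context=1, negatives=[0, 4])
after = W_in[2] @ W_out[1]
```

A larger window feeds more (center, context) pairs per position into this update; CBOW instead averages the context vectors and predicts the center word.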

19 pages, 2595 KiB  
Article
Ancient Text Translation Model Optimized with GujiBERT and Entropy-SkipBERT
by Fuxing Yu, Rui Han, Yanchao Zhang and Yang Han
Electronics 2024, 13(22), 4492; https://doi.org/10.3390/electronics13224492 - 15 Nov 2024
Abstract
To cope with the challenges posed by the complex linguistic structure and lexical polysemy in ancient texts, this study proposes a two-stage translation model. First, we combine GujiBERT, GCN, and LSTM to categorize ancient texts into historical and non-historical categories. This categorization lays the foundation for the subsequent translation task. To improve the efficiency of word vector generation and reduce the limitations of the traditional Word2Vec model, we integrated the entropy weight method into the skip-gram training process and spliced the word vectors with GujiBERT. This method improves the efficiency of word vector generation and enhances the model's ability to accurately represent lexical polysemy and grammatical structure in ancient documents through dependency weighting. In training the translation model, we used a different dataset for each text category, significantly improving the translation accuracy. Experimental results show that our categorization model improves the accuracy by 5% compared to GujiBERT, while Entropy-SkipBERT improves the BLEU scores by 0.7 and 0.4 on the historical and non-historical datasets, respectively. Ultimately, the proposed two-stage model improves the BLEU scores by 2.7 over the baseline model.

18 pages, 2925 KiB  
Article
Exploring the Features and Trends of Industrial Product E-Commerce in China Using Text-Mining Approaches
by Zhaoyang Sun, Qi Zong, Yuxin Mao and Gongxing Wu
Information 2024, 15(11), 712; https://doi.org/10.3390/info15110712 - 6 Nov 2024
Abstract
Industrial product e-commerce refers to the specific application of the e-commerce concept in industrial product transactions. It enables industrial enterprises to conduct transactions via Internet platforms and reduce circulation and operating costs. Industrial literature, such as policies, reports, and standards related to industrial product e-commerce, contains much crucial information. Through a systematic analysis of this information, we can explore and comprehend the development characteristics and trends of industrial product e-commerce. To this end, 18 policy documents, 10 industrial reports, and 5 standards are analyzed by employing text-mining methods. First, natural language processing (NLP) technology is utilized to pre-process the text data related to industrial product e-commerce. Then, word frequency statistics and TF-IDF keyword extraction are performed, and the word frequency statistics are visually represented. Subsequently, the feature set is obtained by combining these processes with manual screening. The original text corpus is used as the training set by employing the skip-gram model in Word2Vec, and the feature words are transformed into word vectors in multi-dimensional space. The K-means algorithm is used to cluster the feature words into groups. The latent Dirichlet allocation (LDA) method is then utilized to further group and discover the features. The text-mining results provide evidence for the development characteristics and trends of industrial product e-commerce in China.
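The clustering step described above, grouping skip-gram feature-word vectors with K-means, can be sketched in plain NumPy; the two well-separated synthetic blobs below stand in for real feature-word vectors:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: assign points to the nearest centroid, recompute centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # init from data points
    for _ in range(iters):
        # Pairwise point-to-centroid distances, shape (n_points, k).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():          # guard against empty clusters
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Synthetic stand-in for skip-gram word vectors: two tight blobs in 4-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (10, 4)), rng.normal(5, 0.1, (10, 4))])
labels, _ = kmeans(X, k=2)
```

On real embeddings, cosine-normalizing the vectors before clustering is a common refinement, since skip-gram similarity is usually measured by cosine distance.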

19 pages, 1401 KiB  
Article
Enhancing Arabic Sentiment Analysis of Consumer Reviews: Machine Learning and Deep Learning Methods Based on NLP
by Hani Almaqtari, Feng Zeng and Ammar Mohammed
Algorithms 2024, 17(11), 495; https://doi.org/10.3390/a17110495 - 3 Nov 2024
Abstract
Sentiment analysis utilizes Natural Language Processing (NLP) techniques to extract opinions from text, which is critical for businesses looking to refine strategies and better understand customer feedback. Understanding people's sentiments about products through emotional tone analysis is paramount. However, analyzing sentiment in Arabic and its dialects poses challenges due to the language's intricate morphology, right-to-left script, and nuanced emotional expressions. To address this, this study introduces the Arb-MCNN-Bi Model, which integrates the strengths of the transformer-based AraBERT (Arabic Bidirectional Encoder Representations from Transformers) model with a Multi-channel Convolutional Neural Network (MCNN) and a Bidirectional Gated Recurrent Unit (BiGRU) for Arabic sentiment analysis. AraBERT, designed specifically for Arabic, captures rich contextual information through word embeddings. These embeddings are processed by the MCNN to enhance feature extraction and by the BiGRU to retain long-term dependencies. The final output is obtained through feedforward neural networks. The study compares the proposed model with various machine learning and deep learning methods, applying advanced NLP techniques such as Term Frequency-Inverse Document Frequency (TF-IDF), n-gram, Word2Vec (Skip-gram), and fastText (Skip-gram). Experiments are conducted on three Arabic datasets: the Arabic Customer Reviews Dataset (ACRD), Large-scale Arabic Book Reviews (LABR), and the Hotel Arabic Reviews dataset (HARD). The Arb-MCNN-Bi model with AraBERT achieved accuracies of 96.92%, 96.68%, and 92.93% on the ACRD, HARD, and LABR datasets, respectively. These results demonstrate the model's effectiveness in analyzing Arabic text data and outperforming traditional approaches.

18 pages, 446 KiB  
Article
Skip-Gram and Transformer Model for Session-Based Recommendation
by Enes Celik and Sevinc Ilhan Omurca
Appl. Sci. 2024, 14(14), 6353; https://doi.org/10.3390/app14146353 - 21 Jul 2024
Abstract
Session-based recommendation uses past clicks and interaction sequences from anonymous users to predict the next item most likely to be clicked. Predicting the user's subsequent behavior in online transactions becomes a problem mainly due to the lack of user information and limited behavioral information. Existing methods, such as recurrent neural network (RNN)-based models that model users' past behavior sequences and graph neural network (GNN)-based models that capture potential relationships between items, miss different time intervals in the past behavior sequence and can only capture certain types of user interest patterns due to the characteristics of neural networks. Graph models built to augment the current session can also reduce accuracy by introducing irrelevant items. Moreover, attention mechanisms in recent approaches have been insufficient due to weak representations of users and products. In this study, we propose a model based on the combination of skip-gram and transformer (SkipGT) to address these drawbacks in session-based recommendation systems. In the proposed method, skip-gram both captures chained user interest within the session through item-specific subsequences and learns complex interaction information between items. The proposed method captures short-term and long-term preference representations to predict the next click with the help of a transformer. The transformer in our proposed model overcomes many limitations of recurrence-based models and models longer contextual connections between items more effectively. By giving the transformer trained item embeddings from the skip-gram model as input, the transformer performs better because it does not learn item representations from scratch. By conducting extensive experiments on three real-world datasets, we confirm that SkipGT significantly outperforms state-of-the-art solutions with an average MRR improvement of 5.58%.
(This article belongs to the Section Computing and Artificial Intelligence)

17 pages, 855 KiB  
Article
Automated Identification of Sensitive Financial Data Based on the Topic Analysis
by Meng Li, Jiqiang Liu and Yeping Yang
Future Internet 2024, 16(2), 55; https://doi.org/10.3390/fi16020055 - 8 Feb 2024
Abstract
Data governance is an extremely important protection and management measure throughout the entire life cycle of data. However, data governance issues remain, such as data security risks, data privacy breaches, and difficulties in data management and access control. These problems lead to a risk of data breaches and abuse. Therefore, the security classification and grading of data has become an important task to accurately identify sensitive data and adopt appropriate maintenance and management measures for different sensitivity levels. This work starts from the problems in current data security classification and grading practice, such as inconsistent classification and grading standards, difficult data acquisition and sorting, and weak semantic information in data fields, to identify the limitations of current methods and directions for improvement. The automatic identification method for sensitive financial data proposed in this paper is based on topic analysis and was constructed by incorporating Jieba word segmentation, word frequency statistics, the skip-gram model, K-means clustering, and other techniques. Expert assistance was sought to select appropriate keywords for enhanced accuracy. This work used the descriptive text library and real business data of a Chinese financial institution for training and testing to further demonstrate its effectiveness and usefulness. The evaluation indicators illustrate the effectiveness of this method for data security classification. The proposed method addresses the challenge of sensitivity-level division in texts with limited semantic information, overcoming the limitations on model expansion across different domains and providing an optimized application model. These results also point toward real-time updating of the method.

17 pages, 761 KiB  
Article
It’s Not Always about Wide and Deep Models: Click-Through Rate Prediction with a Customer Behavior-Embedding Representation
by Miguel Alves Gomes, Richard Meyes, Philipp Meisen and Tobias Meisen
J. Theor. Appl. Electron. Commer. Res. 2024, 19(1), 135-151; https://doi.org/10.3390/jtaer19010008 - 12 Jan 2024
Abstract
Alongside natural language processing and computer vision, large learning models have found their way into e-commerce. In particular, for recommender systems and click-through rate prediction, these models have shown great predictive power. In this work, we aim to predict the probability that a customer will click on a given recommendation, given only their current session. To this end, we propose a two-stage approach consisting of a customer behavior-embedding representation and a recurrent neural network. In the first stage, we train a self-supervised skip-gram embedding on customer activity data. The resulting embedding representation is used in the second stage to encode the customer sequences, which are then used as input to the learning model. Our proposed approach diverges from the prevailing trend of utilizing extensive end-to-end models for click-through rate prediction. The experiments, which incorporate a real-world industrial use case and a widely used, openly available benchmark dataset, demonstrate that our approach outperforms the current state-of-the-art models. Our approach predicts customers' click intention with an average F1 accuracy of 94% for the industrial use case, which is one percentage point higher than the state-of-the-art baseline, and an average F1 accuracy of 79% for the benchmark dataset, which outperforms the best tested state-of-the-art baseline by more than seven percentage points. The results show that, contrary to current trends in the field, large end-to-end models are not always needed. The analysis of our experiments suggests that the reason for the performance of our approach is the self-supervised pre-trained embedding of customer behavior that we use as the customer representation.
(This article belongs to the Topic Online User Behavior in the Context of Big Data)
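The two-stage idea, skip-gram-style embeddings of activity events feeding a downstream sequence model, can be illustrated as follows. Random vectors stand in for the trained embedding, mean pooling deliberately simplifies the paper's recurrent encoder, and the event names are hypothetical:

```python
import numpy as np

# Hypothetical activity vocabulary; in the paper the embedding is trained
# self-supervised with skip-gram on real customer activity logs, treating
# each event (like a word) in its session (like a sentence).
events = ["view:123", "add_to_cart:123", "view:456", "purchase:123"]
rng = np.random.default_rng(0)
embedding = {e: rng.normal(size=16) for e in events}  # stand-in for trained vectors

def encode_session(session):
    """Mean-pool event embeddings into a fixed-size session representation.

    The paper's second stage instead feeds the embedded sequence step by
    step into a recurrent network; pooling keeps the sketch minimal.
    """
    vecs = np.stack([embedding[e] for e in session])
    return vecs.mean(axis=0)

session_vec = encode_session(["view:123", "add_to_cart:123", "purchase:123"])
```

Because the embedding is pre-trained, the downstream click-prediction model starts from behaviorally meaningful features instead of learning event representations from scratch.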

17 pages, 2745 KiB  
Article
MDBF: Meta-Path-Based Depth and Breadth Feature Fusion for Recommendation in Heterogeneous Network
by Hongjuan Liu and Huairui Zhang
Electronics 2023, 12(19), 4017; https://doi.org/10.3390/electronics12194017 - 24 Sep 2023
Abstract
The main challenge of recommendation in a heterogeneous information network comes from the diversity of nodes and links and the problem of semantic expression ambiguity caused by that diversity. Therefore, we propose a movie recommendation algorithm for heterogeneous networks called Meta-Path-Based Depth and Breadth Feature Fusion (MDBF). Using a random walk for depth feature learning, we can extract a depth-feature meta-path that reflects the overall structure of the network. In addition, using random walks over adjacent nodes, we can extract a breadth-feature meta-path, preserving the neighborhood information of a node. Any auxiliary information is learned through its own meta-paths. Then, all of the feature sequences can be fused and learned by the Skip-gram algorithm to obtain the final feature vector. In the recommendation process, based on traditional collaborative filtering, we propose a secondary filtering recommendation. The experimental results show that, without external auxiliary information, compared to the existing state-of-the-art models, the algorithm improves each index by an average of 12% on MovieLens and 22% on MovieTweetings. The algorithm not only improves the effect of movie recommendation, but also provides application scenarios for accurate recommendation services through auxiliary information.
(This article belongs to the Special Issue Artificial Intelligence Technologies and Applications)

23 pages, 991 KiB  
Article
Improved Skip-Gram Based on Graph Structure Information
by Xiaojie Wang, Haijun Zhao and Huayue Chen
Sensors 2023, 23(14), 6527; https://doi.org/10.3390/s23146527 - 19 Jul 2023
Abstract
Applying the Skip-gram model to graph representation learning has become a widely researched topic in recent years. Prior works usually focus on migration applications of the Skip-gram model, while the model itself, initially applied to word embedding, remains insufficiently explored in graph representation learning. To address this shortcoming, we analyze the difference between word embedding and graph embedding and reveal the principle of graph representation learning through a case study to explain the essential idea of graph embedding intuitively. Through the case study and an in-depth understanding of graph embeddings, we propose Graph Skip-gram, an extension of the Skip-gram model that uses graph structure information. Graph Skip-gram can be combined with a variety of algorithms for excellent adaptability. Inspired by word embeddings in natural language processing, we design a novel feature fusion algorithm to fuse node vectors based on node vector similarity. We fully articulate the ideas of our approach on a small network and provide extensive experimental comparisons, including multiple classification tasks and link prediction tasks, demonstrating that our proposed approach is more applicable to graph representation learning.
(This article belongs to the Special Issue Recent Trends and Advances in Fault Detection and Diagnostics)
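The similarity-based fusion idea can be sketched as blending a node's vector with a cosine-weighted average of its neighbors' vectors, so structurally close nodes pull each other together; the weighting rule below is illustrative, not the paper's exact algorithm:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def fuse_node_vector(node_vec, neighbor_vecs, alpha=0.5):
    """Blend a node vector with a similarity-weighted neighbor average.

    Neighbors more similar to the node contribute more; alpha controls
    how much the node keeps of its own representation.
    """
    sims = np.array([max(cosine(node_vec, v), 0.0) for v in neighbor_vecs])
    if sims.sum() == 0:
        return node_vec                      # no similar neighbors: unchanged
    weights = sims / sims.sum()
    neighbor_avg = (weights[:, None] * np.stack(neighbor_vecs)).sum(axis=0)
    return (1 - alpha) * node_vec + alpha * neighbor_avg

# Toy node vectors: three neighbors that are noisy copies of the node.
rng = np.random.default_rng(0)
v = rng.normal(size=8)
nbrs = [v + rng.normal(scale=0.1, size=8) for _ in range(3)]
fused = fuse_node_vector(v, nbrs)
```

In a full pipeline the inputs would be Skip-gram node embeddings learned from random-walk sequences, and the fused vectors would feed the downstream classification or link-prediction tasks.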
