MDPI - Publisher of Open Access Journals

17 pages, 5144 KiB

Open Access | Article

Deep Learning-Based Truthful and Deceptive Hotel Reviews

by Devbrat Gupta, Anuja Bhargava, Diwakar Agarwal, Mohammed H. Alsharif, Peerapong Uthansakul, Monthippa Uthansakul and Ayman A. Aly

Sustainability 2024, 16(11), 4514; https://doi.org/10.3390/su16114514 - 26 May 2024

Cited by 3 | Viewed by 2388

Abstract

For sustainable hospitality and tourism, the validity of online evaluations is crucial at a time when they influence travelers’ choices. Distinguishing truthful from deceptive hotel reviews requires understanding the facts and conducting a thorough investigation. The current study addresses the urgent need to discern between truthful and deceptive hotel reviews. Such misleading “opinion spam” is common in the hospitality sector, where it misleads potential customers and harms the standing of hotel review websites. This data science project aims to create a reliable detection system that correctly classifies hotel reviews as either truthful or deceptive. In natural language processing, sentiment analysis is essential for determining the emotional tone of a text. Using an 800-instance dataset of truthful and deceptive reviews, this study investigates the sentiment analysis performance of three deep learning models: a Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, and a Recurrent Neural Network (RNN). The CNN model achieved the highest accuracy, with 98%, 77%, and 80% on the training, testing, and validation sets, respectively. Despite balanced precision and recall, the LSTM model was less accurate than the CNN model, with an accuracy of 60%. The RNN model, which had difficulty capturing sequential relationships, trailed further, with accuracy rates of 57%, 57%, and 58%. Each model’s performance was assessed in detail using ROC curves and classification reports.

(This article belongs to the Special Issue Smart Technologies and Sustainable Development in Hospitality and Tourism)
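The evaluation above relies on classification reports and ROC curves. As a rough illustration of what those summarize, here is a minimal pure-Python sketch computing accuracy, precision, recall, F1, and ROC AUC for a binary truthful/deceptive task. The label encoding (1 = deceptive), the 0.5 decision threshold, and the toy scores are assumptions made for illustration; this does not reproduce the authors' pipeline or results.

```python
# Minimal evaluation sketch for a binary truthful (0) / deceptive (1) classifier.
# Label encoding and threshold are illustrative assumptions.

def classification_report(y_true, y_pred, positive=1):
    """Accuracy plus precision/recall/F1 for the positive (deceptive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive gets a higher score than a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy usage: model scores thresholded at 0.5 to get hard predictions.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
y_pred = [int(s >= 0.5) for s in scores]
print(classification_report(y_true, y_pred))
print(round(roc_auc(y_true, scores), 3))  # 0.889
```

The pairwise AUC formula is quadratic in the number of examples, which is fine for a dataset of a few hundred reviews; larger sets would normally use a rank-based computation instead.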

16 pages, 487 KiB

Open Access | Article

The Language of Deception: Applying Findings on Opinion Spam to Legal and Forensic Discourses

by Alibek Jakupov, Julien Longhi and Besma Zeddini

Languages 2024, 9(1), 10; https://doi.org/10.3390/languages9010010 - 22 Dec 2023

Cited by 2 | Viewed by 4098

Abstract

Digital forensic investigations are becoming increasingly crucial in criminal investigations and civil litigation, especially in cases of corporate espionage and intellectual property theft, as more communication occurs online via e-mail and social media. Deceptive opinion spam analysis is an emerging field of research that aims to detect and identify fraudulent reviews, comments, and other forms of deceptive online content. In this paper, we explore how findings from this field may be relevant to forensic investigation, particularly features that capture stylistic patterns and sentiment, which are psychologically relevant aspects of truthful and deceptive language. To assess the utility of these features, we demonstrate the potential of our proposed approach on real-world data from the Enron Email Corpus. Our findings suggest that deceptive opinion spam analysis may be a valuable tool for forensic investigators and legal professionals seeking to identify and analyze deceptive behavior in online communication. By incorporating these techniques into their investigative and legal strategies, professionals can improve the accuracy and reliability of their findings, leading to more effective and just outcomes.

(This article belongs to the Special Issue New Challenges in Forensic and Legal Linguistics)
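The abstract above refers to features that capture stylistic patterns and sentiment. The paper's exact feature set is not listed here, so the following is only a hypothetical sketch of that kind of feature extractor: the tiny lexicons and the specific features (first-person pronoun rate, exclamation count, average word length, a lexicon-based sentiment score) are illustrative assumptions, not the authors' features.

```python
import re

# Toy lexicons: illustrative placeholders, not the lexicons used in the paper.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our"}
POSITIVE = {"great", "excellent", "good", "happy", "pleased"}
NEGATIVE = {"bad", "terrible", "poor", "angry", "disappointed"}

def deception_features(text):
    """Return simple stylistic and sentiment features for one message."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens) or 1  # avoid division by zero on empty text
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {
        "n_tokens": len(tokens),
        # Rate of first-person pronouns among all tokens.
        "first_person_rate": sum(t in FIRST_PERSON for t in tokens) / n,
        "exclamations": text.count("!"),
        "avg_word_length": sum(map(len, tokens)) / n,
        # Net lexicon polarity, normalized by message length.
        "sentiment": (pos - neg) / n,
    }

print(deception_features("I was very pleased with our deal!"))
```

Each message (for example, one e-mail) becomes a small numeric feature dictionary that a downstream classifier could consume; a real system would use validated lexicons rather than these toy sets.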

16 pages, 1393 KiB

Open Access | Feature Paper | Editor’s Choice | Article

Analyzing Political Polarization on Social Media by Deleting Bot Spamming

by Riccardo Cantini, Fabrizio Marozzo, Domenico Talia and Paolo Trunfio

Big Data Cogn. Comput. 2022, 6(1), 3; https://doi.org/10.3390/bdcc6010003 - 4 Jan 2022

Cited by 17 | Viewed by 8704

Abstract

Social media platforms are part of everyday life, connecting people around the world in large discussion groups on every topic, including important social and political issues. Social media have therefore become a valuable source of information-rich data, commonly referred to as Social Big Data, which can be exploited to study people’s behavior, opinions, moods, interests, and activities. However, these powerful communication platforms can also be used to manipulate conversations, pollute online content, and alter the popularity of users through spamming and the spread of misinformation. Recent studies have shown that social media host automated entities, called social bots, that pose as legitimate users by imitating human behavior in order to influence discussions of any kind, including political ones. In this paper we present a new methodology, TIMBRE (Time-aware opInion Mining via Bot REmoval), aimed at discovering the polarity of social media users during election campaigns characterized by rivalry between political factions. The methodology is time-aware and relies on a keyword-based classification of posts and users. Moreover, it recognizes and filters out data produced by social media bots, which aim to alter public opinion about political candidates, thus avoiding heavily biased information. The proposed methodology has been applied to a case study that analyzes the polarization of a large number of Twitter users during the 2016 US presidential election. The results show the benefits of both removing bots and taking temporal aspects into account in the forecasting process, revealing the high accuracy and effectiveness of the proposed approach. Finally, we investigated how the presence of social bots may affect political discussion by studying the 2016 US presidential election. Specifically, we analyzed the main differences between human and artificial political support, also estimating the influence of social bots on legitimate users.

(This article belongs to the Special Issue Big Data and Cognitive Computing: 5th Anniversary Feature Papers)
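The abstract above describes three ingredients: keyword-based classification of posts and users, removal of bot-generated data, and time awareness. The sketch below combines them in the simplest way we could think of. It is not TIMBRE itself: the faction keyword sets are hypothetical, the bot list is assumed to come from some separate detector, and "time-aware" is reduced here to computing user polarity separately per time period by majority vote.

```python
from collections import Counter, defaultdict

# Hypothetical keyword sets per faction (not TIMBRE's actual keywords).
FACTION_KEYWORDS = {
    "A": {"#votea", "#teama"},
    "B": {"#voteb", "#teamb"},
}

def classify_post(text):
    """Return the single faction whose keywords appear in the post, else None."""
    words = set(text.lower().split())
    hits = [f for f, kws in FACTION_KEYWORDS.items() if words & kws]
    return hits[0] if len(hits) == 1 else None  # neutral or mixed posts are ignored

def user_polarity(posts, bots):
    """posts: iterable of (user, period, text); bots: set of user ids flagged as bots.
    Returns {period: {user: faction}} from a per-period majority vote."""
    votes = defaultdict(lambda: defaultdict(Counter))
    for user, period, text in posts:
        if user in bots:
            continue  # drop bot-generated content before aggregating
        faction = classify_post(text)
        if faction is not None:
            votes[period][user][faction] += 1
    polarity = {}
    for period, users in votes.items():
        polarity[period] = {}
        for user, counter in users.items():
            ranked = counter.most_common()
            # Assign a faction only when there is a strict majority (skip ties).
            if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                polarity[period][user] = ranked[0][0]
    return polarity

def faction_shares(polarity):
    """Number of polarized users per faction, for each period."""
    return {p: Counter(users.values()) for p, users in polarity.items()}
```

For example, `user_polarity([("u1", 1, "Go #voteA"), ("bot", 1, "#voteA #voteA")], bots={"bot"})` counts only `u1`, so the bot's posts cannot inflate faction A's support.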

15 pages, 2093 KiB

Open Access | Article

Quantifiable Interactivity of Malicious URLs and the Social Media Ecosystem

by Chun-Ming Lai, Hung-Jr Shiu and Jon Chapman

Electronics 2020, 9(12), 2020; https://doi.org/10.3390/electronics9122020 - 30 Nov 2020

Cited by 5 | Viewed by 2703

Abstract

Online social network (OSN) users increasingly interact with each other via articles, comments, and responses. When access control mechanisms are weak or absent, attackers perceive OSNs as rich environments for influencing public opinion via fake news posts or influencing commercial transactions via practices such as phishing. This has led to a body of research on predicting OSN user behavior using social science concepts such as conformity and the bandwagon effect. In this paper, we address the question of how social recommendation systems affect the occurrence of malicious URLs on Facebook, based on the assumption that recommendation systems do not differ in how they deliver legitimate or harmful information to users. We then use temporal features to build a prediction framework that predicts increases in certain user group behaviors with >75% accuracy. Our effort involves demarcating URL classes, ranging from malicious URLs that cause significant damage to annoying spam messages and advertisements. We offer this analysis to better understand how OSN user sensors react to various categories of malicious URLs, in order to mitigate their effects.

(This article belongs to the Special Issue New Challenges on Cyber Threat Intelligence)
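The abstract above mentions using temporal features to predict increases in user group behavior. The paper's actual features and model are not given here, so this is only a hedged sketch of the general idea: turn a daily series of interactions with one URL class into lagged window features plus a binary "increase tomorrow" label, and score a naive momentum baseline against it. The window size, the three features, and the baseline rule are all illustrative assumptions.

```python
# Sketch: lagged temporal features from a daily interaction series.
# Window size, features, and baseline are assumptions, not the paper's framework.

def temporal_dataset(counts, window=3):
    """counts: interactions per day for one URL class; window must be >= 2.
    Returns (features, label) pairs where label = 1 if the next day increases."""
    rows = []
    for t in range(window - 1, len(counts) - 1):
        history = counts[t - window + 1 : t + 1]  # the last `window` days up to t
        features = {
            "current": counts[t],
            "mean": sum(history) / window,
            "slope": (history[-1] - history[0]) / (window - 1),
        }
        rows.append((features, int(counts[t + 1] > counts[t])))
    return rows

def momentum_baseline_accuracy(rows):
    """Naive baseline: predict an increase whenever the recent slope is positive."""
    correct = sum(int(f["slope"] > 0) == label for f, label in rows)
    return correct / len(rows)

rows = temporal_dataset([1, 2, 4, 3, 5, 6], window=3)
print(len(rows), momentum_baseline_accuracy(rows))
```

A learned classifier trained on such rows would be compared against a trivial baseline like this one; any accuracy figure only means something relative to that reference point.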

16 pages, 14677 KiB

Open Access | Data Descriptor

Digital Psychological Platform for Mass Web-Surveys

by Evgeny Nikulchev, Dmitry Ilin, Anastasiya Silaeva, Pavel Kolyasnikov, Vladimir Belov, Andrey Runtov, Pavel Pushkin, Nikolay Laptev, Anna Alexeenko, Shamil Magomedov, Alexander Kosenkov, Ilya Zakharov, Victoria Ismatullina and Sergey Malykh

Data 2020, 5(4), 95; https://doi.org/10.3390/data5040095 - 5 Oct 2020

Cited by 10 | Viewed by 3799

Abstract

Web surveys are one of the most popular forms of primary data collection in research. However, mass surveys involve some challenges: they must account for different platforms and browsers, as well as different data transfer rates on connections in different regions of the country. The need to guarantee data delivery under these conditions should drive the choice of technologies for implementing web surveys. The paper describes a solution that transfers the questionnaire to the client side in the form of an archive. This makes the survey independent of the data transfer rate and of the stability of the connection, even when filling in the survey takes a long time. The survey benefited the service of educational psychologists under the federal Ministry of Education. School psychologists took part in the survey with full awareness of the importance of their opinion for organizing and improving their professional activities. Because respondents were eager to answer the open-ended questions in detail, part of the dataset consists of answers several sentences long covering different aspects of professional activity. An important challenge is the Russian language, for which there are fewer tools than for more widely spoken languages. The survey involved 20,443 school psychologists from all regions of the Russian Federation, from both urban and rural areas. As evidenced by the average response time, the answers did not contain spam, evasive answers, and the like. The surveys were built with the authoring tool DigitalPsyTools.ru.
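The key technical idea in the abstract above is shipping the whole questionnaire to the client as one archive, so answering does not depend on the connection once the download finishes. DigitalPsyTools.ru's actual archive layout is not described here; the sketch below only illustrates the round trip with Python's standard `zipfile` module, and the file name `questionnaire.json` and the questionnaire structure are hypothetical.

```python
import io
import json
import zipfile

def build_survey_archive(questionnaire):
    """Server side: pack the whole questionnaire into one zip archive (in memory).
    The member name 'questionnaire.json' is a hypothetical convention."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # ensure_ascii=False keeps Cyrillic text readable; writestr encodes str as UTF-8.
        zf.writestr("questionnaire.json", json.dumps(questionnaire, ensure_ascii=False))
    return buf.getvalue()

def load_survey_archive(data):
    """Client side: after a single download, everything is read locally,
    so filling in the survey no longer depends on connection speed or stability."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return json.loads(zf.read("questionnaire.json").decode("utf-8"))

questionnaire = {
    "id": "demo",
    "questions": [{"id": "q1", "text": "Стаж работы?", "type": "open"}],
}
archive = build_survey_archive(questionnaire)
print(load_survey_archive(archive) == questionnaire)
```

In a browser-based deployment, the client half would run in JavaScript rather than Python; the design point is the same: one bulk transfer up front, then local rendering and answer collection.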

16 pages, 2452 KiB

Open Access | Article

Content Noise Detection Model Using Deep Learning in Web Forums

by Jiyoung Woo and Jaeseok Yun

Sustainability 2020, 12(12), 5074; https://doi.org/10.3390/su12125074 - 22 Jun 2020

Cited by 3 | Viewed by 2925

Abstract

Spam posts in web forum discussions inconvenience users and lower the value of the web forum as an open source of user opinion. Because the importance of a web post is evaluated in terms of the number of involved authors, such noise distorts the results of opinion analysis by adding unnecessary data. In this work, an automatic model for detecting spam posts in web forums is proposed using both conventional machine learning and deep learning. To provide labels for distinguishing normal posts from spam, evaluators were first asked to identify spam posts. For the machine learning-based model, linguistically motivated text features were extracted from the post content using text mining techniques, and supervised learning was performed to distinguish content noise from normal posts. For the deep learning model, raw text, both including and excluding special characters, was used as input. Two recurrent neural network (RNN) architectures, a simple RNN and a long short-term memory (LSTM) network, were also compared. Furthermore, the proposed model was applied to two web forums. The experimental results indicate that the deep learning model achieves significantly higher accuracy than conventional machine learning based on text features. The accuracy of the proposed model using LSTM reaches 98.56%, and the precision and recall of the noise class reach 99% and 99.53%, respectively.

(This article belongs to the Section Economic and Business Aspects of Sustainability)
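The abstract above feeds raw text, with and without special characters, into RNN and LSTM models. The paper's tokenization is not specified here; the sketch below assumes a character-level encoding and shows only the preprocessing step that produces fixed-length id sequences for both input variants. The padding id 0, unknown id 1, and maximum length are hypothetical conventions.

```python
import re

def build_vocab(texts):
    """Map each character seen in the corpus to an id; 0 = padding, 1 = unknown."""
    chars = sorted(set("".join(texts)))
    return {ch: i + 2 for i, ch in enumerate(chars)}

def encode_chars(text, vocab, keep_special=True, max_len=20):
    """Convert a post into a fixed-length list of character ids for an RNN/LSTM.
    With keep_special=False, everything except word characters (letters, digits,
    underscore) and whitespace is stripped first."""
    if not keep_special:
        text = re.sub(r"[^\w\s]", "", text)
    ids = [vocab.get(ch, 1) for ch in text[:max_len]]  # truncate, map unknowns to 1
    return ids + [0] * (max_len - len(ids))            # right-pad with 0

vocab = build_vocab(["buy now!!!", "nice post"])
print(encode_chars("buy now!!!", vocab, keep_special=True, max_len=10))
print(encode_chars("buy now!!!", vocab, keep_special=False, max_len=10))
```

Keeping special characters can matter for spam, where runs of punctuation or symbols are themselves a signal; encoding both variants makes it possible to compare models on each.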

26 pages, 1524 KiB

Open Access | Review

Spam Review Detection Techniques: A Systematic Literature Review

by Naveed Hussain, Hamid Turab Mirza, Ghulam Rasool, Ibrar Hussain and Mohammad Kaleem

Appl. Sci. 2019, 9(5), 987; https://doi.org/10.3390/app9050987 - 8 Mar 2019

Cited by 81 | Viewed by 11289

Abstract

Online reviews of purchased products and services have become the main source of users’ opinions. To gain profit or fame, spam reviews are often written to promote or demote a few target products or services. This practice is known as review spamming. In the past few years, a variety of methods have been proposed to address the problem of spam reviews. In this study, the researchers carry out a comprehensive review of existing studies on spam review detection using the Systematic Literature Review (SLR) approach. Overall, 76 existing studies are reviewed and analyzed. The researchers evaluated the studies based on how features are extracted from review datasets and on the methods and techniques employed to solve the review spam detection problem. Moreover, this study analyzes the metrics used to evaluate review spam detection methods. This literature review identified two major feature extraction techniques and two different approaches to review spam detection. In addition, this study identified the performance metrics commonly used to evaluate the accuracy of review spam detection models. Lastly, this work presents an overall discussion of feature extraction approaches for review datasets, the proposed taxonomy of spam review detection approaches, evaluation measures, and publicly available review datasets. Research gaps and future directions in the domain of spam review detection are also presented. This research found that the success factors of any review spam detection method are interdependent: feature extraction depends on the review dataset, and the accuracy of review spam detection methods depends on the choice of feature engineering approach. Therefore, to implement a spam review detection model successfully and achieve better accuracy, these factors must be considered in relation to each other. To the best of the researchers’ knowledge, this is the first comprehensive review of existing studies in the domain of spam review detection using the SLR process.

11 pages, 815 KiB

Open Access | Article

CoFea: A Novel Approach to Spam Review Identification Based on Entropy and Co-Training

by Wen Zhang, Chaoqi Bu, Taketoshi Yoshida and Siguang Zhang

Entropy 2016, 18(12), 429; https://doi.org/10.3390/e18120429 - 30 Nov 2016

Cited by 9 | Viewed by 4800

Abstract

With the rapid development of electronic commerce, spam reviews that manipulate online customers’ opinions of the goods being sold are growing rapidly on the Internet. This paper proposes a novel approach, called CoFea (Co-training by Features), to identify spam reviews based on entropy and the co-training algorithm. After sorting all lexical terms of the reviews by entropy, we produce two views of the reviews by dividing the lexical terms into two subsets: one contains the odd-numbered terms and the other the even-numbered terms. Using an SVM (support vector machine) as the base classifier, we further propose two strategies, CoFea-T and CoFea-S, embedded in the CoFea approach. The CoFea-T strategy uses all terms in the subsets for spam review identification by SVM. The CoFea-S strategy uses a predefined number of terms with small entropy for spam review identification by SVM. The experimental results show that the CoFea-T strategy produces better accuracy than the CoFea-S strategy, while the CoFea-S strategy saves more computing time than the CoFea-T strategy with acceptable accuracy in spam review identification.
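The view construction described above (sort terms by entropy, then split them into odd- and even-numbered subsets) can be sketched as follows. The abstract does not define which entropy is used, so this sketch assumes the entropy of the class distribution among labeled reviews containing each term; that choice, the tie-breaking by term, and the helper names are assumptions. Training an SVM on each view inside a co-training loop is left out.

```python
import math
from collections import Counter, defaultdict

def term_entropies(docs, labels):
    """Entropy (bits) of the class distribution among labeled reviews containing
    each term. This is one plausible reading of 'term entropy', assumed here."""
    by_term = defaultdict(Counter)
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            by_term[term][y] += 1
    ent = {}
    for term, counts in by_term.items():
        total = sum(counts.values())
        ent[term] = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return ent

def cofea_views(docs, labels):
    """Sort terms by ascending entropy (ties broken alphabetically), then split into
    odd-numbered (1st, 3rd, ...) and even-numbered (2nd, 4th, ...) subsets."""
    ent = term_entropies(docs, labels)
    ranked = sorted(ent, key=lambda t: (ent[t], t))
    return ranked, ranked[0::2], ranked[1::2]

def cofea_s_view(view, k):
    """CoFea-S style: keep only the first k (lowest-entropy) terms of a sorted view."""
    return view[:k]

docs = [["great", "stay", "clean"], ["great", "fake", "amazing"],
        ["clean", "room"], ["amazing", "fake"]]
labels = [0, 1, 0, 1]  # 0 = truthful, 1 = spam (assumed encoding)
ranked, view1, view2 = cofea_views(docs, labels)
print(view1, view2)
```

Because each view keeps a slice of the entropy-sorted list, both views receive a mix of highly discriminative (low-entropy) and ambiguous terms, which is what makes them usable as two comparable feature sets for co-training.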

15 pages, 765 KiB

Open Access | Article

CoSpa: A Co-training Approach for Spam Review Identification with Support Vector Machine

by Wen Zhang, Chaoqi Bu, Taketoshi Yoshida and Siguang Zhang

Information 2016, 7(1), 12; https://doi.org/10.3390/info7010012 - 9 Mar 2016

Cited by 23 | Viewed by 5361

Abstract

Spam reviews are increasingly appearing on the Internet to promote sales or defame competitors by misleading consumers with deceptive opinions. This paper proposes a co-training approach called CoSpa (Co-training for Spam review identification) that identifies spam reviews using two views: the lexical terms derived from the textual content of the reviews, and the PCFG (Probabilistic Context-Free Grammar) rules derived from a deep syntactic analysis of the reviews. Using an SVM (Support Vector Machine) as the base classifier, we develop two strategies, CoSpa-C and CoSpa-U, embedded within the CoSpa approach. The CoSpa-C strategy selects the unlabeled reviews classified with the largest confidence to augment the training dataset and retrain the classifier. The CoSpa-U strategy randomly selects unlabeled reviews with a uniform distribution of confidence. Experiments on the spam dataset and the deception dataset demonstrate that both proposed CoSpa algorithms outperform the traditional SVM with lexical terms and PCFG rules in spam review identification. Moreover, the CoSpa-U strategy outperforms the CoSpa-C strategy when the absolute value of the SVM decision function is used as the confidence.
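The two selection strategies above differ only in which unlabeled reviews are added to the training set in each co-training round. The sketch below isolates that step, taking as input SVM decision values already computed by one view's classifier, with confidence defined as the absolute decision value as in the abstract. The CoSpa-U function is our own interpretation of "uniform distribution of confidence" (one random pick from each of k equal-width confidence bins), and the sign-to-label mapping is an assumption; neither is confirmed as the authors' implementation.

```python
import random

def select_confident(decision_values, k):
    """CoSpa-C style: indices of the k unlabeled reviews with the largest |f(x)|,
    where f(x) is the SVM decision value."""
    order = sorted(range(len(decision_values)),
                   key=lambda i: abs(decision_values[i]), reverse=True)
    return order[:k]

def select_uniform(decision_values, k, seed=0):
    """CoSpa-U style (our interpretation): split the confidence range into k
    equal-width bins and draw one review at random from each non-empty bin."""
    rng = random.Random(seed)
    conf = [abs(v) for v in decision_values]
    lo, hi = min(conf), max(conf)
    width = (hi - lo) / k or 1.0  # avoid zero width when all confidences are equal
    bins = [[] for _ in range(k)]
    for i, c in enumerate(conf):
        bins[min(int((c - lo) / width), k - 1)].append(i)
    return [rng.choice(b) for b in bins if b]

def pseudo_labels(decision_values, indices):
    """Label selected reviews by the sign of f(x): 1 = spam, 0 = truthful (assumed)."""
    return {i: int(decision_values[i] > 0) for i in indices}

scores = [0.1, -2.0, 0.5, 1.5, -0.05]  # toy decision values from one view's SVM
print(pseudo_labels(scores, select_confident(scores, 2)))
print(select_uniform(scores, 3))
```

In a full co-training loop, the reviews selected and pseudo-labeled by the lexical-term classifier would be added to the training set of the PCFG-rule classifier, and vice versa, before both are retrained.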

Search Results (9)