Semantic Features-Based Discourse Analysis Using Deceptive and Real Text Reviews
Abstract
1. Introduction
- It highlights the deep semantic attributes that distinguish fake reviews from real ones.
- It compares word frequency–based features with deep-learning features to characterize fake versus real reviews, and concludes that semantic awareness in text matters more than frequency counts alone.
- It applies a credibility check to reviews that could serve as a discourse analysis tool.
- It uses robust methods that could be applied to other e-commerce websites to check the validity of reviews.
2. Related Work
3. Materials and Methods
3.1. Preprocessing and Lemmatization
3.2. Term Frequency–Inverse Document Frequency Features
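
As a rough illustration of the frequency-based features named in this heading, the sketch below computes the textbook weighting tf × log(N/df) over tokenized documents. The exact TF-IDF variant, smoothing, and normalization used in the study are not stated here, so the function `tfidf` and the toy reviews are hypothetical and serve only to show the idea.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the plain textbook form tf * log(N / df); this is an assumption,
    not necessarily the variant used in the study.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already lemmatized toy reviews (not from the study's dataset).
reviews = [
    ["room", "clean", "staff", "friendly"],
    ["room", "dirty", "staff", "rude"],
    ["amazing", "hotel", "amazing", "stay"],
]
weights = tfidf(reviews)
for doc_weights in weights:
    print({term: round(w, 3) for term, w in doc_weights.items()})
```

Terms that occur in many documents, such as "room" here, receive a low weight, while a term repeated inside a single review, such as "amazing", receives a high one. The weighting sees only counts, which is the limitation the semantic features are meant to address.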
3.3. Machine-Learning Classification
3.4. Universal Sentence Encoding (USE) Method
3.5. USE Layer–Based Deep-Learning Classification
4. Results and Discussion
4.1. Data Description
4.2. Evaluation Metrics
4.3. Experiment 1
4.3.1. Holdout Validation
4.3.2. K-Fold Validation
4.4. Experiment 2
4.5. Comparison
5. Theoretical Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Mahir, E.M.; Akhter, S.; Huq, M.R. Detecting fake news using machine learning and deep learning algorithms. In Proceedings of the 2019 7th International Conference on Smart Computing & Communications (ICSCC), Sarawak, Malaysia, 28–30 June 2019; pp. 1–5.
- Girgis, S.; Amer, E.; Gadallah, M. Deep learning algorithms for detecting fake news in online text. In Proceedings of the 2018 13th International Conference on Computer Engineering and Systems (ICCES), Cairo, Egypt, 18–19 December 2018; pp. 93–97.
- Toral, S.; Martínez-Torres, M.; Gonzalez-Rodriguez, M. Identification of the unique attributes of tourist destinations from online reviews. J. Travel Res. 2018, 57, 908–919.
- Jacobs, T.; Tschötschel, R. Topic models meet discourse analysis: A quantitative tool for a qualitative approach. Int. J. Soc. Res. Methodol. 2019, 22, 469–485.
- Popat, K.; Mukherjee, S.; Strötgen, J.; Weikum, G. CredEye: A credibility lens for analyzing and explaining misinformation. In Proceedings of the Web Conference 2018 (WWW ’18), Lyon, France, 23–27 April 2018; pp. 155–158.
- Agrawal, S.R. Adoption of WhatsApp for strengthening internal CRM through social network analysis. J. Relatsh. Mark. 2021, 20, 261–281.
- Racine, S.S.J. Changing (Inter) Faces: A Genre Analysis of Catalogues from Sears, Roebuck to Amazon.com; University of Minnesota: Minneapolis, MN, USA, 2002.
- Skalicky, S. Was this analysis helpful? A genre analysis of the Amazon.com discourse community and its “most helpful” product reviews. Discourse Context Media 2013, 2, 84–93.
- Chen, C.; Wen, S.; Zhang, J.; Xiang, Y.; Oliver, J.; Alelaiwi, A.; Hassan, M.M. Investigating the deceptive information in Twitter spam. Future Gener. Comput. Syst. 2017, 72, 319–326.
- Feng, V.W.; Hirst, G. Detecting deceptive opinions with profile compatibility. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, Nagoya, Japan, 14–19 October 2013; pp. 338–346.
- Cody, M.J.; Marston, P.J.; Foster, M. Deception: Paralinguistic and verbal leakage. Ann. Inter. Commu. Assoc. 1984, 8, 464–490.
- Ramalingam, A.; Navaneethakrishnan, S.C. An Analysis on Semantic Interpretation of Tamil Literary Texts. J. Mob. Multimed. 2022, 18, 661–682.
- Arenas-Marquez, F.J.; Martínez-Torres, M.R.; Toral, S. Electronic word-of-mouth communities from the perspective of social network analysis. Technol. Anal. Strateg. Manag. 2014, 26, 927–942.
- Govers, R.; Go, F.M. Deconstructing destination image in the information age. Inf. Technol. Tour. 2003, 6, 13–29.
- Conroy, N.K.; Rubin, V.L.; Chen, Y. Automatic deception detection: Methods for finding fake news. Proc. Assoc. Inf. Sci. Technol. 2015, 52, 1–4.
- Mondo, T.S.; Perinotto, A.R.; Souza-Neto, V. A user-generated content analysis on the quality of restaurants using the TOURQUAL model. J. Glob. Bus. Insights 2022, 7, 1–15.
- Perinotto, A.R.C.; Araújo, S.M.; Borges, V.d.P.C.; Soares, J.R.R.; Cardoso, L.; Lima Santos, L. The Development of the Hospitality Sector Facing the Digital Challenge. Behav. Sci. 2022, 12, 192.
- Santos, A.I.G.P.; Perinotto, A.R.C.; Soares, J.R.R.; Mondo, T.S.; Cembranel, P. Expressing the Experience: An Analysis of Airbnb Customer Sentiments. Tour. Hosp. 2022, 3, 685–705.
- Larcker, D.F.; Zakolyukina, A.A. Detecting deceptive discussions in conference calls. J. Account. Res. 2012, 50, 495–540.
- Barbado, R.; Araque, O.; Iglesias, C.A. A framework for fake review detection in online consumer electronics retailers. Inf. Process. Manag. 2019, 56, 1234–1244.
- Du, X.; Zhao, F.; Zhu, Z.; Han, P. DRDF: A Deceptive Review Detection Framework of Combining Word-Level, Chunk-Level, And Sentence-Level Topic-Sentiment Models. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 8–22 July 2021; pp. 1–7.
- Weng, C.H.; Lin, K.C.; Ying, J.C. Detection of Chinese Deceptive Reviews Based on Pre-Trained Language Model. Appl. Sci. 2022, 12, 3338.
- Shojaee, S.; Murad, M.A.A.; Azman, A.B.; Sharef, N.M.; Nadali, S. Detecting deceptive reviews using lexical and syntactic features. In Proceedings of the 2013 13th International Conference on Intelligent Systems Design and Applications, Selangor, Malaysia, 8–10 December 2013; pp. 53–58.
- Olmedilla, M.; Martínez-Torres, M.R.; Toral, S. Harvesting Big Data in social science: A methodological approach for collecting online user-generated content. Comput. Stand. Interfaces 2016, 46, 79–87.
- Ku, Y.C.; Wei, C.P.; Hsiao, H.W. To whom should I listen? Finding reputable reviewers in opinion-sharing communities. Decis. Support Syst. 2012, 53, 534–542.
- Li, S.; Zhong, G.; Jin, Y.; Wu, X.; Zhu, P.; Wang, Z. A Deceptive Reviews Detection Method Based on Multidimensional Feature Construction and Ensemble Feature Selection. IEEE Trans. Comput. Soc. Syst. 2022.
- Cao, N.; Ji, S.; Chiu, D.K.; Gong, M. A deceptive reviews detection model: Separated training of multi-feature learning and classification. Expert Syst. Appl. 2022, 187, 115977.
- Jacob, M.S.; Selvi Rajendran, P. Deceptive Product Review Identification Framework Using Opinion Mining and Machine Learning. In Mobile Radio Communications and 5G Networks; Springer: Singapore, 2022; pp. 57–72.
- Bird, S.; Klein, E.; Loper, E. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2009.
- Hub, T. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: http://download.tensorflow.org/paper/whitepaper2015.pdf (accessed on 10 December 2022).
- Ott, M.; Cardie, C.; Hancock, J.T. Negative deceptive opinion spam. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, Georgia, 9–14 June 2013; pp. 497–501.
- Ott, M.; Choi, Y.; Cardie, C.; Hancock, J.T. Finding deceptive opinion spam by any stretch of the imagination. arXiv 2011, arXiv:1107.4557.
- Rout, J.K.; Dalmia, A.; Choo, K.K.R.; Bakshi, S.; Jena, S.K. Revisiting semi-supervised learning for online deceptive review detection. IEEE Access 2017, 5, 1319–1327.
- Hassan, R.; Islam, M.R. Detection of fake online reviews using semi-supervised and supervised learning. In Proceedings of the 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox’s Bazar, Bangladesh, 7–9 February 2019; pp. 1–5.
- Etaiwi, W.; Awajan, A. The effects of features selection methods on spam review detection performance. In Proceedings of the 2017 International Conference on New Trends in Computing Sciences (ICTCS), Amman, Jordan, 11–13 October 2017; pp. 116–120.
- Fusilier, D.H.; Montes-y Gómez, M.; Rosso, P.; Cabrera, R.G. Detection of opinion spam with character n-grams. In Proceedings of the International Conference on Intelligent Text Processing and Computational Linguistics, Cairo, Egypt, 14–20 April 2015; pp. 285–294.
References | Year | Methods | Dataset | Results |
---|---|---|---|---|
[5] | 2018 | Credibility score assessment via web sources based upon user input | Web sources and user inputs of fake and true claims | Macro-avg accuracy, Pipeline = 82% |
[5] | 2018 | Credibility score assessment via web sources based upon user input | Web sources and user inputs of fake and true claims | Macro-avg accuracy, CRF = 80% |
[5] | 2018 | Credibility score assessment via web sources based upon user input | Web sources and user inputs of fake and true claims | Macro-avg accuracy, LSTM = 78.09% |
[20] | 2019 | TF-IDF, user-centric, and bag-of-words features with ML classifiers | Yelp dataset collected via web scraping | Highest achieved F1-score, AdaBoost = 84% |
[21] | 2021 | Word-, sentence-, and chunk-level model | Different public datasets | - |
[22] | 2022 | Chinese language model via dynamic features | Samsung case reviews and restaurant reviews datasets | Precision = 0.92, Recall = 0.91, F-score = 0.91 |
[16] | 2022 | TOURQUAL protocol applied via T-LAB software to find hotel quality control indicators | 1,143,631 user reviews extracted from the TripAdvisor website | - |
[17] | 2022 | A conceptual framework based upon the UTAUT2 model | Data from 195 residents collected using a digital questionnaire | - |
[18] | 2022 | Airbnb customer behavior analyzed using NVivo software | 2353 reviews collected via manual coding | - |
Serial Number | Layer Name | Input Parameter | Activation Function |
---|---|---|---|
1 | Universal Sentence Encoding | Text input | - |
2 | Dropout | 0.2 | - |
3 | Dense | 128 | ReLU |
4 | Dense | 64 | ReLU |
5 | Dense | 32 | ReLU |
6 | Dense | 16 | ReLU |
7 | Dense | 1 | Sigmoid |
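
To give a sense of how small the classification head in the table above is, the following sketch counts the trainable parameters of the five dense layers. It assumes the encoder emits 512-dimensional sentence vectors, which is the output size of the publicly released Universal Sentence Encoder models but is not stated in the table; the constant and function names are hypothetical.

```python
# Assumption: the sentence encoder outputs 512-dimensional vectors.
EMBEDDING_DIM = 512
# Units of dense layers 3-7 in the table above.
DENSE_UNITS = [128, 64, 32, 16, 1]


def dense_param_counts(input_dim, units):
    """Return the trainable parameter count of each fully connected layer."""
    counts = []
    for n_units in units:
        # One weight per (input, unit) pair plus one bias per unit.
        counts.append(input_dim * n_units + n_units)
        input_dim = n_units
    return counts


per_layer = dense_param_counts(EMBEDDING_DIM, DENSE_UNITS)
total = sum(per_layer)
print(per_layer)  # [65664, 8256, 2080, 528, 17]
print(total)      # 76545
```

The dropout layer adds no trainable parameters, so under this assumption the head on top of the encoder has roughly 76.5 thousand weights, most of them in the first dense layer.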
Sub-Exp. | Validation Split | Method | Acc. (%) | Prec. (%) | Rec. (%) | F1 (%) |
---|---|---|---|---|---|---|
S-1 | 70–30 | Naïve Bayesian | 86 | 86 | 86 | 85.5 |
S-1 | 70–30 | Linear Regression | 81 | 84 | 81 | 80.5 |
S-1 | 70–30 | Random Forest | 75 | 75 | 75 | 75 |
S-2 | 60–40 | Naïve Bayesian | 86 | 86 | 85.5 | 85.5 |
S-2 | 60–40 | Linear Regression | 81 | 84 | 81.5 | 80.5 |
S-2 | 60–40 | Random Forest | 74 | 74.5 | 74.5 | 74.5 |
S-3 | 50–50 | Naïve Bayesian | 87 | 87 | 87 | 86.5 |
S-3 | 50–50 | Linear Regression | 81 | 83.5 | 81.5 | 80.5 |
S-3 | 50–50 | Random Forest | 73 | 73.5 | 72.5 | 72.5 |
S-4 | 30–70 | Naïve Bayesian | 84 | 84 | 84 | 83.5 |
S-4 | 30–70 | Linear Regression | 80 | 82.5 | 80.5 | 80 |
S-4 | 30–70 | Random Forest | 70 | 71.5 | 70 | 69.5 |
Methods | 3-Fold | 5-Fold | 8-Fold | 10-Fold |
---|---|---|---|---|
Naïve Bayesian | 72% | 72.8% | 73.5% | 75.1% |
Linear Regression | 83% | 83.2% | 82.9% | 83.2% |
Random Forest | 73% | 76.4% | 78.3% | 79.1% |
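
The fold counts in the table above correspond to partitioning the data into k parts and holding out each part once. A minimal sketch of that partitioning follows; the sample count is hypothetical, the helper `k_fold_indices` is not taken from the study, and real experiments would normally shuffle (and often stratify) the data before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = (list(range(0, start))
                     + list(range(start + size, n_samples)))
        yield train_idx, test_idx
        start += size


N_SAMPLES = 20  # hypothetical dataset size, for illustration only
for k in (3, 5, 8, 10):
    folds = list(k_fold_indices(N_SAMPLES, k))
    test_sizes = [len(test_idx) for _, test_idx in folds]
    print(k, test_sizes)
```

Every sample lands in exactly one test fold, so the score reported for a given k is an average over k train/test rounds rather than the result of a single split.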
Sub-Experiment | Data Splits | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|---|
S1 | 70–30 | 85.62 | 88.40 | 81.27 | 84.70 |
S1 | 60–40 | 82.96 | 83.05 | 81.16 | 82.10 |
S1 | 50–50 | 79.88 | 76.92 | 84.18 | 80.38 |
S1 | 30–70 | 79.29 | 78.54 | 79.96 | 79.24 |
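
As a quick consistency check on the table above, the F1-score can be recomputed from the reported precision and recall using the standard harmonic-mean definition, F1 = 2PR / (P + R). The sketch below does this for each split; the dictionary layout and names are illustrative, and the numbers are copied from the table.

```python
# (precision, recall, reported F1) per data split, copied from the table above.
reported = {
    "70–30": (88.40, 81.27, 84.70),
    "60–40": (83.05, 81.16, 82.10),
    "50–50": (76.92, 84.18, 80.38),
    "30–70": (78.54, 79.96, 79.24),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


recomputed = {split: f1_score(p, r) for split, (p, r, _) in reported.items()}
for split, (_, _, f1_reported) in reported.items():
    diff = abs(recomputed[split] - f1_reported)
    print(f"{split}: recomputed {recomputed[split]:.2f}, "
          f"reported {f1_reported:.2f}, |diff| {diff:.3f}")
```

The recomputed values agree with the reported ones to within about 0.02 percentage points, which is the size of discrepancy expected when precision and recall are themselves rounded to two decimals.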
References | Method | Results (%) |
---|---|---|
[33] | Parts-of-speech, linguistic, word-count, and content-based features | F1-score = 83.7 |
[34] | Semi-supervised learning, best-performing Naïve Bayesian | Accuracy = 85.21 |
[34] | Supervised learning, best-performing Naïve Bayesian | Accuracy = 86.32 |
[35] | Bag-of-words feature selection method, best-performing Naïve Bayesian | Acc. = 87, Prec. = 52.78, Rec. = 92.63 |
[36] | Character n-gram-based deceptive review detection | F1 = 80 (25–75% split) |
Proposed Work | Semantic-aware deep feature-based deceptive review detection | Acc. = 87, Prec. = 87, Rec. = 87, F1 = 86.5 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alawadh, H.M.; Alabrah, A.; Meraj, T.; Rauf, H.T. Semantic Features-Based Discourse Analysis Using Deceptive and Real Text Reviews. Information 2023, 14, 34. https://doi.org/10.3390/info14010034