Article

Linguistic Patterns for Code Word Resilient Hate Speech Identification

1
Institute of Information Systems and Applications, National Tsing Hua University, East District, Guang Fu Rd. Sec. 2, No. 101, Hsinchu City 300, Taiwan
2
Social Networks and Human-Centered Computing, Taiwan International Graduate Program, Institute of Information Sciences, Academia Sinica, 128, Academia Road, Sec. 2, Nankang, Taipei 115, Taiwan
*
Author to whom correspondence should be addressed.
Academic Editors: Pau-Choo Chung, Gary G. Yen, De-Nian Yang and Meng-Hsun Tsai
Sensors 2021, 21(23), 7859; https://doi.org/10.3390/s21237859
Received: 31 October 2021 / Revised: 22 November 2021 / Accepted: 22 November 2021 / Published: 25 November 2021
(This article belongs to the Special Issue AI Drives Our Future Life)
The permanent transition to online activity has brought with it a surge in hate speech discourse. This has prompted increased calls for automatic detection methods, most of which currently rely on a dictionary of hate speech words and supervised classification. This approach often falls short when dealing with newer words and phrases produced by online extremist communities, who use such code words to evade automatic detection. Code words are frequently used and have benign meanings in regular discourse; for instance, “skypes”, “googles”, “bing”, and “yahoos” are all words that carry a hidden hate speech meaning. Such overlap presents a challenge to the traditional keyword approach of collecting data that is specific to hate speech. In this work, we first introduced a word embedding model that learns the hidden hate speech meaning of words. With this insight on code words, we developed a classifier that leverages linguistic patterns to reduce the impact of individual words. The proposed method was evaluated across three different datasets to test its generalizability. The empirical results show that the linguistic patterns approach outperforms the baselines and enables further analysis of hate speech expressions.
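The limitation of the dictionary approach described above can be illustrated with a minimal sketch. It is not the method proposed in the article: the lexicon, the placeholder token `slurword`, the example posts, and the helper names are all hypothetical and chosen only to show why a code word with a benign everyday meaning is either missed or over-flagged by keyword matching.

```python
import re

# Hypothetical lexicon; "slurword" stands in for an entry of a real hate speech dictionary.
HATE_LEXICON = {"slurword"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def keyword_flag(text, lexicon):
    """Flag a post if any of its tokens appears in the lexicon."""
    return any(token in lexicon for token in tokenize(text))

# Hypothetical posts: explicit term, code word usage, benign usage of the same word.
posts = [
    "we don't want any slurword here",
    "we don't want any googles here",
    "googles and yahoos are both search engines",
]

# The plain dictionary misses the code word usage in the second post.
flags = [keyword_flag(p, HATE_LEXICON) for p in posts]

# Adding the code word to the dictionary catches it, but also flags the benign post.
extended_lexicon = HATE_LEXICON | {"googles"}
flags_extended = [keyword_flag(p, extended_lexicon) for p in posts]

print(flags)           # [True, False, False]
print(flags_extended)  # [True, True, True]
```

In this toy setting, neither dictionary separates the second post from the third, since both contain the same word; the surrounding context is what differs, which is the motivation for relying on patterns rather than individual words.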
Keywords: hate speech; social media; linguistic patterns

MDPI and ACS Style

Calderón, F.H.; Balani, N.; Taylor, J.; Peignon, M.; Huang, Y.-H.; Chen, Y.-S. Linguistic Patterns for Code Word Resilient Hate Speech Identification. Sensors 2021, 21, 7859. https://doi.org/10.3390/s21237859

AMA Style

Calderón FH, Balani N, Taylor J, Peignon M, Huang Y-H, Chen Y-S. Linguistic Patterns for Code Word Resilient Hate Speech Identification. Sensors. 2021; 21(23):7859. https://doi.org/10.3390/s21237859

Chicago/Turabian Style

Calderón, Fernando H., Namrita Balani, Jherez Taylor, Melvyn Peignon, Yen-Hao Huang, and Yi-Shin Chen. 2021. "Linguistic Patterns for Code Word Resilient Hate Speech Identification" Sensors 21, no. 23: 7859. https://doi.org/10.3390/s21237859

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
