Analysis of Human Behavior by Mining Textual Data: Current Research Topics and Analytical Techniques

Abstract: The goal of this study was to conduct a literature review of current approaches and techniques for identifying, understanding, and predicting human behaviors through mining a variety of sources of textual data, with a focus on enabling classification of psychological behaviors regarding emotion, cognition, and social empathy. This review was performed using keyword searches in ISI Web of Science, Engineering Village Compendex, ProQuest Dissertations, and Google Scholar. Our findings show that, despite recent advancements in predicting human behaviors based on unstructured textual data, significant developments in data analytics systems for identification, determination of interrelationships, and prediction of human cognitive, emotional, and social behaviors remain lacking.


Introduction
At present, the vast amount of textual data being generated from myriad sources (e.g., formal or informal reports, interviews, call logs, emails, performance documents, blogs, tweets, comments, or social media entries) is rapidly increasing [1]. Although this increase in textual data allows for large repositories to be analyzed, summarized, and deciphered, using these data to make insightful decisions has become much more challenging. Thus, in this study, we sought to explore the current approaches through which the unstructured textual data can be analyzed by extracting valuable information to support decision-making for various purposes. Consequently, we conducted a systematic literature review of the techniques and methods used to identify, understand, and predict human behaviors by mining various textual data sources.
The problem of mining textual data has received substantial attention, owing to the proliferation of social networks that allow the distribution of opinions and the sharing of sentiment on diverse subject matters. This literature review focuses on methods for understanding human psychological behavior through the use of textual data. Mining textual data can provide deep insights into an individual's views, attitudes, sentiments, and emotions toward other individuals and help predict future social behaviors [2]. Such human behaviors can be identified and understood by extracting textual data with meaningful semantic properties, including metadata such as concepts, events, keywords, and categories, as well as symmetric and asymmetric relationships. Such knowledge can facilitate improved decision-making (e.g., personnel selection and training) or intelligence analyses [3]. According to Bornstein et al. [4], human behavior is described as "the potential and expressed capacity for physical, mental, and social activity during the phases of human life." Regarding the identification of behaviors by text mining, Tausczik et al. [5] stated that "by drawing on massive amounts of [...]"
• RQ1: What has been the most relevant research reported in the scientific literature for the identification of human behaviors through text mining?
• RQ2: How can current analytical techniques for the prediction of human behavior from unstructured textual data be classified?
The inclusion criteria were as follows: (a) papers written in English; (b) peer-reviewed papers; and (c) papers depicting graphs, charts, equations, and/or tables presenting text mining techniques that initially identified research focusing only on methods of psychological analysis of behavior. The exclusion criteria were as follows: (a) papers not written in English; (b) papers determined upon evaluation to be unrelated to the research questions; and (c) opinions, letters, and editorials.
A search strategy for the review was used to identify papers applicable to answering the research questions. The strategy involved defining the search space and the vetting process to be used in identifying pertinent literature. Recent and influential literature in the field of text mining, including journal articles, textbooks, proceedings, and grey literature, served as important sources in this research.
Drawing on knowledge of the subject matter and on widely cited articles such as [2,5], we developed a set of keywords, which, after testing in search engines, was reduced to the 15 keywords presented in Table 1. Subsequently, this set was used to query databases such as EBSCOhost, Compendex, IEEE Xplore, Google Scholar, and ProQuest. This process resulted in a reduction of the core search parameters used to identify the key components affecting the prediction and understanding of human behavior via text mining. After retrieving the articles, we carefully chose pertinent papers. The terms used in the EBSCOhost database were as follows: ("data mining" [...]

To assess the risk of bias in the present study, we used the Cochrane Risk of Bias Tool [8] as a support instrument. The relevant papers were classified among different bias domains: sequence generation (the methods through which the data were collected), allocation concealment (whether data allocations could have been foreseen before or during collection), blinding of participants (the people who generated the text), blinding of outcomes (the people who generated the text data not having knowledge of the results), incomplete outcome data (whether the papers showed completeness of the outcome in their results), and selective outcome reporting (whether the authors showed outcome reporting and what was found). Figure 1 depicts the number of papers in each of these categories. Most of the papers had a low risk of bias. The use of the Cochrane Tool helped us reduce possible biases that could have affected the quality of the review and thus the reliability of its conclusions. The risk of bias was evaluated using a subjective judgment (high, low, or unclear) regarding the individual elements of the domains represented in Figure 1. Once the classification was made, a percentage estimate of these judgments was obtained. On average, across the different domains, approximately 58% of the judgments were low, 23% unclear, and 18% high risk of bias.

Results
To understand the evolution of the research on the prediction of human behavior on the basis of unstructured textual data, the selection procedure and the numbers of papers selected in the various stages of selection are shown in Figure 2. Then the literature review was classified into categories. After a category was identified, we proceeded to identify the main text mining approach and the main insights of each work analyzed. The characteristics of the included papers are shown in Table 2.

Table 2. Characteristics of the included papers.
Reference | Title | Text mining approach | Category
[...] | Words of wisdom: Language use over the life span | Natural language processing | Cognition and theory
[9] | Mobile phones as medical devices in mental disorder treatment: an overview | Natural language processing | Emotional
[10] | Opinion Mining for text classification | Document classification | Emotional
[11] | A new significant area: Emotion detection in E-learning using opinion mining techniques | Natural language processing | Emotional
[12] | Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena | Natural language processing | Emotional
[12] | Twitter mood predicts the stock market | Information extraction | Social
[13] | Sentiment classification based on supervised latent n-gram analysis | Document classification | Emotional
[14] | Supporting disease insight through data analysis: Refinements of the monarca self-assessment system | Natural language processing | Emotional
[15] | Smartphone-Based Recognition of States and State Changes in Bipolar Disorder Patients | Natural language processing | Emotional
[16] | Mining and summarizing customer reviews | Natural language processing | Emotional
[17] | Sentiment analysis with long short-term memory networks | Natural language processing | Emotional
[18] | Thumbs up? Sentiment Classification using Machine Learning Techniques | Natural language processing | Emotional
[19] | A faceted characterization of the opinion mining landscape | Natural language processing | Emotional
[20] | A [...] | [...] | [...]
[23] | Product aspect ranking using sentiment analysis: A survey | Web mining | Emotional
[24] | Opinion mining and sentimental analysis approaches: A survey | Natural language processing | Emotional
[25] | Opinion mining and sentiment analysis | Natural language processing | Emotional
[26] | Sentiment analysis and opinion mining: A survey | Natural language processing | Emotional
[27] | #MyDepressionLooksLike: Examining Public Discourse About Depression on Twitter | Web mining | Emotional
[28] | Understanding customers using Facebook Pages: Data mining users feedback using text analysis | Natural language processing | Emotional and social
[29] | Using causal models in heterogeneous information fusion to detect terrorists | Natural language processing | Emotional and social
[30] | INSiGHT: A system for detecting radicalization trajectories in large heterogeneous graphs | Natural language processing | Emotional and social
[31] | Harvesting and analysis of weak signals for detecting lone wolf terrorists | Natural language processing | Emotional and social
[32] | Detecting Linguistic Markers for Radical Violence in Social Media | Natural language processing | Emotional and social
[33] | Personality and language: The projection and perception of personality in computer-mediated communication | Natural language processing | Emotional and social
[34] | Hierarchical Sentiment Analysis Model for Automatic Review Classification for E-commerce Users | Natural language processing | Emotional and social
[35] | Assessing Bipolar Episodes Using Speech Cues Derived from Phone Calls | Information retrieval | Emotional and social
[36] | Using behavioral indicators to help detect potential violent acts | Natural language processing | Emotional and social
[37] | Sentiment analysis: capturing favorability using natural language processing | Natural language processing | Emotional and social
[38] | Identifying topical influencers on twitter based on user behavior and network topology | Natural language processing | Emotional and social
[39] | Language-based personality: a new approach to personality in a digital world | Natural language processing | Emotional and cognition
[40] | The efficacy of SMS text messages to compensate for the effects of cognitive impairments in schizophrenia | Natural language processing | Emotional and cognition
[41] | Sentiment classification on customer feedback data: Noisy data, large feature vectors, and the role of linguistic analysis | Natural language processing | Emotional and theory
[42] | The Development and Psychometric Properties of LIWC2015 | Natural language processing | Emotional and theory
[43] | The Role of Text Pre-processing in Sentiment Analysis | Document clusterization | Emotional and theory
[44] | Text sentiment analysis based on long short-term memory | Information extraction | Emotional and theory
[45] | Analysing the presence of school-shooting related communities at social media sites | Web mining | Social
[46] | The State of the Art 2015: A literature review of social media intelligence capabilities for counter-terrorism | Natural language processing | Social
[47] | Opinion Mining platform for Intelligence in business | Natural language processing | Social
[48] | Mining the peanut gallery: opinion extraction and semantic classification of product reviews | Information extraction | Social
[49] | Analysis of Online Social Networks Posts to Investigate Suspects Using SEMCON | Information retrieval | Social
[50] | A mutually beneficial integration of data mining and information extraction | Information extraction | Social
[51] | Product aspect ranking and its applications | Information extraction | Social
[52] | Opinion zoom: a modular tool to explore tourism opinions on the web | Natural language processing | Social
[53] | Language and interaction: applying sociolinguistics to social network analysis | Information retrieval | Social
[54] | A Generic Architecture for a Social Network Monitoring and Analysis System | Document classification | Social

The articles, which were included on the basis of the publication date, relevance, and content, were classified into three main behavioral categories: emotional, social, and cognitive. For each paper, we identified objectives, algorithms/techniques, models of computational aims, and main applications. Each of the included papers was subclassified according to the categories shown in Figure 3. The present review retrieved a combination of 82 relevant papers, which are identified by subcategory.

The results indicated that more than 50% of the reviewed literature was completed by using natural language processing (NLP), which was one of the strongest approaches. This method was followed by information extraction (15%); document classification and clusterization (13%); and web mining, information retrieval, and summarization (20% combined).

We also provide a map of the co-occurrence of the "text mining" term in the title and abstract in Figure 5. We used VOSviewer software (https://www.vosviewer.com/, accessed on 10 June 2021) to map the bibliometric data as a network and develop keyword co-occurrence maps. In this figure, links between the "text mining" node and other nodes show the co-occurrence of the terms, and their sizes indicate the frequency of occurrence.
The difficulty of efficiently analyzing unstructured information about people makes continuous monitoring of a given individual's performance or learning effectiveness very difficult. This aspect explains the increased need for automated techniques to analyze and tag signatures of human behavior and human performance. Performed manually by a human expert, such a task might require weeks or months, and the results may be biased by emotional, relational, and other environmental factors. The present study discusses the state of the art in the applications of behavior analysis from the mining of unstructured texts to assess attitudes, emotions, or performance at the individual level.

Discussion
In this section, the included papers are discussed according to their research methods and categories of human behavior.

Research Methods and Classification of Approaches and Techniques for Text Mining
Recent studies have highlighted multiple applications of text mining in a variety of forms. Selected examples of such applications, which used various data mining algorithms and methods and are relevant to the goals of this research, are briefly reviewed below. Huang et al. [92] described a technique focusing on processing large quantities of unstructured intelligence information and used mathematical analyses for unstructured data in emergency systems. This approach focuses on visual computing, cognitive modeling, and NLP to build an intelligence information service platform. Bakshi [93] reported an unstructured data-mining method based on the data processing paradigm MapReduce.
MapReduce is a programming model for processing large quantities of unstructured data and generating datasets by using a parallel, distributed algorithm on a cluster to extract sentiment information and other meaningful social and relational data. Weerdt et al. [74] proposed a method based on a combination of text mining and trace-clustering for incident reporting and predicting possible modes of action.
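The map, group, and reduce stages of the MapReduce model mentioned above can be illustrated with a minimal single-machine sketch. This is not the method of Bakshi [93]; the toy lexicon and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `sentiment_counts`) are assumptions introduced only for illustration, and a real deployment would distribute the map calls across a cluster.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical toy lexicon (an assumption for this sketch only).
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "sad", "angry"}


def map_phase(document):
    """Emit a (label, 1) pair for each sentiment-bearing word in one document."""
    pairs = []
    for word in document.lower().split():
        token = word.strip(".,!?")
        if token in POSITIVE:
            pairs.append(("positive", 1))
        elif token in NEGATIVE:
            pairs.append(("negative", 1))
    return pairs


def shuffle_phase(mapped):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped):
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def sentiment_counts(documents):
    # Each map_phase call is independent, so it could run on a separate worker.
    mapped = [map_phase(doc) for doc in documents]
    return reduce_phase(shuffle_phase(mapped))
```

For example, `sentiment_counts(["Great day, happy news!", "Bad service. Sad and angry."])` yields two positive and three negative hits.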
Wu et al. [28] described an analytical process for interpreting dialogue on social network platforms, such as Facebook or Twitter group pages. This technique focuses on critical elements of posted internet content and uses textual analyses to apply knowledge and information processing to extract key phrases from conversations. Shahbaz et al. [68] proposed a method based on the software Sentiment Miner. Sentiment Miner filters text files, such as interviews, for "opinion mining" at the sentence level by using NLP techniques and opinion mining (OM) algorithms. This approach filters users or groups of users that are relevant to the custom search query in question via an analysis of unstructured data. Chakraborty et al. [64] described an approach to analyzing unstructured textual data to extract user insights from an extensive collection of documents by using "Text-Miner" and "SAS Sentiment Analysis", which are based on artificial neural networks and use a regression model to predict target variables such as descriptive classifications of behavioral models.
Every day, humans generate vast amounts of textual information, which is stored electronically. This information is of great relevance because it includes information on moods, opinions, behavioral trends, and preferences. An example of this value is text mining for commercial uses, such as consumer identification and purchase preferences for products and services [94]. Text mining includes the application of different methodological approaches and algorithms, the main groups of which are outlined below.
Information extraction: Information extraction enables the automatic extraction of structured information from unstructured or semi-structured documents or databases, which can later be used to perform calculations. It has various applications, such as consumer care and personal information management [96].
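As a minimal sketch of turning free text into structured fields, the following example pulls e-mail addresses and ISO-style dates out of a sentence with regular expressions. The patterns and the `extract_record` helper are assumptions chosen for illustration; they are deliberately simple and do not reflect any system cited in this review.

```python
import re

# Simplified patterns (assumptions): real extractors handle many more formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_record(text):
    """Return a structured record with the e-mails and dates found in the text."""
    return {
        "emails": EMAIL.findall(text),
        "dates": DATE.findall(text),
    }
```

The resulting dictionary can then be stored or aggregated like any other structured record.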
Information retrieval: Information retrieval is a process for organizing and retrieving information at different levels of storage, whether from metadata, images, documents, or information within a specific document or query. The process is performed according to the user's needs via a query, after which the information is indexed and filtered, and the relevant data are subsequently extracted and returned to the user for a corresponding purpose.

Document classification and clusterization: In many practical applications, data must be organized into groups according to content to facilitate information handling. The process of organizing the information often uses supervised document classification models or methods created for unsupervised clusterization, such as non-negative matrix factorization (NMF) [97] or latent Dirichlet allocation (LDA) [98]. As discussed by Kowsari et al. [99], text classification is a significant challenge in many domains and fields of application. Clustering, in contrast, organizes documents into groups in an unsupervised manner, by grouping similar items that are not initially labeled, thereby decreasing the possibility that data are incorrectly assigned [100].
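Returning to the retrieval process described above (query, index, filter, return), a minimal sketch is given below. It ranks documents by the summed TF-IDF weight of the query terms; this scoring rule, the tokenizer, and the function names are assumptions made for illustration rather than a description of any particular retrieval engine.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    tokens = (t.strip(".,!?;:").lower() for t in text.split())
    return [t for t in tokens if t]


def build_index(documents):
    """Return per-document term counts and a smoothed inverse document frequency."""
    term_counts = [Counter(tokenize(doc)) for doc in documents]
    doc_freq = Counter(term for counts in term_counts for term in counts)
    n_docs = len(documents)
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    return term_counts, idf


def search(query, documents, top_k=3):
    """Return indices of the best-matching documents, highest score first."""
    term_counts, idf = build_index(documents)
    query_terms = tokenize(query)
    scored = []
    for doc_id, counts in enumerate(term_counts):
        score = sum(counts[t] * idf.get(t, 0.0) for t in query_terms)
        if score > 0:  # filter out documents that share no term with the query
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]
```

Documents that contain none of the query terms are filtered out, and ties are broken by document order.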
Summarization: Summarization creates a short representation of the original text, thus allowing readers to grasp its content at a general level. In the extractive form, it identifies key sentences in a document, which are subsequently presented as a summary with considerable relevance for the user. In contrast, in the abstractive form, the summarization tool attempts to understand the information described in the text to later extract the relevant idea and present it to the user [101].
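The key-sentence form of summarization described above can be sketched with a simple frequency heuristic: each sentence is scored by the average corpus frequency of its non-stopword tokens, and the top-scoring sentences are returned in their original order. The stopword list and the scoring rule are assumptions for illustration only.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "on", "for"}


def content_words(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Pick the sentences whose words are, on average, most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    chosen = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)
```

An abstractive summarizer, by contrast, would generate new wording and cannot be reduced to sentence selection of this kind.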
Natural language processing: NLP is the technique through which data and texts written in natural language are processed. The importance of this technique lies in the difficulties in computing systems' understanding of how humans execute communication, because communication requires not only the use of words and symbols but also accentuation, expression, and their respective meanings within a given context, which in many cases can become abstract [102]. The field of applying NLP ranges from applications such as Grammarly, Siri, or Alexa to sectors such as the economic and health sectors (e.g., the IBM supercomputer Watson).
Web mining: In web mining, techniques and algorithms are used to obtain relevant information found on the internet. It can focus on user activities, such as visited websites, the links that users click during browsing, or the documents consulted. However, the data gathered from the internet are most often used to extract relevant information to find patterns that enable, for example, diagnosis and prediction of trends of consumption. These data also include analyses of a given user's reaction to receiving a product or service and are used to create personalized marketing strategies, thereby enabling business and closing processes to be conducted effectively [103].
OM or sentiment analysis: Sentiment analysis refers to a wide range of machine learning methods, including computational classification techniques that focus on linguistics, NLP, and textual analytics. These techniques allow for the identification of the attitudes and opinions of individuals toward a specific issue on the basis of metrics representing the characteristics of the problems of interest. Such attitudes and opinions can be evaluated via human judgments, expert evaluations of the emotional states of individuals, or the intended meaning of communication in a specific context.
One method for unstructured text mining is sentiment analysis [24,26,69], also known as OM in the context of NLP. Opinions are subjective descriptions, appraisals, and feelings expressed by individuals, whereas sentiment analysis focuses on the algorithmic extraction of various attributes of expressed opinions, such as polarity, subject matter, and ownership. Sentiment mining focuses on the computational analysis of the "subjective" information typically contained in textual sources, such as reports, reviews, blogs, posts, and comments [34]. As noted by Shahbaz et al. [68], the main aim of sentiment analysis is to categorize text at the document or sentence level and to provide information about whether the analyzed text expresses a positive, negative, or neutral sentiment toward a given topic.
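A minimal lexicon-based sketch of sentence-level polarity classification (positive, negative, or neutral) is shown below. The word scores, the negator list, and the rule that a negator flips only the immediately following word are simplifying assumptions; the approaches cited above rely on far richer lexicons or trained models.

```python
import re

# Hypothetical toy lexicon and negators (assumptions for this sketch).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
NEGATORS = {"not", "never", "no"}


def sentence_polarity(sentence):
    """Classify one sentence as 'positive', 'negative', or 'neutral'."""
    score = 0
    negate = False
    for token in re.findall(r"[a-z']+", sentence.lower()):
        if token in NEGATORS:
            negate = True
            continue
        if token in LEXICON:
            value = LEXICON[token]
            score += -value if negate else value
        negate = False  # a negator affects only the next token
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Document-level polarity could be obtained by aggregating the sentence labels, for instance by majority vote.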
Turney [21] and Pang et al. [18] discussed various approaches for categorizing the polarity of individual opinions and introduced the feature-based analysis model, similar to the sentiment analysis used by Hu and Liu [16]. This model can be used to determine views expressed by individuals about specific features (e.g., product features) as well as user attitudes (e.g., positive, neutral, or negative) regarding specific features or aspects of an issue through the use of certain words and sentences. Sentiment analysis can be applied at the text, sentence, or sub-sentence level and can be refined to offer a fine-grained analysis (e.g., on a five-point Likert scale or a five-star rating); it can also be used to detect the emotion of the entity expressing the opinion and can be extended to aspect-based, intent, and multilingual analyses. With the exponential increase in unstructured data, the algorithmic extraction of sentiment from expressed personal views significantly improves behavioral insights and makes behavioral analysis more efficient.

Human Behavior
This literature review proposes three main categories to classify human behavior in the context of text mining: cognitive, emotional, and social behaviors. Most of the literature conveys how textual data are analyzed to understand the activities, mental skills, and social interactions among people, with the goal of identifying emotional, social, and cognitive behaviors, whose characteristics are depicted in Figure 6 [55]. Emotional behavior is correlated with mental health issues (e.g., stress, depression, anger, or violence), and the monitoring and treatment of mental disorders can be supported by extracting textual data from communication devices. Several studies have explicitly shown that assumptions can be made about a given person's current mood by analyzing variations in mobile usage patterns, texting, and calling [9]. Moreover, studies by Tausczik et al. [5] and Rutland et al. [70] described machine-learning methods that analyze the content of messages sent by short message service (SMS), scanning words from the texts and linking them to psychologically meaningful terms that can be used to assess emotions and changes in mood. In addition, audio data from mobile calls can be analyzed and translated to text, and the person's mood can be extracted to detect emotional signatures [35].
Social behavior is associated with issues of social interaction, such as empathy or loneliness. Social networks, such as Twitter, LinkedIn, or Facebook, have been used to study human social behavior. Textual comments were analyzed to identify sentiments and to extract information about attitudes, social activity, and interactions with other users [27]. For example, the number of incoming or outgoing comments or conversations can reflect the current mood of a given individual. In addition, people with mental health conditions tend to increase the number of texts sent during manic episodes, whereas low levels of texts sent can be correlated with depression [14]. At the level of societies, governments, and leaders' actions, Helbing et al. [91] presented hybrid approaches combining complexity science, symmetry, and information systems (text) and demonstrated that they can contribute to areas such as understanding geopolitical tensions by analyzing an extensive data set of newspaper articles. The text of each published news article was searched for mentions of a given country along with a set of keywords typically associated with tensions (for example, crisis, conflict, antagonism, clash, contention, discord, fight, attack, combat) in order to make predictions about subsequent actions.
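The keyword search over news articles described above can be sketched as a simple co-mention count: an article is counted when it names the country and contains at least one tension keyword. The keyword set is taken from the text; the substring match on the country name, the function name, and the fictional country names in the usage note are assumptions, and the actual procedure in [91] may differ.

```python
import re

# Keywords typically associated with tensions, as listed in the text.
TENSION_KEYWORDS = {
    "crisis", "conflict", "antagonism", "clash", "contention",
    "discord", "fight", "attack", "combat",
}


def tension_mentions(articles, country):
    """Count articles that mention `country` together with a tension keyword."""
    country = country.lower()
    count = 0
    for article in articles:
        text = article.lower()
        tokens = set(re.findall(r"[a-z]+", text))
        # Substring match on the country name (an assumption of this sketch).
        if country in text and tokens & TENSION_KEYWORDS:
            count += 1
    return count
```

Tracking this count per country over time would give a crude tension indicator; with the hypothetical articles "Border conflict escalates in Freedonia." and "Freedonia hosts a trade fair.", only the first is counted for Freedonia.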
Cognitive behavior is associated with the performance of mental processes such as thinking and causal reasoning [55]. The information expressed as textual data can be used to monitor an individual's skills in different activities. Some authors divide cognitive functions into categories (e.g., perception, attention, memory, language skills, and executive functioning), which can be monitored by assessing performance in specific tasks within these categories. For example, textual analytics was applied to SMS text messages from people with schizophrenia to help identify cognitive impairment [104].

Emotional Category
In this category, the included papers focused on using smartphones, written information, opinion mining, and customer feedback. Gravenhorst et al. [9] recognized smartphones as a promising technology for use in the treatment of mental disorders through the implementation of sensor devices to monitor illnesses. By using human-computer interfaces to support therapy and by collecting data from patients' daily lives, smartphones can be beneficial for treating people with mental disorders. Grünerbl et al. [15] explored the use of mobile phones to recognize depressive and manic states in people with bipolar disorder. This sensor-based smartphone system can support the treatment of patients with bipolar disorder as a supplementary tool for health care professionals [15]. Muaremi et al. [35] demonstrated the applicability of phone calls to assess episodes of bipolar disorder in patients. Statistics were extracted from various phone call conversations by using speech cues, and different features, social signals, and emotional properties were identified [35]. Li and Qian [44] showed how long short-term memory (LSTM) networks help classify information by analyzing different emotions in texts. This method helps assign a corresponding emotion to each sentence and may be used to project possible trends in preferences [44].
Wang et al. [80] used an emotion evolution law for emotion analysis. This method evaluates natural language text from web news by using one-step and limited-step shifts as well as path transfer; it was validated on a data set of titles, bodies, and comments from news articles. This method can identify feelings such as love-anger, sadness-anger, and joy, thus providing insight into applications regarding affective interaction in network public sentiment, social media communication, and human-computer interaction. Swain et al. [83] proposed a method for detecting suicide ideation by applying sentiment analysis to tweets via supervised learning. Using Python language modules and machine learning models for opinion mining, this research suggests that machine sentiment analysis can aid in timely detection and act as an alert system for suicidal tendencies. Similar work was recently presented by Bayram and Benhiba [88], who used machine learning techniques to identify a person's suicide risk based on the short-term history of their tweets. Fareri et al. [86], in 2020, focused on the development of a data-driven approach using text mining techniques to analyze job profiles and quantify the readiness of employees of a large firm to adopt the Industry 4.0 paradigm. This approach provides a framework for estimating the Industry 4.0 readiness of enterprises.
Mahendran et al. [10] proposed classifying written information as positive, negative, or neutral to efficiently study raw data by using traditional approaches such as Bag of Words, the Naïve Bayes classifier, and frequency distribution. Tausczik et al. [5] used the computerized text analysis program Linguistic Inquiry and Word Count (LIWC) to determine the psychological meaning of textual information. In this program, words are categorized into different psychological classes to assess people's thought processes, emotional states, intentions, and motivations [5]. Turney [21] categorized data according to an algorithmic analysis of the meanings of different words. For example, a review is classified as positive (thumbs up) if it contains positive words and as negative (thumbs down) if it contains negative words. Nasukawa and Yi [37] applied a semantic analysis and achieved 70-95% precision in relating sentiments to positive or negative words in text documents from web pages and articles. Extracting information by using NLP can help determine sentiments expressed online [37]. Thakur and Han [105] presented an approach for analyzing the acceptance of interaction with virtual assistants across different interactive devices, applying NLP-based sentiment analysis to explore the views, expressions, and beliefs expressed by older adults.
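To make the combination of Bag of Words and a Naïve Bayes classifier mentioned above concrete, the following is a minimal multinomial Naïve Bayes sketch with add-one smoothing. The class name, tokenizer, and handling of unseen words (they are skipped) are assumptions for illustration and do not reproduce the implementation in [10].

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Bag-of-words tokens: lowercase, punctuation stripped, order ignored."""
    tokens = (t.strip(".,!?;:").lower() for t in text.split())
    return [t for t in tokens if t]


class NaiveBayes:
    def fit(self, texts, labels):
        """Count documents per class and word occurrences per class."""
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        """Return the class with the highest log posterior."""
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label in sorted(self.class_counts):
            logp = math.log(self.class_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                # Add-one (Laplace) smoothing over the vocabulary.
                logp += math.log((count + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```

A neutral class can be added simply by including neutrally labeled training texts.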
Pennebaker et al. [42] studied the software LIWC, which processes textual information to capture the beliefs, preferences, and sentiments of people expressed in words. This study provides evidence that the words that people use have psychological value [42]. Emotions play a critical role in the studies of human knowledge and behavior. These emotions can be determined by the environmental events of the individual or by their cognitive abilities and social skills. Knowledge management (KM) research considers them from specific angles, and, to date, a comprehensive understanding of the emotions that dominate KM and their prediction has been lacking. To offer a holistic view, the study in [87] investigated the presence of emotions in knowledge management publications by applying sentiment analysis.
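Dictionary-based word counting of the kind LIWC performs can be sketched as follows: each token is looked up in category word lists, and the share of tokens falling into each category is reported as a percentage. The categories and word lists below are hypothetical stand-ins; the actual LIWC dictionaries are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical categories and word lists (assumptions, not the LIWC dictionary).
CATEGORIES = {
    "positive_emotion": {"happy", "love", "nice", "good"},
    "negative_emotion": {"sad", "angry", "hate", "worried"},
    "cognitive_process": {"think", "because", "know", "reason"},
}


def category_profile(text):
    """Return the percentage of tokens that fall into each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in CATEGORIES.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    return {
        category: (100.0 * counts[category] / total if total else 0.0)
        for category in CATEGORIES
    }
```

Comparing such profiles across authors or over time is what allows word use to be linked to psychological states.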
Liu [2] introduced different aspects of sentiment analysis and opinion mining because these two fields have become the most critical approaches in analyzing people's opinions, sentiments, emotions, and attitudes through the collection of textual language. Miedema [17] explored how sentiment classification can be used to arrange documents according to sentiments; in that work, a long short-term memory (LSTM) network was used to classify sentiments gathered from movie reviews. Pang et al. [18] indicated that some machine learning techniques do not perform as well in classifying texts by sentiment as in traditional topic-based categorization, which makes sentiment analysis more challenging. Othman et al. [24] explored approaches for opinion mining and sentiment analysis to gather and analyze information about the opinions of the public. Machine learning can help collect the responses posted on different social media platforms so that the data can be used for various purposes in industry. Acheampong et al. [78] focused on sentiment analysis through emotion detection via text mining. Because data are now easy to source, text mining has led to different approaches in the design of text-based emotion detection systems, which the authors compared in terms of contributions, approaches used, datasets used, results obtained, and strengths and weaknesses.
Vinodhini and Chandrasekaran [26] established that sentiment analysis and opinion mining can help predict future behavioral trends by elucidating the preferences of customers according to what they write. This capability is valuable for economic and marketing studies. Sentiment analysis has also recently been used in logistics and supply chain management to examine customer perceptions of companies' services. For example, Siby et al. [90] presented an interesting application in last-mile logistics. The research used customer reviews of their delivery experience regarding quality, service quality, product return, refund policy, information sharing issues, etc. This work offered recommendations for redesigning processes related to last-mile logistics by introducing artificial intelligence technology. Pang and Lee [25] compared traditional analyses and sentiment-aware applications that process information about the sentiments and opinions of people. Different techniques, benchmarking, future work, and resources were also studied. Salloum et al. [20] proposed a different classification system for the different aspects of opinion mining because the challenges of correctly detecting the meanings and interpretations of different opinions can complicate opinion mining (i.e., an understanding of the domain-specific opinion is required).
Greco and Polli [77] focused on the abundance and use of textual data as a source of valuable information regarding opinions and feelings and discussed the use of emotional text mining in brand management. This method is used to profile social media users' representations and sentiments about a topic by extracting information from a collection of texts such as tweets. Raeesi Vanan [82] performed a study in which 3 million inbound tweets and outbound brand responses (tweets) were collected for brand sentiment analysis. The steps of CRISP-DM were used as a reference guide for business and data understanding, preparation, text mining, validation, and discussion of the study's contributions. The analysis of sentiment trends concluded that customers' sentiments toward a brand are significantly correlated with the brand responding properly to its community over social media and with the brand giving customers a deep feeling that their needs are mutually understood.
Pang and Lee [25] presented the importance of opinion mining and sentiment analysis, which has led to the development of several techniques and tools to gather and process information about the opinions and moods of people. The challenge is to seek better approaches to sentiment-aware applications. Haddi et al. [43] used support-vector machines (SVM) to explore the importance of text pre-processing in sentiment analysis because understanding the relevance of product opinions can be very challenging, owing to the diversity and quantity of unstructured data in existence.
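To make the role of pre-processing concrete, the following is a minimal sketch of the kind of cleaning that typically precedes sentiment classification. It is not the pipeline of Haddi et al. [43]; the function name `preprocess`, the tiny stop-word and negation lists, and the `NOT_` prefix convention are all assumptions chosen for illustration.

```python
import re
from typing import List

# Hypothetical, deliberately tiny word lists; real systems use larger ones.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "of", "to"}
NEGATIONS = {"not", "no", "never"}


def preprocess(review: str) -> List[str]:
    """Clean one raw review and return its content tokens."""
    text = re.sub(r"<[^>]+>", " ", review)         # drop HTML remnants
    tokens = re.findall(r"[a-z']+", text.lower())  # lowercase word tokens
    cleaned = []
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True   # remember to mark the next content word
            continue
        if tok in STOP_WORDS:
            continue        # stop words carry little sentiment signal
        cleaned.append("NOT_" + tok if negate else tok)
        negate = False
    return cleaned


print(preprocess("<p>This is NOT a good camera</p>"))
```

The resulting tokens could then be turned into feature vectors for a classifier such as an SVM; marking negated words keeps "good" and "not good" from collapsing into the same feature.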
Binali et al. [11] indicated that determining the emotional experiences of e-learning students can be difficult; however, through mining techniques, analyses can detect emotion in online students. In addition, identifying the different emotions of e-learning students can help model more suitable educational programs. Another study [16] proposed summarizing customer reviews by choosing product features on which they commented, classifying whether the opinion was positive or negative, and summarizing the results. This analysis is important because extremely high numbers of reviews prevent potential customers from reviewing every single opinion. Mate [23] proposed a ranking of essential product features from the online reviews of consumers. These aspects were identified by the number of times the product features appeared in reports and how these aspects influenced the overall opinions of consumers.
Estrada et al. [79] compared sentiment analysis classification techniques (machine learning, deep learning, and EvoMSA) for classifying education opinions in an Intelligent Learning Environment called ILE-Java. Two corpora of expressions from the programming language domain were developed: sentiTEXT, which has positive and negative polarity labels, and eduSERE, which has positive and negative learning-centered emotion labels. Both reflect students' emotional states and feelings regarding teachers, exams, homework, and academic projects. EvoMSA produced the best results among the classification techniques, with an accuracy of 93% for the sentiTEXT corpus and 84% for the eduSERE corpus. Misuraca et al. [81] discussed opinion mining (OM) as a combination of statistics, linguistics, and computer science that evaluates the sentiments of individual opinions and highlights their semantic orientation. The discussion includes the introduction of OM as a statistical text analysis tool in a learning environment to process student feedback written in natural language, produce useful analytics, and explore text collections from a quantitative viewpoint.
Wu et al. [28] studied how information shared on Facebook pages can be beneficial in determining whether a company is correctly reaching its customers and whether the desired requirements are met. By analyzing the interactions of Facebook users and the reactions to their posts, companies can gather information, apply statistical analyses, and model behavioral trends [28]. Kaur and Bansal [34] introduced opinion mining as a powerful tool for e-commerce because it gathers information about how customers feel about different products. This collection of opinions can help companies make better decisions and align their efforts with what customers really want. The classification of e-commerce users represents an appealing area of study for marketers seeking to align their efforts to capture more consumers [34]. Gamon [41] used large feature vectors and feature reduction to demonstrate that large, noisy data regarding customer feedback can be analyzed and classified. Feedback received from customers can present many challenges, and classifying these data is necessary to retrieve only the important information [41].
Bollen et al. [12] highlighted that many Twitter users express their emotions through this social media platform. With the use of a psychometric instrument, different social events were found to profoundly affect changes in public mood. The identification of these sentiments reflects personality trends, as well as the atmosphere and emotions of Twitter users. Basari et al. [22] examined how tweets can contain information about users' preferences regarding movies. SVM can analyze natural language to determine patterns via opinion mining. Online reviews can help predict the possible preferences of the movie audience [22]. Zengin Alp and Gündüz Ögüdücü [38] introduced a method called Personalized PageRank, which integrates the information retrieved from network topology and the information of Twitter users regarding their actions and activities. This capability has become appealing for marketers because Twitter is an online platform where users share their preferences.
Saire and Cruz [84] used text mining of data collected from social media and search trends to analyze the effects of COVID-19 on the population of Paris, France, from 23 April 2020 to 18 June 2020. The primary findings revealed a decreasing pattern of publication and interest in the health crisis and in the health and economic effects of COVID-19 on the population. Chire-Saire [85] analyzed social media through complex network representation and text mining to compare the effects of COVID-19 in other countries. Focusing on South American countries, the analysis of Twitter texts indicated the existence of patterns similar to those in complex systems, and the visualization of adjacency matrices may potentially identify posts made by robots as opposed to humans.
Frost et al. [14] studied the system MONARCA 2.0 to collect relevant information from bipolar patients, with an aim to provide insight into the disease for both patients and clinicians by processing subjective and objective data about patient mood. This system helps identify patterns in behaviors and factors affecting the disease [14]. Lachmar et al. [27] gathered information shared by individuals with sentiments of depression on Twitter through the hashtag #MyDepressionLooksLike. These tweets presented dysfunctional thoughts, hopeless feelings, and unlovability characteristics, thus revealing how people with depression talk about their symptoms via social networking. Pijnenborg et al. [40] discussed the benefits of using SMS to decrease the effects of cognitive impairments in patients who have schizophrenia. Because schizophrenia also involves delusions and hallucinations, improvements in the status of patients using SMS can be very modest.
Bespalov et al. [13] proposed an approach that projects higher-order n-grams into a lower-dimensional latent space to make the classification of data viable. Supervised latent n-gram analyses can help classify sentiments that are extracted from textual information. Davis et al. [29] determined how analytical models can enhance public safety with the help of probabilistic and parametric methods, as well as different nonlinear algebraic models, by analyzing uncertain data, identifying threats and false alarms, and detecting possible terrorist profiles.
Gill [33] illustrated the relationship between the language used and the personality projected by word choice. The personality traits of extraversion, neuroticism, and psychoticism can be determined by analyzing text from emails [33]. Boyd and Pennebaker [39] studied the language used by people to identify personality patterns. Rather than focusing on responses to self-reported questionnaires, language-based measures represent a new approach to model personality trends. A.S. Cohen et al. [3] applied computerized lexical analyses to determine positive or negative affectivity dimensions through natural speech. Measuring personality was possible because people with positive affectivity demonstrate high levels of positive emotions, whereas those with negative affectivity show high levels of negative emotions.
Brynielsson et al. [31] used different techniques for analyzing data to detect "lone wolf" terrorists with the goal of preventing possible attacks. Analytical models were created by using a platform to harvest and capture online information and trace possible lone wolves [31]. K. Cohen et al. [32] established the challenges of detecting lone wolves by using traditional police methods and introduced new tools and technologies that can detect weak signals in the form of linguistic markers that facilitate the identification of lone wolves' profiles [32].
Hung et al. [30] introduced a new framework and technology called INSiGHT (Investigative Search for Graph-Trajectories) that helps detect groups or individuals whose behavior suggests a potential for violence by identifying radicalization trajectories over time [30]. Paul K. Davis et al. [36] studied behavioral patterns and their usage to predict possible acts of violence.

Social Category
In this category, Alexander Semenov et al. [45] studied the identification of possible school shooters by analyzing the content shared by users on different social media platforms. Future shooters can be identified by analyzing the emails, chats, texts, and social media feeds of prior school shooters sharing similar behaviors [45]. Bartlett and Reynolds [46] presented how social media faces legal and ethical responsibilities, yet also can be useful to prevent terrorism and preserve public safety. Privacy can protect the public and prevent the use of social media for terrorism and propagandistic purposes [46]. Marrese-Taylor et al. [52] tested the software Opinion Zoom to gather online information about tourism opinions to propose solutions to problems in the industry. A modular tool was used because tourism opinions on the web can help predict possible traveling patterns as well as preferences of travelers.
Kastrati et al. [49] investigated the activities of users on online social networks to identify crimes by applying the objective metric SEMCON. By retrieving online posts, feeds, or users' comments, this method can determine whether a user is a suspect [49].
Bollen et al. [12] analyzed how OpinionFinder and Google-Profile of Mood States (GPOMS) can help determine the mood patterns presented on social media regarding worldwide events. This analysis can also help companies predict the behavior of customers regarding the stock market and minimize the effects of fluctuations in the stock market. Bucur [47] established that opinion mining has become a key technique for extracting and collecting the relevant information that companies need to make better decisions and that the opinions of customers are a fundamental input. Opinion mining has become an appealing area of study for many businesses [47].
Dave et al. [48] extracted textual information and classified online reviews as positive or negative according to different product attributes. Opinions can be classified through semantic analysis of online reviews [48]. Zha et al. [51] introduced a ranking system for product aspects by identifying that a) the most important aspects are described by more consumers, and b) these aspects directly affect the overall opinion of consumers. Product aspect ranking has many applications in various industries, and the main use is to gather relevant information to make better decisions.
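The two observations behind aspect ranking (important aspects are mentioned by more consumers, and opinions on them track the overall opinion) can be illustrated with a simple heuristic. This is only a sketch under stated assumptions and is not the algorithm of Zha et al. [51]: the function `rank_aspects`, the input layout, and the score (mention share multiplied by the share of mentions whose polarity agrees with the overall rating) are all hypothetical.

```python
from collections import defaultdict


def rank_aspects(reviews):
    """Rank aspects by (mention share) x (agreement with overall polarity).

    reviews: list of (overall, {aspect: polarity}) pairs, where overall
    and polarity are +1 (positive) or -1 (negative).
    """
    mentions = defaultdict(int)    # how many reviews comment on the aspect
    agreements = defaultdict(int)  # mentions whose polarity matches overall
    for overall, aspects in reviews:
        for aspect, polarity in aspects.items():
            mentions[aspect] += 1
            if polarity == overall:
                agreements[aspect] += 1
    n = len(reviews)
    scores = {
        a: (mentions[a] / n) * (agreements[a] / mentions[a]) for a in mentions
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only.
reviews = [
    (+1, {"battery": +1, "screen": +1}),
    (-1, {"battery": -1}),
    (+1, {"battery": +1, "price": -1}),
    (-1, {"screen": +1, "battery": -1}),
]
for aspect, score in rank_aspects(reviews):
    print(aspect, score)
```

In this toy data, "battery" is mentioned in every review and always agrees with the overall rating, so it ranks first; an aspect that is rarely mentioned or whose polarity is unrelated to the overall opinion falls toward the bottom.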
Nahm and Mooney [50] examined how DiscoTEX can help extract data by combining data mining and information extraction. This method can locate data within documents and transform unstructured text into a structured database, as well as predict additional information for extraction from other documents. The integration of data mining and information extraction can help combine data in a more readable structure [50]. McCallum [56] investigated how unstructured data present a challenge in interpreting information. Therefore, the aim of information extraction is to create a database by gathering loosely formatted texts in which patterns can be identified by data mining [56].
Diehl [53] examined not only the structural but also the cultural aspects of social networks. Relational sociology studies have tended to examine and retrieve information from text data, whereas the importance of the implications of face-to-face interactions when analyzing network information has largely been ignored. A. Semenov et al. [54] proposed three modules for long-term monitoring of different social networks: the crawler, the repository, and the analyzer. By crawling, storing, and analyzing different sites, longitudinal data from social media sites can be examined.
Pennebaker [55] analyzed the words that people use in emails, Twitter feeds, and Facebook posts to determine their emotions, thoughts, social relationships, and personalities. The focus was on word use rather than on how people were speaking. Mind mapping can help explore social and psychological trends. Ibrahim and Ahmad [57] researched how Requirements Analysis and Class Diagram Extraction (RACE) can expedite textual extractions and improve the analysis of the data requirements that are currently performed manually. Many NLP techniques were developed to extract relevant information from textual data.

Cognition Category
Eichinger et al. [58] introduced Affinity, a system that can assess similarities among the text message histories of users while preserving private information. A latent format is used, which does not allow for the reconstruction of the comparison words. Chung and Pennebaker [61] distinguished the adjectives most commonly used by college students by applying computerized text analytic tools. This study has established the strengths of analyzing open-ended texts to extract information from the natural language used by different participants. This method enables the examination of cultural patterns as well as personality characteristics.
Bond and Pennebaker [59] experimented with changing pronouns to moderate the health benefits of expressive writing by alternating the focus of participants. Expressive writing can therefore affect people's physical and psychological health. Pennebaker and Stone [6] developed two projects showing the relationship between language use and aging: as people age, they tend to use more positive affect words than negative affect words and to use fewer self-references and fewer past-tense verbs.
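Word-count analyses of this kind rest on measuring how often words from a given category occur in a text. The following is a minimal sketch of that idea, not the LIWC software itself: the three mini-lexicons and the function `category_rates` are hypothetical, and tense detection is omitted because it would require part-of-speech information.

```python
import re

# Hypothetical mini-lexicons; a real dictionary holds thousands of entries.
CATEGORIES = {
    "positive_affect": {"happy", "good", "love", "glad"},
    "negative_affect": {"sad", "bad", "hate", "angry"},
    "self_reference": {"i", "me", "my", "mine", "myself"},
}


def category_rates(text):
    """Return, per category, the share of words in text that belong to it."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: sum(w in lexicon for w in words) / total
        for name, lexicon in CATEGORIES.items()
    }


print(category_rates("I love my garden and I am glad"))
```

Comparing such rates across writers of different ages, or across texts written by the same person over time, is one simple way to look for the shifts in affect words and self-references described above.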
Rajman and Besançon [62] established that text mining is a powerful technique to extract important information from a dataset by applying probabilistic associations of keywords because unstructured data can be challenging to interpret.
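A probabilistic association between keywords can be estimated from co-occurrence counts. The sketch below assumes each document has already been reduced to a set of keywords and estimates the confidence of a rule a -> b as the fraction of documents containing a that also contain b. The function `keyword_associations` and the `min_support` threshold are illustrative assumptions, not the procedure of [62].

```python
from itertools import permutations


def keyword_associations(docs, min_support=2):
    """Estimate P(b | a) for keyword pairs that co-occur often enough.

    docs: list of keyword sets, one per document.
    Returns {(a, b): confidence} for ordered pairs seen together in at
    least min_support documents.
    """
    single = {}  # number of documents containing each keyword
    pair = {}    # number of documents containing both keywords
    for keywords in docs:
        for k in keywords:
            single[k] = single.get(k, 0) + 1
        for a, b in permutations(sorted(keywords), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return {
        (a, b): count / single[a]
        for (a, b), count in pair.items()
        if count >= min_support
    }


# Toy keyword sets, invented for illustration only.
docs = [
    {"loan", "bank", "rate"},
    {"bank", "rate"},
    {"bank", "river"},
    {"loan", "bank"},
]
rules = keyword_associations(docs)
print(rules[("loan", "bank")], rules[("bank", "loan")])
```

The asymmetry is the point: in the toy data every document mentioning "loan" also mentions "bank", whereas only half of the "bank" documents mention "loan".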
Fischhoff and Chauvin [106] investigated how intelligence analysis helps clarify difficult situations and yields valuable information for better decision-making by evaluating and integrating pertinent information. Intelligence analysis can help determine behavioral profiles and social conduct.

Other Studies
Kosala and Blockeel [65] explored the use of web mining by dividing it into three different categories (web content mining, web structure mining, and web usage mining) and studying representation issues, process, and learning algorithms. Balazs and Velásquez [63] studied how information fusion seeks to correctly transform and compress data into a more understandable representation. Fusion processes and the development of surveys to extract relevant data can be helpful as the use of opinion mining steadily increases. Nigam et al. [67] evaluated maximum entropy techniques to establish how a uniform distribution can benefit the classification of data. More studies must be performed, but this technique appears promising.
Continuous efforts have been undertaken worldwide to propose new classification algorithms such as Tsetlin Machine [107] or Dendritic Neuron Models [108]. Rutland et al. [70] evaluated how the use of SMS can be measured with the SMS Problem Use Diagnostic Questionnaire (SMS-PUDQ) to determine behavioral addiction to SMS use. The time spent using SMS and other measures of mobile phone use were detected during the study. Aggarwal and Zhai [71] explored the importance of mining text data, an appealing research topic, given that the amount of web-enabled data has increased and facilitates the exploration of vast quantities of textual data. A comparison of the classical and modern aspects of text mining was also described. Berry and Kogan [72] studied the contributions of text mining, as well as major topics associated with text mining, by categorizing text into three different components to explore keyword extraction, classification, and the clustering of information presented in textual data. Akilan [73] investigated the field of text mining to extract unstructured data and identify interesting and non-trivial patterns from text documents. An exploration of the current challenges and projected directions of this field was described [73]. Chakraborty et al. [64] prepared various case studies and performed text mining and analysis to extract important information from textual data. Different scenarios were created wherein SAS was used to perform comprehensive text analytics to help industries leverage the textual data [64].
Shahbaz et al. [68] proposed a solution to the analysis of textual information by developing a system, Sentiment Miner, to process and classify text files according to the opinions stated in various sentences by using NLP techniques and opinion mining algorithms. Weiss et al. [69] introduced methods to predict and analyze unstructured information present in textual data. Methods used for data mining could be adapted for application to text.
Chakraborty et al. [64,109] collected insightful information from customers by analyzing textual data from various documents to improve business operations and performance. Analyses of unstructured data are possible by extracting important information when performing text analysis and sentiment mining. Weerdt et al. [74] described the importance of retrieving data to benefit business process management by applying process mining, which uses techniques to analyze and extract knowledge and information from system event logs.
Manning and Schütze [66] established the value of using statistical NLP to extract and interpret textual data, not only for businesses but also for government agencies and individuals who could benefit from extracting information from a large amount of data. The theory and practice of these techniques are also explored [66]. Moraes et al. [75] compared SVM and artificial neural networks to determine the differences between these two approaches in performing sentiment analysis and determined that artificial neural networks perform better than SVMs. Fraley [76] presented guidelines on how to construct web-based surveys to conduct behavioral research. Strengths and limitations of online surveys are highlighted, as well as the factors affecting the design of internet-based research.

Conclusions
In this paper, we analyzed published articles on different topics related to text mining and human behavior. We divided the analysis into psychological behaviors regarding emotion, cognition, and social empathy. The current research in identifying behaviors has focused primarily on detecting emotional and social behaviors, whereas studies on cognitive behavior are rarer. We found that NLP is the most common approach, which is followed by information extraction and document classification. Another main finding in this review was that few studies have focused on detecting cognitive behavior. To our knowledge, no decision support system has used a holistic approach to analyze cognitive, emotional, and social behaviors simultaneously. The literature reviews analyzed and the articles in Table 2 focus primarily on detecting emotions or empathy. The psychological studies, for example [4] and [76], identified relationships between cognitive aspects, emotions, and empathy. For this reason, it would be helpful to develop analytical and computational systems that make it possible to identify the connections between different aspects of human behavior through text analysis. In this way, predictions of future human behaviors and explanations of past actions could be made. Furthermore, behaviors and their effects on the outcomes of human action should be distinguished in greater detail. For instance, the detection of negative words in comments can be associated with certain social behaviors (e.g., being socially aggressive), as well as with cognitive behaviors (e.g., having dementia or depression).
Through the literature review, we identified a trend in the detection of mood states that may affect a person's life. For example, technological tools can support the detection of behaviors over time (e.g., hours, days, weeks, or months). Consequently, detecting short-term emotional behaviors in users (e.g., being "socially inactive" over a long period) could, in turn, predict mood states and disorders (e.g., loneliness or depression), which could also affect long-term social and cognitive behaviors. Thus, the traceability of behaviors was studied. In the same way, many authors have demonstrated how text messages or the conversion of audio or video to textual data could contain delicate information that might otherwise be missed. In addition, privacy and security are issues that must be managed through the use of anonymous analyses.
Most of the literature discussed how the formulation and understanding of human behavior were challenging and remained an evolving area of research that considerably affects analytics. On the basis of the present review, a method or platform allowing all classification methods to be combined has not been thoroughly explored. In contrast, we found that each element of behavior has generally been examined individually. Hence, future work should address this lack of information by using more systematic approaches, in which multiple behavioral aspects can be analyzed simultaneously.
We hope that this review will support the design of a system that combines sentiment mining and NLP techniques to develop an unstructured data opinion miner and index engine for polarity extraction and classification at the sentence level through the use of a variety of documents from repositories that represent or describe a given group of individuals. Such a system should also facilitate the use of progressive tracking to capture various changes in individual behavior, including the detection of behavioral changes, the detection of anomalies, risk evaluation, and monitoring. This research may serve as a reference for practitioners and researchers interested in detecting human behavior through text analysis.

Informed Consent Statement:
Not applicable to this study.

Data Availability Statement:
The study did not report any data.

Conflicts of Interest:
No potential conflict of interest was reported by the authors.