Detecting Binge Drinking and Alcohol-Related Risky Behaviours from Twitter’s Users: An Exploratory Content- and Topology-Based Analysis

Binge Drinking (BD) is a common risky behaviour that people rarely report to healthcare professionals, although personal communications about alcohol-related behaviours are not uncommon on social media. By following a data-driven approach focusing on User-Generated Content, we aimed to detect potential binge drinkers through the investigation of their language and shared topics. First, we gathered Twitter threads quoting BD and alcohol-related behaviours by considering unequivocal keywords, identified by experts, from previous evidence on BD. Subsequently, a random sample of the gathered tweets was manually labelled, and two supervised learning classifiers were trained on both linguistic and metadata features, to distinguish tweets of genuine unique users from those of media, bot, and commercial accounts. Based on this classification, approximately 55% of the 1 million collected alcohol-related tweets were automatically identified as belonging to non-genuine users. A third classifier was then trained on a subset of manually labelled tweets among those previously identified as belonging to genuine accounts, to automatically identify potential binge drinkers based only on linguistic features. On average, users classified as binge drinkers were quite similar to the standard genuine Twitter users in our sample. Nonetheless, the analysis of social media contents of genuine users reporting risky behaviours remains a promising source for informed preventive programs.


Introduction
Excessive alcohol use is a frequent risky behaviour, which accounts for between 1.3% and 3.3% of health costs globally [1]. High rates of alcohol consumption and heavy drinking are common among young people, raising concerns in terms of public health issues [2]. Binge drinking (BD) is defined as four or more drinks for women and five or more for men on a single occasion [3], with current rates of up to 27% both in the United States and Europe [4,5]. The use of the term is popular and clearly recognizable not only to researchers in the field but also to the general public and young people in particular [6]. Young adults who engage in BD are more likely to report other health risks such as riding with drunk drivers, smoking cigarettes, being a victim of violence, attempting suicide, or using illicit drugs [7]. In addition, knowledge and perception of BD risks are often limited [8,9] among young people, with impaired decision making playing a major role [10]: young people tend to favour actions leading to immediate rewards, show poor skills in anticipating negative consequences and in learning from previous mistakes, and often consider such consequences as not relevant to themselves [11,12]. IT-based evidence has shown encouraging results regarding alcohol use reduction and behavioural support among young people (e.g., [13]). This is likely due to young people's propensity to use electronic devices, and their expertise with them (e.g., smartphones), to engage with social media [14]. Previous research explored vulnerability to addiction and risky behaviours in big data by identifying clusters according to individuals' personal characteristics and circumstances, and by comparing different techniques in terms of methodological reliability [15,16].
Social media platforms are increasingly popular among both young people and individuals belonging to different age groups; they combine media and peer influences from a broad range of areas involving social norms, risk perceptions, and related behaviours [17]. Indeed, social influences affect drinking behaviours, and online social networks can have an effect on both the style and the amount of drinking, even in more distant circles [18,19]. Specifically, people share information online and access contents that other subjects have posted on the Web, including their own experiences, which defines a new IT user paradigm [20,21]. This scenario, completely different from a passive end-user condition, is enabled by social media, where large amounts of User-Generated Content (UGC) are spread every day across virtual communities, almost without any external control [22-24], lowering at the same time users' perception of anonymity and confidentiality issues [25,26]. This applies particularly to those topics that people are reluctant to discuss with healthcare professionals, including behaviours, opinions, and individual-directed actions that are difficult to track and measure in a clinical setting [27].

Binge Drinking and Social Media
Previous evidence showed that, for instance, young people frequently discuss their drinking behaviour on social media [28], alcohol misuse contents are easily displayed on users' profiles [29,30], and exposure to drinking-related content contributes to the normalization of drinking [31]. Indeed, impulsivity features typical of BD and similar alcohol-related risky behaviours may fit particularly well into social media, whose users can easily and instantly connect to a mass audience via brief messages [32]. Furthermore, a recent systematic review and meta-analysis drew attention to a moderately strong relationship between exposure to alcohol-related social media content and alcohol consumption and consequences, with study participants frequently discussing their drinking behaviour on social networking sites [28]. Risky behaviour- and substance-related research has used Twitter databases with the aim of mining data to explore sentiment, topics, and sources for tweets related to tobacco [33], marijuana [34], alcohol [35], or a combination of different substances [36]. In [33], the authors explored tweets to learn more about the use of tobacco by adopting an unsupervised clustering algorithm to group tweets. Further studies investigated the categorization of substance-related content (i.e., cannabis and synthetic cannabinoid) by using supervised machine learning with fairly high accuracy [37], and by tracking changes in users' opinions on Twitter over time and across different regions. Interestingly, recent evidence demonstrated the popularity of drinking-related chatter in particular on Twitter, with most alcohol-related tweets reflecting a positive sentiment toward alcohol use, outnumbering anti-alcohol tweets, and with references to heavy drinking behaviours [35]. Tweets normalizing or encouraging marijuana use over alcohol use are reported to be even more common [36].

The Current Study
In order to explore features that are hardly detectable by using classical epidemiological designs, an alternative, yet consistent, approach may involve BD-related UGC, by employing a data-driven process to investigate public health concerns at a reduced cost [27,38]. The available tools for automatic content classification may be fruitfully employed to analyse tweets related to alcohol and drug recreational use, with the aim of harnessing social media platforms for alcohol and drug misuse surveillance research [37]. In particular, the study reported in this paper aimed to explore the communication of risky behaviours on Twitter, grounded in the hypothesis that there might be a relationship between alcohol-related information shared in social media posts and risky behaviours such as BD. A better understanding of how people are involved in social networks about alcohol-related behaviours could help in finding innovative ways of promoting healthy behaviours and in establishing potential preventive programmes. Therefore, we aimed to map clusters of tweets that explore a semantic spectrum of alcohol-related UGC, by identifying both the language and the common topics discussed by potential binge drinkers.

Materials and Methods
We considered the Twitter social media platform, a free-to-use microblogging site, characterized by immediacy and ease of use [39], with over 200 million users internationally [40]. Twitter involves unidentified people, who can instantly connect to a mass audience via brief messages (280 characters or less, i.e., "tweets"), displayed on both the author's homepage and those of his/her followers [41]. It represents an ideal public place to hear the latest news, exchange ideas, and connect with people in real time, with an impressive volume of around 500 million tweets per day [42], producing a considerable amount of unstructured data. Thus, Twitter can be considered a key source of social media contents, since it provides feasible access to data (via Application Programming Interfaces, APIs, and suitable libraries) both retrospectively, on sets of historical tweets connected to specific users, and prospectively, to capture matching tweets and related metadata [43], thus also allowing the study of individuals' health behaviours, such as drug and alcohol use [27]. To develop a specific approach for our condition of interest, i.e., BD, a structured workflow was followed, with three distinct phases (Figure 1). In Phase 1, a dataset was built by gathering alcohol-related tweets and related metadata from the microblogging platform, focusing on BD-related hashtags identified by a panel of experts. Phase 2 dealt with the automatic identification, via supervised classification, of genuine unique users (intended as real persons) with respect to the Twitter accounts of media and business activities, and social bots, by considering different characteristics connected to both tweets' content and metadata. Finally, in Phase 3, supervised classification was applied to automatically identify potential binge drinkers among genuine users, focusing only on linguistic features.

Phase 1: Data Gathering
Eligible records were Twitter posts (i.e., tweets) quoting alcohol-related behaviours. In particular, only tweets written in English and containing specific keywords addressing the condition of interest were gathered. Relevant keywords consisted of hashtags identified according to previous evidence on BD and through a systematic search of the platform by a panel of experts. Hashtags are labels preceded by the # symbol (technically, metadata tags), generated by users on Twitter to allow easy identification of a specific topic within a dynamic thread of tweets. The search phase for BD-related hashtags on Twitter was carried out for one week; the identified hashtags span distinct categories.
The complete list of the 23 hashtags used to filter tweets related to alcohol consumption is as follows: #alcohol, #alcoholic, #alcoholics, #bingedrinking, #botellon, #cocktail, #cocktails, #drinking, #drinks, #drunk, #drunkasfuck, #drunkennights, #drunkies, #getdrunk, #hangover, #nomorealcohol, #pubcrawl, #pubcrawling, #rhum, #sorehead, #toomuchalcohol, #vodka, #wasted. Based on the selected hashtags, a systematic focused crawling process [44] (i.e., a crawl focused on the specific hashtags) was implemented on Twitter through public APIs, taking into account Twitter's update of the maximum tweet length (currently 280 characters vs. the previous 140), using an ad-hoc Python script [45]. During the crawling process, specific available data about tweets and their authors were recorded. These data included: (i) the entire text of the tweet, discarding multimedia content; (ii) metadata associated with the tweet, such as the reactions to the tweet expressed via retweets and likes, the date and time the tweet was created, and geo-location information, if available; (iii) details of the original tweet if the post was a retweet (including information about its original author); (iv) the author's details, such as screen name (also known as handle), complete name, biography, number of tweets in their timeline, number of followees and followers, and date of account creation.
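The hashtag-based filtering step can be illustrated with a minimal sketch. This is not the authors' actual crawler (which used the public Twitter APIs); it only shows how a tweet's text could be matched against the 23 expert-selected hashtags:

```python
import re

# The 23 expert-selected hashtags used to filter alcohol-related tweets.
HASHTAGS = {
    "#alcohol", "#alcoholic", "#alcoholics", "#bingedrinking", "#botellon",
    "#cocktail", "#cocktails", "#drinking", "#drinks", "#drunk",
    "#drunkasfuck", "#drunkennights", "#drunkies", "#getdrunk", "#hangover",
    "#nomorealcohol", "#pubcrawl", "#pubcrawling", "#rhum", "#sorehead",
    "#toomuchalcohol", "#vodka", "#wasted",
}

def matches_target_hashtags(tweet_text: str) -> bool:
    """Return True if the tweet contains at least one target hashtag."""
    found = {tag.lower() for tag in re.findall(r"#\w+", tweet_text)}
    return not found.isdisjoint(HASHTAGS)

# Example usage on invented tweets:
tweets = [
    "Friday night #cocktails with friends!",
    "Watching the game tonight",
    "Never again... #hangover",
]
kept = [t for t in tweets if matches_target_hashtags(t)]
```

In a real crawl, this predicate would be applied to the text field of each tweet object returned by the streaming or search API.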
Twitter data were gathered with respect to three different time periods, i.e., from December 2017 to March 2018, from April 2018 to June 2018, and from July 2018 to September 2018. These represent approximately three seasonal intervals, i.e., winter, springtime, and summer. The tweets were accordingly split into three datasets: D1, D2, and D3. Based on the gathered tweets, we were also able to collect the threads of tweets related to single users of interest, thus constituting a new dataset, D4.

Phase 2: Identification of Genuine Users with Respect to Bots, Media, and Business Accounts
To ensure that collected tweets were suitable for the analytical algorithms, a source classification aimed at removing "source noise" [37] was carried out before the identification of potential binge drinkers. Specifically, in this phase, we separated tweets belonging to personal accounts from bot-, media-, and business-related tweets, since these sources generate content that does not represent personal communication. Such content included educational messages and videos about alcohol-related problems; news reports or blogs with food-related content; and business accounts and bartenders advertising alcohol premises.

Supervised Learning for the Identification of Genuine Users
Several approaches have been proposed in the literature to distinguish user-generated tweets (personal communications) from spam components or automated programs, assigning "objects" characterized by specific features to two (or more) predefined classes [46][47][48]. This task is generally accomplished by means of supervised machine learning techniques by using (i) a subset of the objects already labelled with respect to the class to which they belong, and (ii) a vector of characteristics (i.e., features), associated with the objects to be classified. In supervised learning, a model (e.g., a classification model) is trained on the labelled objects (i.e., the training set) by considering the values of the features associated with the labelled objects; then, the resulting model can be applied to unlabelled objects (i.e., the test set) to automatically assign them a class.
To this purpose, in this work, a binary classification algorithm was applied to a training set containing both tweets written by genuine users and tweets written by bots or media/business accounts, which were manually labelled. In order to build a labelled set of tweets written by genuine users, we randomly selected a subsample of tweets from the gathered datasets by considering around 500 distinct users (either genuine or non-genuine) and their associated tweets (each user had on average two tweets in the gathered datasets). Based on a manual analysis of those users, 320 users were labelled as non-genuine, while 180 users were identified as likely genuine. The feature vector used to classify personal tweets with respect to bot-, media-, and business-related tweets (and, hence, real users with respect to non-real ones) included the following features:
1. The number of tweets of a single account: users with a high number of tweets are probably media, commercial, or bot accounts [46].
2. The average number of hashtags per tweet: hashtags are the "keywords" by which users identify the main topics of their message. A genuine user is expected to include a limited number of hashtags in a single tweet, while those who want to promote their own content often abuse hashtags to increase the probability that their content is found through search engines [48,49].
3. The average number of mentions per tweet: mentions, i.e., citations of another Twitter account by the use of the symbol '@' followed by the name of another user, serve conversation and discussion purposes. These interactions are more typical of real people, while commercial activities often send general messages and do not hold individual conversations with their circle of followers [49].
4. The number of occurrences of personal pronouns per tweet: the use of personal pronouns is strictly connected to people. Advertising messages are often written in a "dry" and impersonal form [49].
5. The average number of URLs per tweet: links to external sites (often more than one) are frequently posted by commercial activities to move users' browsing from Twitter to their brand's site [46].
6. The presence of URLs in the user profile: commercial activities extensively use the platform's advertising potential.
7. The retweet/tweet ratio: genuine users rarely retweet without comments, whereas accounts retweeting about a brand behave in RSS-feed style [48].
8. The network size: profiles with a large number of followees and followers are likely to represent a famous person or a company.
9. The followers/followees ratio: for genuine user accounts, this ratio does not deviate too far from unity, since one person can reasonably be expected to follow a certain number of profiles in a reciprocal way. The imbalance is often severe for famous people and businesses, which tend to have a high number of followers (even in the order of tens or hundreds of thousands) but very few or even zero followees (because the purpose of such accounts is not to read the contents published by third parties).
10. The presence of geo-located tweets: the use of Twitter occurs mainly via its mobile app, often with geo-localization turned on; on the other hand, desktop use is typical of business users [50].
11. The number of "bad tokens" per tweet: along with the features described above, by manual inspection of a random sample of some users' tweets, we identified some words (bad tokens) that likely indicate a non-personal profile. Since a high number of occurrences of bad tokens suggests that the tweet has been written by a business or a bot, such tweets were automatically eliminated from the dataset by using a Python script based on the Natural Language Toolkit framework (NLTK) [51].
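To make the feature-vector construction concrete, the following sketch computes a few of the features listed above from a hypothetical user record. The field names (`statuses_count`, `tweets`, `followers`, `followees`, `profile_url`) are invented for illustration, not the authors' actual data schema:

```python
def extract_features(user: dict) -> dict:
    """Compute a subset of the 11 account features described above."""
    tweets = user["tweets"]
    n = len(tweets)
    n_hashtags = sum(t.count("#") for t in tweets)
    n_mentions = sum(t.count("@") for t in tweets)
    return {
        "tweet_count": user["statuses_count"],                # feature 1
        "avg_hashtags": n_hashtags / n,                       # feature 2
        "avg_mentions": n_mentions / n,                       # feature 3
        "profile_has_url": int(bool(user["profile_url"])),    # feature 6
        # max(..., 1) guards against accounts following nobody (feature 9)
        "followers_followees_ratio":
            user["followers"] / max(user["followees"], 1),
    }

# Example with an invented user record:
user = {
    "statuses_count": 1200,
    "tweets": ["Great night out #drunk #hangover",
               "thanks @friend for the ride"],
    "followers": 150,
    "followees": 140,
    "profile_url": "",
}
features = extract_features(user)
```

The remaining features (personal pronouns, URLs per tweet, retweet ratio, geo-location, bad tokens) would be computed in the same per-user fashion before classification.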

Classifying Genuine Users
Based on the selected features, and by employing the training set composed of the 500 users mentioned in the previous section, two classifiers were trained, tested and evaluated, based on two well-known supervised models: Support Vector Machines (SVMs) and Random Forests (RFs) [46,48,49].

Normalization
Due to the different ranges of values associated with the features, these were normalized to a common scale before being analyzed by the (supervised learning) classifiers. For example, the average number of URLs per tweet is a real number, while the presence of a URL in the user profile can only assume values in the set {0, 1}. To obtain a common scale, each value x_i associated with feature i was standardized according to the following formula:

x̂_i = (x_i − μ_i) / σ_i

where x̂_i is the normalized value of x_i, and μ_i and σ_i are the mean and the standard deviation of feature i computed over the training data. Formally,

μ_i = (1/N) Σ_{j=1..N} x_i^(j)   and   σ_i = √( (1/N) Σ_{j=1..N} (x_i^(j) − μ_i)² )

where x_i^(j) is the value of feature i for the j-th training instance and N is the cardinality of the training set.
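The standardization of a single feature column can be sketched in a few lines of plain Python (using the population standard deviation over the training data, as scikit-learn's StandardScaler also does):

```python
import math

def standardize(column: list[float]) -> list[float]:
    """Z-score normalization of one feature column: (x - mu) / sigma."""
    n = len(column)
    mu = sum(column) / n
    # Population standard deviation (divide by n, not n - 1).
    sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / n)
    return [(x - mu) / sigma for x in column]

# Example: mean 5, standard deviation 2.
scaled = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

After standardization, each feature column has zero mean and unit variance on the training data, so features on very different native scales (counts, ratios, binary flags) contribute comparably to the classifier.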

Cross-Validation
In addition, we had to handle the relatively limited size of the training data, since the labelled tweets associated with the users' profiles represented a small subset of the entire dataset of tweets gathered during Phase 1. This limitation was unavoidable given the manual labelling of the considered datasets, which were composed of around a million tweets. Cross-validation is a well-known technique that may be used to evaluate the results of a model (i.e., a classifier). It enables the use of a limited sample of labelled data in order to estimate how the model is expected to perform when making predictions on data not considered during the model training. Specifically, we performed a k-fold cross-validation, with k equal to 5. With this strategy, the available labelled dataset was split into five subsets, and the classifier was trained and evaluated five times. At each of the five training and evaluation rounds, four (i.e., k − 1) subsets were used to train the model, and the remaining subset was used as a test set to validate the model. The test set changes sequentially at each round. Evaluation results from the five rounds were then summarized to obtain a final estimate of the effectiveness of the considered classifier.
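The rotation of the test fold described above can be sketched as follows (a simplified illustration assuming the number of items is divisible by k; library implementations also handle shuffling and uneven folds):

```python
def k_fold_splits(n_items: int, k: int = 5):
    """Yield (train_indices, test_indices) for each of the k rounds;
    the test fold rotates sequentially across the dataset."""
    indices = list(range(n_items))
    fold_size = n_items // k
    for round_ in range(k):
        test = indices[round_ * fold_size:(round_ + 1) * fold_size]
        train = indices[:round_ * fold_size] + indices[(round_ + 1) * fold_size:]
        yield train, test

# Example: 500 labelled users, 5 folds of 100 users each.
splits = list(k_fold_splits(500, k=5))
```

Each of the 500 labelled users appears in exactly one test fold, so every labelled instance contributes once to the evaluation estimate.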
The framework used for the implementation of the selected classifiers (i.e., SVM and RF) was the scikit-learn (sklearn) library, a common choice for machine learning applications in Python [52]. In particular, for the normalization and cross-validation processes, the preprocessing.StandardScaler class and the model_selection.cross_validate function of sklearn were used. The two classifiers were evaluated through Receiver Operating Characteristic (ROC) curve analyses, with the Area Under the resulting ROC Curve (AUC) showing the discrimination capability of the classifiers at different operating points.
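A compact end-to-end sketch of this evaluation setup, using the sklearn facilities named in the text on synthetic stand-in data (the feature matrix and labels below are invented; the real study used the 500 manually labelled users):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the labelled users: 11 features each, with a
# class-dependent shift so the task is learnable (320 non-genuine, 180 genuine).
y = np.array([0] * 320 + [1] * 180)
X = rng.normal(size=(500, 11)) + y[:, None] * 0.8

# Standardization is placed inside the pipeline so that, at each fold,
# the scaler is fitted on the training subset only.
model = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0))
scores = cross_validate(model, X, y, cv=5, scoring="roc_auc")
mean_auc = scores["test_score"].mean()
```

Swapping `RandomForestClassifier` for `sklearn.svm.SVC(probability=True)` yields the corresponding SVM evaluation under the same 5-fold ROC/AUC protocol.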

Phase 3: Identifying Potential Binge Drinkers
After distinguishing genuine from non-genuine users, we implemented an additional classifier focusing only on linguistic features (i.e., features connected only to the text of the tweets), since these may capture similarities in the vocabularies used by potential binge drinkers and, possibly, allow identifying them automatically. To extract linguistic features, we considered, for each user, her/his set of tweets as a single text, i.e., a document. Technically speaking, through the CountVectorizer class of the sklearn framework, we counted, for each word, its number of occurrences in each document. We then computed, for each word and each document (through the TfidfTransformer class), the so-called tf-idf value, a weight that combines the number of occurrences of a word in a document (term frequency) with the inverse of its frequency across the whole collection (inverse document frequency). The tf-idf values associated with each word in each user's tweets constitute the feature values used for classification.
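The two-step vectorization described above can be sketched with the sklearn classes named in the text; the example documents below are invented stand-ins for users' concatenated tweets:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Each user's concatenated tweets form one "document" (invented examples).
documents = [
    "so drunk last night wasted again",
    "lovely dinner wine with friends",
    "drunk again hangover tomorrow",
]

# Step 1: raw word-occurrence counts per document.
counts = CountVectorizer().fit_transform(documents)
# Step 2: reweight counts into tf-idf values.
tfidf = TfidfTransformer().fit_transform(counts)

n_users, vocab_size = tfidf.shape  # one row per user, one column per word
```

The resulting sparse matrix (one tf-idf row per user) is what the downstream RF classifier consumes as its linguistic feature vectors.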
During the classification process we considered, at first, only the tweets of the 180 users (around 360 tweets) manually labelled as genuine in Phase 2. Of these tweets, only those reporting a potential BD behaviour were selected. Following this approach, only 45 of the 180 genuine users were considered as potentially at risk of BD. With respect to Phase 2, in Phase 3 only an RF classifier, evaluated via 5-fold cross-validation, was applied, using the set of tweets belonging to these 45 users as a training set. We opted for RF since it is effective in general, and especially for text categorization [53,54]. As illustrated in Section 2.2.1, on average, only two tweets were gathered for each user during the focused crawling process based on the hashtags detailed in Section 2.1. Therefore, it was deemed necessary and useful to collect the entire tweet history of the 45 users manually labelled as potential binge drinkers. For this reason, a further crawler (focused on the 45 users) was developed, and a total of 86,204 tweets were retrieved, for an average of 1959 tweets per user (dataset D4). The complete classification pipeline is shown in Figure 2.

Results
In order to find tweets related to alcohol and BD (Phase 1 of the proposed approach), a preliminary textual content analysis was carried out to select those keywords (hashtags) deemed useful to identify alcohol-related tweets according to the experts' panel, as introduced in Section 2.1. Figure 3 shows that the most frequent hashtags were #alcohol, #cocktail, #cocktails, #drinks, and #rum, followed by #drunk, #vodka, #drinking, and #hangover (see also Table A1 in Appendix A.1).

Dataset Characteristics
We extracted three seasonal waves of tweets: 409,788 from December 2017 to March 2018; 316,541 from April to June 2018; and 318,071 from July to September 2018. Table 1 summarizes the characteristics of both users and tweets from the three datasets. The average number of daily tweets was similar across time-periods, though the number of tweets was larger in winter (December 2017-March 2018). The majority of tweets were not in a favourite list in any time-period (December-March: 74%; April-June: 88%; July-September: 89%) or were liked only once (December-March: 13%; April-June: 8%; July-September: 7%). In addition, users' followers, favourites, and friends distributions were highly skewed, with the right tail of the distribution longer than the left one, indicating the existence of users with a very large number of followers, favourites, and friends.

Identification of Real Users with Respect to Bots, Media, and Business Accounts
Since a source of noise was likely to mask personal communications among genuine users, in Phase 2 we used a structured approach to distinguish tweets of likely genuine (non-retailer) users from those of bots, media, and business accounts. Various features (e.g., number of tweets of a single account; average number of hashtags, mentions, and occurrences of personal pronouns per tweet) were extracted to distinguish real users from non-real ones. Along with these features, distinct words were manually identified as keywords suggesting media- or retail-related content, and included as additional linguistic features. This facilitated the assignment of tweets containing such "bad tokens" to the class of those produced by non-real users (Table 2). The random subsample from the original set of tweets labelled by experts appeared to include a proportion of personal communications of about 36%. The classifiers were trained on the subset labelled by experts, based on the SVM and RF models, and showed a moderate performance in distinguishing personal communications (AUC values of 0.76 ± 0.04 and 0.73 ± 0.05, respectively) when applied to the test set. When performing automatic classification on unlabelled data (in datasets D1, D2, and D3), the proposed approach estimated a cumulative 45% of personal communications. The ROC curves and average AUC values for the SVM and RF classifiers are shown in Figure 4a,b.

Details on Evaluation Metrics
A ROC curve shows the performance of a classification model at all classification thresholds. In a classification task, a score s(i, c) is predicted for each item i, where the score denotes the probability that the item belongs to a class c. Therefore, it is possible to test different values for a threshold t, such that, in binary classification, s(i, c) ≥ t is interpreted as predicting c (i.e., the positive class), and s(i, c) < t is interpreted as predicting ¬c (i.e., the negative class). The positive class is represented by genuine users, the negative one by retail users. The ROC curve plots two parameters: the true positive rate (the fraction of positive items correctly classified as positive) and the false positive rate (the fraction of negative items incorrectly classified as positive).

A portion of bot, media, and business users was identified as genuine users, i.e., some false positives were produced. Therefore, to gain an in-depth view of these classification results, we performed a simple content analysis to investigate occurrences and patterns of words within the tweets associated with users identified as genuine in D1, D2, and D3 by the automatic classification process. The text of these tweets was divided into sequences of n contiguous words occurring within a single tweet (n-grams). For each dataset (i.e., D1, D2, and D3), the resulting list comprised the n-grams most frequently used by users identified as genuine. Table 3 reports the bigrams and trigrams most frequently used in each time-period, while Figures A1-A3 in Appendix A.2 show the word clouds of the most relevant unigrams mentioned by the automatically identified genuine users. Bigrams and trigrams were more informative than unigrams, since they provided some contextual information. As emerges from Table 3, some retained bigrams, such as "mental health" and "public health", and trigrams, such as "need help tweet", might belong to profiles of users who help people and deal with public health issues (e.g., doctors, experts, journalists) and do not strictly represent personal communications.
Furthermore, trigrams showed that some profiles posting tweets about alcohol (e.g., bartenders) often include an explicit indication to ban the consumption of alcohol by minors according to a specific minimum legal age, e.g., "(don't) share anyone 21", "must (be) 21 (to) follow", and "please drink responsibly".

Table 4 shows the characteristics of the subsample constituted by the 45 users manually labelled as potential binge drinkers. For each user, the entire tweet history was considered, including on average 1959 tweets per user. Similarly to the whole sample, the majority of tweets were not in a favourite list or were liked only once. Statuses, followers, and friends counts were consistent with the characteristics of the entire sample. However, a less skewed distribution was observed, since only personal communications were likely to be included (no media, bot, or business accounts). On average, these users had been registered on Twitter for a longer period of time. Since we were able to distinguish genuine users from commercial accounts with reasonably satisfactory accuracy (Phase 2, Section 3.2), we tried to further automatically identify those users who were likely to binge drink, by implementing an RF classifier trained on the set of 45 users labelled as binge drinkers and their linguistic features. The training of the RF classifier was carried out by considering both: (i) the original training dataset made up of around two tweets per user (90 tweets); and (ii) the training dataset constituted by the entire tweet history of the 45 users (on average, 1959 tweets per user). The proposed approach to automatically classify binge drinkers did not reach satisfactory results with either strategy, though with the latter, given the greater number of tweets per user, accuracy improved (AUC value of 0.67 ± 0.05). The ROC curve and the AUC value obtained by the RF classifier for the second training dataset are shown in Figure 5.
Finally, as for users identified as genuine in datasets D1, D2, and D3, we performed a basic content analysis to investigate occurrences and patterns of words within the whole sample of tweets from the 45 users manually identified as potential binge drinkers. The bigrams and trigrams most frequently used by these subjects (dataset D4) are reported in Appendix A.3. They did not appear to recur across different tweets. However, n-grams in dataset D4 were more likely to be related to personal matters than those in datasets D1-D3, including also some suggestions of risky behaviours (e.g., "currently drunk abandon"; "drunk abandon building"). Figure A4 in Appendix A.4 shows the word cloud of the most relevant unigrams for the 45 binge drinkers in dataset D4, with unigram size proportional to frequency.
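The n-gram frequency analysis used throughout the Results can be sketched as follows; the tweets below are invented examples, chosen only to show how a recurring bigram such as "public health" would surface:

```python
from collections import Counter

def ngrams(text: str, n: int) -> list[str]:
    """Sequences of n contiguous words occurring within a single tweet."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Invented example tweets:
tweets = [
    "public health campaign tonight",
    "public health matters",
    "mental health support line",
]

# Count bigrams across tweets (never across tweet boundaries).
bigram_counts = Counter(bg for t in tweets for bg in ngrams(t, 2))
top_bigram, freq = bigram_counts.most_common(1)[0]
```

The same function with `n=3` yields the trigram lists reported in Table 3 and Appendix A.3; in practice, tokenization would also strip punctuation, URLs, and mentions before counting.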

Discussion
In the current exploratory study, we developed a systematic process that enabled an analysis of Twitter User-Generated Content in terms of alcohol-related behaviours, aimed at identifying the language and shared topics of potential binge drinkers. We aimed to gain insight into specific topics and patterns helpful to envisage preventive approaches to recognize and target those people who are at risk of BD. The study assessed a technique to automatically identify potential binge drinkers, by testing a classifier on the contents of tweets. Since alcohol-related tweets were frequently associated with media and/or business activities, personal communications were automatically distinguished from this "noise" in the data, with a team of experts manually identifying potential noise indicators (e.g., bad tokens). These bad tokens and other features connected to users and their contents were employed to classify, with reasonable accuracy, genuine users with respect to "retail" ones.
Considering genuine users, when assessing the performance of the supervised machine learning classifier for automatically identifying potential binge drinkers, we obtained not completely satisfactory results in terms of accuracy. This is consistent with previous evidence showing that similar models exploring mental health disorders from Twitter are often fuzzy and unstable [43]. Moreover, in our study, manual coding appears to be a crucial step when performing analyses based on machine learning classifiers, since it allows training the algorithm on tweets' feature vectors and linguistic features in order to identify the target group (i.e., people at risk of BD and alcohol-related behaviours). Potential explanations of this difficulty include the need to consider new and changeable alcohol and drug use practices and related slang terminology, in order to identify contents unequivocally related to BD. Manually labelled datasets used to train algorithms able to identify alcohol-related contents have to deal with ambiguities in tweets during manual coding and need appropriate metrics to assess inter-coder reliability [55]. Furthermore, the feature selection process should be improved to include all suitable n-grams that could be considered bad tokens, to let existing classification algorithms work more effectively. Alternatives may involve patterns of Twitter use, since people who are likely to binge drink are very similar to standard users in terms of statuses, number of followers, friends, and likes. Users identified as at risk convey only a small proportion of tweets on BD, as compared with their entire bulk of tweets. People at risk are likely to share vocabulary and language, despite the "background noise" from Twitter: we certainly do not expect a user to continually post messages regarding his/her problematic behaviour.
In addition, slang expressions and the restriction on the maximum number of characters in a tweet strongly affect writing style, making the analysis complex. The magnitude and relevance of specific features might be exploited through n-grams and related analyses. Although the actual identification of individual people with BD from their tweets is unfeasible, and probably not appropriate from an ethics perspective, these features might inform targeted preventive programs and focused campaigns, possibly benefiting from the cooperation of social media platforms such as Twitter, on which users clearly choose to express their BD characteristics.
The epidemiological approach to Twitter data represents an important challenge, since extrapolating knowledge from big data, including Twitter streams, and managing heterogeneous textual contents may require more advanced computational methods to mine user profile descriptions. These would make it possible to handle further metadata and take relevant individual demographic characteristics into account in the analysis [56,57]. Surveys based on social platforms are thus hardly comparable with standard epidemiological studies; rather, they bring additional limitations. Privacy concerns emerge at different levels. According to a recent study [58], Twitter users appear to be unaware of Twitter's warning that the platform can broadly and instantly disseminate information or content such as photos, videos, and links to a wide range of users, customers, services, and organizations, including researchers and public health agencies [59]. Thus, focusing on users' perceptions of research on Twitter and on how contextual factors are perceived, some best practices were identified. These include anonymizing identifying information when quoting tweets, not quoting tweets verbatim, honoring Twitter users' efforts to control their personal data by omitting private and deleted information, and using larger datasets [58,60]. Furthermore, users feel more comfortable with the idea of tweets being analyzed by a computer rather than read by humans. Thus, the development of automated tools might contribute to ethical practices and research implications, even though this falls outside the standard framework of research ethics. Algorithms should pursue the maximum benefit while minimising the risk of potential harm during data collection, analysis, and publication, and researchers should assess algorithms' performance and routinely test them for effectiveness, avoiding the mislabelling of content [61].
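The anonymization practice mentioned above can be made concrete with a small sketch: masking user handles and links before a tweet is stored, quoted, or analysed. This is a minimal illustration of one best practice, not a complete de-identification procedure; the example tweet and the mask tokens are invented.

```python
# Minimal sketch of one anonymisation step: masking @mentions and
# URLs in a tweet before it is quoted or analysed. A full pipeline
# would also handle names, locations, and deleted content.
import re

def anonymise(tweet):
    tweet = re.sub(r"@\w+", "@user", tweet)          # mask handles
    tweet = re.sub(r"https?://\S+", "<url>", tweet)  # mask links
    return tweet

print(anonymise("@john_doe wasted again lol https://t.co/abc123"))
# prints "@user wasted again lol <url>"
```

Masking rather than deleting keeps the sentence structure intact, which matters when the anonymised text is later fed to linguistic classifiers.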
Furthermore, discarding re-tweets may be considered a discretionary choice, since we aimed at a preliminary investigation of individual-level data on social networking about alcohol-related behaviours. Multi-level data on re-tweet contents that users think will resonate with their followers are a matter for future research.
We acknowledge the discretionary nature of the selection of hashtags, the supervised learning procedure chosen, and the linguistic feature analysis performed. Moreover, we cannot assume that people who tweet about alcohol-related behaviours actually use alcohol, though such tweeting is a likely linguistic proxy measure [62]. In addition, we considered a single platform, i.e., Twitter; how specific BD characteristics would match the (public vs. private) features of different messaging apps, e.g., WhatsApp or Line, remains to be explored. Finally, Twitter streams were sampled multiple times to reduce the impact of Twitter's restrictions on the amount of data that can be collected through its public APIs.
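When streams are sampled repeatedly, overlapping windows can return the same tweet more than once, so merged samples need deduplication. The sketch below illustrates one straightforward way to do this by tweet id; the data structure and example tweets are invented and do not reflect the study's actual collection code.

```python
# Hypothetical sketch: merging repeated stream samples while
# deduplicating by tweet id, since overlapping sampling windows
# can return the same tweet more than once.
def merge_samples(*samples):
    seen, merged = set(), []
    for sample in samples:
        for tweet in sample:
            if tweet["id"] not in seen:
                seen.add(tweet["id"])
                merged.append(tweet)
    return merged

a = [{"id": 1, "text": "beer pong tonight"}, {"id": 2, "text": "hungover"}]
b = [{"id": 2, "text": "hungover"}, {"id": 3, "text": "shots shots"}]
print(len(merge_samples(a, b)))  # prints 3: the duplicate id 2 is dropped
```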

Preventive Implications and Conclusions
Behavioural and universal prevention programs have shown limited evidence of reducing BD, though its impact remains a cause for concern. Emerging issues such as BD may benefit from Twitter research focusing on behaviours less likely to be addressed in epidemiological research. Based on surveillance-like data from Twitter, strategies may be implemented to encourage awareness of the negative consequences of hazardous drinking and to deliver a preventive message about BD. The likelihood of targeted behaviour patterns and the identification of groups or places at high risk of unhealthy behaviours may represent key, high-resolution information for the stakeholders responsible for preventive policies [63]. Specifically, detecting real users reporting BD and alcohol-related risky behaviours on social media appears to be a complex but promising approach deserving deeper investigation in future studies.
Funding: This research received no external funding.
Acknowledgments: Our thanks to the panel of experts, including Gloria Castagna, Ilaria Riboldi, and Giulia Trotta, who contributed to the main study, by identifying relevant keywords from previous evidence, by manually labelling a random sample of tweets, and by determining potential noise indicators. Thanks also to Luca Chiodini, who contributed to the gathering of tweets and the implementation of the classifiers.

Conflicts of Interest:
The authors declare no conflict of interest.