An Examination of Electronic Cigarette Content on Social Media: Analysis of E-Cigarette Flavor Content on Reddit

In recent years, the emerging electronic cigarette (e-cigarette) marketplace has shown great development prospects all over the world. Reddit, one of the most popular forums in the world, has a very large user group and thus great influence. This study aims to gain a systematic understanding of e-cigarette flavors based on data collected from Reddit. Flavor popularity, mixing, characteristics, trends, and brands are analyzed. Fruit flavors were mentioned the most (n = 15,720) among all the posts and were among the most popular flavors (n = 2902) used in mixed blends. Strawberry and vanilla flavors were the most popular for e-juice mixing. The number of posts discussing e-cigarette flavors has increased sharply since 2014. Mt. Baker Vapor and Hangen were the most popular brands discussed among users. Information posted on Reddit about e-cigarette flavors reflected consumers’ interest in a variety of flavors. Our findings suggest that Reddit could be used for data mining and analysis of e-cigarette-related content. Understanding how e-cigarette consumers view and utilize flavors within their vaping experience and how producers and marketers use social media to promote flavors and sell products could provide valuable information for regulatory decision-makers.


Introduction
In recent years, the emerging e-cigarette marketplace shows great development prospects all over the world. According to data from Euromonitor International, the global sales of vaping devices will hit £6 billion in 2015, as business in its largest market, the US, will more than double to £1.7 billion [1]. With tremendous growth in the e-cigarette marketplace, there have been heated discussions as to regulatory approaches to e-cigarettes, prompting significant research interest and regulatory concern.
The consumer experience of e-cigarettes depends on various factors, such as the structure of the cartridge or tank, the material of the mouthpiece, and the liquid of the solution. The solution, which is called e-liquid, or e-juice, is differentiated by concentration of nicotine, price, safety, and flavor. A better and smoother taste may increase enjoyment of the vaping experience.
Flavor has been found to be an attractive factor to e-cigarette adopters [2][3][4]. Fruit flavors have been found to be the most popular among users to date, while tobacco flavors have been more important for initiation of e-cigarette use among current smokers [5]. For traditional combustible cigarettes, menthol has been found to be most popular among African-Americans and younger consumers and is the only flavor of traditional cigarettes not banned by the FDA in the United States [6].
Tobacco companies have already successfully marketed traditional tobacco products to youth by using flavor varieties. For example, Kostygina et al. found that menthol and candy-like flavors increased the appeal of little cigars and cigarillos to starters by masking the heavy cigar taste [7]. This led the FDA to ban or limit such practices, as they were deemed to specifically target minors. Similarly, the FDA has expressed concerns that flavored e-cigarettes could attract youth and lead them to take up smoking and become susceptible to the diseases and premature deaths it causes [8].
Vapor stores, stores selling e-cigarette devices and liquids, have relied heavily on the internet and social media for marketing and promotion, preferring its more interactive and immediate format and low cost to target current and potential users. Flavor is broadly used, both in online social media advertisements and offline store promotions, to increase the appeal of e-cigarette products [9]. Thus, the study of e-cigarette flavors, and in particular how social media is used to promote flavors and in turn e-cigarette products, is of great significance to regulatory decision-makers and public health advocates in order to better understand the use and initiation of e-cigarettes.
Although the e-cigarette research literature is growing steadily, there are still significant gaps. For consumers, e-cigarette vendors, healthcare providers, and policy makers, a better understanding of the characteristics of e-cigarette flavors could serve to inform the current debate around the use, initiation, and short- and long-term effects of e-cigarettes. E-liquid is an important factor in e-cigarette usage. For example, Lerner et al. examined the reactive oxygen species (ROS) associated with e-cigarette components and the aerosols inhaled by the user. They found that some constituents with oxidizing properties associated with e-cigarettes were health hazards that warrant further examination [10].
Jensen et al. studied the formaldehyde produced by vaping e-juice. They found that long-term vaping is associated with an incremental lifetime cancer risk of 4.2 × 10⁻³, assuming that inhaling formaldehyde-releasing agents carries the same risk per unit of formaldehyde as the risk associated with inhaling gaseous formaldehyde. This risk is five times higher than smoking one pack of conventional tobacco [11]. Both papers mentioned that flavors were an important component in e-juice. However, very few studies have focused on this research topic. Fuoco et al. found that e-cigarette liquid flavors cannot be considered a major influence parameter in particle concentration emission [12]. However, Behar et al. tested eight cinnamon-flavored e-juices and found these fluids could produce cinnamaldehyde (CAD) and 2-methoxycinnamaldehyde (2MOCA), which were highly cytotoxic. Thus, cinnamon-flavored e-juice could adversely affect e-cigarette users [13]. They also noted that most studies mention e-liquid flavor only briefly and as one possible factor that could increase the appeal of e-cigarettes [4,14,15,16,17]. Similarly, Bahl et al. found that e-cigarette refill fluids were cytotoxic to human embryonic stem cells and mouse neural stem cells, not because of nicotine but because of chemicals used to flavor the fluids, such as Cinnamon Ceylon [18]. Farsalinos et al. evaluated sweet-flavored e-liquids for the avoidable risk induced by the presence of diacetyl and acetyl propionyl [19]. Lisko et al. measured 10 additive flavor compounds among 36 e-cigarette products and found that added menthol might reduce harshness or more closely simulate the sensory experience of smoking traditional cigarettes [20]. Tierney et al. revealed that 13 out of 30 products were more than 1% flavor chemicals by weight [21]. Research on e-liquid flavors has not been sufficient, and many questions remain.
It is important to determine the most popular e-cigarette product flavors and the reasons why consumers choose one flavor over another. Analysis of the effects and influences of flavor popularity on product use and initiation is also critical to better understand the role that flavors play in the vaping experience. Finally, these insights are necessary in order to inform the public and policy makers for further discussion and debate.
Social media such as YouTube and Facebook have recently become a significant platform for health surveillance [22] and social intelligence [23]. For example, YouTube has become an important data source for consumers, and previous research has examined the information shared on YouTube related to smoking [24], smoking cessation [25], smoking imagery associated with cigarettes [26], smokeless tobacco [27], and little cigars [28]. One recent study on YouTube also found that flavor has been used as a key promotional element for the sale of e-cigarettes and related products [17]. Some videos claim that consumers should choose from the multiple flavors, such as chocolate and strawberry, to make them more attractive and appealing [17]. It was also found that the vast majority of information on YouTube about e-cigarettes promoted their use and depicted the use of e-cigarettes as socially acceptable [17]. An analysis of 365 e-cigarette-related videos found that they highlighted e-cigarettes' economic and social benefits, featuring a low level of fear appeal and negative message valence and a high level of marketing information [8]. Hua et al. used YouTube videos to study users' puff duration and found it was approximately twice as long as puff duration for conventional smokers [29]. In another paper, Hua et al. used data collected from three forums, Electronic Cigarette Forum, Vapers Forum, and Vapor Talk, to study the symptoms caused by e-cigarettes. The symptoms were classified into medical categories with positive or negative effect and the association between symptoms was examined [30]. Research on other platforms found that Twitter appeared to be an important marketing platform for e-cigarettes [31]. Another well-known social media platform, Facebook, was studied by Liang et al. Their research primarily focused on the social networks constructed from fan pages [32]. In general, Twitter research focused on detection of trends and patterns. 
YouTube provides rich information in videos; thus the research in YouTube was more practical and specific. However, Reddit, an important social media platform, has not been thoroughly studied. Therefore, the current study examined data from Reddit and conducted content analysis based on information shared on this platform.
Reddit, founded in 2005, is essentially an online bulletin board system that provides news, entertainment, and social networking functionality. Users provide all the content and decide, through voting, the ranking of posts and comments [33]. As one of the most popular forums in the world, Reddit has great influence and a huge number of user groups. Reddit users are anonymous. Demographic information is not required when a user signs up, and even an email address is optional at registration. These loose requirements create a very open environment for users to discuss and communicate with each other. However, they also make it difficult to characterize users. As of 28 June 2015, Reddit had 163,966,958 unique visitors hailing from over 212 different countries, viewing a total of 7,086,828,967 pages [34]. Since 2008, Reddit has allowed users to create communities (called "subreddits") where they can discuss topics of interest. Some health research has been done based on data collected from Reddit. Pavalanathan and Choudhury used Reddit to study mental health [35]. Arthur used Reddit to track the 2014 Ebola outbreak [36]. An article from the Institute for Health Research and Policy at the University of Illinois at Chicago pointed out that there was rich conversation happening in subreddits dedicated to electronic cigarettes and quitting smoking [37]. There are many publicly available posts about e-cigarettes and flavors, which have the potential to influence e-cigarette and flavor-related attitudes, choices, and behaviors. We believe the discussion of e-cigarette flavors on Reddit could profoundly influence readers subscribing to this community.
Despite the growing amount of literature on e-cigarettes on YouTube, Twitter, and Facebook, there are no published studies to date that have systematically mined e-cigarette and flavor content on Reddit. Given the potential that Reddit has to promote e-cigarette use through user-generated content or covert advertising, this study aims to gain a systematic understanding of the characteristics of a variety of flavors, popular flavors, and the reasons why they are popular by analyzing e-cigarette flavor-related posts and user information on Reddit.

Data Collection
E-cigarette flavor-related posts were collected from Reddit from 1 January 2011 to 30 June 2015 for study purposes in line with methods from several previous and related studies on Facebook and YouTube [17,32]. In these studies, a wide range of data was collected based on several keywords related to e-cigarettes. Some additional rules were applied to make sure that data was accurate and relevant. The data coding processes to classify the records were conducted manually. We used a similar approach. From previous studies [17,31], we identified the following seven keywords for data collection: electronic cigarettes, e-cigarettes, ecigarettes, e-cigs, flavor, flavors, and e-juice. Several subreddits were returned from this initial search. We chose the top 10 popular and relevant subreddits in ranking: /r/electronic_cigarette, /r/ecigclassifieds, /r/ejuice, /r/Vaping101, /r/ejuice_reviews, /r/EJuicePorn, /r/DIY_eJuice, /r/ecig_vendors, /r/Vaping, /r/E_cigarette. We believe that 10 subreddits could provide enough posts and comments for data analysis, and at the same time, a comprehensive understanding could be gained from analyzing a range of subreddits rather than a specific subreddit.
Two strategies were used to pull posts from the identified subreddits: (a) keyword searches and (b) ranking by relevance, hot spot, importance, up-to-date information, and reply count. The hot spot ranking is provided by the Reddit search engine and is calculated from the number of views, comments, upvotes, and downvotes of a specific post. These strategies were chosen to mimic typical user behavior. Using these strategies, our dataset contained 493,994 posts on Reddit. In practice, there is some noise in the posts due to semantic ambiguity. For example, the word "apple" not only refers to apple flavor, but also has different meanings in other contexts. For instance, "Snapple Apple" is a kind of apple beverage, while "apple watch" is an electronic product produced by Apple. We consider words that are not relevant to e-cigarette flavors (e.g., "Snapple Apple", "apple watch") as noise. Thus, we eliminated posts with such noise. Finally, a total of 27,638 unique e-cigarette flavor-related posts and 7376 brand-related posts were identified for analysis.
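The noise-elimination step described above can be illustrated with a minimal sketch. The term lists below are hypothetical placeholders (only "Snapple Apple" and "apple watch" come from the text), and the rule shown, which drops any post containing a noise phrase and keeps posts that mention a flavor term as a whole word, is one plausible reading of the procedure rather than the exact rules used in the study.

```python
import re

# Hypothetical flavor terms for illustration; the study's full list is not given here.
FLAVOR_TERMS = ["apple", "strawberry", "vanilla"]
# Noise phrases taken from the examples in the text.
NOISE_PHRASES = ["snapple apple", "apple watch"]


def is_flavor_post(text):
    """Return True if the post mentions a flavor term and no known noise phrase."""
    lowered = text.lower()
    # Eliminate posts that contain a known non-flavor phrase.
    if any(phrase in lowered for phrase in NOISE_PHRASES):
        return False
    # Whole-word match so that "pineapple" does not count as "apple".
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", lowered)
        for term in FLAVOR_TERMS
    )


posts = [
    "This apple e-juice is sweet",
    "My Apple Watch arrived today",
    "Snapple Apple is my favorite drink",
]
kept = [p for p in posts if is_flavor_post(p)]
```

Whole-word matching is a design choice of this sketch; it avoids one obvious source of false positives but would not catch every ambiguous use of a flavor word.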

Data Analysis
To gain a systematic understanding of the characteristics of a variety of flavors, popular flavors, and the reasons why they are popular, the following processes were carried out. First, two reviewers reviewed 2200 sample posts and classified the flavors based on their ingredients. In total, 29 flavors were identified and grouped into eight categories: fruit, cream, tobacco, menthol, beverages, sweet, seasonings, and nuts. We ran another test to classify these 29 flavors, and the kappa coefficient of this classification was 0.73. Based on the benchmarks provided by Fleiss et al. [38], this kappa coefficient indicated substantial or good agreement.
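For readers unfamiliar with the agreement statistic, the following sketch computes Cohen's kappa for two reviewers. It assumes the two-rater form of kappa, which the text does not state explicitly, and the labels in the example are invented for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented example labels, not data from the study.
reviewer_1 = ["fruit", "fruit", "cream", "cream"]
reviewer_2 = ["fruit", "fruit", "cream", "fruit"]
kappa = cohen_kappa(reviewer_1, reviewer_2)  # observed 0.75, chance 0.5
```

In this toy example the reviewers agree on three of four items, chance agreement is 0.5, and kappa is therefore (0.75 − 0.5) / (1 − 0.5) = 0.5.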
Second, the number of times that each flavor occurred in posts was counted. Using this method, popular flavors can be identified and flavor categories created and/or confirmed. Furthermore, the evolution trend of each flavor category over time can be analyzed.
Third, mixed flavor patterns were analyzed. Searches for mixed flavor posts were conducted using the following search keywords: mix, mixes, mixed, premixed, blend, blended, and blends. The most popular two-flavor and three-flavor combinations were determined.
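The mixed-flavor analysis can be sketched as follows. The mixing keywords are those listed above; the flavor list is a hypothetical subset, and counting every pair or triple of flavors that co-occur in a mixing-related post is an assumed operationalization, not necessarily the procedure used in the study.

```python
import re
from collections import Counter
from itertools import combinations

# Mixing keywords as listed in the text.
MIX_KEYWORDS = {"mix", "mixes", "mixed", "premixed", "blend", "blended", "blends"}
# Hypothetical subset of flavor terms, for illustration only.
FLAVOR_TERMS = ["strawberry", "vanilla", "tobacco", "banana"]


def tokens(text):
    """Set of lowercase alphabetic words in a post."""
    return set(re.findall(r"[a-z]+", text.lower()))


def count_combinations(posts, size):
    """Count flavor combinations of the given size across mixing-related posts."""
    counts = Counter()
    for post in posts:
        words = tokens(post)
        if not words & MIX_KEYWORDS:
            continue  # not a mixing-related post
        # Sort so that each combination has a single canonical ordering.
        flavors = sorted(f for f in FLAVOR_TERMS if f in words)
        counts.update(combinations(flavors, size))
    return counts


posts = [
    "I mixed strawberry and vanilla",
    "Vanilla strawberry blend with banana",
    "strawberry vanilla is great",  # no mixing keyword, so it is skipped
]
pairs = count_combinations(posts, 2)
triples = count_combinations(posts, 3)
```

Calling `pairs.most_common(10)` would then list the most frequent two-flavor combinations in the sample.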
Finally, by analyzing brand-related posts, the top 10 most popular e-liquid brands were identified. Characteristics of a variety of flavors were also identified in order to attempt to gain a better understanding of the reasons why some flavors are more popular than others.

Classification of Flavors
E-cigarette flavor-related posts were manually reviewed and classified into eight categories: fruit, cream, tobacco, menthol, beverages, sweet, seasonings, and nuts. As shown in Table 1, these eight flavor categories include 29 different flavors identified in Reddit posts.
Previous research used a questionnaire to survey e-cigarette consumers and identified seven categories: tobacco, mint/menthol, sweet, nuts, fruit, drinks/beverages, and other [5]. In addition to these, we further differentiated some categories. Cream flavors were differentiated from sweet ones as some flavors were not only characterized as sweet, but also as smooth and creamy. For example, cream flavors included chocolate, vanilla, and milk as compared to honey and candy for sweet flavors. This research is the first to identify the seasonings category, which comprises flavors derived primarily from various spices.

Fruit
which is a redeeming quality for me. However, the banana flavor is a bit more pronounced than I would have hoped for, with the other flavors taking a back seat. This juice seems much sweeter than Baked Blue. I'm experiencing almost the same level of harshness here.
3. A sweet natural cherry e-juice that will definitely take you back to that Greek bar a few summers ago that served delicious cherries on ice. I won't put it in that many words but if you ever kissed a girl (and liked it!) with cherry lipbalm you'll know how this tastes immediately. It's a sweet and candylike cherry flavour with a nice touch. I quite like it, it reminds me of my first kiss. (aww)

Cream
Cream category flavors are sweet and smooth. The exhale leaves the creamy taste.
1. All of the chocolates are ok though not massively inspiring, the coffees are very chemical and cinnamon is so bloody strong it's insane.
2. More bakery goodness here. Sweet, creamy, with a good full cookie note. Not an all day vape for me, it's a bit rich, but it's a flavour I'm happy to spend an evening with sometimes.
3. Vanilla: Decent, there are better vanilla flavours available, but also worse.

Tobacco
Not all tobacco category flavors taste the same. In general, some tobacco flavors are used to mimic real cigarette flavors to help people switch from real cigarettes to e-cigarettes. Other tobacco flavors are blended with sweeteners to have a sweet, smooth, and fragrant taste. These kinds of flavors are more appealing to e-cigarette users who are not interested in the traditional tobacco taste.

Nuts
1. I have an allergy to nuts in general, primarily peanuts (on the allergen blood test peanuts were given the second highest rating-Anaphylaxis shock after consumption). I carry an epipen at all times just in case of accidental exposure. I have personally made juice with a variety of different nut/peanut butter flavors and I have never had an issue vaping them or with accidental contact with my skin. Granted, the smell of TFA peanut butter still makes me nauseous due to the association I have obviously developed between the smell of peanut butter and potential death haha.
2. Om Nom is a reference to a local flavor. It's a banana nut bread and cinnamon flavor. The banana nut was really strong on the exhale, with a hint of cinnamon.
3. So I'm new to making e juice and I'm definitely more of a cloud chaser so I tend to be higher with vg maybe like a 80 vg 20 pg and most of the juices I buy are that ratio. My juices are so dull. The flavors are extremely muted and I'm following a calculator to do the mixes. Some have been very good at first but then the next day they are muted. I'm familiar with vapors tongue but I mean these are really muted. A good example was today I made a 7% banana cream 10% peanut butter both lorann flavoring. Any tips or advice anyone can give me?

Flavor Characteristics
Many users shared their experiences of using e-cigarettes with different flavors on Reddit. From the data set we have collected, the active users consist of e-cigarette vapers, reviewers, vendors, individual e-juice makers, etc. Different users have different perspectives of these flavors. We collected some characteristics of these flavors described by the users, which are shown in Table 1.
In Table 1, we selected three posts for each category of flavors and summarized some characteristics. Fruit flavors taste sweet, do not stimulate, are mild, and are closest to the naturally occurring flavor of fruit. User 1 provided some mixing patterns for fruit flavors. User 2 expressed the feeling that banana flavor was "harsh". User 3 described the taste of cherry flavor as the feeling of "kissing a girl".
Cream category flavors are sweet and smooth. The exhale leaves a creamy taste. User 1 talked about mixing chocolate flavor. User 2 preferred cream flavor. User 3 reviewed e-juice, including a comment about a vanilla flavor from a specific brand.
Tobacco flavor is more complicated. Some tobacco e-juices are used to mimic the conventional cigarette. However, others are blended with sweeteners to appeal to e-cigarette users who are not interested in the traditional tobacco taste. User 1 insisted the classical tobacco flavor was the best flavor for a smoker. However, user 2 was fond of a milder tobacco taste, and user 3 discussed approaches to increase the throat hit when using tobacco flavors.
Menthol category flavors are slightly bitter on the inhale, offer a light throat hit, and have a cooling sensation in the nasal cavity and on the exhale. Some people report that it tastes harsh. User 1 compared the taste of minty menthol with fresh breath. User 2 mentioned that the menthol flavor could be used to mask the traditional tobacco taste. User 3 was a new user and went to Reddit asking for some suggestions.
Beverages category flavors have a rich aroma, a smooth inhale, and a thick vapor production. User 1 expressed their enjoyment of using coffee flavor. User 2 talked about how his/her mother loved the tea flavor and asked for some information about e-cigarette vendors in Latvia. User 3 said he/she would like to find a Chai tea flavor.
Sweet category flavors are sugary but not as greasy as cream. User 1 did not like the sweet flavors because they did not like the taste of VG. User 2 loved the cotton candy flavor very much, and user 3 said the candy flavor could be helpful in quitting smoking.
Seasonings category flavors are unique and usually are mixed with other flavors. User 1 said that the cinnamon flavor was similar to the taste of a cinnamon roll, which was "wonderful". User 2 discussed the pepper flavor. User 3 shared the composition of a kind of cinnamon e-juice from Mt. Baker Vapor.
Nuts category flavors are mild and can be mixed with other flavors. User 1 was allergic to nuts but still made DIY e-juice with nut flavors. However, this user had never had an issue vaping them or with accidental contact with skin. Granted, the smell of TFA peanut butter still produced a nauseous feeling due to the association they had developed between the smell of peanut butter and an allergic reaction. User 2 seemed to be a local e-juice vendor promoting its products. User 3 discussed experience with making DIY e-juice and asked for help.
In general, the user experiences with these different flavors were different. Their evaluation covers a wide scope. There was plenty of discussion of mixing blends, which seems to be a common trend.

Single Flavor
The number of posts in which each flavor category and flavor word occurred was counted. If a post mentioned several flavors, each flavor mentioned was counted once. For example, if a post contained apple three times and banana once, it added one to the number of posts containing apple and one to the number of posts containing banana. Table 2 shows the breakdown of posts for each flavor category and specific flavor. We identified a total of 27,638 unique e-cigarette flavor-related posts. The total number of flavor mentions is 45,130, so the average number of flavors per post is 1.63. Fruit, cream, and tobacco were the most popular flavor categories, while nuts and seasonings flavors were the least mentioned. It should be noted, however, that many posts discussed several different flavors in the same post. When multiple flavors were mentioned in a single post, it was most often:
(1) A comparison of several flavors
(2) An inquiry for introduction of different flavors
(3) A demonstration of mixing flavors
(4) A promotion of different flavors
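The post-level counting rule and the reported average can be reproduced with a short sketch. The flavor list and example posts are hypothetical; only the totals 45,130 and 27,638 come from the text.

```python
import re
from collections import Counter

# Hypothetical subset of flavor terms, for illustration only.
FLAVOR_TERMS = ["apple", "banana", "vanilla"]


def post_level_counts(posts):
    """Count, for each flavor, the number of posts that mention it at least once."""
    counts = Counter()
    for post in posts:
        # Using a set means repeated mentions within one post count only once.
        words = set(re.findall(r"[a-z]+", post.lower()))
        counts.update(f for f in FLAVOR_TERMS if f in words)
    return counts


def mean_flavors_per_post(total_mentions, total_posts):
    """Average number of distinct flavors mentioned per post."""
    return total_mentions / total_posts


counts = post_level_counts(["apple apple apple banana", "vanilla and apple"])
average = mean_flavors_per_post(45130, 27638)  # totals reported in the text
```

Here the first example post contributes one count each to apple and banana despite mentioning apple three times, and the average rounds to the 1.63 reported above.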
Many users shared their e-juice DIY experience. They believed DIY flavors could produce a better taste. However, there is a concern that mixing different flavors creates additional risk, since such combinations might produce chemicals harmful to e-cigarette users. Therefore, we think monitoring the mixing of flavors is important. We hope the patterns detected in this research will be further examined from a public health perspective. The next section contains further analysis on mixing flavors.
However, if only single flavors are examined rather than flavor categories, as shown in Figure 2, strawberry is the most popular flavor used in mixed e-liquids, followed by vanilla and tobacco flavors. The top 10 flavors include four fruit flavors and two cream flavors, which is consistent with the flavor category findings above.
The most popular two- and three-flavor combinations were also examined. The posts indicate that combinations of vanilla and strawberry flavors are the most popular among Reddit users (Tables 3 and 4).

Temporal Trend Analysis of Flavor Categories
Since 2014, many U.S. tobacco companies have entered the e-cigarette market, triggering rapid growth and expansion. It can be seen in Figure 3 that the number of discussions in each flavor category increased significantly beginning in 2014. According to this growth trend, the number will continue to increase through 2015. The fruit category is clearly the most popular on Reddit, with the post volume for this flavor category rising the most sharply since 2014. Similarly, other flavor categories, such as cream and sweet, also show an increased post volume due to the fast-developing market. Here, we only examined the increase at the category level. However, a flavor-level analysis could provide detailed information on trends for a specific flavor.
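A category-level trend such as the one described above can be tabulated with a sketch like the following. The record layout, a post date paired with the categories detected in that post, is an assumption made for illustration, and the example records are invented.

```python
from collections import Counter
from datetime import date


def yearly_category_counts(records):
    """Count posts per (year, category).

    records: iterable of (post_date, categories) pairs, where categories is
    the list of flavor categories detected in that post (assumed layout).
    """
    counts = Counter()
    for post_date, categories in records:
        # A post counts once per category, even if the category repeats.
        for category in set(categories):
            counts[(post_date.year, category)] += 1
    return counts


# Invented example records, not data from the study.
records = [
    (date(2013, 5, 1), ["fruit"]),
    (date(2014, 2, 1), ["fruit", "cream"]),
    (date(2014, 7, 9), ["fruit", "fruit"]),
]
trend = yearly_category_counts(records)
```

Replacing the category labels with individual flavor names in the same structure would give the flavor-level analysis mentioned at the end of the paragraph.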

Popular Flavor Brand Analysis
Although the health benefits and possible adverse effects of e-cigarettes are still unclear and need further study by the scientific community, many smokers show great interest in information regarding the safety and quality of e-liquids used with e-cigarettes. Well-known and/or well-advertised brands, such as Mt. Baker Vapor and Hangen, receive good reviews from consumers and are popular among users who report good quality and taste. The top 10 most popular e-liquid brands are shown in Figure 4. Among these brands, Mt. Baker Vapor, Hangen, Halo, and Marlboro produce both e-juice and e-cigarette devices. All the other brands only sell e-juice. The characteristics of these brands showed that Reddit was a good promotion platform both for big names in the industry and small companies with high-quality products. The comprehensive composition of users might increase the promotion power of this platform.
Figure 4. Top 10 most popular e-liquid brands (axes: brand and number of posts).

Discussion
To the best of our knowledge, this is the first systematic study of e-liquid flavors based on data collected from social media. Popularity and characteristics of flavors are summarized based on data collected from Reddit, a social media platform that also has not been studied in the context of e-cigarette content. In this research, the popularity of flavors was the primary focus, which was defined as the number of posts containing the flavor. Although preference and sentiments were not analyzed in this study, we believe the number of posts could be considered as a good index for the prevalence of flavor use. The main findings are as follows: (1) Through analyzing e-cigarette flavor-related posts, 29 different flavors were classified into eight categories, and the fruit category was the most popular flavor category, which is consistent with previous studies [5]. In addition, sweet category flavors and cream category flavors were also very popular. Smaller numbers of users were interested in the tobacco and menthol categories.  [39], and sales of e-cigarettes are estimated to grow 24.2% per year through 2018 [40]. Our data collected from Reddit appear consistent with this noted growth.
Based on the findings above, the fruit category is the most popular flavor among e-cigarette users. The reason might be that fruit flavors taste sweet, are mild, do not stimulate, and are closest to the actual naturally occurring flavor of fruit, since they have a relatively high glycerol ratio. This could potentially contribute to use and uptake of e-cigarette smoking by making products and the vaping experience more appealing. The large variety and diversity of fruit flavors could also be a reason they are the most popular and appeal to the most users. Research on little cigars and cigarillos has found that candy-like flavors could increase the appeal to starters because it masks the heavy cigar taste [7]. Similarly, adding candy-like and other non-tobacco flavors could potentially be perceived as enjoyable, which might make it easier for e-cigarette companies to recruit new users.
The prevalence of flavors in e-cigarettes raises new concerns about this product, which is advertised as an approach to help smokers quit. The large variety of sweet flavors could also increase appeal to adolescents and young adults, a concern supported by evidence similar to that used by the FDA to ban the use of flavors (with the exception of menthol) in traditional combustible cigarettes. Flavor use in e-cigarettes, which is not currently banned, has been described by public health advocates as just the next in a long list of tactics used by producers and marketers to attract adolescents and young adults to initiate smoking behaviors. Our research suggests that e-cigarette flavors, especially fruit flavors, are attractive to vapers. The health effects and the attractiveness of flavors should be further examined to help policy makers determine what regulatory action is appropriate for flavored e-cigarettes. E-cigarette flavorings could potentially be harmful to humans if inhaled into the lungs. Prior research found that the concentrations of some flavor chemicals in e-cigarette fluids are sufficiently high for inhalation exposure by vaping to be of toxicological concern [21]. Another investigation found that flavoring is a parameter known to affect the stability of products. For example, nicotine was often easily oxidized by common substances found in mint, vanilla, and fruit flavors. The oxidative degradation of nicotine resulted in high amounts of nicotine-related impurities, which are harmful to the human body. Furthermore, based on their sampling, some brands had levels of impurities above accepted limits for pharmaceutical products [3]. Our findings suggest that fruit flavors, in particular, should be the first to receive further investigation from the medical and public health communities.

Contributions
In summary, e-cigarette flavors could be a possible factor that contributes to smoking initiation. Our findings, based on Reddit e-cigarette-related content, reveal that fruit flavors are the most popular and that this flavor category should perhaps be the first to be investigated further in future research.
This study is also the first to examine Reddit as a social media data source for e-cigarette research. The user-generated content on Reddit is likely to be different from that on other social media sites. Further analysis and comparison of e-cigarette content across social media platforms is needed. This study and its findings based on Reddit should serve as the first example of data mining on this platform. This is also the first research that we are aware of that investigates flavor combinations mixed by e-cigarette users. Rather than a single e-liquid flavor, consumers are increasingly mixing multiple flavors to create a more complex and unique vaping experience. We find that the fruit and cream flavor categories are the most popular in mixing. The mixing of different flavors could potentially change the substances and components in the e-liquid, creating new compounds for which even less is known regarding long-term health effects. We hope the findings of this study can be followed up with toxicological experiments to determine the potential health effects of flavor mixing.

Limitations
We collected data on Reddit from 1 January 2011 to 30 June 2015 to the extent feasible, but posts and comments outside this period were not collected. Including all the data could provide a more comprehensive understanding of the patterns of e-cigarette flavors. However, we believe the current dataset of 493,994 posts is large enough for data analysis and mining. In this research, we treated posts and comments as the same. Thus, relationships and network structures among the posts and comments were not included in the analysis, and interactions between users were omitted. This information could be used in more detailed descriptive and predictive models.
As for the data collection strategy, we used keywords from current literature to identify e-cigarette-related posts: electronic cigarettes, e-cigarettes, ecigarettes, e-cigs, flavor, flavors, and e-juice.
However, we missed another important pair of keywords, vape and vaping, which are widely used among e-cigarette users to describe the behavior of using e-cigarettes. We may have overlooked some valuable data because of this flaw. Nevertheless, given the large number of posts and comments, vape and vaping also appeared widely in the posts we collected. Thus, we still believe our research findings are valid. In future research, we would like to expand the keyword set to build a more comprehensive dataset.
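The keyword-based collection strategy described above can be illustrated with a minimal Python sketch. It assumes posts are available as plain strings. The function names, the sample posts, and the matching rule (a case-insensitive whole-term match) are hypothetical choices made for illustration, not the procedure actually used in this study. The sketch also shows how adding vape and vaping changes which posts are captured.

```python
import re

# Keywords listed in the text. The matching rule below is an assumption.
BASE_KEYWORDS = [
    "electronic cigarettes", "e-cigarettes", "ecigarettes",
    "e-cigs", "flavor", "flavors", "e-juice",
]
# Keywords the text identifies as missing from the original set.
EXTRA_KEYWORDS = ["vape", "vaping"]


def build_pattern(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole term."""
    # Longer keywords first, so "flavors" is tried before "flavor".
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def filter_posts(posts, keywords):
    """Return the posts that mention at least one keyword."""
    pattern = build_pattern(keywords)
    return [post for post in posts if pattern.search(post)]


# Hypothetical sample posts, for illustration only.
sample_posts = [
    "Any e-juice recommendations for a beginner?",
    "I started vaping last month and love it.",
    "My favorite flavor is strawberry custard.",
    "Best hiking trails near Seattle?",
]

base_hits = filter_posts(sample_posts, BASE_KEYWORDS)
expanded_hits = filter_posts(sample_posts, BASE_KEYWORDS + EXTRA_KEYWORDS)
```

In this toy example, the second post is captured only by the expanded keyword set, which is the kind of omission the limitation above describes.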
We were interested in Reddit user characteristics; however, age and gender information were not available and hence could not help us identify further important and interesting patterns in e-cigarette adoption.
Another limitation is the categorization of flavors. We only listed major flavors mentioned in the posts. Some other flavors such as sugar cane, apple vinegar, and absinthe are not included because they appeared infrequently. Although these flavors are not as common, additional analyses are needed to characterize all flavors because certain flavors might not be heavily used by large populations but could be used by specific populations, such as youth.
Finally, this study focused only on the prevalence of flavors, not the sentiments of posts and comments. User preference for a flavor depends on whether the content is positive or negative. For example, "I hate strawberry e-juice" and "I love strawberry e-juice" express opposite preferences. However, both count equally toward the popularity analysis we performed. Thus, this research determined popular or prevalent flavors, but not preferred flavors.
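The distinction between prevalence and preference can be made concrete with a minimal Python sketch of mention counting. The flavor list, the function name, and the tokenization rule are hypothetical and chosen only for illustration; they are not the counting procedure used in this study.

```python
import re
from collections import Counter

# Hypothetical flavor list for illustration; not the study's flavor categories.
FLAVORS = ["strawberry", "vanilla", "mint"]


def count_flavor_mentions(posts, flavors):
    """Count how many posts mention each flavor, ignoring sentiment."""
    counts = Counter()
    for post in posts:
        # Lowercase and split into alphabetic tokens (an assumed tokenization).
        words = set(re.findall(r"[a-z]+", post.lower()))
        for flavor in flavors:
            if flavor in words:
                counts[flavor] += 1
    return counts


posts = [
    "I hate strawberry e-juice",
    "I love strawberry e-juice",
    "Vanilla is fine.",
]
mention_counts = count_flavor_mentions(posts, FLAVORS)
```

Here both strawberry posts add one to the strawberry count even though they express opposite opinions, so a count of this kind measures how often a flavor is discussed rather than how much it is liked.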

Future Research
We envision three possible approaches for further study. First, as stated above, certain flavors or flavor constituents could be hazardous to human health. Flavors are sold in stores and via the internet to a wide variety of users even though no medical testing of those flavors has been conducted, so the constituents of those flavors are unknown. The U.S. FDA has indicated that it intends to regulate e-cigarette products and flavors, so future research will need to be conducted to determine health risks. We also examined flavor mixing. Since cinnamon was found to contain highly cytotoxic substances [13], flavor blends that include cinnamon should be tested. Cream is an interesting new category that could be further explored.
Future informatics analyses can also inform understanding of flavor and e-cigarette use, such as self-reported associations and relationships in social media between flavor and short-term discomfort and long-term symptoms. For example, our analysis of social media comments on Reddit found that some users reported xerostomia, eructation, and allergy after trying e-cigarettes. Some users expressed concern about the effect of sweet flavors on diabetes. Further exploration of how users describe the health effects of specific flavors might even function as an "early warning system" regarding potentially unsafe flavorants. Thus, further in-depth analysis of flavor characteristics and potential disease risk is a promising area of research.
Second, this study focused only on the prevalence of flavors, not the meanings of posts and comments. A more in-depth analysis of sentiment could reveal more information about flavor patterns among e-cigarette users. For example, a discussion of an apple flavor could be positive, negative, or neutral in sentiment, depending on the user-generated content. A widely mentioned e-juice brand could be highly regarded or, on the contrary, widely disliked. The two situations are the same in prevalence but different in sentiment. Thus, attending to the meanings of posts could reveal more information about e-cigarette users.
Finally, it is important to note that the flavors being discussed on Reddit do not provide information on the nicotine strength of the products. Different nicotine concentrations could potentially affect how different flavors are perceived or what reactions or health effects occur when they are consumed. For example, some users mentioned that tobacco flavor should be used in liquid containing high concentrations of nicotine because the "throat hit" would be more intense and enjoyable. Thus, further research could analyze how flavors are matched with nicotine solutions and what effects these combinations have.

Conclusions
This is the first study showing that Reddit is heavily used by the e-cigarette and vaping community to share information about flavors and other aspects of e-cigarette use, and that Reddit social media data can be mined for valuable information on self-reported e-cigarette flavor use. We found that fruit flavors are the most popular among all the flavor categories, and postings indicate that combinations of vanilla and strawberry flavors are the most popular among Reddit users. This analysis of Reddit social media is an important step in understanding how and why consumers use different flavored e-cigarette products, in particular because Reddit data is far more difficult to analyze than some other social media venues (e.g., Twitter). Reddit data could function as an early warning system to better understand the emerging trends of e-juice flavors. In addition, future analyses of these data could lend insight into how consumers and e-cigarette businesses are reacting to new regulations and products. Consumers, e-cigarette producers, and policy makers could make use of this information to identify new products, health outcomes from particular products, and how products are being marketed/promoted, which could in turn inform the development and implementation of new regulations or laws.

Scott J. Leischow and Janet Okamoto contributed to the manuscript and provided critical feedback on the manuscript. All authors read and approved the final manuscript.