The Social Aspects of Sexual Health: A Twitter-Based Analysis of Valentine’s Day Perception

Sentiment analysis (SA) is a technique aimed at extracting opinions and sentiments through the analysis of text, often used in healthcare research to understand patients’ needs and interests. Data from social networks, such as Twitter, can provide useful insights into sexual behavior. We aimed to assess the perception of Valentine’s Day by performing SA on tweets collected between 28 January and 13 February 2019. Analysis was performed using ad hoc software. A total of 883,615 unique tweets containing the word “valentine” in their text were collected. Geo-localization was available for 48,918 tweets; most of the tweets came from the US (36,889, 75.41%), the UK (2605, 5.33%) and Canada (1661, 3.4%). The number of tweets increased as February 14 approached. “Love” was the most recurring word, appearing in 111,981 tweets, followed by “gift” (55,136), “special” (34,518) and “happy” (33,913). Overall, 7318 tweets mentioned “sex”: among these tweets, the most recurring words were “sexy” (2317 tweets), “love” (1394) and “gift” (679); words pertaining to intimacy and sexual activity, such as “lingerie”, “porn”, and “date”, were less common. In conclusion, tweets about Valentine’s Day mostly focus on emotions or on the material aspects of the celebration, while its sexual aspect is rarely mentioned.


Introduction
Valentine's Day is a feast celebrated annually on February 14, traditionally associated with the celebration of love and romance. The history of Saint Valentine and the reasons for celebrating love on his feast day are controversial and often disputed. St. Valentine, a clergyman from Rome or the nearby town of Terni, was beheaded in 269 AD, on February 14th [1], in a time window possibly associated with the ancient Roman feast of the Lupercalia, a rite connected with fertility. While several Christian feasts have been superimposed onto preexisting pagan and local festivals, in this case there is little, if any, direct evidence of an association between the two celebrations. Indeed, Valentine's Day acquired its current romantic connotations several centuries later: the first mention of Valentine's Day as a celebration of courtly love occurs in Chaucer's poetry (Canterbury Tales) in the late 14th century. The literary tradition of Valentine's Day was later maintained in works by several authors, such as Shakespeare (Hamlet). The celebration of Valentine's Day has changed considerably across the ages [2]. In the 17th century, "Valentine lotteries" took place on February 13th: participants drew the name of their Valentine (usually from a hat) in order to write short poems for each letter of that person's name. These poems were then pinned to hats and clothes so as to display them. By the 18th century, the custom of Valentine lotteries had been largely abandoned by the "upper class", as the role of chance in the selection of a Valentine was considered "a vulgar superstition". At the end of the 18th century, new printing techniques allowed a new niche market to flourish, and pre-made Valentine cards were printed and sold by local booksellers and stationers. Hand-made cards, however, were often considered more "romantic" than machine-printed ones.
The current custom of exchanging cards seemingly started in the early 19th century, with the word "valentine" itself becoming a synonym for the card traditionally given to loved ones [3]. In the present day, Valentine's Day is associated with several "cultural rituals", among which the exchange of gifts has been the most extensively studied in the scholarly literature [4].
While Valentine's Day, as a feast celebrating love, has important bearings on human health and behavior, many people do not feel the need to celebrate and decide to opt out of the feast altogether, either to preserve its "romantic" aspect or as a "backlash" against its commercial exploitation by vendors. Likewise, singles and people in unhappy relationships may have little interest in celebrating [5]. Previous studies suggested an increase in para-suicidal behavior on Valentine's Day [6], as well as on other public holidays [7], possibly as a result of people underestimating their emotions on the day of celebration [8]. The management of expectations also appears to be relevant in healthcare [9,10]. Furthermore, subjects with higher social anxiety are more likely to have positive expectations regarding Valentine's Day if they are in a relationship [11], and envy similarly increases in the same timeframe [12]. There is therefore a rationale for suspecting that social anxiety in couples increases as Valentine's Day approaches.
We believe that the large amount of text data made available through social networks could provide useful insights into people's opinions, concerns, and desires around Valentine's Day. Social networks have reshaped communication patterns, allowing direct communication to coexist with "open" messages which can be read and commented on by many different users. While Facebook and Instagram predominantly feature the sharing of pictures and videos with other users, Twitter has always kept a more "serious" tone, promoting the use of words alone (up to 280 characters) as a means to convey a message. Social networks are also often used in the academic setting: scientific societies and scientific journals often use Twitter to provide quick updates to their followers. Given these premises, it is easy to understand why medical conditions are frequently debated on social networks: Twitter and similar platforms allow patients to share their concerns and needs with peers who have the same condition, while healthcare professionals can benefit from the constant feed of scientific updates. In several fields of medical research, Twitter is used to address the attitudes and perceptions of the general population regarding specific topics, such as mental health [13] and diabetes [14], or to involve, educate, and engage [15,16]. During the COVID-19 pandemic, Twitter-based sentiment analysis was also performed to measure public awareness [17], or to investigate the possible effects of restriction measures on medically relevant conditions, such as diabetes [18] and arthritis [19]. An analysis of tweets pertaining to male sexual dysfunctions, performed in 2019, found that both erectile dysfunction and premature ejaculation are frequently mentioned by Twitter users [20]: this might help in understanding the needs and interests of the general population in the field of sexual medicine.
In this regard, Twitter-based studies have already been conducted in different fields of healthcare, ranging from celiac disease [21] to influenza outbreaks [22]. Sentiment analysis of tweets pertaining to Valentine's Day could improve our knowledge of how this celebration is perceived, providing further insights for experts in sexual health as well. We therefore propose an approach based on data collection from Twitter, aiming to assess the prevalence of the different aspects of the celebration, namely its material, sexual, relational, and psychosexological aspects. Our main prediction was that, as Valentine's Day approaches, the collected tweets would reflect the global perception of the celebration itself, highlighting recurring themes and possibly providing evidence of social anxiety. Based on previous research, we also aimed to determine whether the prevalence of tweets mentioning love and sex differed between tweets posted shortly before Valentine's Day and tweets from another, randomly chosen timeframe.

Data Collection
The method used has been described previously [20,21]. In detail, an automated script using the R package rtweet [23] in the statistical software R (version 3.5.3, R Core Team; R Foundation for Statistical Computing, Vienna, Austria) was run daily between 28 January 2019 and 13 February 2019 (16 days). Every day, the script collected new English-language tweets containing the word "Valentine" by directly querying Twitter's search API (application programming interface) through the rtweet::search_tweets(q = "valentine") function of the rtweet package [23]; it then filtered out retweets, stripped the remaining tweets of weblinks, mentions, emojis and emoticons, and stored the resulting dataset in an incremental backup to preserve data. Because Twitter's search API limits search results to tweets posted in the previous 7 days, the script needed to run daily. After a pre-collection period of 5 days, in which we measured the average number of tweets posted per day, we set the script to collect up to 300,000 tweets per day. Subsequently, duplicates were identified using the dplyr::distinct() function of the dplyr package [24]; suspected Twitter bots were identified by assessing the frequency of tweets and retweets from each individual Twitter user id, the number of followers and followed users, and the general content of the majority of their tweets.
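The cleaning steps described above (discarding retweets, stripping emojis, weblinks, mentions and emoticons, then removing duplicates) can be sketched in base R on a small, hypothetical data frame. The column names text and is_retweet are assumptions made for illustration, the regular expressions are a simplified stand-in rather than the rules of the original script, and base duplicated() is used in place of dplyr::distinct() to keep the example free of dependencies.

```r
# Minimal sketch of the tweet-cleaning steps (base R only).
# Assumption: a data frame with hypothetical columns `text` and `is_retweet`,
# loosely resembling what rtweet::search_tweets() returns.
tweets <- data.frame(
  text = c(
    "RT @shop: Valentine deals https://t.co/abc",
    "Happy Valentine to my love @sam \u2764",
    "Happy Valentine to my love @sam \u2764",
    "Valentine dinner plans :) https://t.co/xyz"
  ),
  is_retweet = c(TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

clean_tweets <- function(df) {
  # 1. Drop retweets
  df <- df[!df$is_retweet, , drop = FALSE]
  txt <- df$text
  # 2. Drop non-ASCII characters, which removes most emojis
  #    (and, as a side effect, accented letters)
  txt <- iconv(txt, from = "UTF-8", to = "ASCII", sub = "")
  # 3. Strip weblinks
  txt <- gsub("https?://\\S+", "", txt, perl = TRUE)
  # 4. Strip @mentions
  txt <- gsub("@\\w+", "", txt, perl = TRUE)
  # 5. Strip a few common ASCII emoticons (illustrative, not exhaustive)
  txt <- gsub("[:;]-?[()DPp]", "", txt, perl = TRUE)
  # 6. Collapse repeated whitespace and trim the ends
  txt <- trimws(gsub("\\s+", " ", txt))
  df$text <- txt
  # 7. Keep the first copy of each exact duplicate
  #    (the study used dplyr::distinct() for this step)
  df[!duplicated(df$text), , drop = FALSE]
}

cleaned <- clean_tweets(tweets)
print(cleaned$text)
```

In this toy example the retweet is discarded and the two identical tweets collapse into one, leaving two cleaned texts. Stripping links and mentions before deduplication matters: tweets that differ only in a URL or a tagged user end up counted once.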

Data Analysis
Data analysis was performed with the statistical software R (version 3.5.3, R Core Team), using the aforementioned rtweet package [23], as well as the dplyr and tidytext packages for data cleaning and ggplot2 for drawing figures [24][25][26]. The correlation between words (φ) was assessed by measuring how often selected words appear together relative to how often they appear separately; φ > 0.05 was considered suggestive of a significant correlation. The chi-square test was used to assess differences between distributions of tweets.
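The φ coefficient mentioned above can be computed from a 2×2 table that counts, across tweets, how often two words occur together, separately, or not at all. The base R sketch below uses a hypothetical toy corpus and a simple tokenizer of our own choosing to illustrate the calculation. It is not the script used in the study.

```r
# Phi coefficient between two words across a set of tweets (base R sketch).
# For a 2x2 table of presence/absence counts:
#   phi = (n11 * n00 - n10 * n01) / sqrt(n1. * n0. * n.1 * n.0)
tokenize <- function(texts) {
  # Lowercase, split on anything that is not a letter or apostrophe,
  # and keep each word at most once per tweet
  lapply(strsplit(tolower(texts), "[^a-z']+"), unique)
}

phi_words <- function(texts, w1, w2) {
  tokens <- tokenize(texts)
  has1 <- vapply(tokens, function(t) w1 %in% t, logical(1))
  has2 <- vapply(tokens, function(t) w2 %in% t, logical(1))
  # as.numeric() avoids integer overflow on large corpora
  n11 <- as.numeric(sum(has1 & has2))    # both words
  n10 <- as.numeric(sum(has1 & !has2))   # first word only
  n01 <- as.numeric(sum(!has1 & has2))   # second word only
  n00 <- as.numeric(sum(!has1 & !has2))  # neither word
  den <- sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
  # den is 0 (and phi is NaN) if a word occurs in every tweet or in none
  (n11 * n00 - n10 * n01) / den
}

# Hypothetical toy tweets
texts <- c(
  "chocolate and flowers for my love",
  "buy chocolate flowers gift",
  "romantic dinner tonight",
  "gift ideas for him",
  "romantic night out"
)
phi_cf <- phi_words(texts, "chocolate", "flowers")
phi_rg <- phi_words(texts, "romantic", "gift")
# Compare each value with the 0.05 threshold used in the study
c(chocolate_flowers = phi_cf, romantic_gift = phi_rg) > 0.05
```

Here "chocolate" and "flowers" always co-occur (φ = 1), whereas "romantic" and "gift" never appear in the same tweet (φ = -2/3), so only the first pair passes the 0.05 threshold.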

Results
At the end of data collection, we had gathered a total of 883,615 tweets including the word "Valentine"; geo-localization was available for only 48,918 tweets, showing that, as expected given the English search term, most tweets came from users in the US (36,889, 75.41%), the UK (2605, 5.33%) and Canada (1661, 3.4%). We then performed data cleaning by removing duplicates, as well as "suspect" tweets (mostly advertisements), leaving 228,141 unique tweets for analysis. We observed a marked increase in the number of weekly tweets when approaching February 14 (Figure 1). Due to the characteristics of the method used and to the profiles of Twitter users, data on the gender and sexual orientation of the sample studied were not available. We assessed the prevalence of different monograms in the text of all tweets included in the analysis. The most common word across all Valentine tweets is, unsurprisingly, "love", occurring in 27,969 tweets (12.26% of all tweets); several of the other most common words, however, focus on the "material" component of the feast, as shown by the prevalence of monograms such as "gift" (15,090 occurrences, 6.61% of all tweets), "chocolate" (5542, 2.43%), "card" (4365, 1.91%), "flowers" (3729, 1.63%) and "dinner" (3719, 1.63%). The only word depicting the sexual act, "fuck", is in 50th place (2463 occurrences, 1.08%). The 50 most recurring words are shown in Figure 2.
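The percentages reported above are document frequencies: the number of tweets containing a word at least once, divided by the number of tweets analyzed (e.g., 27,969 / 228,141 ≈ 12.26% for "love"). A minimal base R sketch of this calculation follows. The tokenizer and the short list of excluded words (which also removes the search term itself) are assumptions for illustration, since the original analysis relied on tidytext.

```r
# Share of tweets containing each word at least once (document frequency).
# Assumption: simple tokenizer and a short, hypothetical exclusion list that
# also removes the search term itself.
excluded <- c("a", "and", "for", "i", "is", "my", "of", "on", "the", "to",
              "you", "valentine", "valentines", "valentine's")

word_prevalence <- function(texts, exclude = excluded) {
  tokens <- strsplit(tolower(texts), "[^a-z']+")
  # Count each word once per tweet; drop empty strings and excluded words
  tokens <- lapply(tokens, function(t) setdiff(unique(t[nzchar(t)]), exclude))
  counts <- sort(table(unlist(tokens)), decreasing = TRUE)
  data.frame(
    word = names(counts),
    tweets = as.integer(counts),
    pct = round(100 * as.integer(counts) / length(texts), 2),
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy tweets
texts <- c(
  "Happy Valentine's to my love",
  "Valentine gift ideas for my love",
  "Chocolate is the best Valentine gift",
  "Love you, Valentine"
)
prevalence <- word_prevalence(texts)
head(prevalence, 3)
```

Counting each word once per tweet, rather than counting every occurrence, is what makes the percentages read as "share of tweets", matching the way results are reported in this section.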
We then focused on the more "intimate" part of Valentine's Day: to assess the prevalence of tweets pertaining to the sexual component of the feast, we filtered all tweets including the word "sex" or other depictions of the sexual act ("intercourse", "coitus", "fuck" and derivatives) in their text. Overall, 5829 tweets were thus collected (0.83% of all tweets). Among tweets pertaining to sex, the most common word was "sexy", with 660 occurrences (11.32%), closely followed by "love" (641 occurrences, 11%). The material aspect of the feast (gifts, chocolate, oysters, champagne, etc.) was once again measurable among these tweets, with "gift" occurring in 269 tweets (4.61%); however, more sexually oriented themes could also be appreciated, such as "porn" (225, 3.86%) and "lingerie" (139 occurrences, 2.38%) (Figure 3). Vulgar slang words for female body parts were among the most commonly recurring words, with "ass" and "pussy" in 12th and 42nd place, respectively, with 148 and 80 occurrences (2.54% and 1.37%).
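Selecting the sex-related subset amounts to a case-insensitive pattern match on the tweet text; percentages within the subset are then computed on the subset size (e.g., 660 / 5829 ≈ 11.32% for "sexy"). The exact expression used in the study is not reported, so the pattern below is a hypothetical reading of the description: "sex", "intercourse" and "coitus" as whole words, plus "fuck" and its derivatives. Word boundaries prevent matches inside unrelated words such as "Sussex". Under this assumption, "sexy" alone does not qualify a tweet.

```r
# Hypothetical pattern for the sex-related subset: "sex", "intercourse" and
# "coitus" as whole words, plus "fuck" and its derivatives (fucking, fucked...).
# The exact expression used in the study is not reported.
sex_pattern <- "\\b(sex|intercourse|coitus|fuck\\w*)\\b"

is_sex_tweet <- function(texts, pattern = sex_pattern) {
  grepl(pattern, texts, ignore.case = TRUE, perl = TRUE)
}

# Hypothetical toy tweets
texts <- c(
  "Valentine sex playlist ideas",
  "Sexy lingerie for Valentine",
  "Chocolate and flowers for my Valentine",
  "Valentine? Fucking overrated",
  "Valentine weekend in Sussex"
)
flags <- is_sex_tweet(texts)
subset_texts <- texts[flags]               # tweets kept for the sub-analysis
share <- 100 * sum(flags) / length(texts)  # share of all tweets, in percent
subset_texts
```

Whether "sexy" should also select a tweet is a design choice that changes the size of the subset. Dropping the trailing word boundary after "sex" would include it.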
We then compared the prevalence of the words "love" and "sex" with that found in 3033 tweets collected over a random 16-day interval, using the dataset we previously used for sentiment analysis of sexual dysfunctions [20]. That dataset contained tweets retrieved and filtered with the same method reported above, but using two different search strings (rtweet::search_tweets(q = "premature ejaculation") and rtweet::search_tweets(q = "erectile dysfunction")); overall, the script used to retrieve those data was run daily between 24 May and 9 October 2018, and a 16-day interval was chosen at random to prevent possible sources of bias. The US, the UK, and Canada were the most prevalent sources of tweets in the sexual dysfunction dataset as well. Tweets pertaining to love were much more common in the Valentine's Day dataset than in the sexual dysfunction dataset, with "love" occurring in 27,969 and 46 tweets, respectively (12.26% and 1.52%); the opposite was observed for tweets pertaining to sex, whose prevalence was higher in the sexual dysfunction dataset (282 tweets, 9.3%) than in the Valentine's Day dataset (5829, 0.83%; χ² = 1039.2, p < 0.0001).
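The between-dataset comparison can be sketched as follows: each dataset contributes one row of a 2×2 table (tweets with and without the word), and base R's chisq.test() is applied to it. For illustration we plug in the "love" counts reported above. The sketch is not meant to reproduce the χ² value reported for "sex", and by default chisq.test() applies Yates' continuity correction to 2×2 tables, which may differ from the settings of the original analysis.

```r
# Chi-square test comparing the share of tweets containing a word in two
# datasets (base R). Each row of the 2x2 table is one dataset.
compare_prevalence <- function(hits_a, n_a, hits_b, n_b) {
  tab <- matrix(
    c(hits_a, n_a - hits_a,
      hits_b, n_b - hits_b),
    nrow = 2, byrow = TRUE,
    dimnames = list(dataset = c("A", "B"), word = c("present", "absent"))
  )
  # For 2x2 tables chisq.test() applies Yates' continuity correction
  # by default (correct = TRUE)
  chisq.test(tab)
}

# "love": Valentine's Day dataset vs. sexual dysfunction dataset
# (counts taken from the text above)
love_test <- compare_prevalence(27969, 228141, 46, 3033)
love_pct <- round(100 * c(valentine = 27969 / 228141, dysfunction = 46 / 3033), 2)
love_pct
love_test$statistic
love_test$p.value
```

Building the table from "present" and "absent" counts, rather than from the two percentages, keeps the very different dataset sizes (228,141 vs. 3033 tweets) in the test.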
We then analyzed the correlation among words in order to identify "word clusters" which might suggest common underlying themes. The correlation index indicates how often these words, or monograms, appear together relative to how often they appear separately. Based on previous research [20], we used a correlation index (φ) threshold of 0.05. Correlation plots are shown in Figure 4. As expected, when addressing all tweets, most of the "material" components of the feast, such as "special", "perfect", "gift", "ideas" and "shop", are grouped together; similarly, words pertaining to the "celebrative" aspect of Valentine's Day ("romantic", "dinner", "night" and similar) form a separate cluster. Word associations for the subgroup of tweets pertaining to sex are similarly grouped: among these groups, we identified a cluster for traditional gifts ("buy", "chocolate", "flowers"), another for "preparations" for the evening ("sexy", "lingerie", "romantic", "love", "playlist") and another for more vulgar slang words pertaining to the physical part of the sexual act ("pussy", "bitch", "ass", "hot").
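The word clusters can be approximated by computing φ for every pair of selected words and keeping the pairs above the 0.05 threshold. Because the Pearson correlation between two 0/1 vectors equals φ, the sketch below builds a binary tweet-by-word matrix and calls base R's cor() on it. The toy tweets and word list are hypothetical, and this is not necessarily how the original correlation plots were produced.

```r
# Pairwise phi between selected words, keeping pairs above a threshold
# (base R sketch). The Pearson correlation of two 0/1 vectors equals phi.
word_edges <- function(texts, words, threshold = 0.05) {
  tokens <- lapply(strsplit(tolower(texts), "[^a-z']+"), unique)
  # Binary tweet-by-word matrix: 1 if the tweet contains the word, else 0
  dtm <- vapply(
    words,
    function(w) vapply(tokens, function(t) as.numeric(w %in% t), numeric(1)),
    numeric(length(texts))
  )
  phi <- cor(dtm)
  # Upper triangle only, so each pair is reported once
  idx <- which(upper.tri(phi) & phi > threshold, arr.ind = TRUE)
  data.frame(
    item1 = words[idx[, 1]],
    item2 = words[idx[, 2]],
    phi = round(phi[idx], 3),
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy tweets and word list
texts <- c(
  "buy chocolate and flowers",
  "chocolate flowers gift",
  "sexy lingerie romantic playlist",
  "romantic playlist tonight",
  "buy chocolate gift",
  "lingerie sexy"
)
words <- c("chocolate", "flowers", "sexy", "lingerie", "romantic", "playlist")
edges <- word_edges(texts, words)
edges
```

In the toy example "chocolate" and "flowers" are linked to each other, and "sexy", "lingerie", "romantic" and "playlist" are linked among themselves, while no edge joins the two groups. The resulting edge list is the kind of input a network plot of word clusters would be drawn from.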

Discussion
Valentine's Day is a feast traditionally associated with love and passion, and it is therefore unsurprising that sex is often a part of the celebrations. Through sentiment analysis, we tracked English-language tweets in the two weeks before February 14th and identified the most commonly recurring words and topics related to Valentine's Day. The analysis showed that while "love" is the most common word, a relevant portion of Twitter users discuss the more "material" part of the feast, i.e., the gifts (or lack thereof). Chocolate, flowers, and cards are among the most common gifts exchanged by lovers on February 14th; this tradition seemingly dates back to the end of the 18th century in Britain, with cards being sent long before decorated boxes of chocolate. Chocolate has often been cited as an aphrodisiac, which may explain the custom of gifting boxes filled with fine delicacies to loved ones. Despite inconclusive evidence concerning its effects on mood [27][28][29], offering chocolate as a gift has become a well-established tradition among lovers in most countries. In this regard, the importance of and motivations behind Valentine's Day gifts have been investigated many times in the literature [4]: several hypotheses have been proposed, including contrasting motives such as obligation, altruism, and self-interest [30,31]. Additionally, the importance of lovers' gifts as Valentine's Day approaches has already been investigated in the context of naturally occurring cultural priming. The evaluation of red roses and gift chocolates showed a positive shift in the days before Valentine's Day, suggesting that changes in evaluations might be related to the "cultural salience" of the event [32]: people may gift chocolate because they think they are expected to.
Sexual activity often plays a significant role on Valentine's Day. Several reports, mostly anecdotal, suggest that a spike in the sales of pro-erectile drugs occurs in the weeks before February 14th, possibly reflecting rising concerns about sexual performance in this period [33]. We have already reported that sexual dysfunctions are often mentioned on Twitter [20]; additionally, words such as "lingerie", "porn" and "sexy" suggest that the erotic aspect of Valentine's Day (or, perhaps, Valentine's Night) is often mentioned by Twitter users. Clinical experience suggests that the idea of "mandatory intercourse on Valentine's Day might act as a major stressful event, therefore psychogenically impairing the ability to obtain and/or maintain an adequate erection, as in the subclinical erectile dysfunction" [34]. However, this issue does not emerge among the most recurring words and themes: "stress" is only the 1577th most recurring word among all tweets included in the analysis and is mentioned only four times among sexually oriented tweets. We conclude that no solid evidence of increased anxiety concerning sexual performance on Valentine's Day can be drawn from our data: the low prevalence of tweets mentioning stress suggests that such concerns are rarely voiced, although sentiment analysis does not allow us to completely exclude the possibility of Valentine's Day acting as a stressful event.
Additional investigation is needed to assess the effects of Valentine's Day on psychological well-being and sexual dysfunction. Sales data concerning anti-depressant or anti-anxiety drugs, or phosphodiesterase type 5 inhibitors, could provide more informative data in this regard; however, these data are not publicly available, and several confounding factors might become sources of bias. Gender-based differences could also play a significant role in the perception of Valentine's Day, and studies investigating the mechanisms behind gift-giving in males provide some grounds on which to build [31]. Furthermore, determining whether Valentine's Day differs from other holidays, such as Christmas or New Year's Eve, would provide additional insights into the "social rituals" associated with these celebrations as well.
While our study is, to our knowledge, the first to investigate the public perception of Valentine's Day through sentiment analysis, some important limitations should be considered. First and foremost, as the search string was an English word, the results may not extend to the whole "twitterverse". Additionally, gender-based differences in the approach to Valentine's Day, while interesting, could not be measured due to the limited information provided by the Twitter search API. Last, but not least, quantitative assessment of sentiment was not feasible, as words can have different meanings depending on context (e.g., "no gifts" has a negative meaning in a tweet such as "I received no gifts whatsoever", but a positive meaning in a tweet such as "I expected no gifts, yet was given lots of chocolate").

Conclusions
Valentine's Day is a feast which has undergone massive changes over the course of several centuries, starting as a celebration of romantic love and progressively becoming associated with the ritual of gift-giving. In the present day, Valentine's Day is a widely discussed topic on social media: despite some limitations, our study is, to our knowledge, the first to investigate the public perception of Valentine's Day through the use of a social network. Most tweets involve the tradition of exchanging gifts among lovers, such as jewelry, flowers, and, of course, chocolate; in the end, "even with nougat, you can have a perfect moment" [35]. Tweets pertaining to sexual activity are not unheard of, and several anecdotal reports suggest increasing anxiety as Valentine's Day approaches. However, our results suggest that most Twitter users do not openly discuss the possibility of having sex on February 14th. Concerns about sexual performance on Valentine's Day, a claim sometimes promoted by the media and supported by a "spike" in sales of pro-erectile drugs, were not found to be a frequently perceived issue in our study. In conclusion, our findings suggest that the analysis of tweets in specific timeframes could be a powerful tool to measure, in a naturalistic context, the sentiments underlying social and sexual perceptions, thoughts, choices, and, finally, behaviors. More adequate investigation methods might be required to fully understand the psycho-sociological dynamics of Valentine's Day.