A Systematic Literature Review of Sexual Harassment Studies with Text Mining

Abstract: Sexual harassment has been the topic of thousands of research articles in the 20th and 21st centuries. Several review papers have been developed to synthesize the literature about sexual harassment. While traditional literature review studies provide valuable insights, these studies have some limitations including analyzing a limited number of papers, being time-consuming and labor-intensive, focusing on a few topics, and lacking temporal trend analysis. To address these limitations, this paper employs both computational and qualitative approaches to identify major research topics, explore temporal trends of sexual harassment topics over the past few decades, and point to future possible directions in sexual harassment studies. We collected 5320 research papers published between 1977 and 2020, identified and analyzed sexual harassment topics, and explored the temporal trend of topics. Our findings indicate that sexual harassment in the workplace was the most popular research theme, and sexual harassment was investigated in a wide range of spaces ranging from school to military settings. Our analysis shows that 62.5% of the topics having a significant trend had an increasing (hot) temporal trend that is expected to be studied more in the coming years. This study offers a bird's eye view to better understand sexual harassment literature with text mining, qualitative, and temporal trend analysis methods. This research could be beneficial to researchers, educators, publishers, and policymakers by providing a broad overview of the sexual harassment field.


Introduction
Behaviors that define Sexual Discrimination and Harassment (SDH) are often presented on a continuum, from offensive comments to sexual and physical assault [1]. Nonphysical SDH such as sexual remarks (e.g., verbal remarks about the size of women's breasts) is the most frequently reported SDH [1]. According to a survey, 81% of women and 43% of men experienced sexual harassment during their lifetime in the US [2]. Annually in the US [3], more than 400,000 Americans over the age of 12 are sexually assaulted or raped, 60,000 children are victims of sexual abuse, and 18,900 military members experience unwanted sexual contact. The majority (69%) of sexual assault victims are below the age of 30 [4]. According to a survey, it was estimated that 21.3% of women and 2.6% of men in the US have experienced completed or attempted rape at some point in their lifetime [5]. In terms of location, 55% of sexual harassment occurs at or near the victim's home, 15% in an open public space, 12% at or near a relative's home, 10% in an enclosed public area, and 8% on school property [3].
SDH has negative physical (e.g., sleeplessness) and mental health (e.g., depression) effects [6]. SDH also has negative financial impacts. The loss of productivity in SDH

Year | Time frame | Count (as reported) | Topic | Reference
 | N/A | N/A-135 | Sexual harassment on the Internet | [6]
2005 | N/A | N/A-98 | Gender and communication in computer mediated communication (CMC) environments | [30]
2005 | N/A | N/A-68 | Role of gender in workplace stress | [31]
2006 | N/A | N/A-30 | Sexual harassment at work and cross-cultural study of reaction to academic sexual harassment | [32]
2006 | N/A | 182 | Women veterans' health | [33]
2007 | N/A | 41 | Sexual harassment at work | [34]
2008 | N/A | N/A-73 | Aggression and sexual harassment in service encounters (sexual harassment at work) | [35]
2008 | N/A | 49 | Sexual harassment at work | [36]
2009 | 1995-2009 | N/A-151 | Sexual harassment at work | [37]
2010 | N/A | N/A-73 | Interventions for sexual harassment at work | [38]
2011 | N/A | N/A-147 | Sexual harassment at work | [1]
2011 | N/A | 32 | Bullying in special education (Youth) | [39]
2012 | N/A | N/A-121 | Sexual harassment at work | [40]
2012 | N/A | N/A-157 | Sexual harassment at work | [41]
2013 | N/A | N/A-35 | Peer sexual harassment (Youth) | [42]
2014 | N/A | N/A-159 | Workplace injustices and occupational health disparities | [43]
2014 | N/A | 136 | Bullying, violence and sexual harassment of nurses | [44]
2015 | N/A | 60 | Interventions for sexual harassment at work | [45]
2016 | N/A | N/A-73 | Sexual harassment and assault in the US military | [46]
2017 | N/A | N/A-45 | Sexual harassment in academia | [47]
2018 | 1995-2018 | 11 | Gender-based nature of technology-facilitated sexual violence (TFSV) | [48]
2018 | N/A | 60 | Sexual harassment training | [45]
2018 | N/A | N/A-122 | Sexual harassment at work | [49]
2019 | 2000-2019 | 24 | Sexual harassment in higher education | [50]
2019 | N/A | N/A-43 | Sexual harassment in academia | [51]
2019 | N/A | N/A-105 | Sexual harassment at work | [52]
2019 | 2005-2018 | 15 | Sexual harassment of nurses at work | [53]
2019 | 2003-2019 | N/A-95 | Sexual cyberbullying | [54]
2019 | N/A | N/A-67 | Sexual harassment | [55]
2019 | N/A | N/A-134 | Sexual harassment at work | [56]
2019 | 1990-2017 | 60 | Sexual harassment of refugees | [57]
2020 | 1966-2017 | 30 | Sexual harassment in higher education | [58]
2020 | N/A | 20 | Sexual harassment against female nurses at work | [59]
2020 | 1980-2020 | 71 | Sexual harassment in transit environments | [60]
2020 | N/A | N/A-109 | Sexual harassment of girls | [61]

Sustainability 2021, 13, 6589

There has been extensive research on sexual harassment, as evidenced by the large number of reviews identified in Table 1. Most of the reviews had an emphasis on sexual harassment in the workplace and included the following topics: prevalence and frequency of sexual harassment; the legal history and frameworks of sexual harassment; different theoretical perspectives to explain sexual harassment; summaries of precursors at the individual and organizational level; consequences of experiencing sexual harassment (e.g., physical, psychological, social, and work-related consequences); and coping strategies of victims of sexual harassment. Additionally, there were some reviews on interventions and trainings aimed at decreasing sexual harassment, perceptions and definitions of sexual harassment, bullying and peer sexual harassment in youth, sexual harassment in specific settings (e.g., academia) or among specific populations (e.g., women of color, nurses), and reviews of sexual harassment on the Internet.
Previous research reviews provide valuable understanding of different aspects of research on sexual harassment; however, these articles and chapters have some limitations. First, because of the constraints of traditional literature review methods, each review covered only a limited sample of the relevant papers, mainly those published in top journals. Second, these papers do not include a temporal trend analysis to study how research topics change over time. Third, traditional review methods restrict a review to a small number of topics, and usually to a single one, such as the prevalence and theories of sexual harassment in the workplace. Fourth, some reviews engaged in macrolevel analyses and attempted to synthesize major topics and theories from the literature on sexual harassment in the workplace; however, these reviews did not encompass all the topics related to sexual harassment (e.g., did not include sexual harassment in youth or sexual harassment outside the workplace). Fifth, the traditional methods are time-consuming and cannot be applied to large datasets.
While current literature review studies utilize standard formats such as Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), the current study develops a systematic approach using mixed methods to collect and analyze a large number of studies' abstracts to address the aforementioned gaps and provide a wider perspective on the sexual harassment literature. This paper employs both computational and qualitative approaches to identify major research topics, explore temporal trends of the topics over the past few decades, and point to future possible directions in sexual harassment studies. This paper addresses the following research questions:

1. What are the main research topics in studies related to sexual harassment?
2. What is the temporal trend of each topic?
This endeavor offers the following contributions. First, to the best of our knowledge, this is the first study to analyze thousands of sexual harassment manuscripts. Second, the proposed data analysis framework is a flexible approach that can be applied to other research fields. Third, the data will be shared, which provides an opportunity for further investigation and for replicating the results. Fourth, this paper can shine a light on past and future sexual harassment research by exploring main SDH research topics from 1977 to 2020.

Materials and Methods
This section describes our corpus and data analysis methods used in this research. We collected and cleaned data and utilized mixed methods including topic modeling, topic analysis, and temporal trend analysis to investigate sexual harassment literature containing thousands of research studies. Figure 1 shows four steps of this research. The following sections provide more details on each step.

Data Collection and Cleaning
Google Scholar does not provide an Application Programming Interface (API) for data collection. Therefore, we obtained data from relevant research databases that offer an API. We collected relevant journal and conference abstracts containing "sexual harassment" in their title or abstract from three large databases: Web of Science, Scopus, and EBSCO [62,63]. After removing duplicate records based on title, abstract, or DOI, we retained the abstracts along with their titles and keywords. Our corpus contains concise information representing a larger picture of sexual harassment research papers that discussed various issues. We chose to analyze titles, abstracts, and keywords instead of full-text papers for the following reasons. First, while titles, abstracts, and keywords are available for all paper records in a proper format, access to the full text is restricted [64]. Second, titles, abstracts, and keywords contain dense, rich information and the most important findings [64], whereas the full text contains speculative and complex statements [65]. The collected data are available in the Supplementary Materials at https://github.com/amir-karami/Sexual-Harassment-Literature.
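The duplicate-removal step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record fields (`title`, `abstract`, `doi`), the normalization rule, and the sample records are assumptions made for illustration. A record is treated as a duplicate when any one of the three fields matches an earlier record.

```python
import re

def normalize(text):
    """Lowercase and collapse punctuation/whitespace so near-identical strings compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", (text or "").lower()).strip()

def deduplicate(records):
    """Keep the first record seen for each distinct title, abstract, or DOI.

    A record is dropped when ANY of its non-empty keys was already seen.
    """
    seen = set()
    unique = []
    for rec in records:
        keys = {
            (field, normalize(rec.get(field)))
            for field in ("title", "abstract", "doi")
            if normalize(rec.get(field))
        }
        if keys & seen:
            continue  # matches an earlier record on at least one field
        seen |= keys
        unique.append(rec)
    return unique

# Hypothetical records, as if exported from three different databases.
records = [
    {"title": "Sexual Harassment at Work", "abstract": "A review of workplace harassment.", "doi": "10.1000/abc"},
    {"title": "Sexual harassment at work: a review", "abstract": "Different wording.", "doi": "10.1000/ABC"},
    {"title": "sexual harassment at work.", "abstract": "", "doi": None},
    {"title": "Cyberbullying among youth", "abstract": "A survey.", "doi": "10.1000/xyz"},
]
unique_records = deduplicate(records)
```

Matching on any single field is deliberately aggressive; a stricter variant could require two fields to agree before dropping a record.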

Topic Modeling
To address the first research question, we utilized topic modeling, which is a semantic analysis method to disclose the hidden semantic structure of papers. Among different topic models, Latent Dirichlet Allocation (LDA) [66] is an effective and efficient model [67]. LDA has been used for different applications such as politics [68], health [69,70], opinion mining [71], and social media analysis [72-77]. LDA has also been utilized for reviewing literature of different domains such as social media [78], big data [79], biomedical [80,81], and wearable technology [82]. While LDA has been utilized to analyze SDH experiences in academia [83] and workspace [84], LDA has not been used to review the greater sexual harassment literature.
LDA provides two matrices: $P(\text{word} \mid \text{topic})$ and $P(\text{topic} \mid \text{document})$. The former matrix recognizes semantically related words representing a theme. The latter matrix shows the distribution of topics for a document (paper), which assists in finding documents related to each of the topics. The outputs of LDA for $n$ documents (papers), $m$ words, and $t$ topics are two matrices [67]. The first is the probability of each word occurring in each topic, $P(W_i \mid T_k)$, arranged as words by topics ($m \times t$), and the second is the probability of each topic occurring in each document, $P(T_k \mid D_j)$, arranged as topics by documents ($t \times n$). The top words in each topic, ordered by $P(W_i \mid T_k)$, represent the topic. We also used $P(T_k \mid D_j)$ to find the significance of each topic, $ST(T_k)$:

$$ST(T_k) = \sum_{j=1}^{n} P(T_k \mid D_j)$$

For an effective comparison, each $ST$ was normalized by the sum of the weight scores of all topics:

$$N\_ST(T_k) = \frac{ST(T_k)}{\sum_{k'=1}^{t} ST(T_{k'})}$$

If $N\_ST(T_x) > N\_ST(T_y)$, it means that researchers discussed topic $x$ more than topic $y$. This normalization was applied to all papers in all years to find the weight of topics in total, in each decade, and in each year.
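As a small worked illustration of the topic weights, the sketch below computes the significance of each topic by summing P(T_k|D_j) over documents, normalizes the sums so they add to one, and ranks documents for a topic by P(T_k|D_j). The matrix is stored here with one row per document, and its values are invented for the example; neither choice comes from the paper.

```python
def topic_significance(doc_topic):
    """ST(T_k): sum of P(T_k | D_j) over all documents j (rows = documents)."""
    n_topics = len(doc_topic[0])
    return [sum(row[k] for row in doc_topic) for k in range(n_topics)]

def normalized_significance(doc_topic):
    """N_ST(T_k): ST(T_k) divided by the sum of ST over all topics."""
    st = topic_significance(doc_topic)
    total = sum(st)
    return [s / total for s in st]

def top_documents(doc_topic, k, n=3):
    """Indices of the n documents with the highest P(T_k | D_j)."""
    order = sorted(range(len(doc_topic)), key=lambda j: doc_topic[j][k], reverse=True)
    return order[:n]

# Hypothetical P(topic | document) values for 4 documents and 3 topics.
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
weights = normalized_significance(doc_topic)
best_for_topic_0 = top_documents(doc_topic, k=0, n=2)
```

Restricting the rows of `doc_topic` to the papers of one year or one decade before calling `normalized_significance` gives the per-year or per-decade weights described above.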

Topic Analysis
Next, we applied a qualitative approach in four phases to disclose the meaning of topics and their categories: (1) discovering the theme for each topic, (2) detecting meaningful and stable topics, (3) determining overarching categories, and (4) assessing reliability of coding. We explain each of these phases below.
Phase 1. Discovering the theme of initial topics: To determine the topics, three of the authors coded 40 topics individually. "Coding" in this context means that coders read the most frequently used words (shown in Appendix A) and the most relevant papers' abstracts for each of the topics provided by LDA, and identified the common theme underlying the papers. To find the top papers for each topic, we sorted P(T_k|D_j) from the highest value to the lowest one. The three coders used consensus coding [85] to agree on the theme for each topic. For consensus coding, the coders first developed themes separately; then they met, compared and contrasted the themes they had each generated, and kept discussing them until they agreed on the final themes. For example, one topic contained these words: "legal", "law", "environment", "court", "hostile", "discrimination", "decisions", "rights", "act", and "claims". After each of the coders coded these topic words and the corresponding top papers' abstracts individually, they came together, discussed the topic, and reached consensus on coding it as "Workplace Legal Cases" (see T14, Appendix A). Consensus coding was used to determine the label and the description of the topic.
Phase 2. Detecting meaningful and stable topics: The next step was to determine the topics that were meaningful, stable, and related to human sexual harassment. To achieve this goal, three types of topics were removed. The first type included topics that contained general words that do not represent a consistent theme. The second type included unstable topics that did not appear in all experiments when we ran LDA three times. The third type included topics related to animal sexual harassment and behavior. We used the consensus coding method and the results of the LDA experiments to refine the topics and agreed on 26 meaningful and stable topics related to human sexual harassment.
Phase 3. Determination of overarching categories: Coders grouped the 26 topics into categories. Coder 1 created the categories, and then the other two coders reviewed the categories. The three coders discussed and reached consensus regarding the grouping and the labels of the categories.
Phase 4. Assessing reliability of coding: Once final coding was completed, we asked an outside coder (coder 4) who was not involved in the qualitative coding to evaluate our consensus coding. In this way, we could determine whether, given the same dataset, another person would reach the same conclusions. The outside coder coded 12 of the 40 topics (30% of the total number of topics), deciding which of the labels those 12 topics fit into. Then, we computed Cohen's κ to determine the agreement between our consensus coding (described in Phase 1) and the outside coder's coding. There was good agreement, κ = 0.7270 [86].
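The agreement statistic in Phase 4 can be sketched as follows. This is a generic implementation of Cohen's κ for two coders, not the authors' script, and the labels below are hypothetical; they are not the 12 topics coded in the study and do not reproduce the reported κ.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: probability that both coders pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels from the consensus coding and from an outside coder.
consensus = ["Workplace", "Workplace", "Education", "Education"]
outside = ["Workplace", "Workplace", "Education", "Workplace"]
kappa = cohens_kappa(consensus, outside)
```

Here the coders agree on 3 of 4 items (0.75) while chance agreement is 0.5, so κ corrects the raw agreement downward.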

Temporal Trend Analysis
To address the second research question, we applied statistical trend analysis to explore annual changes of the 26 topics. This study fitted a linear trend model to the yearly weight of each topic. The analysis measures the trend of P(T|D) with the R lm function, which reports the p-value and slope of each trend. Increasing (hot) trends have p-value ≤ 0.05 and slope > 0, and decreasing (cold) trends have p-value ≤ 0.05 and slope < 0. A trend is not considered meaningful or significant if p-value > 0.05.
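The hot/cold rule can be sketched in Python. The study used the R lm function; the code below is only an illustrative re-implementation of an ordinary least squares slope with its two-sided t-test, written with the standard library. The function names, the numerical integration used for the p-value, and the yearly weights are all assumptions made for the example.

```python
import math

def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value for Student's t, via Simpson integration of the density on [0, |t|]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return max(0.0, 1.0 - 2.0 * area)

def linear_trend(years, weights):
    """Return (slope, p_value) of the least squares line weights ~ years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(weights) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, weights))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, weights))
    if sse == 0.0:
        return slope, (1.0 if slope == 0 else 0.0)  # perfect fit: no residual variance
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, t_two_sided_p(slope / std_err, n - 2)

def classify(slope, p_value, alpha=0.05):
    """Apply the rule from the text: hot, cold, or not significant."""
    if p_value > alpha:
        return "not significant"
    return "hot" if slope > 0 else "cold"

# Hypothetical yearly topic weights.
years = list(range(2011, 2021))
rising = [0.020, 0.022, 0.021, 0.025, 0.027, 0.026, 0.030, 0.031, 0.033, 0.036]
flat = [0.030, 0.028, 0.031, 0.029, 0.030, 0.031, 0.029, 0.030, 0.028, 0.031]
labels = [classify(*linear_trend(years, w)) for w in (rising, flat, rising[::-1])]
```

With one predictor, the p-value of the slope's t-test is the quantity that decides between a significant and a non-significant trend, and the sign of the slope then separates hot from cold.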

Results
We collected 5320 research papers written in English and published between 1977 and 2020. Figure 2 shows that the overall frequency and the relative frequency of papers have a significant change (p-value ≤ 0.05) with an increasing trend (slope > 0). The relative frequency is the ratio of the number of papers in a year to the total number of papers published between 1977 and 2020.
The numbers of tokens (a token is a string of contiguous characters between two spaces) and words were 594,922 and 16,916, respectively. To find topics with LDA, we applied coherence analysis on 2 to 50 topics to find the optimum number of topics. Among topic coherence measures, C_V is highly correlated with human ratings [87]. To measure C_V, we used the gensim Python package [88] and found the optimal point at 40. The comparison of the mean and standard deviation of five sets of 4000 iterations was not significant (p-value > 0.05), indicating the robustness of LDA (Figure 3). To apply LDA to our data, we used Mallet [89] set to 40 topics with 4000 iterations.
Out of the 40 topics, we removed 14 topics that were not stable, not meaningful, or not related to human sexual harassment. Appendix A shows 26 topics with a clear theme along with their label identified with the qualitative process. For example, the coders found that "students", "education", "university", "college", "faculty", and "school" in T2 were related to sexual harassment in education. The order of topics in Appendix A from T1 to T40 is based on the output of LDA. Table 2 shows the definition of the 26 topics, and Appendix A illustrates three related research papers offered by LDA using P(T|D). Figure 3 illustrates the weight of topics. The order of topics in Figure 4 is based on the value of N_ST(T_k) from the highest value to the lowest value. The highest weight is for perceptions of sexual harassment and the lowest weight is for racial/ethnic discrimination.
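The coherence-based choice of the number of topics can be sketched as below. The sketch scores gensim's own `LdaModel` with the C_V measure for each candidate number of topics and picks the best one. This is a substitution made for illustration, since the study fitted LDA with Mallet, and the helper names, seed, and example scores are assumptions. gensim is imported inside the function so that the selection helper can be used without it.

```python
def coherence_by_num_topics(texts, candidates, seed=0):
    """Return {k: C_V coherence} for each candidate number of topics k.

    `texts` is a list of tokenized documents (lists of strings). Requires gensim.
    """
    from gensim.corpora import Dictionary
    from gensim.models import CoherenceModel, LdaModel

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(doc) for doc in texts]
    scores = {}
    for k in candidates:
        lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k, random_state=seed)
        coherence = CoherenceModel(
            model=lda, texts=texts, dictionary=dictionary, coherence="c_v", processes=1
        )
        scores[k] = coherence.get_coherence()
    return scores

def best_num_topics(scores):
    """Number of topics with the highest coherence score (first one wins on ties)."""
    return max(scores, key=scores.get)

# Made-up coherence scores, only to show the selection step.
example_scores = {20: 0.41, 30: 0.44, 40: 0.47, 50: 0.45}
chosen = best_num_topics(example_scores)
```

In practice one would call `coherence_by_num_topics(texts, range(2, 51))` on the tokenized abstracts and inspect the whole curve rather than only its maximum, because coherence curves are often flat near the optimum.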

Category/Topic | ID | Definition
Sexual Harassment in Education
Higher Education | T2 | Research on sexual harassment in higher education with several articles focused on medicine. Studies addressed the prevalence of sexual harassment, perceptions of sexual harassment by members of academic institutions, and institutional policies and resources.
Youth Bullying and Victimization | T40 | Research on sexual harassment (e.g., bullying) in middle and high schools.
Workspace
Professional Relationships | T6 | Research on cross-sex friendships and professional relationships. Multiple studies focused on cross-sex mentorship relations at work and in academic settings with many studies finding that these types of relationships could be challenging and lead to negative outcomes.
 | T13 | Research on sexual harassment by coworkers and customers. Additionally, workplace romance experiences and policies.
Gender Equality in Workspace | T16 | Research on equality in the workplace; many articles studied the barriers and challenges (e.g., gender-based discrimination) that women experience in various workspaces.
Poor Health Outcomes of Employees | T20 | Association between sexual harassment remarks or physical advances (e.g., bullying) and poor health outcomes of employees.
Medical Field Discrimination | T21 | Research on training, perceptions, and experiences regarding professionalism among students and members in the medical field. Many studies found that women reported gender-based discrimination and sexual harassment.
Workspace Policies | T26 | Research on developing sexual harassment policies in the workplace such as creating user-friendly sexual harassment policies.
Hospital Workplace Violence | T31 | Articles studied the types of violence experienced by hospital staff members. Studies found that verbal abuse and threats by patients and patients' family members were common. Additionally, medical staff experienced sexual harassment by other workers as well as patients.
Historically Oppressed Populations
Sex Workers and HIV | T7 | Research on risk factors (e.g., drug use, sexual harassment/rape) that increase the risk of HIV infection among sex workers.
Racial/Ethnic Discrimination | T22 | Research on racial/ethnic and gender discrimination, including sexual harassment and other forms of discrimination. Several articles focused on how the intersection between gender and race/ethnicity increases experiences of oppression and victimization.
Global Society | T34 | Research on factors that increase vulnerability of low-income people; many articles focused on women. Studies assessed natural, structural, and environmental factors that increased vulnerability (e.g., natural disasters and social settings). Several articles focused on developing countries.
Attitudes, Beliefs, and Perceptions
Sexual Harassing Behaviors | T1 | Research on individuals' perceptions and attitudes related to sexual harassment behaviors. Several studies surveyed undergraduate students.
Perceptions of Sexual Harassment | T12 | Research using vignettes and hypothetical scenarios to study perceptions of sexual harassment and attributions of responsibility. Studies assessed how characteristics of the rater (e.g., gender attitudes), the target of harassment (e.g., attractiveness), and the perpetrator influenced individuals' perceptions of the scenario.
Sexist Beliefs and Masculinity | T28 | The influence of sexist beliefs and threat to masculinity on aggressive behavior, including tolerance for sexual harassment, self-reported perpetration of sexual harassment, and aggressive behaviors in experimental contexts.
Digital Space | T23 | Sexual … with an emphasis on youth.
Prevention and Treatment | T24 | Research on prevention and treatment of sexual violence and interpersonal violence (such as intimate partner violence and sexual harassment) within workplaces and other settings (e.g., community, schools).
Feminism, Media, and Politics | T25 | Research on portrayals in media and politics of sexual harassment. Most articles focused on the Hill-Thomas hearing and the #MeToo movement.
Coping Reactions | T37 | Research on reactions (e.g., coping strategies) around sexual harassment.

We explored the temporal trend over years of the 26 topics from 1977 to 2020. We found 10 topics (e.g., professional relationships) without a significant trend (p-value > 0.05) and 16 topics (e.g., higher education) with a significant trend (p-value ≤ 0.05) (Table 3). Out of the 16 topics, six topics had a decreasing (cold) trend and 10 topics had an increasing (hot) trend (Figure 5). Evidenced by the increasing trend of publications in Figure 1, sexual harassment research will likely continue to grow in some aspects and lose momentum in others (Figure 5). Topics that are particularly "hot" include Sex Workers and HIV; Domestic Violence; Military Trauma; Digital Space; Healthcare Services; Effects of Trauma Exposure; and Youth Bullying and Victimization. The 26 topics can be grouped into 6 broad categories (Table 2). The first category is health outcomes with three topics: T19, T33, and T36. This category represents research on health barriers, trauma exposure, and mental and physical health outcomes.
This category includes research on military sexual trauma, which often has severe mental health outcomes for female veterans. All the topics in this category had a significant increasing trend. The second category represents sexual harassment in education with two topics. This category includes T2 (higher education) and T40 (youth bullying and victimization) that were cold and hot topics, respectively.
The third category is workplace, which has the highest number of topics including T6, T13, T16, T20, T21, T26, and T31. This category includes research on workplace stress, violence, gender-based discrimination, sexual harassment, sexual assault, and discrimination. Studies investigated the prevalence and types of hostile SDH, in addition to their impact on work climate and health and psychological outcomes for employees, and relevant policies and training addressing SDH. The research took place in different countries and included various settings such as the medical field, police, and service professions. Three topics (T6, T13, and T20) did not show a significant trend, but two topics (T21 and T31) had an increasing trend and two topics (T16 and T26) had a decreasing trend.
The fourth category represents experiences of historically oppressed populations, which includes the second largest number of topics with a total of three (T7, T22, and T34). This category includes factors that increase vulnerabilities and impact the well-being of these historically oppressed populations. Other than T7, with an increasing trend, the rest of the topics did not demonstrate a meaningful trend. The fifth category is studies investigating attitudes, beliefs, and perceptions of sexual harassment and attribution of responsibility (T1, T12, T28). Different methods used included experimental approaches, in which participants were presented with different scenarios and surveys to assess various sexual harassment attitudes and behaviors. Some studies assessed how individual and contextual characteristics related to participants' perceptions of sexual harassment. While T1 and T12 are cold topics, T28 did not have a significant trend.
The sixth category is sexual harassment in the legal field with two topics: T14 with a decreasing trend and T17 without a significant trend. This category contains research and legal policies on sexual harassment in the legal field. The articles include reviews and analyses of sexual harassment legal cases and laws related to workplace sexual harassment.
Each of the following topics represents a separate broad category: Hegemonic Masculinity (T4), Domestic Violence (T18), Digital Space (T23), Prevention and Treatment (T24), Feminism, Media, and Politics (T25), and Coping Reactions (T37). Out of these six topics, T18, T23, and T25 were hot topics, while the rest of the topics did not show a meaningful trend. It seems that the #MeToo movement has had a major impact on developing more research related to T25 (Feminism, Media, and Politics).
We also measured the average weight of topics per year in each decade (Figure 6). We found the following top three topics in each decade:

Discussion
This study offers a bird's eye view to better understand sexual harassment literature with text mining, qualitative, and temporal trend analysis methods. The publication date of articles included in this paper ranges from 1977 to 2020. Future researchers will be able to use the study as an overview of the sexual harassment field and as a starting point for developing new research hypotheses and prevention efforts. Our findings indicate that sexual harassment was studied in different spaces such as the university, school, workplace, home, online space, hospital, and the military. Sexual harassment in the workplace was a particularly popular topic for researchers. Our results also show that sexual harassment research investigated different age groups (e.g., children), sexualities (e.g., LGBT), races/ethnicities (e.g., Latino), and countries (e.g., India). This review could be beneficial to researchers, educators, publishers, and policymakers to better understand the larger picture of issues and research topics and their trends in sexual harassment research.
The large variety of research within the field of sexual harassment shows that the effects of harassment on women and other individuals go far beyond the traditional workplace [49]. Current studies show that sexual harassment as a form of sexual violence has been included in research under broader labels, such as "bullying" in middle school and "elder abuse" in nursing homes. Because of these broader labels, the sexual nature of the violence may be missed, as may the connection between experiences across life stages. For example, middle school students can be sexually harassed by peers, and people can experience sexual harassment through the Internet. However, if these topics are studied only under the labels "peer bullying" and "internet bullying," then we miss their connection to the sexual harassment literature. Expanding the understanding of sexual harassment beyond the conventional physical workplace is particularly relevant in the context of the global COVID-19 pandemic, during which millions of people have transitioned to working from home online. Broadly reviewing topics that fall under sexual harassment can therefore help draw connections between areas studied separately in the past, providing the field with a better understanding of this phenomenon.
Using traditional review methods to summarize all the research on sexual harassment in the past few decades (over 5000 articles) would be a colossal endeavor. To the authors' knowledge, no review on sexual harassment has been published with a scope as broad as that of the current article. The results are complex but uniquely capture and summarize English-language research relevant to the term sexual harassment published between 1977 and 2020. This study offers the following methodological benefits. First, the approach used here can analyze a large sample of papers. Second, while traditional methods are time-consuming and labor-intensive, this approach is efficient. Third, we were able to identify and predict hot and cold research themes. Finally, the research framework is flexible and can be applied to other research issues. The data presented can help researchers understand the historical changes and trends in research under the umbrella of sexual harassment in various contexts. These changes probably reflect historical and contextual differences in society, as well as within multiple disciplines. The following discussion highlights our study's critical results within the context of relevant historical and social movements.
In the 1970s, the most popular research topics were Hospital Workplace Violence, Hegemonic Masculinity, and Gender Equality in Workspace. In the United States context, the legal determinations and definitions of sexual harassment were newly developed in the 1970s, informed by case law under Title VII of the Civil Rights Act of 1964 and by policy guidance from the Equal Employment Opportunity Commission (EEOC; [1]). Many of the first legal cases that defined the language and legislation related to sexual harassment in the US took place in the 1960s and 1970s. In 1980, the EEOC defined sexual harassment as "unwelcome sexual advances or verbal or physical sexual conduct that unreasonably interferes with a person's job or creates an intimidating or offensive work atmosphere" [90]. Thus, from a legal perspective, sexual harassment was first defined in the context of work discrimination, and therefore most research during the 1970s and 1980s focused on sexual harassment at work. This trend is evidenced in our results, with the category of Workspace (including Hospital Workplace Violence and Gender Equality in Workspace) having the largest number of topics.
The third most studied topic of the 1970s, Hegemonic Masculinity, is vast and captures general research on men's positions of power, including ideologies and practices that uphold the status quo. Sexual harassment has been conceptualized as a phenomenon that mirrors societal-level patterns of dominance and oppression by those in power (often, men) towards those seen as "other" or "less-than" (often, women; [49]). Sexual harassment tends to be directed toward women because they are perceived as "other." Similarly, individuals from groups with less power often experience sexual harassment, for example, men who are deemed not manly enough, LGBTQ individuals, and people with intersecting identities such as women of color [49]. Thus, understanding sexual harassment as a means of maintaining historical power dynamics aligns with research on how hegemonic masculinity is enacted. Furthermore, Hegemonic Masculinity was one of the most popular topics in the last three decades (the 2000s, 2010s, and 2020s). This topic's trend analysis indicates no significant change (neither hot nor cold); thus, research in this area has been prevalent and will most likely continue.
Research in the 1980s also focused on workspace topics, including Gender Equality in Workspace, which is evidence of continued interest in understanding sexual harassment in the workplace. A second popular topic was Workspace Policies, indicating a shift from studying and defining what sexual harassment is to a focus on the policies that were being implemented, probably due to legislation in the 1960s and 1970s. Lastly, the most popular topic of the 1980s was Perceptions of Sexual Harassment, which shows an interest, in social psychology and other disciplines, in understanding attitudes toward and judgments of situations presented in sexual harassment vignettes and scenarios.
The 1990s show a similar trend regarding popular areas of research, with topics again related to the category of Workspace: Workplace Legal Cases and Workspace Policies were two of the most popular research topics. Similarly, there was continued interest in research on Perceptions of Sexual Harassment. The televised hearings involving Anita Hill and Clarence Thomas took place in 1991, and studies indicate that this event changed and influenced how media and US society discussed sexual harassment, with Anita Hill becoming a symbol for sexual harassment [91]. Interestingly, while there was widespread interest in legal and policy aspects of sexual harassment in this decade, these topics have become "cold," indicating a decrease in research in these areas in more recent decades.
Popular topics in the 2000s included Perceptions of Sexual Harassment and Hegemonic Masculinity, in line with research in previous decades. However, this decade's unique area of interest was Youth Bullying and Victimization, with studies focused on youths' experiences of violence by peers in middle and high school. Scholars state that before the 1990s, few researchers had examined school bullying in the US. Unfortunately, school massacres in the 1990s (e.g., Columbine High School) brought bullying and other forms of violence that youth experience in schools into the spotlight in America, along with research on and implementation of anti-bullying policies in the 2000s [92]. The 2010s include topics similar to those of the previous decades, including Youth Bullying and Victimization, Hegemonic Masculinity, and Workspace Policies.
The most popular topics from 2020 to the present continue to be Hegemonic Masculinity and Workspace Policies, evidencing continuity in these research areas. However, Feminism, Media, and Politics is a newly popular topic. This topic includes research on media portrayals of sexual harassment, including the Hill-Thomas hearing, as well as research on the #MeToo movement. The #MeToo movement and the #TimesUp campaign revitalized discussions of sexual harassment in the US and the world. They led to renewed advocacy movements to address sexual harassment and, as a result of these campaigns, to several high-profile cases ending with consequences for harassers (e.g., Harvey Weinstein being fired and then convicted of rape and sexual assault and sentenced to 23 years in prison; [93,94]). The #MeToo and #TimesUp campaigns highlight how individuals are using new technologies such as social media to connect with other survivors of sexual harassment, share their stories outside of traditional avenues (e.g., media or work organizations), and create new advocacy strategies. Thus, the topic of Feminism, Media, and Politics shows that researchers are expanding their scholarly work to include contemporary forums and social movements. In addition, Digital Space emerged as another new hot topic, showing that research on the intersection between sexual harassment, social media, and the Internet is likely to continue in the field. As mentioned earlier, with the COVID-19 pandemic pushing millions of workers and students to use online spaces to work and study from home, this historic event will also influence and prompt new research on these topics. For example, Jeffrey Toobin, a writer for the New Yorker and a political analyst for CNN, "unintentionally" exposed his genitals to his coworkers during a Zoom call [95]. Given our increased dependence on the Internet, we expect to see more research in the 2020s on these topics.
Besides discussing the most popular topics in each decade, it is worth exploring and contextualizing topics that seem to be increasing in popularity in research. All three topics in the Health Outcomes category (Military Trauma, Healthcare Services, and Effects of Trauma Exposure) are hot topics. Previously, researchers had explored the effects of sexual harassment on job-related outcomes, such as poor worker satisfaction. However, the increase in research in this category suggests a shift in the field toward conceptualizing sexual harassment as potentially traumatic and toward an understanding that trauma can impact the physical and mental health of those who experience it [36].
Although the category of Workspace included the highest number of topics, most of its topics display a cold trend or no change. The only two hot topics in this category are Medical Field Discrimination and Hospital Workplace Violence. Thus, these results evidence a unique and increasing interest in understanding and studying sexual harassment in the medical field.
Domestic Violence is a hot topic that was separate from other categories. While sexual harassment has been conceptualized as gendered and sexual violence at work, domestic violence has been described as violence (physical, sexual, and psychological [96]) between partners, and is usually perceived as a "private" matter. However, while these two forms of violence pertain to different social life spheres, they are both forms of interpersonal violence and there may be a unique connection between the two. Both forms of violence can be understood as displays and attempts by those historically in power (e.g., men compared to women) to maintain control and traditional power dynamics. Our results indicate that research on this topic will most likely continue.
This study provides a macrolevel perspective of sexual harassment research across the past few decades; however, it has some limitations. First, the data collection was limited to three databases (Web of Science, Scopus, and EBSCO). As such, topics present only in other databases may have been missed by this review. Second, we focused on papers in the English language, so this review may be missing topics published in other languages. Third, this study is limited to research papers containing "sexual harassment" and does not capture all possible relevant keywords. Fourth, our review covers primarily research studies, but other document formats such as book chapters, news pieces, and opinion editorials might be worth reviewing to gain a deeper understanding of temporal trends in research on sexual harassment. Fifth, due to limited access to full-text papers and the time burden of reading thousands of papers, this study analyzed abstracts only. Sixth, compared to traditional literature review methods, this paper takes a breadth approach and therefore may miss nuanced findings. Future work could collect data from other databases, study non-English manuscripts, investigate other keywords and subtopics of the topics (e.g., higher education) detected in this research, include other document formats and full-text papers, and analyze each topic (e.g., workspace sexual harassment) separately to provide a deeper analysis. Research databases can provide platforms for researchers to obtain the full text of papers in a format suitable for text mining.
Despite these limitations, this is the first paper to our knowledge to undertake the task of contextualizing over 40 years' worth of research on sexual harassment. By utilizing a broad approach, we were able to capture topics that may not traditionally be considered sexual harassment simply from paper titles (e.g., bullying). Doing so will allow the field to more critically analyze spaces and populations that are experiencing sexual harassment but may have been excluded from prior literature. Ultimately, better understanding this phenomenon will help to prevent it and hopefully lead to better and safer workplaces, schools, social spaces, etc. where no one has to worry about being sexually harassed.

[112][113][114] perceptions differences victim behavior target effects sex scenarios perpetrator found
T13 Workplace Harassment and Romance [115][116][117] workplace organizational work organizations employees power incivility management impact romance
T14 Workplace Legal Cases [118][119][120] legal law environment court hostile discrimination decisions rights act claims