What Are the Drivers of Citations?: Application in Tourism and Hospitality Journals

Abstract: In line with the qualitative and quantitative growth of academic papers, it is critical to understand the factors driving citations in scholarly articles. This study discovered the up-to-date academic structure in the tourism and hospitality literature and tested the comprehensive sets of factors driving citation counts using articles published in first-tier hospitality and tourism journals found on the Web of Science. To further test the effects of research topic structure on citation counts, unsupervised topic modeling was conducted with 9910 tourism and hospitality papers published in 12 journals over 10 years. The results show that articles specific to online media and the sharing economy have received numerous citations and that recently published papers with particular research topics (e.g., rural tourism and eco-tourism) were frequently cited. This study makes a major contribution to hospitality and tourism literature by testing the effects of topic structure and topic originality discovered by text mining on citation counts.


Introduction
Many tourism and hospitality researchers have long sought answers to questions, such as "how are knowledge and theories produced, and how have they evolved to influence academic discipline?" and "to what extent has hospitality and tourism research reached maturity?" [1,2]. As a result of the constant efforts of numerous researchers, the number of related scholarly and academic papers has swiftly increased, and the knowledge of hospitality and tourism has become substantially diversified, evolving rapidly [3][4][5].
Owing to the interdisciplinary nature of tourism and hospitality, knowledge progression has been attained through the wide application of concepts, ideas, and theories in various disciplines [6]. This leads to a complex academic structure of tourism and hospitality research, creating sub-categories of diverse research topics [7]. Hence, various efforts have been made to uncover the knowledge structure by discovering prevalent research topics in tourism and hospitality research [8][9][10].
While knowledge development progresses through close relationships between research topics, the growth of academic literature has been facilitated through collaborations among researchers [4,11]. Racherla and Hu [12] posited that citations can represent "the cognitive structure of the scientific communities" (p. 1015). Xiao, et al. [13] also proposed that citations can indicate how knowledge is diffused and utilized in knowledge networks. However, the rapid increase in the quality and quantity of tourism and hospitality research has led fellow researchers to cite only a selection of the numerous published manuscripts [14].
Citations are regarded as a proxy for the scientific impact of individual articles in knowledge networks or scholarly achievements [15,16]. Citations can also demonstrate the strength of associations among researchers within the academic network [17]. However, many studies report that various external factors that may not be relevant to the quality of research can also enhance the visibility of research papers to attract more citations [18].
The scientific impacts of articles or citation counts on articles are strongly associated with journals [20]. In the management discipline, the top seven journals belonging to the first and the second quartiles accounted for more than 60% and 80% of the citations, respectively [21]. Papers published in the top journals are considered good quality because of the journal's high standards and rigorous review process to screen highly impactful articles. In other words, a journal's reputation can cue article quality. Subsequently, articles published in reputable journals tend to attract the attention of fellow researchers and have great scientific impact.
Many academic journals publish special issues dedicated to a particular topic [22,23]. Articles focusing on a unique theme may further their influence and significance if they are published in a specialty journal [24]. However, specialty journals tend to have a lower journal impact in comparison with general journals, keeping researchers from choosing specialty journals [25]. Inviting papers for a special issue in reputable journals attracts high-quality manuscripts related to the particular topic because researchers do not need to compromise the journal ranking and gain the attention of fellow researchers with similar research interests [25,26].

Article Structural Attributes
Previous studies examined the effects of article structures, such as length of article, title, and keywords on citation counts [27][28][29]. Specifically, long articles may receive more citations than short ones, especially right after they are published in a peer-reviewed journal [27,30]. Long articles are perceived to contain more information than short ones [29]. Thus, the length of an article may be associated with quality [28]. The competition for journal spaces is quite fierce, especially among top-tier journals. Hence, to be accepted by journals, lengthy papers must have good quality that is proportional to their length [28].
The length of the title, which is often measured by the number of words in the title, is another frequently adopted article structure-related factor to predict citation counts. The title length and citation counts are either negatively [30] or not significantly related [31,32]. The association between the number of keywords and the citation counts indicated that papers with many keywords were likely to receive many citations [33].

Author Attributes
Author attributes are important determinants of predicting citations, and the number of authors and citation counts are positively related [34]. The number of authors may contribute to citations because the quality of articles can be enhanced through knowledge exchange among researchers with diverse expertise or through multiple proofreadings [29].
Furthermore, an article with many authors may receive additional attention from fellow researchers through personal and formal connections with coauthors [35,36].
Previous studies evaluated whether authors' gender influenced citation counts [37,38]. The mixed results indicated no significant gender effect [39][40][41], high citations with male-authored papers [37,42,43], or high citations with female-authored papers [44,45]. The author's gender effect on citation counts was mostly dependent on the unique researcher demographics and their research interests. Hence, such an association should be tested in each discipline. A recent study in the tourism and hospitality field demonstrated a clear difference in the number of citations between articles written by male and female researchers, in which male-authored papers received more citations [32].

Reference Attributes
The number of references and citation counts are positively related [46,47]. Demonstrating the significance of the study and providing a sound literature review are key to acceptance in hospitality and tourism journals [48]. Academic papers rely on existing knowledge shared in previous studies to develop new theories or hypotheses, including key articles that are critical to improving the quality of papers [49]. Therefore, the number of references may serve as a quality indicator for scholarly articles because a high number of references may represent an extensive literature review, which subsequently attracts fellow researchers to cite the articles [50].

Maturity of Research Topics
While articles on high-demand or general topics are likely to attract more attention from fellow researchers than less popular or specific topics, topic attributes have been understated, especially in the tourism and hospitality literature. Most academic papers contain multiple research topics that have attracted researchers in various areas [9]. For example, an article testing the use of technology in the foodservice industry was mostly cited by fellow researchers with expertise in information technology or foodservice operations; thus, Antons, Joshi and Salge [46] tested the association between the concentration of research topics and citation counts. Articles with salient core topics were more visible than those with a widespread topic distribution. The study speculated that a concentrated key topic strongly connects with the existing research, such that fellow researchers can easily accept new research topics.
Li, et al. [51] proposed "the evolution of a new subject needs to build on the knowledge accumulation of relevant subjects" (p. 80). Likewise, the maturity of the topics is specific to research subtopics, resulting in different citation behaviors. Therefore, it is also necessary to consider the topic's originality, which indicates the chronological order in which a particular research topic or method is introduced to the tourism and hospitality literature [52]. Research questions have often been developed based on accumulated knowledge in the field; hence, other researchers may pay attention to recent bibliographic coupling with the newly developed framework and sophisticated methodology [53]. While Antons, Joshi and Salge [46] found that a research paper containing the original ideas received higher citations in the management literature, the opposite was true in other disciplines [54]. That is, novel ideas that differ from existing concepts or methods are challenged and not well accepted by peers and are thus rarely cited [30]. It is also possible that fundamental theory and basic research, rather than a newly developed framework, may be valued in a particular discipline because of their scientific impact sustained over the years [14].
Many systematic review studies have been conducted on hospitality and tourism to understand the progress of research topics [9,17,55,56,57,58]. Previous studies have examined the evolution of citations within a particular research topic [58,59] or identified individual research papers that received high citations [56]. More specifically, some studies have focused on bibliometrics-based studies in the field of hospitality and tourism [59][60][61]. However, empirical evidence regarding the role of research topic attributes on citation counts, including topic originality and concentration, is still lacking [52].
Previous studies examining the effects of research subtopics on citation counts adopted manual content analysis to discover research topics, which required a long time for data analysis and can be challenged in reproducing data due to subjectivity [14,19]. To overcome these limitations, this study tested generally adopted citation factors and research topic attributes discovered by text mining using hospitality and tourism articles.

Sample
We collected data and article-related metadata from the Web of Science (WoS) database. Journal Citation Reports, a tool for journal assessment supported by WoS, was used to select hospitality and tourism journals. At the time of data collection, we retrieved the most recent information on 52 SSCI journals in the category of "hospitality, leisure, sport & tourism" in 2019. Among the top 25% of journals within this category (classified as Q1), we excluded two sports-related journals and selected 12 hospitality and tourism-focused journals for the sample.
In March 2021, all full records and cited references of regular journal articles from the 12 journals were downloaded after excluding other types of documents (e.g., editorial materials, book reviews, and biographical items). Data analysis employed a 10-year time window, sampling articles published between January 2011 and the end of 2020. Regardless of the factors that this study aims to explore, articles published in the last months were excluded because they may not get enough citations. Finally, the data analysis included 9,910 papers that consisted of 735,182 cited references and 156,624 pages.

Citation Counts
Citation counts were used as a proxy for scientific impact [46]. Articles published in earlier years may have received higher citation counts than more recent papers, which is not necessarily due to their higher scientific impact but to the extended periods in which to receive citations. Therefore, the average citation counts per year were used to handle bias at the time of publication.
We took several steps to calculate adjusted citation counts. First, we recorded the publication year and month for each article. The number of years since publication was then calculated as of the data collection date, March 2021. Finally, we calculated adjusted citation counts by dividing the total citation counts by the number of years since publication.

$$\text{Adjusted citation counts} = \frac{\text{Total citation counts}}{\text{Number of years since publication}}$$
Due to many uncited articles, citation scores were highly skewed and did not conform to a normal distribution. The square root of the citation scores was used to normalize the data [20,46].
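The adjustment and square-root transformation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the elapsed time is measured in fractional years from the publication month to March 2021, a level of precision the text does not specify.

```python
import math


def years_between(pub_year, pub_month, ref_year=2021, ref_month=3):
    """Elapsed time in years from the publication month to the reference month.

    Assumption: fractional years are used; the default reference date is the
    data collection date (March 2021).
    """
    return ((ref_year - pub_year) * 12 + (ref_month - pub_month)) / 12


def adjusted_citations(total_citations, years_since_publication):
    """Average citation count per year since publication."""
    if years_since_publication <= 0:
        raise ValueError("years_since_publication must be positive")
    return total_citations / years_since_publication


def sqrt_citation_score(total_citations, pub_year, pub_month):
    """Square root of the adjusted citation count, used to reduce skewness."""
    years = years_between(pub_year, pub_month)
    return math.sqrt(adjusted_citations(total_citations, years))
```

For example, an article published in March 2017 with 36 citations has 4 years of exposure, an adjusted count of 9 citations per year, and a transformed score of 3. An uncited article keeps a score of 0.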

Journal/Article Structural/Author/Reference Attributes
The names of the 12 journals were included as a dummy variable to compare different scientific impacts among them. Another dummy variable indicated special-issue status (i.e., coded 0 for regular issues and 1 for special issues). To measure the effects of the article structure, this study accounted for the length of the article, title, and keywords. We measured article length using the number of pages and calculated the number of words in the titles and the keywords. To understand author attributes, the number of authors and that of female authors were calculated. We adopted an automated approach to match the gender of the authors to avoid the time-consuming manual identification of author names. A gender classifier based on the global name dataset was used to perform this step [62]. Finally, the total number of references for each paper was used to examine reference attributes.

Topic Attributes
This study adopted a machine learning technique for automated text mining and a structural topic model (STM) to discover the latent research subtopics (i.e., topic structure) from a vast amount of text data [63]. To build the topic model, the optimal number of topics was set to 40 based on quantitative indices (e.g., residuals and held-out likelihood). In addition, we performed a qualitative review to compare the research topics generated from the topic modeling algorithm with those derived from previous studies that investigated academic structure [56,64]. Like basic topic models such as Latent Dirichlet Allocation, STM generates two major outputs: (1) the list of top words with the highest probabilities for each topic, β, and (2) the probability of each topic within each document, θ (see Figure 1). The word-topic distribution (β) reveals the 40 most salient research topics in the dataset, and the document-topic proportions (θ) demonstrate how closely each document is related to the 40 topics. STM was implemented with text data that combined the title, keywords, and abstract of each article. Before implementing STM, the text was preprocessed by converting it to lowercase, removing non-alphabetic characters and stop words, and lemmatizing. A customized stop-word list (e.g., study, goal, and limitation) was built for data cleaning to eliminate irrelevant words. Bigrams and trigrams were built from phrases that appeared more than 10 times in the corpus. Then, we conducted topic network analysis and clustering analysis to discover the hierarchical structure of hospitality and tourism research topics [9]. After comparing multiple community detection algorithms, we used a fast-greedy algorithm to determine the membership of each topic. Topic proportions generated from STM indicate the association between journal articles and topics and are used as a proxy for topic structure.
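The preprocessing steps described above (lowercasing, removing non-alphabetic characters and stop words, and building frequent bigrams) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the stop-word lists are abbreviated and hypothetical apart from the three custom words named in the text, lemmatization is omitted because it requires an external library, and only bigrams are shown.

```python
import re
from collections import Counter

# Assumption: tiny illustrative lists; a real analysis would use a full list.
GENERAL_STOP_WORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is", "this"}
CUSTOM_STOP_WORDS = {"study", "goal", "limitation"}  # examples named in the text


def tokenize(text):
    """Lowercase, drop non-alphabetic characters, and remove stop words."""
    letters_only = re.sub(r"[^a-z\s]", " ", text.lower())
    stop_words = GENERAL_STOP_WORDS | CUSTOM_STOP_WORDS
    return [tok for tok in letters_only.split() if tok not in stop_words]


def frequent_bigrams(token_lists, min_count=10):
    """Return adjacent word pairs that appear more than min_count times."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(zip(tokens, tokens[1:]))
    return {pair for pair, n in counts.items() if n > min_count}


def merge_bigrams(tokens, bigrams):
    """Join each frequent pair into a single token such as 'sharing_economy'."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in bigrams:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

In this sketch, each document would be the combined title, keywords, and abstract of an article. Note that the regular expression also strips accented letters, which may or may not be desirable for a given corpus.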
To identify articles that are highly relevant to a particular topic, topic proportions below the cutoff value (<0.1) were replaced by zero [65].
Hospitality and tourism research tends to take a multidisciplinary approach [66]. By considering topic concentration, this study attempts to identify whether an academic article focusing on a single topic or multiple topics can receive more citations. Topic concentration was calculated using a standard Herfindahl index (HHI) with topic proportion scores [46]. The sum of topic proportions was 1; hence, we multiplied each topic proportion score by 100 before calculating the HHI.
HHI ranges from zero to 10,000. If HHI is close to 10,000, then the article has a strong focus on a single topic with an exceptionally high topic proportion with the topic. If HHI is close to zero, then the article has a diffused topic distribution, indicating that it focuses on multiple topics.
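As a brief illustration of the concentration measure, the sketch below computes the HHI as the sum of squared topic proportions after each proportion is multiplied by 100. The function name is hypothetical, and the input is assumed to be the list of topic proportions for a single article.

```python
def herfindahl_index(proportions):
    """Sum of squared topic shares, with each share scaled to a 0-100 range."""
    return sum((p * 100) ** 2 for p in proportions)


# An article devoted entirely to one topic reaches the maximum of 10,000.
single_topic = herfindahl_index([1.0, 0.0, 0.0])

# An article split evenly between two topics scores 5,000.
two_topics = herfindahl_index([0.5, 0.5])

# An article spread evenly over 40 topics scores about 250.
even_spread = herfindahl_index([1 / 40] * 40)
```

Under this formula, an even spread over all 40 topics gives a value of about 250, which marks the diffuse end of the scale when the proportions sum to 1.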
To compare the citation counts of articles on a particular topic at an early stage, we calculated its originality score. Topic originality refers to the relative order of topic discovery among articles on the same topic. For each topic, articles with topic proportions higher than the cutoff value were identified and arranged chronologically. For these articles, higher scores were assigned to articles published in earlier years, while lower scores were assigned to recent articles. The remaining articles, with topic proportions below the 0.1 cutoff, were assigned 0. Because of the many zero data points, the topic originality score was highly skewed. As such, log transformations were performed using the original topic originality score [46].
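The scoring logic can be sketched as follows. The exact scoring rule is not given in the text, so this example makes two labeled assumptions: the raw score of a qualifying article is one plus the number of qualifying articles published in later years (so the earliest article scores highest), and the log transformation is log(1 + score) so that zero scores remain defined. All names are hypothetical.

```python
import math


def topic_originality(articles, cutoff=0.1):
    """Log-transformed originality scores for the articles of ONE topic.

    articles: list of (article_id, publication_year, topic_proportion).
    Assumption: raw score = 1 + number of qualifying articles published in
    later years; articles below the cutoff receive 0.
    """
    qualifying = [(aid, year) for aid, year, prop in articles if prop >= cutoff]
    scores = {aid: 0.0 for aid, _, _ in articles}
    for aid, year in qualifying:
        later = sum(1 for _, other_year in qualifying if other_year > year)
        # Assumption: log(1 + x) keeps the zero scores defined.
        scores[aid] = math.log1p(later + 1)
    return scores


example = [
    ("a", 2011, 0.60),  # earliest qualifying article, highest score
    ("b", 2015, 0.30),
    ("c", 2020, 0.20),  # most recent qualifying article, lowest nonzero score
    ("d", 2012, 0.05),  # below the cutoff, score 0
]
scores = topic_originality(example)
```

A rank-based rule is only one reading of "higher scores for earlier articles"; a rule based on the number of years elapsed would preserve the same ordering.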

Statistical Analyses
We employed STATA 14.2 for further statistical analyses, using descriptive statistics and pairwise correlations. All variables included in the regression analysis were standardized. The skewness of the variables was then corrected. Hence, ordinary least squares regressions were applied to predict scientific impacts with structural, author, reference, and topic attributes [46].
Table 2 and Figure 1 illustrate the characteristics of the research topics discovered in this study. The dendrogram in Figure 2 depicts the hierarchical structure of the research topics and summarizes the relationships among them. The horizontal axis represents the dissimilarities between topics. A short length along the horizontal axis connecting two topics indicates that these topics have a high correlation and share high similarities, while a long length indicates a low degree of correlation between them. Eight clusters were discovered with 40 topics, including the standalone cluster (Cluster 2; Cruise). Of the eight clusters, Cluster 4 (destination marketing) was the most popular, as its subtopics accounted for 26.5% of the topic proportion. Cluster 3 (tourism planning and development), which consists of seven topics, had the second-highest topic proportion. In particular, Topic 21 (economic growth) had the highest topic proportion, meaning that much research related to this topic has been conducted.
Since the outbreak of the novel coronavirus disease (COVID-19), many researchers have shown interest in COVID-19-related issues, and many journals have launched special issues on this topic, leading to the publication of several COVID-19 studies. As COVID-19 was considered to be a major risk with detrimental effects on the industry, related papers share high similarities with previous studies examining the impact of various hazards (e.g., natural disasters or economic risk) on hospitality and tourism settings. As a result, COVID-19 research creates a topic in conjunction with previous risk studies, and the topic is labeled "risk" (see topic 15). The dendrogram shows that topic 15 (risk) is closely related to topics related to technology. This implies that much COVID-19-related research explored technology acceptance and the use of new technology during the pandemic.

Scientific Impact Prediction
We conducted multiple regression analyses to understand the association between variables and scientific impact (Table 3). The regression results of the article effects indicated that papers with longer pages (b = 0.05, p < 0.001) had a higher scientific impact, which is consistent with previous studies [30,35].
Note (Table 3): * p < 0.05, ** p < 0.01, *** p < 0.001. The specific results of topic structure and topic originality are illustrated in Table 4.
When testing the author effects, having more authors and fewer female authors was found to increase citation counts, similar to the findings of Nunkoo, Hall, Rughoobur-Seetah and Teeroovengadum [32], who found that female authors tend to receive fewer citations than male authors. Articles with a more comprehensive list of references tended to have a higher scientific impact (b = 0.17, p < 0.001). Regarding the effects of topic attributes, articles with a strong focus on key research topics tended to receive more citations (b = 0.05, p < 0.001). Several topics positively contributed to the scientific impact, and we found associations between their topic originality and scientific impact. The specific regression coefficients of topic structure and topic originality for the 39 topics are listed in Table 4. Table 4 indicates the effects of topic structure and topic originality on scientific impact. A positively significant association between topic structure and citation counts implies that research papers focusing on a particular topic may attain many citations, whereas a negative association indicates that these research topics have not gained much attention from others.
Note (Table 4): * p < 0.05, ** p < 0.01, *** p < 0.001.
The topic originality coefficients indicated whether early or recently published papers among those on a particular topic were more likely to be cited by others. If the originality coefficient is positively significant, papers that were published at an early stage and introduced the original ideas tended to be cited more. If the originality coefficient is negatively significant, recently published papers have been cited more frequently, suggesting that the topic may be regaining popularity. For instance, the training topic had an insignificant association with citation counts, implying that these research papers collectively have not been so popular; however, the significantly negative originality coefficient implies that recent training papers tend to gain more popularity.
In Cluster 3 (tourism planning and development), rural tourism (T20; p < 0.05), economic growth (T21; p < 0.01), eco-tourism (T23; p < 0.05), and spatial tourism planning (T36; p < 0.01) had significantly negative coefficients of topic originality, indicating that articles published in recent years tended to receive more citations. The following two topics in Cluster 4 (destination marketing) had significantly negative topic originality coefficients: social issues (T12; p < 0.05) and tourist perception (T13; p < 0.05). Social issues and tourist perceptions can evolve over time and are, therefore, time-sensitive topics. As a result, more recent papers may have been favored over old papers and cited more. In the case of Cluster 6 (Technology), the risk topic (T15; p < 0.05) had a significantly negative topic originality coefficient. Among research papers focusing on risk topics, COVID-19 studies account for the majority of recent studies. The originality test results, therefore, reveal that COVID-19 papers have received a lot of attention and have been frequently cited by other researchers.
The following topics had significantly positive topic originality coefficients: technology (T27; p < 0.05) and the sharing economy (T35; p < 0.05). The results indicate that pioneering papers on these topics, published at an early stage, were cited more often by other researchers than recent papers. According to Park, Chae and Kwon [9], information technology research has rapidly advanced in recent years with the development of artificial intelligence and the advent of various social media sites in the hospitality and tourism setting. In addition, sharing economy research is relatively nascent compared to historical research topics, such as human resource management or cultural studies. Thus, researchers interested in these topics seem to examine papers containing original ideas to define the concept and further develop it.

Conclusions
This study aims to discover the academic structure of the tourism and hospitality literature by identifying salient research topics and the interrelationships of these topics by analyzing research papers published in top-tier tourism and hospitality journals over the past decade with multiple automated algorithms. Our findings on the predominant research topics in tourism and hospitality research demonstrate the areas of research that many researchers are interested in. By using the machine learning approach, this study was able to capture emerging and up-to-date research topics, such as COVID-19. In addition, it performed topic network analysis to discover how these research topics are correlated to progress in the research sub-categories.
In addition, different citation patterns were examined depending on the sub-categories of research topics to understand how ideas and knowledge have been exchanged in the academic network. While the academic structure can serve as a snapshot of research maturity in the tourism and hospitality literature, citation patterns can show the path through which research has evolved. For instance, papers related to human resource management have actively cited papers specific to job conflict and organizational behavior. Although training papers published over the past decade have not been very popular, recent training papers have been frequently cited by fellow researchers, implying that this topic has been growing in popularity in recent years. Similar patterns were found in the papers on rural tourism, eco-tourism (over-tourism), social issues, and risk topics since recently published papers have been cited frequently. The risk topic has become very popular due to the growing interest of fellow researchers in the recent major health-related risk, COVID-19. Among the papers concerning technology and sharing economy topics, pioneering papers that introduced the concept to the tourism and hospitality discipline were more actively cited. These findings imply that researchers' demands and preferences in the literature may vary depending on the research topics, that is, whether they look for the latest articles reflecting the current social and industry issues or prefer fundamental theory in the original concept papers.
This study also investigated the effects of the journal, article, author, reference, and topic attributes on citation counts. Although these attributes are less relevant to the quality of research papers, they are relevant to citation counts; specifically, papers with longer pages, more references, and more authors have received more citations. A gender effect was also significant, showing that papers with more female authors were less cited. This was consistent with the findings of Nunkoo, et al. [67], who proposed that gender can be a latent factor of authorship and collaboration.
According to our findings, only a few research topics (e.g., technology and the sharing economy) represented a benefit due to their originality. This may reflect the characteristics of the hospitality and tourism industries, which are sensitive to the external environment and constantly require innovative ideas [68]. However, this can be a warning that the inflow of new ideas and methodology can be challenged and even discouraged. According to Alvesson and Sandberg [69], the incremental pressure on publishing in top-tier journals with high impact factors has forced researchers to conduct "gap-spotting research," which can be a double-edged sword. Although strong gap-spotting research can modify existing theories and fill the research gap, it may not bring fundamentally novel ideas or methodologies. Therefore, journal editors should monitor emerging and innovative topics to be accepted in the literature. The findings of this study can be used to alert academia stakeholders by simply reminding them that citation counts can be affected by mere formatting issues or the underlying power dynamics of the tourism and hospitality field. Based on the findings of this study, we propose implications for stakeholders.

Implications for Academic Scholars
The findings of topic attributes can represent predominant research topics in the tourism and hospitality literature, serving as a reference for young scholars and graduate students. Out of all tested factors, journal selection was most crucial to improving citation counts. Therefore, academic scholars need to choose the journal carefully. Journal reputation itself may enhance the scientific impact of articles. However, producing high-quality papers that meet top journals' standards is crucial. Another piece of advice is to aim at a special issue to improve the visibility of the manuscript. In addition to journal or topic attributes with the most significant impact on citation counts, factors related to article presentation (i.e., page numbers, word counts in the title, and keywords) contributed to enhancing citation counts. Although these factors may not be directly relevant to the quality of academic papers, they tended to play an important role in drawing attention such that papers can be cited more by fellow scholars [33]. Our findings highlighted the importance of having comprehensive references. The significant association between the reference counts and scientific impact may arise not necessarily because fellow researchers care about reference counts but because they can be a proxy for an in-depth literature review [47]. In the abstract, emphasizing a research topic can be beneficial to enhance citation counts.
As the demand and popularity of research topics evolve over time [9], researchers should consider the association of citation counts with topic structures and originality. This study found that popularity and citation patterns may vary across specific hospitality and tourism research topics, demonstrating research topics with high demand. In addition, journal editors may consider having a special issue related to understudied or emerging topics to encourage the submission of articles corresponding to these topics depending on the vision of the journals [33].

Implications for University Administrators
Citation counts can be a useful measure by demonstrating how knowledge is diffused in the academic network and may address the shortcomings of simply counting the number of publications [70]. However, university administrators should be careful not to be blinded by the citation counts for the sustainable growth of the hospitality and tourism literature [30]. We found various factors that may be less relevant to research quality (e.g., author and article attributes) that can drive high citations, consistent with a previous study [71]. These findings imply that fewer citations may not be due to the poor quality of the paper but to external factors, such as the format of the paper or the low popularity of the research subject. Moreover, many researchers acknowledge that the quality of papers and journals cannot be solely evaluated by citation counts because some citations can be spurious, superficial, and incorrect [72]. Hence, the aforementioned factors that influence citation counts should be considered and adjusted for when evaluating the quality of the paper and its scientific impact. As this study suggests, comparing citations of papers within a particular research subject or method can be an alternative.

Limitations and Future Studies
This study utilized WoS citation counts because of the accuracy and inclusion of influential academic journals; however, these only capture the evaluation of professional fellow researchers in the academic field. Thus, the scientific impacts measured by adjusted citations may not capture opinions from more general groups of people, such as governments, experts, and the public [27]. In addition, as this study analyzed articles from 12 top-tier hospitality and tourism journals over a decade, the inclusion of only top journals makes generalizing the current findings in the hospitality and tourism literature difficult. Moreover, it utilized automated approaches to analyze large datasets (such as the topic modeling approach and the global name dataset) to identify the structures of the articles (i.e., topical structures and the gender of the authors). Although these automated methods enable researchers to save time and generate consistent results, some inaccuracies in the results are inevitable. This study attempted to indirectly demonstrate the changes in citation counts by including topic originality but did not investigate the changes in citation count patterns over time; therefore, future studies should include more journals over a broader period.