Analyzing Research Trends in University Student Experience Based on Topic Modeling

This study aims to identify research trends in student experience in higher education by analyzing the topics around which research on university students’ experiences has been conducted. The Scopus database was searched for studies published up to 2017 containing the terms “student experience” and either “higher education” or “tertiary education” in their titles, keywords, and abstracts. After overlapping studies were excluded, a total of 1211 studies were extracted. Using the topic modeling technique, the articles were then classified into a total of 21 topics on university student experience, including “Learning with online technologies”, “Practice at the university”, and “Diversity in college”. The results of the current study indicate that it will be possible to offer various programs to support more valuable and better student experience at the university level. Thus, this study elucidates the ways in which research fields regarding student experience have been constructed and the ways in which the main research trends have changed.


Introduction
What do students experience at universities? This is one of the important research topics in the field of higher education. Research on student experience makes it possible to examine whether the education and support provided by institutions work well from the perspective not only of professors or institutions but also of students [1]. By analyzing data on student experiences, it is possible to evaluate whether student experience accords with the institutions' intentions and whether the support and systems provided by the institutions are appropriate.
Key milestones in a student's journey from entrance into an institution to graduation, such as First-Year Experience (FYE), learning engagement, career planning and practices, internships, and degree completion, have attracted educators' and researchers' attention because of their potential to foster student learning and development [2][3][4][5][6][7]. For decades, first-year student programs have been designed based on the idea that the adjustment of first-year students is closely related to their continued studies and academic success. In the USA, the National Resource Center (NRC) was established approximately 30 years ago. It provides first-year student education programs and teaching plans of various universities and lists of studies on them in a listserv format to share materials on first-year experiences and student transitions [8]. Research on how to support students with special needs, such as adult learners, soldiers, working students, students with disabilities, and international students, so that they have effective learning and adjustment experiences has also been stressed [9][10][11][12].
The research questions of the present study are as follows:
• What are the main topics of research on student experience in the field of higher education?
• How does the number of articles on university students' experience change over time?

Methods
The research procedures comprised the following phases: data collection, pre-processing, topic modeling, and validation.

Data Collection
The articles to be analyzed were selected in March 2018 by using Scopus, a citation index database provided by Elsevier. The target articles were those that included in their titles, keywords, and abstracts the terms "student(s) experience(s)" and either "higher education" or "tertiary education". Among the articles published or scheduled to be published up to 2017, only those published in English in academic journals were retained. After the researchers reviewed the titles, abstracts, and language, six articles that were duplicates or written in Spanish were excluded, leaving 1211 of the 1217 articles as objects of analysis.

Pre-Processing
The titles and keywords extracted from the 1211 articles, which were the objects of analysis, were combined into a text file. The words used in the titles and keywords were pre-processed using R, a statistical programming language. Pre-processing comprised the following steps. First, meaningless numbers and signs were removed from the text data. Second, stopwords, that is, English words that lack special meaning and do not help distinguish topics in texts, were deleted (e.g., the demonstratives "this", "these", "that", and "those"; the indefinite article "a(n)" and the definite article "the"; the verb "to be", which changes with grammatical person and tense; and auxiliary verbs, such as "have", "has", "do", and "does", that serve a grammatical function without specific meaning). Third, words that had identical meanings yet were treated as distinct by the software because of differences in word endings caused by tense or number (singular/plural) were merged. To achieve this, text mining uses a method called "stemming", in which only the roots of words are extracted. The tm package, a text mining package for R, provides a stemming function. For example, if the words "accessed" and "accessing", the simple past and present participle forms of the verb "access", each appeared once across the articles, both were converted into "access" by the stemming function, and "access" was counted as having appeared twice. Among the words remaining after stemming, those that needed to be treated as semantically identical, such as "health" and "healthy", were also combined into single words.
Finally, place names, which are unrelated to research trends, were excluded. Thus, a total of 2100 words were extracted from 1211 articles.
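The pre-processing steps described above can be sketched in a few lines. The study used R and the tm package; the sketch below uses plain Python instead, and the stopword list, the `SYNONYMS` map, and the crude suffix-stripping `stem` function are illustrative assumptions rather than the stemmer or word lists actually used.

```python
import re

# Illustrative stopword list (assumption): a small subset of a real list.
STOPWORDS = {"this", "these", "that", "those", "a", "an", "the",
             "is", "are", "was", "were", "be", "been",
             "have", "has", "do", "does", "of", "in", "and", "to"}

# Hypothetical map for words treated as semantically identical after stemming.
SYNONYMS = {"healthy": "health"}


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words such as "access" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, drop numbers and signs, remove stopwords, stem, merge synonyms."""
    tokens = re.findall(r"[a-z]+", text.lower())  # keeps letters only
    kept = [t for t in tokens if t not in STOPWORDS]
    stems = [stem(t) for t in kept]
    return [SYNONYMS.get(s, s) for s in stems]


words = preprocess("Accessed and accessing the 2 portals")
```

Here "accessed" and "accessing" both reduce to "access", so the root is counted twice, which mirrors the example given in the text.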

Topic Modeling
The LDA package of R was used to perform topic modeling, a statistical technique that infers the probability of the appearance of topics assumed to be latent in documents based on a document-term matrix [19]. The 1211 pre-processed word sets were converted into a document-term matrix. In the resulting 1211 × 2100 matrix, each row represented a single article and each column a single word. Words appearing twice or less throughout the articles were excluded from the matrix. Although no study has mathematically proven the effects of excluding words from the matrix based on a threshold for the number of times each word appears, the present study deleted words that appeared twice or less after comparisons revealed that the meanings of the topics were clearer after deletion of such words.
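As a rough illustration of the matrix construction, the following Python sketch builds a document-term matrix from tokenized documents and drops words that occur twice or less in total. The function name `build_dtm`, the `min_count` parameter, and the toy documents are assumptions made for this example; the study itself worked in R.

```python
from collections import Counter


def build_dtm(docs, min_count=3):
    """Build a document-term matrix from tokenized documents.

    Terms whose total count across all documents is below min_count
    (that is, terms appearing twice or less when min_count=3) are dropped.
    Rows correspond to documents and columns to terms.
    """
    totals = Counter(term for doc in docs for term in doc)
    vocab = sorted(t for t, n in totals.items() if n >= min_count)
    index = {t: j for j, t in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for term in doc:
            j = index.get(term)
            if j is not None:
                matrix[i][j] += 1
    return vocab, matrix


# Toy input (hypothetical): four tokenized titles.
docs = [["learn", "online", "learn"],
        ["learn", "student"],
        ["online", "student", "student"],
        ["campus"]]
vocab, matrix = build_dtm(docs)
```

In this toy input, "online" (two occurrences) and "campus" (one occurrence) fall below the threshold, so only "learn" and "student" remain as columns.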
Topic modeling is a statistical model that assumes that the collected text data represent the entire population and indirectly infers, from the collected sample data, the probability of the appearance of specific words in each topic and, through such probabilities, the distribution of topics in each paper. However, the number of topics latent in the document set cannot itself be inferred from the sample data. This is a disadvantage shared by most analytical methods based on clustering and classification, in which the number of clusters or topics must be set in advance and therefore involves the researchers' subjectivity. In general, when topic modeling is used as a research method, there are two ways to determine the number of topics: the researchers hypothesize a number of topics on the basis of prior research, or the number is determined mechanically through the concept of perplexity [20].
Since the present study focuses on discovering, in an exploratory manner and without the researchers' subjective intervention, the types of topics addressed by a large number of articles, the concept of perplexity was used to determine the number of topics. To compute perplexity, the available data are randomly divided into training data and validation data; the models fitted to the training data are applied to the validation data to test their appropriateness, and the best-performing model is selected. The concept of perplexity is often used in statistical prediction studies, such as data mining and machine learning. In general, several candidate numbers of topics are designated; perplexity is calculated for each candidate, and the number with the lowest perplexity value is judged to classify the topics of the document set best [21]. In the present study, the candidate number of topics ranged from 2 to 30. Perplexity was calculated for each candidate number of topics, and the appropriate number was judged to lie between 20 topics, where the perplexity values had decreased dramatically, and 30 topics, where there was no further improvement in perplexity values. Because the number of topics selected by perplexity does not necessarily match human cognitive judgment [22], two of the authors reviewed the topics and compared the appropriateness of their contents multiple times until an agreement was reached, in order to derive topics that were more appropriate in terms of content. As a result, the number of topics was set to 21.
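To make the perplexity criterion concrete, the following Python sketch computes perplexity as the exponential of the negative mean per-word log-likelihood on held-out documents and then picks the candidate number of topics with the lowest value. It assumes the document-topic probabilities (`theta`) and topic-word probabilities (`phi`) have already been estimated; the function names and the toy model are illustrative and do not reproduce the R workflow used in the study.

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity = exp(-(sum of log p(w | d)) / (number of words)).

    docs:  list of documents, each a list of word ids
    theta: theta[d][k] = probability of topic k in document d
    phi:   phi[k][w]   = probability of word w in topic k
    """
    log_lik, n_words = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Mixture probability of word w in document d.
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_words += 1
    return math.exp(-log_lik / n_words)


def lowest_perplexity_k(perplexity_by_k):
    """Return the candidate number of topics with the lowest perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


# Toy check: one topic that is uniform over a four-word vocabulary.
uniform = perplexity([[0, 1, 2, 3]], theta=[[1.0]], phi=[[0.25] * 4])
```

A model that spreads probability uniformly over four words has a perplexity of 4, which matches the intuition that perplexity measures how many words the model is effectively choosing among. As the text notes, the lowest-perplexity candidate is only a starting point; the authors adjusted the final number after reviewing topic contents.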

Validation
According to the results of the topic modeling calculations, the ten words with the highest probability of appearance were extracted for each topic. Topic names were determined in consideration of the probability of appearance of each word and the number of articles in which each word appeared. The topic names were derived through an agreement among the present researchers. To review the validity of the topic names assigned by the present researchers according to the keywords for each topic, eight holders of doctorates in education who are engaged in practice and research in the field of university education were asked to rate the topic names' correctness, generality, usefulness, comprehensibility, and novelty on a five-point scale, as shown in Table 1. Correctness, with a score of 3.75, had the lowest validity, while usefulness and comprehensibility, with a score of 4.13 each, had the highest validity. The reliability coefficient was found to be 0.834. Correctness had a relatively low score because the topic names did not include all ten words.

Criterion | Description | Score
Correctness | The topic names adequately describe the included words. | 3.75
Generality | The topic names cover the general research topics related to student experience in higher education. | 4.00
Usefulness | The topic names are useful to understand how research topics related to student experience in higher education are categorized. | 4.13
Comprehensibility | The topic names are easy to understand. | 4.13
Novelty | The topic names inform about the research topics related to student experience not well known before. | 4.00
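The source does not state which reliability coefficient was computed. Assuming it was Cronbach's alpha across the five criteria, the calculation could be sketched as follows in Python; the function name and the toy ratings are hypothetical, and the raters' actual scores are not available in the text.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for ratings[r][i] = score of rater r on item i.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(ratings[0])
    item_vars = [variance([row[i] for row in ratings]) for i in range(k)]
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: three raters, two perfectly consistent items.
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

With perfectly consistent items the coefficient reaches its maximum of 1; less consistent ratings yield lower values.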

Topic Analysis
Using LDA topic modeling, the number of topics was analyzed according to perplexity and their meanings, as shown in Table 2. The words were listed in descending order of probability of appearance for each topic based on the LDA algorithm. A word with a high probability of appearance may actually appear more often in articles on the topic in question, but this is not automatic. For example, even when a specific word "A" appears more often in articles on Topic 1, if most of the words that appear together with "A" belong to Topic 2, then the probability of appearance of "A" in Topic 1 becomes low. Topic modeling does not simply show the frequency of a word's appearance; it assigns each word to topics in consideration of the relationships among words and calculates the probability of appearance of each word. Consequently, when determining topic names, a word's probability of appearance, rather than its frequency of appearance, was considered first.
※ Root words with a probability of appearance of more than 10% are in bold, those with more than 5% are in italics, and those with more than 3% are underlined.
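The ranking of words by probability and the emphasis thresholds in the note (more than 10%, 5%, and 3%) can be illustrated with a short Python sketch. The function names are assumptions; the four probabilities are the Topic 1 values reported in the text, not the full topic-word distribution.

```python
def top_words(word_probs, n=10):
    """Return the n (word, probability) pairs with the highest probability."""
    return sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)[:n]


def emphasis(prob):
    """Map a probability of appearance to the emphasis level used in the note."""
    if prob > 0.10:
        return "bold"
    if prob > 0.05:
        return "italic"
    if prob > 0.03:
        return "underline"
    return "plain"


# Topic 1 probabilities as reported in the text (only four words are given).
topic_1 = {"widen": 0.044, "social": 0.127,
           "higher education": 0.185, "participate": 0.064}
ranked = [(word, emphasis(p)) for word, p in top_words(topic_1)]
```

For Topic 1 this ranks "higher education" and "social" first (bold), followed by "participate" (italic) and "widen" (underlined).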
For Topic 1, the possibility of appearance was high for the terms "higher education" and "social", amounting to 18.5% and 12.7%, respectively, followed by the words "participate" and "widen", which amounted to 6.4% and 4.4%, respectively. Out of a total of 55 articles on the expansion of higher education, those on "widen participation" amounted to 23 (42%). Likewise, other articles mostly concerned students' social relationships and diverse participation. The topic was, therefore, named "Widen participation in higher education".
As to Topic 2, the term "university" amounted to 19.9%, thus having the greatest possibility of appearance, but the words "transition(s)" and "first year" amounted to 9.9% and 8.6%, respectively. Out of a total of 45 articles, 35 (77.8%) concerned topics related to "transition(s)" and "first year". The topic was, therefore, named "First-year transitions".
In respect of Topic 3, the term "study" amounted to 16.4%, thus having the greatest possibility of appearance, followed by "identity", which amounted to 7.7%. Next were the words "doctor(s)" and "engineer(s)", which amounted to 6.1% and 4.5%, respectively. The top ten words included "woman"/"women" and "mature". Out of 45 articles, 26 (57.8%) concerned content related to identity, and many articles were on formation, maturity, and development of identity, such as the identities of doctoral students and of women students in colleges of engineering. The topic was, therefore, named "Student identity".
Regarding Topic 4, the terms "experience(s)" and "student(s)" amounted to 22.3% and 20.2%, respectively, thus showing a high possibility of appearance, and the word "percept(s)" amounted to 8.1%, thus exhibiting a comparatively high possibility of appearance. Indeed, 23 articles (53%) concerned student perceptions of institutions, learning, educational environments, and values based on surveys. The topic was, therefore, named "Student perceptions".
With regard to Topic 5, the term "develop" amounted to 14.0%, thus showing the highest possibility of appearance, with "curriculum", "design", and "evaluate" amounting to 8.3%, 6.9%, and 5.3%, respectively. The contents of 42 out of 51 articles (82.4%), or most of the articles, concerned the design or evaluation of educational curricula and experiences. Here, educational curricula did not simply mean formal curricula but encompassed informal curricula as well. The topic was, therefore, named "Curriculum development".
Concerning Topic 6, the term "education" amounted to 32.5%, thus showing the greatest possibility of appearance, but "nurse(s)", "teacher(s)", and "clinic(s)" amounted to 8.8%, 4.7%, and 3.5%, respectively, thus displaying a fair degree of occurrence. Amounting to 2.9%, the words "placement" and "profession(s)" were not low in the possibility of appearance either. Out of a total of 46 articles, 34 (73.9%) concerned professional development for occupations, such as nurses, teachers, and medical practitioners, as well as related education, employment, or assignment. The topic was, therefore, named "Professional education".
As for Topic 7, the term "teach" amounted to 20.2%, thus showing a high possibility of appearance, and "research" and "undergraduate", likewise, amounted to 14.0% and 11.6%, respectively. Though different from the top three words, "pedagogy", too, exhibited a possibility of appearance amounting to 4.9%. Out of 77 articles, 41 (53.2%) concerned contents related to "teach", and 31 articles (40.2%) concerned contents related to "research". When examined, the articles mostly concerned surveys on students' experiences with diverse ways of teaching at universities, teaching with research, or undergraduate research. The topic was, therefore, named "Teaching research".
In relation to Topic 8, the words "assess" and "feedback" amounted to 16.0% and 10.2%, respectively, thus showing the highest possibility of appearance. Ranking next were the terms "enhance" and "peer(s)", which amounted to 5.0% and 4.0%, respectively. Out of a total of 61 articles, 45 (74%) concerned contents related to "assess" or "feedback". The topic was, therefore, named "Assessment and feedback".
With respect to Topic 9, the words "learn" and "online" amounted to 18.5% and 11.7%, respectively, thus showing the highest possibility of appearance; "technology", too, was high, amounting to 8.4%. Even words with a possibility of appearance amounting to 3-5% were mostly related to long-distance education, such as "classroom(s)", "distance(s)", "communicate", and "virtual". Terms directly related to online education, such as "online", "technology"/"technologies", "distance(s)", and "virtual", appeared in 75 out of a total of 96 articles, thus exceeding 78%. The topic was, therefore, named "Learning with online technologies".
As to Topic 10, the term "college(s)" amounted to 11.3%, thus showing the highest possibility of appearance. Next came "community"/"communities" and "educate", both of which amounted to 8.3%, and the word "diverse" amounted to 6.5%, as well. "Campus(es)" and "program", too, amounted to 4% in the possibility of appearance. While some of the articles in this field concerned community colleges, numerous others concerned diverse students and local communities, students with different characteristics and demands, and race, gender, and disability. The topic was, therefore, named "Diversity in college".
With reference to Topic 11, the terms "international" and "higher education" amounted to 21.7% and 18.9%, respectively, thus showing a high possibility of appearance, followed by "culture(s)", which amounted to 8.6%. Indeed, 56 of the articles (80%) concerned contents related to "international", and eight out of the 14 articles on other topics were related to the word "culture". The topic was, therefore, named "Internationalization of higher education".
Concerning Topic 12, while the word "student(s)" amounted to 25.6%, thus showing a high possibility of appearance, "English", "language(s)", and "university"/"universities" amounted to 3.8%, and "life" amounted to 3.1%, thus demonstrating a fair degree of occurrence. Out of a total of 39 articles, 16 (41%) concerned "language(s)", "English" in particular. Many of them concerned communication for daily life or learning at universities. These articles often concerned international students or immigrants using English as a second language. The topic was, therefore, named "Language proficiency".
Regarding Topic 13, the word "student(s)" amounted to 34.9%, thus showing a very high possibility of appearance. Out of a total of 40 articles, 18 (45%) concerned activities in which students participated directly and actively. The topic was, therefore, named "Active student experience".
As to Topic 15, the words "student(s)" and "experience(s)" amounted to 21.6% and 18.3%, respectively, thus showing a very high possibility of appearance. The terms "context(s)" and "learner(s)" amounted to 3-5%, thus exhibiting differences in the possibility of appearance. However, 38 out of 75 articles were on contents related to "learning" or "learner(s)", thus mostly concerning student experience related to learning. Specifically, many articles concerned teaching and learning, and others concerned learners' reactions in specific contexts as well. The topic was, therefore, named "Student experience in learning".
For Topic 16, the word "work" amounted to 10.8%, thus showing the greatest possibility of appearance, and "employ" amounted to 5.4%, thus being comparatively high in the possibility of appearance. Out of 46 articles, 36 (78.3%) concerned the term "work" or "employ". The topic was, therefore, named "Work and employment".
With regard to Topic 17, the terms "academy"/"academies" and "student(s)" amounted to 17.0% and 16.4%, respectively, thus showing a high possibility of appearance. These were followed by "support" and "experience(s)", which amounted to 6.9% and 4.4%, respectively. Out of a total of 55 articles, 41 (74.5%) concerned academic activities, and most of the articles concerned support for such activities or students' experiences related to academic activities. The topic was, therefore, named "Academic support for students".
With respect to Topic 18, the terms "higher education" and "student(s)" amounted to 23.7% and 20.4%, respectively, thus showing a very high possibility of appearance. Next came "policy"/"policies" and "business", whose possibility of appearance amounted to 4.7% and 3.6%, respectively. However, over 18 articles (40%) concerned topics related to "policy"/"policies" or "business". In addition, there were articles concerning external "impact(s)" or public "interest". The topic was, therefore, named "Higher education policy and business".
Regarding Topic 19, the word "learn" amounted to 45.9%, thus showing the highest possibility of appearance, and was used in all of the articles. Ranking next in the possibility of appearance was "base" (8.4%) as learning strategies were mainly mentioned in the form of the term "-based learning". Words "environment(s)" and "collaborate" amounted to 6.6% and 5.6%, respectively. All of the articles were related either to designing learning according to collaborative learning, independent learning, or particular learning strategies or constructing learning environments for such strategies. The topic was, therefore, named "Learning environments".
As for Topic 20, the term "practice" amounted to 13.4%, thus showing the greatest possibility of appearance, followed by "university"/"universities", which amounted to 7.3%. The possibility of appearance amounted to 3-5% for the words "health", "student(s)", "role(s)", "disable(d)", "perspective(s)", "theory"/"theories", and "effect(s)". Out of a total of 91 articles, 65 (71.4%) used words related to students. Numerous articles concerned theory and practice regarding certain contents at universities, introduced best practices, or were about diverse forms of university students' practical training. The topic was, therefore, named "Practice at the university".

Comparison of the Number of Articles Per Topic
The 1211 articles were unevenly distributed across the topics, ranging from 32 articles (Topic 14) to 96 articles (Topic 9). While over 90 articles were on topics such as "Learning with online technologies" (Topic 9) and "Practice at the university" (Topic 20), articles on topics such as "Student engagement and outcome" (Topic 14) and "Language proficiency" (Topic 12) amounted to fewer than 40.
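The text does not state how each article was assigned to a topic. One common approach, assumed here purely for illustration, is to assign every article to its most probable topic and then count articles per topic. The Python sketch below uses hypothetical function names and a toy document-topic matrix.

```python
from collections import Counter


def dominant_topic(doc_topic_probs):
    """1-based index of the most probable topic for one article."""
    best = max(range(len(doc_topic_probs)), key=doc_topic_probs.__getitem__)
    return best + 1


def articles_per_topic(theta):
    """Count articles by dominant topic; theta[d][k] is the topic share."""
    return Counter(dominant_topic(row) for row in theta)


# Toy document-topic matrix (hypothetical): three articles, three topics.
theta = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.6, 0.3, 0.1]]
counts = articles_per_topic(theta)
```

In the toy matrix, two articles fall under Topic 1 and one under Topic 2; applied to the full matrix, the same counting would give a per-topic distribution of the kind reported above.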

Limitations
This study has two main limitations. First, it collected and analyzed only the titles and keywords of a limited number of articles. Considering that topic modeling is based on probability, it is necessary to expand the number of related papers and to include more of their content to ensure the accuracy and appropriateness of the topics.
In addition, this study did not address how the hierarchical relationship of each topic can be structured, and how the university support for expanding student experience can be connected to each topic. Thus, a categorization scheme of the topics of higher education student experience to explore student support by the university also remains for further research.

Discussion and Conclusions
Interest in "student experience" has increased with the spread of the perception that all the experiences through which students adjust to university life beyond classroom lessons, interact with diverse people, participate in various educational activities, and prepare for life after graduation are educational achievements that support their development and growth. Student experience is also important for improving university education because it reveals whether the tangible and intangible educational programs provided by universities are properly implemented and, at the same time, how students perceive such programs and what they achieve through them. To analyze trends in research on student experience at universities in terms of topics, the present study analyzed 1211 articles on student experience in the field of higher education by using topic modeling.
In the present study, 21 topics regarding student experience were defined. According to perplexity calculations in topic modeling, the number of appropriate topics amounted to 20 or more, thus showing the considerable span of research fields regarding student experience. When the contents of the topics are examined, they can be classified into first-year students' experiences with adjustment or participation in education (Topic 1, Topic 2, and Topic 3), experience related to classes or learning (Topic 5, Topic 7, Topic 8, Topic 9, Topic 15, and Topic 17), experience of exchange with other groups (Topic 10, Topic 11, and Topic 12), perceptions and achievements of students' activities and participation (Topic 4, Topic 13, and Topic 14), experience related to careers and employment (Topic 6 and Topic 16), and universities' environments and systems supporting student experience (Topic 18, Topic 19, Topic 20, and Topic 21). These can be seen as more comprehensive than the six fields mentioned by Tight [1].
When quantitative changes in articles were examined per period, articles published before or in 2000 amounted to 59, or only 4.9% of the total, but subsequent increase rates were very high, with 276 articles during 2001-2010 (22.8%) and 876 articles during 2011-2017 (72.3%). The dramatic increase in research on student experience in recent years is related to the popularization of higher education, which was limited to the elite in the past. As the thresholds of universities have lowered, diverse types of students, including women, people of African descent, immigrants, international students, working students, and senior citizens, have entered universities, thus diversifying students' educational needs. Universities have developed educational programs tailored to various learners and provide education on diverse levels, ranging from online education using technology to highly professional education based on research. In addition, the expansion of opportunities for participation, first-year transition programs, education for the establishment of identity, such as a sense of belonging and autonomy, and language support, have been provided so that learners of various types may appropriately adjust to and be mixed into the university and graduate program without dropping out. In addition, universities have continued investigations and information collection to maintain such students' learning achievements and the quality of education, such as satisfaction with services, and, through such efforts, have improved their environments and systems.
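The period shares quoted above follow directly from the article counts. A minimal Python check, using only the three counts given in the text (the variable names are illustrative):

```python
# Article counts per period as reported in the text.
counts = {"up to 2000": 59, "2001-2010": 276, "2011-2017": 876}

total = sum(counts.values())  # should equal the 1211 analyzed articles
shares = {period: round(100 * n / total, 1) for period, n in counts.items()}
```

The counts sum to 1211, and the rounded shares reproduce the 4.9%, 22.8%, and 72.3% reported in the text.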
The present study has significant implications in that it confirmed the value and importance of topic modeling in the analysis of student experience and identified ways to improve the student experience based on the topic modeling results. Applying topic modeling to student experience research is valuable and useful for efficiently analyzing various documents related to student experience, minimizing researchers' subjectivity, and grasping hidden intellectual structures objectively. Furthermore, the increase in the number of distance-learning college students, the increase in learners of various types and backgrounds, the expansion of educational opportunities to resolve social inequality, the increase in various interactions using social media, and the industrialization of universities have led to fierce competition among universities to improve their educational quality [23]. The present analysis is therefore important in that it provides a direction for improving the student experience, which is both a challenge and an opportunity for student support at the university level. Compared to previous studies, the results of this study show that the number of topics derived from topic modeling of student experience has increased and that the growing importance of online technology-based learning and practice at the university has become apparent. This suggests that, to improve the university student experience, universities must provide not only a variety of student support programs tailored to the characteristics of individual students but also more diverse methods to facilitate learning with online technology and experience-based practice.
Through the results of this current study, it will be possible to devise various ways of supporting students at the university level to improve individual student experiences. Since such student support is a prerequisite for the success of university education [24,25], systematic development of student support programs and systems that can have a positive effect on improving the student experience is needed along with university-level efforts. In addition, knowing how the student experience is specifically categorized and which specific parts are relatively emphasized in each university can help educators and stakeholders determine which experiences they want to focus on to drive university students' success.