Topic Modeling and Sentiment Analysis of Online Review for Airlines

This study conducts topic modeling and sentiment analysis on reviews posted on Skytrax (airlinequality.com), a site that attracts wide interest and participation from people who have flown with an airline or are planning to. People gather at Skytrax to make better choices based on the actual experiences of other airline customers. We collected online reviews written by customers who had flown with airlines in Asia: more than 14,000 reviews covering 27 airlines. Topic modeling and sentiment analysis were applied to the collected data to identify the important words in the reviews. The frequency analysis and topic modeling showed that ‘seat’, ‘service’, and ‘meal’ were significant issues in flight. The sentiment analysis revealed that delay was the main issue driving customer dissatisfaction, while ‘staff service’, which appeared together with meal and food in the topic modeling, was associated with customer satisfaction.


Introduction
The global airline industry faces increased competition between airlines and regions due to the expansion of open skies agreements, private participation in airline and airport operations, and inter-airline partnerships and mergers [1]. To survive this competition, airlines continually work to improve service quality. Just as the product quality revolution determines a company's competitiveness in manufacturing, service quality innovation determines whether a company wins or loses in the service sector [2]. Improving service quality is also perceived as a means of securing a competitive advantage through loyal customers. However, airlines are often unaware of their customers' needs, which compromises the quality of service they provide. Customer needs have therefore consistently attracted attention from scholars as a fundamental variable in customer service delivery, and identifying those needs and providing the right quality of service should matter even more to airlines [3,4].
The development of Internet technology is changing the hospitality industry, which puts customer needs first. Over the Internet, customers can communicate with businesses and share information with one another [5]. In particular, online reviews influence the decisions of potential customers more than corporate marketing activities do, because reviews left by experienced customers are regarded as objective and reliable information [6]. Many hospitality businesses therefore use online reviews as a marketing tool. The Internet has also made it easy for the hospitality industry to access customer opinions and build relationships with customers [7]. These changes are bringing significant change to the airline industry as well.
Surveys have the advantage of yielding direct answers to the questions asked. However, they are subject to measurement errors arising from the survey format, terminology, response categories, and question order [8]. Much prior research has relied on structured, quantitative survey methods, and the limitations of these methods call for changes in research methodology [9]. In the airline industry in particular, online reviews and comments strongly influence customers' purchase intentions, so sentiment analysis and topic modeling are drawing attention as new methods for understanding consumers' in-depth thoughts [10]. Topic modeling infers the topics latent in text data from the observed words and derives hidden meanings in order to understand overall trends; it has been widely used in recent analyses of research trends. Sentiment analysis, also called opinion mining, is a branch of text mining: a natural language processing method that identifies the positive or negative opinions people express in text data [11].
Recently, sentiment analysis has been applied in studies of general products, politics, and society, as well as in service industries such as movies, tourism, restaurants, and hotels, where the importance of online reviews is emphasized [12][13][14][15][16][17][18]. Airline-related research likewise emphasizes the importance of big data analysis, but sentiment analysis of customers' online reviews is still at an early stage. This study therefore conducted a sentiment analysis to identify the meaning of the positive and negative reviews posted online by customers of airlines in Asia. The purpose is to derive meaningful implications by analyzing what satisfies and dissatisfies customers. The study also seeks to assist airlines in their management activities and decision-making, and to provide key data and analysis methods that are useful for their management.
This paper is organized as follows. After the introduction, the literature review presents previous work on Asian airlines, online customer reviews, topic modeling, and sentiment analysis using big data. The methodology section explains the data collection and the data analysis with big data analytics. The results are presented in four parts: frequency analysis, word cloud, topic modeling, and sentiment analysis. The final section summarizes the study and its implications for academia and practice, and discusses the limitations and directions for future research.

Asian Airlines
Asia-Pacific airlines are an important collective force in the international aviation market, accounting for a quarter of total global air passenger demand and two-fifths of global air cargo demand [19]. Boeing predicts that Asia will account for half of the growth in global aviation demand [20]. According to IATA [21], Asia-Pacific airlines have earned about half the profits of the global aviation industry: about USD 10 billion out of USD 18 billion in 2010, and USD 2.1 billion out of USD 4 billion in 2011, when rising oil prices seriously weighed down the industry's profitability. Asian airlines, whose global influence and importance as a group are growing, are expected to play an active role in shaping the future global air transport industry.
The Asia-Pacific region is already the world's largest aviation market. Existing airlines in Asia are creating many new airlines to reach different market segments. Some of these efforts take the form of joint ventures between traditional airlines and new airlines, which combine their respective strengths to gain new access to the international market.
While Asia-Pacific aviation demand continues to grow, fierce competition is stifling airline revenue generation as excess supply floods the market [22]. As Asia-Pacific airlines add capacity, they create a very challenging business environment that puts downward pressure on prices and profits. As a result, revenue growth at many airlines in the region is flat and profitability is hard to achieve. This research was conducted to identify the sources of customer satisfaction and dissatisfaction and, on that basis, to present marketing implications that can help airlines survive in the highly competitive Asian aviation market.

Online Reviews
Online reviews are a form of electronic word of mouth (eWOM): customers who have experienced a service freely describe it or assign it a score, and this information is perceived as more reliable and objective than information provided by companies. This is why online reviews play a major role in shaping customers' image of a service before they experience it [23,24]. Unlike traditional reviews, online reviews are accessible 24 h a day and allow information to be stored continuously as text or images [25]. Their reach is broader and their information spreads faster. Because they can be written and edited without constraints of time or place, online reviews have a significant impact on prospective customers [26]. Through online communities, customers share reviews of the services they have used with others, regardless of commercial interest. Unlike one-sided advertising, reviews also sway customers' choices because they convey customers' genuine opinions and specific experiences [27]. Information on airline services is actively exchanged through online reviews as well, because airline services are difficult to assess before the experience, are not easy to evaluate objectively, and are relatively high-priced, which calls for careful choice [28]. This has increased information gathering and ticket purchasing over the Internet, and airlines have recently been building promotions and marketing strategies for mobile phones to compete with other airlines [29].
Many scholars have recognized the importance of online reviews and used them in research. Dellarocas et al. [30] demonstrated that online review metrics can accurately predict movie revenue. Sotiriadis and van Zyl [31] found that online reviews and recommendations affect tourists' decision making about tourism services, and that WOM has a significant impact on subjective norms and attitudes towards an airline and on a customer's willingness to recommend it. Siering et al. [32] investigated whether user-generated content in the form of online reviews can be leveraged to explain and predict the recommendation decision, and found that sentiment related to different service aspects also significantly influences that decision. Gutierrez and Alsharif [33] investigated a tweet mining approach to detecting critical events. Because the airline industry is highly competitive, online reviews are therefore a useful resource for airlines that want to understand their diverse customers and devise service improvement strategies.

Topic Modeling and Sentiment Analysis Using Big Data
Latent Dirichlet allocation (LDA) is the most popular topic model and is a method for analyzing large sets of documents. The basic idea is that each document is represented as a distribution over topics, and each topic is characterized by a distribution over words. Here $p(z \mid d_i)$ is the topic probability distribution for document $i$, and $p(w \mid z_{i,j})$ is the word probability distribution for the topic assigned to the $j$-th word in document $i$. Given these distributions, LDA generates a new document through the following process. For each word position $j$ in document $i$:

1. Choose a topic $z_{i,j} \sim \mathrm{Multinomial}(p(z \mid d_i))$.
2. Choose a word $w_{i,j} \sim \mathrm{Multinomial}(p(w \mid z_{i,j}))$.

Depending on what is read from the text, text data analysis can be broadly divided into two categories: topic modeling and sentiment analysis. Topic modeling refers to a family of techniques that identify what a text is about, and sentiment analysis refers to a family of techniques that identify the emotions or sentiments expressed in the text [34]. Topic modeling extracts and suggests potentially meaningful topics from a large number of documents based on a generative probability model [35]. Many studies have applied topic modeling to various types of unstructured text documents, including SNS posts and online reviews. Table 1 lists studies using topic modeling.

Table 1. Studies using topic modeling.
Year: 2020. Title: Text mining approach to explore dimensions of airline customer satisfaction using online customer reviews. Summary: 55,000 reviews covering 400 airlines and passengers from 170 countries were analyzed using a latent Dirichlet allocation (LDA) model, and 27 dimensions of satisfaction were identified.
Title: Comparisons of service quality perceptions between full service carriers and low cost carriers in airline travel. Summary: This paper proposed a new topic modeling method called Word2vec-based Latent Semantic Analysis to perform an annual trend analysis of blockchain research by country and time for 231 abstracts of blockchain-related papers published over the past five years.
Author: Sun, L.; Yin, Y. [40]. Year: 2017. Title: Discovering themes and trends in transportation research using topic modeling. Summary: This paper applied an LDA model to article abstracts to infer 50 key topics. The characterized topics are shown to be both representative and meaningful, mostly corresponding to established subfields in transportation research.

Sentiment analysis, also called opinion mining, is a text mining analysis that extracts consumers' emotions, opinions, and attitudes. With the recent development of Internet media such as SNS, e-commerce, and online communities, text data containing subjective elements is flooding online [41]. As a result, the importance of sentiment analysis has been highlighted, as user sentiment extracted from text data is actively used in corporate marketing [42,43]. Sentiment analysis builds a sentiment dictionary consisting of emotional words and polarities indicating how positive or negative each word is, and quantifies emotions using this dictionary. A well-constructed word set of this kind is essential for accurate sentiment analysis results. Liu [11] built the Opinion Lexicon, an English-language dictionary for sentiment analysis containing 2006 positive and 4783 negative words collected over many years. In addition, Wiebe et al. [44] created the MPQA sentiment dictionary, which finely defines emotion and sensitivity according to the purpose of the emotional vocabulary appearing in about 10,000 sentences. Esuli and Sebastiani [45] developed SentiWordNet from the existing WordNet synonym sets, using semi-supervised classification to distinguish three levels of sentiment: positive, neutral, and negative.
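The two-step generative process of LDA described in this section can be illustrated with a short sketch. It is written in Python for illustration only (the study itself used R), and the topic names, vocabularies, and probabilities are invented toy values, not estimates from the review data.

```python
import random

def generate_document(doc_topic_probs, topic_word_probs, n_words, rng):
    """Sample (topic, word) pairs following the LDA generative process."""
    topics = list(doc_topic_probs)
    pairs = []
    for _ in range(n_words):
        # Step 1: draw a topic z_{i,j} from the document's topic distribution.
        z = rng.choices(topics, weights=[doc_topic_probs[t] for t in topics])[0]
        # Step 2: draw a word w_{i,j} from the word distribution of that topic.
        vocab = list(topic_word_probs[z])
        w = rng.choices(vocab, weights=[topic_word_probs[z][v] for v in vocab])[0]
        pairs.append((z, w))
    return pairs

# Hypothetical toy distributions (assumptions for illustration only).
doc_topic_probs = {"seat": 0.7, "meal": 0.3}
topic_word_probs = {
    "seat": {"seat": 0.5, "legroom": 0.3, "comfortable": 0.2},
    "meal": {"food": 0.6, "dessert": 0.4},
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = generate_document(doc_topic_probs, topic_word_probs, 8, rng)
print(sample)
```

Fitting LDA runs this process in reverse: given only the observed words, it estimates the document-topic and topic-word distributions that most plausibly generated them.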

Materials and Methods
Online review data posted on Skytrax (airlinequality.com) [46], the world's largest airport and airline service assessment site, were collected for the empirical analysis in this study. The work explores latent meaning through various text mining techniques, focusing on the online reviews left on Skytrax by customers with flying experience. Skytrax publishes online reviews that are well known in the airline industry for their reliability and recognition, which supports objectivity in evaluating airline service quality. Skytrax's annual Airline Customer Satisfaction Survey was used to select the targets for data collection: among the World's Top 100 Airlines selected through that survey, the airlines based in Asia were chosen for this study.
The collected data were analyzed in R, an open source program. The topic modeling procedure consists of three broad stages. First, the data to which topic modeling will be applied are collected. Second, preprocessing and morphological analysis transform the collected unstructured data into a form suitable for topic modeling. The last stage is data analysis: a frequency analysis of the words derived from the morphological analysis was conducted and visualized as a word cloud [33], and topic modeling and sentiment analysis were performed after converting the unstructured text into a structured form, the Document-Term Matrix (DTM). The detailed research procedure is shown in Figure 1 and comprises three parts: data collection, text mining, and data analysis.

The tm library was used for preprocessing. The collected online review data were organized into Excel files, and the text alone was converted to PDF files. A library for reading PDF files was installed, and the text was assigned to an object named 'asia_text' and preprocessed. The packages used in the preprocessing and pre-analysis steps were stringr, tm, tidytext, tidyverse, and dplyr. First, runs of more than one consecutive whitespace character were replaced with a single blank. Because upper- and lower-case forms of the same English word would otherwise be counted separately, all text was converted to lower case. Meaningless numbers, punctuation marks, and special characters were then removed. Finally, stopwords were removed. R offers two convenient English stopword dictionaries, 'en' and 'SMART'; 'en' contains 174 words and 'SMART' contains 571 words [47]. The larger 'SMART' dictionary was therefore used in this study.
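The preprocessing steps listed above (case folding, removal of numbers and punctuation, whitespace normalization, and stopword removal) can be sketched as follows. This is a Python illustration under stated assumptions, not the R/tm code used in the study; the function name `preprocess` and the tiny stopword set are hypothetical stand-ins for the tm functions and the 571-word 'SMART' dictionary.

```python
import re

# Hypothetical miniature stopword list; the study used the 571-word 'SMART' list.
STOPWORDS = {"the", "was", "and", "a", "on", "to", "of", "were", "by"}

def preprocess(text, stopwords=STOPWORDS):
    """Return the cleaned tokens of one review."""
    text = text.lower()                        # unify upper and lower case
    text = re.sub(r"\d+", " ", text)           # drop number expressions
    text = re.sub(r"[^a-z\s]", " ", text)      # drop punctuation and special characters
    # Whitespace is collapsed last so that gaps left by removed characters
    # are cleaned up as well.
    text = re.sub(r"\s+", " ", text).strip()
    return [tok for tok in text.split() if tok not in stopwords]

tokens = preprocess("The  flight was DELAYED by 3 hours, and the seat was OK!")
print(tokens)  # ['flight', 'delayed', 'hours', 'seat', 'ok']
```

Lower-casing before stopword removal matters here: a capitalized "The" would otherwise slip past a lower-case stopword list.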
In the data analysis phase, the preprocessed data must first be transformed into a data structure suitable for topic modeling. To this end, the data were converted into a document-by-word matrix using the 'DTM' function. The final step is the text mining analysis, in which we performed a morphological analysis of the refined documents from which stopwords had already been removed. Specifically, tokenization with stemming was used so that inflected forms of a word are treated as one word. Morphological analysis is the basic analysis of words into morphemes, the smallest units of meaning. It draws on grammatical rules or part-of-speech (POS) tagging, named entity recognition, spelling correction, and word identification techniques. The combination of morphological analysis commands used in this study is shown in Figure 2. Only the top 100 words were derived through frequency analysis, and a word cloud was drawn from these top 100 words by frequency rank. For word cloud visualization, the 'wordcloud' and 'RColorBrewer' libraries were used. We then conducted a topic modeling analysis on the structurally transformed documents, using statistical text processing techniques to estimate the probability of occurrence of the topics assumed to be latent in the entire corpus. In this work, we used the most widely known topic modeling method, the latent Dirichlet allocation (LDA) model. The sentiment analysis derived positive and negative words from the emotional words that customers used across the entire corpus.
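As an illustration of the document-term matrix and the top-word frequency ranking described above, the following Python sketch builds a small matrix from already tokenized reviews. The helper names `document_term_matrix` and `top_terms` and the three toy reviews are assumptions made for this example; the study itself used R.

```python
from collections import Counter

def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] counts vocabulary[j] in docs[i]."""
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return vocab, rows

def top_terms(vocab, rows, n):
    """Rank terms by total corpus frequency (ties broken alphabetically)."""
    totals = [sum(col) for col in zip(*rows)]
    ranked = sorted(zip(vocab, totals), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]

# Hypothetical tokenized reviews, not taken from the Skytrax data.
docs = [
    ["seat", "comfortable", "crew", "friendly"],
    ["flight", "delayed", "seat", "narrow"],
    ["crew", "friendly", "food", "seat"],
]
vocab, dtm = document_term_matrix(docs)
top = top_terms(vocab, dtm, 3)
print(top)  # [('seat', 3), ('crew', 2), ('friendly', 2)]
```

The same ranking with n = 100 would give the kind of word list that feeds a word cloud, where each word is drawn at a size proportional to its count.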

Word Cloud
Once word frequencies have been calculated from the preprocessed documents, they can be used for a variety of visualizations. Visualization is a widely used way of representing the subject and characteristics of a document more intuitively. A word cloud does this by drawing each word in the corpus at a size proportional to its frequency [48]. It is a technique for visualizing the top words, with the advantage that keywords that are hard to pick out at a glance in a table become immediately visible. Figure 3 shows the resulting word cloud. In a word cloud, large keywords can be judged to be important or meaningful because they are mentioned often in the reviews. The word cloud of the Asian airline reviews shows that, apart from 'flight', the words 'seat', 'service', 'airline', 'food', and 'time' are the most frequently mentioned. These words indicate what matters most across the reviews as a whole and suggest that customers value an airline's brand, service, in-flight meals, and speed when choosing an airline.

Topic Modeling
To find the key topics in the corpus, all of the English online reviews written by customers of Asian airlines were classified by subject, and the researchers judged that six topics were the most appropriate. In topic modeling, the researcher repeats the analysis with different numbers of topics and selects the number that yields the most descriptive grouping. The number of topics is therefore the one that the researchers believe best describes the entire corpus.
The results of the topic modeling analysis are shown in Figure 4. Among the words shown in the graph for each of the six topics, those with high beta values are the most important to that topic, and the researchers assigned each topic a name that describes its keywords. Topic 1 was represented by 11 keywords and was named 'In-flight meal', as explained by words such as 'flight', 'poor', and 'dessert'. Topic 2 was named 'Entertainment', with keywords such as 'good', 'great', and 'entertainment' that describe the experience inside the plane. Topic 3 was named 'Seat class', with important keywords such as 'business', 'class', 'upgrade', and 'economy' that indicate the class of seat on board. Topic 4 was named 'Seat comfort', with keywords such as 'seat', 'plastic', 'business', 'bed', 'space', and 'comfortable'. Topic 5 was named 'Singapore Airlines' because the keyword 'singapore', together with 'airlines', 'flight', and 'service', had markedly larger beta values than the other words. Finally, topic 6 was named 'Staff service' because of the high importance values of words such as 'cabin', 'crew', and 'food'. In total, the topics latent in the full text of the Asian airline online reviews were grouped into six topics. The results suggest that 'In-flight meal', 'Entertainment', 'Seat class', 'Seat comfort', and 'Staff service' are the factors most likely to affect the purchase intentions of customers with experience of Asian airlines, and that these customers show particular interest in 'Singapore Airlines'.
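Naming a topic from its highest-beta words, as described above, amounts to sorting each topic's word probabilities and inspecting the top few. The Python sketch below shows that step only; the function name `top_terms_by_beta` and the beta values are hypothetical placeholders, not the estimates behind Figure 4.

```python
def top_terms_by_beta(beta, k):
    """For each topic, return the k words with the largest beta (word probability)."""
    return {
        topic: sorted(word_probs, key=lambda w: (-word_probs[w], w))[:k]
        for topic, word_probs in beta.items()
    }

# Hypothetical beta values for two topics (illustrative numbers only).
beta = {
    "topic_3": {"business": 0.09, "class": 0.08, "economy": 0.05,
                "upgrade": 0.04, "flight": 0.02},
    "topic_6": {"crew": 0.10, "cabin": 0.07, "food": 0.06, "flight": 0.03},
}
labels = top_terms_by_beta(beta, 3)
print(labels)
```

The researcher then reads the top words for each topic and chooses a descriptive name, which is a judgment call rather than an output of the model.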

Sentiment Analysis
Sentiment analysis refers to techniques that classify or quantify the emotions in text and turn them into objective information [49]. Humans use language to communicate their thoughts and feelings. Whereas topic modeling, introduced earlier, is a text mining technique that identifies the "target covered by text", sentiment analysis is a text mining technique that estimates the "attitude contained in text". Just as topic modeling extracts the words that embody the topics assumed to be inherent in the text and estimates those topics, sentiment analysis estimates the feelings inherent in the text [11].
The R program used for sentiment analysis in this study is a globally accepted tool and can analyze various languages, although the packages used differ by language. For English, publicly available sentiment dictionaries and stopword dictionaries exist. Among the various sentiment dictionaries, this study used the Opinion Lexicon, which classifies emotions as positive or negative, together with the 'tidytext' library for sentiment analysis of English text in R. The analysis yielded 16 words indicating positive and negative sentiment.
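The dictionary-based approach described above reduces to looking up each token in a polarity lexicon and counting the matches. The following Python sketch illustrates this under stated assumptions: the eight-word `LEXICON` is a hypothetical miniature stand-in for the Opinion Lexicon, and `sentiment_word_counts` is an illustrative helper, not a function of the tidytext package used in the study.

```python
from collections import Counter

# Hypothetical miniature polarity lexicon standing in for the Opinion Lexicon.
LEXICON = {
    "good": "positive", "great": "positive",
    "delicious": "positive", "clean": "positive",
    "poor": "negative", "bad": "negative",
    "delayed": "negative", "problem": "negative",
}

def sentiment_word_counts(tokens, lexicon=LEXICON):
    """Count how often each positive and negative lexicon word occurs."""
    counts = {"positive": Counter(), "negative": Counter()}
    for tok in tokens:
        polarity = lexicon.get(tok)
        if polarity is not None:  # words outside the lexicon are ignored
            counts[polarity][tok] += 1
    return counts

# Hypothetical preprocessed tokens from a few reviews.
tokens = ["flight", "delayed", "crew", "great", "food",
          "delicious", "seat", "poor", "delayed"]
counts = sentiment_word_counts(tokens)
print(counts["negative"].most_common(2))  # [('delayed', 2), ('poor', 1)]
```

Ranking the counted words within each polarity gives the kind of list of top positive and negative words that the study reports.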
As shown in Figure 5, the words expressing negative emotions were 'poor', 'bad', 'problem', 'difficult', and 'delayed'. This suggests that negative emotions arise when a problem occurs, when service is slow, or when the flight schedule is not kept. Words expressing positive feelings included 'good', 'great', 'fantastic', 'excellent', and 'amazing', which express satisfaction after using the service. 'Delicious' expresses a positive feeling when the in-flight meal tastes good, indicating that the taste of food is among the things customers value in an airline. 'Clean' reflects customers' desire for a clean aircraft, so maintaining cleanliness can be seen as a source of positive emotion. The word 'smiling' also emerged as a positive word, suggesting that flight attendants' smiles have a positive effect on customers. Airlines may therefore benefit from emphasizing smiling in the future training of flight attendants.

Discussion
This study used topic modeling and sentiment analysis, two big data analysis techniques, to identify the needs of customers of Asian airlines as the Asian airline market has grown. By analyzing online reviews written by customers with experience of Asian airlines, we took an exploratory approach to the factors that influence customers' intention to purchase from Asian airlines.
Based on the results of this study, the following theoretical and practical implications were drawn. First, customer needs were identified more clearly through text expressing customers' opinions and feelings in online reviews of Asian airlines, which compensates for the limitations of the survey methods used in many previous studies. Second, recent research shows various attempts to analyze big data using topic modeling and sentiment analysis, and their use in tourism and hospitality services has also been increasing. By applying these methods to customers' online reviews in the airline sector, this study was able to derive customer feedback in a variety of ways, which can serve as a marketing foundation for the airline service sector in the future. Third, the study attempted to examine the data in depth by extending traditional methods of analysis. This methodological understanding can support expanded research designs in the future.
The practical implications are as follows. First, the frequency analysis and word cloud analysis show that, among the regions served by many Asian airlines, Singapore, Bangkok, Guangzhou, Shanghai, Hong Kong, and Beijing are frequently used and highly preferred. Airlines could increase customer use by drawing on this finding in their marketing when promoting new flights or routes. Second, among the services that airlines provide, seat comfort, delicious in-flight meals, and a variety of seat classes were viewed favorably and positively by customers choosing Asian airlines. Some low-cost airlines are included among the Asian airlines, but regardless of that image, offering a range of seat upgrades and seat classes at appropriate prices proved important. Third, the topic modeling results indicate that customers were very interested in Singapore Airlines among Asian airlines. Consistent with the previous analysis, in-flight meals, entertainment, seat class, seat comfort, and staff service emerged as factors that affect customers' use of Asian airlines. Asian airlines need to offer enough variety for customers to choose according to their different needs, and to use marketing to improve customer-centric services in the future.
Finally, the sentiment analysis identified negative expressions about factors that fall short of customers' expectations. These factors negatively affect the future re-use of Asian airlines and, in turn, hinder the development of the airline market. The most important of them is speed. A systematic service management system must be established and operated so that services are delivered quickly. There should also be a system that can quickly grasp and meet customers' needs. Service training should be actively encouraged so that employees maintain consistency in service delivery. Customers expect comfort, cleanliness, and delicious meals to be maintained. An employee's smile also has a positive effect on customers, which is a further reason for employee service training.
Recently, more and more studies in the field of hospitality have used big data analysis techniques, and various attempts are being made to move away from existing survey methods. Text mining techniques are a representative example. Although quantitative research has been the main approach, studies that analyze online text with exploratory methods to predict the future are actively under way, and many of them use website or social media data. This study is meaningful in that it applies exploratory methods to reviews left by experienced customers in order to uncover insights, predict the future, and present directions to move forward. By analyzing online review data in the airline sector through topic modeling and sentiment analysis, text mining techniques that have recently been actively researched, it also contributes to the diversity of research methods [33,50]. This study derives meaningful implications by analyzing what satisfies and dissatisfies customers. It also seeks to assist airlines in their management activities and decision-making, and to provide key data and analysis methods that are useful for their management.

This study has the following limitations. It was based on online reviews, which lack the demographic information that many researchers point to as a weakness of such data. Some studies suggest that online samples are comparable or even superior to traditional sampling, because the number of Internet and mobile users has increased in recent years and Internet users increasingly approximate the whole population. However, since this limitation has not been fully resolved, future research could be strengthened by collecting and using additional demographic information about the users of the review site.