Sentimental Approach to Airline Service Quality Evaluation

This paper analyses traditional methods of service quality evaluation and presents a new sentimental approach to airline service quality evaluation that employs user-generated content. It identifies aspects of airline service that passengers react to positively or negatively using the word cloud method, a basic, straightforward exploratory analysis tool. The aim is to introduce an approach that can be implemented using freely available analytical software tools and freely available data. As a case study, the authors evaluated selected airlines' service quality using sentiment analysis of user-generated content. The research relied on sentiment analysis of Twitter posts related to selected airlines' service quality. The paper describes how Twitter can be used for data mining, sentiment analysis, and airline service quality evaluation. The authors analysed over 30,000 posts related to the service quality of Ryanair, Southwest Airlines, American Airlines and KLM and proposed two types of word clouds (for each individual airline), which allow more informed decisions about enhancing the service quality of selected airlines. Compared to rather expensive traditional methods of airline service quality evaluation, such as onboard surveys of airline passengers or on-site surveys of passengers at airport departure gates, the key advantages of this new approach are the availability of free data and free analytical software tools. Moreover, this approach allows analysis of the service quality of competing airlines and, thus, provides internal opportunities for comparison. The results contribute to the literature by clarifying how both positive and negative passenger feedback impacts airline service quality and airline product planning.


Introduction
Most people who live in developed countries leave an electronic trail behind them, either unintentionally or intentionally. These traces are carriers of valuable data and can be used for various purposes. Data could be mined from various fields such as machine translation, email spam detection, information extraction, summarization, medical, question answering, etc. [1]. There have been a number of studies on automated extraction of sentiment from the text. For example, Ref. [2] used movie review domains to experiment using machine learning techniques. However, as the performance of sentiment classification is based on the context of documents, the machine learning approaches have difficulties in determining the sentiment of text if sentiment lexicons with contrast sentiment are found in the text. Besides using machine learning techniques, natural language processing (NLP) techniques have been introduced. NLP defines the sentiment expression of the specific subject and classifies the polarity of the sentiment lexicons [3]. Natural language processing has recently gained much attention for representing and analysing human language computationally.
The emergence of real-time information networking platforms such as Twitter (Twitter was rebranded to X in July 2023, but since our research was performed before this date, the original name Twitter is used in the paper) has led to the development of an unmatched public collection of viewpoints about all relevant worldwide entities, thereby interfering with and affecting human lifestyles [4]. Twitter, as a popular microblogging service, allows users to post tweets, which are status messages originally limited to 140 characters [5] (the limit was raised to 280 characters in 2017). Twitter may be a great platform for opinion generation and presentation, but it also presents new and unique obstacles, and the process would be incomplete without capable tools for assessing those thoughts to speed up their consumption. The best approach over time has proven to be using sentiment analysis tools to identify individual attitudes and emotions [6], including those attitudes related to passenger choice of airline.
A passenger's choice of airline is influenced by numerous factors, such as the number of available flight frequencies, ground access travel time, airfare, departure schedule from origin airport, arrival schedule to destination airport, etc. In addition, passenger choice behaviour is influenced by a strong airline satisfaction concept [7]. Passenger satisfaction depends on the indicators of airline service quality, such as airline reputation, airline safety, convenience of airline schedule, airline punctuality, availability of non-stop flights, quality of the in-flight entertainment system, quality of on-board service (food and drinks), aircraft type, friendliness and helpfulness of flight attendants, availability of well-trained staff and availability of a useful frequent flyer program [7]. As the standard of living increases, so do the demands of the travelling public on the quality of services [8]. Evaluation of service quality presents intrinsic complexity aspects related to the nature of services [9]. Service quality evaluation in air transport appears to be more difficult than in other public transport modes because of the diversity of the services offered by the airlines and the services provided by the companies managing the airports [8].
However, Salas [10] suggests that 71 percent of passengers (out of more than 10,000) surveyed by the Aviation Passenger Tracker 2021 stated that cabin cleanliness is a crucial factor in their choice of airline in the aftermath of the coronavirus (COVID-19) pandemic. As a result of these considerations, air carriers are delivering high-quality services while also preserving their essential services, keeping their expenses low and generating profit. Service quality differentiates airlines, and their uniqueness attracts and helps to retain frequent passengers and promote loyalty to their brand name.
Most airlines worldwide have understood the necessity of ensuring passenger satisfaction through the outstanding quality of their services [11]. The perception of airline service quality differs based on the situation. Airline passengers interact with airline employees online and in person: while booking tickets and when checking in, they talk to gate agents and flight attendants, and they often interact with other passengers. These human interactions have a certain impact on their whole flight experience. Similarly, they encounter the technical equipment, their seats and on-board services, e.g., meals. All of these interactions play a role in how the airline service is perceived and later evaluated. Traditions and airline product planning also impact the image of services. Service quality is difficult to measure, as it always depends on passengers' perceptions, and airlines do not always understand passenger needs and wants. Insight into the passenger experience allows airlines to improve in areas that need advancement and to keep up to date with passenger requirements.
However, a problem for some airlines, especially smaller ones, may be the limited financial resources available to measure passenger satisfaction with the quality of their service through expensive and often time-consuming questionnaire surveys. Therefore, this research aims to explore the extent to which freely available sources of data, e.g., user-generated content (UGC), can be used to evaluate airline service quality. Additionally, this paper aims to develop a new sentimental approach to the quality of airline service evaluation (SASQUE), which is independent of the size of an airline's budget, and can be implemented using freely available analytical software tools and freely available data.
Chaffey [12] stresses the fact that 4.6 billion people, about half of the Earth's population, use some sort of social media. He considers UGC to be an important source of airline service quality data. Despite the recent controversies surrounding the change of Twitter ownership to Elon Musk, Twitter remains one of the most popular forms of media. Every second, approximately 6000 tweets are added, equal to around 200 billion tweets per year [13].
Twitter had more than 300 million active users as of 2022 [14]. Putting this into an everyday context, the authors of this paper reveal the potential of sentiment analytical tools in distinguishing how positive or negative the narratives of a specific group of tweets are. The main conclusions and some key recommendations and limitations are also provided. Airlines can benefit from tracking public opinion in real time and understanding passenger perceptions of airline service quality.
User-generated content analysis is an inexpensive and efficient way to gain insights into passenger satisfaction with airline services. Compared with traditional service quality research methods, this approach is less expensive and much less time-consuming. Social media offer freely available and almost unlimited data sources. Alternative approaches based on analysis of the digital footprint that passengers generate online, such as blogs, specialised internet sites and social media, among others, are becoming new database substitutes.
In our paper, we analyse the traditional methods of airline services' quality evaluation and user-generated content analytical methods. Our research will be based on sentiment analysis of Twitter posts related to selected airlines' service quality. We will focus on the service quality of four airlines, ranked among the top 10 airlines in the world by scheduled passenger kilometres, as of 2022. An advanced Twitter scraping Intelligence Tool (Twint) will be used to scrape tweets from Twitter. We propose two types of word clouds (positive and negative), to represent word frequency, that give greater prominence to words that appear more frequently on Twitter accounts of selected airlines.
Multiple traditional methods of service quality evaluation are covered in this paper (Section 2), followed by a theoretical background of the SASQUE approach to airline service quality evaluation (Section 3). The materials and methods for a sentimental approach to SASQUE are described in Section 4, while results are detailed across a sample of selected airlines in Section 5, including coverage of the core advantages of using the SASQUE approach in an airline context.

Methods of Airline Service Quality Evaluation
The maxim 'you can't manage what you can't measure' ranks high on the list of ideas attributed to the management specialist Peter Drucker [15]. To measure and evaluate service quality, researchers have created multiple tools, each of which uses a different rating system to classify and rank service quality.

Traditional Methods
The traditional post-service rating should happen after the service or flight. The respondent (passenger) still has the experience freshly in their memory and can give an accurate response. However, in the case of airline services, most of the studies analyse: (i) data collected at the boarding gates that refer to a previous flight because passengers have not flown yet; (ii) data collected through questionnaires distributed at the airport before the flight and picked up immediately after the flight; and (iii) data collected through online surveys where passengers express their opinions about their latest flight [16].

Airline Customer Review Platforms and Applications
Airline quality surveys can also be conducted online, via a phone app, in person, on paper or by a phone call. The rating can be conducted on a variety of scales. However, cultures differ in how they rank their experiences. People from individualistic cultures, such as North Americans, are significantly more likely to choose the extremes of the scale. On the other hand, those from collectivistic cultures, for example those from Japan, usually try to avoid the ends of the scale [17]. Below, we provide an overview of traditional service quality evaluation methods. The overview is not exhaustive, as many airlines, especially large ones, use proprietary systems that are not publicly available.
The Key Surveys work with two different types of surveys: one-off studies, and regular, high-volume customer satisfaction surveys. In return for participating in surveys, customers can participate in a lottery with a chance to win free air tickets. One of the benefits of online surveys is that questions can be based on previous answers. For example, if a passenger rates their experience poorly, the next question will ask about the details and reasons for this rating [18].
In 1988, a new method to rank service quality from the customer's point of view, SERVQUAL, was developed [19]. It is sometimes referred to as the GAP or RATER model, which allows customer expectations and perceptions to be captured and depicted. SERVQUAL defines the expectations, describes the perception of services and shows the difference between the expected and delivered service quality [20]. Using SERVQUAL also allows the identification of weaknesses and differences between the service quality of various airlines [21]. Moreover, this method enables a researcher to rate service quality using indicators such as delays, check-in, baggage services, quality of the reservation service, employees' willingness to help and staff behaviour towards delayed passengers [22].
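The difference between expected and delivered quality that SERVQUAL reports can be illustrated with a minimal sketch. The dimension names, the rating scale and all scores below are hypothetical and only show the arithmetic (mean perception minus mean expectation); they are not taken from any of the surveys cited here.

```python
from statistics import mean


def servqual_gaps(expectations, perceptions):
    """Return the perception-minus-expectation gap for each dimension.

    Both arguments map a dimension name to a list of ratings.
    A negative gap means the delivered service fell short of
    what passengers expected.
    """
    gaps = {}
    for dimension, expected_scores in expectations.items():
        perceived_scores = perceptions[dimension]
        gaps[dimension] = mean(perceived_scores) - mean(expected_scores)
    return gaps


# Hypothetical ratings on a 1-7 scale for two illustrative dimensions.
expectations = {"check-in": [6, 6, 6], "baggage services": [5, 5, 5]}
perceptions = {"check-in": [4, 5, 6], "baggage services": [6, 6, 6]}

gaps = servqual_gaps(expectations, perceptions)
for dimension, gap in gaps.items():
    print(dimension, gap)
```

In this illustration, check-in shows a negative gap (service below expectation) and baggage services a positive one, which is the kind of weakness-versus-strength contrast the method is meant to expose.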
Another method of service quality evaluation, using an industry-specific scale in a variety of settings, is AIRQUAL. The advantage of AIRQUAL is that it allows consideration of both objective and subjective variables. Objective variables could be airline tangibles, terminal tangibles, personnel, empathy and image, while subjective variables include factors such as perceived service quality and customer satisfaction. The model considers cultural differences and passenger satisfaction with airline service quality, but also factors such as perceived empathy from airline employees [23]. The AIRQUAL model could be used to identify gaps with airline tangibles, such as the quality of equipment on the aircraft, or suggest operational recommendations on cabin crew training and qualification changes [24]. AIRQUAL was used in the case of IndiGo by studying passengers' TripAdvisor reviews of the low-cost airline's service. Ref. [25] analysed 1777 passenger reviews, which were classified to uncover sentiments for five dimensions of airline service quality.
Slightly different from AIRQUAL is the AIRSAT passenger satisfaction benchmark survey, developed for airlines by the International Air Transport Association (IATA) [26]. AIRSAT is capable of measuring passenger satisfaction in more than 80 key performance indicators (KPIs) across the entire journey of passengers, both on the ground and in flight [27].
Since 2012, IATA has also regularly carried out the Global Passenger Survey (GPS). The survey offers factual and in-depth information about the preferences and behaviours of airline passengers and airline service quality. In 2021, the survey involved 13,579 respondents from 186 countries. The questions and results are not concerned with one airline, but instead, all airlines in the sample collectively. This makes the results of GPS a potential source to help inform future decisions around adapting to overall market changes [28].
Skytrax, a UK-based consultancy company, analyses about 500 to 800 products and services offered by airlines and airports, and announces the Skytrax Airline and Airport Awards annually. Skytrax focuses on services offered at the airport by airline staff and throughout the whole flight [29]. Airport services' quality data extracted from Skytrax were used for sentiment analysis by [30]. The World Airline Survey was established in 1999 and is overseen by Skytrax. More than 350 airlines are included in the survey results annually. Airlines can promote the survey on social media or through the airline's website. The main part of the survey consists of questions regarding every phase of the passenger travel experience, such as waiting times at check-in, seat comfort, toilet cleanliness, baggage delivery, etc. [31].
'Air Travellers in America' is a survey performed by Ipsos on behalf of Airlines for America (A4A). The survey questions are about past experiences, but also about topics airlines consider to be important for their future decision making. Some of the questions cover issues such as acceptable waiting times at U.S. Customs, the use of online boarding passes, means of transportation to the airport and preferred payment methods, as well as customers' personal opinions about the importance of airlines being committed to sustainable practices [32].
The APEX Official Airline Ratings™ is an airline rating system based solely on passenger feedback. It is based on certified neutral, third-party passenger feedback in partnership with the travel-organising application TripIt, from Concur. APEX is a non-profit organisation that, in conjunction with the International Flight Services Association and Future Travel Experience, provides a full range of the end-to-end travel experience [33].
The J.D. Power North America Airline Satisfaction Study measures passenger satisfaction in North America from ticket reservation, through the whole flight experience, to baggage claim. It measures customer satisfaction in various categories based on the passengers' cabin classes [34].
TripAdvisor is an online global travel guidance platform that focuses on online advertising and travel guidance. Using TripAdvisor, customers can rate different kinds of businesses including airlines. They can share their experience by leaving a comment on the TripAdvisor site. Therefore, potential customers can make more informed travel decisions, for any trip type [35].
Table 1 shows types of assessment methods and data collected by individual platforms and applications.

Traditional Evaluation Methods
The Classification and Regression Tree (CART) method allows the identification of the characteristics that most influence overall passenger satisfaction. CART represents a useful and powerful tool, being capable of automatically searching for the best predictors and threshold values to classify the dependent variable. It allows the identification of the most critical factors that should receive more attention, particularly from the airlines' perspectives [16].
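The search for "the best predictors and threshold values" can be pictured with a single-split sketch. It uses Gini impurity, the criterion commonly associated with CART classification trees; the predictor names, ratings and satisfaction labels are hypothetical, and a full CART tree would repeat this search recursively on each side of the split rather than stop after one.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return (predictor, threshold, impurity) for the best single split.

    `rows` is a list of dicts holding numeric predictors and `labels`
    holds the class of each row. Every observed value of every
    predictor is tried as a threshold (value <= threshold goes left).
    """
    best = None
    n = len(rows)
    for predictor in rows[0]:
        for threshold in sorted({row[predictor] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[predictor] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[predictor] > threshold]
            if not left or not right:
                continue  # a split must leave observations on both sides
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[2]:
                best = (predictor, threshold, impurity)
    return best


# Hypothetical 1-5 ratings from four passengers and their overall verdict.
rows = [
    {"punctuality": 1, "food": 3},
    {"punctuality": 2, "food": 5},
    {"punctuality": 4, "food": 2},
    {"punctuality": 5, "food": 4},
]
labels = ["dissatisfied", "dissatisfied", "satisfied", "satisfied"]

split = best_split(rows, labels)
print(split)
```

In this toy data the punctuality rating separates the two groups cleanly while the food rating does not, so the search selects punctuality as the most critical factor.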
To highlight the practical implications obtained from the CART approach, the Importance-Performance Analysis (IPA) could be used. The final aim of the IPA is to identify service improvement priorities, and this objective is achieved by placing service attributes into four quadrants based on their Importance and Performance values [36].
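The quadrant placement can be sketched in a few lines. The attribute names and scores are hypothetical, the quadrant labels are the ones conventionally used in the IPA literature, and placing the dividing lines at the mean importance and mean performance is an assumption (the scale midpoint is another common choice).

```python
from statistics import mean


def ipa_quadrants(attributes):
    """Assign each service attribute to one of the four IPA quadrants.

    `attributes` maps an attribute name to an (importance, performance)
    pair. The dividing lines are the mean importance and the mean
    performance across all attributes (an assumption of this sketch).
    """
    importance_cut = mean(i for i, _ in attributes.values())
    performance_cut = mean(p for _, p in attributes.values())
    quadrants = {}
    for name, (importance, performance) in attributes.items():
        if importance >= importance_cut:
            if performance >= performance_cut:
                label = "keep up the good work"
            else:
                label = "concentrate here"
        else:
            if performance >= performance_cut:
                label = "possible overkill"
            else:
                label = "low priority"
        quadrants[name] = label
    return quadrants


# Hypothetical (importance, performance) scores on a 1-5 scale.
scores = {
    "punctuality": (5, 2),
    "cabin cleanliness": (5, 5),
    "in-flight entertainment": (2, 4),
    "frequent flyer program": (2, 1),
}

quadrants = ipa_quadrants(scores)
for name, label in quadrants.items():
    print(name, "->", label)
```

Attributes that land in "concentrate here" (high importance, low performance) are the improvement priorities the analysis is designed to surface.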
Every service evaluation method has its strengths and weaknesses. The advantages of traditional methods, among others, include a wide choice of well-established tools with a long record, which allows the evaluation of various parameters and trends. In addition to the methods described in this section, there are many other approaches to evaluate service quality. Applying traditional methods needs to effectively balance the costs of administering the survey with the expected results [37]. Unfortunately, traditional methods and data sources are not usually open to the research community, or data have to be purchased [38]. For that very reason, the number of research papers in this area is rather limited. The best practice, which relies on customer opinion, is to use a combination of methods. By doing so, every aspect of the provided service could be measured, analysed and evaluated.

UGC Analytical Methods
Our research focuses on the evaluation of airline service quality using UGC. One of the major strengths of the use of UGC is the practically unlimited source of data that can be used. Any text or comment written on social media, blogs or forums can be analysed. Another advantage of this method is the lack of human interaction. It is not subject to empathy, as face-to-face interviews are; there are no fixed questions, and customers are free to elaborate on their preferred topics. An adequately programmed software tool allows for simultaneous data collection and sentiment analysis of a certain topic. Thus, the analysis of service quality always covers current affairs. This is best applied when a company is releasing a new product or service.
The advent of Web 2.0 technologies has enabled the efficient creation and distribution of UGC, resulting in vast changes in the online media landscape. The proliferation of UGC has made a strong impact on consumers, media suppliers and marketing professionals while necessitating research to understand both the short- and long-term implications of this media content [39]. Social media not only creates a virtual community that links people around the globe through many websites such as Friendster, YouTube, Facebook, Twitter, and Instagram, but, also, many customers prefer to purchase through social media, which could offer a direct interface and allows individuals to read feedback known as UGC from other buyers [40]. The trends of research on this comparatively new and rapidly developing subject are systematically discussed and desiderata are identified. The UGC is approached by scholars from a variety of perspectives. Latent Semantic Analysis, a text mining and categorisation technique, was applied to analyse online user-generated airline reviews of over 5000 passenger reviews for 50 airlines from the online TripAdvisor website [41].
Investigations by [42] of websites containing UGC of more than 12,000 hospitality and tourism consumers served as an additional source of information that travellers consider as part of their information search process. This study [42] appears to be one of the few investigations that captures the perceptions of the travel consumer and the way they relate to the information value associated with Web 2.0 sites.
The study [43] uses content analysis to identify the types of content provided by airlines on their official Facebook pages and the extent of services offered (customer service, flight booking applications, etc.). It focuses on the Facebook pages of the 250 largest airlines by number of passengers. The study showed that airline Facebook profiles contain limited information and are not substitutes for airline websites. The major determinants of whether an airline operates a Facebook page are the airline's size and its business model. Ref. [44] provides a classification of UGC in social media and groups applications into more specific categories by characteristic: collaborative projects, blogs, content communities, social networking sites, virtual game worlds and virtual social worlds. An empirical study to identify the impact of online UGC reviews on business performance, using data extracted from a major online travel agency in China, showed that traveller reviews have a significant impact on online sales, with a 10 percent increase in traveller review ratings boosting online bookings by more than five percent [45].
Ref. [46] used Twitter, where UGC was leveraged to identify possible service improvement areas. A Twitter dataset of 949,497 tweets was analysed from the four-year period from 2018 to 2021 for 100 airports, with the second half falling under the main COVID period. The Latent Dirichlet Allocation (LDA) method was used for topic discovery and the lexicon-based method was used for sentiment analysis of the tweets. The COVID-19-related tweets reported a lower sentiment by passengers, which can be an indication of the lower service level perceived [46]. Twitter UGC was also used to analyse sourcing and verification practices on Twitter during the Brussels attacks in March 2016. Results showed that sourcing on Twitter has become a global phenomenon. During the first hours of the attack at the Brussels airport, journalists relied on UGC [47].
The use of artificial intelligence (AI) or 'opinion mining' to locate, extract and interpret blog content, and to cut down time and the costs of research was demonstrated by [48]. However, this method is still developing and is not completely reliable, because technologies and AI are not advanced enough to completely comprehend all nuances of human communication. Even so, precise sentiment analysis of text in multiple languages will be possible in the near future.
Table 2 summarises the main differences between traditional research methods and Twitter UGC. The new method cannot replace the traditional methods, but it complements them appropriately.
Traditional research methods: (i) typically, the interviewers ask passengers their opinions about the service before their departure or during the flight; (ii) major parts of studies analyse data collected before the flight departure, even if referred to a previous flight [8]; (iii) it is easier to collect only one kind of opinion, avoiding fatiguing the respondents with many questions; (iv) they allow service quality evaluation at the specific company only.
Twitter UGC: (i) information is obtained from individuals who expressed their views on the specific platform; (ii) typically, the opinion is expressed after the end of the service; (iii) the result usually refers to the service recently performed and a fresh experience; (iv) the range of questions is not specified, but it is not limited either; (v) it enables evaluation not only of a specific company but also comparison of results across various companies.

Sentimental Approach to Airline Service Quality Evaluation: Theoretical Background
Sentiment analysis is the process of computationally identifying and categorising opinions expressed in a piece of text. It determines whether the writer's attitude towards a particular topic, product, etc., is positive, negative or neutral. Sentiment analysis uses natural language processing (NLP), text analysis, computational linguistics and biometrics to systematically identify, extract, quantify and study affective states and subjective information. Conventional sentiment analysis concentrates primarily on the textual content [39]. Sentiment analysis is a tool used to express customer opinions, using UGC information from a variety of sources, such as social media, reviews, testimonials, video content, blog posts, hashtag campaigns, case studies or interactive events with customers [49]. Sentiment analysis can automatically clarify the emotional tone of text and identify positive, negative and neutral sentiment in it. It is often used to provide data about perceptions of brands, products and services. It is widely applicable to a variety of subjects, including passenger airline services.
Sentiment analysis can be conducted via two different approaches, either based on computational learning techniques or based on semantic approaches. It is also possible to use a combination of these two approaches. Typically, programs can work with modifier terms that increase or decrease the severity of the associated term, for example: very, too or little. This approach can also distinguish inversion terms or negations that use words such as no and never, which switch the polarity of the sentence or word [50]. The emotional meaning of words is subjective; humans interpreting the sentiment of sentences will only agree on between 65% and 80% of sentences. The accuracy of sentiment analysis depends on the comparison of sentiment analysis evaluated by a computer system versus evaluations by humans. On average, a sentiment analysis system needs to be at least 50% accurate to be deemed as effective and accurate, though above 65% is widely considered good [51].
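How modifier and negation terms act on a word's polarity, and how a system's output is compared with human judgement, can be shown with a minimal lexicon-based sketch. The word lists, weights and example sentences are hypothetical, and the sketch deliberately applies a modifier or negation only when it directly precedes a sentiment word, so it is far simpler than any production tool.

```python
import re

# Tiny hypothetical lexicons, for illustration only.
POLARITY = {"good": 1.0, "great": 2.0, "friendly": 1.0,
            "bad": -1.0, "rude": -1.5, "delayed": -1.0, "lost": -1.0}
MODIFIERS = {"very": 1.5, "too": 1.5, "little": 0.5}
NEGATIONS = {"no", "not", "never"}


def score_text(text):
    """Sum word polarities, scaled by preceding modifiers and
    sign-flipped by a preceding negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    weight = 1.0     # multiplier collected from modifier terms
    negated = False  # True when a negation term was just seen
    for token in tokens:
        if token in NEGATIONS:
            negated = True
        elif token in MODIFIERS:
            weight *= MODIFIERS[token]
        elif token in POLARITY:
            value = POLARITY[token] * weight
            total += -value if negated else value
            weight, negated = 1.0, False
        else:
            # Any other word breaks the chain of modifiers/negations.
            weight, negated = 1.0, False
    return total


def classify(text):
    """Map the numeric score to positive, negative or neutral."""
    score = score_text(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def agreement(system_labels, human_labels):
    """Share of texts where the system label matches the human label."""
    matches = sum(s == h for s, h in zip(system_labels, human_labels))
    return matches / len(human_labels)


print(classify("The crew was very friendly"))
print(classify("Flight delayed and staff never friendly"))
```

Because the chain is broken by any intervening word, a phrase such as "not a good flight" would be scored as positive here, which illustrates why negation handling is one of the harder parts of the task.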
Computer systems are confronted with many challenges when trying to determine the sentiment of text. Short messages (e.g., posts on social media) do not offer enough context for absolutely correct analysis. The short length of such posts also negatively affects the capability of software tools to detect sarcasm. It is complicated to program a system that can flawlessly differentiate between double negatives and negations. Negations are a linguistic way to reverse the meaning of words or even a sentence. Another issue is that text can be interpreted in many ways. Sentiment analysis systems tend to have problems detecting multipolarity in sentences: one sentence can have a positive emotion towards one object and a negative emotion towards another at the same time, which is difficult to detect.
A lot of text information is exchanged when passengers communicate with an airline via Twitter. Text information, as one of the most common data types, can be structured, semi-structured or unstructured. Given that 80% of text information in the world is in an unstructured format, for example text information on social media, the application of a text mining technique is needed to work with this information. Text mining tools and natural language processing (NLP) techniques can extract information and transform unstructured data into structured information, which then makes data easy to analyse. The outcome of analysis can improve the decision making of organisations, including airlines, leading to better business results [52]. Korean Air, for instance, uses IBM's Watson software tool (version 8.0.0), which helps to discover patterns that are similar to previous issues. Every time technicians fix an issue on an aircraft, they upload their notes into Watson's database to broaden its analytical capabilities. Thanks to this, the analysis of historical defects is 90% faster compared to traditional analysis without IBM's Watson software tool.
This paper uses Twitter as a source of UGC. This is mainly because Twitter focuses on the interests of the audience, while other social media focuses on socialisation of the audience. Twitter allows people to follow relevant topics and people rather than just keeping in touch with their social circle. Twitter limits tweets to 280 characters, which makes them specific and easily accessible. Passengers use it to express their compliments, concerns or complaints about delayed flights, lost baggage or mistreatment by airline staff. Twitter as a medium is becoming customer-service-central and its use for that purpose continues to evolve and expand. Ref. [53] suggests that about 40 percent of passengers have tried contacting airline companies by Twitter.

Material and Methods for the Sentimental Approach to Selected Airlines' Service Quality Evaluation
Our research relied on sentiment analysis of Twitter posts related to selected airlines' service quality. The authors analysed nearly 32,000 posts related to the service quality of four airlines, ranked among the top 10 airlines in the world by scheduled passenger kilometres, as of 2022 [54]. An advanced Twitter scraping Intelligence Tool (Twint) was used to scrape tweets from Twitter. The authors proposed two types of word clouds (positive and negative) to represent word frequency that give greater prominence to words that appear more frequently on Twitter accounts of selected airlines. As demonstrated in Table 3, the selected airlines are active on Twitter. Table 3 shows the number of followers of each airline's Twitter profile, along with the total number of tweets posted by each airline.
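The word frequencies behind a positive and a negative word cloud can be sketched as follows. The example tweets, their sentiment labels and the stop-word list are hypothetical, and the rendering step (drawing words at a size proportional to their count) is left to whichever word cloud tool is used.

```python
import re
from collections import Counter

# Small hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "was", "is",
              "my", "for", "on", "in"}


def word_frequencies(labelled_tweets, label):
    """Count words across the tweets that carry the given sentiment label."""
    counts = Counter()
    for text, tweet_label in labelled_tweets:
        if tweet_label != label:
            continue
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


# Hypothetical tweets that have already been labelled by sentiment.
tweets = [
    ("Friendly crew and great service", "positive"),
    ("Great crew on my flight", "positive"),
    ("Lost baggage and delayed flight", "negative"),
    ("Flight delayed again", "negative"),
]

positive = word_frequencies(tweets, "positive")
negative = word_frequencies(tweets, "negative")
print(positive.most_common(3))
print(negative.most_common(3))
```

Keeping the two counters separate is what allows one cloud to highlight what passengers praise and the other what they complain about.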

Data Collection and Simplification
Recalling the aim of this research, which is to develop a new sentimental approach to service quality evaluation using UGC that can be implemented using freely available analytical software tools, the authors of this paper decided to use Twint. Twint is a software tool (version 2.1.20) written in the Python programming language that allows for scraping tweets from Twitter profiles without using Twitter's Application Programming Interface (API). Compared to Twitter's API, Twint can fetch almost all tweets (Twitter's API limits searches to only the last 3200 tweets), it benefits from fast initial setup, it can be used anonymously and without Twitter sign-up, and there are no rate limitations [55].
To enable the Twitter data mining functionality of the Twint tool, we turned on the Windows Subsystem for Linux (WSL) feature. Then, we were able to install the Twint application. A simple terminal-based text editor, GNU nano, was used to write commands in Linux (Figure 1). The first part of the code starts the Twint tool. In the second part, the timeframe of the search is defined. The third part of the code defines the searched item. In this case, we changed the item 'airline' to the name of the airline's Twitter profile. Thanks to this part of the code, Twint searched for every tweet containing the name of the airline. The subsequent line of the code defines the language. In this case, the tool searches only for tweets in English. Further, we defined how the retrieved text is saved to the computer: the name of the file and the file format are defined. When all the variables are defined, the search for tweets within these parameters can begin.
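The sequence of settings described here (timeframe, searched item, language, output file) can be sketched with Twint's Python module interface. This is an illustration under stated assumptions rather than the code shown in Figure 1: the dates, the output file name and the choice of CSV as the file format are hypothetical, and the helper functions `build_search_settings` and `run_search` are introduced only for this example. The attribute names follow Twint's `Config` object.

```python
def build_search_settings(profile, since, until, output_file):
    """Collect the search parameters described in the text.

    The keys mirror attribute names of twint.Config.
    """
    return {
        "Search": profile,      # searched item, e.g. the airline's profile name
        "Since": since,         # start of the timeframe (YYYY-MM-DD)
        "Until": until,         # end of the timeframe (YYYY-MM-DD)
        "Lang": "en",           # restrict the search to English tweets
        "Store_csv": True,      # assumed output format: CSV
        "Output": output_file,  # name of the file the tweets are saved to
    }


def run_search(settings):
    """Run the search; requires the third-party twint package."""
    import twint  # imported here so the helper above works without twint

    config = twint.Config()
    for name, value in settings.items():
        setattr(config, name, value)
    twint.run.Search(config)


# Hypothetical timeframe and file name; 'airline' stands for the profile name.
settings = build_search_settings(
    "airline", "2022-01-01", "2022-12-31", "airline_tweets.csv"
)
print(settings)
# run_search(settings)  # uncomment where twint is installed and network access exists
```

Separating the settings from the call that runs the search makes it easy to repeat the same search for each airline by changing only the profile name.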
All tweets were then converted to .xls format and edited (we deleted the name of the Twitter account and the date and time of the tweet). Additionally, we deleted identical tweets posted by the same user at different times, to prevent them from significantly changing the sentiment.
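The clean-up step can be illustrated with a short sketch. It assumes each scraped row is a dictionary with hypothetical keys `username`, `date`, `time` and `tweet`; the authors performed this step in a spreadsheet, so the code only mirrors the logic described.

```python
def clean_tweets(rows):
    """Keep only the tweet text and drop repeats of the same text by the same user."""
    seen = set()
    cleaned = []
    for row in rows:
        text = row["tweet"].strip()
        key = (row["username"], text)
        if key in seen:
            continue  # identical tweet from the same user, posted at another time
        seen.add(key)
        cleaned.append(text)  # account name, date and time are dropped here
    return cleaned


# Hypothetical example rows.
rows = [
    {"username": "user_a", "date": "2022-04-10", "time": "09:00", "tweet": "Flight delayed again"},
    {"username": "user_a", "date": "2022-04-11", "time": "18:30", "tweet": "Flight delayed again"},
    {"username": "user_b", "date": "2022-04-11", "time": "19:00", "tweet": "Flight delayed again"},
]
cleaned = clean_tweets(rows)
```

The same text posted by a different user is kept, because only repeats by the same user are treated as duplicates.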

Sentiment Analysis
Applications and companies that conduct sentiment analysis charge significant amounts of money for sentiment research; free demo versions are only available for around 50 lines of text. Because thousands of tweets were needed for this analysis, a more accessible Microsoft Excel tool called Azure Machine Learning (AML) was used to generate the sentiment analysis. It uses the Multi-Perspective Question Answering (MPQA) Subjectivity Lexicon, which includes 5097 negative and 2533 positive words. Each word is assigned positive or negative polarity. The results show positive, negative or neutral sentiment and a percentage score. Items near 100% are very likely positive and items near 0% are very likely negative. We used pivot tables to count the number of tweets with each sentiment and calculated the average score of each sentiment for every airline (Figure 2).
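The pivot-table summary can be reproduced outside a spreadsheet. The sketch below assumes the sentiment tool has already returned a label and a score for each tweet; the function name `summarise` and the example values are hypothetical, and scores are written on a 0 to 1 scale rather than as percentages.

```python
from collections import defaultdict


def summarise(scored):
    """Return {label: (tweet count, mean score)} for (label, score) pairs."""
    groups = defaultdict(list)
    for label, score in scored:
        groups[label].append(score)
    return {label: (len(s), sum(s) / len(s)) for label, s in groups.items()}


# Hypothetical scores: 0 = very likely negative, 1 = very likely positive.
scored = [("positive", 0.9), ("positive", 0.7), ("neutral", 0.5), ("negative", 0.1)]
summary = summarise(scored)
```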

Credibility Check
To differentiate between the positivity and negativity of tweets, conditional formatting tools were used. The AML tool classified positive and negative sentiment correctly for most tweets (the sentiment of 5% of tweets was classified incorrectly by the system). To improve the correctness of sentiment classification, we randomly checked up to 50 tweets from each group. These random manual checks helped to reveal problems with sarcasm (expressing negative sentiment using positive words), polarity (a very apparent and robust tone in some sentences vs. a neutral or not easily classified tone), polysemy (words with more than one meaning) and negation detection (a sentence that contains a negation does not necessarily carry negative overall sentiment), and prevented a substantial deterioration in the correctness of sentiment classification. Often, passengers mentioned multiple airlines in one tweet, comparing one airline to another. This sentiment analysis is unable to distinguish which part of a tweet refers to a particular airline.
As an example, the following tweet was identified as 99% positive: "love when @AmericanAir: 1. Delays your flight a day ahead of time 2. Encourages you to change your connecting flight because of the delay 3. Gets you there to make the first flight 4. You have no ticket because you changed your flight really great way to start the weekend! So fun!" Manual verification identified this tweet as a false positive because it uses sarcasm, which cannot be detected by AML.
Many of the positive tweets received high scores because they were replies posted by airline employees responsible for social media communication, who are trained to use words with positive sentiment when replying to customers.
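The random manual check can be made repeatable with a few lines of code. This is a sketch under assumptions: the function name `sample_for_review`, the fixed seed and the example groups are hypothetical, and the source does not state how the authors drew their samples.

```python
import random


def sample_for_review(groups, k=50, seed=0):
    """Draw up to k tweets from each sentiment group for manual checking."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    return {
        label: rng.sample(tweets, min(k, len(tweets)))
        for label, tweets in groups.items()
    }


# Hypothetical groups of already classified tweets.
groups = {
    "positive": [f"positive tweet {i}" for i in range(120)],
    "negative": [f"negative tweet {i}" for i in range(30)],
}
review = sample_for_review(groups)
```

A group with fewer than 50 tweets is returned in full, which matches the "up to 50" wording above.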

Visual Representation of Word Data
Word clouds provide a visual image of the word data in this research, which were scraped from Twitter. These collections of words are depicted in different sizes. The bigger and bolder a word appears, the more often it is mentioned within a given text and the more important it is.
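The relationship between word frequency and displayed size can be sketched as follows. The linear scaling, the font size limits and the function names are assumptions for illustration; word cloud tools apply their own scaling rules.

```python
import re
from collections import Counter


def word_counts(tweets, min_count=1):
    """Count lower-cased words and keep those seen at least min_count times."""
    counts = Counter()
    for text in tweets:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return {w: n for w, n in counts.items() if n >= min_count}


def font_sizes(counts, smallest=10, largest=60):
    """Scale counts linearly to font sizes; the most frequent word is largest."""
    top = max(counts.values())
    low = min(counts.values())
    span = (top - low) or 1  # avoid division by zero when all counts are equal
    return {
        w: smallest + (largest - smallest) * (n - low) / span
        for w, n in counts.items()
    }


# Hypothetical example tweets.
tweets = ["Great price and great flight", "Price was fair", "Flight on time"]
counts = word_counts(tweets)
sizes = font_sizes(counts)
```

The `min_count` parameter plays the role of the minimum occurrence used to decide which words are frequent enough to appear in a cloud.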

Results
The sentiment of tweets shared by passengers on selected airlines' Twitter profiles is shown in Figure 3 and presents the emotional tone behind the body of these tweets. For Ryanair, 7495 tweets posted between 10 April 2022 and 18 April 2022 were analysed. Every tweet contained the word Ryanair. Of these, 3325 tweets (44%) were identified as positive and 2768 tweets (37%) as negative. Overall, the sentiment of the analysed tweets for Ryanair was 52% positive. To generate Ryanair's positive word cloud, the 3325 tweets with a sentiment score higher than 60% were used (where 100% is the most positive possible outcome and 0% is the least) (Figure 4).
In this sample, 18 frequent words were found, with a minimum occurrence of 40 times. Ryanair occurred 3628 times, Dublin Airport was used 79 times and price was mentioned 138 times. Other popular words were online, flight and time.
To generate the negative word cloud for Ryanair, the 2768 tweets with a sentiment score lower than 45% were used (Figure 4). This sample contains 20 frequent words with a minimum frequency of 70 times. Ryanair was mentioned 3356 times in the sample tweets. Insurance was found 132 times, while staff was found 82 times. The username of the additional Ryanair Twitter channel, askryanair, was used 233 times and the word app was used 72 times. Other frequent negative words were airport, boarding and money. These words represent the services (areas) Ryanair customers are not satisfied with.
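The score thresholds used to select tweets for the two word clouds can be expressed as a simple filter. In this sketch the function name and the example tweets are hypothetical, scores are written on a 0 to 1 scale, and the default cut-offs follow the 60% and 45% values reported for Ryanair; other values can be passed where a different threshold applies.

```python
def split_by_score(scored_tweets, positive_above=0.60, negative_below=0.45):
    """Split (text, score) pairs into positive and negative word cloud inputs."""
    positive = [text for text, score in scored_tweets if score > positive_above]
    negative = [text for text, score in scored_tweets if score < negative_below]
    return positive, negative


# Hypothetical scored tweets.
scored_tweets = [
    ("great price", 0.95),
    ("flight was fine", 0.60),
    ("boarding at gate", 0.50),
    ("app keeps crashing", 0.44),
    ("lost my money", 0.05),
]
positive, negative = split_by_score(scored_tweets)
```

Tweets that fall between the two cut-offs are left out of both clouds.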
In the case of Southwest Airlines, 8195 tweets posted between 10 April 2022 and 18 April 2022 were analysed. More tweets were identified as positive (45%) than as negative (36%). Overall, the sentiment of the analysed tweets for Southwest Airlines was 51% positive. A positive word cloud was generated from 3667 positive tweets about Southwest Airlines with a sentiment score higher than 60% (Figure 5). This sample contains 15 frequent words with a minimum occurrence of 50 times. SouthwestAir occurred 4276 times in the sample, the word Mask occurred 194 times, Vegas (the city of Las Vegas in Nevada) occurred 70 times and Attendant 62 times. Other popular words were plane, seats and time. These words represent the things Southwest Airlines customers are most happy with.
To generate the negative word cloud for Southwest Airlines, 2989 tweets with a sentiment score lower than 45% were analysed (Figure 5). This sample contains 20 frequent words with a minimum occurrence of 50 times. SouthwestAir occurred 2944 times, COVID was mentioned 110 times, masks occurred 176 times and mask 279 times. Delayed occurred 159 times. Other popular negative words were bags, money and gate. These words represent the things Southwest Airlines customers are not satisfied with.
For American Airlines, 11,371 tweets posted between 10 April and 18 April 2022 were analysed. In this case too, more tweets were identified as positive (44%) than as negative (40%). The overall sentiment of the analysed tweets for American Airlines was 50% positive.
To generate a positive word cloud for American Airlines, 5068 tweets with a sentiment score higher than 55% were analysed (Figure 6). In this sample, 18 frequent words occurred with a minimum occurrence of 70 times. AmericanAir occurred 3594 times, the word help was mentioned 216 times, flight occurred 388 times and team was mentioned 230 times. Other popular words were love, trip and check-in. These words represent the things American Airlines customers are most satisfied with.
To generate the negative word cloud for American Airlines, 4510 tweets with a sentiment score lower than 45% were analysed (Figure 6). This sample contains 20 frequent words with a minimum occurrence of 109 times. AmericanAir occurred 4508 times, the word gate occurred 339 times, delayed occurred 223 times and customer was mentioned 275 times. Other popular negative words were mask, kids and waiting. These words represent the things American Airlines customers are not satisfied with.
For KLM, 4772 tweets posted between 10 April and 18 April 2022 were analysed. Of those, 53% were identified as positive and 26% as negative. Overall, the sentiment of the analysed tweets for KLM was 56% positive.
To generate a positive word cloud for KLM, 1012 tweets with a sentiment score higher than 55% were analysed (Figure 7). In this sample of tweets, 18 frequent words were found with a minimum occurrence of 50 times. KLM occurred 2749 times. The word refund was mentioned 184 times, while Schiphol was used 363 times. Airbus occurred 79 times. Other popular positive words were climate, booking and ticket. These words represent the things KLM customers are most satisfied with.
To generate the negative word cloud for KLM, 1223 tweets with a sentiment score lower than 45% were analysed (Figure 7). In this sample of tweets, 20 frequent words occurred with a minimum occurrence of 40 times. KLM occurred 1440 times. The word flight was mentioned 173 times and LelystadAirport was mentioned 85 times. Other popular negative words were service, baggage and waiting. These words represent the things KLM customers are not satisfied with.

Discussion
Obtaining information about customer preferences and capturing consumer attitudes is a primary challenge faced by marketing researchers. Conventional service quality research is expensive and time-consuming, whereas social media posts are published every minute and are freely available. The Sentimental Approach to Airline Service Quality Evaluation (SASQUE) offers airlines an efficient and inexpensive way to gain insights into passenger satisfaction with airline service quality. Compared to conventional service quality research (e.g., in-depth interviews, which are very accurate when designed and delivered well), this approach is less expensive and much less time-consuming. There are also other issues to be considered. Questionnaires and data collection at airport facilities are becoming a burden, in terms of not only cost but also additional issues such as security and passenger comfort [38]. Social media, on the other hand, offers freely available and almost unlimited data. Therefore, alternative approaches based on analysing the digital footprint left by passengers in online communities, blogs, specialised internet sites and social media, among others, are becoming new database substitutes [30]. Posts are usually very straightforward because customers want to be heard and acknowledged. A further advantage of the method is that the results are available immediately.
The weaknesses of this method are the inability of software tools to fully understand the sentiment of tweets and the limits on downloading data from Twitter. Because the method analyses the meaning and frequency of individual words, the sense of a sentence can in some cases be lost. Software tools are not sensitive enough to the sentiment of all words and contexts, which can result in an inappropriate sentiment score being assigned. In addition, software tools are, at present, not able to detect double negatives or sarcasm. Word clouds do not combine words into phrases; for example, 'customer' and 'service' appear as separate words in a word cloud, instead of 'customer service'. Another issue with word clouds is that synonyms are not treated as one word or phrase. These drawbacks can, most likely, be solved by the use of artificial intelligence in the future. Software tools, such as Twint, must be used to download a sufficient amount of data from Twitter, and such software might have trouble downloading tweets from months or years ago. Additionally, numerous accounts tweet spam messages about airlines, and there are hundreds of plane spotter tweets about the whereabouts of aircraft, as well as posts written by airline staff, all of which could change the outcome of the analysis.
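The phrase limitation can be illustrated by counting adjacent word pairs instead of single words. This sketch is not part of the study; the function name and the example tweets are hypothetical, and it shows only one simple way in which a phrase such as 'customer service' could be kept together.

```python
import re
from collections import Counter


def bigram_counts(tweets):
    """Count adjacent word pairs so that two-word phrases are kept together."""
    counts = Counter()
    for text in tweets:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts


# Hypothetical example tweets.
tweets = ["Customer service was slow", "Terrible customer service today"]
pairs = bigram_counts(tweets)
```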
However, airlines should use this type of online data to learn instantly about their customers' thoughts on various products and services. Ignoring the 'online voice' of customers might put airlines at a competitive disadvantage. To authentically understand customers, airlines should always use multiple ways to evaluate their services. User-generated data provide the least filtered opinions. Sentiment analysis can also be applied to sources other than Twitter posts; data from any social media posts or comment sections can be used.

Conclusions
The authors of this paper analysed 31,833 tweets about four selected airlines, Ryanair, Southwest, American and KLM, each ranked among the top 10 airlines in the world by scheduled passenger kilometres in 2022, to demonstrate the sentimental approach to airline service quality evaluation. Analysing user-generated content can help airlines make more informed decisions that enhance their quality of service.
Of all the selected airlines, American Airlines recorded the highest number of posts on Twitter between 10 April and 18 April 2022. KLM, on the other hand, recorded the fewest tweets over the same period. The word cloud visualisations (Figures 4-7), derived directly from the collected tweets, allowed us to see which words are most commonly used when each airline is discussed in both positive and negative terms. This provided excellent insight into how the selected airlines are commented on and talked about online, and also into the day-to-day experiences of their customers.
Ryanair's most common positive words are 'Dublin Airport', 'price', 'online', 'flight' and 'time'. The strategy of keeping very competitive prices and offering 'no frills' sets this airline apart from its competitors. On the other hand, negative words including 'app', 'askryanair', 'staff' and 'insurance' appear larger for Ryanair than for the other selected airlines. We recorded a number of tweets complaining about how passengers were handled by the staff of the dedicated Twitter account 'askryanair'. Additionally, passengers complained about frequent crashes of the Ryanair mobile app that prevented them from completing the online check-in process. Another issue experienced by users was trouble logging into the application.
The key feature of the airline product that keeps KLM one step ahead of its competitors is 'climate' (one of the most popular positive words shared by KLM customers). The airline feels a strong responsibility to reduce its carbon footprint, as demonstrated by the release of its Climate Action plan. Numerous tweets mentioned KLM's actions towards a more ecologically sustainable business in a positive way. 'Vegas' is a popular positive word used by Southwest passengers. It refers to Las Vegas, NV, USA. The Southwest Vacations program offers an exclusive Las Vegas vacation package. If Southwest passengers book their Las Vegas accommodation on the Southwest Vacations homepage, they are allowed one extra piece of checked baggage at no additional charge.
Our analysis revealed that 'covid' and 'masks' were among the most frequent words used in a negative sense for both selected US airlines (Southwest and American) in 2022. This does not necessarily mean that 'mask' only triggered negative emotions among US travellers; according to Figure 5, the font size of 'mask' is even larger in the positive word cloud for Southwest Airlines, which indicates that it triggered more positive reactions among Southwest travellers, mainly thanks to the professionalism of Southwest flight attendants and their kind attitude during the pandemic period.
We did not record the same negative trend for 'covid' and 'masks' with Ryanair and KLM. In our opinion, the negative sentiment of 'mask' indicates more negative emotions among US travellers towards travel restrictions and government rules during and shortly after the pandemic. The different response among passengers of the US airlines might be caused by cultural differences and political polarisation. This is illustrated by the fact that the FAA received a record 5981 reports of disruptive passengers in 2021, with more than 70% of those incidents involving masks [56].
It remains extremely important that airlines pay close attention to their online customer feedback and to the analysis of its sentiment, not only because it affects prospective customers, but also because it can inform future airline product planning. The sentimental approach to airline service quality evaluation using UGC can show airlines their strengths and, more importantly, their weaknesses. That information could contribute to an airline's success and help airlines to master the art of providing excellent service for their returning and prospective customers.
Nevertheless, sentiment analysis itself has many drawbacks. The free version of the software used in our research is not sensitive enough to differentiate between words, so many tweets could end up with a wrong sentiment score. Additionally, the free version of the software is not able to detect double negatives or sarcasm. Word clouds only count single words, not phrases: a word cloud would count 'customer' or 'service' but not 'customer service'. Another issue is that word clouds do not count words with identical meaning as one, nor do they count plural forms or misspelled words together with their base word.
Ref. [57] shows that this is a general word parsing problem caused by an inadequate level of automatic text processing. Other limitations must also be considered. For example, computers are rather poor at understanding the meaning of language and do not provide context, so the meaning of individual words may be lost; human involvement in the text analysis might therefore be appropriate. As indicated in Section 6, word processing is a good, quick tool for identifying extremes; however, it is only one of the tools that must be complemented by other suitable methods (presented in Section 2) to evaluate the quality of airline services.
In addition, analysis of the content created by airline passengers using word clouds cannot distinguish the specific requirements of a particular passenger. It only reveals the weaknesses and strengths of an airline product. Therefore, airlines have to evaluate the specific requirements of particular passengers individually, taking into account factors such as their cultural differences.
Our research has shown that the sentimental approach to airline service quality evaluation analysing UGC can be a valuable and inexpensive means of passenger satisfaction or dissatisfaction tracing, and it is a useful tool to identify extremes in positive and negative perceptions of airline service quality.

Figure 1. The code used for Twint data collection and sorting. [Source: Authors].

Figure 2. Pivot table displaying sentiment analysis for Ryanair. Red and orange colors represent negative sentiments while yellow and green colors represent positive sentiments. [Source: Authors].

Table 1. Type of assessment method and data collected by individual instruments (review platform and applications).

Table 2. The main differences between the traditional methods of service quality evaluation and the methods based on Twitter data.

Table 3. Overview of how active selected airlines are on Twitter.