Categorizing Quality Determinants in Mining User-Generated Contents

Abstract: User-Generated Contents (UGCs) are gaining increasing popularity as a source of valuable information for companies to manage the quality of their products, services and Product-Service Systems (PSS). This paper aims at proposing a novel approach to identify and categorize quality determinants through the analysis of an extensive database of UGCs. In detail, this paper applies a topic modeling algorithm (Structural Topic Model) to identify quality determinants and introduces the Mean Rating Proportion measurement for their classification into three categories: negative, positive and neutral quality determinants. The application of the proposed methodology is exemplified through the analysis of a PSS case study (car-sharing).


Introduction
If the manufacturing industry has faced its 4th Industrial Revolution, the same is yet to happen for quality research and management [1]. A great opportunity for quality research is given by the rise of digital technologies and big data [2]. In this area, the analysis of digital word-of-mouth is becoming increasingly important. Traditionally, customer opinions and word-of-mouth were investigated through questionnaires, focus groups and interviews. Nowadays, consumers express their opinions autonomously on social media, forums or review aggregators. User-Generated Contents (UGCs), i.e., information directly generated by customers, may represent a valuable source of information for the quality management of products, services and Product-Service Systems.
A recent approach to determine the quality determinants is the analysis of UGCs, more specifically in the form of online reviews, which can offer a low-cost source of information for understanding customers' expectations and requirements [3][4][5][6]. The identification of quality determinants is based on the in-depth analysis of such data, leveraging text mining approaches capable of obtaining information from text documents written in a natural language [7]. To this end, topic modeling approaches are used. Such approaches are based on unsupervised machine-learning algorithms that can detect latent topics running through a collection of unstructured documents [8]. Given a large set of documents, topic modeling algorithms deal with the problems of: (i) identifying a set of topics that describe a text corpus (i.e., a collection of text documents from a variety of sources); (ii) associating a set of keywords to each topic and (iii) defining a specific mixture of these topics for each document [9]. The logic of these approaches is that if a topic is discussed (within the UGCs), then it is critical to the definition of the quality of the object (product, service or Product-Service System) under investigation.
In particular, the quality management of Product-Service Systems (PSS) can benefit significantly from UGCs' analysis. Product-Service Systems are defined as "systems of products, services, networks of 'players' and supporting infrastructure that continuously strives to be competitive, satisfy customer needs and have a lower environmental impact than traditional business models" [10,11].
Several factors explain the recent success of the PSS-based business model on a global scale and in a wide range of economic activities: maturity of information technology, general user acceptance of the PSS offer format, affordable and extended access to the internet, wide range of suppliers available, etc. [12][13][14][15][16]. In particular, the PSS-based business model has been shown to support the achievement of the Sustainable Development Goals (SDGs), which are appreciated and demanded by an increasing number of customers [17]. In addition, this model shows robustness in different scenarios and circumstances, such as the different degree of development of a country, or particular changes that suddenly influence the behavior of customers. These challenges are better envisaged and faced from the PSS perspective, which makes it possible to modulate and adapt the product-service to the particular demand requirements and to sudden and unexpected changes [18,19].
Despite the academic and practical success of PSS, too little has yet been done to manage their quality. While for pure products and services, standardized models are available [20,21], the same cannot be said for PSS, for which ad hoc solutions are still needed in each analyzed case [12].
This article attempts to shed light on these issues by proposing a methodology to analyze UGCs in order to extract and categorize PSS quality determinants. The proposed methodology can also be applied for the analysis of UGCs related to pure products and services. In order to better illustrate the strengths of the proposed methodology, a case study on PSS car-sharing is presented.
The remaining part of the paper proceeds as follows: Section 2 introduces the methodology and the results of its application; Section 3 discusses the findings of the research and Section 4 summarizes the contribution of this paper and future research directions.

Methodology
In this study, we use a probabilistic topic modeling method named Structural Topic Model (STM), an extension of well-established probabilistic topic models such as Latent Dirichlet Allocation (LDA) [9] or Correlated Topic Models (CTM) [22]. A significant advantage of STM is that it allows the connection of arbitrary information, in the form of covariates (such as customer ratings, date and place of publication of the review, service provider, etc.), with the degree of association of a document with a topic (topical prevalence) as well as the degree of association of a word with a topic (topical content). Roberts et al. [23,24] provide good overviews of the STM algorithm.
The analysis presented here has been carried out using the stm package for R (R Core Team, 2017). Its application consists of the six steps shown in Figure 1, which are described in the following sections.

Application Case Description
The proposed methodology is described using a practical application case concerning car-sharing [25]. Car-sharing is a form of shared mobility that has gained increasing popularity in recent years [26]. Given its promise to reduce traffic congestion, parking demand and pollution, this mode of shared transportation has spread, especially in urban contexts, so much so that several new competitors have recently entered this market, designing and proposing new service solutions [27]. The number of users of car-sharing services is growing rapidly: 15 million people in Europe (about 2% of the population) are expected to use car-sharing services in 2020, compared to 7 million in 2015 [28]. This increase in users is expected to increase profits from approximately $1 billion in 2013 to $10.8 billion by 2025 [29]. Generally, car-sharing schemes fall into one of four models:
• one-way, when members are allowed to begin and end their trip at different locations, through free-floating zones or station-based models with designated parking locations;
• roundtrip, when members are required to begin and end their trip at the same location;
• peer-to-peer, when the vehicles are typically privately owned or leased, with the sharing system operated by a third party;
• fractional, when users co-own a vehicle and share its costs and use.
Among these, the most successful model in terms of users over time is the "one-way" model, in both its free-floating and station-based configurations [30].

Dataset Extraction
The analyzed data are reviews and relevant metadata (car-sharing provider, nationality, rating, date, source) retrieved in December 2019 from different review aggregators: Yelp, Google, Trustpilot, Facebook and Play Store. Reviews were published from January 2010 to December 2019. Only English-language reviews were selected, with a total of almost 17,000 reviews from 22 car-sharing providers (Car2go, DriveNow, Maven, Zipcar, Goget, etc.), distributed in three countries (US, Canada and UK). Each provider was associated with its type of car-sharing (station-based or free-floating). The average length of the obtained reviews is about 500 characters. The information concerning review ratings, types of car-sharing (station-based or free-floating) and countries was used as covariates for the topical prevalence in the STM model, i.e., the degree of association of each review with each topic.

Pre-Processing
According to previous approaches [31,32], the text corpus was pre-processed and unified in order to improve the efficiency of the topic modeling algorithm. In detail, the text corpus was pre-processed as follows:
• the text was converted to lowercase in order to eliminate ambiguity with uppercase words;
• punctuation and numbers were removed since they add little topical content;
• English stop words (e.g., "the", "and", "when", "is", "at", "which", "on", etc.) were removed;
• words shorter than 2 characters or longer than 15 were removed;
• words with an extremely low frequency (less than 15 occurrences in the whole text corpus) were excluded from the text corpus since their inclusion would confound results or would not be representative of any specific topic;
• the text was normalized using the Porter stemmer (or "Porter stemming") to reduce similar words to a unique term. Stemming removes the commoner morphological and inflectional endings from words in English [33]. For example, the words "likes", "liked", "likely" and "liking" were reduced to the stem "like";
• words generally not related to topical content (such as "another", "mean", "etc.", "problem", "review", "made") were removed;
• n-grams, i.e., contiguous sequences of n items from a given sequence of text, were replaced by a single term. For example, the n-gram "customer service" was replaced by the term "customerservice".
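The pre-processing steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch and not the R pipeline used in the study: the stop-word list is a tiny placeholder, the n-gram map contains only the "customer service" example named in the text, the function name and parameters are hypothetical, and Porter stemming is left out to keep the example free of external libraries.

```python
import re
from collections import Counter

# Tiny placeholder stop-word list (assumption); a real run would use a full English list.
STOP_WORDS = {"the", "and", "when", "is", "at", "which", "on", "a", "to", "of", "was", "it"}
# Words judged non-topical in the text above.
NON_TOPICAL = {"another", "mean", "etc", "problem", "review", "made"}
# Hypothetical n-gram map; only "customer service" is named in the text.
NGRAMS = {"customer service": "customerservice"}


def preprocess(docs, min_count=15, min_len=2, max_len=15):
    """Return one token list per document after the cleaning steps (no stemming)."""
    tokenized = []
    for doc in docs:
        text = doc.lower()                          # lowercase
        text = re.sub(r"[^a-z\s]", " ", text)       # drop punctuation and numbers
        text = re.sub(r"\s+", " ", text).strip()    # normalize whitespace
        for phrase, merged in NGRAMS.items():       # merge n-grams into one term
            text = re.sub(r"\b" + re.escape(phrase) + r"\b", merged, text)
        tokens = [
            w for w in text.split()
            if w not in STOP_WORDS
            and w not in NON_TOPICAL
            and min_len <= len(w) <= max_len
        ]
        tokenized.append(tokens)
    # Remove words that are rare across the whole corpus.
    counts = Counter(w for tokens in tokenized for w in tokens)
    return [[w for w in tokens if counts[w] >= min_count] for tokens in tokenized]
```

With a corpus of realistic size the default `min_count=15` mirrors the frequency threshold stated above; on a toy corpus a lower value is needed, otherwise every word is filtered out.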

Identification of the Optimal Number of Topics
An essential parameter for the STM method is T, i.e., the number of topics able to describe the analyzed text corpus. The literature discusses several possible alternatives to define T [34]. For the purpose of this analysis, the held-out likelihood has been selected as a measure of performance of the topic model. The held-out likelihood evaluates how well the trained model explains the held-out data (i.e., a portion of data not used to develop the topic model). It can be seen as a measure of how well the developed topic model can explain the overall variability in the text corpus [23,35]. In the proposed application, only 90% of the available UGCs were used to train the topic model, and the remaining 10% were used to test it. The held-out likelihood (L) is formally defined as the log probability (p) of the held-out data (W_held-out) given the trained model (M_trained):
\[ L = \log p\left(W_{\text{held-out}} \mid M_{\text{trained}}\right) \]
The graph in Figure 2 shows the values of the held-out likelihood as a function of T (from 5 to 100). From the graph, we can observe that, starting from T equal to 20, the held-out likelihood is almost stationary. Considering this, an optimal number of T = 20 topics was identified.
Figure 2. Results of the held-out likelihood analysis to determine the optimal number of topics (ranging from 5 to 100).
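The reading of the plateau in Figure 2 can be mimicked with a simple rule. The sketch below is an assumption and not the procedure used in the study, where T was chosen by inspecting the graph: it returns the smallest candidate T whose held-out likelihood lies within a tolerance `tol` of the best observed value. The function name, the tolerance and the numbers in the usage lines are all hypothetical.

```python
def first_plateau(candidates, heldout, tol=0.02):
    """Return the smallest T whose held-out likelihood is within `tol` of the best value.

    candidates: numbers of topics, in increasing order.
    heldout: held-out log-likelihood obtained for each candidate.
    """
    best = max(heldout)
    for t, ll in zip(candidates, heldout):
        if best - ll <= tol:
            return t
    return candidates[-1]


# Made-up values for illustration only; they are not the values behind Figure 2.
candidates = [5, 10, 20, 40, 60, 100]
heldout = [-7.40, -7.20, -7.05, -7.04, -7.05, -7.04]
chosen_t = first_plateau(candidates, heldout)
```

A rule of this kind makes the choice reproducible, but the result depends on the tolerance, which should be reported together with the selected T.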

Labeling
For each topic, the STM approach identifies the most relevant keywords. However, to generate a relevant semantic label, the method still requires some human input [36]. To date, no automatic labeling techniques have been developed. Table 1 shows the identified labels and the relevant lists of keywords as defined by the authors. The authors first labeled the topics independently, which led to partially different labels; a joint brainstorming session then settled the differences and produced the final list of labels reported in Table 1. Finally, to test their reliability, the defined topic labels were submitted for confirmation to an external panel familiar with quality research and practice.

Data Verification
The obtained results were verified by comparing the topic assignment produced by STM for a randomly selected sample of 100 reviews with a manual topic assignment performed by the authors. For each of the 100 reviews, the authors were requested to agree on the association of one or more of the 20 topics identified by STM. The topic assignment defined in this way was then considered as the reference and compared to that obtained by STM. For each review and topic, the following four cases can occur (see Table 2):
• True positive (tp), i.e., agreement between authors and algorithm in the assignment of a review to a topic.
• True negative (tn), i.e., agreement between authors and algorithm not to assign a review to a topic.
• False positive (fp), i.e., misalignment between the assignment of the review to a topic by STM and the non-assignment by the authors (type I error).
• False negative (fn), i.e., misalignment between the assignment of the review to a topic by the authors and the non-assignment by STM (type II error).
According to Costa et al. [37], three verification indicators have been calculated (see Table 3). Accuracy is the most intuitive performance measure and is equal to the ratio of correctly predicted observations (true positives and true negatives) to the total number of observations. It measures how often the algorithm produces a correct topic assignment. Since accuracy assumes equal costs for both kinds of errors, two further indicators should be considered to fully evaluate the effectiveness of a topic modeling algorithm: Recall and Precision. Recall, also known as sensitivity or true positive rate, is the ratio of true positives to the sum of true positives and false negatives. It answers the question: "If a topic is present in a review, how often is the algorithm able to detect it?" Precision, also known as positive predictive value, is the ratio of true positives to the total number of positive predictions (true positives + false positives). It answers the question: "What proportion of positive topic assignments was actually correct?"
These three metrics show a generally good correspondence between the assignments produced by STM and by the authors. The accuracy of 94% indicates that the method is effective in predicting the content of the reviews, correctly identifying true positives and true negatives. According to Nassirtoussi et al. [38], accuracy values above 55% can be accepted as "report-worthy". In most cases, accuracy is between 50% and 80%.
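The three verification indicators follow directly from the four counts defined above. The sketch below uses the standard definitions of accuracy, recall and precision; the function name is hypothetical and the counts in the usage lines are made up, so they do not reproduce the values reported for the case study.

```python
def verification_indicators(tp, tn, fp, fn):
    """Compute accuracy, recall and precision from review-topic assignment counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # share of correct assignments and non-assignments
        "recall": tp / (tp + fn),        # share of topics present that were detected
        "precision": tp / (tp + fp),     # share of detected topics that were correct
    }


# Made-up counts for illustration only.
scores = verification_indicators(tp=30, tn=50, fp=10, fn=10)
```

Because most review-topic pairs are true negatives when each review covers only a few of the 20 topics, accuracy can be high even when recall and precision are moderate, which is why all three indicators are reported.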
The Recall and Precision indicators, equal to 73% and 65% respectively, show that the method performs well in terms of identification of the topics (true positives).

Categorization of the Quality Determinants
Figure 3 shows the Mean Topical Prevalence (MTP) of the 20 identified topics in the analyzed reviews. The MTP represents the average weight of a topic and can be calculated as follows:
\[ MTP_t = \frac{1}{N} \sum_{i=1}^{N} TP_{i,t} \]
where N is the number of reviews considered in the analysis and TP_{i,t} is the topical prevalence of the topic t in the review i. The sum of the MTPs related to all the identified topics is equal to 1:
\[ \sum_{t=1}^{T} MTP_t = 1 \]
The most discussed topics are topic 6, concerning the reliability of the mobile application; topic 9, related to service convenience; and topic 15, related to the responsiveness of the customer service. The least discussed topics are those related to the tangible component of the car-sharing service: topic 2, relating to the management of accidents and damage to vehicles, and topic 8, relating to the internal condition of vehicles.
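As a minimal sketch of the MTP computation, assume the topical prevalences are available as a list of rows, one per review, where each row holds the prevalence of every topic and sums to 1. The function name and the toy matrix are hypothetical.

```python
def mean_topical_prevalence(tp):
    """Return the MTP of each topic; tp[i][t] is the prevalence of topic t in review i."""
    n_reviews = len(tp)
    n_topics = len(tp[0])
    return [sum(row[t] for row in tp) / n_reviews for t in range(n_topics)]


# Toy prevalence matrix: 3 reviews, 3 topics, each row sums to 1.
toy_tp = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
]
mtp = mean_topical_prevalence(toy_tp)
```

Since every row sums to 1, the resulting MTP values also sum to 1, which matches the property stated above.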
Note that a higher MTP does not mean that a topic is more "critical to quality" than others. The difference in MTP may depend on several factors, including the review aggregators used for the analysis, which may be more (or less) oriented towards collecting specific information on certain topics. For example, the Play Store review aggregator commonly collects information related to the user experience with the applications.
It is, therefore, necessary to introduce a new dimension of analysis for the identified quality determinants. Understanding the way quality determinants are discussed, and hence their relationship to customer satisfaction, is probably more relevant than knowing how much a topic is discussed. A global assessment of satisfaction, the so-called rating, is generally associated with each review in most review aggregator platforms to summarize the overall satisfaction of the user [40]. The rating is usually expressed on an ordinal scale, ranging from one star (maximum dissatisfaction) to five stars (maximum satisfaction) [41]. The relationship between the topic discussed in a review and its respective rating can provide the basis for categorizing the quality determinants identified through UGCs analysis.
In this view, this paper introduces the concept of Mean Rating Proportion (MRP). The MRP can be defined as the average proportion of a topic in reviews with a specific rating level. MRP can be calculated as follows:
\[ MRP_{t,k} = \frac{1}{|R_k|} \sum_{i \in R_k} TP_{i,t} \]
where t is the topic; k is the rating level; R_k is the subset of reviews associated with a rating level equal to k; TP_{i,t} is the topical prevalence of the topic t in the review i; |R_k| is the cardinality of R_k. Note that the sum of the MRPs related to all the identified topics and a specific rating level is equal to 1:
\[ \sum_{t=1}^{T} MRP_{t,k} = 1 \quad \text{for each rating level } k \]
For example, the sum of the MRPs related to rating 1 of all topics is equal to 1. Figure 4 shows the MRP profile for a generic topic. In this example, the average Topic Proportion (i.e., the average proportion of the topic in the analyzed subset of UGCs) increases as the rating level increases. Specifically, the exemplified topic is more discussed in positive reviews. This result can be read from two points of view: (i) the quality determinant given by that topic generates user satisfaction, or (ii) the topic is related to aspects that users identify as positive. In both cases, this information is critical to quality management.
Figure 5 reports the MRP profiles for the 20 identified car-sharing quality determinants. Different quality determinants clearly present different profiles. In particular, three different categories of quality determinants can be identified:

• Negative quality determinants, i.e., those determinants more discussed by reviews characterized by a negative rating (see Figure 5A). Accident and damages management, registration process, charges and fees, car condition, customer service responsiveness, car start-up issues, customer service courtesy, and billing and membership fall into this category.
• Positive quality determinants, i.e., those determinants which are more discussed by reviews with a positive rating (see Figure 5B). Convenience, efficacy, sharing benefits and intermodal transportation fall into this category.
• Neutral quality determinants, i.e., those determinants where the MRP does not appear to be affected by the review rating. Neutral quality determinants have a flat or (approximately) symmetric profile centered on the intermediate rating (see Figure 5C). Customer service (physical office), parking areas, app reliability, end trip issues, use rates, car proximity, car availability and car reservation fall into this category.
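The MRP and the three categories can be illustrated with a short sketch. The MRP function follows the definition given above (mean topical prevalence of a topic over the reviews with a given rating). The `categorize` function is an assumption introduced only for illustration: it labels a profile by the difference between its value at the highest and at the lowest rating, with an arbitrary threshold, whereas the categories above were derived from the profiles in Figure 5. All names and toy values are hypothetical.

```python
def mean_rating_proportion(tp, ratings, levels=(1, 2, 3, 4, 5)):
    """Return mrp[k][t]: mean prevalence of topic t over the reviews rated k."""
    n_topics = len(tp[0])
    mrp = {}
    for k in levels:
        rows = [row for row, r in zip(tp, ratings) if r == k]
        if rows:  # rating levels without reviews are skipped
            mrp[k] = [sum(row[t] for row in rows) / len(rows) for t in range(n_topics)]
    return mrp


def categorize(profile, threshold=0.02):
    """Label an MRP profile (ordered by rating) by its end-to-end change.

    Assumed rule for illustration; flat and symmetric profiles both come out neutral.
    """
    change = profile[-1] - profile[0]
    if change > threshold:
        return "positive"
    if change < -threshold:
        return "negative"
    return "neutral"


# Toy data: 4 reviews, 2 topics, ratings of 1 or 5 stars only.
toy_tp = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]]
toy_ratings = [1, 1, 5, 5]
mrp = mean_rating_proportion(toy_tp, toy_ratings)
profiles = [[mrp[k][t] for k in sorted(mrp)] for t in range(2)]
labels = [categorize(p) for p in profiles]
```

In the toy data, the first topic is discussed mainly in one-star reviews and the second mainly in five-star reviews, so they come out as negative and positive respectively. A clustering of the full profiles would be a more robust alternative to this two-point rule.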

Discussion and Implications
Previous studies proposing tools for quality management based on the analysis of UGCs focused on the proportion in which the different quality determinants are discussed (i.e., Mean Topical Prevalence) to assess their criticality. This variable is strongly influenced by external factors (such as type of platform or sample of users) which cannot be easily controlled. To overcome this issue, this article proposes the study of a complementary variable named Mean Rating Proportion (MRP). The analysis of MRP provides a clear indication of the relevance of the topics to customer (dis)satisfaction, enabling the classification of quality determinants into three different categories that require different management approaches:
• Negative quality determinants represent those aspects that generate dissatisfaction in the user. When users discuss them, it is mainly with a negative connotation. It is essential to analyze the reasons behind dissatisfaction to implement strategies for improving quality or at least for mitigating the adverse effects.
• On the contrary, positive quality determinants can be seen as those elements that can generate greater satisfaction and delight. Consequently, positive quality determinants represent the key advantages of the object under analysis. These features need to be developed and enhanced in order to attract and better satisfy customers.
• Finally, the role of the neutral quality determinants should not be underestimated. At first sight, it might seem that these elements do not influence satisfaction and therefore are not critical in the value offering. However, since users discuss them, these elements may be important for the quality perception of the object under analysis. The fact that these determinants may generate both satisfaction and dissatisfaction in the users results in a symmetric MRP distribution. Understanding the reasons behind this is the key to their correct management and continuous improvement.
Taken together, these considerations provide a new perspective on the results of the algorithms for UGCs analysis in the field of quality management. The combination of Mean Topical Prevalence and Mean Rating Proportion can lay the basis for the definition of a new taxonomy of quality determinants of products, services or Product-Service Systems. Such a taxonomy is potentially dynamic, i.e., capable of tracking the evolution of users' perceptions over time.

Conclusions
The present research aimed at offering a novel methodology to identify and categorize quality determinants of a product, service or Product-Service System. The proposed methodology is based on the analysis of User-Generated Contents, and more specifically, on the use of topic modeling algorithms and the processing of their results. The proposed application exploits the Structural Topic Model algorithm, but similar results can also be obtained with other topic modeling algorithms (such as Latent Dirichlet Allocation, Latent Semantic Analysis, Hierarchical Dirichlet Process).
Car-sharing was chosen as a case study for this article. In detail, 20 quality determinants have been identified for car-sharing. Eight of these are mainly related to customer dissatisfaction: accident and damages management; registration process; charges and fees; car condition; customer service responsiveness; car start-up issues; customer service courtesy; billing and membership. Four determinants can be classified as positive, and therefore drivers of customer satisfaction: convenience; efficacy; sharing benefits; intermodal transportation. The remaining eight quality determinants are neutral, hence not distinctly related to customer satisfaction or dissatisfaction: customer service (physical office); parking areas; app reliability; end trip issues; use rates; car proximity; car availability; car reservation.
The classification of quality determinants was based on a new metric derived from the results of the topic modeling algorithm: the Mean Rating Proportion (MRP), i.e., the average proportion of a topic in reviews with a specific rating level.
The present study may contribute to recent debates concerning Quality 4.0, which involves the usage of artificial intelligence tools to understand the intense information flows related to quality aspects. Overall, this study strengthens the idea that UGCs can represent a valuable source of information to manage and design quality characteristics. The proposed approach can be applied in every context in which UGCs are available and quality determinants need to be identified and classified. Although the current study proposes an analysis related to the PSS car-sharing, similar results can be obtained by analyzing other UGC datasets related to PSS, products or services.
The main limitations of the proposed approach lie in the sensitivity of the results to the UGCs' dataset structure, which may be influenced by the number and type of reviews considered in the analysis, their composition, language and geographical region of origin.
Further work is needed to fully understand the effect of customer satisfaction and perceived quality on the MRP. Besides, more attention should be given to the classification of quality determinants through MRP profiles. Clustering techniques can be used to provide a more objective and specific classification.

Funding: This research has been partially supported by Ministero dell'Istruzione, dell'Università e della Ricerca Award TESUN-83486178370409 finanziamento dipartimenti di eccellenza CAP. 1694 TIT. 232 ART. 6.

Conflicts of Interest:
The authors declare no conflict of interest.