Analyzing Online Car Reviews Using Text Mining

Consumer reviews on the web have rapidly become an important information source through which consumers can share their experiences and opinions about products and services. They are a form of text-based communication that opens new possibilities and perspectives for marketing. Reading consumer reviews gives marketers an opportunity to eavesdrop on their own consumers. This paper examines consumer reviews of three competing automobile brands and analyzes the advantages and disadvantages of each vehicle using text mining and association rule methods. The data were collected from an online resource for automotive information, Edmunds.com, with the scraping tool “ParseHub” and then processed in R, a software environment for statistical computing and graphics. The paper provides detailed insights into the superior and problematic sides of each brand and into consumers’ perceptions of automobiles, and highlights differences between satisfied and unsatisfied groups regarding the best and worst features of the brands.


Introduction
The rapid spread of the Internet has provided humanity with a new way to obtain information. It has now become the biggest source of information, with people conducting ever more searches on the Web. Alongside this, social media, another part of the Internet domain, has also captured the attention of netizens. Social media can take many different forms, one of which is product-review websites. These, along with other types of social media, provide a platform for consumers to share their experiences and opinions about the products they purchase and use, thereby providing other consumers with information about the pros and cons of these products. Such communication is also known as electronic word-of-mouth and has opened up new horizons in marketing. By reading consumer reviews, companies and marketers get to know their customers and thus obtain a better understanding of marketing opportunities, the competitive landscape, the market structure, and the features of their own and competitors' products that customers discuss.
Nowadays, with the volume of data growing dramatically, many organizations and researchers strive to find patterns in data using data mining methods. Text mining is one of these methods and is used to analyze consumer reviews. Consider, for example, a situation where consumers write reviews about a mobile telephone they have purchased and discuss their best or worst experiences. Consumer 1 purchases a mobile telephone from Company A while Consumer 2 purchases one from Company B. The autonomy (battery life) of the telephone from Company A is better than that of the telephone from Company B, while the camera quality of the telephone from Company B is better than that of the telephone from Company A. Hence, Consumer 2 is likely to write a review that contains positive feedback about the camera. However, he is also likely to leave negative feedback about the autonomy of the telephone. Thus, extracting meaningful information from consumer reviews such as the most frequent words and the relationship between them provides companies with an insight into the superior features of and various product categories such as books [33][34][35][38], movies [39][40][41], fashion [42], cosmetics [43], hotels [44][45][46][47], washing machines [48], online lectures [49], restaurants [50,51], and airlines [52].
However, there have been very few studies on consumer car reviews. Kulkarni et al. [53] examined whether Internet use is associated with different choice patterns for cars and found that Internet users rely more on ratings while non-Internet users rely more on recommendations. Sagar et al. [1] considered whether factors affecting car choice behavior such as competition, consumer preferences, and government policies are salient features. Kaushal [2] identified car purchasing behavior through 39 items and validated the usefulness of five factors: safety & security, quality, performance, value, and technology. In this paper, we use mining methods to analyze eight different features: performance, comfort, value, interior, reliability, safety, technology, and exterior for three competitive brands in the automobile market in 2012. Such an approach gives marketers the opportunity to eavesdrop on their own consumers and on consumers in the automotive market in general. In particular, engineers and marketers from automotive companies, based on the results of the review analysis, can obtain structured information about both the superior and problematic aspects of their vehicles and those of their rivals, thereby gaining a competitive advantage in the market. Consequently, the implementation of such an approach can trigger sales growth and improve firm performance.

Text Mining Approach to Car Reviews
To obtain adequate and proper data, reviews were collected from one of the biggest online resources for automotive information, Edmunds.com. The process of mining is divided into several steps.

Scraping Data from Websites
Data were collected using the scraping tool "ParseHub." This tool can be adapted to any website, and scholars can use it to extract any piece of information, be it text or an image. In this study, units such as the title, model of vehicle, best features, worst features, ratings for 8 different features, and total ratings were collected. The process of scraping is illustrated in Figure 1.

The results can be saved in JSON or CSV format. In this study, they were saved in CSV format, lightly corrected in Excel, and then converted to XLSX format for further use. The output appears as shown in Figure 2.
All reviews were divided into two groups: satisfied and unsatisfied. The overall rating given by consumers was used as the basis for this division. The mean overall rating of each car sample served as the cut-off point, and the conditions were set as follows:
If ((car = 1) and (overall_rating >= 4.30)) GD = 1
If ((car = 1) and (overall_rating < 4.30)) GD = 0
If ((car = 2) and (overall_rating >= 4.60)) GD = 1
If ((car = 2) and (overall_rating < 4.60)) GD = 0
If ((car = 3) and (overall_rating >= 3.70)) GD = 1
If ((car = 3) and (overall_rating < 3.70)) GD = 0
where Car #1 is a Hyundai Elantra, Car #2 is a Honda Civic, and Car #3 is a Ford Focus. GD is a variable for group diversity, where GD = 1 denotes the satisfied group and GD = 0 denotes the unsatisfied group. The result of the data split is presented in Table 1.
For further analysis, the data were read into R using the "xlsx" package, which enables the R user to read, write, and format Excel files. Best features, worst features, satisfied and unsatisfied variables, reviewers' id, name of automobile brand, and car id relative to automobile brand were included as input. Part of the input data is displayed in Figure 3.
The output comprised 539 reviews, each assigned to either the satisfied or the unsatisfied group according to the sat/unsat variable derived from the mean overall rating.
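The split rule above can be sketched in a few lines. The study itself used R; the Python below is only an illustrative stand-in. The cut-off values come from the conditions in the text, while the function name, field names, and example rows are hypothetical.

```python
# Cut-off points per car, taken from the conditions stated in the text.
# 1 = Hyundai Elantra, 2 = Honda Civic, 3 = Ford Focus
CUTOFFS = {1: 4.30, 2: 4.60, 3: 3.70}


def group_diversity(car, overall_rating):
    """Return GD = 1 (satisfied) if the rating meets the car's cut-off, else GD = 0."""
    return 1 if overall_rating >= CUTOFFS[car] else 0


# Hypothetical example rows (not taken from the study's data).
reviews = [
    {"car": 1, "overall_rating": 4.30},
    {"car": 2, "overall_rating": 4.50},
    {"car": 3, "overall_rating": 3.90},
]
for review in reviews:
    review["GD"] = group_diversity(review["car"], review["overall_rating"])
```

Keeping the cut-offs in one lookup table means that a different threshold per car needs only one changed value rather than a pair of conditions.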

Data Manipulation
After this preliminary arrangement, R tools were used to process the data and divide the reviews into four groups: satisfied best features, satisfied worst features, unsatisfied best features, and unsatisfied worst features. The rationale for this division is that, whether consumers are satisfied or unsatisfied overall, they still identify some best and worst features of the car they are reviewing. An example of the output is presented in Figure 4, where rows are organized randomly to increase clarity.
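A minimal sketch of this four-way division follows, written in Python rather than the R tools used in the study. The record layout (`id`, `GD`, `best`, `worst`) and the example texts are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical review records; the field names are illustrative assumptions.
reviews = [
    {"id": 1, "GD": 1, "best": "comfortable seats", "worst": "road noise"},
    {"id": 2, "GD": 0, "best": "nice styling", "worst": "poor mpg"},
]


def split_into_groups(records):
    """Map each (satisfaction, feature type) pair to a list of (review id, text)."""
    groups = defaultdict(list)
    for rec in records:
        satisfaction = "satisfied" if rec["GD"] == 1 else "unsatisfied"
        for feature_type in ("best", "worst"):
            groups[(satisfaction, feature_type)].append((rec["id"], rec[feature_type]))
    return dict(groups)


groups = split_into_groups(reviews)
```

Each review contributes two texts, one to a best-features group and one to a worst-features group, which is why the number of texts is twice the number of reviews.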

Data Cleansing
To make it possible to count words and identify their co-occurrence in reviews, the text must first be cleaned. Some features of the text are removed, such as numbers, extra white space, punctuation, and common English words that carry no semantic meaning. Converting text to lower case, expanding contractions, and stemming are also essential to obtain accurate and valuable data. In this way we obtained all the words occurring in the 1078 reviews, 2299 words in total. A sample of the output is presented in Figure 5.
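The cleaning steps listed above might look roughly like the following. This is a sketch under stated assumptions, not the authors' R pipeline: the stop-word and contraction lists are deliberately tiny, and `crude_stem` is a naive suffix stripper standing in for a real stemming algorithm.

```python
import re

# Tiny illustrative lists; a real analysis would use fuller stop-word and
# contraction lists and a proper stemmer.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "it", "not", "very", "do"}
CONTRACTIONS = {"don't": "do not", "it's": "it is", "isn't": "is not"}


def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Lower-case, expand contractions, drop numbers, punctuation and stop words, then stem."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = re.sub(r"[0-9]+", " ", text)    # remove numbers
    text = re.sub(r"[^a-z\s]", " ", text)  # remove punctuation
    tokens = text.split()                  # split() also collapses white space
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = clean("The seats are VERY comfortable, and it's got 2 cupholders!")
```

Contractions are expanded before punctuation is removed so that the apostrophe is still available for matching.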

Data Mastering
The final preparation step was to combine our primary database with the database of occurring words. With the help of R tools, a master table was created, as shown in Figure 6.
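One plausible shape for such a master table is one row per review, holding its metadata plus one count column per word. The text does not describe the exact layout, so the structure, the `w_` column prefix, and the example data below are all assumptions.

```python
from collections import Counter

# Hypothetical primary records and their already-cleaned tokens.
primary = [
    {"id": 1, "car": "Hyundai Elantra", "GD": 1},
    {"id": 2, "car": "Ford Focus", "GD": 0},
]
tokens_by_id = {1: ["seat", "interior", "seat"], 2: ["transmission", "shift"]}


def build_master_table(records, tokens):
    """Return one row per review: its metadata plus a count column per word."""
    vocabulary = sorted({word for words in tokens.values() for word in words})
    table = []
    for rec in records:
        counts = Counter(tokens.get(rec["id"], []))
        row = dict(rec)
        # Prefix word columns so a word such as "car" cannot overwrite metadata.
        row.update({"w_" + word: counts[word] for word in vocabulary})
        table.append(row)
    return table


master = build_master_table(primary, tokens_by_id)
```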

Frequency Analysis
Once the master table was created, the actual analysis could be conducted. For this we used "WordCloud," one of R's utilities. A word cloud is a visualization that displays how frequently words appear in a given sample of text: the more frequently a specific word appears in the data, the bigger and bolder it appears in the word cloud. The results for our cases are discussed next.
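The idea behind a word cloud can be sketched as counting words and then mapping counts to font sizes. The linear scaling, the size limits, and the example tokens below are assumptions chosen for illustration; the R utility used in the study may scale sizes differently.

```python
from collections import Counter


def word_frequencies(token_lists):
    """Count how often each word occurs across all cleaned reviews."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


def font_sizes(counts, min_size=10, max_size=48):
    """Scale counts linearly to font sizes: more frequent words are drawn larger."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {word: min_size + (count - lo) * (max_size - min_size) / span
            for word, count in counts.items()}


# Hypothetical tokens, not the study's data.
docs = [["seat", "interior"], ["seat", "style"], ["seat", "interior", "mpg"]]
freq = word_frequencies(docs)
sizes = font_sizes(freq)
```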

Figure 7 shows the best features of the three cars. The most frequently occurring words are represented in this word cloud, with the most frequent and important words located in the center and the least frequent words located on the edges. Hence, the closer the words are to the edges, the less frequent they are. In the case of Hyundai, "seat" is the most frequent word, followed by "interior," then "style," and then the rest. In the case of Honda, the most frequent words are, in order, "mpg," "gas," "seat," "comfort," "mileage," "dash," "control," "display," "smooth," "steering," "wheel," "econ," "fun," and so on. In contrast with Hyundai, where the main advantage was design and style, Honda consumers mostly emphasize characteristics related to value, technology, and movement on the road. In the case of Ford, the most frequent words are "handles," "seat," "interior," "system," "sync," "style," "comfort," "gas," "transmission," "exterior," "mileage," and so on.
In the case of Hyundai, only 14 words remain after filtering. As shown in the barplot in Figure 8, many of these words relate to appearance: "interior" (24), "style" (19), "exterior" (13), "look" (13), and "design" (10). This result suggests that consumers mostly liked the car's design. The most frequent word is "seat," which occurs 43 times. Because this word has such a high frequency, association analysis was conducted to determine its significance.
We performed correlation analysis on the most frequent word, as shown in Figure 9. In the case of Hyundai, the term "seat" has a high correlation with words such as "position," "front," and "back." Hence, we can assume that this word refers to the convenience and comfort consumers felt when they sat in a Hyundai Elantra. Consistent with this interpretation, words such as "back," "comfort," and "rear" might also refer to comfort, which was one of the best features for consumers. Similarly, the occurrence of words such as "mpg" and "gas" suggests that consumers were satisfied with Hyundai's fuel consumption. For the word "control," the most closely associated word was "steering," with a correlation of 0.78. This suggests that consumers were likely to be satisfied with their control over the movement of the vehicle.
In the case of Honda, the first two words are "mpg" and "gas," which suggests that consumers regard this car as having very low fuel consumption. Additionally, some consumers seem to find the seats very comfortable. The rest of the words relate to the dashboard and technological features, such as "dash," "control," "display," "steering," "system," "bluetooth," and "econ." The word "steering" refers more to technology than to holding the road because the correlated words were "wheel," "control," "electronic assist," and "dash." The word "econ" was correlated with words such as "mode" and "feature." This is explained by the fact that the Honda Civic has an econ button as a special function, which has become one of its most favored features.
In the case of Ford, the most frequent word is "handles," which occurred 48 times. Because this word is quite difficult to interpret, an association analysis was conducted, as shown in Figure 10. We can assume that the word "handles" does not refer to the means by which a thing is held, carried, or controlled, but to how easy the car is to handle on the road. Words such as "turn," "directions," "balance," "quiet," and others help to interpret the meaning of this word precisely. Another frequent word was "sync," which is correlated with words such as "voice," "system," "phone," "ipod," "control," "navigation," and so on.
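The association analysis used above for "seat," "steering," and "handles" can be sketched as a correlation between per-review word counts. The text does not state which R function or correlation measure was used, so Pearson correlation is an assumption here, as are the function names, the 0.5 threshold, and the example reviews.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a term whose count never varies gives no association signal
    return cov / sqrt(var_x * var_y)


def associations(docs, term, min_corr=0.5):
    """Return the words whose per-review counts correlate with `term` at or above min_corr."""
    vocabulary = sorted({word for doc in docs for word in doc})
    column = {word: [doc.count(word) for doc in docs] for word in vocabulary}
    scores = {word: pearson(column[term], column[word])
              for word in vocabulary if word != term}
    return {word: round(r, 2) for word, r in scores.items() if r >= min_corr}


# Hypothetical cleaned reviews, not the study's data.
docs = [
    ["spare", "tire", "noise"],
    ["spare", "tire"],
    ["noise", "trunk"],
    ["trunk"],
]
assoc = associations(docs, "spare")
```

In this toy data "spare" and "tire" always appear together, so they are perfectly correlated, while "noise" appears independently of "spare" and is filtered out.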

Worst Features
The same analysis was then conducted using reviews that contained the worst features of the three brands. For the Hyundai Elantra, the most frequent word is "mpg," which occurred 28 times, as shown in Figure 11. Although "mpg" also occurred in the results for best features, the same term can appear among the worst features as well. Here, we can assume that many consumers were not satisfied with fuel consumption and that these consumers outnumber those who were satisfied. Moreover, the correlation analysis for "mpg" in Figure 12 shows highly correlated words such as "show," "computer," "onboard," and "display." We therefore assumed there might be a problem related to displaying the mpg on the onboard computer. This hypothesis was checked manually, and it was found that many consumers were complaining about an incorrect mpg display. Consistent with this finding, the correlation for the word "gas" yielded a similar result, as shown in Figure 13, so we can assume that the words "estimates" and "misleading" refer to the same problem.
The barplot for worst features shows that consumers were unsatisfied with the spare tire ("spare" and "tire" were the most highly correlated pair, with a correlation of 0.89), noise on the road, fog lights, the trunk, mpg efficiency, the mpg display, and the seats. As mentioned previously, "mpg," "gas," "fuel," and "seat" also occurred in the results for best features. This apparent inconsistency could be accounted for by the differing preferences of individual consumers. Furthermore, based on the proportion of words in both groups, mpg and mileage are more likely to be considered poor rather than superior features, because the sum of occurrences in best features is 36, whereas in worst features it is 61.
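The comparison of occurrence sums can be made explicit as a share. The two counts are the ones reported in the text; the variable names are illustrative.

```python
best_count, worst_count = 36, 61  # occurrence sums reported in the text

# Share of fuel-related mentions that fall under worst features.
worst_share = worst_count / (best_count + worst_count)
print(f"{worst_share:.1%}")  # prints 62.9%
```

Roughly five of every eight fuel-related mentions are therefore complaints, which supports reading mpg and mileage as a weak point rather than a strength.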
In the case of Honda, we can see that one of the most frequent words in terms of worst features is "interior," as shown in Figure 14. The correlation graphic shows that this word is highly correlated with "cheap," as shown in Figure 15. Therefore, we can assume that some customers did not like the quality and appearance of the interior, viewing it as a drawback rather than an advantage. Additionally, "fabric" is correlated with "interior." Although the correlation is quite low, we can still assume that reviewers were unsatisfied with the material of the interior. Furthermore, some consumers were not satisfied with the back or front seats. Moreover, fog lights, mirrors, and noise on the road were among the worst features of the Honda Civic, as shown in Figure 14.
In the case of Ford, the worst features were "transmission," "seat," "back," "control," "fix," "issue," "shift," "rear," "wheel," "system," and so on. The most frequent word, "transmission," occurred 57 times, considerably more often than the other terms. To understand the transmission flaw in the Ford Focus, an association graph was built, as shown in Figure 16. Among the terms highly correlated with "transmission" were "severe," "grinding," "crunching," and "bucking." Therefore, we can assume there is a problem with the transmission, as it is perceived as making strange sounds and being inconvenient to use. In addition, the terms "issue," "fix," "manual," "problem," and "shift" also correlated with "transmission," which suggests that it is probably the most significant problem with the Ford Focus. Ford also seems to have problems in terms of technology. For instance, the terms "control" and "wheel" were highly correlated with "steering," "device," "equipment," "aux," "cruise," "dashboard," and other words, which can be interpreted as Ford exhibiting a deficiency in equipment. There are also consumers who are clearly not satisfied with the seats and space in the cab, both front and rear.
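The term correlations discussed in this section can be illustrated with a minimal sketch. It assumes that the correlation between two terms is the Pearson correlation of their per-review counts; the exact measure is not stated in this section, so this is an assumption. The function names and the sample reviews are hypothetical and are not taken from the study's data or its R scripts.

```python
import math
import re
from collections import Counter


def term_vectors(reviews, terms):
    """Return {term: [count in review 0, count in review 1, ...]}."""
    vectors = {t: [] for t in terms}
    for text in reviews:
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        for t in terms:
            vectors[t].append(counts[t])
    return vectors


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def associations(reviews, target, candidates, min_corr=0.0):
    """List (candidate, correlation with target), keeping values >= min_corr, highest first."""
    vectors = term_vectors(reviews, [target] + list(candidates))
    scored = [(c, pearson(vectors[target], vectors[c])) for c in candidates]
    return sorted((p for p in scored if p[1] >= min_corr), key=lambda p: -p[1])


# Made-up reviews, for illustration only.
reviews = [
    "the spare tire is missing",
    "no spare tire in the trunk",
    "road noise is loud",
    "mpg display on the onboard computer is wrong",
]
print(associations(reviews, "spare", ["tire", "noise", "display"]))
```

In this toy input, "tire" is the only candidate that co-occurs with "spare," so it is the only term kept; the other two candidates are negatively correlated and fall below the default threshold.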

Implications and Discussion
Based on the results, we can assume that the biggest strength of the Hyundai Elantra is its design. This is supported by the fact that the most frequent words are related to car appearance. These include "interior," "style," "exterior," "look," and "design." The worst features for Hyundai appear to be gas consumption and some problems with technology, such as the mpg display on the onboard computer, as well as problems with the spare tire. In contrast to the Hyundai, the biggest strengths of the Honda are low gas consumption, the dashboard, the controls on the steering wheel, and the "econ" mode, which improves fuel efficiency. The worst feature for Honda appears to be its interior, which reviewers described as cheap. Furthermore, they were unsatisfied with the material it was made of. In the case of Ford, the biggest strength appeared to be the handling of the car. This is consistent with the high frequency of the word "handles." The other best features for Ford were the interior and exterior, a quiet ride, and the "Ford Sync" system, which allows users to control automotive functions using their voice. The biggest disadvantage for Ford was found to be the transmission. This was supported by the high frequency of the word "transmission" and other frequent yet negative words, such as "issue," "fix," "manual," "problem," and "shift." Another important disadvantage relates to technology, specifically a problem with the controls on the steering wheel. In addition, in all three cases one of the most frequent words was "seat," which appeared in both the "best features" and "worst features" categories, suggesting that reviewers are divided in their opinions. Hence, it can be concluded that it is difficult for any of the three car brands to find favor in the eyes of all consumers.

Analysis of Car Features Using the Association Rule
According to Edmunds.com, consumers rate eight different features, each of which refers to certain terms and involves some form of standard conception. Otherwise, every individual might have a different conception of each feature relative to other individuals.

• Performance involves terms such as acceleration, braking, road holding, and shifting.
• Comfort relates to front seats, rear seats, getting in/out, and noise/vibration.
• Value involves fuel economy, maintenance cost, purchase cost, and resale value.
• Interior implies cargo/storage, instrumentation, interior design, and the logic of controls.
• Reliability relates to repair frequency, dealership support, engine, and transmission.
• Safety consists of headlights, outward visibility, parking aids, and rain/snow traction.
• Technology stands for entertainment, navigation, Bluetooth, and USB ports.
• Exterior stands for exterior design.
The question that arises is: how are the most frequent terms for each car related to the eight different features, and what is their frequency? To answer this question, both groups of reviews, which contain best features and worst features, were combined and analyzed using text mining tools. The aim was to determine the frequency of every word that occurs in the reviews. The process was conducted for all three cars: the Hyundai Elantra 2012, the Honda Civic 2012, and the Ford Focus 2012. The 24 most frequent terms were chosen as a sample and, using the same association approach as in previous research, the relationship between the terms and the eight features was determined. The results are presented in Table 2. Although the frequency of the most frequent terms was found, the total number of reviews differed between vehicles: 116 for the Hyundai Elantra, 156 for the Honda Civic, and 267 for the Ford Focus. Thus, we have to adjust the numbers to a common denominator to make the comparison between the three brands clearer. To do this, the following formula was applied:
$$F = \frac{\text{frequency of a given term}}{\text{total quantity of reviews for a given car}}$$
where $F$ is the approximate number of occurrences of a given term per review.
Thus, the frequencies of all words assigned to a specific feature were summed, and their totals before and after adjustment were determined.
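The adjustment and the per-feature summation can be sketched as follows. The review totals (116, 156, 267) come from the text; the term-to-feature assignment is a small illustrative subset consistent with the feature definitions above, and the raw counts in the usage example are hypothetical, since Table 2 is not reproduced here.

```python
# Total number of reviews per car, as reported in the text.
REVIEW_TOTALS = {"Hyundai Elantra": 116, "Honda Civic": 156, "Ford Focus": 267}

# Illustrative subset of a term -> feature assignment (an assumption, not Table 2).
TERM_FEATURE = {
    "seat": "Comfort",
    "noise": "Comfort",
    "engine": "Reliability",
    "transmission": "Reliability",
}


def adjusted_frequency(count, n_reviews):
    """F = raw frequency of a term / total number of reviews for the car."""
    if n_reviews <= 0:
        raise ValueError("n_reviews must be positive")
    return count / n_reviews


def feature_totals(term_counts, n_reviews, term_feature):
    """Sum raw and adjusted frequencies per feature.

    Returns {feature: (raw_total, adjusted_total)}; terms without a
    feature assignment are skipped.
    """
    totals = {}
    for term, count in term_counts.items():
        feature = term_feature.get(term)
        if feature is None:
            continue
        raw, adjusted = totals.get(feature, (0, 0.0))
        totals[feature] = (raw + count, adjusted + adjusted_frequency(count, n_reviews))
    return totals


# Hypothetical raw counts for one car, for illustration only.
example_counts = {"seat": 58, "noise": 29, "engine": 29}
print(feature_totals(example_counts, REVIEW_TOTALS["Hyundai Elantra"], TERM_FEATURE))
```

Dividing by the number of reviews is what makes the per-feature totals comparable across cars with different review counts; the raw totals are kept alongside so that both views remain available.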

Analysis Results for Eight Features
As shown in Table 3, the highest frequency was for terms related to comfort and interior features for Hyundai, comfort and technological features for Honda, and comfort for Ford (criterion: F ≥ 1). Therefore, reviewers were mostly interested in these features and discussed them most heatedly. These can now be scrutinized more closely for each case.
(1) Hyundai
If we compare the frequency of terms for interior with consumers' average rating score for interior, we see that terms related to interior are likely to appear more often in the satisfied group and in best features, because the score for interior is quite high. Therefore, Hyundai has won the favor of consumers with respect to its interior. The comfort score is 3.97, which is neither high nor low. This suggests there might be some factors reviewers were not satisfied with. Thus, a more precise analysis is needed. Another feature worth considering is the exterior. The frequency of terms related to the exterior is significantly higher for Hyundai than for Honda and Ford. It can therefore be assumed that the exterior is also the strongest feature for Hyundai. Its score is the highest among the scores for all features, and the likelihood that terms related to the exterior would mostly occur in the satisfied group and in best features is very high.
(2) Honda
Comparing the frequencies and scores for comfort and technology, it is very likely that terms related to these will occur mostly in the satisfied group and in best features, because the score for both features is quite high. Furthermore, the frequency and score for technology are the highest among all three automobile brands, which can be interpreted as Honda being a technological leader.
(3) Ford
Although the frequency of terms related to comfort is high, the score is quite low. Although it is not the lowest score compared to other features, the result suggests that such terms would appear in both satisfied and unsatisfied groups and in both best and worst features. Furthermore, reliability is also worth mentioning, because the frequency of terms related to this feature is much higher than for Hyundai and Honda. With a very low score for reliability, we can assume that these terms will mostly appear in worst features for both satisfied and unsatisfied groups.

Comparison of Two Groups' Reviews
In this section, we compare the reviews of both groups and identify the terms whose influence is greater than that of others. We also compare the differences between the satisfied and unsatisfied groups of reviewers. What, therefore, are the frequency and ratio of terms for the eight features in the satisfied and unsatisfied groups, and what are the implications?

Performance
As shown in Table 4, Hyundai consumers rarely mentioned words related to performance in comparison to Honda and Ford consumers. The satisfied group mentioned the words "control," "system," and "speed" more often than the unsatisfied group, although it is difficult to say they were definitely satisfied with these factors because these terms appeared in both best features and worst features. The frequency of these terms in the unsatisfied group is very low. Along with a fairly low score for performance, we can assume that reviewers were unsatisfied due to factors other than "control," "system," and "speed." In the case of Honda, consumers were generally satisfied with the performance because the most frequent words for this feature mostly appeared in best features, and, looking at the performance score of 4.29 in Table 3, we can assume they were significant to a certain degree. The result for Ford is ambiguous, but we can say with confidence that reviewers like how the car handles, as the term "handle" appeared much more frequently than other terms and, in 95% of cases, appeared in best features for both satisfied and unsatisfied groups.

Comfort
As shown in Table 5, Hyundai drivers felt comfortable in the car, and most were very satisfied with the seats. However, it seems that the car has some problems with noise, as the average score for comfort is low. Looking at the results for Honda, consumers who were satisfied with their Honda Civic purchase felt very comfortable in the cab and were pleased with the space provided, but it is likely that both satisfied and unsatisfied groups were unsatisfied with the back seats. In the case of Ford, there were people who found it comfortable and people who did not. The unsatisfied group did not discuss the comfort feature as much as the satisfied group, and it seems that individuals from the satisfied group liked neither the front nor the back seats. The most positive aspect of Ford mentioned by reviewers is the handling of the car.

Value
As shown in Table 6, most Hyundai owners were not satisfied with the fuel consumption of this car. Nevertheless, some of the satisfied group felt that Hyundai's mpg was not bad. The reason for this might depend on individual satisfaction levels in relation to mpg assessment. In the case of Honda, all terms related to fuel consumption constantly appeared in best features for both satisfied and unsatisfied groups. Occurring several times in worst features was the word minimal, which means Honda's mpg is very high and probably best among the three brands. Ford owners did not mention fuel consumption as much as owners of the other cars, but it is likely that Ford does not have any problems with fuel consumption and is arguably even better than Hyundai in terms of mpg. Hence, there are other factors that resulted in the low score for value.
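The group comparison used in this section (how often a term falls under best features versus worst features within the satisfied and unsatisfied groups) can be sketched as below. The record layout and all counts are hypothetical; the sketch only illustrates how a ratio such as the share of best-feature mentions could be computed, not how Tables 4 to 6 were produced.

```python
from collections import defaultdict


def best_feature_share(mentions):
    """Share of mentions that occur under best features, per (term, group).

    mentions: iterable of (term, group, section) tuples, where group is
    "satisfied" or "unsatisfied" and section is "best" or "worst".
    Returns {(term, group): fraction of that pair's mentions in "best"}.
    """
    best = defaultdict(int)
    total = defaultdict(int)
    for term, group, section in mentions:
        key = (term, group)
        total[key] += 1
        if section == "best":
            best[key] += 1
    return {key: best[key] / total[key] for key in total}


# Hypothetical mentions, for illustration only.
mentions = (
    [("handle", "satisfied", "best")] * 3
    + [("handle", "satisfied", "worst")]
    + [("handle", "unsatisfied", "best")]
    + [("handle", "unsatisfied", "worst")]
)
shares = best_feature_share(mentions)
print(shares)
```

A share close to 1 for both groups would indicate a term that is praised regardless of overall satisfaction, while a share near 0.5 would point to divided opinions.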

Interior
As shown in Table 7, the interior was most often discussed in the Hyundai case, and it is clear that reviewers from both satisfied and unsatisfied groups were greatly satisfied with this attribute. However, there is also a problem with the spare tire, which often appeared in worst features for both satisfied and unsatisfied groups. Opinions about the interior of the Honda were divided within the satisfied group, but the common element for both groups is that they liked the dashboard display. Additionally, the satisfied group often mentioned "room," which suggests they were satisfied with this feature. The interior results for Ford owners were good rather than bad, but it seems this was not the most important feature for reviewers.

Exterior
As shown in Table 8, in all three cases, customers were very satisfied with the exterior, especially Hyundai and Ford users. For instance, the occurrence of terms related to the exterior was very high for the Hyundai Elantra, and the term "exterior" never appeared in worst features in either of the two groups. In the case of Ford, relative to other features, the exterior was the only factor with which consumers were satisfied. It is also the only factor that has a high score in the Ford sample. However, looking at the frequency of the term, we can assume that it was not as hot a topic for discussion as it was for Hyundai. For the Honda Civic, the term "exterior" did not appear at all, which suggests it was not the main factor in determining whether customers purchased this car.

Reliability and Safety
As shown in Table 9, among the 24 most frequent terms for Hyundai and Honda, only one word, "engine," was related to reliability. There might be other words like "engine" that could influence a decision to rate reliability, but these did not appear among the most frequent words. Therefore, it is hard to determine the extent to which the term "engine" affected the reliability score, but many people from the satisfied group for Honda were satisfied with its engine and mentioned it a few times. In the Hyundai group, the term "engine" occurred almost equally in best features and worst features. In the satisfied group it occurred more often in best features, while in the unsatisfied group it appeared more frequently in worst features, which makes sense. Thus, we can assume there was an approximately equal number of people who were satisfied and unsatisfied with the engine.
Reliability was actively discussed in the Ford group. It is clear that Ford has serious problems in this field, mostly to do with the transmission. The frequency of the term "transmission" was 0.29, the highest among all terms in the reliability group and more than three times higher than the frequency of the term "engine" in the Hyundai and Honda groups. Furthermore, the frequency of the terms "manual," "automatic," "issue," and "shift" was also high. Because a link was found between these terms and "transmission," we can say that both satisfied and unsatisfied groups criticized the transmission, and the occurrence of these terms in total was 0.73, which is very high for just one specific part of a car. The low score of 3.19 in Table 3 is consistent with the results for frequency, so Ford must solve this problem in order to secure clients' trust.
In the case of safety, it is difficult to interpret the results, as there is only one word, "light," that, after association analysis, was correlated with several features such as "safety" and "technology." Hence, it would be a mistake to judge the significance of the relationship between the safety scores for all three car brands and the term "light," or its frequency in the satisfied and unsatisfied groups.

Technology
Among the three brands, technology was discussed most frequently by Honda owners, quite frequently by Ford owners, and least often in the Hyundai group, as shown in Table 10. Along with comfort, technology was the hottest topic for discussion in the Honda group. We can say with confidence that Honda owners from both the satisfied and unsatisfied groups greatly enjoyed using the steering wheel, Bluetooth, econ function, and inward system. Opinions about the dashboard varied, as there were reviewers in both the satisfied and unsatisfied groups who liked or did not like this feature. Hence, Honda consumers were very satisfied with the technological side, and the technological level is probably the highest among the three brands. In the case of Ford, satisfied and unsatisfied groups mentioned terms related to technology in both best features and worst features, and the ratio was quite similar. Indeed, one of the most frequent terms was "sync," which refers to Ford's special feature. Looking at the results, it seems that, regardless of group, some reviewers enjoyed using this system and some did not. Therefore, the technological side of the Ford Focus is worth paying attention to, but it is unclear whether it is beneficial or disadvantageous for the company. Given the very low score for technology in Table 3, we can presume that reviewers who mentioned these words in worst features evaluated it very negatively, while reviewers who mentioned these words in best features did not evaluate it highly. If we look at the results for Hyundai, we can say that consumers liked the Bluetooth system, but we cannot say the same about other terms. Therefore, we suppose there are other factors that resulted in the low score for technology.

Implications and Discussion
Based on the results of this research, the following propositions can be stated.
Firstly, among the three car brands, the Hyundai Elantra has the best marks in relation to "interior" and "exterior," but, in terms of the interior, there is a problem with the spare tire that needs to be solved. A few people were also unsatisfied with gas consumption, so it would be better for Hyundai's engineers to improve the mpg index. Furthermore, Hyundai has problems in terms of technology, one of which is an incorrect mpg display. Moreover, despite consumers' satisfaction with comfort, there was a problem with noise on the road. This seems to be the main reason for the relatively low score for comfort. Hence, Hyundai should reconsider the value and technological particularities of the Elantra to make it more competitive on the market.
Secondly, among the three car brands, the Honda Civic received positive feedback for all features and was found to be best in terms of value and technology. It has the best mpg index and the best technological equipment compared to Hyundai and Ford. Despite satisfaction with all features, Honda engineers should pay attention to the interior, because many consumers criticized it for its cheapness. In addition, Honda should consider the issue of comfort, because the back seats were also found to be a weak point.
Thirdly, the Ford Focus was evaluated very poorly regarding all features with the exception of the exterior, where it received a high score. However, according to the results, this was not the most discussed topic among reviewers. Among the 24 most frequent terms, negative terms were in the majority. These were related to the topic of reliability, where reviewers severely criticized the transmission and found this to be the biggest problem in the Ford automobile. Apart from problems with the transmission, most consumers were quite unsatisfied with both back and front seats. The only feature reviewers were truly pleased with, according to the results, was the handling of the car. In previous research, reliability was found to be one of the most significant factors for buyers. Therefore, looking at the poor evaluation of this car, reviewers were greatly disappointed with its reliability and, for this reason, the scores for other features were slightly biased. Hence, Ford marketers and engineers must completely reconsider and reassess their car from all sides, starting with the reliability feature.
In addition, the results for several features, such as safety, were unclear and ambiguous, thereby making interpretation difficult. This can be explained by the lack of terms chosen for the analysis. To fill such blind spots, a more extended analysis is needed.

Conclusions
In this paper, consumer reviews of three competing automobiles (the Hyundai Elantra, the Honda Civic, and the Ford Focus) were examined. The results can be summarized as follows. Firstly, each car model was analyzed in terms of its best and worst features, thereby underlining the superior features of a given car as well as its problematic features. This was achieved by identifying the words appearing most frequently in the corresponding reviews. In terms of best features, the Hyundai Elantra has car design, seats, interior, Bluetooth, and steering control; the Honda Civic has low gas consumption, seats, dashboard, and technological equipment such as Bluetooth and "econ" mode; and the Ford Focus has handling, exterior, interior, a quiet ride, and the "Ford Sync" function. In terms of the worst features, the Hyundai Elantra has an incorrect mpg display, problems with the spare tire, noise on the road, fog lights, trunk, and mpg inefficiency; the Honda Civic has a cheap interior, seats, fog lights, mirrors, and noise on the road; and the Ford Focus has problems with transmission, seats, space in the cab, and controls on the steering wheel. For the Ford Focus, some features such as seats, gas consumption, dashboard, and "Ford Sync" systems were found in both best and worst features, which can be interpreted as a difference of opinion among reviewers.
Secondly, eight specific yet different features were analyzed using consumers' reviews of best and worst features. The results showed that consumers actively discussed the comfort feature for all three brands. In particular, Hyundai reviewers emphasized the interior and exterior, Honda reviewers were interested in technology, and Ford reviewers paid attention to reliability.
Thirdly, the ways in which the views of the satisfied and unsatisfied groups differed were analyzed. The results showed that Hyundai received the best marks in terms of design and interior but needs to reconsider the value and technology features to make the Elantra more competitive on the market. The Honda Civic does not have any critical issues relating to any factors. It has the best mpg index and technological equipment compared to Hyundai and Ford. However, Honda should consider the car's cheap interior and its comfort feature. In contrast, Ford should completely reconsider and reassess its car, starting with its reliability.
This paper has the following limitations. Firstly, even though the Edmunds website is one of the biggest online resources for automotive information, there was still a general lack of reviews. To obtain more significant results, a larger number of reviews will be needed. Furthermore, it would be useful to obtain data on sales figures for all three car models to make a comparison between sales and the results of this study. In addition, because only the 24 most frequent terms were chosen for the analysis, the results for some features, such as safety, were unclear and ambiguous, thereby making interpretation difficult. Hence, it is necessary to increase the number of terms in order to find more terms related to safety features.