Understanding 21st Century Bordeaux Wines from Wine Reviews Using Naïve Bayes Classifier

Wine has been popular with the public for centuries, and the market offers a wide variety of wines to choose from. Among all wine regions, Bordeaux, France, is considered the most famous in the world. In this paper, we try to understand Bordeaux wines made in the 21st century through a Wineinformatics study. We developed and studied two datasets: the first contains all Bordeaux wines from 2000 to 2016; the second contains all wines listed in a famous collection of Bordeaux wines, the 1855 Bordeaux Wine Official Classification, from 2000 to 2016. A total of 14,349 wine reviews were collected in the first dataset and 1359 wine reviews in the second. To understand the relation between wine quality and characteristics, a Naïve Bayes classifier is applied to predict the quality class (90+/89−) of each wine. A Support Vector Machine (SVM) classifier is applied as a comparison. In the first dataset, the SVM classifier achieves the best accuracy of 86.97%; in the second dataset, the Naïve Bayes classifier achieves the best accuracy of 84.62%. Precision, recall, and F-score are also used to describe the performance of our models. Meaningful features associated with high-quality 21st century Bordeaux wines are presented in this paper.


Introduction
The ancient beverage, wine, has remained popular in modern times. While the ancients mostly had wine available from neighboring vineyards, the number and variety of wines available for purchase have exploded in modern times. Consumers are confronted with a seemingly endless number of varieties and flavors. Examples include red wine, white wine, rosé wine, and starch-based wine, which are in turn made from a variety of grapes, other fruits such as apples, and berries. For a non-expert unfamiliar with the nuances that make each brand distinct, the complexity of decision making has vastly increased. In such a competitive market, wine reviews and rankings matter a lot, since they become part of the heuristics that drive consumers' decision making. Producers of wine gain a competitive advantage by knowing which factors contribute the most to quality as determined by rankings. What has also changed is the amount of data available. Moore's law and other advances in computing have allowed for the collection and analysis of vast amounts of data. Data mining is the use of statistics, algorithms, and other analytical tools to uncover useful insights in these data. The goal of data mining is to gain predictive or descriptive information in the domain of interest. To help producers better understand the determinants of wine quality, we applied these data mining techniques to two datasets of wines produced in the Bordeaux region. This region is the largest wine-producing district in France and one of the most influential wine districts in the world.
There is a lot of research that focuses on the price and vintage of Bordeaux wines [1][2][3] from historical and economic data. Shanmuganathan et al. applied decision tree and statistical methods

Since the wine reviews are stored in human-language format, we have to convert them into a machine-understandable form via the computational wine wheel [23]. The computational wine wheel works as a dictionary for one-hot encoding, converting the words of a review into a vector. For example, a wine review may contain words that name fruits, such as apple, blueberry, plum, etc. If a word in the review matches an attribute in the computational wine wheel, that attribute is set to 1; otherwise, it is 0. More examples can be found in Figure 1. Besides fruit flavors, the computational wine wheel includes many other wine characteristics, such as descriptive adjectives (balance, beautifully, etc.) and the body of the wine (acidity, level of tannin, etc.). The computational wine wheel is also equipped with a generalization function that maps similar words to the same code. For example, fresh apple, apple, and ripe apple are generalized into "Apple" since they represent the same flavor; yet green apple belongs to "Green Apple" since the flavor of green apple is different from that of apple.

In this research, in order to understand the characteristics of classic (95+) and outstanding (90-94) wines, we use 90 points as the cutting point. If a wine receives a score equal to or above 90 points out of 100, we label the wine as the positive (+) class. Otherwise, we label it as the negative (−) class. Some wines received a ranged score, such as 85-88. In that case, we use the average of the range to assign the label.
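As a rough illustration of the encoding and labeling steps described above, the following Python sketch uses a tiny, made-up stand-in for the computational wine wheel. The `WHEEL` entries and the function names `encode_review` and `label_from_score` are assumptions for illustration only; the real wheel contains many more attributes and categories, and ranged scores are assumed here to be written with a plain hyphen.

```python
import re

# Toy stand-in for the computational wine wheel: phrase -> normalized attribute.
# These entries are illustrative assumptions, not the actual wheel.
WHEEL = {
    "green apple": "GREEN APPLE",
    "fresh apple": "APPLE",
    "ripe apple": "APPLE",
    "apple": "APPLE",
    "blueberry": "BLUEBERRY",
    "plum": "PLUM",
}
ATTRIBUTES = sorted(set(WHEEL.values()))


def encode_review(text):
    """Return a 0/1 vector over ATTRIBUTES for one review."""
    remaining = text.lower()
    found = set()
    # Match longer phrases first so "green apple" is not also counted as "apple".
    for phrase in sorted(WHEEL, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, remaining):
            found.add(WHEEL[phrase])
            remaining = re.sub(pattern, " ", remaining)
    return [1 if a in found else 0 for a in ATTRIBUTES]


def label_from_score(score):
    """Map a score such as 92 or "85-88" to "+" (average >= 90) or "-"."""
    parts = [float(p) for p in str(score).split("-")]
    return "+" if sum(parts) / len(parts) >= 90 else "-"
```

With this toy wheel, a review mentioning "ripe apple" and "green apple" sets both the APPLE and GREEN APPLE attributes, while a review mentioning only "green apple" leaves APPLE at 0, which mirrors the generalization behavior described in the text.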

Datasets
We developed two datasets in this research. The first consists of the reviews for all the Bordeaux wines made in the 21st century (2000-2016). The second consists of the reviews for all available wines listed in the Bordeaux Wine Official Classification, also made in the 21st century (2000-2016). The second dataset is a subset of the first. All the available wine reviews were collected from Wine Spectator. Details of each dataset are discussed below.
Beverages 2020, 6, 5

ALL Bordeaux Wine Dataset
A total of 14,349 wines were collected. There are 4263 90+ wines and 10,086 89− wines, so 89− wines outnumber 90+ wines. The score distribution is given in Figure 2a. Most wines score between 86 and 90 and therefore fall into the category of "Very Good" wine. In Figure 2b, a line chart shows the number of wines reviewed in each year. The chart also reflects the quality of vintages. More than 1200 wines were reviewed in 2009 and 2010, which suggests that 2009 and 2010 are good vintages in Bordeaux, since wine makers are more willing to send their wines to be reviewed if their wines are good.

1855 Bordeaux Wine Official Classification Dataset
A total of 1359 wines were collected. In this dataset, we have 882 90+ wines and 477 89− wines. The score distribution is given in Figure 3a. Unlike the first dataset, which has many more 89− wines than 90+ wines, the wines selected for this dataset are elite choices based on the Bordeaux Wine Official Classification of 1855 (a complete list of the classification is given in Appendix A). Therefore, classic (95+ points) and outstanding (90-94 points) wines form the majority of this dataset. The number of wines reviewed annually is given in Figure 3b. Since the Bordeaux Wine Official Classification of 1855 is a famous collection of Bordeaux wines, wine makers send their wines for review almost every year. Therefore, the line chart remains stable, which is very different from Figure 2b. Nevertheless, some wines listed in the Bordeaux Wine Official Classification of 1855 may still be missing reviews in Wine Spectator. A complete list of the wines and vintages that we could not find within this dataset's scope is given in Appendix B.


Classification Algorithms
The goal of this research is to find the important wine characteristics/attributes of 21st century Bordeaux wines in general. Applying white-box classification algorithms is one way to achieve this goal. In previous research, the Naïve Bayes classifier achieved the best accuracy among all applied white-box classification algorithms, while the Support Vector Machine (SVM) classifier, which belongs to the black-box family of classification algorithms, always had slightly better accuracy than Naïve Bayes [29]. Therefore, in this research, we applied the Naïve Bayes classifier to find the important wine characteristics/attributes of 21st century Bordeaux wines in general. We then applied the SVM classifier as a comparison to evaluate how well the Naïve Bayes classifier performs.

Naïve Bayes
Naïve Bayes is a commonly used machine learning classification algorithm. A Naïve Bayes classifier is a simple probabilistic classifier that applies Bayes' theorem while ignoring the dependency between features.

The formulas of the Naïve Bayes classifier algorithm are as follows [30].

Bayes' theorem:

$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$

where $P(Y \mid X)$ is the posterior probability that Y belongs to a particular class when X happens; $P(X \mid Y)$ is the conditional probability of a certain feature value X when Y belongs to a certain class; $P(Y)$ is the prior probability of Y; and $P(X)$ is the prior probability of X.

Naïve Bayes classifier:

$$P(Y \mid X_1, X_2, \ldots, X_n) = \frac{P(Y)\,\prod_{i=1}^{n} P(X_i \mid Y)}{P(X_1, X_2, \ldots, X_n)}$$

where c is the number of values in Y; the predicted class is the one of these c values with the largest posterior probability.

When a value of X never appears in the training set for a given class, the estimated probability $P(X \mid Y)$ of that value will be 0. Without any correction, $P(Y \mid X_1, X_2, \ldots, X_n)$ will then be 0, even when the probabilities of the other X are very high. This is not fair to the other X. Therefore, we use Laplace smoothing to handle zero probabilities.
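To make the role of Laplace smoothing concrete, here is a minimal Python sketch of a Naïve Bayes classifier over 0/1 attribute vectors. It is not the implementation used in this paper; the function names, the add-one smoothing constant `alpha`, and the treatment of each attribute as a two-valued feature are assumptions chosen for illustration.

```python
import math


def train_nb(X, y, alpha=1.0):
    """Fit Naive Bayes on 0/1 feature vectors with Laplace smoothing."""
    classes = sorted(set(y))
    n_features = len(X[0])
    model = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        # Smoothed estimate of P(X_i = 1 | Y = c). The "2 * alpha" term
        # accounts for the two possible values (0 and 1) of each feature.
        p_one = [
            (sum(r[i] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for i in range(n_features)
        ]
        model[c] = (prior, p_one)
    return model


def log_posteriors(model, x):
    """Unnormalized log P(Y = c | x) for every class c."""
    scores = {}
    for c, (prior, p_one) in model.items():
        s = math.log(prior)
        for xi, p in zip(x, p_one):
            s += math.log(p if xi == 1 else 1.0 - p)
        scores[c] = s
    return scores


def predict(model, x):
    """Return the class with the largest posterior."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)
```

Because of the smoothing term, an attribute that never occurs in one class still receives a small non-zero probability, so a single unseen attribute cannot force the whole product to zero. Working with logarithms is a common design choice that avoids numerical underflow when many probabilities are multiplied.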

SVM
"SVM are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis" [31]. Based on the training data, SVM for classification builds a model by constructing "a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class (so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier". Once the model is built, test data are used to measure its prediction accuracy. SVM light [32] is the version of SVM that was used to perform the classification of attributes for this project.

5-fold cross-validation, illustrated in Figures 4 and 5, is used to evaluate the predictive performance of our models, especially on new data, which helps limit the effect of overfitting on the reported results. First, we shuffle the dataset randomly. Second, we group the wines into 90+ and 89− wines. Third, we split the 90+ group and the 89− group into 5 subsets each. Fourth, we combine the first subset of the 90+ group with the first subset of the 89− group into a new set, and repeat the same process for the rest. In this way, we split our dataset into 5 subsets with the same distribution as the original dataset. After data splitting, we use subset 1 as the testing set and the remaining subsets as the training set for fold 1; we use subset 2 as the testing set and the remaining subsets as the training set for fold 2; and we repeat the same process for the remaining folds.
To evaluate the effectiveness of the classification model, several standard statistical evaluation metrics are used in this paper. First, with 90+ as the positive class, we define True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) as:
TP: a 90+ wine that the model predicts as 90+.
TN: an 89− wine that the model predicts as 89−.
FP: an 89− wine that the model predicts as 90+.
FN: a 90+ wine that the model predicts as 89−.
For example, with 90 points as the cutting point, a TP from this research's perspective is "a wine that scores equal to or above 90 and that the classification model also predicts as equal to or above 90". In this research, we include the following evaluation metrics:
Accuracy: The proportion of wines that have been correctly classified among all wines, $(TP + TN)/(TP + TN + FP + FN)$. Accuracy is a very intuitive metric.
Recall: The proportion of 90+ wines that were identified correctly, $TP/(TP + FN)$. Recall explains the sensitivity of the model to 90+ wines.
Precision: The proportion of predicted 90+ wines that were actually 90+, $TP/(TP + FP)$.
F-score: The harmonic mean of recall and precision, $2 \cdot \text{Precision} \cdot \text{Recall}/(\text{Precision} + \text{Recall})$. F-score takes both recall and precision into account, combining them into a single metric.
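The evaluation procedure described above, a class-balanced 5-fold split followed by the four metrics, can be sketched in Python as follows. This is an illustrative sketch rather than the code used in the study: the function names and the fixed random seed are assumptions, and samples are dealt round-robin into the subsets, which yields the same per-class proportions as splitting each group into 5 parts.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k subsets that keep the class ratio."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)                      # step 1: shuffle
    groups = {}
    for i in indices:                         # step 2: group by class
        groups.setdefault(labels[i], []).append(i)
    subsets = [[] for _ in range(k)]
    for members in groups.values():           # steps 3-4: split and recombine
        for pos, i in enumerate(members):
            subsets[pos % k].append(i)
    return subsets


def train_test_splits(subsets):
    """Yield (train, test) index lists, using each subset once for testing."""
    for f, test in enumerate(subsets):
        train = [i for j, s in enumerate(subsets) if j != f for i in s]
        yield train, test


def evaluate(actual, predicted, positive="+"):
    """Compute accuracy, recall, precision, and F-score for one fold."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f_score": f_score}
```

In use, a model would be trained on each `train` list, applied to the matching `test` list, and the per-fold results from `evaluate` averaged over the 5 folds.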

ALL Bordeaux Wine Dataset
In the ALL Bordeaux Wine dataset, both the Naïve Bayes classifier and the SVM classifier achieve 85% accuracy or above. The SVM classifier achieves the highest accuracy of 86.97%. In terms of precision, the SVM classifier performs much better than the Naïve Bayes Laplace classifier. In terms of recall, the opposite holds: the Naïve Bayes Laplace classifier performs much better. The two classifiers have very close F-scores. Overall, SVM has better performance in terms of accuracy and F-score. Details can be found in Table 1.

1855 Bordeaux Wine Official Classification Dataset
In the 1855 Bordeaux Wine Official Classification dataset, both the Naïve Bayes classifier and the SVM classifier achieve 81% accuracy or above. The Naïve Bayes Laplace classifier achieves the highest accuracy of 84.62%. In terms of precision, both classifiers are around 86%; the SVM classifier achieves the highest precision of 86.84%. In terms of recall, the Naïve Bayes Laplace classifier reaches 90.02%. Combining precision and recall, the Naïve Bayes Laplace classifier has the highest F-score of 88.38%. Overall, Naïve Bayes Laplace performs better than the SVM classifier in this specific Bordeaux wine dataset. Details can be found in Table 2.

Comparison of Two Datasets
The SVM classifier achieves the best accuracy of 86.97% in the ALL Bordeaux Wine dataset; Naïve Bayes Laplace achieves the best accuracy of 84.62% in the 1855 Bordeaux Wine Official Classification dataset. The accuracies in both datasets are very close. However, compared to the second dataset, the models in the first dataset have relatively poor performance in terms of recall and F-score. This can be explained by the score distributions. In the first dataset, there are more 89− wines than 90+ wines, so the models can better identify 89− wines than 90+ wines. In the second dataset, there are more 90+ wines than 89− wines, so the models can better identify 90+ wines than 89− wines.
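For context, the class counts reported for the two datasets imply the accuracy that a trivial model would reach by always predicting the majority class. This baseline is not part of the experiments reported here; it is only a reference point computed from the counts given in the dataset descriptions (10,086 of 14,349 wines are 89− in the first dataset, and 882 of 1359 wines are 90+ in the second).

```latex
\[
\text{ALL Bordeaux: } \frac{10{,}086}{14{,}349} \approx 70.29\%,
\qquad
\text{1855 Classification: } \frac{882}{1359} \approx 64.90\%
\]
```

Both reported best accuracies (86.97% and 84.62%) lie well above these reference values.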

Visualization of 1855 Bordeaux Wine Official Classification Dataset
Taking advantage of Naïve Bayes on a small dataset, we developed a visualization of the Naïve Bayes classification result for the 1855 Bordeaux Wine Official Classification dataset in Figure 6a. In the figure, the horizontal axis is the probability that the sample is 90+, and the vertical axis is the probability that the sample is 89−. According to Bayes' theorem, a sample belongs to the class with the larger probability. Therefore, the line y = x is drawn as the decision boundary. Samples below the line are predicted as the positive class, and samples above it as the negative class. The points in blue actually belong to the 89− class, and the points in orange to the 90+ class. From this figure, we can read off the number of misclassified samples and distinguish false positive from false negative samples. Figure 6b


Top 20 Keywords
SVM is considered a black-box classifier, since its classification process is not explainable. Naïve Bayes, on the other hand, is a white-box classification algorithm, since each attribute has its own probability of contributing to the positive case and to the negative case. We extract the 20 keywords with the highest probabilities toward the 90+ and 89− classes from both datasets.
In the ALL Bordeaux Wine dataset, there are 11 common keywords that appear in both 90+ and 89− wines. Details can be found in Table 3. These common keywords represent the important wine characteristics/attributes of 21st century Bordeaux wines in general. Furthermore, our goal is to understand the important wine characteristics/attributes of 21st century classic and outstanding Bordeaux wines. Therefore, finding the distinct keywords between 90+ and 89− wines is our final goal. Details about the distinct keywords between 90+ and 89− wines in the ALL Bordeaux Wine dataset can be found in Tables 4 and 5. According to Table 4, fruity characters including "BLACK CURRANT", "APPLE", "RASPERBERRY", and "FIG" are favorable flavors for 21st century Bordeaux. Since Bordeaux is also famous for red wines that can age for many years, "SOLID" (shown in the Body category of Table 4) is preferred over "MEDIUM-BODIED" and "LIGHT-BODIED" (shown in the Body category of Table 5).
In the 1855 Bordeaux Wine Official Classification dataset, there are 11 common keywords that appear in both 90+ and 89− wines. Details can be found in Table 6. Compared with the common keywords of the ALL Bordeaux Wine dataset, 10 out of 11 are the same. "TANNINS_LOW" only appears in the ALL Bordeaux Wine dataset, and "SWEET" only appears in the 1855 Bordeaux Wine Official Classification dataset. Details about the distinct keywords between 90+ and 89− wines in the 1855 Bordeaux Wine Official Classification dataset can be found in Tables 7 and 8. Comparing the distinct keywords of 90+ wines across both datasets, "LONG", "BLACK CURRANT", "APPLE", and "FIG" appear in both datasets; "RANGE", "RIPE", "RASPERBERRY", "SOLID", and "LICORICE" only appear in the ALL Bordeaux Wine dataset; "STYLE", "LOVELY", "IRON", "TANNINS_LOW", and "SPICE" only appear in the 1855 Bordeaux Wine Official Classification dataset.
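The keyword comparison described above can be sketched in a few lines of Python. The sketch assumes that a trained Naïve Bayes model exposes, for each class, a mapping from keyword to its conditional probability; the function names and the dictionary layout are hypothetical and are not taken from the paper.

```python
def top_keywords(prob_by_keyword, n=20):
    """Return the n keywords with the highest probability, best first."""
    ranked = sorted(prob_by_keyword.items(), key=lambda kv: kv[1], reverse=True)
    return [keyword for keyword, _ in ranked[:n]]


def compare_classes(p_pos, p_neg, n=20):
    """Split the top-n keywords of two classes into common and distinct sets.

    p_pos and p_neg map each keyword to its probability given the 90+ class
    and the 89- class, respectively.
    """
    top_pos = top_keywords(p_pos, n)
    top_neg = top_keywords(p_neg, n)
    common = [k for k in top_pos if k in top_neg]
    only_pos = [k for k in top_pos if k not in top_neg]
    only_neg = [k for k in top_neg if k not in top_pos]
    return common, only_pos, only_neg
```

The `common` list corresponds to the kind of keywords reported in Tables 3 and 6, while `only_pos` and `only_neg` correspond to the distinct 90+ and 89− keywords of Tables 4 and 7 and Tables 5 and 8.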

Conclusions
In this research, we developed and studied two datasets: the first contains all Bordeaux wines from 2000 to 2016; the second contains all wines listed in a famous collection of Bordeaux wines, the 1855 Bordeaux Wine Official Classification, from 2000 to 2016. We used a Naïve Bayes classifier and an SVM classifier to predict wine quality based on wine reviews. Overall, the Naïve Bayes classifier works better than SVM in the 1855 Bordeaux Wine Official Classification dataset and slightly worse than SVM in the ALL Bordeaux Wine dataset. Also, thanks to the Naïve Bayes classifier, we were able to find the important wine characteristics/attributes of 21st century classic and outstanding Bordeaux wines. The lists of common attributes in Tables 3 and 6 identify the general wine characteristics of Bordeaux, while the lists of dominant attributes in Tables 4 and 7 (Tables 5 and 8) show the characteristic attributes of 90+ (89−) wines. Those characteristics/attributes can help producers improve the quality of their wines by allowing them to concentrate or dilute the wanted or unwanted characteristics during the wine making process. To the best of our knowledge, this is the first paper that gives a detailed analysis of all prestigious Bordeaux wines in the 21st century.
As future work, two follow-up questions can be raised: 1. Instead of a dichotomous (90+ and 89−) analysis, can the research use finer labels (classic, outstanding, very good, and good) to categorize these Bordeaux wines, or even perform a regression analysis [32]? 2. What characteristics/attributes make a Bordeaux wine classic (95+) instead of outstanding (90-94)? The first question can be studied as a multi-class problem in data science, since the computational model will be built for four different classes and produce important characteristics for each class. The second question is a typical highly unbalanced problem in data science: the number of wines that score 95+ is much smaller than the number of 94− wines. Regular computational models such as SVM and Naïve Bayes may not be able to identify the boundary between the two classes and may predict all testing wines as the majority class. How to amplify the minority class and obtain meaningful information is a big challenge in this type of question. Finally, we would like to address the limitation of our current research. Since the computational wine wheel was developed from Wine Spectator's Top 100 lists, the proposed research might have optimal results in the dataset collected from Wine Spectator's reviews. While several other wine experts in the field, such as Robert Parker Wine Advocate [33], Wine Enthusiast [34], and Decanter [35], may not agree with each other's comments, they can still agree on the overall score of a wine. The legendary Chateau Latour 2009 gives a great example [36]: every reviewer scores the wine either 100 or 99, yet their tasting notes are very different from each other. This would be our ultimate challenge in Wineinformatics research, as it involves true human language processing.