Wineinformatics is a field that uses machine-learning and data-mining techniques to glean useful information from wine. In this work, attributes extracted from a large dataset of over 100,000 wine reviews are used to make predictions on two variables: quality based on a “100-point scale”, and price per 750 mL bottle. These predictions were built using support vector regression. Several evaluation metrics were used for model evaluation. In addition, these regression models were compared to classification accuracies achieved in a prior work. When regression was used for classification, the results were somewhat poor; however, this was expected since the main purpose of the regression was not to classify the wines. Therefore, this paper also compares the advantages and disadvantages of both classification and regression. Regression models can successfully predict within a few points of the correct grade of a wine. On average, the model was only 1.6 points away from the actual grade and off by about $13 per bottle of wine. To the best of our knowledge, this is the first work to use a large-scale dataset of wine reviews to perform regression predictions on grade and price.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited