Dining is an essential part of human life. In pursuit of better health, more and more people are cooking at home, and the number of recipe websites has increased significantly as a result. These online recipes represent the cultures and cooking methods of various regions and provide important indications of nutritional content. In recent years, advances in data science have made data mining a popular research area. However, only a few studies in Taiwan have applied data mining to recipes and nutrients. This work therefore aims to use machine learning models to discover health-related insights from recipes on social media. First, we collected over 15,000 Chinese recipes from the largest recipe website in Taiwan to build a recipe database. We then extracted information from this dataset using natural language processing methods to better understand the characteristics of various cuisines and ingredients, and used it to establish a classification model for the automatic categorization of recipes. We further performed cluster analysis on nutrients to identify the nutritional differences among clusters and among cuisine types. The results showed that a support vector machine (SVM) model can classify recipes with an average F-score of 82%. We also analyzed the nutritional value of different cuisine categories and their possible health effects on consumers. Our methods and findings can assist future work on extracting essential nutritional information from recipes and promoting healthier diets.
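To make the reported metric concrete, the following is a minimal sketch of how an average F-score over cuisine classes could be computed. It assumes an unweighted (macro) average of per-class F1 scores; the abstract does not state which averaging scheme was used. The function name `macro_f_score` and the cuisine labels are hypothetical, invented purely for illustration, and are not data from this study.

```python
from collections import Counter


def macro_f_score(true_labels, predicted_labels):
    """Return the unweighted mean of per-class F1 scores (assumed averaging)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    classes = sorted(set(true_labels) | set(predicted_labels))
    if not classes:
        return 0.0
    scores = []
    for c in classes:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, not taken from the paper's dataset.
truth = ["Chinese", "Chinese", "Japanese", "Italian", "Italian", "Japanese"]
guess = ["Chinese", "Japanese", "Japanese", "Italian", "Chinese", "Japanese"]
score = macro_f_score(truth, guess)
print(f"macro F-score: {score:.3f}")
```

A macro average treats every cuisine equally regardless of how many recipes it contains; a support-weighted average would instead favor the larger categories.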
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.