Nutritional phenotyping can help achieve personalized nutrition, and machine learning tools may offer novel means to achieve phenotyping. The primary aim of this study was to use the components of energy balance, namely input (dietary energy intake and macronutrient composition) and output (physical activity), to predict energy stores (body weight), as a way to evaluate the ability of machine learning tools to identify potential phenotypes based on these parameters. From the Women’s Health Initiative Observational Study (WHI OS), carbohydrates, proteins, fats, fibers, sugars, and physical activity variables, namely energy expended from mild, moderate, and vigorous intensity activity, were used to predict current body weight (both as body weight in kilograms and as a body mass index (BMI) category). Several machine learning tools were used for this prediction. Finally, cluster analysis was used to identify putative phenotypes. For the numerical predictions, the support vector machine (SVM), neural network, and k-nearest neighbor (kNN) algorithms performed modestly, with mean absolute errors (MAEs) of 6.70 kg, 6.98 kg, and 6.90 kg, respectively. For categorical prediction, SVM performed the best (54.5% accuracy), followed closely by the bagged tree ensemble and kNN algorithms. K-means cluster analysis improved prediction using numerical data and identified 10 clusters suggestive of phenotypes, with a minimum MAE of ~1.1 kg. A classifier was used to phenotype subjects into the identified clusters, with MAEs <5 kg for 15% of the test set (n = ~2000). This study highlights the challenges, limitations, and successes in using machine learning tools on self-reported data to identify determinants of energy balance.
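As a minimal illustration of the error metric reported above, the sketch below computes an MAE over a whole test set and then separately within each cluster, which is one way a per-cluster minimum MAE could be obtained. It assumes that MAE denotes the mean absolute error between actual and predicted body weight. The function names, cluster labels, and toy values are hypothetical and are not taken from the WHI OS data or from the study's code.

```python
from collections import defaultdict
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all subjects (here in kg)."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mae_by_cluster(actual, predicted, cluster_ids):
    """MAE computed separately within each cluster label."""
    grouped = defaultdict(lambda: ([], []))
    for a, p, c in zip(actual, predicted, cluster_ids):
        grouped[c][0].append(a)
        grouped[c][1].append(p)
    return {c: mean_absolute_error(a, p) for c, (a, p) in grouped.items()}


# Hypothetical toy values for illustration only (not study data).
actual_kg = [60.0, 72.0, 85.0, 68.0]
predicted_kg = [62.0, 70.0, 80.0, 69.0]
clusters = [0, 0, 1, 1]  # assumed cluster assignment per subject

overall = mean_absolute_error(actual_kg, predicted_kg)
per_cluster = mae_by_cluster(actual_kg, predicted_kg, clusters)
best_cluster = min(per_cluster, key=per_cluster.get)  # cluster with the lowest MAE
```

Reporting the metric per cluster rather than only overall shows whether some subgroups are predicted markedly better than the population as a whole.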
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.