Country-Level Relationships of the Human Intake of N and P, Animal and Vegetable Food, and Alcoholic Beverages with Cancer and Life Expectancy

Background: The quantity, quality, and type (e.g., animal and vegetable) of human food have been correlated with human health, although with some contradictory or neutral results. We aimed to shed light on this association by using integrated data at the country level. Methods: We correlated elemental (nitrogen (N) and phosphorus (P)) compositions and stoichiometries (N:P ratios), molecular (proteins) and energetic traits (kilocalories) of food of animal (terrestrial or aquatic) and vegetable origin, and alcoholic beverages with cancer prevalence and mortality and life expectancy (LE) at birth at the country level. We used the official databases of the United Nations (UN), the Food and Agriculture Organization of the United Nations (FAO), the Organization for Economic Co-operation and Development (OECD), the World Bank, the World Health Organization (WHO), the U.S. Department of Agriculture, the U.S. Department of Health, and Eurobarometer, while also considering other possibly involved variables such as income, mean age, or the human development index of each country. Results: The per capita intakes of N, P, protein, and total intake from terrestrial animals, and especially of alcohol, were significantly and positively associated with prevalence of and mortality from total, colon, lung, breast, and prostate cancers. In contrast, high per capita intakes of vegetable N, P, N:P, protein, and total plant intake exhibited negative relationships with cancer prevalence and mortality. However, a high LE at birth, especially in underdeveloped countries, was more strongly correlated with a higher intake of food, independent of its animal or vegetable origin, than with other variables, such as higher income or the human development index. Conclusions: Our analyses thus yielded four generally consistent conclusions.
First, the excessive intake of terrestrial animal food, especially the levels of protein, N, and P, is associated with a higher prevalence of cancer, whereas equivalent intake from vegetables is associated with a lower prevalence. Second, no consistent relationship was found between the food N:P ratio and cancer prevalence. Third, the consumption of alcoholic beverages correlates with prevalence of and mortality from malignant neoplasms. Fourth, in underdeveloped countries, reducing famine has a greater positive impact on health and LE than a healthier diet.


Empirical Bayesian Framework
In this section, we briefly summarize the key features of the empirical framework used. We assume that a scalar response variable $y_i$ (which contains, depending on the model, the standardized prevalence of cancer, the standardized cancer mortality, or life expectancy at birth) measured for country $i = 1, \dots, N$ arises from the following model:
$$y_i = x_i' \Lambda \beta + \varepsilon_i,$$
where $x_i$ is a $K$-dimensional vector of fundamental factors that include income and age structure determinants, observations of total, kilocaloric, phosphorus, and protein intake from vegetable, alcoholic, and terrestrial/aquatic animal sources, and an intercept term. $\Lambda$ is a $K \times P$ rotation matrix containing the three eigenvectors associated with the highest eigenvalues for each of the five clusters of covariates. $\beta$ is a vector of regression coefficients of dimension $P \times 1$, and $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ is a Gaussian shock with variance $\sigma^2$.
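As an illustration only, the construction of a rotation matrix from cluster-wise eigenvectors might be sketched as follows. The function name `cluster_rotation`, the use of the sample covariance matrix within each cluster, and the block layout of the matrix are assumptions made for this sketch; the text does not specify these details, and the treatment of the intercept is left out.

```python
import numpy as np

def cluster_rotation(X, clusters, n_components=3):
    """Build a K x P rotation matrix from cluster-wise eigenvectors.

    X          : (N, K) array of covariates.
    clusters   : list of lists of column indices, one list per cluster
                 (hypothetical grouping, supplied by the user).
    Returns a (K, P) array with P = n_components * len(clusters).
    """
    n_obs, k = X.shape
    rotation = np.zeros((k, n_components * len(clusters)))
    for c, cols in enumerate(clusters):
        cols = np.asarray(cols)
        if cols.size < n_components:
            raise ValueError("each cluster needs at least n_components columns")
        # Sample covariance of the covariates in this cluster (an assumption).
        cov = np.cov(X[:, cols], rowvar=False)
        # eigh returns eigenvalues in ascending order, so reverse the columns
        # to put the eigenvectors with the largest eigenvalues first.
        _, eigvec = np.linalg.eigh(cov)
        top = eigvec[:, ::-1][:, :n_components]
        out_cols = np.arange(c * n_components, (c + 1) * n_components)
        rotation[np.ix_(cols, out_cols)] = top
    return rotation
```

With five clusters and three eigenvectors per cluster, `X @ cluster_rotation(X, clusters)` yields 15 rotated regressors, which then enter the regression in place of the original covariates.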
This model can easily be estimated by maximum likelihood. However, since one of the goals of this study is to analyze the driving forces that determine cancer prevalence rates across countries, we need a more flexible approach that (a) allows us to assess uncertainty with respect to the underlying structural model and (b) enables robust estimation when the number of observations is small relative to the number of covariates. Through flexible prior specifications, the Bayesian approach allows us to control for model uncertainty, which makes it feasible to estimate large models with only a moderate number of observations.
To set the stage, we assume that each element of $\beta$, $\beta_j$, arises from a mixture of Gaussian distributions. This prior, labeled the stochastic search variable selection (SSVS) prior (see George & McCulloch, 1993; 1997), is given by:
$$\beta_j \mid \delta_j \sim \mathcal{N}(0, \tau_1^2)\,\delta_j + \mathcal{N}(0, \tau_0^2)\,(1 - \delta_j),$$
whereby $\tau_1^2 \gg \tau_0^2$ denote prior scaling parameters, where $\tau_0^2$ is specified to be close to zero, and $\delta_j$ denotes an indicator variable that follows a Bernoulli distribution with prior inclusion probability $p_0$. In the empirical application, $\tau_1^2 = 10^2$ and $\tau_0^2 = 10^{-4}$, while $p_0 = 1/2$. This specification implies that if $\delta_j = 1$, a Gaussian prior with a large prior variance is used for $\beta_j$, with little weight attached to the prior information (i.e., inclusion of the corresponding element in $x_i' \Lambda$). This component of the mixture distribution is commonly referred to as the 'slab' distribution. By contrast, if $\delta_j = 0$, the prior variance is close to zero and the corresponding element in $\beta$ is pushed to zero. We refer to this component as the 'spike' distribution. The posterior distribution of the $\delta_j$ can be used to infer which covariates determine cancer prevalence rates across regimes.
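To make the spike-and-slab structure concrete, the following minimal sketch draws coefficients from the mixture prior using the hyperparameter values stated above (slab variance 10^2, spike variance 10^-4, prior inclusion probability 1/2). The function name `draw_ssvs_prior` and its arguments are hypothetical and serve illustration only.

```python
import numpy as np

def draw_ssvs_prior(size, tau1_sq=10.0**2, tau0_sq=1e-4, p0=0.5, seed=0):
    """Draw (beta, delta) pairs from the SSVS mixture prior.

    delta ~ Bernoulli(p0); beta | delta = 1 ~ N(0, tau1_sq)  (slab),
                           beta | delta = 0 ~ N(0, tau0_sq)  (spike).
    """
    rng = np.random.default_rng(seed)
    delta = rng.random(size) < p0
    # Pick the prior standard deviation according to the indicator.
    sd = np.sqrt(np.where(delta, tau1_sq, tau0_sq))
    beta = sd * rng.standard_normal(size)
    return beta, delta
```

Draws with the indicator equal to zero are concentrated tightly around zero (standard deviation 0.01), whereas draws with the indicator equal to one are spread widely (standard deviation 10), which is what lets the indicators act as a variable selection device.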
The remaining priors are standard in the literature. On $\sigma^2$, we use an inverted Gamma prior specified to be weakly informative, while we use a Gaussian prior with zero mean and a large prior variance on .
Model estimation is carried out using a Markov chain Monte Carlo (MCMC) algorithm. This algorithm cycles through the full conditional posterior distributions, iteratively sampling $\beta$ from a Gaussian posterior density, $\sigma^2$ from an inverse Gamma posterior distribution, and the indicators $\delta_j$ from a Bernoulli distribution. The posterior moments of all quantities except take standard forms and are, for the sake of brevity, not repeated here.
Note: Estimates in bold are statistically significant at the 95% level. "Sign" denotes the posterior sign certainty of a covariate in the model. The coefficients in the Bayesian models were interpreted using the sign certainty of each covariate: if the sign certainty is above 97.5%, the coefficient is interpreted as significant. The "median" column contains the posterior median of each coefficient, which describes the median increase in the dependent variable (e.g., cancer prevalence) in response to a one-unit increase in the explanatory variable (e.g., N/P intake). The "Std. Dev." column contains the corresponding posterior standard deviations of the coefficients.
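The sampling cycle and the sign certainty measure described in this section can be illustrated with a minimal Gibbs sampler. This is a sketch under stated assumptions and not the authors' implementation: the conditional distributions are the textbook ones for a Gaussian linear model with an SSVS prior, the inverse Gamma hyperparameters `a0` and `b0` are placeholder values chosen here, all coefficients (including any intercept column) receive the SSVS prior, sign certainty is taken to be the larger of the posterior shares of positive and negative draws, and all function names are hypothetical. `Z` stands for the matrix of rotated regressors.

```python
import numpy as np

def ssvs_gibbs(y, Z, n_iter=2000, burn=500, tau1_sq=10.0**2, tau0_sq=1e-4,
               p0=0.5, a0=0.01, b0=0.01, seed=0):
    """Gibbs sampler for y = Z beta + eps with an SSVS prior on beta."""
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    ZtZ, Zty = Z.T @ Z, Z.T @ y
    delta = np.ones(p, dtype=bool)   # start with all covariates included
    sigma_sq = 1.0
    keep_beta = np.empty((n_iter - burn, p))
    keep_delta = np.empty((n_iter - burn, p))
    for it in range(n_iter):
        # 1) beta | delta, sigma^2: Gaussian with precision Z'Z/sigma^2 + D^{-1}
        prior_prec = 1.0 / np.where(delta, tau1_sq, tau0_sq)
        V = np.linalg.inv(ZtZ / sigma_sq + np.diag(prior_prec))
        V = (V + V.T) / 2.0          # enforce symmetry before Cholesky
        m = V @ Zty / sigma_sq
        beta = m + np.linalg.cholesky(V) @ rng.standard_normal(p)
        # 2) sigma^2 | beta: inverse Gamma, drawn as 1 / Gamma(shape, 1/rate)
        resid = y - Z @ beta
        shape = a0 + n / 2.0
        rate = b0 + resid @ resid / 2.0
        sigma_sq = 1.0 / rng.gamma(shape, 1.0 / rate)
        # 3) delta_j | beta_j: Bernoulli, comparing slab and spike densities
        log_slab = np.log(p0) - 0.5 * np.log(tau1_sq) - 0.5 * beta**2 / tau1_sq
        log_spike = np.log1p(-p0) - 0.5 * np.log(tau0_sq) - 0.5 * beta**2 / tau0_sq
        prob_incl = 1.0 / (1.0 + np.exp(log_spike - log_slab))
        delta = rng.random(p) < prob_incl
        if it >= burn:
            keep_beta[it - burn] = beta
            keep_delta[it - burn] = delta
    return keep_beta, keep_delta

def sign_certainty(draws):
    """Percentage of posterior draws sharing the dominant sign, per column."""
    share_pos = (draws > 0).mean(axis=0)
    return 100.0 * np.maximum(share_pos, 1.0 - share_pos)
```

From the retained draws, `np.median(keep_beta, axis=0)` and `keep_beta.std(axis=0)` would correspond to the "median" and "Std. Dev." columns, `sign_certainty(keep_beta)` to the "Sign" column, and `keep_delta.mean(axis=0)` to the posterior inclusion probabilities.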