In recent years, several companies have started selling DNA testing kits over the internet directly to consumers (DTC) [1]. Such “consumer genetics” companies communicate genetic test results to the customer without medical supervision and offer a wide range of predictions about the personal risk of developing common diseases, including cancer and autoimmune or cardiovascular diseases. Overall, “direct-to-consumer genetic testing” (DTC-GT) services are intended to guide the user toward more informed lifestyle choices. Usually, the process starts when a consumer orders a kit online for collecting and shipping a saliva sample; the company then extracts the consumer’s DNA and assesses the presence or absence of specific genetic variants known to be associated with, for example, an increased disease risk, a trait of interest or various health conditions. Several companies have focused their services on health-related outcomes such as fitness (e.g., performance and injury tests) [2], pharmacogenetics (e.g., personalized treatment) [3] and nutrigenetics (e.g., weight control, and food intolerance and sensitivity) [4].
The continuous growth of DTC-GT companies is fueled both by the ongoing dramatic drop in DNA sequencing and genotyping costs and by the availability of a wealth of human genetic variation data. Furthermore, in 2003 the Human Genome Project (HGP, a 13-year project formally begun in 1990 and coordinated by the National Institutes of Health and the U.S. Department of Energy) [5] provided researchers with the full sequence of the human genome (the “reference genome sequence”), which allows them to define genetic variants (differences in DNA sequence among individuals) and to study their functional consequences.
The most common class of genetic variants, termed single nucleotide polymorphisms (SNPs), is represented by a single nucleotide change with respect to the reference sequence (for example, a cytosine, C, is replaced by a thymine, T) at a given position of the genome. So far, scientists have found more than 600 million SNPs in human populations around the world [6]. The second most abundant class of genetic variants is insertions and deletions (INDELs), where a person’s DNA sequence at a given position of the genome has more (insertions) or fewer (deletions) nucleotides than the reference sequence.
SNPs and INDELs, whether clinically relevant or neutral, are registered in a free public archive, the Single Nucleotide Polymorphism Database (dbSNP) [7]. A specialized subset of dbSNP entries is collected in ClinVar [8], a freely available, public archive of human genetic variants with proven or suspected clinical relevance. A unique identification tag, the so-called “rs identifier” or “rs ID”, is assigned to each genetic variant stored in dbSNP. For example, the ID rs1801280 corresponds to a genetic variant that changes the amino acid sequence of the protein encoded by the gene NAT2 (N-acetyltransferase 2; see [9]). Rs identifiers are particularly useful when searching for information about a variant, because they are unambiguous, unique and stable; in contrast, descriptive names of genetic variants (based on their genomic position or on amino acid changes) can be more ambiguous, depending for example on the version of the human genome used as a reference.
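Because rs identifiers follow a simple pattern (the prefix “rs” followed by a number), a list of variant labels can be screened automatically for entries that lack an unambiguous identifier. The following is a minimal sketch and is not part of the original analysis; the function name and the descriptive labels in the example are hypothetical, and the check is purely syntactic (it does not confirm that an identifier actually exists in dbSNP).

```python
import re

# Assumed pattern: "rs" followed by a positive integer (e.g., rs1801280).
RS_ID_PATTERN = re.compile(r"rs[1-9]\d*")


def is_rs_id(label: str) -> bool:
    """Return True if the label has the form of a dbSNP rs identifier."""
    return RS_ID_PATTERN.fullmatch(label.strip()) is not None


# Hypothetical mix of labels, as they might appear in a company report.
labels = ["rs1801280", "NAT2 missense variant", "rs9939609", "FTO intron 1 variant"]

# Keep only the labels that can be looked up unambiguously.
unambiguous = [label for label in labels if is_rs_id(label)]
```

In this example only the two rs identifiers survive the filter; the descriptive labels would need to be mapped to a reference genome version before they could be interpreted.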
The majority of common genetic variants have little or no effect on health or development. However, in certain cases, they can influence an individual’s response to certain drugs [10], or increase the risk of developing a complex disease such as type 1 diabetes [11] or Alzheimer’s disease [12].
Genome-wide association studies (GWAS) are a relatively recent and effective way to identify genetic variants and, consequently, genes associated with disease risk and human quantitative traits (such as body mass index or blood levels of a metabolite). GWAS can be based on SNP array data or on the full genomic sequence; in the second case, variants can be obtained either by directly sequencing individuals or by statistically inferring (“imputing”) the genotypes not observed in array data from a reference panel of sequenced genomes [13]. Large sample sizes are required to estimate allele frequencies accurately and, consequently, to obtain statistically robust findings. When a large number of genetic markers are tested in the same experiment, the results need to be corrected for multiple testing; for GWAS in particular, associations are considered reliable when they reach the accepted genome-wide significance threshold, conventionally a p-value < 5 × 10⁻⁸. Despite the success of GWAS in identifying variant–trait associations and their application to clinical analysis, several limitations need to be taken into account when interpreting the results. One of the most relevant limitations is that, in each genetic region, multiple variants can be correlated (they are in “linkage disequilibrium”): on the one hand, this facilitates the detection of an association but, on the other hand, it makes it difficult to identify the “causal” variant(s), i.e., the variant(s) actually responsible for the signal. As a result, the underlying biological mechanism often remains unknown. Also, several studies have shown that not only allele frequencies but also biological effects can differ between populations of distinct ancestries: this adds a further level of complexity to identifying the underlying biological mechanisms, which can differ, for example, between Europeans and Asians or Africans [14].
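The conventional genome-wide threshold is commonly motivated as a Bonferroni correction of a 0.05 significance level for roughly one million independent common variants. The sketch below illustrates that arithmetic; the number of independent tests is an approximation assumed here for illustration, and the function names are hypothetical.

```python
ALPHA = 0.05                      # nominal significance level
N_INDEPENDENT_TESTS = 1_000_000   # assumed number of independent common variants


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


GENOME_WIDE_THRESHOLD = bonferroni_threshold(ALPHA, N_INDEPENDENT_TESTS)


def is_genome_wide_significant(p_value: float,
                               threshold: float = GENOME_WIDE_THRESHOLD) -> bool:
    """Return True if an association p-value passes the genome-wide threshold."""
    return p_value < threshold
```

Under these assumptions the threshold evaluates to approximately 5 × 10⁻⁸, so an association with p = 10⁻⁹ would pass while one with p = 10⁻⁶ would not.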
At present, the success of GWAS is reflected in the large number of common variants associated with complex diseases that are freely available in dedicated databases, such as the GWAS Catalog (https://www.ebi.ac.uk/gwas/), in particular for neurological, immunological and cardiovascular diseases [15]. In addition, a large number of genetic variants associated with clinical, biochemical and anthropometric traits are available. For example, as of 16 December 2019, the GWAS Catalog contained 4346 publications and 166,103 genetic associations.
In a recent paper, Zenin et al. [17] identified 35 traits with a significant and large genetic correlation to health-span, which can be classified into four clusters: (1) sociodemographic factors, lifespan, smoking and coronary artery disease; (2) high-density lipoprotein (HDL)-related traits; (3) obesity-related disease and body mass index (BMI); and (4) type 2 diabetes-related traits. Therefore, there is a strong genetic correlation between health-span (the morbidity-free period of life), personal history and life traits, which might be influenced by physiological processes such as nutrition.
Nutritional genomics can be considered a branch of personalized medicine [4]. It is widely known that many genetic variants can influence the body’s metabolism and responses to nutrients (nutrigenetics) but, on the other hand, nutrients themselves can also modulate gene expression (nutrigenomics) [18]. Nutrigenetics and nutrigenomics might therefore be defined as two alternative approaches to nutritional genomics.
The identification of genetic variants associated with a specific diet-related disease or with a particular response to food will allow the development of dietary strategies and population-wide dietary recommendations [4]. A nutritional genomic approach is reliable only for some conditions, such as monogenic diseases (e.g., phenylketonuria, galactosemia and many other similar diseases), which are caused by alterations of a single gene product that can be specifically tested [19]. However, this is typically not the case for complex traits (e.g., obesity, vitamin levels) and diseases (e.g., type 2 diabetes, cardiovascular disease), which result from a combination of genetic and environmental factors. A key measure of the importance of genetic factors is the heritability (h²), the proportion of phenotypic variation of a trait that is due to genetic variation [20]. It should be noted that heritability does not represent the proportion of a trait determined by genes. Rather, a heritability of, for example, 0.7 means that 70% of the variability of the trait in a specific population is due to genetic differences among the individuals in that population. Each heritability estimate is specific to a population in a specific environment and, because it depends on the phenotypic variability, its value can change as the environmental conditions change [21]. An interesting example is BMI, whose heritability in higher-risk obesogenic home environments has been estimated to be about 86%, more than double that for those living in lower-risk obesogenic home environments (39%) [23].
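The dependence of heritability on the environment can be made concrete with a simplified additive model in which phenotypic variance is the sum of a genetic and an environmental component. This is only a sketch: it assumes no gene–environment interaction or covariance, and the variance values are hypothetical numbers chosen for illustration, not estimates from the cited studies.

```python
def heritability(genetic_variance: float, environmental_variance: float) -> float:
    """Proportion of phenotypic variance due to genetic variance (h^2).

    Simplifying assumption: phenotypic variance = genetic + environmental
    variance, with no interaction or covariance terms.
    """
    phenotypic_variance = genetic_variance + environmental_variance
    if phenotypic_variance <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return genetic_variance / phenotypic_variance


# Same (hypothetical) genetic variance measured in two different environments.
h2_uniform_environment = heritability(7.0, 3.0)    # little environmental variation
h2_variable_environment = heritability(7.0, 13.0)  # large environmental variation
```

With identical genetic variance, the estimate is about 0.70 in the first setting and about 0.35 in the second, which illustrates why a heritability value cannot be transferred from one population or environment to another.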
When traits and diseases are highly heritable (e.g., monogenic traits and disorders determined by one or a few variants), genetic testing will be accurate and very predictive. In contrast, when traits and diseases are only partially influenced by genetic factors and the heritability is low, the predictive ability of tests that consider only a single genetic variant will always remain limited [24]. Simply put, in the case of complex traits and diseases, carrying a genetic variant does not mean developing a certain phenotype.
The complexity of these concepts means that a typical consumer does not have the tools to judge the validity and reliability of these tests, unless DTC-GT companies transparently declare the origin of the predictors they use and the reliability of their advice.
Despite the complexity of these arguments, according to an MIT Technology Review article published in February 2019, “more than 26 million consumers added their DNA to four leading commercial ancestry and health databases” [26]. A recent analysis of DTC-GT companies performed by KPMG International estimated that this market would grow to over one billion USD by 2020 [27], and it may reach around six billion USD by 2028 [28].
A recent survey of a sample of the European population showed that 30% of individuals are strongly convinced that personalized genetic counseling would be effective in improving their eating behavior, yet felt that it would not be right to have to pay for such a service [29].
Within this context, the aim of the present work is to provide an overview of online nutrigenetics services, in order to understand which nutritional traits are analyzed by the various companies, what kinds of information are used to support their advice and how clearly that information is presented. In particular, we investigate whether unambiguous indications are available about the genes and genetic variants used for nutrigenetic predictions.
The DTC-GT companies examined here specialize in providing nutritional advice based on an individual’s genetic background. Consumers can, without a medical prescription, obtain information about their genetic predisposition to food-related diseases or traits simply by collecting saliva samples at home. Although the nutritional advice provided might lead the consumer to adopt a better lifestyle, it should be properly explained and supported by scientific evidence. The present study focused on the analysis of the nutrigenetic services sold by 45 companies spread throughout the world. Interestingly, the most frequently analyzed traits of nutritional interest were lactose intolerance and caffeine sensitivity. Since these are monogenic food-associated conditions, more reliable nutritional advice can be provided to consumers for them [4].
Predisposition to obesity, type 2 diabetes and cardiovascular disease is also widely analyzed by DTC-GT companies, although it is important to note that many factors that are not currently accounted for (such as multiple genetic variants with generally very small effects, environmental and lifestyle factors, and gene–environment interactions) are involved in the potential development of these diet-related diseases.
We observed that genetic analysis of lipid metabolism and weight management traits is provided by about one third of the companies. Several genetic studies have demonstrated an association between variants in the FTO (“fat mass and obesity-associated”) gene and obesity. For this reason, FTO is the most investigated gene, although little evidence supports the protective effect of specific nutritional protocols in individuals carrying FTO gene variants [4]. Notably, the most strongly associated SNP in this gene (rs9939609) lies in intron 1 and explains <1% of the phenotypic variance of BMI and fat percentage in Europeans [30]; the minor allele increases BMI by 0.39 kg/m² (or 1130 g in body weight) and is associated with a 1.20-fold increased risk of obesity [31]. This association has been confirmed across age groups and populations of diverse ancestry, although with different effect sizes in different populations [31]. According to our analysis, three companies use only rs9939609, one company uses five SNPs and six companies do not declare the identifiers of the FTO genetic variants they test. It is also important to note that, overall, 941 near-independent SNPs are associated with BMI at a genome-wide significance level [32], together explaining only 6.0% of its phenotypic variance. This means that, even though FTO is the locus with the largest known effect of common variants on interindividual variation in BMI, the ability to predict a person’s obesity risk based only on their FTO genotype is very limited.
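The two effect sizes quoted for the rs9939609 minor allele (0.39 kg/m² of BMI and about 1130 g of body weight) can be related through the definition of BMI as weight divided by height squared. The sketch below performs that conversion; the height of 1.70 m is an illustrative assumption of ours, not a value taken from the cited study, and the function name is hypothetical.

```python
def weight_change_kg(delta_bmi: float, height_m: float) -> float:
    """Weight change implied by a BMI change at a fixed height.

    Follows from BMI = weight / height^2, so delta_weight = delta_BMI * height^2.
    """
    if height_m <= 0:
        raise ValueError("height must be positive")
    return delta_bmi * height_m ** 2


ASSUMED_HEIGHT_M = 1.70  # illustrative assumption, not from the cited study
per_allele_kg = weight_change_kg(0.39, ASSUMED_HEIGHT_M)
```

For a person of this height the BMI effect corresponds to roughly 1.13 kg, in line with the body-weight figure reported above and small compared with typical variation in body weight between individuals.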
Among the micronutrients, vitamins, such as vitamin D, are the most frequently investigated by DTC-GT companies. Vitamin D is important in a wide range of physiological processes, and its deficiency has been related to different chronic diseases and metabolic conditions, including obesity. Nonetheless, comprehensive literature studies have indicated only a faint association between variants in genes involved in vitamin D metabolism and the obese phenotype [33], while the environment seems to play a major role [34]. The overall heritability of 25-hydroxyvitamin D serum concentrations attributable to common SNPs, as estimated in a recent large GWAS, is 7.5%, with the six susceptibility loci harboring genome-wide significant SNPs explaining 38% of this total [37]. The common variants tagged by GWAS chips therefore explain only a modest fraction of the overall variability in circulating 25-hydroxyvitamin D levels.
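As a rough worked calculation (our own arithmetic, using only the two figures quoted above), the share of the total variability in 25-hydroxyvitamin D levels that is captured by the genome-wide significant loci is approximately:

```latex
\[
0.38 \times 7.5\% \approx 2.9\%
\]
```

In other words, under this reading a test restricted to the significant loci would account for less than 3% of the variability in circulating vitamin D levels.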
With regard to scientific clarity, it should be noted that only 22 out of 45 companies provide a sample report on their website that clearly and comprehensively indicates the exact steps involved in the genetic analysis, with all the results provided to the consumer. The genetic risk level is frequently summarized in a table, using an attractive and straightforward color legend to indicate high, medium or low risk; however, allele variants are not always specified. Interestingly, the investigated traits are usually described and scientific references are linked, to give the consumer a general background and to make the results easier to interpret.
However, the most striking finding is that only 16 companies out of 45 (about one third of the total) state which genes or which genetic variants are employed for nutrigenetic predictions. Moreover, an unambiguous code (in particular the dbSNP identifier) is used for only 50% of the declared variants. For this reason, for most companies it is difficult to understand exactly which genetic variants have been used to make predictions, and as a consequence it is very hard to interpret the reports and evaluate their scientific reliability.
In addition, of the 64 variants reported with a dbSNP identifier, only half are significantly associated at the genome-wide level with at least one trait of nutritional interest. This means that about 50% of the genetic variants used for predictions show only weak evidence of association with nutritional traits, and the chance that these variants represent false-positive associations is very high.
It is also notable that predictions for traits with dozens or hundreds of known genetic associations (as in the case of body weight) are made on the basis of only a few genetic variants.
Moreover, none of the companies uses powerful statistical tools such as polygenic risk scores (PRS, also known as risk profile scoring, genetic scoring or genetic risk scoring). PRS combine multiple associated variants into a single score by weighting the number of risk alleles an individual carries at each variant by the variant’s estimated effect on the trait [38]; they can be constructed for any complex genetic phenotype for which appropriate GWAS (or other robust association) results are available. PRS for susceptibility are promising tools for identifying individuals at high risk who may be eligible for protective interventions, and their application could lead to more reliable information for consumers [24].
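In its simplest additive form, a polygenic risk score is the sum, over the selected variants, of the number of effect alleles an individual carries multiplied by the per-allele effect size estimated in a GWAS. The following minimal sketch illustrates this idea under stated assumptions: the function name is hypothetical, the identifiers "variant_A" and "variant_B" and their effect sizes are invented for illustration, and only the rs9939609 effect (0.39 kg/m² per minor allele) is taken from the text above. Real pipelines additionally handle linkage disequilibrium, strand alignment and ancestry, which are omitted here.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, int],
                         effect_sizes: Dict[str, float]) -> float:
    """Weighted sum of effect-allele counts (0, 1 or 2) across variants."""
    score = 0.0
    for variant_id, beta in effect_sizes.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            # Simplification: variants without a genotype are skipped.
            continue
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {variant_id}: {dosage}")
        score += beta * dosage
    return score


# Per-allele effects on BMI (kg/m^2); only rs9939609 comes from the text,
# the other two entries are hypothetical.
effect_sizes = {"rs9939609": 0.39, "variant_A": 0.10, "variant_B": 0.05}

# Hypothetical genotype of one individual, as effect-allele counts.
dosages = {"rs9939609": 1, "variant_A": 2, "variant_B": 0}

score = polygenic_risk_score(dosages, effect_sizes)
```

For this hypothetical individual the score is approximately 0.59 kg/m², the combined contribution of one FTO minor allele and two copies of the first hypothetical variant. A score based on rs9939609 alone would ignore the remaining variants entirely.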
To the best of our knowledge, the present study provides new insight into the characteristics and services of DTC-GT companies in the nutrigenetic field. Given the widespread proliferation of these services without medical supervision, evaluating their scientific support and making the information they provide easier to understand could represent a great improvement.