Systems Biology: New Approaches to Old Environmental Health Problems

The environment plays a pivotal role as a determinant of human health, and the presence of hazardous pollutants in the environment is often implicated in human disease. That pollutants cause human diseases, however, is often controversial because data connecting exposure to environmental hazards and human diseases are not well defined, except for some cancers and syndromes such as asthma. Understanding the complex nature of human-environment interactions and the role they play in determining the state of human health is one of the more compelling problems in public health. We are becoming more aware that the reductionist approach promulgated by current methods has not yielded, and will not yield, answers to the broad questions of population health risk analysis. If substantive applications of environment-gene interactions are to be made, it is important to move to a systems-level approach that takes advantage of advances in epidemiology and molecular genomics. Systems biology is the integration of genomics, transcriptomics, proteomics, and metabolomics together with computer technology approaches to elucidate environmentally caused disease in humans. We discuss the applications of environmental systems biology as a route to the solution of environmental health problems.


Introduction
The relationship between the external environment and human health was recognized by ancient societies. The Greek physicians Alcmaeon of Croton and Hippocrates are credited with hypotheses linking environment and health [1]. In Roman times it was known that a source of potable water was necessary for human health; thus, in addition to the building of aqueducts to supply drinking water, Roman laws concerning public health were severe and strictly enforced [2]. Remnants of the association between environment and disease survive to this day in the names of some diseases. Malaria, for example, literally means "bad air", which was associated with the onset of the disease. With the discovery that bacteria could cause disease, the Germ Theory of Disease was promulgated, largely from the work of Lister, Koch and Pasteur [3][4][5][6]. The germ theory recognized infectious agents of biological origin, such as bacteria and viruses, as the cause of much of human disease, subsequently leading to the discovery of antibiotics that control bacteria and the development of new regimens of immunization to control viral diseases [6][7][8]. With greater understanding of vector control and the use of antibiotics and vaccines, the ability to manage diseases increased, and the environment was largely overlooked as a causative agent of human disease.
With the elucidation of the structure of DNA in the early 1950s and the growth of molecular biology, study of the genetic basis of non-infectious diseases blossomed, and medicine placed great emphasis on genetics as a cause of disease [9][10][11][12]. In fact, chronic diseases for which no specific cause was known were largely attributed to genetics or even "bad genes" [13].

The Envirome
Awareness of the environment as an agent that affects human health gained momentum with the publication of popular press books, notably Silent Spring [14]. Incidents such as the one at Love Canal inspired the environmental movement, government action, and research into the environment and disease. The creation of the Environmental Protection Agency (EPA) and the National Institute of Environmental Health Sciences (NIEHS), an institute of the National Institutes of Health (NIH) [15,16], brought a focus on government-sponsored environmental health research. The presence of hazardous pollutants in the environment is now often implicated in human disease [17]. That pollutants cause human diseases, however, is often controversial because data connecting exposure to environmental hazards and human diseases are not well defined, except for some cancers and syndromes [18].
The complex nature of human-environment interactions and the role those interactions play in determining the state of human health are becoming more appreciated [19]. Observational epidemiology studies undertaken to assess potential causal relationships between exposure and human health are limited because excess disease occurrence is often small and difficult to identify [20,21].
The Human Genome Project was undertaken as an international collaboration to sequence the entire human genome [22]. It was found that the human genome consists of between 20,000 and 25,000 genes and about 3 billion base pairs, about 99.9% of which are identical across human populations [23]. It has been estimated that approximately 1,200 genes are responsible for about 1,600 diseases [24]. The term "genome" was originally defined by the German botanist Hans Winkler in the 1920s to refer to all genes within a set of chromosomes [25]. The term was expanded to mean all DNA in chromosomes, because it was found that genes comprise only 2 to 3 percent of the human genome [25]. Sequencing the human genome is the most ambitious and important effort in the history of biology. It was thought that sequencing the entire human genome would provide a complete genetic blueprint for human life, which would yield important insights into human health and development [26]. The genome sequence has provided many tools for researchers to ask questions that were not addressable before the Human Genome Project. While there is hope for improved medical care and public health resulting from the advances made by the Human Genome Project, the genome sequence is not yet used as widely in public health or medical practice as it is in research.
It was quickly realized that the sequence of the genome alone was not going to yield all the answers; thus we entered the post-genomic age, which focuses not only on the study of the genome but also on the products of the genome. This essentially follows the central dogma of molecular biology proposed by Watson and Crick more than 50 years ago [27], with the addition of active enzymes and metabolites (Figure 1). Thus, the genome (all DNA) gives rise to the transcriptome (all messenger RNA; mRNA), the proteome (all proteins in a cell, including enzymes) and the metabolome (all metabolites and enzymes that generate metabolites) in the cell, which taken together reflect human phenotypes. Here we include enzymes as part of the metabolome because metabolites are regulated by enzyme patterns.
The Human Genome Project yielded huge data sets containing large numbers of DNA sequences that are stored and analyzed on computers all over the world. These data are being sorted, annotated and developed in various ways using computer software to organize integrated maps of DNA involving genetic and physical information [28]. Recognition of the need to handle large data sets came early, when GenBank was established in 1982 [29,30]. This marriage of biology and computer technology led to the emergence of the new science of bioinformatics.
As the picture of environmentally caused diseases continues to emerge, we are gaining a greater appreciation that it is the interaction of the environment with our genes that leads to most disease states in humans. Sequencing the human genome served to underscore this. Understanding risks to human health in light of the human genome-environment interaction is one of the more compelling challenges in environmental public health [31,32]. With approximately 99.9% of human genomes being identical, the remaining 0.1% (or about 3 million base pairs) appears to dictate differences in susceptibility to environmental challenges among human populations. As a result, much research has focused on single nucleotide polymorphisms (SNPs), which are stable heritable changes abundant in the genome, as the source of human variation [33]. We are learning that it is not as simple as a single SNP alone; rather, it is differences in patterns of SNPs, called haplotypes, that may be at least partly responsible for differences among human populations in susceptibility to environmental conditions [32,34,35]. Active research to elucidate haplotype maps and patterns among different population groups is currently underway [36]. Haplotype mapping and pattern recognition are potentially powerful tools to identify populations at risk for environmentally caused diseases. Thus certain SNPs or groups of SNPs (haplotypes) confer susceptibility to disease on individuals in a population [37].
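The figure of about 3 million variable base pairs follows directly from the numbers quoted above, and the idea of a haplotype as a pattern of SNP alleles can be made concrete with a toy calculation. The following is a minimal sketch only: the SNP identifiers and the alleles assigned to each individual are invented for illustration, and each individual is represented by a single, already phased chromosome copy, which is a simplifying assumption.

```python
from collections import Counter

GENOME_SIZE_BP = 3_000_000_000   # approximate genome size quoted in the text
SHARED_FRACTION = 0.999          # fraction reported as identical across people


def variable_base_pairs(genome_size=GENOME_SIZE_BP, shared=SHARED_FRACTION):
    """Approximate number of base pairs that differ between human genomes."""
    return round(genome_size * (1 - shared))


# Hypothetical SNP sites, listed in chromosomal order (illustrative names only).
SNP_ORDER = ("snp1", "snp2", "snp3")


def haplotype(alleles):
    """Join the alleles at the ordered SNP sites into one haplotype string."""
    return "".join(alleles[site] for site in SNP_ORDER)


def haplotype_frequencies(individuals):
    """Return the relative frequency of each haplotype in a group."""
    counts = Counter(haplotype(alleles) for alleles in individuals)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


# Invented example data: four individuals, one chromosome copy each.
individuals = [
    {"snp1": "A", "snp2": "G", "snp3": "T"},
    {"snp1": "A", "snp2": "G", "snp3": "T"},
    {"snp1": "C", "snp2": "G", "snp3": "T"},
    {"snp1": "C", "snp2": "A", "snp3": "C"},
]

if __name__ == "__main__":
    print(variable_base_pairs())
    print(haplotype_frequencies(individuals))
```

In this toy group, two individuals share all three alleles and therefore the same haplotype, while the other two differ at one or more sites. Comparing such frequencies between exposed and unexposed groups, or between affected and unaffected groups, is the kind of pattern recognition that haplotype mapping aims to support.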
Because of our increased knowledge of genetics and genomics, it is now apparent that most diseases are not carried in our genes as deterministic factors; rather, our genomes carry variations that result in differences in susceptibility to disease across populations. So, with the sequencing of the human genome, there has been renewed interest in understanding the role of the environment as a cause of human disease [38]. Genes are expressed in response to the environment. Thus, when individuals in a population carry variations in the genome that result in altered expression of certain genes, disease results in susceptible populations [39,40].
Even with the availability of large sets of sequence data and genomic information, it is not yet possible to determine the role that exposure to the environment plays in affecting health outcomes such as birth defects, developmental deficiencies, chronic respiratory disease, multiple sclerosis, Parkinson's or Alzheimer's disease [41]. The term toxicogenomics has been applied to the study of gene-environment interactions [42]. However, that term is limited to consideration of pollutant chemicals and does not embrace the concept that the environment encompasses more than pollutant toxicants.
We use the term "Enviromics" to mean interactions of the complete environment, or envirome, with human genomes (Figure 2) [43]. The envirome encompasses every interaction between humans and the external environment. It includes where we live, what we eat, drink, or breathe, our socioeconomic status, behavior, social interactions, occupation, and exposure to pollutants. The concept of enviromics is all-encompassing in its scope: it seeks to understand how the envirome affects human health, both positively and negatively. To gain a full understanding of these interactions, new tools and approaches must be developed. The science of genetics has been a powerful tool in environmental public health practice to identify rare conditions and syndromes, chromosomal aberrations, birth defects, inborn errors in metabolism and reproductive errors, and as a tool for genetic counseling [44]. Genetics, however, is a linear science, which examines single genes, one at a time. A multidimensional approach is required to derive a more accurate assessment of the dynamic processes associated with living systems.

Figure 2: Indirect Environment-Gene Interaction:
Hormones and vitamins interact with the genome via ligand-activated transcription factors yielding a "normal" cellular response to maintain homeostasis. Environmental agents can mimic natural ligands or bind to other intracellular receptors that yield different information from homeostatic regulation. The result is an altered cellular response yielding an adverse health effect.

Environmental Systems Biology
Genomics looks at all the genes as a dynamic system, over time, to determine how they interact and influence biological pathways, networks and physiology, in a much more global sense than genetics. Thus, genomics shows great promise for identifying groups of genes involved in complex disorders to understand and intervene in environmentally caused diseases [45].
When considering environment-genome interactions as a factor in complex disease, we understand that the genome cannot be changed, at least for now. However, once the environmental factor in a disease has been identified, it is possible to reduce exposure or modify the lifestyle element involved [46,47]. Gene-envirome interactions can occur directly, when active metabolites interact with specific sites of the genome to yield mutations that could result in human disease [48]. Indirect interactions with the human genome can occur via intracellular receptors that act as ligand-activated transcription factors; these regulate gene expression to maintain cellular homeostasis, but can also interact with an environmental agent to cause harmful effects (Figure 2) [49]. This type of envirome-gene interaction may be more easily examined than direct interaction because markers of this type of interaction are numerous and easily measured before the onset of disease. Examples include expression of cytochrome P450 genes after exposure to environmental agents, such as the polyaromatic compound benzo[a]pyrene, that bind to the Ah receptor [50][51][52]. Epigenomic change brought about by exposure to environmental agents is another important example of indirect environment-gene interaction [53,54]. These changes, which are not considered mutations, result in silencing or enhancing of specific gene expression by hyper- or hypo-alkylation processes.

Figure 3: Simple Interaction Gene Regulatory Network:
In the simple model, three interacting genes form a network in a cell. Here Gene A activates Gene B. Gene B activates Gene A and Gene C, and Gene C inactivates Gene A. Thus several levels of regulation are possible with the three interacting genes.
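The behavior of the three-gene network described in this caption can be explored with a small Boolean simulation. This is a sketch under stated assumptions, not the model behind the figure: each gene is treated as simply on or off, all genes update at the same time, and the inactivating input from Gene C is assumed to override the activating input from Gene B. None of these choices is specified in the text.

```python
def step(state):
    """Advance the three-gene network by one synchronous update.

    state is a tuple (a, b, c) of on/off values for Gene A, Gene B and Gene C.
    Assumed rules (inhibition overrides activation):
      Gene A is on next if Gene B is on and Gene C is off.
      Gene B is on next if Gene A is on.
      Gene C is on next if Gene B is on.
    """
    a, b, c = (bool(x) for x in state)
    next_a = b and not c
    next_b = a
    next_c = b
    return (next_a, next_b, next_c)


def simulate(initial, n_steps):
    """Return the list of states visited, starting from the initial state."""
    trajectory = [tuple(bool(x) for x in initial)]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1]))
    return trajectory


if __name__ == "__main__":
    # Compare two starting conditions: only Gene A on, and Genes A and B on.
    for start in [(1, 0, 0), (1, 1, 0)]:
        print(start, "->", simulate(start, 4))
```

Under these assumptions, starting with only Gene A on leads to a repeating alternation between two states, whereas starting with Genes A and B on leads to all three genes switching off after a few updates. Even this very small network therefore shows more than one regime of behavior, depending on its starting state.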
Our ability to measure envirome-gene interactions has exceeded our understanding of the mechanisms of envirome-disease linkages. Current approaches to understanding risk to human health after environmental exposure are based on studies of single chemical exposures and limited health effects, or on single gene-environment interactions [55,56]. We are becoming more aware that the reductionist approach promulgated by traditional research methodology has not yielded, and will not yield, answers to the broad and most important questions of population health risk analysis [57].
The question most people have is "will the environment adversely affect my family's health?" This is obviously not an easy question to answer. There are many common chronic diseases for which we do not have a clear understanding of causes, etiology, gene involvement, or susceptibility, and we certainly do not have causal links [58]. These diseases are common in our society and include asthma, prostate and breast cancer, autism, Parkinson's disease, Crohn's disease, and diabetes. In addition, we lack knowledge of the molecular mechanisms of pathology of diseases caused by exposure to lead, mercury, or pesticides of various kinds. This is true in spite of a large body of research that has tried to pick apart those diseases and exposures. We have not progressed to the point of having detailed knowledge of how genes are involved or what processes and pathways influence individual susceptibility to disease after interaction with the envirome. This is a result of using a reductionist approach to piece together the larger picture one component at a time. For public health practice to benefit from modern genomics technology, we need an integrated approach that draws on data from the environment, biomarkers of exposure, gene expression patterns and parameters, and physiology [59]. Systems biology is an emerging science that integrates genomics, transcriptomics, proteomics, and metabolomics together with computer analysis and modeling to understand interacting gene networks that maintain cellular homeostasis. Because of the unique problems we face in environmental health, environmental systems biology teams must include environmental anthropologists and sociologists, exposure assessors, epidemiologists, and ecologists, as well as toxicologists, molecular scientists, computer modelers and statisticians.
Systems biology can thus be applied to understanding how the envirome can modulate the tightly regulated circuitry of the human organism to cause disease in the broadest sense [60].
That cells and organisms have interconnected pathways that regulate metabolism is well known and reflected in the metabolic pathways found in every textbook of biochemistry. Similarly, signal transduction networks are becoming better understood. However, understanding the complex gene regulation networks expressed in the transcriptome, proteome and metabolome downstream of the signal transduction pathways is much more difficult [61]. Gene array technology, together with computers for statistical analysis and modeling techniques, has been used to establish gene networks (see Figure 3) [62]. Proteomics and metabolomics are more complex than genome analysis and have lagged in application to environmental health; however, the development of protein chips and other analytical advances will result in exponential growth in those fields [63].
The recognition that gene array technology can elucidate genomic and envirome factors in human health and disease is a focal point in modern environmental public health [64]. We will soon be in a position to organize data components into modules amenable to systems biology approaches to modeling of environmental disease. Thus data on environment, exposure, and gene networks that describe the transcriptome, proteome and metabolome will provide insights into the identity and character of genome-envirome interactions, giving us opportunities to effectively target intervention strategies. Complex databases of genomic and toxicant information, combined with modern methods of data mining, information retrieval and statistics, will provide comparative information on the molecular basis of toxicity and disease.

Where Do We Go From Here?
The science underlying genomic and systems biology approaches to environmental diseases is readily available. However, application of these powerful methods is lagging, in part because, at first glance, genomics and public health practice are polar opposites. Public health is practical and utilitarian: the rights of the majority outweigh the rights of the minority, resulting in interventions that can be perceived as coercive. For example, general immunization and isolation or quarantine have been justified over individual civil rights to protect the general health of the population [65]. On the other hand, using systems biology to identify susceptibility to environmental diseases is highly personalized [66]. There is no guarantee that individual findings will be generalizable to the population at large; consequently, there is potential for clashes between public health and new genomics approaches [67]. Other major concerns include ethical, legal and social issues regarding the accumulation and proper application of the data derived from such studies [68][69][70]. These points will have to be addressed before modern genomic approaches can be widely accepted in the practice of environmental public health.
Human population studies using clinical or epidemiological data that associate environmental exposures with health endpoints and disease can now be carried out using systems biology approaches that incorporate enviromics and metabolomics. Together with the use of population genetic histories, an understanding of human genetic variation and genomic reactions to specific environmental exposures will allow us to uncover the causes of variation in human response to environmental exposures, providing important new tools for assessing the risk of human disease [31].