Evaluation of a meta-analysis of ambient air quality as a risk factor for asthma exacerbation

False-positive results and bias may be common features of the biomedical literature today, including risk factor-chronic disease research. A study was undertaken to assess the reliability of base studies used in a meta-analysis examining whether carbon monoxide, particulate matter ≤10 and ≤2.5 micron, sulfur dioxide, nitrogen dioxide and ozone are risk factors for asthma exacerbation (hospital admission and emergency room visits for asthma attack). The numbers of statistical tests and models were counted in 17 randomly selected base papers from the 87 used in the meta-analysis. P-value plots for each air component were constructed to evaluate the effect heterogeneity of p-values used from all 87 base papers. The number of statistical tests possible in the 17 selected base papers was large, median=15,360 (interquartile range=1,536 to 40,960), in comparison to results presented. Each p-value plot showed a two-component mixture, with some small p-values less than .001 while other p-values appeared random (p-values greater than .05). Given the potentially large numbers of statistical tests conducted in the 17 selected base papers, p-hacking cannot be ruled out as an explanation for the small p-values. Our interpretation of the meta-analysis is that the random p-values indicating null associations are more plausible and that the meta-analysis will not likely replicate in the absence of bias. We conclude that the meta-analysis and the base papers used are unreliable and do not offer evidence of value to inform public health practitioners about air quality as a risk factor for asthma exacerbation. The following areas are crucial for enabling improvements in risk factor-chronic disease observational studies at the funding agency and journal level: preregistration, changes in funding agency and journal editor (and reviewer) practices, open sharing of data and facilitation of reproducibility research.


False-positives and bias in biomedical literature
Bias - Bias consists of systematic alteration of research findings due to factors related to study design, data acquisition, analysis or reporting of results (Boffetta et al. 2008). Gotzsche (2006) suspects that selective reporting proliferates in published observational studies where researchers routinely test many models and questions during a study and then report those models that offer interesting (statistically significant) results.
• Confounding (due either to incomplete statistical adjustment for measured variables or from the inability to adjust for unmeasured distorting variables). For example, the case cross-over study design attempts to capture an outcome of interest for each study participant in an observational study - e.g., occurrence of a short-term health episode such as an asthma exacerbation - when exposed and when unexposed (Howards 2018). Fixed risk factors for the outcome, such as genetic factors, do not change over time and therefore are the same when a participant is exposed and when unexposed (i.e., they do not confound the results). However, confounding can occur if there are risk factors that change over time (e.g., exposure to aeroallergens and viruses, changes in weather, etc.).
• Selection bias (use of improper procedures for selecting a sample population or as a result of factors that influence participation of the subjects in a study). For example, one form of selection bias commonly seen in cardiovascular studies is referred to as 'recurrent event bias' (Labos and Thanassoulis 2018), which occurs when a population under study is not representative of the general population but is selected to include patients who already have, in this case, cardiovascular disease.

Positive and negative predictive values of risk factor−chronic disease effects
Because of the prominence of disease prevention in our current health care system, observational risk factor−chronic disease research plays a key role in providing evidence to public health decision makers. The probability that a biomedical diagnostic test result is true depends on sensitivity, specificity and disease prevalence in a population (Schechter 1986, Last 2001, Shah 2003). Ioannidis (2005) incorporated the role of bias into this analysis and further developed relationships for understanding the probability of a research finding in a study being true for traditional epidemiological studies in the presence of bias. Given the importance of bias in observational studies (Ioannidis 2008a, 2011b), we extend the work of Ioannidis (2005) to illustrate the probability of positive research findings in a study (Positive Predictive Value or PPV) and negative, null, research findings in a study (Negative Predictive Value or NPV) being true for different levels of bias and prevalence rates for common chronic diseases in the United States. Table 1 presents estimates of prevalence for common chronic diseases of interest in the United States (Supplemental Information (SI) 1 provides details upon which estimates are based). The four most common types of chronic diseases in the world's population (including developed and developing countries) are (Beaglehole et al. 2007, Alwan and MacLean 2009): respiratory diseases (primarily asthma), heart and stroke disease, diabetes and cancers. These diseases share key modifiable and preventable risk factors related to individual behavior (unhealthy diet, physical inactivity, tobacco use and harmful substance abuse, e.g., alcohol).
Using relationships we develop in SI 1, Figure 1 illustrates the probability of a research finding in a study being true as a function of disease prevalence within the range 0.0001-0.1 for two levels of bias (0.2, a study influenced by relatively minor bias, and 0.8, a study influenced by relatively major bias). Bias, defined here after Ioannidis (2005), represents the proportion of probed analyses [relationships] in a study that would not have been "research findings," but nevertheless end up presented and reported as such, because of bias. All of the common diseases listed in Table 1 correspond with low post-study probabilities of a positive research finding in a study being true - less than 30%. On the other hand, these diseases correspond with very high post-study probabilities of a negative (null) research finding in a study being true - greater than 95%. Figure 1 implies that a null risk factor−chronic disease finding in an observational study is very likely true, whereas a positive risk factor−chronic disease finding is more likely to be false. For the most common chronic diseases of interest - i.e., diseases with a prevalence <0.1 - NPV is essentially independent of prevalence and bias, whereas PPV depends on both. Also in Figure 1, a PPV exceeding 30% is difficult to achieve in risk factor−chronic disease epidemiological research for the most common chronic diseases of interest. A majority of modern biomedical research making claims is operating in areas with very low pre- and post-study probability for true findings (Ioannidis 2005). This is particularly true for the chronic diseases listed in Table 1.
[...] Table 1 have been made in the past, using meta-analysis - e.g., asthma (Zheng et al. [...]).
p-Hacking enables researchers to find statistically significant results even when their samples are much too small to reliably detect the effect they are studying or even when they are studying an effect that is non-existent (Simonsohn et al. 2014).
Motulsky (2014) offers some examples of different forms of p-hacking that can be used during a study: increasing sample size, analysing data subsets, increasing variables in a model, adjusting data, transforming data (e.g., log transformations), removing suspicious outliers, changing the control group and using different statistical tests.

Objective of the current study
Epidemiological study of chronic diseases with low prevalence operates in areas with very low pre- and post-study probability of positive research findings being true (Figure 1). Young and Kindzierski (2019) previously reported on the potential for epidemiology literature, particularly observational studies related to air quality component−heart attack associations, to be compromised by false positives and bias. For the present study we were interested in exploring whether the same sort of issues might be occurring with observational studies of air quality component−asthma attack associations. Asthma has an estimated prevalence in the United States of 0.079 (Table 1), with any ambient air quality−chronic disease observational study 'positive effect' having a low post-study probability of being true - less than 25% (Figure 1).
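The post-study probabilities quoted above can be checked approximately with the published Ioannidis (2005) relationships. The following is a minimal sketch, not the SI 1 derivation itself: it assumes that disease prevalence stands in for the pre-study probability of a true relationship, and it assumes a Type I error rate of 0.05 and a Type II error rate of 0.20. The function name and default values are illustrative choices, so results may differ somewhat from Figure 1.

```python
def post_study_probabilities(prevalence, bias, alpha=0.05, beta=0.20):
    """Return (PPV, NPV) using the Ioannidis (2005) relationships with bias.

    Assumptions (not taken from the paper's SI 1): prevalence is used as the
    pre-study probability of a true relationship, and alpha and beta are the
    assumed Type I and Type II error rates.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if not 0.0 <= bias < 1.0:
        raise ValueError("bias must lie in [0, 1)")

    r = prevalence / (1.0 - prevalence)  # pre-study odds R
    u = bias                             # proportion of analyses reported as findings due to bias

    # Cell proportions of the 2x2 table (common factor 1/(R+1) cancels out).
    true_pos = (1.0 - beta) * r + u * beta * r
    false_pos = alpha + u * (1.0 - alpha)
    true_neg = (1.0 - u) * (1.0 - alpha)
    false_neg = (1.0 - u) * beta * r

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv
```

Under these assumed error rates, calling `post_study_probabilities(0.079, 0.2)` gives a PPV of roughly 0.23 and an NPV of roughly 0.98, in line with the "less than 25%" and "greater than 95%" statements. In this formulation the bias term cancels out of the NPV, which is why NPV is insensitive to bias.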
A meta-analysis offers a window into a research claim, for example, that some ambient air quality components, e.g., airborne fine particulate matter, are causal of a chronic disease. A meta-analysis examines a claim by taking a summary statistic along with a measure of its reliability from multiple individual ambient air quality−chronic disease studies (base papers) found in the epidemiological literature. These statistics are combined to give what is supposed to be a more reliable estimate of an air quality effect. A key assumption of a meta-analysis is that estimates drawn from the base papers for the analysis are unbiased estimates of the effect of interest (Boos and Stefanski 2013). However, as stated previously, studies with negative results are more likely to remain unpublished than studies with positive results leading to distortion of effects in the epidemiological literature and subsequent unreliable meta-analysis of these effects (Ioannidis 2008a).
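To make the combining step concrete, the sketch below pools risk ratios by fixed-effect inverse-variance weighting on the log scale. This is one common pooling approach and is shown only as an illustration; it is an assumption here, not a description of the model Zheng et al. actually used. The function name and the example inputs are hypothetical.

```python
import math


def pooled_risk_ratio(studies, z_crit=1.96):
    """Fixed-effect inverse-variance pooling of (rr, lower, upper) tuples.

    Each tuple holds a risk ratio and its 95% confidence limits. Standard
    errors are recovered from the width of the CI on the log scale.
    """
    weight_sum = 0.0
    weighted_log_rr = 0.0
    for rr, lower, upper in studies:
        se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
        weight = 1.0 / se ** 2          # more precise studies get more weight
        weight_sum += weight
        weighted_log_rr += weight * math.log(rr)

    pooled = weighted_log_rr / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    return (
        math.exp(pooled),
        math.exp(pooled - z_crit * pooled_se),
        math.exp(pooled + z_crit * pooled_se),
    )
```

The pooled interval is always narrower than the individual intervals, which is the sense in which the combined estimate is "supposed to be more reliable". The calculation offers no protection if the inputs are themselves biased or selectively reported.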
Here we evaluated the meta-analysis study of Zheng et al.

Zheng et al. undertook a systematic computerized search of published observational studies to identify those studies focusing on short-term exposures (same day and lags up to 7 days; which are never chosen a priori) to six ambient air quality components (carbon monoxide (CO), particulate matter with aerodynamic equivalent diameter ≤10 micron (PM10), particulate matter with aerodynamic equivalent diameter ≤2.5 micron (PM2.5), sulfur dioxide (SO2), nitrogen dioxide (NO2) and ozone (O3)) and asthma exacerbation (hospital admission and emergency room visits for asthma attack).
Associations between air quality components and asthma-related hospital admission and emergency room visits were expressed as risk ratios (RRs) and 95% confidence intervals (CIs) that were derived from single-pollutant models reporting RRs (95% CIs) or percentage change (95% CIs). They further recalculated these associations to represent a 10 μg/m³ increase, except for CO (where they recalculated associations to represent a 1 mg/m³ increase).
Creswell (2003) indicates that a 5−20% sample from a population whose characteristics are known is considered acceptable for most research purposes as it provides an ability to make generalizations for the population. Given the prior screening and data collection procedures used by Zheng et al., we assumed their 87 papers had consistent characteristics suitable for use in meta-analysis. Based on this assumption, we randomly selected 17 of the 87 papers (~20%) for search space counting in the following manner. Using methods similar to those described previously (Young et al. 2019; Young and Kindzierski 2018, 2019), we started with the 87 base papers and assigned a separate number in ascending order to each paper (numbered 1−87). We then used the online web tool numbergenerator.org to generate 10 random numbers between 1 and 87. We then removed the 10 selected papers from the ordered list, renumbered the remaining papers 1−77 and used the web tool to generate 7 random numbers between 1 and 77. This allowed us to select 17 of the Zheng et al. base papers for further evaluation (refer to SI 2).
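The two-stage draw described above can be reproduced in outline with a seeded random number generator. This is a sketch only: the selection in this study was made with a web tool, not this code, and the function name, seed argument and the assumption that each draw yields distinct numbers are illustrative.

```python
import random


def select_base_papers(n_papers=87, first_draw=10, second_draw=7, seed=None):
    """Select papers in two stages, renumbering the remainder between stages.

    Assumption: each stage draws distinct numbers (sampling without
    replacement). Returns the original paper numbers, sorted.
    """
    rng = random.Random(seed)
    remaining = list(range(1, n_papers + 1))  # papers numbered 1..n_papers
    selected = []
    for draw_size in (first_draw, second_draw):
        # Draw positions in the current (renumbered) list.
        positions = rng.sample(range(len(remaining)), draw_size)
        picked = [remaining[i] for i in positions]
        selected.extend(picked)
        chosen = set(picked)
        # Remove the picked papers; the survivors are implicitly renumbered.
        remaining = [p for p in remaining if p not in chosen]
    return sorted(selected)
```

Under the distinct-numbers assumption, drawing 10 and then 7 from the renumbered remainder is equivalent to a single draw of 17 papers without replacement. Recording a seed would make such a selection reproducible by others.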
Electronic copies of the 17 randomly selected base papers (and any corresponding electronic supplementary information files) were obtained and read. One change was made from previous search space counting procedures used by Young et al. (2019) and Young and Kindzierski (2019). It was apparent that several of the base papers employed a variety of model forms in their analysis (e.g., Sheppard et al. 1999). To accommodate this, we separately counted the number of model forms along with the numbers of outcomes, predictors, time lags and covariates reported in each base paper (covariates can be vague as they might be mentioned anywhere in a base paper). Specifically, analysis search space of a base paper was estimated as follows:
• The product of outcomes, predictors, model forms and time lags = number of questions at issue, Space1.
• A covariate may or may not act as a confounder to a predictor variable and the only way to test for this is to include/exclude the covariate from a model. As it can be in or out of a model, one way to approximate the modelling options is to raise 2 to the power of the number of covariates, Space2.
• The product of Space1 and Space2 = an approximation of analysis search space, Space3.
[...]
• P-values were computed using the method of Altman and Bland (2011), ordered from smallest to largest and plotted against the integers 1, 2, 3, …
• If the points on the plot followed an approximate 45-degree line, then the p-values are assumed to be from a random (chance) process.
• If the shape of the points exhibits a hockey stick, i.e., bilinear shape (blade on the bottom left hand corner, shaft towards the top right hand corner), then the p-values used for meta-analysis constitute a mixture and a general (over-all) claim is not supported; in addition, the p-value reported for the overall claim in the meta-analysis paper is not valid.
To assist in the interpretation of behavior of the Zheng et al. meta-analysis p-value plots, we also constructed and show p-value plots for plausible true null and true alternative hypothesis outcomes based on meta-analysis of observational datasets.
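The p-value plot construction can be sketched as follows. The approximation used is the one given by Altman and Bland (2011) for recovering a p-value from a ratio estimate and its 95% CI; the function names and the example inputs are illustrative, and the approximation is known to lose accuracy for very small p-values.

```python
import math


def p_value_from_ci(rr, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value for a ratio estimate from its 95% CI.

    Works on the log scale: the standard error is recovered from the CI
    width, then the Altman and Bland (2011) approximation is applied.
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = abs(math.log(rr)) / se
    return math.exp(-0.717 * z - 0.416 * z * z)


def p_value_plot_points(p_values):
    """Return (rank, p-value) pairs, with p-values sorted smallest to largest."""
    return list(enumerate(sorted(p_values), start=1))
```

Plotting the returned pairs with rank on the horizontal axis gives the p-value plot: points near a 45-degree line are consistent with chance, while a hockey-stick shape suggests a mixture.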

Analysis search space
Estimated analysis search spaces for the 17 randomly selected base papers from Zheng et al. are presented in Table 3. From Table 3, investigations of multiple (i.e., 2 or more) asthma outcomes in the selected base papers were as common as single-outcome investigations.
In addition, use of multiple models and lags was common in their analyses, as was making adjustments for multiple possible covariate confounders. While the multiple factors considered (i.e., outcomes, predictors, models, lags and treatment of covariates) are seemingly realistic, these attempts to find possible exposure−disease associations among combinations of these factors will increase the overall number of tests performed in a single study.
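The search space counts (Space1, Space2 and Space3, as defined in the methods) reduce to a few multiplications. The sketch below uses a hypothetical function name, and the example inputs are illustrative and not taken from any base paper.

```python
def analysis_search_space(outcomes, predictors, model_forms, lags, covariates):
    """Return (Space1, Space2, Space3) for one paper.

    Space1: number of questions at issue.
    Space2: modelling options, with each covariate either in or out.
    Space3: approximate analysis search space.
    """
    space1 = outcomes * predictors * model_forms * lags
    space2 = 2 ** covariates
    space3 = space1 * space2
    return space1, space2, space3
```

For a hypothetical paper with 1 outcome, 6 predictors, 1 model form, 3 lags and 7 covariates, `analysis_search_space(1, 6, 1, 3, 7)` returns `(18, 128, 2304)`. Because Space2 doubles with every added covariate, modest-looking designs can reach search spaces in the thousands.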
Summary statistics of the possible numbers of tests in the 17 base papers are presented in Table [...]
[...] Table 5. Summary statistics for the other air quality components (CO, NO2, O3, SO2 and PM10) and calculated p-values are provided in SI 3. In Table 5 (and tables presented in SI 3), calculated p-values ≤.05, taken as a statistically significant result, are bolded and italicized. Table 6 presents additional information on each of the air quality components [...] (Table 6).

Asthma characteristics
Asthma is a disease that affects the airways to the lungs and it is worthwhile summarizing what is presently known about the disease as it relates to relevant characteristics, genetics and socioeconomic factors, triggering/precipitating factors and seasonality patterns of exacerbations. The underlying pathology of asthma, regardless of its severity, is chronic inflammation of the airways and reactivity/spasm of the airways. A combined contribution of genetic predisposition and non-genetic factors accounts for divergence of the immune system towards T helper (Th) type 2 cell responses that include production of pro-inflammatory cytokines, Immunoglobulin E (IgE) antibodies and eosinophil infiltrates (circulating granulocytes) known to associate with asthma (Figure 5) (Noutsiosa and Floros 2014). The release of pro-inflammatory cytokines that cause airway narrowing is responsible for cough, shortness of breath, wheezing and chest tightness characteristic of the asthmatic state (Bernstein 2008). But this fails to account for beta stimulus of bronchiolar muscles that increases airway spasm. Airway inflammation causes secretions and contributes to edema (swelling) but it does not cause hyperresponsiveness.
During exacerbations these airway narrowing processes are accentuated, but it is not completely clear how these events contribute to these underlying changes and the mechanisms underlying an increase in airflow obstruction are not fully understood (Singh and Busse 2006). There is also ongoing debate as to whether asthma is one disease or several different diseases that include airway inflammation; however two [...]
From an exposure point-of-view, isolating the role/contribution of a particular triggering/precipitating factor in an asthma exacerbation episode is difficult, unless the factor overwhelms. As asthma is a complex interaction between the inhaled environment and the formed elements of the airways (Holgate 2010), the hypothesis is that exposure to abruptly changing air quality conditions may contribute to symptoms and increase the severity of asthma exacerbations; although these effects are not as [...]
Asthma exacerbations are not so problematic in summer except for those asthmatics who are triggered by seasonal allergens that appear in summer. The onset of cooler weather brings on asthma exacerbations. Wintertime allergenic stimulants are more a problem of inside air - insect offal and dander and household allergens, as well as mold in closed-in housing or other allergens that are accentuated by inside living. There is also the well-known asthmatic trigger of cooler or cold air.
[...] (2011) showed that employing a few common forms of p-hacking can cause the false positive error rate for a single study to increase from the nominal 5% to over 60%.
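Separately from p-hacking, sheer test count inflates the chance of a spurious finding. As a rough sketch, under the simplifying assumption that every test is independent and every null hypothesis is true, the probability of at least one false positive among n tests at level α is 1 − (1 − α)^n. The function name is hypothetical, and the independence assumption rarely holds in practice.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false positive among n_tests.

    Assumes independent tests of true null hypotheses, each at level alpha.
    """
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests
```

With 20 such tests at the 5% level the probability is about 0.64, so a single significant result among 20 tests is unremarkable.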
One is unable to conclude anything of consequence by observing 1 positive (statistically significant) result from 20 independent statistical null hypothesis tests based on a 5% false positive error rate. If these tests are not independent and p-hacking is employed, one is unable to conclude anything of consequence even observing more than 1 positive result because of the potential for false positive error rate inflation. The probability of this occurrence depends on a host of factors and is almost never uniform across the tests performed (thus violating a key assumption of the 1 in 20 error rate rule of a null hypothesis test). If statistical null hypothesis testing is used as a kind of data beach-combing tool unguided by clear (and ideally prospective) specification of what findings are expected and why, much that is nonsense will be "discovered" and added to

Lack of transparent descriptions of statistical tests and statistical models
In reviewing the 17 randomly selected base papers, transparency was lacking in the methodology descriptions, making it challenging for us and, in general, readers to understand how many statistical tests and models were used in these studies.
Consequently our reviews required careful reading and re-reading of the 'Methods', 'Results' and 'Discussion' sections of the base papers to understand what was actually done and to compile information for estimating the analysis search spaces. As an example, in one base paper reviewed (Evans et al. 2013), only near the end of the Discussion section did the authors indicate that multiple pollutants were included in the same model - implying that multivariate models were employed along with the univariate and bivariate models that they described in their Methods section.
The importance of this lack of transparency is explained below. It is common today for observational studies employing multiple testing and multiple modelling (MTMM) to seek out information on multiple exposures and disease outcomes, and the possibility exists for researchers to test thousands of exposure-disease combinations (Boffetta et al. 2008) and only report a portion of results that allow them to make interesting claims. Several hypothetical analytical search space scenarios were presented in Table 2. As shown in Table 2, analysis search spaces can easily inflate into tests of thousands of exposure-disease combinations. Without clear, concise descriptions about details of outcomes, predictors, models, lags and covariate confounders used (or available) in a study, readers will not be able to comprehend what was done relative to the few statistical results that typically get presented in a study. Examples 2 and 3 in Table 2 illustrate opportunities for researchers to search through many exposure-disease associations but report only the statistically significant, positive ones.
[...] (Table 5), 196 (59%) represented null (statistically non-significant) associations. That is to say, they offer insufficient evidence to support an air quality component exposure−asthma exacerbation causal relationship.
Quantitative results from observational studies (e.g., RRs, odds ratios) can figure prominently in regulatory decisions, but frequently observational studies offer RRs and odds ratios extremely close to 1.0. Further, a disservice to observational epidemiology is the practice of searching for and reporting and attempting to defend weak statistical associations (e.g., RRs and odds ratios extremely close to 1.0) - among which the potential for distorting influences of chance, bias and confounding is further [...]
Epidemiology studies that test many null hypotheses tend to provide results of limited quality for each association due to limited exposure assessment and inadequate information on potential confounders (Savitz and Olshan 1995) and they tend to generate more errors of false positive or false negative associations (Rothman 1990).

Recommendations for improvement
It is our belief that risk factor−chronic disease researchers are unaware that many positive research findings from published observational studies may be false. Also, there are many sources of bias currently being underestimated by observational study researchers with selective reporting biases likely a key issue distorting their findings, and publication bias is likely a key issue distorting the epidemiologic literature in general. As a result, we present a number of recommendations for ways of improving risk factor−chronic disease observational studies to address these issues.
Our recommendations have largely been advanced by others in the past - e.g., Simes (1986), Begg and Berlin (1988), Angell (1989) and Dickersin (1990). In addition, this issue has become a topic of interest more recently - e.g., Song [...]
We see the following topics as being crucial for enabling improvements in risk factor−chronic disease observational studies at the funding agency/journal level: [...]
Changes in funding agency, journal editor (and reviewer) practices.
Open sharing of data - We consider the potential for bias to be particularly severe for observational studies that investigate subtle/weak risk factor−chronic disease relationships. This type of epidemiological research poses a considerable challenge for the most common chronic diseases of interest in the United States (Table 1) [...]
Regarding two properties of the Zheng et al. meta-analysis that we were interested in understanding:
• As for the reliability of claims in the base papers of their meta-analysis due to the presence of multiple testing and multiple modelling (which can give rise to false positive results), we conclude that the meta-analysis is unreliable due to the presence of multiple testing and multiple modelling.
• As for whether heterogeneity across the base papers of their meta-analysis is more complex than simple sampling from a single normal process, we show that the two-component mixture of data used in the meta-analysis (i.e., Figure 4) does not represent simple sampling from a single normal process.

Table 2. Three examples of search space analysis of a hypothetical observational study of ambient air quality versus hospitalization due to asthma exacerbation.

Example 1
A simple univariate analysis of childhood asthma hospital admissions is considered using 6 air quality predictors - daily average levels of PM10, PM2.5, SO2, NO2, CO and O3 - and no lags or weather covariate confounders: [...]

Example 2
For a simple, though slightly more typical, analysis of the same 6 predictors with 3 lags (i.e., same day and 1 and 2 day lags), and 2 weather variables treated as covariate confounders (daily average temperature and relative humidity), and also adjusting for possible confounding of co-pollutants in the analysis (i.e., air quality variables are also treated as covariate confounders in the analysis), we have the following search space counts: [...]

NPV=negative predictive value - probability that a negative relationship (i.e., risk factor does not cause disease) is true; PPV=positive predictive value - probability that a positive relationship (i.e., risk factor causes disease) is true.

Post-study probabilities of a research finding being true
Our interest is in estimating the post-study probability of an outcome being true for a single observational study of risk factor−chronic disease relationships. An outcome may either be one that is a positive (+ve) association, i.e. one that is statistically significant with p-value ≤.05; or a negative (−ve) association, i.e. one that is not statistically significant with p-value >.05. Using a Bayesian argument, the former outcome may either be a true or false positive association and the latter outcome may either be a false or true negative association.

Disease prevalence (P)
First we provide a formal definition of prevalence (P): [...]
If we substitute equation (2) into equation (3) and rearrange, we get P(d+), which is the same as the prevalence rate of a disease (P). Thus the "pre-study probability of a relationship being true" can be represented by the prevalence rate of a disease (P) for our purposes. As our interest is in estimating the post-study probability of an outcome being true as a function of P, we conservatively assume that all those in the population with the disease (i.e., d+) being studied are due to the specific risk factor we are examining.
where α and β are the Type I and Type II error rates, respectively.
The negative predictive value (NPV) is used to represent the post-study probability of a negative (i.e., null) outcome being true in the presence of bias: