Special Issue "Methodological Innovations and Reflections"

A special issue of International Journal of Environmental Research and Public Health (ISSN 1660-4601).

Deadline for manuscript submissions: closed (31 August 2015)

Special Issue Editors

Guest Editor
Dr. Igor Burstyn

Department of Environmental and Occupational Health, School of Public Health, Drexel University, Nesbitt Hall Room 614, 3215 Market Street, Philadelphia, PA 19104, USA
Interests: maternal and child health; occupational and environmental epidemiology; occupational hygiene; exposure assessment; gene-environment interaction; biostatistics
Guest Editor
Dr. Gheorghe Luta

Department of Biostatistics, Bioinformatics, and Biomathematics, Lombardi Comprehensive Cancer Center, Georgetown University, Building D, Suite 180, 4000 Reservoir Road, NW, Washington, DC 20057-1484, USA
Phone: +1 (202) 687-8203
Fax: +1 (202) 687-2581
Interests: cancer control and prevention, cancer epidemiology, biostatistics, empirical likelihood, flow cytometry, non-negative matrix factorization

Special Issue Information

Dear Colleagues,

The idea behind this Special Issue, dedicated to Methodological Innovations and Reflections, is not novel. Even the title owes its existence to the journal Epidemiologic Perspectives & Innovations, which was discontinued by BMC in 2012. The echo is intentional. Our aim is to create the right environment for the open exchange of innovative ideas and reflections: concepts that, in and of themselves, may not be new to some fields (e.g., statistics or economics), but are unknown or under-appreciated in public health research (including, but not limited to, epidemiology, exposure sciences, and toxicology). We strive to stimulate dialogue about how we do science and (more importantly) how we could do it better. There may not be much novelty in this general approach, but we feel strongly that we must have this discussion in the open literature for all to benefit, much as any institution of higher learning benefits from its seminar series: such series create a safe and respectful environment in which to discuss innovation without the threat of being judged for making errors, and to explore ideas that may, or may not, lead to wide adoption and/or substantive advances.

The main goal of the Special Issue is to advance methodology through debate rather than by the publication of a single seminal article: our aim is a defined process that stimulates creativity, reflection, and innovation. Of course, we also strive to bring out into the open some of the papers that are usually judged to be too simple for the theoretical statistical journals, and yet too complex for the applied journals. To this effect, we promise our readers and contributors to make editorial decisions that reflect the soundness of the argument rather than the implications of its conclusions. Elegant mathematical arguments and logical interpretations that challenge the current orthodoxy are highly encouraged.

We look forward to working with you in making this exciting endeavor a success. It is up to all of us to make this a reality.

Igor Burstyn
George Luta
Guest Editors

Submission

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. Papers will be published continuously (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are refereed through a peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. International Journal of Environmental Research and Public Health is an international peer-reviewed Open Access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs).

Published Papers (19 papers)

Research


Open Access Article: Empirical Likelihood-Based ANOVA for Trimmed Means
Int. J. Environ. Res. Public Health 2016, 13(10), 953; doi:10.3390/ijerph13100953 (registering DOI)
Received: 17 May 2016 / Revised: 15 September 2016 / Accepted: 20 September 2016 / Published: 27 September 2016
Abstract
In this paper, we introduce an alternative to Yuen’s test for the comparison of several population trimmed means. This nonparametric ANOVA-type test is based on the empirical likelihood (EL) approach and extends the results for one population trimmed mean from Qin and Tsao (2002). The results of our simulation study indicate that for skewed distributions, with and without variance heterogeneity, Yuen’s test performs better than the new EL ANOVA test for trimmed means with respect to control over the probability of a type I error. This finding is in contrast with our simulation results for the comparison of means, where the EL ANOVA test for means performs better than Welch’s heteroscedastic F test. The analysis of a real data example illustrates the use of Yuen’s test and the new EL ANOVA test for trimmed means for different trimming levels. Based on the results of our study, we recommend the use of Yuen’s test for situations involving the comparison of population trimmed means between groups of interest.
(This article belongs to the Special Issue Methodological Innovations and Reflections)
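As a minimal illustration of the quantity being compared in the abstract above, the following Python sketch computes a symmetric trimmed mean. The function name `trimmed_mean`, the 20% trimming level, and the sample values are assumptions chosen for illustration; neither Yuen's test nor the EL ANOVA test is implemented here.

```python
import math


def trimmed_mean(values, proportion=0.2):
    """Symmetric trimmed mean (hypothetical helper for illustration).

    Drops the g smallest and g largest observations, where
    g = floor(proportion * n), and averages what remains.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    g = math.floor(proportion * len(ordered))
    kept = ordered[g:len(ordered) - g]
    return sum(kept) / len(kept)


# Illustrative data with one extreme value (not from the article).
sample = [1, 2, 3, 4, 100]
print("trimmed mean:", trimmed_mean(sample, 0.2))
print("ordinary mean:", sum(sample) / len(sample))
```

With a single extreme observation, the trimmed mean stays near the bulk of the data while the ordinary mean is pulled toward the outlier, which is the usual motivation for comparing trimmed means in skewed distributions.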
Open Access Article: Generalized Confidence Intervals and Fiducial Intervals for Some Epidemiological Measures
Int. J. Environ. Res. Public Health 2016, 13(6), 605; doi:10.3390/ijerph13060605
Received: 14 March 2016 / Revised: 3 June 2016 / Accepted: 12 June 2016 / Published: 18 June 2016
Abstract
For binary outcome data from epidemiological studies, this article investigates the interval estimation of several measures of interest in the absence or presence of categorical covariates. When covariates are present, the logistic regression model as well as the log-binomial model are investigated. The measures considered include the common odds ratio (OR) from several studies, the number needed to treat (NNT), and the prevalence ratio. For each parameter, confidence intervals are constructed using the concepts of generalized pivotal quantities and fiducial quantities. Numerical results show that the confidence intervals so obtained exhibit satisfactory performance in terms of maintaining the coverage probabilities even when the sample sizes are not large. An appealing feature of the proposed solutions is that they are not based on maximization of the likelihood, and hence are free from convergence issues associated with the numerical calculation of the maximum likelihood estimators, especially in the context of the log-binomial model. The results are illustrated with a number of examples. The overall conclusion is that the proposed methodologies based on generalized pivotal quantities and fiducial quantities provide an accurate and unified approach for the interval estimation of the various epidemiological measures in the context of binary outcome data with or without covariates.
Open Access Article: Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health
Int. J. Environ. Res. Public Health 2016, 13(5), 509; doi:10.3390/ijerph13050509
Received: 29 February 2016 / Revised: 11 May 2016 / Accepted: 12 May 2016 / Published: 18 May 2016
Abstract
Often data can be represented as a matrix, e.g., observations as rows and variables as columns, or as a doubly classified contingency table. Researchers may be interested in clustering the observations, the variables, or both. If the data are non-negative, then Non-negative Matrix Factorization (NMF) can be used to perform the clustering. By its nature, NMF-based clustering is focused on the large values. If the data are normalized by subtracting the row/column means, they become of mixed signs and the original NMF cannot be used. Our idea is to split and then concatenate the positive and negative parts of the matrix, after taking the absolute value of the negative elements. NMF applied to the concatenated data, which we call PosNegNMF, offers the advantages of the original NMF approach, while giving equal weight to large and small values. We use two public health datasets to illustrate the new method and compare it with alternative clustering methods, such as K-means and clustering methods based on the Singular Value Decomposition (SVD) or Principal Component Analysis (PCA). With the exception of situations where a reasonably accurate factorization can be achieved using the first SVD component, we recommend that epidemiologists and environmental scientists use the new method to obtain clusters with improved quality and interpretability.
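The split-and-concatenate step described in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `center_columns` and `split_pos_neg`, the choice to center by column means, and the side-by-side (column-wise) concatenation are all hypothetical, and the NMF step itself is not shown.

```python
def center_columns(matrix):
    """Subtract each column's mean (one possible normalization; an assumption)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def split_pos_neg(matrix):
    """Return the non-negative matrix [P | N] built from a mixed-sign matrix.

    P keeps the positive entries (zeros elsewhere); N holds the absolute
    values of the negative entries (zeros elsewhere). The two parts are
    concatenated side by side, so the column count doubles.
    """
    result = []
    for row in matrix:
        positive = [max(x, 0.0) for x in row]
        negative = [abs(min(x, 0.0)) for x in row]
        result.append(positive + negative)
    return result


# Illustrative data only (not from the article).
data = [[1.0, 10.0], [3.0, 20.0], [5.0, 60.0]]
centered = center_columns(data)
stacked = split_pos_neg(centered)
for row in stacked:
    print(row)
```

Every entry of the resulting matrix is non-negative, so a standard NMF routine could in principle be applied to it, and the centered matrix can be recovered as the positive half minus the negative half.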
Open Access Article: A Simulation-Based Comparison of Covariate Adjustment Methods for the Analysis of Randomized Controlled Trials
Int. J. Environ. Res. Public Health 2016, 13(4), 414; doi:10.3390/ijerph13040414
Received: 7 September 2015 / Revised: 1 April 2016 / Accepted: 1 April 2016 / Published: 11 April 2016
Abstract
Covariate adjustment methods are frequently used when baseline covariate information is available for randomized controlled trials. Using a simulation study, we compared the analysis of covariance (ANCOVA) with three nonparametric covariate adjustment methods with respect to point and interval estimation for the difference between means. The three alternative methods were based on important members of the generalized empirical likelihood (GEL) family, specifically on the empirical likelihood (EL) method, the exponential tilting (ET) method, and the continuous updated estimator (CUE) method. Two criteria were considered for the comparison of the four statistical methods: the root mean squared error and the empirical coverage of the nominal 95% confidence intervals for the difference between means. Based on the results of the simulation study, for sensitivity analysis purposes, we recommend the use of ANCOVA (with robust standard errors when heteroscedasticity is present) together with the CUE-based covariate adjustment method.
Open Access Article: Pooling Bio-Specimens in the Presence of Measurement Error and Non-Linearity in Dose-Response: Simulation Study in the Context of a Birth Cohort Investigating Risk Factors for Autism Spectrum Disorders
Int. J. Environ. Res. Public Health 2015, 12(11), 14780-14799; doi:10.3390/ijerph121114780
Received: 12 October 2015 / Revised: 4 November 2015 / Accepted: 6 November 2015 / Published: 19 November 2015
Supplementary Files
Abstract
We sought to determine the potential effects of pooling on power, false positive rate (FPR), and bias of the estimated associations between hypothetical environmental exposures and dichotomous autism spectrum disorders (ASD) status. Simulated birth cohorts in which ASD outcome was assumed to have been ascertained with uncertainty were created. We investigated the impact on the power of the analysis (using logistic regression) to detect true associations with exposure (X1) and the FPR for a non-causal correlate of exposure (X2, r = 0.7) for a dichotomized ASD measure when the pool size, sample size, degree of measurement error variance in exposure, strength of the true association, and shape of the exposure-response curve varied. We found that there was minimal change (bias) in the measures of association for the main effect (X1). There was some loss of power, but there was less chance of detecting a false positive result for pooled compared to individual-level models. The number of pools had more effect on the power and FPR than the overall sample size. This study supports the use of pooling to reduce laboratory costs while maintaining statistical efficiency in scenarios similar to the simulated prospective risk-enriched ASD cohort.
Open Access Article: A Discriminant Function Approach to Adjust for Processing and Measurement Error When a Biomarker is Assayed in Pooled Samples
Int. J. Environ. Res. Public Health 2015, 12(11), 14723-14740; doi:10.3390/ijerph121114723
Received: 3 August 2015 / Revised: 15 October 2015 / Accepted: 6 November 2015 / Published: 18 November 2015
Abstract
Pooling biological specimens prior to performing expensive laboratory assays has been shown to be a cost-effective approach for estimating parameters of interest. In addition to requiring specialized statistical techniques, however, the pooling of samples can introduce assay errors due to processing, possibly in addition to measurement error that may be present when the assay is applied to individual samples. Failure to account for these sources of error can result in biased parameter estimates and ultimately faulty inference. Prior research addressing biomarker mean and variance estimation advocates hybrid designs consisting of individual as well as pooled samples to account for measurement and processing (or pooling) error. We consider adapting this approach to the problem of estimating a covariate-adjusted odds ratio (OR) relating a binary outcome to a continuous exposure or biomarker level assessed in pools. In particular, we explore the applicability of a discriminant function-based analysis that assumes normal residual, processing, and measurement errors. A potential advantage of this method is that maximum likelihood estimation of the desired adjusted log OR is straightforward and computationally convenient. Moreover, in the absence of measurement and processing error, the method yields an efficient unbiased estimator for the parameter of interest assuming normal residual errors. We illustrate the approach using real data from an ancillary study of the Collaborative Perinatal Project, and we use simulations to demonstrate the ability of the proposed estimators to alleviate bias due to measurement and processing error.
Open Access Article: Assessment of Offspring DNA Methylation across the Lifecourse Associated with Prenatal Maternal Smoking Using Bayesian Mixture Modelling
Int. J. Environ. Res. Public Health 2015, 12(11), 14461-14476; doi:10.3390/ijerph121114461
Received: 14 October 2015 / Revised: 9 November 2015 / Accepted: 9 November 2015 / Published: 13 November 2015
Supplementary Files
Abstract
A growing body of research has implicated DNA methylation as a potential mediator of the effects of maternal smoking in pregnancy on offspring ill-health. Data were available from a UK birth cohort of children with DNA methylation measured at birth and at ages 7 and 17. One issue when analysing genome-wide DNA methylation data is the correlation of methylation levels between CpG sites, though this can be crudely bypassed using a data reduction method. In this manuscript we investigate the effect of sustained maternal smoking in pregnancy on longitudinal DNA methylation in their offspring using a Bayesian hierarchical mixture model. This model avoids the data reduction used in previous analyses. Four of the 28 previously identified, smoking-related CpG sites were shown to have offspring methylation related to maternal smoking using this method, replicating findings in the well-known smoking-related genes MYO1G and GFI1. Further weak associations were found at the AHRR and CYP1A1 loci. In conclusion, we have demonstrated the utility of the Bayesian mixture model method for investigation of longitudinal DNA methylation data, and this method should be considered for use in whole-genome applications.
Open Access Article: An Enhanced Variable Two-Step Floating Catchment Area Method for Measuring Spatial Accessibility to Residential Care Facilities in Nanjing
Int. J. Environ. Res. Public Health 2015, 12(11), 14490-14504; doi:10.3390/ijerph121114490
Received: 5 October 2015 / Revised: 8 November 2015 / Accepted: 10 November 2015 / Published: 13 November 2015
Cited by 1
Abstract
Civil administration departments require reliable measures of accessibility so that residential care facility shortage areas can be accurately identified. Building on previous research, this paper proposes an enhanced variable two-step floating catchment area (EV2SFCA) method that determines facility catchment sizes by dynamically summing the population around the facility until the facility-to-population ratio (FPR) is less than the FPR threshold (FPRT). To minimize the errors from the supply and demand catchments being mismatched, this paper proposes that the facility and population catchment areas must both contain the other location in calculating accessibility. A case study evaluating spatial accessibility to residential care facilities in Nanjing demonstrates that the proposed method is effective in accurately determining catchment sizes and identifying details in the variation of spatial accessibility. The proposed method can be easily applied to assess other public healthcare facilities, and can provide guidance to government departments on issues of spatial planning and identification of shortage and excess areas.
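The catchment-sizing rule stated in the abstract above (sum the surrounding population outward until the facility-to-population ratio falls below the threshold) might be sketched as follows. This is a rough illustration, not the authors' implementation: the function name `catchment_radius`, the input layout, and all numbers are assumptions, and the later accessibility calculation is not covered.

```python
def catchment_radius(capacity, neighbours, fpr_threshold):
    """Grow a facility's catchment until its FPR drops below the threshold.

    capacity      -- facility supply, e.g. number of beds (hypothetical unit)
    neighbours    -- list of (distance, population) pairs around the facility
    fpr_threshold -- the FPR threshold (FPRT)

    Returns the distance at which the cumulative facility-to-population
    ratio first falls below the threshold, or None if it never does.
    """
    total_population = 0
    for distance, population in sorted(neighbours):
        total_population += population
        if total_population > 0 and capacity / total_population < fpr_threshold:
            return distance
    return None


# Illustrative values only: 100 beds and three population points.
points = [(2.0, 1500), (1.0, 500), (3.0, 4000)]
print(catchment_radius(100, points, 0.04))
```

In this toy example the ratio is 0.2 after the nearest point, 0.05 after the second, and about 0.017 after the third, so the catchment stops growing at the third point.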
Open Access Article: Quantifying and Adjusting for Disease Misclassification Due to Loss to Follow-Up in Historical Cohort Mortality Studies
Int. J. Environ. Res. Public Health 2015, 12(10), 12834-12846; doi:10.3390/ijerph121012834
Received: 16 March 2015 / Revised: 22 September 2015 / Accepted: 8 October 2015 / Published: 15 October 2015
Abstract
The purpose of this analysis was to quantify and adjust for disease misclassification from loss to follow-up in a historical cohort mortality study of workers where exposure was categorized as a multi-level variable. Disease classification parameters were defined using 2008 mortality data for the New Zealand population and the proportions of known deaths observed for the cohort. The probability distributions for each classification parameter were constructed to account for potential differences in mortality due to exposure status, gender, and ethnicity. Probabilistic uncertainty analysis (bias analysis), which uses Monte Carlo techniques, was then used to sample each parameter distribution 50,000 times, calculating adjusted odds ratios (OR_DM-LTF) that compared the mortality of workers with the highest cumulative exposure to those who were considered never-exposed. The geometric mean OR_DM-LTF ranged between 1.65 (certainty interval (CI): 0.50–3.88) and 3.33 (CI: 1.21–10.48), and the geometric mean of the disease-misclassification error factor (e_DM-LTF), which is the ratio of the observed odds ratio to the adjusted odds ratio, had a range of 0.91 (CI: 0.29–2.52) to 1.85 (CI: 0.78–6.07). Only when workers in the highest exposure category were more likely than those never-exposed to be misclassified as non-cases did the OR_DM-LTF frequency distributions shift further away from the null. The application of uncertainty analysis to historical cohort mortality studies with multi-level exposures can provide valuable insight into the magnitude and direction of study error resulting from losses to follow-up.
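The general shape of a Monte Carlo bias analysis like the one summarized above can be sketched in Python. Everything specific here is an assumption made for illustration: the cohort counts, the uniform sensitivity distributions, the assumption of perfect specificity (deaths can be missed but not falsely recorded), and the 5,000 draws. Only the definition of the error factor, observed OR divided by adjusted OR, is taken from the abstract.

```python
import math
import random


def odds_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Odds ratio comparing an exposed group with an unexposed group."""
    odds_exp = cases_exp / (n_exp - cases_exp)
    odds_unexp = cases_unexp / (n_unexp - cases_unexp)
    return odds_exp / odds_unexp


def adjusted_odds_ratio(cases_exp, n_exp, cases_unexp, n_unexp, se_exp, se_unexp):
    """Correct case counts for imperfect sensitivity (specificity assumed 1).

    With no false positives, the expected true case count in each group
    is the observed count divided by that group's sensitivity.
    """
    return odds_ratio(cases_exp / se_exp, n_exp, cases_unexp / se_unexp, n_unexp)


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical cohort: 30/200 deaths among exposed, 20/300 among never-exposed.
rng = random.Random(2015)
observed = odds_ratio(30, 200, 20, 300)
error_factors = []
for _ in range(5000):
    se_exp = rng.uniform(0.7, 1.0)    # assumed sensitivity distribution
    se_unexp = rng.uniform(0.7, 1.0)  # assumed sensitivity distribution
    adjusted = adjusted_odds_ratio(30, 200, 20, 300, se_exp, se_unexp)
    error_factors.append(observed / adjusted)

print("observed OR:", round(observed, 3))
print("geometric mean error factor:", round(geometric_mean(error_factors), 3))
```

In this sketch, an error factor below 1 means the observed odds ratio understates the adjusted one, which happens when deaths are missed more often among the exposed than among the never-exposed.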
Open Access Article: A Case Study Perspective on Working with ProUCL and a State Environmental Agency in Determining Background Threshold Values
Int. J. Environ. Res. Public Health 2015, 12(10), 12905-12923; doi:10.3390/ijerph121012905
Received: 30 May 2015 / Revised: 5 October 2015 / Accepted: 12 October 2015 / Published: 15 October 2015
Cited by 2
Abstract
ProUCL is a software package made available by the Environmental Protection Agency (EPA) to provide environmental scientists with better tools with which to conduct statistical analyses. ProUCL has been in production for over ten years and is in its fifth major version. Over time, it has included more sophisticated and appropriate analysis tools. However, there is still substantial criticism of it among statisticians for its various omissions and even its philosophical approach. Due to limited resources, some state agencies have set ProUCL as a standard by which all state-mandated environmental analyses are compared, despite the EPA’s more open acceptance of other software products and methodologies. As such, it can be difficult for state-supervised sites to convince the state to allow the use of more appropriate methodologies or different software. In the current case study, several such instances arose and substantial resources were invested to demonstrate the appropriateness of alternative methodologies, sometimes without acquiring acceptance by the state despite sound statistical demonstration. In particular, efforts were made to address inappropriate outlier detection, upper tolerance limit (UTL) calculations based on gamma distributions when non-detects were present, and inappropriate use of nonparametric UTL formulas.
Open Access Article: Assessment of Residential History Generation Using a Public-Record Database
Int. J. Environ. Res. Public Health 2015, 12(9), 11670-11682; doi:10.3390/ijerph120911670
Received: 31 July 2015 / Revised: 4 September 2015 / Accepted: 9 September 2015 / Published: 17 September 2015
Cited by 1
Abstract
In studies of disease with potential environmental risk factors, residential location is often used as a surrogate for unknown environmental exposures or as a basis for assigning environmental exposures. These studies most typically use the residential location at the time of diagnosis due to ease of collection. However, previous residential locations may be more useful for risk analysis because of population mobility and disease latency. When residential histories have not been collected in a study, it may be possible to generate them through public-record databases. In this study, we evaluated the ability of a public-records database from LexisNexis to provide residential histories for subjects in a geographically diverse cohort study. We calculated 11 performance metrics comparing study-collected addresses and two address retrieval services from LexisNexis. We found 77% and 90% match rates for city and state and 72% and 87% detailed address match rates with the basic and enhanced services, respectively. The enhanced LexisNexis service covered 86% of the time at residential addresses recorded in the study. The mean match rate for detailed address matches varied spatially over states. The results suggest that public record databases can be useful for reconstructing residential histories for subjects in epidemiologic studies.
Open Access Article: A Bayesian Approach to Account for Misclassification and Overdispersion in Count Data
Int. J. Environ. Res. Public Health 2015, 12(9), 10648-10661; doi:10.3390/ijerph120910648
Received: 30 May 2015 / Revised: 17 July 2015 / Accepted: 25 August 2015 / Published: 28 August 2015
Cited by 1
Abstract
Count data are subject to considerable sources of what is often referred to as non-sampling error. Errors such as misclassification, measurement error, and unmeasured confounding can lead to substantially biased estimators. It is strongly recommended that epidemiologists not only acknowledge these sorts of errors in data, but also incorporate sensitivity analyses as part of the total data analysis. We extend previous work on Poisson regression models that allow for misclassification by thoroughly discussing the basis for the models and allowing for extra-Poisson variability in the form of random effects. Via simulation we show the improvements in inference that are brought about by accounting for both the misclassification and the overdispersion.
Open Access Article: A Simulation Study of Categorizing Continuous Exposure Variables Measured with Error in Autism Research: Small Changes with Large Effects
Int. J. Environ. Res. Public Health 2015, 12(8), 10198-10234; doi:10.3390/ijerph120810198
Received: 16 April 2015 / Revised: 25 July 2015 / Accepted: 19 August 2015 / Published: 24 August 2015
Cited by 1
Abstract
Variation in the odds ratio (OR) resulting from selection of cutoffs for categorizing continuous variables is rarely discussed. We present results for the effect of varying cutoffs used to categorize a mismeasured exposure in a simulated population in the context of autism spectrum disorders research. Simulated cohorts were created with three distinct exposure-outcome curves and three measurement error variances for the exposure. ORs were calculated using logistic regression for 61 cutoffs (mean ± 3 standard deviations) used to dichotomize the observed exposure. ORs were also calculated for five categories with a wide range for the cutoffs. For each scenario and cutoff, the OR, sensitivity, and specificity were calculated. The three exposure-outcome relationships had distinctly shaped OR (versus cutoff) curves, but increasing measurement error obscured the shape. At extreme cutoffs, there was non-monotonic oscillation in the ORs that cannot be attributed to “small numbers.” Exposure misclassification following categorization of the mismeasured exposure was differential, as predicted by theory. Sensitivity was higher among cases and specificity among controls. Cutoffs chosen for categorizing continuous variables can have profound effects on study results. When measurement error is not too great, the shape of the OR curve may provide insight into the true shape of the exposure-disease relationship.
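To make the cutoff sweep in the abstract above concrete, here is a small Python sketch that dichotomizes a mismeasured exposure at 61 cutoffs and computes a crude odds ratio at each. The simulation settings (a logistic exposure-outcome curve with intercept -2 and slope 0.5, an error standard deviation of 0.5, 20,000 subjects) and the 0.1 standard deviation spacing of the cutoffs are assumptions for illustration, not the authors' scenarios. The crude 2x2 odds ratio is used in place of logistic regression, to which it is equivalent when the only predictor is binary.

```python
import math
import random
import statistics


def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a exposed cases, b exposed controls,
    c unexposed cases, d unexposed controls. None if any cell is zero."""
    if 0 in (a, b, c, d):
        return None
    return (a * d) / (b * c)


def simulate_cohort(n, error_sd, rng):
    """Return (observed exposure, case status) pairs under assumed settings."""
    rows = []
    for _ in range(n):
        true_x = rng.gauss(0, 1)
        risk = 1 / (1 + math.exp(-(-2 + 0.5 * true_x)))  # assumed curve
        case = rng.random() < risk
        observed_x = true_x + rng.gauss(0, error_sd)     # classical error
        rows.append((observed_x, case))
    return rows


def or_by_cutoff(rows):
    """Dichotomize at mean + z*SD for z = -3.0, -2.9, ..., 3.0 (61 cutoffs)."""
    xs = [x for x, _ in rows]
    mean, sd = statistics.mean(xs), statistics.stdev(xs)
    results = []
    for k in range(61):
        z = round(-3 + 0.1 * k, 1)
        cutoff = mean + z * sd
        a = sum(1 for x, case in rows if x >= cutoff and case)
        b = sum(1 for x, case in rows if x >= cutoff and not case)
        c = sum(1 for x, case in rows if x < cutoff and case)
        d = sum(1 for x, case in rows if x < cutoff and not case)
        results.append((z, odds_ratio(a, b, c, d)))
    return results


rng = random.Random(42)
cohort = simulate_cohort(20000, error_sd=0.5, rng=rng)
results = or_by_cutoff(cohort)
for z, value in results[::10]:
    print(z, value)
```

Plotting the odds ratio against the cutoff for several error variances would give the kind of OR-versus-cutoff curve the abstract refers to.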
Open Access Article: Can Public Health Risk Assessment Using Risk Matrices Be Misleading?
Int. J. Environ. Res. Public Health 2015, 12(8), 9575-9588; doi:10.3390/ijerph120809575
Received: 11 June 2015 / Accepted: 11 August 2015 / Published: 14 August 2015
Cited by 1 | PDF Full-text (786 KB) | HTML Full-text | XML Full-text
Abstract
The risk assessment matrix is a widely accepted, semi-quantitative tool for assessing risks and setting priorities in risk management. Although the method can be useful to promote discussion to distinguish high risks from low risks, a published critique described a problem when the frequency and severity of risks are negatively correlated. A theoretical analysis showed that risk predictions could be misleading. We evaluated a practical public health example because it provided experiential risk data that allowed us to assess the practical implications of the published concern that risk matrices would make predictions that are worse than random. We explored this predicted problem by constructing a risk assessment matrix using a public health risk scenario (tainted blood transfusion infection risk) that provides a negative correlation between harm frequency and severity. We estimated the risk from the experiential data and compared these estimates with those provided by the risk assessment matrix. Although we validated the theoretical concern, for these authentic experiential data, the practical scope of the problem was limited. The risk matrix has been widely used in risk assessment. This method should not be abandoned wholesale, but users must address the source of the problem, apply the risk matrix with a full understanding of this problem, and use matrix predictions to inform, but not drive, decision-making.
(This article belongs to the Special Issue Methodological Innovations and Reflections)
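The concern described in the abstract above can be illustrated with a toy example. The 2x2 matrix, the 0.5 cut point, the 0-to-1 scales, and the two hazards below are all hypothetical and are not taken from the article; the sketch only shows how a matrix rating can rank two hazards in the opposite order to their quantitative risk (frequency times severity) when frequency and severity are negatively correlated.

```python
def matrix_rating(frequency, severity, cut=0.5):
    """Rate a hazard on a hypothetical 2x2 matrix: one point for each
    dimension (frequency, severity) that is at or above the cut."""
    score = int(frequency >= cut) + int(severity >= cut)
    return ("low", "medium", "high")[score]

def quantitative_risk(frequency, severity):
    """Expected harm, taken here as frequency times severity."""
    return frequency * severity

# Two made-up hazards lying on the line frequency + severity = 0.75,
# so a higher frequency goes with a lower severity.
hazards = {"A": (0.37, 0.38), "B": (0.74, 0.01)}

for name, (freq, sev) in hazards.items():
    print(name, matrix_rating(freq, sev), quantitative_risk(freq, sev))
```

In this constructed case the matrix rates hazard B above hazard A even though A carries the larger quantitative risk, which is the kind of inversion the critique warns about.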
Open Access | Article | Model Averaging for Improving Inference from Causal Diagrams
Int. J. Environ. Res. Public Health 2015, 12(8), 9391-9407; doi:10.3390/ijerph120809391
Received: 10 June 2015 / Revised: 23 July 2015 / Accepted: 5 August 2015 / Published: 11 August 2015
PDF Full-text (765 KB) | HTML Full-text | XML Full-text
Abstract
Model selection is an integral, yet contentious, component of epidemiologic research. Unfortunately, there remains no consensus on how to identify a single, best model among multiple candidate models. Researchers may be prone to selecting the model that best supports their a priori preferred result, a phenomenon referred to as “wish bias”. Directed acyclic graphs (DAGs), based on background causal and substantive knowledge, are a useful tool for specifying a subset of adjustment variables to obtain a causal effect estimate. In many cases, however, a DAG will support multiple sufficient or minimally sufficient adjustment sets. Even though all of these may theoretically produce unbiased effect estimates, they may, in practice, yield somewhat distinct values, and the need to select between these models once again makes the research enterprise vulnerable to wish bias. In this work, we suggest combining adjustment sets with model averaging techniques to obtain causal estimates based on multiple, theoretically unbiased models. We use three techniques for averaging the results among multiple candidate models: information criteria weighting, inverse variance weighting, and bootstrapping. We illustrate these approaches with an example from the Pregnancy, Infection, and Nutrition (PIN) study. We show that each averaging technique returns similar model-averaged causal estimates. An a priori strategy of model averaging provides a means of integrating uncertainty in selection among candidate causal models, while also avoiding the temptation to report the most attractive estimate from a suite of equally valid alternatives.
(This article belongs to the Special Issue Methodological Innovations and Reflections)
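Two of the weighting schemes named in the abstract above can be sketched as follows. The abstract does not say which information criterion was used, so the AIC-based weights here are an assumption, and the estimates, standard errors, and AIC values are invented for illustration. Bootstrapping is not shown.

```python
import math

def akaike_weights(aic_values):
    """Information-criterion weights: exp(-0.5 * delta AIC), normalized to sum to 1."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]

def inverse_variance_weights(standard_errors):
    """Weights proportional to 1 / SE^2, normalized to sum to 1."""
    precision = [1.0 / se ** 2 for se in standard_errors]
    total = sum(precision)
    return [p / total for p in precision]

def model_average(estimates, weights):
    """Weighted average of the per-model effect estimates."""
    return sum(e * w for e, w in zip(estimates, weights))

# Hypothetical log odds ratios from three adjustment sets supported by one DAG.
estimates = [0.40, 0.46, 0.43]
standard_errors = [0.10, 0.12, 0.11]
aic_values = [1502.1, 1503.4, 1501.8]

averaged_by_aic = model_average(estimates, akaike_weights(aic_values))
averaged_by_precision = model_average(estimates, inverse_variance_weights(standard_errors))
```

Fixing the weighting scheme before looking at the individual estimates is what removes the opportunity to pick the most appealing model after the fact.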
Open Access | Article | Gateway Effects: Why the Cited Evidence Does Not Support Their Existence for Low-Risk Tobacco Products (and What Evidence Would)
Int. J. Environ. Res. Public Health 2015, 12(5), 5439-5464; doi:10.3390/ijerph120505439
Received: 15 April 2015 / Revised: 6 May 2015 / Accepted: 11 May 2015 / Published: 21 May 2015
Cited by 4 | PDF Full-text (727 KB) | HTML Full-text | XML Full-text
Abstract
It is often claimed that low-risk drugs still create harm because of “gateway effects”, in which they cause the use of a high-risk alternative. Such claims are popular among opponents of tobacco harm reduction, who claim that low-risk tobacco products (e.g., e-cigarettes, smokeless tobacco) cause people to start smoking; these claims are sometimes backed by empirical studies that ostensibly support them. However, these studies consistently ignore the obvious alternative causal pathways, particularly that observed associations might represent causation in the opposite direction (smoking causes people to seek low-risk alternatives) or confounding (the same individual characteristics increase the chance of using any tobacco product). Due to these complications, any useful analysis must deal with simultaneity and confounding by common cause. In practice, existing analyses seem almost as if they were designed to provide teaching examples about drawing simplistic and unsupported causal conclusions from observed associations. The present analysis examines what evidence and research strategies would be needed to empirically detect such a gateway effect, if there were one, explaining key methodological concepts including causation and confounding, examining the logic of the claim, identifying potentially useful data, and debunking common fallacies on both sides of the argument, as well as presenting an extended example of proper empirical testing. The analysis demonstrates that none of the empirical studies to date that are purported to show a gateway effect from tobacco harm reduction products actually does so. The observations and approaches can be generalized to other cases where observed association of individual characteristics in cross-sectional data could result from any of several causal relationships.
(This article belongs to the Special Issue Methodological Innovations and Reflections)
Open Access | Article | Resampling Methods Improve the Predictive Power of Modeling in Class-Imbalanced Datasets
Int. J. Environ. Res. Public Health 2014, 11(9), 9776-9789; doi:10.3390/ijerph110909776
Received: 20 June 2014 / Revised: 4 September 2014 / Accepted: 12 September 2014 / Published: 18 September 2014
Cited by 5 | PDF Full-text (575 KB) | HTML Full-text | XML Full-text
Abstract
In the medical field, many outcome variables are dichotomized, and the two possible values of a dichotomized variable are referred to as classes. A dichotomized dataset is class-imbalanced if it consists mostly of one class, and performance of common classification models on this type of dataset tends to be suboptimal. To tackle such a problem, resampling methods, including oversampling and undersampling, can be used. This paper aims at illustrating the effect of resampling methods using the National Health and Nutrition Examination Survey (NHANES) wave 2009–2010 dataset. A total of 4677 participants aged ≥20 without self-reported diabetes and with valid blood test results were analyzed. The Classification and Regression Tree (CART) procedure was used to build a classification model on undiagnosed diabetes. Evidence of diabetes was defined according to WHO diabetes criteria. Exposure variables included demographics and socio-economic status. CART models were fitted using a randomly selected 70% of the data (training dataset), and area under the receiver operating characteristic curve (AUC) was computed using the remaining 30% of the sample for evaluation (testing dataset). CART models were fitted using the training dataset, the oversampled training dataset, the weighted training dataset, and the undersampled training dataset. In addition, resampling case-to-control ratios of 1:1, 1:2, and 1:4 were examined. The effects of resampling methods on the performance of extensions of CART (random forests and generalized boosted trees) were also examined. CARTs fitted on the oversampled (AUC = 0.70) and undersampled training data (AUC = 0.74) yielded a better classification power than that on the training data (AUC = 0.65). Resampling could also improve the classification power of random forests and generalized boosted trees. To conclude, applying resampling methods in a class-imbalanced dataset improved the classification power of CART, random forests, and generalized boosted trees.
(This article belongs to the Special Issue Methodological Innovations and Reflections)
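The two resampling strategies described in the abstract above can be sketched with simple random resampling toward a target case-to-control ratio. This is a minimal illustration under assumptions: the function names and the `controls_per_case` parameter are hypothetical, and the abstract does not state which resampling algorithm or software was used. As in the abstract, resampling would be applied to the training data only, with the test data left untouched.

```python
import random

def undersample(cases, controls, controls_per_case, rng):
    """Keep every case; draw controls without replacement down to the target ratio."""
    n_controls = min(len(controls), controls_per_case * len(cases))
    return list(cases), rng.sample(controls, n_controls)

def oversample(cases, controls, controls_per_case, rng):
    """Keep every control; add cases drawn with replacement up to the target ratio."""
    # Ceiling division: number of cases needed for the target ratio.
    target_cases = max(len(cases), -(-len(controls) // controls_per_case))
    extra = [rng.choice(cases) for _ in range(target_cases - len(cases))]
    return list(cases) + extra, list(controls)

# Toy training set: 10 cases and 90 controls (record identifiers only).
rng = random.Random(0)
train_cases = list(range(10))
train_controls = list(range(100, 190))

# Case-to-control ratios of 1:1, 1:2, and 1:4.
for ratio in (1, 2, 4):
    under_cases, under_controls = undersample(train_cases, train_controls, ratio, rng)
    over_cases, over_controls = oversample(train_cases, train_controls, ratio, rng)
    print(ratio, len(under_cases), len(under_controls), len(over_cases), len(over_controls))
```

Undersampling discards information from the majority class, while oversampling repeats minority-class records, so the two approaches trade off differently between data loss and overfitting to duplicated cases.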

Review


Open Access | Review | Spatial and Spatio-Temporal Models for Modeling Epidemiological Data with Excess Zeros
Int. J. Environ. Res. Public Health 2015, 12(9), 10536-10548; doi:10.3390/ijerph120910536
Received: 4 July 2015 / Revised: 19 August 2015 / Accepted: 21 August 2015 / Published: 28 August 2015
PDF Full-text (1694 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Epidemiological data often include excess zeros. This is particularly the case for data on rare conditions, diseases that are not common in specific areas or specific time periods, and conditions and diseases that are hard to detect or on the rise. In this paper, we provide a review of methods for modeling data with excess zeros with a focus on count data, namely hurdle and zero-inflated models, and discuss extensions of these models to data with spatial and spatio-temporal dependence structures. We consider a Bayesian hierarchical framework to implement spatial and spatio-temporal models for data with excess zeros. We further review current implementation methods and computational tools. Finally, we provide a case study on five-year counts of confirmed cases of Lyme disease in Illinois at the county level.
(This article belongs to the Special Issue Methodological Innovations and Reflections)
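The two model families named in the abstract above differ in how they treat zeros, which a short sketch of their probability mass functions may make concrete. The Poisson count distribution and the parameter names `lam` and `pi` are assumptions for illustration; the review may also cover other count distributions, and the spatial and spatio-temporal random effects are not shown.

```python
import math

def poisson_pmf(k, lam):
    """Standard Poisson probability of observing count k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) draw (which can itself be zero)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

def hurdle_pmf(k, lam, pi):
    """Hurdle Poisson: zero with probability pi, otherwise a
    zero-truncated Poisson(lam) draw (always positive)."""
    if k == 0:
        return pi
    return (1.0 - pi) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))

# Probability of a zero count under each model for illustrative parameters.
lam, pi = 2.0, 0.3
print(poisson_pmf(0, lam), zip_pmf(0, lam, pi), hurdle_pmf(0, lam, pi))
```

In the zero-inflated model, zeros arise from two sources (structural and sampling), whereas in the hurdle model all zeros come from a single process and positive counts from another.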

Other


Open Access | Concept Paper | Effects of Non-Differential Exposure Misclassification on False Conclusions in Hypothesis-Generating Studies
Int. J. Environ. Res. Public Health 2014, 11(10), 10951-10966; doi:10.3390/ijerph111010951
Received: 11 June 2014 / Revised: 11 October 2014 / Accepted: 14 October 2014 / Published: 21 October 2014
Cited by 3 | PDF Full-text (2205 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Despite the theoretical success of obviating the need for hypothesis-generating studies, they live on in epidemiological practice. Cole asserted that “… there is boundless number of hypotheses that could be generated, nearly all of them wrong” and urged us to focus on evaluating “credibility of hypothesis”. Adopting a Bayesian approach, we put this elegant logic into quantitative terms at the study planning stage for studies where the prior belief in the null hypothesis is high (i.e., “hypothesis-generating” studies). We consider not only type I and II errors (as is customary) but also the probabilities of false positive and negative results, taking into account typical imperfections in the data. We concentrate on a common source of imperfection in the data: non-differential misclassification of a binary exposure classifier. In the context of an unmatched case-control study, we demonstrate, both theoretically and via simulations, that although non-differential exposure misclassification is expected to attenuate real effect estimates, leading to the loss of ability to detect true effects, there is also a concurrent increase in false positives. Unfortunately, most investigators interpret their findings from such work as being biased towards the null rather than considering that they are no less likely to be false signals. The likelihood of false positives dwarfed the false negative rate under a wide range of studied settings. We suggest that instead of investing energy into understanding the credibility of dubious hypotheses, applied disciplines such as epidemiology should focus attention on understanding consequences of pursuing specific hypotheses, while accounting for the probability that the observed “statistically significant” association may be qualitatively spurious.
(This article belongs to the Special Issue Methodological Innovations and Reflections)
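The reasoning in the abstract above rests on two standard calculations, sketched here under assumptions. The first shows how non-differential misclassification of a binary exposure (the same sensitivity and specificity in cases and controls) attenuates the expected odds ratio. The second applies Bayes' theorem to obtain the probability that a "statistically significant" result is a false positive, given the prior probability of the null, the type I error rate, and the power. The function names and all numeric inputs are hypothetical, and the article's own formulas may differ.

```python
def observed_prevalence(true_prevalence, sensitivity, specificity):
    """Expected proportion classified as exposed when the classifier is imperfect."""
    return (sensitivity * true_prevalence
            + (1.0 - specificity) * (1.0 - true_prevalence))

def odds_ratio(p_cases, p_controls):
    """Odds ratio from exposure prevalence in cases and in controls."""
    return (p_cases / (1.0 - p_cases)) / (p_controls / (1.0 - p_controls))

def prob_false_positive(prior_null, alpha, power):
    """P(null is true | significant result), by Bayes' theorem."""
    false_pos = alpha * prior_null
    true_pos = power * (1.0 - prior_null)
    return false_pos / (false_pos + true_pos)

# Hypothetical inputs: exposure prevalence 0.2 in controls and a true OR of 2,
# which corresponds to a prevalence of 1/3 in cases.
p_controls, p_cases = 0.2, 1.0 / 3.0
sens, spec = 0.8, 0.9   # identical in cases and controls (non-differential)

true_or = odds_ratio(p_cases, p_controls)
observed_or = odds_ratio(observed_prevalence(p_cases, sens, spec),
                         observed_prevalence(p_controls, sens, spec))

# With a high prior belief in the null, lower power (for example, due to
# attenuation) raises the share of significant findings that are false.
fp_high_power = prob_false_positive(prior_null=0.99, alpha=0.05, power=0.8)
fp_low_power = prob_false_positive(prior_null=0.99, alpha=0.05, power=0.5)
```

Because attenuation reduces power while the type I error rate stays at its nominal level, the false positive probability among significant results rises, which matches the qualitative point made in the abstract.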

Journal Contact

MDPI AG
IJERPH Editorial Office
St. Alban-Anlage 66, 4052 Basel, Switzerland
ijerph@mdpi.com
Tel. +41 61 683 77 34
Fax: +41 61 302 89 18