Although “forensic entomology” could refer to many other activities, the phrase almost always denotes a death investigation, and in particular the use of insects to estimate when death occurred [1]. The two familiar entomological postmortem clocks are the development of an individual insect and the succession of species on a corpse [2], and both processes may be influenced by many factors [3]. Depending on the circumstances, the investigator may interpret a prediction of insect age, or of the time a corpse was available to insects, as the actual postmortem interval (PMI) or as a minimum PMI (PMImin).
We think that this prediction should be mathematically explicit, and that in most cases the prediction should be a range rather than a single value. If the range is estimated using statistical analysis, the typical method is to calculate a confidence set, a step that we think can satisfy the National Research Council’s recommendation that any forensic science conclusion include an objective statement of the uncertainty associated with that conclusion [6].
Being able to calculate a probability for a PMI estimate would obviously improve casework, and we think it would be difficult to validate a PMI-prediction method if the estimate does not include confidence limits. It would be an unreasonable validation standard to expect the prediction to be exactly correct; how close to the correct value is close enough to support validity? Confidence limits provide the answer to that question. The coverage proportion should be at least the nominal value, e.g., if calculating 95% confidence intervals, then at least 95% of predictions should include the true value.
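The coverage criterion just described can be checked by simulation. The sketch below is a toy illustration only (it is not the authors' method, and all numbers are invented): many samples are drawn around a known true value, a 95% interval is computed from each, and the fraction of intervals containing the truth is tallied.

```python
import math
import random
import statistics

def mean_ci(sample, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (m - z * se, m + z * se)

random.seed(1)
TRUE_AGE = 120.0  # hypothetical true larval age, in hours
trials = 2000
hits = 0
for _ in range(trials):
    sample = [random.gauss(TRUE_AGE, 10.0) for _ in range(30)]
    lo, hi = mean_ci(sample)
    if lo <= TRUE_AGE <= hi:
        hits += 1
coverage = hits / trials
print(coverage)  # empirical coverage; should be close to (ideally at least) 0.95
```

If the analogous tally for a real PMI-prediction method fell well below the nominal 95%, the method would fail this validation criterion.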
Furthermore, it appears to us that forensic entomology researchers give almost no consideration to how the choice of experimental design might influence the performance of PMI predictions based on the resulting data.
Some Specialized Terminology
When describing the relationship between development or succession and elapsed time, we refer to the amount of time that has elapsed (usually the time since oviposition/larviposition, i.e., age, or the time the corpse has been available to carrion insects, i.e., the succession interval (S.I.) [5]) as a condition; any other factor that influences development or succession rate (e.g., temperature) as a covariate; and the aspect(s) of a specimen or insect community that change with time (e.g., larval length or the species present on a corpse) as a response. Scientists concerned with PMI estimation try to understand (e.g., model) the condition/response relationship so as to be able to infer condition from response. To do this, one conducts a training experiment (producing training data, TD), in which one records the response(s) corresponding to known values of the conditions.
The more an investigator understands the correct condition/response relationship to use for a given death investigation, the more accurate a prediction of PMI is likely to be. This relationship has been the topic of a great deal of published research into the effect of factors such as temperature (e.g., [7]), habitat (e.g., [8]), drug concentration (e.g., [9]), or sex of the insect (e.g., [10]). However, modeling that relationship, such as fitting a regression line to the data, does not by itself specify how to predict condition from response. Exactly how to predict age or S.I. has received relatively little attention in the literature.
The purpose of this paper, then, is to persuade a forensic entomologist to think carefully about how experimental data will be used to predict carrion insect age or S.I. before she or he designs an experiment meant to support casework. To do this, we will describe examples, referred to as lessons, drawn from our own research program on inverse prediction and related statistical methods for PMI estimation [11].
2. Lesson 1: Employ an Unbiased Sampling Technique for Generating Training Data
This is an elementary aspect of good design for many kinds of experiments [17], and we examined the implications for a carrion insect age prediction model [18]. We were motivated by the fact that some authors deliberately collected biased samples by targeting the largest larvae in a single-age cohort, and by the fact that authors who claimed to take a random sample did not describe any randomization method [18], without which they could not have sampled even approximately at random [19]. Given that taking a random sample would require first physically isolating each individual from a rearing container, we doubt that an author who had done this would fail to mention it in the description of experimental methods.
It was clear that sampling the largest larvae yielded an inaccurate prediction of age: in the example of Figure 1B, only about 20% of the predictions included the true age. A model built from small random samples performed relatively well compared to one using the much larger full data set, with 100% of predictions (Figure 1A) including the true age. However, a random sample would require so much effort, and would be so disruptive of development, that the insects not selected would, we believe, not be suitable for inclusion in an older sample. Therefore, in most circumstances one might as well sample (e.g., kill and measure) all insects in an age cohort. A random sample, removing all insects in a rearing container without replacement at the chosen age and then applying a randomization procedure to select a subset for data collection, might be worthwhile if the measurements were particularly time-consuming and/or expensive [18]. We think it is clear that the most commonly used sampling scheme in carrion insect development studies, i.e., repeatedly removing a small number of larvae from a rearing container, is a faulty experimental design and should be discontinued.
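The bias at issue is easy to demonstrate with hypothetical numbers. In the sketch below (our illustration, not data from [18]), targeting the ten largest larvae in a same-age cohort inflates the apparent typical length, while a randomized subset of the same size does not:

```python
import random
import statistics

random.seed(42)
# hypothetical cohort: 200 larval lengths (mm) from a single-age rearing container
cohort = [random.gauss(12.0, 1.5) for _ in range(200)]

largest_10 = sorted(cohort, reverse=True)[:10]  # deliberately biased: biggest larvae only
random_10 = random.sample(cohort, 10)           # randomization procedure, no replacement

print(round(statistics.mean(cohort), 2))      # cohort mean, near 12 mm
print(round(statistics.mean(largest_10), 2))  # biased well above the cohort mean
print(round(statistics.mean(random_10), 2))   # close to the cohort mean
```

Training data built from the biased subset would systematically overstate length at every sampled age, which is exactly the defect that degraded the predictions in Figure 1B.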
3. Lesson 2: Exceed the Minimum Sample Size for a Categorical Response
We proposed the only statistical method for predicting condition based on a categorical response [12]; see also [20]. The original application was to estimate S.I. [5], but the same procedure can be used to predict insect age from stage of development.
Depending on the number of response categories and the preferred level of statistical significance [12], there is a sample size, e.g., a number of training-experiment carcasses observed for a given set of environmental conditions, below which it is impossible to reject any putative S.I. value (Table 1). In other words, below this smallest sample size the training data cannot provide enough statistical power for a prediction.
For example, the smallest sample size (e.g., the number of experimental pig carcasses or human corpses decomposed under a given set of circumstances) that makes rejection at the 5% level possible is 7, and at that level of replication the method works for only two response categories, e.g., the presence/absence of a single species during succession [12]. For two insect species (four response categories) the minimum sample size is 22, for three insect species it is 52, and so on; the analysis is likely to be more useful in practice if the sample size of the training data exceeds this minimum [5].
The same logic applies to confidence limits on an estimate of insect specimen age from developmental stage [16]. Most published development data also include observations of one or more continuous responses, such as body length, but instar is a particularly reliable response because it is not affected by the method used to preserve the mystery specimen. If the training data include six life stages (e.g., three larval instars, pupa, adult), then the smallest sample size is 37 insects per age [16].
We note that although sample size still plays a role in the performance of an age-prediction model based on continuous training data, in that case there is no smallest sample size [14].
4. Lesson 3: The Practical Significance of a Covariate, or the Practical Value of a Response, Should Be Evaluated by Predictive Model Performance
Many authors have examined the effect of one or more factors on carrion insect succession or development rate (see Introduction). For example, the discovery that food tissue type influenced larval growth rate led [22] to conclude that “it is important to know where on a corpse larval material has come from”. The implication is that a predictive model based on training larvae reared on one substrate might be unacceptably inaccurate for predicting the age of a mystery specimen that fed on a different substrate. The typical interpretation for an experiment such as this was to infer a practical effect from the discovery of a statistically significant effect, e.g., [9].
However, while we do not doubt that some covariates (temperature especially comes to mind) are of practical importance, so that there should be either a match between training data and scene conditions or some way to extrapolate between the two, a statistically significant effect does not automatically correspond to a practically important effect on prediction performance (Figure 2). In this example, larval food type (pork heart vs. pork liver) had a highly significant effect on larval growth rate, yet the model derived from liver was quite accurate when estimating the age of larvae grown on heart. Depending on the particular training data set and comparison, the outcomes of these two approaches will not always conflict in this way; but given that a test of significance such as ANOVA does not answer the practical question of interest, while a measure of prediction performance such as the coverage rate does, we think that measuring prediction performance should be preferred.
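Computing a coverage rate is trivial once one has prediction intervals for test specimens of known condition; the sketch below uses invented intervals and ages purely to show the bookkeeping.

```python
def coverage_rate(intervals, true_values):
    """Proportion of prediction intervals that contain the known true value."""
    hits = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, true_values))
    return hits / len(true_values)

# hypothetical 95% prediction intervals (hours) for four larvae of known age
intervals = [(40, 60), (55, 80), (90, 130), (100, 150)]
true_ages = [50, 70, 85, 120]
print(coverage_rate(intervals, true_ages))  # 0.75: three of the four intervals cover
```

Two candidate models (e.g., trained on liver vs. heart) can then be compared directly on the practical question: which one's intervals cover the true ages of the test specimens at or above the nominal rate?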
Similarly, the fact that a particular variable changes with time since death does not by itself guarantee that its inclusion in the model will improve prediction performance. Table 2 shows a simple example of predicting larval age from a multivariate response: a prediction of age based on length, width, and instar was no better than one based on width and instar alone. The number of variables that could potentially be used to predict PMI is huge. One must decide what data to record, and not record, during an experiment, and the effect on prediction performance may indicate whether or not to include a measurement in future studies. For example, insect sex influences development rate [26], but is it worthwhile to measure calliphorid larval sex when doing so requires relatively expensive equipment and/or reagents [27]? Note that we do not claim that including individual insect sex in the model is not worthwhile, only that this and other covariates need to be evaluated in this fashion.
For these reasons, we suggest that the practical importance of a covariate or the practical value of a response be assessed based on predictive model performance.
The most crucial practical application of forensic entomology is the prediction of carrion insect age or S.I., which can then potentially be interpreted to support a forensic investigation, for example when one concludes that the age of an insect equals PMImin. We argue that this prediction should be mathematically explicit, should yield a range of values rather than a single value, and that defining this range as a confidence set would conform to mainstream scientific practice.
To the extent that a reader agrees with our views, it follows that a central aim of forensic entomology research should be to optimize the performance of statistical models for PMI prediction, something that can be done only if one employs such a statistical model. We hope that the examples presented here will better convey this message to readers who may have missed it within the mathematical language of our earlier publications.