1. Introduction
Forensic anthropologists can assist with the identification of an unknown individual by constructing a biological profile, estimating sex, stature, and age at death from skeletal remains. In some jurisdictions, ancestry (sometimes referred to as race or population affinity) is also considered part of identification for historical and political reasons rather than biological reasons [1]. There are many theoretical, methodological, and ethical problems with the continued use of these typological approaches in forensic anthropology. Theoretically, osteometric and genetic variation does not cluster by racial group, continental origin, or national affinity (for a systematic review in forensic anthropology, see [1,2]). Methodologically, when tested systematically using independent samples, methods perform very poorly when estimating group membership, matching premortem documents in only 36–50% of cases [1,3]. These poor results are consistent with some of the earliest tests of these methods beginning in the 1960s, for example, [4]. Changes in reference samples, alternative statistical approaches, new terminology, software updates, etc., have not resulted in any improvements in providing information that could be useful in a forensic investigation [1]. Additionally, linking sex estimation and stature estimation to group membership can make these methods more difficult to apply, while compromising accuracy and utility (see [1,5]). Ethically, there are at least two major problems with these typological approaches. First, forensic anthropologists should be using the best methods that are available and stop using methods that provide wrong information in over 50% of cases [1,3]. Second, the continued racialization of the dead only serves to reinforce existing power structures that marginalize the living [6]. With a privileged place at the intersection of science and law, forensic anthropologists can have an enormous influence on how individuals and groups are racialized and marginalized by what they say and how they say it [7].
For individual cases, assessing ancestry/race/affinity should be avoided because the methods provide wrong information in the majority of cases, which would seriously compromise an investigation. Forensic anthropologists should adopt a more robust protocol for identification that does not include estimating race, ancestry, or affinity [1,3,6]. However, there is a need to prove that a specific group was targeted in investigations of genocide and crimes against humanity. In this paper, we present an approach to estimate group membership that can be applied to mass graves. Although researchers differ on the number of individuals required for a burial site to be considered a mass grave, most agree that the site must contain remains that are tightly packed and indiscriminately placed [8]. In other words, the grave is an attempt to hide the evidence of mass murder and deny the dead the locally appropriate mortuary rituals and practices and is thus a continuation of structural violence inflicted on the dead and the survivors.
Without relying on the outdated typological concepts of human variation, we present some preliminary results for an approach that can be used within the context of a mass grave to demonstrate that a specific group was the target of violence. In this research we use carefully selected research samples from two different identified reference collections (IRCs). Subsamples from the relatively homogeneous Coimbra collection are used to model various mass grave scenarios involving one or multiple sexes and a range of sample sizes. A sample from the relatively heterogeneous Terry collection is used as a reference sample for comparison.
3. Results
An allocation accuracy of greater than 50% for a subsample from the Coimbra collection indicates that the individuals included in the hypothetical mass grave have more in common with each other than with the reference sample from the Terry collection. This threshold of 50% was exceeded for the all-sexes scenarios when the sample size for the mass grave was greater than 25 cases (Table 2). The 50% threshold was met with smaller sample sizes for sex-specific trials. For scenarios that included individuals whose sex was documented as female (Table 3), allocation accuracy for the Coimbra collection mass grave subsamples consistently exceeded the 50% threshold when the sample size was 14 individuals or higher. When the sample was below 14 individuals, the accuracy varied widely from 28.6% (n = 7), to 38.5% (n = 13), to 100% (n = 6). For scenarios that included individuals whose sex was documented as male (Table 4), the Coimbra collection mass grave subsamples consistently met or exceeded the 50% threshold with a sample size of about 12 to 14 individuals. As with the females, the allocation accuracy below this sample size produced both the lowest allocation accuracy of 44.4% (n = 11) and the highest allocation accuracy of 100% (n = 4). It is worth stating explicitly that the 100% accuracy for the smallest sex-specific models is likely due to chance, since these results are not consistent with the other results, wherein accuracy is correlated with sample size.
Excluding the smallest sample sizes (n = 4 males and n = 6 females), which had the highest allocation accuracies (100%), there is a clear pattern wherein allocation accuracy increases with an increase in size of the mass grave sample. This pattern is clearly visible in Figure 1, a scatter plot of sample size by allocation accuracy. The other pattern that is clear in Figure 1 is that, for any given sample size, the allocation accuracy for the sex-specific scenarios always exceeds the all-sexes scenarios.
The p-values for each case were reviewed in detail for two specific trials for the all-sexes scenarios to assess the pattern of allocation. The p-values for the scenario with the best results using the entire Coimbra collection sample are presented graphically in Figure 2. The p-values for the subsamples (n = 26) that met the minimum parameter (50% allocation accuracy) for usability are presented in Figure 3. For both figures, the first column includes p-values between 0.0 and 0.09, the second column includes p-values between 0.1 and 0.19, the third column includes p-values between 0.2 and 0.29, etc. The white portion of each column represents the Coimbra collection individuals, and the shaded portion of each column represents the Terry collection individuals. Any given column may include individuals from either collection. For example, the first column in Figure 2 includes four individuals from the Terry collection sample allocated to the Coimbra collection sample, and over 50 individuals from the Coimbra collection sample allocated to the Coimbra collection sample. The pattern in Figure 2 clearly demonstrates that the majority of the sample from the Coimbra collection was allocated to the Coimbra collection with p-values from 0.0 to 0.29 (probabilities of 99% to 71%, respectively), as seen in the white portions of the first three columns. The strong p-values from each individual case confirm the relative homogeneity of the Coimbra collection sample. The results in Figure 3 appear to be different because of the much smaller sample size for the Coimbra collection subsamples. However, these results are consistent with Figure 2. For this scenario, wherein the Coimbra collection subsamples included 26 individuals, 14 of those individuals (53.8%) were allocated to the Coimbra collection. Of those 14 cases, 13 (i.e., 50% of the 26 individuals) had p-values between 0.0 and 0.29 (probabilities of 99% to 71%, respectively). Even after excluding a case from the Coimbra collection with an ambiguous p-value of 0.44 (the one case in white in the 0.4 column in Figure 3), the overall allocation accuracy is still 50%.
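The column layout described for Figure 2 and Figure 3, and the percentages quoted for the n = 26 scenario, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names and the example records are hypothetical, and the p-values are assumed to be reported to two decimal places.

```python
from collections import Counter

def column_index(p_value):
    """Map a two-decimal p-value to its 0.1-wide column: 0 for 0.00-0.09, 1 for 0.10-0.19, ..."""
    hundredths = round(p_value * 100)   # work in integers to avoid floating-point edge cases
    return min(hundredths // 10, 9)     # a value of exactly 1.00 is placed in the last column

def tabulate(records):
    """Count cases per (column, documented collection), i.e. the stacked bars of the figures."""
    return Counter((column_index(p), collection) for collection, p in records)

def percent(part, whole):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)

# Hypothetical records (documented collection, p-value); NOT data from the study.
records = [("Coimbra", 0.05), ("Coimbra", 0.12), ("Terry", 0.08),
           ("Coimbra", 0.29), ("Coimbra", 0.44), ("Terry", 0.95)]
counts = tabulate(records)
print(counts[(0, "Coimbra")], counts[(0, "Terry")])   # first column, by collection

# Counts reported in the text for the n = 26 scenario.
print(percent(14, 26))   # 53.8, allocated to the Coimbra collection
print(percent(13, 26))   # 50.0, allocated with p-values between 0.0 and 0.29
```

Keeping the documented collection alongside each column index is what allows a single column to contain individuals from either collection, as in the figures.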
4. Discussion
Allocation accuracies varied with subsample sizes and sex-specific approaches in various scenarios, but consistently demonstrated the utility of the method for assessing that a specific relatively homogeneous group was targeted and included in a mass grave. Allocation accuracies of greater than 50% for the various subsamples modelled as mass grave scenarios were possible, provided that minimum samples (n > 13 for sex-specific and n > 25 for all-sexes) were available for analysis. The modeling of the Coimbra collection subsamples demonstrates that it is possible to assess that those individuals who were murdered and included in the mass grave have more in common with each other than with an external, independent reference sample.
Controlling for some variation related to commonly documented sexes (male and female) consistently provided better results and worked well with smaller samples. These results are statistically expected since a sex-specific subsample, regardless of how sex is defined, will include a narrower range of variation than the pooled subsample. A male-only sample will exclude the smallest individuals from a subsample, and the female-only sample will exclude the largest individuals from the subsample. Either approach will truncate the range of variation. Controlling for some of the variation associated with documented sex necessarily made the Coimbra subsamples more homogeneous and allowed for good results (over 50%) with smaller sample sizes. Although the all-sexes approach requires a larger sample, it is a more robust approach for several reasons. First, sex cannot always be reliably estimated. It can be difficult to assess which individuals might be larger females versus smaller males. In other cases, individuals who identify as men (gender) and are targeted because of their gender may have a skeletal phenotype that is consistent with a “female pattern” or vice versa. Although counter-intuitive, a sex-specific approach based on various biological assumptions may perform worse than an all-sexes approach when a specific gender was targeted because biological sex does not neatly map onto gender. Additionally, the all-sexes approach can be pursued without linking this method to sex estimation. Individual sex and group membership remain two separate independent questions, and errors in sex estimation will not have an impact on the assessment of group membership (for a well-documented case involving stature estimation, see [5]). Overall, the all-sexes approach should be used when a binary approach to sex estimation is not applicable, sex cannot be reliably estimated, or marginalized genders or sexes (not XX or XY) have been disproportionately targeted.
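The statistical expectation that a sex-specific subsample is less variable than a pooled one follows from the law of total variance: the variance of the pooled sample equals the average within-group variance plus the variance between the group means. The sketch below illustrates this with hypothetical values for a single measurement and equal group sizes; the numbers are invented for illustration and are not taken from either collection.

```python
from statistics import mean, pvariance

# Hypothetical values of one cranial measurement (mm); NOT data from either collection.
females = [170.0, 172.0, 174.0, 176.0, 178.0]
males = [180.0, 182.0, 184.0, 186.0, 188.0]
pooled = females + males

# With equal group sizes, the average within-sex variance is a simple mean.
within = mean([pvariance(females), pvariance(males)])
# Variance of the two group means around the grand mean.
between = pvariance([mean(females), mean(males)])
total = pvariance(pooled)

print(within, between, total)   # 8.0 25.0 33.0
```

In this toy example most of the pooled variance (25 of 33) comes from the difference between the group means, which is the component a sex-specific analysis removes.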
An added benefit of this approach that focuses on the entire group of victims rather than any one individual is that missing data for any one case (for example, due to severe cranial trauma), or the ambiguous p-value in any one case (see Figure 3), will not compromise the utility of the method. The measurements collected for most individuals in the mass grave can be effectively used to demonstrate that a specific relatively homogeneous group was the target of violence, and a clear cluster of strong p-values can be used to support the conclusion (see Figure 2 and Figure 3). The results wherein allocation accuracy exceeded 50% were consistent when sample sizes fluctuated slightly. For example, when the subsamples for the all-sexes scenario (Table 2) ranged from 49 to 52 individuals, the allocation accuracy ranged from 73.5% to 75%. All the results in this accuracy range strongly demonstrate the homogeneity of the subsample used to model a mass grave. Having to exclude a few cases because of missing data has minimal impact on the utility of the method provided the sample size minimums are met.
Additional research is needed to establish the full parameters of applicability and utility of the method using different predictor variables. Logistic regression allows for a great deal of flexibility in which variables can be used as predictor variables because the data do not need to be normally distributed. It is possible to use different measurements and possibly fewer measurements, and to combine measurements with morphological data and dental non-metric traits as predictor variables. However, future research should be conducted to confirm the minimum number of variables required, and which combinations can be used effectively. In this research we used 10 standard cranial measurements that capture overall cranial size and shape. These standard measurements are often already being collected as part of a standard protocol in many investigations. Thus, the approach described here can be applied to data collected from individuals from mass graves that have already been excavated. The method has potential beyond the investigation of genocide and human rights violations. The method can be applied in some bioarchaeological contexts as well. This research was initiated, in part, by a need to establish the parameters of applicability of the method to assess the relationships of the deceased included in a Late Bronze Age monumental tomb on the island of Kefalonia, Greece, without relying on destructive methods (for details, see [17]).
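To make the role of logistic regression concrete, the sketch below fits a two-group model with 10 predictor variables and reports allocation accuracy separately for each group. It is an illustration under stated assumptions rather than the procedure or software used in this research: the data are synthetic, standardized stand-ins for measurements, the fitting routine is a plain gradient-descent implementation written for this example, all function names are hypothetical, and accuracy is computed on the same cases used for fitting, with no cross-validation.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear(x, w, b):
    """Linear predictor: weighted sum of the measurements plus the intercept."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and intercept by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(linear(xi, w, b)) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def allocation_accuracy(X, y, w, b, group):
    """Share of cases documented as `group` (0 or 1) that the model allocates to that group."""
    own = [xi for xi, yi in zip(X, y) if yi == group]
    hits = sum((sigmoid(linear(xi, w, b)) >= 0.5) == (group == 1) for xi in own)
    return hits / len(own)

# Synthetic stand-ins for 10 measurements; NOT data from either collection.
rng = random.Random(42)
reference = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(60)]   # more variable
grave = [[rng.gauss(1.0, 0.5) for _ in range(10)] for _ in range(30)]       # more homogeneous
X = reference + grave
y = [1] * len(reference) + [0] * len(grave)   # 1 = reference sample, 0 = modelled mass grave

w, b = fit_logistic(X, y)
acc_grave = allocation_accuracy(X, y, w, b, group=0)
acc_reference = allocation_accuracy(X, y, w, b, group=1)
print(f"grave: {acc_grave:.1%}, reference: {acc_reference:.1%}")
```

Because the model makes no distributional assumption about the predictors, the synthetic measurements could be replaced by other metric or categorical variables coded numerically, which is the flexibility noted above.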
Skeletal analysis is one of two critical steps to determine that a specific group was targeted and included in a specific grave. After the relative homogeneity of the victims in a mass grave is confirmed with this new method, the ethnicity of the targeted group must be assessed using material cultural objects found in the mass grave with the victims. To be clear, we are not attempting to estimate “ethnicity” as a euphemism for race, ancestry, or affinity. The skeletal data are not used to estimate “ethnicity.” The true ethnicity of a group is assessed with the material cultural objects and personal belongings of those murdered and placed in a mass grave. The skeletal data would only indicate that the victims were a relatively homogeneous group when compared to an external reference sample. A similar approach using material cultural objects can be used to assess if certain genders were disproportionately targeted.
In addition to providing evidence supporting the utility of a new method, the results from this research also provide additional evidence against using a typological approach in forensic anthropology. The Coimbra collection subsamples consisting of European-born individuals did not cluster with those individuals from the Terry collection who were described as “White” in various documents (for details about documentary data, see [9]). Furthermore, those individuals who were described in various documents as “White” and “Black” from the Terry collection were consistently grouped together. The results in Table 2, Table 3 and Table 4 indicate that the allocation accuracy for the Terry collection sample starts high at 82.5% and increases to 100% as the Coimbra collection subsamples decrease in size. However, a more rigorous approach is to hold sample sizes from each collection approximately equal. For the all-sexes comparison (Table 2), the Terry collection sample included 111 cases, and the allocation accuracy for this reference sample was 83.8%. Comparable samples from the Coimbra collection are 100 and 127 individuals, and the allocation accuracies for these Coimbra samples were 82.0% and 84.3%, respectively. Similarly, for the female-specific scenario wherein sample size was the same for each collection (n = 48), the allocation accuracy was 83.3% for the Terry collection individuals and 85.4% for the Coimbra collection subsamples (Table 3). Lastly, for the male-specific scenario the maximum sample size was the same at 63 individuals for the Coimbra collection sample and the Terry collection, and allocation accuracy was 82.5% for the Terry collection sample and 81.0% for the Coimbra collection sample. It is possible to predict collection membership with an allocation accuracy exceeding 80% with minimal bias in accuracy by collection. The historical, socioeconomic, and political context of the collection has much more explanatory power than race/ancestry/affinity when investigating patterns of human variation [2,5,10,11,13,18].
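The matched-sample comparison can be checked directly from the accuracies quoted in this paragraph. The short sketch below only tabulates those reported values and computes the gap between collections in percentage points; the variable and function names are illustrative.

```python
# Allocation accuracies (%) reported in the text for approximately matched sample sizes.
# Each entry: (scenario, Terry n, Terry accuracy, Coimbra n, Coimbra accuracy)
matched = [
    ("all sexes", 111, 83.8, 100, 82.0),
    ("all sexes", 111, 83.8, 127, 84.3),
    ("female", 48, 83.3, 48, 85.4),
    ("male", 63, 82.5, 63, 81.0),
]

def accuracy_gap(row):
    """Absolute difference between the two collections' accuracies, in percentage points."""
    _, _, terry_acc, _, coimbra_acc = row
    return round(abs(terry_acc - coimbra_acc), 1)

gaps = [accuracy_gap(row) for row in matched]
all_above_80 = all(acc > 80 for _, _, t, _, c in matched for acc in (t, c))

print(gaps)           # [1.8, 0.5, 2.1, 1.5]
print(max(gaps))      # 2.1
print(all_above_80)   # True
```

The largest gap is 2.1 percentage points, and neither collection is consistently the better-allocated one, which is what "minimal bias in accuracy by collection" amounts to numerically.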
The results from this research demonstrate that the method for assessing group homogeneity can work in a mass grave context, and that approaches that use race, ancestry, or affinity are not effective at capturing patterns of human variation and predicting group membership. However, the results do not show that the variation in the subsamples from the Coimbra collection can, or should, be extrapolated nationally, and the results do not demonstrate a pattern of “population affinity” for Portugal (see [10]). The homogeneity of the Coimbra collection subsamples is due to the samples being derived from the mortality sample of a group from a very specific time, place, and political and socioeconomic context. The method works because it assesses temporally, geographically, and economically specific biocultural variation for a specific local group. A different pattern of variation is necessarily expected with living conditions that vary through time and space. For example, in the context of assessing patterns of secular change, the mean maximum femur length for males from the Coimbra collection born in the 19th century was not significantly different from the mean for females from the Terry collection when controlling for year of birth [18]. When looking at secular changes during the 20th century throughout Portugal, the mean height of males increased from 163.2 cm in 1904 to 172.13 cm with significant differences by district [19]. There is no homogeneity through time or space in Portugal.
5. Conclusions
The main goal of this research is to present a new approach that is applicable in a mass grave context to assess that a specific, relatively homogeneous group was targeted without resorting to harmful typological pseudo-scientific stereotypes. This method may be used in more recent forensic contexts and possibly in some bioarchaeological applications. We used 10 standard cranial measurements that capture overall cranial size and shape and that are relatively resistant to premortem, perimortem, and taphonomic changes. All the measurements can be easily collected with standard spreading and sliding calipers with minimal training, and typically are already collected as part of a standard protocol in many investigations. The results suggest that these standard measurements can be applied to mass graves that have already been excavated, as well as to future investigations. Missing data or measurements for any one case will not compromise the utility of the method, provided that minimum sample sizes are available for analysis. Additional research is required to assess which other metric, morphological, and non-metric variables could reliably be used as predictor variables.
The pattern of results clearly supports that this method does provide useful information for the investigation of mass graves with minimum sample sizes of greater than 25 for a multi-sex scenario and greater than 13 individuals for sex-specific approaches. The results from this research also provide additional evidence against a typological approach in forensic anthropology involving race, ancestry, and affinity. The Coimbra collection subsamples of European-born individuals consistently did not cluster with those individuals from the Terry collection who were described as “White.” Those individuals who were described as “White” and “Black” from the Terry collection were consistently grouped together.