A Bayesian Approach to Estimating Age from the Auricular Surface of the Ilium in Modern American Skeletal Samples

Abstract: Age estimation from human skeletal remains is a critical component of the biological profile for unidentified decedents. Using a Bayesian approach, we examine two popular methods (Lovejoy, LJ; and Buckberry and Chamberlain, BC) for estimating age from the auricular surface of the ilium. Ages of transition are generated from a modern Portuguese skeletal sample (n = 466) and are coupled with an informative prior from historic Spitalfields, London (n = 179) to estimate age in a sample of modern Americans from the Bass Donated collection (n = 639). The Bass collection was challenging to statistically model, potentially due to higher morbidity and mortality characteristics of the central southern United States. The highest posterior density ranges provide a realized accuracy between 84–89% for males and 85–91% for females using the LJ method, and a realized accuracy between 79–82% for males and 65–71% for females using the BC method. Both methods worked well for older individuals. Cumulative binomials showed that both methods significantly underperformed; however, results were better for the LJ method, which also showed lower bias. Reference tables for aging modern American samples are provided, and the data meet Daubert guidelines, i.e., legal criteria for acceptable scientific evidence in a court of law in the United States.


Introduction
An important component in building the biological profile for an unidentified decedent is the estimation of age-at-death. Adult age estimation is particularly challenging because it is based on the degeneration of the skeletal tissue, which is further affected by an individual's life history. These biological and environmental processes result in considerable variation in age indicators, especially for middle to older adults.
Traditionally, skeletal age is estimated by scoring features on a reference sample where morphological changes in the skeleton are linked with known age and then applying these estimated ages to an unknown target sample or individual. In a severe critique, Bocquet-Appel and Masset [1] outlined the bias and inaccuracy inherent in this approach, where the age distribution of the target sample will reflect the age distribution of the reference sample, a condition which later came to be known as age mimicry. Boldsen and colleagues [2] proposed transition analysis, a method that reduces or eliminates the issue of age mimicry. When transition analysis parameters are combined with Bayesian statistics using a prior age distribution, the resulting age estimates more accurately reflect the senescence changes in the target sample while minimizing bias [2]. Bayesian statistics with transition analysis have been successfully applied to pelvic age indicators, including the pubic symphysis [3][4][5][6] and the auricular surface [7][8][9].
Lovejoy and colleagues [10] developed the traditional, standardized method for estimating age-at-death based on morphological changes to the auricular surface of the ilium. Their method (referred to here as the Lovejoy method, LJ), based on an eight-phase system, describes changes to several criteria (e.g., transverse organization, texture, porosity) and is considered advantageous because the changes manifest well into advanced years. The method has been widely used across various geographic regions, and most authors reported broad agreement when applying it [11][12][13][14][15][16][17][18][19]. However, as one might expect, this traditional approach suffered from typical issues of age mimicry, and some researchers struggled to assign a myriad of morphological changes to one stage. This latter issue was specifically addressed by Lovejoy and colleagues, who proposed that scholars should focus on the critical age indicators that best represent the aging process and use auxiliary features to adjust age estimates.
Buckberry and Chamberlain [20] used a different approach to revise the auricular surface aging method (referred to here as the Buckberry and Chamberlain method, BC). They argued that the auricular surface features aged independently from one another and should be scored separately and not grouped together into a single phase. Thus, instead of combining various indicators into a single phase, they created a composite scoring system where five features (transverse organization, texture, microporosity, macroporosity, and apical changes) were independently scored and then combined to estimate a final composite age for the individual. They reported increased replicability, due in part to lower intra- and inter-observer error. Importantly, while results were presented using a traditional statistical approach, the authors also used Bayesian statistics to provide posterior probabilities for ten-year age ranges. Some authors [21,22] compared the LJ and BC methods and found BC markedly easier to apply and more accurate than the LJ method. In 2016, Hens and Godde [7] applied Bayesian methodology to both auricular surface methods, statistically modeling age in Portuguese males. They found that the application of transition analysis coupled with Bayesian statistics significantly improved age estimates for both methods, work that was later independently confirmed by Kim and Algee-Hewitt [23].
The main objective of this research is to use a Bayesian analysis to apply an informative prior probability distribution with transition analysis to estimate age-at-death in a modern American skeletal sample of both males and females. We compare the Lovejoy and Buckberry and Chamberlain methods. The resulting age estimates are tested for performance against a hold-out sample of Americans, an issue considered critical by Braga et al. [24]. This Bayesian approach allows for the estimation of accuracy and precision (i.e., bias) for each method. Our second objective is to provide the highest posterior density ages and highest posterior density age ranges at 75%, 90%, and 95% accuracy (i.e., coverages) that may be used by forensic anthropologists practicing in a legal setting who need to estimate the age of unknown individuals. These results meet the criteria outlined in Daubert v. Merrell Dow Pharmaceuticals, Inc. [25] which required rigorous testing and evaluation of forensic methods. Referred to as the Daubert guidelines, this case law outlines the criteria needed for the presentation of scientific evidence in a court of law. Specifically, the United States Supreme Court determined that in order for evidence to be considered reliable, it must be grounded in scientific methods and procedures, i.e., the technique has been subjected to peer review, error rates are reported, and the technique is generally accepted within the relevant scientific community. The 75% coverage would be acceptable for skeletal remains in archaeological contexts, where personal identification of decedents is highly unlikely or impossible and age estimation is provided as a foundation for further demographic or pathological interpretations. However, the 90% and 95% coverages meet the standards outlined for reliability in the Daubert guidelines where accuracy and precision in age estimates are essential.

Samples and Scoring
associated with documented demographic characteristics. Background on collection history and composition is presented by Shirley et al. [26] and Cardoso [27], and the samples have been thoroughly described in the literature [3,4,7,8,28]. A third sample from Christ Church Spitalfields (Natural History Museum, London) comprises post-medieval London residents and is dated to 1646-1859 AD. This sample also has documented age, sex, and occupation for a portion of the collection [29] and served as the sample for Buckberry and Chamberlain's [20] work. The Spitalfields sample has a similar age profile to our two modern samples. Table 1 depicts a breakdown of the samples by sex and age for each collection. The American and Portuguese samples were scored by one experienced researcher (SMH) for the two auricular surface methods, LJ and BC, using protocols to maintain a blind study. Pathological specimens and individuals of unknown sex were not scored. All Portuguese pelves were scored across a period of two weeks, while the Americans were scored during one week. To account for intraobserver error, ten pelves from each collection were scored three times each on different days, and there was perfect agreement. Additionally, due to the challenging nature of the LJ method, wherein one must combine numerous indicators into one stage, scoring followed guidelines outlined by Lovejoy et al. [10], where individuals exhibiting traits indicative of an advanced stage were placed into a higher phase/stage even when youthful characteristics were retained. We lament cases where authors have not reported their scoring techniques thoroughly in the literature.

Statistical Methodology
For the goal of this paper, the American data (Bass collection) serve as the target sample to derive and test age-at-death estimates from the two auricular surface aging methods. The Portuguese data (Lopes collection) are used to estimate the transition analysis (TA) parameters. The Spitalfields sample serves as the informative prior, from which we calculated hazard parameters using a Gompertz model. These Gompertz parameters are subsequently combined with the Portugal TA parameters in Bayes' theorem to produce point estimates and ranges of ages to be tested on the American sample. Plots of Kaplan-Meier survivorship calculated on American females and males with Gompertz curves from the Portuguese and Spitalfields data confirm the fit of these samples (Figures 1 and 2). Selection of similar mortuary profiles generally ensures a good fit of the informative prior to the age-at-death structure in the TA sample [6]. However, Godde and Hens [3] reported some deviation from fit may be tolerated, a conclusion that was later confirmed by the independent work of other researchers [9,30].
A number of samples were available from which to choose an informative prior: published Gompertz parameters (Terry collection and LA County Coroner from Konigsberg et al. [6]; Knox County, Knox Cemetery, and the U.S. from census data in Godde [8]); Gompertz parameters generated from our own data, with published Gompertz parameters (Torino, Portugal, Sardinia) [3,4,7,31]; unpublished data to which we were granted access (Forensic Databank, St. Brides [32]); publicly available data (post-medieval England and Wales as described in Godde and Hens [3,4,7,33]); and the data published in Buckberry and Chamberlain [20]. Past attempts at modeling the American sample showed the available samples (except those from post-medieval Europe) were not a good fit and did not produce viable age ranges. Gompertz survivorship plots of the three European samples (Figures 3-5) showed St. Brides and Spitalfields had promise, as their fit to the Bass Collection was good, as was their relationship to the sample from Portugal. The decision to go with Spitalfields over St. Brides was driven by a slightly better fit of TA parameters.
The Gompertz model is expressed as:
$$h(t) = \alpha_3 \exp(\beta_3 t) \tag{1}$$
$$S(t) = \exp\left[\frac{\alpha_3}{\beta_3}\left(1 - \exp(\beta_3 t)\right)\right] \tag{2}$$
where h is the hazard, t is age-at-death, and S is survivorship; Equation (1) is the hazard and Equation (2) is the survival function. The Gompertz model parameters are α3 and β3. We did not shift our curve to a specific age.
The transition analysis parameters were obtained from a cumulative probit model [6] and represent the ages at which an individual transitions from, in this case, one auricular surface phase to the next. The probits were run for: (1) Lovejoy method phases, (2) each of the five indicators for Buckberry and Chamberlain, and (3) the composite scores as phases from Buckberry and Chamberlain. The transition analysis and Gompertz model parameters are combined in Bayes' theorem [2]:
$$f(a \mid c) = \frac{\Pr(c \mid a)\, f(a)}{\int_0^\infty \Pr(c \mid x)\, f(x)\, dx}$$
where Pr(c | a) is the probability of observing phase c at age a from the transition analysis, f(a) is the prior probability density function (PDF) of age-at-death from the Gompertz model, and the posterior f(a | c) is the PDF used to derive the highest posterior density (HPD), analogous to a frequentist mode, and the highest posterior density region (HPDR), which is a range of values calculated at 75%, 90%, and 95% coverages. Coverage represents the "percentage of individuals expected to fall within the specified HPDR" [6]. Due to this property, as the coverages increase, the regions become larger. In this paper, we use the HPDRs as age ranges.
The accuracy of the HPDRs was tested by a cumulative binomial test, using the documented ages of the American sample to measure how often the HPDRs captured the true age at each coverage. This process provides the realized accuracy and indicates whether it differs significantly from the stated coverage. Finally, bias is estimated and reported for each method. All analyses were run in R [34] using originally composed scripts and programming written by Dr. Lyle Konigsberg (http://faculty.las.illinois.edu/lylek/, accessed on 25 June 2022).
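The workflow described in this section can be illustrated with a minimal numerical sketch. It is written in Python rather than R, it is not the authors' or Dr. Konigsberg's code, and every parameter value in it (the Gompertz parameters, the ages-at-transition, the common standard deviation, and the age grid of 15 to 110 years) is hypothetical. It also assumes, for simplicity, that transitions are normally distributed on the age scale with one shared standard deviation, and that the posterior is unimodal so the HPDR can be reported as a single interval.

```python
import math

def gompertz_hazard(t, a3, b3):
    """Gompertz hazard h(t) = a3 * exp(b3 * t)."""
    return a3 * math.exp(b3 * t)

def gompertz_survival(t, a3, b3):
    """Gompertz survivorship S(t) = exp[(a3 / b3) * (1 - exp(b3 * t))]."""
    return math.exp(a3 / b3 * (1.0 - math.exp(b3 * t)))

def gompertz_pdf(t, a3, b3):
    """Prior density of age-at-death, f(t) = h(t) * S(t)."""
    return gompertz_hazard(t, a3, b3) * gompertz_survival(t, a3, b3)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phase_prob(age, phase, trans_means, sd):
    """Pr(phase | age) from a cumulative probit with len(trans_means) + 1 phases.

    trans_means[j] is the (hypothetical) mean age of transition from
    phase j + 1 to phase j + 2; phases are numbered from 1.
    """
    k = len(trans_means) + 1
    at_least = 1.0 if phase == 1 else norm_cdf((age - trans_means[phase - 2]) / sd)
    beyond = 0.0 if phase == k else norm_cdf((age - trans_means[phase - 1]) / sd)
    return at_least - beyond

def posterior_on_grid(phase, trans_means, sd, a3, b3, lo=15.0, hi=110.0, step=0.1):
    """Posterior density of age given phase, normalized on a regular age grid."""
    n = int(round((hi - lo) / step)) + 1
    ages = [lo + i * step for i in range(n)]
    unnorm = [phase_prob(a, phase, trans_means, sd) * gompertz_pdf(a, a3, b3)
              for a in ages]
    total = sum(unnorm) * step
    return ages, [u / total for u in unnorm]

def hpd_point(ages, dens):
    """Age with the highest posterior density (the posterior mode)."""
    return ages[max(range(len(ages)), key=lambda i: dens[i])]

def hpd_region(ages, dens, coverage):
    """Smallest set of grid ages holding `coverage` of the posterior mass,
    reported as (min, max); assumes the posterior is unimodal."""
    step = ages[1] - ages[0]
    order = sorted(range(len(ages)), key=lambda i: dens[i], reverse=True)
    mass, kept = 0.0, []
    for i in order:
        kept.append(i)
        mass += dens[i] * step
        if mass >= coverage:
            break
    return ages[min(kept)], ages[max(kept)]

# Hypothetical inputs for illustration only (not values from this study).
A3, B3 = 0.001, 0.07                      # Gompertz parameters of the prior
TRANS = [22, 30, 38, 46, 55, 66, 82]      # ages-at-transition, eight phases
SD = 10.0                                 # common standard deviation

ages, dens = posterior_on_grid(4, TRANS, SD, A3, B3)
print("HPD age:", round(hpd_point(ages, dens), 1))
for cov in (0.75, 0.90, 0.95):
    low, high = hpd_region(ages, dens, cov)
    print(f"{int(cov * 100)}% HPDR: {low:.1f}-{high:.1f}")
```

Because the grid cells are added in order of decreasing density, each higher coverage includes every cell of the lower one, which is why the regions widen as coverage increases.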

Results
TA parameters for Lovejoy method phases in females and males are found in Table 2. Female ages-at-transition are spread across almost each decade of life until phases 7-8, at which point the spacing jumps to 2 decades. In males, each age-at-transition until phases 6-7 represents a single decade of life. At phases 6-7, this changes to every 2 decades, similar to the females. The Buckberry and Chamberlain method TA parameters in Tables 3-8 have relatively similar patterns across the sexes and show several components that distinguish between young and older adults (i.e., apex and macroporosity). Microporosity does not distinguish between major life stages (e.g., young, middle, and older adult), instead discriminating within an approximately 10-12 year age range in both sexes. Texture ages-at-transition are spread from young to middle adult in both sexes, while transverse organization is spread from young to middle adult until stages 4-5, at which point it increases to the 80s. Finally, for both sexes the phases derived from composite scores extend relatively evenly until older adult. The HPDR table for the LJ method phases shows similar ranges for females and males (Table 9). The regions increase with phases, which is expected due to increased heterogeneity as one ages. In comparison with the HPDRs generated from the BC method (Table 10), the Lovejoy method's regions were narrower by phase. Both aging techniques detected older ages. The cumulative binomials (Table 11) show that neither method's accuracy met the corresponding coverage level; they all significantly underperformed. However, the realized accuracies for the Lovejoy method coverages were still good to excellent, especially for the 90% coverage, yielding approximately 91% accuracy in females and 89% accuracy in males. Biases indicate that the LJ technique overaged for both sexes, while the BC method underaged females and overaged males. In general, the bias was greater for the BC method. 
Taken together, the results from the LJ method were superior to those from the BC method, at least for the American target sample.
Table note: Bold indicates significantly lower performance than probability.
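The two performance checks reported in this section, the cumulative binomial test of realized accuracy against the stated coverage and the bias, can be sketched as follows. This is an illustrative Python sketch and not the authors' R code. The ages and regions are made-up values, and the bias definition used here (mean of estimated minus documented age, so that positive values indicate overaging) is an assumed convention, since the bias equation is not reproduced in this text.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Each term is computed via log-gamma to avoid overflow for large n.
    """
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def realized_accuracy(documented_ages, regions):
    """Count and proportion of individuals whose documented age lies in their HPDR."""
    hits = sum(1 for age, (lo, hi) in zip(documented_ages, regions) if lo <= age <= hi)
    return hits, hits / len(documented_ages)

def mean_bias(estimated_ages, documented_ages):
    """Assumed convention: mean of (estimated - documented); positive = overaging."""
    diffs = [e - d for e, d in zip(estimated_ages, documented_ages)]
    return sum(diffs) / len(diffs)

# Hypothetical hold-out data for illustration only.
documented = [34, 52, 61, 78, 45, 88]
regions_90 = [(25, 50), (40, 75), (40, 75), (55, 100), (25, 40), (55, 100)]
point_estimates = [36, 58, 58, 80, 31, 80]

hits, accuracy = realized_accuracy(documented, regions_90)
# Lower-tail probability of seeing this few hits if the true coverage were 90%.
p_lower = binom_cdf(hits, len(documented), 0.90)
print("realized accuracy:", round(accuracy, 3))
print("P(X <= hits | coverage = 0.90):", round(p_lower, 3))
print("bias (years):", round(mean_bias(point_estimates, documented), 2))
```

A small lower-tail probability (for example, below 0.05) would indicate that the realized accuracy falls significantly short of the stated coverage.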

Discussion
Age estimation from the auricular surface of the ilium remains a popular method utilized by forensic anthropologists, with the LJ method garnering more than twice as many practitioners as the BC method in a recent survey [35]. The BC method is reportedly easier to apply and outperforms the LJ method when no statistical manipulations are applied [7,21,22]. The BC method should be the technique of choice when practitioners are unable to use Bayesian statistics. However, previous work on Portuguese males [7] strongly indicated that while both methods perform relatively equally when Bayesian statistics are applied, the LJ method showed significant improvement under this rigorous methodology compared with a traditional (i.e., non-Bayesian) approach. There is little reason for forensic anthropologists to apply the traditional LJ method without the use of a Bayesian approach. Additionally, the auricular surface of the ilium outperforms the commonly used Suchey-Brooks pubic symphysis method, providing narrower age ranges with better coverages (see ranges in Godde and Hens [3,4] for comparison).
This paper used Bayesian modeling to estimate age in modern Americans of both sexes and supports our previous work modeling age in Portuguese males [7]. When contrasting these two aging methods on Americans of both sexes, the LJ method showed narrower age ranges and lower bias, around 8%, compared with around 11% using BC. Additionally, the LJ method showed higher realized accuracy at 75%, 90%, and 95% coverages, and markedly so for females, where accuracy was 15-20% higher using LJ. The HPDRs reported here control for age mimicry, allowing age estimates to be applied across different populations. Forensic anthropologists should feel comfortable using the HPDR tables presented here, along with our previously published values for Portuguese males [7], as look-up tables for their own data, and should avoid non-Bayesian auricular surface aging, especially for the LJ method. Indeed, Konigsberg et al. [6] argue that population-specific age indicators are not as important as comparable age structures between populations.
Admittedly, we attempted to model age in the American sample previously but were unable to find a comparable sample with a good fit to use as an informative prior. The Bass Donated collection represents a sample of modern Americans drawn primarily from Tennessee, but also other regions in the central south of the United States; one would think comparable populations would abound. In a 2017 paper, Godde [8] noted the Bass Collection Gompertz survivorship was significantly different from that of a cemetery from the same city and from the mortality data from the U.S. Census for the same county, state, and country. Thus, we were challenged in finding a sample with a similar age distribution until we accessed the data from Spitalfields and St. Brides. When choosing a prior, it is customary to search for populations that are comparable in terms of time and geography. Why would a sample of post-medieval Londoners have provided such a good fit to modern Americans from the south? In particular, for females, the lower survivorship of around age 83 and over almost mirrors post-medieval Europe. Additionally, the roughly contemporary Portuguese sample, representing urban as well as rural-to-urban migrants, also matched with the modern Americans. An exploration of the literature on health and mortality provides some insights as to why it was so tough to fit a model to the Bass collection.
In general, mortality in the United States is higher than in countries with similar economies [36]. While health and mortality may vary due to numerous factors including socioeconomic status, sex, and ethnicity, geography remains a significant factor. Geographic inequality in adult mortality in the United States is greater than in western Europe [37] and the central-southern section of the United States is one of the most disadvantaged [38], especially since the mid-20th century [39].
The rural southern United States, especially Tennessee, Kentucky, Alabama, and Mississippi, has higher rates of morbidity and mortality compared with other rural and urban areas in the Northeast, Midwest, and Western United States [40,41]. In particular, Appalachia and the Mississippi Delta regions have the lowest life expectancy in the country and the highest mortality rates due to various health issues [42][43][44][45][46]. People from Tennessee and Appalachia are the main contributors to the Bass Donated collection [26].
Mortality rates in the last two decades in many southern states (including Tennessee) were 30-40% higher than in other regions of the United States, translating to 3-4 fewer years of life expected at age 50 [36]. These values represent the effects of numerous environmental and health hazards, including: smoking [36]; lower bone mineral density with higher hip fracture rates [47]; nutritional disease [48]; growth stunting [49]; stroke [50,51]; and high helminth load, with up to 55% of people affected by endemic whipworm, despite campaigns to eradicate such infections [52]. Overall, these studies underscore the poorer health and increased mortality risk from structural inequalities in populations from the central south United States, which is the underlying population for the Bass Donated collection. The health and mortality of these U.S. southerners more closely match those of post-medieval London and urbanizing Lisbon in the 1800s, whose residents would have been affected by similar environments [28].
Despite the challenges inherent in modeling the Americans, we were successful in providing highest posterior density ages and highest posterior density age ranges for modern American samples. Tests on a hold-out sample show realized accuracy in the 80-90% range for the LJ method. The Bayesian approach is far superior to traditional (i.e., non-Bayesian) methods because it controls for age mimicry, allowing age estimates to be applied across different populations. Additionally, Bayesian methods provide prediction intervals and/or credible intervals. Some forensic anthropologists may find the wide age ranges disappointing and hesitate to apply them to their own casework. We would remind them that these estimates are far superior to the traditional auricular surface age estimates alone and that these statistical approaches are necessary to meet Daubert criteria in American forensic science.
Data Availability Statement: The data from this study were obtained from several sources: (1) published data are available in the works cited, (2) data collected by the authors are available upon reasonable request, (3) data provided by the Museum of London are publicly available at the WORD website (https://www.museumoflondon.org.uk/collections/other-collection-databases-andlibraries/centre-human-bioarchaeology/osteological-database, accessed on 25 July 2019) and by reasonable request to the Museum of London, and (4) other data sources' availabilities are under the purview of the owners of the data.
Acknowledgments: Many thanks to the curators who allowed access to the William Bass Donated collection and the Luis Lopes skeletal collection. Further, thank you to the faculty and staff who provided access to FDB data and the Museum of London online database (for the St. Brides data). We could not have conducted any of this work without the significant contributions to the literature and valuable insight offered by Lyle Konigsberg, who served as graduate advisor extraordinaire to both authors many years ago.

Conflicts of Interest: The authors declare no conflict of interest.