Sex Estimation from the Pubic Bone in Contemporary Italians: Comparisons of Accuracy and Reliability Among the Phenice (1969), Klales et al. (2012), and MorphoPASSE Methods

Godde, K.; Hens, Samantha M.; Fuentes, Gwendolyn

doi:10.3390/forensicsci5040054

by K. Godde ¹,*, Samantha M. Hens ² and Gwendolyn Fuentes ²

¹ School of HESBS, Moreno Valley College, 16130 Lasselle St, Moreno Valley, CA 92551, USA

² Department of Anthropology, California State University Sacramento, 6000 J St, Sacramento, CA 95819, USA

* Author to whom correspondence should be addressed.

Forensic Sci. 2025, 5(4), 54; https://doi.org/10.3390/forensicsci5040054 (registering DOI)

Submission received: 14 September 2025 / Revised: 20 October 2025 / Accepted: 24 October 2025 / Published: 27 October 2025

Abstract

Background/Objectives: The identification of a decedent through skeletal analysis is dependent on accurate estimation of demographic characteristics, including biological sex. The most well-known sex estimation technique using the pubic bone is the Phenice method. In 2012, it was revised by Klales and colleagues and a logistic regression equation to predict sex was applied. Later, a program that estimates sex from Klales’ scoring with a random forest model, MorphoPASSE, was developed by Klales. Methods: Here we compare the accuracy of the original and revised methods, along with MorphoPASSE, using a contemporary sample of Northern Italians with documented sex. We further test the assertions by Phenice that his method is easy to employ for new observers and that ambiguity can be applied when characteristics do not morphologically fit into the categories of the method. Accuracy, error, bias, sensitivity, and specificity were calculated for each approach, along with McNemar’s tests for paired data, which compared documented sex and estimated sex. A linear weighted Cohen’s Kappa measured the differences in scoring between a new observer and an experienced observer. Results: Phenice’s method achieved higher accuracy (97%) than the Klales method and MorphoPASSE (86% each), as well as higher sensitivity and specificity, and lower error and bias. All McNemar’s tests conducted were not significant. The new observer demonstrated a similar accuracy (93%) to the experienced observer (97%). Furthermore, comparisons of Phenice’s scoring with ambiguity indicate its superior performance for capturing variation over the Klales method and MorphoPASSE. Conclusions: Phenice’s method is recommended in forensic anthropology and bioarchaeological contexts, particularly in Milan.

Keywords: pubic bone; forensic anthropology; biological profile; logistic regression; validation; morphological traits

1. Introduction

A critical part of the biological profile in forensic anthropology cases is the estimation of biological sex. Numerous visual and metric methods derived from the cranial and postcranial skeleton have been proposed and tested for accuracy over the years. It is well established that the pelvis remains the best indicator of sex in the human skeleton due to the differences in biological function of the male and female coxa. In 1969, Phenice [1] developed a method for estimating sex from the pelvis that was quick, accurate, objective, and did not require years of experience to apply on adult individuals. Phenice identified the following three traits: the ventral arc, subpubic concavity, and medial ischiopubic ramus. The first two traits were scored as either present or absent, while the ischiopubic ramus was scored as broad or narrow with a ridge. Despite identifying minor variations in the presentation of these traits, Phenice described them as discrete and objective. He reported at least 96% accuracy (thus, a 4% error rate) using a large sample from the Terry anatomical skeletal collection in Washington D.C. and found that the ventral arc was the most reliable indicator, while the medial aspect of the ischiopubic ramus was the most ambiguous trait. He stated his method was easy for novice observers to apply, and it included room for ambiguity, despite his primary focus on classifying traits into either male or female categories. Additionally, Phenice argued that variation should be expected and not hinder overall assessment of sex. Importantly, he emphasized that the method was not tested on subadult remains, and that the ventral arc and subpubic concavity were not well developed in females until 20 years of age.

Phenice’s [1] landmark paper facilitated generations of biological profiles in forensic anthropology analyses, providing information to assist in identifying decedents. It is currently enjoying a resurgence of research which demonstrates its continued relevance in comparison to later methods that have been developed [2,3,4]. Here, we test the Phenice method for accuracy and error on a new population (Northern Italians), its reported ability to be easily learned by new observers, and the value of the ambiguity he documented in the method. We further compare its performance to a later method that modernized and revised the original scoring of Phenice’s traits.

Phenice’s traits are based on the growth and development patterns of sexual dimorphism in the pubic bone. When looking at adolescent growth differences and pelvic sexual dimorphism, Coleman [5] argued that the superior functional region of the pelvis (false pelvis) is less sexually dimorphic, which contrasts with areas such as the pubic bone, whose subpubic angle has notable sex differences [5], along with the ventral arc and medial aspect of the ischiopubic ramus. The subpubic angle is wider in females compared to males, which Coleman [5] attributes to the difference in growth of the ischial tuberosity. In females, the inferior margin of the ischial tuberosity is directed more laterally, whereas in males, growth is directed more inferiorly, although it is still laterally placed for muscle attachment. This directional difference in growth is what creates a wider angle and subpubic concavity in the female pelvis.

The pubic symphysis begins forming during childhood and adolescence, and completes its formation in the post-adolescence period; in females and in males, pubic growth stops around age 18 [6]. Despite the developmental age of the ventral arc being about 20, Sutherland and Suchey [7] identified a precursor arc at age 14. The ligament and muscles that arise from this ridge of bone are the gracilis and adductor brevis, which are identical in origin between sexes [6], but their course is directed laterally in females, as opposed to running parallel to the symphysis and inferior ramus in males [8]. Phenice [1] found that the ventral arc always runs as far up as the pubic crest, but Budinoff and Tague [6] reported only 25% of individuals showed an arc running this superiorly using a sample from the Hamann-Todd osteological collection. While it is not well-understood what makes the medial aspect of the ischiopubic ramus a sexually dimorphic trait, Klales and colleagues [9] hypothesized that the pinched, narrower surface typical of females corresponds with the lengthening of the pubic bone during adolescence that is caused by hormonal activity during puberty [8].

Several researchers tested the Phenice method on a variety of samples, providing a fundamental understanding of its strengths and potential weaknesses. Lovell [10] reported a slightly lower accuracy (83%) when the method was tested on medical school cadavers, but upheld the assertion that the method was easy to apply and no experience was necessary. Sutherland and Suchey [7] expanded the method to a large sample of over 1200 individuals from the Los Angeles County coroner’s office; however, they focused only on the ventral arc, which alone matched the 96% accuracy rate. They also reaffirmed that the ventral arc is fully developed by age 20, suggesting lower accuracy for younger individuals. A 1990 study by MacLaughlin and Bruce [11] reported surprisingly low accuracy, based on three northern European collections, where they also identified the subpubic concavity as the most reliable trait and found observer experience to be critical to the method. However, their study was recently refuted by McFadden and Oxenham [12], who argued that the MacLaughlin and Bruce [11] study suffered methodological problems, specifically the inclusion of an ambiguous category. While an ambiguous category might be considered an intermediate score or moderate expression of a trait, it problematically may often be construed as “I don’t know” [12]. Furthermore, they point out that MacLaughlin and Bruce [11] did not apply the assignment of overall sex as Phenice instructed, which involves assigning the sex that corresponds to at least one trait that is strongly male or female (expanded on more in the Materials and Methods, and Discussion sections), and instead MacLaughlin and Bruce used the approach of assigning the sex of two traits in agreement. Ubelaker and Volk [13] also felt that experience contributed to the success of the method, with ~88% accuracy overall, and higher accuracy for females than males. 
More recently, the Phenice method was applied to CT scans of a large sample from western Australia [14] and was found to be highly accurate (~92%) and suitable for forensic application. In comparisons with sex estimation methods based on the skull and other regions of the pelvis, the Phenice method was reported to be the most accurate (~94%) [15] and reliable [16].

In 2012, Klales and colleagues [9] published a revision of the Phenice method, arguing that the original method did not capture the variation in the expression of the three traits because it focused on “extremes” (p. 106). They further claimed the Phenice method operated under a “majority rules” approach to assigning a final sex from the three characteristics when some traits were better at discriminating sex than others (p. 106), positing an argument using similar scoring to MacLaughlin and Bruce [11] that did not follow the Phenice method for assigning overall sex. Finally, they pointed out that the Phenice method provides no posterior probability as a measure of uncertainty for the final diagnosis of sex. Klales et al. [9] created a visual 5-point ordinal scale for each trait using the Hamann-Todd collection and William M. Bass Donated Skeletal Collection to align with other nonmetric studies that used similar types of scales and applied statistical methods to generate error rates and posterior probabilities. They wrote new trait descriptions, retitled subpubic concavity to subpubic contour, created clear illustrations of their revisions based on the original Phenice method, and provided easy-to-use, representative, color picture examples of each character state. Their categories reflect morphological differences, rather than “I don’t know” (cf. [12]). The logistic regression results yielded a mean correct classification of 94.5%, but that of tests on independent samples was 86.2%. Subsequent predictive models found 96.6% accuracy when using the ventral arc alone, with only minimal improvement using a multiple-variable model. Later research by Klales [17] identified secular trends in trait expression for all traits in females, and the subpubic concavity and ventral arc for males.
Recent research by Klales and Cole [18] provided recalibrated regression equations for Hispanic groups, which improved accuracy and greatly improved bias, and created a new logistic regression equation for modern Mexican individuals with accuracies as high as 100% [19].

Tests of Klales et al.’s [9] revised method found varying degrees of success. Lesciotto and Doershuk [20] examined the revised method using a large sample from the Hamann-Todd collection and found excellent accuracy for females (95%), but surprisingly poor accuracy for males (50%). The authors reported moderate to substantial agreement between observers, and requested better trait descriptions and visual guides. Kenyhercz and colleagues [21] tested the revised method on a large sample spanning populations from around the world. Validation accuracy ranged from 87.5 to 95.6% on population-specific models and a global model reached 95.9%, indicating that the revised method can be employed on diverse, worldwide populations without specific population equations, making it highly utilitarian. Indeed, Selliah et al. [22] found 100% accuracy in the sex estimation of a large sample of middle-aged and older contemporary (meaning from the current time period) individuals from the University of Milan collection.

In 2018, Klales and Cole [23] released the manual for MorphoPASSE [24], a software program that calculates the logistic regression equation from the Klales et al. [9] paper, as well as a random forest model (RFM), which is the recommended statistical procedure in the manual. The manual provides the pictures and diagrams from the original Klales et al. scoring method, and includes updates of the written scoring instructions. MorphoPASSE also provides the ability to run scores from cranial morphological sexing using the Walker [25] equations, and an RFM for combining cranial and pelvic scores. Using the Forensic Anthropology Database for Assessing Methods Accuracy (FADAMA) [26], we calculated how often practitioners in the United States run the MorphoPASSE software. As of October 2024, 6% of cases use Phenice’s method, Klales et al.’s revisions, and/or MorphoPASSE scoring (whether on the cranium, the pelvis, or both) and run MorphoPASSE, the software for sex estimation. Additionally, the American Board of Forensic Anthropology critiqued applications for the non-use of MorphoPASSE in 2023 (personal communication). However, to date, the scoring in the MorphoPASSE manual (which is reported to differ from Klales et al. [9]), the RFM in MorphoPASSE, and R statistical coding have not been peer reviewed, which limits our understanding of how MorphoPASSE is calculating its models.

In 2019, Konigsberg and Frankenberg [27] critiqued the approach of the binary logistic regression of Klales et al. [9]. They point out that the dependent (sex) and independent (revised Phenice skeletal indicators) variables in Klales et al. [9] are swapped from transition analysis theory that shows that skeletal traits (e.g., revised Phenice skeletal indicators) are instead dependent on demographic characteristics (e.g., sex), and use examples and/or data from Konigsberg et al. [28] and Konigsberg and Hens [29] to show why this is problematic, ultimately supporting the application of multivariate ordinal probit regression. Klales et al. [30] agreed with the points raised by Konigsberg and Frankenberg [27] and asserted they did not view their 2012 equation as “the last word” (p. 388). They further pointed to the promise of machine learning methods and referenced the advantages of RFM in MorphoPASSE and provided estimates of improved accuracy of RFM (that were obtained from a book under review) over the 2012 logistic regression equation. Only one chapter in that book described MorphoPASSE upon publication in 2020 [31]; it highlighted the manual and interface. However, it did not provide the R code for peer review, a test of the performance of the statistics as it was cited in 2019, or similar necessary information.

At the time of this writing, only three studies have compared the original Phenice to the revised Klales et al. method on the same samples, and no studies have examined the performance of the MorphoPASSE program. Rojas González et al. [4] analyzed 265 documented individuals from a Chilean sample in Santiago. They found the accuracy of the Phenice method was ~97%, while the accuracy of the Klales et al. method was ~87%. The authors designed additional models based on the Chilean population and reported that a single-variable model with the ventral arc was the most accurate, at 96.6%, while using all traits for classification reached 97% accuracy. Jager and Eliopoulos [2] tested the original Phenice and revised Klales et al. methods on a modern documented Portuguese collection in Lisbon. In their study, they also used 2/3 traits to assign overall sex for both the Phenice and Klales et al. traits, but the accuracies they achieved were much higher than MacLaughlin and Bruce [11]. The original method performed better, with 96.5% accuracy, compared to the revised method, at 92.7% accuracy. Females were more likely to be sexed correctly in the original method; however, results were similar for the revised method. Finally, Zermeño and Godde [3] evaluated how accurate the Phenice and Klales et al. methods were in contemporary forensic anthropology case work in the United States using FADAMA [26], finding the Phenice method performed superiorly to that of Klales et al.

Aims

We build upon this prior work by testing Phenice’s [1] method on a contemporary sample, evaluating his assertion of the ease of the application of the method by a new observer after simply reading his paper (p. 298), and comparing the difference between the applications of ambiguity in Phenice’s method against having to use only extreme categories (i.e., male or female, only). We also follow the lead of recent papers that compare the Phenice method to the Klales et al. scoring system, their logistic regression equation, and the later-developed and associated online software, MorphoPASSE. The recommended RFM in MorphoPASSE, to our knowledge, has not yet been peer reviewed for pelvic sex estimation or had its performance tested. While we cannot peer review MorphoPASSE, we will test its accuracy for what we think is the first time in the published literature. We also provide sensitivity and specificity for Phenice’s and Klales et al.’s methods for what we also believe is the first time in the literature. Our hypotheses, based on the literature review, are as follows: (1) Phenice’s method will be more accurate with a lower error, and higher sensitivity and specificity than Klales et al.’s method using either statistic; (2) a new observer will estimate sex at a similar rate to an experienced observer for Phenice’s method, as per his claim; and (3) the two approaches to Phenice’s method, recording ambiguity and using only extremes, will perform similarly.

2. Materials and Methods

The data were collected from individuals in the CAL Milano Cemetery Skeletal Collection in June of 2025. The collection is composed of individuals buried in the cemeteries of Milan and exhumed by cemetery workers after being buried for at least 10 years [32]. It is housed at the Laboratorio di Antropologia e Odontologia Forense (LABANOF) in the Department of Biomedical Sciences for Health at the University of Milan, Italy, and part of the Collezione Antropologica LABANOF (CAL). The collection is fully documented with sex, age, birth and death dates, and cause of death recorded. The majority of individuals (85%) died after 1980 and represent a cross-section of contemporary Milanese society [32]; LABANOF further refers to the collection as “contemporary”. Article 43 of the Presidential Decree of the Italian Republic of the National Police Mortuary Regulation allows cemeteries to grant unclaimed skeletal remains to universities for education and research [32].

Of the 424 adult individuals available, 147 had complete enough pubic bones to facilitate the scoring of one or more of the traits from each method. Approximately 45% of the scorable sample was female and 55% male, making the sexes roughly balanced. The age at death ranged from 20 to 97, with a mean of 61.94 years old overall, although sixteen individuals we scored did not have documented age. On average, females (66.04 years) were slightly older than males (58.25 years). Our sample differs from the Selliah et al. [22] sample by being larger and extending the minimum age at death to 20 years, consistent with the full formation of the ventral arc. We scored lefts and rights if both were available, but did not score any with visible pathology. For analysis, we used the left side and substituted with the right side if the left side was not scorable; both sides had approximately the same amount of missing data, and thus shifting to the right and substituting with a left did not provide an advantage.

Two observers (first and third authors) scored the pubic bones, with each scoring different approaches. Observer 1 (third author) evaluated the pubic bones in two ways. First, Observer 1 scored the Phenice method by documenting the ambiguity built into the method for each score (i.e., possible male or possible female) for Hypotheses 1 and 3. The observer is a master’s student who applied the Phenice method for the first time in this project (Hypothesis 2), although they had learned specifics about it in a human osteology class taught by the second author. Second, Observer 1 scored the Phenice method without documenting the ambiguity; they scored the pubic bones as in Phenice’s method [1] (Figure 1). Prior to scoring, when Observer 1 asked questions for clarification, the second author provided answers on morphology, while Observer 2 instructed Observer 1 to follow the method exactly. After scoring, Observer 1 was allowed to see the documented sex of the individual and ask questions, which were answered by Observer 2, but they were unable to change their scores. Due to this, it was expected that, over time, Observer 1 would normalize to Observer 2. However, this practice should approximate how someone would learn a new method in a classroom or during supervised research, which are conditions we wish to replicate in order to reflect the real world.

Observer 2 (first author) also scored the Phenice method with and without documenting the ambiguity, so that the Observer 1 and Observer 2 scores could be compared for accuracy and overall sex estimation differences (Hypothesis 2). Moreover, Observer 2’s scoring of with and without ambiguity addresses Hypothesis 3. Observer 2 also scored the Klales et al. method, as described in the MorphoPASSE manual [23], for Hypothesis 1. This observer is a seasoned skeletal biologist who has scored many skeletons using the Phenice method (Hypothesis 2). This was the first application of the Klales method in a research environment for Observer 2, but they had used it in teaching and forensic labs. The decision to only have Observer 2 score the Klales method was due to the overlap with Phenice; having a new observer learn the methods simultaneously could lead to improved scores for both methods. Observer 1 was aware of the Klales method, but not to the degree of the Phenice method. Thus, the Phenice method seemed more appropriate due to prior exposure. The first 10 pubic bones were reevaluated to check the scoring after the conclusion of data collection, about two days later. Additionally, Observer 2 rescored four pelves at the time they were observed due to straying from the instructions in the Phenice method [1]; they recorded scores that were on a 1–5 scale, as in the Klales method. The research team took care not to examine other elements that indicated biological sex and to hide the sex of the individuals on the boxes prior to assigning a final sex by indicator during sampling. This was accomplished by having the second author solely pulling boxes the first day and placing them with the labels hidden against a wall. 
The second day, one observer would pull several boxes and stack them with the labels facing the wall and the second observer would later randomly select one of those boxes to evaluate, after scoring one to two more boxes that were out from the prior batch that had been pulled. This process ensured any memories of labels by the observer who pulled the boxes would not follow the individuals.

For Hypothesis 1, the sex of each individual was estimated using the method-specific instructions. As the aim of this paper is to look at final sex estimation from the two pelvic methods and not to look at individual traits, the accuracy was calculated by the overall sex estimated by the observer and method. Thus, the Phenice [1] scores were evaluated according to his guidelines; he stated the following:

“When there is some ambiguity concerning one, or rarely, two of the criteria, there is almost always one of the criteria which is obviously indicative of male or female. If the estimation of sex is based on the one or two criteria which are definitely male or female, the estimate will be right at least 96% of the time as was shown by the test on the Terry skeletal material.” (emphasis added, p. 300)
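The decision rule quoted above can be contrasted with the two-of-three agreement rule used in some earlier tests of the method. The following is a minimal sketch only: the trait codes ("F", "M" for definite expressions, "F?", "M?" for ambiguous ones) and both function names are hypothetical, and returning "indeterminate" when the definite traits conflict or are absent is our assumption, since the quoted passage does not address that case.

```python
from collections import Counter

# Hypothetical trait codes: "F"/"M" are definite, "F?"/"M?" are ambiguous leanings.
DEFINITE = {"F", "M"}


def phenice_overall_sex(ventral_arc, subpubic_concavity, ischiopubic_ramus):
    """Assign overall sex from the definite traits only, ignoring ambiguous ones."""
    traits = [ventral_arc, subpubic_concavity, ischiopubic_ramus]
    definite = {t for t in traits if t in DEFINITE}
    if len(definite) == 1:
        return definite.pop()
    # Assumption: no definite trait, or conflicting definite traits.
    return "indeterminate"


def two_of_three_overall_sex(ventral_arc, subpubic_concavity, ischiopubic_ramus):
    """Contrast rule: the sex indicated by at least two traits, ignoring strength."""
    traits = [ventral_arc, subpubic_concavity, ischiopubic_ramus]
    leanings = Counter(t[0] for t in traits if t is not None)
    if not leanings:
        return "indeterminate"
    sex, count = leanings.most_common(1)[0]
    return sex if count >= 2 else "indeterminate"


# One definite female trait and two ambiguous male-leaning traits.
example = ("F", "M?", "M?")
print(phenice_overall_sex(*example))
print(two_of_three_overall_sex(*example))
```

In this illustrative case the two rules disagree, which is the distinction at stake when studies that assign sex by two traits in agreement are compared with studies that follow the instructions quoted above.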

Turning to the Klales et al. method, while the MorphoPASSE manual states, “The descriptions below are nearly verbatim from the Klales et al. [9] method modification of Phenice’s [1] original method; however, trait weighting has been added to clarify” (p. 22), Klales [31] contradicts this statement with, “The five Walker (2008) and three Klales et al. (2012) traits should be scored using the MorphoPASSE manual (Klales & Cole, 2018), not the original publications, because modifications were made to the traits based on research from the grant” (p. 273). Because the only example provided by Klales [31] refers to cranial traits, rather than pelvic, we compared the written descriptions by Klales and Cole [23] to Klales et al. [9] after confirming the illustrations and color pictures were the same in the two documents and reflected the scoring in both. As the Klales and Cole [23] descriptions appeared to be only more detailed than the Klales et al. [9], and would not have changed the scoring outcomes by trait, we decided to follow the statements in Klales and Cole [23] that the descriptions were relatively the same. Observer 2 also did not notice a difference during scoring as laminated copies of both documents were placed in front of the observer to consult for scoring and in case differences arose. However, we recognize that the differences between the scoring documents may be detectable on other samples or for observers who are more dependent on written descriptions; Observer 2 used the pictures identical in both documents to make decisions in scoring after consulting the written descriptions. The pictures depict many of the more detailed components described in Klales and Cole’s [24] that are not captured in the written text of Klales et al. [9].

For the Klales et al. [9] traits, we estimated sex using the logistic regression equation (referred to as Klales logit, here) and cutpoint provided in the article (which classifies the sex of a decedent using the higher probability, p. 111), as well as MorphoPASSE (referred to as Klales RFM, here). According to the MorphoPASSE manual [23] (downloaded July of 2025), the practitioner can select the logistic regression equation from the 2012 article or “on-the-fly” RFM. In multiple areas, the manual recommends the RFM option in bold red text over the logistic regression equation [23]. Thus, we ran our scores from the Klales et al. [9] method in MorphoPASSE, selecting “MorphoPASSE Random Forest,” “Contemporary” for time period, and “Unknown” for ancestry and region, as the population we observed is not specifically represented in the database from which MorphoPASSE draws. For MorphoPASSE, we recorded the estimated sex and probability of belonging to that sex from the Case Prediction section, the model accuracy (which is calculated from a holdout sample in the MorphoPASSE database [23]) and Kappa from the Test Accuracy section, and mean decrease in Gini index from the Variable Importance section. The sample size MorphoPASSE used was 1210.
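As an illustration of classifying by the higher probability, the sketch below applies a binary logistic model to three ordinal trait scores. The intercept and coefficients are placeholders chosen for illustration and are not the published values from Klales et al. [9]; the function name and the convention that higher scores are more male-like are likewise assumptions of this sketch.

```python
import math

# Placeholder values for illustration only; NOT the published equation.
INTERCEPT = -9.0
COEFFICIENTS = {
    "ventral_arc": 1.5,
    "subpubic_contour": 1.0,
    "ischiopubic_ramus": 0.5,
}


def classify_by_higher_probability(scores, cutpoint=0.5):
    """Return (estimated sex, probability of that sex) from 1-5 trait scores."""
    for trait in COEFFICIENTS:
        if not 1 <= scores[trait] <= 5:
            raise ValueError(f"{trait} must be scored 1-5")
    # Linear predictor; higher scores are assumed to be more male-like.
    z = INTERCEPT + sum(COEFFICIENTS[t] * scores[t] for t in COEFFICIENTS)
    p_male = 1.0 / (1.0 + math.exp(-z))
    if p_male >= cutpoint:
        return "male", p_male
    return "female", 1.0 - p_male


sex, probability = classify_by_higher_probability(
    {"ventral_arc": 1, "subpubic_contour": 2, "ischiopubic_ramus": 2}
)
print(sex, round(probability, 3))
```

A cutpoint of 0.5 is equivalent to assigning whichever sex has the higher probability; the returned probability plays the role of the posterior probability reported alongside the estimate.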

For Hypotheses 1 and 3, the estimated sex using each method and observer was compared to the documented sex, and the accuracy (# of accurately sexed individuals/# of individuals) and error (1 − accuracy) were calculated. We further calculated the sensitivity and specificity for the methods. For sensitivity, we divided the number of true positives for a method by the number of true positives added to the number of false negatives. Similarly, specificity was calculated as the number of true negatives divided by the number of true negatives added to the number of false positives. The documented sex and estimated sex were statistically compared by observer and method using McNemar’s test for paired data (Hypotheses 1–3) to provide consistency with other studies. We met the assumptions of the binary categorical variables; the dependent variable had two, non-overlapping categories (male or female) and the independent variable had two related categories (estimated vs. actual sex). For scoring differences between observers, which would provide further information about the scoring of Observer 1 for Hypothesis 2, a linear weighted Cohen’s Kappa was performed on Observer 1 and Observer 2’s overall sex estimations with and without ambiguity. This tested the interrater reliability in categorical data. However, we emphasize that these tests were not meant as an interobserver error investigation; rather, we were looking at intra-method performance between experience levels. The inaccurate conclusions were analyzed for distributions across sex and age (when available) to look for patterns, and sex bias was calculated (female accuracy − male accuracy). All analyses other than the RFM were conducted in R 4.3.1 [33].
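The summary statistics and paired test described above can be sketched as follows. This is an illustrative Python sketch rather than the R code used for the analyses; the function names and the toy data are hypothetical, treating female as the positive class is an assumption, and the continuity-corrected form of McNemar’s statistic is assumed rather than confirmed as the variant applied here.

```python
import math


def classification_summary(documented, estimated):
    """Accuracy, error, sensitivity, specificity, and sex bias for "F"/"M" labels."""
    correct = {"F": 0, "M": 0}
    total = {"F": 0, "M": 0}
    for doc, est in zip(documented, estimated):
        total[doc] += 1
        correct[doc] += int(doc == est)
    accuracy = (correct["F"] + correct["M"]) / (total["F"] + total["M"])
    female_accuracy = correct["F"] / total["F"]
    male_accuracy = correct["M"] / total["M"]
    return {
        "accuracy": accuracy,
        "error": 1 - accuracy,
        # Assumption: female is the positive class, so sensitivity is the
        # female accuracy and specificity is the male accuracy.
        "sensitivity": female_accuracy,
        "specificity": male_accuracy,
        "sex_bias": female_accuracy - male_accuracy,
    }


def mcnemar_test(documented, estimated):
    """Continuity-corrected McNemar chi-square (df = 1) and its p-value."""
    pairs = list(zip(documented, estimated))
    b = sum(1 for d, e in pairs if d == "F" and e == "M")  # females called male
    c = sum(1 for d, e in pairs if d == "M" and e == "F")  # males called female
    if b + c == 0:
        return 0.0, 1.0  # assumption: no discordant pairs means no difference
    diff = abs(b - c)
    chi2 = (diff - 1) ** 2 / (b + c) if diff else 0.0
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data, not drawn from the study sample.
documented = ["F"] * 4 + ["M"] * 6
estimated = ["F", "F", "F", "M", "M", "M", "M", "M", "M", "F"]
print(classification_summary(documented, estimated))
print(mcnemar_test(documented, estimated))
```

Because McNemar’s test uses only the discordant pairs, a method can misclassify many individuals and still return a non-significant result when the errors are balanced between the sexes, which is why accuracy and sex bias are reported alongside it.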

3. Results

Histograms of Observer 2’s scores by trait and documented sex are presented in Figure 1 (Phenice) and Figure 2 (Klales). The pattern of Phenice scores shows males had increased scores, with traits described as more likely to be male using Phenice’s method [1]. This is most apparent with medial ischiopubic ramus; 31% of the scores for persons identified as female were scored as having a “broad flat surface,” which Phenice [1] describes as being more likely in males. The Klales traits show that females were more likely to have lower scores (i.e., 1 and 2) and males higher scores (i.e., 4 and 5), with the exception of the medial aspect of the ischiopubic ramus. For that trait, the scoring difference from Phenice allowed females to be categorized with lower values (i.e., 1 and 2). However, that did not differentiate them from males who also scored primarily as 2. We return to the issue of so many males scoring as a 2 at the end of the Results.

The accuracy of the overall sex estimation of Observer 1 scoring Phenice with ambiguity recorded (Hypothesis 2) is 93% (137 correct/147 total) with an error of 7%, which was not statistically different from the documented sex (χ² = 0, df = 1, p = 1). The sensitivity was 92% and the specificity 94%. The high sensitivity and specificity indicate Phenice’s method is reliable when diagnosing the actual sex of an individual and excludes individuals who are not that sex. The accuracy, error, sensitivity, and specificity of overall sex estimation of Observer 1 without ambiguity recorded were the same as with ambiguity; the overall sex estimation by individual produced the same result, which is probably due to the instructions for estimating overall sex in Phenice [1] (see Discussion for the more in-depth explanation). For Observer 2 scoring Phenice without ambiguity recorded (Hypotheses 2–3), the accuracy is 97% (142 correct/147 total) with an error of 3%, and the estimated sexes also were not statistically significantly different from the documented sexes (χ² = 0, df = 1, p = 1). The sensitivity and specificity were relatively similar at 97% and 96%, respectively, with these high values indicating it is reliable when estimating the actual sex and excluding individuals who are not of that sex. This observer only recorded two values that were ambiguous, demonstrating Phenice’s method can be practiced with only occasionally using ambiguity, as he originally stated (p. 300). Similarly to Observer 1, the estimated overall sex for those individuals scored by Observer 2 with ambiguity was the same; the ambiguity did not change the overall sex estimation (Hypotheses 1–3). Thus, only one analysis was necessary to compare overall sex estimation between observers with differing experience levels.
Cohen’s linear weighted Kappa that evaluated interrater error (Hypothesis 2) for Phenice’s method with and without ambiguity recorded was 0.73, which indicates substantial agreement between observers [34].
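As an illustration of the agreement statistic reported above, the following is a minimal sketch of a linear weighted Cohen's Kappa, written in Python rather than the R used in this study. The function name and the example scores are hypothetical and are not the study's data. The sketch assumes ordered categories and the standard linear weights, 1 − |i − j|/(k − 1).

```python
def linear_weighted_kappa(rater_a, rater_b, categories):
    """Linear weighted Cohen's kappa for two raters scoring the same items.

    `categories` lists the ordered score levels, e.g. [1, 2, 3].
    """
    k = len(categories)
    n = len(rater_a)
    if k < 2 or n == 0 or n != len(rater_b):
        raise ValueError("need >= 2 categories and equal, non-empty score lists")
    index = {c: i for i, c in enumerate(categories)}

    # Joint proportion table: rows are rater A, columns are rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Full credit on the diagonal, partial credit for near misses.
        return 1.0 - abs(i - j) / (k - 1)

    po = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    pe = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if pe == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (po - pe) / (1.0 - pe)


# Hypothetical three-point trait scores from a new and an experienced observer.
new_observer = [1, 1, 2, 2, 3, 3]
experienced_observer = [1, 2, 2, 3, 3, 3]
print(round(linear_weighted_kappa(new_observer, experienced_observer, [1, 2, 3]), 3))
```

Linear weights penalize a one-category disagreement half as much as a two-category disagreement on a three-point scale, which is why the weighted form suits ordinal trait scores.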

Turning to the Klales method, the accuracy of estimating sex using the logistic regression equation (Observer 2; Hypothesis 1) decreases relative to Phenice, to 86% (108 correct/126 total) with an error of 14%, but is still not significantly different from the documented sex (χ² = 0.56, df = 1, p = 0.81). The sensitivity and specificity also decreased relative to Phenice, to 89% and 86%, respectively. The sensitivity is higher than the specificity, which means this method is better at estimating the actual sex of an individual than at excluding those who are not that sex. Similarly, the estimated sex from RFM in MorphoPASSE (Observer 2; Hypothesis 1) had an accuracy of 86% (127 correct/147 total) with an error of 14% and was not significantly different from documented sex (χ² = 0.45, df = 1, p = 0.50). The sensitivity (89%) and specificity (85%) were very similar to those of the logit and should be interpreted in the same way. The probability of classifying as a particular sex ranged from 0.562 to 1, and model accuracy from 0.8217 to 0.9641, with a corresponding Kappa of 0.6433–0.9281. In all but four cases, the variable importance, measured by the mean decrease in the Gini index, was highest for the ventral arc (meaning it was the most important marker), followed by the subpubic contour. A zero was generated for all medial ischiopubic ramus scores. Four individuals were classified and received probabilities but had Gini indices of NaN for the ventral arc and zeros for both the subpubic contour and medial ischiopubic ramus; three of these were female and one was male.
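The performance measures reported in this section can be sketched from a 2 × 2 table of documented versus estimated sex. The Python sketch below is illustrative only. The cell counts are hypothetical; they were chosen to sum to the 127 correct of 147 total reported for the RFM, and they are not the study's actual table. Treating female as the "positive" class is an assumption. The McNemar statistic uses a continuity correction, applied only when the discordant counts differ, and a df = 1 p-value.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Accuracy, error, sensitivity, and specificity from a 2 x 2 table."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "error": (fn + fp) / total,
        "sensitivity": tp / (tp + fn),  # positives correctly identified
        "specificity": tn / (tn + fp),  # negatives correctly excluded
    }


def mcnemar(b, c):
    """McNemar's chi-square (df = 1) for paired data.

    b and c are the two discordant counts, here the two kinds of
    misclassification. Returns (chi-square, p-value).
    """
    if b + c == 0:
        return 0.0, 1.0
    correction = 1 if b != c else 0
    chi2 = (abs(b - c) - correction) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 + 67 = 127 correct, 8 + 12 = 20 errors, 147 total.
metrics = classification_metrics(tp=60, fn=8, fp=12, tn=67)
chi2, p_value = mcnemar(8, 12)
print(metrics, chi2, p_value)
```

Because McNemar's test uses only the discordant pairs, it asks whether misclassifications fall disproportionately on one sex, not whether the overall accuracy is high.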

The inaccurate estimates of sex appear to be patterned according to the experience level of the observer and the method (Table 1). We report these numbers for transparency, although they are very small, and so the patterns may be an artifact of sample size rather than true effects. For Phenice, Observer 1 (Hypothesis 2), who was the more inexperienced observer, yielded inaccurate sex estimations split between an individual in their 20s and older adults (60+ years old), the most numerous age group in our sample, and between the sexes (sex bias = 0%). Observer 2 (Hypotheses 1–3), the seasoned observer, had a roughly equal split in age between younger and older adults, and between the sexes (sex bias = 0.5%). Both the Klales logit (sex bias = −3%) and RFM (sex bias = −1.7%) errors were spread across the ages and sexes, with the exception of males with the Klales logit.

The pattern of the inaccurate estimates was examined across observers and methods (Table 2) to look for individuals whose morphology was difficult to classify across all methods, which would demonstrate that the misclassification was not due to a bias in a particular method. Only one individual was estimated incorrectly for sex across both observers and methods; this individual was identified as female. There was no overlap between the observers for the Phenice method (either with or without ambiguity). However, there was some overlap between the Phenice and Klales logit and RFM, as scored by Observer 2. The majority of individuals whose sexes were estimated incorrectly were estimated using the Klales methods (i.e., both Klales methods misclassified these individuals, as scored by Observer 2), followed closely by the Klales logit individually (Observer 2), and by Observer 1 (Phenice with and without ambiguity). The classifying probabilities and model accuracies from RFM (Observer 2) for the incorrect sex estimations were not just isolated to the lowest numbers; the higher accuracies and probabilities also had inaccurate sex estimations (e.g., four individuals with classification probabilities of 1 and fourteen individuals with model accuracies of 0.9641, the highest model accuracies reported for our individuals).

One complication with the Klales et al. [9] method (Observer 2) that we observed is that it did not account for the medial ischiopubic rami morphology we saw in many males; they presented with a combination of a ridge below the pubic symphysis, extending inferiorly to approximately 1/3 of the ramus (score: 2), and a broad (score: 4) or very broad (score: 5) dorso-ventral ascending ramus. The guidance from the MorphoPASSE manual [23] states, “If a ridge or plateau is present on superior 1/3rd of ramus (not just below symphysis), see scores 1–2” (p. 27); thus, we scored these individuals as a 2. However, this did not affect the RFM results; none of the 147 individuals scored above a zero for the mean decrease in Gini index for medial ischiopubic ramus, indicating it was not useful to estimate sex. For the Klales logit, the group that could be impacted most by this scoring issue is individuals who were identified as male but whose sex was estimated as female. These individuals had a score of 2 with features of scores 4/5, accompanied by at least one other feature that scored as a 2 or a 3, which indicates they were hard to classify due to ambiguity. Otherwise, individuals that presented in this manner had sex correctly estimated.

4. Discussion

In this paper, we compared the accuracy of the estimation of sex from the pubic bone using Phenice’s [1] method to Klales and coworkers’ [9] logistic regression equation and, as a first, to MorphoPASSE’s RFM (Hypothesis 1; Observer 2). The accuracies (and errors) were 97% (3% error) for Phenice’s method with and without ambiguity, which exceeded the performance of the Klales logit and the RFM (86% accuracy and 14% error for both). Sensitivity (97%) and specificity (96%) for Phenice’s method were also higher than for either of the Klales methods (logit = 89% and 86%, and RFM = 89% and 85%). Furthermore, while there was not much difference, there was a larger sex bias with the Klales et al. methods (−3% for logit and −1.7% for RFM) as opposed to Phenice’s method (0.5% for the experienced observer). Thus, our first hypothesis is supported; Phenice’s method outperformed both of Klales’ methods. For Hypothesis 2 (both observers), the accuracies, errors, non-significant McNemar’s tests, and substantial agreement between observers calculated using the linear weighted Cohen’s Kappa indicate that the inexperienced observer performed similarly to the experienced observer, supporting our hypothesis and Phenice’s assertion that the method is easy to learn and apply. Finally, our Hypothesis 3 (Observer 2) was also supported; using the ambiguity embedded in Phenice’s writings led to the same overall sex being estimated by individual, within and between observers, despite the different levels of experience of observers.

The accuracies we calculated for Phenice’s method (97%) for Hypothesis 1 (Observer 2) align with those of the broader literature (88–97%) [7,10,13,14,15,16]. Similarly, the accuracies using the Klales method in our study (86% for both analyses) are close to the 86.2% accuracy found in the tests on an independent sample by Klales et al. [9], the recent studies of Rojas González et al. [4] and Jager and Eliopoulos [2], and the equations on specific populations in Kenyhercz et al. [21]. However, they are lower than the global equation in Kenyhercz et al. [21] and other examinations of the methods, including one using the same overall sample [22]. We appear to be the first to report the sensitivity and specificity of the methods, which both demonstrate a better estimation of sex using the Phenice method. The sex biases were reduced in our study when using the Phenice method, as compared to Johnstone-Belford et al. [14], Rojas González et al. [4], and Jager and Eliopoulos [2]. In comparing Phenice to Klales [2,4], Phenice, again, reported the lowest sex biases, similarly to our results. Due to the way in which forensic cases are scored differently than research projects (see discussion in Godde and Hens [35]) and to the number of cases the authors examined that did not have both methods scored on the same individual, our results are not directly comparable to Zermeño and Godde [3], but the same pattern exists in that sample, with Phenice outperforming Klales.

Our empirical findings for Hypothesis 2 (Observers 1 and 2) agree with Lovell [10] and Phenice [1] that an inexperienced observer could successfully apply the method; the observers here only differed by 4% accuracy, although that could reflect some normalizing, which is to be expected as someone learns a method. MacLaughlin and Bruce [11] did not find that inexperienced observers performed similarly to experienced observers when scoring pelves with ambiguity. Our results may differ from theirs due to how they approached calculating Phenice’s overall sex from the individual traits; they used at least two traits that strongly suggested the same sex to assign an overall sex (cf. [12]). We followed Phenice’s [1] exact protocol, which states, “If the estimation of sex is based on the one or two criteria which are definitely male or female, the estimate will be right at least 96% of the time…” (emphasis added, p. 300), and thus, we assigned an overall sex using one strong indicator for some individuals. Ubelaker and Volk [13] also found a lower accuracy rate (88.4%) in an inexperienced observer, although their accuracy far exceeded that of the experienced observers in MacLaughlin and Bruce [11]. Unfortunately, Ubelaker and Volk [13] did not document how they assigned overall sex, so we cannot compare their application to this study.

The use of ambiguity in Phenice’s method (Hypothesis 3; Observer 2) did not decrease the accuracy here, which also complements Kelley’s [36] findings. While McFadden and Oxenham [12] critiqued having an ambiguous category for overall sex (i.e., female, ambiguous, male), they differentiate it from having scores that indicate trait expression differences. Here, this type of ambiguity applied by the Phenice method appears to capture the range of human variation better than the Klales et al. method, as evidenced by its increased accuracy compared to both types of Klales’ statistical analyses, and the close agreement between observers for overall sex. This may be because of Phenice’s focus on two very different expressions of the traits and his allowance for ambiguity, as appropriately introduced by practitioners, which can be adapted to various time periods and geographical locations.

Indeed, ambiguity is important to the Phenice method. Phenice [1] stated:

“When one uses the three criteria outlined above, it must be kept in mind that not every os pubis will be a perfect male or female… On occasion the ventral arc may not be well developed in a female, or a male may show a hint of a subpubic concavity, or the medial aspect of the ischiopubic ramus may be intermediate between the male and female morphology. It must be pointed out that such variability is to be expected, but it presents no really serious problem.” (p. 300).

This aligns his method with a categorical scale, similar to the 1–5 scale in Klales et al. [9], albeit a smaller one. For example, a categorical, three-point scale can be applied to each trait using Phenice’s [1] terminology and the simplicity of his scoring that keeps it easy to apply for novice observers. For the ventral arc, present = 1, not “well developed” = 2, and absent = 3. For the subpubic concavity, present = 1, “slight hint of a concavity” = 2, and absent = 3. Moreover, for the medial aspect of the ischiopubic ramus, ridge present = 1, intermediate = 2, and “broad flat surface” = 3. Then, the observer applies a decision tree type of analysis to estimate sex, whereby if an individual has a score of a 1 or a 3 for at least one trait, they are classified as female or male, respectively, with a 96% accuracy for the population in Phenice’s original work. Additional scoring guidance is provided, i.e., a slight subpubic concavity (a 2 above) is present in males and the medial aspect of the ischiopubic ramus should be discounted unless the other traits are missing, so a careful reading of the paper by the practitioner should be completed when implementing the method. The guidance on the medial ischiopubic ramus is supported by the values, in Figure 1, for females; 31% of the scores for persons identified as female were scored equivalent to a 3 on this scale, which appears to have been anticipated by Phenice. Our results from Observer 2 indicate a 97% accuracy, 3% error, 0.5% sex bias, 97% sensitivity, and 96% specificity in Northern Italians when following this ambiguity protocol of Phenice. This protocol also partially explains why both observers’ results were the same with and without ambiguity scored; the protocol outlined above only requires one strongly male or strongly female score (i.e., a score of a 1 or 3), which allows for up to two ambiguous scores without changing the outcome. 
Moreover, the decision-tree-type process Phenice created led to the same answers when having to rely on any ambiguity scores, which showcases the strength of the decision nodes he described when applied exactly. However, there may be additional combinations of ambiguity not represented in our data where this decision tree approach will not lead to the same outcome, and thus, we caution this may not be the usual result. Additionally, the categorization of a 2 for ambiguity demonstrates that the Phenice method is not just focused on extremes, as has been claimed [9] (p. 106). The lower reliance on the medial aspect of the ischiopubic ramus by Phenice also discounts the idea of a “majority rules” approach [9] (p. 106).
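The three-point scale and decision rule described above can be sketched in code. The following Python sketch is one possible reading and not a definitive implementation of Phenice’s protocol. The function name is hypothetical. Two behaviors are assumptions that the text does not specify: conflicting definite scores return "indeterminate", and the medial ischiopubic ramus is consulted only when neither of the other two traits gives a definite score. A stricter reading would consult the ramus only when the other traits are missing.

```python
FEMALE, AMBIGUOUS, MALE = 1, 2, 3  # three-point scale described in the text


def estimate_sex(ventral_arc=None, subpubic_concavity=None, ischiopubic_ramus=None):
    """Estimate sex from three-point trait scores; None marks a missing trait."""

    def definite(scores):
        # Keep only strongly female (1) or strongly male (3) scores.
        return {s for s in scores if s in (FEMALE, MALE)}

    found = definite([ventral_arc, subpubic_concavity])
    if not found:
        # Assumption: fall back to the ramus only when the other two traits
        # are missing or ambiguous, since it is the least reliable trait.
        found = definite([ischiopubic_ramus])

    if found == {FEMALE}:
        return "female"
    if found == {MALE}:
        return "male"
    # No definite score, or conflicting definite scores (assumption).
    return "indeterminate"


print(estimate_sex(ventral_arc=1, subpubic_concavity=2, ischiopubic_ramus=3))
print(estimate_sex(ischiopubic_ramus=3))
```

Under this reading, a single definite score on the ventral arc or subpubic concavity decides the outcome, which is why up to two ambiguous scores do not change the overall estimate.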

The pictures and descriptions based on the growth and development literature in Klales et al. [9] and Klales and Cole [23] yield valuable insights into the variation in pelvic features. Klales’ revision of Phenice started an important conversation causing the field to review a longstanding practice, with advancements and knowledge gained 43 years after the original method was proposed. For that, the Klales method should be lauded. However, our results here indicate that the components of Phenice’s original method still hold, and its accuracy exceeds the more modern revisions (including Kenyhercz et al. [21]; we calculated a 94% accuracy using the global equation), even on contemporary individuals who died after the publication of Phenice’s original work. One additional benefit of Phenice’s method is its superior ability to score sex indicators from fragmented pubic bones. In Phenice’s method, the subpubic concavity is scored as inferior to the pubic symphysis, but Klales et al. [9] extended the scoring of it to be the length of the ischiopubic ramus. This presented data collection challenges for us: we encountered many fragmented os coxae and had to skip individuals that did not have the length of the ramus needed for Klales et al. [9] but were sufficient for Phenice [1]. However, when the ramus was present, we were able to score the isolated pubic bones.

While Phenice’s original method outperformed the Klales methods, we do show a similar performance of the logit model [9] to the MorphoPASSE RFM [24]. However, MorphoPASSE was able to analyze more individuals than the logit because it handles missing values, while logistic regression does not. Klales [31] and Klales and Cole [23] both cite fewer assumptions with the RFM as a reason to use the recommended RFM in MorphoPASSE. However, the assumptions they cite are incorrect for their logistic regression model (see below), which undervalues their implementation of the logit model and underscores the need for the peer review of MorphoPASSE and the manual.

First, in 2020, Klales stated the variables for a logit model must be discrete; this is only true for the outcome (dependent) variable [37]. Second, Klales [31] and Klales and Cole [23] stated there should be no outliers; however, with ordinal data and the expectation of having pelvises that achieve scores of 1s or 5s for all three traits, there will inherently be no outliers. Third, Klales [31] and Klales and Cole [23] stated there should be a linear relationship between the odds ratio and predictor (independent) variables. This is incorrect, as an odds ratio is calculated for each predictor; instead, the linear relationship to which they refer is between the predictor variables and the logit of the outcome and only pertains to continuous predictors [37], of which Klales et al. [9] had none. Fourth, they stated there should be no collinearity between the predictor variables, an assumption they considered violated by their model. To evaluate this last one, we have to examine their first publication on the subject.

In Klales et al. [9], the authors presented a polychoric correlation matrix that showed strong correlations among the three ordinal variables used in their logistic regression equation. However, this correlation is a measure of pairwise collinearity rather than of multicollinearity, which is what the assumption pertains to for a multivariable logit [37]; the polychoric correlations will not provide the best measure of multicollinearity. To evaluate the actual performance of the Klales et al. traits in a logistic regression here, we ran a squared scaled generalized variance inflation factor (squared scaled GVIF [38]) in R [33], using the car package [39], on a logistic regression equation we generated with all three of the Klales et al. traits from left sides in our Milanese data. The squared scaled GVIF provides a measure of multicollinearity among the categorical predictor variables. In our sample, the values were under 1.25, which indicates there are no problems with the model that need fixing [40]. In other words, our model did not have a problem with multicollinearity. Given the different developmental pathways of each of the three of Phenice’s characteristics paired with our findings here, we feel that if the data from the original model in Klales et al. [9] were tested with the squared scaled GVIF, they would probably yield similar values. Therefore, we disagree with Klales [31] and Klales and Cole [23] that these assumptions were applicable and/or definitely violated; the logit model in Klales et al. [9] appears to us to have met its assumptions. However, we affirm our agreement with the critiques of Konigsberg and Frankenberg [27], and their recommendation that the independent and dependent variables should be reversed and a new model should be run, to be consistent with transition analysis theory, contend better with missing data, and avoid scenarios where the model will be inaccurate due to the reversing of the variables.
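To make the multicollinearity measure concrete, the sketch below computes a squared scaled GVIF in Python. It is not the car package routine used in this study, and the covariance values are hypothetical. It assumes the commonly used definition GVIF = det(R11) · det(R22) / det(R), where R is the correlation matrix derived from the covariance matrix of the coefficient estimates (intercept excluded), R11 is the block for one term, and R22 is the block for all remaining terms. The squared scaled value is GVIF^(1/df), with df the number of coefficients in the term.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[pivot][i]) < 1e-12:
            return 0.0
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            result = -result
        result *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return result  # an empty matrix yields 1.0


def squared_scaled_gvif(cov, terms):
    """Return {term: GVIF ** (1 / df)}.

    cov   -- covariance matrix of coefficient estimates, intercept excluded
    terms -- maps each predictor name to the indices of its coefficients
    """
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]

    def block(idx):
        return [[corr[i][j] for j in idx] for i in idx]

    det_all = det(corr)
    out = {}
    for name, idx in terms.items():
        rest = [i for i in range(len(corr)) if i not in idx]
        gvif = det(block(idx)) * det(block(rest)) / det_all
        out[name] = gvif ** (1.0 / len(idx))
    return out


# Hypothetical covariance matrix for two one-coefficient predictors.
print(squared_scaled_gvif([[1.0, 0.6], [0.6, 1.0]], {"trait_a": [0], "trait_b": [1]}))
```

A five-level ordinal trait entered as a categorical predictor contributes four coefficients, so the exponent 1/df puts terms with different numbers of coefficients on a comparable scale.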

Without a peer-reviewed article on MorphoPASSE to consult, there are several issues we encountered that we are unable to explain, and which underscore the importance of the peer review process for this software and statistical application. Firstly, the medial ischiopubic ramus was not weighted in any of the RFM we ran (see Gini index results). As this was unexpected in contemporary samples, we also ran several random individuals through the historic and unknown temporal periods, yielding the same result. At the time of the writing of this paper, the software would not run protohistoric individuals, so we could not examine its performance. We are aware that model selection by Konigsberg and Frankenberg [27] led to a model using only the ventral arc and subpubic concavity, which coincides with the reported lower accuracy of medial ischiopubic ramus, compared to the other two traits in Klales et al. [9] and Phenice’s assertion it should only be used in absence of the other two indicators [1]. Thus, we expect it may not be a valuable predictor of sex in the RFM and look forward to analyses showing why this is the case in MorphoPASSE. Secondly, we had individuals that were classified with Case Prediction probabilities of 0.972 or 1 with corresponding Gini indices of NaN for the ventral arc and zeros for the other two traits. These individuals were highly fragmentary; is there a minimum number of traits/sides needed to estimate sex? We had individuals with only one trait on one side or one trait present on both sides that produced the NaNs, so we deduce they are connected given how the Gini index works. Relatedly, how were classifications and probabilities produced with no variables measured via the Gini index? In other words, how does the RFM work under these circumstances?
Again, we assume that because the Gini index is a separate measure from classifications and probabilities, this result is to be expected, but we would appreciate a peer-reviewed document that outlines this scenario and how the RFM operates in these cases. Thirdly, what are the advantages of the RFM using both the left and right sides from a single individual (cf. [23]) when using one side in the logistic regression equation yielded similar results to the RFM in our data?

Our work certainly is not without bias. Observer 2 may have had negligibly increased accuracy using Phenice due to the two different scoring methods applied, causing the observer to look at each trait from multiple perspectives, although not concurrently. Additionally, Observer 1 may have adjusted how they scored future individuals due to being able to see the actual sex immediately after scoring and being able to ask questions, which reflects the normal learning of a new method. Moreover, we were limited by the methodology, as sex estimation methods only identify sex in the binary and do not capture the continuum of sex. Furthermore, although we do not know the dates of death of the individuals used here, we recognize that the most recent death may be 2001 [18], which is over 20 years ago and may limit the representation of the current Milanese population. Finally, we selected “unknown” for ancestry and region, following the guidance in the MorphoPASSE manual, because our population is not represented in the database. This selection evaluates the scores using all populations in the MorphoPASSE database. Kenyhercz et al. [21] show that a population-specific equation is not necessary and an equation derived from global samples performs better, and thus the impact from not having a population-specific equation is probably minimal. Future directions should include scoring the two methods completely independently (e.g., one observer scoring one method and another scoring the other after normalizing on each other’s methods), a well-balanced interobserver error study across all three methods, and a comparison of how the Phenice categories align with the Klales et al. categories within individuals.

5. Conclusions

We conclude that Phenice is superior. This conclusion rests on the body of literature cited in this paper, which consistently reports Phenice’s higher accuracy, lower error, and lower bias across populations, combined with our results here (Observer 2), which detail its greater accuracy, lower error rates, and higher specificity and sensitivity, and show that it works well with both ambiguity and non-ambiguity approaches. Although biological profile components are not usually testified upon in the United States [41,42], we disagree with the assessment of Klales et al. [9] that Phenice [1] is not consistent with Daubert v. Merrell Dow Pharmaceuticals, Inc. [43] guidelines in the United States. Therefore, based on our results here, we recommend the use of Phenice and the adherence to all guidance in his 1969 paper [1] in all applicable anthropological contexts. Practitioners can further apply a categorical scoring method as we have detailed here, which may decrease subjectivity and bias. The accuracy provided by Phenice and those accuracies reported for different populations by authors cited in this paper can be reported for degree of certainty, which aligns with the ANSI/ASB (American National Standards Institute/Academy Standards Board) Standard [44]: “The degree of certainty should be expressed when reporting sex estimates. This may be expressed numerically (e.g., correct classification rates, method accuracies) and qualitatively, as necessary (i.e., using qualifiers such as probable)” (emphasis added, p. 3). It also aligns with the Daubert guideline that “the court should consider… known or potential error rates for the technique” [43] (p. 2); Phenice [1] reported a 96% accuracy rate, which corresponds to a 4% error rate. We further suggest that the Klales et al. [9] and Kenyhercz et al. [21] logistic regression formulae be used with caution until the issues raised by Konigsberg and Frankenberg [27] are addressed and the variables flipped.
While MorphoPASSE shows promise, the findings detailed here, in particular its calculations in the output, cause us to caution about the use of MorphoPASSE for work that involves the medicolegal system and certification/licensure requirements until such time as it has been formally peer reviewed (our paper cannot be substituted as one) and there is a better understanding of the manual, the RFM, and its performance. We look forward to reading the published, peer-reviewed study on MorphoPASSE, its manual, and its RFM, which will also satisfy the principles in the Daubert guidelines that Klales et al. [9] and Kenyhercz et al. [21] deemed important.

Author Contributions

Conceptualization, K.G.; Data curation, S.M.H.; Formal analysis, K.G.; Investigation, K.G. and G.F.; Methodology, K.G.; Validation, K.G.; Visualization, K.G.; Writing—original draft, K.G., S.M.H. and G.F.; Writing—review and editing, K.G., S.M.H. and G.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Article 43 of the Presidential Decree of the Italian Republic of the National Police Mortuary Regulation allows cemeteries to grant unclaimed skeletal remains to universities for education and research. This research did not involve living humans. Therefore, ethics committee approval was not required.

Informed Consent Statement

Not applicable. Data used in this project were anonymized and came from the CAL Milano Cemetery Skeletal Collection. Article 43 of the Presidential Decree of the Italian Republic of the National Police Mortuary Regulation allows cemeteries to grant unclaimed skeletal remains to universities for education and research.

Data Availability Statement

The Phenice (1969) and Klales et al. (2012) scores only are available from the first author upon reasonable request and in accordance with protocols set forth by Laboratorio di Antropologia e Odontologia Forense; due to the recent nature of the collection and the contextual information relating to the individual numbers, there are concerns individuals could be identified. Availability of the remaining data is under the purview of Laboratorio di Antropologia e Odontologia Forense.

Acknowledgments

The authors wish to thank C. C. and M. M. for access to the skeletal collection in Milan.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Phenice, T.W. A newly developed visual method of sexing the os pubis. Am. J. Phys. Anthropol. 1969, 30, 297–301. [Google Scholar] [CrossRef] [PubMed]
Jager, V.R.; Eliopoulos, C. Sex assessment from the pelvis: A test of the Phenice (1969) and Klales et al. (2012) methods. Forensic Sci. Med. Pathol. 2024, 20, 778–784. [Google Scholar] [CrossRef] [PubMed]
Zermeño, N.; Godde, K. Patterns in the Phenice (1969) and Klales et al. (2012) Methods of Sex Estimation Using Forensic Casework from the United States. J. Forensic Sci. 2025; In Review. [Google Scholar]
Rojas González, N.; Obertová, Z.; Franklin, D. Validation and recalibration of sex estimation methods using pubic nonmetric traits for the Chilean population. Int. J. Leg. Med. 2024, 138, 2071–2080. [Google Scholar] [CrossRef]
Coleman, W.H. Sex differences in the growth of the human bony pelvis. Am. J. Phys. Anthropol. 1969, 31, 125–151. [Google Scholar] [CrossRef] [PubMed]
Budinoff, L.C.; Tague, R.G. Anatomical and developmental bases for the ventral arc of the human pubis. Am. J. Phys. Anthropol. 1990, 82, 73–79. [Google Scholar] [CrossRef]
Sutherland, L.; Suchey, J. Use of the ventral arc in pubic sex determination. J. Forensic Sci. 1991, 36, 501–511. [Google Scholar] [CrossRef]
Anderson, B.E. Ventral arc of the os pubis: Anatomical and developmental considerations. Am. J. Phys. Anthropol. 1990, 83, 449–458. [Google Scholar] [CrossRef] [PubMed]
Klales, A.R.; Ousley, S.D.; Vollner, J.M. A revised method of sexing the human innominate using Phenice’s nonmetric traits and statistical methods. Am. J. Phys. Anthropol. 2012, 149, 104–114. [Google Scholar] [CrossRef]
Lovell, N.C. Test of Phenice’s technique for determining sex from the os pubis. Am. J. Phys. Anthropol. 1989, 79, 117–120. [Google Scholar] [CrossRef]
MacLaughlin, S.M.; Bruce, M.F. The accuracy of sex identification in European skeletal remains using the phenice characters. J. Forensic Sci. 1990, 35, 1384–1392. [Google Scholar] [CrossRef]
McFadden, C.; Oxenham, M.F. Revisiting the Phenice technique sex classification results reported by MacLaughlin and Bruce (1990). Am. J. Phys. Anthropol. 2016, 159, 182–183. [Google Scholar] [CrossRef]
Ubelaker, D.H.; Volk, C.G. A test of the Phenice method for the estimation of sex. J. Forensic Sci. 2002, 47, 19–24. [Google Scholar] [CrossRef]
Johnstone-Belford, E.; Flavel, A.; Franklin, D. Morphoscopic observations in clinical pelvic MDCT scans: Assessing the accuracy of the Phenice traits for sex estimation in a Western Australian population. J. Forensic Radiol. Imaging 2018, 12, 5–10. [Google Scholar] [CrossRef]
Inskip, S.; Scheib, C.L.; Wohns, A.W.; Ge, X.; Kivisild, T.; Robb, J. Evaluating macroscopic sex estimation methods using genetically sexed archaeological material: The medieval skeletal collection from St John’s Divinity School, Cambridge. Am. J. Phys. Anthr. 2019, 168, 340–351. [Google Scholar] [CrossRef] [PubMed]
Flamino, C.B.; Oliveira, D.; Ferreira, L.; Martins, M.; Santos, R.; Laureano, R.; Nunes, T.; Bento, B. Osteometric and Osteomorphological Sex Estimation from the Os Coxa in an Archaelogical Population Related to the 1755 Earthquake of Lisbon. Bull. Int. Assoc. Paleodont. 2020, 14, 32–39. [Google Scholar]
Klales, A.R. Secular Change in Morphological Pelvic Traits used for Sex Estimation. J. Forensic Sci. 2016, 61, 295–301. [Google Scholar] [CrossRef]
Klales, A.R.; Cole, S.J. Improving Nonmetric Sex Classification for Hispanic Individuals. J. Forensic Sci. 2017, 62, 975–980. [Google Scholar] [CrossRef]
Gómez-Valdés, J.A.; Menéndez Garmendia, A.; García-Barzola, L.; Sánchez-Mejorada, G.; Karam, C.; Baraybar, J.P.; Klales, A. Recalibration of the Klales et al. (2012) method of sexing the human innominate for Mexican populations. Am. J. Phys. Anthopol. 2017, 162, 600–604. [Google Scholar] [CrossRef]
Lesciotto, K.M.; Doershuk, L.J. Accuracy and reliability of the Klales et al. (2012) morphoscopic pelvic sexing method. J. Forensic Sci. 2018, 63, 214–220. [Google Scholar] [CrossRef]
Kenyhercz, M.W.; Klales, A.R.; Stull, K.E.; McCormick, K.A.; Cole, S.J. Worldwide population variation in pelvic sexual dimorphism: A validation and recalibration of the Klales et al. method. Forensic Sci. Intl. 2017, 277, 259.e1–259.e8. [Google Scholar] [CrossRef]
Selliah, P.; Martino, F.; Cummaudo, M.; Indra, L.; Biehler-Gomez, L.; Campobasso, C.P.; Cattaneo, C. Sex estimation of skeletons in middle and late adulthood: Reliability of pelvic morphological traits and long bone metrics on an Italian skeletal collection. Int. J. Leg. Med. 2020, 134, 1683–1690. [Google Scholar] [CrossRef]
Klales, A.R.; Cole, S.J. MorphoPASSE: The Morphological Pelvis and Skull Sex Estimation Database Manual; Washburn University: Topeka, KS, USA, 2017. [Google Scholar]
Klales, A.R. MorphoPASSE: The Morphological Pelvis and Skull Sex Estimation Database; Washburn University: Topeka, KS, USA, 2018. [Google Scholar]
Walker, P.L. Sexing skulls using discriminant function analysis of visually assessed traits. Am. J. Phys. Anthropol. 2008, 136, 39–50.
Juarez, C.A.; Hughes, C.E.; Yim, A.-D. Technical note: A report on the Forensic Anthropology Database for Assessing Methods Accuracy. Am. J. Phys. Anthropol. 2021, 174, 149–150.
Konigsberg, L.W.; Frankenberg, S.R. Multivariate ordinal probit analysis in the skeletal assessment of sex. Am. J. Phys. Anthropol. 2019, 169, 385–387.
Konigsberg, L.; Hermann, N.P.; Wescott, D.J.; McBride, D.G.; Benfer, R.A. Commentary on: McBride DG, Dietz MJ, Vennemeyer MT, Meadors SA, Benfer RA, Furbee NL. Bootstrap methods for sex determination from the os coxae using the ID3 algorithm. J. Forensic Sci. 2002, 47, 424–426.
Konigsberg, L.W.; Hens, S.M. Use of ordinal categorical variables in skeletal assessment of sex from the cranium. Am. J. Phys. Anthropol. 1998, 107, 97–112.
Klales, A.; Ousley, S.; Vollner, J. Response to multivariate ordinal probit analysis in the skeletal assessment of sex (Konigsberg and Frankenberg). Am. J. Phys. Anthropol. 2019, 169, 388–389.
Klales, A.R. Chapter 16-MorphoPASSE: Morphological pelvis and skull sex estimation program. In Sex Estimation of the Human Skeleton; Klales, A.R., Ed.; Academic Press: Cambridge, MA, USA, 2020; pp. 271–278.
Cattaneo, C.; Mazzarelli, D.; Cappella, A.; Castoldi, E.; Mattia, M.; Poppa, P.; De Angelis, D.; Vitello, A.; Biehler-Gomez, L. A modern documented Italian identified skeletal collection of 2127 skeletons: The CAL Milano Cemetery Skeletal Collection. Forensic Sci. Int. 2018, 287, 219.e1–219.e5.
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020.
Landis, J.; Koch, G. The measurement of observer agreement for categorical data. Biometrics 1977, 33, 159–174.
Godde, K.; Hens, S.M. A Bayesian approach to Suchey-Brooks age estimation from the pubic symphysis using modern American samples. J. Forensic Sci. 2025, 70, 9–18.
Kelley, M.A. Sex determination with fragmented skeletal remains. J. Forensic Sci. 1979, 24, 154.
Faizi, N.; Alvi, Y. Regression and multivariable analysis. In Biostatistics Manual for Health Research; Faizi, N., Alvi, Y., Eds.; Academic Press: Cambridge, MA, USA, 2023; pp. 213–247.
Fox, J.; Monette, G. Generalized collinearity diagnostics. J. Am. Stat. Assoc. 1992, 87, 178–183.
Fox, J.; Weisberg, S. An R Companion to Applied Regression; Sage Publications: Thousand Oaks, CA, USA, 2018.
Harris, J.K. Primer on binary logistic regression. Fam. Med. Community Health 2021, 9 (Suppl. S1), e001290.
Lesciotto, K.M.; Christensen, A.M. The over-citation of Daubert in forensic anthropology. J. Forensic Sci. 2024, 69, 9–17.
Murray, E.A.; Anderson, B.E. Forensic anthropology in the courtroom: Trends in testimony. In Proceedings of the 59th Annual Meeting of the American Academy of Forensic Sciences, San Antonio, TX, USA, 20 February 2007; pp. 322–323.
United States. Supreme Court. Daubert v. Merrell Dow Pharmaceuticals Inc., 509 US 579. In United States Reports Volume 509, Cases Adjudged in the Supreme Court at October Term 1992; UNT Digital Library: Denton, TX, USA, 1997.
ASB Academy Standards Board. Standard for Sex Estimation in Forensic Anthropology; Academy Standards Board: Colorado Springs, CO, USA, 2019.

Figure 1. Histogram depicting the scores of Phenice’s characteristics without ambiguity from Observer 2. VA = ventral arc, SC = subpubic concavity, and MIPR = medial ischiopubic ramus.

Figure 2. Histogram depicting the scores of Klales’ characteristics from Observer 2. VA = ventral arc, SC = subpubic contour, and MIPR = medial aspect of the ischiopubic ramus.

Table 1. Age at death and sex patterning in inaccurate estimates of sex.

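Age Group	Observer 1, Phenice: Female	Observer 1, Phenice: Male	Observer 2, Phenice: Female	Observer 2, Phenice: Male	Observer 2, Klales Logit: Female	Observer 2, Klales Logit: Male	Observer 2, Klales RFM: Female	Observer 2, Klales RFM: Male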
20–29 Years	1 (10%)	0	0	0	2 (11%)	0	2 (11%)	1 (5%)
30–39 Years	0	0	0	2 (40%)	0	0	0	2 (11%)
40–49 Years	0	0	0	0	1 (6%)	0	1 (5%)	1 (5%)
50–59 Years	0	0	0	0	0	1 (6%)	0	1 (5%)
60+ Years	4 (40%)	5 (50%)	2 (40%)	1 (20%)	6 (33%)	8 (44%)	6 (32%)	5 (26%)
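
The percentages in Table 1 appear to be each cell's share of the total number of misclassified individuals for that observer and method, pooled across both sexes. The following minimal Python sketch recomputes them from the counts under that assumption; the denominator choice, the dictionary layout, and the function name `error_shares` are illustrative and not taken from the authors' analysis.

```python
# Misclassified counts per age group, transcribed from Table 1.
# Order of age groups: 20-29, 30-39, 40-49, 50-59, 60+ years.
TABLE_1 = {
    "Observer 1, Phenice": {"Female": [1, 0, 0, 0, 4], "Male": [0, 0, 0, 0, 5]},
    "Observer 2, Phenice": {"Female": [0, 0, 0, 0, 2], "Male": [0, 2, 0, 0, 1]},
    "Observer 2, Klales Logit": {"Female": [2, 0, 1, 0, 6], "Male": [0, 0, 0, 1, 8]},
    "Observer 2, Klales RFM": {"Female": [2, 0, 1, 0, 6], "Male": [1, 2, 1, 1, 5]},
}


def error_shares(counts_by_sex):
    """Return each cell as a rounded percentage of the method's total errors.

    Assumption: the denominator is the sum of all misclassified individuals
    (females and males together) for one observer/method combination.
    """
    total = sum(sum(counts) for counts in counts_by_sex.values())
    return {
        sex: [round(100 * n / total) for n in counts]
        for sex, counts in counts_by_sex.items()
    }


if __name__ == "__main__":
    for method, counts in TABLE_1.items():
        print(method, error_shares(counts))
```

Under this assumption the rounded shares match the percentages printed in the table, which suggests that the 60+ age group accounts for the largest share of errors for every observer and method.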

Table 2. Pattern of inaccurate estimates of sex across observers and methods.

Observer/Method	Number of Individuals with Incorrect Estimated Sex
Observer 1 (Phenice with and without ambiguity) only	6
Observer 2 (Phenice with and without ambiguity) only	1
Observer 2 (Klales Logit)	6
Observer 2 (Klales RFM)	5
Observer 1 (Phenice with and without ambiguity) and Observer 2 (Klales Logit and RFM)	2
Observer 2 (Klales Logit and RFM)	8
Observer 2 (Phenice with and without ambiguity and Klales RFM)	2
Observer 2 (Phenice with and without ambiguity and Klales Logit and RFM)	1
All 4 Methods	1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
