1. Introduction
Establishing a biological profile from unidentified skeletal remains is a fundamental objective in forensic anthropology, with sex estimation representing the first and most critical step in this process. Accurate sex determination reduces the pool of potential matches by approximately 50%, substantially improving the efficiency of subsequent identification procedures [1]. Although molecular methods such as DNA analysis remain the most definitive approach, the degraded or fragmented condition of remains encountered in forensic casework frequently precludes their application, necessitating reliable morphological alternatives [2]. Under such circumstances, osteometric methods derived from skeletal measurements provide a practical and reproducible basis for sex estimation.
The skull is the second most sexually dimorphic skeletal element after the pelvis and offers practical advantages in forensic contexts because of its structural resilience and frequent recovery in cases of fragmented or incomplete remains [3]. Linear craniofacial measurements reflect the cumulative effects of hormonal and biomechanical influences on skeletal development, resulting in larger average dimensions in males than in females across multiple anatomical regions [4,5]. Measurements of the cranial vault, facial skeleton, nasal aperture, and orbital region have each demonstrated discriminatory value for sex estimation in diverse populations [6,7,8,9]. However, the degree of sexual dimorphism in craniofacial dimensions varies considerably across populations, rendering reference standards derived from one group inappropriate for direct application to another [10,11]. Population-specific computed tomography (CT) data for Thai individuals remain limited, a gap that constrains the forensic applicability of existing models in this population.
Discriminant function analysis (DFA) has historically served as the primary statistical approach for osteometric sex estimation. It remains widely used owing to its interpretability and established track record in forensic practice [12,13]. However, DFA assumes multivariate normality and homogeneity of covariance matrices. These assumptions may not be satisfied in skeletal datasets, and their violation can compromise classification performance [14]. Machine learning algorithms, including support vector machine (SVM) and random forest (RF), have attracted increasing attention as complementary approaches. They do not rely on these parametric assumptions and can capture non-linear relationships among predictor variables [15,16]. SVM identifies the separating hyperplane that maximizes the margin between classes, optionally in a higher-dimensional feature space induced by a kernel function. RF constructs an ensemble of decision trees and aggregates their predictions to improve classification stability [16,17,18]. Despite growing interest in machine learning for forensic applications, direct comparisons between DFA and machine learning classifiers in published studies are complicated by inconsistent validation protocols. Different methods are frequently evaluated under different cross-validation schemes, which precludes a fair assessment of their relative performance. Furthermore, validation of sex estimation models against actual forensic cases with confirmed biological sex is rarely reported, leaving the practical utility of these approaches inadequately demonstrated [19,20].
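The following Python sketch makes the two decision rules concrete using only the standard library. It shows a kernel SVM decision value and a random-forest vote. It is illustrative only and is not the implementation used in this study. The support vectors, dual coefficients, bias, and tree votes are hypothetical, and an RBF kernel is assumed. The forest is shown with hard (majority) voting, whereas some libraries, such as scikit-learn, instead average per-tree class probabilities.

```python
import math
from collections import Counter


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel SVM decision value f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b.

    dual_coefs holds the signed products alpha_i * y_i. The sign of f(x)
    says on which side of the maximum-margin boundary x falls.
    """
    return sum(c * rbf_kernel(s, x, gamma)
               for s, c in zip(support_vectors, dual_coefs)) + bias


def forest_vote(tree_predictions):
    """Hard-voting forest: fraction of trees predicting each class label."""
    counts = Counter(tree_predictions)
    n = len(tree_predictions)
    return {label: k / n for label, k in counts.items()}


# Hypothetical example: two support vectors in a 2-D feature space.
svs = [[0.0, 0.0], [2.0, 2.0]]
coefs = [-1.0, 1.0]  # negative side = one class, positive side = the other
print(svm_decision([2.0, 2.0], svs, coefs, bias=0.0, gamma=0.5))  # positive value
print(forest_vote(["M", "M", "F", "M"]))  # {'M': 0.75, 'F': 0.25}
```

A decision value of zero marks the separating boundary. The RF output is the proportion of trees voting for each class, which is the kind of class probability a forest reports.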
The present study therefore aimed to address these gaps through three specific objectives. The first was to characterize the pattern and magnitude of sexual dimorphism in CT-derived craniofacial measurements across the cranial vault, facial skeleton, nasal aperture, and orbital region in a Thai adult population. The second was to develop and compare sex estimation models based on DFA, SVM, and RF under a unified leave-one-out cross-validation (LOO-CV) protocol, enabling a direct performance comparison under identical validation conditions. The third was to demonstrate the practical application of the developed models through illustrative examples using two cases with confirmed biological sex.
4. Discussion
The present study examined sexual dimorphism in CT-derived craniofacial measurements and evaluated the performance of three classification methods for sex estimation in a Thai adult population. Seven of the eight measurements demonstrated statistically significant sexual dimorphism, with facial breadth (d = 1.61) and nasal height (d = 1.31) showing the largest effect sizes. Under a unified LOO-CV protocol, all three classifiers achieved comparable accuracy, ranging from 84.0% to 85.7%. DFA demonstrated the highest overall discriminating ability, with an accuracy of 85.7%, an AUC of 0.924, and an MCC of 0.713. SVM and RF achieved accuracies of 84.7% and 84.0%, respectively. All three classifiers correctly and concordantly classified both forensic case examples. These findings suggest that CT-derived craniofacial measurements provide a reliable basis for sex estimation in the Thai population and that DFA remains competitive with machine learning classifiers in this context.
The pattern of sexual dimorphism observed in the present study is broadly consistent with findings reported in other Asian and Southeast Asian populations, in which males consistently demonstrate larger craniofacial dimensions than females across multiple anatomical regions [3,6,13]. Nasal height and facial breadth exhibited the greatest degree of dimorphism in the present sample, with percentage differences of 8.12% and 5.84%, respectively. This aligns with previous reports identifying the facial skeleton as a region of pronounced sexual dimorphism in Thai and neighboring populations [3,24]. The relatively modest dimorphism observed in cranial breadth is consistent with the brachycephalic cranial morphology characteristic of Asian populations. In these populations, transverse cranial dimensions are proportionally enlarged in both sexes, thereby reducing the magnitude of sex-related differences in this measurement [3,6,24,25]. Orbital height was the only measurement without statistically significant sexual dimorphism (p = 0.07, d = 0.21), indicating limited independent discriminatory value at the univariate level in the present sample. Notably, however, orbital height was retained in the stepwise DFA model, suggesting that it contributes incrementally to sex discrimination in combination with the other measurements. This pattern is expected, because stepwise selection optimizes the collective discriminatory power of the variable set rather than the significance of each variable in isolation. A variable that is weakly dimorphic on its own can still improve discrimination by accounting for variance it shares with other, more dimorphic measurements.
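The two dimorphism indices reported above (Cohen's d and the percentage difference) can be computed as in the following sketch. It assumes Cohen's d with a pooled sample standard deviation. It also expresses the percentage difference relative to the female mean. Both conventions are assumptions, since other definitions (for example, relative to the male or grand mean) are also in use. The measurement values are hypothetical.

```python
import math
from statistics import mean, stdev


def cohens_d(males, females):
    """Cohen's d: mean difference divided by the pooled sample standard deviation."""
    n1, n2 = len(males), len(females)
    s1, s2 = stdev(males), stdev(females)  # sample SDs (n - 1 denominator)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(males) - mean(females)) / pooled_sd


def percent_difference(males, females):
    """Male-female mean difference as a percentage of the female mean (assumed convention)."""
    return (mean(males) - mean(females)) / mean(females) * 100


# Hypothetical measurements in mm (not data from this study).
male_mm = [52.0, 54.0, 56.0]
female_mm = [48.0, 50.0, 52.0]
print(round(cohens_d(male_mm, female_mm), 2))            # 2.0
print(round(percent_difference(male_mm, female_mm), 2))  # 8.0
```

The two indices answer different questions. Cohen's d scales the difference by within-group spread, so it governs how separable the sexes are. The percentage difference scales it by the size of the measurement. This is why the two rankings need not agree for a given set of measurements.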
The narrow accuracy range of 84.0–85.7% across the three classifiers suggests that DFA, SVM, and RF captured the discriminatory information in the selected craniofacial measurements to a similar extent, regardless of their underlying algorithmic assumptions. DFA achieved the highest AUC (0.924) and MCC (0.713), indicating superior overall discriminating ability and balanced classification performance. SVM showed the most balanced sensitivity and specificity, which may reflect the tendency of margin-based optimization to limit asymmetric misclassification. The lower sensitivity of RF (81.3%) relative to its specificity (86.7%) suggests a tendency toward over-classification of females. This may reflect the influence of ensemble averaging on borderline cases in a balanced sample, a pattern previously noted in RF-based sex estimation studies [26,27]. Machine learning classifiers offer theoretical advantages in capturing non-linear relationships among predictor variables. However, these advantages did not translate into meaningful performance gains over DFA when applied to a relatively small set of linear craniofacial measurements in a homogeneous population sample, suggesting that the discriminatory relationships among these variables are largely linear [27]. SVM and RF were evaluated using default parameter configurations (C = 1.0 and gamma = 1/8 for SVM; 100 trees with round(√8) features considered per split for RF). While these defaults provide a reproducible baseline, hyperparameter optimization may yield different performance estimates and should be considered in future comparative studies.
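The evaluation metrics discussed above follow directly from the confusion matrix and the classifier scores. The sketch below treats males as the positive class. This is an assumption, consistent with reading RF's lower sensitivity as female over-classification. AUC is computed as the normalized Mann-Whitney statistic. The counts and scores in the usage lines are hypothetical.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient.

    Males are assumed to be the positive class: tp = males classified as male,
    fn = males classified as female, tn = females classified as female,
    fp = females classified as male.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


def auc(scores_pos, scores_neg):
    """ROC AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2 (the normalized Mann-Whitney U statistic)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical balanced sample: 100 males, 100 females.
m = binary_metrics(tp=80, fn=20, tn=90, fp=10)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.8, 'specificity': 0.9, 'accuracy': 0.85, 'mcc': 0.704}
print(auc([0.9, 0.8, 0.4], [0.3, 0.4, 0.1]))
```

With equal group sizes, accuracy equals the mean of sensitivity and specificity. This is why RF's sensitivity of 81.3% and specificity of 86.7% average to its reported accuracy of 84.0%. MCC, by contrast, also penalizes imbalance between the two error types.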
The classification performance observed in the present study is broadly consistent with the literature on craniometric sex estimation across diverse populations. The overall DFA accuracy of 85.7% falls within the 80–95% range reported across CT-based and dry bone craniometric studies worldwide [6,7,8,10,27,28,29]. Neither DFA nor machine learning methods have consistently outperformed the other [8,28]. Within Thai populations, DFA applied to dry cranial collections has yielded a higher accuracy of 90.6% using six measurements in a northern Thai sample [29]. This difference may reflect methodological distinctions between dry bone specimens and CT-derived measurements rather than a fundamental limitation of CT-based craniometry. Among CT-based studies in other Asian populations, Hoshioka et al. reported 93.9% accuracy using DFA in a Japanese sample [6]. Imaizumi et al. achieved 89.6% using SVM with dimensionality reduction, also in a Japanese population [30]. These higher rates may be attributable to greater sexual dimorphism or larger measurement sets. Comparable accuracies have been reported in European populations. These include 90.25% using logistic regression in an Italian sample [7], 86.25% using morphoscopic traits in a Croatian multislice CT (MSCT) sample [28], and 91.9% using rule induction algorithms in a Bulgarian population [8]. Collectively, these comparisons suggest that performance differences across studies reflect variation in measurement sets, sample size, skeletal material, and analytical approaches rather than population-specific limitations of CT-based sex estimation.
A methodological strength of the present study is its comparative multi-method design. Three methodologically distinct classifiers, DFA, SVM, and RF, were evaluated under a unified LOO-CV protocol. This design differs from the majority of published sex estimation studies, in which classification performance is reported for a single method. That practice limits the extent to which observed performance can be attributed to genuine discriminatory ability rather than method-specific characteristics [12,31]. A unified LOO-CV protocol across all three classifiers is particularly important. Inconsistent validation schemes are a common source of incomparability in the forensic literature, where DFA and machine learning classifiers are frequently evaluated under different cross-validation procedures, precluding direct performance comparison [32,33]. In the present study, the three distinct classifiers achieved convergent accuracy under a unified protocol. This supports the interpretation that the reported accuracy estimates reflect genuine discriminatory ability of the selected craniofacial measurements within the development sample, rather than artifacts of method-specific optimization or overfitting. LOO-CV is well suited to moderate sample sizes and yields nearly unbiased performance estimates. However, its estimates have higher variance than those of repeated k-fold cross-validation, and it does not assess model stability across different data partitions; this should be considered when interpreting the reported performance metrics.
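The essential property of a unified protocol is that every classifier is trained and tested on exactly the same folds. The sketch below shows a leave-one-out loop that enforces this. The two classifiers (nearest centroid and 1-nearest neighbor) are hypothetical stand-ins chosen to keep the example dependency-free. They are not the DFA, SVM, and RF models of this study, and the data are toy values.

```python
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


# Hypothetical stand-in classifiers: each is a (fit, predict) pair.
def centroid_fit(X, y):
    """Nearest-centroid model: mean feature vector per class."""
    return {label: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == label])]
            for label in sorted(set(y))}


def centroid_predict(model, x):
    return min(model, key=lambda label: sq_dist(x, model[label]))


def nn_fit(X, y):
    """1-nearest-neighbor model: simply stores the training cases."""
    return list(zip(X, y))


def nn_predict(model, x):
    return min(model, key=lambda case: sq_dist(x, case[0]))[1]


def loo_cv(X, y, classifiers):
    """Leave-one-out CV in which every classifier shares exactly the same n folds."""
    predictions = {name: [] for name in classifiers}
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]  # hold out case i
        for name, (fit, predict) in classifiers.items():
            model = fit(X_train, y_train)  # refit from scratch on the n - 1 cases
            predictions[name].append(predict(model, X[i]))
    accuracy = {name: sum(p == t for p, t in zip(preds, y)) / len(y)
                for name, preds in predictions.items()}
    return accuracy, predictions


# Toy two-feature data (hypothetical, not from this study).
X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
y = ["F", "F", "F", "M", "M", "M"]
acc, preds = loo_cv(X, y, {"centroid": (centroid_fit, centroid_predict),
                           "1-NN": (nn_fit, nn_predict)})
print(acc)
```

Any data-dependent preprocessing, such as feature scaling, belongs inside each classifier's fit function. This ensures the held-out case never informs its own prediction.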
The procedural workflow for practical application of the developed models was illustrated using two cases with confirmed biological sex. These examples demonstrate that craniofacial measurements obtained from CT images can be entered directly into the canonical discriminant function equation or the saved machine learning models to yield classification outputs with quantified confidence. For DFA application, the five selected measurements are substituted into the canonical discriminant function equation. This yields a discriminant score that can be computed and interpreted with basic arithmetic, without dependence on specialized statistical software, which is particularly relevant in resource-limited forensic settings. For SVM and RF application, measurements are entered into the saved models provided in the Supplementary Materials and evaluated using the Predictions widget in Orange Data Mining software. The widget returns sex-class probabilities that quantify classification confidence for each case. In both illustrative cases, discriminant scores lay well away from the zero decision boundary, and the machine learning classifiers returned probabilities exceeding 0.90 for the correct sex class, indicating unambiguous classification. The concordance of correct classifications across all three methods illustrates how the models function in practice and is consistent with the convergent LOO-CV performance estimates. When DFA, SVM, and RF yield concordant outputs for a given case, the examiner may interpret the result with greater confidence than when relying on a single classifier alone [32,33]. The population-specific nature of the developed models further addresses a critical limitation of applying reference data derived from non-Thai populations to Thai forensic cases, where morphological differences may systematically bias classification outcomes [11]. Collectively, the DFA equation and saved machine learning models constitute a readily applicable multi-method toolkit for CT-based forensic sex estimation in the Thai population. The toolkit supports practitioners in settings where population-specific reference data have previously been unavailable. Within it, DFA and machine learning classifiers are best regarded as complementary. DFA provides an interpretable equation applicable without specialized software, while SVM and RF offer probabilistic outputs that quantify classification confidence and may assist decision-making in borderline cases. Future studies may further explore geometric morphometrics and deep learning-based image analysis. Both capture craniofacial complexity beyond linear measurements and may offer additional discriminatory power for sex estimation.
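Applying the DFA equation by hand or in a simple tool comes down to two steps. First, compute a weighted sum and compare it against the zero decision boundary. Second, check whether the machine learning outputs agree. The sketch below uses hypothetical variable names (m1 to m5), coefficients, and constant; the actual values are those of the published canonical discriminant function. Which sign corresponds to males depends on its group centroids and is assumed here. The 0.90 threshold mirrors the probabilities observed in the illustrative cases and is not a validated cut-off.

```python
# Hypothetical coefficients for five measurements m1..m5 (NOT the published values).
COEFFICIENTS = {"m1": 0.10, "m2": 0.05, "m3": 0.08, "m4": -0.04, "m5": 0.06}
CONSTANT = -25.0


def discriminant_score(measurements, coefficients=COEFFICIENTS, constant=CONSTANT):
    """Unstandardized canonical discriminant score: constant + sum(b_i * x_i)."""
    return constant + sum(b * measurements[name] for name, b in coefficients.items())


def dfa_class(score, positive="male", negative="female"):
    """Classify by the sign of the score relative to the zero decision boundary.
    Assumption: positive scores denote males; this depends on the group centroids."""
    return positive if score > 0 else negative


def concordant(dfa_label, svm_probs, rf_probs, threshold=0.90):
    """True if SVM and RF both give the DFA-predicted class a probability above threshold."""
    return svm_probs[dfa_label] > threshold and rf_probs[dfa_label] > threshold


# Hypothetical case (measurements in mm).
case = {"m1": 130.0, "m2": 100.0, "m3": 55.0, "m4": 35.0, "m5": 140.0}
score = discriminant_score(case)
label = dfa_class(score)
print(round(score, 2), label)  # 4.4 male
print(concordant(label, {"male": 0.93, "female": 0.07},
                 {"male": 0.91, "female": 0.09}))  # True
```

How far the score lies from zero gives a rough sense of how clear-cut the DFA classification is. The concordance check encodes the multi-method reading described above, in which agreement across methods raises confidence in the result.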
5. Conclusions
The present study demonstrates that CT-derived craniofacial measurements provide a reliable basis for sex estimation in Thai adults, with seven of eight measurements exhibiting statistically significant sexual dimorphism. Under a unified LOO-CV protocol, DFA, SVM, and RF achieved comparable classification accuracy ranging from 84.0% to 85.7%, with DFA demonstrating the highest overall discriminating ability as reflected by AUC and MCC. Performance converged across three methodologically distinct classifiers under a unified validation protocol, and all three correctly classified both forensic case application examples. Together, these results provide multilayered evidence for the internal robustness of the developed models, although external validation remains necessary to confirm generalizability beyond the development sample. The canonical discriminant function equation and saved machine learning models constitute a complementary and readily applicable toolkit for CT-based forensic sex estimation in the Thai population, addressing a critical gap in population-specific reference data for forensic practitioners in this region.
Limitations
Several limitations of the present study should be acknowledged. First, the sample was drawn from a single tertiary referral center in northeastern Thailand, and the unequal age distribution reflects the clinical profile of the institution rather than population demographics. Although cases with craniofacial pathology, trauma, and significant alveolar resorption were excluded, the influence of subtle age-related craniofacial changes cannot be fully ruled out, and age-stratified analyses in future studies would provide additional insight. Second, all three classifiers were evaluated using LOO-CV, an internal validation procedure that estimates performance within the development sample but does not substitute for external validation. External validation using independent samples would be required to confirm model generalizability under realistic forensic conditions and should be pursued when sufficient resources are available. Third, variation in scanner settings and image reconstruction protocols across institutions may influence measurement reproducibility when the developed models are applied in other settings. Fourth, the present models were developed exclusively from Thai adults, and their application to other populations should be approached with caution, as performance may vary owing to morphological differences; validation in the target population is recommended before use.