Author Contributions
Y.M.: conceptualization, data curation, formal analysis, investigation, methodology, software, visualization, writing—original draft, writing—review and editing. C.H.: conceptualization, resources, supervision, validation, writing—review and editing. C.S.: supervision, validation. H.Y.: data curation, investigation. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Facial images of different individuals at the same chronological age.
Figure 2.
Illustration of intra-class and inter-class distances in the feature space of facial age estimation.
Figure 3.
Overall architecture of the proposed CIIT network.
Figure 4.
Architecture of the CSE module. (a) Full CSE for Stage 1. (b) Light CSE for Stages 2–4.
Figure 5.
Illustration of the dual-attention mechanism. (a) Transformer block. (b) IFA and ICA. (c) PG-ACIA.
Figure 6.
Illustration of the ARN module.
Figure 7.
Validation MAE and CS(5) convergence curves on MORPH II. The red dashed line in subfigure (a) marks the epoch with the best validation MAE.
Figure 8.
Effect of reference sequence length on MAE and CS(5) on MORPH II.
Figure 9.
Performance comparison across seven regression head configurations on MORPH II. The optimal configuration is marked with ★.
Figure 10.
Effect of the number of anchors M on MAE and CS(5) on MORPH II. The optimal configuration (M = 120) is marked with ★.
Figure 11.
Sensitivity analysis of the prior coefficient and sparsity ratio on MORPH II. The selected setting yields the best overall performance in the two studies.
Figure 12.
Age-range MAE comparison between the short-sequence baseline S1 (reference sequence length 2) and the full CIIT model on MORPH II. The test set is divided into 10-year age groups, and the number below each group denotes the number of test samples.
Figure 13.
Prediction examples from all four benchmarks. Black numbers denote ground-truth labels, and green and red numbers denote accurate and erroneous predictions, respectively. Adience labels denote age-group intervals, and a prediction is regarded as accurate if it falls within the interval.
Table 1.
Comparison with state-of-the-art methods on MORPH II. The best result in each column is shown in bold. − indicates that the metric was not reported in the original work.
| Method | Year | Pretrained | MAE | CS(5) |
|---|---|---|---|---|
| DLDL-V2 [18] | 2018 | MS-Celeb-1M | 1.97 | − |
| DEX [3] | 2018 | IMDB-WIKI | 2.68 | 78.10 |
| DHAA [37] | 2019 | ImageNet | 1.91 | − |
| MWR [23] | 2022 | IMDB-WIKI | 2.00 | 95.00 |
| GOL [38] | 2022 | ImageNet | 2.09 | 94.20 |
| MetaAge [39] | 2022 | IMDB-WIKI | 1.81 | − |
| GLAE [35] | 2023 | MS-Celeb-1M | 1.14 | − |
| DAA [13] | 2023 | IMDB-WIKI | 2.06 | − |
| OLDL [21] | 2023 | ImageNet | 2.45 | − |
| TAA-GCN [36] | 2023 | − | 1.69 | − |
| HR [34] | 2023 | ImageNet | 1.13 | − |
| MSDNN [40] | 2024 | ImageNet | 2.59 | 86.66 |
| GroupFace [26] | 2024 | IMDB-WIKI | 1.86 | − |
| ConvNeXt-Trans [41] | 2025 | ImageNet | 2.26 | − |
| CSCS-Swin [16] | 2025 | ImageNet | 2.26 | − |
| MGP-Net [42] | 2025 | IMDB-WIKI | 2.25 | 91.82 |
| MCGRL [43] | 2025 | − | 2.39 | 89.90 |
| OrdCon [25] | 2025 | − | 2.21 | − |
| LRA-GNN [27] | 2025 | − | 1.79 | − |
| SA-LDL [44] | 2025 | − | 1.75 | 92.20 |
| GFL [29] | 2025 | IMDB-WIKI | 1.57 | − |
| Ours | 2026 | ImageNet | 1.19 | 99.58 |
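The MAE and CS(5) columns can be illustrated with a minimal sketch. It assumes the usual definitions from the age-estimation literature: MAE is the mean absolute difference between predicted and ground-truth ages, and CS(θ) is the percentage of test samples whose absolute error does not exceed θ years. The function names and the toy data are hypothetical and are not taken from the authors' code.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error in years between ground-truth and predicted ages."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cumulative_score(y_true: Sequence[float], y_pred: Sequence[float],
                     theta: float = 5.0) -> float:
    """Percentage of samples whose absolute error is at most `theta` years."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= theta)
    return 100.0 * hits / len(y_true)


# Toy example (made-up ages, not data from the paper).
ages_true = [20, 35, 50, 64]
ages_pred = [22, 34, 58, 64]  # absolute errors: 2, 1, 8, 0

mae = mean_absolute_error(ages_true, ages_pred)   # (2 + 1 + 8 + 0) / 4 = 2.75
cs5 = cumulative_score(ages_true, ages_pred, 5)   # 3 of 4 within 5 years = 75.0
```

Lower MAE and higher CS(5) are better, which is why the two columns in the comparison tables are read in opposite directions.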
Table 2.
Comparison with state-of-the-art methods on MegaAge-Asian. The best result in each column is shown in bold. − indicates that the metric was not reported in the original work.
| Method | Year | Pretrained | MAE | CS(5) |
|---|---|---|---|---|
| Posterior [31] | 2017 | MS-Celeb-1M | − | 82.15 |
| SSR-Net [5] | 2018 | IMDB-WIKI | − | 74.10 |
| LRN [46] | 2020 | IMDB-WIKI | − | 82.95 |
| VGG+Distillation [45] | 2020 | ImageNet, IMDB-WIKI, AFAD | − | 83.01 |
| PVP+VGG16 [45] | 2020 | ImageNet, IMDB-WIKI, AFAD | − | 87.24 |
| DAA [13] | 2023 | − | 2.93 | 84.89 |
| ALD-Net [47] | 2023 | − | − | 81.20 |
| TDT [48] | 2024 | ImageNet | − | 85.42 |
| SA-Hierarchical [49] | 2025 | ImageNet | 3.09 | 82.30 |
| Ours | 2026 | ImageNet | 1.25 | 98.76 |
Table 3.
Comparison with state-of-the-art methods on FG-NET under the LOPO protocol. The best result in each column is shown in bold. − indicates that the metric was not reported in the original work.
| Method | Year | Pretrained | MAE | CS(5) |
|---|---|---|---|---|
| MV-Loss [19] | 2018 | ImageNet, IMDB-WIKI | 2.68 | − |
| BridgeNet [24] | 2019 | ImageNet, IMDB-WIKI | 2.56 | 86.0 |
| AVDL [20] | 2020 | IMDB-WIKI | 2.32 | − |
| PML [50] | 2021 | IMDB-WIKI | 2.16 | − |
| MWR [23] | 2022 | IMDB-WIKI | 2.23 | 91.1 |
| DAA [13] | 2023 | IMDB-WIKI | 2.19 | − |
| MCGRL [43] | 2025 | − | 2.86 | 88.0 |
| OrdCon [25] | 2025 | − | 2.85 | − |
| MGP-Net [42] | 2025 | IMDB-WIKI | 2.28 | 90.3 |
| LRA-GNN [27] | 2025 | − | 2.14 | 91.6 |
| Ours | 2026 | ImageNet | 1.42 | 95.4 |
Table 4.
Comparison with state-of-the-art methods on Adience. The best result in each column is shown in bold. − indicates that the metric was not reported in the original work.
| Method | Year | Pretrained | MAE | ACC |
|---|---|---|---|---|
| SORD [53] | 2019 | ImageNet | 0.49 | 59.6 |
| AL-ResNet-34 [54] | 2019 | ImageNet, IMDB-WIKI | − | 67.5 |
| Agbo-Ajala CNN [51] | 2020 | IMDB-WIKI | − | 83.1 |
| EfficientNet-B4 [55] | 2021 | ImageNet | − | 81.1 |
| OrdinalCLIP [56] | 2022 | CLIP | 0.47 | 61.2 |
| MWR [23] | 2022 | ImageNet | 0.45 | 62.6 |
| GOL [38] | 2022 | ImageNet | 0.43 | 62.5 |
| CIG-PVT [57] | 2023 | ImageNet | 0.43 | 64.4 |
| L2RCLIP [58] | 2023 | CLIP | 0.36 | 66.2 |
| MiVOLO-D1 [28] | 2023 | LAGENDA | − | 68.7 |
| ViT-hSeq [52] | 2024 | ImageNet | − | 84.9 |
| SCG-Net [59] | 2025 | LAGENDA | 0.36 | 65.2 |
| Ours | 2026 | ImageNet | 0.24 | 69.9 |
Table 5.
Component-level ablation on MORPH II. The best result in each column is shown in bold. ✓ and × indicate that the corresponding component is enabled and removed, respectively. Dense denotes replacing sparse ICA with dense ICA. The ΔMAE column reports the MAE difference from the full model, and − marks the reference row.
| ID | Configuration | CSE | ICA | ARN | MAE | CS(5) | ΔMAE |
|---|---|---|---|---|---|---|---|
| A0 | Full model | ✓ | ✓ | ✓ | 1.19 | 99.58 | − |
| A1 | w/o CSE | × | ✓ | ✓ | 1.43 | 98.91 | +0.24 |
| A2 | w/o ICA | ✓ | × | ✓ | 1.58 | 97.86 | +0.39 |
| A3 | Dense ICA | ✓ | Dense | ✓ | 1.28 | 99.39 | +0.09 |
| A4 | w/o ARN | ✓ | ✓ | × | 1.34 | 99.17 | +0.15 |
Table 6.
Progressive design analysis of PG-ACIA on MORPH II. The best result in each column is shown in bold. ✓ and × indicate that the corresponding operation is enabled and disabled, respectively. − indicates not applicable. The ΔMAE column reports the MAE change from the previous row, and − in the ΔMAE column marks the reference row.
| ID | Cross-img | Axial | Prior | Prior Type | Sparse | MAE | CS(5) | ΔMAE |
|---|---|---|---|---|---|---|---|---|
| P1 | × | − | − | − | − | 1.58 | 97.86 | − |
| P2 | ✓ | × | × | − | − | 1.52 | 98.31 | −0.06 |
| P3 | ✓ | ✓ | × | − | − | 1.28 | 99.39 | −0.24 |
| P4 | ✓ | ✓ | ✓ | Single-scale | × | 1.33 | 99.26 | +0.05 |
| P5 | ✓ | ✓ | ✓ | Multi-scale | × | 1.28 | 99.39 | −0.05 |
| P6 | ✓ | ✓ | ✓ | Multi-scale | ✓ | 1.19 | 99.58 | −0.09 |
Table 7.
Effect of reference sequence length on MORPH II. The best result in each column is shown in bold. The ΔMAE column reports the MAE change from the previous row. − marks the reference row.
| ID | Reference sequence length | MAE | CS(5) | ΔMAE |
|---|---|---|---|---|
| S0 | 0 | 3.87 | 71.26 | − |
| S1 | 2 | 2.38 | 89.08 | −1.49 |
| S2 | 4 | 1.74 | 96.54 | −0.64 |
| S3 | 6 | 1.64 | 97.65 | −0.10 |
| S4 | 8 | 1.19 | 99.58 | −0.45 |
| S5 | 10 | 1.26 | 99.32 | +0.07 |
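The last column of Table 7 is the change in MAE relative to the row above it. The sketch below recomputes it from the MAE values in the table as a quick consistency check. The helper name is hypothetical, and rounding to two decimals is an assumption that matches the precision shown in the table.

```python
from typing import List, Optional, Sequence


def delta_from_previous(values: Sequence[float]) -> List[Optional[float]]:
    """Change of each value relative to the previous one.

    The first entry has no predecessor, so it is None (the reference row).
    """
    deltas: List[Optional[float]] = [None]
    for prev, cur in zip(values, values[1:]):
        deltas.append(round(cur - prev, 2))
    return deltas


# MAE values of rows S0..S5 in Table 7.
mae_by_length = [3.87, 2.38, 1.74, 1.64, 1.19, 1.26]
deltas = delta_from_previous(mae_by_length)
# deltas matches the tabulated column: -1.49, -0.64, -0.10, -0.45, +0.07
```

The same helper applies to the other tables whose last column is defined relative to the previous row. Table 5 instead reports the difference from the full model, so each of its rows would be compared with row A0.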
Table 8.
Analysis of regression head design on MORPH II. The best result in each column is shown in bold. N/A indicates that anchor similarity and temperature scaling are not used in the MLP, expectation regression or LDL heads. The ΔMAE column reports the MAE change from the previous row. − marks the reference row.
| ID | Reg. Head | Similarity | Temperature | MAE | CS(5) | ΔMAE |
|---|---|---|---|---|---|---|
| R1 | MLP | N/A | N/A | 1.34 | 99.17 | − |
| R2 | Expectation Regression | N/A | N/A | 2.21 | 90.70 | +0.87 |
| R3 | LDL | N/A | N/A | 1.41 | 98.64 | −0.80 |
| R4 | ARN | Cosine | Fixed | 1.42 | 97.09 | +0.01 |
| R5 | ARN | Dot | Fixed | 1.22 | 99.57 | −0.20 |
| R6 | ARN | Dot | Learnable | 1.19 | 99.58 | −0.03 |
| R7 | ARN | Cosine | Learnable | 1.40 | 97.76 | +0.21 |
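Rows R4 to R7 differ in the similarity function (cosine or dot product) and in whether the temperature is fixed or learnable. The sketch below shows one generic way such an anchor-based head could turn similarities into an age estimate: a temperature-scaled softmax over anchor similarities, followed by a weighted sum of anchor ages. This is an illustrative assumption, not the authors' ARN implementation. All names, the anchor layout, and the treatment of temperature as a plain float (in training it would be a learnable parameter) are hypothetical.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot-product similarity."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def anchor_expected_age(feature: Sequence[float],
                        anchors: Sequence[Sequence[float]],
                        anchor_ages: Sequence[float],
                        temperature: float = 1.0,
                        similarity: str = "dot") -> float:
    """Softmax-weighted average of anchor ages (hypothetical head)."""
    if similarity not in ("dot", "cosine"):
        raise ValueError("similarity must be 'dot' or 'cosine'")
    sim_fn = dot if similarity == "dot" else cosine
    logits = [sim_fn(feature, a) / temperature for a in anchors]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return sum(w / total * age for w, age in zip(weights, anchor_ages))


# Two identical anchors receive equal weight, so the estimate is the mean age.
equal = anchor_expected_age([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [20.0, 40.0])

# A small temperature sharpens the softmax toward the best-matching anchor.
sharp = anchor_expected_age([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [20.0, 40.0],
                            temperature=0.01)
```

Cosine similarity is bounded in [-1, 1], so with a fixed temperature the softmax can stay comparatively flat. That may be one reason dot-product similarity behaves differently in such heads, although the table alone does not establish the cause.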
Table 9.
Effect of the number of anchors M on MORPH II. The best result in each column is shown in bold. The ΔMAE column reports the MAE change from the previous row. − marks the reference row.
| ID | M | MAE | CS(5) | ΔMAE |
|---|---|---|---|---|
| N0 | 0 | 1.34 | 99.17 | − |
| N1 | 60 | 1.54 | 98.38 | +0.20 |
| N2 | 90 | 2.11 | 92.27 | +0.57 |
| N3 | 120 | 1.19 | 99.58 | −0.92 |
| N4 | 180 | 1.65 | 97.52 | +0.46 |
| N5 | 240 | 1.41 | 98.93 | −0.24 |
Table 10.
Sensitivity analysis of the prior coefficient and sparsity ratio on MORPH II. The left four columns vary the prior coefficient and the right four columns vary the sparsity ratio. The best result in each group is shown in bold, and the ΔMAE columns report the MAE change from the previous row. − marks the reference row.
| Prior coefficient | MAE | CS(5) | ΔMAE | Sparsity ratio | MAE | CS(5) | ΔMAE |
|---|---|---|---|---|---|---|---|
| 0.2 | 1.21 | 99.49 | − | 0.10 | 1.27 | 99.47 | − |
| 0.4 | 1.19 | 99.58 | −0.02 | 0.15 | 1.19 | 99.58 | −0.08 |
| 0.6 | 1.28 | 99.41 | +0.09 | 0.20 | 1.32 | 99.17 | +0.13 |
| 0.8 | 1.63 | 97.73 | +0.35 | 0.25 | 1.32 | 99.07 | 0.00 |
| | | | | 0.30 | 1.31 | 99.01 | −0.01 |
Table 11.
Inference efficiency comparison of the five component-level variants on MORPH II, measured on an NVIDIA GeForce RTX 5090 with synthetic input of size .
| ID | Configuration | Params (M) | FLOPs (G) | Latency (ms/query) | Throughput (query/s) | Peak Mem (GB) |
|---|---|---|---|---|---|---|
| A0 | Full model | 33.07 | 101.25 | 9.95 | 100.52 | 0.26 |
| A1 | w/o CSE | 31.86 | 76.09 | 11.84 | 84.49 | 0.26 |
| A2 | w/o ICA | 28.21 | 107.44 | 7.00 | 142.84 | 0.24 |
| A3 | Dense ICA | 33.07 | 134.47 | 10.89 | 91.81 | 0.59 |
| A4 | w/o ARN | 33.18 | 101.70 | 12.94 | 77.29 | 0.26 |
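The latency and throughput columns in Table 11 are close to reciprocal, which would be expected if queries are processed one at a time. The sketch below checks this relationship against the tabulated values. The single-query assumption and the helper name are hypothetical, and the small residual gaps are presumably due to rounding of the reported latencies.

```python
def throughput_from_latency(latency_ms: float) -> float:
    """Queries per second implied by a per-query latency in milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# (latency in ms/query, reported throughput in query/s) from Table 11.
reported = {
    "A0": (9.95, 100.52),
    "A1": (11.84, 84.49),
    "A2": (7.00, 142.84),
    "A3": (10.89, 91.81),
    "A4": (12.94, 77.29),
}

# Largest absolute gap between implied and reported throughput.
max_gap = max(abs(throughput_from_latency(lat) - thr)
              for lat, thr in reported.values())
```

For all five variants the implied and reported throughput agree to within a few hundredths of a query per second.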