ECL-ConvNeXt: An Ensemble Strategy Combining ConvNeXt and Contrastive Learning for Facial Beauty Prediction
Abstract
1. Introduction
- We show that, on multi-class imbalanced datasets, specifically improving per-class recall to raise MR reflects gains in overall classification ability more faithfully than improving ACC alone, as validated by our experimental results (a toy illustration follows this list);
- We propose an ensemble strategy that combines a primary ConvNeXt multi-class model with ConvNeXtCL-ABC to precisely resolve confused class pairs in multi-class tasks;
- We conduct extensive experiments on three multi-class imbalanced datasets to validate the superiority and robustness of the proposed method for facial beauty prediction.
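As a toy illustration of the first contribution (not from the paper): on an imbalanced test set, a model that collapses toward the majority class can still post a high ACC, while macro recall exposes the failure. A minimal sketch with scikit-learn:

```python
from sklearn.metrics import accuracy_score, recall_score

# Imbalanced ground truth: 8 majority-class samples, 2 minority-class samples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A degenerate model that always predicts the majority class.
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(accuracy_score(y_true, y_pred))                 # 0.8 -> ACC looks strong
print(recall_score(y_true, y_pred, average="macro"))  # 0.5 -> MR reveals the collapse
```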
2. Related Work
2.1. Facial Beauty Prediction
2.2. ConvNeXt
2.3. Contrastive Learning
3. Methodology
3.1. Overall Framework
3.2. ConvNeXt
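Section 3.2 builds on the ConvNeXt architecture of [2], whose components (Stem, Stages, DownsamplingLayer, Head, Block; see Abbreviations) are standard. As background, a minimal PyTorch sketch of the ConvNeXt residual block; hyperparameters are illustrative, not the paper's training configuration:

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """ConvNeXt residual block [2]: depthwise 7x7 conv -> LN -> 1x1 expand -> GELU -> 1x1 project."""
    def __init__(self, dim: int, layer_scale_init: float = 1e-6):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise conv
        self.norm = nn.LayerNorm(dim)                  # LN over channels (channels-last)
        self.pwconv1 = nn.Linear(dim, 4 * dim)         # inverted bottleneck, 4x expansion
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        self.gamma = nn.Parameter(layer_scale_init * torch.ones(dim))  # layer scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                      # to channels-last for LN/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = self.gamma * x
        x = x.permute(0, 3, 1, 2)                      # back to channels-first
        return shortcut + x
```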
Algorithm 1: ECL-ConvNeXt ensemble strategy.
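A minimal sketch of the routing that Algorithm 1 names, assuming (per Sections 3.3–3.5) that the discriminator selects the most-confused class pair, the primary ConvNeXt classifies all samples, and predictions landing in that pair are re-scored by ConvNeXtCL-ABC; all identifiers here are hypothetical:

```python
import torch

@torch.no_grad()
def ecl_convnext_predict(x, primary, abc, confused_pair):
    """Hypothetical ensemble routing: re-classify the most-confused class pair with the ABC.

    primary       -- multi-class ConvNeXt model, returns (N, num_classes) logits
    abc           -- ConvNeXtCL-ABC binary model, returns (N, 2) logits over the pair
    confused_pair -- (class_a, class_b), e.g. as selected by the discriminator
    """
    class_a, class_b = confused_pair
    preds = primary(x).argmax(dim=1)                   # coarse multi-class decision
    in_pair = (preds == class_a) | (preds == class_b)  # samples the primary model tends to confuse
    if in_pair.any():
        binary = abc(x[in_pair]).argmax(dim=1)         # 0 -> class_a, 1 -> class_b
        pair = torch.tensor([class_a, class_b], device=preds.device)
        preds[in_pair] = pair[binary]                  # overwrite with the ABC's decision
    return preds
```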
3.3. Discriminator
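Section 4.5.6 reports an average confusion probability between any two classes (see Abbreviations), which suggests the discriminator scores class pairs from a validation confusion matrix and selects the most-confused pair for the ABC. A hedged sketch of that selection; the exact scoring rule is an assumption:

```python
import numpy as np

def most_confused_pair(cm):
    """Pick the class pair with the highest average mutual confusion.

    cm -- confusion matrix, cm[i, j] = samples of true class i predicted as class j.
    Assumed score: mean of P(pred=j | true=i) and P(pred=i | true=j).
    """
    row_prob = cm / np.maximum(cm.sum(axis=1, keepdims=True), 1)  # row-normalize to probabilities
    best_pair, best_score = (0, 1), -1.0
    num_classes = cm.shape[0]
    for i in range(num_classes):
        for j in range(i + 1, num_classes):
            score = (row_prob[i, j] + row_prob[j, i]) / 2  # average confusion between i and j
            if score > best_score:
                best_pair, best_score = (i, j), score
    return best_pair, best_score * 100                 # pair and its confusion probability in %
```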
3.4. Auxiliary Binary Classifier ConvNeXtCL-ABC Based on CL and ConvNeXt
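The supervised contrastive loss of [10] is the natural CL objective for ConvNeXtCL-ABC. A compact single-view PyTorch implementation as a sketch, not the paper's exact training code:

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss [10], one view per sample (sketch)."""
    n = features.size(0)
    features = F.normalize(features, dim=1)            # project embeddings onto the unit hypersphere
    logits = features @ features.T / temperature       # pairwise cosine-similarity logits
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()  # numerical stability
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask  # same-label, non-self pairs
    exp_logits = torch.exp(logits).masked_fill(self_mask, 0.0)
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True))  # log-softmax over non-self pairs
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)      # guard anchors with no positive
    loss = -(log_prob * pos_mask.float()).sum(dim=1) / pos_counts
    return loss[pos_mask.any(dim=1)].mean()            # average over anchors with >= 1 positive
```

In the binary setting of the ABC, positives are simply the other samples of the same class within the batch, which pulls the two confused classes into separated clusters before the classification head decides.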
3.5. Ensemble Strategy
4. Experimental Results and Discussion
4.1. Experimental Datasets
4.1.1. LSAFBD Dataset
4.1.2. MEBeauty Dataset
4.1.3. HotOrNot Dataset
4.2. Experimental Environment and Implementation Details
4.3. Evaluation Metrics
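The metrics reported throughout Section 4 (ACC, MPR, MR, MF1) derive from per-class TP, FP, and FN counts (see Abbreviations). A minimal sketch computing all four from a confusion matrix:

```python
import numpy as np

def macro_metrics(cm):
    """ACC, MPR, MR, MF1 from a confusion matrix (cm[i, j]: true class i, predicted class j)."""
    tp = np.diag(cm).astype(float)                  # true positives per class
    fp = cm.sum(axis=0) - tp                        # false positives per class
    fn = cm.sum(axis=1) - tp                        # false negatives per class
    precision = tp / np.maximum(tp + fp, 1)         # per-class precision
    recall = tp / np.maximum(tp + fn, 1)            # per-class recall
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    acc = tp.sum() / cm.sum()
    return acc, precision.mean(), recall.mean(), f1.mean()  # ACC, MPR, MR, MF1
```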
4.4. Comparative Experiments
4.4.1. Comparative Experiments on the LSAFBD Dataset
4.4.2. Comparative Experiments on the MEBeauty Dataset
4.4.3. Comparative Experiments on the HotOrNot Dataset
4.5. Ablation Experiments
4.5.1. Ablation Experiments on the LSAFBD Dataset
4.5.2. Ablation Experiments on the MEBeauty Dataset
4.5.3. Ablation Experiments on the HotOrNot Dataset
4.5.4. Ablation Experiment on Dataset Split Ratios
4.5.5. Ablation Experiment on Contrastive Loss Weight Ratios
4.5.6. Ablation Experiment on the Discriminator
4.5.7. Ablation Experiment on the Activation Function
4.5.8. Ablation Experiment on Data Augmentation
4.6. Statistical Significance Testing
4.7. Comparison with Other Facial Beauty Prediction Methods
5. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
FBP | Facial beauty prediction
ACC | Accuracy
MR | Macro recall
MPR | Macro precision
MF1 | Macro F1-score
CL | Contrastive learning
ConvNeXtCL-ABC | Dedicated auxiliary binary classifier based on ConvNeXt
ViT | Vision Transformer
ABC | Auxiliary binary classifier model
GELU | Gaussian Error Linear Unit
LN | Layer normalization
Stem | Initial feature extraction layer
Stages | Feature extraction stages
DownsamplingLayer | Downsampling layers
Head | Classification head
Block | Residual block
LSAFBD | Large-Scale Asia Facial Beauty Database
Confusion Probability | Average confusion probability between any two classes
TP | True positives
FN | False negatives
FP | False positives
References
- Lebedeva, I.; Ying, F.; Guo, Y. Personalized facial beauty assessment: A meta-learning approach. Vis. Comput. 2023, 39, 1095–1107. [Google Scholar] [CrossRef]
- Liu, Z.; Mao, H.; Wu, C.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the 9th International Conference on Learning Representations (ICLR), Virtual, 3–7 May 2021. [Google Scholar] [CrossRef]
- Gan, J.; Xie, X.; He, G.; Luo, H. TransBLS: Transformer combined with broad learning system for facial beauty prediction. Appl. Intell. 2023, 53, 26110–26125. [Google Scholar] [CrossRef]
- Liu, Q.; Lin, L.; Shen, Z.; Yu, Y. FBPFormer: Dynamic convolutional transformer for global-local-contextual facial beauty prediction. In Proceedings of the 32nd International Conference on Artificial Neural Networks (ICANN), Heraklion, Greece, 26–29 September 2023. [Google Scholar] [CrossRef]
- Laurinavičius, D.; Maskeliūnas, R.; Damaševičius, R. Improvement of facial beauty prediction using artificial human faces generated by generative adversarial network. Cogn. Comput. 2023, 15, 998–1015. [Google Scholar] [CrossRef]
- Nasios, I. Enhancing kelp forest detection in remote sensing images using crowdsourced labels with Mixed Vision Transformers and ConvNeXt segmentation models. Int. J. Remote Sens. 2025, 46, 2159–2177. [Google Scholar] [CrossRef]
- Holste, G.; Zhou, Y.; Wang, S.; O’Connor, M.; Gill, A.; Saed, A.; Rajpurkar, P. Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge. Med. Image Anal. 2024, 97, 103224. [Google Scholar] [CrossRef]
- Bhattacharya, D.; Reuter, K.; Behrendt, F.; Maack, L.; Grube, S.; Schlaefer, A. PolypNextLSTM: A lightweight and fast polyp video segmentation network using ConvNext and ConvLSTM. Int. J. CARS 2024, 19, 2111–2119. [Google Scholar] [CrossRef] [PubMed]
- Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; Krishnan, D. Supervised contrastive learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–12 December 2020; pp. 18661–18673. [Google Scholar] [CrossRef]
- Chen, T.; Kornblith, S.; Swersky, K.; Norouzi, M.; Hinton, G. Big self-supervised models are strong semi-supervised learners. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–12 December 2020; pp. 22243–22255. [Google Scholar]
- Chen, X.; Xie, S.; He, K. An empirical study of training self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9640–9649. [Google Scholar] [CrossRef]
- Chen, H.; Zendehdel, N.; Leu, M.C.; Yin, Z. Fine-grained activity classification in assembly based on multi-visual modalities. J. Intell. Manuf. 2024, 35, 2215–2233. [Google Scholar] [CrossRef]
- Cui, J.; Zhong, Z.; Liu, S.; Yu, B.; Jia, J. Parametric contrastive learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 715–724. [Google Scholar] [CrossRef]
- Zhu, J.; Wang, Z.; Chen, J.; Chen, Y.P.; Jiang, Y.G. Balanced contrastive learning for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 6908–6917. [Google Scholar] [CrossRef]
- Li, T.; Cao, P.; Yuan, Y.; Wu, Z.; Wang, Y.; Liu, Y.; Wu, C.Y. Targeted supervised contrastive learning for long-tailed recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 6918–6928. [Google Scholar] [CrossRef]
- Hou, C.; Zhang, J.; Wang, H.; Zhou, T. Subclass-balancing contrastive learning for long-tailed recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 5395–5407. [Google Scholar] [CrossRef]
- Xuan, S.; Zhang, S. Decoupled contrastive learning for long-tailed recognition. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 6396–6403. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar] [CrossRef]
- Zhai, Y.; Gan, J.; Wu, B.; He, G. Benchmark of a large scale database for facial beauty prediction. In Proceedings of the 1st International Conference on Intelligent Information Processing (ICIIP), Melbourne, Australia, 18–21 November 2016; pp. 1–5. [Google Scholar] [CrossRef]
- Lebedeva, I.; Guo, Y.; Ying, F. MEBeauty: A multi-ethnic facial beauty dataset in-the-wild. Neural Comput. Appl. 2022, 34, 14169–14183. [Google Scholar] [CrossRef]
- Xu, L.; Xiang, J.; Yuan, X. CRNet: Classification and regression neural network for facial beauty prediction. In Proceedings of the Pacific Rim Conference on Multimedia (PCM), Hefei, China, 21–22 September 2018; pp. 661–671. [Google Scholar] [CrossRef]
- Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 390–391. [Google Scholar] [CrossRef]
- Ding, M.; Xiao, B.; Codella, N.; Luo, P.; Wang, J.; Yuan, L. DaViT: Dual attention vision transformers. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 74–92. [Google Scholar] [CrossRef]
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning (ICML), Virtual, 18–24 July 2021; pp. 10347–10357. [Google Scholar] [CrossRef]
- Vasu, P.K.A.; Gabriel, J.; Zhu, J.; Tuzel, O.; Ranjan, A. FastViT: A fast hybrid vision transformer using structural reparameterization. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 5762–5772. [Google Scholar] [CrossRef]
- Yang, J.; Li, C.; Dai, X.; Gao, J. Focal modulation networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022; pp. 4203–4217. [Google Scholar] [CrossRef]
- Ryali, C.; Hu, Y.; Bolya, D.; Wei, C.; Fan, H.; Huang, P.-Y.; Aggarwal, V.; Chowdhury, A.; Poursaeed, O.; Hoffman, J.; et al. Hiera: A hierarchical vision transformer without the bells-and-whistles. In Proceedings of the International Conference on Machine Learning (ICML), Honolulu, HI, USA, 23–29 July 2023; pp. 29441–29454. [Google Scholar] [CrossRef]
- Tolstikhin, I.O.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.; et al. MLP-Mixer: An all-MLP architecture for vision. Adv. Neural Inf. Process. Syst. 2021, 34, 24261–24272. [Google Scholar] [CrossRef]
- Heo, B.; Yun, S.; Han, D.; Chun, S.; Choe, J.; Oh, S.J. Rethinking spatial dimensions of vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 11936–11945. [Google Scholar] [CrossRef]
- Qin, D.; Leichner, C.; Delakis, M.; Fornoni, M.; Luo, S.; Yang, F.; Wang, W.; Banbury, C.; Zhou, X.; Liu, C.; et al. MobileNetV4: Universal models for the mobile ecosystem. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024; pp. 78–96. [Google Scholar] [CrossRef]
- El-Nouby, A.; Touvron, H.; Caron, M.; Bojanowski, P.; Douze, M.; Joulin, A.; Laptev, I.; Neverova, N.; Synnaeve, G.; Verbeek, J.; et al. XCiT: Cross-covariance image transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 20014–20027. [Google Scholar] [CrossRef]
- Touvron, H.; Cord, M.; Sablayrolles, A.; Synnaeve, G.; Jégou, H. Going deeper with image transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 32–42. [Google Scholar] [CrossRef]
- Xu, W.; Xu, Y.; Chang, T.; Tu, Z. Co-scale conv-attentional image transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9981–9990. [Google Scholar] [CrossRef]
- Dai, Z.; Liu, H.; Le, Q.V.; Tan, M. CoAtNet: Marrying convolution and attention for all data sizes. Adv. Neural Inf. Process. Syst. 2021, 34, 3965–3977. [Google Scholar] [CrossRef]
- d’Ascoli, S.; Touvron, H.; Leavitt, M.L.; Morcos, A.S.; Biroli, G.; Sagun, L. ConViT: Improving vision transformers with soft convolutional inductive biases. In Proceedings of the International Conference on Machine Learning (ICML), Virtual, 18–24 July 2021; pp. 2286–2296. [Google Scholar] [CrossRef]
- Chen, C.F.R.; Fan, Q.; Panda, R. CrossViT: Cross-attention multi-scale vision transformer for image classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 357–366. [Google Scholar] [CrossRef]
- Gan, J.; Chen, H.; Xu, W.; Li, H.; Zhuang, Z.; Chen, Z. Facial Beauty Prediction Combining Dual-Branch Feature Fusion with a Stacked Broad Learning System. IEEE Access 2025, 13, 130883–130895. [Google Scholar] [CrossRef]
- Gan, J.; Luo, H.; Xiong, J.; Xie, X.; Li, H.; Liu, J. Facial beauty prediction combined with multi-task learning of adaptive sharing policy and attentional feature fusion. Electronics 2024, 13, 179. [Google Scholar] [CrossRef]
- Gan, J.; Xiong, J. Masked autoencoder of multi-scale convolution strategy combined with knowledge distillation for facial beauty prediction. Sci. Rep. 2025, 15, 2784. [Google Scholar] [CrossRef] [PubMed]
Table: Experimental environment and configuration (Section 4.2).

Hardware/Software | Configuration Details
---|---
Processor | Intel Xeon Gold 5218 @ 2.30 GHz
Memory | 188 GB
Graphics Processing Unit | 4 × NVIDIA Tesla V100-PCIE-32 GB
Operating System | CentOS 7.6
Software Platform | Python 3.8.0
Deep Learning Framework | PyTorch 2.4.0 (CUDA 12.1)
Table: Comparative experiments on the LSAFBD, MEBeauty, and HotOrNot datasets (Section 4.4).

Model | LSAFBD Time/s | LSAFBD ACC (%) | LSAFBD MPR (%) | LSAFBD MR (%) | LSAFBD MF1 (%) | MEBeauty Time/s | MEBeauty ACC (%) | MEBeauty MPR (%) | MEBeauty MR (%) | MEBeauty MF1 (%) | HotOrNot Time/s | HotOrNot ACC (%) | HotOrNot MPR (%) | HotOrNot MR (%) | HotOrNot MF1 (%)
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
ConvNeXt-XLarge [2] | 326.58 | 71.35 | 74.40 | 73.64 | 73.25 | 60.07 | 74.79 | 70.22 | 65.63 | 67.60 | 35.31 | 61.65 | 55.61 | 48.76 | 50.41 |
ViT-T [3] | 17.27 | 66.35 | 68.96 | 67.36 | 67.89 | 5.11 | 68.59 | 68.82 | 55.49 | 58.92 | 4.89 | 59.22 | 35.78 | 33.39 | 27.66 |
Swin [20] | 37.98 | 70.80 | 74.37 | 71.77 | 72.77 | 9.61 | 70.51 | 70.40 | 53.36 | 56.91 | 7.02 | 59.95 | 54.08 | 36.47 | 33.02 |
CS3Darknet-S [24] | 11.76 | 67.10 | 73.64 | 66.30 | 68.12 | 4.33 | 64.53 | 57.60 | 61.68 | 59.27 | 3.55 | 33.50 | 41.56 | 38.85 | 26.77 |
DaViT-T [25] | 41.04 | 68.65 | 71.47 | 73.65 | 72.24 | 9.97 | 72.01 | 69.22 | 58.95 | 62.07 | 6.97 | 57.77 | 42.46 | 36.67 | 35.04 |
DeiT-T [26] | 17.28 | 66.10 | 67.69 | 68.31 | 67.56 | 4.86 | 63.68 | 61.62 | 44.25 | 44.40 | 5.12 | 60.68 | 47.91 | 37.52 | 35.09 |
FastViT-T [27] | 26.02 | 65.70 | 67.94 | 67.29 | 66.79 | 6.79 | 67.09 | 41.96 | 40.45 | 39.53 | 5.38 | 57.77 | 46.70 | 39.37 | 39.23 |
FocalNet-T [28] | 54.32 | 70.25 | 73.62 | 73.30 | 72.61 | 11.37 | 67.95 | 65.17 | 54.72 | 57.75 | 8.86 | 61.17 | 53.78 | 34.20 | 27.65 |
Hiera-T [29] | 36.58 | 63.40 | 66.71 | 64.31 | 65.38 | 8.70 | 69.23 | 64.61 | 59.42 | 61.47 | 7.63 | 61.17 | 20.39 | 33.33 | 25.30 |
Mixer-B [30] | 56.80 | 63.45 | 67.55 | 63.57 | 64.73 | 13.33 | 69.02 | 72.55 | 49.96 | 51.95 | 11.12 | 60.44 | 31.51 | 33.76 | 27.27 |
PiT-S [31] | 28.82 | 68.05 | 69.83 | 70.30 | 69.14 | 7.79 | 71.37 | 69.28 | 58.48 | 62.15 | 6.28 | 60.92 | 37.73 | 33.75 | 26.72 |
MobileNetV4-S [32] | 11.14 | 67.35 | 70.93 | 69.38 | 70.09 | 4.09 | 65.60 | 63.73 | 49.15 | 51.91 | 3.90 | 58.74 | 47.68 | 34.84 | 31.34 |
XCiT-S [33] | 72.99 | 65.35 | 71.26 | 66.42 | 67.81 | 17.27 | 63.25 | 69.60 | 51.76 | 52.87 | 12.11 | 61.65 | 45.51 | 34.42 | 27.68 |
CaiT-S [34] | 75.24 | 68.65 | 72.42 | 69.98 | 70.88 | 17.34 | 66.24 | 64.29 | 53.83 | 57.08 | 11.71 | 60.92 | 36.33 | 37.31 | 33.67 |
CoAt-M [35] | 121.82 | 69.10 | 72.77 | 70.35 | 71.39 | 14.63 | 68.59 | 69.56 | 61.05 | 62.26 | 14.48 | 60.44 | 55.57 | 40.84 | 38.83 |
CoAtNet [36] | 42.13 | 67.60 | 70.45 | 70.51 | 70.28 | 7.68 | 69.23 | 73.64 | 48.21 | 52.15 | 7.07 | 60.19 | 44.85 | 33.96 | 28.06 |
ConvIT-T [37] | 15.41 | 67.95 | 72.64 | 66.42 | 68.67 | 4.23 | 69.44 | 63.77 | 56.33 | 58.92 | 4.44 | 61.89 | 63.91 | 35.12 | 29.27 |
ConvMixer [38] | 131.57 | 62.55 | 65.18 | 62.09 | 63.15 | 13.55 | 62.39 | 50.92 | 46.06 | 46.98 | 13.21 | 54.85 | 41.01 | 38.50 | 38.53 |
Proposed Method | 542.12 | 72.09 | 75.24 | 75.43 | 75.28 | 115.94 | 73.23 | 68.52 | 67.50 | 67.61 | 63.59 | 62.62 | 56.39 | 49.29 | 51.00 |
Table: Ablation experiments on the LSAFBD dataset (Section 4.5.1); entries are differences in percentage points from the ConvNeXt-XLarge baseline.

Model | Class 0 Recall (%) | Class 1 Recall (%) | Class 2 Recall (%) | Class 3 Recall (%) | Class 4 Recall (%) | MR (%) | ACC (%) | MPR (%) | MF1 (%)
---|---|---|---|---|---|---|---|---|---
ConvNeXt-XLarge (Baseline) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Baseline+CL | −1.06 | −6.52 | +2.60 | +0.37 | −8.99 | −2.72 | −0.95 | +0.96 | −1.15 |
Baseline+ConvNeXt-ABC | +0.00 | +0.00 | −8.19 | +5.70 | +0.00 | −0.50 | −1.60 | −1.19 | −0.30
Proposed Method | +0.00 | +0.00 | −15.21 | +24.26 | +0.00 | +1.81 | +0.75 | +0.88 | +2.03 |
Table: Ablation experiments on the MEBeauty dataset (Section 4.5.2); entries are differences in percentage points from the ConvNeXt-XLarge baseline.

Model | Class 0 Recall (%) | Class 1 Recall (%) | Class 2 Recall (%) | MR (%) | ACC (%) | MPR (%) | MF1 (%)
---|---|---|---|---|---|---|---
ConvNeXt-XLarge (Baseline) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Baseline+CL | −6.06 | −4.61 | +8.46 | −0.73 | −1.07 | −1.81 | −1.85 |
Baseline+ConvNeXt-ABC | +0.00 | −5.59 | +6.15 | +0.19 | −1.93 | −2.37 | −1.13 |
Proposed Method | +0.00 | −9.21 | +16.15 | +2.31 | −1.50 | −1.70 | +0.01
Table: Ablation experiments on the HotOrNot dataset (Section 4.5.3); entries are differences in percentage points from the ConvNeXt-XLarge baseline.

Model | Class 0 Recall (%) | Class 1 Recall (%) | Class 2 Recall (%) | MR (%) | ACC (%) | MPR (%) | MF1 (%)
---|---|---|---|---|---|---|---
ConvNeXt-XLarge (Baseline) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Baseline+CL | +12.20 | +7.54 | −26.92 | −2.40 | +1.94 | +6.07 | +1.70
Baseline+ConvNeXt-ABC | +0.00 | +4.76 | −8.97 | −1.40 | +1.21 | +0.94 | −1.08 |
Proposed Method | +0.00 | +1.59 | +0.00 | +0.53 | +0.97 | +0.78 | +0.59 |
Table: Ablation experiment on dataset split ratios (Section 4.5.4).

Split Ratio | LSAFBD ACC (%) | LSAFBD MPR (%) | LSAFBD MR (%) | LSAFBD MF1 (%) | MEBeauty ACC (%) | MEBeauty MPR (%) | MEBeauty MR (%) | MEBeauty MF1 (%) | HotOrNot ACC (%) | HotOrNot MPR (%) | HotOrNot MR (%) | HotOrNot MF1 (%)
---|---|---|---|---|---|---|---|---|---|---|---|---
1:7:2 | 60.94 | 62.09 | 66.67 | 63.26 | 63.56 | 21.19 | 33.33 | 25.91 | 62.65 | 20.88 | 33.33 | 25.68 |
2:6:2 | 64.81 | 71.00 | 66.91 | 68.62 | 65.43 | 74.23 | 40.04 | 40.00 | 61.45 | 34.92 | 36.48 | 33.13 |
3:5:2 | 66.94 | 71.96 | 67.61 | 69.51 | 66.22 | 71.29 | 59.11 | 62.44 | 57.83 | 46.26 | 42.54 | 43.32 |
4:4:2 | 69.50 | 71.58 | 68.62 | 69.92 | 71.79 | 69.55 | 63.41 | 65.13 | 66.57 | 59.33 | 46.69 | 48.03 |
5:3:2 | 72.56 | 78.43 | 75.36 | 76.58 | 73.40 | 75.77 | 61.22 | 65.94 | 63.55 | 58.33 | 48.38 | 50.49 |
6:2:2 | 72.09 | 75.24 | 75.43 | 75.28 | 73.23 | 68.52 | 67.50 | 67.61 | 62.62 | 56.39 | 49.29 | 51.00 |
7:1:2 | 71.63 | 75.27 | 75.41 | 74.73 | 74.47 | 73.38 | 64.32 | 67.50 | 61.17 | 55.33 | 48.73 | 50.48 |
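For reproducibility, the split ratios in the table above can be realized with a seeded random split. A minimal sketch assuming the ratio is ordered train:validation:test; the paper's exact splitting procedure is not shown in this extract:

```python
import torch
from torch.utils.data import random_split

def split_dataset(dataset, ratio=(6, 2, 2), seed=42):
    """Split a dataset into train/val/test subsets according to an integer ratio (assumed order)."""
    total = sum(ratio)
    n = len(dataset)
    n_train = n * ratio[0] // total
    n_val = n * ratio[1] // total
    n_test = n - n_train - n_val                 # remainder goes to the test split
    gen = torch.Generator().manual_seed(seed)    # reproducible split
    return random_split(dataset, [n_train, n_val, n_test], generator=gen)
```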
Table: Ablation experiment on contrastive loss weight ratios (Section 4.5.5).

Weight Ratio | LSAFBD ACC (%) | LSAFBD MPR (%) | LSAFBD MR (%) | LSAFBD MF1 (%) | MEBeauty ACC (%) | MEBeauty MPR (%) | MEBeauty MR (%) | MEBeauty MF1 (%) | HotOrNot ACC (%) | HotOrNot MPR (%) | HotOrNot MR (%) | HotOrNot MF1 (%)
---|---|---|---|---|---|---|---|---|---|---|---|---
0.1 | 71.89 | 75.22 | 75.39 | 75.21 | 72.81 | 68.18 | 66.99 | 67.24 | 61.17 | 54.83 | 49.67 | 50.70 |
0.2 | 71.29 | 74.98 | 75.11 | 74.91 | 72.81 | 68.14 | 66.84 | 67.18 | 61.89 | 56.00 | 48.59 | 50.40 |
0.3 | 70.94 | 74.90 | 74.96 | 74.74 | 71.09 | 67.49 | 67.29 | 66.52 | 63.11 | 57.22 | 49.85 | 51.65 |
0.4 | 70.43 | 74.40 | 74.55 | 74.35 | 70.45 | 66.86 | 66.37 | 65.89 | 63.35 | 57.32 | 49.98 | 51.77 |
0.5 | 72.09 | 75.24 | 75.43 | 75.28 | 73.23 | 68.52 | 67.50 | 67.61 | 62.62 | 56.39 | 49.29 | 51.00 |
0.6 | 72.59 | 75.52 | 75.72 | 75.57 | 75.80 | 70.86 | 66.03 | 68.08 | 61.89 | 55.57 | 49.48 | 50.88 |
0.7 | 70.88 | 74.61 | 74.79 | 74.60 | 72.38 | 68.03 | 67.36 | 67.12 | 60.68 | 54.29 | 48.52 | 49.81 |
0.8 | 70.74 | 74.86 | 76.86 | 74.64 | 71.73 | 67.69 | 67.18 | 66.75 | 60.19 | 54.12 | 49.44 | 50.23 |
0.9 | 71.14 | 74.98 | 75.06 | 74.84 | 73.45 | 68.57 | 67.03 | 67.55 | 63.11 | 56.84 | 49.85 | 51.53 |
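The weight ratios in the table above suggest the ABC's training objective blends a classification term with the contrastive term. A hedged sketch assuming a convex combination L = (1 − w)·L_CE + w·L_CL; the exact form is not shown in this extract, and supcon_loss is the sketch from Section 3.4:

```python
import torch.nn.functional as F

def abc_loss(logits, projections, labels, w=0.5):
    """Joint objective for ConvNeXtCL-ABC (assumed form): (1 - w) * CE + w * SupCon.

    logits      -- (N, 2) binary-classification logits
    projections -- (N, D) embeddings fed to the contrastive objective
    labels      -- (N,) binary labels over the confused class pair
    """
    ce = F.cross_entropy(logits, labels)   # classification term
    cl = supcon_loss(projections, labels)  # contrastive term (sketch from Section 3.4)
    return (1 - w) * ce + w * cl
```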
Table: Ablation experiment on the discriminator (Section 4.5.6).

Database | Confusing Categories | Confusion Probability (%) | ACC (%) | MPR (%) | MR (%) | MF1 (%)
---|---|---|---|---|---|---
LSAFBD | class 3, class 4 | 23.18 | 71.74 | 75.07 | 73.84 | 73.74
LSAFBD | class 2, class 3 | 26.68 | 72.09 | 75.24 | 75.43 | 75.28
MEBeauty | class 0, class 1 | 26.57 | 73.88 | 66.98 | 65.65 | 66.13
MEBeauty | class 1, class 2 | 29.98 | 73.23 | 68.52 | 67.50 | 67.61
HotOrNot | class 0, class 1 | 36.84 | 62.38 | 57.25 | 48.88 | 50.63
HotOrNot | class 1, class 2 | 39.45 | 62.62 | 56.39 | 49.29 | 51.00
Table: Ablation experiment on the activation function (Section 4.5.7).

Activation Function | LSAFBD ACC (%) | LSAFBD MPR (%) | LSAFBD MR (%) | LSAFBD MF1 (%) | MEBeauty ACC (%) | MEBeauty MPR (%) | MEBeauty MR (%) | MEBeauty MF1 (%) | HotOrNot ACC (%) | HotOrNot MPR (%) | HotOrNot MR (%) | HotOrNot MF1 (%)
---|---|---|---|---|---|---|---|---|---|---|---|---
ReLU | 69.98 | 73.09 | 72.65 | 72.78 | 71.52 | 66.85 | 65.29 | 65.61 | 60.92 | 55.08 | 48.93 | 50.43
GELU | 72.09 | 75.24 | 75.43 | 75.28 | 73.23 | 68.52 | 67.50 | 67.61 | 62.62 | 56.39 | 49.29 | 51.00 |
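For reference, the GELU used above is defined via the standard normal CDF Φ, together with its common tanh approximation:

```latex
\mathrm{GELU}(x) = x\,\Phi(x) \approx \frac{x}{2}\left(1 + \tanh\!\left[\sqrt{\tfrac{2}{\pi}}\left(x + 0.044715\,x^{3}\right)\right]\right)
```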
Table: Ablation experiment on data augmentation (Section 4.5.8).

Data Augmentation Ratio | LSAFBD ACC (%) | LSAFBD MPR (%) | LSAFBD MR (%) | LSAFBD MF1 (%) | MEBeauty ACC (%) | MEBeauty MPR (%) | MEBeauty MR (%) | MEBeauty MF1 (%) | HotOrNot ACC (%) | HotOrNot MPR (%) | HotOrNot MR (%) | HotOrNot MF1 (%)
---|---|---|---|---|---|---|---|---|---|---|---|---
10 | 72.12 | 74.96 | 74.62 | 74.76 | 73.45 | 71.38 | 66.71 | 68.06 | 62.38 | 54.46 | 48.63 | 49.92 |
1 | 72.09 | 75.24 | 75.43 | 75.28 | 73.23 | 68.52 | 67.50 | 67.61 | 62.62 | 56.39 | 49.29 | 51.00 |
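The specific augmentations behind the ratios above are not listed in this extract; purely as illustration, a typical torchvision pipeline for face images might look like the following (every transform and parameter here is an assumption, not the paper's configuration):

```python
from torchvision import transforms

# Illustrative augmentation pipeline (the paper's exact transforms are not listed here).
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # mild crop to keep the face intact
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],      # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])
```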
Table: Statistical significance testing (Section 4.6); p-values for the proposed method versus each comparison model.

Model | LSAFBD (p-value) | MEBeauty (p-value) | HotOrNot (p-value)
---|---|---|---
ConvNeXt-XLarge [2] | <0.00001 | 0.00187 | 0.03274
ViT-T [3] | <0.00001 | 0.00007 | <0.00001
Swin [20] | <0.00001 | 0.00007 | <0.00001
CS3Darknet-S [24] | <0.00001 | 0.00001 | <0.00001
DaViT-T [25] | 0.01051 | <0.00001 | <0.00001
DeiT-T [26] | <0.00001 | <0.00001 | <0.00001
FastViT-T [27] | <0.00001 | <0.00001 | <0.00001
FocalNet-T [28] | 0.00259 | <0.00001 | <0.00001
Hiera-T [29] | <0.00001 | <0.00001 | <0.00001
Mixer-B [30] | <0.00001 | <0.00001 | <0.00001
PiT-S [31] | <0.00001 | <0.00001 | <0.00001
MobileNetV4-S [32] | <0.00001 | <0.00001 | <0.00001
XCiT-S [33] | <0.00001 | <0.00001 | <0.00001
CaiT-S [34] | <0.00001 | 0.00009 | <0.00001
CoAt-M [35] | <0.00001 | 0.04706 | <0.00001
CoAtNet [36] | <0.00001 | <0.00001 | <0.00001
ConvIT-T [37] | <0.00001 | 0.00275 | <0.00001
ConvMixer [38] | <0.00001 | <0.00001 | <0.00001
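The test statistic behind these p-values is not identified in this extract; a common choice for comparing two classifiers over repeated runs is a paired t-test, sketched here purely as an assumption with hypothetical per-run accuracies:

```python
from scipy import stats

# Hypothetical per-run accuracies of the proposed method vs. one baseline (5 seeds each).
proposed = [72.1, 71.8, 72.4, 72.0, 72.3]
baseline = [71.3, 71.0, 71.5, 71.2, 71.4]

t_stat, p_value = stats.ttest_rel(proposed, baseline)  # paired t-test across matched seeds
print(f"p = {p_value:.5f}")
```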
Table: Comparison with other facial beauty prediction methods on the LSAFBD dataset (Section 4.7).

Model | ACC (%) | MPR (%) | MR (%) | MF1 (%) | Params (M)
---|---|---|---|---|---
TransBLS-H [4] | 66.49 | 66.21 | 66.03 | 66.12 | - |
SDB-BLS [39] | 65.63 | 65.18 | 66.52 | 65.03 | 96.13 |
Adaptive multi-task method [40] | 61.37 | - | - | 59.72 | - |
MSCD-MAE [41] | 67.94 | 73.45 | 66.63 | 69.87 | 84.60 |
Proposed Method | 72.09 | 75.24 | 75.43 | 75.28 | 695.84 |