An Early Hair Loss Detection and Prediction Method Based on Machine Learning

Ahmad, Muhammad; Mir, Azka; Permana, Anton

doi:10.3390/engproc2025107126

Open Access Proceeding Paper

An Early Hair Loss Detection and Prediction Method Based on Machine Learning^†

by Muhammad Ahmad ¹,*, Azka Mir ¹ and Anton Permana ²

¹ Department of Software Engineering, University of Sialkot, Sialkot 51040, Pakistan
² Department of Career and Alumni Relation Unit, Nusa Putra University, Sukabumi 43152, West Java, Indonesia
* Author to whom correspondence should be addressed.
† Presented at the 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society, Aizuwakamatsu City, Japan, 20–26 January 2025.

Eng. Proc. 2025, 107(1), 126; https://doi.org/10.3390/engproc2025107126

Published: 11 October 2025

(This article belongs to the Proceedings of The 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society)


Abstract

Hair loss is a common condition that affects many people around the world and can lead to mental and social challenges, lowering self-esteem and straining social relationships. To address these challenges, this study investigates the role of machine learning (ML) in the early detection and prediction of hair loss, paving the way for personalized treatment. The research applies several techniques, including Random Forest, Support Vector Machines (SVMs), and K-nearest neighbor (KNN), together with feature engineering, preprocessing, and hyperparameter tuning. The results outperform traditional approaches, with a clear difference in accuracy and precision. This study shows the potential of automated diagnostics that could transform the treatment of hair loss, to the benefit of the many people affected by it.

Keywords:

hair loss; machine learning; early prediction; random forest; support vector machines

1. Introduction

Alopecia, commonly termed hair loss, tends to affect an individual's lifestyle and self-perception. Hair loss goes beyond the physical: it is a complex problem that can also cause emotional distress, social discomfort, and lower self-esteem. Hormonal variations, genetics, stress, and nutritional imbalances are some of its common causes [1,2]. Existing diagnostic approaches are largely traditional, for example, manual evaluation by trichologists or dermatologists, which relies on human judgment and is quite subjective. These methods can sometimes be incorrect, leading to inaccurate findings and treatment decisions, and they can be physically demanding [3]. Because manual evaluations may not consider all the factors that contribute to hair loss, better-suited alternative methods are needed. Recent advances in machine learning (ML) and artificial intelligence (AI) offer a groundbreaking opportunity to transform hair loss diagnostics. By leveraging large amounts of data, these advances enable the development of automated, accurate, and scalable solutions that might substantially reduce reliance on conventional methods [4,5]. Such systems can reach conclusions faster than humans, and earlier conclusions may in turn allow more individualized treatment options [6]. To predict hair loss, this study employs algorithms such as neural networks, designed for prediction tasks on a structured dataset that includes demographic features, medical history, and other relevant features. Finally, preprocessing techniques are incorporated to improve model training and verify accuracy [7].
Moreover, this investigation focuses on developing robust models that generate accurate and reliable predictions, helping medical practitioners make informed decisions about diagnosis and treatment plans [8]. The impact of this approach is not limited to improving diagnostic accuracy. Using AI and ML in hair loss assessment can relieve some pressure on medical institutions and improve the patient experience. It might also provide a better understanding of hair loss mechanisms and thus allow the design of more targeted and effective therapies [9]. The subsequent sections detail the study design, the results of the algorithms analyzed, and the implications of these findings for patient management. The main objective of this research is to help improve the quality of life of people suffering from hair loss and to contribute to the ongoing search for ways to treat this condition.

2. Literature Review

ML has progressed quickly in health informatics, especially in diagnosing hair and scalp-related conditions. Ref. [1] advanced scalp diagnosis by applying transfer learning, improving accuracy while identifying six types of lesions. Proposed models of Alopecia Areata are accompanied by practical implications for the clinical version of the Severity of Alopecia Tool (SALT).

Ref. [4] proposed a surface-sensing Convolutional Neural Network (CNN) model with 98.78% accuracy for classifying scalp lesions, while [2] introduced a lightweight CNN trained on scanning electron microscopy images, achieving 94.8% accuracy to simplify traditional hair damage diagnostics. A recent study by Margaryan et al. employed a Mask Region-based Convolutional Neural Network (Mask R-CNN) to classify hair follicles and visually demonstrate the severity of hair loss, and Moreira et al. achieved a 4–15 percent boost in accuracy. Furthermore, ref. [6] reached an accuracy of 95% across three models (DenseNet, XceptionNet, and ResNet) on a disparate dataset for alopecia, whereas ref. [7] reached only 58.67% (reported as mean average precision in Table 1) when measuring hair density with YOLOv4, implying the need for better datasets. Ref. [8] created a CNN-based system for dandruff severity analysis, achieving 85.03% accuracy. Ref. [9] stressed the importance of preprocessing methods, including Contrast Limited Adaptive Histogram Equalization (CLAHE) and data augmentation, to improve VGG-SVM models for alopecia diagnosis, and proposed YOLOv5 for follicle detection, tackling the problem of small-scale datasets and achieving a mAP of 0.8151. Ref. [10] explored a Visual Geometry Group-16 (VGG-16) based CNN for hair disease classification with 94.5% accuracy. Ref. [11] analyzed scalp diseases using deep learning models for effective diagnosis.

Ref. [12] developed a Mask R-CNN for follicle detection with high precision. Ref. [13] proposed the AB-MTEDeep classifier, trained with AA-GAN, for identifying and classifying alopecia areata. Ref. [14] likewise proposed the Attention-based Balanced Multi-Tasking Ensembling Deep network (AB-MTEDeep), enhanced by the Alopecia Areata–Generative Adversarial Network (AA-GAN), for Alopecia Areata classification, achieving 96.94% accuracy and addressing data imbalance challenges. Ref. [15] devised an ensemble model that combines Random Forest, SVMs, and active CNNs to optimize recall and accuracy on the hair loss prediction task.

Ref. [5] showed that ensemble techniques improve the F1 score when various algorithms are combined for hairfall prediction. Despite these recent developments, challenges such as limited data, poor generalization, and a lack of performance metrics remain [16]; future work needs to address them to bridge the gap between theory and practice and to make ML-based diagnostic systems more efficient and effective. A comparison of related studies, including CNN, ResNet, YOLOv4, and other models with their reported performance, is summarized in Table 1.

3. Methodology

This section describes the framework shown in Figure 1. The framework follows the conventional machine learning steps, from data gathering to hair loss prediction. Each step is further divided into sub-steps, as shown in Figure 1.

First, we gathered data from the hair loss dataset, as shown in the data gathering step. We then cleaned and formatted the data and handled missing values. Next, we split the data 70:30, using 70% for training and 30% for testing. We then selected classification algorithms suited to our data and target label, and trained the models on the training data. In the last step, we obtained the output in the form of a hair loss prediction. The proposed model, with its operators, is shown in Figure 2.

3.1. Data Collection and Preprocessing

The dataset comprises 999 entries, each characterized by 13 attributes. These attributes include genetic predisposition, hormonal shifts, age, and lifestyle choices, all of which are relevant to hair loss. Categorical variables were transformed using LabelEncoder, so that each category is represented by a numerical value. Numerical features were standardized with StandardScaler, which rescales values to a mean of zero and a standard deviation of one, placing features on a comparable scale. Missing values were replaced with suitable representative values, such as "Unknown" for categorical features, to maintain the integrity of the data.
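The three preprocessing operations described above can be illustrated with a small dependency-free sketch. It is not the authors' code: it only mimics what label encoding and standardization compute, and the column names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def fill_missing(values, placeholder="Unknown"):
    """Replace missing categorical entries (None or empty string) with a placeholder."""
    return [placeholder if v is None or v == "" else v for v in values]


def label_encode(values):
    """Map each category to an integer code, assigning codes in sorted order."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical columns; names and values are illustrative only.
genetics = ["Yes", "No", None, "Yes"]
age = [19, 43, 26, 32]

codes, classes = label_encode(fill_missing(genetics))
scaled_age = standardize(age)

print(classes)  # ['No', 'Unknown', 'Yes']
print(codes)    # [2, 0, 1, 2]
```

Treating "Unknown" as its own category keeps rows with missing values in the dataset instead of discarding them, at the cost of adding one extra code per categorical feature.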

3.2. Data Partitioning

During the data split step, the dataset was partitioned into two sets: 70% for training and the remaining 30% for testing. This partitioning allows the models to be trained on the majority of the data while their performance is evaluated on the reserved test set. The random state was set to 42 to ensure reproducibility across runs.
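As a rough illustration of a seeded 70:30 partition, the sketch below shuffles row indices with a fixed seed and cuts them into two disjoint sets. The helper name is hypothetical, the test share is rounded up as an assumption, and the seed 42 here drives Python's own generator, so it does not reproduce the exact rows selected in the study.

```python
import math
import random


def train_test_split_indices(n_samples, test_fraction=0.30, seed=42):
    """Shuffle row indices with a fixed seed and split them into train/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Assumption: the test share is rounded up to a whole number of rows.
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(999)
print(len(train_idx), len(test_idx))  # 699 300
```

Fixing the seed means that repeated runs produce the same partition, which is what makes accuracy figures comparable across models.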

3.3. Model Development

Several machine learning techniques were used in this investigation to evaluate how well they performed on the dataset:

Logistic Regression: This algorithm served as the baseline model, establishing the standard against which the other models were compared. It is well suited to binary classification tasks.
KNN: This technique was explored with various values of k to determine the optimal number of nearest neighbors for classifying the data points; the reported result uses k = 3 (Table 2). KNN is a non-parametric method that can adapt well to different data distributions.
Random Forest: This ensemble learning method was fine-tuned using GridSearchCV, which systematically searches through multiple combinations of hyperparameters to find the best configuration for improved accuracy and performance.
Gradient Boosting and XGBoost: These advanced algorithms were applied to effectively capture and model complex relationships within the data, enhancing predictive performance by sequentially addressing the errors of previous models.
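To make the idea of searching over hyperparameter values concrete, the sketch below implements a plain k-nearest-neighbor classifier and picks k by accuracy on a held-out validation set. It is a simplified stand-in for a systematic search such as GridSearchCV, not the authors' configuration; the synthetic two-cluster data, the candidate values of k, and all function names are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, val_X, val_y, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y))
    return hits / len(val_y)


def search_k(train_X, train_y, val_X, val_y, candidates):
    """Score every candidate k and return the best one (ties go to the smaller k)."""
    scores = {k: accuracy(train_X, train_y, val_X, val_y, k) for k in candidates}
    best_k = max(scores, key=lambda k: (scores[k], -k))
    return best_k, scores


# Synthetic, well-separated two-class data (illustrative only).
rng = random.Random(42)


def make_points(center, label, n):
    return [((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label) for _ in range(n)]


data = make_points(0.0, "No", 60) + make_points(4.0, "Yes", 60)
rng.shuffle(data)
train, val = data[:84], data[84:]
train_X, train_y = [p for p, _ in train], [lab for _, lab in train]
val_X, val_y = [p for p, _ in val], [lab for _, lab in val]

best_k, scores = search_k(train_X, train_y, val_X, val_y, candidates=[1, 3, 5, 7])
print(best_k, scores)
```

Odd values of k avoid tied votes in a two-class problem. A full grid search would additionally average the score over several cross-validation folds rather than rely on a single validation split.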

4. Results and Discussion

4.1. Model Performance

Table 2 shows the outcomes of every algorithm used in this study. Random Forest achieved the highest accuracy (99.80%), followed by KNN (87.09%) and Gradient Boosting (71.77%), while the remaining models scored between 49.33% and 58.16%. These results also highlight areas for improvement, particularly in sensitivity and in managing false positives.

As Table 2 shows, Random Forest performs best and gives the highest accuracy of all the algorithms compared. The accuracy rates are illustrated in the bar chart in Figure 3.

Random Forest is therefore the most effective model for this dataset and attains the highest accuracy in the proposed model; the method is also known to achieve competitive results on multiclass data efficiently. The confusion matrix generated by the Random Forest classifier is presented in Table 3.
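The figures in Table 3 can be reproduced from its four counts. The short sketch below recomputes accuracy, per-class precision, and per-class recall; the variable names are illustrative, and the counts are copied from Table 3.

```python
# Counts copied from Table 3 (rows: predicted class, columns: true class).
pred_no_true_no, pred_no_true_yes = 500, 0
pred_yes_true_no, pred_yes_true_yes = 2, 497

total = pred_no_true_no + pred_no_true_yes + pred_yes_true_no + pred_yes_true_yes

# Accuracy: correct predictions over all predictions.
accuracy = (pred_no_true_no + pred_yes_true_yes) / total

# Precision: of everything predicted as a class, how much truly belongs to it.
precision_no = pred_no_true_no / (pred_no_true_no + pred_no_true_yes)
precision_yes = pred_yes_true_yes / (pred_yes_true_no + pred_yes_true_yes)

# Recall: of everything truly in a class, how much was predicted as that class.
recall_no = pred_no_true_no / (pred_no_true_no + pred_yes_true_no)
recall_yes = pred_yes_true_yes / (pred_no_true_yes + pred_yes_true_yes)

print(total)                    # 999
print(f"{accuracy:.2%}")        # 99.80%
print(f"{precision_yes:.2%}")   # 99.60%
print(f"{recall_no:.2%}")       # 99.60%
```

The two errors are both "No" cases predicted as "Yes", which is why only the precision of the "Yes" class and the recall of the "No" class fall below 100%. The counts sum to 999, the size of the full dataset given in Section 3.1, so the matrix may cover all entries rather than only the 30% test split; the paper does not say so explicitly.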

4.2. Feature Importance

The Random Forest model revealed the most important features influencing hair loss. This finding highlights the complex interaction between physiological factors, such as age and glucose levels, and lifestyle-related factors. These insights suggest that both biological and behavioral aspects play a significant part in understanding and addressing hair loss.

5. Conclusions

This study has taken an important step toward using ML to predict hair loss with noteworthy accuracy. Through comprehensive training and evaluation, we have demonstrated the potential effectiveness of our approach. Our results revealed strengths in several of the algorithms tested; however, we recognize the need for further optimization, especially in addressing sensitivity and learning bias issues, to ensure reliable predictions across diverse patient profiles.

Future Enhancements

Looking ahead, we plan to focus on a few key areas:

Expanding Datasets:
By broadening our datasets to cover a more diverse demographic range, we aim to better understand the factors affecting hair loss across different populations.
Addressing Class Imbalance:
We will implement techniques such as oversampling, undersampling, or algorithmic adjustments to improve the model's robustness and accuracy, especially for underrepresented groups.
Improving Model Performance:
Our future work will investigate hyperparameter tuning and feature selection to further refine the model's predictive capabilities and reduce errors.
The implications of our findings extend to advancing the field of trichology and improving patient care. By giving accurate, data-driven predictions, this investigation lays the basis for more reliable diagnostic tools, which may help healthcare professionals make informed decisions. While this study focuses exclusively on model development, future integration with practical diagnostic workflows has the potential to transform hair loss management.
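As an illustration of the oversampling mentioned among the planned enhancements, the sketch below duplicates randomly chosen minority-class rows until all classes have the same count. It shows the general technique only, under assumed names and toy data, and does not describe an implemented part of this study.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=42):
    """Duplicate randomly chosen minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to make up the shortfall for this class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy imbalanced data: 8 "No" rows and 2 "Yes" rows (illustrative only).
rows = list(range(10))
labels = ["No"] * 8 + ["Yes"] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training split only, so that duplicated rows never appear in the test set and inflate the measured accuracy.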

Author Contributions

M.A. contributed to conceptualization, research design, project administration, data curation, validation, and writing original draft. A.M. was responsible for data curation, analysis, and critical review. A.P. contributed to validation, review, and refinement of the manuscript. All authors contributed to writing, discussed the results, and approved the final version of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

The authors received no external funding for this research.

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not Applicable.

Data Availability Statement

The data will be made available upon reasonable request to the first author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Roy, M.; Protity, A.T. Hair and scalp disease detection using machine learning and image processing. Eur. J. Inf. Technol. Comput. Sci. 2023, 47, 1–10. Available online: https://www.ej-compute.org/index.php/compute/article/view/85 (accessed on 25 June 2025).
2. Zhang, L.; Man, Q.; Cho, Y.I. Deep-learning-based hair damage diagnosis method applying scanning electron microscopy images. Diagnostics 2021, 11, 1831.
3. Kim, J.-H.; Lee, S.-H.; Moon, Y.-S. Hair follicle classification and hair loss severity estimation using Mask R-CNN. J. Imaging 2022, 8, 283.
4. Kim, H. Development of scalp diagnosis algorithm using surface-sensing convolutional neural network. Natl. Acad. Sci. Lett. 2024, 47, 1–5. Available online: https://www.researchgate.net/publication/371862840 (accessed on 25 June 2025).
5. Gupta, M.; Mysore, V. Classifications of patterned hair loss: A review. J. Cutan. Aesthetic Surg. 2016, 9, 3–12.
6. Kim, M.; Kang, S.; Lee, B.D. Evaluation of automated measurement of hair density using deep neural networks. Sensors 2022, 22, 650.
7. Kim, M.; Gil, Y.; Kim, Y.; Kim, J. Deep-learning-based scalp image analysis using limited data. Electronics 2023, 12, 1380.
8. Sayyad, S.; Midhunchakkaravarthy, D. Deep review on alopecia areata diagnosis for hair loss-related autoimmune disorder problem. Int. J. Health Sci. 2022, 6, 123–130. Available online: https://journals.innovareacademics.in/index.php/ijap/article/view/45533/26767 (accessed on 25 June 2025).
9. Kumar, G.R.; Banerjee, D. CNN-KNN Model for Assessing Hair Health in Telogen Effluvium. In Proceedings of the IEEE 2025 3rd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT), Bengaluru, India, 5–7 February 2025; pp. 1531–1537.
10. Karo, I.M.K.; Kiswanto, D.; Panggabean, S.; Perdana, A. Hair Disease Classification Using Convolutional Neural Network (CNN) Algorithm with VGG-16 Architecture. Sink. J. Dan Penelit. Tek. Inform. 2023, 7, 2786–2793.
11. Krishnamoorthy, N.; Jayanthi, P.; Kumaravel, T.; Sundareshwar, V.A.; Harris, R.S.J. Scalp disease analysis using deep learning models. Appl. Comput. Eng. 2023, 2, 1003–1009.
12. Saraswathi, C.; Pushpa, B. Ab-mtedeep classifier trained with aagan for the identification and classification of alopecia areata. Eng. Technol. Appl. Sci. Res. 2023, 13, 10895–10900.
13. Sai, C.N.V.; Archana, E.; Vivek, B.; Dhanwanth, B.; Viknesh, K.S. Enhancing Hairfall Prediction: A Comparative Analysis of Individual Algorithms and An Ensemble Method. Int. J. Recent Innov. Trends Comput. Commun. 2023, 11, 499–508.
14. Babbar, H.; Rani, S.; Masud, M.; Verma, S.; Anand, D.; Jhanjhi, N. Load balancing algorithm for migrating switches in software-defined vehicular networks. Comput. Mater. Continua 2021, 67, 1301–1316.
15. Kok, S.H.; Azween, A.; Jhanjhi, N.Z. Evaluation metric for crypto-ransomware detection using machine learning. J. Inf. Secur. Appl. 2020, 55, 102646.
16. Alshudukhi, K.S.S.; Ashfaq, F.; Jhanjhi, N.Z.; Humayun, M. Blockchain-enabled federated learning for longitudinal emergency care. IEEE Access 2024, 12, 137284–137294.

Figure 1. Framework.

Figure 2. Proposed model with operators.

Figure 3. Accuracy comparison in bar chart.

Table 1. Comparison with similar studies.

Ref	Year	ML Algorithms	Performance
[1]	2023	CNN, Random Forest	High accuracy, sensitivity, and specificity
[2]	2021	Deep learning	94.8% accuracy
[3]	2022	ResNet, Mask R-CNN	4–15% increase in classification accuracy
[4]	2021	CNN, CBAM, DSC, FC	85.03% accuracy
[5]	2022	YOLOv4, DetectoRS, EfficientDet	YOLOv4: 58.67 mean average precision
[6]	2023	ResNet, ResNeXt, DenseNet, XceptionNet	95.75% accuracy, 87.05 F1 score
[7]	2022	ANNs, CLAHE, VGG, CNN, SVM	Improved accuracy in classifying healthy and alopecia cases
[8]	2023	ANNs, CLAHE, VGG, CNN, SVM	Enhanced precision in classification
[9]	2023	CNN, VGG-16	94.5% accuracy
[10]	2023	CNN, VGG16, VGG19, MobileNetV2	High accuracy in disease prediction
[11]	2023	YOLOv5	mAP of 0.8151, optimal results with YOLOv5l in multiclass detection
[12]	2024	CNN	98.78% accuracy
[13]	2023	AB-MTEDeep, FRCNN, LSTM, AA-GAN	96.94% accuracy
[14]	2023	CNN, Random Forest	High accuracy, sensitivity, and specificity
[15]	2023	CNN, Random Forest, SVM, KNN	Improved accuracy, precision, and recall

Table 2. Results of algorithms.

Model	Accuracy
Logistic Regression	51.25%
KNN (k = 3)	87.09%
Decision Tree	51.65%
Random Forest	99.80%
Naive Bayes	58.16%
SVM	49.33%
Gradient Boosting	71.77%
XGBoost	49.33%

Table 3. Class Wise Precision.

	True No	True Yes	Class Precision
pred. No	500	0	100.00%
pred. Yes	2	497	99.60%
class recall	99.60%	100.00%


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
