Machine Learning in Allergic Contact Dermatitis: Identifying (Dis)similarities between Polysensitized and Monosensitized Patients

Background: Allergic contact dermatitis (ACD) is a delayed hypersensitivity reaction occurring in sensitized individuals due to exposure to allergens. Polysensitization, defined as positive reactions to multiple unrelated haptens, increases the risk of ACD development and affects patients’ quality of life. The aim of this study is to apply machine learning in order to analyze the association between ACD, polysensitization, individual susceptibility, and patients’ characteristics. Methods: Patch test results and demographics from 400 ACD patients (Study protocol Nr. 3765/2022), categorized as polysensitized or monosensitized, were analyzed. Classic statistical analysis and multiple correspondence analysis (MCA) were utilized to explore relationships among variables. Results: The findings revealed significant associations between patient characteristics and ACD patterns, with hand dermatitis showing the strongest correlation. MCA provided insights into the complex interplay of demographic and clinical factors influencing ACD prevalence. Conclusion: Overall, this study highlights the potential of machine learning in unveiling hidden patterns within dermatological data, paving the way for future advancements in the field.


Introduction
Allergic contact dermatitis (ACD) is an inflammatory skin disease characterized by a delayed (type IV) hypersensitivity immune reaction. ACD occurs only in previously sensitized individuals [1], while the sensitization process results from the interplay between genetic and environmental factors [2][3][4][5]. Patients sensitized to one allergen have been shown to be at high risk of developing multiple contact allergies [2,4]. The term multiple sensitization, or polysensitization, is defined as having positive patch test reactions to three or more non-related haptens [2][3][4][5]. Polysensitization is considered to increase ACD prevalence and affect patients' quality of life [5].
Patch testing is the in vivo method of choice for detecting allergen sensitization. The European baseline series (EBS) is the main contact allergen group that is used during patch testing. However, based on labor, social, and national norms, the EBS panel varies between different diagnostic departments and geographical regions [6][7][8]. For instance, Thimerosal 0.1% is included in the EBS of the National Reference Center for Occupational Dermatoses "Andreas Syggros" Hospital in Athens, Greece [7].
Thimerosal is an organic compound containing two sensitizing moieties, mercury and thiosalicylate [9,10]. Although thimerosal has been eliminated from, or reduced to trace levels in, many products today, exposure in the population remains pervasive as a result of its previous extensive use. In addition, due to its antimicrobial action, thimerosal is still added as a disinfectant agent in merthiolate tincture and as a preservative, especially in biological products (vaccines and antitoxins) and pharmaceutical/self-hygiene products (ocular solutions, eye drops/ointments, and contact lens fluids), and, less commonly than previously, in cosmetics and tattoo inks [9][10][11]. These thimerosal-containing products can provoke localized hypersensitivity reactions [9]. Thimerosal is also an indicator of photosensitivity to piroxicam, due to its thiosalicylic moiety [10,11]. Although the prevalence of thimerosal sensitivity seems to be quite high, the clinical relevance of a positive patch test to thimerosal is usually very difficult to establish [10][11][12]. In most cases it seems to be due to either ocular preparations or vaccines [11,12].
Artificial intelligence (AI) applications are outlined for a variety of medical specialties, such as dermatology, neurology, cardiology, pediatrics, surgery and others [13]. More specifically, machine learning algorithms have the ability to uncover data patterns from extensive repositories of patient information [14]. The application of machine learning algorithms on patch testing datasets was found to improve the process of contact sensitization detection [14,15]. So, machine learning models may contribute to early ACD diagnosis and enhance its precision, as well as the treatment strategy [13][14][15].
The general purpose of this study was to investigate sensitization patterns in order to understand the association among ACD, polysensitization, individual susceptibility, and patients' characteristics. In more detail, this analysis aimed to (a) identify relationships between patients' characteristics and ACD patterns, and (b) compare the monosensitized patients (in terms of thimerosal) with the polysensitized patients. Machine learning and classical techniques were used in this retrospective study.

Study Design and Patient Selection
Patch test results from 400 ACD patients (200 polysensitized and 200 monosensitized) were collected at the National Reference Center for Occupational Dermatoses "Andreas Syggros" Hospital in Athens, Greece. In this observational study, data were retrospectively collected from decades' worth of hospital medical records and transcribed into an electronic medical dataset. The entire data management was conducted under the supervision of the treating physicians of the ACD patients. Furthermore, the Scientific Review Board of the "Andreas Syggros" Hospital reviewed and approved this study protocol (Protocol Nr. 3765/2022). All ethical aspects of the study were fully in line with the Helsinki Declaration (1975, revised 2000). All participant data were anonymized, and no patient information could be identified. The whole study was conducted with respect to medical data confidentiality.
According to the guidelines of the Department for Patch Testing, an adapted EBS of 30 contact allergens is tested for contact sensitization, while exclusion criteria for patch testing are high UV exposure and the chronic use of corticosteroids, immunomodulators, and anti-inflammatory drugs, which might produce false-positive or false-negative results [6][7][8]. The detection of sensitization (positive patch test) is based on the International Contact Dermatitis Research Group (ICDRG) criteria.
In this study, two groups were assigned: monosensitized patients having a positive patch test to thimerosal 0.1%, and polysensitized patients having positive patch test reactions to 3 or more unrelated haptens of the adapted EBS (the 30 specified contact allergens). In each case (mono- or polysensitized), the data collection was completed for the first 200 medical records of men and women, with an equal contribution to both groups. The baseline patients' demographics and clinical characteristics are listed in Table 1.

Polysensitization
Polysensitization is defined as having positive patch test reactions to three or more non-related haptens [5]. The positive patch test reactions of the polysensitized patients, which were methodically recorded based on the ICDRG criteria, are listed in Table 2.
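The grouping rule above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the hapten-to-category map, the function name `classify_patient`, and the decision to treat "non-related" as "belonging to different allergen categories" are all assumptions made for illustration only.

```python
# Hypothetical hapten-to-category map (assumption); the study's adapted
# EBS of 30 allergens and its Table 2 define the real panel.
HAPTEN_CATEGORY = {
    "nickel sulfate": "metals",
    "cobalt chloride": "metals",
    "fragrance mix I": "fragrances",
    "balsam of Peru": "fragrances",
    "thimerosal": "preservatives",
    "neomycin sulfate": "medicines",
}


def classify_patient(positive_haptens):
    """Assign a study group from the haptens with a positive patch test.

    Assumption: two haptens count as "non-related" when they fall in
    different allergen categories.
    """
    categories = {HAPTEN_CATEGORY[h] for h in positive_haptens}
    if len(categories) >= 3:
        return "polysensitized"
    if set(positive_haptens) == {"thimerosal"}:
        return "monosensitized"
    return "neither"


print(classify_patient(["nickel sulfate", "fragrance mix I", "thimerosal"]))
print(classify_patient(["thimerosal"]))
```

Under this simplification, a patient positive to two metals and one fragrance would not count as polysensitized, because only two categories are involved.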

Data Analysis
Following data collection, descriptive statistics and statistical comparisons between groups were used in the study. All variables (except for age) were on the nominal or ordinal scale. Chi-square and multiple correspondence analysis (MCA) were performed. The chi-square analysis was used to investigate the relationship between two or more features (at the 5% nominal significance level). When more than two hypothesis tests were performed simultaneously, the Bonferroni correction was used.
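As an illustration of the chi-square test and the Bonferroni correction described above, the following sketch uses only the Python standard library. The study itself used SPSS; the 2x2 counts and the three p-values below are invented for demonstration and are not the study's data. For a 2x2 table (one degree of freedom), the p-value can be written exactly as erfc(sqrt(chi2 / 2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    No continuity correction is applied. Returns (chi2, p_value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni(p_values, alpha=0.05):
    """Flag which tests stay significant at the threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Hypothetical counts: rows = patient group, columns = hand dermatitis yes/no.
table = [[120, 80], [88, 112]]
chi2, p = chi_square_2x2(table)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")

# Hypothetical p-values from three simultaneous tests.
print(bonferroni([0.01, 0.04, 0.03]))
```

With three simultaneous tests, the per-test threshold drops from 0.05 to about 0.0167, so only the first hypothetical p-value remains significant.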
MCA, a machine learning technique, serves as an extension of correspondence analysis (CA) and proves to be particularly advantageous when working with datasets that encompass multiple categorical variables, as in the case of this study [16]. The primary objective of MCA is to visually represent complex connections among categorical variables, thereby facilitating comprehension of patterns and associations within the data. Through the process of dimensionality reduction, MCA enables the representation of data in a manner that is both succinct and comprehensible, while still preserving the majority of the significant information. Each categorical variable is represented as a point in the MCA plot. The position of the point in the reduced space is determined based on the relationships and associations between categories of that variable with other variables in the dataset. The origin of the plot represents the centroid, that is, the overall average of all the variables in the dataset. The vector lines extend from the origin (centroid) to the points representing the individual categorical variables. The direction of the vector line indicates the relationship and association between the variable and the overall average. The length of the vector line represents the strength of that association: longer vector lines suggest a stronger association with the centroid. The angle between two vector lines represents the relationship and association between the corresponding variables in the reduced space. If two vector lines are close together, it suggests that the categories they represent are similar or positively associated. Conversely, if they are far apart, it indicates dissimilarity or a negative association. A 90° angle indicates that the variables are not related.
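The angle-based reading of the plot can be made concrete with a short sketch. The function name `angle_degrees` and the example coordinates are hypothetical; the vectors stand for the coordinates of two plotted points relative to the origin.

```python
import math


def angle_degrees(u, v):
    """Angle between two vectors drawn from the origin, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


# Hypothetical 2D coordinates of plotted points.
print(angle_degrees((1.0, 0.0), (0.0, 1.0)))   # perpendicular: unrelated
print(angle_degrees((1.0, 1.0), (2.0, 2.0)))   # same direction: positively associated
print(angle_degrees((1.0, 0.0), (-1.0, 0.0)))  # opposite: negatively associated
```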
By visualizing the vector lines in the MCA plot, one can gain insights into the relationships between categorical variables and identify patterns and groupings in the data. In general, the number of machine learning algorithms that analyze categorical data (nominal and ordinal) is limited, so MCA was the most appropriate algorithm for this analysis. The MCA analysis was used to complement the chi-square findings and reveal the relationships among the variables in a compact way [16,17]. In this study, Varimax rotation was used to simplify the generated representations and contributed to the model's flexibility [16,17]. As measures of the model's internal consistency, component loadings were illustrated in the generated MCA plots.
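For readers who want to see the mechanics, the sketch below implements a textbook form of MCA: correspondence analysis of the indicator (one-hot) matrix via singular value decomposition. It is an assumption-laden illustration, not a reproduction of the SPSS procedure used in this study; in particular, it applies no Varimax rotation, it requires NumPy, and the records and variable names (`hand`, `job`) are invented.

```python
import numpy as np


def indicator_matrix(records, variables):
    """One-hot encode categorical records; returns (Z, column labels)."""
    labels = []
    for var in variables:
        for category in sorted({rec[var] for rec in records}):
            labels.append((var, category))
    Z = np.array([[1.0 if rec[var] == cat else 0.0 for var, cat in labels]
                  for rec in records])
    return Z, labels


def mca(Z, n_components=2):
    """Correspondence analysis of the indicator matrix Z.

    Returns the principal coordinates of the categories (columns of Z)
    and the principal inertias of the retained dimensions.
    """
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    coords = (Vt.T * sigma) / np.sqrt(c)[:, None]
    return coords[:, :n_components], sigma[:n_components] ** 2


# Invented toy data in which hand dermatitis and occupation coincide exactly.
records = ([{"hand": "yes", "job": "manual"}] * 4
           + [{"hand": "no", "job": "office"}] * 4)
Z, labels = indicator_matrix(records, ["hand", "job"])
coords, inertia = mca(Z)
for label, xy in zip(labels, coords):
    print(label, np.round(xy, 3))
```

In this toy example the categories that always occur together receive the same coordinate on the first dimension, while mutually exclusive categories fall on opposite sides of the origin, which is the geometric pattern the MCA plots are meant to show.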
In this study, the entire statistical analysis was implemented in IBM SPSS ® v.28 (Chicago, IL, USA).
Polysensitization most often involved three allergen categories, which accounted for 109 (54.5%) of all polysensitized patients, followed by four categories in 49 cases (24.5%) and five in 27 cases (13.5%). Among the polysensitization patterns, preservatives-fragrances-metals was the most frequent combination of contact allergens. The polysensitization patterns of this study are summarized in Table 3.

Associations among Patients' Characteristics
Chi-square analysis revealed a statistically significant relationship between hand dermatitis and patient group (p-value = 0.001). In particular, hand dermatitis was proportionally more frequent among monosensitized than among polysensitized patients (monosensitization > polysensitization). Neither group, however, was shown to be associated with the other dermatitis types (FD/Face Dermatitis, LD/Leg Dermatitis, TD/Trunk Dermatitis, and AD/Atopic Dermatitis). Also, both polysensitized (p-value = 0.003) and monosensitized (p-value < 0.001) individuals showed significant relationships between hand dermatitis and occupation class. Only in the monosensitized group was a significant relationship between hand dermatitis and gender revealed (p-value = 0.025), with males outnumbering females (number of males > number of females) (Table 4).

Multiple Correspondence Analysis
The application of MCA revealed interesting relationships among several patients' characteristics. Indeed, occupation showed a strong association with gender and age, while AD was related to familial AD history (Figure 1A). Interestingly, the patient group was found not to be related to occupation, age, or gender, while the total patient cohort was associated with AD and family AD history (Figure 1A). Additional relationships were found among the anatomical regions of ACD. Specifically, HD was shown to be most associated with FD, then with LD, and less with TD (Figure 1B).
In terms of anatomical regions, gender was most strongly linked with FD and HD, then LD, and finally TD (Figure 2A). Age, on the other hand, was significantly related to TD, followed by LD, but not to HD or FD (Figure 2B). Occupation was most closely associated with HD and FD, then LD, and less correlated with TD (Figure 2C). Furthermore, both AD and family AD history were most strongly associated with TD, followed by LD, HD, and less so with FD (Figure 2D,E).
Additional correlations were found between patient groups and the anatomical regions of ACD in the following descending order: HD > TD > FD > LD (Figure 3).
MCA analysis was also performed to assess the influence of gender, age, occupation, AD, family AD history, and anatomical sites of ACD on the polysensitized patients. The same analysis was performed on the monosensitized patients for comparison (Figure 4). In the group of polysensitized patients, positive correlations similar to those in the total patient cohort (Figure 1A) were identified: occupation was positively correlated with gender and age, and AD with family AD history (Figure 4A). In the group of monosensitized patients, occupation was also positively correlated with gender and age, and AD with family AD history. In contrast, AD and family AD history were found to be independent of age and gender (Figure 4B).
In the group of polysensitized patients, HD, in contrast to the total patient cohort (Figure 1B), was found to be most correlated with LD, then with FD and TD (Figure 4C). In the group of monosensitized patients, HD, as in the total patient cohort, was found to be most correlated with FD. In contrast, LD was found to be most correlated with TD and independent of HD and FD (Figure 4D).
Additional results regarding polysensitization are shown in Figure 5. The application of MCA revealed interesting relationships among medicines, metals, and fragrances, while colorant contact allergens were found to be independent of the other allergen types (Figure 5A). On the other hand, only dyes-colorants were strongly linked with AD (Figure 5B). The anatomical regions of HD and FD were also significantly related to medicines, followed by dyes-colorants, but not to fragrances and metals (Figure 5C).

Discussion
This study aimed to investigate the patterns of contact sensitization using machine learning (ML) methods, in order to unveil any association among ACD, polysensitization, individual susceptibility, and patient characteristics. Patients diagnosed with ACD are at an increased risk of developing additional hypersensitivity reactions [4]. Contact sensitization varies among individuals due to both environmental and genetic factors, while polysensitization occurs more often than expected based on population frequencies of individual sensitization [4]. Therefore, polysensitized patients represent a special subgroup with increased susceptibility to contact allergy [2][3][4].
Thimerosal 0.1% was selected as the allergen for monosensitization, since, based on previous results, it was one of the most prevalent allergens [7]. It should be mentioned that despite thimerosal having now been either reduced or removed from many pharmaceutical and cosmetic products, there remains widespread exposure in the population due to its previous extensive use. Indeed, there is a significant geographical variability in the incidence of thimerosal sensitization, which can be explained by its availability and application between different formulations in each country [10]. Despite the quite high prevalence of thimerosal sensitivity, as in Greece, the clinical relevance of a positive patch test to thimerosal is usually very difficult to establish [10][11][12]. In most cases this seems to be due to use of ocular preparations and topical medicines, but it is more likely to be attributable to high vaccination levels of the general population, and to occupational vaccinations against infectious diseases (influenza, hepatitis), such as in healthcare workers [10][11][12]. It is worth mentioning that all comparisons made between the polysensitized and monosensitized patients actually refer to thimerosal sensitization.
In the current study, the dimensionality reduction method of MCA was applied for investigation of data patterns, as well as for data interpretability and visualization of patients' characteristics. More specifically, dimensionality reduction is the process of taking data in a high-dimensional space and mapping them into a new space whose dimensionality is much smaller [17]. Dimensionality reduction is categorized in the ML subgroup of unsupervised learning, which includes algorithms that work solely on data without prior knowledge of any input or output variables [13,16,17]. However, the number of machine learning algorithms that analyze categorical data (nominal and ordinal) is limited. This means that machine learning techniques that rely on numerical data could not be used, because our dataset consisted almost completely of nominal/ordinal variables. Other suitable algorithms, like random forest and logistic regression, were not used since their scope did not fit the goals of this study. For example, logistic regression is used for classification purposes, that is, for predicting the probability of an event occurring (e.g., belonging to the mono- or polysensitized group) given some input features (e.g., patients' characteristics) [16,17]. Consequently, MCA was the most appropriate algorithm for this analysis.
In addition, it is commonly advised to perform machine learning on large datasets, which represent the diversity and complexity of the investigated problem, in order to prevent overfitting. If the model becomes "overfitted", it is unable to generalize well to new data and ultimately fails to perform the classification or prediction tasks that it was intended for. Usually, a dataset should ideally contain a number of records at least ten times the number of features in order to ensure sufficient data points for effective model training. The quantity of data required for conducting MCA can also depend on various factors, such as the number of categorical variables, the number of categories within each variable, and the intricacy of the interrelationships among the variables [17]. In our case, the number of tested variables never exceeded five, which means that the number of participants (i.e., 400 in total, or 200 per group) was more than adequate to obtain robust results.
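The arithmetic behind this rule of thumb can be written out as a small check. The function name and the `ratio` parameter are hypothetical, and the rule itself is a heuristic rather than a formal power calculation.

```python
def meets_rule_of_thumb(n_records, n_features, ratio=10):
    """Check the heuristic of at least `ratio` records per feature."""
    return n_records >= ratio * n_features


# Figures stated in the text: at most 5 variables, 200 patients per group.
print(10 * 5)                       # minimum records suggested by the heuristic
print(meets_rule_of_thumb(200, 5))  # per-group sample
print(meets_rule_of_thumb(400, 5))  # total cohort
```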
MCA allowed the influence of patients' profiles on ACD prevalence to be assessed. Although some authors have described a decreased risk of sensitization in AD patients, the penetration of haptens seemed to be higher in patients with persistent AD due to the established barrier dysfunction [18,19]. Interestingly, in this study positive correlations were identified between AD and family AD history in the total ACD patient cohort.
The MOAHLFA index calculation is an essential parameter in the clinical evaluation of ACD [6][7][8]. In the total patient cohort, the most common ACD anatomical region was the hand, which is in agreement with the ESSCA patch test database [20]. Based on the chi-square analysis, significant associations were found between the patient group and HD, while MCA revealed additional correlations between the patient group and all anatomical regions of ACD in the following descending order: HD > TD > FD > LD. Therefore, haptens seem to be transferred among the affected anatomical body sites, such as from hands to face [11].
Comparing patient groups, chi-square analysis found significant associations between HD and occupation in both patient groups, but between HD and gender only in monosensitized patients. MCA revealed additional and similar correlations among gender, age, and occupation, indicating that these are important risk factors for both individual susceptibility and polysensitization [5]. Moreover, MCA showed that in the polysensitized patients, HD was most correlated with LD, then with FD and TD, while in the monosensitized patients, HD was most correlated with FD and independent of LD and TD. This supports the hypothesis that patients with HD and patients with LD and/or chronic leg ulcers are likely to be polysensitized [3][4][5].
MCA showed that polysensitization was induced mainly by the combination of medicines, metals, and fragrances, revealing a unique link among these allergen categories. Dyes-colorants might provoke severe barrier dysfunction [11], since they were found to be strongly linked with AD. Finally, the anatomical regions of HD and FD were significantly related to medicines, followed by dyes-colorants, indicating that hand and head are the most exposed body sites to contact haptens [5,20].
In dermatology, image-based screening technologies are developed for the diagnosis and management of skin diseases [14,15,35,36,37]. Recent studies have been conducted to assess the effectiveness of machine learning algorithms and convolutional neural network models in precisely detecting and diagnosing ACD from patch test images. In both studies, a cohort of 200 ACD patients was examined using a new medical device, the Antera® 3D camera (Miravex Limited, Ireland), and the acquired spectral 3D images were used to map chromophore concentrations (hemoglobin and melanin) and skin parameters (texture, volume, folds, and fine lines) [15,35]. In the first study, the results indicated that the synergy of convolutional neural networks (CNNs) and machine learning algorithms can achieve a success rate of 85% in ACD detection, indicating a high level of correct diagnostic predictions [15]. Convolutional neural networks exhibited high accuracy in ACD diagnosis based on the hemoglobin concentration, while the textural information (texture, volume, folds, and fine lines) was insufficient for classifying a positive allergic reaction [35]. Furthermore, ML algorithms offer the ability to recognize unique patterns in the datasets from extensive repositories of patient information [13,14]. A retrospective analysis of ACD, in which the MCA algorithm was used, revealed unique associations among ACD onset, patch test positive reactions, and patients' demographics [38]. In particular, hands were found to be the most affected body site in ACD patients; in addition, the occupation class was found to be correlated with the anatomic site of dermatitis in the following descending order: HD > FD > LD > TD. The type of allergen and gender were also found to be correlated with occupation class [38]. The above findings are in accordance with the results of this study.
The MCA technique has recently been applied to investigate data patterns in different dermatological disorders [39][40][41]. In a patient cohort with atopic dermatitis, significant correlations were found among disease severity, gender, age, treatment strategy and quality of life [39]. MCA was also applied to datasets from breast cancer patients undergoing radiotherapy to explore the symptom patterns of radiation-irritated skin [40]. MCA plots contributed to the investigation of psychometric variables in patients with psoriasis, unveiling that the highest levels of depression/anxiety were associated with low income, middle age and female gender [41]. The results of this study complement the findings of a previous series of studies on ACD, which were conducted using classical statistics [5][6][7][20][42][43][44][45][46][47][48][49][50][51].
The application of ML algorithms on patch test datasets/images was found to improve the process of contact sensitization detection [14,15,35]. Furthermore, ML algorithms such as MCA can unveil unique relationships among psychometric, clinimetric, and demographic variables [38][39][40][41]. So, the integration of AI technology in dermatology can contribute to early ACD diagnosis, enhance its precision and offer an individualized treatment strategy. The automation of diagnosis can also reduce clinician workload and diagnosis time, as well as facilitate a wider range of treatment options, especially for marginalized regions. Overall, the application of AI in clinical practice can significantly improve the management of ACD patients and their quality of life, and contribute to an up-to-date surveillance of contact sensitization prevalence. AI has the potential to fundamentally transform the healthcare system and improve patient monitoring [13][52][53][54].
A limitation of this study was that the sample size was too small to investigate more subtle differences between the mono- and polysensitized groups. It should not be disregarded that the question of the relationship between poly- and monosensitization is rather wide; to provide an overall answer, all possible associations would have to be explored, namely, every possible allergen would have to be used for monosensitization. However, this cannot be implemented in a single study, since the analysis performed here would have to be repeated at least 30 times (once for each of the 30 specified allergens). Thus, many studies are required to provide an overall understanding.
In conclusion, the MCA analysis allowed identifying the link between patients' demographic and clinical characteristics. The MCA analysis was used to complement the chi-square findings and reveal the relationships among the variables in a compact way. This study showed how the application of machine learning can identify unique patterns in the data.

Conclusions
Patients diagnosed with ACD face an increased risk of developing additional delayed-type hypersensitivity reactions. This study aimed to explore contact sensitization patterns using machine learning techniques to better comprehend the connections among ACD, polysensitization, individual susceptibility, and patient characteristics. Through MCA, we were able to assess how patients' profiles influence ACD prevalence and reveal associations not observable through traditional statistics alone. The analysis highlighted that polysensitization predominantly originated from combinations of medications, metals, and fragrances, indicating a direct link among these allergen categories. Dyes and colorants were identified as potential triggers for severe barrier dysfunction, as they exhibited strong associations with AD. Moreover, anatomical regions such as the hands and face were significantly associated with medications, followed by dyes and colorants, suggesting that these body sites are most susceptible to contact allergens. To the best of our knowledge, this is the first study to utilize machine learning to analyze contact hypersensitivity.
Author Contributions: A.K.: investigation; writing-original draft; methodology; validation; writing-review and editing; software; formal analysis; data curation; resources. A.T.: writing-review and editing; data curation; supervision; resources. A.S.: writing-review and editing; project administration; supervision; resources. V.D.K.: writing-original draft; methodology; validation; visualization; writing-review and editing; project administration; supervision; resources; conceptualization; formal analysis. All authors have read and agreed to the published version of the manuscript.

Figure 3. Multiple correspondence analysis of patient group (polysensitized patients or monosensitized patients) in relation to the anatomical site. Key: HD, hand dermatitis; LD, leg dermatitis; FD, face dermatitis; TD, trunk dermatitis.

Figure 5. Multiple correspondence analysis of polysensitization. The analysis was performed for (A) allergen category (dyes, colorants, medicines, metals, fragrances), (B) atopic dermatitis (AD) in relation to allergen category, and (C) anatomical regions of allergic contact dermatitis. Key: HD, hand dermatitis; FD, face dermatitis.

Table 1. Baseline characteristics of the patients.

Table 4. Results of chi-square analysis according to hand dermatitis.
Key: p-value refers to chi-square test (at the 5% nominal significance level).