Contact Allergy to Fragrances: In Vitro Opportunities for Safety Assessment

The majority of cosmetic products contain fragrances to make them more pleasant to the consumer. Unfortunately, contact allergy to fragrance compounds is among the most frequent findings in patients with suspected allergic contact dermatitis. To reverse this trend and reduce contact allergy to cosmetics, it is imperative to improve the safety assessment of cosmetic products for skin sensitization. In the era of the ban on animal testing for cosmetic ingredients, this represents a challenge. Fortunately, substantial progress has been made in recent decades in understanding the mechanism of chemical-induced contact allergy, and several in vitro methods are available for hazard identification. The purpose of this manuscript is to explore the possibility of non-animal testing for quantitative risk assessment of fragrance-induced contact allergy, which is essential for cosmetic products because they cannot be tested on animals.


Introduction
The use of fragrances is widespread: they can be found in everyday products, including hygiene products, perfumes, and fragranced cosmetics, but also in household cleaning products. This widespread use increases the risk of exposure and of development of contact allergy in genetically predisposed individuals [1]. Skin sensitization has been identified as the critical adverse effect for a wide range of fragrances, and the disease can be severe, with a significant impairment of quality of life [1]. The most common fragrance allergens include isoeugenol, cinnamyl alcohol, eugenol, farnesol, and citral, while lilial, hexyl cinnamal, citronellol, linalool, and limonene are less common [2].
Allergic contact dermatitis (ACD) is a predominantly CD8+ T cell-mediated immune disease directed toward low molecular weight chemicals (typically < 500 Da), termed haptens, that contact and penetrate the skin, resulting in erythema and eczema [3]. ACD is common, and epidemiological studies suggest that it affects up to 20% of the population [4]. More than 4000 substances are currently identified as contact allergens, and approximately 30% of the chemicals registered at ECHA were classified as allergens (hazard). Regarding fragrances, in a Belgian study of over 10,000 dermatitis patients tested for suspected contact dermatitis over a period of 15 years, 14.5% were positive to at least one of fragrance mix I or II, Myroxylon pereirae, or hydroxyisohexyl 3-cyclohexene carboxaldehyde [5].
Typically, ACD develops in two phases: the sensitization or induction phase, which leads to the priming of hapten-specific CD8+ T cells, and the elicitation phase, which occurs after challenge and results in the development of skin inflammation. To function as a contact sensitizer, a chemical must penetrate the skin, react with endogenous carrier proteins to form a complete antigen, and cause epidermal and dermal inflammation to induce dendritic cell

Quantitative Risk Assessment for Skin Sensitization
Quantitative risk assessment for skin sensitization is based on the knowledge that both the induction and elicitation phases are dose-dependent phenomena, which implies that a threshold of reactivity can be identified under foreseeable conditions of exposure [11]. Therefore, the fact that a compound is a skin allergen does not mean that it cannot be used in a cosmetic product, provided that it is used at levels well tolerated by most individuals [12]. Currently, quantitative risk assessment for skin sensitization focuses on preventing the induction phase, while proper labeling of the product is used to protect people who are already allergic to a specific component. The main reason is that inducing sensitization to a particular chemical requires higher doses than eliciting a reaction in a previously sensitized subject. The induction threshold is therefore easier to establish than the elicitation threshold, which depends strongly on individual sensitivity and on the history that led to sensitization in the first place. For some fragrances, however, elicitation data are available. For example, for linalool hydroperoxide allergic reactions are observed down to a concentration of 0.056%, while for hydroxyisohexyl 3-cyclohexene carboxaldehyde 50% of the allergic patients react to concentrations of 0.18-0.34%, depending on the product [1].
Quantitative risk assessment for skin sensitization is different from the safety evaluation of cosmetic products and the calculation of the margin of safety for the different ingredients, which is meant to prevent systemic toxicity. Quantitative risk assessment for skin sensitization is done to prevent the induction of new sensitization.
In the quantitative risk assessment for skin sensitization, appropriate safety factors must be introduced to take into consideration that the sensitization threshold can be influenced by the formulation in which the chemical is included, and vary between individuals [12].
The formula used to calculate the acceptable exposure level (AEL) is based on the estimation of the no expected sensitization induction level (NESIL) and the application of a sensitization assessment factor (SAF):
AEL = NESIL / SAF
The NESIL is established based on weight of evidence, which takes into account all pre-clinical and/or clinical data, i.e., human data (human repeated insult patch test NOEL in µg/cm²) and the local lymph node assay (EC3 value in µg/cm²). It is important to express values in µg/cm², as the relevant dose metric for skin sensitization is the amount of chemical allergen per unit area of skin [12,13]. The local lymph node assay (LLNA) is the preferred animal method for the identification of contact allergens, as it provides an accurate and reliable identification of skin sensitization hazards and is useful for relative potency information [14].
The SAF is the product of three factors covering inter-individual variability (SAF = 10), matrix effects (SAF = 1-10), and use considerations, i.e., rinse-off vs. leave-on products, frequency, and site of exposure (SAF = 1-10). Thus, the overall SAF can range from 10 to 1000.
The AEL is then compared with the consumer exposure level (CEL, in µg/cm²), which is calculated in a similar way to the exposure used for the margin of safety of cosmetic ingredients for systemic toxicity. Dermal aggregate exposure should also be taken into consideration, especially for fragrances, which are contained in several cosmetic products used daily.
To support safe use of a potential skin sensitizer, the risk is considered acceptable if AEL ≥ CEL (i.e., an AEL/CEL ratio ≥ 1): the estimated daily exposure to an ingredient must be equal to or below the acceptable exposure level for the use of this ingredient to be considered safe. Following this procedure, the risk of inducing new sensitization should be reduced. The possibility of conducting a quantitative assessment of the risk of inducing new sensitization depends strictly on having a NESIL available, and this represents a challenge for new cosmetic ingredients, as animal tests are no longer allowed.
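The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the relation AEL = NESIL / SAF, treats the overall SAF as the product of the three factors with the ranges quoted in the text, and counts AEL equal to CEL as acceptable, following the "equal or below" wording. The class and function names and the numerical inputs are hypothetical and do not describe any real ingredient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensitizationAssessmentFactor:
    """Three SAF components, with the ranges quoted in the text."""

    inter_individual: float = 10.0  # inter-individual variability (10)
    matrix: float = 1.0             # matrix effects (1-10)
    use: float = 1.0                # use considerations (1-10)

    def total(self) -> float:
        """Return the overall SAF as the product of the three components."""
        for name in ("matrix", "use"):
            value = getattr(self, name)
            if not 1.0 <= value <= 10.0:
                raise ValueError(f"{name} factor must be within 1-10, got {value}")
        return self.inter_individual * self.matrix * self.use


def acceptable_exposure_level(nesil: float, saf: float) -> float:
    """AEL in µg/cm², assuming AEL = NESIL / SAF."""
    if nesil <= 0 or saf <= 0:
        raise ValueError("NESIL and SAF must be positive")
    return nesil / saf


def risk_is_acceptable(ael: float, cel: float) -> bool:
    """True when the consumer exposure level does not exceed the AEL."""
    return cel <= ael


# Hypothetical ingredient: all numbers are invented for illustration only.
saf = SensitizationAssessmentFactor(matrix=1.0, use=10.0).total()  # 10 * 1 * 10
ael = acceptable_exposure_level(nesil=1000.0, saf=saf)             # µg/cm²
print(saf, ael, risk_is_acceptable(ael, cel=2.5))
```

In this invented case the overall SAF is 100, the AEL is 10 µg/cm², and a CEL of 2.5 µg/cm² would be considered acceptable. Because NESIL, AEL, and CEL all share the dose metric µg/cm², the comparison needs no unit conversion.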

In Vitro Possibility to Establish a NESIL
For a new cosmetic ingredient, or for a new formulation containing ingredients known to have the potential to induce contact allergy, i.e., fragrances or preservatives, animal tests can no longer be used according to the European 7th Amendment to the Cosmetics Directive; it is therefore imperative to have alternatives in place. Currently, test strategies for skin sensitization based on the AOP are preferred. Regarding alternatives to animal testing for potency assessment, there has been relatively modest progress, besides the local lymph node assay, in the definition of appropriate experimental in vivo and, consequently, in vitro models. Robust and reliable methods for the assessment of relative skin sensitizing potency are central to the classification of contact allergens under the Globally Harmonized System of Classification and Labelling of Chemicals (GHS) and to the development of accurate risk assessment [10,15-17]. Based on human data, 89 fragrances have been classified according to their human skin sensitization potency. The scheme comprises six categories, where category 1 (extreme) includes chemicals with a human repeated insult patch test/human maximization test NOEL < 25 µg/cm² and a frequency > 3% in the diagnostic patch test, and category 6 comprises the non-sensitizers [18]. None of these fragrances was assigned to category 1, whereas 11 were assigned to category 2, 22 to category 3, 37 to category 4, and 19 to category 5. Fragrances in category 2 include oakmoss, citronitrile, trans-2-hexenal, jasmalone, creosol, isoeugenol, pomarose, 6-methyl-3,5-heptadien-2-one, trans-damascone, tea leaf absolute, and methyl heptine carbonate [18].
Currently, understanding the biological mechanisms underlying allergen potency, and the possibility of defining potency using in vitro methods, represent an area of intense research, where potency is defined as the dose and/or the frequency of occurrence of skin sensitization. It is believed that the extent of chemical allergen-induced keratinocyte/dendritic cell activation/maturation and lifespan may drive T cell polarization (e.g., Th1, Th2, Treg) and the magnitude of activation, which then result in different in vivo potency [15].
Mathematical models to predict skin sensitization potency class useful for classification and labeling have been proposed [19]. As reported by Zang [19] based on in silico and in vitro methods, the best one-tiered model predicted LLNA outcomes with 78% accuracy, and human outcomes with 75% accuracy (LLNA predicts human potency categories with 69% accuracy). While these approaches may provide valuable information for assessing skin sensitization class of potency (e.g., strong and weak), they do not provide data useful for quantitative risk assessment of skin sensitization. Therefore, additional efforts are clearly needed both from a biological and regulatory point of view. One important limitation of the individual methods developed for the hazard identification of contact allergens and the testing strategy is that they are unable to assign a NESIL to a chemical, which is one of the main advantages of the local lymph node assay.
Currently, methods based on full thickness skin models are being explored to resolve this issue (e.g., epidermal equivalent IL-18 potency assay, SENS-IS, SenCee Tox, EpiSensA). Reconstituted human epidermis models more closely resemble the structure of the human epidermis, and their dry surface allows topical application of chemicals of any physical-chemical properties, which mimics human exposure and makes it possible to test difficult chemicals, i.e., chemicals with very high LogP (>4) that may be insoluble or incompletely soluble in aqueous media, or chemicals that are unstable in water.
In Figure 1, one such method, the epidermal equivalent IL-18 assay, is described [20,21].
Cosmetics 2018, 5, x FOR PEER REVIEW 4 of 10
Figure 1. The epidermal equivalent IL-18 assay. Ingredients (or final products) are topically applied to the epidermal equivalent surface for 24 h. Tissue viability is assessed by the MTT reduction assay, while IL-18 is measured in the culture medium by ELISA. The in vivo values are then estimated using standard curves constructed from data obtained by testing reference allergens of different potency (e.g., DNCB, isoeugenol, eugenol, benzocaine).
(A) Identification of contact sensitizers by assessing IL-18 release [20]. Using a cut-off of a 5-fold increase in IL-18 above vehicle control in the epiCS™ model or a 2-fold increase in the EpiDerm™ RhE model, depending on the laboratory, 100% sensitivity and 88-100% specificity were obtained (n = 17 allergens; n = 13 non-sensitizers), with an overall accuracy of 95% [20,21]. The in vitro RhE IL-18 potency test data (EE-EC50) showed a superior correlation with human DSA05 (µg/cm²) data (Spearman r = 0.850; P = 0.0061) compared to LLNA data (Spearman r = 0.597; P = 0.0542). Similarly, for the release of IL-18 (SI-2), the correlation with human DSA05 data (Spearman r = 0.8333; P = 0.0154) was better than the correlation with LLNA data (Spearman r = 0.627; P = 0.060). Based on these results, we proposed estimating the in vivo values of an unknown chemical by interpolating its in vitro data on curves built using known contact allergens. An example is provided in panel B.
(B) Identification of potency based on in vitro EE-EC50 or IL-18 SI-2 values and interpolation of in vivo LLNA EC3 (%) or human DSA05 (µg/cm²) values.

In the graphic shown in Figure 1, data variability for both in vitro and in vivo data (where available) was omitted on purpose, as a single value (i.e., the NOAEL) is currently still used for the calculation of the margin of safety. Data variability can be easily included. At present, only a limited number of chemicals have been tested; additional chemicals need to be tested to confirm the results and the usefulness of the proposed method, but the results are very encouraging.
The advantage of the epidermal equivalent IL-18 assay is that it aims to estimate a NESIL and not just a potency class, as the other methods do; a potency class is essential for classification and labeling but is not sufficient for quantitative risk assessment. Recently, Natsch et al. [22] proposed a workflow using structural information, reactivity data, and KeratinoSens results to predict an LLNA result as a point of departure, suggesting that sensitization risk assessment without animal testing is possible in most cases. In addition, as mentioned above, potency class might be estimated through Quantitative Mechanistic Modeling or Bayesian statistics [8,9,19,23,24].
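The interpolation idea behind the epidermal equivalent IL-18 assay, i.e., reading an in vivo value for an unknown chemical off a curve built from reference allergens, can be sketched as follows. The shape of the standard curve is not specified here, so the sketch assumes a straight line on log-log axes fitted by ordinary least squares. The function names and all numerical values are hypothetical placeholders, not measured data.

```python
import math
from typing import Sequence, Tuple


def fit_log_log_curve(
    in_vitro: Sequence[float], in_vivo: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares line through (log10 in vitro, log10 in vivo) points.

    Returns (slope, intercept) of log10(in_vivo) = intercept + slope * log10(in_vitro).
    """
    if len(in_vitro) != len(in_vivo) or len(in_vitro) < 2:
        raise ValueError("need at least two paired reference values")
    xs = [math.log10(v) for v in in_vitro]
    ys = [math.log10(v) for v in in_vivo]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("reference in vitro values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_in_vivo(in_vitro_value: float, slope: float, intercept: float) -> float:
    """Interpolate an in vivo value (e.g., in µg/cm²) from an in vitro value."""
    return 10 ** (intercept + slope * math.log10(in_vitro_value))


# Placeholder reference set: three invented allergens, arbitrary in vitro units.
reference_in_vitro = [1.0, 10.0, 100.0]   # e.g., EE-EC50 of reference allergens
reference_in_vivo = [10.0, 100.0, 1000.0]  # e.g., human DSA05 in µg/cm²

slope, intercept = fit_log_log_curve(reference_in_vitro, reference_in_vivo)
print(estimate_in_vivo(50.0, slope, intercept))
```

An estimate obtained this way should only be read within the range spanned by the reference allergens. Extrapolating beyond the most or least potent reference would need separate justification.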

In Vitro Methods, Fragrances, and Hazard Identification
For hazard identification, several fragrances have been tested in different in vitro methods. A very useful database can be found in a recent review by Urbisch et al. [25]. From this paper, the data for 53 fragrances have been extracted and reported in Table 1. In vivo and in vitro data, including human evidence and animal results, are reported. Results are ranked according to human evidence: negative, inconclusive, positive, and fragrances with no human data. At a glance, one can immediately see the presence of discordant results, indicating that no test, including the LLNA, is perfect, and highlighting the need to perform more than one test before a conclusion on the risk (or not) of sensitization for humans can be reached. In this regard, the review published by Roberts and Patlewicz provides important insights into the comparison of the predictive performance of the validated methods [17]. It is clear that the value of using more than one assay relies on how the different assays compensate for each other's limitations, in relation to the characteristics of the chemical to be tested.
When using in vitro methods to assess the effects of fragrances, it is important to take into consideration the metabolic capacity of the experimental models, as several fragrances require metabolic activation to form the reactive species. Experimental and clinical studies have shown that fragrance substances can act as prehaptens and/or prohaptens. Both abiotic activation (for prehaptens) and biotic activation (for prohaptens) are known to be involved in fragrance activation [26]. Within the skin, keratinocytes provide the metabolic activation [27,28]. Commonly used fragrances, found in essential oils and in many cosmetic products, autoxidize on contact with air, forming hydroperoxides, or can be activated in the skin, forming potent sensitizers that can be an important cause of skin sensitization to fragrances and fragranced products. Fragrances known to be prehaptens include geraniol, D-limonene, and linalool. In Table 1, the pre/prohaptens are marked with an asterisk. In general, pure non-oxidized fragrances seldom cause positive patch test reactions in allergic patients [1]. Some fragrances, like isoeugenol, can undergo both abiotic and biotic transformation [29]. The best-known prohapten pair is cinnamyl alcohol and cinnamaldehyde, where the metabolism occurs within the skin [30]. Taking this into account, predictive testing should include activation steps, meaning that negative results should be carefully considered with respect to the metabolic capacity of the model and/or the activation of the tested compound [31].
In Table 1, in vitro results obtained with fragrances are listed according to AOP KEs. Data were extracted from the Urbisch et al. database [25]. For the in silico prediction, the OECD QSAR Toolbox v.3.2 was used. It is important to remember that different in silico models may provide different classifications. The comparison between the different in silico models and their predictive capabilities goes beyond the scope of this review. Based on in silico data, 7 chemicals were wrongly classified (24%), with six positives showing no alert (benzaldehyde, cinnamyl alcohol, eugenol, farnesol, geraniol, isoeugenol). For KE1, results were obtained using the DPRA [32]. This assay measures the ability of a chemical to react with peptides containing cysteine and lysine residues (haptenation), which is considered the molecular initiating event for chemical-induced skin sensitization [6]. Several fragrances have been tested in the DPRA with mixed results, as the system, being an in chemico method, lacks metabolic capacity: of the fragrances with clearly positive or negative human data (n = 29), 6 were misclassified (21%), with 4 in vivo positives not identified (benzaldehyde, coumarin, farnesol, geraniol). For KE2, addressing the keratinocyte response, data were obtained using the KeratinoSens [33], in which the activation of keratinocytes is measured by assessing the Nrf2-Keap1-ARE (antioxidant response element) pathway, associated with chemical allergen-induced cellular oxidative stress [34]. In this case too, both false positives and false negatives compared to human data can be observed: of the fragrances with clearly positive or negative human data (n = 29), 6 were misclassified (21%), with 3 in vivo positives not identified or with inconsistent results (3,4-dihydrocoumarin, eugenol, lilial).
Finally, for KE3, addressing dendritic cell activation, data were obtained using the h-CLAT, in which the upregulation of CD54 and CD86 is measured as an indicator of dendritic cell activation. In this case too, both false positives and false negatives compared to human data can be observed: of the fragrances with clearly positive or negative human data (n = 29), 6 were misclassified (21%), although only two in vivo positives were not identified (coumarin, isoeugenol); for safranal no data are available.
Overall, the non-animal OECD TGs yield good results but, possibly due to the applicability domains of the methods themselves, not all fragrances gave concordant results. The same applies to the LLNA, in which both false positive and false negative results compared to human data can be observed: of the fragrances with clearly positive or negative human data (n = 29), 7 were misclassified (24%), with 2 in vivo positives not identified (benzaldehyde, coumarin).
The number of fragrances tested in 3D tissue model-based assays is more limited. Nevertheless, the results hold promise [20,21,35-37], but the dataset must be expanded to move these models to the OECD guideline stage, where they can complement cell culture-based models. For example, in the IL-18 RhE assay only ten fragrances have been tested [20,21], namely benzyl cinnamate, benzyl salicylate, cinnamaldehyde, cinnamic alcohol, citral, dihydroeugenol, eugenol, isoeugenol, limonene, and methyl salicylate, all of which were correctly classified.
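The misclassification percentages quoted above follow directly from the reported counts, as the short Python sketch below shows. It assumes that every method is evaluated against the same 29 fragrances with clearly positive or negative human data. This denominator is stated for the DPRA, KeratinoSens, h-CLAT, and LLNA, and is inferred for the in silico prediction from the reported 24%. The dictionary keys and function names are illustrative only.

```python
N_FRAGRANCES = 29  # fragrances with clearly positive or negative human data

# Misclassified fragrances per method, as reported in the text.
MISCLASSIFIED = {
    "in silico (OECD QSAR Toolbox v.3.2)": 7,
    "DPRA (KE1)": 6,
    "KeratinoSens (KE2)": 6,
    "h-CLAT (KE3)": 6,  # safranal has no h-CLAT data; 29 is kept as in the text
    "LLNA": 7,
}


def misclassification_percent(misclassified: int, total: int) -> int:
    """Misclassified fraction as a percentage rounded to the nearest integer."""
    if total <= 0 or not 0 <= misclassified <= total:
        raise ValueError("counts must satisfy 0 <= misclassified <= total, total > 0")
    return round(100 * misclassified / total)


def summarize(counts: dict, total: int) -> dict:
    """Map each method to (misclassified %, concordant %) against human data."""
    summary = {}
    for method, k in counts.items():
        wrong = misclassification_percent(k, total)
        summary[method] = (wrong, 100 - wrong)
    return summary


for method, (wrong, concordant) in summarize(MISCLASSIFIED, N_FRAGRANCES).items():
    print(f"{method}: {wrong}% misclassified, {concordant}% concordant")
```

On this small dataset the individual methods and the LLNA fall within a few percentage points of each other (21% vs. 24% misclassified), which is consistent with the point that no single test, including the LLNA, is perfect.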

Conclusions
A reduction of ACD can be achieved by: (1) correct detection of skin sensitizers; (2) characterization of potency; (3) understanding of human skin exposure; and (4) application of adequate risk assessment and management strategies.
The possibility of defining potency and establishing a NESIL using in vitro methods represents an area of intense research. Some of the in vitro methods based on the use of reconstituted human epidermis offer the opportunity to characterize potency as well as to test complex natural products, mixtures, and final products. These methods should be further explored to assess safety of cosmetic ingredients and final products concerning skin sensitization before testing on humans.