Safety Evaluation of Cosmetic Ingredients Regarding Their Skin Sensitization Potential

To date, product safety evaluation in the EU is predominantly based on data on the individual ingredients of a product. Consequently, the quality and reliability of individual ingredient data are of vital interest. In this context, knowledge of the skin sensitization potential is explicitly needed for both hazard and risk assessment. Proper skin sensitization data on the individual chemicals are essential, especially when dermal contact is intended, as for cosmetics. In some cases, e.g., in the presence of irritating chemicals, the combination of individual ingredients may also need to be evaluated to cover possible mixture effects. Today, it seems unlikely or even impossible that skin sensitization in humans can be adequately described by a single test result or even by a simple combination of a few data points (in vivo or in vitro). It is becoming evident that a set of data (including human data and market data) and knowledge about the ingredient's specific sensitizing potency need to be taken into account to enable a reliable assessment of skin sensitization. A more in-depth understanding of the mechanistic details of the Adverse-Outcome-Pathway of skin sensitization could contribute key data for a robust conclusion on skin sensitization.


Introduction
Dermal contact with skin-sensitizing substances (contact allergens) may result in allergic contact dermatitis (ACD). Repeated contact with a potent sensitizer at a sufficiently high dose level may be sufficient to induce contact allergy (sensitization), and the re-exposure of a sensitized individual to that contact allergen can then result in an allergic reaction (elicitation), defined as ACD. The dose needed for induction is generally higher than the dose necessary for elicitation. To avoid ACD, sensitized individuals have to avoid further contact, even with low doses of this specific allergen [1].
For skin sensitization, skin redness and skin edema are typical responses (adverse outcome) that occur after topical re-exposure to a sensitizer. As the evaluation of skin sensitization potential has been a mandatory legal requirement for decades for most chemicals, a large amount of historical animal data from guinea pigs (e.g., Buehler test, Guinea Pig Maximization Test (GPMT)) or mice (Local Lymph Node Assay (LLNA)) exists. Much of this data can be found on the internet or in the published literature. Since March 2013, the EU cosmetic legislation has banned any animal testing for cosmetic ingredients, which is why new approaches have become enormously important. The challenging target was and is to maintain the high level of safety not only for individual chemicals but particularly for cosmetic products, and to be able to continue the development of new products without compromising their safety.

Regulatory Requirements for Chemicals
Under different legal frameworks in place before REACH (Regulation (EC) No 1907/2006) [2] came into force, e.g., the Dangerous Substances Directive (67/548/EEC), the result of animal testing was typically given as a yes/no answer. Based on the outcome of standard animal tests (GPMT, Buehler, LLNA), identified skin sensitizers had to be labelled as R43 (may cause sensitization by skin contact) [3], which today translates into H317, Category 1 (may cause an allergic skin reaction) under GHS/CLP (Globally Harmonized System/Classification, Labelling and Packaging) [4].

Data Obtained from Animal Testing
In the first instance, each chemical intended for use in cosmetic products has to fulfill the legal requirements defined under REACH and, more specifically, those of the EU Cosmetic Regulation (EC 1223/2009) [5]. Furthermore, the advisory panel for consumer safety to the European Commission, the Scientific Committee on Consumer Safety (SCCS), endorsed in its recently revised "Notes of Guidance for the Testing of Cosmetic Substances and their Safety Evaluation" specific requirements that cosmetic ingredients have to fulfill [6]. Sufficient information on the skin sensitization potential of individual chemicals/ingredients is always mandatory. The first legally accepted test methods used guinea pigs (GPMT or the Buehler test) [7]. Today, not only as a refinement [8] in the sense of the 3R-principle (Refinement, Reduction, Replacement of animal tests) [9], the LLNA in mice [10] is the preferred test system in global chemical regulation, with a high level of confidence in correctly distinguishing between skin sensitizers and non-sensitizers [11,12], although even this test is not perfect [13].
Both sensitization test methods conducted in guinea pigs were designed for hazard identification. The LLNA is additionally suitable for the assessment of dose-response (hazard quantification), because its results are expressed as numerical values that indicate the sensitizing potency of a contact allergen [14,15]. The EC3 value (the concentration required to induce a three-fold increase in T-cell proliferation above the background) is used to compare the sensitizing potency of different substances. In the meantime, two additional non-radioactive alternatives (Skin Sensitization: Local Lymph Node Assay: DA [16] and Skin Sensitization: Local Lymph Node Assay: BrdU-ELISA [17]) have been validated; they are probably able to provide similarly reliable information on sensitization but have not yet been validated for potency prediction. It is important to mention that, in contrast to the former guinea pig tests, the LLNA focuses on the induction of skin sensitization and not on elicitation. With the information on skin sensitization potency, a more appropriate basis for risk communication and management became available [18,19], which ultimately ensures optimal protection of the naïve public.
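The text defines the EC3 value but does not say how it is calculated. A common way to derive it is linear interpolation between the two tested concentrations whose stimulation indices (SI, the fold increase in proliferation over the vehicle control) bracket the value 3. The sketch below assumes that approach; the function name and the dose-response numbers are hypothetical and serve only as an illustration.

```python
from typing import Optional, Sequence, Tuple


def ec3_linear_interpolation(
    dose_response: Sequence[Tuple[float, float]],
) -> Optional[float]:
    """Estimate EC3 (in % concentration) from (concentration, SI) pairs.

    Assumption: linear interpolation between the adjacent tested
    concentrations whose stimulation indices (SI) bracket 3.
    Returns None when no adjacent pair brackets SI = 3
    (e.g., all SI values stay below 3).
    """
    points = sorted(dose_response)  # order by concentration
    for (c_low, si_low), (c_high, si_high) in zip(points, points[1:]):
        if si_low < 3.0 <= si_high:
            # Fraction of the way from the lower to the upper data point
            fraction = (3.0 - si_low) / (si_high - si_low)
            return c_low + fraction * (c_high - c_low)
    return None


# Hypothetical data: test concentration in % and the SI measured at each one
example = [(1.0, 1.5), (2.5, 2.0), (5.0, 4.0), (10.0, 7.0)]
print(ec3_linear_interpolation(example))  # SI = 3 lies between 2.5% and 5%
```

A lower EC3 value indicates a more potent sensitizer, because less substance is needed to reach the three-fold threshold.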
As the added value of potency information has been recognized by regulators, the EU regulation under GHS/CLP established, in addition to the existing "yes/no" decision (Cat 1/no) [4], two sub-classes: "1A" with an EC3 value ≤ 2% and "1B" with an EC3 value > 2%. Beyond these two sub-categories in the current regulation, a more detailed and implementable approach was introduced by the ECETOC (European Centre for Ecotoxicology and Toxicology of Chemicals), which proposed four sub-classes (weak, mild, strong and extreme) [20], and, specifically for cosmetic ingredients, by the SCCP (the EU Scientific Committee on Consumer Products), with only three sub-classes [21].
At least for those chemicals intended exclusively for use in cosmetic formulations, the testing situation changed tremendously with the legislative recast that transformed the Cosmetics Directive 76/768/EEC into a Regulation (2009/1223/EC) [5]. Animal testing for the purpose of cosmetic safety assessment in the EU has been generally banned since March 2013 [22]. In the meantime, some other countries have implemented this testing ban as well. Since then, the LLNA and other well-known animal tests are no longer allowed for cosmetic ingredients, and new tests/approaches are urgently needed.
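The two regulatory sub-classes amount to a single threshold on the EC3 value. A minimal sketch, using only the 2% cut-off given in the text (the function name is hypothetical, and the finer ECETOC and SCCP schemes are not covered because their cut-offs are not given here):

```python
def clp_subcategory_from_ec3(ec3_percent: float) -> str:
    """Map an LLNA EC3 value (%) to the GHS/CLP skin sensitization sub-class.

    Threshold as stated in the text: "1A" for EC3 <= 2%, "1B" for EC3 > 2%.
    """
    if ec3_percent <= 0:
        raise ValueError("EC3 must be a positive concentration in percent")
    return "1A" if ec3_percent <= 2.0 else "1B"


# Hypothetical EC3 values for two substances
print(clp_subcategory_from_ec3(0.5))
print(clp_subcategory_from_ec3(12.0))
```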

Data Obtained from New Testing Approach
During the last decade, much effort was put into the development of new approaches that do not rely on the use of living animals to fulfill the current legal requirements. One important step in this paradigm change was the better understanding of the Adverse-Outcome-Pathway (AOP), which was described by the OECD (Organisation for Economic Co-operation and Development) some years ago [23,24]. The relevant principles of the AOP are shown in Figure 1. The AOP is helpful, even essential, to ensure science-based investigations of key events along the chemical/biological step-by-step logic and to measure relevant responses linked to skin sensitization within the given legal framework.
In the meantime, a broad spectrum of individual in vitro methods [26] has been developed to investigate the AOP and to gain a deeper understanding of it. To date, three of these in vitro methods have been sufficiently validated and evaluated by the EURL-ECVAM (European Union Reference Laboratory for Alternatives to Animal Testing, European Centre for the Validation of Alternative Methods, at the European Commission Joint Research Centre) to fulfill the acceptance criteria for a new OECD testing guideline (TG). The h-CLAT (human Cell Line Activation Test) [27,28] is not fully accepted yet, but is likely to reach the same level as the other two, already accepted, methods soon.
Specific to the first key event in the AOP, the Molecular Initiating Event, the DPRA (Direct Peptide Reactivity Assay) [29] was developed to analyze the ability of a suspected sensitizer to bind to epidermal proteins, a fundamental step in allergen formation. The second key event (cellular response) is addressed by the KeratinoSens (ARE-Nrf2 Luciferase Test Method) [30], which measures the activation of exposed keratinocytes, and the third key event is analyzed by the above-mentioned h-CLAT, which measures the activation of dendritic cells. Such cellular responses are currently understood to be essential for a skin sensitizer. Based on the extensive experimental data gathered during the last few years, it is obvious that an isolated result from a single assay of this battery cannot give sufficient confidence to decide whether a tested chemical is a skin sensitizer or not. However, there is consensus about the value of each individual test result for a better understanding of the whole picture of skin sensitization. The applicability of the three currently prioritized in vitro tests is limited for severe cytotoxins and for chemicals that are insoluble in water or in other cell-compatible hydrophilic solvents. Furthermore, strongly fluorescent substances may interfere with the detection system of those in vitro methods that are based on fluorescence detection. Probably the most important drawback of these in vitro methods is that information on skin sensitization potency is limited and currently not robust enough to be used for the sub-classification of chemicals.
A lot of effort was spent on combining individual test results of the three test methods to ensure a proper prediction of skin sensitization (e.g., when compared to historical animal data). Whereas promising and reliable data have been published for a number of tested chemicals [31], physical/chemical limitations and other constraints do not always allow all three tests to be run, and conducting the test battery may even result in inconsistent data [32]. So far, we have to recognize that, even when performing in vitro tests close to the AOP, the postulated stringent step-by-step approach is not perfectly confirmed; e.g., for a sensitizer, a positive result in a test specific for the second key event is not always associated with a positive first key event test result. A deeper understanding of test-specific applicability domains is needed and should be targeted in upcoming research projects. It has to be mentioned that, during the validation process of the introduced in vitro methods, the in vitro results did not always confirm the historical LLNA data. On the other hand, epidemiological data endorsed the predictive power of these tests with regard to human data [31].
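The text does not state which rule is used to combine the three assays. One simple rule that has been discussed for such batteries is a majority vote ("2 out of 3"); it is used below purely as an illustrative assumption, not as the method of the cited studies. The sketch also shows the two practical problems mentioned above: an assay that cannot be run for a given chemical, and results that disagree along the key-event sequence. All names and example results are hypothetical.

```python
from typing import Dict, Optional

# The three key-event assays named in the text
KEY_EVENT_ASSAYS = ("DPRA", "KeratinoSens", "h-CLAT")


def majority_call(results: Dict[str, Optional[bool]]) -> str:
    """Combine assay outcomes with a simple majority ("2 out of 3") rule.

    `results` maps assay name -> True (positive), False (negative) or
    None (assay not applicable or not run for this chemical).
    Assumption: two concordant results decide; otherwise "inconclusive".
    """
    calls = [results.get(name) for name in KEY_EVENT_ASSAYS]
    positives = sum(1 for call in calls if call is True)
    negatives = sum(1 for call in calls if call is False)
    if positives >= 2:
        return "sensitizer"
    if negatives >= 2:
        return "non-sensitizer"
    return "inconclusive"


# Key event 2 and 3 positive although key event 1 (DPRA) is negative
print(majority_call({"DPRA": False, "KeratinoSens": True, "h-CLAT": True}))
# One assay not applicable (e.g., solubility) and the other two disagree
print(majority_call({"DPRA": True, "KeratinoSens": None, "h-CLAT": False}))
```

An "inconclusive" outcome in such a scheme is a reason to bring in further data sources, not a classification in itself.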

Assessment of Sensitizing Potential Using A Weight of Evidence Approach
With the remaining questions regarding the AOP and some important limitations in the applicability of today's in vitro tests, there is a continuous decrease in the reliable data necessary for a correct decision on a chemical's skin sensitizing potential. Driven by activities at the EURL-ECVAM and OECD level, a promising and probably more appropriate process to evaluate such biologically complex endpoints has already been started. The proposal suggests taking all existing data in the context of skin sensitization into account: chemical/physical characteristics, experimental data from historical in vivo tests, in vitro tests and, last but not least, appropriate dermatological data [33]. An approach dealing with such diverse and complex data needs clear rules and quality controls to finally result in a reliable Integrated Approach to Testing and Assessment (IATA) [34].
The usefulness of human data in this process is already acknowledged under CLP, which states: "Relevant information with respect to skin sensitization may be available from case reports, epidemiological studies, medical surveillance and reporting schemes based on human patch testing." [4] (p. 33). Typically, human data from an HRIPT (human repeated insult patch test) or an HMT (human maximization test) are considered suitable under CLP to decide between the two sub-categories: "1A" for substances with positive responses at topical doses of ≤500 µg/cm² and "1B" for those that require exposure to >500 µg/cm² to produce a response. It has to be mentioned that, for ethical reasons, the HRIPT may only be performed if, based on the available prior information, induction is not expected. Thus, only historical HRIPT data are available to support the above-mentioned sub-classification.
Specifically for the evaluation of cosmetic ingredients with regard to skin sensitization, the SCCS very recently published its view in the "Memorandum on use of Human Data in risk assessment of skin sensitization" [35]. The SCCS prefers the use of so-called diagnostic patch testing data and stated the need to co-evaluate both human data and (historical) animal data in a final safety assessment. When proper human results from standardized testing are available, these data should be given the highest weight of evidence.
As animal tests are no longer allowed for cosmetic ingredients and the introduced in vitro methods have their specific limitations, there is an urgent need to take all data sources into account, as illustrated in Figure 2. Any feedback from the market plays an important role in this proposal and should be understood as an improvement of the "final decision" or as a post-validation follow-up on skin sensitization. The cosmetic industry has established an early alert system, which, at the same time, fulfills the legal requirement to conduct market surveillance (cosmetovigilance). This system makes it possible to react quickly to unexpected adverse events in the market and may ultimately lead to the withdrawal or reformulation of a consumer product. Beyond such well-established "one-direction" feedback, which indicates a statistically relevant occurrence of skin effects, it may even be useful as a system to confirm the absence of any relevant skin sensitization risk in the market for identified chemicals. Knowledge of the absence of adverse skin effects in the market (via the mentioned market surveillance) under typical and realistic exposure conditions is of particular importance. In this context, the "exposure" is described by the number of products sold, the specific use concentration of the ingredient/chemical of interest and the typical skin contact scenario (duration, frequency, rinse-off, leave-on). Such a paradigm shift, to also analyze the "non-complaints", could be useful to respect the complexity of the skin sensitization response and to guarantee the necessary high level of safety, with regard to possible mixture effects but also to the use of sometimes very small concentrations in consumer products.

Product Safety Evaluation
The safety evaluation of cosmetic products is based on information on the individual ingredients, which are classified according to CLP/GHS, possibly including information on sensitization potency. For non-cosmetic products, the classification of the ingredients and their individual concentrations determine the labelling of the final product. This principle, which is described in more detail below, is also relevant for the safety evaluation of cosmetic products, although they do not need to be classified according to CLP/GHS.
Mixtures other than cosmetic products have to be classified as skin sensitizing in line with the former Dangerous Preparation Directive (DPD, 1999/45/EC) [36], or according to GHS/CLP [4], when the concentration of an identified skin-sensitizing ingredient in the product is equal to or exceeds 1%. Testing of a mixture may impact the classification as a skin sensitizer, but under the current regulation, the presence of a classified sensitizer has to be disclosed in the product-specific Material Safety Data Sheet (MSDS) when it is present at ≥0.1% in the product.
Corresponding to the above-mentioned potency classes, applied for the sub-classification of individual ingredients, the threshold values for the classification of mixtures have been adapted accordingly. Specific cut-off values, related to the concentration and potency of individual ingredients, were proposed by the ECETOC beyond the current regulatory ones to decide on the classification of products [37]. The different approaches are contrasted in Table 1.
Table 1. Cut-off values of sensitizing ingredients leading to classification of a related mixture/product as a sensitizer. H317 = may cause an allergic skin reaction, R43 = may cause sensitization by skin contact.
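The two generic concentration limits in this paragraph can be expressed as a small check. The sketch below uses only the 1% classification limit and the 0.1% disclosure limit stated above; it does not implement the potency-specific cut-offs, and the ingredient names and concentrations are hypothetical.

```python
from typing import Dict, List, Tuple

CLASSIFICATION_LIMIT = 1.0  # % in product that triggers classification of the mixture
DISCLOSURE_LIMIT = 0.1      # % in product that triggers disclosure in the safety data sheet


def mixture_obligations(
    sensitizer_concentrations: Dict[str, float],
) -> Tuple[bool, List[str]]:
    """Apply the generic limits to the classified sensitizers in a mixture.

    `sensitizer_concentrations` maps ingredient name -> concentration (%)
    and should contain only ingredients classified as skin sensitizers.
    Returns (classify_mixture, ingredients_to_disclose).
    """
    classify = any(
        conc >= CLASSIFICATION_LIMIT
        for conc in sensitizer_concentrations.values()
    )
    disclose = sorted(
        name
        for name, conc in sensitizer_concentrations.items()
        if conc >= DISCLOSURE_LIMIT
    )
    return classify, disclose


# Hypothetical non-cosmetic product containing three classified sensitizers
print(mixture_obligations({"A": 0.05, "B": 0.3, "C": 1.2}))
```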

Besides such potency-related classification systems for product evaluation, the Quantitative Risk Assessment (QRA) [19,38], originally developed for fragrances [39], is under discussion concerning its applicability to general cosmetic ingredients and products. The QRA principles are based on the understanding that a certain quantity of an allergen must penetrate the skin to be able to induce skin sensitization. In other words, the QRA is used to evaluate the risk of becoming sensitized to an allergen via exposure to consumer products. The amount of allergen needed is determined by its individual sensitization potency and may be affected by the individual composition of the product [40]. The highest dose not inducing skin sensitization, the No-Expected-Sensitizing-Induction-Level (NESIL), is in essence adjusted by some well-defined uncertainty factors (Sensitization Assessment Factors (SAFs)) [40] in order to calculate an acceptable exposure level (AEL). In addition, a consumer exposure level (CEL) is calculated based on typical product uses. Then, the AEL is compared with the CEL, whereby the AEL should be greater than or equal to the CEL for a safe product. The NESIL may be derived from animal data (historical data or data on chemically very similar substances) or from human test data. The systematic analysis and use of the QRA concept, with the focus on fragrance materials, has been dealt with over the last few years in a cooperation between the IFRA (International Fragrance Association) and the European Commission in the IDEA project (International Dialogue for the Evaluation of Allergens), with annual reports on the progress [41].
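The comparison of AEL and CEL can be sketched in a few lines. The text only says that the NESIL is "adjusted" by the SAFs; the sketch assumes the usual arithmetic form, in which the NESIL is divided by the product of all SAFs. The function names, the SAF values and the exposure figures are hypothetical, and all dose values are assumed to share the same dose-per-area unit (e.g., µg/cm²).

```python
from math import prod
from typing import Iterable


def acceptable_exposure_level(nesil: float, safs: Iterable[float]) -> float:
    """AEL = NESIL / (product of all SAFs), in the same unit as the NESIL.

    Assumption: the SAFs act as multiplicative uncertainty factors >= 1.
    """
    factors = list(safs)
    if nesil <= 0 or any(factor < 1 for factor in factors):
        raise ValueError("NESIL must be positive and every SAF must be >= 1")
    return nesil / prod(factors)


def exposure_is_acceptable(nesil: float, safs: Iterable[float], cel: float) -> bool:
    """Criterion from the text: the AEL should be >= the CEL."""
    return acceptable_exposure_level(nesil, safs) >= cel


# Hypothetical ingredient: NESIL of 1000 µg/cm² and three SAFs
nesil_ug_cm2 = 1000.0
safs = [10, 1, 10]
print(acceptable_exposure_level(nesil_ug_cm2, safs))        # AEL in µg/cm²
print(exposure_is_acceptable(nesil_ug_cm2, safs, cel=2.5))  # low consumer exposure
print(exposure_is_acceptable(nesil_ug_cm2, safs, cel=25.0)) # high consumer exposure
```

Because the AEL scales with the NESIL, a more potent allergen (lower NESIL) tolerates a proportionally lower consumer exposure under the same SAFs.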
One of the fundamental starting points in the AOP of skin sensitization is the bioavailability of the allergen, typically its absorption by the skin and the exposure of the epidermal keratinocytes and dendritic cells. It was shown that it is not the concentration of the allergen but rather the dose per area of skin [mg/cm²] that is essential for sensitization [42]. Crossing the stratum corneum, the most important skin barrier, could be affected by other components of a complex mixture/product. Different biological effects could therefore result when chemicals are tested alone or in a complex mixture for skin sensitization. Such vehicle effects are more relevant for the potency of chemicals than for hazard identification, as has been found when testing identical chemicals in different vehicles [43].
With this understanding, it could be beneficial to obtain, in addition to the ingredient data, more information on the product side. Product composition, i.e., the combination of specific ingredients at well-defined concentrations, and habitual product use are likely to affect the occurrence of skin responses such as skin sensitization. Compared to actual product exposure, hazard data on individual chemicals tested in vivo or, nowadays, in vitro are typically and intentionally based on unrealistically high test concentrations. Today, people come into daily contact with tens of thousands of chemicals via food or by applying consumer goods, in addition to their exposure to environmental factors. The individual risk of becoming sensitized is not just triggered by the existence/distribution of an individual allergen but is also related to exposure details. It should be of great interest to gain better knowledge of their impact on the reaction to products, which is why standardized market surveillance data are of enormous value as a kind of post-validation follow-up.
Very low concentrations of individual sensitizers with low potency used in consumer products may be acceptable and will not necessarily induce the sensitizing effects predicted on the basis of preclinical ingredient data. To avoid arbitrary prejudgment of useful ingredients and to enable robust and relevant risk management, investigations in this direction should be promoted.
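The difference between concentration and dose per area is easy to see with numbers. The sketch below is a simple illustration with hypothetical values: the same product, with the same allergen concentration, is applied to two different skin areas. It ignores retention, penetration and vehicle effects, which the text identifies as further influences.

```python
def dose_per_area(product_mg: float, allergen_percent: float, area_cm2: float) -> float:
    """Allergen dose per unit skin area in mg/cm².

    product_mg: amount of product applied (mg)
    allergen_percent: allergen concentration in the product (% w/w)
    area_cm2: skin area over which the product is spread (cm²)
    """
    if area_cm2 <= 0:
        raise ValueError("skin area must be positive")
    return product_mg * (allergen_percent / 100.0) / area_cm2


# Same product amount and same 0.1% concentration, different application areas
small_area = dose_per_area(product_mg=1000.0, allergen_percent=0.1, area_cm2=100.0)
large_area = dose_per_area(product_mg=1000.0, allergen_percent=0.1, area_cm2=1000.0)
print(small_area, large_area)  # the smaller area receives a ten-fold higher dose per cm²
```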

Discussion
It is becoming obvious that the public is concerned about animal tests as the exclusive or even the most relevant source of safety data for chemicals and, in particular, for cosmetic products [44]. This trend is noticed in many countries, including China and India. It is also obvious that typical responses measured in standard animal tests and applied for the decision on safety are not always fully understood scientifically. The dilemma of limited fundamental knowledge of biological processes and regulations contributes to the current lack of reliable and robust in vitro methods. In the case of skin sensitization, much effort has been invested to reach a better understanding of such fundamentals, e.g., starting with individual chemicals [45].
Despite these limitations, the unbroken desire for new cosmetic products emphasizes the need for new safety assessment approaches. About three years after the animal testing ban for chemicals exclusively used in cosmetic products came into force, the current set of tools suitable for evaluating new chemicals for the market is limited. One reason for this is the idealistic, even simplistic, expectation of obtaining sufficient information on the complexity of skin sensitization from just one or two in vitro test results. Another reason is the currently limited experience with applying an integrated approach for a reliable and robust decision, based on a set of diverse data from different sources. It is becoming obvious that the IATA approach is needed to gain a more relevant understanding of safety details and to be able to invest in the development of new chemicals and products without the use of living animals. To be able to predict the risk of ACD, it is important to consider real exposure conditions and the mixtures that people really come into contact with [46]. Recently, the US Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) published its work on computational approaches to effectively integrate data from different sources [47]. As this promising work on data integration does not consider market surveillance data on real exposure, this missing data source may add value in future investigations. A balanced risk identification and, consequently, the right risk management are of crucial interest in our industrial world.

Conclusions
Today, it seems unlikely or even impossible that skin sensitization in humans can be adequately described by a single test result or even by a simple combination of a few data points (in vivo or in vitro). It is becoming evident that a set of data (including human data and market data) and knowledge about the ingredient's specific sensitizing potency need to be taken into account to enable a reliable assessment of skin sensitization. A more in-depth understanding of the mechanistic details of the AOP of skin sensitization and, in particular, a better understanding of test-specific applicability domains could contribute key data for a robust conclusion on skin sensitization. Well-analyzed market information on the daily use of millions of products without any health concern regarding skin sensitization, even in complex mixtures, could be an additional valid data source in the future.

Figure 1. Modified schema of the Adverse Outcome Pathway for skin sensitization developed by the OECD [23] and updated by the related AOP Wiki [25].

Figure 2. Complexity of data sourcing for a proper safety assessment on skin sensitization in the context of the current animal testing ban.
