Safety Testing of Cosmetic Products: Overview of Established Methods and New Approach Methodologies (NAMs)

Cosmetic products need to have a proven efficacy combined with a comprehensive toxicological assessment. Before the current Cosmetic Regulation N°1223/2009, the 7th Amendment to the European Cosmetics Directive banned animal testing for cosmetic products and for cosmetic ingredients in 2004 and 2009, respectively. An increasing number of alternatives to animal testing have been developed and validated for safety and efficacy testing of cosmetic products and cosmetic ingredients. For example, 2D cell culture models derived from human skin can be used to evaluate anti-inflammatory properties or to predict skin sensitization potential; 3D human skin equivalent models are used to evaluate skin irritation potential; and excised human skin is the gold standard for evaluating dermal absorption. This manuscript gives an overview of the main in vitro and ex vivo alternative models used in the safety testing of cosmetic products, with a focus on regulatory requirements, genotoxicity potential, skin sensitization potential, skin and eye irritation, endocrine properties, and dermal absorption. Advantages and limitations of each model are discussed, and novel technologies capable of addressing these limitations are presented.


Introduction
Cosmetic products need to have a proven efficacy combined with a comprehensive toxicological assessment. The 7th Amendment to the European Cosmetics Directive banned animal testing for cosmetic products and for cosmetic ingredients in 2004 and 2009, respectively. The European Cosmetic Regulation N°1223/2009 and the specific Regulation N°655/2013 then specify the data required to prove safety and support claims. Largely driven by regulatory authorities, a wide range of alternatives to animal testing have been developed and validated for safety testing of cosmetic products and adopted as test guidelines (Figure 1). This review discusses the main in vitro alternative models used in safety testing of cosmetic products and cosmetic ingredients, with a focus on regulatory requirements, genotoxicity potential, skin sensitization potential, skin and eye irritation, endocrine properties, and dermal absorption. Advantages and limitations of each model are discussed, and novel technologies capable of addressing these limitations are presented.

Overall Context
In Europe, the Cosmetic Regulation N°1223/2009 sets the framework for the safety of any cosmetic product [1]. Although many other jurisdictions do not specify the required documentation in as much detail, their regulations share the common goal of ensuring the safety of the final consumer. Ingredients with specific functions must be included in so-called "positive" lists (Annex IV for colorants, Annex V for preservatives, Annex VI for UV filters), and an ingredient with such a function must comply with the requirements of the relevant Annex. Some ingredients are prohibited (Annex II) or restricted to particular uses (Annex III).
The origin of these regulatory limitations is mainly safety. In Europe, some ingredients are evaluated by the SCCS (Scientific Committee on Consumer Safety), which publishes an opinion with safe conditions of use before the ingredient is listed in an Annex. The SCCS bases its opinions on the evidence submitted to it, combined with guidance. This is helpful, as it avoids a prescriptive demand for strict adherence to precise regulatory "guidelines". The committee regularly updates its guidance for the safety evaluation of ingredients [2,3]. In the USA, the CIR (Cosmetic Ingredient Review), established by a trade association (currently the PCPC) with the support of the FDA, prioritizes and assesses cosmetic ingredients, generally considering groups of similar substances based on chemical families or plant-derived ingredients. The CIR's reports do not include a risk assessment.
All regulated ingredients must have a favorable opinion of the SCCS, for example the recent opinions on resorcinol for use in hair dyes [4], propylparaben as a preservative (an updated opinion dismissing any concern related to endocrine disruption) [5], and octocrylene as a UV filter (another update related to endocrine disruption) [6].
However, the committee can also give its opinion on substances for non-regulated uses (titanium dioxide in inhaled products [7] or aluminum in lipsticks [8]).
National authorities may also publish opinions related to a particular concern in their country (e.g., phenoxyethanol in France [9]), or specific investigations allowing better risk management, as in the case of the "technically unavoidable concentrations" of heavy metals studied in Germany [10].
Transversal regulations can also affect the safety of the substances used in cosmetic products: the CLP Regulation (classification, labelling and packaging of substances and mixtures) [11] is of major importance for CMR (carcinogenic, mutagenic and reprotoxic) substances. CMR substances are considered the most dangerous substances; their harmonized classification in Europe is rarely based on epidemiological information (asbestos, benzene, etc.) and more generally on experimental results in animals (musk xylene, Disperse Yellow 3, etc.). Annex XVII of REACH can be of major importance for a very limited number of substances: D4 (octamethylcyclotetrasiloxane) and D5 (decamethylcyclopentasiloxane) are silicones prohibited in wash-off products at concentrations of 0.1% or above (entry 70 of Annex XVII of REACH, on restrictions) [12]. This decision was triggered not by their toxicological properties but by their fate in the environment, namely their PBT and vPvB properties (Persistent, Bioaccumulative and Toxic; very Persistent and very Bioaccumulative).
The list of SVHC (substances of very high concern) includes substances based on concern regarding reprotoxicity, carcinogenicity, endocrine disruption or effects for the environment, PBT or vPvB.
These programs are linked to each other: the general concern about endocrine disruption justified a call for data from the European Commission to revise SCCS opinions (e.g., benzophenone-3, octocrylene, benzyl salicylate) in the past two years.

Substances restricted by an Annex
When the SCCS receives a mandate from the European Commission to assess the safety of a substance for a regulated function, the opinion is based on the analysis of the scientific dossier submitted by the industry.
The scientific opinion considers each endpoint, including local tolerance (skin irritation, and phototoxicity when relevant), genotoxicity, and systemic toxicity, including reprotoxicity and sub-chronic/chronic toxicity. Characterization of dermal absorption is essential to calculate the SED (systemic exposure dose).
Exposure to the substance is estimated from its expected concentration in cosmetic products, either in one given product or in several products when broad use is expected, as for a preservative.


Substances not restricted by an Annex
Any other substance, ingredient, or impurity must be safe for the consumer, based on its toxicological profile, as required by Annex I and the Guidelines [13], using regularly updated data from suppliers or the literature.
There are two points of view: that of the supplier of the ingredient and that of the Responsible Person for a cosmetic product containing the ingredient (the Responsible Person being the legal entity in Europe responsible for the product, generally the manufacturer). They do not have the same regulatory obligations, but they should share the same purpose: consumer safety.
Any supplier of a cosmetic ingredient, like any company that manufactures and markets a substance in the European Union, must register the substance under REACH, with requirements that depend on the annual tonnage.
Although the intrinsic toxicity of a substance is independent of its production volume, the number of toxicological results required in a REACH registration dossier depends on the annual tonnage. Highly toxic substances and substances of low toxicity have the same requirements (although serious concerns can be addressed through the program for SVHC: substances of very high concern). No registration, and therefore no toxicological data, is required below 1 tpa (tonnes per annum), and the information to be submitted increases with each tonnage band.
From 1 tpa (Annex VII, the 1-10 tpa band): toxicological requirements include data on in vitro skin irritation/corrosion, in vitro eye irritation, skin sensitization, in vitro gene mutation in bacteria, and acute oral toxicity.
At 10 to 100 tpa (Annex VIII), the following are added: an in vitro cytogenicity study in mammalian cells or an in vitro micronucleus study, an in vitro gene mutation study in mammalian cells, in vivo skin and eye irritation (only where in vitro results are inadequate), possibly a testing proposal for in vivo genotoxicity, acute toxicity by a second route, short-term repeated dose toxicity (28 days), and screening for reproductive/developmental toxicity.
At 100 to 1000 tpa (Annex IX), the following endpoints are added: sub-chronic toxicity (90 days), prenatal developmental toxicity in one species, and extended one-generation reproductive toxicity.
Finally, above 1000 tpa (Annex X), long-term repeated dose toxicity (≥ 12 months, if triggered), prenatal developmental toxicity in a second species, extended one-generation reproductive toxicity, and carcinogenicity are added.
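Because the bands are cumulative, the tonnage logic can be encoded as a small lookup table. The Python sketch below assumes the REACH thresholds (Annex VII from 1 tpa, Annex VIII from 10 tpa, Annex IX from 100 tpa, Annex X from 1000 tpa) and lists only a paraphrased, non-exhaustive selection of endpoints; adaptation rules, Annex III criteria, and conditional triggers are deliberately ignored. The names `REACH_BANDS`, `required_annexes`, and `required_endpoints` are hypothetical.

```python
# Simplified model of the cumulative REACH tonnage bands (tonnes per annum).
# Endpoint labels are a non-exhaustive, paraphrased selection; adaptation
# rules, Annex III criteria and conditional triggers are deliberately ignored.
REACH_BANDS = [
    (1, "Annex VII", ["in vitro skin irritation/corrosion", "in vitro eye irritation",
                      "skin sensitization", "bacterial gene mutation (Ames)"]),
    (10, "Annex VIII", ["in vitro micronucleus or cytogenicity in mammalian cells",
                        "in vitro gene mutation in mammalian cells",
                        "reproductive/developmental toxicity screening"]),
    (100, "Annex IX", ["90-day sub-chronic toxicity",
                       "prenatal developmental toxicity (first species)"]),
    (1000, "Annex X", ["prenatal developmental toxicity (second species)",
                       "carcinogenicity"]),
]


def required_annexes(tonnage_tpa: float) -> list[str]:
    """Annexes whose requirements apply; lower bands remain applicable."""
    return [annex for threshold, annex, _ in REACH_BANDS if tonnage_tpa >= threshold]


def required_endpoints(tonnage_tpa: float) -> list[str]:
    """All listed endpoints from every band reached by the given tonnage."""
    return [endpoint
            for threshold, _, endpoints in REACH_BANDS if tonnage_tpa >= threshold
            for endpoint in endpoints]


# An ingredient made at 50 tpa falls under Annexes VII and VIII only,
# so no 90-day study is part of its standard information requirements.
print(required_annexes(50))  # ['Annex VII', 'Annex VIII']
print("90-day sub-chronic toxicity" in required_endpoints(50))  # False
```

Representing the bands as ordered thresholds keeps the cumulative nature explicit: moving up a band never removes a requirement.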
Several reviews of these methods are available, including a very recent one focused on cosmetic and REACH regulations [14]. In particular, assessing consumer safety should include the assessment of any endocrine-disrupting potential, but this endpoint is not required in REACH registration dossiers. The inclusion of such criteria in the CLP Regulation could change this in the future.
It is therefore important to realize that for ingredients produced below 10 tpa, no information is available on DNA damage (micronucleus test), and below 100 tpa, no information is available on sub-chronic toxicity or on the full reproductive cycle. A supplier of cosmetic ingredients should therefore consider the needs of the cosmetic brands (Responsible Persons in general), who must prove the safety of each ingredient.
The cosmetic brand (the Responsible Person) is responsible for the product. Studies can be performed on the product to confirm good acceptability in humans, mostly the absence of eye and skin irritation, using in vitro tests and complementary tests in humans (the grail being the use test under normal conditions of use, which confirms the absence of objective irritation and of signs of discomfort). Tests for phototoxicity or skin sensitization are rarely performed. It should be recalled that the Human Repeat Insult Patch Test (HRIPT) is considered unethical, and its historical data are usually statistically weak because of the small size of the panels [15,16]. However, the new in vitro tests for skin sensitization are quite promising, particularly if they cover multiple Key Events of the Adverse Outcome Pathway and can be applied to the finished product. Both the SENS-IS and the Genomic Allergen Rapid Detection (GARD) assays analyze the genomic response of cells exposed to the substance or product to predict sensitization, including its potency [17], and the GARD assay can quantify the dose-effect relationship, which makes it a good candidate for use in quantitative risk assessment [18]. Any test performed on the finished product, such as these two, or the tests on fish or amphibian eleuthero-embryos discussed in this article, is of particular relevance, since a large part of the product risk assessment is based on data for the individual substances.
The safety assessment therefore relies mostly on the toxicological data of the substances. These results can come from the supplier, when a REACH registration dossier exists or when the supplier voluntarily produces additional in vitro data. They can also be existing data from the literature, in silico predictions from Quantitative Structure-Activity Relationships (QSARs), or read-across. The safety assessor, working with the Responsible Person, performs a comprehensive search of existing toxicological information to write the toxicological profile of the ingredient and to identify any data gaps. In practice, toxicological profiles of ingredients often lack some information. The most frequent data gaps concern skin sensitization, DNA damage, chronic toxicity, and dermal absorption. With one exception (chronic toxicity), in vitro assays exist for all these endpoints, most of them with OECD guidelines or with good validation results. When it is decided not to perform a test (data waiving), a rationale is absolutely needed as justification. The in vitro micronucleus test is one of the missing tests that has no reason to be lacking, since an OECD in vitro guideline has existed for a long time. The Responsible Person may not realize that it is fully complementary to the bacterial mutagenicity test: the two tests investigate independent types of DNA abnormalities, both predictive of cancer.
In some cases, a reliable in silico prediction, from one or, better, a consensus of several complementary software tools, can waive or replace such tests. This solution can be cheaper than testing, and the rationale can be robust. In silico predictions are also a good strategy when combined with partial results, such as those of the bacterial mutagenicity test. That test alone is not sufficient to investigate genotoxicity, but a QSAR prediction can provide useful orientation before performing the in vitro micronucleus assay, to better understand the potential of a substance to induce DNA damage. Such approaches are widely accepted for the regulatory assessment of pharmaceutical impurities under the ICH M7 guideline [19].
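The consensus logic can be sketched in Python, loosely modelled on the ICH M7 principle that two complementary (Q)SAR methodologies (one expert rule-based, one statistical) are combined, and that only the absence of alerts from both, within their applicability domains, supports a conclusion of no concern. The `Call` enum and the `consensus` function are hypothetical names, and a real assessment also includes expert review of each prediction.

```python
from enum import Enum


class Call(Enum):
    """Outcome of one (Q)SAR model for a given structure."""
    NEGATIVE = "negative"
    POSITIVE = "positive"
    OUT_OF_DOMAIN = "out of domain"


def consensus(expert_rule_based: Call, statistical: Call) -> str:
    """Combine two complementary predictions into an orientation for testing.

    Returns "alert" if either model flags the structure, "inconclusive" if
    either model is outside its applicability domain, and "no alert" only
    when both are negative within their domains.
    """
    calls = (expert_rule_based, statistical)
    if Call.POSITIVE in calls:
        return "alert"          # experimental testing should follow
    if Call.OUT_OF_DOMAIN in calls:
        return "inconclusive"   # the prediction cannot support waiving
    return "no alert"           # may support waiving, with a written rationale


print(consensus(Call.NEGATIVE, Call.NEGATIVE))       # no alert
print(consensus(Call.NEGATIVE, Call.OUT_OF_DOMAIN))  # inconclusive
```

A positive call takes precedence over an out-of-domain one, so a single alert is never masked by the other model's lack of coverage.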
Currently, together with other methods grouped under the term NAMs (New Approach Methodologies), read-across is a major tool for predicting the systemic toxicity of a substance without any animal testing. Finding structural analogues, selecting them on relevant criteria, and predicting an endpoint-specific toxicity from the results previously obtained with those analogues is both an ethical way to use existing data and a relevant, reliable solution for predicting sub-chronic/chronic toxicity and reprotoxicity [20]. Toxicokinetic similarity (absorption, distribution, metabolism, and excretion; ADME) is one such criterion, and it should be better used in the future to enhance the application of NAMs [21].
Last but not least, although dermal absorption data would allow a more precise margin of safety to be calculated, this endpoint is rarely investigated. Yet it is as important in the calculation of the MoS (Margin of Safety) as the systemic NOAEL (or Point of Departure) and the exposure. When unknown, which is generally the case, it is set by default to 50% according to the SCCS Notes of Guidance. For some substances, a very low absorption rate can justify waiving the investigation of systemic toxicity. Mathematical modeling of dermal absorption is an important field of research [22], but no robust model is currently available. Some models showed good predictivity but were limited to small substances below 300 Da [23]. A recent preliminary retrospective analysis of ingredients with SCCS opinions showed that the physicochemical properties of a substance can distinguish those with low and high dermal absorption (the threshold being 2%) [24]. This article does not detail the requirements on impurities, which also deserve the attention of the safety assessor. CMR impurities are prohibited, but we recommend paying attention to any impurity, since it could have adverse effects.
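The weight of dermal absorption in the MoS can be illustrated with the SCCS formulas SED = E_product × C/100 × DAp/100 and MoS = PoD_sys / SED, where E_product is the estimated daily exposure to the product (mg/kg bw/day), C the concentration of the ingredient in the product (%), and DAp the dermal absorption (%). In the Python sketch below, the input values are hypothetical and the function names are ours; the 50% default and the usual acceptance threshold of MoS ≥ 100 follow SCCS practice.

```python
def systemic_exposure_dose(exposure_mg_per_kg_bw_day: float,
                           concentration_pct: float,
                           dermal_absorption_pct: float = 50.0) -> float:
    """SED (mg/kg bw/day) = E_product x C/100 x DAp/100.

    The 50 % default is used when no dermal absorption data are available.
    """
    return exposure_mg_per_kg_bw_day * concentration_pct / 100 * dermal_absorption_pct / 100


def margin_of_safety(pod_sys_mg_per_kg_bw_day: float, sed_mg_per_kg_bw_day: float) -> float:
    """MoS = PoD_sys / SED; a MoS of at least 100 is generally considered acceptable."""
    return pod_sys_mg_per_kg_bw_day / sed_mg_per_kg_bw_day


# Hypothetical ingredient: E_product = 100 mg/kg bw/day, C = 0.5 %, PoD_sys = 50 mg/kg bw/day.
sed_default = systemic_exposure_dose(100, 0.5)         # default 50 % absorption
sed_measured = systemic_exposure_dose(100, 0.5, 2.0)   # hypothetical measured 2 %
print(margin_of_safety(50, sed_default))   # 200.0
print(margin_of_safety(50, sed_measured))  # 5000.0
```

With the default 50% absorption, the hypothetical ingredient has an MoS of 200; a measured absorption of 2% raises it 25-fold, which shows how measured data can change the conclusion for borderline ingredients.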

Genotoxicity Assessment of Cosmetic Products
In the second half of the 20th century, many research teams [25] developed different kinds of tests based on different mechanisms showing direct DNA damage (DNA adducts, unscheduled DNA synthesis, DNA repair, chromosomal aberrations) to detect DNA-reactive substances that alter DNA and therefore the genetic code. In the 1970s, Bruce Ames developed the most famous bacterial reverse mutation test, the "Ames test" [26]. The most relevant mutagenicity tests were quickly adopted by regulatory authorities to identify genotoxic substances in cosmetics [27], and by cosmetics companies to optimize the methods and refine cosmetic ingredients [28]. Regulatory agencies have issued test battery strategies for genotoxicity evaluation, and the corresponding guidelines are published by the OECD.
In the safety assessment of cosmetic ingredients, assessing genotoxic potential is crucial. The 10th Revision of the SCCS Notes of Guidance [2] recommends an in vitro battery of two tests: a bacterial reverse mutation test (Ames test, OECD 471) [29] to evaluate mutagenicity, and an in vitro micronucleus test (OECD 487) [30] to evaluate chromosome damage (clastogenicity and aneuploidy). The combination of both tests allows the detection of all relevant genotoxic carcinogens [31,32]. The test system should be exposed to the test item both in the absence and in the presence of a metabolic activation system (S9 fraction from the livers of rats treated with Aroclor 1254 or with a combination of phenobarbital and β-naphthoflavone) [33].
The bacterial reverse mutation test should be performed first, as its result could end the project. The nature of the test item affects the choice of method and consequently the expected result. For pure compounds tested in the Ames test, the structure of the test item should be considered, and the metabolic activation system adapted accordingly (SCCS/1532/14). For nanoparticles, a gene mutation test in mammalian cells (OECD 476) or the mouse lymphoma assay (OECD 490) should be performed instead of the Ames test. For complex mixtures such as biological compounds or plant extracts, the presence of amino acids can produce a feeder effect; in this case, a "treat and wash" method [34,35] can be used. The presence of flavonoids, e.g., quercetin or kaempferol, in a plant extract can increase the number of revertant colonies [36]; in such cases, quantifying these substances in the extract is essential to explain the results obtained [37,38].
Before engaging in the second genetic toxicology test, an in silico assessment (Quantitative Structure-Activity Relationship (QSAR) tools such as DEREK, Multicase, or Compound Toxicity Profile) is useful to predict the clastogenic potential of a pure chemical, in accordance with the stringent quality criteria and the validation principles laid down by the OECD 487 [39]. In case of an alert, or when the prediction is outside the applicability domain, the micronucleus test should be performed following the OECD 487 guideline. Recently, this technique has been refined to avoid misleading positive results. The rodent cell lines V79, CHO and CHL were consistently more susceptible to cytotoxicity and micronucleus induction than p53-competent cells and are therefore more prone to giving misleading positive results. These data suggest that the frequency of misleading positive results can be reduced by careful selection of the mammalian cell type used for genotoxicity testing [40].
One of the strengths of the cosmetics industry is its exclusive use of in vitro tests; consequently, the in vitro micronucleus test has also been adapted to high-throughput technology. With only 10 milligrams of test item, a micronucleus test can be performed by flow cytometry [41] or with automated slide image analysis systems [42], and double labelling of telomeres and centromeres makes it possible to distinguish aneugenic from clastogenic effects [43,44].
When the results of both tests are clearly negative, the test item is considered to have no mutagenic potential. Conversely, when the results of both tests are clearly positive, the test item is considered mutagenic. In both cases, further testing is not mandatory.
When only one of the two tests gives a positive result, the test item is considered an in vitro mutagen, and further testing is required to exclude a mutagenic (clastogenic) potential.
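The interpretation of the two-test battery can be written as a small decision function. This Python sketch assumes each test yields a single call of "negative", "positive", or "equivocal"; routing equivocal combinations to a weight-of-evidence follow-up, and the function name itself, are our assumptions rather than a prescribed algorithm.

```python
VALID = {"negative", "positive", "equivocal"}


def interpret_battery(ames: str, micronucleus: str) -> str:
    """Interpret the in vitro battery (OECD 471 + OECD 487)."""
    if ames not in VALID or micronucleus not in VALID:
        raise ValueError(f"calls must be one of {sorted(VALID)}")
    results = (ames, micronucleus)
    if results == ("negative", "negative"):
        return "non-mutagenic"      # no further testing mandatory
    if results == ("positive", "positive"):
        return "mutagenic"          # no further testing mandatory
    if "positive" in results:
        return "in vitro mutagen"   # further testing required
    return "equivocal"              # weight-of-evidence follow-up (assumed), e.g. comet assay


print(interpret_battery("positive", "negative"))  # in vitro mutagen
```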
Equivocal results can be obtained for plant extract mixtures, in particular in the micronucleus test, when excessive osmolarity, pH, or concentration leads to a high level of cytotoxicity [43,44]. In this case, the toolbox for further evaluation in a WoE (weight of evidence) approach is described in the SCCS recommendation: "The comet assay [45] in mammalian cells or on 3D-reconstructed human skin [46] is a tool which can support a WoE approach in the case of a positive or equivocal gene mutation test in bacteria or mammalian gene mutation test." This battery of tests identifies the substances called initiators: they and their metabolites are DNA-reactive carcinogens. In the theory of carcinogenesis, a second kind of substances, the promoters, are non-genotoxic carcinogens. The SCCS/1602/18 (2018) recommends using the cell transformation assay (CTA) [47,48] as a new alternative to in vivo carcinogenicity studies, to detect genotoxic and non-genotoxic carcinogens.
Progress in stem cell knowledge makes it possible to propose new biological models that are closer to in vivo exposure, such as organoid models [49], or that suit a screening approach, such as the ToxTracker® model. Whole blood is also a robust alternative, as it is easily available and extensively studied. In the near future, in silico tools and AI (artificial intelligence) will become increasingly relevant for analysis and prediction, with the concept of building a "fingerprint of genotoxicity", as is done for drugs in pharmaceutical companies.

Skin Sensitization Assessment of Cosmetic Products
Skin sensitizers are chemicals that have the intrinsic potential to induce a state of hypersensitivity in humans which, upon repeated topical exposure, may result in the development of allergic contact dermatitis (ACD). Sensitization involves the activation of an adaptive immune response and the priming of immunological memory; once acquired, it is often a chronic condition, and elicitation of clinical symptoms can only be prevented by avoiding exposure to the inducing agent (see for example [50] for an excellent review). Proactive identification and evaluation of skin sensitization potential is therefore of central importance for the safety evaluation of chemicals and represents a key toxicological endpoint for regulatory authorities across multiple industries, not least for cosmetics, where the intended route of exposure is often dermal application [51].
Before a new cosmetic ingredient is placed on the European market, evaluation of its safety profile, including the assessment of skin sensitization hazard and potency, is mandatory. Following the revision of Annex VII of the REACH regulation [52], as well as the transformation of the cosmetic directive into a regulation (EC 1223/2009) [1], traditional animal models, such as the guinea pig assays (GPMT or the Buehler test) [53] or the murine Local Lymph Node Assay (LLNA) [54], can no longer be used to meet the information requirements for substances exclusively intended for use in cosmetic products. To this end, a plethora of New Approach Methodologies (NAMs), such as in chemico and in vitro methods, have been validated and incorporated into official OECD test guidelines, serving as viable replacements for animal studies. These methods are designed to target individual Key Events (KE) in the Adverse Outcome Pathway (AOP) for skin sensitization [55], which recapitulates the key mechanistic events required for the development of skin sensitization. Currently, three Test Guidelines (OECD TG 442C, 442D and 442E) describe a total of seven such methods: the KE1-based Direct Peptide Reactivity Assay (DPRA) and Amino acid Derivative Reactivity Assay (ADRA) [56], the KE2-based assays KeratinoSens and LuSens [57], and the KE3-based assays h-CLAT, U-SENS, and the IL-8 Luc assay [58]. According to the current testing paradigm, these methods should not be considered stand-alone assays, but rather components of a tiered testing strategy, a so-called defined approach (DA), in which a fixed data integration procedure is applied to the readouts of several NAMs to arrive at a final classification. Several DAs have been described for hazard identification of skin sensitizers, and their individual components, data integration procedures (DIPs), and performances have been summarized in [59]. Importantly, based on the empirical data from this publication, the accuracies of the proposed DAs, ranging from 75.6% to 85.0%, were superior to that of the LLNA (74.2%) for predicting human skin sensitization hazard. In addition to the currently adopted OECD assays, several alternative and innovative assays are in the process of being validated and adopted as official TGs [60], some showing predictive performances similar to those of the proposed DAs even when used as stand-alone assays [61]. Thus, skin sensitization testing is an ever-moving target; to provide guidance on testing and safety evaluation to the cosmetic industry, the Scientific Committee on Consumer Safety (SCCS) publishes the "Notes of Guidance for the Testing of Cosmetic Ingredients and Their Safety Evaluation" [2], ensuring that testing can be performed in compliance with EU cosmetic legislation.
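As an illustration of a fixed data integration procedure, the "2 out of 3" DA described in OECD Guideline No. 497 combines one assay per Key Event (for example DPRA, KeratinoSens, and h-CLAT) and lets two concordant results decide the hazard call. The Python sketch below is a simplification that ignores borderline results and applicability-domain checks; the function name is hypothetical.

```python
from typing import Optional


def two_out_of_three(ke1: Optional[bool], ke2: Optional[bool],
                     ke3: Optional[bool] = None) -> str:
    """Binary hazard call from up to three assays, one per Key Event.

    True = positive, False = negative, None = not (yet) performed.
    Two concordant calls are sufficient for a conclusion.
    """
    calls = [c for c in (ke1, ke2, ke3) if c is not None]
    positives = sum(calls)               # bools count as 0/1
    negatives = len(calls) - positives
    if positives >= 2:
        return "sensitizer"
    if negatives >= 2:
        return "non-sensitizer"
    return "inconclusive"                # run the remaining assay


print(two_out_of_three(True, True))          # sensitizer (third assay not needed)
print(two_out_of_three(True, False))         # inconclusive
print(two_out_of_three(True, False, False))  # non-sensitizer
```

In practice, the third assay only needs to be run when the first two disagree, which is why the third argument is optional here.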
Despite the above-mentioned progress in replacing animal experimentation, more work is still needed to address certain limitations of current NAM-based strategies. For example, it has been recognized that certain chemicals of interest to the cosmetic sector may be difficult to test in the conventional OECD-validated assays [62]. Such limitations, as far as they have been identified, are described in the individual TGs and may include constraints with testing hydrophobic ingredients, pre- and pro-haptens, and complex substances, including natural extracts in which the ingredient of concern is often present in minute concentrations within a complex mixture. Novel state-of-the-art methods currently in the OECD Test Guideline Program (TGP) and under evaluation for official TG adoption [60] have shown promise in addressing some of these limitations; one example is the Genomic Allergen Rapid Detection (GARD) assay [63,64], which is based on the measurement of a biomarker signature of genes associated with immunological pathways relevant to the sensitization process. For example, the GARD assay is compatible with a variety of solvents that can be used to increase the bioavailability of a test item [65], and a protocol is also available for testing solid materials, such as medical devices, using both polar and non-polar extraction vehicles in compliance with ISO 10993-12 [66]. Such findings may also prove useful for cosmetic-related test items, such as UVCBs or natural extracts with limited solubility in conventional assay solvents such as DMSO or water. Furthermore, several 3D models based on reconstructed human epidermis (RHE) have been developed to address some of the solubility limitations (reviewed in [62]). Most of these assays have a clearly defined readout of established biomarkers (e.g., IL-18), while others are less transparent. In a recent publication evaluating a selection of such models on a limited set of "difficult-to-test" substances against human reference data, most RHE-based assays performed similarly to, or slightly better than (depending on the specific assay), the best-performing OECD-validated assay, the h-CLAT [63], demonstrating that such assays may constitute a viable source of information within a weight-of-evidence approach for this chemical domain.
In addition to the limited applicability domain, the most obvious limitation of the current OECD-validated assays is probably that they have only been validated for skin sensitization hazard identification, not for the assessment of sensitizing potency, which is a critical component of the risk assessment of cosmetic ingredients used in consumer products. Skin sensitization is a threshold phenomenon, and a quantitative risk assessment (QRA) of individual ingredients aims to define the maximum dose of the chemical not expected to induce sensitization (the No Expected Sensitization Induction Level, NESIL) [67,68]. The general procedure for QRA, in which a continuous prediction of skin sensitizing potency serves as a point of departure (POD) that is subsequently adjusted by applying uncertainty factors, has been described for fragrances [67], and its applicability to cosmetic ingredients in general is currently being discussed. NAM-based strategies for the continuous assessment of skin sensitizing potency, for use as a point of departure in QRA, are also being developed; examples include the DA-based Artificial Neural Network Model for Predicting LLNA EC3 [69] and the recently proposed GARDskin Dose-Response model [18,70].
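The QRA arithmetic can be sketched as follows: the NESIL is divided by the product of sensitization assessment factors (SAFs) to obtain an acceptable exposure level (AEL), which is then compared with the consumer exposure level (CEL), both expressed per unit skin area. We assume the structure used in the fragrance QRA, with an inter-individual factor of 10 and product-specific matrix and use factors; the NESIL and SAF values below are placeholders, not values for any real ingredient, and the function names are hypothetical.

```python
def acceptable_exposure_level(nesil_ug_cm2: float, saf_matrix: float,
                              saf_use: float, saf_interindividual: float = 10.0) -> float:
    """AEL (ug/cm2/day) = NESIL / (inter-individual x matrix x use SAFs)."""
    return nesil_ug_cm2 / (saf_interindividual * saf_matrix * saf_use)


def exposure_is_acceptable(nesil_ug_cm2: float, cel_ug_cm2_day: float,
                           saf_matrix: float, saf_use: float) -> bool:
    """The exposure is considered acceptable when AEL >= CEL."""
    return acceptable_exposure_level(nesil_ug_cm2, saf_matrix, saf_use) >= cel_ug_cm2_day


# Placeholder values: NESIL = 900 ug/cm2, matrix and use SAFs of 3 each.
ael = acceptable_exposure_level(900, saf_matrix=3, saf_use=3)
print(ael)                                     # 10.0
print(exposure_is_acceptable(900, 5.0, 3, 3))  # True
```

The point of departure could come from any potency estimate, including the NAM-based models mentioned above, which is why a continuous potency prediction matters for QRA.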
Finally, as novel NAM-based methods are developed to replace traditional animal models for the assessment of cosmetic ingredients, their capacity to protect human health must ultimately be judged by how well they correlate with reliable information on the skin sensitizing activity of chemicals in humans, and not by how well they recapitulate the weaknesses of the "gold standard" animal tests, irrespective of the historical status of those tests as valid, adopted OECD methods. For chemicals of hitherto unknown sensitization potential, preclinical evaluation of cosmetic ingredients using the NAM strategies described above is an essential first step to ensure the safety of cosmetics; but, as described in [71], post-market surveillance, often referred to as cosmetovigilance, will remain important to ensure that cosmetic ingredients, and their concentrations in formulated products, remain safe for consumers.

Endocrine Properties Assessment of Cosmetic Products
On 13 December 2017, the European Parliament adopted scientific criteria to define endocrine disruptors, which came into force for plant protection products and biocides in 2018 [72]. This was a major step towards the future implementation of similar criteria in the regulation of cosmetics in Europe. Despite differences due to the particular context of cosmetics, a few lessons relating to endocrine assessment strategies have been learnt from this experience.
Adopted criteria for endocrine disruptors are closely related to the WHO definition of 2012 [73].An endocrine disruptor is defined by three main criteria: its endocrine mode of action, its capacity to cause an adverse effect, and the plausible link between this endocrine activity and the related adverse outcome.
Regulatory authorities require datasets that permit a conclusive assessment of the disruptive capacity of an endocrine-active substance. However, this will be difficult for cosmetic ingredients, as very few comprehensive endocrine test systems are available without resorting to animal experimentation. Alternative models will therefore be required to overcome this difficulty, providing data that contribute to the endocrine safety of cosmetics in an ethical manner.
Since 2002, experts representing OECD member countries have published test guidelines dedicated to the endocrine assessment of chemicals. These internationally recognized methods are listed, and their proper use is described, in OECD Guidance Document 150 [74]. According to this document, adversity should be assessed (using laboratory animals) to reach a conclusive assessment of an endocrine disruptor. So far, OECD-validated methods cover the EATS (estrogen, androgen, thyroid, and steroidogenesis) endocrine pathways, for which specific adverse physiological outcomes have been characterized.
It could be argued that the absence of endocrine activity removes the need to investigate physiological adversity. This opens a possible testing strategy for an ethical cosmetic approach: using a battery of validated in vitro and embryonic models to cover all major modes of action of endocrine disruptors on the EATS pathways. Cell-based assays using tumor cell lines allow assessment of the transactivation of estrogen (OECD TG 455) [75] and androgen (TG 458) [76] receptors, as well as disruption of steroidogenesis (TG 456) [77]. Nevertheless, performing these assays independently does not mimic the interactions between these mechanisms that occur in vivo, and many modes of action are not covered by in vitro tests, such as disruption of 5-alpha reductase, an endocrine target of ingredients intended to counteract alopecia [78]. The complexity of and crosstalk between endocrine pathways, together with the number of mechanisms involved, often lead to false positive or false negative results in cellular models [79,80]. Identifying an endocrine disruptor ultimately requires elucidating an adverse outcome pathway, and therefore a model with a complete endocrine system.
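The battery strategy above amounts to a coverage check: each assay addresses one EATS modality, and the question is which modalities remain unaddressed. The sketch below encodes that mapping for the guidelines named in this section, plus TG 248 (XETA, an eleuthero-embryo thyroid assay). The dictionary keys and the function name are hypothetical, and the one-assay-one-modality mapping is a deliberate simplification that, as the text stresses, says nothing about crosstalk between pathways.

```python
# EATS modalities, as used in OECD Guidance Document 150.
EATS = {"estrogen", "androgen", "thyroid", "steroidogenesis"}

# Modality addressed by each assay (simplified one-to-one mapping).
ASSAY_COVERAGE = {
    "TG455": {"estrogen"},         # ER transactivation in a tumour cell line
    "TG458": {"androgen"},         # AR transactivation in a tumour cell line
    "TG456": {"steroidogenesis"},  # steroidogenesis assay
    "TG248": {"thyroid"},          # XETA eleuthero-embryo thyroid assay
}


def uncovered_modalities(battery):
    """Return, sorted, the EATS modalities that no assay in the battery addresses."""
    covered = set().union(*(ASSAY_COVERAGE[assay] for assay in battery))
    return sorted(EATS - covered)


# The three cell-based assays alone leave the thyroid axis uncovered.
print(uncovered_modalities(["TG455", "TG458", "TG456"]))
```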
As indicated by the SCCS guidance notes [3], because endocrine mechanisms are conserved across vertebrate species, data provided by "some ecotox tests may be informative for the assessment of the endocrine activity of a compound in humans". This is of great value, as the additional information provided by ecotoxicological tests substantially strengthens the weight of evidence available for the endocrine assessment of cosmetic ingredients.
Embryos of aquatic vertebrates provide ethical and useful models to assess the endocrine activity of cosmetic ingredients or products in a whole endocrine system. In 2019, the OECD published the first eleuthero-embryo-based test to assess thyroid activity, Test Guideline 248 (XETA) [81]. The term eleuthero-embryo refers to early post-hatch life stages that still depend on maternally deposited energy reserves; under the EU definition of a laboratory animal [82], these stages are not regarded as laboratory animals, which makes them eligible for cosmetic testing. This first eleuthero-embryonic model for measuring thyroid activity paved the way for a series of embryonic models derived from fish and amphibians that carry fluorescent reporter constructs integrating hormone-responsive elements.
Among the assays in the OECD validation process, the EASZY and REACTIV assays are dedicated to measuring estrogenic activity. These models carry specific targets that reveal the brain's response to estrogens (EASZY) [83] and estrogenic control of reproduction (REACTIV) [84]. The RADAR assay [86], which measures androgenic activity related to male reproductive behaviors, is also included in the OECD work program on endocrine disruptors and in the EFSA/ECHA guidance document on endocrine disruptor assessment [85].
These embryonic models allow endocrine activity to be detected and quantified through fluorescence measurements. Even if these aquatic embryonic models are not necessarily predictive of effects in humans, they make it possible to detect endocrine activity and constitute a useful screening tool.
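As a rough illustration of how a fluorescence readout becomes an activity call, the sketch below expresses the mean reporter fluorescence of exposed embryos as a fold induction over solvent controls and flags concentrations above a threshold. This is not the scoring procedure of EASZY, REACTIV, RADAR, or XETA: the 1.5-fold threshold, the function names, and all readings are hypothetical, and the validated assays define their own statistical decision criteria.

```python
from statistics import mean

# HYPOTHETICAL decision threshold for illustration only; the validated
# assays define their own statistical decision criteria.
FOLD_THRESHOLD = 1.5


def fold_induction(treated, control):
    """Mean reporter fluorescence of exposed embryos relative to solvent controls."""
    baseline = mean(control)
    if baseline <= 0:
        raise ValueError("solvent-control fluorescence must be positive")
    return mean(treated) / baseline


def screen(series, control, threshold=FOLD_THRESHOLD):
    """Map each test concentration to (fold induction, flagged as active)."""
    results = {}
    for concentration, values in series.items():
        fold = fold_induction(values, control)
        results[concentration] = (fold, fold >= threshold)
    return results


# Hypothetical fluorescence readings (arbitrary units), one per embryo.
control = [100, 110, 90]
series = {"1 µM": [105, 95, 100], "10 µM": [150, 160, 140]}
print(screen(series, control))
```

Here only the higher concentration is flagged; in practice, variability between embryos would call for a proper statistical test rather than a fixed cut-off.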
The criteria adopted by the EU for the assessment of endocrine disruptors are hazard-based. They were implemented in the plant protection product and biocide regulations in 2018. For the classification of endocrine disruptors, the weight of evidence provided by models that identify modes of action and related adverse outcomes has replaced risk assessment. For cosmetic ingredients, however, implementing these hazard-based criteria without the use of laboratory animals remains a challenge. Nevertheless, some solutions are available to provide more realistic exposure scenarios while avoiding the use of regulated life stages of laboratory animals. Linking the selection of test concentrations for hazard assessment to a range of daily doses of a compound or product could be one approach for screening cosmetics. Recent advances in the development of eleuthero-embryonic test systems also provide options for semi-quantitative assessment of endocrine activity in a whole endocrine system, allowing the identification of ingredients, extracts, or preparations that require more in-depth investigation.
Data provided by embryonic models and cellular assays will be a great source of knowledge to feed into the development of in silico models.Our ultimate aim should be to develop in silico models of each endocrine pathway, and one day perhaps a computational model of a complete vertebrate endocrine system.

Assessment of Dermal Absorption of Cosmetic Products
Assessment of dermal absorption is a crucial aspect of cosmetic product and ingredient safety since cosmetics are applied to the skin, unlike drugs, which almost always enter the body by other routes. In vitro dermal absorption studies are the gold standard for evaluating skin pharmacokinetics and are suitable for predicting dermal absorption in humans.
The purpose of dermal absorption testing (also known as dermal penetration or percutaneous penetration testing) is to measure the absorption of a substance through the skin barrier and its penetration into the skin.
Detailed guidance on the performance of in vitro skin absorption studies is available (OECD 2004, 2011, 2019) [87-89]. In addition, the SCCNFP (Scientific Committee on Cosmetic Products and Non-Food Products) adopted a first set of "Basic Criteria" for the in vitro assessment of dermal absorption of cosmetic ingredients in 1999 and updated them in 2003 (SCCNFP/0750/03) [90]. The SCCS updated this Opinion in 2010 (SCCS/1358/10) [91]. Combining OECD Test Guideline 428 with the SCCS "Basic Criteria" (SCCS/1358/10) is considered essential for performing appropriate in vitro dermal absorption studies of cosmetic ingredients.
Dermal absorption studies are conducted to determine how much of a chemical penetrates the skin and, thereby, whether it has the potential to be absorbed into the systemic circulation. Knowledge of dermal absorption is therefore essential for:
- Safety issues: a test item that becomes systemically available may cause systemic adverse effects, so the absorbed quantity is taken into account in toxicological risk assessment to extrapolate human exposure and calculate the margin of safety (MoS); and
- Therapeutic aspects: the quantity penetrated can be used to predict the therapeutic concentration at the target sites in the skin tissue.
In vitro dermal absorption studies are applied in different sectors and for different purposes:
- Formulation screening: selection of the lead candidate formulation;
- Bioequivalence: determining whether a new product shows the same degree of dermal absorption as a reference product. An in vitro dermal absorption assay was recently used to demonstrate bioequivalence, and the results of the comparison were accepted by the FDA in connection with the marketing authorization for Lotrimin Ultra cream [92];
- Cosmetics and consumer products: the dermal absorption rate is part of the toxicological profile of any ingredient. Since in vitro dermal absorption studies are provided with almost every submission to the SCCS, they can form part of the safety assessment of a cosmetic product;
- Pharmaceutical products: in vitro dermal absorption studies are part of the safety and efficacy assessment of topical products;
- Chemicals and agrochemicals: in vitro dermal absorption studies are used for safety assessment purposes. For pesticides, the results of in vitro dermal absorption studies alone are accepted for risk assessment in the European Union and other countries.
In an in vitro dermal absorption study, the skin sample is placed between the two chambers (a donor chamber and a receptor chamber) of a Franz-type diffusion cell, with the stratum corneum facing the donor compartment, where the formulation under examination is applied, and the dermis in contact with the receptor compartment.
Human skin samples are usually obtained from patients undergoing plastic surgery. Abdominal skin is the most convenient because large areas may be available. Carefully handled frozen human skin is suitable for testing the passive permeation of chemicals when skin viability and metabolic activity are not being investigated [93]. However, fresh skin samples are required for studies that need viable epidermal tissue, such as investigations of drug transporters [94-98] or skin metabolism [96].
Skin absorption differs considerably across body sites, owing to differences in stratum corneum thickness, hydration, and lipid composition [99-103]. To reduce variability, the use of split-thickness skin is recommended: full-thickness skin is cut to approximately 500-750 µm using a dermatome. The quality of the skin samples must be checked at the beginning of the experiment by measuring transepidermal water loss (TEWL) as an indicator of barrier integrity.
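The sample acceptance checks described above can be expressed as a simple filter. In this sketch, the 500-750 µm split-thickness range comes from the text, whereas the TEWL limit is a hypothetical placeholder: acceptance limits are set by each laboratory's validated procedure. The hard range bounds also simplify the "approximately" of the text, and the class and function names are illustrative.

```python
from dataclasses import dataclass

# Split-thickness range from the text (approximate in practice).
THICKNESS_RANGE_UM = (500.0, 750.0)
# HYPOTHETICAL TEWL acceptance limit in g/m^2/h; replace with the lab's validated value.
TEWL_LIMIT = 15.0


@dataclass
class SkinSample:
    donor_id: str
    thickness_um: float  # dermatomed thickness
    tewl: float          # transepidermal water loss, g/m^2/h


def passes_qc(sample, tewl_limit=TEWL_LIMIT):
    """Accept a sample only if its thickness is in range and its barrier is intact."""
    low, high = THICKNESS_RANGE_UM
    in_range = low <= sample.thickness_um <= high
    intact_barrier = sample.tewl <= tewl_limit  # high TEWL suggests a damaged barrier
    return in_range and intact_barrier


samples = [
    SkinSample("donor1", 620.0, 8.5),
    SkinSample("donor2", 900.0, 7.0),   # too thick
    SkinSample("donor3", 600.0, 25.0),  # impaired barrier
]
accepted = [s.donor_id for s in samples if passes_qc(s)]
print(accepted)  # ['donor1']
```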
A finite dose of the test product is applied to the skin surface, and incubation is carried out at 32 °C. The permeation rate of the test item from the donor compartment through the skin into the receptor compartment is determined by measuring the amount of test item in the skin samples and in the receptor fluid.
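Results from each diffusion cell are typically summarized as a mass balance and an absorbed fraction. The sketch below assumes the compartment convention we understand from the SCCS Basic Criteria: amounts in the epidermis (without stratum corneum), the dermis, and the receptor fluid count as absorbed, and the mass balance is expected within 85-115% of the applied dose. The class, field names, and numbers are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CellRecovery:
    """Amounts of test item recovered from one Franz cell (e.g. µg/cm^2)."""
    applied: float
    surface_wash: float     # excess formulation removed from the skin surface
    stratum_corneum: float  # tape strips
    epidermis: float        # epidermis without stratum corneum
    dermis: float
    receptor_fluid: float

    def mass_balance_pct(self) -> float:
        recovered = (self.surface_wash + self.stratum_corneum + self.epidermis
                     + self.dermis + self.receptor_fluid)
        return 100.0 * recovered / self.applied

    def absorbed_pct(self) -> float:
        # Assumed convention: stratum corneum and surface wash are not absorbed.
        absorbed = self.epidermis + self.dermis + self.receptor_fluid
        return 100.0 * absorbed / self.applied


def acceptable(cell: CellRecovery, low: float = 85.0, high: float = 115.0) -> bool:
    """Check a cell's mass balance against the assumed 85-115% window."""
    return low <= cell.mass_balance_pct() <= high


cell = CellRecovery(applied=100.0, surface_wash=80.0, stratum_corneum=10.0,
                    epidermis=3.0, dermis=2.0, receptor_fluid=1.0)
print(cell.mass_balance_pct(), cell.absorbed_pct(), acceptable(cell))
```

Keeping the stratum corneum separate from the absorbed fraction matters: moving it into the absorbed total would raise the value used downstream in exposure calculations.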
The analytical method used to quantify the test substance in the different skin compartments is chosen according to the physicochemical properties of the test substance, such as lipophilicity, molecular weight, and charge, and according to its concentration. Available methods include liquid chromatography-tandem mass spectrometry (LC-MS/MS); inductively coupled plasma-tandem mass spectrometry (ICP-MS/MS); liquid chromatography with UV detection (LC-UV) or fluorescence detection (LC-Fluo); liquid scintillation counting (LSC) for radiolabelled compounds; and imaging approaches, e.g., epifluorescence or confocal microscopy for fluorescent molecules, or matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI-MSI) [104].
The in vitro dermal absorption assay is highly operator-dependent, and care must be taken, especially when handling skin samples and when removing excess formulation. The success of the assay depends equally on the development and validation of sensitive analytical methods to quantify the test substance in the samples.
One of the main challenges in cosmetic ingredient safety assessment is measuring dermal absorption through baby and infant skin. Babies, infants, and children are recognized as a distinct subpopulation for risk and safety assessments, and their greater skin-surface-area to body-mass ratio is routinely considered when performing cosmetic ingredient safety assessments [105]. Systemic exposure in babies and infants is generally assumed to be greater than in older children and adults. On the one hand, percutaneous absorption could be higher because of the immaturity of the skin as a barrier to absorption (a higher skin pH yields decreased barrier function and an increased risk of irritation), particularly in the nappy area. On the other hand, the greater body-surface-area to body-mass ratio of babies and infants compared with older children and adults mathematically results in higher exposures, expressed in mg/kg bw, for a similar quantity of product per unit skin area [106-110]. Existing in vitro skin penetration protocols need to be modified to evaluate the potential for higher absorption from topically applied products. The use of compromised skin is a good alternative to mimic an underdeveloped barrier function, as in premature infant skin. Compromised skin can be obtained by different procedures, e.g., tape stripping, microneedling devices, abrasive skin preparation pads, or even iontophoresis [111-113].
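The body-surface-area argument above is simple arithmetic. The sketch estimates body surface area with the Mosteller formula, BSA (m²) = √(height (cm) × weight (kg) / 3600), and compares the amount of product applied per kg body weight when the same amount per cm² covers the same fraction of the body. The infant and adult measurements, the application amount, and the coverage fraction are hypothetical values chosen for illustration, the function names are illustrative, and dermal absorption itself is deliberately left out.

```python
import math


def bsa_mosteller(height_cm, weight_kg):
    """Body surface area in m^2 (Mosteller formula)."""
    return math.sqrt(height_cm * weight_kg / 3600.0)


def applied_dose_mg_per_kg(amount_mg_per_cm2, fraction_of_bsa, height_cm, weight_kg):
    """Product applied per kg body weight for a given per-area application."""
    area_cm2 = bsa_mosteller(height_cm, weight_kg) * 10_000 * fraction_of_bsa
    return amount_mg_per_cm2 * area_cm2 / weight_kg


# Hypothetical subjects and application scenario (1 mg/cm^2 over 10% of the body).
infant = applied_dose_mg_per_kg(1.0, 0.1, height_cm=67.0, weight_kg=7.5)
adult = applied_dose_mg_per_kg(1.0, 0.1, height_cm=175.0, weight_kg=70.0)
print(f"infant {infant:.0f} mg/kg, adult {adult:.0f} mg/kg, ratio {infant / adult:.1f}")
```

With these assumed values, the infant receives roughly 1.9 times more product per kg body weight than the adult, before any difference in barrier function is even considered.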

Skin and Eye Irritation Assessment of Cosmetic Products
Assessment of the skin and eye irritation potential of an ingredient or formulation is an important part of cosmetic ingredient safety assessments.
Dermal irritation is defined as the production of reversible damage of the skin, following the application of a test substance for up to 4 h (OECD 404) [114].Eye irritation is defined as the occurrence of changes in the eye following the application of a test substance to the anterior surface of the eye, which are fully reversible within 21 days of application (OECD 405) [115].
Skin and eye irritation are assessed using test methods based on reconstructed human tissues. Commercially available 3D models based on reconstructed human epidermis (RhE) are used for skin irritation testing (OECD Test Guideline 439) [116], and 3D models based on reconstructed human cornea-like epithelium (RhCE) are used for eye irritation testing (OECD Test Guideline 492) [117]. It should be noted that other in vitro models address serious eye damage and/or the identification of chemicals not triggering classification for eye irritation or serious eye damage [3], but this review focuses only on the RhCE model.
The overall design of 3D models based on reconstructed human tissues mimics the biochemical and physiological properties of the upper layers of the human skin and eye.
RhE is a skin model composed of living human keratinocytes that have been cultured to form a multi-layered, highly differentiated epidermis. The model consists of highly organized basal cells and includes a functional skin barrier with an in vivo-like lipid profile.
RhCE is a corneal model composed of living human cells which have been cultured to form a multi-layered, differentiated corneal epithelium.The model consists of highly organized basal cells which progressively flatten out as the apical surface of the tissue is approached, analogous to the normal human in vivo corneal epithelium.
In both models, the cells are metabolically and mitotically active and release many of the pro-inflammatory agents (cytokines) known to be important in irritation and inflammation. Reconstructed human tissues are grown at the air-liquid interface on specialized culture supports.
The test item is applied directly to the tissue surface, providing a good model of "real-life" exposure. The endpoint used in both the RhE and RhCE test methods is the cell-mediated reduction of MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide) into a blue formazan salt, which is quantitatively measured after extraction from the tissues. A second endpoint that can be used to increase sensitivity is the measurement of interleukin-1α (IL-1α) production.
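In both OECD TG 439 and TG 492, the MTT readout is expressed as viability relative to negative-control tissues. The sketch below computes mean relative viability from the optical densities (OD) of extracted formazan, after subtracting an optional blank. It is a simplified illustration with a hypothetical function name and hypothetical values: it assumes one OD value per tissue and ignores the corrections the guidelines require for test items that reduce MTT directly or interfere with the colour reading. The IL-1α endpoint is not modelled.

```python
from statistics import mean


def relative_viability(od_test, od_negative_control, od_blank=0.0):
    """Mean viability (%) of treated tissues relative to negative-control tissues.

    od_test, od_negative_control: one blank-uncorrected OD per tissue.
    od_blank: OD of the extraction solvent alone.
    """
    control = mean(od_negative_control) - od_blank
    if control <= 0:
        raise ValueError("negative-control OD must exceed the blank")
    return 100.0 * mean(od - od_blank for od in od_test) / control


# Hypothetical ODs for triplicate tissues.
viability = relative_viability([0.30, 0.32, 0.31], [0.84, 0.86, 0.85], od_blank=0.05)
print(f"{viability:.1f} %")
```

With these illustrative ODs, the treated tissues retain about 32.5% viability, below both the 50% (RhE) and 60% (RhCE) cut-offs used for prediction.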
If the viability is greater than 50% (RhE) or 60% (RhCE), the test item is classified as Non-Irritant (no-label or UN GHS No Category).
If the viability is less than or equal to 50% in the RhE model, the test item is classified as Irritant (UN GHS Category 2).
If the viability is less than or equal to 60% in the RhCE model, no prediction can be made, and further testing may be required.
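The prediction model stated above reduces to two viability cut-offs, which the sketch below encodes. The enum and function names are hypothetical. The rules cover only the cut-offs given here, not the guidelines' full acceptance criteria for a valid run (for example, control ranges and variability between replicate tissues).

```python
from enum import Enum


class Model(Enum):
    RHE = "RhE"    # reconstructed human epidermis, OECD TG 439
    RHCE = "RhCE"  # reconstructed human cornea-like epithelium, OECD TG 492


def predict(viability_pct: float, model: Model) -> str:
    """Apply the viability cut-offs described in the text."""
    if model is Model.RHE:
        if viability_pct > 50.0:
            return "Non-Irritant (UN GHS No Category)"
        return "Irritant (UN GHS Category 2)"
    if model is Model.RHCE:
        if viability_pct > 60.0:
            return "Non-Irritant (UN GHS No Category)"
        return "No prediction; further testing may be required"
    raise ValueError(f"unsupported model: {model!r}")


for v in (75.0, 55.0, 50.0):
    print(v, predict(v, Model.RHE), "|", predict(v, Model.RHCE))
```

The asymmetry is visible at 55%: the same viability gives a definite Non-Irritant call in RhE but no prediction in RhCE.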
So far, neither a single in vitro assay nor a testing battery has been validated as a standalone replacement for the in vivo eye irritation test. New test systems using stem cells are under development and could generate new alternatives for in vitro ocular toxicity testing [118].

Conclusions
The total number of animals used in experiments in Europe decreased only slightly between 2015 and 2017, from 9.59 million to 9.39 million, compared with 11.5 million in 2011. Animals are mainly used for research (69%) and for regulatory purposes (23%). In 2017, 61% of the experiments in animals concerned medicinal products for human use, 15% veterinary products, and 11% industrial chemicals. Moreover, the report of the European Commission raises concerns about the use of animals for endpoints for which alternative methods exist (irritation, skin sensitization).
Despite the marketing ban on cosmetic ingredients and cosmetic products tested on animals, this issue is still debated. From a regulatory point of view, the position of the European Agency is clear and has been set out in "Clarity on interface between REACH and the Cosmetics Regulation". No cosmetic product is currently tested on animals in Europe. Cosmetic ingredients may have existing results from toxicological tests in animals. Such results may be obtained after the animal testing ban, but only if required by another regulation (food, pharmaceuticals, or even REACH, given the obligations regarding worker safety). If cosmetics are the only use of a substance, in silico and in vitro tests are then encouraged to demonstrate its safety. However, for a toxicologist, guaranteeing the absence of risk with the currently available methods remains a huge challenge. New Approach Methodologies, built on AOPs, IATAs, or Defined Approaches, will be the foundation of the safety assessment of future new ingredients [119].
A wide range of in vitro models for safety testing of cosmetic products and cosmetic ingredients has been developed and adopted in test guidelines.There is still an increasing need, largely driven by regulatory authorities and industry, to develop in vitro models to predict carcinogenicity, repeat dose toxicity and reproductive toxicity, for which no alternative in vitro methods are currently available.

Figure 1. Overview of different alternatives to animal testing for safety assessment of cosmetic products and cosmetic ingredients. Assays in grey are not discussed in this review.