A State-of-the-Art Review on the Alternatives to Animal Testing for the Safety Assessment of Cosmetics

Almost a decade after the stipulated deadline in the 7th amendment to the EU Cosmetics Directive, which bans the marketing of animal-tested cosmetics in the EU from 2013, animal experimentation for cosmetic-related purposes remains a topic of animated debate. The cosmetic industry continues to be scrutinised for the practice, despite its leading role in funding and adopting innovation in this field. This paper aims to provide a state-of-the-art review of alternative testing methods, also known as New Approach Methodologies (NAMs), with a focus on assessing the safety of cosmetic ingredients and products. It starts with innovation drivers and global regulatory responses, followed by an extensive, endpoint-specific overview of accepted/prospective NAMs. The overview covers main developments in acute toxicity, skin corrosion/irritation, serious eye damage/irritation, skin sensitisation, repeated dose toxicity, reproductive toxicity/endocrine disruption, mutagenicity/genotoxicity, carcinogenicity, photo-induced toxicity, and toxicokinetics. Specific attention is paid to the emerging in silico methodology. This paper also provides a brief overview of the studies on public perception of animal testing in cosmetics. It concludes with the view that educating consumers and inviting them to take part in advocacy could be an effective tool to achieve policy changes, regulatory acceptance, and investment in innovation.


Introduction
A 1936 publication entitled 'American Chamber of Horrors: The Truth about Food and Drugs' [1] highlighted the many instances where consumer goods led to injury or even death of the user. Consumer safeguarding, and subsequent animal testing, became a legal requirement in the United States shortly after [2], and thus, the first target for the modern animal rights movement was created. Global campaigning efforts culminated in 2009, with the phasing out of animal testing in cosmetics within European Union (EU) member states, even though such testing represented only 0.05% of total animal use [3]. The 3Rs principle of replacement, reduction, and refinement, introduced by Russell and Burch [4], was reduced to a single R approach (replacement), making the cosmetics industry a major pro-must be combined with in vitro/ex vivo testing methodologies through a Weight of Evidence (WoE) approach, historical animal data (performed prior to legislative deadlines), and when available, data from human research, such as clinical trials and human biomonitoring [6,8].
Great efforts have been made to promote the development and regulatory acceptance of NAMs, both in science [6,9] and through initiatives championed by Non-Governmental Organizations (NGOs), trade associations, and cosmetic companies [10][11][12]. This has led to great advances in NAM development that have recently been highlighted in the literature [13]. Nevertheless, consumers and brands continue to focus on optics, such as cruelty-free certifications [14], over important milestones in NAM implementation. For example, a survey of 1011 British adults found that only 23% of participants knew that animal research is only allowed to be carried out when there is no alternative [15].
Since the 2013 cut-off-date for the phasing-out of animal testing, no novel compounds for exclusive use in cosmetics have been announced to the EU market. To remove this barrier, additional approaches and out-of-the-box thinking for the safety evaluation of new chemicals are needed [6].
This literature review aims to provide a brief history and state-of-the-art picture of the developments in the area. It will do so firstly by briefly exploring major drivers for NAM innovation and the emergence of alternatives. Secondly, it will outline the regulatory response across the globe thus far, with a particular focus on its impact on the cosmetics industry. An extensive, endpoint-specific overview of accepted/prospective NAMs will follow, with a focus on in silico methodology. Finally, with the intent of understanding opportunities for the acceptance of NAMs through the democratisation of science, public perception of animal testing in cosmetics will be explored.

Methodology
To procure literature for this work, the following databases and other literature sources were consulted (Table 1). All abbreviations appearing in this review are summarised in Supplementary Materials (Table S1).
Primary search terms (black on the mind map, Figure 1) were used to gather information from Tier 1 and 2 sources, either alone or in combination; these were then combined with secondary search terms selected according to their relevance to different sections. Upon selection of a relevant number of academic papers and articles, backward and forward snowballing techniques [16] were used to identify additional material. Tier 3 databases were analysed following distinct objectives: the European Commission's Joint Research Centre (EC JRC) Publications Repository was used to identify yearly reports of the EU Reference Laboratory for Alternatives to Animal Testing (EURL ECVAM); the European Chemicals Agency (ECHA) website was inspected to find relevant guidelines and validated methods; the Tracking System for Alternative Methods Towards Regulatory Acceptance (TSAR) Database was used to identify relevant NAMs, as well as their current status in the process of approval and regulatory acceptance. Finally, the Organisation for Economic Co-operation and Development (OECD) Library was searched for relevant OECD guidelines and guidance documents.

Drivers for the Phasing-Out of Animal Testing in Cosmetics
According to NGOs, the timeline for the reduction of the use of animals in scientific experiments did not start until the early 1980s, with ethics-centred actions spearheaded by advocate Henry Spira [17]. However, prior to the birth of this global social justice movement, and the adoption of cruelty-free certificates by the personal care industry, the concept of lessening the use of animals in clinical testing was already being discussed in science [4,18].
Russell and Burch [4] were the first scientists to explore the limitations of animal models in scientific experimentation as a way of incentivising the development of alternative methods. They were also the first in the literature to explore the concept of efficiency in animal experimentation, specifically the length and cost of animal studies when compared to alternative methods. They went on to state that, at the time, some in vitro methods (e.g., those employing bacteria cultures) were already proving to be cheaper than keeping live animals in laboratories. Overall, their work helped make a case for the reduction, refinement, and replacement of animal experimentation that went beyond ethical rules. However, ethical considerations must not be ignored, as they were, effectively, the fuel for the formation of the modern animal rights movement [19,20].
The following three major drivers of innovation in the field of clinical research, including the testing of cosmetics, can be identified: ethical considerations, the lack of effective extrapolation, and economic efficiency. These emerged from an intersection between healthcare and social sciences, alongside the inescapable economic requirements of modern society.

Ethical Considerations
Henry Spira, a major catalyst for the modern animal rights movement, was a US journalist for left-wing publications, a background that led him to emulate other social movements (such as the civil rights movement) and to pursue one single big success against what he called "systems of oppression" [20]. His first campaign targeted the American Museum of Natural History (AMNH) and culminated with the Museum ceasing its research on laboratory-bred and domesticated animals [20]. Spira's next target was the cosmetics industry, specifically the Draize test [20], which signified the birth of the modern animal rights movement.

The Lack of Effective Extrapolation
The use of animal models to obtain human-relevant data requires extrapolation, i.e., the conversion of dose-related toxicity of chemicals from animal models to humans [21]. A major motivation for the development of alternative methods is the fact that animal models can differ structurally and physiologically from humans in ways that render the study unsatisfactory [22]. In a recent analysis of 100 systematic reviews on animal experiments, 75% of reviews were found to present significant limitations when trying to predict human disease outcomes or safety through animal data [23]. These were associated with one or more of the following factors: discrepancies between species, lack of clinical translation, unsuitable methodology, inconsistencies, and publication bias, which led to an exaggeration of the benefits of animal use.
One such method lacking concordance was the Draize test, which uses rabbit models to study irritation and toxicity of substances applied topically to the skin and mucous membranes [24]. The efficacy of the Draize test was already being questioned by scientists, who singled it out as a good contender for the creation of an alternative method [20,22]. Spira's campaign against the Draize test, which culminated in 1980, led to Revlon's Board of Directors agreeing to provide the Rockefeller University with $750,000 in funding to support research into non-animal safety tests. Revlon then went on to call on other major corporations in the personal care space to join in as research program partners. The Cosmetic, Toiletry, and Fragrance Association (CTFA) set up a fund for this purpose, which gradually gained supporters, including Avon, Bristol-Myers, Estée Lauder, Max Factor, Mary Kay Cosmetics, and others [20].

Economic Efficiency
The economic efficiency of animal testing has often been questioned [25], hence alternative approaches were explored as a way of saving time and money, while also addressing animal welfare concerns [26]. A good example of the application of the reduction principle of the 3Rs was the abolition of the classical lethal dose (LD50) test in 2002, which at the time used a minimum of 20 animals per test. Post-abolition, the OECD approved three new in vivo tests: the 'Fixed-Dose Procedure Test' (FDP; OECD TG 420), the 'Acute Toxic Class Method Test' (ATC; OECD TG 423), and the 'Up-and-Down Procedure' (UDP; OECD TG 425). These tests are performed in a sequence, in which the outcome of the previous step/dosage defines the next dose to be tested; this allows for a significant reduction in the number of animals utilised for each test, to a minimum of five animals per test [27][28][29][30][31].
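The sequential logic described above, where each outcome defines the next dose, can be illustrated with a deliberately simplified sketch. It is not an implementation of OECD TG 420, 423, or 425: the function name `sequential_doses`, the starting dose, the progression factor, the dose bounds, the fixed number of animals, and the survival model are all hypothetical, and the actual guidelines use more elaborate stopping and estimation rules.

```python
from typing import Callable, List, Tuple


def sequential_doses(
    survives: Callable[[float], bool],
    start_dose: float,
    factor: float,
    n_animals: int,
    min_dose: float,
    max_dose: float,
) -> List[Tuple[float, bool]]:
    """Dose one animal at a time; each outcome sets the next dose.

    Survival moves the next dose up by `factor`, death moves it down,
    and doses are clamped to [min_dose, max_dose]. All values are
    illustrative and not taken from any test guideline.
    """
    history: List[Tuple[float, bool]] = []
    dose = start_dose
    for _ in range(n_animals):
        survived = survives(dose)
        history.append((dose, survived))
        next_dose = dose * factor if survived else dose / factor
        dose = min(max(next_dose, min_dose), max_dose)
    return history


# Hypothetical outcome model: survival at any dose below 300 mg/kg.
history = sequential_doses(
    survives=lambda d: d < 300.0,
    start_dose=100.0,
    factor=2.0,
    n_animals=5,
    min_dose=1.0,
    max_dose=2000.0,
)
for dose, survived in history:
    print(f"{dose:g} mg/kg -> {'survived' if survived else 'died'}")
```

In this toy run the doses oscillate around the hypothetical lethality boundary after only a few animals, which is the intuition behind the reduction in animal numbers.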
Bottini and Hartung [3] were the first to explore the economic aspects of animal testing and identified the lack of new developments in this field as a suppressor of innovation and economic growth. One such inhibitory factor was precautionary, hyper-sensitive animal testing leading to the acceptance of false positives; this would then make companies discard substances at late (and expensive) stages of development. Gabbert and van Ierland [32] applied the concept of Cost-Effectiveness Analysis (CEA) to short-term mutagenicity testing and found that, to achieve superior sensitivity to that of an in vivo test, a combination of tests would have to be used (Ames test, OECD TG 471, and Gene Mutation in Mammalian Cells, OECD TG 476), leading to an increased cost. However, when considering substances that will be marketed towards the EU cosmetics industry, where animal testing has been prohibited since 2013, in vitro testing may be the only option, and therefore the most economically viable.
Recently, Meigs et al. [5] expanded on the work of Bottini and Hartung [3], concluding that NAMs lead to greater productivity and turnover in different industries. This work highlighted the unique position in which the cosmetic industry found itself due to regulatory changes in Europe.

Global Regulatory Responses
Regulatory toxicology is a branch of toxicology that aims to protect humans and the ecosystem from the toxic effects of substances by means of regulations and standardisation. Toxicology itself is a highly interdisciplinary field of study, also known as 'the science of poisons'. It studies how chemical, physical, or biological agents can cause adverse effects on living organisms and the environment.
Officially, consumer protection became a state responsibility with the enactment of the US Federal Food, Drug and Cosmetic Act of 1938 [33]. This act was prompted by several public emergencies, with many relating to the use of cosmetic products.
The beginning of the phasing-out of animal testing was prompted both by scientists looking for more efficient methods and by animal welfare activists, leading to NAMs being considered under a regulatory framework as early as 1977, with the Netherlands being the first country to include a section on alternatives in its Animal Protection Law. Switzerland followed in 1981, with legislation requiring the consideration of NAMs, and the same request was made in an amendment of the United States Animal Welfare Act in 1985 [34]. The EU responded to growing concerns over animal welfare in 1986 with Council Directive 86/609/EEC [35], which stated that the Commission and Member States should actively try to promote the development, validation, and acceptance of procedures that might reduce, refine, or replace the use of laboratory animals [36].

European Union
In response to the Council Directive 86/609/EEC [35], the European Commission (EC) established the EU Reference Laboratory for Alternatives to Animal Testing (EURL ECVAM) in 1991, creating a network of experts that collaborate in the identification, development, and validation of alternatives to animal testing for regulatory purposes [36]. This historical move was one of the major drivers for the development and adoption of alternative methods by the EU and at the Organisation for Economic Co-operation and Development (OECD). What began as 7 OECD test guidelines based on in vitro methods became a total of 30 OECD validated guidelines based on 52 alternative methods (by June 2022), accepted by member and observer organisations of the International Cooperation on Alternative Test Methods (ICATM) [37]. Most of these accepted methods apply to the human and environmental safety of cosmetic products.
However, it was at the early stages of the EURL ECVAM that the EC adopted the 7th amendment (2003/15/EC) to the Council Directive 76/768/EEC [38]. Again, and much like what happened in the 1970s, the cosmetics industry was being singled out, despite very limited animal usage (0.03-0.05% of all animals tested in Europe at the time). A ban on animal testing in cosmetics was implemented before suitable alternatives were made available, in a unique move that cemented the cosmetics industry's place as a driver of innovation in the field of safety testing [5].

REACH vs. Regulation EC 1223/2009
The early 2000s also saw the establishment of the European Chemicals Agency (ECHA), and the implementation of the new EU chemicals regulation on the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH). REACH entered into force in June 2007, while ECHA became fully operational a year later [39]. By introducing strict safety testing requirements regardless of the availability of NAMs that met regulatory acceptance, REACH heavily conflicted with the 7th amendment of the Cosmetics Directive, having an effect that animal welfare activists have named a 'loophole' [23].
This divergence between the two regulations had a lasting effect on the safeguarding of the animal testing ban, with court cases being fought between ECHA and cosmetic companies (e.g., the 2020 case of homosalate and 2-ethylhexyl salicylate, which required animal data for the purpose of occupational safety of workers) [40]. The report by Knight et al. [41] has identified 63 problematic substances in the REACH database, with 13 being directly confirmed by the registrants (or within the dossier) to have undergone animal testing after the cut-off-dates. NGOs and the cosmetic industry have called for more transparency by the authorities on post-ban in vivo testing of cosmetic ingredients [41].

United States
In 1978, the National Toxicology Program (NTP) was established as part of the National Institutes of Health (NIH), with the aim of coordinating US toxicological testing programs. It became a world leader in the field of toxicology, informing health, regulatory and research agencies set to protect public health. With the turn of the century, a collaboration between NIH divisions, the US Environmental Protection Agency (EPA) and the Food and Drug Administration (FDA) was formed, and 2008 saw the inception of the Toxicology in the 21st Century (Tox21) program [42]. This federal consortium was created with a focus on the development and evaluation of in vitro high-throughput screening (HTS) methods for hazard identification, alongside the provision of mechanistic insights. Later, it went on to focus on the development of a portfolio of alternative test systems [43].
The Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) was established in 2000. Soon after its establishment, California became the first state to pass a law that required companies to use ICCVAM-validated testing methods [17,44].
In March 2014, the Humane Cosmetics Act, which would prohibit the sale or transport of cosmetics developed using animal testing, was first introduced to the House of Representatives. It was reintroduced in 2015, 2017, 2018, 2019 (with the support of the cosmetics industry, including Unilever and P&G), and again in 2021. The act has not been enacted to this day but, in the interim, several states have passed bans on the sale of animal-tested cosmetics [17,45].

China
China's policy around cosmetic testing has been a source of controversy for over a decade. This was likely due to conflicting timelines: while the EU phased in its ban on animal testing between 2009 and 2013, in 2007 the former Chinese Ministry of Health issued an update to the Hygienic Standards for Cosmetics, which included a requirement for 17 animal-based toxicological tests [46]. However, following the 2013 EU cut-off-date, the importance of alternatives began to be explored in China. In 2016, the Chinese State Food and Drug Administration (CFDA) adopted the 3T3 NRU Phototoxicity Assay into China's revised Safety and Technical Standards for Cosmetics (STSC), officially marking China's first regulatory acceptance of an alternative method [47].
Chinese cosmetics are categorised as general and special, and then further divided depending on their country of manufacture. In 2014, animal testing requirements were removed from domestic, non-special cosmetics that could be deemed safe through a safety risk assessment [47]. In 2018, non-domestic companies were able to market their products in mainland China while maintaining their cruelty-free status through the Leaping Bunny China Pilot Project. The products were manufactured in their country of origin, but filled in Chinese territory, thus allowing brands to enjoy exemptions previously exclusive to domestic products [48].
In May 2021, the animal testing exemption on general cosmetics was extended to imports, thanks to a new set of regulations (Provisions for Management of Cosmetic Registration and Notification Dossiers and Provision for Management of New Cosmetic Ingredient Registration and Notification Dossiers) created to offer a standardised guide to the registration and notification of new cosmetic products and raw materials. To secure this exemption, companies must obtain a good manufacturing practices (GMP) certificate from their relevant regional authority and provide sufficient safety assessment results. However, this exemption does not apply to special cosmetic products, or to products marketed to infants and children [49,50].

Rest of the World
In 2009, a Memorandum of Cooperation was signed by four agencies, EURL ECVAM (EU), Health Canada, ICCVAM (USA) and JaCVAM (Japanese Centre for the Validation of Alternative Methods), signalling the establishment of the International Cooperation on Alternative Test Methods (ICATM). In 2011, the memorandum was updated to include South Korea, and in 2015 Brazil and China [51]. The ICATM framework was created with the intent of establishing international cooperation in critical areas related to NAMs, including validation, peer-reviewing, and the elaboration of harmonized recommendations for the worldwide acceptance of NAMs [52]. Since then, multiple ICATM agencies have contributed towards the development of new methods, with Japan's JaCVAM developing 9 alternative methodologies that have since been adopted; methods developed by South Korea's KoCVAM (South Korean Centre for the Validation of Alternative Methods) and Brazil's BraCVAM (Brazilian Centre for the Validation of Alternative Methods) are also currently undergoing the process of approval and regulatory acceptance [37].
This international effort towards the development of new technologies, alongside the work of animal welfare organizations, and a general switch in public opinion, led to a series of animal testing bans being implemented worldwide (Table 2). However, it is worth noting that Regulation (EC) No. 1223/2009 [53] continues to be the gold standard, as it covers the testing of both cosmetic products and ingredients, while simultaneously addressing their marketing to consumers. Therefore, the following section will explore the adoption and regulatory acceptance of NAMs within the EU, with a focus on their application to cosmetics.

New Approach Methodologies (NAMs)
The progress of NAMs from submission to regulatory acceptance in the EU is tracked by the EC through the Tracking System for Alternative Methods Towards Regulatory Acceptance (TSAR) [37]. Amongst the 52 currently accepted methods (Figure 2a), most are performed in vitro (utilizing sub-cellular fractions and cell-based assay systems) but may also be completed in vivo (by focusing on refinement and reduction of animal use), ex vivo (performed on tissue excisions from animal or human donors), in chemico (chemical and biochemical assays), and in silico (computational modelling and screening) [56]. Among the 41 methods that apply to the assessment of human safety in the context of cosmetic products (Table 3), nine toxicological endpoints are covered, as seen in Figure 2b. Endpoint-specific guidance for safety assessment of chemicals in the EU is provided by ECHA and relates to the information requirements for the registration of new raw materials, encompassing not only toxicological endpoints relating to human health, but also physicochemical properties, stability and reactivity data, ecological information, disposal considerations, and others [57]. The Scientific Committee on Consumer Safety (SCCS) also offers industry-specific guidance on the safety evaluation of cosmetic ingredients, including an overview of relevant toxicological tools for multiple endpoints as part of the SCCS Notes of Guidance for the Testing of Cosmetic Ingredients, with its 11th revision published in March 2021 [8]. Many of these endpoints overlap with those required under REACH; they will be explored further in this paper.
(1) In this method, refinement is achieved as animals are humanely killed and skin disks are prepared for testing [61].
(2) Designed to reduce the number of animals used, while offering substantial refinement (less pain and distress) [65]; however, a new OECD testing guideline (OECD TG 497) [82] has since been approved for the replacement of this test.

Non-Testing Methods: In Silico Toxicology
The goal of in silico toxicology is to forecast particular hazards using computational models [83]. Due to imposed bans and restrictions on animal testing across several regions, in silico methodologies are becoming increasingly important in the cosmetics sector [84].
The similarity principle, which states that substances with similar chemical structures should have comparable biological actions, underpins the development and implementation of all in silico toxicology approaches. This concept has since broadened to include similarities based on gene expression and bioactivities [56,83].
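To make the similarity principle concrete, the sketch below scores structural similarity between chemicals represented as sets of structural features. The Tanimoto (Jaccard) coefficient is used here as one common choice of similarity measure; it is an assumption of this example rather than a method prescribed in the text, and the feature labels, chemical names, and function names are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) similarity: shared features / all features."""
    if not a and not b:
        return 1.0  # two empty descriptions are treated as identical
    return len(a & b) / len(a | b)


def most_similar(
    target: FrozenSet[str], library: Dict[str, FrozenSet[str]]
) -> Tuple[str, float]:
    """Return the library entry most similar to the target and its score."""
    best = max(library, key=lambda name: tanimoto(target, library[name]))
    return best, tanimoto(target, library[best])


# Hypothetical feature sets; real workflows would use computed fingerprints.
target = frozenset({"aromatic_ring", "ester", "hydroxyl"})
library = {
    "analogue_A": frozenset({"aromatic_ring", "ester", "alkyl_chain"}),
    "analogue_B": frozenset({"amine", "alkyl_chain"}),
}

name, score = most_similar(target, library)
print(name, score)
```

Under the similarity principle, the data-rich analogue with the highest score would be the first candidate considered as a source of endpoint information for the data-poor target.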
In silico models can be developed by experts (e.g., structural alerts or read-across) or automatically (e.g., machine learning techniques) [83]. Expert methods rely on the knowledge and experience of experts to predict or describe toxicity processes, utilising endpoint information from data-rich compounds to derive predictions for data-poor compounds; these include read-across (a procedure based on the formation of chemical categories from their structures) and structural alerts (the grouping of chemicals according to likeness in toxicity and mode of action) [83][84][85]. Machine-learning methods rely on creating and training a computational model for hazard prediction; specifically, structure-activity relationship (SAR) and quantitative SAR (QSAR) approaches, collectively known as (Q)SARs, are theoretical models that rely on descriptors derived from knowledge of chemical structures to predict potential hazard. While SARs assess qualitative connections amongst descriptors and the existence (or absence) of a property/activity of interest, QSARs use a statistical technique to quantitatively measure the correlation between said property/activity and descriptor values [83,86,87].
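The contrast between qualitative and quantitative approaches can be sketched in a few lines. In this minimal illustration, the SAR is reduced to a presence/absence check against a set of structural alerts, and the QSAR to an ordinary least-squares line relating a single descriptor to an activity value. The alert label, descriptor values, activity values, and function names are invented for illustration; real (Q)SAR models use many descriptors and validated training sets.

```python
from statistics import mean
from typing import Sequence, Set, Tuple


def sar_flag(features: Set[str], alerts: Set[str]) -> bool:
    """Qualitative SAR: is any structural alert present in the chemical?"""
    return bool(features & alerts)


def fit_qsar(
    descriptor: Sequence[float], activity: Sequence[float]
) -> Tuple[float, float]:
    """Quantitative SAR: least-squares fit of activity = slope * x + intercept."""
    x_bar, y_bar = mean(descriptor), mean(activity)
    sxx = sum((x - x_bar) ** 2 for x in descriptor)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical alert list and chemical description.
flagged = sar_flag({"aromatic_ring", "nitro_group"}, alerts={"nitro_group"})

# Hypothetical training data: one descriptor value and one activity per chemical.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [1.5, 2.5, 3.5, 4.5]
slope, intercept = fit_qsar(descriptor, activity)

# Prediction for an untested chemical with descriptor value 2.5.
predicted = slope * 2.5 + intercept
print(flagged, slope, intercept, predicted)
```

The SAR function answers a yes/no question, whereas the fitted line returns a numerical estimate, which mirrors the qualitative versus quantitative distinction drawn above.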
Currently, in silico approaches are primarily employed for internal decision-making, with regulatory acceptance under REACH being limited to gap-filling in a Weight of Evidence (WoE) approach [83]. ECHA has published a Read-Across Assessment Framework (RAAF) [88] which provides guidance on how read-across methods can be used for predicting REACH-relevant features associated with hazard detection; it has also published a practical guide on how to use and report (Q)SARs [86], which includes a non-exhaustive list of available QSAR models for multiple endpoints (Supplementary Materials, Table S2). Additionally, SCCS guidelines have included guidance on the incorporation of in silico data across several endpoints [8]. This growing confidence in the applicability of non-testing methods is likely to bring in silico toxicology to the forefront of safety assessment for regulatory purposes [84].

Acute Toxicity
Acute toxicity is the first stage of safety testing, used to identify target organs and offer an assessment of a substance's intrinsic toxicity. These investigations are often conducted in rodents (rats and mice) with the goal of estimating the LD50 of a substance (the dose that leads to the death of 50% of the test population). These findings are then used to guide the design and dosing of longer-term (sub-chronic and chronic) studies [89]. In cosmetics, acute toxicity data are required for oral, dermal, and respiratory routes [8].
This toxicological endpoint relies heavily on in vivo experimentation, and the only validated NAM for acute oral toxicity, the 3T3 Neutral Red Uptake (NRU) test (OECD GD 129) [58], cannot be used alone [90]. However, it may be included within a testing strategy or WoE approach, and thus contribute towards the reduction and refinement of the use of animals in in vivo methods (e.g., by providing a starting dose). The usage of 'organ-on-chip' systems seeded with human cells may be a viable substitute for in vivo tests in the future [91]. These systems are in vitro microfluidic biomimetic devices that aim to emulate physiological systems of human organs, tissues, and circulation [92].
In the interim, and to further address animal welfare concerns, the OECD issued an acute toxicity waiver guidance document (OECD GD 237) [93] (Table S3).

Skin Corrosion/Irritation
Exposure to a chemical can produce alterations at the first point of contact, regardless of its ability to become systemically available; these are known as local effects [90]. Chemicals that cause local effects can be further classified as irritant or corrosive substances. While corrosive chemicals lead to the irreversible destruction of living tissue (such as necrosis through the epidermis into the dermis), irritant substances lead to a reversible injury, generally presenting as inflammation [8,89,90].
In vivo testing for skin corrosion/irritation is not encouraged under REACH; rather, a selection of non-testing methods is available (such as SARs/QSARs and read-across approaches), which may provide direct predictions of corrosion/irritation potential, either as part of a WoE scheme or as a way of assessing how to proceed with in vitro testing [90]. Specifically, non-testing methods may inform whether to perform tests through a Top-Down or Bottom-Up approach (Figure 3). If the test material is anticipated to have no to low irritancy potential, the Bottom-Up approach is initiated, with the first test distinguishing irritant/corrosive materials from non-classified materials; if a material is found not to be irritant/corrosive, no further testing is required. In contrast, test materials estimated to have high irritancy/corrosivity potential initiate the Top-Down approach, which assumes that the material is irritating/corrosive and begins testing by distinguishing moderate from severe effects; further testing is only performed if a moderate effect is observed. This streamlined approach ensures that no unnecessary testing is performed [94].
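The Top-Down/Bottom-Up logic described above can be written out as a small decision sketch. It encodes only the two rules stated in the text (Bottom-Up stops when the material is not classified; Top-Down continues only when a moderate effect is observed). The result labels and function names are hypothetical placeholders, not terms from any guideline.

```python
from enum import Enum


class Strategy(Enum):
    BOTTOM_UP = "bottom-up"
    TOP_DOWN = "top-down"


def choose_strategy(high_potential_expected: bool) -> Strategy:
    """Pick the entry point from the non-testing (in silico) prediction."""
    return Strategy.TOP_DOWN if high_potential_expected else Strategy.BOTTOM_UP


def needs_further_testing(strategy: Strategy, first_result: str) -> bool:
    """Decide whether a second test is needed after the first one.

    Bottom-Up: the first test separates 'irritant_or_corrosive' from
    'not_classified'; testing stops only for 'not_classified'.
    Top-Down: the first test separates 'moderate' from 'severe';
    testing continues only for 'moderate'.
    """
    allowed = {
        Strategy.BOTTOM_UP: {"not_classified", "irritant_or_corrosive"},
        Strategy.TOP_DOWN: {"moderate", "severe"},
    }
    if first_result not in allowed[strategy]:
        raise ValueError(f"unexpected result {first_result!r} for {strategy.value}")
    if strategy is Strategy.BOTTOM_UP:
        return first_result != "not_classified"
    return first_result == "moderate"


strategy = choose_strategy(high_potential_expected=False)
print(strategy.value, needs_further_testing(strategy, "not_classified"))
```

Both branches terminate after at most one additional test in this sketch, which reflects the stated aim of avoiding unnecessary testing.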
This approach can be applied to tiered in vitro testing guidelines used to assess skin irritation and corrosion: OECD TG 439 [64] and 431 [62], respectively. Both guidelines comprise multiple methods (Table 4) and are based on a three-dimensional reconstructed human epidermis (RHE) model, which closely mimics the biochemical and physiological properties of the human epidermis. Cell viability is used as the readout, with different thresholds applying to skin irritation and corrosion. Furthermore, a testing waiver may be obtained if the substance meets any of the criteria outlined in OECD GD 237 [93] (Table S4).

Serious Eye Damage/Irritation
Serious eye damage refers to damage to the ocular tissue or deterioration of vision, which has not fully reversed after 21 days of exposure to a substance. Eye irritation is classified as any change to the eye after the application of a test substance, which is fully reversible within 21 days of exposure [8].
While there are currently no validated in vitro methods for a direct classification of mild eye irritants, there are several options to identify chemicals that induce serious eye damage, and those that do not require classification [90]. These methods (Table 5) can be organotypic (ex vivo, utilising tissues obtained from slaughterhouses), cytotoxicity and cell function-based, reconstructed human tissue-based, or macromolecular [8]. When designing a testing protocol, the same Bottom-up/Top-down approach illustrated in Figure 3 can be applied to the assessment of eye irritation [94]. A testing waiver may also be granted if the substance meets the criteria (Table S5).

Organotypic tests: able to identify chemicals that induce serious eye damage (GHS Category 1), and chemicals that do not require classification
Bovine Cornea Opacity Permeability (BCOP, OECD TG 437) [75]: determines a test chemical's ability to cause opacity and permeability in an isolated bovine cornea.
Isolated Chicken Eye (ICE, OECD TG 438) [76]: determines a test chemical's ability to induce toxicity in an enucleated chicken eye.

Cytotoxicity and cell function-based tests
Short Time Exposure (STE, OECD TG 491) [79]: evaluates the eye irritation potential of a test chemical by measuring its cytotoxic effect on a rabbit corneal cell line.
Fluorescein Leakage (FL, OECD TG 460) [77]: evaluates the toxic effects of a short exposure to a substance by measuring sodium fluorescein permeability through an epithelial monolayer of MDCK kidney cells.

Macromolecular test
A biochemical assay that determines ocular toxicity through the premise that eye irritation and corneal opacity result from the denaturation or perturbation of corneal proteins.

Reconstructed human tissue (RhT)-based tests
Stem-cell-based retina models are also being explored as a promising replacement for in vivo ocular toxicity testing [8]. Due to the great cell-type diversity necessary to allow the mammalian retina to perform a series of intricate processes, the development of a truly biomimetic model has proven to be a complex task. 3D-layered organoids derived from embryonic and induced pluripotent stem cells were able to reproduce major, but not all, aspects of the retina; they hold promise as future 'organ-on-chip' models [96].

Skin Sensitisation
Skin irritation (also known as irritant contact dermatitis) is an inflammatory response of the skin to exogenous agents, depending on various factors which include the dose and nature of the irritant itself and the condition of the skin's barrier function; this response does not depend on prior sensitisation [97]. In contrast, skin sensitisation, or allergic contact dermatitis, is a delayed allergic reaction involving the adaptive immune system, with a T-cell mediated response occurring after an initial episode of sensitisation. The initial exposure may not elicit a visible skin reaction; the reaction is often observed only during subsequent contact with a sufficient dose of the sensitiser [98].
In 2020, a non-animal, next generation risk assessment (NGRA) tiered framework was developed to assess skin sensitisation. Tier 0 focuses on reviewing existing information (e.g., use scenario/consumer exposure, physicochemical properties and purity of the substance, in silico predictions, existing in vitro/historical data, identification of read-across candidates). In Tier 1, a hypothesis is generated, and all previously gathered data are considered. If the existing information is found to be insufficient, the generation of additional information through exposure estimation refinement or in vitro/in chemico testing is explored in Tier 2. This is followed by the determination of a point of departure (PoD; the dose-response point corresponding to a low or no effect level), characterisation of uncertainty, and comparison of the reference dose to consumer exposure following a WoE approach [99].
In 2021, the OECD published a guideline on Defined Approaches for Skin Sensitisation (OECD TG 497) [82], which was subsequently adopted by all OECD member countries, making skin sensitisation the latest toxicological endpoint to obtain a suitable replacement for animal testing. Much like the NGRA framework, this guideline combines multiple types of data to obtain a conclusion regarding the safety of a chemical and relies on different defined approaches (DAs) for hazard identification and potency categorisation, as described in Table 6.

Table 6. Summary of non-animal defined approaches (DAs) included in OECD TG 497 [82].

Columns: Defined Approach (DA); Information Sources; Capability (Hazard and/or Potency)

'2 out of 3' (2o3) DA
Information sources: K3 (dendritic cell activation): Human Cell Line Activation Test (h-CLAT; OECD TG 442E) [68]
Capability: Hazard. Able to distinguish chemicals that induce skin sensitisation (GHS Category 1) from chemicals that do not require classification.

Integrated Testing Strategy (ITS) DA
Capability: Able to distinguish chemicals that induce skin sensitisation (GHS Category 1) from chemicals that do not require classification and allocate skin sensitisers into GHS sub-categories (1A or 1B).
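The hazard call of the '2 out of 3' DA listed in Table 6 can be read as a majority vote over up to three information sources. The Python sketch below illustrates that reading only. It assumes that each assay result has already been reduced to a binary call, it treats the three information sources as anonymous slots, and it does not model any rules for borderline results that the guideline may apply. The function name and the use of `None` for a missing or unusable result are hypothetical.

```python
from typing import Optional, Sequence


def two_out_of_three(calls: Sequence[Optional[bool]]) -> Optional[bool]:
    """Return a hazard call from up to three binary assay calls.

    Each entry is True (positive), False (negative), or None (assay not run
    or result unusable). The function returns True (sensitiser) or False
    (non-sensitiser) once two calls agree, and None when no two usable
    calls are concordant.
    """
    if len(calls) > 3:
        raise ValueError("the 2o3 approach uses at most three information sources")
    positives = sum(1 for call in calls if call is True)
    negatives = sum(1 for call in calls if call is False)
    if positives >= 2:
        return True
    if negatives >= 2:
        return False
    return None
```

Under this reading, a third assay is only needed when the first two calls disagree, since two concordant calls already determine the outcome.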

Repeated Dose Toxicity
The final stage in determining the safety of a cosmetic ingredient is to calculate the margin of safety (MoS), which is mostly based on oral toxicity studies (unless substantial dermal toxicity data are available). When considering oral toxicity studies, the following equation is used to calculate the MoS of a cosmetic ingredient [8]:

$$\mathrm{MoS} = \frac{\mathrm{PoD}_{\mathrm{sys}}}{\mathrm{SED}}$$

The PoDsys is a dose descriptor for systemic exposure to a chemical that is calculated from the oral PoD using the fraction of the substance absorbed systemically. SED stands for Systemic Exposure Dose, which can be derived from the absolute amount of chemical that is bioavailable after a determined time, or from the percentage of the chemical that is absorbed through a dermal route. If PoD cannot be determined, then NOAEL (No-Observed-Adverse-Effect Level) or LOAEL (Lowest-Observed-Adverse-Effect Level) values can also be used [8].
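The calculation described above can be sketched in Python, assuming that the MoS is the ratio of PoDsys to SED and that PoDsys is the oral PoD multiplied by the fraction absorbed systemically. The function names, units, and numerical values are hypothetical and serve only as an illustration.

```python
def pod_sys(oral_pod_mg_per_kg_day: float, oral_absorbed_fraction: float) -> float:
    """Systemic point of departure derived from an oral PoD.

    oral_absorbed_fraction is the fraction (0 to 1) of the oral dose that
    is assumed to reach the systemic circulation.
    """
    if not 0.0 <= oral_absorbed_fraction <= 1.0:
        raise ValueError("absorbed fraction must lie between 0 and 1")
    return oral_pod_mg_per_kg_day * oral_absorbed_fraction


def margin_of_safety(pod_sys_mg_per_kg_day: float, sed_mg_per_kg_day: float) -> float:
    """Margin of safety as the ratio of PoDsys to the Systemic Exposure Dose."""
    if sed_mg_per_kg_day <= 0.0:
        raise ValueError("SED must be positive")
    return pod_sys_mg_per_kg_day / sed_mg_per_kg_day


if __name__ == "__main__":
    # Hypothetical ingredient: oral PoD of 100 mg/kg bw/day, 50% oral
    # absorption, and a systemic exposure dose of 0.25 mg/kg bw/day.
    systemic_pod = pod_sys(100.0, 0.5)
    print(margin_of_safety(systemic_pod, 0.25))
```

Keeping the two steps separate makes it explicit that the absorption correction is applied to the PoD before the comparison with consumer exposure.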
NOAEL and LOAEL values are the outcome of in vivo repeated dose toxicity studies performed over a period of 28 or 90 days on a variety of vertebrates. The lack of validated NAMs for determining these values poses a great problem for the introduction of new cosmetic ingredients to the EU market [8].
Over the past decades, several in vitro methods for the assessment of this endpoint have been developed, explored, and summarised in EURL ECVAM reports [100][101][102][103]. Further innovation has been achieved by the project known as 'Safety Evaluation Ultimately Replacing Animal Testing' (SEURAT), the first phase of the ambitious research strategy launched by the EC and Cosmetics Europe. Moving from traditional in vivo testing to predictive toxicology, SEURAT-1 adopted a Mode-of-action (MoA) approach [104,105], which can be defined as "a description of key events or processes by which an agent causes a disease state or other adverse effect" [104]. By relying on existing data, in silico methods, and biokinetic considerations, SEURAT-1 led to a case study outlining an ab initio ('from the beginning') workflow [106]. This workflow organises knowledge and data in a logical order for an integrated safety assessment that considers numerous data streams [103]. SEURAT-1 went on to publish 6 yearly reports until 2016, and was then followed by a new European project, EU-ToxRisk, which received an investment of over 30 million euros, and continues to drive research on non-animal, mechanism-based toxicity testing and risk assessment [103,107].

Reproductive Toxicity/Endocrine Disruption
Reproductive toxicity covers the effects of toxicants on any physiological process and/or anatomical structure related to animal reproduction and development. Signalling within and between organs/cells is required as part of normal reproduction and development, making this process especially prone to xenobiotic-induced adverse effects. This specifically applies to the disruption/interference of signalling pathways involving the primary gonadal steroids (androgens and oestrogens). Xenobiotics that imitate and/or inhibit the actions of these sex hormones are classified as 'endocrine disruptors'. Ultimately, endocrine disruption is one facet of the complex process of animal reproduction [108].
The mammalian reproductive cycle comprises distinct phases: male and female fertility, implantation, and pre- and post-natal development. Due to their complexity, these phases cannot be assessed by one alternative method; rather, a battery of tests is needed [8]. The creation of an alternative testing strategy for reproductive toxicity was tasked to the Integrated Project ReProTect, a consortium set up by the ECVAM [109]. This project identified four research areas of interest: three coinciding with the different phases of the reproductive cycle, and a fourth dedicated to the identification of endocrine disruptors [110]. During its final year, this project conducted a ring trial where blinded chemicals with well-documented toxicological profiles underwent a battery of 14 in vitro tests developed within ReProTect. When coupled with a WoE approach, a comparative analysis allowed for a robust prediction of adverse effects on fertility and embryonic development previously observed in vivo [111]. However, further research is required until regulatory acceptance is achieved [8].
Currently, there are eight EU-funded projects focused on the identification of chemicals that lead to reproductive toxicity/endocrine disruption, housed within the European Cluster to Improve Identification of Endocrine Disruptors (EURION) [112]. Endocrine disruption is also the toxicological endpoint with the highest number of methods (19) currently under review, as per the TSAR database [37]. Nevertheless, due to the complexity of this endpoint, animal testing may still be required to ensure consumer safety [8].

Mutagenicity/Genotoxicity
Mutagenicity applies to a permanent change, i.e., mutation in the structure or quantity of genetic material. Mutagenic agents cause a rise in the incidence of mutations in populations of cells and/or organisms. This might lead to a heritable modification in the organism's traits, often reflected phenotypically. Genotoxicity is a more general term, referring to agents or conditions that modify the structure, information content or segregation of genetic material. Going beyond mutation, genotoxic agents may also induce DNA damage by disrupting normal replication processes, or by altering DNA replication in a temporary, non-physiological fashion [8,113].
The goal of genotoxicity testing is to rule out or identify possible human risks and, for substances that yield positive results, to help in the understanding of their mechanism of action [113]. Gene mutation (point mutations or deletions/insertions that affect single genes or blocks of genes), clastogenicity (structural chromosome changes), and aneuploidy (numerical chromosome aberrations) are three major endpoints of genetic damage associated with human disease [114]. These can be efficiently assessed through a combination of a bacterial gene mutation (Ames) test (OECD TG 471) [115], and an in vitro micronucleus test (MNvit, OECD TG 487) [71], the latter being able to detect both clastogens and aneugens.
A strategy for genotoxicity/mutagenicity testing of cosmetic ingredients utilises a combination of initial considerations (read-across, QSAR and other in silico models, physicochemical properties, etc.) to inform future steps, including the selection of in vitro methods. A material is classified as an in vitro mutagen if either of the two tests is positive. Additional animal testing is required to rule out the substance's possible in vivo mutagenicity [8].
The requirement for additional in vivo animal testing can lead to the loss of a cosmetic ingredient from further development, as currently there are no validated in vitro assays to assess genotoxicity through dermal exposure. To target this limitation, the reconstructed human skin (RS) comet test and the reconstructed human skin micronucleus (RSMN) assay were developed through a Cosmetics Europe effort. These methods are currently undergoing assessment by the EURL ECVAM [8,116].

Carcinogenicity
Carcinogenic substances cause or exacerbate the incidence of tumours, promote malignancy, or decrease the period until tumour formation after inhalation, ingestion, topical application, or injection. These can be further differentiated as genotoxic carcinogens (GTxC) and non-genotoxic carcinogens (NGTxC), i.e., chemicals that produce a carcinogenic action by processes other than direct interactions with DNA [8].
Mutagenicity/genotoxicity test batteries may be applied as a pre-screening method for assessing the carcinogenicity of genotoxic substances. A positive result indicates that the substance may be considered as a potential carcinogen, and further testing is required [8,117]. While the standard rodent carcinogenicity bioassay (RCB) is considered the gold standard for carcinogenicity testing, it involves extensive animal use and poses multiple limitations, including high costs, prolonged duration of the study (2 years), and insufficient mechanistic information, which can make extrapolation difficult [118,119].
The Cell Transformation Assay (CTA) was explored as a prospective in vitro replacement for carcinogenicity testing, as it addresses several endpoints reflecting different stages within the process of carcinogenesis [8]. It provides a simple oncotransformation endpoint that may be used to link exposure to the development of malignancy [118,119]. When considered alongside additional information such as genotoxicity data, structure-activity analyses, and pharmaco/toxicokinetic data, CTAs should allow for a wide-ranging assessment of carcinogenic potential [8]. While the OECD concluded that CTAs could not be considered as a stand-alone method to assess carcinogenesis [120], the EURL ECVAM has validated three CTAs: the BALB/c 3T3 CTA [121], the Syrian Hamster Embryo (SHE) CTA (OECD GD 214; OECD, 2015a), and the Bhas 42 CTA (OECD GD 231) [60], which may be used in a WoE approach [8,118].
CTAs were considered as potential building blocks of an integrated approach for testing and assessment (IATA) of NGTxCs [118], the development of which was tasked to an expert group established by the OECD in 2016 [122]. The adverse outcome pathway (AOP) concept, which allows for hazard assessment through incorporation of mechanistic information relating to a chemical [83], was applied to the development of various cancer models, alongside the identification of primary mechanisms of action. The project has identified 100 in vitro assays that warrant further evaluation [120].
The same mechanistic approach was also applied to a recent EURL ECVAM project which developed a new methodology for carcinogenicity testing. By combining data originating from several systemic toxicity endpoints, rather than studying them individually and integrating them with other data sources, this project was able to create an efficient testing strategy and reduce unnecessary toxicity studies [116,123].

Photo-Induced Toxicity
Phototoxicity is an acute response to the activation of photoreactive chemicals after exposure to ultraviolet radiation/visible light (UV-Vis), and their transformation into cytotoxic agents. Phototoxicity can be caused by chemicals that come into contact with the skin and are absorbed, or by systemically absorbed substances that find their way to the skin and are then activated by incident radiation. These can then lead to skin irritation or sensitisation (depending on the presence of an immune response), alongside visible symptoms such as erythema, pruritus, and oedema [124].
The 3T3 Neutral Red Uptake Phototoxicity Test [125,126] is a validated in vitro method for the assessment of photo-induced irritation. It assesses photo-cytotoxicity by evaluating the relative reduction in cell viability following sample exposure in the presence versus absence of UV-Vis radiation. While highly sensitive (93%), this method cannot predict photogenotoxicity/photomutagenicity, photocarcinogenicity, or photoallergy (photosensitisation) [124]. However, chemicals with positive results in the 3T3 NRU PT test are likely to show photo-allergic properties; further testing to assess in vitro photoallergenic potential could be conducted through different assays [127]. Since the generation of reactive oxygen species (ROS) following UV-Vis exposure is described as a key determinant of chemicals causing direct phototoxic reactions, the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) guideline S10 also recommends an optional initial in chemico screening tool (the ROS assay) for evaluating photoreactivity [70,128,129].
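One common way of expressing the comparison between irradiated and non-irradiated conditions is a photo-irritation factor (PIF), i.e., the ratio of the concentration that halves cell viability without irradiation to the corresponding concentration with irradiation. The Python sketch below illustrates this under stated assumptions: the cut-off values of 2 and 5 are recalled from the test guideline for this assay rather than taken from the text above, the treatment of values falling exactly on a cut-off is a choice made here, and the function names and labels are hypothetical.

```python
def photo_irritation_factor(ic50_dark: float, ic50_irradiated: float) -> float:
    """Ratio of the IC50 without irradiation to the IC50 with irradiation.

    Both values must be expressed in the same concentration unit.
    """
    if ic50_dark <= 0.0 or ic50_irradiated <= 0.0:
        raise ValueError("IC50 values must be positive")
    return ic50_dark / ic50_irradiated


def interpret_pif(pif: float, lower_cutoff: float = 2.0, upper_cutoff: float = 5.0) -> str:
    """Map a PIF onto a three-level prediction.

    The default cut-offs are assumptions and should be checked against the
    applicable test guideline before any use.
    """
    if pif < lower_cutoff:
        return "no phototoxicity predicted"
    if pif < upper_cutoff:
        return "equivocal"
    return "phototoxicity predicted"
```

A substance that is ten times more cytotoxic under irradiation (for instance, hypothetical IC50 values of 100 and 10 in the same unit) would fall into the highest category in this sketch.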
The evaluation of photogenotoxicity/photomutagenicity has been deemed superfluous across multiple guidelines [130,131]. However, in the event that a molecule's structure, light absorption capacity, or ability to be photoactivated signals such potential, the SCCS guidance [8] suggests the completion of in vitro tests, e.g., photo-Ames test and photo-Comet assay [132].

Toxicokinetics
The term "toxicokinetics" refers to the processes of a chemical's absorption, distribution, metabolism, and excretion (ADME) after it enters the body. It gives information on time-dependent blood/plasma or tissue concentrations of a chemical, its propensity for accumulation and biotransformation, alongside its potential for induction/inhibition of biotransformation upon exposure [8,101]. Toxicokinetics is regarded as a critical component in assessing systemic effects when transitioning from traditional toxicological safety assessment methods based on whole animals to approaches based on in vitro and in silico technologies [9,101]. There are currently no verified alternative approaches that thoroughly cover the subject of ADME. Some in vitro models may be useful for assessing certain endpoints of this process; however, the majority have not been officially validated [8].
The most promising NAM in this field, which has been under consideration for more than a decade, is the application of physiologically-based toxicokinetic (PBTK) computational models, which can be used to produce human whole-body toxicokinetic information [101,102]. In a PBTK model, the body is depicted as a collection of compartments (individual organs), linked by blood flow. To forecast the concentration-time profile of a substance in tissues, cellular compartments, or sub-compartments, the models combine knowledge of physiology and anatomy with chemical-specific information. These models allow for the integration of human data provided by in silico and in vitro ADME approaches; however, the absence of standardisation hinders their regulatory approval and application [102,103,133].
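The compartmental structure described above can be illustrated with a deliberately simplified Python sketch, in which a blood pool exchanges a substance with tissue compartments in proportion to their blood flow, and one tissue removes it by metabolic clearance. This is not a validated PBTK model: the intravenous bolus dose, the flow-limited exchange, the explicit Euler integration, and all names and parameter values are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Compartment:
    name: str
    volume_l: float
    blood_flow_l_per_h: float
    partition_coefficient: float  # tissue:blood concentration ratio at equilibrium
    clearance_l_per_h: float = 0.0  # metabolic clearance; 0 for non-eliminating tissues
    amount_mg: float = 0.0


def simulate_pbtk(
    dose_mg: float,
    blood_volume_l: float,
    compartments: List[Compartment],
    duration_h: float,
    step_h: float = 0.001,
) -> Tuple[List[Tuple[float, float]], float]:
    """Return the blood concentration-time profile and the amount eliminated.

    The dose is placed in the blood at time zero. Tissue amounts are updated
    in place on the Compartment objects.
    """
    blood_mg = dose_mg
    eliminated_mg = 0.0
    profile = []
    for i in range(int(round(duration_h / step_h))):
        c_blood = blood_mg / blood_volume_l
        net_uptake_mg = 0.0
        for comp in compartments:
            # Concentration in blood leaving the tissue (flow-limited assumption).
            c_out = comp.amount_mg / (comp.volume_l * comp.partition_coefficient)
            uptake = comp.blood_flow_l_per_h * (c_blood - c_out) * step_h
            cleared = comp.clearance_l_per_h * c_out * step_h
            comp.amount_mg += uptake - cleared
            net_uptake_mg += uptake
            eliminated_mg += cleared
        blood_mg -= net_uptake_mg
        profile.append(((i + 1) * step_h, blood_mg / blood_volume_l))
    return profile, eliminated_mg


if __name__ == "__main__":
    # Hypothetical parameter values, not taken from any data set.
    tissues = [
        Compartment("liver", 1.8, 90.0, 2.0, clearance_l_per_h=30.0),
        Compartment("rest of body", 60.0, 200.0, 1.0),
    ]
    profile, eliminated = simulate_pbtk(100.0, 5.0, tissues, duration_h=24.0)
    print(f"Blood concentration after 24 h: {profile[-1][1]:.4f} mg/L")
    print(f"Amount eliminated after 24 h: {eliminated:.2f} mg")
```

In practice, the chemical-specific inputs of such a model (partition coefficients, clearance) are where in silico and in vitro ADME data would enter, which is the integration role described above.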

Public Perception of Animal Testing in Cosmetics
Only a few studies have investigated the attitudes of the general public towards animal research in the cosmetic field [15,134,135].
The 2018 IPSOS Mori survey [15] highlighted that one's perception of this topic is not always based on an objective understanding of the field: participants who believed that the testing of cosmetic products and ingredients was still legally allowed in the UK (38%) were also more likely than the general population to feel well-informed about animal experimentation (41% compared with 35% overall).
Multiple studies have found that familiarity with scientific research increased support of animal studies [136][137][138][139], but this finding was not universal. It has been argued that knowledgeable members of the public and those who are more familiar with animal research are generally less supportive of morally contentious areas, e.g., animal studies performed on cosmetic products/ingredients [140,141]. However, the measurement of 'knowledge' or 'familiarity' could benefit from being standardised to account for cognitive bias and self-perception fallacies, as highlighted by IPSOS Mori [15].
Aldhous, Coghlan, and Copley [142] found that, when an experiment was devised to evaluate the safety of a cosmetic component rather than the safety and efficacy of a drug or vaccine, participants were more inclined to disapprove; surveys by IPSOS Mori [15] and the Fund for the Replacement of Animals in Medical Experiments [143] confirmed this finding. Furthermore, consumers' purchasing choices when considering cosmetics are heavily dependent on the absence of animal testing [143]. Indeed, the cosmetics industry has been responding to this consumer demand for over two decades, with cruelty-free certifications being a part of product marketing since 1996 [18].
The cosmetics industry's support for NAMs, which started in the 1980s through funding and scientific contributions, continues to this day. A leading example in the EU was the Long-Range Science Strategy (LRSS) programme, funded by the members of Cosmetics Europe from 2016 to 2020. Worth mentioning is the work of industry conglomerate L'Oréal in the development of the first reconstructed human epidermis model in 1979 [142], leading to the validation of the EpiSkin™ in vitro model by the EURL ECVAM in 1998 [143]. There are also industry grants for projects on the development and acceptance of NAMs, e.g., the annual Lush Prize [144]. Clearly, the industry must improve communication with its consumers in this area, since there is no evidence that the public is aware of its effort and investment.
Another factor that has been found to positively influence consumers' purchasing decisions is the perception of a reduced environmental impact, usually associated with words such as 'eco' or 'green' [145][146][147]. This demand attracted brands and raw material manufacturers to Green Chemistry, described as the use of a set of principles in the design, production, and application of chemicals that lowers or eliminates the use or creation of hazardous compounds, hence posing a lower risk to human health and the environment [148].
It is worth noting that hazard-based approaches rely on the mere presence of a potentially harmful agent, whereas a risk-based approach aims to establish health-based guidance using toxicological data [149]. The priority given by Green Chemistry to hazard above risk may lead to confusion and perpetuate the dichotomy between 'real' and 'perceived' risk. This is often observed when so-called 'controversial technologies' (i.e., ingredients which can be hazardous to a degree, but pose no risk to human health or the environment when used in cosmetics) are being evaluated by the public [150].
Following similar principles to those of Green Chemistry, the emerging field of Green Toxicology is of particular importance in this area. It brings into focus in silico predictive toxicology, aiming to lower waste, produce human-relevant data, and reduce animal use [151]. Its derivation from the already popular Green Chemistry, alongside its association with improved animal welfare, makes it of particular interest to those wishing to gather public support around NAM development [152].

Conclusions
When looking at the work of Russell and Burch from a 21st century perspective, three major drivers for the development of non-animal testing become apparent: ethical considerations, the lack of effective extrapolation, and economic considerations. The ethical considerations, which they tried to avoid, were in fact a major incentive for ending animal experimentation and have since led the quest to obtain human-relevant data at the lowest cost.
Animal welfare activism and public opinion have laid the weight of discontinuing animal testing on the shoulders of the cosmetics industry, ultimately leading to a fourth and very important driver: animal testing bans. The response coming from collaborations between scientists, NGOs, policy makers, and the cosmetics industry has led to what is now the constantly evolving field of New Approach Methodologies (NAMs).
While no validated alternatives exist currently to assess acute toxicity, repeated dose toxicity, reproductive toxicity, carcinogenicity, and a major part of toxicokinetics, new multi-factorial/tiered approaches are being developed, moving away from the traditional, and often unattainable, one-for-one replacement. Skin sensitisation assessment has recently achieved 'cruelty-free' status through the development of the Next Generation Risk Assessment (NGRA) tiered framework, and similar approaches are currently being developed for other endpoints. Further examples include the combination of data derived from multiple endpoints and the use of in silico computational models such as Physiologically-Based Toxicokinetics (PBTKs).
However, despite the rapid growth that the field of regulatory toxicology has seen over the past decade, public opinion continues to focus on the absence of animal testing, rather than the development of methods that can help to discontinue it across multiple industries. Increasing public awareness seems to be the answer for those wishing to gain support in the development and regulatory acceptance of NAMs.

Figure 1. A mind map of the search terms.

Figure 2. (a) Adopted alternative testing methods, based on TSAR data [37]; (b) Toxicological endpoints of adopted NAMs related to human safety of cosmetics, based on TSAR data [37].

Figure 3. Bottom-Up and Top-Down in vitro testing strategy approach for irritation testing (modified from [94]).

Table 1. Literature sources used for this review.

Table 2. Animal testing bans [17,54,55]; (a) ban relating to the testing of cosmetic products within the country's borders; (b) ban relating to the testing of cosmetic ingredients within the country's borders; (c) ban relating to the marketing of products tested on animals, within the country's borders or otherwise.

Table 3. Adopted NAMs relating to toxicological endpoints relevant to human hazard assessment in cosmetics, based on TSAR (modified from [37]).
Vitrigel-EIT (OECD TG 494) [81]: Determines eye irritation potential of a test chemical by evaluating its ability to induce damage to the barrier function of a hCE model fabricated in a Collagen Vitrigel Membrane (CVM) chamber, by measuring relative changes in transepithelial electrical resistance (TEER) over time.