ChemSkin Reference Chemical Database for the Development of an In Vitro Skin Irritation Test

Since the animal test ban on cosmetics in the EU in 2013, alternative in vitro safety tests have been actively researched to replace in vivo animal tests. For the development and evaluation of a new test method, reference chemicals with quality in vivo data are essential to assess the predictive capacity and applicability domain. Here, we compiled a reference chemical database (ChemSkin DB) for the development and evaluation of new in vitro skin irritation tests. The first candidates were selected from 317 chemicals (source data n = 1567) identified in the literature of the last 20 years, including previous validation study reports, ECETOC, and published papers. Chemicals showing inconsistent classification or those that were commercially unavailable, difficult or dangerous to handle, prohibitively expensive, or without quality in vivo or in vitro data were removed, leaving a total of 100 chemicals. Supporting references, in vivo Draize scores, UN GHS/EU CLP classifications, and commercial sources were compiled. Test results produced by the approved methods of OECD Test No. 439 were included and compared using a classification table, scatter plots, and Pearson correlation analysis to identify false predictions and differences between in vitro skin irritation tests. These results may provide insight into the future development of new in vitro skin irritation tests.


Introduction
Cosmetics and toiletries are the main sources of exposure to potentially dangerous chemicals among the general public [1]. Therefore, it is essential to evaluate the toxicity of the chemicals used in these products before they are released to market. Most of the toxicity tests required for the safety evaluation of chemicals use experimental animals, but more than 35 countries have banned animal tests for cosmetics since 2013 [2,3]. Various alternatives have been developed to replace animal testing for cosmetics, including tests in nonvertebrate animals and in vitro, in chemico, and in silico methods [4][5][6]. For a regulatory body to accept data produced by methods other than the standard test method, the new methods must be shown, through officially endorsed procedures, to address the toxic endpoint with the same relevance and reliability as the existing standard test method. This validation procedure is described in OECD Guidance Document 34 [7].
Relevance, also called predictive capacity, is evaluated by comparing the concordance of the prediction for reference chemicals made by the new test method with that from the existing standard test method, generally an in vivo animal test [8,9]. Reference chemicals with quality in vivo data are, therefore, important for evaluating the relevance of new test methods [10]. A reference chemical database for in vivo eye irritation test sensitization has been well established [11]. However, to the best of our knowledge, a reference chemical database for the in vivo skin irritation test has not yet been established.
The OECD Test No. 439: In vitro Skin Irritation: Reconstructed Human Epidermis (RhE) Test Method [18] was developed to distinguish No category chemicals from other chemicals in accordance with the UN GHS classification. The OECD test guideline (TG) 439 provides an in vitro procedure using RhE to predict the skin hazard of irritant chemicals (substances and mixtures) based solely on the cell viability value obtained with the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay. Three validated reference methods (VRMs), the EpiDerm™ skin irritation test (SIT), EpiSkin™ SIT, and SkinEthic RhE SIT, were originally approved for OECD TG 439 in 2010. Following the performance standards of TG 439, four me-too models were additionally approved for OECD TG 439 in 2021 [19], and new models such as USP-RhE [20] are under development for inclusion in TG 439. However, TG 439 has several limitations. It does not classify chemicals into the optional UN GHS Category 3 (mild irritants). In addition, the TG 439 RhE test methods cannot distinguish between UN GHS Categories 1 and 2; thus, further information on skin corrosion is required to decide on the final classification of certain chemicals. To resolve these limitations, inclusion of an IL-1α assay was considered [21,22], but it did not outperform the MTT assay [23]. Therefore, a novel skin irritation test method is still in demand to overcome the limitations of the current OECD TG 439.
One of the major problems for TG 439 is that incongruent sets of reference chemicals have been used to evaluate the predictive capacity of the individual RhE methods in TG 439, suggesting a need to establish a well-characterized reference chemical database for the evaluation of novel skin irritation test methods [24] that overcome the limitations of the current TG 439. Here, we compiled a reference chemical database (ChemSkin DB) for the development and evaluation of alternatives to the in vivo skin irritation test. The first candidates were selected from 317 chemicals (source data n = 1567) identified in the literature of the last 20 years, including previous validation study reports, ECETOC, and published papers. Chemicals without GHS category information or with information only on past EU classification criteria were removed. In addition, chemicals showing inconsistent classification results and those without both in vivo and in vitro test results were excluded, leaving a total of 100 chemicals. Supporting references, in vivo Draize scores, UN GHS/EU CLP classifications, and commercial sources reported in the manuscripts were compiled. Furthermore, test results produced by OECD TG 439 (RhE Test Method) were included if available, to provide insight into the future development of alternatives to the in vivo skin irritation test.
As described in OECD TG 404 [13], the in vivo skin irritation test (Draize test) is conducted, and erythema and edema scores at 24 and 72 h post-exposure (or 24, 48, and 72 h, if available) are combined and averaged into a primary irritation index (PII). According to the PII, the skin irritancy of a test chemical is classified as described in Table 1 [25]. According to the EU DSD/DPD criteria, the skin irritancy of a test chemical is classified as follows: an in vivo score of 2.0 or higher is classified as R38 (irritant) and a score less than 2.0 is classified as No label (non-irritant).

Materials and Methods
As a replacement for the EU DSD/DPD regulations, the CLP regulation was created as a new system that reflects the UN GHS classification and labeling regulation. It is referred to as the EU CLP or UN GHS/CLP and differs from the current UN GHS. Although the CLP regulation was defined in 2008 as Regulation (EC) No. 1272/2008 [16], it has been used interchangeably with the classification criteria of the previous EU DSD/DPD [15], which it has now completely replaced. According to this UN GHS/CLP classification, the skin irritancy of a test chemical is classified as follows: scores < 2.3 indicate No category (non-irritant) and scores ≥ 2.3 indicate Category 2 (irritant).

Classification Based on the Current UN GHS
In the revised version of the UN GHS 2019 [17], the skin irritancy of a test chemical is classified as follows: No category (non-irritant) for scores < 1.5, Category 3 (mild irritant) for scores ≥ 1.5 and < 2.3, and Category 2 (irritant) for scores ≥ 2.3 and ≤ 4.0. Category 3 is newly adopted, but only a few authorities employ it. OECD TG 439 treats Category 3 chemicals as not classified [19].
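The score bands above can be expressed as a small lookup. The following is a minimal sketch, not part of the ChemSkin DB itself; the function names `classify_un_ghs` and `collapse_for_tg439` are hypothetical and chosen only for illustration.

```python
def classify_un_ghs(in_vivo_score: float) -> str:
    """Map an in vivo score (0.0 to 4.0) to the current UN GHS skin irritation class.

    Bands follow the text: No category < 1.5, 1.5 <= Category 3 < 2.3,
    2.3 <= Category 2 <= 4.0. Corrosive (Category 1) chemicals are outside
    the scope of this score-based sketch.
    """
    if not 0.0 <= in_vivo_score <= 4.0:
        raise ValueError("in vivo score must lie between 0.0 and 4.0")
    if in_vivo_score >= 2.3:
        return "Category 2"
    if in_vivo_score >= 1.5:
        return "Category 3"
    return "No category"


def collapse_for_tg439(category: str) -> str:
    """OECD TG 439 treats the optional Category 3 as not classified."""
    return "No category" if category == "Category 3" else category


if __name__ == "__main__":
    # Illustrative scores only; these are not database entries.
    for score in (0.4, 1.5, 2.2, 2.3, 4.0):
        ghs = classify_un_ghs(score)
        print(score, ghs, collapse_for_tg439(ghs))
```

Keeping the Category 3 collapse in a separate function makes it explicit where the three-tier UN GHS scheme and the two-tier TG 439 prediction diverge.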

Classification of Edema/Erythema of 100 Reference Chemicals Based on the In Vivo Draize Test
The in vivo scores of 100 reference chemicals of the ChemSkin DB were obtained through the literature search. ChemSkin DB chemicals were newly classified according to the in vivo classes and in vivo categories described above.

Comparison of In Vivo and In Vitro Data of ChemSkin DB Reference Chemicals
In vivo scores and in vitro viability values obtained from OECD TG 439 were compared for ChemSkin DB reference chemicals. A chemical was identified as an irritant when its in vivo score was 2.3 or higher, or when its tissue viability was at or below the 50% cut-off. The in vivo score was compared with the viability data of the VRMs and the KeraSkin™ model. The VRMs used in the analysis are EpiSkin™, SkinEthic™ RHE, EpiDerm™, and LabCyte EPI-MODEL24. The results of the test methods were plotted as scatter plots to show the data distribution, and the cut-off values were displayed so that incorrect predictions could be identified easily.
The predictive capacity of the VRMs and KeraSkin™ was assessed in terms of sensitivity, specificity, and accuracy. According to the OECD TG 439 performance standard, a sensitivity of ≥ 80%, specificity of ≥ 70%, and accuracy of ≥ 75% must be satisfied.
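As an illustration of how these three indexes relate to the cut-offs, the sketch below computes them from paired in vivo scores and viability values. It assumes that an in vivo score ≥ 2.3 marks an in vivo irritant and that viability ≤ 50% marks an in vitro irritant; the function names and the example data are hypothetical and are not taken from the ChemSkin DB.

```python
from typing import NamedTuple, Sequence

IN_VIVO_CUTOFF = 2.3      # in vivo score >= 2.3 -> irritant
VIABILITY_CUTOFF = 50.0   # assumed: tissue viability <= 50% -> irritant


class PredictiveCapacity(NamedTuple):
    sensitivity: float
    specificity: float
    accuracy: float


def predictive_capacity(scores: Sequence[float],
                        viabilities: Sequence[float]) -> PredictiveCapacity:
    """Compare in vitro predictions against in vivo classes, chemical by chemical."""
    if len(scores) != len(viabilities):
        raise ValueError("scores and viabilities must have the same length")
    tp = fn = tn = fp = 0
    for score, viability in zip(scores, viabilities):
        irritant_in_vivo = score >= IN_VIVO_CUTOFF
        irritant_in_vitro = viability <= VIABILITY_CUTOFF
        if irritant_in_vivo and irritant_in_vitro:
            tp += 1
        elif irritant_in_vivo:
            fn += 1   # irritant missed by the in vitro method
        elif irritant_in_vitro:
            fp += 1   # non-irritant over-predicted by the in vitro method
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one irritant and one non-irritant")
    return PredictiveCapacity(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        accuracy=(tp + tn) / len(scores),
    )


def meets_performance_standard(pc: PredictiveCapacity) -> bool:
    """Thresholds quoted in the text: >= 80%, >= 70%, >= 75%."""
    return pc.sensitivity >= 0.80 and pc.specificity >= 0.70 and pc.accuracy >= 0.75


# Hypothetical data for ten chemicals (not ChemSkin DB values).
example_scores = [3.0, 2.5, 3.5, 2.3, 0.0, 0.5, 1.0, 1.8, 2.0, 0.2]
example_viabilities = [10.0, 20.0, 5.0, 60.0, 90.0, 80.0, 40.0, 95.0, 85.0, 100.0]
example_result = predictive_capacity(example_scores, example_viabilities)
print(example_result, meets_performance_standard(example_result))
```

In this made-up data set, one irritant is missed and one non-irritant is over-predicted, so the sensitivity of 75% falls short of the required 80% even though specificity and accuracy pass.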
The Pearson correlation coefficient was obtained to confirm the correlation between in vivo score and viability. The Pearson correlation coefficient is a measure of the degree of linear correlation between two variables, and it is one of the most widely used measures of relationships [26,27]. It ranges from −1 to +1; the closer its absolute value is to 1, the stronger the association between the two variables. Absolute correlation coefficients ≤ 0.35 are generally considered to represent low or weak correlations, those of 0.36 to 0.67 indicate a modest or moderate correlation, those of 0.68 to 1.0 indicate a strong or high correlation, and those ≥ 0.90 reflect a very high correlation [28].
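The calculation and the interpretation bands can be sketched as follows. This is an illustrative implementation using only the standard library, not the software used in the study; the names `pearson_r` and `describe_strength` are hypothetical, and values falling between the quoted bands (for example 0.355) are assigned to the next higher band as an assumption.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)


def describe_strength(r: float) -> str:
    """Interpretation bands quoted in the text, applied to |r|."""
    magnitude = abs(r)
    if magnitude >= 0.90:
        return "very high"
    if magnitude >= 0.68:
        return "strong"
    if magnitude > 0.35:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    # Hypothetical scores and viabilities with a perfectly inverse relationship.
    scores = [0.0, 1.0, 2.0, 3.0]
    viabilities = [100.0, 75.0, 50.0, 25.0]
    r = pearson_r(scores, viabilities)
    print(r, describe_strength(r))
```

A negative coefficient is expected here because higher in vivo irritation scores should correspond to lower tissue viability; the strength is judged on the absolute value.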

Chemical Selection for the Establishment of ChemSkin DB
To establish the ChemSkin DB, reference chemicals were searched from the literature of the last 20 years and reviewed for the quality of in vivo data and the availability of in vivo scores, which is critical for classification into the optional Category 3 recently introduced by the UN GHS. In addition to in vivo results, human patch test results and the in vitro results produced by four validated reference methods of OECD TG 439 were included. The final ChemSkin DB was completed by adding the source literature information or official review reports (SCCS, SCCP, CIR, etc.). Through this procedure, 100 reference chemicals were included (source data n = 1567) in the final version of ChemSkin DB. The composition of the 100 chemicals is shown in Table 2 and Figures 2 and 3.

Table notes:
... [19,32]. Under this test guideline, Cat 3 is considered as No category (non-irritant).
# Twenty reference chemicals are performance standard (PS) chemicals based on the Series on Testing and Assessment No. 220 [32]. These PS relate to the present OECD TG 439 and are available to facilitate the validation and assessment of similar and modified RhE-based test methods, in accordance with the principles of OECD Guidance Document No. 34 [7].
## 1-Decanol (a borderline reference chemical) and di-n-propyl disulphide (a false negative of the VRM) are non-irritants in humans, although they are identified as irritants in the rabbit test. Since RhE models are based on cells of human origin, they may predict these reference chemicals as non-irritants (UN GHS No category).
### According to the OECD TG 439, 1-methyl-3-phenyl-1-piperazine and 1-bromohexane can give variable results in different laboratories depending on the supplier.
#### According to the Series on Testing and Assessment No. 137 [31], methyl palmitate is a false negative in EpiSkin™, modified EpiDerm™, and SkinEthic™ RHE. This chemical is also a non-irritant to humans based on the human 4-h patch test.
* EpiSkin™ data produced in China [37];

The reference chemical group included 46 No category chemicals, 18 optional Category 3 chemicals, 22 Category 2 chemicals, 5 Category 1B chemicals, 3 Category 1C chemicals, and 6 Category 1B/1C chemicals (Figure 2). Among the total chemical group, there were 76 liquids, 23 solids, and 1 gel (Figure 3). A total of 51 studies were reviewed to establish ChemSkin DB; 41 were used for in vivo data, consisting of 22 published papers, 14 reports, and 5 government documents. In vivo Draize scores and UN GHS/EU CLP classification information were also added. In addition, the in vitro data produced by the RhE test methods of OECD TG 439 were sourced from 16 published papers, 3 reports, and 4 government documents.

In Vivo Draize Scores of 86 Chemicals in ChemSkin DB
Scatter plots are widely used to analyze data distribution across categories [66]. The in vivo Draize scores of 86 chemicals, excluding the 14 Cat 1 chemicals, are plotted in Figures 4 and 5. Plotting the chemicals by irritant or non-irritant classification showed that some irritant chemicals have scores just above the threshold of 2.3, suggesting that they may be predicted as false negatives (Figure 4). With irritants defined by an in vivo score ≥ 2.3 and non-irritants by a score < 2.3, the mean ± SD of the in vivo score was 3.27 ± 0.62 for irritants (n = 20; excluding Cat 1 chemicals and chemicals without a stated score) and 0.78 ± 0.81 for non-irritants (n = 55; excluding chemicals without a stated score). When Cat 3 is considered, the in vivo score was 3.27 ± 0.62 for Cat 2, 1.92 ± 0.14 for Cat 3, and 0.36 ± 0.47 for No Cat (Figure 5), which conforms to the current UN GHS classification criteria (No Cat < 1.5, 1.5 ≤ Cat 3 < 2.3, 2.3 ≤ Cat 2 ≤ 4.0).

Figure caption: The scatter plot indicates the irritation signs for the Draize scoring. The in vivo score of 2.3 is the cut-off for Cat 2 and 1.5 is the cut-off for Cat 3. Scores of NC chemicals without an available in vivo score were set to 0.
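The per-category summary and the conformity check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the sample standard deviation is used (the text does not say which SD was calculated), the function names `summarize_by_category` and `mean_within_band` are hypothetical, and the example records are invented rather than drawn from the ChemSkin DB.

```python
from statistics import mean, stdev
from typing import Dict, Iterable, Tuple

# Current UN GHS score bands quoted in the text (lower bound inclusive).
GHS_BANDS = {
    "No Cat": (0.0, 1.5),
    "Cat 3": (1.5, 2.3),
    "Cat 2": (2.3, 4.0),
}


def summarize_by_category(
    records: Iterable[Tuple[str, float]]
) -> Dict[str, Tuple[int, float, float]]:
    """Return {category: (n, mean, sample SD)} for (category, score) records."""
    groups: Dict[str, list] = {}
    for category, score in records:
        groups.setdefault(category, []).append(score)
    return {
        category: (
            len(scores),
            mean(scores),
            stdev(scores) if len(scores) > 1 else float("nan"),
        )
        for category, scores in groups.items()
    }


def mean_within_band(category: str, mean_score: float) -> bool:
    """Check that a category mean falls inside its own UN GHS band."""
    low, high = GHS_BANDS[category]
    if category == "Cat 2":
        return low <= mean_score <= high  # upper bound of Cat 2 is inclusive
    return low <= mean_score < high


if __name__ == "__main__":
    # Invented records for illustration only.
    records = [("Cat 2", 3.0), ("Cat 2", 4.0), ("Cat 2", 2.5),
               ("Cat 3", 1.9), ("Cat 3", 2.0),
               ("No Cat", 0.0), ("No Cat", 0.5), ("No Cat", 1.0)]
    for category, (n, m, sd) in summarize_by_category(records).items():
        print(category, n, round(m, 2), round(sd, 2), mean_within_band(category, m))
```

Applied to the means reported in the text (3.27, 1.92, and 0.36), `mean_within_band` returns `True` for Cat 2, Cat 3, and No Cat, respectively.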

Comparison of Prediction Results Produced by OECD TG 439 Test Methods for ChemSkin DB Substances
In the comparison of the results of the OECD TG 439 test methods with in vivo scores, EpiSkin™ showed 3 false positives and 1 false negative, SkinEthic™ RHE showed 6 false positives, EpiDerm™ showed 5 false positives and 1 false negative, LabCyte EPI-MODEL24 showed 6 false positives and 1 false negative, and KeraSkin™ showed 4 false positives (Figure 6). Interestingly, compared with other models, KeraSkin™ and SkinEthic™ showed viability values near either 0% or 100%, indicating an 'all-or-none' type of response, whereas the other models showed a number of borderline results around the cut-off. We calculated the mean ± SD viability of irritants and non-irritants. KeraSkin™ showed 10.57 ± 9.12% and 77.55 [...]

A correlation analysis of the in vivo score and in vitro tissue viability was performed. The correlation coefficients, in descending order of magnitude, were SkinEthic™ (−0.803), KeraSkin™ (−0.773), EpiSkin™ (absolute value of 0.760), LabCyte EPI-MODEL24 (−0.752), and EpiDerm™ (−0.749) (Table 4), suggesting that all VRMs and KeraSkin™ of OECD TG 439 produced tissue viability data highly correlated with in vivo scores.

Conclusions
In this study, we compiled the ChemSkin DB, which lists 100 reference chemicals for the development and evaluation of new skin irritation test methods, through the review of 317 candidate chemicals. The selection of correct reference chemicals is pivotal in the establishment and optimization of a new test method. Detailed information such as supporting literature, in vivo Draize scores, UN GHS/EU CLP classifications, and commercial sources was included, which could be invaluable for the developers of new skin irritation test methods. In addition, the test results produced by five methods approved in the current OECD Test No. 439 (2021) were included, compared in a table and a scatter plot, and analyzed for correlation with in vivo Draize scores. Overall, the current RhE methods of TG 439 could not distinguish Category 3 from other categories, but the strong correlations between viability and in vivo scores suggest an opportunity for further improvement. Collectively, we believe that our study will provide important insight into the future development of new in vitro skin irritation testing methods.