A New Benchmark to Determine What Healthy Western Skin Looks Like in Terms of Biodiversity Using Standardised Methodology

: A signiﬁcant loss of microbial biodiversity on the skin has been linked to an increased prevalence of skin problems in the western world. The primary objective of this study was to obtain a benchmark value for the microbial diversity found on healthy western skin, using the Chao1 index. This benchmark was used to update our 2017 skin health measuring mechanism in line with standardised methodology. It used 50 human participants from Graz in Austria and at a read depth of 6600 sequences, we found the average Chao1 diversity to be ~180, with upper and lower quartiles of ~208 and ~150, respectively. Previous work with a larger sample size was unsatisfactory to use as a benchmark because di ﬀ erent diversity indices and evaluation methodologies were used. The Medical University of Graz used the most recent version of the Chao1 index to obtain diversity results. Because of this study, we can transfer other benchmarks of skin microbiome diversity to the methodology used in this work from our 2017 study, such as “unhealthy western skin” and “caveman / perfect skin”. This could aid with the diagnostic assessment of susceptibility to cutaneous conditions or diseases and treatment. We also investigated the e ﬀ ect of sex and age, which are two known skin microbiome a ﬀ ecting factors. Although no statistical signiﬁcance is seen for sex-and age-related changes in diversity, there appear to be changes related to both. Our preliminary results (10 in each of the ﬁve age groups) show adults aged 28–37 have the highest average diversity, and adults aged 48–57 have the lowest average diversity. In future work, this could be improved by obtaining benchmark diversity values from a larger sample size for any age, sex, body site, and area of residence, to which subjects can be compared. These improvements could help to investigate the ultimate question regarding which environmental factors in the western world are the main cause of the huge rise in skin problems. This could lead to future restrictions of certain synthetic chemicals or products found to be particularly harmful to the skin.


Introduction
For the c. 300,000 years that Homo sapiens have inhabited the earth [1], our bodies have been host to trillions of microorganisms and recent estimates have suggested that, by cell count, we may be more microbe than human [2]. On the skin, the majority of these microorganisms are beneficial or harmless [3][4][5][6][7] and vital for defending us against infections and disease [8][9][10][11][12][13][14][15][16][17][18], immune system regulation, and lipid metabolism [19]. The skin is like any other ecosystem across nature, i.e., if healthy, and uses the existence of fragile "rare" species as a sign that biodiversity is high. This concept is backed up in other areas of ecology, where it is common knowledge that rare species become extinct first as ecosystems are disrupted, weakened, and lowered in biodiversity [72,73].

Study Conditions and Sampling Procedure
A total of 50 human subjects were recruited from the town of Graz in Austria, and split into the following five different age groups: There were five female and five male study subjects in each group, and all participants were Caucasian. There were 25 males and 25 females in total. All participants were recruited on the criteria they were without chronic skin diseases (no serious skin conditions such as acne, eczema, psoriasis, and skin cancer, among others) and had not used antibiotics within the last three months. All participants did not shower on the day of sampling and all of them had their skin sampled once.
The skin microbiome samples were taken from the volar forearm of the nondominant arm, with sterile BBLTM CultureSwabTM EZ (Becton Dickinson), which had been premoistened with sampling buffer (50 mM Tris (pH 7.2), 1 mM EDTA, 0.5% Tween20). The swab was placed on the forearm and the whole area was swabbed with pressure under rotation in three directions (horizontal, vertical, and diagonal). After sampling, the swabs were directly transferred into DNA-free 1.5 mL reaction tubes (Eppendorf AG) and stored at −80 • C until further processing. Swabbing the skin has widely been used as a suitable sampling method when analysing the skin microbiome [7]. Sampling buffer controls were taken to control the sterility of the used buffer. We also used negative controls, including PCR and sequencing controls, DNA extraction controls, and swab controls to exclude the effect of possible contamination from laboratory reagents.
The entire project was carried out in collaboration with the Medical University of Graz in Austria, and run independently, supervised by a member of staff at the Department of Internal Medicine. All study participants were recruited by the Medical University of Graz. Human skin samples were taken noninvasively and handled with approval by and in accordance with the Ethics Commission at the Medical University of Graz, who approved the study (Ethikkommission der Medizinischen Universität Graz, approval number EK-31-110e × 18/19). Approval allowed the use of human subjects and the following procedures in this methodology. All participants provided written informed consent prior to enrolment in this study. Samples were treated anonymously, and human material was not the focus point of this study. Microbial samples or data derived cannot be linked to a certain individual. The process of the experimentation was agreed upon by the Medical University of Graz, and the Austrian Centre of Biotechnology (ACIB), a not-for-profit research organisation through whom the funding application was made.

DNA Extraction and 16S rRNA Gene Amplification
The swabs were thawed on ice and transferred into Matrix E tubes (MP Biomedicals) with flamed tweezers. The DNA extraction was performed using a FastDNA Spin Kit (MP Biomedicals), according to the manufacturer's protocol with the following derivations: the bead beating step was done with MagNA Lyser (Roche Diagnostic GmbH) at 6400 rpm twice for 30 s, and the first centrifugation step was performed for 10 min. For every DNA extraction run, one extraction control (kit control) was processed with the samples to control the sterility of the kit system. The concentration of the DNA Cosmetics 2020, 7, 79 4 of 19 was determined using a Qubit ds-DNA HS Assay Kit (Invitrogen AG), according to the manufacturer's protocol. DNA concentrations of all samples were under the detection limit. The extracted DNA was stored at −20 • C until further downstream applications.
To amplify the 16S rRNA gene V4 region of healthy skin microbial communities, PCR (polymerase chain reaction) was performed by using 1-2 µL of the extracted DNA as a template. Forward (GTG YCA GCM GCC GCG GTA) and reverse (GGG ACT ACN VGG GTW TCT ATT) primers [74] were added to a final concentration of 200 nM. The following cycling conditions were applied: initial denaturation at 94 • C for 3 min, 35 cycles of denaturation at 94 • C for 45 s, annealing at 50 • C for 60 s, and elongation at 72 • C for 90 s, followed by a final elongation at 72 • C for 10 min.
To visualise the PCR products, agarose gel electrophoresis was performed with 1.5% agarose gel for 35 min, at 70 V. We applied Roti Gel Stain (Carl Roth GmbH + Co. KG) as DNA intercalate to dye the DNA, and we used a Fastruler Low Range (Thermo Fisher Scientific Inc.) ladder to assess the size of the PCR products. PCR products were stored at −20 • C until sequencing. Library construction and Illumina sequencing (MiSeq) was performed by the Core Facility Molecular Biology at the Center for Medical Research in Graz, Austria.

Data Analysis
The obtained 16S rRNA sequences were processed using QIIME 2 (Version 2018.11.0). The data were processed and all required files were formatted as described by the QIIME developers (https: //docs.qiime2.org/). First, the fastq data were imported and the sequences were filtered and denoised with DADA 2. After the quality check, the sequences were trimmed to a minimum length of 300 bp and a maximum of 350 bp. The taxonomy was assigned using 16S rRNA gene reference sequences of the Silva database (version 132).
From the OTUs, alpha and beta diversity analyses were performed using Calypso (http://cgenome. net/calypso/) [75]. All data needed to replicate the results in this paper is included in Supplementary Materials. For Calypso, the data was normalised using total sum normalisation (TSS) and transformed using square root transformation, no samples removed, no taxa removed, and no cut off for rare taxa. Chloroplast and cyanobacteria were removed. On healthy skin, cyanobacteria have been found to contaminate samples due to their similarity, sequence wise, to the DNA sequences of chloroplast which can be present after application and long-term use of herbal skin care products [76]. We did not ask the participants to complete questionnaires on their skin care regime, and therefore we do not know if this would have been applicable. The number of raw reads per sample can be seen in the raw sequence read files (fastq).
For alpha diversity analysis, samples with less than 6600 sequence reads were filtered out to calculate the diversity indices reliably. The removed samples included controls (buffer, kit, and PCR controls) and one sample of a female study subject. We analysed alpha diversity using both the Chao1 and Shannon diversity metrics. For an explanation of the Chao1 index, and how it is different from the Shannon index, see our first paper [34] or previous work [77,78]. We used the Shannon index, in this paper, to evaluate the "evenness" or spread of organisms distributed in a system. Chao1 was part of the "standardized" methodology we used for our skin health measuring mechanism. Filtering was performed, because Chao1 and Shannon analyses rarefied the number of reads to the lowest number of reads in a sample, and with too few sequences these analyses were not reliable. Additionally, sequence data from our previous study [79] was compared with the new dataset regarding their alpha and beta diversity.
Beta diversity between the samples, the overall structural similarity, and variation between the microbiomes using the variables under investigation, were also examined. For beta diversity analysis, no samples and no taxa were removed to visualize the differences between skin microbiome samples and controls. Control samples typically carry very few sequence reads. Rarefaction analyses were done to assess how well the data represented the diversity of the microbial communities, where a larger number of sequences or reads meant the potential of discovering more species or types of microbe was higher.
Within beta diversity analyses, principal coordinates analysis (PCoA) was performed, which is a statistical technique that uses clusters of samples which have similar biological communities to assess differences between them in a simple, graphical form. To perform PCoA analysis, we obtained an abundance matrix following sequencing. This contained microbial abundance information of each sample in a matrix, which was a way of describing data from multiple datasets containing multiple measurements in each. Then, it was converted into a distance or dissimilarity matrix that related each of the samples to the other samples in the groups being compared by the difference between microbial communities. This was, then, visualized as a two-dimensional representation of the results in PCoA plots [80]. The human mind finds it difficult to understand multivariate data with multiple measurements in each sample and to grasp it in more than three dimensions, therefore, this technique projects multidimensional results onto a low dimensional graphic. Each point represents a sample and the smaller the distance between points, the more similar the communities.
Redundancy analysis (RDA resulting in RDA plots) was also used to evaluate the difference in microbial community structure in response to the changing variables in this study. In a simple study with two variables, linear regression and correlation analysis would be performed to find the relationship between the two. RDA is the multivariate equivalent that is used when there are multiple variables [81,82] and is a type of "constrained ordination" that evaluates the extent to which the change in one group of variables is caused by the change in another group of variables [83,84]. The resulting RDA plots, in this study, were used to evaluate how well samples could be distinguished from one another in relation to the explanatory variables such as age and sex.
Here, the limit for statistical significance was set to a p-value of p = 0.001, due to multiple testing. This number was decided because genome-wide association studies could have many measurements per participant. The p-value threshold of p ≤ 0.05 is used for clinical studies where there are a low number of measurements per participant. If it were to be used in studies like the current one, some significant relationships found may be due to chance [85]. In addition, a review of classical and "Bayesian" methods for significance testing has shown that p-values in the region of 0.05 and 0.01 offered only moderate evidence of null hypothesis rejection (the null hypothesis states that there is no relationship between two variables being studied), but p-values around 0.005 and 0.001 showed "strong and very strong evidence", respectively [86]. It recommends a "revised standard for statistical evidence" by setting the significance threshold at 0.005 or 0.001. Figure 1 shows the benchmark value for skin microbiome diversity of healthy "western" skin using the Chao1 richness estimator, and the value can now be used to update our skin-health measuring mechanism. Using Chao1 and at 6600 sequence reads, the level to which every sample in this study was rarefied to the benchmark value obtained was~180 with upper and lower quartiles~208 and 150, respectively. We note this value changed depending on the rarefaction level and sequence reads. As the amount of sequences per sample increases, so does the richness of an ecosystem, as more species could potentially be discovered in a sample which contains more organisms. However, the rate of this increase slows with increasing sequences, as there is only a finite amount of organism types that can be found within a system. Simple rarefaction curves show this concept, such as the ones displayed in our 2017 work [34].

Benchmark for Healthy Western Skin
The alpha diversity of the human participants (n = 50, male = 25 and female = 25) in this study (Figures 1 and 2) is compared to our previous work, which used 32 western human participants from the town of Graz in Austria, all of whom were female, measured at T1 before product use started [79]. This meant they also resembled a normal western person's skin. We included the comparison to see how the diversity changed when a larger sample size, and an even spread of genders and ages was used.
Cosmetics 2020, 7, x FOR PEER REVIEW 6 of 19 Figure 1. The mean diversity benchmark value for "healthy western skin" as found in this study is ~180. Alpha diversity boxplots using Chao1 showing the new benchmark of diversity found in this study (red) and comparing it with the results from our previous work (blue). There are no significant changes in alpha diversity between the studies. The black lines indicate average diversity. Samples were rarefied to 6600, so plots were visually comparable to others in this study.
The alpha diversity of the human participants (n = 50, male = 25 and female = 25) in this study (Figures 1 and 2) is compared to our previous work, which used 32 western human participants from the town of Graz in Austria, all of whom were female, measured at T1 before product use started [79]. This meant they also resembled a normal western person's skin. We included the comparison to see how the diversity changed when a larger sample size, and an even spread of genders and ages was used. There are no significant changes in alpha diversity between the studies. The black lines indicate average diversity. Samples were rarefied to 6600 sequence reads.
Results using the Shannon diversity index are shown in Figure 2. As mentioned previously, we are most interested in Chao1 results because they give the best analysis of diversity, and although the Shannon index is inferior, we include it because it tells us how evenly the microbes within systems are spread between the different inhabiting types.
In both alpha diversity comparisons, the studies were similar, with average diversity slightly higher in the previous study. All samples from both studies were processed using identical procedures from sampling and sample processing to sequencing, including re-analysing the previous data with QIIME 2, which was formerly analyzed using QIIME.
Beta diversity analyses show that the microbial community structure is distinctly different on the skin of participants in this study as compared with those in our previous work. This is shown by the very separate groupings in Figure 3a,b. The PCoA plots are used to visualize similarities and differences between microbial communities. A reason for the stark differences could be "batch effect"-there is no specific factor or reason, which could be controlled, for this difference. The studies may not officially be suitable for comparison because their data are obtained from different Figure 1. The mean diversity benchmark value for "healthy western skin" as found in this study is 180. Alpha diversity boxplots using Chao1 showing the new benchmark of diversity found in this study (red) and comparing it with the results from our previous work (blue). There are no significant changes in alpha diversity between the studies. The black lines indicate average diversity. Samples were rarefied to 6600, so plots were visually comparable to others in this study.
Cosmetics 2020, 7, x FOR PEER REVIEW 6 of 19 Figure 1. The mean diversity benchmark value for "healthy western skin" as found in this study is ~180. Alpha diversity boxplots using Chao1 showing the new benchmark of diversity found in this study (red) and comparing it with the results from our previous work (blue). There are no significant changes in alpha diversity between the studies. The black lines indicate average diversity. Samples were rarefied to 6600, so plots were visually comparable to others in this study.
The alpha diversity of the human participants (n = 50, male = 25 and female = 25) in this study (Figures 1 and 2) is compared to our previous work, which used 32 western human participants from the town of Graz in Austria, all of whom were female, measured at T1 before product use started [79]. This meant they also resembled a normal western person's skin. We included the comparison to see how the diversity changed when a larger sample size, and an even spread of genders and ages was used. There are no significant changes in alpha diversity between the studies. The black lines indicate average diversity. Samples were rarefied to 6600 sequence reads.
Results using the Shannon diversity index are shown in Figure 2. As mentioned previously, we are most interested in Chao1 results because they give the best analysis of diversity, and although the Shannon index is inferior, we include it because it tells us how evenly the microbes within systems are spread between the different inhabiting types.
In both alpha diversity comparisons, the studies were similar, with average diversity slightly higher in the previous study. All samples from both studies were processed using identical procedures from sampling and sample processing to sequencing, including re-analysing the previous data with QIIME 2, which was formerly analyzed using QIIME.
Beta diversity analyses show that the microbial community structure is distinctly different on the skin of participants in this study as compared with those in our previous work. This is shown by the very separate groupings in Figure 3a,b. The PCoA plots are used to visualize similarities and differences between microbial communities. A reason for the stark differences could be "batch effect"-there is no specific factor or reason, which could be controlled, for this difference. The studies may not officially be suitable for comparison because their data are obtained from different There are no significant changes in alpha diversity between the studies. The black lines indicate average diversity. Samples were rarefied to 6600 sequence reads.
Results using the Shannon diversity index are shown in Figure 2. As mentioned previously, we are most interested in Chao1 results because they give the best analysis of diversity, and although the Shannon index is inferior, we include it because it tells us how evenly the microbes within systems are spread between the different inhabiting types.
In both alpha diversity comparisons, the studies were similar, with average diversity slightly higher in the previous study. All samples from both studies were processed using identical procedures from sampling and sample processing to sequencing, including re-analysing the previous data with QIIME 2, which was formerly analyzed using QIIME.
Beta diversity analyses show that the microbial community structure is distinctly different on the skin of participants in this study as compared with those in our previous work. This is shown by the very separate groupings in Figure 3a,b. The PCoA plots are used to visualize similarities and differences between microbial communities. A reason for the stark differences could be "batch effect"-there is no specific factor or reason, which could be controlled, for this difference. The studies may not officially be suitable for comparison because their data are obtained from different sequencing runs, however, the discrepancies from run-to-run are not likely to be significantly large, especially as we use the data here for average diversity using a significant number of participants. It could be different if we were looking at the results from a single sample in each run. sequencing runs, however, the discrepancies from run-to-run are not likely to be significantly large, especially as we use the data here for average diversity using a significant number of participants. It could be different if we were looking at the results from a single sample in each run. Bray-Curtis comparing the current study (red) with our previous study (blue). Samples were rarefied to 6600 sequence reads.

Sex
There were no statistically significant differences in alpha diversity comparing sex (Figure 4a,b), however, the average diversity was slightly higher in males using Shannon diversity, but slightly lower using Chao1. Among the 50 human participants, 25 were male and 25 were female. There are no significant changes in Alpha diversity between the sexes. The black lines indicate average diversity. Samples were rarefied to 6600 reads, so plots were comparable with others in this study. PCoA plots were created for (c) Chao and (d) Bray-Curtis of the female and male participants. There is no significant grouping between the sexes, but the controls and the product control ("no"-grey) are grouped separately.
We further explored the relationship between the microbiome and sex of participants by displaying PCoA plots, which are also used in Sections 3.1, 3.3, and 3.4, and an RDA plot. No sexdependent clustering (Figure 4c,d) in beta diversity could be observed, but again the controls were grouped separately. Redundancy analysis (RDA plots) showed significant differences between female and male study subjects ( Figure 5). The axes of the RDA plots show a small percentage, indicating that the variance of the data is not substantial.

Sex
There were no statistically significant differences in alpha diversity comparing sex (Figure 4a,b), however, the average diversity was slightly higher in males using Shannon diversity, but slightly lower using Chao1. Among the 50 human participants, 25 were male and 25 were female.
Cosmetics 2020, 7, x FOR PEER REVIEW 7 of 19 sequencing runs, however, the discrepancies from run-to-run are not likely to be significantly large, especially as we use the data here for average diversity using a significant number of participants. It could be different if we were looking at the results from a single sample in each run. Bray-Curtis comparing the current study (red) with our previous study (blue). Samples were rarefied to 6600 sequence reads.

Sex
There were no statistically significant differences in alpha diversity comparing sex (Figure 4a,b), however, the average diversity was slightly higher in males using Shannon diversity, but slightly lower using Chao1. Among the 50 human participants, 25 were male and 25 were female. There are no significant changes in Alpha diversity between the sexes. The black lines indicate average diversity. Samples were rarefied to 6600 reads, so plots were comparable with others in this study. PCoA plots were created for (c) Chao and (d) Bray-Curtis of the female and male participants. There is no significant grouping between the sexes, but the controls and the product control ("no"-grey) are grouped separately.
We further explored the relationship between the microbiome and sex of participants by displaying PCoA plots, which are also used in Sections 3.1, 3.3, and 3.4, and an RDA plot. No sexdependent clustering (Figure 4c,d) in beta diversity could be observed, but again the controls were grouped separately. Redundancy analysis (RDA plots) showed significant differences between female and male study subjects ( Figure 5). The axes of the RDA plots show a small percentage, indicating that the variance of the data is not substantial. Samples were rarefied to 6600 reads, so plots were comparable with others in this study. PCoA plots were created for (c) Chao and (d) Bray-Curtis of the female and male participants. There is no significant grouping between the sexes, but the controls and the product control ("no"-grey) are grouped separately.
We further explored the relationship between the microbiome and sex of participants by displaying PCoA plots, which are also used in Section 3.1, Section 3.3, and Section 3.4, and an RDA plot. No sex-dependent clustering (Figure 4c,d) in beta diversity could be observed, but again the controls were grouped separately. Redundancy analysis (RDA plots) showed significant differences between female and male study subjects ( Figure 5). The axes of the RDA plots show a small percentage, indicating that the variance of the data is not substantial.

Age
We split the alpha and beta diversity results into the following age groups: Group 1 (18-27 years), Group 2 (28-37 years), Group 3 (38-47 years), Group 4 (48-57 years), and Group 5 (58-70 years). There were ten people, five males and five females, in each group, meaning these results are a preliminary evaluation. In alpha diversity, a weak trend of decreasing alpha diversity with increasing age is seen with Chao1, but the observation is not statistically significant (Figure 6a). The Shannon index of the age groups shows less of this trend (Figure 6b). There are no significant differences in diversity between ages, but the comparative tests performed are not clear. The highest average diversity appears to be in adults aged 28 to 37, in both the Chao1 and Shannon indices, while those aged 48 to 57 display the lowest average in both. However, these need to be investigated with a larger sample size in future work, in order to fully validate these preliminary findings.

Age
We split the alpha and beta diversity results into the following age groups: Group 1 (18-27 years), Group 2 (28-37 years), Group 3 (38-47 years), Group 4 (48-57 years), and Group 5 (58-70 years). There were ten people, five males and five females, in each group, meaning these results are a preliminary evaluation.
In alpha diversity, a weak trend of decreasing alpha diversity with increasing age is seen with Chao1, but the observation is not statistically significant (Figure 6a). The Shannon index of the age groups shows less of this trend (Figure 6b). There are no significant differences in diversity between ages, but the comparative tests performed are not clear. The highest average diversity appears to be in adults aged 28 to 37, in both the Chao1 and Shannon indices, while those aged 48 to 57 display the lowest average in both. However, these need to be investigated with a larger sample size in future work, in order to fully validate these preliminary findings. where no significant differences were observed. Samples were rarefied to 6600 reads, so plots were

Age
We split the alpha and beta diversity results into the following age groups: Group 1 (18-27 years), Group 2 (28-37 years), Group 3 (38-47 years), Group 4 (48-57 years), and Group 5 (58-70 years). There were ten people, five males and five females, in each group, meaning these results are a preliminary evaluation.
In alpha diversity, a weak trend of decreasing alpha diversity with increasing age is seen with Chao1, but the observation is not statistically significant (Figure 6a). The Shannon index of the age groups shows less of this trend (Figure 6b). There are no significant differences in diversity between ages, but the comparative tests performed are not clear. The highest average diversity appears to be in adults aged 28 to 37, in both the Chao1 and Shannon indices, while those aged 48 to 57 display the lowest average in both. However, these need to be investigated with a larger sample size in future work, in order to fully validate these preliminary findings. where no significant differences were observed. Samples were rarefied to 6600 reads, so plots were comparable with others in this study. The black lines indicate average diversity. Beta diversity PCoA plots were created for (c) Chao1 and (d) Bray-Curtis of the different age groups, as explained above. There is no significant grouping between the age groups, but the controls ("co", red) and the product control ("pro", black) are grouped separately. where no significant differences were observed. Samples were rarefied to 6600 reads, so plots were comparable with others in this study. The black lines indicate average diversity. Beta diversity PCoA plots were created for (c) Chao1 and (d) Bray-Curtis of the different age groups, as explained above. There is no significant grouping between the age groups, but the controls ("co", red) and the product control ("pro", black) are grouped separately.
In beta diversity (Figure 6c,d), the PCoA plots showed no significant clustering of the age groups. Redundancy analysis (RDA plots) showed no significant differences between the age groups ( Figure 7), but Group 5 seemed to be separate from the other groups. The axes of the RDA plots show a small percentage, indicating that the variance of the data is not substantial. In beta diversity (Figure 6c,d), the PCoA plots showed no significant clustering of the age groups. Redundancy analysis (RDA plots) showed no significant differences between the age groups ( Figure 7), but Group 5 seemed to be separate from the other groups. The axes of the RDA plots show a small percentage, indicating that the variance of the data is not substantial.

Control Testing
Analysing differences in microbial communities using the PCoA plots revealed that the samples (taken from the 50 human participants) were grouped differently from the controls (buffer, kit, and PCR controls), which indicated that there was no contamination of the samples, and the sampling and manufacturing process was sterile ( Figure A1a,b in Appendix A). Skin microbiome samples, which group near to the control samples in beta diversity plots, carried a low number of reads like the control samples. Rarefaction analysis showed that a sufficient number of sequences were used to represent the diversity of the healthy skin microbial communities ( Figure A1 in Appendix A).

Why the Benchmark Is Important
Our previous 2017 work collated skin microbiome diversity using the Chao1 index of humans with differing levels of skin health on one graph, to see if there was a trend [34]. This is displayed in Figure 8. We noticed that the unhealthier the skin was, the lower the diversity. For example, "unhealthy western skin" was more decreased in diversity than that found on western humans with "healthy skin", devoid of any skin problems. "Caveman/perfect skin" was the diversity of tribes people with negligible or zero contact with western civilization who displayed "unprecedented" levels of diversity and an absence of western skin problems such as acne and eczema [34,42]. We proposed biodiversity as a reliable measure of skin health, and it gave us a sliding scale of different levels of skin health, against which we could compare human subjects' skin microbiome diversity, to see whether they were nearer "healthy" skin at a higher biodiversity, or diseased skin at a lower level. The beauty of this was that it meant we could, for the first time, begin to test what was contributing to the large increase in skin problems in the western world.

Control Testing
Analysing differences in microbial communities using the PCoA plots revealed that the samples (taken from the 50 human participants) were grouped differently from the controls (buffer, kit, and PCR controls), which indicated that there was no contamination of the samples, and the sampling and manufacturing process was sterile ( Figure A1a,b in Appendix A). Skin microbiome samples, which group near to the control samples in beta diversity plots, carried a low number of reads like the control samples. Rarefaction analysis showed that a sufficient number of sequences were used to represent the diversity of the healthy skin microbial communities ( Figure A1 in Appendix A).

Why the Benchmark Is Important
Our previous 2017 work collated skin microbiome diversity using the Chao1 index of humans with differing levels of skin health on one graph, to see if there was a trend [34]. This is displayed in Figure 8. We noticed that the unhealthier the skin was, the lower the diversity. For example, "unhealthy western skin" was more decreased in diversity than that found on western humans with "healthy skin", devoid of any skin problems. "Caveman/perfect skin" was the diversity of tribes people with negligible or zero contact with western civilization who displayed "unprecedented" levels of diversity and an absence of western skin problems such as acne and eczema [34,42]. We proposed biodiversity as a reliable measure of skin health, and it gave us a sliding scale of different levels of skin health, against which we could compare human subjects' skin microbiome diversity, to see whether they were nearer "healthy" skin at a higher biodiversity, or diseased skin at a lower level. The beauty of this was that it meant we could, for the first time, begin to test what was contributing to the large increase in skin problems in the western world.
The values, in the graph from our 2017 work (Figure 8), used slightly different Chao1 diversity measuring methodology, therefore, the primary objective of this paper was to move the benchmarks over to the new methodology used by the Medical University of Graz. To do this, in this paper, we obtained a new "healthy western skin" benchmark. Using this, and the ratios or multiplication factors from our previous mechanism, we can transfer across the other benchmarks. Figure 9 explains how this would work. For example, our 2017 work showed unhealthy/diseased western skin diversity to be 4.2 times lower than "healthy western skin", therefore, we would divide our current benchmark found in this study by 4.2 to get a benchmark value for "unhealthy/diseased western skin" using the current methodology. As the benchmark for "healthy western skin" found in this study was~180 at a read depth of 6600 sequences, the value for "unhealthy/diseased western ski" would be roughly 180/4.2 = 42.9. Table 1 shows the multiplication factors. This method would be done for all the benchmarks displayed in our previous paper, to get the sliding scale of how diversity changes for different states of health on the skin. The values, in the graph from our 2017 work (Figure 8), used slightly different Chao1 diversity measuring methodology, therefore, the primary objective of this paper was to move the benchmarks over to the new methodology used by the Medical University of Graz. To do this, in this paper, we obtained a new "healthy western skin" benchmark. Using this, and the ratios or multiplication factors from our previous mechanism, we can transfer across the other benchmarks. Figure 9 explains how this would work. For example, our 2017 work showed unhealthy/diseased western skin diversity to be 4.2 times lower than "healthy western skin", therefore, we would divide our current benchmark found in this study by 4.2 to get a benchmark value for "unhealthy/diseased western skin" using the current methodology. As the benchmark for "healthy western skin" found in this study was ~180 at a read depth of 6600 sequences, the value for "unhealthy/diseased western ski"' would be roughly 180/4.2 = 42.9. Table 1 shows the multiplication factors. This method would be done for all the benchmarks displayed in our previous paper, to get the sliding scale of how diversity changes for different states of health on the skin. Figure 9. Graph showing how to obtain a multiplication factor between "healthy" and "unhealthy" western skin diversity, to be used in transferring the remaining benchmarks from our 2017 work to the new skin health measuring mechanism. Table 1. Table showing the multiplication factors needed to get from the benchmark for "healthy western skin" to the other benchmarks listed in our 2017 work.

Skin Diversity Benchmark Multiplication Factor
Unhealthy/diseased western 0.24 (1/4.2) Healthy western - Figure 8. Graph showing the combined benchmarks taken from our previous work [34]. This graph formed the basis for the skin health measuring mechanism, with benchmark values of diversity for a range of different states of health on the skin. For an explanation of each please see our previous work.

Figure 8.
Graph showing the combined benchmarks taken from our previous work [34]. This graph formed the basis for the skin health measuring mechanism, with benchmark values of diversity for a range of different states of health on the skin. For an explanation of each please see our previous work.
The values, in the graph from our 2017 work (Figure 8), used slightly different Chao1 diversity measuring methodology, therefore, the primary objective of this paper was to move the benchmarks over to the new methodology used by the Medical University of Graz. To do this, in this paper, we obtained a new "healthy western skin" benchmark. Using this, and the ratios or multiplication factors from our previous mechanism, we can transfer across the other benchmarks. Figure 9 explains how this would work. For example, our 2017 work showed unhealthy/diseased western skin diversity to be 4.2 times lower than "healthy western skin", therefore, we would divide our current benchmark found in this study by 4.2 to get a benchmark value for "unhealthy/diseased western skin" using the current methodology. As the benchmark for "healthy western skin" found in this study was ~180 at a read depth of 6600 sequences, the value for "unhealthy/diseased western ski"' would be roughly 180/4.2 = 42.9. Table 1 shows the multiplication factors. This method would be done for all the benchmarks displayed in our previous paper, to get the sliding scale of how diversity changes for different states of health on the skin. Graph showing how to obtain a multiplication factor between "healthy" and "unhealthy" western skin diversity, to be used in transferring the remaining benchmarks from our 2017 work to the new skin health measuring mechanism. . Graph showing how to obtain a multiplication factor between "healthy" and "unhealthy" western skin diversity, to be used in transferring the remaining benchmarks from our 2017 work to the new skin health measuring mechanism. For the purposes of this discussion, we created an example set of boxplots transferring "caveman/perfect" and "rural healthy", as well as "unhealthy/diseased western" benchmarks of diversity from our previous work ( Figure 10). This is how we would transfer the data across in future work, along with other benchmarks. The multiplication factors used to obtain these example plots from the new benchmark found in this study are listed in Table 1.
For the purposes of this discussion, we created an example set of boxplots transferring "caveman/perfect" and "rural healthy", as well as "unhealthy/diseased western" benchmarks of diversity from our previous work ( Figure 10). This is how we would transfer the data across in future work, along with other benchmarks. The multiplication factors used to obtain these example plots from the new benchmark found in this study are listed in Table 1. Figure 10. Boxplots showing the transferred Chao1 diversity benchmarks for "caveman/perfect skin" (blue), "rural healthy" (orange), and "unhealthy/diseased western" (yellow). The "healthy western skin" (grey) is taken directly from the data found in this study in Section 3.1, and the other benchmarks were created using the multiplication factors in Table 1. As all data in this study was rarefied to 6600 sequences, this is the read depth these benchmarks would apply at; at higher numbers of sequences, the diversity would change. This is a rough example of how we would transfer the benchmarks across, and future work will do this accurately, considering other factors.
We note that the benchmark for "healthy western skin" can change depending on other factors, including age and sex. It is for this reason we take a preliminary look at these in the following sections, so we can best improve our mechanism in the future.

The Effect of Sex
This study found no significant differences in alpha diversity between the sexes, but men carried slightly lower average biodiversity using Chao1. This finding was echoed in previous work where a decreased diversity for men was evident on almost all body sites [61]. This difference was statistically significant across far more body sites on humans from rural communities than urban ones, implying environmental factors in the western world could be a cause of the smaller effect of sex on diversity. Further work found women's backs were higher in diversity using Chao1 than men, but similar to our study, it was not significant [87].
Although the focus was not on community structure, our RDA findings showed a significant difference in bacterial communities between men and women, which backed up a previous study that also found a significant difference in alpha diversity between men and women for one body site, the palm of the hand [62]. However, it used phylogenetic diversity, which may make comparisons difficult. A final study showed that the total number of bacteria on the feet of women was much higher than on men. However, as alpha diversity analysis methods take into account more factors than just the number of bacteria present, this is not fully comparable [88].
The fact that men produce more sebum could influence results, especially on high-sebum areas of the skin. On average, sebaceous areas of the skin are less diverse than dry or moist [61,89,90], which could be due to sebum production increasing skin acidity [60]. Damaged skin promotes resident bacterial flora's dispersal from the skin and leads to pathogenic bacterial and fungal growth [5,6,21]. This could explain why our results did not show significant differences in diversity between the Figure 10. Boxplots showing the transferred Chao1 diversity benchmarks for "caveman/perfect skin" (blue), "rural healthy" (orange), and "unhealthy/diseased western" (yellow). The "healthy western skin" (grey) is taken directly from the data found in this study in Section 3.1, and the other benchmarks were created using the multiplication factors in Table 1. As all data in this study was rarefied to 6600 sequences, this is the read depth these benchmarks would apply at; at higher numbers of sequences, the diversity would change. This is a rough example of how we would transfer the benchmarks across, and future work will do this accurately, considering other factors.
We note that the benchmark for "healthy western skin" can change depending on other factors, including age and sex. It is for this reason we take a preliminary look at these in the following sections, so we can best improve our mechanism in the future.

The Effect of Sex
This study found no significant differences in alpha diversity between the sexes, but men carried slightly lower average biodiversity using Chao1. This finding was echoed in previous work where a decreased diversity for men was evident on almost all body sites [61]. This difference was statistically significant across far more body sites on humans from rural communities than urban ones, implying environmental factors in the western world could be a cause of the smaller effect of sex on diversity. Further work found women's backs were higher in diversity using Chao1 than men, but similar to our study, it was not significant [87].
Although the focus was not on community structure, our RDA findings showed a significant difference in bacterial communities between men and women, which backed up a previous study that also found a significant difference in alpha diversity between men and women for one body site, the palm of the hand [62]. However, it used phylogenetic diversity, which may make comparisons difficult. A final study showed that the total number of bacteria on the feet of women was much higher than on men. However, as alpha diversity analysis methods take into account more factors than just the number of bacteria present, this is not fully comparable [88].
The fact that men produce more sebum could influence results, especially on high-sebum areas of the skin. On average, sebaceous areas of the skin are less diverse than dry or moist [61,89,90], which could be due to sebum production increasing skin acidity [60]. Damaged skin promotes resident bacterial flora's dispersal from the skin and leads to pathogenic bacterial and fungal growth [5,6,21]. This could explain why our results did not show significant differences in diversity between the sexes, because we used the "dry" volar forearm. Differences in facial cosmetics application, far less pronounced on the forearm than the face, could also have diminished the diversity difference. The generally more acidic skin of men [91,92], associated with lower ecosystem diversity [93], along with amount of cosmetic application and hormone production [92,94], wash frequency [62], and perspiration rate could be reasons for differences between the sexes [60,95].

The Effect of Age
Our results show humans between the ages of 28 and 37 appear to have the highest average alpha diversity in both Chao1 and Shannon, and humans aged 48-57 have the lowest, although the results are not statistically significant. Our study uses smaller and more regular age groupings as compared with previous work, however, the sample size of 10 in each group means our work is preliminary. Future work is needed to give more accurate insight into how diversity changes with age.
Previous work has displayed mixed results, which could be because age-related changes varied across body sites, between subjects, and different diversity evaluating methodologies [61,96]. A study with 495 human subjects observed the highest diversity on humans aged 20-29 but evaluated the diversity differently, using relative abundance of "important" Corynebacterium OTUs, rendering comparisons obsolete [96]. Two studies found higher alpha diversity in older than younger skin across multiple body sites including the volar forearm. Both used a "young" (23-37 and 21-50 years) and "old" (60-76 and 51-90 years) group of women [63] and adults [97], respectively. Their groupings, however, took in a wider range of ages which could have impacted the results.
In contrast, two studies with similar age range groups were in more agreement with our results. The first found greater diversity on the volar forearm of younger adults aged 25-35 as compared with those aged 50-60 [61]. The second found females aged 25-35 to have higher diversity than those aged 50-63 [98].
We are still unsure why age-related diversity changes occur. Older humans are more susceptible to the onset of inflammatory disorders [99], which fits in with our study where older people tended to have less diverse skin, but it is not known why their microbiomes maybe easier to colonise for opportunistic taxa. With aging, the skin's ability to produce sebum and retain moisture changes [100], although short term changes in transepidermal water loss (TEWL) and diversity appear not to be linked [101].

Limitations and Future Work
We note this study took samples from the volar forearm of participants. Previous work has shown that body site strongly influenced skin microbial diversity and community structure, i.e., dry areas such as the volar forearm, the body site found to have the highest diversity, displayed higher diversity than sebaceous (such as forehead and cheeks) and moist (inguinal crease and popliteal fossa) areas [22,24,30,58,61,63,89,90,96,102,103].
Benchmark diversity values for unhealthy" or "diseased" western skin, and "perfect" or "caveman" skin on uncontacted tribes people [42] will be accurately transferred from our 2017 paper in future work, using the multiplication factors found between their value and that of healthy western skin. We want our mechanism to be able to accurately evaluate the skin health of anyone, taking into account age, sex, body site, and area of residence [61]. Diversity benchmarks for separate skin diseases should also be included, as they have a far greater effect than factors such as age and sex [87].
The inclusion of more detailed bioinformatic analyses focusing on the specific types of microbe present, such as an LEfSe, which was beyond the scope or remit of this study for the reasons explained in the introduction, could be an interesting future addition. This may help move us closer to being able to use microbiome community structure as a biomarker for skin health and disease. However, our current understanding of this area is limited, and therefore it is not included here.
We note that there have been known annotation errors in microbial databases, such as the Silva database used in this study, which have been shown to affect results. However, clustering methods were found to affect diversity more than databases [104]. Despite the Silva database being worse than other databases at determining microbes at the species level due to errors [104], the focus of this study was diversity, primarily using the Chao1 index, which did not rely on the correct identification of microbes but the amount of singletons and doubletons [105]. Therefore, this particular problem applies to microbiome composition studies. A bigger concern from this study's perspective is distinguishing between true "rare" species and sequencing artifacts. Quality filtering techniques were used in this study for this purpose; they aimed to remove unique sequences that were not "rare" species but errors. This process has a large influence on low abundance OTUs [106]. These methods are not perfect but are being improved.
A larger sample group should be recruited, so that age-related changes in diversity can be investigated independently in men and women. Due to the preliminary nature of these results, in a follow-up study we will conduct an in-depth evaluation of how age affects biodiversity. This will include full correlation analysis using Spearman rho and Pearson's correlation to determine whether a relationship exists, and the strength of it. In addition, we will explore how different p-value thresholds affect results and use corrections for multiple testing such as the Bonferroni method which aims to mitigate against the effect of false positive results. Previous work has shown that different studies should have different p-value thresholds [107], and that one should not "believe that an association or effect is absent just because it was not statistically significant" [108].

Conclusions
In this study, we determined a benchmark value for the microbial diversity, using the Chao1 index, found on healthy western human skin. At a read depth of 6600 sequences, we found average Chao1 diversity to be~180. This allows us, in future work, to transfer all benchmark levels of diversity from our 2017 paper, and to update our skin health measuring mechanism. This is a step towards helping us to begin to evaluate which environmental factors, such as synthetic chemicals in modern cosmetics, are contributing to the large rise in skin problems in the western world. Men carried a slightly lower average biodiversity than women, and younger adults appeared to have a higher average diversity than older adults, but these results were not statistically significant. A larger sample group should be used in the future to investigate these preliminary findings in more detail, and to determine whether age-related diversification differs between men and women. Moreover, to aid dermatological assessments of susceptibility to infections or disease, and associated treatments, diversity benchmarks split into age, sex, area of residence, body site, and skin diseases should be added to the skin health measuring mechanism.