Establishing Normative Values to Determine the Prevalence of Biochemical Hyperandrogenism in Premenopausal Women of Different Ethnicities from Eastern Siberia

Androgen assessment is a key element for diagnosing polycystic ovary syndrome (PCOS), and defining a “normal” level of circulating androgens is critical for epidemiological studies. We determined the upper normal limits (UNLs) for androgens in a population-based group of premenopausal “healthy control” women, overall and by ethnicity (Caucasian and Asian), in the cross-sectional Eastern Siberia PCOS Epidemiology and Phenotype (ESPEP) Study (ClinicalTrials.gov ID: NCT05194384) conducted in 2016–2019. Overall, we identified a “healthy control” group consisting of 143 healthy premenopausal women without menstrual dysfunction, hirsutism, polycystic ovaries, or medical disorders. We analyzed serum total testosterone (TT) by using liquid chromatography with tandem mass spectrometry (LC-MS/MS), and DHEAS, sex-hormone-binding globulin (SHBG), TSH, prolactin, and 17-hydroxyprogesterone (17OHP) were assessed with an enzyme-linked immunosorbent assay (ELISA). The UNLs for the entire population for the TT, free androgen index (FAI), and DHEAS were determined as the 98th percentiles in healthy controls as follows: 67.3 (95% confidence interval (CI): 48.1, 76.5) ng/dl, 5.4 (3.5, 14.0), and 355 (289, 371) μg/dl, respectively. The study results demonstrated that the UNLs for TT and FAI varied by ethnicity, whereas the DHEAS UNLs were comparable in the ethnicities studied.


Introduction
Hyperandrogenism is a common endocrine disorder in premenopausal women, and the assessment of androgen levels is one of the essential approaches for diagnosing polycystic ovary syndrome (PCOS) [1][2][3][4]. In epidemiological studies, upper normal limits (UNLs) or cut-off values for androgens can be determined by using two different approaches: by cluster analysis in a large population-based unselected cohort or by using upper (95th-98th) percentiles in a well-characterized cohort of healthy women from the same population and recruited in a manner similar to that for study subjects (i.e., "healthy controls"). Unfortunately, many investigators use 'controls' who are often not well phenotyped; nor are they from the same population or selected in a manner similar to that of study subjects. Instead, opportunistic populations are used. Additionally, investigators have used different definitions of what is "normal" regarding androgen concentrations, and the best way to determine the normal ranges for androgens remains a subject of intense discussion [5]. For population studies, it is also important to take into account that a normative range for androgens can differ significantly depending on the ethnicities of participants. Therefore, the identification of subjects with PCOS in epidemiologic studies of a population is possible only when specific cut-offs for androgens are established in the same population that is recruited in a manner similar to that of study subjects [6].
Previously published data suggested that the phenotype and prevalence of symptoms related to PCOS may vary between women of Caucasian and Asian origin [7]. Nevertheless, to date, there is a limited number of epidemiological studies that have estimated the prevalence of PCOS by using a uniform methodology in different ethnic groups arising from the same population [8]. Eastern Siberia (in the Russian Federation) is a unique region in which Caucasians and Asians have been living together in similar geographic and socioeconomic conditions since the 17th century. Ethnicity-dependent normative ranges for androgens will help to estimate the prevalence of PCOS in this population.
The objectives of our study were to determine the UNLs for androgens in a healthy control group of premenopausal women from Eastern Siberia, overall and by ethnicity.
In this study, we tested the following null hypothesis: the normative values for biochemical hyperandrogenism (HA) do not differ among Caucasian, Asian, and Mixed-ethnicity (Mixed) premenopausal women from the same population (i.e., Eastern Siberia).

Materials and Methods
Study design. A cross-sectional population-based prospective study.
Study population. Study subjects were recruited during the population-based prospective Eastern Siberia PCOS Epidemiology and Phenotype (ESPEP) Study (ClinicalTrials.gov ID: NCT05194384) [9], which was conducted in two major areas of Eastern Siberia (Irkutsk Region and the Buryat Republic, Russian Federation) from March 2016 to December 2019. ESPEP was a multicenter, institution-based study, which included 1134 premenopausal women who were undergoing an obligatory early-employment medical assessment. All centers represented major regional employers.
The inclusion criteria for the present cross-sectional study were as follows: female subjects aged ≥18 and <45 years who provided written informed consent, were willing to comply with all study procedures, and would be available for the duration of the study. Exclusion criteria were: current pregnancy or lactation, history of hysterectomy and/or bilateral oophorectomy, endometrial ablation, and/or uterine artery embolization, anything that would place the individual at increased risk or preclude the individual's full compliance with or completion of the study, unwillingness to participate or difficulty understanding the consent processes or the study objectives and requirements, and the use of significant medications at the time of the study or within the previous three months, including: oral contraceptive pills (OCPs), vaginal rings, transdermal patches, levonorgestrel-releasing intrauterine devices (LNG-IUDs), transdermal implants, injectable contraceptives, hormone replacement therapy (HRT), mineralocorticoids, glucocorticoids, and insulin sensitizers, including metformin and thiazolidinediones.
Study Protocol. As previously described [9], subjects were evaluated consecutively by trained personnel with a questionnaire, anthropometry, vital signs, and gynecological exam. Anthropometric measurements included height, weight, and waist circumference (WC). The body mass index (BMI) was calculated as weight (kg)/height² (m²). Hirsutism was assessed with the modified Ferriman-Gallwey (mFG) visual hirsutism score [10]. Acne was assessed by using a standard acne lesion assessment [11]. The Ludwig Scale was used to assess alopecia [12]. Pelvic ultrasound (U/S) was performed by experienced specialists who were trained to conduct the U/S scans uniformly, with intra-/inter-observer coefficients of variation of less than 6%. We used a Mindray M7 system (Mindray Bio-Medical Electronics Co., Shenzhen, China) with a transvaginal probe (5.0-8.0 MHz) for sexually active subjects and a transabdominal probe (2.5-5.0 MHz) for women who had never been sexually active. Ovarian volume was determined by the formula for a prolate ellipsoid (length × width × height × 0.523).
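The two formulas in this paragraph can be written out as a minimal sketch. The function names and the example measurements below are hypothetical and serve only to illustrate the arithmetic; they are not study data.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by height squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def ovarian_volume_cm3(length_cm: float, width_cm: float, height_cm: float) -> float:
    """Prolate-ellipsoid volume: length x width x height x 0.523 (0.523 is roughly pi/6)."""
    return length_cm * width_cm * height_cm * 0.523


# Hypothetical measurements, for illustration only.
example_bmi = bmi(weight_kg=60.0, height_m=1.65)
example_volume = ovarian_volume_cm3(3.0, 2.0, 1.5)
```

With these made-up inputs, the BMI falls inside the 18-30 kg/m² window and the ovarian volume is below the 10 cm³ threshold used in the healthy-control definition.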
Definition of 'Healthy Controls'. The definition of healthy controls was agreed upon by the ESPEP Steering Committee and included the following: a history of regular predictable menstrual cycles of 21-35 days in length, an mFG score of 2 or less, an antral follicle count (AFC; i.e., number of follicles that were 2-9 mm in diameter) of less than 12, an ovarian volume of less than 10 cm³, a blood pressure of less than 130/85 mmHg without medical treatment for hypertension, and a fasting plasma glucose level of less than 6.1 mmol/l without medical treatment for dysglycemia. We excluded from the healthy control group women with significant acne (moderate to severe) or alopecia (based on the subject's complaints), BMI <18 or ≥30 kg/m², chronic or major illness (cancer, diabetes mellitus, impaired glucose tolerance, impaired fasting glucose, cardiovascular disease, hypertension, etc.), premature ovarian failure (by history or elevated FSH), treated or untreated hyperprolactinemia (based on history or an increased prolactin level of >727 mIU/ml), untreated thyroid disorder (based on history or a TSH level of >4 mIU/ml), and 21-hydroxylase-deficient non-classic adrenal hyperplasia (based on an increased 17-OHP of >6.9 nmol/l in the early follicular phase). The intake of any steroid hormones, including contraceptives, was also an exclusion criterion.
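The numeric thresholds in this definition can be collected into a single screening predicate, sketched below. This is an illustrative reading of the text, not the study's actual screening procedure: all field names are hypothetical, blood pressure "less than 130/85 mmHg" is assumed to mean that both the systolic and diastolic values are below their limits, and the history-based exclusions (acne, alopecia, major illness, elevated FSH, steroid use, and so on) are collapsed into one flag.

```python
from dataclasses import dataclass, replace


@dataclass
class Subject:
    # Hypothetical field names; units follow the text.
    regular_cycles: bool           # regular, predictable menstrual cycles
    cycle_length_days: int
    mfg_score: int                 # modified Ferriman-Gallwey score
    afc: int                       # antral follicle count (2-9 mm follicles)
    ovarian_volume_cm3: float
    systolic_mmhg: int
    diastolic_mmhg: int
    fasting_glucose_mmol_l: float
    bmi: float
    prolactin: float               # same units as the 727 cut-off in the text
    tsh: float                     # same units as the 4 cut-off in the text
    ohp17_nmol_l: float            # 17-hydroxyprogesterone
    other_exclusion: bool = False  # acne, alopecia, major illness, steroid use, etc.


def is_healthy_control(s: Subject) -> bool:
    """Return True only if every numeric criterion from the text is met."""
    return (
        s.regular_cycles
        and 21 <= s.cycle_length_days <= 35
        and s.mfg_score <= 2
        and s.afc < 12
        and s.ovarian_volume_cm3 < 10
        and s.systolic_mmhg < 130      # assumption: both BP components must pass
        and s.diastolic_mmhg < 85
        and s.fasting_glucose_mmol_l < 6.1
        and 18 <= s.bmi < 30
        and s.prolactin <= 727
        and s.tsh <= 4
        and s.ohp17_nmol_l <= 6.9
        and not s.other_exclusion
    )


# Made-up subject for illustration; not study data.
example = Subject(True, 28, 1, 8, 6.5, 118, 76, 4.9, 22.0, 300.0, 2.1, 3.0)
hirsute = replace(example, mfg_score=3)  # fails the mFG criterion only
```

Here `is_healthy_control(example)` is true and `is_healthy_control(hirsute)` is false, since a single failed criterion is enough to exclude a subject.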
Statistical analysis. Sample size calculations for the total population were based on the following formula: $n = Z_{1-\alpha}^{2}\,P(1-P)/D^{2}$ [13], where $n$ is the individual sample size, $Z_{1-\alpha} = 1.96$ (when $\alpha = 0.05$), $P$ is the assumed PCOS prevalence according to previously published data, and $D$ is the absolute error. Data were collected by using Research Electronic Data Capture (REDCap) [14,15]. Outliers were identified during the Exploratory Data Analysis [16,17] by using the box-plot and 3σ methods.
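As a worked illustration of the sample size formula, the sketch below evaluates it for one set of inputs. The prevalence P = 0.10 and absolute error D = 0.02 are hypothetical placeholders, since the text does not report the values that were actually used; only Z = 1.96 comes from the text.

```python
import math


def sample_size(p: float, d: float, z: float = 1.96) -> int:
    """Smallest integer n satisfying n >= z^2 * p * (1 - p) / d^2."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if d <= 0.0:
        raise ValueError("d must be positive")
    return math.ceil(z ** 2 * p * (1.0 - p) / d ** 2)


# Hypothetical inputs: 10% assumed prevalence, 2 percentage points absolute error.
n_required = sample_size(p=0.10, d=0.02)
```

With these placeholder inputs the formula gives 864.36, which rounds up to 865 subjects.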
Managing missing data: In our research dataset, there were two types of missing data: those that were missing completely at random (MCAR) and missing at random (MAR). We recorded all missing values with labels of "N/A" to make them consistent throughout our dataset. When analyzing the dataset, we used pairwise deletion.
The Kolmogorov-Smirnov test for normality showed that the continuous variables analyzed were non-normally distributed. For continuous variables, we therefore used Kruskal-Wallis ANOVA with multiple comparisons, p-values (2-tailed), and Mann-Whitney non-parametric tests. Pearson chi-square and Fisher exact one-tailed tests were used to compare proportions and categorical variables. A p-value below 0.05 was considered statistically significant. We defined the UNLs for androgens as the 98th percentiles of serum TT, DHEAS, and FAI in the group of healthy controls. To compare the 98th percentiles between groups, we examined their 95% confidence intervals (95% CIs): if the 95% CIs of two estimates did not overlap, the estimates were considered significantly different [18]. The 95% CIs were constructed with the bootstrap percentile method.
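The percentile-based UNL and its bootstrap percentile CI can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis code: the text does not specify the percentile interpolation rule, the number of resamples, or the software used, so linear interpolation and 2000 resamples are arbitrary choices here, and the input values are synthetic.

```python
import random


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty sample")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def bootstrap_percentile_ci(values, q=98.0, n_boot=2000, level=0.95, seed=0):
    """Bootstrap percentile CI for the q-th percentile of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    stats = [percentile(rng.choices(values, k=n), q) for _ in range(n_boot)]
    tail = (1.0 - level) / 2.0
    return percentile(stats, 100.0 * tail), percentile(stats, 100.0 * (1.0 - tail))


# Synthetic right-skewed values standing in for serum TT (ng/dl); not study data.
_rng = random.Random(42)
sample = [_rng.lognormvariate(3.3, 0.35) for _ in range(143)]

unl = percentile(sample, 98)                       # candidate upper normal limit
ci_low, ci_high = bootstrap_percentile_ci(sample)  # its 95% CI
```

Because an extreme percentile of a small sample rests on only a few observations, the resulting interval is typically wide and asymmetric, as with the intervals reported for the UNLs in this study.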

Results
The socio-demographic characteristics of healthy controls and their menstrual and reproductive history, anthropometry, vital signs, and pelvic U/S parameters by ethnicity are presented in the Supplementary Tables S1-S3.
The hormonal characteristics and glucose levels in the healthy controls, overall and by ethnicity, are shown in Table 4. As presented in this table, prolactin levels were significantly higher in Asians than in Caucasians and women of Mixed ethnicity. Mixed-ethnicity women had slightly higher TSH levels than Caucasians. Nevertheless, the prolactin and TSH values were within the reference intervals. Regarding androgens, when studying the impact of ethnicity, we found that TT values were significantly lower in Asians than in Caucasians and Mixed-ethnicity women (Table 4). When analyzing these data by age, the androgen profiles of healthy controls aged <35 years and ≥35 years were comparable (Table S4).
The UNLs for androgens and FAI were defined as the 98th percentiles of all healthy controls, overall and by ethnicity (Table 5). Based on the calculation of the 95% CIs for the 98th percentiles and on the analysis of the overlapping 95% CIs, the UNLs for TT and FAI were significantly higher in Caucasians than in Asian and Mixed-ethnicity individuals (Figures S1 and S2), and they were similar in Asians and Mixed-ethnicity individuals. Given the comparable UNLs for TT and FAI in women of Asian and Mixed ethnicity, we combined their data into the Asian and Mixed subgroup. Again, we found higher UNLs for both the TT and FAI in Caucasians than in the combined Asian and Mixed subgroup.
Table footnote. Abbreviations: TT, total testosterone; FAI, free androgen index; DHEAS, dehydroepiandrosterone sulfate. * The difference between Caucasians and Asians; ** the difference between Caucasians and Mixed; # the difference between Caucasians and the Asians and Mixed group.
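The overlap rule used for these comparisons can be checked directly against the 95% CIs that the Conclusions report for Caucasians and for the combined Asian and Mixed subgroup. The helper below is a hypothetical illustration of that rule, not the authors' code.

```python
def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% CIs of the 98th-percentile UNLs as reported in the Conclusions.
tt_caucasian = (51.7, 78.0)    # TT, ng/dl
tt_asian_mixed = (37.9, 47.8)  # TT, ng/dl
fai_caucasian = (3.6, 14.0)    # FAI
fai_asian_mixed = (2.5, 3.0)   # FAI

tt_differs = not intervals_overlap(tt_caucasian, tt_asian_mixed)
fai_differs = not intervals_overlap(fai_caucasian, fai_asian_mixed)
```

Neither pair of intervals overlaps, so both comparisons meet the stated criterion. Non-overlap is a conservative criterion: two estimates can differ significantly even when their 95% CIs overlap slightly.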
There were no significant differences in the DHEAS UNLs between Caucasians and Asians or between Caucasians and the combined Asian and Mixed subgroup. The UNL was slightly lower in the Mixed group than in Caucasians, but the difference did not reach significance, possibly because of the small number of subjects of Mixed ethnicity (Table 5, Figure S3). Therefore, in our study, we considered 355 µg/dl as the UNL for DHEAS for all ethnicities and age groups. The combined UNLs for TT and FAI were also calculated and can be used when data on ethnicity are unavailable.
We also found that the UNLs for androgens as defined by the 98th percentiles were similar in healthy controls aged <35 and ≥35 years (Table S5).

Discussion
The study of the epidemiology and phenotype of any complex genetic trait, including PCOS, requires careful determination of what is "normal". There are two general approaches to determining normal limits: (a) cluster analysis of a large, unselected population to determine the 'natural' or 'native' cut-offs in the population, or (b) the 95th-98th percentile (for the UNL) of a select group of well-phenotyped "healthy controls". Nevertheless, a limited number of epidemiological PCOS studies are based on pre-developed cut-offs for determining biochemical HA in multiracial populations [8,19]. A minority of studies utilize the optimal approaches when determining biochemical HA.
Welt et al. defined biochemical HA as an androgen level greater than the 95% confidence limits in a Boston multiethnic control population: TT of >63 ng/dl according to RIA and DHEAS of >430 µg/dl according to ELISA. However, these investigators did not develop or compare any specific cut-offs for different ethnicities [8].
Caucasians living in Eastern Siberia are mainly of Slavic origin, and most Asians are Buryats. Importantly, the distribution of our control subjects by ethnicity corresponds to that in the total population. Previously, no data were available on normative androgen values for these populations. Therefore, we analyzed our UNLs in comparison with cut-offs derived from studies performed in other Caucasian and Asian populations (Table 6).
Among Caucasian populations, the published UNLs for total testosterone mostly fell within the confidence intervals of the UNLs determined in our study and were comparable with ours [20][21][22][23].
A lower cut-off for TT was demonstrated by Hashemi et al. in Iranian women. These investigators computed normative cut-off levels by using 95th percentile values and k-means cluster analysis in the total population (n = 923) and in a reference group comprising 423 Caucasian eumenorrheic non-hirsute women of reproductive age selected from the total population. In this study, the investigators reported lower UNLs for TT according to the 95th percentiles and cluster analysis in the reference group than in the total population. Notably, these investigators emphasized that their results could not be generalized to other ethnicities [24]. At the same time, another study in an Iranian population of premenopausal women demonstrated cut-off values for TT based on the 95th percentile of the control values that were comparable to our UNLs for this parameter [23].
It has been suggested that Asians are more likely to have lower androgen values than Caucasians, although the data are insufficient. Population-based studies performed in Han Chinese populations utilized cluster analysis and/or 95th percentiles to determine the normative values for androgens [25][26][27]. We found that the UNLs for TT in Chinese populations are similar to those in Siberian Caucasians and appear even higher than those in Buryats. Unfortunately, 95% CIs are available only for the UNLs from Siberian women, which complicates the estimation of the statistical significance of this difference.
For the FAI, an upper normal limit of 6.4 was indicated in Chinese women by using the 95th percentiles of a reference group [26]. In the total population, these authors identified 6.1 as the cut-off value for the FAI according to k-means cluster analysis. The UNL determined in our population for Caucasians of Slavic origin (6.9) appears comparable to the cut-offs established for Chinese women and for Iranian women (5.4) [23] and higher than those in Caucasians from Spain (3.9) [20] and Turkey (4.9) [22], but the statistical significance of these differences is unclear. At the same time, when comparing the cut-offs for the FAI in Chinese women and Buryats, we found significantly lower UNLs in our Asian subpopulation and in our combined Asian and Mixed subpopulation [26].
Regarding DHEAS, we did not find an ethnicity-dependent difference for cut-offs in our study and estimated the UNL for this hormone as 355 (95% CI: 289, 371) µg/dl. According to the data presented in Table 6, our cut-off is close to the UNLs for the Turkish and Iranian populations (325 and 245-345 µg/dl, respectively) [22,24], but is somewhat lower than the UNL established for premenopausal women from Spain (438 µg/dl) [20]. We attribute the lowest UNL (179 µg/dl), which was reported for a small reference group of Iranian women [23], to a potential bias caused by the small sample size. In the Chinese population, the authors demonstrated that the cut-offs for DHEAS ranged from 181 to 289 µg/dl depending on the method used, with higher values being estimated according to the 95th percentile in the controls [25], which is comparable with our results.
In general, all of the analyzed data had an important limitation: the absence of calculated 95% CIs for the established UNLs, which could be useful for estimating the statistical significance of the differences between the diagnostic criteria of biochemical hyperandrogenism proposed for different populations of premenopausal women.
In our study, we established the UNLs for androgens, overall and by ethnicity, in a well-phenotyped control group of premenopausal women identified during the cross-sectional, institution-based ESPEP study. Using the "healthy control" approach, we noted that, despite utilizing the same recruitment and assay methodologies, the UNLs for TT and the FAI were significantly higher in healthy premenopausal Caucasian women than in Asian women or in the combined Asian and Mixed group, reinforcing the need to use ethnicity-specific normative ranges, at least for androgens, in the study of PCOS.
Study strengths: Importantly, our study benefited from the fact that all study subjects were recruited from a representative, unselected, medically unbiased, multiracial population of women with comparable socio-demographic characteristics who were living in the same geographical conditions. We consider the Eastern Siberian population an ideal model for the epidemiological study of the prevalence and phenotype of PCOS in Caucasians and Asians based on ethnicity-dependent normative ranges for androgens. All study participants were well phenotyped, with the exclusion of any factors that could influence their androgen profiles. A highly effective method (LC-MS/MS) was used for TT measurements [28,29]. We conservatively defined the UNLs for androgens as the 98th percentiles of the groups assessed; this approach carries less risk of overestimating the prevalence of biochemical HA than using lower percentiles (e.g., the 95th percentile) as the UNLs.
Study limitations: Regarding the ethnicity-specific UNLs, the subgroups that were compared included relatively small numbers of subjects (fewer than the 120 recommended by the CLSI EP28-A3c guidelines [30]). Furthermore, the overall number of healthy controls in our study was relatively small compared to the entire population assessed in the ESPEP study (12.6% of the total).

Conclusions
In this study, we report the UNLs for the TT, FAI, and DHEAS by using the 98th percentiles in a population of well-phenotyped "healthy controls" identified among premenopausal women from an unselected multiracial Eastern Siberian population. The study's results demonstrated that the UNLs for TT and the FAI depended on ethnicity: 73.9 (51.7-78.0) ng/dl and 6.9 (3.6-14.0) for Caucasians and 41.0 (37.9-47.8) ng/dl and 2.9 (2.5-3.0) for Asians and Mixed ethnicity combined, respectively. For the DHEAS levels, the UNLs were similar for all ethnicities: 355 (289-371) µg/dl. The relatively small number of subjects of different ethnicities suggests the need for further research to obtain more data regarding the normative androgen values that are specific to these subpopulations. A recently published study protocol for defining diagnostic cut-offs using integrated international multi-ethnic data from medically unbiased and unselected populations described a methodological approach that will allow us to update the ethnicity-dependent definition of biochemical HA [31].