Sex Biases in Cancer and Autoimmune Disease Incidence Are Strongly Positively Correlated with Mitochondrial Gene Expression across Human Tissues

Simple Summary
Our study investigates the well-known observation that cancer occurs more frequently in men while autoimmune diseases (AIDs) occur more frequently in women. This motivated us to explore whether these sex biases have a common basis. To study that, we assembled and analyzed a large collection of cancer and AID incidence datasets, including matched data from 29 countries. First, we find that the sex biases observed in the incidence of AIDs and cancers that occur in the same tissue are positively correlated across human tissues. To our knowledge, this is the first time this across-tissue relationship has been quantitatively demonstrated. Second, by analyzing gene expression data from healthy human tissues, we find that the sex bias in the expression of mitochondrial-encoded genes stands out as the common factor whose levels across human tissues are most strongly and positively associated with both cancer and AID incidence rate sex biases, pointing to a potential key role of these genes in determining sex bias in both disorders. These findings may prompt researchers to explore how pertinent findings in cancer studies could cross-fertilize AID studies and vice versa, potentially enhancing our ability to prevent and treat these diseases.

Abstract
Cancer occurs more frequently in men while autoimmune diseases (AIDs) occur more frequently in women. To explore whether these sex biases have a common basis, we collected 167 AID incidence studies from many countries for tissues that have both a cancer type and an AID that arise from that tissue. Analyzing a total of 182 country-specific, tissue-matched cancer-AID incidence rate sex bias data pairs, we find that, indeed, the sex biases observed in the incidence of AIDs and cancers that occur in the same tissue are positively correlated across human tissues. The common key factor whose levels across human tissues are most strongly associated with these incidence rate sex biases is the sex bias in the expression of the 37 genes encoded in the mitochondrial genome.


Introduction
Both autoimmune diseases (AIDs) and cancers have notably sex-biased incidence rates. Most AIDs occur more often in women [1,2], and most cancers occur more often in men [3-5]. While sex differences in several key biological factors have been implicated in the biased incidence rates observed for both AIDs and cancer, including inflammation and immunity, metabolism, and sex hormones, their mechanistic underpinnings remain largely unexplained [2,6,7].
Given these observations, we asked whether the sex biases observed in the incidence of AIDs and cancers that occur in the same tissue are correlated across human tissues. This question is of fundamental interest, since an affirmative answer may suggest that there are common factors underlying their incidence. Establishing such a link between AIDs and cancers could further prompt researchers to explore how pertinent findings in AIDs could cross-fertilize cancer risk studies and vice versa, potentially enhancing our ability to prevent and treat these diseases.
To explore whether these sex biases are correlated across human tissues, we collected population-based AID incidence studies for tissues that have both a cancer type and an AID that arise from that tissue. For countries for which we collected AID incidence data, we gathered incidence data for corresponding cancer types from national cancer registries. Analyzing a total of 182 country-specific, tissue-matched cancer-AID incidence rate sex bias data pairs, we find that the incidence rate sex biases observed for AIDs and cancers that occur in the same tissue are positively correlated across human tissues. In addition, we analyzed gene expression data from non-diseased tissue samples to determine if sex biases in gene set expression in these tissues are correlated with AID and cancer incidence rate sex biases in the same tissues. We find that the top positively enriched gene set across human tissues whose expression sex bias is most strongly associated with the incidence rate sex biases for AIDs, cancers, and AIDs and cancers considered jointly, is the set of 37 genes encoded in the mitochondrial genome.

Overview
Our analysis is divided into two main parts: curation and analysis of disease incidence rate data; and investigation of associations between incidence rates and gene expression in corresponding non-diseased human tissue samples. First, we studied the association of incidence rates for AIDs and cancers occurring in the same tissue. We collected incidence data for AIDs from published studies, and for each country for which we found incidence data for a given AID, we collected incidence data from that country's national cancer registry for cancers occurring in the same tissue as the AID. We matched AID and cancer incidence data by tissue and by country to produce country-specific tissue-matched AID-cancer incidence rate data pairs. We then computed across-tissue correlations between AIDs and cancers for male incidence rates, female incidence rates, overall incidence rates, and incidence rate sex biases, at both the individual country level and the across-country global level.
Next, we used non-diseased human tissue transcriptomic data from GTEx version 8 [8] to investigate possible factors across human tissues that might be associated with incidence rate sex biases. We computed correlations between incidence rate sex biases and either expression of individual genes or enrichment of human functional gene sets across tissues.

Autoimmune Disease Incidence Data Curation
We first performed an extensive literature search for sex-specific incidence data for AIDs. For each AID, we searched for original studies mentioning the disease and epidemiology, prevalence, incidence, incidence rate, or sex bias using Google Scholar. We considered only population-based studies that use clinical inclusion criteria and have at least 25 cases for a given disease. We evaluated whether or not a study was population-based using either (a) the characteristics of the existing data source used in the study (e.g., a mandatory country-wide reporting registry) or (b) estimates showing that the data collected in the study were likely representative of the overall population. We evaluated whether or not a study used clinical diagnostic criteria by looking for use of a disease-specific blood test, a histological assay, or other evidence used to confirm diagnosis and rule out similar non-autoimmune conditions. Additionally, we considered only AIDs with a focal primary tissue (e.g., we included ulcerative colitis but excluded Crohn's disease), for which we could find incidence data for at least three countries. We excluded sex-specific tissues.
We collected 188 AID-country incidence rate datasets from 167 studies. For each dataset, we calculated the incidence rate sex bias (IRSB) as

$$\mathrm{IRSB} = \log_2\left(\frac{\mathrm{IR}_{\mathrm{MALE}}}{\mathrm{IR}_{\mathrm{FEMALE}}}\right)$$

so that a value of zero indicates no bias, a positive value indicates a higher incidence rate in males (termed a "male bias") and a negative value indicates a higher incidence rate in females (similarly termed a "female bias"). A majority of the studies provided sex-specific (123 of 188 datasets, 65%) and total (143 of 188, 76%) incidence rates (IR):

$$\mathrm{IR}_{\mathrm{POP}} = \frac{\mathrm{cases}_{\mathrm{POP}}}{\mathrm{population}_{\mathrm{POP}}}$$

where "POP" stands for either the "MALE", "FEMALE", or "TOTAL" population; cases_TOTAL = cases_MALE + cases_FEMALE; and population_TOTAL = population_MALE + population_FEMALE.
Most studies reported IR as cases per year per $10^5$ persons; those using a different scale were converted to this scale. We used "crude" incidence rates (as defined above) when available; some studies provided only age-adjusted incidence rates.
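As a minimal sketch of the definitions above, the following Python snippet computes a crude incidence rate per $10^5$ persons per year and the resulting IRSB. The function names, the `years` parameter, and the case counts are illustrative assumptions and are not taken from the curated studies or from the paper's code.

```python
import math

def incidence_rate(cases, population, years=1.0, per=1e5):
    """Crude incidence rate: cases per year per `per` persons.

    `years` is a hypothetical parameter for studies that report
    case counts accumulated over several years.
    """
    return cases / (population * years) * per

def irsb(ir_male, ir_female):
    """Incidence rate sex bias: log2(IR_MALE / IR_FEMALE)."""
    if ir_male <= 0 or ir_female <= 0:
        raise ValueError("incidence rates must be positive")
    return math.log2(ir_male / ir_female)

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
ir_m = incidence_rate(cases=50, population=1_000_000)    # about 5 per 10^5
ir_f = incidence_rate(cases=200, population=1_000_000)   # about 20 per 10^5
bias = irsb(ir_m, ir_f)  # about -2: a fourfold female bias
```

A negative result indicates a female bias and a positive result a male bias, matching the sign convention stated above.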

Estimating Incidence Rates
For each of the four incidence rate measures we consider (IRSB, IR_F, IR_M, and IR_TOTAL), the majority of studies provided a value, while other studies gave values for other measures (i.e., different incidence rates or case counts) that can be used to estimate the value of that measure. For a given measure we can divide our AID-country datasets into four groups. The estimators for all four measures require a value for the population's sex ratio (Formula (1)). As only one study provided the background population sex ratio, we estimated measures using either a sex ratio of 1:1 or the sex ratio for the corresponding population (matching the specific country during the years the study was conducted) according to United Nations estimates [9]. Based on (A), (B), and (C), we estimated IRSB, IR_F, IR_M, and IR_TOTAL using Formulas (2)-(5). To assess the accuracy of our estimators we compared the actual and estimated values for datasets in group (2) in two ways (Table 2). First, we computed the Pearson's correlation coefficient r between the two values; for each estimator the correlation coefficient was close to 1 and the one-sided t-test was significant. Second, we computed a simple linear model of the form x_actual = β × x_estimate + α; for each estimator the coefficient β was close to 1 and $r^2$ was close to 1 (where r is the Pearson's correlation coefficient). By both criteria, all estimators were accurate. For each measure, the estimator using a sex ratio of 1:1 performed as well as or slightly better than the estimator using the sex ratio based on the United Nations estimates. Accordingly, for our analyses we used estimators with a sex ratio of 1:1.
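To illustrate why a sex ratio is needed and how an estimator can be checked against reported values, here is a minimal Python sketch. It does not reproduce Formulas (2)-(5). Instead it uses the identity IR_M / IR_F = (cases_M / cases_F) / (population_M / population_F), which follows directly from the definition of a crude incidence rate. The function names and all numbers are hypothetical.

```python
import math

def estimate_irsb(cases_male, cases_female, males_per_female=1.0):
    """Estimate IRSB from case counts and an assumed population sex ratio.

    IR_M / IR_F = (cases_M / pop_M) / (cases_F / pop_F)
                = (cases_M / cases_F) / (pop_M / pop_F)
    """
    return math.log2((cases_male / cases_female) / males_per_female)

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def linear_fit(estimates, actuals):
    """Least-squares fit of actual = beta * estimate + alpha."""
    n = len(estimates)
    mx, my = sum(estimates) / n, sum(actuals) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(estimates, actuals))
    sxx = sum((x - mx) ** 2 for x in estimates)
    beta = sxy / sxx
    alpha = my - beta * mx
    return beta, alpha

# Hypothetical (male cases, female cases, reported IRSB) triples.
records = [(40, 80, -1.02), (30, 90, -1.55), (60, 60, 0.03), (90, 45, 0.98)]
estimates = [estimate_irsb(m, f) for m, f, _ in records]  # sex ratio 1:1
actuals = [reported for _, _, reported in records]

r = pearson_r(estimates, actuals)
beta, alpha = linear_fit(estimates, actuals)
```

An accurate estimator yields r and β close to 1 and α close to 0, which is the pattern the text describes for Table 2.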
For all of our analyses, results computed using only given values, and not estimates, were consistent with results computed using both given and estimated values (the code for this paper includes scripts to reproduce all tests and figures using data that either includes or excludes estimated values).
When multiple studies were available for an AID in a country, we used the across-study arithmetic mean of each incidence rate measure as the measure value for that AID-country pair (for IR_MALE, IR_FEMALE, IR_TOTAL, or IRSB measures). Overall, surveying 167 published studies (Supplementary References), we calculated 133 country-specific AID incidence rate sex bias data points for 17 AIDs in 33 countries (Table S1; e.g., the mean incidence rate sex bias for Type 1 diabetes in Spain is one such data point).

Cancer Incidence Data Curation
Cancer incidence rates were calculated from GLOBOCAN [10] data for all but three countries. For each country for which we had AID data, we computed each cancer type's incidence rate measure for each year and then averaged the yearly measure values to produce a single measure value for each country-cancer pair (for IR_MALE, IR_FEMALE, IR_TOTAL, or IRSB measures). Cancer data for Finland [11], Sweden [12], and Taiwan [13] were collected from country-specific databases. For Finland and Sweden we calculated each incidence rate measure as the across-year average yearly measure for each cancer type for the most recent 20 years (1999-2019) for each country. For Taiwan we calculated each measure as the average of the measure for the two available time periods (1998-2007). Overall, we calculated 165 country-specific cancer incidence rate sex bias data points for 17 cancer subtypes in 29 countries (Table S2; for an additional four countries we were unable to find population-level cancer incidence data).

Pairing AID and Cancer Incidence Data
Across 12 human tissues we paired 17 AIDs with 17 cancer types for a total of 24 cancer-AID data pairs. To compute the correlation between AIDs and cancer incidence rate sex biases across tissues, we grouped AIDs with matched cancers occurring in the same tissue in the same country (Table S3). For example, for the UK, we paired thyroid AID data points for Hashimoto's hypothyroidism and Graves' hyperthyroidism with cancer data points for thyroid carcinoma and thyroid sarcoma, resulting in 4 possible thyroid cancer-AID pairs. The 133 country-specific AID incidence rate sex bias data points were matched to the 165 country-specific cancer incidence rate sex bias data points, yielding a total of 182 country-specific, tissue-matched cancer-AID incidence rate sex bias data pairs that are jointly present in both the AID and cancer datasets (Table S2).
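The averaging and pairing steps can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the function names, and every IRSB value are hypothetical and do not come from Tables S1-S3. Only the disease names and the thyroid example follow the text.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical records: (disease, tissue, country, IRSB).
aid_studies = [
    ("Hashimoto's hypothyroidism", "thyroid", "UK", -2.9),
    ("Hashimoto's hypothyroidism", "thyroid", "UK", -3.1),  # second study
    ("Graves' hyperthyroidism", "thyroid", "UK", -2.0),
    ("Ulcerative colitis", "colon", "Sweden", 0.2),
]
cancer_records = [
    ("Thyroid carcinoma", "thyroid", "UK", -1.5),
    ("Thyroid sarcoma", "thyroid", "UK", -0.3),
    ("Colon carcinoma", "colon", "Denmark", 0.4),
]

def mean_by_key(records):
    """Average repeated measurements of one disease in one country."""
    grouped = defaultdict(list)
    for disease, tissue, country, value in records:
        grouped[(disease, tissue, country)].append(value)
    return {key: fmean(values) for key, values in grouped.items()}

def pair_by_tissue_and_country(aid_points, cancer_points):
    """Pair every AID with every cancer sharing its tissue and country."""
    pairs = []
    for (aid, a_tissue, a_country), a_value in aid_points.items():
        for (cancer, c_tissue, c_country), c_value in cancer_points.items():
            if (a_tissue, a_country) == (c_tissue, c_country):
                pairs.append((a_tissue, a_country, aid, cancer, a_value, c_value))
    return pairs

aid_points = mean_by_key(aid_studies)
cancer_points = mean_by_key(cancer_records)
pairs = pair_by_tissue_and_country(aid_points, cancer_points)
```

With these inputs the two thyroid AIDs and two thyroid cancers in the UK yield four pairs, while the colon records produce none because their countries differ.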

Gene Expression Analysis of Human Tissues
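Gene expression was calculated from GTEx v8 data [8] provided in transcripts-per-million (TPM). For gene i and tissue k with m samples we calculated the within-tissue gene expression (GE) as the arithmetic mean TPM across samples (where "POP" stands for either the "MALE", "FEMALE", or "TOTAL" population; Formulas (6) and (7)):

$$\mathrm{GE}_{\mathrm{POP},i,k} = \frac{1}{m}\sum_{j=1}^{m} \mathrm{TPM}_{\mathrm{POP},i,j,k} \qquad (6)$$

and the gene expression sex bias (ESB) as:

$$\mathrm{ESB}_{i,k} = \log_2\left(\frac{\mathrm{GE}_{\mathrm{MALE},i,k}}{\mathrm{GE}_{\mathrm{FEMALE},i,k}}\right) \qquad (7)$$

where both GE_MALE,i,k and GE_FEMALE,i,k are positive. For a set of n genes N and tissue k with m samples, we calculated the within-tissue gene set activity (GSA) as the geometric mean of gene expression across genes (Formulas (8) and (9)):

$$\mathrm{GSA}_{\mathrm{POP},N,k} = \left(\prod_{i \in N} \mathrm{GE}_{\mathrm{POP},i,k}\right)^{1/n} \qquad (8)$$

and the gene set activity sex bias (ASB) as:

$$\mathrm{ASB}_{N,k} = \log_2\left(\frac{\mathrm{GSA}_{\mathrm{MALE},N,k}}{\mathrm{GSA}_{\mathrm{FEMALE},N,k}}\right) \qquad (9)$$

where both GSA_MALE,N,k and GSA_FEMALE,N,k are positive.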
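The four quantities defined in this section can be computed in a few lines. The sketch below is illustrative only: the TPM values are invented, the function names are hypothetical, and the ASB is assumed to be a log2 ratio by analogy with the ESB. MT-ND1 and MT-CO1 are two of the mitochondrial-genome-encoded genes, used here simply as labels.

```python
import math
from statistics import fmean, geometric_mean

def gene_expression(tpm_values):
    """Within-tissue gene expression: arithmetic mean TPM across samples."""
    return fmean(tpm_values)

def log2_ratio(male_value, female_value):
    """Sex bias as log2(male / female); both inputs must be positive."""
    if male_value <= 0 or female_value <= 0:
        raise ValueError("values must be positive")
    return math.log2(male_value / female_value)

def gene_set_activity(ge_by_gene, gene_set):
    """Gene set activity: geometric mean of gene expression across genes."""
    return geometric_mean([ge_by_gene[gene] for gene in gene_set])

# Hypothetical TPM values for one tissue (two samples per sex).
male_tpm = {"MT-ND1": [800.0, 1200.0], "MT-CO1": [3000.0, 5000.0]}
female_tpm = {"MT-ND1": [400.0, 600.0], "MT-CO1": [1500.0, 2500.0]}

ge_male = {g: gene_expression(v) for g, v in male_tpm.items()}
ge_female = {g: gene_expression(v) for g, v in female_tpm.items()}

# Per-gene expression sex bias (ESB).
esb = {g: log2_ratio(ge_male[g], ge_female[g]) for g in ge_male}

# Gene set activity (GSA) per sex and its sex bias (ASB).
gene_set = ["MT-ND1", "MT-CO1"]
gsa_male = gene_set_activity(ge_male, gene_set)
gsa_female = gene_set_activity(ge_female, gene_set)
asb = log2_ratio(gsa_male, gsa_female)
```

Because every male value here is exactly twice the female value, both the per-gene ESB and the set-level ASB come out at about 1, i.e., a twofold male bias.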

Gene Set Enrichment Analysis across Human Functional Pathways
We performed gene set enrichment analysis (GSEA) in three steps. First, for each gene in each tissue we calculated the gene expression sex bias (ESB). Second, we computed the across-tissue Spearman correlation of the ESB of each gene with AID or cancer IRSBs (we abbreviate these correlations as corr_ESB/IRSB). We also computed aggregated or "joint" corr_ESB/IRSB values as the average for each gene of its corr_ESB/IRSB value for AID IRSBs and its corr_ESB/IRSB value for cancer IRSBs. Finally, for each of these three phenotypes, we ordered all the genes from greatest to least by the corr_ESB/IRSB values and performed a GSEA [14] to identify gene sets and pathways that were either significantly positively or negatively associated with IRSB (for gene sets used see Results). We considered GSEA results significant if the adjusted p ≤ $10^{-3}$ (we used the Benjamini-Hochberg method to adjust p-values for multiple tests) and ranked the results by normalized enrichment score (NES) [14].
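The second and third steps (per-gene Spearman correlations, the joint average, and the ranked gene list that is passed to GSEA) can be sketched as below. This is a sketch under stated assumptions: gene names, ESB values, and IRSB values are hypothetical, Spearman's coefficient is computed as Pearson's coefficient on average ranks, and the enrichment step itself is not implemented.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical values across five tissues (same tissue order everywhere).
aid_irsb = [-2.0, -1.0, -0.5, 0.2, 0.6]
cancer_irsb = [-0.8, 0.1, 0.5, 0.4, 1.2]
esb = {
    "geneA": [-0.3, -0.1, 0.0, 0.1, 0.4],
    "geneB": [0.5, 0.2, 0.3, -0.1, -0.2],
    "geneC": [0.0, 0.3, -0.2, 0.1, -0.1],
}

corr_aid = {g: spearman_rho(v, aid_irsb) for g, v in esb.items()}
corr_cancer = {g: spearman_rho(v, cancer_irsb) for g, v in esb.items()}
corr_joint = {g: (corr_aid[g] + corr_cancer[g]) / 2 for g in esb}

# Genes ordered from greatest to least joint correlation.
ranked_genes = sorted(esb, key=lambda g: corr_joint[g], reverse=True)
```

The same ordering would be produced separately for the AID and cancer correlations, giving the three ranked lists described in the text.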

Results
We surveyed 167 published AID studies and the cancer registries for 29 countries to assemble 182 country-specific, tissue-matched cancer-AID incidence rate sex bias data pairs (Methods; Table S2). For each study, we calculated the incidence rate sex bias (IRSB) as IRSB = $\log_2(\mathrm{IR}_{\mathrm{MALE}}/\mathrm{IR}_{\mathrm{FEMALE}})$, where IR_MALE and IR_FEMALE are the male and female incidence rates, so that a value of zero indicates no bias, a positive value indicates a higher incidence rate in males (termed a "male bias") and a negative value indicates a higher incidence rate in females (similarly termed a "female bias"). Having assembled these data, we computed the mean IRSBs to get a view of tissue-matched cancer and AID incidence rate sex bias across tissues, yielding global IRSB values for 17 AIDs and 17 cancer types across 12 human tissues, comprising a total of 24 cancer-AID data pairs. As expected, most AID incidence rates are female-biased (a negative sex-bias score), while most cancer incidence rates are male-biased (a positive sex-bias score) (Figure 1A, Table S4). Figure 1B presents the correlation of the IRSB of these disorders across human tissues, summed up across all countries surveyed. Notably, we find an overall positive correlation (Pearson correlation r = 0.48 with two-sided t-test p = 0.017, Spearman correlation r = 0.43 with two-sided t-test p = 0.034). Repeating this analysis using various levels of cancer type classification shows a consistent and robust correlation (Figures S1 and S2, Table S5). (We used Pearson's product-moment correlation coefficient to measure correlation because it takes effect size into account. We also provide correlation test results based on Spearman's rank correlation coefficient as this assesses correlation differently and may be of interest to the reader. We considered a correlation test result significant when the t-test adjusted p ≤ 0.05. We used the Benjamini-Hochberg method to adjust p-values for multiple tests.)
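For readers who want to trace the statistics quoted above, the following sketch shows the t statistic of a correlation test and the Benjamini-Hochberg adjustment. It is not the paper's R code. The sample size n = 24 is an assumption (the 24 global cancer-AID pairs), and the list of p-values is hypothetical. Converting t to a p-value requires the Student t distribution with n - 2 degrees of freedom, which is omitted here.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing a correlation r from n pairs (n - 2 d.f.)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Reported Pearson r with an assumed n of 24 pairs (22 degrees of freedom).
t_stat = correlation_t_statistic(0.48, 24)

# Hypothetical raw p-values from four tests.
raw_p = [0.004, 0.030, 0.020, 0.600]
q_values = benjamini_hochberg(raw_p)
significant = [q <= 0.05 for q in q_values]
```

Under the n = 24 assumption the t statistic is about 2.57. In the hypothetical example the first three tests remain significant at the 0.05 threshold after adjustment and the fourth does not.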
Second, studying this correlation in a country-specific manner for the four countries with at least 18 AID-cancer data pairs, we find a country-specific significant correlation for Sweden, while the correlations for Denmark, the UK and the USA have q-values (p-values corrected for multiple hypotheses testing) > 0.05 but are quite close to this threshold, showing a consistent trend for each country (Figure 1C).

[Figure 1, partial caption: box plots of incidence rate sex bias by AID and cancer type (y-axis). All data points are shown. Box shows interquartile range (IQR, first quartile to third quartile), with center bar representing the median. Lefthand whisker extends from the first quartile (Q1) to Q1 - 1.5×IQR or to the lowest value point, whichever is greater. Righthand whisker extends from the third quartile (Q3) to Q3 + 1.5×IQR or to the highest value point, whichever is smaller. Positive median sex bias (red) indicates a higher incidence rate in men; negative median sex bias (blue) indicates a higher incidence rate in women. The x-axes for the left and right panels differ to fit the data compactly.]

Observing this fundamental correlation, we next asked if we could identify factors that might jointly modulate both the incidence rate sex bias observed in cancer and in AID across human tissues. We conducted both an unbiased general investigation and a hypothesis-driven one. We specifically examined four major factors that have been previously associated in the literature with the incidence rates of cancers and AIDs and/or their incidence rate sex biases. Those include (1) inflammatory or immune activity in the tissue [15,16]; (2) expression of immune checkpoint genes [17,18]; (3) the extent of X-chromosome inactivation [6,19]; and finally, (4) mitochondrial activity [20,21] and mitochondrial DNA copy number [22,23].
With these literature-driven hypotheses in mind, we nevertheless chose to begin by systematically charting, in an unbiased manner, the landscape of gene sets whose sex-biased enrichment in normal tissues is associated with IRSB in cancers and AIDs (see Section 2.5). We analyzed gene expression data from non-diseased tissue samples from GTEx v8 [8], for tissues in which both cancer and AID arise; GTEx data were available for 10 of the 12 tissues we studied above (Table S3). (1) First, for each gene in each tissue we calculated the expression sex bias (ESB) as ESB = $\log_2(\mathrm{GE}_{\mathrm{MALE}}/\mathrm{GE}_{\mathrm{FEMALE}})$, where GE_MALE and GE_FEMALE denote the average gene expression in TPM (transcripts-per-million) for male and female samples of the tissue, respectively. (2) Second, we computed the correlation of the expression sex bias of each gene with AID or cancer IRSBs (we abbreviate these correlations as corr_ESB/IRSB). We also computed aggregated or "joint" corr_ESB/IRSB values as the average for each gene of its corr_ESB/IRSB value for AID IRSBs and its corr_ESB/IRSB value for cancer IRSBs. (3) Finally, for each of these three phenotypes, we ranked all the genes from top to bottom by the corr_ESB/IRSB values and performed a gene set enrichment analysis (GSEA) [14,24] to identify gene sets and pathways that were either significantly positively or negatively associated with IRSB. In total, this analysis covered 7763 gene sets, including gene ontology biological process sets and chromosome-location based sets from MSigDB [25], three X-chromosome gene sets (fully escape X-inactivation, variably escape X-inactivation, and pseudoautosomal region) [26], and finally, two separate mitochondrial sets: the nuclear-encoded genes whose protein products localize to the mitochondria, and the 37 mitochondrial-genome-encoded genes [27].
In this study, we consider the 37 mitochondrial genes as a set; all the genes are listed individually and some previously identified associations between individual genes and either AIDs or cancer are shown in Table S6. Figure 2 shows the top positively and negatively corr_ESB/IRSB enriched sets with p ≤ $10^{-3}$ after multiple hypotheses test correction for AID incidence (positive, Panel A; negative, Panel B), for cancer incidence (positive, Panel C; negative, Panel D), and their joint aggregate enrichment for both AID and cancer incidence (positive, Panel E; negative, Panel F). Strikingly, the top enriched gene set (highest normalized enrichment score (NES)) in all three phenotypes is the set of 37 genes encoded on the mitochondrial genome, including many genes with high corr_ESB/IRSB values. In contrast, while the (much larger) set of all genes encoding proteins that localize to the mitochondria is significantly enriched for cancer IRSB, it is not significantly enriched for AID IRSB, where it is only ranked 3842 out of 6420 (negatively) enriched gene sets. Several immune-related gene sets also show high and significant corr_ESB/IRSB positive enrichments in accordance with one of our initial hypotheses (Figure 2). However, the three different X chromosome gene sets studied in light of another one of our original hypotheses are not significantly enriched in corr_ESB/IRSB values. Finally, several mRNA processing gene sets show strong negative significant correlations and high negative NES scores with AID and cancer incidence.
To obtain a clearer visualization of the key positively enriched gene sets described above, we summarized the expression of the genes composing a given gene set in a normal GTEx tissue by computing their geometric mean, giving us a single activity summary value (see Section 2.5). We then computed the correlation across tissues between these summary values of the gene sets in each normal tissue and the IRSBs of cancers or AIDs (Figure 3). In concordance with the results of the unbiased analysis presented above, we do not observe a significant correlation between cancer or AID incidence rate sex bias and the expression of key immune checkpoint genes (CTLA-4, PD-1, or PD-L1, Figure S3), or the extent of X-chromosome inactivation (quantified by the expression of XIST lncRNA [28], Figure S4). We also do not find such significant consistent correlations for the top immune gene sets found via the unbiased analysis (previously shown in Figure 2). However, we do find strong correlations between these summary values for the mitochondrial gene set, which was ranked highest in Figure 2 (gene set "MT"): Remarkably, we find that the sex bias of mtRNA expression in GTEx tissues is positively correlated both with AID incidence rate sex bias (Pearson r = 0.56, one-sided t-test p = 0.018) and with cancer incidence rate sex bias (Pearson r = 0.67, one-sided t-test p = 0.0058) (Figure 3A,B; the correlations between mtRNA expression and cancer and AID incidence rates for each of the sexes individually are provided in Figure S5). The significance of these two associations is further supported by observing that the basic correlation between cancer and AID IRSBs becomes insignificant when we compute the partial correlation between these two variables while controlling for the mtRNA expression bias (Pearson r = 0.21, two-sided t-test p = 0.42). 
Overall, these findings are in line with previous reports linking mitochondrial activity [20,21] and mtDNA copy number [22,23] with higher AID and cancer incidence.

Discussion
The correlative findings between the expression of mitochondrially encoded genes and cancer and AID IRSBs across human tissues are quite surprising, giving rise to two further fundamental questions. First, what biological mechanisms may be associated with sex differences in overall mitochondrial functioning? One potential candidate may be estrogen signaling, which has been shown to regulate at least four mitochondrial functions relevant to health and disease [29], including (1) biogenesis of mitochondria, whose levels differ across sexes and tissues [30], (2) T-cell metabolism (including mitochondrial activity measured by Seahorse assays) and T-cell survival (estimated by retention of inner membrane potential) [31], (3) unfolded protein response [32] (mediated partly via mitochondrial superoxide dismutase) [33], and (4) generation of reactive oxygen species (ROS) [34]. Second, how might sex differences in mitochondrial functioning modulate the sex-biased incidence observed in cancers and AIDs? One possible mechanism is through differences in ROS production, which notably involves quite a few mitochondrially encoded genes: increased mitochondrial ROS generation has been associated with both the initiation and intensification of autoimmunity in several organ-specific AIDs [20] and with cancer initiation and progression [21]. More generally, alterations in mtDNA copy number have been associated with increased risk of lymphoma and breast cancer [22], and somatic mtDNA mutations producing mutated peptides may trigger autoimmunity [23].
Our analyses have a few limitations and we list three main ones. First, the majority of our AID-cancer data pairs are from European countries (113 of 182 [62%]), which might introduce geographic, ethnic, or social biases. This geographic bias in our AID-cancer data pairs is largely due to a paucity of suitable AID epidemiological studies based on populations outside of North America and Europe, particularly on populations in low- and middle-income countries [35,36]. For example, despite the geographically widespread study of common AIDs such as multiple sclerosis and type 1 diabetes, for other common AIDs such as Hashimoto's hypothyroidism, Graves' hyperthyroidism, and ulcerative colitis, data from regions outside North America and Europe are sparse [37]. Second, factors beyond biological drivers, such as sex differences in the propensity to seek medical care or reporting of specific diseases, are not characterized in the datasets studied. However, putative disease-specific effects may be somewhat mitigated given the opposite tendency of sex biases for AIDs and cancers in a study of tissue-specific correlations like ours. Although there is evidence for the role of environmental exposures in the development of some AIDs [38], there is little evidence for sex-specific exposures contributing to sex biases in AID incidence [39]. Likewise, a recent study of the contribution of risk factors to sex disparities in the incidence of solid tumor cancers at 21 anatomical sites found that differences between male and female incidence rates are largely unexplained by factors outside of sex-related biological factors [40]. Third, although much of the incidence rate data is age-standardized, we could not take additional steps to account for age-related incidence rate differences as the sample sizes available are too small to enable doing such an analysis in a robust manner.
It has been hypothesized that chronic inflammation leads to cancer [41,42]. Indeed, a recent study analyzing UK Biobank data found positive associations between several tissue-specific immune-mediated diseases (i.e., diseases such as asthma and myositis in addition to AIDs) and subsequent cancer risk in the same individual [43]; that study did not, however, consider sex bias. Both this study and our study seek statistical evidence of shared risks between immune diseases and cancer. However, in contrast to this individual-level study design assessing within-subject risk of sequential disease appearance in the UK population, we chose a population-level study design assessing within-population correlations between IRSBs for AIDs and cancers affecting the same tissues across populations. A strength of our study is that when AID studies reported whether individuals developing AIDs had previous immune-related diseases or cancers we excluded individuals with such previous diseases from our study.
As in humans, sex differences have been reported in animal studies of diseases, which prompted us to search the literature and survey previous studies of sex bias in disease incidence in rodent models of cancers and AIDs. We focused on studies of sex difference in spontaneous and/or autochthonous carcinogenesis by either carcinogen treatment or genetic engineering, excluding transplantation of syngeneic animals because these animals do not model disease development (representative examples are listed in Tables S7 and S8 for cancer and AIDs, respectively). Table S7 lists our cancer incidence findings, where the sex bias was male skewed in colon, liver, kidney, pancreas, and stomach, and higher in females in the thyroid, consistent with the human reports. Interestingly, for colon, liver, kidney, pancreas, and thyroid, the sex bias disappeared or was reduced when the animals were subjected to castration/ovariectomy or hormone treatment, supporting the notion that the differences in these organs are likely to be driven by sex hormones. Table S8 lists AID rodent models that allow for direct comparisons to the human data. The AID sex bias reported is, however, generally higher in males than in females, unlike the human findings, but the higher male bias observed in kidney, colon, pancreas, and skin compared to the thyroid is maintained.

Conclusions
In summary, we find a surprising overall positive correlation between cancer and AID incidence rate sex biases across many different human tissues. Among key factors that have been previously associated with sex bias in either AID or cancer incidence, we find that the sex bias in the expression of mitochondrially encoded genes (and possibly in the expression of a few immune pathways) stands out as a key factor whose aggregate level across human tissues is quite strongly associated with these incidence rate sex biases. Our findings thus call for further mechanistic studies on the role of mitochondrial gene expression in determining cancer and AID incidence and their incidence rate sex biases.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14235885/s1, Figure S1: The correlation between incidence rate sex bias for cancers and AIDs (using low-resolution cancer classification); Figure S2: The correlation between incidence rate sex biases for cancer types and AIDs (using high-resolution cancer classification); Figure S3: The correlation between checkpoint gene expression sex bias versus AID and cancer incidence rate sex bias; Figure S4: The correlation between XIST gene expression versus AID and cancer incidence rate sex bias; Figure S5: The correlation between mitochondrial gene expression versus AID and cancer incidence rates in males and females; Table S1: Curated autoimmune disease study data; Table S2: Tissue-based cancer and autoimmune disease pairings; Table S3: Matched cancer and autoimmune disease data pairs; Table S4: Cancer and AID incidence rate averages; Table S5: Expanded cancer classifications; Table S6: The 37 genes encoded on the human mitochondrial genome, and some known associations between individual genes and either autoimmune diseases or cancers; Table S7: Curated cancer rodent model data; Table S8: Curated autoimmune disease rodent model data. Supplementary references. References and DOI links for studies curated in Table S1.

Institutional Review Board Statement:
This study is based on published aggregate data and hence not subject to Institutional Review Board review.

Informed Consent Statement:
This study is based on published aggregate human subjects data. No individual human subjects were enrolled.

Data Availability Statement:
All data used is publicly available. Code and data (including links to original sources, raw data downloaded from those sources, and processed data files) used for analysis are available on Zenodo at DOI: https://doi.org/10.5281/zenodo.7058954 (last accessed on 16 September 2022). Statistical analyses and figure preparation were performed on a Macintosh computer (OS 12.5.1; 32 GB memory; 8-core 2.3 GHz processor) in RStudio (v2021.09.0 + 351 "Ghost Orchid" release) [44] running the R language (v4.1.2) [45] (R package versions used are listed in the Zenodo repository). Plots produced in R were aligned and lettered using Inkscape (v1.0) (inkscape.org (last accessed on 16 September 2022)) to produce multi-plot figures.