Cross-Species Transcriptomics Analysis Highlights Conserved Molecular Responses to Per- and Polyfluoroalkyl Substances

In recent decades, per- and polyfluoroalkyl substances (PFASs) have garnered widespread public attention due to their persistence in the environment and detrimental effects on the health of living organisms, spurring the generation of several transcriptome-centered investigations to understand the biological basis of their mechanism. In this study, we collected 2144 publicly available samples from seven distinct animal species to examine the molecular responses to PFAS exposure and to determine if there are conserved responses. Our comparative transcriptional analysis revealed that exposure to PFAS is conserved across different tissues, molecules and species. We identified and reported several genes exhibiting consistent and evolutionarily conserved transcriptional response to PFASs, such as ESR1, HADHA and ID1, as well as several pathways including lipid metabolism, immune response and hormone pathways. This study provides the first evidence that distinct PFAS molecules induce comparable transcriptional changes and affect the same metabolic processes across inter-species borders. Our findings have significant implications for understanding the impact of PFAS exposure on living organisms and the environment. We believe that this study offers a novel perspective on the molecular responses to PFAS exposure and provides a foundation for future research into developing strategies for mitigating the detrimental effects of these substances in the ecosystem.


Introduction
Per-and polyfluoroalkyl substances (PFASs) are a heterogeneous class of fluorinated synthetic compounds encompassing a great number of molecules with different structures [1]. They have gained global notoriety due to their persistence and adverse effects on living organisms and environmental health [2]. While a compendious definition of these chemicals is challenging to provide, the Organization of Economic Co-operation and Development (OECD) recently defined PFAS as molecules containing at least a perfluorinated methyl (-CF 3 ) or a perfluorinated methylene group (-CF 2 -) without any H/Cl/Br/I attached to it [3]. However, there are several PFAS classifications that are based on diverse definitions and include a variable number of molecules. For instance, PubChem's classification, based on OECD's general description, includes more than 6.3 million PFAS molecules [4], while the United States Environmental Protection Agency's (EPA) classification, founded on molecular substructures and a threshold of fluorine percentage [5], contains 14,735 compounds [6]. Despite the challenges and discrepancies in defining these substances, the OECD currently recognizes 4730 molecules as bona fide PFASs, which are further classified based on their carbon chain length and molecular structure, which determines their unique physicochemical properties and environmental behavior. Shortchain and long-chain PFASs are distinguished based on their carbon chain length, and polymeric and non-polymeric PFASs are differentiated based on the presence or absence of repeating monomer units in their molecular structure. Moreover, PFASs are commonly classified based on their legal status as either legacy or emerging PFASs. Emerging PFASs are compounds such as HFPO-DA or GenX, ADONA, or C6O4, which were introduced after the ban on perfluorooctane sulfonate (PFOS) and perfluorooctanoate (PFOA) production, import, and use. These emerging PFASs are characterized by a shorter C-F backbone and are considered less hazardous than legacy PFASs due to their lower bioaccumulation potential and toxicity [7].
PFASs have unique chemical properties that fostered their widespread production and use in a multitude of industrial products since the 1950s [8]. The C-F bond in PFAS molecules confers high molecular stability but also results in high resistance to degradation [8]. Additionally, the chemical attributes of amphiphilic and hydrophobic PFASs make them ideal surfactants and surface protectors, while also making them resistant to high temperatures. The versatility of PFASs has led to their use in a wide range of products, including non-stick pans, firefighting foams (aqueous film-forming foams, AFFF), waterproof textiles, pesticides, building and construction materials, cleaning products and medical and personal care products, among many others [2,8].
Despite their widespread use, the potential risks of PFAS exposure to human health and the environment have become increasingly apparent. PFASs have been found to be ubiquitously present in the environment, where, thanks to their intrinsic chemical stability, they can persist for several years owing to their resistance to degradation [2]. Water basins have been identified as major repositories of PFASs and are capable of transferring these substances over long distances, making the water ecosystem a crucial gateway for PFAS entry into the food chain up to humans [9,10]. Numerous studies have focused on specific PFAS molecules, such as PFOS and PFOA, and have shown that their accumulation can have detrimental effects on aquatic and terrestrial ecosystems, as well as on animal species and plants. As a result, limitations on the use of PFOA and PFOS were introduced in some regions [2,[11][12][13][14]. Moreover, the presence of PFASs in human biological matrices has been highlighted in numerous studies, with a global distribution. PFASs have been detected in serum [15,16], breast milk [17,18], placenta [19,20], hair [21] and semen [22], indicating widespread exposure in human populations.
The vast majority of physiological and molecular research on PFASs has been directed towards human health, revealing their toxicological effects on biological processes and metabolism. These negative impacts include reduced fertility, altered gene transcription [12,[23][24][25][26][27][28][29] and the promotion of certain types of cancer, such as kidney and liver cancer [30,31]. However, there are conflicting data on the involvement of PFASs in cancer pathogenesis [30]. Furthermore, PFASs have been shown to negatively affect the activity of the immune system, particularly in children, by impairing immune reactions and vaccination responses [23][24][25]. Lipid metabolism is also heavily impacted by PFAS exposure, leading to dyslipidemia and increased plasma levels of cholesterol [32][33][34][35][36][37].
Numerous studies have demonstrated that PFASs affect multiple species through detectable molecular mechanisms [38][39][40][41]. These compounds can directly interact with molecules such as the peroxisome proliferator-activated receptor α (PPARα), which mediates PFAS toxicity [42]. Most importantly, PFASs are capable of modifying the transcriptional expression of many genes in humans and other species [12], which has significant repercussions on the mentioned pathways and diseases.
Despite the vast evidence of transcriptional changes induced by PFASs in multiple species and despite the presence of numerous quantitative transcriptome-wide studies measuring gene expression responses to PFAS exposure [38,39,43], a comprehensive and comparative analysis of the data generated by these studies has yet to be performed. To address this gap, we propose a rational integration and comparison of transcriptome-wide studies performed in animal species and cell models, in the form of RNA-Seq or microarray datasets. Using the opportunities offered by transcriptomics, we aim to elucidate the molecular effects induced by PFASs not only at the single gene level but also across different pathways and cell types. Our research provides a comprehensive understanding of the molecular mechanisms underlying PFAS toxicity that translate across species while accelerating evidence-based policies and treatments to safeguard public and environmental health.

Data collection and Processing
We conducted an extensive literature search across databases to identify all transcriptomewide quantitative studies focusing on the effects of PFASs on animal samples. A total of 11 transcriptomics datasets were identified, containing publicly available data from 7 different species (Homo sapiens, Mus musculus, Caenorhabditis elegans, Danio rerio, Gadus morhua, Micropterus salmoides, Pimephales promelas) ( Table 1) [38][39][40][41][43][44][45][46][47][48][49]. Raw data associated with these studies were retrieved from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/ accessed on 1 December 2022) [50] and the Sequence Read Archive (SRA) database (https://www.ncbi.nlm.nih.gov/sra accessed on 1 December 2022) [51], both hosted at the National Center for Biotechnology Information (NCBI).  2019 [49] Raw sequence data (FASTQ files) from the D. rerio [41] and G. morhua [43] datasets were downloaded from the SRA database [51] using SRA Toolkit version 3.0.1. These reads were aligned to the respective reference genomes (zebrafish genome version danRer11/GRCz11 and Atlantic cod genome version gadMor3.0) using the HISAT2 alignment program version 2.1.0 [52]. The BAM files containing the aligned reads of zebrafish and Atlantic cod were processed with featureCounts version 2.0.0 [53] to obtain matrices containing the gene counts for each sample. The other datasets were directly downloaded from the NCBI GEO database [50] with most of them being in the form of gene counts matrices, while the Pfohl et al. 2021 dataset [47] was available through CEL files (which are files commonly produced by Affymetrix DNA microarray image analysis software).
All statistical analyses were conducted in the R statistical software version 4.2.2 and Bioconductor version 3.16. To generate graphs for this manuscript, we used base R functions and R packages including ggplot2 version 3.4.1 [54], corrplot version 0.92, corto version 1.2.0 [55] and ComplexHeatmap version 2.14.0 [56].
RNA-Seq gene-based reads counts were directly loaded into the R environment, while R package oligo version 1.62.2 was used to import and process CEL files. Microarray data were normalized using RMA normalization [57]. R package GEOquery version 2.66.0 [58] was utilized to recover the metadata containing the information about the experimental design.
All sequencing data alignment and gene expression quantification steps were performed on an HPC-dedicated DELL EMC server with an AMD EPYC 7301 32-core processor and 256 GB of RAM. Microarray normalization, post-normalization statistical analysis and graphics were carried out on a Windows 10 machine Intel Core i7-10700 CPU with 32 GB of RAM (manufacturer: LENOVO, Beijing, China).

Differential Gene Expression Analysis
To comprehensively assess the transcriptome-wide response to PFASs, we designed an approach of comparison of 110 total differential gene expression contrasts, using for each dataset a balanced PFAS-treated vs. control design, with at least three replicates per group. For RNA-seq data, we used the DESeq2 R package version 1.38.3 [59] on raw read counts. For microarray data, we implemented the default pipeline of the limma R package version 3.54.1 [60]. Due to the significantly higher number of contrasts in two H. sapiens datasets [38,44] than all others (Table 1), we decided to retain only a PFAS concentration of 20 µM in these two datasets [38,44]. In the case of the P. promelas dataset [48], lowexposure specimens from Upper Prior Lake were used as PFAS-treated samples. Overall, the differential gene expression analysis was implemented on 110 separated contrasts, encompassing all datasets ( Table 1). All contrasts yielded more than 10 significantly (at p ≤ 0.05) differentially expressed genes in response to PFASs (Supplementary Table S1).
For each contrast of the datasets, we generated a gene-by-gene transcriptome-wide signature, defined by the following formula: where p represents the p-value of the differential expression (calculated by limma or DESeq2) and FC represents the fold change of the differential expression.
In essence, this formula (implemented in several other transcriptomics publications, such as Alvarez et al., 2016 [61]) assigns a numerical value to each gene that is positive for significantly up-regulated genes, and negative for significantly down-regulated genes. The magnitude of the numerical value is proportional to the tested significance of the change.

Ortholog Prediction
To enable the comparison of gene expression data across different species, we devised a phylogenetic gene conversion approach to convert all gene signatures to a common gene identifier.
In order to do so, we performed a direct species-to-human conversion using the DRSC Integrative Ortholog Prediction Tool (DIOPT) database version 9.0 [62] for all available species in the database. For species not available in DIOPT, we utilized the R package orthogene version 1.4.1 [63] to perform the conversion. In instances where species were not available in either database (specifically, for Micropterus salmoides and Pimephales promelas), we employed a bidirectional best-hit approach based on BLASTn version 2.12.0+ [64], using the sequences associated with each microarray probe as a query, and the zebrafish cDNA version danRer11/GRCz11 as the target database. We then converted the identifiers from zebrafish to human using DIOPT. All ortholog conversions used in this study are available in Supplementary Table S2.
The resulting matrix of signatures, based on the most likely human ortholog, contained 110 contrasts (PFAS vs. control) and was used for subsequent analysis (Supplementary Table S3).

Signature Analysis
To assess the similarities between gene expression signatures, we employed Pearson correlation, provided by the R basic function cor().
For the pathway enrichment analysis, we retrieved gene sets from KEGG, WikiPathways and Gene Ontology using the Molecular Signatures Database (MSigDB) [65]. We accessed the database via the R package msigdbr version 7.5.1 and implemented the enrichment analysis on the signatures using the R package fgsea version 1.24.0. This package uses an algorithm for expedited and parallel gene set enrichment analysis [66].
To integrate the normalized enrichment scores (NES) derived from the pathway enrichment analysis, we employed Stouffer integration as implemented by the corto R package version 1.2.0 [55] and as performed before [61]. Z-scores were converted to p-values, where needed, using the z2p() function from the aforementioned corto pack-age [55]. All p-values were corrected using the Benjamini-Hochberg method. All the R code used to integrate data and generate the figures in this paper is available on Github at the following address: https://github.com/federicogiorgi/pfas (accessed on 14 June 2023).

Metabolites Prediction
We employed a correlation-based method to predict metabolites based on gene expression signatures, as described in Cavicchioli et al., 2022 [67]. Briefly, this method assesses the correlation structure between metabolites and transcripts measured in the Cancer Cell Line Encyclopedia metabolomics/transcriptomics dataset [68] and then predicts the metabolite levels in scenarios where only transcripts are available.
We applied this analysis to 55 PFAS exposure contrasts of three human datasets included in this study [38,44,45]. Prior to the analysis, RNA-seq gene expression count data were normalized using variance stabilizing transformation (VST) [69]. We then integrated the normalized enrichment scores (NESs) generated contrast by contrast (Supplementary  Table S4) using the Stouffer integration method, as implemented by the corto R package [55]. The p-values were corrected using Benjamini-Hochberg method.

Datasets
Using the literature and biological databases, we searched all publicly available transcriptome-wide PFAS quantitative data, in order to build the most comprehensive collection available to date. Our search retrieved 2144 samples from 11 datasets and from 7 different species for our analysis. Table 1 provides detailed information about each dataset, including the overall study design, tested PFAS molecules, number of samples, and tissues analyzed.

Correlation Analysis
To assess whether PFASs promote similar responses across species, we extracted transcriptional signatures from each PFAS vs. control contrast (Supplementary Table S3). Our comparative transcriptional analysis revealed that exposure to different PFAS molecules determines both intra-and interspecies correlations (Figure 1), indicating that this class of compounds induces conserved biological responses among species, despite the high phylogenetic distance between the species analyzed herein. Notably, our analysis demonstrated a general preponderance of positive correlation, with greater values in intraspecies comparison (Figures 1 and S1).
Relating to cross-species correlation, our analysis revealed a strong positive correlation between the transcriptional signatures of H. sapiens and M. musculus, especially when exposed to the same PFAS molecule (Figures 2A and S2), highlighting the close evolutionary proximity between the two species. We detected interspecies positive correlations as high as 0.36 (Figure 2A), which is extremely significant (p-value = 1.52 × 10 −68 , Figure 2B). This similarity was observed between the liver of wildtype mice [39] and human liver spheroids [38], both exposed to PFOA, although at different concentrations and exposure times. Relating to cross-species correlation, our analysis revealed a strong positive correlation between the transcriptional signatures of H. sapiens and M. musculus, especially when exposed to the same PFAS molecule (Figures 2A and S2), highlighting the close evolutionary proximity between the two species. We detected interspecies positive correlations as high as 0.36 (Figure 2A), which is extremely significant (p-value = 1.52 × 10 −68 , Figure 2B). This similarity was observed between the liver of wildtype mice [39] and human liver spheroids [38], both exposed to PFOA, although at different concentrations and exposure times.  Table S3). The upper bar denotes the tissue of origin of each contrast. This correlation between the transcriptional signature of H. sapiens [38] and M. musculus [39] is driven by genes that are differentially expressed (p-value ≤ 0.001) in both species in response to PFAS exposure, as highlighted in Figure 2B. Among these genes, CYP4A11 is highly up-regulated in both species and encodes an ω-hydroxylase of the CYP450 gene family, which is involved in the metabolism of fatty acids, such as arachidonic acid. CYP4A11 is highly expressed in the liver and kidney, where it synthetizes the 20-hydroxyeicosatetraenoic acid (20-HETE) from arachidonic acid [70]. 20-HETE has been shown to have cardiotoxic and vasoconstrictive activity, and its increased synthesis is associated with vascular inflammation and hypertension [71]. Remarkably, CYP4A11 up-regulation has been associated with non-alcoholic fatty liver disease (NAFLD), since it increases the intracellular production of reactive oxygen species (ROS) and pro-inflammatory cytokines [72]. Our result is in line with data showing that exposure to PFOA is positively related to NAFLD development [73]. The other upregulated genes ( Figure 2B) are mainly implicated in lipid metabolism, mitochondrial function, and stress response, while down-regulated genes participate in immune response and inflammation, thrombosis, and cellular adhesion.
Our analysis also revealed a significant positive correlation (0.22, p-value = 7.89 × 10 −20 ) between the transcriptional signatures of H. sapiens from Rowan-Carroll et al.'s (2021) dataset [38] and D. rerio from Dasgupta et al.'s (2020) dataset [41]. Notably, this correlation is driven by the down-regulation of various genes encoding different types of collagen (Supplementary Figure S3).
In addition to positive correlations, our analysis also highlighted significant negative correlations, both between distinct species and between different tissues of the same species (Figures 1 and S1). We hypothesize that exposure to PFAS substances elicits opposite responses depending on the tissue analyzed, both within and across different species. These results might be due to histological differences in gene expression among distinct tissues, as similarly observed by Glinos and colleagues [74], where the same molecules trigger distinct transcriptional changes as demonstrated for drug-metabolizing enzymes [75]. Illustratively, the negative values were most prominently observed in fish This correlation between the transcriptional signature of H. sapiens [38] and M. musculus [39] is driven by genes that are differentially expressed (p-value ≤ 0.001) in both species in response to PFAS exposure, as highlighted in Figure 2B. Among these genes, CYP4A11 is highly up-regulated in both species and encodes an ω-hydroxylase of the CYP450 gene family, which is involved in the metabolism of fatty acids, such as arachidonic acid. CYP4A11 is highly expressed in the liver and kidney, where it synthetizes the 20-hydroxyeicosatetraenoic acid (20-HETE) from arachidonic acid [70]. 20-HETE has been shown to have cardiotoxic and vasoconstrictive activity, and its increased synthesis is associated with vascular inflammation and hypertension [71]. Remarkably, CYP4A11 up-regulation has been associated with non-alcoholic fatty liver disease (NAFLD), since it increases the intracellular production of reactive oxygen species (ROS) and pro-inflammatory cytokines [72]. Our result is in line with data showing that exposure to PFOA is positively related to NAFLD development [73]. The other up-regulated genes ( Figure 2B) are mainly implicated in lipid metabolism, mitochondrial function, and stress response, while down-regulated genes participate in immune response and inflammation, thrombosis, and cellular adhesion.
Our analysis also revealed a significant positive correlation (0.22, p-value = 7.89 × 10 −20 ) between the transcriptional signatures of H. sapiens from Rowan-Carroll et al.'s (2021) dataset [38] and D. rerio from Dasgupta et al.'s (2020) dataset [41]. Notably, this correlation is driven by the down-regulation of various genes encoding different types of collagen (Supplementary Figure S3).
In addition to positive correlations, our analysis also highlighted significant negative correlations, both between distinct species and between different tissues of the same species (Figures 1 and S1). We hypothesize that exposure to PFAS substances elicits opposite responses depending on the tissue analyzed, both within and across different species. These results might be due to histological differences in gene expression among distinct tissues, as similarly observed by Glinos and colleagues [74], where the same molecules trigger distinct transcriptional changes as demonstrated for drug-metabolizing enzymes [75]. Illustratively, the negative values were most prominently observed in fish species, where different tissues of distinct species, such as G. morhua (ovary [43]) and P. promelas (blood [49]), and of the same species, as in the case of M. salmoides (liver and testis [48]), exhibited moderate but significant negative correlations (Figures 3 and S4). For instance, the negative correlation of −0.2 between the blood sample of P. promelas exposed to PFOS at 0.5 µg/L and the ovary of G. morhua exposed to PFOS at low concentration is particularly significant with a p-value of 6.99 × 10 −49 .

of 22
species, where different tissues of distinct species, such as G. morhua (ovary [43]) and P. promelas (blood [49]), and of the same species, as in the case of M. salmoides (liver and testis [48]), exhibited moderate but significant negative correlations (Figures 3 and S4). For instance, the negative correlation of −0.2 between the blood sample of P. promelas exposed to PFOS at 0.5 µg/L and the ovary of G. morhua exposed to PFOS at low concentration is particularly significant with a p-value of 6.99 × 10 −49 .

Generation of a Cross-Species PFAS Responses
Once it was ascertained that exposure to PFAS molecules induces significantly similar transcriptional changes across different species, our primary objective was to identify which genes are most responsible for this transcriptional conservation and therefore define the molecular basis for this observed conservation. In order to overcome the uneven representation of species in our signature analysis (Table 1), we performed a weighted Stouffer integration on the signature matrix, giving equal representation to each species in our dataset. This approach enabled us to pinpoint the genes that were over-and under-expressed across all species.
We successfully identified 3435 genes appearing in at least six species of the seven

Generation of a Cross-Species PFAS Responses
Once it was ascertained that exposure to PFAS molecules induces significantly similar transcriptional changes across different species, our primary objective was to identify which genes are most responsible for this transcriptional conservation and therefore define the molecular basis for this observed conservation. In order to overcome the uneven representation of species in our signature analysis (Table 1), we performed a weighted Stouffer integration on the signature matrix, giving equal representation to each species in our dataset. This approach enabled us to pinpoint the genes that were over-and underexpressed across all species.
We successfully identified 3435 genes appearing in at least six species of the seven species included in our dataset (Figure 4). Our analysis highlights genes that are most consistently up-or down-regulated by PFASs in the dataset. Nine genes (EHHADH, RETSAT, GCLM, ACOX1, HADHB, ARHGAP27, DECR1, HADHA, and POR, depicted in orange in Figure 4) are characterized by an elevated and positive integrated signature (≥10 Stouffer integrated Z-score, corresponding to p-value ≤ 1.6 × 10 −23 ), but also by a high (≥10) signature standard deviation across our dataset; these nine genes are therefore induced by PFASs in a strong and conserved way, albeit with heavy fluctuations across contrasts (see also Supplementary Figure S5), which may indicate outlying contrasts. We then highlighted 25 genes significantly up-regulated (≥5 Stouffer integrated Z-score, corresponding to p-value ≤ 5.8 × 10 −7 ) with lower standard deviation (<10), highlighted in red in Figure 4, and including acetyl-CoA acetyltransferase 1 (ACAT1), an inhibitor of DNA binding 1 (ID1) and vascular endothelial growth factor A (VEGFA). Among genes consistently repressed by PFASs, we found eight genes (FN1, MSMO1, TTR, HMGCR, FMO5, NEB, DPYS, and COL1A2, indicated in cyan in Figure 4) with strong down-regulation across the dataset (≤ −10 Stouffer integrated Z-score, corresponding to p-value ≤ 1.6 × 10 −23 ) and high standard deviation. We also highlighted 23 genes down-regulated at a lower standard deviation (<10) with ≤ −5 Stouffer integrated Z-score, corresponding to p-value ≤ 5.8 × 10 −7 , which include the PFAS-repressed oncogene ESR1, encoding for estrogen receptor. (see also Supplementary Figure S5), which may indicate outlying contrasts. We then highlighted 25 genes significantly up-regulated (≥5 Stouffer integrated Z-score, corresponding to p-value ≤ 5.8 × 10 −7 ) with lower standard deviation (<10), highlighted in red in Figure 4, and including acetyl-CoA acetyltransferase 1 (ACAT1), an inhibitor of DNA binding 1 (ID1) and vascular endothelial growth factor A (VEGFA). Among genes consistently repressed by PFASs, we found eight genes (FN1, MSMO1, TTR, HMGCR, FMO5, NEB, DPYS, and COL1A2, indicated in cyan in Figure 4) with strong downregulation across the dataset (≤ −10 Stouffer integrated Z-score, corresponding to p-value ≤ 1.6 × 10 −23 ) and high standard deviation. We also highlighted 23 genes down-regulated at a lower standard deviation (<10) with ≤ −5 Stouffer integrated Z-score, corresponding to p-value ≤ 5.8 × 10 −7 , which include the PFAS-repressed oncogene ESR1, encoding for estrogen receptor. While useful as a summarization technique, signature integration may hide odd behaviors in the response to PFASs across different contrasts. In order to investigate this potential issue, we visualized the signature of each of the 48 genes (25 + 23) up-and downregulated by PFASs across the 7 species and 110 contrasts ( Figure 5). All prioritized genes indeed show a consistent pattern of activation. It is to be noted, however, that for the data deriving from two species, the response to PFASs is almost negligible (C. elegans and P. promelas). Genes with higher standard deviation (cyan and orange dots in Figure 4) also showed consistent response to PFASs; however, their scores were heavily dominated by    , ECH1) genes. The presence of so many genes involved in lipid metabolism confirms previous data demonstrating that this metabolic process is highly affected by PFAS exposure [32][33][34][35][36]. A peculiar finding here is the USP42 gene, which is down-regulated by PFASs across species (Figure 4). USP42 encodes a deubiquitinating enzyme involved in embryonal testis development and spermatogenesis [76], and its presence amongst the most consistently down-regulated genes may provide a molecular link to the previously observed effects of PFASs on the male reproductive system [77].

Pathway Enrichment Analysis
In order to perform a more rigorous investigation of the molecular and biological processes most affected by PFASs, we calculated the pathway enrichment contrast of the signature matrix (Supplementary File S3) using the GSEA algorithm [78]. We then integrated the normalized enrichment scores (NES) across the datasets to identify the pathways that were predominantly up-and down-regulated. We identified 3275  While useful as a summarization technique, signature integration may hide odd behaviors in the response to PFASs across different contrasts. In order to investigate this potential issue, we visualized the signature of each of the 48 genes (25 + 23) up-and down-regulated by PFASs across the 7 species and 110 contrasts ( Figure 5). All prioritized genes indeed show a consistent pattern of activation. It is to be noted, however, that for the data deriving from two species, the response to PFASs is almost negligible (C. elegans and P. promelas). Genes with higher standard deviation (cyan and orange dots in Figure 4) also showed consistent response to PFASs; however, their scores were heavily dominated by specific contrasts in M. musculus and H. sapiens (Supplementary Figure S5). The 65 genes prioritized by our analysis were found to be consistently differentially expressed not only across different species but also across different tissues. A more detailed analysis of the signatures shows that the strongest impact of PFASs is observed in the liver and reproductive system of M. musculus, H. sapiens, G. morhua and M. salmoides, together with a strong response to PFASs in the embryonal development of D. rerio.

Pathway Enrichment Analysis
In order to perform a more rigorous investigation of the molecular and biological processes most affected by PFASs, we calculated the pathway enrichment contrast of the signature matrix (Supplementary Table S3) using the GSEA algorithm [78]. We then integrated the normalized enrichment scores (NES) across the datasets to identify the pathways that were predominantly up-and down-regulated. We identified 3275 pathways significantly up-and down-regulated by PFASs across species (integrated p-adjusted ≤ 0.05). In Figure 6, we show the most significantly up-and down-regulated pathways. pathways significantly up-and down-regulated by PFASs across species (integrated padjusted ≤ 0.05). In Figure 6, we show the most significantly up-and down-regulated pathways. As inferred in the previous paragraph, lipid metabolism appears to be amongst the cellular component most up-regulated in response to PFASs (Figure 6), with the "fatty acid transporters" WikiPathways gene set characterized by a p-adjusted of 1.79 × 10 −17 and the Gene Ontology "lipid import to cell" gene set at p-adjusted = 2.80 × 10 −12 . As previously mentioned, PFASs have a significant impact on this metabolic process, for example through the induction of dyslipidemia, characterized by elevated total cholesterol plasma levels [32][33][34][35][36][37], and NAFLD [73,79], characterized by fat accumulation in the liver that leads to impaired organ function. It is important to note that children and adolescents are equally susceptible to the effects of PFAS exposure on lipid metabolism [80], as studies have reported that this group is at a higher risk of developing nonalcoholic steatohepatitis (NASH) and NAFLD [81]. A significant body of research has confirmed this effect of PFAS As inferred in the previous paragraph, lipid metabolism appears to be amongst the cellular component most up-regulated in response to PFASs (Figure 6), with the "fatty acid transporters" WikiPathways gene set characterized by a p-adjusted of 1.79 × 10 −17 and the Gene Ontology "lipid import to cell" gene set at p-adjusted = 2.80 × 10 −12 . As previously mentioned, PFASs have a significant impact on this metabolic process, for example through the induction of dyslipidemia, characterized by elevated total cholesterol plasma levels [32][33][34][35][36][37], and NAFLD [73,79], characterized by fat accumulation in the liver that leads to impaired organ function. It is important to note that children and adolescents are equally susceptible to the effects of PFAS exposure on lipid metabolism [80], as studies have reported that this group is at a higher risk of developing nonalcoholic steatohepatitis (NASH) and NAFLD [81]. A significant body of research has confirmed this effect of PFAS on lipid metabolism in human [32][33][34][35][36], mouse [37], and zebrafish [82] with comparable lipid changes observed across species. Strikingly, among the up-regulated pathways, there are some that relate to the response to gonadotropins (Gene Ontology "Cellular response to gonadotropin stimulus", p-adjusted 7.56 × 10 −14 ) and to FSH (follicle-stimulating hormone, represented by Gene Ontology term "Response to FSH" at p-adjusted 7.56 × 10 −14 ). These hormones stimulate the development and growth of the ovarian follicles, thereby affecting fertility [83]. Previous data have shown that PFAS molecules directly influence the secretion of gonadotropin-releasing hormone (GnRH), in turn promoting the expression of gonadotropins, depending on the dose and period of exposure [84].
The most significant down-regulated pathway is represented by the Gene Ontology "Tertiary granule" gene set (adjusted p-value = 7.28 × 10 −12 ). Tertiary granules are secretory granules of neutrophil cells that contain extracellular-matrix-degrading enzymes and are implicated in the inflammatory response [85]. This result highlights a possible mechanism for the immunotoxicity deriving from PFAS exposure [23][24][25].
In summary, the identified pathways underscore the complex and diverse nature of PFAS toxicity, with significant implications for lipid metabolism, immune response, and reproductive function.

Prediction of Affected Metabolites
As the last step of our analysis, we wanted to test the possibilities provided by a newly developed algorithm to infer metabolite differential abundance from gene expression data [67], based on correlation data from the largest (in terms of samples) transcriptomics/metabolomics dataset available to date, the human Cancer Cell Line Encyclopedia dataset [68]. As the method has been developed and tested only on human data, we decided to test it on all human contrasts, generating a predicted normalized NES in response to PFASs for 147 metabolites across all human PFAS vs. control comparisons (Supplementary  Table S4). We then proceeded to integrate all the PFAS-related NESs across contrasts, to provide a human-specific prediction of metabolite response to PFASs. The top 20 most significantly up-and down-regulated metabolites are displayed in Figure 7. there are some that relate to the response to gonadotropins (Gene Ontology "Cellular response to gonadotropin stimulus", p-adjusted 7.56 × 10 −14 ) and to FSH (folliclestimulating hormone, represented by Gene Ontology term "Response to FSH" at padjusted 7.56 × 10 −14 ). These hormones stimulate the development and growth of the ovarian follicles, thereby affecting fertility [83]. Previous data have shown that PFAS molecules directly influence the secretion of gonadotropin-releasing hormone (GnRH), in turn promoting the expression of gonadotropins, depending on the dose and period of exposure [84]. The most significant down-regulated pathway is represented by the Gene Ontology "Tertiary granule" gene set (adjusted p-value = 7.28 × 10 −12 ). Tertiary granules are secretory granules of neutrophil cells that contain extracellular-matrix-degrading enzymes and are implicated in the inflammatory response [85]. This result highlights a possible mechanism for the immunotoxicity deriving from PFAS exposure [23][24][25].
In summary, the identified pathways underscore the complex and diverse nature of PFAS toxicity, with significant implications for lipid metabolism, immune response, and reproductive function.

Prediction of Affected Metabolites
As the last step of our analysis, we wanted to test the possibilities provided by a newly developed algorithm to infer metabolite differential abundance from gene expression data [67], based on correlation data from the largest (in terms of samples) transcriptomics/metabolomics dataset available to date, the human Cancer Cell Line Encyclopedia dataset [68]. As the method has been developed and tested only on human data, we decided to test it on all human contrasts, generating a predicted normalized NES in response to PFASs for 147 metabolites across all human PFAS vs. control comparisons (Supplementary File S4). We then proceeded to integrate all the PFAS-related NESs across contrasts, to provide a human-specific prediction of metabolite response to PFASs. The top 20 most significantly up-and down-regulated metabolites are displayed in Figure 7.  Our analysis shows that exposure to different PFAS compounds stimulates the dysregulation of different types of lipids (triacylglycerols, phosphatidylcholines and lysophosphatidylcholines), amino acids, vitamins and coenzymes. Amongst the most up-regulated lipids, we found triacylglycerol C52:3, which is predicted to have the highest induction via PFASs (NES = 16.78, p-adjusted = 1.72 × 10 −62 ). Another compound highly up-regulated by PFASs is oxidized glutathione, with NES = 11.07 (p-adjusted = 6.95 × 10 −28 ).

Discussion
Our comprehensive analysis gathered and compared all currently available quantitative transcriptomics datasets on PFAS response in animals. The resulting data collection is heterogeneous in terms of species, compounds, concentration, time of exposure, organ and sequencing technology. However, despite this biological diversity, we detected significant recurring responses both at the gene and pathway levels, indicating a cross-compound, cross-tissue and cross-species conservation of transcriptional effects induced by PFASs.
The first important result is that there is detectable and significant cross-species transcriptomics similarity in the response to PFASs (Figures 1 and S1), with higher similarity between closer species (Figures 2, 3, S1, S2 and S4). However, some transcriptional effects induced by PFASs are conserved even in species as distantly related as human and zebrafish (Supplementary Figure S3).
Our investigation then deepened towards specific genes and pathways underlying this cross-species conservation. For example, our analysis detected a strongly conserved PFASinduced up-regulation of lipid metabolism and transport, as well as gonadotropin and FSH pathways ( Figure 6). All these processes are clearly related to ovarian development, estrogen production, ovulation and the physiological functioning of the female reproductive system [86], and this deregulation may provide molecular mechanisms to explain PFASrelated detrimental effects on fertility [26][27][28][29]83] and fetal development [87][88][89][90][91].
Another interesting finding is the conserved down-regulation of another component of ovarian development, the ESR1 gene (Figures 4 and S6). ESR1 encodes for the estrogen receptor alpha (ER-α), a nuclear receptor that influences the expression of numerous genes and physiological processes [92]. By interacting with estrogens, mainly with estradiol (E2), it affects female fertility being essential for ovulation, cellular proliferation and tissue differentiation [92]. Ovary E2/ER-α axis promotes ovulation, and a lower or absent expression of ER-α is associated with infertility [92,93]. ER-α is expressed even in kisspeptin neurons, in which the E2-ER-α interaction inhibits the activity of these neurons and the subsequent synthesis of gonadotropins in the hypothalamic-pituitary axis [94,95]. A lack of ER-α is also associated with increased synthesis of gonadotropins [96], which in turn determines the production of estradiol in the ovary [83]. ESR1 down-regulation is associated with the up-regulation of response to gonadotropins also in polycystic ovary syndrome, leading to infertility [96]. Previous studies have already shown reduced ESR1 expression and transcriptional regulatory activity in mice and humans [37,97] in response to PFAS exposure, giving further validation to our data.
There also appear to be effects of PFASs that go beyond the disruption of reproductive functionality. For example, our data show the up-regulation of the ID1 gene across species (Figures 4 and 5). ID1 encodes for an inhibitor of DNA-binding proteins, which regulates the cell cycle and differentiation. The overexpression of ID1 has been linked to various types of cancer, including leukemia, breast and pancreatic cancer [98,99]. Epidemiologic data suggest that PFASs are also associated with certain types of cancer, with some elements suggesting a pro-oncogenic effect [30]. Notably, elevated exposure to PFOA and PFOS appears to significantly increase the mortality of individuals affected by liver cancer and malignant neoplasms of lymphatic and hematopoietic tissues [31]. The finding of a conserved up-regulation of ID1 may provide molecular support to the involvement of PFAS molecules in cancer pathogenesis.
Our integrated pipeline also detected a strongly conserved down-regulation of the tertiary granule pathway (Figure 6), a component of the immune defense against microorganisms enacted by neutrophil cells [85]. Recent independent findings also suggest that PFASs affect the function of neutrophils, likely inhibiting the formation of the granules or the degranulation process [100]. More scientific literature supports the fact that PFAS exposure impairs immune reactions, antibody production and vaccination responses, particularly in children exposed to PFASs during prenatal and postnatal periods [23][24][25]. This immunotoxicity has been observed not only in humans but also in other animals [23][24][25] and can increase the incidence and severity of many pathologies, including COVID-19 [101][102][103]. In addition, PFAS exposure increases the serum concentration of inflammatory and oxidative stress markers, potentially promoting the development of systemic diseases such as liver injury and cardiovascular diseases, including atherosclerosis and thromboembolic events [104][105][106]. The size and width of our collected PFAS transcriptomics dataset provide the neutrophil tertiary granule mechanism as a strong molecular candidate behind the observed toxic effect of PFASs on the immune system.
Our analysis shows that the transcription of genes involved in lipid metabolism is significantly affected by PFAS exposure, not only in humans but also in other species (Figures 4-6). This is confirmed by previous studies, where PFAS exposure is associated with chronic dyslipidemia and increasing in lipid serum levels [32][33][34][35][36][37]. PFASs also increase the plasma levels of total cholesterol and triglycerides in a dosage-dependent manner [32][33][34][35][36]. It is worth noting that dyslipidemic changes are more pronounced in females than males [35,36] and are also observed in mice [37], as confirmed by our data. The relationship between dyslipidemia and PFASs has also been found in human children and adolescents [80], where exposure to these chemicals increases the risk of developing NASH and NAFLD [81] as well as impairing glucose metabolism [107]. Notably, we found that CYP4A11, previously associated with NAFLD [72,73], is highly up-regulated in both humans and mice, possibly indicating a causative role in NASH development due to PFAS exposure. The impact of PFAS on children is a crucial issue, and it seems that these chemicals can even be transferred through breastfeeding [17,18], which is of great concern.
Overall, our findings on the conserved pathway response to PFASs agree with the existing literature, especially concerning the disruption of lipid and energy metabolism [2,12]. While this validates our findings, it must be noted that the conservation of pathways and genes detected by our analysis is based on an animal-only dataset and, while we took measures to limit the preponderance of data from certain species (human and mouse), the available data are currently dominated by mammalians and vertebrates, with only one representative for invertebrates (C. elegans). If the future will provide more data for more species from different phylogenetic clades, it will certainly provide a more evolutionarily balanced overview of the conservation of transcriptional response to PFASs.
Using recent developments in gene expression data mining for metabolite level predictions [67], we could further analyze PFAS exposure through the prediction of their effects on the metabolome (Figure 7). In particular, the finding that PFAS molecules increase the levels of different kinds of lipids, mainly triacylglycerols as C52:3 TAG (Figure 7), is supported by studies in humans showing that PFAS exposure enhances the concentration of triglycerides and cholesterol in the blood [32][33][34][35][36]. Similarly, mice exposed to PFAS exhibit an increase in cholesterol and triglycerides in the liver [108]. Another PFAS-induced metabolite is oxidized glutathione, a thiol compound resulting from the reduction of reactive oxygen species, xenobiotics and drugs; it plays an important role in protection from oxidative stress and redox homeostasis maintenance, and its high levels are potentially toxic [109]. This induction is consistent with the previously shown PFAS-induced increase in glutathione S-transferase in the liver of Atlantic cod [110], whose increased activity is a marker of oxidative stress, and with reduced levels of reduced glutathione in human liver cells [111]. Our analysis predicted three amino acids amongst the top 10 metabolites down-regulated by PFASs: serine, lysine and glutamate (Figure 7). The down-regulation of serine agrees with the current literature displaying that serine deficiency is associated with an increase in lipid accumulation in the liver [112], a mechanism that mimics the impact of exposure to PFASs [12,47,79,108]. Lysine is an essential metabolite for a healthy pregnancy [113], and its deficiency is known to be detrimental to embryonal development [114], while glutamate is essential for embryonal neurogenesis [115]. Another metabolite predicted to be strongly down-regulated by PFAS is pantothenate (NES = −12.25, p-adjusted = 8.63 × 10 −34 ), a vitamin required for the synthesis of coenzyme-A (CoA), which is in turn essential for fatty acid and energetic metabolism [116]. Pantothenate deficiency is associated with enhanced production of reactive oxygen species and oxidative stress [116], emulating the oxidative stress stimulated by PFAS exposure [117]. In the water flea Daphnia magna, pantothenate was experimentally shown to be down-regulated by PFAS exposure [118].

Conclusions
Our study constitutes the most extensive cross-species and cross-experiment analysis of transcriptional response to PFASs to date. With our collected dataset encompassing 7 species, 11 datasets, 110 contrasts and 2144 samples, we have demonstrated significant conservation of differential expression at both gene and pathway levels. Our analysis leverages the opportunities provided by contemporary transcriptome-wide quantitative technology and reveals a general disruption of hormonal synthesis and detection mechanisms, indicating that PFASs affect an ancient and conserved metabolic hormonal network, which has implications for several components of the ecosystem. While our work focused on commonalities between PFAS compounds, future studies, both computational and experimental, and fueled by the generation of more transcriptomics datasets, will certainly provide greater precision on the specific effects of different PFAS compounds (e.g., PFOS or PFOA) on specific tissues and organisms. This will allow scientists to identify better strategies for the prevention and/or mitigation of the molecular effects of PFASs. We also believe that the identification of the most conserved genetic responders to PFASs will support future research by providing new molecular venues of investigation for PFAS effects and also novel multi-species biomarkers, fueling the creation of ecosystem-wide tests for biological PFAS exposure.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/toxics11070567/s1. Figure S1: correlation plot displaying the Pearson correlation coefficient between 110 PFAS vs. control contrasts across 11 datasets and 7 species. The color indicates the correlation coefficient, from the most negative (−1, dark blue) through no correlation (0, white) to the most positive (+1, dark red). The legend indicates the colors used to depict the eleven datasets. Figure S2: correlation plot showing the Pearson correlation coefficient between contrasts derived from M. musculus and H. sapiens datasets. The color indicates the correlation coefficient, from the most negative (−1, dark blue) through no correlation (0, white) to the most positive (+1, dark red). The legend indicates the colors used to depict the six datasets. Figure S3: scatter plot showing the positive correlation between two contrasts of D. rerio and H. sapiens. The highlighted and labeled genes are significantly (p < 0.05) and concordantly differentially expressed in response to PFAS exposure in both datasets. Figure S4: correlation plot showing the Pearson correlation coefficient between contrasts derived from fish species. The color indicates the correlation coefficient, from the most negative (−1, dark blue) through no correlation (0, white) to the most positive (+1, dark red). The legend indicates the colors used to depict the four datasets. Figure S5: line graph indicating the levels of expression of selected genes in response to PFAS molecules in different species, characterized by absolute integrated signature ≥ 10 and standard deviation ≥ 10. Each line is one gene: the genes shown here are the most consistently up-or down-regulated with high standard deviation, as extracted from the orange and cyan points of Figure 4. x-axis reports all the 110 contrasts analyzed in the integrated dataset, grouped by species. y-axis reports the signature for each gene, representing the significance (and sign) of the gene's transcriptional response to PFAS. The horizontal lines delimit the p-value thresholds of 0.05. Figure S6. Line graphs showing the differential expression of selected genes across PFAS exposure: ACAT1 and UGT2A3 in panel A, and RPL35 and ESR1 in panel B. y-axis reports the signature for each gene, representing the significance (and sign) of the gene's transcriptional response to PFAS, as −log10(p) × sign(log(FC)). The horizontal lines delimit the p-value thresholds of 0.05. Table S1: number and symbols of differentially expressed genes (at p-value ≤ 0.05) in all 110 PFAS versus control contrasts of the dataset. Table S2: table depicting pairwise inter-species orthologous conversions adopted in the current study. Table S3: table showing all signature values calculated in the combined dataset, where genes are shown as rows, and contrasts as columns. Abbreviations of each contrast are shown in the "Legend" Table S4: table showing   Data Availability Statement: All data used in this study is referenced (see Table 1) and publicly available in locations described in the Section 2.