Transcriptome analysis reveals the importance of the immune system during early pregnancy in sheep

The majority of pregnancy loss in ruminants occurs during the preimplantation stage, which is thus the most critical period determining reproductive success. While ovulation rate is the major determinant of litter size in sheep, interactions among the conceptus, corpus luteum and endometrium are essential for pregnancy success. To evaluate the role of reproductive tract function in sheep fertility, we performed a comparative transcriptome study by sequencing total RNA (mRNA and miRNA) from corpus luteum (CL) and endometrium tissues collected during the preimplantation stage of pregnancy in Finnsheep, Texel and F1 crosses. A total of 21,287 genes and 599 miRNAs were expressed in our dataset. Ten out of the top 25 most highly expressed genes were shared across tissues, indicating the complementary functions of the CL and endometrium. Moreover, highly expressed autosomal genes in the endometrium and CL were associated with biological processes such as progesterone formation (STAR and HSD3B1) in the CL and facilitation of maternal recognition of pregnancy, trophoblast elongation and implantation (LGALS15, CST3, CST6, and EEF1A1) in the endometrium. In the CL, a group of sialic acid-binding immunoglobulin (Ig)-like lectins (Siglecs), solute carriers (SLC13A5, SLC15A2, SLC44A5) and chemokines (CCL5, CXCL13, CXCL9) were upregulated in Finnsheep, while several multidrug resistance-associated proteins (MRPs) were upregulated in Texel ewes. We also identified a novel ERV gene located in a reduced FecL locus that is associated with sheep prolificacy and is upregulated in prolific Finnsheep. Moreover, we report, for the first time in any species, several genes that are active in the CL during early pregnancy (including SIGLEC13, SIGLEC14, SIGLEC6, MRP4, and CA5A). Importantly, functional analysis of differentially expressed genes suggested that Finnsheep have a better immune system than Texel and that high prolificacy in Finnsheep might be governed by immune system regulation. Taken together, the findings of this study provide new insights into the interplay between the CL and the endometrium in gene expression dynamics during early pregnancy. The data and results will serve as a basis for studying this highly critical period of pregnancy, which has wide significance in mammalian fertility and reproduction.


1
Abstract 17 The majority of pregnancy loss in ruminants occurs during the preimplantation stage, which is thus 18 the most critical period determining reproductive success. While ovulation rate is the major 19 determinant of litter size in sheep, interactions among the conceptus, corpus luteum and endometrium 20 are essential for pregnancy success. To evaluate the role of reproductive tract function in sheep 21 fertility, we performed a comparative transcriptome study by sequencing total RNA (mRNA and 22 miRNA) from corpus luteum (CL) and endometrium tissues collected during the preimplantation 23 stage of pregnancy in Finnsheep, Texel and F1 crosses. A total of 21,287 genes and 599 miRNAs 24 were expressed in our dataset. Ten out of the top 25 most highly expressed genes were shared across 25 tissues, indicating the complementary functions of the CL and endometrium. Moreover, highly 26 expressed autosomal genes in the endometrium and CL were associated with biological processes 27 such as progesterone formation (STAR and HSD3B1) in the CL and facilitation of maternal 28 recognition of pregnancy, trophoblast elongation and implantation (LGALS15, CST3, CST6, and 29 EEF1A1) in the endometrium. In the CL, a group of sialic acid-binding immunoglobulin (Ig)-like 30 lectins (Siglecs), solute carriers (SLC13A5, SLC15A2, SLC44A5) and chemokines (CCL5, CXCL13, 31 CXCL9) were upregulated in Finnsheep, while several multidrug resistance-associated proteins 32 (MRPs) were upregulated in Texel ewes. We also identified a novel ERV gene located in a reduced 33 FecL locus that is associated with sheep prolificacy and is upregulated in prolific Finnsheep. 34 Moreover, we report, for the first time in any species, several genes that are active in the CL during Finnish landrace, one of the most highly prolific breeds, has been exported to more than 40 countries 50 to improve local breeds, although the heritability of ovulation rate is low (Hanrahan and Quirke, 51 1984). In recent years, a FecG F (V371M) mutation in gene GDF9 has been identified to be strongly 52 associated with litter size in Finnsheep and breeds such as the Norwegian White Sheep, Cambridge 53 and Belclare breeds, which were developed using Finnsheep (Hanrahan et  The success of pregnancy establishment in sheep and other domestic ruminants is determined at the 56 preimplantation stage and involves coordination among pregnancy recognition, implantation and 57 placentation, in which the corpus luteum (CL) and endometrium play vital roles (Geisert et al., 1992; 58 Spencer et al., 2004bSpencer et al., , 2007. The preimplantation stage of pregnancy is the most critical period in 59 determining the litter size because of the high embryo mortality during this period. It has been shown 60 that most embryonic deaths occur before day 18 of pregnancy in sheep (Quinlivan et al., 1966;Bolet, 61 1986; Rickard et al., 2017). However, due to the biological complexity of the process and to technical 62 difficulties, embryo implantation is still not well understood. 63 The CL is an endocrine structure whose main function is to synthesize and secrete the hormone 64 progesterone. Progesterone production is essential for the establishment of pregnancy. However, if 65 pregnancy is not established, the CL will regress as a result of luteolysis, and a new cycle will begin. 66 The endometrium is the site of blastocyst implantation, but its function is not limited to implantation. 67 The outer lining of the endometrium secretes histotroph, a complex mixture of enzymes, growth 68 factors, hormones, transport proteins and other substances that are key to conceptus survival and 69 implantation, pregnancy recognition signal production and placentation (Spencer and Bazer, 2004; 70 Forde et al., 2013). In addition, the endometrium also plays an important role in regulating the 71 estrous cycle (Spencer et al., 2008). 72 The whole-transcriptome profiling approach enables a deeper understanding of the functions of both 73 the CL and endometrium, which may allow the identification of genes and markers that are 74 differentially expressed, for example, between breeds showing different litter size phenotypes. 75 Although most of the studies associated with early pregnancy have been performed in sheep (Spencer 76 et al. expression data with genome-wide association studies (GWASs) to understand the roles of CL and 84 endometrium transcriptomes in dairy cattle fertility. A study by (Kfir et  prior to implantation (current study). After ovary removal (Pokharel et al., 2018), the ewes were 108 mated using two Finnsheep rams, and the pregnant ewes were slaughtered during the preimplantation 109 phase of the pregnancy when the embryos were estimated to be one to three weeks old. At the 110 slaughterhouse, a set of tissue samples (the pituitary gland, a CL, oviductal and uterine epithelial  111 cells, and preimplantation embryos) were collected and stored in RNAlater reagent (Ambion/Qiagen, 112 Valencia, CA, USA) following the manufacturer's instructions. Of the collected tissue samples, CL 113 and endometrium tissues were subjected to current study. Endometrial samples were collected from 114 the uterine horns with a cytobrush, which was rinsed in a tube containing RNAprotect Cell Reagent 115 (Qiagen, Valencia, CA, USA). One of the CLs was dissected from each ovary. For the present study, 116 and particularly for the RNA-Seq of the endometrium and CL, six ewes each from the Finnsheep, 117 Texel and F1 cross groups were included. Therefore, out of 31 ewes that were originally included in 118 the main experiment, only 18 have been considered here. The experimental design have been 119 described in more detail in an earlier study (Pokharel et al., 2018). 120

Library preparation and sequencing 121
Both mRNA and miRNA were extracted from the tissues using an RNeasy Plus Mini Kit (Qiagen,  122 Valencia, CA, USA) following the manufacturer's protocol. The details on RNA extraction have 123 been described previously (Hu et Illumina TruSeq indexing adapters were  129  ligated to each sample during an adapter ligation step to enable pooling of multiple samples into one  130  flow cell lane. The quality and concentrations of the libraries were assessed with an Agilent  131 Bioanalyzer 2100 and by Qubit® Fluorometric Quantitation, Life Technologies, respectively. All 132 samples were normalized and pooled for automated cluster preparation at an Illumina cBot station. 133 High-quality libraries of mRNA and miRNA were sequenced with an Illumina HiSeq 2000 134 instrument using paired-end (2x100 bp) and single-end (1x50) sequencing strategies, respectively. 135

Data preprocessing and mapping for mRNA 136
The raw reads were assessed for errors and the presence of adapters using FastQC v0.11.6 (Simon 137 Andrews). As we noticed the presence of adapters, Trim Galore v0.5.0 (Felix Krueger; Martin, 2011) 138 was used to remove the adapters and low-quality reads and bases. The transcripts were quantified 139 under the quasi-mapping-based mode in Salmon v0.11.2 (Patro et al., 2017). We extracted the 140 FASTA sequences (oar31_87.fa) of the sheep transcriptome (oar31_87.gtf) using the gffread utility 141 (Trapnell et al., 2010) and built the transcriptome index. The resulting index was used for transcript 142 quantification (also known as pseudo alignment) of the RNA-Seq reads. 143

Data preprocessing and analysis for miRNA 144
The raw sequence data were initially screened to obtain an overview of the data quality, including the 145 presence or absence of adapters, using FastQC v0.11.6 (Simon Andrews). Next, the Illumina adapters 146 and low-quality bases were removed using Trim Galore v0.5.0 (Felix Krueger; Martin, 2011). In 147 addition, reads that were too short (having fewer than 18 bases) after trimming were also discarded. 148 To reduce downstream computational time, high-quality reads were collapsed using Seqcluster 149 v1.2.4a7 (Pantano et al., 2011). The FASTQ output from Seqcluster was first converted into a 150 FASTA file. The FASTA header was reformatted by including a sample-specific three letter code, 151 which is also a requirement for miRDeep2 analysis. For instance, ">A01_1_x446 A01" represents 152 sample C1033, whose first read was repeated 446 times. 153 The collapsed reads were mapped against the ovine reference genome (oar v3.1) using Bowtie 154 (Langmead et al., 2009). The Bowtie parameters were adjusted so that (1.) the resulting alignments 155 had no more than 1 mismatch (-v 1); (2.) the alignments for a given read were suppressed if more 156 than 8 alignments existed for it (-m 8); and (3.) the best-aligned read was reported (--strata, --best). 157 The alignment outputs (in SAM format) were coordinate-sorted and converted to BAM files. The 158 sorted BAM files were converted to the miRDeep2 ARF format using the "bwa_sam_converter.pl" 159 script. 160 miRDeep2 v2.0.0.5 (Friedländer et al., 2012) was used to identify known ovine miRNAs and to 161 predict conserved (known in other species) and novel ovine miRNAs. Before running the miRDeep2 162 pipeline, we merged both the collapsed FASTA files and the mapped ARF files. Furthermore, hairpin 163 and mature sequences of all species were extracted from miRBase v22 (Kozomara and Griffiths-164 Jones, 2011, 2014). The extracted sequences were grouped into mature ovine sequences, ovine 165 hairpin sequences, and mature sequences for all species except sheep. The results from miRDeep2 166 were further processed to compile a list of all known and novel miRNAs. For novel and conserved 167 miRNAs, we designated provisional IDs that included the genomic coordinates of the putative mature 168 and star sequences. For mRNA-Seq data, the gene expression estimates from Salmon were used to identify DEGs. The 171 Salmon-based transcript counts were summarized to gene level estimates using tximport (Soneson et  172 al., 2016).DESeq2 (Love et al., 2013) was used to compare gene expression differences. We started 173 by considering both tissues, but after observing the high variation between the endometrial samples, 174 we decided to analyze the two tissues separately. Furthermore, PCA plot ( Fig. 2A) illustrated a high 175 variation in gene expression estimates between the endometrial samples while tight clustering of the 176 CL samples. Therefore, owing to sampling bias (explanation in results and discussion section), we 177 did not proceed with differential gene expression analysis on endometrial samples. All replicates 178 were collapsed before running DESeq. We set the filtering criteria for significant DEGs to an 179 adjusted p-value of 0.1 (padj <1) and an absolute log2(fold change) of greater than 1 180 (abs(log2Foldchange)>1). All the significant DEGs were annotated with Bioconductor biomaRt 181 (Durinck et al., 2005) to retrieve additional information (gene name, gene description, Entrez ID, 182 human ortholog and chromosome number). 183 From the list of miRNAs discovered with miRDeep2, those with a minimum count of 10 reads across 184 all samples were considered for expression analysis. We used DESeq2 for expression analysis, in 185 which the technical replicates of three samples (C107, C4271 and C312) were collapsed prior to 186 running the DESeq command. Because of the sampling bias in endometrium samples, we did not 187 conduct breed wise differential expression analysis for endometrium samples. The differentially 188 expressed miRNAs with adjusted p-values less than 0.1 were regarded as significant. test with the Bonferroni step-down correction method. We used a custom reference set that included 196 a list of all the expressed genes in our dataset. We also modified the default GO and pathway 197 selection criteria in such a way that a minimum of three genes and three percent of genes from a 198 given GO or KEGG pathway should be present in the query list. Furthermore, GO terms with a 199 minimum level of three and a maximum level of 5 were retained. 200

Manual annotation of select genes 201
When we noticed that many genes lacked gene annotations, we manually annotated those among the 202 top 25 most highly expressed genes in each tissue and the significant DEGs. First, we extracted the 203 coding sequence of each novel gene using Ensembl BioMart. All genes that had coding sequences 204 were BLASTed against the nonredundant (NR) nucleotide database. For the BLAST-based 205 annotation, we chose the hit with the highest coverage and the highest percentage of sequence 206 identity to the query sequence. Gene IDs that lacked coding sequences (CDSs) were queried back to 207 the Ensembl database to retrieve existing information. Throughout the paper (including in the 208 supplementary data files), the genes that were annotated based on the BLAST results have been 209 marked with an asterisk (*), while those that were annotated based on information available in 210 Ensembl are marked with a hash (#). CLs (Supplementary Table S1). F1 showed phenotypes closer to those of Finnsheep than those of 216 Texel, having 3.75 CLs on average (Supplementary Table S1). We did not observe more than 2 CLs 217 in the Texel group or fewer than 3 CLs in Finnsheep or F1 cross-bred. Similarly, on average, 218 Finnsheep had the highest number of embryos (n=2.6), followed by F1 crosses (n=1.8) and Texel 219 (n=1.5). The F1 crosses displayed phenotypes similar to those of Finnsheep; this was unsurprising, as 220 we observed a similar pattern in an earlier study (Pokharel et al., 2018). Interestingly, the embryo 221 survival rate in Texel where 1.5 embryos were present from 1.7 CLs (88%) on average. On the other 222 hand, Finnsheep (63%) and F1 cross (48%) had remarkably low embryo survival rate. While these 223 findings are based on fewer animals, the results are in line with earlier studies (Rhind et al., 1980;224 Silva et al., 2016) where typically higher litter size is associated with higher embryo mortality and 225 vice versa. It would be of great interest to determine if productivity follows the same pattern in F2 226 (i.e., F1 x F1) crosses, backcrosses and presumably also in a reciprocal cross. 227

RNA-Seq data 228
From the 42 libraries (21 from each tissue, including three technical replicates), 4.4 billion raw reads 229 were sequenced, of which 4.2 billion clean reads were retained after trimming. The summary 230 statistics from Trim Galore revealed that up to 3.6% of the reads were trimmed, with reverse-strand 231 reads having a comparatively higher percentage of trimmed bases. However, the percentage of reads 232 that were excluded for being shorter than 18 bp was always less than 1% across all samples 233 (Supplementary table S2). Up to 70% of the high-quality reads were mapped to the ovine reference 234 transcriptome (Ensembl release 92). 235

Gene expression in CL and endometrium 236
A total of 21,287 gene transcripts were expressed in the whole data set, of which 1,019 and 959 were 237 specific to the endometrium and CL, respectively. Genes such as cytochrome P450, family 11, 238 subfamily A, polypeptide 1 (CYP11A1), C-C motif chemokines (CCL21, CCL26), serpin family A 239 members (SERPINA1, SERPINA5), inhibin subunit alpha (INHA) and paternally expressed 10 240 (PEG10) were specific to CL while genes related with solute carriers (SLC44A4, SLC7A9, 241 SLC34A2), ERVs were endometrium-specific (Table 1). Further grouping of the expressed genes 242 showed that the most genes (n =19,440) were expressed in endometrium samples of Texel, while the 243 fewest genes (n =19,305) were expressed in endometrium samples of F1 crosses indicating s high 244 variation within the tissue. The cumulative difference in the number of genes in different samples and 245 tissues might be due to transcriptional noise. Nevertheless, the total number of genes expressed in 246 these tissues is comparatively higher than that in ovaries (Pokharel et al., 2018). As shown in Fig. 1, 247 the highest number of breed-specific genes expressed in the CL was found in Finnsheep (n=254), 248 followed by F1 crosses (n=204) and Texel (n=199). Similarly, from endometrium samples, we 249 observed the highest number of unique genes in F1 crosses (n=284), followed by Finnsheep (n=244) 250 and Texel (n=201). In a pairwise comparison, based on overall gene expression, Finnsheep and Texel 251 shared a higher number of genes (n=260) in the CL than in the endometrium. Moreover, Finnsheep 252 and F1 crosses were found to share relatively more common genes (n=278) than the other pairs ( Fig.  253 1). 254 Several Igs were expressed in the endometrium samples. Igs are heterodimeric proteins that belong to 255 the Ig superfamily (IgSF) (Williams and Barclay, 1988). Igs are composed of two heavy and two light chains, and the light chain may further consist of a κ or λ chain (Williams and Barclay, 1988). 257 Interestingly, the structure and organization of the genes enable Igs to be receptive to a virtually 258 unlimited array of antigens rather than being limited to a fixed set of ligands (Honjo, 1983 acetylgalactosaminyltransferase 2 (B4GALNT2) and insulin-like growth factor 2 mRNA-binding 284 protein 1 (IGF2BP1), and a pseudogene, ezrin-like protein, have been identified to exist in that 285 region; our results have added one more gene. In addition to the finding that the best hit was related 286 to the FecL locus, the gene appeared to be an ERV, as we noticed that the query gene had 98% 287 sequence identity with a partial sequence of the endogenous-virus beta-2 pro/pol region (see also Although we observed considerable overlap of genes between tissues, principal component analysis 294 (PCA) of the 500 most highly expressed genes clearly indicated two distinct groups ( Fig. 2A). 295 Similarly, a heatmap plot based on the top 25 genes with the highest levels of gene expression 296 variation across all samples showed a similar pattern (Fig. 2B). However, we did not observe any 297 breed-specific clusters in either of the tissues, which was also the case in our earlier ovarian 298 transcriptome study (Pokharel et al., 2018). In addition to distinctiveness in terms of gene expression, 299 the PCA plot also revealed that the CL samples appeared to be more homogeneous than the 300 endometrium samples. The two sub-clusters within the endometrium cluster is linked to the age of 301 the embryo (i.e. days after mating) and indicated the experimental bias. More importantly, sampling 302 bias owing to difference in days of collecting endometrium biopsies was apparent in the gene 303 expression. In general, samples in the upper right were older (13-16 days) compared to those in the 304 endometrium. Plakophilin 2 (PKP2) and 3 (PKP3), and plakophilin-1-like were also upregulated in 328 endometrium. 329

Most highly expressed genes 330
One of the most interesting findings was the significant interplay between the CL and endometrium 331 during the preimplantation phase, as revealed by the most highly expressed genes. To obtain an 332 overview of the most abundant genes in each tissue, we selected the top 25 genes (Table 2). We 333 noticed that fifteen out of the top 25 genes were shared in both tissues, and the majority (9 out of 15) 334 were mitochondrial genes. Mitochondrial genes play prominent roles during reproduction. We have 335 also observed high levels of expression of mitochondrial genes in ovaries during the follicular growth 336 phase (Pokharel et al., 2018). Six shared autosomal genes also appeared to play substantial roles 337 during the preimplantation stage. Six genes (NUPR1, BCL2L15, CST3, CST6, S100G, and OST4; see Table 2 for descriptions) specific 358 to the endometrium and one gene (B2M) common to both the CL and endometrium were also found 359 to be highly abundant in a recent study in which the authors compared gene expression changes in 360 the luteal epithelium and glandular epithelium during the peri-implantation stage in sheep (Brooks et  361 al plays an important role during peri-implantation and throughout pregnancy (Kendrick, 2000). OXT 383 signaling is known to be influenced by progesterone, but the mechanism underlying the regulation is This is a provisional file, not the final typeset article Davis and LaVoie, 2018). Two of these major genes involved in progesterone synthesis (STAR and 399 HSD3B1) were ranked among the top 25 most highly expressed autosomal genes, while CYP11A1 400 (TPM=2,080), FDXR (TPM=86) and FDX1 (TPM=1,206) were also highly expressed. 401

Breed wise gene expression differences in the CL 402
Because of the sampling bias owing to the Overall, the CL appeared to display higher levels of gene 403 expression differences between the breeds than the endometrium ( were downregulated. However, 91 out of the 199 genes lacked annotations (i.e., a gene name and 417 gene description). We were able to retrieve the CDSs for 82 genes and employed a BLAST search 418 against the NR database. Based on the Ensembl ncRNA prediction system, two out of nine genes that 419 lacked CDSs were predicted to be miRNAs, and the rest were lincRNAs. 420 In the list of DEGs, we observed a few cases in which more than one gene from the same family was 421 present. All eight genes related to multidrug resistance-associated proteins (MRPs) were upregulated 422 in Texel ewes, of which seven genes were type 4, while one was type 1. Out of 140 genes that were upregulated in Finnsheep, 50 genes (35.71%) were associated with 48 451 different GO terms (Fig. 3, Supplementary table S5) within the biological processes category, 452 whereas 90 genes lacked GO annotations. The upregulated genes were associated with positive 453 regulation of several processes such as "T cell migration", "cytokine production", "defense 454 response", "immune system process", "interferon-gamma production", "leukocyte chemotaxis", 455 "response to external stimulus", , "cell-cell adhesion" and "cytokine-mediated signaling pathway". 456 Some biological processes potentially associated with implantation were "maintenance of location", 457 "plasma membrane invagination", "import into cell", "chemotaxis", and "receptor internalization". 458 Other biological processes, such as "response to bacterium", "response to lipopolysaccharide", 459 "lymphocyte-mediated immunity" and "chemotaxis", could be associated with adaptation of 460 Finnsheep to the rugged Finnish climate and with disease resistance. In summary, genes involved in 461 the immune response were upregulated in Finnsheep CL during early pregnancy. 462 Similarly, only 40 of the 140 genes upregulated in Finnsheep were associated with 29 KEGG 463 pathways (Supplementary fig. 1, Supplementary table S6). The majority of the pathways were 464 associated with diseases; "tryptophan metabolism", "cell adhesion molecules (CAMs)", "Th1 and 465 Th2 cell differentiation" and "Th17 cell differentiation" appeared to play roles in implantation. Out 466 of 59 genes that were downregulated in Finnsheep, 17 and 14 genes were associated with GO IDs 467 and KEGG pathways, respectively. However, after applying our selection criteria (GO terms with 468 minimum level of 3 and maximum level of 5 whereby minimum of 3 genes and 3 % of genes from 469 given GO terms, also see methods section), only one biological process, "negative regulation of 470 endopeptidase activity" (associated genes: COL28A1, LOC101104482, and SLPI) and one KEGG 471 pathway, "bile secretion" (associated genes: This is a provisional file, not the final typeset article genes had available functional annotations and were associated with nine different GO terms (Fig. 5,  489 Supplementary table S8). The majority of the GO terms were related to transport ("anion transport", 490 "lipid transport", "organic anion transport", and "fatty acid transport") and regulation ("regulation of 491 lipid transport", "regulation of homotypic cell-cell adhesion", "negative regulation of T cell 492 activation", and "regulation of lipid localization"). 493 Altogether, 20 out of the 49 genes upregulated in Finnsheep vs F1 crosses were linked to KEGG 494 pathways. Based on the selection criteria, only two KEGG pathways, namely, "complement and 495 coagulation cascades" (associated genes C5AR1, F13A1, and VSIG4) and "Fc gamma R-mediated 496 phagocytosis" (associated genes : FCGR1A, SCIN, and SYK) (miRBase 21) after four years, and the overall number of miRNA sequences increased by over a 534 third. However, the number of sheep miRNAs remained the same. Moreover, studies that produce 535 miRNA datasets have been scarce. As of April 2019, miRNA datasets from only three studies were 536 available in the European Nucleotide Archive (ENA) database, with accession codes PRJNA308631 537 (n=3), PRJEB22101 (n=37) and PRJNA414087 (n=40); the PRJEB22101 dataset was from the first 538 phase of this study (Pokharel et al., 2018). In the current study, we quantified over threefold more 539 sheep miRNAs (n=599) than are available in miRBase. Therefore, these miRNAs will certainly 540 improve the existing resources and will be valuable in future studies. 541 We did not perform differential gene expression analysis on the endometrial samples because of the 542 sampling bias. Two miRNAs, both upregulated in Finnsheep, were significantly differentially 543 expressed between the pure breeds in the CL, while the other comparisons did not reveal any 544 significantly differentially expressed miRNAs. Of these two significantly differentially expressed 545 miRNAs, rno-miR-451-5p is a conserved miRNA similar to one found in rats (Rattus norvegicus). 546 The other, oar-18_757_mt, is a novel miRNA expressed on chromosome 18. Chromosomal 547 placement of the quantified miRNAs revealed a large cluster of miRNAs on chromosome 9 that we 548 also observed in the ovaries (Fig.5). 549

Limitations and thoughts for future studies 550
We acknowledge certain limitations of this study. We believe that with the availability of a better 551 annotated reference genome, the data from this study will reveal additional information that we may 552 have missed in this analysis. With sequencing costs becoming increasingly inexpensive, increasing 553 the sample size of each breed group would certainly add statistical power. Given that time-series 554 experiments are not feasible with the same animal, sampling could be performed with a larger group 555 of animals at different stages of pregnancy to obtain an overview of gene expression changes. One 556 ovary from each ewe was removed earlier (Pokharel et al., 2018) and all the CLs for this study was 557 collected from the remaining ovary. Therefore, there might be some impact due to possible negative 558 feedback effects on overall gene expression. It should be noted that overall gene expression and, 559 more specifically, differential expression between breeds is inherently a stochastic process; thus, 560 there is always some level of bias caused by individual variation (Hansen et al., 2011). After noticing 561 the bias caused by pregnancy length on endometrium, we were unable to proceed further with breed-562 wise differential expression analyses. Including more individuals in future experiments as well as 563 more controlled tissue sampling will minimize such bias. We are also aware that the overall gene 564 expression profiles may have been affected by the absence of one CL that was removed for earlier 565 study, (Pokharel et al., 2018). Having said that, as CL was removed from all ewes in current study, 566 we do not expect any bias in the gene expression comparison. The results from breeding experiments 567 have shown that productivity traits such as litter sizes may not carry on to F2 crosses (F1 x F1) 568 and/or backcrosses. Therefore, future experiments that involve F2 crosses and backcrosses would 569 provide more valuable findings related to prolificacy. Moreover, by doing a reciprocal cross 570 experiment, we might be able to get insight into POE and measure the potential contribution of the 571 Texel and Finnsheep in each cross. In addition, replicating such experiments in different 572 environments would be relevant for breeding strategies to mitigate the effects of climate change. To 573 minimize or alleviate noise from tissue heterogeneity, single-cell experiments may prove beneficial 574 in future studies. While we observed interplay between the endometrium and the CL, it would be 575 equally interesting to measure the transcriptional patterns in the embryo. Finally, the application of 576 gene-modifying technologies such as CRISPR/Cas9 to edit certain regions (such as the region in the 577 FecL locus homologous to the partial retrovirus sequence) may provide important insights into 578 phenotypes associated with infertility, prolificacy and other traits of interest. 579

5
Conclusion 580 This is a provisional file, not the final typeset article We compiled the most comprehensive list thus far of genes (n=21,287) and miRNAs (n=599) 581 expressed in the CL and endometrium, which are the most important tissues during the 582 preimplantation stage and therefore determine the success of pregnancy in sheep. Our results agree 583 well with the (limited) existing reports, which are mainly focused on the interplay of the 584 endometrium and conceptus, but we have shown that the CL plays an equally important role. The 585 relative scarcity of transcriptomic information about the CL means that its functional importance is 586 underrated. We identified several key transcripts, including coding genes (producing mRNA) and 587 noncoding genes (miRNAs, snoRNAs, and lincRNAs), that are essential during early pregnancy. 588 Functional analysis primarily based on literature searches and earlier studies revealed the significant 589 roles of the most highly expressed genes in pregnancy recognition, implantation and placentation. F1 590 crosses were more closely related to Finnsheep than to Texel, as indicated by phenotypic and gene 591 expression results that need to be validated with additional experiments (with F2 crosses and 592 backcrosses). Several genes with potential importance during early pregnancy (including SIGLEC13, 593 SIGLEC14, SIGLEC6, MRP4, and CA5A) were reported in the CL for the first time in any species. 594 The roles of retroviruses during early pregnancy and in breed-specific phenotypes were indicated by 595 the observed gene expression dynamics, especially in the endometrium. A novel gene sharing 596 similarity with an ERV was identified in the FecL locus. The results from this study show the 597 importance of the immune system during early pregnancy. We also highlight the need for improved 598 annotation of the sheep genome and emphasize that our data will certainly contribute to such 599 improvement. We observed a cluster of miRNAs on chromosome 18 homologous to that found on 600 chromosome 14 in humans. Taken together, our data provide new information to aid in understanding 601 the complex reproductive events during the preimplantation period in sheep and may also have 602 implications for other ruminants (such as goats and cattle) and mammals, including humans. 603 Conflict of Interest 608 The authors declare that the research was conducted in the absence of any commercial or financial 609 relationships that could be construed as potential conflicts of interest.   novel ERV identified in this study belongs to FecL locus and is located between B4GALNT2 and 979 Enzrin-like protein.   The table includes  992 the Ensembl gene ID, chromosome number (Chr.), gene ID (GeneID) and gene description. The table  993 is divided into three sections; the first section lists the 25 genes that were shared by the two tissues, 994 and the other two list the remaining 10 genes in the endometrium and CL. Gene IDs and annotations 995 that were not available in BioMart were retrieved based on a homology search using the nucleotide 996 BLAST (marked with an asterisk, "*") or on information available in Ensembl (marked with a hash, 997 "#"    Table S1: Phenotype data of the samples 1008