CAG Repeat Instability in the Peripheral and Central Nervous System of Transgenic Huntington’s Disease Monkeys

Huntington’s Disease (HD) is an autosomal dominant disease that results in severe neurodegeneration and has no cure. HD is caused by an expanded CAG trinucleotide repeat (TNR) in the huntingtin gene (HTT). Although somatic and germline expansion of the CAG repeat has been well documented, the underlying mechanisms have not been fully delineated. Longer CAG repeats are associated with a more severe phenotype, greater TNR instability, and earlier age of onset. This direct relationship between CAG repeat length and molecular pathogenesis makes TNR instability a useful measure of symptom severity and tissue susceptibility. We therefore examined tissue-specific TNR instability in transgenic nonhuman primate models of HD. Our data show similar profiles of CAG repeat expansion in rHD1 and rHD7, with high instability in the testis, liver, caudate, and putamen. CAG repeat expansion was observed in all tissue samples and was both tissue- and CAG repeat size-dependent. Correlation analysis between CAG repeat expansion and gene expression profiles across tissues identified four genes, clusterin (CLU), transferrin (TF), ribosomal protein lateral stalk subunit P1 (RPLP1), and ribosomal protein L13a (RPL13A), whose expression correlated strongly with CAG repeat instability. Overall, our data, together with previously published studies, can be used to study the biology of CAG repeat instability and to identify new therapeutic targets.


Introduction
Huntington's Disease (HD) is an autosomal dominant disease that results in severe neurodegeneration and cognitive, behavioral, and motor decline, followed by death approximately 10–15 years after diagnosis [1–3]. HD is caused by an expanded CAG trinucleotide repeat (TNR) in the huntingtin gene (HTT) [1,4], which results in a longer polyglutamine (polyQ) tract and misfolding of the huntingtin (HTT) protein. Expansion of the CAG repeat tract in exon 1 of HTT produces expanded polyglutamine-containing fragments that form aggregates in the cell [5]. Huntingtin is a large (>340 kDa), α-helical protein composed of HEAT (huntingtin, elongation factor 3, protein phosphatase 2A, and lipid kinase TOR) repeats [6]. The expansion promotes oligomerization, which disrupts cellular functions and impairs proteostasis, eventually altering neural function [5,7,8].
The functions of wild-type (WT) HTT, though not fully understood, are essential in neurogenesis and the prevention of cell death [9,10]. The underlying mechanisms of monkey and rodent models [25,26]. However, CAG repeat expansion was observed in HD monkey sperm, whereas only limited expansion was observed in some HD rodent models.
The aim of this study was to further investigate the tissue or cell-type specificity of CAG expansion in our transgenic HD monkeys and to determine whether our model recapitulates human pathology. We examined CAG repeat instability in postmortem tissues of transgenic HD monkeys and identified genes expressed across various tissues whose expression correlates with CAG repeat expansion, which could provide new insight into the underlying mechanisms of CAG repeat instability and expansion.

Materials and Methods
Animals: Four rhesus macaques, two transgenic HD monkeys (rHD1 and rHD7) and two wild-type (WT) controls, were used in this experiment. The two HD monkeys carried transgenes with different CAG repeat lengths in exon 1 of HTT, regulated by the human ubiquitin C promoter (rHD1) and the human HTT gene promoter (rHD7). Both HD monkeys were euthanized at five years of age [45,46,55].
DNA Isolation: An approximately 0.5 cm³ sample of tissue was used for DNA extraction, which was performed with a Maxwell® 16 Tissue DNA Purification Kit (Promega, Madison, WI, USA). DNA concentration and purity were measured using a NanoDrop™ 2000 (ThermoFisher, Waltham, MA, USA).
Data Analysis: PCR products were sent to the Emory Integrated Genomics Core for GeneScan analysis. In total, 1.5 µL of PCR product was mixed with 0.5 µL of GeneScan™ 500 ROX™ (ThermoFisher, Waltham, MA, USA) and 9.5 µL of Hi-Di™ Formamide (Applied Biosystems). Samples were denatured at 95 °C for 5 min and run on a 3130xl Genetic Analyzer (Applied Biosystems). The data were analyzed using GeneMarker® (SoftGenetics). From the electropherograms, only peaks with a height above 10% of the highest peak were included in calculations. The expansion index was calculated by modifying the instability index [13] as follows:
$$\mathrm{Expansion\ index} = \sum_{i} \frac{h_i}{\sum_{j} h_j}\,\Delta\mathrm{TNR}_i$$
where $h_i$ is the height of peak $i$, the sums run over the retained peaks, and $\Delta\mathrm{TNR}_i$ is the change in repeat number of peak $i$ from the reference allele. Instead of measuring changes from the modal (i.e., highest) peak, as in the instability index, changes were measured from the corresponding allele in the reference tissue with the most stable CAG repeat (muscle in both rHD1 and rHD7). Each change was multiplied by the normalized peak height (peak height/Σ peak heights), and the sum of all values was reported as the expansion index. The expansion index therefore reflects both the instability of a sample and its tendency toward expansion (positive values) or contraction (negative values); values close to zero indicate low instability.
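A minimal Python sketch of the expansion-index calculation may make the arithmetic concrete. It assumes that peak heights have already been exported from GeneMarker and that fragment sizes have already been converted to repeat numbers; the function name `expansion_index` and the input layout are hypothetical, and normalization is done over the peaks retained after the 10% height filter.

```python
def expansion_index(peaks, reference_q, threshold=0.10):
    """Expansion index of one allele in one tissue (illustrative sketch).

    peaks: list of (repeat_size, peak_height) pairs from the electropherogram
    reference_q: repeat size of the matching allele in the reference tissue
    threshold: peaks not taller than this fraction of the tallest peak are dropped
    """
    tallest = max(height for _, height in peaks)
    kept = [(q, h) for q, h in peaks if h > threshold * tallest]
    total = sum(h for _, h in kept)
    # sum over retained peaks of (normalized height) x (change from reference allele)
    return sum((h / total) * (q - reference_q) for q, h in kept)


# hypothetical peaks: modal peak at the reference size, small expanded shoulder,
# and a 95Q peak below 10% of the tallest peak that is therefore ignored
print(expansion_index([(77, 1000), (78, 400), (79, 200), (95, 50)], reference_q=77))  # → 0.5
```

Because heights are normalized, the index does not depend on overall signal intensity, and its sign reports net expansion versus net contraction relative to the reference tissue.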
For curve-fit analysis, masked allele data were imported into MATLAB (The MathWorks Inc., Natick, MA, USA), and curve fitting was performed as previously described [28]. Briefly, the imported data were analyzed with the ipf.m function in MATLAB, keeping the fitting error below 5% and the overall goodness of fit (R²) above 0.95. Gaussian functions were used to fit the curves owing to the nature of the data. The curve-fit data were then superimposed on the electropherograms using Adobe Illustrator (Adobe). All data, with mean, error, and R² values, are presented in the Supplemental Tables (Tables S1 and S2).
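The study performed curve fitting with the ipf.m function in MATLAB. As a rough analogue only, the sketch below fits a sum of Gaussians with SciPy's `curve_fit` and applies the stated acceptance criteria (error below 5%, R² above 0.95). This is a swapped-in technique, not the authors' pipeline: the initial-guess strategy, the helper names, and the percent-error definition (RMS residual relative to the maximum signal) are assumptions and may differ from how ipf.m reports its fitting error.

```python
import numpy as np
from scipy.optimize import curve_fit


def multi_gaussian(x, *params):
    """Sum of Gaussians; params is a flat sequence of (amplitude, center, width) triples."""
    y = np.zeros_like(x, dtype=float)
    for amp, center, width in zip(params[0::3], params[1::3], params[2::3]):
        y += amp * np.exp(-((x - center) ** 2) / (2.0 * width ** 2))
    return y


def fit_alleles(x, y, center_guesses, width_guess=1.0):
    """Fit one Gaussian per guessed allele; return fitted centers, R^2, and percent error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p0 = []
    for c in center_guesses:
        # start each amplitude at the observed signal nearest the guessed center
        p0 += [y[np.argmin(np.abs(x - c))], c, width_guess]
    popt, _ = curve_fit(multi_gaussian, x, y, p0=p0, maxfev=20000)
    residuals = y - multi_gaussian(x, *popt)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    # assumed error metric: RMS residual as a percentage of the maximum signal
    pct_error = 100.0 * np.sqrt(np.mean(residuals ** 2)) / np.max(y)
    return popt[1::3], r2, pct_error


def passes_criteria(r2, pct_error):
    # acceptance thresholds stated in the Methods: error < 5% and R^2 > 0.95
    return pct_error < 5.0 and r2 > 0.95
```

In practice, the number of Gaussians and their starting centers would come from inspecting the electropherogram, which is the step ipf.m supports interactively.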
Statistical analysis: All curve-fit data, with the position of the mean, error, and R² value, are presented in the Supplemental Tables. Individual alleles (red curves) were plotted in box plots to deconvolute the individual alleles arising from the parental alleles, because different tissues have different expansion profiles, i.e., continuous expansion in the tail and periodic expansion in the striatum [28]. Continuous expansion conforms to a random, bidirectional, forward-biased model, whereas dramatic expansion shows periodicity: stable TNR segments are inserted within small cell populations, so that the descendant cell population has a similar repeat length within a normal distribution [28]. The curve-fit method was therefore used to deconvolute the alleles arising from periodic expansion in the electropherograms. For linear regression, the coefficient of determination (R²) and statistical significance (p-value) were calculated using GraphPad Prism version 8.0.2 (GraphPad Software, La Jolla, CA, USA). All correlation analyses used one-tailed Spearman's correlation in GraphPad Prism.
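To illustrate what a random, bidirectional, forward-biased process means, the toy simulation below lets each cell's repeat length take one-repeat steps up or down, with expansions more likely than contractions. The step probabilities, event count, and function name are hypothetical and are not parameters estimated in [28] or in this study; the point is only that such a process shifts and broadens a single distribution rather than producing distinct subpopulations, which is why the curve-fit method is needed for periodic expansion.

```python
import numpy as np


def simulate_continuous_expansion(start_q, n_cells, n_events, p_expand, p_contract, seed=0):
    """Per-cell random walk of CAG length with a forward (expansion) bias.

    At each of n_events, every cell independently gains one repeat with
    probability p_expand, loses one with probability p_contract, or is unchanged.
    """
    rng = np.random.default_rng(seed)
    p_none = 1.0 - p_expand - p_contract
    steps = rng.choice([1, -1, 0], size=(n_cells, n_events), p=[p_expand, p_contract, p_none])
    # final repeat length of each cell
    return start_q + steps.sum(axis=1)


lengths = simulate_continuous_expansion(77, n_cells=5000, n_events=100, p_expand=0.3, p_contract=0.1)
print(lengths.mean(), lengths.min(), lengths.max())
```

With these settings, the expected mean shift is n_events × (p_expand − p_contract) = 20 repeats; setting the two probabilities equal removes the drift and only broadens the distribution.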
Correlation Data Analysis: To characterize TNR-associated gene expression, we extracted gene expression data for the tissues used in this study from the Genotype-Tissue Expression (GTEx) project [55]. Of the top 100 expressed genes in each of the four tissues with high TNR instability (liver, testis, caudate, and putamen), 35 genes were common to all four tissues. Median transcripts per million (TPM) data were downloaded from the database, and all 35 genes were tested for correlation with the expansion index and curve-fit range data using corrplot in R. Of the 35 genes, only 4 (CLU, TF, RPL13A, and RPLP1) showed a significant positive or negative correlation with either the expansion index or the curve-fit range. Expression data for NEIL1 and MSH3 were added as references. Median TPM was plotted against either the expansion index or the curve-fit range, and the correlation between the two was calculated with one-tailed Spearman's correlation analysis in GraphPad Prism (version 8.0.2).
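The gene-screening steps (intersecting the top-100 lists, then correlating median TPM with an instability measure) can be sketched in Python. This is an illustrative outline, not the authors' code: the analysis itself used corrplot in R and GraphPad Prism, the dictionary layouts and function names are hypothetical, and GTEx data would need to be downloaded separately. The one-tailed test uses the `alternative` argument of `scipy.stats.spearmanr` (available in SciPy 1.7 and later).

```python
from scipy.stats import spearmanr


def common_top_genes(top_by_tissue):
    """Genes that appear in every tissue's top-N expression list."""
    gene_sets = [set(genes) for genes in top_by_tissue.values()]
    return sorted(set.intersection(*gene_sets))


def screen_genes(median_tpm, instability, genes, alternative="greater"):
    """One-tailed Spearman correlation of median TPM against an instability measure.

    median_tpm:  {gene: {tissue: median TPM}}
    instability: {tissue: expansion index or curve-fit range}
    alternative: "greater" tests for a positive, "less" for a negative correlation
    """
    tissues = sorted(instability)  # one fixed tissue order for both vectors
    y = [instability[t] for t in tissues]
    results = {}
    for gene in genes:
        x = [median_tpm[gene][t] for t in tissues]
        rho, p = spearmanr(x, y, alternative=alternative)
        results[gene] = (float(rho), float(p))
    return results
```

The direction of each one-tailed test has to be chosen before looking at the result (e.g., "greater" for genes hypothesized to rise with instability, "less" for genes like RPLP1 and RPL13A that were expected to fall).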

Results
Tissue samples were collected from two WT monkeys and two HD monkeys (rHD1 and rHD7; Table 1). Both HD monkeys were created by injecting lentiviral vectors into oocytes. rHD1 was created with a vector expressing exon 1 of the human HTT gene with 84 CAG repeats, and rHD7 with a vector expressing exons 1–10 of human HTT with approximately 67–72 CAG repeats under the human HTT promoter [48,56]. Although transgene integration sites and precise copy numbers were not analyzed, lymphocytes from 3-month-old rHD1 showed mutant alleles at 27Q, 44Q, 76Q, and 87Q [57], and lymphocytes from 12-month-old rHD7 showed a mutant allele at 66Q [47]. Both monkeys were euthanized at the age of five. DNA extracted from the tissues was amplified by PCR specifically targeting the CAG repeats of the normal and mutant HTT genes. Representative electropherograms and curve-fit data of several notable tissues from rHD1 and rHD7 are shown in Figure 1.
Biomedicines 2022, 10, x FOR PEER REVIEW 5 of 16
All electropherograms are provided in the supplemental data (Figures S1–S4). The curve-fit method was used in this study to capture multiple alleles derived from the primary allele, following the method described by Mollersen et al. [28], which we used in previous studies [54,57]. Curve-fit results, with error and goodness-of-fit (R²) values, are provided in the supplemental data (Supplemental Tables S1 and S2). The electropherograms show CAG repeat mosaicism, with alleles of different repeat lengths. From these electropherograms, peak sizes from the curve-fit data and the expansion index were used in further analysis. Curve-fit data show that, among peripheral tissues, the liver had a broad range of allele sizes, denoting high instability, in both rHD1 and rHD7 (Figure 2A,B).

Figure 2.
Curve-fit data of rHD1 and rHD7, arranged from the tissue with the greatest CAG expansion to the most stable. (A) In rHD1, the testis showed the largest CAG expansion, with multiple large alleles emerging from the 77Q allele. Among peripheral tissues, the kidney showed little emergence of larger alleles. Among central nervous tissues, the caudate, thalamus, and putamen showed similar emergence of larger alleles derived from 77Q. (B) In rHD7, the liver showed the largest CAG expansion among peripheral tissues, followed by the testis. Other peripheral tissues showed relatively stable CAG sizes, with muscle the most stable. Among central nervous tissues, the caudate, hippocampus, and putamen showed large CAG expansion, whereas the other samples were stable.
In rHD1, the adrenal gland, lung, and pancreas showed moderate instability, whereas the heart, muscle, and kidney were relatively stable (Figures 1A and 2A). In the central nervous system (CNS), all brain regions except the cerebellum showed relatively high instability, with the caudate, thalamus, and putamen showing the highest (Figures 1A and 2A). In rHD7, the liver, caudate, hippocampus, and putamen showed a broad range of alleles, whereas the remaining tissues had relatively stable allele sizes (Figures 1B and 2B). rHD1 and rHD7 showed a similar trend, with high instability of the larger allele in the liver, caudate, and putamen (Figures 1 and 2). Among CNS tissues, the caudate and putamen of both rHD1 and rHD7 showed a large median value with high CAG mosaicism (Figures 2 and S5).
The expansion index was also calculated by modifying the instability index [13]. All WT tissues had an expansion index of 0, indicating no CAG repeat expansion at small repeat sizes. Expansion indexes were plotted for each reference allele (8, 35, 45, and 77Q for rHD1; 7 and 68Q for rHD7) (Figure 3).

Consistent with the curve-fit data, the testis and liver of rHD1 showed high instability of the 77Q allele, and the 77Q allele was relatively unstable in all tissues (Figure 3A). In rHD7, the liver and all central nervous system tissue samples except the cerebellum showed high instability (Figure 3B).
Spearman's correlation test was used to determine whether CAG expansion depends on tissue type (i.e., tissue specificity) or on CAG repeat size (i.e., size specificity) (Figure 4). When the expansion indexes of rHD1 and rHD7 were plotted against each other for each matching tissue, a statistically significant positive correlation was observed between rHD1 and rHD7 (Rs(14) = 0.5341, p = 0.0379) after rHD1 testis was excluded as an outlier by Grubbs' test (α = 0.05, G = 2.858) (Figure 4A). The range of the curve-fit data also correlated positively between rHD1 and rHD7 (Rs(14) = 0.5926, p = 0.0190) (Figure 4B). When all expansion indexes of rHD1 and rHD7 were plotted against CAG repeat size (Q size), the relationship was well described by a nonlinear regression model (R² = 0.9121), with the expansion index increasing exponentially from around 60Q (Figure 4C).
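The outlier exclusion above relied on Grubbs' test. A sketch of the standard two-sided, single-outlier form of the test follows; whether GraphPad's implementation matches it in every detail is an assumption, and the function name `grubbs_test` is hypothetical.

```python
import numpy as np
from scipy import stats


def grubbs_test(values, alpha=0.05):
    """Two-sided Grubbs' test for a single outlier.

    Returns (G, G_critical, index_of_most_extreme_value, is_outlier).
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    mean, sd = x.mean(), x.std(ddof=1)  # sample SD with n - 1 denominator
    idx = int(np.argmax(np.abs(x - mean)))
    g = abs(x[idx] - mean) / sd
    # critical value derived from the t distribution with n - 2 degrees of freedom
    t = stats.t.ppf(1.0 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t ** 2 / (n - 2 + t ** 2))
    return g, g_crit, idx, g > g_crit
```

The test flags at most one value per run; if a point is removed, the remaining data would need to be retested before any further exclusion.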
To further determine the factors contributing to the tissue specificity of CAG expansion, lists of the top 100 expressed genes in the four tissues with high instability (liver, testis, caudate, and putamen) were retrieved from the Genotype-Tissue Expression (GTEx) database (Figure S6). Among the 35 genes common to all four tissues, those with expression patterns resembling the trend of CAG instability were selected for further analysis: clusterin (CLU), transferrin (TF), ribosomal protein lateral stalk subunit P1 (RPLP1), and ribosomal protein L13a (RPL13A) (Figure 5).
CLU and TF were highly expressed in the testis, liver, and brain, where CAG instability was high (Figure 5A,B), whereas RPLP1 and RPL13A showed the opposite trend to the CAG expansion index (Figure 5C,D). Among genes previously associated with CAG expansion in HD, NEIL1 [58] and MSH3 [59,60] were also examined to determine whether their expression correlates with CAG instability (Figure 5D,E). To further investigate the correlation between gene expression patterns and CAG instability in different tissues, we plotted gene expression data (transcripts per million, TPM) against the expansion index and curve-fit data of the corresponding tissues (Figure 6).
In rHD1, the expansion index showed a strong, statistically significant positive correlation with TF expression (Rs(14) = 0.6214, p = 0.0077), whereas the positive correlation with CLU approached significance (Rs(14) = 0.4321, p = 0.0547) (Figure 6A). RPLP1 and RPL13A expression levels showed statistically significant negative correlations with the expansion index (Rs(14) = −0.5750, p = 0.0137 and Rs(14) = −0.5648, p = 0.0152, respectively), whereas NEIL1 and MSH3 showed no significant correlation (Figure 6A). In rHD1, the range of the curve-fit data was also positively correlated with CLU and TF expression (Rs(14) = 0.5036, p = 0.0291 and Rs(14) = 0.6679, p = 0.004, respectively). RPLP1 expression showed a statistically significant negative correlation with the curve-fit range (Rs(14) = −0.4750, p = 0.0379), whereas RPL13A, NEIL1, and MSH3 showed no significant correlation (Figure 6B). For rHD7, TF expression showed a strong positive correlation with both the expansion index (Rs(14) = 0.7143, p = 0.0019) and the curve-fit range (Rs(14) = 0.7893, p = 0.0004) (Figure 6A,B). No statistically significant correlation was found for CLU, NEIL1, or MSH3 (Figure 6A,B).

Discussion
Peripheral tissues from HD monkeys provide a unique opportunity to study CAG instability in different cell types that share the same genetic background. Previously, we demonstrated age- and CAG repeat size-dependent CAG repeat expansion [47]. This study demonstrates tissue/cell-type- and CAG size-dependent CAG repeat expansion (Figure 4). One hypothesis holds that HD pathogenesis is a two-step process: first, the rate of somatic expansion dictates the rate of phenotypic onset; second, somatic expansion causes cytotoxicity, resulting in the death of vulnerable cells. Somatic expansion of the CAG repeat has been well documented in various model systems and in humans [13,25–27,30,50–53]. A recent report on human postmortem tissues [53] reveals findings similar to those in our HD monkeys. Moreover, unlike humans, who carry two HTT alleles, rHD1 carries four alleles with different CAG repeat sizes (8, 35, 45, and 77Q), which provides a unique opportunity to investigate the relationship between CAG repeat size and CAG stability in the same individual.
Behaviorally, rHD1 resembles juvenile-onset HD, whereas rHD7 is comparable to adult-onset HD [49], which suggests that the expression level of HTT and the size of the HTT fragment are crucial factors for disease onset and severity [49,55]. The tissue-specific instability observed in the caudate and putamen of rHD1 and rHD7 was similar to reports in humans and mice [53,58,61,62]. However, in those reports, instability in the liver does not typically exceed that in the caudate and putamen [13,23,26,27], as it did in rHD7.
Interestingly, a recent human postmortem study showed high CAG repeat instability in the testis and liver, similar to rHD1 [53]. Nonetheless, the caudate, putamen, hippocampus, thalamus, and motor cortex were among the most unstable tissues, as in other studies [13,23–25]. The cerebellum, prefrontal cortex, and peripheral tissues such as the muscle, heart, adrenal gland, and pancreas of rHD1 were highly unstable, similar to other reports [23,25,27,51,53]. Besides the liver, the testis is the most unstable peripheral tissue.
The use of the human polyubiquitin-C promoter in rHD1 resulted in global expression of mutant HTT (mHTT). In contrast, the mHTT transgene in rHD7, regulated by the human HTT promoter, was expected to mimic the human HTT expression pattern. Although overall instability was high in rHD1, the heart, muscle, kidney, and cerebellum appeared to be among the most stable tissues, whereas the testis, liver, thalamus, caudate, and putamen were the most unstable. In rHD7, the liver, hippocampus, caudate, and putamen showed high instability (Figure 2B), similar to a human postmortem study [53].
It is interesting to note that HD patients with fewer than 44 repeats show only small tissue-specific differences in instability [63,64]. In both rHD1 and rHD7, alleles larger than 44Q were more unstable than alleles smaller than 44Q. A prior longitudinal study of three HD monkeys, including rHD7, carrying the same mutant HTT transgene with repeat sizes ranging from 56Q to 70Q suggested that 62Q might be the threshold of CAG repeat instability leading to large CAG expansion [54]. The current data also show that the expansion index increases exponentially around 62Q (Figure 4C). Tissue specificity was demonstrated by both the expansion index and the curve-fit data (Figure 4A,B). The tissue specificity of CAG repeat expansion calls for investigating gene expression across multiple organs to identify genetic modifiers that might be involved in CAG expansion. The Genotype-Tissue Expression (GTEx) database allows rapid screening for tissue-specific genes whose expression parallels the trend of CAG repeat expansion [65]. Of the top 100 genes in the four tissues with the highest CAG repeat instability, 35 were common to all four. Of these 35 genes, 17 were mitochondrial genes whose expression did not follow our instability data; of the remaining 18, 4 genes (CLU, TF, RPLP1, and RPL13A) showed expression patterns similar to the instability data (Figure 5).
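The steep rise of the expansion index with repeat size can be summarized by fitting a single exponential, EI(Q) = a·exp(b·Q). The sketch below is a hypothetical illustration: the text does not state which nonlinear model produced R² = 0.9121, so the exponential form, the log-linear starting guess, and the helper `q_at_index` (which solves for the repeat size at which the fitted curve reaches a chosen expansion index) are assumptions. The starting guess uses only positive expansion indexes, because the logarithm is undefined otherwise.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_model(q, a, b):
    # assumed single-exponential form: EI(Q) = a * exp(b * Q)
    return a * np.exp(b * q)


def fit_expansion_vs_q(q, ei):
    """Fit EI = a*exp(b*Q); return a, b, and R^2 on the original scale."""
    q = np.asarray(q, dtype=float)
    ei = np.asarray(ei, dtype=float)
    pos = ei > 0  # log-linear starting guess needs positive values
    slope, intercept = np.polyfit(q[pos], np.log(ei[pos]), 1)
    (a, b), _ = curve_fit(exp_model, q, ei, p0=(np.exp(intercept), slope), maxfev=10000)
    pred = exp_model(q, a, b)
    r2 = 1.0 - np.sum((ei - pred) ** 2) / np.sum((ei - ei.mean()) ** 2)
    return a, b, r2


def q_at_index(a, b, target):
    # repeat size at which the fitted curve reaches a chosen expansion index
    return np.log(target / a) / b
```

Any threshold read from such a fit depends on the chosen target value, so it describes the fitted curve rather than a biological cutoff.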
Additionally, we analyzed the expression trends across tissues of genes previously associated with HD pathogenesis and CAG repeat expansion, including huntingtin (HTT), huntingtin-interacting protein 1 (HIP1), 8-oxoguanine DNA glycosylase (OGG1) [61], tumor protein p53 (TP53) [66,67], RE1-silencing transcription factor (REST) [68], nuclear factor kappa B (NF-κB) [69], CREB-binding protein (CBP) [70], forkhead box protein P1 (FOXP1) [71], heat shock factor 1 (HSF1) [72], FANCD2- and FANCI-associated nuclease 1 (FAN1) [62,73], postmeiotic segregation increased 1 homolog 1/2 (PMS1/PMS2) [62], mutL homolog 1/3 (MLH1/MLH3) [62,74], transcription elongation regulator 1 (TCERG1) [62], ribonucleotide reductase regulatory TP53 inducible subunit M2B (RRM2B) [62], coiled-coil domain containing 82 (CCDC82) [62], apurinic/apyrimidinic endodeoxyribonuclease 1 (APEX1) [75], DNA ligase 1 (LIG1) [62], breast cancer 1 (BRCA1) [75], nei-like DNA glycosylase 1 (NEIL1) [58], and mutS homolog 2/3 (MSH2/MSH3) [62,76]. Only NEIL1 and MSH3 showed gene expression patterns similar to CAG repeat expansion (Figure 6E,F). However, correlation analysis showed a positive correlation of CAG repeat expansion with CLU and TF, and a negative correlation with RPLP1 and RPL13A (Figure 6). Both CLU and TF are involved in the oxidative stress response, apoptosis, and the DNA damage response (DDR), and both have been reported to be involved in HD pathogenesis [77–79]. In particular, increased CLU expression has been associated with Alzheimer's disease (AD), in which increased CLU reduced amyloid-beta (Aβ) aggregation and toxicity [80]. CLU is also involved in Aβ aggregation and clearance, neuroinflammation, and regulation of the neuronal cell cycle and apoptosis [80]. Moreover, RNA sequencing analysis of the human HD brain identified CLU as one of the top differentially expressed genes [81].
One study overexpressing CLU in the COS-7 cell line (derived from African green monkey kidney) showed aggresome formation, severe disruption of mitochondrial distribution, and activation of the mitochondria-mediated apoptotic pathway [82], which are hallmark phenotypes of neurodegenerative diseases such as HD and AD. Transferrin gene expression has also been implicated in HD [78]. Iron overload has been reported in HD models [78] and HD patients [83–85]. Moreover, an antibody against the transferrin receptor and deferoxamine (an iron chelator) have been used successfully to treat HD symptoms [86,87]. However, their effect on CAG repeat size has not been investigated and warrants further study. Both RPLP1 and RPL13A encode ribosomal proteins that are structural components of the ribosome. Although no association with HD has been reported, their involvement in the elongation step of protein synthesis might explain why their expression is lower in tissues with high CAG repeat instability. Interestingly, neither NEIL1 nor MSH3 showed a statistically significant correlation. Although NEIL1 and MSH3 have been identified in GWAS [62] and verified in animal studies [15,58,60], most prior studies focused on brain and blood samples. Therefore, comprehensive multi-omics studies, including genomic, transcriptomic, epigenomic, and proteomic analyses of multiple tissues, are a critical step toward uncovering genetic modifiers that affect CAG instability in HD and other TNR expansion diseases. A recent human postmortem study of HTT CAG and ATXN1 CAG expansion showed similar tissue-specific CAG expansion for the two genes [53], which suggests a common pathogenic mechanism among TNR expansion diseases; HD monkeys might therefore provide valuable insight into TNR instability and pathogenesis.
It will be pertinent to replicate this study with a larger sample size in the future. Still, our findings suggest that the HD monkey model could contribute significantly to advancing HD research and preclinical studies. rHD7 showed distinct tissue-specific instability that largely mirrored human HD, and rHD1, which carries multiple alleles with differing CAG repeat sizes, might be useful for investigating the effect of CAG repeat size on repeat stability within the same genetic background.

Conclusions
Our study provides further evidence that CAG repeat expansion is an age-, tissue-, and size-dependent process. It also sets out future avenues of investigation that could delineate the biological processes involved in CAG expansion and pathogenesis, which could be targeted for future therapeutic development.