Molecular Identification and In Silico Protein Analysis of a Novel BCOR-CLGN Gene Fusion in Intrathoracic BCOR-Rearranged Sarcoma

Simple Summary
BCOR (BCL6 corepressor)-rearranged sarcoma (BRS) is a rare sarcoma entity that predominantly carries a BCOR-CCNB3 fusion. In this paper, we present an index case of BRS with a novel BCOR-CLGN (calmegin) gene fusion that was first identified by next-generation sequencing and then verified by Sanger sequencing. We also carried out in silico protein analysis to demonstrate the 3D structure of the chimera protein. We conclude that, because BRS is genetically heterogeneous, molecular ancillary tests serve as powerful tools to discover such unusual variants. In silico analysis of the fusion protein is an appropriate approach to understanding the pathogenesis of such a rare variant.

Abstract
BCOR (BCL6 corepressor)-rearranged sarcomas (BRSs) are a heterogeneous group of sarcomas previously classified as part of the group of "atypical Ewing" or "Ewing-like" sarcomas, without the prototypical EWSR1 gene translocation. Due to their similar morphology and histopathological features, diagnosis is challenging. The most common genetic aberrations are BCOR-CCNB3 fusion and BCOR internal tandem duplication (ITD). Recently, various new fusion partners of BCOR have been documented, such as MAML3, ZC3H7B, RGAG1, and KMT2D, further increasing the complexity of this tumor entity, although the molecular pathogenetic mechanism remains to be elucidated. Here, we present an index case of intrathoracic BRS carrying a novel BCOR-CLGN (calmegin) gene fusion in a 52-year-old female. The tumor was initially diagnosed by immunohistochemistry on the basis of BCOR positivity; the fusion was identified by next-generation sequencing and confirmed by Sanger sequencing. In silico protein analysis was performed to demonstrate the 3D structure of the chimera protein. The physicochemical properties of the fusion protein sequence were calculated using the ProtParam web-server tool. Our finding further broadens the fusion partner gene spectrum of BRS. Because of this heterogeneity, molecular ancillary tests serve as powerful tools to discover these unusual variants, and an in silico analysis of the fusion protein offers an appropriate approach toward understanding the pathogenesis of such a rare variant.


Introduction
Aberrant BCOR (BCL6 corepressor) expression has been found in a variety of tumor types, including clear cell sarcoma of the kidney, endometrial sarcoma, rhabdomyosarcoma, and central nervous system and myeloid tumors [1][2][3]. Among these, BCOR-rearranged small blue round cell sarcoma (BRS), which commonly occurs in the bones of young patients, represents a rare soft tissue tumor entity and shares morphological similarities

Next-Generation Sequencing
For NGS library preparation, an RNA-based Archer FusionPlex custom gene panel (Archer DX, Boulder, CO, USA) specific for sarcomas, including the EWSR1 and BCOR genes, was applied to identify SNVs, indels, and gene fusions. This solution uses anchored primers with known translocation partners and reverse primers that hybridize to the sequencing adapters to detect breakpoints and partners. A total of 100-250 ng of RNA was loaded into the assay. cDNA was taken after the first-strand synthesis and run in a pre-sequencing quantitative reverse-transcription PCR QC assay. This assay was performed in duplicate on cDNA, using primers targeting the control gene transcripts of VCP and 35 cycles, to determine the amount of intact RNA in a given sample. The final libraries were quantified with a KAPA library quantification kit (Roche, Basel, Switzerland), diluted to a final concentration of 4 nM, and pooled by equal molarity.
For sequencing on the MiSeq System (MiSeq Reagent kit, version 3, 600 cycles), the libraries were denatured using 0.2 N NaOH and diluted to 40 pM with hybridization buffer (Illumina, San Diego, CA, USA). The final loading concentration was 8 pM libraries and 5% PhiX. Captured libraries were sequenced in a multiplexed fashion with a paired-end run, to obtain 2 × 150 bp reads, with a depth of coverage of at least 500×. Sequencing was conducted according to the MiSeq instruction manual. Trimmed fastq files were generated using MiSeq Reporter (Illumina, San Diego, CA, USA) and were uploaded to the Archer Analysis v7 website (Archer DX, Boulder, CO, USA). For alignment, the human reference genome GRCh37 (equivalent UCSC version hg19) was used. Translocations were called when the fusion sequence was supported by more than 5 reads, with reads from gene-specific primers comprising at least 10% of the total reads. Gene fusion frequency was calculated as the ratio of fusion transcript reads to total reads.
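The calling rule described above can be illustrated with a minimal sketch. It assumes one plausible reading of the criteria (more than 5 fusion-supporting reads, and fusion reads making up at least 10% of the total reads at the locus); the class, function names, and read counts are hypothetical and are not part of the Archer Analysis software.

```python
from dataclasses import dataclass


@dataclass
class FusionCandidate:
    """Hypothetical container for one candidate fusion (illustrative only)."""
    name: str
    fusion_reads: int  # reads supporting the fusion junction
    total_reads: int   # all reads from the gene-specific primers at the locus


def fusion_read_fraction(candidate: FusionCandidate) -> float:
    """Ratio of fusion transcript reads to total reads (0.0 if no reads)."""
    if candidate.total_reads == 0:
        return 0.0
    return candidate.fusion_reads / candidate.total_reads


def passes_threshold(candidate: FusionCandidate,
                     min_reads: int = 5,
                     min_fraction: float = 0.10) -> bool:
    """Apply the assumed rule: more than min_reads and at least min_fraction."""
    return (candidate.fusion_reads > min_reads
            and fusion_read_fraction(candidate) >= min_fraction)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the rule.
    for cand in (FusionCandidate("hypothetical_A", 908, 1000),
                 FusionCandidate("hypothetical_B", 5, 20),
                 FusionCandidate("hypothetical_C", 6, 100)):
        print(cand.name, round(fusion_read_fraction(cand), 3), passes_threshold(cand))
```

Both conditions must hold together: the read-count floor guards against junctions supported by only a handful of reads, while the fraction guards against low-level artifacts at deeply covered loci.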

Sanger Sequencing
To confirm the BCOR fusion detected by NGS and to determine the breakpoints, Sanger sequencing was performed. To exclude the possibility of an analytical error, a different RNA isolation method was used for the Sanger sequencing (Trizol reagent, Thermo Fisher Scientific, Waltham, MA, USA). RNA quality was determined via eukaryote total RNA nano assays.
For Sanger sequencing, a reverse transcription-polymerase chain reaction (RT-PCR) was carried out. The cDNA quality was tested using the phosphoglycerate kinase 1 (PGK1) housekeeping gene (247 bp amplified product). One microgram of total RNA was used for cDNA synthesis with a SuperScript® III First-Strand Synthesis Kit (Invitrogen, Carlsbad, CA, USA). RT-PCR was performed using the Advantage-2 PCR kit (Clontech, Mountain View, CA, USA) for 32 cycles at an annealing temperature of 64 °C. Two pairs of primers were used to exclude analytical problems: BCOR exon 15 forward primers 5′-GGTGGAATTCACGAACGAAA-3′ (F1) and 5′-GAATTCACGAACGAAATTCAGA-3′ (F2), and CLGN exon 9 reverse primers 5′-GGATAAATTTTGGTTCATCATCAAG-3′ (R1) and 5′-ATCATCAAGCCAGCCAGCA-3′ (R2). The lengths of the first and second PCR products were 170 and 150 bp, respectively. The amplified products were purified and sequenced using the Sanger method.
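As a small illustration of how the listed primers can be sanity-checked, the sketch below computes length, GC fraction, and reverse complement for each primer. The primer sequences are taken from the text; the helper names are hypothetical, and this is not a substitute for a primer-design tool.

```python
# Primer sequences as listed in the text (5'->3').
PRIMERS = {
    "F1": "GGTGGAATTCACGAACGAAA",
    "F2": "GAATTCACGAACGAAATTCAGA",
    "R1": "GGATAAATTTTGGTTCATCATCAAG",
    "R2": "ATCATCAAGCCAGCCAGCA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement, i.e. the strand a reverse primer anneals to."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), f"GC={gc_fraction(seq):.2f}", reverse_complement(seq))
```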
The instability index gives an estimate of a protein's stability in vitro. The aliphatic index of a protein is regarded as a positive factor for increased thermostability of globular proteins and is defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine). The GRAVY (grand average of hydropathy) score is calculated as the sum of the hydropathy values of all the amino acids, divided by the number of residues in the sequence.
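The two simpler indices defined above can be sketched in a few lines. This is a minimal illustration, assuming the Kyte-Doolittle hydropathy scale for GRAVY and the commonly used aliphatic index formula (mole percent Ala + 2.9 × Val + 3.9 × (Ile + Leu)); it is not the ProtParam implementation, and the instability index, which relies on a dipeptide weight table, is not reproduced here. The toy sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Sum of hydropathy values divided by the number of residues."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    toy = "AVILGDEK"  # hypothetical toy peptide, not the chimera sequence
    print(f"GRAVY={gravy(toy):.3f}  aliphatic index={aliphatic_index(toy):.1f}")
```

A positive GRAVY value indicates an overall hydrophobic sequence and a negative value a hydrophilic one; a higher aliphatic index is generally read as higher predicted thermostability.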
The 3D structure of the chimera protein was built using the Robetta RoseTTAFold (University of Washington, Seattle, WA, USA) [36] protein structure prediction software. Due to the length limit of RoseTTAFold, we were only able to predict a 1400 amino-acid (AA)-long region of the 2070 AA-long chimera protein; the predicted structure therefore encompasses residues 671-2070. Disorder prediction was performed with the IUPred3 web server [37]. GORIV was used for secondary structure prediction [38]. To calculate the accessible surface area (ASA), we used the Accessible Surface Area and Accessibility Calculation for Protein web server (ver. 1.2) [39]. The coordinate files of the wild-type BCOR and calmegin proteins predicted by AlphaFold (AF) were downloaded from the AF database [40,41], while the crystal structure of BCOR's PUFD domain was obtained from the RCSB Protein Data Bank (PDB ID: 4HPL). The crystal structure of the PUFD domain was compared to the Robetta-predicted chimera protein (PUFD-chimera-Robetta) and the PUFD domain of the BCOR protein obtained from the AlphaFold database (PUFD-BCOR-AlphaFold) [9]. Alignment of the 3D structures was performed using the PyMOL molecular graphics system (version 1.2r3pre, Schrödinger, LLC).

Statistical Analysis
A paired-sample t-test was performed to compare the IUPred3 scores between the chimera and wild-type BCOR PUFD domains. A one-way repeated measures (RM) ANOVA was performed to compare the relative ASA (0-1) and the ASA (Å²) among the chimera, the AF-predicted, and the experimentally determined wild-type BCOR PUFD domains. Tukey's multiple comparison tests were applied to determine the differences between the experimentally determined (PDB ID: 4HPL) and modeled (AlphaFold) structures of the wild-type PUFD domain and the PUFD domain of the chimera protein (Robetta).
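The paired-sample t-test described above pairs the two scores of each residue position, so a domain of n residues gives n - 1 degrees of freedom. A minimal sketch of the statistic follows, using only the Python standard library; the function name and the toy scores are hypothetical, and the p-value lookup (which needs the t distribution) is left to a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for paired samples x and y.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the per-pair differences
    and sd is the sample standard deviation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Hypothetical per-residue scores, not the IUPred3 output of the study.
    chimera = [0.20, 0.15, 0.18, 0.22, 0.11]
    wild_type = [0.14, 0.10, 0.15, 0.16, 0.09]
    t, df = paired_t_statistic(chimera, wild_type)
    print(f"t({df}) = {t:.2f}")
```

For the 115-residue PUFD region (1634-1748), the same calculation yields 114 degrees of freedom.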

Case Report
The index case was a 52-year-old female with a previous medical history of hysterectomy, conducted in 2021 due to an endometrial tumor (the histological diagnosis was of moderately differentiated endometrioid adenosarcoma). One year after the operation, at the postoperative follow-up, a right-side chest wall tumor was found, and positron emission tomography-computed tomography (PET-CT) revealed a right parasternal 5 × 4.5 × 3.5 cm-sized high metabolic-rate tumor with costal cartilage, pectoral muscle, and pleural invasion (Figure 1A). The result of the core needle biopsy performed in the local hospital was inconclusive. Under the clinical diagnosis of a metastatic tumor or malignant mesothelioma, tumor resection was carried out. The specimen that was received was a polypoid tumor measuring 7.5 cm at its largest diameter. The tumor showed a fleshy and greyish-white cut surface, with hemorrhage and cystic change (Figure 1B). The patient received four cycles of VIDE (vincristine, ifosfamide, doxorubicin, and etoposide) chemotherapy without radiotherapy. Up until three months of follow-up, no tumor recurrence was identified.

Pathological Findings
Microscopically, the H&E-stained slides showed alternating hyper- and hypocellular small blue round cell proliferation in a lobulated pattern within a fibromyxoid-vascular stroma (Figure 2A). Infiltration of the adjacent bone and skeletal muscle was also found (Figure 2B). The tumor cells possessed ovoid to spindle-shaped nuclei with uniform, slightly vesicular chromatin and a scant amount of clear to mildly eosinophilic cytoplasm (Figure 2C). Focal areas also revealed prominent nucleoli (Figure 2D). Tumor necrosis and hemorrhage were also noted. The mitotic count was 18 per 10 high-power fields. In the first round of immunohistochemical stains, the tumor cells were positive for CD56, CD99 (diffuse to patchy), INSM1, and NKX2.2, and negative for desmin, cytokeratin, SS18-SSX, and WT-1. The MIB1 index was up to 50%. The second IHC panel revealed diffuse, strong intranuclear positivity for BCOR, TLE1, and SATB2, and was negative for NUT. The intranuclear Brg1 staining was retained (Figure 2E-K).

Fluorescence In Situ Hybridization
Because Ewing sarcoma was initially suspected, an EWSR1 FISH examination was carried out; no rearrangement of the EWSR1 gene was identified, excluding the possibility of Ewing sarcoma. Based on the BCOR positivity by immunohistochemistry, a BCOR break-apart FISH assay was performed with a positive result, indicating BCOR gene rearrangement (Figure 2L).

NGS and Sanger Sequencing
To analyze not only SNVs and indels but also gene fusions, an NGS panel specific for 54 genes identified in sarcomas was applied. A BCOR-CLGN gene fusion was detected (reads%: 90.8, depth: 7813, breakpoints: chrX:39,911,366 and chr4:141,317,359). The BCOR breakpoint was located in exon 15, while the CLGN breakpoint was located in exon 9. No nucleotide alterations were detected. To confirm the BCOR-CLGN fusion and identify the breakpoints, RT-PCR was followed by Sanger sequencing. The bidirectional electropherogram of the fusion gene is presented in Figure 3.

The full-length wild-type BCOR (UniProt ID: Q6W2J9) and calmegin (UniProt ID: O14967) proteins consist of 1755 and 610 residues, respectively. The BCOR-calmegin chimera protein consists of a total of 2070 residues, encompassing residues 1-1755 of BCOR and residues 296-610 of calmegin (Table 1). Accordingly, residues 1-1755 of the chimera correspond to the full-length BCOR protein, while residues 1756-2070 correspond to the truncated calmegin. At the breakpoint, two nucleotides of the BCOR gene (TG) and one nucleotide of the CLGN gene (G) encode Trp with the TGG codon; this Trp in the 1755th position corresponds to the wild-type C-terminal residue of the BCOR protein. The 1756th residue of the chimera protein is Asp296 of calmegin.
Upon fusion, the calmegin protein loses the 1-19 AA signal sequence that is responsible for its endoplasmic reticulum localization. Based on the PhosphoSitePlus database, the region of calmegin that is missing from the chimera protein (1-295) contains multiple post-translational modification sites: a ubiquitylation site (Lys101), a monomethylation site (Arg109), and acetylation sites (Lys190 and Lys291). At the same time, the transmembrane segment of calmegin (472-492) is present in the fusion protein (… residues, according to the chimera numbering). The MCP prediction revealed that this segment of the fusion protein has an average probability of 0.76 (on a scale of 0-1.0). This implies the presence of the transmembrane region close to the C-terminus of the chimera, but it remains to be determined whether the chimera protein is membrane-associated.
The physicochemical properties of the fusion protein sequence were calculated using the ProtParam web server tool. The fusion protein is composed of 2070 residues, its calculated molecular weight is 228 kDa, and its pI is 5.05. The total number of negatively charged residues (D + E) was 295, and the total number of positively charged residues (R + K) was 203 (Table 1).
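The residue numbering used throughout the in silico analysis follows from simple offset arithmetic: the chimera is full-length BCOR (1-1755) followed by calmegin residues 296-610. The sketch below encodes that mapping; the constants come from the text, while the function names are hypothetical helpers introduced only for illustration.

```python
BCOR_LENGTH = 1755      # full-length BCOR, residues 1-1755
CALMEGIN_START = 296    # first calmegin residue retained in the chimera
CALMEGIN_END = 610      # last calmegin residue
CHIMERA_LENGTH = BCOR_LENGTH + (CALMEGIN_END - CALMEGIN_START + 1)


def chimera_to_wild_type(position: int):
    """Map a chimera residue number to (protein, wild-type residue number)."""
    if not 1 <= position <= CHIMERA_LENGTH:
        raise ValueError(f"position must be within 1-{CHIMERA_LENGTH}")
    if position <= BCOR_LENGTH:
        return ("BCOR", position)
    return ("calmegin", position - BCOR_LENGTH + CALMEGIN_START - 1)


def calmegin_to_chimera(position: int) -> int:
    """Map a retained calmegin residue number to chimera numbering."""
    if not CALMEGIN_START <= position <= CALMEGIN_END:
        raise ValueError("residue is not retained in the chimera")
    return position - CALMEGIN_START + 1 + BCOR_LENGTH


if __name__ == "__main__":
    print(CHIMERA_LENGTH)
    print(chimera_to_wild_type(1755), chimera_to_wild_type(1756))
    print(calmegin_to_chimera(472), calmegin_to_chimera(492))
```

The same offset can be used to translate any annotated calmegin feature, such as the transmembrane segment, into chimera coordinates, and to check that the modeled window 671-2070 spans 1400 residues.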
The molecular weight of the chimeric protein exceeded those of the wild-type BCOR and calmegin proteins; the differences were 15.8% and 69.3%, respectively. The total hydrophobicity of the chimera protein was similar to that of the wild-type proteins, with changes of 3.97% and 5.38%, respectively. The aliphatic index and instability index values of the chimera and the wild-type BCOR proteins are also similar (differences of <0.5 in the predicted values), while the difference is notably larger in the case of calmegin: the aliphatic index was 7.10% lower, and the instability index was 18.97% higher, for the chimera. Overall, the predicted stabilities of the wild-type BCOR and the chimera proteins were highly comparable (Table 1).
The region of interest for the analyses was 1634-1748 because this is where the BCOR's PUFD domain is located. We performed secondary structure prediction using the GORIV web server, but it did not show any considerable changes; the predicted secondary structure of the PUFD domain was identical in the wild-type BCOR and the chimera. The 296-610 region of calmegin was also predicted to have the same arrangement of secondary structural elements, even in the chimera protein.
We used IUPred3 to predict the disorder propensities of the regions of the PUFD domain in the wild-type and chimera proteins (Figure 4A). We predicted a slight increase in the disorder propensity, but the PUFD domain was predicted to retain its structurally ordered nature in the chimera protein. This is in agreement with the results of the secondary structure prediction. A paired-sample t-test was performed to compare the IUPred3 scores (0-1) (Figure 4B) between the chimera and wild-type BCOR PUFD domains. There was a significant difference in the IUPred3 scores between the chimera (M = 0.1686, SD = 0.08402) and wild-type domains (M = 0.1162, SD = 0.06796); t(114) = 10.81, p < 0.0001.
Statistical analyses showed that both the mean ASA (57.73 Å²) and the mean relative ASA (0.3368) were reduced in PUFD-chimera-Robetta compared to PUFD-BCOR-AlphaFold (70.63 Å², 0.4061) and PUFD-BCOR-4HPL.pdb (69.23 Å², 0.3972). There was no significant difference in either ASA (Å²) or relative ASA between PUFD-BCOR-4HPL.pdb and PUFD-BCOR-AlphaFold (Figure 6B,C). It is worth noting that the differences in structure between the terminal ends of the domain are more apparent in Figure 6A.

Figure 6. Accessible surface area (ASA) of the PUFD domain. (A) ASA values per residue, starting from residue 1634. (B,C) RM one-way ANOVA based on the ASA (Å²) (B) and the relative ASA (0-1) (C): a significant decrease was detected in PUFD-chimera-Robetta compared to the two wild-type PUFD domains (p < 0.0001). Residues 1634 and 1635 were omitted from the RM one-way ANOVA calculations (B,C) because these residues are missing from the PUFD-BCOR-4HPL.pdb structure (PDB ID: 4HPL).

Discussion
BRS mainly affects children and young adults, with a wide age distribution [19]. It has been reported that BCOR-CCNB3-rearranged sarcoma occurs preferentially in children with a skeletal distribution, whereas sarcomas with alternative BCOR rearrangements have a more variable anatomic distribution [17]; our index case fits into this clinical scenario. The differential diagnosis of small blue round cell tumors is wide, including carcinoma, sarcoma, melanoma, and lymphoma, depending on the patient's age, anatomical location, and specific genetic aberrations, so a battery of immunohistochemical stains is usually necessary.
In our case, based on the clinical information, metastatic endometrioid adenocarcinoma, mediastinal small cell neuroendocrine carcinoma, and Ewing sarcoma were our initial impressions. Although our case showed expression of neuroendocrine markers, such as CD56 and INSM1, the negativity for cytokeratin, NUT, desmin, and SS18-SSX and the retention of intranuclear Brg1 staining excluded neuroendocrine carcinoma, NUT carcinoma, rhabdomyosarcoma, poorly differentiated synovial sarcoma, and SMARCA4-deficient thoracic tumor, respectively. CD99 and NKX2.2 positivity raised the possibility of Ewing sarcoma. Nevertheless, the lack of EWSR1 gene rearrangement by FISH made us consider a so-called "Ewing-like" tumor. The result of the second IHC panel prompted us to consider a BCOR-rearranged sarcoma. During the NGS examination, we identified the novel BCOR-CLGN fusion. The immunohistochemical profile of the case was similar to that of other BRSs; indeed, it has been reported that BRSs form a distinctive cluster at the transcriptional level, different from Ewing sarcoma [42], reinforcing the concept that BRSs, whether carrying an ITD or a rearrangement, share a common pathogenic pathway, leading to similar morphology and immunophenotype.
BRS was originally reported to exhibit BCOR-ITD and fusion with CCNB3 within the X-chromosome, due to paracentric inversion [4]. Later, several fusion gene variants were documented, including ZC3H7B, MAML3 [17], KMT2D [19], and, recently, RGAG1 [18]. The oncogenic mechanisms of those fusion variants are largely unclear. Interestingly, CLGN is situated in proximity to MAML3, and both are on chromosome 4q31.1. Our NGS result identified in-frame breakpoints in exon 15 of BCOR (chrX:39,911,366) and in exon 9 of CLGN (chr4:141,317,359). To validate this finding, we performed RT-PCR and Sanger sequencing, which also revealed the in-frame fusion of BCOR and CLGN and further verified our novel finding.
It is known that the BCOR gene has 16 alternative exons and regulates germinal center formation in lymph nodes and apoptosis. BCOR is also known to be a member protein of polycomb repressive complex 1.1 (PRC1.1) through its PUFD domain, in exon 15, which interacts with the RAWUL domain of the PCGF protein of PRC1.1 [16]; this complex, in turn, binds non-methylated CpG islands and ubiquitylates histone 2A to repress gene expression [6]. BCOR plays an important role in tissue development and maintains mesenchymal stem cell function [43], while its high expression maintains cells in their pluripotent status [12,44]. In our case, since the coding sequences of BCOR were largely preserved, it might play a significant oncogenic role by exerting its gene-silencing effect via epigenetic regulation [45]. On the other hand, CLGN encodes a protein called calmegin, which is a testis-specific endoplasmic reticulum chaperone protein that plays an important role in spermatogenesis, intracellular calcium homeostasis, and the synthesis of proteins and steroid hormones. It is usually expressed in the testis, prostate, and heart. It has been documented that calmegin is transcriptionally regulated by histone deacetylase and CpG methyltransferase [46]; the deregulation of calmegin methylation may lead to infertility and to endocrine, prostatic, and germ cell neoplasms [47][48][49][50][51]. However, to the best of our knowledge, a CLGN translocation/fusion-associated tumor has not yet been reported. We assume that the BCOR-CLGN chimeric protein reported herein may exert its oncogenic potential via the antiapoptotic effect of BCOR and epigenetic dysregulation from the calmegin.
We found that the protein product of the BCOR-CLGN fusion gene is composed of the full-length BCOR protein (1-1755) and the 296-610 region of calmegin; the chimera protein consists of a total of 2070 residues. Using the ProtParam tool, the physicochemical characteristics of the wild-type BCOR protein and the fusion protein were calculated and compared. By disorder prediction using the IUPred3 web server, we found a significant increase in the PUFD domain's disorder propensity in the chimera protein, compared to the wild-type BCOR protein's domain. Using the Robetta web server, a 1400 AA-long region of the 2070 AA-long chimera protein (encompassing residues 671-2070) was modeled; this proposed structure was used to perform alignments to compare the folding of the chimera to those of the wild-type BCOR and calmegin proteins. The crystal structure of the PUFD domain (PDB ID: 4HPL) was also compared to the PUFD-chimera-Robetta and PUFD-BCOR-AlphaFold structures to visualize the structural differences and similarities of the domain (Figure 5). One limitation of this comparison of the 3D structures is that the Robetta web server has a 1400-residue prediction limit; therefore, it was not possible to predict the structure of the full-length chimera protein. In addition, the experimentally determined PUFD domain (PDB ID: 4HPL) lacked the 1634th and 1635th residues, which were not included in the comparison. However, the predicted structure contained the BCOR region 671-1755 (including the entire PUFD domain), the breakpoint, and also the truncated calmegin (1756-2070), which made it possible to estimate the structure of the chimera and the effects of protein fusion on the PUFD domain. We assume that the 1-670 region of BCOR (which is missing from the modeled chimera structure) retains the overall structural characteristics of the wild-type BCOR upon fusion with the truncated calmegin.
Although we have predicted changes in its disorder propensities, the comparison of the predicted structures revealed that the PUFD domain of BCOR may retain its globular fold in the chimera ( Figure 5G). This hypothesis is in agreement with the results of the secondary structure prediction, which also implied that there would be no changes to the BCOR and calmegin proteins upon fusion.
The full-length calmegin protein contains a signal sequence at its N-terminus for translocation to the endoplasmic reticulum. This signal sequence is missing from the calmegin portion of the chimera protein, suggesting that the chimera is not localized in this cell compartment. Rather, IHC revealed strong intranuclear positivity for BCOR, indicating that, like the wild-type BCOR, the chimera is localized in the nucleus.
Based on our ASA (Figure 6B) and relative ASA (Figure 6C) analyses, the PUFD domain (especially its N- and C-terminal regions) has a reduced surface accessibility in the chimera protein. Due to the lack of two residues (1634 and 1635) in the crystal structure (PDB ID: 4HPL), only the 1636-1748 region of the PUFD domain was included in the RM one-way ANOVA of the ASA values. Increased disorder (Figure 4) and decreased ASA (Figure 6) of the PUFD domain of the chimera protein may potentially affect its interaction with non-canonical polycomb repressive complex 1 (PRC1) and the maintenance of H3K27me3. Increased disorder and decreased ASA values of BCOR's PUFD domain in the BCOR-calmegin chimera protein (at least at the C-terminus of the domain) can be explained by the extension of the full-length BCOR protein at its C-terminus with the 296-610 region of calmegin. In addition, the presence of the truncated calmegin in the chimera protein may potentially interfere with the allosteric properties and intermolecular interactions (including the posttranslational modifications) of the wild-type BCOR. Future in vitro experiments are needed to test these in silico results and to confirm whether the hypothesized reduction in the interaction with PRC1.1 occurs.

Conclusions
In this study, we report the first case of BCOR-rearranged sarcoma with a novel CLGN fusion that shared morphological and immunophenotypical similarities with other, more common fusion variants, which contributes to the continually expanding spectrum of molecular subtypes of BRS. Molecular ancillary tests, such as NGS and confirmatory Sanger sequencing, serve as powerful tools to discover these unusual variants. In addition, the in silico analysis of the BCOR-CLGN fusion protein is an appropriate approach to aid in better understanding the pathogenesis of such a rare variant, via estimation of the fusion protein's characteristics.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available to protect the rights of the patient.

Conflicts of Interest:
The authors declare no conflict of interest.