Surfaceome Profiling of Cell Lines and Patient-Derived Xenografts Confirm FGFR4, NCAM1, CD276, and Highlight AGRL2, JAM3, and L1CAM as Surface Targets for Rhabdomyosarcoma

Rhabdomyosarcoma (RMS) is the most common soft tissue sarcoma in children. The prognosis for patients with high-grade and metastatic disease is still very poor, and survivors are burdened with long-lasting side effects. Therefore, more effective and less toxic therapies are needed. Surface proteins are ideal targets for antibody-based therapies, like bispecific antibodies, antibody-drug conjugates, or chimeric antigen receptor (CAR) T-cells. Specific surface targets for RMS are scarce. Here, we performed surfaceome profiling based on differential centrifugation enrichment of surface/membrane proteins and detection by LC-MS on six fusion-positive (FP) RMS cell lines, five fusion-negative (FN) RMS cell lines, and three RMS patient-derived xenografts (PDXs). A total of 699 proteins were detected in the three RMS groups. Ranking based on expression levels and comparison to expression in normal MRC-5 fibroblasts and myoblasts, followed by statistical analysis, highlighted known RMS targets such as FGFR4, NCAM1, and CD276/B7-H3, and revealed AGRL2, JAM3, MEGF10, GPC4, and CADM2 as potential targets for immunotherapies of RMS. L1CAM expression was investigated in RMS tissues, and strong L1CAM expression was observed in more than 80% of alveolar RMS tumors, making it a practicable target for antibody-based therapies of alveolar RMS.


Introduction
Pediatric rhabdomyosarcoma (RMS) is the most common soft tissue sarcoma in children and young adults [1]. Each year it accounts for 3% of childhood cancers in the United States [2]. RMS is a heterogeneous group of malignant and metastatic tumors, which originate from a primitive mesenchymal cell [3]. Based on histology, RMS can be classified into different subtypes: embryonal RMS (eRMS; 60-70%) and alveolar RMS (aRMS; 20-30%) are the main subtypes; pleomorphic (pRMS) and spindle cell/sclerosing (s-scRMS) account for 7-15% of the cases [4]. The aggressive aRMS tumors carry one of the two characteristic chromosomal translocations, the t(2; 13)(q35; q14) or the t(1; 13)(p36; q14), which result

Isolation and Enrichment of Membrane/Surface Proteins
In order to identify novel and specific targets upregulated on the surface of rhabdomyosarcoma (RMS) cells by mass spectrometry (MS), we initially compared two methods for isolation of membrane/surface proteins: the first based on biotin labeling of cell surface proteins with the cleavable EZ-Link-Sulfo-NHS-SS-biotin, followed by isolation with a NeutrAvidin agarose column and reducing elution with dithiothreitol (DTT); the second based on differential centrifugations and washes at high pH and high salt concentration [52]. In a preliminary experiment performed in triplicate with the Rh4 cell line, we detected 2667 proteins by MS with the surface biotinylation method and 2851 proteins with the differential centrifugations method. A total of 1918 proteins were detected with both methods (Figure 1A). Low enrichment for surface proteins, combined with the high sensitivity of MS, often results in the detection of intracellular proteins.
To determine the enrichment efficiency of the two methods, the detected proteins were filtered with a list of 2886 annotated surface proteins, published by Bausch-Fluck et al. [53] (Supplementary Information List A), and with a list containing 7643 proteins compiled to include all the annotated membrane/surface proteins (Supplementary Information List B). This analysis showed that the differential centrifugations protocol produced a lower background of intracellular proteins (~35%) than the biotinylation protocol (~49%) (Figure 1B). Interestingly, the differential centrifugations protocol also resulted in a higher enrichment of annotated and predicted surface proteins. Therefore, this method was used for the next experiments.
Figure 1. Comparison of surface biotinylation and differential centrifugations for the enrichment of membrane/surface proteins. Two methods for the enrichment of membrane/surface proteins were compared in a pilot experiment with the Rh4 cell line in triplicate. (A) More proteins were detected after differential centrifugations enrichment than after surface biotinylation, but there was a consistent overlap between the two methods. (B) Differential centrifugations resulted in the enrichment of a higher number of membrane/surface proteins than surface biotinylation and in a lower background of intracellular proteins. Created with Biorender.com.
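The filtering step described above can be illustrated with a minimal sketch, assuming that "background" means detected proteins absent from the reference list of annotated membrane/surface proteins. The function name `enrichment_summary` and all identifiers in the toy lists are hypothetical and serve only to show the set arithmetic; they do not reproduce the study's data or its exact procedure.

```python
def enrichment_summary(detected, annotated_surface):
    """Split detected protein IDs into annotated-surface hits and background.

    Assumption: a protein counts as background if it is not in the
    annotated membrane/surface reference list.
    """
    detected = set(detected)
    hits = detected & set(annotated_surface)
    background = detected - hits
    fraction = len(background) / len(detected) if detected else 0.0
    return {
        "n_detected": len(detected),
        "n_surface": len(hits),
        "n_background": len(background),
        "background_fraction": fraction,
    }


# Toy identifiers for illustration only, not measured results.
reference_list = ["SURF_1", "SURF_2", "SURF_3", "SURF_4"]
detected_centrifugation = ["SURF_1", "SURF_2", "SURF_3", "INTRA_1"]
detected_biotinylation = ["SURF_1", "SURF_2", "INTRA_1", "INTRA_2"]

summary_centrifugation = enrichment_summary(detected_centrifugation, reference_list)
summary_biotinylation = enrichment_summary(detected_biotinylation, reference_list)

# Proteins found by both methods (the overlap reported in a Venn diagram).
shared = set(detected_centrifugation) & set(detected_biotinylation)
```

Comparing `background_fraction` between the two summaries gives the kind of per-method background estimate reported in Figure 1B.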

Surfaceome Profiling Strategy and Proteomics Results Analysis
Eleven RMS cell lines, three PDXs, and, as controls, MRC-5 human embryonal fibroblasts and primary myoblasts were cultured, and surface membrane proteins were enriched following the differential centrifugations protocol, as shown in Figure 2. The isolated proteins were then processed, and detection was performed by LC-MS. In total, 7373 proteins were detected and quantified (iTop3 values in Supplementary Information Tables S1 and S2). To analyze the MS data, we then applied the strategy summarized in Figure 3.
Figure 2. RMS cell lines and PDXs used for differential centrifugation enrichment of membrane/surface proteins. RMS cell lines and PDXs, as well as the normal controls MRC-5 human embryonal fibroblasts and primary human myoblasts, were cultured and used for the isolation of membrane/surface proteins. All experiments were performed in triplicate. Created with Biorender.com.
The 7373 proteins were then filtered with List A, revealing 699 membrane/surface proteins that were selected to generate List C (Supplementary Information Table S3). To prioritize membrane/surface proteins with high and consistent expression in RMS cell lines and PDXs, and low or absent expression in controls, List C was processed with a scoring strategy taking into account the following parameters: (1) the number of RMS cell lines in which a protein was detected; (2) the abundance mean, defined as "iTop3 mean", of all the RMS cell lines; (3) the ratio of the iTop3 values of PDXs, FP-RMS, and FN-RMS to those of the controls MRC-5 and primary myoblasts, expressed as the base-2 logarithm of the fold change (Log2(FC)); (4) no detection in the controls MRC-5 and primary myoblasts; (5) high expression in the PDXs, since these are biologically closer to primary tumors (Tables 1 and 2). This approach attributes the lowest scores to the proteins that are most abundant in the three groups (PDXs, FP-RMS, and FN-RMS) but not in the controls. An analysis of proteins upregulated two-fold in the different groups is available in Supplementary Information Tables S4-S7. The comprehensive list of all the ranked proteins is available in Supplementary Information Table S8. The first 100 proteins ranked by this scoring are presented in Table A1, as Top100, and in detail in Supplementary Information Tables S9 and S10.
Figure 3. Schematic outline of the strategy used to analyze mass spectrometry data. The surfaceome of six FP-RMS cell lines, five FN-RMS cell lines, and three RMS PDXs was analyzed by MS. The strategy adopted to analyze the MS data is shown. The MS results were filtered with a list of annotated surface proteins (List A). A total of 699 proteins predicted to be surface proteins (List C) were then further prioritized by a scoring strategy to identify highly expressed proteins specific to RMS.
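A rank-based score of the kind outlined above could look like the following sketch. It is an illustration under explicit assumptions, not the scoring actually used: the text does not state how the five parameters were weighted or combined, so the sketch simply sums dense ranks with equal weight, uses a pseudocount of 1 to keep Log2(FC) finite when a protein is absent in controls, and collapses the three RMS groups into one mean. The field names (`n_lines`, `rms_mean`, `pdx_mean`, `control_mean`) and the proteins `PROT_A` to `PROT_C` are hypothetical.

```python
import math


def log2_fc(tumor_mean, control_mean, pseudocount=1.0):
    """Log2 fold change; the pseudocount is an assumption to avoid division by zero."""
    return math.log2((tumor_mean + pseudocount) / (control_mean + pseudocount))


def rank_descending(values):
    """Dense rank: the largest value gets rank 1, ties share a rank."""
    ordered = sorted(set(values.values()), reverse=True)
    position = {value: index + 1 for index, value in enumerate(ordered)}
    return {key: position[value] for key, value in values.items()}


def score_proteins(table):
    """Sum equal-weight ranks over five criteria; a lower score is a better candidate."""
    criteria = {
        "n_lines": {p: r["n_lines"] for p, r in table.items()},
        "rms_mean": {p: r["rms_mean"] for p, r in table.items()},
        "pdx_mean": {p: r["pdx_mean"] for p, r in table.items()},
        "log2fc": {p: log2_fc(r["rms_mean"], r["control_mean"]) for p, r in table.items()},
        "absent_in_controls": {p: 1 if r["control_mean"] == 0 else 0 for p, r in table.items()},
    }
    ranks = {name: rank_descending(values) for name, values in criteria.items()}
    return {p: sum(ranks[name][p] for name in ranks) for p in table}


# Toy iTop3-like values for illustration only.
toy_table = {
    "PROT_A": {"n_lines": 11, "rms_mean": 1000.0, "pdx_mean": 1200.0, "control_mean": 0.0},
    "PROT_B": {"n_lines": 6, "rms_mean": 500.0, "pdx_mean": 100.0, "control_mean": 400.0},
    "PROT_C": {"n_lines": 11, "rms_mean": 800.0, "pdx_mean": 900.0, "control_mean": 800.0},
}

scores = score_proteins(toy_table)
ranking = sorted(scores, key=scores.get)  # best (lowest score) first
```

In this toy table, `PROT_A` is detected in all lines, abundant in RMS and PDXs, and absent in controls, so it receives the lowest score, which matches the stated intent that the lowest scores go to proteins abundant in the RMS groups but not in the controls.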

Statistical Analysis of the Filtered Proteins Highlights Five Putative Therapeutic RMS Surface Targets
In parallel to the above selection of surface proteins, two different statistical analyses were performed in order to identify the most significant putative surface targets. First, an individual cell-to-control differential expression test was performed. More specifically, the Empirical Bayes (EB) or moderated t-test was applied, as implemented in R [54,55].
Considering an average Log2(FoldChange) ≥ 2 versus an average EB statistic ≥ 2.132 across comparisons within a class (FP-RMS, FN-RMS, and PDXs), 63 proteins were identified as upregulated in all RMS groups, 32 of which were present in the Top100, corroborating our first selection (Supplementary Table S14). Among these, AGRL2, AQP1, EPHA7, ERBB3, FGFR4, GAS1, GPC2, GPC3, GPC4, IL17RD, MEGF10, NRCAM, and NECTIN1 are highlighted in Figure 4A. L1CAM was significantly upregulated in FP-RMS and FN-RMS, NCAM1 only in FN-RMS, and JAM3 only in FP-RMS (Figure 4A).
The second statistical analysis, a linear mixed model (LMM) derived from the R implementation DREAM [56], evaluates each of the FP-RMS, FN-RMS, and PDXs groups as a whole versus the controls, even though the groups are themselves collections of subgroups of replicates. The LMM analysis considers the variation within the cell lines as well. The LMM results were very stringent, and only AGRL2 was confirmed as significantly overexpressed in all three RMS groups. FGFR4 was identified in FP-RMS, L1CAM in FN-RMS, and GPC4 in PDXs (Figure 4B and Supplementary Information Table S13). Notably, LMM selected a larger number of downregulated than upregulated proteins in the RMS groups compared to the controls.
In conclusion, extended statistical analyses detected AGRL2, ranked first by our ranking strategy, as significantly overexpressed in all samples. Detection in several groups of FGFR4, a well-established target for RMS, validates our approach.
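The class-level selection criterion from the first analysis can be sketched as follows. This is a minimal illustration that assumes both thresholds must be met simultaneously (the text's "versus" could also be read as the two axes of a volcano-style plot) and that the per-comparison Log2(FoldChange) and EB statistics have already been computed elsewhere, for example with the R tools cited above. The function names and the toy numbers are hypothetical.

```python
from statistics import mean

LOG2FC_MIN = 2.0    # threshold on the average Log2(FoldChange), from the text
EB_STAT_MIN = 2.132  # threshold on the average EB statistic, from the text


def upregulated_in_class(comparisons):
    """comparisons: list of (log2fc, eb_statistic), one per cell-line-vs-control test."""
    average_fc = mean(fc for fc, _ in comparisons)
    average_stat = mean(stat for _, stat in comparisons)
    return average_fc >= LOG2FC_MIN and average_stat >= EB_STAT_MIN


def upregulated_in_all_classes(per_class):
    """per_class: {class_name: comparisons}; True only if every class passes."""
    return all(upregulated_in_class(c) for c in per_class.values())


# Toy values for two hypothetical proteins, for illustration only.
candidate = {
    "FP-RMS": [(3.0, 4.0), (2.5, 3.0)],
    "FN-RMS": [(2.2, 2.5), (2.0, 2.2)],
    "PDX": [(4.0, 5.0)],
}
non_candidate = {
    "FP-RMS": [(3.0, 4.0)],
    "FN-RMS": [(1.0, 3.0), (2.0, 1.0)],  # average Log2(FC) = 1.5, below threshold
    "PDX": [(4.0, 5.0)],
}

candidate_passes = upregulated_in_all_classes(candidate)
non_candidate_passes = upregulated_in_all_classes(non_candidate)
```

Averaging within a class before thresholding means that a single weak comparison does not exclude a protein, whereas the LMM analysis described above additionally models the variation within cell lines.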

Expression of the Top100 Proteins in Normal Tissues
During the selection of the putative targets, we considered MRC-5 normal embryonal fibroblasts and immortalized primary myoblasts as controls. An ideal RMS target should be expressed at high levels in RMS and not, or at low levels, in all normal tissues. Therefore, to evaluate the expression of the Top100 proteins in normal tissues, we took advantage of proteomics data for normal tissues available from Proteomicsdb.org (accessed on 25 November 2022) [57][58][59]. The expression heatmap generated with the MS1 Top3 values (Tissue, SWISS-PROT only) confirms that FGFR4 is a very specific target because it is detected only in the colon, lung, and liver ( Figure 5, green square) and highlights other excellent targets clustering together with FGFR4: Glypican-2 (GPC2), detected only in the testis and heart at low levels, and in spermatozoon and brain at medium levels; Multiple epidermal growth factor-like domains protein 10 (MEGF10), detected in brain, prefrontal cortex, and salivary gland at low levels, and in arachnoid cyst at medium levels; and Claudin-15 (CLDN15), detected only in duodenum, liver, and small intestine at low levels ( Figure 5, green square).
Moreover, a cluster of candidates ( Figure 5, blue square), including GPC4, GPC6, CD276, NCAM1, and L1CAM, are detected only at low-medium levels in most tissues. Interestingly, AGRL2 (LPHN2) and JAM3 ( Figure 5, yellow square) clustering loosely together are detected in about 30 tissues but at low levels in almost all of them. For AGRL2, the highest expression is detected in the urinary bladder, myometrium, thyroid gland, oviduct, adrenal gland, and placenta.

Specific and High mRNA Expression of the Candidates in Patients' RMS Samples
Since a direct comparison of our data with the normalized proteomic expression data in normal tissues from ProteomicsDB ( Figure 5) is not possible, to investigate the therapeutic potential of the selected targets, we analyzed their expression in RMS patients' samples and in normal tissues, by using transcriptomics data published by Brohl et al. [49] (Supplementary Information Tables S11 and S12). Transcriptomics analysis of normal tissues confirms the selective RMS expression of the candidates. Indeed, the most representative candidates, e.g., FGFR4, show a relatively low FPKM number in normal tissues when compared to RMS tumors, where expression is highest in FP-RMS ( Figure 6). Highly specific expression of MEGF10 was observed in tumor samples, particularly in FP-RMS, compared to normal tissues. For MEGF10, the highest expression among normal tissues is observed in the cerebrum and cerebellum. CD276, JAM3, and NCAM1 also show higher expression in tumors compared to normal tissues, although expression in normal tissues is higher than for FGFR4 and MEGF10. Expression of GPC4 is high in the lungs, of L1CAM in the brain, and of GPC4 in the stomach. For these targets, a careful evaluation of protein expression in normal tissues will be required.
Figure 5 (caption fragment). [...] Table S15, and are the mean total sum normalized protein expression values across all samples that are stored in the database ProteomicsDB.
Figure 6 (caption fragment). [...] and 5-20 normal tissues per organ [49]. FGFR4, MEGF10, and CD276 show clearly higher expression in tumors (grey boxes) compared to normal tissues (white boxes). Box and whiskers show the median with the 25th to 75th percentiles. Bars represent the minimum and maximum values.
Next, we analyzed the distribution of peptide abundance of the most promising putative targets with a Log2(FC) > 2 in the RMS samples (Figure 7). The highest median Log2(iTop3) value was observed for NCAM1, followed by JAM3, CD276, FGFR4, AGRL2, CADM2, L1CAM, MEGF10, and GPC4. Importantly, all were consistently found in PDXs (green dots). Taken together, these results validate our selection strategy and show that the targets of interest were indeed detected at the highest levels on PDXs, suggesting that they might be valuable therapeutic targets for RMS.

Validation of AGRL2, L1CAM, and JAM3 Expression on RMS Cell Lines
After performing surfaceome analysis and in silico selection for RMS surface targets, several candidates stood out in terms of high expression in RMS samples (NCAM1, JAM3, CD276, FGFR4, AGRL2, CADM2, L1CAM, and MEGF10), and some showed a particularly low expression in normal tissues (FGFR4, MEGF10, and CD276). FGFR4 and NCAM1 are known targets for RMS; therefore, to reveal novel targets for RMS, we selected AGRL2, JAM3, and L1CAM and investigated their surface expression by flow cytometry on the eleven RMS cell lines and the two controls, MRC-5 and myoblasts, used in this study (Figure 8). For AGRL2 and JAM3, no directly labeled antibodies are commercially available; therefore, we had to use a two-step incubation with fluorescent secondary antibodies.
All RMS cells were positive for AGRL2, while the staining for MRC-5 fibroblasts and myoblasts was not above the control staining. The FP-RMS cell lines showed stronger staining than FN-RMS cell lines. JAM3 staining of RMS cell lines was consistently higher than AGRL2; however, staining in myoblasts and MRC-5 fibroblasts was higher than with isotype control, even though it was clearly lower than in RMS cell lines. L1CAM staining was clearly much higher in RMS cell lines compared to the controls. Therefore, these results demonstrate that the three proteins are expressed at high levels on most RMS cell lines and at much lower levels in the controls. This, on the one hand, validates our surfaceome profiling and selection strategy and, on the other hand, reveals AGRL2, JAM3, and L1CAM as novel surface targets for RMS.

Expression of L1CAM in RMS Tumors and Inverse Correlation with Survival
We next investigated the expression of L1CAM on a tissue microarray (TMA) with 248 cores from 124 RMS tumors, consisting of 24 ARMS and 100 ERMS [60]. Not all the cores were evaluable, so in the end, 17 ARMS and 60 ERMS could be evaluated. Most ARMS showed strong (Figure 9A, upper) or medium (Figure 9A, lower) L1CAM staining. In contrast, 95% of ERMS were negative. The H-score indicates that 85% of ARMS have high L1CAM expression, while the great majority of ERMS are negative.
To investigate the relevance of L1CAM expression for clinical prognosis, we took advantage of an expression data set with survival information. We compared the overall survival of ARMS patients with mRNA levels of L1CAM. The best cut-off value was determined as 123.3 (range 3-390), and this revealed a significantly worse survival probability of ARMS patients with high L1CAM expression (p = 0.044; Figure 9C). Performing the same analysis on the whole cohort, including ARMS and ERMS patients, a cut-off of 57.3 resulted in a more significant logrank p-value of 0.0016, likely reflecting the better survival probability of ERMS vs. ARMS and the restriction of L1CAM expression to ARMS. In conclusion, L1CAM is highly expressed in the majority of ARMS, and within this histological subclass, higher expression of L1CAM seems to define a group of patients with an even worse prognosis. Taken together, L1CAM-targeted therapies could provide a therapeutic option for ARMS patients with very poor prognoses.
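For readers unfamiliar with the H-score mentioned above, the following sketch shows the conventional definition, in which the percentages of weakly, moderately, and strongly stained tumor cells are weighted by 1, 2, and 3, giving a value between 0 and 300. The text does not state which H-score variant or which cut-off for "high" expression was applied to the TMA, so this is an illustration of the general formula only; the function name and the example percentages are hypothetical.

```python
def h_score(pct_weak, pct_moderate, pct_strong):
    """Conventional H-score: 1*%weak + 2*%moderate + 3*%strong (range 0-300).

    Assumption: inputs are percentages of stained tumor cells per intensity
    class; unstained cells make up the remainder and contribute zero.
    """
    percentages = (pct_weak, pct_moderate, pct_strong)
    if min(percentages) < 0 or sum(percentages) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


# Toy cores for illustration only, not values from the TMA.
strongly_stained_core = h_score(pct_weak=10, pct_moderate=20, pct_strong=60)
negative_core = h_score(pct_weak=0, pct_moderate=0, pct_strong=0)
```

A core in which most cells stain strongly lands near the top of the 0-300 scale, while a negative core scores 0.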

Discussion
In this work, we identified 699 surface proteins by performing surfaceome profiling, based on differential centrifugations enrichment of surface/membrane proteins and LC-MS detection, with six FP-RMS cell lines, five FN-RMS cell lines, and three RMS PDXs. Ranking of the proteins based on iTop3 expression analysis, mRNA expression, and expression in control normal MRC-5 fibroblasts and myoblasts, followed by statistical analysis and investigation of protein and mRNA expression in normal tissues, yielded nine surface proteins that are highly expressed in RMS and have low or absent expression in normal tissues: FGFR4, MEGF10, CD276, AGRL2, GPC4, JAM3, CADM2, NCAM1, and L1CAM. Expression of three of these candidates (AGRL2, JAM3, and L1CAM) on RMS cell lines was confirmed by FACS.
In this study, we found two well-known and investigated targets for RMS, FGFR4 [37][38][39] and NCAM1 [22][23][24][25], validating our approach. CD276 (B7-H3) has also been recently shown to be consistently overexpressed in RMS with high expression in 92% of FP-RMS and with medium-high intensity in 100% of FN-RMS tumors [62]. CD276 expression is regulated by the fusion protein PAX3-FOXO1 found in FP-RMS [63], and the monoclonal antibody 8H9, binding to a wide spectrum of tumors, including RMS, was found to target CD276 [64,65]. The B7-H3-targeting antibody-drug conjugate m276-SL-PBD was potently effective against pediatric cancers in preclinical solid tumor models, including RMS [66]. Expression of CD276 on RMS cells was independently identified by another group by surfaceome profiling and was shown to be a mediator of immune evasion [51]. All these results confirm that CD276 is a relevant target for RMS.
L1CAM. Among the novel targets not previously associated with RMS, targeting approaches are most advanced for L1CAM, which is highly and consistently overexpressed in neuroblastoma [67][68][69], ovarian cancer [70,71], and testicular germ cell tumors [72]. L1CAM was targeted with CAR T cells very early [73], and the effort to improve the CAR design continues (NCT02311621). Our results show that 85% of ARMS are strongly positive for L1CAM, and 95% of ERMS are negative. In a large study of 5155 tumors, expression of L1CAM was found in 50% of alveolar (FP) RMS (n = 42) and in 15% of embryonal (FN) RMS (n = 55) [74], confirming our observation. Here, we also show higher expression of L1CAM in ARMS compared to ERMS at the mRNA level. So far, no attention has been dedicated to targeting RMS with L1CAM antibodies or CAR T-cells, but our results suggest that a small group of RMS patients with the poorest prognosis might benefit from such an approach.
AGRL2, or Adhesion G protein-coupled receptor L2, is an adhesion G-protein-coupled receptor (aGPCR) that was first described in 2000 [75], and whose function has not been well investigated yet. Like other aGPCRs, AGRL2 has been associated with cancer (reviewed in [76]). AGRL2 was found to be upregulated by transcriptome profiling in urothelial carcinoma [77]. To the best of our knowledge, neither its expression nor its function has been investigated in RMS.
CADM2, cell adhesion molecule 2, belongs to the immunoglobulin superfamily and regulates cell adhesion, in particular synaptic assembly [78,79]. Its role in cancer is not completely clear: it is overexpressed in glioma [80], prostate cancer [81], and renal carcinoma [82], in which it can act as a tumor suppressor, but it promotes metastasis in other cancers such as non-small cell lung cancer [83] and hepatocellular carcinoma [84], with a role in epithelial to mesenchymal transition (EMT). In our analysis, CADM2 was significantly upregulated in all three RMS groups (FP-RMS, FN-RMS, and PDXs), and its expression in normal tissues was restricted to the brain. CADM2's role and expression in RMS still need to be investigated.
MEGF10 is a single transmembrane protein with particularly high expression in the CNS [85] and muscles [86,87]. In muscles, the expression seems to be restricted to satellite cells, the muscle progenitor cells, and MEGF10 mutations are associated with myopathies [88]. MEGF10 was among eleven RMS markers with high expression in RMS and low/no expression in normal peripheral blood or bone marrow to detect disseminated disease [89]. The overexpression of MEGF10 in RMS might be related to a block in myogenic differentiation [90]. Our analysis revealed a very restricted expression in normal tissues; however, CNS expression must be carefully evaluated to assess the safety of possible therapies targeting MEGF10. Overall, MEGF10 is a very appealing target for RMS therapy.
GPC4 belongs to the glypican family, a family of heparan sulfate proteoglycans that are attached to the cell membrane via a glycosylphosphatidylinositol anchor, with a known role in cancer. So far, only GPC3 and GPC5 [91,92] have been associated with RMS (reviewed in [93]), but not GPC4. Several CAR constructs against glypicans have been developed, but so far, no GPC4 CAR has been reported [93], making the expression of GPC4 in RMS appealing for novel CAR design.
JAM3, or Junctional Adhesion Molecule (JAM) C, mediates heterotypic cell-cell interactions with its cognate receptor JAM2 [94,95]. JAM3 is involved in homing and mobilization of hematopoietic stem and progenitor cells within the bone marrow and, by homology with zebrafish, might be involved in myocyte fusion [96,97]. JAMs are clearly involved in cell migration, polarization, and adhesion, and they are involved in cancer cell proliferation, migration, and invasion (reviewed in [98]). The function or expression of JAM3 in RMS has never been investigated.
Among the RMS surface targets previously identified, HER2/ERBB2 is missing from our selected list. HER2 CAR T cells are being tested for RMS therapy, and one encouraging success has been reported [36]. HER2 was detected in FP-RMS cell lines and FN-RMS cell lines but not in PDXs; therefore, it was scored low and was also not selected in the following stringent analyses. A less stringent selection might have selected HER2/ERBB2, but also a higher number of proteins. Alternatively, the lack of identification of HER2/ERBB2 might reflect the heterogeneous expression of HER2 observed within tumors [99]. It is interesting to note that HER3/ERBB3 was included in the Top100 list and showed significant expression in FP-RMS and FN-RMS cell lines and in PDXs. HER3/ERBB3 seems to be expressed in RMS more consistently than HER2/ERBB2 [99]. Although these results are dependent on the antibodies used and should be interpreted carefully, it is tempting to speculate that HER3 might be a good alternative to HER2/ERBB2 as a target for CAR T cell therapy in RMS.
One limitation of this type of study is posed by the availability of normal controls. Cultured primary cells, such as myoblasts, which express myogenic markers and are often used as a normal control for RMS, or fibroblasts, can be assumed to represent normal tissues; however, their surface expression can differ from that of normal tissues, and they can therefore serve only as a first screening tool. Proteomic databases ideally representing all human tissues are extremely useful to prioritize the targets with low expression in normal tissues. The challenge is how to compare our own data, e.g., surfaceome, with the deposited data that are derived from whole tissues and globally normalized. Detection of a protein in normal tissue does not disqualify it from being a viable therapeutic target. The difference in expression between tumor and normal tissue needs to be big enough to allow for selective targeting. Therefore, careful quantitative evaluation of the expression is mandatory. This is very important, since the identification of proteins exclusively expressed on tumors is a very rare event. The final evaluation of the therapeutic window needs to be performed in more complex model systems, non-human primates, and eventually in patients.
In conclusion, surfaceome profiling of cultured tumor cells is a very powerful tool to identify novel putative cell surface targets for antibody-based therapies, such as CAR T-cell therapy. Here, we confirm FGFR4, NCAM1, and CD276 as specific RMS targets, and identify AGRL2, JAM3, and MEGF10 as promising candidates. In particular, the high L1CAM expression observed in the aggressive ARMS histological subtype, and its inverse correlation with survival, support further investigation of L1CAM-targeted therapies for patients with dismal prognosis.

Cell Surface Proteins Isolation
Membrane/surface proteins were enriched with two methods: (1) cell surface biotinylation and isolation (Thermo Fisher Scientific, #A44390), following the manufacturer's instructions; and (2) a two-step protocol of ultracentrifugation and high-salt washes [52]. Briefly, 1 × 10⁷ cells were seeded on five P15 dishes. On the day of the experiment, 80-90% confluent cells were gently washed twice with PBS at RT, collected with a scraper, and centrifuged at 700× g at RT for 5 min. After resuspension in 1 mL cold hypotonic buffer (50 mM Mannitol, 5 mM HEPES, pH 7.4), the cells were homogenized with 1 min sonication (10% duty cycle, Branson Sonifier 250, Thermo Fisher Scientific) and centrifuged at 600× g at 4 °C for 10 min. The supernatant was then processed by differential centrifugation: 15,000× g, 4 °C for 5 min; wash in 10 mM CaCl₂; shaking at 4 °C for 10 min; 3000× g at RT for 15 min; 48,000× g for 30 min at RT; wash in 1 M KCl; 48,000× g at RT for 30 min; wash in 0.5 mL 100 mM Na₂CO₃; 48,000× g at RT for 30 min. Next, all the samples were resuspended in 20 µL Laemmli buffer (62.5 mM Tris-HCl, pH 6.8, 1% SDS, 10% Glycerol, 40 mM DTT) and separated by 1D gel electrophoresis with a 1.5 cm migration distance. For all the cell lines, three replicates were obtained. The SDS gel was fixed with 10% glacial acetic acid/40% EtOH, stained with 0.1% Brilliant Blue G in 45% EtOH/10% acetic acid, and destained with 10% glacial acetic acid/40% EtOH in order to visualize the protein bands. Each lane was cut into four horizontal bands, and each band was further cut into six gel cubes. The six pieces of gel were then stored in 20% EtOH at 4 °C until processing.

In-Gel Digestion and Mass Spectrometry (MS)
MS experiments were performed in collaboration with the DBMR proteomics core facility (University of Bern). Proteins were in-gel digested as previously described [102]. Digests were loaded onto a precolumn (C18 PepMap 100, 5 µm, 100 Å, 300 µm i.d. × 5 mm length, Thermo Fisher Scientific) at a flow rate of 50 µL/min with solvent C (0.05% TFA in water/acetonitrile 98:2). After loading, peptides were eluted in back flush mode onto a homemade C18 CSH Waters column (1.7 µm, 130 Å, 75 µm × 20 cm) by applying a 40 min gradient of 5% to 40% acetonitrile in water, 0.1% formic acid, at a flow rate of 250 nL/min. The column effluent was directly coupled to a Fusion LUMOS mass spectrometer (Thermo Fisher Scientific) via a nano-spray ESI source. Data were acquired in data-dependent mode, with precursor ion scans recorded in the orbitrap at a resolution of 120,000 (at m/z = 250) in parallel with top speed fragment spectra of the most intense precursor ions in the linear ion trap, for a cycle time of 3 s maximum.
The samples were searched and quantified with MaxQuant [103] version 2.0.1.0, using the SWISS-PROT [104] Homo sapiens database (April 2021 release) containing isoforms, to which common contaminants were added. Search parameters were the following: the enzyme was set to strict trypsin, with a maximum of three missed cleavages allowed; the first search peptide tolerance was set to 10 ppm, and the MS/MS match tolerance to 0.4 Da; carbamidomethylation on cysteine was set as a fixed modification, while methionine oxidation, asparagine and glutamine deamidation, and protein N-terminal acetylation were set as variable modifications. The match-between-runs option was enabled, with the corresponding fractions labeled 1 to 4. The Top3 values were calculated by first normalizing peptide forms with variance stabilization normalization [105] and imputing them (see below) before summing the top three intensities.
Imputation at the peptide level was performed as follows: if there was at most one nonzero value in a group of replicates, then the remaining missing values were drawn from a Gaussian distribution of width 0.3 times the sample standard deviation and centered at the sample distribution mean minus 2.8 times the sample standard deviation; any remaining missing values were imputed by the Maximum Likelihood Estimation (MLE) method [106].
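As an illustration only, the down-shifted Gaussian imputation rule and the Top3 summation can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the pipeline used in the study: the function names, the data layout (rows are peptides, columns are replicates of one group, `None` marks a missing value), and the fixed random seed are hypothetical. The "sample" mean and standard deviation are assumed to be computed per replicate over its observed values. The MLE step applied to the remaining missing values is not reproduced.

```python
import random
import statistics


def impute_downshifted(matrix, width=0.3, shift=2.8, seed=0):
    """Impute missing values for peptides with at most one observed replicate.

    matrix: list of rows (peptides); each row lists the normalized intensities
    of the replicates in one group, with None marking a missing value.
    Missing entries in qualifying rows are drawn from a Gaussian centered at
    (sample mean - shift * sample SD) with standard deviation width * sample SD,
    where the sample statistics are taken per replicate column (assumption).
    Rows with two or more observed values are returned unchanged; the source
    imputes those by MLE, which is not implemented here.
    """
    rng = random.Random(seed)
    n_cols = len(matrix[0])

    # Mean and SD of the observed values in each replicate (sample).
    col_stats = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        col_stats.append((statistics.mean(observed), statistics.stdev(observed)))

    imputed = []
    for row in matrix:
        n_observed = sum(value is not None for value in row)
        if n_observed <= 1:
            new_row = []
            for j, value in enumerate(row):
                if value is None:
                    mean, sd = col_stats[j]
                    value = rng.gauss(mean - shift * sd, width * sd)
                new_row.append(value)
        else:
            new_row = list(row)
        imputed.append(new_row)
    return imputed


def top3(peptide_intensities):
    """Sum the three largest peptide intensities of one protein in one sample."""
    return sum(sorted(peptide_intensities, reverse=True)[:3])


# Invented toy values, for demonstration only.
toy = [
    [20.0, 21.0, 19.5],
    [18.0, None, 18.5],   # two observed values: left for the MLE step
    [None, None, 22.0],   # one observed value: down-shifted imputation
    [None, None, None],   # no observed value: down-shifted imputation
    [25.0, 24.0, 26.0],
]
filled = impute_downshifted(toy)
```

Drawing from a distribution shifted well below the sample mean encodes the assumption that a peptide missing in almost all replicates lies near or below the detection limit.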

MS Data Processing and Data Mining
MS-derived data were inspected with the Panther database (www.pantherdb.org (accessed on 15 September 2022)) to evaluate the amount of membrane-associated proteins and validate the experiments. To select membrane/surface proteins with higher confidence, two published lists of predicted/annotated membrane/surface proteins were used. List A is a list of 2886 surface proteins, predicted by SURFY with an accuracy of 93.5% and experimentally validated, which is included in the Cell Surface Protein Atlas (CSPA) and was published by Bausch-Fluck et al. (Supplementary Information, List A) [53]. List B [107] is a comprehensive list of 7643 membrane/surface proteins generated bioinformatically by pooling annotated surface proteins from Gene Ontology [108], transmembrane proteins predicted by hidden Markov models (TMHMM) [109], and glycosylphosphatidylinositol (GPI)-anchored proteins [107] (Supplementary Information, List B).
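The list-based selection described above amounts to intersecting the detected proteins with each reference list. The following Python sketch illustrates this idea; the function name, the placeholder accessions, and the way the two lists are combined (membership in each list, plus the overlap of both) are assumptions made for the example and are not taken from the study.

```python
def filter_by_surface_lists(detected, list_a, list_b):
    """Group detected proteins by their membership in two reference lists.

    detected, list_a, list_b: iterables of protein accessions.
    Returns a dict of sets: proteins found in list A, in list B, in both,
    and in neither list.
    """
    detected = set(detected)
    in_a = detected & set(list_a)
    in_b = detected & set(list_b)
    return {
        "list_A": in_a,
        "list_B": in_b,
        "both": in_a & in_b,
        "unannotated": detected - in_a - in_b,
    }


# Placeholder accessions, not real identifiers.
groups = filter_by_surface_lists(
    detected=["ACC001", "ACC002", "ACC003", "ACC004"],
    list_a=["ACC001", "ACC002", "ACC900"],
    list_b=["ACC002", "ACC003", "ACC901"],
)
print(sorted(groups["both"]))
```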
Subcellular localization of the putative targets was verified with the GeneCards database (www.genecards.org (accessed on 15 September 2022)); protein expression in normal tissues was evaluated with the Human Protein Atlas database (www.proteinatlas.org (accessed on 15 September 2022)) by checking RNA expression (nTPM) and protein expression (score). The UniProt Knowledgebase was used to confirm single candidates as membrane proteins (www.uniprot.org (accessed on 15 September 2022)). Briefly, membrane/surface proteins classified in UniProt as "reviewed" were sorted by the keywords "Homo sapiens" in the Taxonomy search field and "Transmembrane" in the Subcellular location search field, and the corresponding gene names were converted into UniProtKB IDs.

Scoring Strategy for Sorted Membrane/Surface Proteins
The membrane/surface proteins extracted from the MS data were further processed to determine the Top100 upregulated surface proteins. A stringent scoring scheme was designed to assign lower grades to the most RMS-specific candidates, expressed at the highest levels (Table 1).

Statistical Analysis
Differential expression by moderated t-statistics and significance evaluation was performed following Uldry et al., 2022 [110], with a minimum log2 fold change of 1 and a maximum adjusted p-value of 0.05 for each individual comparison, using the imputed Top3 intensities for each set of cells. Results for C-list proteins were summarized by plotting on the x-axis the average log2 fold changes between each cell of the set and MRC-5 and myoblasts, and on the y-axis, the average of the corresponding moderated t-statistics of the comparisons. Proteins for which the moderated t-statistics were above 2.132 (95th percentile of the corresponding Student's distribution) in all three sets of cells were considered of interest. Graphs were generated with R. The linear mixed model (LMM) was derived from the R implementation DREAM [56] and was used to perform a statistical evaluation of the respective FP-RMS, FN-RMS, and PDX groups versus the controls while accounting for the fact that the replicates within each subgroup are repeated measurements. Differential expression and significance evaluation were performed as above. Volcano plots were generated with the online tool VolcaNoseR [111].
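The two selection criteria described above can be sketched in Python as follows. This is a minimal sketch, not the analysis code of the study: the function names, the data layout, the protein labels, and all numeric values in the example are invented. The threshold 2.132 is taken from the text; it matches the 95th percentile of a Student's t distribution with 4 degrees of freedom.

```python
T_CRITICAL = 2.132  # threshold stated in the text
SETS = ("FP-RMS", "FN-RMS", "PDX")


def passes_comparison(log2_fc, adj_p, min_log2_fc=1.0, max_adj_p=0.05):
    """Apply the per-comparison cut-offs: log2 fold change >= 1, adjusted p <= 0.05."""
    return log2_fc >= min_log2_fc and adj_p <= max_adj_p


def proteins_of_interest(avg_t, t_critical=T_CRITICAL):
    """Return proteins whose average moderated t-statistic exceeds the
    threshold in all three sets of cells.

    avg_t: dict mapping protein -> dict mapping set name -> average t-statistic.
    """
    return sorted(
        protein
        for protein, t_by_set in avg_t.items()
        if all(t_by_set[s] > t_critical for s in SETS)
    )


# Invented values with hypothetical protein labels.
example = {
    "PROT_A": {"FP-RMS": 5.1, "FN-RMS": 3.4, "PDX": 2.9},
    "PROT_B": {"FP-RMS": 6.0, "FN-RMS": 1.8, "PDX": 4.2},  # fails in FN-RMS
    "PROT_C": {"FP-RMS": 2.0, "FN-RMS": 2.5, "PDX": 3.0},  # fails in FP-RMS
}
print(proteins_of_interest(example))
```

Requiring the threshold to be exceeded in all three sets, rather than on average, prevents a protein that is strongly upregulated in only one group from being selected.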

Transcriptomics Data Analysis
The mRNA expression data of the genes corresponding to the Top100 putative targets for RMS tumors and normal tissues were obtained from the RMS whole-transcriptome sequencing data set (dbGaP Study Accession: phs000720.v3.p1), reported in 2021 by Brohl et al. [49].

Scoring Strategy for mRNA Data from RMS Tumors
The mRNA levels of the genes for the Top100 proteins were ranked by applying the scores in Table 2.

Flow Cytometry Analysis
Cells were detached with Accutase (Thermo Fisher Scientific) for 10 min at 37 °C, washed with PBS, and counted. A total of 100,000 cells were incubated in 100 µL FACS buffer (2% BSA in PBS) with the primary antibodies at the optimized concentrations for 30 min at RT. Flow cytometry measurements were performed with a CytoFLEX device (Beckman Coulter, Krefeld, Germany). The results were analyzed with FlowJo v10.8.1 Software (BD Life Sciences, Allschwil, Switzerland).

Tissue Microarrays
A tissue microarray with 248 cores from 124 RMS tumors (24 ARMS, 17 of which had known FOXO1 gene rearrangements, and 100 ERMS) was constructed [60]. The tumors were collected at the University Hospital Zurich, Switzerland, and at the Kiel Pediatric Tumor Registry, Kiel, Germany. Immunohistochemistry was performed essentially as described in [72] using the monoclonal anti-L1CAM antibody (clone 14.10, directed to the ectodomain, 1:200).

Survival Analysis
The correlation between L1CAM mRNA expression levels and RMS survival was analyzed with the dataset "Rhabdomyosarcoma Davicioni 147", publicly available through the R2 Genomics Analysis and Visualization Platform (http://r2.amc.nl; ps_avgpres_rmstriche147_u133a (accessed on 12 January 2023)) and derived from a comprehensive analysis of 147 RMS samples [39]; survival data were obtained from the supplementary tables in Davicioni et al. [112]. The Kaplan-Meier plot was generated with https://kmplot.com (accessed on 15 January 2023), autoselecting the best cut-off and performing univariate Cox regression as described [113]. Significance was computed using the Cox-Mantel (logrank) test [61].

Data Availability Statement: Proteomics data have been deposited to proteomeXchange.org (identifier PXD039480) [114].

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.