Protein Expression of AEBP1, MCM4, and FABP4 Differentiate Osteogenic, Adipogenic, and Mesenchymal Stromal Stem Cells

Mesenchymal stem cells (MSCs) gain an increasing focus in the field of regenerative medicine due to their differentiation abilities into chondrocytes, adipocytes, and osteoblastic cells. However, it is apparent that the transformation processes are extremely complex and cause cellular heterogeneity. The study aimed to characterize differences between MSCs and cells after adipogenic (AD) or osteoblastic (OB) differentiation at the proteome level. Comparative proteomic profiling was performed using tandem mass spectrometry in data-independent acquisition mode. Proteins were quantified by deep neural networks in library-free mode and correlated to the Molecular Signature Database (MSigDB) hallmark gene set collections for functional annotation. We analyzed 4108 proteins across all samples, which revealed a distinct clustering between MSCs and cell differentiation states. Protein expression profiling identified activation of the Peroxisome proliferator-activated receptors (PPARs) signaling pathway after AD. In addition, two distinct protein marker panels could be defined for osteoblastic and adipocytic cell lineages. Hereby, overexpression of AEBP1 and MCM4 for OB as well as of FABP4 for AD was detected as the most promising molecular markers. Combination of deep neural network and machine-learning algorithms with data-independent mass spectrometry distinguish MSCs and cell lineages after adipogenic or osteoblastic differentiation. We identified specific proteins as the molecular basis for bone formation, which could be used for regenerative medicine in the future.

In this study, we focus specifically on the characteristics of MSCs in the context of potential treatment interventions in bone lesions. MSCs can differentiate into osteoblasts (OB), adipocytes (AD), chondrocytes and muscle cells [20]. Although the utilization of MSCs as a cellular therapy approach is clinically investigated, the experimental efficacy of MSCs concerning bone regeneration was reported inconsistently regarding the successful in vivo bone formation [18,[21][22][23]. Next to insufficient marker panels for MSC classification, the most likely causes for this observation are cellular and molecular variations of the bone marrow, which does not consist of a homogenous cell type but rather of a population with high cellular heterogeneity containing multipotent stem cells, progenitors and differentiated cells [24][25][26][27]. Furthermore, reports suggest that bone marrow contains clonal MSC subpopulations with associations towards either osteoblast or adipocyte lineage. Hence, MSCs show cellular heterogeneity in the context of in vitro osteoblast differentiation, resulting in heterogenous bone formation capacity in vivo [24,[28][29][30][31][32][33][34] and certain bone diseases such as osteoporosis [35][36][37].
Since the nature of the MSC differentiation into osteoblasts and adipocytes remains unclear [23,38,39], the specific objective of this study was to characterize undifferentiated MSCs and cells after osteoblastic and adipocytic differentiation on the proteome level. We utilized quantitative mass spectrometry in data-independent acquisition mode combined with machine-learning algorithms for mass spectrometric and statistical evaluation. Generated quantitative proteome data were used to gain further insight into functional annotation including intracellular signaling pathways and gene ontology terms for potential clinical applications. In the future, identified protein candidates for adipocytes or osteoblasts could be used for regenerative therapeutic approaches healing bone fractures or bone diseases.

Cultured Cell Populations Fulfilled Phenotypic Criteria of MSC, Adipocytic and Osteoblastic Criteria
The individual MSC isolates (n = 5), each derived from one single individual (patients p11, p13, p15, p17, and p18), were characterized in vitro (Supplementary Table S1), and have been published elsewhere [32,38]: using a colony-forming unit fibroblast (CFU-F) assay, cultured MSCs formed colonies and expressed alkaline phosphatase (ALP) (median percentage of positive cells ± SD, 37.19% ± 10.91%). The cells were further characterized by their MSC surface marker expression recommended by the International Society for Cellular Therapy (ISCT [39]). Cell material of p11, p15, and p18 were limited, wherefore the determination of all surface markers was only possible for p13 and p17. However, p13 and p17 showed positive expression values for CD44, CD90, CD105 and CD73 in ≥99% (max. SD of 0.16%).
After induction of in vitro osteoblastic differentiation, matrix mineralization was visualized by alizarin red staining and compared to cells cultured in control media (Supplementary Figure S1a). Differentiated cells showed a 3.53-fold higher alizarin red intensity after 14 days of culturing. Additionally, cells showed a 6.35-fold higher ALP activity compared to the control cells. Adipocytic differentiation was assessed by quantifying the lipid droplet area based on Oil Red O staining after 14 days. All cell cultures showed visible lipid droplets after culturing in adipocyte induction media (Supplementary Figure S1b). These results confirmed the successful differentiation of MSCs into adipocytic and osteoblastic cell types, respectively.

Combined Quantitative Mass Spectrometry and Neural-Network-Based Algorithms Revealed Distinct Protein Expression Patterns of MSCs, Adipocytic and Osteoblastic Cells
To compare differences between adipocyte (AD), osteoblast (OB) lineage cells, and undifferentiated MSCs on the proteome level, we performed microflow ESI-MS/MS analysis using the data-independent acquisition mode for exact quantification. The neuronal network-based workflow of DIA-NN identified 4569 unique proteins (FDR ≤ 0.01 on both precursor and protein levels). After filtering and imputation of missing values (cf. Section 4.12), quantitative values of 4108 proteins were subsequently compared between groups (AD, OB, and MSCs) to reveal distinct clustering between samples, differentially expressed proteins, and enriched gene sets (cf. Section 4.13). A table containing all protein quantification values is included in Supplementary Table S2.

MSCs, Osteoblasts, and Adipocytes Show Distinct Clustering Behavior and Involvement of PPAR
Phenotypic differences between distinct cell types were visualized using unsupervised principal component analysis (PCA) and hierarchical clustering (so-called 'heatmap') applying detected protein expression data. Both, PCA and hierarchical clustering showed a high discriminating potential and clear separation between the three groups ( Figure 1A and Supplementary Figure S2). Analogous to the mass spectrometry-based analysis, twodimensional gel electrophoresis was performed. Here, 1517 protein spots were detected and revealed a similar clustering behavior in the PCA plot ( Figure 1B).

Discovery of Differentially Expressed Proteins
For the detection of differentially expressed proteins, all three two-group comparisons were carried out using an ANOVA with Benjamini-Hochberg correction and Tukey post hoc testing (Supplementary Table S3). While the comparison between OB and MSCs identified 50 differentially expressed proteins (post-hoc q-value < 0.05, |log 2 FC| > 2) with 25 proteins being over-and under-expressed in the OB group, respectively (Figure 2a), the comparison OB versus AD revealed 66 proteins with differential abundance (16 over-and 50 under-expressed in the OB group, Figure 2b). The evaluation between AD and MSC revealed 61 proteins with a higher and 25 with lower protein abundance (86 differential expressed proteins in total, Figure 2c).

Two Distinct Protein Panels Differentiate Osteoblasts and Adipocytes from Mesenchymal Stem Cells
Next, the separation of the three distinct cell types based on their protein expression profiles was analyzed. In total, 15 proteins which were both identified as differentially expressed in the OB vs. MSC and OB vs. AD comparison were defined as osteoblastspecific proteins. In analogy, 30 proteins that were differentially expressed in the AD vs. MSC and OB vs. AD comparison were defined as adipocyte-specific proteins (Figure 2d). To validate the results of the ANOVA analysis, expression data were further evaluated by computing their variable importance using machine learning algorithms (LASSO [41], elastic nets [42], random forests [43], and extreme gradient boosting [44]): the results confirmed 9 out of 15 proteins (60%) for the osteoblastic and 13 out of 30 proteins (43%) for the adipocytic panel.

Osteoblastic Panel
A total of nine differentially expressed proteins were calculated combining two feature selection approaches (classical statistical analysis and machine learning algorithms) for the osteoblastic panel showing six (66%, AEBP1, BGN, CARMIL1, CYP24A1, MCM4, STMN1) with a higher and three (33%, COL3A1, MEST, P4HA1) with lower expression in the osteoblastic group. Expression levels of these nine osteoblast-specific proteins are visualized in Figure 3a. Closer inspection of the figure showed a protein expression of AEBP1 and MCM4 in the AD and MSC group with uniformly low expression values (range of means: 1.09-1.60) and coefficients of variation (CV, range: 0.28-0.64).

Adipocytic Panel
For the adipocytic panel, a total of 13 differentially expressed proteins (ACSL1, CD36, EPHX1, FABP4, HP, HSD11B1, ITIH1, MAOA, PLIN1, PLIN4, PLPP1, RAP2A, SCD) were detected all being more highly expressed in the AD group. Expression levels of these 13 adipocyte-specific proteins are visualized in Figure 3b. It is apparent from this figure that the protein expression of FABP4, ITIH1, SCD, PLIN1, and PLIN4 in the OB and MSC group presented uniformly low expression values (range of means: 0.24-1.04) and coefficients of variation (CV, range: 0.36-1.72).
As a summary, differentially expressed proteins which were cell-type specifically differentially expressed and further validated by the machine learning strategies are presented as a heatmap in Figure 4 and Supplementary Table S4 including gene symbols, UniProt IDs, descriptive statistics, ANOVA with post-hoc results, fold changes and functional GO-term annotations.

MSC Marker Protein Panel Comparison
The Mesenchymal Tissue Stem Cell Committee of the International Society of Cellular Therapy (ISCT) defined a set of cluster differentiation (CD) cell surface markers for MSC classification (CD105+, CD73+, CD90+, CD45-, CD34-, CD14-, CD11b-, CD79a-or CD19and HLA-DR-). However, reports suggest that this marker panel is insufficient to distinguish undifferentiated MSCs from differentiated adipocytes or osteoblasts [32,38]. This inconsistency is also supported by our quantitative protein data shown in Table 2. It is apparent from this table that all ISCT-defined markers identified showed no significance across comparisons. Additionally, CD90 was the only protein to present a net log 2 fold change >1.0 (OB vs. MSC comparison). All CD markers that are defined by a very low protein level by the ISCT were not identified in the analyzed samples.

Discussion
Therapy development for bone regeneration is highly challenging and depends on the in vitro differentiation of MSCs into favorable bone-forming osteoblasts. It has been shown that, e.g., osteoblast-like cells derived from MSCs can prevent glucocorticoid-induced bone loss [45] and support bone regeneration [46]. However, the prediction of MSCs to differentiate into osteoblasts is hampered by the incomplete biomolecular understanding and the lack of cellular biomarkers that define the quality of cells designated for therapy [29]. The present study was designed to determine differences between MSCs and their differentiated adipocytes (AD) and osteoblasts (OD) on the proteome level. Although previous studies evaluated the relationship between MSCs and osteoblasts using proteomics approaches such as mass spectrometry [47][48][49][50], this is the first report comparing the global proteome of MSCs, osteoblasts, and adipocytes by using label-free mass spectrometry in data-independent acquisition mode for quantification. Additionally, updated workflows for machine learning algorithms were applied for in-depth protein identification and data evaluation.
Cluster analysis of proteomics data indicated a clear separation of all cell types along with an enrichment of hallmark gene sets, e.g., associated with adipogenesis including, e.g., PPAR (Peroxisome proliferator-activated receptors) signaling gene sets which is considered as the key master transcription regulator in adipocytes [51]. The Mesenchymal and Tissue Stem Cell Committee of the International Society for Cellular Therapy (ISCT) has defined MSC by a set of present (+) and absent (-) cluster of differentiation (CD) markers (CD105+, CD73+, CD90+, CD45-, CD34-, CD14-, CD11b-, CD79a-or CD19-and HLA-DR-), their plastic adherence capacity and their multipotent differentiation potential when cultured in standard conditions as minimal quality [39]. However, it has been reported that positive markers of the ISCT panel are homogenously expressed among all MSC progeny [32]. Consistent with the literature, our protein analysis demonstrated that CD105, CD73, and CD90 were detectable but did not show a differential expression between MSCs, adipogenic and osteoblastic cells (Table 2). Additionally, all ISCT-defined markers that should present a very low protein abundance in MSCs have not been identified at all. These findings suggest that (a) our applied mass spectrometric workflow is capable of validating ISCT + -markers and (b) new protein markers to differentiate between MSCs, adipocytes, and osteoblasts are needed.
The model of the MSC differentiation into AD and OB allows us to determine their global proteome using quantitative mass spectrometry combined with neuronal networks and machine-learning algorithms for the first time. Interestingly, Aasebø et al. performed the only recent similar study and reported a strong separation of osteoblasts and MSCs [50] by comparing mass spectrometric data after data-dependent acquisition. However, and in contrast to our data, the proteome of adipocytes was not evaluated and only proteins were reported that showed an abundance level in the osteoblasts and not in the MSCs (n = 156). The findings of the here presented data considered positive and negative effect sizes (log 2 FC > |2|) resulting in 50 differentially expressed proteins between MSCs and osteoblasts. In line with the results presented by Aasebø, we detected decorin (DCN) and biglycan (BGN) as potential protein markers for osteoblastic differentiation (Supplementary Table S2): while BGN acts on the cell surface and is involved in the matrix mineralization [52][53][54], DCN was described to promote osteoblast differentiation fate [55]. It must be noted though that DCN could not be validated by machine learning algorithms and was thus not included in our final differentiation marker panel for osteoblastic differentiation.

Osteoblastic Panel
Overall, the applied algorithms revealed nine specific proteins for the osteoblastic differentiation, respectively. Analogous to BGN, cytochrome P450 family 24 subfamily A member 1 (CYP24A1), AE binding protein 1 (AEBP1), and collagen type III alpha 1 chain (COL3A1) concern functions of bone mineralization, osteoblastogenesis as well as matrix remodeling and thus confirmed a close association to the osteoblastic molecular differentiation module [56][57][58][59][60]. Strikingly, CYP24A1 presented the highest fold-change comparing OB to MSC (log 2 FC 3.65) and OB to AD (log 2 FC 3.52). Further, three identified proteins were reported to be associated with osteoblastic diseases: while Stathmin 1 (STMN1) was described for osteoblast and osteoclast function [61] as well as for osteopenic phenotypes in mice [62], we found minichromosome maintenance complex component 4 (MCM4) which seems to play an important role during cell division and metastasis-free survival of osteosarcoma patients [63]. Further, prolyl 4-hydroxylase subunit alpha 1 (P4HA1) has been found as the active catalytic component of the prolyl 4-hydroxylase which catalyzes the post-translational formation of 4-hydroxyproline [64,65]. High P4HA1 gene and protein expression values have been recently described as a prognostic predictor in head and neck squamous cell carcinoma [66] as well as primary melanomas [67]. Additionally, P4HA1 is associated with the collagen-dependent bone disease osteogenesis imperfecta [68,69]: as P4HA1 is involved in the post-translational modification of collagens, a direct involvement in the pathogenesis of osteogenesis imperfecta is conceivable. Noteworthy, AEBP1 and MCM4 presented low protein levels in the AD and MSC groups making AEBP1+ and MCM4+ most suited as new OB or ISCT differentiation markers with a known osteoblastic background.
No literature link for osteoblastic differentiation or diseases was found for the proteins capping protein regulator and myosin 1 linker 1 (CARMIL1), and mesoderm specific transcript (MEST). While the CARMIL1 was more highly expressed in the OB than in the MSC and AD groups, the expression pattern for MEST was reduced. The plasma-membraneassociated protein CARMIL1 plays a role in the regulation of actin polymerization and cell migration. Specifically, CARMIL1 prevents the F-actin heterodimeric capping protein (CP) activity of migrating cells and thus stimulates actin polymerization [64]. In this context, it has been shown that activation of actin polymerization decreases osteoblast differentiation and bone formation in MSCs [70]. Since we observed high CARMIL1 levels in OBs, one could assume that the endpoint of an osteoblastic differentiation process is marked by high protein levels of CARMIL1 to activate actin polymerization and thus to stop cellular differentiation mechanisms.
Last, MEST-one of the markers with a low expression in osteoblastic cells-has been described to be involved in the mesoderm development and the regulation of lipid storage. Inline, it was described as a specific protein for the endoplasmic reticulum that co-localizes within lipid droplets in cells undergoing adipogenic differentiation [71]. Additionally, elevated gene expression of MEST in preadipocytes differentiating in adipocytes was described by Kadota et al. [72].
Noteworthy and to our best knowledge, no specific association to adipocyte differentiation was described for inter-alpha-trypsin inhibitor heavy chain 1 (ITIH1), RAP2A member of RAS oncogene family (RAP2A), epoxide hydrolase 1 (EPHX1), and phospholipid phosphatase 1 (PLPP1). While ITIH1 belongs to a protein family of related plasma serine protease inhibitors which is involved in extracellular matrix stabilization and the prevention of tumor metastasis [81], RAP2A is a small GTP binding protein that may regulate cytoskeletal rearrangements, cell migration, cell adhesion, and spreading [82]. EPHX1 is a member of the epoxide hydrolase family which plays a role in the metabolism of endogenous lipids such as epoxide-containing fatty acids [83] and fulfills a key function in the detoxification of xenobiotics [84]. Lastly, PLPP1 is a magnesium-independent phospholipid phosphatase of the plasma membrane and is associated with the regulation of inflammation, platelet activation, cell proliferation, and migration [85,86]. Against this background, it could be hypothesized that all four proteins may play a reasonable role in adipocytic processes. However, further studies including larger patient collectives are required.
In conclusion, this study set out to characterize undifferentiated MSCs and cells after osteoblastic and adipocytic differentiation on the proteome level. These experiments identified 22 highly cell type-specific proteins for MSCs, adipocytes, and osteoblasts using the combination of deep neural network-based quantification of data-independent mass spectrometry data. Overexpression of AEBP1 and MCM4 for OB as well as of FABP4 for AD differentiation seem to be the most promising molecular targets which could be used for regenerative medicine, stem cell, and cancer research in the future. Further studies to evaluate the molecular basis for bone formation including single-cell criteria and clinical patient data are warranted.

Donors and Materials
Bone marrow was aspirated from the lower extremities of five adult donors undergoing surgeries at the Department of Orthopedics and Traumatology, Odense University Hospital. The donor collective consisted of one male and four females. The bone marrow samples were considered as 'waste material' from routine operations and were thus collected without any extra patient risk. All donors received oral and written information and signed a consent. The project was approved by the Scientific Ethics Committee of Southern Denmark (project ID: S-20160084).

Cell Isolation and Culture
Bone marrow (5-10 mL) was collected into ethylenediaminetetraacetic acid (EDTA)coated vacutainers. MSCs were isolated from the mononuclear cell population following gradient centrifugation using Lymphoprep of the bone marrow, through plastic adherence, as described previously by Stenderup et al. [87]. The cells were cultured in minimum essential medium (MEM medium) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin (P/S) at 37 • C in a humidified incubator with 5% CO 2 . The medium was switched to MEM medium supplemented with 10% FBS, 1% P/S, 1% GlutaMAX, 1% sodium pyruvate, and 1% nonessential amino acids (S-MEM growing medium) after the first visualization of adherent cells. At 80% confluence, the cells were trypsinized and used for analysis and further cell expansion.

Colony-Forming Unit-Fibroblast (CFU-f) Assay
The colony-forming unit-fibroblast (CFU-f) assay was performed to assess the colonyforming capacity of cultured MSCs. The freshly isolated cells were counted in triplicates using a hemocytometer under an optical microscope and plated at a density of 1 million cells (passage 0) into three 22.1 cm 2 Petri dishes (TPP, 93060). Standard culture conditions were used for 17 days and colonies were visualized by crystal violet staining.

Cell Proliferation
The cell proliferation capacity assay was performed at the first cell passage in triplicates. The cells were counted in a hemocytometer under an optical microscope, seeded (1000 cells/well) in a 6-well plate (TPP, 92006), and cultured under standard conditions. On days 1, 3, 6, 9, 12, and 15, the cells were trypsinized and counted in a hemocytometer. The proliferation capacity of the cells was measured as the area under the curve (AUC).

Osteoblastic Differentiation
For the osteogenic differentiation, MSCs at first passage were seeded in a 4-well plate at a density of 20,000 cells/cm 2 . At 90% confluence (after 24 h), cell culture media were replaced with osteoblastic induction media containing: 10% FBS, 1% P/S, 5 mM β-glycerophosphate, 10 nM dexamethasone, 50 µg/mL vitamin C, and 10 nM vitamin D 3 . Osteoblastic induction media were replaced every 2-3 days. After 14 days, the osteoblastic differentiation was assessed by visualization of mineralized matrix formation via alizarin red staining. The cells were washed with PBS and fixed with 70% ice-cold ethanol at −20 • C for 1 h. Afterwards, the cells were washed with Milli-Q and incubated with alizarin red (pH = 4.2) for 10 min with rotation at room temperature (RT). Subsequently, the staining intensity of alizarin red was quantified using ImageJ software.

Alkaline Phosphatase (ALP) Activity
The alkaline phosphatase (ALP) activity is a common biochemical measure for osteoblast activity. Cells were washed with tris-buffered saline (pH 9), fixed with 3.7% formaldehyde-90% ethanol for 30 s at RT, and incubated with p-nitrophenyl phosphate (1 mg/mL) in 50 mM NaHCO 3 and 1 mM MgCl 2 , pH 9.6 at 37 • C. After 20 min of incubation, 3 M NaOH was added to stop the reaction. Absorbance was measured at 405 nm, and ALP activity values were corrected for the number of cells in each well. The cell number was determined as a measure of cell viability and determined by incubating the cells with CellTiter-Blue for 1 h at 37 • C. The fluorescent intensity at 560/590 nm (excitation/emission) was measured in the FLUOstar Omega plate reader.

Adipocytic Differentiation
MSCs of the first passage were plated at a density of 30,000 cells/cm 2 in a 4-well plate for 24 h. At near full confluence, the media were replaced with adipocytic induction media containing Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% FBS, 1% P/S, 5% horse serum, 1 µM rosiglitazone (BRL) 49,653, 3 µg/mL insulin, 100 nM dexamethasone and 225 µM 3-isobutyl-1-methylxanthine (IBMX). Media were changed every 2-3 days. After 14 days, adipocytic differentiation efficiency was determined by visualizing the formation of mature adipocytes containing lipid droplets using Oil Red O staining. The cells were fixed with 4% paraformaldehyde (PFA) for 10 min at RT, washed with 3% isopropanol, and incubated with filtered Oil Red O solution (25 mg of Oil Red O in 5 mL of 100% isopropanol and 3.35 mL Milli-Q). Photomicrographs of the differentiated cells were captured using an Olympus optical microscope (×10 magnification objective) and quantified as the area of lipid droplets (average of 6 images per sample) using ImageJ software.  ice-cold PIH buffer centrifuged at 660× g for 3 min at 4 • C. The pellet was resuspended in 1 mL PIH buffer and transferred to pre-weighted cryo-tubes and centrifuged at 2700× g for 5 min at 4 • C. The pellet was frozen at −80 • C.
Gel image acquisition was performed using the Typhoon FLA 9000 laser scanner (GE Healthcare, UK). Subsequently, protein spots were evaluated using the software Progenesis SameSpots (Nonlinear Dynamics, Newcastle upon Tyne, UK, v4.1). The analysis included protein spot detection, background subtraction, and relative quantification.

Sample Preparation for High-Performance Liquid Chromatography (HPLC) and Electrospray Ionization Tandem Mass Spectrometry (ESI-MS/MS)
The sample preparation for mass spectrometry analysis was performed with a filter aided sample preparation (FASP) protocol [89]. Briefly, 100 µg samples diluted in 200 µL 8M uric acid in 0.1 M Tris-HCL (UA) were added to filter columns (30 k, AmiconUltra, Merck, Darmstadt, Germany) and centrifuged at RT, 14,000× g for 15 min. Additional 200 µL UA was added to the filter column and repeatedly centrifuged. After discarding the eluate, 100 µL 0.05 M IAA (solubilized in UA) was added to the filter column and incubated in the dark at RT for 20 min. The column was washed two times with UA before 100 µL 0.05 M ammonium bicarbonate (ABC) buffer was added to the columns and centrifugation at RT, 14,000× g for 10 min, twice. Next, 40 µL trypsin in ABC (enzyme:protein ratio 1:100) was added and left for incubation in a humid chamber at 37 • C overnight. The columns were centrifuged at RT (14,000× g for 10 min) before 40 µL of ABC were added. Finally, the filtrate was collected, lyophilized, and stored at −20 • C after centrifugation at 14,000× g for 10 min.

SWATH Data Processing
The raw SWATH data were processed using the software tool DIA-NN v1.7.16 (dataindependent acquisition by neural networks) developed by Vadim Demichev et al. [90]. The software was used in the high accuracy LC mode with RT-dependent cross-normalization enabled. Mass accuracy, MS1 accuracy, and scan window settings were set to 0, as DIA-NN optimizes these parameters automatically. The 'match between runs' function was used to first develop a spectral library using the 'smart profiling strategy' from the data-independent acquisition data. The human UniProtKB/swiss-prot database (version 2020/12/6) [91] was used for protein inference from identified peptides. Trypsin/P was specified as protease. The precursor ion generation settings were set to peptide length of 7-52 amino acids, the maximum number of missed cleavages to one. The maximum number of variable modifications was set to zero. N-terminal methionine excision and cysteine carbamidomethylation were enabled as fixed modifications. The neural network classifier was set to double-pass mode as it typically generates the best results and analysis time was not an issue here. The resulting report file was further processed in the DIA-NN R package [90] for MaxLFQ-based [92] protein quantification. A report was generated containing unique proteins (proteins that were not assigned to a group of homologs) that passed the FDR cut-off of 0.01 applied on the precursor level and were identified and quantified using proteotypic peptides only. All proteins in the final dataset were identified by at least two unique peptides. The proteins were mapped for their corresponding gene names, which were required for downstream analysis steps such as gene enrichment analysis. In this context, the terms proteins and genes are used interchangeably in this study report.
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium [93] via the PRIDE [94] partner repository with the dataset identifier PXD029900.

Quantitative Data Processing
The dataset was further processed with the R [95] package DEP [96], which allows for missing value filtering and imputation: proteins/genes that were not reliably identified in 4 out of 5 replicates of at least one condition were removed from the dataset. Variance stabilizing normalization was applied to the remaining dataset using the R package vsn [97]. To analyze all identified proteins, remaining missing values were imputed with a multiple imputation strategy, as those can conserve differential expression, maintain the original informational content of the dataset, and respect low concentrated/not detectable proteins [96,98]. Missing values were considered as missing not at random (MNAR) when protein quantity data were completely missing for at least one condition. All other missing values were considered missing at random (MAR). While MNAR were imputed with random draws from a Gaussian distribution centered around a minimal value, MAR were imputed with a k-nearest neighbor model. The imputed dataset was used for subsequent statistical analysis.

Data Analysis
The gene name of the corresponding protein was used for analyses. The bioinformatics platform Omics Playground v2.7.18 (BigOmics Analytics, Lugano, Switzerland) was partly used for protein quantity data analysis and visualization [99]. For a clustering analysis and visualization of the high-dimensional quantitative proteomics data, a principal components analysis plot was computed using the stats package provided by the R software suite [95], and visualized with the R package ggplot2 [100]. A heatmap (two-way hierarchical clustering) was computed in Omics Playground. Using the R package NbClust [101], we utilized several algorithms for the estimation of the adequate cluster numbers within the data set of the top 50 differentially expressed proteins (selected via their standard deviation). According to a majority vote of the NbClust algorithms, three is the best number of clusters for this data set. Thus, 'three clusters' was used as a parameter for the initialization of the unsupervised two-way hierarchical clustering method included in the Omics Playground software suite. The method was applied to the data set of the top 50 proteins and used to generate the heatmap.
The gene set enrichment was computed with Omics Playground using the merged results of the Fisher's exact test [102], fGSEA [103], and GSVA [104], thereby applying a FDR < 0.05 (which corresponds to a so-called 'meta.q value' which corresponds to the highest q-value provided by the used statistical methods) and absolute |log 2 FC| threshold of >0.2. The MSigDB gene set collection was used as the target database for the enrichment. All measured genes are used as 'universe' after filtering for non-expressed genes. The log 2 FC is calculated as the average log 2 FC of all genes identified in the particular gene set.
For a differential protein expression analysis, a Welch-ANOVA was computed using the stats package. The results of the Welch-ANOVA were corrected for multiple testing via the Benjamini-Hochberg procedure [105], selecting the candidates for post-hoc testing with a FDR of <0.05. Post-hoc testing was performed via the Tukey honestly significant difference (HSD) method using the TukeyHSD function of the R software suite. Significant differential protein expression was considered at a q-value of <0.05 and an absolute logarithmic fold change (|log 2 FC|) of >2. Volcano plots were created from the results of the differential expression analysis, using the plot function in R. For better visualization, the -log 10 q-values of the ANOVA were plotted on the y-axis and log 2 FC on the x-axis. A Venn diagram was created using the web tool InteractiVenn [106], using the differentially expressed proteins of the three comparisons as input sets. Boxplots for overlapping proteins were created in the R software suite using ggplot2. The 'biomarker' module of Omics Playground was used for the ranking of biomarkers that could be suitable for the characterization of the cell lineage. Here, a cumulative variable importance score for each feature was calculated using machine learning algorithms, including LASSO [41], elastic nets [42], random forests [43], and extreme gradient boosting [44] and provides the top 40 features according to the ranking of the cumulative score by the algorithms. The findings of the machine learning algorithms were used as further validation of our marker panels derived from classical statistical testing. The Panther classification system was used for the annotation of all overlapping proteins according to gene ontology terms of the biological process, molecular function, and cellular component [107].

Funding:
The study was performed as a part of the BONEBANK project (project number: 16-1.0-15) supported by Interreg 5a Germany-Denmark with funds from the European Regional Development.
Institutional Review Board Statement: The project was approved by the Scientific Ethics Committee of Southern Denmark (project ID: S-20160084).
Informed Consent Statement: All donors received oral and written information and signed a consent.

Data Availability Statement:
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium [93] via the PRIDE [94] partner repository with the dataset identifier PXD029900.
Acknowledgments: This work was supported by the German Network for Bioinformatics Infrastructurede.NBI, service center BioInfra.Prot, funded by the German Federal Ministry of Education and Research (BMBF)-Grant FKZ 031 A 534A. Additionally, we would like to thank V. Demichev for the support with the DIA-NN software. T. Sauer is grateful for the support from the Ad Infinitum Foundation.

Conflicts of Interest:
All authors declare that they have no competing financial interest. IK is CTO of BigOmics Analytics SA, the creator of the Omics Playground. TG is a member of the BigOmics Analytics Advisory Board.