Systematic Review of the Current Status of Human Sarcoma Cell Lines

Sarcomas are rare mesenchymal malignant tumors with unique biological and clinical features. Given their diversity, heterogeneity, complexity, and rarity, the clinical management of sarcomas is quite challenging. Cell lines have been used as indispensable tools for both basic research and pre-clinical studies. However, empirically, sarcoma cell lines are not readily available. To understand the present status of sarcoma cell lines and identify their current challenges, we systematically reviewed reports on sarcoma cell lines. We searched the cell line database, Cellosaurus, and categorized the sarcoma cell lines according to the WHO classification. We identified the number and availability of sarcoma cell lines with a specific histology. We found 844 sarcoma cell lines in the Cellosaurus database, and 819 of them were named according to the WHO classification. Among the 819 cell lines, multiple cell lines were available for 36 histological subtypes and a single cell line for each of nine subtypes. No cell lines were reported for 133 of the histological subtypes. Among the 844 cell lines, 148 are currently available in public cell banks, and 692 have been published. We conclude that a larger number of cell lines, covering more histological subtypes, is needed to better benefit sarcoma research.


Introduction
Sarcomas are rare mesenchymal malignant tumors with unique biological and clinical features, and they are distinctive for several reasons. First, sarcomas originate from diverse mesenchymal tissue lineages such as adipose, muscle, fibrous, cartilage, nervous, and vascular tissues, or bone. Since these tissues are distributed throughout the human body, sarcomas can occur in almost all organs. Second, sarcomas are heterogeneous diseases that are pathologically grouped into more than 70 described subtypes [1]. The histological appearances do not necessarily represent their normal counterparts, and indeed, the original normal cells are not identified for most sarcomas. Third, sarcomas are highly complex at the molecular level and fall into two groups: genetically simple sarcomas, such as those bearing specific genetic alterations, and sarcomas with multiple, complex karyotypic abnormalities with no specific pattern [2,3].
Recent advances in genomic technology using next-generation sequencing have enabled the classification of sarcomas that did not fit into known specific diagnostic categories [4]. Such classification may lead to innovative therapies. Finally, despite their diversity, heterogeneity, and complexity, sarcomas are rare, accounting for less than 1% of all malignancies. The reasons for this rarity are not well understood. Possible reasons include the need for unique genetic mutations for carcinogenesis, the small number of original cells, and the resistance of the original cells to carcinogenesis. Interestingly, sarcomas are prevalent in children and adolescents, where they account [...]
[...] biomarker candidates is mandatory to convince collaborators to perform multi-institutional validation studies. Cell lines are mandatory to investigate the biological properties of biomarker candidates. Overall, without cell lines, most of the anti-cancer drugs and biomarkers used in hospitals and the scientific discoveries written in textbooks could not have been achieved for cancers.
To date, a variety of sarcoma cell lines have been developed. These cell lines are useful experimental models for testing hypotheses about the etiology of diseases, evaluating the molecular mechanisms of cancer progression, and examining the effects of candidate anti-cancer drugs at the cellular and subcellular levels. At the same time, despite the obvious utility of cell lines in cancer research, researchers may be empirically aware that sarcoma cell lines are not readily available, probably due to the rarity of the disease, and that a lack of proper cell lines hinders basic studies and the development of effective therapies for sarcomas. In this review, we provide an overview of the current status of reported sarcoma cell lines, and then discuss what types of sarcoma cell lines need to be established, what system needs to be created to promote sarcoma research using cell lines, and what biological studies need to be performed to improve the present status of sarcoma cell lines.

Search Strategy
All potentially relevant cell lines were identified by searching the Cellosaurus database (version 28, November 2018) [56,57]. The data file was downloaded from the Cellosaurus website (https://web.expasy.org/cellosaurus/) and searched using ontology terms: human cell lines were selected with the term 'NCBI_TaxID=9606; Homo sapiens' in 'Species of origin', and sarcoma cell lines were selected with any term that was one of the children of 'NCIt:C3810 Connective and Soft Tissue Neoplasm' in 'Disease'. The NCI Thesaurus ontology file was downloaded from the NCI Thesaurus website (https://ncit.nci.nih.gov).
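
The ontology-based search can be illustrated with a minimal sketch. It is not the script used in this review: the `children` mapping is a hypothetical toy fragment of the NCI Thesaurus hierarchy (every code other than C3810 is a placeholder), the record dictionaries are an assumed in-memory form of Cellosaurus entries, and "children" is interpreted here as all descendants of the root term, which is itself an assumption.

```python
from collections import deque

ROOT = "C3810"  # NCIt code named in the text

# Hypothetical parent -> children fragment; the codes below ROOT are placeholders.
children = {
    "C3810": ["TERM_A", "TERM_B"],
    "TERM_A": ["TERM_C"],
}

def descendants(root, children):
    """Collect every term below `root` (breadth-first), excluding `root` itself."""
    found = set()
    queue = deque(children.get(root, []))
    while queue:
        term = queue.popleft()
        if term not in found:
            found.add(term)
            queue.extend(children.get(term, []))
    return found

def is_human_sarcoma(record, sarcoma_terms):
    """True for human entries annotated with at least one sarcoma term."""
    return record["taxid"] == "9606" and not sarcoma_terms.isdisjoint(record["diseases"])

sarcoma_terms = descendants(ROOT, children)

# Hypothetical records standing in for parsed Cellosaurus entries.
records = [
    {"name": "line_a", "taxid": "9606", "diseases": ["TERM_C"]},
    {"name": "line_b", "taxid": "10090", "diseases": ["TERM_C"]},
    {"name": "line_c", "taxid": "9606", "diseases": ["OTHER"]},
]
selected = [r["name"] for r in records if is_human_sarcoma(r, sarcoma_terms)]
print(selected)
```

Walking the hierarchy once and then testing set membership keeps the per-record check cheap, which matters when the database holds more than one hundred thousand entries.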

Eligibility Criteria and Selection
A cell line was included in this review if it was established from a human patient with a connective and soft tissue neoplasm, regardless of histology or original site. Cell lines that were derived from other cell lines and modified by gene transfer or reagent treatment were considered duplicates and excluded from the data analysis.

Data Collection Process
The following data were examined for each cell line using Python version 3.6.1 (https://www.python.org/): cell line name, disease, publications, and cell line collections.
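
As an illustration of this extraction step, the sketch below parses a flat-file record into the four data items and flags derived cell lines. It rests on assumptions: the two-letter line codes (ID, DR, RX, DI, OX, HI) and the `//` record terminator follow our understanding of the Cellosaurus text format and should be checked against the release notes, and the sample record is entirely invented.

```python
# Invented sample in an assumed Cellosaurus-like flat-file layout.
SAMPLE = """ID   Example-1
DR   ExampleBank; EB-001
RX   PubMed=00000001;
DI   NCIt; C0000; Example sarcoma
OX   NCBI_TaxID=9606; ! Homo sapiens
//
ID   Example-1/R
DI   NCIt; C0000; Example sarcoma
OX   NCBI_TaxID=9606; ! Homo sapiens
HI   CVCL_0000 ! Example-1
//
"""

def parse_records(text):
    """Split a flat-file dump into records mapping each line code to its values."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("//"):  # assumed end-of-record marker
            if current:
                records.append(current)
            current = {}
            continue
        code, _, value = line.partition("   ")
        if value:
            current.setdefault(code, []).append(value.strip())
    if current:
        records.append(current)
    return records

def summarize(record):
    """Extract name, disease, publications, collections, and a derived-line flag."""
    return {
        "name": record.get("ID", [""])[0],
        "diseases": [v.split("; ")[-1] for v in record.get("DI", [])],
        "publications": [v.rstrip(";") for v in record.get("RX", [])],
        "collections": [v.split("; ")[0] for v in record.get("DR", [])],
        "derived": "HI" in record,  # assumed: a parent-line entry marks a derived line
    }

summaries = [summarize(r) for r in parse_records(SAMPLE)]
originals = [s["name"] for s in summaries if not s["derived"]]
print(originals)
```

In a real data file the cross-reference lines also point to resources other than cell banks, so the collection names would need to be filtered against a list of public cell banks.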

Data Items
The focus of this review was on the availability of cell lines with specific sarcoma histology that could influence research activity. Thus, the primary end point of this review was the identification of the histology of the original tumor, as the cell lines are expected to reflect the features of the tumors from which they were derived; in addition, the study sought to identify the histological subtypes for which cell lines are needed to fill the gaps. The histological classification was performed according to the classification by the World Health Organization [1]. Secondary end points were data availability and publication.

Information Sources
The availability of cell lines was examined using the public cell banks recorded in the Cellosaurus database. In addition, we searched the Ximbio website (https://ximbio.com/) for published sarcoma cell lines. Publications in PubMed were also examined. Cell lines that are available from public cell banks or published in academic journals were considered because they are, to some degree, vetted by public organizations or the research community, and because the cell lines and their relevant data are expected to be easily obtainable.

Results
Cell lines were chosen through a systematic review process (Figure 1). A total of 109,135 cell lines were identified in the Cellosaurus database. Of these, 27,518 cell lines were excluded because they did not originate from humans. In addition, 80,342 cell lines were further excluded because they were not derived from connective and soft tissue neoplasm. Among the resulting 1275 cell lines, 431 cell lines originated from other cell lines and were transfected with genes or treated with reagents; therefore, we excluded these cell lines from the analysis as duplicates. The final number used for the analysis was 844 cell lines; their information was extracted.
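
The selection flow can be checked with simple arithmetic. The sketch below only re-derives the counts stated above; the variable names and labels are illustrative.

```python
# Exclusion steps and counts as reported in the text.
steps = [
    ("not of human origin", 27_518),
    ("not connective and soft tissue neoplasm", 80_342),
    ("derived from other cell lines (duplicates)", 431),
]

remaining = 109_135  # cell lines identified in Cellosaurus
flow = [("identified in Cellosaurus", remaining)]
for reason, excluded in steps:
    remaining -= excluded
    flow.append(("after excluding: " + reason, remaining))

for label, count in flow:
    print(f"{count:>7,}  {label}")
```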

Histology
The cell lines were categorized according to the WHO classification. The results are summarized in Supplementary Table S1. Among the 189 histological subtypes listed in the WHO classification, 45 had corresponding cell lines, while 133 did not (Supplementary Table S2). The histology of original tumors from which the cell lines were most commonly established included Ewing's sarcoma (156 cell lines), osteosarcoma (148 cell lines), and undifferentiated high-grade pleomorphic sarcoma (43 cell lines). Multiple cell lines were established from 36 histological subtypes of sarcomas. On the other hand, a single cell line was established for each of the following nine histological subtypes: pleomorphic liposarcoma, desmoid-type fibromatosis, tenosynovial giant cell tumor, desmoplastic small round cell tumor, PEComa, osteoblastoma, small cell osteosarcoma, fibrosarcoma of bone, and benign fibrous histiocytoma/non-ossifying fibroma.
We found that there are cell lines that originated from other sarcoma cell lines and were modified by gene transfection or drug treatments (Table 2). Those modified cell lines are most common in osteosarcoma (275 cell lines). Among the 275 cell lines, U2OS cell lines are most commonly used as original cell lines for modifications. We found that there are cell lines for which the reported histology did not match that in the WHO classification (Table 3).

Availability from Cell Banks
Among the 819 cell lines originally derived from patients with connective and soft tissue neoplasm, 139 cell lines were available from the public cell banks (Table 1), while 680 were not. Among the 421 modified cell lines, 262 were available from the public cell banks, and 159 were not (Table 2). In addition to the cell banks examined in Cellosaurus, we searched the Ximbio website for published sarcoma cell lines. By searching the Ximbio website, three cell lines were additionally recognized in the cell bank: 2C4 gamma1A/JAK2 (fibrosarcoma), S_M6R1 (osteosarcoma), and S_N40R2 (osteosarcoma). Among the 35 WHO unclassified cell lines, 12 were available from the public cell banks, and 23 were not (Table 3).

Publication of Cell Lines
Among the 819 cell lines originally derived from patients with connective and soft tissue neoplasm, 674 cell lines were cited in the PubMed database (Table 4), while 145 were not. Among the 421 modified cell lines, 159 were cited in PubMed, and 262 were not (Table 5). Among the 35 WHO unclassified cell lines, 27 were cited in PubMed, and eight were not (Table 6).
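
To put the counts from the availability and publication results on a common scale, the sketch below converts them to percentages. The counts are those reported in the text; the group labels and the one-decimal rounding are our own choices.

```python
# Counts as reported in the availability and publication sections.
counts = {
    "WHO-classified original": {"total": 819, "banked": 139, "published": 674},
    "modified": {"total": 421, "banked": 262, "published": 159},
    "WHO-unclassified": {"total": 35, "banked": 12, "published": 27},
}

def percentage(part, total):
    """Share of `part` in `total`, in percent, rounded to one decimal place."""
    return round(100.0 * part / total, 1)

shares = {
    group: {
        "banked_pct": percentage(c["banked"], c["total"]),
        "published_pct": percentage(c["published"], c["total"]),
    }
    for group, c in counts.items()
}

for group, s in shares.items():
    print(f"{group}: {s['banked_pct']}% in cell banks, {s['published_pct']}% cited in PubMed")
```

Expressed this way, the original patient-derived cell lines are far more often published than deposited, whereas the modified cell lines show the opposite pattern.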

Availability and Publication of Cell Lines
The cell lines whose establishment was reported in academic journals that are cited in the PubMed database can be useful research resources because the relevant data of cell lines are available from the published papers. Among the 844 original cell lines, there were 692 cell lines that have been published; among them, 108 cell lines are available from public cell banks (Figure 2A) (Supplementary Table S1). Among the 819 original cell lines with the histology defined by WHO classification, there were 674 cell lines that have been published; among them, 103 cell lines are available from public cell banks (Figure 2B) (Supplementary Table S1).

Discussion
A lack of sarcoma cell lines is empirically noticed in the research community, and it is important to know their availability from a practical viewpoint. In this review, we investigated the current status of sarcoma cell lines to reveal what cell lines have to be established to promote sarcoma research. The cell line database used in this study, Cellosaurus, includes more than one hundred thousand cell lines and is frequently updated. Thus, Cellosaurus is an adequate cell line database for this investigation.
We grouped the cell lines according to the histology of their original tumor. We found that 45 histological subtypes were covered by the currently reported cell lines, while 133 were not. Considering the diversity and complexity of sarcomas, we need more cell lines that represent the different histological subtypes. In addition, we found that multiple cell lines were established for 36 histological subtypes, with a single cell line reported for nine subtypes. During the course of the cell line establishment, clonal selection and expansion may occur, and only limited cell populations may survive under tissue culture conditions. To understand the sustainability of the original characteristics of the established cell lines, the capability of tumor tissue formation and the histology of the formed tumors can be evaluated by xenograft experiments. In addition, patient-to-patient variations are clinically considerable even if they have tumors with the same histology. Therefore, no single cell line can represent the characteristics of whole tumor tissues; we need to use multiple cell lines. In this sense, we also need more cell lines for sarcomas that already have corresponding cell lines.
Cell lines are most frequently established from Ewing's sarcoma and osteosarcoma samples. However, the absolute number of patients with Ewing's sarcoma and osteosarcoma is small according to medical statistics; undifferentiated pleomorphic sarcoma, liposarcoma, and leiomyosarcoma are more common sarcomas [58]. Thus, the number of patients may not be a critical factor to determine the established cell lines. Cell lines with a higher malignant potential may be easier to establish, and the clinical stage of donor patients, pathological grading, and prognosis may be correlated with the success rate of the cell line establishment. However, during our investigation, there was no report discussing the efficacy of the cell line establishment in terms of histology. This issue is quite important because we can refine the experimental protocols and improve the efficacy of experiments by clarifying the biological and clinical factors that determine the success rate of establishment.
The histological diagnosis of original tumors of cell lines may need to be updated in cases where the name of cell lines did not match the official classification. Among the 844 sarcoma cell lines investigated, 42 were not named according to the 2013 World Health Organization Classification of Tumours of Soft Tissue and Bone [1,59]. The diagnosis of sarcomas has been achieved based on morphological observations, and sarcomas are reclassified by the genetic characterization and subsequent phenotypic correlations. Thus, the diagnosis of cell lines with the official name should be refined by pathological examinations according to the most recent diagnosis criteria. This is a dilemma for a study using clinical materials, because the criteria of histological subtypes may have been updated after the cell lines were reported. To take full advantage of patient-derived sarcoma cell lines, we should investigate the pathology archives and update the diagnosis. However, this will be a challenging task.
Unfortunately, cell lines are not always deposited in cell banks. We found that only 139 of 819 sarcoma cell lines named according to the WHO classification were deposited in public cell banks. The rest of the cell lines can probably be provided by researchers upon request. The current cell bank systems may rely on researchers and institutes to undertake the cell line establishment. Establishing novel cell lines consumes considerable resources, such as time and money; furthermore, because cell lines are the property of the institutes to which researchers are affiliated, it may be difficult to deposit all cell lines in public cell banks and share them with other researchers. As the establishment of a cell line is not in itself necessarily a novel discovery, and is unlikely to be published in a high-impact journal, researchers may not be motivated to establish and share cell lines. A system to motivate cell line establishers and their institutes may be required to improve availability through the deposition of cell lines.
This systematic review has several limitations. First, although the genetic background and biological characteristics of some but not all cell lines were reported in publications, this review did not summarize those data. In our research, 692 cell lines were reported in previous papers, and 108 of them were deposited in cell banks (Figure 2). Although the experiments were performed individually using different methods, it is worth integrating the relevant genetic and biological data of reported cell lines to evaluate their possible applications. Second, the clinical features of donor patients, such as metastasis and resistance against therapy, were not investigated in this review. Bernardo et al. [60] performed a systematic review for patient-derived xenografts in bladder cancers and discussed the clinical factors that may influence the take-rate of xenografts. Lu et al. [61] investigated previous studies on xenograft establishment, and correlated the higher engraftment rates with tumor stage. A similar approach could be used for cell lines of sarcomas. Third, the pathological diagnosis should be updated using the most recent pathological criteria of sarcomas. It is possible that some of the reported cell lines may actually represent other subtypes. However, because we cannot access the original pathological archives and it takes too much effort to validate the results of pathological diagnosis, we cannot know the correct histology according to the most recent WHO classification. This is a general problem of sarcoma research, as observed when we conducted histology-based research using previously published data. Finally, the applications of cell lines are diverse, and probably depend on the cell lines and the experiments. In addition to the number of established cell lines, it would be worth investigating the literature to determine how the established cell lines were used by the researchers who received them.

Conclusions
Cell lines have been considered a valuable tool for both basic research and pre-clinical studies. The functional significance of gene products such as mRNA, miRNA, and proteins can be clarified using living cells, and cell lines are an indispensable research resource. In the preclinical evaluation of new drugs, their tumor suppressive effects and mode-of-action are also investigated using cell lines. Although the predictive power of cell lines can be undermined by the selective pressures during the process of establishment and long-term passaging, a great advantage of cell lines is that the examinations can be done in a high-throughput manner at relatively low cost. Patient-derived xenografts (PDXs) may compensate for the inherent drawbacks of cell lines, because PDXs may retain the microenvironmental conditions of the original tumors. However, the manipulation of PDXs requires considerable time and effort, and their unstable molecular backgrounds have been revealed at the genome level. Moreover, because the human stromal components in PDX tumors are replaced with mouse ones after several passages, the consistency of results may be limited in experiments using PDXs. Taken together, cell lines have a unique utility, and they are indispensable in cancer research.
We conclude that (1) more sarcoma cell lines representing the various histological types as well as those established by single cell lines are needed to effectively capture the diversity and complexity of the disease; (2) a system is needed to reward the efforts of researchers who establish and deposit cell lines into public cell banks to promote cell line sharing in the research community, and (3)

Conflicts of Interest:
The authors declare no conflict of interest.