Bioinformatic and Machine Learning Applications in Melanoma Risk Assessment and Prognosis: A Literature Review

Over 100,000 people are diagnosed with cutaneous melanoma each year in the United States. Despite recent advancements in metastatic melanoma treatment, such as immunotherapy, there are still over 7000 melanoma-related deaths each year. Melanoma is a highly heterogeneous disease, and many underlying genetic drivers have been identified since the introduction of next-generation sequencing. Despite clinical staging guidelines, the prognosis of metastatic melanoma is variable and difficult to predict. Bioinformatic and machine learning analyses relying on genetic, clinical, and histopathologic inputs have been increasingly used to risk stratify melanoma patients with high accuracy. This literature review summarizes the key genetic drivers of melanoma and recent applications of bioinformatic and machine learning models in the risk stratification of melanoma patients. A robustly validated risk stratification tool could potentially guide physicians' management of melanoma patients and ultimately improve patient outcomes.


Introduction
Cutaneous melanoma is the most aggressive form of skin cancer and the fifth most common cancer in the United States [1]. The incidence of cutaneous melanoma has been rising over the past few decades, with over 100,000 new cases diagnosed in the United States each year [1]. Despite recent advancements in therapy for advanced melanoma, including targeted therapy (e.g., BRAF/MEK inhibitors) and immunotherapy (e.g., PD-1 inhibitors), there are over 7000 melanoma-related deaths each year in the United States, as most patients with advanced-stage melanoma experience recurrence after initial therapy [1][2][3].
The major risk factors for cutaneous melanoma formation are ultraviolet (UV) exposure and genetic susceptibility. UV-induced DNA damage and oxidative stress can cause the malignant transformation of melanocytes [4]. A family history of melanoma is a strong risk factor for the disease, which has led to the significant growth of melanoma genomics research in the past two decades [5].
The bioinformatic analysis of genomic data has been widely used to identify potential genes and signaling pathways associated with melanoma pathogenesis and metastasis. More recently, bioinformatic analyses, including machine learning, are increasingly being used to predict prognosis, risk stratify patients, and ultimately inform personalized treatment in cutaneous melanoma.
We conducted a literature review of PubMed and Google Scholar to provide an overview of bioinformatic and machine learning applications in melanoma prognostics and risk stratification. Given the large body of bioinformatics and machine learning studies in melanoma genomics and risk stratification, we summarize the currently established key drivers of melanoma whose discovery relied on bioinformatics. We also provide an overview of the key findings, algorithms, and predictive accuracy of recent studies applying bioinformatic and machine learning algorithms to melanoma risk stratification.

Bioinformatics in Melanoma Genomics
Melanoma is a heterogeneous disease with numerous genetic determinants. Bioinformatic tools have been widely used to help understand the genetic drivers of melanoma and to identify patient subgroups with specific genetic mutations, in order to inform disease management and the development of therapies.
Mutations in Ras genes and CDKN2A were the earliest gene mutations identified in melanoma, in the 1980s and 1990s (Figure 1) [6,7]. Ras genes are proto-oncogenes, frequently mutated in cancers, that encode a family of small G proteins, while CDKN2A encodes tumor suppressor proteins [8].
In 2002, one of the first genomic studies identified mutations in BRAF, a regulator of cell survival, in 65% of malignant melanomas [9], which led to the development of BRAF inhibitors for BRAF-mutant metastatic melanoma [10,11].
Recent whole-genome analyses of melanoma have also identified different mutated genes in cutaneous, acral, and mucosal melanoma, and have highlighted mutations in the TERT promoter [16]. The TERT gene encodes the catalytic subunit of telomerase, an enzyme complex that regulates telomere length [16]. Other observed genomic changes include alterations in the c-KIT, c-MET, and EGF receptors and in the MAPK and PI3K signaling pathways, which are important for cell proliferation and survival [8].
The introduction of high-throughput analysis of biological information, particularly next-generation sequencing (NGS), has led to the rapid growth of genomic data [17]. As genomic databases grow, additional genetic regulators of melanoma formation and progression are expected to be characterized and may inform melanoma management.

Bioinformatics and Machine Learning in Melanoma Risk Assessment
Despite clinical staging guidelines, predicting the prognosis of melanoma is challenging due to its heterogeneous nature. Bioinformatic tools have been widely used to analyze NGS data and help identify potential mutations associated with melanoma pathogenesis [18]. More recently, bioinformatic analysis has increasingly been applied to melanoma risk stratification and the prediction of prognosis to inform treatment. Since the approval of systemic adjuvant therapies for stage III and stage IV melanoma, these therapies have become widely used following the resection of advanced melanoma. However, these systemic therapies are costly and are associated with frequent grade 3 or 4 adverse events [19][20][21][22][23]. The 2021 National Comprehensive Cancer Network (NCCN) guidelines do not recommend adjuvant therapy for stage I and II patients [24]. Yet patients with stage II melanoma have a 12% to 25% 10-year melanoma-specific mortality rate, and some stage II patients have worse survival than stage III patients [25,26]. Accurate prognostic tools that predict the probability of recurrence and survival are therefore needed to better identify appropriate candidates for adjuvant treatment and to determine the appropriate level of surveillance.

Gene-Expression Profiling
Gene expression profiling of stage IV melanomas identified molecular subtypes with unique gene signatures that were correlated with different clinical outcomes [27]. This finding led to the development of a proprietary 31-gene expression profile (GEP) assay (Castle Biosciences) that classifies patients as having a high or low risk of metastasis within five years of melanoma diagnosis [28,29]. One of the goals of 31-GEP testing was to guide the intensity of treatment and follow-up for melanoma patients.
The clinical utility and performance of 31-GEP have varied and need to be further validated in prospective studies [30]. Zager et al. analyzed 523 primary melanoma tumors using 31-GEP and reported that the assay identified 70% of stage I and II patients who ultimately developed distant metastasis [31]. Similarly, Gastman et al. found that 31-GEP accurately identified high-risk patients who were likely to experience recurrence or die of melanoma within low-risk subgroups (e.g., sentinel lymph node-negative disease, stage I and IIA) [32]. A meta-analysis reported that 31-GEP performance varied and that the assay was a better predictor of recurrence in stage II disease than in stage I [33]. However, a separate study suggested that 31-GEP testing offers limited cost-benefit in stage IIIA melanoma, given its limited survival benefit in this patient subgroup [34]. Given the lack of clear evidence that 31-GEP improves outcomes in melanoma, a well-established prognostic tool is still needed to accurately identify high-risk patients.
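The finding that 31-GEP "identified 70% of stage I and II patients who ultimately developed distant metastasis" is a sensitivity: the fraction of patients with the outcome whom the test flags as high risk. The short sketch below shows how sensitivity, specificity, and the positive and negative predictive values (PPV, NPV) follow from a 2 × 2 table of test result versus outcome. The counts are hypothetical and are not taken from any cited study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """2 x 2 counts for a binary risk test against an observed outcome (e.g., distant metastasis)."""
    true_pos: int   # flagged high risk, developed the outcome
    false_neg: int  # flagged low risk, developed the outcome
    false_pos: int  # flagged high risk, did not develop the outcome
    true_neg: int   # flagged low risk, did not develop the outcome


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV, and NPV as fractions."""
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        "npv": c.true_neg / (c.true_neg + c.false_neg),
    }


# Hypothetical cohort: 50 patients who develop metastasis and 450 who do not.
counts = ConfusionCounts(true_pos=35, false_neg=15, false_pos=135, true_neg=315)
metrics = diagnostic_metrics(counts)
print({k: round(v, 3) for k, v in metrics.items()})
# {'sensitivity': 0.7, 'specificity': 0.7, 'ppv': 0.206, 'npv': 0.955}
```

With these made-up counts, a test that catches 70% of metastases still labels many patients who never progress as high risk (PPV about 0.21). This is one reason sensitivity alone does not establish clinical utility.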

Current Bioinformatics in Melanoma Risk Assessment
Bioinformatic analysis of genes and biomarkers has been used not only to identify genes associated with melanoma survival and mortality, but also to predict melanoma metastasis and prognosis (summarized in Table 1). Several recent studies constructed protein-protein interaction (PPI) networks to identify hub genes in melanoma. Sheng et al. constructed a PPI network to analyze differentially expressed genes (DEGs) from the Gene Expression Omnibus (GEO) database [35]. The study identified DSG3, DSC3, PKP1, EVPL, IVL, FLG, SPRR1A, and SPRR1B as potential biomarkers that predict metastasis of cutaneous melanoma [35]. Another study constructed a PPI network from melanoma gene expression data from UCSC Xena and GEO and identified FOXM1, EXO1, KIF20A, TPX2, and CDC20 as genes associated with reduced overall survival [36]. Wang et al. suggested that high CD38 expression could be a diagnostic marker for melanoma and found that higher CD38 expression levels were associated with improved survival probabilities compared to lower expression levels [37].
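Hub genes in a PPI network are commonly ranked by degree, which is the number of distinct interaction partners a protein has in the network. The sketch below is a minimal version in plain Python. It assumes the network arrives as an undirected edge list among differentially expressed genes. The gene names and edges are hypothetical placeholders and do not reproduce the networks built in the cited studies.

```python
from collections import Counter


def rank_hub_genes(edges, top_n=3):
    """Rank genes by degree in an undirected protein-protein interaction network.

    edges: iterable of (gene_a, gene_b) pairs; duplicate and self edges are ignored.
    Returns the top_n (gene, degree) pairs, ties broken alphabetically.
    """
    seen = set()
    degree = Counter()
    for a, b in edges:
        if a == b:
            continue  # skip self-interactions
        key = frozenset((a, b))
        if key in seen:
            continue  # count each undirected interaction only once
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical interactions among differentially expressed genes (illustration only).
edges = [
    ("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"), ("GENE_E", "GENE_A"), ("GENE_B", "GENE_A"),
]
print(rank_hub_genes(edges))  # [('GENE_A', 4), ('GENE_B', 2), ('GENE_C', 2)]
```

Degree is the simplest centrality measure. Published hub-gene analyses may also combine several centrality scores, so treat this as one possible ranking rule rather than the method used in the cited studies.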
An analysis of miRNA expression from 59 melanoma metastases identified 18 miRNA signatures that were overexpressed and correlated with longer post-recurrence survival [38]. Furthermore, the study identified six miRNA signatures that were predictors of survival of stage III patients independent of American Joint Committee on Cancer (AJCC) staging [38].
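Survival comparisons like the post-recurrence survival differences described above are typically summarized with Kaplan-Meier curves. The sketch below implements the Kaplan-Meier product-limit estimator. It assumes one follow-up time per patient, in months, and an event flag where 1 means death or recurrence was observed and 0 means the patient was censored. The data are hypothetical, and the code is a generic illustration rather than the analysis used in the cited study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times: follow-up time for each patient.
    events: 1 if the event (e.g., death) was observed at that time, 0 if censored.
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group every patient who shares this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored patients both leave the risk set
    return curve


# Hypothetical follow-up (months) for a small patient group.
times = [6, 10, 10, 15, 20, 24, 30]
events = [1, 1, 0, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
# 6 0.857
# 10 0.714
# 15 0.536
# 24 0.268
```

In this made-up example, the patient censored at 10 months is still counted as at risk at 10 months. That is the usual convention when an event and a censoring occur at the same time point.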
Because sentinel lymph nodes (SLNs) regulate anti-tumor immune responses, Farrow et al. hypothesized that SLN gene expression could predict recurrence risk in melanoma [43]. Immune-related genes from SLN biopsies were used to build a multivariate regression model to predict recurrence-free survival (RFS) [39]. Twelve genes, including the immune checkpoint TIGIT, accurately predicted RFS and could therefore potentially inform patient selection for adjuvant therapy [39]. Several other prognostic biomarkers were identified with Cox regression analyses, including preoperative circulating tumor DNA, which has the potential to further enrich the stage IIIA population for high-risk adjuvant therapy candidates [42,47].
A logistic regression analysis was used to create a nomogram that predicted the probability of a positive SLN in melanoma based on tumor characteristics (tumor thickness, Clark level, ulceration, and site) and patient sex and age [51]. The nomogram predicted the presence of SLN metastasis more accurately than the AJCC staging system and has been externally validated at three separate institutions [54][55][56].
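Multigene prognostic models of this kind are often applied as a linear risk score. Each gene's expression is weighted by its fitted regression coefficient, the weighted values are summed, and patients are split into high- and low-risk groups at a cutoff. The sketch below assumes the cohort median as that cutoff. The gene names, coefficients, and expression values are hypothetical, and the code does not reproduce the model of Farrow et al.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear predictor: the sum of coefficient * expression over the model's genes.

    expression: dict mapping patient ID -> {gene: expression value}.
    coefficients: dict mapping gene -> regression coefficient (hypothetical here).
    """
    return {
        patient: sum(coef * values[gene] for gene, coef in coefficients.items())
        for patient, values in expression.items()
    }


def stratify(scores):
    """Label patients whose score is above the median as high risk, the rest as low risk."""
    cutoff = median(scores.values())
    return {patient: ("high" if s > cutoff else "low") for patient, s in scores.items()}


# Hypothetical coefficients: a positive weight raises the score, a negative weight lowers it.
coefficients = {"GENE_1": 0.8, "GENE_2": -0.5, "GENE_3": 0.3}
expression = {
    "P1": {"GENE_1": 2.0, "GENE_2": 1.0, "GENE_3": 0.5},
    "P2": {"GENE_1": 0.5, "GENE_2": 2.0, "GENE_3": 1.0},
    "P3": {"GENE_1": 1.5, "GENE_2": 0.2, "GENE_3": 2.0},
    "P4": {"GENE_1": 0.2, "GENE_2": 1.5, "GENE_3": 0.1},
}
scores = risk_scores(expression, coefficients)
print(stratify(scores))  # {'P1': 'high', 'P2': 'low', 'P3': 'high', 'P4': 'low'}
```

A median cutoff is only one choice. Studies also derive cutoffs from the training cohort or from optimized thresholds, and any such cutoff needs validation in an independent cohort.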
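A logistic regression nomogram is a graphical form of the fitted model. Each predictor contributes points in proportion to its coefficient, and the total maps to a probability through the logistic function p = 1 / (1 + e^(-z)), where z is the intercept plus the coefficient-weighted sum of the predictors. The sketch below covers only that final step. The intercept, coefficients, and predictor encodings are hypothetical placeholders, not the published nomogram's values.

```python
import math


def logistic_probability(intercept, coefficients, features):
    """Predicted probability from a fitted logistic regression model.

    z = intercept + sum(coefficient * feature value); p = 1 / (1 + exp(-z)).
    """
    z = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients (change in log-odds per unit), for illustration only.
intercept = -3.0
coefficients = {
    "thickness_mm": 0.6,  # tumor thickness in millimetres
    "ulcerated": 0.7,     # 1 if ulcerated, 0 otherwise
    "age_decades": -0.1,  # patient age in decades
}
patient = {"thickness_mm": 2.5, "ulcerated": 1, "age_decades": 6.0}
print(round(logistic_probability(intercept, coefficients, patient), 3))  # 0.198
```

Here z = -3.0 + 1.5 + 0.7 - 0.6 = -1.4. The predicted probability of a positive SLN for this hypothetical patient is therefore about 0.20.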

Machine Learning in Melanoma Risk Assessment
Machine learning is the application of computer algorithms that aim to optimize their predictive accuracy [57,58]. Machine learning algorithms are based on pattern recognition and are designed to improve their behavior based on data or experience, without additional human intervention. These algorithms can be powerful tools to assist humans in the analysis of large, heterogeneous data sets, such as genomic data sets.
Machine learning research in dermatology has primarily focused on developing image recognition tools for the binary classification of malignant melanoma [59]. Recently, a growing number of machine learning studies have aimed to risk stratify and predict prognosis in melanoma, with several models outperforming currently available risk classification tools (summarized in Table 1). Various machine learning algorithms were employed in the studies we reviewed, with neural networks, support vector machines, and random forest classifiers being the most commonly used. Several studies achieved an area under the receiver operating characteristic curve (AUROC) over 0.8 or an accuracy greater than 80%, though there was no clear association between the machine learning algorithm used and the accuracy achieved.
We do not compare the predictive abilities of these studies, as the models aimed to predict different outcomes.
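The AUROC (area under the receiver operating characteristic curve) can be read as a probability. It is the chance that a randomly chosen patient who has the outcome receives a higher predicted risk than a randomly chosen patient who does not, with ties counted as one half. The sketch below computes it directly from that pairwise definition. Its cost grows quadratically with cohort size, so it is suited to illustration rather than large cohorts. The scores and labels are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    scores: predicted risk for each patient (higher = more likely to have the outcome).
    labels: 1 if the patient had the outcome, 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # pair ranked correctly
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks and observed outcomes for six patients.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(round(auroc(scores, labels), 3))  # 0.889
```

With these made-up values, 8 of the 9 positive-negative pairs are ranked correctly, which gives an AUROC of about 0.889. An AUROC of 0.5 corresponds to chance-level ranking.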
Gene expression datasets from GEO and The Cancer Genome Atlas (TCGA) were used to construct a PPI network that identified 798 genes associated with melanoma metastasis [50]. These genes were used as variables in a support vector machine (SVM) classifier that achieved a metastasis prediction accuracy ranging from 96% to 100% [50]. A separate study used gene expression data from 754 thin- and intermediate-thickness primary cutaneous melanomas to train logistic regression models to predict the presence of SLN metastases from molecular, clinical, and histologic variables. The study found that models using clinicopathologic or gene expression variables alone were outperformed by a model that combined molecular variables with clinicopathologic predictors (i.e., Breslow thickness and patient age) [40]. Arora et al. also incorporated clinicopathologic variables in their machine learning models and found that models using clinicopathologic features (e.g., Breslow thickness, N stage, M stage, ulceration status) outperformed GEP-based profiles and AJCC staging in predicting melanoma prognosis [39].
Several studies have used machine learning to analyze large RNA datasets and have identified correlations with melanoma prognosis with a high degree of accuracy. Yang et al. used multiple machine learning algorithms to analyze melanoma samples from TCGA. The study hypothesized that six long non-coding RNA (lncRNA) signatures may regulate MAPK, immune and inflammation-related pathways, the neurotrophin signaling pathway, and focal adhesion pathways [52]. The six lncRNA signatures were used in a machine learning classifier that risk-stratified melanoma patients with 85% accuracy [52]. A separate study of transcriptomic data from 478 primary and metastatic melanoma, nevus, and normal skin samples identified six novel associations between the activation of metabolic molecular signaling pathways and melanoma progression [49]. A differential expression analysis of primary tumors from 205 RNA-sequenced melanomas revealed 121 metastasis-associated gene signatures, which were then used to train machine learning classification models. These models predicted the likelihood of metastasis better than models trained with clinical covariates or published prognostic signatures [53]. An analysis of cutaneous melanoma RNA transcriptome data by Huang et al. found that 16 proteins related to 5-methylcytosine (m5C) (e.g., USUN6, NSUN6) were also predictors of melanoma prognosis [45].
Mancuso et al. used machine learning to analyze levels of selected cytokines and classify stage I and II melanoma patients as having a high or low risk of metastasis. The study found that the cytokines IL-4, GM-CSF, and CDC, combined with Breslow thickness, best predicted melanoma metastasis [48].
Johannet et al. applied deep learning to histology specimens, together with clinicodemographic variables, to predict low versus high risk of progression after immunotherapy in advanced melanoma [46]. A separate computational pathology-based cell classification algorithm demonstrated that a high ratio of lymphocytes within the stromal compartment to all lymphocytes, and a high ratio of stromal cells to all cells, correlated with poor survival in melanoma [53]. Histology slides from primary melanoma tumors with known SLN status were used to train a machine learning model to predict SLN status, but the model achieved only 61% accuracy and was not considered clinically useful [41].
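A linear SVM classifier of the kind described above learns one weight per input gene. It separates metastatic from non-metastatic samples by the sign of the weighted sum plus a bias term. The sketch below trains a linear SVM by full-batch subgradient descent on the L2-regularized hinge loss, in plain Python. The two-feature toy data, learning rate, regularization strength, and epoch count are all assumptions chosen for illustration. They bear no relation to the 798-gene model in the cited study.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM by full-batch subgradient descent.

    Minimizes lam / 2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    X: list of feature vectors (e.g., standardized expression values per gene).
    y: labels in {-1, +1} (e.g., +1 = metastatic, -1 = non-metastatic).
    """
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin, so its hinge term is active
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the sign of the decision function w . x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical toy data: two standardized gene-expression features per sample.
X = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9], [-1.5, -2.0], [-2.0, -1.2], [-1.7, -1.8]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

With hundreds of genes and few samples, near-perfect training accuracy is easy to achieve. The features and model should therefore be assessed on held-out or external data, ideally with feature selection repeated inside each training fold.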
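Differential expression analyses that screen thousands of genes usually control the false discovery rate (FDR) before declaring any gene differentially expressed. The exact procedures used in the cited studies are not described here. As one common choice, the sketch below implements the Benjamini-Hochberg step-up procedure. It assumes per-gene p-values have already been computed, and the p-values shown are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected at false discovery rate q.

    Step-up rule: sort the m p-values ascending, find the largest rank k with
    p_(k) <= k / m * q, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_k = rank
    return sorted(order[:largest_k])


# Hypothetical per-gene p-values from a primary-versus-metastatic comparison.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_values, q=0.05))  # [0, 1]
```

Five of these hypothetical genes have unadjusted p-values below 0.05, but only two survive FDR control at 5%. This illustrates how multiple-testing correction shrinks a candidate gene list.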
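Cell-composition features like these ratios are simple counts over cells that an upstream classifier has labeled by cell type and tissue compartment. The sketch below assumes each detected cell arrives as a hypothetical (cell_type, compartment) pair. It interprets the first ratio as stromal lymphocytes over all lymphocytes, which is an assumption about the feature definition rather than the cited study's code. The labels and counts are made up.

```python
from collections import Counter


def compartment_ratios(cells):
    """Compute two cell-composition features from classified cells.

    cells: iterable of (cell_type, compartment) pairs, e.g. ("lymphocyte", "stroma").
    Returns the fraction of lymphocytes located in the stroma (assumed definition)
    and the fraction of all cells that are stromal cells.
    """
    counts = Counter(cells)
    total_cells = sum(counts.values())
    total_lymphocytes = sum(n for (ctype, _), n in counts.items() if ctype == "lymphocyte")
    stromal_lymphocytes = counts[("lymphocyte", "stroma")]
    stromal_cells = sum(n for (ctype, _), n in counts.items() if ctype == "stromal_cell")
    return {
        "stromal_lymphocyte_fraction": (
            stromal_lymphocytes / total_lymphocytes if total_lymphocytes else 0.0
        ),
        "stromal_cell_fraction": stromal_cells / total_cells if total_cells else 0.0,
    }


# Hypothetical classifier output for one slide region (200 cells in total).
cells = (
    [("lymphocyte", "stroma")] * 30
    + [("lymphocyte", "tumor")] * 10
    + [("tumor_cell", "tumor")] * 120
    + [("stromal_cell", "stroma")] * 40
)
print(compartment_ratios(cells))
# {'stromal_lymphocyte_fraction': 0.75, 'stromal_cell_fraction': 0.2}
```

In a survival analysis, per-slide features like these would then be related to outcome, for example with the Kaplan-Meier or Cox methods discussed earlier.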

Conclusions
Cutaneous melanoma is a genetically heterogeneous disease with many patient subgroups associated with different outcomes. There are currently no well-validated and widely used melanoma risk stratification tools. Bioinformatic analyses, particularly machine learning models, have shown high accuracy in risk stratifying melanoma patients in internal validation. However, these tools will need to be externally validated before they can have clinical utility. Bioinformatic and machine learning analyses are growing rapidly in the field of melanoma, and we anticipate that continued research on melanoma risk stratification tools could change future patient management and outcomes.

Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.