Investigating Serum and Tissue Expression Identified a Cytokine/Chemokine Signature as a Highly Effective Melanoma Marker

Simple Summary In this study, we investigated the expression of 27 cytokines/chemokines in the serum of 232 individuals (136 melanoma patients vs. 96 controls). It identified several cytokines/chemokines differently expressed in melanoma patients as compared to the healthy controls, as a function of the presence of the melanoma, age, tumor thickness, and gender, indicating different systemic responses to the melanoma presence. We also analyzed the gene expression of the same 27 molecules at the tissue level in 511 individuals (melanoma patients vs. controls). From the gene expression analysis, we identified several cytokines/chemokines showing strongly different expression in melanoma as compared to the controls, and the 4-gene signature “IL-1Ra, IL-7, MIP-1a, and MIP-1b” as the best combination to discriminate melanoma samples from the controls, with an extremely high accuracy (AUC = 0.98). These data indicate the molecular mechanisms underlying melanoma setup and the relevant markers potentially useful to help the diagnosis of biopsy samples. Abstract The identification of reliable and quantitative melanoma biomarkers may help an early diagnosis and may directly affect melanoma mortality and morbidity. The aim of the present study was to identify effective biomarkers by investigating the expression of 27 cytokines/chemokines in melanoma compared to healthy controls, both in serum and in tissue samples. Serum samples were from 232 patients recruited at the IDI-IRCCS hospital. Expression was quantified by xMAP technology, on 27 cytokines/chemokines, compared to the control sera. RNA expression data of the same 27 molecules were obtained from 511 melanoma- and healthy-tissue samples, from the GENT2 database. Statistical analysis involved a 3-step approach: analysis of the single-molecules by Mann–Whitney analysis; analysis of paired-molecules by Pearson correlation; and profile analysis by the machine learning algorithm Support Vector Machine (SVM). Single-molecule analysis of serum expression identified IL-1b, IL-6, IP-10, PDGF-BB, and RANTES differently expressed in melanoma (p < 0.05). Expression of IL-8, GM-CSF, MCP-1, and TNF-α was found to be significantly correlated with Breslow thickness. Eotaxin and MCP-1 were found differentially expressed in male vs. female patients. Tissue expression analysis identified very effective marker/predictor genes, namely, IL-1Ra, IL-7, MIP-1a, and MIP-1b, with individual AUC values of 0.88, 0.86, 0.93, 0.87, respectively. SVM analysis of the tissue expression data identified the combination of these four molecules as the most effective signature to discriminate melanoma patients (AUC = 0.98). Validation, using the GEPIA2 database on an additional 1019 independent samples, fully confirmed these observations. The present study demonstrates, for the first time, that the IL-1Ra, IL-7, MIP-1a, and MIP-1b gene signature discriminates melanoma from control tissues with extremely high efficacy. We therefore propose this 4-molecule combination as an effective melanoma marker.

its clinical application as a melanoma marker. Consensus among different studies is often difficult given experimental discrepancies on serum/plasma handling or the antibodies' sensibility/specificity. Using multiplex immune-based technology may overcome these issues, at least in part, by measuring many different analytes within the same sample.
We have previously investigated melanoma markers by in vitro screening [13], as well as by investigating the ion channels [14,15], autophagy-related molecules [16,17], or molecules related to lipid metabolism [18] in populations composed of hundreds/thousands of controls and patients. In the present study, we investigated the cytokine/chemokine protein expression in the serum of 232 controls/melanoma patients recruited in our hospital, and the gene expression on 511 melanoma tissues selected from the GENT2 database. We report here, for the first time, significant differences related to gender, age, and Breslow thickness in the serum-expression dataset. In the tissue-expression dataset, we report, for the first time, a highly relevant gene marker combination, discriminating healthy controls from melanoma patients with an extremely high accuracy, and reaching an AUC = 0.982, according to the ROC analysis.

Results
The cytokine/chemokine expression in melanoma patients was analyzed to identify molecules with strong and significant differential expression in patients vs. controls. The cytokine/chemokine protein expression in the serum of 232 patients recruited at the IDI hospital and their RNA expression in tissue biopsies of 511 samples from the GENT2 public database were evaluated. The serum expression and tissue expression of the same 27 human chemokines/cytokines were analyzed as a single-molecule analysis, as a paired-molecule analysis, or as a profile analysis, as reported in the cartoon depicted in Figure 1.

Serum Expression: Single-Molecule Analysis of the Cytokines/Chemokines in Melanoma Patients vs. Controls
The serum dataset included the following information: histopathological diagnosis (96 pathological subjects versus 136 controls), sex (112 male and 120 female), age (median 46.5 years, and mean 48.54 years), Breslow's depth (minimum value 0 mm, maximum 12 mm, median 0.7 mm, and mean 1.34 mm), and the expression values of the 27 cytokines/chemokines expressed as pg/mL. Table 1 summarizes the information on the serum dataset. Male melanoma 48 58.0 1.08 31 14 Total 232 * The 1 mm limit is consistent with the current threshold used for staging of T1 melanoma patients and allowed the best case distribution. Not all pathological samples report the thickness value.
Tables S1 and S2 report more general data (number of samples for each molecule, minimum value, 25% percentile, median, 75% percentile, maximum, mean, standard deviation, and having passed the normality test (or not)) for all controls and all melanoma patients, respectively. Table 2 reports the mean values of serum expression of the 27 cytokines/chemokines, the statistical significance of the differences, and the AUC according to the ROC analyses. Five molecules show a Cancers 2020, 12, 3680 4 of 27 significantly (p < 0.05) different expression in melanoma vs. the controls, namely, IL-1b, IL-6, IP-10, PDGF-BB, and RANTES. The ROC analyses indicated that none shows a good ability to act as a serum marker of melanoma; in fact, an AUC < 0.70 was found in all cases. Nevertheless, the following Breslow-, age-, and gender-specific characterization indicated many statistically significant differences.    Table S3 reports the correlations with Breslow thickness in male melanoma and in female melanoma patients and shows that the cytokines with significant correlations are different in males vs. females. Table 3. Serum expression as a function of Breslow thickness. Count indicates the number of patients analyzed. The median expression values of the 27 molecules with a Breslow thickness <1 mm or >1 mm are reported. The significance of the difference according to the Mann-Whitney analyses is also reported: for p-values > 0.05, the null hypothesis that the two distributions of cytokine expressions have the same median should be rejected (significant values are reported in bold and are underlined). In simpler words, when the p-value is < 0.05, the cytokine expressions in patients with a Breslow thickness <1 mm and expression in patients with a thickness >1 mm have significantly different medians.  A further analysis was carried out as a function of age, in all melanoma patients and all controls. The Spearman's correlation index was computed between the age and the expression value of each cytokine. Seven significant correlations were found in the controls involving IL-7, IL-12(p70), IL-13, IP-10, MIP-1a, MIP-1b, and VEGF. Such correlation were mostly lost in the melanoma patients; in fact, the patients showed only two significant correlations with age, namely, IP-10 and G-CSF (Table 5). A similar finding in male controls vs. male melanoma and in female controls vs. female melanoma is reported in Tables S4 and S5, showing a strong reduction in the correlation with age in melanoma samples compared to the controls. Then, a gender-specific analysis was carried out. Namely, expression levels of the 27 cytokines in male melanoma were compared to female melanoma. Table 6 shows that Eotaxin is significantly increased in male vs. female melanoma, and MCP-1 expression is significantly reduced in male vs. female melanoma, highlighting gender-related differences in cytokine/chemokine serum expression. Altogether, the results reported in Tables 2-6 indicate strong and significant differential expression of several cytokines/chemokines, as a function of Breslow thickness, age, and gender. Such differences support the known role of these molecules in controlling proliferation, immune response, chemotaxis, and inflammation in melanoma samples, and provide molecular insights into the systemic response to melanoma (see the Discussion section).

Serum Expression: Analysis of Paired Molecules by a Correlation Matrix
A correlation analysis was then carried out. Namely, Spearman's correlations between the expression values of all pairs of molecules were investigated in control and in melanoma patients. Figures 2 and 3 show the molecule pairs exhibiting the highest correlation coefficients in the control and in melanoma patients, respectively. Specifically, Figure 2A,B shows the heatmap of the intersections having a p < 0.05 and Spearman's rank coefficient >0.60, in the control and melanoma samples, respectively. Figure 3A,B show the molecules pair exhibiting a more severe selection, i.e., a correlation with p < 0.05 and a coefficient >0.7. In Figure 2, the 27 molecules were roughly clustered according to their biological functions. A higher number of strong correlations appear in the melanoma samples as compared to the controls, involving either immune/inflammatory molecules, chemokines, and angiogenic factors, indicating that the cytokines/chemokines expression network appears to be more reciprocally correlated in the melanoma samples.

Serum Expression: Profile Analysis by SVM
The serum expression of the 27 chemokines/cytokines was then analyzed as a global profile. The SVM supervised learning algorithm was used, performing a simultaneous analysis of all molecules

Serum Expression: Profile Analysis by SVM
The serum expression of the 27 chemokines/cytokines was then analyzed as a global profile. The SVM supervised learning algorithm was used, performing a simultaneous analysis of all molecules as predictors of melanoma state. A 10-fold cross-validated, linear-kernel SVM search method was carried out, and the missing values were handled in two alternative ways, as specified in the Methods

Serum Expression: Profile Analysis by SVM
The serum expression of the 27 chemokines/cytokines was then analyzed as a global profile. The SVM supervised learning algorithm was used, performing a simultaneous analysis of all molecules as predictors of melanoma state. A 10-fold cross-validated, linear-kernel SVM search method was carried out, and the missing values were handled in two alternative ways, as specified in the Methods section. The SVM analysis of the sera data improved the classification efficacy as compared to the single-molecule analysis reported in Table 2, leading to an AUC value = 0.761 and an average accuracy of 0.724 with a p = 0.108 (reported in bold and underlined in Table 7). This is slightly higher than 0.7, i.e., the best AUC value obtained by analysis in the single-molecule data ( Table 2). Such a result was obtained by removing the missing values and considering as predictors the age and the expression values of the molecules IL-4, IL-8, IL-9, Eotaxin, FGF-2, IFN-γ, IP10, MIP-1a, MIP-1b, PDGF-BB, RANTES, and VEGF. Table 7. Results of the SVM method applied to the serum expression dataset. Missing values are either removed or assigned as zero. In the first case, some molecules are removed from the predictor values (remaining molecules are listed in the "Molecules" column). Sex and/or age are or are not regarded as predictors. The p-value is the probability that the "Accuracy" value is not significantly above the "No Info Rate" value.  The SVM analysis on the serum expression data show that the profile analysis may improve the classification efficacy as compared to the single-molecule analysis, but unfortunately not enough to reach clinically relevant values.

Tissues Expression: Single-Molecule Analysis of 27 Cytokines/Chemokines in Melanoma Patients and Controls
Gene expression of the 27 chemokines/cytokines was then evaluated in tissue biopsies of melanoma samples and in control samples. Expression data were derived from the skin cancers section of the GENT2 database. The interface available at the link http://gent2.appex.kr/gent2/ presents data from all skin cancers combined, reporting analyses of Basal Cell Carcinoma (BCC) pooled with Squamous Cell Carcinoma (SCC), Merkel carcinoma primary, Merkel carcinoma metastatic, primary and metastatic melanoma data, for a total of 810 samples. We therefore extracted data referring to normal skin, primary melanoma, and all other melanoma data, excluding all other skin cancers from the analysis. After such a selection, 511 samples were considered, namely, 201 normal skins, 83 primary/primary in-transit melanoma patients, and 227 metastatic melanoma patients. The median gene expression values of the 27 molecules are reported in Table 8 for the three categories. No other stratifications (such as sex, age, or Breslow thickness) were carried out, given the database limitations reported in the Material and Methods section. Differences of the medians in the categories assessed by Mann-Whitney analysis revealed that most molecules show significantly different median values (p < 0.05). ANOVA analysis was also carried out on the three groups, indicating similarly strong significant differences for most molecules investigated (see Table S6). A ROC analysis was then carried out for every molecule by comparing the control vs. all melanoma, control vs. primary melanoma, control vs. metastatic, and primary melanoma vs. metastatic melanoma samples. The results are reported in Table 9. Four molecules were found to be very good classifiers of the control vs. melanoma samples, namely, IL-1Ra, IL-7, MIP-1a, and MIP-1b, with AUC values of 0.88, 0.86, 0.93, and 0.87, respectively. The corresponding ROC curves for the control vs. all melanoma samples are shown in Figure 4.  These results indicate that analyzing the gene expression of single molecules identifies relevant and significant differences; in this case, the ability to discriminate melanoma from the controls is much higher than the serum-expression data. Nevertheless, such values are below the threshold commonly indicated for potential clinical application. We then carried out the paired-molecule and profile analysis.

Tissue Expression: Analyzing Paired Molecules by a Matrix Correlation
The analysis of the paired-molecule correlations was then carried out, similarly to what we have done for the sera dataset. The correlations between the expression values of all pairs of molecules in the control and melanoma patients were analyzed by computing Spearman's rank correlation coefficient. Figures 5 and 6 show the molecule pairs exhibiting high correlation coefficients in the control and melanoma patients. The heatmap in Figure 5 highlights the pairs with significant correlation coefficients of R > 0.6 with p < 0.05. Figure 6 shows a more severe selection, i.e., the pairs with a correlation p-value < 0.05 and coefficient >0.70. As observed in the serum dataset, this analysis indicates that many strong correlations appear in the melanoma samples as compared to the controls, involving immune/inflammatory molecules, chemokines, and angiogenic factors. Figures 5 and 6 show the molecule pairs exhibiting high correlation coefficients in the control and melanoma patients. The heatmap in Figure 5 highlights the pairs with significant correlation coefficients of R > 0.6 with p < 0.05. Figure 6 shows a more severe selection, i.e., the pairs with a correlation p-value < 0.05 and coefficient >0.70. As observed in the serum dataset, this analysis indicates that many strong correlations appear in the melanoma samples as compared to the controls, involving immune/inflammatory molecules, chemokines, and angiogenic factors.    Figures 5 and 6 show the molecule pairs exhibiting high correlation coefficients in the control and melanoma patients. The heatmap in Figure 5 highlights the pairs with significant correlation coefficients of R > 0.6 with p < 0.05. Figure 6 shows a more severe selection, i.e., the pairs with a correlation p-value < 0.05 and coefficient >0.70. As observed in the serum dataset, this analysis indicates that many strong correlations appear in the melanoma samples as compared to the controls, involving immune/inflammatory molecules, chemokines, and angiogenic factors.

Tissue Expression: Profile Analysis by SVM
As in the case of the serum expression data, the SVM supervised learning algorithm was used to investigate all molecules simultaneously as melanoma predictors. Impressive results were achieved by analyzing the tissue-expression data. The results are shown in Table 10. The average of the AUC values obtained in the 10 iterations of the cross-validation procedure is 0.99, much higher than the highest AUC value obtained in the ROC analysis of the single molecules (namely, 0.93; see Figure 4 and Table 9). The p-value is <0.00001; hence, we are highly confident about the statistical significance of this observation. Table 10. SVM method applied to the tissue-expression dataset. Sex and age were not considered as predictors for the lack of data in the dataset. The p, i.e., the probability that the "Accuracy" value is not significantly higher than the "No Info Rate" value, is lower than 0.00001. Hence, we can safely reject the null hypothesis and we can assume that the accuracy of the predictive model is higher than the value of the No Info Rate (0.61), corresponding to the performance of a dummy, fixed-answer predictor. We then conclude that by using all molecules as melanoma classifiers, the accuracy of the prediction is extremely strong. The ROC curve of the predictive model based on the simultaneous analysis of all molecules is shown in Figure 7. Figure 4 and Table 9). The p-value is <0.00001; hence, we are highly confident about the statistical significance of this observation.

Training
We then conclude that by using all molecules as melanoma classifiers, the accuracy of the prediction is extremely strong. The ROC curve of the predictive model based on the simultaneous analysis of all molecules is shown in Figure 7. Table 10. SVM method applied to the tissue-expression dataset. Sex and age were not considered as predictors for the lack of data in the dataset. The p, i.e., the probability that the "Accuracy" value is not significantly higher than the "No Info Rate" value, is lower than 0.00001. Hence, we can safely reject the null hypothesis and we can assume that the accuracy of the predictive model is higher than the value of the No Info Rate (0.61), corresponding to the performance of a dummy, fixed-answer predictor.  A model based on many predictors (in this case 27) may present some practical issues. We thus performed a Recursive Feature Elimination (RFE) procedure (summarized in the Methods section), to select the most relevant molecules of the predictive SVM-based model. According to this analysis, the most sensible predictors are, in order, MIP-1a, IL-1RA, IL-7, MIP-1b, IL-12(p70), and TNF-α. The best four molecules correspond to the ones shown in Table 9, obtained with ROC analyses of the A model based on many predictors (in this case 27) may present some practical issues. We thus performed a Recursive Feature Elimination (RFE) procedure (summarized in the Methods section), to select the most relevant molecules of the predictive SVM-based model. According to this analysis, the most sensible predictors are, in order, MIP-1a, IL-1RA, IL-7, MIP-1b, IL-12(p70), and TNF-α. The best four molecules correspond to the ones shown in Table 9, obtained with ROC analyses of the single molecules. As shown in Figure 8, a model based on two molecules, namely, MIP-1a and IL-1RA, achieves an AUC value = 0.965, while a 4-predictor model (MIP-1a, IL-1RA, IL-7, and MIP-1b) reaches an AUC = 0.982. These molecules stably represent the best 4-marker combination with the highest AUC value. Further increasing the number of predictors does not add any relevant improvement (see Figure 8).

Num. Controls
We therefore conclude that combined analysis of the expression of the MIP-1a, IL-1RA, IL-7, and MIP-1b genes represents the best combination within the 27 investigated, able to very effectively discriminate the control from the melanoma samples.
highest AUC value. Further increasing the number of predictors does not add any relevant improvement (see Figure 8).
We therefore conclude that combined analysis of the expression of the MIP-1a, IL-1RA, IL-7, and MIP-1b genes represents the best combination within the 27 investigated, able to very effectively discriminate the control from the melanoma samples. We finally investigated the role of the expression of these four genes as a prognostic factor. According to the survival analysis tool available in the GEPIA2 database, 3 out of 4 show significant Hazard Ratios. Namely, IL7, MIP-1a (CCL3), and MIP-1b (CCL4) show a HR of 0.71, 0.65, and 0.5, respectively, with p = 0.01, 0.002, and 1 × 10 −7 , respectively. These data indicate significantly improved survival in patients with high expression values for these three genes.

Results Validation
The expression of the four molecules reported in Figure 6 was then investigated in an independent database, namely, GEPIA2 (found at http://gepia2.cancer-pku.cn/). Expression was confirmed to be significantly different in melanoma compared to the healthy controls, for IL-1Ra (recognized as ILRn by GEPIA2), IL-7, MIP-1a (recognized as CCL3 by GEPIA2), and MIP-1b (recognized as CCL4 by GEPIA2) (see Figure 9). We finally investigated the role of the expression of these four genes as a prognostic factor. According to the survival analysis tool available in the GEPIA2 database, 3 out of 4 show significant Hazard Ratios. Namely, IL7, MIP-1a (CCL3), and MIP-1b (CCL4) show a HR of 0.71, 0.65, and 0.5, respectively, with p = 0.01, 0.002, and 1 × 10 −7 , respectively. These data indicate significantly improved survival in patients with high expression values for these three genes.

Results Validation
The expression of the four molecules reported in Figure 6 was then investigated in an independent database, namely, GEPIA2 (found at http://gepia2.cancer-pku.cn/). Expression was confirmed to be significantly different in melanoma compared to the healthy controls, for IL-1Ra (recognized as ILRn by GEPIA2), IL-7, MIP-1a (recognized as CCL3 by GEPIA2), and MIP-1b (recognized as CCL4 by GEPIA2) (see Figure 9).
The combined expression of these four molecules was then subject to a PCA analysis carried out by the "Dimensionality reduction" tool in GEPIA2. The three most relevant components very effectively differentiated melanoma from controls (Figure 10), indicating that the combined analysis of these four molecules may represent an effective melanoma marker.
This observation fully validated the SVM analysis reported in Figure 6. Cancers 2020, 12, x FOR PEER REVIEW 17 of 27 Figure 9. Tissue expression according to the GEPIA2 database. An asterisk (*) indicates p < 0.0001.
The combined expression of these four molecules was then subject to a PCA analysis carried out by the "Dimensionality reduction" tool in GEPIA2. The three most relevant components very effectively differentiated melanoma from controls (Figure 10), indicating that the combined analysis of these four molecules may represent an effective melanoma marker.
This observation fully validated the SVM analysis reported in Figure 6.

Discussion
While several serum biomarkers are investigated in melanoma patients [19], diagnostic markers currently applied in clinics are restricted to S-100, HMB-45, Melan-A, and SM5-1 [20,21], and the prognostic markers to monitor melanoma progression are S100B, MART1, PMEL, and S100A13 [22]. As recently reported [23], potential markers in melanoma are mutations (on BRAF, NRAS, KIT, GNA11/GNAQ, NF1, CDKN2AI, immunohistochemical biomarkers (such as PD-11 and PD-L1, as well as mutated BRAF and NY-ESO-1), miRNAs, and other serum molecules. The key role cytokines/chemokines play in the immune/inflammatory response and in proliferation and chemotaxis control has been largely investigated; nevertheless, their role as diagnostic or prognostic markers remains to be elucidated. We and others demonstrated the key role of growth factors such as FGF-2, PDGF, and TNF-α in controlling melanoma growth [24][25][26] and melanoma aggressiveness [27]. The present study is the first, to our knowledge, presenting a signature of four cytokines/chemokines as an extremely effective melanoma marker, in a large patient collection. The present study measures cytokine/chemokine expression in serum and in tumor biopsies. We did not expect that the same cytokines/chemokines would be modified in serum and in tissues. In fact, the molecules measured in the serum are likely produced as a systemic response, while the molecules measured within the biopsies are directly produced in the tumor or in the regions immediately close to it. Therefore, the cytokines/chemokines measured within the biopsies reflect more directly the tumor biology and its aggressive behavior. On the contrary, the cytokines/chemokines measured in the serum reflect more how the organism responds to the tumor from an inflammatory/immunological point of view. We cannot exclude that molecules produced within the primary tumor may reach the blood. However, such signals may be not measured due to the large dilution in the blood stream and their expression values may fall below the detection limit. We used the xMAP technology for quantification in serum samples, to minimize as much as possible the sensitivity limitations.
For the sake of clarity, we will discuss below the results of serum expression separately from the results on tissue expression.
Serum expression: Analyzing the serum expression of 27 cytokines/chemokines did not identify any relevant marker when individually analyzed. Nevertheless, significant and strong differential expressions were found in melanoma vs. controls (see Table 2), as well as in melanoma samples as a function of Breslow thickness (see Tables 3 and 4) or age (Table 5). Furthermore, significant gender-specific differences were identified in Eotaxin and MCP-1 expression (Table 6), as well as in GM-CSF, TNF-α, IL-9, MIP-1a, IL-8, PDGF-BB, and MCP-1 (Tables S3-S5). This finding indicated the molecular bases possibly underlying the different incidence and different mortality rates in male vs. female melanoma [28][29][30][31][32], as well as the unexpected better response of immunotherapies in men than in women [33].
Serum expression of IL-1b, IL-6, IP-10, PDGF-BB, and RANTES was found to be significantly different in melanoma vs. controls (Table 2), reinforced by a much larger patient cohort, since previous observations were carried out on much smaller patient cohorts [34,35]. These molecules make a proinflammatory milieu previously found in uveal melanoma [36] and such findings agree with recent data showing that serum inflammation markers are strongly associated with melanoma progression [37]. Cytokines and chemokines are closely engaged within a large network where several ligands share few receptors [38], therefore reciprocally modulating their pro-, anti-inflammatory, chemotactic, and angiogenic functions [39][40][41]. The chemokines network is known to mediate melanoma interaction with the surrounding tissues [42]. Investigating cytokine/chemokine serum expression may therefore reveal the coordinated tissue reaction to the presence of melanoma. In the present study, Spearman's correlation matrix revealed for the first time that strong and significant correlations of the expression values are more numerous in melanoma samples than in healthy controls (Figures 2 and 3). In the melanoma samples, molecules involved in inflammation, chemotaxis, and angiogenesis had strong and significant correlations, namely, according to Figure 3, strong correlations of IL-10 with FGF-2, IL-10 with RANTES, IL-5 with IL-13, IL-6 with TNF-α, and of IFN-γ with MCP-1 appear in melanoma samples.
Particularly interesting is the strong correlation of IL-6 to TNF-α in the melanoma samples; these two molecules are known to control the ability to evade the immune system control in a PDL-1-dependent manner [43]. Such correlation data indicated that the cross-talk of cytokines and chemokines is altered in melanoma and this may help in defining a melanoma-specific, correlation-matrix fingerprint. We therefore analyzed the entire panel of 27 cytokines/chemokines with the SVM machine learning algorithm, to investigate simultaneously all molecules as predictors of melanoma state. In other studies, SVM effectively discriminated melanoma on the basis of dermoscopic images [44], ultrasonic and spectrophotometric images [45], BRAF status [46], or dermo-fluorescence spectra [47], with a reported accuracy up to 90%. SVM was previously used for prognostic purposes in melanoma patients [48] but, to our knowledge, the present study is the first applying the SVM analysis to cytokine/chemokine-expression values to discriminate melanoma from controls, both in serum and in tissue, in a large group of controls and patients. The SVM procedure was indeed able to improve the ability to classify the serum samples, from AUC = 0.70 for IL-6 expression (see Table 2) up to AUC = 0.761 for the combined indicators (see Table 7). However, this is still not good enough to propose a clinical diagnostic application. We then concluded that the serum expression data of these molecules, while showing strong and significant differences, may not be good classifiers. Several reasons may underlie this result, such as biological reasons (namely, the large serum dilution) as well as technical reasons (namely, samples storage or antibody cross-specificity). We cannot exclude that the cytokine serum expression will give improved information with an improved technology. Protein quantification is a rapidly evolving technology with the continuous upgrading of antibody combinations, sensitivity improvement, and protocol optimization. As an example, a 2007 report [49] identified several cytokines differently expressed in melanoma serum using multiplex xMAP technology, in 179 melanoma patients and 378 healthy controls. However, those data merit to be re-evaluated in the light of the currently available multiplex xMAP technology and new antibodies. As an alternative, quantitative proteomics approaches, based on mass spectrometry, indicated proteins differently expressed in melanoma compared to the controls [50]. However, the sensitivity of the latter techniques limits their application for cytokine/chemokine quantification, as compared to immunometric methods. Analyzing serum from melanoma patients aimed at identifying markers suitable for the early diagnosis, using a minimally invasive technique, expressions significantly different were indeed identified. However, we could not identify good markers within the 27 molecules investigated. This may depend on the molecules chosen (i.e., we should probably change targets and focus on other molecules), or it may depend on the high dilution factor in the serum samples.
We should address briefly the age-matching issue. As reported in Table 1, the mean age in the healthy groups and melanoma groups is different. Such difference reflects what the reality is, i.e., cancer patients are generally older than healthy controls, since increased age is a specific cancer risk factor. In the present study, individuals were sequentially enrolled, and controls were individuals with a suspect lesion removed and diagnosed by the pathologist as a not-cancer lesion. To have a similar age distribution in patients and in the control groups, one would be forced to remove several young healthy controls from the dataset (to match the rarely present young melanoma patients) and to remove several old melanoma patients (to match the rarely present old healthy controls). Age-matching would, therefore, strongly decrease the number of individuals analyzed, and alter the actual patient and control age distribution. The SVM analysis reported in Table 7 was also carried out on the age-matched groups, and the results are similar to the ones obtained from unmatched groups (See Table S7). Furthermore, the matrix analysis reported in Figure 3 was carried out also on the age-matched groups, and the results are identical to the ones obtained from the unmatched groups. We then conclude that age-matching, required for correct statistical analysis, would strongly reduce the group numerosity; it also would abolish a specific risk factor. In addition, the results achieved on the age-matched groups appear very similar or identical to those obtained on the age-unmatched groups, under our experimental conditions. Tissue expression: Analyzing the tissue expression was much more effective to discriminate melanoma from controls. This is likely related to biological reasons (i.e., the direct analysis of the melanoma bearing tissue), as well as technical reasons (i.e., using a more stable quantification technique). Very high AUC values were calculated, up to 0.92 (see Figure 4), with the single-molecule analysis. Most of the investigated molecules showed a significant differential expression. This finding indicated that, at the tissue level, most of the cytokines/chemokines investigated are strongly altered in melanoma, prompting us to look for a further improvement of their classifier ability. Analysis of paired molecules reinforced what was observed in the sera dataset, showing that high-correlation pairs appear in melanoma while they are almost absent in the controls (Figures 5 and 6). Particularly interesting are the strong correlations involving the chemokines RANTES, MIP-1a, and MIP-1b (Figures 5 and 6).
Then, an SVM analysis on all molecules simultaneously analyzed was carried out. This analysis strongly improved the classification ability as compared to the single molecules. AUC reached an extremely high value of 0.991 when all 27 cytokines/chemokines were simultaneously considered as melanoma predictors. Use of the Recursive Feature Elimination (RFE) [51] procedure allowed us to identify the four best-performing molecules. The combination of IL-1Ra/IL-7/MIP-1a/MIP-1b shows the relevant AUC = 0.982. Interestingly to notice, the expression of IL-1Ra and IL-7 (known anti-inflammatory cytokines) is significantly reduced in the melanoma samples, while expression of MIP-1a and MIP-1b (known inflammatory chemokines) is significantly increased. Previous data demonstrate that CCL3 (MIP-1α) and CCL4 (MIP-1b) control the infiltration of the immune cells by recruiting antigen-presenting cells, including dendritic cells (DCs), to the tumor site via IFN-γ [52]. However, the specific signature made of the two anti-inflammatory and two pro-inflammatory molecules is a novel finding to our knowledge.
Full validation of these results was achieved on an independent dataset, the GEPIA2 database, reporting expression data from 1019 control and melanoma samples. The four molecules IL-1Ra, IL-7, MIP-1a, and MIP-1b were confirmed to have a significant (p < 0.0001) differential expression in melanoma (Figure 9), and their combined analysis with the PCA methodology (a different methodology compared to SVM) was found to effectively discriminate the controls from the melanoma samples ( Figure 10).
The identification of the relevant gene markers from the tissue-expression data by using a quantitative technique may help improve histological diagnoses. Identification of a 4-gene signature may be a relevant help for pathologists. Measuring expression of these genes represents a quantitative approach that is operator-independent and may be part of an automatic process useful to identify suspect samples.

Patients Selection and Recruitment
Melanoma patients were consecutively recruited at the hospital sections of IDI-IRCCS, according to the procedure approved by the IDI Ethics Committee (CE 287/1 approved 7/04/2009) based on a suspect skin lesion. All patients gave written informed consent. Patients under any pharmacological melanoma therapy were excluded. Serum was collected before the biopsy procedure, aliquoted, and stored at −80 • C. According to the histological analysis, patients were then assigned to the control arm or to the melanoma arm. A total of 232 patients were recruited in the present study.

Serum Handling
A total of 7 mL of peripheral blood were collected from patients. Blood was collected in tubes with no additives of any type. Tubes were taken at room temperature for 2 to 3 h; they were then centrifuged at 15,000 rpm for 15 min; clear yellow color serum was stored in 100 µL aliquots and stored at −80 • C. The red color was considered a hemolysis sign and such sera were then not analyzed in the current study.

Cytokines Quantification in Sera Samples
Sera were obtained from melanoma patients (n = 96) and from patients with non-melanoma suspect lesions (n = 136) and were analyzed using xMAP technology on the Luminex platform (X200 Instrumentation equipped with a magnetic washer workstation and software Manager 6.1), which allows the simultaneous quantification of many molecules. The commercial kit used was Bio-Plex Pro human cytokine 27-plex panel (

Serum Expression Data
The serum dataset was composed of 232 records corresponding to 232 different individuals. The recorded data included in each case the histopathological diagnosis, sex, age, Breslow's thickness, and the expression values of the 27 chemokines/cytokines, reported in pg/mL. The expression of a few molecules was undetectable in some patients. Specifically, expressions of IL-2, IL-5, IL-6, IL-10, IL-13, and IL-15 were undetected in most cases and were measured in less than 30 controls and/or less than 30 melanoma samples. We handled missing values in two alternative ways, i.e., we removed the missing data point from the analysis (either the whole cytokine from all samples or the whole sample containing the missing value, depending on the performed analysis while trying to maximize the size of the resulting dataset), or, alternatively, we assumed the missing values are equal to zero, thus assuming that all missing values were caused by expression values lower than the measurement thresholds of the diagnostic kits.

Tissue Expression Data from GENT2 Database
The 27 molecules measured in the sera were investigated in control and melanoma samples taken from the GENT2 database, according to transcriptomic data; data of 201 normal skin biopsies were from 11 independent studies (referred to as GSE39612, GSE30355, GSE14905, GSE13355, GSE7553, GSE42109, GSE16161, GSE15605, GSE7307, GSE46239, and GSE7307); data of 83 primary melanoma biopsies were from 4 independent studies (referred to as GSE10282, GSE15605, GSE7553, and GSE62837); data of 227 metastatic melanoma biopsies were from 11 independent studies (referred to as GSE62837, GSE7307, GSE31879, GSE38312, GSE15605, GSE35640, GSE7553, GSE4587, GSE19293, GSE19234, and GSE22968). The tissue dataset was composed of 511 records, each describing a single subject: the recorded data included the histopathological diagnosis, sex, age, melanoma stage, and the expression values of the 27 cytokines. Each record always states whether the subject has been clinically diagnosed as affected by melanoma (310 patients) or not (201 controls). In this dataset, there were no missing values within the expression data. However, sex and age data were not recorded in most cases: only 217 records reported sex (138 males and 79 females) and only 125 records indicated the age (minimum 20 years, maximum 92 years, median 56 years, mean 59.56 years), and the majority of the control subjects had no sex and age data (about 70% of total). Therefore, sex and age stratifications were not carried out in the tissue-expression data.

Statistical Analyses
All statistical analyses were carried out on the "R" package version 4.0.0 [53].

Single-Molecule Analysis
Expression data of the single molecules were analyzed as an evaluation of the statistical significance of the median differences, by the Mann-Whitney test with Bonferroni correction. ANOVA analysis of the differences between the controls, primary melanoma, and metastatic melanoma samples in the tissue data was also performed and reported in Supplementary Material. In this case, normal distribution was evaluated by the Shapiro-Wilk test, and homoscedasticity (homogeneity of variance) was evaluated by the nonparametric Levene test. When the ANOVA analysis showed a significant difference between the medians, Mann-Whitney with Bonferroni correction and Tukey's honest significance tests were applied as post-hoc tests.

Paired-Molecule Analysis
A data-matrix analysis investigated whether the expression correlations of all molecule pairs show any relevant difference between the control and melanoma samples. Spearman's rank correlation coefficient was calculated.

Profile Analysis
Profile analysis was based on the Support Vector Machine (SVM) supervised learning algorithm, using a linear kernel [54,55]. Briefly, the method finds the best separation hyperplane between the set of control samples and the set of melanoma samples. Each sample is assumed as a single point in the hyperspace of dimensions n, where n is the number of features that can be used as predictors (specifically, the expression values of the molecules, and optionally age and sex). The result of the SVM algorithm is a separation hyperplane that maximizes the cumulative quadratic distance between the boundary points and the hyperplane itself. A parameter C plays a crucial role when the points are not linearly separable: C represents the tradeoff between decreasing the quadratic distance and ensuring that the boundary points are properly classified. We tuned the parameter C by testing 40 values between approximately 10 −14 and 10 5 and selecting the value yielding the largest Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC).
The missing expression values were removed from the dataset, according to two alternative approaches. In the first approach, we removed either the entire sample or, if more convenient in terms of resulting dataset size, all expression values of a specific molecule from the dataset. In the second approach, we assigned all missing values to zero [56]. The expression values of each molecule were then transformed to have average = 0 and standard deviation = 1, according to standard methods for this kind of analysis [57,58]. To validate the results, a 10-fold cross-validation procedure was applied. The SVM algorithm considers all predictors as coordinates in a multidimensional space, hence the prediction model is based on the whole set of 27 expression values of the molecules. For the analysis of the tissue-expression data, a Recursive Feature Elimination procedure [51] was applied to identify the molecules having the greatest impact. Therefore, the most relevant molecules of the predictive SVM-based model were identified. This method essentially repeats several times the cross-validated SVM analysis by excluding one of the predictors at a time, then discarding the weakest one, and restarts the whole process on the set of remaining predictors. By this procedure, the molecules having the weakest impact on the performance of the SVM model were identified and removed from the feature set.

Cross-Validation Procedure
All statistical results involving random selections of samples (namely SVM analyses) were validated using "cross-validation" methods to reduce errors due to overfitting. Overfitting errors are caused by an over-optimization of the parameters of a statistical method that achieves an optimal result on the available dataset but poor results on a dataset built from a different set of observations. In a typical k-fold cross-validation procedure, the dataset is randomly partitioned in k subsets of approximately equal size. The statistical method is then repeated k times: in every execution, k-1 subsets are used as "training sets" to optimize the parameters of the method, while the remaining "testing"' subset is used to evaluate the performances of the method. Every execution of the method uses a different subset as the "testing" set. Eventually, the performances of the method are taken as the average performances of all k executions. In this work we selected k = 10; thus, every measure is the net results of 10 experimental runs.

Validation
The tissue-expression results obtained from the data collected from the GENT2 database were validated on an independent database, GEPIA2 (available at http://gepia2.cancer-pku.cn/#index), reporting the RNA expression data from 461 controls and from 558 melanoma patients. Expression and dimensionality reduction by PCA analysis were carried out by the specific tools available at the GEPIA2 database.

Conclusions
We report here, for the first time, significant differences in cytokine expression as a function of the pathological state and gender, or age, or Breslow thickness, in the serum expression of a large patient dataset. Such differences are likely related to the systemic response to the tumor and may help, at least in part, investigating the known heterogeneity of this tumor. Furthermore, by analyzing gene expression in a large tissue expression dataset, we report, for the first time, a highly relevant 4-gene signature that discriminates the controls from the melanoma patients. We also show here that the machine learning algorithm SVM appears to be very effective in improving the classification ability for potentially diagnostic purposes and clinical applications.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/12/3680/s1, Table S1: Serum expression general data in all controls (male + female). Table S2: General data on serum expression in all melanoma patients (male + female). Table S3: Correlation of the serum expression with Breslow thickness in male melanoma and in female melanoma. Table S4: Correlation of the serum expression with age, in female controls compared to female melanoma. Table S5: Correlation of the serum expression with age, in male controls compared to male melanoma. Table S6: Anova analysis of tissue expression data. Table S7: Results of the SVM method applied to the serum expression dataset after age-matching. Results are similar to the analysis performed on the unmatched data (see Table 7).