A Roadmap towards Breast Cancer Therapies Supported by Explainable Artiﬁcial Intelligence

: In recent years personalized medicine reached an increasing importance, especially in the design of oncological therapies. In particular, the development of patients’ proﬁling strategies suggests the possibility of promising rewards. In this work, we present an explainable artiﬁcial intelligence (XAI) framework based on an adaptive dimensional reduction which (i) outlines the most important clinical features for oncological patients’ proﬁling and (ii), based on these features, determines the proﬁle, i.e., the cluster a patient belongs to. For these purposes, we collected a cohort of 267 breast cancer patients. The adopted dimensional reduction method determines the relevant subspace where distances among patients are used by a hierarchical clustering procedure to identify the corresponding optimal categories. Our results demonstrate how the molecular subtype is the most important feature for clustering. Then, we assessed the robustness of current therapies and guidelines; our ﬁndings show a striking correspondence between available patients’ proﬁles determined in an unsupervised way and either molecular subtypes or therapies chosen according to guidelines, which guarantees the interpretability characterizing explainable approaches to machine learning techniques. Accordingly, our work suggests the possibility to design data-driven therapies to emphasize the differences observed among the patients.


Introduction
The definition of patients' profiles according to an automated and data-driven procedure represents a cornerstone of personalized medicine [1,2]. This practice is already well suited to address a patient towards a diagnosis and treatment, once medical hypotheses are broadcasted through multimodal clinical data [3][4][5][6][7][8]. Indeed, clinical practice can take into account multiple features, such as vital signs, historical records and symptoms to diagnose a pathological condition or determine a specific therapy. In this work, we investigate if it is possible to identify a quantitative similarity criterion to assess similarities (or dissimilarities) among oncological patients and, consequently, quantify to which extent such similarity groups match the current oncological therapies.
To this aim, we design an explainable machine learning framework able to support decision making about therapies assigned to breast cancer patients, based on a coherent description of the clinical status. In recent years, explainable artificial intelligence (XAI) has gained an increasing attention, especially for clinical purposes, where interpretability of results is of paramount importance. In general, an XAI model should have four main characteristics: transparency, justification, informativeness and uncertainty estimation [9][10][11]. Based on a proprietary cohort of patients enrolled at the Istituto Tumori "Giovanni Paolo II" in Bari (Italy), we collected the histological features of the first infiltrating breast tumor, the oncological treatment carried out according to current guidelines [12] and performed a hierarchical cluster analysis. Clustering is a pattern recognition approach widely adopted which exploits the features characterizing a sample of observations to define an intrinsic distance measure and then a similarity criterion: to ensure its transparency we included several robustness analyses in the proposed framework. If clusters transparency is somehow easy to obtain, the same consideration does not hold for justification.
In the XAI domain, justification consists in the elucidation of how the model provided a specific answer; in this case the problem is to provide a justification for the observed clusters. Besides, for clinical purposes we also hypothesize that a justification is not robust if it does not lead to informative conclusions. We unified these two aspects by adopting an embedding strategy. Embedding allows to represent, with a certain degree of fidelity, multidimensional data in suitable two-or three-dimensional spaces which are therefore prompted for visualization; thus, they are equivalently called dimensional reduction techniques. Visualization becomes the key for both justification and informativeness: in fact, using a two-dimensional embedding we were able to provide a clear clinical interpretation of clusters (justification) and quantify the informative content provided by the embedding (informativeness). Specifically, we adopted several embedding strategies such as the adaptive dimension reduction [13], principal component analysis [14,15], Laplacian eigenmap [16] and distinguishing variance embedding [17,18]. The selection of helpful prognostic factors among clinical and histological data collected in the first tumor diagnosis will characterize coherently the output of each embedding technique, verifying the data considered in current guidelines [12] by an automated procedure.
Finally, we performed an extensive evaluation for the uncertainty related to our quantitative method. Quantifying how reliable a prediction is within an XAI framework is the last but probably the most important feature to ensure. In fact, all the considerations and the conclusions drawn by the model would be inconsistent if characterized by a huge uncertainty. For this purpose, we designed and performed specific robustness analyses and statistical tests to obtain uncertainty estimates and create a comprehensive XAI framework. Provided that it is possible to determine some clinically consistent similarity groups, are there any therapy differences within these communities? Answering these questions paves the way for novel computer aided therapy design systems based on specific patients' profiles and hopefully novel and more effective treatments.

Enrolled Patients and Clinical Features
Our dataset is composed by clinical and cytohistological outcomes of 267 patients registered for a first tumor diagnosis in the period 1992-2017 and referred to Istituto Tumori "Giovanni Paolo II" in Bari (Italy). Inclusion criteria were based on the absence of primary chemotherapy for breast cancer and any patient has not to be metastatic ab initio.
Features consist in the age at diagnosis, tumor size (diameter), histological subtype (ductal, lobular, other), type of surgery (quadrantectomy/mastectomy), estrogen receptor expression (ER, %), progesterone receptor expression (PgR, %), cellular marker for proliferation (Ki67, %), histological grade (grading, Elston-Ellis scale: G1, G2, G3), human epidermal growth factor receptor-2 (HER2: Pos/Neg), the number of metastatic lymph nodes, lymph nodes stage (N: 0, 1, 2, 3), in situ component (absent, G1, G2, G3, present but not typed) and vascular invasion (absent, focal, extensive, present but not typed), while therapy (chemotherapy (CT), hormone therapy (HT), trastuzumab) is considered as unknown variables for our predictive scheme. The data set is described in Table 1. Once therapy is excluded from the set of predictive features, N = 13 prognostic factors, typically considered by clinicians during the first tumor diagnosis and therapy planning, remained to be considered. The exclusion of therapy data from input features and their use as labels concerning the outcomes of our clustering procedure ensures the interpretability of the latter, thus providing a clinical justification within the proposed machine learning framework. Clinical guidelines are referred to therapies specifically targeting cells labeled by a receptor yielded by a somatic mutation, which is signaled by a typical biomarker expression in clinical exams [12]. Anticancer drugs are engineered to limit the tumor mass proliferation by inducing cell apoptosis. Cell proliferation is characterized by genetic heterogeneity, thus often requiring the combination of molecular targeted agents [19][20][21]. Treatments of breast cancer are ruled by drugs families adopted for currently identified biomarker expressions: (i) HT select cancer cells showing ER or PgR on their membrane, (ii) a biological drug category, named trastuzumab, acts on cancer cells flagged by HER2 and (iii) CT is usually adopted when the tumor mass is endowed with not selectable cancer cells subclones and just a measure of cells proliferation given by Ki-67 antigen is available.
The combinations of biomarkers are synthetized by 5 possible molecular subtypes currently defined according to guidelines as follows [12]: A summary concerning therapies assigned to each molecular subtype in our data set is described in Table 2. The retrospective observational study was approved by our Institute Scientific Board.

An XAI Framework for Patients' Profiling
In this work, we propose an explainable artificial intelligence framework to evaluate which are the clinical features best representing oncological patients and capturing their similarities. A schematic overview is presented in the following Figure 1. Thus, the proposed framework consists of four distinct steps, which are designed according to the four XAI pillars of transparency, justification, informativeness and uncertainty estimation:

1.
Collection of patients' records, subject to a features processing which excludes those endowed with missing data.

2.
Dimensional reduction analysis, reducing the data dimensionality in a suitable twodimensional space (for XAI transparency) and evaluating which are the clinical features driving the embedding (prodromal for XAI justification).

3.
Clustering analysis, exploring and assessing the patients' similarities according to clinical features unveiled by embedding and bootstrap for robustness evaluation (for XAI justification and uncertainty estimation).

4.
Clinical and Therapy interpretation, evaluating how and to what extent the features and the clusterization previously outlined have a direct interpretation in terms of therapy differences (for XAI justification).

Dimensional Reduction Techniques
The first step of our analysis consists in the dimensional reduction. A wide variety of methods has been proposed to highlight the degrees of freedom of a data set endowed with most of the informative content. Such kinds of techniques are generally named embedding or dimensional reduction techniques; they are usually intended to project data into a subspace with a lower dimensionality to ease the data exploration, for example with the support of visual inspection or emphasize the informative content by improving the signal-to-noise-ratio. Several techniques have been proposed in the last two decades. In particular, we consider, here, the adaptive dimension reduction (ADR) approach [13].
ADR is a methodology iteratively combining the calculation of principal components and the application of k-means clustering. The motivation underlying the choice is based on a specific advantage of ADR over other effective possibilities: the absence of any parameter ruling the number of neighbors defining a local patch with respect to other popular choices such as Laplacian eigenmap [16] and distinguishing variance embedding [17,18]. Besides, ADR presents two important characteristics: • on the one hand, the application of a principal component analysis (PCA) takes into account the information contained in the whole data set; • on the other hand, the definition of clusters during each iteration rules further projections in the subspace spanned by principal components of the covariance matrix of cluster centers coordinates, weighted by the cardinality of each cluster, whose separation is improved.
Thus, this embedding technique perfectly matches the purposes of this work. Besides, to evaluate the robustness of the analysis, we considered other embedding techniques and compared their results with those obtained by ADR. ADR adopts PCA as a first step, followed by k-means clustering for the patients' sample embedded in the space with a reduced number of dimensions. In the latter each patient is associated with a position vector y i with 1 ≤ i ≤ M and included in a cluster C l with 1 ≤ l ≤ k, whose centroid reads where |C l | stands for the number of patients in the cluster or equivalently its cardinality. This partitioning procedure is ruled by the cost function.
which is the sum of the squared Euclidean distances between the objects y i and their centroids. It has to be minimized with respect to centroids coordinates, while the number of clusters represents a fixed parameter [22]. Once cluster memberships in the embedded space are defined, they are exploited in the original feature space, where the associated centroids coordinates define a new covariance matrix, whose PCA represents the starting of the next iterative embedding. The algorithm pursues this flowchart until it reaches a convergence to a stable configuration of embedded patients. Such iterations evaluate patients' similarities defining clusters in the reduced dimensions space. This partition is then induced in the original space with full dimensions to define a new embedding, where another clustering is implemented until memberships are not modified in successive steps.
For further comparison and to assess the robustness of ADR results we considered embedding techniques which for their popularity can be considered the state of the art: PCA, Laplacian eigenmap and distinguishing variance embedding (DVE).
PCA represents a minimal implementation of an embedding map. The term minimal is referred to the number of operations required by the procedure: considering the diagonalization of a covariance matrix, followed by a projection onto the subspace spanned by the eigenvectors endowed with the largest eigenvalues, PCA allows the representation of high dimensional data in suitable low dimensional spaces. Importantly, such eigenvalues provide a measure of the variance explained along the directions associated with aforementioned eigenvectors, therefore they provide a tool to evaluate the informative content of the examined data.
We also considered two methods characterized according to a different approach, because they consider local patches, built as neighborhood graphs associated with each patient. Indeed, once a neighborhood is defined in the feature space in terms of a certain number of patients endowed with lowest distance with respect to a reference one, or those included into a (hyper-)sphere centered in the latter, it is possible to build an adjacency matrix. Each not vanishing entry of the adjacency matrix is then endowed with a gaussian weight where x i is a scaled patient's features vector with each component in the unit interval range and σ is a scale parameter.
Embedded coordinates have to satisfy the same neighborhood information, as made by minimization of the cost function subject to the constraint ∑ i d(y i , 0) 2 D ii = 1 and D ii = ∑ j W ij , to remove an arbitrary scaling factor to avoid the optimization lying in a subspace with less dimensions. The cost function may be expressed compactly as E = 2 Tr Y T L Y , where Y is a M × n matrix with an embedded patient y i in each row, Tr stands for the trace of a matrix and L = D − W is the Laplacian matrix associated with the weighted graph, causing the name Laplacian eigenmap for the algorithm.
This optimization is also maintained in DVE, but in this case the global information of the data set is still included. A complementary graph is introduced according to the adjacency matrix W , with unit elements between patients not considered as neighbors in the Laplacian eigenmap roadmap. In this way a further cost is evaluated in parallel

Unsupervised Clustering and Validation
The dimensional reduction analysis provides a set of restricted dimensions which can suitably model the subjects' heterogeneity. Using these dimensions and the related clinical features, we adopt a clustering procedure to characterize the similarity of the patients enrolled in this study. To this aim, we evaluated the pairwise distances between the patients in the embedding space. Then, we used the unweighted pair group method with arithmetic mean (UPGMA) [23] to build a dendrogram. This procedure combines iteratively patients in groups according to growing distance scales among them, as defined by followed by the next patients' group distance weighted by the cardinality of the previous groups d A∪B,C = (|A|d A,C + |B|d B,C )/(|A| + |B|). In this way we define communities of patients typical of each distance scale. The distance is computed by considering the feature in a Euclidean reference system. The choice of the cutting level in the dendrogram is implemented according to internal validations, which reward cluster compactness and separation, while to measure how much the yielded partition matches with the studied medical variables we used the external validation. In particular, the decision tree nature of molecular subtypes allows us to improve explainability [10]. To evaluate the robustness of the cluster analysis, several metrics were investigated, such as total within sum of squares, average silhouette width, Dunn index and adjusted Rand index [22,24,25] (Appendix C). The limited cardinality of the considered data set imposes a required quantification of uncertainty that we implement by means of a bootstrap procedure [26]. Bootstrap consists in sampling data with replacement in order to obtain replica of the original data which statistically follow the same distribution. To ensure a higher fidelity of this procedure, we replicate the original data set by adopting a sampling ratio equal to the 80% of the available instances. This ratio ensures that the balance of considered labels is preserved. This procedure is implemented 1000 times, in each trial evaluating the three internal validations and the ARI external validation for both molecular subtypes and therapy labels. Internal and external validation metrics were implemented by means of R packages Clustering (version 1.7.2), factoextra (version 1.0.7) and mclust (version 5.4.7).

Comparing the Embedding Techniques
Our targeted roadmap towards a XAI being able to identify optimal therapy schemes has to rely on the detection of most informative variables in this purpose, whose role can be verified by the procedure adopted in current guidelines. First of all, we evaluated the different embedding techniques and compared their output using n = 2 dimensions, as shown in Figure 2.
The color legend shows the molecular subtype of the patient's tumor. Interestingly, the two-dimensional embedding shows a cluster structure. Clusters seem to reproduce the molecular subtype, a property slightly affected by the embedding technique, but ADR emphasizes the separation of each group.
To evaluate the informative content of such clusters and to obtain an interpretation, we considered which features most influence this representation. Focusing on the ADR technique, which resulted in the clearest representation, we evaluated the most important eigenvectors and examined their components: ER, PgR, Ki67 and HER2 resulted as the most significant ones. We underline that these features are exactly those defining the molecular subtypes, according to the associated decisional trees, whose selection in the relevant space appears to be an important result from a clinical point of view. By construction, the sum of squared coefficients of vectors spanning the relevant space equals to one, so we express the aforementioned weights in terms of percentages assumed by the square coefficient associated with each initial feature. The first principal component in output of the ADR procedure assigns to ER a squared coefficient equal to 20.9%, 16.1% to PgR, 2.8% to Ki67 and 56.7% to HER2. Globally, these initial features cover 96.5% of the overall unitary sum of squared coefficients. The second ADR component allocates to ER 37.5%, to PgR 18.8%, to Ki67 1.8% and to HER2 39.2%, now totally covering 97.3%, that is quite the overall variability. We chose some well-established dimensional reduction procedures to underline the robustness of the selected information once the same characterization is approximately observed in the output of each technique. The PCA embedding yields molecular subtype groups with a low separation with respect to remaining methods, because of the very simple structure of the associated algorithm. Nevertheless, it is possible already to underline the common indistinguishability shared by all embedding techniques between Luminal A and Luminal B HER2 negative patients. ADR, Laplacian eigenmap and DVE show much more separated HER2 positive and Luminal B HER2 positive patients, as well as triple negative cases. The absence for free parameters other than the embedded space dimension gives to ADR a privileged role for our purposes. Indeed, both Laplacian eigenmap and DVE are based on local patches defined through a parameter associated with the number of neighbors for each patient. In Figure 2 both cases adopt this number equal to 15 to maximize the cluster separation, even if it is data set dependent, because a different number of members is associated with each molecular subtype and patches should match as much as possible with this membership limitation.

Clustering Oncological Patients
The UPGMA clustering of the ADR embedding is described in Figure 3, where we highlight its hierarchical structure established according to the dendrogram. This latter is characterized by the definition of patients' communities typical of each distance level corresponding to which we apply a cut. This action selects the information expressed by a certain distance scale, thus grouping the elements accordingly. Our unsupervised procedure aims at the detection of molecular subtype and consequently of therapy information by cutting the dendrogram at the appropriate level.

Internal Validation
The selection of an optimal number of clusters is ruled by internal validation methods. The behavior of these quantities with respect to the number of clusters k is depicted in Figure 4 and it allows us to identify some optimal conditions. We choose to support the selection of the optimal condition by adding the information contained in the interquartile range provided by a bootstrap procedure replicating a subsample with the 80% of the original sample cardinality, together with the first derivative of each mean index, intended as a function of the number of clusters, shown in the inbox of each plot in Figure 4. Indeed, according to the presented criteria k = 4 is the choice suggested by WSS function, while by observing just the silhouette index we may follow the maximal variation obtained with lowest k value, thus leading to k = 2. The function w averaged over all patients reaches its maximum value corresponding to k = 4, so it represents a confirmation of the optimal number given by the total within sum of squares. Instead, Dunn index suggested to cut the dendrogram in Figure 3 corresponding to k = 3 clusters, so grouping toghether HER2 positive and Luminal B HER2 positive patients. These molecular subtypes present a similar clinical behavior caused by the correlated higher values for Ki67 with HER2 positivity.

External Validation
In our data set 110 patients belong to the luminal A subtype, 55 are in the luminal B HER2 negative condition, 35 in luminal B HER2 positive, 22 in HER2 positive and 45 in triple negative. Once a category among the latter is chosen for a patient, a therapy is assigned by clinicians to each case according to characterizing specificities and the presence of possible comorbidities. We resume the therapy statistics characterizing each molecular subtype in Table 2.
Results presented in Table 3 highlight misclassifications of patients belonging to luminal B HER2 negative and positive subtypes, corresponding numerically to five and six cases included among triple negative and HER2 positive molecular subtypes, respectively. The wrong inclusion is explained by considering the mean value of biomarkers for hormonal receptors, that in the cluster C4 read ER = 65.3 and PgR = 50.1, while for misclassified patients in C3 read ER = 0.4 and PgR = 4.8 and Ki67 = 20.6, so signaling a much lower value for the hormonal receptors expression. Analogously in C2 we observe ER = 61.5 and PgR = 34.4, while misclassifications in C1 are endowed with ER = 5.2 and PgR = 7.7. Table 3. Correspondence between clusters and molecular subtypes in the data set.

Cluster
Luminal To quantify the correspondence between clusters and molecular subtype or therapy labels, we evaluate ARI for both cases. To establish a baseline this calculation is implemented in the variation of the number of clusters k: the behavior of the index is shown in Figure 5, where we highlight the confirmation for the choice of optimal number of clusters, because the maximum value is reached corresponding to k = 4 in both cases. Moreover, approximately the same derivative characterizes both plots, thus strengthening the relation between therapies and molecular subtypes in a completely unsupervised way. Figure 5. Dependence of ARI on the number of clusters, which is computed with respect to the correspondence between the unsupervised clustering and molecular subtypes in (a) and therapy combinations in (b). The blue line joins mean values included in the error bars and associated with the interquartile range yielded by bootstrap, whose finite difference derivative is described in the inbox.
The clusters yielded by UPGMA contain a prevalent group in terms of both molecular subtypes or therapies, as resumed in Table 3. The maximal number of cluster members with respect to therapy correspond to the one observed in Table 2.

Discussion
Biomarkers are currently widely exploited in computer-aided diagnosis (CAD) tools in case of complex diseases such as cancer, intrinsically requiring a multifactorial study, as made for the prediction of optimal chemotherapeutic plans by fusing their information content with genic expressions [27], thus supporting the maintenance of a privileged role for cancer cell receptors in our XAI perspective towards therapy assignment.
XAI applications with medical purposes are increasingly widespread [28,29], based on the fundamental achievement of interpretability, which represents the main difference with brute force deep learning approaches. In particular, the exploitation of these algorithmic schemes for breast cancer is currently adopted for classification of malignant cases, as well as recurrent ones. With this aim the multidimensional scaling is implemented to obtain embedding in a relevant space, followed by a different step with respect to our flowchart, represented by rainbow boxes [29], which may be exploited in further developments targeting the use of query patients. In literature, a clustering technique analogous to the one presented in this study with UPGMA is used in the definition of symptoms groups among cancer patients, based on the purpose to provide a support in the definition of therapies [30,31]. Interestingly, the main objective is already focused on a robust patients' stratification by using ensemble clustering [31], thus partly anticipating the current application of XAI in medical purposes.
The output of our XAI procedure [9] satisfies transparency, because it is strictly linked with the complete knowledge of the flowchart depicted in Figure 1. Justification is supported by a cross evaluation of the emergent effective feature consisting in the molecular subtype with the therapy label, which is the target of XAI for medical purposes. Uncertainty estimations are sketched with the help of bootstrap, while the missing informativeness at this stage will be tested in future developments according to the employment of new data sources. Indeed, a higher number of features can be retained as endowed with an information content if the used bioinformatics data are correlated with the ones yielded by different sources: this data integration will be targeted in further investigations addressing the indistinguishability of Luminal A and Luminal B HER2 negative subtypes observed for any embedding technique in Figure 2 and residing in the small weight attributed to the Ki67 biomarker.
Nevertheless, the clustering depicted in Figure 3 shows an accurate selection of molecular subtypes, as proved in Table 3, while a less precise selection of therapies is described in Table 4, probably caused by not included factors characterizing each patient, as well as associated comorbidities. The bootstrap test of the presented clusters, reaching an ARI approximately equal to 0.6 with respect to molecular subtypes and 0.4 with therapies, reveals a lower level of performances, as represented in Figure 5 where the lower values assumed by ARI with therapies labels are still observed. Concerning the selection of the optimal number of clusters described in Figure 4, we have to stress that the presented results are preliminary and included in a more general framework referred to personalized medicine. Furthermore, the cardinality of our data set does not allow us to configure the discussion in the high dimensional setting. This characterization translates into internal validation indices not endowed with a stable behavior with respect to the variation in the number of clusters, which motivated our choice for a bootstrap procedure, as well as to take into account the optimal number of cluster once the first derivative associated with mean indices establishes a steady condition. The selection of k = 4 is supported both in a supervised way by ARI and in an unsupervised one by the total within sum of squares and the average silhouette width, even if taking into account error bars k = 3 represents an equivalent choice, which is suggested by the Dunn index. According to the therapeutic perspective this equivalence is motivated by the similar clinical characterization of Luminal B HER2 positive and HER2 positive subtypes, because the correlation of high values for Ki67 with HER2 positivity induces a widespread adoption of chemotherapy. It is important to underline that the maximum value of the Dunn index together with the most pronounced variation of the average silhouette width are associated with k = 2, so related with just the HER2 discrimination.
Results shown in Table 4 matching the most frequent therapeutic plan observed in Table 2 establish that misclassifications regarding subtypes do not prefer a certain therapeutic strategy, but they spread over all combinations, thus maintaining the same prevalent assignment given by clinicians per subtype. Moreover, the behavior of ARI curves represented in Figure 5 is equivalent, unless regarding the values they reach, thus confirming that the overall trend is well captured by our clustering procedure and therapy inaccurate assignments implemented by clustering are just statistical fluctuations. The correspondence between molecular subtypes and therapies shown in Table 2 represents the hinge of our unsupervised clustering procedure with health care plans assigned by clinicians to each patient.

Conclusions
In this work we present results concerning an unsupervised definition of clusters of breast cancer patients diagnosed with a first tumor. The adaptive dimension reduction method is combined with the average distance clustering, whose output is compared with molecular subtype and therapy labels through the adjusted rand index metric. Internal validation indices show the limits of our study linked with the small cardinality of the data set, thus characterizing it as a preliminary analysis inserted in the more general personalized medicine purpose. The equivalent behavior of the unsupervised partition with respect to both labeling ensures the robustness of the clinical decision, here considered just as a label, so not included as a feature.
Besides, this result guarantees further developments of the implemented procedure to detect therapy information in an unsupervised way among elements of unstructured datasets, as well as to suggest modification in therapeutic plans. In this perspective once a sufficiently populated database is obtained, we may think to implement a query of new patients subject to a case-based reasoning able to detect specificities and comorbidities, as well as to improve tumor recurrences by adopting a most suitable therapeutic scheme.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available because are propriety of Istituto Tumori "Giovanni Paolo II"-Bari, Italy.
Acknowledgments: This work was supported by funding from the Italian Ministry of Health "Ricerca Finalizzata 2018".

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Adaptive Dimension Reduction
The ADR procedure which maps into the relevant space endowed with a reduced dimension n adopts centered features and starts by implementing a PCA, that is by keeping just the eigenvectors associated with the n largest eigenvalues of the covariance matrix X T X, where X is a M × N matrix representing the data set with each row corresponding to a patient x i . The aforementioned diagonalization is implemented through a N × N rotation matrix R, whose columns correspond to eigenvectors r j of the covariance matrix X T X. Denoting by Q (1) the N × n matrix composed by the selected eigenvectors, we obtain a PCA dimensional reduction represented by the M × n matrix Y (1) = XQ (1) . The first iteration implements a k-means clustering concerning the points coordinate y i in Y (1) , whose output is expressed by the M × k matrix H (1) , where k = n + 1 is an optimal value, with elements H l i = 1 if y i εC (1) l , l = 1, . . . , k, otherwise H l i = 0. An important role is played by clusters centroids in the original full dimension space, whose coordinates are expressed by µ (1) l | standing for the cardinality of patients' cluster C (1) l . The last step looks at the covariance matrix of clusters centroids, weighted by their cardinality, thus reading S (1) = µ (1) (H (1)T H (1) )µ (1)T , whose PCA leads to the new iteration by using Q (2) . We will follow the iteration cycle until the Frobenius norm ||Q (α) − Q (α+1) | | F < ε, with ε a small preset parameter equal to 0.001 in the Rdimtools package when it is not specified by the user, as in our case motivated by our use of scaled features. Figure A1. ADR embedding in the three-dimensional space spanned by relevant degrees of freedom y = y 1 , y 2 , y 3 , adopting the same color code of Figure 2.
The preferred embedding in the two-dimensional plane is caused by the new weight attributed just to the in situ component in the three-dimensional case shown in Figure A1. Indeed, it is possible to recognize the embedding presented in Figure 2 in the plane y = y 1 , y 2 , while patients accumulate approximately along two parallel planes in the third dimension because of the values assumed by the new feature listed in Table 1, where just a few cases are referred to G1, G2 and G3. Moreover, the in situ component does not influence the therapy choice, even if the variance explained is 61.43% compared with 45.24% obtained in the two-dimensional embedding, thus confirming its noise content for our aims.

Appendix B. Embedding Techniques Adopting Local Patches
A local structure is described by a neighborhood graph associated with each patient's position x i in the initial feature space, which links the node representing the latter to its m nearest neighbors patients or to those satisfying d x i , x j < ρ, with ρ an input radius. This topological information is encoded in an adjacency matrix A ij = 1 if for nodes i and j the presented proximity property holds true, otherwise A ij = 0. A complementary graph is built to take into account the information associated with far apart nodes according to the adjacency matrix such that W ij = 1 if A ij = 0 and viceversa.
The neighborhood graph structure becomes a weighted graph by assigning a gaussian weight W ij = e −d(x i ,x j ) 2 /σ 2 to each edge, as shown in the main text. The cost function is explicitly evaluated

Appendix C. Internal and External Validation
The unsupervised selection of an optimal number of clusters in our procedure is ruled by multiple methods for internal validation, which aims at the evaluation of separation and compactness of clusters in output. The aforementioned M(M − 1)/2 distances are partitioned according to cluster memberships established by means of cutting at a certain level the dendrogram yielded by UPGMA, such that k clusters are obtained. The total within sum of squares exploits the function: where (i, j) represents the edge joining patients y i and y j belonging to the same cluster. The optimal k is given by the first integer such that ∑ γ≥k |WSS(γ + 1) − WSS(γ)| < δ, with δ a sufficiently small preset parameter to obtain stability for a reasonably small value k.
Another method corresponds to silhouette width, measuring the dissimilarity of patients in different clusters. The similarity of the patient y i in the cluster C l with respect to remaining members is The range of this quantity is [-1,1], with the lower bound corresponding to high dissimilarity with patients within the same cluster, while the upper one expresses high similarity with remaining members [22]. We will consider the mean value evaluated over all patients to define the average silhouette width, whose maximization targets the selection of an optimal number of clusters.
The Dunn index measures the smallest separation between patients belonging to different clusters with respect to the largest cluster diameter [25] Dunn(k) = The optimal number of clusters is compared with the patients' partition defined by labels we wish to characterize in an unsupervised manner through the adjusted Rand index (ARI), representing an external validation method. The initial Rand index (RI) formulation of this quantity measures the overlap of clusters taking into account a set of objects O = {o 1 , . . . , o m } endowed with two partitions U = U 1 , . . . , U p and V = V 1 , . . . , V q . Then define: with 1 ≤ i, j ≤ m and 1 ≤ l, l 1 , l 2 ≤ p, 1 ≤ l , l 1 , l 2 ≤ q. Let's describe these sets in terms of binary classification by means of two class labels related to: objects o i , o j belong to the same subset in U and V or objects o i , o j belong to different subsets in U and V. In this way the Rand index is equivalent with the definition of classifiers accuracy: where m 2 is the number of combinations for m objects in pairs.
To define the adjusted version of RI, we introduce a contingency table, with elements n ij = U i ε V j , sums over rows a i = |U i | and sums over columns b j = V j . Under the null hypothesis of a random model with a fixed number and size of clusters and elements shuffling between the fixed clusters generate random clusterings, by converting the cardinalities in RI according to the contingency table elements, it is possible to show that equal to a normalized difference of the Rand Index and its expected value [24]. This metric has range [−1,1] with 1 corresponding to perfect mutual agreement, −1 corresponding to perfect disagreement and 0 that correspond to random agreement between the two partitions [22], so measuring the mutual agreement of labels pairs from two clusterings.