The Milan System, from Its Introduction to Its Current Adoption in the Diagnosis of Salivary Gland Cytology

Salivary gland masses are often encountered in the everyday practice of cytopathology. It is commonly known that the cytologic interpretation of these lesions can pose diagnostic problems due to overlapping cytomorphologic features. Fine needle aspiration (FNA) of salivary lesions shows good to excellent sensitivity and specificity in differentiating a neoplastic from a non-neoplastic process and in diagnosing common tumors such as pleomorphic adenoma. However, its value is limited in diagnosing specific neoplastic entities, especially those with well-differentiated morphology. In light of this gap, an international group of pathologists has proposed a management-oriented, tiered classification for reporting salivary gland FNA specimens, “The Milan System for Reporting Salivary Gland Cytopathology (MSRSGC)”. Similar to other classification systems, the MSRSGC scheme comprises six diagnostic categories, each linked with a specific risk of malignancy (ROM) and management. In this review article, the author evaluated the published literature on FNA in diagnosing salivary gland lesions with the adoption of the Milan system since its introduction in the daily practice of salivary cytopathology.


Introduction
Fine needle aspiration cytology (FNAC) is a very effective tool in the evaluation of salivary lesions. It is widely adopted to guide the management of salivary lumps, mostly because of its ability to distinguish between non-neoplastic and neoplastic entities, and between low-grade and high-grade malignancies, including metastatic processes [1][2][3][4][5]. Nevertheless, diagnostic challenges in salivary gland FNAC (SG-FNAC) are well documented, particularly for several neoplasms. FNAC can identify many benign tumors and can usually differentiate between low- and high-grade carcinomas [1][2][3][4][5][6][7][8][9][10]. Based upon the cytologic interpretation, all malignant tumors and several benign tumors are typically treated by surgical excision. In this regard, the correct diagnosis of malignant entities is a crucial element in determining the extent of surgery, including preservation of the facial nerve in the case of parotid tumors and indications for neck dissection. However, it is also important to underline that specific benign entities, such as certain pleomorphic adenomas and Warthin tumors, may be managed non-surgically with clinical follow-up and imaging, depending upon patient wishes and health status [1][2][3][4][5][6][7][8].
Several studies demonstrated that the rate of malignancy varies depending upon the size of the mass and the location. For example, the ROM increases from 20-25% in the parotid gland, to 40-50% in the submandibular gland, and to 50-81% in the sublingual and minor salivary glands [5][6][7][8]. Despite the advantages of SG-FNAC, several authors pointed to the wide range of sensitivity and specificity, which depends upon a variety of factors including FNA technique, cytologic preparation, experience, lesional heterogeneity, and cystic component. It has been demonstrated that SG-FNAC lacks specificity in precisely classifying a tumor as a specific subtype, which is also linked with the biphasic nature of several malignant salivary lesions [1][2][3][4][5][6][7][8]. On the other hand, the accuracy of SG-FNA is high for the diagnosis of the most common salivary gland tumors, such as pleomorphic adenoma (Figure 1A,B) and Warthin tumors. Several authors have confirmed that the accuracy is also high for distinguishing benign and low-grade neoplasms (Figure 2A,B) from high-grade carcinomas; however, the specificity of SG-FNA for sub-typing a particular neoplasm varies widely (48-94%) [1][2][3][4][5][6][7][8][9][10]. This is due to the cytological overlap of several of the less common salivary gland tumors.
In 2018, in order to minimize the issues linked with salivary FNAC, a task force composed of an international group of pathologists and cytopathologists was established under the umbrella of the American Society of Cytopathology (ASC) and the International Academy of Cytology (IAC) to put forth recommendations for a risk-based and management-oriented tiered classification scheme known as the Milan System for Reporting Salivary Gland Cytopathology (MSRSGC), an evidence-based system that correlates diagnostic categories with the risk of malignancy (ROM) and a clinical management algorithm [11]. The MSRSGC Atlas was published in March 2018. The atlas is a softcover book comprising nine chapters devoted to standardizing the reporting of salivary cytology results and to aiding clinicians in the management of patients [11].
In line with other classification systems [12], the salivary atlas includes an overview of diagnostic terminology and reporting, and it comprises six diagnostic categories: non-diagnostic, non-neoplastic, atypia of undetermined significance (AUS), neoplasm (either benign or salivary gland neoplasm of uncertain malignant potential (SUMP)), suspicious for malignancy (SFM), and malignant [11]. The objective of these categories is to create a uniform system that can assist communication between cytopathologists and clinicians, cyto-histological correlation of cases, data evaluation, and comparison among institutions.
Since its introduction, several papers have been published discussing the usefulness of adopting a classification system. The current paper discusses the data published in the literature since the release of the Milan System Atlas, which also represent the basis for the upcoming second edition of the Milan system, expected in the fall of 2022.
Over the past several years, the International Molecular Cytopathology Meeting, held annually in Naples, Italy, has focused on providing perspective on the evolution of molecular pathology in the field of cytopathology across the world, with updates on cytological and molecular classifications and novel research. The current review summarizes a talk delivered during the 9th International virtual Molecular Cytopathology Meeting (Naples, Italy, December 2020), focusing on the results from the adoption of the MSRSGC in the practice of salivary cytology. For the readers' convenience, and following the approach taken during the 2020 virtual meeting, the analyzed papers are grouped by year of publication from 2018 to 2020.

The First Year of Publications Using the Milan System-2018
During 2018, several papers were published dealing with the role and value of the MSRSGC [11][17][18][19][38][39][40][41]. Kurtycz et al. published the results of a web-based interobserver baseline study designed to identify cytomorphologic features and cytologic reporting categories that represent sources of poor interobserver agreement [39]. The study included 627 participants, who evaluated 75 web images chosen from the MSRSGC image set prior to the release of the Milan Atlas. The results showed that 42% of respondents agreed with the reference interpretations of salivary gland lesions. Of note, the best agreement was seen among cytopathology-certified pathologists. Across the different categories, the best agreement was found for the neoplasm-benign (58.9%) and non-diagnostic (49.2%) categories, followed by malignant (48.4%). In contrast, agreement for the categories of uncertain malignant potential (SUMP) and suspicious for malignancy was only 23.6% and 22.7%, respectively, confirming the difficulties posed by the indeterminate categories [39].
Among the 2018 papers, Wei et al. reported a complete review of the literature from 1987 to 2015 in order to identify and study publications that categorized salivary entities and included subsequent surgical management [40]. They included 4514 cases from 29 studies, demonstrating that the ROM for each diagnostic category was in agreement with that of the MSRSGC and encouraging the adoption of a uniform classification system as a helpful tool in guiding clinical and/or surgical management of salivary lesions.
Among the other interesting data, the authors highlighted that only one study included eight cases classified as "atypical", which were correctly re-classified in the "atypia of undetermined significance" category of the MSRSGC, even though a ROM was not defined due to the scant number of cases.
Farahani et al. retrospectively assessed the effectiveness of the Milan system in a meta-analytic paper [18]. The analysis included 92 studies with a total of 16,456 FNAs with surgical follow-up. The authors demonstrated that SG-FNAC was able to render a definitive diagnosis in 94% of the cases, documenting a high specificity of salivary cytology (98%). The mean prevalence of malignancy was 22%, rising to as much as 91% in cases diagnosed as "suspicious for malignancy" and falling significantly (to around 5%) in cases diagnosed as non-neoplastic [18]. Nonetheless, the data also underlined that the results were highly heterogeneous among the analyzed studies, with differences mostly identified in the population characteristics, the setting of the studies, and the technical aspects of specimens. Of note, the study confirmed the role of rapid on-site evaluation (ROSE) in improving the sensitivity and specificity of SG-FNAC, with a significant reduction of 50% in non-diagnostic yields, especially when a less experienced pathologist or clinician performs the FNAC [18].
The authors also mentioned the increased sensitivity and specificity of SG-FNAC in diagnosing neoplasms when ancillary techniques are performed in selected cases [18], as documented, for instance, by the high specificity of genetic alterations in some salivary lesions (e.g., secretory carcinoma). Although the study had some limitations commonly seen in retrospective papers, i.e., the bias linked with each included paper and the high level of heterogeneity across those studies, the meta-analysis confirmed that SG-FNAC is a useful tool in the diagnosis of salivary lesions and that the adoption of a standardized reporting system is likely to improve communication and intra- and inter-institutional data collection [18].
Viswanathan et al. analyzed a series of 627 SG-FNACs with follow-up available in 373 cases [41]. Two of the authors reclassified the original diagnoses according to the MSRSGC and defined the ROM and risk of neoplasm (RON) for each diagnostic category. The results documented a near-perfect diagnostic agreement between the two cytopathologists, with a discrepancy in only one case, which was resolved with a multidisciplinary approach. The sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV) were 79%, 98%, 94%, and 92%, respectively, confirming that the results were in keeping with other studies and that the MSRSGC is a useful instrument for the evaluation of salivary lesions.
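As a minimal sketch of how performance figures of this kind are usually derived, the Python example below computes sensitivity, specificity, PPV, and NPV from a 2x2 table of cytology results against histologic follow-up. The counts and the names `ConfusionCounts` and `diagnostic_metrics` are hypothetical and chosen only for illustration; they are not the data or the method of Viswanathan et al. [41], and how indeterminate categories are assigned to "positive" or "negative" is left to the analyst.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Cytology call versus histology, with malignancy as the positive class."""
    tp: int  # cytology positive, histology malignant
    fp: int  # cytology positive, histology benign
    tn: int  # cytology negative, histology benign
    fn: int  # cytology negative, histology malignant


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the standard test-performance measures as fractions in [0, 1]."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
    }


# Hypothetical counts, not taken from any study cited in this review.
counts = ConfusionCounts(tp=40, fp=5, tn=95, fn=10)
metrics = diagnostic_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because PPV and NPV depend on the prevalence of malignancy in the series, these two values are not directly comparable between institutions with different case mixes, whereas sensitivity and specificity are less affected.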

The Second Year of Publications Using the Milan System-2019
In 2019, several additional papers were published dealing with the results obtained from the application of the Milan system [15][16][17][24][40][41][42][43][44]. Among them, Song et al. published their series of 893 SG-FNACs, which were retrospectively classified according to the MSRSGC [42]. Histological follow-up was available in 48% of the cases, and more than 60% of these had a benign histological diagnosis. The application of the new classification system documented that the ROM for each category was in line with that reported in the MSRSGC and that the MSRSGC is likely to reduce the number of diagnostic pitfalls and to correctly stratify salivary lesions [42].
In the study by Layfield et al., 408 cases collected over a 5-year period were reviewed and classified by three independent observers [43]. The aim was to evaluate inter-observer reproducibility using Cohen's kappa statistic. According to the authors, the best agreement was documented for the categories of neoplasm-benign and malignant lesions, whilst the lowest agreement was seen in the indeterminate categories, confirming the issues related to those lesions [43].
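For readers less familiar with the statistic, the sketch below shows how Cohen's kappa corrects the observed agreement between two raters for the agreement expected by chance. The function name `cohens_kappa` and the six example calls are hypothetical. Cohen's kappa is defined for two raters, so with three observers one option would be to compute it for each pair; whether the cited study did so is not stated here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category to each case."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases given the same category by both.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[k] * freq_b[k] for k in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category for every case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical category calls for six cases, for illustration only.
observer_1 = ["benign", "benign", "malignant", "SUMP", "malignant", "benign"]
observer_2 = ["benign", "benign", "malignant", "benign", "malignant", "SUMP"]
kappa = cohens_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the raters agree on four of six cases (67%), but because both use "benign" often, a good part of that agreement is expected by chance and kappa is correspondingly lower than the raw agreement.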
A French study by Dubucs et al. included the retrospective evaluation of 328 SG-FNACs classified according to the MSRSGC [26]. The data showed that benign neoplasms represented the most common diagnosis (44.2%), whilst malignant and suspicious for malignancy accounted for 11.3% and 4.9% of diagnosed cases, respectively. The majority of cases (65.8%) had a surgical follow-up, which confirmed all lesions diagnosed as malignant and 68.8% of those diagnosed as suspicious for malignancy. Moreover, the paper suggested that immunocytochemistry (ICC) added informative results in 72.3% of the cases, contributing to a definitive diagnosis in 23.7% of cases [26].
Savant et al. reviewed 199 cases over a 5-year period using the MSRSGC [44]. The distribution of cases in the different diagnostic categories confirmed that the ROM was mostly concordant with that reported in the Atlas for each category. The authors concluded that the Milan classification is helpful for the management of salivary gland neoplasms, while noting that salivary gland neoplasms of uncertain malignant potential may represent a controversial category, mostly managed by personalized and tailored approaches [44].
Chen et al. studied a series of 1020 SG-FNA specimens collected between 2008 and 2017 [45]. The series, including 349 cases with histologic follow-up, was reclassified according to the MSRSGC diagnostic categories. Their results confirmed the reproducibility of the MSRSGC, supporting its use in clinical practice to select the most adequate management strategy for salivary gland lesions [45].
Jalaly et al. reviewed all published papers available online [48]. Specifically, thirty-seven articles in English met the criteria, and the entire series included a total of 16,394 cases, 8468 of which had surgical follow-up. The authors calculated the ROM and RON for each diagnostic category as well as the false-negative and false-positive rates. Their results were in agreement with the diagnostic categories of the MSRSGC, showing that the ROM for the indeterminate categories was the most affected by the heterogeneity among studies. In fact, across different studies, the ROM for the AUS category ranged from 61% in Wang et al. [47] to 27.6% in Maleki et al. [46]. Their detailed analysis underlined that salivary cytology in general, and the adoption of MSRSGC reporting in particular, has significantly increased diagnostic accuracy and the tailored management of salivary lumps.
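To make the two quantities concrete, the following sketch computes the ROM and RON per diagnostic category from cases with histologic follow-up: the ROM is the fraction of resected cases in a category that proved malignant, and the RON is the fraction that proved to be any neoplasm, benign or malignant. The function `rom_ron_by_category`, the label constants, and the nine example cases are hypothetical and are not drawn from Jalaly et al. [48] or any other cited series.

```python
from collections import defaultdict

# Hypothetical labels for the histologic outcome of a resected lesion.
MALIGNANT = "malignant"
BENIGN_NEOPLASM = "benign_neoplasm"
NON_NEOPLASTIC = "non_neoplastic"


def rom_ron_by_category(cases):
    """cases: iterable of (cytology_category, histology_outcome) pairs,
    restricted to cases with histologic follow-up."""
    totals = defaultdict(lambda: {"n": 0, "malignant": 0, "neoplasm": 0})
    for category, histology in cases:
        tally = totals[category]
        tally["n"] += 1
        if histology == MALIGNANT:
            # A malignant tumor counts toward both ROM and RON.
            tally["malignant"] += 1
            tally["neoplasm"] += 1
        elif histology == BENIGN_NEOPLASM:
            tally["neoplasm"] += 1
    return {
        category: {
            "n": t["n"],
            "rom": t["malignant"] / t["n"],
            "ron": t["neoplasm"] / t["n"],
        }
        for category, t in totals.items()
    }


# Hypothetical follow-up data for two indeterminate categories.
follow_up = [
    ("AUS", NON_NEOPLASTIC), ("AUS", BENIGN_NEOPLASM),
    ("AUS", MALIGNANT), ("AUS", NON_NEOPLASTIC),
    ("SUMP", BENIGN_NEOPLASM), ("SUMP", MALIGNANT),
    ("SUMP", BENIGN_NEOPLASM), ("SUMP", BENIGN_NEOPLASM),
    ("SUMP", MALIGNANT),
]
risks = rom_ron_by_category(follow_up)
for category, r in risks.items():
    print(f"{category}: n={r['n']}, ROM={r['rom']:.0%}, RON={r['ron']:.0%}")
```

Since only resected cases enter the denominator, a ROM computed this way reflects the surgical subset of each category, which is one plausible source of the between-study heterogeneity noted above for the indeterminate categories.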
A series of 208 SG-FNAC cases, performed over a 6-year period, was re-classified according to the Milan system by Rivera-Rolon et al. [49]. The overall concordance rate between SG-FNAC and histology was 78.8%. Furthermore, as in other studies, the authors reported 93.3% sensitivity, 94.6% specificity, 82.4% positive predictive value, and 98.2% negative predictive value. Cytology confirmed its relevant role, with a 94.4% diagnostic accuracy in discriminating benign from malignant neoplasms. Moreover, their results confirmed that the ROM for each category was in line with the ROM given by the MSRSGC for that category [49].
These results were also confirmed worldwide in other series; for instance, a study by Gaikwad et al. from western India pointed out that the Milan system is a useful tool for the correct management of salivary lesions [50].
Kaushik et al. reported a retrospective cross-sectional observational study of 323 cases. SG-FNAC was correlated with the surgical follow-up (153 cases) and then categorized according to the Milan system [27]. The data were correlated with the RON and ROM. The concordance rate of type-specific diagnosis was 80.3% with conventional cytological diagnosis, whilst with the application of the Milan system, the concordance rate rose to 88.07%, with an improvement of 6.67% (excluding non-diagnostic).

Conclusions
Since its introduction, the MSRSGC has demonstrated its validity and usefulness in salivary cytology. The review of the literature from recent years has confirmed the value of a uniform reporting system. Although it is complicated to compare different studies, mostly due to their heterogeneity across several parameters, the analysis of series with a similar number of cases has demonstrated that the results for the diagnostic categories are likely to be very similar and to overlap with those recommended by the MSRSGC.
Nonetheless, the most significant difference concerns the indeterminate categories of AUS and SUMP, for which lower ROMs were reported in the different series.
The data from the literature confirmed that the diagnosis of indeterminate proliferations is a critical one, reflecting the difficulties in the daily practice of salivary cytology. Although the SUMP category recognizes neoplastic lesions, which may be subdivided into morphological subgroups (namely basaloid, oncocytic, and clear cell patterns), its major limit is the impossibility of discriminating between benign and malignant lesions. This flaw is intrinsically present in all the different classification systems, and all the indeterminate proliferations are due to the impossibility of evaluating any vascular and/or capsular invasion on cytology. Nevertheless, some of the doubts and limits that emerged using the MSRSGC will be the object of discussion and revision in the upcoming second edition of the MSRSGC, which is expected in the fall of 2022.
Funding: This research did not receive any specific grant from any funding agency in the public, commercial, or non-profit sectors.