Comparing World Health Organization / International Society of Urological Pathology Grading and Fuhrman Grading with the Prognostic Value of Nuclear Area in Patients with Renal Cell Carcinoma

Abstract: This study was undertaken to compare Fuhrman grading with World Health Organization / International Society of Urological Pathology (WHO / ISUP) grading and stereologically measured nuclear area in patients with Clear Cell Renal Cell Carcinoma (ccRCC) or Papillary Renal Cell Carcinoma (PRCC) and to evaluate the independent predictive value of Fuhrman, WHO / ISUP and stereologically measured nuclear area combined with necrosis in a series of patients with ccRCC in relation to cancer-specific survival. In all, 124 cases of ccRCC and PRCC were included. All slides were blindly scored by two trained pathologists according to the Fuhrman and WHO / ISUP grading systems. Nuclear measurements were performed on digitally scanned slides in Visiopharm® and correlated to survival. Analysis of ccRCC and PRCC cases showed that application of WHO / ISUP grading resulted in a significant downgrading of cases from G2 to G1 when compared with Fuhrman grading. None of these patients experienced progression. Cancer-specific survival estimates in 101 ccRCC patients showed that WHO / ISUP grading was slightly superior in predicting cancer-specific survival. Novel models included WHO / ISUP grading and mean nuclear area (MNA), each combined with necrosis. Both demonstrated an increased ability to predict cancer-specific survival. The study demonstrates that WHO / ISUP grading provides superior prognostic information compared to Fuhrman grading and stereologically measured nuclear area. Necrosis in combination with either WHO / ISUP grading or MNA adds prognostic information.


Introduction
Renal cell carcinoma (RCC) is a neoplasm with widely varying prognosis, from an aggressive neoplasm, with metastasis at presentation, to a slowly growing neoplasm that can be observed safely for years [1]. The overall 5-year progression-free survival rate is 70% and the cancer-specific mortality rate is 24% [2]. Numerous different prognostic markers have been investigated. However, only morphological features such as tumor size, vascular invasion, necrosis, stage and grade are routinely utilized in an effort to predict outcome [3,4].
A variety of grading systems that focus on nuclear morphology have been proposed. Of these, that of Fuhrman et al. [5], published in 1982, has achieved widespread use throughout the world in clinical routine pathology. It is a 4-tiered grading system, which is based primarily on the simultaneous assessment of nucleolar prominence, nuclear size, and nuclear irregularity. The first three grades are defined on nuclear features and the fourth grade is defined by the presence of nuclear pleomorphism (Table 1). Despite widespread usage of the Fuhrman grading system, it has become apparent that the system has a number of inherent problems, in particular those related to poor reproducibility [6,7]. At the ISUP consensus conference, a novel grading system was proposed, based on nucleolar prominence [8] (Table 1). The ISUP grading system was later endorsed by the World Health Organization (WHO) and renamed the WHO/ISUP grading system [9], with few modifications, one being that the staining quality of the nucleolus should also be considered. The WHO/ISUP grading system should be applied to Clear Cell RCC (ccRCC) and Papillary RCC (PRCC). However, Chromophobe RCC (ChRCC) should not be graded, since neither Fuhrman nor WHO/ISUP is appropriate for grading of this tumor subtype [10]. The WHO/ISUP grading system has achieved widespread usage and has now replaced the Fuhrman grading system worldwide [11]. Tumor necrosis is another factor that has shown prognostic significance in several studies [4,12,13]. It occurs frequently in RCC and appears to be dependent on the histological subtype, with the highest occurrence in PRCC (32%-40%) and ccRCC (27%-32%) [4,14,15]. Delahunt et al. [4] recently proposed a modification of the current WHO/ISUP grading system incorporating tumor necrosis (Table 1). In that study, a significant difference in survival between each grade for ccRCC was demonstrated, in addition to a superior concordance index compared to ISUP grading.
The ISUP Vancouver Consensus Conference on Renal Cell Carcinoma recommended to routinely include the presence or absence of tumor necrosis [8]. However, necrosis has not yet been implemented in any of the grading systems.
The ability to study nuclear morphometry quantitatively is made possible by advances in computer imaging technology. Issues with lack of reproducibility, different grading systems and the subjectivity inherent in histological grading systems might be avoided by using a more reproducible method to assess nuclear features and thereby predict prognosis [16,17].
Hence, it is important to acknowledge the necessity of validation of the novel grading systems in different populations, and to the best of our knowledge, only a few validation studies have been performed until now [18,19]. Furthermore, with the introduction of digital pathology in many countries, it seems relevant to investigate how stereologically assessed nuclear morphometry correlates to the different grading systems. The objectives of our study were twofold: 1) to assess interobserver reliability and agreement using the Fuhrman nuclear grading system and the WHO/ISUP grading system for ccRCC and PRCC and to correlate gradings with nuclear morphometry; 2) to evaluate the independent predictive value of Fuhrman, WHO/ISUP and stereologically measured nuclear area in relation to cancer-specific survival in patients with ccRCC and to validate novel proposed models for grading incorporating tumor necrosis.

Inclusion of Patients
Patients nephrectomized at our institution between 2001 and 2012, who gave written informed consent and were diagnosed with PRCC or ccRCC, were included in the study. None of the patients received neo-adjuvant therapy. Files of all patients were reviewed and data regarding pathological parameters, sex, age at diagnosis and data regarding follow-up and death were obtained retrospectively. Date and cause of death were obtained from the Cause of Death Register, Denmark. The

Evaluation of Pathological Parameters
Paraffin-embedded tumors were sectioned and stained with hematoxylin-eosin (HE). Two pathologists, blinded to clinical data, independently reviewed all tumor slides with regard to the assessment of Fuhrman grade, WHO/ISUP grade, microscopic necrosis and subtype. Grading followed the criteria listed in Table 1. Necrosis was reported when well-demarcated foci of necrosis within the tumor were observed.

Stereological Assessment of Nuclear Area
All slides were scanned for evaluation using a digital slide scanner, NanoZoomer 2.0-HT (Hamamatsu, Japan). Visiopharm newCAST Whole Slide Stereology software (Visiopharm, Hørsholm, Denmark) was used for calculation of nuclear area. Tumor areas were manually drawn as regions of interest (ROI) and sample images from these were collected randomly using meander fraction-based sampling at 20 times magnification. In these images, nuclear area was calculated using the nucleator function (Figure S1).

Statistics
Mean nuclear area (MNA) in all sampled nuclei and mean nuclear area in the 10 largest measured nuclei (MNA-10) were calculated for each patient together with standard deviation and the number of measured nuclei in each sample. Comparisons of nuclear area across patients and pathological characteristics were performed using Student's t-test or one-way ANOVA followed by Bonferroni's multiple comparisons test. Correlation analysis was performed with Spearman's rank correlation.
Cancer-specific survival was calculated from the date of diagnosis by imaging to the date of death from RCC or to last follow-up contact. Patients alive at the end of the follow-up, who did not experience progression during the study period, were censored at the date of last follow-up.
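The per-patient summary described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study (which was run in STATA); the function name `nuclear_area_summary`, the dictionary keys and the example values are assumptions made for illustration only.

```python
from statistics import mean, stdev


def nuclear_area_summary(areas, top_n=10):
    """Summarise the nuclear areas (in µm²) measured for one patient.

    Returns the number of nuclei, the mean nuclear area (MNA), its
    standard deviation, and the mean of the `top_n` largest nuclei
    (MNA-10 when top_n is 10). Hypothetical helper, not from the paper.
    """
    if len(areas) < 2:
        raise ValueError("at least two measured nuclei are required")
    largest = sorted(areas, reverse=True)[:top_n]
    return {
        "n": len(areas),
        "mna": mean(areas),
        "sd": stdev(areas),
        "mna_10": mean(largest),
    }


# Illustrative values only: 20 nuclei with areas 1, 2, ..., 20 µm².
summary = nuclear_area_summary(list(range(1, 21)))
print(summary["n"], summary["mna"], summary["mna_10"])
```

If fewer than `top_n` nuclei were measured, the sketch simply averages all of them, so MNA-10 then equals MNA.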
A receiver-operating-characteristic (ROC) curve was generated for MNA and the optimal cutoff point was selected according to the point of the ROC curve closest to the top-left corner of the ROC plot. Cancer-specific survival was estimated using the Kaplan-Meier method and differences in survival among groups were calculated using log rank tests.
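The cutoff rule described above (the ROC point closest to the top-left corner) can be illustrated as follows. This is a simplified sketch that treats death from RCC as a binary label and ignores censoring; the function name `closest_to_top_left` and the toy data are assumptions for illustration and do not reproduce the study data.

```python
import math


def closest_to_top_left(values, events):
    """Pick the cutoff whose ROC point lies nearest to (FPR=0, TPR=1).

    The rule evaluated is "value >= cutoff predicts the event".
    Returns (cutoff, sensitivity, specificity). Hypothetical helper.
    """
    n_pos = sum(1 for e in events if e)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes must be present")
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if e and v >= cutoff)
        tn = sum(1 for v, e in zip(values, events) if not e and v < cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        # Euclidean distance from the ROC point (1 - spec, sens) to (0, 1).
        dist = math.hypot(1 - sens, 1 - spec)
        if best is None or dist < best[0]:
            best = (dist, cutoff, sens, spec)
    return best[1:]


# Toy MNA values (µm²) and outcomes (1 = died from RCC), illustrative only.
mna = [20, 25, 30, 40, 45, 50]
died = [0, 0, 0, 1, 1, 1]
print(closest_to_top_left(mna, died))
```

Only observed values are tried as candidate cutoffs, and ties in distance are resolved in favor of the lowest cutoff; other conventions are possible.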
Two novel grading systems were evaluated, one based on the 4-tiered grading classification proposed by Delahunt et al. [4], incorporating tumor necrosis in the existing WHO/ISUP grading system, the other based on dichotomization of MNA incorporating tumor necrosis.
The ability of the prognostic models to predict death from RCC was evaluated by the area under a ROC curve and the c-index (Harrell's C) [20,21].
The κ statistic, a measurement of reliability between observers that corrects for chance agreement, was used to evaluate the interobserver reproducibility in grading of ccRCC and PRCC between two pathologists. The maximum value for κ is 1.00, which indicates perfect agreement, and 0 indicates the level of agreement expected by chance alone. Negative values indicate less than chance agreement. Agreement measures for categorical data according to Landis et al. [22] are as follows: Slight, 0.00-0.20; Fair, 0.21-0.40; Moderate, 0.41-0.60; Substantial, 0.61-0.80 and Almost Perfect, 0.81-1.00. Absolute agreement was assessed with proportions of agreement [23]. Two-sided p-values < 0.05 were considered significant. All analyses were done with STATA/SE 16.0 (StataCorp, College Station, TX, USA).
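For readers unfamiliar with the c-index, a minimal sketch of Harrell's C for right-censored data is given below. It is an illustration under simplifying assumptions (pairs with tied survival times are skipped, and a higher score is taken to mean higher risk); the function name `harrell_c` and the toy data are hypothetical and are not taken from the study.

```python
def harrell_c(times, events, scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had the event and a
    strictly shorter time than patient j. The pair is concordant when
    patient i also has the higher risk score; tied scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up in months, event flag (1 = death from RCC), grade.
months = [5, 10, 15, 20]
died = [1, 1, 0, 0]
grade = [4, 3, 2, 1]
print(harrell_c(months, died, grade))  # 1.0: higher grade always fails first
```

A value of 0.5 corresponds to a model with no discriminative ability and 1.0 to perfect ranking, which is the scale on which the c-indexes in Table 4 are reported.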
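As an illustration of the interobserver statistic, the following sketch computes an unweighted Cohen's κ for two raters and maps it onto the categories of Landis et al. listed above. The function names `cohen_kappa` and `landis_koch`, the label used for negative values, and the example ratings are assumptions for illustration; the study itself used STATA.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Map kappa onto the agreement categories quoted in the text."""
    if kappa < 0:
        return "Less than chance"  # label chosen here, not from the paper
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")]:
        if kappa <= upper:
            return label
    return "Almost Perfect"


# Hypothetical grades assigned by two pathologists to eight tumors.
path_1 = [1, 2, 2, 3, 3, 4, 2, 3]
path_2 = [1, 2, 3, 3, 2, 4, 2, 3]
k = cohen_kappa(path_1, path_2)
print(round(k, 2), landis_koch(k))
```

Applied to the values reported in this study, `landis_koch(0.34)` returns "Fair" and `landis_koch(0.48)` returns "Moderate".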

Results
The study comprised 124 patients with RCC, with either the papillary subtype 1 (n = 14), papillary subtype 2 (n = 9) or the clear cell subtype (n = 101). Table 2 presents clinicopathological data for all 124 patients included in the study and summarizes the statistical analysis for nuclear morphometry. The median follow-up was 40.6 months (range 0.9 to 136.3), during which a total of 52 (42%) patients died and 33 (27%) patients died from RCC. Forty patients (32%) experienced recurrence within follow-up with a median time to recurrence of 9.4 months. Of the 101 ccRCC patients, 35 patients (34.6%) experienced recurrence with a median time to recurrence of 9.4 months.
Mean nuclear area (MNA) and mean nuclear area in the 10 largest nuclei (MNA-10) are listed in Table 2. The mean number of measured nuclei per tumor was 133 (range 13-287). Microscopic necrosis correlated with MNA (p < 0.0001) and MNA-10 (p < 0.0001), with a higher MNA and MNA-10 in patients with microscopic tumor necrosis. Neither pT stage, subtype, nor sex correlated with MNA or MNA-10.
All patients were assigned a Fuhrman grade and a WHO/ISUP grade by two pathologists, blinded to clinical data. The detailed relationship between WHO/ISUP grading and Fuhrman grading is shown in Table 3 and Figure 1A. MNA and MNA-10 both correlated with Fuhrman grade and WHO/ISUP grade, with a proportional increase in MNA and MNA-10 with higher grades of both Fuhrman and WHO/ISUP (Figure 1B,C). Correlation analysis revealed a significant correlation between MNA and Fuhrman as well as MNA and WHO/ISUP grade (p < 0.0001, r = 0.53 and p < 0.0001, r = 0.57, respectively). Comparison of WHO/ISUP and Fuhrman grades demonstrated a significant downgrading upon WHO/ISUP grading (p < 0.0001). In particular, only five patients were given the grade G1 (4%) according to Fuhrman grading, whereas 26 patients were graded G1 according to WHO/ISUP (21%). No significant difference in MNA was seen between the grades of Fuhrman and WHO/ISUP (Figure 1D).
Interobserver κ-value for Fuhrman was 0.34, SE = 0.059 (Fair) and for WHO/ISUP 0.48, SE = 0.055 (Moderate). Interobserver κ-value for microscopic necrosis was 0.60, SE = 0.09 (Moderate). The proportion of agreement for Fuhrman grade and WHO/ISUP grade was 72.6% (95% CI: 63.8%-80.2%). Figure 2A depicts the receiver operating characteristic (ROC) curve characterizing the ability of MNA to predict death from RCC (Cancer Specific Survival), which was used to generate the optimal cut-point, as shown by the dashed vertical line in Figure 2B. The cut-off value of 35.75 µm², as determined by the ROC curve, had a sensitivity of 93.9% and a specificity of 47.3% in predicting death from RCC.
The division of MNA into a two-tiered grading system could significantly separate patients with good or poor prognosis (Figure 5). However, in the good prognosis group, five patients experienced disease progression.
As demonstrated by the c-indexes, the WHO/ISUP grading system had greater predictive ability than the Fuhrman grading system and the stereologically measured MNA (c-indexes of 0.74 versus 0.68 and 0.70, respectively), Table 4. Proposed novel grading models, incorporating necrosis with either MNA or WHO/ISUP, resulted in slightly greater predictive ability (c-indexes of 0.76 and 0.75, respectively).

Discussion
In this study, we investigated the prognostic significance of the Fuhrman grading system, the WHO/ISUP grading system, and the correlation of nuclear morphometry to clinical outcome together with two novel, modified grading systems in ccRCC. We demonstrated that the WHO/ISUP grading system is superior in predicting cancer-specific survival. Modified groups, combining either WHO/ISUP or MNA with necrosis, were only slightly superior to WHO/ISUP grading alone. Interobserver reliability calculated with kappa statistics was moderate for the WHO/ISUP grading system and fair for the Fuhrman grading system. The Fuhrman grading system was published in 1982 and the underlying study has since been criticized for having major limitations, such as a small number of patients, limited follow-up time, and no distinction between the different morphological subtypes of RCC. Yet, the Fuhrman grading system has achieved great popularity and is still used by many pathologists today [7]. Over the years, other issues have arisen when validation has been pursued, including poorly defined criteria for nuclear pleomorphism and difficulties in assessing nuclear diameter objectively. There is no recommendation on the relative importance of each of the parameters (nuclear diameter, nuclear shape, and nucleolar prominence) and no guidance on how to weigh them against each other when contradictory results are obtained. Furthermore, lack of reproducibility between studies, with significant variation in the reported distribution of the Fuhrman grades and in their association with outcome, is another important issue. Some studies suggest limited prognostic significance unless grades are combined for statistical analysis [24,25]. This has led pathologists to attempt to grade only on the basis of nucleolar prominence, which does not conform to the grading criteria of the Fuhrman system [6]. The WHO/ISUP system has now, to some extent, replaced the Fuhrman grading system.
Only a few studies have validated the novel WHO/ISUP system in comparison to Fuhrman grading [13,18,19,26]. These studies demonstrated a superior predictive ability of the WHO/ISUP compared to the Fuhrman grading system.
In our series, we showed that grade 1 tumors according to both the WHO/ISUP grading system and the Fuhrman grading system were associated with an excellent prognosis, with no cases showing cancer progression. There was a significant separation in outcome between grades 1 and 3 according to both grading systems. However, grade 2 showed overlap of survival curves and we could not demonstrate a clear separation by WHO/ISUP grading. This could indeed be due to a smaller case number and not reflect a true problem of the grading system, since other larger studies did not report this [18]. Dagher et al. demonstrated that grading according to the WHO/ISUP resulted in a relative downgrading of cases as compared with Fuhrman grading. This was explained by the criteria of the WHO/ISUP, which bases the first three grades on nucleolar features. We could demonstrate a similar downgrading of cases when applying the WHO/ISUP grading system. Since WHO/ISUP grading demonstrated a significant downgrading of cases from G2 to G1, none of which experienced recurrence or metastatic disease, it seems that this grading system is slightly better at separating the group with excellent prognosis from the intermediate group.
In the present study, Fuhrman nuclear grading and WHO/ISUP grading were carried out by two observers. There was a fair agreement between them with a kappa value of 0.34 for Fuhrman nuclear grading and a moderate agreement with a kappa value of 0.48 for WHO/ISUP grading. Clearly, there is a subjectivity in nuclear grading, using either the Fuhrman grading system or the WHO/ISUP grading system, that might be avoided when replaced by quantitative morphometric approaches which evaluate nuclear features. There has been significant research investigating the usefulness of nuclear morphometry to provide information regarding prognosis in patients with RCC [16,27]. MNA has been considered to be one of the most valuable prognostic factors among several morphometric parameters. In the literature, the proposed cut point for dichotomization of patients with good and poor prognosis ranges from 32 to 39 µm², which could reflect a true difference, but could also be explained by differences in fixation times or preparation methods. Nevertheless, the identified cut point of MNA in our study is within this range. An isolated assessment of only one quantitative feature, such as nuclear area, may not suffice to describe nuclear abnormalities and the combination of several features may be required in order to give an accurate prediction of prognosis, as concluded by Montironi et al. [28]. Other suggested morphometric parameters are, among others, nuclear area index (NAI), nuclear perimeter, nuclear roundness factor, major and minor diameter, and nuclear form factor [17,27,29]. Stereological measurements of nuclear size are time- and labor-consuming when they, as in this case, are performed by applying the Nucleator function in Visiopharm. Whether the worldwide introduction of digital pathology will make such quantitation quicker and easier cannot yet be established.
Our work has shown that both Fuhrman and WHO/ISUP grades increase with increasing mean nuclear area (MNA). The objective, quantitative, and reproducible measurement of nuclear morphometry might be useful as a supplement to the histopathological grading, but the WHO/ISUP grading is significantly easier to apply, less time-consuming, and provides a reasonable reliability.
Another important issue is the prognostic significance of tumor-related necrosis, which has also been emphasized in many other studies [4,13,30]. However, there seems to be conflicting terminology to describe necrosis. Delahunt et al. describe RCC tumor necrosis as either thrombo-embolic infarction, resulting in tumor coagulative necrosis, or as a specific form of necrosis, tumor-related necrosis [31]. Dagher et al. define tumor-associated necrosis as well-demarcated foci of necrosis within the tumor [12]. The ISUP Vancouver Consensus Conference on Renal Cell Carcinoma recommended that the presence or absence of macroscopic and microscopic necrosis should be routinely reported in pathology reports [8]. However, since tumors may be associated with two separate forms of necrosis and thereby two different pathogenic pathways leading to necrosis, confusion relating to how to report necrosis could arise. Unfortunately, neither of the two reference groups recommends a methodology for interpreting the prognostic significance of these two types of necrosis [8,9]. In this study, microscopic necrosis was reported in accordance with Dagher et al. [12]. Foci of hemorrhage, fibrosis, or hyalinization should not be counted as tumor necrosis. The prognostic significance of tumor necrosis applies only to ccRCC and has not been demonstrated for papillary RCC [9].
It is important to acknowledge the limitations of this study. First, the results are based on a single-center retrospective study and must be verified in larger, prospective multi-center studies. Second, only the mean nuclear area was reported, and other nuclear features could be relevant to investigate. However, the strengths of our study include separation of renal cell carcinomas into subtypes, subjecting only ccRCC to prognostic evaluation, and inclusion of necrosis as a prognostic factor.
In conclusion, our study did not demonstrate a clear separation of cancer-free survival curves between the four groups of either the WHO/ISUP system or Fuhrman grading. The WHO/ISUP grading system was slightly superior to the Fuhrman grading system and mean nuclear area in predicting cancer-specific survival. Furthermore, a downgrading of cases from G2 to G1 according to WHO/ISUP in comparison with Fuhrman grading resulted in a separation of patients with an excellent prognosis. Combining necrosis with either the WHO/ISUP grading system or MNA enhanced the predictive ability.
Supplementary Materials: The following are available online at http://www.mdpi.com/2673-4397/1/1/2/s1, Figure S1: Illustration of the Nucleator function in Visiopharm, measuring the area of the nucleus in ccRCC.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to restrictions of privacy.