Artificial Intelligence in Renal Cell Carcinoma Histopathology: Current Applications and Future Perspectives

Renal cell carcinoma (RCC) is characterized by its diverse histopathological features, which pose possible challenges to accurate diagnosis and prognosis. A comprehensive literature review was conducted to explore recent advancements in the field of artificial intelligence (AI) in RCC pathology. The aim of this paper is to assess whether these advancements hold promise in improving the precision, efficiency, and objectivity of histopathological analysis for RCC, while also reducing costs and interobserver variability and potentially alleviating the labor and time burden experienced by pathologists. The reviewed AI-powered approaches demonstrate effective identification and classification abilities regarding several histopathological features associated with RCC, facilitating accurate diagnosis, grading, and prognosis prediction and enabling precise and reliable assessments. Nevertheless, implementing AI in renal cell carcinoma generates challenges concerning standardization, generalizability, benchmarking performance, and integration of data into clinical workflows. Developing methodologies that enable pathologists to interpret AI decisions accurately is imperative. Moreover, establishing more robust and standardized validation workflows is crucial to instill confidence in AI-powered systems’ outcomes. These efforts are vital for advancing current state-of-the-art practices and enhancing patient care in the future.


Machine learning:
Machine learning is a specific branch of artificial intelligence, based on algorithms that enable computer systems to learn and to make predictions and decisions from data, without the need for explicit programming instructions to do so.

Whole-slide images:
Digital representations of entire microscope slides created by scanning glass slides with high-resolution scanners.

Deep learning:
A subfield of machine learning where algorithms are trained for a task or set of tasks by subjecting a multi-layered artificial neural network to training data. It eliminates the need for manual feature engineering by allowing the networks to learn directly from raw input data during the training process. The acquired algorithm is subsequently utilized for tasks such as classification, detection, or segmentation. The term "deep" refers to the use of artificial neural networks comprising numerous layers, thus referred to as deep neural networks.

Convolutional neural network:
In deep learning, a class of artificial neural network consisting of a sequence of convolutional layers that process input data and produce an output. Each layer implements the convolution operation between the input data and a set of filters. These filter values are learned automatically during training, allowing the network to extract relevant features from the data in an end-to-end fashion (learning the optimal value of all parameters of the model simultaneously rather than sequentially).

Digital pathology:
The process of digitizing the conventional diagnostic approach. It is accomplished through the utilization of whole-slide scanners and computer screens.

Pathomics:
The analysis of digital pathology data by computational algorithms to extract meaningful features. These features are then used to build models for diagnostic, prognostic, and therapeutic purposes.

Computational pathology:
Computational analysis of digital images acquired by scanning pathology slides.

Image segmentation:
The process of dividing a digital pathology image into distinct regions or objects of interest (for example nuclei or tumor region) to enable analysis and extraction of specific features.

Artificial Intelligence Aided Diagnosis of RCC Subtypes
Although several advances have been made in RCC diagnostics in the last decade, especially in imaging techniques, histopathological diagnosis based on a pathologist's skill and experience remains the standard clinical practice used to distinguish RCC from normal renal tissue at the microscopic level [13,29,30,31].
However, RCCs can have complicated characteristics that make the diagnosis difficult, laborious, and time consuming, even for experienced pathologists. These issues are known to lead to only moderate inter-reader agreement on the RCC subtype [32][33][34]. In addition, several studies demonstrated that computational pathology could lead to more uniform specimen readings and reduce intra- and inter-observer variability [35][36][37].

RCC Diagnosis and Subtyping in Biopsy Specimens
RCC varies in its biological behavior, ranging from indolent to aggressive tumors. Currently, no reliable predictive models that distinguish between different clinical types are available for use in the pre-operative setting, creating concerns about under- and overtreatment, especially in small renal masses (SRMs), which now represent up to 50% of renal lesions [38][39][40][41][42]. This issue can therefore lead to overdiagnosis and overtreatment. To date, there are no highly reliable biomarkers or imaging methods that can correctly differentiate between benign and malignant lesions [43][44][45]. As a result, there has been a growing trend of using renal mass biopsy (RMB) to address this challenge over the past decade [46][47][48].
However, RMBs have some limitations as they are non-diagnostic in approximately 10-15% of the cases and remain intrinsically invasive [49]. The main reason for the high percentage of non-diagnostic results is inadequate sampling of tumors [50]. Another crucial issue in RMB is a fair degree of interobserver variability [51], a concern that is also found in breast, prostate, and melanoma biopsies [52][53][54].
To tackle these problems, Fenstermaker et al. developed a DL-based algorithm for RCC diagnosis, grading, and subtype assessment [55]. Their method reached a high accuracy level when using only a 100 square micrometer (µm²) patch, making it a potentially valuable tool in RMB analysis. In addition, although their method was trained on whole-mount surgical specimens, a computational method trained and tested on small tissue samples may reduce the need for repeat biopsies by decreasing insufficient tissue sampling and reducing interobserver variability.
However, this study focused on identifying the three main subtypes of RCC without considering benign tumors or oncocytomas. A significant proportion of small renal masses (SRMs) are benign, with oncocytoma being the most frequent benign contrast-enhancing renal mass found. A well-known problem faced by pathologists is differentiating oncocytomas from chromophobe RCC [56][57][58]. Zhu et al. reported favorable results in RCC subtyping in surgical resection and RMB specimens, as well as promising results in oncocytoma diagnosis in RMB [59]. The group trained and tested a model on an internal dataset of renal resections. In addition, they tested this model on 79 RCC biopsy slides, 24 of which were diagnosed as renal oncocytoma, and an external dataset, achieving good performance, as shown in Table 1.

RCC Diagnosis and Subtyping in Surgical Resection Specimens
Despite the recent increased use of RMB and enormous advances in diagnostic accuracy [60,61], approximately 73% of surveyed urologists would not perform an RMB for various reasons [62]. Currently, the standard of treatment for non-metastatic RCC is surgical resection, carried out via either a radical or partial nephrectomy; this technique is also used in some selected cases of metastatic RCC [63,64]. However, examining and analyzing the complex histological patterns of RCC surgical resection specimens under a microscope can be challenging and time consuming for pathologists for many reasons. For instance, nephrectomy specimens exhibit substantial heterogeneity, exemplifying the wide variation observed within RCC surgical resection samples [65]. Moreover, variability among different observers, and even within the same observer, has been reported [33].
Good results were obtained by Tabibu et al. in distinguishing ccRCC and chRCC from normal tissue using two pre-trained convolutional neural networks (CNNs) whose final layers were replaced with two output layers and fine-tuned using RCC data [66]. Moreover, for subtype classification, the group introduced a so-called directed acyclic graph support vector machine (DAG-SVM) on top of the deep network, obtaining good accuracy in this task. Unlike Tabibu et al.'s model, Chen et al. developed a DL algorithm to detect RCC that was externally validated on an independent dataset [67]. To accomplish this task, they used LASSO (least absolute shrinkage and selection operator), an ML method that selects, from a more extensive set of features, those most important in predicting outcomes. Through LASSO analysis, they identified various image features based on The Cancer Genome Atlas (TCGA) cohort to distinguish ccRCC from normal renal parenchyma, as well as from pRCC and chRCC, obtaining high accuracy in test and external validation cohorts.
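The feature selection performed by LASSO can be illustrated with a minimal sketch. It is not the pipeline of Chen et al.: it assumes synthetic, standardized features standing in for image features, a plain cyclic coordinate descent solver, and an arbitrary penalty `alpha`; the function names are hypothetical. The point is only that the L1 penalty drives the weights of uninformative features to exactly zero.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within [-threshold, threshold] becomes 0."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1 / (2 n)) * ||y - X w||^2 + alpha * ||w||_1.

    Assumes standardized columns in X and a centered y, so no intercept is fitted.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0) / n_samples
    for _ in range(n_iter):
        for j in range(n_features):
            # Residual with the current contribution of feature j added back.
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n_samples
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w


# Synthetic stand-in for image features: only features 0 and 3 drive the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=200)
y = y - y.mean()

weights = lasso_coordinate_descent(X, y, alpha=0.3)
selected = np.flatnonzero(weights)  # indices of features kept by the penalty
print("selected features:", selected.tolist())
```

In practice the penalty strength is usually chosen by cross-validation rather than fixed by hand, and the retained features are then passed to the downstream classifier.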
Also, Marostica et al. created a pipeline using transfer learning to identify cancerous regions from slide images and classify the three major subtypes, obtaining good performance in both the test set and two external independent datasets (Table 3) [68].
RCC classification is a challenging task not only due to the complexity of the procedure itself, but also because the classification system is subject to periodic updates [69,70]. For example, only in recent years has clear cell papillary renal cell carcinoma (ccpRCC) been recognized as a specific entity [4]. This subtype of RCC histologically resembles both ccRCC and pRCC, and it has clear cell changes. However, ccpRCC has distinct immunohistochemical and genetic profiles compared to ccRCC and pRCC [71]. It also carries a favorable prognosis relative to the latter carcinoma; therefore, the World Health Organization recently changed its denomination to a clear cell papillary renal cell tumor [72]. Abdeltawab et al. developed a computational model that could distinguish between ccRCC and ccpRCC, obtaining an accuracy of 91% in identifying ccpRCC using the institution files and 90% in diagnosing ccRCC using an external dataset [73].
The abovementioned studies were mainly supervised approaches narrowly defined for RCC, making them time consuming to develop. However, the capability to apply knowledge gained from previous experiences to novel situations is a vital human skill. For example, pathologists can use lessons learned outside of their specific subspecialty because several cancer types exhibit common hallmarks of malignancy. Faust et al. explored the same idea by testing whether a previously trained AI system developed to recognize brain tumor features could be applied to cluster and analyze RCC specimens in an unsupervised fashion [74]. The results showed that the grouping of cancer regions versus non-neoplastic tissue elements matched expert annotations in multiple randomly selected cases. This result, hypothetically, suggests that unsupervised ML-based methods built for the diagnosis of other cancers can also be used to diagnose RCC, reducing development and work time.
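The idea of grouping image regions without labels can be sketched with a small clustering example. This is an illustration under stated assumptions, not the method of Faust et al.: the feature vectors are synthetic stand-ins for features that a previously trained network would produce for each image patch, k-means is used as a generic clustering algorithm, and the function names are hypothetical.

```python
import numpy as np


def farthest_point_init(features, k):
    """Deterministic seeding: start at the first vector, then add the farthest one each time."""
    centers = [features[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min(
            [np.linalg.norm(features - c, axis=1) for c in centers], axis=0
        )
        centers.append(features[np.argmax(dist_to_nearest)])
    return np.array(centers)


def kmeans(features, k, n_iter=50):
    """Plain k-means: alternate nearest-center assignment and center update."""
    centers = farthest_point_init(features, k)
    for _ in range(n_iter):
        distances = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        new_centers = np.array([
            features[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic patch features: two well-separated groups of 30 patches each.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(30, 8))
group_b = rng.normal(loc=5.0, scale=0.5, size=(30, 8))
features = np.vstack([group_a, group_b])

labels, centers = kmeans(features, k=2)
print("cluster sizes:", np.bincount(labels).tolist())
```

No labels are used at any point; whether a cluster corresponds to tumor or to non-neoplastic tissue still has to be judged afterwards, for example by comparison with expert annotations.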

Pathomics in Disease Prognosis
The prognosis for RCC depends on several factors, including anatomical and clinical factors, while histological and molecular factors play important prognostic roles in both non-metastatic disease and mRCC [75].

Cancer Grading
Tumor grading is considered to be one of the most critical factors in prognosis prediction, as the 5-year survival rate for patients with low-grade RCC is around 90%, while in high-grade RCC, the survival rate is about 12% [75][76][77].
Although largely replaced by the WHO/ISUP grading classification method, the Fuhrman grading system still acts as an independent factor in determining a higher risk of recurrence and a lower chance of survival [78][79][80][81][82]. The Fuhrman grading system predominantly focuses on the morphology of the nucleus (size and shape) and the existence of prominent nucleoli, though inter- and intra-observer variability is a serious issue [33,37,83]. Yeh et al. trained a support vector machine (SVM) classifier that performed effectively in identifying nuclei, estimating their size, and calculating their spatial distribution, as well as in distinguishing between low and high grades on ccRCC specimens [84]. However, it could not differentiate between specific grades (e.g., III and IV), and no analyses of patients' likelihood of survival were presented.
Unlike the Fuhrman grading system, the WHO/ISUP system relies solely on nucleolar prominence for grade 1-3 tumors, allowing lower inter-observer variation [85]. Therefore, Holdbrook et al. developed a model that detected prominent nucleoli and quantified nuclear pleomorphic patterns by concatenating features (i.e., combining different features (or variables) into a single input representation for the model) extracted from prominent nucleoli and classifying them as either high- or low-grade features [86]. The model also showed excellent grade classification accuracy and prognosis prediction by comparing these results to a multigene score.
The aforementioned computational systems have many distinctive features, such as image processing, feature extraction, classification method, and the prediction of two-tiered grades (which demonstrated effective performance in cancer-specific survival (CSS) prediction) [87]. Tian et al. used 395 ccRCC cases from the TCGA dataset that were reviewed by a pathologist and stratified via the two-tiered system as low or high grade [88]. Of these cases, 277 had concordance between the TCGA and the pathologist's assigned grade and were used to train the model by extracting different histomic features for each patch. They used LASSO regression to select the features most associated with different grades, obtaining a model that predicted two-tiered ccRCC grading in good agreement with manual grades. It also showed a significant association between the predicted grade and overall survival, even when adjusting for age and gender. Furthermore, the model's predicted grade was superior in terms of overall survival prediction to TCGA and pathologist grade in discordant cases. This study was different from those of Yeh et al. [84], who only evaluated one feature (i.e., maximum nuclei size) to predict the two-tiered grade, and Holdbrook et al. [86], who used up to four concatenated feature vectors to calculate F-scores before classifying features into low or high grade. The features used in the model of Holdbrook et al. [86] [84,89]. The results of the studies mentioned above are summarized in Table 2.

WSI analysis with an automatic stain recognition algorithm. An SVM classifier was trained to recognize nuclei. Sizes of the recognized nuclei were estimated, and the spatial distribution of nuclear size was calculated using kernel regression.

ccRCC
A cascade detector of prominent nucleoli (constructed by stacking 20 classifiers sequentially) was trained with WSI images to extract image patches for subsequent analysis. This pipeline used two nucleoli detectors to extract prominent nucleoli image patches.
An automated image classification pipeline was used to detect and analyze prominent nucleoli in WSIs and classify them as either low or high grade. The pipeline employed ML and image pixel intensity-based feature extraction methods for nuclear analysis. Multiple classification systems were used for patch classification (SVM, logistic regression and AdaBoost).

Molecular-Morphological Connections and AI-Based Therapy Response Prediction
Recent developments in predicting RCC survival suggest that molecular differences within subtypes affect prognosis, as well as potentially predictive molecular biomarkers and marker signatures, even though there is no definitive evidence to date supporting the routine clinical use of biomarkers for treatment selection in metastatic RCC (mRCC) [90][91][92][93][94][95].
As the finding of predictive biomarkers still represents an unmet clinical need, AI can be used to explore connections between molecular biomarkers and morphological features on histopathology images, thus overcoming traditional biomarker analysis limitations, such as the high cost (both financially and in terms of time), limited sample size, and lack of standardization [96][97][98][99].
Among the many possible genetic aberrations in RCC, one crucial type is copy number alterations (CNAs), which are associated with RCC development, treatment response, and prognosis [100,101]. Marostica et al. used transfer learning to develop image-based prediction models for CNAs and somatic mutations. They demonstrated that CNAs in several genes, including KRAS, EGFR, and VHL, could affect quantitative histopathology patterns [68]. Furthermore, the group leveraged a framework to predict ccRCC tumor mutational burden, which is a potential yet controversial biomarker for immune checkpoint blockade response [102], and obtained good performance on this task. It is important to note that this approach was weakly supervised and required only slide-level labels, without detailed region- or pixel-level segmentation, making it readily applicable for clinical use.
Although immunotherapy has changed the field of mRCC in recent years, TKI monotherapy still plays an essential role in treating patients who are unable to receive or tolerate checkpoint inhibitors as a later-line therapy [75,103]. Go et al. developed an ML-based method to identify which mRCC patients will respond to VEGFR-TKI treatment by analyzing clinical, pathology, and molecular data from 101 patients [104]. Specimens of the primarily resected tissue were collected and retrospectively divided into clinical and non-clinical benefit groups. The authors developed a predictive classifier and obtained a prediction accuracy of 0.87.
As stated, gene expression signatures are commonly used as predictive biomarkers. Endothelial cells and vascular architecture are known to play roles in the biological behavior of the tumor [105]. Ing et al. used ML to analyze tumor vasculature to gather prognostic insights [106]. They used ccRCC cases from the TCGA database to train their algorithm and discovered that nine vascular features correlated with clinical outcomes. They found that four of these features had more significant variation in individuals with poor outcomes than favorable outcomes, linking variation in vascular structure to worse results. Ing et al. identified 14 genes that correlated strongly with these features and built 2 ML-based models with satisfactory prediction outcomes comparable to those of traditional gene signatures. Further efforts are needed to develop models using morphologic and genomic biomarkers to improve patients' prognosis and treatment options.
Another active area of RCC research is the field of epigenetics [107][108][109][110][111]. Zheng et al. investigated possible interactions between histopathologic features and epigenetic changes in RCC [112]. Using morphometric features extracted from histopathological images, they employed ML models to accurately forecast differential methylation values for specific genes or gene clusters. Furthermore, prospective studies are needed to predict the mechanisms underlying cancer progression using predicted genes [113]. The results of the studies mentioned above are summarized in Table 3.

(1) Weak supervision approach used for malignant region identification; (2) Same transfer learning approach trained for 15 epochs; (3) Independent models for ccRCC, pRCC, and chRCC were developed; (4) 10-fold cross-validation was employed. Upsampling of uncensored data points was performed in each fold's training set to enhance the model training process.

(1) Three DCNN architectures (VGG-16, Inception-v3, and ResNet-50) were compared for each task. (2) Same transfer learning approach as above was used. The hyperparameters of DCNNs were optimized via Talos. (3) Two transfer learning approaches were used: gene-specific binary classification and multi-task classification for all genes for CNAs. DCNNs were used for associations between genetic mutations and WSI images. (4) DCNN models used image patches as inputs, predicting binary values for each patient. Grad-CAM was generated to identify the regions of greatest importance for survival prediction.
Go et al. [104]: RCC VEGFR-TKI response classifier; survival prediction. Features that showed statistical differences between the good- and bad-response groups were selected, and the most appropriate cut-off for each feature was calculated. Secondary feature selection was performed using SVM to develop the most efficient model, i.e., the model showing the highest accuracy with the least number of features.

Ing et al. [106]: (1) RCC vascular phenotypes; (2) survival prediction; (3) identification of prognostic gene signature; (4) prediction models. A stochastic backwards feature selection method with 1500 iterations was applied to identify the subset of VF with the highest predictive power. Two GLMNET models were trained: one model was trained on VF-risk groups, and the other model was trained using a 24-month disease-free status as the ground truth for a validation cohort. Quantitative analysis of tumor vasculature and development of a gene signature. The algorithms trained in this framework (SVM and random forest classifiers) classified endothelial cells and generated a VAM within a WSI. By quantifying the VAMs, nine VFs were identified, which showed a predictive value for DFS in a discovery cohort. Correlation analysis identified a 14-gene expression signature related to the nine VFs. The two GLMNET models were developed based on these 14 genes, separating independent cohorts into groups with good or poor DFS, which were assessed via Kaplan-Meier plots.

Zheng et al. [112]: RCC methylation profile; 326 RCC (also tested on glioma). In total, 30 sets of training/testing data were generated. Binary classifiers were fitted on the training set, and the best parameters were selected using 5-fold cross-validation. Logistic regression with LASSO regularization, random forest, SVM, AdaBoost, Naive Bayes, and a two-layer FCNN were used with optimized parameters.
To demonstrate that DNA methylation can be predicted based on morphometric features, different classical ML models were tested. Binary classifiers for each task were evaluated using accuracy, precision, recall, F1-score, ROC curve, AUC score, and precision-recall curves. Scores from 30 training/testing data sets were averaged per task. For logistic regression, feature importance analysis was conducted to rank the influence of morphometric features on the prediction task.

Prognosis Prediction Models Based on Computational Pathology
In the past, several models were developed and externally validated for the prediction of the prognosis of RCC patients. These models, which are currently used for both localized and metastatic RCC, are mainly based on clinicopathological data [114][115][116][117]. Currently, the prognostic models of localized ccRCC mainly include the Leibovich score [116] and the UISS score [117]. The latter score is primarily based on clinicopathological data, making the pathologist's subjective experience a limitation of its performance [118,119]. All mentioned models incorporate clinical parameters within their framework; however, models based exclusively on pathological data have been validated [120]. Regarding mRCC, risk groups assigned via the Memorial Sloan Kettering Cancer Center (MSKCC) and the International Metastatic Renal Cell Carcinoma Database Consortium (IMDC) may differ in up to 23% of cases [75]. Although these models have shown reasonably good performance in the past, there is still room for improvement [121]. AI multimodal approaches applied to medical issues can raise accuracy by up to 27.7% compared to a single modality [122]. Specifically, integrating an ML-based algorithm that predicts RCC survival from histopathology with other known prognosis modalities improved prediction accuracy in multiple studies [123,124].
Cheng et al. conducted the first study to combine features from gene data and histopathologic data for ccRCC prognosis [125], generating a risk index that strongly correlated with survival and outperformed predictions based on separate consideration of morphologic features or eigengenes. The predicted risk could also stratify early-stage patients (stage I and II), whereas no significant difference in survival outcomes was recorded when using stage alone. In Cheng et al.'s study, microenvironment and radiologic imaging information were not integrated into the prognostic model. Radiologic imaging, however, proved to be the single modality with the best predictive performance in a computational method presented by Ning et al., which combined features extracted from CT, histopathological images, and clinical and genomic data [126]. However, Ning et al.'s method also had limitations, such as a small sample size and a lack of external validation. Another algorithm used by Chen et al. was trained on ccRCC images from the TCGA cohort and validated on Shanghai General Hospital images to identify substantial survival-related digital pathological factors and combine them with clinicopathological factors (age, stage, and grade) [67]. The integration nomogram developed in that study showed good ability in predicting 1-, 3-, and 5-year DFS (Table 1). The study also defined the cut-off value for high- and low-risk scores as the median score for each cohort. Therefore, external validation using a larger cohort or a prospective study would be necessary to confirm the novel computational recognition model's validity and determine the optimal cut-off value for high- and low-risk scores.
Another study by Schulz et al. reported on a multimodal deep learning model trained on multiscale histopathological images, CT/MRI scans, and genomic data from whole exome sequencing [127]. The model showed excellent performance in terms of 5-year survival status prediction, as it outperformed other parameters (T-stage, N-stage, M-stage, and grading). The authors also showed that the model could stratify patients, obtaining a significant difference in the survival curves after dividing the cohorts into low- and high-risk patients, even when evaluating only M0 or M+ patients. However, this study had the following limitations: it lacked a comparison with other clinical tools that consider factors such as performance status and calcium levels, which are incorporated in the current, widely used prognostic models; the external validation sample size was relatively small; and further research is required to confirm the generalizability of the authors' approach.
The above-mentioned and future models should be externally validated, used in prospective cohorts, and compared to current prognostic models regarding discrimination, calibration, and net benefit [75]. The results of the studies mentioned above are summarized in Table 4.

CNN consisting of one individual 18-layer residual network (ResNet) per image modality (histopathology slides, CT scans, MR scans) and a dense layer for genomic data. The network outputs were then combined using an attention layer, which assigned weights to each output based on its relevance to the task at hand. The combined outputs were passed through a fully connected network. Depending on the specific case, either C-index calculation or binary classification for 5YSS was performed. The 5YSS category included patients who either survived for longer than 60 months or passed away within five years of diagnosis.
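Discrimination, one of the comparison criteria named above, is commonly summarized by the concordance index (C-index) mentioned for the multimodal model. The following is a minimal sketch of Harrell's C-index on made-up follow-up data; the function name and the numbers are hypothetical, pairs with tied follow-up times are simply skipped, and the code is not taken from any of the reviewed studies.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the risk score orders correctly.

    A pair is comparable when the patient with the shorter follow-up had an
    event (events[i] == 1). A higher risk score is expected to go with shorter
    survival. Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: follow-up in months, event indicator (1 = event, 0 = censored),
# and a model-predicted risk score per patient.
times = [10, 24, 36, 60, 72]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.6, 0.5, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(c_index)  # 6 of 8 comparable pairs are ordered correctly -> 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why the same statistic can be used to compare a pathomics model against an established clinical score on the same cohort.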

Future Perspectives
According to currently available data, AI and ML in RCC pathology ('pathomics') hold promise for the future, as they might help us to overcome several problems in classic histopathology, such as intra- and inter-observer variability and time consumption. Currently, several AI methods can be reliable in RCC diagnosis and, on some occasions, appear capable of predicting clinical outcomes in a few seconds. This capability could be of great help for pathologists in times in which the incidence of RCC is still rising. However, this exciting field is still relatively new and not without teething troubles, both in general and specifically within the realm of RCC [128,129].
In this review, we reported on the excellent results achieved using AI in several tasks, like staging and grading. Supervised learning methods efficiently perform these tasks but cannot be visually authenticated. In simple terms, the machine generates an answer (i.e., low or high grade or subtype) according to its learned algorithms, which humans cannot survey. These algorithms are often referred to as black box algorithms [130]. This problem makes them prone to doubt by the pathology community, as the pathologist must have faith in the findings before approving and discussing a report in multidisciplinary meetings [131]. One possible solution might be creating tools that bring transparency to non-linear machine learning techniques. For instance, gradient-weighted class activation mapping (grad-CAM) is a tool that can overlay images and heatmaps to improve visualization of the cell type or region in which the informative features were expressed [132]. Another possible solution can be "searching and matching", instead of "classifying" in an unsupervised fashion, which the group of Faust et al. used for RCC diagnosis [74]. With unsupervised learning, computers can search and cluster images with matching features in a dataset without labeling the data, which can be labor-intensive and potentially biased [133]. This method more or less resembles the current workflow, as pathologists often use atlases to compare images found in the specimen to judge if they match certain previously described conditions. Alternatively, asking other experts for a second opinion may be useful. However, this approach does not exclude the intervention of human experts since a pathologist still needs to inspect and interpret the images visually.
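The core computation behind the Grad-CAM heatmaps mentioned above can be sketched in a few lines. This is an illustration only: it assumes that the activations of the last convolutional layer and the gradients of the class score with respect to those activations have already been extracted from a trained network (here they are tiny made-up arrays), and the function name `grad_cam` is hypothetical.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Compute a normalized Grad-CAM heatmap.

    activations: array (channels, height, width) from the last convolutional layer.
    gradients:   array of the same shape, d(class score) / d(activations).
    """
    # One weight per channel: the spatial average of its gradients.
    channel_weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the activation maps over the channel axis.
    cam = np.tensordot(channel_weights, activations, axes=1)
    # Keep only regions that increase the class score.
    cam = np.maximum(cam, 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam


# Made-up example: channel 0 supports the class, channel 1 argues against it.
activations = np.array([
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
])
gradients = np.array([
    [[1.0, 1.0], [1.0, 1.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
])

heatmap = grad_cam(activations, gradients)
print(heatmap)  # only the top-left location remains highlighted
```

In a real workflow the low-resolution heatmap would be upsampled to the size of the image patch and overlaid on it, so that the pathologist can check whether the highlighted region is histologically plausible.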
Another possible drawback of computational pathology is the current lack of generalization due to potentially biased inputs used in the training processes of models. For example, using cross-validation, ML models are validated using a set different from the training set, which can lead to biased evaluation if the input data are biased. Therefore, a recommended step before model training is to always check for any potential sample bias and assess whether there may be any issues related to sample size [134,135], heterogeneity [136], noise [137], and confounding factors [138].
Moreover, supposing the data are derived from one pathology laboratory, the algorithm may only be able to account for some variations and artifacts arising from different institutions. For example, the color distribution of WSIs varies across different pathology laboratories due to the staining process.
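One common way to reduce such laboratory-specific color differences is stain color normalization, in which the color statistics of each image are mapped onto those of a reference image. The sketch below is a deliberately simplified illustration: it assumes per-channel mean and standard deviation matching directly in RGB space (published methods typically operate in other color or stain-specific spaces), and the function name and the synthetic images are hypothetical.

```python
import numpy as np


def match_color_statistics(source, reference):
    """Shift and scale each RGB channel of source to the mean/std of reference."""
    source = source.astype(np.float64)
    reference = reference.astype(np.float64)
    src_mean = source.mean(axis=(0, 1))
    src_std = source.std(axis=(0, 1)) + 1e-8  # avoid division by zero
    ref_mean = reference.mean(axis=(0, 1))
    ref_std = reference.std(axis=(0, 1))
    normalized = (source - src_mean) / src_std * ref_std + ref_mean
    return np.clip(np.rint(normalized), 0, 255).astype(np.uint8)


# Synthetic patches standing in for images from two laboratories
# with different overall color distributions.
rng = np.random.default_rng(0)
lab_a = rng.integers(100, 200, size=(32, 32, 3)).astype(np.uint8)
lab_b = (rng.integers(80, 180, size=(32, 32, 3)) + np.array([20, -10, 0])).astype(np.uint8)

normalized_a = match_color_statistics(lab_a, lab_b)
print("reference channel means: ", lab_b.mean(axis=(0, 1)).round(1))
print("normalized channel means:", normalized_a.mean(axis=(0, 1)).round(1))
```

The complementary strategy, stain color augmentation, goes the other way: instead of removing color variation before inference, it randomly perturbs colors during training so that the model becomes less sensitive to them.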
Once the data are adequately processed, the model is trained using the training set, and its performance is evaluated using the validation set. The so-called 'overfitting' can occur when a model is so finely tuned to a particular dataset that it fails to generalize well to new and unseen data. Overfitting is akin to memorizing answers to a test rather than understanding the material. Once the training process is complete, the final performance of the model is evaluated using the test set, which contains data that the model has not seen before that moment. This final evaluation estimates the model's performance using new and unseen data [139]. But, if the model is overfitting, it can still perform well if the data are derived from the same laboratory.
Such differences between laboratories lead to inter-center variability that impacts the accuracy of machine learning algorithms used to automatically analyze WSIs. This issue affects state-of-the-art CNN-based algorithms, which often exhibit reduced performance when applied to images from a different center than that on which they were trained [22,23,140,141]. Therefore, a global standard for tissue processing, staining, slide preparation in surgical pathology, and even digital acquisition would be greatly helpful [142]. Existing solutions to reduce generalization error in this setting can be categorized into stain color augmentation and stain color normalization, with ML-based methods that perform stain color normalization using a neural network also being proposed [143]. One of the most effective methods to mitigate overfitting is external validation, which involves testing the method on a group of new patients distinct from the initial set, thus assessing the model's generalization ability [20].
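To make the idea of stain color normalization concrete, the following is a deliberately simplified sketch that matches the per-channel mean and standard deviation of a source image to those of a reference image, directly in RGB. This is a stand-in chosen for brevity: practical methods usually work in other color spaces or on separated stain components, and the neural-network-based normalization cited in [143] is not shown. All pixel values and function names are hypothetical.

```python
from statistics import mean, pstdev


def channel_stats(pixels):
    """Return a (mean, standard deviation) pair for each color channel."""
    channels = list(zip(*pixels))
    return [(mean(c), pstdev(c)) for c in channels]


def normalize_to_reference(pixels, reference_stats):
    """Shift and scale each channel so its statistics match the reference."""
    source_stats = channel_stats(pixels)
    normalized = []
    for pixel in pixels:
        new_pixel = []
        for value, (src_mean, src_std), (ref_mean, ref_std) in zip(
            pixel, source_stats, reference_stats
        ):
            # Leave the spread unchanged if the source channel is constant.
            scale = ref_std / src_std if src_std > 0 else 1.0
            shifted = (value - src_mean) * scale + ref_mean
            new_pixel.append(min(255.0, max(0.0, shifted)))  # keep in 8-bit range
        normalized.append(tuple(new_pixel))
    return normalized


# Hypothetical RGB pixels from a reference laboratory and from another laboratory.
reference_pixels = [(200, 120, 180), (180, 100, 160), (220, 140, 200)]
source_pixels = [(150, 90, 140), (130, 60, 120), (170, 120, 160)]

reference_stats = channel_stats(reference_pixels)
normalized_pixels = normalize_to_reference(source_pixels, reference_stats)
normalized_stats = channel_stats(normalized_pixels)
print(normalized_stats)
```

Stain color augmentation takes the opposite route: instead of mapping every image to one reference appearance, the training images are randomly perturbed in color so that the model sees a wider range of staining.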
The critical evidence for generalizability would be introducing external validation. Any features selected based on idiosyncrasies in the original training data, such as technical or sampling biases, would likely not function properly. As a result, adequate performance while using a reasonably extensive external validation set is seen as evidence of a model's generalizability (Figures 1 and 2) [144].
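As a minimal sketch of how external validation exposes limited generalizability, the example below compares accuracy on an internal test set with accuracy on a cohort from a different center. The labels and predictions are made-up toy values for illustration only, not results from any study, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def generalization_gap(internal, external):
    """Internal accuracy minus external accuracy; each argument is (y_true, y_pred)."""
    return accuracy(*internal) - accuracy(*external)


# Toy data: 1 = high grade, 0 = low grade (hypothetical).
internal_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
internal_pred = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
external_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
external_pred = [1, 0, 0, 0, 1, 1, 1, 0, 0, 0]

internal_acc = accuracy(internal_true, internal_pred)
external_acc = accuracy(external_true, external_pred)
gap = generalization_gap(
    (internal_true, internal_pred), (external_true, external_pred)
)
print(f"internal={internal_acc:.2f} external={external_acc:.2f} gap={gap:.2f}")
```

A large positive gap would suggest that the model relies on features specific to the training center, whereas comparable performance on a sufficiently large external set supports generalizability.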

Figure 1.
Pathway for the development of pathomics algorithms. After the sample is obtained by surgical resection or biopsy, the WSI is created with a digital scanner, and the derived patches are used to train the algorithm to define diagnostic, prognostic, or predictive models. Supervised learning-based algorithms could carry the "black box" issue (see Section 6).
Additionally, it is important to note that, as stated above, radiomics showed promising results in different tasks, in particular in diagnosing and subtyping tasks. Many studies used histopathology results as the reference standard to evaluate the radiomic model [145]. Over the past decade, computational pathology research experienced a shift in focus. Initially, the aim of research was to replicate the diagnostic process already conducted by pathologists. However, the most recent literature witnessed a move towards uncovering and exploring "sub-visual" prognostic image cues derived from histopathological images.
Radiomics involves the extraction of computational features that quantify tissue heterogeneity at the macroscopic level by leveraging ML. In contrast, pathomics focuses on providing quantitative information at the micro scale. The fusion of radiomics and pathomics can offer, in the future, an opportunity to combine tumor heterogeneity at both the macro and micro scales, potentially enhancing the integrated signature through complementary insights [146].
To conclude, AI is a promising tool that remains under investigation in relation to the diagnosis, grading, prognosis assessment, and treatment of kidney neoplasms. Results of new AI algorithms are encouraging since they are either on par with or outperform current state-of-the-art methods. However, most of these technologies are not yet available for widespread clinical use, and further evidence is needed regarding their efficacy. Therefore, further advancements in this exciting field are eagerly awaited [23].
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics13132294/s1, Table S1: AI models datasets for diagnosis and subtyping; Table S2: AI models datasets for grading; Table S3: AI methods datasets for prognostic models; Table S4: AI models datasets for molecular morphologic connection and therapy response predictions.

Conflicts of Interest:
The authors confirm that there are no conflicts of interest with any financial organization regarding the material discussed in this manuscript.