Lymphatic Dissemination in Prostate Cancer: Features of the Transcriptomic Profile and Prognostic Models

Radical prostatectomy is the gold standard treatment for prostate cancer (PCa); however, it does not always completely cure PCa, and patients often experience a recurrence of the disease. In addition, the clinical and pathological parameters used to assess the prognosis and choose further tactics for treating a patient are insufficiently informative and need to be supplemented with new markers. In this study, we performed RNA-Seq of PCa tissue samples, aimed at identifying potential prognostic markers at the level of gene expression and miRNAs associated with one of the key signs of cancer aggressiveness—lymphatic dissemination. The relative expression of candidate markers was validated by quantitative PCR, including an independent sample of patients based on archival material. Statistically significant results, derived from an independent set of samples, were confirmed for miR-148a-3p and miR-615-3p, as well as for the CST2, OCLN, and PCAT4 genes. Considering the obtained validation data, we also analyzed the predictive value of models based on various combinations of identified markers using algorithms based on machine learning. The highest predictive potential was shown for the “CST2 + OCLN + pT” model (AUC = 0.863) based on the CatBoost Classifier algorithm.


Introduction
Prostate cancer (PCa) is one of the most common cancers in men worldwide; more than one million new cases are diagnosed annually [1]. Prostate cancer is characterized by high clinical heterogeneity, which manifests as a variable propensity for recurrence and for progression after surgical treatment. This clinical heterogeneity is, in turn, due to molecular heterogeneity: patients' tumors carry alterations that are accompanied by various changes in signaling pathways and metabolic processes [2,3]. At the same time, only a few driver alterations are sufficient for the development of aggressive forms of PCa [4][5][6].
Despite a wide range of therapeutic approaches, the problem of choosing the tactics for treating a patient after radical prostatectomy is quite acute, especially for the category of patients with locally advanced PCa (LAPC). In addition to invasion of the prostatic capsule, metastasis to regional lymph nodes is often observed in LAPC. In this case, as a rule, the choice is between the immediate start of adjuvant hormone therapy or active monitoring of the level of prostate-specific antigen (PSA) in the blood.
The choice of a therapeutic concept is based on determining the prognosis for the patient, which, in turn, is based on such clinical and pathological parameters as the PSA level, the size of the primary tumor (stage T), and the Gleason score. It is also worth noting the improved classification of cancer cell differentiation based on the Gleason score, a system developed by the ISUP (International Society of Urological Pathology) [7]. This system divides tumors into groups 1 to 5 depending on the Gleason score and provides the most accurate stratification of tumors.
However, these indicators are not informative enough to identify a group of patients with tumors that have high potential for progression, the development of an aggressive phenotype, and metastasis [8,9]. Thus, reliable prognostic markers are needed, the use of which, in combination with clinicopathological parameters, will help to reliably identify an aggressive tumor phenotype and thus choose the best therapeutic approach for the patient. One of the promising approaches in the search for prognostic markers may be the analysis of the transcriptomic data of tumors [10][11][12]. Currently, several prognostic expression panels of PCa markers based on tissue analysis have been identified, such as Decipher and Oncotype DX Genomic Prostate Score (GPS).
The Decipher test measures the RNA expression levels of 22 different genes, selected based on unique differential expression patterns in early metastasis [13]. The Decipher test has shown high discrimination in the prediction of clinical metastases (AUC = 0.75-0.83) and mortality from PCa (AUC = 0.78) in validation studies, significantly exceeding the available clinicopathological characteristics (AUC = 0.69) [14]. The Oncotype DX panel is a tissue biopsy-based genomic assay that measures the mRNA expression of 17 genes responsible for tumor cell growth and survival [15]. At the development stage of this test, the developers used the results of quantitative PCR on archival tissue samples obtained after the surgical treatment of patients with PCa in low- and intermediate-risk groups. Based on validation studies, the Oncotype DX panel has been shown to be highly correlated with the biochemical recurrence of PCa and a poor prognosis, highlighting its predictive value for patients with PCa in low- and intermediate-risk groups.
Previously, we conducted a study that included miRNA-Seq analysis of 44 PCa tissue samples with and without lymphatic dissemination (N1 group = 20 samples; N0 group = 24 samples), as a result of which we identified a number of miRNAs, the expression of which could potentially be associated with lymphatic dissemination [16]. In the present study, we performed RNA-Seq profiling of an expanded sample of 73 PCa tissue samples from Russian patients and validated the obtained results by quantitative PCR (qPCR). This study also included an independent sample of 37 PCa tissue samples based on archival material. The obtained qPCR data for both PCa samples were used to analyze the predictive value of models based on combinations of candidate markers and clinicopathological parameters, using various machine learning algorithms (Logistic Regression (LR; scikit-learn ver.1.1.3), Light Gradient Boosting Machine (LGBM; ver.3.3.2), CatBoost (ver.1.0.6), Random Forest (scikit-learn ver.1.1.3) and XGBoost (ver.1.6.2)). The results of the study can be used to develop an expression panel for assessing the metastatic potential of high-risk PCa when choosing a therapeutic concept for a patient.

Transcriptome Profile Associated with Lymphatic Dissemination
Based on the obtained RNA-Seq data, the differential expression (DE) of genes was analyzed between groups of patients with and without lymphatic dissemination (N1 and N0 groups, respectively). The list of obtained statistically significant DE genes (DEGs) is presented in Table S1 (p value of the QLF/U tests < 0.05). A heat map of the gene expression profile is shown in Figure 1.
Based on the obtained list of DEGs, pathway enrichment analysis was performed based on the GSEA algorithm using the Reactome 2022 database (Table 1).
According to the results of the pathway enrichment analysis, tumor samples of the N1 group predominantly showed activation of the cell cycle and translation pathways. It is also worth noting the decreased activity of the "Fatty acid metabolism" pathway.

Selection of Candidate Markers of Lymphatic Dissemination
One of the key objectives of this study was the identification of promising markers of lymphatic dissemination. To this end, the resulting list of DEGs was filtered based on the following parameters: FDR U-test < 0.05; −1 < LogFC < 1; p value of r_s < 0.05. As a result of the filtering, the following genes were selected that best match the specified criteria: OCLN, F5, TBX1, CST2, RAB27A, PCAT4, and VGLL3 (Table 2).
After selecting a number of candidate markers, we evaluated their multicollinearity, including with the clinical and pathological parameters ISUP and pT stage. The correlation analysis showed that there is no strong functional relationship between the markers and that they can be jointly considered as part of the models (Figure 2).
Table 2. Differential expression of promising genes as markers of lymphatic dissemination based on a sample of Russian patients with LAPC.

Validation of the Relative Expression of the Potential Markers by qPCR
Validation of the expression of promising markers was primarily carried out on a previously sequenced sample of Russian patients with LAPC based on freshly frozen surgical material (FFT samples).
Validation confirmed a statistically significant difference between the studied groups in the expression of all genes considered (Figure 3a and Table 3). Next, we validated the relative expression of the selected genes in an independent sample of Russian patients (FFPE samples).
Statistically significant results were obtained based on the relative expression of the CST2, OCLN, and PCAT4 genes (Figure 3b and Table 3). The results of the calculations of the average change in the relative expression of the genes between the studied groups, performed based on the FFPE samples, showed the greatest decrease in expression in the presence of lymphatic dissemination for the PCAT4 gene (a 1.64-fold decrease) and the greatest increase in expression was shown for the CST2 gene (a 3.25-fold increase).
Next, we validated the relative expression of the selected promising miRNAs. In the FFT samples, a statistically significant difference between groups in the relative expression of miR-148a-3p and miR-615-3p was confirmed (Figure 4a and Table 4). We then validated the expression of miR-148a-3p and miR-615-3p in the FFPE samples, where the difference between the groups was also statistically significant (Figure 4b).
In the case of miR-615-3p, we observed a 4.08-fold increase in expression in the case of the FFT samples (p = 0.001) and a 2.23-fold increase in the case of the FFPE samples (p = 0.04). The expression of miR-148a-3p was reduced in the N1 group by 1.5-fold in the FFPE samples (p = 0.04) and 2.04-fold in the FFPE samples (p = 0.04).

Relative Expression of the Candidate Markers in Lymph Node Metastases
We also analyzed the relative expression of the candidate markers in lymph node metastasis samples. Expression of CST2, OCLN, miR-148a-3p and miR-615-3p was detected in all PCa metastasis samples.
It was also found that in the metastasis samples, the expression of the CST2 gene was statistically significantly reduced by an average of 3.5-fold compared to the primary tumor samples (p = 0.002) (Figure 5a). In the case of the OCLN gene, in metastases, on the contrary, an average 3.34-fold increase in expression (p = 0.02) was observed (Figure 5b).
When evaluating the expression of candidate microRNAs in metastasis samples, a statistically significant increase in miR-615-3p expression was found, averaging 6.57-fold in comparison to primary tumor samples (p = 0.003) (Figure 5c). In the case of miR-148a-3p, an average 2.16-fold increase in expression (p = 0.01) was also noted in the metastatic samples (Figure 5d).

ROC-AUC Analysis of Models Based on Combinations of the Markers
We assessed the predictive value of the identified candidate markers in various combinations (from single predictors to a model based on all seven predictors). Each combination of predictors was evaluated with five machine learning algorithms: Logistic Regression, LGBM, CatBoost, Random Forest, and XGBoost. The results of all predictor combinations for the five algorithms are presented in Table S2.
Within each category of combinations, we selected the best models by ROC-AUC value on the test data among those that passed the threshold of 0.7 for the accuracy parameter on the training data (Table 4).
The model based on the combination of «CST2 + OCLN + pT» predictors and the CatBoost algorithm was chosen as the best one, characterized by the highest performance in all parameters considered. The other machine learning algorithms considered also yielded high AUC values for this model (Table 5 and Figure 6).

Discussion
In the present study, we carried out a comprehensive transcriptomic analysis of LAPC tissue samples from Russian patients, aimed at identifying promising candidate markers of early metastasis. Based on the obtained transcriptomic profile, we first considered the biological pathways associated with lymphatic dissemination. The highly enriched pathways predominantly reflected the activation of translational processes in tumor cells. This translational activity is likely to be closely related to increased metabolism, directed at obtaining energy from various sources to promote the epithelial-mesenchymal transition of tumor cells and metastasis.
Among all of the pathways identified, it is worth noting the decreased activation of the "Fatty Acid Metabolism" pathway. It is known that malignant transformation of a tumor depends on complex intercellular interactions, supported by a wide network of physical and chemical mediators that make up the tumor microenvironment [17]. Recently, various researchers have emphasized the role of adipose tissue as a key component in the progression of solid tumors [18]. The prostate gland is surrounded by periprostatic adipose tissue, and extraprostatic expansion to adipose tissue is a widely recognized poor prognostic factor in PCa and an important predictor of recurrence after treatment [19]. The positive relationship between obesity and aggressive PCa, determined by an increase in local and distant spread, also supports the role of adipose tissue in tumor progression [20].
Interactions between adipocytes and tumor cells in the tumor microenvironment can create a metabolic symbiosis, leading to growth and metastasis. In combination with glucose, fatty acids are also vital for the synthesis of membrane lipids in tumor cells, energy production, and the synthesis of carcinogenesis-associated lipid-signaling molecules such as lysophosphatidic acids [21][22][23]. As a result, tumor cells activate de novo fatty acid synthesis, and elevated levels of fatty acid synthase are negatively correlated with prognosis [24]. In addition to de novo synthesis, tumor cells can also use exogenous fatty acids as a source of nutrition. Our results thus also highlight the importance of lipid metabolism in the progression of PCa.
Furthermore, based on the obtained transcriptomic profile, we searched for promising candidate markers based on gene expression. As a result of the validation, we confirmed the statistical significance of the expression of the PCAT4, OCLN and CST2 genes in lymphatic dissemination in an independent sample.
The PCAT4 (prostate cancer associated transcript 4; PCAN1; GDEP) gene is characterized by high tissue specificity for prostate tissue, but there are no published data on the biological function of this gene. According to the data obtained, we saw a significant decrease in the expression of this gene in the N1 group.
The OCLN (occludin) gene encodes the occludin protein, which belongs to the tight junction proteins. Tight junctions are one of the key components in tumor metastasis, as tumor cells must pass through a series of barriers to successfully metastasize to secondary lesions [25]. OCLN is widely expressed in tissues and cells with tight junctions and is a membrane protein with four transmembrane domains [26]. According to the literature, high expression of OCLN has been found in lung cancer, and when OCLN was knocked down in lung cancer cell lines (A549, NCI-H1650, SPC-A1, HCC827, NCI-H1299, and MSTO-211H), inhibition of cell proliferation was observed in vitro and in vivo. In addition, OCLN knockdown promoted apoptosis of lung cancer cell lines and reduced their ability to invade, on the basis of which the role of OCLN as a tumor promoter and prometastatic factor was shown for the first time [27]. Our data show, for the first time, an association between increased expression of the OCLN gene and the presence of lymphatic dissemination in LAPC.
The CST2 (cystatin SA) gene is a member of the cystatin family. Based on several studies, it has been shown that high expression of this gene is associated with the development of carcinogenesis. In breast cancer, increased expression of the CST2 gene has been shown to be associated with tumor cell proliferation, movement, and adhesion [28]. Based on our data, we observed an association between increased expression of the CST2 gene and lymphatic dissemination in LAPC.
In addition, we also validated previously identified promising markers based on miRNA expression, specifically miR-615-3p and miR-148a-3p, which also confirmed their association with lymphatic dissemination in the case of an independent sample of LAPC.
Aberrant expression of miR-615-3p has been described in many forms of cancer, including PCa, where overexpression of miR-615-3p has been observed in the most aggressive forms [29][30][31]. Experiments on cell lines of various types of cancer have shown that miR-615-3p overexpression supports cell proliferation and migration [29,30]. Functional studies performed on PCa cell lines have shown that miR-615-3p promotes proliferation, apoptosis, and migration of the PC3M cell line in vitro, indicating that miR-615-3p is an important oncogenic microRNA in PCa [31].
miR-148a-3p is one of the most highly expressed miRNAs in PCa tissues, as well as the most dominant in PCa metastasis [32]. High-grade tumors have been shown to exhibit reduced levels of miR-148a-3p expression. miR-148a expression has also been shown to be downregulated in docetaxel-resistant variants of PCa cell lines, including PC-3 and DU145, and downregulation of miR-148a has been observed in PCa with a risk of biochemical recurrence [33].
Evaluation of the expression of these candidate markers in samples of affected lymph nodes showed a further linear increase in the expression of the OCLN gene and miR-615-3p, as well as increased expression of miR-148a-3p. It can be assumed that the increased expression of these markers is not only associated with lymphatic dissemination in LAPC, but also supports the formation of secondary tumor foci.
We assessed the prognostic significance of PCAT4, OCLN, CST2, miR-615-3p, and miR-148a-3p for various combinations of predictors, both with each other and with such clinicopathological parameters as ISUP and pT. We considered the main metrics for models based on five machine learning algorithms for a classification problem.

Material
The present study included 73 samples of freshly frozen LAPC tissues, obtained by radical prostatectomy with extended pelvic lymphadenectomy performed at the National Medical Research Center for Radiology of the Ministry of Health of the Russian Federation.
The main criteria for sample inclusion in the study were the following: tumor type adenocarcinoma, LAPC (pT3a/3b), no neoadjuvant therapy, known lymph node status (N0/N1), and a negative resection margin for samples with stage N0. The FFT samples were divided into groups with and without lymphatic dissemination (N1, n = 31, and N0, n = 42, respectively).
As an independent sample, 37 FFPE LAPC tissue samples were used, obtained by radical prostatectomy with extended pelvic lymphadenectomy at the A.V. Vishnevsky National Medical Research Center for Surgery of the Ministry of Health of Russia. This sample was also divided into groups N1 (n = 19) and N0 (n = 18). Samples of affected regional lymph nodes from patients in group N1 (n = 14) were also included in the study.
All samples of tumor tissues were characterized in the pathological anatomical departments of the respective medical institutions, on the basis of which the material was obtained, and they were found to contain at least 70% of tumor cells. The main clinical and pathological characteristics of patients are presented in Table 6.

Library Preparation and High Throughput Sequencing
The quality of the obtained total RNA samples was assessed on an Agilent Bioanalyzer 2100 instrument (Agilent Technologies, Santa Clara, CA, USA) using the Agilent RNA 6000 Nano Kit (Agilent Technologies) in accordance with the manufacturer's protocol. For the subsequent preparation of mRNA libraries, tumor tissue samples with a RIN value of at least 7 were used. mRNA libraries were prepared using the TruSeq Stranded mRNA Kit (Illumina, San Diego, CA, USA) in accordance with the manufacturer's protocol. The concentration of the resulting libraries was measured on a Qubit 4.0 fluorimeter using the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific). The quality of the resulting libraries was assessed on an Agilent Bioanalyzer 2100 instrument using the Agilent High Sensitivity DNA Kit (Agilent Technologies) in accordance with the manufacturer's protocol. The size of the resulting mRNA libraries was ~260 bp. High-throughput sequencing of mRNA libraries was performed on a NextSeq 500 System (Illumina) using the NextSeq 500/550 High Output Kit v2.5 (Illumina) in 75 bp single-end read mode. As a result of the sequencing, at least 14 million reads were obtained for each sample.

Bioinformatics Data Analysis
For the obtained RNA-Seq data in FASTQ format, quality assessment was performed using the FastQC and MultiQC programs (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/, accessed on 10 May 2022). The Trimmomatic tool was used to remove adapter sequences from RNA-Seq data, which was followed by mapping to the reference genome (GRCh38 assembly) using the STAR tool [34,35]. FeatureCounts (Subread package v.1.6.4, Parkville, Australia) was used to calculate the read counts per gene [36]. The analysis of differential gene expression was performed in the R statistical environment using the edgeR package [37]. The TMM (Trimmed Mean of M-values) method was used to normalize the data. In the analysis of differential gene expression, the quasi-likelihood F-test (QLF test) and the non-parametric Mann-Whitney test (U-test) were applied. The Benjamini-Hochberg correction was applied to calculate the false discovery rate (FDR). Spearman's rank correlation coefficients (r_s) were calculated between N0 and N1 groups. Differences in the level of gene expression were considered statistically significant at test p values < 0.05. The visualization of heat maps of transcriptome profiles was performed using the ggplot2 package [38]. Biological pathway enrichment analysis based on RNA-Seq data was performed based on the GSEA algorithm using the Reactome 2022 database. The results were considered significant at FDR < 0.05.
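As an illustration of the multiple-testing step, the following is a minimal Python sketch of the Benjamini-Hochberg adjustment. It is not the code used in the study (the analysis itself was run in R with edgeR); the function name and the example p values are hypothetical and serve only to show how adjusted values are derived from raw per-gene p values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.6]
example_fdr = benjamini_hochberg(example_p)
```

With these hypothetical inputs, the third and fourth genes have raw p values below 0.05 but adjusted values slightly above it, which is why filtering on FDR is stricter than filtering on the raw p value.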

Quantitative PCR (qPCR)
cDNA samples were obtained from the mRNA template using Mint reverse transcriptase and oligo(dT) primer (20 µM) according to the protocol of the manufacturer (Evrogen, Moscow, Russia). cDNA was obtained from the miRNA template using the TaqMan Advanced miRNA cDNA Synthesis Kit (Thermo Fisher Scientific) according to the manufacturer's protocol. qPCR was performed in three technical replicates on an Applied Biosystems 7500 instrument (Thermo Fisher Scientific). The HPRT1 gene was used as a reference gene for analysis of relative mRNA expression. The sequences of primers used to validate markers based on mRNA expression are shown in Table 7. When validating microRNAs, miR-28-3p was used as a control. For the detection of control and target miRNAs, commercial TaqMan™ Advanced miRNA Assay sets of primers and probes (Thermo Fisher Scientific) were used: 477814_mir (miR-148a-3p), 478175_mir (miR-615-3p), 477999_mir (miR-28-3p). The level of relative expression of genes and microRNAs for each study group was calculated by the ∆Ct method. Visualization and statistical analysis of expression results were performed using paired Wilcoxon tests in Jupyter Notebook, Python (ver. 3.6).
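The ∆Ct calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual script: the function names and Ct values are hypothetical, each Ct is assumed to be already averaged over technical replicates, and the fold change is taken as the ratio of group means, which the text does not specify.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Relative expression by the delta-Ct method: 2 ** -(Ct_target - Ct_reference)."""
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(group_a, group_b):
    """Ratio of mean relative expression, group_a over group_b (an assumed summary)."""
    return mean(group_a) / mean(group_b)


# Hypothetical (Ct_target, Ct_reference) pairs, one per sample.
n1_ct = [(24.0, 26.0), (25.0, 26.0)]
n0_ct = [(26.0, 26.0), (27.0, 26.0)]

n1_expr = [relative_expression(t, r) for t, r in n1_ct]
n0_expr = [relative_expression(t, r) for t, r in n0_ct]
n1_vs_n0 = fold_change(n1_expr, n0_expr)
```

A lower Ct for the target relative to the reference (HPRT1 for mRNA, miR-28-3p for miRNA in this study) gives a higher relative expression value.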

Model Analysis
Based on the qPCR results, we used five supervised machine learning algorithms: Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), CatBoost, Random Forest, and XGBoost. These methods were implemented using the scikit-learn, lightgbm, catboost, and xgboost libraries in Jupyter Notebook, Python (ver. 3.6). The presence of lymphatic dissemination was used as the target. FFT samples were used as the training set, and FFPE samples were used as the test set. The models were trained using cross-validation (cv = 5). Receiver operating characteristic (ROC) curves were used to compare the performance of these five algorithms.
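The evaluation scheme can be sketched in plain Python as below. This is a hedged illustration rather than the study's pipeline: it enumerates predictor combinations and computes ROC-AUC from labels and scores using the rank-based definition, while the classifiers themselves are omitted. The helper names, labels, and scores are hypothetical, and the seven predictors are assumed to be the five validated markers plus ISUP and pT.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outranks a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predictor_subsets(predictors):
    """Yield every non-empty combination of predictors, smallest first."""
    for size in range(1, len(predictors) + 1):
        yield from combinations(predictors, size)


# Assumed set of seven predictors (five markers plus two clinicopathological parameters).
predictors = ["PCAT4", "OCLN", "CST2", "miR-615-3p", "miR-148a-3p", "ISUP", "pT"]
subsets = list(predictor_subsets(predictors))

# Hypothetical test-set labels (1 = N1, 0 = N0) and model scores.
example_auc = roc_auc([1, 1, 0, 0, 1], [0.9, 0.4, 0.5, 0.2, 0.7])
```

In a full analysis, each subset would be used to fit a model on the FFT samples, and the model's predicted probabilities on the FFPE samples would supply the scores passed to `roc_auc`.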

Conclusions
We performed RNA-Seq profiling of 73 LAPC tissue samples obtained by radical prostatectomy with extended pelvic lymphadenectomy. Using bioinformatics analysis, we identified enriched biological pathways associated with lymphatic dissemination in LAPC in a sample of Russian patients; these pathways can be further studied in the search for new potential therapeutic targets. Moreover, based on the bioinformatics analysis, we identified a number of genes and microRNAs, the expression of which can be considered potential prognostic markers. As a result of the validation of candidate markers by qPCR on an independent sample of patients, statistically significant results were confirmed for the PCAT4, OCLN, and CST2 genes, as well as miR-615-3p and miR-148a-3p. Based on the qPCR data obtained, we analyzed the prognostic significance of various combinations of these candidate markers, including those with the clinicopathological parameters ISUP and pT, using various machine learning algorithms. As a result, we showed that the model based on the combination of «CST2 + OCLN + pT» was characterized by the highest predictive value (AUC = 0.863) for determining lymphatic dissemination, both on samples of freshly frozen PCa tissues and on samples of archival material.

Informed Consent Statement:
Written informed consent has been obtained from the patients to publish this paper.
Data Availability Statement: All data generated or analyzed during this study are available within the article or upon request from the corresponding author.