Artificial Intelligence Techniques for Prostate Cancer Detection through Dual-Channel Tissue Feature Engineering

Simple Summary: Artificial intelligence techniques were used for the detection of prostate cancer through tissue feature engineering. A radiomics method was used to extract the important features, or information, from histopathology tissue images to perform binary classification (i.e., benign vs. malignant). This method can identify histological patterns that are invisible to the human eye, which helps researchers to predict and detect prostate cancer. We used different performance metrics to evaluate the results of the classification. In the future, it is expected that methods like radiomics will contribute consistently to the analysis of histopathology tissue images and the differentiation between cancerous and noncancerous tumors.
Abstract: The optimal diagnostic and treatment strategies for prostate cancer (PCa) are constantly changing. Given the importance of accurate diagnosis, texture analysis of stained prostate tissues is important for automatic PCa detection. We used artificial intelligence (AI) techniques to classify dual-channel tissue features extracted from Hematoxylin and Eosin (H&E) tissue images. Tissue feature engineering was performed to extract first-order statistic (FOS)-based textural features from each stained channel, and cancer classification between benign and malignant was carried out based on the important features. Recursive feature elimination (RFE) and one-way analysis of variance (ANOVA) methods were used to identify significant features, yielding the best five of the six extracted features. The AI techniques used in this study for binary classification (benign vs. malignant and low-grade vs. high-grade) were support vector machine (SVM), logistic regression (LR), bagging tree, boosting tree, and a dual-channel bidirectional long short-term memory (DC-BiLSTM) network. Further, a comparative analysis was carried out between the AI algorithms. Two different datasets were used for PCa classification.
Out of these, the first dataset (private) was used for training and testing the AI models and the second dataset (public) was used only for testing to evaluate model performance. The automatic AI classification system performed well and showed satisfactory results according to the hypothesis of this study.


Introduction
In 2011, the World Health Organization (WHO) reported that cancer caused more deaths than strokes and coronary heart disease combined, and global demographics and epidemiological indications suggested that the trend would continue, especially in low-income countries. Annual cancer cases may exceed 20 million as early as 2025. In 2012, 14.1 million new cancer patients and 8.2 million cancer deaths occurred worldwide. Over the past 150 years, pathologists have microscopically evaluated tissue slides when evaluating cancer status, but this is difficult because only a minuscule proportion of observed cells may be tumorous. Improvements in diagnosis and more targeted treatments are needed [5,6]. An AI system may be helpful. Recently, DeepMind (Google) has dramatically reduced breast cancer diagnosis error [7]. Novartis is working with PathAI to develop AI for cancer diagnosis and treatment decision-making [8].
To perform cancer analysis and classification, machine learning (ML) is an appropriate method compared to visual analysis. The main difference between deep learning (DL) and ML classification is the way data is presented to the system. ML algorithms require organized data, while DL relies on layers of artificial neural networks (ANN). Generally, DL requires a large amount of data for better prediction and more computational time, whereas ML can predict accurately even with a small dataset and requires less computational time.
Our study focuses on texture feature-based classification using AI techniques. ML provides many tools by which data can be analyzed and classified automatically.
Tissue images contain a lot of information (i.e., grey-level patterns) that is difficult to analyze by eye [9]. This textural information can be extracted and analyzed using a radiomics technique [10] through the process of tissue feature engineering (i.e., the computation of tissue-level features). Feature selection reduces computational complexity and improves classification accuracy and model performance. Features are the 2D spatial values of an image that represent the type of texture. In this paper, the recursive feature elimination (RFE) method was used to select the best features for AI classification. This method is popular and easy to configure because it is highly effective at selecting significant features in a dataset. Further, an ANOVA test was performed to identify the significant differences between benign and malignant features.
Texture is a key element of human visual perception and is used in various ways in computer vision systems. It is easy for the human eye to distinguish different textures, but this is a considerably harder problem for a computer. Among the techniques for analyzing textures, a structural approach is used for the statistical characteristics of images, because the pattern of textures creates the structure and the texture has different consistent properties. Statistical analysis methods indirectly express textures through nondeterministic attributes that govern the distribution of, and relationships between, the intensity levels of the image. Accurate diagnoses and cancer grading are essential, so grade-specific features must be defined. Here, we use a feature extraction process called feature engineering to extract textural aspects from tissue images and differentiate malignant (grades 3, 4, and 5) from benign (grades 1 and 2) and grade 5 from grade 3 prostatic samples using the proposed AI models, which include support vector machine (SVM), logistic regression (LR), boosting tree, bagging tree, and dual-channel bidirectional long short-term memory (DC-BiLSTM). Pixel distribution analysis is very important for understanding the variation of intensity in an image.

Related Work
Existing research on tissue texture analysis and classification has shown promising performance for PCa detection using Hematoxylin and Eosin (H&E) histopathology images. Most of the existing research focuses on binary classification for differentiating malignant and benign tumors using computer-aided diagnosis (CAD) tools. In the past, many researchers used an AI system to detect malignant biopsies and decrease the workload of pathologists. An AI system can therefore assist the pathologist with the detection of PCa among the biopsies included in the preliminary screening process. In this section, we mainly discuss the methods of classification and feature extraction. Past studies related to AI-based classification are summarized in Table 1. The studies in Table 1 show that different AI techniques and parameters have been used for cancer classification. Most of the studies performed binary classification using second-order statistical features to discriminate between noncancerous and cancerous tumors. Among them, Chakraborty et al. [21] achieved the best result in classifying histopathologic scans of lymph node sections using a dual-channel residual convolutional neural network. The present study likewise achieved strong results in classifying the first-order statistic (FOS)-based texture features extracted from the H&E channels and, considering all these different approaches, a comparative analysis was performed using various AI models, namely SVM, LR, bagging tree, boosting tree, and DC-BiLSTM.

Data Collection
The following two datasets were collected from two different centers. Out of these, one was private and the other one was public. Both the datasets were used for preprocessing, prior to feature extraction and classification.
Private Dataset: The tissue slides used for this research were acquired from 20 patients and prepared at the Severance Hospital of Yonsei University, Korea. To prepare the tissue slides, the pathologist used the H&E staining system. Deparaffinization and rehydration were performed before H&E staining, as incomplete removal of paraffin wax compromises staining. Tissues were sectioned to 4 µm and autostained. Slides were scanned at 40× magnification using a 0.3-NA objective (Olympus BX-51 microscope) and photographed (Olympus C-3000 digital camera). Each slide contained 33,584 × 70,352 pixels. H&E-stained regions of interest (ROIs) of 256 × 256 pixels were extracted from whole slide images (WSIs), as shown in Figure 2. As shown in Table 2, 500 images were used for textural analysis, feature extraction, and classification, of which 250 were benign and 250 malignant (50 of grade 3, 100 of grade 4, and 100 of grade 5).
External Test Set: The dataset was collected online and is publicly available at https://zenodo.org/record/1485967#.X_0ue-gzZMs (accessed on 25 January 2021). Bulten et al. [22] uploaded their dataset to the Zenodo repository, where it can be used for external validation. A total of 102 patients underwent a radical prostatectomy at the Medical Center of Radboud University. Out of these, the H&E-stained samples of 40 patients were selected to check the performance of the models; the WSI for each patient was divided into four sections (i.e., two containing benign epithelium and two containing tumor). From each section, ROIs of 2500 × 2500 pixels were extracted at 10× magnification, as shown in Figure 3. As a result, 160 ROIs were extracted (89 of benign epithelium and 71 of tumor). The best 10 ROIs of benign epithelium and tumor were selected for model validation.

Image Representation
A power-law transformation (gamma correction) [23,24] was applied to the private dataset to adjust the contrast level of the tissue image. This method controls the overall brightness of an image and therefore, helps to display it accurately on a computer screen. Further, the preprocessed ROIs of H&E tissue samples (i.e., private dataset and external test set) were used for generating the non-overlapping patches of size 64 × 64 pixels, and a stain deconvolution technique was used to separate the Hematoxylin and Eosin channels from the extracted patches, as shown in Figure 4. A total of 8000 patches were selected from each dataset (private and external). Before generating patches of the external test set, the ROIs were resized to 2048 × 2048 pixels. According to the rule of thumb, the greater the learning samples per class [25], the better the model classification. Patch-based texture analysis was performed to increase the number of samples in the dataset and to extract more discriminating features.
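The gamma correction and non-overlapping 64 × 64 patching described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the gamma value and the synthetic ROI are assumptions, and the stain deconvolution step (commonly done via Ruifrok-Johnston color deconvolution) is omitted.

```python
import numpy as np

def gamma_correct(img, gamma=0.8, c=1.0):
    """Power-law (gamma) transformation: s = c * r**gamma on [0, 1] intensities."""
    r = img.astype(np.float64) / 255.0
    s = c * np.power(r, gamma)
    return np.clip(s * 255.0, 0, 255).astype(np.uint8)

def extract_patches(img, patch=64):
    """Tile an image into non-overlapping patch x patch blocks."""
    h, w = img.shape[:2]
    return [img[y:y + patch, x:x + patch]
            for y in range(0, h - patch + 1, patch)
            for x in range(0, w - patch + 1, patch)]

roi = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # stand-in for one ROI
patches = extract_patches(gamma_correct(roi), patch=64)
print(len(patches))  # 16 patches per 256 x 256 ROI
```

With gamma < 1 the transform brightens dark regions, which is one way the "overall brightness" control described above can be realized.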

Materials and Methods
The proposed pipeline of this study is depicted in Figure 5. First, after acquisition of ROIs (256 × 256 pixels) from the stained WSIs, the images were separated into two groups (benign and malignant). Second, the extracted ROIs were used for patch generation, gamma correction, and stain deconvolution. Third, for texture analysis, a set of radiomic features was extracted from each of the two channels (Hematoxylin and Eosin) separately. Fourth, two steps of feature selection (RFE and one-way ANOVA) were carried out to validate feature significance. Fifth, binary classification was performed using the AI models. Finally, a comparative analysis of the classification algorithms and performance evaluation was performed using a confusion matrix and receiver operating characteristic (ROC) curve.


Patch-Based Feature Engineering
Textural analysis exploits spatial changes in image patterns to extract information from both images and shapes. Such features are effective for classification because they contain statistical data on adjacent pixels in images [26,27]. Here, we extracted image textural features and classified them using AI-based models. Radiomics is a technique of extracting a large number of features from the captured visual content of medical images for analysis and classification. In this paper, we extracted FOS-based features, in which the texture values were statistically computed from individual pixels without considering the relationships of neighboring pixels [28]. The radiomic features [29] that were extracted from the H&E staining channels are given in the Supplementary Information (SI), Feature Extraction, and Table S1.
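The FOS features named in this paper (energy, entropy, kurtosis, skewness, variance, uniformity) can be computed from a patch's normalised grey-level histogram. The sketch below uses common radiomics definitions; the paper's exact formulas, and in particular its distinction between "energy" and "uniformity", are assumptions of this illustration.

```python
import numpy as np

def fos_features(patch, levels=256):
    """First-order statistics from the grey-level histogram of a patch."""
    counts, _ = np.histogram(patch, bins=levels, range=(0, levels))
    p = counts / counts.sum()          # normalised histogram
    i = np.arange(levels)
    mu = np.sum(i * p)                 # mean grey level
    var = np.sum((i - mu) ** 2 * p)
    sd = np.sqrt(var)
    if sd == 0:
        sd = 1.0                       # guard against flat patches
    nz = p[p > 0]
    return {
        "variance":   var,
        "skewness":   np.sum((i - mu) ** 3 * p) / sd ** 3,
        "kurtosis":   np.sum((i - mu) ** 4 * p) / sd ** 4,
        "entropy":    -np.sum(nz * np.log2(nz)),
        "uniformity": np.sum(p ** 2),
        "energy":     np.sum(patch.astype(np.float64) ** 2),
    }

patch = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
feats = fos_features(patch)
print(sorted(feats))  # ['energy', 'entropy', 'kurtosis', 'skewness', 'uniformity', 'variance']
```

Because these statistics ignore pixel adjacency, they describe the intensity distribution of a patch rather than its spatial arrangement, which matches the FOS definition given above.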

Features Selection
To select significant features from those extracted, we used two-step feature selection methods, namely wrapper (RFE) [30] and filter (one-way ANOVA) [31]. At times, due to insignificant input features, the learning algorithms could be deceived, resulting in poor predictive performance. Therefore, feature selection is an important step for AI-based classification, which selects the most relevant features for a dataset.
First, the best five features were selected using RFE (a greedy optimization algorithm), which repeatedly fits a baseline model and eliminates the weakest-performing feature at each iteration until all the features are ranked. In our study, a gradient boosting classifier was used as the baseline model to carry out the RFE process [32]. As a result, the features were ranked in descending order from strongest to weakest, as shown in Figure 6.
Second, as shown in Table 3, a one-way ANOVA statistical test was performed to identify the feature significance (F-value and p-value) and effect size (i.e., eta squared) of the features selected using RFE. The magnitude of the difference between the two groups (i.e., benign and malignant) was analyzed based on eta squared and the conventional effect-size thresholds (i.e., 0.01 = small, 0.06 = medium, and 0.14 = large) [33,34]. Small, medium, and large effect sizes signify that the difference between the two groups is unimportant, less important, and important, respectively. The eta squared is calculated as η² = SS_effect / SS_total, where SS_effect is the sum of squares between the groups and SS_total is the sum of squares between + within the groups.
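The one-way ANOVA F statistic and the eta-squared effect size defined above can be computed directly for two groups; a minimal NumPy sketch (the feature values are synthetic, for illustration only):

```python
import numpy as np

def anova_eta_squared(group_a, group_b):
    """One-way ANOVA F statistic and eta squared (SS_effect / SS_total) for two groups."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    grand = np.concatenate([a, b]).mean()
    # between-group sum of squares (SS_effect)
    ss_effect = len(a) * (a.mean() - grand) ** 2 + len(b) * (b.mean() - grand) ** 2
    # within-group sum of squares
    ss_within = ((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()
    df_between, df_within = 1, len(a) + len(b) - 2
    f = (ss_effect / df_between) / (ss_within / df_within)
    return f, ss_effect / (ss_effect + ss_within)

# toy feature values for a "benign" and a "malignant" group
f, eta2 = anova_eta_squared([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(f, 2), round(eta2, 3))  # 13.5 0.771
```

Here eta squared of about 0.77 is far above the 0.14 "large" threshold cited in the text, so the two toy groups would be judged importantly different.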

Binary Classification
The classification of textural features from medical images provides useful information and helps pathologists to make accurate decisions. In this paper, we used AI-based algorithms to perform binary classification and differentiate malignant from benign tissue and high-grade (grade 5) from low-grade (grade 3) tissue samples. ML and DL-based classification are very important for understanding the image patterns of a given disease. To perform ML classification, the features extracted from the H&E channels are concatenated before the learning process begins. In contrast, for DL-based DC-BiLSTM classification, the dual-channel H&E features are concatenated during the learning process. In this work, multiple ML and DL-based algorithms were developed, and we carried out a comparative analysis to determine performance based on the evaluation metrics. A detailed explanation of the classification methods is given in the Supplementary Information (SI), Classification Algorithms [35][36][37][38][39][40][41][42][43][44][45]. In the Supplementary Information, Figures S1-S3 show the classification processes of the Bagging, Boosting, and LSTM algorithms, respectively.
In this paper, we proposed a DC-BiLSTM model for learning dual-channel tissue features and discriminating diseases based on normal and abnormal prostate tissues, which is a novel approach. In general, a unidirectional LSTM model consists of one LSTM that works only in one way, learning the input features from the past to the future in a forward direction. On the other hand, a BiLSTM [46] model consists of two LSTMs that work in two ways, one learning the inputs from past to future and the other from future to past in a forward and backward direction. In BiLSTM, the information learned from both directions is concatenated for the final computation. As shown in Figure 7, DC-BiLSTM network consisted of two input channels for learning Hematoxylin and Eosin-based tissue features. Two layers of BiLSTM were used for each input channel containing the same number of nodes (i.e., time steps and cells = 64). The model concatenated the outputs of the two channels into a single feature vector and passed them to a fully connected layer for cancer classification.
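A minimal PyTorch sketch of a dual-channel BiLSTM of the kind described above: two stacked BiLSTM branches (64 cells each, matching the text), one per staining channel, whose outputs are concatenated and passed to a fully connected layer. The per-step feature dimension of 5 and the use of the last time step for pooling are assumptions of this sketch, as the paper does not specify them.

```python
import torch
import torch.nn as nn

class DCBiLSTM(nn.Module):
    def __init__(self, n_features=5, hidden=64, n_classes=2):
        super().__init__()
        # two stacked BiLSTM layers per staining channel
        self.h_branch = nn.LSTM(n_features, hidden, num_layers=2,
                                batch_first=True, bidirectional=True)
        self.e_branch = nn.LSTM(n_features, hidden, num_layers=2,
                                batch_first=True, bidirectional=True)
        # each branch emits 2*hidden (forward + backward); two branches concatenated
        self.fc = nn.Linear(4 * hidden, n_classes)

    def forward(self, h_feats, e_feats):
        h_out, _ = self.h_branch(h_feats)   # (batch, time, 2*hidden)
        e_out, _ = self.e_branch(e_feats)
        # concatenate the last time step of both channels into one feature vector
        merged = torch.cat([h_out[:, -1], e_out[:, -1]], dim=1)
        return self.fc(merged)

model = DCBiLSTM()
h = torch.randn(8, 64, 5)   # batch of 8 patches, 64 time steps, 5 features (Hematoxylin)
e = torch.randn(8, 64, 5)   # same shape for the Eosin channel
logits = model(h, e)
print(tuple(logits.shape))  # (8, 2)
```

The bidirectional flag gives each branch the forward and backward passes described in the text, and the concatenation before the fully connected layer mirrors the merging of the two channels shown in Figure 7.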
The hypothesis we created for the binary classification is as follows: (a) for the internal test set, the recall of benign vs. malignant and grade 3 vs. grade 5 classification must be ≥90% and ≥80%, respectively; (b) for the external test set, the recall of benign vs. malignant classification must be ≥85%.

Model Performance
The implementation (i.e., image representation, feature extraction, feature selection, model classification) for this study was carried out using MATLAB and Python programming on an Intel Core i7 workstation with 24 GB RAM. For the analysis and classification of dual-channel tissue features, we used both private and public datasets. For the private dataset of benign and malignant tissue, we extracted 8000 patches from 500 ROIs of size 256 × 256 pixels. Out of these, 5120 were for training, 1280 for validation, and 1600 for testing. Moreover, within the malignant tissues, 1600 patches were extracted separately from 100 ROIs (50 of grade 3 and 50 of grade 5) to validate the performance of the trained classifiers in distinguishing between low-grade and high-grade disease. On the other hand, for the external test set, we extracted 10,240 patches from 10 ROIs of size 2048 × 2048 pixels, of which the best 8000 patches, excluding background, were selected for model validation in distinguishing between benign and malignant tissues. The evaluation metrics used to compute the results of binary classification were accuracy, recall, precision, and f1-score. Tables 4-6 show the test results and comparative analysis of the multiple learning algorithms for the private and public datasets. For the internal test sets of benign vs. malignant and grade 3 vs. grade 5, the best classification results obtained by DC-BiLSTM were accuracy = 98.6% and 93.5%, precision = 98.2% and 96.3%, recall = 98.9% and 91.2%, and F1 score = 98.6% and 93.7%, respectively. For the external test set of benign vs. malignant, the best classification results obtained by Boosting Tree were accuracy = 93.5%, precision = 92.9%, recall = 94.1%, and F1 score = 93.5%.
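The four evaluation metrics reported above all derive from the 2 × 2 confusion-matrix counts. A small NumPy sketch, with malignant as the positive class (the toy labels are illustrative only):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 score from 2x2 confusion-matrix counts
    (label 1 = malignant, the positive class)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # malignant correctly called malignant
    tn = np.sum((y_true == 0) & (y_pred == 0))   # benign correctly called benign
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

Recall (sensitivity to malignant samples) is the quantity the study's hypothesis constrains, which is why it is reported alongside accuracy rather than alone.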

Result Evaluation
It was found from the comparative analysis that DC-BiLSTM and Boosting tree outperformed all the other classifiers. In general, for evaluating the performance of a classification model, an N × N confusion matrix is used, which compares the target labels with those predicted by the model. Therefore, we used 2 × 2 matrices for evaluating the results of binary classification. Tables 7-9 show confusion matrices for the internal and external test sets with magnification factors of 40× and 10×, respectively. In classifying benign and malignant tissue, the confusion matrix of each classifier demonstrated that the malignant samples were classified more accurately than the benign samples. This is because the tissue textures of malignant and benign samples are very different from each other. However, the malignant dataset also included low-grade samples whose texture pattern is fairly similar to benign tissue. Consequently, the classifiers identified some of the low-grade samples as benign and the misclassification rate for benign samples increased gradually. Figure 8 shows the receiver operating characteristic (ROC) curve and area under the ROC curve (AUC), which were generated to measure and compare the usefulness of the optimum AI models. AUCs of 1.00, 0.98, and 0.95 were achieved by the AI system, representing its ability to distinguish malignant from benign and grade 5 from grade 3 tissue samples.
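The AUC values in Figure 8 have a useful rank interpretation: the probability that a randomly chosen malignant sample receives a higher score than a randomly chosen benign one. A sketch via the Mann-Whitney formulation (the toy scores are illustrative only):

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC via the Mann-Whitney rank formulation: the fraction of
    positive/negative pairs in which the positive sample scores higher."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()   # positive outranks negative
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count half
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

auc = auc_score([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
print(auc)  # 1.0
```

An AUC of 1.00, as reported for one of the tasks above, corresponds to every malignant sample outranking every benign one.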

Discussion
We developed an AI-based CAD system for PCa classification that can achieve outstanding discrimination between benign and malignant biopsy tissue samples. This system can classify only binary samples for cancer detection. The initial step of this work was to analyze the ROIs of benign, grade 3, grade 4 and grade 5 tissue samples. We used QuPath open-source software for analyzing the tissue samples manually and separated the ROIs into two classes, namely benign and malignant (i.e., grade 3, grade 4, and grade 5). In benign samples, the glands are small and uniform in shape, with more stroma between glands, whereas in malignant samples, there are irregular masses of neoplastic glands, absence of glands and sheets of cells. Tissue-level texture analysis was performed for differentiating malignant from benign tumors and grade 5 from grade 3 tumors. In general, we analyzed the image texture by calculating the magnitude of the pixel values, peaks of the distribution values, randomness in the image values, homogeneity of the image intensity values, the coarseness of image texture, gray level intensity and a grouping of pixels with similar values.
Before we performed feature extraction, selection, and classification, we generated small patches of size 64 × 64 pixels from the selected ROIs and used the stain deconvolution method to separate the H&E channels. FOS-based features (energy, entropy, kurtosis, skewness, variance and uniformity) were calculated from each staining channel separately. Out of the six extracted features, the best five were selected using RFE and a one-way ANOVA technique based on feature ranking, p-values and effect size (eta squared). Later, the significant features of the two channels (H&E) were concatenated and classified using the AI models. Although the external test set was unknown to the learning models, the AI system performed well in the binary classification. The bar charts of the comparative analysis are given in the Supplementary Information (SI) (Figure S4a-c). Figure 9 shows the box plots for the two groups used for binary classification. The box plots were generated by calculating the mean feature values of each group. Five high-ranked radiomic features extracted from the independent and external test sets were used for comparing the texture differences between benign and malignant and between grade 3 and grade 5 tissue samples. It can be observed that the structure of the box plots and the mean values for each feature in Figure 9a,c are quite similar, which demonstrates that the texture of prostatic tissue in the independent and external test sets was relatively comparable. As a result, the classification algorithms had a good chance of classifying accurately and making the right decision.
In our previous study [47], we performed textural analysis and extracted a total of 12 radiomic features (second-order statistics) using the gray-level co-occurrence matrix (GLCM) method, of which the best 10 features were selected using one-way ANOVA. The image size was 512 × 512 pixels (24 bits/pixel). Feature classification was performed separately with 10 and 12 features using SVM and K-nearest neighbors (KNN) classification algorithms. The training-to-testing sample ratio was 8:2. The classification accuracies for SVM were 81.6% and 84.1% using 12 and 10 features, respectively, and the accuracies for KNN were 77.5% and 79.1% using 12 and 10 features, respectively.
In the present study, we used a window size of 64 × 64 pixels to extract 16 patches from a single image of size 256 × 256 pixels. The number of data samples for benign and malignant tissues was limited, and, therefore, patch extraction was performed to boost the number of training samples per class. The FOS features were extracted from the H&E staining channels after image representation was performed, and, therefore, our AI system produced effective results for the detection of PCa. Even though our proposed H&E network (DC-BiLSTM) achieved high accuracy on the internal test sets, it did not perform as well on the external test set as the Boosting tree classifier. This is because the proposed network was developed and fine-tuned on the private dataset, and the spatial distributions of intensities in histological sections differ from one dataset to another. It was difficult to anticipate the performance of the trained model on the external, or blind, test set. Nevertheless, the proposed network performed well and achieved satisfactory results, consistent with the hypothesis of this study.
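The tiling described above (sixteen non-overlapping 64 × 64 patches from a 256 × 256 image) can be sketched with a NumPy reshape; the function name and the synthetic image are illustrative, not part of the study's code.

```python
import numpy as np

def extract_patches(img: np.ndarray, size: int = 64) -> np.ndarray:
    """Tile a square image into non-overlapping size x size patches."""
    h, w = img.shape[:2]
    assert h % size == 0 and w % size == 0, "image must tile evenly"
    rows, cols = h // size, w // size
    # Reshape to (rows, size, cols, size, ...) and reorder so each
    # patch becomes one entry along the first axis, row-major.
    patches = img.reshape(rows, size, cols, size, *img.shape[2:])
    return patches.swapaxes(1, 2).reshape(rows * cols, size, size, *img.shape[2:])

image = np.zeros((256, 256, 3), dtype=np.uint8)  # stand-in for an RGB tissue image
patches = extract_patches(image, 64)
print(patches.shape)  # 16 patches per image, as in the study
```

Because the patches do not overlap, each pixel contributes to exactly one training sample, so the 16-fold increase in samples per class comes without duplicating image content.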
In medical image processing, feature extraction and the selection of key features are very important for microscopic biopsy, magnetic resonance (MR), ultrasound, and X-ray image analyses. However, many researchers rely only on clinician-recommended clinical features for disease classification and therefore cannot achieve better results. The texture of medical images carries a great deal of information in the spatial arrangement of colors or intensities. A radiomic method can be used to extract this information for AI-based disease classification, and thus better diagnostic results can be obtained.
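As an illustration, extracting the six FOS features named earlier from one staining channel might look like the following. The histogram-based definitions of entropy and uniformity follow common radiomics conventions and may differ in detail from the formulas used in this study (given in Table S1).

```python
import numpy as np
from scipy import stats

def fos_features(channel: np.ndarray, bins: int = 64) -> dict:
    """First-order statistics of one staining channel's intensities."""
    x = channel.astype(np.float64).ravel()
    hist, _ = np.histogram(x, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins before taking logarithms
    return {
        "energy":     float(np.sum(x ** 2)),
        "entropy":    float(-np.sum(p * np.log2(p))),
        "kurtosis":   float(stats.kurtosis(x)),
        "skewness":   float(stats.skew(x)),
        "variance":   float(np.var(x)),
        "uniformity": float(np.sum(p ** 2)),
    }

rng = np.random.default_rng(1)
patch = rng.random((64, 64))  # stand-in for one H or E channel of a patch
feats = fos_features(patch)
print(sorted(feats))
```

Computing this dictionary once per channel and concatenating the H and E results yields the dual-channel feature vector that the classifiers consume.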

Conclusions
We used various FOS features to perform AI-based classification and analyzed textural dissimilarities in prostate tissue images. The purpose of this paper was to analyze the significant features and classify them for PCa detection. Two-step feature selection was effective in selecting important features. Our models yielded promising results using FOS radiomic features extracted from patch images. We evaluated the performance and robustness of the models using private and public datasets collected from two different centers. All the AI models achieved high recall in classifying benign and malignant tissue samples, which is very helpful for researchers and clinicians. Each model was successfully validated using two internal and one external test dataset, achieving accuracies of 96.1%, 85.2%, and 88.2% using SVM; 96.1%, 85.1%, and 87.9% using LR; 95.6%, 80.8%, and 91.3% using Bagging tree; 96.0%, 86.0%, and 93.5% using Boosting tree; and 98.6%, 93.6%, and 89.2% using DC-BiLSTM, respectively.
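The two-step selection (RFE followed by one-way ANOVA with an eta-squared effect size) can be sketched on synthetic data as follows; the dataset, the linear-SVM ranker, and the feature count are illustrative assumptions, not the study's exact configuration.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic two-class data with six features, mirroring the six FOS features.
X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           n_redundant=0, random_state=0)

# Step 1: recursive feature elimination down to the best five features.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=5).fit(X, y)
keep = rfe.support_

# Step 2: one-way ANOVA per kept feature, with eta squared as effect size.
for i in np.flatnonzero(keep):
    groups = [X[y == c, i] for c in np.unique(y)]
    f_stat, p = f_oneway(*groups)
    grand = X[:, i].mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_total = ((X[:, i] - grand) ** 2).sum()
    eta_sq = ss_between / ss_total
    print(f"feature {i}: p={p:.3g}, eta^2={eta_sq:.3f}")
```

Features with small p-values and large eta squared would then be ranked highest and retained for classification.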
In this study, fine-tuning of the classification models was performed to reduce the overfitting problem. The performance evaluation of the AI models was carried out using 2 × 2 confusion matrices and ROC curves. Texture analysis of patch-based histopathological images is sometimes difficult due to spatial changes in image patterns. Therefore, some preprocessing, such as smoothing, image normalization, and intensity correction, is necessary to overcome this type of difficulty. In this study, gamma correction and stain deconvolution techniques were incorporated to adjust the intensity level and separate the staining channels of the tissue images, respectively. To analyze the texture of tissue images and extract significant information, feature-engineered radiomics techniques must be used. In future studies, it is highly recommended that AI models be validated using other histopathological datasets containing various cancer cases.
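The evaluation described above, a 2 × 2 confusion matrix and an ROC curve with its AUC, can be sketched with scikit-learn; the logistic-regression classifier and synthetic dataset here are illustrative only, not the study's data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a benign (0) vs malignant (1) feature matrix.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]  # probability of the malignant class

cm = confusion_matrix(y_te, clf.predict(X_te))  # rows: true class, cols: predicted
fpr, tpr, _ = roc_curve(y_te, scores)
auc = roc_auc_score(y_te, scores)
print(cm)
print(f"AUC = {auc:.3f}")
```

Recall (sensitivity) follows directly from the matrix as `cm[1, 1] / cm[1].sum()`, which is the metric emphasized for the benign-vs-malignant task.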
Supplementary Materials: The following are available online at https://www.mdpi.com/2072-6694/13/7/1524/s1, Figure S1: An example of a Bagging tree classifier, in which classification is carried out in a parallel direction; Figure S2: An example of a Boosting tree classifier, in which classification is performed in a sequential direction; Figure S3: The operation and structure of an LSTM cell; Figure S4: Comparative analysis graphs of four different evaluation metrics showing the binary classification results obtained using the different AI models; Table S1: The description and formula of each extracted FOS feature.