Is High Expression of Claudin-7 in Advanced Colorectal Carcinoma Associated with a Poor Survival Rate? A Comparative Statistical and Artificial Intelligence Study

Simple Summary
The need for predictive and prognostic biomarkers in colorectal carcinoma (CRC) brought us to an era where the use of artificial intelligence (AI) models is increasing. We investigated the expression of Claudin-7, a tight junction component, which plays a crucial role in maintaining the integrity of normal epithelial mucosa, and its potential prognostic role in advanced CRCs by drawing a parallel between statistical and AI algorithms. Claudin-7 immunohistochemical expression was evaluated in the tumor core and invasion front of CRCs and correlated with clinicopathological parameters and survival using statistical and AI algorithms. The Kaplan–Meier univariate survival analysis showed that the immunohistochemical overexpression of Claudin-7 in the tumor invasive front may represent a poor prognostic factor in advanced stages of CRCs. On the contrary, AI models could not predict the same outcome, probably because of the small number of patients included in our cohort.

Abstract
Aim: The need for predictive and prognostic biomarkers in colorectal carcinoma (CRC) brought us to an era where the use of artificial intelligence (AI) models is increasing. We investigated the expression of Claudin-7, a tight junction component, which plays a crucial role in maintaining the integrity of normal epithelial mucosa, and its potential prognostic role in advanced CRCs, by drawing a parallel between statistical and AI algorithms. Methods: Claudin-7 immunohistochemical expression was evaluated in the tumor core and invasion front of CRCs from 84 patients and correlated with clinicopathological parameters and survival. The results were compared with those obtained by using various AI algorithms. Results: The Kaplan–Meier univariate survival analysis showed a significant correlation between survival and Claudin-7 intensity in the invasive front (p = 0.00), a higher expression being associated with a worse prognosis, while Claudin-7 intensity in the tumor core had no impact on survival. In contrast, AI models could not predict the same outcome on survival. Conclusion: The study showed through statistical means that the immunohistochemical overexpression of Claudin-7 in the tumor invasive front may represent a poor prognostic factor in advanced stages of CRCs, contrary to AI models which could not predict the same outcome, probably because of the small number of patients included in our cohort.


Introduction
Patients with advanced stages of colorectal carcinoma (CRC) have a high risk of recurrence, with tumors in these stages exhibiting accelerated proliferation, increased tendency towards invasion and metastasis, and heterogeneity in treatment response [1][2][3][4]. Claudin-7, a tight junction component consisting of 211 amino acid residues, is one of the most important members of the claudin family and plays a crucial role in maintaining tight junction integrity, epithelial cell polarity, and ion permeability between cells (the integrity of epithelial mucosa) [5,6]. Recently, Claudin-7 has also been reported to be involved in non-tight junction-related functions such as inflammation initiation and in different tumor development steps [7]. Abnormal Claudin-7 expression was reported in a variety of cancers (e.g., ovary, breast, prostate, esophagus, stomach, colon, and lung) and has been associated with tumorigenesis, progression, and metastasis [5,6]. In CRC, up-regulation, down-regulation and even deletion of Claudin-7 have been reported, and have been described as an important step in tumorigenesis, invasion, epithelial-to-mesenchymal transition (EMT), metastasis and even tumor suppression [5,8-10].
Although Claudin-7 is involved in the pathogenesis of CRC through several distinct mechanisms, the literature data are inconsistent and limited [10][11][12][13]. Therefore, the aim of the present study was to investigate the expression of Claudin-7 and its potential prognostic role in advanced stages of CRC.
Although Artificial Intelligence (AI) is a well-established field of research, only in recent years has it been applied extensively in medicine, mainly through its subdomains, Machine Learning and Deep Learning. In medicine, AI algorithms are commonly used to predict the outcome of a treatment, find correlations among data, or diagnose certain pathologies. In CRC, AI algorithms were used to diagnose or foresee the evolution of the disease [14,15] or the outcome of a CRC treatment [16,17] at an early stage. More specifically, AI algorithms were used to diagnose CRC during colonoscopy [18,19], on biopsies [20] or using RNA molecular biology [21]. AI was also widely used in CRC surgery [22,23] or chemotherapy [24]. To our knowledge, no prior study has investigated the role of Claudin-7 expression in CRC using AI algorithms.

Study Group
The study group consisted of 84 patients with histologically confirmed advanced stage CRC (stage IV), diagnosed between 2008 and 2020, in "Sf. Spiridon" Emergency County Hospital Iasi, Romania. This study was approved by the Ethics Committee of "Sf. Spiridon" Emergency County Hospital, Iasi and written informed consent was obtained from all patients.

Tissue Microarray and Immunohistochemistry
Tumor samples were routinely processed by fixation in neutral buffered formalin 10% and paraffin-embedding. Tissue Microarrays (TMAs) were constructed using 2 punches (4 mm diameter) from each case, one from the tumor core and the other from the invasion tumor front. The control group included 20 samples of normal colonic mucosa resection margins harvested from at least 10 cm from adenocarcinoma. Immunohistochemical tests were performed using anti-Claudin-7 monoclonal antibody (rabbit anti-human, 1:1500, ab207300, Abcam, Cambridge, UK) after pretreatment with a specific epitope retrieval solution (pH 9) at 96 °C, for 25 min. The detection of the immunoreaction was performed using an UltraVision LP Detection System (ThermoFisher Scientific, Fremont, CA, USA) and 3,3′-Diaminobenzidine chromogen (DAB) (ThermoFisher Scientific, Fremont, CA, USA). The specificity of the immunoreactivity was checked by omitting the primary antibody and replacing it with non-immunized serum at the same dilution (negative control) for each TMA. The immunohistochemistry tests were performed on two replicates from each TMA.

Immunohistochemical Assessment Protocol
Assessment of Claudin-7 expression was carried out by two independent investigators, in a blind manner. Expression of Claudin-7 was defined as the presence of membranous staining in tumor cells. A semi-quantitative four-tiered scoring system was used to measure the proportion of stained tumor cells per core, as follows: 0 = <5% positive tumor cells; 1 = 5-30%; 2 = 30-60%; and 3 = >60%. The intensity of immunoreaction was evaluated using a four-tiered system: 0-negative, 1-weak, 2-moderate, 3-strong. Both the tumor core and the invasive front were evaluated. Discordant cases were re-evaluated in the panel in order to achieve a consensus. [25,26]. Survival was defined as the time elapsed from the date of diagnosis and therapy initiation to the date of death or of the last follow-up. For univariate survival analysis, the Kaplan-Meier method and scatter plots were used to estimate the overall survival (OS), and differences were analyzed using the log-rank test. Cox multiple regression was used to perform multivariate survival analysis. A p-value < 0.05 was considered to be statistically significant.
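The Kaplan-Meier estimate mentioned above can be illustrated with a minimal sketch. It is not the software used in the study; the function name `kaplan_meier` and the toy follow-up data are hypothetical and serve only to show how deaths and censored observations (patients alive at the last follow-up) enter the product-limit estimate.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up durations (any consistent unit)
    events -- 1 if death was observed, 0 if censored at the last follow-up
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival, curve = 1.0, []
    for t in sorted(deaths):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical toy data, not taken from the study cohort.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
curve = kaplan_meier(toy_times, toy_events)
```

Censored patients contribute to the number at risk up to their last follow-up but never trigger a drop in the curve, which is why the estimate can use all 84 cases even though not all patients had died.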
The survival rate was considered as the output, in two different forms, as a classification and as a regression problem. The data were preprocessed to compute the difference between the Visit Date and the Date of Death (when it was the case) in years. Then, three Survival class values were constructed: "2" when the difference was less than 2 years, "5" when it was between 2 and 5 years, and "T" (10) when it was greater than 10 years. When considering the problem in its regression form, the actual difference in years was the output.
Before applying the machine learning algorithms, it is useful to have an overall visual representation of the distribution of the data. In this section, each attribute is considered independently in a predictive relation to the output, which is an oversimplifying assumption, but it can provide some initial information about the problem. Figure 1 displays this for the classification problem (with 3 classes). One can see that the class values are equally distributed for the different input values, and there is no discernible pattern in this representation.
Figure 1. Visualization of independent inputs vs. outputs, for the classification problem (Cldn7-Claudin-7; P-percentage, I-intensity, D-membranous staining pattern: discontinuous vs. continuous).
The same can be said about the numeric values in Figure 2: for each input value (on the abscissa) there is a large range of output values (on the ordinate).
Therefore, it cannot be concluded that any input attribute can independently influence the survival rate of the patients.

Scenarios for AI Experiments
The experiments were performed in three scenarios:

• On the whole training set, to assess the ability of an algorithm to learn the data at all;
• With 10-fold cross-validation, which is the de facto standard for comparing different algorithms and assessing their generalization capabilities;
• With the leave-one-out approach, which is useful when the number of training instances is small (in our case, there are 84 patients).
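The three evaluation scenarios differ only in how the 84 instances are divided into training and test indices. The sketch below is illustrative: the helper names `k_fold_splits` and `leave_one_out_splits` are hypothetical, and the folds are contiguous, whereas machine learning toolkits usually shuffle (and often stratify) the instances first.

```python
def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over n instances."""
    indices = list(range(n))
    # Distribute the remainder so that fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def leave_one_out_splits(n):
    """Leave-one-out is k-fold with k equal to the number of instances."""
    return k_fold_splits(n, n)

n_patients = 84  # cohort size reported in the text
# Scenario 1: train and test on the same (whole) set.
resubstitution = (list(range(n_patients)), list(range(n_patients)))
# Scenario 2: 10-fold cross-validation.
ten_fold = list(k_fold_splits(n_patients, 10))
# Scenario 3: leave-one-out.
loo = list(leave_one_out_splits(n_patients))
```

With 84 instances, 10-fold cross-validation tests on 8 or 9 patients per fold, while leave-one-out trains 84 models, each tested on a single patient.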
The same one-to-one correspondence as carried out in the statistical approach between clinicopathological parameters (as inputs) and Claudin-7 expression (as outputs) was evaluated by means of information gain. This method assesses the relevance of an attribute for solving a classification problem. More specifically, if the class values can be perfectly distinguished by testing the values of an attribute, then that attribute can solve the classification problem by itself. In this ideal case, each attribute value corresponds to a partition of the dataset where the instances have the same class value. The resulting partitions have zero entropy, and thus the difference in entropy between the original dataset and the partitioned dataset is maximum. When the attribute values cannot partition the dataset perfectly, more homogeneous partitions are still preferred, and entropy can still be used as a homogeneity measure.
For a class C and an attribute A, this difference in entropy (the information gain) is defined as:
$$IG(C, A) = H(C) - H(C \mid A) = H(C) - \sum_j p(A_j)\, H(C \mid A_j)$$
where H is the entropy:
$$H(C) = -\sum_i p_i \log_2 p_i$$
The probabilities p_i can be directly computed from the data as the proportion of instances with a class value i. The same approach is used to compute the conditional entropy H(C | A_j) for each attribute value A_j. Basically, a larger information gain means that an attribute is more relevant for the classification problem.
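The information gain computation described above can be sketched in a few lines. The function names and the toy labels are hypothetical and chosen only to show the two extreme cases: an attribute that partitions the classes perfectly and one that carries no information.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    """IG(C, A) = H(C) - sum_j p(A_j) * H(C | A_j)."""
    n = len(labels)
    partitions = {}
    for a, c in zip(attribute_values, labels):
        partitions.setdefault(a, []).append(c)
    # Entropy of each partition, weighted by the partition's share of the data.
    conditional = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - conditional

# Hypothetical toy data, not taken from the study cohort.
labels = ["low", "low", "high", "high"]
perfect = ["a", "a", "b", "b"]   # each attribute value maps to a single class
useless = ["a", "b", "a", "b"]   # each attribute value mixes both classes equally

gain_perfect = information_gain(perfect, labels)
gain_useless = information_gain(useless, labels)
```

In the perfect case every partition has zero entropy, so the gain equals the entropy of the class itself; in the uninformative case the partitions are as mixed as the original data and the gain is zero.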
For the information gain algorithm, age (ranging from 20 to 90 years) was discretized into 10-year classes.

Multivariate Approach
The next analysis we performed using AI aims to select a subset of inputs (same clinicopathological parameters as before) with the maximum relevance to the outputs (Claudin-7 expression). A wrapper feature selection method uses a classification algorithm to assess the influence of increasingly larger subsets of input attributes over the output. This approach is a greedy one: starting with a single input, it successively adds other inputs and measures the resulting classification accuracy. The greedy approach implies that there is no backtracking and no exhaustive attempt to assess all possible attribute combinations, which would quickly become intractable for a medium to a large number of inputs.
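A minimal sketch of such a greedy wrapper is given below. It rests on several assumptions: the classifier is a simple majority-vote lookup table (a stand-in, not the algorithm used in the study), accuracy is measured on the training set, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def lookup_accuracy(rows, labels, subset):
    """Training-set accuracy of a majority-vote lookup table on `subset`.

    A deliberately simple stand-in classifier (an assumption, not C4.5).
    """
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[tuple(row[i] for i in subset)].append(label)
    # Each group predicts its majority class; count the instances it gets right.
    correct = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
    return correct / len(labels)

def greedy_forward_selection(rows, labels, score=lookup_accuracy):
    """Add one attribute at a time, always the best-scoring one; no backtracking."""
    remaining = list(range(len(rows[0])))
    selected, history = [], []
    while remaining:
        best = max(remaining, key=lambda i: score(rows, labels, selected + [i]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score(rows, labels, selected)))
    return history

# Hypothetical toy data: the label is attribute 0 XOR attribute 1; attribute 2 is noise.
rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 0), (1, 1, 0)]
labels = [a ^ b for a, b, _ in rows]
history = greedy_forward_selection(rows, labels)
```

In this toy case no single attribute classifies the data well, but the first two together do, which is the kind of interaction a wrapper can detect and a one-attribute-at-a-time measure cannot. For n attributes the greedy search evaluates on the order of n² subsets instead of all 2ⁿ.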

Claudin-7 Expression
In this retrospective study, the immunohistochemical assessment of Claudin-7 expression was performed on 84 tumor samples, with histologically confirmed advanced stage CRC (stage IV). At the time of the last clinical follow-up, 64 patients (76.2%) out of the total group had died. Expression of Claudin-7 was defined as the presence of membranous staining in tumor cells. As shown in Table 1, the proportion of stained tumor cells was graded 3 for almost all cases, both in the core and invasive front. However, Claudin-7 staining intensity showed a larger variation in the scoring system, both in the core and invasive front. In the control group samples, represented by normal colonic mucosa, the intensity and the proportion of the Claudin-7 immunoexpression were both graded as 3.
Abbreviations (Table 1): Cldn7, Claudin-7; Cldn7 (Core) P, the proportion of Claudin-7 stained tumor cells in the tumor core; Cldn7 (Core) I, the intensity of Claudin-7 immunoreaction in the tumor core; Cldn7 (Front) P, the proportion of Claudin-7 stained tumor cells in invasive front; Cldn7 (Front) I, the intensity of Claudin-7 immunoreaction in invasive front.
A discontinuous membranous staining pattern of Claudin-7 was observed mostly in the tumor invasion front (Table 2).
Claudin-7 immunoexpression in the normal colonic mucosa, staining intensity variations according to the scoring system and different Claudin-7 staining patterns (discontinuous vs. continuous) are shown in Figure 3.

Clinicopathological Parameters and Claudin-7 Expression
In order to evaluate the relation between Claudin-7 expression (in the tumor core and in the tumor invasive front, respectively) and clinicopathological parameters, associations were assessed using the Chi-squared and Fisher's exact tests (Table 3). A significant correlation (p = 0.033) was identified between Claudin-7 invasive front intensity and tumor leukocyte infiltrate, implying that a decrease in Claudin-7 intensity in the tumor invasive front is associated with an increase in leukocyte infiltrate. However, neither the proportion nor the intensity of Claudin-7 expression was correlated with age, sex, T stage, N stage, grading, tumor location, venous invasion, lympho-vascular invasion, perineural invasion, growth pattern, tumor deposits or tumor budding (p > 0.05).

Survival and Claudin-7 Expression
As previously mentioned, survival was defined as the time elapsed from the date of diagnosis and therapy initiation to the date of death or of the last follow-up.
The Kaplan-Meier univariate survival analysis using the log-rank test showed a significant correlation between survival and Claudin-7 intensity in the invasive front (p = 0.00), with a higher expression (score 3) being associated with a worse prognosis. This decrease in survival occurred independently of Claudin-7 intensity in the tumor core, which had no impact on survival. In addition, scatter plots were also used for the statistical analysis of survival data (Figure 4).

Claudin-7 vs. Survival Rate AI Analysis
Tables 4 and 5 present the results of the algorithms mentioned in the Materials and Methods section for the two considered problems (classification and regression, respectively), in terms of accuracy for classification and coefficient of determination for regression. The algorithms in Tables 4 and 5 are different because some algorithms cannot be applied to both types of problems.

Explicit Rules with Frequent Support
Some rules generated with the NNGE algorithm are presented below. These rules were detected from the whole dataset used for training. In the brackets at the end of each rule, the number of instances covered by the rule is included. Given the findings from the previous sections, it must be stressed that they may not be statistically relevant from the generalization point of view, but only a frequent pattern in the data. Still, these rules show some information related to higher rates of survival and they may need to be checked by the medical experts in case they provide some helpful clues regarding the appropriate treatment:  (6) The following rules consider the six Claudin-7 attributes as inputs, together with the age of the patient in years:

Correspondence between Clinicopathological Parameters and Claudin-7
For our case study, Table 6 shows the InfoGain measure for each combination of inputs and outputs. Abbreviations: Cldn7-Claudin-7; P-percentage; I-intensity. In each column corresponding to an output Oi, the red values marked by bold typeface represent the first (i.e., best) value and the blue ones represent the second and third values.

Multivariate Approach
In the following experiments, we use the order of attributes found by information gain and use the C4.5 decision tree algorithm, which naturally uses information gain for node splits. The full training set is used for these case studies. The results of this technique are presented in Figure 5.

For output O1, only I4 and I8 are enough to correctly classify the instances (Figure 5). This is not the case for the other outputs, which require all the inputs for maximum accuracy. For example, in the case of output O2, Cldn7 (Core) I, if only input I3, T-stage, is used, the decision tree has an accuracy of 60.5263%. If input I8, Leukocyte Infiltrate, is added, i.e., only the attributes I3 and I8 are used, the classification has a larger accuracy of 69.7368%. When I3, I8, and I4 are used, the accuracy becomes 80.2632%. Adding more inputs no longer increases the accuracy until the final attribute I11 is added, which leads to a 100% classification accuracy. The other subfigures in Figure 5 present the increase in accuracy for different orders of inputs corresponding to their respective outputs, as found by the information gain method.

Discussion
This retrospective study was designed to investigate the immunohistochemical expression of Claudin-7 and the potential prognostic significance in advanced CRCs. Furthermore, we aimed to draw a parallel between classical statistical algorithms and those used in AI.
Since its first discovery, a variety of studies suggested a link between Claudin-7 low expression and CRCs development and progression [5,27-32]. However, other studies ob-
The present study identified a significant correlation between Claudin-7 intensity in the tumor invasive front and tumor leukocyte infiltrate (p = 0.033), implying that a decrease in Claudin-7 intensity in the tumor invasive front is associated with an increase in leukocyte infiltrate. To our knowledge, this is the first study to investigate the correlation of Claudin-7 immunohistochemical expression with inflammatory infiltrate in CRCs in humans. Consistent with our findings, but in animal models, Wang et al. reported that loss of Claudin-7 increases colonic infiltration of leukocytes during experimental colitis and demonstrated the promotion of colitis and colitis-associated CRC in a Claudin-7 knockout mouse model [34].
We found that, in almost all cases (98.8% in the tumor core and 91.66% in the tumor invasive front), more than 60% of the tumor cells (grade 3) showed a high expression of Claudin-7. Similar results were reported by Kuhn et al. and Darido et al., who found a low expression of Claudin-7 in normal colonic crypts, while in CRCs the expression was high. They identified the Claudin-7-EpCAM-CO-029-CD44v6 complex and its upregulation also in hepatic metastasis of patients with CRC, and a significant correlation was found between this complex and the clinical diversity, apoptosis resistance, and disease-free survival [9,33].
On the other hand, in contradiction to our results and also to the above-mentioned studies, Bornholdt et al. found an intense immunohistochemical reaction of Claudin-7 in normal colonic tissue, but a decreased or absent reaction in dysplastic and CRC tissue. These observations were sustained by the Claudin-7 mRNA levels, suggesting an early change in CRC carcinogenesis [29]. Moreover, Xu et al. proved that the positivity rate of Claudin-7 expression was significantly lower in CRC tissues than in peritumoral normal tissue and Claudin-7 expression was correlated with the grade of differentiation, being downregulated in well-differentiated adenocarcinomas, further downregulated in moderately differentiated adenocarcinomas, and significantly downregulated in poorly differentiated adenocarcinoma [5].
When analyzing the relationship between Claudin-7 expression and the morphological manifestations of EMT (tumor budding), we found no significant correlation. However, in contrast to our results, Philip et al., Bhat et al. and Wang et al. concluded that Claudin-7 low expression or downregulation induced EMT, which plays a major role in CRC invasion, progression and metastasis [30][31][32]. Furthermore, Xu et al. analyzed the effects of Claudin-7 knockdown in CRC stem cells through cell proliferation, migration and apoptosis assays and reported changes in cell characteristics such as promotion of cell proliferation and migration, inhibition of cell apoptosis, and the presence of EMT [35].
The present study found that when using a statistical approach, except for the leukocyte inflammatory infiltrate, Claudin-7 expression was not correlated with other clinicopathological parameters (age, sex, T stage, N stage, grading, tumor location, venous invasion, lympho-vascular invasion, perineural invasion, growth pattern, tumor deposits and tumor budding; p > 0.05), either in the tumor core or in the tumor invasive front. Consistent with our results, Hou et al., who conducted a study to explore the role of Claudin-7, a p53 regulated gene, in tumorigenesis and progression of CRC through quantitative real-time PCR, Western blot, a luciferase reporter assay, and immunohistochemistry, found no correlation between clinicopathological parameters (tumor size, invasion depth, lymphatic metastasis, stage III/IV) and Claudin-7 high expression. In addition, Claudin-7's high expression was significantly correlated with a favorable prognosis [36]. However, in contrast with these findings, in our study, the Kaplan-Meier univariate survival analysis, by using the log-rank test and the scatter plots, showed a significant correlation between survival and Claudin-7 intensity in the invasive front (p = 0.00), where a higher expression was associated with a worse prognosis.
Whenever conducting multivariate analysis, such as with the Cox regression method, one of the major pitfalls is having an insufficient number of outcome events (such as death) relative to the number of variables analyzed in the model. This proportion has been termed EPV (events per variable) [37,38], and a small value of the EPV affects the accuracy (risk estimates) and precision (95% confidence intervals) of odds or hazard ratios of the variables included, which may render misleading results. An adequate minimum value of the EPV is 10-20 (at the very least 10 outcome events per variable analyzed) [37]. In keeping with this rule, we found multivariate analysis unsuitable for our study (84 cases, 64 events, 15 variables).
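The EPV argument can be made explicit with the figures reported above (64 events, 15 variables). The following arithmetic sketch uses a hypothetical helper name and takes the lower bound of 10 from the guideline cited in the text.

```python
def events_per_variable(n_events, n_variables):
    """EPV: number of outcome events divided by the number of candidate predictors."""
    return n_events / n_variables

epv = events_per_variable(n_events=64, n_variables=15)  # figures reported in the text
meets_minimum = epv >= 10   # lower bound of the 10-20 guideline cited in the text
max_variables = 64 // 10    # variables that 64 events would support at EPV >= 10
```

The resulting EPV of roughly 4.3 is well below 10, and 64 events would support at most 6 variables at that threshold, which is consistent with the decision not to rely on the multivariate model.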
Taken together, our study advocates for the potential prognostic and therapeutic role of Claudin-7 in advanced CRCs.
Concerning the study of AI, in this paper, we presented various analyses based on machine learning methods. In contrast to the findings of the statistical approach, we found that none of the applied algorithms can properly predict the survival rate based on the Claudin-7 inputs; from the AI perspective, it is therefore likely that there is no correlation between these inputs and outputs (Tables 4 and 5). We have considered the input data both in discrete and continuous forms.
When considering the influence of clinicopathological parameters on Claudin-7 expression based on information gain, by analyzing the inputs on the rows in Table 6, one can identify the most relevant inputs for all the outputs. The applied information gain algorithm found that the most relevant single inputs are Leukocyte Infiltrate and N-stage. Age was found to be relevant for the Cldn7 (Core) P, Cldn7 (Front) P and Cldn7 (Front) I. T-stage was found to be relevant only for the Cldn7 (Core) I expression. These findings are consistent only with the correlation found by statistical means between Leukocyte Infiltrate and Claudin-7 intensity in the tumor invasive front.
The analysis based on subsets of input attributes shows that no single attribute can lead to a high accuracy classification. The actual accuracy values depend on the base classification algorithm wrapped for feature selection. For the multivariate approach using AI algorithms, we used C4.5 decision trees and information gain algorithms, which generally provide good results for the analyzed issue.
We found that two or three attributes from the set of Leukocyte Infiltrate, T-stage, N-stage, and Age increase the accuracy to more than 75% for each of the four outputs (Cldn7 (Core) P, Cldn7 (Core) I, Cldn7 (Front) P, and Cldn7 (Front) I). For Cldn7 (Core) P and Cldn7 (Front) P, the results are very good, with accuracies of 100% and 92%, respectively. However, with the exception of Cldn7 (Core) P, all input attributes are necessary to obtain maximum accuracy.
Even if the accuracy remains constant when adding more attributes, this does not mean that they are irrelevant to the classification. Figure 6 presents a similar evolution of accuracy for O2, but only considers the attributes that cause an increase in accuracy in Figure 5. It can be observed that only an accuracy of 82.8947% is eventually attained for these six inputs.
Regarding the fact that our method is not able to fully classify output O4 in Figure 5, we must stress the fact that the final accuracy depends not only on the subset of attributes but also on the classification algorithm used by the wrapper. In this case, C4.5 is not able to perform very well; however, other algorithms may have different levels of performance. The same wrapper procedure using a Random Forest leads to another evolution of accuracy, which finally reaches 100%, as shown in Figure 7.

Conclusions
Given that the patient database is small (under 100 patients), classical machine learning methods seem to be a reasonable choice. With various error rates, all the applied algorithms support the finding that the survival rate cannot be predicted based only on Claudin-7 expression, yet there are correlations between clinicopathological parameters and Claudin-7, e.g., Leukocyte Infiltrate. Different outcomes might result from applying more complex deep learning techniques, but they may require a larger database of patients, since less data may result in overfitting and thus unreliable results. However, classical statistical algorithms have once again proven their crucial role in this research field.
By contrast, from a statistical point of view, the study showed that immunohistochemical intensity overexpression of Claudin-7 in the tumor invasive front may represent a poor prognostic factor in the advanced stages of CRCs.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of "Sf. Spiridon" Emergency County Hospital, Iasi, Romania (2019).

Informed Consent Statement:
Written informed consent has been obtained from the patients to publish this paper.