Early Screening of Colorectal Precancerous Lesions Based on Combined Measurement of Multiple Serum Tumor Markers Using Artificial Neural Network Analysis

Many patients with colorectal cancer (CRC) are diagnosed at an advanced stage, resulting in delayed treatment and reduced survival time. Accurate early screening methods for CRC are therefore urgently needed. The purpose of this study was to develop an artificial intelligence (AI)-based artificial neural network (ANN) model using multiple protein tumor markers to assist in the early diagnosis of CRC and precancerous lesions. In this retrospective analysis, 148 cases with CRC and precancerous diseases were included. The concentrations of multiple protein tumor markers (CEA, CA19-9, CA 125, CYFRA 21-1, CA 72-4, CA 242) were measured by electrochemical luminescence (ECL) immunoassays. By combining these markers with an ANN algorithm, a diagnostic model (CA6) was developed to distinguish normal healthy subjects from abnormal subjects, with an AUC of 0.97. The prediction score derived from the CA6 model also performed well in assisting the diagnosis of precancerous lesions and early CRC (AUCs of 0.97 and 0.93, with cut-off values of 0.39 and 0.34, respectively), outperforming individual protein tumor markers. The CA6 model established by ANN provides a new and effective method for laboratory auxiliary diagnosis. With more tumor markers and a larger sample size, it might be used for early colorectal lesion screening.


Introduction
Colorectal cancer (CRC) is one of the leading causes of cancer death in both men and women worldwide [1]. Most CRC patients are diagnosed at an advanced stage because early-stage disease is usually asymptomatic. The five-year survival rate for metastatic CRC remains very low, at around 14%. In contrast, scientific and clinical advances in early detection and surgery have raised five-year survival rates to 90% and 71% for localized and regional CRC, respectively [2]. Given these figures, CRC screening has been recommended in many countries, including China. The development of sensitive, efficient, and reliable testing techniques is therefore essential for the early diagnosis of CRC and precancerous lesions, providing more opportunities for effective treatment and intervention.
Prior to starting the study, ethical approval was obtained from the Ethics Committee of Xinhua Hospital, Shanghai Jiao Tong University School of Medicine (Approval No. XHEC-D-2023-093). All cases with a final diagnosis registered in the Pathology Laboratory of Xinhua Hospital from March 2022 to February 2023 were included in this study. The cases were divided into two groups: precancerous disease and CRC.

Inclusion criteria were as follows:
(1) Precancerous disease: patients with colonoscopy or pathological diagnosis results who were clinically diagnosed with diseases including adenomatous polyp, hyperplastic polyp, and inflammatory polyp.
(2) CRC: patients with colorectal adenocarcinoma, enrolled based on clinical and histopathological findings.

Exclusion criteria were unclear clinical diagnosis, repeated examination or treatment, and other coexisting malignancies. For patients with precancerous disease or CRC who had been admitted multiple times, only the data from the first diagnosis before any treatment (including surgical treatment and drug therapy) were considered, to minimize bias.
A total of 148 cases that fulfilled the eligibility criteria were included in this study, comprising 74 (50%) precancerous lesion cases and 74 (50%) CRC cases. TNM stage was reassigned according to the 8th edition of the AJCC Cancer Staging Manual [25]. The clinical guidelines for the diagnosis and treatment of CRC released by the Chinese Society of Clinical Oncology (CSCO) in 2021 define early-stage CRC as follows:
(1) cancer cells are confined to the mucosal lamina propria, or
(2) cancer cells penetrate the muscularis mucosae to infiltrate the submucosa without involving the muscularis propria.

Of the 74 CRC cases, 18 (24.3%) were at an early stage. To provide a reference group for comparison, 61 apparently healthy individuals undergoing routine physical examinations were selected as normal healthy controls.

Measurement of Protein Marker Concentration Using ECL Immunoassay
Five milliliters of venous blood was collected from fasting subjects in the morning and centrifuged at 3000 rpm for 10 min. The serum was separated, stored at −80 °C, and thawed immediately before testing. The concentrations of nine serum protein tumor markers (CEA, CA19-9, CA 125, CYFRA 21-1, CA 72-4, CA 242, CA 153, AFP, and SCC) were measured with an ECL immunoassay analyzer according to Roche's ECL instruction manual. The total assay time was 15–25 min.

The assay proceeded as follows. The sample was added to a cuvette, followed by a biotin-labeled capture antibody and a ruthenium complex-labeled detection antibody. The mixture was incubated to form sandwich immune complexes. Streptavidin-coated magnetic particles were then added to capture the immune complexes through biotin–streptavidin binding. The reaction mixture was aspirated into the measuring cell, where the particles were magnetically captured on the electrode surface and washed with tripropylamine and a cleaning solution. Next, the chemiluminescence reaction took place on the electrode surface. The signal was detected by a photomultiplier tube, and the concentration was determined from calibration curves.

When a result fell outside the detection limits, the corresponding upper or lower limit value was recorded for statistical analysis. Internal quality control, accuracy, precision, and other performance characteristics of these assays were acceptable, ensuring that the included patients' data were accurate and reliable.
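The rule for results outside the detection limits amounts to clipping each value to the analyzer's reportable range. The following is a minimal Python sketch under stated assumptions. The limits used here (0.2 and 1000.0) and the readings are hypothetical placeholders. They are not the assay limits or data of this study.

```python
from typing import Sequence


def clip_to_measuring_range(values: Sequence[float], lower: float, upper: float) -> list[float]:
    """Replace any result below `lower` or above `upper` with that limit value.

    `lower` and `upper` are hypothetical placeholders here, not the
    detection limits of the Roche assays used in the study.
    """
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    return [min(max(v, lower), upper) for v in values]


# Made-up CEA-like readings (ng/mL); the first and last fall outside the range.
raw = [0.1, 2.4, 5.8, 1500.0]
print(clip_to_measuring_range(raw, lower=0.2, upper=1000.0))
```

Clipping rather than discarding keeps every subject in the analysis. As a result, extreme values still rank at the top or bottom of the distribution, which matters for rank-based statistics such as the Wilcoxon test and the AUC.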

Development of ANN-Based Prediction Model
All records and observations were reviewed by two board-certified colorectal pathologists. Patients with early-stage CRC accounted for 24 21.6%. Z-score normalization was performed before modeling: each variable was standardized by subtracting the mean and dividing by the standard deviation of the original data. The variables associated with the patients were used to develop the ANN-based prediction model.

During ANN modeling, the data were randomly divided into two subsets (Table 1):
(1) a training subset of 146 subjects (approximately 70%), used to construct the model;
(2) a validation subset of the remaining 63 subjects (approximately 30%), used to test the model.

The ANN model was built with the scikit-learn (sklearn) library. Its architecture comprises an input layer, one hidden layer, and an output layer. The hidden layer contains 100 neurons and uses the rectified linear unit (ReLU) activation function, which allows the network to model nonlinear relationships between the markers and the outcome. We used L-BFGS as the solver and set the regularization parameter (alpha) to 0.01 to control model complexity and reduce overfitting.

Six markers that could each distinguish the normal healthy and abnormal groups in receiver operating characteristic (ROC) curve analysis were included in the model: CEA, CA19-9, CA 125, CYFRA 21-1, CA 72-4, and CA 242. The ANN model outputs a prediction score between 0 and 1, and a higher score indicates a higher risk of being abnormal. Thresholds selected on the training set were then used to calculate the sensitivity and specificity of the validation set. Finally, hyperparameter search was used to tune the model, that is, to identify a better feature screening (dimensionality reduction) algorithm and its parameters.
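The model configuration described above can be sketched with scikit-learn's `MLPClassifier`. This is a minimal sketch, not the DxAI platform's actual pipeline, and it rests on several assumptions:
(1) The marker data are synthetic random values, not study data.
(2) The seeds are arbitrary.
(3) `max_iter=2000` is chosen only to give L-BFGS room to converge.
(4) The `StandardScaler` is fitted inside a pipeline on the training subset only, which avoids leaking validation statistics. The text does not state which subset the mean and standard deviation were computed from.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

MARKERS = ["CEA", "CA19-9", "CA125", "CYFRA21-1", "CA72-4", "CA242"]

# Synthetic stand-in data (NOT study data): 61 "normal" and 148 "abnormal" subjects,
# with abnormal subjects given higher simulated marker levels.
rng = np.random.default_rng(0)
X_normal = rng.lognormal(mean=0.0, sigma=0.5, size=(61, len(MARKERS)))
X_abnormal = rng.lognormal(mean=0.8, sigma=0.7, size=(148, len(MARKERS)))
X = np.vstack([X_normal, X_abnormal])
y = np.array([0] * 61 + [1] * 148)  # 0 = normal healthy, 1 = abnormal

# Random 70/30 split of the 209 subjects.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = make_pipeline(
    StandardScaler(),  # z-score normalization, fitted on the training subset only
    MLPClassifier(
        hidden_layer_sizes=(100,),  # one hidden layer with 100 neurons
        activation="relu",
        solver="lbfgs",
        alpha=0.01,  # L2 regularization strength
        max_iter=2000,
        random_state=0,
    ),
)
model.fit(X_train, y_train)

# Prediction score in [0, 1]: the estimated probability of the "abnormal" class.
val_scores = model.predict_proba(X_val)[:, 1]
print(len(X_train), len(X_val), val_scores[:5].round(3))
```

With 209 subjects and `test_size=0.3`, `train_test_split` puts 63 subjects in the validation subset and 146 in the training subset, which matches the split sizes reported above.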

Testing the Performance of ANN-Based Prediction Model
The performance of the AI model based … Table S1. In addition, the area under the receiver operating characteristic curve (AUC) was used to compare the predictive power of the model. The model's prediction for each subject was expressed quantitatively as the prediction score.
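The AUC has a simple probabilistic interpretation. It is the probability that a randomly chosen abnormal subject receives a higher prediction score than a randomly chosen normal subject, with ties counted as one half. Equivalently, it is the Mann–Whitney U statistic divided by the product of the two group sizes. Below is a minimal pure-Python sketch that uses made-up scores, not study data.

```python
from typing import Sequence


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as P(random positive scores higher than random negative), ties = 1/2.

    This equals the Mann-Whitney U statistic divided by n_pos * n_neg.
    The pairwise loop is O(n_pos * n_neg), which is fine for a few hundred subjects.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tied pair counts as half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Toy prediction scores (made up, not study data).
abnormal = [0.91, 0.75, 0.62, 0.40]
normal = [0.10, 0.35, 0.40, 0.05]
print(roc_auc(abnormal, normal))
```

Here 15.5 of the 16 abnormal/normal pairs are ordered correctly (one pair is tied at 0.40), so the sketch prints 0.96875.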

Statistical Analysis
The Deepwise & Beckman Coulter DxAI platform (https://dxonline.deepwise.com/login) (accessed on 26 June 2023) was used to run the ANN algorithm. This platform is built on scikit-learn 1.2.2 and wraps its neural network models for modeling. A detailed introduction to the algorithm and the original code is available at (https://scikit-learn.org/stable/modules/neural_networks_supervised.html) (accessed on 26 June 2023).

Continuous variables were presented as means ± SD, and the significance of differences was determined with Student's t-test or the Wilcoxon rank sum test. Confidence intervals (CIs) were used to estimate population parameters from the sample. Chi-square tests were performed with the SPSS 12.0 statistical package. A p-value less than 0.05 was considered statistically significant.

Comparison of Tumor Marker Levels among Different Groups
Based on the inclusion and exclusion criteria, 209 cases were selected: 74 CRC cases, 74 cases of benign precancerous disease, and 61 normal healthy controls. The concentrations of protein tumor markers were measured by ECL immunoassays, and the results are shown in Figure 1A. The analytical performance of the assays is summarized in Table S2. We compared the levels of the nine protein tumor markers among the three groups. The level of CEA … showed significant differences (p < 0.05).
ROC curve analysis was performed separately on the training set, the validation set, and all cases. In each case, it showed that the CA6 model distinguished well between the normal healthy and abnormal groups, with AUC values of 0.98 (95% CI 0.96–1.00), 0.96 (95% CI 0.92–1.00), and 0.97 (95% CI 0.95–0.99), respectively (Table 2, Figure 2A–C).

(B) Comparison of the artificial neural network (ANN)-derived prediction score among the three groups. * p < 0.05, significantly higher in benign disease than in normal healthy subjects. ** p < 0.05, significantly higher in the colorectal cancer (CRC) group than in normal healthy or benign disease subjects.

Evaluation of ANN Model Prediction Efficiency
A total of 209 cases, divided into normal healthy and abnormal groups, were used to train the ANN model. The model used the parameters described above and six markers: CEA, CA 19-9, CA 125, CA 242, CYFRA 21-1, and CA 72-4. The resulting ANN model, named CA6, outputs a prediction score. The model fit the training data well, with an accuracy of 94%; the AUC, sensitivity, and specificity of the training set were 0.98, 93%, and 95%, respectively. In the validation set, the AUC, accuracy, sensitivity, and specificity were 0.92, 83%, 96%, and 50%, respectively.
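The reported training and validation metrics can be checked against confusion-matrix counts. These counts are implied by the false-positive and false-negative figures given in the consistency analysis:
- Training: 2/98 false positives and 7/48 false negatives.
- Validation: 9/52 false positives and 2/11 false negatives.

The cell counts below are our reconstruction from those figures. They assume that the denominators are the numbers of subjects predicted abnormal and predicted normal, respectively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # diagnosed abnormal, predicted abnormal
    fp: int  # diagnosed normal, predicted abnormal
    fn: int  # diagnosed abnormal, predicted normal
    tn: int  # diagnosed normal, predicted normal

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


# Reconstructed counts (assumption): 98 predicted abnormal incl. 2 FP, 48 predicted normal incl. 7 FN.
train = Confusion(tp=96, fp=2, fn=7, tn=41)
# Reconstructed counts (assumption): 52 predicted abnormal incl. 9 FP, 11 predicted normal incl. 2 FN.
val = Confusion(tp=43, fp=9, fn=2, tn=9)

for name, cm in [("training", train), ("validation", val)]:
    print(f"{name}: sens={cm.sensitivity():.1%} spec={cm.specificity():.1%} acc={cm.accuracy():.1%}")
```

Under that assumption, the sketch reports the following, all of which round to the published values:
- Training: sensitivity 93.2%, specificity 95.3%, accuracy 93.8%.
- Validation: sensitivity 95.6%, specificity 50.0%, accuracy 82.5%.

The implied group sizes also match the study population: 103 + 45 = 148 abnormal subjects and 43 + 18 = 61 normal subjects.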

Diagnostic Efficacy Comparison of ANN Model with Other Markers
We used ROC curves to compare the ability of the prediction score and of individual tumor markers to distinguish the normal healthy group from the abnormal group. The prediction score performed much better than individual tumor markers, with an AUC of 0.97 (95% CI 0.95–0.99, standard error 0.010, p < 0.001). Under the condition of p < 0.05, the AUCs of individual

Consistency between ANN Model and Clinical Diagnosis
The agreement between the predictions of the CA6 model and the actual diagnosis was further analyzed. In both the training and validation sets, patients' risk scores were consistent with clinical diagnosis results (p < 0.05) (Figure 3A). Figure 3A shows the misclassified subjects in two histograms:
- Upper histogram: colors mark subjects whom the model predicted to be abnormal but who were normal on pathological diagnosis (false positive: 2/98 in training and 9/52 in validation).
- Lower histogram: subjects with an abnormal pathological diagnosis whom the model predicted to be normal (false negative: 7/48 in training and 2/11 in validation).

These results demonstrate that the false positive and false negative rates of the CA6 model were lower than those of conventional tumor markers.

The calibration curve (Figure 3B) shows that the predicted values of the model were close to the actual diagnosis probability. A calibration curve is an evaluation tool suited to probabilistic models such as ANNs. It plots the predicted value on the abscissa against the observed value on the ordinate, and the closer the curve lies to the diagonal, the better the model is calibrated.

Finally, the numbers of normal healthy and abnormal subjects predicted in the training and validation sets were compared with the pathological diagnosis results. The chi-square test showed good consistency (χ² = 107.794, p < 0.001 and χ² = 18.515, p < 0.001) (Table 3). These results show that the predictions of the CA6 model agree well with the actual diagnosis.
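The chi-square statistics quoted above are consistent with a Pearson test without continuity correction on the 2 × 2 table of predicted versus diagnosed status. The cell counts in the sketch below are our reconstruction from the false-positive and false-negative figures:
- Training: 2/98 false positives and 7/48 false negatives.
- Validation: 9/52 false positives and 2/11 false negatives.

They are not values copied from Table 3.

```python
def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no Yates continuity correction) for the 2x2 table

        [[a, b],
         [c, d]]

    computed as n * (a*d - b*c)**2 / ((a+b) * (c+d) * (a+c) * (b+d)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Rows: predicted abnormal / predicted normal; columns: diagnosed abnormal / diagnosed normal.
# Counts are reconstructed from the reported false positives and false negatives (assumption).
print(round(pearson_chi2_2x2(96, 2, 7, 41), 3))  # training subset
print(round(pearson_chi2_2x2(43, 9, 2, 9), 3))   # validation subset
```

The sketch prints 107.794 and 18.515, matching the reported statistics. This suggests, but does not confirm, that no Yates correction was applied. SciPy users would obtain the same values from `scipy.stats.chi2_contingency` with `correction=False`.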

Evaluation of ANN Model Prediction Efficiency in Early Colorectal Diseases
To evaluate the diagnostic efficiency of the CA6 model for benign diseases and early-stage CRC, 74 cases of benign disease and 18 cases of early-stage CRC were analyzed. The CA6 model showed high diagnostic efficiency in distinguishing normal healthy subjects from patients with benign disease or early-stage CRC.

For benign disease, the model had an AUC of 0.97 (95% CI 0.94–0.99); with the cut-off value set to 0.39, the specificity was 80% and the sensitivity was 94% (Table 4 and Table S4). Diagnostic efficiency for early-stage CRC was also good, with an AUC of 0.93 (95% CI 0.87–0.97), a cut-off value of 0.34, a specificity of 75%, and a sensitivity of 94%. For both benign disease and early CRC, the AUC of the prediction score was significantly higher than that of individual protein tumor markers (Figure 4 and Table S4).

When benign disease and early CRC cases were pooled, the AUC of the prediction score was also significantly higher than that of individual protein tumor markers. It reached 0.96 (95% CI 0.94–0.99), with a cut-off value of 0.30, a specificity of 72%, and a sensitivity of 97%. The numbers of subjects classified from the ROC curve as normal or as having early colorectal disease were compared with the actual diagnosis, and the chi-square test showed good agreement (χ² = 89.172, p < 0.001) (Table 5).

These results illustrate that the CA6 model established by the ANN algorithm has strong potential diagnostic efficacy not only for all abnormal subjects but also for benign disease and early-stage CRC. This is of great significance for improving the clinical recognition of early colorectal diseases.
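The text does not state how the cut-off values (0.39, 0.34, and 0.30) were chosen. One common criterion, assumed here purely for illustration, is the threshold that maximizes Youden's index J = sensitivity + specificity − 1. Below is a minimal sketch that uses made-up scores, not study data.

```python
from typing import Sequence


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A subject is called positive when score >= cutoff.
    labels: 1 = disease (benign lesion or early CRC), 0 = normal healthy.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both diseased and healthy subjects")

    best_cut, best_tp, best_tn = 0.0, 0, 0
    best_score = -n_pos * n_neg - 1  # below the minimum possible J * n_pos * n_neg
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        # J * n_pos * n_neg, kept as an integer so equal J values compare exactly
        score = tp * n_neg + tn * n_pos - n_pos * n_neg
        if score > best_score:  # strict ">" keeps the lowest cut-off on ties
            best_cut, best_tp, best_tn, best_score = cut, tp, tn, score
    return best_cut, best_tp / n_pos, best_tn / n_neg


# Made-up prediction scores: 1 = benign disease or early CRC, 0 = normal healthy.
scores = [0.05, 0.12, 0.22, 0.36, 0.41, 0.33, 0.47, 0.58, 0.81, 0.93]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(youden_cutoff(scores, labels))
```

For the toy data, the sketch returns (0.47, 0.8, 1.0). A cut-off of 0.47 classifies four of the five diseased subjects as positive and none of the healthy subjects.

J is compared as an integer multiple of n_pos × n_neg so that equal J values are not separated by floating-point rounding. When two cut-offs tie, the lowest one wins.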

Discussion
Many CRC patients have no symptoms or signs in the early stage and therefore cannot be diagnosed and treated in time. Delayed diagnosis and treatment significantly reduce patient survival time. CRC is mainly diagnosed by endoscopy and pathological biopsy combined with clinical symptoms [26]. Although these methods are effective, they are invasive and can harm patients, which makes them unsuitable for dynamic monitoring of disease.

Non-invasive methods used for large-scale screening, such as fecal hemoglobin testing, have been shown to reduce CRC-related mortality [27,28]. These methods have advantages in cost, safety, and convenience [29]. However, they usually suffer from low sensitivity and specificity and might produce false positive results if subjects do not comply with screening recommendations. In addition, fecal or plasma DNA methylation tests, next-generation sequencing (NGS) of ctDNA, and similar approaches have their own shortcomings, such as low specificity and high cost. Current methods for diagnosing early colorectal lesions, especially precancerous lesions, are therefore unsatisfactory. It is important to develop safe, reliable, specific, and effective methods for the accurate diagnosis of early colorectal lesions.

Previous studies have demonstrated that analyzing multiple tumor markers with machine learning (ML) provides a promising platform for cancer diagnosis [30]. We therefore measured blood concentrations of multiple protein markers relevant to the diagnosis, surveillance, or prognosis of gastrointestinal tumors. We used the ANN algorithm to build a model that distinguishes healthy from unhealthy subjects, and we verified its efficiency in diagnosing early colorectal diseases.
ECL immunoassay has been widely used to detect various clinical protein markers owing to its high degree of automation, ease of operation, accurate results, and good traceability. Markers measured by ECL are widely used in the diagnosis, monitoring, and prognosis of various cancers. Serum tumor markers are used in the diagnosis of CRC, but single markers usually show insufficient sensitivity and specificity.

In this study, we tested nine protein markers. CEA and CA19-9 are recommended in the Chinese expert consensus on experimental diagnostic technology for screening of early CRC and precancerous lesions [31]. CA125 is a cell surface glycoprotein that is abnormally expressed in most gastrointestinal adenocarcinomas and is associated with the diagnosis and prognosis of CRC. CA 72-4 and CA 242 are often recommended as complementary indicators for screening gastrointestinal tumors and monitoring their treatment response [32]. Although CYFRA 21-1 is generally considered a marker for non-small cell lung cancer, recent studies have combined it with other markers for CRC identification [33]. We therefore focused on the role of these tumor markers in the early diagnosis of colorectal lesions.
Combining multiple serum tumor markers with simple rules involves a trade-off. Calling a result positive when any marker is elevated increases sensitivity but decreases specificity, whereas requiring all markers to be elevated does the reverse. Advances in AI and computer technology have improved the potential of multiple serum tumor markers in screening for certain diseases [34].

In this study, we used a popular data mining algorithm, the ANN. ANNs can automatically detect and model complex nonlinear relationships between the input and output layers of the network, including interactions among input variables. The layers of an ANN are composed of interconnected neurons. ANN analysis is a statistical modeling tool that has demonstrated the ability to assimilate information from multiple sources and detect subtle and complex patterns [35].

Many recent studies have discussed the application of ANNs in cancer diagnosis and treatment guidance:
- Matsuda et al. used an ANN to evaluate the endoscopic response of esophageal cancer patients receiving neoadjuvant chemotherapy [36].
- Fan et al. developed a non-invasive, low-cost ANN model integrating CA125, AFP, and CA242 tests, which was a valuable tool to assist in the diagnosis of gastric cancer [22].
- Liu et al. demonstrated that their risk score model was robust and reliable for evaluating prognosis in CRC, and they identified novel diagnostic and treatment targets [37].
- Abdul Rahman et al. trained ANN models on large heterogeneous datasets. Their work provided a solid foundation for clinical decision support systems that assist healthcare providers in diet-related, non-invasive CRC screening [38].
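The following NumPy sketch makes the description of interconnected layers concrete. It computes the forward pass of a network with the shape described in the Methods: six standardized marker inputs, one hidden layer of 100 ReLU neurons, and a single output. For binary problems, scikit-learn's `MLPClassifier` applies a logistic (sigmoid) output, which keeps the prediction score between 0 and 1. The weights here are random placeholders, not a trained CA6 model.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def mlp_score(x: np.ndarray, W1: np.ndarray, b1: np.ndarray, w2: np.ndarray, b2: float) -> np.ndarray:
    """Forward pass of a one-hidden-layer network for binary classification.

    x:  (n_subjects, n_markers) standardized marker values
    W1: (n_markers, n_hidden) input-to-hidden weights; b1: (n_hidden,) hidden biases
    w2: (n_hidden,) hidden-to-output weights; b2: output bias
    Returns one score in (0, 1) per subject.
    """
    hidden = relu(x @ W1 + b1)        # every hidden neuron receives all six markers
    return sigmoid(hidden @ w2 + b2)  # logistic output maps the sum to (0, 1)


rng = np.random.default_rng(42)
n_markers, n_hidden = 6, 100
W1 = rng.normal(scale=0.3, size=(n_markers, n_hidden))  # placeholder weights, not fitted
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.3, size=n_hidden)
x = rng.normal(size=(3, n_markers))  # three hypothetical subjects
print(mlp_score(x, W1, b1, w2, 0.0))
```

Every hidden neuron receives all six markers, so the network can represent interactions among markers that a single-marker cut-off cannot.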
CRC diagnosis involves many factors with complex relationships. ANNs can learn fuzzy relationships that are difficult to describe with explicit mathematical formulas. By imitating aspects of human intelligent behavior, they can handle complex, uncertain, and nonlinear problems, and they offer good fault tolerance and fast parallel processing. Based on the ANN algorithm, we combined multiple serum tumor markers for modeling. Our analysis showed that the ANN model can improve the diagnostic efficiency for colorectal diseases, including benign lesions and CRC. In the ROC curve analysis of both the training set and the validation set, the prediction score output by the ANN was superior to individual markers such as CEA and CA19-9. We also compared the ANN-derived prediction score among the three groups. The score of the benign disease group was significantly higher than that of the normal healthy group, and the score of the CRC group was significantly higher than those of the benign disease and normal healthy groups. These results indicate that the prediction score also differentiates these three groups.
Importantly, the prediction score significantly improved the AUC, sensitivity, specificity, and accuracy for diagnosing benign precancerous lesions or early CRC. That is, the ANN can be used to distinguish benign lesions or early CRC from healthy people. Comparative ROC curve analyses support the conclusion that the ANN model using multiple markers improves sensitivity and achieves higher diagnostic accuracy without sacrificing specificity. ANNs can thus improve the accuracy of combined diagnosis using multiple serum tumor markers. The prediction score, which combines six serum tumor markers, can distinguish not only benign diseases but also early CRC from normal controls. This evidence supports the ANN model as a promising tool to assist in the diagnosis and screening of early CRC.

Given its good accuracy and low cost, the ANN model is expected to serve as an intelligent tool for identifying populations at high risk of CRC for primary prevention. Large prospective cohort studies could determine whether individuals identified as high risk by the ANN model are diagnosed with CRC in subsequent years. In addition, new serum markers need to be included to develop practical and reliable ANN models for assessing CRC risk.

Conclusions
We measured CRC-related protein tumor markers by ECL immunoassays and used the ANN algorithm to construct the CA6 model for the early diagnosis of colorectal lesions. This model integrates six protein tumor marker variables. Its diagnostic efficiency was satisfactory, and it significantly improved the ability to distinguish early-stage CRC and precancerous lesions from normal healthy people. These results demonstrate that a diagnostic model integrating multiple tumor marker data is very useful for enhancing laboratory auxiliary diagnosis and prediction. Further studies with more tumor markers and a larger population are required to make the model more accurate and reliable.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/bios13070685/s1.
- Table S1. Matrices of true and predicted conditions.
- Table S2. Electrochemical luminescence performance of nine tumor markers.
- Table S3. Diagnostic efficiency of univariate markers to distinguish between normal healthy and abnormal groups.
- Table S4. Diagnostic efficiency of univariate markers to distinguish benign diseases and early-stage colorectal cancer (CRC) from normal healthy subjects.

Data Availability Statement:
The data presented in this study are available on request.