Chromogranin-A Expression as a Novel Biomarker for Early Diagnosis of Colon Cancer Patients

Colon cancer is one of the major causes of cancer death worldwide. The five-year survival rate is more than 90% for early-stage patients but only around 10% for the later stages. Moreover, half of colon cancer patients are clinically diagnosed at the later stages. It is therefore important to improve the early diagnosis of colon cancer. Our previous studies identified several potential biomarkers associated with the early diagnosis of colon cancer. Among the most powerful of these candidates, human chromogranin-A (CHGA) was selected for further analysis. In this study, we used a logistic regression-based meta-analysis to clarify the association of CHGA expression with colon cancer diagnosis. Both healthy populations and normal mucosa from colon cancer patients were used as double normal controls. The results showed decreased expression of CHGA in the early stages of colon cancer compared with the normal controls. The decline of CHGA expression in the early stages of colon cancer is probably a new diagnostic biomarker with high predictive ability and verification performance. We also compared the diagnostic power of CHGA expression with that of the typical oncogene KRAS, the classic tumor suppressor TP53, and the well-known cellular proliferation index MKI67; CHGA showed a stronger ability to predict early diagnosis of colon cancer than these other biomarkers. In the protein–protein interaction (PPI) network, CHGA shared some common pathways with KRAS and TP53. CHGA might be considered a novel, promising, and powerful biomarker for the early diagnosis of colon cancer.


Introduction
Colon cancer is one of the leading causes of cancer death worldwide [1]. In 2018, there were more than 1,096,601 newly diagnosed colon cancer cases, and around 551,269 patients died from colon cancer worldwide [2]. Advances in modern medicine and surgical techniques have benefited stage I and II colon cancer patients, whose five-year survival rate has reached 90% [3]. However, the five-year survival rate for stage IV colon cancer patients is only around 10% [3]. Moreover, more than 50% of colon cancer patients are already at the late stages when they are clinically diagnosed [4]. It is therefore urgent to find more convenient and accurate biomarkers for the early diagnosis of colon cancer.
Although colonoscopy and pathological examination of biopsies are considered the gold standard for the final diagnosis and primary treatment of colon cancer [5], there are a number of early colon cancer

Logistic Regression-Based Meta-Analysis Has Been Used in This Study
The study design is shown in Figure 1.

Figure 1. Study pipeline. In this study, we used human microarray gene expression (GE) data from the Gene Expression Omnibus (GEO) database and logistic regression to construct the 2 × 2 table for meta-analysis. After the chromogranin-A (CHGA) diagnostic meta-analysis, we used RNA-seq data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) to test CHGA expression in colon cancer patients and healthy controls and thereby verify the meta-analysis results. Meanwhile, diagnostic meta-analyses for several other reported biomarkers were conducted using the same datasets. Finally, we predicted several CHGA-associated biomarkers from the protein-protein interaction (PPI) network and GE levels.

Data Collection
Microarray human series were collected from the GEO database using a keyword search. Of the 1021 GEO datasets collected, 1012 were excluded for lack of stage information, two for lacking detailed information on the cancer patients and experiments, one for containing too few samples, and two for not including GE data for CHGA. This left four suitable datasets (GSE44076, GSE74602, GSE10972, and GSE23878), comprising 187 colon cancer patients and 226 normal controls, for the analyses. Table 1 shows the characteristics of the included studies.

Logistic Regression
Logistic regression was applied to the expression data to establish the 2 × 2 table for meta-analysis (Table 2). GSE44076 contains two control groups: one from healthy samples and the other from the corresponding normal adjacent mucosa of patients.
Figure 2 presents the meta-analysis results as forest plots. Figure 2A shows the sensitivity forest plot for CHGA as a biomarker in the diagnosis of early-stage colon cancer, from which a pooled sensitivity of 0.89 (0.85 to 0.93) was calculated. The specificity forest plot is shown in Figure 2B (0.89, 0.85 to 0.93). Figure 2C,D present the likelihood ratios for CHGA: positive likelihood ratio (PLR) 7.86 (2.27 to 27.25) and negative likelihood ratio (NLR) 0.14 (0.08 to 0.22). CHGA achieved a diagnostic odds ratio (DOR) of 57.27 (14.83 to 222.24), shown in the forest plot in Figure 2E. All intervals are 95% confidence intervals.

The summary receiver operating characteristic (SROC) curve for the CHGA diagnostic meta-analysis is presented in Figure 3. CHGA showed high diagnostic accuracy, with an area under the curve (AUC) of 0.9370 and a Q value of 0.8736.
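To make the AUC and Q value easier to interpret, the following minimal sketch shows one common way an SROC curve can be derived from per-study sensitivity and specificity, namely the Moses-Littenberg linear model D = a + bS. That this model matches the software used here is an assumption, the function names are hypothetical, and the input pairs are invented for illustration, so the output does not reproduce the values reported above.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_sroc(pairs):
    """Least-squares fit of D = a + b*S over (sensitivity, specificity) pairs.

    D = logit(TPR) - logit(FPR) is the log diagnostic odds ratio and
    S = logit(TPR) + logit(FPR) is a proxy for the positivity threshold.
    """
    d = [logit(se) - logit(1.0 - sp) for se, sp in pairs]
    s = [logit(se) + logit(1.0 - sp) for se, sp in pairs]
    n = len(pairs)
    mean_s, mean_d = sum(s) / n, sum(d) / n
    sxx = sum((si - mean_s) ** 2 for si in s)
    sxy = sum((si - mean_s) * (di - mean_d) for si, di in zip(s, d))
    b = sxy / sxx if sxx > 0 else 0.0
    a = mean_d - b * mean_s
    return a, b

def sroc_tpr(fpr, a, b):
    """Sensitivity predicted by the fitted curve at a given FPR (requires b != 1)."""
    return expit(a / (1.0 - b) + (1.0 + b) / (1.0 - b) * logit(fpr))

def q_star(a):
    """Point on the curve where sensitivity equals specificity (S = 0)."""
    return expit(a / 2.0)

def sroc_auc(a, b, steps=10000):
    """Area under the fitted curve by the midpoint rule over FPR in (0, 1)."""
    h = 1.0 / steps
    return sum(sroc_tpr((k + 0.5) * h, a, b) for k in range(steps)) * h

# Invented (sensitivity, specificity) pairs, NOT the values from Table 2.
toy_pairs = [(0.90, 0.85), (0.85, 0.92), (0.92, 0.80), (0.88, 0.90)]
a, b = fit_sroc(toy_pairs)
print(f"a={a:.3f} b={b:.3f} Q*={q_star(a):.3f} AUC={sroc_auc(a, b):.3f}")
```

A Q value close to 1 means the curve passes near the top-left corner of the ROC space, which is how the reported Q value of 0.8736 can be read.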

Comparison of the CHGA with Other Biomarkers
To compare the diagnostic performance of CHGA with that of established biomarkers, diagnostic meta-analyses for MKI67, TP53, and KRAS were conducted using the same datasets. The results of these meta-analyses are listed in Table 3.

Verification in RNA-Seq Data
CHGA expression in the microarray datasets was verified using RNA-seq data from the GTEx and TCGA databases. Figure 4A-D presents CHGA expression in the microarray datasets from our meta-analysis. Figure 4E shows CHGA expression levels for colon cancer patients and healthy controls in the TCGA and GTEx databases. CHGA expression was markedly reduced in colon cancer patients compared with normal controls in both the microarray and RNA-seq data.

CHGA-Related PPI Networks and Biological Explanation
Our previous study showed that colorectal cancer (CRC) biomarkers had strong relationships on the PPI network [10]. We therefore supposed that the close neighbors of CHGA on the PPI network were likely to be further biomarkers. To analyze the biological interactions of CHGA, PPI networks were constructed for both its closest proteins and other biomarkers (Figure 5). SCG3, SCG2, SST, NCAM1, ENO2, GAST, SYP, SYT1, STX1A, and CHGB, the proteins nearest to CHGA, were predicted as potential future early diagnostic biomarkers for colon cancer (Figure 5A). CHGA expression was found to be associated with KRAS and TP53 (Figure 5B).

To investigate why CHGA performed so well in the diagnosis of early-stage colon cancer, gene ontology (GO) analyses were conducted for the CHGA-related genes (from the closest PPI network and the CHGA-TP53-KRAS PPI network, Table 4). We found that CHGA and its closest genes were strongly associated with the regulation of cell communication and signaling at the biological function level (Table 4A). At the cellular component level, they were closely linked to transport functions (Table 4B). Additionally, CHGA and its related biomarkers (TP53 and KRAS) mapped to the regulation of neuron death and cell death pathways (Table 4C).

Prediction for CHGA-Related Biomarkers from Expression Levels
Genes with similar expression levels tend to perform similar functions [30]. Figure 6 shows the genes whose expression in colon cancer was similar to that of CHGA; these were predicted as further biomarker candidates.

Discussion
Accumulating evidence has shown that the majority of colon cancer patients in the early stages (I and II) benefit from modern cancer therapies, and their five-year survival rate has reached more than 90%. However, the five-year survival rate for patients in the later stages (III and IV) remains at about 10% [3]. Moreover, more than 50% of colon cancer patients are already at the late stages when their cancer is clinically diagnosed [4]. As a general rule, the earlier a cancer, including colon cancer, is diagnosed, the better the therapies patients receive and the better their outcomes. It is therefore important to search for more convenient, accurate, and powerful biomarkers to meet the urgent need for early diagnosis of colon cancer.
Biomarkers, as indicators of biological conditions, have been widely shown to improve the diagnosis, therapeutic response, and prognosis of colon cancer [5]. Many cancer researchers, including our CRC research group, have been working to identify the significance of various biomarkers for CRC [7][8][9][10]. To study the essential roles and important functions of biomarkers in CRC, we created a CRC biomarker database [11] and further analyzed, with AI-assisted verification, the potential applications of DNA, RNA, and protein biomarkers in the diagnosis, therapy, and prognosis of CRC [10].
CHGA is a 439-amino-acid protein found in the secretory granules of many normal and neoplastic neuroendocrine cells, and it plays an essential role in the mechanisms of protein storage and release [13]. CHGA has been considered a biomarker for neuroendocrine neoplasms [17,18] and has been confirmed, with microarrays and tissue arrays, as a potential biomarker for the early diagnosis of gastric cancer [19] and prostate cancer [20]. The majority of pancreatic neuroendocrine tumors showed CHGA-positive immunostaining, and primary tumors with metastases showed significantly less CHGA protein expression than primary tumors without metastases [21]. However, no study has addressed CHGA in the early diagnosis of colon cancer.
In this study, CHGA expression was found to be decreased in the early stages of colon cancer compared with CHGA expression in both the healthy populations and the normal mucosa from the colon cancer patients. This evidence indicates that CHGA might play an essential role in the initiation of colon cancer, from normal colon tissue to early cancer. The downregulation of CHGA might be one of the critical mechanisms leading to colon cancer formation.
In one of our ongoing studies, CHGA expression was predicted as a promising early diagnostic biomarker for colon cancer via support vector machine (SVM) and regression tree analysis of the topological features of reported colon cancer diagnostic biomarkers on the PPI network [28], and the diagnostic value of CHGA expression in colon cancer was further verified. Meta-analysis, with its high scientific confidence, has been used to assess the diagnostic value of different biomarkers in various diseases [31,32]. The GEO database is widely used in bioinformatics analysis because it is a comprehensive repository storing large amounts of GE data. Accordingly, more and more recent studies have focused on meta-analysis of GEO datasets, and the majority of these meta-analyses began with GE differential analyses [33,34]. Our results carry even higher confidence than GE differential analyses alone, since we predicted CHGA expression as a candidate biomarker by machine learning (SVM and regression tree) based on published biomarkers [28].
According to our previous study, CRC biomarkers had strong relationships on the PPI networks [10]. Hence, we predicted several valuable biomarkers from the closest PPI network of CHGA (Figure 5A), which deserve further verification. To examine the functions and applications of these closest neighbors of CHGA, we performed a literature verification. We searched PubMed and Google Scholar for published papers concerning these nearby proteins and cancers, and found that SCG3 was confirmed as a prognostic biomarker for lung cancer [35] and rectal cancer [36]; SST expression was reported to have a strong relationship with advanced CRC [37]; NCAM1 was correlated with several human cancers [38]; ENO2 was predicted as a treatment biomarker for acute lymphoblastic leukemia [39]; SYT1 expression was a candidate colon cancer biomarker [40]; STX1A was shown to be associated with the transformation of high-grade tumors in bladder cancer [41]; and CHGB, a paralog of CHGA, was a prognostic marker for pancreatic neuroendocrine tumors [21]. Expression of SLC22A8 was considered a valuable biomarker for lung cancer treatment [42]. The NEUROD2 gene was related to metastasis and survival in colon cancer [43]. The SLC2A1 gene was related to the metabolic shift of colon cancer [44], and the HCN4 gene was correlated with a low survival rate in multiple cancers [45].
A variety of biomarkers, as biological indicators in their pathways, have been widely used to improve cancer diagnosis, therapeutic response, and prognosis, including for colon cancer [6]. Many cancer researchers, including our CRC research group, have focused on typical oncogenes, such as KRAS [22], and classic tumor suppressors, such as TP53 [23], to clarify their significance as biomarkers in colon cancer. KRAS, a typical oncogene that initiates cancer development, has been shown in many cancer types, especially CRC, to have a strong impact on early diagnosis as well as on cancer progression and prognosis [22]. More than 50% of CRC patients carry KRAS gene mutations in their early stages, and half of such patients do not benefit from antibody therapy [22]. TP53 is a classic tumor suppressor; more than 60% of CRC patients carry TP53 mutations, one of the commonest genetic events in the development of human CRC [23]. MKI67 is a biomarker of cellular proliferation and has been reported as an independent index of CRC cell growth [22]. The expression of MKI67 increases gradually from normal tissue through adenomas to adenocarcinomas in CRC patients. Overexpression of MKI67 has been considered a significant biomarker and predictor for primary CRC screening [25].
The majority of the candidate biomarkers predicted from the CHGA PPI network and the CHGA similar genes were further verified through literature studies. We showed that the candidate biomarkers were associated with the diagnosis, therapy, or prognosis of human cancers, including colon cancer, which provides further evidence for CHGA as a potential hub biomarker for colon cancer. Furthermore, GO annotation revealed several important pathways for CHGA, its closest PPI neighbors, and other biomarkers: the regulation of cell communication and signaling at the biological function level, the transport function, and the regulation of neuron death and cell death, which should be important for future colon cancer biomarker discovery. In the near future, more practical machine learning techniques will be introduced to the field of precise biomarker discovery for the identification and verification of potential biomarkers for early diagnosis, therapy response, and prognosis in cancers.

Data Collection, Extraction, and Normalization
We followed the "Preferred Reporting Items for Systematic Reviews and Meta-Analyses" guidelines to conduct this meta-analysis.
Two investigators independently searched the GEO database for relevant studies up to February 2019. The following search terms were used: colon AND (cancer OR carcinoma OR neoplasia). Results were further filtered to series from Homo sapiens.
The criteria for selecting studies were as follows: (1) The study contains GE data for early-stage colon cancer patients and normal healthy controls. (2) The patients' condition and the experimental methods are clearly described: the dataset records the country, number, and source of the included samples, and the experimental methods and platform are clearly stated, for example: "Expression profiling by array," platform "GPL13667." (3) The dataset contains at least 30 samples. (4) The dataset includes GE data for CHGA. The R package "GEOquery" was used to download the GE data from the GEO database, and the data were standardized to log scale. The function "normalizeQuantiles" in the "limma" R package was used to further normalize the log-scaled GE data.
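For readers unfamiliar with quantile normalization, the idea behind the normalization step can be sketched in a few lines of Python. This is an illustration only and not the limma implementation: the function names and the toy intensity matrix are hypothetical, a log2 scale is assumed, and ties are broken by original order, which is a simplification.

```python
import math

def log2_transform(matrix):
    """Log2-transform a genes x samples matrix of positive intensities."""
    return [[math.log2(v) for v in row] for row in matrix]

def quantile_normalize(matrix):
    """Give every sample (column) the same empirical distribution.

    The k-th smallest value of each sample is replaced by the mean of the
    k-th smallest values across all samples.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[k] for sc in sorted_cols) / n_samples
                  for k in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        # Gene indices ordered from lowest to highest expression in sample s.
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            normalized[g][s] = rank_means[rank]
    return normalized

# Toy raw intensities (3 genes x 3 samples), invented for illustration.
raw = [[8.0, 16.0, 4.0],
       [2.0, 4.0, 32.0],
       [32.0, 64.0, 8.0]]
normalized = quantile_normalize(log2_transform(raw))
for row in normalized:
    print([round(v, 3) for v in row])
```

After this step every sample has an identical set of values, so differences between arrays in overall intensity no longer influence the comparison of a single gene such as CHGA.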

Logistic Regression and Diagnostic Meta-Analysis
We extracted the GE data of CHGA for both the cancer group and the control group and performed logistic regression analysis to obtain the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts for diagnostic meta-analysis. In this logistic regression, the GE level was used as the predictor variable.
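The step from expression values to a 2 × 2 table can be illustrated with a minimal sketch. It is not the SPSS procedure used in this study: the gradient-ascent fit, the probability cutoff of 0.5, the function names, and the expression values are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Fit P(cancer | x) = sigmoid(b0 + b1 * x) by batch gradient ascent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def confusion_table(x, y, b0, b1, cutoff=0.5):
    """Count TP, FP, TN, FN when predicting cancer at probability >= cutoff."""
    table = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for xi, yi in zip(x, y):
        predicted = sigmoid(b0 + b1 * xi) >= cutoff
        if predicted and yi == 1:
            table["TP"] += 1
        elif predicted and yi == 0:
            table["FP"] += 1
        elif not predicted and yi == 0:
            table["TN"] += 1
        else:
            table["FN"] += 1
    return table

# Invented log-scale CHGA values; cancer samples tend to be lower.
cancer = [5.1, 5.4, 5.8, 6.2, 7.4]
control = [6.0, 7.2, 7.6, 8.0, 8.3]
values = cancer + control
labels = [1] * len(cancer) + [0] * len(control)

# Standardize the predictor so the fixed learning rate behaves well.
mean = sum(values) / len(values)
sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
z = [(v - mean) / sd for v in values]

b0, b1 = fit_logistic(z, labels)
table = confusion_table(z, labels, b0, b1)
print(b1 < 0, table)
```

A negative slope means that lower CHGA expression raises the predicted probability of cancer, which is the direction of effect reported in this study.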
Diagnostic meta-analysis for early-stage colon cancer was conducted with a random effects model as the standard statistical method, which does not require new data to train the models and allows for differences among studies.
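As a rough illustration of what a random effects model does, the sketch below pools per-study effects with the DerSimonian-Laird estimator and reports the between-study variance and I². Whether this is the exact estimator applied by the software used here is an assumption, the function name is hypothetical, and the per-study log DORs and variances are invented.

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooling of per-study effects (for example log DORs).

    Returns (pooled effect, standard error, tau^2, I^2 in percent).
    """
    w = [1.0 / v for v in variances]
    fixed = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    # Cochran's Q measures the spread of studies around the fixed-effect mean.
    q = sum(wi * (ei - fixed) ** 2 for wi, ei in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Adding tau^2 to every variance down-weights large studies less sharply.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * ei for wi, ei in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, se, tau2, i2

# Invented per-study log DORs and variances, for illustration only.
log_dors = [3.2, 4.5, 3.9, 4.1]
variances = [0.40, 0.90, 0.55, 0.70]
pooled, se, tau2, i2 = dersimonian_laird(log_dors, variances)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled DOR {math.exp(pooled):.1f} "
      f"(95% CI {math.exp(low):.1f} to {math.exp(high):.1f}), I2 = {i2:.0f}%")
```

When the studies agree closely, the between-study variance is estimated as zero and the result coincides with a fixed-effect analysis; otherwise the confidence interval widens to reflect the heterogeneity.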
The following statistics were calculated: sensitivity, specificity, PLR, NLR, and DOR, with corresponding 95% confidence intervals. The SROC curves were plotted based on sensitivity and specificity. Heterogeneity among studies was assessed by I² on the sensitivity of the CHGA diagnostic test; I² < 50% was considered small heterogeneity.
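The statistics listed above follow directly from the TP, FP, FN, and TN counts of a single study. The sketch below shows the standard definitions; the function name and the example counts are hypothetical, and the continuity correction of 0.5 for tables with an empty cell is an assumed convention rather than a detail taken from this study.

```python
def diagnostic_metrics(tp, fp, fn, tn, correction=0.5):
    """Point estimates of the diagnostic accuracy measures from a 2 x 2 table."""
    if min(tp, fp, fn, tn) == 0:
        # Assumed convention: add 0.5 to every cell to avoid division by zero.
        tp, fp, fn, tn = (c + correction for c in (tp, fp, fn, tn))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PLR": sensitivity / (1.0 - specificity),
        "NLR": (1.0 - sensitivity) / specificity,
        "DOR": (tp * tn) / (fp * fn),
    }

# Invented counts for one study, for illustration only.
metrics = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The DOR equals the PLR divided by the NLR, so a test with a high PLR and a low NLR necessarily has a high DOR.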

Verification Test
To verify the meta-analysis result, we used RNA-seq data from the TCGA and GTEx databases to compare CHGA expression levels between colon cancer patients and healthy samples. We also drew boxplots of CHGA expression in the collected microarray data.

CHGA PPI Network Construction, Biological Function Analysis, and Detection for CHGA Similar Genes
We drew the PPI network to search for the neighbors of CHGA based on protein interaction evidence and investigated the PPI relationships of CHGA with some reported biomarkers (TP53, KRAS, and MKI67). GO annotation was conducted to analyze the biological functions of the CHGA-related genes and biomarkers. Genes with similar expression were considered similar genes, and we used Pearson correlation coefficients to identify genes similar to CHGA as evidence for further prediction of new biomarkers.
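The similar-gene search can be pictured as ranking genes by their Pearson correlation with CHGA across samples. The sketch below is illustrative only and does not reproduce the GEPIA analysis: the function names, the placeholder gene names, and all expression values are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rank_similar_genes(target, profiles, top_n=3):
    """Sort genes by decreasing correlation with the target profile."""
    scored = [(gene, pearson(target, values)) for gene, values in profiles.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

# Invented expression profiles across six samples; gene names are placeholders.
chga = [7.9, 7.4, 6.1, 5.8, 5.2, 8.3]
other_genes = {
    "GENE_A": [6.8, 6.5, 5.0, 4.9, 4.1, 7.2],
    "GENE_B": [3.1, 3.3, 4.8, 5.0, 5.6, 2.9],
    "GENE_C": [5.0, 5.2, 4.9, 5.1, 5.0, 5.3],
}
for gene, r in rank_similar_genes(chga, other_genes):
    print(f"{gene}: r = {r:.2f}")
```

Genes at the top of such a ranking rise and fall together with CHGA across samples, which is the sense in which they are treated as candidates for further biomarker prediction.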

Software and Tools
The R language was used to download and normalize the GEO data, and logistic regression was conducted in SPSS 22.0. Diagnostic meta-analysis and heterogeneity analysis were performed in MetaDisc 1.4. The verification test and similar gene detection were carried out on the Gene Expression Profiling Interactive Analysis (GEPIA) database. The STRING database was used to draw the PPI network and conduct the biological function analyses.

Conclusions
A logistic regression-based meta-analysis was used to analyze the roles and functions of CHGA expression as a novel and promising biomarker for early diagnosis of colon cancer, and to compare CHGA in early diagnosis with the well-known oncogene KRAS, the tumor suppressor TP53, and the cellular proliferation marker MKI67. CHGA might be a further biomarker for patients with early colon cancer. Several other biomarkers from the PPI network, and genes similar to CHGA, were further predicted for future early diagnosis of colon cancer.