Machine Learning in Prediction of Bladder Cancer on Clinical Laboratory Data

The incidence of bladder cancer has been increasing globally. Urinary cytology is considered a major screening method for bladder cancer, but it has poor sensitivity. This study aimed to use clinical laboratory data and machine learning methods to build predictive models of bladder cancer. A total of 1336 patients with cystitis, bladder cancer, kidney cancer, uterus cancer, or prostate cancer were enrolled in this study. Two-step feature selection, combining WEKA and forward selection, was performed. Five machine learning models were then applied: decision tree, random forest, support vector machine, extreme gradient boosting (XGBoost), and light gradient boosting machine (lightGBM). The selected features were calcium, alkaline phosphatase (ALP), albumin, urine ketone, urine occult blood, creatinine, alanine aminotransferase (ALT), and diabetes. The lightGBM model obtained an accuracy of 84.8% to 86.9%, a sensitivity of 84% to 87.8%, a specificity of 82.9% to 86.7%, and an area under the curve (AUC) of 0.88 to 0.92 in discriminating bladder cancer from cystitis and other cancers. Our study demonstrates the use of clinical laboratory data to predict bladder cancer.


Introduction
Bladder cancer is the 10th most common cancer in the world [1]. Its incidence is rising globally, especially in developed countries such as the U.S.A., Germany, and Taiwan; according to GLOBOCAN, there were 573,278 new cases of bladder cancer and 212,536 deaths [2]. Bladder cancer is more common in men than in women, with respective incidence and mortality rates of 9.5 and 3.3 per 100,000 among men, which are four times those among women globally [2]. Smoking is considered the major risk factor for bladder cancer [3]. The gold standard procedure for diagnosing bladder cancer is cystoscopy, with a sensitivity of 88-100% and a specificity of 77.1-97% [4]. Currently, urinary cytology is considered a major non-invasive method to diagnose bladder cancer; it has high specificity but only 38% sensitivity [5]. Therefore, a screening method with both high sensitivity and high specificity is urgently needed for the diagnosis of bladder cancer.
Clinical chemistry tests and urinalysis are the major diagnostic screening tests in the clinical laboratory [6]. The alteration of each test can be interpreted as a relationship with diseases; for instance, aspartate aminotransferase (AST) and alanine aminotransferase Herein, we collected clinical laboratory data, including biochemistry tests and urinalysis, from 1336 patients with cystitis, bladder cancer, and other types of cancer at Mackay Memorial Hospital. We combined sampling techniques and two-step feature selection to exploit the clinical laboratory dataset. Five different machine learning models were then trained and validated with the selected dataset. The accuracy, precision, f1 score, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were calculated to evaluate model performance.

Patient Cohort
We collected clinical laboratory test data from 144 patients with cystitis (56 female and 88 male patients, aged 60.12 ± 11.99), 200 patients with kidney cancer (

Statistical Analysis
Descriptive analyses were performed with SPSS 19.0 (IBM, Chicago, IL, USA). Continuous variables are presented as mean ± SD or median (25th and 75th percentile). Categorical variables are presented as number and percentage. The clinical laboratory tests and characteristics were compared with a t-test or Mann-Whitney U test for continuous variables and a chi-square test for categorical variables.

Data Processing
We collected clinical laboratory data comprising 56 laboratory test results. Laboratory tests with more than 50% missing data were removed. This left 31 laboratory tests, with missing rates ranging from 0 to 44.1% (Table 1). Missing data were filled with the mean value for continuous features and the median value for categorical features, computed for each feature over the whole dataset. Features that were entirely missing for certain classes were excluded from feature selection, model training, and validation for those comparisons. For instance, the A/G ratio and urine epithelium were not included when discriminating cystitis from other cancers. The oversampling and undersampling techniques from the imblearn v0.0 package were used to address the problem of imbalanced data [44].
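The preprocessing steps above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the study's code: the function names, the toy records, and the use of None to mark a missing value are assumptions made for illustration, and the random oversampling shown here is a simplified stand-in for the samplers in the imblearn package.

```python
import random
import statistics

def drop_sparse_features(rows, max_missing=0.5):
    """Keep only features whose fraction of missing (None) values is <= max_missing."""
    n = len(rows)
    keep = [f for f in rows[0]
            if sum(r[f] is None for r in rows) / n <= max_missing]
    return [{f: r[f] for f in keep} for r in rows]

def impute(rows, categorical=()):
    """Fill None with the feature mean (continuous) or median (categorical codes)."""
    filled = [dict(r) for r in rows]
    for f in filled[0]:
        observed = [r[f] for r in filled if r[f] is not None]
        if f in categorical:
            fill = statistics.median(observed)
        else:
            fill = statistics.mean(observed)
        for r in filled:
            if r[f] is None:
                r[f] = fill
    return filled

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([y] * target)
    return out_rows, out_labels

# Hypothetical toy records; the values are not taken from the study.
rows = [
    {"calcium": 9.1, "urine_ketone": 0, "rare_test": None},
    {"calcium": None, "urine_ketone": 1, "rare_test": None},
    {"calcium": 8.5, "urine_ketone": None, "rare_test": 3.0},
    {"calcium": 9.4, "urine_ketone": 0, "rare_test": None},
]
labels = ["cystitis", "bladder_cancer", "bladder_cancer", "bladder_cancer"]

kept = drop_sparse_features(rows)                      # "rare_test" is 75% missing
clean = impute(kept, categorical={"urine_ketone"})     # mean or median fill
balanced_rows, balanced_labels = random_oversample(clean, labels)
```

When a pipeline like this is combined with cross validation, a common practice is to compute the fill values and apply resampling on the training folds only, so that the validation folds stay untouched.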

Feature Selection and Machine Learning
We used the InfoGainAttributeEval (InfoGain) + Ranker method with default parameters in WEKA (vers. 3.8.3) to perform feature selection (Table 2). Furthermore, the optimized models were used to conduct forward selection, as described in a previous study [45]. The models built in this study were based on decision tree (DT), random forest (RF), support vector machine (SVM), XGBoost, and lightGBM, and were evaluated with 10-fold cross validation using scikit-learn (vers. 0.21.3). The parameters were tuned before the experiments. For DT, the tree depth was varied from 1 to 10, with a step of 1, and the split criterion was set to entropy or gini. For RF, the number of trees started at 100 and was increased by 100 up to 500, and the split criterion was set to gini or entropy. For SVM, the initial value of gamma was set from 10-6 to 10-10, with a step of 0.1. The initial value of C was set from 10-6 to 10-7 with a step of 10. The kernel of SVM was set to RBF. For XGBoost, eta was varied from 0.01 to 0.2, with a step of 0.05, and the depth was varied from 1 to 10, with a step of 1. For lightGBM, the number of leaves was varied from 50 to 400, with a step of 50, and the depth was varied from 1 to 10, with a step of 1. The parameters of the machine learning models were tuned by training and validating with the whole dataset, and the parameters that obtained the highest accuracy were selected (Table 2). A confusion matrix was used to calculate the accuracy, precision, sensitivity, specificity, and f1 score (Table 3). The AUC was calculated with scikit-learn (vers. 0.21.3).
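To make the ranking step concrete, the following sketch computes an information gain score per feature and sorts the features by it. It is an illustration under stated assumptions rather than a reproduction of WEKA: numeric features are binarized at their median here, whereas WEKA applies its own discretization, and the function names and toy values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Information gain of the class given a feature split at its median.

    Assumption: the median split is a simple stand-in for WEKA's discretization.
    """
    ordered = sorted(values)
    threshold = ordered[len(ordered) // 2]
    low = [y for v, y in zip(values, labels) if v < threshold]
    high = [y for v, y in zip(values, labels) if v >= threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (low, high) if part)
    return entropy(labels) - conditional

def rank_features(table, labels):
    """Return (feature, gain) pairs sorted from most to least informative."""
    gains = {name: info_gain(column, labels) for name, column in table.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: 1 = bladder cancer, 0 = cystitis.
table = {
    "calcium": [8.1, 8.3, 8.2, 9.6, 9.8, 9.7],
    "sodium": [138, 141, 139, 140, 138, 141],
}
labels = [1, 1, 1, 0, 0, 0]
ranking = rank_features(table, labels)
```

In this toy example the calcium values separate the two classes perfectly, so calcium receives the maximum gain of one bit and is ranked ahead of sodium.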

Clinical Characteristics and Clinical Laboratory Data from Patients
The differences in demographics, baseline characteristics, and laboratory data between patients with cystitis and patients with cancer are summarized in Table 4. We found that albumin, ALP, BUN, chloride, creatinine, direct bilirubin, eGFR, pH, potassium, total protein, nitrite, strip WBC, and urine occult blood were significantly different in patients with kidney cancer compared to patients with cystitis. Furthermore, ALP, AST, BUN, calcium, creatinine, sodium, urine epithelium count, and urine occult blood were significantly different in patients with prostate cancer compared to patients with cystitis. As for bladder cancer, the statistical results showed that ALP, BUN, calcium, chloride, creatinine, direct bilirubin, eGFR, glucose, specific gravity, total protein, and uric acid were significantly different compared to patients with cystitis. Lastly, ALP, BUN, calcium, chloride, creatinine, eGFR, glucose, potassium, sodium, urine epithelium count, urine protein, urobilinogen, and urine occult blood were significantly different between patients with uterus cancer and patients with cystitis.

Feature Selection and Sampling Technique Experiment
To reduce the noise in the dataset, we used InfoGain + Ranker to rank the features between groups (Figure 1). The top feature in each group was collected into a set of selected features, comprising calcium, alkaline phosphatase, albumin, urine ketone, urine occult blood, and creatinine (Table 5). We used this dataset to train and validate five models: decision tree, random forest, SVM, XGBoost, and lightGBM. However, the evaluation metrics may not reflect the learning results, because the bladder cancer class is imbalanced relative to the other groups. We used the Python package imbalanced-learn to address this class imbalance. First, the five models were trained and validated without any sampling technique. In differentiating patients with bladder cancer from patients with cystitis, the models achieved an accuracy of 77.2-78.8%, a precision of 76.2-80.8%, an f1 score of 76.6-86.9%, a sensitivity of 77.7-95.8%, a specificity of 5-55.4%, and an AUC of 0.592-0.729 (Table S1). After oversampling was applied in training and validation, the models achieved an accuracy of 73.4-78.8%, a precision of 74.9-81%, an f1 score of 75.6-81.4%, a sensitivity of 78-84.3%, and a specificity of 51.3-59.3% (Table S1 and Figure 2). Undersampling was also tested. With undersampling, the models achieved an accuracy of 76.3-78.3%, a precision of 78.6-80.6%, an f1 score of 77.8-80.3%, a sensitivity of 79.0-83.9%, a specificity of 42.9-57.4%, and an AUC of 0.69-0.74 (Table S1). To further optimize our models, we conducted forward selection with the sampling technique in the five models.
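The gap between accuracy and specificity reported above is easier to see when the metrics are written out from the confusion matrix. The sketch below is illustrative only: the function name is hypothetical and the counts are invented to mimic an imbalanced cohort in which a model labels almost every patient as positive; they are not taken from Table S1.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard metrics from the four cells of a binary confusion matrix."""
    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class or prediction is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)   # recall, true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }

# Hypothetical counts: 100 positive cases, 20 negative cases,
# and a model that predicts "positive" for nearly everyone.
metrics = binary_metrics(tp=96, fp=19, tn=1, fn=4)
```

With these invented counts the accuracy stays above 80% and the sensitivity is 96%, yet the specificity is only 5%, which is the pattern that motivates resampling before training.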

Model Evaluation and Comparison
The forward selection method is illustrated in Figure 3. The features chosen by forward selection may differ depending on the model and the classes in the dataset. In the forward selection experiment, we focused on discriminating patients with cystitis from patients with bladder cancer. In each case, the model was trained and validated with the features from WEKA plus those from forward selection. For the decision tree classifier, ALT, AST, potassium, sodium, specific gravity, strip WBC, total protein, triglyceride, urine epithelium count, and uric acid were further selected. The model achieved an accuracy of 76.2%, a precision of 77.9%, an f1 score of 74.6%, a sensitivity of 73.2%, a specificity of 78.1%, and an AUC of 0.77 in differentiating patients with bladder cancer from patients with cystitis (Table 6). For the random forest classifier, only ALT was further selected. The model achieved an accuracy of 83.1%, a precision of 78.2%, an f1 score of 81.6%, a sensitivity of 85.5%, a specificity of 79.4%, and an AUC of 0.88 (Table 6). For SVM, ALT, BUN, chloride, direct bilirubin, nitrite, and pH were further selected. The model achieved an accuracy of 71.7%, a precision of 81.9%, an f1 score of 65.5%, a sensitivity of 55.7%, a specificity of 86.7%, and an AUC of 0.73 (Table 6). For XGBoost, ALT, AST, BUN, chloride, direct bilirubin, pH, potassium, sodium, total bilirubin, and total cholesterol were further selected (Table 6). The model achieved an accuracy of 82.8%, a precision of 84.7%, an f1 score of 82.7%, a sensitivity of 81.4%, a specificity of 83.3%, and an AUC of 0.87 (Table 6). For lightGBM, ALT and diabetes were further selected. The model achieved an accuracy of 87.6%, a precision of 86.3%, an f1 score of 87.7%, a sensitivity of 89.5%, a specificity of 85.5%, and an AUC of 0.93 (Table 6). The lightGBM model with the selected features achieved the highest accuracy, precision, f1 score, sensitivity, and AUC. Therefore, we further evaluated this model in differentiating bladder cancer from other cancers. The model achieved an accuracy of 84.8% to 86.9%, a precision of 83% to 87.1%, an f1 score of 84.5% to 87.7%, a sensitivity of 84.4% to 87.8%, a specificity of 82.9% to 86.7%, and an AUC of 0.88 to 0.92 (Table 7 and Figure 4).
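The second selection step can be pictured as a greedy loop that starts from the WEKA-ranked features and keeps adding the candidate that most improves a validation score. The sketch below assumes this generic form of forward selection; the exact procedure of reference [45] is not reproduced here. The function names are hypothetical, and toy_score is an invented stand-in for a real scoring routine such as the mean 10-fold cross-validated accuracy of a fitted model.

```python
def forward_selection(base_features, candidates, score_fn, min_gain=0.0):
    """Greedily add the candidate that most improves score_fn.

    Stops when no remaining candidate improves the score by more than min_gain.
    """
    selected = list(base_features)
    best = score_fn(selected)
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current set.
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored)
        if top_score - best <= min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best

def toy_score(features):
    """Invented scoring function: useful features add to the score, others cost a little."""
    useful = {"calcium": 0.10, "ALP": 0.08, "ALT": 0.05}
    penalty = 0.01 * sum(f not in useful for f in features)
    return 0.60 + sum(useful.get(f, 0.0) for f in features) - penalty

selected, best = forward_selection(
    base_features=["calcium", "ALP"],          # e.g. features from the ranking step
    candidates=["ALT", "sodium", "pH"],        # remaining laboratory tests
    score_fn=toy_score,
)
```

Because the score function is passed in as an argument, the same loop can be reused for each of the five models, which is consistent with the observation that different models end up with different selected features.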

Discussion
Machine learning has advanced significantly in the past decade. Many machine learning applications have been created with different types of data, including genomic data, transcriptomic data, proteomic data, image data, electronic health records (EHR), and clinical laboratory data [46][47][48]. However, the most intriguing question is whether machine learning can be applied to medical diagnosis [49]. Obermeyer et al. suggested that machine learning applied to clinical laboratory data can dramatically improve prognosis and diagnostic accuracy [50]. Moreover, compared to novel biomarkers, a decision-support program based on clinical laboratory data can be considered a fast and inexpensive way to improve diagnostic accuracy.
Herein, we used the clinical laboratory dataset coupled with machine learning algorithms to discriminate patients with bladder cancer from patients with cystitis. Missing values and imbalanced data were the two major challenges we encountered in this study. Missing values can be categorized as missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) [51]. Several methods can address this issue, such as collecting more samples, removing the subjects with missing values, filling with the mean or median, and imputing missing values [52,53]. Multiple imputation (MI) is considered a good method to estimate missing values from existing data [54]. An elegant study from Hong et al. performed MI with models such as random forest and achieved good accuracy [55]. However, in our experiment, MI did not yield usable results (data not shown). We speculate that MI needs a larger sample size or strongly correlated features to capture enough characteristics from the existing data. Imbalanced data refers to a skewed distribution of one class over another [56]. Imbalanced data causes classification problems during the training of machine learning algorithms [57]. Therefore, we used sampling techniques, including oversampling and undersampling, to reduce the error. The study performed by Mohammed et al. suggested that oversampling performs better with certain classifiers and evaluation metrics [58]. Chang et al. applied oversampling to molecular description data and reported that it could be used to reduce the overfitting problem [59]. However, oversampling has some disadvantages, such as sample overlapping, noise interference, and blindness of neighbor selection [60]. The main disadvantage of oversampling is that making copies of existing data makes overfitting likely; in contrast, the main disadvantage of undersampling is the discarding of potentially useful data [61].
Rather than pursuing the highest possible performance, our goal was to obtain a realistic estimate of model performance on our dataset. Furthermore, imbalanced data requires more than a one-step solution to improve the accuracy of a model [62]. Thus, in our experiment, we applied tools including undersampling techniques, feature selection, and improved models to increase the diagnostic accuracy in bladder cancer.
From our two-step feature selection, calcium, ALP, albumin, urine ketone, urine occult blood, and ALT were selected from the clinical laboratory data. Michel et al. reported that hypercalcemia was only observed in several patients with bladder cancer. Furthermore, the data showed that the hypercalcemia was caused by increasing levels from the tumor [63]. Moreover, Rosa et al. reported that an increased level of calcium is common in various cancers, but rare in bladder cancer [64]. In addition, Huang et al. suggested that the elevation of calcium in blood may be considered an indicator of bone metastasis in bladder cancer [65]. These studies suggest that calcium is a good feature for discriminating bladder cancer from other cancers. ALP has been considered a prognostic biomarker in patients with prostate cancer [66]. Accordingly, ALP was selected as the top feature for prostate cancer versus cystitis or other cancers in our study. Furthermore, Braendengen et al. reported that an increased level of ALP in serum did not improve the accuracy of a bone scan used for precystectomy evaluation, which suggests a low correlation between ALP and bladder cancer [67]. These studies indicate that ALP can be used to distinguish prostate cancer from other cancers without interfering with the classification of bladder cancer. Albumin and globulin play an important role in immunity and inflammation; therefore, several studies have proposed the albumin to globulin ratio as a biomarker in gastric cancer and lung cancer [68]. In addition, Quhal et al. reviewed the albumin to globulin ratio in 1096 patients with non-muscle-invasive bladder cancer and found that the ratio independently predicted the progression of disease [69]. Moreover, Tan et al. proposed that the ratio of albumin to ALP can be used as a prognostic biomarker in upper tract urothelial carcinoma [70]. Urine ketone is a routine urinalysis test. Only a few studies, based on metabolomics, have reported altered ketone levels in patients with bladder cancer [71]. However, ketone bodies in urine are considered highly correlated with diabetes [72]. Furthermore, ketones in the blood and urine may indicate that a patient is suffering from diabetic ketoacidosis [73]. Moreover, a comprehensive systematic review suggested that diabetes mellitus is associated with bladder cancer [74]. These studies indicate that urine ketone is related to bladder cancer. Urine occult blood has been considered a screening indicator for bladder cancer [75]. Furthermore, microhematuria testing showed a strong correlation with bladder cancer in 46,842 patients [76]. However, urine occult blood can also be observed in other types of cancer, such as kidney cancer [77], or simply in benign disease [78]. The lightGBM model we built in this study can discriminate bladder cancer from kidney cancer or cystitis with accuracies of 0.876 and 0.845. The ratio of AST to ALT was proposed as an indicator of liver function [79]. Recently, the ratio of AST to ALT has been found to be associated with bladder cancer [80]. Furthermore, Ha et al. suggested that the ratio of AST to ALT may also serve as a prognostic indicator in bladder cancer [81].
In this study, we trained and validated models with the features selected by WEKA. Furthermore, forward selection was performed with five different models to optimize model performance. Among the models, the lightGBM model had the highest performance, with an accuracy of 0.87, a precision of 0.86, an f1 score of 0.87, a sensitivity of 0.89, and an AUC of 0.93 in separating patients with bladder cancer from patients with cystitis (Table 6). Many studies have aimed at improving the diagnostic accuracy in bladder cancer. Wang et al. used machine learning algorithms to improve tumor marker-based screening for multiple cancers and obtained a sensitivity of 0.81 and a specificity of 0.64 [82]. Shao et al. applied ultra-performance liquid chromatography coupled with time-of-flight mass spectrometry to acquire metabolite profiles in 152 samples from patients with bladder cancer and hernia; the decision tree model in that study obtained an accuracy of 76.6%, a sensitivity of 71.88%, and a specificity of 86.67% [83]. Wittmann et al. developed a random forest model with a set of metabolites selected based on statistical significance, metabolic pathway coverage, and fold difference from global metabolomics profiling of urine. The model was tested in two independent cohorts and achieved AUCs of 0.81 and 0.78 [84]. Belugina et al. developed a non-invasive potentiometric multisensory system to perform urine analysis. Various models were used in that study and achieved an accuracy of 76%, a sensitivity of 80%, and a specificity of 75% [85]. Kouznetsova et al. used two modeling methods, multilayer perceptron (MLP) and stochastic gradient descent (SGD) with a logistic regression loss function, to discriminate bladder cancer patients using metabolite profiling. The best performing model was able to identify bladder cancer patients with an accuracy of 82.54% [43]. Compared to those studies, the model we built in this study provided better sensitivity and specificity. In future work, we aim to collect data from different cohorts and to build a model that can differentiate bladder cancer from cystitis and other cancers.

Conclusions
In summary, we used two-step feature selection to select eight clinical laboratory features and established a prediction model for bladder cancer with lightGBM. Furthermore, sampling techniques were used in our study to adjust for the imbalanced data. Our study indicates the potential of using clinical laboratory data to detect cancer.