Colorectal cancer is one of the most common types of cancer, and it can have a high mortality rate if left untreated or undiagnosed. The fact that CRC becomes symptomatic at advanced stages highlights the importance of early screening. The reference screening method for CRC is colonoscopy, an invasive, time-consuming procedure that requires sedation or anesthesia and is recommended from a certain age and above. The aim of this study was to build a machine learning classifier that can distinguish cancer from non-cancer samples. For this, circulating tumor cells were enumerated using flow cytometry. Their numbers were used as a training set for building an optimized SVM classifier that was subsequently used on a blind set. The SVM classifier’s accuracy on the blind samples was found to be 90.0%, sensitivity was 80.0%, specificity was 100.0%, precision was 100.0% and AUC was 0.98. Finally, in order to test the generalizability of our method, we also compared the performances of different classifiers developed by various machine learning models, using over-sampling datasets generated by the SMOTE algorithm. The results showed that SVM achieved the best performances according to the validation accuracy metric. Overall, our results demonstrate that CTCs enumerated by flow cytometry can provide significant information, which can be used in machine learning algorithms to successfully discriminate between healthy and colorectal cancer patients. The clinical significance of this method could be the development of a simple, fast, non-invasive cancer screening tool based on blood CTC enumeration by flow cytometry and machine learning algorithms.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.