Hepatitis C virus (HCV) is one of the most dangerous viruses worldwide. It is the foremost cause of the hepatic cirrhosis, and hepatocellular carcinoma, HCC. Detecting new key genes that play a role in the growth of HCC in HCV patients using machine learning techniques paves the way for producing accurate antivirals. In this work, there are two phases: detecting the up/downregulated genes using classical univariate and multivariate feature selection methods, and validating the retrieved list of genes using Insilico classifiers. However, the classification algorithms in the medical domain frequently suffer from a deficiency of training cases. Therefore, a deep neural network approach is proposed here to validate the significance of the retrieved genes in classifying the HCV-infected samples from the disinfected ones. The validation model is based on the artificial generation of new examples from the retrieved genes’ expressions using sparse autoencoders. Subsequently, the generated genes’ expressions data are used to train conventional classifiers. Our results in the first phase yielded a better retrieval of significant genes using Principal Component Analysis (PCA), a multivariate approach. The retrieved list of genes using PCA had a higher number of HCC biomarkers compared to the ones retrieved from the univariate methods. In the second phase, the classification accuracy can reveal the relevance of the extracted key genes in classifying the HCV-infected and disinfected samples.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited