A Mixture Method for Robust Detection HCV Early Diagnosis Biomarker with ML Approach and Molecular Docking

Given the substantial correlation between early diagnosis and prolonged survival in HCV patients, it is vital to identify a reliable and accessible biomarker. The purpose of this research was to identify accurate miRNA biomarkers to aid in the early diagnosis of HCV and to identify key target genes for anti-hepatic fibrosis therapeutics. The expression of 188 miRNAs in 42 HCV liver patients with different functional states and 23 normal livers was determined using RT-qPCR. After screening out differentially expressed miRNAs (DEmiRNAs), the target genes were predicted. To validate the target genes, an HCV microarray dataset was subjected to five machine learning algorithms (Random Forest, Adaboost, Bagging, Boosting, XGBoost), and important features were then selected based on the best model. After identification of hub target genes, molecular docking was performed to evaluate the potency of compounds that might hit key hub target genes. According to our data, eight DEmiRNAs are associated with the early stage and eight DEmiRNAs are linked to a deterioration in liver function and an increase in HCV severity. In the validation phase of target genes, model evaluation revealed that XGBoost (AUC = 0.978) outperformed the other machine learning algorithms. The maximal clique centrality algorithm identified CDK1 as a hub target gene, which is a predicted target of hsa-miR-335, hsa-miR-140, hsa-miR-152, and hsa-miR-195. Because viral proteins boost CDK1 activation for cell mitosis, pharmacological inhibition may have anti-HCV therapeutic promise. Molecular docking indicated strong binding affinity of paeoniflorin (−6.32 kcal/mol) and diosmin (−6.01 kcal/mol) for CDK1, suggesting that they may be attractive anti-HCV compounds. The findings of this study may provide significant evidence, in the context of miRNA biomarkers, for early-stage HCV diagnosis. In addition, the recognized hub target genes and small molecules with high binding affinity may constitute a novel set of therapeutic targets for HCV.


Introduction
Hepatitis C is a multifactorial disease, with a reported global prevalence of 56.8 million hepatitis C virus (HCV) infections on 1 January 2020 (95% uncertainty interval (UI) 55.2-67.8). Although this number is lower than in 2015, forecasts based on studies within 235 nations and territories indicate that we are not on track to meet global elimination targets by 2030 due to COVID-19 [1]. Recently developed direct-acting antiviral agents (DAAs) targeting the viral NS3 protease, NS5A, and the NS5B polymerase are highly effective in curing patients with HCV. However, global eradication of HCV remains complicated due to the lack of

Differentially Expressed miRNA
The ANOVA indicated that the expression of 34 miRNAs was significantly different in HCV patients compared to normal livers (8 upregulated and 26 downregulated). When comparing normal livers to Child-Pugh class A, B, and C, we identified 28 (6 upregulated and 22 downregulated), 19 (5 upregulated and 14 downregulated), and 28 (6 upregulated and 22 downregulated) differentially expressed miRNAs, respectively (Supplementary File S1). The Venn diagram shows that 13 DEmiRNAs are common to Child-Pugh class A, B, and C, and that 8 DEmiRNAs occur only in Child-Pugh class A but not in Child-B or C. Because their expression changed specifically in patients of functional stage A, these miRNAs are potential biomarkers for early diagnosis.
Moreover, 8 DEmiRNAs occurred only in Child-Pugh class C but not in Child-A or B (potential biomarkers for stage C). Our results did not show any DEmiRNAs specific to Child-Pugh class B, which means that exact differentiation of the moderate state (Child-B) can be difficult based on miRNA biomarkers (Figure 1 and Supplementary File S2). The statistical analysis further shows that the expression of hsa-miR-342-3p, hsa-miR-886-5p, and hsa-miR-210 increased slowly (not significantly) from the healthy state to functional stages A and B, but increased significantly at functional stage C (Supplementary File S3).
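
The shared and stage-specific groups read off the Venn diagram correspond to plain set operations. The following is a minimal sketch of that logic; the function name `stage_specific` and the identifiers `m1` to `m6` are hypothetical placeholders, not the study's miRNAs or its actual analysis code.

```python
def stage_specific(de_by_stage):
    """Split per-stage DEmiRNA sets into a shared core and stage-only subsets."""
    # miRNAs differentially expressed in every stage
    shared = set.intersection(*de_by_stage.values())
    only = {}
    for stage, mirnas in de_by_stage.items():
        # everything seen in any other stage
        others = set().union(*(m for s, m in de_by_stage.items() if s != stage))
        only[stage] = mirnas - others
    return shared, only


# Hypothetical placeholder identifiers (not data from the study).
de_by_stage = {
    "A": {"m1", "m2", "m3", "m4"},
    "B": {"m1", "m2", "m5"},
    "C": {"m1", "m5", "m6"},
}
shared, only = stage_specific(de_by_stage)
```

In this toy input, stage B has no stage-only members, which mirrors the situation described above in which no DEmiRNA was specific to Child-Pugh class B.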
Figure 1. The Venn diagram illustrates that 13 differentially expressed miRNAs (DEmiRNAs) are shared by Child-Pugh classes A, B, and C, and could be used as biomarkers for HCV diagnosis but not disease stage. The diagram also shows 8 specific biomarkers for early diagnosis of HCV, including hsa-miR-335, hsa-miR-140, hsa-miR-376c, and hsa-miR-939, and 8 DEmiRNAs that can all be potential biomarkers for functional stage C.

Validated Target Genes Using Ensemble Machine Learning Algorithms
Generally, miRNAs perform posttranscriptional functions by base-pairing to the mRNA 3′ untranslated regions. Therefore, the miRNet database was applied to predict the target genes of up-regulated and down-regulated specific DEmiRNAs in functional stages A and C, respectively (Supplementary File S4). Table 1 and Figure 2 represent features of the network between DEmiRNAs and predicted target genes. Topological analysis of the networks shows that hsa-mir-152 and hsa-mir-195 have the greatest number of connections between down-regulated specific DEmiRNAs in Child-Pugh A. Moreover, in the network of down-regulated specific DEmiRNAs in Child-Pugh C, hsa-mir-155 and hsa-miR-99a have higher betweenness centrality, and in the network of up-regulated DEmiRNAs, hsa-miR-886-5p and hsa-miR-342-3p play key roles. To validate the predicted target genes, five ensemble machine learning algorithms were applied to 22,149 genes from 459 samples with HCV and 459 samples without HCV obtained from the GSE34798 data set.
The predictive performance of the five algorithms for classifying genes as DEGs or non-DEGs was evaluated. Positive predictive value (PPV), recall (sensitivity), F-score (the harmonic mean of precision and recall), accuracy, AUC, and Brier score (BS) were used to evaluate the models. The results of the predictive performance comparison are displayed in Table 2. With an accuracy of 0.978, the XGBoost model demonstrated superior performance across all evaluation criteria compared to the other machine learning algorithms. To assess the performance of the models on test sets, the AUC-ROC for each model was calculated. XGBoost has the best performance, with an AUC over 0.97 in the test set (Figure 3a). The recall and PPV of the machine learning algorithms were also high, over 0.96. The precision-recall curve is represented in Figure 3b. In this curve, the XGBoost model shows the best performance, with a higher value (0.98).
Betweenness distribution graph for down-regulated specific DEmiRNAs in Child-Pugh C and predicted target genes. The diameter of the network is 5, the average path length is 2.01, and hsa-mir-155 (0.907471357) and hsa-miR-99a (0.6527755) have higher betweenness centrality and the greatest number of shortest paths, which indicates the key role of these nodes in the network.
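
For reference, the evaluation metrics named in this section can be written out explicitly. The sketch below uses their standard definitions, with AUC computed as the fraction of positive-negative pairs ranked correctly. The function name `evaluate`, the 0.5 decision threshold, and the toy labels and probabilities are assumptions for illustration. They are not the study's code or data.

```python
def evaluate(y_true, y_prob, threshold=0.5):
    """Accuracy, PPV, recall, F-score, Brier score, and AUC for binary labels."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    ppv = tp / (tp + fp) if tp + fp else 0.0          # precision
    recall = tp / (tp + fn) if tp + fn else 0.0       # sensitivity
    f_score = 2 * ppv * recall / (ppv + recall) if ppv + recall else 0.0
    # Brier score: mean squared difference between probability and outcome
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

    # AUC as the probability that a positive outranks a negative (ties count 0.5)
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    return {
        "accuracy": (tp + tn) / len(y_true),
        "ppv": ppv,
        "recall": recall,
        "f_score": f_score,
        "brier": brier,
        "auc": auc,
    }


# Toy example (hypothetical values): 1 = DEG, 0 = non-DEG.
metrics = evaluate([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.6, 0.2, 0.1])
```

A lower Brier score is better, whereas higher values are better for the other five metrics.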

Screen Hub Target Genes
To further investigate the hub target genes, a PPI network for the target genes in each category was constructed using protein interaction data obtained from the STRING (Search Tool for the Retrieval of Interacting Genes) database and then visualized using Cytoscape (Supplementary File S7). A statistical summary of the networks is presented in Supplementary File S8. For the networks of target genes of down-regulated DEmiRNAs in Child-Pugh A and C, we first employed the MCODE (Molecular Complex Detection) plugin for Cytoscape to find the top clusters (Supplementary File S9) and then used cytoHubba to identify hub genes within the top clusters using the maximal clique centrality (MCC) algorithm; the genes with the top MCC values were considered hub genes. Because the networks of down-regulated target genes (that is, the targets of up-regulated DEmiRNAs in Child-Pugh A and C) were small, cytoHubba was applied to them directly to identify the top target genes. The following target genes were identified: for down-regulated DEmiRNAs in Child-Pugh A, STAT1, TGFBR1, PTEN, CUL3, FOS, BAP1, SLC12A4, GNPDA1, and CDK1; for up-regulated DEmiRNAs in Child-Pugh A, SMAD4, MELK, and SRSF1; for down-regulated DEmiRNAs in Child-Pugh C, ATXN1, CDKN1B, EGR1, RB1, CALR, FN1, UBE2Z, YWHAQ, ZEB1, and ITGA5; and for up-regulated DEmiRNAs in Child-Pugh C, MYC, ILK, GTF2A1, and CDK2. The results of the gene ontology enrichment analysis indicate that the majority of the most abundant genes were enriched in biological regulation, cellular processes, developmental processes, metabolic processes, and responses to stimulus signaling (Figure 4a). Moreover, these genes were mainly involved in pathways such as cancer, hepatocellular carcinoma, hepatitis C, hepatitis B, and microRNAs in cancer, and play key roles in some molecular functions, including double-stranded DNA binding, protein-containing complex binding, transcription cis-regulatory, RNA polymerase, and kinase binding (Figure 4b).

Binding Affinity between Target Gene and Anti-Hepatic Fibrosis Small Molecules
After removing water and the ligand from CDK1 (PDB ID 6GU6, resolution 2.33 Å), polar hydrogens were added using the Discovery Studio Visualizer tool. Ligand preparation and docking between the protein and ligands were performed using AutoDock Vina (with no change in rotatable bonds and active torsions for the ligand). All docked poses had a root mean square deviation (RMSD) value below 2.0 Å. The united atom scoring function gave the highest binding affinities for CDK1 with paeoniflorin (−6.32 kcal/mol) and diosmin (−6.01 kcal/mol). Figure 5 visualizes the docking of the paeoniflorin molecule on the CDK1 protein. Detailed information for all 18 small molecules, including the binding energy between CDK1 and each natural small molecule, chemical formula, and mechanisms, is provided in Table 3.
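
To give the binding energies a more intuitive scale, a docking score can be converted to an approximate dissociation constant via the thermodynamic relation ΔG = RT ln(Kd). The sketch below makes two assumptions that are not part of the study: that the docking score can be read as a binding free energy, and that the temperature is 298.15 K. Docking scores are only rough estimates of ΔG, so the result is an order-of-magnitude guide at best. The function name `dissociation_constant` is hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def dissociation_constant(delta_g_kcal_per_mol, temperature_k=298.15):
    """Approximate Kd (mol/L) from a binding free energy via dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal_per_mol / (R_KCAL * temperature_k))


# Binding energies reported in the text (kcal/mol).
kd_paeoniflorin = dissociation_constant(-6.32)
kd_diosmin = dissociation_constant(-6.01)
```

Under these assumptions both values fall in the tens-of-micromolar range, and the more negative energy of paeoniflorin corresponds to the smaller Kd.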

Discussion
More than half of HCV-infected patients develop chronic infection. Therefore, early detection is essential for preventing or delaying disease progression [7]. miRNAs are closely correlated with liver-specific disease progression, and altered levels of miRNAs have even higher sensitivity and specificity than proteins. Therefore, some of them can serve as novel diagnostic biomarkers in HCV-infected patients. Thus, in the current study, the expression of 188 miRNAs in HCV patients and normal livers was compared to find DEmiRNAs as biomarkers of different functional states of HCV (A, B, and C). Our results show that in liver tissues of HCV patients in the early stage, the expression of hsa-miR-335, hsa-miR-939, hsa-miR-140, hsa-miR-376c, hsa-miR-203, hsa-miR-152, and hsa-miR-195 is significantly decreased, whereas the expression of hsa-miR-27b is highly up-regulated compared to the samples without liver disease; these miRNAs can be potential biomarkers for the diagnosis of HCV in the early stage. miR-27b is involved in lipid regulatory pathways and plays a crucial role in a self-inhibiting mechanism in HCV by downregulating the genes engaged in lipid metabolism that are required for HCV replication [8]. Therefore, an increase in miR-27b expression may serve as a potential marker for early-stage HCV patients. Although some studies demonstrated that miR-335, hsa-miR-203, and hsa-miR-152 can serve as biomarkers for the early diagnosis of HCV, this is still debated [9][10][11]. Then, to discern the target genes that may serve as therapeutic targets in the early stage of HCV, the potential target genes of the DEmiRNAs were predicted using miRNet.
To further verify the target genes, five ensemble machine learning algorithms (Random Forest, Adaboost, Bagging, Boosting, and XGBoost) were first applied to 22,149 genes from 459 samples with HCV and 459 samples without HCV to find the best model. Evaluation of the models based on PPV, recall, F-score, accuracy, AUC, and BS indicated that the XGBoost model (AUC 0.978) performed better on all evaluation metrics than the other machine learning algorithms. Intersecting the up-regulated DEGs (the output of feature selection based on XGBoost) with the target genes of down-regulated DEmiRNAs in Child-Pugh A yielded 148 validated up-regulated target genes. To find hub target genes, after acquiring the PPI network, the hub target genes were screened using MCODE and the cytoHubba plug-in. The results of the maximal clique centrality algorithm indicate SLC12A4 (a target gene of hsa-miR-939) and CDK1 (a target gene of hsa-miR-335, hsa-miR-140, hsa-miR-152, and hsa-miR-195) as key up-regulated target genes in the early stage of HCV disease.
Solute carrier family 12 member 4 (SLC12A4), a transporter-related gene, has been reported to be differentially expressed in HCV-infected patients by a number of prior studies [12][13][14]. It is one of the essential genes for HCV RNA replication; therefore, suppression of this gene may result in inhibition of HCV replication. CDK1 is a key regulatory kinase of the cell cycle in the CDK family. A previous study demonstrated that CDK1 is up-regulated in HCV [15] and that the viral protein increases the activity of the cyclin B1-CDK1 complex through the MAPK p38 and JNK pathways [16]. As CDK1 activation is required for common regulatory processes of the cell cycle, inhibition or interference by drugs has the potential to be an effective method of HCV treatment and progression prevention. To assess the potency of compounds that might hit CDK1, molecular docking was performed. The observed binding energy of the natural compounds artesunate and betulinic acid for CDK1 indicates their promising anti-fibrotic effects. Likewise, paeoniflorin (from Paeonia lactiflora) targeting CDK1 demonstrates various effects on liver diseases. Investigation in clinical trials shows that this small molecule plays a key role in inhibiting liver inflammation by regulating multiple signaling pathways. Diosmin is a natural flavone that has been shown to promote vascular health, although only some experimental results indicate anti-inflammatory and antioxidant effects, which might be related to CDK1 activity.

Patient Characteristics and Specimens
RT-qPCR was used to determine the expression of 188 miRNAs in the livers of 42 HCV patients with different functional states (Child-Pugh class A (n = 23), B (n = 11), and C (n = 7)) and in 23 normal livers (control). Normal liver tissue samples were obtained from patients without liver disease who were undergoing resection of metastases from colon cancer, at a distance of at least 5 cm from the tumor site. Histological examination verified the absence of pathological indicators in the collected tissues (the samples were used as controls in the previously published study). Liver parenchymal tissue samples from patients with hepatitis C infection (as determined by the standard clinical criteria) were obtained during elective liver transplantation. The stage of liver dysfunction was categorized using the Child-Pugh score. The characteristics of the subjects are presented in Table 4. Tissue biopsies were taken from livers (control and pathological) under standard general anesthesia no later than 15 min after blood flow arrest. The liver samples were immediately snap frozen in liquid nitrogen for protein analysis or immersed in RNAlater (Applied Biosystems, Darmstadt, Germany) for RNA analysis, and then stored at −80 °C. The study protocol was approved by the Bioethics Committee of the Pomeranian Medical University.

Micro-Ribonucleic Acids Expression and Statistical Analysis
From 40-50 mg of tissue samples, total RNA (including small RNA) was isolated using the Direct-zol RNA Miniprep Plus Kit (Zymo Research, Irvine, CA, USA); RNA concentration was then measured using a NanoDrop ultraviolet (UV) spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). Reverse transcription was performed using a TaqMan MicroRNA Reverse Transcription Kit (Applied Biosystems, Darmstadt, Germany) in two separate reactions, each containing a different pool of Megaplex RT Primers (Human Pools A and B; Applied Biosystems, Darmstadt, Germany) and 500 ng of total RNA in a reaction volume of 7.5 µL. Finally, quantitative PCR was performed using the TaqMan Array Cards (TaqMan Array Human MicroRNA A + B Cards Set v3.0, Thermo Fisher Scientific, Waltham, MA, USA) in a ViiA7 Real-Time PCR system (Thermo Fisher Scientific, Waltham, MA, USA). Of the 754 analyzed miRNAs, 188 unique miRNAs that had a Ct value below 32 were selected for further analysis (as recommended by the protocol from the assay provider). The relative quantity (RQ) of each miRNA was calculated using the ∆Ct method in relation to the mean expression of three endogenous controls (stably expressed small noncoding RNAs: U6 snRNA, RNU44, and RNU48). To investigate differentially expressed microRNAs (DEmiRNAs) between disease groups (in total and separately in different functional states) and control groups, an ANOVA was performed on the normalized microRNA counts using R version 4.1.3 (http://www.R-project.org, accessed on 20 December 2022). The Holm-Bonferroni method was used to correct for multiple testing, and miRNAs with an adjusted p-value ≤ 0.05 were considered DEmiRNAs.
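
The two numerical steps above, ∆Ct normalization and Holm-Bonferroni correction, can be sketched as follows. This is a minimal illustration. It assumes the common convention RQ = 2^(−∆Ct), with ∆Ct taken against the mean Ct of the endogenous controls. The exact formula used in the study is not stated, and the function names and example numbers are hypothetical.

```python
def relative_quantity(ct_target, ct_controls):
    """RQ = 2^(-dCt), with dCt = Ct(target) - mean Ct(endogenous controls)."""
    delta_ct = ct_target - sum(ct_controls) / len(ct_controls)
    return 2.0 ** (-delta_ct)


def holm_bonferroni(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the k-th smallest p-value is multiplied by (m - k + 1)
        candidate = min(1.0, (m - rank) * p_values[i])
        # enforce monotonicity of the adjusted values
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical Ct values: one miRNA against three endogenous controls.
rq = relative_quantity(25.0, [24.0, 23.0, 22.0])
# Hypothetical raw p-values for three miRNAs.
adj = holm_bonferroni([0.01, 0.04, 0.03])
is_demirna = [p <= 0.05 for p in adj]
```

In this toy input only the first miRNA passes the adjusted p-value ≤ 0.05 threshold, although all three raw p-values are below 0.05.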

Prediction and Validation of Potential Target Genes
Based on liver tissue, the miRNet database (https://www.mirnet.ca/miRNet/home.xhtml, accessed on 14 January 2023) was used to predict the target genes of the differentially expressed miRNAs. Key DEmiRNAs were extracted based on a topological analysis of miRNA-target gene networks. To further validate the screened target genes for DEmiRNAs, we used the GEO database to download HCV mRNA expression datasets. Using the GSE34798 dataset, we predicted differentially expressed genes (DEGs) among 22,149 genes of patients with HCV and subjects without HCV to further verify the target genes.

Selection of the Best Classification Model and Validation Based on Machine Learning Algorithms
To validate the predicted target genes, microarray data were subjected to five of the most frequently recommended machine learning models from prior research. To prevent overfitting, ensemble methods with high detection power were used to build stable models for predicting significant genes. Specifically, XGBoost, AdaBoost, Boosting, Bagging, and Random Forest were used to extract effective genes in hepatitis C disease from gene expression data. To tune the hyperparameters, the random search algorithm with ten-fold cross-validation was used, as suggested by Bergstra and Bengio [17]. All the data analyses were carried out in the Python programming language (v.  Table 5b.
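
The tuning procedure described above, random search scored by ten-fold cross-validation, can be illustrated independently of any particular ensemble library. The sketch below is a toy under stated assumptions. It tunes a single hypothetical hyperparameter (`bias`) of a simple one-feature midpoint classifier rather than the study's ensemble models, and all names (`random_search_cv`, `fit_midpoint`, `predict_midpoint`) are illustrative.

```python
import random


def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def random_search_cv(X, y, fit, predict, space, n_iter=20, k=10, seed=0):
    """Sample hyperparameters uniformly at random; keep the best mean CV accuracy."""
    rng = random.Random(seed)
    folds = k_fold_indices(len(X), k, rng)
    best_params, best_score = None, -1.0
    for _ in range(n_iter):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(X)) if i not in held]
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            correct = sum(predict(model, X[i]) == y[i] for i in held_out)
            scores.append(correct / len(held_out))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def fit_midpoint(X, y, bias=0.0):
    """Toy model: decision boundary at the midpoint of class means, shifted by bias."""
    mean0 = sum(x for x, t in zip(X, y) if t == 0) / y.count(0)
    mean1 = sum(x for x, t in zip(X, y) if t == 1) / y.count(1)
    return (mean0 + mean1) / 2 + bias


def predict_midpoint(boundary, x):
    return 1 if x >= boundary else 0


# Hypothetical one-feature data: label 1 when the value is at least 20.
X = [float(v) for v in range(40)]
y = [1 if v >= 20 else 0 for v in X]
best_params, best_score = random_search_cv(
    X, y, fit_midpoint, predict_midpoint, space={"bias": (-10.0, 10.0)}
)
```

Fixing the folds once before sampling means that every candidate setting is scored on the same splits, which keeps the comparison between settings fair.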

Table 5. (a) Confusion matrix of predicted versus actual class (positive = 1, negative = 0), with cells TP, FP, FN, and TN. (b) Evaluation metrics used to assess the performance of the models: accuracy, precision (positive predictive value, PPV), recall (sensitivity), and F-score (the harmonic mean of precision and recall).
The BS was also used to evaluate the performance of the models. This criterion evaluates the overall accuracy of the model as the mean squared difference between the actual value and the predicted value; a smaller value is preferable. The Matthews correlation coefficient (MCC) takes values between −1 and 1, and a high value means that both classes (DEGs and non-DEGs) are predicted accurately. Furthermore, ROC (receiver operating characteristic) and precision-recall plots were used to select the best model. The model that generated the highest values for all metrics was chosen to extract the top features, which were then subjected to recursive feature elimination. Subsequently, because of the negative feedback relationship between miRNA and mRNA, we employed Venny 2.1.0 (https://bioinfogp.cnb.csic.es/tools/venny/, accessed on 25 January 2023) to intersect down-regulated (up-regulated) DEGs with the target genes of the screened up-regulated (down-regulated) DEmiRNAs and so obtain the most likely potential target genes.
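
The intersection step follows directly from the negative feedback assumption: a target gene is retained only if its expression moves in the opposite direction to the miRNA that targets it. A minimal sketch with set operations is given below. The function name `validated_targets` and the gene labels `G1` to `G7` are hypothetical placeholders, not output from Venny or data from the study.

```python
def validated_targets(up_degs, down_degs, targets_of_up_mirnas, targets_of_down_mirnas):
    """Keep predicted targets whose expression is opposite to their miRNA's."""
    return {
        # down-regulated miRNA -> target expected to be up-regulated
        "up_regulated_targets": set(up_degs) & set(targets_of_down_mirnas),
        # up-regulated miRNA -> target expected to be down-regulated
        "down_regulated_targets": set(down_degs) & set(targets_of_up_mirnas),
    }


# Hypothetical placeholder gene labels.
result = validated_targets(
    up_degs={"G1", "G2", "G3"},
    down_degs={"G4", "G5"},
    targets_of_up_mirnas={"G3", "G5", "G6"},
    targets_of_down_mirnas={"G1", "G4", "G7"},
)
```

Here `G3` and `G4` are discarded because their direction of change matches that of the miRNA predicted to target them.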

Construction and Topological Analysis of Target Gene Networks
PPI networks were constructed using the Search Tool for the Retrieval of Interacting Genes (STRING) database (http://string-db.org, accessed on 3 February 2023) with the highest confidence threshold (0.900) to interpret the interactive relationships between validated target genes. Each score is derived by benchmarking analysis and generally corresponds to an estimate of the likelihood that a given association describes a functional connection between two genes. Using the Network Analyzer plug-in, topological properties were calculated for each node of the constructed networks in order to identify key target genes. Top modules were screened using the Cytoscape plug-in Molecular Complex Detection (MCODE, http://apps.cytoscape.org/apps/mcode, accessed on 6 February 2023). The following parameters were set for the MCODE analysis: degree cut-off = 2, node score cut-off = 0.2, k-core = 2, and maximum depth = 100. Next, the cytoHubba plug-in was used to identify the hub genes.
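
For readers unfamiliar with the maximal clique centrality (MCC) score used for hub detection, the sketch below computes it on a small undirected graph. It assumes the usual definition, in which the score of a node is the sum of (|C| − 1)! over all maximal cliques C containing that node, and it enumerates cliques with a basic Bron-Kerbosch recursion. This is an illustration, not the cytoHubba implementation, and the node labels are hypothetical.

```python
from math import factorial


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}   # v has been fully explored
            x = x | {v}   # exclude v from later cliques at this level

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(edges):
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C that contain v."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


# Hypothetical toy network: a triangle A-B-C with a pendant node D on C.
scores = mcc_scores([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
hub = max(scores, key=scores.get)
```

Node C belongs to both the triangle and the C-D edge, so it receives the highest score and would be ranked as the hub of this toy network.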

Functional and Pathway Enrichment Analysis
To determine the potential functional role of the key target genes, GO annotation and KEGG pathway enrichment analyses were performed for each miRNA using its respective target genes. Data were visualized using the igraph package in R.

Molecular Docking
To explore compounds that might bind to a significant target gene, we downloaded the structures of 18 small molecules with anti-hepatic fibrosis action in pdb format from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/, accessed on 7 January 2023), as well as the crystal structure of CDK1 (6GU6) from the RCSB PDB (https://www1.rcsb.org/, accessed on 7 January 2023). The small molecules are natural-product compounds from four different classes: alkaloids, flavonoids, terpenes, and phenols. Molecular docking was performed using the Discovery Studio Visualizer tool and AutoDockTools 1.5.6, with a rigid docking protocol that used a genetic algorithm to generate binding poses of the protein-ligand complexes.

Conclusions
For the treatment and prevention of HCV, resistance to DAAs and impediments to the development of a vaccine continue to pose the major challenges. Here, robust potential biomarkers to aid in the early diagnosis of HCV have been identified, along with potential target genes and anti-hepatic fibrosis molecules for HCV therapy. Altogether, these data support the idea that an alteration in hsa-miR-27b, hsa-miR-335, hsa-miR-140, hsa-miR-376c, hsa-miR-939, hsa-miR-203, hsa-miR-152, and hsa-miR-195 is associated with HCV in the early stage, and that an alteration in hsa-miR-342-3p, hsa-miR-99a, hsa-miR-454, hsa-miR-886-5p, hsa-miR-155, hsa-miR-210, and hsa-miR-193a-5p is associated with a deterioration in liver function and an increase in HCV severity. Validated target genes were obtained from the intersection between the features selected based on the XGBoost model and the target genes predicted by miRNet. The results of hub detection indicated that inhibition of or interference with SLC12A4 and CDK1 by drugs may have potential as an effective method of HCV therapy and progression prevention. Molecular docking revealed strong binding affinity of paeoniflorin and diosmin for CDK1, which may make them promising anti-HCV molecules.

Funding: The study was funded by the Minister of Science and Higher Education ("Regional Initiative of Excellence" in 2019-2022, project number 002/RID/2018/19, amount of financing 12,000,000 PLN), and the National Science Centre, Cracow, Poland UMO-2020/37/B/NZ7/01466.

Institutional Review Board Statement:
The study has been approved by the Bioethics Committee at the Pomeranian Medical University, approval number BN-001/11/07. The tissue samples were collected from participants who had signed voluntary informed consent. The study was conducted in accordance with the principles of the Declaration of Helsinki (2013).