Identifying MicroRNA Markers That Predict COVID-19 Severity Using Machine Learning Methods

Ren, Jingxin; Guo, Wei; Feng, Kaiyan; Huang, Tao; Cai, Yudong

doi:10.3390/life12121964

by Jingxin Ren ¹,†, Wei Guo ²,†, Kaiyan Feng ³, Tao Huang ⁴,⁵,* and Yudong Cai ¹,*

¹ School of Life Sciences, Shanghai University, Shanghai 200444, China

² Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China

³ Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China

⁴ Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China

⁵ CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Life 2022, 12(12), 1964; https://doi.org/10.3390/life12121964

Submission received: 27 October 2022 / Revised: 21 November 2022 / Accepted: 21 November 2022 / Published: 23 November 2022

(This article belongs to the Section Epidemiology)

Abstract

Individuals with the SARS-CoV-2 infection may experience a wide range of symptoms, from being asymptomatic to having a mild fever and cough to a severe respiratory impairment that results in death. MicroRNA (miRNA), which plays a role in the antiviral effects of SARS-CoV-2 infection, has the potential to be used as a novel marker to distinguish between patients who have various COVID-19 clinical severities. In the current study, the existing blood expression profiles reported in two previous studies were combined for deep analyses. The final profiles contained 1444 miRNAs in 375 patients from six categories, which were as follows: 30 patients with mild COVID-19 symptoms, 81 patients with moderate COVID-19 symptoms, 30 non-COVID-19 patients with mild symptoms, 137 patients with severe COVID-19 symptoms, 31 non-COVID-19 patients with severe symptoms, and 66 healthy controls. An efficient computational framework containing four feature selection methods (LASSO, LightGBM, MCFS, and mRMR) and four classification algorithms (DT, KNN, RF, and SVM) was designed to screen clinical miRNA markers, and a high-precision RF model with a 0.780 weighted F1 was constructed. Some miRNAs, including miR-24-3p, whose differential expression was discovered in patients with acute lung injury complications brought on by severe COVID-19, and miR-148a-3p, differentially expressed against SARS-CoV-2 structural proteins, were identified, thereby suggesting the effectiveness and accuracy of our framework. Meanwhile, we extracted classification rules based on the DT model for the quantitative representation of the role of miRNA expression in differentiating COVID-19 patients with different severities. The search for novel biomarkers that could predict the severity of the disease could aid in the clinical diagnosis of COVID-19 and in exploring the specific mechanisms of the complications caused by SARS-CoV-2 infection. Moreover, new therapeutic targets for the disease may be found.

Keywords: COVID-19; SARS-CoV-2; MicroRNA; feature analysis; biomarker; rules

1. Introduction

Since 2019, the novel coronavirus disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has had a severe negative impact on the world’s economy and public health. As of 23 June 2022, more than 539 million patients have been diagnosed with the infection, and more than 6.3 million deaths have been reported worldwide [1]. The presentation of SARS-CoV-2 infection varies widely among individuals, ranging from asymptomatic infection to mild fever and cough to severe respiratory impairment that could even lead to death [2,3,4,5]. The course of the disease cannot be accurately predicted. Given the uncertain clinical response to COVID-19 and the wide range of complications, novel biomarkers that predict the severity of the disease could aid in the clinical diagnosis of COVID-19, help to explore the specific mechanisms of the complications caused by SARS-CoV-2 infection, and reveal new therapeutic targets.

To date, studies have compared the hematological and biochemical parameters and inflammatory biomarkers in COVID-19 patients of different severity levels and have found changes in various parameters, including lymphocytes, the C-reactive protein, albumin, and small molecules, such as coagulation factors and various inflammatory chemokines, which are more easily observed in severely and critically ill patients [5,6,7,8,9,10,11]. These parameters may be related to more complex complications in the later stages of the disease. To better explore the role played by various parameters in the immune system during COVID-19, we screened miRNAs as new biomarkers to identify patients with different COVID-19 severities.

MicroRNAs (miRNAs) are a class of highly conserved short noncoding RNA molecules with regulatory roles not directly involved in protein coding. However, they regulate gene expression at the posttranscriptional level by specific complementary binding to mRNAs. Many miRNAs were identified as biomarkers for many diseases, including cancer and cardiovascular diseases [12,13]. Among them, the role of miRNAs in the process of viral infection was highlighted in the diseases caused by exogenous viruses. Some miRNAs could inhibit viral infection through channels, such as the regulation of viral protein receptors and chemokine expression [14,15], whereas partially dysregulated miRNAs cause inflammation and disrupt autophagy [16,17], thereby leading to cellular damage and promoting viral replication. Changes in miRNA expression profiles in COVID-19 patients were identified in previous studies [18,19]. Some of these differentially expressed miRNAs are involved in the antiviral effects of the SARS-CoV-2 infection [20,21,22,23,24]. Some help with viral entry into host cells, as well as viral protein synthesis and assembly [25,26]. Such differences in miRNA expression could also be observed in COVID-19 patients of different severity levels [27]. Thus, screening miRNAs as biomarkers to help identify COVID-19 patients or to characterize COVID-19 severity could facilitate more accurate clinical diagnoses and explore the potential regulatory mechanisms in the course of the SARS-CoV-2 infection.

In the current study, we collected data on the plasma miRNA levels in COVID-19 patients from two previous studies [28,29]. Zeng et al. performed high throughput sequencing to detect the miRNAs in the plasma samples collected from patients with different symptoms of COVID-19 and identified 2336 known miRNAs and 361 novel miRNAs, which were associated with the various clinical presentations and viral persistence levels of COVID-19. Gustafson et al. evaluated miRNA expression and co-regulatory network generation and found that the transcriptome of critically ill COVID-19 patients was significantly deregulated, largely unlike that of SARS-CoV-2-negative patients in the ICU. Integrating the results reported in the above two studies would yield a more accurate miRNA profile of COVID-19 patients, because both studies examined the miRNA expression of patients with various COVID-19 symptoms at the plasma level. Along with machine learning methods, we were able to identify the biomarkers that distinguish the severity of COVID-19 with a more comprehensive miRNA profile. In detail, four feature selection methods, including the least absolute shrinkage and selection operator (LASSO) [30,31], the light gradient boosting machine (LightGBM) [32], the Monte Carlo feature selection (MCFS) [33], and the minimum redundancy maximum relevance (mRMR) [34], were used to evaluate the features associated with the infection status, generating four feature lists that ranked the features according to their importance to the prediction. Then, optimal prediction models based on four classification algorithms (decision tree (DT) [35], k-nearest neighbor (KNN) [36], random forest (RF) [37], and support vector machine (SVM) [38]) were built by applying the incremental feature selection (IFS) method [39] to each feature list. The features used in these models were deemed essential for distinguishing the COVID-19 symptoms and could be novel biomarkers. 
Furthermore, some interesting decision rules were obtained with the DT, which could indicate special miRNA patterns for different COVID-19 symptoms. The usage of multiple methods could overcome the data preference for specific algorithms, extract more reliable features, and establish more accurate decision rules for quantitative differentiation. The set of rules could help to distinguish between COVID-19 patients, SARS-CoV-2-negative patients with upper respiratory symptoms, and the healthy population, thereby further determining the disease severity of individuals, especially the COVID-19 patients. These identified biomarkers and rules may help in the clinical diagnosis of COVID-19 and uncover its disease mechanisms.

2. Materials and Methods

2.1. Data and Preprocessing

The data on the plasma miRNA levels of COVID-19 patients from two previous studies (GSE166160 and GSE178246) were combined to create a synthesized dataset [28,29]. The study by Zeng et al. (GSE166160) involved 66 healthy controls, 16 asymptomatic COVID-19 patients, and 149 symptomatic COVID-19 patients, including 66 with moderate COVID-19 and 83 with severe COVID-19. The study by Gustafson et al. (GSE178246) involved 30 SARS-CoV-2-negative patients with mild upper respiratory tract symptoms, 14 patients with mild COVID-19, 15 patients with moderate COVID-19, 54 patients with severe COVID-19, and 31 SARS-CoV-2-negative patients from the ICU with upper respiratory tract symptoms. Generally, asymptomatic COVID-19 patients could be categorized as mild COVID-19 patients. In addition, we renamed two classes of samples; the SARS-CoV-2-negative patients with mild upper respiratory tract symptoms and SARS-CoV-2-negative patients from the ICU with upper respiratory tract symptoms were renamed as non-COVID-19 patients with mild symptoms, and non-COVID-19 patients with severe symptoms, respectively. Finally, 30 patients with mild COVID-19 symptoms, 81 patients with moderate COVID-19 symptoms, 30 non-COVID-19 patients with mild symptoms, 137 patients with severe COVID-19, 31 non-COVID-19 patients with severe symptoms, and 66 healthy controls were identified in this study. In terms of the features, a total of 1444 miRNAs shared in two datasets, GSE166160 and GSE178246, were used as the features of this study.
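The class sizes of the combined dataset follow directly from the per-study counts given above. The short Python sketch below reproduces that arithmetic and shows how the shared miRNA features could be intersected. The label strings and the helper names `merge_counts` and `shared_features` are illustrative assumptions, not part of the original pipeline.

```python
# Per-study sample counts as reported in Section 2.1.
zeng = {  # GSE166160
    "healthy control": 66,
    "asymptomatic COVID-19": 16,
    "moderate COVID-19": 66,
    "severe COVID-19": 83,
}
gustafson = {  # GSE178246
    "SARS-CoV-2-negative, mild symptoms": 30,
    "mild COVID-19": 14,
    "moderate COVID-19": 15,
    "severe COVID-19": 54,
    "SARS-CoV-2-negative, ICU": 31,
}

# Relabelling described in the text (label strings are illustrative).
RELABEL = {
    "asymptomatic COVID-19": "mild COVID-19",
    "SARS-CoV-2-negative, mild symptoms": "non-COVID-19, mild symptoms",
    "SARS-CoV-2-negative, ICU": "non-COVID-19, severe symptoms",
}


def merge_counts(*studies):
    """Sum the sample counts per class after applying the relabelling."""
    merged = {}
    for study in studies:
        for label, n in study.items():
            label = RELABEL.get(label, label)
            merged[label] = merged.get(label, 0) + n
    return merged


def shared_features(features_a, features_b):
    """Features present in both datasets, in the order of the first list."""
    in_b = set(features_b)
    return [f for f in features_a if f in in_b]


merged = merge_counts(zeng, gustafson)
total = sum(merged.values())
for label, n in merged.items():
    print(f"{label}: {n}")
print("total:", total)
```

The six merged classes sum to the 375 patients stated in the abstract.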

2.2. Feature Ranking Algorithms

Four powerful algorithms: LASSO [30,31], LightGBM [32], MCFS [33], and mRMR [34], were employed to assess the miRNA features and rank them in lists. Their brief descriptions are as follows.

2.2.1. LASSO

In LASSO, variables are selected and their coefficients shrunk to reduce the feature dimension based on linear regression models [30,31]. An L1-norm penalty on the coefficients is added to the prediction error, which drives the coefficients of weakly relevant variables to zero and thereby removes them from the model. Through this process, fewer feature variables were included in the model, and the feature dimension was effectively reduced. The importance of a feature was the absolute value of its coefficient. The features could then be ranked based on these absolute values. The LASSO program from Scikit-learn was run with the default parameters.
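To illustrate how an L1 penalty yields a coefficient-based ranking, the following is a minimal pure-Python sketch of LASSO by coordinate descent. It is not the Scikit-learn implementation used in the study. The toy data, the penalty value `alpha=0.1`, and the function names are assumptions chosen for illustration, and the sketch omits the intercept.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n) if norm else 0.0
    return w


def rank_by_abs_coefficient(w):
    """Feature indices ordered by decreasing absolute coefficient."""
    return sorted(range(len(w)), key=lambda j: -abs(w[j]))


# Toy data: y depends strongly on feature 0 and only weakly on feature 1.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.05, -1.95, 1.95, -2.05]
w = lasso_coordinate_descent(X, y, alpha=0.1)
ranking = rank_by_abs_coefficient(w)
print(w, ranking)
```

In this toy case the weak feature is shrunk to a coefficient of exactly zero, so it falls to the bottom of the ranking.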

2.2.2. LightGBM

LightGBM is a well-known boosting learning machine that combines many weak classifiers to achieve a single strong one [32]. It is an improved version of the gradient boosting decision tree (GBDT), which iteratively fits a new DT to the residual of the current model, with the residual approximated by the negative gradient of the loss function. It has the following advantages: it is fast, uses less memory, has higher accuracy, can support parallel learning, and can handle large quantities of data. In addition to classification, LightGBM sorts features based on their importance, as quantified by the number of times they are selected for building DTs. More frequently used features are ranked higher. The LightGBM program was implemented through the Python module lightgbm. The default parameters were adopted to execute this program.

2.2.3. MCFS

MCFS involves the construction of several bootstrap sets and randomly selects some feature sets; it is a powerful and widely used method for selecting features [33,40,41]. Some features are randomly selected as a feature subset, which is used to re-represent the new training samples. $M$ bootstrap sets are constructed from the new training samples to build $M$ decision trees. This process is repeated $T$ times, resulting in $M \times T$ trees. According to its involvement in building the $M \times T$ trees, a feature is rated according to its relative importance (RI):

$$RI_{g} = \sum_{\tau = 1}^{MT} (wAcc)^{u} \sum_{n_{g}(\tau)} IG(n_{g}(\tau)) \left(\frac{\text{no. in } n_{g}(\tau)}{\text{no. in } \tau}\right)^{v}, \tag{1}$$

where $wAcc$ is the weighted accuracy, $IG(n_{g}(\tau))$ stands for the information gain (IG) of $n_{g}(\tau)$ (a DT node $n$ with the attribute $g$), $\text{no. in } n_{g}(\tau)$ stands for the number of samples in $n_{g}(\tau)$, $\text{no. in } \tau$ stands for the number of samples in the tree root, and $u$ and $v$ are two fixed positive integers.

This study used the MCFS program downloaded from http://www.ipipan.eu/staff/m.draminski/mcfs.html (accessed on 4 June 2019). It was performed using the default parameters.
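Equation (1) can be read as a weighted sum over all nodes that split on a given feature. The Python sketch below evaluates it on two toy trees. The tree records, the feature names `miR-A` and `miR-B`, and the choice `u = v = 1` are hypothetical values for illustration, and the code is not the MCFS program cited above.

```python
def relative_importance(trees, feature, u=1.0, v=1.0):
    """Relative importance of `feature` following Eq. (1).

    Each tree is a dict with:
      "wacc"   : weighted accuracy of the tree,
      "root_n" : number of samples at the tree root,
      "nodes"  : list of (split_feature, info_gain, n_samples_in_node).
    """
    ri = 0.0
    for tree in trees:
        node_sum = 0.0
        for split_feature, info_gain, n_node in tree["nodes"]:
            if split_feature == feature:
                node_sum += info_gain * (n_node / tree["root_n"]) ** v
        ri += tree["wacc"] ** u * node_sum
    return ri


# Two hypothetical trees; all numbers are made up for illustration.
trees = [
    {"wacc": 0.8, "root_n": 100,
     "nodes": [("miR-A", 0.5, 100), ("miR-B", 0.2, 40)]},
    {"wacc": 0.5, "root_n": 100,
     "nodes": [("miR-B", 0.4, 100), ("miR-A", 0.1, 50)]},
]

ri_a = relative_importance(trees, "miR-A")
ri_b = relative_importance(trees, "miR-B")
print(ri_a, ri_b)
```

A feature scores highly when it splits large nodes with high information gain in accurate trees, which is why `miR-A` outranks `miR-B` in this toy example.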

2.2.4. mRMR

Using mRMR, the features were ranked based on the maximum relevance with the class variable and minimum redundancy between the features [34]. Mutual information (MI) is defined as follows:

$$I(x, y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \, dx \, dy, \tag{2}$$

where $p(x, y)$ represents the joint probability density of $x$ and $y$, and $p(x)$ and $p(y)$ represent the marginal probability densities of $x$ and $y$, respectively.

Let $\Omega$ represent the set of features still to be selected and $\Omega'$ the set of features that are already selected. Let $f_{i}$ denote a feature in $\Omega$, $f_{j}$ a feature in $\Omega'$, and $C$ the class label. At each step, mRMR selects the feature in $\Omega$ that maximizes the following criterion:

$$\max_{f_{i} \in \Omega} \left[ I(f_{i}, C) - \frac{1}{|\Omega'|} \sum_{f_{j} \in \Omega'} I(f_{i}, f_{j}) \right], \tag{3}$$

where $I(f_{i}, C)$ indicates the MI between $f_{i}$ and the class variable $C$, and $I(f_{i}, f_{j})$ represents the MI between $f_{i}$ and $f_{j}$.

The mRMR program was obtained from http://home.penglab.com/proj/mRMR/ (accessed on 2 May 2018) for this work and run with the default settings.
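The greedy selection in Equation (3) can be sketched in a few lines of Python. This sketch assumes discretized feature values and estimates MI from empirical counts, which replaces the integral of Equation (2) with a sum. The toy features `f1`, `f2`, and `f3` are hypothetical, and the code is not the mRMR program cited above.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical MI (in nats) between two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, labels):
    """Rank features greedily by relevance minus mean redundancy, as in Eq. (3).

    `features` maps a feature name to its list of discrete values.
    """
    remaining = list(features)
    selected = []
    relevance = {f: mutual_information(features[f], labels) for f in remaining}

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(
            mutual_information(features[f], features[s]) for s in selected
        ) / len(selected)
        return relevance[f] - redundancy

    while remaining:
        best = max(remaining, key=score)  # ties go to the earlier feature
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example: f2 duplicates f1, whereas f3 carries complementary information.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],
    "f2": [0, 0, 0, 0, 1, 1, 1, 1],
    "f3": [0, 0, 1, 1, 0, 0, 1, 1],
}
ranking = mrmr_rank(features, labels)
print(ranking)
```

Although all three toy features are equally relevant to the class label, the duplicate `f2` is ranked last because its redundancy with the already selected `f1` cancels its relevance.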

2.3. Incremental Feature Selection

The IFS method was utilized to find the appropriate number of features to build the optimal models [39,42,43]. The IFS procedure consists of the following steps:

1. For each of the four ranked feature lists, a feature matrix was constructed using the top $n$ ($n = 1, 2, \dots, k$) features, where $k$ is the total number of features;
2. 10-fold cross-validation was performed on each feature matrix to evaluate the performance of the classification model;
3. The most effective classification model and its feature subset were selected for each of the four feature rankings.

IFS was used to identify the optimal subset of features for the four feature ranking algorithms, and Venn diagrams were drawn to examine their intersections.
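The IFS loop described above can be sketched as follows. The `evaluate` callback stands in for training a classifier and computing its cross-validated weighted F1; here it is replaced by a toy scoring function, and all feature names are hypothetical.

```python
def incremental_feature_selection(ranked_features, evaluate, step=1):
    """Score every top-n prefix of a ranked list and keep the best one.

    `evaluate` maps a list of feature names to a score, for example the
    weighted F1 obtained from 10-fold cross-validation.
    Returns (best_n, best_score, curve), where curve is a list of (n, score).
    """
    curve = []
    best_n, best_score = 0, float("-inf")
    for n in range(step, len(ranked_features) + 1, step):
        score = evaluate(ranked_features[:n])
        curve.append((n, score))
        if score > best_score:
            best_n, best_score = n, score
    return best_n, best_score, curve


# Toy stand-in for cross-validation: informative features raise the score,
# noise features lower it slightly.
INFORMATIVE = {"f1", "f2", "f3"}


def toy_evaluate(subset):
    good = sum(f in INFORMATIVE for f in subset)
    return good - 0.1 * (len(subset) - good)


ranked = ["f1", "f2", "n1", "f3", "n2", "n3"]
best_n, best_score, curve = incremental_feature_selection(ranked, toy_evaluate)
print(best_n, best_score)
```

Plotting `curve` with the number of features on the x axis and the score on the y axis gives an IFS curve of the kind used to compare the classifiers.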

2.4. Synthetic Minority Oversampling Technique

Considering that the sample sizes of each class were different, the synthetic minority oversampling technique (SMOTE) method was used to balance the dataset [44,45]. This approach creates an equal number of samples for each class by linearly synthesizing new sample data using a randomly selected sample and one of its k-nearest neighbors. These additional data were added to train the classification model and improve its performance during the 10-fold cross-validation test. We used the SMOTE tool from https://github.com/scikit-learn-contrib/imbalanced-learn (accessed on 24 March 2020) with default parameters.
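The interpolation step at the core of SMOTE can be illustrated with a minimal pure-Python sketch. It is not the imbalanced-learn implementation used in the study; the function names, the toy points, and the fixed seed are assumptions for illustration.

```python
import random


def smote_sample(minority, k, rng):
    """Create one synthetic sample from a list of minority-class vectors."""
    base = rng.choice(minority)
    # k nearest neighbours of `base` by squared Euclidean distance.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # position on the segment from base to neighbour
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]


def oversample(minority, target_size, k=5, seed=0):
    """Extend a minority class with synthetic samples up to `target_size`."""
    rng = random.Random(seed)
    n_new = max(0, target_size - len(minority))
    synthetic = [smote_sample(minority, k, rng) for _ in range(n_new)]
    return minority + synthetic


# Three toy minority samples in two dimensions.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
balanced = oversample(minority, target_size=10)
print(len(balanced))
```

Each synthetic point lies on the line segment between a real sample and one of its nearest neighbours, so the new data stay within the region already occupied by the minority class.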

2.5. Classification Algorithm

To make the predictions, four classification models were constructed, as follows: DT [35], KNN [36], SVM [38], and RF [37], which are widely used in bioinformatics [46,47,48,49,50,51,52]. The DT is a flowchart-like tree structure with logical operations at its internal nodes. A leaf node holds a class label. The predictions are created by starting at the root node and sorting the data down the tree to a leaf node according to the outcome of the test defined for each branch [35]. In the KNN classification, new samples are assigned to their classes based on the voting of their k-nearest neighbor samples [36]. The SVM is a machine learning model based on statistical learning theory that linearly separates the data by locating the maximum margin separating hyperplane; it could map data into a high-dimensional space using a kernel function [38]. RF is a machine learning algorithm that constructs a number of tree classifiers based on the Bagging algorithm; the data are classified by majority voting [37]. These algorithms were implemented by packages collected in Scikit-learn, which were directly employed in this study. For convenience, they were executed with their default parameters.

2.6. Performance Evaluation

The weighted F1 was mainly used to evaluate the performance of the classification models. To compute the weighted F1, the F1 score for each class should be computed first. For the i-th class, its F1 score is defined as:

$$F1\text{-}score_{i} = \frac{2 \times Precision_{i} \times Recall_{i}}{Precision_{i} + Recall_{i}}, \tag{4}$$

$$Precision_{i} = \frac{TP_{i}}{TP_{i} + FP_{i}}, \tag{5}$$

$$Recall_{i} = \frac{TP_{i}}{TP_{i} + FN_{i}}, \tag{6}$$

where TP_i, FP_i and FN_i denote true positive, false positive, and false negative for the i-th class. The weighted F1 is defined as the weighted average of the F1 scores on all classes, where the weight is the proportion of the number of samples in one class to the total number of samples. The direct average of the F1 scores defines another widely used measurement, macro F1. In addition, we also used two other measurements: prediction accuracy (ACC) and the Matthews correlation coefficient (MCC). The ACC is defined as the proportion of correctly classified samples. For the MCC, two binary matrices X and Y should be constructed first, which denote the actual and predicted classes of all samples. Then, it could be computed using the following formula:

$$MCC = \frac{cov(X, Y)}{\sqrt{cov(X, X)\, cov(Y, Y)}}, \tag{7}$$

where $cov(\cdot, \cdot)$ represents the covariance of the two matrices.
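The four measurements can be computed from the true and predicted labels as in the Python sketch below. For the MCC, the sketch uses the count-based multiclass form, which is algebraically equivalent to the covariance definition in Equation (7) when $X$ and $Y$ are one-hot class indicator matrices. The function name and the toy labels are illustrative assumptions.

```python
from math import sqrt


def classification_metrics(y_true, y_pred):
    """Weighted F1, macro F1, ACC, and multiclass MCC for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    f1, support, predicted = {}, {}, {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1[c] = 2 * precision * recall / denom if denom else 0.0
        support[c] = tp + fn      # number of true samples in class c
        predicted[c] = tp + fp    # number of samples predicted as class c
    correct = sum(t == p for t, p in pairs)
    # Count-based multiclass MCC.
    num = correct * n - sum(predicted[c] * support[c] for c in labels)
    den = sqrt(
        (n * n - sum(v * v for v in predicted.values()))
        * (n * n - sum(v * v for v in support.values()))
    )
    return {
        "weighted_f1": sum(f1[c] * support[c] / n for c in labels),
        "macro_f1": sum(f1.values()) / len(labels),
        "acc": correct / n,
        "mcc": num / den if den else 0.0,
    }


# Toy labels: one sample of class "a" is misclassified as "b".
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The weighted F1 weights each class by its share of the samples, whereas the macro F1 treats all classes equally, so the two diverge when class sizes are imbalanced.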

3. Results

The key miRNAs and classification rules were extracted to help us distinguish between the COVID-19 patients, the SARS-CoV-2-negative patients with upper respiratory symptoms, and the healthy population, as well as to further determine the severity of the disease in individuals, especially the COVID-19 patients. The overall computational framework is shown in Figure 1. Next, we provided details of the results at each step.

3.1. Results of the Feature Ranking Algorithms

The results of ranking 1444 miRNA features using LASSO, LightGBM, MCFS, and mRMR are shown in Table S1. In the classification process, the features that participated in the top ranked list are important. The biological significance of these features and their significance as classification features are discussed in Section 4.

3.2. IFS Results and Feature Intersections

The four ordered miRNA lists created with the feature ranking algorithms were fed into the IFS method one by one, which incorporated four classification algorithms. With the IFS method, we acquired feature subsets of varied sizes by increasing the feature numbers in order. The models built on different feature subsets were compared for the assessment of their classification performance by using the weighted F1. The IFS curve was produced by charting the weighted F1 on the y axis and the number of features on the x axis. Different curves are drawn for various classification algorithms. The four feature ranking methods corresponded to four sets of IFS curves, as shown in Figure 2, Figure 3, Figure 4 and Figure 5. The detailed results are presented in Table S2.

For the feature list generated by LASSO, the IFS curves corresponding to the four classification algorithms are illustrated in Figure 2. It can be seen that the highest weighted F1 values for DT, KNN, RF, and SVM were 0.677, 0.733, 0.725, and 0.700, respectively. Such performance was obtained using the top 1335, 734, 300, and 406 features in the list. Accordingly, the optimal DT, KNN, RF, and SVM models were constructed with these features. Their detailed overall performance is listed in Table 1, and their performance in each class is shown in Figure 6A. Generally, the optimal KNN model was the best among all of the optimal models. The features used in this model (the top 734 features in the list) constituted the optimal feature set of LASSO.

For the feature list produced with LightGBM, four IFS curves are illustrated in Figure 3. DT, KNN, RF, and SVM generated the highest weighted F1 values of 0.698, 0.751, 0.780, and 0.723, respectively, when the top 418, 834, 114, and 275 features in the list were adopted. Likewise, four optimal models could be set up with these features. The detailed performance of these models can be found in Table 1 and Figure 6B. Evidently, the optimal RF model gave the best performance. Thus, the features used in this model (the top 114 features in the list) comprised the optimal feature set of LightGBM.

For the third feature list yielded with MCFS, the IFS curves are shown in Figure 4. The highest weighted F1 values for the four classification algorithms were 0.694, 0.741, 0.748, and 0.709, respectively. The top 639, 55, 239, and 381 features in the list were used to achieve such performance. With these features, the optimal DT, KNN, RF, and SVM models were built. Table 1 and Figure 6C list their detailed performance, from which we can see that the optimal RF model was the best. Accordingly, the optimal feature set of MCFS included the features used in this model (the top 239 features in the list).

For the last feature list generated with mRMR, Figure 5 shows the IFS curves. The highest weighted F1 values for DT, KNN, RF, and SVM were 0.673, 0.735, 0.741, and 0.713, respectively, which were obtained using the top 249, 823, 583, and 141 features in the list. Then, the optimal DT, KNN, RF, and SVM models were set up based on these features. Their detailed performance is listed in Table 1 and Figure 6D. The optimal RF model was still the best, and its used features (the top 583 features in the list) constituted the optimal feature set of mRMR.

Four optimal feature sets were constructed from four feature lists. However, there were many features in each set, which was not easy for extensive analyses. Thus, we tried to find the most important features of each feature set. By checking the IFS results on each feature list (Table S2), the model with the same classification algorithm could provide relatively high performance when fewer features were employed. For example, when the top 129 features in the optimal feature set of LASSO were considered, the KNN could yield a weighted F1 of 0.696, which was only a little lower than that of the optimal KNN model. These 129 features were much more important than the other 605 (734–129) features in the optimal feature set, as the rest of the features could increase the weighted F1 by only 0.037. The overall performance of the KNN model with the top 129 features is listed in Table 2. Clearly, its performance was slightly lower than that of the optimal KNN model (Table 1). We termed the set consisting of these features as the essential feature set of LASSO. Using similar arguments, the essential feature set of LightGBM, MCFS, and mRMR could be determined, which contained the top 62, 75, and 39 features in the corresponding optimal feature set. Table 2 shows the performances of the models with these essential features. They were all slightly inferior to the optimal models. As the feature lists were generated with different feature ranking algorithms, which have different principles, the analysis of all the essential feature sets was helpful in extracting the miRNA biomarkers as completely as possible. Thus, we intersected the essential feature sets of LASSO, LightGBM, MCFS, and mRMR to obtain the overlapping miRNAs that appeared in multiple sets. By combining the above feature sets, 275 features were obtained. Their distribution on four essential feature sets is illustrated with a Venn diagram, as shown in Figure 7. The detailed results can be found in Table S3. 
Several miRNAs appeared in multiple sets, suggesting that they may play important roles in differentiating patients with different severities of COVID-19. The biological significance of these miRNAs is discussed in Section 4.
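The overlap analysis behind the Venn diagram amounts to counting, for every miRNA, how many essential feature sets contain it. A minimal Python sketch is given below; the set contents `m1` to `m5` are placeholders and do not correspond to the miRNAs in Table S3.

```python
from collections import Counter


def membership_counts(feature_sets):
    """Map each feature to the number of named sets that contain it."""
    counts = Counter()
    for features in feature_sets.values():
        counts.update(set(features))
    return counts


def features_in_at_least(feature_sets, k):
    """Sorted features that appear in at least k of the sets."""
    counts = membership_counts(feature_sets)
    return sorted(f for f, c in counts.items() if c >= k)


# Placeholder essential feature sets (not the real miRNA lists).
essential = {
    "LASSO": ["m1", "m2", "m3"],
    "LightGBM": ["m1", "m2"],
    "MCFS": ["m1", "m4"],
    "mRMR": ["m1", "m2", "m5"],
}
union_size = len(membership_counts(essential))
in_three_or_more = features_in_at_least(essential, 3)
in_all_four = features_in_at_least(essential, 4)
print(union_size, in_three_or_more, in_all_four)
```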

3.3. Classification Rules

The classification rules were extracted using the white-box model DT, which makes the classification process visible. The IFS results showed that the DT achieved optimal classification performance when the numbers of features were 1335, 418, 639, and 249 for the LASSO, LightGBM, MCFS, and mRMR lists, respectively. The classification rules were extracted based on the optimal DT models under the feature numbers mentioned above. From the optimal DT model built on the list generated with LASSO, 63 classification rules were obtained. The other three DT models provided 69, 63, and 70 classification rules, respectively. Each rule comprises miRNA features and thresholds on their expression values. It describes how a feature’s high or low expression value contributes to distinguishing among the various classes of samples. The specific classification rules are shown in Table S4. The details of the quantitative rules are presented in Section 4.
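A rule of this kind is a conjunction of threshold conditions on miRNA expression values that leads to one class label. The Python sketch below shows one possible representation and how a sample would be matched against it. The feature names `miRNA_A` and `miRNA_B`, the thresholds, and the rule contents are hypothetical placeholders and are not entries from Table S4.

```python
import operator

OPS = {"<=": operator.le, ">": operator.gt}

# Hypothetical rules: names, thresholds, and labels are placeholders only.
RULES = [
    {"conditions": [("miRNA_A", ">", 5.0), ("miRNA_B", "<=", 2.0)],
     "label": "severe COVID-19"},
    {"conditions": [("miRNA_A", "<=", 5.0)],
     "label": "healthy control"},
]


def rule_matches(rule, sample):
    """True if the sample satisfies every condition of the rule."""
    return all(
        OPS[op](sample[mirna], threshold)
        for mirna, op, threshold in rule["conditions"]
    )


def classify(rules, sample, default=None):
    """Return the label of the first matching rule, or `default` if none match."""
    for rule in rules:
        if rule_matches(rule, sample):
            return rule["label"]
    return default


first = classify(RULES, {"miRNA_A": 6.0, "miRNA_B": 1.0})
second = classify(RULES, {"miRNA_A": 3.0, "miRNA_B": 9.0})
unmatched = classify(RULES, {"miRNA_A": 6.0, "miRNA_B": 3.0})
print(first, second, unmatched)
```

In a full decision tree, every sample reaches exactly one leaf, so the rules derived from it are mutually exclusive and exhaustive; the two placeholder rules here cover only part of the space.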

4. Discussion

We integrated data from two studies on the miRNA levels in the plasma of COVID-19 patients [28,29] and obtained multiple feature lists based on four feature ranking algorithms. The corresponding classification models were built. The features extracted from these feature lists may be an excellent set of potential biomarkers that could reveal differences between the different symptoms of infection and between the different levels of severity after being infected. These biomarkers could help us to analyze the disease in patients with COVID-19 and reveal the mechanisms of miRNA regulation under SARS-CoV-2 infection. We selected some top features (miRNAs) that appeared in multiple lists for discussion. The representative miRNAs and their predictive roles were summarized and presented in Table 3. The analyses on them were as follows.

4.1. Analysis of the Key Biomarkers

The first feature analyzed was miR-24-3p. This feature appeared in the feature lists of all four methods and was considered a valid biomarker. miR-24-3p acts directly on the 3’ UTR of neuropilin-1 (NRP-1) mRNA [85], which encodes a non-tyrosine kinase surface glycoprotein in vertebrates. The NRP-1 transmembrane isoform serves as a target for direct binding to the S1 polypeptide after SARS-CoV-2 protein hydrolysis [86], thereby enhancing infectivity during acute severe SARS-CoV-2 infection [87]. NRP-1 uses vascular endothelial growth factor (VEGF)165 as a bridge to the VEGF receptor 2 to induce angiogenesis [88]. VEGF165 binds to the b1 structural domain of NRP-1, which is also the site of SARS-CoV-2 binding to NRP-1. SARS-CoV-2 infection leads to the dysregulation of vascular and coagulation functions. miR-24-3p could directly inhibit the replication of the SARS-CoV-2 virus [53]. The differential expression of miR-24-3p was found in patients with sequelae of acute lung injury caused by severe COVID-19 [54]. These results suggested the research potential of miR-24-3p.

The second feature analyzed was miR-93-3p. This feature also appeared in the feature lists of all four methods and is a highly validated biomarker. No studies have elaborated on a direct relationship between this feature and COVID-19. miR-93-3p targets toll-like receptor 4 (TLR4) and inhibits its expression through a posttranscriptional regulatory mechanism [89]; TLR4 accelerates the production of proinflammatory factors through the NF-κB pathway [90,91,92]. The upregulated expression of miR-93-3p was observed in macrophages during HIV infection, suggesting the potential role of miR-93-3p in viral-related immune responses [93]. miR-93-3p has serious effects on the cardiovascular system, triggering symptoms that lead to a variety of conditions, including myocarditis [55] and the development of cytokine storms [56]. miR-93-3p might play an important role in COVID-19-triggered tissue damage.

The features discussed next were all present in the feature lists of three of the four methods. miR-148a-3p was the third feature analyzed. One study reported that miR-148a-3p regulates the Ras/MAPK/Erk signaling pathway to suppress cancer by targeting Son of sevenless 2 (one of the guanine nucleotide exchange factors) [94] and ultimately affects B cell differentiation by targeting transcription factor mRNAs, such as Bach2, Mitf, and others [95,96]. In a study on the differentiation of ICU patients from ward patients, miR-148a-3p was selected for the assessment of COVID-19 severity [57]; the mechanisms involved have not been elucidated yet. In addition, miR-148a was differentially expressed against SARS-CoV-2 structural proteins [57,58], which may allow these structural proteins to escape miRNA-mediated repression and aid in the spread of the virus in the early stages of infection.

The fourth feature analyzed was miR-139-5p. miR-139-5p was upregulated in SARS-CoV-2 infected cells [59] and was associated with COVID-19 severity [60]. miR-139-5p could reduce the production of proinflammatory cytokines and chemokines by targeting genes, such as MyD88, c-FOS, and Rap1b, and mediating NF-κB and STAT3 signaling pathways [97,98,99,100]. The overexpression of miR-139-5p in COVID-19 patients might therefore reflect the organism’s stress response to infectious inflammation.

The fifth feature analyzed was miR-199a-5p. Few reports have linked miR-199a-5p to COVID-19. In a study comparing the bronchial aspirate in critically ill COVID-19 patients versus non-COVID-19 patients, multiple miRNAs, including miR-199a-5p, were identified; these could be used as markers to distinguish between the two groups [61]. miR-199a-5p could reportedly inhibit the secondary envelope of the virus and exert antiviral effects by inhibiting the target ARHGAP21, which is a Golgi-localized GTPase-activating protein of Cdc42 [62]. miR-199a-5p possibly plays a role in COVID-19 infection.

The sixth feature analyzed was miR-17-3p, which has several potential targets, such as GFRA2 [101], PAR4 [102], and TIMP3 [103]. These targets are associated with cell proliferation and development. miR-17-3p could target the NIBP gene to inhibit NF-κB activation, thereby suppressing inflammation [104]. In COVID-19 patients, the miR-17-3p expression was proportional to the grading of infection [63]. miR-17-5p, another product of the miR-17 cluster, was also screened as an anti-SARS-CoV-2 miRNA; its expression was significantly downregulated in patients [54,64,65]. SARS-CoV-1 expands replication by hijacking miR-17 [66]. Taken together, the production levels of miR-17 clusters are highly correlated with COVID-19 infection.

The last feature analyzed was miR-200c-3p. The direct target of miR-200c-3p is angiotensin-converting enzyme 2 (ACE2), which is a functional receptor necessary for SARS-CoV-2 entry into cells [67,68]. miR-200c-3p has an inverse regulatory effect on ACE2 expression. The differential expression of miR-200c-3p was observed in patients with different severity levels [69,70,71]. In addition, miR-200c-3p directly targets IL-8 [105], which is a proinflammatory chemokine and influences the development of cytokine storms [106].

Zeng et al. [29] identified 85 differentially expressed miRNAs (DE-miRNAs) associated with COVID-19 and finally screened six miRNAs for differentiating SARS-CoV-2-infected individuals from healthy controls. The cross-sectional study by Gustafson et al. [28] examined the differentially expressed miRNAs between COVID-19 survivors and non-survivors and between SARS-CoV-2-positive and SARS-CoV-2-negative individuals. The current study integrated the two datasets to obtain a richer miRNA profile of patients with different levels of COVID-19 severity and used four feature ranking algorithms to obtain ranked lists of miRNAs. A total of 257 biomarkers were obtained for differentiating the samples, some of which are supported by the recent publications described above. A systematic comparison between the COVID-19-related miRNAs identified in this study and the differentially expressed miRNAs reported by Zeng et al. and Gustafson et al. [28,29] is shown in Figure 8. Several miRNAs identified in this study were also reported in those two studies, which supports the reliability of our results. Furthermore, some miRNAs were identified exclusively in our study and could be novel biomarkers.
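The kind of set comparison summarized in Figure 8 can be illustrated with a minimal sketch. The function name `venn_regions` and the miRNA identifiers below are hypothetical placeholders, not the lists compared in this study; the sketch only shows how shared and exclusive miRNAs could be separated once the three lists are available.

```python
def venn_regions(named_sets):
    """Split the union of several named sets into exclusive Venn regions.

    Returns a dict that maps a frozenset of set names to the items that
    belong to exactly those sets and to no others.
    """
    regions = {}
    all_items = set().union(*named_sets.values())
    for item in all_items:
        members = frozenset(name for name, s in named_sets.items() if item in s)
        regions.setdefault(members, set()).add(item)
    return regions


# Placeholder identifiers only; these are NOT the miRNAs shown in Figure 8.
mirna_lists = {
    "this_study": {"miR-A", "miR-B", "miR-C", "miR-D"},
    "zeng": {"miR-B", "miR-C", "miR-E"},
    "gustafson": {"miR-C", "miR-D", "miR-F"},
}

regions = venn_regions(mirna_lists)

# miRNAs reported by all three sources, and those found only in this study.
shared_by_all = regions.get(frozenset(mirna_lists), set())
exclusive_to_this_study = regions.get(frozenset({"this_study"}), set())

print(sorted(shared_by_all))
print(sorted(exclusive_to_this_study))
```

Keying each region by the exact combination of source names keeps the regions disjoint, so every miRNA is counted once, as in a Venn diagram.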

4.2. Analysis of the Classification Rules

As described above, the top features are supported by existing publications as contributors to the classification of the samples. Accordingly, we further established four quantitative rule sets, one for each feature ranking method, to determine the infection status of the samples by using the proposed computational method. Each rule set could help to distinguish between COVID-19 patients, SARS-CoV-2-negative patients with upper respiratory tract symptoms, and the healthy population, and could further determine the severity of infection, especially in COVID-19 patients. In line with the purpose of this study, the following discussion focuses on the parameters used to identify patients with severe COVID-19. These parameters are drawn from the high-accuracy top rules of the four methods and are discussed in order of their number of occurrences.
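To make the notion of a quantitative rule concrete, the following minimal sketch represents a rule as a conjunction of threshold conditions on miRNA expression values. The `Rule` class, the `classify` helper, the placeholder features `miR-X` and `miR-Y`, and both threshold values are assumptions made for illustration; the actual rules are those listed in Table S4.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# One condition is (feature, operator, threshold); operator is "<=" or ">".
Condition = Tuple[str, str, float]


@dataclass
class Rule:
    conditions: List[Condition]
    label: str

    def matches(self, sample: Dict[str, float]) -> bool:
        """Return True only if the sample satisfies every condition."""
        for feature, op, threshold in self.conditions:
            value = sample[feature]
            if op == "<=":
                if not value <= threshold:
                    return False
            elif op == ">":
                if not value > threshold:
                    return False
            else:
                raise ValueError(f"unsupported operator: {op}")
        return True


def classify(sample: Dict[str, float], rules: List[Rule],
             default: Optional[str] = None) -> Optional[str]:
    """Return the label of the first matching rule, or the default."""
    for rule in rules:
        if rule.matches(sample):
            return rule.label
    return default


# Hypothetical rule: low miR-X together with high miR-Y indicates severe disease.
# Feature names and thresholds are invented, not taken from Table S4.
rules = [
    Rule(conditions=[("miR-X", "<=", 2.0), ("miR-Y", ">", 5.0)],
         label="severe COVID-19"),
]

sample_a = {"miR-X": 1.2, "miR-Y": 6.3}
sample_b = {"miR-X": 3.5, "miR-Y": 6.3}

result_a = classify(sample_a, rules, default="unclassified")
result_b = classify(sample_b, rules, default="unclassified")
print(result_a, result_b)
```

In this representation, a "downregulated" parameter corresponds to a `<=` condition and an "upregulated" parameter to a `>` condition.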

First, miR-6750-5p appeared in the rules of all four methods. In all four rule sets, the downregulation of miR-6750-5p indicated severe COVID-19. As a recently discovered miRNA, miR-6750-5p has been poorly documented, and no report describes its direct relationship with COVID-19. A study on the miRNA profile of immunoglobulin A nephropathy of different severity levels claimed that the expression level of miR-6750-5p was significantly downregulated in the more severe forms of nephritis [72]. We inferred that the downregulation of this parameter in the rules might be associated with the high inflammation caused by severe COVID-19. miR-6750-5p is therefore a promising candidate biomarker that warrants further validation.

The second parameter discussed, miR-93-5p, appears in the rules of LASSO, MCFS, and mRMR, in which a lower level of miR-93-5p indicates severe COVID-19. According to publications, miR-93-5p could inhibit the expression of cytokines and chemokines, such as IL-6 and IL-8 [73,107,108,109]. These factors play a key role in the “cytokine storm” and inflammation that occur in patients with severe COVID-19 and reflect the severity of the infection [110,111,112]. Considering the report that miR-93-5p has a protective effect against inflammation [73], we suggest that miR-93-5p could potentially serve as a biomarker for identifying patients with severe COVID-19 and could be used as a therapeutic target.

The parameters that appeared in the rules of both the LASSO and MCFS methods are analyzed next. The first is miR-34a-5p, whose downregulation indicates severe COVID-19. Enrichment results on miR-34a-5p and its associated miRNAs showed that they are involved in endothelial cell function, inflammation, and the pathways of viral diseases [74]. miR-34a-5p could inhibit the NF-κB pathway to attenuate the inflammatory response [75,76] and alleviate lung injury [77]. miR-34a-5p expression was significantly reduced in the lung tissues or airway samples of COVID-19 patients [61,74], contrary to the results obtained with plasma samples. This finding might be related to the progression of SARS-CoV-2 infection or the immune response in different tissues of the organism. miR-29b-2-5p was also mentioned by both approaches and was upregulated in the rules for identifying severe COVID-19. Only one study has suggested its possible involvement in cytokine overproduction in COVID-19 patients [78]. miR-6762-3p, miR-4709-3p, miR-6791-5p, and miR-4685-3p were also selected by both the LASSO and MCFS approaches; however, because these miRNAs were discovered recently, few relevant reports are available, and their roles and mechanisms still need further experimental exploration.

In addition to the above parameters that appeared in multiple methods, each method also yielded its own unique parameters. miR-429, a member of the miR-200 family, belongs only to the LASSO method. Several members of this family are involved in cytokine regulation. miR-429 could reduce the production of inflammatory cytokines and alleviate lung injury [79]. Another study also raised the possibility of drug-induced miR-429 downregulation for myocarditis [80]. miR-429 is downregulated in the rule, possibly as a response of the organism to inflammation in patients with severe COVID-19. miR-205-5p belongs only to the LightGBM approach. Increased levels of proinflammatory factors, such as the NF-κB signaling indicator p65 and IL-6, were observed when miR-205-5p was inhibited [81]. In viral infection studies, miR-205-5p was used as a biomarker for the detection of influenza B [82] and was differentially expressed in cases of hepatitis C and HIV coinfection [83]. In the rule, miR-205-5p expression was downregulated. miR-873-5p belongs only to the MCFS approach and shows downregulation in the rule. miR-873-5p is regulated and mediated by IL-17, and treatment with anti-miR-873-5p protected against inflammation-induced injury [84]. The top rules of the mRMR approach contain fewer method-specific parameters, most of which are newly discovered miRNAs with few published studies. Their roles in SARS-CoV-2 infection need further investigation.

In conclusion, all four selected methods yielded quantitative rules with strong classification abilities, and each method has its own characteristic parameters in its rules. Many of the parameters in the top quantitative rules are supported by recent publications, which supports the reliability of the obtained rules.
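The ordering used in this section, in which parameters are discussed according to the number of methods whose top rules contain them, can be reproduced with a short sketch. The dictionary below lists only the miRNAs named in this section, assigned to methods as described in the text; it is not the complete content of Table S4, and the variable names are illustrative.

```python
from collections import Counter

# miRNAs named in Section 4.2, grouped by the method whose top rules use them.
shared_lasso_mcfs = {
    "miR-34a-5p", "miR-29b-2-5p", "miR-6762-3p",
    "miR-4709-3p", "miR-6791-5p", "miR-4685-3p",
}
rule_parameters = {
    "LASSO": {"miR-6750-5p", "miR-93-5p", "miR-429"} | shared_lasso_mcfs,
    "LightGBM": {"miR-6750-5p", "miR-205-5p"},
    "MCFS": {"miR-6750-5p", "miR-93-5p", "miR-873-5p"} | shared_lasso_mcfs,
    "mRMR": {"miR-6750-5p", "miR-93-5p"},
}

# Count in how many methods each miRNA occurs.
counts = Counter(
    mirna for params in rule_parameters.values() for mirna in params
)

# Sort by descending number of occurrences, then by name for a stable order.
ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

for mirna, n_methods in ordered:
    print(f"{mirna}\t{n_methods}")
```

Because each method contributes a set, a miRNA that appears in several rules of the same method is still counted once for that method.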

4.3. The Advantages and Limitations of the Proposed Method

The previous studies [28,29] performed only differential expression analyses. Differentially expressed miRNAs are not necessarily good biomarkers: they are usually numerous and highly redundant, which makes them unsuitable as biomarkers. The machine learning methods used here, especially the feature ranking algorithms, aim to find a small number of biomarkers with high predictive power.
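As a rough illustration of how a ranked feature list can be reduced to a compact biomarker set, the sketch below runs an incremental feature selection loop over a toy dataset. It is a simplification under stated assumptions: a 1-nearest-neighbor classifier scored by leave-one-out accuracy stands in for the classifiers and F1 measures used in this study, and the data, the ranking, and the function names are invented for illustration.

```python
import math


def loo_1nn_accuracy(X, y, feature_idx):
    """Leave-one-out accuracy of a 1-NN classifier using only feature_idx."""
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = math.inf, None
        for j in range(len(X)):
            if i == j:
                continue  # the held-out sample cannot be its own neighbor
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in feature_idx)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        if best_label == y[i]:
            correct += 1
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features):
    """Evaluate the top-k ranked features for every k and keep the best k."""
    best_k, best_score, scores = 0, -1.0, []
    for k in range(1, len(ranked_features) + 1):
        score = loo_1nn_accuracy(X, y, ranked_features[:k])
        scores.append(score)
        if score > best_score:  # strict ">" keeps the smallest k among ties
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score, scores


# Toy data (invented): feature 0 is informative, feature 1 is redundant with
# feature 0, and feature 2 is large-scale noise.
X = [
    [0.1, 0.2, 5.0],
    [0.2, 0.1, 1.0],
    [0.0, 0.3, 9.0],
    [1.0, 1.1, 4.0],
    [0.9, 1.0, 8.0],
    [1.1, 0.9, 2.0],
]
y = [0, 0, 0, 1, 1, 1]
ranking = [0, 1, 2]  # assumed output of a feature ranking algorithm

selected, best_score, scores = incremental_feature_selection(X, y, ranking)
print(selected, best_score, scores)
```

In this toy case the redundant second feature adds nothing and the noisy third feature lowers the score, so the loop stops at a single feature, which mirrors the argument that a long, redundant list of differentially expressed miRNAs is not needed for prediction.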

As for limitations, although we tested our method on two independent datasets (GSE166160 and GSE178246), the combined sample size was only approximately 400, which is too small and not representative enough for real applications. The identified biomarkers still need to be validated in larger samples and in patient cohorts infected with various SARS-CoV-2 strains.

5. Conclusions

In this study, a machine learning computational framework was designed to differentiate between COVID-19 patients, SARS-CoV-2-negative patients with upper respiratory symptoms, and the healthy population, and to further determine the severity of the disease, especially in COVID-19 patients. Four feature ranking algorithms (LASSO, LightGBM, MCFS, and mRMR) were used to filter out the miRNAs with significant impacts on the classification of the patients. Some miRNAs affecting SARS-CoV-2 infection, such as miR-24-3p, miR-93-3p, and miR-148a-3p, were identified, which supports the reliability of the results reported in this study. Subsequently, based on the screened miRNAs, RF and KNN showed efficient classification performances. Meanwhile, we extracted the decision process of the DT to form classification rules that describe the role of miRNAs in the classification process at a quantitative level. Finally, we used the literature to explain the rationale of the miRNAs and rules for distinguishing the different types of patients, which supports the reliability of this research.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/life12121964/s1, Table S1: Feature ranking results obtained using LASSO, LightGBM, MCFS, and mRMR; Table S2: Performance of IFS with different classification algorithms. Columns 2 to 7 represent the F1 scores on different classes. Columns 8 to 11 represent the overall performance measurements; Table S3: The intersection results of the essential feature sets extracted from four feature lists generated by LASSO, LightGBM, MCFS, and mRMR; Table S4: Classification rules generated by the optimal DT model. Each rule is comprised of miRNA features and their expression values. It describes how the feature’s high or low expression value affects the ability to distinguish among various classes of samples.

Author Contributions

Conceptualization, T.H. and Y.C.; methodology, J.R., W.G. and K.F.; validation, T.H.; formal analysis, J.R. and W.G.; data curation, T.H.; writing—original draft preparation, J.R. and W.G.; writing—review and editing, T.H. and Y.C.; supervision, Y.C.; funding acquisition, T.H. and Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key R&D Program of China (2022YFF1203202), the Strategic Priority Research Program of Chinese Academy of Sciences (XDA26040304, XDB38050200), and the Fund of the Key Laboratory of Tissue Microenvironment and Tumor of Chinese Academy of Sciences (202002).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in the Gene Expression Omnibus at https://www.ncbi.nlm.nih.gov/geo/ (accessed on 15 May 2022) under accession numbers GSE166160 and GSE178246 [28,29].

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

WHO Coronavirus (COVID-19) Dashboard with Vaccination Data. Available online: https://covid19.who.int/ (accessed on 23 June 2022).
Wu, C.; Chen, X.; Cai, Y.; Xia, J.; Zhou, X.; Xu, S.; Huang, H.; Zhang, L.; Zhou, X.; Du, C.; et al. Risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease 2019 pneumonia in Wuhan, China. JAMA Intern. Med. 2020, 180, 934–943.
Brodin, P. Why is COVID-19 so mild in children? Acta Paediatr. 2020, 109, 1082–1083.
Zhou, Y.; Han, T.; Chen, J.; Hou, C.; Hua, L.; He, S.; Guo, Y.; Zhang, S.; Wang, Y.; Yuan, J.; et al. Clinical and autoimmune characteristics of severe and critical cases of COVID-19. Clin. Transl. Sci. 2020, 13, 1077–1086.
Frater, J.L.; Zini, G.; d’Onofrio, G.; Rogers, H.J. COVID-19 and the clinical hematology laboratory. Int. J. Lab. Hematol. 2020, 42, 11–18.
Zhao, W.; Zha, X.; Wang, N.; Li, D.; Li, A.; Yu, S. Clinical characteristics and durations of hospitalized patients with COVID-19 in Beijing: A retrospective cohort study. Cardiovasc. Innov. Appl. 2021, 6, 33–44.
Perez, L. Acute phase protein response to viral infection and vaccination. Arch. Biochem. Biophys. 2019, 671, 196–202.
Chen, X.; Zhao, B.; Qu, Y.; Chen, Y.; Xiong, J.; Feng, Y.; Men, D.; Huang, Q.; Liu, Y.; Yang, B.; et al. Detectable serum severe acute respiratory syndrome coronavirus 2 viral load (rnaemia) is closely correlated with drastically elevated interleukin 6 level in critically ill patients with coronavirus disease 2019. Clin. Infect. Dis. 2020, 71, 1937–1942.
Yang, Y.; Shen, C.; Li, J.; Yuan, J.; Wei, J.; Huang, F.; Wang, F.; Li, G.; Li, Y.; Xing, L.; et al. Plasma ip-10 and mcp-3 levels are highly associated with disease severity and predict the progression of COVID-19. J. Allergy Clin. Immunol. 2020, 146, 119–127.e114.
Henry, B.M.; de Oliveira, M.H.S.; Benoit, S.; Plebani, M.; Lippi, G. Hematologic, biochemical and immune biomarker abnormalities associated with severe illness and mortality in coronavirus disease 2019 (COVID-19): A meta-analysis. Clin. Chem. Lab. Med. 2020, 58, 1021–1028.
Borges do Nascimento, I.J.; Cacic, N.; Abdulazeem, H.M.; von Groote, T.C.; Jayarajah, U.; Weerasekara, I.; Esfahani, M.A.; Civile, V.T.; Marusic, A.; Jeroncic, A.; et al. Novel coronavirus infection (COVID-19) in humans: A scoping review and meta-analysis. J. Clin. Med. 2020, 9, 941.
Rupaimoole, R.; Slack, F.J. Microrna therapeutics: Towards a new era for the management of cancer and other diseases. Nat. Rev. Drug Discov. 2017, 16, 203–222.
Zhou, S.S.; Jin, J.P.; Wang, J.Q.; Zhang, Z.G.; Freedman, J.H.; Zheng, Y.; Cai, L. Mirnas in cardiovascular diseases: Potential biomarkers, therapeutic targets and challenges. Acta Pharm. Sin. 2018, 39, 1073–1084.
Lodge, R.; Gilmore, J.C.; Ferreira Barbosa, J.A.; Lombard-Vadnais, F.; Cohen, É.A. Regulation of cd4 receptor and hiv-1 entry by micrornas-221 and -222 during differentiation of thp-1 cells. Viruses 2017, 10, 13.
Lodge, R.; Bellini, N.; Laporte, M.; Salahuddin, S.; Routy, J.P.; Ancuta, P.; Costiniuk, C.T.; Jenabian, M.A.; Cohen, É.A. Interleukin-1β triggers p53-mediated downmodulation of ccr5 and hiv-1 entry in macrophages through micrornas 103 and 107. mBio 2020, 11, e02314–e02320.
Fu, X.; Ouyang, Y.; Mo, J.; Li, R.; Fu, L.; Peng, S. Upregulation of microrna-328-3p by hepatitis b virus contributes to thle-2 cell injury by downregulating foxo4. J. Transl. Med. 2020, 18, 143.
Fu, Y.; Xu, W.; Chen, D.; Feng, C.; Zhang, L.; Wang, X.; Lv, X.; Zheng, N.; Jin, Y.; Wu, Z. Enterovirus 71 induces autophagy by regulating has-mir-30a expression to promote viral replication. Antivir. Res. 2015, 124, 43–53.
Li, C.; Hu, X.; Li, L.; Li, J.H. Differential microrna expression in the peripheral blood from human patients with COVID-19. J. Clin. Lab. Anal. 2020, 34, e23590.
Farr, R.J.; Rootes, C.L.; Rowntree, L.C.; Nguyen, T.H.O.; Hensen, L.; Kedzierski, L.; Cheng, A.C.; Kedzierska, K.; Au, G.G.; Marsh, G.A.; et al. Altered microrna expression in COVID-19 patients enables identification of SARS-CoV-2 infection. PLoS Pathog. 2021, 17, e1009759.
Khan, M.A.; Sany, M.R.U.; Islam, M.S.; Islam, A. Epigenetic regulator mirna pattern differences among SARS-CoV, SARS-CoV-2, and SARS-CoV-2 world-wide isolates delineated the mystery behind the epic pathogenicity and distinct clinical characteristics of pandemic COVID-19. Front. Genet. 2020, 11, 765.
Saçar Demirci, M.D.; Adan, A. Computational analysis of microrna-mediated interactions in SARS-CoV-2 infection. PeerJ 2020, 8, e9369.
Hosseini Rad Sm, A.; McLellan, A.D. Implications of SARS-CoV-2 mutations for genomic rna structure and host microrna targeting. Int. J. Mol. Sci. 2020, 21, 4807.
Girardi, E.; López, P.; Pfeffer, S. On the importance of host micrornas during viral infection. Front. Genet. 2018, 9, 439.
Bartel, D.P. Micrornas: Genomics, biogenesis, mechanism, and function. Cell 2004, 116, 281–297.
Wilson, L.; McKinlay, C.; Gage, P.; Ewart, G. Sars coronavirus e protein forms cation-selective ion channels. Virology 2004, 330, 322–331.
Masters, P.S. The molecular biology of coronaviruses. Adv. Virus Res. 2006, 66, 193–292.
Tang, H.; Gao, Y.; Li, Z.; Miao, Y.; Huang, Z.; Liu, X.; Xie, L.; Li, H.; Wen, W.; Zheng, Y.; et al. The noncoding and coding transcriptional landscape of the peripheral immune response in patients with COVID-19. Clin. Transl. Med. 2020, 10, e200.
Gustafson, D.; Ngai, M.; Wu, R.; Hou, H.; Schoffel, A.C.; Erice, C.; Mandla, S.; Billia, F.; Wilson, M.D.; Radisic, M.; et al. Cardiovascular signatures of COVID-19 predict mortality and identify barrier stabilizing therapies. EBioMedicine 2022, 78, 103982.
Zeng, Q.; Qi, X.; Ma, J.; Hu, F.; Wang, X.; Qin, H.; Li, M.; Huang, S.; Yang, Y.; Li, Y.; et al. Distinct mirnas associated with various clinical presentations of SARS-CoV-2 infection. Iscience 2022, 25, 104309.
Breiman, L. Better subset regression using the nonnegative garrote. Technometrics 1995, 37, 373–384.
Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 1996, 58, 267–288.
Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems 30 (NIPS 2017); Curran Associates Inc.: Red Hook, NY, USA, 2017.
Draminski, M.; Rada-Iglesias, A.; Enroth, S.; Wadelius, C.; Koronacki, J.; Komorowski, J. Monte carlo feature selection for supervised classification. Bioinformatics 2008, 24, 110–117.
Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
Safavian, S.R.; Landgrebe, D. A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. 1991, 21, 660–674.
Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
Liu, H.; Setiono, R. Incremental feature selection. Appl. Intell. 1998, 9, 217–230.
Chen, L.; Li, J.; Zhang, Y.H.; Feng, K.; Wang, S.; Zhang, Y.; Huang, T.; Kong, X.; Cai, Y.D. Identification of gene expression signatures across different types of neural stem cells with the monte-carlo feature selection method. J. Cell. Biochem. 2018, 119, 3394–3403.
Chen, X.; Jin, Y.; Feng, Y. Evaluation of plasma extracellular vesicle microrna signatures for lung adenocarcinoma and granuloma with monte-carlo feature selection method. Front. Genet. 2019, 10, 367.
Zhang, Y.H.; Guo, W.; Zeng, T.; Zhang, S.; Chen, L.; Gamarra, M.; Mansour, R.F.; Escorcia-Gutierrez, J.; Huang, T.; Cai, Y.D. Identification of microbiota biomarkers with orthologous gene annotation for type 2 diabetes. Front. Microbiol. 2021, 12, 711244.
Zhang, Y.H.; Li, Z.; Zeng, T.; Pan, X.; Chen, L.; Liu, D.; Li, H.; Huang, T.; Cai, Y.D. Distinguishing glioblastoma subtypes by methylation signatures. Front. Genet. 2020, 11, 604336.
Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. Smote: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
Pan, X.; Chen, L.; Liu, I.; Niu, Z.; Huang, T.; Cai, Y.D. Identifying protein subcellular locations with embeddings-based node2loc. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 19, 666–675.
Chen, L.; Li, Z.; Zhang, S.; Zhang, Y.-H.; Huang, T.; Cai, Y.-D. Predicting rna 5-methylcytosine sites by using essential sequence features and distributions. BioMed. Res. Int. 2022, 2022, 4035462.
Ding, S.; Wang, D.; Zhou, X.; Chen, L.; Feng, K.; Xu, X.; Huang, T.; Li, Z.; Cai, Y. Predicting heart cell types by using transcriptome profiles and a machine learning method. Life 2022, 12, 228.
Zhou, X.; Ding, S.; Wang, D.; Chen, L.; Feng, K.; Huang, T.; Li, Z.; Cai, Y.-D. Identification of cell markers and their expression patterns in skin based on single-cell rna-sequencing profiles. Life 2022, 12, 550.
Tang, S.; Chen, L. Iatc-nfmlp: Identifying classes of anatomical therapeutic chemicals based on drug networks, fingerprints and multilayer perceptron. Curr. Bioinform. 2022, 17, 814–824.
Wu, C.; Chen, L. A model with deep analysis on a large drug network for drug classification. Math. Biosci. Eng. 2022, 20, 383–401.
Yang, Y.; Chen, L. Identification of drug–disease associations by using multiple drug and disease networks. Curr. Bioinform. 2022, 17, 48–59.
Ran, B.; Chen, L.; Li, M.; Han, Y.; Dai, Q. Drug-drug interactions prediction using fingerprint only. Comput. Math. Methods Med. 2022, 2022, 7818480.
Wang, Y.; Zhu, X.; Jiang, X.M.; Guo, J.; Fu, Z.; Zhou, Z.; Yang, P.; Guo, H.; Guo, X.; Liang, G.; et al. Decreased inhibition of exosomal mirnas on SARS-CoV-2 replication underlies poor outcomes in elderly people and diabetic patients. Signal. Transduct. Target. 2021, 6, 300.
García-Hidalgo, M.C.; González, J.; Benítez, I.D.; Carmona, P.; Santisteve, S.; Pérez-Pons, M.; Moncusí-Moix, A.; Gort-Paniello, C.; Rodríguez-Jara, F.; Molinero, M.; et al. Identification of circulating microrna profiles associated with pulmonary function and radiologic features in survivors of SARS-CoV-2-induced ards. Emerg. Microbes. Infect. 2022, 11, 1537–1549.
Guzik, T.J.; Mohiddin, S.A.; Dimarco, A.; Patel, V.; Savvatis, K.; Marelli-Berg, F.M.; Madhur, M.S.; Tomaszewski, M.; Maffia, P.; D’Acquisto, F.; et al. COVID-19 and the cardiovascular system: Implications for risk assessment, diagnosis, and treatment options. Cardiovasc. Res. 2020, 116, 1666–1687.
Souchelnytskyi, S.; Nera, A.; Souchelnytskyi, N. COVID-19 engages clinical markers for the management of cancer and cancer-relevant regulators of cell proliferation, death, migration, and immune response. Sci. Rep. 2021, 11, 5228.
de Gonzalo-Calvo, D.; Benítez, I.D.; Pinilla, L.; Carratalá, A.; Moncusí-Moix, A.; Gort-Paniello, C.; Molinero, M.; González, J.; Torres, G.; Bernal, M.; et al. Circulating microrna profiles predict the severity of COVID-19 in hospitalized patients. Transl. Res. 2021, 236, 147–159.
Mallick, B.; Ghosh, Z.; Chakrabarti, J. Micrornome analysis unravels the molecular basis of sars infection in bronchoalveolar stem cells. PLoS ONE 2009, 4, e7837.
Chow, J.T.; Salmena, L. Prediction and analysis of SARS-CoV-2-targeting microrna in human lung epithelium. Genes 2020, 11, 1002.
Li, C.; Wu, A.; Song, K.; Gao, J.; Huang, E.; Bai, Y.; Liu, X. Identifying putative causal links between micrornas and severe COVID-19 using mendelian randomization. Cells 2021, 10, 3504.
Molinero, M.; Benítez, I.D.; González, J.; Gort-Paniello, C.; Moncusí-Moix, A.; Rodríguez-Jara, F.; García-Hidalgo, M.C.; Torres, G.; Vengoechea, J.J.; Gómez, S.; et al. Bronchial aspirate-based profiling identifies microrna signatures associated with COVID-19 and fatal disease in critically ill patients. Front. Med. 2021, 8, 756517.
Kobayashi, K.; Suemasa, F.; Sagara, H.; Nakamura, S.; Ino, Y.; Kobayashi, K.; Hiramatsu, H.; Haraguchi, T.; Kurokawa, K.; Todo, T.; et al. Mir-199a inhibits secondary envelopment of herpes simplex virus-1 through the downregulation of cdc42-specific gtpase activating protein localized in golgi apparatus. Sci. Rep. 2017, 7, 6650.
Keikha, R.; Hashemi-Shahri, S.M.; Jebali, A. The relative expression of mir-31, mir-29, mir-126, and mir-17 and their mrna targets in the serum of COVID-19 patients with different grades during hospitalization. Eur. J. Med. Res. 2021, 26, 75.
Sardar, R.; Satish, D.; Gupta, D. Identification of novel SARS-CoV-2 drug targets by host micrornas and transcription factors co-regulatory interaction network analysis. Front. Genet. 2020, 11, 571274.
Banaganapalli, B.; Al-Rayes, N.; Awan, Z.A.; Alsulaimany, F.A.; Alamri, A.S.; Elango, R.; Malik, M.Z.; Shaik, N.A. Multilevel systems biology analysis of lung transcriptomics data identifies key mirnas and potential mirna target genes for SARS-CoV-2 infection. Comput. Biol. Med. 2021, 135, 104570.
Fayyad-Kazan, M.; Makki, R.; Skafi, N.; El Homsi, M.; Hamade, A.; El Majzoub, R.; Hamade, E.; Fayyad-Kazan, H.; Badran, B. Circulating mirnas: Potential diagnostic role for coronavirus disease 2019 (COVID-19). Infect. Genet. Evol. 2021, 94, 105020.
Wang, B.; Li, D.; Fiselier, A.; Kovalchuk, I.; Kovalchuk, O. New akt-dependent mechanisms of anti-COVID-19 action of high-cbd cannabis sativa extracts. Cell Death Discov. 2022, 8, 110.
Bozgeyik, I. Therapeutic potential of mirnas targeting SARS-CoV-2 host cell receptor ace2. Meta Gene 2021, 27, 100831.
Pimenta, R.; Viana, N.I.; Dos Santos, G.A.; Candido, P.; Guimarães, V.R.; Romão, P.; Silva, I.A.; de Camargo, J.A.; Hatanaka, D.M.; Queiroz, P.G.S.; et al. Mir-200c-3p expression may be associated with worsening of the clinical course of patients with COVID-19. Mol. Biol. Res. Commun. 2021, 10, 141–147.
Abdolahi, S.; Hosseini, M.; Rezaei, R.; Mohebbi, S.R.; Rostami-Nejad, M.; Mojarad, E.N.; Mirjalali, H.; Yadegar, A.; Asadzadeh Aghdaei, H.; Zali, M.R.; et al. Evaluation of mir-200c-3p and mir-421-5p levels during immune responses in the admitted and recovered COVID-19 subjects. Infect. Genet. Evol. 2022, 98, 105207.
Soltani, S.; Zandi, M. Mir-200c-3p upregulation and ace2 downregulation via bacterial lps and lta as interesting aspects for COVID-19 treatment and immunity. Mol. Biol. Rep. 2021, 48, 5809–5810.
Wang, N.; Bu, R.; Duan, Z.; Zhang, X.; Chen, P.; Li, Z.; Wu, J.; Cai, G.; Chen, X. Profiling and initial validation of urinary micrornas as biomarkers in iga nephropathy. PeerJ 2015, 3, e990.
Liu, J.; Jiang, M.; Deng, S.; Lu, J.; Huang, H.; Zhang, Y.; Gong, P.; Shen, X.; Ruan, H.; Jin, M.; et al. Mir-93-5p-containing exosomes treatment attenuates acute myocardial infarction-induced myocardial damage. Mol. Ther. Nucleic Acids 2018, 11, 103–115.
Centa, A.; Fonseca, A.S.; Ferreira, S.; Azevedo, M.L.V.; Vaz de Paula, C.B.; Nagashima, S.; Machado-Souza, C.; Miggiolaro, A.; Baena, C.P.; de Noronha, L.; et al. Deregulated mirna expression is associated with endothelial dysfunction in post-mortem lung biopsies of COVID-19 patients. Am. J. Physiol. Lung Cell Mol. Physiol. 2020, 320, L405–L412.
Zhou, J.; Shuang, O.; Li, J.; Cai, Z.; Wu, C.; Wang, W. Mir-34a alleviates spinal cord injury via tlr4 signaling by inhibiting hmgb-1. Exp. Ther. Med. 2019, 17, 1912–1918.
Kim, H.J.; Joe, Y.; Yu, J.K.; Chen, Y.; Jeong, S.O.; Mani, N.; Cho, G.J.; Pae, H.O.; Ryter, S.W.; Chung, H.T. Carbon monoxide protects against hepatic ischemia/reperfusion injury by modulating the mir-34a/sirt1 pathway. Biochim. Biophys. Acta 2015, 1852, 1550–1559.
Yuan, J.; Zhang, Y. Sevoflurane reduces inflammatory factor expression, increases viability and inhibits apoptosis of lung cells in acute lung injury by microrna-34a-3p upregulation and stat1 downregulation. Chem. Biol. Interact. 2020, 322, 109027.
Li, C.X.; Chen, J.; Lv, S.K.; Li, J.H.; Li, L.L.; Hu, X. Whole-transcriptome rna sequencing reveals significant differentially expressed mrnas, mirnas, and lncrnas and related regulating biological pathways in the peripheral blood of COVID-19 patients. Mediat. Inflamm. 2021, 2021, 6635925.
Xiao, J.; Tang, J.; Chen, Q.; Tang, D.; Liu, M.; Luo, M.; Wang, Y.; Wang, J.; Zhao, Z.; Tang, C.; et al. Mir-429 regulates alveolar macrophage inflammatory cytokine production and is involved in lps-induced acute lung injury. Biochem. J. 2015, 471, 281–291.
Lv, H.; Zhang, S.; Hao, X. Swainsonine protects h9c2 cells against lipopolysaccharide-induced apoptosis and inflammatory injury via down-regulating mir-429. Cell Cycle 2020, 19, 207–217.
Yu, X.; Chen, X.; Sun, T. Microrna-205-5p targets hmgb1 to suppress inflammatory responses during lung injury after hip fracture. Biomed. Res. Int. 2019, 2019, 7304895.
Peng, F.; He, J.; Loo, J.F.; Yao, J.; Shi, L.; Liu, C.; Zhao, C.; Xie, W.; Shao, Y.; Kong, S.K.; et al. Identification of micrornas in throat swab as the biomarkers for diagnosis of influenza. Int. J. Med. Sci. 2016, 13, 77–84.
Martínez-González, E.; Brochado-Kith, Ó.; Gómez-Sanz, A.; Martín-Carbonero, L.; Jimenez-Sousa, M.; Martínez-Román, P.; Resino, S.; Briz, V.; Fernández-Rodríguez, A. Comparison of methods and characterization of small rnas from plasma extracellular vesicles of hiv/hcv coinfected patients. Sci. Rep. 2020, 10, 11140.
Fernández-Tussy, P.; Fernández-Ramos, D.; Lopitz-Otsoa, F.; Simón, J.; Barbier-Torres, L.; Gomez-Santos, B.; Nuñez-Garcia, M.; Azkargorta, M.; Gutiérrez-de Juan, V.; Serrano-Macia, M.; et al. Mir-873-5p targets mitochondrial gnmt-complex ii interface contributing to non-alcoholic fatty liver disease. Mol. Metab. 2019, 29, 40–54.
Mone, P.; Gambardella, J.; Wang, X.; Jankauskas, S.S.; Matarese, A.; Santulli, G. Mir-24 targets the transmembrane glycoprotein neuropilin-1 in human brain microvascular endothelial cells. Non-Coding RNA 2021, 7, 9.
Daly, J.L.; Simonetti, B.; Klein, K.; Chen, K.E.; Williamson, M.K.; Antón-Plágaro, C.; Shoemark, D.K.; Simón-Gracia, L.; Bauer, M.; Hollandi, R.; et al. Neuropilin-1 is a host factor for SARS-CoV-2 infection. Science 2020, 370, 861–865.
Cantuti-Castelvetri, L.; Ojha, R.; Pedro, L.D.; Djannatian, M.; Franz, J.; Kuivanen, S.; van der Meer, F.; Kallio, K.; Kaya, T.; Anastasina, M.; et al. Neuropilin-1 facilitates SARS-CoV-2 cell entry and infectivity. Science 2020, 370, 856–860.
Xie, Y.; Mansouri, M.; Rizk, A.; Berger, P. Regulation of vegfr2 trafficking and signaling by rab gtpase-activating proteins. Sci. Rep. 2019, 9, 13342.
Tang, B.; Xuan, L.; Tang, M.; Wang, H.; Zhou, J.; Liu, J.; Wu, S.; Li, M.; Wang, X.; Zhang, H. Mir-93-3p alleviates lipopolysaccharide-induced inflammation and apoptosis in h9c2 cardiomyocytes by inhibiting toll-like receptor 4. Pathol. Res. Pract. 2018, 214, 1686–1693.
Płóciennikowska, A.; Hromada-Judycka, A.; Borzęcka, K.; Kwiatkowska, K. Co-operation of tlr4 and raft proteins in lps-induced pro-inflammatory signaling. Cell Mol. Life Sci. 2015, 72, 557–581.
Xu, Y.; Jin, H.; Yang, X.; Wang, L.; Su, L.; Liu, K.; Gu, Q.; Xu, X. Microrna-93 inhibits inflammatory cytokine production in lps-stimulated murine macrophages by targeting irak4. FEBS Lett. 2014, 588, 1692–1698.
Yan, X.T.; Ji, L.J.; Wang, Z.; Wu, X.; Wang, Q.; Sun, S.; Lu, J.M.; Zhang, Y. Microrna-93 alleviates neuropathic pain through targeting signal transducer and activator of transcription 3. Int. Immunopharmacol. 2017, 46, 156–162.
Xu, Z.K.; Asahchop, E.L.; Branton, W.G.; Gelman, B.B.; Power, C.; Hobman, T.C. Micrornas upregulated during hiv infection target peroxisome biogenesis factors: Implications for virus biology, disease mechanisms and neuropathology. PLoS Pathog. 2017, 13, e1006360.
Xie, Q.; Yu, Z.; Lu, Y.; Fan, J.; Ni, Y.; Ma, L. Microrna-148a-3p inhibited the proliferation and epithelial-mesenchymal transition progression of non-small-cell lung cancer via modulating ras/mapk/erk signaling. J. Cell Physiol. 2019, 234, 12786–12799. [Google Scholar] [CrossRef] [PubMed]
Porstner, M.; Winkelmann, R.; Daum, P.; Schmid, J.; Pracht, K.; Côrte-Real, J.; Schreiber, S.; Haftmann, C.; Brandl, A.; Mashreghi, M.F.; et al. Mir-148a promotes plasma cell differentiation and targets the germinal center transcription factors mitf and bach2. Eur. J. Immunol. 2015, 45, 1206–1215. [Google Scholar] [CrossRef] [PubMed]
Miyashita, Y.; Yoshida, T.; Takagi, Y.; Tsukamoto, H.; Takashima, K.; Kouwaki, T.; Makino, K.; Fukushima, S.; Nakamura, K.; Oshiumi, H. Circulating extracellular vesicle micrornas associated with adverse reactions, proinflammatory cytokine, and antibody production after COVID-19 vaccination. NPJ Vaccines 2022, 7, 16. [Google Scholar] [CrossRef] [PubMed]
Zhao, X.; Wang, M.; Sun, Z.; Zhuang, S.; Zhang, W.; Yang, Z.; Han, X.; Nie, S. Microrna-139-5p improves sepsis-induced lung injury by targeting rho-kinase1. Exp. Ther. Med. 2021, 22, 1059. [Google Scholar] [CrossRef]
Zhang, X.; Liu, X.; Chang, R.; Li, Y. Mir-139-5p protects septic mice with acute lung injury by inhibiting toll-like receptor 4/myeloid differentiation factor 88/nuclear factor-κB signaling pathway. Clinics 2021, 76, e2484.
Katsumi, T.; Ninomiya, M.; Nishina, T.; Mizuno, K.; Tomita, K.; Haga, H.; Okumoto, K.; Saito, T.; Shimosegawa, T.; Ueno, Y. Mir-139-5p is associated with inflammatory regulation through c-fos suppression, and contributes to the progression of primary biliary cholangitis. Lab. Investig. A J. Technol. Methods Pathol. 2016, 96, 1165–1177.
Zou, F.; Mao, R.; Yang, L.; Lin, S.; Lei, K.; Zheng, Y.; Ding, Y.; Zhang, P.; Cai, G.; Liang, X.; et al. Targeted deletion of mir-139-5p activates mapk, nf-κb and stat3 signaling and promotes intestinal inflammation and colorectal cancer. Febs. J. 2016, 283, 1438–1452.
Song, R.; Liu, Q.; Liu, T.; Li, J. Connecting rules from paired mirna and mrna expression data sets of hcv patients to detect both inverse and positive regulatory relationships. BMC Genom. 2015, 16, S11.
Du, W.W.; Li, X.; Li, T.; Li, H.; Khorshidi, A.; Liu, F.; Yang, B.B. The microrna mir-17-3p inhibits mouse cardiac fibroblast senescence by targeting par4. J. Cell Sci. 2015, 128, 293–304.
Yang, X.; Du, W.W.; Li, H.; Liu, F.; Khorshidi, A.; Rutnam, Z.J.; Yang, B.B. Both mature mir-17-5p and passenger strand mir-17-3p target timp3 and induce prostate tumor growth and invasion. Nucleic. Acids Res. 2013, 41, 9688–9704.
Cai, Y.; Zhang, Y.; Chen, H.; Sun, X.H.; Zhang, P.; Zhang, L.; Liao, M.Y.; Zhang, F.; Xia, Z.Y.; Man, R.Y.; et al. Microrna-17-3p suppresses nf-κb-mediated endothelial inflammation by targeting nik and ikkβ binding protein. Acta Pharm. Sin. 2021, 42, 2046–2057.
Van der Goten, J.; Vanhove, W.; Lemaire, K.; Van Lommel, L.; Machiels, K.; Wollants, W.J.; De Preter, V.; De Hertogh, G.; Ferrante, M.; Van Assche, G.; et al. Integrated mirna and mrna expression profiling in inflamed colon of patients with ulcerative colitis. PLoS ONE 2014, 9, e116117.
Bao, Z.; Wang, L.J.; He, K.; Lin, X.; Yu, T.; Li, J.; Gong, J.; Xiang, G. High expression of ace2 in the human lung leads to the release of il6 by suppressing cellular immunity: Il6 plays a key role in COVID-19. Eur. Rev. Med. Pharmacol. Sci. 2021, 25, 527–540.
Farr, R.J.; Rootes, C.L.; Stenos, J.; Foo, C.H.; Cowled, C.; Stewart, C.R. Detection of SARS-CoV-2 infection by microrna profiling of the upper respiratory tract. PLoS ONE 2022, 17, e0265670.
Fabbri, E.; Borgatti, M.; Montagner, G.; Bianchi, N.; Finotti, A.; Lampronti, I.; Bezzerri, V.; Dechecchi, M.C.; Cabrini, G.; Gambari, R. Expression of microrna-93 and interleukin-8 during pseudomonas aeruginosa-mediated induction of proinflammatory responses. Am. J. Respir Cell Mol. Biol. 2014, 50, 1144–1155.
Gasparello, J.; d’Aversa, E.; Breveglieri, G.; Borgatti, M.; Finotti, A.; Gambari, R. In vitro induction of interleukin-8 by SARS-CoV-2 spike protein is inhibited in bronchial epithelial ib3-1 cells by a mir-93-5p agomir. Int. Immunopharmacol. 2021, 101, 108201.
Ma, Q.; Pan, W.; Li, R.; Liu, B.; Li, C.; Xie, Y.; Wang, Z.; Zhao, J.; Jiang, H.; Huang, J.; et al. Liu shen capsule shows antiviral and anti-inflammatory abilities against novel coronavirus SARS-CoV-2 via suppression of nf-κb signaling pathway. Pharm. Res. 2020, 158, 104850.
Hirano, T.; Murakami, M. COVID-19: A new virus, but a familiar receptor and cytokine release syndrome. Immunity 2020, 52, 731–733.
Chi, Y.; Ge, Y.; Wu, B.; Zhang, W.; Wu, T.; Wen, T.; Liu, J.; Guo, X.; Huang, C.; Jiao, Y.; et al. Serum cytokine and chemokine profile in relation to the severity of coronavirus disease 2019 in china. J. Infect. Dis. 2020, 222, 746–754.

Figure 1. Flow chart of the entire analytical process. The 1444 miRNAs integrated from Zeng et al.’s and Gustafson et al.’s studies [28,29] were ranked according to the feature importance using four feature ranking algorithms, namely, LASSO, LightGBM, mRMR, and MCFS. Then, four ordered feature lists were fed into the IFS framework, and efficient classification models were constructed. The essential feature subset in each feature list and the classification rules were extracted. We intersected the subsets to obtain the miRNAs that recurred in multiple subsets. Finally, a biological analysis of the overlapping miRNAs and classification rules was performed.
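The IFS step in this workflow can be illustrated with a minimal sketch. It assumes a ranked list of feature indices is already available and, for brevity, scores each top-k subset with the leave-one-out accuracy of a 1-nearest-neighbour classifier; this scorer is a stand-in chosen for illustration, not the cross-validated weighted F1 of DT, KNN, RF, and SVM reported in the study. The function names and the toy data are hypothetical.

```python
def loo_1nn_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on columns `cols`."""
    hits = 0
    for i, xi in enumerate(X):
        # Nearest other sample by squared Euclidean distance on the selected columns.
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((xi[c] - X[j][c]) ** 2 for c in cols),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)


def incremental_feature_selection(X, y, ranked, evaluate=loo_1nn_accuracy, step=1):
    """Score the top-k ranked features for k = step, 2*step, ... and keep the best k."""
    curve = [(k, evaluate(X, y, ranked[:k])) for k in range(step, len(ranked) + 1, step)]
    # max() returns the first maximum, so ties resolve to the smallest subset.
    best_k, best_score = max(curve, key=lambda point: point[1])
    return ranked[:best_k], best_score, curve


# Toy data (hypothetical): column 0 separates the classes, columns 1 and 2 are noise.
X = [
    [0.0, 5.0, 1.0],
    [0.1, 1.0, 9.0],
    [0.2, 7.0, 3.0],
    [1.0, 2.0, 8.0],
    [1.1, 6.0, 2.0],
    [1.2, 0.0, 7.0],
]
y = [0, 0, 0, 1, 1, 1]
ranked = [0, 1, 2]  # assumed output of a feature ranking algorithm

selected, score, curve = incremental_feature_selection(X, y, ranked)
```

The returned `curve` holds one (k, score) pair per subset size, which is the kind of data an IFS curve plots; a larger `step` trades resolution for fewer model fits.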

Figure 2. IFS curves of the four classification algorithms on the feature list generated with LASSO. DT, KNN, RF, and SVM yielded their highest weighted F1 values with the top 1335, 734, 300, and 406 features, respectively. KNN also performed well with only the top 129 features.

Figure 3. IFS curves of the four classification algorithms on the feature list generated with LightGBM. DT, KNN, RF, and SVM yielded their highest weighted F1 values with the top 418, 834, 114, and 275 features, respectively. RF also performed well with only the top 62 features.

Figure 4. IFS curves of the four classification algorithms on the feature list generated with MCFS. DT, KNN, RF, and SVM yielded their highest weighted F1 values with the top 639, 55, 239, and 381 features, respectively. RF also performed well with only the top 75 features.

Figure 5. IFS curves of the four classification algorithms on the feature list generated with mRMR. DT, KNN, RF, and SVM yielded their highest weighted F1 values with the top 249, 823, 583, and 141 features, respectively. RF also performed well with only the top 39 features.

Figure 6. Performance of the optimal models built on different feature lists in six classes. (A) Feature list generated with LASSO, (B) feature list generated with LightGBM, (C) feature list generated with MCFS, (D) feature list generated with mRMR.

Figure 7. Venn diagram showing the intersections of the essential feature sets of LASSO, LightGBM, MCFS, and mRMR. The overlapping circles indicate the miRNAs that appeared in multiple sets.
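The intersection shown in the Venn diagram amounts to counting how many essential feature sets each miRNA appears in. The sketch below shows one way to do that; the function name and the placeholder miRNA labels are hypothetical, and the real essential feature sets are not reproduced here.

```python
from collections import Counter


def recurring_features(subsets, min_count=2):
    """Return {feature: count} for features present in at least `min_count` subsets."""
    counts = Counter(f for features in subsets.values() for f in set(features))
    return {f: n for f, n in counts.items() if n >= min_count}


# Placeholder sets (hypothetical labels), one per feature ranking algorithm.
subsets = {
    "LASSO": {"miR-A", "miR-B", "miR-C"},
    "LightGBM": {"miR-A", "miR-C", "miR-D"},
    "MCFS": {"miR-A", "miR-E"},
    "mRMR": {"miR-C", "miR-F"},
}

shared = recurring_features(subsets)               # in two or more sets
in_all = recurring_features(subsets, min_count=4)  # in every set
```

Raising `min_count` narrows the result toward features supported by more ranking algorithms.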

Figure 8. Venn diagram showing the comparison of the miRNAs associated with COVID-19 identified in this study and Zeng et al. and Gustafson et al.’s studies [28,29].

Table 1. Overall performances of the optimal classifiers under different classification algorithms and feature lists yielded with different feature ranking algorithms.

Feature Ranking Algorithm	Classification Algorithm	Number of Features	ACC	MCC	Macro F1	Weighted F1
LASSO	DT	1335	0.680	0.595	0.710	0.677
LASSO	KNN	734	0.747	0.700	0.788	0.733
LASSO	RF	300	0.739	0.686	0.779	0.725
LASSO	SVM	406	0.709	0.644	0.759	0.700
LightGBM	DT	418	0.699	0.618	0.727	0.698
LightGBM	KNN	834	0.760	0.711	0.801	0.751
LightGBM	RF	114	0.787	0.741	0.812	0.780
LightGBM	SVM	275	0.731	0.673	0.769	0.723
MCFS	DT	639	0.696	0.616	0.733	0.694
MCFS	KNN	55	0.747	0.689	0.780	0.741
MCFS	RF	239	0.757	0.707	0.794	0.748
MCFS	SVM	381	0.720	0.659	0.762	0.709
mRMR	DT	249	0.677	0.594	0.719	0.673
mRMR	KNN	823	0.747	0.699	0.787	0.735
mRMR	RF	583	0.749	0.699	0.788	0.741
mRMR	SVM	141	0.720	0.659	0.762	0.713
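
As an illustration of how the four measures in Table 1 relate to each other, the sketch below computes ACC, MCC, macro F1, and weighted F1 from true and predicted labels. It assumes the multi-class generalization of MCC based on the confusion matrix, which may differ in detail from the exact definition used in the study; the function name and the toy labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def classification_metrics(y_true, y_pred):
    """Compute ACC, multi-class MCC, macro F1 and weighted F1 for label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    true_n = Counter(y_true)   # samples per true class (TP + FN)
    pred_n = Counter(y_pred)   # samples per predicted class (TP + FP)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP per class

    # Per-class F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (true_n + pred_n).
    f1 = {}
    for k in labels:
        denom = true_n[k] + pred_n[k]
        f1[k] = 2 * hits[k] / denom if denom else 0.0

    correct = sum(hits.values())
    cov = correct * n - sum(pred_n[k] * true_n[k] for k in labels)
    norm = sqrt(
        (n * n - sum(pred_n[k] ** 2 for k in labels))
        * (n * n - sum(true_n[k] ** 2 for k in labels))
    )
    return {
        "ACC": correct / n,
        "MCC": cov / norm if norm else 0.0,
        "macro_F1": sum(f1.values()) / len(labels),             # every class counts equally
        "weighted_F1": sum(f1[k] * true_n[k] for k in labels) / n,  # weighted by class size
    }


# Toy labels (hypothetical), three classes of unequal size.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]
metrics = classification_metrics(y_true, y_pred)
```

Macro F1 averages the per-class F1 scores equally, whereas weighted F1 weights each class by its number of samples, so the two diverge when class sizes are imbalanced, as they are in this dataset.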

Table 2. Performances of the models on essential feature sets yielded with different feature ranking algorithms.

Feature Ranking Algorithm	Classification Algorithm	Number of Features	ACC	MCC	Macro F1	Weighted F1
LASSO	KNN	129	0.707	0.645	0.740	0.696
LightGBM	RF	62	0.771	0.721	0.803	0.764
MCFS	RF	75	0.731	0.679	0.774	0.714
mRMR	RF	39	0.725	0.669	0.768	0.712

Table 3. Summary of the representative miRNAs associated with COVID-19 severities.

miRNA	Target Gene	Expression Level	Predicted Class	Ref.
miR-24-3p	NRP-1	Upregulated	Healthy	[53,54]
miR-93-3p	TLR4	Upregulated	Severe COVID-19	[55,56]
miR-148a-3p	SOS2, BACH2, MITF	Upregulated	Severe COVID-19	[57,58]
miR-139-5p	MYD88, c-FOS, RAP1B	Downregulated	Non-COVID-19-mild	[59,60]
miR-199a-5p	ARHGAP21	Upregulated	Healthy	[61,62]
miR-17-3p	NIBP	Upregulated	Severe COVID-19	[54,63,64,65,66]
miR-200c-3p	ACE2, IL8	Downregulated	Non-COVID-19-severe	[67,68,69,70,71]
miR-6750-5p	POU2F2	Downregulated	Severe COVID-19	[72]
miR-93-5p	PDCD1LG2	Downregulated	Severe COVID-19	[73]
miR-34a-5p	SDK2	Upregulated	Severe COVID-19	[61,74,75,76,77]
miR-29b-2-5p	POU2F2	Upregulated	Severe COVID-19	[78]
miR-429	NR5A2	Downregulated	Severe COVID-19	[79,80]
miR-205-5p	MOSMO	Downregulated	Severe COVID-19	[81,82,83]
miR-873-5p	PHF6	Downregulated	Severe COVID-19	[84]

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

