Evaluating Machine Learning Algorithms in COVID-19 Research: A Framework Based on Algorithm Co-Occurrence and Symmetric Network Analysis
Abstract
1. Introduction
2. Literature Review
2.1. Evaluating Knowledge Entities
2.2. Evaluating Knowledge Entities Based on Co-Occurrence Network
2.3. Evaluating the Influence of ML Algorithm Entities
3. The ML Algorithm Evaluation Framework
3.1. Data Collection and Processing
3.2. Research Topic Analysis
3.3. Algorithm Entity Co-Occurrence Network Construction
3.4. Algorithm Entity Evaluation
4. Result
4.1. Research Topics Analysis
4.1.1. LDA Topic Modeling
4.1.2. The Evolution of COVID-19 Research Topics
4.2. Landmark ML Algorithms Analysis
4.2.1. Constructing ML Algorithm Co-Occurrence Network
4.2.2. Evaluating the Influence of ML Algorithms
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. The Search Terms for COVID-19 and the Dictionary of ML Algorithms
| Abbreviation | ML Algorithm | Abbreviation | ML Algorithm |
|---|---|---|---|
| ADB | Adaptive Boosting | LARS | Least Angle Regression |
| AA | Apriori Algorithm | LDA | Linear Discriminant Analysis |
| BP | Back Propagation | LRR | Linear Regression |
| BIRCH | Balanced Iterative Reducing and Clustering using Hierarchies | LR | Logistic Regression |
| BBN | Bayesian Belief Network | LSTM | Long Short-Term Memory |
| BN | Bayesian Network | LOESS | Locally Estimated Scatterplot Smoothing |
| BR | Bayesian Regression | MDS | Multi-Dimensional Scaling |
| BA | Bootstrap Aggregating | MLP | Multilayer Perceptron |
| BNB | Bernoulli Naive Bayes | MRF | Markov Random Field |
| CCA | Canonical Correlation Analysis | MDS | Multidimensional Scaling |
| CRF | Conditional Random Field | MNB | Multinomial Naïve Bayes |
| CNN | Convolutional Neural Network | MEN | Multi-task Elastic-Net |
| CDT | Conditional Decision Trees | MLASSO | Multi-task Lasso |
| CONB | Complement Naïve Bayes | MARS | Multivariate Adaptive Regression Splines |
| CART | Classification and Regression Tree | NB | Naïve Bayes |
| CANB | Categorical Naïve Bayes | OLS | Ordinary Least Squares |
| CHAID | Chi-squared Automatic Interaction Detection | OMP | Orthogonal Matching Pursuit |
| DT | Decision Tree | PLS | Partial Least Squares |
| DBN | Deep Belief Network | PA | Passive Aggressive Algorithms |
| DNN | Deep Neural Network | PE | Perceptron |
| EA | Eclat Algorithm | PNN | Perceptron Neural Network |
| DBM | Deep Boltzmann Machine | PR | Polynomial Regression |
| DBSCAN | Density-Based Spatial Clustering of Applications with Noise | PCR | Principal Component Regression |
| ENR | Elastic Net Regression | PCA | Principal Component Analysis |
| EM | Expectation Maximization | QDA | Quadratic Discriminant Analysis |
| FA | Factor Analysis | RBFN | Radial Basis Function Network |
| FL | Federated Learning | RF | Random Forest |
| FC | Fuzzy Clustering | RNN | Recurrent Neural Network |
| GPR | Gaussian Process Regression | RBN | Restricted Boltzmann Machine |
| GBDT | Gradient Boosting Decision Tree | RIR | Ridge Regression |
| GM | Gaussian Mixtures | RR | Robustness Regression |
| GNB | Gaussian Naïve Bayes | SM | Sammon Mapping |
| GPC | Gaussian Processes Classification | SC | Spectral Clustering |
| GLR | Generalized Linear Regression | SOM | Self-Organizing Map |
| GAN | Generative Adversarial Network | SA | Stacked Auto-encoders |
| GA | Genetic Algorithm | SG | Stacked Generalization |
| GBRT | Gradient Boosted Regression Trees | STA | Stacking |
| GBM | Gradient Boosting Machines | SR | Stepwise Regression |
| HMM | Hidden Markov Model | SGD | Stochastic Gradient Descent |
| HC | Hierarchical Clustering | SVM | Support Vector Machine |
| HN | Hopfield Network | SVR | Support Vector Regression |
| ID3 | Iterative Dichotomiser 3 | AC | Agglomerative Clustering |
| KRR | Kernel Ridge Regression | AODE | Averaged One-Dependence Estimators |
| KM | K-Means | TL | Transfer Learning |
| KME | K-Medians | XB | XGBoost |
| KNN | K-Nearest Neighbor | CB | CatBoost |
| LVQ | Learning Vector Quantization | LGBM | LightGBM |
| Lasso | Least Absolute Shrinkage and Selection Operator | | |
| Word | Related Retrieval Keywords |
|---|---|
| COVID-19 | “COVID-19” OR “SARS-CoV-2” OR “2019-nCoV” OR “Novel Coronavirus Pneumonia” OR “Novel Coronavirus Infected Pneumonia” OR “2019 novel coronavirus” OR “coronavirus 2019” OR “coronavirus disease 2019” OR “2019-novel CoV” OR “2019 ncov” OR “covid 2019” OR “corona virus 2019” OR “ncov-2019” OR “ncov2019” OR “nCoV 2019” OR “Severe acute respiratory syndrome coronavirus 2” |
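
As an illustration of how a dictionary such as the one above can feed a co-occurrence network, the following minimal sketch matches algorithm names in text and counts pairwise co-mentions. It is not the authors' pipeline: the four-entry `ALGORITHMS` excerpt, the helper names `find_algorithms` and `cooccurrence`, the regular-expression matching rule, and the sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter
from itertools import combinations

# Small excerpt of the Appendix A dictionary (abbreviation -> full name).
ALGORITHMS = {
    "RF": "Random Forest",
    "SVM": "Support Vector Machine",
    "LR": "Logistic Regression",
    "CNN": "Convolutional Neural Network",
}

def find_algorithms(text):
    """Return abbreviations whose full name or abbreviation occurs in text."""
    found = set()
    for abbr, name in ALGORITHMS.items():
        # Full names are matched case-insensitively, allowing a plural "s".
        name_pat = re.compile(r"\b" + re.escape(name) + r"s?\b", re.IGNORECASE)
        # Abbreviations are matched case-sensitively to limit false hits.
        abbr_pat = re.compile(r"\b" + re.escape(abbr) + r"s?\b")
        if name_pat.search(text) or abbr_pat.search(text):
            found.add(abbr)
    return found

def cooccurrence(documents):
    """Count per-document mentions and pairwise co-mentions (undirected)."""
    mentions = Counter()
    edges = Counter()
    for doc in documents:
        algos = sorted(find_algorithms(doc))
        mentions.update(algos)                  # one count per document
        edges.update(combinations(algos, 2))    # one count per unordered pair
    return mentions, edges

# Hypothetical sample texts, invented for this sketch.
docs = [
    "We compared random forest and support vector machine classifiers.",
    "A CNN was trained on chest X-rays; an SVM served as the baseline.",
    "Logistic regression identified risk factors.",
]
mentions, edges = cooccurrence(docs)
print(mentions["SVM"], edges[("RF", "SVM")])
```

Because each pair is stored once in sorted order, the resulting edge weights are symmetric by construction. Plain abbreviation matching is ambiguous in biomedical text (for example, "PCR" also abbreviates polymerase chain reaction), so a real pipeline would likely need a disambiguation step.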
Appendix B. Evaluation Results of ML Algorithms in Other Topics
| No. | The Frequency of Being Mentioned | | Weighted Degree | | Degree Centrality | | Betweenness Centrality | | Closeness Centrality | | Eigenvector Centrality | | Normalized Average | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | LR | 1749 | LR | 558 | LR | 0.673 | LR | 0.409 | LR | 0.742 | LR | 0.404 | LR | 1 |
| 2 | LRR | 655 | LRR | 398 | LRR | 0.510 | LRR | 0.192 | LRR | 0.671 | LRR | 0.358 | LRR | 0.671 |
| 3 | FA | 544 | MEN | 328 | FA | 0.408 | FA | 0.128 | FA | 0.620 | FA | 0.303 | FA | 0.525 |
| 4 | MEN | 143 | FA | 264 | MEN | 0.388 | RF | 0.114 | MEN | 0.598 | MEN | 0.283 | MEN | 0.478 |
| 5 | PCR | 113 | PCR | 108 | RF | 0.306 | MEN | 0.111 | PCR | 0.57 | PCR | 0.244 | PCR | 0.321 |
| 6 | PA | 72 | PCA | 106 | PCR | 0.265 | DT | 0.05 | RF | 0.557 | RF | 0.21 | RF | 0.31 |
| 7 | SR | 70 | SR | 98 | DT | 0.184 | PCA | 0.046 | DT | 0.533 | SR | 0.193 | PCA | 0.253 |
| 8 | PCA | 63 | GLR | 68 | PA | 0.184 | FC | 0.041 | SR | 0.533 | GLR | 0.189 | SR | 0.238 |
| 9 | EM | 56 | PA | 44 | PCA | 0.184 | PCR | 0.038 | GLR | 0.527 | PCA | 0.188 | DT | 0.222 |
| 10 | PLS | 39 | RF | 32 | GLR | 0.163 | CB | 0.033 | CB | 0.521 | PA | 0.181 | GLR | 0.22 |
| No. | The Frequency of Being Mentioned | Weighted Degree | Degree Centrality | Betweenness Centrality | Closeness Centrality | Eigenvector Centrality | Normalized Average |
|---|---|---|---|---|---|---|---|
| 1 | LR 1019 | LR 520 | LR 0.568 | LR 0.429 | LR 0.685 | LR 0.413 | LR 1.000 |
| 2 | PCR 767 | PCR 384 | PCR 0.459 | RF 0.295 | PCR 0.627 | PCR 0.386 | PCR 0.761 |
| 3 | LRR 156 | MEN 158 | RF 0.405 | PCR 0.208 | RF 0.607 | RF 0.301 | RF 0.500 |
| 4 | MEN 73 | BA 150 | LRR 0.270 | MEN 0.089 | BA 0.529 | LRR 0.272 | LRR 0.363 |
| 5 | BA 65 | LRR 136 | MEN 0.243 | Lasso 0.086 | KM 0.529 | RR 0.231 | MEN 0.342 |
| 6 | RR 30 | RR 74 | RR 0.216 | BA 0.075 | LRR 0.514 | KM 0.231 | BA 0.320 |
| 7 | FA 24 | RF 40 | Lasso 0.216 | PE 0.054 | Lasso 0.514 | MEN 0.223 | RR 0.266 |
| 8 | SR 18 | SR 38 | BA 0.189 | LRR 0.041 | FA 0.507 | BA 0.206 | KM 0.257 |
| 9 | RF 14 | FA 32 | SVM 0.189 | SVM 0.023 | MEN 0.500 | SR 0.200 | Lasso 0.255 |
| 10 | BR 13 | BR 30 | KM 0.189 | SR 0.018 | SR 0.493 | FA 0.194 | SR 0.229 |
| No. | The Frequency of Being Mentioned | | Weighted Degree | | Degree Centrality | | Betweenness Centrality | | Closeness Centrality | | Eigenvector Centrality | | Normalized Average | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PCR | 2834 | PCR | 506 | PCR | 0.848 | PCR | 0.899 | PCR | 0.868 | PCR | 0.634 | PCR | 1.000 |
| 2 | LR | 236 | LR | 232 | LR | 0.273 | LR | 0.119 | LR | 0.559 | LR | 0.324 | LR | 0.302 |
| 3 | MEN | 88 | MEN | 182 | LRR | 0.121 | MEN | 0.062 | LRR | 0.508 | MEN | 0.188 | MEN | 0.193 |
| 4 | LRR | 24 | RR | 38 | MEN | 0.121 | SA | 0.061 | MEN | 0.500 | RR | 0.183 | LRR | 0.145 |
| 5 | EM | 23 | LRR | 26 | DT | 0.091 | RF | 0.061 | RF | 0.493 | DT | 0.179 | RR | 0.118 |
| 6 | PA | 21 | FA | 24 | RF | 0.091 | LR | 0.048 | PE | 0.493 | LRR | 0.166 | DT | 0.107 |
| 7 | RR | 18 | BA | 22 | RR | 0.091 | PE | 0.002 | SA | 0.485 | PE | 0.161 | RF | 0.105 |
| 8 | GBM | 16 | EM | 18 | PE | 0.091 | DT | 0.001 | DT | 0.485 | PA | 0.160 | PE | 0.105 |
| 9 | BP | 15 | PA | 12 | PA | 0.091 | RR | 0.001 | GA | 0.485 | BR | 0.157 | PA | 0.104 |
| 10 | FA | 13 | DT | 10 | BR | 0.061 | PA | 0.001 | RR | 0.485 | FA | 0.157 | FA | 0.098 |
| No. | The Frequency of Being Mentioned | Weighted Degree | Degree Centrality | Betweenness Centrality | Closeness Centrality | Eigenvector Centrality | Normalized Average |
|---|---|---|---|---|---|---|---|
| 1 | LR 1873 | LR 870 | LR 0.813 | LR 0.557 | LR 0.842 | LR 0.416 | LR 1.000 |
| 2 | LRR 697 | LRR 598 | MEN 0.521 | PA 0.146 | MEN 0.658 | MEN 0.340 | MEN 0.503 |
| 3 | PA 409 | MEN 504 | PA 0.438 | MEN 0.131 | PA 0.623 | LRR 0.304 | LRR 0.482 |
| 4 | MEN 224 | PA 276 | LRR 0.396 | BP 0.090 | LRR 0.608 | PA 0.292 | PA 0.432 |
| 5 | FA 91 | GLR 156 | FA 0.313 | PE 0.082 | FA 0.565 | FA 0.256 | FA 0.279 |
| 6 | SR 68 | FA 134 | RF 0.271 | LRR 0.050 | RF 0.552 | RF 0.219 | RF 0.231 |
| 7 | PCR 65 | SR 130 | DT 0.188 | RF 0.033 | HC 0.533 | RR 0.185 | SR 0.192 |
| 8 | GLR 61 | PCR 78 | RR 0.188 | FA 0.023 | DT 0.527 | PCR 0.181 | PCR 0.182 |
| 9 | OLS 31 | RF 44 | BP 0.167 | DT 0.019 | RR 0.522 | HC 0.180 | DT 0.181 |
| 10 | BP 23 | PCA 36 | HC 0.167 | SVM 0.007 | BP 0.516 | DT 0.179 | RR 0.176 |
| No. | The Frequency of Being Mentioned | Weighted Degree | Degree Centrality | Betweenness Centrality | Closeness Centrality | Eigenvector Centrality | Normalized Average |
|---|---|---|---|---|---|---|---|
| 1 | PCR 888 | PCR 346 | PCR 0.612 | PCR 0.254 | PCR 0.700 | PCR 0.331 | PCR 1.000 |
| 2 | EM 224 | LR 198 | RF 0.490 | RF 0.207 | RF 0.636 | FA 0.305 | RF 0.636 |
| 3 | LR 167 | RF 144 | FA 0.469 | LR 0.066 | FA 0.628 | LR 0.288 | LR 0.563 |
| 4 | PCA 111 | FA 130 | LR 0.429 | FA 0.064 | LR 0.620 | RF 0.286 | FA 0.530 |
| 5 | RF 79 | PCA 116 | PCA 0.367 | TL 0.064 | PCA 0.590 | PCA 0.246 | PCA 0.456 |
| 6 | LRR 69 | LRR 90 | LRR 0.347 | SVM 0.063 | LRR 0.583 | LRR 0.239 | LRR 0.416 |
| 7 | FA 46 | SVM 84 | SVM 0.347 | HC 0.062 | SVM 0.570 | PLS 0.231 | SVM 0.404 |
| 8 | MDS 46 | AA 72 | PLS 0.306 | PCA 0.056 | PLS 0.563 | SVM 0.226 | HC 0.355 |
| 9 | PA 38 | BA 72 | HC 0.286 | FC 0.054 | HC 0.557 | HC 0.194 | PLS 0.350 |
| 10 | BP 37 | MEN 66 | MEN 0.245 | BA 0.054 | GM 0.538 | GM 0.192 | MEN 0.300 |
| No. | The Frequency of Being Mentioned | Weighted Degree | Degree Centrality | Betweenness Centrality | Closeness Centrality | Eigenvector Centrality | Normalized Average |
|---|---|---|---|---|---|---|---|
| 1 | LR 318 | LR 104 | LR 0.741 | LR 0.746 | LR 0.730 | LR 0.594 | LR 1.000 |
| 2 | PCR 101 | LRR 42 | LRR 0.296 | LRR 0.397 | LRR 0.574 | PCR 0.325 | LRR 0.442 |
| 3 | LRR 62 | MEN 34 | PCR 0.259 | PE 0.142 | PCR 0.540 | MEN 0.286 | PCR 0.364 |
| 4 | BP 20 | PCR 26 | MEN 0.185 | PCR 0.108 | MEN 0.519 | LRR 0.280 | MEN 0.276 |
| 5 | MEN 14 | RR 10 | RR 0.148 | PA 0.074 | GLR 0.482 | RR 0.217 | RR 0.180 |
| 6 | PA 9 | RF 10 | HC 0.111 | MEN 0.021 | RR 0.458 | FA 0.216 | FA 0.164 |
| 7 | DT 8 | HC 8 | FA 0.111 | RR 0.007 | FA 0.450 | RF 0.182 | RF 0.155 |
| 8 | EM 5 | FA 8 | RF 0.111 | HC 0.001 | HC 0.443 | PCA 0.165 | GLR 0.145 |
| 9 | FA 5 | PCA 8 | BP 0.074 | RF 0.001 | PCA 0.443 | GLR 0.157 | HC 0.143 |
| 10 | MDS 5 | GLR 6 | BR 0.074 | BP 0.000 | RF 0.443 | HC 0.154 | PCA 0.139 |
| No. | The Frequency of Being Mentioned | Weighted Degree | Degree Centrality | Betweenness Centrality | Closeness Centrality | Eigenvector Centrality | Normalized Average |
|---|---|---|---|---|---|---|---|
| 1 | PCR 3571 | PCR 516 | PCR 0.881 | PCR 0.856 | PCR 0.894 | PCR 0.586 | PCR 1.000 |
| 2 | BA 93 | BA 194 | BA 0.262 | AC 0.093 | BA 0.545 | BA 0.268 | BA 0.257 |
| 3 | BP 58 | LR 74 | RF 0.190 | RF 0.057 | RF 0.532 | RF 0.211 | RF 0.166 |
| 4 | LRR 53 | BP 66 | LR 0.167 | SVM 0.051 | LR 0.519 | LR 0.204 | LR 0.165 |
| 5 | LR 50 | MEN 50 | DT 0.143 | BA 0.039 | PCA 0.519 | PCA 0.195 | PCA 0.138 |
| 6 | MEN 23 | LRR 44 | PCA 0.143 | DNN 0.016 | SVM 0.519 | DT 0.187 | DT 0.134 |
| 7 | RF 20 | DT 22 | HC 0.143 | AA 0.010 | DNN 0.512 | HC 0.183 | MEN 0.134 |
| 8 | PA 18 | RF 20 | DNN 0.119 | LR 0.008 | DT 0.512 | MEN 0.178 | HC 0.131 |
| 9 | PCA 17 | PCA 18 | MEN 0.119 | PCA 0.007 | HC 0.512 | LRR 0.165 | LRR 0.130 |
| 10 | DT 15 | SC 18 | SVM 0.119 | HC 0.005 | MEN 0.506 | SC 0.157 | BP 0.127 |
| No. | The Frequency of Being Mentioned | | Weighted Degree | | Degree Centrality | | Betweenness Centrality | | Closeness Centrality | | Eigenvector Centrality | | Normalized Average | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PCR | 345 | LRR | 272 | LR | 0.540 | LRR | 0.249 | LR | 0.670 | LR | 0.339 | LRR | 0.959 |
| 2 | LRR | 317 | LR | 178 | LRR | 0.524 | LR | 0.190 | LRR | 0.663 | RF | 0.318 | LR | 0.853 |
| 3 | LR | 242 | LSTM | 156 | RF | 0.508 | RF | 0.184 | RF | 0.630 | LRR | 0.300 | RF | 0.683 |
| 4 | LSTM | 100 | PCR | 138 | LSTM | 0.381 | PCR | 0.135 | PCR | 0.568 | LSTM | 0.259 | PCR | 0.657 |
| 5 | OLS | 67 | RF | 112 | PCR | 0.349 | LSTM | 0.101 | LSTM | 0.558 | PE | 0.232 | LSTM | 0.571 |
| 6 | LDA | 66 | PE | 96 | PE | 0.270 | OLS | 0.082 | PE | 0.543 | PR | 0.201 | PE | 0.390 |
| 7 | RF | 63 | MLP | 78 | SVR | 0.238 | HC | 0.048 | PCA | 0.534 | MLP | 0.195 | PCA | 0.338 |
| 8 | PCA | 55 | RNN | 78 | PR | 0.222 | BR | 0.043 | SVR | 0.529 | SVR | 0.191 | OLS | 0.338 |
| 9 | KM | 44 | PCA | 78 | PCA | 0.222 | GA | 0.032 | PR | 0.529 | SVM | 0.188 | SVR | 0.325 |
| 10 | HC | 33 | PR | 56 | MLP | 0.206 | GPR | 0.032 | OLS | 0.525 | PCR | 0.182 | PR | 0.317 |
| No. | The Frequency of Being Mentioned | Weighted Degree | Degree Centrality | Betweenness Centrality | Closeness Centrality | Eigenvector Centrality | Normalized Average |
|---|---|---|---|---|---|---|---|
| 1 | PCR 936 | PCR 302 | PCR 0.658 | PCR 0.624 | PCR 0.726 | PCR 0.531 | PCR 1.000 |
| 2 | LR 214 | MEN 190 | LR 0.289 | LR 0.180 | LR 0.541 | MEN 0.321 | LR 0.451 |
| 3 | MEN 85 | LR 144 | LRR 0.237 | LRR 0.149 | MEN 0.533 | LR 0.297 | MEN 0.416 |
| 4 | LRR 64 | LRR 46 | MEN 0.237 | BP 0.083 | LRR 0.525 | LRR 0.250 | LRR 0.329 |
| 5 | PA 40 | RR 38 | BP 0.211 | MEN 0.071 | BP 0.502 | BP 0.248 | BP 0.274 |
| 6 | BP 24 | PE 32 | PA 0.158 | RR 0.053 | RR 0.487 | PE 0.227 | RR 0.239 |
| 7 | RR 18 | PA 30 | AA 0.132 | CNN 0.050 | PE 0.467 | PA 0.220 | PA 0.229 |
| 8 | DT 14 | PCA 18 | RR 0.132 | PA 0.007 | PCA 0.467 | PCA 0.217 | PE 0.224 |
| 9 | PE 13 | BP 16 | PE 0.132 | PCA 0.005 | AA 0.449 | RR 0.204 | PCA 0.214 |
| 10 | PCA 13 | AA 10 | PCA 0.132 | AA 0.005 | PA 0.449 | AA 0.182 | AA 0.192 |
| No. | The Frequency of Being Mentioned | Weighted Degree | Degree Centrality | Betweenness Centrality | Closeness Centrality | Eigenvector Centrality | Normalized Average |
|---|---|---|---|---|---|---|---|
| 1 | PLS 300 | LRR 126 | LRR 0.429 | LRR 0.194 | LRR 0.578 | LRR 0.372 | LRR 0.913 |
| 2 | LRR 143 | FA 94 | RF 0.333 | PLS 0.131 | FA 0.543 | LR 0.303 | FA 0.672 |
| 3 | LR 135 | LR 80 | LR 0.317 | FA 0.120 | LR 0.543 | RF 0.279 | LR 0.668 |
| 4 | PA 128 | RF 76 | FA 0.286 | RF 0.118 | RF 0.529 | FA 0.265 | PLS 0.664 |
| 5 | FA 127 | PCA 68 | PLS 0.286 | LR 0.096 | PCA 0.525 | PCA 0.255 | RF 0.620 |
| 6 | PCA 68 | PLS 58 | PCA 0.254 | PCA 0.089 | KM 0.496 | PLS 0.188 | PCA 0.551 |
| 7 | OLS 64 | PE 56 | PE 0.206 | PE 0.074 | LSTM 0.492 | KM 0.185 | PA 0.451 |
| 8 | RF 48 | PA 52 | SVM 0.190 | DT 0.074 | PLS 0.488 | PCR 0.183 | KM 0.418 |
| 9 | PCR 46 | PCR 44 | PA 0.190 | KM 0.074 | PA 0.485 | PA 0.178 | PE 0.376 |
| 10 | KM 43 | KM 44 | KM 0.190 | LSTM 0.065 | SR 0.477 | SR 0.175 | LSTM 0.372 |
| No. | The Frequency of Being Mentioned | | Weighted Degree | | Degree Centrality | | Betweenness Centrality | | Closeness Centrality | | Eigenvector Centrality | | Normalized Average | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | LR | 748 | LR | 304 | LR | 0.705 | LR | 0.552 | LR | 0.772 | LR | 0.492 | LR | 1.000 |
| 2 | LRR | 228 | LRR | 154 | LRR | 0.432 | LRR | 0.206 | LRR | 0.638 | LRR | 0.379 | LRR | 0.541 |
| 3 | PCR | 137 | MEN | 146 | RF | 0.273 | RF | 0.137 | MEN | 0.571 | MEN | 0.285 | MEN | 0.367 |
| 4 | MEN | 67 | PCR | 80 | MEN | 0.273 | PCR | 0.117 | RF | 0.564 | RF | 0.242 | PCR | 0.331 |
| 5 | PA | 42 | RR | 34 | PCR | 0.250 | MEN | 0.082 | PCR | 0.564 | PCR | 0.234 | RF | 0.291 |
| 6 | OLS | 37 | RF | 28 | PA | 0.205 | PA | 0.065 | PA | 0.543 | PA | 0.205 | PA | 0.233 |
| 7 | RF | 22 | FA | 24 | DT | 0.182 | CNN | 0.045 | RR | 0.500 | RR | 0.173 | RR | 0.174 |
| 8 | FA | 20 | PA | 22 | RR | 0.136 | DT | 0.043 | PCA | 0.494 | FA | 0.173 | FA | 0.155 |
| 9 | PCA | 20 | GLR | 22 | FA | 0.114 | RR | 0.009 | XB | 0.494 | PCA | 0.162 | PCA | 0.152 |
| 10 | EM | 19 | DT | 20 | PCA | 0.114 | PCA | 0.005 | OLS | 0.489 | XB | 0.154 | DT | 0.143 |








| Index | Definition | Calculation method |
|---|---|---|
| Mention frequency | This index refers to the number of articles mentioning ML algorithms. The higher the mention count, the greater the influence of nodes. | The frequency of a node. |
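| Weighted degree | This index refers to the sum of the weights of the edges attached to a node. The greater the weighted degree, the greater the influence of the node. | $s_i = \sum_{j} w_{ij}$, where $w_{ij}$ is the weight of the edge between nodes $i$ and $j$. |
| Degree centrality | This index refers to the degree of a node divided by the number of other nodes in the network. The higher the degree centrality, the more important the node is in the network. | $DC_i = k_i / (N - 1)$, where $k_i$ is the degree of node $i$ and $N$ is the number of nodes. |
| Eigenvector centrality | This index takes into account the interaction between nodes. The greater the influence of a node’s neighbors, the greater the influence of the node. | $x_i = \frac{1}{\lambda} \sum_{j} a_{ij} x_j$, where $a_{ij}$ is the adjacency matrix entry for nodes $i$ and $j$ and $\lambda$ is the largest eigenvalue of that matrix. |
| Closeness centrality | This index determines whether a node is close to the center of the network. The shorter a node’s distances to all other nodes, the higher its closeness centrality and the more important the node is. | $CC_i = (N - 1) / \sum_{j \neq i} d_{ij}$, where $d_{ij}$ is the length of the shortest path between nodes $i$ and $j$, which is the number of edges. |
| Betweenness centrality | This index is used to determine whether a node occupies an important path in the network from the perspective of network flow. The higher the betweenness centrality, the more shortest paths pass through the node and the greater the influence of the node. | $BC_i = \frac{2}{(N-1)(N-2)} \sum_{s < t;\; s, t \neq i} \sigma_{st}(i) / \sigma_{st}$, where $\sigma_{st}$ is the number of shortest paths between nodes $s$ and $t$ and $\sigma_{st}(i)$ is the number of those paths that pass through node $i$. |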
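
To make three of the indices above concrete, here is a minimal pure-Python sketch that computes weighted degree, degree centrality, and closeness centrality on a toy co-occurrence network. It assumes the common normalized definitions (degree divided by N - 1, and N - 1 divided by the sum of shortest-path lengths) and a connected, undirected network; the toy `graph` and the function names are hypothetical and not taken from the study.

```python
from collections import deque

# Toy undirected co-occurrence network: graph[u][v] = number of co-mentions.
# Each edge is stored in both directions, so the weights are symmetric.
graph = {
    "LR":  {"RF": 3, "SVM": 1, "DT": 2},
    "RF":  {"LR": 3, "SVM": 2},
    "SVM": {"LR": 1, "RF": 2},
    "DT":  {"LR": 2},
}

def weighted_degree(g, node):
    """Sum of the weights of all edges attached to the node."""
    return sum(g[node].values())

def degree_centrality(g, node):
    """Number of neighbors divided by the number of other nodes (N - 1)."""
    return len(g[node]) / (len(g) - 1)

def shortest_path_lengths(g, source):
    """Breadth-first search; path length is the number of edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_centrality(g, node):
    """(N - 1) divided by the sum of distances; assumes a connected network."""
    dist = shortest_path_lengths(g, node)
    total = sum(d for other, d in dist.items() if other != node)
    return (len(g) - 1) / total if total else 0.0

for name in graph:
    print(name, weighted_degree(graph, name),
          round(degree_centrality(graph, name), 3),
          round(closeness_centrality(graph, name), 3))
```

In this toy network "LR" is linked to every other node, so its degree and closeness centrality both equal 1, while "DT" reaches two of the three other nodes only through "LR" and therefore has a closeness of 3/5. Edge weights affect only the weighted degree here; the path-based indices treat every edge as length 1.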
| Research Topic | Topic Number | Count | Overview |
|---|---|---|---|
| Social impact | Topic 1 | 3289 | This topic focuses on analyzing factors that shape people’s attitudes, the spread of COVID-19, infection, and death. Students and workers were the groups most frequently surveyed about perceptions and behavioral changes regarding COVID-19 and control measures. |
| Vaccination | Topic 2 | 1936 | This topic regards vaccination, antibody immunization, viral infection, and mutant strain studies. The advantages of ML models in classification and prediction supported vaccine development and vaccination. In addition, studies on vaccination have mainly focused on the public’s trust in the vaccine and factors influencing vaccine hesitancy. |
| Clinical diagnosis and symptoms of COVID-19 | Topic 3 | 3256 | This topic is centered on exploring factors of COVID-19 infection based on case studies and clinical characterization, which involves detecting and diagnosing COVID-19 infection through PCR testing, case analysis, and clinical evaluation of respiratory symptoms. |
| Diagnosis of medical images | Topic 4 | 2702 | This topic focuses mainly on the features of medical images, the accuracy of diagnostic methods, and diagnosis using ML network algorithms. |
| Mental health | Topic 5 | 3098 | This topic focuses on the psychological impacts of COVID-19 and their spread in the general population, including anxiety, depression, and stress. It also focuses on monitoring psychological changes in the public, immediate intervention, and improving the decision-making abilities of all relevant departments. |
| Laboratory research on viruses | Topic 6 | 1904 | This topic includes research pertaining to infection at the cellular level, drug therapy, immune response, and blood-related research. Predictive studies are carried out based on the genome, transcriptome, and proteome. Predictive markers of disease are identified using ML. |
| Test and detection of SARS-CoV-2 | Topic 8 | 4019 | This topic is about the detection and infection of SARS-CoV-2 antigen, including coverage of antigen detection methods in clinical samples, detection of viral variants, and rapid diagnostic methods, with an emphasis on the level of virology. |
| Mortality risk and outcome analysis | Topic 9 | 4400 | This topic mainly contains mortality risk prediction, disease severity analysis, and related factor identification. The purpose of this topic is to identify high-risk patients at an early stage and to provide a reference for clinical decision-making and selection of treatment options to enhance treatment outcomes and optimize healthcare resource management. |
| Other minor COVID-19 research tasks | Topic 7 | 541 | This topic focuses on children’s health issues during COVID-19. |
| Other minor COVID-19 research tasks | Topic 10 | 1524 | This topic focuses on women’s health, e.g., studying the association of COVID-19 with pregnancy. |
| Other minor COVID-19 research tasks | Topic 11 | 1352 | This topic primarily concerns the clinical management of COVID-19 patients and is particularly oriented toward studies of lung diseases. |
| Other minor COVID-19 research tasks | Topic 12 | 1300 | This topic regards the characterization of social life and public behaviors during the COVID-19 pandemic. |
| Other minor COVID-19 research tasks | Topic 13 | 1297 | This topic centers on the impact of the COVID-19 pandemic on healthcare issues. |
| No. | Mention Count | | Weighted Degree | | Degree Centrality | | Betweenness Centrality | | Closeness Centrality | | Eigenvector Centrality | | Normalized Average | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CNN | 803 | CNN | 2084 | RF | 0.779 | RF | 0.120 | RF | 0.819 | RF | 0.221 | RF | 0.890 |
| 2 | DNN | 478 | SVM | 1960 | LR | 0.714 | DT | 0.087 | LR | 0.778 | SVM | 0.214 | CNN | 0.888 |
| 3 | SVM | 384 | RF | 1820 | SVM | 0.701 | CNN | 0.077 | SVM | 0.77 | LR | 0.213 | SVM | 0.784 |
| 4 | RF | 375 | DNN | 1628 | CNN | 0.688 | LR | 0.069 | CNN | 0.762 | DT | 0.209 | DNN | 0.733 |
| 5 | PCR | 369 | LR | 1406 | DT | 0.688 | SVM | 0.063 | DT | 0.755 | CNN | 0.207 | LR | 0.723 |
| 6 | TL | 361 | DT | 1226 | DNN | 0.636 | DNN | 0.060 | DNN | 0.733 | DNN | 0.198 | DT | 0.711 |
| 7 | LR | 245 | KNN | 902 | PCR | 0.610 | LSTM | 0.044 | PCR | 0.72 | PCR | 0.197 | PCR | 0.596 |
| 8 | LSTM | 238 | LSTM | 868 | LSTM | 0.597 | SVR | 0.039 | LSTM | 0.706 | LSTM | 0.193 | LSTM | 0.579 |
| 9 | DT | 211 | PCR | 824 | SVR | 0.584 | PCR | 0.032 | SVR | 0.7 | SVR | 0.190 | SVR | 0.518 |
| 10 | KNN | 140 | TL | 814 | LR | 0.519 | FC | 0.027 | KNN | 0.669 | MLP | 0.183 | TL | 0.508 |
| No. | Mention Count | | Weighted Degree | | Degree Centrality | | Betweenness Centrality | | Closeness Centrality | | Eigenvector Centrality | | Normalized Average | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | LR | 2981 | LR | 1902 | LR | 0.807 | LR | 0.384 | LR | 0.826 | LR | 0.373 | LR | 1 |
| 2 | PCR | 1211 | PCR | 962 | PCR | 0.632 | PCR | 0.3 | PCR | 0.731 | PCR | 0.309 | PCR | 0.681 |
| 3 | MEN | 291 | MEN | 686 | MEN | 0.474 | MEN | 0.054 | MEN | 0.640 | MEN | 0.296 | MEN | 0.436 |
| 4 | LRR | 218 | LRR | 318 | LRR | 0.386 | CART | 0.043 | LRR | 0.6 | RF | 0.249 | LRR | 0.335 |
| 5 | Lasso | 96 | Lasso | 194 | RF | 0.386 | LRR | 0.039 | RF | 0.594 | LRR | 0.237 | RF | 0.314 |
| 6 | RF | 80 | RF | 170 | Lasso | 0.316 | BN | 0.037 | Lasso | 0.57 | Lasso | 0.228 | Lasso | 0.277 |
| 7 | FA | 76 | SR | 164 | FA | 0.263 | BP | 0.035 | FA | 0.564 | FA | 0.190 | FA | 0.246 |
| 8 | SR | 73 | FA | 162 | DT | 0.263 | TL | 0.035 | CART | 0.559 | DT | 0.183 | CART | 0.233 |
| 9 | RR | 64 | RR | 154 | CART | 0.246 | RF | 0.031 | SVM | 0.543 | SVM | 0.179 | DT | 0.219 |
| 10 | PE | 43 | PE | 114 | SVM | 0.246 | KM | 0.022 | RR | 0.538 | CART | 0.178 | SVM | 0.216 |
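
The "Normalized Average" column combines the six indices into one score. The exact aggregation formula is not stated in the tables, so the sketch below shows only one plausible reading, assumed for illustration: min-max normalize each index across algorithms, then take the unweighted mean. The function names and the toy values are hypothetical.

```python
def min_max(values):
    """Rescale a {algorithm: raw value} mapping to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All algorithms tie on this index; it carries no ranking information.
        return {name: 0.0 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}

def normalized_average(indices):
    """indices: {index name: {algorithm: raw value}}.

    Returns {algorithm: mean of its min-max normalized index values},
    restricted to algorithms that have a value for every index.
    """
    normalized = [min_max(vals) for vals in indices.values()]
    common = set.intersection(*(set(n) for n in normalized))
    return {a: sum(n[a] for n in normalized) / len(normalized) for a in common}

# Hypothetical toy values for three algorithms and two indices.
indices = {
    "mention_count": {"A": 100, "B": 50, "C": 0},
    "degree_centrality": {"A": 0.75, "B": 0.5, "C": 0.25},
}
scores = normalized_average(indices)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, scores[name])
```

Under this assumption an algorithm scores 1 only when it ranks first on every index, which matches the pattern in the tables where a single algorithm leading all six columns has a normalized average of 1.000.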
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Huang, S.; Liang, L.; Zhao, Y. Evaluating Machine Learning Algorithms in COVID-19 Research: A Framework Based on Algorithm Co-Occurrence and Symmetric Network Analysis. Symmetry 2026, 18, 163. https://doi.org/10.3390/sym18010163

