# Anomaly Identification during Polymerase Chain Reaction for Detecting SARS-CoV-2 Using Artificial Intelligence Trained from Simulated Data


## Abstract


## 1. Introduction

## 2. Results and Discussion

#### 2.1. Principal Component Analysis

#### 2.2. The ML Model

#### 2.3. Data Simulation

Here, $A_p$ is the maximum amplitude, $b$ is the growth rate, Thd is the threshold used to determine Cq, rand is a function that returns uniformly distributed random numbers between 0 and 1, and the product Thd·rand simulates noise (Figure 3).
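To make the roles of Thd and Cq concrete, the following is a minimal Python sketch of the noise-free logistic curve used for classes + and – in Algorithms 2 and 3. It assumes, as those algorithms do, that the curve midpoint is shifted to $C_m = \ln(A_p/\mathrm{Thd} - 1)/b + C_q$. Substituting $x = C_q$ gives $e^{-b(C_q - C_m)} = A_p/\mathrm{Thd} - 1$, so the curve equals $A_p/(A_p/\mathrm{Thd}) = \mathrm{Thd}$ exactly at cycle Cq. The function names `midpoint` and `logistic_curve` are illustrative and are not part of the original code.

```python
import math

def midpoint(ap, b, cq, thd):
    """Midpoint C_m chosen so that the logistic curve crosses thd at cycle cq.

    Same expression as Cmp/Cmn in Algorithms 2 and 3 (natural log; needs ap > thd).
    """
    return math.log(ap / thd - 1.0) / b + cq

def logistic_curve(cycle, ap, b, cm):
    """Noise-free logistic amplification curve with amplitude ap and growth rate b."""
    return ap / (1.0 + math.exp(-b * (cycle - cm)))

ap, b, cq, thd = 1000.0, 0.3, 25.0, 20.0
cm = midpoint(ap, b, cq, thd)
# The curve reaches the threshold exactly at cycle cq (up to rounding error)
print(round(logistic_curve(cq, ap, b, cm), 6))  # 20.0
```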

$L_{PC}^{\mathrm{Max}}$ and $L_{PC}^{\mathrm{Min}}$ are the maximum and minimum values of $L_{PC}$, respectively (see Algorithms 1 and 2). The simulated data are shown in Figure 3.
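In Algorithm 2, these maximum and minimum values are used to rescale each PC loading vector to the interval [0, 1] (the `AaExn(k,:)` line). A minimal Python equivalent is sketched below; the `ValueError` guard for a constant vector is an addition, since the Matlab code would divide by zero (producing NaN) in that case.

```python
def min_max_normalize(loading):
    """Rescale one PC loading vector to [0, 1], as in the AaExn line of Algorithm 2."""
    lo, hi = min(loading), max(loading)
    span = hi - lo
    if span == 0:
        # Not handled in the Matlab code; raised here instead of returning NaN
        raise ValueError("a constant loading vector cannot be min-max normalized")
    return [(v - lo) / span for v in loading]

print(min_max_normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
```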

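**Algorithm 1.** Random function (Matlab).

```matlab
function y = aleat(x, x2)
% ALEAT  Uniform random number between x and x2, regardless of their order.
    dang = abs(x - x2);     % width of the interval
    dt = rand() * dang;     % random fraction of that width
    if x > x2
        y = x2 + dt;        % start from the smaller bound
    else
        y = x + dt;
    end
end
```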
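For readers reproducing the simulations outside Matlab, a Python port of `aleat` might look like the sketch below. Functionally it is a uniform draw between the two bounds, similar to the standard library's `random.uniform`; the optional `rng` argument is an assumption added so that runs can be seeded.

```python
import random

def aleat(x, x2, rng=random):
    """Uniform random number between x and x2, independent of their order (Algorithm 1)."""
    span = abs(x - x2)           # width of the interval
    dt = rng.random() * span     # Python's random() is uniform on [0, 1)
    return min(x, x2) + dt       # Algorithm 1 adds dt to the smaller bound

rng = random.Random(0)
print(140 <= aleat(300, 140, rng) <= 300)  # True
```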

**Algorithm 2.** Simulation algorithm using PCs (Matlab).

```matlab
% One 1000 x 46 matrix per class (46 = number of PCR cycles)
pos = zeros(1000,46);   % class +
neg = zeros(1000,46);   % class – (no amplification)
Aa  = zeros(1000,46);   % class Aa (abnormal amplification)

AaEx  = PCA;            % principal components (one loading per row, >= 20 x 46), computed beforehand
AaExn = zeros(20,46);   % min-max normalized loadings

% Class Aa: 50 curves from each of the first 20 PCs
for k = 1:20
    AaExn(k,:) = (AaEx(k,:) - min(AaEx(k,:))) ./ (max(AaEx(k,:)) - min(AaEx(k,:)));
    for r = 1:50
        one = rand(1,46);          % uniform noise, one value per cycle
        Ap  = aleat(140,300);      % amplitude
        Apl = aleat(0,100);        % baseline offset
        Aa(50*(k-1)+r,:) = Ap .* AaExn(k,:) + one - Apl;
    end
end

% Classes + and –: logistic curves with random Cq, growth rate and amplitude
Thd = 20;                      % threshold used to define Cq
for i = 1:1000
    b   = aleat(0.02,0.5);     % growth rate b
    Cqp = aleat(10,40);        % Cq for +
    Cqn = aleat(41,100);       % Cq for –
    Ap  = aleat(40,2000);      % maximum amplitude Ap
    % Midpoint shift so that the noise-free curve crosses Thd at cycle Cq
    Cmp = log(Ap/Thd - 1)/b + Cqp;
    Cmn = log(Ap/Thd - 1)/b + Cqn;
    j = 1:46;                  % cycle numbers
    pos(i,:) = Ap ./ (1 + exp(-b .* (j - Cmp))) + 6*rand(1,46);
    neg(i,:) = Ap ./ (1 + exp(-b .* (j - Cmn))) + 6*rand(1,46);
end

X = [pos; neg; Aa];            % 3000 x 46 simulated data set
```
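The Aa part of Algorithm 2 can be sketched in Python as follows, assuming the PC loadings are available as a list of equal-length, non-constant vectors. Each loading is min-max normalized, scaled by a random amplitude in [140, 300], perturbed by uniform noise in [0, 1) at each cycle, and shifted down by a random offset in [0, 100], giving 50 curves per PC. The function name and the toy loadings are hypothetical; in Algorithm 2 the loadings are taken from the variable `PCA`.

```python
import random

def simulate_pc_anomalies(loadings, per_pc=50, rng=None):
    """Sketch of the Aa loop in Algorithm 2 (hypothetical helper).

    loadings: list of PC loading vectors (each non-constant, one value per cycle).
    Returns len(loadings) * per_pc simulated curves.
    """
    rng = rng or random.Random()
    curves = []
    for loading in loadings:
        lo, hi = min(loading), max(loading)
        norm = [(v - lo) / (hi - lo) for v in loading]   # AaExn(k,:)
        for _ in range(per_pc):
            ap = rng.uniform(140, 300)                   # Ap = aleat(140,300)
            apl = rng.uniform(0, 100)                    # Apl = aleat(0,100)
            noise = [rng.random() for _ in norm]         # one = rand(1,46)
            curves.append([ap * v + e - apl for v, e in zip(norm, noise)])
    return curves

# Toy loadings for illustration only, not real PCA output
toy_loadings = [[float(c) for c in range(46)], [float((7 * c) % 11) for c in range(46)]]
aa = simulate_pc_anomalies(toy_loadings, per_pc=50, rng=random.Random(0))
print(len(aa), len(aa[0]))  # 100 46
```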

#### 2.4. Big Data Classification

#### 2.5. Challenges of the Methodology

#### 2.5.1. Data Simulation from Random Function (DSRF)

**Algorithm 3.** Simulation algorithm using a random function (Matlab).

```matlab
pos = zeros(1000,46);   % class +
neg = zeros(1000,46);   % class –
Aa  = zeros(1000,46);   % class Aa

Thd = 20;               % threshold used to define Cq
for i = 1:1000
    b   = aleat(0.02,0.5);     % growth rate b
    Cqp = aleat(10,40);        % Cq for +
    Cqn = aleat(41,100);       % Cq for –
    Ap  = aleat(40,2000);      % maximum amplitude Ap
    Cmp = log(Ap/Thd - 1)/b + Cqp;
    Cmn = log(Ap/Thd - 1)/b + Cqn;
    for j = 1:46
        pos(i,j) = Ap/(1 + exp(-b*(j - Cmp))) + Thd*rand()/3;
        neg(i,j) = Ap/(1 + exp(-b*(j - Cmn))) + Thd*rand()/3;
        % Aa: random intensities, with a lower range in the first 4 cycles
        if j < 5
            Aa(i,j) = aleat(Thd, Ap);
        else
            Aa(i,j) = aleat(10*Thd, 2*Ap);
        end
    end
    % Data smoothing: 4-point circular moving average, applied twice
    for pass = 1:2
        Aa(i,:) = (Aa(i,:) + circshift(Aa(i,:),1,2) + circshift(Aa(i,:),2,2) ...
                   + circshift(Aa(i,:),3,2)) ./ 4;
    end
    % Offset referring to the initial intensity (mean of the first 4 cycles)
    Aa(i,:) = Aa(i,:) - mean(Aa(i,1:4));
end

Xa = [pos; neg; Aa];
```
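The smoothing step in Algorithm 3 is a 4-point moving average with circular wrap-around (the effect of `circshift`), applied twice, followed by subtraction of the mean of the first four cycles. A minimal Python sketch of that post-processing, with hypothetical function names, is shown below.

```python
def circular_smooth(row, window=4):
    """One smoothing pass: mean of each point and its window-1 predecessors,
    wrapping around at the start as circshift does."""
    n = len(row)
    return [sum(row[(j - s) % n] for s in range(window)) / window for j in range(n)]

def smooth_and_offset(row):
    """Two smoothing passes, then subtract the mean of the first four cycles."""
    row = circular_smooth(circular_smooth(row))
    baseline = sum(row[:4]) / 4
    return [v - baseline for v in row]

print(circular_smooth([4, 0, 0, 0, 0, 0]))  # [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```

Because of the wrap-around, the first three smoothed cycles also average in values from the last cycles; a non-circular filter would avoid this, but the sketch keeps the circular behavior to match Algorithm 3.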

#### 2.5.2. Data Simulation from ML (DSML)

#### 2.6. Implementation of AI

## 3. Materials and Methods

#### 3.1. Clinical Specimens

#### 3.2. Nucleic Acid Extraction

#### 3.3. PCR Method

#### 3.4. ML Methods Analysis and Data Simulation

#### 3.5. Web Platform Design for the Implementation of AI

## 4. Discussion and Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest


**Figure 1.** (**a**) Score plots of principal component (PC) 2 vs. PC 1 for real-time RT-PCR curves for SARS-CoV-2 diagnostics; (**b**) real-time RT-PCR curve plot for the two groups found during principal component analysis (PCA).

**Figure 3.** (**a**) Logistic function plot for different growth rate parameters used in the simulations, and simulated real-time RT-PCR curves for the + class; (**b**) simulated real-time RT-PCR curves for the classes no amplification (–) and abnormal amplification (Aa).

**Figure 4.** (**a**) Score plots of PC 2 vs. PC 1 for real-time RT-PCR curves for the diagnosis of SARS-CoV-2 and their classification based on the S-Data-model; (**b**) scheme of the best preprocessing combination, preprocessing sequence: N[Cf[Sn[D[F]]]].

**Figure 5.** (**a**) The artificial intelligence (AI) classification scheme; (**b**) implementation of AI for the first PCR kit for COVID; (**c**) implementation of AI for the second PCR kit for COVID.

**Table 1.** Confusion matrix and evaluation criteria for the random forest classifier (RFC) model for the well-characterized portion (W-CP).

Test for 20% of W-CP

| Model | Precision | Recall | f1-Score | Support | Accuracy | Predicted + | Predicted − | Predicted ? |
|---|---|---|---|---|---|---|---|---|
| + | 0.976 | 1.000 | 0.988 | 40 | 0.972 | 40 | 0 | 0 |
| − | 0.961 | 1.000 | 0.980 | 49 | | 0 | 49 | 0 |
| Aa | 1.000 | 0.700 | 0.824 | 17 | | 1 | 2 | 14 |

| Parameter | + | − |
|---|---|---|
| Cq | 10 to 40 | 40 to 60 |
| b | 0.2 to 1.0 | 0.2 to 0.8 |
| Ap | 40 to 1000 | 40 to 1000 |

**Table 3.** Confusion matrix and evaluation criteria for the random forest classifier (RFC) model of the S-Data-model.

Test for 20% of simulated data

| Model | Precision | Recall | f1-Score | Support | Accuracy | Predicted + | Predicted − | Predicted Aa |
|---|---|---|---|---|---|---|---|---|
| + | 1.000 | 0.989 | 0.995 | 93 | 0.952 | 92 | 1 | 0 |
| − | 0.989 | 1.000 | 0.994 | 87 | | 0 | 87 | 0 |
| Aa | 0.873 | 1.000 | 0.932 | 117 | | 0 | 0 | 117 |

Test for W-CP

| Model | Precision | Recall | f1-Score | Support | Accuracy | Predicted + | Predicted − | Predicted Aa |
|---|---|---|---|---|---|---|---|---|
| + | 0.915 | 0.993 | 0.953 | 152 | 0.960 | 151 | 0 | 0 |
| − | 0.984 | 0.996 | 0.990 | 255 | | 1 | 254 | 0 |
| Aa | 1.000 | 0.859 | 0.924 | 142 | | 13 | 4 | 122 |

**Table 4.** Confusion matrix and evaluation criteria for the test using all data for the SB-model_RFC and the SB2-model.

Test of all data for the SB-model_RFC

| Model | Precision | Recall | f1-Score | Support | Accuracy | Predicted + | Predicted −, Aa |
|---|---|---|---|---|---|---|---|
| + | 0.955 | 0.979 | 0.967 | 5938 | 0.972 | 5811 | 127 |
| −, Aa | 0.984 | 0.967 | 0.976 | 8284 | | 272 | 8012 |

Test of − and Aa of all data for the SB2-model

| Model | Precision | Recall | f1-Score | Support | Accuracy | Predicted − | Predicted Aa |
|---|---|---|---|---|---|---|---|
| − | 0.990 | 0.950 | 0.970 | 7287 | 0.948 | 6923 | 364 |
| Aa | 0.718 | 0.930 | 0.810 | 997 | | 70 | 927 |
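The criteria reported for the binary SB-model_RFC test can be recomputed from its confusion matrix with the standard definitions of precision, recall, F1 and accuracy. The sketch below does so, treating rows as true classes and columns as predictions (consistent with the Support column being the row sum); `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp, fn, fp, tn):
    """Precision, recall and F1 for the positive class, plus overall accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, f1, accuracy

# SB-model_RFC on all data: class + is positive, class "−, Aa" is negative
p, r, f1, acc = binary_metrics(tp=5811, fn=127, fp=272, tn=8012)
print([round(x, 3) for x in (p, r, f1, acc)])  # [0.955, 0.979, 0.967, 0.972]
```

Swapping the roles of the two classes (tp=8012, fn=272, fp=127, tn=5811) reproduces the "−, Aa" row in the same way.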

| Method | DSRF Accuracy (%) | DSRF Log Loss | DSML Accuracy (%) | DSML Log Loss |
|---|---|---|---|---|
| KNC | 97.5 | 0.6 | 93.0 | 1.2 |
| SVM | 96.6 | 0.1 | 97.4 | 0.14 |
| RFC | 92.2 | 0.2 | 96.1 | 0.2 |
| QDA | 85.5 | 3.9 | 94.3 | 1.3 |
| LDA | 97.6 | 0.1 | 98.0 | 0.18 |
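The log loss reported alongside accuracy penalizes confident wrong predictions, so a method can score a high accuracy and still have a comparatively large log loss (compare KNC with SVM for DSRF). The sketch below computes the standard multiclass log loss, the mean negative natural log of the probability assigned to the true class. The probabilities are hypothetical, and the clipping constant `eps` is an assumption, not a value from the paper.

```python
import math

def log_loss(true_labels, probabilities, eps=1e-15):
    """Mean negative log-likelihood of the true class (multiclass log loss)."""
    total = 0.0
    for label, probs in zip(true_labels, probabilities):
        p = min(max(probs[label], eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(true_labels)

# Hypothetical predictions for classes +, −, Aa encoded as indices 0, 1, 2
y = [0, 1, 2]
proba = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
print(round(log_loss(y, proba), 4))  # 0.2798
```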

**Table 6.** Confusion matrix and evaluation criteria for the test using all data for the SB-model_DSRF_LDA and the SD-A.

Test of all data for the SB-model_DSRF_LDA (preprocessing = N[Cf[Sn[D[F]]]])

| Model | Precision | Recall | f1-Score | Support | Accuracy | Predicted + | Predicted −, Aa |
|---|---|---|---|---|---|---|---|
| + | 0.970 | 0.971 | 0.971 | 5801 | 0.976 | 5635 | 166 |
| −, Aa | 0.980 | 0.979 | 0.980 | 8287 | | 173 | 8114 |

Test of − and Aa of all data for the SD-A

| Model | Precision | Recall | f1-Score | Support | Accuracy | Predicted − | Predicted Aa |
|---|---|---|---|---|---|---|---|
| − | 1.000 | 1.000 | 1.000 | 7289 | 1.000 | 7289 | 0 |
| Aa | 1.000 | 1.000 | 1.000 | 998 | | 0 | 998 |

**Table 7.** Confusion matrix and evaluation criteria for the test using all data for the SB-model_DSML_LDA and the SD-A.

Test of all data for the SB-model_DSML_LDA (preprocessing = N[Cf[Sn[D[F]]]])

| Model | Precision | Recall | f1-Score | Support | Accuracy | Predicted + | Predicted −, Aa |
|---|---|---|---|---|---|---|---|
| + | 0.969 | 0.970 | 0.970 | 5790 | 0.98 | 5616 | 174 |
| −, Aa | 0.979 | 0.979 | 0.979 | 8306 | | 178 | 8128 |

Test of − and Aa of all data for the SD-A

| Model | Precision | Recall | f1-Score | Support | Accuracy | Predicted − | Predicted Aa |
|---|---|---|---|---|---|---|---|
| − | 1.000 | 1.000 | 1.000 | 7289 | 1.000 | 7289 | 0 |
| Aa | 1.000 | 1.000 | 1.000 | 998 | | 0 | 998 |

Sample Availability: Samples of the compounds are not available from the authors.


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Villarreal-González, R.; Acosta-Hoyos, A.J.; Garzon-Ochoa, J.A.; Galán-Freyle, N.J.; Amar-Sepúlveda, P.; Pacheco-Londoño, L.C. Anomaly Identification during Polymerase Chain Reaction for Detecting SARS-CoV-2 Using Artificial Intelligence Trained from Simulated Data. *Molecules* **2021**, *26*, 20.
https://doi.org/10.3390/molecules26010020
