# Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences

## Abstract


## 1. Introduction

- A new feature extraction method, HamletPat, built on a pattern derived from a literary work (Hamlet). To our knowledge, it is the first methodology for creating feature extraction functions from text.
- A classification model incorporating the new pattern was applied to protein sequences for the binary classification of SARS-CoV-2 versus Influenza-A. The model attained 99.87% accuracy with five-fold cross-validation and 99.92% accuracy with hold-out validation, supporting its potential use as an adjunctive screening tool for suspected viral respiratory infections in the current pandemic.

## 2. Materials and Methods

#### 2.1. Materials

#### 2.2. Our Proposed Protein Sequence Classification Model

#### 2.2.1. Feature Extraction Using HamletPat

Algorithm 1. Pattern generator using enumerated letters.

Input: The calculated values of the letters, $val$

Output: Pattern, $pattern$

for i = 1 to 26 do // Initialize the counters

$counter\left(i\right)=0$;

end for

i = 1; j = 1; // Define variables

$sum={\sum}_{k=1}^{26}counter\left(k\right);$

while $sum<26$ do

$v=val\left(i\right);$

if $counter\left(v\right)=0$ then

$counter\left(v\right)=1$;

$pattern\left(j\right)=v$;

$j++;$

end if

$sum={\sum}_{k=1}^{26}counter\left(k\right);$

$i++;$

end while
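The pattern generator can also be sketched in a few lines of Python. This is an illustrative translation, not the authors' code: the function name `generate_pattern` is hypothetical, and skipping characters outside `a` to `z` is an added assumption (the MATLAB listing in Appendix A expects a string of lowercase letters only).

```python
def generate_pattern(text):
    """Return letter indices (a=1 ... z=26) in order of first occurrence.

    Mirrors Algorithm 1: scan the text once and record each letter index
    the first time it is seen, stopping once all 26 letters have appeared.
    """
    seen = set()
    pattern = []
    for ch in text:
        value = ord(ch) - 96          # 'a' -> 1, ..., 'z' -> 26
        if not 1 <= value <= 26:      # assumption: ignore non-letter characters
            continue
        if value not in seen:
            seen.add(value)
            pattern.append(value)
            if len(pattern) == 26:    # every letter has been placed
                break
    return pattern


# The first letters of the Appendix A text give the first pattern entries.
opening = "bernardowhostherefrancisconayanswerme"
print(generate_pattern(opening))
```

Unlike the pseudocode, the sketch also stops when the text runs out, so a text that lacks some letters yields a shorter pattern instead of an out-of-range read.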

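The 26 comparison bits of each block are split into groups of 9, 8, and 9 bits, so the three resulting histograms have lengths of 512 ($=2^{9}$), 256 ($=2^{8}$), and 512 ($=2^{9}$), respectively.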
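As a rough illustration of how the three histograms arise, the following Python sketch follows the `hamlet_pat` listing in Appendix A: each overlapping block of 27 values is compared against its centre value, and the 26 resulting bits are split into 9-, 8-, and 9-bit codes. The `encode` helper and its use of character codes are assumptions made only for this example; the numeric encoding of amino acids is not specified here.

```python
def encode(sequence):
    """Hypothetical encoding: map each amino-acid letter to its character code."""
    return [ord(ch) for ch in sequence]


def hamlet_pat(signal):
    """Return the concatenated 512 + 256 + 512 = 1280-bin histogram."""
    h1 = [0] * 512   # codes built from bits 1-9
    h2 = [0] * 256   # codes built from bits 10-17
    h3 = [0] * 512   # codes built from bits 18-26
    for i in range(len(signal) - 26):
        block = signal[i:i + 27]                 # overlapping block of 27 values
        center = block[13]                       # 14th value is the reference
        neighbours = block[:13] + block[14:]     # remaining 26 values
        bits = [1 if v >= center else 0 for v in neighbours]
        # Least-significant bit first, as in the Appendix A listing.
        m1 = sum(b << j for j, b in enumerate(bits[0:9]))
        m2 = sum(b << j for j, b in enumerate(bits[9:17]))
        m3 = sum(b << j for j, b in enumerate(bits[17:26]))
        h1[m1] += 1
        h2[m2] += 1
        h3[m3] += 1
    return h1 + h2 + h3


features = hamlet_pat(encode("ACDEFGHIKLMNPQRSTVWY" * 5))
print(len(features))
```

A sequence of length n contributes n - 26 blocks, so each of the three histograms sums to n - 26.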

#### 2.2.2. Iterative Chi-Square Feature Selection

#### 2.2.3. Classification

## 3. Results

#### 3.1. Experimental Setup

#### 3.2. Evaluation Metrics

#### 3.3. Performance of the Proposed Model

#### 3.4. Time Complexity Analysis

## 4. Discussion

- Influenza and COVID-19 share similar symptoms, which makes clinical discrimination difficult. Therefore, an automated protein-sequence-based model was developed to differentiate the two infections.
- To our knowledge, HamletPat is the first text-based pattern used to create a new feature extraction function.
- The HamletPat-based classification model was trained on a two-class dataset and attained accuracy rates of 99.87% with five-fold cross-validation (CV) and 99.92% with hold-out validation (split ratio 75:25).
- The model is simple and easy to implement, and it has a low time complexity of $O\left(n\right)$.

- The model uses overlapping blocks with a fixed length of 27. Therefore, a protein sequence must contain at least 27 values to be processed (only protein sequences with a length of 100 or greater were used in this study).
- We used the SVM classifier with default hyperparameters. The hyperparameters could be further optimized using a metaheuristic optimization method.
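The block-length constraint and the linear time complexity can be made concrete with a short sketch. The names below (`count_blocks`, `keep_long_sequences`) are hypothetical and only illustrate the arithmetic implied by a fixed block length of 27 and the length threshold of 100 mentioned above.

```python
BLOCK_LENGTH = 27          # fixed block length used by the model
MIN_SEQUENCE_LENGTH = 100  # length threshold applied in the study


def count_blocks(sequence_length, block_length=BLOCK_LENGTH):
    """Number of overlapping blocks with stride 1 (0 if the sequence is too short)."""
    return max(0, sequence_length - block_length + 1)


def keep_long_sequences(sequences, min_length=MIN_SEQUENCE_LENGTH):
    """Drop sequences shorter than the chosen threshold."""
    return [s for s in sequences if len(s) >= min_length]


for n in (26, 27, 100):
    print(n, count_blocks(n))
```

Each block requires a fixed number of comparisons (26), and the number of blocks grows linearly with the sequence length, which is consistent with the stated $O\left(n\right)$ complexity.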

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Appendix A

    clc, clear all, close all
    % Opening of Hamlet in lowercase, with spaces and punctuation removed
    text = ['bernardowhostherefrancisconayanswermestandandunfoldyourselfbernardolonglivethekingfranciscobernardobernardohefranciscoyoucomemostcarefullyuponyourhourbernardotisnowstrucktwelvegettheetobedfranciscofranciscoforthisreliefmuchthankstisbittercoldandiamsickatheartbernardohaveyouhadquietguardfrancisconotamousestirringbernardowellgoodnightifyoudomeethoratioandmarcellustherivalsofmywatchbidthemmakehastefranciscoithinkihearthemstandhowhosthereenterhoratioandmarcellushoratiofriendstothisgroundmarcellusandliegementothedanefranciscogiveyougoodnightmarcellusofarewellhonestsoldierwhohathrelievedyoufranciscobernardohasmyplacegiveyougoodnightexitmarcellushollabernardobernardosaywhatishoratiotherehoratioapieceofhimbernardowelcomehoratiowelomegoodmarcellusmarcelluswhathasthisthingappeardagaintonightbernardoihaveseennothingmarcellushoratiosaystisbutourfantasyandwillnotletbelieftakeholdofhimtouchingthisdreadedsighttwiceseenofusthereforeihaveentreatedhimalongwithustowatchtheminutesofthisnightthatifagainthisapparitioncomehemayapproveoureyesandspeaktoithoratiotushtushtwillnotappearbernardositdownawhileandletusonceagainassailyourearsthataresofortifiedagainstourstorywhatwehavetwonightsseenhoratiowellsitwedownandletushearbernardospeakofthisbernardolastnightofallwhenyondsamestarthatswestwardfromthepolehadmadehiscoursetoillumethatpartofheavenwherenowitburnsmarcellusandmyselfthebellthenbeatingoneenterghostmarcelluspeacebreaktheeofflookwhereitcomesagainbernardointhesamefigurelikethekingthatsdeadmarcellusthouartascholarspeaktoithoratiobernardolooksitnotlikethekingmarkithoratiohoratiomostlikeitharrowsmewithfearandwonderbernardoitwouldbespoketomarcellusquestionithoratiohoratiowhatartthouthatusurpstthistimeofnighttogetherwiththatfairandwarlikeforminwhichthemajestyofburieddenmarkdidsometimesmarchbyheavenichargetheespeakmarcellusitisoffendedbernardoseeitstalksawayhoratiostayspeakspeakichargetheespeakexitghostmarcellustisgoneandwillntanswerbernardohownowhoratioyoutremble', ...
        'andlookpaleisnotthissomethingmorethanfantasywhatthinkyouonthoratiobeforemygodimightnotthisbelievewithoutthesensibleandtrueavouchofmineowneyesmarcellusisitnotlikethekinghoratioasthouarttothyselfsuchwastheveryarmourhehadonwhenhetheambitiousnorwaycombatedsofrowndheoncewheninanangryparlehesmotethesleddedpolacksontheicetisstrangemarcellusthustwicebeforeandjumpatthisdeadhourwithmartialstalkhathhegonebyourwatchhoratioinwhatparticularthoughttoworkiknownotbutinthegrossandscopeofmyopinionthisbodessomestrangeeruptiontoourstatemarcellusgoodnowsitdownandtellmehethatknowswhythissamestrictandmostobservantwatchsonightlytoilsthesubjectofthelandandwhysuchdailycastofbrazencannonandforeignmartforimplementsofwarwhysuchimpressofshipwrightswhosesoretaskdoesnotdividethesundayfromtheweekwhatmightbetowardthatthissweatyhastedothmakethenightjointlabourerwiththedaywhoistthatcaninformmehoratiothatcaniatleastthewhispergoessoourlastkingwhoseimageevenbutnowappeardtouswasasyouknowbyfortinbrasofnorwaytheretoprickdonbyamostemulatepridedaredtothecombatinwhichourvalianthamletforsothissideofourknownworldesteemdhimdidslaythisfortinbraswhobyasealdcompactwellratifiedbylawandheraldrydidforfeitwithhislifeallthosehislandswhichhestoodseizedoftotheconqueroragainstthewhichamoietycompetentwasgagedbyourkingwhichhadreturndtotheinheritanceoffortinbrashadhebeenvanquisherasbythesamecovenantandcarriageofthearticledesigndhisfelltohamletnowsiryoungfortinbrasofunimprovedmettlehotandfullhathintheskirtsofnorwayhereandtheresharkdupalistoflawlessresolutesforfoodanddiettosomeenterprisethathathastomachintwhichisnootherasitdothwellappearuntoourstatebuttorecoverofusbystronghandandtermscompulsatorythoseforesaidlandssobyhisfatherlostandthisitakeitisthemainmotiveofourpreparationsthesourceofthisourwatchandthechiefheadofthisposthasteandromageinthelandbernardoithinkitbenootherbuteensowellmayitsortthatthisportentousfigurecomesarmedthroughourwatchsolikethekingthatwasandisthequestionofthesewarshoratioamoteitistotroublethemindseyeinthemosth', ...
        'ighandpalmystateofromealittleerethemightiestjuliusfellthegravesstoodtenantlessandthesheeteddeaddidsqueakandgibberintheromanstreetsasstarswithtrainsoffireanddewsofblooddisastersinthesunandthemoiststaruponwhoseinfluenceneptunesempirestandswassickalmosttodoomsdaywitheclipseandeventhelikeprecurseoffierceeventsasharbingersprecedingstillthefatesandprologuetotheomencomingonhaveheavenandearthtogetherdemonstrateduntoourclimaturesandcountrymenbutsoftbeholdlowhereitcomesagainreenterghostillcrossitthoughitblastmestayillusionifthouhastanysoundoruseofvoicespeaktomeiftherebeanygoodthingtobedonethatmaytotheedoeasandgracetomespeaktomecockcrowsifthouartprivytothycountrysfatewhichhappilyforeknowingmayavoidospeakorifthouhastuphoardedinthylifeextortedtreasureinthewombofearthforwhichtheysayyouspiritsoftwalkindeathspeakofitstayandspeakstopitmarcellusmarcellusshallistrikeatitwithmypartisanhoratiodoifitwillnotstandbernardotisherehoratiotisheremarcellustisgoneexitghostwedoitwrongbeingsomajesticaltoofferittheshowofviolenceforitisastheairinvulnerableandourvainblowsmaliciousmockerybernardoitwasabouttospeakwhenthecockcrewhoratioandthenitstartedlikeaguiltythinguponafearfulsummonsihaveheardthecockthatisthetrumpettothemorndothwithhisloftyandshrillsoundingthroatawakethegodofdayandathiswarningwhetherinseaorfireinearthorairtheextravagantanderringspirithiestohisconfineandofthetruthhereinthispresentobjectmadeprobationmarcellusitfadedonthecrowingofthecocksomesaythatevergainstthatseasoncomeswhereinoursavioursbirthiscelebratedthebirdofdawningsingethallnightlongandthentheysaynospiritdaresstirabroadthenightsarewholesomethennoplanetsstrikenofairytakesnorwitchhathpowertocharmsohallowdandsograciousisthetimehoratiosohaveiheardanddoinpartbelieveitbutlookthemorninrussetmantlecladwalksoerthedewofyonhigheastwardhillbreakweourwatchupandbymyadviceletusimpartwhatwehaveseentonightuntoyounghamletforuponmylifethisspiritdumbtouswillspeaktohimdoyouconsentweshallacquainthimwithitasneedfulinourlovesfittingourdutymarcelluslets', ...
        'dotiprayandithismorningknowwhereweshallfindhimmostconveniently'];
    number = double(text) - 96;    % map 'a'..'z' to 1..26

    % Letter histogram
    histo = zeros(1, 26);
    for j = 1:length(number)
        histo(number(j)) = histo(number(j)) + 1;
    end
    plot(histo)

    % Pattern generation: store each letter index at its first occurrence
    counter = zeros(1, 26);
    summ = sum(counter);
    i = 1;
    j = 1;
    while (summ < 26)
        sy = number(i);
        if (counter(sy) == 0)
            counter(sy) = 1;
            pattern(j) = sy;
            j = j + 1;
        end
        summ = sum(counter);
        i = i + 1;
    end

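    function histo = hamlet_pat(sinyal)
    % Histogram features from overlapping blocks of 27 values
    h1 = zeros(1, 512);
    h2 = zeros(1, 256);
    h3 = h1;
    for i = 1:length(sinyal) - 26
        blok = sinyal(i:i + 26);           % current block of 27 values
        m = blok(14);                      % centre value of the block
        deger(1:13) = blok(1:13);          % 13 values before the centre
        deger(14:26) = blok(15:27);        % 13 values after the centre
        for j = 1:26
            bit(j) = deger(j) >= m;        % compare each value with the centre
        end
        % Split the 26 bits into groups of 9, 8, and 9 bits
        b1(1:9) = bit(1:9);
        b2(1:8) = bit(10:17);
        b3(1:9) = bit(18:26);
        m1(i) = 0; m2(i) = 0; m3(i) = 0;
        % Binary-to-decimal conversion, least-significant bit first
        for j = 1:9
            m1(i) = m1(i) + b1(j)*2^(j-1);
            m3(i) = m3(i) + b3(j)*2^(j-1);
        end
        for j = 1:8
            m2(i) = m2(i) + b2(j)*2^(j-1);
        end
        % Update the three histograms
        h1(m1(i) + 1) = h1(m1(i) + 1) + 1;
        h2(m2(i) + 1) = h2(m2(i) + 1) + 1;
        h3(m3(i) + 1) = h3(m3(i) + 1) + 1;
    end
    histo = [h1 h2 h3];                    % 512 + 256 + 512 = 1280 features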

**Figure 1.** Schema of the proposed HamletPat-based model for binary classification of viral protein sequences.

**Figure 2.** Block diagram of the proposed text-based feature extraction function generation model. Hamlet was used as the source text in this paper. In the figure, Map denotes the feature map signals and Hist denotes a histogram.

**Figure 5.** Confusion matrices of the HamletPat-based classification model obtained with hold-out validation (split ratio 75:25) and 5-fold cross-validation (CV).

**Figure 6.** Rules of a basic decision-support system using our selected features. Herein, the symbol 1 denotes COVID-19, while 2 denotes influenza.

id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Ind. | 2 | 5 | 18 | 14 | 1 | 4 | 15 | 23 | 8 | 19 | 20 | 6 | 3 |

id | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Ind. | 9 | 25 | 13 | 21 | 12 | 7 | 22 | 11 | 16 | 17 | 24 | 10 | 26 |

**Table 2.** Performance metrics for binary classification of viral protein sequences into SARS-CoV-2 versus Influenza-A using the HamletPat-based classification model.

Metric | Cross Validation | SARS-CoV-2 | Influenza-A |
---|---|---|---|
Sensitivity (%) | 5-fold CV | 99.95 | 99.79 |
Sensitivity (%) | Hold-out (75:25) | 100 | 99.86 |
Specificity (%) | 5-fold CV | 99.79 | 99.95 |
Specificity (%) | Hold-out (75:25) | 99.86 | 100 |
Precision (%) | 5-fold CV | 99.76 | 99.96 |
Precision (%) | Hold-out (75:25) | 99.83 | 100 |
F1-score (%) | 5-fold CV | 99.86 | 99.87 |
F1-score (%) | Hold-out (75:25) | 99.92 | 99.93 |
Overall accuracy (%) | 5-fold CV | 99.87 | |
Overall accuracy (%) | Hold-out (75:25) | 99.92 | |
Overall geometric mean (%) | 5-fold CV | 99.87 | |
Overall geometric mean (%) | Hold-out (75:25) | 99.93 | |
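For reference, the metrics in Table 2 follow the standard definitions for a binary confusion matrix, sketched below in Python. The function names and the confusion-matrix counts in the usage lines are hypothetical, since the actual counts are not listed in the table. The last lines recompute the overall geometric mean from the rounded per-class sensitivities in Table 2, under the assumption that it is the geometric mean of the two class-wise sensitivities.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Standard metrics for the positive class of a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)            # recall of the positive class
    specificity = tn / (tn + fp)            # recall of the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
        "g_mean": g_mean,
    }


def geometric_mean(recall_a, recall_b):
    """Geometric mean of two class-wise recalls (same units as the inputs)."""
    return math.sqrt(recall_a * recall_b)


# Hypothetical counts, for illustration only.
print(binary_metrics(tp=90, fn=10, fp=5, tn=95))

# Rounded sensitivities from Table 2 (5-fold CV and hold-out, in %).
print(round(geometric_mean(99.95, 99.79), 2))
print(round(geometric_mean(100.0, 99.86), 2))
```

In a two-class problem the sensitivity of one class equals the specificity of the other, which explains the mirrored sensitivity and specificity values in Table 2.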

Model | Dataset | Number of Observations | Method | Result |
---|---|---|---|---|
Afify and Zanaty [9] | NCBI | 18,476 protein sequences: 9238 COVID-19, 9238 HIV | Conjoint triad feature extraction and Random Forest classification with hold-out validation (80:20) | Accuracy: 99.80% |
Our model | NCBI | 36,424 protein sequences: 16,901 COVID-19, 19,523 Influenza-A | HamletPat feature extraction, IChi2 feature selection, and SVM classification with hold-out validation (75:25) and 5-fold CV | Accuracy: 99.92% (hold-out), 99.87% (5-fold CV) |

