Applications of Machine Learning and High-Performance Computing in the Era of COVID-19

During the ongoing pandemic of the novel coronavirus disease 2019 (COVID-19), the latest technologies such as artificial intelligence (AI), blockchain, learning paradigms (machine, deep, smart, few-shot, extreme learning, etc.), high-performance computing (HPC), the Internet of Medical Things (IoMT), and Industry 4.0 have played a vital role. These technologies have helped to contain the spread of the disease by predicting contaminated people and places, as well as by forecasting future trends. In this article, we provide insights into the applications of machine learning (ML) and HPC in the era of COVID-19. We discuss the person-specific data that are being collected to slow the spread of COVID-19 and highlight the remarkable opportunities they provide for knowledge extraction leveraging low-cost ML and HPC techniques. We demonstrate the role of ML and HPC in the COVID-19 era through successful implementations or proposals in three contexts: (i) ML and HPC use in the data life cycle, (ii) ML and HPC use in analytics on COVID-19 data, and (iii) the general-purpose applications of both techniques in the COVID-19 arena. In addition, we discuss privacy and security issues and present the architecture of a prototype system to demonstrate the proposed research. Finally, we discuss the challenges of the available data and highlight the issues that hinder the applicability of ML and HPC solutions to it.


Introduction
The novel coronavirus disease 2019 (COVID-19) has drastically changed how information is gathered, processed, analyzed, used, distributed, and removed. During the COVID-19 pandemic, huge amounts of data have been collected by various organizations to estimate the probability of exposure or to trace infected individuals [1]. The pandemic has also increased reliance on digital technologies to analyze people's mobility and to monitor compliance with quarantine guidelines [2]. In addition, many companies have helped governments with innovative technologies to control the COVID-19 crisis [3]. Advanced technologies have played a dominant role in handling this pandemic effectively, and many countries have gained better control of the pandemic by utilizing technology and human resources simultaneously [4]. The innovative features of each technology have helped to curb the spread of the disease. The unique features of this pandemic, such as estimating the possibility of infection, identifying potentially contaminated places, contact tracing, mobility analysis, flow modeling, sharing information on infected people, data analytics, geofencing high-risk zones, compliance monitoring, trend prediction, estimating the likelihood of the pandemic's end, fusing data from heterogeneous sources, and adopting remote technologies, have increased the utilization of digital technology. We present a taxonomy of the epidemic features that demand advanced techniques such as machine learning (ML) and high-performance computing (HPC) in Figure 1. These features are unique to the COVID-19 era and require advanced technologies to curb the spread. Through ML and HPC, significant mitigation and control of this pandemic can be achieved. Despite the enormous benefits of technology in the COVID-19 era, the adoption of digital solutions has been low, mainly due to privacy issues [5]. 
Furthermore, some countries have adopted digital solutions while downplaying privacy requirements and have obtained significant results in terms of disease control; however, people's anxiety and worries remain higher in such countries. For instance, in South Korea, credit card data, CCTV data, mobile signals, and random calls were used to trace the contacts of infected people. Similarly, China used advanced decision support systems to control the disease. In some countries, mobile apps were employed to find potentially suspected patients. Through a detailed synthesis of the literature and an analysis of multiple available apps and reports, we provide statistics for ten countries with high AI adoption over a period of almost one and a half years in Figure 2. Furthermore, we provide insights into the investment in technology by companies around the world to combat the crisis caused by the ongoing pandemic [6]. To date, many studies have covered AI applications in the COVID-19 era. However, these studies have mainly focused on general applications of the technology in the education sector, industry, tourism, etc. A concrete overview of ML and HPC usage in the COVID-19 context, and of their usefulness in mitigating and controlling this pandemic, has not been provided in previous studies. To address this gap, this study covers the applications of ML and HPC with respect to epidemic features (i.e., epidemic control measures, the data life cycle in epidemic systems, and general applications of ML and HPC). More specifically, we highlight the role, need, and utility of ML and HPC technologies in the COVID-19 era. To this end, a compact overview of the fidelity of ML and HPC techniques in the COVID-19 era is given with practical examples. Through this concise perspective, we hope to provide a solid foundation for future research in the COVID-19 area.
The rest of this perspective is organized as follows. Section 2 provides insights into the data of individuals collected and processed in the COVID-19 era from heterogeneous sources. Section 3 highlights the effectiveness of ML and HPC techniques in the COVID-19 era. Section 4 discusses data-related challenges to technology adoption, security and privacy issues, a prototype system to demonstrate the research, state-of-the-art studies and their key findings, and potential research directions. Finally, the study is concluded in Section 5.

Insights on Data Collection and Analytics Opportunities in the COVID-19 Era
In recent years, data have been regarded as the oil of the economy, as they can assist countries in countless ways, including human behavior analysis, recommendations, policy planning, and improving people's living standards, to name a few. The latest technologies are very good at extracting insights from large-scale data [7]. We highlight the data collected and processed in the COVID-19 era and the corresponding analytics opportunities they offer in Figure 3. Apart from these data, Big Data analytics on existing data, such as commodities data, have profound benefits in identifying the vulnerable population [8]. In addition, facilities' visit log data can be employed in individual profiling for mobility analysis. Beyond the analytics given in Figure 3, the following analytics functions can be exploited to curb the spread of the disease.

• Finding the most exposed people through location analysis (e.g., churchgoers or moviegoers).
• Predicting discharge dates based on demographics and comorbidity data.
The analytics documented above open up new research avenues with data that have not been analyzed before. With the help of the latest technologies, all of the above tasks can be achieved with sufficient accuracy to improve healthcare. By utilizing these functions, the healthcare burden can be significantly reduced, and automated diagnosis and prescription can be facilitated.
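As a minimal sketch of the first analytics function, assume visit logs with hypothetical fields (person, place, start hour, end hour); people can then be ranked by how long they were co-located with a known infected person. The sample data and the function name `exposure_ranking` are illustrative assumptions, not part of the original study, and a real system would need a richer exposure model than time overlap alone.

```python
from collections import defaultdict

# Hypothetical visit-log record: (person_id, place_id, start_hour, end_hour)
Visit = tuple[str, str, float, float]


def exposure_ranking(visits: list[Visit], infected: set[str]) -> list[tuple[str, float]]:
    """Rank non-infected people by the total hours they shared a place,
    at the same time, with any infected person."""
    by_place: dict[str, list[Visit]] = defaultdict(list)
    for visit in visits:
        by_place[visit[1]].append(visit)

    exposure: dict[str, float] = defaultdict(float)
    for place_visits in by_place.values():
        infected_visits = [v for v in place_visits if v[0] in infected]
        for person, _, start, end in place_visits:
            if person in infected:
                continue
            for _, _, inf_start, inf_end in infected_visits:
                overlap = min(end, inf_end) - max(start, inf_start)
                if overlap > 0:  # the two time intervals intersect
                    exposure[person] += overlap
    # Most exposed first; ties broken by person id for a stable order
    return sorted(exposure.items(), key=lambda kv: (-kv[1], kv[0]))


visits = [
    ("p1", "church", 9.0, 11.0),
    ("p2", "church", 10.0, 12.0),
    ("p3", "cinema", 18.0, 20.0),
    ("p4", "church", 11.5, 12.5),
    ("p5", "cinema", 18.5, 21.0),
]
print(exposure_ranking(visits, infected={"p1", "p3"}))
# → [('p5', 1.5), ('p2', 1.0)]
```

Here p4 arrived at the church after p1 had left, so no exposure is recorded for p4.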

Effectiveness of ML and HPC Techniques in the COVID-19 Era
The combined use of ML and HPC can pave the way for improved healthcare and the effective mitigation and control of COVID-19. In the recent past, these techniques have demonstrated their effectiveness in controlling the disease. HPC is particularly helpful for processing large-scale data about individuals. We present the role of ML and HPC in the COVID-19 era in three contexts: in the data management life cycle, in analytics, and in general-purpose applications.

ML and HPC Techniques' Role in the Data Management Life Cycle
The data life cycle (DLC) is an important building block of knowledge-based applications. It has seven phases, each of which performs unique functions on the data. We provide an overview of the DLC phases in Table 1. The phases listed in Table 1 are generic and can be adopted for COVID-19 applications and systems. We highlight the use of ML and HPC in these DLC phases in Figure 4. ML techniques are mostly used for extracting the desired knowledge from data, whereas HPC techniques are useful for storing intermediate results and processing large-scale data.
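To illustrate the HPC role in the processing phases, the sketch below splits a large set of test records into chunks, processes each chunk in a separate worker process (map step), and merges the per-chunk intermediate results (reduce step). Python's standard `concurrent.futures` module stands in for a real HPC framework here; the record format and the function names `count_cases`, `split`, and `parallel_case_counts` are hypothetical.

```python
from collections import Counter
from concurrent.futures import Executor, ProcessPoolExecutor
from functools import reduce


def count_cases(chunk: list[tuple[str, bool]]) -> Counter:
    """Map step: count positive test results per region within one chunk."""
    return Counter(region for region, positive in chunk if positive)


def split(records: list, n_chunks: int) -> list[list]:
    """Split records into at most n_chunks contiguous slices of near-equal size."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def parallel_case_counts(records: list, n_chunks: int, executor: Executor) -> Counter:
    """Run the map step on each chunk in parallel, then merge the
    per-chunk intermediate Counters (reduce step)."""
    partials = executor.map(count_cases, split(records, n_chunks))
    return reduce(lambda total, part: total + part, partials, Counter())


if __name__ == "__main__":
    # Synthetic records: (region, tested_positive)
    records = [("north", True), ("south", False), ("north", True), ("east", True)] * 25_000
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(parallel_case_counts(records, n_chunks=8, executor=pool))
```

Because the executor is passed in, the same logic can run on a thread pool for testing or on any other `concurrent.futures`-compatible executor that distributes work more widely.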

Role of ML and HPC Techniques in Data Analytics in the COVID-19 Era
Data analytics has become a crucial part of knowledge-based applications, as it unlocks the knowledge enclosed in the underlying data. The latest analytics techniques with hyperparameter tuning functionalities have proven successful in many applications. We present the role of ML and HPC in analytics in the COVID-19 era in Figure 5. ML techniques can assist in knowledge extraction; for instance, they can be widely used to distinguish patients with and without COVID-19. ML techniques can also be used to rank the comorbidities that can lead to death, ICU admission, oxygen need, etc. They can also be used to identify people who are more likely to contract COVID-19 because of the nature of their work or their hygiene practices. In addition, they can assist in trend analysis, privacy-preserving analytics, and data distribution across organizations. In contrast, HPC techniques have more utility from an administrative point of view; for example, community clustering involves large-scale data processing, and HPC techniques are handy to load and process the data, hold intermediate results, and deliver results to interested parties. Therefore, both techniques play a critical role in the COVID-19 era, and in some cases they are jointly used to perform relevant tasks. Further useful features of both technologies are discussed in recent studies [9][10][11][12].
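The comorbidity-ranking idea can be sketched, under simplifying assumptions, by computing a risk ratio per comorbidity for a chosen outcome such as ICU admission. This is a plain statistical stand-in for the ML-based ranking mentioned above: it does not adjust for confounders such as age, the patient records are synthetic, and the function name `rank_comorbidities` is hypothetical.

```python
def rank_comorbidities(patients: list[dict], comorbidities: list[str],
                       outcome: str) -> list[tuple[str, float]]:
    """Rank comorbidities by the risk ratio of an outcome:
    RR = P(outcome | comorbidity present) / P(outcome | comorbidity absent)."""
    ranking = []
    for c in comorbidities:
        with_c = [p for p in patients if p[c]]
        without_c = [p for p in patients if not p[c]]
        if not with_c or not without_c:
            continue  # RR is undefined unless both groups are non-empty
        risk_with = sum(p[outcome] for p in with_c) / len(with_c)
        risk_without = sum(p[outcome] for p in without_c) / len(without_c)
        if risk_without > 0:
            rr = risk_with / risk_without
        elif risk_with > 0:
            rr = float("inf")  # outcome only ever seen with the comorbidity
        else:
            continue  # no outcome events at all: RR undefined
        ranking.append((c, rr))
    return sorted(ranking, key=lambda kv: kv[1], reverse=True)


# Synthetic patients: (diabetes, hypertension, icu_admission)
rows = [(1, 1, 1), (1, 0, 1), (1, 0, 0), (0, 1, 1),
        (0, 1, 0), (0, 1, 0), (0, 0, 0), (1, 0, 1)]
patients = [dict(zip(("diabetes", "hypertension", "icu"), map(bool, r))) for r in rows]
print(rank_comorbidities(patients, ["hypertension", "diabetes"], "icu"))
# → [('diabetes', 3.0), ('hypertension', 1.0)]
```

A risk ratio of 1.0, as for hypertension in this toy data, means the outcome rate is the same with and without the comorbidity.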

Role of ML and HPC Techniques in the General-Purpose Applications Related to the COVID-19 Era
Beyond the specific applications in diagnostics and analytics, ML and HPC techniques can be used for general-purpose applications. We summarize the potential general-purpose applications of ML and HPC as follows.
• Creating awareness about COVID-19 through sentiment analysis and recommending health tips.
Multiple general applications of both HPC and ML are assisting societies in various ways [13,14]. Furthermore, both techniques can be used to measure and analyze the data of multiple institutions for a better analysis of the COVID-19 pandemic.
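For the awareness use case, a minimal lexicon-based sentiment scorer is sketched below; posts scored as negative could, for example, be prioritized when recommending health tips. The tiny word list `LEXICON` and its weights are hypothetical placeholders, and practical systems would use a curated lexicon or a trained classifier instead.

```python
import re

# Hypothetical toy lexicon; a real deployment would use a curated lexicon
# or a trained classifier.
LEXICON = {"safe": 1, "hope": 1, "recovered": 2, "grateful": 2,
           "sick": -1, "lockdown": -1, "scared": -2, "panic": -2}


def sentiment_score(text: str) -> int:
    """Sum the lexicon weights of the words in a post (case-insensitive)."""
    return sum(LEXICON.get(word, 0) for word in re.findall(r"[a-z']+", text.lower()))


def label(text: str) -> str:
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = ["Grateful my mother recovered!",
         "Panic buying again, I'm scared of this lockdown",
         "Stay home today"]
for post in posts:
    print(f"{label(post):8} | {post}")
```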

Discussion
This section briefly discusses the data-related challenges that hinder the effective utilization of ML and HPC techniques, privacy and security issues in the COVID-19 context, a prototype system to demonstrate the effectiveness of the AI and HPC techniques, and promising future research directions.

Data-Related Challenges That Hinder the Effective Utilization of ML and HPC Techniques
Although ML and HPC techniques are excellent at finding hidden knowledge in data, multiple issues related to the collected data, such as the unavailability of benchmark datasets, error-prone data, raw (unprocessed) data, and limited access to related data due to privacy issues, are the main barriers to the adoption of these techniques. In addition, privacy and security issues are further important challenges that limit data reusability and the dissemination of results across organizations [15]. We identified twelve technical features that can impact the adoption of the latest technologies (i.e., AI, ML, HPC, etc.). The challenges are listed below. In addition to the key challenges cited above, selecting the appropriate ML and HPC techniques is also very challenging. Hence, before applying any ML method, understanding the data structure and the problems that can arise in analytics is of paramount importance.
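As an example of understanding the data before applying ML, the sketch below profiles records for missing and out-of-range values, two of the error-prone-data symptoms mentioned above. The fields and plausible ranges (age in years, body temperature in °C, oxygen saturation in %) and the function name `profile` are illustrative assumptions, not values from the study.

```python
import math


def profile(records: list[dict],
            valid_ranges: dict[str, tuple[float, float]]) -> dict[str, dict[str, int]]:
    """Count missing and out-of-range values per field before any modeling."""
    report = {}
    for field, (lo, hi) in valid_ranges.items():
        missing = out_of_range = 0
        for record in records:
            value = record.get(field)  # absent keys count as missing
            if value is None or (isinstance(value, float) and math.isnan(value)):
                missing += 1
            elif not lo <= value <= hi:
                out_of_range += 1
        report[field] = {"missing": missing, "out_of_range": out_of_range}
    return report


# Hypothetical records with typical defects: impossible age, None/NaN
# temperature, implausible oxygen saturation, and an absent field
vitals = [
    {"age": 34, "temp_c": 37.2, "spo2": 97},
    {"age": -1, "temp_c": None, "spo2": 95},
    {"age": 71, "temp_c": 39.1, "spo2": 140},
    {"age": 52, "temp_c": float("nan")},
]
ranges = {"age": (0, 120), "temp_c": (30.0, 45.0), "spo2": (50, 100)}
for field, counts in profile(vitals, ranges).items():
    print(field, counts)
```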

Privacy and Security Issues
Due to the urgency of managing the COVID-19 pandemic and developing solutions for it, large-volume information sharing between international and national organizations has become inevitable (e.g., data on people with the virus (i.e., electronic health records) and treatment data, contact tracing data, and multimedia data produced by digital applications such as Zoom, Meet, Webex, and Teams). Information sharing is an important building block for the realization of Big Data, but at the same time, it can lead to infringements of individual privacy. During the COVID-19 pandemic, various privacy and security issues arising from the massive data collection and processing used to curb the spread of the virus have been reported in the media around the world. Hence, the need to preserve people's privacy has become more urgent than ever. Given privacy issues at this scale, more robust organizational, constitutional, and technical measures may be required to address the security and privacy issues of digital applications in the COVID-19 era. Furthermore, heavy reliance on digital solutions, such as IoE and IoT sensors, the Internet, social networks (SN), 5G networks, and HPC-based computing power, will make the problem even worse. The ongoing pandemic has highlighted that existing privacy and security mechanisms are not sufficient to address the privacy and security implications emerging from it. These mechanisms require a fresh start that adheres to stricter data protection laws and regulations around the world (e.g., HIPAA, IPA, GDPR, etc.) to reassure people regarding their privacy. We highlight the privacy and security issues of emerging Big Data applications and other digital technologies extensively used during the COVID-19 pandemic in Figure 6. Recently, numerous pertinent solutions focusing on privacy preservation have been proposed [16][17][18][19][20][21]. 
However, public trust in the use of digital solutions has remained very low, mainly due to privacy issues. Therefore, the privacy and security issues presented in Figure 6 call for a fresh start in developing more practical solutions that serve the community effectively.
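One technical measure of the kind called for above is differential privacy. The sketch below releases per-region case counts with the Laplace mechanism, assuming each person is counted in at most one region so that the sensitivity is 1. The paper does not prescribe this mechanism; the function names and the choice of `epsilon` are illustrative, and Python's `random` module is not a cryptographically secure source, so a production system would need a secure noise generator.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale): the difference of two i.i.d.
    exponential variables with mean `scale` is Laplace distributed."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_counts(counts: dict[str, int], epsilon: float,
                   rng: random.Random) -> dict[str, float]:
    """Release per-region case counts with epsilon-differential privacy.

    Assumes each person is counted in at most one region, so adding or
    removing one person changes the histogram by at most 1 (L1 sensitivity 1)
    and the Laplace scale is 1 / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    return {region: n + laplace_noise(scale, rng) for region, n in counts.items()}


rng = random.Random(42)  # fixed seed only to make the demo reproducible
print(private_counts({"north": 120, "south": 45}, epsilon=0.5, rng=rng))
```

Smaller `epsilon` values add more noise and give stronger privacy. The noisy counts may be non-integer or even negative; rounding or clamping them afterwards does not weaken the guarantee, because differential privacy is preserved under post-processing.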

Architecture of Prototype System to Demonstrate the Effectiveness of AI and HPC Techniques in COVID Context
Since the start of this unprecedented pandemic, a number of digital solutions have been proposed for different purposes, such as digital contact tracing [22], IoT-based generic models for tracing suspects of COVID-19 and other infectious diseases [23], prediction of COVID-19 malignant progression [24], medical AI systems [25], expert systems for clinical guidelines [26], and real-time COVID-19 screening based on wearable sensor data [27], to name a few. After a detailed synthesis of these systems, we found that each existing solution addresses only one aspect of the COVID-19 pandemic. To address this deficiency, we designed the architecture of a prototype system to demonstrate the effectiveness of AI and HPC techniques in the COVID-19 context and the need for such a system. The need for an AI- and HPC-powered system is illustrated in Figure 7a, and the proposed prototype system architecture is shown in Figure 7b. The proposed prototype has three main modules that can be further enhanced in the future. The main modules, along with brief descriptions, are presented below.

•
Data collection: In this module, data are collected from relevant individuals and authentic sources such as healthcare departments and agencies. We identify three main data sources that can contribute effectively to the fight against this pandemic: real-environment data (including social distancing, mask information, the nature of contact with other people, spatiotemporal activity data, social circle information, etc.), authentic healthcare department data (including previous disease history, travel information, ambient data, etc.), and IoT and SN data (including people's behavior).

In this subsection, we provide a comprehensive overview of the current state-of-the-art (SOTA) studies involving the use of ML and HPC methods in the COVID-19 era. To date, a substantial number of SOTA studies have covered this topic. We summarize the latest SOTA studies, their findings, and how these findings can address the unmet clinical needs of COVID-19 in Table 2.

Table 2. Comprehensive overview of the current SOTA studies involving the use of ML and HPC methods in COVID-19, their key findings, and their role in addressing the unmet clinical needs in the era of COVID-19.

Category | Study | Key Findings of Each SOTA Study | Role in Addressing the Unmet Clinical Needs in the COVID-19 Era
ML | Pinter et al. [28] | Predictions of mortality rate and time series of infected individuals | Outbreak modeling and mortality trend analysis
ML | Magar et al. [29] | Virus-antibody sequence analysis and identification of potential patients | Robust identification of antibodies that potentially inhibit COVID-19
ML | Aminu et al. [30] | Accurate detection of people with COVID-19 with limited data | Effective for the reliable diagnosis of COVID-19
ML | Zeng et al. [31] | Forecasting of patient survival probability | Age-group-based mortality analysis to provide care to elderly people
ML | Shah et al. [32] | COVID-19 detection from X-ray images | Helps diagnose potential suspects as early as possible to mitigate the deadly disease
ML | Ashraf et al. [33] | Prediction of disease severity or chances of death | Significant contribution to separating vulnerable groups for ample care
ML | Prakash et al. [34] | Impact analysis of various policies employed to control the disease | Guidance on effective strategies that can help control the spread
ML | Ullah et al. [35] | Classification of patients with and without COVID-19 | Lowering the healthcare burden of patient diagnosis
ML | Rathod et al. [36] | Effective crisis preparedness and management, along with authorities' responses and mitigation strategies | Assistance in analyzing the burden on healthcare workers
ML | Rathod et al. [37] | Detection of abnormal data for effective analysis | Resource planning and accurate diagnosis
ML | Rashed et al. [38] | Provides public awareness about the morbidity risk of COVID-19 | Consistent and reliable forecasting patterns of the spread/decay phases of COVID-19
ML | Hu et al. [39] | Feasible analysis model for the treatment and diagnosis of COVID-19 | Effective identification of key symptoms and medicines for different syndromes
ML | Singh et al. [40] | Reduces the high false-negative rate of RT-PCR | Effectively handles the sensitivity issue associated with RT-PCR
ML | Peddinti et al. [41] | Detection of COVID-19 cases in public places | Helps officials diagnose the virus more accurately and quickly
ML | Saverino et al. [42] | Implementation of changes in rehabilitation services | Staff satisfaction and stress reduction during pandemic times
ML | Lella et al. [43] | Respiratory sound classification for potential patient identification | Classification of asthma sounds, COVID-19 sounds, and regular healthy sounds
ML | Malla et al. [44] | Real-time sentiment analysis of COVID-19 tweets | Prediction of tweets related to similar types of infectious diseases in the future
ML | Ibrahim et al. [45] | Accurately diagnosing COVID-19 patients and analyzing the severity level | Detecting COVID-19 patients and classifying the severity degree from chest CT slices
ML | Roland et al. [46] | Blood-test-based identification of patients with COVID-19 and estimation of mortality risk | Automatic screening for COVID-19 in a cost-effective way without any additional effort
ML | Gros et al. [47] | Accurate estimates of the cumulative medical load of COVID-19 outbreaks | Understanding the outbreak dynamics and predicting future cases and fatalities
HPC | Hack et al. [48] | Promising treatments, including analysis of the virus' protein structure and attack mechanisms, and resource planning | Accelerates the science needed to develop treatments and strategies to combat COVID-19
HPC | West et al. [49] | COVID-19 spread analysis among different populations and effective therapeutic response | Virus transmission analysis at a very large scale
HPC | LeGrand et al. [50] | Drug discovery targeting the proteins of the virus responsible for the current pandemic | Drug discovery for COVID-19 by analyzing large-scale docking campaigns
HPC | Vermaas et al. [51] | Virtual screening of billions of potential drug compounds against COVID-19 proteins | Contribution to developing drugs to combat the current COVID-19 pandemic
HPC | Pérez-Moraga et al. [52] | Drug repurposing to treat COVID-19 infection | Contribution to the development of a cocktail for anticoronavirus treatments
HPC | Mulholland et al. [53] | Provides insights into the inner workings and mechanisms of the molecules of COVID-19 | Can assist in suggesting potential drug candidates
HPC | Zaki et al. [54] | Identification of drug/lead candidates with better inhibitory activity against the main protease of COVID-19 | Useful for developing a therapeutic agent for COVID-19
Hybrid | Pathak et al. [55] | Innovative solutions for restricting COVID-19 spread | Guidelines for pharmaceutical companies to devise better cures
Hybrid | Bhati et al. [56] | Combines HPC and ML to accelerate drug discovery | Target protein analysis to identify lead compounds
Hybrid | Bharadwaj et al. [57] | Development of potential vaccines in much less time and at lower cost | Tackling pandemics and overcoming the crisis with computing intelligence

Promising Research Directions for Future in the Era of COVID-19
In the future, it will be worthwhile to devise models that make sense of data from heterogeneous sources, produce accurate predictions and forecasts, handle variations in epidemic data, and support accurate trend analysis through model fusion. In addition, proposing COVID-19-specific models and new evaluation criteria to assess the effectiveness of these models is another promising research direction. Furthermore, designing and developing robust and accurate privacy-preserving models for individual privacy preservation is imperative. To this end, the development of a prototype system that addresses the privacy implications of all epidemic control measures shown in Figure 1 would benefit the well-being of the community.
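Trend analysis through model fusion can be illustrated with a deliberately simple ensemble: a least-squares linear trend and a naive last-value forecast are combined with weights inversely proportional to each model's mean absolute error on a short holdout window. The two models, the holdout length, and the inverse-error weighting are assumptions chosen for illustration, not the fusion approach of any specific study, and the case series is synthetic.

```python
def linear_forecast(y: list[float], horizon: int) -> list[float]:
    """Fit y_t = a + b*t by ordinary least squares and extrapolate."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = (sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / denom
             if denom else 0.0)
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


def naive_forecast(y: list[float], horizon: int) -> list[float]:
    """Persistence model: repeat the last observed value."""
    return [y[-1]] * horizon


def mae(pred: list[float], actual: list[float]) -> float:
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def fused_forecast(y: list[float], horizon: int, holdout: int = 3) -> list[float]:
    """Combine both models, weighting each by the inverse of its mean absolute
    error on the last `holdout` observations."""
    if not 0 < holdout < len(y):
        raise ValueError("holdout must be between 1 and len(y) - 1")
    models = [linear_forecast, naive_forecast]
    train, test = y[:-holdout], y[-holdout:]
    errors = [mae(model(train, holdout), test) for model in models]
    if any(e == 0 for e in errors):
        weights = [1.0 if e == 0 else 0.0 for e in errors]  # trust perfect fits
    else:
        weights = [1.0 / e for e in errors]
    forecasts = [model(y, horizon) for model in models]  # refit on all data
    total = sum(weights)
    return [sum(w * f[h] for w, f in zip(weights, forecasts)) / total
            for h in range(horizon)]


cases = [1, 2, 4, 7, 11, 16, 22, 29]  # synthetic cumulative counts
print(fused_forecast(cases, horizon=2))
```

The fused forecast always lies between the two component forecasts, so it hedges against either model being badly wrong; richer fusion schemes would use stronger base models and out-of-sample validation.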

Conclusions
This paper presented the role of the latest technologies (i.e., ML and HPC) in the fight against the unanticipated challenge of COVID-19. Specifically, we presented an overview of the epidemic features that require ML and HPC techniques to serve humankind effectively by mitigating the pandemic through technology. We described the multitude of heterogeneous types of data collected in the COVID-19 era and the remarkable opportunities they offer when analyzed with advanced ML and HPC techniques. We highlighted the effectiveness of ML and HPC techniques in the mitigation and control of the COVID-19 pandemic through their use in a variety of applications. We provided potential research directions and challenges in the technical adoption of ML and HPC due to data issues (i.e., unavailability, sparsity, imprecision, data poisoning, etc.). We believe that this study provides a solid foundation for future studies in this area in relation to pandemic features and the corresponding data. The main contributions of this study are as follows.

• It presents an overview of epidemic features that require ML and HPC techniques to serve humankind effectively by controlling the pandemic through technology.
• It describes the multitude of heterogeneous types of data collected in the COVID-19 era and the remarkable opportunities they offer when analyzed with advanced ML and HPC techniques.
• It highlights the effectiveness of ML and HPC techniques in the mitigation and control of the COVID-19 pandemic through their use in three contexts.
• It provides potential research directions and challenges that hinder the adoption of ML and HPC due to data issues (i.e., availability, sparsity, data poisoning, etc.).
• It discusses privacy and security issues and presents the architecture of a prototype system to demonstrate the proposed research.
• To the best of our knowledge, this is the first work to provide a concise overview of the ML and HPC techniques used in the COVID-19 era with the respective data in the loop.
In addition to the key contributions given above, a vibrant area of research is the development of flexible anonymization methods that can easily be tuned, based on the original data characteristics or circumstances, to foster data reusability. Moreover, using ML and HPC techniques to process large-scale data in a privacy-preserving manner is of paramount importance for future endeavors.
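Tunable anonymization can be sketched with k-anonymity: quasi-identifiers (here, age and ZIP code) are generalized level by level until every combination is shared by at least `k` records, and the least coarse level that succeeds is kept. The records, the generalization levels, and the function names are hypothetical; note that k-anonymity alone does not prevent attribute disclosure when all records in a group share the same sensitive value.

```python
from collections import Counter


def generalize(record: dict, age_width: int, zip_digits: int) -> tuple[str, str]:
    """Coarsen the quasi-identifiers: bucket age, mask trailing ZIP digits."""
    lo = record["age"] // age_width * age_width
    masked_zip = record["zip"][:zip_digits] + "*" * (len(record["zip"]) - zip_digits)
    return f"{lo}-{lo + age_width - 1}", masked_zip


def is_k_anonymous(records: list[dict], k: int, age_width: int, zip_digits: int) -> bool:
    """True if every generalized quasi-identifier combination occurs >= k times."""
    groups = Counter(generalize(r, age_width, zip_digits) for r in records)
    return all(size >= k for size in groups.values())


def tune(records: list[dict], k: int, levels: list[tuple[int, int]]):
    """Return the first (least coarse) level that achieves k-anonymity, else None."""
    for age_width, zip_digits in levels:
        if is_k_anonymous(records, k, age_width, zip_digits):
            return age_width, zip_digits
    return None


# Hypothetical records; "diagnosis" is the sensitive attribute, released as-is
people = [
    {"age": 34, "zip": "44100", "diagnosis": "positive"},
    {"age": 37, "zip": "44102", "diagnosis": "negative"},
    {"age": 31, "zip": "44107", "diagnosis": "positive"},
    {"age": 52, "zip": "44231", "diagnosis": "negative"},
    {"age": 58, "zip": "44239", "diagnosis": "positive"},
    {"age": 55, "zip": "44230", "diagnosis": "negative"},
]
levels = [(5, 5), (10, 4), (10, 3), (20, 2)]  # ordered from fine to coarse
print(tune(people, k=3, levels=levels))  # → (10, 4)
```

Raising `k` or changing the order of `levels` is how the method would be tuned to the characteristics of a given dataset; here no listed level reaches k = 4.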

Data Availability Statement:
The statistical data used to support the findings of this study are included within the article.

Conflicts of Interest:
The author declares no conflict of interest.