Digital Twins in Pharmaceutical and Biopharmaceutical Manufacturing: A Literature Review

Abstract: The development and application of emerging Industry 4.0 technologies enable the realization of digital twins (DTs), which facilitate the transformation of the manufacturing sector into a more agile and intelligent one. DTs are virtual constructs of physical systems that mirror the behavior and dynamics of such physical systems. A fully developed DT consists of physical components, virtual components, and information communications between the two. Integrated DTs are being applied in various process and product industries. Although the pharmaceutical industry has evolved recently to adopt Quality-by-Design (QbD) initiatives and is undergoing a paradigm shift of digitalization to embrace Industry 4.0, there has not been a full DT application in pharmaceutical manufacturing. Therefore, there is a critical need to examine the progress of the pharmaceutical industry towards implementing DT solutions. The aim of this narrative literature review is to give an overview of the current status of DT development and its application in pharmaceutical and biopharmaceutical manufacturing. State-of-the-art Process Analytical Technology (PAT) developments, process modeling approaches, and data integration studies are reviewed. Challenges and opportunities for future research in this field are also discussed.


Introduction
Competitive markets today demand the use of new digital technologies to promote innovation, improve productivity, and increase profitability [1]. The growing interest in digital technologies and their promotion in various aspects of economic activity [2] have led to a wave of applications of these technologies in manufacturing sectors. Over the years, advances in digital technologies have initiated different levels of change in manufacturing sectors, including but not limited to the replacement of paper processing with computers, the nurturing and promotion of the Internet and digital communication [1], the use of programmable logic controllers (PLCs) and information technology (IT) for automated production [3], as well as the current movement towards a fully digitalized manufacturing cycle [4]. The digitalization waves have enabled a broad range of applications, from upstream supply chain management and shop floor control and management to post-manufacturing product tracing and tracking.
Among the new digital advancements, artificial intelligence (AI) [5], Internet of Things (IoT) devices [3,5], and digital twins (DTs) have received attention from governments, agencies, academic institutions, and industries [6]. The idea of Industry 4.0 has been put forward by the community of practice to achieve a higher level of automation for increased operational efficiency and productivity. Smart technologies under the umbrella of Industry 4.0, such as the IoT, big data analytics (BDA), cyber-physical systems (CPS), and cloud computing (CC), are playing critical roles in stimulating the transformation of current manufacturing to smart manufacturing [7][8][9][10]. With the development of these Industry 4.0 technologies to assist data flow, a number of manufacturing activities such as remote sensing [11,12], real-time data acquisition and monitoring [13][14][15], process visualization (data, augmented reality, and virtual reality) [16,17], and control of all devices across a manufacturing network [18,19] are becoming more feasible. The implementation of Industry 4.0 standards encourages institutions and companies to adopt a more robust, integrated data framework to connect the physical components to the virtual environment [1], enabling a more accurate representation of the physical parts in digitized space and leading to the realization and application of DTs.
The concept of creating a "twin" of a process or a product can be traced back to the late 1960s, when NASA assembled two identical space vehicles for its Apollo project [20][21][22]. One of the two was used as a "twin" to mirror all the parts and conditions of the one that was sent into space. In this case, the "twin" was used to simulate the real-time behavior of its counterpart.
The first definition of a "digital twin" was given in 2002 by Michael Grieves in the context of an industry presentation concerning product lifecycle management (PLM) at the University of Michigan [23][24][25]. As described by Grieves, the DT is a digital informational construct of a physical system, created as an entity on its own and linked with the physical system [24].
Since the first definition of DT, interpretations from different perspectives have been proposed, with the most popular one given by Glaessgen and Stargel, who note that a DT is an integrated multiphysics, multiscale, probabilistic simulation of a complex product that uses the best available data, sensors, and models to mirror the life of its corresponding twin [26]. It is generally accepted that a complete DT consists of a physical component, a virtual component, and automated data communications between the physical and virtual components [2]. Ideally, the digital component should include all information of the system that could potentially be obtained from its physical counterpart. This ideal representation of the real physical system should be the ultimate goal of a DT, but in practice, simplified or partial DTs currently dominate in industry. These include the digital model, in which a digital representation of a physical system exists without automated data communication in either direction, and the digital shadow, in which a model exists with one-way automated data transfer from the physical to the virtual component [2].
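The distinction between a digital model, a digital shadow, and a full DT depends only on which data flows are automated. The following minimal Python sketch illustrates that rule; the class and function names are hypothetical and are not taken from the cited literature, and the case of an automated virtual-to-physical link without a physical-to-virtual link is not covered by the definitions above and is grouped with the digital model here as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLinks:
    """Which automated data flows exist between the physical and virtual parts."""
    physical_to_virtual: bool  # plant data reaches the model automatically
    virtual_to_physical: bool  # model output reaches the plant automatically


def integration_level(links: DataLinks) -> str:
    """Map the automated data flows onto the three categories described above."""
    if links.physical_to_virtual and links.virtual_to_physical:
        return "digital twin"    # automated communication in both directions
    if links.physical_to_virtual:
        return "digital shadow"  # one-way automated transfer, physical to virtual
    # Assumption: anything without an automated physical-to-virtual link
    # is treated as a digital model.
    return "digital model"


if __name__ == "__main__":
    print(integration_level(DataLinks(True, False)))
```

Under this rule, an offline flowsheet simulation fed by manually exported data would be classified as a digital model, even if the model itself is highly detailed.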
In line with the US Food and Drug Administration (FDA)'s vision to develop a maximally efficient, agile, flexible pharmaceutical manufacturing sector that reliably produces high quality drugs without extensive regulatory oversight [27], the pharmaceutical industry is embracing the general digitalization trend. Industries, with the help of academic institutions and regulatory agencies, are starting to adopt Industry 4.0 and DT concepts and apply them to research and development, supply chain management, as well as manufacturing practice [9][28][29][30][31]. The digitalization move that combines Industry 4.0 with International Council for Harmonisation (ICH) guidelines to develop an integrated manufacturing control strategy and operating model is referred to as Pharma 4.0 [32].
However, according to the recent survey conducted by Reinhardt et al. [33], the preparedness of the industry for this digitalization move is still unsatisfactory. It is noted that most pharmaceutical and biopharmaceutical processes currently rely on quality control checks, laboratory testing, in-process control checks, and standard batch records to assure product quality, whereas process data and models have a lower impact. Within pharmaceutical companies, there are gaps in knowledge of and familiarity with the new digitalization move, resulting in a roadblock to strategic and shop floor implementation of such technologies.
Despite the rapid development of DT and its building blocks, state-of-the-art review studies concerning pharmaceutical and biopharmaceutical manufacturing are limited. This paper aims to provide a literature review and a discerning summary of the current status of DT development and its application in the pharmaceutical industry, focusing on small and large molecule drug product manufacturing, for the purpose of identifying current and future research directions in this area. The remainder of the paper is structured as follows. A description of the general DT framework is provided in Section 2, followed by a detailed review of DT in pharmaceutical and biopharmaceutical manufacturing in Sections 3 and 4, respectively. More specifically, we intend to provide readers with a summary of the critical components of an effective DT and the progress of implementing these components in pharmaceutical and biopharmaceutical manufacturing. After reviewing the current status, we discuss the challenges associated with the development and application of DT in each section, with conclusions at the end.

Digital Twin Framework
As mentioned in Section 1, a DT has a physical component, a virtual component, and automated data communication in between, which is realized through an integrated data management system. This synergy between the physical space, the virtual space, and the integrated data management platform is demonstrated in Figure 1. The physical component consists of all manufacturing sources of data, including different sensors and network equipment (e.g., routers, workstations) [34]. The virtual component needs to be a comprehensive digital representation of the physical component in all aspects [8]. The models are built on prior knowledge, historical data, and data collected in real time from the physical components to improve their predictions continuously, thus maintaining fidelity to the physical space. The data management platform includes databases, data transmission protocols, operation data, and model data. The platform should also support data visualization tools in addition to process prediction, dynamic data analysis, and optimization [34]. Sections 2.1-2.3 discuss each component in more detail.

Physical Component
Sourcing data from the physical process and component is one of the most essential elements in the development of a DT. The critical process parameters (CPPs) for equipment can be obtained either manually from the human-machine interface (HMI) generally provided by the equipment manufacturer or automatically through machine-machine interfaces (MMIs). Several standard MMIs, such as Open Platform Communications (OPC), OPC Data Access (OPC DA), OPC Unified Architecture (OPC UA), and Modbus [35], automate the data transfer from equipment software to control or historian software. OPC UA is considered to be the current standard as it has added features such as multiple tags along with their properties [36]. Data can also be transmitted over the network using Message Queuing Telemetry Transport (MQTT), Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), etc. The critical quality attributes (CQAs) for the product are determined using soft sensors, which usually employ network protocols for data transmission [37]. Soft sensors are a combination of hardware sensors with their proprietary software-enabled models that help obtain information about the process [38]. Soft sensors have been implemented in several process industries for process monitoring and control. These sensors have been used to measure cake resistance in freeze-drying applications [39], measure temperature from pyrometers [40], and estimate product quality during crude distillation [41], and they have found several other industrial applications [42][43][44][45]. Continuous acquisition of large amounts of data requires a systematic framework such as a data historian to store the historical data. Several studies have employed local data historians [46,47] to create an information infrastructure enabling the synchronous collection of process and sensor data. Zidek [48] demonstrated the Industry 4.0 concept for small and medium-sized enterprises (SMEs), where the quality of the product was assessed by a DT, and the communication between the OPC server and PLC system was achieved using OPC UA. A combination of network and OPC communication protocols was used by Kabugo [35] to develop a cloud-based analytics platform for a waste-to-energy plant. Several other studies focusing on smart factories according to the Industry 4.0 standard have utilized similar communication protocols [49][50][51].
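The idea of a soft sensor, a hardware signal combined with a software model that infers a quantity of interest, can be illustrated with a minimal Python sketch. It assumes a simple linear calibration fitted by ordinary least squares against offline reference measurements; the data values, the `SoftSensor` class, and the choice of a linear model are hypothetical and are not taken from the cited studies, which typically rely on more elaborate multivariate or mechanistic models.

```python
def fit_linear_calibration(signals, references):
    """Ordinary least-squares fit of reference = slope * signal + intercept."""
    n = len(signals)
    mean_x = sum(signals) / n
    mean_y = sum(references) / n
    sxx = sum((x - mean_x) ** 2 for x in signals)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(signals, references))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


class SoftSensor:
    """Turns a raw hardware signal into an estimate of a quality attribute."""

    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept

    def estimate(self, raw_signal):
        return self.slope * raw_signal + self.intercept


# Hypothetical calibration set: raw probe signal vs. offline reference assay.
raw_signals = [0.10, 0.20, 0.30, 0.40]
reference_values = [1.0, 2.0, 3.0, 4.0]

sensor = SoftSensor(*fit_linear_calibration(raw_signals, reference_values))
estimated_value = sensor.estimate(0.25)  # inferred attribute for a new reading
```

In a DT setting, the `estimate` call would be applied to each reading arriving over the chosen communication protocol, and the calibration would need to be re-fitted when the process or material changes.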

Virtual Component
The virtual component consists of a collection of models to simulate the physical process and to analyze the current and future state of the system. With appropriate models, the virtual components can be used to perform real-time process simulation and system analyses, including but not limited to sensitivity studies that identify the set of most influential factors [52], design space studies that yield feasible operating conditions [53], and system optimization [54]. Results from real-time process simulation can be sent to the data management platform to visualize the process, and the results of system analyses, together with the preprogrammed expert knowledge, can be used to deliver control commands to the physical counterpart to ensure process and component conformity.
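A sensitivity study of the kind mentioned above can be sketched in its simplest form as a one-at-a-time, finite-difference analysis around a nominal operating point. The Python sketch below is illustrative only: the toy model, the parameter names `feed_rate` and `screw_speed`, and the step size are assumptions, and the studies cited above generally use global methods rather than this local approximation.

```python
def local_sensitivities(model, nominal, rel_step=1e-3):
    """Normalized one-at-a-time sensitivities (d ln y / d ln x) by forward difference."""
    base = model(nominal)
    sensitivities = {}
    for name, value in nominal.items():
        step = rel_step * abs(value) if value != 0 else rel_step
        perturbed = dict(nominal)
        perturbed[name] = value + step
        derivative = (model(perturbed) - base) / step
        # Scale by x / y so that factors with different units can be compared.
        sensitivities[name] = derivative * value / base
    return sensitivities


def toy_process_model(p):
    """Hypothetical response: y = feed_rate^2 * sqrt(screw_speed)."""
    return p["feed_rate"] ** 2 * p["screw_speed"] ** 0.5


nominal_point = {"feed_rate": 20.0, "screw_speed": 250.0}
result = local_sensitivities(toy_process_model, nominal_point)
most_influential = max(result, key=lambda name: abs(result[name]))
```

For this toy response the normalized sensitivities approach the exponents of the power law (about 2 and 0.5), so `feed_rate` is ranked as the more influential factor.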
Different model types exist for use in DT, namely mechanistic models, data-driven models, and hybrid models. Mechanistic models strongly rely on process knowledge and understanding, as the development is based on fundamental principles and process mechanisms [55]. The resulting models are highly generalizable with physically interpretable variables and parameters, with a relatively low requirement for process data. Often, however, this comes with high development and computation costs [54,56]. In contrast, data-driven models depend only on process data, and no prior knowledge is needed [55]. The advantages include more straightforward implementation, relatively low development and computational expenses, and convenient online usage and maintenance. However, the poor interpretability, poor generalizability, and the need for large amounts of data present limitations of this modeling method [55,57,58]. A hybrid modeling strategy is then introduced to balance the advantages and disadvantages of the other two model types [57][59][60][61]. With different hybrid structures, the hybrid modeling method offers improved predictability and flexibility in process modeling [58,61,62].
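One common hybrid structure places a data-driven correction in parallel with a mechanistic model, so that the data-driven part only has to learn the residual the mechanistic part cannot explain. The Python sketch below illustrates this under stated assumptions: the first-order decay model, its parameter values, the linear residual model, and the synthetic data are all hypothetical and are not drawn from the cited works.

```python
import math


def mechanistic_model(t, c0=1.0, k=0.5):
    """First-order decay c(t) = c0 * exp(-k * t); c0 and k are hypothetical."""
    return c0 * math.exp(-k * t)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def build_hybrid(times, measurements):
    """Parallel hybrid: mechanistic prediction plus a fitted residual correction."""
    residuals = [y - mechanistic_model(t) for t, y in zip(times, measurements)]
    slope, intercept = fit_line(times, residuals)

    def hybrid(t):
        return mechanistic_model(t) + slope * t + intercept

    return hybrid


# Synthetic "plant" data: the mechanistic model plus an unmodeled linear drift.
times = [0.0, 1.0, 2.0, 3.0]
measurements = [mechanistic_model(t) + 0.02 * t + 0.01 for t in times]
hybrid_model = build_hybrid(times, measurements)
```

Because the physics carries most of the prediction, the residual model can stay small and needs comparatively little data, which is the trade-off described above.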
In addition to the development of models, the computational cost is also a main concern in the virtual component of DT. Since a fully developed DT aims to represent the physical counterpart and perform system analyses, it would require extensive computational power. For a large system, local desktops and consumer-grade Central Processing Units (CPUs) cannot meet the demand.
Many computationally intensive models can run in parallel using high-performance computing (HPC) to enhance the computational speed to achieve real-time or near-real-time simulations [63][64][65].
To develop models, perform simulations, and conduct system analyses for the virtual component of the DT framework, appropriate modeling platforms are needed. Various commercial modeling platforms and software packages have been developed and have become available. Among them, MATLAB and Simulink (MathWorks) [66], COMSOL Multiphysics (COMSOL) [67], gPROMS FormulatedProducts (Process Systems Enterprise/Siemens) [68], aspenONE products (AspenTech) [69], and STAR-CCM+ (Siemens) [70] are commonly seen in process industries. These platforms offer a large collection of models and/or tools that enable users to create or incorporate unit operations and flowsheet models based on the actual process. Some of these companies have also been developing local and cloud platforms (e.g., gPROMS Digital Applications Platform [71] from Process Systems Enterprise/Siemens, Siemens Mindsphere [72]) for hosting and computing models, integrating physical components, and providing data management functions, thereby offering end-to-end DT solutions. Others have focused on improving compatibility with common data management and IoT integration platforms, which are described next in Section 2.3.

Data Management
In addition to model management and simulation platforms, several commercial IoT Platforms as a Service (PaaS), such as Predix (General Electric) [73], Mindsphere (Siemens) [72], SEEQ [74], TrendMiner [75], and TIBCO Cloud [76], have been developed. These platforms offer a large collection of tools that enable users to develop, visualize, analyze, and manage data on cloud servers. Some cloud service companies, such as Amazon Web Services (AWS) [77], Microsoft Azure [78], Google Cloud [79], and IBM Watson [80], offer multipurpose platforms which are more versatile [81]. These platforms also offer distributed computing, data analysis tools, interaction protocols, and data and device management tools. Several of the interface protocols mentioned in Section 2.1 are also applicable to data transfer in the cloud. These platforms also provide large data storage capacities at affordable prices. Industrial-grade IoT platforms are developed with a higher emphasis on secure device connectivity and cyber-security [82].
Seamless data integration in most cases is mainly hindered by a large amount of heterogeneity between manufacturers and services in the software used and the data formats supported [83]. Some cloud services provide their solutions as optional application program interfaces to integrate with other software, but several are left out due to the large number of software packages in use. Thus, a standard file format is needed to encourage cross-platform integration. The World Wide Web Consortium (W3C) has proposed Extensible Markup Language (XML) and Resource Description Framework (RDF), among other markup languages, to model information explicitly [84]. XML [85] provides the user with the freedom to define tags and data structures which are readable by both machines and humans. This syntax is further developed to incorporate the graph structure of the information within the RDF framework. The W3C also proposed Web Ontology Language (OWL) for information modeling. OWL is a vocabulary extension of RDF and is currently in use with XML and RDF. Unfortunately, these files become cumbersome when large databases need to be stored [86]; thus, Structured Query Language (SQL) was recommended by the American National Standards Institute (ANSI) as a standard language for relational databases [87]. SQL databases are commonly found on cloud servers; however, their difficulty in horizontal scalability has led to the development of Non-SQL (NoSQL) databases, which are easily scalable vertically and horizontally [88] and can be hosted on cloud servers. Cloud servers are not limited to storage; they also offer large and scalable compute capabilities that can be leveraged for quick data analysis and simulations. A web service can also be hosted on a cloud server to create an online dashboard to visualize both the real-time physical data and the data from the simulation/data analysis.
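The contrast between a fixed relational schema and schema-flexible records can be made concrete with a short Python sketch. It uses the standard-library `sqlite3` module for the relational side and plain JSON documents as a stand-in for a document store; this is an illustration of the data shapes only, not a real NoSQL database, and the tag names and values are hypothetical.

```python
import json
import sqlite3

# Relational storage: every reading must fit one fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (tag TEXT NOT NULL, ts REAL NOT NULL, value REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO readings (tag, ts, value) VALUES (?, ?, ?)",
    [
        ("feeder.mass_flow", 0.0, 20.1),
        ("feeder.mass_flow", 1.0, 19.9),
        ("blender.rpm", 0.0, 250.0),
    ],
)
mean_flow = conn.execute(
    "SELECT AVG(value) FROM readings WHERE tag = ?", ("feeder.mass_flow",)
).fetchone()[0]
conn.close()

# Document-style storage: each record carries its own structure, so a
# spectrum and a scalar reading can sit side by side without a schema change.
raw_documents = [
    json.dumps({"source": "nir_probe", "ts": 0.0, "spectrum": [0.12, 0.15, 0.11]}),
    json.dumps({"source": "feeder", "ts": 0.0, "mass_flow": 20.1, "unit": "kg/h"}),
]
documents = [json.loads(doc) for doc in raw_documents]
spectral_sources = [doc["source"] for doc in documents if "spectrum" in doc]
```

The relational table makes aggregate queries straightforward, whereas the document form accommodates heterogeneous sensor payloads at the cost of pushing structure checks into the application.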

Applications of Digital Twin
DT frameworks, as presented in Sections 2.1-2.3, are implemented across various industries [2,4,89] for simulation, real-time monitoring, control, and optimization to handle "what-if" or risk-prone [89] scenarios for improving process efficiency, safety analysis, maintenance, and decision-making [24]. This section provides a brief overview of such applications [4] within various industries such as aerospace, energy, manufacturing, automobile, chemical, healthcare, semiconductor, and city planning, as shown in Table 1. A commercial application of a fully integrated DT was first demonstrated by General Electric (GE) at the Minds + Machines event in 2017 for the GE90 engine [104], with 300 engines integrated together to supply historical and real-time process information for predicting process failure, mitigating risks, and optimizing maintenance costs. Similar applications in the aviation industry include DTs of airplanes used for training simulations [100] and aircraft health management [98,99,105,106] for damage assessment and rectification. The aerospace industry focuses on DT applications for the development of next-generation outer-space vehicles, following a successful demonstration by NASA during Apollo 13 [26,101] in rectifying maintenance problems. DT applications in the energy sector include GE's wind farm [92] and steam turbines [4][90][91][92]. These DTs are capable of integrating historical data in terms of process, fuel costs, electricity, process wear and tear, and weather forecasts to suggest possible real-time modifications for reducing operating costs. Smart manufacturing is another sector benefitting from DT applications through digitization of product manufacturing [96,97] and development of the digital twin shop floor (DTS) [2,18][93][94][95][96], incorporating real-time information on the manufacturing plant, the state of production machinery, environmental conditions, and their effects on manufactured products. DT applications in the area of automobile and transportation focus on automation of vehicles [107] and long-distance transportation [102], along with analysis of maintenance [22] and risk-prone issues [108]. The healthcare industry includes applications such as virtual replicas of patients used for surgical operation training [4], sensors for health monitoring [109], the study of the health of a country's population [110], and "The Living Heart" [111] project developed for the analysis of blood circulation. Furthermore, city planning is another domain where virtual replicas of cities, known as "smart cities" [103], are used for urban planning and optimal resource allocation [112]. Such efforts promote the construction of smart, sustainable cities [113] while providing a holistic view of cross-vertical optimization of overall city infrastructure [114].
From the applications reviewed, it is clear that the concept of DT is rapidly being employed across various domains, given its advantages. However, it is important to identify the challenges associated with the development and application of integrated frameworks for the systematic utilization of DTs.

Challenges
Many research and review articles have discussed challenges in the implementation of DTs, and the issues can be categorized as time-, safety-, and mission-critical [115][116][117][118][119][120]. In this section, issues that are more relevant to the manufacturing sector and modeling community are presented, including data communication, model development and maintenance, cyber-physical security, and real-time capability.
One of the challenges in achieving a DT framework is to establish a stable two-way connection between the physical and virtual components to support real-time integration. Heterogeneity in equipment manufacturers and their software [116] is a major hurdle that needs to be addressed using a common interface or file format that makes interactions between software packages easier. Several prominent manufacturers are already making strides by supporting the commonly used OPC UA/DA interfaces. The creation of a database system that is not only vertically and horizontally scalable but also structured would also be important in such a framework. Thus, migrating to a NoSQL database would be recommended, but in this case, the manufacturing industry lags, since several software packages currently save data only in SQL databases. Additionally, the resolution of sensor data, latency within the data communication channel, increased volume and variety of data, and the requirement of fast storage and retrieval are all challenges within this context.
The development of virtual models is often costly and challenging due to the lack of a complete understanding of the physical process [93]. This deficiency sometimes leads to inconsistencies between models and the physical system. These inconsistencies need to be appropriately identified and handled, which can impose challenges on the modeling and operation teams. To resolve the issue, systematic model development approaches, along with appropriate model maintenance strategies, are needed. Moreover, since the models need to perform simulation and system analyses in real time, efficient and accurate algorithms that can utilize available information continuously and in real time are crucial, presenting a challenge to both the modelers and the allocation of computing resources.
In addition to the modeling aspects, cyber-physical security is another area of concern to ensure the normal operation of physical and virtual components against malicious attacks [121]. In a fully integrated DT, large data sets with important and potentially confidential information are exchanged, which requires secure communication and processing among all systems [122].

Digital Twin in Pharmaceutical Manufacturing
In pharmaceutical manufacturing, the potential of using DTs to facilitate smart manufacturing can be seen in different phases of process development and production. In the process design stage, the use of a DT can significantly accelerate the selection of a manufacturing route and its unit operations, as it is able to represent physical parts with various models. An understanding of process variations can be obtained from DT simulations, which allows for the prediction of product quality, productivity, and process attributes, reducing the time and costs of physical experiments [123]. In the operation phase, real-time process performance can be monitored and visualized at any time, and the DT can analyze the system continuously to provide control and optimization insights into the process [123]. The DT can also be used as a training platform for operators and engineers, as real-time scenario simulation and on-the-job feedback can be realized through the DT. With regard to pre- and post-manufacturing tasks, the DT platform can assist with tasks including but not limited to material tracking, serialization, and quality assurance.
Some key requirements for achieving smart manufacturing with DT include real-time system monitoring and control using Process Analytical Technology (PAT), continuous data acquisition from equipment and from intermediate and final products, and a continuous global modeling and data analysis platform [29]. The pharmaceutical industry has taken several steps towards this by using techniques such as Quality-by-Design (QbD) [124], Continuous Manufacturing (CM) [124], flowsheet modeling [125], and PAT implementations [126]. Some of the tools have been investigated extensively, but the overall integration and development of DTs are still in their infancy.
This section reviews the progress of current research and industry applications towards DTs in pharmaceutical manufacturing from the aspects of PAT sensing, model building, and data integration, which correspond to the physical component, virtual component, and data management parts of the general DT framework. Challenges and opportunities are discussed at the end of this section.

PAT Methods
A key component in the development of a DT is data collection. In addition to readings from equipment, (critical) quality attributes also need to be collected from physical plants in a timely manner for use in the virtual component. The models and analyses are reliant on good data. Several traditional technologies, such as sieve analysis and High-Performance Liquid Chromatography (HPLC), exist to determine CQAs, but these cannot provide real-time data and are performed away from the production line rather than in-line or at-line. Thus, PAT tools have been explored and developed to address these issues [127].
PAT tools in the pharmaceutical industry have a wide range of applications, including measuring particle size of crystals [128], measuring blend uniformity [129], and testing tablet content uniformity [130]. Spectroscopy tools (Nuclear Magnetic Resonance (NMR), Ultraviolet (UV), Raman, near-infrared, mid-infrared, online mass spectrometry) constitute one of the major techniques used to measure the CQAs of pharmaceutical processes. Raman and Near-Infrared Spectroscopy (NIRS) are commonly used in the industry. Raman spectroscopy has been employed for the on-line monitoring of powder blending processes [131]. Since acquisition times for Raman can be longer, NIRS is preferred for real-time measurements. NIRS has been used for real-time monitoring of powder density [15] and blend uniformity of processes [129]. NIRS has also been integrated with control platforms for process monitoring and control [132]. Baranwal et al. [133] employed NIRS to replace HPLC methods to predict API concentration in bi-layer tablets. PAT tools have also been used by the pharmaceutical industry to determine the particle size distribution of the product [134]. Several available optical tools, such as Focused Beam Reflectance Measurement (FBRM) [135] and high-resolution camera systems [136], have also been employed in the industry for particle size analysis. Some studies have utilized a network of PAT tools to build a system that monitors and controls a unit process [127,137].
The US FDA has also taken steps in promoting the use of PAT tools in pharmaceutical manufacturing with the goal of ensuring final product quality [138]. The pharmaceutical industry has adopted PAT in various applications throughout the drug-substance manufacturing process [139]. Although this has certainly led to an increase in the usage of PAT tools, their applications still remain focused on research and development rather than on full-scale manufacturing [126]. In the limited number of cases where they were employed in manufacturing, they have been successful in reducing manufacturing costs and improving the monitoring of product quality [140]. The development of different PAT methods, with their compelling application as an integral part of a monitoring and control strategy [141], has established a building block in gathering essential data from the physical component, enabling the further development of process models and DTs.

Process Modeling
DTs highly depend on the use of data and models, and in the pharmaceutical industry, there is a growing interest in the development and application of methods and tools that facilitate that [142]. Different types of models have been developed for batch and continuous process simulations, material property identification and prediction, system analyses, and advanced control. Papadakis et al. recently proposed a framework for selecting efficient reaction pathways for pharmaceutical manufacturing [143], which includes a series of modeling workflows for reaction pathway identification, reaction and separation analysis, process simulation, evaluation, optimization, and operation [142]. The overall framework would yield an optimized reaction process with identified design space and process analytical technology information. The models developed under this framework can all be used as the virtual component within a DT framework to provide further process understanding and control of the manufacturing plant.
As mentioned in Section 2.2, the modeling approaches can be classified as mechanistic modeling, data-driven modeling, and hybrid modeling. For mechanistic modeling approaches in pharmaceutical manufacturing, the discrete-element method (DEM), finite-element method (FEM), and computational fluid dynamics (CFD) are often used [144]. To simulate the particle-level or bulk behavior of the material flow in different pharmaceutical unit operations, DEM is a powerful tool and has been applied widely [145][146][147], though its high computational cost limits its practical use when running locally. With HPC and cloud computing, it is possible to integrate DEM simulations with the overall process, resulting in a near-real-time model. To model fluid flow in pharmaceutical processes, including API drying and fluidized beds, CFD and FEM are widely implemented [144]. These two methods are also heavily utilized in biopharmaceutical manufacturing (see Section 4.2).
Data-driven modeling methods involve the collection and usage of a large amount of experimental data to generate models, and the resulting models are based on the provided datasets only. Commonly implemented approaches in pharmaceutical manufacturing include the artificial neural network (ANN) [148,149], multivariate statistical analysis, Monte Carlo [150], etc. These methods are less computationally intensive, but due to the lack of underlying physical understanding in the trained models, the prediction outside of the space of the dataset is often unsatisfactory.
There is also a recent trend in developing various types of hybrid modeling techniques to model complex pharmaceutical manufacturing processes while lowering the demands on computational cost and data availability. Population balance modeling (PBM), with a comparatively lower computational cost, has been extensively used to model blending and granulation processes [64,151], and a PBM-DEM hybrid model has also been used to improve model accuracy while maintaining reasonable computational costs [152]. Other semi-empirical hybrid models, such as those that incorporate material properties into process models [153] and those that investigate the effect of material properties on residence time distribution (RTD) and process parameters [146,154-157], have also been developed for different powder processing unit operations [52,158]. These models, when incorporated into a full DT framework, will facilitate the overall product and process design and development, accelerating the drug-to-market timeline.
Table 2 provides a feature-based comparison of various models used in pharmaceutical manufacturing applications. The characterization of computational complexity is based on the typical computational cost for a single unit operation. Real-time capability refers to the ability of a model to produce simulation or prediction results in real time and, ideally, in sync with the equipment. This feature depends strongly on computational complexity. Even though mathematical and semi-empirical modeling approaches have this capability, they are mostly trained and implemented offline. Real-time applications are rarely seen in the context of pharmaceutical manufacturing. For adaptive modeling capability, modeling approaches that can incorporate data are advantageous because new data can be used to update the models. However, online use of these models in adaptive mode is rarely reported.
In addition to developing models for single pharmaceutical unit operations, a flowsheet model integrating the entire manufacturing process can be used to predict the process dynamics affected by material properties and operating conditions of different unit operations. More importantly, systematic process analyses, such as sensitivity analysis, design space identification, and optimization, can all be performed with the flowsheet model. This provides insight into the characteristics and bottlenecks of the process and thus facilitates the development of control strategies [125]. Over the years, many researchers and pharmaceutical companies have developed mature approaches for conducting these analyses offline during the process design phase [52,56,125,159,160]. Flowsheet models are needed for the development of DTs. However, flowsheet models are stand-alone, so they cannot automatically update to adapt to the physical plant. In current research, there is limited communication between the flowsheet model and the plant, which is a challenge in the development of a DT.

Data Integration
The implementation of IoT devices in pharmaceutical manufacturing lines leads to the acquisition of vast amounts of data. This collection of process data and CQAs needs to be transmitted to the virtual component in real time and in an efficient manner. In addition, several pharmaceutical process models also require material properties for accurate prediction. Thus, a central database is required to give the virtual component access to all datasets [46]. All data transfer protocols discussed in Section 2.3 are applicable here as well. The applications and databases should also be compliant with 21 CFR Part 11 data integrity requirements in accordance with the US FDA's guidance [161]. The database not only serves as a warehouse for real product data but can also be used to store results from simulations performed in the virtual component and optimized process parameters. It would also serve the purpose of relaying these optimized process parameters back to the real product.
Several studies have attempted to achieve an integrated data framework in downstream pharmaceutical manufacturing [46,84,132,162-165]. Some of these studies focused on implementing a control system for the direct compression line [132,157,165]. Cao et al. [46] presented an ISA-88 compliant manufacturing execution system (MES) in which the batch data were stored on a cloud database as well as on a local data historian. The communications between the equipment and the control platform were performed in a similar manner in all the studies. The process control system (PCS) created a database based on the input recipe, and the database was replicated directly into the local data historian. The communication between the historian and the PCS can be achieved using TCP/IP and OPC, since each software package is hosted on a different computer system on the same network. The historian database can in turn be duplicated onto the cloud using network protocols such as MQTT and HTTPS. Some authors have also presented ontologies for efficient data flow for laboratory experiments performed during pharmaceutical manufacturing [166-168]. Cao et al. [46] also addressed the collection of laboratory data in an ISA-88 applicable recipe-based electronic laboratory notebook. Many of the presented studies focused primarily on integrating one component of a completely integrated data management system. Figure 2 illustrates a sample data integration framework, where data collected from the manufacturing plant as well as laboratory experiments are uploaded to a cloud database using the mentioned protocols. The data can then be used in the virtual component for simulations, and corrective actions can be sent back to the control platform.
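As a minimal sketch of the data flow described above, the following Python fragment packages a single historian record as a JSON message that could be forwarded to a cloud database over a publish/subscribe protocol such as MQTT. The record schema, the topic layout, and the names ProcessRecord, to_message, and from_message are illustrative assumptions and are not taken from the cited studies; the network transport itself is deliberately omitted.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Tuple


@dataclass
class ProcessRecord:
    """One time-stamped measurement from a unit operation (hypothetical schema)."""
    batch_id: str
    unit: str        # e.g. "feeder", "blender", "tablet_press"
    tag: str         # sensor or CQA name
    value: float
    timestamp: str   # ISO 8601 string, UTC


def to_message(record: ProcessRecord) -> Tuple[str, str]:
    """Return a (topic, JSON payload) pair for a publish/subscribe broker."""
    topic = f"plant/{record.batch_id}/{record.unit}/{record.tag}"
    payload = json.dumps(asdict(record), sort_keys=True)
    return topic, payload


def from_message(payload: str) -> ProcessRecord:
    """Rebuild the record on the virtual-component side."""
    return ProcessRecord(**json.loads(payload))


# Example: one blend-concentration reading leaving the local historian.
record = ProcessRecord(
    batch_id="B001",
    unit="blender",
    tag="api_concentration",
    value=9.87,
    timestamp=datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
)
topic, payload = to_message(record)
restored = from_message(payload)
```

Keeping the batch identifier and unit name in the topic is one possible design choice; it lets a subscriber on the virtual side filter by batch or unit operation without parsing every payload.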

Challenges and Opportunities
Integrating all building blocks mentioned in Sections 3.1-3.3, the authors envision a fully integrated, model-centric DT framework for pharmaceutical manufacturing, as shown in Figure 3. The physical plant continuously sends process data to the virtual end, establishing a data inflow to achieve continuous process monitoring and data storage. Once the real-time data are received, process visualization and evaluation can be performed in real time using visualization tools and process models. Automatic control based on evaluation results can then be executed to modify process operations if needed. The overall data and information flow becomes a continuous, real-time, integrated loop. Models can be updated based on plant measurements and changes by implementing hybrid or adaptive modeling techniques, and real-time model evaluation results that support the identification of critical process parameter boundaries, process optimizations, and material/process characterization can guide the operational updates of the plant.
Our review has shown that the pharmaceutical industry is moving towards adopting a full DT. Currently, continuous monitoring of processes, storage of operation data, process visualization, and model-predictive control have been implemented in pharmaceutical applications. Building blocks are in place for all three components, but some key challenges and gaps remain. In terms of process monitoring and the use of PAT, though the use of spectroscopy to estimate product compositions has become routine, the accuracy of measurements in low-dose drug products, the consideration and handling of outside interferences, and the maintenance of calibration models (i.e., the robustness of calibration) are all common problems. For low-dose drug measurements, new tools such as NIRS and in-line UV spectroscopy are available, and the accuracy can be further improved by increasing sampling frequency and spectra analysis. The outside interference issue may be resolved by implementing various iterative optimization technologies, as recent studies have demonstrated the capability of such an approach [169,170]. With regard to calibration model maintenance, different offline, adaptive methodologies have been well presented by Kadlec et al. [171], but online, continuous updating with streaming data may be an option moving forward.
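To illustrate what an online, continuous update of a calibration model with streaming data could look like, the sketch below applies recursive least squares (RLS) with a forgetting factor to a one-variable linear calibration. RLS is used here only as a familiar stand-in; it is not the method of Kadlec et al. [171], and the class name, the forgetting factor, the initial covariance, and the synthetic reference data are all assumptions made for the example.

```python
class RecursiveCalibration:
    """Linear calibration y = a*x + b, updated one sample at a time by RLS."""

    def __init__(self, a=0.0, b=0.0, forgetting=0.99, p0=1000.0):
        self.theta = [a, b]                    # current slope and intercept
        self.P = [[p0, 0.0], [0.0, p0]]        # parameter covariance (large = uncertain)
        self.lam = forgetting                  # values below 1 discount old samples

    def predict(self, x):
        return self.theta[0] * x + self.theta[1]

    def update(self, x, y_ref):
        """Correct the model with one reference measurement; return the prior error."""
        phi = [x, 1.0]
        P = self.P
        P_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                 P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = self.lam + phi[0] * P_phi[0] + phi[1] * P_phi[1]
        gain = [P_phi[0] / denom, P_phi[1] / denom]
        error = y_ref - self.predict(x)
        self.theta = [self.theta[0] + gain[0] * error,
                      self.theta[1] + gain[1] * error]
        phi_P = [phi[0] * P[0][0] + phi[1] * P[1][0],
                 phi[0] * P[0][1] + phi[1] * P[1][1]]
        self.P = [[(P[i][j] - gain[i] * phi_P[j]) / self.lam for j in range(2)]
                  for i in range(2)]
        return error


# Synthetic stream: the true relationship is y = 2x + 1, while the stored
# (outdated) calibration starts at slope 1.8 and intercept 1.2.
model = RecursiveCalibration(a=1.8, b=1.2)
for i in range(200):
    x = (i % 10) * 0.1
    model.update(x, 2.0 * x + 1.0)
```

In this noiseless example the slope and intercept move from the outdated values towards 2 and 1. With real spectra the same idea would apply to multivariate calibration models, where noise, outliers, and the choice of forgetting factor would need careful treatment.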
At the virtual end, recent research and technology development have shaped the general framework and applications. Libraries of models and system analysis tools exist to develop a fully connected virtual model. However, as mentioned in Section 3.2, the computational cost for many complex and integrated models is high, requiring the use of cloud and/or high-performance computing. The high computational requirement also hinders the use of models in real time, which is a key component of the DT framework [4]. To resolve this issue, efficient computational algorithms and reduced order modeling approaches need to be implemented, together with efficient distribution of computational resources. Another relevant issue is that most models developed for the pharmaceutical industry are static, meaning that they only reflect the system at the time the models were developed. The models do not update themselves as new data become available. Model maintenance is, therefore, required [172], and the goal is for this to be performed automatically by the virtual component [171,173,174]. These model maintenance problems can also be viewed as issues caused by a number of drifts (i.e., concept drift, model drift, data drift, sensor drift). Methodologies for handling drifts have been extensively studied in many electrical and computer engineering papers [175-178], but case studies in pharmaceutical manufacturing have not yet been reported.
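As one hedged illustration of how drift could be flagged automatically, the sketch below implements the Page-Hinkley test, a standard change-detection scheme from the data-stream literature, and applies it to a synthetic series of model residuals. The choice of this test, the parameter values, and the residual series are assumptions for the example and do not come from the studies cited above.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a data stream."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level for the test statistic
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # cumulative deviation from the running mean
        self.cum_min = 0.0          # smallest cumulative deviation seen so far

    def update(self, x):
        """Add one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold


# Synthetic residuals (prediction minus measurement): centred on zero for
# 100 samples, then offset by 1.0 to mimic sensor or model drift.
residuals = [0.1 if i % 2 == 0 else -0.1 for i in range(100)]
residuals += [1.0 + (0.1 if i % 2 == 0 else -0.1) for i in range(100)]

detector = PageHinkley()
first_alarm = None
for index, r in enumerate(residuals):
    if detector.update(r) and first_alarm is None:
        first_alarm = index
```

In a DT setting, such an alarm could be used to trigger model recalibration or to request new reference measurements, rather than updating the model at every time step.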
One of the most prominent issues is the information communication between the two components. Table 3 compares previous data integration frameworks that have been developed for pharmaceutical manufacturing. The limitations of each of these studies highlight the inability of current software tools and solutions to build a complete DT. Though the integration capability has been improving, most of the current applications in the pharmaceutical industry only transfer data from the physical plant to the virtual component. The reverse is rarely seen. To have a fully integrated and automated DT, the information flow from the virtual component to the physical plant also needs to be established. The virtual plant should be able to change system settings and control the physical plant to help achieve an optimized process within the design space.
Table 3 (row fragment): Presented a cyber-physical framework for Process Analytical Technology (PAT) tools for pharmaceutical manufacturing; N/A; Data integration was only performed for PAT tools without any integration of analytics.
In addition, integrating data inside the physical manufacturing plant faces issues with the lack of homogeneity in the data formats used by manufacturers [116]. A full manufacturing cycle requires the collection of online and offline data from different departments and software. Though an increasing number of companies are adopting standard data formats and transfer protocols, the coordination among all the different data, software, and platforms is still a challenge. Currently, this coordination is more of a business and engineering decision within the companies using these systems. Poor integration and coordination often lead to the burden of using and maintaining multiple platforms and software. Because of this, many companies now prefer to purchase equipment and systems from a sole vendor, which is both a challenge and an opportunity for equipment and system providers.
With the use of cloud databases and cloud-based data management systems, data availability, stability of service, storage volume, and information security are all critical issues to be addressed [118]. As data are stored on the cloud, these data should be available when needed, which demands a highly stable service and a rigorous business continuity plan. Many cloud platforms use distributed technologies and cloud backups to resolve this issue, but the validity and reliability of the solutions need to be carefully studied before implementing them [179]. Moreover, with the implementation of IoT devices and various types of sensors, the volume of data collected from the manufacturing cycle can be extremely large. Even though many cloud platforms claim that they can accommodate the demanded storage capacity, a high storage cost would place an increasing burden on the company. With regard to information security, the issue is not new to the field of cloud storage, but it is particularly relevant to the pharmaceutical industry since the majority of the information is highly confidential, and cases have shown that a vulnerable cyber system in pharmaceutical companies can cost millions or even billions of dollars. This challenge gives rise to opportunities in the research and deployment of cyber-physical security systems to ensure the safety and confidentiality of the information being transferred. This field has been a hot topic, especially in electrical and computer engineering disciplines. Methodologies used in securing smart grids, statistical-based authentication systems, physical and virtual cyber barriers, etc. can be implemented in pharmaceutical manufacturing to develop a secure DT.
Finally, the regulatory perspective is an important consideration in developing and applying DTs in pharmaceutical manufacturing. The US FDA has developed modeling capability and has granted funding to academic institutions to explore the appropriate application of process models and DTs in the field. Various guidelines, reports, and presentations have all demonstrated that regulatory experience with and exposure to the DT concept are currently evolving [27,180]. Though DT development is not required for regulatory approval, its components can offer pharmaceutical companies and regulatory bodies more insight into the process and product.

Digital Twin in Biopharmaceutical Manufacturing
Biopharmaceutical manufacturing focuses on the production of large molecule-based products in heterogeneous mixtures, which can be used to treat cancer, inflammatory, and microbiological diseases [181,182].To fulfill the FDA regulations and obtain safe products, biopharmaceutical operations should be strictly controlled and operate under a sterilized process environment.
In recent years, an increasing demand for biologic-based drugs has driven the need for manufacturing efficiency and effectiveness [183]. Thus, many companies are transitioning from batch to continuous operation mode and employing smart manufacturing systems [182]. A DT integrates the physical plant, data collection, data analysis, and system control [4], which can assist biopharmaceutical manufacturing in product development, process prediction, decision making, and risk analysis, as shown in Figure 4. Monoclonal antibody (mAb) production is selected as an example to represent the physical plant, which includes cell inoculation, seed cultivation, production bioreactor, recovery, primary capture, virus inactivation, polishing, and final formulation. These operations produce and purify protein products. Quality (mainly protein structure and composition) and impurities need to be monitored, and the measurements need to be transmitted to a virtual plant for analysis and virtual plant updates. The virtual plant includes plant simulation, analysis, and optimization, which guide the physical plant diagnosis and update with the help of the process control system. Integrated mAb production flowsheet modeling, bioreactor analysis and design space, and biomass optimization are selected as examples shown in the three sections in the figure. However, the capabilities of the virtual plant are not limited to the examples listed above. To understand the progress of DT development in biopharmaceutical manufacturing, this section reviews process monitoring, modeling, and data integration (communication between the virtual plant and the physical plant) in the existing industry and analyzes the possibilities and gaps in achieving integrated biopharma-DT manufacturing.

PAT Methods
Biological products are highly sensitive to cell line and operating conditions, while the fractions and structures of the product molecules are closely related to drug efficacy [184]. Thus, having a real-time process diagnostic and control system is essential to maintain consistent product quality. However, process contamination needs to be strictly controlled in biopharmaceutical manufacturing; thus, the monitoring system should neither be affected by fouling nor interfere with the media, so that monitoring accuracy, sensitivity, stability, and reproducibility are maintained [185]. In general, process parameters and quality attributes need to be captured across the different unit operations.
Biechele et al. [185] presented a review of sensing applied in bioprocess monitoring. In general, process monitoring includes physical, chemical, and biological variables. In the gas phase, the commonly used sensing system consists of semiconducting, electrochemical, and paramagnetic sensors, which can be applied to oxygen and carbon dioxide measurements [185,186]. In the liquid phase, dissolved oxygen, carbon dioxide, and pH values have been monitored by in-line electrochemical sensors. However, media composition, protein production, and qualities such as glycan fractions are mostly measured by online or at-line HPLC or GC/MS [186,187]. The specific product quality monitoring methods are reviewed by Guerra et al. [188] and Pais et al. [189].
Recently, spectroscopy methods have been developed for accurate and real-time monitoring for both upstream and downstream operations. The industrial spectroscopy applications mainly focus on cell growth monitoring and culture fluid component quantification [190]. UV/Vis and multiwavelength UV spectroscopy have been used for in-line real-time protein quantification [190]. NIR has been used for off-line raw material and final product testing [190]. Raman spectroscopy has been used for viable cell density, metabolite, and antibody concentration measurements [191,192]. In addition, spectroscopy methods can also be used for process CQA monitoring, such as host cell protein and protein post-translational modifications [187,193]. Research shows that in-line Raman spectroscopy and Mid-IR are capable of monitoring protein concentration, aggregation, host cell proteins (HCPs), and charge variants [194,195]. The spectroscopy methods are usually supported by chemometrics, which require data pretreatments such as background correction, spectral smoothing, and multivariate analysis for quantitative and qualitative analysis of the attributes. Many different applications of spectroscopic sensing are reviewed in the literature [187,188,190,193].
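The pretreatment steps named above can be sketched in a few lines. The example below uses deliberately simple stand-ins: a straight-line background drawn between the two ends of the spectrum, and a centered moving average for smoothing. These simplifications, the function names, and the five-point toy spectrum are assumptions for illustration; practical chemometric workflows typically use more elaborate baseline and smoothing filters followed by multivariate regression.

```python
def subtract_linear_baseline(spectrum):
    """Remove a straight-line background joining the first and last points."""
    n = len(spectrum)
    first, last = spectrum[0], spectrum[-1]
    return [y - (first + (last - first) * i / (n - 1))
            for i, y in enumerate(spectrum)]


def moving_average(spectrum, window=3):
    """Centered moving average; the window shrinks at the two edges."""
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        lo = max(0, i - half)
        hi = min(len(spectrum), i + half + 1)
        smoothed.append(sum(spectrum[lo:hi]) / (hi - lo))
    return smoothed


# Toy spectrum: one peak at the middle channel on top of a sloping background.
raw = [1.0, 1.5, 4.0, 2.5, 3.0]
corrected = subtract_linear_baseline(raw)
smoothed = moving_average(corrected, window=3)
```

After baseline subtraction only the peak remains above zero, and smoothing spreads it over the neighbouring channels, which shows the usual trade-off between noise reduction and peak resolution.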

Process Modeling
The application of DT in biopharmaceutical manufacturing requires a complete virtual description of the physical plant within a simulation platform [4]. This means that the simulation should capture the important process dynamics in each unit operation within an integrated model. Previous reviews have focused on the process modeling methods for both upstream and downstream operations [183,196-200].
For downstream operation, modeling strategies have focused on selecting design parameters, adjusting operating conditions, and buffer usage to achieve high protein productivity and purities efficiently. The different operating conditions include (1) flowrate, buffer pH, or salt concentration effects for chromatography operation [223-226]; (2) residence time, buffer concentration, and pH used for virus inactivation; and (3) feed protein concentration, flux, and retentate pressure for filtration [227]. Thus, the product concentration and various types of impurities can be predicted for each unit operation. The detailed modeling methods have been reviewed in the literature [228].
In recent years, biopharmaceutical companies have been shifting from batch to continuous operations. It remains an open question whether it is more feasible to start up a new, fully continuous process plant or to replace specific unit operations with continuous units. Integrated process modeling provides a virtual platform to test various operating strategies such as batch, continuous, and hybrid operating modes [229]. These different operating modes can be compared based on life cycle analysis and economic analysis for different target products under various operation scales [229-233].
For flowsheet modeling, two approaches are available in the literature: mechanistic and data-driven models. Due to the high computational cost, mechanistic modeling mostly focuses on the integration of a limited number of units, such as the combination of multiple chromatography operations [234]. Data-driven/empirical models are generally used to integrate all the unit operations in a computationally efficient way. A mechanistic model for a single unit can be integrated with other units that are represented by data-driven models to optimize that specific unit within the integrated process [235]. Mass flow and RTD models [236] can be included in the model to examine different scenarios of adding and replacing unit operations and adjusting process parameters. Coupled with the control system, flowsheet modeling will be able to achieve real-time decision making and optimize the overall process operation automatically [237].
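To make the role of an RTD model in a flowsheet concrete, the sketch below uses the classical tanks-in-series model, in which a unit with mean residence time tau is represented by n equal, ideally mixed tanks with E(t) = t^(n-1) exp(-t/tau_i) / ((n-1)! tau_i^n) and tau_i = tau/n. The outlet signal is obtained by discretely convolving the inlet signal with E(t). The choice of this model, the parameter values, and the function names are assumptions for illustration and are not the specific models of [236].

```python
import math


def tanks_in_series_rtd(t, tau, n):
    """Exit-age density E(t) for n equal, ideally mixed tanks in series."""
    tau_i = tau / n
    return (t ** (n - 1) / (math.factorial(n - 1) * tau_i ** n)
            * math.exp(-t / tau_i))


def outlet_response(inlet, tau, n, dt):
    """Outlet signal as the discrete convolution of the inlet with E(t)."""
    e = [tanks_in_series_rtd(k * dt, tau, n) for k in range(len(inlet))]
    return [sum(inlet[j] * e[k - j] for j in range(k + 1)) * dt
            for k in range(len(inlet))]


# Assumed unit: mean residence time of 5 min, represented by 3 tanks.
dt, tau, n = 0.1, 5.0, 3
times = [k * dt for k in range(600)]
density = [tanks_in_series_rtd(t, tau, n) for t in times]

area = sum(e * dt for e in density)                       # should be close to 1
mean_time = sum(t * e * dt for t, e in zip(times, density))  # close to tau

# Response of the unit to a step change at the inlet (0 -> 1 at t = 0).
step_out = outlet_response([1.0] * len(times), tau, n, dt)
```

Chaining several such responses, one per unit operation, gives a lightweight way to estimate how a disturbance at one unit propagates downstream, which is the kind of scenario analysis described in the paragraph above.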
The data-driven models can be further integrated with Monte Carlo analysis or linear/nonlinear programming for risk assessment and process scheduling. Zahel et al. [238] applied Monte Carlo simulation in an end-to-end data-driven model, which can be used to estimate process capabilities and support risk-based decision making following a change in the manufacturing operations.
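A minimal sketch of this kind of Monte Carlo analysis is given below. Each unit operation is reduced to a step yield drawn from a normal distribution, and the fraction of simulated batches whose overall yield meets a lower limit serves as a simple capability estimate. This is not the model of Zahel et al. [238]; the step yields, their standard deviations, the specification limit, and the function names are hypothetical values chosen only to show the mechanics.

```python
import random


def simulate_batch(rng, steps):
    """Propagate a normalized product amount through sequential unit operations."""
    amount = 1.0
    for mean_yield, sd in steps:
        # Sample the step yield and clip it to the physical range [0, 1].
        step_yield = min(1.0, max(0.0, rng.gauss(mean_yield, sd)))
        amount *= step_yield
    return amount


def probability_in_spec(steps, lower_limit, n_runs=10000, seed=0):
    """Estimate the probability that the overall yield meets the lower limit."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    hits = sum(simulate_batch(rng, steps) >= lower_limit for _ in range(n_runs))
    return hits / n_runs


# Hypothetical (mean yield, standard deviation) for three purification steps.
baseline = [(0.95, 0.02), (0.90, 0.03), (0.85, 0.04)]
# Hypothetical process change: the last step becomes less variable.
changed = [(0.95, 0.02), (0.90, 0.03), (0.85, 0.01)]

p_baseline = probability_in_spec(baseline, lower_limit=0.70)
p_changed = probability_in_spec(changed, lower_limit=0.70)
```

Comparing the two probabilities gives a risk-based view of the proposed change: lower variability in one step raises the chance that a batch meets the yield limit even though the mean yields are unchanged.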
Table 4 shows examples of capabilities and methods for process modeling that can potentially be used in building the DT virtual plant model. However, it should be noted that although process modeling is able to capture all the above operating conditions and critical quality attributes, none of the modeling work incorporates all the process information within a single model. In recent years, hybrid models (for example, ANN + mechanistic model) have become more prevalent in both upstream and downstream model building because they improve the computational speed as well as the breadth of application and model robustness.
Table 4. Capabilities and methods for process modeling in biopharmaceutical manufacturing. Note that many studies have used these methods, and the studies cannot be listed one by one. The papers selected in the table are used to represent the capabilities of the specific methods.

Data Integration
Data obtained in the biopharmaceutical monitoring system are usually heterogeneous in data types and time scales. They can be collected from different sensors, production lines (laboratory or manufacturing), and at different time intervals. With the development of real-time PAT sensors, a large amount of data is obtained during biopharmaceutical manufacturing. Thus, data preprocessing is essential to handle missing data, perform data visualization, and reduce dimension [253]. Casola et al. [254] presented data mining-based algorithms to stem, classify, filter, and cluster historical real-time data in batch biopharmaceutical manufacturing. Lee et al. [255] applied data fusion to combine multiple spectroscopic techniques and predict the composition of raw materials. These preprocessing algorithms remove noise from the dataset and allow the data to be used in a virtual component directly.
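One small part of such preprocessing, aligning signals sampled at different intervals onto a common time grid, can be sketched as follows. The example holds the last known value of a slowly sampled signal until a new measurement arrives (a zero-order hold). This is only one simple option and is not taken from the cited works; the function name, the sampling times, and the titer values are hypothetical.

```python
from bisect import bisect_right


def resample_forward_fill(times, values, grid):
    """Hold the last known value at each grid time; None before the first sample."""
    resampled = []
    for t in grid:
        k = bisect_right(times, t)   # number of samples taken at or before t
        resampled.append(values[k - 1] if k > 0 else None)
    return resampled


# Hypothetical at-line titer measured every 4 min, aligned to a 2 min grid
# so it can be combined with faster in-line sensor data.
titer_times = [0, 4, 8]
titer_values = [1.0, 1.2, 1.5]
grid = [0, 2, 4, 6, 8, 10]

aligned = resample_forward_fill(titer_times, titer_values, grid)
```

Returning None before the first sample keeps missing data explicit, so a later step can decide whether to drop, interpolate, or impute those points.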
In DTs, virtual components and physical components should communicate frequently. Thus, the virtual platforms need to have the flexibility to adjust their model structure for different products and operating conditions. Herold and King [256] presented an algorithm that used biological phenomena to identify fed-batch bioreactor process model structure automatically. Luna and Martinez [257] used experimental data to train an imperfect mathematical model and correct its prediction errors. Although there are no such applications for the integrated process, these works show that communication between physical and virtual components is achievable.
In biopharmaceutical manufacturing, the integrated database can guide process-wide automatic monitoring and control [258]. Fahey et al. applied Six Sigma and CRISP-DM methods and integrated data collection, data mining, and model predictions for upstream bioreactor operations. Although process optimization and control were not considered in this work, it still shows the capability to handle large amounts of data for predictive process modeling [259]. Feidl et al. [258] used a supervisory control and data acquisition (SCADA) system to collect and store data from different unit operations at each sample time and developed a monitoring and control system in MATLAB. The work shows the integration of supervisory control with a data acquisition system in a fully end-to-end biopharmaceutical plant. However, process modeling was not considered during the process operations, so the system cannot support process prediction and analysis.

Challenges and Opportunities
In terms of process monitoring in the physical plant, real-time CQA monitoring methods have not yet been adopted in industrial applications. The use of NIR or Raman spectroscopy shows potential for real-time multicomponent measurements, although most applications have not yet reached industrial practice. To obtain accurate prediction/measurement results, raw material calibration and chemometric methods need to be applied, which increases the complexity of applying spectroscopy. In addition, the data obtained from biopharmaceutical manufacturing are high dimensional and heterogeneous, which requires advanced data integration and synchronization. An automated data aggregation, mining, storage, and visualization system is required to achieve DT automation. The data storage system should have sufficient capacity, easy accessibility, and high security, as described in Section 3.4, to ensure manufacturing data security, patient data privacy, and successful communication between the physical and virtual plants.
To build a simulation of the physical plant, although different modeling methods have been developed for both upstream and downstream unit operations, there is no robust model that captures CPPs and CQAs for all the unit operations in the integrated process. As listed in Table 4, upstream CFD, stoichiometric, and kinetic models can achieve bioreactor modeling on different scales (from genome scale to manufacturing scale); however, not all these methods can be implemented within a DT framework because of the high computational cost. Similarly, for downstream processes composed of different unit operations, integrating and optimizing all the mechanistic models together is not realistic. This explains why the current integrated process models focus on mass balance and activity plans based on empirical models or simulators. To deal with this problem, one possible way is to apply pre-analysis to the system to reduce the dimension and the number of parameters by evaluating the CPPs and CQAs to ensure productivity and efficacy. Based on the analysis, the system will select models and use the limited number of parameters to analyze or optimize the process. In this case, all the different modeling methods need to be built on the same platform or have good model-to-model communication. An alternative way is to apply hybrid models to reduce the computational burden in the integrated process. In addition to capturing the major unit operations, auxiliary operations such as buffer preparation, Cleaning-In-Place (CIP), and Sterilization-In-Place (SIP) also need to be integrated into the process modeling. These operations do affect decision making, including manufacturing scheduling and cost analysis. However, there is no model that captures all the auxiliary equipment. Moreover, in risk analysis for biopharmaceutical manufacturing, process contamination will directly cause batch failure. Lot-to-lot variations also exist in the bioreactor culture and purification process. Developing a model-based control system that can diagnose contamination and process variability at an early stage is essential to improve process efficiency.
It is known that the pharmaceutical and biopharmaceutical industries follow more stringent regulatory pathways; thus, the acceptance of new technologies usually takes longer than in other industries. It must be noted that current technologies such as AI DTs do not conform to the QbD regulatory guidelines. The good news is that regulatory agencies are also seeking the adoption of innovative technologies. If a DT can be developed for process operations and control at the same time, this method might be promising for acceptance by regulatory agencies [260]. However, the DT approach is closely related to real-time optimization and operation support, which are based on already built manufacturing platforms. In this situation, it might be hard to obtain regulatory approval [235].
The integration of the virtual plant and physical plant in biopharmaceutical manufacturing is still in its infancy. It is promising that data-model-control integration has been shown to be achievable for a single unit operation. Additionally, a data acquisition-control system can be achieved for an integrated process. However, to accomplish the biopharmaceutical DT, the development of real-time data acquisition, a dedicated data transfer system, an effective control and execution technique, robust simulation methods, anomaly detection, prediction tools, and easy access to a secure cloud server platform are still needed.

Conclusions
DTs are a crucial development in the close integration of manufacturing information and physical resources and have attracted much attention across industries. The critical parts of a fully developed DT include the physical and virtual components and the interlinked data communication channels. Following the development of IoT technologies, there are many applications of DTs in various industries, but progress is lagging in pharmaceutical and biopharmaceutical manufacturing. This review paper summarizes the current state of DTs in the two application scenarios, providing insights to stakeholders and highlighting possible challenges and solutions in implementing a fully integrated DT.
In pharmaceutical manufacturing, building blocks of a DT, including PAT methods, data management systems, unit operation and flowsheet models, system analysis methods, and integration approaches, have all been developed in the last few years, but gaps in PAT accuracy, real-time model computation, model maintenance capabilities, and real-time data communication, as well as concerns about data security and confidentiality, are preventing the full integration of all the components. To address these challenges, several insights are provided. The development of new tools such as NIRS and in-line UV spectroscopy, iterative optimization technologies, and different offline adaptive methodologies can help to resolve the existing issues in PAT methods. To reduce simulation time and achieve real-time computation, efficient algorithms and reduced order modeling approaches should be further studied for process models. In terms of model maintenance, adaptive modeling methods with online streaming data are to be investigated further. To have a fully integrated and automated DT, the information flow from the virtual component to the physical plant also needs to be established. The virtual plant should be able to change system settings and control the physical plant to help achieve an optimized process within the design space. Ideally, all these components should be placed under appropriate physical and virtual security protocols.
In biopharmaceutical manufacturing, similar constituent components of DTs have been discussed, together with the implementation challenges in each block. For process monitoring, the development of NIR or Raman spectroscopy, material calibration, and chemometric methods can help to obtain accurate predictions and measurements. Advanced data integration and synchronization technology should also be in place. For process simulation, there is no robust model that captures CPPs and CQAs for all the unit operations in the integrated process, owing to the computational complexity. Pre-analysis to screen the CPPs and CQAs is a promising approach to reduce the computational burden. Process models that capture the auxiliary equipment and process contamination need to be further investigated. To achieve a fully integrated DT, real-time data acquisition methods, data transfer systems, effective control and execution techniques, robust simulation methods, and anomaly detection are still needed, along with other supporting functions.
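One simple form that such a pre-analysis could take is a one-at-a-time local sensitivity screen, sketched below. Each candidate parameter is perturbed around its nominal value, and only the parameters with a large normalized effect on the output are kept for the full model. This is an illustrative assumption rather than the screening method of any reviewed study. The function names, the 5% step, the 0.1 threshold, and the toy response function are all hypothetical, and the toy function is not a real process model.

```python
def screen_parameters(model, nominal, rel_step=0.05, threshold=0.1):
    """Rank parameters by normalized one-at-a-time sensitivity.

    Returns (sensitivities, selected). `selected` lists the parameters whose
    absolute normalized sensitivity is at least `threshold`, largest first.
    """
    base = model(nominal)
    sens = {}
    for name, value in nominal.items():
        step = rel_step * value if value != 0 else rel_step
        up = dict(nominal, **{name: value + step})
        down = dict(nominal, **{name: value - step})
        # Central difference dY/dp, then scaled to (dY/Y) / (dp/p).
        dydp = (model(up) - model(down)) / (2 * step)
        sens[name] = dydp * value / base if base != 0 else dydp
    selected = sorted(
        (n for n in sens if abs(sens[n]) >= threshold),
        key=lambda n: -abs(sens[n]),
    )
    return sens, selected


def toy_response(p):
    """Hypothetical stand-in for a quality attribute; not a validated model."""
    return p["feed_rate"] * p["temperature"] ** 2 / (1 + 0.001 * p["agitation"])


if __name__ == "__main__":
    nominal = {"feed_rate": 1.5, "temperature": 37.0, "agitation": 50.0}
    sens, selected = screen_parameters(toy_response, nominal)
    print(sens)
    print(selected)
```

A local screen of this kind only reflects behavior near the nominal operating point and ignores interactions between parameters. Global sensitivity methods are an alternative when interactions matter.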
Given the rapid development and publication rate in this area, and because this paper is a narrative literature review, the authors are not able to list and review all studies in these areas in detail. The papers selected and the problems described in the manuscript are only a non-exhaustive subset used to represent the capabilities and drawbacks of a method or technology. Since the manuscript is organized using a conceptual and topical frame, the authors recommend that interested readers consult the cited references for additional details. In addition to the summarized research opportunities, further research directions can include the development of a demonstrative case study of a DT in pharmaceutical and biopharmaceutical manufacturing and a systematic review of the field.

Figure 1. Physical component, virtual component, and data management platform of a general digital twin (DT) framework.

Figure 2. Framework for dataflow in a continuous direct compaction tablet line. The text over the arrow indicates options for data transfer protocols.

Figure 3. Fully integrated DT framework for continuous pharmaceutical manufacturing.

Table 1. Applications of digital twins in various industries.

Table 2. Feature-based comparison of various models.

Table 3. A comparison of data integration studies presented for pharmaceutical manufacturing.