# What Is a Digital Twin? Experimental Design for a Data-Centric Machine Learning Perspective in Health


## Abstract


## 1. Introduction

The digital twin is a set of virtual information constructs that fully describes a potential or actual physical manufactured product from the micro atomic level to the macro geometrical level.

A digital twin is a virtual representation that serves as the real-time digital counterpart of a physical object or process and addresses every instance for its total life cycle.

## 2. What Is a Biological Digital Twin?

**Definition 1.**

**Definition 2.**

## 3. What Advantages Do the Digital Twins Provide?

**Definition 3.**

**Definition 4.**

## 4. What Is a Digital Twin System?

**Definition 5.**

## 5. Experimental Design

## 6. Applications

## 7. Ethical Considerations

## 8. Discussion

- A digital twin simulates data.
- A digital twin cohort is a collection of digital twins.
- There are four types of data sources, and digital twin data are one of these.
- Digital twin data are time-dependent.
- A digital twin cohort is calibrated to a target patient at time $t_i$.
- A Digital Twin System consists of two main parts (S-DTS and I-DTS), which are collections of analysis methods.
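The points above can be illustrated with a small toy sketch. It is not the authors' method: the linear state model, the names `DigitalTwin`, `calibrate_cohort`, and `mean_outcome`, and all parameter ranges are assumptions introduced here for illustration. The sketch shows a cohort of twins calibrated to a target patient's observed value at time $t_i$, each of which then simulates time-dependent data under a given intervention.

```python
import random
from dataclasses import dataclass


@dataclass
class DigitalTwin:
    """Toy twin: a scalar health state and a twin-specific response rate (hypothetical model)."""
    state: float
    response_rate: float

    def simulate(self, intervention_effect: float, steps: int) -> list[float]:
        """Return the simulated, time-dependent trajectory under one intervention."""
        trajectory = [self.state]
        for _ in range(steps):
            # Assumed dynamics: the state shifts by effect * response rate per time step.
            trajectory.append(trajectory[-1] + intervention_effect * self.response_rate)
        return trajectory


def calibrate_cohort(observed_state: float, size: int, seed: int = 0) -> list[DigitalTwin]:
    """Build a cohort whose twins all start at the patient's observed state at time t_i."""
    rng = random.Random(seed)
    # Twins differ only in their unobserved response rate, drawn from an assumed range.
    return [DigitalTwin(observed_state, rng.uniform(0.5, 1.5)) for _ in range(size)]


def mean_outcome(cohort: list[DigitalTwin], intervention_effect: float, steps: int) -> float:
    """Average final state over the cohort for one intervention."""
    finals = [twin.simulate(intervention_effect, steps)[-1] for twin in cohort]
    return sum(finals) / len(finals)


# Hypothetical interventions, expressed as a per-step effect on the state.
cohort = calibrate_cohort(observed_state=10.0, size=50)
outcomes = {name: mean_outcome(cohort, effect, steps=5)
            for name, effect in {"no_treatment": 0.0, "drug_a": -1.0}.items()}
```

Comparing the entries of `outcomes` corresponds to comparing patient trajectories under different interventions. In a real setting, calibration would fit many more parameters than a single starting value.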

## 9. Conclusions

- A Digital Twin System is a complex entity with interconnected substructures.
- Each substructure needs to be optimized for a given problem setting, e.g., in medicine or health.
- A digital twin is just one method for simulating intervention-dependent data.
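As a rough illustration of the two-part structure, the following sketch treats S-DTS as a set of single analyses applied to each data source and I-DTS as a step that integrates their results. The function names `run_s_dts` and `run_i_dts`, the example data, and the choice of averaging as the integration rule are all assumptions made here, not something the article specifies.

```python
from statistics import mean
from typing import Callable

# An analysis maps one data set (a list of numbers here) to a single score.
Analysis = Callable[[list[float]], float]


def run_s_dts(analyses: dict[str, Analysis],
              data: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """S-DTS stand-in: apply every single analysis to every available data source."""
    return {(source, name): analysis(values)
            for source, values in data.items()
            for name, analysis in analyses.items()}


def run_i_dts(results: dict[tuple[str, str], float]) -> dict[str, float]:
    """I-DTS stand-in: integrate single results by averaging each analysis across sources."""
    by_analysis: dict[str, list[float]] = {}
    for (_source, name), score in results.items():
        by_analysis.setdefault(name, []).append(score)
    return {name: mean(scores) for name, scores in by_analysis.items()}


# Hypothetical data: one measured patient series and one simulated digital twin series.
data = {"patient": [1.0, 2.0, 3.0], "digital_twin": [2.0, 3.0, 4.0]}
analyses = {"mean": mean, "max": max}

single_results = run_s_dts(analyses, data)
integrated = run_i_dts(single_results)
```

Keeping the two stages as separate functions mirrors the idea that each substructure can be tuned independently for a given problem setting.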



**Figure 1.** Visualizing the idea of a digital twin by comparing experimental settings in biology and medicine.

**Figure 2.** Complexity of the data (**A**–**C**) and the analysis system (**D**,**E**). (**A**): A simplified view of an analysis system that has access to four different data sources. If all four data sources (i) to (iv) are available, we call the analysis system a Digital Twin System. (**B**): Availability of data to the Digital Twin System over time. The time dependency of the different data sources is important. (**C**): Starting from a calibrated digital twin cohort at time $t_i$, different outcomes of various interventions are shown corresponding to different patient trajectories. (**D**): Part of the Digital Twin System for single analyses (S-DTS). (**E**): Part of the Digital Twin System for integration of analysis results (I-DTS).

**Figure 3.** Main structure of a Digital Twin System consisting of S-DTS and I-DTS, each of which has a complex substructure of its own.

**Table 1.** An overview of different intervention types that can be simulated by different digital twins.

Intervention Type (External Condition) | Intervention Type (Internal Condition) |
---|---|
environmental changes | knockdown effects |
diet changes | gene therapy |
surgery | pharmaceutical interventions |


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Emmert-Streib, F.; Yli-Harja, O.
What Is a Digital Twin? Experimental Design for a Data-Centric Machine Learning Perspective in Health. *Int. J. Mol. Sci.* **2022**, *23*, 13149.
https://doi.org/10.3390/ijms232113149
