Virtual to Real-World Transfer Learning: A Systematic Review

Abstract: Machine learning has become an important research area in many domains and real-world applications. The prevailing assumption in traditional machine learning techniques, that training and testing data should be of the same domain, is a challenge. In the real world, gathering enough training data to create high-performance learning models is not easy. Sometimes data are not available, very expensive, or dangerous to collect. In such scenarios, machine learning does not live up to its potential. Transfer learning has recently gained much acclaim in research because it can create high-performance learners through virtual environments or by using data gathered from other domains. This systematic review (a) defines transfer learning; (b) discusses the recent research conducted; (c) presents the current status of transfer learning; and finally, (d) discusses how transfer learning can bridge the gap between the virtual and the real world.

conducted on transferring a model trained on a simulation and transferring it to a physical robot or vehicle, respectively. There are few papers that suggest novel solutions to improve the performance of this application; discuss how training data are gathered; how simulation environments are set up; and how training data are generated for optimum learning. There are 23 (34%) papers related to research conducted on transfer learning methods, algorithms and techniques used to transfer learning. Classifications on simulation to physical robot and simulation to vehicle yield 24 (35%) and 6 (8.8%) papers, respectively. These papers include the application of transfer learning where authors of the paper conducted transfer learning.


Introduction
Developing robots that have the capabilities and the dexterity of humans is very challenging. It is also a daunting task to preprogram robots by hand to perform specific tasks. Moreover, these preprogrammed tasks cannot be transferred to another domain or environment; therefore, robots become domain specific and very expensive to operate and manufacture. Recently, there has been much demand for adopting machine learning as a potential solution to teach robots to perform challenging tasks. Machine learning has the potential to allow robots to learn about their environment through sensors and make expert decisions just like a human. However, training robotic platforms requires large amounts of training data, and gathering the required amount of data from a physical machine is expensive. Machine learning algorithms often struggle to perform unseen tasks (online learning) and to transfer or apply learned experiences (continual learning) [1]. Gathering quality data is difficult because the collection of accurate data is expensive, has to cover different domains, has to consider sample efficiency, and has to overcome safety concerns and unexpected behaviors that can emerge while operating robots or machinery. Since deep learning models are data dependent, recognizing patterns requires adequate training data from different domains; however, gathering training data from physical robots is considered to be inefficient and expensive.
Recent advancements in machine learning, specifically in the areas of deep reinforcement learning (DRL) and transfer learning, have been applied to the development of advanced robotic platforms. Humans gain capability, dexterity, and knowledge throughout their lifetimes by learning from a very young age, by observing, mimicking, and adapting to their environment by trial and error [2]. Similarly, reinforcement learning (RL) assumes that a machine can be trained by acquiring knowledge through rewards when the machine behaves or acts in a certain way. Adequate training data can be gathered using simulation-based training at a significantly lower cost, and a trained model can be transferred to the physical robot. There is an inherent mismatch when transferring knowledge from the virtual to the real world, which can be minimized through realistic virtual training environments [3]. There has been significant advancement in using transfer learning in DRL to enhance the quality of training and to transfer models trained in a virtual environment to a physical mechanical entity.
The Neural Information Processing Systems (NIPS) workshop on "Learning to Learn: Knowledge Consolidation and Transfer in Inductive Systems", held in 1995, is credited with coining the term transfer learning [4,5]. Many researchers have transferred learned knowledge from one domain to another; however, in these cases the term transfer learning was not explicitly used as a valid technique. There are many questions that need to be answered in terms of benefits, limitations, challenges, and open research areas that need to be explored and improved.
The purpose of this paper, therefore, is to present a systematic review on transfer learning and to provide in-depth answers to the above-mentioned questions. Multiple surveys have been conducted on the topic of transfer learning [6][7][8]. This research study is different from previous studies in terms of the area of the study, its focus, and the methodology. While other review papers focus on the general status of transfer learning in a wide area of applications, this paper focuses specific attention on the status of transfer learning in the virtual to real-world context. Additionally, this paper followed a systematic structure when gathering research papers and extracting the information. This systematic review is structured as follows. Section 2 provides an overview of transfer learning. Section 3 defines the research methodology. Section 4 presents the research results, which are further discussed in Section 5. Finally, Section 6 concludes the paper and offers ideas for future work.

Overview of Transfer Learning
In classical supervised learning, a model is trained on a certain domain with labeled data that correspond to the given domain. The data consist of training and testing sets that belong to the same domain or feature space. For example, for a machine learning model to detect different types of cars, it requires datasets that contain labeled images of cars of different types. The supervised learning paradigm breaks down when sufficient training data are not available. The reliability of the model depends on the quantity, quality, and accuracy of the labeled data. For instance, a classification model trained on images taken in the daytime cannot be used to classify objects in images taken at nighttime. The accuracy and the performance of the model decrease drastically because the model has not seen the new domain. In certain scenarios, such as when data collection is expensive and dangerous, or the available data are not sufficient [9], the accuracy and the performance of the model degrade. Other machine learning methods perform accurately when the domain and feature space are the same. However, when the domain changes, the learner models need to be retrained to adapt to the new domain. This retraining process is often expensive in terms of computational time and testing, and it may require retraining from the beginning or gathering new training data. In such scenarios, transfer learning makes it possible to transfer knowledge from one domain to another.
There are several benefits of using transfer learning in contrast to traditional machine learning techniques.

1. Traditional techniques are data dependent; however, transfer learning requires less training data to train the model as it can utilize pretrained models as a starting point.
2. Models trained using transfer learning have the ability to easily generalize for other unseen domains. This is because transfer learning models are trained to identify features that can be applied to an unseen context/domain.
3. Transfer learning has the potential to make machine learning and deep learning more accessible.
4. Unlike other learning methods, transfer learning provides an optimized initial starting point, higher learning accuracy, and faster training for new domains.
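The benefit of an optimized starting point can be illustrated with a deliberately small sketch. It is not taken from any of the reviewed papers: the linear model, the five target samples, and the "pretrained" parameters are all hypothetical, chosen only to show that fine-tuning from parameters learned on a related task starts from a lower loss than training from scratch.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fine_tune(w, b, xs, ys, lr=0.05, steps=20):
    """Run plain gradient descent on the MSE, starting from (w, b)."""
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical target task: y = 2x + 1.5, with only five labeled samples.
target_x = [0.0, 1.0, 2.0, 3.0, 4.0]
target_y = [2.0 * x + 1.5 for x in target_x]

# Hypothetical parameters learned earlier on a related source task (y = 2x + 1).
pretrained = (2.0, 1.0)
# Parameters for a model trained from the beginning.
scratch = (0.0, 0.0)

# Loss before any training on the target task.
warm_start_loss = mse(*pretrained, target_x, target_y)
cold_start_loss = mse(*scratch, target_x, target_y)

# Loss after the same small training budget on the target task.
warm_final_loss = mse(*fine_tune(*pretrained, target_x, target_y), target_x, target_y)
cold_final_loss = mse(*fine_tune(*scratch, target_x, target_y), target_x, target_y)
```

In this toy setting the pretrained parameters begin much closer to the target solution, so the same training budget leaves a smaller residual error. Real models and tasks will not behave this cleanly.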
As mentioned, transfer learning seems to provide a better, more accurate model for new, unseen learning tasks and allows the reuse of existing pretrained models as a starting point. Developers and researchers can circumvent pitfalls during their initial approach to developing new machine learning and deep learning solutions. Transfer learning reduces the need for expensive and time-consuming data gathering, cleaning, annotation and training procedures. Martina et al. combined multiple pretrained models to create a subject-specific model for extracting emotional content from facial datasets [10]. The above-mentioned benefits can be seen as a motivation to use transfer learning. Figure 1 shows how traditional machine learning systems learn individual tasks from the beginning, whereas the transfer learning technique attempts to transfer the knowledge obtained from one learning system to another. There are three major types of transfer learning methods: inductive, unsupervised and transductive.

1. Inductive transfer learning: This is where few labeled data are available to be used as training data for the target domain. In this case, some labeled data are required to create an objective model. The aim of this transfer learning method is to improve the target function.
2. Unsupervised transfer learning: This is when no labeled training data are available from the source and target domains; however, the source and target tasks are related but different.
3. Transductive transfer learning: This is a case where no labeled data are available from the target domain whereas more data are available from the source domain. As described in Table 1, the source and target domains are related but different, while the source and target tasks remain the same.

Transfer learning techniques transfer knowledge from a previous task to a target task, as shown in Table 1, adapted from [6]. As shown in Figure 1, traditional machine learning techniques attempt to learn new tasks from the beginning, while transfer learning transfers the gained knowledge over to a target domain that does not have quality training data available. As mentioned earlier, the need for transfer learning arises when there is a limited supply of training data available for the target domain. This could be because accessing such data poses a danger to humans, because gathering such data is very expensive and time consuming, or because the data are not available or accessible.
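The three settings listed above differ mainly in where labeled data are available. As a rough summary, and not a definition taken from Table 1, the hypothetical helper below maps label availability to the corresponding setting; the function name and its boolean parameters are assumptions made for illustration.

```python
def transfer_setting(source_labeled: bool, target_labeled: bool) -> str:
    """Name the transfer learning setting implied by label availability.

    A simplified reading of the three settings described in the text:
    - some labeled target data            -> inductive
    - labeled source data only            -> transductive
    - no labeled data in either domain    -> unsupervised
    """
    if target_labeled:
        return "inductive"
    if source_labeled:
        return "transductive"
    return "unsupervised"


# A simulator that labels every frame, paired with unlabeled real images.
sim_to_real_setting = transfer_setting(source_labeled=True, target_labeled=False)
```

Under this reading, a simulation that produces labels automatically combined with unlabeled real-world data falls into the transductive setting.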
One particular area where the transfer learning method shines is learning from simulations. Many machine learning tasks are very expensive, time consuming or dangerous and can benefit from transfer learning, where the data are gathered through virtual simulations. For example, a game engine can be used to simulate real-world physics and to run otherwise expensive experiments to gather the necessary training data. Although a game engine cannot accurately simulate the real world, intended tasks can benefit from running multiple parallel simulations to reduce the margin of error. Self-driving cars and robotics can benefit from simulations to improve their accuracy and performance. Google, Waymo and Metamoto have developed their own vehicle simulation environments to simulate advanced road conditions and scenarios that involve pedestrians and other vehicles to train their models. Simulated transfer learning is applied in training robots, which are expensive and often time consuming to manipulate physically. Random scenarios are generated, and models are trained in parallel across different environments.
There are different methods that aim to perform transfer learning in simulations of real-world learning. These methods are: Knowledge Distillation, Imitation Learning, Domain Randomization, Zero-shot Transfer, Meta-reinforcement Learning and Domain Adaptation. Each method will be discussed in the following section.

Knowledge Distillation
Knowledge distillation is a process in which large, high-dimensional networks are distilled into smaller but efficient networks. In this case, the larger network (the teacher) generates training data for the smaller network (the student) [11]. For example, complex visuals that contain high-dimensional input are reduced to simpler forms when training the new model.
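One common way to realize the teacher and student relationship is to train the student to match the teacher's temperature-softened output distribution. The sketch below assumes this soft-target formulation, which the review itself does not spell out; the function names, logits, and temperature value are illustrative only.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperatures give flatter distributions."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the student's soft outputs to the teacher's soft outputs."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for one input with three classes.
teacher = [3.0, 1.0, 0.2]
student = [0.5, 2.0, 1.0]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, so minimizing it pushes the smaller network toward the larger network's behavior.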

Imitation Learning
In imitation learning, an expert model demonstrates near-optimal behavior to a learning agent, which attempts to replicate it [12]. There are two methods of imitation learning: behavior cloning, where the agent learns a mapping from the demonstrations, and inverse reinforcement learning, where the agent estimates the reward function that best explains the demonstrations [3]. This method can be combined with reinforcement learning to create robust virtual to real-world transfer models.
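Behavior cloning can be viewed as ordinary supervised learning on state and action pairs recorded from the expert. The following minimal sketch assumes a one-dimensional state, a hypothetical linear expert controller, and a least-squares fit; practical systems use far richer policies and observations.

```python
def expert_policy(state):
    """Hypothetical expert: a fixed linear controller used to create demonstrations."""
    return 2.0 * state + 1.0


def behavior_cloning(states, actions):
    """Fit action = slope * state + intercept to the demonstrations by least squares."""
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    slope = cov / var
    intercept = mean_a - slope * mean_s
    return lambda state: slope * state + intercept


# Demonstrations: states visited by the expert and the actions it took.
demo_states = [-2.0, -1.0, 0.0, 1.0, 2.0]
demo_actions = [expert_policy(s) for s in demo_states]

# The cloned policy imitates the expert, including on a state not in the demonstrations.
cloned_policy = behavior_cloning(demo_states, demo_actions)
action_on_new_state = cloned_policy(3.0)
```

Because the hypothetical expert is itself linear, the clone recovers it exactly; with a real expert the fit would only approximate the demonstrated behavior.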

Domain Randomization
Domain randomization is a method in which, instead of modelling all parameters of the real world, the simulated training is highly randomized. This randomization during simulated training can eliminate bias that would otherwise appear when the model is transferred to the real world. Visual randomization randomizes camera position, lighting, textures and environmental details. It is very useful in transfer learning situations where the robot depends on vision to perform pose estimation, object localization, object detection and semantic segmentation.
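In practice, visual randomization amounts to drawing a fresh set of rendering parameters for every simulated episode. The sketch below is a hypothetical illustration: the parameter names, value ranges, and number of textures are assumptions and do not correspond to any particular simulator.

```python
import random


def sample_visual_params(rng):
    """Draw one random visual configuration for a simulated episode."""
    return {
        # Camera displacement from its nominal pose, in metres (x, y, z).
        "camera_offset_m": [rng.uniform(-0.1, 0.1) for _ in range(3)],
        # Relative light intensity, from dim to bright.
        "light_intensity": rng.uniform(0.2, 1.5),
        # Index into a hypothetical library of 100 surface textures.
        "texture_id": rng.randrange(100),
    }


def randomized_episodes(num_episodes, seed=0):
    """Generate per-episode configurations; the seed makes the draw reproducible."""
    rng = random.Random(seed)
    return [sample_visual_params(rng) for _ in range(num_episodes)]


episodes = randomized_episodes(5)
```

Each configuration would be handed to the simulator before an episode starts, so that the model never sees the same camera pose, lighting, and texture combination often enough to depend on it.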

Zero Shot Transfer Method
In this method, the training environment must be realistic so that the trained model can be transferred directly. The difficulty with this method lies in building realistic and precise models of the real world. If the simulation is identical to the real world, this method can be used to transfer the trained model without additional optimization [13]. This method is often carried out with a domain randomization method to make sure bias is eliminated.

Meta-Reinforcement Learning
In Meta-reinforcement learning, the model is expected to generalize to an unseen new environment. This method is often known as "learning to learn" because the aim of the trained model is to adapt to the unseen task or environment from multiple training tasks. An optimized meta learning model should be trained on multiple learning tasks and should also be introduced to unseen tasks [14]. With this method, the trained model can use past experience (observed as unseen tasks) in the real world.

Domain Adaptation
Domain adaptation is a subset of transfer learning. Traditionally, the aim of transfer learning is to improve performance in a target domain (Dt) by transferring the learned knowledge that is contained in different but related source domains (Ds) [3]. There are cases where sufficient source domain training data are available, but target domain data are not available or not sufficient. There are two types of domain adaptation methods: one-step domain adaptation and multi-step domain adaptation. Each method uses different strategies to transfer knowledge from the source domain to the target domain [15]. In this case, the domain adaptation method can be utilized with simulated data to transfer the model's knowledge.
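As a very small stand-in for a one-step adaptation, source features can be mapped directly onto the statistics of the target domain. The sketch below matches the mean and standard deviation of a single feature; this moment-matching choice, the sample values, and the function name are assumptions for illustration and are not a method described in the reviewed papers. It also assumes that a handful of unlabeled target measurements are available.

```python
from statistics import mean, pstdev


def align_to_target(source_values, target_values):
    """Rescale source feature values so their mean and spread match the target's.

    Assumes the source values are not all identical (non-zero standard deviation).
    """
    src_mean, src_std = mean(source_values), pstdev(source_values)
    tgt_mean, tgt_std = mean(target_values), pstdev(target_values)
    return [(v - src_mean) / src_std * tgt_std + tgt_mean for v in source_values]


# Hypothetical brightness readings: plentiful simulated data, a few real samples.
simulated = [0.10, 0.20, 0.30, 0.40, 0.50]
real = [0.55, 0.60, 0.70, 0.80, 0.85]

# Simulated readings re-expressed in the statistics of the real sensor.
adapted = align_to_target(simulated, real)
```

A model trained on the adapted values sees inputs whose scale resembles the real sensor, which is the basic idea behind aligning source and target feature distributions.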

Research Methodology
For this systematic review and meta-analysis, we adapted the guidelines described in the PRISMA statement [16]. The goal of this systematic review was to get an overview of the research area and find state-of-the-art research conducted on the research topic. A systematic mapping study was conducted to identify the state-of-the-art research, developments, advantages and disadvantages, and to assess the efficacy of virtual-to-real knowledge transfer methods. This also helped to identify research gaps in the use of virtual to real-world transfer learning.
In this research, we followed the systematic mapping methodology (Figure 2) proposed by Petersen et al. [17].

Definition of Research Questions
As the initial process of systematic study, the following questions were answered to evaluate the state of the art in virtual to real-world transfer learning. These questions were designed to be in line with the research area and to fulfil the objectives of the research.

What Are the Benefits of Using Transfer Learning?
The intention of this question is primarily to understand the application of transfer learning as a solution to the shortcomings of other machine learning and deep learning techniques. By reviewing articles that utilized transfer learning methods to overcome pitfalls of other much-hyped techniques, future researchers and developers can make better-informed choices among techniques. This question also helps to identify the challenges and limitations of using other machine and deep learning methods.

What Are the Use Cases of Transfer Learning in the Virtual to Real-World Context?
There are many use cases, novel applications and implementations of transfer learning. However, the main goal of this research question is to identify use cases of transfer learning in the context of virtual to real-world knowledge transfer. This helps to identify any research gaps available in this context and/or identify areas that can be improved upon. There are many challenges and limitations to using transfer learning in the virtual to real-world context. Based on the applications and prototypes developed, this question seeks to find limitations of using transfer learning.

How Are These Challenges and Limitations Currently Addressed?
This question seeks to identify current approaches taken to overcome the challenges and limitations of using transfer learning. This question looks at current methods or techniques used to overcome challenges and provide guidance for future research development.

What Are the Open Research Issues and Potential Areas for Further Research or Improvements?
The purpose of this question is to identify research gaps in this area of study. It seeks to find research gaps and challenges and to provide information on them so that future researchers can identify and address them.

Conducting the Research
To conduct the research, preliminary articles were gathered by searching multiple scientific publishing databases and well-renowned university repositories. The data were gathered from ScienceDirect, Springer, IEEE Xplore, AAAI Press, arXiv, Sage Journals, and university databases such as Stanford, Vermont and MIT. Some of these publishing sources were not peer reviewed; however, a significant effort was made to ensure that the sources collected were peer reviewed. The articles were collected by performing searches using keywords and phrases such as "transfer learning", "virtual-to-real-world", "simulation to real-world", and "sim-to-real". These articles came from journals, conferences, books, and symposiums. The articles yielded by the search were downloaded to a local computer and sorted into multiple folders based on the research database. Each download was tracked with a title and direct download link for future reference or for meta collection. Articles were collected without any time restrictions, provided that they were considered relevant to the study.

Screening for Relevant Papers
Before the initial sorting, papers yielded by the search were downloaded to ensure that the full text was available for sorting and screening. This also eliminated the need to download these papers later for review. Metadata such as revision number, title of the article, authors, publisher, publishing country, publication type (book, journal, symposium, workshop, conference), published date, DOI, year of publication and direct access links were stored in an Excel sheet for easy management. Each paper was given an index for quick reference.
Papers retrieved through the search protocol were screened based on the following criteria:
1. Relevance of the title.
2. Keyword section contains transfer learning.
3. Relevance of transfer learning in the context of machine/deep learning.
During the initial screening process, the above criteria were used to exclude articles that were not relevant to the research topic. The article's keyword section was used to further determine the relevance of the paper, as in most cases the title did not reflect any relevance. Later, papers that were not relevant to the research area, papers without full text availability, duplicate papers, and revisions of the same paper were discarded. Papers whose relevance could not be determined were passed to the next stage for further reading. Papers that were reviewed and selected in this order were considered as papers on applications of transfer learning.
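The mechanical parts of this screening step (keyword check, full text availability, and removal of duplicates) can be sketched as a simple filter. The record structure and field names below are hypothetical, and the sketch deliberately leaves out the manual judgment of title and topical relevance that the authors describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical metadata record for one retrieved paper."""
    title: str
    keywords: tuple
    has_full_text: bool


def screen(papers):
    """Keep papers that list transfer learning as a keyword, have full text, and are not duplicates."""
    kept, seen_titles = [], set()
    for paper in papers:
        normalized = paper.title.strip().lower()
        has_keyword = any("transfer learning" in k.lower() for k in paper.keywords)
        if not has_keyword or not paper.has_full_text or normalized in seen_titles:
            continue
        seen_titles.add(normalized)
        kept.append(paper)
    return kept


# Made-up records used only to exercise the filter.
retrieved = [
    Paper("Sim-to-Real Grasping", ("transfer learning", "robotics"), True),
    Paper("sim-to-real grasping ", ("Transfer Learning",), True),   # duplicate title
    Paper("Image Compression Survey", ("codecs",), True),           # keyword missing
    Paper("Policy Transfer for Driving", ("transfer learning",), False),  # no full text
]
screened = screen(retrieved)
```

Only the first record survives in this made-up example; the others are dropped as a duplicate, as lacking the keyword, and as lacking full text, respectively.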

Keywording on Abstracts
For categorizing papers, the method described in [17] was followed, which is shown in Figure 3. It is a process of classifying articles based on keywording, which ensures that existing studies are taken into account. The aim of this process was to categorize papers into multiple categories based on keywords found in the articles' abstract section. Based on the keywords found, articles were categorized into multiple categories and those articles were read in detail to make sure that the content was relevant to each identified category. If the content was not relevant, categories were updated. The final result of this process was that all articles collected were mapped into multiple categories. Books, on the other hand, were reviewed by looking at the content; if any relevant section was found, it was skimmed, and related selections of pages were noted down for further review.
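The keywording step can be pictured as matching each abstract against a keyword list per category. In the sketch below the category names follow those used in this review, but the keyword lists and the function are hypothetical, and the manual reading used to confirm or update categories is not modelled.

```python
# Hypothetical keyword lists; the real categories were refined by reading the papers.
CATEGORY_KEYWORDS = {
    "transfer learning methods": ("algorithm", "method", "technique"),
    "simulation to physical robot": ("robot", "manipulator"),
    "simulation to vehicle": ("vehicle", "driving"),
}


def categorize(abstract):
    """Return every category whose keywords appear in the abstract."""
    text = abstract.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


# Made-up abstract sentence used only to exercise the matcher.
example = categorize("We transfer a policy trained in simulation to a real robot arm.")
```

A paper may match several categories or none, which mirrors the statement that articles were mapped into multiple categories and that categories were updated when the content did not fit.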

Meta Extraction and Mapping
For the final stage of the systematic mapping process, metadata collected during the screening process were filtered down to the meta items listed in Table 2. Summary fields were added after the papers were read in full detail. The country of publication was not considered, as it was difficult to identify when there were multiple co-authors from multiple countries, and it did not yield interesting results to add to this paper. Additionally, each paper was assigned an index for future reference. The Results section discusses the extracted data in more detail.

7. Contributions: Contribution of the paper
8. Summary: Summary or the abstract of the paper

Results
Through the systematic review and following the research protocol, it was possible to gather a total of 150 papers from scientific and academic databases. These papers were evaluated by following the systematic review. After the initial review of the selection by title, keywords, and duplicate results, 50 papers were discarded, of which 42 papers were not within the research area and the rest were duplicates and irrelevant papers. After reading the abstract of the papers, the selection was further narrowed down to 75 papers. After reading those papers fully, 68 papers were selected for the study. Final selected papers provide novel applications and experimental methods about transfer learning. Table 3 lists all the selected papers, sorted based on the year starting from 2006 to the current research year 2021. The table also shows referenced authors, publication types and the application area of the research paper based on the context.
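The selection funnel described above can be tallied in a few lines. The counts are those reported in the text; the variable names and the retention helper are added here only for illustration.

```python
# Counts reported in the text.
gathered = 150
discarded_initial = 50        # removed after title, keyword and duplicate checks
out_of_scope = 42             # part of the 50: not within the research area
after_abstract = 75
after_full_read = 68

# Derived values.
after_initial = gathered - discarded_initial
duplicates_or_irrelevant = discarded_initial - out_of_scope


def retention(count, total=gathered):
    """Share of the originally gathered papers still included, in percent."""
    return round(100 * count / total, 1)


funnel = [
    ("gathered", gathered),
    ("after initial review", after_initial),
    ("after abstract reading", after_abstract),
    ("after full reading", after_full_read),
]
for stage, count in funnel:
    print(f"{stage}: {count} papers ({retention(count)}% of gathered)")
```

Under these figures, a little under half of the gathered papers remain in the final selection.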

Publication Year
No time limit was placed on the selection of research papers; however, it was important to find the latest research on the applications and state of the art of transfer learning. The gathered research papers range from 2006 to mid-March 2021. According to Figure 4, the majority of the papers were published between 2016 and 2021, which indicates a surge in research on transfer learning during this period.

Publication Type and Channel
The publication type is the medium in which a paper is published, such as a journal, conference, book, symposium or any other type. The publication channel is a journal or publisher that is peer-reviewed and approved in the academic publication system. According to Figure 5, the distribution of papers by publication type shows that 64% (43 papers) of the selection are journal articles. Papers published through conferences and symposiums account for 30% (21 papers) and 4% (3 papers), respectively. In addition, only one book chapter was selected after sorting. Table 4 shows the publication channels in which the selected papers were published. According to this table, many research papers were published on arXiv, which is an open-access repository for scholarly articles.

Table 4. Publication channels and the selected papers published in each.
Journal of Big Data [9]
Conference on Intelligent Autonomous Systems [18]
Ninth International Conference on Autonomous Agents and Multiagent Systems, Adaptive Learning Agents Workshop [19]
The 2013 International Joint Conference on Neural Networks (IJCNN) [20]
IEEE International Conference on Robotics and Automation (ICRA) [21–23,26,27,30,32,53]
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) [24,34,63]
Robotics: Science and Systems [25]
International Conference on Advanced Robotics and Intelligent Systems (ARIS) [28]
American Control Conference (ACC) [29]
NIPS Workshop on Acting and Interacting in the Real World: Challenges in Robot Learning [31]
International Journal of Advanced Robotic Systems [36]
IEEE Access [37,40]

Classification of Selected Papers
This section focuses on the results of the paper classification based on Table 2 and metadata criteria seven and eight. All the gathered papers were read in full, and the author and summary of each paper were collected. The summary was used to identify the paper as a review or a technical paper, and to further categorize it based on application and context. Table 3 shows the information sorted by context and application.
Reviews are papers written by experts in the field. They cover well-known research and technical developments in a specific research area. These papers can be considered secondary publications and can be used to identify state-of-the-art research areas and to form an informed view of future developments. They also give a broad view of the research area and serve as a guide to open research areas and relevant research papers.
Technical papers provide technical information about a specific area of research. In this case, they describe research on transfer learning in highly technical detail, such as new learning and training methods, architectures, new frameworks, data collection methods, retaining methods, and improvements to learning accuracy. Furthermore, these papers allow the identification of pitfalls during transfer learning and provide technical detail on how to overcome them. They also introduce state-of-the-art implementations and applications of transfer learning from simulation to the real world. Papers classified as simulation to physical robot, simulation to vehicle, and knowledge transfer are considered technical papers. These papers discuss how transfer learning was used to transfer a model's knowledge learned in a simulation to a physical robot or vehicle, and propose and discuss new techniques and frameworks to optimize the learning process. Figure 6 shows the distribution of papers according to the classifications.
Out of the 68 selected papers, 15 (22%) were categorized as reviews and the remaining 53 (78%) were identified as technical papers. Technical papers often address multiple use cases or applications of transfer learning. Papers classified as 'Knowledge Transfer' discuss different transfer learning techniques and the overheads encountered during transfer learning, and propose novel solutions to overcome these problems. These papers propose new frameworks, algorithms and data collection methods that optimize the performance of the transferred model. Twenty-three papers were identified for this category.
Papers classified as 'Simulation to Robot' or 'Simulation to Vehicle' report research on training a model in a simulation and transferring it to a physical robot or vehicle, respectively. A few of these papers suggest novel solutions to improve the performance of this application, and discuss how training data are gathered, how simulation environments are set up, and how training data are generated for optimum learning. There are 23 (34%) papers on the methods, algorithms and techniques used for transfer learning. The simulation to physical robot and simulation to vehicle classifications yield 24 (35%) and 6 (8.8%) papers, respectively. These papers cover applications in which the authors carried out the transfer themselves.
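The shares quoted above can be re-derived from the reported counts. The following is a minimal Python sketch; the dictionary keys are illustrative labels, and mapping the 23 method-oriented papers to the 'Knowledge Transfer' category is an assumption made here for the tally.

```python
# Counts per category as reported in the text (68 selected papers in total).
# Assumption: the 23 papers on methods, algorithms and techniques are the
# 'Knowledge Transfer' category.
counts = {
    "Review": 15,
    "Knowledge Transfer": 23,
    "Simulation to Robot": 24,
    "Simulation to Vehicle": 6,
}


def shares(category_counts):
    """Return each category's share of the total, as a percentage."""
    total = sum(category_counts.values())
    return {name: 100.0 * n / total for name, n in category_counts.items()}


total = sum(counts.values())
technical = total - counts["Review"]  # every non-review paper is technical

for name, pct in shares(counts).items():
    print(f"{name}: {counts[name]} papers ({pct:.1f}%)")
print(f"Technical papers: {technical} of {total} ({100.0 * technical / total:.0f}%)")
```

With these counts the four categories sum to the 68 selected papers, and the three technical categories sum to the 53 technical papers.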

Discussion
In this section, research questions defined in the research methodology will be discussed and answered by referring to selected research papers. This section will also identify the limitations and shortcomings of transfer learning in the context of virtual-to-real transfer learning. In addition, it will also enable researchers to find potential open research areas, solutions for existing limitations, and opportunities for further improvement in the field of transfer learning.

What Are the Benefits of Using Transfer Learning?
After the systematic review and the mapping study, different applications and use cases of transfer learning were identified. Taylor et al. [72], in their review of transfer learning, suggest several metrics to measure the benefits of transfer learning. The first metric is jumpstart: the initial performance of the learning model on the target task can be increased by transferring knowledge from an existing source task. The asymptotic performance metric states that the final learned performance of a given model can be improved by using transfer learning. The total reward metric states that an agent that uses transfer learning accumulates more reward than an agent without transfer learning. The transfer ratio metric measures the ratio between the total reward gained by the transfer learner and that gained by a non-transfer learner. The final metric is time to threshold, which measures the training time that a learning model needs to reach its expected level of performance and which can be reduced through knowledge transfer.
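To make these five metrics concrete, the sketch below computes them from two per-episode reward curves, one with and one without transfer. It is an illustration under stated assumptions, not the procedure from [72]: the function names, the `tail` window used to estimate final performance, the plain ratio used for the transfer ratio (following the wording above), and the reward values are all hypothetical.

```python
from statistics import mean


def first_reach(curve, threshold):
    """Index of the first episode whose reward meets the threshold, else None."""
    for episode, reward in enumerate(curve):
        if reward >= threshold:
            return episode
    return None


def transfer_metrics(transfer, baseline, threshold, tail=2):
    """Compare two per-episode reward curves of equal length.

    Assumes the baseline's total reward is non-zero, since it is used as the
    denominator of the transfer ratio.
    """
    total_transfer, total_baseline = sum(transfer), sum(baseline)
    return {
        # Gain in initial performance before any target-task training.
        "jumpstart": transfer[0] - baseline[0],
        # Gain in final performance, estimated over the last `tail` episodes.
        "asymptotic_gain": mean(transfer[-tail:]) - mean(baseline[-tail:]),
        # Gain in reward accumulated over the whole run.
        "total_reward_gain": total_transfer - total_baseline,
        # Ratio of accumulated reward, transfer learner over non-transfer learner.
        "transfer_ratio": total_transfer / total_baseline,
        # Episodes needed to reach the threshold: (with transfer, without).
        "episodes_to_threshold": (
            first_reach(transfer, threshold),
            first_reach(baseline, threshold),
        ),
    }


# Made-up reward curves, for illustration only.
with_transfer = [2, 4, 6, 8, 9, 9]
without_transfer = [0, 2, 4, 6, 8, 8]
metrics = transfer_metrics(with_transfer, without_transfer, threshold=6)
print(metrics)
```

In this toy run the transfer learner starts higher, ends higher, accumulates more reward and reaches the threshold one episode earlier, so all five metrics point in the same direction; on real curves they can disagree, which is why they are reported separately.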
The main goal of using transfer learning is to enable an agent to learn when sufficient training data from the target domain are not available. Barrett et al. [19] demonstrated that transfer learning for reinforcement learning on a physical robot can speed up learning even when the simulator does not capture the full dynamics of the robot. According to Wang et al. [51], transfer learning is important when a statistical model trained on the source domain does not directly apply to the target domain. The use of transfer learning also minimizes the need for labeling data in the target domain.
In reinforcement learning, solving complex problems often requires training for a long period of time (high complexity, CPU and GPU intensive). Transfer learning is currently being used to reduce this time cost. However, existing methods mostly target single-agent domains and do not carry over directly to multiagent reinforcement learning. Boutsioukis et al. [59] proposed a novel approach to use transfer learning with reinforcement learning in multiagent domains. Additionally, transfer learning allows the creation of portable models. Reinforcement learning with rewards can increase the performance of the learning algorithm and further accelerate the learning. Konidaris et al. [56] proposed a novel training method that allows a model to reach optimal performance even after a brief training session.
One of the major drawbacks of deep learning is that training data collected in one timeframe may not have the same distribution as data collected in another. In this case, transfer learning can be used to adapt localization models trained in one timeframe to another [6]. Andrew et al. [57] proposed a novel framework called "self-taught learning" for using unlabeled data in transfer learning tasks. Since collecting unlabeled data is much easier and more cost-effective than labeling data, the proposed framework uses a sparse coding method to construct higher-level features. Wolf et al. [64] proposed TransferTransfo, a transfer learning approach for neural-network-based conversational agents. According to this research, transfer learning has shown significant improvements in discriminative language understanding tasks when compared to other traditional learning methods.

What Are the Use Cases of Transfer Learning in the Virtual to Real-World Context?
As mentioned above, transfer learning has many use cases that traditional machine learning methods do not cover. In traditional methods, the learning is limited to a specific domain, whereas with transfer learning, models can be trained virtually and transferred to a physical robot. This field is still in its infancy, but a growing amount of research is being conducted in it. The papers reviewed here include research methodologies, novel algorithms, frameworks, and multiple real-world applications and proofs of concept.
One of the major motivations for using transfer learning is its ability to solve problems when there are insufficient training data. For traditional machine learning, quality training data are the foundation for creating robust machine learning applications. Tan et al. [36], in their research on a brain-computer interface for rehabilitation robots, report that one of the major issues they faced was insufficient training data in bioinformatics. Training on real robots is often impractical and dangerous, and the robot's behavior can be unpredictable. Even if such training is completed, the trained model is domain specific. Traditional methods use learned policies; however, policies learned through simulation alone tend to fail in real-world conditions. Therefore, transfer learning is a technique for creating advanced learning-based models that are not domain specific and are capable of learning through experience in the real world. A domain randomization strategy randomizes the environment and dynamics to create more varied simulated environments [47].
For the virtual-to-robotics use case, the primary example is an evolutionary robotic platform that can learn from and adapt to its environment, as proposed and developed by Lipton et al. [18]. Another example is a robot adapting its behavior to environmental changes [19]. Current robotic platforms are designed for ideal service environments such as industrial factories. However, to advance robots to the next level, they should be able to act autonomously in more complex and unpredictable environments. Robots should be able to sense when their environment changes, anticipate failures and damage to hardware, and create contingency plans without relying on preprogramming. Cully et al. [60] proposed a novel algorithm called Intelligent Trial and Error, which creates a behavior-performance map that allows robots to adapt rapidly to complex environments and to recover from damage in a way similar to an injured animal.
The second use case is creating and training models in virtual environments and transferring them to a real-world system. The benefit of using transfer learning in this situation is that it allows the reuse of a pretrained model and the discovery and extraction of features. In this context, obtaining training data for drones and robotics platforms is difficult and expensive. In traditional machine learning, the necessary features are often hand-crafted by the researchers, but with a representation learning algorithm, features can be discovered in a short period of time. Additionally, it reduces the size of the datasets, which in turn reduces the computational requirements. Examples are autonomous MAV flight [41], motion planning [29], domain adaptation for improved robot grasping [30,31,35,45], multirobot transfer learning [24,32,53,65,66], mobile fulfilment systems [38] and autonomous driving [42–47]. These papers use virtual training environments to generate synthetic training data, train a model in the virtual environment, and use transfer learning techniques to transfer the knowledge to real-world platforms. These examples use domain adaptation, a method that reduces the domain bias and so allows the real-world application to adapt to changes in the environment, such as weather and lighting conditions, which can drastically change the domain [41].
Interestingly, most of the papers that conducted virtual to real-world research used industrial robot arms, specifically pick-and-place robots, because of the high demand and precision required in manufacturing. Simulated training was performed on the MuJoCo physics engine [32]. Deploying such robots is a promising way to decrease manufacturing costs.

What Are the Challenges and Limitations of Using Transfer Learning in This Context?
Since transfer learning is still in a research state, researchers have faced many challenges and limitations. The identified challenges and limitations include simulating the real world, performing one-shot transfer learning, unknown target environments, negative transfer, and identifying the transferability between the source domain and the target domain. One of the major challenges in current transfer learning research is to determine how much information the learner requires about the relationship between the source and target domains [72]. The learner must understand this relationship for an agent to use past knowledge in the target domain. Current transfer learning uses humans in the loop to direct the learner; however, in an autonomous context, the learner must understand the domain relationships by itself.
Creating a simulator that can simulate the real world with high enough fidelity to perform one-shot transfer learning is difficult. Current transfer learning requires pre-processing or pre-calibration on the target domain. This is because quasi-static kinematics and physical forces cannot be simulated accurately [18]. Rusu et al. [76] (Figure 4 of their paper) provide detailed information on robot manipulations and the physical-world results. The real-world dynamics of machinery are chaotic and sensitive to environmental variables. Tobin et al. [23] found that a randomized change to the target task, such as tightening the joints of the robot, drastically reduced performance compared to their simulated results. Additionally, creating an accurate virtual environment is difficult when the target environment is unknown, and even when the target domain is known, the simulator cannot match all of the target attributes.
The purpose of transfer learning in this context is to improve the target learner using simulated learning. However, most of the proposed learning algorithms assume that the source domain and target domain are the same. When the source domain (Ds) and target domain (Dt) are different, even brute-force transfer may be unsuccessful. In this situation, the target learner is negatively impacted by the imbalanced domain relationship [6,9,31], which is known as negative transfer.

How Are These Challenges and Limitations Currently Addressed?
The studied papers propose methods, algorithms, simulations, and learning and training techniques that can be used to overcome the above-mentioned challenges and limitations in transfer learning. One of the proposed solutions for handling uncertainty during transfer learning is to create a crude simulator that captures the salient features of the search space and introduces noise. Domain randomization is a technique that is widely used in transfer learning to provide simulated variability during training, so that during testing the model can generalize to the real world [35,63]. Another proposed method is to create a coevolving simulator that becomes increasingly predictive. This method uses evolutionary techniques to design a simulator that captures important properties of the target environment [18].
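The idea of domain randomization combined with injected noise can be sketched in a few lines of Python. This is a minimal illustration and not the setup of any cited paper: the parameter names, their ranges, the noise level and the helper functions are all hypothetical, and the training step is left out because it depends on the learner and simulator in use.

```python
import random

# Hypothetical parameter ranges; real values depend on the simulator and robot.
PARAM_RANGES = {
    "friction": (0.5, 1.5),
    "object_mass_kg": (0.1, 2.0),
    "light_intensity": (0.3, 1.0),
    "camera_jitter_deg": (-5.0, 5.0),
}


def sample_environment(rng, ranges=PARAM_RANGES):
    """Draw one randomized simulator configuration, uniformly per parameter."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


def noisy_observation(rng, observation, sigma=0.01):
    """Add Gaussian sensor noise to a simulated observation vector."""
    return [value + rng.gauss(0.0, sigma) for value in observation]


rng = random.Random(0)  # fixed seed so that runs are repeatable
for episode in range(3):
    config = sample_environment(rng)
    obs = noisy_observation(rng, [0.0, 1.0, 2.0])
    # A training step on (config, obs) would go here.
    print(episode, {k: round(v, 3) for k, v in config.items()},
          [round(x, 3) for x in obs])
```

Each episode sees a different combination of physical and visual parameters, so a policy trained this way is pushed to rely on features that hold across the whole range rather than on one particular simulator setting.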
Mihalkova et al. [49] propose a novel algorithm for the case in which only a minimal amount of target data exists and, in the extreme case, only single-entry information is known. The proposed algorithm, SR2LR, can find an effective mapping of knowledge from a source model to the target domain when the target data are extremely limited. To bridge the reality gap, GPU-based high-fidelity simulators are being developed, such as NVIDIA Flex [43], the MuJoCo physics simulator [23], and Alphabet Soup and RAWSim-O (used for simulating Robotic Mobile Fulfilment Systems (RMFS)) [38]. Additionally, sensors such as LiDAR are being developed for use in software-based simulators as well as in the real world to limit the reality difference.
Currently, adapting a base learner to a new domain when only a few labeled samples are available (few-shot learning) is handled using shallow neural networks. However, this method is limited in effectiveness and performance. Sun et al. [67] proposed meta-transfer learning (MTL), which transfers the weights of a deep neural network to handle few-shot learning. They also introduced a hard task (HT) meta-batch technique to further improve the efficiency of MTL. Another novel method is multiagent reinforcement transfer learning. Transfer learning methods have primarily been applied to single-agent reinforcement learning rather than to multiagent reinforcement learning. Boutsioukis et al. [59] suggest a novel method called BIas TransfER (BITER) to perform transfer in multiagent reinforcement learning. For cases where sufficient training data and computational power are not available, Shafahi et al. [55] proposed robust transfer learning, which transfers not only performance but also robustness from the source domain to the target domain.
Negative transfer occurs when the information learned from the source domain has a negative effect on the target learning model [9]. For effective transfer learning, the source domain must be relevant to the target domain, and the performance on the target domain improves significantly when it is. The paper suggests a weighted multimodal knowledge transfer method that weights each source domain according to its relevance to the target domain. The most closely related source domain is assigned the maximum weight, and the weights applied to the source domains lessen the adverse effects of negative transfer. If the source and target are not related or do not have a close relationship, performance can decrease, which leads to negative transfer [53]. Of the selected papers, only two provide suggestions and techniques to overcome negative transfer [9,53].
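The weighting idea can be illustrated with a small Python sketch. It is a simplified stand-in rather than the method from the cited papers: how the relevance scores are obtained is an assumption (here, each source model's accuracy on a small labeled target set), and the source names and numbers are hypothetical.

```python
def relevance_weights(scores):
    """Normalize non-negative relevance scores so that they sum to 1."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("at least one source must have positive relevance")
    return {name: score / total for name, score in scores.items()}


def combine(predictions, weights):
    """Weighted average of per-source predictions for one target sample."""
    return sum(weights[name] * predictions[name] for name in predictions)


# Hypothetical relevance scores, e.g. each source model's accuracy on a small
# labeled target set; the names and numbers are illustrative only.
scores = {"sim_a": 0.6, "sim_b": 0.3, "sim_c": 0.1}
weights = relevance_weights(scores)

# Only the most relevant source predicts 1.0 for this sample.
prediction = combine({"sim_a": 1.0, "sim_b": 0.0, "sim_c": 0.0}, weights)
print(weights, prediction)
```

Because a weakly related source receives a small weight, its predictions contribute little to the combined output, which is the mechanism by which weighting limits negative transfer.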

What Are the Open Research Issues and Potential Areas for Further Research or Improvements?
Since transfer learning is still an emerging research area, there are still areas to be improved, techniques to be discovered, and proofs of concept to be made to avoid some of the major downfalls of this technique. The literature suggests that fine-tuning the compatibility of the source domain and target domain is necessary for some applications of transfer learning. Powerful GPUs and CPUs have allowed researchers to gather high-quality simulated datasets and perform real-world simulation, but for consumer applications this high computational demand can be a constraint. Training needs to be fine-tuned to make the model's virtually learned policies robust in the physical world and so achieve higher success in transfer [26].
One of the primary areas that needs improvement is minimizing the reality gap between the virtual and the real world. The accuracy of the transfer depends on how close the simulation is to the real-world dynamics. Dynamic and static friction, acceleration, and collisions between actuators can cause nonlinearities. New fine-tuning methods need to be researched to improve the learning policies and minimize the difference between the real world and the simulation.
Finally, one of the major issues mentioned in the selected papers is negative transfer. The primary goal of using transfer learning is to improve the learning of the target model using training data from a related source (the simulator). As mentioned, negative transfer occurs when the source domain is not sufficiently related to the target domain, so that the weak relationship impairs the target learner and its performance does not improve. This is a major open area that needs to be further explored.

Conclusions
The goal of this systematic review was to identify the state of transfer learning in a simulation to real-world context. Through the systematic mapping process [17], 68 papers were identified for the review. The goal during paper selection was to focus on sim-to-real transfer and to discuss the improvements over time. For the review, we focused on the state of the art, applications, research state and open research areas to be investigated. Five research questions were created to further identify the state of transfer learning in a simulation to real-world context. The first task was to systematically identify the limitations and challenges of researching transfer learning in order to better identify the gaps and the current state. According to the systematic review, transfer learning is still in a research state, as only a few papers presented real-world applications and prototypes.
Multiple research gaps were identified during the systematic review. These include negative transfer and the reality gap between the simulation and the physical world.
Further research is required to address these gaps and challenges and to improve the overall accuracy of transfer. Improving transfer learning and bridging the gap between the virtual and the real can benefit many applications in domains such as industry, autonomous vehicles and robotics.