1. Introduction
The world population is growing; estimations show that by 2050, there will be more than 9 billion people living on the Earth [
1]. Moreover, due to urbanisation, 66% of the world population will live in cities. This will lead to numerous challenges concerning housing and food production. As the population grows, but the available area does not, agricultural practice needs to be able to produce more food with fewer resources. Thus, the need for efficient food-production systems is apparent. One possible method of gaining more efficiency lies in precision agriculture. The aim of precision agriculture is to use temporal and spatial information of field or animal produce in order to improve the productivity of the farm [
2]. A growing technological approach to achieving this is the virtualisation of all the objects on a farm. For this virtualisation, a digital twin can be used [
3]. Digital twins (DT) offer a novel way of representing a physical object or system in a digital environment. A DT is a data-driven, digital replica of a real-world object or environment, which can be used for decision support and systems analysis [
4]. Once a DT is created, the application can be linked to the real object/environment using the Internet of Things (IoT). With this link, a DT is able to replicate the states and behaviour of the physical object in a digital environment [
5]. This removes key constraints on human observation (time and place), since the observation can be carried out in the digital environment [
3,
6]. For instance, these observations can be achieved by means of augmented or mixed reality, as outlined in our previous work in [
7]. The human collaborative process can be enhanced by additional technologies such as data analytics, machine learning, artificial intelligence, and automation. This could lead to a DT that learns and adapts, making choices based on simulated scenarios and data from the past [
8]. The main difference between a simulation and a DT is bi-directional communication: the DT and the physical object communicate throughout their life span, for instance, by updating values based on actual measurements [
9].
DTs are currently employed across a wide variety of disciplines for diverse applications. In the manufacturing industry, for example, DTs are used for production planning and design, maintenance, product lifecycle, manufacturing, layout planning, and process design [
10], whereas in the energy sector, the approach is used to forecast the life expectancy of wind turbines [
11]. Further, in the automotive industry, DTs are used to evaluate workplace ergonomics, detect welding completeness, and automate cybersecurity testing [
7,
12,
13,
14]. In the healthcare domain, Du, et al. [
15] implemented DTs that increased the accuracy, recall, and F1 score of stem cell detection.
In addition to outlining four categories of DT, Verdouw, et al. [
3] identified five main classes for use cases in agriculture, including energy consumption analysis, system failure analysis and prediction, real-time monitoring, optimization/update, and a technology integration tool. However, ref. [
16] only identified two case study examples that described the application of DTs in greenhouses.
A greenhouse, according to Stanghellini, et al. [
17], is a permanent cover structure for crops that is tall enough to enter and has some means of controlling the environment. Greenhouses are used to protect crops from the outside environment to ensure good crop yield even if conditions are harsh. More advanced (high-tech) greenhouses are equipped with computer-based environmental control, consisting of a network of actuators and sensors. When placed in a temperate climate (e.g., the Netherlands), these high-tech greenhouses consume a great deal of energy to control the indoor climate [
18]. In order to save energy, a wide variety of equipment is used, e.g., heat exchangers, aquifers, and windows for natural ventilation. During daily operation, decisions need to be made about the use of this equipment [
19]. As previously highlighted, DTs are used to perform energy analysis and are able to assist in the decision-making process. Therefore, these two applications of DTs appear to be well suited to greenhouses.
Based on the aforementioned discussion, this research aims to provide an overview of what technologies are used for the creation of an agricultural DT for greenhouses. There are other works focusing on this topic; for example, Ariesen-Verschuur, et al. [
20] provided a comprehensive review of DT applications in greenhouse horticulture. Their approach focused on applications of DT technology. The research in this article instead presents findings on suitable approaches for data coupling, the stages of DT implementation, and Industry 4.0 technologies. To achieve this, a literature search on DTs was conducted focusing on DT implementation and Industry 4.0 technology. As such, the following research questions were defined to gain a clear overview of the state-of-the-art DT application in agriculture.
(1) What data are typically coupled to agricultural digital twins?
(2) What are the stages of implementation for creating a digital twin in scientific literature?
(3) What Industry 4.0 technologies could be implemented in a greenhouse DT?
(3a) What cloud-based technologies are implemented?
(3b) What AI subfields are used alongside a DT?
The remainder of this article is structured as follows.
Section 2 provides a background discussion on DTs and their categorization.
Section 3 outlines the methodology employed for the literature search, and
Section 4 presents the evaluation of related articles. The article is concluded in
Section 5.
2. Background
Based on the investigation by Verdouw, et al. [
3], six categories of DTs were identified, namely (i) monitoring, (ii) predictive, (iii) prescriptive, (iv) autonomous, (v) imaginary, and (vi) recollection. A monitoring DT visualises the current and historic behaviour and state of the object or environment that it replicates. The digital representation is based on sensor data and a meta model. Typically, these sensor data are enriched with external data to further enhance the virtual representation. A predictive DT goes a step further: instead of showing the current or past state, it shows the (possible) future state. As with monitoring DTs, current and historic sensor data are employed; however, these data feed a prediction model that generates the future state of the real-world object or environment. A prescriptive DT shows the effect of certain interventions on the future DT, and these interventions are conducted in the present DT. An autonomous DT operates independently and does not need any intervention by humans; it has full control over the real-life object. An imaginary DT describes an object that does not yet exist in real life, including the information needed to create the object. Recollection DTs store all the historical data of the physical object; therefore, this type of DT is also referred to as the digital memory. These historical data can be used to improve the next generation of objects [
21].
2.1. DT Interventions
Additionally, there are typically two types of interventions: (i) reactive and (ii) proactive [
3]. Reactive interventions address a current problem found within monitoring DT solutions; an example could be that the current temperature in a greenhouse is too high. A proactive intervention is based on problems identified by a forecast generated by the predictive DT. For instance, the predictive DT predicts that the temperature in the greenhouse will be too high in three hours.
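The distinction between the two intervention types can be illustrated with a minimal Python sketch based on the temperature example above. The threshold value, the `Intervention` record, and both function names are assumptions introduced purely for illustration; they are not taken from the reviewed literature.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical upper limit for the greenhouse air temperature (assumption).
MAX_TEMP_C = 28.0


@dataclass
class Intervention:
    kind: str          # "reactive" or "proactive"
    lead_time_h: int   # hours until the problem occurs (0 = occurring now)


def reactive_intervention(current_temp_c: float) -> Optional[Intervention]:
    """Monitoring DT: respond to a problem that exists right now."""
    if current_temp_c > MAX_TEMP_C:
        return Intervention("reactive", 0)
    return None


def proactive_intervention(forecast_temps_c: List[float]) -> Optional[Intervention]:
    """Predictive DT: respond to the first problem found in a forecast.

    forecast_temps_c[i] is the predicted temperature i + 1 hours ahead.
    """
    for hours_ahead, temp in enumerate(forecast_temps_c, start=1):
        if temp > MAX_TEMP_C:
            return Intervention("proactive", hours_ahead)
    return None
```

In this sketch, a forecast of `[26.0, 27.5, 29.0]` yields a proactive intervention with a lead time of three hours, whereas a current reading of 30.0 °C yields a reactive intervention with no lead time.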
The prescriptive DT provides information and insight into the different interventions and what the effects are, yet the final decision is still taken by a human operator, and the intervention is executed in the physical world. This is not the case for the autonomous DTs, which have full control over the object without human interventions. The outcome of the prescriptive DT (intervention) is remotely implemented using the actuators of the real-life object or environment.
Moreover, autonomous DTs can be enriched with self-learning, where embedded algorithms evaluate the results of the suggested interventions [
22]. The complex interactions between the different DT types and a greenhouse are visualised in
Figure 1. Within the diagram, the coloured nodes (yellow, red, green, blue) are specific to the different DTs, and the black nodes are common across all DT types. The diagram is inspired by articles by Verdouw et al. [
3,
16].
2.2. DT and Industry 4.0
According to both industry and academia, DTs are finding a footing as one of the key technologies within the fourth industrial revolution (also known as Industry 4.0) [
23]. For example, Hofmann and Rüsch [
24] identified the following Industry 4.0 concepts based on their systematic literature review (SLR): cyber-physical systems (CPS), Internet of Things (IoT), Internet of Services (IoS), and smart factory. CPS are systems that connect the virtual and physical worlds over a network; some DTs, therefore, could be considered CPS. IoT is a mainstay technology within the Industry 4.0 domain. One of the more general definitions of IoT is an environment in which objects, using (relatively) small devices that are connected to the internet, can be turned into smart things [
25]. This also leads to the concept of IoS, which refers to today's service society and to making these services more readily accessible through the internet. The smart factory combines the power of the previously mentioned concepts (CPS, IoT, and IoS). It is based upon decentralised production in which humans, resources, and machines communicate seamlessly.
At the time of conducting this investigation, no literature review articles were found that focussed on the technologies, data, or the development of (specifically) greenhouse DTs. However, Pylianidis, et al. [
4] conducted a literature review on the application of DTs in agriculture, focusing on the potential of the DT in agriculture. The authors concluded that most DTs were still at an early stage and, therefore, did not yet offer the same benefits as found in other domains. Furthermore, they suggested first implementing DTs with simpler functionalities and gradually adding more complex parts. Verdouw, et al. [
3] analysed what DTs can contribute to the advancement of smart farming. They defined the concept and introduced a typology of DTs based on literature. In addition, they proposed a conceptual framework for the implementation and design of DTs. This framework was validated and applied in a case study. Based on the study, they defined a DT as a dynamic representation of a real-life object that mirrors its states and behaviour across its lifecycle. Such a DT can be used to monitor, analyse, and simulate current and future states of, and interventions on, the object. Further, this can involve the use of data integration, artificial intelligence, and machine learning.
Furthermore, Sreedevi and Santosh Kumar [
26] found a variety of applications of DTs in agriculture based on ten articles. Based on the knowledge they gained through the review, they discussed how similar techniques could be applied in a case study on aquaponics.
Following this discussion, the next section describes the literature review of DTs in the agricultural domain. Before the actual search started, the databases were selected, and the search strategy, search string, and selection criteria were defined. The search itself comprised collecting and selecting papers, quality assessment, data extraction, and data synthesis.
3. Materials and Methods
The search for papers for this literature study was conducted over five databases: Scopus (SC), Web of Science (WoS), Springer Link (SL), ScienceDirect (SD), and IEEE Xplore (IEEE). This selection of databases provided comprehensive coverage of the field, with a focus on journals rather than conference papers. Further, the databases were selected based on their relevance to agriculture and the informatics domain.
3.1. Search Protocol
Based on the background and related work, it became clear that only a limited number of papers addressed agricultural DTs. Therefore, a refined search string was defined first, as in (1).
Based on the first retrieved papers, this was later extended to (2).
This search string was the basis for the search in all the databases. The articles selected from the database search were then used to perform a snowballing review process, in which the reference list of each selected article was reviewed for additional articles. The articles found through this process were also checked using the selection criteria and quality assessment.
Because the databases use different search conventions, each database had a specific search string and specific settings, defined in
Table 1.
3.2. Selection Criteria
All papers found were checked against pre-defined selection criteria to ensure that only usable papers were selected. If any of the exclusion criteria (
Table 2) was true, the paper was excluded from this study.
Articles were collected from the described databases using the search strings and by performing snowballing. The list of selected articles was gathered in an Excel file, and this file was later used to keep track of the quality assessment and the data extraction. For each individual article, the following information was gathered, as presented in
Table 3.
The articles found through either searching the databases with a search string or through snowballing were checked against the previously defined selection criteria. Only the selected articles were used in the following steps of the review.
3.3. Quality Assessment
Next, a quality assessment was performed on the selected papers to ensure their quality. As proposed in the study of Kitchenham, et al. [
27], eight questions were used to assess the quality and suitability of the articles for analysis, outlined in
Table 4.
Articles were scored based on the answer to each question: 1 point (yes), 0.5 points (somewhat), or 0 points (no) was allocated after review of the article. The highest achievable score was 8; an article that scored fewer than 4 points (50%) was excluded from the study. The awarded score is not necessarily a complete reflection of an article's quality but rather of its suitability for this investigation.
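The scoring rule described above can be expressed as a short Python sketch. The point values, the number of questions, and the exclusion threshold follow the text; the function and constant names are illustrative assumptions, and the actual questions (Table 4) are not reproduced here.

```python
from typing import Sequence

# Points per answer, as described in the text.
ANSWER_POINTS = {"yes": 1.0, "somewhat": 0.5, "no": 0.0}
NUM_QUESTIONS = 8   # eight quality-assessment questions
MIN_SCORE = 4.0     # articles scoring below 4 points (50%) are excluded


def quality_score(answers: Sequence[str]) -> float:
    """Sum the points for one article's answers to the eight questions."""
    if len(answers) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} answers, got {len(answers)}")
    return sum(ANSWER_POINTS[answer] for answer in answers)


def is_included(answers: Sequence[str]) -> bool:
    """An article is kept only if it reaches the minimum score."""
    return quality_score(answers) >= MIN_SCORE
```

For example, an article answered with three "yes", one "somewhat", and four "no" scores 3.5 points and would be excluded, whereas four "yes" and four "no" gives exactly 4 points and would be retained.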
3.4. Data Extraction
Relevant data from the selected papers were extracted in order to answer the research questions. This was achieved by reading all papers in full and writing down the useful information. The data were gathered in a spreadsheet file to obtain an overview of the whole study.
3.5. Data Synthesis
The data collected during the extraction were diverse and needed to be synthesized to gain a holistic overview. This means that, per research question, umbrella terms were used to categorise the different answers of different articles. The defined umbrella terms per research question can be found in
Table 5 below.
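The synthesis step for the first research question can be sketched in Python as follows. The umbrella terms used are among those reported in the conclusions of this article, whereas the individual data types and their mapping are hypothetical examples chosen for illustration only; the function names are likewise assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Hypothetical mapping from reported data types to umbrella terms (assumption).
UMBRELLA_TERMS = {
    "solar radiation": "outdoor climate",
    "wind speed": "outdoor climate",
    "indoor air temperature": "indoor climate",
    "soil moisture": "soil",
    "leaf area": "plant/crop",
}


def categorise(data_types: Iterable[str]) -> Set[str]:
    """Return the set of umbrella terms covered by one article's data types."""
    return {UMBRELLA_TERMS.get(data_type, "other") for data_type in data_types}


def count_articles_per_term(articles: Dict[str, Iterable[str]]) -> Counter:
    """Count how many articles use each umbrella term (once per article)."""
    counts: Counter = Counter()
    for data_types in articles.values():
        counts.update(categorise(data_types))
    return counts
```

Counting each umbrella term once per article, rather than once per data type, matches how the results are reported (number of articles per category).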
5. Conclusions and Future Work
The database search is the biggest threat to the validity of this literature study. The five databases used were selected based on their relevance to agriculture and the informatics domain. To the best of the authors’ knowledge, these databases cover the vast majority of the published articles. However, there could be other literature types that were not included in this review because a certain database was excluded. Moreover, the defined search strings could have missed relevant papers. By performing a broad search and looking for synonyms before defining the final search strings, the authors limited the potential number of missing papers. In addition, snowballing was performed on the selected papers to find papers that were not directly retrieved using the search string. By following a predefined and widely used procedure, the SLR [
27], this research is reproducible should readers doubt the results or wish to perform a similar study.
This article provides an overview of the state-of-the-art DT application in agriculture, obtained by performing a literature study. The search query across five different databases resulted in 22 articles used for data extraction. A wide variety of DT applications in agriculture was found, ranging from predicting wheat growth [
42] to controlling an aquaponics system [
32]. Consequently, a wide variety of data types were used, and eight umbrella terms, plus a residual "other" category, were defined: outdoor climate, indoor climate, soil, plant/crop, animal, water, images, machines, and other. The DTs used outdoor climate data the most (12 articles); these data were often used in combination with soil data (7 articles). The DTs used, on average, 5.2 different data types, belonging to, on average, 2.6 categories.
The stages of implementation of the DTs were diverse. However, it became apparent that starting with equations or a textual description of the DT is common. Cloud interaction or simulation was often the final step of the DTs, and the most used step was linking data.
Of the 22 selected articles, 19 implemented cloud-based technology, while 3 did not. Of the 22 articles, 68% used a database for their DT, and 36% used an API. Ten articles used machine learning in their DT; however, eight of these ten did not specify which algorithm was used.
Based on all the collected information, we proposed a design for a DT of a greenhouse. Further research should focus on applying and testing the proposed design. In addition, the documentation of the development process and design choices of DTs should be improved, as this ensures that people can recreate and benefit from the research.