Deep Learning in Controlled Environment Agriculture: A Review of Recent Advancements, Challenges and Prospects

Controlled environment agriculture (CEA) is an unconventional production system that is resource efficient, uses less space, and produces higher yields. Deep learning (DL) has recently been introduced in CEA for different applications including crop monitoring, detecting biotic and abiotic stresses, irrigation, microclimate prediction, energy efficient controls, and crop growth prediction. However, no review study has assessed the state of the art of DL for solving diverse problems in CEA. To fill this gap, we systematically reviewed DL methods applied to CEA. The review framework was established by following a series of inclusion and exclusion criteria. After extensive screening, we reviewed a total of 72 studies to extract the useful information. The key contributions of this article are the following: an overview of DL applications in different CEA facilities, including greenhouse, plant factory, and vertical farm, is presented. We found that the majority of the studies focus on DL applications in greenhouses (82%), with the primary applications being yield estimation (31%) and growth monitoring (21%). We also analyzed commonly used DL models, evaluation parameters, and optimizers in CEA production. From the analysis, we found that convolutional neural network (CNN) is the most widely used DL model (79%), Adaptive Moment Estimation (Adam) is the most widely used optimizer (53%), and accuracy is the most widely used evaluation parameter (21%). Interestingly, all studies focused on DL for the microclimate of CEA used RMSE as a model evaluation parameter. Finally, we discuss the current challenges and future research directions in this domain.


Introduction
Sustainable access to high-quality food is a problem in developed and developing countries. Rapid urbanization, climate change, and depleting natural resources have raised concern for global food security. Additionally, rapid population growth further aggravates the food insecurity challenge. According to the World Health Organization, food production needs to be increased by 70% to meet the food demand of about 10 billion people by 2050 [1], of which about 6.5 billion will be living in urban areas [2]. A significant amount of food is produced in open fields using traditional agricultural practices, which results in low yields per sq. ft of land used. Simply increasing the agricultural land is not a long-term option because of the associated risks of land degradation, deforestation, and increased emissions due to transportation to urban areas [3]. Thus, alternative production systems are essential to offset these challenges and establish a sustainable food supply chain.
Controlled environment agriculture (CEA), including greenhouses, high-tunnels, vertical farms (vertical or horizontal plane), and plant factories, is increasingly considered an important strategy to address global food challenges [4]. CEA is further categorized based on the growing medium and production technology (hydroponics, aquaponics, aeroponics, and soil-based). CEA integrates knowledge across multiple disciplines to optimize crop
Table 1 presents the existing review articles covering DL applications in different sections of agriculture [22][23][24][25][26][27][28]. From the table, it is evident that the reported studies (based on the authors' knowledge) lack a critical overview of recent advancements in DL methodologies for CEA. Thus, a review of the recent works in CEA is needed to determine the state of the art, identify current challenges, and provide future recommendations. Figure 1 shows the bibliometric network and co-occurrence map of the author-supplied keywords.
Table 1. Summary of the recent important related reviews.

Ref. | Year | Focus of Study | Highlights
[22] | 2018 | Deep learning in agriculture | 40 papers were identified and examined in the context of deep learning in the agricultural domain.
[23] | 2019 | Fruit detection and yield estimation | The development of various deep learning models in fruit detection and localization to support tree crop load estimation was reviewed.
[24] | 2019 | Plant disease detection and classification | A thorough analysis of deep learning models used to visualize various plant diseases was reviewed.
[25] | 2020 | Dense images analysis | Review of deep learning applications for dense agricultural scenes, including recognition and classification, detection, counting, and yield estimation.
[26] | 2021 | Plant disease detection and classification | Current trends and limitations for detecting plant leaf disease using deep learning and cutting-edge imaging techniques.
[27] | 2021 | Weed detection | 70 existing deep learning-based weed detection and classification techniques covering four main procedures: data acquisition, dataset preparation, DL techniques, and evaluation metrics approaches.
[28] | 2021 | Bloom/Yield recognition | Diverse automation approaches with computer vision and deep learning models for crop yield detection were presented.
Our Paper | 2022 | Deep learning applications in CEA | Review of developments of deep learning models for various applications in CEA.

Paper Organization
The article is organized as follows: Section 2 describes the methodology of the review process, including establishing the review protocol, keyword selection, research question formation, and data extraction. Section 3 presents the results of the review, including data synthesis and answers to the core research questions. Existing challenges and future recommendations are discussed in Section 4. The overall conclusions of the review are presented in Section 5.

Review Protocol
In this research, we adhered to the standard systematic literature review (SLR) approach described by Chitu Okoli and Kira Schabram [29]. Using this approach, we identified, specified, and analyzed all the publications in DL for CEA applications from 2019 to date, in order to present a response to each research question (RQ) and identify any gaps. We divided the SLR process into three parts: planning, conducting, and reporting the review. Figure 2 depicts the actions taken at each level of the SLR. During the planning phase we identified RQs, relevant keywords, and databases. After the RQs were prepared, the search protocol was created, specifying which databases and search strings should be used. The search string for each database was generated using the selected keywords. Wiley, Web of Science, IEEEXplore, Springer Link, Google Scholar, Scopus, and Science Direct are the databases used in this study. The databases were chosen to ensure adequate coverage of the target sector and to increase the scope of the assessment. In the conducting stage, pertinent studies were chosen by going through all the eligible studies. Significant information was retrieved from the publications that met the selection/inclusion criteria in response to the RQs. During the reporting stage, data extracted from the selected publications were used to answer the RQs, and the outcomes were presented using accompanying visuals and summary tables. This type of literature analysis demonstrates the most recent findings of DL research in CEA.

Research Questions
Identifying RQs is essential to the systematic review. At the start of the study, we set the RQs up to adhere to the review procedure. The searched articles were examined from a variety of aspects, and the following RQs were established.

Search Method
In order to focus the search results on papers that were specifically relevant to the SLR's scope, a methodical approach was taken. The original search was conducted using a generalized search equation that included the necessary keywords "deep learning" AND "controlled environment agriculture" OR "greenhouse" OR "plant factory" OR "vertical farm" to obtain the expanded search results. From the search results, a few studies were selected to extract the author-supplied keywords and synonyms. The discovered keywords produced the general search string/equation: ("controlled environment agriculture" OR "greenhouse" OR "plant factory" OR "vertical farm" OR "indoor farm") AND ("deep learning" OR "deep neural network"). All seven databases were searched using the same keywords. The following search strings were used for the different databases:
• Science Direct: ("controlled environment agriculture" OR "greenhouse" OR "plant factory" OR "vertical farm") AND ("Deep Learning") NOT ("Internet of Things" OR "GREENHOUSE GAS" OR "gas emissions" OR "Machine learning")
• Wiley: ("controlled environment agriculture" OR "greenhouse" OR "plant factory" OR "vertical farm*") AND ("deep learning") NOT ("Internet of Things" OR "greenhouse gas" OR "Gas emissions" OR "machine learning" OR "Review")
• Web of Science: (AB = ((("controlled environment agriculture" OR "vertical farm" OR "greenhouse" OR "plant factory") AND ("deep learning") NOT ("Gas Emissions" OR "Internet of Things" OR "Greenhouse Gas" OR "machine learning" OR "Review"))))
• Springer Link: ("deep learning") AND ("Greenhouse" OR "controlled environment agriculture" OR "vertical farm" OR "plant factory") NOT ("Internet of things" OR "review" OR "survey" OR "greenhouse gas" OR "IoT" OR "machine learning" OR "gas emissions")
• Google Scholar: "greenhouse" OR "vertical farm" OR "controlled environment agriculture" OR "plant factory" "deep learning" -"Internet of Things" -"IoT" -"greenhouse gas" -"review" -"survey" -"greenhouse gases" -"Gas Emissions" -"machine learning"
• Scopus: TITLE-ABS-KEY (("deep learning") AND ("vertical farm*" OR "controlled environment agriculture" OR "plant factory" OR "greenhouse")) AND (LIMIT-TO (PUBYEAR, 2022) OR LIMIT-TO (PUBYEAR, 2021) OR LIMIT-TO (PUBYEAR, 2020) OR LIMIT-TO (PUBYEAR, 2019)) AND (LIMIT-TO (LANGUAGE, "English")) AND (EXCLUDE (EXACTKEYWORD, "Greenhouse Gases") OR EXCLUDE (EXACTKEYWORD, "Gas Emissions") OR EXCLUDE (EXACTKEYWORD, "Machine Learning") OR EXCLUDE (EXACTKEYWORD, "Internet of Things"))
• IEEEXplore: ("controlled environment agriculture" OR "greenhouse" OR "plant factory" OR "vertical farm") AND ("Deep Learning") NOT ("Internet of Things" OR "GREENHOUSE GAS" OR "gas emissions" OR "Machine learning")
After all the results were processed, a total of 751 studies were found using the aforementioned search strings.
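The way the general search equation is assembled from keyword groups can be sketched in a few lines of Python. This is only an illustration: the helper names `or_group` and `build_query` are hypothetical, and each database applies its own syntax on top of this generic AND/OR/NOT form.

```python
def or_group(terms):
    """Quote each term and join the terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(facility_terms, method_terms, excluded_terms=()):
    """Combine two OR-groups with AND, then optionally append a NOT group."""
    query = f"{or_group(facility_terms)} AND {or_group(method_terms)}"
    if excluded_terms:
        query += f" NOT {or_group(excluded_terms)}"
    return query


# Keyword groups taken from the general search string in the text.
FACILITY_TERMS = [
    "controlled environment agriculture",
    "greenhouse",
    "plant factory",
    "vertical farm",
    "indoor farm",
]
METHOD_TERMS = ["deep learning", "deep neural network"]

general_query = build_query(FACILITY_TERMS, METHOD_TERMS)
print(general_query)
```

Passing a third list, for example `["Internet of Things", "greenhouse gas"]`, appends a NOT group of the kind used in the database-specific strings.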

Selection/Inclusion Criteria
To establish the limits for the SLR, inclusion criteria (IC) and exclusion criteria (EC) were defined. The studies obtained from all databases were carefully examined to choose the pertinent research based on the IC and EC. The search outcomes from the databases were combined in a spreadsheet and compared against all of the ICs and ECs. A study had to satisfy all of the ICs and ECs in order to be considered for the review. Upon passing the IC and EC, all studies that could respond to the RQs were deemed pertinent and chosen. The ICs and ECs are presented below:

Applying the ICs and ECs produced a total of 72 eligible articles, which were then shortlisted for additional examination. An overview of the article search and selection procedure is shown in Figure 3. The distribution of selected papers from different databases is shown in Table 2. Tables 3 and 4 present the summary of studies that fulfilled the selection criteria.

[Figure 3. Article search and selection procedure. Flow diagram: selected keywords ("deep learning") AND ("Greenhouse" OR "controlled environment agriculture" OR "vertical farm" OR "plant factory") NOT ("Internet of things" OR "greenhouse gas" OR "IoT" OR "machine learning" OR "gas emissions"); records identified through database search (n = 751); ineligible papers removed; non-peer-reviewed papers removed.]

Data Extraction
The data required to answer the RQs were extracted from the selected studies and summarized using a spreadsheet application. In the spreadsheet, each study was assigned to a separate row, and each column was assigned to a different parameter. Tasks, DL model, training networks, imaging system, optimizer, pre-processing augmentation, application domain, performance parameters, growing medium, publication year, journal, and country, as well as challenges, were retrieved from the selected studies. To properly respond to the RQs, all of the extracted data were categorized and synthesized into various classifications. The following sections present the results of this SLR.
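The screening described above (merging results from several databases, removing duplicates, and checking each record against the criteria) can be sketched as follows. The criteria in `passes_screening` are assumptions for illustration only, loosely based on the publication-year and language limits in the Scopus string and the peer-review filter in Figure 3; they are not the authors' actual IC/EC list. The sample records are made-up placeholders, not reviewed studies.

```python
def deduplicate(records):
    """Keep the first record for each title, ignoring case and extra whitespace."""
    seen, unique = set(), []
    for record in records:
        key = " ".join(record["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def passes_screening(record):
    """Hypothetical criteria: published 2019 or later, in English, peer reviewed."""
    return (
        record["year"] >= 2019
        and record["language"] == "English"
        and record["peer_reviewed"]
    )


def screen(records):
    """Deduplicate the merged search results, then apply the screening criteria."""
    return [record for record in deduplicate(records) if passes_screening(record)]


# Placeholder records standing in for merged database exports.
records = [
    {"title": "Tomato detection with YOLO", "year": 2020, "language": "English",
     "peer_reviewed": True, "database": "Scopus"},
    {"title": "tomato detection  with yolo", "year": 2020, "language": "English",
     "peer_reviewed": True, "database": "Web of Science"},  # duplicate title
    {"title": "Greenhouse climate model", "year": 2017, "language": "English",
     "peer_reviewed": True, "database": "Wiley"},           # published too early
    {"title": "Lettuce growth preprint", "year": 2021, "language": "English",
     "peer_reviewed": False, "database": "Google Scholar"}, # not peer reviewed
    {"title": "Cucumber disease CNN", "year": 2021, "language": "English",
     "peer_reviewed": True, "database": "IEEEXplore"},
]

selected = screen(records)
print([record["title"] for record in selected])
```

Each surviving dictionary corresponds to one spreadsheet row; further keys (task, DL model, optimizer, and so on) would play the role of the extraction columns.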

RQ.1: What are the most often utilized DL models in CEA and their benefits and drawbacks?
In CEA, DL models have been applied to a variety of tasks, such as crop phenotyping, disease and small insect detection, growth monitoring, nutrient status and stress level monitoring, microclimatic condition prediction, and robotic harvesting, all of which require large amounts of data for the machine to learn from. The architectures have been implemented in various ways, including deep belief network (DBN), convolutional neural network (CNN), recurrent neural networks (RNN), stacked auto-encoders, long short-term memory (LSTM), and hybrid approaches. CNN, which has three primary benefits (parameter sharing, sparse interactions, and equivariant representations), is a popular and commonly used approach in deep learning. CNN's feature mapping includes k filters that have been spatially divided into several channels [102]. The feature map's width and height are reduced using the pooling technique. CNNs use filters to capture semantic correlations in multi-dimensional data through convolution operations, pooling layers for scaling, and shared weights for memory reduction, in order to evaluate hidden patterns. As a result, the CNN architecture has a significant advantage in comprehending spatial data, and the network's accuracy often improves as the number of convolutional layers rises.
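The two operations described above, a shared filter sliding over the input and pooling that shrinks the feature map, can be illustrated with a minimal NumPy sketch. It is not taken from any reviewed study; the 6 x 6 input and the 2 x 2 kernel are arbitrary illustrative values.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (valid cross-correlation, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every position (parameter sharing),
            # and each output depends only on a small patch (sparse interactions).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6 x 6 single-channel input
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # toy 2 x 2 filter (4 shared weights)

feature_map = conv2d_valid(image, kernel)
pooled = max_pool2d(feature_map)
print(feature_map.shape, pooled.shape)
```

Here four shared weights produce all 25 outputs of the feature map, and pooling reduces its width and height from 5 to 2.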
RNN and LSTM are very useful in processing time-series data, which are frequently utilized in CEA. The most well-known RNN variations include Neural Turing Machines (NTM), Gated Recurrent Units (GRU), and long short-term memory (LSTM), with LSTM being the most popular for CEA applications. Autoencoders (AE) are typically used for data dimensionality reduction, compression, and fusion, automatically learning a representation of the unlabeled input data. An autoencoder has two operations, encoding and decoding. Encoding the input images yields a code, which is subsequently decoded to obtain an output. The back-propagation technique is used to train the network so that the output is equal to the input. A DBN is created by stacking a number of distinct unsupervised networks, such as restricted Boltzmann machines (RBMs), so that each layer can be connected to both previous and subsequent layers. As a result, DBNs are often constructed by stacking two or more RBMs. Notably, DBNs have been used in CEA applications [74]. The benefits and drawbacks of various DL models are listed in Table 5. Table 5 reveals that the identified drawbacks of DL methods prevent them from becoming canonical approaches in CEA. Each DL approach has features that make it better suited than the others to a certain application in CEA. Hybrid models, which integrate several deep learning techniques, are said to address the shortcomings of some of the single DL methods. Among the publications we reviewed, some studies made use of the hybrid approach. Figure 4 shows a visual breakdown of the most often used DL approaches in CEA along with how frequently they are applied. The following subsection classifies CEA into two categories: (1) Greenhouse, (2) Indoor farm.
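The encode/decode idea can be made concrete with a deliberately tiny autoencoder. The sketch below is a linear autoencoder trained with hand-written back-propagation on synthetic random data; the layer sizes, learning rate, and number of steps are assumptions chosen for illustration, and none of it reproduces a model from the reviewed studies.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(64, 8))      # 64 unlabeled samples, 8 features (synthetic)

W_enc = 0.1 * rng.standard_normal((8, 3))    # encoder weights: 8 features -> 3-value code
W_dec = 0.1 * rng.standard_normal((3, 8))    # decoder weights: 3-value code -> 8 features


def reconstruct(X, W_enc, W_dec):
    """Encode the input into a code, then decode the code back to input space."""
    code = X @ W_enc
    return code, code @ W_dec


def mse(X, X_hat):
    """Mean squared reconstruction error."""
    return float(np.mean((X_hat - X) ** 2))


_, X_hat = reconstruct(X, W_enc, W_dec)
initial_loss = mse(X, X_hat)

learning_rate = 0.1
for _ in range(500):
    code, X_hat = reconstruct(X, W_enc, W_dec)
    grad_out = 2.0 * (X_hat - X) / X.size    # d(loss)/d(X_hat)
    grad_W_dec = code.T @ grad_out           # back-propagate into the decoder
    grad_code = grad_out @ W_dec.T           # gradient flowing back to the code
    grad_W_enc = X.T @ grad_code             # back-propagate into the encoder
    W_dec -= learning_rate * grad_W_dec
    W_enc -= learning_rate * grad_W_enc

_, X_hat = reconstruct(X, W_enc, W_dec)
final_loss = mse(X, X_hat)
print(initial_loss, final_loss)
```

The 3-value code is the compressed representation; practical autoencoders add nonlinear activations and more layers, but the training target (output equal to input) is the same.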

RQ.2: What are the main application domains of DL in CEA?
In this subsection, we present the DL models used in greenhouse production for diverse applications. Table 3 presents the application domain, tasks, DL model, network, optimizer, datasets, pre-processing augmentation, imaging method, growing medium, and performance of DL in greenhouses.

Microclimate Condition Prediction
Maintaining the greenhouse at its ideal operating conditions throughout all phases of plant growth requires an understanding of the microclimate and its characteristics. The greenhouse can increase crop yield by operating at the optimal temperature, humidity, carbon dioxide (CO2) concentration, and other microclimate parameters at each stage of plant growth. For instance, higher indoor air temperatures, which can be achieved by preserving the greenhouse effect or using the right heating technology, are necessary for maximum plant growth in cold climates. In very hot areas, on the other hand, the greenhouse effect is only necessary for a brief period of around 2-3 months, while suitable cooling systems are needed at other times [103]. Accurate prediction of a greenhouse's internal environmental factors using DL approaches is one of the recent trends in CEA. In our survey, we found 5 studies [30][31][32][33][34] that addressed microclimate condition prediction in the greenhouse.

Yield Estimation
Crop detection, one of the most important topics in smart agriculture, especially in greenhouse production, is critical for matching crop supply and demand and for crop management to boost productivity. Many of the surveyed articles demonstrate the application of DL models for crop yield estimation. The Single Shot MultiBox detector (SSD) method was used in the studies [37,43,51,53] to estimate tomato crops in the greenhouse environment, followed by robotic harvesting. Other applications of SSD include detecting oyster mushrooms in [39] and sweet pepper in [49]. Another DL model, You Only Look Once (YOLO), has been utilized with different modifications in some of the reviewed papers for crop yield estimation, as demonstrated in [36,41,46,47,51,52,53]. As described in [40,42,45,48,50,61], R-CNN models such as Mask-RCNN and Faster-RCNN, two of the most widely used DL models, are used in crop yield prediction applications, especially for tomato and strawberry. Other custom DL models for detecting crops have been proposed in the studies of [35,38,44,54].

Disease Detection and Classification
Disease control in greenhouse environments is one of the most pressing issues in agriculture. Spraying pesticides/insecticides equally over the agricultural area is the most common disease control method. Although effective, this approach comes at a tremendous financial cost. Techniques for image recognition using DL can dramatically increase efficiency and speed while reducing recognition cost. As indicated in Table 3, the evaluated publications mainly addressed diseases of tomato and cucumber. For tomato, the identified diseases include powdery mildew (PM) in [55,58,62], early blight in [55,58,63], leaf mold in [59,62,63], yellow leaf curl in [59,63], gray mold in [62,63], spider mite in [60], and virus disease in [56]. For cucumber, they include powdery mildew (PM) in [55,57,58], downy mildew (DM) in [55,57,58,61], and virus disease in [58]. The only other disease reported in the examined articles is the wheat disease described in [64].

Growth Monitoring
Plant growth monitoring is one of the applications where DL techniques have been applied to greenhouse production. It encompasses various areas such as length estimation at all crop growth stages, as demonstrated in [76,77], and detection of anomalies in plant growth in [78,82]. Other areas include the prediction of phyto-morphological descriptors in [79], seedling vigor rating in [80], leaf-shape estimation in [83], and spike detection and segmentation in [81].

Nutrient Detection and Estimation
It is crucial for crop management in greenhouses to accurately diagnose the nutritional state of crops because both an excess and a lack of nutrients can result in severe damage and decreased output. The goal of automatically identifying nutritional deficiencies is comparable to that of automatically recognizing diseases in that both involve finding the visual signs that characterize the disorder of concern. Based on our survey, we found fewer works dedicated to DL for nutrient estimation than to DL for nutrient detection. The goal of nutrient detection is to identify one of these pertinent deficiencies; symptoms that do not seem to be connected to the targeted disorders are therefore disregarded. The studies [69,75] employed the autoencoder approach to detect nutrient deficiencies and lead content, respectively. CNN models were also frequently used in applications for nutrient detection. This was demonstrated for soybean leaf defoliation in [70], nutrient concentration in [72], nutrient deficiencies in [75], net photosynthesis modeling in [71], and calcium and magnesium deficiencies in [73]. As shown in [74], the cadmium concentration of lettuce leaves was estimated using a different DL model, a DBN optimized using particle swarm optimization.

Small Insect Detection
The intricate nature of pest control in greenhouses calls for a methodical approach to early and accurate pest detection. Using an automatic detection approach (i.e., DL) for small insects in a greenhouse is even more critical for quickly and efficiently obtaining trap counts. The most prevalent greenhouse insects discovered in the reviewed studies are whiteflies and thrips [65][66][67][68]. Our survey identified four studies that applied DL models (mostly CNN architectures) to tiny pest detection.

Robotic Harvesting
Robotics has evolved into a new "agricultural tool" in an era where smart agriculture technology is so advanced. The development of agricultural robots has been hastened by the integration of digital tools, sensors, and control technologies, exhibiting tremendous potential and advantages in modern farming. These developments span from rapidly digitizing plants with precise, detailed temporal and spatial information to completing challenging nonlinear control tasks for robot navigation. High-value crops planted in CEA (i.e., tomato, sweet pepper, cucumber, and strawberry) ripen heterogeneously and require selective harvesting of only the ripe fruits. According to the reviewed papers, few works have utilized DL for robotic harvesting applications, such as picking-point positioning in grapes [85], obstacle separation using robots in tomato harvesting [84], 3D-pose detection for tomato bunch [86] and lastly, target tomato positioning estimation [87].

Others
Other applications of DL in CEA include predicting low-density polyethylene (LDPE) film life and mechanical properties in greenhouses using a hybrid model integrating both SVM and CNN [88].

Deep Learning in Indoor Farms
This subsection presents the main applications of the reviewed works that utilized DL in indoor farms (vertical farms, shipping containers, plant factories, etc.). Table 4 presents the application domain, tasks, DL model, network, optimizer, datasets, pre-processing augmentation, imaging method, growing medium, and performance of DL in indoor farms.

Stress-Level Monitoring
To reduce both acute and chronic productivity loss, early detection of plant stress is crucial in CEA production. Rapid detection and decision-making are necessary when stress manifests in plants in order to manage the stress and prevent economic loss. We discovered that a few DL stress-level monitoring papers are reported for plant factories. Stress level monitoring encompasses various areas such as water stress classification [92], tip-burn stress detection [93], lettuce light stress grading [94], and abnormal leaves sorting [91].

Growth Monitoring
In an indoor farm, it is critical to maintain a climate that promotes crop development through ongoing monitoring of farm conditions. Crop states are critical for determining the optimal cultivation environment, and by continuously monitoring crop status, a crop-optimized farm environment can feasibly be maintained. In contrast to traditional methods, which are time-consuming, DL models can automate the monitoring system and increase measurement accuracy. We found several studies that used DL models for growth monitoring in indoor farms, including plant biomass monitoring [99], a growth prediction model for Arabidopsis [97], a growth prediction model for lettuce [95], vision-based plant phenotyping [98], plant growth prediction algorithms [96,101], and the development of an automatic plant factory control system [100].

Yield Estimation
Due to their advantages over traditional methods in terms of accuracy, speed, robustness, and even resolving complicated agricultural scenarios, DL methods have been applied to yield estimation and counting research applications in indoor farming systems. The domains covered by yield estimation and counting in the examined publications include the identification of rapeseed [89] and cherry tomatoes [90].
The application distribution of DL techniques in CEA is shown in Figure 5.

Summary of Reviewed Studies
We observed a rapid advancement in CEA using DL techniques between 2019 and 2022, as demonstrated in Figure 6. The rising volume of work since 2019 illustrates the relevance of DL in CEA. Figure 7 shows the distribution of published articles by journal. The journal Computers and Electronics in Agriculture published the most DL for CEA articles (19). Figure 8 presents the country-by-country distribution of the evaluated articles, with China accounting for 40% of the total, the highest number of publications. Korea and the Netherlands account for 10% and 7% of the papers, respectively.

Evaluation Parameters
Our survey found that various evaluation parameters were employed in the selected publications (RQ.3). Precision, recall, intersection-over-union (IoU), root mean square error (RMSE), mean average precision (mAP), F1-score, R-square, peak signal-to-noise ratio (PSNR), Jaccard index, success rate, sensitivity, specificity, accuracy, structural similarity index measure (SSIM), errors, standard error of prediction (SEP), and inference time were the most commonly used evaluation parameters for the DL analysis in CEA. Figure 9 depicts the frequency with which the evaluation parameters are used. Accuracy was the most frequently used evaluation measure (29 times). Precision, recall, mAP, F1-score, and RMSE were each used at least 10 times; IoU and R-square were used 5 times, while the rest were used fewer than 5 times. We noticed that RMSE and R-square were utilized as evaluation metrics in all microclimate prediction studies. Success rate and accuracy were used as evaluation measures for robotic harvesting applications. With the exception of a few cases of recall, precision, mAP, and F1-score, works related to growth monitoring applications used accuracy, RMSE, and R-square. RMSE, precision, recall, mAP, F1-score, and accuracy were commonly utilized in other applications in the examined studies.
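For reference, several of the evaluation parameters named above can be computed with a few lines of plain Python. The sketch uses the standard textbook definitions; the function names and the example numbers are illustrative and are not drawn from any reviewed study.

```python
import math


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-score from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination (assumes y_true is not constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


print(precision_recall_f1(tp=8, fp=2, fn=2))
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Detection-style tasks (yield estimation, insect detection) typically rely on the first two functions, while regression-style tasks such as microclimate prediction rely on RMSE and R-square.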

RQ.4: What are the DL backbone networks used in CEA applications?
There are many backbone networks, but this article focuses only on those used in the reviewed papers, which include ResNet, EfficientNet, DarkNet, Xception, InceptionResNet, MobileNet, VGG, GoogleNet, and PRPNet. These network structures are fine-tuned or combined with other backbone structures.
ResNet was the most often utilized network in CEA applications, according to the survey, as illustrated in Figure 10. The ResNet architecture can overcome the vanishing/exploding gradient problem [104]. When a deep neural network with n hidden layers is trained using gradient-based learning and backpropagation, the derivatives of the n layers are multiplied together. The vanishing gradient problem occurs when the derivatives are small, so that the gradient rapidly diminishes as it propagates through the model until it vanishes. When the derivatives are large, the gradient instead grows exponentially, resulting in the exploding gradient problem. ResNet uses a skip connection strategy that bypasses some layers and connects their input directly to the output. The benefit of the skipping approach is that if any layer degrades the performance of the network, it can be skipped through regularization, preventing exploding/vanishing gradient problems. The main feature of MobileNet [105] is that it uses depth-wise separable convolutions to replace the standard convolutions of traditional network structures. Its significant advantages are high computational efficiency and a small number of convolutional network parameters. MobileNet v1 and v2 are used in the reviewed articles, with v2 performing faster than v1. ResNet, on the other hand, adds a structure made up of multiple network layers that feature a shortcut connection, known as a residual block. ResNet and FPN are used by Mask R-CNN to combine and extract multi-layer information. Many variants of the ResNet architecture were found in the reviewed articles, i.e., the same concept but with a different number of layers. ResNeXt replicates a building block that combines a number of transformations with the same topology. It exposes a new dimension, cardinality, in comparison to ResNet, and requires minimal extra effort in designing each path.
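The multiplication of per-layer derivatives, and the effect of adding an identity path, can be illustrated with scalar "layers". This is a simplified sketch, not a ResNet implementation: each layer is reduced to a single local derivative, and the values 0.01 and 1.5 are arbitrary assumptions chosen to make the effect visible.

```python
def chain_gradient(local_derivatives, skip=False):
    """Gradient backpropagated through a stack of scalar layers.

    A plain layer y = f(x) contributes the factor f'(x).
    A residual layer y = x + f(x) contributes the factor 1 + f'(x).
    """
    gradient = 1.0
    for derivative in local_derivatives:
        gradient *= (1.0 + derivative) if skip else derivative
    return gradient


n_layers = 50
small = [0.01] * n_layers   # small per-layer derivatives
large = [1.5] * n_layers    # large per-layer derivatives

vanishing = chain_gradient(small)               # product of 50 small factors
exploding = chain_gradient(large)               # product of 50 large factors
with_skip = chain_gradient(small, skip=True)    # identity path keeps each factor near 1

print(vanishing, exploding, with_skip)
```

With small derivatives the plain product collapses toward zero, whereas the residual form keeps every factor close to one, so a usable gradient still reaches the early layers.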
The Inception network [106] uses many tricks, such as dimension reduction, to push performance in terms of both speed and accuracy. The versions of the Inception network used in the reviewed papers are InceptionV2, InceptionV3, Inception-ResNetV2, and SSD InceptionV2. Each version is an upgrade intended to increase accuracy and reduce computational complexity. InceptionResNetV2 can achieve higher accuracies in fewer epochs. With the advantage of expanding network depth while using a small convolution filter size, VGG [107] can significantly boost model performance. VGGNet inherits some of its framework from AlexNet [108]. GoogleNet [109] has an inception module inspired by sparse matrices, which can be clustered into dense sub-matrices to boost computation speed, in contrast to AlexNet and VGGNet, which increase the network depth to improve training results. Contrary to VGG-nets, the Inception model family has shown that correctly constructed topologies can produce compelling accuracy with minimal theoretical complexity.
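The parameter savings behind two of the ideas mentioned for these backbones, 1 x 1 dimension reduction in Inception-style modules and depth-wise separable convolutions in MobileNet, follow from simple arithmetic. The sketch below counts weights only (biases ignored); the channel sizes are illustrative assumptions rather than values taken from the reviewed papers.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Number of weights in a standard k x k convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Depth-wise k x k convolution (one filter per input channel) plus a 1 x 1 point-wise convolution."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


def reduced_conv_params(kernel_size, in_channels, reduced_channels, out_channels):
    """1 x 1 convolution that reduces the channel count, followed by a k x k convolution."""
    reduction = conv_params(1, in_channels, reduced_channels)
    main = conv_params(kernel_size, reduced_channels, out_channels)
    return reduction + main


# Assumed layer sizes, for illustration only.
standard_3x3 = conv_params(3, 32, 64)
separable_3x3 = depthwise_separable_params(3, 32, 64)
direct_5x5 = conv_params(5, 192, 32)
reduced_5x5 = reduced_conv_params(5, 192, 16, 32)

print(standard_3x3, separable_3x3)   # 18432 2336
print(direct_5x5, reduced_5x5)       # 153600 15872
```

In both cases the restructured layer needs roughly an order of magnitude fewer weights than the plain convolution it replaces, which is the source of the efficiency gains described in the text.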
The backbone network for You Only Look Once (YOLO), DarkNet, has been enhanced in its most recent edition. YOLOv2 and YOLOv3 introduce DarkNet19 and DarkNet53, respectively, while YOLOv4 proposes CSPDarkNet [110]. CSPNet [111] was proposed to mitigate the problem of heavy inference computations from the network architecture perspective and has been used in a recent YOLO structure, i.e., SE-YOLOv5 [56]. Other backbone network structures include Xception [112] with 65 and 71 layers, EfficientNet [113], and PRPNet [55].

RQ.5: What are the optimization methods used for CEA applications?
In contrast to the increasing complexity of neural network topologies [114], the training methods remain very straightforward. For a neural network to be useful, it must first be trained, as most neural networks produce random outputs without training. Optimizers, which modify properties of the neural network such as weights and learning rate, have long been recognized as an essential component of DL, and a robust optimizer can dramatically increase the performance of a given architecture.
Stochastic gradient descent (SGD) is an optimization approach and one of the variants of gradient descent that is commonly used in neural networks. It updates the parameters for each training example, one at a time, eliminating redundancy. As a hyper-parameter, the learning rate of SGD is often difficult to tune because the magnitudes of different parameters vary greatly, and adjustment is required during the training process. Several gradient descent variants have been created to address this problem, including Adaptive Moment Estimation (Adam) [115], RMSprop [116], Ranger [117], Momentum [118], and Nesterov [119]. The adaptive algorithms among them automatically adapt the learning rate to different parameters based on gradient statistics, leading to faster convergence and simpler learning strategies, and have been used in many neural networks applied to CEA, as demonstrated in Figure 11.
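The difference between a single global learning rate and a per-parameter adaptive step can be seen on a toy problem. The sketch below implements the plain gradient step and the Adam update rule in NumPy and applies both to a quadratic whose two parameters have very different curvature. The toy function, learning rates, and step count are assumptions for illustration, and exact gradients are used in place of per-example stochastic gradients.

```python
import numpy as np


def loss_and_grad(w, scales):
    """Toy loss f(w) = sum(scales * w**2) and its exact gradient."""
    return float(np.sum(scales * w ** 2)), 2.0 * scales * w


def gradient_descent(w, scales, lr, steps):
    """Plain gradient steps with one global learning rate."""
    w = w.copy()
    for _ in range(steps):
        _, grad = loss_and_grad(w, scales)
        w -= lr * grad
    return w


def adam(w, scales, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    """Adam: per-parameter steps scaled by running gradient statistics."""
    w = w.copy()
    m = np.zeros_like(w)   # running mean of gradients (first moment)
    v = np.zeros_like(w)   # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        _, grad = loss_and_grad(w, scales)
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        m_hat = m / (1.0 - beta1 ** t)   # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w


scales = np.array([100.0, 0.01])   # steep direction and flat direction
w0 = np.array([1.0, 1.0])
initial_loss, _ = loss_and_grad(w0, scales)

# The global learning rate must stay small enough for the steep direction,
# which makes progress along the flat direction very slow.
w_gd = gradient_descent(w0, scales, lr=0.005, steps=200)
w_adam = adam(w0, scales)

gd_loss, _ = loss_and_grad(w_gd, scales)
adam_loss, _ = loss_and_grad(w_adam, scales)
print(initial_loss, gd_loss, adam_loss)
```

Because Adam normalizes each parameter's step by its own gradient statistics, the flat direction advances about as quickly as the steep one, which is the behavior the text attributes to adaptive optimizers.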

RQ.6: What are the primary growing media and plants used for DL in the CEA?
The most common growing medium in the evaluated studies is soil-based (78%), as shown in Figure 12. Among soilless growing media, there are 14 publications on hydroponics, one on aquaponics, and none on aeroponics. This suggests that DL research on these soilless growing media is still in its infancy. Figure 13 shows the distribution of the plants used in the evaluated papers; tomatoes represent 39% of all plants grown in the CEA and correspond to the highest number of publications. Lettuce, pepper, and cucumber were grown in 16%, 9%, and 8% of the papers, respectively. The reviewed publications also show that indoor farms used soilless techniques (hydroponics and aquaponics) more frequently than greenhouse systems, which frequently used soil-based growing media.

Challenges and Future Directions
The paragraphs below briefly describe what are, to the best of our knowledge, some specific challenges and potential directions of DL applications in CEA.
DL models typically need a lot of data to learn effectively. Such large training datasets are difficult to gather, are not publicly available for some CEA applications, and may even be problematic owing to privacy laws. Although data augmentation and related dataset-expansion methods can partly make up for the shortage of large labeled datasets, it is difficult to fully meet the demand for hundreds or thousands, if not more, of high-quality data points. When the training data is insufficient, DL models may fail to generalize to validation data. However, we found a number of studies that used smaller datasets and attained high accuracy, as shown in [40,45,56,59,82]. These studies demonstrated various strategies for handling this situation by carefully choosing the features that allow the method to perform at its best. Additionally, to ensure optimal performance and streamline the processing of the learning algorithms, the dimensionality of the input vectors for the classification and detection algorithms must be reduced.
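As an illustration of the data augmentation mentioned above, the following minimal sketch derives several training images from one original using simple geometric transforms. It assumes an image stored as a nested list of pixel values; the function names are hypothetical, and the reviewed studies may have used different transforms or libraries.

```python
def hflip(img):
    # Mirror the image left to right.
    return [row[::-1] for row in img]


def vflip(img):
    # Mirror the image top to bottom (rows are copied, not shared).
    return [row[:] for row in img[::-1]]


def rot90_cw(img):
    # Rotate the image 90 degrees clockwise.
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    # Return the original plus three transformed copies.
    return [[row[:] for row in img], hflip(img), vflip(img), rot90_cw(img)]
```

For image classification, each transformed copy keeps the label of the original image. For detection tasks, the bounding-box coordinates would need the same transform applied, which this sketch does not cover.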
DL algorithms are also sensitive to the quality of the data used to train them. Overfitting can occur when an algorithm "learns" noise and excessive detail in the training set, which harms the resulting model's ability to generalize. In this case, the model performs well on the training dataset but poorly on new data. Regularization techniques to combat overfitting include weight decay/regularization, altering the network's complexity (i.e., the number of weights and their values), early stopping, and activity regularization.
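Two of the regularization techniques listed above, stopping training early and weight decay, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the `patience` and `min_delta` parameters, and the default values are hypothetical and are not drawn from the reviewed papers.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran to the last epoch.
    return len(val_losses) - 1, best_epoch


def step_with_weight_decay(weights, grads, lr=0.01, weight_decay=1e-4):
    # L2 weight decay adds weight_decay * w to each gradient, which
    # shrinks the weights toward zero at every update.
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]
```

In practice, the weights saved at `best_epoch` would be restored after stopping, so that the deployed model is the one with the lowest validation loss rather than the last one trained.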
We expect to see more combinations of two time-series models for temporal sequence processing in the future, as demonstrated in [31]. We also anticipate that more methods will use LSTM or other RNN models, exploiting the time dimension to make more accurate predictions, especially for climatic condition prediction. Additionally, an explainable result helps to gauge the reliability of a time series prediction. As a result, improving interpretability will receive a lot of attention in the future [120].
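Recurrent models such as LSTM consume sequences, so a sensor time series is commonly framed as overlapping input windows paired with a future target value. The sketch below shows one such framing. The function name, the `window` and `horizon` parameters, and the example temperature readings are illustrative assumptions and do not describe any specific reviewed study.

```python
def make_windows(series, window, horizon=1):
    """Split a time series into (input window, target) pairs.

    Each input holds `window` consecutive readings; the target is the
    reading `horizon` steps after the end of that window.
    """
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        inputs.append(series[start:start + window])
        targets.append(series[start + window + horizon - 1])
    return inputs, targets


# Hypothetical greenhouse air temperatures (degrees C) at fixed intervals.
temps = [20.0, 20.5, 21.0, 21.5, 22.0]
x, y = make_windows(temps, window=3, horizon=1)
```

Each pair in `x` and `y` would then serve as one training sample for a sequence model, with a larger `horizon` corresponding to predictions further ahead.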
The majority of the evaluated studies focused on supervised learning, while only a small number used semi-supervised learning. Future works that incorporate unsupervised learning into CEA applications will rely heavily on tools such as the generative adversarial network (GAN), a generative modeling method that learns to replicate a specific data distribution. The lack of data is a major barrier to creating effective deep neural network models, and GANs can help to address it [121]. Because a GAN creates realistic images that differ from the original training data, it is appealing for data augmentation in DL-computer vision, where it can lessen model overfitting.
Another area worth noting is the clear interest in the use of AI and computer vision in CEA applications. With the use of DL-computer vision, a number of difficult CEA issues are being resolved. However, DL-computer vision faces significant difficulties, one of which is the enormous processing power it requires. Adopting cloud-based solutions with auto scaling, load balancing, and high availability characteristics is one way to deal with this issue. Cloud solutions are limited in real-time video analysis and real-time inference, but edge devices with features such as GPU accelerators can handle these tasks. Running computer vision solutions on edge hardware helps to reduce latency. Few works have addressed the need for proper security to ensure data integrity and dependability in the rapidly expanding field of computer vision in CEA; additional research into this area is needed in subsequent works.
In the next few years, DL urgently needs to be applied to developing more microclimate models for monitoring and maintaining the microclimatic parameters within the desired range for optimal plant growth and development, thus helping with irrigation and fertigation management of the crops. The need for AI, particularly DL, to derive an empirical and non-linear "growth response function" that maps microclimate conditions to crop growth stages is critical because, according to the reviewed papers, this has not been extensively studied. This calls for the optimization of microclimate control set points at various growth stages of crops. There are currently very few publications that have developed prediction models for the microclimate parameters in CEA. In addition to microclimate prediction models, there is a need to develop more microclimate control systems, such as (1) automatic shading systems to protect crops from harsh sunlight in greenhouses; (2) pad-fan systems and fogging systems based on vapor pressure deficit (VPD) control, which is an effective way to simultaneously maintain ideal ranges of temperature and relative humidity, thus significantly enhancing plant photosynthesis and productivity in greenhouse production; and (3) photoperiod control systems based on light spectrum and intensity control. Beyond microclimate prediction and control, where studies are scarce, extensive research is also needed on the use of edge-AI systems for precise monitoring at various phases of crop growth. Lastly, it is crucial to investigate the use of DL for nutrient solution management in soilless cultures (influenced by both microclimate conditions and crop growth). We anticipate that further research that considers monitoring, predicting, controlling, and optimizing microclimate factors in CEA will become available in the near future as advancements in accuracy, efficiency, and architectures are put forth.
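To make the VPD-based control idea concrete, the sketch below computes air VPD from temperature and relative humidity, which is why a single VPD target constrains both variables at once. It assumes the widely used Tetens approximation for saturation vapor pressure, which is not specified in the reviewed papers. The function names and the set points in `suggest_action` are hypothetical placeholders, not agronomic recommendations.

```python
import math


def saturation_vapor_pressure_kpa(temp_c):
    # Tetens approximation (an assumption of this sketch): saturation
    # vapor pressure in kPa for an air temperature in degrees Celsius.
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vpd_kpa(temp_c, rh_percent):
    # VPD is the gap between saturation and actual vapor pressure.
    if not 0.0 <= rh_percent <= 100.0:
        raise ValueError("relative humidity must be between 0 and 100")
    return saturation_vapor_pressure_kpa(temp_c) * (1.0 - rh_percent / 100.0)


def suggest_action(vpd, low=0.8, high=1.2):
    # Hypothetical set points in kPa; real targets depend on the crop
    # and its growth stage.
    if vpd > high:
        return "fog"        # air too dry: add moisture and cool
    if vpd < low:
        return "ventilate"  # air too humid: exchange or dry the air
    return "hold"
```

A DL-based microclimate model could supply the predicted temperature and humidity to such a rule, or replace the fixed thresholds with set points learned per growth stage.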
Additionally, labor availability and the associated costs are a growing concern for the sustainability and profitability of the CEA industry. Some research on developing robotic systems has been reported, but the majority of it is focused on field production. However, CEA is a unique production environment, and indoor-grown crops have different requirements for automation based on the production technology employed (greenhouse, vertical tower, vertical tier, hydroponic, Dutch bucket, pot/tray, etc.). Further, CEA crops are grown at higher density (plants per unit area), which makes robotics applications more challenging. Thus, extensive efforts are required to develop DL-driven automation and robotic systems for different production environments to address these challenges.

Conclusions
Today, it is evident that prediction and optimization procedures are essential in many industries. This study presented a comprehensive review of DL-based research efforts in CEA, which were motivated by the most recent breakthroughs in computational neuroscience. It examined various application areas, described the tasks, and listed technical details such as the DL models and networks, the preprocessing and augmentation, the optimizer used, and the performance of each method.
The results of this study demonstrate that the applications of DL models have recently attracted a lot of interest as a result of their ability to recognize distinctive object features and offer greater precision. There is no way to determine which DL model is the best. However, we found that RNN-LSTM was frequently used for predicting microclimate conditions in CEA due to its suitability for time series prediction. We noticed that prediction of the microclimate conditions, a crucial issue in CEA, was the subject of relatively little of the reported research. Based on the reviewed papers, CNN models, the most widely used DL models, have high applicability and universality. CNN and ResNet are the most widely adopted DL model and network, respectively, while other models and networks are also implemented in this domain. To generate constructive discussion of the limitations of DL techniques in the CEA domain, critical challenges and future research prospects were presented. We believe this review will serve as a roadmap for future studies towards creating an intelligent system for various CEA applications.