Application of Convolutional Neural Networks in Weed Detection and Identification: A Systematic Review

Abstract: Weeds are unwanted and invasive plants that proliferate and compete for resources such as space, water, nutrients, and sunlight, affecting the quality and productivity of the desired crops. Weed detection is crucial for the application of precision agriculture methods, and for this purpose machine learning techniques can be used, specifically convolutional neural networks (CNNs). This study focuses on the CNN architectures used to detect and identify weeds in different crops; 61 articles applying CNN architectures published during the last five years (2019–2023) were analyzed. The results show the use of different devices to acquire the images for training, such as digital cameras, smartphones, and drone cameras. Additionally, the YOLO family of algorithms is the most widely adopted architecture, followed by VGG, ResNet, Faster R-CNN, AlexNet, and MobileNet, respectively. This study provides an update on CNNs that will serve as a starting point for researchers wishing to implement these weed detection and identification techniques.


Introduction
According to the United Nations, the world population is estimated to reach 9.7 billion inhabitants by 2050 [1]. Against this backdrop, facing the challenge of feeding this growing population with high-quality and sustainable products becomes an imperative task. Increasing crop productivity emerges as a measure to address this need. Thus, a strategy that contributes to improving productivity is the proper management of weeds, given their direct impact on crop yields. Integrated weed management is essential to preserve agricultural productivity [2]. Plants considered weeds are fast-growing and actively compete for vital resources, such as space, water, nutrients, and sunlight. This competition not only affects resource availability but also has a negative impact on crop yield and quality [3]. According to [4], damage due to weeds can represent up to 42% of agricultural production.
Currently, diverse weeding techniques are used, such as pre- and post-emergence herbicides, whose application not only generates environmental impacts but also affects the health of the workers who apply them [5]. Mechanical weeding applies mechanized or manual techniques, whose effectiveness in eliminating weeds is not always as desired, depending on their stage of development [5]. Other weeding alternatives are still under development, or their feasibility has not been fully demonstrated; examples are physical weeding using plastic covers [5] and micro-biological weeding involving micro-organisms [6]. Traditional weeding methods present environmental challenges or economic disadvantages, creating the need to explore innovative solutions based on new technologies to increase treatment efficiency. For instance, precision weeding uses image sensors and computational algorithms to apply herbicides only when weeds are identified [7]. Precision weeding applies herbicides at a variable rate, delivering a specific amount in the exact place where the weeds are located. Examples of this are commercial developments, such as Blue River Technology from John Deere, WeedSeeker from Trimble, or Bosch Smart Agriculture, among others, which use algorithms based on artificial intelligence (AI) to recognize weeds in the field. In addition to variable-rate herbicide application methods, new precision weeding techniques have been introduced commercially, such as the use of lasers to eradicate weeds by burning. For example, LaserWeeding from Carbon Robotics uses AI to identify weeds and eliminate them with a high-power laser in real time; the development of this type of technology reduces environmental impacts and lowers production costs.
Among the AI techniques used in precision weeding is Deep Learning (DL), an advanced branch of machine learning (ML) that uses multi-layered artificial neural networks (ANNs) to model and learn complex patterns in data. DL techniques are widely used in agriculture, and their applications are increasing as algorithms are improved [8]. Several studies compiling DL applications in weed detection are presented in the works of [9][10][11][12]. Among the different types of DL neural networks are Convolutional Neural Networks (CNNs), a type of ANN architecture specially designed to process visual data, such as images and videos. CNNs efficiently detect spatial patterns in digital images by using convolution layers that apply filters to local regions of the input image [9]. These convolution layers allow the network to automatically learn hierarchical and complex features, such as edges, textures, and shapes, instead of relying on predefined features. The basic structure of a CNN model consists of three layers (Figure 1): a convolutional layer, a pooling layer, and a connection (fully connected) layer [10].

• The convolutional layer extracts features from the image using mathematical filters; the features can be edges, corners, or alignment patterns, which produce as output a feature map that serves as input to the next layer.

• The pooling layer reduces the resolution by reducing the dimension of the feature map in order to minimize the computational cost.

• The connection layer sends the feature maps obtained from the previous layer to the fully connected neural network layer, which contains the activation function used to recognize the final image.
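The three layers described above can be illustrated with a minimal numerical sketch in Python with NumPy (a toy example for illustration only, not any specific architecture from the reviewed studies):

```python
import numpy as np

def convolve2d(image, kernel):
    """Convolutional layer (single filter, no padding, stride 1):
    slides the kernel over the image and produces a feature map."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Pooling layer: reduces the resolution of the feature map by
    keeping the maximum value of each size x size window."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A 6x6 "image" with a dark-to-bright vertical edge between columns 2 and 3.
image = np.array([[0, 0, 0, 9, 9, 9]] * 6, dtype=float)

# A vertical-edge detection filter.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

features = convolve2d(image, kernel)   # 4x4 feature map (strong at the edge)
pooled = max_pool(features)            # 2x2 map after pooling
flattened = pooled.flatten()           # input vector for the connection layer
print(pooled.shape)  # (2, 2)
```

The flattened vector would then feed the fully connected layer, whose activation function produces the final class scores.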
The application of CNNs involves a series of steps: first, data preparation (image acquisition and labeling); second, CNN selection and configuration (hyperparameter tuning); third, CNN training (typically on Graphics Processing Units, GPUs); fourth, evaluation of CNN performance (usual metrics: mean average precision (mAP), recall, F1-score, and the confusion matrix, among others); and fifth, model deployment (real-world applications).
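For the evaluation step, the usual metrics mentioned above can all be derived from the confusion matrix. A minimal sketch in pure Python for the binary weed/crop case (the labels and predictions below are illustrative, not data from the reviewed studies):

```python
def confusion_matrix(y_true, y_pred):
    """Counts of true positives, false positives, false negatives, and
    true negatives for a binary weed (1) / crop (0) classifier."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0   # correctness of alarms
    recall = tp / (tp + fn) if tp + fn else 0.0      # coverage of real weeds
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean
    return precision, recall, f1

# Illustrative ground truth and predictions (1 = weed, 0 = crop).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.8 0.8 0.8
```

mAP extends this idea to detection by averaging precision over recall levels and object classes.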
CNNs have gained popularity as an effective image classification method [9]; their success is attributed to their ability to process images effectively and extract relevant features automatically, which allows them to generalize to new images thanks to supervised learning and has driven their wide use in practical applications and computer vision research. Although CNNs are efficient in object detection and classification, they present some limitations; e.g., large, labelled datasets are required for training. Lack of training data can lead to deficient performance or over-fitting problems. In addition, computational equipment with high processing capabilities, specifically GPUs, is required, since CNNs have a deep architecture with many parameters, which implies a higher need for processor and memory power.
To improve the efficiency of CNNs, transfer learning (TL) techniques are adopted; these techniques take advantage of the knowledge acquired by pre-trained CNNs to improve performance in new training [13,14]. TL aims to transfer knowledge from the source domain to the target application, improving its learning performance [15]; this is quite useful when the dataset for the target application is small or limited, as the pre-trained model can provide useful representations of the data and speed up the training process.
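The TL idea described above can be sketched numerically: the pre-trained backbone is frozen and only a small classification head is trained on the target data. In this toy NumPy example, a fixed random projection with ReLU stands in for the pre-trained convolutional layers (an assumption for illustration only; in practice the frozen weights would come from a network pre-trained on a large dataset such as ImageNet):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: a frozen mapping from raw inputs
# to feature vectors. Its weights are never updated during training.
W_frozen = rng.normal(size=(20, 8)) * 0.3

def backbone(x):
    return np.maximum(x @ W_frozen, 0.0)  # frozen layers + ReLU

# Toy "target domain" dataset (e.g., a small weed/crop feature set).
X = rng.normal(size=(100, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def log_loss(p, y):
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Only the new classification head (w, b) is trained.
w, b, lr = np.zeros(8), 0.0, 0.1
F = backbone(X)                                 # features computed once
initial_loss = log_loss(1 / (1 + np.exp(-(F @ w + b))), y)
for _ in range(500):
    p = 1 / (1 + np.exp(-(F @ w + b)))          # sigmoid output
    w -= lr * (F.T @ (p - y) / len(y))          # logistic-regression update
    b -= lr * np.mean(p - y)
final_loss = log_loss(1 / (1 + np.exp(-(F @ w + b))), y)
print(final_loss < initial_loss)  # True: the head learns on frozen features
```

Because only the head's few parameters are updated, training is fast even on small datasets, which is the practical appeal of TL noted in [13-15].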
The application of CNNs in agriculture, specifically for weed identification, has significantly improved on traditional approaches based on classical computer vision, where the analysis is carried out pixel by pixel and is computationally expensive, especially in execution time. The use of CNNs has improved the detection, localization, and recognition of weeds, besides being fast enough for real-time applications [8,9,16]. Although, in practice, weed detection faces several problems, such as similarities in color, textures, and shapes, as well as occlusion effects and variations in lighting environments, CNNs supported by large-scale datasets have shown great robustness to biological variability and diverse imaging conditions [13], leading to more accurate classification or detection [17,18], which allows for much more accurate and efficient automation of weeding processes [7,16].
One example of using TL with a CNN is AlexNet, which was trained on the ImageNet dataset [19]. In agriculture, the TL approach has been implemented in weed detection and classification, helping to minimize the need for large-scale image data collection and reduce the computational costs associated with the training hours of a new CNN model [20][21][22]. In addition to the use of TL in agriculture, some generative adversarial network (GAN) techniques have been applied to generate artificial images in order to augment the training set with TL [23]. The evolution of CNNs has been marked by significant advances in terms of architecture, training techniques, efficiency, and applications, starting from the need to implement fast solutions in image analysis; some of the CNN architectures most used in weed detection are mentioned below:
-AlexNet: Developed by [19].
-Fast R-CNN: Addresses the speed and computational efficiency limitations associated with R-CNN, providing a faster and more practical solution for object detection in images [27]. Fast R-CNN uses a pre-trained CNN network to extract features from the input image. Then, regions of interest (ROIs) are generated using a technique called region proposal (e.g., using algorithms such as Selective Search), and these regions are transformed into a fixed size to be input to the CNN network. Finally, bounding boxes of these proposed regions are classified and fitted using classification and regression layers, respectively [27]. The advantage of Fast R-CNN is the runtime efficiency of using a single CNN network to extract features and perform region classification rather than passing each proposed region through a separate network. A disadvantage is that it may have difficulty detecting small objects or in cases of object overlap. Likewise, this architecture has variants, such as Mask R-CNN, which adds an additional branch to the network to perform semantic segmentation of objects in the image, object detection, and classification.
-DenseNet
(Densely Connected Convolutional Network): Proposed in 2017 by [28], this architecture is notable for its densely connected structure, where each layer is directly connected to all subsequent layers. This dense connectivity can potentially improve information flow and mitigate the vanishing-gradient problem. It has influenced the design of subsequent architectures and continues to be a popular choice in research and practical implementation in computer vision tasks. Although DenseNet has been used primarily in its original form since its introduction, there have been some proposed extensions and variants, such as DenseNet 121, 169, and 201, each with a different depth. These numbers represent the total number of layers in the network, including convolutional layers, pooling layers, fully connected layers, and normalization layers. The main advantage of DenseNet is the direct flow of information from the input layers to the output layers, facilitating the learning of complex features and the propagation of gradients through the network; this direct data flow mitigates the vanishing-gradient problem, which facilitates the training of deeper networks. As a disadvantage, it has a higher computational cost, mainly in memory, because its dense connections require more training and inference computations.
-MobileNet: Proposed in 2017 by [29], it is specially designed for implementation on mobile devices and uses lightweight and efficient operations to balance performance and resource consumption. The main feature of MobileNet is its ability to strike a balance between network accuracy and computational efficiency through a series of building blocks called "Depthwise Separable Convolutions" that significantly reduce the number of parameters and the amount of computation. The building blocks divide the standard convolution into two separate stages: a depthwise convolution followed by a pointwise convolution; this drastically reduces the computational cost without
sacrificing too much accuracy [29]. The advantages of MobileNet are computational efficiency and low resource consumption, which make it ideal for running on resource-constrained devices, such as cell phones and IoT devices. Its main disadvantage is lower accuracy compared to other larger and more complex architectures in certain computer vision tasks. Improved versions, such as MobileNetV2 and MobileNetV3, increase accuracy and performance.
-YOLO (You Only Look Once): Developed in 2016 by [30], it is a fast and efficient object detection architecture, as it approaches detection as a single regression problem instead of a separate classification for each region. This efficiency has allowed the family to evolve through several versions, from YOLOv1 up to YOLOv8 in 2023. The fifth version, released in 2020 and known as YOLOv5, was built on PyTorch [31], maintaining the original YOLO approach of dividing the image into a grid and predicting bounding boxes with class probabilities for each cell. The overall architecture includes convolutional layers, attention layers, and other modern techniques; it is important to mention that this version was developed by the Ultralytics team, not by the original authors. In 2022, the YOLOv6 and YOLOv7 versions were developed, presenting improvements in their architecture and training scheme and improving object detection accuracy without increasing the cost of inference, a concept known as a "trainable bag of freebies" [32]. Finally, in 2023, YOLOv8 was presented; its improvements included new features, better performance, flexibility, and efficiency. Additionally, it includes improvements for detection, segmentation, pose estimation, tracking, and classification [33].
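The parameter savings of MobileNet's depthwise separable convolutions, described above, can be verified with simple arithmetic (the layer dimensions below are illustrative, not the exact MobileNet configuration):

```python
def standard_conv_params(k, c_in, c_out):
    # A standard convolution learns one k x k x c_in filter per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise stage: one k x k filter per input channel.
    # Pointwise stage: a 1 x 1 convolution mixing the channels.
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128                       # illustrative layer sizes
std = standard_conv_params(k, c_in, c_out)        # 73,728 parameters
sep = depthwise_separable_params(k, c_in, c_out)  # 8,768 parameters
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For this layer the split reduces the parameter count by roughly a factor of eight, which is why the architecture fits resource-constrained devices.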
The following is a systematic review of the latest work on the detection and identification of weeds with CNN techniques, intended to serve as a starting point for researchers wishing to implement these techniques. For this purpose, the following sections are presented: (a) Methods, using the PRISMA statement; (b) Results, developing the research question and producing statistics from the literature analysis: the CNNs used for the detection and identification of weeds, the sources of image acquisition for training, and the species studied in the articles reviewed; (c) Discussion, analyzing the articles found in light of the proposed objectives; and (d) Conclusions, where conclusions and proposals for future work are developed.

Methods
In this study, a systematic review was carried out to identify and analyze the scientific literature published on weed detection using CNNs. The guidelines of the PRISMA statement [34] were followed in this review. Specific objectives:
• To identify and analyze the most commonly used CNN architectures for weed detection in different crops.
• To determine the image acquisition sources most commonly used in CNN training for weed identification in different forms of production.

Sources of Information
The following databases were used for the systematic search: Web of Science and Scopus.

Search for Keywords
A preliminary search was carried out to establish the relevant words for the systematic search. For this, the Scopus database was used with the words "weeds detection deep learning," set to "search within all fields" (ALL (weeds AND detection AND deep AND learning)); this search returned 6096 results. With the filtering options in "Filter by keyword," the five most commonly used keywords and their numbers of matches were found: "Deep Learning" (1800), "Machine Learning" (988), "Remote Sensing" (704), "Crops" (647), and "Convolutional Neural Networks" (638). The initial search covered the topics of importance for this systematic review, and "Weed detection," "Deep Learning," and "Convolutional Neural Networks" were taken as keywords for the search.

Inclusion and Exclusion Criteria
The following inclusion and exclusion criteria were used and implemented through the filters of each database:
1. The search field is selected where the search is directed through titles, abstracts, and keywords, among others; this is specific to each database:
• In Scopus, "search within Article title, Abstract, Keywords" was established.
• In Web of Science, the search was established in "Topic"; this includes title, abstract, author keywords, and keywords plus.

2. The date range of the search is the last five years, from 2019 to 2023.
3. Reviews, book chapters, narrative articles, conference or congress articles, unofficial notes or communications, and studies from other areas (such as social, human, biological, chemical, and legislative studies and economic impact analyses) are excluded.

Search String in Bibliographic Databases
The search equation is established by restrictively connecting all the results containing the keywords "weeds detection" AND "Deep learning" AND "Convolutional Neural Networks." With this, the search equation is established according to each platform:

2. Records eliminated by exclusion criteria: 65 results were eliminated and 79 remained, 61 from Scopus and 18 from Web of Science (WOS).

Duplicates and Screening
The free tool Zotero 6.0.30 was used to eliminate duplicate results; 14 duplicates were removed, leaving a total of 65 results.

Additional Records
Sixteen articles identified while reading book chapters and reviews were added to the results, giving 81 results.

Records Excluded
A total of 20 records were excluded because they did not meet the objective of this review or could not be accessed.
Finally, 61 articles that met the established criteria were gathered and analyzed. Figure 2 illustrates the process in a flow chart using the PRISMA methodology.


Literature Analysis
A detailed analysis of the 61 bibliographic articles was conducted. The analysis indicated that, in the last two years, there has been growth in the number of articles published on weed detection (Figure 3). The increasing amount of research in this area is mainly due to the development of new and more efficient CNN architectures, the increase in the processing capacity of computers, and the reduction in the price of cameras and GPUs.


Source of Images for the Training of the CNNs
In the review of the selected articles, it was found that the authors used different types of sources for the acquisition of the images used in the training and validation of the CNNs: digital cameras, from professional reflex models to industrial high-speed and low-cost cameras, such as those using Raspberry Pi boards; UAVs carrying various types of cameras, both RGB and multi-spectral; and smartphones with high-resolution cameras. In addition, it was found that some researchers did not acquire images and instead used free databases or images from previous works. Figure 4 shows the number of publications according to the source used: 49.2% used digital cameras, 29.5% used UAVs as the means of acquisition, 11.5% used smartphones, and 9.8% used already-built datasets.


CNN Architecture Used
Table 1 summarizes relevant information extracted from the selected reviewed studies on CNN architectures.

Regarding the types of images found in the study, 56 articles used images in the RGB color space, captured with different types of cameras; four articles worked with multi-spectral images, captured with multi-spectral cameras integrated into UAVs; and one article used Generative Adversarial Networks (GANs) to create its own RGB images.
Figure 5 illustrates the frequency of use of the different CNNs for segmentation, detection, and classification of weeds. The YOLO family of algorithms, with its multiple versions, is the most commonly applied architecture, followed by the VGG, ResNet, and Faster R-CNN architectures, the latter also in various versions. AlexNet and MobileNet are also commonly used. As for the species used by the researchers, they are characterized by their great variety; a total of 27 species of cash crops and 77 species of weeds were counted. Figure 6 shows the frequency of the crop species used in the different studies, showing that the five most widely studied crops are Sugar Beet (Beta vulgaris), Soybean (Glycine max), Cotton (Gossypium hirsutum), Corn (Zea mays), and Wheat (Triticum aestivum L.).
Figure 7 shows the frequency of the families of the weed species used in the different studies. The weed species were grouped into their families due to the variety found, obtaining 22 families in total. It was found that the most studied weed families are Poaceae, Asteraceae, Amaranthaceae, Convolvulaceae, and Brassicaceae.

Discussion
Figure 5 shows that the YOLO family of algorithms, with its different versions, is used most frequently in the investigations analyzed, followed by the VGG family. ResNet and Faster R-CNN are used less frequently. These architectures are noted for their high accuracy but differ in speed; VGG, ResNet, and Faster R-CNN require more time to process images, which may not be practical for applications that need fast responses. On the other hand, the YOLO family of architectures is widely recognized for its efficiency in real-time object detection, with a high ability to predict bounding boxes and classify multiple objects in a single pass of the network, which makes it extremely fast and suitable for real-time detection applications; for example, [49,96] claim that YOLO CNNs have higher speed in detecting weeds.
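Bounding-box predictions from single-pass detectors such as YOLO are commonly matched and filtered using the Intersection over Union (IoU) criterion; a minimal sketch follows (boxes are given as (x1, y1, x2, y2) corner coordinates, and the values are illustrative):

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

# Two overlapping detections of the same weed and one detection elsewhere.
pred = (10, 10, 50, 50)
dup = (12, 12, 52, 52)
other = (100, 100, 140, 140)
print(round(iou(pred, dup), 2))  # 0.82: high overlap -> same object
print(iou(pred, other))          # 0.0: no overlap -> different object
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a threshold (0.5 is a common choice), and the same measure drives non-maximum suppression of duplicate boxes.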
The least frequently used architectures are UNet, ShuffleNet, and EfficientNet. UNet is mainly used in medical image segmentation applications, which explains its low frequency of use in weed detection, but different researchers have made comparisons seeking to evaluate its performance against other CNNs specific to object detection; for example, in [38], using the CNNs SegNet, UNet, VGG16, and ResNet-50 for the detection of weeds in canola fields, the best model was SegNet. Similarly, the work of [63] combined U-Net with ResNet50, using the latter as an encoding block for semantic segmentation of sugar beet, weeds, and soil, obtaining satisfactory segmentation results; this opens a window to evaluate different CNN architectures in applications for which they were not designed. ShuffleNet and EfficientNet are architectures used in applications where computational efficiency is required; they are usually deployed on devices with limited resources, for tasks such as semantic segmentation, image classification, or resource optimization, which limits their direct applicability in object detection tasks and, in this case, weed detection. Despite this, they have been used in works such as [72], which compared the CNNs VGG, ResNet, DenseNet, ShuffleNet, MobileNet, EfficientNet, and MNASNet to detect Rumex obtusifolius in grasslands and found that the best CNN was MobileNet.
Selecting the most suitable CNN architecture for weed detection should focus on aspects such as speed, accuracy, adaptability to different types of weeds and crops, environmental conditions, and computational efficiency. Although the YOLO family algorithms stand out for their real-time speed and ability to detect multiple objects in a single pass, other architectures, such as VGG, ResNet, and Faster R-CNN, offer higher accuracy. However, they may require more processing time. Therefore, speed becomes a parameter whose relevance depends on the application. For example, in developing in-field systems (weeding robots), where working speed is essential to ensure optimal performance, the fastest CNNs should be considered while maintaining a balance with accuracy. On the other hand, if precision is prioritized, hardware with higher processing capacity should be considered to increase speed as much as possible, leading to a higher economic cost in developing this type of machine.
In relation to the adaptability to different types of weeds and crops and various environmental conditions, the efficiency of CNNs is intrinsically related to the quality and diversity of the training set of images. This diversity should contain the highest representativeness of field conditions, ensuring the model can be effectively generalized to real conditions. Therefore, the training images must cover a wide range of scenarios and environmental conditions, including variations in weed types (morphological and color diversity), crops (variability in planting density), lighting conditions (brightness and shadows), soil textures, and crop residues, among other conditions present in the field. When selecting a CNN architecture for weed detection, carefully considering computational efficiency and available resources is essential. This involves balancing accuracy, efficiency, and speed to ensure optimal system performance.
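When a dataset cannot cover every lighting condition or viewpoint, training-set augmentation is the usual remedy. The following is a minimal, library-free sketch of the idea, modelling an image as a nested list of grayscale values; real pipelines would use dedicated tools (e.g., torchvision or Albumentations transforms) on actual image tensors.

```python
import random

# Library-free sketch of training-set augmentation for robustness to
# lighting and viewpoint variation. An "image" here is a nested list of
# grayscale pixel values in [0, 255], purely for illustration.

def adjust_brightness(img, factor):
    """Scale pixel intensities, clamped to [0, 255] (lighting variation)."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

def horizontal_flip(img):
    """Mirror the image left-to-right (viewpoint variation)."""
    return [list(reversed(row)) for row in img]

def augment(img, rng):
    """Randomly vary brightness and orientation of one training image."""
    out = adjust_brightness(img, rng.uniform(0.6, 1.4))  # shadows/glare
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return out

rng = random.Random(42)          # fixed seed for reproducibility
sample = [[10, 200], [120, 60]]  # a tiny 2x2 "image"
augmented = [augment(sample, rng) for _ in range(4)]
```

Each synthetic variant exposes the model to a field condition (brighter sun, deeper shadow, mirrored row orientation) that the raw dataset may under-represent.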
One of the most promising approaches in weed detection and classification is the YOLO family of algorithms, which stand out for their speed and efficiency in detecting objects in real time. Other architectures, such as MobileNet, ShuffleNet, SqueezeNet, and EfficientNet, show promise in specific applications requiring particularly high computational efficiency or image segmentation with limited computational resources, such as mobile devices or development boards deployed in low-cost field systems. The constant evolution of algorithms allows any CNN to be implemented in field applications; e.g., VGG, ResNet, and Faster R-CNN offer higher accuracy but require high computational resources, which opens a window for researchers to optimize CNNs or combine them to reduce computational consumption and improve speed while maintaining accuracy.
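Detectors such as YOLO and Faster R-CNN are compared using metrics built on box overlap, so a short Intersection-over-Union (IoU) sketch is useful context; the boxes below are hypothetical examples, not data from the reviewed studies.

```python
# Intersection-over-Union (IoU), the standard overlap measure behind
# detection metrics such as mAP. Boxes are (x_min, y_min, x_max, y_max)
# in pixels; a prediction typically counts as correct when its IoU with
# a ground-truth box exceeds a threshold (often 0.5).

def iou(box_a, box_b):
    """IoU of two axis-aligned bounding boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

ground_truth = (10, 10, 50, 50)  # hypothetical labeled weed box
prediction = (20, 20, 60, 60)    # hypothetical detector output
print(iou(ground_truth, prediction))  # ~0.39: would fail a 0.5 threshold
```

The same function underlies both evaluation (matching predictions to labels) and post-processing steps such as non-maximum suppression.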
The selected papers analyzed in this review show the use of a range of sources to capture images, from multi-spectral cameras on UAVs, to low-cost system development, to smartphones available to everyone. Several articles integrated different sources; for example, [60] used a Nikon 7000 camera (Nikon Corporation, Tokyo, Japan) to build the image dataset for training with YOLOv5 and, additionally, built a spraying system using Raspberry Pi cameras to distinguish dicotyledonous from monocotyledonous weeds. In the study of [68], a multi-copter UAV (Hylio Inc., Houston, TX, USA) equipped with a Fujifilm GFX100 (100 MP) camera (Fujifilm Corporation, Tokyo, Japan) was used with the YOLOv4 and Faster R-CNN architectures. In [37], two cameras, a Sony Cyber-Shot (Sony Corporation, Tokyo, Japan) and a Canon EOS Rebel T6 (Canon Inc., Tokyo, Japan), were used for image acquisition, with the AlexNet, GoogLeNet, and VGGNet architectures, to detect perennial ryegrass, dandelion (Taraxacum officinale), ground ivy (Glechoma hederacea L.), and spotted spurge (Euphorbia maculata L.).
The static digital camera is the most commonly used source of image capture due to its higher image quality and speed compared to a smartphone or a camera integrated into a UAV. In addition, the digital camera allows greater configuration of the capture parameters, unlike the cameras integrated into UAVs, which allow specific configurations but cannot match the performance of professional cameras. Additionally, factors such as the movement of the UAV, caused either by wind or by high speeds at the time of capture, leave some images out of focus, losing relevant information and making it difficult to label objects. Image quality also depends on the type of camera sensor: the charge-coupled device (CCD) sensor is used in some digital cameras due to its image quality and lower noise, but it has higher power consumption and manufacturing cost. The CCD has been replaced in many digital cameras by CMOS (complementary metal-oxide-semiconductor) sensors, which are more energy efficient and less expensive to manufacture but have slightly lower quality than the CCD in low-light conditions. CMOS sensors are used in cameras integrated into UAVs and smartphones, so the choice between the two sensors depends on the application's needs. For example, [59] evaluated three cameras, a Canon T6 DSLR camera (Canon Inc., Tokyo, Japan), an LG G6 smartphone (LG Electronics, Seoul, South Korea), and a Logitech C920 camera (Logitech International S.A., Lausanne, Switzerland); the detection results were highest for the Canon T6 camera. In contrast, the Logitech C920 camera was not suitable for weed detection, demonstrating that SLR-type cameras are preferred for the development of mobile platforms, field carts, or robots due to their image quality and adjustment options. In the work of [65], a semi-professional Nikon P250 camera (Nikon Corporation, Tokyo, Japan) was used to develop a prototype autonomous sprayer; similarly, in [36], a field robot was designed to detect weeds in high-density crops using a 20-Mpixel JAI camera (JAI A/S, Copenhagen, Denmark), a high-speed industrial camera.
Regarding the cameras integrated into UAVs, those manufactured by DJI (DJI, Shenzhen, China) are mainly used, in the Phantom 3 and 4, Matrice 600, Spark, and Mavic versions; therefore, they are limited to the performance and sensor configuration chosen by the brand. For example, [72] used Phantom 3 Professional UAV imaging at three flight altitudes (10 m, 15 m, and 30 m); the VGG, ResNet, and DenseNet architectures, along with the smaller ShuffleNet, MobileNet, EfficientNet, and MNASNet models, were used to detect Rumex obtusifolius. In [51], a Mavic Pro UAV integrated with a Parrot Sequoia camera was used, and the CNN VGG-16 was modified to detect weeds in sugar beet (Beta vulgaris subsp.).
The use of smartphones as an image capture source has increased in recent years, as there are now more efficient and faster CNNs, such as MobileNet, specifically designed to be integrated into mobile devices that perform image capture and processing but lack professional settings. In the work of [62], a Huawei Y7 Prime smartphone (Huawei Technologies Co., Ltd., Shenzhen, China) was used to take images in pea (Pisum sativum) to work with the Faster R-CNN ResNet 50 model. Similarly, in [70], a Xiaomi Mi 11 smartphone (Xiaomi Corporation, Beijing, China) was used on bell pepper (Capsicum annuum L.) to apply AlexNet, GoogLeNet, InceptionV3, and Xception. In addition, researchers take advantage of existing databases and image repositories, which are inexpensive, and many authors make their images and training sets freely available; however, such sets do not exist for all weed species and crops. In [67], the DeepWeeds dataset was used to train the SSD-MobileNet, SSD-InceptionV2, Faster R-CNN, CenterNet, EfficientDet, RetinaNet, and YOLOv4 models. Ref. [77] used the agri_data dataset, available on Kaggle, on Falsethistle grass and walnut (Carya illinoinensis), to train VGGNet, VGG16, VGG19, and SVM models.
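Public image sets of this kind are often distributed with one sub-folder per class, and the first practical step before training is collecting labeled samples and splitting them into train/validation sets. The sketch below illustrates this with a synthetic throwaway folder; the class names and layout are assumptions for illustration, not the actual structure of DeepWeeds or agri_data.

```python
import random
import tempfile
from pathlib import Path

# Sketch of preparing a folder-per-class image dataset for CNN training.
# The tiny dataset created below is synthetic, purely for illustration.

def split_dataset(root, val_fraction=0.2, seed=0):
    """Collect (path, label) pairs and split them into train/validation sets."""
    samples = [
        (img, class_dir.name)
        for class_dir in sorted(Path(root).iterdir()) if class_dir.is_dir()
        for img in sorted(class_dir.glob("*.jpg"))
    ]
    random.Random(seed).shuffle(samples)        # reproducible shuffle
    n_val = int(len(samples) * val_fraction)
    return samples[n_val:], samples[:n_val]     # (train, validation)

# Build a throwaway example dataset: two classes, five images each.
root = Path(tempfile.mkdtemp())
for cls in ("weed", "crop"):
    (root / cls).mkdir()
    for i in range(5):
        (root / cls / f"{i}.jpg").touch()

train, val = split_dataset(root)
print(len(train), len(val))  # 8 2
```

The resulting (path, label) pairs would then be handed to a framework's data loader; a fixed seed keeps the split reproducible across experiments.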
In terms of the infrastructure required to train CNNs, graphics processing units (GPUs) are necessary; they allow the training workload to be offloaded from the processor, which is left to handle general management tasks such as data loading, workflow coordination, and communication with other system components. NVIDIA, known primarily for its development and manufacturing of GPUs, developed a parallel computing platform, CUDA (Compute Unified Device Architecture), allowing developers to utilize the processing power of NVIDIA GPUs. This technology has been instrumental in the field of artificial intelligence (AI), especially in the training and inference of CNNs, so many researchers use it to train and validate their architectures. The following are the versions of GPUs found in this study:

Conclusions
In this systematic review, following the guidelines of the PRISMA statement, 61 scientific articles on the detection of weeds using CNNs were analyzed, using the following databases: Web of Science and Scopus. The review covers the last five years (2019–2023).
The CNN architectures most commonly applied for weed segmentation, detection, and classification were identified: the YOLO family of algorithms, with its different versions, leads. The VGG architecture and its versions follow as the next most widely used, then ResNet and Faster R-CNN, while AlexNet and MobileNet were the least widely used architectures.
The sources used to acquire the images for training and validating the CNNs were also identified. Fixed digital cameras, whether reflex-type or low-cost, which allow a wide range of configuration and higher image quality, are the main sources used. Cameras integrated into UAVs, which can be RGB or multi-spectral, are also used, despite their speed limitation and more restricted configuration compared to an SLR camera. Smartphones with high-resolution cameras have also been used, with the drawback of low processing speed. It was also noted that some authors used free databases or databases from previous studies to avoid image acquisition and the difficulties associated with it.
Despite the demonstrated effectiveness of CNNs in weed detection, several limitations and challenges deserve attention in future research. One critical limitation is the need for large datasets representing a wide range of weed species, with images at different growth stages of both the weeds and the crop. Another significant disadvantage is the limited ability of CNNs to generalize to new images captured under environmental conditions not seen during training. Therefore, datasets must include different lighting conditions, shadows, glare, and the presence of elements commonly found in the crop. This diversity of data is crucial to ensure that the model can effectively generalize across the different scenarios encountered under field conditions.
However, collecting and labeling these datasets are costly and laborious tasks that often require experts, so new techniques such as generative adversarial networks (GANs) are an alternative worth exploring alongside transfer learning (TL). Likewise, it is necessary to explore combining CNN structures to make them less complex and shallower, implying less processing time.
Finally, CNNs require significant computational resources, especially when using deep architectures or large datasets. This computational demand can be a limitation in resource-constrained environments, such as mobile devices or embedded systems. Therefore, computational efficiency should be carefully considered when selecting a CNN architecture for applications in these environments, seeking a balance between model accuracy and the required computational load. As future work, it is expected that the review will be extended to focus on the search for the most appropriate CNN architectures for weed detection and classification. It is hoped that this review article will help researchers create new technological developments that improve weed detection.

Figure 1.
Figure 1. The basic structure of a CNN model. The application of CNNs involves a series of steps: first, data preparation (image acquisition and labeling); second, CNN selection and configuration (hyperparameter tuning); third, CNN training (typically on graphics processing units, GPUs); fourth, evaluation of CNN performance (usual metrics: mean average precision (mAP), recall, F1-score, and confusion matrix, among others); and fifth, model deployment (real-world applications).

2.1. Research Question and Review Objectives
1. Research question: What are the convolutional neural network (CNN) architectures most frequently used for weed detection, and what are the most commonly used image acquisition sources for training these CNNs?
2. Main objective: Analyze the different CNN architectures used for weed detection and identify the sources of image acquisition for CNN training.
3.

Figure 2.
Figure 2. Flow chart illustrating the number of articles included in the systematic review according to the PRISMA process.

Figure 3.
Figure 3. Number of publications for specific years.

Figure 4.
Figure 4. Number of publications by image source for training of CNNs.

Figure 5.
Figure 5. The frequency of use of the different CNN architectures found in this study.

Figure 6.
Figure 6. Frequency of crop species found in this study.

Figure 7.
Figure 7. Frequency of weed families found in this study.

Table 1.
Summary of articles describing CNN architecture and source of image.
