Article

CNN-MLP-Based Configurable Robotic Arm for Smart Agriculture

College of Engineering, China Agricultural University, Beijing 100083, China
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(9), 1624; https://doi.org/10.3390/agriculture14091624
Submission received: 5 July 2024 / Revised: 13 September 2024 / Accepted: 15 September 2024 / Published: 17 September 2024
(This article belongs to the Section Agricultural Technology)

Abstract

Amidst escalating global populations and dwindling arable lands, enhancing agricultural productivity and sustainability is imperative. Addressing the inefficiencies of traditional agriculture, which struggles to meet the demands of large-scale production, this paper introduces a highly configurable smart agricultural robotic arm system (CARA), engineered using convolutional neural networks and multilayer perceptron. CARA integrates a highly configurable robotic arm, an image acquisition module, and a deep processing center, embodying the convergence of advanced robotics and artificial intelligence to facilitate precise and efficient agricultural tasks including harvesting, pesticide application, and crop inspection. Rigorous experimental validations confirm that the system significantly enhances operational efficiency, adapts seamlessly to diverse agricultural contexts, and bolsters the precision and sustainability of farming practices. This study not only underscores the vital role of intelligent automation in modern agriculture but also sets a precedent for future agricultural innovations.

1. Introduction

The agricultural sector is facing unprecedented challenges due to the increasing global population and the relative decline in arable land. To meet the growing demand for food, enhancing agricultural productivity and sustainability is imperative [1,2,3]. Traditional agriculture, which heavily relies on manual labor, is inefficient and cannot meet the demands of large-scale production. Agriculture is rapidly transitioning towards automation and intelligence, driven by technological innovations [4,5].
Smart agriculture, which is at the forefront of the agricultural technological revolution, utilizes advanced information and robotics technologies to achieve precision management and automation in agricultural production [6,7]. Smart agriculture integrates the Internet of Things (IoT), big data, and artificial intelligence to monitor crop growth, fertilize precisely, irrigate automatically, and control pests [8,9].
Robotic arms are a critical component of automation technology and play a vital role in intelligent agriculture. Robots can execute precise tasks such as plant picking, fruit sorting, and packaging with ease [10,11,12]. They significantly reduce manual labor and production costs while enhancing accuracy and efficiency [13,14,15]. However, accomplishing these complex tasks requires more than mere mechanical operations. It necessitates the support of advanced artificial intelligence technologies like deep learning to endow robotic arms with visual recognition and decision-making capabilities [16,17].
The integration of convolutional neural networks (CNN) and multilayer perceptron (MLP) in deep learning has demonstrated remarkable capabilities in image recognition and processing [18,19,20,21,22]. In the context of intelligent agriculture, these technologies empower robotic arms to accurately identify various crops and their growth stages, facilitating precise operations on specific crops. To deploy agricultural technologies effectively, a comprehensive system framework that integrates robotic arm hardware and deep learning software is necessary [4,23,24].
This paper introduces a novel configurable agricultural smart robotic arm system (CARA), which incorporates a highly configurable robotic arm (HCRA), image acquisition module (IAM), and deep processing center (DPC). The CARA system utilizes a deep learning model that combines CNN with MLP to autonomously perform agricultural tasks, including picking, pesticide spraying, and crop inspection. It is highly adaptable and can be quickly configured to meet the specific needs of different crops and operational environments. The CARA system has demonstrated significant potential and practical value in intelligent agricultural production through extensive research and experimental validation. This is a significant step towards more efficient, precise, and sustainable agricultural development.

2. Materials and Methods

The CARA is predicated on flexibility and adaptability, enabling it to perform a variety of tasks with precision and efficiency. As depicted in Figure 1a, the CARA is principally composed of three key components: the IAM, the HCRA, and the DPC. The CARA operates in the primary processing stage of agricultural produce, handling tasks such as the picking and bagging of fruits and pesticide spraying on vegetables. At the heart of CARA’s hardware configuration lie the HCRA and the IAM (Figure 1b), with the software component constituted by the DPC. This section provides a comprehensive overview of the design and functionality of the HCRA, the design and integration of the DPC with the IAM, and the holistic experimental deployment of CARA. Each selected component and strategic decision has been detailed to underscore its contribution to the system’s configurability and operational efficiency. Figure 1c shows an actual demonstration of tree pest and disease detection.

2.1. Structure and Configuration of HCRA

To achieve success in smart agriculture, selecting a lightweight and highly configurable robotic arm is crucial [25,26,27,28,29]. As depicted in Figure 1b, the arm’s mechanical structure is composed of rigid links connected by joints and terminates in an end effector. It is important to note that the system’s end effector is highly configurable and can perform a multitude of intelligent tasks necessary for farm operations. The programming pendant enables visual control of the robotic arm and provides the software interface required for external control. Table 1 and Table 2 summarize the physical attributes of the robotic arm, which has been designed to meet the agility and configurability required for different farm work environments. The maximum velocities listed for the end effector and joints are theoretical peaks achievable under optimal conditions, distinct from the operational velocities encountered in practice, which are subject to various constraints.
The incorporation of a six-degrees-of-freedom (6-DOF) robotic arm into smart agriculture capitalizes on its established utility across diverse industrial applications. Characterized by its comprehensive validation in multifaceted operational settings, the 6-DOF system excels in executing intricate and precision-driven tasks. Its adoption reflects a commitment to the principles of standardization that resonate with the prevailing industrial automation ecosystem, ensuring a high degree of reliability and adaptability essential for the demanding environments of intelligent agricultural operations. The prevalent use of this configuration in industry highlights its capabilities for exceptional repeatability and accuracy, critical attributes that address the stringent requirements of agricultural productivity and efficiency. Consequently, employing a 6-DOF robotic arm in our system is a strategic choice, aimed at harnessing a well-established mechanical infrastructure to augment the technological sophistication and operational efficacy of agricultural automation initiatives.
The joints of the robotic arm are designed with the capability to rotate ±360 degrees. However, the achievable range of motion may be restricted based on the application environment. Specifically, in standard operational settings, limitations are typically observed in the second, third, and fifth degrees of freedom. This discussion is confined to describing the basic structural and performance characteristics of the robotic arm without delving into the nuances of environmental constraints on its operation. The focus here is to outline the core capabilities as a baseline, with empirical assessments provided to substantiate the mechanical arm’s foundational configurations, ensuring clarity in its basic operational parameters without considering the variability introduced by different deployment contexts.
Figure 2a displays a detailed dimensional model of the Highly Configurable Robotic Arm (HCRA), providing a granular view of the arm’s precise geometrical specifications. To achieve a comprehensive understanding of the arm’s structural and mechanical performance, a high level of detail is required. The HCRA has been modeled in SolidWorks and Adams, as shown in Figure 2b,c. This modeling represents the initial step towards iterative design improvements and functional enhancements, laying a solid foundation for the continuous development of the robotic arm’s capabilities. The HCRA’s workspace, as shown in Figure 2d, optimizes coverage and maneuverability for agricultural tasks. Additionally, Figure 2e displays the arm in various configurations, demonstrating the HCRA’s versatility and adaptability to diverse operational demands encountered in agricultural settings.
In discussions of the mathematics underlying a robotic arm, the focus typically centers on kinematics, dynamics, and control strategies [30,31,32,33,34,35,36,37]. Among these, the Denavit–Hartenberg (D-H) parameter transformation matrix serves as a method to describe the relative position and orientation of links and joints. This method utilizes four parameters: the link length $a_i$, the link twist angle $\alpha_i$, the link offset $d_i$, and the joint angle $\theta_i$. For the transformation from one link to the next, the transformation matrix $T_i$ can be expressed as Equation (1).
$$T_i = \begin{bmatrix} \cos\theta_i & -\sin\theta_i \cos\alpha_i & \sin\theta_i \sin\alpha_i & a_i \cos\theta_i \\ \sin\theta_i & \cos\theta_i \cos\alpha_i & -\cos\theta_i \sin\alpha_i & a_i \sin\theta_i \\ 0 & \sin\alpha_i & \cos\alpha_i & d_i \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
The forward kinematics equations of a robotic arm are employed to compute the position and orientation of the end effector based on given joint angles. If the robotic arm has n joints, its forward kinematics equation can be represented by the multiplication of the transformation matrices of all joints as Equation (2):
$$T = T_1 T_2 \cdots T_n.$$
This resulting matrix T describes the position and orientation of the end effector relative to the base frame.
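To make the D-H formulation concrete, the short Python sketch below builds the per-joint transform of Equation (1) and chains the transforms as in Equation (2). The link parameters used are illustrative placeholders, not the HCRA's actual D-H table.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform between consecutive links (Equation (1))."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(joint_angles, dh_params):
    """Chain the per-joint transforms (Equation (2)) to obtain the end-effector pose."""
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(joint_angles, dh_params):
        T = T @ dh_transform(theta, d, a, alpha)
    return T  # 4x4 pose of the end effector in the base frame

# Illustrative 6-DOF parameters (d, a, alpha) in metres/radians; not the HCRA's actual values.
dh_params = [(0.1, 0.0, np.pi / 2), (0.0, 0.3, 0.0), (0.0, 0.25, 0.0),
             (0.1, 0.0, np.pi / 2), (0.1, 0.0, -np.pi / 2), (0.05, 0.0, 0.0)]
pose = forward_kinematics(np.zeros(6), dh_params)
print(pose[:3, 3])  # end-effector position for the zero configuration
```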
The Jacobian matrix $J$ delineates the relationship between the velocity of the end effector and the joint velocities. For a six-degree-of-freedom robotic arm, the Jacobian matrix can be divided into the linear velocity part $J_v$ and the angular velocity part $J_\omega$, as in Equation (3):
$$J = \begin{bmatrix} J_v \\ J_\omega \end{bmatrix}.$$
Here, $J_v$ and $J_\omega$ correspond to the dependencies of the end effector’s linear and angular velocities on the joint velocities, respectively.
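When an analytical Jacobian is not at hand, the linear-velocity block $J_v$ can be approximated by finite differences on the forward kinematics, as in the sketch below. It reuses the hypothetical `forward_kinematics` helper and `dh_params` from the previous listing and is a sanity-check tool only, not part of the HCRA's controller.

```python
import numpy as np

def numeric_jacobian_v(fk, q, dh_params, eps=1e-6):
    """Finite-difference estimate of the linear-velocity Jacobian J_v (3 x n)."""
    n = len(q)
    p0 = fk(q, dh_params)[:3, 3]          # reference end-effector position
    J_v = np.zeros((3, n))
    for i in range(n):
        dq = np.array(q, dtype=float)
        dq[i] += eps                       # perturb one joint at a time
        J_v[:, i] = (fk(dq, dh_params)[:3, 3] - p0) / eps
    return J_v

# Example, reusing forward_kinematics and dh_params from the previous sketch:
# J_v = numeric_jacobian_v(forward_kinematics, np.zeros(6), dh_params)
```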
PID (Proportional–Integral–Derivative) control is a commonly used control strategy for motion control of robotic arms. The objective of a PID controller is to reduce the discrepancy between the target and actual positions, and its control law can be expressed as Equation (4):
$$u(t) = K_p e(t) + K_i \int_0^t e(\tau)\, d\tau + K_d \frac{d e(t)}{d t}$$
Here, $u(t)$ represents the control input (such as force or torque), $e(t)$ is the position error, and $K_p$, $K_i$, and $K_d$ are the proportional, integral, and derivative gains, respectively.
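A minimal discrete-time implementation of Equation (4) is sketched below. The gains, time step, and toy plant are arbitrary illustrative values, not those of the HCRA's joint controllers.

```python
class PID:
    """Discrete PID controller implementing Equation (4)."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target, measured):
        error = target - measured
        self.integral += error * self.dt                   # accumulated integral term
        derivative = (error - self.prev_error) / self.dt   # backward-difference derivative
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Illustrative use: drive a joint angle toward 30 degrees with a toy first-order plant.
controller = PID(kp=2.0, ki=0.1, kd=0.05, dt=0.01)
angle = 0.0
for _ in range(500):
    torque = controller.update(target=30.0, measured=angle)
    angle += 0.01 * torque  # simplistic plant model, for demonstration only
```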
Due to the maturity and ubiquity of Proportional–Integral–Derivative (PID) control in automation, this study does not specifically test or extensively discuss PID performance. PID controllers, known for their robustness and effectiveness in maintaining system stability and optimizing performance, are well-documented across various applications. Numerous studies have validated their efficiency in real-world scenarios, establishing a solid foundation of understanding. Therefore, detailed discussions on PID tuning are considered unnecessary, as the focus is on presenting innovative advancements beyond this conventional control technique [38,39].
In conclusion, the HCRA will improve operational efficiency in smart agriculture, representing significant progress in automating complex and varied farm tasks. The deployment of this innovative technology demonstrates a strong commitment to enhancing productivity through intelligent mechanization, thereby increasing the precision and sustainability of agricultural practices.

2.2. Structure and Configuration of IAM and DPC

The CARA framework relies heavily on the IAM for its visual perception capabilities. An efficient IAM has been designed and implemented within the framework, as shown in Figure 1b. The module includes function buttons, switches, a camera, a control circuit, and interfaces, a design that optimizes the integration and layout of the hardware components. The control circuit ensures precise image acquisition, while the function buttons and switches provide an intuitive, convenient interface through which users can activate the image acquisition process with ease. As the core component of image acquisition, the camera captures visual information within the arm’s working environment, and the control circuit processes the captured signals. The acquisition process is optimized through pre-assembled and pre-programmed algorithms, ensuring high-quality and efficient transmission of image data. The IAM interface transmits the image data to the DPC, which can be deployed flexibly on computers or deep learning development boards. The DPC uses a highly efficient deep learning model that integrates a CNN (Figure 3a) and an MLP (Figure 3b) to accurately process and recognize visual information of targets even in complex working environments. The model is designed to enhance the end effector’s capabilities and performance: it extracts features from the collected images and achieves high-precision target recognition by learning from a large amount of training data.
The proposed hybrid model integrates the feature extraction capabilities of CNN with the spatial and sequential processing strengths of MLP through MLPMixer and Permutator modules. Utilizing the Visual Geometry Group Network (VGG) for initial feature extraction, the architecture employs convolutional layers and Region of Interest (RoI) pooling to generate fixed-size feature representations, irrespective of object scales. Diverging from traditional CNN paths, this model introduces a bifurcation in which feature maps are concurrently processed by the MLPMixer, which reorganizes features across spatial dimensions to enhance global image context comprehension, and the Permutator, which applies a segment-wise permutation strategy to improve integration and interpretation of feature segments. This dual approach enables a deeper understanding of the interrelationships among input space components, significantly refining the model’s analytical capabilities [40,41,42,43,44,45].
This approach to feature processing involves two pathways that enhance the model’s representational capacity and flexibility in handling complex spatial relationships and feature interactions. The architecture consolidates the features from both pathways after processing through the MLP. These features are then harmonized through layer normalization and dimensionality reduction techniques. This consolidation unifies the diverse and rich feature sets into a coherent representation that is suitable for the final prediction stages.
The model’s forward pass directs the unified feature representation into dedicated heads for classification and bounding box regression. The components output final object class predictions and precise bounding box coordinates, utilizing the enriched feature set produced by the hybrid architecture. The design balances the depth and breadth of feature processing, ensuring adequate capture and utilization of both local and global image contexts for object detection tasks.
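The exact layer dimensions of the DPC model are not reproduced in the paper, so the following PyTorch sketch is only a schematic of the dual-pathway idea described above: a VGG backbone, pooled feature maps processed in parallel by an MLPMixer-style token-mixing block and a Permutator-style segment-permutation block, and a fused representation feeding classification and bounding-box heads. All module names, sizes, and the adaptive pooling stand-in for RoI pooling are assumptions made to keep the example self-contained and runnable (a recent torchvision is assumed).

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class MixerBlock(nn.Module):
    """Token-mixing block (MLPMixer-style): MLPs across spatial tokens, then channels."""
    def __init__(self, num_tokens, dim, hidden=256):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(nn.Linear(num_tokens, hidden), nn.GELU(), nn.Linear(hidden, num_tokens))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):  # x: (B, tokens, dim)
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))
        return x

class PermutatorBlock(nn.Module):
    """Segment-permutation block: mixes features along height, width, and channel segments."""
    def __init__(self, dim, hw):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp_h = nn.Linear(hw, hw)
        self.mlp_w = nn.Linear(hw, hw)
        self.mlp_c = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, H, W, C) with H == W == hw
        y = self.norm(x)
        h = self.mlp_h(y.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)  # mix along height
        w = self.mlp_w(y.permute(0, 1, 3, 2)).permute(0, 1, 3, 2)  # mix along width
        c = self.mlp_c(y)                                          # mix along channels
        return x + h + w + c

class HybridCNNMLP(nn.Module):
    """Schematic of the dual-pathway CNN-MLP detector head described in the text."""
    def __init__(self, num_classes=6, roi=7, dim=512):
        super().__init__()
        self.backbone = vgg16(weights=None).features      # VGG feature extractor
        self.roi_pool = nn.AdaptiveAvgPool2d((roi, roi))   # stand-in for RoI pooling
        self.mixer = MixerBlock(num_tokens=roi * roi, dim=dim)
        self.permutator = PermutatorBlock(dim=dim, hw=roi)
        self.fuse = nn.Sequential(nn.LayerNorm(2 * dim), nn.Linear(2 * dim, dim), nn.ReLU())
        self.cls_head = nn.Linear(dim, num_classes)        # object class scores
        self.box_head = nn.Linear(dim, 4)                  # bounding-box coordinates

    def forward(self, images):                             # images: (B, 3, H, W)
        f = self.roi_pool(self.backbone(images))           # (B, 512, roi, roi)
        tokens = f.flatten(2).transpose(1, 2)              # (B, roi*roi, 512)
        a = self.mixer(tokens).mean(dim=1)                 # global token summary
        b = self.permutator(f.permute(0, 2, 3, 1)).mean(dim=(1, 2))
        fused = self.fuse(torch.cat([a, b], dim=1))
        return self.cls_head(fused), self.box_head(fused)

model = HybridCNNMLP()
scores, boxes = model(torch.randn(2, 3, 224, 224))
```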
To provide a detailed and academic exposition of the mathematical principles underpinning CNN and MLP, we delve into the foundational operations, transformations, and learning mechanisms that characterize these two pivotal architectures in the domain of deep learning.
The convolution operation is central to a CNN’s functionality. For an input $X$ and a filter $F$, the convolution output at position $(i, j)$, denoted $(X * F)(i, j)$, is computed as Equation (5).
$$S(i, j) = (X * F)(i, j) = \sum_m \sum_n X(i + m,\, j + n)\, F(m, n).$$
This equation implies that for each position i , j , the output S is the sum of element-wise products of the filter F with the input X . This operation is repeated across the entire input, producing a feature map that highlights the presence of features the filter is designed to detect.
After each convolution operation, the feature map is passed through an activation function to introduce non-linearity. The Rectified Linear Unit (ReLU) activation function is defined as Equation (6).
$$\mathrm{ReLU}(x) = \max(0, x).$$
This function retains only positive values and sets all negative values to zero. It is essential for enabling the network to model complex functions and capture patterns like edges and corners in images.
To reduce the spatial dimensions of the feature map and to make the representation smaller and more manageable, a pooling layer is used. The max pooling operation is defined as Equation (7).
$$P(i, j) = \max_{(k, l) \in \text{window}} X(i \cdot s + k,\; j \cdot s + l),$$
where $s$ denotes the pooling stride.
This equation selects the maximum value in a specified window of the feature map, effectively downsampling the feature map to retain only the most significant features, thereby providing the network with translational invariance.
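The three operations of Equations (5)-(7) can be reproduced directly on a small array; the NumPy sketch below is purely illustrative.

```python
import numpy as np

def conv2d(X, F):
    """Valid cross-correlation, Equation (5): S(i, j) = sum_m sum_n X(i+m, j+n) F(m, n)."""
    h, w = X.shape[0] - F.shape[0] + 1, X.shape[1] - F.shape[1] + 1
    S = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            S[i, j] = np.sum(X[i:i + F.shape[0], j:j + F.shape[1]] * F)
    return S

def relu(x):
    """Equation (6): keep positive values, zero out the rest."""
    return np.maximum(0, x)

def max_pool(X, k=2, s=2):
    """Equation (7): maximum over k x k windows taken with stride s."""
    h, w = (X.shape[0] - k) // s + 1, (X.shape[1] - k) // s + 1
    return np.array([[X[i * s:i * s + k, j * s:j * s + k].max() for j in range(w)] for i in range(h)])

X = np.arange(36, dtype=float).reshape(6, 6)
F = np.array([[1.0, 0.0], [0.0, -1.0]])          # toy edge-like filter
feature_map = max_pool(relu(conv2d(X, F)))        # conv -> ReLU -> max pooling
print(feature_map)
```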
In an MLP, each neuron in a layer l computes an output based on its input weights w , input features x , and bias b . The operation is defined by Equation (8):
$$y^{(l)} = \sigma\!\left(\sum_i w_i^{(l)} x_i + b^{(l)}\right).$$
This equation represents the weighted sum of the inputs plus a bias term, passed through a non-linear activation function σ , which is typically a sigmoid or ReLU function. This step allows the MLP to capture complex relationships between inputs and outputs.
The overall output of an MLP is obtained by propagating the input through multiple layers, each composed of neurons as described in Equation (8). The final output for a network with L layers is Equation (9).
$$O_L = \sigma\!\left(W_L\, \sigma\!\left(\cdots \sigma\!\left(W_2\, \sigma\!\left(W_1 X + B_1\right) + B_2\right) \cdots\right) + B_L\right).$$
This recursive equation represents the composition of functions across multiple layers, transforming the input X into a final output O L through successive applications of linear and non-linear operations.
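A direct NumPy transcription of Equations (8) and (9) follows; the layer sizes and sigmoid activation are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, weights, biases):
    """Equation (9): apply y = sigma(W x + b) layer by layer."""
    a = X
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # Equation (8) for every neuron of the layer
    return a

rng = np.random.default_rng(0)
sizes = [4, 8, 3]                # input dim 4, one hidden layer of 8 units, 3 outputs
weights = [rng.normal(size=(sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
biases = [np.zeros(sizes[i + 1]) for i in range(len(sizes) - 1)]
output = mlp_forward(rng.normal(size=4), weights, biases)
print(output)
```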
Learning in neural networks is accomplished by adjusting the weights W to minimize a loss function L . The update rule for weights in gradient descent is defined by Equation (10).
$$W^{(l)}_{\text{new}} = W^{(l)} - \eta \frac{\partial L}{\partial W^{(l)}}.$$
Similarly, the biases B are updated by Equation (11).
$$B^{(l)}_{\text{new}} = B^{(l)} - \eta \frac{\partial L}{\partial B^{(l)}}.$$
Both Equations (10) and (11) describe how the parameters of the network are adjusted in the direction that most reduces the loss. The partial derivatives $\partial L / \partial W^{(l)}$ and $\partial L / \partial B^{(l)}$ are the gradients of the loss function with respect to the weights and biases, and $\eta$ is the learning rate.
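As a worked example of Equations (10) and (11), the sketch below performs gradient-descent updates for a single linear layer under a squared-error loss, with the gradients written out analytically for this simple case.

```python
import numpy as np

def gd_step(W, b, x, y_target, lr=0.1):
    """One update W <- W - lr * dL/dW, b <- b - lr * dL/db for L = 0.5 * ||W x + b - y||^2."""
    y_pred = W @ x + b
    err = y_pred - y_target        # dL/dy
    grad_W = np.outer(err, x)      # dL/dW
    grad_b = err                   # dL/db
    return W - lr * grad_W, b - lr * grad_b

rng = np.random.default_rng(1)
W, b = rng.normal(size=(2, 3)), np.zeros(2)
x = np.array([0.5, -1.0, 2.0])
y_target = np.array([1.0, -1.0])
for _ in range(100):
    W, b = gd_step(W, b, x, y_target)
print(W @ x + b)   # converges toward y_target
```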
The equations encapsulate the essence of CNNs and MLPs, where convolution operations extract spatial features, activation functions introduce non-linearity, pooling layers downsample data, and neurons in MLPs build complex mappings from inputs to outputs, all iteratively refined through backpropagation to minimize a loss function.

2.3. Experimental Scheme

Tomatoes, apples, oranges, pears, and corn were used as the test produce in the CARA identification, processing, and harvesting experiments, as depicted in Figure 4. CARA is a highly integrated structure consisting of the IAM, the DPC, and the HCRA; the IAM and HCRA are its indispensable hardware elements. The DPC coordinates the many images captured by the IAM across the HCRA’s various positions and executes a range of intricate tasks. This hardware–software integration is essential for accurately identifying, processing, and harvesting a variety of agricultural produce, such as vegetables, fruits, and cereal crops.
The IAM, located at the extremity of the HCRA, provides real-time image acquisition. It interfaces with the DPC, which processes the collected image signals and dispatches behavioral commands to the HCRA for various processing tasks. Figure 4b,c demonstrates the system’s capacity to switch smoothly between soft and rigid robotic claws to meet the harvesting needs of various fruits and vegetables. This modular approach enables the mechanical arm to be customized for different agricultural tasks and reduces the time and resources required to retrofit machinery for different crop types.
Figure 4d demonstrates the successful completion of a fruit-grasping task by a soft robotic gripper developed by the Intelligent Sensing Laboratory at China Agricultural University’s College of Engineering. The gripper, cast from liquid silicone, is mounted at the end-effector interface of the HCRA. The IAM–DPC communication link, with a power rating of 2 watts, enables the IAM to transmit image sampling data to the DPC every 0.2 s for image processing, storage, and command execution.
In response to the observations raised, it is imperative to elucidate the operational dynamics between the neural network and the robotic manipulator within our system. The primary function of the neural network, beyond mere recognition of tomatoes or other fruits, extends to assessing their suitability for grasping based on size, orientation, and ripeness. Critically, the network determines the spatial positioning of the target relative to the gripper, an essential step for facilitating precise manipulation.
To bridge the gap between image recognition and physical action, our system leverages the Robot Operating System (ROS), a robust middleware renowned for its utility in robotics applications. ROS facilitates a seamless integration of computer vision and control algorithms, enabling real-time communication between the neural network and the robotic arm. This integration is crucial for translating the spatial coordinates identified by the neural network into actionable commands that guide the gripper’s movements. By employing ROS, we ensure that the gripper approaches, aligns with, and securely grasps the fruit, accomplishing this with a level of precision that manual control could not easily replicate.
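The paper does not list its ROS node graph, so the rospy sketch below only illustrates the kind of glue described here: a node that relays a detected target position from a vision topic to an arm-command topic. The topic names and the message type are assumptions, not CARA's actual interfaces.

```python
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import PointStamped

class GraspRelay:
    """Relay detected fruit positions from the vision pipeline to the arm controller."""
    def __init__(self):
        # Topic names are illustrative assumptions, not those used by CARA.
        self.cmd_pub = rospy.Publisher("/arm/target_point", PointStamped, queue_size=1)
        rospy.Subscriber("/dpc/detected_fruit", PointStamped, self.on_detection)

    def on_detection(self, msg):
        # Forward the target position (assumed already expressed in the arm's base frame upstream).
        rospy.loginfo("Target at (%.3f, %.3f, %.3f)", msg.point.x, msg.point.y, msg.point.z)
        self.cmd_pub.publish(msg)

if __name__ == "__main__":
    rospy.init_node("grasp_relay")
    GraspRelay()
    rospy.spin()
```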
This system architecture not only enhances the accuracy of the grasping process but also exemplifies the practical application of combining advanced neural network processing with traditional robotic control to achieve efficient automated agricultural tasks.
Analyzing the motion of the robotic arm in the experimental scenario informs structural refinements that enhance stability during operation. The completeness and efficiency of image acquisition, as well as the accuracy of the CNN–MLP algorithm in processing images, are evaluated through the IAM and DPC performances. Finally, CARA’s ability to perform processing tasks is evaluated by placing various experimental materials on the table and testing the arm’s response to grasping commands, image processing, and arm motion. This sequence of experiments measures the system’s operational integrity, high degree of configurability, and component synergy, and determines its ability to operate autonomously with minimal human intervention and complete tasks accurately. Feedback on these outcomes is critical for the precision and reliability of CARA during operation, which is essential in modern precision agriculture.

3. Results and Discussion

In the context of experimental scenarios, this section offers a detailed analysis and discussion on the performance of the robotic arm within a simulation environment, the efficiency of the CNN-MLP model, and the capability of the CARA system in processing various agricultural products.

3.1. Performance Analysis of the HCRA

To optimize the design of a robotic arm for processing vegetables, fruits, and grain crops, a performance analysis is essential, and it supports the evaluations required throughout the research process. Tests conducted within a simulation environment significantly reduce costs by eliminating the need for physical prototypes or testing apparatuses; only the robotic arm model and the necessary parameter measurements are required, so accurate answers can be obtained with minimal risk and loss compared with physical testing. The simulation environment also allows the design parameters of the robotic arm to be changed rapidly, enabling identification of the optimal design, a flexibility that is difficult to achieve in real-world settings. Simulations can replicate dangerous or extreme operational scenarios that may not be safely achievable in practice, and they allow extensive, repeated testing under various working environments and conditions, ensuring statistical reliability and robustness that would often be infeasible in terms of time and cost in actual settings.
The experiment modeled the robotic arm in the Adams 2020 simulation software, incorporating its actual dimensions and mass to provide reliable and comprehensive simulation data. The relative angular acceleration collected for each joint of the robotic arm under a dynamic friction coefficient of 5 is shown in Figure 5a. The analysis was limited to the five movable joints because the joint at the actuator’s end was not used during deployment. The robotic arm demonstrated exceptional stability throughout the process, exhibiting minimal relative angular acceleration. The force changes over time for the five joints studied are displayed in Figure 5b as five distinct curves; these forces represent the loads that the robotic arm must overcome when unloaded. The torque changes in the joints of the unloaded robotic arm are stable and consistent, with no significant fluctuations. Figure 5c displays the change in torque over time, with five curves corresponding to the torque on different joints. Torque is crucial for ensuring precise and efficient movements of the robotic arm, and the variation in torque for each joint remains consistently stable, with no significant changes in numerical values.
In summary, these charts demonstrate the precise kinematic and dynamic performance of the robotic arm under specific dynamic friction coefficients. The consistent or slightly fluctuating curves indicate that the forces and torques are stable and controllable during operation. These data are crucial for evaluating the effectiveness of the robotic arm in different applications.

3.2. Performance Analysis of DPC

In accordance with experimental paradigms, a stringent analysis of the CNN coupled with the MLP model’s performance has been conducted. Images harvested by the IAM were leveraged and coupled with Class Activation Mapping (CAM) heatmap visualization, image augmentation techniques, and loss metric analysis, to assess the overall efficacy of the DPC integrated with CNN + MLP within the CARA framework.
The employment of CAM heatmaps furnishes an attention visualization of the regions within the input image that most strongly influence the convolutional neural network’s output. This analysis is integral to understanding the model’s focal points and to ensuring that it genuinely concentrates on pertinent features for accurate prediction. Figure 6a depicts the CAM heatmap visualization using tomatoes as a case study: the heatmaps overlaid on tomato images elucidate the areas the model perceives as key for identifying and classifying objects across various scenarios, and the distinct features affirm the viability of the DPC in the image processing stage within CARA. Notably, as illustrated in Figure 6b, image augmentation plays an indispensable role in the model’s robustness. By artificially expanding the dataset with transformed versions of the input images (flips, affine transformations, contrast adjustments, rotations, and brightness modifications), diversity is infused into the training regimen. This augmentation process, exemplified with tomatoes, expanded the original dataset from 800 to over 4000 images, mitigating overfitting and enhancing the model’s generalization capability. The original and augmented images together present the model with assorted perspectives, simulating the plethora of environmental conditions potentially encountered in practical applications, an essential measure to heighten IAM + DPC performance.
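An augmentation pipeline of the kind described (flips, affine transformations, rotations, and brightness/contrast changes) can be expressed with torchvision transforms as below; the parameter ranges and file path are illustrative, not those used to build the 4000-image set.

```python
from torchvision import transforms
from PIL import Image

# Illustrative augmentation pipeline: flips, affine warps/rotations, brightness/contrast jitter.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.2),
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3),
    transforms.ToTensor(),
])

img = Image.open("tomato.jpg").convert("RGB")   # placeholder path
samples = [augment(img) for _ in range(5)]       # five augmented variants of one source image
```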
In the evaluation of the DPC, a critical component of our dataset collection involved simulating real-world agricultural challenges to test the robustness of the system under diverse and adverse conditions. The dataset included scenarios with occluded branches and foliage, overlapping fruits, and varying degrees of sunlight interference, which are commonplace in dynamic agricultural environments. To enhance the complexity and realism of the experimental settings, strategic modifications such as the introduction of artificial occlusions and adjustments in image clarity were employed. These modifications were designed to mimic natural obstructions and visual impairments that typically occur in field conditions. This approach not only tested the DPC’s capability to process and analyze images under suboptimal conditions but also helped in developing a more resilient model by exposing it to a wider array of variables that affect visual perception in agricultural operations. The enriched dataset thereby facilitated a comprehensive assessment of the DPC’s performance, ensuring its effectiveness in real-world agricultural applications where such challenges are prevalent.
The analytical procedure of loss involves two principal metrics: classification loss (Figure 6c) and bounding box regression loss (Figure 6d). Classification loss appraises the model’s precision in discerning the correct object categories within the image while bounding box regression loss quantifies the accuracy in object localization. As depicted, the convergence of these two metrics evidences the model’s learning trajectory. The steady decrement in loss values across epochs signifies the model’s escalating adeptness in distinguishing and accurately positioning target objects.
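For reference, a two-part detection loss of this form typically combines cross-entropy for classification with a smooth L1 term for box regression, as in the generic PyTorch sketch below; this is not the DPC's exact loss code.

```python
import torch
import torch.nn.functional as F

def detection_loss(class_logits, class_targets, box_preds, box_targets, box_weight=1.0):
    """Classification loss (cross-entropy) plus bounding-box regression loss (smooth L1)."""
    cls_loss = F.cross_entropy(class_logits, class_targets)
    box_loss = F.smooth_l1_loss(box_preds, box_targets)
    return cls_loss + box_weight * box_loss, cls_loss, box_loss

# Toy example: 4 predictions over 6 classes, each with 4 box coordinates.
logits = torch.randn(4, 6)
labels = torch.tensor([0, 2, 5, 1])
boxes_p, boxes_t = torch.rand(4, 4), torch.rand(4, 4)
total, cls_l, box_l = detection_loss(logits, labels, boxes_p, boxes_t)
```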
The training environment for the CNN–MLP model, integral to this investigation, was configured with a robust hardware suite to ensure optimal performance and reproducibility of results. The computational core of the system consisted of an Intel i9-12900 Central Processing Unit (CPU), which operates at a frequency of 3.19 GHz, providing the necessary processing power for managing the computationally intensive tasks associated with neural network training. This was supplemented by an NVIDIA GeForce RTX 4090 Graphics Processing Unit (GPU) equipped with 24 GB of video memory, offering expansive parallel processing capabilities that are crucial for decreasing the training durations of complex deep learning models.
The setup was further enhanced with 64 GB of Random Access Memory (RAM) under a Windows 10 operating system, which facilitated the efficient handling of large-scale data operations required by advanced deep learning frameworks. PyTorch 1.12 was selected as the deep learning framework due to its flexibility and dynamic computation features, supported by an active community. The Compute Unified Device Architecture (CUDA) 11.7 was employed to maximize GPU efficiency, facilitating more effective numerical computations and faster iteration of models.
All coding and script executions for the model were conducted in the Visual Studio Code (VSCode) Integrated Development Environment (IDE), using Python 3.8. This environment was chosen for its extensive support for multiple programming extensions and tools, enhancing coding efficiency and simplifying the model development process. This carefully engineered computational framework supports the rigorous demands of the CNN-MLP model training, ensuring comprehensive utilization of available computational resources to fulfill the research objectives effectively.
The analyses of CAM heatmap scrutiny, the image augmentation process, and the diminishing trend in loss metrics collectively facilitate a comprehensive performance evaluation. The dissected descent in loss illustrates enhancements in the model’s competence to categorize and precisely situate bounding boxes around objects of interest. This decline in loss is corroborated by the CAM heatmaps, wherein focal points are aligned with pivotal areas of interest in the images, signifying the model’s alignment to salient features. The execution of the image augmentation process is a testament to the necessity of bolstering robustness in deep learning—a practice well executed herein. Overall, these insights confirm the proficiency of the CNN-MLP model in agricultural applications and lay a cornerstone for the implementation of reliable and efficient automated processes within the CARA system.

3.3. Performance Analysis of IAM

The IAM performance assessment relies heavily on the reliability of image acquisition, which is fundamental to the system’s overall functionality. An evaluation based solely on the IAM-captured images themselves may not fully capture the module’s effectiveness within the system; hence, a more refined approach was adopted to ensure a comprehensive evaluation. As Figure 7 shows, the DPC’s accuracy in identifying information related to agricultural products from the collected images allows for objective categorization and accurate classification rates for the identified target crops. This evaluation methodology provides an accurate reflection of the IAM’s adaptability and efficiency within the integrated system, where it plays a crucial role in enhancing the precision and reliability of agricultural product identification and processing.

3.4. Evaluation of CARA

CARA was successfully deployed at a farm in Shunyi, Beijing, where fruit identification tests were conducted to assess its performance with the involvement of farm staff and students participating in this study. Figure 8 demonstrates the high and stable recognition capabilities of the CNN–MLP model deployed by CARA for various vegetables, fruits, and cereal crops at different iterations, with the best performance achieved around 200 cycles. Table 3 displays the performance metrics of CARA before and after its deployment, showcasing impressive results. Additionally, Table 4 provides valuable suggestions for CARA’s improvement, as recommended by agricultural product processing personnel.
A comprehensive performance evaluation of the robotic arm was conducted to validate its operational efficiency and precision in executing tasks typical of agricultural automation. Key performance metrics included a peak operating speed of approximately 1.8 m/s and an average time to successfully grasp objects at about 3 s, indicating the system’s rapid response and processing capabilities. The end effector demonstrated exceptional repeatability with a positioning accuracy of about 0.1 mm. Over an extensive series of 200 rigorous field tests, the system achieved a 99% success rate in grasping, with only two unsuccessful attempts, highlighting its reliability and effectiveness in practical applications. The employment of a soft, non-destructive grasping mechanism significantly contributed to minimizing damage during operations, thereby preserving the integrity of delicate agricultural produce. Furthermore, the system’s average power consumption was recorded at approximately 200 watts, demonstrating an optimal balance between energy efficiency and performance output. These metrics not only confirm the robustness of the robotic arm but also underscore its suitability for deployment in settings that demand high precision and minimal impact on the products.
CARA’s micro and macro aspects were comprehensively analyzed by integrating various evaluation metrics. These included the performance of the robotic arm in a simulation environment, the efficacy of the CNN–MLP model, the IAM camera’s capability to capture images of diverse agricultural products, and the accuracy of the CARA system in grasping agricultural produce. This performance evaluation establishes CARA’s high proficiency in executing precise and intelligent tasks essential for smart agriculture. Its advanced automation and intelligence significantly enhance agricultural productivity, offering substantial improvements over traditional farming tools.

4. Conclusions

The deployment of the CARA system in smart agriculture environments signifies a transformative advancement in precision and efficiency. CARA demonstrated exceptional capability of automating a wide range of agricultural tasks, significantly enhancing productivity and sustainability. The experimental analysis conducted was thorough and extensive, confirming the system’s robust design and its application of advanced CNN-MLP algorithms. These algorithms facilitate precise task execution, adaptability across diverse crops and operational contexts, and require minimal human intervention. The precision of the CARA system is notably high, with its design enabling reliable performance across varied agricultural environments. Additionally, the system’s broad applicability suggests a significant potential for widespread use in different agricultural settings. We anticipate that future research will concentrate on further optimizing these algorithms to extend their applicability and incorporating autonomous decision-making processes. The successful implementation of the CARA system underscores the promising potential of integrating robotics and AI in agriculture, setting the stage for a more sustainable and efficient future in the field.

Author Contributions

Conceptualization, X.X.; Validation, F.W. (Faying Wu); Formal analysis, T.Z.; Investigation, F.W. (Fengbo Wang); Data curation, M.L. (Mingxuan Li); Writing—original draft, M.L. (Mingxuan Li); Writing—review & editing, X.X.; Visualization, M.L. (Mingzhen Li); Supervision, X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available because they are being used in another ongoing, unpublished experiment.

Acknowledgments

This research is supported by the Open Project of “Spark Task” in 2024 at the Key Laboratory of Digital Agriculture Sichuan Chongqing Joint Innovation of “Tianfu Granary” (No.5), Chinese Universities Scientific Fund, and the 2115 talent development program of China Agricultural University.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, J.J.; Chen, D.; Qi, X.D.; Li, Z.J.; Huang, Y.B.; Morris, D.; Tan, X.B. Label-efficient learning in agriculture: A comprehensive review. Comput. Electron. Agric. 2023, 215, 108412. [Google Scholar] [CrossRef]
  2. Saleem, M.H.; Potgieter, J.; Arif, K.M. Automation in Agriculture by Machine and Deep Learning Techniques: A Review of Recent Developments. Precis. Agric. 2021, 22, 2053–2091. [Google Scholar] [CrossRef]
  3. Tang, Y.C.; Chen, M.Y.; Wang, C.L.; Luo, L.F.; Li, J.H.; Lian, G.P.; Zou, X.J. Recognition and Localization Methods for Vision-Based Fruit Picking Robots: A Review. Front. Plant Sci. 2020, 11, 510. [Google Scholar] [CrossRef]
  4. Kumar, M.S.; Mohan, S. Selective fruit harvesting: Research, trends and developments towards fruit detection and localization—A review. Proc. Inst. Mech. Eng. Part C-J. Mech. Eng. Sci. 2023, 237, 1405–1444. [Google Scholar] [CrossRef]
  5. Rakhmatulin, I.; Kamilaris, A.; Andreasen, C. Deep Neural Networks to Detect Weeds from Crops in Agricultural Environments in Real-Time: A Review. Remote Sens. 2021, 13, 4486. [Google Scholar] [CrossRef]
  6. Morales-García, J.; Terroso-Sáenz, F.; Cecilia, J.M. A multi-model deep learning approach to address prediction imbalances in smart greenhouses. Comput. Electron. Agric. 2024, 216, 108537. [Google Scholar] [CrossRef]
  7. Sharma, V.; Tripathi, A.K.; Mittal, H. Technological revolutions in smart farming: Current trends, challenges & future directions. Comput. Electron. Agric. 2022, 201, 107217. [Google Scholar] [CrossRef]
  8. Hasan, M.M.; Rahman, T.; Uddin, A.; Galib, S.M.; Akhond, M.R.; Uddin, M.J.; Hossain, M.A. Enhancing Rice Crop Management: Disease Classification Using Convolutional Neural Networks and Mobile Application Integration. Agriculture 2023, 13, 1549. [Google Scholar] [CrossRef]
  9. Kong, J.L.; Xiao, Y.; Jin, X.B.; Cai, Y.Y.; Ding, C.; Bai, Y.T. LCA-Net: A Lightweight Cross-Stage Aggregated Neural Network for Fine-Grained Recognition of Crop Pests and Diseases. Agriculture 2023, 13, 2080. [Google Scholar] [CrossRef]
  10. He, Z.; Ma, L.; Wang, Y.C.; Wei, Y.Z.; Ding, X.T.; Li, K.; Cui, Y.J. Double-Arm Cooperation and Implementing for Harvesting Kiwifruit. Agriculture 2022, 12, 1763. [Google Scholar] [CrossRef]
  11. Ma, Y.H.; Feng, Q.C.; Sun, Y.H.; Guo, X.; Zhang, W.H.; Wang, B.W.; Chen, L.P. Optimized Design of Robotic Arm for Tomato Branch Pruning in Greenhouses. Agriculture 2024, 14, 359. [Google Scholar] [CrossRef]
  12. Vrochidou, E.; Tsakalidou, V.N.; Kalathas, I.; Gkrimpizis, T.; Pachidis, T.; Kaburlasos, V.G. An Overview of End Effectors in Agricultural Robotic Harvesting Systems. Agriculture 2022, 12, 1240. [Google Scholar] [CrossRef]
  13. Amin, A.; Wang, X.C.; Zhang, Y.N.; Li, T.H.; Chen, Y.Y.; Zheng, J.M.; Shi, Y.Y.; Abdelhamid, M.A. A Comprehensive Review of Applications of Robotics and Artificial Intelligence in Agricultural Operations. Stud. Inform. Control 2023, 32, 59–70. [Google Scholar] [CrossRef]
  14. Gonzalez-de-Santos, P.; Fernández, R.; Sepúlveda, D.; Navas, E.; Emmi, L.; Armada, M. Field Robots for Intelligent Farms-Inhering Features from Industry. Agronomy 2020, 10, 1638. [Google Scholar] [CrossRef]
  15. Zimmer, D.; Plaščak, I.; Barač, Ž.; Jurišić, M.; Radočaj, D. Application of Robots and Robotic Systems in Agriculture. Teh. Glas.-Tech. J. 2021, 15, 435–442. [Google Scholar] [CrossRef]
  16. Cheng, C.; Fu, J.; Su, H.; Ren, L.Q. Recent Advancements in Agriculture Robots: Benefits and Challenges. Machines 2023, 11, 48. [Google Scholar] [CrossRef]
  17. Xie, D.B.; Chen, L.; Liu, L.C.; Chen, L.Q.; Wang, H. Actuators and Sensors for Application in Agricultural Robots: A Review. Machines 2022, 10, 913. [Google Scholar] [CrossRef]
  18. Atefi, A.; Ge, Y.F.; Pitla, S.; Schnable, J. Robotic Detection and Grasp of Maize and Sorghum: Stem Measurement with Contact. Robotics 2020, 9, 58. [Google Scholar] [CrossRef]
  19. Din, A.; Ismail, M.Y.; Shah, B.B.; Babar, M.; Ali, F.; Baig, S.U. A deep reinforcement learning-based multi-agent area coverage control for smart agriculture. Comput. Electr. Eng. 2022, 101, 108089. [Google Scholar] [CrossRef]
  20. Mohammed, E.A.; Mohammed, G.H. Robotic vision based automatic pesticide sprayer for infected citrus leaves using machine learning. Prz. Elektrotechniczny 2023, 99, 98–101. [Google Scholar] [CrossRef]
  21. Ren, G.Q.; Lin, T.; Ying, Y.B.; Chowdhary, G.; Ting, K.C. Agricultural robotics research applicable to poultry production: A review. Comput. Electron. Agric. 2020, 169, 105216. [Google Scholar] [CrossRef]
  22. Yu, Z.P.; Lu, C.H.; Zhang, Y.H.; Jing, L. Gesture-Controlled Robotic Arm for Agricultural Harvesting Using a Data Glove with Bending Sensor and OptiTrack Systems. Micromachines 2024, 15, 918. [Google Scholar] [CrossRef] [PubMed]
  23. Magalhaes, S.A.; Moreira, A.P.; dos Santos, F.N.; Dias, J. Active Perception Fruit Harvesting Robots—A Systematic Review. J. Intell. Robot. Syst. 2022, 105, 14. [Google Scholar] [CrossRef]
  24. Wang, Z.H.; Xun, Y.; Wang, Y.K.; Yang, Q.H. Review of smart robots for fruit and vegetable picking in agriculture. Int. J. Agric. Biol. Eng. 2022, 15, 33–54. [Google Scholar] [CrossRef]
  25. Adamides, G.; Edan, Y. Human-robot collaboration systems in agricultural tasks: A review and roadmap. Comput. Electron. Agric. 2023, 204, 107541. [Google Scholar] [CrossRef]
  26. Chen, S.X.; Noguchi, N. Remote safety system for a robot tractor using a monocular camera and a YOLO-based method. Comput. Electron. Agric. 2023, 215, 108409. [Google Scholar] [CrossRef]
  27. Ju, C.; Kim, J.; Seol, J.; Il Son, H. A review on multirobot systems in agriculture. Comput. Electron. Agric. 2022, 202, 107336. [Google Scholar] [CrossRef]
  28. Wang, T.H.; Chen, B.; Zhang, Z.Q.; Li, H.; Zhang, M. Applications of machine vision in agricultural robot navigation: A review. Comput. Electron. Agric. 2022, 198, 107085. [Google Scholar] [CrossRef]
  29. Zhang, C.; Noguchi, N. Development of a multi-robot tractor system for agriculture field work. Comput. Electron. Agric. 2017, 142, 79–90. [Google Scholar] [CrossRef]
  30. Cheein, F.A.A.; Carelli, R. Agricultural Robotics: Unmanned Robotic Service Units in Agricultural Tasks. IEEE Ind. Electron. Mag. 2013, 7, 48–58. [Google Scholar] [CrossRef]
  31. Droukas, L.; Doulgeri, Z.; Tsakiridis, N.L.; Triantafyllou, D.; Kleitsiotis, I.; Mariolis, I.; Giakoumis, D.; Tzovaras, D.; Kateris, D.; Bochtis, D. A Survey of Robotic Harvesting Systems and Enabling Technologies. J. Intell. Robot. Syst. 2023, 107, 21. [Google Scholar] [CrossRef] [PubMed]
  32. Emmi, L.; Gonzalez-de-Santos, P. Mobile robotics in arable lands: Current state and future trends. In Proceedings of the European Conference on Mobile Robots (ECMR), Paris, France, 6–8 September 2017. [Google Scholar]
  33. Fue, K.G.; Porter, W.M.; Barnes, E.M.; Rains, G.C. An Extensive Review of Mobile Agricultural Robotics for Field Operations: Focus on Cotton Harvesting. Agriengineering 2020, 2, 150–174. [Google Scholar] [CrossRef]
  34. Gil, G.; Casagrande, D.E.; Cortés, L.P.; Verschae, R. Why the low adoption of robotics in the farms? Challenges for the establishment of commercial agricultural robots. Smart Agric. Technol. 2023, 3, 100069. [Google Scholar] [CrossRef]
  35. Lytridis, C.; Kaburlasos, V.G.; Pachidis, T.; Manios, M.; Vrochidou, E.; Kalampokas, T.; Chatzistamatis, S. An Overview of Cooperative Robotics in Agriculture. Agronomy 2021, 11, 1818. [Google Scholar] [CrossRef]
  36. Qiao, Y.L.; Valente, J.; Su, D.; Zhang, Z.; He, D.J. AI, sensors and robotics in plant phenotyping and precision agriculture. Front. Plant Sci. 2022, 13, 1064219. [Google Scholar] [CrossRef]
  37. Sparrow, R.; Howard, M. Robots in agriculture: Prospects, impacts, ethics, and policy. Precis. Agric. 2021, 22, 818–833. [Google Scholar] [CrossRef]
  38. Joseph, S.B.; Dada, E.G.; Abidemi, A.; Oyewola, D.O.; Khammas, B.M. Metaheuristic algorithms for PID controller parameters tuning: Review, approaches and open problems. Heliyon 2022, 8, e09399. [Google Scholar] [CrossRef]
  39. Shah, P.; Agashe, S. Review of fractional PID controller. Mechatronics 2016, 38, 29–41. [Google Scholar] [CrossRef]
  40. Kusrini, K.; Suputa, S.; Setyanto, A.; Agastya, A.; Priantoro, H.; Chandramouli, K.; Izquierdo, E. Data augmentation for automated pest classification in Mango farms. Comput. Electron. Agric. 2020, 179, 105842. [Google Scholar] [CrossRef]
  41. Li, W.H.; Yu, X.; Chen, C.; Gong, Q. Identification and localization of grape diseased leaf images captured by UAV based on CNN. Comput. Electron. Agric. 2023, 214, 108277. [Google Scholar] [CrossRef]
  42. Lu, J.; Hu, J.; Zhao, G.N.; Mei, F.H.; Zhang, C.S. An in-field automatic wheat disease diagnosis system. Comput. Electron. Agric. 2017, 142, 369–379. [Google Scholar] [CrossRef]
  43. Rauf, H.T.; Lali, M.I.U.; Zahoor, S.; Shah, S.Z.H.; Rehman, A.; Bukhari, S.A.C. Visual features based automated identification of fish species using deep convolutional neural networks. Comput. Electron. Agric. 2019, 167, 105075. [Google Scholar] [CrossRef]
  44. Tetila, E.C.; Machado, B.B.; Astolfi, G.; Belete, N.A.D.; Amorim, W.P.; Roel, A.R.; Pistori, H. Detection and classification of soybean pests using deep learning with UAV images. Comput. Electron. Agric. 2020, 179, 105836. [Google Scholar] [CrossRef]
  45. Zhang, J.J.; Ma, Q.; Cui, X.L.; Guo, H.; Wang, K.; Zhu, D.H. High-throughput corn ear screening method based on two-pathway convolutional neural network. Comput. Electron. Agric. 2020, 175, 105525. [Google Scholar] [CrossRef]
Figure 1. Design of the CARA architecture for Smart Agriculture. (a) The structure and configuration of CARA. (b) The structure and configuration of IAM with DPC. (c) Actual demonstration of tree pest and disease detection.
Figure 2. Design of the HCRA. (a) The dimension diagram of HCRA. (b) The SolidWorks modeling of HCRA. (c) The Adams modeling of HCRA. (d) The schematic of HCRA’s workspace. (e) The states of HCRA in different poses.
Figure 3. The structural diagram of the deep learning algorithm deployed in the DPC. (a) The structural diagram of the CNN. (b) The structural diagram of the MLP.
Figure 4. Demonstration of practical application deployment. (a) Schematic of CARA application deployment. (b) Demonstration of soft robotic grippers. (c) Demonstration of hard robotic grippers. (d) Schematic representation of CARA’s working scenarios.
Figure 5. Performance evaluation of the HCRA. (a) Performance testing of the HCRA’s joint relative angular acceleration. (b) Performance testing of the forces on the HCRA’s joints. (c) Torque performance testing of the HCRA’s joints.
Figure 6. Performance testing of DPC. (a) CAM heatmap representation of the dataset. (b) Image augmentation of the dataset. (c) Classification loss. (d) Bounding box regression loss.
Figure 7. Display of IAM performance (monitor viewing angle).
Figure 8. MATLAB statistical analysis of CARA performance.
Table 1. Hardware parameters of the robotic arm.
Degrees of freedom (DOF): 6
Maximum operational radius (mm): 625
Weight (kg): 16
Load (kg): 3
Mounting diameter (mm): φ140
Repeatability (mm): ±0.02
End effector velocity (m/s): ≤1.9
Rated power (W): 150
Peak power (W): 1000
Working temperature range (°C): 0–50
Working environment humidity: 25–90%
Installation method: arbitrary angle
Protection level: IP54
Table 2. Parameters of each joint of the robotic arm.
Joint 1: motion limit ±360°, maximum angular velocity 178°/s
Joint 2: motion limit ±360°, maximum angular velocity 178°/s
Joint 3: motion limit ±360°, maximum angular velocity 178°/s
Joint 4: motion limit ±360°, maximum angular velocity 237°/s
Joint 5: motion limit ±360°, maximum angular velocity 237°/s
Joint 6: motion limit ±360°, maximum angular velocity 237°/s
Table 3. Performance pre and post CARA deployment.
1. Grasping or processing capabilities. Before implementation: null. After implementation: precise grasping or processing.
2. Image acquisition functionality. Before implementation: null. After implementation: efficient and rapid image extraction.
3. Image processing capabilities. Before implementation: null. After implementation: accurate image processing.
4. Image-integrated robotic arm control functionality. Before implementation: null. After implementation: precise and stable control.
5. High-configurability end-effector motion capabilities. Before implementation: null. After implementation: motion stability and high configurability.
Table 4. Suggestions for CARA improvements.
1. Further enhance the stability and robustness of robotic arm movements (robotic arm functionality).
2. Enrich the variety and functionality of end-effectors (robotic arm functionality).
3. Further improve recognition frequency and response rate (image recognition capabilities).
4. Expand CARA’s dimensions and functionalities (full system performance).
5. Further broaden the recognizable range of agricultural products (full system performance).
