Crop Recommendation Systems Based on Soil and Environmental Factors Using Graph Convolution Neural Network: A Systematic Literature Review

Abstract: Data-driven approaches and resource management aimed at improving yield are becoming increasingly common in agriculture as technology progresses. Based on a broad variety of environmental variables, this research compares two graph-based crop recommendation algorithms, GCN and GNN. Our methods select the optimal crop for a season based on nitrogen, potassium and phosphorus levels, as well as temperature, humidity, soil pH and rainfall. We model the dataset with GCN and GNN, both of which handle graph-structured data well. We use supervised learning, structuring the input information as nodes in a graph whose edges reflect plausible feature relationships, to predict the optimal crop for the given environmental conditions. In our experiment, the graph is created during data preprocessing. Crop recommendation effectiveness is assessed using F1-score, recall, accuracy and precision for both models. To prevent overfitting and ensure generalizability, we employ k-fold cross-validation. The comparison of GCN and GNN shows the strengths and weaknesses of each model. Owing to its focus on graph convolution and feature aggregation, GCN captures localized connections in the feature graph better than GNN, whereas GNN is competitive in situations that require broader feature interactions. This research advances graph-based models in agriculture and highlights their potential to enhance precision agriculture. We emphasize choosing the graph-based model according to the nature of the dataset and its inherent links in order to optimize crop management and resource allocation.


Introduction
India is a leading producer of agricultural goods. The agriculture industry employs 58% of the total Indian population and contributes 17% to the country's GDP. Crops rely on a multitude of factors, including the type of soil, amount of rainfall and sunlight, irrigation, fertilizer use, insect presence, and land preparation [1,2]. One of the most frequent challenges that Indian farmers must overcome is choosing crops in accordance with the terrain and the climate [3]. Because climatic conditions and soil characteristics have a direct impact on crop yield, it is necessary to develop crop management practices that consider the suitability of the site and the soil [4]. Weather and agriculture are closely intertwined; therefore, it is essential to adapt productively to changing climatic trends. Using climate-smart agricultural practices may help increase productivity and produce quality crops [5].
Precision agriculture has recently brought about significant advances in the world of agriculture, with an emphasis on irrigation systems, fertilization, crop monitoring and yield prediction [6]. Choosing the appropriate crop in relation to location-specific soil factors and climatic conditions is also vital for enhancing production [7]. Therefore, farmers must be equipped with tools that allow them to choose the crop best suited to the region's unique meteorological and soil conditions [8]. The development of a crop recommendation system using deep learning techniques is illustrated in Figure 1. In developing nations, the use of machine learning for agricultural planning has resulted in applications such as crop recommendation, crop disease diagnosis and fertilizer management [9]. Farmers would benefit from crop recommendation systems that consider location-specific factors. The research described in this article aims to create a recommendation algorithm that suggests the highest-yielding crop based on the terrain and climate factors unique to a particular region [10]. In this paper, the graph convolution neural network model was used to develop a crop recommendation system that depends on terrain and environmental factors [11].
Eng. Proc. 2023, 58, 97

Background
In this section, we discuss the proposed model, the graph convolution network (GCN), and the existing model, the graph neural network (GNN), along with their architectures.

Graph Convolution Network (GCN)
Graph-based neural networks, an extension of deep neural network models designed for data with inherent graph structures, have seen a surge in popularity in recent years. This trend is particularly notable in conjunction with the field of link prediction [12,13].
Consider a weighted undirected graph $G$, represented by its adjacency matrix $A$, where the entry $A(i, j)$ in the $i$th row and $j$th column is the weight of the edge $(i, j)$. The degree matrix, $D$, is the diagonal matrix defined as follows:

\[ D_{ii} = \sum_{j} A_{ij} \]

The symmetric normalized Laplacian matrix of the graph $G$ is defined as follows:

\[ L = I_n - D^{-1/2} A D^{-1/2} \]

As a real symmetric positive semidefinite matrix, $L$ may be decomposed as follows:

\[ L = U \Lambda U^{T} \]

where $U = (u_0, u_1, \ldots, u_{n-1})$ is the matrix of eigenvectors and $\Lambda = \mathrm{diag}(\lambda_0, \lambda_1, \ldots, \lambda_{n-1})$ is the diagonal matrix of the corresponding eigenvalues. A graph signal is $x = (x_0, x_1, \ldots, x_{n-1}) \in \mathbb{R}^n$. The Fourier transform of the graph signal, $x$, is shown below.

\[ \hat{x} = U^{T} x \]

The convolution of two signals, $x$ and $g$, is calculated as follows:

\[ x * g = U\left( (U^{T} x) \odot (U^{T} g) \right) \]

where $\odot$ denotes the element-wise product. If $g_\theta = \mathrm{diag}(U^{T} g)$ is used as a filter for a graph signal, $x$, the graph convolution may be defined as follows:

\[ x * g_\theta = U g_\theta U^{T} x \]

However, because of the matrix-vector multiplication with $U$, the model's computational cost is $O(n^2)$, which is rather high. A $K$-degree polynomial filter is used in the convolutional layer of a model known as ChebNets to address this issue. The model's $K$-th order polynomial filter for the spectrum is written as follows.

\[ g_\theta(\Lambda) = \sum_{k=0}^{K} \theta_k \Lambda^{k} \]

In order to ensure spatial locality, the $K$-order polynomial filter of the spectrum is represented in the node domain as an aggregate of $K$-order neighborhoods, and the number of filter parameters is also kept to $O(K) = O(1)$ [16,17]. To further reduce computing complexity, the model uses the Chebyshev polynomials, which are computed recursively as $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$, where $T_0(x) = 1$ and $T_1(x) = x$. As a result, the convolution of the filter and the graph signal, $x$, is obtained as follows:

\[ x * g_\theta \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{L})\, x, \qquad \tilde{L} = \frac{2}{\lambda_{\max}} L - I_n \]

In order to achieve numerical stability, the adjacency matrix, $A$, is modified to produce $\tilde{A}$, which yields a combined convolutional layer that is more straightforward.

\[ Z = f\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta \right) \]

where $\tilde{A} = I + A$ and $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$, $X$ is the matrix of node features, $f(\cdot)$ is the activation function and $\Theta$ is the matrix of the filter parameters. The GCN architecture and algorithm build on this layer.
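As a rough illustration of the layer described above, the following NumPy sketch adds self-loops to the adjacency matrix, applies the symmetric degree normalization and performs one propagation step. The toy path graph, the feature values, the identity weight matrix and the choice of ReLU as the activation f are assumptions made for illustration only; they are not taken from the experiments in this paper.

```python
import numpy as np


def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a symmetric, non-negative A."""
    A_tilde = A + np.eye(A.shape[0])      # add self-loops: A~ = I + A
    deg = A_tilde.sum(axis=1)             # D~_ii = sum_j A~_ij (>= 1 thanks to self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_hat: np.ndarray, X: np.ndarray, Theta: np.ndarray) -> np.ndarray:
    """One propagation step with ReLU as the (assumed) activation f."""
    return np.maximum(A_hat @ X @ Theta, 0.0)


# Toy path graph 0 - 1 - 2 with two made-up features per node.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
Theta = np.eye(2)  # identity weights, so the output is just the aggregated features

A_hat = normalized_adjacency(A)
Z = gcn_layer(A_hat, X, Theta)
print(Z.round(3))
```

Each output row is a degree-weighted average of a node's own features and those of its neighbors, which is the localized aggregation that the polynomial filters approximate.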

Graph Neural Network
In 2005, a unique neural network model was created that demonstrated the capability to handle graph-structured data. This model is known as the graph neural network. The objective of graph neural networks (GNNs) is to develop effective deep learning techniques for non-Euclidean spaces [18,19].
The following is an introduction to the relevant concepts. The input graph is $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the collection of nodes and $E = \{(i, j) \mid v_i \text{ is adjacent to } v_j\}$ is the collection of edges. $x_i$ indicates the feature vector of node $v_i$, and $X_V = \{x_1, x_2, \ldots, x_n\}$ is the set of all nodes' feature vectors. $x_{(i,j)}$ denotes the feature vector of edge $(i, j)$, and $X_E = \{x_{(i,j)} \mid (i, j) \in E\}$ is the collection of all edge feature vectors.
In a graph neural network model, the input graph $G$ is turned into a dynamic graph $G^t$, where $t = 1, 2, \ldots, T$ denotes time and $h_i^t$ represents the state vector of node $v_i$ at time $t$, which depends on the graph $G^{t-1}$. The update equation is as follows:

\[ h_i^{t} = f_w\left( x_i,\; x_{co(i)},\; h_{ne(i)}^{t-1},\; x_{ne(i)} \right) \]

where $f_w(\cdot)$ represents the local transformation function with parameters $w$; $x_{ne(i)}$ is the set of feature vectors of all nodes adjacent to node $v_i$; $x_{co(i)}$ is the set of feature vectors of all edges linked to node $v_i$; and $h_{ne(i)}^{t-1}$ is the collection of state vectors of all nodes adjacent to node $v_i$ at the previous time step [20].
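The iterative state update can be sketched as follows. This is a minimal illustration under strong assumptions: the local transformation f_w is taken to be a single tanh layer, edge features are ignored, neighbor states are averaged, and the weight matrices W_x and W_h are hypothetical random or fixed values rather than trained parameters. Keeping W_h small makes the update a contraction, so the states settle to a fixed point.

```python
import numpy as np


def gnn_states(A, X, W_x, W_h, T=20):
    """Iterate h_i^t = tanh(W_x x_i + W_h * mean_{j in ne(i)} h_j^{t-1}) for T steps."""
    n = A.shape[0]
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # guard against isolated nodes
    H = np.zeros((n, W_h.shape[0]))                       # initial states h^0 = 0
    for _ in range(T):
        neighbor_mean = (A @ H) / deg                     # aggregate neighbor states from t-1
        H = np.tanh(X @ W_x.T + neighbor_mean @ W_h.T)    # simplified stand-in for f_w
    return H


rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)      # toy path graph
X = rng.normal(size=(3, 4))                 # made-up node features
W_x = 0.5 * rng.normal(size=(2, 4))         # hypothetical input weights
W_h = 0.1 * np.eye(2)                       # small state weights keep the map contractive

H = gnn_states(A, X, W_x, W_h, T=20)
print(H.round(3))
```

In a full model, the converged states would be passed to an output function and the parameters would be learned by backpropagation; both steps are omitted here.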

Literature Review
One study introduces a straightforward yield forecasting system designed for the convenience of farmers. The proposed solution takes the form of a smartphone app that informs farmers about the many factors influencing crop yield. Various machine learning methods, including SVM, ANN, RF, MLR and KNN, are used to estimate agricultural production. The random forest method had the highest accuracy, at 95% [21].
Another study introduces Agro DSS, a system that connects agricultural systems with modern decision support. Its tools include predictive modeling, accuracy assessment, time series grouping, decomposition and structural change detection. Users may apply them to forecast simulated situations and to understand domain relationships or interconnections [22].
Agro Consultant is a smart system that helps farmers in India choose crops based on the sowing season, the geographical position of the farm, soil properties and environmental variables such as climate and precipitation. The Multi-Label Classification (MLC) model was compared with KNN and random forest and was found to give better predictions than those existing models [23].
A recommendation system based on a majority voting ensemble of random tree, CHAID, K-nearest neighbor and naive Bayes learners has been described for suggesting crops from site-specific parameters, and it is reported to be effective and accurate. This system uses data on soil features, soil types and crop yield to guide farmers in choosing the right crop [24].
The authors of [25] developed a crop recommendation system based on soil characteristics, employing ensemble models with majority voting over learners such as K-nearest neighbor and naive Bayes. This approach aims to select crops with high efficiency and precision. These algorithms assess agricultural productivity under given weather conditions using statistical data such as environmental factors, agricultural production and state- or district-level crop records to provide classification results [25].
Another article discusses AI-driven precision agriculture and an ML-powered, cloud-based recommendation engine that helps farmers grow crops based on data. Extreme gradient boosting, decision tree, random forest, KNN and support vector machine (SVM) methods are tested to find the best machine learning (ML) prediction method for the cloud-based recommendation platform. The advancement and widespread use of free and open-source precision agriculture solutions contribute to the cultivation and acceptance of high-quality crops [26].
A further study aims to solve the problem of choosing optimal crops by creating a machine learning-based recommendation system combined with image processing. It compared KNN, XGBoost, random forest and neural network-based image augmentation methods and found that XGBoost outperformed the other models. The authors report that the resulting model is sufficiently accurate [27].
A review of deep convolutional neural networks (CNNs) gives a thorough overview of the most recent research projects using them for plant phenotyping applications. It examines in particular how different CNN architectures are used to evaluate postharvest quality, monitor plant growth and measure plant stress, and it closes with a number of suggestions for further investigation into the use of CNN architectures for plant phenotyping [28].
Another study demonstrated that, in addition to case-specific optimization of irrigation and drainage management, combinations of soil amendments, conditioners and residue management may significantly increase crop yields while reducing soil salinity. These findings indicate that the higher yields needed to expand and maintain agricultural output may also be obtained via conservation agriculture [29].
A final article covers four topics: (1) the effect of conventional and unconventional cropping practices on soil health in agrosystems; (2) the evolution of the plant-microbe-soil complex and the biochemical mechanisms responsible for soil health under the pressure of agriculture; (3) changes in the notion of soil quality and health in agrosystems in recent decades and the key indicators currently used for evaluating soil health; and (4) the problems in agroecosystems that affect soil health [30].

Dataset
In this article, we used the data set available at https://www.kaggle.com/datasets/siddharthss/crop-recommendation-dataset (accessed on 21 August 2023). This dataset was built by augmenting datasets of rainfall, climate and fertilizer data available for India. It allows users to build a predictive model that recommends the most suitable crops to grow on a particular farm based on various parameters.
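A minimal sketch of reading such a file into feature vectors and crop labels is given below. The column names (N, P, K, temperature, humidity, ph, rainfall, label) are an assumption based on the variables listed in the abstract and should be checked against the downloaded file; the two rows are made-up placeholders that stand in for the real data.

```python
import csv
import io

# Assumed column names; verify them against the actual CSV header.
FEATURES = ["N", "P", "K", "temperature", "humidity", "ph", "rainfall"]
TARGET = "label"


def load_rows(handle):
    """Parse CSV rows into (feature_vector, crop_label) pairs."""
    reader = csv.DictReader(handle)
    missing = [c for c in FEATURES + [TARGET] if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [([float(row[c]) for c in FEATURES], row[TARGET]) for row in reader]


# Two made-up rows used in place of the real file.
sample = io.StringIO(
    "N,P,K,temperature,humidity,ph,rainfall,label\n"
    "90,42,43,20.8,82.0,6.5,202.9,rice\n"
    "20,67,20,18.1,60.4,5.9,105.2,kidneybeans\n"
)
rows = load_rows(sample)
print(len(rows), rows[0][1])
```

With the real file, the same function could be called on an opened file handle, for example `load_rows(open(path, newline=""))`, where `path` is wherever the CSV was saved.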

Data Preparation to Train GCN
Data preparation for training a graph convolutional network (GCN) entails several critical steps to ensure that the model can learn successfully from the provided variables, which include edges, features and targets.
The creation of the graph structure is the initial stage of data preparation. It involves identifying and organizing the edges, which describe the connections among the nodes in the network. To do this, an adjacency matrix must be created to reflect the relationships between nodes. Additionally, node features need to be gathered. These features record data about each node and provide the GCN with its input. To ensure proper information flow in the graph, the rows of the node feature matrix must be in the same order as the nodes in the adjacency matrix.
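The text does not specify how edges are derived from the tabular data, so the sketch below uses one common choice purely as an assumption: each sample becomes a node and is linked to its k nearest samples in feature space. The function name `knn_adjacency`, the value of k and the toy feature values are all hypothetical.

```python
import numpy as np


def knn_adjacency(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Symmetric 0/1 adjacency linking each row of X to its k nearest rows (k < n)."""
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)              # a node is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]   # indices of the k closest nodes per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(A, A.T)                   # symmetrize: the graph is undirected


# Four made-up samples with one feature each; two obvious pairs.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
A = knn_adjacency(X, k=1)
print(A)
```

Row i of `X` and row i of `A` refer to the same node, which keeps the features aligned with the adjacency matrix. The dense distance computation needs memory quadratic in the number of samples, so a larger dataset would call for a dedicated nearest-neighbor library.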
Furthermore, for supervised learning tasks, target labels or values attached to the nodes are crucial. These targets could represent continuous values for regression tasks or categories for node classification. To set the baseline for the learning process, it is essential to match the target values with their corresponding nodes.
After preparing the graph topology, node features and target values, data normalization should be considered. This stage improves training stability as well as convergence. Normalizing features and scaling target values reduce the problems caused by different magnitudes and distributions.
Finally, the data should be separated into three groups: training, validation and test. Care must be taken to keep the graph structure intact within each set, so that the interconnectedness of the data is maintained. Techniques such as stratified sampling are often used to keep the class distribution balanced, especially for node classification tasks. Figure 2 depicts the flow chart of the proposed methodology.
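One way to keep the graph intact while splitting is to leave all nodes and edges in place and select nodes with boolean masks. The sketch below assumes that approach, together with a 60/20/20 split and z-score normalization computed from training nodes only; the split ratios, function names and synthetic data are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np


def stratified_masks(y, train=0.6, val=0.2, seed=0):
    """Boolean train/val/test node masks that preserve per-class proportions."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    masks = {name: np.zeros(len(y), dtype=bool) for name in ("train", "val", "test")}
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))   # shuffle nodes of this class
        n_train = int(round(train * len(idx)))
        n_val = int(round(val * len(idx)))
        masks["train"][idx[:n_train]] = True
        masks["val"][idx[n_train:n_train + n_val]] = True
        masks["test"][idx[n_train + n_val:]] = True       # remainder goes to test
    return masks


def standardize(X, train_mask):
    """Z-score every feature using statistics from the training nodes only."""
    mean = X[train_mask].mean(axis=0)
    std = X[train_mask].std(axis=0)
    std[std == 0] = 1.0                                   # leave constant features unscaled
    return (X - mean) / std


y = np.repeat([0, 1], 10)                                 # two made-up classes, 10 nodes each
X = np.random.default_rng(1).normal(size=(20, 3))         # synthetic features
masks = stratified_masks(y)
X_std = standardize(X, masks["train"])
print({name: int(m.sum()) for name, m in masks.items()})
```

Computing the mean and standard deviation on the training nodes alone avoids leaking information from the validation and test nodes into the features.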

Model Training
After the data have been preprocessed, training the graph convolutional network (GCN) involves several steps. The GCN model has graph convolutional units in each layer. These units pass messages between neighboring nodes, allowing the model to reflect complex network interactions. During training, the model minimizes a loss function that measures the difference between predicted and target values. The backpropagation procedure computes the gradients of the loss with respect to the model's parameters, which are then updated by gradient descent or a related optimization method. Dropout or L2 regularization may be used to reduce overfitting. Splitting the training dataset into batches improves memory efficiency and convergence. A validation set is used to monitor the model's performance throughout training, to detect overfitting and to pick the best model according to the validation criteria. After training converges, the model may be tested on a separate test set to assess generalization to new data. Hyperparameter tuning, including the learning rate, the number of layers and the number of hidden units per layer, significantly affects the GCN's predictive capability and convergence behavior.
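The training loop can be sketched in a few lines for a deliberately simplified model. The example below assumes a single graph convolution followed by softmax, full-batch gradient descent, L2 regularization and selection of the best weights on the validation nodes; dropout, mini-batching and deeper layers are left out. The toy graph (two fully connected triangles with self-loops), the hyperparameter values and the function names are illustrative assumptions.

```python
import numpy as np


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)    # subtract the row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def train_one_layer_gcn(A_hat, X, y, train_mask, val_mask,
                        n_classes, lr=0.5, weight_decay=1e-3, epochs=200):
    """Fit W in softmax(A_hat @ X @ W) by full-batch gradient descent."""
    S = A_hat @ X                            # neighborhood-aggregated features
    Y = np.eye(n_classes)[y]                 # one-hot targets
    W = np.zeros((X.shape[1], n_classes))
    best_W, best_val = W.copy(), -1.0
    for _ in range(epochs):
        P = softmax(S @ W)
        # Gradient of the mean cross-entropy over training nodes.
        grad = S[train_mask].T @ (P[train_mask] - Y[train_mask]) / train_mask.sum()
        W -= lr * (grad + weight_decay * W)  # the weight_decay term is the L2 penalty
        val_acc = (softmax(S @ W)[val_mask].argmax(axis=1) == y[val_mask]).mean()
        if val_acc > best_val:               # keep the best weights seen on validation
            best_W, best_val = W.copy(), val_acc
    return best_W, best_val


# Two triangles with self-loops, already normalized: every in-block entry is 1/3.
A_hat = np.kron(np.eye(2), np.full((3, 3), 1 / 3))
X = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)   # made-up features
y = np.array([0, 0, 0, 1, 1, 1])
train_mask = np.array([True, False, False, True, False, False])
val_mask = np.array([False, True, False, False, True, False])

W, val_acc = train_one_layer_gcn(A_hat, X, y, train_mask, val_mask, n_classes=2)
pred = (A_hat @ X @ W).argmax(axis=1)
print(val_acc, pred)
```

A deep learning framework with automatic differentiation would normally replace the hand-written gradient once hidden layers and dropout are added.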

Model Evaluation
The capacity of a trained model to generalize to new data is assessed through model evaluation. Metrics such as the F1-score, precision, recall, accuracy, specificity and sensitivity reflect performance on classification tasks. Cross-validation makes the performance estimates more reliable. Confusion matrices provide a detailed, per-class view of correct and incorrect predictions. Evaluation guides deployment and improvement.
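A plain k-fold procedure can be sketched as follows. The helper names `k_fold_indices` and `cross_validate`, the choice of k = 5 and the placeholder scoring function are assumptions for illustration; in practice the scoring function would train a model on the training indices and return its metric on the held-out fold.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is held out exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


def cross_validate(score_fn, n_samples, k=5):
    """Return the mean and standard deviation of score_fn over the k folds."""
    scores = [score_fn(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return float(np.mean(scores)), float(np.std(scores))


splits = list(k_fold_indices(10, k=5))

# Placeholder score: the held-out fraction. A real score_fn would fit and evaluate a model.
mean_score, std_score = cross_validate(lambda tr, te: len(te) / 10, n_samples=10, k=5)
print(len(splits), mean_score, std_score)
```

Reporting the standard deviation alongside the mean gives a sense of how much the estimate varies between folds.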

Performance Metrics
Accuracy: A simple way to gauge performance is to look at how often the classifier predicts correctly. Accuracy is the ratio of the number of correct predictions to the total number of predictions made by the model, S.

\[ \text{Accuracy} = \frac{TP + TN}{S} \]

Here TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, so that S = TP + TN + FP + FN.

Precision: Precision is the proportion of the instances predicted as positive that are actually positive.

\[ \text{Precision} = \frac{TP}{TP + FP} \]

Recall: Recall is the ratio of correctly predicted positives to all actual positives, that is, true positives plus false negatives.

\[ \text{Recall} = \frac{TP}{TP + FN} \]

F1-score: The F1-score is the harmonic mean of the precision and recall scores.

\[ F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]

Sensitivity: Sensitivity is another name for recall. It refers to the proportion of actual positive labels that the model identifies as positive, and it may also be expressed as a percentage.

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \]

Specificity: Specificity is the proportion of actual negative labels that the algorithm correctly identifies as negative.

\[ \text{Specificity} = \frac{TN}{TN + FP} \]
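The metrics can be computed directly from confusion counts, as in the sketch below. For a multi-class problem such as crop recommendation, the counts for one class are obtained in a one-vs-rest fashion from the confusion matrix. The 2 x 2 matrix and the convention of returning 0 for an undefined ratio are assumptions chosen for illustration; the numbers are not results from this paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return the six metrics from raw confusion counts."""
    def ratio(num, den):
        return num / den if den else 0.0    # assumed convention: undefined ratios become 0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "sensitivity": recall,               # same quantity as recall
        "specificity": ratio(tn, tn + fp),   # TN / (TN + FP)
    }


def per_class_counts(cm, c):
    """One-vs-rest (tp, fp, fn, tn) for class c; cm[i][j] counts true class i predicted as j."""
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                     # true class c, predicted as something else
    fp = sum(row[c] for row in cm) - tp      # predicted c, true class is something else
    return tp, fp, fn, total - tp - fp - fn


# Made-up confusion matrix for two classes.
cm = [[5, 1],
      [2, 4]]
scores = binary_metrics(*per_class_counts(cm, 0))
print({name: round(value, 3) for name, value in scores.items()})
```

Averaging the per-class values over all classes gives macro-averaged scores, which is one common way to summarize a 22-class problem.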

Results
From Table 1, it can be seen that there are 22 classes, numbered from 0 to 21. Class 0 is rice, Class 1 is wheat, and Class 2 is maize. The samples are as follows: 1-corn, 2-chickpeas, 3-kidney beans, 4-pigeon peas, 5-moth beans, 6-mung bean, 7-black gram, 8-bean, 9-grape, 10-banana, 11-mango, 12-grapes, 13-watermelon, 14-muskmelon, 15-apple, 16-orange, 17-papaya, 18-coconut, 19-cotton, 20-jute and 21-coffee. Figure 3 illustrates the confusion matrices for the proposed graph convolutional network (GCN) and for established methods such as GNN, CNN and ANN. The GCN produces the largest number of correct predictions and a notably lower incidence of misclassifications than the other methods. Our analysis leads to the conclusion that the proposed GCN model achieves higher classification accuracy than the alternative models because it makes fewer misclassifications. Figure 4 compares the performance metrics of the proposed and existing methods.

Discussion
The GCN (graph convolutional network) model outperforms the other crop recommendation models compared, as shown by the performance metrics in Table 1. With an accuracy of 0.98, the GCN model not only performs well on this measure but also achieves consistently strong results on the other assessment criteria, including precision, recall, F1-score, specificity and sensitivity. What sets the GCN model apart is its ability to analyze agricultural data through graph topologies, thereby capturing intricate interconnections that conventional models frequently miss. Because it can translate these relationships into practical insights, the GCN model has the potential to improve agricultural decision making. It provides stakeholders with information for optimizing plans and resource allocation through individualized and informed crop suggestions. The GCN model's strength therefore goes beyond quantitative measurements. It may change the way agriculture is approached, supporting a future in which improvements in yield, long-term viability and overall production are led by data-driven intelligence. The model's ability to represent complex data structures is encouraging for the future of the agricultural sector, where innovation and pragmatism can be combined for the benefit of the sector and the security of the world's food supply.

Conclusions
The results reported in Table 1 support the GCN (graph convolutional network) model as the strongest of the compared models for crop recommendation tasks. With a 98% accuracy rate, the GCN model sets a high standard that is matched by strong performance across the other essential measures: recall, precision, F1-score, specificity and sensitivity. This study not only confirms the GCN model's advantage but also highlights its potential to transform the landscape of agricultural decision making. The GCN model's capacity to untangle the complexities of agricultural data using graph topologies gives a multidimensional view that the competing models do not offer. This distinguishing feature enables it to provide personalized and contextually appropriate crop advice at a fine level of granularity. The comparative analysis shows the GCN model's clear advantage over the competing models and establishes it as the preferred option for constructing an efficient crop recommendation system. By leveraging modern data analytics, the GCN model has the potential to optimize the use of resources, improve sustainability practices and significantly increase agricultural output. Its 98% accuracy rate attests to its robustness and highlights its potential for further improving precision agriculture. In a world of changing difficulties and agricultural needs, the GCN model's capacity to deliver insightful and precise suggestions is a source of innovation. As technology continues to shape agriculture's future, the GCN model's efficacy demonstrates its contribution to crop management and to a more environmentally friendly and productive agricultural industry.

Figure 1. Crop recommendation system using deep learning.

Figure 2. Flow chart of GCN data preparation.

Figure 3. Comparison of proposed and existing methods.

Figure 4. Comparison of performance metrics.