Knowledge Mapping of Machine Learning Approaches Applied in Agricultural Management—A Scientometric Review with CiteSpace

Abstract: With the continuous development of the Internet of Things, artificial intelligence, big data technology, and intelligent agriculture have become hot topics in agricultural science and technology research. Machine learning is one of the core topics in artificial intelligence, and its application has penetrated every aspect of human social life. In modern agricultural intelligent management and decision making, machine learning plays an important role in crop classification, crop disease and insect pest prediction, agricultural product price prediction, and other aspects of management and decision-making processes in agriculture. To detect and characterize the latest research developments on machine learning methods in agricultural management in a quantitative and visual way, the authors of this paper used CiteSpace bibliometric methods to analyze the development process and hot spots of the relevant studies. High-value references, productive authors, country and institution distributions, journal visualizations, research topics, and emerging trends were reviewed and analyzed. According to the keyword visualization and high-value references, machine learning approaches focus on sustainable agriculture, water resources, remote sensing, and machine learning methods. The research mainly focuses on six topics: learning technology, land environment, reference evapotranspiration, decision support systems for river geography, soil management, and winter wheat, while learning technology has been the most popular in recent years.


Introduction
Smart agriculture is a highly integrated product of agriculture and technology. The main applied technologies include sensor networks, big data, cloud computing, the Internet of Things, and so forth. Their main functions are to collect, process, and analyze the relevant data in agricultural production and operations so as to provide guidance and intelligent decision-making for the corresponding agricultural production. Machine learning algorithms are a common technology in the application of intelligent agriculture. Their main function is to use people's prior knowledge to build different algorithm models so as to realize the prediction of unknown things. The classification of crops, the identification and prediction of pests and diseases, and the price prediction of agricultural products are three crucial aspects for the development of smart agriculture. A large number of research results show that machine learning models (mainly including artificial neural networks, Bayesian models, deep learning, dimensionality reduction, decision trees, ensemble learning, and support vector machines) are widely used in the above three aspects [1].
Currently, smart agriculture management relies on data collected with high accuracy, and wireless sensor networks have been widely used with the Internet of Things to monitor the soil, water, and crop situation [2]. Data is collected through the Internet of Things and is then processed by machine learning techniques to increase the quantity and quality of crop production [3]. Thus, agribusiness can better monitor the status of crops and receive suggestions to help reduce pollution and pesticide use. These technologies not only help create more sustainable agriculture but also make it more productive and increase profits [4]. In addition to wireless sensor networks, remote sensing technology [5], information systems [6], and unmanned aerial vehicles [7] all need the support of machine learning methods to contribute to a smart agriculture system. The main limitation of machine learning in agriculture management is the environment, such as greenhouses and large-scale lands.
There have been some research reviews on the technical methods of agricultural management in academic circles. Hu et al. [8] reviewed the current research on operational management in precision agriculture and digital agriculture and found that most of the research was on technological design, the precision management of agricultural production, and the traceability management of product quality and safety. Sharma et al. [9] summarized the application of machine learning in agricultural supply chains and proposed that decision-makers should cultivate machine learning technology and maintain sustainable input costs to keep companies growing healthily. Jordan et al. [10] believe that machine learning methods have improved the fields of crop management, livestock management, water management, and soil management, which are interrelated and have established knowledge-based intelligent agricultural systems. Mazzia et al. [11] indicate that machine learning could be crucial to precision agriculture and could help achieve cost-effective agricultural mergers with unmanned aerial vehicles. Moreover, core machine learning approaches, such as supervised learning, unsupervised learning, and reinforcement learning, contribute to the emergence of computer vision, speech recognition, and robot control [10], which have supported many agricultural management studies. Additionally, the advantages and disadvantages of different machine learning approaches vary, and each approach has particular features and applications [12], which make them indispensable, revolutionary, and widely used in agricultural management.
The existing research has provided a rich theoretical basis for this paper. However, there is only one review article about machine learning approaches in agricultural management. It reviewed specific machine learning approaches in detail and several applications in agriculture [1]. However, this review did not reveal other important research information. Who are the famous scholars on the application of machine learning methods in agricultural management? Which countries and institutions have close exchanges? What are the research subjects and development trends? These problems need further analysis. Therefore, in order to gain a deep understanding of the current research progress of machine learning algorithms in the application of intelligent agriculture, this paper focuses on the main research results of machine learning algorithms in intelligent agricultural management. CiteSpace, a literature analysis tool, was adopted to analyze 1950 articles and 88,984 citations collected from the Web of Science. The analysis revealed six research aspects: high-value references, productive authors, distributions of countries and institutions, journal visualization, research topics, and emerging trends. It is expected that this study will provide a reference for the exploration of machine learning algorithms more suitable for the development of intelligent agriculture.

Research Methods
Scientometric analysis is an effective method to deal with data and information visualization, which can be used to detect the research frontier and research hotspots as well as to track vital changes in one area [13]. To avoid subjectivity, scientometric analysis mainly uses statistical approaches to analyze, process, and evaluate the quality and features of research materials. Based on a large number of articles in a literature database, this method is regarded as a popular way to learn about a subject, summarize its development path, and predict future trends. Scientometric analysis is implemented in many literature analysis tools, including CiteSpace.
CiteSpace, a Java-based application, was created by Chaomei Chen, a famous scholar focused on information visualization, knowledge mapping, and science frontier atlases. CiteSpace is a powerful tool for learning about a field rapidly and systematically. The principle of CiteSpace is to label co-citation clusters and then use time-sliced snapshots to form a timeline and pivotal points [14]. CiteSpace 5.7.R2 (64-bit), which is run in the Java 8 environment, was used for this paper. The new version is equipped with three central concepts: Kleinberg's burst detection, Freeman's betweenness centrality metric, and heterogeneous networks. This means that three key problems can be better solved: (1) identifying research fronts, (2) labeling specialties, and (3) detecting emerging trends and abrupt points [15]. Moreover, Professor Chaomei Chen provided us with an excellent research paradigm in regenerative medicine, demonstrating the optimistic future of this field [16]. The number of publications using CiteSpace has drastically increased in recent years, and the program can be obtained at http://cluster.ischool.drexel.edu/~cchen/citespace/download/ (accessed on 20 March 2021).
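The co-citation and time-slicing principle described above can be illustrated with a minimal Python sketch. It is not CiteSpace's implementation: the record layout (`year`, `cited`), the function name `cocitation_by_slice`, and the toy records are assumptions made only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cocitation_by_slice(records, slice_length=1):
    """Count co-cited reference pairs separately for each time slice.

    `records` is a hypothetical list of dicts with a publication `year`
    and the list of references `cited` by that publication.
    """
    slices = defaultdict(Counter)
    for record in records:
        # Slices are labelled by their first year, aligned to multiples of slice_length.
        start = record["year"] - (record["year"] % slice_length)
        refs = sorted(set(record["cited"]))
        # Two references are co-cited once for every paper that cites both.
        for pair in combinations(refs, 2):
            slices[start][pair] += 1
    return dict(slices)


# Toy data, invented for illustration only.
toy_records = [
    {"year": 2015, "cited": ["ref_A", "ref_B", "ref_C"]},
    {"year": 2015, "cited": ["ref_A", "ref_B"]},
    {"year": 2016, "cited": ["ref_B", "ref_C"]},
]
counts = cocitation_by_slice(toy_records)
```

Each slice then yields its own co-citation network, and a sequence of such snapshots is what allows changes over time to be followed.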

Data Collection
The most crucial component of a review article is data collection, and the quantity and quality of relevant article material directly determine the visualization effect and the quality of the review article. The Web of Science is one of the main citation index databases. It mainly includes the science citation index, social science citation index, and arts and humanities citation index, providing an effective platform to conduct scientific research. In order to find the relevant and proper articles, we followed specific steps to collect and select data: (1) choose the Web of Science Core Collection; (2) use the advanced search setting; (3) follow the search conditions listed in Table 1 and select the final research materials.

Visualization Results
First, in order to exactly describe the co-occurrence, relationship, and citation information, basic parameters needed to be set properly before systematically analyzing the trends and research status. The selection criteria were set as the top N strategy, which means that the top 50 levels of the most cited and high-frequency items were selected from each slice. Moreover, the fundamental value of examining map significance could be measured by two values: the degree of modularity (Q value), which is the modularity value assessing the structure of certain material, and the mean silhouette value (S value), representing the degree of clustering. If the Q value is > 0.3 and the S value is > 0.7, it means that the community structure is favorable and the clustering result is acceptable.
The results indicate that the Q value is 3.463 and the S value is 0.746, demonstrating that the visualization effect is fine [17].
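For readers unfamiliar with the modularity measure, the following sketch computes Newman's modularity Q for a small undirected network with a given partition and applies the Q > 0.3 rule of thumb stated above. The toy graph, the node names, and the function name `modularity` are assumptions for illustration; the silhouette value S is not computed here.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of an undirected, unweighted graph.

    Q = sum over communities c of [ L_c / m - (d_c / (2 m))^2 ],
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy network (hypothetical): two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("x", "y"), ("y", "z"), ("x", "z"),
             ("c", "x")]
toy_partition = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}

q_value = modularity(toy_edges, toy_partition)
structure_is_favorable = q_value > 0.3  # threshold quoted in the text
```

For this toy partition, Q = 2 × (3/7 − (7/14)²) = 5/14 ≈ 0.357, which passes the 0.3 threshold.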
The following specific functions were chosen to assess the related review results: (1) reference visualization, (2) productive authors, (3) country and institution distribution, (4) published journals, (5) keywords and their timeline trends, and (6) category visualization.

Reference Visualization
A total of nine typical and influential articles about machine learning methods applied in agricultural management were selected by citation frequency, with the threshold set to 18. These articles could, to some extent, represent a trend. The visualization result is shown in Figure 1. Among the 1950 selected articles, the work led by Berner has the highest citation frequency. Among the articles, Berner et al. [18] were the first to use a machine learning method to address agricultural and environmental problems. In this article, published in Biogeosciences in 2012, a typical machine learning approach, the random forest, was used to process and predict a large-scale dataset and solve a specific forest problem in northern Siberia. The results showed that machine learning approaches could be applied to solve difficulties in various fields, and that, compared to traditional methods, they have advantages such as low bias, the ability to handle large datasets, and resistance to overfitting. This article is significant in leading the trend of machine learning approaches applied in the agricultural field.
An excellent work published in Nature, the article "Deep learning", written mainly by LeCun, a scholar from New York University, is one of the most influential and initial works in this field; and the article led to a new approach to address speech recognition, visual object recognition, and other fields. Moreover, the principle and classification of deep learning are interpreted clearly [19]. To some extent, the deep learning neural network method has been widely used to solve specific agricultural problems.
An excellent work mainly by Kamilaris conducted a real application survey of deep learning approaches in agriculture. The advantages of deep learning, such as its high accuracy, use of simulated datasets, and shorter testing time, and the disadvantages of deep learning, such as its inability to handle large datasets and being unable to exceed the boundaries of datasets, are discussed. In addition, as a promising machine learning approach, Kamilaris thinks that deep learning would have more application functions than land cover classification and weed detection [20].
It goes without saying that the constantly increasing population and consumption will lead to a lack of food and excessive greenhouse gas emissions for decades. The excessive exploitation of land, water, energy, and fisheries will not ensure sustainable agricultural development and food safety. Godfray outlined several aspects waiting to be solved in the agriculture and food fields [21], while Tilman et al. [22] stated that the sustainable intensification of agriculture is also of great significance. Determining how to get along well with the environment will have a profound influence on our future, which motivates many scholars to further probe specific aspects with various approaches, including machine learning.
Remote sensing is one of the key technologies in agricultural management, especially in the agricultural IoT [23]. Published in the ISPRS Journal of Photogrammetry and Remote Sensing, Belgiu's article "Random forest in remote sensing: A review of applications and future directions", which expounded on the principles of the random forest approach and its application in remote sensing, has received wide attention. In addition, a comparison with other machine learning approaches and the future trends are discussed to achieve better applications and further research [24].
Elith et al. published an article named "A working guide to boosted regression trees" that introduced a strong algorithm with powerful ecological insight [25]. Boosted regression trees act as an automatic approach to recognize and predict relevant variables. Compared with traditional methods, boosted regression trees have huge predictive advantages and might be broadly used in agricultural management.
Finally, regarding evapotranspiration in arid environments, Tabari et al. stated that evapotranspiration becomes imperative when considering water resource management. Moreover, they evaluated climate data using machine learning approaches such as the Support Vector Regression, Adaptive-Network-Based Fuzzy Inference System, and Mixed Logistic Regression, and then expressed the criteria to examine the errors [26]. Similarly, the article "Extreme Learning Machines: A new approach for prediction of reference evapotranspiration", published in the Journal of Hydrology, is a key point for the use of machine learning approaches in agriculture. The authors strongly recommend the Extreme Learning Machine algorithm due to its efficiency, simplicity, speed, and compatibility. This approach proved valid to accurately solve the reference evapotranspiration problem [27].
As seen above, the leading papers are mainly classified in the following four areas: (1) Specific machine learning approaches, especially the random forest, deep learning, and extreme learning. These new approaches are now leading a hot research trend in agricultural applications. (2) Sustainable agricultural development. How to address food shortages and how to solve intensive agricultural problems are becoming increasingly significant for humans to survive well with limited resources. (3) Remote sensing. Remote sensing technology is now widely used in agricultural management, especially in the agricultural Internet of Things and precision agriculture. (4) Water resources. As the fundamental factor of agricultural operations, water resources are very important for crop yields and plant quality. Further research on water could greatly improve and push the progress.

Contributing Authors Analysis
The ten most productive authors were detected through CiteSpace and are introduced in this section to help better understand the current research status. The results are shown in Figure 2. Ravinesh Deo, a professor at the University of Southern Queensland, ranks first due to his highly cited articles, many of which were retrieved in the search. His main research fields include machine learning, algorithms, artificial intelligence, and agriculture, especially machine learning approaches. His great contributions to agriculture are reflected in several of his articles. In the article "Soil moisture forecasting by a hybrid machine learning technique: ELM integrated with ensemble empirical mode decomposition" [28], he used a hybrid machine learning algorithm to predict soil moisture, which is crucial to the agricultural system. Moreover, a multistage probabilistic machine learning model was designed to build an accurate rainfall prediction system, mitigating the persistent drought risk to agricultural production [29].
Z.M. Yaseen, who is from the National University of Malaysia, is especially good at advanced data analysis and machine learning approaches. Due to his high enthusiasm for machine learning, he has contributed greatly to the agricultural applications of machine learning approaches. Published in Agricultural Water Management, his work "Reference evapotranspiration prediction using hybridized fuzzy model with firefly algorithm: Regional case study in Burkina Faso" has had a great influence on pushing machine learning approaches into agriculture [30].
The articles of Ozgur Kisi, from Ilia State University, have been cited over 18,000 times. Moreover, he is one of the main participants of the influential masterpiece above, LeCun et al. [20]. He has achieved several great works by using machine learning approaches to solve evapotranspiration problems. His article "Modelling reference evapotranspiration using a new wavelet conjunction heuristic method: wavelet extreme learning machine vs wavelet neural networks", published in Agricultural and Forest Meteorology, offered a distinctive way to identify the best-performing model [31].
The research fields of Rei Sonobe, a scholar at Hokkaido University, Japan, are mainly focused on crops and machine learning. Three members of his team were also detected: Xiufeng Wang, Hiroshi Tani, and Nobuyuki Kobayashi. He combines remote sensing, agricultural management, and machine learning. In his article "Random forest classification of crop type using multi-temporal TerraSAR-X dual-polarimetric data", he compared two algorithms and achieved high accuracy in classifying various crops [32]. He also published several similar studies using different machine learning methods [33]. Similarly, Biswajeet Pradhan, from the University of Technology Sydney, is also keen on remote sensing and machine learning in land cover [34], water pollution [35], soil erosion [36], and natural hazards [37]. Sunil Saha from the University of Gour Banga is also an important participant in this field. Ningbo Cui, a scholar from Sichuan University, specializes in fruit quality [38], crop water resources [39], and machine learning. His work presents a mix of these topics that could lead to further research on applying machine learning.

Country and Institution Distribution
The visualization results when the threshold is set as 119 are shown in Figure 3. There are two aspects that need to be addressed. One aspect is the betweenness centrality, which reflects the importance of nodes in the visualization network. The sizes of the circles reflect the centrality in Figure 3. The other aspect is the lines between two countries, representing the collaborative relationships of these countries. Among the 1950 articles from 2000 to 2020, the United States ranks first, contributing the most to this field. Since the primary development of machine learning methods in agriculture began in the United States, the country has achieved early regulation and standards. From 2003 through 2020, the United States published 467 papers. A large contribution was from China, whose number of publications has reached 274, although its starting time was much later than those of the other main contributing countries. Australia, which ranks third in this field, accounts for merely 10% of the total publications. It is followed by Spain (144), England (133), Iran (131), and Germany (119). More details are shown in Table 2. Overall, the United States is still the pioneer, while China is second. The other countries are much lower than these two. The research centers and leaders are still in the United States and China.
Figure 4 shows the institutional distribution with a threshold of 20, which could help recognize the important institutions and learn about the correlation of different institutions. The ranks and specific institution names are listed in Table 3. The results show that America, China, and Canada have the top three positions. Although America has the most institutions, most publications are from the Chinese Academy of Sciences, China.
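To make the betweenness centrality used in the country network more concrete, the sketch below computes unnormalized Freeman betweenness for a small undirected collaboration network using Brandes' algorithm. The node names and links are hypothetical and do not come from the dataset, and the function name is illustrative; tools may additionally normalize the scores to the range 0 to 1.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes' algorithm)."""
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = dict.fromkeys(adjacency, 0)
        distance = dict.fromkeys(adjacency, -1)
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        dependency = dict.fromkeys(adjacency, 0.0)
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Every unordered pair was counted from both endpoints.
    return {node: score / 2 for node, score in scores.items()}


# Hypothetical collaboration network: one hub linked to three otherwise unconnected partners.
toy_network = {
    "hub": ["partner_1", "partner_2", "partner_3"],
    "partner_1": ["hub"],
    "partner_2": ["hub"],
    "partner_3": ["hub"],
}
centrality = betweenness_centrality(toy_network)
```

In this toy network, every shortest path between two partners passes through the hub, so the hub scores 3 (one per pair of partners) and each partner scores 0.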

Cited Journals
To systematically learn about the publication status, the cited journal networks are shown in Figure 5. The most cited journal is Agriculture Ecosystems & Environment, which has been cited 508 times. The articles identified from Science and Nature accounted for 463 citations and 426 citations, respectively. The ranks of the different influential journals are listed in Table 4.

Popular Topics and Emerging Trends
In this section, popular topics and emerging trends are analyzed through the keyword visualization results. Figure 6 and Table 5 indicate the keywords with high frequencies, including "management", "agriculture", "prediction", "classification", and some machine learning methods. To help better understand the visualization results, Table 5 gives the top 25 most frequent keywords that appeared in the 1950 articles and their 88,984 references, and each keyword has appeared more than 60 times. These keywords could represent the research trends and popular topics to a large extent.
It is evident that "agriculture" and "management" take the first and second positions, respectively. The next several keywords, ordered by frequency, are "models", "artificial neural networks", "systems", "neural networks", "classification", and "prediction", which represent machine learning approaches and some of their applications in agriculture. Next, many keywords related to specific agricultural topics, such as "land use", "water", "precision agriculture", "yield", "forest" and "vegetation", appear in the high-frequency list. As Table 5 shows, the random forest, one of the main approaches in machine learning, has attracted much attention even though it appeared later. In addition, it is easy to find that neural network approaches are widely used in agricultural fields. The application of machine learning mainly focuses on precision agriculture, prediction, classification, yield management, land use, water, forest, and vegetation. Moreover, as a key point of building smart agriculture systems, remote sensing is an inevitable trend for agricultural development.
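The keyword ranking discussed above amounts to a frequency count with a cutoff. A minimal sketch follows, assuming each record carries a hypothetical `keywords` list; the function name, the normalization to lower case, and the toy records are illustrative choices, not the procedure used by CiteSpace.

```python
from collections import Counter


def frequent_keywords(records, top_n=25, min_count=60):
    """Return up to `top_n` (keyword, count) pairs that occur at least `min_count` times."""
    counts = Counter()
    for record in records:
        # A set is used so that a keyword is counted once per article.
        counts.update({kw.strip().lower() for kw in record["keywords"]})
    return [(kw, n) for kw, n in counts.most_common(top_n) if n >= min_count]


# Toy records, invented for illustration; the cutoff is lowered to suit the tiny sample.
toy_records = [
    {"keywords": ["Agriculture", "Management"]},
    {"keywords": ["agriculture", "prediction"]},
    {"keywords": ["management", "agriculture "]},
]
top_keywords = frequent_keywords(toy_records, top_n=25, min_count=2)
```

With the defaults `top_n=25` and `min_count=60`, the same function mirrors the cutoffs quoted in the text.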
These keyword networks could also be of great importance to detect the revolutionary trends of this field. To determine the developmental path in the agricultural applications of machine learning, the timeline of burst keywords from 2000 to 2020 is shown in Figure 7, and the burst keywords are classified in six ways.
The "learning technique" (#0), initially starting with the neural network in 2000, is the key point of this field. As shown in Figure 7, all the following keywords have a strong relationship with machine learning approaches. The specific machine learning approaches include decision trees, random forests, neural networks, genetic algorithms, deep learning, and logistic regressions. In the last few years, various machine learning techniques have achieved rapid progress and have more application functions, such as pest monitoring and unmanned aerial vehicles (UAVs).
The "landscape context" (#1), a fundamental component of agricultural management, is inevitable in the further development of agricultural management. This topic mainly discusses land use and the sustainable development of basic agricultural resources, such as communities, forests, conservation, rivers, vegetation, and ecosystems. In 2004-2013, the landscape context was a hot research trend for scholars.
"Reference evapotranspiration" (#2) and the "river GEODSS" (river geographic decision support system) (#4) are two other hot research topics that have a strong relationship to water resource management. They mainly include rainfall prediction, nutrition ability, productivity, and soil. The burst period of reference evapotranspiration is from 2004 to 2013, while that of river GEODSS is from 2009 to 2010. Several machine learning approaches are applied primarily to evaluate and judge the effect of simulated results [27]. Moreover, this topic is also a significant part of sustainable development, especially in water resource management.
"Soil management" (#3) is the fourth identified direction of the 1950 articles and their references. It is clear that the major contents of this field are erosion, nitrogen, quality, and organic matter. There were two burst periods of soil management research: 2005-2006 and 2010-2011. Later, the focus of soil management changed more to environmental problems.
Finally, a noticeable directional trend is "winter wheat" (#5). Although different machine learning approaches have their advantages and disadvantages [40], it is unquestionable that the development of machine learning approaches has pushed this direction into a new trend. Additionally, sustainability, resources, crops, rice, and biomass are the main research topics in this direction. Recently, grain yields have been especially popular, which may motivate some new ideas later on.
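The burst periods reported in this section rest on burst detection, which the Research Methods section attributes to Kleinberg. As a rough illustration of the idea, the following sketch implements a two-state version of Kleinberg's batched model: each year is assigned to a baseline or a burst state by minimizing a fit cost plus a cost for entering the burst state. The parameters `s` and `gamma`, the function name, and the yearly counts are assumptions for illustration and are not the settings or data used in this study.

```python
import math


def kleinberg_two_state(relevant, totals, s=2.0, gamma=1.0):
    """Label each time step 0 (baseline) or 1 (burst) with a two-state Kleinberg automaton.

    relevant[t] = documents containing the keyword in step t
    totals[t]   = all documents in step t
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)      # baseline rate (assumes at least one hit)
    p1 = min(s * p0, 0.9999)              # elevated rate in the burst state
    rates = (p0, p1)

    def fit_cost(state, r, d):
        # Negative log-likelihood of r hits out of d under a binomial model.
        p = rates[state]
        return -(math.log(math.comb(d, r)) + r * math.log(p) + (d - r) * math.log(1 - p))

    def move_cost(i, j):
        # Entering the burst state is penalized; leaving it is free.
        return gamma * math.log(n) if j > i else 0.0

    cost = [0.0, math.inf]                # the automaton starts in the baseline state
    back = []
    for r, d in zip(relevant, totals):
        new_cost, pointers = [], []
        for j in (0, 1):
            prev = min((0, 1), key=lambda i: cost[i] + move_cost(i, j))
            new_cost.append(cost[prev] + move_cost(prev, j) + fit_cost(j, r, d))
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest state sequence backwards (Viterbi decoding).
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    states.reverse()
    return states


# Hypothetical yearly counts for one keyword out of 100 articles per year.
keyword_hits = [1, 1, 1, 10, 10, 1, 1]
articles_per_year = [100] * 7
burst_states = kleinberg_two_state(keyword_hits, articles_per_year)
```

In this toy series, the two years with ten hits are labeled as a burst, while the surrounding years stay in the baseline state.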

Category Network Visualization
The categories of the 1950 works are visualized to show the relationships of different subjects in this field. As shown in Figure 8, for machine learning approaches applied in agricultural management, there were seven main subjects: multidisciplinary agriculture, agriculture, environmental science, computer science, water ecology, ecology, and engineering. The results indicate that the method of one subject could be applied to another subject, which may bring new research trends and even create new revolutions.

Conclusions
A systematic visualization of machine learning approaches and applications in agriculture management is given in this paper. By using the reliable and powerful tool CiteSpace, the authors reviewed the current status in six ways. The research material, totaling 1950 articles and their 88,984 references, was all drawn from the Web of Science. The top N strategy was selected as the selection criterion; furthermore, the time span was set from 2000 to 2020. The outcomes are shown in the corresponding parts of this article. Moreover, the results are included in the mind map shown in Figure 9. According to the visualization results, major features can be extracted. The reference analysis shows that the majority of high-value articles focus on specific machine learning technology, water resource management, remote sensing, and sustainable agriculture. According to the productive author analysis, the main research figures are divided into two groups; one is led by Rei Sonobe, while the other is led by Ravinesh Deo, whose influence is relatively bigger. As for countries, the United States, China, and Australia are the three top research centers of the relevant research. When it comes to institutions, the Chinese Academy of Sciences has the most publications; there is no American or Australian institution among the top five. The journal Agriculture Ecosystems & Environment has become the most popular journal among all the research materials, followed by Science, Nature, Journal of Hydrology, and Computers and Electronics in Agriculture. The six hot topics detected are "learning technique" (#0), "landscape context" (#1), "reference evapotranspiration" (#2), "soil management" (#3), "river GEODSS" (river geographic decision support system) (#4), and "winter wheat" (#5), and it is noticeable that "learning technique" is the most popular direction in recent years. According to the category analysis, agricultural management with machine learning is mainly linked to agriculture, ecology, computer science, and so on.
As can be drawn from the results, machine learning is widely used in sustainable agricultural development, agricultural remote sensing technology, and water resource management, and its application will be even more widespread in the future. Moreover, the fast development of machine learning methods (neural networks, deep learning, random forests, logistic regression, etc.) in recent years will energize the synthetic application in smart and sustainable agriculture. What is more, China and the United States will remain the two most important contributors in this field for a long time. The Chinese Academy of Sciences, the University of Tehran, Wageningen University and Research, and China Agricultural University are the leaders of the relevant research with huge developmental potential. Additionally, the application of machine learning in sustainable agriculture would provide more space for precision agriculture, yield prediction, crop grading, and pest and disease management in the future.
The application of machine learning approaches could directly affect the process of smart agriculture and promote sustainable agriculture development. With the development of machine learning, increasingly more agriculture problems could be better solved. With all the research material considered, the authors provided an integrated mind map of this area and a corresponding overview. In addition, this research could update the latest research trend of this field. Further research can be conducted by extending more databases or integrating more aspects of material analysis.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable; this research did not involve humans or animals.