Oil Palm and Machine Learning: Reviewing One Decade of Ideas, Innovations, Applications, and Gaps

Machine learning (ML), with its intelligent algorithms and strong computational power, offers new technologies for precision agriculture. Oil palm, a high-value crop, is also adopting modern technologies to meet global sustainability standards. This article presents a comprehensive review of research dedicated to the application of ML in the oil palm agricultural industry over the last decade (2011–2020). A systematic review was structured to answer seven predefined research questions by analysing 61 papers selected after applying exclusion criteria. The analysed works were categorized into two main groups: (1) regression analysis used to predict fruit yield, harvest time, oil yield, and seasonal impacts and (2) classification techniques used to classify trees, fruit, disease levels, canopy, and land. Guided by the research questions, the investigation of the reviewed literature covered the yearly and geographical distribution of articles, the most widely adopted algorithms, input data, features used, and model performance evaluation criteria. Detailed quantitative–qualitative investigations revealed that ML is still underutilised for predictive analysis of oil palm. However, smart systems integrating machine vision and artificial intelligence are evolving to transform the oil palm agribusiness. This article offers an opportunity to understand the significance of ML in the oil palm agricultural industry and provides a roadmap for future research in this domain.


Introduction
Palm oil is a key source of edible vegetable oil extracted from the fruits of the oil palm tree. It has also emerged as an important feedstock and biofuel raw material [1,2]. Cultivated on millions of hectares worldwide, oil palm accounts for a notable share of world trade volume. The oil palm originated in West Africa and has become an economic mainstay for many countries. Southeast Asia, however, is considered the hub of the palm oil industry, with Indonesia and Malaysia as the major exporters [3]. The growing demand for palm oil threatens the future of rain forests, because expanding oil palm cultivation tends to replace existing forest. To overcome the negative impacts of oil palm farming and make it a key element of a sustainable future, plant science faces three major challenges. First, the global average palm oil yield of 3.5 t per hectare should be raised toward its full yield potential of around 11–18 t per hectare. Second, the tree architecture must be adapted to reduce labour intensity and enable highly mechanized harvesting. Third, oil composition should be tailored to the evolving demands of the oleochemical and fuel industries and, most importantly, of food [4]. The use of technology is therefore indispensable for meeting these challenges through the sustainable intensification of oil palm [5]. Powerful, intelligent ML methods have the potential to transform current productive agriculture into sustainable agriculture. In precision agriculture, complex tasks such […] and impartial insight into contemporary studies in the domain. The goal of this review was to analyse major progress, current trends, research gaps, and pioneering concepts in order to stimulate research on oil palm using modern techniques. The review was intended to offer high-quality recommendations for novice researchers based on a scrupulous evaluation of the available works.
The remainder of this article is organized as follows. Section 2 provides background on oil palm and ML. Section 3 explains the review scope and the proposed methodology, including the step-by-step procedure followed during the design and implementation phases. The results are presented in Section 4, and a detailed discussion follows in Section 5. Finally, Section 6 concludes the article.

Background of Oil Palm and Machine Learning
Elaeis guineensis, commonly known as oil palm, is a single-stemmed, branchless tree that requires several years of investment and labour before it produces harvestable, oil-bearing fresh fruit bunches (FFB) [25]. Oil palm fruit yields two different vegetable oils: crude palm oil (CPO) and palm kernel oil. CPO is extracted from the pulp (mesocarp) of the fruit, while palm kernel oil comes from its seeds. Under favourable conditions, industrial oil palm produces an average of 12–18 t ha⁻¹ of raw crop annually [26]. Besides its profitability and high production, some complex characteristics distinguish oil palm from other crops. Unlike annual, biennial, and perennial crops, oil palms are permanent plants [27] that are harvested twice a month throughout their lifespan, except during the initial growing period [28,29]. Although oil palm yield shows some seasonal variation, this divergence is limited to production capacity [30]. Seasonal effects are nevertheless considered a notable contributor to yield decline. Other yield-reducing factors include limited fertilization, limited irrigation, pests, and infections. Careful, regular analysis for disease assessment, fertilization, and timely harvesting contributes to high oil palm yield. Conversely, inappropriate field management and incorrect harvesting strategies restrict oil quality and quantity [31]. Despite all efforts to enhance oil palm production, yield-reducing factors significantly influence the outcomes. These factors are variable and interdependent, but they are not unpredictable. Indeed, these sensitive issues are associated with some complicated parameters. For fertilizer, the type, adequate quantity, and application frequency require precise optimization. Likewise, harvesting time is estimated through predictive analysis [32]. Similarly, disease assessment involves classifying plants as healthy or unhealthy.
Machine learning algorithms (MLAs), which perform prediction, classification, clustering, and optimization, offer a decisive solution to such complex problems. Depending on research requirements, an ML model can be predictive and/or descriptive. A predictive model forecasts what is likely to happen in the future, whereas a descriptive model extracts knowledge from collected data to describe what happened in the past [33]. Selecting the right algorithms is key to solving the problems at hand, and the chosen tools and algorithms must be capable of handling bulk data. Aided by such intelligent algorithms, well-calculated field management strategies can boost revenue through enhanced yield and optimized expenses. In addition, oil palm, as a profitable crop, needs technology-oriented tools (to reduce labour costs) and intelligent decision-support systems (to avoid the risk of human error).
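The following Python sketch contrasts the two model types on a short series of annual yields. The yield values and the names `describe` and `drift_forecast` are hypothetical. The drift method extrapolates the average year-over-year change and is a deliberately simple stand-in for the predictive MLAs discussed in this review.

```python
from statistics import mean

# Hypothetical annual FFB yields (t/ha) for one plantation block; illustrative only.
yields = [12.0, 13.0, 14.5, 15.0, 16.5]

def describe(history):
    """Descriptive model: summarise what has already happened."""
    return {"mean": mean(history), "min": min(history), "max": max(history)}

def drift_forecast(history, steps=1):
    """Predictive model: extend the average year-over-year change into the future."""
    avg_change = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + avg_change * (h + 1) for h in range(steps)]

summary = describe(yields)               # looks backwards at past seasons
next_years = drift_forecast(yields, 2)   # commits to values for future seasons
```

The descriptive summary only characterises seasons that have already happened. The forecast makes claims about seasons that have not, so only the forecast can be checked later against new data.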

Review Scope and Methodology
This work reviews existing studies that apply different ML techniques to serve the oil palm agricultural industry from a problem-solving perspective. The proposed ML methods are explored with an emphasis on recent research trends. We do not intend to criticize previous contributions, nor is this a comparative analysis supporting any specific research work or algorithm. To the best of the authors' knowledge, hardly any previous work reviews the "application of MLAs in the field of oil palm" as a special case, because research on oil palm is limited. This gap marks a relatively novel, yet promising, application field. Moreover, this review considers only MLAs that offer enhanced estimates, not optimization algorithms that take a long time to find global optima. It also excludes supplementary research on the advantages and disadvantages of palm oil consumption, as well as field management strategies that operate without ML; those topics fall outside the scope of this review.
Agriculture 2021, 11, 832 4 of 26
For this review, the methodology was based on a systematic literature review approach that performs a hybrid qualitative–quantitative analysis. Combining both analyses was intended to provide a better understanding of the results by weighing the pros and cons of each technique [34,35]. This method helps to minimize bias, enhance transparency, and preserve flexibility [35,36]. In this study, we organized our empirical outcomes as answers to the "research questions" that are a central part of this research protocol. A dual-phase systematic method is proposed, in which each phase comprises five major sub-steps. The procedure is described in Figure 1.

Review Topic Selection
To conduct this review, a domain-specific, relatively unexplored research topic was first chosen after an in-depth study of the existing literature on emerging advances in oil palm research with the help of ML. Because it covers the studies published at the intersection of oil palm and ML, this review topic is suitable for analysis along several dimensions.

Defining Search String
The basic search was automated. The starting input for the search was "oil palm" AND "machine learning." Articles were retrieved, and their abstracts were read to find synonyms of the keywords. This initial input was used to obtain a broad view of the studies. From this basic search, a more complex search string was built to avoid missing relevant studies. The final search string was as follows: ("machine AND learning" OR "deep AND learning" OR "artificial AND intelligence") AND ("oil AND palm" OR "Elaeis AND guineensis"). After executing iterated combinations of the defined search strings in five databases, 1060 studies were retrieved through advanced search on the title, abstract, and keywords. The "anywhere" option was selected in Google Scholar and Springer Link. The inserted keywords ("oil palm," "Elaeis guineensis," "machine learning," "deep learning," and "artificial intelligence") were combined into search strings following the procedures defined in [37]. AND was used for the exact combination of two strings and OR for flexible searches. The initial search was based on the title, abstract, and keywords; however, the full text was considered for final selection, categorization, and information extraction.
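The boolean logic of the final search string can be mimicked locally, for example to re-check retrieved records. The Python sketch below is an approximation built on one assumption: each quoted pair, such as "machine AND learning", is treated as requiring both words anywhere in the combined title, abstract, and keywords. Individual databases do not necessarily tokenise or match text this way. The helper names `tokens`, `any_alternative`, and `matches` are hypothetical.

```python
import re

# Final search string from the text: (ML alternatives) AND (crop alternatives).
# Each alternative is a tuple of words that must all be present (AND);
# alternatives within a group are joined by OR.
ML_TERMS = [("machine", "learning"), ("deep", "learning"), ("artificial", "intelligence")]
CROP_TERMS = [("oil", "palm"), ("elaeis", "guineensis")]

def tokens(text):
    """Lower-cased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def any_alternative(words, alternatives):
    # OR across alternatives, AND within each alternative
    return any(all(w in words for w in alt) for alt in alternatives)

def matches(title, abstract="", keywords=""):
    """True if a record satisfies both parenthesised groups of the search string."""
    words = tokens(" ".join([title, abstract, keywords]))
    return any_alternative(words, ML_TERMS) and any_alternative(words, CROP_TERMS)
```

For instance, "Palm oil price survey" satisfies the crop group but not the ML group, so the outer AND rejects it.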

Defining Review Scope and Boundary
The available literature on the topic was collected within a fixed ten-year window (2011–2020). Only technical papers proposing the application of an MLA to oil palm were considered. Conversely, all articles that did not apply ML to agricultural oil palm as a special case were discarded. This narrowed the search, because ML is applied in numerous fields that do not serve our domain-specific investigation. For instance, a study that used ML methods to investigate public opinion (consumer perception) on the impacts of palm oil was not included [38].

Defining Exclusion Criteria
To include only relevant publications, several exclusion criteria were defined to set the boundaries of the review. In the systematic procedure of article selection, seven exclusion principles were followed to filter the database, as described below.
Q7: How are the applied algorithms/models evaluated to guarantee the significance and rationality of the outcomes?

Implementation Phase
This section covers the procedures used to acquire the relevant literature. First, we searched for articles containing standalone or hybrid ML models, integrated with other methods, that were applied specifically to oil palm. The selected articles included macro-level (country or state), meso-level (entire plantation), and micro-level (tree or part of a tree) oil palm assessments with multidimensional applications of statistical, ML, or deep learning (DL) algorithms.

Article Collection
The article collection phase began once the entire set of exclusion criteria had been defined. Candidate publications were identified and stored in a separate database. Articles were initially retrieved from several sources: Scopus, Science Direct, Springer Link, Web of Science, and Google Scholar. To limit the literature search, a date filter was applied to extract articles within the defined period.
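The Python sketch below outlines the collection step. It assumes each database export has been reduced to a title, a publication year, and a source name. The `Record` type and the `normalise` and `collect` functions are hypothetical. Merging on a normalised title is also an assumption added here, since the text only states that candidates were stored in a separate database. The 2011–2020 window follows the review period.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    title: str
    year: int
    source: str  # e.g. "Scopus" or "Web of Science"

def normalise(title):
    """Lower-case and collapse whitespace so one paper exported twice compares equal."""
    return " ".join(title.lower().split())

def collect(records, start=2011, end=2020):
    """Merge exports into one store, keeping only papers inside the date window."""
    store = {}
    for rec in records:
        if not start <= rec.year <= end:
            continue  # date filter: outside the review period
        store.setdefault(normalise(rec.title), rec)  # first source seen is kept
    return list(store.values())

# Hypothetical exports from two databases
exports = [
    Record("Oil palm yield prediction", 2015, "Scopus"),
    Record("oil palm  yield Prediction", 2015, "Web of Science"),
    Record("Early palm study", 2009, "Google Scholar"),
]
database = collect(exports)
```

Keying the store on the normalised title makes a paper indexed by several databases count once. Checking the year first keeps out-of-window records from ever entering the store.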

Initial Analysis
From the available database, the most significant and relevant literature was extracted after full-text reading to finalize the candidate list. Consequently, the number of remaining publications was reduced to 61, and we decided to analyse all the publications further regardless of the impact factor and the number of citations. Figure 2 describes the adopted PRISMA protocol [39].
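PRISMA-style reporting [39] amounts to recording how many records survive each screening step. The Python sketch below shows that bookkeeping. The three exclusion reasons and the toy records are hypothetical placeholders. They are not the seven criteria or the counts of this review, which went from 1060 retrieved studies to 61 included ones.

```python
def screen(records, criteria):
    """Apply ordered exclusion criteria and log how many records each one removes.

    `criteria` is a list of (reason, predicate) pairs; a predicate returns True
    when a record must be excluded.
    """
    log = [("identified", len(records))]
    kept = list(records)
    for reason, excluded in criteria:
        before = len(kept)
        kept = [r for r in kept if not excluded(r)]
        log.append((f"excluded: {reason}", before - len(kept)))
    log.append(("included", len(kept)))
    return kept, log

# Hypothetical criteria and records, for illustration only
criteria = [
    ("no ML method applied", lambda r: not r["uses_ml"]),
    ("not about oil palm", lambda r: not r["oil_palm"]),
    ("full text unavailable", lambda r: not r["full_text"]),
]
records = [
    {"id": 1, "uses_ml": True, "oil_palm": True, "full_text": True},
    {"id": 2, "uses_ml": False, "oil_palm": True, "full_text": True},
    {"id": 3, "uses_ml": True, "oil_palm": False, "full_text": True},
    {"id": 4, "uses_ml": True, "oil_palm": True, "full_text": False},
    {"id": 5, "uses_ml": True, "oil_palm": True, "full_text": True},
]
included, prisma_log = screen(records, criteria)
```

Because the criteria run in order, each record is attributed to the first criterion that removes it. This keeps the per-step counts additive, as a PRISMA flow diagram expects.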

Articles Categorization
The articles categorization phase divides the entire database into two main groups: classification and regression. The grouping is performed based on analysis type instead of technical classification. Both groups are further divided into different subcategories considering the prime research objective(s) according to the strategy stated in [40]. It should be noted that some works containing multiple research objectives are included in more than one class; however, they were counted only once to avoid miscalculations.
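One reading of "counted only once" is that per-class tallies may overlap, while the article total, and hence every proportion, is based on distinct articles. The Python sketch below illustrates that interpretation only. The article identifiers, class assignments, and the `category_shares` helper are hypothetical.

```python
from collections import Counter

def category_shares(assignments):
    """assignments maps an article id to the set of classes it was placed in.

    Class counts can sum to more than the number of articles, because a
    multi-objective article appears in several classes, so proportions are
    taken over distinct articles rather than over the summed class counts.
    """
    per_class = Counter(c for classes in assignments.values() for c in classes)
    n_articles = len(assignments)  # each article counted once
    shares = {c: n / n_articles for c, n in per_class.items()}
    return n_articles, per_class, shares

# Hypothetical assignments
assignments = {
    "A1": {"disease detection"},
    "A2": {"disease detection", "UAV canopy monitoring"},
    "A3": {"fruit yield prediction"},
}
n, per_class, shares = category_shares(assignments)
```

Under this reading, the shares can add up to more than 1. Dividing by the summed class counts would instead undercount the weight of multi-objective work in each class.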

Detailed Review
This step involves surveying the literature to answer the research questions. All articles were examined in depth individually, both to extract information and to obtain unified statistics for the entire database.

Information Extraction and Reporting
Regarding Q1, the annual distribution of the extracted literature was examined. For Q2, the scope and objectives of each study were considered. Next, the methodology stated in each reviewed article was analysed in detail to extract statistics on the applied MLAs for Q3. The affiliations of all authors and co-authors were then used to answer the fourth research question (Q4). Finally, the materials and methods, together with the results, were studied to discover the input data, key input features, and performance evaluation techniques, addressing Q5, Q6, and Q7, respectively.

Results
This section presents the outcomes of the methodology described in the preceding section. The findings cover yearly publication trends in the domain, the main problems and research objectives, the adopted algorithms, geographical information, data sources, and model performance evaluation methods. Regarding the annual distribution of published work, more research articles were published in 2014, 2016, 2018, and 2019, whereas the number of articles dropped sharply in 2015. The yearly distribution of research publications is shown in Figure 3. This outcome answers the first research question (Q1).
The outcomes addressing Q2 provide further insight into current research trends in the application domain. The results suggest that approximately 84% of the work in the literature was dedicated to classification techniques, while the remaining 16% focused on regression analysis for forecasting. The main research objectives in the classification category were tree/land detection, fruit classification, disease detection, multipurpose classification, and canopy monitoring with unmanned aerial vehicles (UAVs). Among these subcategories, multipurpose classification contained articles with diverse (miscellaneous) objectives. The objectives addressed by regression analysis included the prediction of harvest time, fruit yield, oil yield, palm oil prices, and seasonal impacts on production. The main categories and subcategories, with their approximate proportions, are presented in Figure 4.
Further assessment of the reviewed database was performed to identify the most frequently used MLAs (Q3). Across the literature, support vector machine (SVM) was the most frequently applied algorithm, followed by artificial neural network (ANN), random forest (RF), regression, classification and regression tree (CART), and convolutional neural network (CNN), respectively. The most popular MLAs and their usage frequencies in the reviewed literature are presented in Figure 5.
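The usage-frequency ranking behind Figure 5 can be reproduced with a simple tally. The Python sketch below assumes each article contributes at most one count per algorithm. That is our assumption, not a rule stated in the review. The `rank_algorithms` helper and the example article lists are hypothetical.

```python
from collections import Counter

def rank_algorithms(articles):
    """Rank algorithms by the number of articles that use them.

    Each article is a list of algorithm names; duplicates within an article
    are collapsed, so three SVM variants in one paper count as one use.
    """
    counts = Counter()
    for algorithms in articles:
        counts.update(set(algorithms))
    # most frequent first; ties broken alphabetically for a stable ranking
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical algorithm lists extracted from four articles
articles = [["SVM", "ANN"], ["SVM", "SVM", "RF"], ["ANN", "CNN"], ["SVM"]]
ranking = rank_algorithms(articles)
```

Counting per article rather than per model variant keeps a single paper that benchmarks many SVM kernels from dominating the ranking.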
To address Q4, we extracted the authors' affiliations to obtain the geographical distribution of the reviewed publications. All authors and co-authors were assumed to represent the country stated in their author profiles. The contribution of researchers from each country, measured by number of publications, is presented in Figure 6.
To address the fifth research question (Q5), the articles were examined to identify the data sources and data sets used, based on explicitly stated data sources. This investigation helps researchers and practitioners obtain information on data availability, because the performance of ML models depends heavily on the quality and quantity of the input data. Multiple providers of satellite images were mentioned, and field-specific, self-collected data were widely used. Figure 7 shows the approximate proportion of sources and data sets in the reviewed literature. The miscellaneous category contains multiple data sets covering a range of parameters that could not be assigned to the other subcategories.
The response to Q6 describes the input features used by the researchers. This information is useful for feature extraction when developing effective and efficient models. We grouped all input variables under their relevant categories; for clarity, the results are tabulated in Table 1.
Table 1 (partial row). Multipurpose classification: tree crowns and categorical features; thermal images; quantitative features; climatological features; ratios of kernel to fruit; shell to bunch; shell to fruit; fruit to bunch; mesocarp to fruit; oil to dry mesocarp, and […]
Q7 was addressed by analysing statistics on the parameters used to evaluate model performance. The ranking of evaluation parameters by usage frequency in the reviewed literature is shown in Figure 8.
The above results provide statistical insight into the literature on ML and oil palm and answer our seven research questions. A qualitative review of individual articles from each subcategory is given in Tables 2–7, and a brief description of the important studies precedes each table.
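For readers less familiar with the evaluation criteria counted under Q7, the Python sketch below implements four widely used ones: accuracy for classifiers, and RMSE, MAE, and the coefficient of determination (R²) for regression models. Figure 8 shows which criteria rank highest in the reviewed literature. The selection here and the sample values are illustrative assumptions.

```python
from math import sqrt

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root-mean-square error; penalises large errors more than small ones."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the target (e.g. t/ha)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical disease-severity labels and yield predictions
acc = accuracy(["healthy", "mild", "severe", "mild"], ["healthy", "mild", "mild", "mild"])
errors = (rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

RMSE and MAE are reported in the target's own unit, while R² is unitless. This is one reason regression studies often report more than one of them.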

Multipurpose Classification
Some articles were assigned to the multipurpose classification category because of their unique objectives. The works in this category conducted technical research focused predominantly on classifier performance. One of the studies in this group reports the results of various marker systems and modelling procedures for implementing genomic selection (GS) in the introgressive hybridization of the dura family. The "Deli × Nigerian" dura family, resulting from a "Deli dura × Nigerian dura" cross, was considered in the analysis [41]. This family is a valuable source of new palms with higher oil yields and improved bunch characteristics. The work in [42] proposed Jenks natural breaks (JNB) for classifying chlorophyll sufficiency levels and relative chlorophyll content. Using a hyperspectral remote sensing platform, that study also suggested the most suitable frond number, subset of chlorophyll-sensitive wavelengths, and classifier for categorizing chlorophyll content into the nominated sufficiency levels. Correspondingly, [43] highlights the use of hyperspectral sensing combined with imbalanced-data approaches and MLAs to track the nutrient levels of mature oil palm. Hyperspectral spectroscopy has emerged as a promising alternative to conventional foliar analysis for assessing the nutritional status of oil palms, because conventional foliar analysis is costly and time-consuming. Details of the classification techniques used in the oil palm field are provided in five tables. Table 2 contains the literature on multipurpose classification techniques applied to oil palm.
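Jenks natural breaks, used in [42] for chlorophyll sufficiency classes, chooses class boundaries that minimise the total within-class sum of squared deviations. The Python sketch below is a small exact (dynamic-programming) version of that idea. It suits only modest data sizes and is not the implementation used in [42]. The readings, the three-class setting, and the helper names `jenks_breaks` and `classify` are hypothetical.

```python
def jenks_breaks(values, n_classes):
    """Return the upper bound of each class for an optimal natural-breaks split."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    # Prefix sums give the squared-deviation cost of any slice data[i:j] in O(1).
    s1, s2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: least total SSD when data[:j] is split into k classes;
    # start[k][j]: index where the last of those k classes begins.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], start[k][j] = c, i
    # Walk back from the full data set to recover each class's upper bound.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = start[k][j]
    return bounds[::-1]

def classify(value, bounds):
    """Index of the first class whose upper bound is >= value."""
    for idx, upper in enumerate(bounds):
        if value <= upper:
            return idx
    return len(bounds) - 1

# Hypothetical relative chlorophyll readings grouped into three sufficiency levels
readings = [32, 35, 36, 48, 50, 51, 63, 66, 67]
bounds = jenks_breaks(readings, 3)
```

The triple loop is O(k·n²). It is fine for a few hundred readings, but large rasters would need a faster variant.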

Disease Detection
Diseases in oil palm are a major threat to global food supply and security. Timely detection of infected trees is difficult because of the lack of appropriate resources for monitoring plant health. Basal stem rot (BSR) is a common disease caused by the fungus Ganoderma boninense (G. boninense), which infects oil palm plantations and causes significant economic losses. To address the need for novel detection techniques that reduce BSR losses, [47] used images together with field observations to classify infected oil palm estates with different ML algorithms. The purpose of that study was to use WorldView-3 imagery to predict disease severity with supervised learning algorithms and to identify typical indicators of BSR in oil palm at various levels of infection severity. Infection levels, from mild to severe, were assigned to three classes using various classification algorithms. Similarly, Khaled et al. considered the impacts of BSR by classifying healthy and unhealthy leaves from collected samples [48]. They explored the feasibility of using electrical properties, namely impedance, capacitance, dielectric constant, and dissipation factor, for rapid detection of BSR in oil palm trees. The analysis used five different MLAs: three for feature selection and two for classification. Nevertheless, early identification of G. boninense infection is essential for controlling BSR, because current methods do not guarantee complete recovery after severe infection. The key goal of another study was to examine how well ML models can detect BSR infection in oil palm plantations; a BSR distribution map was also created by combining MLAs with remote-sensing techniques [11]. Rotten bunch disease was detected in [49].
Another study found that intelligent algorithms can differentiate healthy trees from BSR-infected trees even at an early stage, when symptoms are less visible [50]. Table 3 itemizes the articles that performed classification for disease detection in oil palm plantations.

UAV for Canopy Monitoring
The use of UAVs for crop monitoring is not new in modern agricultural practice. Aerial colour and colour-infrared imaging have been used to track crop growth for more than 50 years, and these methods are currently being re-evaluated for precision agriculture. UAVs are becoming popular because of their low cost and their ability to fly at low altitudes, which increases spatial resolution [51]. Aerial imagery can also be acquired rapidly during crucial periods of crop development. In [52], linear regression was applied to monitor oil palm fields through UAV imagery and to classify crop segments from the resulting images. Similarly, refs. [53][54][55][56] collected UAV images of oil palm to count or detect oil palm trees using different ML techniques. The articles proposing automated oil palm canopy monitoring using UAVs are listed in Table 4.

Prediction/Estimation
There is ample scope for research on forecasting methods that support many phases of the oil palm cultivation lifecycle. Studies report that a current issue in oil palm agriculture is the large gap between potential and actual yields, which the broad application of multidisciplinary ML methods could help close. Many factors influence crop yield, including crop genotype, environment, and management strategies. Year-to-year and location-to-location differences in crop yield are strongly influenced by environments that change both spatially and temporally. Under such circumstances, accurate prediction of harvesting time, seed sowing time, irrigation requirements, prices, and yield is extremely beneficial to global food production, and reliable forecasts support decisions on appropriate imports and exports [57]. To implement an appropriate harvesting strategy, [58] predicts oil palm harvest time from fruit growth and ripeness level using regression analysis. An oil palm prediction model that estimates production from images of cultivated areas together with tree-age estimation is proposed in [59]. Future CPO prices are predicted in [60,61] from historical CPO price trends. In [62], two regression models trained on historical oil palm production data are compared for predicting future production. A simple prediction model has also been proposed to simulate the impact of the environment on variance in inflorescences and in the number of harvested bunches [63]. Similarly, another study introduced a standard simulation model to predict oil palm tree growth and potential yield [64]. The impacts of foliar nutrient composition have been analysed to model and predict oil palm yield in Malaysia, incorporating historical FFB yield trends [65]. Likewise, [66] predicts annual oil palm yield and explores climate impacts.
Finally, small-scale oil palm yield has been predicted using historical yield data and multiple environmental factors [67]. The literature discussed above is summarized in Table 5.
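Several of the forecasting studies above, such as [62], fit regression models to historical production records. As a minimal illustration of that idea, the sketch below fits an ordinary least-squares trend line to annual FFB yield and extrapolates one year ahead. The yields are hypothetical and the helper names are ours, not those of any cited study; practical yield models usually add climate, nutrient, and tree-age predictors.

```python
def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


# Hypothetical FFB yields (t/ha) for years 1-4 of a plantation record.
years = [1, 2, 3, 4]
ffb_yield = [18.0, 18.5, 19.0, 19.5]
a, b = fit_ols(years, ffb_yield)
print(a, b, predict(a, b, 5))  # 17.5 0.5 20.0
```

A pure time trend like this ignores the spatial and temporal environmental variation described above, which is why the reviewed studies add environmental and management variables as extra predictors.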

Land Cover/Tree Detection
Oil palm is among the fastest-growing crops in terms of agricultural land use, and its expansion has been linked to substantial environmental damage. As a result, the crop often features in public and policy discussions that are hindered or skewed by a lack of reliable environmental data; the shortage of consistent cultivation information in particular remains a concern. Recent advances in access to remote-sensing data and in ML capability have played a remarkable role in addressing this issue. In this regard, the study in [10] contributed not only to oil palm tree detection but also used ML to count trees with adequate accuracy. Tree counting, along with the identification of young and mature trees, was carried out by Okoro et al. [69]. Other studies [70][71][72][73] performed similar work on tree counting, age estimation, or tree detection but differ in study level, datasets, and algorithms. Many other researchers focus on oil palm area mapping at global and local scales: [74,75], among others, proposed models to monitor oil palm area expansion, land cover, plantation detection, land-use patterns for oil palm plantations, aboveground biomass, and oil palm field observation. Details of these contributions are provided in Table 6, which covers all reviewed articles implementing models to detect land cover, oil palm cultivation, or expansion. However, no model has been proposed to differentiate oil palm from similar trees such as date palm and coconut palm. This research gap may cause inaccurate classification of satellite images when diversified cultivation is analysed on a large scale [76].
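Tree counting from imagery is often framed as labelling each pixel (or image window) as palm or non-palm and then counting the separate detected regions. The sketch below shows only that final counting step on a binary mask, using 4-connected components and a minimum-size filter to suppress noise. The mask, the `min_size` threshold, and the function name are hypothetical, and the cited studies may count trees differently, for example by detecting crown centres directly.

```python
from collections import deque


def count_crowns(mask, min_size=1):
    """Count 4-connected regions of 1s in a binary mask with at least min_size pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over one connected region, measuring its size.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_size:
                count += 1
    return count


# Hypothetical classifier output: 1 = palm-crown pixel, 0 = background.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
print(count_crowns(mask))              # 3
print(count_crowns(mask, min_size=2))  # 2
```

The size filter matters in practice: isolated misclassified pixels would otherwise inflate the tree count, while touching crowns in dense plantings would be merged, which is one reason dedicated detectors are often preferred.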

Fresh Fruit Bunch Analysis
Prior analysis of oil palm FFB based on ripeness stage and other characteristics is needed for sorting and harvesting in order to maximize oil quantity and maintain the oil quality required by international standards. Ordinarily, oil palm harvest time is determined by fruit maturity, so that a high oil extraction ratio (OER) is ensured by avoiding the harvest of over-ripe, under-ripe, or unripe fruit. Because FFB is the first raw product gathered from oil palm cultivation and is sensitive to ripeness stage, many researchers have used machine vision to determine fruit age, size, quality, and other attributes vital to ripeness identification. A large portion of the reviewed literature therefore consists of methods and models for ripeness detection. The study in [45] makes a significant contribution by demonstrating that the blue-to-red fluorescence ratio index can be used to evaluate oil palm FFB maturity; real FFB samples were examined to differentiate among maturity stages. In contrast, [88] processed FFB images to classify fruit maturity levels. Using fuzzy logic and the red-green-blue (RGB) colour model, [89] proposed an automatic fruit grading system for oil palm that differentiates fruit types by colour intensity to describe ripeness level. Another study used a hyperspectral method for FFB ripeness detection [90]. Makky et al. integrated multiclass classification with machine vision to develop a fruit grading machine [91]. Using mathematical equations and regression analysis, [92] describes a method to estimate the maturity stage and age of oil palm FFB from the bunch's position in the phyllotaxis, as an additional element to confirm maturity level. Another straightforward model was introduced in [93] to build a tool for oil palm fruit ripeness detection.
An image segmentation method to isolate the oil palm fruit region was proposed in [94]. To accomplish the segmentation, Gaussian filtering was applied to reduce background noise in the image, and an edge detection step was introduced to obtain the outline of the fruit. Across [45,46,88–107], although the adopted methods, ideas, and input data vary, the key objectives remained largely centred on the analysis of oil palm fruit, as can be seen in Table 7.
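The two pre-processing steps described for [94], Gaussian smoothing followed by edge detection, can be sketched in plain Python on a small greyscale image. We assume a 3 x 3 Gaussian kernel, the Sobel operator, and a fixed gradient-magnitude threshold; [94] does not necessarily use these exact kernels, and the toy image merely stands in for a fruit photograph.

```python
import math

# 3x3 Gaussian approximation (weights sum to 1) and Sobel gradient kernels.
GAUSS = [[1 / 16, 2 / 16, 1 / 16], [2 / 16, 4 / 16, 2 / 16], [1 / 16, 2 / 16, 1 / 16]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter2d(img, kernel):
    """Apply a 3x3 kernel by cross-correlation, replicating border pixels.

    Cross-correlation flips the sign of the Sobel responses relative to true
    convolution, which does not affect the gradient magnitude used below.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy = min(max(y + ky - 1, 0), h - 1)
                    xx = min(max(x + kx - 1, 0), w - 1)
                    acc += kernel[ky][kx] * img[yy][xx]
            out[y][x] = acc
    return out


def edge_map(img, threshold):
    """Smooth with GAUSS, then mark pixels whose Sobel gradient magnitude exceeds threshold."""
    smooth = filter2d(img, GAUSS)
    gx = filter2d(smooth, SOBEL_X)
    gy = filter2d(smooth, SOBEL_Y)
    return [
        [1 if math.hypot(gx[y][x], gy[y][x]) > threshold else 0 for x in range(len(img[0]))]
        for y in range(len(img))
    ]


# Toy 8x8 image: a bright 4x4 "fruit" region on a dark background.
image = [[255 if 2 <= y <= 5 and 2 <= x <= 5 else 0 for x in range(8)] for y in range(8)]
edges = edge_map(image, threshold=100)
print(edges[0][0], edges[2][3])  # 0 1
```

Smoothing first keeps isolated noisy pixels from producing spurious edges; the threshold then trades missed fruit boundaries against background clutter and would need tuning on real images.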

Results-Based Discussion
This subsection discusses the results presented in the preceding section. As observed, the annual number of published articles is increasing, but the objectives lack variety. Around 33% of the research focused on fruit classification methods. Mainly, the authors performed FFB analysis for fruit ripeness detection, which is crucial for estimating harvesting time. In most of these studies, images of FFB and real oil palm fruit samples were collected to detect differences among four ripeness stages, namely unripe, under-ripe, ripe, and overripe. However, because consecutive stages share some features, ripeness identification is difficult with conventional classification approaches. Some authors also proposed useful features for image processing and oil palm fruit classification. In the health-assessment category, that is, the disease detection group, several MLAs were applied. Although 7% of the research addressed the detection of disease and infection in oil palm, this amount of work is insufficient given the importance of the problem. Similarly, 7% of the research was dedicated to canopy monitoring using UAVs, and the multipurpose classification group, with its assorted objectives, also constituted 7% of the reviewed articles.
Another noticeable trend in the objectives of the reviewed works was land and tree observation, which accounted for 30% of the reviewed literature. The impetus to use ML for this purpose derived from two dominant needs: first, counting oil palm trees manually is difficult; second, the expansion of oil palm cultivation must be monitored because it is considered a threat to tropical forests, biodiversity, and associated ecosystems. For better supervision, the reviewed studies used classification techniques with remote sensing and machine vision to monitor oil-palm-cultivated areas. Remote sensing integrated with classification models is widely applied for tree detection, tree counting, oil palm tree identification, and similar tasks. Some authors preferred multiclass classification [108] over binary classification [109] for oil palm detection.
Apart from classification, regression analysis is another prevailing research area. This category is mainly concerned with applying ML for estimation and prediction. In the reviewed studies, various regression models were proposed for forecasting objectives such as oil palm growth, oil quantity, FFB yield, palm oil prices, and seasonal variations. Although progress in ML-based forecasting is commendable, it is not yet satisfactory. The results of the review suggest that forecasting with ML in oil palm is still in its early stages and that several obstacles remain to be overcome. Clearly, ML is underutilized in the oil palm industry for descriptive and predictive analysis. Nonetheless, ML is expected to contribute to solving more challenging problems in agriculture, and in oil palm specifically, as part of agriculture 5.0 [110].
Regarding the applied algorithms, the results show that the choice of algorithm differs according to the objectives of each category. For instance, for fruit classification, researchers implement ANN more frequently than other algorithms. ANNs, particularly deep architectures, can learn features directly from the input data, reducing the need for manual feature extraction. Three structural parameters that determine the performance of the algorithm are the number of layers, the number of neurons in each hidden layer, and the connections among neurons. Optimizing network size is vital to obtaining a viable model in terms of learning rate and computational complexity [111]. To detect symptoms of disease and infection, SVM and RF algorithms are preferred, mainly because they can learn effectively from small but complex datasets. For prediction, ANN is the most popular algorithm, followed by linear regression and SVM. Algorithms were ranked by usage frequency rather than by performance. Note that the frequent use of SVM does not by itself indicate that it performs best; its frequency is mainly due to its suitability for both regression and classification problems and its reliability as a standard algorithm for performance benchmarking.
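The link between the number of layers, the neurons per hidden layer, and the connections among them can be made concrete by counting the trainable parameters of a fully connected network. The helper below is a minimal sketch assuming dense layers with one bias per neuron; the layer sizes (for example, four input features and three ripeness classes) are hypothetical and not taken from any reviewed study.

```python
def dense_network_size(layer_sizes):
    """Return (weights, biases) for a fully connected network.

    layer_sizes lists neurons per layer, input first and output last.
    Every neuron in one layer connects to every neuron in the next.
    """
    if len(layer_sizes) < 2 or any(n < 1 for n in layer_sizes):
        raise ValueError("need at least an input and an output layer of positive size")
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])  # one bias per non-input neuron
    return weights, biases


# Two hypothetical candidate architectures: 4 inputs -> hidden layer(s) -> 3 classes.
for sizes in ([4, 10, 3], [4, 10, 10, 3]):
    w, b = dense_network_size(sizes)
    print(sizes, w, b, w + b)
# [4, 10, 3] 70 13 83
# [4, 10, 10, 3] 170 23 193
```

Adding a second hidden layer of ten neurons more than doubles the parameter count, which raises training cost and, on small datasets, the risk of overfitting; this is the trade-off that network size optimization tries to balance.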
Concerning countries involved in oil palm research, the results indicate that researchers from Malaysia, Indonesia, the UK, the USA, Australia, China, Thailand, and the Netherlands are actively participating. The literature also shows that prominent research and development (R&D) organizations such as CENIPALMA (Colombia), NIFOR (Nigeria), CIRAD (France), IOPRI (Indonesia), and MPOB (Malaysia) deal with multiple aspects of oil palm plantation development in their countries. FELDA is another Malaysian government agency that encourages various organizations to conduct agricultural research [112,113]. Current development in the global and Malaysian oil palm industry results from significant research combined with good management, a compatible climate, and well-established infrastructure [112]. Giving one full credit to each author's country, the statistics reveal that Malaysia ranks first, contributing to approximately 62% of the reviewed articles, followed by Indonesia with 18%.
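Under full counting, every country appearing in an article's author affiliations receives one whole credit for that article, so country shares can sum to more than 100%. The sketch below assumes one credit per distinct country per article, which is our reading of the procedure rather than a documented detail of this review; the article list is hypothetical.

```python
from collections import Counter


def full_count_shares(article_countries):
    """Percentage of articles to which each country contributed (full counting)."""
    total = len(article_countries)
    if total == 0:
        return {}
    counts = Counter()
    for countries in article_countries:
        counts.update(set(countries))  # each country counted once per article
    return {country: 100.0 * n / total for country, n in counts.items()}


# Hypothetical affiliation lists for four articles.
articles = [
    ["Malaysia"],
    ["Malaysia", "UK", "Malaysia"],
    ["Indonesia"],
    ["Malaysia", "Indonesia"],
]
shares = full_count_shares(articles)
print(shares["Malaysia"], shares["Indonesia"], shares["UK"])  # 75.0 50.0 25.0
```

Here the shares total 150%, because two of the four articles are credited to two countries each.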
The investigation of data sources found that Google Earth Engine (GEE), QuickBird, WorldView-3, and other tools and repositories provided considerable satellite imagery. Large volumes of self-collected, site-specific images, field samples, expert opinions, and laboratory test results were also used. The availability of satellite images is advantageous, but the reliance on them also indicates a severe shortage of freely available quantitative data for oil palm research. Data unavailability is the main challenge for precision farming of oil palm because consistent field data collection is time-consuming and difficult. The findings reveal that data collection is progressing despite several impediments, such as the difficulty of quantifying yield-reducing factors and of storing real-time data.
A collaborative database management system (DBMS) integrated with big data and cloud computing is needed, in which expert stakeholders store oil palm observations gathered over the entire lifespan of the plants, because the extensive application of ML is constrained by the limited availability of gold-standard datasets containing fine-grained qualitative and quantitative information.
Answering Q7 was complicated by the variety of validation methods. Most researchers validated their algorithms with multiple techniques and compared performance with existing algorithms. Some works did not clearly state their evaluation metrics, and a few did not evaluate the algorithm at all. Expert opinion and human graders were also reported as performance evaluation criteria in some studies. Based on the available information, the most popular evaluation parameter was overall accuracy (OA), followed by classification accuracy (CA), root mean squared error (RMSE), and F1-score.
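For readers comparing the figures reported in the reviewed studies, the metrics named above can be computed as follows. This is a minimal sketch: overall accuracy is the fraction of correctly labelled samples, F1 is computed for one chosen class as 2TP/(2TP + FP + FN), and RMSE applies to regression outputs. The ripeness labels and yield values are hypothetical, and individual studies may define CA or average F1 across classes differently.

```python
import math


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, positive):
    """F1-score of one class, 2TP / (2TP + FP + FN); 0.0 if the class never appears."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def rmse(y_true, y_pred):
    """Root mean squared error for regression outputs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


truth = ["ripe", "unripe", "ripe", "overripe"]
pred = ["ripe", "ripe", "ripe", "overripe"]
print(overall_accuracy(truth, pred))      # 0.75
print(f1_for_class(truth, pred, "ripe"))  # 0.8
print(rmse([20.0, 22.0], [21.0, 20.0]))   # about 1.581
```

Overall accuracy alone can look high when one ripeness class dominates the samples, which is why per-class F1 is a useful complement for imbalanced fruit datasets.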

General Discussion
Research of this kind is exposed to threats to validity, which can concern external validity, construct validity, and reliability [114]. To address external and construct validity, broad initial search keywords were used, and the queries returned a considerable number of studies. The search string covered the entire scope of the review. Reliability can be considered well addressed because the review methodology is explained in detail, which makes the review replicable. A replication might select a marginally different set of publications because of differences in personal judgement, but it is very unlikely to yield different overall findings.

Search-Based Discussion
Some important publications may have been missed. Different search strings could have been applied, and a wider search could have returned different studies, but the satisfactory number of articles indicates that the search was broad enough to address the research questions. The most challenging part of the search was the inability of search engines to differentiate between "oil palm" (tree) and "palm oil" (oil), which returned a large number of less relevant articles. Similarly, articles containing the words "palm" (part of the hand), "coconut oil," "coconut palm," and "date palm" were also retrieved and then discarded during article exclusion.

Analysis-Based Discussion
Another potential threat to validity was the method of analysis. Not all publications clearly stated the information required to answer our predefined research questions. In addition, strict categorization of articles with more than one objective or a compound methodology was not possible. For example, "regression for/and classification," "classification for disease detection," "FFB analysis for classification," "FFB analysis for disease detection/prediction," and "UAV for tree classification" caused apparent duplication in the grouping. However, the analysis of each article remained focused on its primary objective.

Research-Questions-Based Discussion
Some relevant studies published during our time range might have been missed because they used different keywords; it was not practically possible to include every individual MLA in the search string. Deep learning and machine learning algorithms were not separated explicitly. Some articles were written by authors from different countries, so these publications were counted multiple times, based on author affiliations, in the geographical distribution analysis. Similarly, applied algorithms were counted by frequency of use rather than by performance. Performance comparison of the algorithms was not included in the results because of the variety of objectives and datasets, including some calculated parameters. SVM is ranked as the most-used algorithm because of the number of studies that applied it for hybrid modelling, comparison, and performance evaluation of the proposed model. With respect to data sources, a proper breakdown of the datasets used was not possible because of the wide range of sources and input variables. Similarly, evaluation parameters vary across studies, and many studies did not state them clearly. Most studies used multiple evaluation parameters, which were counted only when explicitly mentioned in the work.

Challenges-Based Discussion
Regarding challenges, researchers discussed general challenges in the form of research motivation. Several challenges are site- or data-related, while others arose during model implementation. We discuss only some common challenges specific to oil palm; a separate technical review is planned to discuss the technical challenges encountered in applying different MLAs. In the current literature, the shortage of data for important parameters appears to be the main reason for the limited performance of the proposed models. If site-specific interconnected parameters are measured and included, the proposed models are likely to achieve better precision.

Conclusions
This study investigated research on the application of machine learning in oil palm carried out during the last decade (2011–2020). For better understanding, the reviewed articles were divided into two categories: classification and regression analysis. In the classification category, the study identified the key objectives as disease detection, multipurpose classification, land cover/tree detection, fresh fruit bunch analysis, and automated canopy management/segmentation using UAVs. The regression category covered the prediction of FFB yield, CPO prices, harvest time, and seasonal impacts on oil palm. The results showed that research mainly focused on land cover/tree detection and oil palm plantation area observation, with fresh fruit bunch ripeness identification as the other preferred direction. Surprisingly, predictive models for fruit yield, oil production, plant growth, and disease forecasting have not been widely applied. The most frequently applied algorithm was SVM, followed by ANN, RF, regression, and CART. A deeper analysis indicated the scarcity of available data in the field: most of the data used in oil palm research were images collected from satellite or other remote-sensing sources. Because research is concentrated among researchers from a small number of countries, several important research dimensions are missing, such as soil classification to identify suitable land, automated pest and weed recognition, identification of symptoms of sunlight, nutrient, and water limitations in the oil palm crop, fertilizer optimization, and seed assessment. Most importantly, the research trends showed that existing MLAs and techniques have not been adequately coupled to support effective decision-making systems, compared with several other domains of ML application.
Current research is inadequate for designing practical tools capable of increasing yields and improving quality and plantation sustainability in an environmentally friendly way. Integrated ML practices such as big data, remote sensing, data analytics, image processing, and automated information extraction are progressing towards knowledge-based oil palm agriculture. This study provides a clear picture of the advancement of oil palm research with machine learning and can inspire researchers to identify relevant problems in this area. Its outcomes examine current research on "oil palm and machine learning" from various perspectives. We believe this article will pave the way for the development of oil palm with the help of automation and intelligence.
Future work should include a technical review analysing the performance of ML models applied in the oil palm agriculture domain.