A Review of an Artiﬁcial Intelligence Framework for Identifying the Most Effective Palm Oil Prediction

: Machine Learning (ML) offers new precision technologies with intelligent algorithms and robust computation. This technology beneﬁts various agricultural industries, such as the palm oil sector, which possesses one of the most sustainable industries worldwide. Hence, an in-depth analysis was conducted, which is derived from previous research on ML utilisation in the palm oil in-dustry. The study provided a brief overview of widely used features and prediction algorithms and critically analysed current the state of ML-based palm oil prediction. This analysis is extended to the ML application in the palm oil industry and a comparison of related studies. The analysis was predicated on thoroughly examining the advantages and disadvantages of ML-based palm oil prediction and the proper identiﬁcation of current and future agricultural industry challenges. Potential solutions for palm oil prediction were added to this list. Artiﬁcial intelligence and ma-chine vision were used to develop intelligent systems, revolutionising the palm oil industry. Overall, this article provided a framework for future research in the palm oil agricultural industry by highlighting the importance of ML.


Introduction
Palm oil plantations cover millions of hectares worldwide, which encompass a significant portion of global trade. Palm oil trees, or Arecaceae, are a genus of stemless, tree-like monocot plants that thrive in the tropics and are extremely valuable to humans and the ecosystem [1]. The African oil palm, or Elaeis guineensis, is the most prominent palm species native to West Africa, cultivated for its oil-rich fruit as a semiwild food source for over 7000 years. The tree produces a profusion of fruit bunches yearly with each containing between 1000 and 3000 fruits [2]. The processed oil palm fruits are a significant source of oil for society and an integral industrial derivative, i.e., soaps, detergents, and cosmetics. Hence, the industry significantly impacts locals and broader biodiversity in their native regions [1,3,4].
Malaysia and Indonesia are the largest exporters of palm oil in Southeast Asia [5]. The development rate of the palm oil industry is the primary contributor to the agriculture sector's value, thus indicating a 37.1% share. The agricultural industry accounted for 7.4% of Malaysia's GDP in 2020, though its growth rate fell to 2.2% from 2.0% in the previous year. The commodity subsector has seen a diminished growth of 3.6% due to the palm oil shortage (2019: 1.5 %) [6]. This issue is escalated with the rise in crude palm oil prices, directly resulting from the market disequilibrium in India and China. Furthermore, palm oil production is increasingly threatened by environmental, economic, and political factors [7]. Success in the industry is determined by the estate managers' ability to strategically adapt Table 1. An analysis of recent literature reviews on algorithms in palm oil yield prediction.

Reference
Objectives of the Review Lack of the Review [38] Applications of remote sensing technology for palm plantation monitoring. A list of the existing knowledge gaps is compiled. The findings are followed up withrecommendations for further investigation.
This review omits any mention related to the algorithm prediction of palm oil yield prediction. [39] An overview of the biotechnology used in palm oil breeding. This paper only reviews the application of molecular methods.
It does not employ ML techniques. These methods are insufficient to forecast specific breeding. [41] An overview of the techniques used for detecting palm oil nutrition deficiencies using proximal images and application examples.
This review does not mention algorithms for predicting palm oil yields and breeding. [44] An overview of existing ML models for predicting crop yields. Various performance metrics are employed to assess the effectiveness of various strategies.
Excludes other factors such as breeding. It only uses ML models to predict crop yield and few predictions on palm oil. [45] An overview of several existing ML models in forecasting palm oil yields. Multiple performance metrics were used to determine the effectiveness of various strategies.
Excludes other factors such as breeding and only uses ML models to predict palm oil yield.
Based on Table 1, only several aspects of palm oil using ML are discussed, i.e., yield prediction, crop monitoring, and nutrient deficits. Over the last decade, unidimensional reviews have yet to present the full scope of palm oil cultivation involving ML and multiple artificial intelligence (AI) aspects. Thus, this review article provides an in-depth perspective on how ML is used in palm oil cultivation, summarising everything that has been conducted to date. Unlike others, this review employed a systematic protocol to retrieve data from databases, ensuring objectivity. The contributions of this review paper are summarised as follows:

•
Outlining the background of the palm oil breeding programmes.

•
Introducing the factor that affects the palm oil growth and the fruit quality.

•
Comprehensive critical assessment of ML-based palm oil prediction algorithms, critical evaluation of feature sets used, and comparison of relevant research. This review outlined current types of research in predicting palm oil production using ML by extensively collecting various forecasting models for the palm oil ML framework. The rest of the paper is organised as follows: Section 2 presents a detailed review of related studies on the background of palm oil and factors affecting its growth and quality. Section 3 introduces the existing research on predicting palm oil yield using ML. Meanwhile, Section 4 discusses issues of palm oil predictions, future directions, and the selection of the optimum prediction algorithm. Section 5 summarises the framework prediction of palm oil, finalising with the conclusion in Section 6.

The PRISMA Strategy Article Selection
The search is narrowed down to the basic concepts relevant to the scope of this review. Various published studies are likely outside this review article's scope due to the substantial applications in ML. Figure 1 depicts the article selection process following the preferred reporting items for systematic reviews and meta-analyses (PRISMA) strategies [46]. Google Scholar, Scopus, Web of Sciences, IEEE Explore, Science Direct, SpringerLink, and Taylor Francis were the databases used to find the relevant articles. Searching for the most basic information was performed by an automated search engine. We consulted several reputable websites in order to obtain specific information. The scope of this review is primarily determined by two keywords: ("machine learning" OR "deep learning") AND "oil palm". As a result, in the majority of the searches, we used these two keywords in conjunction with other keywords such as "crop yield", "palm oil", "genomic prediction", "yield forecasting", "yield estimation", "prediction model", "Elaeis guineensis", "machine learning", "deep learning", "artificial intelligence". Among the documents we have collected are original articles, review articles, book chapters, conference papers, lecture notes, and reports.
We downloaded, read the title, abstract, conclusion, keywords datasets, models used, techniques, and metric evaluations in the first round of review and selected only the relevant articles manually and then entered them into excel. Articles that have been included in excel will be refiltered to be analyzed with data that may be important in this study to facilitate information retrieval to improve contextual understanding. This selection is due to constraints such as the difficulty to find palm oil study factors, especially in phenotypic data, because they are rarely studied because they have a long plant life span to bear fruit. After removing duplicates, the remaining articles are classified according to their application. During the first round of screening, machine learning-based palm oil and crop prediction articles written in English are used as selection criteria.
The selected manuscripts are scrutinised in the second round of screening. The detailed methodology was the final article selection criteria in this case. After following the above procedures, only "163" articles are chosen for this review.
The following information was meticulously extracted from each article: publication year, dataset information, detailed information about features, prediction algorithms, and system performance. The number of selected documents is divided into four categories: journal articles (82%), conference papers (11%), book chapters (5%), and websites (2%), as shown in Figure 2. In terms of the annual distribution of published work, trends show that a greater number of research articles were published in 2017, 2019, and 2021, while the number of articles dropped too low in 2015. Figure 3 depicts the annual distribution of research publications.

Palm Oil Background
The Arecaceae family includes palm oil (Elaeis spp.), encompassing two species: African palm oil (E. guineensis) and American palm oil (E. occidentalis) (E. oleifera). The male and female inflorescences can be detected on the same palm, and on rare occasions, inflorescences of hermaphrodites can be found as well. A cross-pollinated crop is created when the male and female inflorescences are produced alternately. However, artificial pollination is necessary to produce specific hybrids [47,48]. The drupe, or the oil palm's fruit, matures approximately six months after pollination. Within the drupe, a pericarp is formed from the exocarp (the outer layer) and the husk. The outer layer protects the kernel containing the endosperm and embryo located within the endocarp (the inner layers).
The E. guineensis species are classified into three oil palm fruit forms (pisifera, dura, and tenera) based on the thickness of their shell, an essential factor that will be used for breeding [49]. The pisifera genotype of the recessive homozygous ( ) alleles is shellless. It is believed that most palms are sterile; however, several were reported to bear fruit and have varying degrees of sterility. The dura genotype comprises a thick shell consisting of the dominant ( ) alleles. Meanwhile, the tenera of the heterozygous ( ) alleles exhibit a mesocarp with a thinner outer shell and a thicker inner ring of fibres. Notably, the tenera genotype is the only form of oil palm fruit used for commercial planting because of its higher mesocarp content. The three types of oil palm fruits are the dura (D), pisifera (P) and tenera (T), identified based on the thickness of their shells. Alleles and are codominantly expressed at a single locus that controls shell thickness [50]. The thick-shell dura is controlled by a dominant homozygote gene ( ), whereas the shell-less pisifera is controlled by a recessive homozygote gene ( ). The cross between the dura and pisifera would result in a homozygote tenera hybrid ( ) with a thin shell. Despite having a high mesocarp content of 95 per cent, pisifera is usually female sterile or semifertile and does not produce bunches. Hence, pisifera is only used as the male parent in the tenera hybrid. Notably, the tenera genotype is the only form of oil palm fruit used for commercial planting due to its higher mesocarp content.

Breeding Programmes
Tenera is a commercial term for palm oil, a cross between dura × pisifera, responsible for various new varieties. The germplasm of these materials is used to produce hybrid seeds. Vegetative characteristics and the bunch performance of four seedlings introduced to Indonesia in 1848 are consistent [51]. In 1848, Indonesia introduced the vegetative characteristics and consistent performance of four oil palm seedlings [51,52]. Private companies proposed various independent, breed-specific variations. Deli subpopulations and

Palm Oil Background
The Arecaceae family includes palm oil (Elaeis spp.), encompassing two species: African palm oil (E. guineensis) and American palm oil (E. occidentalis) (E. oleifera). The male and female inflorescences can be detected on the same palm, and on rare occasions, inflorescences of hermaphrodites can be found as well. A cross-pollinated crop is created when the male and female inflorescences are produced alternately. However, artificial pollination is necessary to produce specific hybrids [47,48]. The drupe, or the oil palm's fruit, matures approximately six months after pollination. Within the drupe, a pericarp is formed from the exocarp (the outer layer) and the husk. The outer layer protects the kernel containing the endosperm and embryo located within the endocarp (the inner layers).
The E. guineensis species are classified into three oil palm fruit forms (pisifera, dura, and tenera) based on the thickness of their shell, an essential factor that will be used for breeding [49]. The pisifera genotype of the recessive homozygous (Sh − Sh − ) alleles is shell-less. It is believed that most palms are sterile; however, several were reported to bear fruit and have varying degrees of sterility. The dura genotype comprises a thick shell consisting of the dominant (Sh + Sh + ) alleles. Meanwhile, the tenera of the heterozygous (Sh + Sh + ) alleles exhibit a mesocarp with a thinner outer shell and a thicker inner ring of fibres. Notably, the tenera genotype is the only form of oil palm fruit used for commercial planting because of its higher mesocarp content. The three types of oil palm fruits are the dura (D), pisifera (P) and tenera (T), identified based on the thickness of their shells. Alleles Sh + and Sh − are codominantly expressed at a single locus that controls shell thickness [50]. The thick-shell dura is controlled by a dominant homozygote gene (Sh + Sh + ), whereas the shell-less pisifera is controlled by a recessive homozygote gene (Sh − Sh − ). The cross between the dura and pisifera would result in a homozygote tenera hybrid (Sh + Sh − ) with a thin shell. Despite having a high mesocarp content of 95 per cent, pisifera is usually female sterile or semifertile and does not produce bunches. Hence, pisifera is only used as the male parent in the tenera hybrid. Notably, the tenera genotype is the only form of oil palm fruit used for commercial planting due to its higher mesocarp content.

Breeding Programmes
Tenera is a commercial term for palm oil, a cross between dura × pisifera, responsible for various new varieties. The germplasm of these materials is used to produce hybrid seeds. Vegetative characteristics and the bunch performance of four seedlings introduced to Indonesia in 1848 are consistent [51]. In 1848, Indonesia introduced the vegetative characteristics and consistent performance of four oil palm seedlings [51,52]. Private companies proposed various independent, breed-specific variations. Deli subpopulations and breeding materials are currently available for purchase or trade worldwide. Accordingly, the following populations were used in effective breeding programmes [2,53].

1.
Deli: The thick-shelled dura is a descendant of the original Bogor palms from Java. The subsequent progeny and local selection distribution to other countries resulted in Hence, many seed-production programmes make use of this pisifera.

Factors Affecting Palm Oil Growth and Quality
Palm oil forecasting has been critically shown in studies, particularly in early warning of potential problems. The current state of palm oil modelling indicates a lack of knowledge on palm oil growth and the ability to improve the fruit quality [54,55]; thus, an in-depth analysis is required. It is necessary to examine the heterogeneity of its production to better comprehend and exert control over the emergence of palm oil. An initial step can be taken before establishing predictive modelling to gain insight and understand the factors (environmental factors, phenotypes, and genotypes).
This step includes determining the relationships between palm oil and the method used in the growth analysis. An analysis method is essential to gain insight into the factors and inter-relationships in palm oil production. Palm oil analysis provides a comprehensive understanding of the uncertainty and nonlinearity of its forecast. The most common method of analysing the palm oil data is through, for example, using a preprocessing algorithm or feature selection algorithm. However, previous studies only needed the analysis with no attempt at making predictions.
Oil palms require light and nutrients such as nitrogen and phosphorus as a starting point for photosynthesis. Temperature and water turbidity are other environmental factors that affect palm oil production. Five environmental parameters facilitate the oil palm stress tolerance [56]: rainfall, temperature, relative humidity, light intensity, and wind speed. The inefficient production of palm oil may occur in low humidity levels, especially in the dry seasons when the watering holes and rivers have dried up. Oil palms may be more stressed during this season, affecting their yields.
Humidity concentration in palm oil cultivation can be predicted [57], for example, a study recognised several environmental effects influencing inflorescence abortion and sex determination. These factors include rainfall, monthly rain, sunshine hour, and evapotranspiration, followed by the minimum and maximum temperature [58]. Notably, in various studies, rainfall is positively correlated with crop output but is weakly associated with crop prices [59]. However, a cointegration analysis from 2018 to 2050 reported that rainfall changes affect future and spot prices with different time lags [60]. Furthermore, a study utilised monthly temperature anomalies to successfully predict palm oil yields [61].
Soil type and texture potentially improve the accuracy of forecasts and climate. Malaysian yields of over 30T fruit bunches per hectare were reported on all soil types, excluding shallow ones. This type exhibits issues such as reduced root proliferation, increased sensitivity to drought and flooding, and a higher risk of palms toppling. The most common soil type based on the Asian soil taxonomy is ultisols and oxiols [62,63]. One study used 14 different soil textures and chemical variables on six palm oil plantations to measure the effects on palm oil and the physicochemical properties of soil [64]. Anaba et al. [65] mentioned that palm oil physiological and behavioural adaptations to survive were defined as early tenera hybrid seedling stages. Sandy soils with macropores exhibit low resistance to penetration, producing excellent root growth in length and ramification. Meanwhile, the variability of soil in the water column is recognised as a more effective measurement approach than conventional techniques.
Phenotyping is possible at all levels of an organism's organisation, including the subcellular, cellular, tissue, organismic, and agrophytocenosis levels. This approach is used to identify productivity determinants, abiotic stressors, and plantation planning, especially on lands. This list can be extended to include the determining of the critical mechanisms of oil palm's resistance to pathogens [66]. In oil palm phenotyping, the fruit's size, shape, and physiological and biochemical characteristics are considered. These factors are then evaluated under specific environmental conditions and the oil palm genome [67,68]. Essentially, modern phenotyping methods enable the collection of real-time data and information analysis on the entirety of the phenotypic features. Hence, palm oil growth, development, and reproduction processes can now be comprehensively investigated [69,70]. One common phenotype in palm oil is the identification of ripe fruit. One of the prevalent phenotypes used in research is identifying the types of fruits, whether ripe or unripe.
Genotype has received significant attention, exhibiting the potential to improve the environment, phenotype, and forecast accuracy. The individual genotype can be determined using genotypic assaying, a method used in genotyping technology. Previous research used molecular markers (known DNA markers) breeding by deciphering genetics based on deoxyribonucleic acid (DNA) to enhance palm oil fruits. These markers include restriction fragment length polymorphism markers (RFLPs), amplified fragment length polymorphism markers (AFLPs), and short tandem repeat or simple sequence repeat markers (SSRs). Furthermore, the markers are used in the early stages of oil palm genetic mapping. For instance, the SNP's marker entails linkage and linkage disequilibrium (LD) mapping. This marker is favoured because of its abundance, low mutation rates, and amenability to high-throughput analysis.
Automated and high-throughput genotyping is well-suited to the binary SNPs at the genomewide scale. This genotyping technique is currently possible using the array [71,72] or sequencing-based technologies [73,74]. The SNP arrays serve as an alternative to the laborious cloning and primer design, though it lacks the discovery process and favours genotyping new populations. Hence, new sequencing techniques have emerged, including next-generation sequencing, i.e., restriction-site-associated DNA sequencing (RAD-seq) [75] and genotyping by sequencing (GBS) [76]. Currently, genome-wide markers can be discovered in a model of palm oil. Table 2 presents several categories in which the variables listed above, and others, can be sorted, and additional variables present a list of categories that influence oil palm growth along with input features used in previous studies.

Type of Components Features Data Factor Category
Soil and Fertilisation Levels of potassium (K), nitrogen (N), phosphorus (P), magnesium (Mg), calcium (Ca), nutrient supplements of groundwater, topography and slope, consumption of chemical fertilisers, consumption of agricultural pesticide, type and texture of the soil, and planting density.
Precipitation, humidity, maximum temperature, minimum temperature, mean temperature, rainfall, irrigation carbon dioxide concentration, solar radiation, water measurements, and waterlogging.
UAV for Canopy Surveillance Tree crown, stems length, tree colours, crown size, bare land, and water.

Yield
Low lipase, high stearic and high oleic, low free fatty acids content and high iodine value, oil quality, quality, yield, oil to-dry mesocarp (O/DM), total oil yield per palm (O/P), oil content in mesocarp, oil-to-wet mesocarp (O/WM), palm oil yield (PO), FFB, yield, height, and high carotene.

Tree Detection
Positive/negative histogram of oriented gradients (HOG), crown size, images of palm oil, built-up, bare land, water, and forest.

Vegetative
Total dry matter, plant architecture, leaf length, number of leaves per plant, plant height, height increment (HT), and frond length (FL).

Estimation
Nutrient content and plant height, shell thickness, mantled somaclonal variation together with gene networks for oil biosynthesis, drought and cold tolerance, in vitro regeneration potential, lipase activity, carotene and vitamin E contents, FA composition and iodine value, fruit shell thickness, fatty acid (FA) profile, fruit age, plant life, age factor, average value, daily CPO prices, monthly closing prices of oils, benefits, plant scale index, sex determination, inflorescence abortion, foliar nutrient composition, FFB yield, growth, and respiration.

Others
Tree crowns and categorical features, thermal images, quantitative features, oil per palm, chlorophyll-sensitive wavelengths, and electrical properties of leaves.
Breeding DNA and tissues of the dura, pisifera, tenera; breeding-Deli, AVROS, Yangambi, Calabar, Ekona, Binga, and La Mé. Genotype The review results based on this review will be highly significant given that essential factors are incorporated into the prediction process via big data. However, more research is required to determine if big data can improve the prediction of palm oil performance.

Data-Driven Prediction Model
Data and process-driven models are the most frequently employed algorithms in palm oil forecasting. Several parameters, such as initial conditions and agricultural variables, are required for process-driven models, which entail the extensive accuracy of their systems [9]. Nevertheless, the model exhibits multiple problems due to the uncertainty of breeding analysis and difficulty obtaining necessary data during the simulation. Meanwhile, empirical studies have reported proven success in using Data-Driven Models (DDMs) based on AI and ML [10]. The process requires a training dataset representing the system's behaviour to be fed into a machine-learning algorithm.
The data is then utilised to learn the relationship between the inputs and outputs. Eventually, the trained model can be tested using independent data to determine the generalisability of the new datasets. Accordingly, palm oil forecasts are more accurate if the optimum parameter level is acquired from previous data, though the correct features must first be selected. For instance, clustering can support data discovery and insight rather than relying on domain knowledge-based and unsupervised ML methods. Consequently, we can build a palm oil prediction model using ML, unsupervised and supervised, explained in the following sections.

Prediction of Palm Oil Using Unsupervised ML
Conventional clustering methods are unsupervised, signifying the absence of outcome variables or relationships between the dataset observations [77]. Generally, it is easier to spot patterns and features when the palm oil data are clustered. There are numerous cluster approaches, e.g., hierarchical clustering, comprising Unweighted Pair Group Methods with Arithmetic Mean (UPGMA) and Neighbour-Joining (NJ). These two methods are widely utilised in genotype predictions of palm oil. The UPGMA is a unique high-dimensional data visualisation technique with a quick hierarchical clustering system, which provides a simple approach to constructing the distance matrix of the phylogenetic tree [78]. The method implicitly assumes a constant substitution rate with time and phylogenetic lines. Meanwhile, the NJ method is expressed as an algorithm involving knowledge of the distance between each pair of taxa to shape a tree [79]. The bottom-up (agglomerative) clustering approach creates phylogenetic trees based on DNA or protein sequence data [80]. A recent review of hierarchical clustering in palm oil genotype prediction proposed UPGMA and NJ to cluster the DNA of palm oil [81].
Bayesian network (BN) clustering is a prediction model where the division of items into subsets becomes a probability model parameter for the data. However, this model is subjected to presumptions and clustering references derived from post distribution properties. Notably, NJ and Bayesian methods were effective clustering and model-prediction tools. A study proposed these methods to predict the leaf genotype of the Acrocomia aculeata species [74]. Meanwhile, other clustering algorithms, such as k-means, are used in phenotype predictions of palm oil. K-means clustering is an unsupervised classification technique derived from signal processing. This model seeks to divide n observations into k clusters, where each observation is a cluster prototype of the cluster of the nearest mean. Table 3 indicates that the clustering of phenotype and environment datasets is less, compared to genotype. This result demonstrates that k-means were applied less often to the phenotype dataset.  [36,38]. The prediction models of palm oil growth were previously studied. Hence, this review will focus on these models and other methods such as Genetic Algorithm (GA), Naive Bayes (NB), Random Forest (RF), and Regression and Support Vector Machine (SVM).

Types of Regression
Learning can be achieved using data and regression algorithms in an ML environment to minimise the observed loss or error. Regression is a supervised learning model to predict an output variable based on the known data variables. The objective of the regression process is to predict a continuous quantity from a set of input variables. In other words, a function that accounts for the observed quantity is approximated using relevant variables. Linear Regression (LR) and Multiple Linear Regression (MLR) are standard palm oil yield prediction algorithms. Accordingly, various complex regression algorithms were developed, such as support vector regression (SVR), pairwise regression (PR), principal component regression (PCR), and others.
A commonality among regression-based methods exists because the goal entails the determination of the optimum fit between the two variables. However, the most significant differences between the two methods are the type of image and the input variables used. The LR model depicts the link between one or more independent and dependent variable [91]. Meanwhile, multi-independent variables are more reliable than single ones using the MLR model [92,93], commonly conducted using the least-squares method (LSM).
Furthermore, the generalised cross-validation (GCV) index, which uses the MLR method to account for nonlinearity, can be used to analyse the MARS model [94,95]. A study proposed LR to predict the vegetative components of oil palms [96]. Similarly, Solichin [97] suggested a prediction method for LR, SVR, and MLP combined with a time series model and intelligent nonlinear model. Notably, this approach improves the error caused by time series analysis. Table 4 comprehensively illustrates regression predictions derived from previous studies. A support vector machine is a binary classification system that produces a linear hyperplane for data classification [108]. The hyperplane is intuitively separated, exhibiting the most significant distance to the closest data point in a reasonable margin. Generally, the larger the margin, the less the classification's generalised error. SVM differs from SVR, used to address regression issues through the SVR method [109]. The SVR algorithm aims to find a hyperplane in a space in N dimensions that defines the data points.
Concerning the palm oil growth point, a model of classification of SAR pictures is proposed for the L-band by hybridising SVM and CNN [110]. Subsequently, CNN removes unnecessary images in the first step, and SVM is then used as a preliminary basis for extracting plantation pixels. Chen and Liao [111] suggested a binary SVM classification algorithm using aerial UAV photos to identify the palm oil growth rate. Resultantly, SVM requires fewer samples but produces high prediction accuracy. Table 5 describes SVM prediction based on the previous studies.

Random Forest (RF)
Random forest is widely used in agricultural research to examine organisms' spread and model ecosystem suitability [121,122]. It is a supervised learning algorithm commonly trained with bagging methods and an ensemble of the decision tree. The bagging approach is based on the premise that combining learning models improves the final result. Several studies have investigated the RF algorithm's potential for random agricultural forests [123,124]. The advantages of using this approach include resolving vector collinearity issues, often encountered while using traditional LR models.
Simultaneous and discrete variables could be used in the RF model, exhibiting an advantage over linear regression models [125]. Nevertheless, the RF variants present several drawbacks. For instance, the model predictions will over-ride other than the range of training outcomes. Hence, it can be challenging and unpredictable to estimate palm oil prediction due to the harsh environmental conditions. This restriction is necessary for RF regression in extrapolating the results and the absence of adequate dataset preparation. Furthermore, the constructed RF models presented a below-average yield. This phenomenon may be mitigated by the measurement number and the required training predictors. Recent studies demonstrated the potential of RF regression with long-term algometric variables to forecast palm oil. Table 6 presents RF predictions derived from previous studies.

Deep Learning
Artificial Intelligence (AI) depicts the human brain structure through deep learning, a fundamental part of AI. The primary structure of deep learning, the neural network, is used to process hidden layers to improve learning. However, the role of AI in agriculture is vague, given that it is essentially a transformation of knowledge and labour. The primary reason is that the world is significantly dependent on the diversity of organisms. The agricultural sector plays the most crucial role in preserving biodiversity. The change or difficulty in agriculture's ecological balance directly impacts the human race and its ability to maintain the balance of the ecosystem. As such, the combination of ML and deep learning supports palm oil forecasting, which utilises two methods: artificial neural network (ANN) and time-series.

Artificial Neural Network (ANN)
An ANN is an ML algorithm that can model the nonlinear dynamic input-output relationship [129], comprising three layers: an input layer, a hidden layer, and an output layer [130]. Several variables influence ANN success, including the number of nodes in the hidden layer, the learning intensity, and the training tolerance [95]. In a sequence of iterations, the study rate specifies the sum at which weight changes to obtain the expected value within a reasonable range of the observed value. The conventional ANN is a minimal local issue, whereby an optimised mechanism frequently stops in a local state rather than a global state.
The standard ML models frequently face difficulty in overfitting. The reverse and forward optimisation process, which maximises performance, is conducted within the backpropagation backward propagation learning algorithm. The removal of loss functions that could take place during the reverse propagation can be handled with effective activation functions such as sigmoid. Table 7 describes ANN predictions based on previous studies.

Time-Series
A Recurrent Neural Network (RNN), or LSTM, is a feed-forward network with a backpropagation loop. LSTM provides additional benefits by holding onto the value of the previous output for short periods, serving as a small part of a network's memory that supports feedback analysis. Dynamic time-series models, such as the Autoregressive Integrated Moving Average (ARIMA), are the most widely used time-series approach. Other models, such as the Autoregressive (AR), Moving Average (MA), and Autoregressive Moving Average (ARMA), are subclasses of the ARIMA model. Box and Jenkins suggested a popular version of the ARIMA paradigm, the Seasonal ARIMA (SARIMA), for seasonal time-series forecasting. The ARIMA model's popularity is primarily due to the associated Box-Jenkins methodology for constructing the best model, in addition to its ability to represent a wide range of time-series effortlessly.
The complex variables and combinational inputs are the most common utilisation examples. Vegetative index, weather, and climatic data are used to track the progress of palm oils. Accordingly, LSTM is used to analyse the data for the attribute variables to better understand the time-series data. It yields an estimated 83 per cent of the crop [77]. It produces an estimated accuracy of 83 per cent of palm oil age detection using the Landsat time-series [77]. The LSTM model possesses two layers, one for processing genotypes and another for environmental factors. The two-layered approach provided a more effective process and straightforward suggestions with a value of 8.52 of RMSE [138].
The palm oil's satellite-based SIF was analysed to determine the vegetative index, exhibiting 0.69 of r 2 [139]. The LSTM method combined price and time-series data to better estimate yield. The one feedback connection of the RNN can be directly processed by the one feedback connection data [140]. Furthermore, a low training time indicates that the processing time and storage space efficiency were due to this data type. The predicted palm oil yield suggested 2.7098% of MAPE. Table 8 describes the time-series forecasting based on previous literature.

Other Approaches in Palm Oil Prediction
A hybrid solution incorporates more than one algorithm to boost efficiency. Integrating several ML algorithms will vastly increase the overall outcome of each algorithm by tuning, generalising, or adapting to new tasks. Table 9 shows other approaches to prediction derived from previous findings. Metaheuristic algorithms, such as the Genetic Algorithm (GA), can be used to solve a wide range of optimisation problems. The GA structures comprise two types based on the assumption of a finite solution space: local and global search [145]. The GA strings' populations, or chromosomes, are defined according to an optimisation problem.  A study employed GA to improve the generalisation ability of the Radial Basis Function Neural Network (RBFNN and Fuzzy Radial Basis Function Neural Network (FRBFNN) [157]. Meanwhile, Ibrahim et al. [158] proposed a prediction method of fatty acid and an intelligent nonlinear model to improve the error. Ishola et al. [159] suggested an investigation to determine the palm oil kernel process by improving the error of response surface methodology (RSM). This method includes the adaptive neuro-fuzzy inference system (ANFIS) accuracy when combined with GA.

Analysis and Discussion
The data-driven projects of various categories in a single dataset are a difficult task; thus, they are frequently overlooked. The issue with these projects includes the differing data complexity and its update frequency; hence, distinct preprocessing methods will be used to clean up the data because of the frequency difference. Various factors must be considered in future research. Table 2 shows how various critical factors, such as environment (E), phenotype (P), and genotype (G), can be divided into different categories. These factors presented evidence of unfinished work based on the literature. As of 2011, researchers have been focusing on these factors, which are used to summarise trends between 2015 and 2021, as shown in Figure 4.  Numerous factors, including genetics, climate, and disease detection, were found to affect palm oil yield in this study. The term "phenotypes" refers to the traits that can be seen and measured in a breeding program, such as increased genetic variation in palm oil yield. Another factor that affected the study's results was the selection of reliable data. Genotype, phenotype, and environmental data from oil palm crops are all difficult to analyse because of the high level of complexity. Most researchers use genotype data to gather breed-specific DNA for selection purposes. Additional data such as bunch characteristics, yield component, and vegetative measurement can be used for relative selection in oil palm. Environmental and historical data are interdependent. Data from both sources can be used to improve the process and save money. Figure 5 shows most of the data features from previous studies utilised to predict palm oil. Seventy-four per cent of studies that rely on the climate and soil and fertilisation (twenty-six per cent) are in the environment category. However, it is impossible to indicate the most effective subfeatures for phenotype factors under each feature group. For instance, there are nine subfeatures listed under "phenotype factors" which are utilised extensively, e.g., bunch, fruit, and diseases. Nevertheless, these studies did not specify the optimal set of sub-features under the bunch and yield of palm oil, requiring further research.
The ideal phenotype subfeatures based on various regions should also be investigated. Because of the wide range of existing factors, such as diseases, tree detection, and physical characteristics of palm oil yield, predictions in similar fields should be made over several years [45,105]. Palm oil yield predictions in similar fields should be recorded over several years due to variations in existing factors such as diseases, tree detection, and physical characteristics [45,105]. Palm oil forecasting does not emphasise phenotype data similarly to genotype representation; therefore, redundant features must be eliminated to ensure the predictive model accuracy.  Numerous factors, including genetics, climate, and disease detection, were found to affect palm oil yield in this study. The term "phenotypes" refers to the traits that can be seen and measured in a breeding program, such as increased genetic variation in palm oil yield. Another factor that affected the study's results was the selection of reliable data. Genotype, phenotype, and environmental data from oil palm crops are all difficult to analyse because of the high level of complexity. Most researchers use genotype data to gather breed-specific DNA for selection purposes. Additional data such as bunch characteristics, yield component, and vegetative measurement can be used for relative selection in oil palm. Environmental and historical data are interdependent. Data from both sources can be used to improve the process and save money. Figure 5 shows most of the data features from previous studies utilised to predict palm oil. Seventy-four per cent of studies that rely on the climate and soil and fertilisation (twenty-six per cent) are in the environment category. However, it is impossible to indicate the most effective subfeatures for phenotype factors under each feature group. For instance, there are nine subfeatures listed under "phenotype factors" which are utilised extensively, e.g., bunch, fruit, and diseases. Nevertheless, these studies did not specify the optimal set of sub-features under the bunch and yield of palm oil, requiring further research.
The ideal phenotype subfeatures based on various regions should also be investigated. Because of the wide range of existing factors, such as diseases, tree detection, and physical characteristics of palm oil yield, predictions in similar fields should be made over several years [45,105]. Palm oil yield predictions in similar fields should be recorded over several years due to variations in existing factors such as diseases, tree detection, and physical characteristics [45,105]. Palm oil forecasting does not emphasise phenotype data similarly to genotype representation; therefore, redundant features must be eliminated to ensure the predictive model accuracy.
The insufficient data due to the frequency update would hinder the prediction process. Specific data are updated every minute, hour, or day, depending on the volume. However, sporadic data limit the finding of patterns, which occurs when data are collected only once a month. This phenomenon can also occur in the event of substantial missing data and when dealing with small-sized data. Hence, larger-sized datasets are the suggested means of solving this issue. Oil palm phenotype data are challenging to come by due to a lack of available information. Only a few factors reviewed thus far have fully utilised all three, even though this paper has learned that climates under the meteorological factor are critical. However, when it comes to the category of factors, there have been very few that have used all of them. Besides, genotype environmental and historical information can be found in general datasets, whereas phenotype data come from the statutory body of the Malaysian Palm Oil Board (MPOB) or Felda Global Ventures (FGV), which have their own procedures. Because of the high cost of capturing data in the field of study, phenotypic data are difficult to communicate. Data storage and analysis tools, phenotype reduction, and other laboratory costs are among the challenges to developing an optimal digital phenotype platform [160]. As a result of these concerns, several gaps must be addressed, such as dealing with a wide range of data sizes and implementing the appropriate data-driven method. The insufficient data due to the frequency update would hinder the prediction process. Specific data are updated every minute, hour, or day, depending on the volume. However, sporadic data limit the finding of patterns, which occurs when data are collected only once a month. This phenomenon can also occur in the event of substantial missing data and when dealing with small-sized data. Hence, larger-sized datasets are the suggested means of solving this issue. Oil palm phenotype data are challenging to come by due to a lack of available information. Only a few factors reviewed thus far have fully utilised all three, even though this paper has learned that climates under the meteorological factor are critical. However, when it comes to the category of factors, there have been very few that have used all of them. Besides, genotype environmental and historical information can be found in general datasets, whereas phenotype data come from the statutory body of the Malaysian Palm Oil Board (MPOB) or Felda Global Ventures (FGV), which have their own procedures. Because of the high cost of capturing data in the field of study, phenotypic data are difficult to communicate. Data storage and analysis tools, phenotype reduction, and other laboratory costs are among the challenges to developing an optimal digital phenotype platform [160]. As a result of these concerns, several gaps must be addressed, such as dealing with a wide range of data sizes and implementing the appropriate data-driven method.
When parameter selection is required, techniques for managing and conducting data experiments become complex. In addition, parameter tuning must be performed. More experiments conducted on the data yield superior results. The difficulty in developing techniques for future research is minimising the impact of data inaccuracies within datasets. During testing and training, the first step recommended by this study is to identify and discard low-quality data points [86]. Due to the random selection of features, varying quantities of features deemed too high or too low may result in model-fitting issues and performance fluctuations. According to previous studies, the randomness in the dataset has caused the model to be overfit and forecasting errors to be made [115,126]. Another possible cause of forecasting failure is a lack of suitable feature selection methods for ML and DL. Another challenge is model selection, which requires data. Accuracy in oil palm When parameter selection is required, techniques for managing and conducting data experiments become complex. In addition, parameter tuning must be performed. More experiments conducted on the data yield superior results. The difficulty in developing techniques for future research is minimising the impact of data inaccuracies within datasets. During testing and training, the first step recommended by this study is to identify and discard low-quality data points [86]. Due to the random selection of features, varying quantities of features deemed too high or too low may result in model-fitting issues and performance fluctuations. According to previous studies, the randomness in the dataset has caused the model to be overfit and forecasting errors to be made [115,126]. Another possible cause of forecasting failure is a lack of suitable feature selection methods for ML and DL. Another challenge is model selection, which requires data. Accuracy in oil palm breeding models is achieved through a hybrid method in this study. A model selection problem is presented because it transitions from traditional to modern model [147]. Rather than using inefficient and time-consuming traditional modelling methods to optimise crop yields, farmers are now turning to AI models that are more accurate and efficient. Oil palm model selection is an example of how knowledge can be applied in various ways. Evaluating the impact of different parameters on oil palm yields is possible. The artificial intelligence (AI) method is a form of optimisation based on natural selection and inherited principles [159].
Previous studies have employed various classification and regression algorithms to predict palm oil. Figure 6 shows the prediction algorithms based on the previous findings. The extracted data indicate that regression (24 per cent) is the most used palm oil prediction algorithm, followed by clustering, SVM, time-series, and RF. The second most popular algorithms for prediction the clustering approaches, i.e., NJ, UPGMA, and Bayesian network, focusing on the breeding palm oil prediction. Meanwhile, K-means perceive behaviour on nonlinear variables, though more studies are required to test this claim. The third most preferred algorithm for prediction is SVM, though several flaws must be resolved. The SVM results were more significant than the MSC-PCA hybrid with SVM, SVM-RBF, and SVM-ANN [104,107]. This result implies that the increased SVM performance is attributed to the improved optimisation techniques for a wide range of parameters [114].
Previous studies have employed various classification and regression algorithms to predict palm oil. Figure 6 shows the prediction algorithms based on the previous findings. The extracted data indicate that regression (24 per cent) is the most used palm oil prediction algorithm, followed by clustering, SVM, time-series, and RF. The second most popular algorithms for prediction the clustering approaches, i.e., NJ, UPGMA, and Bayesian network, focusing on the breeding palm oil prediction. Meanwhile, K-means perceive behaviour on nonlinear variables, though more studies are required to test this claim. The third most preferred algorithm for prediction is SVM, though several flaws must be resolved. The SVM results were more significant than the MSC-PCA hybrid with SVM, SVM-RBF, and SVM-ANN [104,107]. This result implies that the increased SVM performance is attributed to the improved optimisation techniques for a wide range of parameters [114]. The SVM provides additional kernel functionality, improving the prediction models through feature recognition. This model possesses a more compact management potential than ANN, specifically in its distribution, geometry, and data overflow. Furthermore, the SVM is theorised based on the principle of reducing structural instability, diminishing the high bond error instead of training error [115]. Moreover, this study found various issues in ANN related to DL algorithms. For example, the ANN output involves several hidden layers and types of activation functions. Alternatively, a different optimisation algorithm can be utilised, optimising hidden layers and the ideal activation function, i.e., a genetic algorithm [161,162]. This substitute procedure may not be the highest performance substitute. The terms "much used" and "best output" are not similar definitions (See Figure  6). The regression and time-series predictably exceeded the ML and DL techniques as per the anticipated palm oil observations. The parameters and goal parameters could be isolated from regression and time-series [147,163].
To date, there have been very few investigations into the RF algorithm's potential for classification and regression analysis in agriculture. Palm oil yield estimation may be difficult and unreliable due to varying environmental conditions in different fields. This constraint may be critical in extrapolating the results for RF regression as well as a result of The SVM provides additional kernel functionality, improving the prediction models through feature recognition. This model possesses a more compact management potential than ANN, specifically in its distribution, geometry, and data overflow. Furthermore, the SVM is theorised based on the principle of reducing structural instability, diminishing the high bond error instead of training error [115]. Moreover, this study found various issues in ANN related to DL algorithms. For example, the ANN output involves several hidden layers and types of activation functions. Alternatively, a different optimisation algorithm can be utilised, optimising hidden layers and the ideal activation function, i.e., a genetic algorithm [161,162]. This substitute procedure may not be the highest performance substitute. The terms "much used" and "best output" are not similar definitions (See Figure 6). The regression and time-series predictably exceeded the ML and DL techniques as per the anticipated palm oil observations. The parameters and goal parameters could be isolated from regression and time-series [147,163].
To date, there have been very few investigations into the RF algorithm's potential for classification and regression analysis in agriculture. Palm oil yield estimation may be difficult and unreliable due to varying environmental conditions in different fields. This constraint may be critical in extrapolating the results for RF regression as well as a result of the lack of sufficient training datasets. The developed RF models overestimated the average yield and underestimated the yield below it. As the number of observations and appropriate predictors for training increase, this issue may be mitigated. According to this study, long-term agrometric variables can be predicted using RF regression. It is clear that the ML and DL techniques, as well as LASSO's ability to isolate dynamic correlations between the variables and the target predictor, outperformed LR in palm oil yield prediction [155].
Both clustering and classifier frameworks are used in the selected articles. Since images are used for clustering in certain articles, the article utilises a numerical dataset in conjunction with machine vision rather than ML. The use of clustering architectures may be investigated in depth in order to identify various opportunities regarding this issue. In time series prediction, determining the window size, lag value, and even the features selected as input for the prediction algorithm is standard practice (i.e., LSTM). It is also common for LSTM to use past data to learn from all the features or input it receives [77]. Due to the prevalence of LSTM algorithms in palm oil forecasting, a focus on DL architectures should be investigated. All DL architectures such as ANN and LSTM were found to be the most frequently used in our review articles that used DL approaches. However, LSTM-CART hybrid DL algorithms [140,144] have also been applied to this problem.
In previous crop and palm oil yield prediction research, a wide variety of performance evaluation metrics have been employed. Accuracy is the most utilised metric in the algorithm for predicting crop and palm oil yield. When palm oil prediction models are evaluated using different performance metrics, it is nearly impossible to compare them. A standard and systematic approach, or one single measure, should be used to quantify the model's palm oil prediction accuracy. It is possible to compare crops and palm oil prediction algorithms if the performance metrics used are the same for all of the algorithms.
Only Hilal et al. [147] used a maximum number of features under the historical yield data, cropland information, and climatic information groups in other previous studies related to palm oil. Palm oil yield has been predicted using only one feature in other studies [30,56,110,140,151]. Using a large number of features in a crop yield prediction model may be more accurate than using a smaller number of features. Studies, such as [98], show that by combining information from WOFOST crop model outputs data, weather data, crop data, soil data, and yield statistics, they were able to accurately predict soft wheat, spring barley, sunflower, grain maize, sugar beets, and potato yields. Data on water, earlyseason weed control, late-season weed control, season-long weed control, temperature, and crop management were used to predict soybean yield by Landau et al. [24]. Soil properties, different types of fertiliser application rate, and yield data from historical yields were used to predict corn yields by Du et al. [21]. As a result, the palm oil yield prediction research should also take into account a large number of features groups.
The efficiency with which palm trees absorb nutrients is directly related to their soil properties. Nutrients like these have a significant impact on palm yield effectiveness [38]. It is possible to estimate palm oil yield by looking at a wide range of soil properties, such as soil electrical conductivity and conductivity to clay and silt, organic carbon content, pH and cation exchange capacity, bulk density, and the percentage of clay, silt, and sand in the soil. The use of remote sensing to examine soil properties in a palm grove could be a promising avenue for future investigation. It is possible to monitor the soil's characteristics from afar because the reflected light has different responses. For example, soil moisture content may be taken into consideration when determining the dielectric characteristics of the spreading wave. It is possible to investigate the relationships between recorded signals and a variety of soil characteristics, including soil texture, soil form, and overall soil structure, in order to develop analytical relationships.
However, the estimated parameters, such as soil moisture, LAI, greenness of the palm canopy, and height, that contribute significantly to the yield prediction should be taken into account. A precise yield forecasting model can be created once the connection is defined. Including historical yield data, various climatic data, vegetation data, fertiliser application information, and other types of satellite data could improve yield forecasting performance. Remote sensing makes it possible to quickly gather data on the ground from a large area. For example, it has been used to monitor the environment and topographic conditions [98,126]. It collects spatial data without any direct contact. According to an increase in demand for oil palm products, numerous studies have been conducted on oil palm plantation mapping using remote sensing.
Mapping a palm plantation accurately can have significant economic and environmental advantages. Traditional ML techniques, classical image processing methods, and deep learning methods can all be used to identify the crowns of trees. RF [125,126] and SVM [111,116] are two of the most commonly used classifiers for tree crown detection in traditional ML approaches. When compared to traditional image processing methods, ML techniques have made significant progress. Image binarization and segmentation are commonly obtained using high-resolution UAV images such as local maximum filter [111]. However, complex situations like overlapping tree crowns may degrade detection results due to the method's unnecessary labels. Researchers using remote sensing images have increasingly turned to DL-based classifiers, which employ multiscale computational methods, in recent studies to detect trees from satellite images; the most advanced studies use deep learning-based classification [144]. Features can be extracted using DL, which is known for its impressive capacity. With Faster RCNN [156], advances in object detection based on pre-trained models have been made to detect palm trees in the plantation area. It is possible to improve the predictive accuracy of classifiers by optimising their input hyper-parameters, which are highly dependent on their accuracy. In order to improve accuracy, hyperparameter optimisation must be used. Its accuracy was superior to other classifiers, such as SVM, CART, NN, and IGSO-RF classifiers that used the IGSO algorithm to fine-tune the parameters of the traditional RF model [125].
Disease image recognition plays a critical role in the development of innovative agriculture. Oil palm diseases have been studied using a wide variety of diagnostic methods. Categories such as SVM [120], as well as ANN [135] and NB [152], are included. In this regard, traditional ML methods have some drawbacks when it comes to monitoring palm oil disease. Furthermore, existing methods are heavily reliant on disease images that are of high quality. As a result of these methods, which include image preprocessing, image segmentation, feature extraction, and classification, as well as significant operations that add complexity and delay implementation, the implementation can be significantly delayed. Training is challenging using traditional ML methods, especially when the training dataset is large. Deep learning and transfer learning, two of the most recent advanced ML techniques, have the potential to aid in the development of oil palm disease recognition.
Oil palm plantations require a lot of monitoring, which takes time and effort.RF [118], ANN [137], and hybrid CNN-SVM [110] are just a few of the ML approaches that have been used to track palm growth via remote sensing in the past. Nevertheless, landowners must extract useful information from remote sensing data. There is a chance that this will lead to solutions via deep learning, transfer learning, and recognition of objects.
Palm oil yield prediction is only indirectly addressed in a few ML-based studies conducted in the industry. So far, there have been five studies examining how well scientists can predict phenotype palm oil yields [56,87,141,150,151]. In light of these studies, it is difficult to determine which algorithm is best for predicting palm oil yields. It is possible to predict palm oil yield with the help of some regression algorithms based on ML (ML). Multiple algorithms should be considered instead of a single algorithm to improve prediction model robustness.
It is a common practice in the palm oil industry to employ SVM models and algorithms. Perhaps the most critical part of this success is regularisation. Outliers and noise are common in agricultural data, making it challenging to analyse. The classifier's generalisation capabilities may be improved through regularisation, which could help solve this issue [114]. The regularised linear SVM classifier outperformed the RBF classifier in this study. For example, the regularisation parameter C and the RBF width using kernel 2 require manual adjustment in SVM. These benefits occurred as a result of the low execution speed. In addition, because the SVM decision rule is a simple linear function in the kernel space, it is stable and has low variance [114]. It is critical to obtain low variability in agricultural data because features are highly dynamic and change over time. SVM's ability to withstand the curse of dimensionality may be the final possible explanation for the problem. Small training sets and high-dimensional feature vectors have allowed SVM to achieve excellent results [114]. In terms of finding a nonlinear pattern in data, the key advantages of RF are its generalisation performance, high speed, and the use of an ensemble of tree-structured classifiers. It has low data preprocessing requirements in training because it is robust regarding to unit differences and can make accurate predictions on sparsely annotated data. The RF algorithm yields better results in the selection of features. Meanwhile, the input hyperparameters play a vital role in RF.
High-dimensional feature vectors can benefit from SVM, one of the best classification algorithms. To solve the problem of high dimensionality, dynamic classifiers can also consider a sequence of feature vectors rather than just one high-dimensional feature vector. Another benefit of using these methods is that they can deal with large datasets. A combination of classifiers may be used to solve this problem because it reduces the variance, especially when there is no stationarity in the dataset. This method is still useful, but it may be outperformed by a classifier that uses a combination of RBF and SVM. Due to the limited number of training datasets, simpler techniques such as LDA may be used. The k-NN algorithms are rarely employed in the palm oil industry due to the curse of dimension. However, k-NN may be superior if only low-dimensional feature vectors are taken into account in agriculture and the palm oil industry, or if no feature vectors are needed. In remote sensing, the ANN is widely used to predict vegetation parameters and crop yields because it can retrieve complex, dynamic, and non-linear patterns from the data. Many libraries and software tools are readily available, making them the most basic form of ML. A few drawbacks exist in practical applications, such as the rate at which neurons are learned, how many neurons are chosen for the hidden layers, and the overfitting issue when using a large training dataset. Large datasets cause the process to slow down. As the number of epochs required increases, backpropagation networks become slower in training. Using a CNN for image data with a large training set is a good option. A leaf image can identify a disease, map an oil palm plantation using remote sensing-based spectral images, and so on. The use of advanced CNN architectures, such as Faster R-CNN, is also highly effective for object recognition-based tasks. It is possible to predict palm oil yield and oil palm price using regression-based algorithms such as RF, ANN, and SVR. In addition, to improve the predictability of the model, an ensemble of multiple algorithms should be investigated rather than a single algorithm.
In DL, even though LSTM appears to be the best method in general, LSTM has primarily been applied to data pertaining to palm oil age detection, environment, as well as price prediction. Palm oil yield requires additional research. It is also worth noting that there are several possible reasons for the LSTM's fluctuating performance, including the number of indicators used and the method's limitation. According to [59,77], while the sequence-to-sequence architecture of LSTM can store long-term memories, its internal representation is limited to a fixed length. Internal representations or input sequences of LSTMs have a fixed length because information is divided into small pieces for easier recall. Various random weight initialisations impact LSTMs, and as a result, they behave in a manner similar to that of a feed-forward neural network. As a result, many unresolved issues must be addressed to improve or resolve the issue at hand. LSTM feature engineering and a basic grasp of the phenotypic factors of palm oil data can enhance performance in this area. The current performance of LSTM is expected to improve with additional parameter tuning and additional learning methods.
The DL is a subset of ML, commonly believed to be significantly persuasive for palm oil prediction. However, the only difference between ML and DL is that the latter is inefficient for a limited training dataset; thus, fundamental components are automatically removed from this dataset. The attributes of other samples must be manually removed to preclude the use of successful DL. Hence, detailed studies must be conducted on the use of DL strategies in palm oil, and there is still room to delve into the performance of algorithms. The research has shown that DL usage in forecast time-series is superior to basic ML in average performance; hence, future studies can focus on this idea. ML data-driven models were unable to extract features from multifactor timing data efficiently. According to this study, most models did not reflect the data's phenotypic characteristics. Time-series has shown to outperform other forecasting methods with minor forecasting errors.

Identifying Palm Oil Prediction Framework
In the light of previous research on palm oil prediction, we have proposed a framework for predicting palm oil yields in the future. Figure 7 depicts the prediction of palm oil yield as a prospective framework. In an effort to accurately predict palm oil, a diverse range of data must be gathered, including genotype, breeding, fruit, yield, and bunch. Others include soil properties, climate, vegetation indexes, diseases, previous yield records, fertilisation, and UAV monitoring data. Furthermore, data must be preprocessed after collection to further analyse the information. Once the data has been preprocessed, the entire dataset is divided into a training and testing set.
preclude the use of successful DL. Hence, detailed studies must be conducted on the use of DL strategies in palm oil, and there is still room to delve into the performance of algorithms. The research has shown that DL usage in forecast time-series is superior to basic ML in average performance; hence, future studies can focus on this idea. ML data-driven models were unable to extract features from multifactor timing data efficiently. According to this study, most models did not reflect the data's phenotypic characteristics. Time-series has shown to outperform other forecasting methods with minor forecasting errors.

Identifying Palm Oil Prediction Framework
In the light of previous research on palm oil prediction, we have proposed a framework for predicting palm oil yields in the future. Figure 7 depicts the prediction of palm oil yield as a prospective framework. In an effort to accurately predict palm oil, a diverse range of data must be gathered, including genotype, breeding, fruit, yield, and bunch. Others include soil properties, climate, vegetation indexes, diseases, previous yield records, fertilisation, and UAV monitoring data. Furthermore, data must be preprocessed after collection to further analyse the information. Once the data has been preprocessed, the entire dataset is divided into a training and testing set.  A training dataset is used to build a prediction model, which is trained using a variety of ML-based regression and classification algorithms. Moreover, parameter optimisation is used to improve trained models when they fail to meet expectations, and the testing of the models is conducted after achieving the required performance. The critical factors affecting palm oil prediction include disease recognition and management, fruit and bunch, palm oil breeding, growth, and nutrition level monitoring. Accordingly, this study combines the prediction model output with palm oil variables to accurately predict palm oil yield.

Conclusions
This study examined existing research on ML in palm oil conducted over the last decade, categorised into unsupervised, supervised, and deep learning analyses. The critical objective of this study aimed to identify the breeding prediction of oil palms derived from the clustering category. Furthermore, the study incorporated regression analyses, including determining FFB yield, tree detection, and climate. The findings from the research were primarily focused on tree detection, the observation of palm oil plantation areas, and identifying the ripeness of fruit bunches. However, models for forecasting disease were not widely used, i.e., in fruit yield, oil production, and breeding.
Notably, SVM, RF, LR, and SVR are the most promising conventional machine learning architectures. Additionally, DL models such as LSTM and ANN are used to estimate crop yields in the field. This selection has been made based on most of the algorithms used in previous studies, hybrid modelling, comparison, and performance evaluation. Some feature selection algorithms and performing ensemble methods identify and treat missing and outliers' values, computational complexity, and model fit, as well as outperforming models, and should be examined comprehensively in order to determine the best model. Research on palm oil yield prediction is scarce, according to the findings of the current review. In addition, very few feature sets and ensemble methods have been used in the existing palm oil yield prediction studies, resulting in large discrepancies between the predicted and actual palm oil yield. When it comes to palm oil yield predictions, it is too early to speculate on the best set of features, models to use, and the appropriate ensemble models. As a result, more studies involving a wide range of features and prediction algorithms are needed. Extensive research on crop yield prediction and palm oil yield prediction is expected to be based on this paper.
Nevertheless, most of the data used in palm oil research were climatic from various sources. For example, identifying suitable land requires various factors that must be considered, e.g., soil classification, automated pest and weed detection, and optimising fertiliser use. This list of factors can be extended to identifying the symptoms of sunlight and water limitations in palm oil crops and seed assessment.
The findings from this review highlight current palm oil and ML research from various angles. Innovative ML practices, such as big data analytics and automated information extraction, support the advancement of knowledge-based palm oil. However, research trends indicate that existing MLAs and techniques were not adequately coupled to support effective decision-making systems compared to other ML application domains. Thus, the current research is insufficient to design practical tools that increase yields, quality, and plantation sustainability. A vivid idea could be presented on how ML can enhance palm oil development and inspire researchers to find relevant solutions in this area. Overall, this article may support the palm oil industry's automation and intelligence development. However, future work should include a technical review of other ML models used in palm oil agriculture.