Stacking-Based Ensemble Learning Method for Multi-Spectral Image Classification

Abstract: High dimensionality, the Hughes phenomenon, the spatial resolution of image data, and the presence of mixed pixels are the main challenges in multi-spectral image classification. Most classical machine learning algorithms struggle to achieve optimal classification performance on multi-spectral image data. In this study, we propose a stack-based ensemble learning approach to optimize image classification performance. In addition, we integrate the proposed ensemble learning with the XGBoost method to further improve its classification accuracy. For the experiment, Landsat image data were acquired for Bishoftu town, located in the Oromia region of Ethiopia. The main objective of the study was to assess the performance of land cover and land use analysis using multi-spectral image data. Results from our experiment indicate that the proposed ensemble learning method outperforms each of the strong base classifiers, reaching 99.96% classification accuracy.


Introduction
Ethiopia is the second most populous country in Africa, after Nigeria. According to United Nations reports, more than 85% of the population depends primarily on agriculture for its economic activity and livelihood [1,2]. The agriculture sector is still based on subsistence farming using traditional methods such as ox-drawn ploughs, with very little mechanization [3,4]. For thousands of years, agriculture in Ethiopia has been practiced using traditional methods and tools. In addition, due to recurrent droughts, the country has been unable to feed a portion of its burgeoning population [5], now estimated at 110 million. A further limitation of the agricultural sector is the lack of decision support systems for land-cover analysis, disease monitoring, yield prediction, and efficient weather monitoring.
On the other hand, the current state of the art in the agricultural domain requires the application of advanced technology to improve production and productivity. In this regard, Mahmoud A. and colleague [6] describe precision agriculture as an approach that uses information technology to improve the quality of crops and increase yields. Spyridon, N. and his colleague [7] define precision agriculture as maximizing yields using minimal resources such as water, fertilizer, pesticides, and seeds. Similarly, Alaa, A. and his colleague [8] define precision agriculture as a farm management system that uses information technology to identify, analyze, and manage the variability of fields to ensure profitability, sustainability, and protection of the environment. In addition, computer vision (CV) [9][10][11] and machine learning models play a significant role in determining soil properties. According to Karlheinz Knickel [3], since the 1950s and 1960s, agricultural modernization in industrial countries has entrenched a form of agriculture that is capital-intensive, high-input, high-output, specialized, and rationalized. Precision agriculture is a technology that can enhance the farming process and help achieve sustainable agricultural output [8]. Recently, AI and machine learning have simplified the complex processes of data collection, data processing, data interpretation, and decision-making in the agriculture sector. Similarly, advances in remote sensing technologies [12] have significantly shifted the trends of agricultural practice. A large number of studies report that remote sensing technologies are predominantly used to enhance production quality and to monitor crop health.
Because of the rich information they contain, the dataset in this study consists of multispectral images. The research community classifies remote sensing data into multi-spectral and hyper-spectral image data. The only difference between the two data types is the number of bands employed to solve the problem at hand. Knickel and his colleague [3] defined remote sensing as a method of extracting or acquiring relevant information about the earth's surface [13] by analyzing the extracted hyperspectral features. Hyperspectral image data comprises multiple bands [4,12,14], each sensitive to a very narrow wavelength range along the electromagnetic spectrum [12,15]. Hyperspectral imaging is preferred because of its cost-effectiveness [16], its non-destructive measurement of the biophysical and biochemical properties of objects on the surface, and the ease of analyzing image data in real time. The advantage is even more pronounced when one considers that processing handcrafted image data is a time-consuming endeavor and that obtaining high-quality images is very expensive.
In this study, we propose and test an ensemble-based machine learning approach to classify remotely sensed image data collected around Bishoftu, located in the Oromia Region of Ethiopia. The area is rich in biodiversity, and large portions of the land are covered with cereal crops. Our main objective was to use multispectral images to analyze land cover [10] and land use. Applying multi-spectral image data in the agriculture domain allows us to address the challenges of traditional data collection, interpretation, and analysis. In addition, such a system reduces the time and effort that domain experts need to process large image datasets quickly and efficiently.
The proposed stack-based ensemble learning approach makes the following contributions to multi-spectral image processing for land-use and land-cover analysis:
• Automatic spatial-spectral feature extraction, to resolve the challenges of manual region of interest (ROI) labeling.
• Optimized classification performance of the proposed ensemble-based machine learning model. Most GIS tools have a built-in classical machine learning model to classify land use and land cover. However, due to the complex nature of multi-spectral image data, these models suffer from several limitations and fail to reach optimal classification performance.
• An ensemble learning design that addresses the bias-variance trade-off, which is a limitation of many classical machine learning models. Many classical models are sensitive: small changes in the training data bring about a significant change in the performance of the classifiers.

Review of Related Works
A self-trained ensemble with a semi-supervised support vector machine (SVM) for pixel classification of remote sensing imagery was proposed by Maulik and Chakraborty [17]. The ensemble is based on applying the margin maximization principle to both labeled and unlabeled data. In the semi-supervised support vector machine (SS-SVM) approach, the classifier uses majority voting [18] to assign a pixel to its category. Maulik and Chakraborty recommend using the Mahalanobis distance to query correlated points from the unlabeled database when designing self-trained methods. The Mahalanobis distance is an effective multivariate distance metric that measures the distance between a point and the mean of a distribution. It is useful in multivariate anomaly detection, classification on highly imbalanced datasets, and one-class classification. Another useful approach is the rotation-based SVM (RoSVM) ensemble for the classification of hyperspectral data with limited training samples. The basic idea of RoSVM is to generate diverse SVM classification results using random feature selection and data transformation (PCA and projection). This approach can enhance both individual accuracy and diversity within the ensemble. Its main weakness is a higher computational complexity than SVM and RSSVM [19]. A fast learning method recommended by the authors, which combines SVMs and a multiple classifier system (MCS) for the classification of hyperspectral data with a semi-supervised approach, is another effective way to deal with limited training samples and with RoSVM's drawback. Similarly, Fang and his colleague [18] proposed the adaptive Rotation Forest (RoF) model to handle the class imbalance that arises from limited training samples.
Fei Lv [20] proposed an extreme learning machine with an auto-encoder to address the poor classification of HSI data caused by inadequate labeled training samples.
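The Mahalanobis distance mentioned above is $d_M(x) = \sqrt{(x - \mu)^{\top} \Sigma^{-1} (x - \mu)}$, where $\mu$ and $\Sigma$ are the mean and covariance of the reference distribution. The following is a minimal sketch for two-dimensional points only; the function names and the toy points are illustrative assumptions and are not taken from [17].

```python
import math

def mean_vector(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def covariance_2d(points):
    """Sample covariance matrix (2x2) of a list of 2-D points."""
    mx, my = mean_vector(points)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def mahalanobis_2d(point, points):
    """Distance from `point` to the mean of `points`, scaled by their covariance."""
    mx, my = mean_vector(points)
    (a, b), (c, d) = covariance_2d(points)
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    q = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(q)

# Toy reference sample (hypothetical values, for illustration only).
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(mahalanobis_2d((1.0, 1.0), reference))
print(mahalanobis_2d((3.0, 1.0), reference))
```

Unlike the Euclidean distance, this measure rescales each direction by the spread of the reference sample, which is why it suits querying points that are correlated with an existing cluster.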

High Dimensional Hyperspectral Image Data
Hyperspectral image data contains hundreds of data channels with a large and varied number of features. Consequently, dimensionality is the most striking challenge when processing such images. Reducing the number of dimensions, in turn, can eliminate relevant features from the image data. The dimensionality challenge [21,22] gives rise to the Hughes phenomenon, a well-known problem in classification: with a fixed number of training samples, adding features improves a classifier's performance only until an optimal number of features is reached, after which performance degrades. Most conventional machine learning models handle this phenomenon poorly, which leads to poor performance. Ceamanos and his colleagues [23] proposed a fusion approach to handle high dimensionality. The authors decomposed a large image dataset into sub-samples [24,25] and trained a standard SVM model on each. The prediction outputs of the models were then fused using another SVM. Similarly, Mohamed [26] combined the SVM model with a bagging technique to handle the n-dimensional hyperspectral dataset. The study attempted to minimize prediction variance and thus improve overall accuracy. On the other hand, Xia and his colleague [27] proposed a novel ensemble-based canonical correlation forest to address the challenges associated with high dimensionality. The authors applied principal component analysis (PCA) to each subset to transform the input data into a new feature space. The final classification results were then determined from the predictions of the individual canonical correlation trees (CCTs) using a majority voting rule. However, it is not clear why [27] used the majority voting rule to produce the final classification result. It is clear, on the other hand, that the curse of dimensionality degrades classification accuracy. To tackle this challenge, Juang, Ch. X. and his colleagues [28] proposed a fuzzy C-means based support vector machine (SVM-fuzzy C-means). The SVM was used for band selection, whereas the C-means method was used to build an ensemble of clustering maps. The Markov Fisher selector was used to minimize computational complexity during the clustering process. They used a majority voting technique and Markov random field theory to fuse the final model. Similarly, Samat and colleagues [29] proposed an ensemble extreme learning method to resolve high-dimensionality problems. Bagging-based and adaptive boosting (AdaBoost) based ensemble schemes were used to conduct experiments on Reflective Optics System Imaging Spectrometer (ROSIS) and Airborne Visible Infrared Imaging Spectrometer (AVIRIS) [30] data. According to the authors, differential and non-differential parameters with a kernel-based activation function can be used to improve the classification performance of the proposed model. Handling the dimensionality issue therefore significantly improves classification performance.
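Several of the ensembles reviewed above fuse their members with a majority voting rule. A minimal sketch of that rule is given below. The tie-breaking choice (the label that appears first wins) and the toy land-cover labels are assumptions made for illustration, not details reported by the cited studies.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

def fuse(per_model_predictions):
    """Fuse a list of per-model label lists into one label per sample."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]

# Three hypothetical ensemble members labelling the same three pixels.
member_outputs = [
    ["water", "crop", "bare"],
    ["water", "bare", "bare"],
    ["crop", "crop", "bare"],
]
print(fuse(member_outputs))
```

Majority voting treats every member as equally reliable, which is one reason a learned combiner such as a meta-classifier can be preferable when the members differ in accuracy.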

Feature Extraction
Another challenge in multi-spectral image processing is extracting the features used to train a machine learning model. Manual feature extraction is resource-intensive and time-consuming compared to other state-of-the-art methodologies. Many attempts have been made to tackle this limitation. Merentitis and his colleague used principal component analysis (PCA) [15] for dimension reduction and feature extraction (FE) techniques for massive datasets. Some authors argue that these techniques have a limitation: they ignore the correlation between neighboring features or pixels. Similarly, Parshakov and his colleague [31] proposed a Spectral Angle Mapper technique that maps clusters of pixel values instead of individual pixels. A unified framework was also proposed by Merentitis [32] to address the bias-variance decomposition. Merentitis introduced MNF transformation, blind unmixing and derivation of abundance maps (AMs), non-parametric weighted feature extraction (NWFE), and synthetic feature methods to resolve the trade-off. On the other hand, Zhong and Liu [19] proposed a binary classifier, or dichotomizer, technique to separate subsets of classes. Another important concern in multi-spectral images is combining spectral and spatial features. To resolve this challenge, Chen and his colleague proposed an approach that combines Gabor filtering, a kernel-based extreme learning machine (KELM) classifier, and multihypothesis (MH) prediction, and reported superior results compared to pixel-wise classification. Similarly, Ergul and Bilgin [33] combined spatial circle-neighborhood information with a semi-supervised classifier approach [34] to extract relevant features from image data. Random Multi-Graphs (RMGs) for a spectral and spatial classification framework and a matrix-based spatial-spectral feature representation were used by Hang et al. [34]. Likewise, Shaohui Mei and his colleague proposed a CNN method to extract the relevant features used to train a deep stacked neural network (DSNN) model. Shaohui applied a fusion approach to concatenate the extracted features. Pan and colleague [35] proposed hierarchical guidance filtering based ensemble learning to integrate the spatial and spectral features of HSI data. In addition, an adaptive boosting (AdaBoost) approach was used by Chenming Qi and his colleague [36] to minimize redundancy and maximize relevance in the feature extraction process. The authors argue that a mutual information based ensemble learning classifier outperforms similar classifiers on multi-class classification problems. Hundreds of feature extraction techniques are available. Machine learning researchers and practitioners are expected to critically assess which feature extraction approaches are appropriate for tackling model over-fitting.
A machine learning model that generalizes well depends mainly on the richness of the training dataset. In this regard, the size of the training dataset can cause the model to either under-fit or over-fit on the testing or validation data. Preparing adequate, representative datasets is an important consideration when designing machine learning models. The research community has attempted to handle the limitation of small training datasets in multi-spectral image processing. Li and his colleagues [37] argue that classification performance suffers when a model is trained on limited labeled samples [19]. In multi-spectral image classification, training samples need to be chosen carefully to reduce model over-fitting. According to Maulik, Ujjwal [17], one possible solution to limited training samples is to apply a semi-supervised approach to classify multi-spectral image data. In a semi-supervised approach, one combines [38] limited labeled samples [39] with a large number of unlabeled samples to exploit the abundance of unlabeled data. Similarly, Ramzi and his colleague [40] highlighted that, for a broad range of spectral bands, it is often difficult to find a sufficient number of training samples. In addition, Fang and colleague [18] proposed a multi-scale CNN to extract features from unlabeled data. The multi-scale classifier is used to assign a label to each unlabeled sample. A majority voting scheme is used to select unlabeled images that pass a predefined threshold. Finally, the selected sample instances are added to the original training data for use in subsequent training iterations. In addition, image quality is a major concern in multi-spectral image classification. To tackle this pitfall, the research community has proposed different approaches such as denoising [25,41,42], feature reconstruction, super-resolution methodologies, and image recovery techniques.
On the other hand, Xia and his colleague [43] proposed a Random Forest ensemble in which extended multi-extinction profiles are used to improve classification performance. They used different boosting strategies to construct the ensemble model. From the review of related works, it is apparent that near-optimal classification performance is difficult to achieve without adequate training samples. In the current study, our main objective was to mitigate the bias-variance challenges of the models and to enhance classification performance. The proposed stack-based ensemble learning model handles complex multi-spectral image data well.

Stack-Based Ensemble Learning Model
Stacking is an ensemble learning technique that combines multiple base classification models via a meta-classifier. This approach combines multiple conventional classifiers to build one generalized machine learning model. First, different base learning models $L_1, \ldots, L_N$ are trained on the same dataset $S$, which consists of examples $s_i = (x_i, y_i)$, i.e., pairs of a feature vector $x_i$ and its corresponding target class $y_i$. To generate a training set for learning the meta-level classifier, a leave-one-out or a cross-validation procedure is applied. The proposed stack-based ensemble learning model is presented in Figure 1. Note: d.pv represents the input-dimension pixel values, L1 represents the new array of image data constructed from the outputs of the first-level classifiers, and pm1 to pm4 represent the prediction outputs of the four first-level classifiers, respectively.
The improved version of the stacking algorithm is presented in Figure 2 and can be summarized as follows. From the raster image data, about 65,000 features are used as input dimension vectors to train the first-level classifiers. First, we trained $h$ first-level classifiers with input dimension $D$. In the current study, we used four first-level classifiers to fit the training data. From the prediction output of each first-level classifier, we generated a new training dataset $x_1$. The meta-classifier was trained using $x_1$ to fit the stacking model $S(x) = (h_1(x), h_2(x), \ldots, h_r(x))$. To mitigate over-fitting, we applied cross-validation and split the training data into 10 folds. Let $\kappa: \{1, \ldots, N\} \mapsto \{1, \ldots, K\}$ be an index function that indicates the partition to which observation $i$ is allocated by the randomization.
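The procedure above can be sketched as follows: the index function assigns every observation to a fold, each base learner predicts only the fold it did not see, and the meta-classifier is trained on those out-of-fold predictions. This is a simplified illustration under stated assumptions and is not the authors' R implementation: the two toy base learners, the lookup-table meta-classifier, the function names, and the toy data are all hypothetical stand-ins.

```python
import random
from collections import Counter, defaultdict

def fold_index(n, k, seed=0):
    """Index function kappa: maps each observation i to a fold in 0..k-1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    kappa = [0] * n
    for position, i in enumerate(idx):
        kappa[i] = position % k
    return kappa

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

class NearestCentroid:
    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.centroids = {c: [sum(col) / len(rows) for col in zip(*rows)]
                          for c, rows in groups.items()}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))
                for x in X]

class OneNearestNeighbor:
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        return [self.y[min(range(len(self.X)), key=lambda i: sq_dist(x, self.X[i]))]
                for x in X]

class LookupMeta:
    """Meta-classifier: most common true label per tuple of base predictions."""
    def fit(self, Z, y):
        table = defaultdict(Counter)
        for z, label in zip(Z, y):
            table[tuple(z)][label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in table.items()}
        return self

    def predict(self, Z):
        # Unseen combinations fall back to a majority vote of the base outputs.
        return [self.table.get(tuple(z), Counter(z).most_common(1)[0][0]) for z in Z]

def fit_stack(X, y, base_factories, k=10, seed=0):
    n = len(X)
    kappa = fold_index(n, k, seed)
    Z = [[None] * len(base_factories) for _ in range(n)]  # out-of-fold meta-features
    for fold in range(k):
        train = [i for i in range(n) if kappa[i] != fold]
        held = [i for i in range(n) if kappa[i] == fold]
        for j, make in enumerate(base_factories):
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            for i, p in zip(held, model.predict([X[i] for i in held])):
                Z[i][j] = p
    meta = LookupMeta().fit(Z, y)
    bases = [make().fit(X, y) for make in base_factories]  # refit on all data
    return bases, meta

def predict_stack(bases, meta, X):
    Z = list(zip(*(b.predict(X) for b in bases)))
    return meta.predict(Z)

# Toy two-class data (hypothetical pixel features, not the study's data).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4),
     (5.0, 5.0), (5.1, 5.2), (5.2, 4.9), (4.8, 5.1), (5.3, 5.3)]
y = ["bare"] * 5 + ["crop"] * 5
bases, meta = fit_stack(X, y, [NearestCentroid, OneNearestNeighbor], k=5)
print(predict_stack(bases, meta, [(0.1, 0.1), (5.0, 5.1)]))
```

Training the meta-classifier on out-of-fold predictions, rather than on predictions for data the base learners have already seen, is what keeps the second level from simply inheriting the first level's over-fitting.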

Experiment Results and Discussion
Datasets
For the experiment, Landsat image data were collected from Yerer Selassie, located near the town of Bishoftu in the Oromia region of Ethiopia. Yerer is located in the East Shewa zone in the Great Rift Valley at 38°57′40.863″ E and 8°50′55.016″ N. Yerer Selassie is bordered on the south by Dugda Bora, on the west by the West Shewa Zone, on the northwest by the town of Akaki, on the northeast by the district of Gimbichu, and on the east by Lome. Altitudes in this district range from 1500 to over 2000 m above sea level. Three images of this location were collected at different times.
During data pre-processing, we applied all the required corrections: determining the spatial resolution of the Operational Land Imager (OLI) imagery, top-of-atmosphere reflectance conversion, radiometric correction, topographic correction, and generation of a false color composite (FCC). Four bands, namely the red, blue, green, and near-infrared bands, were selected to extract the relevant features. Layer stacking was performed to combine the four selected spectral bands into a single stacked image. At this stage, one can use different false color combinations to make the visualization of the stacked image appear more natural. Figure 3 shows the image stacking process we used.
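Layer stacking can be illustrated with a minimal sketch that combines per-band rasters into one raster of per-pixel vectors and then reorders the channels for display. The band names, the tiny 2 x 2 rasters, and the near-infrared/red/green display order are assumptions chosen for illustration; the study itself performed this step with R scripts on the Landsat bands.

```python
def stack_bands(bands):
    """Combine equally sized 2-D band rasters into one raster of per-pixel vectors."""
    names = list(bands)
    rows, cols = len(bands[names[0]]), len(bands[names[0]][0])
    for name in names:
        if len(bands[name]) != rows or any(len(r) != cols for r in bands[name]):
            raise ValueError(f"band {name!r} has a different shape")
    return [[tuple(bands[n][r][c] for n in names) for c in range(cols)]
            for r in range(rows)]

def false_color(stacked, names, order=("nir", "red", "green")):
    """Reorder each pixel vector into display channels for a false color composite."""
    pos = [names.index(n) for n in order]
    return [[tuple(px[p] for p in pos) for px in row] for row in stacked]

# Hypothetical 2 x 2 rasters, one per selected band.
bands = {
    "blue": [[1, 2], [3, 4]],
    "green": [[5, 6], [7, 8]],
    "red": [[9, 10], [11, 12]],
    "nir": [[13, 14], [15, 16]],
}
stacked = stack_bands(bands)
fcc = false_color(stacked, list(bands))
print(stacked[0][0], fcc[1][1])
```

Each pixel of the stacked raster then carries one value per band, which is the feature vector passed to the classifiers.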
The stacked multi-spectral image was used as the input dataset to train the proposed model. It has a dimension of 372 nrow * 200 ncol, a 30 m spatial resolution, and a total of 65,400 features. A sample of 147 spatial polygons was labeled from the raster image data as the training dataset. The distribution provides a parameterized mathematical function that can be used to calculate the probability of any individual observation from the sample space. On this basis, the training datasets are almost evenly distributed. Figure 4 illustrates the distribution of training samples on each layer stack. The dataset was then split: 70% was used for training and the remaining 30% for model testing.
The next step was to apply multiple base classifiers to build our classification models. We selected the RF, SVM, KNN, and XGBoost models as base classifiers. All the base classifiers used nsample = 103, predictor = 3, and nclass = 4, and the bootstrapped sampling technique was used to select the sample dataset. The samples were constructed by drawing observations from a large data sample one at a time and returning them to the data sample after they had been chosen. This allows a given observation to be included in a given small sample more than once.
In this sub-section, we briefly discuss the nature of the individual base classifiers. First, the support vector machine is one of the strongest machine learning algorithms, provided that non-linearity is properly handled. We used the RBF kernel function to fit the model. The radial kernel function has the form
$$K(x_i, x_{i'}) = \exp\left(-\gamma \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2\right),$$
where $\gamma$ is a tuning parameter that accounts for the smoothness of the decision boundary and controls the variance of the model. If $\gamma$ is very large, we get quite fluctuating and wiggly decision boundaries, which lead to high variance and over-fitting. If $\gamma$ is small, the decision boundary is smoother and has low variance.
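The effect of $\gamma$ on the radial kernel can be checked numerically with a short sketch. The function name and the two $\gamma$ values are chosen only for illustration; the second value reuses the 1.8 reported later for the tuning parameter r, on the assumption that r plays the role of the kernel width parameter.

```python
import math

def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

pixel_a = (0.0, 0.0)
pixel_b = (1.0, 1.0)

# A small gamma keeps distant points similar (smooth boundary);
# a large gamma makes similarity decay quickly (wiggly boundary).
for gamma in (0.1, 1.8):
    print(gamma, rbf_kernel(pixel_a, pixel_b, gamma))
```

With a large $\gamma$, only training points very close to a query contribute to the decision function, which is the mechanism behind the high-variance behavior described above.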
The support vector machine decision function then becomes
$$f(x) = \beta_0 + \sum_{i \in S} \alpha_i K(x, x_i),$$
where $S$ is the set of support vectors.
The second base classifier is the random forest (RF) classifier, which is an ensemble learner by its very nature. It is a classifier comprising a number of decision trees built on various subsets of the given observations, and it aggregates their outputs to improve predictive accuracy. The output chosen by the majority of the decision trees becomes the final output of the random forest.
The third base classifier is the KNN algorithm, which stores all the available data and classifies a new data point based on similarity. That is, when new data appears, it can be assigned to a category using the KNN algorithm. KNN is a non-parametric algorithm, which means it does not make any assumptions about the underlying data. There is no particular way to determine the best value for K, so we need to try several values to find the best-performing one. The most commonly preferred value for K is 5.
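The KNN rule described above, together with the trial-and-error search for K, can be sketched as follows. The Euclidean distance, the candidate values of K, and the toy points are assumptions made for illustration; the sketch is not the configuration used in the experiment.

```python
import math
from collections import Counter

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train_X, train_y, query, k=5, distance=euclidean):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def choose_k(train_X, train_y, val_X, val_y, candidates=(1, 3, 5, 7)):
    """Pick the K with the highest validation accuracy (first best wins on ties)."""
    def accuracy(k):
        hits = sum(knn_predict(train_X, train_y, x, k) == t
                   for x, t in zip(val_X, val_y))
        return hits / len(val_X)
    return max(candidates, key=accuracy)

# Hypothetical two-class training points.
train_X = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (4.0, 4.0), (4.2, 3.8), (3.9, 4.3)]
train_y = ["water", "water", "water", "crop", "crop", "crop"]

print(knn_predict(train_X, train_y, (0.5, 0.5), k=5))
print(choose_k(train_X, train_y, [(0.2, 0.1), (4.1, 3.9)], ["water", "crop"]))
```

Passing a different `distance` function is enough to compare alternative distance metrics under the same voting rule.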
A typical choice of distance function for KNN is the Euclidean distance,
$$d(x, y) = \sqrt{\sum_{j=1}^{p} (x_j - y_j)^2},$$
where $x$ and $y$ are two points in the sample space. Optimal classification performance was obtained at K = 5. Table 1 summarizes the experiment results and the classification performance of each base classifier on the training datasets. In general, each classifier performs very well on the training data. In the case of KNN, new samples were classified by a majority vote of their neighbors, with each case assigned to the class most common among its K nearest neighbors as measured by a distance function. There is no standard way to set the value of K; it takes several rounds of trial and error to find the optimal value. We also evaluated different distance metrics to find one that fits the problem domain.
Similarly, SVM needs to address non-linearity by finding the best-fitting kernel function. The idea behind generating non-linear decision boundaries is to apply non-linear transformations to the features $X_i$, which map them into a higher-dimensional space. In our case, the non-linear decision boundary was obtained with the tuning parameter values c = 1 and r = 1.8 and a number of support vectors. Results of the experiment show that the number of predictors affects the classification performance of each base classifier: as the number of predictors increases or decreases, the performance of the model also varies.
The last base classifier was the XGBoost method [37][38][39]. Assume that our image dataset is $D = \{(x_i, y_i) : i = 1, \ldots, n\}$ with $x_i \in \mathbb{R}^m$ and $y_i \in \mathbb{R}$, so that we have $n$ observations with $m$ dimensions each and a target class $y$. Let $\hat{y}_i$ be the result given by the ensemble, represented by the generalized model as follows [43]:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i),$$
where $f_k$ is a regression tree, and $f_k(x_i)$ represents the score given by the $k$-th tree to the $i$-th observation in the data.
In order to define the functions $f_k$, the following regularized objective function should be minimized:
$$\mathrm{Obj} = \sum_{i=1}^{n} L(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k),$$
where $L$ is the loss function. In order to prevent the model from getting too large and complex, the penalty term $\Omega$ is included as follows:
$$\Omega(f_k) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2,$$
where $\gamma$ and $\lambda$ are parameters controlling the penalty for the number of leaves $T$ and the magnitude of the leaf weights $w$, respectively. The purpose of $\Omega(f_k)$ is to prevent over-fitting while simplifying the models produced by this algorithm. Note: the final values used for the model were nrounds = 50, max_depth = 1, eta = 0.3, gamma = 0, colsample_bytree = 0.8, min_child_weight = 1, and subsample = 0.75.
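The additive model and its penalty term can be made concrete with a small sketch that evaluates the regularized objective for a fixed set of depth-1 trees (stumps), matching max_depth = 1. The stump representation, the squared-error loss, the value lam = 1.0, and all numbers are illustrative assumptions. The sketch only evaluates the objective; it does not implement the XGBoost training procedure or the library's API.

```python
def stump(feature, threshold, left_weight, right_weight):
    """A depth-1 regression tree with two leaves and their weights."""
    return {"feature": feature, "threshold": threshold,
            "weights": (left_weight, right_weight)}

def tree_score(tree, x):
    """Score f_k(x): the weight of the leaf that x falls into."""
    left, right = tree["weights"]
    return left if x[tree["feature"]] < tree["threshold"] else right

def ensemble_score(trees, x):
    """Additive prediction: the sum of the scores of all trees."""
    return sum(tree_score(t, x) for t in trees)

def omega(tree, gamma=0.0, lam=1.0):
    """Penalty gamma * T + 0.5 * lambda * sum(w_j ** 2) for one tree."""
    w = tree["weights"]
    return gamma * len(w) + 0.5 * lam * sum(v * v for v in w)

def squared_loss(prediction, target):
    return (prediction - target) ** 2

def objective(trees, X, y, loss=squared_loss, gamma=0.0, lam=1.0):
    """Training loss over all observations plus the penalty of every tree."""
    data_term = sum(loss(ensemble_score(trees, x), t) for x, t in zip(X, y))
    return data_term + sum(omega(t, gamma, lam) for t in trees)

# Two hypothetical stumps and a single hypothetical observation.
trees = [stump(0, 0.5, -1.0, 1.0), stump(1, 0.2, 0.5, -0.5)]
print(ensemble_score(trees, (0.7, 0.1)))
print(objective(trees, [(0.7, 0.1)], [1.0]))
```

Raising `gamma` penalizes every additional leaf, while raising `lam` shrinks the leaf weights; both make an extra or more extreme tree harder to justify.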
After evaluating the performance of the individual base classifiers on the training data, we used the remaining 30% of the data, held out as the test set, to evaluate the prediction performance of each base classifier. Table 2 summarizes the prediction performance of the base classifiers on the test dataset. From Table 2, one can deduce that all the models classified the test dataset with high accuracy. However, when we compared the prediction performance of KNN and SVM on the test and training datasets, we encountered an overfitting issue. One possible solution is to increase the sizes of both the training and testing datasets. Another challenge is the sensitivity of the base classifiers: small variations in the data significantly affected their prediction performance. The main objective of this study was to implement a stack-based ensemble learning method. In ensemble learning, before combining different base classifiers, it is very important to evaluate the correlation among them. If the base classifiers do not differ, there is no need to apply an ensemble learning approach, because ensemble learning relies on differences among the base classifiers. The correlations among the base classifiers in our experiment are summarized in Table 3. In Table 3, most pairwise correlations between individual models are low and fulfill the requirement for ensemble learning, although two models are highly correlated in predicting the training datasets. Building the ensemble requires base classifiers that are only weakly correlated with one another. Due to the bootstrap re-sampling, the number of selected samples in each target class differs between executions. The sensitivity of the base classifiers was also observed in the correlations among the models.
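One way to quantify the correlation among base classifiers is to compute the Pearson correlation between their accuracies across resamples, as sketched below. This is an assumption about how such a table could be produced, not a description of how Table 3 was computed, and the accuracy values are made-up placeholders rather than results from the experiment.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(va * vb)

def pairwise_correlation(scores):
    """Correlation for every unordered pair of models."""
    return {(m1, m2): pearson(scores[m1], scores[m2])
            for m1, m2 in combinations(scores, 2)}

# Placeholder per-resample accuracies (hypothetical, NOT the paper's results).
resample_accuracy = {
    "rf": [0.95, 0.97, 0.96, 0.98, 0.94],
    "svm": [0.93, 0.92, 0.96, 0.94, 0.95],
    "knn": [0.90, 0.94, 0.91, 0.95, 0.89],
}
for pair, value in pairwise_correlation(resample_accuracy).items():
    print(pair, round(value, 3))
```

Pairs with a correlation close to 1 make nearly the same errors, so keeping both in the stack adds little; weakly correlated pairs are the ones a meta-classifier can exploit.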
Once the assessment was completed, the next step was to implement the stack-based ensemble learning approach to determine the final categories using meta-data. The stacking approach uses the meta-data generated by each base classifier as input for training the meta-model. Bagging, or bootstrap aggregating [44,45], is one of the simplest and most intuitive frameworks in ensemble learning, and it uses the bootstrapped sampling method. In this method, many of the original training samples may be repeated in the resulting training data, whereas others may be left out. Samples are selected randomly from the training set, the selection is iterated to create different bags, and a weak learner is trained on each bag. Each base learner then predicts the label of an unknown sample.
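The bootstrapped sampling step can be sketched as follows. The sample size of 103 matches the nsample value reported for the base classifiers; drawing one bag per base classifier and the fixed random seed are assumptions made so that the illustration is reproducible.

```python
import random

def bootstrap_bag(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag

rng = random.Random(42)
bags = [bootstrap_bag(103, rng) for _ in range(4)]

for in_bag, out_of_bag in bags:
    # Some observations repeat inside the bag while others are left out.
    print(len(set(in_bag)), "distinct in bag,", len(out_of_bag), "left out")
```

Because each bag leaves some observations out, the resulting learners see slightly different data, which is the source of the diversity that bagging averages over.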
In the case of the stacked XGBoost method, a multinomial distribution sampling technique was used to select observations randomly from the sample space. First, the parameters $p_1, \ldots, p_k$ are sorted in descending order. Then, for each trial, a variable $X$ is drawn from a uniform (0, 1) distribution. The resulting outcome is the component
$$j = \min\left\{ j' \in \{1, \ldots, k\} : \sum_{i=1}^{j'} p_i - X \geq 0 \right\}.$$
Setting $X_j = 1$ and $X_i = 0$ for $i \neq j$ gives one observation from the multinomial distribution with parameters $p_1, \ldots, p_k$ and $n = 1$. A sum of independent repetitions of this experiment is an observation from a multinomial distribution with $n$ equal to the number of such repetitions.
The overall results of the experiment, in terms of the classification accuracy of the base and ensemble models, are summarized in Table 4. It is apparent from Table 4 that the majority of the individual base classifiers classified both the training and testing datasets with satisfactory accuracy. When the base classifiers were combined using the stacking ensemble approach, the performance improved. The proposed model classified the multi-spectral image data with 99.96% classification accuracy. This is promising, and the model is worth applying in various application areas. We plan to extend our work by testing the proposed model on larger image datasets.
In this study, handling the challenges of bias and variance is our main concern. To address it, we integrated the XGBoost method into the ensemble learning approach. The XGBoost model belongs to a family of boosting algorithms that turn weak learners into strong learners. Boosting is a sequential process: trees are grown one after another, each using information from the previously grown trees. The process learns slowly from the data and tries to improve its prediction in subsequent iterations. Boosting can reduce both bias and variance. It reduces bias by training successive models on the errors made by previous models, and XGBoost additionally reduces variance by combining many trees fitted on random subsamples of rows and columns, in a manner similar to bagging. These are some of the reasons why we integrated the XGBoost method into the stacking approach to address the bias-variance trade-off. In addition, Samat and colleague [25] proposed the Meta-XGBoost framework to address the limitations of the XGBoost model.
Finally, the XGBoost model enabled us to achieve optimal classification accuracy from the proposed model.
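The multinomial sampling procedure described earlier for the stacked XGBoost model (sort the probabilities, draw a uniform variable, select the first component whose cumulative probability reaches it) can be sketched as follows. The function names, probabilities, and seed are illustrative assumptions.

```python
import random

def draw_component(probs, u):
    """Smallest index j whose cumulative probability reaches u."""
    cumulative = 0.0
    for j, p in enumerate(probs):
        cumulative += p
        if cumulative - u >= 0:
            return j
    return len(probs) - 1  # guards against floating-point round-off

def multinomial(probs, n, rng):
    """Counts per category after n independent single-trial draws."""
    # Sorting in descending order only speeds up the search; counts are
    # mapped back to the original category positions.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    sorted_probs = [probs[i] for i in order]
    counts = [0] * len(probs)
    for _ in range(n):
        j = draw_component(sorted_probs, rng.random())
        counts[order[j]] += 1
    return counts

rng = random.Random(0)
print(multinomial([0.2, 0.5, 0.3], 100, rng))
```

Each single draw corresponds to one observation with $n = 1$, and summing the draws gives the multinomial counts for $n$ repetitions.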

Discussion
Currently, remote sensing image processing plays a significant role in the domain of agriculture. One of its broad application areas is land use and land cover (LULC) analysis using multi-spectral image data. Proper land cover management provides insight into how to use scarce natural resources efficiently. It is also one of the ways forward for designing mechanisms to implement precision agriculture in the sector. Table 5 provides example data collected on the land cover of our study area. According to Table 5, the land cover changes to bare land within 1.5 months. This is due to the fact that the majority of the crops are harvested during this period. Table 5 also shows the patterns of land use and land cover by class over three months. Based on the insight obtained from these patterns, one can make an informed decision.
In addition, we assessed variable (spectral band) importance to examine which bands matter most for classifying the land cover. To classify the land cover, we used the blue, green, red, and near-infrared bands. The visible wavelengths (blue, green, and red) cover a range from approximately 400 to 700 nanometers, while the near-infrared band lies just beyond that range. Each band characterizes objects on the surface by a unique spectral signature. The blue spectrum is associated with plant quality, specifically that of crop leaves. Green light is partly absorbed and used for photosynthesis. The red wavelength, on the other hand, supports stem, leaf, and general vegetation growth. Table 6 shows variable importance, ranked from highest to lowest, during model fitting on the training data. The variable importance assessment gives insight into the relevance of each band used to build the first-level classifiers. According to the experimental results, Band 2 and Band 3 are the most important variables for classifying the training datasets into their respective categories. When dealing with multi-spectral image data, a good understanding of the nature of the data helps to handle its complexity.
Machine learning is a cost-effective and efficient approach to analyzing land cover and land use in the domain of multi-spectral image processing. Such tools have built-in image classifiers to label and classify the target classes. Figure 5 below presents the semi-automated process for labeling multi-spectral training data. Manually labeling the training data set with polygons (regions of interest) was prone to bias and error due to mixed pixel values. Similarly, most conventional machine learning models fail to handle the complexity of multi-spectral image data because of its high dimensionality, the Hughes phenomenon resulting from unbalanced training samples, poor spatial and spectral resolution, the large number of spectral features, and the presence of mixed pixels. The process of semi-automatic feature extraction is shown in Figure 6 below. In multi-spectral image classification, data preprocessing and correction are the foundation for obtaining robust classification accuracy. Extracting relevant features from multi-spectral image data to label the target class is a difficult task. In this study, we extracted one feature from each of the red, green, blue and near-infrared bands to represent each pixel. The stacked image combines these four bands to represent the land cover classes in our study area. We used R scripts to automatically extract pixel values from the image data.
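The band-stacking step described above can be sketched as follows. This is a minimal Python illustration, not the R script used in the study; the function name `pixels_to_rows`, the band order and the 2 x 2 reflectance values are assumptions made for the example. Each band is a small raster of the same shape, and every pixel becomes one feature row of four band values.

```python
BAND_ORDER = ("blue", "green", "red", "nir")  # assumed column order


def pixels_to_rows(bands, order=BAND_ORDER):
    """Turn equally shaped band rasters into one feature row per pixel."""
    rasters = [bands[name] for name in order]
    n_rows, n_cols = len(rasters[0]), len(rasters[0][0])
    for raster in rasters:
        if len(raster) != n_rows or any(len(line) != n_cols for line in raster):
            raise ValueError("all bands must have the same shape")
    # Row-major traversal: pixel (i, j) maps to row index i * n_cols + j.
    return [[raster[i][j] for raster in rasters]
            for i in range(n_rows) for j in range(n_cols)]


# Invented 2 x 2 reflectance rasters, one per band.
bands = {
    "blue":  [[0.08, 0.04], [0.15, 0.05]],
    "green": [[0.07, 0.08], [0.20, 0.09]],
    "red":   [[0.05, 0.05], [0.25, 0.06]],
    "nir":   [[0.02, 0.50], [0.30, 0.48]],
}
rows = pixels_to_rows(bands)
```

The resulting rows, paired with one class label per pixel, form the kind of table a classifier can be trained on.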
We also used Random Forest, Support Vector Machine, K-Nearest Neighbor, and XGBoost as base classifiers. With well-labeled training data sets, the classifiers were competent enough to classify the test data set into its respective categories. In general, bias and variance are the main concerns to be addressed by the proposed model. Therefore, the proposed stack-based ensemble model using the XGBoost method was used to classify the multi-spectral image data. Integrating the ensemble learning approach with extreme gradient boosting (XGBoost) handled the bias and variance challenges of the base classifiers much better. Based on the experimental results, our proposed stack-based ensemble learning method outperformed the individual base classifiers.
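The general idea of stacking can be sketched in a few lines of plain Python. The sketch below is a toy under stated assumptions, not the model used in this study: two simple base learners (nearest centroid and k-nearest neighbor) stand in for Random Forest, SVM, KNN and XGBoost, and a lookup table over the base predictions stands in for the XGBoost meta-learner. All function names and pixel values are hypothetical. The step the sketch is meant to show is that the meta-learner is trained on out-of-fold predictions of the base learners.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_centroid(X, y):
    """Base learner 1: predict the class with the nearest mean vector."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                 for label, rows in groups.items()}
    return lambda row: min(centroids, key=lambda c: sq_dist(row, centroids[c]))


def fit_knn(X, y, k=3):
    """Base learner 2: majority label among the k nearest training rows."""
    def predict(row):
        nearest = sorted(zip(X, y), key=lambda pair: sq_dist(row, pair[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def fit_stack(X, y, base_fitters=(fit_centroid, fit_knn), n_folds=3):
    """Train base learners, then a meta-learner on their out-of-fold predictions."""
    n = len(X)
    meta_rows = [None] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        models = [fit([X[i] for i in train], [y[i] for i in train])
                  for fit in base_fitters]
        for i in range(fold, n, n_folds):  # held-out samples of this fold
            meta_rows[i] = tuple(model(X[i]) for model in models)
    # Meta-learner (stand-in): most frequent true label per prediction tuple.
    table = {}
    for key, label in zip(meta_rows, y):
        table.setdefault(key, Counter())[label] += 1
    final_models = [fit(X, y) for fit in base_fitters]  # refit on all data

    def predict(row):
        key = tuple(model(row) for model in final_models)
        votes = table.get(key) or Counter(key)  # unseen tuple: plain vote
        return votes.most_common(1)[0][0]
    return predict


# Invented [blue, green, red, nir] prototypes with small offsets per sample.
PROTOTYPES = {
    "water":      [0.08, 0.07, 0.05, 0.02],
    "vegetation": [0.04, 0.08, 0.05, 0.50],
    "bare land":  [0.15, 0.20, 0.25, 0.30],
}
X, y = [], []
for label, proto in PROTOTYPES.items():
    for step in range(4):
        X.append([v + 0.01 * step for v in proto])
        y.append(label)

stack = fit_stack(X, y)
```

Training the meta-learner only on held-out predictions keeps it from simply trusting base learners that have memorized their own training samples.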
In addition, multi-spectral image processing was used as a tool to compute different index parameters. In Ethiopia, agricultural experts employ traditional and time-consuming approaches to yield estimation and crop-disease monitoring. One possible solution is to compute the Normalized Difference Vegetation Index (NDVI), which gives a measure of vegetative cover over wide areas. Dense vegetation shows up very strongly in the imagery, and areas with little or no vegetation are also clearly identified. NDVI also identifies water and ice. NDVI measures the normalized difference in reflectance between the near-infrared and red wavelength ranges. It takes values between −1 and 1, with values above 0.5 indicating dense vegetation and values <0 indicating no vegetation. Figure 7 below represents the land-cover class categories based on NDVI values.
In this study, our main objective was to build a stack-based ensemble learning approach by combining multiple base classifiers. The main purpose of the stacking approach was to further optimize classification performance by combining the base classifiers. The stack-based method is capable of handling complex data such as raster image data. Consequently, the proposed model outperformed all the individual base classifiers and produced the highest classification accuracy. We also conducted a comparative analysis between the performance of our proposed method and state-of-the-art machine learning models in similar domain areas. Table 6 summarizes the comparative data.
The comparison table shows that several attempts have been made to improve classification performance in the domain of hyper-spectral image processing. A number of ensemble learning methods have been proposed to analyze land cover and land use using image data. Our proposed Stacked-XGBoost model outperformed the other models and is efficient enough to handle multi-spectral image data.
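For reference, the NDVI computation discussed in this section can be sketched in Python as follows. NDVI is computed per pixel as (NIR − Red) / (NIR + Red). The function names, the handling of a zero denominator and the "intermediate" label for values between 0 and 0.5 are assumptions made for the example; the text only specifies the dense-vegetation and no-vegetation ranges.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red) for one pixel's reflectance values."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat a pixel with no signal as neutral
    return (nir - red) / total


def ndvi_class(value):
    """Map an NDVI value to the coarse categories described in the text."""
    if value > 0.5:
        return "dense vegetation"
    if value < 0:
        return "no vegetation"
    return "intermediate"  # assumed label; not defined in the text


# Invented reflectance pairs (nir, red) for two pixels.
vegetated = ndvi(0.5, 0.1)    # strong near-infrared reflectance
water_like = ndvi(0.02, 0.05)  # red reflectance exceeds near-infrared
```

Because the difference is divided by the sum of the two bands, the index stays within the range −1 to 1 for non-negative reflectance values.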

Summary and Conclusions
The purpose of the present work was to develop an effective method for accurate land use/land cover (LULC) classification, in order to use scarce natural resources efficiently and to design a mechanism for precision agriculture. A well-designed machine learning method can address the limitations of current labor-intensive and time-consuming processes. However, obtaining high-resolution multi-spectral image data is a critical challenge in the Ethiopian context. Beyond image resolution, labeling training data and collecting representative samples were difficult tasks. During model building, we observed that most of the base classifiers struggle to handle complex and non-linear multi-spectral image data. This is due to the high dimensionality of the Landsat images, mixed pixel values, dark objects on the surface of the earth, limitations of feature extraction, and other factors.
Therefore, we proposed a stack-based ensemble learning method to classify multi-spectral image data collected from a location in Ethiopia. The performance of the proposed model exceeded that of each individual base classifier. Furthermore, integrating ensemble learning methods has the potential to handle complex hyperspectral image data.
Based on our findings, we conclude that the ensemble-based approach outperformed each of the strong single machine learning algorithms in our experiments. However, there are many issues in the domain that need further exploration. First, in the case of Ethiopia, obtaining satellite data with less than 10 m resolution is a big challenge. For high-level domain-specific research such as precision agriculture, image resolution and quality are critical factors when making the final decision. Second, we used a small number of observations to fit the model. The small size and imbalance of the training samples led to model over-fitting. Semi-supervised feature extraction is a possible way to obtain representative features and enhance the models' classification performance. Finally, model sensitivity needs further exploration in multi-spectral image classification. To address these pitfalls, we are currently working on an ensemble of deep-learning models to classify hyper-spectral image data. Combining feature extraction capabilities with deep-learning ensemble approaches could resolve the above-mentioned bottlenecks.
Funding: This study received no funding or institutional support. However, the MDPI Editorial Board granted a full (100%) waiver of the article processing charges (APCs) for this article.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The experimental output and sample code for the multi-spectral image classification will be uploaded to our GitHub repository at https://github.com/tagel123/Multi-Spectral-Image-Classification/ (accessed on 1 December 2021). Because the data set is larger than 3 GB, which is beyond GitHub's size limit, the repository will be linked to Google Drive to make the data publicly available.
analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.