Spatio-Temporal Patterns of Land Use / Land Cover Change in the Heterogeneous Coastal Region of Bangladesh between 1990 and 2017

Although a detailed analysis of land use and land cover (LULC) change is essential in providing a greater understanding of increased human-environment interactions across the coastal region of Bangladesh, substantial challenges still exist for accurately classifying coastal LULC. This is due to the existence of high-level landscape heterogeneity and unavailability of good quality remotely sensed data. This study, the first of a kind, implemented a unique methodological approach to this challenge. Using freely available Landsat imagery, eXtreme Gradient Boosting (XGBoost)-based informative feature selection and Random Forest classification is used to elucidate spatio-temporal patterns of LULC across coastal areas over a 28-year period (1990–2017). We show that the XGBoost feature selection approach effectively addresses the issue of high landscape heterogeneity and spectral complexities in the image data, successfully augmenting the RF model performance (providing a mean user’s accuracy > 0.82). Multi-temporal LULC maps reveal that Bangladesh’s coastal areas experienced a net increase in agricultural land (5.44%), built-up (4.91%) and river (4.52%) areas over the past 28 years. While vegetation cover experienced a net decrease (8.26%), an increasing vegetation trend was observed in the years since 2000, primarily due to the Bangladesh government’s afforestation initiatives across the southern coastal belts. These findings provide a comprehensive picture of coastal LULC patterns, which will be useful for policy makers and resource managers to incorporate into coastal land use and environmental management practices. This work also provides useful methodological insights for future research to effectively address the spatial and spectral complexities of remotely sensed data used in classifying the LULC of a heterogeneous landscape.


Introduction
An accurate estimation of land use/land cover change (LULCC) is essential for improved understanding of its impacts on climatic and environmental systems, to enable the implementation of appropriate environmental management practices [1].The changing climate and its multifaceted, multi-scaled ramifications (e.g., sea-level rise, frequent storm surges) pose tremendous challenges to the coastal regions of Bangladesh, which include salinity intrusion, loss of vegetation, and loss of agricultural productivity.In addition to frequent natural disturbances, these regions have been subjected to rapid urbanization and economic development.An estimated 35 million people currently live in the coastal belt of Bangladesh [2,3], with associated activities causing significant alteration to the coastal land cover features at various spatial and temporal scales [4].These changes, for the most part driven by anthropogenic activities, contribute to the loss of ecosystem services to an increased risk of natural hazards.Therefore, to identify effective strategies for dealing with the existing challenges, a detailed investigation of coastal LULCC dynamics has become indispensable for resource managers and planners.Although the LULCC research goes back decades, it is surprising that a comprehensive picture regarding such changes in coastal Bangladesh is still lacking, primarily due to the poor quality of accessible remotely sensed data over what is a highly complex, heterogeneous coastal landscape.This, in turn, poses unique challenges for large-scale data processing and analyses.
The coastal region of Bangladesh constitutes a large area (~710 km coastline) with a high degree of landscape diversity with the presence of human settlements, flat and hilly topography, forests, numerous tidal flats and creeks, natural levees, estuaries, mangrove swamps, muddy flats and sandy beaches [3,5,6].The challenges involved in the accurate classification of LULC across such a complex, heterogeneous biophysical landscape (e.g., coastal region of Bangladesh) arise from three major factors.First, the classification of LULC is highly susceptible to generalization and information loss due to the spatial and spectral limitations of available image data.Existing medium to low-resolution imageries provided by Landsat (30 m) are perhaps a perfect example.Mixed pixels in these images can lead to spectral confusion and problems in differentiating various land use categories.Studies suggest that even sub-meter spatial resolution images are not capable of explaining inter-and intra-class variability in a heterogeneous landscape [7].Second, images with coarse temporal granularity are not ideal for a detailed understanding of LULCC dynamics in regions that are under rapid and indiscriminate or unsystematic manipulation.Third, spectral complexities of the image objects are introduced due to various unfavorable local environmental and weather conditions (e.g., presence of thick dust, haze, and clouds), and large-scale biophysical changes, resulting from environmental disturbances (e.g., tropical cyclones) [8,9].These systematic and non-systematic factors introduce substantial challenges in reliably depicting spatio-temporal (space-time) patterns of LULCC in large geo-spaces when applying an algorithmic image classification process.
Thematic mapping of LULC is commonly based on a number of image classification techniques [10].To date, the most widely used classification methods and associated algorithms fall into a number of categories-supervised and unsupervised, parametric and nonparametric, hard and soft (fuzzy) classification, or per-pixel, subpixel and per field categories (for a comprehensive review, see Lu and Weng, 2007).Among those, the most heavily used image classification methods are maximum likelihood classifier (MLC), K-Means, and ISODATA.However, MLC is a parametric approach and therefore performs accurately when spectral data are normally distributed.This tends not to be the case for image data for the coastal region of Bangladesh, which exhibit a non-linear distribution of landscape features.Due to the fact that these are unsupervised techniques, K-Means and ISODATA are often undesirable because they do not allow expert knowledge input (e.g., labeling of clusters) into the LULC map generation process.To tackle these challenges, previous research has adopted advanced classification algorithms including artificial neural networks (ANN), Support Vector Machines (SVM), Random Forest (RF), and Classification and Regression Trees (CART) [11,12].RF and SVM algorithms are the most popular in the RS community [13,14] and are widely used for multispectral and hyperspectral image classification tasks.These classifiers outperform traditional parametric approaches with their ability to deal with noise, discrete/continuous as well as unbalanced datasets [15].RF's LULC classification accuracy is comparable to other ensemble learning techniques-namely, bagging and boosting [16,17].Besides high accuracy, RF has robustness against overfitting on the training data and requires considerably less tuning of the parameters than boosting ensembles, and provides measure of variable importance that can be used for the final model development.Despite these positive features, RF's performance can be limited when classifying the LULC of a heterogeneous landscape, since RF's randomized bagging-based feature selection and classification approach can be biased towards the majority class and therefore, can misclassify the minority class within the data [18,19].Another point to note is that, for coastal areas comprised of sandy beaches, rivers, patchy inland swamps and forest canopy layer, an image pixel of moderate spatial resolution can encompass multiple features on the ground.This can lead to a similarity in spectral reflectance among multiple features, especially in urban/built-up areas.RF's important feature selection approach is less reliable when predictor features are correlated with each other, and therefore can perform poorly in identifying most informative features-i.e., spectral bands that are most discriminative among LULC classes [20].In such cases, a separate stage of feature selection prior to the model building process can be used for achieving improved predictive performance of the RF classifier.
For reliable LULC mapping, selection of appropriate remotely sensed datasets and choice of a suitable classification system are considered the two most important factors [10].Yet, the use of 30-m resolution Landsat images is the only viable option for LULC classification for coastal region in Bangladesh.Therefore, landscape heterogeneity and complexities inherent within existing image data, in regards to spatial, radiometric, spectral, and temporal properties, warrant careful selection of procedures to be used in a typical LULC mapping workflow: image processing, training sample generation, feature extraction and selection, choice of appropriate classification method, post-classification processing, and accuracy assessment.
Feature extraction/selection is a critical step in the image classification task, as often only a few features (e.g., spectral bands) in a multispectral or hyperspectral image contain meaningful information necessary for separating individual LULC classes (e.g., pond vs. mangrove swamps).The most common feature extraction method is principal component analysis (PCA), which transforms the data into a new set of principle components that best describe the underlying structure of the original dataset [21].Other feature extraction methods are minimum noise fraction transform, discriminant analysis, decision boundary feature extraction, non-parametric weighted feature extraction, wavelet transform, and spectral mixture analysis [10].Feature selection involves finding a subset of features in the dataset that contribute most to the accuracy of a classification model.Common feature selection approaches include wrappers, embedded methods, semantic groupings, and filters.The wrappers approach makes use of the learning algorithm itself to evaluate various subsets of features during the classification process, but it has a high risk of overfitting and is also computationally intensive [22].Although inherently similar to the wrappers, the embedded feature selection techniques (e.g., LASSO regularization and Decision Trees) utilize an intrinsic model-building metric to determine (learn) which features contribute the most to the classification accuracy.While the LASSO feature selection algorithm performs well in modeling linear correlations between features, it cannot determine non-linear feature dependencies [23].Semantic groupings involve selection of features (e.g., spectral or texture features) based on their type or by expert opinion.Filter feature selection methods use a pre-processing step, which is independent of the learning algorithm; they apply a statistical measure (e.g., Chi square test, information gain, and correlation coefficient scores) to rank each feature by the determined score.Classification and Regression Trees (CART) and Random Forest (RF) are two of the popular filter methods that are commonly used in most of the remote sensing applications.CART is a decision tree (DT)-based learning algorithm that uses binary recursive partitioning to grow DTs, using Gini and Twoing methods that search for important relationships and patterns in the data.CART feature selection was successfully used for vegetation and crop classification in moderate to high spatial resolution imageries (e.g., multi-temporal MODIS) [24,25].
A Random Forest constructs an ensemble model of decision trees from random subsets of features and bagged samples of the training data [15].Like CART, RFs were used for feature selection leading to improved classification accuracy in mapping land-cover [8,9] and crops [26].However, both CART and RF can perform poorly for high-dimensional and unbalanced data; recursive binary splitting and the bagging randomization process involved in these algorithms can be biased toward uninformative features [27,28].Extreme gradient boosting (XGBoost)-another filter-based feature selection algorithm started gaining popularity recently [29][30][31].XGBoost is a new implementation of boosting CART algorithms that improves the accuracy of classification through iterative computation of weak (basic) classifiers.It usually outperforms benchmark classifiers such as support vector machines (SVM) or RF, and thus has gained much popularity in the Machine Learning community-particularly in Kaggle competitions, and most recently in image classification [32][33][34][35].In identifying important features, the XGBoost algorithm uses boosted trees to obtain feature scores.The more a feature is used for devising key decisions with boosted trees, the greater the score assigned to that feature.This approach can thus reduce class bias.XGBoost also utilizes the "sparsity-aware split finding" approach that can be highly advantageous for data with heterogeneously distributed features space, as usually manifested in the remotely sensed data of complex and heterogeneous coastal landscapes of Bangladesh.Moreover, the XGBoost algorithm employs measures of gain, frequency, and cover, that can deal with correlation among features while searching for the most important or discriminative features [30].
A few studies exist that shed light on LULC change in the coastal areas of Bangladesh.While these studies highlight the space-time dynamic of coastal land-use, they are limited in geographic scope-i.e., focused on a relatively small coastal area comprising a few districts [4,12,36] or specific agro-ecological regions [3].As such, the existing literature does not provide a detailed, comprehensive understanding on the changing pattern of LULC across the entire coastal region.To overcome this identified research gap, this study took up the challenge to produce detailed, multi-year LULC maps of the entire coastal region of Bangladesh, with an aim to provide insight into the varied level of interplays that exist between human and the physical environment in this region.In doing so, this study employed a unique methodological approach compared to those found in existing studies-that is, it quantified multi-year coastal LULC classes using two state-of-the-art machine learning algorithms-XGBoost and Random Forest.Specifically, using these advanced algorithms, this study evaluated the performance of an integrated feature selection and classification modeling approach for LULC classification in a large and heterogeneous coastal landscape.Consequently, this study sought to answer two specific contextual and methodological questions, as follows: 1.
What are the multi-year spatial and temporal patterns of LULC in the heterogeneous coastal region of Bangladesh? 2.
Does the XGBoost feature selection contribute to an improved performance of the RF algorithm for LULC classification in a highly heterogeneous landscape?

Description of the Study Area
The study area covers the entire coastal region of Bangladesh (Figure 1), an area of 4,625,334 hectares of coastal floodplains [5,37].The region is comprised of nineteen districts and is characterized by an extremely hydrologically active geomorphic setting, with elevation ranging from 0 to 986 m above the mean sea-level.There are numerous tidal rivers that infrequently inundate the inlands and cause waterlogging.Furthermore, the conical shape of the study area and the lower elevation compared to that of the sea cause a backwater effect and contribute heavily to waterlogging issues [38].Due to the region's close proximity to the Bay of Bengal, salinity levels in the soil and river systems are also considerably higher than in any part of the country [37,39,40].The southwestern part of the region is covered by a dense mangrove forest, known as the Sundarbans, which offers a natural protection to most of the natural disasters, particularly from cyclones developed in the sea [41].The flora and fauna of the Sundarbans are considered to be a part of the World Heritage sites and are critical for maintaining the ecological well-being of the coastal area [42].The Sundarbans is one of the most unique mangrove forests in the world and can be highly susceptible to any unplanned land use and land cover changes [43,44].
The mean annual temperature in the area is 33.74 • C with a precipitation of 177.21 cm/year [45].The people inhabiting this area are mainly involved in the primary activities (e.g., farming, fishing) for sustenance, which make them vulnerable to any adverse environmental processes [5].In the past, people have migrated away from this region due to cyclones, storm surges and other natural disasters [46] that account for the stunted development of the region [47,48].The high population density in this region has also led to the exploitation of the newly formed islands (locally known as Chars), an issue which also strongly contributes to LULC changes along the coastal areas of Bangladesh [39].
Remote Sens. 2019, 11, x FOR PEER REVIEW 5 of 28 World Heritage sites and are critical for maintaining the ecological well-being of the coastal area [42].
The Sundarbans is one of the most unique mangrove forests in the world and can be highly susceptible to any unplanned land use and land cover changes [43,44].
The mean annual temperature in the area is 33.74 o C with a precipitation of 177.21 cm/year [45].The people inhabiting this area are mainly involved in the primary activities (e.g., farming, fishing) for sustenance, which make them vulnerable to any adverse environmental processes [5].In the past, people have migrated away from this region due to cyclones, storm surges and other natural disasters [46] that account for the stunted development of the region [47,48].The high population density in this region has also led to the exploitation of the newly formed islands (locally known as Chars), an issue which also strongly contributes to LULC changes along the coastal areas of Bangladesh [39].As a result, a total of forty-eight images between 1990 and 2017 were used.To minimize atmospheric disturbances in the datasets, only dry season images with less than 5% cloud cover were obtained.Radiometric corrections were performed through the conversion of raw digital numbers to top-ofatmosphere (TOA) reflectance [49].Atmospheric corrections were performed to remove the remaining atmospheric disturbances (e.g., haze) through identification of the darkest pixel value in each band and the subsequent subtraction of this value from every pixel [50].

Data Acquisition and Preparation
Figure 2 shows the overall workflow of this study.The data preparation phase started with the acquisition of eight Landsat TM and OLI-TIRS scenes (at 30 x 30m resolution), covering the entire coastal regions of Bangladesh for individual years including 1990, 1995, 2000, 2005, 2010, and 2017.As a result, a total of forty-eight images between 1990 and 2017 were used.To minimize atmospheric disturbances in the datasets, only dry season images with less than 5% cloud cover were obtained.Radiometric corrections were performed through the conversion of raw digital numbers to top-of-atmosphere (TOA) reflectance [49].Atmospheric corrections were performed to remove the remaining atmospheric disturbances (e.g., haze) through identification of the darkest pixel value in each band and the subsequent subtraction of this value from every pixel [50].Multi-temporal Google Earth aerial imageries were used to select suitable training sites from the acquired Landsat images and to collect training pixels/areas related to six LULC types (Table 1) using photo-interpretation techniques.On average, 850 training samples were collected for use in the LULC classification.Data for model training and LULC classification were selected while maintaining a uniform distribution among the feature classes and a homogeneous distribution over the study area (i.e., maintaining an equal number of training points for each class spread homogenously over the study area).However, it should be noted that dry season (e.g., November and December) rice (locally called "Boro" paddy) cultivation and associated waterlogging via irrigation in many coastal areas offered substantial challenges in isolating agricultural and wetland areas, even via geolinked Google Earth imageries.Moreover, existing Google Earth imageries corresponding to the study area in 1990, 1995 and 2000 are not as clear as the recent years (i.e., 2005, 2010 and 2017).This is noticeable when attempting the accurate identification of areas related to small urban patches, agricultural lands, and inland water bodies.Hence, multi-year normalized differential vegetation index (NDVI), normalized difference water index (NDWI), and normalized difference built-up index (NDBI) were produced to assist in the training data generation process.Training data showing an agreement with these three indices and the imagery available in Google Earth, were finally retained for mapping LULC.Multi-temporal Google Earth aerial imageries were used to select suitable training sites from the acquired Landsat images and to collect training pixels/areas related to six LULC types (Table 1) using photo-interpretation techniques.On average, 850 training samples were collected for use in the LULC classification.Data for model training and LULC classification were selected while maintaining a uniform distribution among the feature classes and a homogeneous distribution over the study area (i.e., maintaining an equal number of training points for each class spread homogenously over the study area).However, it should be noted that dry season (e.g., November and December) rice (locally called "Boro" paddy) cultivation and associated waterlogging via irrigation in many coastal areas offered substantial challenges in isolating agricultural and wetland areas, even via geolinked Google Earth imageries.Moreover, existing Google Earth imageries corresponding to the study area in 1990, 1995 and 2000 are not as clear as the recent years (i.e., 2005, 2010 and 2017).This is noticeable when attempting the accurate identification of areas related to small urban patches, agricultural lands, and inland water bodies.Hence, multi-year normalized differential vegetation index (NDVI), normalized difference water index (NDWI), and normalized difference built-up index (NDBI) were produced to assist in the training data generation process.Training data showing an agreement with these three indices and the imagery available in Google Earth, were finally retained for mapping LULC.

Feature Selection Process
It is important to use only image features (i.e., spectral bands) that are most useful for accurately separating different LULC classes.Table 2 shows the outcome of an exploratory analysis; the Pearson correlation statistic was employed to reveal the statistically significant pairwise associations between spectral bands corresponding to multi-temporal images used in this study.Hence, the XGBoost algorithm was used to select important features (i.e., spectral bands) that were used later for LULC classification using the RF algorithm.We implemented the XGBoost feature selection process using the 'xgboost' package in R. The XGBoost algorithm was built as an enhanced version of the gradient boosting decision tree algorithm (GBDTA) [30,31].GBDTA is a regression and classification technique that develops prediction models based on weak prediction models-the decision trees.The term 'weak' or 'strong' refers to the measure of how weakly or strongly the classification learners are correlated in relation to the actual target variable.Based on errors, the process progressively assigns weights to the classifiers in each iteration (i.e., iterative additive modeling) and consequently corrects previous models using the present model.This process continues until an accurate prediction or reproduction of the training data is possible by the final model.XGBoost has revolutionized the performance of GBDTA because instead of adopting the conventional weight assignment process, the algorithm adopts a step size shrinkage process that directly computes weights of the new features and shrinks these newly-computed feature weights when assigning these to the current model.The entire process helps to prevent overfitting, minimize training loss and reduce classification errors while developing the final model [31,[51][52][53][54]. Since XGBoost only works with numeric vectors, One-Hot Encoding was used to convert the six land-cover categorical classes in Table 1 into a sparse matrix with numeric binary categories (1, 0).This allowed building individual XGBoost models for each LULC class and to generate 'gain' scores for each of the features.The gain scores show the improvement in model classification accuracy after a particular feature was added into the model.Although the process is generally used for categorical predictor variables, we have used this to rapidly and accurately transform dependent variables (the feature classes) to binary categories.Before fitting the XGBoost model, we split the entire coastal dataset into a training dataset (75%) and a test dataset (25%) using seed.To ensure that only models having a high classification accuracy were used for the identification of important variables, we tracked each individual model based on the Area Under the Curve (AUC) value and the classification accuracy rate.We retained models with AUC > 0.75 and accuracy rate > 80%, as these rates indicate a model's high usefulness [55,56].For model that failed to meet these criteria, parameters (e.g., maximum iterations, maximum depth, gamma) were retuned and this process was repeated until the desired model accuracy was obtained.Some features in the dataset were highly correlated as found by Pearson correlation coefficients (see Table 2).Therefore, several trial runs were conducted to confirm that the following parameter settings resulted in obtaining the best models: maximum number iterations = 250, booster type = gradient boosting decision tree, maximum depth = 3, gamma = 0, subsample = 1, column sample by tree = 1 and evaluation metrics = misclassification rate.Finally, the gain scores for each individual spectral band were recorded for each of the LULC classes.The mean scores were computed and the top three spectral bands with the highest mean scores across all the classes were considered as "important features" and selected for use by the RF classifier.

LULC Classification
We used the 'randomForest' package in R to produce multi-year (i.e., 1990,1995,2000,2005,2010, and 2017) LULC maps for the coastal region of Bangladesh.Similar to the XGBoost feature selection stage (i.e., using the same seed), the multi-temporal image data were split into training (75%) and test (25%) datasets.However, only three XGboost-selected important features were included in these datasets for each LULC classification year.We fitted RF models on the training data for each year using the number of decision trees (ntree) = 500 and the number of input features (mtry) = 3.While mtry can be set based on tuneRF in R, we instead used fixed mtry as were informed by the XGBoost feature selection process.In general, the choice of these hyperparameters was guided by the goal of reducing the chance of overfitting the trained models.These trained models were then used for predicting LULC classes in the test datasets.Kappa coefficients and user's accuracy values were used for analyzing RF model performances.
To further evaluate the performance of the XGBoost feature-selected RF classifier, a standalone RF classification was performed on the same dataset.Multiple trials with different mtry (e.g., 3, 5, 6) and ntree (e.g., 100, 500, 1000) parameterizations were completed to inspect out-of-bag (OOB) error rates.Finally, an RF model was trained with a default mtry setting and ntree value of 500 that contributed to the lowest OOB error rate.This trained model was then used for predicting LULC classes (for each study year) based on the test datasets.
To discriminate true land cover changes from possible incorrect changes (generated as a result of classification errors), error matrices and per-class accuracy indices were computed.In doing so, using Google Earth images, 300 random points were generated over the study area for each year and then their corresponding LULC classes were ascertained.These empirically-validated classes, derived from the reference images were then used to evaluate per class accuracy (i.e., user's and producer's accuracy) and a separate set of overall accuracy and Kappa coefficient for each year were also derived.

XGBoost-Selected Image Features
Table 3 shows the overall gain scores (i.e., information gain) of spectral bands used in training the XGBoost models.Based on the highest average gain scores, TM 2, 4 and 6 for 1990; TM 3, 5 and 6 for 1995; TM 1, 4 and 6 for 2000; TM 2, 4 and 5 for 2005; TM 4, 5 and 6 for 2010; and OLI 3, 5 and 7 for 2017 were identified as the top informative bands (i.e., providing the best band information).The generation of training gain scores assisted in the identification of important bands for each feature class and allowed extensive assessment of individual bands with respect to LULC classes of interest.This is important when mining for land cover patterns in a large, heterogeneous geographic area.Identification of the top three spectral bands (based on their average gain scores), allowed detection of those bands that consistently performed well in the construction of the boosted trees.Additionally, the integration of a thermal band (band 6) during the classification process (for 1990, 1995, 2000 and 2010) helped in the distinguishing features based on their emission energies [57,58], especially when the visible and the near-infrared bands could not be relied upon to determine the separate LULC classes.For each year, among the top three XGBoost-selected features (identified by the high gain scores), there are at least two features that have relatively low pairwise correlations (Table 2).While in theory multicollinearity should not be an issue for RF (since only one candidate-predictor is examined at a time during the tree generation process), in practice, an RF model is at risk of becoming unstable in the presence of high feature correlations [59].Studies have found that predictor variables with lower pairwise associations can significantly improve the classification accuracy of the tree generation process for an RF model [20,52].Table 4 summarizes XGBoost models' performances based on multi-temporal test datasets.For each year and overall for each LULC class, XGBoost models had high AUC values and classification accuracy rates.We found that the average AUC and accuracy rate of the thirty-seven XGBoost models were 0.91 (±0.06) and 0.92 (±0.05), respectively.The averages for precision, specificity and misclassification rate were 0.82 (±0.13), 0.96 (±0.04), and 0.08 (±0.05), respectively.When applied within an ecological application context, a classification model with an AUC > 0.75 is generally considered as a useful model [55,56,60].

Performance of XGBoost-Feature Selected RF in LULC Classification
Six LULC classes per year were generated using the RF technique.An evaluation of the accuracy results is presented in Table 5, which shows that there are no significant differences between year-to-year RF performances for classifying the LULC classes of a large geographic area with high heterogeneous geophysical settings.Each year's user's accuracy and the Kappa coefficient values were above 0.80 and 0.75, respectively.The average user's accuracy and the Kappa coefficient were 0.87 (± 0.04) and 0.80 (± 0.05), respectively.Table 6 shows the results from the per-class accuracy assessment for individual years.For most of the classes in the majority of the years, user's accuracy (UA) and producer's accuracy (PA) ranged between 70-95%.The exception was the bare soil class, which had lower accuracy values in multiple years (i.e., 1990, 2010 and 2017).This low accuracy was quite expected due to the extremely low and dispersed distribution of this class in the study area.For example, Table 7 shows that the bare soil class contributed only 0.60%, 0.38% and 0.52% of the total study area for the years 1990, 2010 and 2017, respectively.In addition to the bare soil class, wetland and river classes had accuracies lower than 70% (1995, 2010 and 2017), which show some commission and omission errors by the RF classifier, as both the classes have very similar salinity levels throughout the study area that disrupt spectral distinguishability.However, with the exception of the user's accuracy in 2010, most of the commission and omission errors remained within 50% and below for these two classes.Interestingly, these variations in per-class accuracies suggest that there is a limit to which powerful machine learning ensembles such as XGBoost and RF can account for minority bias and spectral confusions introduced by mixed pixels.This observation, requires further investigation.Note that the slight differences in overall accuracy (OA) and Kappa coefficient reported in Tables 5 and 6 resulted from the use of two different sample sizes in the accuracy assessments (see "LULC Classification" sub-section under Section 2).

Comparative Assessment of RF and XGBoost-Feature Selected RF in LULC Classification
As noted above (see Section 2.4), to evaluate the performance of the XGBoost-RF classifier, a standalone RF classification was performed that yielded an average ~0.78 accuracy rate.Table 5 suggests this standalone RF classifier was outperformed by the XGBoost-feature selected RF classifier when classifying the LULC classes for every year.Table 7 shows (see also Figures A1 and 4) the area of multi-temporal LULC classes obtained by RF and XGBoost-feature selected RF classifiers.LULC classifications performed by these two approaches differ quite significantly, particularly for agriculture, vegetation, and built-up area classes.Previous studies [3,12] focused on estimating LULC classes of small discrete regions in the coastal areas of Bangladesh, for which it is difficult to empirically validate these results.The RF-based estimations of multi-year LULC areas do not appear to reflect growing economic activity and documented large-scale natural disturbances (e.g., cyclones, storm surges) in Bangladesh's coastal region over recent decades.
For instance, RF-based classification underestimated the obvious impacts of the natural processes and urbanization on coastal LULC distributions, particularly for agricultural land, bare soil, and built-up areas.The percentage area, estimated solely via RF-based classifications, often yielded values that were close to zero for the aforementioned classes.An example is the year 2010, where the RF-based classification in Table 7 shows a built-up area of only 0.04%, a percentage which contradicts the findings of several studies [3,12,47,61,62] that reported high rates of urbanization (hence an increased rate of development for built-up areas) in the past decade.Furthermore, the percentage area covered by the bare soil class was zero for the years 2005, 2010 and 2017, which for a hydrologically active area such as the coastal region of Bangladesh is quite impossible, especially because erosion and accretion processes (driven by the large river system) are highly prominent in this area [5,37,63].In contrast, XGBoost-RF-derived results are more aligned with the growing human and natural activity trend across the whole coastal region.Comparison of RF only and the XGBoost-RF values for the percentage area covered by the built-up class indicates that the latter is consistent with the findings of previous studies.The XGBoost-RF-derived percentages agree with the increasing trend of the built-up class from 2005-2017, as could be expected due to a proliferation in the amount of urbanization during the last decade.Similarly, contrasting the percentage area covered by the bare soil class during 2005-2017, it is evident that the XGBoost-driven classification technique has performed better than the solely RF-driven classification process.For example, the percentage areas for bare class in the XGBoost-driven classified images were 7.90, 0.38 and 0.52 for the years 2005, 2010 and 2017 respectively; compared to a percentage of 0 in the RF only classified images.In addition, declining coastal vegetation coverage alongside increasing agricultural land areas, increasing amounts of bare soil areas following natural disturbances (e.g., floods in 1998 and 2004; and cyclones of 2007 and 2009), and the increasing amount of built-up area were picked up comparatively well by the XGBoost-RF classification approach.
As mentioned previously, several studies have reported that RF's randomized bagging-based feature selection and classification approach can be biased towards the majority class and therefore, can misclassify the minority class in the data [18,19].The findings from our study reiterate these shortcomings of the RF classifier, most notably when detecting minority classes (such as built-up land and bare soil) in a highly heterogeneous geomorphic setting.

Spatio-Temporal Patterns of LULC
Determining the spatio-temporal dynamics of LULC patterns in the coastal regions of Bangladesh requires a careful interpretation of the underlying natural and anthropogenic drivers that influence the pattern of LULC change.The coastal zone of Bangladesh is divided into western, eastern and central regions [64]; and each of these zones has its own characteristics and geomorphic processes that shape the LULC pattern.Because of the interplay between these natural processes and human activities, people living in these areas are constantly subjected to severe flooding, cyclones, and storm surges, which cause extensive destruction of property and force displacements within and outside the region.For example, the southwestern region relies predominantly on the upstream waters for land building activity and experiences a myriad of environmental issues such as high-level soil and river water salinity [65].Similarly, the central zone is highly dynamic and various natural hazards such as storm surge-induced coastal flooding contribute to the environmental externalities.Although less frequent than the other two regions, the eastern zone is also affected by occasional cyclones and storm surges [3,66] and has undergone huge infrastructural development in the recent years.These natural processes, combined with the recent increase in anthropogenic activities such as construction of embankments to secure people and properties from coastal flooding, infrastructural development and the advancement of tourism sector are greatly influencing coastal dynamics [5,47,61,67].Given the heavily intertwined human and natural processes within, for such a large and heterogeneous landscape, LULC classification using moderate resolution Landsat data is obviously a challenging task.As a result, and despite the use of powerful machine learning ensembles, some confusion in the detection and accurate classification of land covers is not surprising, especially when 10 out of the 19 coastal districts within the study area are susceptible to moderate to severe coastal flooding and cyclones [68][69][70][71].
The LULC transition matrix (Table 8), LULC maps (Figure 3), and a per-year distribution of LULC class sizes (in hectare) (Figure 4) demonstrate some interesting and important spatio-temporal patterns.It can be seen that the built-up areas experienced an increasing trend from 1990 to 2017, with a slight decrease in 2000 when a large proportion of human settlements on the southwestern coast was severely affected by the catastrophic flood of 2000 [72].The effect of this flood was probably manifested in the degradation of vegetation for 2000, since a large proportion of coastal vegetation cover was probably washed away.It appears, however, that successful coastal afforestation initiatives [73] in successive years resulted in a gradual increase in vegetation, notably from 2000 to 2017 [73][74][75].Flooding has also influenced the distribution of inland water bodies.The devastating flood event of 2004 caused massive waterlogging in the low-lying coastal areas due to drainage congestion and the associated backwater effect, resulting in a sharp increase in water bodies in the classified LULC map of 2005.The residual effect of this event was probably linked with the increased proportion of the wetland category in 2005.Although filling of (grabbing) water bodies including river areas is a common practice in a region where rapid urbanization is taking place over the past decades, an increasing pattern of river area since 2005 was observed in the coastal region.This phenomenon can be associated with the increased popularity and practice of shrimp cultivation, with possible links to factors such as sea-level rise and land subsidence, especially in the southwestern region [76][77][78].Areas under agricultural activity showed an increasing trend between 1990 and 2000 with a decreasing trend in vegetation cover evident during the same period (Table 8).A detailed analysis of the intra and inter-transition of LULC classes (Table 8) reveals that human-environment interactions are shaping the LULC dynamics of the coastal region.For example, contrary to common knowledge and perception, built-up areas in the coastal region of Bangladesh experienced conversion into other land cover types (e.g., agricultural land).This somewhat unconventional land use transition may be linked with coastal out-migration, induced predominantly by environmental disturbances and climate change [79,80].Although it is not entirely clear as to how human migration could lead to land cover change, Braimoh (2004) demonstrated that seasonal migration resulted in dramatic land use change in Ghana [81].Fasona and Omojola (2009), using remote sensing data, also noted that increased soil salinity via coastal erosion caused the loss of livelihoods, eventually leading to the dislocation of a total of 39 coastal communities in Nigeria [82].In the case of the coastal area of Bangladesh, it is possible that migration may force people to sell their properties to local elites, and it is, in fact, the choice of the buyers as to whether to keep the newly purchased land in its present form or to convert it to other categories such as shrimp land for a higher economic return.However, this warrants further investigation.
In order to explore the transition of built-up class in detail, we examined the last two population censuses (2001 and 2011) of the coastal districts, and found that four districts located in the central and southwestern zones experienced negative population changes between 2001 and 2011 (Table 9).A comparison of these population changes against yearly LULC maps in Figure 3 also supports the observation that the areas experiencing a disappearance of built-up pixels (when compared to the previous years) belong to these four districts, experiencing a decrease in population.This strongly suggests that depopulation resulting from natural hazards or anthropogenic causes may play a significant role in explaining the decline in built-up areas in 2000 and the low intra-class transition of pixels in Tables 7 and 8, respectively.Furthermore, coastal erosion which is pervasive in many areas (particularly in the central zone), has, in the past, resulted in the loss of entire villages to the river or the sea.Given these observations, and the inconsistency in the built-up area/pixel transformation detailed, it is possible to suggest that the observations are indicative of the high-degree of human-environment interactions that govern LULC dynamics in Bangladesh's coastal region.
However, it can also be observed that the conversion of built-up areas to other classes significantly declined after 2000 and now aligns more closely with the characteristics usually expected from this LULC class (remaining unchanged or showing an increased intra-class conversion).This could be the result of the various adaptation and disaster risk mitigation strategies implemented by the government and the non-government organizations in more recent times [69,83,84].The spatio-temporal patterns of agricultural land cover show a gradual increase between 1990 and 2000; however, this trend has fluctuated in the period between 2000 and 2017.One particular aspect that may govern this fluctuation in post 2000 years could result from increasing shrimp farming activities.The spate of recurrent cyclones and flooding has impacted on the stability of crop production as a livelihood practice.Additionally, increasing soil salinity also made crop production increasingly difficult for the farmers [39,85].Therefore, shrimp production, which is well suited to operating in the saline conditions and is less prone to environmental externalities, gained considerable prominence.The rising popularity of shrimp cultivation coupled with the natural calamities that took place during the study period (i.e., 1990-2017) can help understand the transition of agriculture to other classes in Table 8.Periodic waterlogging caused by embankment construction is another factor that could have shaped agricultural land cover.Embankments that were constructed in the 1960s to secure people and properties in the coastal region is a prime example of man-made activity that appears to interrupt the sedimentation process, hence the elevation loss of the land [86].Consequently, water ingress into the areas behind the embankment walls during a cyclone or high tide events can lead to pervasive flooding of agricultural lands and human settlements [86].It is evident that the percent area covered by the agricultural land remained almost unchanged during the 1990-1995 and 1995-2000 periods.However, a sharp decline was observed during 2000-2005 period, after which the change again became nearly constant in post-2005 periods.The recurrent The spatio-temporal patterns of agricultural land cover show a gradual increase between 1990 and 2000; however, this trend has fluctuated in the period between 2000 and 2017.One particular aspect that may govern this fluctuation in post 2000 years could result from increasing shrimp farming activities.The spate of recurrent cyclones and flooding has impacted on the stability of crop production as a livelihood practice.Additionally, increasing soil salinity also made crop production increasingly difficult for the farmers [39,85].Therefore, shrimp production, which is well suited to operating in the saline conditions and is less prone to environmental externalities, gained considerable prominence.The rising popularity of shrimp cultivation coupled with the natural calamities that took place during the study period (i.e., 1990-2017) can help understand the transition of agriculture to other classes in Table 8.Periodic waterlogging caused by embankment construction is another factor that could have shaped agricultural land cover.Embankments that were constructed in the 1960s to secure people and properties in the coastal region is a prime example of man-made activity that appears to interrupt the sedimentation process, hence the elevation loss of the land [86].Consequently, water ingress into the areas behind the embankment walls during a cyclone or high tide events can lead to pervasive flooding of agricultural lands and human settlements [86].It is evident that the percent area covered by the agricultural land remained almost unchanged during the 1990-1995 and 1995-2000 periods.However, a sharp decline was observed during 2000-2005 period, after which the change again became nearly constant in post-2005 periods.The recurrent floods and cyclones rendered most of the agricultural lands productively useless.The repeated displacements of people due to these natural disasters may have led to the abandonment of these agricultural lands, which were later colonized by different forms of vegetation cover (such as shrubs and bushes) and thus, accounting for the high transition between these two classes.Additionally, a total change, ranging from 9.33 to 12.11%, was found for the conversion of agricultural class to water bodies and river classes during the 2000-2017 period.Both the conversion of agricultural lands to shrimp farming and the formation of large temporary water bodies due to floods and cyclones can potentially account for this change.In such a dynamic situation, the recognition of pure pixels (even with an advanced ensemble) could be challenging as satellite data only represent a snapshot of a given time point.
The river and wetland classes in LULC maps show an alternating trend of increase and decrease between 1990 and 2017.In the LULC classification process, isolating wetlands and river areas, was difficult for the years when coastal flooding occurred (either by high tide/cyclone induced storm surges), when river water was used for irrigation purpose or used for shrimp cultivation.These factors appear to have played a crucial role in the classification process by mixing up inland water bodies and river areas [87,88].Furthermore, after Cyclone Aila in 2009, numerous saline water bodies were created in the southwestern part of the region [89].When inland water bodies reach a certain salinity threshold, the spectral reflectance of these inland water bodies becomes identical to that of the rivers [88,[90][91][92][93]. Consequently, water bodies that are actually shrimp cultivation sites are classified as part of the river system.This aspect of spectral reflectance identification may be one of the reasons for the increasing trend of river areas being detected in the LULC maps of 2005, 2010 and 2017, as well as the high transition rate between water bodies and river classes in Table 8.Additionally, one of the popularly-practiced adaptation strategies for flood and waterlogging in the southern region can be identified in the LULC transition matrix in Table 8, that is-a large proportion of the water bodies seem to have converted into agricultural land.This may have resulted from alternating practice of shrimp cultivation and crop production in low-lying areas such as the beels [91,92].Farmers resort to shrimp cultivation when these low elevation lands become inundated with water after intense rainstorms, and thus, for those years, the low-lying areas were classified as water bodies.In succeeding years, when there is not sufficient influx of water from the rainy season or the tide from the nearby rivers, these areas dry up and are again used for agricultural activities.As a result, large areas of land could have varied spectral responses when they are engulfed with rain or saline waters, or are used for agricultural purposes.This type of issue may be a common source of mixed pixels while employing moderate resolution images for LULC classification.
A significant decline in the amount of vegetation cover can be seen in LULC maps between 1990 and 2000.An intricate web of river systems and estuaries along the central zone makes it highly dynamic and susceptible to high-tide induced coastal floods [5], and this may result in the frequent submergence of built-up land and human settlements along with the removal of vegetation cover.Despite the eastern zone being relatively stable in terms of erosion and accretion processes [64], clearing of forest cover is pervasive, together with other increased human activities such as infrastructural development and urban expansion [94].However, in the post 2000 years, areas with vegetation cover increased until 2017, which can be attributed to major coastal afforestation projects along the coastal belt undertaken since 2000 [73][74][75].As a part of these projects, the Department of Forest from the government of Bangladesh took a mega initiative of planting 2872.88 hectares of Nipa, 10.0 hectares of coconut, 40.0 hectares of Arica palm, 280.0 hectares of Bamboo and Cane, 192,395.24hectares of mangrove and 8689.53 hectares of non-mangrove trees.The effects of these afforestation projects are clearly visible in the LULC maps between 2005 and 2017.Table 8 further shows that a large portion of the vegetation cover in the study area remained unchanged during the study period  and that the intra-class percent conversion remained nearly above 50% from 1995-2017.This could be due mainly to the large mangrove forest occupying the southwestern region of the study area, which did not show any discernible change in the LULC maps (Figure 3a-f).Apart from this, vegetation plantations around homesteads are a popular measure used to save habitation and people from the ingress of frequent cyclones such as those that occurred in 2007 and 2009.This has become a common practice, to save communities against climate-influenced adversities, especially in the central coastal zone [95].

Conclusions
This study provided a detailed assessment of multi-temporal LULC changes in the coastal regions of Bangladesh using Landsat data and advanced feature selection and classification techniques-XGBoost and Random Forest.The XGBoost-based feature selection approach allowed detection of the most informative spectral bands.These contributed to the improved performance by the RF classifier in producing six-class LULC maps.This joint XGBoost-RF approach performed substantially better than an independent RF classifier when dealing with high-level landscape heterogeneity and spectral complexity issues (e.g., dominance of mixed pixels/LULC areas) in 30-m Landsat images during the LULC classification process.This improvement is indicated by the user's accuracy and Kappa coefficient values of 0.80 and 0.75, respectively.The spatio-temporal investigation of the classified LULC maps shows that over the past 28 years (1990-2017) the coastal regions of Bangladesh have experienced an increase in agricultural land (5.44%), and in both built-up (4.91%) and river (4.52%) areas.The vegetation coverage across coastal landscapes experienced a significant decline in some years due to large-scale natural disturbances (e.g., flooding, cyclones and storm surges in 1996, 1997, 1998, 2004 and 2007) as well as human-influenced alterations (e.g., shrimp farming).However, the 2005-2017 LULC maps show a substantial overall increase in areas corresponding to the vegetation class-an understandable manifestation of the effect of coastal reforestation projects undertaken by the governmental and non-governmental entities in Bangladesh.These findings on the spatio-temporal dynamics of LULC can inform policy guidelines that are essential for combating the adverse effects of human-caused and natural disturbances across the entire coastal regions of Bangladesh.
This study also demonstrated that the use of powerful ensemble techniques such as XGBoost and Random Forest can be useful in addressing the complexities inherent in the LULC classification of a large and heterogeneous geographic area, particularly when available remotely sensed data come with insufficient spatial and spectral resolution.An effective collaboration among these techniques offers an excellent opportunity for frequent and detailed monitoring of the space-time LULC patterns of a rapidly changing landscape, especially in developing regions of the Asian subcontinent, which often rely on freely available lower-quality remotely sensed data.However, the results from this study should be interpreted with caution, since it is very challenging even for advanced algorithms to deal effectively with discriminating class memberships of a very large and complex landscapes by using lower to moderate spatial/spectral resolution imageries (i.e., Landsat).In fact, this large-scale study provides a useful methodological direction that should be further explored by future studies, using available remotely sensed data of varied qualities.

Figure 1 .
Figure 1.Location of the study area showing the coastal districts of Bangladesh.

Figure 2
Figure 2 shows the overall workflow of this study.The data preparation phase started with the acquisition of eight Landsat TM and OLI-TIRS scenes (at 30 x 30m resolution), covering the entire coastal regions of Bangladesh for individual years including 1990, 1995, 2000, 2005, 2010, and 2017.As a result, a total of forty-eight images between 1990 and 2017 were used.To minimize atmospheric disturbances in the datasets, only dry season images with less than 5% cloud cover were obtained.Radiometric corrections were performed through the conversion of raw digital numbers to top-ofatmosphere (TOA) reflectance[49].Atmospheric corrections were performed to remove the remaining atmospheric disturbances (e.g., haze) through identification of the darkest pixel value in each band and the subsequent subtraction of this value from every pixel[50].

Figure 1 .
Figure 1.Location of the study area showing the coastal districts of Bangladesh.

Figure 2 .
Figure 2. Flowchart showing overall methodology of this work

Figure 2 .
Figure 2. Flowchart showing overall methodology of this work.

Figure 3 .
Figure 3. Classified land cover maps of the coastal area of Bangladesh: (a) 1990 and (b) 1995.

Figure 3
Figure 3 contin.Classified land cover maps of the coastal area of Bangladesh: (c) 2000 and (d) 2005.

Figure 3
Figure 3 contin.Classified land cover maps of the coastal area of Bangladesh: (e) 2010 and (f) 2017.

Figure 4 .
Figure 4. Percent area distribution of each land cover class between 1990 and 2017, obtained by the XGBoost-RF classifier.

Figure 4 .
Figure 4. Percent area distribution of each land cover class between 1990 and 2017, obtained by the XGBoost-RF classifier.

Table 1 .
Land cover classification scheme used in this study.

Table 1 .
Land cover classification scheme used in this study.

Table 2 .
Correlation matrices showing Pearson correlation coefficients between image features (i.e., spectral bands) for individual years.Values in bold face show the correlation between the bands used for the final RF classification.
** Correlation is significant at the 0.01 level.* Correlation is significant at the 0.05 level.

Table 3 .
Training gain scores for features (reflective bands) used in XGBoost models.The bands corresponding to the bold-faced mean rank values were used in LULC classification using RF.

Table 4 .
Accuracy assessment and precision diagnostics of the XGBoost models for the individual feature classes for each year.

Table 5 .
Accuracy assessments of multi-temporal RF models.

Table 6 .
Per-class accuracy assessments of multi-temporal RF models.

Table 7 .
Comparison of RF and XGBoost-feature selected RF classifiers.

Table 8 .
Transition matrix showing the changes in percentage areas of the studied classes.

Table 9 .
Population change during 2001-2011 in the coastal areas of Bangladesh.Districts highlighted with bold face experienced a negative population change.