Prediction of Land Use and Land Cover Changes for North Sumatra, Indonesia, Using an Artificial-Neural-Network-Based Cellular Automaton

Abstract: Land use and land cover (LULC) form a baseline thematic map for monitoring, resource management, and planning activities and facilitate the development of strategies to balance conservation, conflicting uses, and development pressures. In this study, changes in LULC in North Sumatra, Indonesia, are simulated and predicted using an artificial-neural-network-based cellular automaton (ANN-CA) model. Five criteria (altitude, slope, aspect, distance from the road, and soil type) are used as explanatory data in the learning process of the ANN-CA model to determine their impacts on LULC changes between 1990 and 2000; among the criteria, altitude and distance from the road have strong impacts. Comparison between the predicted and the real LULC maps for 2010 shows high agreement, with a Kappa index of 0.83 and a percentage of correctness of 87.28%. The ANN-CA model is then applied to predict LULC changes in 2050 and 2070. The LULC predictions for 2050 and 2070 show large increases in plantation area of more than 4%. Meanwhile, forest and crop areas are projected to decrease by approximately 1.2% and 1.6%, respectively, by 2050. By 2070, forest and crop areas are projected to decrease by 1.2% and 1.7%, respectively, indicating human influences on LULC changes from forest and cropland to plantations. This study illustrates that the simulation of LULC changes using the ANN-CA model can produce reliable predictions for future LULC.


Introduction
Land use refers to the purpose that land serves, for example, recreation, wildlife habitat, or agriculture. Land cover refers to the surface cover on the ground, whether vegetation, urban infrastructure, water, bare soil, or other. Although the two terms have distinct meanings, land use and land cover (LULC) are often used interchangeably. Identifying, delineating, and mapping land cover is important for global monitoring studies, resource management, and planning activities. Identification of land cover establishes the baseline from which monitoring activities (change detection) can be performed and provides the ground cover information for baseline thematic maps. Moreover, land use applications involve both baseline mapping and subsequent monitoring, since timely information is required to know how much land is currently under each type of use and to identify land use changes from year to year. Therefore, knowledge of LULC helps with the development of strategies to balance conservation, conflicting uses, and development pressures. Additionally, LULC studies can identify issues such as the removal or disturbance of productive land, urban encroachment, and the depletion of forests suitable for species distribution [1].
North Sumatra has experienced massive development over the last three decades, during which industrialization has changed the land use of most of the area, followed by agricultural expansion [30]. The forest area in North Sumatra is one of the largest in Indonesia, with more than 1,206,881 ha of preservation forest and 51,600 ha of conservation forest. However, forest degradation is occurring as a result of forest disturbance, especially land conversion [31]. These changes affect the forest as a habitat, as many tree species are displaced by other functions or land uses. Moreover, the land cover has also changed from trees to buildings or crops.
The human interest in land allocation influences the planning in North Sumatra to support the development of the area to be more productive, rather than maintaining it as forest [30]. For instance, most of the residents choose crop plantation for restoration of tree cover, rather than favouring hardwood trees, in the Lake Toba catchment area [32]. The LULC changes result from conflicting land interests. Crop plantation expands to increase profit, reducing forest area [33]. These conditions require the LULC perspective to emphasize the future impact of the changes, especially for loss of forest biodiversity.
Since the 17th century, North Sumatra has been a well-known region for forest products, such as Sumatra benzoin from Styrax sumatrana trees. This product has become the income source of primary importance for communities around the forested areas in North Sumatra [34]. However, North Sumatra faces a problem with space allocation due to the increase in population and industries, especially palm oil, rubber, and log-forest product plantations owned by private companies [35]. S. sumatrana is threatened by land conversion, which affects the sustainability of the species in North Sumatra. Spatial LULC information is therefore crucial for analyzing the potential area in which this species can grow. The purpose of LULC change prediction in this study is to contribute to regional planning and management for the distribution of S. sumatrana in North Sumatra, and this is the rationale for choosing North Sumatra as the study region.
Sustainability 2019, 11, 3024

Data and Criteria
The data collected and used in this study include a digital elevation model (DEM), road map, soil types and three LULC maps for 1990, 2000, and 2010, as presented in Table 1. The Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM) v2 is one of the most widely used DEM datasets and is used in this study. The road map and LULC maps were obtained from the Ministry of Environment and Forestry of Indonesia, and the soil type map was obtained from the Food and Agriculture Organization (FAO).
From the dataset, five criteria were selected for further analysis: altitude, slope, and aspect derived from the DEM, distance from the road derived from the road map, and soil type. The data were then classified into input data and explanatory data for the LULC simulation and prediction. The three LULC maps are the input data, and the five criteria (altitude, slope, aspect, distance from the road, and soil type) are the explanatory data (Table 1).
The three LULC maps were converted into raster data using ArcGIS. The resolution was 30 arc seconds, with 25 categories. Then, the number of categories was reduced to 12 classifications by regrouping several categories, as shown in Table 2. The airport is considered separately, since it is assumed to be unchangeable in land use.

Table 2
No. | Classification | Description
3 | Forest | The area of primary, secondary and plantation forest
4 | Lake | The area of large-scale water bodies
5 | Mangrove | The area for primary and secondary mangrove forest
6 | Open area | The barren area
9 | Pond | The area of small-scale water bodies for fisheries
10 | Residential | Transmigration and permanent residential area
11 | Shrub | Shrub and juvenile plant area
12 | Swamp | Primary and secondary swamp forest area

3. Methodology

Artificial-Neural-Network-Based Cellular Automaton Model
The simulations of LULC change and prediction were conducted using the ANN-CA model. The ANN is used to determine the transition probabilities of LULC, using multiple output neurons to simulate multiple LULC changes, within the ANN-CA structure presented in Figure 2. The CA is used to model the LULC changes by applying the transition probabilities from the ANN learning process. The overall analytic procedure is described in the following steps (Figure 3). QGIS and its MOLUSCE module were utilized for the ANN-CA modelling [24].
Step 1: The first step is to define the inputs to the neural network for the simulation. The simulation is cell-based (pixel-based), and each cell has a set of n attributes (spatial variables) as the inputs to the neural network. The spatial variables can be represented by

$$X = [x_1, x_2, \ldots, x_n]^{T}$$

where $x_i$ is the i-th attribute, and T denotes transposition. The initial (1990) and final (2000) LULC maps, as well as the five explanatory maps, are loaded as input data. Since we use five criteria from the explanatory maps and 12 categories in the LULC maps, 17 attributes (spatial variables) for each cell are used for modelling and simulating LULC changes. All datasets are processed to retrieve the attributes and have the same spatial extent and resolution (30 arc seconds) in raster format.
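As a minimal sketch of how the 17 attributes of one cell could be assembled, the Python snippet below one-hot encodes the LULC category (12 values) and appends five criteria scaled to [0, 1]. The encoding scheme, the function names, the value ranges, and the sample cell are illustrative assumptions; the source does not state how the modelling software encodes its inputs internally.

```python
from typing import List, Sequence, Tuple

N_CLASSES = 12  # number of LULC classifications (Table 2)


def one_hot(category: int, n_classes: int = N_CLASSES) -> List[float]:
    """Encode a 1-based LULC category as a 0/1 vector of length n_classes."""
    if not 1 <= category <= n_classes:
        raise ValueError(f"category must be in 1..{n_classes}")
    return [1.0 if i == category else 0.0 for i in range(1, n_classes + 1)]


def min_max_scale(value: float, lo: float, hi: float) -> float:
    """Scale a value linearly into [0, 1] given the raster-wide min and max."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


def cell_attributes(
    category: int,
    criteria: Sequence[float],
    ranges: Sequence[Tuple[float, float]],
) -> List[float]:
    """Build the attribute vector: 12 one-hot values followed by scaled criteria."""
    scaled = [min_max_scale(v, lo, hi) for v, (lo, hi) in zip(criteria, ranges)]
    return one_hot(category) + scaled


# Hypothetical cell: category 3, altitude 500 m, slope 10 deg, aspect 180 deg,
# 2 km from the nearest road, soil type coded as 4 (all values are made up).
ranges = [(0.0, 2000.0), (0.0, 40.0), (0.0, 360.0), (0.0, 10.0), (1.0, 9.0)]
x = cell_attributes(3, [500.0, 10.0, 180.0, 2.0, 4.0], ranges)
print(len(x))
```

Treating soil type as a scaled numeric code is a simplification made only to keep the vector length at 17; a categorical encoding would change the number of inputs.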
Step 2: Each correlation between spatial variables is evaluated in a two-way raster comparison by selecting the first raster from one variable and the second raster from another variable. Then, the LULC area and changes for each category are computed between the initial (1990) and final (2000) time periods. The transition matrix showing the proportions of pixels changing from one category to another is also obtained from the computation.
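The transition matrix in Step 2 can be illustrated with a short Python sketch that counts category changes between two flattened rasters and normalizes each row. Row normalization (each row sums to 1) and the toy three-class rasters are assumptions made for illustration; the study's matrix may be expressed relative to total area instead.

```python
from collections import Counter
from typing import List, Sequence


def transition_matrix(
    initial: Sequence[int], final: Sequence[int], n_classes: int
) -> List[List[float]]:
    """Row i, column j: share of cells in class i+1 at the initial time
    that belong to class j+1 at the final time (classes are 1-based)."""
    if len(initial) != len(final):
        raise ValueError("rasters must have the same number of cells")
    counts = Counter(zip(initial, final))
    matrix = []
    for i in range(1, n_classes + 1):
        row_total = sum(counts[(i, j)] for j in range(1, n_classes + 1))
        row = [
            counts[(i, j)] / row_total if row_total else 0.0
            for j in range(1, n_classes + 1)
        ]
        matrix.append(row)
    return matrix


# Toy flattened rasters with three classes (made-up values).
lulc_initial = [1, 1, 1, 1, 2, 2, 3, 3]
lulc_final = [1, 1, 2, 3, 2, 2, 3, 1]
m = transition_matrix(lulc_initial, lulc_final, 3)
for row in m:
    print(row)
```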
Step 3: In this step, the transition probability is modelled by the ANN. The neural network structure consists of three layers, namely, the input, hidden, and output layers (Figure 2). Each spatial variable is associated with a neuron in the input layer after scaling within the range of [0, 1]. Therefore, 17 neurons corresponding to the 17 attributes are used in the input layer.
In the hidden layer, the signal received by the j-th neuron, $net_j(k,t)$, from the input layer for the k-th cell at time t is computed by

$$net_j(k,t) = \sum_{i} w_{i,j}\, x_i(k,t)$$

where $w_{i,j}$ is the weight between the i-th neuron in the input layer and the j-th neuron in the hidden layer, and $x_i(k,t)$ is the i-th scaled attribute associated with the i-th neuron in the input layer with respect to the k-th cell at time t. In terms of the number of neurons in the hidden layer, 2n + 1 neurons are recommended to guarantee a perfect fit of any continuous function, and reducing the number of neurons may lower accuracy. However, according to Wang [25], 2n/3 hidden neurons can produce results of nearly the same accuracy while requiring much less training time. Therefore, we used 12 hidden neurons in this study.
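As a worked check of the hidden-layer sizing, with n = 17 input attributes the two rules of thumb give the values below. Rounding 2n/3 up to the next integer is an assumption that is consistent with the 12 hidden neurons reported.

$$2n + 1 = 2(17) + 1 = 35, \qquad \frac{2n}{3} = \frac{34}{3} \approx 11.3 \;\Rightarrow\; 12$$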
The output layer includes 12 neurons corresponding to the 12 classifications in the LULC maps (Table 2). The l-th neuron in the output layer generates a value that represents the transition probability from the initial type to the l-th (target) type of LULC. The transition probability is obtained from the output function of the neural network:

$$P(k,t,l) = \sum_{j} w_{j,l}\, \frac{1}{1 + e^{-net_j(k,t)}}$$

where $P(k,t,l)$ is the probability of conversion from the existing to the l-th type of LULC for the k-th cell at time t, and $w_{j,l}$ is the weight between the j-th neuron in the hidden layer and the l-th neuron in the output layer. A higher value indicates that the transition probability from the initial type to the l-th type is larger.
An iterative neural network based on the back-propagation learning algorithm is designed to simulate land uses in this study. At each iteration, each neuron in the output layer generates a transition probability from an existing type to another type of land use. In this simulation, LULC change is determined by comparing the values of transition probability such that LULC will convert from the existing type to the type with the highest value of transition probability. If the same type of LULC has the highest transition probability, the state of the corresponding cell remains unchanged.
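The forward pass described in Step 3 can be sketched in plain Python as follows. The sketch assumes a logistic (sigmoid) activation in the hidden layer and uses a tiny network with made-up weights (2 inputs, 2 hidden neurons, 3 outputs) instead of the 17-12-12 structure; back-propagation training is omitted, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic activation (assumed here for the hidden layer)."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_signals(x: Sequence[float], w_in: Sequence[Sequence[float]]) -> List[float]:
    """net_j = sum over i of w_in[i][j] * x[i] for every hidden neuron j."""
    n_hidden = len(w_in[0])
    return [sum(w_in[i][j] * x[i] for i in range(len(x))) for j in range(n_hidden)]


def transition_scores(
    x: Sequence[float],
    w_in: Sequence[Sequence[float]],
    w_out: Sequence[Sequence[float]],
) -> List[float]:
    """Score for output l = sum over j of w_out[j][l] * sigmoid(net_j)."""
    activations = [sigmoid(v) for v in hidden_signals(x, w_in)]
    n_out = len(w_out[0])
    return [
        sum(w_out[j][l] * activations[j] for j in range(len(activations)))
        for l in range(n_out)
    ]


def most_likely_class(scores: Sequence[float]) -> int:
    """Return the 1-based index of the output neuron with the highest score."""
    return max(range(len(scores)), key=lambda l: scores[l]) + 1


# Made-up weights and inputs, for illustration only.
x = [1.0, 0.0]
w_in = [[0.0, 2.0], [1.0, -1.0]]          # shape: inputs x hidden
w_out = [[0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]  # shape: hidden x outputs
net = hidden_signals(x, w_in)
scores = transition_scores(x, w_in, w_out)
print(most_likely_class(scores))
```

The scores produced this way are not normalized to sum to 1; they are compared with one another, and the largest one determines the candidate target class.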
Step 4: Once the transition probability is obtained, then modelling of LULC change is carried out by the CA simulation. The CA consists of regular spatial lattices of cells, each of which can have any one of a finite number of states, depending on the states of neighboring cells [27]. CA considers the composition of associations of cells around one cell [36]. CA simulation usually involves many iterations to decide whether a cell is changed or not. A predetermined threshold value should be used to control the rate of change so that land use conversions occur step-by-step. If the highest transition probability is lower than the threshold value, which is 0.9 in this study based on Li and Yeh [2], then the cell remains unchanged. The threshold value varies from 0 to 1, and the large value of 0.9 is used to keep the LULC changes stable in each iteration, thus obtaining fine patterns of simulation [2]. The CA in urban areas in this study only divides the classification into urban and non-urban areas for simplicity, because, if multiple land uses are presented, the transition rules of urban CA models become substantially more complicated, since the simulation involves the use of a much larger set of spatial variables and weights and more complex model structures.
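The threshold rule in Step 4 can be sketched as a small Python function: a cell changes only when its highest transition probability reaches the threshold (0.9 in this study). The function name and the sample probabilities are hypothetical, and neighbourhood effects of the CA are deliberately left out of this sketch.

```python
from typing import Sequence


def update_cell(current: int, probs: Sequence[float], threshold: float = 0.9) -> int:
    """Return the new 1-based LULC class of a cell for one CA iteration.

    The cell keeps its current class when the highest transition
    probability is below the threshold; otherwise it takes the class
    with the highest probability (which may equal the current class).
    """
    best = max(range(len(probs)), key=lambda l: probs[l])
    if probs[best] < threshold:
        return current
    return best + 1


# Made-up probabilities for a three-class example.
print(update_cell(2, [0.10, 0.30, 0.95]))  # highest value passes the threshold
print(update_cell(2, [0.10, 0.30, 0.85]))  # highest value is below the threshold
```

A high threshold lets only the most confident conversions occur in each iteration, which is how the step-by-step change described above is obtained.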
Step 5: The LULC simulation is validated by comparing the real (reference) and predicted (simulated) LULC maps for 2010 using the Kappa coefficient, as described in Section 3.2.
Step 6: After the validation, the predictions of the future LULC maps in 2050 and 2070 are computed, assuming the continuation of the current trends and dynamics of the LULC changes. The same weight values are utilized for the neural network in the simulation of future LULC changes.

Validation
The validation is carried out by comparing the predicted and the real LULC maps for 2010. The true agreement from this validation is measured by the Kappa coefficient.
The Kappa coefficient is widely used for accuracy assessment in LULC studies [37]; it measures the agreement between two maps beyond the agreement expected by chance [38]. The Kappa coefficient is calculated by

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the proportion of observed agreements, and $p_e$ is the proportion of agreements expected by chance. These proportions are obtained from the contingency table as

$$p_o = \sum_{i=1}^{c} p_{ii}, \qquad p_e = \sum_{i=1}^{c} p_{iT}\, p_{Ti}$$

where $p_{ij}$ is the cell in the i-th row and j-th column of the contingency table, $p_{iT}$ is the sum of all cells in the i-th row, $p_{Tj}$ is the sum of all cells in the j-th column, and c is the number of raster categories. The contingency table is a matrix that shows the frequency distribution of the categories and is used here to show the relation between the i-th category in one map and the j-th category in the other. The category pair of every cell is tabulated into this matrix, and the result describes the agreement between the two maps for each category. Finally, the contingency table is used to express the percent agreement as a Kappa coefficient.
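The Kappa calculation can be illustrated with a short Python sketch that builds the contingency table from two flattened maps and returns the observed agreement, the chance agreement, and Kappa. The function name and the toy ten-cell maps are assumptions for illustration and are unrelated to the study's data.

```python
from typing import Sequence, Tuple


def kappa(
    reference: Sequence[int], predicted: Sequence[int], n_classes: int
) -> Tuple[float, float, float]:
    """Return (p_o, p_e, kappa) for two maps with 1-based class labels."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("maps must be non-empty and of equal size")
    n = len(reference)
    # Contingency table: rows are reference classes, columns are predicted classes.
    table = [[0] * n_classes for _ in range(n_classes)]
    for r, p in zip(reference, predicted):
        table[r - 1][p - 1] += 1
    p_o = sum(table[i][i] for i in range(n_classes)) / n
    p_e = sum(
        (sum(table[i]) / n) * (sum(row[i] for row in table) / n)
        for i in range(n_classes)
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return p_o, p_e, (p_o - p_e) / (1.0 - p_e)


# Toy maps with three classes (made-up values).
real_map = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
predicted_map = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
p_o, p_e, k = kappa(real_map, predicted_map, 3)
print(f"correctness = {100 * p_o:.2f}%, kappa = {k:.3f}")
```

The percentage of correctness reported alongside Kappa corresponds to 100 times the observed agreement in this sketch.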
To determine the explanatory map from the five criteria, a number of simulations predicting the LULC changes for 2010 were conducted using each criterion alone and in combinations of two to five as the explanatory data. The results of the explanatory map analysis from the seven simulations with different combinations of variables are presented in Table 3. As displayed in Table 3, the overall correctness and Kappa coefficients are high and similar. Among these, the simulations using altitude, distance from the road, and their combination as explanatory data produced the highest Kappa value of 0.83 and the highest percentage of correctness of 87.28%. The highest Kappa value indicates that these two criteria, altitude and distance from the road, have greater impacts on LULC changes than the other criteria. The two criteria are illustrated in Figure 4. Table 4 presents the transition matrix of the 12 LULC categories between 1990 and 2000. The result is then employed to simulate the prediction of LULC for the years 2050 and 2070.

Results and Discussion
The percentage of area changes is presented by a transition matrix that is used as the input for the neural network to obtain the transition probability for every input and to produce the LULC output layer. The LULC transition matrix is produced by comparing the area percentages of the LULC categories between the years 1990 and 2000, as shown in Table 4.
The ANN-CA model predicts LULC for the year 2010 by calculating the transition probability of every cell between two LULC raster maps, with altitude and distance from the road as the explanatory variables. The validation was implemented by comparing the actual LULC in 2010 with the predicted LULC of the same year. The simulation shows a percentage of correctness of 87.28% and a Kappa value of 0.83 when altitude and distance from the road are used as the explanatory maps. The percentage of correctness is the percentage of the predicted area that is exactly the same as the real area in the same year. The Kappa value shows the degree of accuracy and reliability in a statistical classification. The value of 0.83 shows high agreement between the two LULC maps from the same year [39].
The difference between the predicted and the real LULC maps of 2010 is described in Table 5. A positive value in the second column shows that the prediction overestimates the real condition, while a negative value indicates the opposite. A value of zero occurs when the real and predicted maps have the same value. The third and fourth columns show the changes of every classification from the initial year of 2000 to the predicted and the real maps in 2010, respectively. Furthermore, the deviation between the predicted and the real maps of the LULC classifications is displayed in Figure 5.
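The sign convention for the differences can be illustrated with a minimal Python sketch that computes each class's share of the map area and subtracts the real share from the predicted share, so that positive values mean overestimation. The function names and the toy eight-cell maps are hypothetical and do not reproduce the values in Table 5.

```python
from collections import Counter
from typing import List, Sequence


def class_shares(cells: Sequence[int], n_classes: int) -> List[float]:
    """Percentage of cells belonging to each 1-based class."""
    counts = Counter(cells)
    return [100.0 * counts[c] / len(cells) for c in range(1, n_classes + 1)]


def share_difference(
    predicted: Sequence[int], real: Sequence[int], n_classes: int
) -> List[float]:
    """Predicted share minus real share per class (positive = overestimated)."""
    pred = class_shares(predicted, n_classes)
    ref = class_shares(real, n_classes)
    return [p - r for p, r in zip(pred, ref)]


# Toy maps with three classes (made-up values).
real_2010 = [1, 1, 2, 2, 3, 3, 3, 3]
predicted_2010 = [1, 2, 2, 2, 3, 3, 3, 3]
diff = share_difference(predicted_2010, real_2010, 3)
print(diff)
```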
LULC prediction maps show the projected land use changes in 2050 and 2070 (Figure 6). The classifications in 2050 and 2070 are based on the classifications in 2000. For predicting a species' distribution, the land use change represents the percentage change in the suitable land cover classification that the species requires to survive in the future. The percentages of these changes are presented in Table 6.
The percentages of shrub and garden fields are projected to increase by approximately 0.1547% and 0.3532% from 2000 to 2050 and 0.1768% and 0.3606% from 2000 to 2070, respectively. However, the crop and forest areas are projected to decrease by 1.24848% and 1.6779% from 2000 to 2050, and by 1.2860% and 1.7256% from 2000 to 2070, respectively. Moreover, most of the changes are toward plantation: the plantation area is projected to increase by 4.1379% by 2050 and by 4.1709% by 2070.
Land use change between 1990 and 2000 in the transition matrix shows that most of the changed area was converted from forest and shrubland into plantations. The two notable changes into plantation are from forest (0.045%) and shrubland (0.053%). However, the highest change was from shrubland to crop area, at 0.11%. This transition shows that human activity in forest and shrubland is high. The demand for land for human activities changes LULC classifications from less profitable ones (forest and shrubland, which yield lower economic profits) to more profitable ones (plantations such as palm oil and rubber, which provide annual or seasonal income). This condition threatens the species in forest and shrubland by reducing the area suitable for their survival. Other studies have demonstrated that conversion of forest and crop area into oil palm plantation is high in Indonesia, in particular on Sumatra Island [40,41].
The differences between the land use map from 2000 and those of 2050 and 2070 illustrate the high increases in plantation area of more than 4%, while the other classifications show changes of approximately 0%-1.8%. An increase in one classification implies a decrease in other classifications. The larger changes of other categories into plantation indicate that the transition probability for plantation is higher than for the other categories. The changes were learned by the ANN to weight the categories proportionally for use in the model.

Conclusions
The LULC is of critical importance and forms a baseline thematic map for monitoring, resource management, and planning activities. Moreover, the LULC can be used to develop strategies to balance conservation, conflicting uses, and development pressures. In this study, changes in LULC in North Sumatra, Indonesia, are simulated and predicted using the ANN-CA model.
Five criteria, including altitude, slope, aspect, distance from road, and soil type, are derived from a DEM, road map and soil type map, and utilized as exploratory data for training the ANN-CA model. In the simulation of LULC changes between 1990 and 2000, two criteria, altitude and distance from road, show high impacts on LULC changes. The Kappa index also indicates high agreement between the predicted and real LULC maps of 2010. Then, the ANN-CA model is used to predict the LULC changes for the years 2050 and 2070.
The LULC predictions for 2050 and 2070 show high increases in plantation area, which increase by more than 4% from the same category in 2010. Meanwhile, forest and crop areas are projected to decrease by approximately 1.2% and 1.6%, respectively, by 2050, and by approximately 1.2% and 1.7%, respectively, by 2070, relative to the 2010 LULC, indicating the human influences on LULC changing forest and cropland to plantations.
As described, North Sumatra is one of the regions in Indonesia with a large forest area that has been threatened by climate change and land use change [31]. In fact, North Sumatra has experienced massive development over the last three decades, as industrialization has changed the land use of most areas, followed by agricultural expansion [30]. As a result, forest degradation is occurring through forest disturbance, especially land conversion [31]. The human interest in land allocation steers planning in North Sumatra toward supporting the development of the region. Development changes the land to more productive uses, such as agriculture or industrial types, rather than preserving it as forest [30]. For instance, most of the residents in the Lake Toba catchment area choose crop plantations rather than hardwood trees for reforestation [42]. Corporations expand their plantations to increase profit, which reduces the forest area [33]. This condition demands an LULC perspective that emphasizes the future impact of the changes, especially on the loss of forest biodiversity.
The ANN-CA model in this study is based on historical information and observes changes supported by explanatory variables. The CA demonstrates that the change of one pixel into another category relies on its neighborhood. The changes will continue until every pixel undergoes the changing process. The ANN-CA model might be more reliable in terms of reduced subjectivity and uncertainty when compared to other methods such as analytic hierarchy process (AHP) and technique for order preference by similarity to ideal solution (TOPSIS) for weighting criteria [43,44]. Moreover, this study illustrates that the ANN-CA model is successfully applied to conduct multiple LULC change simulations within one CA model by considering the complicated interactions and competition among different land use categories.
On the other hand, long-term effects that alter the landscape dynamics, as exhibited in [26,45,46], are not considered in this study; these include climate factors, such as global warming and extreme weather events, and ecological degradation, such as hydrological variation and soil erosion. Moreover, this ANN-CA model does not consider planning and development factors in a way that would determine demands for different land use types from the macro-scale perspective or examine scenarios representing future development pathways [47,48]. Therefore, it should be noted that the predicted future LULC changes obtained in this study are limited and should be interpreted conservatively.
The forest area in North Sumatra, as described, is one of the largest in Indonesia with large preservation forest and conservation forest areas. However, there is forest degradation resulting from forest disturbance, especially from land conversion. The development of the region by human interests changes the land to more productive uses, such as agriculture or industrial types, rather than preserving it as forest. In general, the LULC changes occur as the impact of conflicts over land interests. For instance, the local residents choose crop plantations for reforestation, and corporations expand their plantations to increase profit and thus reduce forested area in the region. This condition demands the LULC perspective to emphasize the future impact of the change, particularly for loss of forest biodiversity. Therefore, the LULC changes predicted by this study can be used in sustainable forest management and regional planning.
In this study, however, other variables, such as climate, policy, regulation or human development, affecting LULC changes are not taken into account. Therefore, future investigation will be improved by including those variables.