Long-Term Land Cover Data for the Lower Peninsula of Michigan, 2010–2050

Land cover data are often used to examine the impacts of landscape alterations on the environment from the local to the global scale. Although various agencies produce land cover data at various spatial scales, data are still limited at the regional scale over extended timescales. This is a critical data gap since decision-makers often use future and long-term land cover maps to develop effective policies for sustainable environmental systems. As a result, land change science incorporates common data mining tools to create future land cover maps that extend over long timescales. This study applied a well-known land cover change model, the Land Transformation Model (LTM), to produce urbanization maps for the Lower Peninsula of Michigan in the United States from 2010 to 2050 at five-year intervals. Long-term urbanization data for the Lower Peninsula of Michigan can be used in various environmental studies, such as assessing the impact of future urbanization on climate change, water quality, food security and biodiversity. Data Set: available as a supplementary file. Data Set License: CC BY


Introduction
Humans alter the landscape to meet their resource needs [1]. One considerable landscape change in response to human interaction is urbanization [2]. Environmental planners have raised concerns about the amount of land that urbanization occupies, especially in the United States and other developed nations [3]. While only 2.7% of the world's land is currently occupied by urban development, increased urbanization often leads to the intensified use of global resources (e.g., water, food, soil). With global demographic trends projecting 2.5 billion more urban residents by 2050 [4], urbanization will continue, with potential negative impacts on the environment and various ecosystem services [5]. Long-term land cover data, especially future urbanization data, play a significant role in better understanding the future impacts of land cover changes on multiple ecosystem services [6] such as biodiversity, climate change, water quality and food security [7][8][9][10][11].
A variety of agencies have produced land cover data from the local to global scales, covering different time periods and different spatial resolutions, such as Moderate Resolution Imaging Spectroradiometer [12], National Land Cover Data (NLCD) [13], International Geosphere-Biosphere Program [14], and GlobCover [15]. Most of these historical land cover products, created from remote sensing sensors, are coarse in resolution and limited in their long-term time intervals [16,17]. In terms of spatial resolution, for example, Global Land Cover and Moderate Resolution Imaging Spectroradiometer are at 1 km and 500 m, respectively. In terms of temporal resolution, NLCD has land cover data for the entire United States only for 1992, 2001, 2006, and 2011. Decision-makers often use future land cover maps to develop effective policies for sustainable future environments [18]. To overcome the limitation of coarse resolution land cover data and their long-term time intervals, we used a land cover change model called the Land Transformation Model (LTM) to create future high resolution land cover maps (e.g., 30 m) from 2010 to 2050 at five-year intervals for the Lower Peninsula of Michigan in the United States [19].
Land cover change is a very complex process driven by non-linear factors, including public policy, behavior, economics, and a variety of biophysical and geographic factors, operating at a variety of spatial and temporal scales [20][21][22]. The complex nature of land cover change relationships requires the use of modern data-mining tools to extract underlying patterns in land cover change data [23]. A variety of data mining approaches, including empirical [24], dynamic [25], rule-based [26], agent-based [27] and machine learning techniques [1,2,15], have been applied in land change science.
A variety of studies have compared existing land cover change models with each other across the globe. Many of these studies have shown that machine learning techniques, such as artificial neural networks (ANN), performed more accurately than other models [23,28]. ANN is inspired by the way biological nervous systems (e.g., the brain) process information. An ANN is composed of a large number of interconnected processing elements called neurons working together to solve specific problems. Within the ANN, the learning process involves adjustments to the synaptic connections that exist between the neurons.
The Land Transformation Model (LTM), as a data mining approach, simulates large-scale land cover change by integrating remote sensing and GIS data within an ANN [28]. The LTM has been used to forecast land cover change in a variety of areas across the world, such as the United States [1,29], Europe [30], eastern Africa [23] and Asia [31]. Primarily, the LTM has been used to (1) determine the uncertainty levels of land change model outputs at a variety of spatial-temporal scales and land change contexts [32]; (2) couple with other process-based models to understand how land cover alters climate [33,34], water [26,35] and ecosystem dynamics; and (3) generate baseline data layers for online decision making tools [36,37]. Here, we applied the LTM to produce urbanization maps for the Lower Peninsula of Michigan in the United States from 2010 to 2050 at five-year intervals.

Study Area
We used the Lower Peninsula of Michigan as the primary boundary (Figure 1), which includes the areas that drain into the adjacent Great Lakes, incorporating parts of the Chicago metropolitan area, northern Indiana, and the Toledo metropolitan area. Nearly 10 million people live in Michigan's Lower Peninsula, and more than 15 million people when including the boundary extent [38]. The largest metropolitan areas include Chicago, Detroit, Grand Rapids, Kalamazoo, Lansing, South Bend, Toledo, and Traverse City. The main commodities produced include corn, soybeans, wheat, hay, cherries, apples, blueberries, potatoes, cucumbers, dry beans, and sugar beets. These eleven commodities account for nearly 98% of the agricultural land across the boundary, and 36% of the entire boundary region; the remaining area is 14% urban, 25% forest, 12% grassland, and 13% open water.

Data
We focused on modeling the conversion of other land cover (e.g., forest, agriculture) to urban (Figure 2B). The output of the LTM is a binary land cover change map between two times, in which each cell is coded either 1 (for cells converted from other land cover classes to the urban class) or 0 (for cells remaining in the same land cover class). We used the NLCD in 1992 and 2001 to create the initial binary urbanization maps (Figure 2A). To find cells with a status of 1 and 0, we aggregated the NLCD at each time point to a binary level covering urban and non-urban classes. By comparing the binary maps in 1992 and 2001, we created a land cover change map between the two times, highlighting new urbanization cells in 2001 compared to 1992.
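The aggregation and comparison described above can be sketched on a toy grid. This is a minimal illustration, not the authors' workflow: the function names, the 3 × 3 grids, and the set of class codes treated as urban are assumptions made for the example.

```python
def to_binary(grid, urban_codes):
    """Collapse a categorical land cover grid to 1 (urban) / 0 (non-urban)."""
    return [[1 if cell in urban_codes else 0 for cell in row] for row in grid]


def urban_change(binary_t1, binary_t2):
    """Code 1 where a cell was non-urban at t1 and urban at t2, else 0."""
    return [[1 if (a == 0 and b == 1) else 0 for a, b in zip(row1, row2)]
            for row1, row2 in zip(binary_t1, binary_t2)]


# Toy 3 x 3 grids; the class codes are illustrative assumptions
# (2x = developed, 41 = forest, 82 = crops, 11 = water).
URBAN = {21, 22, 23, 24}
lc_1992 = [[21, 41, 41], [82, 82, 41], [82, 82, 11]]
lc_2001 = [[21, 22, 41], [82, 23, 41], [82, 82, 11]]

change = urban_change(to_binary(lc_1992, URBAN), to_binary(lc_2001, URBAN))
```

Cells that were already urban in the first map are coded 0 in `change`, which matches the description of the output as a map of new urbanization only.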
We followed the existing literature in the United States [1,2,23,28] to select the main driving forces that influence urbanization in the Lower Peninsula (Figure 2C), including (1) distance to urban areas, (2) population density, (3) distance to primary highways, (4) distance to secondary highways, (5) distance to rivers, (6) distance to inland lakes, and (7) distance to the coast. Each distance raster was calculated using Euclidean distance in ArcGIS at 30 m × 30 m resolution. All layers were rasters of 25,000 rows and 25,000 columns in 1992. Within the modelling process, urban areas, roads, parks and water bodies in 1992 were labelled as exclusionary zones, that is, areas where no urban expansion would occur.
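To make the distance predictors concrete, the sketch below computes a Euclidean distance raster by brute force on a tiny grid. It is only an illustration of the quantity being computed: the study used ArcGIS, and the function name and toy road mask here are assumptions. A brute-force search like this would not scale to 25,000 × 25,000 cells.

```python
import math


def euclidean_distance_raster(feature_mask, cell_size=30.0):
    """Distance (in metres) from each cell centre to the nearest feature cell."""
    features = [(r, c)
                for r, row in enumerate(feature_mask)
                for c, value in enumerate(row) if value == 1]
    if not features:
        raise ValueError("feature_mask contains no feature cells")
    return [[min(math.hypot(r - fr, c - fc) for fr, fc in features) * cell_size
             for c in range(len(row))]
            for r, row in enumerate(feature_mask)]


# Hypothetical 3 x 3 mask with a single road cell in the middle of the left column.
roads = [[0, 0, 0],
         [1, 0, 0],
         [0, 0, 0]]
dist_to_road = euclidean_distance_raster(roads)
```

With 30 m cells, the road cell itself has distance 0, its horizontal neighbour 30 m, and a diagonal neighbour 30·√2 m.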

Land Transformation Model
The LTM uses an ANN, which is a machine learning technique, for modelling land cover change [37,39,40,41]. The multilayer perceptron is one of the well-known ANN forms and is the one most commonly employed in land cover change science [1]. The multilayer perceptron ANN consists of one input layer, one hidden layer and one output layer. In this study, the number of nodes set for the input and hidden layers was the same as the number of driving forces [36], while we used one node for the output layer. The output node of the ANN was coded with either 1 (for cells converted from other land cover classes to the urban class) or 0 (for cells remaining in the same land cover class) between 1992 and 2001. To avoid overfitting, we used a stratified random sampling approach to select 50% of the data for training purposes, since the proportion of urban and non-urban areas was not the same. The multilayer perceptron was trained with the most widely utilized algorithm, called back propagation (Figure 2B,C), to estimate the ANN parameters.
The back-propagation algorithm encompasses two phases. First, the ANN initially assigns random values to the ANN parameters (weights and biases), then the multilayer perceptron applies these random weights and biases to the input data in order to estimate the outputs. Second, the ANN calculates the mean squared error (the difference between the estimated outputs and reference outputs), which is then propagated backward to the previous layers [40]. Optimum model parameter values are estimated by iterating the LTM through many cycles. A cycle is defined as one complete presentation of all the training data to the ANN [28]. Within the forward and backward process, the error between the reference and estimated outcomes is reduced by updating the ANN parameters (e.g., weights and biases). The difference between the mean squared errors of successive cycles is used as a stopping condition, where a training run continues until two successive mean squared error differences reach less than 0.05 [23]. The training run stopped after 10,000 cycles.
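The stratified sampling step can be illustrated as follows. This is a sketch under assumptions: the function name, the fixed seed, and the toy label vector are hypothetical, and the text does not say how the LTM software draws the sample internally.

```python
import random


def stratified_sample(labels, fraction=0.5, seed=0):
    """Return sorted indices holding `fraction` of each class (0 and 1) separately."""
    rng = random.Random(seed)
    chosen = []
    for cls in (0, 1):
        members = [i for i, label in enumerate(labels) if label == cls]
        chosen.extend(rng.sample(members, round(len(members) * fraction)))
    return sorted(chosen)


# Toy, imbalanced labels: 90 unchanged cells and 10 newly urban cells.
labels = [0] * 90 + [1] * 10
train_idx = stratified_sample(labels)
```

Sampling each class separately keeps the ratio of changed to unchanged cells in the training half the same as in the full data, which a simple random draw would not guarantee for a rare class.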
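The two phases can be sketched with a small one-hidden-layer perceptron written in plain Python. This is an illustration rather than the LTM code: the sigmoid activation, per-sample weight updates, learning rate, tolerance, and toy data are all assumptions, and the stopping rule is one reading of the condition described above (the change in mean squared error between successive cycles falling below a tolerance, with a cap on the number of cycles).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w_hidden, w_out, x):
    """Return (hidden activations, output); each weight list ends with its bias."""
    hidden = [sigmoid(sum(w * v for w, v in zip(wj[:-1], x)) + wj[-1])
              for wj in w_hidden]
    out = sigmoid(sum(w * v for w, v in zip(w_out[:-1], hidden)) + w_out[-1])
    return hidden, out


def train(X, y, n_hidden, lr=0.5, tol=1e-9, max_cycles=10000, seed=0):
    rng = random.Random(seed)
    n_in = len(X[0])
    # Phase 1 starts from random weights and biases.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []  # mean squared error per cycle
    for _ in range(max_cycles):
        squared_error = 0.0
        for x, target in zip(X, y):
            hidden, out = forward(w_hidden, w_out, x)
            error = out - target
            squared_error += error * error
            # Phase 2: propagate the error backward (constant factor folded into lr).
            delta_out = error * out * (1.0 - out)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
                w_hidden[j][-1] -= lr * delta_hidden[j]
            w_out[-1] -= lr * delta_out
        history.append(squared_error / len(X))
        # Stop when the error barely changes between successive cycles.
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return w_hidden, w_out, history


# Toy data: two normalised distance predictors; nearby cells urbanised (1).
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
y = [1, 1, 0, 0]
w_hidden, w_out, history = train(X, y, n_hidden=2, max_cycles=3000)
near = forward(w_hidden, w_out, [0.1, 0.2])[1]
far = forward(w_hidden, w_out, [0.9, 0.7])[1]
```

The hidden layer has as many nodes as there are inputs, mirroring the architecture described for the study, although here there are only two toy predictors instead of seven.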

The estimated parameters are then applied to the independent predictor variables in 1992 (Figure 2C) to calculate the likelihood of urbanization in 2001 (Figure 2D). The result of this step is called a suitability map, where cell values vary from 0 (least likely to convert to urban) to 1 (most likely to convert to urban) and describe the probability of urbanization [26]. While the suitability map determines the potential for urbanization, the LTM also requires a quantity of urbanization to convert the suitability map to a simulated binary map with values of 0, indicating non-urban, or 1, indicating urbanization [36]. This quantity was calculated by comparing the land cover maps in 1992 and 2001 (Figure 2E). The cells in the suitability map are sorted from 1 to 0, and the LTM converts the cells with the highest suitability values to the urban class until the quantity of urbanization is met (Figure 2F).
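The ranking-and-allocation step can be sketched as below. The function name, the toy suitability values, and the exclusion mask are assumptions for illustration; ties are broken by grid order here, which the text does not specify.

```python
def allocate_urban(suitability, quantity, excluded=None):
    """Mark the `quantity` most suitable non-excluded cells as 1 (urbanized)."""
    n_rows, n_cols = len(suitability), len(suitability[0])
    candidates = [(suitability[r][c], r, c)
                  for r in range(n_rows) for c in range(n_cols)
                  if not (excluded and excluded[r][c])]
    # Sort from highest to lowest suitability (stable, so ties keep grid order).
    candidates.sort(key=lambda item: -item[0])
    simulated = [[0] * n_cols for _ in range(n_rows)]
    for _, r, c in candidates[:quantity]:
        simulated[r][c] = 1
    return simulated


# Hypothetical suitability map; the centre cell is an exclusionary zone.
suitability = [[0.90, 0.20, 0.40],
               [0.70, 0.95, 0.10],
               [0.30, 0.60, 0.80]]
excluded = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
simulated = allocate_urban(suitability, quantity=3, excluded=excluded)
```

Although the centre cell has the highest suitability (0.95), it is skipped because it is excluded, so the three cells with values 0.90, 0.80 and 0.70 are converted instead.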

Model Calibration
To evaluate the predictive ability of the LTM, the transformation map in 2001 was combined with the NLCD in 1992 to create a composite land cover map that was then compared to the NLCD in 2001 (as a reference map) (Figure 2G). The performance of the LTM for modeling urbanization was evaluated using the Percent Correct Match statistic (PCM) and Relative Operating Characteristic curves (ROC) [23,28,42]. PCM describes the proportion of the reference map where urban and non-urban cells have been correctly predicted by the LTM [29]. In contrast, ROC calculates the accuracy across a range of thresholds varying from 0 to 1 [30,40]. For each given threshold (e.g., 0, 0.1, 0.2, ..., 0.9, 1), the suitability map was converted to a simulated urbanization map. We then compared the simulated urbanization map with NLCD 2001 to calculate false positive (disagreement between urban areas) rates and true positive (agreement between urban areas) rates for each threshold [1,30]. False positive rates and true positive rates are plotted along the X and Y axes, respectively, for each threshold. The area under the ROC curve represents the model accuracy [43]. The calculated PCM (90%) and ROC (85%) indicated that the accuracy of the trained model was high enough to be used for future land cover prediction [30].
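Both statistics can be sketched on toy maps. The PCM function follows the description given here (the share of cells on which the simulated and reference maps agree); the ROC function sweeps the listed thresholds and integrates the curve with the trapezoidal rule. Function names and toy values are assumptions for illustration.

```python
def percent_correct_match(simulated, reference):
    """Percentage of cells where the simulated and reference maps agree."""
    pairs = [(s, r) for srow, rrow in zip(simulated, reference)
             for s, r in zip(srow, rrow)]
    return 100.0 * sum(s == r for s, r in pairs) / len(pairs)


def roc_auc(suitability, reference, thresholds):
    """Area under the ROC curve built from the given suitability thresholds."""
    scores = [v for row in suitability for v in row]
    truth = [v for row in reference for v in row]
    positives = sum(truth)
    negatives = len(truth) - positives
    points = {(0.0, 0.0), (1.0, 1.0)}
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, truth) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, truth) if s >= t and y == 0)
        points.add((fp / negatives, tp / positives))
    ordered = sorted(points)  # by false positive rate, then true positive rate
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(ordered, ordered[1:]))


reference = [[1, 0], [1, 0]]
simulated = [[1, 0], [0, 0]]          # one urban cell missed
suitability = [[0.9, 0.2], [0.7, 0.4]]
thresholds = [i / 10 for i in range(11)]  # 0, 0.1, ..., 1

pcm = percent_correct_match(simulated, reference)
auc = roc_auc(suitability, reference, thresholds)
```

In this toy case the two urban cells have higher suitability than both non-urban cells, so the ROC curve passes through the top-left corner; the simulated map misses one of four cells, giving a lower PCM than the ranking alone would suggest.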

Future Land Cover Projection
We then used the LTM calibrated between 1992 and 2001 to forecast urbanization from 2010 to 2050 for each 5-year interval (Figure 3). The LTM parameters linearly estimated from urban transitions between 1992 and 2001 were applied to the predictor variables in 2001 to generate the suitability map of urbanization in 2010. The other ingredient for forecasting urbanization was the quantity of urbanized area from 2010 to 2050. These area estimates were required to convert continuous probabilities to the binary simulated land cover map, and were derived from future population projections from the U.S. Geological Survey (USGS). With area estimates for urbanization, we allocated future urban areas using the suitability map, in order from high to low suitability values. Urban transition cells with the highest suitabilities were converted until the total urban quantity was satisfied. This process produced predicted urbanization maps that were superimposed on NLCD 2001, resulting in a land cover map for each time step (Figure 3).
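The allocation across time steps can be sketched as follows. This is a simplified illustration with loud assumptions: a single suitability map is reused for every year, the cumulative cell counts per year are made-up numbers standing in for the population-based area estimates, and the function name is hypothetical.

```python
def project_urbanization(suitability, urban_base, cumulative_new_cells):
    """Return {year: binary urban map} by converting the top-ranked cells per year."""
    # Rank cells that are non-urban in the base map, most suitable first.
    ranked = sorted(
        ((r, c) for r, row in enumerate(urban_base)
         for c, value in enumerate(row) if value == 0),
        key=lambda rc: -suitability[rc[0]][rc[1]])
    maps = {}
    for year, n_new in sorted(cumulative_new_cells.items()):
        grid = [row[:] for row in urban_base]  # existing urban cells stay urban
        for r, c in ranked[:n_new]:
            grid[r][c] = 1
        maps[year] = grid
    return maps


# Hypothetical base map, suitability values, and cumulative new-cell quantities.
urban_base = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
suitability = [[0.00, 0.90, 0.30],
               [0.80, 0.60, 0.20],
               [0.40, 0.10, 0.05]]
maps = project_urbanization(suitability, urban_base, {2010: 1, 2015: 2, 2020: 4})
```

Because the quantities are cumulative and the ranking is fixed, every cell that is urban in one year remains urban in all later years, so the projected maps grow monotonically.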

Conclusion
The calculated PCM (90%) and ROC (85%) indicate that the goodness-of-fit of our trained model was high enough to be used for future land cover prediction. The predicted land cover map for each time step is coded with values of either 0, indicating non-urban, or 1, indicating urbanization. The predicted land cover maps are raster files of 25,000 rows and 25,000 columns. Table 1 shows the future urban areas for metropolitan regions in the Lower Peninsula of Michigan. The predicted land cover maps for the Lower Peninsula of Michigan can be used to assess the impact of future urbanization on climate change, water quality, food security and biodiversity.
Author Contributions: All authors contributed to writing the manuscript. Amin Tayyebi conceived the project, ran the LTM, and wrote the initial draft of the manuscript. Samuel Smidt prepared spatial and temporal land cover data for modeling. This included data conversion and geographically projecting each land cover map as well as processing predictor variables. Bryan C. Pijanowski is the owner of the LTM and provided insights during the entire project. All authors have read and approved the final manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Figure 1. Study area in the Lower Peninsula of Michigan in the United States.

Figure 2. Schematic illustrating the use of the Land Transformation Model.

Figure 3. Future urbanization maps for the Lower Peninsula of Michigan in the United States from 2010 to 2050 produced by the Land Transformation Model.
