Recovering Forecast Distributions of Crop Composition: Method and Application to Kentucky Agriculture

Abstract: This paper proposes a novel application of the multinomial logit (MNL) model using the Cropland Data Layer and field-level boundaries to estimate crop transition probabilities, which are used to generate forecast distributions of total acreage for five major crops produced in the state of Kentucky. These forecast distributions have a wide range of applications: besides providing interim acreage estimates ahead of the June Acreage Survey, they can inform the ability of producers to incorporate new crops into the land-use rotation, investments in location-specific capital and input distribution, and the likelihood of adverse water quality events from nutrient run-off.


Introduction
Improvements in crop production forecasts and forecasting methods have been a focus of agricultural economics research for decades because production forecasts directly influence local, state, regional, national, and international economies from year to year. Crop production assessments also have important implications for agribusiness and food management [1]. On a macro level, understanding the determinants of crop acreage helps with the identification and management of the demand and supply of crop production [2]. For example, merchandizers rely on crop supply and demand estimates prepared by both public and private organizations [3]. Moreover, estimates and forecasts of acreage can have a significant impact on futures prices and market volatility [4], as well as on market participants [5], thereby affecting farm income and investment in agriculture. As a consequence, forecast releases are highly anticipated market events [2,4].
The National Agricultural Statistics Service (NASS) of the U.S. Department of Agriculture (USDA) is the primary provider of public information on crop acreage forecasts [4]. According to Adjemian and Smith [6], the USDA also releases the World Agricultural Supply and Demand Estimates (WASDE) at the beginning of May each year, providing forecasts of annual U.S. production for several crops. Thereafter, the USDA releases a new WASDE report each month, incorporating detailed farm surveys, weather forecasts, and expected market developments from NASS and the Interagency Commodity Estimates Committees (ICEC). In addition to the NASS estimates, private expectations data from Conrad Leslie and Informa Economics (previously Sparks Companies) have been used for crop acreage forecasts, especially for corn, soybeans, and wheat [5]. These crop production forecasts, as well as the NASS acreage estimates, are based on survey data. Specifically, the acreage estimates are based on the March and June Agricultural Surveys. In the March Agricultural [...] or water quality outcomes. Our approach could also be employed to generate distributions of land-use forecasts several years into the future, which is another advantage relative to the existing June Acreage Survey estimates.
In summary, our work (i) makes new use of the Cropland Data Layer and field-level boundaries for generating land-use forecasts, (ii) proposes a novel application of the multinomial logit model to generate forecast distributions of total acreage for the five largest crops produced in the state of Kentucky, and (iii) discusses how to harness the agricultural land-use forecast distributions to inform policy discussions related to investments in location-specific capital, distribution of supply inputs, soil and water quality management, and the viability of new crops such as hemp becoming a significant part of a state's crop portfolio.
The remainder of the paper is structured as follows: Section 2 discusses related literature, Section 3 describes and presents data. Section 4 explains the empirical models, including the multinomial logit model and a first-order Markov chain approach. Section 5 discusses the analysis and presents the results. We conclude the paper in Section 6 and offer policy implications and areas for future research.

Literature Review
Crop rotation (also called polyculture) is defined as growing a series of multiple crops in the same field in alternating years, whereas monoculture is defined as growing a single crop in consecutive years in the same field [13]. Farmers commonly practice crop rotation because its advantages outweigh its disadvantages. Crop rotation benefits include increased yield [14,15], improved soil fertility [16][17][18], reduced greenhouse gas emissions [19], and reduced economic risk from having more than one crop as a potential income source [13]. The traditional crop rotation in western Kentucky is either a Corn-Soybean rotation or Wheat-Double Crop Soybeans-Corn. In areas with tobacco production, it is typically Tobacco-Tobacco-Alfalfa. Both corn and tobacco require significant nitrogen fertilizer for growth, which is why they are rotated with either soybeans or alfalfa. Furthermore, incorporating leguminous (nitrogen-fixing) crops such as soybeans into a rotational sequence with the region's dominant crop increases the robustness and resilience of the local agricultural system [20,21].
An MNL method, developed by McFadden [22], is employed in this study to model crop choice behavior. Using this method, we can model a farmer's choice of a crop among different, mutually exclusive alternatives (i.e., corn, soybean, tobacco, wheat, and alfalfa) and estimate Markov transition probabilities for these crops. We explain the MNL method in detail in Section 4.1. The MNL model has been widely used in agricultural studies, especially for modeling land use [23][24][25][26]. Hardie and Parks [24] apply the MNL model to land use decisions in the southeastern U.S., incorporating heterogeneous land quality. Plantinga, Mauldin and Miller [25] use the MNL model to simulate carbon sequestration based on estimates of land use shares in Maine, South Carolina, and Wisconsin. Wu, Adams, Kling and Tanaka [26] use the MNL model to predict crop choice and tillage practices in order to assess the economic and environmental consequences of agricultural land-use changes. Carrión-Flores, Flores-Lagunes and Guci [23] use an MNL model that incorporates spatial dependence to study the determinants of land-use choices in Medina County, Ohio. Paton, et al. [27] investigate the impact of rainfall and crop profit margin on crop choice by using MNL regression to generate crop choice transition probabilities.
Matis, et al. [28] propose a methodology that uses Markov chain theory to forecast crop yields and provide forecast distributions of crop yield. They find that forecasting crop yield distributions is more informative than forecasting mean yields. Following the literature on crop yield distribution forecasting, we use a first-order Markov chain approach to predict crop acreage distributions, accounting for dynamics by specifying the year-to-year transitions between five crops. To the best of our knowledge, our approach has not previously been employed to forecast crop acreage distributions, though there are several relevant applications to inform producers and policymakers.

Data
The primary source of information for our crop choice data is the Cropland Data Layer (CDL) produced by NASS. The CDL was initiated in early 1997 on a limited basis to provide annual geospatial content to customers who were interested in annual cropland cover updates. Since 2008, the CDL has provided coverage for the lower 48 states of the U.S. The CDL provides comprehensive, raster-formatted, geo-referenced imagery for crop-specific land cover classification to identify field crop types accurately and geospatially [29]. A raster, also called a raster graphic, is simply an image represented as a rectangular grid of pixels. Each pixel in the CDL has a ground resolution of 30 meters by 30 meters. The CDL includes crop or land use classification codes, which are assigned to each pixel and classified by NASS using data from satellite sensors [9,16]. Use of the CDL data has been limited to date but has received more attention recently in studies of farmers' crop choice behavior [9,16,18,21,30,31]. We use the CDL observations for Kentucky from 2010 to 2015. We focus our analysis on five main crops: corn, soybeans, tobacco, wheat, and alfalfa. These are the top five row crops in Kentucky based on acreage (see Figure A1 in Appendix B, which gives an idea of how much area these crops occupy in Kentucky). Next, we employ the Common Land Unit (CLU) boundaries, obtained from the GeoCommunity, to identify field boundaries. According to the Farm Service Agency (FSA) of the USDA, the CLU is the smallest unit of land, an individual contiguous farming parcel, that has a permanent, contiguous boundary, common land cover, and common land management.
To construct a field-level crop choice dataset, we used the following steps. First, to remove non-agricultural fields, we overlay the CLU with the 2011 National Land Cover Dataset (NLCD), which is the most recent national land cover product produced by the Multi-Resolution Land Characteristics (MRLC) Consortium. Second, we overlay the CLU with the CDL to identify changes in rotations on a field-by-field basis instead of a pixel or county basis. Third, we apply a moving window filter, which replaces each cell in a raster based on the majority of adjacent cells; this removes mis-specified (i.e., spurious) cells and smooths the raster. Finally, we employ zonal statistics, which calculate the values of a raster within the zones of another dataset, to identify how many pixels are located in each field. Table 1 shows the total number of observations and percent of observations by crop class and by year, respectively. We used 1,874,184 fields in total, with approximately 1.5 million observations. In Appendix A, we describe the data merging and cleaning process in detail. In 2015, soybeans, corn, alfalfa, tobacco, and wheat acreage represented 42%, 36%, 1%, 0.6%, and 0.4% of total acres in Kentucky, respectively. As illustrated in Figure A1 in Appendix B, the majority of corn and soybeans are planted in western Kentucky.
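The moving window (majority) filter and the zonal statistics step can be illustrated with a minimal sketch. It is not the authors' GIS workflow: the function names, the 3 × 3 window, the toy raster, the crop codes, and the field IDs are all assumptions chosen for illustration.

```python
import numpy as np

def majority_filter(raster, size=3):
    """Replace each cell with the most common value in its size x size window."""
    pad = size // 2
    padded = np.pad(raster, pad, mode="edge")  # repeat edge cells at the border
    out = np.empty_like(raster)
    rows, cols = raster.shape
    for r in range(rows):
        for c in range(cols):
            window = padded[r:r + size, c:c + size].ravel()
            values, counts = np.unique(window, return_counts=True)
            out[r, c] = values[np.argmax(counts)]  # ties go to the smallest code
    return out

def zonal_majority(raster, zones):
    """For each zone (field) ID, return (pixel count, dominant crop code)."""
    result = {}
    for zone_id in np.unique(zones):
        values, counts = np.unique(raster[zones == zone_id], return_counts=True)
        result[int(zone_id)] = (int(counts.sum()), int(values[np.argmax(counts)]))
    return result

# Toy 4 x 4 crop raster with illustrative codes (1 = corn, 5 = soybeans)
cdl = np.array([[1, 1, 5, 5],
                [1, 5, 5, 5],
                [1, 1, 5, 1],
                [1, 1, 5, 5]])
# Hypothetical field IDs: field 10 on the left, field 20 on the right
fields = np.array([[10, 10, 20, 20]] * 4)

smoothed = majority_filter(cdl)
field_crops = zonal_majority(smoothed, fields)
print(field_crops)
```

In this toy case the two isolated pixels are absorbed by their neighbours, so each field ends up with a single dominant crop.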
A data limitation is that 28 (out of 120) counties in Kentucky are excluded from this study because the CLU data was not available. Table 2 shows the missing acres in percentage compared to the original CDL data. Based on Table 2, we are losing more data for tobacco and alfalfa compared to corn, soybeans, and wheat. Figure A2 in Appendix B shows the locations of excluded counties.
Crop choice decisions by farmers are partially dependent on the weather (e.g., precipitation and temperature) observed in a growing season; weather variables are, therefore, predictors of land use [32]. We obtained precipitation and temperature data from the Parameter-elevation Regressions on Independent Slopes Model (PRISM). For the weather variables, we use average precipitation in April, May, and June and average temperature in June, July, and August of each year.
Soil quality is another predictor of crop choice behavior by farmers; thus, we obtain data on soil textures (e.g., percent clay, percent silt, and percent sand) from the Gridded Soil Survey Geographic (gSSURGO) database, which is provided by the USDA Natural Resources Conservation Service (NRCS). Finally, we obtain cross-sectional National Elevation data (30-meter resolution) from the Geospatial Data Gateway provided by USDA-NRCS to calculate elevation and slope. All of these data (precipitation, temperature, slope, elevation, and soil textures) are spatially joined based on the unique field ID. Table 3 shows the summary statistics of the data used in this study. As shown in Table 3, the mean acreage of these fields is 5.97 acres, and their soil composition is 66.3% silt, 20.6% clay, and 11.8% sand, on average. Our sample covers 92 (out of 120) counties in Kentucky, accounting for 32,149.43 square miles, which corresponds to almost 80% of the total land in Kentucky. In addition, the average monthly temperature and precipitation are 29.94 °C and 143.68 mm, respectively.

Multinomial Logit Model
This study employs the MNL model developed by McFadden [22] to estimate Markov transition probabilities for the five primary crops in Kentucky: corn, soybean, tobacco, wheat, and alfalfa. In our setting, the farmer chooses one alternative among different, mutually exclusive alternatives. The random utility model (RUM) provides the theoretical economic justification for using an MNL when modeling choices. Under the RUM framework, the chosen alternative is determined by the utility level, $U_{ij}$, that each choice $j$ provides to farmer $i$. It follows that

$$U_{ij} = V_{ij} + \varepsilon_{ij}, \tag{1}$$

where $i = 1, \dots, N$; $j = 1, \dots, m$; $V_{ij}$ is a deterministic component that depends on regressors and unknown parameters; and $\varepsilon_{ij}$ is the unobserved component. The alternative providing the highest level of utility will be chosen; in other words, alternative $j$ is chosen if and only if $U_{ij} > U_{ik}$ for all $k \neq j$. Suppose a farmer chooses one crop among competing alternatives. The farmer will choose corn over soybeans if the utility from choosing corn is higher than the utility from choosing soybeans (i.e., $U_{Corn} > U_{Soybean}$). Under the RUM framework, the utility and the choice are random in that some of the determinants of utility are unobserved, implying that the choice must be analyzed in terms of probabilities. In this regard, we observe the outcome $y_i = j$ if alternative $j$ provides the highest utility among the alternatives, and the general expression for the probability of choosing alternative $j$ can be defined as:

$$P(y_i = j) = P\left(U_{ij} > U_{ik} \;\; \forall\, k \neq j\right). \tag{2}$$

According to Croissant [33], the MNL model generally assumes that the error terms are independently and identically distributed (IID). Under this assumption, with errors following a type I extreme value distribution, it can be shown from Equation (2) that the choice probabilities are

$$P_{ij} = \frac{\exp(V_{ij})}{\sum_{k=1}^{m} \exp(V_{ik})}, \tag{3}$$

where $0 < P_{ij} < 1$ and $\sum_{j=1}^{m} P_{ij} = 1$. In this study, we use crop choice in year $t$ as the dependent variable and crop choice in year $t-1$, precipitation, temperature, soil texture variables, slope, and elevation as explanatory variables to predict the crop choice behavior of Kentucky farmers.
This study also includes one more alternative, called "other." The choice of other means that farmers either plant none of the five major crops in the field during our study period or plant other crops. The other category includes fallow, oats, barley, grain sorghum, and double-crop beans. Note that we record outcomes 1, 2, 3, 4, 5, and 6 for other, corn, soybean, tobacco, wheat, and alfalfa, respectively. Since the recorded numerical values are arbitrary, a greater number does not imply a better outcome than a smaller number. The outcome of no production of the five major crops (i.e., the choice of other) is our base outcome, so we set $\beta_1 = 0$. With $x_i$ denoting the vector of explanatory variables for field $i$, the set of coefficients $\beta_j$ corresponding to each outcome is then estimated from Equation (4):

$$P(y_i = j \mid x_i) = \frac{\exp(x_i'\beta_j)}{1 + \sum_{k=2}^{6} \exp(x_i'\beta_k)}, \quad j = 2, \dots, 6, \qquad P(y_i = 1 \mid x_i) = \frac{1}{1 + \sum_{k=2}^{6} \exp(x_i'\beta_k)}. \tag{4}$$
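As a minimal sketch of how MNL choice probabilities with a zero-coefficient base outcome are computed, consider the following. The function name, the two hypothetical fields, the single regressor, and the coefficient values are illustrative assumptions, not estimates from this study.

```python
import numpy as np

def mnl_probabilities(x, beta):
    """MNL choice probabilities.

    x:    (n, k) matrix of regressors, one row per field.
    beta: (m, k) matrix of coefficients; row 0 is the base outcome
          and is fixed at zero.
    """
    v = x @ beta.T                       # deterministic utilities, shape (n, m)
    v -= v.max(axis=1, keepdims=True)    # numerical stability; ratios unchanged
    expv = np.exp(v)
    return expv / expv.sum(axis=1, keepdims=True)

# Hypothetical inputs: 2 fields, an intercept plus one regressor, 3 outcomes
x = np.array([[1.0, 0.2],
              [1.0, -1.0]])
beta = np.array([[0.0, 0.0],    # base outcome ("other"): coefficients set to zero
                 [0.5, 1.0],
                 [-0.3, 0.4]])

p = mnl_probabilities(x, beta)
print(p.round(3), p.sum(axis=1))
```

Because the base row of `beta` is zero, its exponentiated utility equals one, which reproduces the "1 +" term in the denominator of the base-outcome form of the model.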

Markov Chain Approach
The Markov chain approach has been widely used in land use studies [34][35][36][37][38][39] as well as in agricultural studies, especially for studying crop rotation behavior [27,28,[40][41][42][43]. Based on Taylor and Karlin [44] and Savage [38], a Markov process $\{X_t\}$ is a stochastic process with the property that, given the value of $X_t$, the values of $X_s$ for $s > t$ are not influenced by the values of $X_u$ for $u < t$. In other words, when the current state is known, the probability of any specific future behavior is not altered by additional knowledge of past behavior. The Markov property, in general, can be defined as follows:

$$P(X_{t+1} = j \mid X_0 = i_0, \dots, X_{t-1} = i_{t-1}, X_t = i) = P(X_{t+1} = j \mid X_t = i) \tag{5}$$

for all time periods $t$ and all states $i_0, \dots, i_{t-1}, i, j$.
The probability of $X_{t+1}$ being in state $j$ given that $X_t$ is in state $i$ is called the one-step transition probability, denoted by $P_{ij}^{t,t+1}$. The Markov chain is stationary if the one-step transition probabilities are not a function of the time variable $t$. For a stationary Markov chain, the one-step transition probabilities can be re-written as

$$P_{ij}^{t,t+1} = P_{ij} = P(X_{t+1} = j \mid X_t = i). \tag{6}$$

In our application, the one-step transition probability of outcome $j$ observed in the current period given outcome $i$ observed in the previous period can be written as follows:

$$P_{ij} = P(y_t = j \mid y_{t-1} = i). \tag{7}$$

With a stationary Markov chain, the transition probabilities are arrayed in a square matrix whose dimension equals the number of possible outcomes. The Markov process is then fully defined by the transition probability matrix and the initial condition. In this study, we have six possible outcomes based on six different choices, so the transition probability matrix is a 6 × 6 matrix whose elements give the probability of transitioning from a row outcome to a column outcome: the row outcome represents the cropping choice in the previous year, and the column outcome represents the crop choice in the current year.
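To make the role of the transition matrix and the initial condition concrete, the sketch below propagates a distribution of fields across states for one and two years. The three-state matrix and the initial shares are illustrative placeholders, not estimates from this study, which uses a 6 × 6 matrix.

```python
import numpy as np

states = ["other", "corn", "soybean"]

# Illustrative row-stochastic transition matrix (NOT estimates from the paper):
# rows = state in year t-1, columns = state in year t
P = np.array([[0.70, 0.15, 0.15],
              [0.20, 0.40, 0.40],
              [0.20, 0.50, 0.30]])
assert np.allclose(P.sum(axis=1), 1.0)   # each row must sum to one

# Hypothetical initial condition: share of fields in each state in year t
initial = np.array([0.2, 0.5, 0.3])

one_step = initial @ P                               # shares in year t+1
two_step = initial @ np.linalg.matrix_power(P, 2)    # shares in year t+2

for name, share in zip(states, one_step):
    print(f"{name}: {share:.2f}")
```

Under stationarity the same matrix is applied every year, so multi-year forecasts only require repeated multiplication by `P`.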

MNL Results and Transition Probabilities
This study implements several hypothesis tests to assess model fit. First, we test whether all of the coefficients associated with each independent variable are equal to zero by using the likelihood-ratio (LR) test and the Wald test. These tests allow us to determine whether the independent variables used in the MNL model are significant across all outcome categories. Based on the results of the LR and Wald tests, we reject the null hypothesis that all coefficients associated with a given variable are zero, implying that no variable can be dropped from the model since the independent variables have a significant impact across all crop choices. Second, we use the Wald test to check whether some categories of the dependent variable can be combined. Outcomes would be combined if they were not differentiated with respect to the independent variables; we find no evidence to support combining outcome categories. Third, the chi-square statistic and p-value reported with the MNL estimates in Table 4 indicate that the model as a whole is statistically significant. The results are consistent with the expected crop rotation patterns for all five crops used in this study. For example, in the case of corn, soybeans or wheat are expected to be more likely than corn, tobacco, or alfalfa to be planted after corn. We also calculate the marginal effects, which are reported in Table 5. The marginal effects in Table 5 show that if corn was planted in the previous year, then, compared to the 'other' choice category, corn is 6.1% less likely, soybean is 27.7% more likely, tobacco is 0.002% more likely, wheat is 0.1% more likely, and alfalfa is 0.01% less likely to be planted. In Table 5, we do not report the marginal effect for the choice of 'other'. In general, the marginal effects sum to zero.
As another measure of model performance, McFadden's pseudo R-squared is commonly used to summarize in-sample model fit. According to Louviere, et al. [45], in-sample model fit is considered extremely good if McFadden's pseudo R-squared is between 0.2 and 0.4. Domencich and McFadden [46] argue that this range is equivalent to 0.7 to 0.9 for a linear regression model; our model has a pseudo R-squared of 0.25, indicating a high degree of in-sample explanatory power.
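For reference, McFadden's pseudo R-squared is one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model. The sketch below computes it from fitted probabilities; the function name and the four-observation toy data are assumptions for illustration only.

```python
import numpy as np

def mcfadden_r2(probs, y):
    """McFadden's pseudo R-squared.

    probs: (n, m) fitted choice probabilities.
    y:     (n,) observed outcome indices in 0..m-1.
    """
    n, m = probs.shape
    ll_model = np.log(probs[np.arange(n), y]).sum()
    # Intercept-only model: fitted probabilities equal the sample shares
    shares = np.bincount(y, minlength=m) / n
    ll_null = np.log(shares[y]).sum()
    return 1.0 - ll_model / ll_null

# Hypothetical toy data: two outcomes, four observations
y = np.array([0, 0, 1, 1])
probs = np.array([[0.8, 0.2],
                  [0.7, 0.3],
                  [0.4, 0.6],
                  [0.1, 0.9]])
r2 = mcfadden_r2(probs, y)
print(round(r2, 3))
```

A model whose fitted probabilities equal the sample shares yields a value of zero, which is why the statistic is read as an improvement over the intercept-only benchmark.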
We also conduct an out-of-sample validation test for the predicted probabilities from the MNL model. In other words, how accurately does the estimated model predict crop choice? For the validation test, we employ the following steps. First, we select 10% of the data using a random sampling process, which we call our test sample. Second, we estimate the MNL model with the remaining data (90% of the data, also called the training set), and then predict the crop choice probabilities for each observation in the test sample. Third, we assign crops based on the highest predicted probability for each field. For example, if a field has predicted probabilities of 0.55, 0.20, 0.10, 0.05, 0.05, and 0.05 for corn, soybeans, tobacco, alfalfa, wheat, and other, respectively, then corn (having the highest probability of being planted) is assigned to the field. Finally, we compare these predictions with the actual crop choice in each field and calculate the acreage accuracy. Based on the out-of-sample validation exercise, 60.7% were correctly predicted. We also repeat the exercise with test samples of 20% and 30% of the data and find that 61.4% and 60.7% are correctly predicted, respectively.
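The assignment and accuracy steps of the validation exercise can be sketched as follows. The first row reuses the example probabilities from the text, with the three smallest values taken to be 0.05 each so that the row sums to one (an assumption); the other rows, the observed crops, and the field acreages are hypothetical, and interpreting "acreage accuracy" as an acreage-weighted share is likewise an assumption.

```python
import numpy as np

crops = ["corn", "soybeans", "tobacco", "alfalfa", "wheat", "other"]

# Predicted probabilities for three hypothetical test-sample fields
probs = np.array([[0.55, 0.20, 0.10, 0.05, 0.05, 0.05],
                  [0.30, 0.50, 0.05, 0.05, 0.05, 0.05],
                  [0.45, 0.35, 0.05, 0.05, 0.05, 0.05]])
actual = np.array([0, 1, 1])          # observed crop index per field (hypothetical)
acres = np.array([10.0, 6.0, 4.0])    # field sizes in acres (hypothetical)

predicted = probs.argmax(axis=1)      # assign the crop with the highest probability
correct = predicted == actual

field_accuracy = correct.mean()                          # share of fields predicted correctly
acreage_accuracy = acres[correct].sum() / acres.sum()    # share of acres predicted correctly

print([crops[k] for k in predicted], field_accuracy, acreage_accuracy)
```

Weighting by acreage lets large fields count for more than small ones, which matters when the quantity of interest is total acres by crop rather than the number of fields.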
As a summary of the MNL model results presented in Table 4, we calculate average conditional predicted probabilities, or Markov transition probabilities, over our study period. Table 6 shows average transition probabilities based on the annual predicted transition probabilities from 2010 through 2015. Based on Table 6, if corn is planted in year t-1, the average predicted probability between 2010 and 2015 that corn will be planted in year t is 0.411. The average rotation probabilities from corn to soybean and from soybean to corn between year t-1 and year t are 0.239 and 0.309, respectively. Compared to other crops such as tobacco, wheat, and alfalfa, the transition probabilities for corn and soybean are relatively low. Martinez and Maier (2014) state that rotating cereal crops such as corn and wheat with leguminous crops such as soybean and alfalfa is a common example of crop rotation. Farmers therefore switch from corn to soybean and from soybean to corn not only to maintain and improve soil fertility but also to protect the environment from nitrogen runoff. Overall, the statistical performance of our model demonstrates significant explanatory power and qualitatively predicts general crop rotation patterns that accord with our priors, though new data and model refinements can likely improve the explanatory power of our estimated model.
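One plausible way to obtain average transition probabilities from field-level predictions is to average the predicted probability vectors over all fields that share the same previous-year crop. The sketch below illustrates that reading; the function name and toy inputs are assumptions, and the averaging procedure actually used for Table 6 may differ.

```python
import numpy as np

def average_transition_matrix(prev_crop, probs):
    """Average predicted probabilities by previous-year crop.

    prev_crop: (n,) index of the crop planted in year t-1.
    probs:     (n, m) predicted probabilities for year t.
    Returns an (m, m) matrix; rows with no observations are NaN.
    """
    m = probs.shape[1]
    transition = np.full((m, m), np.nan)
    for i in range(m):
        mask = prev_crop == i
        if mask.any():
            transition[i] = probs[mask].mean(axis=0)
    return transition

# Hypothetical toy inputs: three fields, two outcomes
prev_crop = np.array([0, 0, 1])
probs = np.array([[0.6, 0.4],
                  [0.4, 0.6],
                  [0.3, 0.7]])
T = average_transition_matrix(prev_crop, probs)
print(T)
```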

Simulation Exercise
The objective of this section is to generate a forecast distribution of total acreage for crop i in year t using information up to year t-1 (e.g., predicting crop composition in 2016 using data up to 2015 to estimate the MNL model). For this purpose, we follow three steps. First, we use the MNL model specified in the previous section to estimate the probability of a farmer choosing crop i for field j. This yields a probability distribution over crops for each field j. Second, we make a single draw from each field's distribution (e.g., field 1 turns up corn, field 2 turns up corn, field 3 turns up soybeans, and so on for each field). Based on this single draw for all fields and knowing the acreage corresponding to each field, we are able to sum up the aggregate (forecasted) acreage corresponding to each crop; this is one realization of the aggregate composition of acreage (for one draw). Third, we complete 1000 such draws, thus generating 1000 forecasts of aggregate acres by crop. These 1000 draws are what we use to generate our forecast distributions by crop type.
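The three steps above can be sketched as a small Monte Carlo routine. This is a minimal illustration under stated assumptions: the field-level probabilities and acreages are randomly generated placeholders rather than MNL predictions, and the function name, seed values, and number of fields are hypothetical.

```python
import numpy as np

def simulate_acreage(probs, acres, n_draws=1000, seed=0):
    """Simulate aggregate acreage by crop.

    probs: (n, m) choice probabilities per field (rows sum to one).
    acres: (n,) acreage of each field.
    Returns an (n_draws, m) array; each row is one simulated crop composition.
    """
    rng = np.random.default_rng(seed)
    n, m = probs.shape
    cdf = np.cumsum(probs, axis=1)
    totals = np.zeros((n_draws, m))
    for d in range(n_draws):
        u = rng.random(n)                           # one uniform draw per field
        choice = (u[:, None] > cdf).sum(axis=1)     # inverse-CDF categorical draw
        choice = np.minimum(choice, m - 1)          # guard against rounding in cdf
        totals[d] = np.bincount(choice, weights=acres, minlength=m)
    return totals

# Placeholder inputs: 500 fields, 3 crops (NOT data from the paper)
rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(3), size=500)
acres = rng.uniform(1.0, 10.0, size=500)

totals = simulate_acreage(probs, acres)
lower, upper = np.percentile(totals, [2.5, 97.5], axis=0)
print(totals.mean(axis=0).round(1), lower.round(1), upper.round(1))
```

Each row of `totals` allocates every field to exactly one crop, so the rows always sum to total acreage; the spread across rows is what forms the forecast distribution for each crop.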
We apply this method to Kentucky fields and predict the expected crop composition in 2016 based on our MNL model results. Figure 1 shows the forecasted distributions for each crop in 2016. When comparing these to the actual state average acres produced, we find that the forecasted mean is close to the historical means for the simulated counties. Some counties are missing from the data set because CLUs are unavailable for them. The two largest crops produced in the state are by far corn and soybeans. As expected, their distributions are significantly wider than those of the other three crops considered. In general, there is little variation in the tobacco, wheat, and alfalfa acres in the simulation, which makes sense given growing and contractual considerations. Alfalfa is a perennial crop with a five- to seven-year rotation and is typically rotated with tobacco. This causes the acres of this crop to be fairly stable over time. Tobacco is primarily produced via production contracts, so its acreage is quite stable from year to year. Likewise, for wheat, Kentucky is home to significant milling and distilling industries that have relatively stable contracts with producers. This leaves corn and soybean acres as the primary acreages in the state that shift to fill in the gaps. These forecast distributions demonstrate this phenomenon.

As a summary of results, findings show there are higher probabilities of planting soybeans or wheat after corn relative to corn after corn, tobacco, or alfalfa.
In addition, the transition probabilities demonstrate that corn tends to be planted after soybeans, and vice versa, and that alfalfa has a lower probability of being rotated with other crops from year to year. These findings are consistent with traditional crop rotation in the U.S. and, for alfalfa in particular, with the characteristics of a perennial crop. Finally, the forecasting results indicate that the distributions for corn and soybean are significantly wider, whereas there is little variation in the tobacco, wheat, and alfalfa acres in the simulation.

Discussion of Production and Policy Implications
The forecast distributions we generate can inform both short- and long-run investment decisions. In the short run, fertilizer suppliers must decide how much fertilizer to hold in inventory and where they should anticipate sending it [47]. Location-specific demand for fertilizer is uncertain, and our forecast distributions reflect this uncertainty. For example, in response to uncertainty, fertilizer suppliers may keep buffer fertilizer supplies ready where forecast distributions are wider.
In the longer term, investments in location-specific capital will benefit from knowledge of forecast distributions. As an example from the recent past, an ethanol plant will generally have a higher return on investment (ROI) when there is a high degree of certainty about expected crop production, in this case expected corn production. While there is less discussion of new ethanol plants in Kentucky, the distillery industry continues to grow in the state and requires investment in new distilleries, another example of location-specific capital [48]. Knowledge of the distribution of corn and wheat compared to soybean and new crops such as hemp, which benefits from flexibility in land use composition, can support investment decisions in the industry. Note that Kentucky ranks as the third largest hemp-growing state in the U.S. The total areas of hemp grown in 2017 and 2018 were 3271 acres and 6700 acres, respectively. In addition, growing industrial hemp as an agricultural commodity in the U.S. was legalized by the 2018 Farm Bill, suggesting more room for growth.
Our forecast distributions may also offer insight into nutrient load management and water quality outcomes. For example, with parcel-specific information on soil characteristics, land slope, and proximity to waterways, predictions of nutrient run-off can be generated for each crop choice under normal and adverse weather conditions. Such information may be combined with our forecast distributions to generate nutrient run-off distributions for both normal and adverse weather conditions. Further, these distributions may be combined with water quality standards to calculate exceedance probabilities, that is, the likelihood that nutrient loads in local waterways exceed water quality standards under normal and adverse weather conditions. To enrich this analysis for specific applications, qualitative information on production practices, soil test values, and demographics could be collected for all farmers in the region to further enhance the predictive ability of the model.
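As a minimal sketch of the exceedance calculation, suppose a set of simulated nutrient loads has already been derived from the forecast acreage draws. The exceedance probability is then the share of simulated loads above the standard. The function name, the load values, and the threshold below are hypothetical placeholders with arbitrary units.

```python
import numpy as np

def exceedance_probability(loads, threshold):
    """Share of simulated nutrient loads that exceed a water quality threshold."""
    loads = np.asarray(loads, dtype=float)
    return float((loads > threshold).mean())

# Hypothetical simulated loads and standard (arbitrary units)
loads = np.array([80.0, 95.0, 102.0, 110.0, 99.0])
p_exceed = exceedance_probability(loads, threshold=100.0)
print(p_exceed)
```

Running the same calculation on loads simulated under normal and adverse weather scenarios would give one exceedance probability per scenario.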

Concluding Remarks
This study proposes a novel application of the MNL model to estimate the conditional transition probabilities of crop choice and then generates forecast distributions of total acreage by crop type. For this purpose, we utilize the Cropland Data Layer (CDL), which we overlay with the Common Land Unit (CLU) dataset to accurately define crop choice at the field level; we focus on the production of corn, soybeans, tobacco, wheat, and alfalfa in Kentucky. Based on the transition probability estimation results, we find that corn is more likely to be followed by soybeans, as would be expected. Tobacco and alfalfa are found to be monoculture crops since they are more likely to be planted in consecutive years. These findings are consistent with the traditional crop rotation in Kentucky.
Specifically, rotation between soybeans and corn can be explained by changes in prices but also, in part, by soil productivity. Since corn requires a significant amount of nitrogen fertilizer, farmers tend to incorporate leguminous crops such as soybean into a rotational sequence, which can result in wider forecast distributions. The forecast distributions for the other crops are narrower because alfalfa is a perennial crop and because tobacco and, frequently, wheat are contracted crops. These estimated forecast distributions can be applied in various fields of applied research.
The method proposed in this study could be used to evaluate where new crops such as hemp are most likely to be produced, inform investments in location-specific capital such as new distilleries, provide information for the management of location-specific input storage and distribution such as fertilizer, and assess the likelihood of adverse water quality events such as those triggered by nutrient run-off from crops with significant nutrient applications. Our results may also supplement NASS June Acreage Survey estimates in areas where response rates are low and can serve as interim estimates in the winter season before the spring survey. It is difficult to make direct comparisons between our estimates and those of NASS since NASS combines estimates for several counties in Kentucky (i.e., there are only estimates for 79 of 120 counties, and these do not match the counties for which we generate estimates). Despite differences in exact coverage, historical distributions of corn and soybean from 1990 to 2015 based on data from USDA NASS (see Appendix B, Figure A3) show that the forecasted distribution of corn looks similar to the historical distribution of corn even though the average acreage is lower in the forecasted distribution. This can be explained by the counties missing from the CLU (see Figure A2). For soybeans, we find that the historical distribution is right-skewed, whereas the forecasted distribution is normal. Finally, generating forecast distributions is another way for farmers, policymakers, and other stakeholders to consider or visualize uncertainty in forecast estimates by crop.
Several limitations should be outlined alongside our findings. First, some crops are poorly identified in the CDL data. Based on CDL accuracy assessment information provided by NASS, average accuracy for tobacco, wheat, and alfalfa in Kentucky is 76.1%, 48.8%, and 74.1% from 2010 to 2015, whereas corn and soybeans are identified with 96.1% and 93.5% accuracy, respectively. Higher quality CDL data will result in better predictions. Second, for simplicity and because of data availability, we assume that farmers plant a single crop in each field per growing season rather than several crops. In practice, however, a farmer might grow more than one crop in a field, which needs to be addressed in future studies. Third, our MNL model and simulation exercise are based only on agronomic characteristics. This study can therefore be extended by incorporating microeconomic variables such as expected net return, expected price, and farmer characteristics. A further caveat is that we focus on Kentucky, where agricultural lands are relatively heterogeneous compared to states like Iowa, Illinois, and Nebraska, where agricultural lands are homogeneous; our results therefore do not easily generalize to other states.
Despite these limitations, our use of the CDL to tackle the specific empirical objective of forecasting agricultural land-use distributions is, to our knowledge, novel and presents several new avenues of applied research.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A Discussion of Data Set
Although we described how we merged and utilized the different data sets in the data section, we discuss here in more detail some difficulties we encountered and how we overcame them. First, the field boundary data do not cover the entire state of Kentucky (i.e., there are missing field boundaries), as mentioned in the manuscript (see Figure A2). However, we think those missing fields do not represent the major crops produced in Kentucky. Missing fields could be a problem in other states if they cover a large portion of the major croplands. Researchers using the CDL data with the CLU should therefore check carefully for missing fields before merging with the CDL. Second, we had an issue in identifying the crop choice. In reality, a field is not always fully covered by a single crop at the pixel level when the CDL is overlaid with the CLU (see Figure A1). For instance, multiple crops might exist in a single field, so we make the strong assumption that the field is corn if corn acreage dominates other crops. Alternatively, an analyst can identify the representative crop in the field by using a centroid point and the pixel with which it intersects. This may work for a large sample, such as all 48 contiguous U.S. states, but it may provide inaccurate results for forecasting purposes. Third, some data sets used in this study (especially the soil, slope, and elevation data) are not time-varying; in other words, they are data for a given year. However, they are the only available information that can be merged with the field-level data, and we assume that soil quality, slope, and elevation do not vary significantly over time.

Appendix B Additional Figures and Tables
Sustainability 2020, 12, x FOR PEER REVIEW 4 of 18
Figure A1. 2015 major crops produced in Kentucky from the Cropland Data Layer (CDL). Notes: This map is at the field level and was created using the CDL dataset and field boundaries. In 2015, corn was a major crop in Kentucky, followed by soybeans, tobacco, alfalfa, and wheat.