Linking Arable Crop Occurrence with Site Conditions by the Use of Highly Resolved Spatial Data

Agricultural land use is influenced in different ways by local factors such as soil conditions, water supply, and socioeconomic structure. We investigated at regional and field scale how strong the relationship of arable crop patterns and specific local site conditions is. At field scale, a logistic regression analysis for the main crops and selected site variables detected, for each of the analyzed crops, its own specific character of crop–site relationship. Some crops have diverging site relations such as maize and wheat, while other crops show similar probabilities under comparable site conditions, e.g., oilseed rape and winter barley. At the regional scale, the spatial comparison of clustered variables and clustered crop pattern showed a slightly stronger relationship of crop combination and specific combinations of site variables compared to the view of the single crop–site relationship.


Introduction
In the last several decades, European arable farming has been characterized by modifications of cropping patterns and crop choice driven by enormous progress in plant breeding, plant protection, fertilization, and drainage techniques [1,2]. Additionally, market prices, farm subsidies, and political incentives such as support of bioenergy crops have influenced crop choice [3][4][5]. Recent studies have shown that a few cash crops are preferentially grown both in time and space while other crops are neglected [6,7]. In Northern Germany, maize and winter wheat are cropped on more than 50% of the arable area, and in many regions only one to three relevant crops are grown [7]. On the other hand, a decreasing importance of regional site conditions such as soil conditions, water supply, and climate for choosing a crop for a given site can be observed [8,9]. Thus, the relationship between site conditions and farmers' crop choice (hereafter referred to as the crop-site relationship) seems to be weakening in modern farming.
One initial objective of the Common Agricultural Policy (CAP) is to increase productivity. This policy, therefore, has been a major driver of land use change for many decades [10]. The reform of 2003 introduced new rules for payments to farmers. Payments were decoupled from production and converted to the Single Farm Payment. At the same time, intervention prices for specific crops were maintained. National schemes on the promotion of renewable energy crops supported the intensive cultivation of crops for biomass production [11]. All this resulted in a continuation of intensive arable production in many historically intensively managed regions [12][13][14]. The latest reform of the CAP in 2013 implemented political instruments that are commonly referred to as "greening" [15], such as crop diversification. However, there is a lack of knowledge on whether farmers have enough options to diversify crop rotations. In a recent approach, it was shown on the basis of spatial data that some crop rotation patterns refer to site conditions, whereas others explicitly do not [16]. To our knowledge, there is no spatially explicit information on the extent to which the crop-site relationship still exists in recent landscapes. We present here a method to detect the relationship of crop cultivation and site conditions to improve the understanding and assessment of ecosystem services in the agricultural system.
With the presented methods, a binary logistic regression and a k-means clustering, we analyzed crop patterns in the landscape to understand to what extent crop choice still depends on site conditions. We first explored how intensive the individual relationship between the single crop and the single site variable is. Second, we localized regions of relationship between the clustered sets of site variables and the clustered crop patterns. Our study combines site variables and crop data of the year 2011 for the German federal state Niedersachsen (Lower Saxony), which includes an exceptional variety of agricultural systems. These characteristics made the region a good example for other arable regions and for the estimation of future trends in agricultural land use.

Research Area
Lower Saxony is characterized by various site conditions and a broad spectrum of agricultural land uses. The 2.6 million ha of farmland are cultivated by 41,730 farms with an average farm size of 61.8 ha [17]. During the last decade, maize (Zea mays L.) became the most dominant crop followed by winter wheat (Triticum aestivum L.) and oilseed rape (Brassica napus L.) ( Figure 1). The northwestern part is dominated by marshy land with maritime climate, a high proportion of permanent grassland and extensive cattle breeding in the north and livestock breeding in the west. The cropping proportion of maize on arable land is above average for the Lower Saxonian acreage in this region. In the eastern part, sandy moraine soils with mixed farms are dominating. Arable farming characterizes the middle and south of Lower Saxony established on loessial soils in a hilly terrain influenced by subcontinental climate. The preferred crops under these conditions are sugar beet (Beta vulgaris subsp. vulgaris), oilseed rape, and winter wheat.

Data Characteristics and Processing
Our analysis followed two complementary approaches to detect the characteristics and spatial distribution of specific crop-site relationships. In a first step, a logistic regression analysis was performed that combined crop information at the field scale for the ten most commonly used crops in Lower Saxony with site variables such as soil, precipitation, and livestock density to characterize the relationship between these and the crops at the field scale. This result was compared with the result from a k-means clustering process to localize spatial overlays of clustered crops and clustered site variables at the regional scale.
For the crop data at the field scale, the Land Parcel Identification System (LPIS) was used, which is a yearly updated database that supports the administration of direct payments for European farmers as part of the Integrated Administration and Control System (IACS). It was established in all member states of the European Union in 1992 and developed concurrently with political reform measures [19]. In Germany, the data are managed by the institutions of the German federal states. Access is limited for privacy protection reasons, and special permission is required for scientific use. For this study, information about the main agricultural land use type in 2011, the field size, and individual field identification numbers were provided for the state of Lower Saxony. The dataset was attributed to a GIS geometry that comprises the boundaries of all agricultural parcels (about 990,000 records in total) [20]. Due to a small amount of imprecise field identification, e.g., the assignment of one ID to more than one field, the IACS dataset had to be cleaned of ambiguous records. For the analysis, only arable fields were included. Hence, with a loss of 15% due to imprecise field identification and intersection loss, the basic dataset of the analysis consists of 444,009 agricultural parcels.
To analyze the crop-site relationship, it was necessary to find spatial variables that represent the site conditions of the investigated area in a suitable resolution and with consistent area-wide availability. Official data from well-established public sources satisfied these requirements (Table 1). The variables were selected with the aim of representing the environmental site conditions in Lower Saxony. This northwestern part of Germany is characterized by high local densities of livestock husbandry and grassland farming ([21], Figure 2). Therefore, variables concerning animal production were included. The data for cattle density, pig and poultry density, and average farm size were extracted from agricultural census data at LAU-2 (Local Administrative Unit) scale (Figure 2). The relative biotope index was developed by the Julius Kühn-Institute, the German Federal Research Centre for Cultivated Plants, to estimate the biotope features in agricultural landscapes. The value for the relative biotope density was calculated using the locally observed density of linear biotope habitats (field margins and hedgerows) and patch biotopes (small woods and grassland patches) per estimated minimum biotope density at LAU-2 scale. The latter was extrapolated from the intensity of plant protection in the corresponding landscape type: the higher the intensity of plant protection applications, the higher the need for biotopes [27]. The proportion of grassland refers to the area of grassland per arable area in a 1 × 1 km cell of a raster. The multi-annual precipitation sum (1981-2010, [24]) is available in a 0.96 × 0.96 km raster format. Temperature was not considered due to the low variation of the thermal regime in the study region. For the soil texture and slope information, the data of the European Soil Database were used, which are available in so-called Soil Typological Units [23].
The arable farming potential was derived by the Lower Saxonian State Office for Mining, Energy and Geology (LBEG) based on soil and climate parameters (e.g., soil texture, bulk density, humus content, soil structure, and water logging level) [28]. The higher the value of the arable farming potential, the higher the natural local potential of the soil for biomass production. For the regression analysis, all metric variables were transformed from metric values into interval values to facilitate the comparison of the variables' potential (Table 1). The classification of the intervals was implemented by a geometrical interval algorithm that minimizes the sum of the squares of the number of elements per class to ensure approximately the same number of values in each range [29].
Due to the differences in format and spatial scales of the used datasets, they were processed in relation to a reference scale. For the logistic regression, the reference scale was the field scale. For the cluster process, the information content of the variable polygons was attributed to a 1 × 1 km grid according to their spatial location and proportion. Grid cells with less than 10% of arable area within the grid cell area, i.e., less than 10 ha of arable area, were not included in the analysis. The merging of the attributed information was performed with the Spatial Join tool in ArcGIS®. For the small patched polygons of the arable farming potential, the mean of all soil classes per quadrant was attributed. Furthermore, the grid surface permits the calculation of the crop area proportion (crop area per arable area in a 1 × 1 km grid cell) as metric variables. The crop area per grid cell is the sum of all fields that had their centroid within one grid cell.
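The centroid-based aggregation to the 1 × 1 km grid can be illustrated with a minimal sketch. It is written in Python purely for illustration (the study itself used ArcGIS for this step); the function names, the tuple layout of the field records, and the toy values are assumptions and not part of the original workflow.

```python
import math
from collections import defaultdict

CELL_SIZE_M = 1000.0   # 1 x 1 km grid, as described in the text
MIN_ARABLE_HA = 10.0   # cells with less than 10 ha of arable area are dropped


def cell_of(x, y, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a centroid."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def crop_proportions(fields):
    """Aggregate fields to grid cells and return crop area per arable area.

    `fields` is an iterable of (centroid_x_m, centroid_y_m, crop, area_ha);
    this record layout is a hypothetical simplification of the LPIS data.
    """
    arable = defaultdict(float)                         # cell -> arable ha
    by_crop = defaultdict(lambda: defaultdict(float))   # cell -> crop -> ha
    for x, y, crop, area_ha in fields:
        cell = cell_of(x, y)
        arable[cell] += area_ha
        by_crop[cell][crop] += area_ha

    result = {}
    for cell, total in arable.items():
        if total < MIN_ARABLE_HA:
            continue  # too little arable land in this cell
        result[cell] = {crop: ha / total for crop, ha in by_crop[cell].items()}
    return result


# Toy records (invented values, not LPIS data).
fields = [
    (250.0, 400.0, "maize", 12.0),
    (800.0, 900.0, "winter wheat", 8.0),
    (1500.0, 300.0, "rye", 4.0),   # lone small field: its cell is filtered out
]
proportions = crop_proportions(fields)
```

In this toy case, the first two fields share one cell with 20 ha of arable area, so their proportions are computed, while the cell holding only the 4 ha rye field falls below the 10 ha threshold and is excluded.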

Binary Logistic Regression (Field Scale)
Logistic regression is used instead of linear regression when the observed or measured response of interest is not continuous but binary, in order to predict the likelihood of an event over the likelihood of nonoccurrence [30]. The cultivation of a crop on a specific field is such a binary event. Its likelihood under the occurrence of a specific site variable indicates the strength of its relationship to the cultivation site. If the site variable, e.g., cattle density, changes by one unit while all other variables stay constant, the log odds of crop occurrence, e.g., of maize, increase or decrease by the corresponding coefficient of the regression equation. This coefficient can be positive or negative, and its absolute value can exceed one. The two variables arable farming potential and soil texture have an ordinal scale and not a metric scale like all the other variables. For this reason, all characteristics of these two variables were analyzed separately (Table 3). The first characteristic, peat soil for soil texture and very low arable farming potential, had the role of the reference value, the same role that zero had for the other variables.

The nine main crops of Lower Saxony were chosen for analysis plus one group containing all spring cereals. For each of the ten crop categories, a binomial regression equation with a binary response variable, y ∈ {0, 1}, was defined to determine the probability of occurrence for each crop separately [31,32]. The regression analysis was performed using the software CRAN-R version 3.1.0 [33]. It uses a logarithmic function calculating the logit (π_i) for the ratio of the probability (P_ij) that a field (i) is cultivated with a specific crop (j) or not (1 − P_ij). Written as a logit equation as suggested by Fahrmeir et al. [34],
$$\pi_i = \ln\left(\frac{P_{ij}}{1 - P_{ij}}\right) = \beta_0 + \sum_{k} \beta_k x_{ik}$$
The predictor (π_i) represents the logarithmic odds (log odds), while the coefficient (β_k) for a variable (x_ik) is the expected change in these log odds. While holding the other predictor variables constant, a one-unit increase of a predictor variable changes the log odds of having the subject crop by the corresponding coefficient value [29,35].
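The interpretation of a coefficient as a change in log odds can be checked numerically with a short sketch. It is written in Python for illustration only (the analysis itself was run in R); the coefficients, variable names, and field values below are hypothetical and are not taken from Table 3.

```python
import math


def inv_logit(eta):
    """Convert log odds (the linear predictor) into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Convert a probability into log odds."""
    return math.log(p / (1.0 - p))


def linear_predictor(intercept, coefs, x):
    """Compute beta_0 + sum_k beta_k * x_k for one field."""
    return intercept + sum(b * v for b, v in zip(coefs, x))


# Hypothetical coefficients, NOT values from the study.
intercept = -2.0
coefs = [0.4, -0.1]   # e.g., cattle density class, farm size class
field = [3, 2]        # interval classes observed on one field

p_before = inv_logit(linear_predictor(intercept, coefs, field))
# Raise the first variable by one unit and hold the other constant.
p_after = inv_logit(linear_predictor(intercept, coefs, [4, 2]))

log_odds_change = logit(p_after) - logit(p_before)   # equals coefs[0]
odds_ratio = math.exp(coefs[0])                      # multiplicative change in odds
```

The change in log odds equals the coefficient regardless of the starting values, whereas the change in probability depends on where on the logistic curve the field sits. This is why coefficients are compared on the log odds scale.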
The likelihood ratio test with a null model for each crop resulted in a rejection of the null hypothesis for all crops. That means that the observed crop occurrence is more likely under the presented model than under the null model.
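As a minimal sketch of the likelihood ratio test against an intercept-only null model, the statistic can be computed from binary outcomes and fitted probabilities. The Python code below is illustrative only; the outcomes and fitted probabilities are invented, and the comparison with the critical value assumes a model with a single additional parameter.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary outcomes y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def likelihood_ratio_statistic(y, p_model):
    """Return 2 * (logLik(model) - logLik(null)) for an intercept-only null."""
    share = sum(y) / len(y)   # the null model predicts the overall crop share
    ll_null = log_likelihood(y, [share] * len(y))
    ll_model = log_likelihood(y, p_model)
    return 2.0 * (ll_model - ll_null)


# Toy data: 1 = crop grown on the field, 0 = not grown (hypothetical values).
y = [1, 1, 1, 0, 0, 0, 1, 0]
p_model = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.4]   # assumed fitted values

lr = likelihood_ratio_statistic(y, p_model)
# For one additional parameter, the 5% critical value of the chi-squared
# distribution is about 3.84; more parameters require a larger threshold.
reject_null = lr > 3.84
```

A large statistic means the observed crop occurrence is considerably more likely under the fitted model than under the null model, which is the sense in which the null hypothesis was rejected for all crops.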
In contrast to the other variables, arable farming potential and soil texture are handled as factor variables. The coefficient of the first category acts as a reference category with a value of zero.
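The handling of a factor variable with a reference category can be sketched as a simple dummy (treatment) coding. The Python snippet is illustrative; the function name and the list of soil texture levels are assumptions, and only peat as the reference category is taken from the text.

```python
def treatment_code(value, levels):
    """Dummy-code one factor value; the first level is the reference (all zeros)."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels[1:]]


# Level names are illustrative; only "peat" as the reference comes from the text.
soil_levels = ["peat", "sand", "loam", "clay"]

peat_row = treatment_code("peat", soil_levels)   # reference: no indicator set
loam_row = treatment_code("loam", soil_levels)   # exactly one indicator set
```

Because the reference category contributes nothing to the linear predictor, the coefficient of every other category expresses the difference in log odds relative to peat soil or very low arable farming potential, respectively.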
We inspected the correlation effects between the site variables to identify the degree of correlation between the variables, e.g., between cattle density and biotope density and between soil texture and arable farming potential (Table 2). These effects are inherent in variables that characterize ecological and spatial phenomena [36]. A high correlation of the variables is an expected effect and is therefore not considered in the equation. This decision follows from the objective of the regression analysis, which is not used as a predictive model but as a method to characterize the relationship between the crops and the site conditions. The values of the correlation matrix already indicate the joint appearance of single variables (Table 2). The interaction of site variables creates regional patterns that are investigated in the next section employing a cluster analysis.
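A pairwise correlation matrix of site variables can be computed as in the following sketch. The text does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption, as are the variable names and all values.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Return a nested dict with the correlation of every pair of columns."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


# Hypothetical interval classes for five fields (not data from the study).
site = {
    "cattle_density": [1, 2, 3, 4, 5],
    "grassland_share": [2, 4, 5, 8, 10],
    "farming_potential": [5, 4, 4, 2, 1],
}
corr = correlation_matrix(site)
```

In this toy example, cattle density rises with the grassland share and falls with the farming potential, which mirrors the kind of joint appearance of variables that a correlation matrix makes visible.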

Cluster Analysis (Regional Scale)
A non-hierarchical k-means clustering with the Hartigan & Wong algorithm [37] was used to detect regional patterns of similarities for the site variables and for crops [38,39]. This was realized with the software CRAN-R version 3.1.0 [33,40]. The k-means clustering is a common method for identifying spatial units at the landscape scale [41][42][43]. It was used in this paper to identify spatial units with consistent properties. The crop clusters and the site clusters were then compared in their spatial concordance.
The optimal number of classes, k, was found by comparing results of multiple runs with different number of classes and visualizing the grade of clustering in a map [44]. The uncertainty of the initial random partition was adjusted by choosing the most frequent version of partition in ten runs. In a previous step, a z-transformation of all variable values standardized the very different scales to improve the comparability of the results. The cluster analysis generated five site clusters (S1, S2, S3, S4, and S5) and five crop clusters (C1, C2, C3, C4, and C5).
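The sequence of z-transformation, repeated k-means runs, and selection of the most frequent partition can be sketched as follows. This Python sketch uses the simpler Lloyd algorithm rather than the Hartigan & Wong algorithm applied in the study, and all function names, the seed, and the toy values are assumptions made for illustration.

```python
import random
from collections import Counter


def z_transform(rows):
    """Standardize each column to mean 0 and standard deviation 1.

    Assumes every column varies (a constant column would divide by zero).
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, rng, max_iter=100):
    """Lloyd's algorithm from a random start; returns one label per row."""
    centers = rng.sample(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster ran empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def canonical(labels):
    """Renumber clusters by first appearance so equal partitions compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)


def most_frequent_partition(rows, k, runs=10, seed=0):
    """Run k-means several times and return the most frequent partition."""
    rng = random.Random(seed)
    counts = Counter(canonical(kmeans(rows, k, rng)) for _ in range(runs))
    return counts.most_common(1)[0][0]


# Toy grid cells with two site variables on very different scales
# (e.g., precipitation in mm and a farming potential class; invented values).
rows = [[650.0, 6.0], [660.0, 7.0], [640.0, 6.5],
        [820.0, 2.0], [830.0, 1.0], [810.0, 2.5]]
partition = most_frequent_partition(z_transform(rows), k=2)
```

The canonical relabeling is needed because k-means numbers its clusters arbitrarily, so two runs can find the same grouping under different labels. Without it, identical partitions would be counted as different versions.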

Site Dependency at Field Scale
The intensity of the crop-site relationship is reflected in the coefficient value of the logistic regression analysis (Table 3). In general, the probability of crop appearance in the dataset depends more strongly on soil variables than on other site variables. Arable farming potential and soil texture show a high likelihood of determining the occurrence or non-occurrence of a crop but vary in their direction of relationship.
There are linear relations between crop and site variables in different directions, e.g., the increase in farming potential increases the probability of wheat but decreases the probability of forage cropping. Oilseed rape is an example of non-linear relations. It was cropped on fields with a middle and high arable farming potential with a much higher likelihood than on fields with an extremely high farming potential. The log odds results of sugar beet show that soil variables can differ in their direction of influence and explain different aspects of the crop-soil relationship. The ambivalent relationship of sugar beet cropping and soil texture is determined by historical production quotas rather than by soil conditions. The variables farm size, pig/poultry density, grassland density, and biotope index generally have a low influence on probability. Each of the analyzed crops has its own specific character of site dependencies. Some crops such as maize and wheat have diverging site relations, while other crops, e.g., oilseed rape and winter barley, show similar probabilities under comparable site conditions. This result will be examined further in the next section by identifying regions with convergent characteristics.

Statistical Clustering and Spatial Projection
The nature of the relationship between site variables and the grown crop is examined in the regression analysis. With two statistical clustering processes (one for the site variables and one for the crop data), the characterization of the crop-site relationship will be transferred into a spatial projection to visualize overlapping spatial patterns. The k-means clustering of the site variables formed five continuous regions, which are characterized by their mean value in the defined clusters (Table 4).

Table 4. Mean values per cluster of the k-means clustering for site variables (S1, S2, S3, S4, and S5; corresponding map in Figure 3a). Values are z-standardized and represent how strong the standard deviation differs from the mean value (µ = 0.000). A small value shows no significant difference from the mean value. The positive and negative values represent the direction of deviation from the mean value in that cluster.

S1 is characterized by a low farming potential and sandy soils, which correlate with a higher than average cattle density, biotope density, and grassland proportion. A quite different pattern of site conditions and crops characterizes S2: less humid climate and larger farm sizes. S3 has strong relations to farms that are smaller than average, with a specialization in pig and poultry farming. S4 and S5 have many similar characteristics but are distinguishable by the steeper slope and higher precipitation of the fifth cluster.

The k-means clustering of the regional crop area proportion also resulted in five clusters (C1, C2, C3, C4, and C5). Each of these clusters has a characteristic composition of dominant crops (Table 5): The regional pattern of site conditions in C1 is related to a much higher than average maize proportion of the crop clustering process. C2 is the only cluster that is not dominated by maize or wheat but by a mixture of other crops, mainly rye and potato. C3 is characterized by a mixture of maize, triticale, and forage cropping. A composition of oilseed rape, winter wheat, and winter barley is the distinct feature of C4. The most obvious characteristic of C5 is a winter wheat proportion that is three times higher than the mean in Lower Saxony.

The transfer in a spatial projection of the clustering results reveals relationships between the site variables and the crop clustering on the one hand and distinctive differences on the other (Figure 3). Significant congruencies can be proved for S2 and C2, the potato-rye cluster. The second and third highest proportions of quadrants with spatial congruence were observed for S5 with C5 and for S1 with C1. The other two crop clusters have less than 50% spatial congruence with the site clusters.

Table 5. Mean values of the k-means clustering of crop data (corresponding map in Figure 3b). The values represent mean ratios of the crop area per arable area of the related quadrant. Values in bold are significantly higher than the mean value of the certain crop and are considered characteristic crops for the cluster type.
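One way to quantify spatial congruence is to cross-tabulate the two cluster labels of every grid cell. The text does not define the measure in detail, so the following Python sketch assumes congruence to be the share of a crop cluster's cells that lie in a given site cluster; the function name and the labels per cell are invented.

```python
from collections import Counter


def congruence(site_labels, crop_labels):
    """Share of each crop cluster's cells that falls into each site cluster."""
    pairs = Counter(zip(crop_labels, site_labels))
    totals = Counter(crop_labels)
    return {(crop, site): n / totals[crop] for (crop, site), n in pairs.items()}


# Hypothetical labels for six grid cells (not results from the study).
site_labels = ["S2", "S2", "S2", "S5", "S5", "S1"]
crop_labels = ["C2", "C2", "C2", "C5", "C5", "C5"]

shares = congruence(site_labels, crop_labels)
```

In this toy example, all C2 cells lie in S2, while only two thirds of the C5 cells lie in S5. A threshold such as 50% could then be applied to such shares to decide whether a crop cluster is considered congruent with a site cluster.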

General Discussion
Agricultural crops do not grow randomly at a specific site. Their spatial occurrence reflects the sum of farmers' decisions as a product of site conditions and the political and economic framework. In recent decades, farmers, breeders, and the plant protection industry have focused on a few profitable crops. This was also a result of the market price development and the European agricultural policy and culture of yield-based subsidies. However, sustainable cropping systems rely on diverse cropping systems, among other factors [45,46]. In our study, we detected the strongest relationship of site variables, namely soil texture and arable farming potential, with crops at the most productive areas and the least productive areas. Crops like sugar beet, oilseed rape, and winter wheat are characterized by a high probability to be cropped on sites with a high arable farming potential. The spatial congruence of site clusters (e.g., S5) with crop clusters (e.g., C5) confirmed the regression result referring to the relationship of very high farming potential and the combined cropping of sugar beet and winter wheat. This was supplemented reversely by the significant absence of single crops on soils with high farming potential, such as rye and forage. Zimmermann and Britz concluded from their study of the use of agri-environmental measures by farmers in the EU that those measures were most likely found on less productive sites during 2000-2009 [47]. The recent CAP 2014-2020 includes agri-environmental measures such as crop diversification as obligatory requirements for initial pillar payments. Recent studies concerning the impact assessment of the CAP 2014-2020 show contrary results: a limited environmental impact of the new greening rules [48] and strong effects on the farmland use in high-intensive agricultural regions [49].
The spring cereals and forage crops are characterized by a weak crop-site relationship, as are maize and winter wheat, which are the main arable crops, with acreage of 32% and 21% of the arable area, respectively [17]. The economic preference, the high tolerance for the combination with other crops, as well as the tolerance to short intervals in the rotation result in a dense cropping of maize and winter wheat in space and time [7,16]. Nevertheless, each of these two crops dominates regions that are characterized by contrasting conditions concerning the soil texture and arable farming potential, the slope, as well as the grassland and livestock density.
The relationship of maize cropping and specific combinations of site conditions is strongly determined by the cultivation practice for this crop. Rotations with maize are characterized by very dense cropping up to permanent cropping on the one hand and maize as one part of very diverse rotations on the other hand [16]. These rotation phenomena are common in regions with different site characteristics and geography. This is further confirmed by the result that the spatial congruency of site clusters and the crop cluster with dense maize cultivation (Figure 3, C1) was clearly distinguishable from their relationship to the cluster of maize cultivation in combination with other crops (C3). Whether maize cropping is allocated to C1 or C3 apparently has consequences in terms of ecosystem effects. While the spatially dense maize cultivation can have negative impacts on ecosystem services, the maize cultivation within the more diverse system of C3 can have a positive impact [50]. As the identified areas with high maize acreage are only partly explainable by livestock farming, they may correspond with other factors such as biogas production that are not represented by the explanatory data. The area cultivated with maize increased in Northwestern Germany from 2005 to 2011 by 67% [17]. The widespread cultivation of maize is an effect of the expansion of biogas production after the implementation of the national renewable energy law [11,26].

Reflections on the Methods Used
For a realistic analysis of regional crop-site relationships, the use of crop information at field scale is essential [51,52]. The yearly updated database of the LPIS is a valuable data source for agronomical and environmental analysis. The LPIS data have a high spatial resolution that allows for a precise intersection with other spatial information and yields precise answers to field scale questions. Area-wide crop information on field scale could also be useful for the validation of crop growth models especially for areas with a large diversity of cropping systems [53,54] and for modeling procedures when information concerning cropping practices is needed [52,55,56]. The scientific use of LPIS data, e.g., for the prediction of the crop yield or for projecting changes in agricultural land use practice, is increasingly becoming important [55,57-60].
Two statistical methods were applied for the analysis of the crop-site relationship: logistic regression analysis and k-means clustering, visualized by a map projection. Both approaches concern different levels and aspects of the relationship. The level of spatial similarities between the crop clusters and the site clusters supplemented the results of the logistic regression analysis and elucidated in parts the fuzzy picture of direct relationships. This underpins the need to include cropping patterns instead of single crop information in modeling approaches.
Not all the chosen variables have the expected potential to explain crop-site dependencies. The low influence of farm size, pig/poultry density, grassland density, and biotope index on the probability of crop cultivation in comparison with the soil variables can be explained by their low tendency to form spatial patterns or clusters in Lower Saxony, which is reflected in the high standard deviation values. In our analysis, we focused on environmental variables instead of economic variables because most of the studies concerning the cropping-plan decision making process of farmers consider economic and sociological drivers [3,61]. However, we could show the still high potential of soil variables as drivers for decision making, which is also confirmed by a study of Peltonen-Sainio et al. [62]. That study also identified field size as a potent driver variable, which was not considered in our study because it is indirectly included in the biotope index.
The crop clustering process resulted in a much more scattered picture than the site cluster projection. The latter is based on variables with different spatial resolution, ranging from the smaller scaled LAU-2 data to 1 km² resolved raster data that gave a different degree of precision. However, the different degree of spatial clustering is not only caused by the spatial resolution of the data sources. While the site clusters are products of natural conditions, the crop clusters are a result of both site conditions and socio-economic factors, e.g., market prices and subsidies. This supports the flexibility of the farmers in crop choice and therefore the fragmentation of crop clusters, especially in the center of Lower Saxony (#3, 5, 6 referring to Figure 1), with medium arable farming potential, sandy soils, and a higher variation of farm types in this area than in other regions.

Conclusions
The relationship of site conditions and crop cultivation at the field scale is generally weak but detectable for some crops. One reason is that modern cropping practice enables the farmer to override the relationship of crop and site to a large extent. However, this does not apply to all crop-site relationships. In arable regions with productive soils, the crop-site relationship is stronger. This comes along with specialization of the farming systems to a few cash crops, mainly the most profitable crops like sugar beet and winter wheat. On the other hand, a stronger relationship of crop and site at the regional scale was also detected for clusters with less productive soils and the crop cluster with dominant maize cultivation. Economic reasons and policy-based incentives, such as support for bioenergy crops, may have reinforced this allocation. Farming practice and agricultural policy must face the opportunities but also the risks of this development.
In regions with less fertile soils and mixed farming structure, farmers' cultivation practices are much more diverse. These site clusters are not dominated by one crop cluster but by multiple crop clusters with a number of dominating crops. The likelihood of crop rotation diversification is higher in these multiform regions, but in rather monotonous regions diversification efforts are crucial.