www.mdpi.com/journal/ijgi/ Integrating Open Access Geospatial Data to Map the Habitat Suitability of the Declining Corn Bunting (Miliaria calandra)

The efficacy of integrating open access geospatial data to produce habitat suitability maps for the corn bunting (Miliaria calandra) was investigated. Landsat Enhanced Thematic Mapper Plus (ETM+), Shuttle Radar Topography Mission (SRTM) and Corine (Coordination of Information on the Environment) land cover data for the year 2000 (CLC2000) were processed to extract explanatory variables and divided into three sets; Satellite (ETM+, SRTM), CLC2000 and Combined (CLC2000 + Satellite). Presence-absence data for M. calandra, collected during structured surveys for the Catalan Breeding Bird Atlas, were provided by the Catalan Ornithological Institute. The dataset was partitioned into an equal number of presence and absence points by dividing it into five groups, each composed of 88 randomly selected presence points to match the number of absences. A logistic regression model was then built for each group. Models were evaluated using area under the curve (AUC) of the receiver operating characteristic (ROC). Results of the five groups were averaged to produce mean Satellite, CLC2000 and Combined models. The mean AUC values were 0.69, 0.81 and 0.90 for the CLC2000, Satellite and the Combined model, respectively. The probability of M. calandra presence had the strongest positive correlation with land surface temperature, modified soil adjusted vegetation index, coefficient of variation for ETM+ band 5 and the fraction of non-irrigated arable land.


Introduction
European farmland bird species and long-distance migrants continue to decrease at an alarming rate [1] despite the passing, in 1979, of the Council Directive 79/409/EEC on the conservation of wild birds, also known as the "Birds Directive".Examples of declining farmland bird species include the corncrake (Crex crex) in France [2] and the grey partridge (Perdix perdix) in Italy [3].Some species have become extinct as regular breeders in certain countries, such as the red-backed shrike (Lanius collurio) in the United Kingdom and the roller (Coracias garrulus) in the Czech Republic [4].
Four farmland-specialized bunting species in Western Europe have also suffered particularly severe declines.For example, the Ortolan bunting (Emberiza hortulana) population in Finland declined 72% between 1984 and 2002 [5], and in Poland, its population declined 20% between 2000 and 2010 [6].The population collapse of the cirl bunting (Emberiza cirlus) in Britain between 1970 and 1990 coincided with unprecedented large-scale changes in agricultural practice, leading to losses in the species' breeding habitat [7,8].The yellowhammer (Emberiza citrinella) suffered significant declines in the United Kingdom [9], Poland [10] and Sweden [11] since the mid-1960s, due to losses of both suitable breeding habitat and nearby wintering sites [12].Similarly, northern and central European populations of the corn bunting (Miliaria calandra) have declined sharply since the mid-1970s [4], notably in Britain [13], Poland [14], Germany [15] and Ireland [16].The declines of these species have been credited to detrimental land use policies, such as the Common Agricultural Policy (CAP) [1], that have been in effect in the European Union since the early 1960s.CAP promotes the maximization of agricultural productivity through the intensification of farmland; a process that results in monocultures, pesticide use and the eradication of uncultivated areas [17].
Geospatial technologies provide valuable tools to map bird distribution by linking independent observations with habitat and other environmental information extracted from satellite imagery or land cover data.To assess the relationship between farmland bird species and their habitats, Fuller et al. [18] applied a clustering method to link species' occurrence data from the atlas of breeding birds of Britain to a national land cover dataset and found that maps of species clusters exhibited patterns associated with habitat assemblages.In another study, Kuczynski et al. [19] combined the proportional cover of Corine land cover classes and digital elevation data to study the habitat use of the great grey shrike (Lanius excubitor).Recently, Kosicki and Chylarecki [6] combined data from Corine land cover, a digital elevation model, normalized difference vegetation index (NDVI) and climatic data from Worldclim to predict the habitat of the E. hortulana in Poland.
This study aims to better understand the habitat requirement of M. calandra in north-eastern Spain through the fusion of open access data derived from Corine land cover, Landsat and the Shuttle Radar Topography Mission as surrogates for land use, habitat structure and topography, respectively.Declines in Northern European M. calandra populations have been attributed to farmland intensification [13,17], while Southern European breeding densities, particularly in Spain, Portugal and Turkey, are fairly stable [20,21], due to the prevalence of traditional low-intensity farming.Furthermore, most of the studies on the distribution ecology of M. calandra have been conducted in Northern and Central Europe [22].Hence, there is an opportunity for the study of the habitat requirement of this species in a part of the continent that is relatively unaffected by the declining trend.

Study Area
The study area is located in the province of Lerida in the western part of the autonomous community of Catalonia in Spain (Figure 1).The area consists of the comarques of Noguera, Segria, Urgell and Pla d'Urgell, covers approximately 3,920 square kilometers and is composed mainly of non-irrigated cropland and dry forests with extensive agriculture and dry pastures.Intensive agricultural practices are slowly being implemented in the region to replace the economically unviable low-intensity extensive farming methods that dominate the landscape of Lerida [23,24].

Open Access Geospatial Data
Since the 1970s, Landsat data has been widely used to map the habitat preference and distribution of various species (e.g., [25][26][27][28]).Imagery from the Enhanced Thematic Mapper Plus (ETM+) sensor onboard the Landsat 7 satellite were downloaded from the USGS Global Visualization Viewer, version 7.26, available from glovis.usgs.gov[29].Two scenes from path-198, row-31 for the month of June (1 June 2006 and 17 June) of the year 2001 were used for this study to temporally coincide with the survey period of the bird data.The dataset used was a standard level one terrain-corrected (L1T) product that underwent radiometric and geometric correction.Atmospheric correction was performed on the Landsat data using the "Landsat" package [30] in the software program R [31], then projected into the European Datum 1950/Universal Transverse Mercator zone 31 North (ED50/UTM 31N) projection.A brief description of the Landsat bands is given in Table 1.The Shuttle Radar Topography Mission (SRTM) [32] digital elevation model (DEM) was included as a variable because M. calandra abundance is linked directly to habitat diversity, which is, in turn, correlated with topographic variability [33].The 90 m DEM dataset was downloaded from the CGIAR-CSI database [34] and projected into the ED50/UTM 31N projection.The dataset was then resampled to 30 m to match the resolution of the ETM+ imagery using the nearest neighbor method.Elevation of the study area ranges from 1,603 m in the northern parts to 73 m around the river Segre.
Corine (Coordination of Information on the Environment) land cover 100 m data for the year 2000 (CLC2000) [35] was downloaded from the European Environmental Agency (EEA) website and projected into the ED50/UTM 31N projection.The dataset was rasterized at the original 100 m resolution, then resampled to 30 m using the nearest neighbor method to match the spatial resolution of the ETM+ imagery.Corine is a pan-European project that aims to produce a distinctive and comparable land cover dataset for Europe.The project has a total of 44 land cover classes, out of which, 26 occur in the study area (Table 2).The entire CLC2000 dataset is available from www.eea.europa.eu/data-and-maps[35].

Explanatory Variables
Explanatory variables in this study describe the biogeophysical aspects of the landscape and were used in statistical analysis to predict the occurrence of M. calandra at unsurveyed locations.A total of 27 explanatory variables (Table 3) were derived from open access geospatial datasets described in the preceding section.The following section details the different subgroups of explanatory variables.Saveraid et al. [36] proposed that the use of satellite data alone is not sufficient in modeling bird distribution and that habitat structure variables are also necessary.In order to extract habitat structure from the satellite data, first order texture variables were produced by running a 3 × 3 pixel (~8,100 m 2 ) local statistic across ETM+ imagery of the study area to calculate standard deviation (SD) and coefficient of variation (CV).This resulted in two texture measurements for each of the six ETM+ visible and infrared bands: BANDxSD and BANDxCV, where x is the Landsat band used (see Tables 1 and 2).Standard deviation assesses the variability of image texture, and the coefficient of variation (standard deviation of pixel values divided by their mean) gives a measure of the variability in image texture as a percent of the mean.Satellite image texture variables have been used to quantify landscape heterogeneity and to model the spatial patterns of birds [37] by providing data on the relationship between bird distribution and habitat structure.

Vegetation Index (ETM+)
Spectral vegetation indices have been widely used in habitat mapping [38], and they provide important information about the condition of the vegetation.The modified soil adjusted vegetation index (MSAVI) was chosen to minimize the soil background effect and enhance the dynamic range of the vegetation signal, producing greater vegetation sensitivity in areas that have significant portions of bare soil [39].MSAVI was calculated from the red (ETM+3) and near-infrared (ETM+4) bands using the methods described in Qi et al. [39].

Land Surface Temperature (ETM+)
Land surface temperature (LST) measures the energy efficiency of terrestrial ecosystems by quantifying radiated thermal energy [40].It is seldom included as a variable in ecological studies [41], but it can offer valuable information about anthropogenic or natural modifications to ecosystem energy budgets [42].LST was derived from the thermal infrared band (ETM+6) using methods described by Sobrino et al. [43] and NASA [44].

Landscape Metrics (CLC2000)
Landscape metrics describe the spatial pattern of habitats and have been used to classify the habitat suitability of different bird species [45,46].Metrics were calculated to quantify fractions of the eight dominant habitat types in the study area; dominance in this context was established if a CLC2000 class covered an area greater than 100 m 2 .For each metric, a 3 × 3 local statistic was used to calculate the proportion (between 0 and 1) of a given CLC2000 class within the window.Additionally, distance metrics were obtained by calculating the Euclidean distance of each pixel from superimposed layers consisting of regions occupied by humans and wet areas (rivers, lakes and marshes).See De Smith et al. [47] for details on the derivation of landscape metrics.

Topography (SRTM)
Topographic heterogeneity directly affects habitat selection by influencing temperature and humidity gradients and indirectly by modifying the vegetation composition [48].The inclination of the surface slope in degrees and aspect (downslope direction) of the study area were extracted from the DEM.

Open Access Bird Data
This study was commenced with the intention to only use open access data in mapping the habitat of M. calandra.Therefore, data on M. calandra presence was downloaded from two open access sources, the Global Biodiversity Information Facility (GBIF; www.gbif.org)[49] and the Avian Knowledge Network/eBird (AKN/eBird; www.avianknowledge.net)[50].However, it was found that the quality of these data was too poor to perform robust analysis on the habitat suitability of this particular species in Catalonia (Figure 2).Hence, these open access data were not used in this study.Data from the Catalan Breeding Bird Atlas Data was requested from the Catalan Ornithological Institute (ICO), which provided presence and absence records of M. calandra from the Catalan Breeding Bird Atlas (CBBA) [51].CBBA surveys were conducted during the breeding season (1 March to 30 July) in the years 1999-2002.Surveys were conducted by experienced professionals between sunrise and 11 am and between 6 pm and sunset.The survey plots were 1 km × 1 km UTM squares in which two 1-hour surveys were conducted and the presence or absence of the species was recorded [51].The total number of M. calandra locations in the study area was 339, out of which, 251 (74%) were presence points and 88 (26%) were absence points.Further details on the field sampling methodology can be found in Brotons et al. [51].

Free and Open Source Geospatial Software
Due to the limited availability of funds, only free and open source software were used in this study.Statistical modeling was conducted using the R Environment for Statistical Computing version 2.9.2 [31]; the latest version of R is available from www.r-project.org[31].R was also used to project the data to the ED50/UTM 31N projection using the "rgdal" package [52].Spatial analysis, including calculation of landscape metrics and visualization of the final maps, was conducted using Whitebox Geospatial Analysis Tools (WGAT) version 0.12, available at www.uoguelph.ca/~hydrogeo/Whitebox[53].Topographic analysis was done in the System for Automated Geoscientific Analyses (SAGA) version 2.03 [54].SAGA is available from www.saga-gis.org[54].All work was done in Ubuntu version 9.10.Ubuntu is a Linux-based open source operating system that can be downloaded from www.ubuntu.com[55].

Modeling
There are three times (251) as many presence points as there are absence points (88) in the bird data, and this disparity will introduce bias in the modeling process.In order to minimize this, the dataset was partitioned into an equal number of presence and absence points.Five groups were created (Figure 3), each composed of 88 randomly selected presence points to match the number of absences in the dataset.All the explanatory variables were extracted at each one of these points using the "overlay" function in R. The modeling framework is presented in Figure 4.
Each of the five groups comprises three subgroups: (1) a Satellite group that comprises the explanatory variables derived from the ETM+ imagery and the SRTM data, (2) a CLC2000 group that comprises the landscape metrics derived from the Corine land cover data and (3) a Combined group that consists of both the Satellite and CLC2000 variables.
A logistic regression [56] model was built for each of the five groups, and a stepwise elimination process based on Akaike Information Criterion (AIC) [57] optimization was applied to remove insignificant variables.Then, a stepwise variance inflation factor (VIF) [58] elimination was applied using the "vif" function in R. All variables with VIF values greater than 5 [59] were discarded to avoid multicollinearity (Table A1 in the Appendix).Models were evaluated using Cohen's kappa [60], area under the curve (AUC) of the receiver operating characteristic (ROC) [61] with a threshold of 0.50 to denote suitable habitat.Resultant probability maps were averaged to produce mean CLC2000, Satellite and Combined maps (Figure 5).

Results
The CLC2000, Satellite and Combined model outputs for all five groups were averaged to produce three final maps (Figure 5).The mean AUC values of 0.69, 0.81 and 0.90 for the CLC2000, Satellite and the Combined Model, respectively, were above the random model threshold (0.50), indicating that the explanatory variables influence M. calandra habitat selection.
The mean logistic regression model for the CLC2000 group is summarized in Table 4. AUC value for this model was 0.69.The model assumes unfavorable habitat in the steeps slopes of higher altitudes, and there was a distinctive preference for the non-irrigated arable land (p < 0.001) landscape metric.The exclusion of non-irrigated arable land had the greatest effect on the model by increasing the AIC by an average of 43.56.In contrast, M. calandra exhibited reduced preference for areas that were predominantly composed of permanently irrigated land (p = 0.002), but the map displays favorability for grassy fringes, where permanently irrigated land meets habitats, such as non-irrigated arable land.
The mean logistic regression model for the Satellite group is summarized in Table 5. AUC value for this model was 0.81.Land surface temperature (p < 0.001), surface slope (p < 0.001) and modified soil adjusted vegetation index (p < 0.001) had the strongest influence on the independent variable.M. calandra is a ground nesting species and prefers uncultivated land; therefore, it is not surprising that land surface temperature would have a strong influence on habitat preference during the breeding season, as it is able to discriminate the thermal signature of dry, non-irrigated land.The correction factor in the modified soil adjusted vegetation index algorithm enhances the vegetation signal in areas with low vegetation density.The coefficient of variation of the ETM+ band 5 (p = 0.002) also exhibited a strong positive influence on M. calandra occurrence, because vegetation moisture content is discernible in the shortwave infrared region between 1.55 and 1.75 microns, suggesting that texture features in the infrared region are likely to detect variation in vegetation structure.An interesting result was the negative relationship of M. calandra with the standard deviation of the ETM+ band 1 (p = 0.097); removal of this variable significantly increased the model AIC score.The negative relationship could be attributed to the fact that the spectral range of ETM+ band 1 captures the reflection of urban landscapes.The mean logistic regression model for the Combined group consisted of nine explanatory variables (Table 6, Figure 6) and had the highest AUC of 0.90.The comparative ROC plot in Figure 7 displays the effect of multiple factors not limited by data source on M. calandra habitat selection behavior.For example, the selection of the modified soil adjusted vegetation index indicates the effect of soil background as an important factor in habitat selection.Increased reflectance from the underlying soil could be caused by the operation of heavy machinery or other anthropogenic disturbances.The Combined model indicates that M. calandra avoids areas with steep slopes, areas near human activity and urban infrastructure and areas entirely composed of intensely irrigated land.The model also shows that, during the breeding season, M. calandra habitat suitability is positively influenced by non-irrigated arable land and dry, open areas far from forest cover.

Discussion
Over the last forty years, changes in the European agricultural landscape have resulted in a continuously decreasing trend in the breeding numbers of farmland bird species.Predictive distribution modeling of species of concern is important in order to assess the significance of habitats from a conservation perspective, and there is a need to monitor this decline using tools that are easily available, affordable and practical.The fusion of open access data, both biological and geophysical, from different sources is a viable method in producing models that reflect species' habitat preference.
Explanatory variables selected for this study were derived from open access geospatial data sources to assess M. calandra habitat suitability during the breeding season.Image texture represents the visual effect produced by the spatial distribution of tonal variability (pixel values) in a given area [62] and can serve as a substitute for habitat structure, because variability in the reflectance among adjacent pixels can be caused by horizontal variability in plant growth [37].MSAVI possesses a correction factor that adjusts according to vegetation density, which has been shown to enhance the dynamic range of the vegetation signal and produce greater vegetation sensitivity [39].Topography indirectly affects the distribution of species by modifying the relationships of birds with vegetation or by modifying the vegetation types [63].Landscape metrics quantify specific spatial characteristics of patches, classes of patches or entire landscape mosaics from categorical map patterns [47] and help explain how spatial patterns of landscapes influence ecological processes [64].Anthropogenic distance metrics are important measures for predicting bird assemblages in agricultural eco-regions [65], since some species prefer to breed in areas far from intensive human disturbance.Non-irrigated arable land is an extensive land cover class [66] that includes infrequently irrigated fallow land or areas under crop rotation that are cultivated with cereals, legumes, fodder crops and root crops.On the other hand, permanently irrigated land is an intensive land cover class [66] that includes areas under constant irrigation using permanent infrastructure, such as irrigation channels and drainage networks.The varieties of crops grown on permanently irrigated land cannot be cultivated without an artificial water supply.
The mean Combined logistic regression model had an accuracy of 0.90 based on AUC.M. calandra had a strong positive correlation with LST, MSAVI, BAND5CV and non-irrigated arable land (NIAL).These parameters served as ecological surrogates in the modeling process.Land cover variables provided direct causal relationship with M. calandra distribution; for example, during the breeding season, the species prefers to nest in uncultivated land and avoids pastures, which explains the favorability of NIAL over permanently irrigated land (PIL).Pastures and intensified agricultural fields exhibit low temperature in summer breeding months, due to heavy irrigation.Here, LST proves useful in discriminating between intensely irrigated and non-irrigated land.However, urban environments also exhibit high LST, and the inclusion of BAND1SD, which is sensitive to the highly variable spectral signature of urban landscapes, provided additional information about habitat suitability.
Research on the declines of the four farmland bunting species in Europe generally involved a variety of statistical modeling of field-collected independent variables (e.g., the presence and absence of species [8] or the abundance and density [5]) and explanatory variables (e.g., crop type, land use [7], tree density and vegetation structure [21] or proportions of certain land use features [12]).The absence of geolocated components in these studies creates difficulty in understanding the spatial patterns of species distributions.In contrast, the logic behind the modeling approach utilized in this study is that the presence or absence of the species at a particular location is a function of the explanatory variables that represent the species' environment at that location.The use of variables derived from remote sensing and land cover data as proxies predicting avian habitat suitability is not a new field of research, but previous studies have not taken full advantage of the information contained within open access remote sensing data.For example, in the study by Kosicki and Chylarecki [6], NDVI was the only biophysical variable extracted from remote sensing data.Previous studies on M. calandra habitat association that did not use open access geospatial data have yielded results similar to the outcomes of this study; Brambilla et al. [22] found that the proportion of arable land positively influenced the habitat preference of breeding M. calandra, while Stoate et al. [21] found that 'treeless cereal cropland' supported the highest breeding densities of the species.
The data used in this study were provided by the ICO, and the procurement of biodiversity data from such sources often has to undergo a request-and-decision process that results in approval or rejection, depending on the quality of the submitted proposals, the affiliation and academic level of the requester, the cooperativeness of the data-holding institution and the sensitivity of the data (e.g., IUCN Red List species).Closed-access data, such as the CBBA, are often of a higher scientific quality and are more extensive than open access, readily available data, such as GBIF and AKN/eBird (Figure 2).During the process of developing a methodology for this project, and prior to gaining permission to use ICO data, the author had experienced either rejections or no responses to requests for data access from several European organizations.Hence, the scope of the project and the methodology had to be constrained.It is thus imperative for conservation organizations with large high quality scientific data to invest in web-based infrastructure that facilitates access to biodiversity data, so that the research community, particularly those with limited finances, can use them.

Conclusion
The development of distribution maps that consist of information from both Earth observation and land cover datasets are of importance for species that have indeterminate ranges [67] and for monitoring the spatial dynamics of threatened species.This data-fusion approach helps in the identification and maintenance of important habitats as the Bird Directive stipulates and, in combination with readily accessible open access biodiversity data, also facilitates the rapid delivery of environmental information to decision makers.

Figure 1 .
Figure 1.Location of Lerida (red outline) in relation to Europe and Catalonia.

Figure 2 .
Figure 2. Comparison of corn bunting observations in Catalonia from one closed access (Catalan Breeding Bird Atlas (CBBA)) and two open access sources (Global Biodiversity Information Facility (GBIF) and Avian Knowledge Network/eBird (AKN/eBird)).

Figure 3 .
Figure 3.The 251 presence points were divided into five groups, each composed of 88 randomly selected presence points (in black) to match the 88 absence points (in red).

Figure 4 .
Figure 4. Framework of this study.

Figure 5 .
Figure 5. Mean probability maps according to the (A) CLC2000 (B) Satellite and (C) Combined models.Note the difference in texture between A and B owing to the coarser resolution of the CLC2000 data and the finer resolution of the combined Landsat and SRTM data.C produces a harmonized output that captures key variations of the previous two without compromising resolution.

Figure 6 .
Figure 6.The nine explanatory variables common to the Combined model in all five groups: (A) Modified Soil Adjusted Vegetation Index; (B) Land Surface Temperature in °C; (C) Coefficient of Variation of ETM+ Band 5; (D) Standard Deviation of ETM+ Band 1; (E) SRTM Slope in degrees; (F) Fraction of Broad-leaved Forest; (G) Fraction of Non-irrigated Arable Land; (H) Fraction of Permanently Irrigated Land; (I) Distance to Human Activity in meters.

Figure 7 .
Figure 7. Comparative receiver operating characteristic (ROC) plot of the mean area under the curve (AUC) values for the five groups.

Table 1 .
Description of the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) bands used in this study.Optical data ranges from the beginning of the visible (400 nm) until the end of the thermal infrared (1 mm) portion of the electromagnetic spectrum.

Table 2 .
Coordination of Information on the Environment (Corine) land cover 2000 (CLC2000) classes that occur in the study area.Classes included in the analysis represent over 95% of the total surface area.

Table 3 .
Univariate descriptive statistics of all extracted explanatory variables grouped by data source and their associated acronyms.SD, standard deviation; CV, coefficient of variation; LST, land surface temperature; MSAVI, modified soil adjusted vegetation index; SRTM, Shuttle Radar Topography Mission.

Table 4 .
Output of the mean CLC2000 model.

Table 5 .
Output of the mean Satellite model.

Table 6 .
Output of the mean Combined model.Bold letters in parentheses link each variable with the corresponding plot in Figure6.