The pruned Delaunay triangulation of VPO locations results in annual graphs of VPO locations. For lesser celandine (Figure 2), wood anemone (Figure A1) and cow parsley (Figure A2), these graphs cover the whole of the Netherlands. However, the VPOs of pedunculate oak (Figure A3) are mostly located in the centre of the country. For all species, a large dispersion of VPO locations from year to year was observed, which is intrinsic to volunteer monitoring network data. For example, the spatial variability found for lesser celandine shows that volunteers observed this species almost everywhere in the Netherlands; however, in some years (e.g., 2007 and 2008) observations are more clustered in the western part of the country (Figure 3 and Figure A4, Figure A5 and Figure A6). The four largest cities of the country (Amsterdam, Rotterdam, The Hague and Utrecht) are located in the western part. The level of spatial connectivity of VPOs is higher around these cities. In these areas, the results of the consistency check are more reliable because a larger number of VPOs are available to confirm or refute the consistency of the connected observations.
The correlation between the difference in DOYs and the difference in GDDs was significant for all species. The average correlation coefficient was 0.91 for lesser celandine (Figure 3), 0.90 for wood anemone (Figure A4), 0.93 for cow parsley (Figure A5) and 0.90 for pedunculate oak (Figure A6). This indicates the considerable influence of the accumulated daily temperature on the timing of flowering. The slopes of the fitted regression lines show the rate of change of the difference in reported DOYs per unit of difference in GDD (i.e., the steeper the slope, the larger the difference in the timing of the phenophase). The comparison of the annually modelled and reported difference in DOYs is used to identify 24 sets of inconsistent observations.
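The comparison between modelled and reported differences can be illustrated with a minimal sketch. It fits a straight line to the DOY differences as a function of the GDD differences for connected pairs of observations and then flags a pair whose reported difference lies far from the fitted line. All numbers are made up for illustration, the function names are hypothetical, and the tolerance of 13 days is only an assumed stand-in for a ΔMax value; the exact flagging rule used in the study is not spelled out in this section.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y as a function of x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_pairs(d_gdd, d_doy, slope, intercept, tolerance_days):
    """Indices of pairs whose reported DOY difference deviates from the
    modelled one by more than tolerance_days (an assumed rule)."""
    return [
        i
        for i, (g, d) in enumerate(zip(d_gdd, d_doy))
        if abs(d - (intercept + slope * g)) > tolerance_days
    ]


# Made-up differences for connected pairs of observations in one year.
delta_gdd = [-40.0, -20.0, 0.0, 20.0, 40.0]   # difference in GDD
delta_doy = [-8.0, -5.0, 1.0, 4.0, 9.0]       # difference in reported DOY

r = pearson_r(delta_gdd, delta_doy)
slope, intercept = fit_line(delta_gdd, delta_doy)

# A new pair with a small GDD difference but a 30-day DOY difference.
flagged = flag_pairs([10.0], [30.0], slope, intercept, tolerance_days=13.0)
```

In this sketch the slope plays the role described above: it is the change in the DOY difference per unit of GDD difference, and a pair is flagged only when its reported difference cannot be explained by its GDD difference.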
The correlation coefficients between the annual standard deviation of consistent DOYs corresponding to each ΔMax value and the annual average GDD were calculated for lesser celandine (Figure 4), wood anemone, cow parsley and pedunculate oak (Figure A7). The ΔMax that led to the phenological synchrony model with the largest coefficient of determination was 13 days for lesser celandine, 20 days for wood anemone, seven days for cow parsley and 15 days for pedunculate oak. For example, the optimal ΔMax value indicated that the maximum difference in the timing of flowering of two lesser celandine plants (located less than 100 km apart and growing under similar temperature regimes) was 13 days. The small value of ΔMax for cow parsley flowering, a plant with a shallow root system, might be caused by the impact of other environmental parameters such as soil moisture and light intensity [55
For all species, the inconsistent VPOs identified with the optimal ΔMax show a large difference in the reported DOY compared to their surrounding VPOs (Figure 5). For all species except cow parsley, the annual percentage of VPOs highlighted as possibly inconsistent observations is smaller than the annual percentage of boxplot outliers (Figure 6 and Figure A8). Inconsistent VPOs refer to unusually early or late DOYs with respect to the regional temperature regime of the observation site, whereas boxplot outliers only identify very early or late observations for a given set of annual observations and, in consequence, do not consider the effect of regional contextual information. The annual boxplots for lesser celandine (Figure 7d), wood anemone, cow parsley and pedunculate oak (Figure A8) show that outliers are mostly located below the lower whisker of the boxplot. This indicates that the observed DOYs are not normally (Gaussian) distributed, as the boxplot assumes. As a result, the boxplot filters out several early VPOs that are consistent observations. Such consistent early VPOs can provide reliable information about advancement in the timing of phenophases.
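For comparison, the boxplot filter can be sketched in a few lines. The example below applies the usual Tukey rule (observations more than 1.5 times the interquartile range beyond the quartiles are outliers) to a made-up set of annual DOYs. The quartile definition (linear interpolation between order statistics) and the function names are assumptions of this sketch; other quartile conventions give slightly different fences.

```python
def quartiles(values):
    """First and third quartiles, using linear interpolation between
    order statistics (one of several common conventions)."""
    s = sorted(values)

    def q(p):
        pos = p * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    return q(0.25), q(0.75)


def tukey_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up DOYs reported for one species in one year.
doys = [35, 62, 64, 66, 67, 68, 70, 71, 73, 75]
outliers = tukey_outliers(doys)
```

The early observation (DOY 35) is removed purely because of its position in the annual distribution. The filter has no access to the location or the temperature regime of the site, so it cannot tell whether that early date is consistent with a locally warm late winter.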
The synchrony analysis resulted in models that predict the standard deviation of the DOY of the selected phenophases using the annual average GDD. For consistent VPOs, the correlation coefficient between the standard deviation of the DOY and the annual average GDD was 0.78 for lesser celandine, 0.63 for wood anemone, 0.61 for cow parsley and 0.60 for pedunculate oak. These results suggest that the timing of flowering and leafing onsets for the species under study is more synchronous in cold late winters and early springs than in warm ones. The comparison between the synchrony models, made from original, outlier-free and consistent VPOs, shows that using boxplots negatively impacts the quality of the model (Figure 7 and Figure A9, Figure A10 and Figure A11). For all species, the R-squared value of the models based on consistent VPOs is larger than that of the models based on outlier-free data (Table 2). Moreover, the models based on consistent VPOs are more in line with the models made using original VPOs. Removing a large number of outliers (Figure 6) by using the Tukey boxplot method leads to strongly distorted models at large geographical scales (cf. Figure 7b). The application of our workflow improves the quality of the model (notice the improved R-squared values in Figure 7
This study presents a workflow to check the consistency of VPOs. Unlike purely statistical methods, our workflow uses the geographic location and the corresponding accumulation of daily temperature as independent sources of information. The workflow defines and evaluates consistency constraints based on the correlation between VPO synchrony and the rate of change of temperature from winter to spring. We used the workflow to filter out phenological observations that do not provide regionally representative species-specific flowering and leafing DOYs and to label them as “inconsistent”. Unlike existing threshold-based outlier detection methods, our workflow is based on a geographic context approach to identify neighbours of VPOs. Moreover, the workflow not only considers the geographic context of VPOs, as used by Schlieder and Yanenko [11] and Bonter and Cooper [18], but also integrates this information with environmental contextual information to check VPO consistency. As a result, the workflow avoids filtering unusually early or late observations that are consistent with their environment.
Inconsistent VPOs can be caused by species and/or phenophase misidentification, or they can be true observations influenced by a microscale temperature regime that is hard to model using 1 by 1 km temperature data. In either case, inconsistent observations are not representative of the phenology of the species in the Netherlands. The high correlation found between the difference in DOYs and the difference in GDD of nearby observations of flowering and leafing onsets indicates that daily temperature is indeed relevant for the analysis of the selected species and phenophases. This is in line with the fact that daily temperature is a dominant factor for plant phenology in temperate and boreal regions [56]. This highlights the importance of storing metadata about volunteered observations to improve the temporal consistency in phenological databases.
Considering ΔMax as a proxy for the spatial variability of the timing of a phenophase under similar temperature conditions helps to constrain the temporal window in which the occurrence of a VPO is consistent. As ΔMax takes into account the geographic context, a quality control mechanism for VPOs based on this metric outperforms alternative methods based solely on data distributions. Given the increasing popularity of citizen science networks, we expect to get better estimates of ΔMax in the near future. The ΔMax metric can also help to understand the phenology of species. For instance, the relatively small value of ΔMax for cow parsley (seven days) indicates that the flowering onset of this species is controlled more strongly by temperature than that of the other selected species or phenophases. In consequence, this species shows the highest correlation between the difference in DOYs and the difference in GDD of VPOs. The appropriate initial ΔMax value (here, one week), which sets a maximum difference in DOY for observations performed under similar environmental conditions, is not known a priori. This value might be different in other study areas or for other plant species.
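The selection of the optimal ΔMax can be sketched as a small search over candidate values. For each candidate, the sketch assumes that the annual standard deviation of the DOYs of the VPOs kept as consistent has already been computed, regresses it on the annual average GDD, and keeps the candidate with the largest coefficient of determination. The numbers and the function names are made up for illustration and do not come from the study.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x
    (equal to the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def best_delta_max(avg_gdd, sd_by_delta_max):
    """Return the candidate with the largest R-squared, plus all scores."""
    scores = {d: r_squared(avg_gdd, sd) for d, sd in sd_by_delta_max.items()}
    return max(scores, key=scores.get), scores


# Made-up annual average GDD for five years.
annual_avg_gdd = [100.0, 150.0, 200.0, 250.0, 300.0]

# Made-up annual standard deviation of consistent DOYs (days), keyed by the
# candidate delta-max (days) used to select the consistent VPOs.
annual_sd = {
    7: [6.0, 9.0, 7.0, 12.0, 10.0],
    13: [5.0, 7.5, 9.0, 11.5, 13.0],
    20: [8.0, 8.0, 12.0, 9.0, 14.0],
}

best, scores = best_delta_max(annual_avg_gdd, annual_sd)
```

With these illustrative values the candidate of 13 days gives the tightest relation between synchrony and temperature, so it would be retained as the optimal ΔMax.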
In this study, we used a pruned Delaunay triangulation to connect VPOs to their closest neighbours [57]. This approach has advantages over a distance-based method, which connects all VPOs within a specific distance to the observation being checked. In the distance-based approach, a distance threshold has to be chosen. However, such a distance threshold is not always known, and the selection of an arbitrary distance threshold could have a negative impact on the analysis. Moreover, a distance-based approach would potentially lead to VPOs with no neighbours within their vicinity. In contrast, the applied Delaunay triangulation method is more flexible and data-driven. It uses closer neighbours when available (leading to smaller triangles) and more distant VPOs when needed (leading to larger triangles).
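The pruning step can be sketched as follows. The example assumes that the Delaunay edges have already been computed with a computational geometry library (for instance, `scipy.spatial.Delaunay` exposes the triangles as `simplices`), that coordinates are expressed in kilometres in a projected system, and that pruning means dropping edges longer than 100 km. The point coordinates and function names are hypothetical.

```python
from math import hypot


def prune_edges(points, edges, max_length_km=100.0):
    """Keep only the triangulation edges not longer than max_length_km."""
    kept = []
    for i, j in edges:
        (x1, y1), (x2, y2) = points[i], points[j]
        if hypot(x2 - x1, y2 - y1) <= max_length_km:
            kept.append((i, j))
    return kept


def neighbours(n_points, edges):
    """Map each point index to the set of indices it is connected to."""
    result = {i: set() for i in range(n_points)}
    for i, j in edges:
        result[i].add(j)
        result[j].add(i)
    return result


# Four made-up VPO locations (km) and the edges of their Delaunay
# triangulation (triangles 0-1-2 and 1-3-2), assumed computed elsewhere.
points = [(0.0, 0.0), (30.0, 0.0), (0.0, 40.0), (100.0, 20.0)]
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]

pruned = prune_edges(points, edges)
graph = neighbours(len(points), pruned)
```

Here the edge between points 2 and 3 (about 102 km) is removed, while point 3 stays connected to point 1 (about 73 km). A distance-based neighbourhood with a radius of, say, 60 km would have left point 3 without any neighbour, which illustrates why the triangulation is the more flexible choice.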
Here, we assume that GDD accumulations drive the synchrony of the selected phenophases within a 100 km distance. The annual number of observations does not substantially affect the synchrony model of species. For instance, the annual number of observations of cow parsley is higher than the number of observations of wood anemone and pedunculate oak. Yet, the R-squared of the synchrony model found for this species is smaller than that of wood anemone and pedunculate oak. In the Netherlands, the level of spatial connectivity of VPOs using this distance threshold is high; however, this might not be the case when analysing larger areas. In such areas, the analysis might be hampered by annual graphs with a low level of spatial connectivity. Thus, advanced spatial point pattern tests could help to evaluate the spatial distribution of VPOs as well as to identify the similarity of VPO distributions over the study period. For example, Andresen and Malleson provide various tests to measure the degree of density and of similarity of spatial point patterns [60
Temperature-driven metrics other than GDD, such as the average temperature, could be tested with our workflow. For example, Calinger et al. [61] showed the suitability of average monthly temperatures during the month of the phenophase and some number of months prior to the event to model phenological responses to temperature across many species. Moreover, GDD accumulations may not be the only or main driver of flowering and leafing onset in other study areas. The length of the chilling period [62], photoperiod [24], and precipitation and elevation [64] might also drive flowering onset in spring. For example, Studer et al. [65] used a multivariate regression to model the timing of wood anemone flowering as a function of temperature and precipitation. A similar model could be used during the consistency check phase of our workflow because it is generic enough to accommodate other phenological drivers.
Our workflow only works for events that are synchronized; however, this is the case for several ecological phenomena. In citizen science, several networks monitor environmental events that are weather-driven. Examples of such types of monitoring are the reporting of tick and mosquito bites and the observation of migrating birds. For these types of phenomena, different types of weather data can be used to find inconsistent observations. The developed workflow can therefore also be useful in these domains.
VGI has greatly contributed to phenological studies, leading to an improved understanding of plant and animal seasonality across the globe. In this respect, checking the consistency of volunteered phenological observations (VPOs) is a prerequisite to ensure the validity and representativeness of VPO-based results. In this paper, we present a workflow designed to use geographical and contextual information associated with phenological observations to check the consistency of observations while analysing their synchrony. This workflow was used to improve our knowledge of the local impact of inter-annual temperature variations on the consistency and synchrony of VPOs from various plant species in the Netherlands.
Our results reveal that the most common method to filter outliers in VPOs (the boxplot) substantially biases the synchrony analysis of the timing of spring flowering and leafing. Our results indicate that climate change and inter-annual weather variability determine changes in the synchrony of spring plant phenology. Given that several national and international initiatives facilitate and actively support the collection of VGI for ecological studies and that the open data movement is resulting in more contextual environmental information becoming available, the proposed workflow provides a unique opportunity to check the consistency of volunteered observations. Considering its general character, we think that this geocomputational workflow could be adapted to other kinds of VGI, hence contributing to the curation of this interesting source of geospatial data.