Between December 2013 and December 2016, there were 19,711,242 geo-tagged Flickr images within the conterminous US.
Table 1 lists the top 48 objects detected by YOLO and the number of images that contain at least one instance of each object. These objects were used to infer human activities as well as environmental characteristics of the locations where the photographs were taken. For example, the presence of bicycles may be useful to quantify biking behavior, a sports ball may indicate sports activities, and objects such as a sofa, bed, vase, or chair may indicate indoor activities. In this article, we focused only on bird images, and used birdwatching activity as a case study to demonstrate the utility of our analytical framework. The object “bird” was the 5th most frequent object, detected in 747 thousand images.
We organized the results and evaluation under two subsections: verification and validation. We first present our comparative evaluation of metadata search and YOLO to verify the accuracy of both approaches. Second, we compare YOLO-detected birding activity to eBird observations to evaluate the validity and biases of Flickr and eBird data in inferring birdwatching activity.
3.1. Verification
Our objective for verification was to answer the following questions:
Using YOLO, we detected birds in 747,015 (3.8%) images, whereas metadata-based search found bird keywords in 534,121 (2.7%) images.
Table 2 presents the temporal variability in the detection of birds by metadata search and YOLO. Overall, YOLO increased the detection of birds by over 50% relative to what metadata search could detect, and this increase was consistent across seasons. The share of images detected as bird photographs was more than one percentage point higher with YOLO than with the metadata search. Among the 19.7 million images, YOLO and metadata search both detected birds in 409,779 (2%) images. Since both methods detected birds in these images, we considered the classification accurate. To identify the mismatch between the two methods, we further compared images detected only by YOLO with those detected only by metadata search. YOLO detected an additional 1.8% of images as bird images that were not detected by metadata search. On the other hand, metadata search detected only an additional 0.7% of images with bird keywords that were not detected by YOLO.
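The comparison above reduces to set arithmetic on the reported counts. The following minimal sketch recomputes the agreement and method-specific shares from the counts given in the text. The function and key names are illustrative assumptions, and shares computed from the raw counts may differ slightly from the rounded percentages quoted above.

```python
# Counts reported in the text.
TOTAL_IMAGES = 19_711_242   # geo-tagged Flickr images
YOLO_BIRDS = 747_015        # images with a YOLO "bird" detection
METADATA_BIRDS = 534_121    # images matching bird keywords
BOTH = 409_779              # images detected by both methods


def overlap_summary(total, yolo, metadata, both):
    """Split detections into shared and method-specific parts (hypothetical helper)."""
    yolo_only = yolo - both
    metadata_only = metadata - both
    return {
        "yolo_only": yolo_only,
        "metadata_only": metadata_only,
        "either": yolo_only + metadata_only + both,
        # Shares of all images, in percent.
        "pct_both": 100 * both / total,
        "pct_yolo_only": 100 * yolo_only / total,
        "pct_metadata_only": 100 * metadata_only / total,
        # Images added by YOLO, relative to what metadata search found.
        "pct_added_by_yolo": 100 * yolo_only / metadata,
    }


summary = overlap_summary(TOTAL_IMAGES, YOLO_BIRDS, METADATA_BIRDS, BOTH)
for key, value in summary.items():
    print(f"{key}: {value:,.2f}")
```

Reading "over 50%" as the YOLO-only images relative to the metadata detections is one possible interpretation; the text does not state the exact formula.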
We assessed the accuracy of the bird images detected only by YOLO and only by metadata search, using human classification by the first author. We defined the human classification task with the question: “Is there a real bird in this photograph?” We used a random sample of 1000 bird photographs detected by only YOLO, or by only metadata search. According to the accuracy testing, bird images identified only by the metadata search and not by YOLO had a substantially lower accuracy of 26%, while bird images detected only by YOLO had an accuracy of 89%. Although our sample size for human classification was small at this point, this finding supported the higher accuracy of YOLO detection.
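Because the accuracy figures come from a sample, it can help to attach an uncertainty range to them. The sketch below computes a Wilson score interval for a sample proportion. It is not part of the original analysis: the counts used (890 of 1000) are a hypothetical illustration of an 89% rate, since the text does not report the per-group sample sizes, and the function name is an assumption.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 890 of 1000 sampled images judged to contain a real bird.
low, high = wilson_interval(890, 1000)
print(f"estimated accuracy 89%, interval {low:.3f} to {high:.3f}")
```

A smaller sample would widen the interval, which is one way to make the caveat about sample size concrete.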
Although YOLO detection had an accuracy of 89% in classifying birds, it also produced misclassifications. Figure 4 presents sample cases of accurate and inaccurate YOLO classifications. Figure 4a,b,d represent accurate classifications of birds. The algorithm detected two birds in Figure 4a with estimated confidence values of 60% and 59%, although there were clearly more birds (five) in this image. Since the image was not tagged with bird keywords, it was not captured by the metadata search. The algorithm also detected a bench with 60% confidence, although there were multiple benches in this photograph. Both birds in Figure 4b were accurately detected, with estimated confidence values of 85% and 80%, and the bird in Figure 4d was accurately detected with a confidence of 98%. The remaining images, Figure 4c,e,f, do not contain birds but were inaccurately classified by YOLO as containing birds. The shapes of the butterfly in Figure 4c and of the flowers in Figure 4e resemble features of a bird such as the wings, neck, and beak, which possibly led to misclassification. However, the classification confidence for these two images was low, 54% and 51%, respectively. As we did not apply a threshold, we included any classification regardless of the probability value provided by YOLO. Finally, Figure 4f contains a realistic drawing of a hummingbird, which was classified as a bird by YOLO. This classification illustrates the case where classification is algorithmically accurate but semantically inaccurate, as the purpose is to identify real birds.
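To illustrate what applying a threshold would change, the following sketch filters detections by their reported probability value. It uses the values quoted for the Figure 4 examples; the record layout, the identifiers, and the 0.55 cut-off are hypothetical choices for illustration, not settings used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    image_id: str      # hypothetical identifier for the example image
    label: str         # object class reported by the detector
    confidence: float  # probability value reported by the detector


def images_with_label(detections, label, min_confidence=0.0):
    """Image ids with at least one `label` detection at or above the threshold."""
    return {
        d.image_id
        for d in detections
        if d.label == label and d.confidence >= min_confidence
    }


# Values quoted in the text for the Figure 4 examples.
detections = [
    Detection("4a", "bird", 0.60),
    Detection("4a", "bird", 0.59),
    Detection("4a", "bench", 0.60),
    Detection("4b", "bird", 0.85),
    Detection("4b", "bird", 0.80),
    Detection("4d", "bird", 0.98),
    Detection("butterfly", "bird", 0.54),
    Detection("flowers", "bird", 0.51),
]

no_threshold = images_with_label(detections, "bird")
with_threshold = images_with_label(detections, "bird", min_confidence=0.55)
print(sorted(no_threshold))
print(sorted(with_threshold))
```

In this small example a cut-off of 0.55 would drop the butterfly and flower images while keeping the three images that contain real birds, but a higher cut-off would also start discarding correct low-confidence detections such as those in Figure 4a.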
Figure 5 presents bird images detected only by the metadata search and not by YOLO. Figure 5a is an accurate classification of a woodpecker, a common bird species, which the metadata search captured thanks to the title of the image, “Acorn woodpecker”. The YOLO algorithm was not able to detect the bird in this photograph because the bird blended in with the tree branch, which concealed the major features of the bird from object detection. On the other hand, Figure 5b–d do not contain real birds, but they all contain bird keywords.
A comparison of the density of bird images obtained from metadata search and YOLO is shown in Figure 6. We combined the counts of observations from the YOLO and keyword search data, and employed natural breaks classification to determine the class breaks in Figure 6. While the areas of bird images detected by YOLO and metadata search substantially overlapped, YOLO identified more bird images than metadata search for most of the study area. YOLO identified a much larger quantity of bird pictures in urban areas such as New Orleans, San Francisco, New York, Washington D.C., and Seattle. Moreover, YOLO results represent the continuity of bird habitat regions in coastal areas of Florida, the North East, Lake Michigan, and California. On the other hand, the density of bird images detected by metadata search produced more fragmented spatial patterns across the nation.
3.2. Validation
Our objective for validation was to answer the following questions:
To answer these questions, we compared YOLO-detected Flickr bird image statistics with eBird observations. We first calculated the Spearman rank correlation based on the fixed-distance counts of observations and distinct users. We found a strong correlation between the count of eBird observations and YOLO-detected Flickr bird images, with a correlation coefficient of 0.79. Moreover, the counts of distinct eBird users and Flickr users produced an even larger coefficient of 0.85. These values indicated the strong overlap between eBird observations and Flickr bird images. We then compared the temporal patterns of Flickr image, user, and image-to-user ratios with eBird observation, user, and observation-to-user ratios (Figure 7). Overall, Flickr had a declining trend from 2013 to 2016 both in terms of the number of bird photographs and users. This decline was also consistent with the overall decline in Flickr usage. In contrast, eBird observations and users exhibited an increasing trend over the three-year period. For both Flickr and eBird, the numbers of photographs or observations and of users peaked in the spring months. The photograph-to-user ratio had an increasing trend for Flickr. On the other hand, the eBird observation-to-user ratio was very consistent across the three-year period and peaked around the spring and summer months.
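The rank correlation used above can be sketched in a few lines: rank each series (averaging ranks for ties) and compute the Pearson correlation of the ranks. The per-location counts below are made-up toy values, not data from the study, and the function names are illustrative; in practice a library routine such as `scipy.stats.spearmanr` would normally be used instead.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy counts per location (hypothetical values for illustration only).
flickr_counts = [3, 0, 12, 7, 1]
ebird_counts = [400, 20, 5000, 900, 950]
rho = spearman(flickr_counts, ebird_counts)
print(f"Spearman rho = {rho:.2f}")
```

Because only ranks enter the calculation, the large difference in magnitude between eBird and Flickr counts does not affect the coefficient.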
Between December 2013 and December 2016, there were 125,179,161 eBird observations within the bounding box of the conterminous US. Of these, 115,682,223 observations fell within the conterminous US itself. There were only 1,422,554 distinct coordinates, which corresponds to 1% of eBird observations in the conterminous US. This was mostly due to multiple observations being made from the same site throughout the day. Among 746,998 Flickr bird images, 346,549 images had distinct coordinates (46%), while the remaining 54% of the images had coordinates that repeated more than once. This repetition resulted from the same user, or in rare cases multiple users, sharing multiple images from the same coordinates (e.g., habitat observation towers). We attributed this pattern to Flickr users’ casual birdwatching behavior as compared to eBird users’ serious birdwatching activity.
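A minimal sketch of how such coordinate statistics might be derived is given below. The text does not specify whether "distinct" refers to the number of distinct coordinate values or to records whose coordinates occur only once, so the hypothetical helper reports both; the coordinates are toy values.

```python
from collections import Counter


def coordinate_summary(coords):
    """Summarize repetition among (lat, lon) pairs (hypothetical helper)."""
    counts = Counter(coords)
    total = len(coords)
    distinct_values = len(counts)
    records_at_unique_sites = sum(1 for c in counts.values() if c == 1)
    return {
        "total": total,
        "distinct_values": distinct_values,
        "pct_distinct_values": 100 * distinct_values / total,
        "records_at_unique_sites": records_at_unique_sites,
        "pct_records_at_unique_sites": 100 * records_at_unique_sites / total,
    }


# Toy coordinates: one site visited twice, one three times, one once.
toy_coords = [(1.0, 1.0), (1.0, 1.0), (2.0, 2.0),
              (3.0, 3.0), (3.0, 3.0), (3.0, 3.0)]
coord_stats = coordinate_summary(toy_coords)
print(coord_stats)
```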
To identify the spatial variation among Flickr bird images and eBird observations, we compared kernel density estimates of YOLO and eBird observations with a fixed-distance threshold of 20 miles (Figure 8). We observed an increased dispersion of the spatial distribution of eBird observations, which can be attributed to the fact that eBird had approximately 167 times more observations than Flickr bird photographs, and approximately 3.7 times more users than Flickr users who took bird photographs. We computed z-scores for both YOLO and eBird observations in order to compare the two distributions, in which eBird observations had a much higher density than YOLO-detected Flickr photographs. We combined the z-scores of the two datasets, and employed natural breaks classification to determine the class breaks for the eBird and YOLO-detected Flickr maps in Figure 8. Figure 8 confirmed that the spatial distributions of eBird observations and Flickr photographs were similar to each other, except for a few areas in which the magnitude and spatial extent of eBird and Flickr observations showed substantial differences. Both datasets indicated that high birdwatching activity takes place around coastal areas and populous regions adjacent to metropolitan areas. While spatial patterns of birdwatching were similar between the two datasets, eBird was relatively more prominent in coastal areas of the North East, South East, West, Gulf Coast, and Great Lakes; in national forests, prairie grasslands, and wetlands; and in areas where there was infrastructure for human access and birdwatching. While the magnitude of eBird density was much higher than Flickr across the nation, Flickr was relatively more prominent around urban areas such as New Orleans, Miami, and Detroit.
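The role of the z-score step can be illustrated with a short sketch: standardizing each density surface by its own mean and standard deviation puts two layers of very different magnitude on a common scale. The density values are toy numbers, and the use of the population standard deviation is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


# Toy density values per grid cell (hypothetical). The second layer has the
# same spatial pattern as the first at a much larger magnitude.
flickr_density = [1, 2, 3, 4, 10]
ebird_density = [150 * v for v in flickr_density]

flickr_z = z_scores(flickr_density)
ebird_z = z_scores(ebird_density)
for f, e in zip(flickr_z, ebird_z):
    print(f"{f:+.3f} {e:+.3f}")
```

In this toy case the two standardized layers coincide, so a single set of class breaks can be applied to both maps; differences that remain after standardization reflect differences in spatial pattern rather than in overall volume.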
Figure 9 illustrates the percentage of YOLO-detected Flickr bird images among both Flickr and eBird observations. This figure represents the bi-polar ratio of Flickr to eBird, and highlights areas where YOLO-detected Flickr photographs are above 1%, using adaptive kernel smoothing that employs the 100 nearest users (both Flickr and eBird) to define the neighborhood for the smoothing parameter. Figure 9 highlights prominent areas of Flickr bird photographs in natural lands that provide nesting, stopover, and overwintering habitat for birds. Interestingly, the spatial patterns were very distinct from the fixed-distance density distribution, and provided valuable insight into where Flickr usage was relatively higher in comparison to eBird. Regardless of the difference in the number of observations between Flickr and eBird, Flickr bird photographs were prominent (over 10%) in areas where there was access and infrastructure for birdwatching across the nation. Example areas where Flickr bird photographs were relatively higher are Grand Canyon River and the Colorado Plateau, Yellowstone National Park, Southern Colorado, national preserves and wildlife areas in Southern Florida, and the wetlands and prairie lands in the Midwest. These Flickr users likely represent tourists who are not serious birdwatchers.
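The idea of an adaptive neighborhood defined by the nearest users can be sketched as follows. This is a simplified, unweighted illustration rather than the kernel smoothing used for Figure 9: it takes the k users nearest to a location, uses the distance to the k-th user as the bandwidth, and reports the Flickr share within that neighborhood. Planar coordinates, the tuple layout, and the toy points are all assumptions.

```python
from math import hypot


def flickr_share(location, users, k):
    """Percent of Flickr users among the k nearest users, plus the bandwidth.

    `users` holds (x, y, source) tuples with source "flickr" or "ebird".
    The bandwidth is the distance from `location` to the k-th nearest user.
    """
    if not 0 < k <= len(users):
        raise ValueError("k must be between 1 and the number of users")

    def distance(user):
        return hypot(user[0] - location[0], user[1] - location[1])

    neighbors = sorted(users, key=distance)[:k]
    bandwidth = distance(neighbors[-1])
    n_flickr = sum(1 for user in neighbors if user[2] == "flickr")
    return 100 * n_flickr / k, bandwidth


# Toy user locations (hypothetical values for illustration only).
toy_users = [
    (0.0, 0.0, "flickr"),
    (1.0, 0.0, "ebird"),
    (0.0, 2.0, "ebird"),
    (3.0, 0.0, "flickr"),
    (10.0, 10.0, "ebird"),
]
share, bandwidth = flickr_share((0.0, 0.0), toy_users, k=4)
print(f"Flickr share: {share:.1f}%, bandwidth: {bandwidth:.1f}")
```

Because the neighborhood grows until it contains k users, the bandwidth is small where users are dense and large where they are sparse, which is why such a surface can look different from a fixed-distance density map.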