The Role of Citizen Science and Deep Learning in Camera Trapping

Abstract: Camera traps are increasingly one of the fundamental pillars of environmental monitoring and management. Even outside the scientific community, thousands of camera traps in the hands of citizens may offer valuable data on terrestrial vertebrate fauna, bycatch data in particular, when guided according to already employed standards. This provides a promising setting for Citizen Science initiatives. Here, we suggest a possible pathway for isolated observations to be aggregated into a single database that respects the existing standards (with a proposed extension). Our approach aims to show a new perspective and to review recent progress in engaging the enthusiasm of citizen scientists and in including machine learning processes in image classification in camera trap research. This approach (combining machine learning and the input from citizen scientists) may significantly assist in streamlining the processing of camera trap data while simultaneously raising public environmental awareness. We have thus developed a conceptual framework and analytical concept for a web-based camera trap database that incorporates the above-mentioned aspects: a combination of the roles of experts' and citizens' evaluations, a method of training a neural network, and a taxon complexity index. This initiative could well serve scientists and the general public, as well as assisting public authorities to efficiently set spatially and temporally well-targeted conservation policies.


Introduction
In recent decades, there have been advances in low-impact methods for wildlife monitoring, such as natural markings [1], identification by footprints [2], vocal individuality [3], DNA sampling [4], and UAV monitoring [5], among others. One of the most frequently applied tools is the digital camera trap (CT) equipped with passive infrared (PIR) sensors [6]. A camera trap provides researchers with outputs, shedding light on a wide range of wildlife-related parameters, from species distribution to tempo-spatial behaviour [7,8]. CT can also provide data on the effectiveness of conservation interventions and management efforts [9]. CT has made ecological monitoring more efficient in almost every kind of environmental condition at any time of the day or year [7].
Ecological monitoring as a fundamental component of wildlife conservation and management has become increasingly important, particularly in the context of the growing anthropogenic pressure on wildlife, their habitats, and ecosystems in general [10,11]. Knowledge of species richness and wildlife community structure within a focus area is essential for reducing the current and future conservation threats and improving the management actions [12,13]. In the effort to enhance the security and sustainable coexistence between wildlife and humans [14], wildlife monitoring can provide valuable information for adaptive management in mitigating property losses (e.g., predation of livestock or field and orchard yields damaged by game) [15] or even risks to public health (animal diseases transmissible to humans) [16]. Conventional wildlife monitoring (using techniques such as direct observation, looking for animal signs/tracks, and the capture/mark/release of animals) comes with constraints, particularly in studies of the population dynamics of cryptic taxa or taxa living in low densities and in poorly accessible habitats. Research into an elusive species is even more demanding [17,18]. To reach acceptable levels of efficiency, conventional environmental monitoring requires well-trained professionals to take part, which is time-consuming. However, in the case of Citizen Science, a number of motivated public volunteers may serve as the missing "tool" for reaching acceptable levels of effectiveness. Even though such people may lack extensive knowledge, they can still be engaged as sensors for collecting and managing data [19].
Although widely used and appreciated, CTs still entail nontrivial costs, not only at the outset but also during the study, due to the vulnerability of the materials to natural conditions, especially in long-term research [20], and the theft of cameras [21]. Other frequently discussed issues are batteries for CTs (which need to be long-lasting in remote areas, reliable, and of high quality), as well as data storage (particularly SD cards with different writing speeds, storage capacity, etc.) [22]. Not surprisingly, the reduction in CT prices and the increased interest of both researchers and the public have recently resulted in an increase in the number of CTs deployed and images collected. There is now an increasing demand for the effective processing and sharing of such multi-sourced and multi-format data [23]. Each camera trap study generates a vast amount of data. Without an effective data-processing tool [24] and sufficient time, funding, and human resources, it is common practice that researchers analyse "only" images of particular project target species and lack the motivation and time to classify, process, and analyse the bycatch data of nontarget taxa [25]. However, secondary analyses of such bycatch data collected from long-term, large-scale monitoring can often answer research questions on the community dynamics of species where quantitative direct observations are lacking [26]. These factors have led to an increasing number of software packages capable of streamlining data processing, varying in function and access (desktop software/web interface), according to the purpose of the particular study generating the data (e.g., Reference [27]; see the review in Reference [24]). CT practitioners must choose from the current selection of processing tools or develop their own solutions adapted to their specific research area or topic.
Although a universal standardised system may remain only an unworkable idea, any newly developed system needs to adopt standards that allow the sharing of data among multiple collaborators (e.g., Reference [28]). This is particularly important if CT projects are to result in benefits not only for researchers but also for policymakers, law enforcement agencies, and antipoaching teams [29].
In recent years, several camera trap projects, either on a national scale (e.g., Mammal Web and Wildlife Spotter); transnational scale (e.g., eMammal, Agouti, Trapper, and Wildlife Insight); or landscape scale (e.g., Snapshot Serengeti), appear to have covered the demand for the effective processing of an enormously increasing amount of CT data. Further, several stand-alone software packages have emerged [27]. However, no single solution out of these general software packages has become a favourite for data processing [24,25,27].
As part of the data processing with any CT software, the identification of animals is of paramount importance. In recent years, two parallel approaches within large-scale projects have appeared to be of value in the identification of animals: the use of computer-based deep learning, for example, with deep convolutional neural networks [30][31][32][33], and the involvement of volunteers [34][35][36]. Bringing artificial intelligence (AI) and citizen scientists together within one CT project/platform, however, has emerged only in very recent studies. The competencies given to algorithms and humans differed in these studies, which resulted in a lack of accuracy in the identification of some species and in a large amount of human effort expended in data processing [37][38][39]. Such approaches can benefit "simply" from photo submissions by crowds and image classification by AI (e.g., WildBook; see Reference [40]) or from combining both humans and AI in a consensus classification workflow [39]. Researchers see great potential in the latter. AI can be implemented in different ways. In a semi-autonomous regime, images classified by algorithms above the desired confidence threshold do not need to be processed by humans [39]. Another approach treats AI as one more vote added to the votes of volunteers. Alternatively, a consensus classification workflow can be achieved by a series of consensus classification stages, known as cascade filtering [37].
To achieve a more efficient and promising integration of Citizen Science and AI in CT, our study shows how the workflow can be updated by respecting a mix of the roles of experts' and citizens' evaluations, the method of training a neural network, and adding a taxon complexity index. Including the latter in the workflow can play a significant role in minimising the bias in citizens' consensus on taxa identification. Untrained (or not properly trained) volunteers may not be aware of the occurrence of rare or invasive species that may be confused with related domestic species. Moreover, such taxa can be minimally represented in an existing dataset; thus, expert evaluation has an irreplaceable role in the workflow.
The other, overall goal of this study has been to develop a conceptual framework and an analytical concept for a web-based camera trap database (CTD) following a standardised format that will meet the above-mentioned approaches. The collaborative coordination of particular CT projects and their data sharing on a national scale, or even on a continental scale, is still rare, since such a solution poses many constraints, varying from the technical (e.g., interoperability, sustainability, standardisation, and ease of access); the financial; and the legal (general data protection) to the motivational (data sharing and misuse). For this reason, here, we introduce an open framework that can be used as an analytical base for newly developing (or for updating existing) collaborative CT projects/platforms. By processing data efficiently in this way and performing a number of other practical functions, the database should inevitably motivate CT users to upload and share their data with other researchers and governmental agencies.

Materials and Methods
In order to enhance the solutions for an intuitive, widely used, web-based platform for uploading and presenting the data from camera traps, we developed a user motivational and conceptual framework, together with a logical concept (see the Supplemental File-Information System for Managing Camera Traps' Data).
We specified a set of rewards for camera trappers to provide their data based on interviews with professional users of CT (e.g., academics, nature protection managers, and NGO activists), some of whom also have experience with the involvement of the Citizen Science concept. We identified the following crucial CTD attributes for motivating users:
• automatic generation of statistical summaries with an overview,
• map projection,
• automatic classification of the uploaded images, which helps to engage citizens in data processing and minimises human effort at the same time,
• archiving and compressing of all records,
• export of collected data in various formats, and
• voluntary data embargo while research is active (e.g., to allow threatened species to be protected, their occurrence data should not be revealed to the public).
The database users who are permitted to explore and download data may browse through a very detailed data projection, both on a map and in a table (compatible with the existing comprehensive standards). They may also export filtered entries in several output formats. The data can be exported as tables (the standard format adopted is CSV [24,25]) or as map data (GPX or KML).
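The two export paths can be illustrated with a minimal sketch that uses only the Python standard library. The field names (`record_id`, `taxon`, `latitude`, `longitude`, `timestamp`) and the `creator` string are hypothetical placeholders chosen for this example; they are not the attributes of Table 1, and the sketch is not the implementation of the proposed system.

```python
import csv
import io
from xml.sax.saxutils import escape

# Hypothetical record fields; the real attribute list is given in Table 1.
FIELDS = ["record_id", "taxon", "latitude", "longitude", "timestamp"]


def export_csv(records):
    """Serialise a list of record dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow({name: rec[name] for name in FIELDS})
    return buffer.getvalue()


def export_gpx(records):
    """Serialise records to a minimal GPX 1.1 document, one waypoint each."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<gpx version="1.1" creator="ctd-sketch" '
        'xmlns="http://www.topografix.com/GPX/1/1">',
    ]
    for rec in records:
        lines.append('  <wpt lat="{}" lon="{}">'.format(rec["latitude"], rec["longitude"]))
        # GPX expects an ISO 8601 UTC timestamp; <time> precedes <name>.
        lines.append("    <time>{}</time>".format(escape(rec["timestamp"])))
        lines.append("    <name>{}</name>".format(escape(rec["taxon"])))
        lines.append("  </wpt>")
    lines.append("</gpx>")
    return "\n".join(lines)
```

A filtered selection of records would be passed to either function as a list of dictionaries; a KML export could follow the same pattern.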

Data (Workflow)
Automatic extraction of metadata, i.e., date, time, PIR sensitivity, CT manufacturer, CT model, etc., stored in EXIF (Exchangeable Image File Format) takes place after a successful uploading of the records. The metadata give the database user important information about the record conditions, which are key for the analyses of species' detectability and distribution.
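As a rough illustration of this step, the sketch below normalises a few already decoded EXIF tags into database-ready values. It assumes that an EXIF library has already read the tags into a plain dictionary; `DateTimeOriginal`, `Make`, and `Model` are standard EXIF tag names, whereas the output keys are hypothetical and do not reproduce Table 1. Camera-specific values such as PIR sensitivity are left out because their storage differs between manufacturers.

```python
from datetime import datetime


def normalise_exif(tags):
    """Map decoded EXIF tags (a plain dict) to database-ready metadata.

    Assumes the tags were already read from the image by an EXIF library;
    the output keys are hypothetical, not the Table 1 attribute names.
    """
    raw = tags.get("DateTimeOriginal")
    # EXIF stores timestamps as 'YYYY:MM:DD HH:MM:SS' without a time zone.
    taken = datetime.strptime(raw, "%Y:%m:%d %H:%M:%S") if raw else None
    return {
        "date": taken.date().isoformat() if taken else None,
        "time": taken.time().isoformat() if taken else None,
        "manufacturer": (tags.get("Make") or "").strip() or None,
        "model": (tags.get("Model") or "").strip() or None,
    }
```

Missing tags are returned as `None`, so that the trapper can be prompted to fill them in manually.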
All of the metadata automatically extracted from the CT can be manually edited by the trapper. In particular, this is obligatory for GPS coordinates, so that the altitude can be automatically imported from the attribute table of the digital terrain model accessed through a web map service. The list of metadata attributes and their adopted format and examples is presented in Table 1. The trapper can further add other optional useful attributes and notes related to the animal, such as sex, age, count, etc. We used the existing Access to Biological Collections Data (ABCD) schema as an evolving comprehensive standard [41], and we proposed a slight extension respecting the Camera Trap Metadata Standard (Table 1 [28]). The advantage of using ABCD is its compatibility with GBIF and BioCASe (Biological Collection Access Service for Europe) networks. We also discussed the compatibility issues of metadata with other standards, e.g., Dublin Core, Project Open Data Metadata Schema, Keep In, ALA BioCollect, and others. Full compatibility with the other main standards remains a future challenge, especially with regard to integration with larger Citizen Science infrastructures, such as SciStarter or Zooniverse. An automatic detection of empty records [42] and of people in the images (in accordance with the GDPR), together with an identification of the content, starts as soon as the new records are successfully uploaded. Empty records are labelled and filtered out from the data in the subsequent workflow; people detected in the images are irreversibly masked, and the images are labelled. Subsequently, automatic image classification (deep learning) aiming to reach a confidence threshold is implemented as an initial phase in the identification of animals before citizens are engaged in classifying images.
If AI successfully classifies images above the desired confidence threshold [39], two-thirds of such data are automatically verified, and the rest continue to the volunteers'/spotters' classification stage. We expect that "attractive taxa", once made available for human identification, will help to encourage users to continue to identify others. Thereafter, a consensual species identification done by volunteers is included: at least three spotters have to agree on their photo/video classification for its verification to be successful. However, the minimum number of classifiers required for consensus could be increased in the case of rare or less common species, based on the research protocols. Moreover, volunteers can set their level of certainty related to their particular classifications and, thus, potentially deliberately lower the weight of their particular votes. If the spotters' decisions are not consistent, the camera trap record remains unverified. Other spotters have to classify the photo/video until there is at least a 75% consensus among all the spotters involved or the number of spotters reaches 10. If the maximum number of spotters is reached, or if the resulting consensually estimated taxon belongs to the group of taxa with a high complexity index, further validation by two experts is needed. If there is a consensus among these experts, the image is successfully classified, and the image is assigned to the training model. Otherwise, if there is no consensus of two experts out of a total number of three, the image's classification is rejected. This approach is presented in a flowchart below (Figure 1). The 75% represents an arbitrary threshold. Every spotter builds up his/her own credibility. Spotters who classify photos/videos in consensus with the successful (verified) classification get a higher weighting for their future votes. The spotter's weight is a decimal number between 0 and 1 and is estimated only for spotters who have already made some classifications.
This weighting is computed as a simple division of the number of the spotter's correct classifications (where the spotter's classification is consistent with the final consensus) by the total number of the spotter's classifications. Therefore, spotters who do not recognise the taxa well or who would like to spoil the system are automatically eliminated.
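The routing, consensus, and weighting rules described above can be sketched as follows. This is one possible reading under stated assumptions, not the system's actual logic: votes are counted unweighted, "three agreeing spotters" and "75% consensus" are combined into a single test, every third confident AI result is sent to the volunteers (the text does not say how the one-third is selected), and images below the AI threshold are assumed to go to the volunteers. All function names and status labels are hypothetical.

```python
from collections import Counter

MIN_AGREEING = 3    # at least three spotters must agree
MIN_SHARE = 0.75    # the arbitrary 75% consensus threshold
MAX_SPOTTERS = 10   # after ten votes the record goes to the experts


def route_ai_result(confidence, threshold, sequence_no):
    """Route an AI classification; every third confident record goes to spotters."""
    if confidence < threshold:
        return "volunteers"
    return "volunteers" if sequence_no % 3 == 2 else "auto_verified"


def consensus_step(votes, high_complexity_taxa=frozenset()):
    """Return (status, taxon) for a list of spotter votes (taxon names)."""
    if not votes:
        return ("needs_more_votes", None)
    taxon, count = Counter(votes).most_common(1)[0]
    agreed = count >= MIN_AGREEING and count / len(votes) >= MIN_SHARE
    if agreed and taxon not in high_complexity_taxa:
        return ("verified", taxon)
    # High-complexity taxa and records without consensus after the
    # maximum number of spotters are passed on to expert validation.
    if agreed or len(votes) >= MAX_SPOTTERS:
        return ("expert_review", taxon if agreed else None)
    return ("needs_more_votes", None)


def spotter_weight(correct, total):
    """Share of a spotter's classifications that matched the final consensus."""
    if total == 0:
        return None  # weight is only estimated once a spotter has classified
    return correct / total
```

For example, three identical votes for a common species yield a verified record, whereas the same three votes for a taxon listed in `high_complexity_taxa` are forwarded to the experts. Written as a formula, the weight is w = n_correct / n_total, which always lies between 0 and 1.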
As the CTD continues consolidating, all the processed and consensually agreed classified records (by a combination of spotters and experts) serve as datasets for further machine learning. Therefore, the inner models (mathematical descriptions) representing each taxon are regularly retrained. Thus, the capability of the automatic classifier may improve. This situation is illustrated in Figure 2.
The system has a logical design; it depicts the entity relationships in the database of the whole system. Each record in the system (i.e., an entry in the table of records that represents a photo/video from a camera trap) has multiple associations with other entities in the database. Firstly, it associates with a specific day/time and some location (area) and habitat where the record was taken. Each record also relates to a trapper, who is the person who uploaded the photo/video to the system. Finally, each record is also associated with several classifications made by spotters, as well as with the resulting taxon (the genus and species of the trapped animal).
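The associations of a single record can be sketched with plain Python dataclasses. The class and attribute names below are illustrative assumptions only; the authoritative attributes, keys, and cardinalities are those of the entity relationship diagram, and a real implementation would use database tables rather than in-memory objects.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Classification:
    """One vote: which spotter assigned which taxon to a record."""
    spotter_id: int
    taxon_id: int


@dataclass
class Record:
    """A photo/video from a camera trap and its associations."""
    record_id: int
    taken_at: datetime      # day/time when the record was taken
    district_id: int        # location (area)
    habitat_id: int
    trapper_id: int         # the person who uploaded the record
    classifications: List[Classification] = field(default_factory=list)
    taxon_id: Optional[int] = None  # resulting taxon, set once verified
```

A newly uploaded record starts with no classifications and no resulting taxon; both are filled in as the workflow proceeds.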

Results
One of the aims of this communication is to define the conceptual framework for possible further realisation of the system. Therefore, the results presented in this section are concept-oriented, using a top-down approach. The system design and workflow (already shown in Figures 1 and 2) are enhanced by the detailed use case overview, along with the complex data model (represented by the entity relationship diagram) in this section.
The use case diagram reflects all the desired functionality from the point of view of the different groups of users (roles) of the information system (web application). These groups are defined in the following way (considering the brief descriptions of users in the Materials and Methods section, ordered hierarchically):
• unknown visitors, i.e., normal, nonprivileged web application visitors who are only able to perform a limited number of actions in the system,
• registered spotters, who (in addition to the role of visitors) can classify records; they are mere evaluators, voluntary experts, or laymen; these users do not have access to the records' attributes,
• permitted users, who (in addition to the role of visitors) can view records with their attributes and export data and statistics,
• registered trappers, who (in addition to the roles of visitors and spotters together) can view their records with attributes, upload a batch of records to the system for further classification, and export data and statistics,
• administrators, a limited group of users with the highest possible position in the hierarchy of roles of this information system, aimed at the maintenance of the system, and
• finally, the system, which is a specific role that performs automatically defined actions.
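The role hierarchy listed above can be expressed as sets of permissions that build on one another. The permission names are hypothetical labels invented for this sketch, and the assumption that administrators inherit every other permission is ours; the authoritative list of use cases is the use case diagram.

```python
# Hypothetical permission names; the authoritative use cases are in Figure 3.
VISITOR = {"browse_public_pages"}
SPOTTER = VISITOR | {"classify_records"}
PERMITTED_USER = VISITOR | {
    "view_records_with_attributes",
    "export_data_and_statistics",
}
TRAPPER = SPOTTER | {
    "view_own_records_with_attributes",
    "upload_records",
    "export_data_and_statistics",
}
# Assumption: administrators hold every permission plus maintenance.
ADMINISTRATOR = SPOTTER | PERMITTED_USER | TRAPPER | {"maintain_system"}

ROLES = {
    "visitor": VISITOR,
    "spotter": SPOTTER,
    "permitted_user": PERMITTED_USER,
    "trapper": TRAPPER,
    "administrator": ADMINISTRATOR,
}


def can(role, action):
    """Return True if the given role is allowed to perform the action."""
    return action in ROLES[role]
```

With these sets, `can("spotter", "view_records_with_attributes")` is false, which mirrors the rule that spotters do not have access to the records' attributes.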
The hierarchy of the roles in the whole information system and their defined use cases are presented in Figure 3. The complex data model of the information system defines all the required database entities: Records, Taxa, TaxonClasses, Districts, Habitats, AreaProtections, Daytimes, Classifications, Persons, Expertises, Projects, and Configurations. The relationships among these entities are depicted in Figure 4, including cardinalities, attributes, and primary and foreign keys (PK and FK). A brief description of all the entities follows in the subsequent paragraphs. The entity of "Records" is intended for the detailed information obtained from camera trap records that will be used for further analysis and statistical processing. The aim is to preserve as much information as possible from the records and also to gather as much information as possible describing the habitat and other circumstances of each record.
The entity of "Taxa" is designed not only for the purpose of enumeration (conversion of an ID into its verbal expression) but also to make internationalisation relatively easy: adding a new language version of the names only requires inserting a new column. Individual items of this entity contain the genus and species together. The entity also includes an empty element, entitled "EmptyRecord", for cases where the record is a false positive (without an animal). Similarly, for statistical purposes, it is appropriate to include an element identifying a human in the record.
The entity of "TaxonClasses" contains the classes that are represented in the database. The entity of "Districts" contains a list of the districts in a particular country/continent. The entity of "Habitats" contains information about the conversion of the identifier into its verbal expression. The presence of this entity in the system makes it easy to translate the system into particular language versions (internationalisation).
The entity of "AreaProtections" contains an overview of areas with their status of protection. The entity of "Daytimes" contains information on the conversion of the identifier into a verbal expression of the daytime. The presence of this entity in the system likewise enables simple translation into particular language versions.
The entity of "Classifications" is essential for keeping track of which records were classified, by whom, and how. In addition, storing classifications in this way makes it possible to prevent records from being repeatedly displayed to the same spotters.
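How stored classifications can prevent repeated display is shown in the following minimal sketch. It assumes that classifications are available as (record ID, spotter ID) pairs and that open records are offered in a fixed order; the function name and both assumptions are illustrative, and a production system would express the same filter as a database query.

```python
def next_record_for(spotter_id, open_record_ids, classifications):
    """Return the first open record this spotter has not classified yet.

    `classifications` is an iterable of (record_id, spotter_id) pairs.
    Returns None when the spotter has already seen every open record.
    """
    seen = {rec for rec, who in classifications if who == spotter_id}
    for record_id in open_record_ids:
        if record_id not in seen:
            return record_id
    return None
```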
The entity of "Persons" is designed to keep records of registered users (their role in the system, access, and other data). An internal auto-classifier also assigns classifications. Therefore, it is listed as the first item of this entity.
The entity of "Expertises" contains information on the conversion of the knowledge level identifier into its verbal expression. The presence of this entity in the system allows easy translation into particular language versions (internationalisation).
The entity of "Projects" is designed to manage information about the projects from which camera trap data were obtained. It is assumed that any given data are obtained within at most one project. The details about the projects are brief but sufficient for searching data in the system. The entity of "Configurations" is intended to define all editable system settings.
The results and findings presented in this document (the conceptual model with defined use cases, entities, attributes, and relations) are further discussed in the following section.

Discussion
Our conceptual framework and analytical concept of a camera trap database represent one approach to the collaborative processing of CT data from the multiscale monitoring of related or independent research projects. It could enable CT practitioners to share data and co-create other studies, while also being an important asset for efficient wildlife management and the coordination of CT monitoring on a national and/or continental level.
Processing the images from the growing number of CTs, with their different levels of complexity, requires efficient data management tools. Although more recently developed software solutions offer intuitive and effective sorting and classification of images [2,31,43], many CT monitoring projects still handle data manually (Romportl, pers. comm.), either directly by the researchers themselves or by engaging volunteers to process the data [34,36]. With regard to universal use and optimising the efficiency of image classification, our logical concept of the CTD combines the involvement of citizen scientists (multiple classifiers) with machine learning processes, enabling the filtering of blank images and automatic animal identification to be integrated into the data workflow. When considering the balance between minimising the effort required of volunteers and researchers per image and maximising the accuracy of animal detection, we decided to require a consensus of at least three volunteers (in the case of common species) while keeping at least a 75% consensus per image or video. This number represents a conservative minimum that could avoid mistakes in the classification of records of common species and yet still minimises the need for expert review. The adequacy of using three agreeing classifiers in common species identification was tested in the Snapshot Serengeti project [34]. However, since every research project is done under different environmental conditions (geographical, biodiversity region, weather conditions, image background, etc.), the proposed concept allows the threshold to be increased operatively with respect to these project characteristics. Nevertheless, to avoid the bias of a false volunteers' consensus on taxa with a high complexity of identification, we extended the framework with a taxon complexity index and the appropriate expert validation process.
All verified images can thus serve as regular training data for the inner models of species. In this way, the system keeps learning and adjusting its own inner models to the previously validated data representing each taxon and, thus, can potentially improve the reliability of its future classifications. Regarding the diversity of data and the aims of particular research projects, the CTD should be flexible enough and useful for a wide range of users [43]. Thus, the features included in the proposed system, such as the automatic detection of empty records (typically shots with no animal present, i.e., false-positive images) in the initial phase of data processing and the involvement of citizen scientists in species determination, might motivate trappers to use a web-based CTD. Moreover, such batch, automated image classification may help researchers in assessing bycatch data that would otherwise probably have remained unprocessed. Combining bycatch data over multiple sites may enable scientific collaboration to address issues on macro-ecological patterns [44]. Furthermore, the rapidly increasing amount of data collected by lay citizens has a high potential to contribute significantly to the CTD and its aims.
However, the ease of use and any number of clever features of the system can lose their attractiveness at the moment a researcher lacks the motivation to share data, typically due to various concerns about the misuse of data. This could be due to data-sharing policies, restrictions, or for personal reasons. Thus, we have tried to address such concerns. Firstly, every trapper receives a limited embargo on the use of their data while their research is running (if needed). Furthermore, every trapper can adjust the embargo on the data of the concerned or threatened species. In addition, every person who is permitted to download and use data must agree to accept a specific Licence Agreement for specific data management. Nevertheless, data sharing within the scientific community is widely encouraged [23]. According to the Committee on Responsibilities of Authorship in the Biological Sciences, scientists are obliged to make their data available to others in a format that other scientists can use for future research [45]. Hence, we believe that our presented approach may provide a useful tool to address the aforementioned recommendation while also taking the needs of individual researchers into account.
Furthermore, there is a risk that less attractive records will be skipped over without being classified. This risk can be mitigated by continually motivating the volunteers. One such motivational approach proposed in our framework is as follows: although all images successfully classified by AI above the desired confidence threshold could be automatically verified [39,46], one-third of such "attractive" data is still subject to classification by citizens in our framework. The involvement of citizen scientists in research, and in CT studies in particular, is rapidly moving into its "golden era". When the public is properly educated, trained, and engaged in using intuitive best practices, more desirable outputs are reached (as can be seen in similar studies, e.g., References [9,36,47]). Moreover, despite the rapid development of computer deep-learning processes in identifying species and particular individuals [31,33], the engagement of citizen scientists in classification processes would mean better data quality, as well as increasing the likelihood that less-favoured data will not be overlooked. Human validation is still needed to ensure the data quality.
Another role of citizens proposed in our analytical concept (see the Supplementary Materials) is classifying the number of individuals trapped and their (categorised) behaviour. Human effort can thus streamline AI processes in camera trapping research (e.g., Reference [39]). Furthermore, it motivates the public to appreciate and conserve wildlife, as volunteers develop a greater affinity with nature and with being a part of the research. In addition, such approaches are highly valued by policymakers, as they put them in a better position to make well-informed decisions regarding landscape management, spatial planning, and conservation approaches, altogether contributing to SDG 15 (Life on Land).

Future Challenges
Despite some of the above-described advantages that our analytical model presents, there are still several issues that need to be addressed and where further progress is needed.
A key aspect is the reidentification (re-ID) of an individual animal upon recapture, which is essential for the analyses of population dynamics, behavioural ecology, and ecosystem functions. Hence, either manual or automatic re-ID ought to be part of any CT software solution. Fully autonomous reidentification is still in development and requires a robust database of many images of any one individual animal [33]; nevertheless, several CT studies have already demonstrated the ability to identify individuals of a particular species (e.g., References [48][49][50]). The autonomous re-ID of individuals of multiple dissimilar species within one system will therefore certainly be required in the near future. Such a feature could also be implemented in the proposed system. In addition, a CTD based on our analytical concept could be enabled to automatically export data to specific national faunistic databases in order to simplify the conservation steps and would have the potential of being adapted in global Citizen Science platforms, e.g., SciStarter, Zooniverse, CitSci.org, and others.