The vast majority of empirical studies in the social sciences still rely on traditional, survey-based data collection procedures. However, a growing number of experiments are taking advantage of the opportunities offered by the information revolution and the development of infocommunication technologies, and are developing new procedures and methods for data collection [6
]. In this paper we join these efforts and present the results of an exploratory study conducted in Hungary. This involved an online survey to determine willingness to participate in a future research project based on smartphone data collection.
In our study, we seek answers to three questions: first, how the willingness to participate in smartphone-based data collection is related to the socio-demographic characteristics of individuals; second, how participation is influenced by different types of smartphone use; and third, to what extent the selection biases caused by the previous two factors can be reduced by careful planning of the survey design.
Factors Determining Willingness to Participate
The development and application of innovative data collection techniques is necessitated by the fact that traditional survey methods are encountering increasing difficulties as non-response rates rise [1
]. However, with novel and innovative techniques—such as online data collection, data donation or the use of smartphone applications—it may be possible to reach social groups (e.g., young people) who are difficult to persuade to participate in social science research using conventional survey methods [3
]. Beyond reaching potentially new groups of respondents, a far greater advantage of these new techniques is that they can measure digital activity with a level of accuracy and detail that is not possible with traditional surveys.
According to the method of data collection, a distinction can be made between active and passive data collection [5
]. Traditional survey-based techniques almost exclusively use the active form of data collection. Data is collected in such a way that respondents actively answer questions formulated in advance by the researcher. In this case, the researcher assumes, or is forced to assume, that respondents understand the questions in the survey and have sufficient information to answer them. Another important assumption is that respondents are willing to answer the questions honestly and to the best of their knowledge and belief [14
]. It is clear that these conditions, or some of them, are regularly not fulfilled and the researcher has limited possibilities to verify them.
Passive data collection, on the other hand, does not require the data provider to actively participate in the collection of the information. Smartphone-based data collection techniques, for example, obtain information through an application that accesses data generated by sensors built into the device or by the smartphone’s operating system. This can include information about people’s online activities, app usage, location data or even their health status [9
]. A key advantage of passive data collection is that it avoids many of the pitfalls of active techniques, as it does not rely on the active role of participants in data collection. This can lead to more accurate measures of digital activity.
However, it is also clear that while innovative data collection techniques have many advantages over traditional survey-based methods, there are also some problems that pose a major challenge to the widespread use of these techniques.
All social science data collection is based on a strong relationship of trust between researchers and data providers [21
]. If this relationship of trust does not exist, members of society will refuse to participate in research. Refusal to participate is a growing problem in traditional data collection, but it is even more pronounced in passive data collection (including smartphone-based techniques), which requires access to particularly sensitive personal data when it aims to observe people’s digital behaviour. It is no coincidence that willingness to participate in this type of data collection is generally lower than in traditional surveys [13
]. Depending on the research design and specific data collection, actual and hypothetical willingness to participate varies widely. (For a comparative overview, see [5].)
Digital footprints can be used for research in many ways, and the constantly evolving technological environment puts these data collections under constant pressure to innovate. On desktops, the most common methods are customised browsers or browser plug-ins that log website traffic, and methods that use screen scraping. On mobile devices, a wide range of user activity can be monitored by directly accessing sensors and local databases via APIs. A third approach is data donation, where users donate the personal data that services such as social media or e-commerce platforms have generated and stored about them. Previous research has examined a variety of factors that influence willingness to participate and the level of collaboration. The type of organisation conducting the research (academic research is favoured over commercial research) and the purpose of the research (the extent to which it serves the public good) seem to affect willingness to participate [30
]. Empirical findings to date are contradictory as to whether participants prefer full, partial or no transparency about the actual data they provide from their device when participating [21
]. However, the sensitivity and personal relevance of the data is perceived to be the most decisive factor [31
]. So far, research on the role of incentives has been rather hypothetical, i.e., surveys have been conducted to measure willingness to participate in the case of different incentives. Experience so far shows that there are no easy-to-apply formulas for the amounts of incentives for which people are willing to participate in passive data collection or to donate their data. Higher amounts are associated with higher willingness to participate, but beyond a certain point, higher payments no longer improve the level of engagement [12
]. Compared to previous studies, we examine these factors in a controlled survey experiment that allows us to test whether the different factors in the research design affect willingness to participate differently in different social groups.
Since data collection via smartphone directly observes the digital behaviour of individuals, the question of research ethics arises to a much greater extent. Therefore, data collection needs to be much more carefully and thoroughly planned to avoid data containing personal information and to ensure proper anonymisation of sensitive data. In addition, the safe storage of particularly sensitive data must be ensured [9].
Due to the higher refusal rate, smartphone-based data collection techniques may also lead to difficulties in reaching certain social groups. An obvious prerequisite for participation in a smartphone-based data collection is that the respondent owns such a device. In Hungary, more than 90% of the population owns a smartphone, but the exclusion of those without a smart device still causes systematic selection bias: the most disadvantaged and vulnerable populations are systematically left out of the data collection. Participation in this type of innovative data collection also requires a minimum level of digital literacy, which automatically limits the ability to reach certain social groups, such as the digitally excluded [37
]. However, this innovative method of data collection can reach previously less accessible groups (e.g., the younger generation) at a higher rate. In our exploratory study, we focus on these issues by investigating which social groups can be reached with passive digital data collection and which would refuse to participate in such a survey. We investigate the role socio-economic variables play in willingness to participate. In addition, we use a survey experiment to estimate how certain elements of the research design affect the willingness to participate of different social groups and different types of smartphone users.
Based on the literature, we explore three questions related to participation rates and research design.
First, previous studies show that participation in traditional surveys is systematically related to demographic and socio-economic characteristics of respondents. In Hungary, for example, non-response is systematically higher among young people, people living in the capital and people with a more favourable social position [2
]. However, we expect socio-economic factors to have a different effect in the case of innovative, sensor-based data collection. For example, we expect that, similar to other studies, younger people are more likely to participate in smartphone-based data collections [5
]. Our first question is therefore: How is the willingness to participate in a smartphone-based data collection related to the demographic and socio-economic characteristics of the respondents?
Second, smartphone ownership and the nature of smartphone use are themselves related to individuals’ socio-economic characteristics [32
]. Based on these findings, we hypothesise that the type of smartphone use itself might also be related to participation in data collection. In our work, we create a typology of smartphone users based on intensity and different types of device use and explore how this relates to willingness to participate. Based on previous work, we hypothesise that people who use their smartphone more intensively, diversely and progressively are more willing to participate in data collection [27
]. Therefore, our second question is whether and how smartphone use is related to willingness to participate.
The third main question of the article is to what extent willingness to participate in a sensor-based data collection depends on the research design itself. Similar to [5
], but with a different measurement of the dependent variable, we investigate how the survey can be designed to minimise non-response and encourage participation. In this context, we examine the role of various factors (survey organiser, amount of data collected, duration of the survey, level of incentives offered to respondents, interruption and control of data collection) that can have a major impact on willingness to participate in data collection [5].
However, we would like to explore not only how these factors relate to participation, but also how they work across different social groups and smartphone user groups. This part of the analysis is mainly exploratory, although it rests on our assumption that incentives, duration of data collection and other factors do not have the same effect on willingness to participate among higher and lower status individuals or among basic and advanced smartphone users. The strength and direction of these effects are difficult to predict because people from different social backgrounds and with different ways of using their phones may have different sensitivities to elements of the research design, resulting in different participation rates. For this reason, we systematically investigate how willingness to participate in sensor-based data collection relates to (i) the interactions between social status and the factors of the research design and (ii) the interactions between types of phone users and those factors. Our fourth question is therefore: What are the effects of the different elements of the research design on the different social groups and types of smartphone users studied and how do they affect willingness to participate?
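To make the logic of a survey experiment over design factors more concrete, the following minimal sketch enumerates every combination of the five factors named above and randomly assigns a few of the resulting scenarios to each respondent. It is an illustration only, not the procedure used in this study: the factor levels, the function names `full_factorial` and `assign_vignettes`, the respondent identifiers and the number of scenarios per respondent are all hypothetical assumptions.

```python
import itertools
import random

# Hypothetical factor levels, for illustration only. The factor names follow
# the text; the levels themselves are assumptions, not the study's design.
FACTORS = {
    "organiser": ["university", "private company"],
    "data_collected": ["app usage only", "app usage and location"],
    "duration": ["1 month", "6 months"],
    "incentive": ["none", "small", "large"],
    "can_pause": [False, True],  # stands in for interruption and control
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*factors.values())
    ]


def assign_vignettes(respondent_ids, vignettes, per_respondent, seed=0):
    """Draw distinct vignettes at random (without replacement) per respondent."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    return {
        rid: rng.sample(vignettes, per_respondent)
        for rid in respondent_ids
    }


vignettes = full_factorial(FACTORS)
assignment = assign_vignettes(["r1", "r2", "r3"], vignettes, per_respondent=4)
print(len(vignettes))  # 2 * 2 * 2 * 3 * 2 = 48 scenarios with the levels above
```

Because the scenarios are assigned at random, the design factors are independent of respondent characteristics by construction. Under that assumption, differences in stated willingness between, for example, basic and advanced smartphone users facing the same factor level can be read as interaction effects of the kind described above.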