Evaluation of User Reactions and Verification of the Authenticity of the User's Identity during a Long Web Survey

Abstract: Web surveys are very popular in the Internet space. They are widely used for gathering customer opinions about Internet services, for sociological and psychological research, and as part of the knowledge testing systems in electronic learning. When conducting web surveys, one of the issues to consider is the respondents' authenticity throughout the entire survey process. We took 20,000 responses to an online questionnaire as experimental data. The survey took about 45 min on average. We did not take into account the given answers; we only considered the response time to the first question on each page of the survey interface, that is, only the users' reaction time was taken into account. Data analysis showed that respondents get used to the interface elements and want to finish a long survey as soon as possible, which leads to quicker reactions. Based on the data, we built two neural network models that identify the records in which the respondent's authenticity was violated or the respondent acted as a random clicker. The amount of data allows us to conclude that the identified dependencies are widely applicable.


Introduction
Nowadays, user surveys are an important part of online services provided on the web. Service providers ask their customers about interface usability, quality of service, and other issues. They try to customize their systems to ensure a better user experience, while also trying to model consumer behavior, which is important for customer-oriented systems.
Usually, such feedback is requested in the form of a web survey, where the user is invited to spend a few minutes of their time and answer questions about the quality of services, or to provide information about themselves, such as age, gender, or place of residence. The same kind of surveys are used for research, which has become especially common during the coronavirus pandemic. Despite certain limitations (information is collected outside the "natural" situation; any information obtained during the survey is not devoid of subjectivity associated with the pressure of social approval; the survey provokes an answer, even if the respondent is not competent in one aspect or another), web surveys are widely used in research and practical tasks. At the same time, web surveys make it possible to obtain information directly from the event participants on a wide range of topics and allow for collecting information from any number of respondents. The problems associated with the use of questionnaires are widely discussed in a number of studies [1][2][3], so we will not specifically dwell on the discussion of the advantages and limitations of this method. The same type of web survey is used in mass psychological research. Some surveys can take up to an hour or even more of a respondent's time. In these circumstances, researchers need tools that allow them to know whether the respondent is tired, the interface is convenient, the participant is a random clicker, or the answers were filled in by a bot (script).
One of the effective tools for solving the problems of evaluating user actions and verifying them throughout a web survey is the evaluation of user reactions. Technically, user reaction time is the timespan from the moment the web survey screen presents a page with a number of questions to the moment the user clicks the desired answer option. In works [4][5][6], various experiments with large samples showed that reaction time reflects personal skills of interaction with the user interface. Of course, various factors may influence changes in reaction time, but in general, reactions remain stable over the course of a few hours of experimentation.
In studies of web surveys, researchers quite often pay attention to the respondent's reaction time. The main focus is on the fact that long reaction times indicate the difficulty of the question [7]. Research in works [8][9][10][11] is dedicated to web questionnaire design and emphasizes that the user reaction time should be noted. Works [12][13][14] consider reaction time to predict whether the respondent will abandon a web survey prior to completing it. It is logical to assume that there is a correlation between the time it takes to read a question and the reaction time; besides, a direct comparison of respondents' reaction times is problematic, because each person has an individual speed [15]. Individual characteristics of a person, according to studies, influence the reaction time quite strongly, which makes it possible to identify the respondent [16][17][18][19][20][21][22].
During the preprocessing of experimental data, it is possible to identify random clickers, who run through web surveys without even reading the given questions [23][24][25]. Identifying random clickers and excluding them from the dataset is an important task [26], and machine learning is used to solve this problem quite often [27]. Machine learning techniques are used to solve many problems in online survey research [28]. For example, in [29], the following methods are considered to describe various relationships between the result and the input data: logistic regression, advanced tree models (decision trees, random forests, and boosting), support vector machines, and neural networks.
A significant problem, especially in surveys of children, is the substitution of the user during the interview. For example, either a companion or an adult replaces the interviewee in the middle of the survey. Such data should be questioned.
Thus, the purpose of this study is to show the possibilities for evaluating data correctness in web surveys. There are three hypotheses:

1.
While participating in a long web survey, respondents get tired, which leads to a slowing down of an individual's reaction time.

2.
There is a relationship between the response time and the question number, which allows for identification of random clickers and bots.

3.
Change in reaction time is an individual characteristic of working with the web interface, which can be used to verify the respondent's authenticity.

The work is organized as follows: Section 2 describes the experimental data; Section 3 is focused on the research methods and results obtained; Section 4 discusses the results obtained; finally, Section 5 concludes the paper.

Research Materials
For the study, we used a dataset generated from a web survey that took around 45 min of an average respondent's time. The survey involved 20,000 students, whose answers to the web survey generated the dataset.
There were two types of questions: standard survey questions and questions about attitudes toward various subjects and processes.
The research was carried out as follows: respondents were provided with a survey web-interface where data were collected on the local device to exclude the influence of networks [30]. For fairness, we use the response time of the first question (interface element) on each page of the web survey. Obviously, open-ended questions require significantly more time to answer; therefore, such questions were excluded from the data.
For Hypothesis #2, a bot program was developed. Data generation by the bot was performed using the same web platform as that used by real respondents. The program was developed and designed to mimic the actions of a real user when filling out the questionnaire in a web browser. Thus, the interaction of a person and the program with the web platform had no differences.
The following components were used to develop the program that simulates respondent behavior:

•	Node.js platform and the JavaScript language as the basis for the script development.
•	Selenium WebDriver as the system for automating actions in the browser.
•	Chromedriver, which provides Selenium WebDriver interaction with the Google Chrome browser.
The program fills in the web survey questionnaire from beginning to end with random data, and there are delays between the actions taken.
Different researchers [31][32][33][34] defined and filtered random clickers and their thoughtless responses in different ways. In previous studies of web surveys [4,5,30], we gained experience in detecting random clickers. In this study, we simulate the behavior of random clickers based on the analysis of data from previous surveys when working with the given web interface.
To create variability in the delays between actions in the browser, we used beta-distributed pseudorandom numbers with the parameters alpha = 1.5 and beta = 5. In the diagram shown in Figure 1, the lag time between a number of actions is shown as "random time". The following formula (the beta distribution density) was used for our study:

f(x; α, β) = x^(α−1) (1 − x)^(β−1) / B(α, β),

where B(α, β) is the beta function, α = 1.5, and β = 5.

The algorithm of the program shown in Figure 1 consists of the following steps. The program starts the web browser and goes to the specified page, then waits for 1 s before pressing the "Start" button. After that, the program presses the "Start" button and proceeds to fill out the questionnaire. The cycle checks the completion of the questionnaire and, if the questionnaire is not filled in completely, the program waits for a random time (as if the respondent has opened the questionnaire page and familiarized themselves with the arrangement of interface elements on the page). After that, the program identifies all the questions on the page and proceeds to their sequential completion. For each question, the following sequence is repeated: if it is a multiple-choice question, the program selects one random option, and if it is an open-ended question, the program enters a preset text; then the program waits for a random time (assuming the respondent needs some time to move to the next interface item). When all questions on the page have been answered, the program presses the "Next page" button and waits for 0.5 s. When the web survey questionnaire is completely filled out, the program closes the web browser window.
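The beta-distributed delays can be sketched in Python (the authors' actual bot used Node.js with Selenium WebDriver; this is only an illustrative sketch, and the 10 s scale factor is an assumption, since beta-distributed values lie in [0, 1] and must be mapped to a realistic pause length):

```python
import numpy as np

def random_delay(alpha=1.5, beta=5.0, scale_seconds=10.0, rng=None):
    """Draw a human-like pause (in seconds) from a beta distribution.

    alpha=1.5 and beta=5 are the shape parameters reported in the study;
    scale_seconds is an assumed factor mapping [0, 1] draws to seconds.
    """
    rng = rng if rng is not None else np.random.default_rng()
    return float(rng.beta(alpha, beta) * scale_seconds)

# The distribution is right-skewed: most pauses are short, a few are long.
rng = np.random.default_rng(0)
delays = [random_delay(rng=rng) for _ in range(10000)]
```

With these parameters the mean pause is α/(α + β) · 10 ≈ 2.3 s, which mimics a respondent scanning a page rather than clicking instantly.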
To test Hypothesis #3 about the possibility of ensuring a user's authenticity and the development of a user verification tool for assessing the possible replacement of the respondent during the course of the survey, another data sample was formed. This sample consists of the mixed data of real users. That is, a third of a user's survey answers are taken and two-thirds of another, random user's answers are added to it.

Regression Analysis
To test Hypothesis #1, a diagram of the relationship between reaction time and question number was plotted. Questions with an open-ended answer are excluded from the diagram. The median reaction time is shown in Figure 2.

The figure demonstrates that Hypothesis #1 about the user's reaction getting slow over time is not confirmed for the large sample. This is explained by the fact that the respondents get used to the web interface and do not spend time searching for interface elements. Moreover, the reaction time decrease may be due to the wish to finish the survey as soon as possible.

If we consider the individual trajectories presented in Figure 3, there are tendencies for a decrease in reaction time. There are also cases when the respondent was distracted by external circumstances, which can be seen in Figure 4 as noticeable outliers.

Let us add a regression line for the median values of reaction times. To make the data comparable, we will use only the reaction time related to the first question on each page, since the survey offers a varying number of questions per page. Open-ended questions are excluded from the dataset in order to keep an objective picture.

Linear and exponential regression models are shown in Figures 5 and 6. Thus, the built models show that the reaction time decreases on average.
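The two trend fits behind Figures 5 and 6 can be reproduced in a minimal Python sketch; the median series below is synthetic stand-in data (a decaying trend plus noise), since the raw survey medians are not included here:

```python
import numpy as np

# Hypothetical per-question medians of reaction time (seconds), standing
# in for the real medians behind Figures 5 and 6.
question_no = np.arange(1, 27)
medians = (8.0 * np.exp(-0.05 * question_no)
           + np.random.default_rng(1).normal(0.0, 0.2, size=26))

# Linear model: t = a * n + b, via least squares.
a, b = np.polyfit(question_no, medians, deg=1)

# Exponential model: t = c * exp(k * n), fitted as a line in log space
# (valid here because all medians are positive).
k, log_c = np.polyfit(question_no, np.log(medians), deg=1)
c = np.exp(log_c)

# Negative a and k indicate that reaction time decreases on average.
print(f"linear: t = {a:.3f}*n + {b:.2f};  exponential: t = {c:.2f}*exp({k:.3f}*n)")
```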

Random Clicker and Bot Detection
To test Hypothesis #2, a neural network model was built. The real data of 16,350 of the 20,000 entries (containing no null values) were labeled as Class 1, and the random clicker data of 688 entries were generated automatically and labeled as Class 2. A total of 17,038 entries were randomly split into a training set of 6815 entries and a test set of 10,223 entries. The multi-layer perceptron classifier (MLPClassifier) from scikit-learn 1.0 was then fitted on the training set with the following parameters: solver = 'adam', random_state = 1, max_iter = 3000, activation = 'relu', alpha = 0.0001, learning_rate = 'constant'.

We then searched for the optimal number of neurons in the hidden layer of the perceptron (Figures 7 and 8) and identified 30 as the best parameter. The classification report with a total of 30 neurons is given in Table 1. The confusion matrix is given in Figure 9.

Therefore, the MLPClassifier was able to distinguish excellently between the clickers and the original data. However, there were 40 entries mistakenly labeled as random clickers (Figure 9). Manual analysis of those records showed that they were abnormal in terms of the speed change of a user's reaction from very fast (1-2 s) to more than a minute in consecutive questions during the survey. In real practice, such a suspicious subset of records, identified by the classifier, could be analyzed by hand by an expert.
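The clicker-detection step can be sketched with scikit-learn using the reported MLPClassifier parameters. The reaction-time data below are synthetic stand-ins (gamma-distributed times for real respondents, uniformly fast times for clickers), so the class sizes and scores differ from the paper's:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)

# Synthetic stand-ins: 26 reaction times per record. Real respondents
# (Class 1) answer on a scale of seconds; clickers (Class 2) race through.
real = rng.gamma(shape=4.0, scale=2.0, size=(1000, 26))
clickers = rng.uniform(0.2, 1.5, size=(100, 26))

X = np.vstack([real, clickers])
y = np.array([1] * len(real) + [2] * len(clickers))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, test_size=0.6)

# Parameters as reported in the study; hidden_layer_sizes=(30,) was the
# best value found in the neuron search.
clf = MLPClassifier(hidden_layer_sizes=(30,), solver='adam', random_state=1,
                    max_iter=3000, activation='relu', alpha=0.0001,
                    learning_rate='constant').fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test)))
```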


Verification of the Respondent's Identity Authenticity
To test Hypothesis #3, a neural network model was built. In order to model the situation "Bob started filling out the questionnaire, then Carol continued and completed it", we randomly divided the normalized dataset of users' reaction times into two equal parts: Part A and Part B. Part A contained 8175 original entries of users' reaction times to 26 consecutive questions from the questionnaire. Meanwhile, Part B's dataframe of another 8175 original entries was split horizontally, leaving the reaction times to the first 10 questions in the left part and the reaction times to the last 16 questions in the right part. Then, the rows of the right part of the dataframe were randomly permutated and joined again with the left part of Part B. We then assigned Class 1 ("good") to all of the entries of Part A and Class 2 ("bad") to all of the entries of the modified Part B. So, Class 1 ("good") corresponded to the situation when Bob started and completed the survey himself, and Class 2 ("bad") corresponded to the situation when Carol intervened after Question 10 and finished the survey.

We joined the Part A and Part B dataframes vertically and split the resulting dataframe randomly into a training set of 6540 entries and a test set of 9810 entries. MLPClassifier was then fitted on the training set with the following parameters: solver = 'adam', random_state = 1, max_iter = 3000, activation = 'relu', alpha = 0.0001, learning_rate = 'constant'. To measure the performance of the classifier on the test set, we applied the accuracy measure, as there was an equal number of Class 1 and Class 2 representatives in the dataset, so accuracy was meaningful.
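The Part A / Part B construction can be sketched in Python with a hypothetical normalized reaction-time matrix (one row per respondent, 26 columns; the values here are random placeholders for the real data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical normalized reaction times: 16,350 respondents, 26 columns
# (the first question of each survey page).
times = rng.random((16350, 26))

# Split respondents into two equal halves.
part_a, part_b = times[:8175], times[8175:]

# Part B simulates a change of identity after Question 10: keep each
# respondent's first 10 reaction times, but attach the last 16 reaction
# times of a randomly chosen *other* respondent.
left, right = part_b[:, :10], part_b[:, 10:]
mixed_b = np.hstack([left, right[rng.permutation(len(right))]])

X = np.vstack([part_a, mixed_b])
y = np.array([1] * len(part_a) + [2] * len(mixed_b))  # 1 = "good", 2 = "bad"
```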
We then searched for the optimal number of neurons in the hidden layer of the perceptron (Figures 10 and 11) and identified 60 as the best parameter. The classification report with a total of 60 neurons is given in Table 2. The confusion matrix is given in Figure 12.

Therefore, the MLPClassifier was able to distinguish between the original and permutated data quite accurately (0.73), providing the opportunity to identify the situation when the survey was completed by another person.

As the cardinality of the parametric set of MLPClassifier exceeds 1000 and the training process takes significant time for each tuple of parameters, different strategies can be applied to reduce the brute-force search. Evolutionary algorithms could be a promising solution for the directed search of parameters.
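The brute-force search over hidden-layer sizes (Figures 7, 8, 10 and 11) reduces to a loop like the following sketch; the data are synthetic and the candidate grid is an assumption, not the authors' exact search space:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: two classes of 26-value reaction-time vectors.
X = np.vstack([rng.normal(5.0, 1.0, (400, 26)),
               rng.normal(6.0, 1.0, (400, 26))])
y = np.array([1] * 400 + [2] * 400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1, test_size=0.6)

# Held-out accuracy guides the choice, as in the study.
best_n, best_acc = 0, 0.0
for n in range(10, 61, 10):  # candidate hidden-layer sizes (assumed grid)
    clf = MLPClassifier(hidden_layer_sizes=(n,), solver='adam',
                        random_state=1, max_iter=3000).fit(X_tr, y_tr)
    acc = clf.score(X_te, y_te)
    if acc > best_acc:
        best_n, best_acc = n, acc

print(best_n, round(best_acc, 3))
```

An evolutionary or Bayesian search would replace this exhaustive loop, which is the direction the text suggests for reducing the brute-force cost.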


Discussion
The web survey was conducted in the Russian Federation as part of a psychological study. The dataset was generated from the responses of 20,000 students to simple survey questions. For this study, the data were preprocessed as follows: empty records and questions that took an abnormally long time to answer were excluded. To preserve comparability, only the response time of the first question on each page was taken for further analysis.
Hypothesis #1 was not confirmed. Contrary to expectations, in the long survey, respondents accelerated over time. This is likely because the respondents became accustomed to the web interface and stopped spending time searching for interface elements. Another possible explanation is the wish to finish the survey as soon as possible.
For Hypothesis #2, an additional dataset representing the random clickers was generated. It consists of 688 survey results filled in by an automation script. To identify random clickers, an MLPClassifier was trained, and it was able to do so with high accuracy (0.99). There were 40 entries mistakenly labeled as random clickers. These records appeared abnormal in terms of the speed change of a user's reaction from very fast (1-2 s) to more than a minute in consecutive questions during the survey. It can be suggested that the approach is suitable for identifying random clickers and bots.
For Hypothesis #3, the initial dataset was split into two dataframes (Part A contained 8175 original entries of users' reactions to 26 consecutive questions from the questionnaire; meanwhile, Part B's dataframe of another 8175 entries was partly mixed with another respondent's answers). To find records with violated authenticity, an MLPClassifier was trained, and it was able to distinguish between the original and permutated data quite accurately (0.73).

Conclusions
We have obtained new data that can be widely used to verify the results of web surveys. The research was conducted using a dataset of 20,000 records containing responses to a web survey, which took an average of 45 min to complete. Respondents with different devices and different operating systems participated in the survey; however, the technology used to collect the reaction times on client devices made it possible to compensate for the difference in gadgets.
It is shown that considering the reaction time is sufficient for user verification throughout the entire web survey. For the reliability of the experiment, we considered only the reaction time obtained while answering the first question on each web survey page. For the present study, the reaction time is the sum of the time in which the respondent read the question and chose the answer option from the suggested ones.
We have excluded open-ended questions from consideration. The data from a long web survey on a large sample gave interesting results: respondents do not get tired; rather, they begin to react faster, that is, they get used to the interface elements and try to finish the survey faster. This trend is observed for both individual users and for the median model.
Based on the data, we built a neural network model that determines the records in which the respondent was changed and a neural network model that identifies if the respondent acted as a random clicker. As a possible direction for future work, we consider an evaluation of different probability distributions and parameters, which can lead to more accurate random clicker detection.
The amount of data allows us to conclude that the identified dependencies are widely applicable for verifying users during their participation in web surveys.