Treating Nonresponse in Probability-Based Online Panels through Calibration: Empirical Evidence from a Survey of Political Decision-Making Procedures

The use of probability-based panels that collect data via online or mixed-mode surveys has increased in the last few years as an answer to the growing concern with the quality of the data obtained with traditional survey modes. However, in order to adequately represent the general population, these tools must address the same sources of bias that affect other survey-based designs, namely undercoverage and non-response. In this work, we test several approaches to produce calibration estimators that are suitable for survey data affected by non-response where auxiliary information exists at both the panel level and the population level. The first approach adjusts the results obtained in the cross-sectional survey to the population totals, while, in the second, the weights are the result of a two-step process in which different adjustments to the sample, panel, and population are made. A simulation study of the properties of these estimators is performed. In light of theory and simulation results, we conclude that weighting by calibration is an effective technique for the treatment of non-response bias when the response mechanism is missing at random. These techniques have also been applied to real data from the survey Andalusian Citizen Preferences for Political Decision-Making Procedures.


Introduction
In the last few years, we have witnessed a strong development of online research methods in general and of web surveys specifically. According to data from the World Association of Opinion and Marketing Research Professionals (ESOMAR) referring to 2016, 56% of global revenues in the market research field came from different kinds of quantitative and qualitative online methods. Online is also the preferred mode for survey-based research: web and smartphone surveys account for two-thirds of the revenues in this category [1].
The growth of online research methods has run in parallel with the increasing difficulties that traditional modes, such as face-to-face or telephone interviews, face in obtaining samples that meet the highest quality standards [2][3][4]. Changes in the distribution of working and leisure time, the incorporation of women into the workforce, and the increase in residential and geographic mobility have made survey respondents harder to locate. On the other hand, respondents are less willing to participate in surveys as they are more concerned with privacy and confidentiality issues and tired of the growing number of unsolicited requests from the survey and the marketing [...] from the two sample designs (the first by which the panel is obtained, and the second by which the sample is selected from the panel) that must be taken into consideration.
In this paper, we adapt the calibration methodology to the case of probabilistic panels for the treatment of non-response. To the best of our knowledge, this is the first time that this methodology has been studied in this context. We propose different calibration methods in one and two steps and compare the efficiency of these estimators by means of a simulation study. Finally, we apply the proposed techniques to a real survey conducted using the Citizen Panel for Social Research (PACIS), a probability-based mixed-mode panel of the Andalusian population recruited and maintained by the Institute for Advanced Social Studies (IESA-CSIC).

Proposed Calibration Strategies in Panel Surveys
Let $y$ be the variable of interest in a finite population $U = \{1, \ldots, N\}$ consisting of $N$ units. From this population $U$, a panel $P$ of size $N_P$ is drawn with probability $p_{d_1}(P)$ according to a sampling design $d_1$. For this sampling design, the first-order inclusion probabilities are given by $\pi_i^{(1)} = \sum_{P \ni i} p_{d_1}(P)$ for $i \in U$, where $\sum_{P \ni i}$ denotes the sum over the panels containing unit $i$. We denote by $d_i^{(1)} = 1/\pi_i^{(1)}$ the design weight of unit $i$ under the panel design. In a second phase, a probability sample $s$ of size $n$ is selected from the panel $P$ with conditional probability $p(s \mid P)$. In this sampling design $d_2$, the first-order inclusion probabilities are denoted by $\pi_{i|P}$. For each unit in the panel $P$, we denote by $d_i^{(2)} = 1/\pi_{i|P}$ the design weight from the panel. The final sampling design $p(\cdot)$ defined in $U$ has first-order inclusion probabilities $\pi_i = \sum_{P \ni i} p_{d_1}(P)\,\pi_{i|P}$, $i \in U$. Let $d_i = 1/\pi_i$ denote the basic design weight for unit $i \in U$.
We denote by $y_i$ the value of the variable $y$ for the $i$-th individual. We consider the linear parameter $Y = \sum_{i \in U} y_i$, the population total. We assume missing data on the sample $s$ for the variable $y$, so that $s$ can be divided into the disjoint sets $s_r = \{i \in s : i \text{ responds to the main variable}\}$ and $s_m = \{i \in s : i \text{ does not respond to the main variable}\}$, where the respondent sample $s_r$ is of size $r$ and $s_m$ is of size $n - r$.
The natural candidate to estimate the population total is the Horvitz-Thompson estimator $\hat{Y}_{HT} = \sum_{i \in s_r} d_i y_i$; however, it may lead to biased estimates because certain specific groups can be substantially under-represented. These errors can be reduced by the use of reweighting techniques. A standard weighting procedure is to rake or poststratify weights to external population control totals. Raking and post-stratification are special cases of calibration adjustment.
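A minimal sketch of the point above, assuming a hypothetical sample of four units with equal design weights (none of these numbers come from the survey): restricting the Horvitz-Thompson sum to the respondent set without adjusting the weights shrinks the estimated total.

```python
def horvitz_thompson_total(y, d):
    """Design-weighted sum of y over the units supplied."""
    return sum(d_i * y_i for y_i, d_i in zip(y, d))

# Hypothetical sample s: values, design weights d_i = 1/pi_i, response flags.
y_sample = [10.0, 20.0, 30.0, 40.0]
d_sample = [2.5, 2.5, 2.5, 2.5]
responded = [True, True, False, True]

# Full-sample estimate (not computable in practice when y is missing).
ht_full = horvitz_thompson_total(y_sample, d_sample)

# Estimate restricted to the respondent set s_r, with unadjusted weights.
y_r = [y for y, r in zip(y_sample, responded) if r]
d_r = [d for d, r in zip(d_sample, responded) if r]
ht_resp = horvitz_thompson_total(y_r, d_r)

print(ht_full, ht_resp)
```

In this toy case the respondent-only total is lower simply because one unit's weight is lost; reweighting is meant to redistribute that weight using auxiliary information.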
Calibration adjustments [30,31] can be used to extrapolate the estimations of a survey. Calibration estimates reduce non-coverage, non-response, and other biases [23,32]. We will adapt this methodology in our context.
To do so, we assume the existence of auxiliary information on several variables related to the main variable $y$. Auxiliary information can exist simultaneously at the population and the panel level, so it is given as two vectors, $x_k^{*}$ and $x_k^{o}$. Auxiliary information at the population level is denoted by $x_k^{*}$, and the vector of population totals $\sum_U x_k^{*}$ is known. Auxiliary information at the panel level is denoted by $x_k^{o}$, and these values are known for $k \in P$. The panel total $X_P^{o} = \sum_P x_k^{o}$ is thus known, and the population total of these variables can be estimated by $\sum_P d_k^{(1)} x_k^{o}$. We denote by $x_k$ the auxiliary vector constructed by combining the two vectors, and by $X$ the corresponding vector of totals:
$$x_k = \begin{pmatrix} x_k^{*} \\ x_k^{o} \end{pmatrix} \quad \text{and} \quad X = \begin{pmatrix} \sum_U x_k^{*} \\ \sum_P d_k^{(1)} x_k^{o} \end{pmatrix}.$$
We will use these auxiliary variables in different ways in the calibration process.

Calibration in One Step
Now, we use all the auxiliary information to calibrate at the population level. We define a one-step calibration estimator as
$$\hat{Y}_{CAL1} = \sum_{k \in s_r} w_k y_k,$$
where the weights $w_k$, for $k \in s_r$, modify as little as possible, in an average sense, the original sampling weights $d_k$, while respecting the calibration equations
$$\sum_{k \in s_r} w_k x_k = X, \qquad (5)$$
with $X$ the vector of totals associated with the auxiliary vector $x_k$. Then, given a distance measure $G(w_k, d_k)$, the calibration process consists of finding the solution of the minimization problem
$$\min_{w_k} \sum_{k \in s_r} G(w_k, d_k) \qquad (6)$$
while respecting the calibration Equation (5). This calibration problem results in final calibrated weights $w_k = d_k F(q_k x_k^{\top} \lambda)$, where $F(\cdot)$ is the inverse of $\partial G(w_k, d_k)/\partial w_k$, $\lambda$ is the Lagrange multiplier, and $q_k$ are known positive constants used for scaling the calibrated weights (in many situations $q_k = 1$, $k \in s$).
Since Equation (6) depends on the chosen distance measure $G(w_k, d_k)$, each different distance measure leads to a specific weighting system and thereby to a new estimator. Many distance measures have been proposed for calibration. The authors in [33] consider several distance measures with desirable properties.
Under mild regularity assumptions, the authors in [33] also state the important result that all the calibration estimators given by (6) are asymptotically equivalent to the regression estimator. Thus, the choice of the distance measure $G(w_k, d_k)$ has only a modest impact on important properties of the estimators, such as the variance.
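As a worked illustration of this equivalence, consider the chi-square (Euclidean-type) distance, a standard choice in the calibration literature. The symbols $T$ and $\hat{B}$ are introduced here for illustration only, and $X$ denotes the vector of totals to which the weights are calibrated. Minimizing $\sum_{s_r} G(w_k, d_k)$ subject to $\sum_{s_r} w_k x_k = X$ gives weights in closed form, and the resulting estimator has exactly the generalized regression form:

```latex
\begin{aligned}
G(w_k, d_k) &= \frac{(w_k - d_k)^2}{2\, d_k q_k}
\;\Longrightarrow\;
w_k = d_k \left(1 + q_k x_k^{\top} \lambda\right), \\
\lambda &= T^{-1} \Bigl( X - \sum_{k \in s_r} d_k x_k \Bigr),
\qquad
T = \sum_{k \in s_r} d_k q_k x_k x_k^{\top}, \\
\sum_{k \in s_r} w_k y_k
&= \sum_{k \in s_r} d_k y_k
 + \Bigl( X - \sum_{k \in s_r} d_k x_k \Bigr)^{\top} \hat{B},
\qquad
\hat{B} = T^{-1} \sum_{k \in s_r} d_k q_k x_k y_k .
\end{aligned}
```

The first line follows from setting $\partial G / \partial w_k = x_k^{\top} \lambda$; substituting the weights into the calibration equations gives $\lambda$, and substituting them into the weighted sum of $y_k$ gives the regression-type correction term.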
The raking ratio method (also known as the multiplicative method) uses the Kullback-Leibler pseudo-distance
$$G(w_k, d_k) = \frac{1}{q_k} \left[ w_k \log \frac{w_k}{d_k} - w_k + d_k \right].$$
The calibrated weights are obtained as $w_k = d_k \exp(q_k x_k^{\top} \lambda)$. The existence of a solution to this problem is not guaranteed (see [33,34]).
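When the auxiliary variables are category indicators with known marginal totals, raking weights can be computed by iterative proportional fitting, which rescales the weights one margin at a time. The following is a minimal sketch under that assumption; the function name `rake`, the toy respondents, and the target margins are hypothetical and not taken from the paper.

```python
def rake(base_weights, margins, targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting on categorical margins.

    margins: list of label lists (one list per auxiliary variable).
    targets: list of dicts mapping each label to its known total.
    """
    w = list(base_weights)
    for _ in range(max_iter):
        max_change = 0.0
        for labels, target in zip(margins, targets):
            # Current weighted total of each category of this margin.
            current = {}
            for w_k, lab in zip(w, labels):
                current[lab] = current.get(lab, 0.0) + w_k
            # Multiplicative adjustment, so positive weights stay positive.
            for k, lab in enumerate(labels):
                factor = target[lab] / current[lab]
                max_change = max(max_change, abs(factor - 1.0))
                w[k] *= factor
        if max_change < tol:
            break
    return w

# Hypothetical respondent set of four units with design weight 10 each.
d = [10.0, 10.0, 10.0, 10.0]
sex = ["m", "m", "f", "f"]
age = ["young", "old", "young", "old"]
sex_totals = {"m": 30.0, "f": 70.0}
age_totals = {"young": 40.0, "old": 60.0}

w = rake(d, [sex, age], [sex_totals, age_totals])
print(w)
```

Because every update is multiplicative, the weights remain positive, which matches the property of raking noted in the text. The loop may fail to converge when a category has no respondents or the margins are inconsistent, in line with the remark that a solution is not guaranteed.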
In linear calibration (the case of the Euclidean distance), the weights can be positive or negative, whereas raking calibration guarantees positive weights, with the same unbiasedness properties as the generalized regression estimator; however, the model used for the adjustment of non-response has a slightly more reasonable form than that of the generalized regression estimator [28]. Finally, the raking method is available for calculating calibration weights in the best-known statistical packages.
Note 1. The authors in [35] introduced an alternative calibration approach, called functional form calibration, that gives the same estimators. In this paper, we consider the usual calibration approach because it is simpler to formulate.
Note 2. In some practical situations, the $\pi_{i|P}$ for all possible panels $P$ are not known. For these cases, one can use (following the idea of [36]) the $\pi^{*}$-estimator, replacing $\pi_i$ by $\pi_i^{*} = \pi_i^{(1)} \pi_{i|P}$, as the base estimator and thus calculate a similar $\pi^{*}$-calibrated estimator.

Calibration in Two Steps
Calibration is a very flexible technique for incorporating varied information at various stages. Next, we explore several ways of doing two-step calibration:
1. A first two-step calibration method can be defined as follows:
Step 1: Adjusting the representativeness of the panel in the population.
Calibration on the population auxiliary variables, whose population totals $\sum_U x_k^{*}$ are known, yields calibration weights $w_k^{(1)}$, $k \in P$, satisfying
$$\sum_{k \in P} w_k^{(1)} x_k^{*} = \sum_U x_k^{*}.$$
Then, each unit in the panel has a weight that summarizes the auxiliary information obtained from the population.
Step 2: Adjusting the non-response of the sample in the panel.
The panel auxiliary information is incorporated through a calibration estimator whose calibrated weights $w_k^{(2)}$ (here the starting weights for calibration are the design weights $d_k^{(2)}$ of the units in the panel) satisfy
$$\sum_{k \in s_r} w_k^{(2)} x_k^{o} = \sum_P x_k^{o}.$$
The final weights are obtained by multiplying these weights, and the proposed two-step calibration estimator is $\hat{Y}_{CAL2} = \sum_{k \in s_r} w_k^{(1)} w_k^{(2)} y_k$.
2. By using the same idea as in [22], we can consider a second procedure:
Step 1. Compute intermediate weights by calibrating the survey response set to the panel. Thus, we calculate the weights by solving $\min_{w_k} \sum_{s_r} G(w_k, \ldots)$ [...]. The calibrated weights obtained in this first step are denoted by $w_{Pck}$.
Step 2. Use these intermediate weights $w_{Pck}$ for calibration to the population. The problem is now $\min$ [...]. We denote the resulting calibration estimator by $\hat{Y}_{CAL3}$.
3. We consider another alternative, calibrating the survey response set to the sample:
Step 1. Compare the response set with the sample by solving $\min_{w_k} \sum_{s_r} G(w_k, \ldots)$ [...]. The calibrated weights obtained by calibrating the survey response set are denoted by $w_{Sck}$.
Step 2. Compare, using these intermediate weights, the sample with the population by [...]. We denote the resulting calibration estimator by $\hat{Y}_{CAL4}$.
Note that one particular auxiliary variable can be used in both steps (see Section 3). The calibration methods discussed above are all defensible procedures, and they may be comparable in terms of computational burden. In addition, the effectiveness of each method can be assessed through the theoretical variance of the estimator, which cannot be expressed simply. We can apply automated linearization to each of the calibration estimators to find the approximate variance (see [37]).
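The first two-step method (panel to population, then respondents to panel, with the two sets of weights multiplied) can be sketched with post-stratification, which the text notes is a special case of calibration. This is only an illustration under simplifying assumptions: one categorical variable per step, a hypothetical panel of six units, and invented totals. The helper `poststratify` and the variable names are not from the paper.

```python
def poststratify(base_weights, labels, totals):
    """Scale base weights so weighted counts per label match `totals`."""
    sums = {}
    for w, g in zip(base_weights, labels):
        sums[g] = sums.get(g, 0.0) + w
    return [w * totals[g] / sums[g] for w, g in zip(base_weights, labels)]

# Step 1: adjust the panel to the population (hypothetical region counts).
panel_region = ["A", "A", "A", "B", "B", "B"]
d1 = [100.0] * 6                      # panel design weights d_k^(1)
population_counts = {"A": 400.0, "B": 200.0}
w1 = poststratify(d1, panel_region, population_counts)

# Step 2: adjust the respondents to the panel (hypothetical tenure counts).
panel_tenure = ["own", "rent", "own", "own", "rent", "rent"]
panel_counts = {"own": 3.0, "rent": 3.0}
respondent_idx = [0, 1, 3, 4]         # panel positions of the respondents
d2 = [1.2] * 4                        # d_k^(2): panel of 6, sample of 5
w2 = poststratify(d2, [panel_tenure[i] for i in respondent_idx], panel_counts)

# Final weights: product of the two steps, applied to the respondents.
final_weights = [w1[i] * w2_k for i, w2_k in zip(respondent_idx, w2)]
y_r = [1.0, 0.0, 1.0, 1.0]
estimate = sum(w * y for w, y in zip(final_weights, y_r))
print(final_weights, estimate)
```

Keeping the two steps separate means the panel-level weights `w1` can be stored once and reused for every survey drawn from the panel, while `w2` is recomputed per survey.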

Simulation Study
We select for this simulation study the adults in the eusilcP data set (population size 47,123), a population available from the R packages simFrame [38] and simPop [39]. As a panel, we draw a sample of 1500 households stratified by region and select all members in each household. This process is repeated 1000 times (average panel size over simulations: 2857.607). For each panel, we select a simple random sample of size 1000 (this process is repeated 1000 times).
As the target variable, we use eqIncome, a numeric variable measuring household income. As $x^{*}$-variables (for which the population totals are known), we use age, gender, and ecoStat, a factor showing the person's economic status. As the $x^{o}$-variable (for which only the panel total is known), we select hy040n, a numeric variable giving income from rental of a property or land (cor(eqIncome, hy040n) = 0.387).
For each panel, we generated missing values due to non-response in the sample using two mechanisms: missing at random (MAR), with age as the auxiliary variable, and missing not at random (MNAR), with an auxiliary variable, namely benefits, obtained as the sum of the variables referring to different types of subsidies (cor(benefits, eqIncome) = 0.530737). Missing completely at random (MCAR) is not included in this simulation study because it is not a realistic situation in practice.
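The two mechanisms can be illustrated with a small sketch in which the response propensity is a logistic function of a driver variable. The logistic form, the coefficients, and the synthetic `age` and `benefits` values are assumptions made for illustration; the text does not state the functional form used in the study.

```python
import math
import random

def simulate_response(driver, intercept, slope, rng):
    """Draw response flags with logistic propensity in the driver variable."""
    flags = []
    for value in driver:
        propensity = 1.0 / (1.0 + math.exp(-(intercept + slope * value)))
        flags.append(rng.random() < propensity)
    return flags

rng = random.Random(2020)
# Synthetic stand-ins for the drivers (not the eusilcP values).
age = [rng.randint(16, 90) for _ in range(1000)]
benefits = [rng.uniform(0.0, 10.0) for _ in range(1000)]

# MAR: response depends on age, which is also available for calibration.
mar_flags = simulate_response(age, intercept=-1.0, slope=0.03, rng=rng)

# MNAR: response depends on benefits, which is correlated with the target
# variable but is not among the calibration variables.
mnar_flags = simulate_response(benefits, intercept=1.0, slope=-0.3, rng=rng)

print(sum(mar_flags) / 1000, sum(mnar_flags) / 1000)
```

Changing the intercept shifts the overall non-response rate, which is one simple way to produce scenarios with different rates.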
The uncalibrated Horvitz-Thompson estimator (HT) and the alternative calibration estimators are computed in each simulation run: CAL1 (one-step calibration) and the three versions of calibration in two steps, CAL2, CAL3, and CAL4. Figures 1 and 2 show the boxplots of our results. The green line is the true population mean, and the blue square is the average over the 1000 × 1000 simulation runs for each method compared.
From Figures 1 and 2, some empirical evidence can be highlighted:
• The Horvitz-Thompson estimator is biased, and its non-response bias increases with the non-response rate. The best behavior is shown by the estimators CAL1 and CAL2: they correct the bias very well even for the largest non-response rates and have the least variability. In this situation of MAR non-response, the calibration estimators CAL3 and CAL4 are not able to correct the non-response bias: they are even more biased than the HT estimator. Note that the bias of the Horvitz-Thompson estimator is negative, while the CAL3 and CAL4 estimators have a positive bias.
• The behavior of the estimators in the second scenario (MNAR) is very different. None of the estimators can correct the non-response bias, especially when the non-response rate is large (note that the green line is off the graph when the non-response rate is 60%). The estimators best placed to reduce it are CAL3 and CAL4, although their bias is very large for high non-response rates. This result was expected, since MNAR non-response is very complex to deal with.
• It is noteworthy that the estimators CAL1 and CAL2 behave similarly to each other, as do CAL3 and CAL4. It is also observed that the estimators CAL1 and CAL2 have less variability in general than CAL3 and CAL4.
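The bias and variability read off the boxplots can also be summarized numerically. The following minimal sketch, with a hypothetical helper name and toy numbers, computes the Monte Carlo bias, relative bias, variance, and root mean squared error of a list of estimates against a known true value.

```python
import math

def monte_carlo_summary(estimates, true_value):
    """Bias, relative bias, variance and RMSE over simulation runs."""
    runs = len(estimates)
    mean_estimate = sum(estimates) / runs
    bias = mean_estimate - true_value
    variance = sum((e - mean_estimate) ** 2 for e in estimates) / runs
    mse = sum((e - true_value) ** 2 for e in estimates) / runs
    return {
        "bias": bias,
        "relative_bias": bias / true_value,
        "variance": variance,
        "rmse": math.sqrt(mse),
    }

# Toy estimates from four hypothetical runs, with true value 10.
summary = monte_carlo_summary([9.0, 11.0, 10.0, 12.0], true_value=10.0)
print(summary)
```

The mean squared error decomposes as variance plus squared bias, so an estimator can have low variability and still a large RMSE when its bias is large.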

The Dataset
The survey Andalusian Citizen Preferences for Political Decision-Making Processes is part of a wider project titled Why do we hate politics? (PRY079/14). This project was led by two IESA/CSIC researchers and funded by the public foundation Centro de Estudios Andaluces [40] with the aim of analyzing the perception that lay citizens have of democracy and its limits. Fieldwork was conducted from 30 November to 19 December 2015, the day before the Spanish general election took place. This survey was the second survey conducted using the PACIS.
The PACIS is a probability-based, mixed-mode panel of the Andalusian general population. It was recruited offline following full probability sampling methods in the first semester of 2015. The target population was the general population aged 16 years and older residing in independent private households in Andalusia. As obtaining a random sample of specifically named persons from the Spanish Continuous Population Register is restricted to official statistics and international surveys, we conducted address-based sampling using as a sampling frame the Spanish Cadaster, the administrative inventory of real estate that includes all properties segmented by use (residence, rural, industrial, etc.). The cadaster includes the address of each property and geospatial data that were used to assign the addresses to a census section and to design the maps that helped the recruiters locate the households selected to be part of the initial sample. Census sections are operational partitions of the territory that are defined by easily identifiable boundaries and group 1000 to 2500 residents each. From this sampling frame, households were selected using two-stage stratified cluster sampling, where the primary sampling unit was the census section and the secondary sampling unit was the household address. Census sections were stratified by province using proportional allocation. Once the household was located and contacted, the aim was to register all the persons residing in the household. Recruitment was carried out in two sequential phases, using postal invitations first and face-to-face visits to households second.
PACIS is a mixed-mode panel because, although web surveys are the main interviewing mode, CATI is used to interview the offline population and online panel members who did not respond to the online invitation to participate in the survey. In order to maximize response rates and lower attrition, participation is rewarded with 5 euros that the respondent may receive or donate to a charity. It must be noted that PACIS was not conceived to study changes over time, as is the case with longitudinal panels, but as a valid pool of respondents for extracting cross-sectional samples.
The method used for selecting those survey samples is stratified random sampling with proportional allocation to the population distribution by sex and age.
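As an illustration of stratified random sampling with proportional allocation, the sketch below splits a sample size across strata in proportion to their sizes and then draws a simple random sample within each stratum. The strata labels, frame sizes, and the largest-remainder rounding rule are assumptions made for the example, not details of the PACIS procedure.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Split n across strata in proportion to size (largest remainders)."""
    total = sum(stratum_sizes.values())
    quotas = {h: n * size / total for h, size in stratum_sizes.items()}
    alloc = {h: int(q) for h, q in quotas.items()}  # floor of each quota
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest remainders.
    ranked = sorted(quotas, key=lambda h: quotas[h] - alloc[h], reverse=True)
    for h in ranked[:leftover]:
        alloc[h] += 1
    return alloc

def stratified_sample(frame, n, rng):
    """Simple random sample without replacement within each stratum."""
    sizes = {h: len(ids) for h, ids in frame.items()}
    alloc = proportional_allocation(sizes, n)
    return {h: rng.sample(ids, alloc[h]) for h, ids in frame.items()}

# Hypothetical panel frame: unit identifiers grouped by sex-age stratum.
frame = {
    "men 16-44": list(range(0, 200)),
    "women 16-44": list(range(200, 500)),
    "45 and over": list(range(500, 1000)),
}
allocation = proportional_allocation({h: len(ids) for h, ids in frame.items()}, 10)
sample = stratified_sample(frame, 10, random.Random(7))
print(allocation)
```

With proportional allocation every unit has approximately the same conditional inclusion probability, so the second-phase design weights are roughly constant across strata.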

Nonresponse in PACIS Surveys
As explained above, non-response occurs at two different moments in the case of panels: the recruitment stage and the subsequent survey processes. Tables 1 and 2 show the final disposition of case codes and the outcome rates for the recruitment phase of the panel and the PACIS survey on Andalusian Citizen Preferences for Political Decision-Making Processes.
The overall response rate at panel registration was 21.6% (American Association for Public Opinion Research, AAPOR, RR3), a figure that is in line with the outcome rate obtained by the German Internet Panel (18.1%, RR3) and somewhat lower than those of better-resourced panels like ELIPSS (27.3%, AAPOR RR3) or the GESIS panel (25.1%, AAPOR RR5) [20]. Refusals at the respondent level are also the most important reason for not participating in the second wave PACIS survey. In addition, 2398 panelists were invited to participate in the survey and 1081 completed the questionnaire, which accounts for a response rate of 48.1%. Table 3 compares the distribution of the panel members and the respondents to the second wave with the general Andalusian population according to continuous population register statistics. Given that, in both cases, sample selection followed a probabilistic procedure and that the recruitment and interviewing modes are not subject to undercoverage, we assume that most of the observed differences are caused by non-response. We find that the PACIS underrepresents men, the oldest age group (over 60 years old), the inactive population, people with a basic education level, and non-nationals. Residents in Malaga, a coastal area with a large proportion of secondary residences and holiday rentals, are also less likely to have registered as members of the panel.

Main and Auxiliary Variables in the PACIS
The data at the population level were obtained from the Spanish National Institute of Statistics, selecting 2015 as the reference year. In our case, the recruitment of the panel and the survey were conducted in the same year. However, we would like to note that an important aim of computing the weighting adjustments is to correct for the increasing mismatches between the panel and the population that arise as the panel ages (panel attrition, changes in the population, etc.). To fulfill this objective in the case of cross-sectional panels, we must take the time of the survey as the reference for selecting the auxiliary data at the population level [41]. Some demographic variables are described in Table 4.
Three main variables are analyzed in this study. The first is related to "interest in politics". The variable is the answer to the following question: How interested would you say you are in politics? Very interested, Quite interested, Somewhat interested, Hardly interested, Not at all interested.
The second concerns the preferences of the interviewees about the way in which political decisions should be made. The variable is the answer to the question: How would you like political decisions to be made in Spain, Andalusia, your municipality? The answer to this question was given in the form of an average score on a numerical scale from 0 to 10, where 0 means that citizens should make all decisions on their own, and 10 that politicians should make all decisions on their own.
The third is related to the concrete decision-making procedures preferred by the interviewees. The question is: We would like you to rate the following ways of making political decisions.
The answer to this question was given in the form of scores on a numerical scale from 0 to 10, where 0 means that the procedure in question does not help decision-making at all and 10 that it is the best way to make decisions. Six procedures are analyzed: Elections are the way for citizens to intervene in decisions (Elections), Take policy decisions through consultation with experts (Consult to experts), Organize assemblies and meetings so that people can take decisions by themselves (Assemblies), Allow experts to take important policy decisions (Allow experts), Organize referenda frequently (Referenda), and Let the government take the decisions (Government).
There are several auxiliary variables that can be used for calibration at the panel level. We have followed the criterion proposed by [25], based on the $H_3$ bias indicator, to select the variables that, when included in the auxiliary vector, were the most effective for bias reduction. The final auxiliary variables selected are described in Tables 4 and 5.

Results
Tables 6-8 show the estimated proportions (in percent, %) for the three variables by using the two proposed calibration methods. Root mean squared errors (RMSE) are estimated using jackknife techniques and are also shown in Tables 6-8.
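To make the jackknife idea concrete, the sketch below computes a delete-one jackknife variance for a weighted mean. It is a simplified illustration under stated assumptions: the weights are held fixed across replicates and no strata or clusters are used, whereas a full application would typically recompute the calibration weights in every replicate and respect the sampling design. The function names and data are hypothetical.

```python
def weighted_mean(y, w):
    """Weighted mean of y with weights w."""
    return sum(w_i * y_i for y_i, w_i in zip(y, w)) / sum(w)

def jackknife_variance(y, w):
    """Delete-one jackknife variance of the weighted mean."""
    n = len(y)
    replicates = []
    for i in range(n):
        # Recompute the estimate with unit i removed.
        y_rep = y[:i] + y[i + 1:]
        w_rep = w[:i] + w[i + 1:]
        replicates.append(weighted_mean(y_rep, w_rep))
    center = sum(replicates) / n
    return (n - 1) / n * sum((t - center) ** 2 for t in replicates)

# Toy respondent data with equal weights.
y = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0, 1.0]
variance = jackknife_variance(y, w)
print(weighted_mean(y, w), variance)
```

With equal weights the result coincides with the usual variance estimate of a sample mean, the sample variance divided by n, which is a convenient check on the implementation.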
By examining the estimates and the RMSE in the above tables, we are led to the following observations:
• Survey results after applying calibration weights differ from the unweighted results.
• Differences in survey results between the one-step and two-step calibration estimators are small, both in estimates and in errors.
• The one-step calibration estimator CAL1 comes close in RMSE to CAL2. The two-step calibration estimators CAL3 and CAL4 also give similar results to each other.
• No estimator always has the lowest RMSE.
We would like to note that the results obtained by applying these methods to the survey indicate that 45% of Andalusians are interested in politics, a figure that is somewhat higher than that of Spanish citizens (40% according to 2016 European Social Survey data) but still below the 47% of EU citizens who are interested in politics [42]. Regarding their preferences on who should make political decisions, most Andalusians choose central positions, with a few more preferring the participatory to the representative pole, and this holds for the three governance levels analyzed (state, region, and municipality). This preference is coherent with the way Andalusian people evaluate different decision-making procedures, with elections obtaining the highest score (7.27), followed by consultation with experts (7.05) and citizen assemblies (6.7). The same pattern is observed in a nationally representative survey conducted in Spain in 2011 (Centro de Investigaciones Sociológicas, CIS, Study number 2860) [43].

Discussion
In this work, we examined probability-based panel survey settings where there are missing values. We use raking calibration weighting to remove selection biases resulting from unit non-response. Although this technique was developed to reduce standard errors, many authors have shown that calibration can also be used to handle non-response [22][23][24]. In Section 2, we consider four methods of using the different auxiliary information in the calibration, although other alternatives are possible. In each method, a set of calibrated weights is computed by modifying the design weights through the available auxiliary information. This single set of weights is then used to calculate the calibrated estimates of all the population parameters of interest, since the values of the weights do not depend on the $y$ variables. This is a great advantage of the calibration technique in multipurpose surveys.
An important problem is the selection of the auxiliary vector for the calibration. The authors in [44,45] point out that the selection of the variables is more important than the method used to calibrate. The role of the explanatory variables has been demonstrated to be crucial at the post-adjustment stage. The advantage of having a probability-based panel is that more auxiliary information is available for the calibration. Many different vectors can be composed. The authors in [25] provide computational tools for obtaining a preference ordering of potential x-vectors, with the objective of reducing as much as possible the bias remaining in the calibration estimator. The selection of variables for the calibration is a very interesting problem but quite complex to deal with in this situation, since it differs for each method used. This is a problem to be addressed in future research.
We conduct a simulation study in which it is observed that the estimators CAL1 and CAL2 behave similarly to each other, as do CAL3 and CAL4. The simulation study also shows that calibration can significantly reduce bias for MAR mechanisms, while it is not very effective for MNAR mechanisms.
We use data from one survey conducted using the Citizen Panel for Social Research to test the behavior of proposed estimators. Although the same auxiliary information goes into the final weights, the resulting calibration estimators are not identical. In the same way, the differences for the estimators of the variances can sometimes be significant depending on the relationship between the variables.

Conclusions
The main conclusion of this work is that weighting by calibration is an effective technique for reducing non-response bias in probability-based panels when data are missing at random (MAR). Regarding the calibration method, in one or two steps, there are no theoretical reasons to prefer one over the other. However, the two-step method has some practical advantages. Knowing the weights of each unit in the sampling frame allows for incorporating this information in the selection of the subsequent cross-sectional samples and maximizing the number of auxiliary variables when computing the weights for each specific survey. Another benefit of this approach is that it allows for isolating the sources of error produced at each step (i.e., coverage error in the recruitment of the panel and attrition bias in the subsequent surveys). It must be taken into account that, when using this approach to calibrate cross-sectional panels, the panel weights have to be updated taking the time of the survey as the reference for selecting the auxiliary data at the population level. In this case, calibration in one step may be more straightforward, particularly if auxiliary variables at the survey level do not add much value to the quality of the estimates.
In all, the choice of the calibration method is a complex decision where different aspects come into play: the technical features of the panel and the survey, the mechanism of non-response affecting our data, and the availability of auxiliary information.
Author Contributions: The authors contributed equally to this work in conceptualization, methodology, software, and original draft preparation. All authors have read and agreed to the published version of the manuscript.

Funding: The work was supported by the Ministerio de Economia, Industria y Competitividad, Spain, under Grant MTM2015-63609-R.

Conflicts of Interest: The authors declare no conflict of interest.