Peer Reporting: Sampling Design and Unbiased Estimates
Abstract
1. Introduction
2. Estimation Framework
2.1. Notation and Model Specification
2.2. Observables from Ego-Centric Sampling
2.3. ECM Estimator
2.4. Based on Reciprocity
2.5. Variance Estimation
- (1) Draw a bootstrap replicate by resampling ego-centric networks (each ego together with its reported peers) with replacement.
- (2) Compute the corresponding estimator on the bootstrap replicate.
- (3) Repeat steps (1)–(2) $B$ times, obtaining a set of $B$ bootstrap estimates.
- (4) Sort these estimates in ascending order, and construct the $100(1-\alpha)\%$ percentile confidence interval from their empirical $\alpha/2$ and $1-\alpha/2$ quantiles.
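The four steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout `(ego_is_A, [alter_is_A, ...])`, the function names, and the placeholder estimator `peer_share` are all assumptions introduced here, and the quantile index rule is one simple convention among several.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical record layout for one ego-centric network:
# (ego_is_A, [alter_is_A, ...]).
EgoRecord = Tuple[bool, List[bool]]


def peer_share(sample: Sequence[EgoRecord]) -> float:
    """Placeholder estimator (not the paper's): share of reported peers in A."""
    alters = [a for _, peers in sample for a in peers]
    return sum(alters) / len(alters) if alters else math.nan


def percentile_ci(
    sample: Sequence[EgoRecord],
    estimator: Callable[[Sequence[EgoRecord]], float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Ego-level bootstrap with a percentile confidence interval."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_boot):  # Step (3): repeat n_boot times.
        # Step (1): resample whole ego networks with replacement.
        replicate = [sample[rng.randrange(n)] for _ in range(n)]
        # Step (2): recompute the estimator on the replicate.
        value = estimator(replicate)
        if not math.isnan(value):
            estimates.append(value)
    if not estimates:
        raise ValueError("no usable bootstrap replicate")
    # Step (4): sort and read off the empirical quantiles.
    estimates.sort()
    last = len(estimates) - 1
    lower = estimates[math.floor((alpha / 2) * last)]
    upper = estimates[math.ceil((1 - alpha / 2) * last)]
    return lower, upper


egos = [(True, [True, False]), (False, [False, False, True]), (False, [False])]
print(percentile_ci(egos, peer_share, n_boot=500))
```

Passing the estimator as a function keeps the resampling loop independent of which estimator is being evaluated, so the same loop can be reused once the actual estimator is plugged in.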
3. Experimental Design
3.1. Synthetic and Real-World Networks
- (1) Density: quantifies the overall connectivity of a network [33] and, for an undirected network, is calculated as $2E / [N(N-1)]$, where $E$ is the number of edges and $N$ is the number of nodes in the network.
- (2) Average Clustering Coefficient ($C$): measures how strongly nodes tend to cluster together [34]. For each node $i$, the local clustering coefficient is defined as $$C_i = \frac{2T_i}{k_i(k_i-1)},$$ where $k_i$ is the degree of node $i$, and $T_i$ is the number of triangles that node $i$ forms with its peers. The overall average clustering coefficient is then the mean of all individual $C_i$: $$C = \frac{1}{N}\sum_{i=1}^{N} C_i.$$
- (3) Homophily ($H$): quantifies the extent to which nodes prefer connections within their own group rather than across groups. Let $s_{AA}$ denote the proportion of A-to-A links among all links originating from A-nodes [35,36]. The definition of $H$ is as follows: $$H = \frac{s_{AA} - P(A)}{1 - P(A)}.$$ When $H = 1$, all A-nodes connect only to other A-nodes (perfect assortative mixing); when $H = 0$, A-nodes connect to others in proportion to group sizes (random mixing); intermediate values indicate partial within-group preference. Negative values ($H < 0$) correspond to disassortative mixing, i.e., a tendency to connect across groups [37].
- (4) Activity Ratio (AR): set to a range of target values by swapping attributes between high- and low-degree nodes, which induces specific levels of degree–attribute correlation while preserving both the network topology and the marginal attribute counts [38].
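As a rough illustration of the first three properties, the sketch below computes them for a small undirected graph stored as adjacency sets. The function names and the data layout are assumptions made here. The homophily function assumes the form $H = (s_{AA} - P(A)) / (1 - P(A))$, which matches the limiting cases described above ($H = 1$ when A-nodes link only to A-nodes, $H = 0$ under proportional mixing) but may differ from the exact definition used in the paper.

```python
from itertools import combinations
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def density(adj: Graph) -> float:
    """Undirected density 2E / (N (N - 1))."""
    n = len(adj)
    edges = sum(len(nb) for nb in adj.values()) / 2  # each edge is stored twice
    return 2 * edges / (n * (n - 1)) if n > 1 else 0.0


def average_clustering(adj: Graph) -> float:
    """Mean of local coefficients 2 T_i / (k_i (k_i - 1)); 0 when k_i < 2."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue
        # Count linked pairs among the neighbours, i.e. triangles through the node.
        triangles = sum(1 for u, v in combinations(nb, 2) if v in adj[u])
        total += 2 * triangles / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def homophily(adj: Graph, is_a: Dict[Hashable, bool]) -> float:
    """Assumed form: H = (s_AA - P(A)) / (1 - P(A))."""
    targets = [v for u in adj if is_a[u] for v in adj[u]]  # links leaving A-nodes
    p_a = sum(is_a.values()) / len(is_a)
    if not targets or p_a >= 1.0:
        raise ValueError("H is undefined without A-links or without B-nodes")
    s_aa = sum(is_a[v] for v in targets) / len(targets)
    return (s_aa - p_a) / (1 - p_a)


# Toy graph: a triangle (0, 1, 2) with one pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
labels = {0: True, 1: True, 2: False, 3: False}
print(density(toy), average_clustering(toy), homophily(toy, labels))
```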
3.2. Real-World Networks
3.3. Sampling and Estimation Procedure
3.4. Evaluation Metrics
3.5. Bootstrap Confidence Intervals
4. Results
4.1. Performance on Synthetic Networks
4.2. Performance on Real-World Networks
4.3. Bootstrap Coverage Rate
5. Sensitivity Analysis
5.1. Population Proportion P(A)
5.2. Activity Ratio (AR)
5.3. Network Density and Clustering
5.4. Combined Effects of AR and H
5.5. Combined Effects of AR and P(A)
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Variance Estimation
Appendix A.1. Asymptotic Variance Estimation
Appendix A.2. Bootstrap Methods for Confidence
Appendix A.2.1. Resampling Designs
- (a) Ego-level Resampling (BS-Ego). This is the standard nonparametric bootstrap that treats each ego-network as a single independent sampling unit.
  - i. Draw a bootstrap sample of egos by sampling $S$ egos from the original ego sample with replacement.
  - ii. For each resampled ego, include its complete observed record (degree and alter list).
  - iii. Recompute the estimator using Equation (17) on the resampled ego set.
  This approach preserves within-ego dependence but ignores the uncertainty arising from the second-stage alter sampling. It serves as the default bootstrap when no neighbor-level information is available.
- (b) Hierarchical Tree Bootstrap (BS-Tree). A two-level resampling scheme that mirrors the hierarchical structure of ego-centric sampling. Each ego is kept fixed, while its alter list is resampled to capture within-ego stochasticity.
  - i. Level 1 (Ego layer): Retain all $S$ egos from the original sample.
  - ii. Level 2 (Alter layer): For each ego, resample alters with replacement from its own alter list to form a bootstrap neighborhood.
  - iii. Recompute the estimator on the reconstructed bootstrap dataset.
- (c) Neighbor Pool Bootstrap (BS-Pool). This design resamples at the edge level, ignoring ego boundaries. To maintain directionality, edges are divided into two origin-based groups: those emanating from A-egos and those from B-egos. Within each group, alters are treated as conditionally i.i.d.
  - i. Construct two disjoint edge pools: one holding the edges reported by A-egos and one holding the edges reported by B-egos. Code each edge in the A-ego pool with an indicator equal to 1 if the alter is B, and each edge in the B-ego pool with an indicator equal to 1 if the alter is A. The pool sizes and the means of these indicators give the observed cross-group link proportions.
  - ii. For each bootstrap replication, draw edge indicators with replacement from the A-ego pool and from the B-ego pool. Each draw corresponds to one resampled edge indicator from the respective pool.
  - iii. Compute the resampled proportions and, keeping the sample activity ratio fixed, recompute the estimator.

  This edge-level approach does not account for correlations within each ego’s neighborhood, but it avoids mixing edges that originate from different groups, which could otherwise distort the estimates of the two cross-group link proportions. Bootstrap samples in which no edges are drawn from either A- or B-egos are excluded from analysis. To prevent numerical errors when proportions are extremely close to 0 or 1, the resampled proportions are bounded away from 0 and 1, while the reported point estimates use the original, unbounded values.
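The two non-standard designs, BS-Tree and BS-Pool, can be sketched as below. This is an illustrative sketch under stated assumptions rather than the authors' code: the record layout `(ego_is_A, [alter_is_A, ...])` is hypothetical, each alter list and each pool is assumed to be resampled to its original size, and the clipping bound `eps` is a made-up value because the bound used in the paper is not recoverable from the text.

```python
import random
from typing import List, Optional, Sequence, Tuple

EgoRecord = Tuple[bool, List[bool]]  # hypothetical: (ego_is_A, [alter_is_A, ...])


def bs_tree(sample: Sequence[EgoRecord], rng: random.Random) -> List[EgoRecord]:
    """BS-Tree sketch: keep every ego, resample each alter list with replacement."""
    return [
        (ego, [peers[rng.randrange(len(peers))] for _ in peers])
        for ego, peers in sample
    ]


def bs_pool(
    sample: Sequence[EgoRecord], rng: random.Random, eps: float = 1e-6
) -> Optional[Tuple[float, float]]:
    """BS-Pool sketch: resample edge indicators within each origin-based pool."""
    # An indicator is True when the edge crosses groups.
    pool_a = [not alter for ego, peers in sample if ego for alter in peers]
    pool_b = [alter for ego, peers in sample if not ego for alter in peers]
    if not pool_a or not pool_b:
        return None  # such a replicate would be excluded from the analysis
    draw_a = [pool_a[rng.randrange(len(pool_a))] for _ in pool_a]
    draw_b = [pool_b[rng.randrange(len(pool_b))] for _ in pool_b]
    share_ab = sum(draw_a) / len(draw_a)  # A-ego edges that reach B
    share_ba = sum(draw_b) / len(draw_b)  # B-ego edges that reach A
    # Keep proportions away from 0 and 1; eps is an assumed bound.
    clip = lambda p: min(max(p, eps), 1.0 - eps)
    return clip(share_ab), clip(share_ba)


rng = random.Random(42)
egos = [(True, [True, False, False]), (False, [True, True]), (True, [])]
print(bs_tree(egos, rng))
print(bs_pool(egos, rng))
```

Returning the two clipped proportions, rather than a final estimate, leaves the last step (recomputing the estimator with the sample activity ratio held fixed) to whichever estimator formula is in use.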
Appendix A.2.2. Confidence Interval Construction
- Compute the point estimate from the original sample.
- For $b = 1, \dots, B$:
  - (i) Generate a bootstrap sample using one of the three designs: BS-Ego, BS-Tree, or BS-Pool.
  - (ii) Recompute the estimator on this replicate.
- Sort the $B$ bootstrap estimates in ascending order.
- Construct the $100(1-\alpha)\%$ percentile confidence interval from the empirical $\alpha/2$ and $1-\alpha/2$ quantiles of the sorted estimates.
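For concreteness, one common way to write the percentile interval is given below. The symbol $\hat{\theta}$ for the estimator, the order-statistic index rule, and the values $B = 1000$ and $\alpha = 0.05$ are assumptions chosen for illustration, since the paper's own notation and settings are not recoverable here.

```latex
% Sorted bootstrap estimates (order statistics):
\hat{\theta}^{*}_{(1)} \le \hat{\theta}^{*}_{(2)} \le \dots \le \hat{\theta}^{*}_{(B)}

% Percentile interval at nominal level 1 - \alpha:
\mathrm{CI}_{1-\alpha} =
  \left[\, \hat{\theta}^{*}_{(\lceil B\alpha/2 \rceil)},\;
          \hat{\theta}^{*}_{(\lceil B(1-\alpha/2) \rceil)} \,\right]

% Worked case with B = 1000 and \alpha = 0.05:
\lceil 1000 \times 0.025 \rceil = 25, \qquad
\lceil 1000 \times 0.975 \rceil = 975
\;\Longrightarrow\;
\mathrm{CI}_{0.95} = \left[\, \hat{\theta}^{*}_{(25)},\; \hat{\theta}^{*}_{(975)} \,\right]
```

Other index conventions (for example, interpolating between adjacent order statistics) give nearly identical intervals when $B$ is large.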
Appendix B. Supplementary Comparison with RDS and NSUM



References
- Beyrer, C.; Baral, S.D.; van Griensven, F.; Goodreau, S.M.; Chariyalertsak, S.; Wirtz, A.L.; Brookmeyer, R. Global Epidemiology of HIV Infection in Men Who Have Sex with Men. Lancet 2012, 380, 367–377.
- Lu, X.; Qin, W. Informatics in the Era of AI. Innov. Inform. 2025, 1, 100002.
- Wan, M.; Wang, J.; Wang, Y.; Cao, R.; Wang, Z.; Wang, Z.; Shi, P.; Zhao, Z. Understanding as Compression: A New Evaluation Framework for Large Language Models. Innov. Inform. 2025, 1, 100003.
- Tourangeau, R.; Rips, L.J.; Rasinski, K. (Eds.) The Psychology of Survey Response; Cambridge University Press: Cambridge, UK, 2000.
- Krumpal, I. Determinants of Social Desirability Bias in Sensitive Surveys: A Literature Review. Qual. Quant. 2013, 47, 2025–2047.
- Cai, M.; Huang, G.; Kretzschmar, M.E.; Chen, X.; Lu, X. Extremely Low Reciprocity and Strong Homophily in the World Largest MSM Social Network. IEEE Trans. Netw. Sci. Eng. 2021, 8, 2279–2287.
- Ward, M.K.; Meade, A.W. Dealing with Careless Responding in Survey Data: Prevention, Identification, and Recommended Best Practices. Annu. Rev. Psychol. 2023, 74, 577–596.
- Yan, T. Consequences of Asking Sensitive Questions in Surveys. Annu. Rev. Stat. Its Appl. 2021, 8, 109–127.
- Kreuter, F.; Presser, S.; Tourangeau, R. Social Desirability Bias in CATI, IVR, and Web Surveys: The Effects of Mode and Question Sensitivity. Public Opin. Q. 2008, 72, 847–865.
- Warner, S.L. Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. J. Am. Stat. Assoc. 1965, 60, 63–69.
- Dalton, D.R.; Wimbush, J.C.; Daily, C.M. Using the Unmatched Count Technique (UCT) to Estimate Base Rates for Sensitive Behavior. Pers. Psychol. 1994, 47, 817–828.
- Kowalczyk, B.; Niemiro, W.; Wieczorkowski, R. Item Count Technique with a Continuous or Count Control Variable for Analyzing Sensitive Questions in Surveys. J. Surv. Stat. Methodol. 2023, 11, 919–941.
- Aquilino, W.S. Interview Mode Effects in Surveys of Drug and Alcohol Use: A Field Experiment. Public Opin. Q. 1994, 58, 210–240.
- Corkrey, R.; Parkinson, L. A Comparison of Four Computer-Based Telephone Interviewing Methods: Getting Answers to Sensitive Questions. Behav. Res. Methods Instruments Comput. 2002, 34, 354–363.
- Ehler, I.; Wolter, F.; Junkermann, J. Sensitive Questions in Surveys: A Comprehensive Meta-Analysis of Experimental Survey Studies on the Performance of the Item Count Technique. Public Opin. Q. 2021, 85, 6–27.
- Laga, I.; Bao, L.; Niu, X. Thirty Years of the Network Scale-Up Method. J. Am. Stat. Assoc. 2021, 116, 1548–1559.
- Salganik, M.J.; Mello, M.B.; Abdo, A.H.; Bertoni, N.; Fazito, D.; Bastos, F.I. The game of contacts: Estimating the social visibility of groups. Soc. Netw. 2011, 33, 70–78.
- Maltiel, R.; Raftery, A.E.; McCormick, T.H.; Baraff, A.J. Estimating population size using the network scale up method. Ann. Appl. Stat. 2015, 9, 1247–1277.
- Feehan, D.M.; Salganik, M.J. Generalizing the Network Scale-Up Method: A New Estimator for the Size of Hidden Populations. Sociol. Methodol. 2016, 46, 153–186.
- Fisher, J.C.D.; Flannery, T.J. Designing Randomized Response Surveys to Support Honest Answers to Stigmatizing Questions. Rev. Econ. Des. 2023, 27, 635–667.
- Blair, G.; Imai, K.; Zhou, Y.Y. Design and Analysis of the Randomized Response Technique. J. Am. Stat. Assoc. 2015, 110, 1304–1319.
- Tourangeau, R.; Yan, T. Sensitive Questions in Surveys. Psychol. Bull. 2007, 133, 859–883.
- Helleringer, S.; Adams, J.; Yeatman, S.; Mkandawire, J. Evaluating Sampling Biases from Third-Party Reporting as a Method for Improving Survey Measures of Sensitive Behaviors. Soc. Netw. 2019, 59, 134–140.
- Heckathorn, D.D. Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations. Soc. Probl. 1997, 44, 174–199.
- Lu, X.; Malmros, J.; Liljeros, F.; Britton, T. Respondent-driven sampling on directed networks. Electron. J. Stat. 2013, 7, 292–322.
- Abdesselam, K.; Verdery, A.; Pelude, L.; Dhami, P.; Momoli, F.; Jolly, A.M. The development of respondent-driven sampling (RDS) inference: A systematic review of the population mean and variance estimates. Drug Alcohol Depend. 2020, 206, 107702.
- Lu, X. Linked Ego Networks: Improving Estimate Reliability and Validity with Respondent-Driven Sampling. Soc. Netw. 2013, 35, 669–685.
- Verdery, A.M.; Merli, M.G.; Moody, J.; Smith, J.A.; Fisher, J.C. Brief Report: Respondent-Driven Sampling Estimators Under Real and Theoretical Recruitment Conditions of Female Sex Workers in China. Epidemiology 2015, 26, 661–665.
- Heckathorn, D.D.; Cameron, C.J. Network Sampling: From Snowball and Multiplicity to Respondent-Driven Sampling. Annu. Rev. Sociol. 2017, 43, 101–119.
- Beaudry, I.S.; Gile, K.J. Correcting for differential recruitment in respondent-driven sampling data using ego-network information. Electron. J. Stat. 2020, 14, 2678–2713.
- Chen, S.; Lu, X.; Liljeros, F.; Jia, Z.; Rocha, L.E.C.; Li, X. Indirect inference of sensitive variables with peer network survey. J. Complex Netw. 2021, 9, cnab034.
- Baraff, A.J.; McCormick, T.H.; Raftery, A.E. Estimating Uncertainty in Respondent-Driven Sampling Using a Tree Bootstrap Method. Proc. Natl. Acad. Sci. USA 2016, 113, 14668–14673.
- Newman, M. The Structure and Function of Complex Networks. SIAM Rev. 2003, 45, 167–256.
- Watts, D.J.; Strogatz, S.H. Collective Dynamics of ‘Small-World’ Networks. Nature 1998, 393, 440–442.
- McPherson, M.; Smith-Lovin, L.; Cook, J.M. Birds of a Feather: Homophily in Social Networks. Annu. Rev. Sociol. 2001, 27, 415–444.
- Lu, X.; Bengtsson, L.; Britton, T.; Camitz, M.; Kim, B.J.; Thorson, A.; Liljeros, F. The Sensitivity of Respondent-Driven Sampling. J. R. Stat. Soc. Ser. A Stat. Soc. 2011, 175, 191–216.
- Lu, X. Respondent-Driven Sampling: Theory, Limitations & Improvements. Ph.D. Thesis, Karolinska Institutet, Stockholm, Sweden, 2013.
- Salganik, M.J.; Heckathorn, D.D. Sampling and Estimation in Hidden Populations Using Respondent-Driven Sampling. Sociol. Methodol. 2004, 34, 193–240.
- Rossi, R.A.; Ahmed, N.K. The Network Data Repository with Interactive Graph Analytics and Visualization. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015.
- Rozemberczki, B.; Allen, C.; Sarkar, R. Multi-Scale Attributed Node Embedding. J. Complex Netw. 2021, 9, cnab014.
- Spiller, M.W.; Gile, K.J.; Handcock, M.S.; Mar, C.M.; Wejnert, C. Evaluating Variance Estimators for Respondent-Driven Sampling. J. Surv. Stat. Methodol. 2018, 6, 23–45.
- Gile, K.J.; Beaudry, I.S.; Handcock, M.S.; Ott, M.Q. Methods for Inference from Respondent-Driven Sampling Data. Annu. Rev. Stat. Its Appl. 2018, 5, 65–93.
- Ver Hoef, J.M. Who Invented the Delta Method? Am. Stat. 2012, 66, 124–127.











| Network | Nodes | Edges | Density | Clustering | P(A) | AR |
|---|---|---|---|---|---|---|
| AIDS | 31,385 | 32,390 | 0.00007 | 0.005 | 0.1750 | 0.54 |
| PTC | 5,110 | 54,690 | 0.00040 | 0.006 | 0.1411 | 0.60 |
| Git | 37,700 | 289,003 | 0.00041 | 0.168 | 0.2583 | 0.49 |
| Flickr | 80,513 | 5,900,000 | 0.00182 | 0.165 | 0.4420 | 1.22 |
| Tox | 127,998 | 130,481 | 0.00001 | 0.003 | 0.1611 | 0.58 |
| | 580,800 | 1,400,000 | 0.00001 | 0.394 | 0.0331 | 1.64 |
| AR | Bias (SD) | RMSE (Pbest) | ||||
|---|---|---|---|---|---|---|
| Sample | ECM | Sample | ECM | |||
| BA network, | ||||||
| 0.7 | 0.020 (0.014) | 0.037 (0.008) | 0.010 (0.007) | 0.024 (27.8%) | 0.038 (0.6%) | 0.013 (71.6%) |
| 1 | 0.021 (0.015) | 0.008 (0.006) | 0.008 (0.006) | 0.026 (19.6%) | 0.010 (37.2%) | 0.010 (43.2%) |
| 1.3 | 0.020 (0.015) | 0.028 (0.011) | 0.007 (0.006) | 0.025 (20.4%) | 0.030 (6.8%) | 0.009 (72.8%) |
| 1.5 | 0.020 (0.015) | 0.049 (0.012) | 0.007 (0.005) | 0.025 (23.8%) | 0.050 (0.2%) | 0.009 (76.0%) |
| ER network, | ||||||
| 0.7 | 0.025 (0.018) | 0.027 (0.008) | 0.009 (0.007) | 0.030 (18.2%) | 0.028 (6.8%) | 0.011 (75.0%) |
| 1 | 0.022 (0.016) | 0.008 (0.006) | 0.007 (0.006) | 0.027 (14.2%) | 0.010 (37.4%) | 0.009 (48.4%) |
| 1.3 | 0.022 (0.017) | 0.026 (0.011) | 0.007 (0.006) | 0.028 (17.6%) | 0.028 (6.8%) | 0.009 (75.6%) |
| 1.5 | 0.022 (0.017) | 0.043 (0.012) | 0.007 (0.005) | 0.027 (19.6%) | 0.045 (0.8%) | 0.009 (79.6%) |
| Sampling Strategy | Bias (SD) | RMSE (Pbest) | ||||
|---|---|---|---|---|---|---|
| Sample | ECM | Sample | ECM | |||
| BA network, | ||||||
| F | 0.021 (0.016) | 0.092 (0.014) | 0.011 (0.008) | 0.026 (29.1%) | 0.094 (5.8%) | 0.013 (65.1%) |
| P5 | 0.018 (0.014) | 0.092 (0.013) | 0.015 (0.010) | 0.022 (41.8%) | 0.093 (5.8%) | 0.018 (52.4%) |
| P10 | 0.019 (0.015) | 0.092 (0.013) | 0.011 (0.009) | 0.024 (33.0%) | 0.094 (5.4%) | 0.014 (61.6%) |
| W | 0.021 (0.016) | 0.092 (0.014) | 0.010 (0.008) | 0.026 (28.1%) | 0.094 (5.7%) | 0.013 (66.1%) |
| ER network, | ||||||
| F | 0.032 (0.024) | 0.059 (0.012) | 0.010 (0.008) | 0.040 (19.4%) | 0.061 (8.1%) | 0.013 (72.6%) |
| P5 | 0.023 (0.017) | 0.059 (0.012) | 0.010 (0.008) | 0.029 (26.9%) | 0.060 (7.8%) | 0.013 (65.3%) |
| P10 | 0.030 (0.022) | 0.059 (0.012) | 0.010 (0.008) | 0.037 (20.6%) | 0.060 (8.1%) | 0.013 (71.2%) |
| W | 0.032 (0.024) | 0.059 (0.012) | 0.010 (0.008) | 0.040 (20.0%) | 0.061 (7.7%) | 0.013 (72.3%) |
| PA | AR = 0.8 | AR = 1.0 | AR = 1.2 | AR = 1.4 | AR = 1.6 | AR = 1.8 |
|---|---|---|---|---|---|---|
| 0.10 | 0.86 (0.95, 0.85) | 0.83 (0.94, 0.83) | 0.85 (0.98, 0.86) | 0.93 (0.99, 0.91) | 0.92 (0.99, 0.92) | 0.88 (0.97, 0.87) |
| 0.20 | 0.75 (0.92, 0.75) | 0.83 (0.95, 0.77) | 0.90 (0.96, 0.89) | 0.83 (0.97, 0.88) | 0.78 (0.94, 0.79) | 0.81 (0.90, 0.81) |
| 0.30 | 0.84 (0.95, 0.77) | 0.84 (0.98, 0.86) | 0.90 (0.97, 0.91) | 0.85 (0.98, 0.84) | 0.83 (0.95, 0.84) | 0.89 (0.98, 0.88) |
| 0.40 | 0.80 (0.93, 0.84) | 0.86 (0.98, 0.88) | 0.85 (0.96, 0.84) | 0.84 (0.94, 0.84) | 0.81 (0.94, 0.84) | 0.82 (0.94, 0.83) |
| PA | AR = 0.8 | AR = 1.0 | AR = 1.2 | AR = 1.4 | AR = 1.6 | AR = 1.8 |
|---|---|---|---|---|---|---|
| 0.10 | 0.91 (0.98, 0.91) | 0.89 (0.98, 0.90) | 0.96 (1.00, 0.96) | 0.96 (1.00, 0.97) | 0.93 (0.99, 0.92) | 0.95 (0.99, 0.95) |
| 0.20 | 0.90 (0.99, 0.91) | 0.91 (0.98, 0.92) | 0.97 (0.99, 0.97) | 0.97 (0.99, 0.96) | 0.93 (0.99, 0.93) | 0.91 (0.99, 0.93) |
| 0.30 | 0.87 (0.98, 0.86) | 0.97 (1.00, 0.96) | 0.95 (0.99, 0.95) | 0.96 (1.00, 0.95) | 0.95 (0.99, 0.96) | 0.89 (0.99, 0.90) |
| 0.40 | 0.93 (0.99, 0.92) | 0.93 (0.98, 0.92) | 0.93 (0.99, 0.91) | 0.94 (0.99, 0.93) | 0.92 (0.99, 0.92) | 0.85 (0.97, 0.91) |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wen, K.; Mou, J.; Lu, X. Peer Reporting: Sampling Design and Unbiased Estimates. Entropy 2026, 28, 116. https://doi.org/10.3390/e28010116

