# Design-Based Approach for Analysing Survey Data in Veterinary Research

## Abstract

## 1. Introduction

## 2. Overview of Probability Sampling

${S}_{1}$, ${S}_{2}$, ${S}_{3}$, … We can then define a set called the “sample space” (denoted as Ω) that contains all these samples. With probability sampling, a probability can be explicitly assigned to each of the samples, subject to the constraint that ${\sum}_{i=1}^{L}P\left({S}_{i}\right)=1$, because the L samples are mutually exclusive outcomes that together make up the sample space, whose probability is 1 by the axioms of probability. The probability of obtaining each of the L samples does not have to be constant (i.e., $P\left({S}_{1}\right)\ne P\left({S}_{2}\right)$ is perfectly acceptable), and we can also set the probability of a particular sample to 0 if some animals within the sample are considered inappropriate as study units. Samples can also overlap: two samples can include the same animals, and the probability of an animal k being selected $\left({\pi}_{k}\right)$, its inclusion probability, is calculated by summing the probabilities of all samples that include this animal, i.e., ${\pi}_{k}={\sum}_{S:k\in S}P\left(S\right)$. An intuitive numeric example is displayed in Figure 1. Finally, we define the sampling weight ${w}_{k}$ as the reciprocal of the inclusion probability, ${w}_{k}=1/{\pi}_{k}$, for any type of sampling method [15]. Generally, it is recommended that veterinary researchers interpret the sampling weight of animal k as the number of animals in the target population represented by this animal (a deeper treatment of sampling weights can be found in Gelman [16]; however, non-response adjustments are beyond the scope of this article).
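The calculation of ${\pi}_{k}$ and ${w}_{k}$ can be sketched in a few lines of Python. This is an illustration only: the four animals, the fixed sample size of two, and the sample probabilities below are hypothetical values chosen for the sketch (they are not the numbers shown in Figure 1), and the function name `inclusion_probabilities` is our own.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of four animals; every sample contains two animals.
animals = ["A", "B", "C", "D"]
samples = list(combinations(animals, 2))  # AB, AC, AD, BC, BD, CD

# Hypothetical, unequal sample probabilities P(S); the sample {A, D} is
# deliberately given probability 0 to show that this is permitted.
probs = [Fraction(p, 12) for p in (4, 2, 0, 1, 2, 3)]
assert sum(probs) == 1  # the probabilities over the sample space sum to 1


def inclusion_probabilities(samples, probs):
    """Return pi_k: the sum of P(S) over all samples S that contain animal k."""
    pi = {}
    for sample, p in zip(samples, probs):
        for animal in sample:
            pi[animal] = pi.get(animal, Fraction(0)) + p
    return pi


pi = inclusion_probabilities(samples, probs)
weights = {k: 1 / pi_k for k, pi_k in pi.items()}  # w_k = 1 / pi_k

for k in animals:
    print(k, "pi =", pi[k], "w =", weights[k])
```

Because every sample in this sketch has the same size, the inclusion probabilities sum to that sample size (here 2), which is a convenient check on the arithmetic.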

## 3. Design-Based and Model-Based Approaches

#### 3.1. Overview of Design-Based Approach

#### 3.2. Overview of Model-Based Approach

## 4. Sampling Methods

#### 4.1. Simple Random Sampling

#### 4.2. Stratified Random Sampling

#### 4.3. Cluster Sampling

#### 4.3.1. One-Stage Cluster Sampling

#### 4.3.2. Two-Stage Cluster Sampling

- $\mathbb{E}\left[\widehat{\tau}\right]=\mathbb{E}\left[\mathbb{E}\left[\widehat{\tau}|\mathit{Z}\right]\right]=\mathbb{E}\left[\mathbb{E}\left[{\sum}_{i=1}^{n}\left.\frac{N}{n}{\widehat{\tau}}_{i}\right|\mathit{Z}\right]\right]=\mathbb{E}\left[\mathbb{E}\left[{\sum}_{i=1}^{N}\left.\frac{N}{n}{Z}_{i}{\widehat{\tau}}_{i}\right|\mathit{Z}\right]\right]$ (partition theorem for expectations)
- $=\mathbb{E}\left[{\sum}_{i=1}^{N}\mathbb{E}\left[\left.\frac{N}{n}{Z}_{i}{\widehat{\tau}}_{i}\right|\mathit{Z}\right]\right]$ (the conditional expectation of a sum is the sum of the conditional expectations)
- $=\mathbb{E}\left[{\sum}_{i=1}^{N}\frac{N}{n}\mathbb{E}\left[\left.{Z}_{i}{\widehat{\tau}}_{i}\right|\mathit{Z}\right]\right]$ (expectation is a linear operator and $\frac{N}{n}$ is a constant)
- $=\mathbb{E}\left[{\sum}_{i=1}^{N}\frac{N}{n}{Z}_{i}\mathbb{E}\left[\left.{\widehat{\tau}}_{i}\right|\mathit{Z}\right]\right]$ (conditioning on the vector $\mathit{Z}$, the selection status of every herd, means that each element ${Z}_{i}$ is known, so ${Z}_{i}$ can be taken outside the conditional expectation)
- $=\mathbb{E}\left[{\sum}_{i=1}^{N}\frac{N}{n}{Z}_{i}\mathbb{E}\left[{\widehat{\tau}}_{i}\right]\right]$ (${\widehat{\tau}}_{i}$ and $\mathit{Z}$ are independent)
- $=\mathbb{E}\left[{\sum}_{i=1}^{N}\frac{N}{n}{Z}_{i}{\tau}_{i}\right]$ (within each herd, the stratified random sampling estimator ${\widehat{\tau}}_{i}$ is unbiased for ${\tau}_{i}$)
- $={\sum}_{i=1}^{N}\frac{N}{n}{\tau}_{i}\mathbb{E}\left[{Z}_{i}\right]={\sum}_{i=1}^{N}\frac{N}{n}{\tau}_{i}\frac{n}{N}={\sum}_{i=1}^{N}{\tau}_{i}=\tau $ (linearity of expectation, with $\mathbb{E}\left[{Z}_{i}\right]=\frac{n}{N}$ being the probability that herd i is selected).
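The unbiasedness result derived above can be checked numerically by brute force on a very small population. The sketch below is an illustration under stated assumptions rather than the analysis of any real survey: the three herds, their strata, the 0/1 outcomes and the sample sizes are all hypothetical; herds are assumed to be drawn by simple random sampling without replacement (so that $\mathbb{E}\left[{Z}_{i}\right]=\frac{n}{N}$); and the herd-level estimator is assumed to be the usual stratified estimator, ${\widehat{\tau}}_{i}={\sum}_{j}{M}_{ij}{\overline{y}}_{ij}$, where ${\overline{y}}_{ij}$ is the sample mean in stratum j of herd i.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical region: N = 3 herds, each with J = 2 strata of 0/1 outcomes y_ijk.
herds = [
    [[1, 0, 0], [1, 1]],
    [[0, 0, 1, 1], [0, 1]],
    [[1, 1, 0], [0, 0, 0]],
]
m = [[2, 1], [2, 1], [2, 2]]  # hypothetical second-stage sample sizes m_ij
n = 2                         # number of herds drawn at the first stage


def herd_estimates(strata, sizes):
    """All equally likely values of tau_hat_i under stratified random sampling."""
    per_stratum = []
    for y, m_ij in zip(strata, sizes):
        M_ij = len(y)
        # M_ij times the sample mean, for every possible sample of size m_ij
        per_stratum.append(
            [Fraction(M_ij * sum(s), m_ij) for s in combinations(y, m_ij)]
        )
    # strata are sampled independently, so combine every stratum-level outcome
    return [sum(parts) for parts in product(*per_stratum)]


def expected_total(herds, m, n):
    """Exact E[tau_hat], enumerating every first- and second-stage sample."""
    N = len(herds)
    estimates = [herd_estimates(h, s) for h, s in zip(herds, m)]
    first_stage = list(combinations(range(N), n))
    expectation = Fraction(0)
    for sample in first_stage:
        p_sample = Fraction(1, len(first_stage))  # each herd sample equally likely
        outcomes = list(product(*(estimates[i] for i in sample)))
        for taus in outcomes:
            p = p_sample / len(outcomes)
            expectation += p * Fraction(N, n) * sum(taus)  # tau_hat = (N/n) * sum
    return expectation


true_total = sum(sum(sum(stratum) for stratum in herd) for herd in herds)
print(expected_total(herds, m, n), true_total)
```

Exact fractions are used instead of floating point so that the expectation of $\widehat{\tau}$ can be compared with $\tau$ by strict equality.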

## 5. Sample Size Consideration

## 6. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A

## Appendix B

## Appendix C

## Appendix D

## Appendix E

- Joint probability: $P\left(A,B\right)=P\left(A|B\right)P\left(B\right)$,
- Independence of two events: $P\left(A|B\right)=P\left(A\right),$
- Law of total probability: $P\left(A\right)=P\left(A|B\right)P\left(B\right)+P\left(A|{B}^{c}\right)P\left({B}^{c}\right),$
- Expectation of a discrete random variable: $\mathbb{E}\left[X\right]={\sum}_{x}xf\left(x\right),$
- Property of expectation: $\mathbb{E}\left[aX+b\right]=a\mathbb{E}\left[X\right]+b,$
- Variance of a random variable: $Var\left(X\right)=\mathbb{E}\left[{X}^{2}\right]-\mathbb{E}{\left[X\right]}^{2},$
- Properties of variance:
  - $Var\left(aX+b\right)={a}^{2}Var\left(X\right),$
  - If ${X}_{1},{X}_{2},\dots ,{X}_{n}$ are mutually independent, then $Var\left({\sum}_{i=1}^{n}{X}_{i}\right)={\sum}_{i=1}^{n}Var\left({X}_{i}\right),$
- Bias: $\mathrm{Bias}\left(\widehat{\theta}\right)=\mathbb{E}\left[\widehat{\theta}\right]-\theta ,$ unbiasedness implies $\mathrm{Bias}\left(\widehat{\theta}\right)=0,$
- Expectation for a function of two random variables: $\mathbb{E}\left[g\left(X,Y\right)\right]={\sum}_{x}{\sum}_{y}g\left(x,y\right)f\left(x,y\right),$
- Covariance: $Cov\left(X,Y\right)=\mathbb{E}\left[XY\right]-\mathbb{E}\left[X\right]\mathbb{E}\left[Y\right],$
- Properties of covariance:
  - Independence between $X$ and $Y\Longrightarrow Cov\left(X,Y\right)=\mathbb{E}\left[XY\right]-\mathbb{E}\left[X\right]\mathbb{E}\left[Y\right]=0,$
  - $Var\left(X\right)=Cov\left(X,X\right),$
  - $Cov\left(aX+b,cY+d\right)=acCov\left(X,Y\right),$
  - $Cov\left({\sum}_{i=1}^{n}{X}_{i},{\sum}_{j=1}^{n}{X}_{j}\right)={\sum}_{i=1}^{n}{\sum}_{j=1}^{n}Cov\left({X}_{i},{X}_{j}\right)={\sum}_{i=1}^{n}Var\left({X}_{i}\right)+{\sum}_{i=1}^{n}{\sum}_{j\ne i}^{n}Cov\left({X}_{i},{X}_{j}\right),$
- Conditional expectation: $\mathbb{E}\left[X|Y\right]$ is a random variable that is a function of $Y$ (its value varies with $Y$),
- Properties of conditional expectation:
  - Independence between $X$ and $Y\Longrightarrow \mathbb{E}\left[X|Y\right]=\mathbb{E}\left[X\right]$ and $\mathbb{E}\left[g\left(X\right)|Y\right]=\mathbb{E}\left[g\left(X\right)\right],$
  - $\mathbb{E}\left[g\left(Y\right)|Y\right]=g\left(Y\right)$ and $\mathbb{E}\left[g\left(Y\right)X|Y\right]=g\left(Y\right)\mathbb{E}\left[X|Y\right],$
  - $\mathbb{E}\left[aX+bY|Z\right]=a\mathbb{E}\left[X|Z\right]+b\mathbb{E}\left[Y|Z\right],$
- Partition theorem for expectations: $\mathbb{E}\left[X\right]=\mathbb{E}\left[\mathbb{E}\left[X|Y\right]\right],$
- Variance partition formula: $Var\left(X\right)=Var\left(\mathbb{E}\left[X|Y\right]\right)+\mathbb{E}\left[Var\left(X|Y\right)\right],$
- Conditional variance formula: $Var\left(X|Y\right)=\mathbb{E}\left[{X}^{2}|Y\right]-\mathbb{E}{\left[X|Y\right]}^{2}.$
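As a quick numerical check of the last three results, the sketch below evaluates both sides of the partition theorem and the variance partition formula for a small discrete joint distribution. The joint probability mass function is hypothetical and chosen only for illustration, and the helper names (`expect`, `p_y`, `cond_expect`) are our own.

```python
from fractions import Fraction as F

# Hypothetical joint pmf f(x, y) of two discrete random variables X and Y.
joint = {
    (0, 0): F(1, 8), (1, 0): F(3, 8),
    (0, 1): F(1, 4), (2, 1): F(1, 4),
}
assert sum(joint.values()) == 1


def expect(g):
    """E[g(X, Y)] = sum over x and y of g(x, y) f(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def p_y(y0):
    """Marginal probability P(Y = y0)."""
    return sum(p for (_, y), p in joint.items() if y == y0)


def cond_expect(g, y0):
    """E[g(X) | Y = y0]."""
    return sum(g(x) * p for (x, y), p in joint.items() if y == y0) / p_y(y0)


ys = sorted({y for _, y in joint})
e_x = expect(lambda x, y: x)
var_x = expect(lambda x, y: x * x) - e_x ** 2

# E[X | Y] and Var(X | Y), evaluated at each possible value of Y
cond_mean = {y: cond_expect(lambda x: x, y) for y in ys}
cond_var = {y: cond_expect(lambda x: x * x, y) - cond_mean[y] ** 2 for y in ys}

# Partition theorem: E[X] = E[E[X | Y]]
tower = sum(cond_mean[y] * p_y(y) for y in ys)

# Variance partition: Var(X) = Var(E[X | Y]) + E[Var(X | Y)]
var_of_mean = sum(cond_mean[y] ** 2 * p_y(y) for y in ys) - tower ** 2
mean_of_var = sum(cond_var[y] * p_y(y) for y in ys)

print(e_x, tower)
print(var_x, var_of_mean + mean_of_var)
```

Each printed pair should show the same value twice, which is what the two formulas state.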

## References

1. Sano, H.; Barker, K.; Odom, T.; Lewis, K.; Giordano, P.; Walsh, V.; Chambers, J.P. A survey of dog and cat anaesthesia in a sample of veterinary practices in New Zealand. N. Z. Vet. J. **2018**, 66, 85–92.
2. Thomson, K.; Rantala, M.; Hautala, M.; Pyörälä, S.; Kaartinen, L. Cross-sectional prospective survey to study indication-based usage of antimicrobials in animals: Results of use in cattle. BMC Vet. Res. **2008**, 4, 15.
3. Ouyang, Z.; Sargeant, J.; Thomas, A.; Wycherley, K.; Ma, R.; Esmaeilbeigi, R.; Versluis, A.; Stacey, D.; Stone, E.; Poljak, Z.; et al. A scoping review of ‘big data’, ‘informatics’, and ‘bioinformatics’ in the animal health and veterinary medical literature. Anim. Health Res. Rev. **2019**, 20, 1–18.
4. Valletta, J.J.; Torney, C.; Kings, M.; Thornton, A.; Madden, J. Applications of machine learning in animal behaviour studies. Anim. Behav. **2017**, 124, 203–220.
5. Cernek, P.; Bollig, N.; Anklam, K.; Döpfer, D. Hot topic: Detecting digital dermatitis with computer vision. J. Dairy Sci. **2020**, 103, 9110–9115.
6. Astill, J.; Dara, R.A.; Fraser, E.D.G.; Sharif, S. Detecting and predicting emerging disease in poultry with the implementation of new technologies and big data: A focus on avian influenza virus. Front. Vet. Sci. **2018**, 5.
7. Skinner, C.; Wakefield, J. Introduction to the design and analysis of complex survey data. Stat. Sci. **2017**, 32, 165–175.
8. Revilla, M.; Lenoir, G.; Flatres-Grall, L.; Muñoz-Tamayo, R.; Friggens, N.C. Quantifying growth perturbations over the fattening period in swine via mathematical modelling. bioRxiv **2020**.
9. Mansfield, H.C.; Winthrop, D. Alexis de Tocqueville, Democracy in America; University of Chicago Press: Chicago, IL, USA, 2000.
10. Gregoire, T.G. Design-based and model-based inference in survey sampling: Appreciating the difference. Can. J. For. Res. **1998**, 28, 1429–1447.
11. Jones, G.; Johnson, W.O. A Bayesian superpopulation approach to inference for finite populations based on imperfect diagnostic outcomes. J. Agric. Biol. Environ. Stat. **2016**, 21, 314–327.
12. Yang, D.A.; Johnson, W.O.; Müller, K.R.; Gates, M.C.; Laven, R.A. Estimating the herd and cow level prevalence of bovine digital dermatitis on New Zealand dairy farms: A Bayesian superpopulation approach. Prev. Vet. Med. **2019**, 165, 76–84.
13. Little, R.J. To model or not to model? Competing modes of inference for finite population sampling. J. Am. Stat. Assoc. **2004**, 99, 546–556.
14. Baffetta, F.; Fattorini, L.; Franceschi, S.; Corona, P. Design-based approach to k-nearest neighbours technique for coupling field and remotely sensed data in forest surveys. Remote Sens. Environ. **2009**, 113, 463–475.
15. Pfeffermann, D. The use of sampling weights for survey data analysis. Stat. Methods Med. Res. **1996**, 5, 239–261.
16. Gelman, A. Struggles with survey weighting and regression modeling. Stat. Sci. **2007**, 22, 153–164.
17. Chen, Q.; Elliott, M.R.; Haziza, D.; Yang, Y.; Ghosh, M.; Little, R.J.; Sedransk, J.; Thompson, M. Approaches to improving survey-weighted estimates. Stat. Sci. **2017**, 32, 227–248.
18. Stehman, S.V. Practical implications of design-based sampling inference for thematic map accuracy assessment. Remote Sens. Environ. **2000**, 72, 35–45.
19. Tate, J.E.; Hudgens, M.G. Estimating population size with two-and three-stage sampling designs. Am. J. Epidemiol. **2007**, 165, 1314–1320.
20. Dorazio, R.M. Design-based and model-based inference in surveys of freshwater mollusks. J. N. Am. Benthol. Soc. **1999**, 18, 118–131.
21. West, P.W. Simple random sampling of individual items in the absence of a sampling frame that lists the individuals. N. Z. J. For. Sci. **2016**, 46, 15.
22. Abera, Z.; Degefu, H.; Gari, G.; Kidane, M. Sero-prevalence of lumpy skin disease in selected districts of West Wollega zone, Ethiopia. BMC Vet. Res. **2015**, 11, 135.
23. Abebe, R.; Hatiya, H.; Abera, M.; Megersa, B.; Asmare, K. Bovine mastitis: Prevalence, risk factors and isolation of Staphylococcus aureus in dairy herds at Hawassa milk shed, South Ethiopia. BMC Vet. Res. **2016**, 12, 270.
24. Sulayeman, M.; Dawo, F.; Mammo, B.; Gizaw, D.; Shegu, D. Isolation, molecular characterization and sero-prevalence study of foot-and-mouth disease virus circulating in central Ethiopia. BMC Vet. Res. **2018**, 14, 110.
25. Horvitz, D.G.; Thompson, D.J. A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. **1952**, 47, 663–685.
26. Cochran, W.G. Sampling Techniques, 3rd ed.; John Wiley & Sons, Inc.: New York, NY, USA, 1977.
27. Lohr, S.L. Sampling: Design and Analysis; Cengage Learning: Boston, MA, USA, 2010.
28. Heayns, B.; Baugh, S. Survey of veterinary surgeons on the introduction of serological testing to assess revaccination requirements. Vet. Rec. **2012**, 170, 74.
29. Atuman, Y.J.; Ogunkoya, A.B.; Adawa, D.A.Y.; Nok, A.J.; Biallah, M.B. Dog ecology, dog bites and rabies vaccination rates in Bauchi State, Nigeria. Int. J. Vet. Sci. Med. **2014**, 2, 41–45.
30. Kaler, J.; Wani, S.A.; Hussain, I.; Beg, S.A.; Makhdoomi, M.; Kabli, Z.A.; Green, L.E. A clinical trial comparing parenteral oxytetracyline and enrofloxacin on time to recovery in sheep lame with acute or chronic footrot in Kashmir, India. BMC Vet. Res. **2012**, 8, 12.
31. Wickham, J.D.; Stehman, S.V.; Smith, J.H.; Wade, T.G.; Yang, L. A priori evaluation of two-stage cluster sampling for accuracy assessment of large-area land-cover maps. Int. J. Remote Sens. **2004**, 25, 1235–1252.
32. Getahun, K.; Kelay, B.; Bekana, M.; Lobago, F. Bovine mastitis and antibiotic resistance patterns in Selalle smallholder dairy farms, central Ethiopia. Trop. Anim. Health Prod. **2008**, 40, 261–268.
33. Regassa, A.; Tassew, A.; Amenu, K.; Megersa, B.; Abunna, F.; Mekibib, B.; Macrotty, T.; Ameni, G. A cross-sectional study on bovine tuberculosis in Hawassa town and its surroundings, Southern Ethiopia. Trop. Anim. Health Prod. **2010**, 42, 915–920.
34. Solís-Calderón, J.J.; Segura-Correa, J.C.; Aguilar-Romero, F.; Segura-Correa, V.M. Detection of antibodies and risk factors for infection with bovine respiratory syncytial virus and parainfluenza virus-3 in beef cattle of Yucatan, Mexico. Prev. Vet. Med. **2007**, 82, 102–110.
35. Hotchkiss, J.W.; Reid, S.; Christley, R. A survey of horse owners in Great Britain regarding horses in their care. Part 1: Horse demographic characteristics and management. Equine Vet. J. **2007**, 39, 294–300.
36. Bisson, A.; Maley, S.; Rubaire-Akiiki, C.; Wastling, J. The seroprevalence of antibodies to Toxoplasma gondii in domestic goats in Uganda. Acta Trop. **2000**, 76, 33–38.
37. Stevenson, M.A. Sample size estimation in veterinary epidemiologic research. Front. Vet. Sci. **2021**, 7.

**Figure 1.** An intuitive explanation of the probability of a sample selection P(S) and the probability of an animal selection π.

**Table 1.** Quantities used in a two-stage cluster sampling design, where stratified random sampling is implemented in the second stage.

| Symbol | Definition |
|---|---|
| $N$ | The number of dairy herds in the region. |
| $n$ | The number of dairy herds in the sample. |
| ${M}_{ij}$ | The number of cows in the jth stratum in the ith herd. |
| ${m}_{ij}$ | The sample size in the jth stratum in the ith herd. |
| ${M}_{i}$ | The number of cows in the ith herd (herd size for herd i), ${M}_{i}={\sum}_{j=1}^{J}{M}_{ij}$. |
| ${m}_{i}$ | The sample size for herd i. |
| $M$ | The total number of cows in the region, $M={\sum}_{i=1}^{N}{M}_{i}$. |
| ${y}_{ijk}$ | The disease outcome (1/0) of the kth cow in the jth stratum in the ith herd. |
| ${\tau}_{i}$ | The total number of diseased cows in the ith herd, ${\tau}_{i}={\sum}_{j=1}^{J}{\sum}_{k=1}^{{M}_{ij}}{y}_{ijk}$. |
| ${p}_{i}$ | The herd prevalence for the ith herd, ${p}_{i}=\frac{{\tau}_{i}}{{M}_{i}}$. |
| $\tau $ | The total number of diseased cows in the region, $\tau ={\sum}_{i=1}^{N}{\sum}_{j=1}^{J}{\sum}_{k=1}^{{M}_{ij}}{y}_{ijk}$. |
| $p$ | The overall prevalence in the region, $p=\frac{\tau}{M}$. |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Yang, D.A.; Laven, R.A.
Design-Based Approach for Analysing Survey Data in Veterinary Research. *Vet. Sci.* **2021**, *8*, 105.
https://doi.org/10.3390/vetsci8060105
