# Design-Based Approach for Analysing Survey Data in Veterinary Research

## Abstract

## 1. Introduction

## 2. Overview of Probability Sampling

Denote the possible samples that can be drawn from the target population as ${S}_{1},{S}_{2},{S}_{3},\dots ,{S}_{L}$. We can then define the sample space (denoted as Ω), the set that contains all these samples. With probability sampling, a probability is explicitly assigned to each sample, with the constraint that ${\sum}_{i=1}^{L}P\left({S}_{i}\right)=1$: the probability axioms require the sample space to have probability 1, and the union of all the samples forms the sample space. The probabilities of the $L$ samples do not have to be equal (i.e., $P\left({S}_{1}\right)\ne P\left({S}_{2}\right)$ is acceptable), and we can also set the probability of a particular sample to 0 if some animals within the sample are considered inappropriate as study units. Different samples can also include the same animals, so the probability of an animal $k$ being selected, the inclusion probability ${\pi}_{k}$, is calculated by summing the probabilities of all samples that include this animal, i.e., ${\pi}_{k}={\sum}_{S:k\in S}P\left(S\right)$. An intuitive numeric example is displayed in Figure 1. Finally, we define the sampling weight ${w}_{k}$ as the reciprocal of the inclusion probability, ${w}_{k}=1/{\pi}_{k}$, for any type of sampling method [15]. Generally, it is recommended that veterinary researchers interpret the sampling weight of animal $k$ as the number of animals in the target population represented by this animal (a deeper treatment of sampling weights can be found in Gelman [16]; non-response adjustments are beyond the scope of this article).
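The relationship between sample probabilities, inclusion probabilities, and sampling weights can be illustrated with a minimal sketch. The four-animal population, the sample size of 2, and the assigned probabilities below are hypothetical values chosen for illustration; they are not the numbers shown in Figure 1.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of four animals (labels are illustrative only).
animals = ["a", "b", "c", "d"]

# Sample space: all samples of size 2, in the order produced by combinations():
# {a,b}, {a,c}, {a,d}, {b,c}, {b,d}, {c,d}
samples = [frozenset(s) for s in combinations(animals, 2)]

# Assumed design: unequal P(S), with the sample {c,d} excluded (probability 0).
probs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 8),
         Fraction(1, 8), Fraction(1, 4), Fraction(0)]
assert sum(probs) == 1  # probabilities over the sample space must sum to 1

# Inclusion probability: pi_k = sum of P(S) over all samples S that contain k.
pi = {k: sum(p for s, p in zip(samples, probs) if k in s) for k in animals}

# Sampling weight: w_k = 1 / pi_k.
w = {k: 1 / pi[k] for k in animals}

for k in animals:
    print(k, pi[k], w[k])
```

Under these assumed probabilities, animals a and b have ${\pi}_{k}=5/8$ and weight $8/5$, while animals c and d have ${\pi}_{k}=3/8$ and weight $8/3$. The inclusion probabilities sum to 2, the fixed sample size.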

## 3. Design-Based and Model-Based Approaches

#### 3.1. Overview of Design-Based Approach

#### 3.2. Overview of Model-Based Approach

## 4. Sampling Methods

#### 4.1. Simple Random Sampling

#### 4.2. Stratified Random Sampling

#### 4.3. Cluster Sampling

#### 4.3.1. One-Stage Cluster Sampling

#### 4.3.2. Two-Stage Cluster Sampling

- $\mathbb{E}\left[\widehat{\tau}\right]=\mathbb{E}\left[\mathbb{E}\left[\widehat{\tau}|\mathit{Z}\right]\right]=\mathbb{E}\left[\mathbb{E}\left[{\sum}_{i=1}^{n}\left.\frac{N}{n}{\widehat{\tau}}_{i}\right|\mathit{Z}\right]\right]=\mathbb{E}\left[\mathbb{E}\left[{\sum}_{i=1}^{N}\left.\frac{N}{n}{Z}_{i}{\widehat{\tau}}_{i}\right|\mathit{Z}\right]\right]$ (partition theorem for expectations)
- $=\mathbb{E}\left[{\sum}_{i=1}^{N}\mathbb{E}\left[\left.\frac{N}{n}{Z}_{i}{\widehat{\tau}}_{i}\right|\mathit{Z}\right]\right]$ (the conditional expectation of a sum is the sum of the conditional expectations)
- $=\mathbb{E}\left[{\sum}_{i=1}^{N}\frac{N}{n}\mathbb{E}\left[\left.{Z}_{i}{\widehat{\tau}}_{i}\right|\mathit{Z}\right]\right]$ (expectation is a linear operator and $\frac{N}{n}$ is a constant)
- $=\mathbb{E}\left[{\sum}_{i=1}^{N}\frac{N}{n}{Z}_{i}\mathbb{E}\left[\left.{\widehat{\tau}}_{i}\right|\mathit{Z}\right]\right]$ (conditional on the vector $\mathit{Z}$, the selection status ${Z}_{i}$ of every herd is known, so ${Z}_{i}$ can be taken outside the conditional expectation)
- $=\mathbb{E}\left[{\sum}_{i=1}^{N}\frac{N}{n}{Z}_{i}\mathbb{E}\left[{\widehat{\tau}}_{i}\right]\right]$ (${\widehat{\tau}}_{i}$ and $\mathit{Z}$ are independent)
- $=\mathbb{E}\left[{\sum}_{i=1}^{N}\frac{N}{n}{Z}_{i}{\tau}_{i}\right]$ (${\widehat{\tau}}_{i}$ is an unbiased estimator of ${\tau}_{i}$ under stratified random sampling within each herd)
- $={\sum}_{i=1}^{N}\frac{N}{n}{\tau}_{i}\mathbb{E}\left[{Z}_{i}\right]={\sum}_{i=1}^{N}\frac{N}{n}{\tau}_{i}\frac{n}{N}={\sum}_{i=1}^{N}{\tau}_{i}=\tau $ (linearity of expectation, with $\mathbb{E}\left[{Z}_{i}\right]=\frac{n}{N}$ because each herd is selected with probability $\frac{n}{N}$).
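The derivation above can be checked numerically by exact enumeration on a toy population. The sketch below assumes a hypothetical region of three herds with two strata each, simple random sampling of $n=2$ herds in the first stage, and stratified random sampling within each selected herd, with ${\widehat{\tau}}_{i}={\sum}_{j}{M}_{ij}{\overline{y}}_{ij}$. The outcomes and the second-stage sample sizes are invented for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical population: herds -> strata -> disease outcomes y_ijk (1/0).
population = [
    [[1, 0, 0], [1, 1]],      # herd 1: M_11 = 3, M_12 = 2
    [[0, 0], [1, 0, 1]],      # herd 2
    [[1, 1, 0, 0], [0, 1]],   # herd 3
]
m = [[2, 1], [1, 2], [2, 1]]  # assumed second-stage sample sizes m_ij
N, n = len(population), 2     # N herds in the region, n herds sampled

def herd_estimates(herd, sizes):
    """Return tau_i-hat for every equally likely stratified sample of one herd."""
    per_stratum = [
        # M_ij * (sample mean) for each possible sample of size m_ij
        [Fraction(len(y) * sum(s), k) for s in combinations(y, k)]
        for y, k in zip(herd, sizes)
    ]
    return [sum(combo) for combo in product(*per_stratum)]

# True total number of diseased cows in the region.
tau = sum(sum(sum(stratum) for stratum in herd) for herd in population)
estimates = [herd_estimates(h, s) for h, s in zip(population, m)]

# Inner expectation: E[tau-hat | Z] for each possible first-stage sample of herds.
conditional_means = []
for herds in combinations(range(N), n):
    outcomes = [Fraction(N, n) * sum(o)
                for o in product(*(estimates[i] for i in herds))]
    conditional_means.append(sum(outcomes) / len(outcomes))

# Outer expectation over the equally likely first-stage samples.
expected_tau_hat = sum(conditional_means) / len(conditional_means)
print(expected_tau_hat, tau)
```

The conditional means differ between first-stage samples, but their average equals $\tau$ exactly. This is the partition theorem for expectations at work in the first line of the derivation.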

## 5. Sample Size Consideration

## 6. Conclusions

## Appendix A

## Appendix B

## Appendix C

## Appendix D

## Appendix E

- Joint probability: $P\left(A,B\right)=P\left(A|B\right)P\left(B\right)$,
- Independence of two events: $P\left(A|B\right)=P\left(A\right),$
- Law of total probability: $P\left(A\right)=P\left(A|B\right)P\left(B\right)+P\left(A|{B}^{c}\right)P\left({B}^{c}\right),$
- Expectation of a discrete random variable: $\mathbb{E}\left[X\right]={\sum}_{x}xf\left(x\right),$
- Property of expectation: $\mathbb{E}\left[aX+b\right]=a\mathbb{E}\left[X\right]+b,$
- Variance of a random variable: $Var\left(X\right)=\mathbb{E}\left[{X}^{2}\right]-\mathbb{E}{\left[X\right]}^{2},$
- Properties of variance:
  - $Var\left(aX+b\right)={a}^{2}Var\left(X\right),$
  - If ${X}_{1},{X}_{2},\dots ,{X}_{n}$ are mutually independent, then $Var\left({\sum}_{i=1}^{n}{X}_{i}\right)={\sum}_{i=1}^{n}Var\left({X}_{i}\right),$

- Bias: $\mathrm{Bias}\left(\widehat{\theta}\right)=\mathbb{E}\left[\widehat{\theta}\right]-\theta ,$ unbiasedness implies $\mathrm{Bias}\left(\widehat{\theta}\right)=0,$
- Expectation for a function of two random variables: $\mathbb{E}\left[g\left(X,Y\right)\right]={\sum}_{x}{\sum}_{y}g\left(x,y\right)f\left(x,y\right),$
- Covariance: $Cov\left(X,Y\right)=\mathbb{E}\left[XY\right]-\mathbb{E}\left[X\right]\mathbb{E}\left[Y\right],$
- Properties of covariance:
  - Independence between $X$ and $Y\Rightarrow Cov\left(X,Y\right)=\mathbb{E}\left[XY\right]-\mathbb{E}\left[X\right]\mathbb{E}\left[Y\right]=0,$
  - $Var\left(X\right)=Cov\left(X,X\right),$
  - $Cov\left(aX+b,cY+d\right)=acCov\left(X,Y\right),$
  - $Cov\left({\sum}_{i=1}^{n}{X}_{i},{\sum}_{j=1}^{n}{X}_{j}\right)={\sum}_{i=1}^{n}{\sum}_{j=1}^{n}Cov\left({X}_{i},{X}_{j}\right)={\sum}_{i=1}^{n}Var\left({X}_{i}\right)+{\sum}_{i=1}^{n}{\sum}_{j\ne i}^{n}Cov\left({X}_{i},{X}_{j}\right),$

- Conditional expectation: $\mathbb{E}\left[X|Y\right]$ is a random variable that is a function of $Y$, so its variation comes from the variation of $Y,$
- Properties of conditional expectation:
  - Independence between $X$ and $Y\Rightarrow \mathbb{E}\left[X|Y\right]=\mathbb{E}\left[X\right]$ and $\mathbb{E}\left[g\left(X\right)|Y\right]=\mathbb{E}\left[g\left(X\right)\right],$
  - $\mathbb{E}\left[g\left(Y\right)|Y\right]=g\left(Y\right)$ and $\mathbb{E}\left[g\left(Y\right)X|Y\right]=g\left(Y\right)\mathbb{E}\left[X|Y\right],$
  - $\mathbb{E}\left[aX+bY|Z\right]=a\mathbb{E}\left[X|Z\right]+b\mathbb{E}\left[Y|Z\right],$

- Partition theorem for expectations: $\mathbb{E}\left[X\right]=\mathbb{E}\left[\mathbb{E}\left[X|Y\right]\right],$
- Variance partition formula: $Var\left(X\right)=Var\left(\mathbb{E}\left[X|Y\right]\right)+\mathbb{E}\left[Var\left(X|Y\right)\right],$
- Conditional variance formula: $Var\left(X|Y\right)=\mathbb{E}\left[{X}^{2}|Y\right]-\mathbb{E}{\left[X|Y\right]}^{2}.$
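As a quick sanity check of the partition theorem and the variance partition formula, the following sketch evaluates both sides on a small discrete joint distribution. The joint probability mass function is a hypothetical example, and exact fractions are used so that the equalities hold without rounding error.

```python
from fractions import Fraction as F

# Hypothetical joint pmf f(x, y); the values are illustrative and sum to 1.
f = {(0, 0): F(1, 8), (1, 0): F(3, 8), (0, 1): F(1, 4), (2, 1): F(1, 4)}

def E(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in f.items())

# Marginal pmf of Y.
f_y = {}
for (x, y), p in f.items():
    f_y[y] = f_y.get(y, 0) + p

def cond_E(g, y0):
    """Conditional expectation E[g(X) | Y = y0]."""
    return sum(g(x) * p for (x, y), p in f.items() if y == y0) / f_y[y0]

# Unconditional variance: Var(X) = E[X^2] - E[X]^2.
var_x = E(lambda x, y: x * x) - E(lambda x, y: x) ** 2

# E[X | Y = y] and Var(X | Y = y) for each value y.
cond_mean = {y0: cond_E(lambda x: x, y0) for y0 in f_y}
cond_var = {y0: cond_E(lambda x: x * x, y0) - cond_mean[y0] ** 2 for y0 in f_y}

# E[E[X|Y]], Var(E[X|Y]) and E[Var(X|Y)], all taken over the distribution of Y.
mean_of_cond_mean = sum(cond_mean[y0] * f_y[y0] for y0 in f_y)
var_of_cond_mean = (sum(cond_mean[y0] ** 2 * f_y[y0] for y0 in f_y)
                    - mean_of_cond_mean ** 2)
mean_of_cond_var = sum(cond_var[y0] * f_y[y0] for y0 in f_y)

print(var_x, var_of_cond_mean + mean_of_cond_var)
```

For this distribution, $Var\left(X\right)=39/64$, which splits into $Var\left(\mathbb{E}\left[X|Y\right]\right)=1/64$ and $\mathbb{E}\left[Var\left(X|Y\right)\right]=38/64$.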

## References

**Figure 1.** An intuitive explanation of the probability of a sample selection P(S) and the probability of an animal selection π.

**Table 1.** Quantities used in a two-stage cluster sampling design, where stratified random sampling is implemented in the second stage.

| Symbol | Definition |
|---|---|
| $N$ | The number of dairy herds in the region. |
| $n$ | The number of dairy herds in the sample. |
| ${M}_{ij}$ | The number of cows in the jth stratum in the ith herd. |
| ${m}_{ij}$ | The sample size in the jth stratum in the ith herd. |
| ${M}_{i}$ | The number of cows in the ith herd (herd size for herd i), ${M}_{i}={\sum}_{j=1}^{J}{M}_{ij}$. |
| ${m}_{i}$ | The sample size for herd i. |
| $M$ | The total number of cows in the region, $M={\sum}_{i=1}^{N}{M}_{i}.$ |
| ${y}_{ijk}$ | The disease outcome (1/0) of the kth cow in the jth stratum in the ith herd. |
| ${\tau}_{i}$ | The total number of diseased cows in the ith herd, ${\tau}_{i}={\sum}_{j=1}^{J}{\sum}_{k=1}^{{M}_{ij}}{y}_{ijk}$. |
| ${p}_{i}$ | The herd prevalence for the ith herd, ${p}_{i}=\frac{{\tau}_{i}}{{M}_{i}}$. |
| $\tau $ | The total number of diseased cows in the region, $\tau ={\sum}_{i=1}^{N}{\sum}_{j=1}^{J}{\sum}_{k=1}^{{M}_{ij}}{y}_{ijk}$. |
| $p$ | The overall prevalence in the region, $p=\frac{\tau}{M}$. |
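To make the notation concrete, the population quantities in Table 1 can be computed from a nested list of outcomes. The sketch below uses a hypothetical region of three herds with two strata each; the outcome values are invented for illustration.

```python
# Hypothetical region: herds -> strata -> disease outcomes y_ijk (1/0).
y = [
    [[1, 0, 0], [1, 1]],      # herd 1
    [[0, 0], [1, 0, 1]],      # herd 2
    [[1, 1, 0, 0], [0, 1]],   # herd 3
]

N = len(y)                                            # number of herds
M_ij = [[len(stratum) for stratum in herd] for herd in y]  # stratum sizes
M_i = [sum(sizes) for sizes in M_ij]                  # herd sizes
M = sum(M_i)                                          # cows in the region

tau_i = [sum(sum(stratum) for stratum in herd) for herd in y]  # diseased cows per herd
p_i = [t / size for t, size in zip(tau_i, M_i)]       # herd prevalences
tau = sum(tau_i)                                      # diseased cows in the region
p = tau / M                                           # overall prevalence

print(N, M_i, M, tau_i, p_i, tau, p)
```

In this toy region the herd sizes are 5, 5, and 6, so $M=16$, $\tau =8$, and $p=0.5$. Note that $p$ is not the simple average of the herd prevalences when herd sizes differ; it weights each ${p}_{i}$ by ${M}_{i}/M$.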

