Bayesian Assessment of True Prevalence of Paratuberculosis Infection in Dairy Herds and Their Parity Subgroups

Katalin Veres; Zsolt Lang; László Ózsvári

doi:10.3390/pathogens14090900

¹ Department of Biostatistics, University of Veterinary Medicine Budapest, 1078 Budapest, Hungary

² Department of Veterinary Forensics and Economics, University of Veterinary Medicine Budapest, 1078 Budapest, Hungary

³ National Laboratory of Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, University of Veterinary Medicine, 1078 Budapest, Hungary

* Author to whom correspondence should be addressed.

Pathogens 2025, 14(9), 900; https://doi.org/10.3390/pathogens14090900

Abstract

Paratuberculosis is a widespread infectious disease in ruminants that leads to significant economic losses in livestock production. In this study, we developed a practical method for predicting the likelihood of the herd-level presence of the infection and estimating its prevalence in subgroups of a dairy herd, specifically first-time calving cows (primiparous) and those that have calved more than once (multiparous). We fit a Bayesian hierarchical model to cow-level data, incorporating prior knowledge about regional prevalence of infection to improve the accuracy and reliability of the estimates. The model was tested using synthetic data representing six regional scenarios in four countries (Chile, Denmark, Italy, and Hungary). The likelihood that a herd is infected is evaluated using Bayes factors and the posterior probability of infection. Both the Bayes factor and the posterior probability of infection classified the simulated herds in accordance with the proportions of infected herds. Summary measures obtained for within-herd true prevalence estimates demonstrated acceptable accuracy. The R and Stan code of the model is available as an open-access tool. The model can be customized for any region using real local data and prior information. The relationship between true and apparent prevalence is linear and stable and therefore can be estimated well. We found that, in Hungary, the TP/AP ratios were 1.6 and 1.5 for primi- and multiparous cows, respectively.

Keywords:

paratuberculosis; Johne’s disease; dairy cattle; prevalence modelling; Bayesian latent class modelling; true prevalence; parity; historical priors; Bayes factor

1. Introduction

Paratuberculosis (PTBC), or Johne’s disease, is a chronic infectious condition caused by Mycobacterium avium subspecies paratuberculosis (MAP), which affects ruminants worldwide and poses a significant challenge to animal health and dairy farm productivity [1,2,3,4,5]. Estimating the prevalence of MAP within herds is a key component of both disease surveillance and economic decision making at the herd level.

MAP infection typically occurs in the early neonatal period through colostrum, bulk colostrum, or a contaminated environment. The infection progresses slowly and has a long incubation period, with around 95% of infected cattle in the subclinical stage and only 5% displaying clinical signs. Both the clinical and subclinical stages of the infection cause substantial economic losses, but the subclinical stage is economically more important; at this stage, the animals do not show obvious symptoms, but the negative impact on production is already present [5]. Resistance to MAP progression increases with age. However, cattle can also become infected at a later age under high infection pressure. In addition to the age of the cows, genetics is also likely to influence susceptibility to MAP infection [3].

Direct and indirect methods exist to detect MAP infection. Direct methods are expensive and have a long turnaround time. Milk and serum ELISA are indirect methods that detect antibodies in the sample. Both have low costs and a rapid turnaround time. Milk ELISA is not invasive and can be routinely and conveniently performed [3]. S/P ratios (numerical ELISA test results) are classified into negative and positive categories according to predefined cut-offs. To avoid excessive false positive results, the cut-off of MAP ELISA is calibrated to favor specificity over sensitivity [6]. ELISA specificity is high (98–100%); its main disadvantage is low sensitivity (15% for milk ELISA) in the early stages of infection, caused by the delay in the appearance of antibodies. The sensitivity depends on the stage of the infection, on the stage of lactation, and on the age of the animal, reaching 90% in clinically affected cattle [3].

The latent and slow progression of MAP infection and the low sensitivity of diagnostic tests make the diagnosis challenging, leading to a marked difference between apparent and true prevalence levels. Disease frequency is the base measure serving as a starting point for the choice of appropriate testing method, control measures, and the assessment of the efficiency of the control program. Using crude apparent prevalence in herd health management can lead to poor decisions and to the failure of control programs. Appropriate statistical methods are needed to adjust apparent prevalence calculated from test results to obtain a prevalence estimate reflecting the true, underlying prevalence of a herd.

There has been considerable research interest worldwide in assessing PTBC prevalence at the national or regional level. Nielsen and Toft [7] provided a comprehensive review of paratuberculosis prevalence across European farmed animals, focusing largely on regional data. Hierarchical Bayesian models were fitted to clustered PTBC prevalence data, where the true infection statuses of herds and animals were treated as unobserved latent classes [8,9,10,11,12]. Herd true prevalence (HTP) and conditional within-herd true prevalence (CWHP) were estimated using the methods originally developed by Hanson et al. [13] and Branscum et al. [14]. McAloon et al. [11] and other authors [8,9,10,12] modified the methodology of Branscum et al. [14] by introducing new notations and estimated PTBC prevalence in several regions.

In practice, prevalence estimation of different contagious diseases at the herd level is related mainly to substantiating freedom from infection [15], ranging from simple hypothesis tests [16,17] to complex Bayesian frameworks [18].

The aim of this study was to present a Bayesian model developed for a single herd, which can be used to estimate the within-herd animal level true prevalence of PTBC infection separately for primiparous and multiparous cows, as well as to infer the infection status of the herd, based on PTBC diagnostic test results and region-specific prior information. Additionally, we aimed to provide an internationally relevant context for our analysis by integrating prior information from diverse regional studies together with region-specific synthetic data.

2. Materials and Methods

We used the Bayes factor and the posterior infection probability to predict herd-level PTBC infection [19,20]. Within-herd animal-level prevalence was estimated using the method developed by Veres et al. [21] in the subgroups of primiparous and multiparous cows. We had real-world data available only from Hungary, but we also generated synthetic datasets with the characteristics of regions for which prior information was available in the literature. To demonstrate how the model worked in practice, we used the Hungarian field data together with relevant prior knowledge. A herd was considered infected if it had at least one truly infected cow. Throughout the study, we assumed the use of the IDEXX milk ELISA test. The sensitivity (Se) of the test was assumed to be age-dependent and calculated according to (1). The specificity (Sp) of the test was considered to be 98.6% [22].

2.1. Hungarian Data

A nationwide voluntary PTBC testing program started in February 2018 in Hungary, which was set up for dairy farms to provide help for herd managers to reduce the prevalence and the impact of PTBC in their herds. A Bayesian model was fitted to these data in our previous study [21] and the obtained posterior distribution was used as informative prior in this study. The data consisted of the PTBC screening test results of 55,594 cows in 116 large (number of cows ≥ 100) intensive dairy cattle herds, which represented 24.3% of all dairy cows in Hungary. The detailed description of this dataset and the analysis providing the posterior distributions can be found in Veres et al. [21]. The data collected were used for the following purposes: (1) as the basis for the prior distribution; (2) we illustrated our method using data from two real herds selected from the database; (3) we generated synthetic data for other regions from one of the large Hungarian herds in the database, as described below.

2.2. Data for Other Regions

We analyzed synthetic datasets mimicking herds in Hungary, Denmark, Southern Italy, Lombardy (Northern Italy), Veneto (Northern Italy), and Chile. To imitate real-world data, we manipulated the Hungarian data so that it corresponded to the prevalence and the average herd size of the regions analyzed (Table 1).

Table 1. Average size of dairy cattle herds reported in the literature and the size of herds used in the region-specific synthetic data.

Based on information from the literature, we generated 20 synthetic datasets for each region as follows: from the data of a sufficiently large Hungarian herd, we drew samples with replacement, in accordance with the average herd size reported for the given region. The original ELISA test results were replaced with independently simulated values. The MAP infection status of the herd was randomly generated from a Bernoulli distribution, with the parameter taken from the prior distribution of the region-specific herd true prevalence (HTP, see Table 2 and Table 3). If the herd was randomized to be infected, the infection probabilities of individual cows in parity subgroups were generated according to a Gaussian copula model [21,27] using region- and parity-specific beta prior distributions (see Appendix B.2 for details).

Table 2. Priors used along with Hungarian data.

Table 3. Priors used in the study.

The age-specific sensitivity was calculated for each cow based on the equation

\mathrm{logit}(Se(t)) = a - b \times e^{-c t}, (1)

where a = 1.2, b = 3, c = 0.3, and \mathrm{logit}(x) = \ln \frac{x}{1 - x}.

We adopted the parameter values in (1) from [28]. The specificity (Sp) of the test was considered to be 98.6% [22]. We calculated the apparent prevalence as the sum of the probability of true positive and false positive test results (A5). Test results of individual cows were drawn from a Bernoulli distribution with the apparent prevalence as parameter. If the herd was classified as uninfected, individual cow test results were drawn from a Bernoulli distribution with parameter 1-Sp.
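The data-generating steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' R/Stan code: the age unit (years), the example ages, and the cow-level infection probabilities are hypothetical, and the apparent prevalence is written in the standard form "true positives plus false positives", which is how the text describes (A5).

```python
import math
import random

A, B, C = 1.2, 3.0, 0.3   # parameters of Equation (1), as given in the text
SP = 0.986                # fixed specificity assumed in the text


def sensitivity(age):
    """Age-dependent sensitivity from logit(Se(t)) = a - b * exp(-c * t)."""
    logit_se = A - B * math.exp(-C * age)
    return 1.0 / (1.0 + math.exp(-logit_se))


def apparent_prevalence(true_prev, se, sp=SP):
    """Probability of a positive test: true positives plus false positives."""
    return true_prev * se + (1.0 - true_prev) * (1.0 - sp)


def simulate_tests(ages, infection_probs, herd_infected, rng):
    """Draw one Bernoulli test result (1 = positive) per cow."""
    results = []
    for age, p in zip(ages, infection_probs):
        if herd_infected:
            prob_pos = apparent_prevalence(p, sensitivity(age))
        else:
            # Uninfected herd: only false positives can occur.
            prob_pos = 1.0 - SP
        results.append(1 if rng.random() < prob_pos else 0)
    return results


rng = random.Random(1)
ages = [2.5, 3.0, 4.5, 6.0]        # hypothetical ages, assumed to be in years
probs = [0.05, 0.05, 0.15, 0.15]   # hypothetical cow-level infection probabilities
print(simulate_tests(ages, probs, herd_infected=True, rng=rng))
```

With these parameter values the sensitivity rises monotonically with age and is bounded above by the inverse logit of a = 1.2.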

2.3. Statistical Analysis

2.3.1. Inferring the Infection Status of Herds

To infer the infection probability of the herd, we used the Bayes factor [20], which is the ratio of the probability of the observed data in one statistical model to the probability of the same data in another model. The Bayes factor quantifies how much more (or less) likely the data are in one model than in another, thus indicating which model provides a better fit to the observed data. The Bayes factor was calculated using the R package bridgesampling [29], version 1.1.2. In our setting, we compared a model assuming the herd was infected with another model assuming the herd was not infected. The posterior probability of the infected-herd model given the data is

\frac{BF \times μ_{HTP}}{BF \times μ_{HTP} + (1 - μ_{HTP})},

where BF is the Bayes factor and μ_{HTP} is the expected value of the prior distribution of HTP.
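The expression above is simple enough to check by hand. The following Python sketch evaluates it, assuming the Bayes factor is oriented as "infected model over uninfected model" (the orientation implied by the formula); the function name and the example inputs are hypothetical.

```python
def posterior_infection_probability(bayes_factor, prior_mean_htp):
    """Posterior probability that the herd is infected.

    bayes_factor   -- BF of the infected-herd model against the uninfected-herd model
    prior_mean_htp -- expected value of the prior distribution of HTP
    """
    if bayes_factor < 0:
        raise ValueError("Bayes factor must be non-negative")
    if not 0.0 < prior_mean_htp < 1.0:
        raise ValueError("prior mean of HTP must lie strictly between 0 and 1")
    weighted = bayes_factor * prior_mean_htp
    return weighted / (weighted + (1.0 - prior_mean_htp))


# Hypothetical inputs: a Bayes factor of 3 and a prior mean HTP of 0.5.
print(posterior_infection_probability(3.0, 0.5))
```

A Bayes factor of 1 leaves the prior mean unchanged, which is a quick sanity check on the direction of the ratio.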

2.3.2. Estimating the PTBC Infection Prevalence in a Single Herd

If, based on the assessment of the infection status, we assume that the herd H is not infected, then no further calculations are needed. If the herd H is thought to be infected, we estimate CWHP_{H1} and CWHP_{H2}, the conditional within-herd prevalences of infection of primiparous and multiparous cows, respectively. A Bayesian latent class model, similar to the one presented in Veres et al. [21] but adapted to a single herd, was fitted to the herd data to calculate the posterior distributions of CWHP_{H1} and CWHP_{H2} of herd H. To calculate the estimates, we used prior information together with the ELISA test results, the parity, and the age of the cows. The posterior distributions are the Bayesian estimates of the prevalence of PTBC associated with primiparous and multiparous cows in herd H, respectively.

It is relatively easy to calculate the proportion of positive cases (apparent prevalence, AP) from raw data, but in practice, due to false positive and false negative results, the true underlying prevalence is often different. Our model aims to provide an estimate of the true prevalence of infection, considering the limitations of the diagnostic test.

We assume that the true prevalence of infection in a given herd and parity group is influenced by the regional mean prevalence in that parity group, as well as by unobservable factors specific to the herd and parity group. These factors are modeled as normally distributed random effects, assessed by their variances.

The model approximates the posterior probability distributions of the parameters, conditional on the observed data and on the prior distributions of herd true prevalence (HTP), parity group prevalences (CWHP_{H1} and CWHP_{H2}), and the variances of the random effects, by means of the Markov Chain Monte Carlo (MCMC) method. For a more technical description and model checks, see Appendix C.

The Bayesian model runs using the rstan package of the statistical software R 4.1.3 [30,31] (https://github.com/VeresKatalin/PTBCprevalence, accessed on 1 September 2025).

2.3.3. Model Runs

The accuracy of our Bayesian model was tested by simulation. We ran the model on the simulated data of 20 herds for each of the six regions (120 herds in total) and examined the proportion of truly infected herds in which the true prevalence fell within the 95% credible interval (CrI) provided by the model. We also calculated the half-width of the CrI.
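The two summary measures used here, coverage and CrI half-width, can be computed as in the following Python sketch. The function name and the example records are hypothetical and serve only to show the calculation.

```python
def coverage_and_mean_half_width(records):
    """Summarize credible intervals over a set of truly infected herds.

    records -- list of (true_prevalence, cri_lower, cri_upper) tuples
    Returns (coverage, mean_half_width).
    """
    if not records:
        raise ValueError("at least one herd is required")
    # Coverage: share of herds whose true prevalence lies inside the CrI.
    covered = sum(1 for true, lower, upper in records if lower <= true <= upper)
    # Half-width: half the distance between the CrI bounds.
    half_widths = [(upper - lower) / 2.0 for _, lower, upper in records]
    return covered / len(records), sum(half_widths) / len(half_widths)


# Hypothetical example: four herds, one of which falls outside its CrI.
example = [
    (0.05, 0.02, 0.10),
    (0.12, 0.03, 0.11),
    (0.08, 0.04, 0.12),
    (0.20, 0.10, 0.30),
]
coverage, mean_half_width = coverage_and_mean_half_width(example)
print(coverage, mean_half_width)
```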

2.3.4. Prior Information

The model builds on expert opinions or insights from the historical data of the region, in the form of prior distributions. The model uses priors for the proportion of infected herds in the region (HTP), the mean regional within-herd infection prevalence of primiparous and multiparous cows (μ_{1} and μ_{2}), and for the variances of the herd- and parity-level random effects.

HTP, μ_{1}, and μ_{2} are modeled by beta distributions. The beta distribution is defined on the interval [0, 1], making it ideal to fit proportions. Its flexibility allows it to express a wide range of prior beliefs. The variances of the random effects are described by inverse gamma priors. The inverse gamma distribution is commonly used as a prior for variance parameters because it ensures positivity and allows control over the expected variability. Its shape can reflect uncertainty or prior knowledge about the spread of random effects.

Priors for Hungary were defined to best fit the posterior distributions of the analyses of the results of a national PTBC screening program [21] (see Table 2). The procedure epi.betabuster [32] in R 4.1.0 package epiR [33] was used to specify the parameters of the beta distributions.
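To get a feel for what a given beta prior expresses, it can help to convert its shape parameters into a mean, mode, and standard deviation. The Python sketch below does this with the standard closed-form expressions; it is an illustration only and does not reproduce epi.betabuster, which is used for the opposite task of choosing shape parameters. The example uses the beta (13.32, 6.28) distribution listed for Northern Italy in Table A1.

```python
import math


def beta_summary(a, b):
    """Mean, mode, and standard deviation of a beta(a, b) distribution."""
    if a <= 1 or b <= 1:
        raise ValueError("this mode formula requires a > 1 and b > 1")
    mean = a / (a + b)
    mode = (a - 1.0) / (a + b - 2.0)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1.0))
    return {"mean": mean, "mode": mode, "sd": math.sqrt(variance)}


# HTP prior for Northern Italy as listed in Table A1.
summary = beta_summary(13.32, 6.28)
print(summary)
```

For this example the mode works out to 0.70, matching the value shown alongside the distribution in Table A1.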

For Denmark, we derived region-specific CWHP₁ and CWHP₂ prior distributions from posterior distributions reported in the literature using the procedure epi.betabuster [32] in R 4.1.0 package epiR [33] to specify the parameters of the beta distributions. For the other regions, we derived region-specific prior distributions from prior distributions found in the literature [8,10,12,23] (Appendix A, Table A1). For the variances of the random effects (σ^{2}, σ_{1}^{2}, and σ_{2}^{2}), we assumed the same prior distributions as for Hungary in each region. The details of the derivations of μ_{1} and μ_{2} can be found in Appendix B.1. The priors used in the analysis are shown in Table 3.

3. Results

3.1. Downloadable Application

The model infers the infection status of the herd and estimates the true prevalence for primiparous and multiparous cows from the data of the individual cows in the herd, using prior information to incorporate the region’s characteristics. The software reads data from one individual herd as an xlsx file in the format shown in Table 4.

Table 4. Input data format for the model. HERD_ID: integer, herd ID. COW_ID: cow ID, integer, unique. MULTIPAR: parity indicator—0 for primiparous cows; 1 for multiparous cows. POS: milk ELISA positivity indicator—1 for test-positive cows; 0 for test-negative cows.
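Before running the model, it can be useful to check that a herd file follows the Table 4 format and to compute the apparent prevalence per parity group. The Python sketch below does this for rows that have already been read into dictionaries; the function name and the example rows are hypothetical, and reading the xlsx file itself is left out.

```python
REQUIRED_COLUMNS = {"HERD_ID", "COW_ID", "MULTIPAR", "POS"}


def apparent_prevalence_by_parity(rows):
    """Validate Table 4 style rows and return the apparent prevalence per parity group.

    rows -- iterable of dicts with keys HERD_ID, COW_ID, MULTIPAR, POS
    Returns {0: AP of primiparous cows, 1: AP of multiparous cows};
    a group with no tested cows maps to None.
    """
    counts = {0: [0, 0], 1: [0, 0]}  # parity -> [positives, tested]
    seen_cows = set()
    for row in rows:
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        if row["MULTIPAR"] not in (0, 1) or row["POS"] not in (0, 1):
            raise ValueError("MULTIPAR and POS must be coded as 0 or 1")
        if row["COW_ID"] in seen_cows:
            raise ValueError(f"duplicate COW_ID: {row['COW_ID']}")
        seen_cows.add(row["COW_ID"])
        counts[row["MULTIPAR"]][0] += row["POS"]
        counts[row["MULTIPAR"]][1] += 1
    return {parity: (pos / n if n else None) for parity, (pos, n) in counts.items()}


# Hypothetical herd with three primiparous and two multiparous cows.
herd = [
    {"HERD_ID": 1, "COW_ID": 101, "MULTIPAR": 0, "POS": 0},
    {"HERD_ID": 1, "COW_ID": 102, "MULTIPAR": 0, "POS": 1},
    {"HERD_ID": 1, "COW_ID": 103, "MULTIPAR": 0, "POS": 0},
    {"HERD_ID": 1, "COW_ID": 104, "MULTIPAR": 1, "POS": 1},
    {"HERD_ID": 1, "COW_ID": 105, "MULTIPAR": 1, "POS": 0},
]
print(apparent_prevalence_by_parity(herd))
```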

The model offers six regions to choose from or can be run with user-defined prior distributions and custom data.

3.2. Model Results on Real World Data

We illustrate the use of the model with three datasets: (1) a Hungarian herd with no positive test results (Herd 1); (2) a sample of 150 cows (approx. 20%) from Herd 1, to demonstrate model performance when the herd is not fully tested; and (3) a typical Hungarian herd (Herd 2). Herd 2 is typical in the sense that the parity and the age distribution of the cows, as well as the herd size, are representative of a large Hungarian dairy herd.

Table 5 and Table 6 summarize the characteristics of Herd 1. There was no ELISA-positive cow in this herd. The posterior probability of infection of the herd is low; the herd is classified as uninfected based on both the Bayes factor and the posterior probability of infection. If the herd is infected, the model estimates low within-herd prevalences.

Table 5. Descriptive characteristics and true prevalence estimates (Herd 1).

Table 6. Bayes factor and posterior probability of the herd being infected (Herd 1).

Table 7 and Table 8 summarize the characteristics of a sample of 150 cows from Herd 1. The posterior probability of infection of the herd is moderate; the herd is classified as uninfected according to the Bayes factor, and the level of the posterior probability of infection indicates that infection is weakly refuted. If the herd is assumed to be infected, then the model estimates low within-herd prevalences, with wide CrI and different estimates.

Table 7. Descriptive characteristics and true prevalence estimates for a sample of 150 cows from Herd 1.

Table 8. Bayes factor and posterior probability of the herd being infected for a sample of 150 cows from Herd 1.

Table 9 and Table 10 summarize the characteristics of Herd 2. This is a Hungarian herd with a typical infection prevalence and herd size. The posterior probability of infection of the herd is 100%; the herd is classified as infected based on both the Bayes factor and the posterior probability of infection. The model estimates of true prevalence are 1.6 and 1.5 times greater than the apparent prevalence in the primiparous and multiparous groups, respectively. These results are consistent with the rule of thumb outlined in Section 3.4.

Table 9. Descriptive characteristics and true prevalence estimates (Herd 2).

Table 10. Bayes factor and posterior probability of the herd being infected (Herd 2).

3.3. Results of the Model on Synthetic Data

The model was run on the simulated data of 20 pseudo-herds per region. Table 11 summarizes the results.

Table 11. Summary measures characterizing the model estimates for the true prevalence in truly infected herds.

We evaluated the performance of the Bayes factor to infer the infection status of the herd. The Bayes factor is not related to the HTP prior of the region. The findings are summarized in Table 12.

Table 12. Infection status of all 120 pseudo-herds (20 herds in each region) evaluated based on the Bayes factor, categorized according to Kass and Raftery [20].

We calculated the posterior probability of the infection of the herd. We chose a categorization with five cut points. The findings are summarized in Table 13.

Table 13. Infection status of all 120 pseudo-herds (20 herds in each region) evaluated based on the posterior probability of infection, categorized based on Kass and Raftery [20].

3.4. Estimating the PTBC Infection Prevalence in a Single Herd Without the Bayesian Model

The same model was run for all herds tested in the Hungarian national screening program and a linear regression was fitted to the results. The goal was to find a simple relationship linking apparent and true prevalence. The estimated true prevalence was considered as a dependent variable and the apparent prevalence estimated from raw data was considered as an explanatory variable. The intercept was close to zero for both parity groups, so the apparent prevalence multiplied by a simple factor could be used to obtain an estimate of the true prevalence. The true prevalence was estimated as 1.6 times the apparent prevalence for primiparous cows and 1.5 times for multiparous cows in Hungary [34].
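Applied to a single herd, the rule of thumb amounts to one multiplication per parity group. A minimal Python sketch follows, using the Hungarian factors reported above; the function name is hypothetical, and capping the result at 1 is an added safeguard rather than part of the published rule.

```python
# TP/AP factors reported for Hungary in the text.
TP_AP_FACTOR = {"primiparous": 1.6, "multiparous": 1.5}


def rule_of_thumb_true_prevalence(apparent_prev, parity_group):
    """Approximate true prevalence as a constant multiple of apparent prevalence."""
    if not 0.0 <= apparent_prev <= 1.0:
        raise ValueError("apparent prevalence must lie between 0 and 1")
    estimate = TP_AP_FACTOR[parity_group] * apparent_prev
    # Assumption: a prevalence cannot exceed 1, so the estimate is capped.
    return min(1.0, estimate)


# Hypothetical apparent prevalences of 5% and 10%.
print(rule_of_thumb_true_prevalence(0.05, "primiparous"))
print(rule_of_thumb_true_prevalence(0.10, "multiparous"))
```

The factors are specific to the Hungarian data and test; other regions would need their own values.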

4. Discussion

This study presents a Bayesian model designed to estimate the true prevalence of paratuberculosis (PTBC) at herd level, separately for primiparous and multiparous cows, by incorporating informative priors derived from regional surveys or reflecting local expert opinions. This approach is particularly suitable for dairy cattle herds in regions where PTBC infection is endemic. Additionally, the model estimates the probability and odds that a given herd is infected by the posterior probability of infection and the Bayes factor, respectively.

Parity, used as a subgroup in the model, is a natural categorization of cows in herd management. This stratification enhances the model’s relevance for practical herd-level decision making. Since primiparous and multiparous cows differ significantly in their susceptibility to infection and infection detectability [6,22,35], handling them together can lead to biased conclusions. Breaking down the herd by parity results in more homogeneous groups with markedly different infection prevalences.

One of the main advantages of the Bayesian approach is that, by incorporating regional priors, the model can consider region-specific factors that cannot be directly measured or quantified. Thus, estimates are based not only on data from the target herd, but also on broader epidemiological patterns and environmental influences characteristic of the region.

In a previous Hungarian study [21], the mean true within-herd prevalence (CWHP) was estimated at 8.4% (95% credible interval: 6.6–10.4%) for primiparous cows and 15.8% (13.5–18.4%) for multiparous cows. Median CWHP estimates were 4.7% (3.2–6.4%) for primiparous and 12.4% (10–15%) for multiparous cows. These values formed the empirical basis for the priors used in our Bayesian model, so the analysis was based on well-characterized national data [21]. To provide an internationally relevant context for our analysis, we integrated prior information from diverse regional studies along with region-specific synthetic data. This approach demonstrated the flexibility and applicability of the model across different contexts.

Ideally, prior distributions are informed by posterior results from previous studies. Hungary and Denmark have published posterior distributions from PTBC prevalence studies using the milk ELISA test [21,23]. For these two countries, posterior HTP and CWHP distributions were directly used as priors, along with the age-dependent sensitivity and constant specificity values from Veres et al. [21]. For the other four regions, where only serum ELISA test results were available, HTP and CWHP priors were derived from published expert elicitation [8,10,12,23]. Consequently, the resulting posterior distributions may suggest greater uncertainty than exists in reality.

For the variance of the random herd effect and additive random parity effects, we initially derived priors from region-specific expert elicitation, but this led to unrealistically high variance estimates and consequent shrinkage toward the mean in some regions. To ensure greater accuracy in the estimates, we used the posterior distribution obtained from the Hungarian study as the prior distribution. While we acknowledge this is a limitation of our approach, we note that assuming the same prior of the variance of the random effects does not imply the same posterior variance of the within-herd animal level prevalence across all regions. The posterior variance of the CWHP also depends on the regional mean of CWHP and the observed data of the target herd.

We applied an age-dependent fixed value for the sensitivity of the diagnostic test, together with a fixed value for its specificity. Using prior distributions instead of fixed values may result in unstable or biased estimates and may lead to an unidentifiable model. The use of fixed values is confirmed by our previous study [21], in which we presented the possible range of Se and Sp, based on available published values and a detailed sensitivity analysis.

In the analysis of the infection status based on the value of Bayes factor, two uninfected herds were classified as “strongly refuted”. Both infected and uninfected herds appeared in the “refuted” category. All herds rated as “supported” and “strongly supported” were indeed infected. For 42 herds, the results were “weakly refuted” or “weakly supported”. Overall, the infection status was correctly identified in 66 of the 78 pseudo-herds that were not classified as weakly refuted or weakly supported.

When estimating the posterior probability of infection of the herds, prior information on herd true prevalence was also taken into account. Out of 120 pseudo-herds, 53 truly infected herds were classified as “strongly supported”, with no herds falling into the “strongly refuted” category. Only one truly infected herd was misclassified into the “refuted” group. The results were “weakly refuted” or “weakly supported” for 48 herds. Overall, the infection status was correctly identified for 69 out of the 72 pseudo-herds with more explicit results. From a Bayesian perspective, misclassification occurs because probabilities should preferably be interpreted as indicators rather than used as classifiers. Errors arise from forcing clear-cut decisions that are fundamentally based on probabilistic information.

Indeterminate classification may arise from a low prevalence in a high herd prevalence region or from a small sample size. If the herd with weak results was not fully tested, then herd managers may try to test a larger sample of cows. In case of a low prevalence in a high herd prevalence region, we recommend evaluating the effectiveness of herd management practices in preventing the spread of the disease between herds. If the practices introduced are appropriate, the herd can be considered uninfected; otherwise, it is reasonable to assume that the herd is infected. To assess the potential level of infection, the estimated CWHP prevalences need to be evaluated along with the width of the credible intervals. This approach provides a comprehensive picture of the infection status of the herd. To confirm the results obtained or to observe changes in the infection status of the herd, we recommend retesting the animals after a certain period of time and analyzing the test results by fitting the presented Bayesian model using the updated, most recent prior distributions. Given the economic and health damage caused by PTBC [4], we believe that it is worthwhile to take measures to prevent and reduce infection even if the farm has been classified as “weakly refuted”.

Table 11 contains summary measures obtained for within-herd true prevalence estimates in truly infected herds. For primiparous cows, the coverage, i.e., the proportion of herds where the simulated within-herd prevalence fell within the CrI, was at least 95%, demonstrating the accuracy of the Bayesian model and the prior distributions of primiparous CWHP in all regions. For multiparous cows, the coverage was lower in Lombardy (85%) and Veneto (71%). This may indicate that there is too much heterogeneity in the prior distribution of multiparous CWHP in these regions. The mean half-width of the CrI measures the uncertainty of the estimates. In all regions, it is of the same order of magnitude as the mean primiparous CWHP values, and it is slightly lower than the mean multiparous CWHP values, which seems acceptable from a practical point of view.

In a previous study using Hungarian real-world data, a simple rule of thumb for estimating the true within-herd prevalence was developed [34]. The model was run on the individual PTBC ELISA milk test results from 116 dairy cattle farms in Hungary. We fitted a linear regression with the estimated true within-herd prevalence as the dependent variable and the apparent prevalence as the explanatory variable. In the regression equation, the intercept estimate was close to zero, so the true within-herd prevalence could be estimated by multiplying the apparent prevalence by a constant [34]. Apart from providing a rule of thumb for Hungary, this method may allow us to derive a similar rule of thumb for other regions using their own herd-level data. It is theoretically appropriate to apply linear regression to pairs of apparent prevalence and estimated true prevalence. According to the relationship between apparent and true prevalences (A5), for specificities close to 1, the ratio of apparent to true prevalence depends primarily on the sensitivity of the test. To investigate the ratio of apparent to true prevalence, we also need to consider the age dependence of sensitivity (1). In Hungarian data, the age heterogeneity of primiparous cows is low, so the sensitivity in this category is relatively stable. Although the age difference is greater in multiparous cows, the sensitivity at higher ages is less dependent on age differences. Thus, for a region with the above characteristics, the relationship between true and apparent prevalence is linear and stable and therefore can be well estimated. The TP/AP ratios of 1.6 and 1.5 that we found for primi- and multiparous cows in Hungary are comparable to the ratios of 1.4, 1.5, and 1.5 published in the Danish study by Verdugo et al. [23] for three consecutive years: 2011, 2012, and 2013. As these ratios depend only on the age distribution and the test sensitivity, for regions using the same milk ELISA test and having a similar age distribution to the Hungarian herds, the same values of 1.6 and 1.5 apply.
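The argument in this paragraph can be written out in a few lines. The derivation below is a sketch for a single sensitivity value, assuming that (A5) has the standard form described in Section 2.2 (apparent prevalence as the sum of true positive and false positive probabilities); in a herd with age-dependent sensitivity, Se would be replaced by an average over the cows in the parity group.

```latex
% Apparent prevalence as true positives plus false positives
AP = Se \cdot TP + (1 - Sp)(1 - TP)

% Solving for the true prevalence
TP = \frac{AP - (1 - Sp)}{Se + Sp - 1}

% For Sp close to 1 the false-positive term vanishes
AP \approx Se \cdot TP
\quad\Longrightarrow\quad
\frac{TP}{AP} \approx \frac{1}{Se}
```

Under this approximation, a stable sensitivity within a parity group implies a stable TP/AP ratio, which is why a regression through a near-zero intercept is plausible.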

We note that the model uses synthetic datasets to illustrate the approach, and therefore, regional prevalence estimates should not be interpreted as measures of PTBC prevalence in these areas. When generating synthetic data, we undertook a pragmatic approach: resampling a Hungarian herd and replacing test results to reproduce regional herd sizes and prevalences. This technique may not fully capture the unique characteristics of herds in other regions, such as different age distributions, genetic backgrounds, or management practices, all of which can influence the age-sensitivity relationship and the true prevalence. The results for non-Hungarian regions demonstrate the flexibility of the model rather than its accuracy in these specific locations, where no local validation has taken place. The model can be customized for any region using real local data and prior information.

5. Conclusions

The Bayesian latent class model introduced in this study provides estimates of the herd-level infection status and the true prevalence of primiparous and multiparous cows within a single herd. Accurate estimation of within-herd infection prevalence is crucial, as higher prevalence leads to greater production losses and increased culling rates. Infection status assessment and prevalence estimation are the basis for developing effective control and eradication strategies, as well as for monitoring the success of these interventions over time. We demonstrated our model’s potential utility for prevalence estimation and infection status inference across diverse international settings. Combined with high-quality local data and prior information, our approach serves as a robust and adaptable framework to support evidence-based decision making at the herd level. The novelty of the method lies in the estimation of both the probability and the Bayes factor of infection of the target herd, combined with the estimation of infection prevalence within homogeneous subgroups of the herd, specifically parity groups. The approach can be readily generalized to other types of subgroups, such as different breeds, lactation stages, housing pens, or health status.

Author Contributions

Conceptualization, K.V., Z.L. and L.Ó.; methodology, Z.L. and K.V.; formal analysis, K.V.; resources, L.Ó.; data curation, L.Ó. and K.V.; writing—original draft preparation, K.V.; writing—review and editing, K.V., Z.L. and L.Ó.; visualization, K.V.; supervision, Z.L. and L.Ó.; project administration, L.Ó.; funding acquisition, L.Ó. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Project no. RRF-2.3.1-21-2022-00001, which was implemented with the support provided by the Recovery and Resilience Facility (RRF), financed under the National Recovery Fund budget estimate, RRF-2.3.1-21 funding scheme.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We would like to thank Attila Monostori for the help in collecting Hungarian diagnostic data.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AP	apparent prevalence
CrI	credible interval
CWHP	conditional within-herd prevalence
HTP	herd true prevalence
MAP	Mycobacterium avium subspecies paratuberculosis
MCMC	Markov Chain Monte Carlo
PTBC	bovine paratuberculosis
Se	sensitivity
Sp	specificity
TP	true prevalence

Appendix A

Table A1. Information from the literature used to inform prior prevalence distributions of the model [8,10,12,23].

Region (Source of Prior Information)	Variable	Distribution from the Literature	Information
Denmark	HTP	beta (425.11, 144.64)	0.747 (0.781) ^a
(posterior)	CWHP	beta (3.584, 45.398)	0.055 (0.16) ^b
Southern Italy	HTP	beta (5.03, 7.04)	0.4 (0.2) ^b
(prior)	CWHP	beta (μψ, (1 - μ)ψ)	-
	CWHP (μ)	beta (3.14, 31.7)	0.09 (0.18) ^c
	CWHP (ψ)	gamma (11.16, 11.31)	-
Northern Italy (Lombardy, Veneto)	HTP	beta (13.32, 6.28)	0.70 (>0.50) ^b
(prior)	CWHP	beta (μψ, (1 - μ)ψ)	-
	CWHP (μ)	beta (1.53, 15.69)	0.035 (<0.22) ^d
	CWHP (ψ)	gamma (8.81, 1.42)	0.2 (<0.30) ^e
Chile	HTP	beta (14.2, 0.7)	0.97 (0.99) ^e
(prior)	CWHP	beta (μψ, (1 - μ)ψ)	-
	CWHP (μ)	beta (22.2, 176.9)	0.11 (0.15) ^e
	CWHP (ψ)	gamma (9.1, 4.6)	0.25 (0.30) ^e

HTP—herd true prevalence; CWHP—conditional within-herd prevalence; Se—sensitivity of ELISA test; Sp—specificity of ELISA test; ^a—median and 95th percentile; ^b—mode and 5th percentile; ^c—estimates based on expert opinion: the mean prevalence is 0.09 with 95% certainty that it is not more than 0.18, where the expert is also confident that 90% of all herds have a prevalence less or equal to 0.35 with 95% certainty that it does not exceed 0.45; ^d—mode and 95th percentile; ^e—median and 95th percentile of the distribution of 90th percentile of the within-herd TP.

Appendix B. Derivation of Priors for Conditional Within-Herd Prevalence of Subgroups Used in the Model

We collected historical information about the distribution of herd true prevalence (HTP) and conditional within-herd prevalence (CWHP) of PTBC infection from the literature to obtain informative priors. The priors from the studies representing different regions had the following form:

$$HTP \sim \mathrm{beta}(\alpha_{HTP}, \beta_{HTP}),$$

$$mCWHP \sim \mathrm{beta}(\mu\psi, (1 - \mu)\psi),$$

$$\mu \sim \mathrm{beta}(\alpha_{\mu}, \beta_{\mu}), \quad E(\mu) = \frac{\alpha_{\mu}}{\alpha_{\mu} + \beta_{\mu}},$$

$$\psi \sim \mathrm{gamma}(\alpha_{\psi}, \beta_{\psi}), \quad E(\psi) = \alpha_{\psi} \times \beta_{\psi},$$

where mCWHP is the mean conditional within-herd prevalence, $\mu$ is the expected value, and $\psi$ is the precision parameter of the distribution of mCWHP.
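The hierarchical form of the mCWHP prior can be made concrete by simulation. The following is a minimal Python sketch, not the authors' R/Stan code: the function name `sample_mcwhp` and all hyperparameter values are hypothetical, and the gamma distribution is taken in the shape-scale form implied by the stated mean $E(\psi) = \alpha_{\psi} \times \beta_{\psi}$.

```python
import random


def sample_mcwhp(a_mu, b_mu, a_psi, b_psi, rng):
    """Draw one mCWHP value from the hierarchical prior.

    mu  ~ beta(a_mu, b_mu)
    psi ~ gamma(shape=a_psi, scale=b_psi), so that E(psi) = a_psi * b_psi
    mCWHP | mu, psi ~ beta(mu * psi, (1 - mu) * psi)
    """
    mu = rng.betavariate(a_mu, b_mu)
    psi = rng.gammavariate(a_psi, b_psi)
    return rng.betavariate(mu * psi, (1.0 - mu) * psi)


rng = random.Random(2025)
# Hypothetical hyperparameters, chosen only for illustration.
draws = [sample_mcwhp(3.0, 30.0, 9.0, 1.5, rng) for _ in range(20000)]
prior_mean = sum(draws) / len(draws)
print(f"simulated prior mean of mCWHP: {prior_mean:.3f}")
```

Because the conditional mean of mCWHP is $\mu$, the simulated prior mean should lie close to $E(\mu) = \alpha_{\mu} / (\alpha_{\mu} + \beta_{\mu})$, which gives a quick sanity check on any chosen hyperparameters.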

Appendix B.1

The model requires the following priors:

Herd true prevalence: $HTP \sim \mathrm{beta}(\alpha_{HTP}, \beta_{HTP})$.
Mean conditional within-herd prevalence for primiparous cows: $mCWHP_{1} \sim \mathrm{beta}(\mu_{1}\psi_{1}, (1 - \mu_{1})\psi_{1})$.
Mean conditional within-herd prevalence for multiparous cows: $mCWHP_{2} \sim \mathrm{beta}(\mu_{2}\psi_{2}, (1 - \mu_{2})\psi_{2})$.
Variance of the herd random effect: $\sigma^{2} \sim \mathrm{inv.gamma}(\alpha_{\sigma}, \beta_{\sigma})$.
Variance of the additive parity effect for primiparous cows: $\sigma_{1}^{2} \sim \mathrm{inv.gamma}(\alpha_{\sigma_{1}}, \beta_{\sigma_{1}})$.
Variance of the additive parity effect for multiparous cows: $\sigma_{2}^{2} \sim \mathrm{inv.gamma}(\alpha_{\sigma_{2}}, \beta_{\sigma_{2}})$.

The $\mu_{1}$ and $\mu_{2}$ parameters of the $mCWHP_{1}$ and $mCWHP_{2}$ priors were derived from the mCWHP prior using additional assumptions. Based on Hungarian data, the proportion of primiparous cows was assumed to be 40% and the ratio of $\mu_{2}$ to $\mu_{1}$ was assumed to be 2.

Assumptions:

Population proportion of primiparous cows: p.
Ratio of $\mu_{2}$ to $\mu_{1}$: R.

$$E(\mu) = p\mu_{1} + (1 - p)\mu_{2} \tag{A1}$$

$$\frac{\mu_{2}}{\mu_{1}} = R \tag{A2}$$

From (A1) and (A2), we have

$$\mu_{1} = \frac{E(\mu)}{p(1 - R) + R} \quad \text{and} \quad \mu_{2} = \frac{R \times E(\mu)}{p(1 - R) + R}.$$
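As a worked illustration of solving (A1) and (A2), the short Python sketch below computes the two parity-specific means. The function name `split_mean_by_parity` is hypothetical; p = 0.4 and R = 2 are the assumptions stated above, while E(μ) = 0.08 is an arbitrary illustrative value, not an estimate from the study.

```python
from typing import Tuple


def split_mean_by_parity(mean_mu: float, p: float, ratio: float) -> Tuple[float, float]:
    """Solve (A1) and (A2) for mu_1 (primiparous) and mu_2 (multiparous).

    mean_mu: E(mu), the overall mean of the mCWHP prior
    p:       population proportion of primiparous cows
    ratio:   R = mu_2 / mu_1
    """
    denom = p * (1.0 - ratio) + ratio
    mu_1 = mean_mu / denom
    mu_2 = ratio * mean_mu / denom
    return mu_1, mu_2


# p and R follow the text; mean_mu is a hypothetical value of E(mu).
mu_1, mu_2 = split_mean_by_parity(mean_mu=0.08, p=0.4, ratio=2.0)
print(round(mu_1, 4), round(mu_2, 4))
```

With these inputs the denominator is 0.4 × (1 − 2) + 2 = 1.6, so μ₁ = 0.05 and μ₂ = 0.10, and the weighted mean 0.4 × 0.05 + 0.6 × 0.10 recovers E(μ) = 0.08 as required by (A1).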

Appendix B.2

The following priors were required for simulating the synthetic data:

Conditional within-herd prevalence for primiparous cows: $CWHP_{1} \sim \mathrm{beta}(\mu_{CWHP1}\psi_{CWHP1}, (1 - \mu_{CWHP1})\psi_{CWHP1})$
Conditional within-herd prevalence for multiparous cows: $CWHP_{2} \sim \mathrm{beta}(\mu_{CWHP2}\psi_{CWHP2}, (1 - \mu_{CWHP2})\psi_{CWHP2})$

$\mu_{CWHP1}$ and $\mu_{CWHP2}$ were chosen to be the means $\mu_{1}$ and $\mu_{2}$ of the $mCWHP_{1}$ and $mCWHP_{2}$ priors. To be coherent with the model, $\psi_{CWHP1}$ and $\psi_{CWHP2}$ were assigned the $\psi$ parameters from the Hungarian study.

To model the joint infection probabilities of primiparous and multiparous cows within a herd, we used a Gaussian copula [27] with a correlation coefficient of $\rho = 0.89$ [21]. The construction is the following: Let $Z_{1} \sim N(0, 1)$, $Z_{2} \sim N(0, 1)$ with $Corr(Z_{1}, Z_{2}) = 0.89$. Let $U_{1} = \Phi(Z_{1})$ and $U_{2} = \Phi(Z_{2})$, where $\Phi$ is the cumulative standard normal distribution function. The joint distribution of the infection probabilities $CWHP_{1}$ and $CWHP_{2}$ of the parity groups is derived by transforming the uniformly distributed $U_{1}$ and $U_{2}$ using the inverse cumulative distribution functions of the respective prior distributions. Let $X_{1} = F_{CWHP_{1}}^{-1}(U_{1})$ and $X_{2} = F_{CWHP_{2}}^{-1}(U_{2})$, where $F_{CWHP_{\cdot}}^{-1}$ is the inverse of the beta cumulative distribution function of CWHP. Thus, $(X_{1}, X_{2})$ are random variables with the desired marginal distributions (those of $CWHP_{1}$ and $CWHP_{2}$) and a joint dependence structure induced via the Gaussian copula with $\rho = 0.89$.
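The copula construction can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the simulation code used in the study: the beta inverse CDF is approximated by the empirical quantiles of a large sorted sample (a standard-library stand-in for an exact quantile function), and the values of `mu1`, `mu2`, and `psi` are hypothetical. Only ρ = 0.89 is taken from the text.

```python
import math
import random
from statistics import NormalDist


def empirical_beta_quantile(a, b, rng, n_ref=20000):
    """Approximate the inverse CDF of beta(a, b) from sorted random draws."""
    ref = sorted(rng.betavariate(a, b) for _ in range(n_ref))

    def quantile(u):
        # Map u in [0, 1] to an index into the sorted reference sample.
        return ref[min(int(u * n_ref), n_ref - 1)]

    return quantile


def sample_cwhp_pair(q1, q2, rho, rng):
    """Draw (X1, X2) with the marginals behind q1, q2 and a Gaussian copula."""
    z1 = rng.gauss(0.0, 1.0)
    # z2 is standard normal with Corr(z1, z2) = rho.
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    phi = NormalDist().cdf
    return q1(phi(z1)), q2(phi(z2))


rng = random.Random(1)
mu1, mu2, psi = 0.08, 0.16, 7.0  # hypothetical means and precision
q1 = empirical_beta_quantile(mu1 * psi, (1.0 - mu1) * psi, rng)
q2 = empirical_beta_quantile(mu2 * psi, (1.0 - mu2) * psi, rng)
pairs = [sample_cwhp_pair(q1, q2, 0.89, rng) for _ in range(5000)]
print(pairs[:3])
```

The simulated pairs keep the beta marginals of each parity group while being strongly positively dependent, which is the property the copula is meant to deliver. An exact beta quantile function from a numerical library could replace the empirical approximation without changing the construction.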

Appendix C. Technical Description of the Single Herd Model

Let the parity be indexed by j = 1, 2 for primiparous and multiparous cows, respectively. Let $Pos_{jk}$ be 1 if the k-th animal from the j-th parity group is test positive and 0 if it is test negative. Further, let $Pos_{j} = (\ldots, Pos_{jk}, \ldots)$ be the vector containing all the $n_{j}$ test results for animals from the j-th parity group. We assume that the components of $Pos_{j}$ (indicating test results of all animals in parity group j) follow independent Bernoulli distributions.

$$Pos_{j} \sim \prod_{k = 1}^{n_{j}} \mathrm{Bernoulli}(AP_{j}), \quad \text{if the herd is assumed to be infected}, \tag{A3}$$

$$Pos_{j} \sim \prod_{k = 1}^{n_{j}} \mathrm{Bernoulli}(1 - Sp), \quad \text{if the herd is assumed to be not infected}. \tag{A4}$$

$AP_{j}$ represents the conditional within-herd apparent prevalence, which is the probability that a randomly selected individual from the j-th parity group tests positive, given that the herd is infected. Sp is the specificity of the test. $AP_{j}$ can be expressed as the sum of the probabilities of true positive and false positive test results:

$$AP_{j} = Se \times CWHP_{j} + (1 - Sp) \times (1 - CWHP_{j}), \tag{A5}$$

where $CWHP_{j}$ is the conditional within-herd prevalence for the j-th parity group and $Se$ and $Sp$ stand for the sensitivity and specificity of the diagnostic test. From this point on, we assume that the herd is infected.
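To illustrate (A3) to (A5), the Python sketch below computes the apparent prevalence from a given true prevalence, inverts the relation algebraically, and compares the Bernoulli log-likelihood of a set of test results under the two herd-status hypotheses. The function names and the values Se = 0.6, Sp = 0.99, and CWHP = 0.15 are hypothetical and are not the test characteristics used in the study. The comparison uses fixed parameter values, whereas the Bayes factor in the model integrates the likelihood over the priors.

```python
import math


def apparent_prevalence(cwhp, se, sp):
    """Equation (A5): probability that a cow from an infected herd tests positive."""
    return se * cwhp + (1.0 - sp) * (1.0 - cwhp)


def true_prevalence(ap, se, sp):
    """Algebraic inverse of (A5), clipped to the interval [0, 1]."""
    tp = (ap + sp - 1.0) / (se + sp - 1.0)
    return min(1.0, max(0.0, tp))


def log_lik(n_pos, n, p_pos):
    """Log-likelihood of n_pos positives among n independent Bernoulli tests."""
    return n_pos * math.log(p_pos) + (n - n_pos) * math.log(1.0 - p_pos)


se, sp, cwhp = 0.6, 0.99, 0.15  # hypothetical values for illustration
ap = apparent_prevalence(cwhp, se, sp)
print("AP:", round(ap, 4), "recovered TP:", round(true_prevalence(ap, se, sp), 4))

# 31 positives among 262 multiparous cows (counts from Table 9).
print("infected:    ", round(log_lik(31, 262, ap), 2))          # as in (A3)
print("not infected:", round(log_lik(31, 262, 1.0 - sp), 2))    # as in (A4)
```

With many positives, the likelihood under (A3) dominates; with no positives in a large group, the ordering reverses, because false positives alone at rate 1 − Sp explain the data better than an infected herd would.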

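According to Veres et al. [21],

$$CWHP_{Hj} = B_{\mu_{j}, \psi_{j}}^{-1}\left(\Phi\left(\frac{b_{H} + b_{Hj}}{(\sigma^{2} + \sigma_{j}^{2})^{1/2}}\right)\right), \quad j = 1, 2, \tag{A6}$$

and

$$\psi_{j} = \frac{1}{\sigma^{2} + \sigma_{j}^{2}}, \tag{A7}$$

where $B^{-1}$ is the inverse of the beta cumulative distribution function, $b_{H}$ is the herd random effect with variance $\sigma^{2}$, and $b_{Hj}$ is the additive parity effect with variance $\sigma_{j}^{2}$.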
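The mapping in (A6) and (A7) from random effects to a prevalence can be sketched as follows. This Python example is illustrative only: the beta inverse CDF is again approximated by empirical quantiles of sorted draws, the function names are hypothetical, and the parameter values are the medians of the Hungarian priors for $\mu_{2}$, $\sigma^{2}$, and $\sigma_{2}^{2}$ listed in Table 2, used here as fixed constants rather than as distributions.

```python
import math
import random
from statistics import NormalDist


def make_beta_quantile(a, b, rng, n_ref=20000):
    """Approximate B^{-1} for beta(a, b) by empirical quantiles of sorted draws."""
    ref = sorted(rng.betavariate(a, b) for _ in range(n_ref))
    return lambda u: ref[min(int(u * n_ref), n_ref - 1)]


def precision(var_herd, var_parity):
    """Equation (A7): psi_j = 1 / (sigma^2 + sigma_j^2)."""
    return 1.0 / (var_herd + var_parity)


def cwhp(b_h, b_hj, var_herd, var_parity, beta_quantile):
    """Equation (A6): standardize the summed effects, then map through B^{-1}."""
    z = (b_h + b_hj) / math.sqrt(var_herd + var_parity)
    return beta_quantile(NormalDist().cdf(z))


rng = random.Random(7)
mu_2, var_herd, var_parity = 0.158, 0.126, 0.016  # medians of the Table 2 priors
psi_2 = precision(var_herd, var_parity)
quantile = make_beta_quantile(mu_2 * psi_2, (1.0 - mu_2) * psi_2, rng)
for effects in [(-0.3, -0.2), (0.0, 0.0), (0.3, 0.2)]:
    print(effects, round(cwhp(*effects, var_herd, var_parity, quantile), 3))
```

Zero effects map to the median of the beta distribution, and larger herd or parity effects map monotonically to higher prevalences, which is how the normal random effects induce a beta-distributed CWHP.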

We use region-specific priors for $\mu_{j}$ $(j = 1, 2)$ and for the variances of the random effects ($\sigma^{2}$, $\sigma_{j}^{2}$, $j = 1, 2$) in the following form:

$$\mu_{1} \sim \mathrm{beta}(\alpha_{\mu_{1}}, \beta_{\mu_{1}}), \quad \mu_{2} \sim \mathrm{beta}(\alpha_{\mu_{2}}, \beta_{\mu_{2}}),$$

$$\sigma^{2} \sim \mathrm{inverse.gamma}(\alpha_{\sigma}, \beta_{\sigma}), \quad \sigma_{1}^{2} \sim \mathrm{inverse.gamma}(\alpha_{\sigma_{1}}, \beta_{\sigma_{1}}), \quad \sigma_{2}^{2} \sim \mathrm{inverse.gamma}(\alpha_{\sigma_{2}}, \beta_{\sigma_{2}}).$$

To compute posterior estimates of the $CWHP_{Hj}$, $b_{H}$, and $b_{Hj}$ parameters, four chains were run, each with 20,000 MCMC iterations. The runs were started from different sets of initial parameter values. The first 10,000 warm-up iterations in each chain were discarded. Convergence was checked by visual inspection of the trace plots of the chains. Additionally, the split R-hat statistic [36] was used as a convergence measure. Both approaches confirmed convergence.
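For readers unfamiliar with the split R-hat diagnostic, the following Python sketch implements the basic version described by Gelman et al. [36]: each chain is split in half, and the between-chain and within-chain variances of the resulting half-chains are compared. It is a stand-alone illustration on simulated draws, not the diagnostic code used by rstan, and the function name `split_rhat` is hypothetical.

```python
import random
from statistics import mean, variance


def split_rhat(chains):
    """Split R-hat for one scalar parameter.

    chains: list of equally long lists of post-warm-up draws.
    """
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = len(halves[0])
    chain_means = [mean(h) for h in halves]
    within = mean(variance(h) for h in halves)   # W: mean within-chain variance
    between = n * variance(chain_means)          # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


rng = random.Random(0)
# Four well-mixed chains versus four chains stuck at different locations.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
stuck = [[rng.gauss(m, 1.0) for _ in range(1000)] for m in (0.0, 0.0, 3.0, 3.0)]
print("well mixed:", round(split_rhat(mixed), 3))
print("not mixed: ", round(split_rhat(stuck), 3))
```

Values close to 1 indicate that the half-chains agree with each other; values well above 1 indicate that the chains have not converged to a common distribution.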

The Bayesian model is run using the rstan package in the statistical software R 4.1.3 [30,31].

References

1. Rasmussen, P.; Barkema, H.W.; Mason, S.; Beaulieu, E.; Hall, D.C. Economic Losses Due to Johne’s Disease (Paratuberculosis) in Dairy Cattle. J. Dairy Sci. 2021, 104, 3123–3143.
2. Ott, S.L.; Wells, S.J.; Wagner, B.A. Herd-Level Economic Losses Associated with Johne’s Disease on US Dairy Operations. Prev. Vet. Med. 1999, 40, 179–192.
3. Fecteau, M.-E. Paratuberculosis in Cattle. Vet. Clin. N. Am. Food Anim. Pract. 2018, 34, 209–222.
4. Ózsvári, L.; Harnos, A.; Lang, Z.; Monostori, A.; Strain, S.; Fodor, I. The Impact of Paratuberculosis on Milk Production, Fertility, and Culling in Large Commercial Hungarian Dairy Herds. Front. Vet. Sci. 2020, 7, 778.
5. Fodor, I.; Matyovszky, B.; Biczó, A.; Ózsvári, L. The Losses Due to Paratuberculosis and Its Control in a Hungarian Large-Scale Holstein-Friesian Dairy Farm. Magy. Allatorvosok Lapja 2014, 136, 213–222.
6. Nielsen, S.S.; Enevoldsen, C.; Gröhn, Y.T. The Mycobacterium Avium Subsp. Paratuberculosis ELISA Response by Parity and Stage of Lactation. Prev. Vet. Med. 2002, 54, 1–10.
7. Nielsen, S.S.; Toft, N. A Review of Prevalences of Paratuberculosis in Farmed Animals in Europe. Prev. Vet. Med. 2009, 88, 1–14.
8. Sposato, A.; Fanelli, A.; Cordisco, M.; Trotta, A.; Galgano, M.; Corrente, M.; Buonavoglia, D. Bayesian Estimation of Prevalence of Johne’s Disease in Dairy Herds in Southern Italy. Prev. Vet. Med. 2022, 199, 105552.
9. Ózsvári, L.; Lang, Z.; Monostori, A.; Kostoulas, P.; Fodor, I. Bayesian Estimation of the True Prevalence of Paratuberculosis in Hungarian Dairy Cattle Herds. Prev. Vet. Med. 2020, 183, 105124.
10. Verdugo, C.; Valdes, M.F.; Salgado, M. Within-Herd Prevalence and Clinical Incidence Distributions of Mycobacterium Avium Subspecies Paratuberculosis Infection on Dairy Herds in Chile. Prev. Vet. Med. 2018, 154, 113–118.
11. McAloon, C.G.; Doherty, M.L.; Whyte, P.; O’Grady, L.; More, S.J.; Messam, L.L.M.; Good, M.; Mullowney, P.; Strain, S.; Green, M.J. Bayesian Estimation of Prevalence of Paratuberculosis in Dairy Herds Enrolled in a Voluntary Johne’s Disease Control Programme in Ireland. Prev. Vet. Med. 2016, 128, 95–100.
12. Pozzato, N.; Capello, K.; Comin, A.; Toft, N.; Nielsen, S.S.; Vicenzoni, G.; Arrigoni, N. Prevalence of Paratuberculosis Infection in Dairy Cattle in Northern Italy. Prev. Vet. Med. 2011, 102, 83–86.
13. Hanson, T.E.; Johnson, W.O.; Gardner, I.A. Hierarchical Models for Estimating Herd Prevalence and Test Accuracy in the Absence of a Gold Standard. J. Agric. Biol. Environ. Stat. 2003, 8, 223.
14. Branscum, A.J.; Gardner, I.A.; Johnson, W.O. Bayesian Modeling of Animal- and Herd-Level Prevalences. Prev. Vet. Med. 2004, 66, 101–112.
15. Meletis, E.; Conrady, B.; Hopp, P.; Lurier, T.; Frössling, J.; Rosendal, T.; Faverjon, C.; Carmo, L.P.; Hodnik, J.J.; Ózsvári, L.; et al. Review State-of-the-Art of Output-Based Methodological Approaches for Substantiating Freedom from Infection. Front. Vet. Sci. 2024, 11, 1337661.
16. Cameron, A.R.; Baldock, F.C. A New Probability Formula for Surveys to Substantiate Freedom from Disease. Prev. Vet. Med. 1998, 34, 1–17.
17. Heisey, D.M.; Jennelle, C.S.; Russell, R.E.; Walsh, D.P. Using Auxiliary Information to Improve Wildlife Disease Surveillance When Infected Animals Are Not Detected: A Bayesian Approach. PLoS ONE 2014, 9, e89843.
18. Madouasse, A.; Mercat, M.; van Roon, A.; Graham, D.; Guelbenzu, M.; Santman Berends, I.; van Schaik, G.; Nielen, M.; Frössling, J.; Ågren, E.; et al. A modelling framework for the prediction of the herd-level probability of infection from longitudinal data. Peer Community J. 2022, 2, e4.
19. Hanson, T.E.; Johnson, W.O.; Gardner, I.A.; Georgiadis, M.P. Determining the Infection Status of a Herd. J. Agric. Biol. Environ. Stat. 2003, 8, 469–485.
20. Kass, R.E.; Raftery, A.E. Bayes Factors. J. Am. Stat. Assoc. 1995, 90, 773–795.
21. Veres, K.; Lang, Z.; Monostori, A.; Kostoulas, P.; Ózsvári, L. Bayesian Latent Class Modelling of True Prevalence in Animal Subgroups with Application to Bovine Paratuberculosis Infection. Prev. Vet. Med. 2024, 224, 106133.
22. Nielsen, S.S.; Toft, N.; Okura, H. Dynamics of Specific Anti-Mycobacterium Avium Subsp. Paratuberculosis Antibody Response through Age. PLoS ONE 2013, 8, e63009.
23. Verdugo, C.; Toft, N.; Nielsen, S.S. Within- and between-Herd Prevalence Variation of Mycobacterium Avium Subsp. Paratuberculosis Infection among Control Programme Herds in Denmark (2011–2013). Prev. Vet. Med. 2015, 121, 282–287.
24. Sora, V.M.; Panseri, S.; Nobile, M.; Di Cesare, F.; Meroni, G.; Chiesa, L.M.; Zecconi, A. Milk Quality and Safety in a One Health Perspective: Results of a Prevalence Study on Dairy Herds in Lombardy (Italy). Life 2022, 12, 786.
25. Vaona, F. Comparison of Potential Effects on the Profitability of the US MPP Application on Dairy Farms in Veneto (Italy) and Wielkopolska (Poland), University Degli di Padova. 2019. Available online: https://ssrn.com/abstract=3309657 (accessed on 1 September 2025).
26. Gonzalez, S. Dairy and Products Annual; USDA: Washington, DC, USA, 2024.
27. Song, P.X. Multivariate Dispersion Models Generated from Gaussian Copula. Scand. J. Stat. 2000, 27, 305–320.
28. Meyer, A.; Bond, K.; Van Winden, S.; Green, M.; Guitian, J. A Probabilistic Approach to the Interpretation of Milk Antibody Results for Diagnosis of Johne’s Disease in Dairy Cattle. Prev. Vet. Med. 2018, 150, 30–37.
29. Gronau, Q.F.; Singmann, H.; Wagenmakers, E.-J. Bridgesampling: An R Package for Estimating Normalizing Constants. J. Stat. Softw. 2020, 92, 1–29.
30. R Core Team. R: A Language and Environment for Statistical Computing. Available online: https://www.R-project.org/ (accessed on 1 September 2025).
31. Stan Development Team. RStan: The R Interface to Stan. Available online: https://mc-stan.org/ (accessed on 1 September 2025).
32. Christensen, R.; Johnson, W.; Branscum, A.; Hanson, T.E. Bayesian Ideas and Data Analysis: An Introduction for Scientists and Statisticians; CRC Press: Boca Raton, FL, USA, 2010; Volume 79.
33. Stevenson, M.; Sergeant, E.; Heuer, C.; Nunes, T.; Heuer, C.; Marshall, J.; Sanchez, J.; Thornton, R.; Reiczigel, J.; Robison-Cox, J.; et al. epiR: Tools for the Analysis of Epidemiological Data. Available online: https://CRAN.R-project.org/package=epiR (accessed on 1 September 2025).
34. Veres, K.; Lang, Z.; Monostori, A.; Ózsvári, L. Bayesian modelling in practice. Estimation of within-herd paratuberculosis prevalence in dairy cattle herds (Bayes-i modellezés a gyakorlatban – Tejelő tehénállományok állományon belüli paratuberkulózisérintettségének becslése). Magy. Allatorvosok Lapja 2024, 146, 323–337.
35. McAloon, C.G.; O’Grady, L.; Botaro, B.; More, S.J.; Doherty, M.; Whyte, P.; Saxmose Nielsen, S.; Citer, L.; Kenny, K.; Graham, D.; et al. Individual and Herd-Level Milk ELISA Test Status for Johne’s Disease in Ireland after Correcting for Non-Disease-Associated Variables. J. Dairy Sci. 2020, 103, 9345–9354.
36. Gelman, A.; Vehtari, A.; Dunson, D.B.; Rubin, D.B.; Stern, H.S.; Carlin, J.B. Bayesian Data Analysis, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2014; ISBN 978-1-4398-9822-2.

Table 1. Average size of dairy cattle herds reported in the literature and the size of herds used in the region-specific synthetic data.

Region	Average Herd Size ¹	Herd Size (Synthetic Data)	Source
Denmark	195, 205, 208 ²	200	[23]
Southern Italy	315	300	[8]
Lombardy	191	200	[24]
Veneto	100	100	[25]
Chile	83	100	[26]

¹ Large variability. ² Data from three consecutive years.

Table 2. Priors used along with Hungarian data.

Variable	Prior	Median (95th Percentile)
HTP	beta (150.589, 12.250)	0.927 (0.955)
$μ_{1}$	beta (65.700, 715.900)	0.084 (0.101)
$μ_{2}$	beta (134.195, 716.181)	0.158 (0.179)
$σ^{2}$	inverse gamma (37.030, 4.620)	0.126 (0.167)
${σ_{1}}^{2}$	inverse gamma (5.330, 0.070)	0.014 (0.032)
${σ_{2}}^{2}$	inverse gamma (6.050, 0.090)	0.016 (0.034)

HTP: herd true prevalence; $\mu_{1}$: mean conditional within-herd prevalence for primiparous cows in the region; $\mu_{2}$: mean conditional within-herd prevalence for multiparous cows in the region; $\sigma^{2}$: variance of the herd random effect; $\sigma_{1}^{2}$: variance of the parity random effect in primiparous cows; $\sigma_{2}^{2}$: variance of the parity random effect in multiparous cows.

Table 3. Priors used in the study.

Region	Parameter	Priors and Parameters Used in the Analysis	Median (95th Percentile)
Denmark	HTP	beta (425.110, 144.641)	0.746 (0.756)
	$μ_{1}$	beta (1.770, 43.210)	0.033 (0.095)
	$μ_{2}$	beta (3.850, 45.090)	0.073 (0.150)
Southern Italy	HTP	beta (5.030, 7.040)	0.412 (0.650)
	$μ_{1}$	beta (2.766, 46.335)	0.050 (0.118)
	$μ_{2}$	beta (6.018, 47.403)	0.108 (0.019)
Northern Italy (Lombardy, Veneto)	HTP	beta (13.320, 6.280)	0.686 (0.838)
	$μ_{1}$	beta (1.079, 18.347)	0.041 (0.157)
	$μ_{2}$	beta (2.347, 18.788)	0.099 (0.239)
Chile	HTP	beta (14.200, 0.700)	0.971 (0.999)
	$μ_{1}$	beta (12.896, 172.151)	0.068 (0.103)
	$μ_{2}$	beta (28.061, 173.629)	0.138 (0.181)

HTP: herd true prevalence; $\mu_{1}$: mean conditional within-herd prevalence for primiparous cows in the region; $\mu_{2}$: mean conditional within-herd prevalence for multiparous cows in the region.

Table 4. Input data format for the model. HERD_ID: integer, herd ID. COW_ID: cow ID, integer, unique. MULTIPAR: parity indicator—0 for primiparous cows; 1 for multiparous cows. POS: milk ELISA positivity indicator—1 for test-positive cows; 0 for test-negative cows.

HERD_ID	COW_ID	AGE
1000	1001	2.7186202
1000	1002	2.1694455
1000	1003	2.3301748
1000	1004	2.4988055

Table 5. Descriptive characteristics and true prevalence estimates (Herd 1).

Subgroup	Number	Apparent Prevalence	Estimated True Prevalence (95% CrI)
Primiparous cows	331	0%	0.1% (0;0.4)
Multiparous cows	390	0%	0.4% (0;1.4)
Overall	721	0%

Table 6. Bayes factor and posterior probability of the herd being infected (Herd 1).

Measure	Value	Decision
Bayes factor	0.016	infection strongly refuted
Posterior probability	17.54%	infection refuted

The decision rules are based on Kass and Raftery [20].

Table 7. Descriptive characteristics and true prevalence estimates for a sample of 150 cows from Herd 1.

Subgroup	Number	Apparent Prevalence	Estimated True Prevalence (95% CrI)
Primiparous cows	61	0%	0.4% (0;2.4)
Multiparous cows	89	0%	1.4% (0.1;5.6)
Overall	150	0%

Table 8. Bayes factor and posterior probability of the herd being infected for a sample of 150 cows from Herd 1.

Measure	Value	Decision
Bayes factor	0.071	infection refuted
Posterior probability	47.42%	infection weakly refuted

The decision rules are based on Kass and Raftery [20].

Table 9. Descriptive characteristics and true prevalence estimates (Herd 2).

Subgroup	Number	Number of Positives	Apparent Prevalence	Estimated True Prevalence (95% CrI)
Primiparous cows	205	12	5.9%	9.3% (4;15.9)
Multiparous cows	262	31	11.8%	17.7% (12.1;24.2)
Overall	467	43	9.2%

Table 10. Bayes factor and posterior probability of the herd being infected (Herd 2).

Measure	Value	Decision
Bayes factor	6.17 × 10²⁰	infection strongly supported
Posterior probability	100%	infection strongly supported

The decision rules are based on Kass and Raftery [20].

Table 11. Summary measures characterizing the model estimates for the true prevalence in truly infected herds.

	Primiparous Cows			Multiparous Cows
Country	Coverage of 95% CrI	Mean CWHP₁	Mean Half-Length of CrI	Coverage of 95% CrI	Mean CWHP₂	Mean Half-Length of CrI
Hungary	100%	9%	5.1%	94%	17.8%	5.2%
Denmark	100%	5.5%	6.3%	100%	10.2%	6.5%
Southern Italy	100%	2.6%	3.8%	100%	7.1%	4.8%
Lombardy	100%	5.8%	6.9%	85%	2.4%	3.6%
Veneto	100%	3.8%	6.2%	71%	3.9%	5.1%
Chile	95%	9.2%	7.9%	100%	16.2%	9.5%

CWHP₁ and CWHP₂: conditional within-herd prevalence of primiparous and multiparous cows. CrI: credible interval for the prevalence estimate. Coverage: proportion of herds where the simulated infection probabilities of individual cows in parity subgroups fell within the CrI.
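The two summary measures in Table 11, coverage and mean half-length of the credible intervals, can be computed as in the Python sketch below. The function name `summarize_intervals` and the four example herds are hypothetical and serve only to show the calculation; they are not results from the study.

```python
def summarize_intervals(true_values, lowers, uppers):
    """Return (coverage, mean half-length) for a set of credible intervals.

    true_values: simulated true prevalence per herd
    lowers, uppers: bounds of the 95% CrI per herd
    """
    n = len(true_values)
    covered = sum(lo <= t <= hi for t, lo, hi in zip(true_values, lowers, uppers))
    half_lengths = [(hi - lo) / 2.0 for lo, hi in zip(lowers, uppers)]
    return covered / n, sum(half_lengths) / n


# Hypothetical prevalences and interval bounds for four simulated herds.
truth = [0.10, 0.05, 0.20, 0.15]
lower = [0.05, 0.06, 0.12, 0.10]
upper = [0.15, 0.12, 0.30, 0.22]
coverage, mean_half = summarize_intervals(truth, lower, upper)
print(f"coverage: {coverage:.0%}, mean half-length: {mean_half:.4f}")
```

In this toy example three of the four intervals contain the true value, so the coverage is 75%, and the mean half-length is 0.0575.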

Table 12. Infection status of all 120 pseudo-herds (20 herds in each region) evaluated based on the Bayes factor, categorized according to Kass and Raftery [20].

Bayes Factor ¹	Infection	Number of Truly Infected Herds	Number of Truly Not Infected Herds
0–0.05	strongly refuted	0	2
0.05–0.33	refuted	12	13
0.33–1	weakly refuted	18	8
1–3	weakly supported	15	1
3–20	supported	5	0
>20	strongly supported	46	0

¹ not related to HTP prior.
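The Bayes factor categories of Table 12 can be expressed as a simple lookup. The Python sketch below is an illustration with a hypothetical function name; treating each upper bound as inclusive is an assumption, since the table does not state how values exactly on a boundary are assigned.

```python
def classify_bayes_factor(bf):
    """Map a Bayes factor to the evidence categories used in Table 12."""
    if bf < 0:
        raise ValueError("a Bayes factor cannot be negative")
    # (upper bound, label); inclusive upper bounds are an assumption.
    bands = [
        (0.05, "strongly refuted"),
        (0.33, "refuted"),
        (1.0, "weakly refuted"),
        (3.0, "weakly supported"),
        (20.0, "supported"),
    ]
    for upper, label in bands:
        if bf <= upper:
            return label
    return "strongly supported"


# Bayes factors reported in Tables 6, 8 and 10.
for bf in (0.016, 0.071, 6.17e20):
    print(bf, "->", classify_bayes_factor(bf))
```

Applied to the reported values, the lookup returns the same decisions as listed in Tables 6, 8 and 10.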

Table 13. Infection status of all 120 pseudo-herds (20 herds in each region) evaluated based on the posterior probability of infection, categorized based on Kass and Raftery [20].

Posterior Probability ¹	Infection	Number of Truly Infected Herds	Number of Truly Not Infected Herds
0–0.05	strongly refuted	0	0
0.05–0.25	refuted	1	4
0.25–0.5	weakly refuted	16	12
0.5–0.75	weakly supported	14	6
0.75–0.95	supported	12	2
0.95–1	strongly supported	53	0

¹ incorporating HTP prior.
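The link between the two measures is the standard identity that posterior odds equal the Bayes factor times the prior odds. The Python sketch below applies it and then assigns the categories of Table 13. It is an illustration only: the function names are hypothetical, inclusive upper bounds are an assumption, and using the mean of the Hungarian HTP prior from Table 2 as the prior probability of infection is a simplifying assumption, so the output need not match the reported posterior probabilities exactly.

```python
def posterior_probability(bayes_factor, prior_prob):
    """Posterior probability of infection: posterior odds = BF * prior odds."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def classify_posterior(prob):
    """Map a posterior probability to the evidence categories of Table 13."""
    # (upper bound, label); inclusive upper bounds are an assumption.
    bands = [
        (0.05, "strongly refuted"),
        (0.25, "refuted"),
        (0.5, "weakly refuted"),
        (0.75, "weakly supported"),
        (0.95, "supported"),
    ]
    for upper, label in bands:
        if prob <= upper:
            return label
    return "strongly supported"


# Assumed prior probability: mean of the beta(150.589, 12.250) HTP prior.
prior_prob = 150.589 / (150.589 + 12.250)
p = posterior_probability(0.016, prior_prob)  # Bayes factor from Table 6
print(round(p, 3), classify_posterior(p))
```

The example shows why a herd with a Bayes factor well below 1 can still have a non-negligible posterior probability of infection: a high prior herd-level prevalence multiplies the odds back up.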

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
