Next Article in Journal
An Analysis of Vectorised Automatic Differentiation for Statistical Applications
Previous Article in Journal
Determinants of Blank and Null Votes in the Brazilian Presidential Elections
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Theoretical Advancements in Small Area Modeling: A Case Study with the CHILD Cohort

by
Charanpal Singh
†,‡ and
Mahmoud Torabi
*,‡
Department of Community Health Sciences, University of Manitoba, Winnipeg, MB R3E 0W3, Canada
*
Author to whom correspondence should be addressed.
Current address: S113 Medical Services Building, 750 Bannatyne Ave., Winnipeg, MB R3E 0W3, Canada.
These authors contributed equally to this work.
Stats 2025, 8(2), 39; https://doi.org/10.3390/stats8020039
Submission received: 26 March 2025 / Revised: 2 May 2025 / Accepted: 12 May 2025 / Published: 16 May 2025

Abstract

:
Developing accurate predictive models in statistical analysis presents significant challenges, especially in domains with limited routine assessments. This study aims to advance the theoretical underpinnings of longitudinal logistic and zero-inflated Poisson (ZIP) models in the context of small area estimation (SAE). Utilizing data from the Canadian Healthy Infant Longitudinal Development (CHILD) study as a case study, we explore the use of individual- and area-level random effects to enhance model precision and reliability. The study evaluates various covariates’ impact (such as mother’s asthma, mother wheezed, mother smoked) on model performance to predict child’s wheezing, emphasizing the role of location within Manitoba. Our main findings contribute to the literature by providing insights into the development and refinement of small area models, emphasizing the significance of advancing theoretical frameworks in statistical modeling.

1. Introduction

Predictive modeling in statistics often involves estimating parameters within complex data structures. This paper focuses on the theoretical development of longitudinal logistic and zero-inflated Poisson (ZIP) models, addressing challenges posed by small sample sizes and sparse data. The context of asthma prevalence in Manitoba serves as a case study to illustrate these models’ application. However, the primary objective is to advance the theoretical framework and methodologies underlying these models, beyond asthma-specific applications.
Asthma is a chronic disease characterized by airway inflammation, leading to variable expiratory airflow limitation and symptoms such as shortness of breath, chest tightness, coughing, and wheezing [1]. In Canada, asthma prevalence is highest among individuals aged 1–34 years [2], with incidence rates gradually increasing each year, particularly among younger demographics [3,4]. Manitoba, a sparsely populated province with the majority of its population concentrated in Winnipeg, presents unique challenges in studying asthma due to small sample sizes in many areas.
This study aims to develop robust statistical models for small area estimation (SAE) to assess childhood wheezing—a symptom of asthma—and its associated risk factors, leveraging longitudinal data from the Canadian Healthy Infant Longitudinal Development (CHILD) study. By focusing on methodological and statistical techniques, this study seeks to enhance the precision and accuracy of estimating wheezing severity across different regions of Manitoba.
To address the issue of small sample sizes, we employed small area models to analyze wheezing rates across different regions. A “small area” is defined as any domain for which direct estimates of adequate precision cannot be produced [5]. Small area models leverage auxiliary variables and random area effects to improve predictions of wheezing rates and severity in geographical areas [5]. Our goal is to refine the theoretical framework and methodologies of these models, advancing the field of small area estimation and providing more accurate and reliable estimates for public health research.
Longitudinal logistic models are designed to handle binary or ordinal response variables, particularly when data are collected over multiple time points. These models account for within-subject correlations and allow for the inclusion of random effects. In this study, we utilized hierarchical Bayesian logistic models to estimate the probability of wheezing severity at different follow-ups, incorporating individual-, area-, and interaction-level random effects.
ZIP models are suitable for count data with an excess number of zeroes, which is common in longitudinal health studies. These models assume that the population consists of two groups: those who never experience the event (always zero) and those who may experience the event (Poisson-distributed counts). The ZIP models in this study incorporate area and follow-up level effects to address overdispersion and excess zeros in wheezing episode counts.
The Canadian Healthy Infant Longitudinal Development (CHILD) study provided a unique opportunity to explore these theoretical advancements. The CHILD study includes detailed longitudinal data on childhood wheezing and associated risk factors, collected from a cohort of children born between 2008 and 2012 and followed up until 2.5 years of age. The data were first accessed for this study on 6 June 2020. The time points for this study were prenatal, 3, 6, 12, 18 months and 2 and 2.5 years. The Manitoba sub-cohort was used for this study and members were recruited within a radius of Winnipeg and Morden-Winkler; families who moved out of this region were still followed. For this particular study, participants were assumed to have the same location throughout the study. This was done using postal codes from the 2-year follow-up, since this follow- up contained the most unique postal codes. Furthermore, individuals with incomplete follow-ups for postal codes or missing postal codes were removed, which resulted in a sub-cohort of 690 families. The responses from the home environment questionnaire indicated that between 80 and 100 participants moved between each follow-up in Manitoba. The primary outcome of the CHILD study was expert physician diagnosed asthma, but wheezing was one of the secondary outcomes. However, it can be challenging to make a diagnosis of asthma in children aged five years since it is not possible to have routine airflow limitation measurement within younger age groups [1]. Wheezing symptoms, however, can still be present in children under five years old. Thus, the difficulty of diagnosing asthma in younger children means that wheezing can be used as an indicator of “possible” asthma diagnosis in the future [6].
For the purpose of this study, Manitoba was divided into small geographical regions (areas) and thus more information was provided regarding which regions had higher rates of wheezing and which factors contributed. The province has five regional health authorities (RHAs) created from the initial eleven RHAs in 2012. These five RHAs are further divided into 96 regional health authority districts (RHADs); these RHADs were intended to be used as small areas for this study. However, for this study, only 60 of the 96 RHADs were recruited by the CHILD cohort. Thus, the 60 recruited RHADs were used as small areas in this study and the RHADs which were not recruited were greyed out in the figures throughout this study. The Manitoba sub-cohort, comprising 690 families, was used to analyze wheezing severity and its predictors.
By employing hierarchical Bayesian logistic models and ZIP models, we aimed to identify significant predictors of wheezing severity and rates, such as maternal asthma, smoking, and environmental factors. The integration of individual- and area-level covariates allowed us to capture the variability within small areas and enhance the precision of our estimates.

2. Small Area Models

2.1. Linear Mixed Model

Basic types of small area models are area-level and unit-level models. Area-level models are also called aggregate models, which relate small area means to area-specific auxiliary covariates [5,7,8]. Unit-level models relate the unit values within a small area to unit-specific auxiliary variables. Unit-level auxiliary variables provide individual-level information [9]. Assuming the response of the model is y k , the parameter of interest is a function of the small area means Y ¯ k for area k such that μ k = g ( Y ¯ k ) . In this case, g ( · ) is the link function that connects the response y k with the covariates and random components of the model. In the case of the linear model, the link function g ( · ) is the identity, for the case of the logistic model, the link function g ( · ) is logit, and for the Poisson model, the link function g ( · ) is log. It is crucial to understand that the end goal of small area analysis is to estimate the small area parameter μ k . Assume the indirect estimator of μ k is
μ k ^ = g Y ¯ ^ k = μ k + e k , k = 1 , , K ,
where K is the number of small areas and the error terms e k ( k = 1 , , K ) are independently and normally distributed with mean 0 and variance ψ k . Note that it is assumed that ψ k is known to avoid identifiability issues. Further, if area-level auxiliary data x k = x 1 k , , x p k T are available then a linear model can be formed as
μ k = x k T β + b k v k , k = 1 , , K ,
where b k ’s are the known positive constants; these are to set optional different random effects for each small area. The v k term represents the area-specific random effects with the assumption that all v k ’s are independent and identically distributed (iid) with mean 0 and variance σ v 2 , as denoted by v k i i d N 0 , σ v 2 . Note that v k ’s are assumed to be normally distributed but are not limited to this distribution. Lastly, β is a vector of regression coefficients. Combining (1) with (2) results in
μ k ^ = x k T β + b k v k + e k , k = 1 , , K ,
which is the final form of the basic area-level model and is also referred to as the Fay–Herriot (FH) model [10]. Now, if there are unit-specific auxiliary data available within each small area, then the basic unit-level linear model can be used to model small areas. Assuming that i is any element, say an individual within any small area k, and n k represents the sample size in each area, then the model is
y i k = x i k T β + v k + e i k , i = 1 , , n k , k = 1 , K .
In Equation (4), x i k = ( x i k 1 , , x i k p ) T are the auxiliary data and β are the corresponding regression coefficients. Like the FH model, v k represents the area-level random effects while the sampling error is e i k N 0 , t i k σ e 2 with known constants t i k but unknown variance σ e 2 . It should be kept in mind that the parameters of interest like the FH model are the small area means, Y ¯ k . Note that one can also incorporate area-level and unit-level covariates into the unit-level model (4) as is the case for the CHILD cohort.
In general, to estimate small area means, the best predictor (BP) is used. The BP of a parameter of interest, such as the small area mean is the conditional expectation of the parameter given the data and the model parameters. Thus, the BP assumes that the model parameters are known. The BP does not have a closed form for most non-linear models. Thus, numerical methods were needed to estimate such models. In particular, Bayesian methods were used for this study; hence, diffuse priors were implemented on the model parameters β , σ v 2 , and σ e 2 .

2.2. Logistic Mixed Model

Suppose now that y i k is binary, and we are interested in the small area proportions P k , k = 1 , K , then the hierarchical Bayesian (HB) area-level logistic model [7,8] is
y k p k Binomial p k , n k ,
logit p k = x k T β + v k , with v k iid N 0 , σ v 2
and priors,
f ( β ) 1 and σ v 2 Gamma ( a , b )
Note the diffuse prior chosen for the inverse of variance parameters is a gamma distribution with shape and location parameters 0.001 ( a = b = 0.001 ). If unit-specific auxiliary data are available within each small area, and we are interested in the small area proportions Y ¯ k = P k = i = 1 N k y i k / N k , k = 1 , K , where N k is the number of individuals in area k, then the unit-level logistic model is
y i k p i k Bernoulli p i k ,
logit p i k = x i k T β + v k , with v k iid N 0 , σ v 2
and priors,
f ( β ) 1 and σ v 2 Gamma ( a , b )

2.3. Poisson Mixed Model

Now, we consider the count data in the response by assuming that y k are area-level counts and are distributed Poisson variables [7,8]. The parameter of interest in this model would be λ k , the true average rate of wheezing in the k t h area. Thus, the model becomes
l o g λ k = x k T β + v k ,
where k = 1 , , K and v k iid N ( 0 , σ v 2 ) . The same priors as in Section 2.2 were used for β and σ v 2 .

2.4. Inference

A Monte Carlo Markov Chain (MCMC) was used to estimate the parameters of interest for the above models (5), (6), and (7). We used the R package Rjags to implement the MCMC. All models in thus study were run on Rjags; the initial models had 200,000 iterations with a thinning degree of 18. Diagnostic tools such as the Gelman–Rubin test were used to determine if the model converged to the target posterior distribution. The Gelman–Rubin diagnostic evaluates MCMC convergence by analyzing the differences between multiple Markov chains—a small within-chain variance indicates convergence [11]. Thus, multiple chains were sampled from each, with burn-in to ensure accurate results. If the initial model did not converge according to the Gelman diagnostics for all covariates, then the model was updated each time with 10,000 iterations. The model was updated until convergence was reached. Additionally, the 95% credible intervals for the posterior distributions were calculated using the equal-tail interval method, which is the default method in Rjags. Particular attention was made to the choice of priors and structure of the random effects. The Deviance Information Criterion (DIC) was used for model comparison and to determine model adequacy. DIC is often viewed as a trade-off between model adequacy and complexity [12]. DIC statistics were used throughout the model building stages of this study.

3. Proposed Statistics Models

For the CHILD study, our interest was to determine which factors affected the rate of wheezing and severity over time (follow-ups) along with geographical information. Wheezing severity was either a binary or an ordinal variable; thus, longitudinal logistic area models were required [13,14,15]. Furthermore, a model to analyze the number of wheezing episodes was needed for each follow-up. Thus, extensions of the logit (6) and Poisson (7) models were needed for this study. In general, the models have different link functions dependent on the type of response variable. In the context of the CHILD study, the response y i j k was dependent on participant i, follow-up j, and area k. Assuming, without loss of generality, the model has three random effects which can depend on any combination of i, j and k and with response variable y i j k and corresponding X i j k , then, letting μ i j k = E ( y i j k ) , we obtain,
g ( μ i j k ) = X i j k T β + f i j + v k + I i j k .
In Equation (8), f i j was the random effect of the individual and follow-up. In this model, we assume the disturbance is an AR(1) (autoregressive-one) effect. I i j k is the interaction random effect of all three levels interacting with one another and v k is simply the random effect of each small area.
From model (8), g ( · ) is the link function that connects y i j k with the covariates and random components of the model. The link functions of interest for this study were the logit and log, as shown below in Equations (9) and (10), for the logistic and Poisson models, respectively. The logit link function is used when the response variable is assumed to follow a binomial, multinomial or ordinal distribution (e.g., a child wheezed or not in our CHILD study). The log link function is primarily used when the response is assumed to follow a Poisson distribution (e.g., wheezing episodes in our CHILD study). So, accordingly, we have
g μ i j k = logit μ i j k = log μ i j k 1 μ i j k
g ( μ i j k ) = log μ i j k
The three types of models of interest for this study were as follows.

3.1. Ordinal Logistic Model

A logistic model with the assumption that the response, y i j k , is ordinal was used to quantify how much of an impact location and time had on wheezing severity. Here, y i j k represented the childhood wheezing severity of child i in a particular area k at follow-up j.
Now, the ordinal logistic model is described. In our CHILD study, y i j k have three levels l = 1 , 2 , 3 ; thus, there are two categories c = 1 , 2 (with another level as a reference category). We then obtain
log Prob y i j k c Prob y i j k > c = a c + X i j k T β i j k + f i j + v k + I i j k ,
where i represents the individual, j is follow-up, and k is the small area, and
v k N 0 , σ v 2 f i j = ρ f i j 1 + ϵ i j , ϵ i j N 0 , σ f 2 f i j f i j 1 N ρ f i j 1 , σ f 2 I i j k N 0 , σ I 2 1 < ρ < 1 .
Thus, the two logit equations with Q ( m ) = Prob y i j k c are
logit [ Q ( 1 ) ] = a 1 + X i j k T β i j k + f i j + v k + I i j k logit [ Q ( 2 ) ] = a 2 + X i j k T β i j k + f i j + v k + I i j k .
Note, in this case, the vector of covariates X i j k does not include the intercept term since this is represented by a 1 and a 2 . In general, a c was a level specific intercept with the restriction that a 1 < < a c such that c = 1 , , l 1 .

3.2. Binary Logistic Model

Due to the lack of viable data, the severity of wheezing response variable was transformed to two levels. This was accomplished by merging responses 2 and 3 into one level, as the numbers in these two levels were small compared to level 1. The resulting model is described as follows:
y i j k p i j k Bernoulli p i j k .
Then, p i j k = Prob ( y i j k = 1 ) is the probability that the individual i had wheezing severity at follow-up j and small area k. Then, the logistic model is
log p i j k 1 p i j k = X i j k T β i j k + f i j + v k ,
where
v k N 0 , σ v 2 f i j = ρ f i j 1 + ϵ i j , ϵ i j N 0 , σ f 2 f i j f i j 1 N ρ f i j 1 , σ f 2 1 < ρ < 1 ,
note that we ignored the interaction term I i j k in model (12) to reduce the number of parameters and variability in the model.

3.3. Zero-Inflated Poisson Model

A different model was required to model the count data from the CHILD study. While a Poisson regression model could have been used, there were issues with this approach, including excess zeros in the data and overdispersion. As a result, a Zero-Inflated Poisson (ZIP) model was used to allow for excess zeros in the dependent variable and overdispersion [16]. It was assumed that there were two types of participants in the study: those with always-zero counts and those with counts predicted by the standard Poisson. The ZIP model is a mixture of probabilities from the two groups and allows for both overdispersion and excess zeros. Area and follow-up level effects were introduced to the ZIP model, similar to previous logistic models, but to reduce the number of parameters, I i j k was not included in the model.
Θ i j k = X i j k T β i j k + f i j + v k ,
where
v k N 0 , σ v 2 f i j = ρ f i j 1 + ϵ i j , ϵ i j N 0 , σ f 2 f i j f i j 1 N ρ f i j 1 , σ f 2 1 < ρ < 1 .
Then, the ZIP model becomes
Prob y i j k = m = π i j k + 1 π i j k exp λ i j k if m = 0 1 π i j k λ i j k y i j k exp λ i j k y i j k ! if m > 0
such that log ( λ i j k ) = Θ i j k , l o g i t ( π i j k ) = Θ i j k , and m is the number of wheezing episodes. Here, λ i j k represents the expected rate of the Poisson component, while π i j k represents the probability of excess zeroes. Then, the expected value of the ZIP model is E y i j k = ( 1 π i j k ) λ i j k . Similarly, the variance of the ZIP model is Var y i j k = λ i j k ( 1 π i j k ) ( 1 + π i j k λ i j k ) .
In ZIP models, it is possible to choose different predictors for the true and excess zeros if it is expected that different variables are driving the presence or absence of the outcome in the data. For simplicity, the same covariates for both components were fitted in the ZIP model, as shown by the expression Θ i j k . Typically, we are less interested in the modeling of false zeros; thus, a diffuse Bayesian prior for π i j k was also implemented as an alternative model such that l o g i t ( π i j k ) = l o g i t ( π ) = γ 0 , γ 0 N (0,10,000).

3.4. Predicted Proportions and Rates

In addition, to study the effects of the covariates over time, we were also interested to investigate which areas (regions) are more at risk in terms of wheezing episodes and also their severity. Thus, we needed area-level models to predict the proportions and rates of wheezing. We considered the area-level binomial logit model, where y k was the number of children who wheezed at area k for the first follow-up. The parameter of interest was p k = h ( x k T β + v k ) , where h ( t ) = 1 / { 1 + exp ( t ) } . Since the true population parameters were not known, they were estimated using the logit-model by p ^ k H B = h ( x k T β ^ + v ^ k ) directly from the MCMC samples. Here, β ^ was the vector of the estimated fixed effects and v ^ k was the predicted area-specific random effect. Note that p ^ k H B represents the probability or proportion of wheezing in area k.
Similarly, for the ZIP model, our interest was the average count per area. We again restricted our ZIP model to the first follow-up. The model was described as
Θ i k = x i k T β + v k
where x i k are the unit-level covariates and v k N 0 , σ v 2 .
Then, the ZIP model became
Prob y i k = m = π i k + 1 π i k exp λ i k if m = 0 1 π i k λ i k y i k exp λ i k y i k ! if m > 0
such that log ( λ i k ) = Θ i k and logit ( π i k ) = Θ i k . Then, we assumed the simple case with covariates x i k at area-level such that x i k = x k . Then, the area-level ZIP model became
Θ k = x k T β + v k ,
where x k were the area-level covariates.
Then,
Prob y k = m = π k + 1 π k exp λ k if m = 0 1 π k λ k y k exp λ k y k ! if m > 0
Now, our parameter of interest was λ k = exp ( x k T β + v k ) . Using the parameter estimates of the ZIP model, this became λ ^ k H B = exp ( x k T β ^ + v ^ k ) . Here, λ ^ k H B represents the average number of wheezing episodes in area k.

4. Results

Simulation studies were conducted on the ordinal logistic and ZIP models to evaluate the performance of the proposed models. Furthermore, a simulation was conducted on a simpler binary logistic model. The results of the simulations are displayed in Appendix B. Due to the lack of viable data, the severity of the wheezing response variable was transformed to run a binary logistic model by merging responses 2 and 3 into one level. However, the binary logistic model yielded inflated odds ratios. After model building using a step-up approach, a simpler Hierarchical Bayes unit-level model was fitted [5]. The response variable was the severity of wheezing at 3-month follow-up, along with covariates at the prenatal and 3-month time points. The model findings are presented in Table 1. The unit-level model results demonstrated that maternal asthma and smoking during pregnancy increased the probability of developing wheezing, while living in close proximity to a farm and having furry pets reduced the odds of developing wheezing. Maternal asthma during pregnancy emerged as the sole significant predictor in this model.
To analyze the wheezing rates of children in the cohort, ZIP models were fitted. Model building was performed using a step-up approach by adding covariates that were statistically and clinically significant one at a time. The results of the most viable final ZIP model are shown in Table 2. The unit-level ZIP was considered the most viable model since the preliminary models indicated no autocorrelation between follow-ups, and thus the AR(1) was not needed in the model. The unit-level ZIP model used a diffuse prior for the logit component of the model, which accounted for the excess zeros in the data.
The results of unit-level ZIP model in Table 2 indicated that there were a high number of false zeros in the data, since the estimate of 1 π meant that only 19% of individuals have true zero counts. This was to be expected since many of the individuals did not wheeze; thus, their number of wheezing episodes was 0. The results also implied that maternal wheezing seemed to marginally increase the expected rate of wheezing episodes in Manitoba. However, maternal smoking and asthma greatly increased the expected rate of wheezing episodes, while only maternal asthma was statistically significant. Interestingly, living near a farm also increased the rate of wheezing episodes among those who already had wheezing symptoms. The key distinction between the ZIP and logistic model was that the ZIP model was focused on individuals in the cohort who already had wheezing, as opposed to the odds of developing wheezing, which was the focus of the logistic model.

Proportions and Rates of Wheezing

The area-level models were fitted after individual-level auxiliary variables were converted to area-level covariates. It should be mentioned that aggregating the sample data resulted in the loss of some information, accounting for why some covariates had only a marginal contribution to the odds of wheezing. However, the main purpose of the area-level models was the prediction of wheezing proportions and rates for each RHAD. The results of the final area-level (after model building) are reported in Table 3.
The results of Table 3 show that the proportion of prenatal maternal asthma and wheezing in an area were marginally associated with increased odds of developing wheezing, while living near a farm slightly decreased the odds of developing wheezing. Interestingly, maternal smoking had no effect on the odds of developing wheezing. The proportion of predicted wheezing in each area k (= 1 , , K ) was obtained by p ^ k H B = h ( x k T β ^ + v ^ k ) by using the results of Table 3. Then, the predicted proportions were mapped to their corresponding RHADs in Manitoba (Figure 1). Recall that for this study, we had information for 60 (out of 96) RHADs; thus, the rest appeared as missing in the figures. For comparison purposes, the area-level wheezing proportions were obtained from the data by dividing wheezing severity at 3 months by the sample size for each area, as shown in Figure 2. The proportions obtained from the raw data indicated some areas with almost no proportions of wheezing, while the predicted proportions obtained from the model showed slightly higher proportions. The reason for this was that the area-level logistic model predicted the proportions of wheezing severity for each small area along with taking into account the auxiliary information available for each area. This resulted in smoothed predictions towards the mean, which is shown in Figure 1.
The results of the final area-level ZIP (after model building steps) are reported in Table 4. The area-level ZIP model used maternal asthma and farm as covariates for both the logit (excess zero) and Poisson component, and thus did not use a prior for the excess zero component of the model. The area-level ZIP model indicated that living within a 100 m from a farm decreased the expected rate of wheezing episodes by 3%, while prenatal maternal asthma increased the expected rate of wheezing episodes by 1%.
Using the results of Table 4, the predicted average number of wheezing episodes in each area k (= 1 , , K ) was calculated using λ ^ k H B = exp x k T β ^ + v ^ k . These predicted average wheezing episodes were then mapped to their corresponding RHADs in Manitoba (Figure 3). For comparison purposes, the area-level wheezing episodes at 3 months obtained from the raw data were also mapped (Figure 4). The average number of wheezing episodes obtained directly from the data indicated that many areas have no or a small number of wheezing episodes. In particular, regions 27, 28, and 45 had no wheezing episodes while the smoothed predictions using the proposed new ZIP model (Figure 3) had at least four episodes of wheezing in those areas.

5. Discussion

The use of advanced statistical modeling techniques, particularly small area estimation (SAE) models, has proven to be valuable in this study. The application of Bayesian methods to both area-level and unit-level data has allowed for a more nuanced understanding of childhood wheezing patterns across Manitoba.
The simulation studies revealed that the proposed models performed well in terms of bias and the mean squared error (MSE) of the model parameter estimates. One of the key strengths of this study is the rigorous model validation process, including the use of the Gelman–Rubin diagnostic for ensuring MCMC convergence and the DIC statistics for model comparison. These steps have ensured that the models used are both statistically sound and robust. The choice of the hierarchical Bayesian approach provided flexibility in modeling the complex structure of the data, accommodating both individual and area-level random effects. We have selected non-informative priors within the models to prioritize data-driven influence over subjective bias.
The inclusion of auxiliary variables, such as maternal health history and environmental factors, has enhanced the precision of the estimates. These variables have acted as significant predictors, explaining a substantial portion of the variation in wheezing outcomes. The models demonstrated that maternal asthma and smoking are crucial risk factors, significantly increasing the odds and rates of wheezing in children.
Given the sparsity of the population in many regions of Manitoba, the use of SAE models was particularly appropriate. These models effectively drew strength from related areas, improving the estimates in regions with small sample sizes. This methodological approach is crucial for public health research in sparsely populated areas, ensuring that findings are reliable and actionable. As a practical guideline, we recommend using the proposed ordinal logistic model (Section 3.1) for longitudinal data with ordinal outcomes when the goal is to predict outcomes at the area level. Similarly, we suggest the proposed ZIP model (Section 3.3) for longitudinal count data with many zeros, with the same area-level prediction objective.
The implementation of the ZIP model addressed the issue of overdispersion and excess zeros in the count data of wheezing episodes. This model differentiated between children who never wheezed and those who did, providing a more accurate representation of the data. The ZIP model’s findings regarding the influence of environmental factors, such as proximity to farms, have important implications for understanding the triggers of wheezing in susceptible populations.
Notably, we encountered complex computational challenges in analyzing longitudinal data, particularly with high-dimensional covariates/areas and relatively small sample sizes. The interaction between these factors posed significant obstacles to achieving robust and efficient analytical methods. To overcome these difficulties, we leveraged high-performance computing resources to address computational constraints and employed small area estimation techniques to mitigate issues related to sample size limitations.

6. Conclusions

The methodological rigor applied in this study highlights several key findings and their implications for public health policy:
The study underscores the critical role of maternal health, particularly asthma and smoking, in influencing wheezing outcomes in children. These findings highlight the need for targeted interventions aimed at improving maternal health as a strategy to reduce childhood wheezing and asthma.
The use of Bayesian SAE models has provided more accurate and reliable estimates of wheezing prevalence and severity across Manitoba. These models are particularly valuable for public health surveillance in regions with sparse populations, offering insights that would not be possible with traditional methods.
The study has revealed the dual role of environmental factors, such as farm proximity, in influencing wheezing outcomes. While farm living appears to reduce the odds of developing wheezing, it increases the frequency of episodes among those already affected. These nuanced findings can inform targeted public health interventions and policies.
The study highlights the potential for further research using more comprehensive and longitudinal data. Future studies could expand on this work by incorporating additional environmental and genetic factors, as well as exploring the long-term impacts of early childhood wheezing on asthma development.
These innovative techniques can also be applied across different provinces and regions to examine variations in asthma and wheezing over both geographic areas and time.

Author Contributions

Conceptualization, methodology, software, resources, validation, visualization, formal analysis, investigation, data curation, writing—original draft preparation, writing—review and editing, C.S.; Conceptualization, methodology, validation, supervision, writing—review and editing, M.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Health Research Ethics Board (HREB) of the University of Manitoba, with ethics number HS23359 (H2019:429). The approval was granted on 28 October 2024, and is valid until 15 November 2025. The project, titled “Modelling Severity of Wheezing in Small Areas in Manitoba Using Longitudinal Data”, underwent annual review and was found to be ethically acceptable for research involving human participants.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The dataset used in this study cannot be shared publicly due to copyright and privacy concerns for participants involved in the CHILD Cohort Study. For more details on data access policies, please visit https://childstudy.ca/for-researchers/data-access/ (accessed on 25 March 2025). The study utilized individual-level participant data, which are classified as controlled access.

Acknowledgments

We are very grafeful to Elinor Simons and Zeinab Mashreghi for their help in guiding and focusing the clinical and statistical objectives of this paper throughout multiple revisions and rewrites. We also sincerely thank the three reviewers for their constructive and insightful comments, which have significantly enhanced the quality and clarity of this manuscript. We thank the CHILD Cohort Study (CHILD) participant families for their dedication and commitment to advancing health research. CHILD was initially funded by CIHR and AllerGen NCE. Visit CHILD at childstudy.ca.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Appendix A. Supplementary Tables

Table A1. Severity of wheezing frequency distribution for six follow-ups within the CHILD cohort. Scale of 0–3, 0 being no evidence of wheezing.
Table A1. Severity of wheezing frequency distribution for six follow-ups within the CHILD cohort. Scale of 0–3, 0 being no evidence of wheezing.
Severity of Wheezing
Severity 3 Months 6 Months 1 Year 18 Months 2 Years 2.5 Years
0618596576567600581
134292623124
219161961712
387122139
Missing1142571122964
Table A2. Region code with associated location and small area number.
Table A2. Region code with associated location and small area number.
CodeRegionSmall Area
W002Assiniboine South1
W11ADowntown2
W11BDowntown3
W03AFt. Garry4
W03BFt. Garry5
W09AInkster6
W09BInkster7
W10APoint Douglas8
W10BPoint Douglas9
W07ARiver East10
W07BRiver East11
W07CRiver East12
W07DRiver East13
W12ARiver Heights14
W12BRiver Heights15
W08ASeven Oaks16
W08BSeven Oaks17
W08CSeven Oaks18
W05ASt. Boniface19
W05BSt. Boniface20
W01ASt. James-Assiniboia21
W01BSt. James-Assiniboia22
W04ASt. Vital23
W04BSt. Vital24
W006Transcona25
WE34Souris River26
WE31Asessippi27
WE36Spruce Woods28
IE42Arborg/Riverton29
WE32Little Saskatchewan30
IE31Beausejour31
WE33Turtle Mountain32
SO14Cartier/SFX33
IE43St. Laurent34
WE13Riding Mountain35
WE15Dauphin36
SO25St Pierre/DeSalaberry37
SO22Carman38
SO26Red River South39
IE41Gimli40
WE35Whitemud41
WE11Duck Mountain42
SO45Hanover43
SO44Steinbach44
IE32Pinawa/LDB45
SO11Seven Regions46
SO31Lorne/Louise/Pembina47
SO23MacDonald48
WE12Porcupine Mountain49
SO24Morris50
SO12MacGregor51
SO13Rural Portage52
SO15City of Portage la Prair53
SO33Altona54
IE21Stonewall/Teulon55
SO36Roland/Thompson56
IE22Wpg Beach/St. Andrews57
IE11Selkirk58
IE23St. Clements59
IE24Springfield60
SO32Stanley61
SO34Morden62
SO35Winkler63
SO42Tache64
WP21Winnipeg Churchill65
IE52Fisher/Peguis66
NO11Flin, Snow, Cran, Sher67
NO13LL/MC, LR, O-P(SIL), PN(GVL)68
IE51Powerview/PF69
NO21GR/Mis, ML/Mos, Eas/Che70
SO43Ste Anne/LaBroquerie71
SO21Nortre Dame/St. Claude72
NO14Thomp, Myst Lake73
SO41Niverville/Richot74
IE61Northern Remote75
SO46Rural East76
NO22Puk/Mat Col CN77
NO23SayD(TL), Bro/BL, NoL(Lac)78
NO26Bu(OH), MS(GR), GLN/GLFN79
NO28Norway House/NH CN80
NO27Cross Lake/Pimi CN81
NO31Island Lake82
IE33Whiteshell83
WE14Agassiz Mountain84
WE16Swan River85
WE23Bdn Downtown86
WE24Bdn South End87
WE22Bdn North Hill88
WE25Bdn East End89
WE21Bdn West End90
IE53Eriksdale/Ashern91
NO12The Pas/OCN, Kels92
NO15Bay Line93
NO16Gillam, Fox Lake CN94
NO25Sham, YorkF, Tat(SPL)95
NO24Nelson House/NCN96

Appendix B. Simulation Studies: Theory and Results

Appendix B.1. Ordinal Logistic Model

A simulation study was performed in which data were simulated with an ordinal response; the response represented the wheezing severity scale found on the CHILD study questionnaires. The dependent variable or response variable y i j k was randomly generated with 1600 random values, all ranging from 1–3 using the integer uniform distribution or trinomial distribution. The probability of generating each integer 1 , 2 , 3 was 1 3 ; thus, equal probabilities. The aim of the simulation was to investigate the performance of the proposed ordinal logistic model with a similar set-up as the data structure from the CHILD cohort. The ordinal logistic model used for the simulation is described as follows. Let y i j k have three levels l = 1 , 2 , 3 ; thus, there were two categories c = 1 , 2 (with another level as a reference category). We then obtain
log Prob y i j k c Prob y i j k > c = a c + β 1 X i + β 2 X i j + β 3 X i j k + f i j + v k + I i j k ,
where i represents the individual, j is follow-up, and k is small area, and
v k N 0 , σ v 2 f i j = ρ f i j 1 + ϵ i j , ϵ i j N 0 , σ f 2 f i j f i j 1 N ρ f i j 1 , σ f 2 I i j k N 0 , σ I 2 1 < ρ < 1 .
Thus, the two logit equations with Q ( m ) = Prob y i j k c are
logit [ Q ( 1 ) ] = a 1 + β 1 X i + β 2 X i j + β 3 X i j k + f i j + v k + I i j k logit [ Q ( 2 ) ] = a 2 + β 1 X i + β 2 X i j + β 3 X i j k + f i j + v k + I i j k .
For the simulation, there were 20 individuals, 4 follow-ups, and 20 small areas. The covariates X i , X i j , and X i j k were generated from a binary distribution with the probability of generating each integer 0 , 1 being equal to 1 2 . The simulation was conducted by fitting the simulated data with the ordinal logistic model using Rjags. Diffuse priors were used for all the model parameters. Coefficients from the fitted model were then to be used as true parameters, since a starting point was needed to begin the simulations.
These estimates were then used as the true parameters for the simulation study. The initial goal was to evaluate the performance of the model parameters. The simulation procedure was as follows:
Algorithm A
Step 1:
Consider the estimates from Table A3 as the true parameters, represented by the vector δ
Step 2:
Use the true parameters to generate the random effects, which, in turn, generate the new response values
Step 3:
Refit the model using the new generated response values
Step 4:
Extract the estimates from the newly fitted model
Step 5:
Repeat the cycle by going to step 2
Table A3. Ordinal Simulation Study: Three chains are used with 60,000 samples from each chain. Furthermore, the Gelman diagnostics indicated convergence of model parameters. These estimates were used as the true parameters for the ordinal logistic model.
Table A3. Ordinal Simulation Study: Three chains are used with 60,000 samples from each chain. Furthermore, the Gelman diagnostics indicated convergence of model parameters. These estimates were used as the true parameters for the ordinal logistic model.
ParameterEstimate
a 1 −0.685
a 2 0.772
β 1 −0.090
β 2 0.068
β 3 0.170
ρ 0.378
σ v 0.145
σ f 0.162
σ I 0.169
In the above steps, the model was essentially reversed in the sense that the aim was to simulate values of the response y i j k using the ordinal model described before and thus replacing the original values of y i j k with the simulated values of y i j k and then refitting the same ordinal model; this was performed 500 times. Each simulation run produced its own estimates ( δ ^ r ; r = 1 , , 500 ) of the model parameters. Then, the bias, relative bias (R. Bias), and MSE (Mean Square Error) were calculated using the three formulas below, assuming element-wise operation of the vectors. The bias and relative bias were used to quantify the average difference between the true parameters and the estimates within the simulations runs. The MSE was used to assess, on average, how far an estimator was from the true parameter within the simulation runs.
Bias = 1 500 r = 1 500 δ δ ^ r Relative Bias ( % ) = 1 500 r = 1 500 δ δ ^ r δ × 100 MSE = 1 500 r = 1 500 δ δ ^ r 2
The results from Table A4 indicated the model parameter estimates have very small bias and MSE. The relative bias for σ v is higher than the other parameters, but overall it is acceptable. A possible explanation is that the number of small areas may not have been enough in this simulation set-up to reflect the performance of small area variation in the proposed model. Thus, the results indicate that the estimated parameters are working well in the proposed model.
Table A4. Bias, relative bias (R. Bias), and MSE of model parameter estimates based on 500 simulation runs from the proposed ordinal logistic model.
Table A4. Bias, relative bias (R. Bias), and MSE of model parameter estimates based on 500 simulation runs from the proposed ordinal logistic model.
ParameterTrueBiasR. Bias (%)MSE
a 1 −0.685−0.0223.210.037
a 2 0.7720.0101.240.038
β 1 −0.0900.003−3.330.011
β 2 0.068−0.002−2.940.009
β 3 0.170−0.010−5.880.009
ρ 0.378−0.019−4.950.015
σ v 0.1450.02819.240.015
σ f 0.1620.0116.480.001
σ I 0.169−0.006−3.550.001
Further simulation studies were performed, which assumed the responses are binary and count; thus, logit and log link functions were used.

Appendix B.2. Binary Logistic Model

The model is described as
y i j k p i j k Bernoulli p i j k ,
where y i j k is the response of wheezing for individual i, follow-up j, and small area k. Then, p i j k = Prob ( y i j k = 1 ) is the probability that the individual i had a wheezing event at follow-up j, and in small area k. Now the logistic model is described as
log p i j k 1 p i j k = β 0 + β 1 X i + β 2 X i j + β 3 X i j k + f i j + v k ,
where
v k N 0 , σ v 2 f i j = ρ f i j 1 + ϵ i j , ϵ i j N 0 , σ f 2 f i j f i j 1 N ρ f i j 1 , σ f 2 1 < ρ < 1 .
Now, f i j is the random effect of the individual and follow-up. Usually, this is the time-specific disturbance, but in the case of this model, the disturbance is an autoregressive-one or AR(1) model and v k is simply the random effect of each small area. A simulation study was performed in which data were simulated with a binary response; the response represented whether each child within the cohort wheezed or not. The aim of the simulation was to verify the estimates produced by a binary logistic model with data representing the CHILD cohort. For the simulation, there were 20 individuals, 4 follow-ups, and 20 small areas. The dependent variable y i j k was randomly generated with 1600 random binary values from a Bernoulli distribution. The probability of generating each integer 0 , 1 was 1 2 ; thus, equal probabilities. The covariates X i , X i j , and X i j k were also generated from a binary distribution with the probability of generating each integer 0 , 1 being equal to 1 2 . The estimated parameters are reported in Table A5.
Table A5. Binary Logistic Simulation Study: Three chains were used, with 60,000 samples from each chain. Furthermore, the Gelman diagnostics indicated convergence of model parameters. These estimates were used as the true parameters for the binary logistic model.
Table A5. Binary Logistic Simulation Study: Three chains were used, with 60,000 samples from each chain. Furthermore, the Gelman diagnostics indicated convergence of model parameters. These estimates were used as the true parameters for the binary logistic model.
ParameterEstimate
β 0 −0.298
β 1 −0.118
β 2 0.270
β 3 0.0168
ρ 0.0117
σ v 0.111
σ f 0.128
The goal was to evaluate the performance of the model parameters in the proposed logistic model. The simulation procedure followed from Algorithm A. In the simulation steps, the model was essentially reversed in the sense that the aim was to simulate the values of the response y i j k using the binary logistic model described before, thus replacing the original values of y i j k with the simulated values of y i j k and then refitting the same binary logistic model; this was performed 500 times. Each simulation run produced its own estimates ( δ ^ r ; r = 1 , , 500 ) of the model parameters. Then, the bias, relative bias (R. Bias), and MSE were calculated (Table A6).
The results from Table A6 indicated the model parameter estimates had small bias and MSE. The relative bias for ρ was higher than the other parameters, but it was still acceptable. Note that the model is sensitive to the number of follow-ups in the dataset. Ultimately, the number of areas, follow-ups and individuals chosen show how well the simulation performed. Thus, the overall results indicated that the estimated parameters were working well in the proposed binary logistic model.
Table A6. Bias, relative bias (R. Bias), and MSE of model parameter estimates based on 500 simulation runs from the proposed binary logistic model.
Table A6. Bias, relative bias (R. Bias), and MSE of model parameter estimates based on 500 simulation runs from the proposed binary logistic model.
ParameterTrueBiasR. Bias (%)MSE
β 0 −0.2980.009−3.00< 0.0001
β 1 −0.118−0.002−1.80< 0.0001
β 2 0.270−0.001-0.41< 0.0001
β 3 0.0168−0.0002−8.40< 0.0001
ρ 0.01170.002522.70< 0.0001
σ v 0.1110.00050.47< 0.0001
σ f 0.1280.00060.62< 0.0001

Appendix B.3. Zero-Inflated Poisson Model

We also simulated the Zero-Inflated Poisson (ZIP) model, in which the response was the number of wheezing episodes for each child in the cohort. We considered the same covariates as discussed in the previous logistic model (Appendix B.2). For this simulation, we were less interested in the modeling of false zeros; therefore, a diffuse prior for π i j k was implemented such that l o g i t ( π i j k ) = l o g i t ( π ) = γ 0 , γ 0 N ( 0 , 10 , 000 ) . Again, for the simulation, there were 20 individuals, 4 follow-ups, and 20 small areas. The dependent variable y i j k was randomly generated with 1600 random values from the ZIP distribution with π = 0.4 and λ = 4 , which give the range of possible values for y i j k as between 0 and 13. The covariates X i , X i j , and X i j k were also generated from a binary distribution with the probability of generating each integer 0 , 1 being equal to 1 2 . The statistical measures used to test the credibility of the models were the bias and MSE. Now, let
Θ i j k = β 0 + β 1 X i + β 2 X i j + β 3 X i j k + f i j + v k
where
v k N 0 , σ v 2 f i j = ρ f i j 1 + ϵ i j , ϵ i j N 0 , σ f 2 f i j f i j 1 N ρ f i j 1 , σ f 2 1 < ρ < 1 .
Then, the ZIP model becomes,
Prob y i j k = m = π + 1 π exp λ i j k if m = 0 1 π λ i j k y i j k exp λ i j k y i j k ! if m > 0
such that log ( λ i j k ) = Θ i j k and logit ( π ) = γ 0 where γ 0 N ( 0 , 10 , 000 ) .
The model parameter estimates are presented in Table A7. Then, the simulation procedure follows Algorithm A. Similar to previous simulations, the model was essentially reversed. Each simulation run produced its own estimates ( δ ^ r ; r = 1 , , 500 ) of the model parameters. Then, the bias, relative bias (R. Bias), and MSE were calculated (Table A8).
The results from Table A8 indicate that the model parameter estimates behaved reasonably well; however, some parameters had relative bias above 30 percent, which, after further investigation, was due to ρ not being statistically significant for this particular simulated dataset. When using such a model with real data, particular attention should be placed on the AR(1) and this should be removed if not significant.
Table A7. ZIP Simulation Study: Three chains are used with 60,000 samples from each chain. Furthermore, the Gelman diagnostics indicated convergence of model parameters. These estimates are to be used as the true parameters for the ZIP model.
Table A7. ZIP Simulation Study: Three chains are used with 60,000 samples from each chain. Furthermore, the Gelman diagnostics indicated convergence of model parameters. These estimates are to be used as the true parameters for the ZIP model.
ParameterEstimate
β 0 1.085
β 1 −0.034
β 2 0.066
β 3 0.085
γ 0 −0.304
ρ 0.657
1 π 0.575
σ v 0.639
σ f 0.042
Table A8. Bias, relative bias (R. Bias), and MSE of model parameter estimates based on 500 simulation runs from the proposed ZIP model.
Table A8. Bias, relative bias (R. Bias), and MSE of model parameter estimates based on 500 simulation runs from the proposed ZIP model.
ParameterTrueBiasR. Bias (%)MSE
β 0 1.0850.27024.90.252
β 1 −0.034−0.0038.40< 0.0001
β 2 0.0660.0023.30< 0.0001
β 3 0.085-0.003−3.50< 0.0001
γ 0 −0.304< 0.0001 0.10< 0.0001
ρ 0.657−0.224−34.100.194
1 π 0.575< 0.0001 < 0.0001 < 0.0001
σ v 0.639−0.210−34.000.355
σ f 0.042< 0.0001 0.30< 0.0001

References

  1. Global Initiative for Asthma. Global Strategy for Asthma Management and Prevention. Eur. Respir. J. 2008, 31, 143–178. [Google Scholar] [CrossRef] [PubMed]
  2. Public Health Agency of Canada. Report from the Canadian Chronic Disease Surveillance System: Asthma and Chronic Obstructive Pulmonary Disease (COPD) in Canada, 2018; Public Health Agency of Canada: Ottawa, ON, Canada, 2018.
  3. Statistics Canada. Table 13-10-0096-08 Asthma, by Age Group, 2022. [CrossRef]
  4. Nahum, U.; Gorlanova, O.; Decrue, F.; Oller, H.; Delgado-Eckert, E.; Böck, A.; Schulzke, S.; Latzin, P.; Schaub, B.; Karvonen, A.M.; et al. Symptom trajectories in infancy for the prediction of subsequent wheeze and asthma in the BILD and PASTURE cohorts: A dynamic network analysis. Lancet Digit. Health 2024, 6, e718–e728. [Google Scholar] [CrossRef] [PubMed]
  5. Rao, J.; Molina, I. Small Area Estimation; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2015. [Google Scholar] [CrossRef]
  6. Richelle, J. Wheezing in Early Life and Validating the Asthma Predictive Index. Bachelor Thesis, University of Manitoba, Winnipeg, MB, Canada, 2014. [Google Scholar]
  7. Jiang, J.; Torabi, M. Sumca: Simple, unified, Monte-Carlo-assisted approach to second-order unbiased mean-squared prediction error estimation. J. R. Stat. Soc. Ser. B 2020, 82, 467–485. [Google Scholar] [CrossRef]
  8. Torabi, M.; Lele, S.R.; Prasad, N.G. Likelihood inference for small area estimation using data cloning. Comput. Stat. Data Anal. 2015, 89, 158–171. [Google Scholar] [CrossRef]
  9. Battese, G.E.; Harter, R.M.; Fuller, W.A. An error-components model for prediction of county crop areas using survey and satellite data. J. Am. Stat. Assoc. 1988, 83, 28–36. [Google Scholar] [CrossRef]
  10. Fay, R.E.; Herriot, R.A. Estimates of Income for Small Places: An Application of James-Stein Procedures to Census Data. J. Am. Stat. Assoc. 1979, 74, 269–277. [Google Scholar] [CrossRef]
  11. Alkan, N. Assessing convergence diagnostic tests for Bayesian Cox regression. Commun. Stat. Simul. Comput. 2017, 46, 3201–3212. [Google Scholar] [CrossRef]
  12. Chan, J.C.C.; Grant, A.L. On the Observed-Data Deviance Information Criterion for Volatility Modeling. J. Financ. Econom. 2016, 14, 772–802. [Google Scholar] [CrossRef]
  13. Torabi, M.; Shokoohi, F. Likelihood inference in small area estimation by combining time-series and cross-sectional data. J. Multivar. Anal. 2012, 111, 213–221. [Google Scholar] [CrossRef]
  14. Torabi, M.; Shokoohi, F. Hierarchical Bayes estimation in small area estimation using cross-sectional and time-series data. J. Stat. Comput. Simul. 2014, 84, 605–613. [Google Scholar] [CrossRef]
  15. Shokoohi, F.; Torabi, M. Semi-parametric small-area estimation by combining time-series and cross-sectional data methods. Aust. N. Z. J. Stat. 2018, 60, 323–342. [Google Scholar] [CrossRef]
  16. Torabi, M. Zero-inflated spatio-temporal models for disease mapping. Biom. J. 2017, 59, 430–444. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Predicted proportions of wheezing severity using the final area-level logistic model in Manitoba. Winnipeg-specific proportions are shown on the right panel (region codes are given in Table A2).
Figure 1. Predicted proportions of wheezing severity using the final area-level logistic model in Manitoba. Winnipeg-specific proportions are shown on the right panel (region codes are given in Table A2).
Stats 08 00039 g001
Figure 2. Raw proportion of wheezing severity obtained from the data in Manitoba (aggregated wheezing severity at 3 months divided by sample size in each area). Winnipeg-specific proportions are shown on the right panel (region codes are given in Table A2).
Figure 2. Raw proportion of wheezing severity obtained from the data in Manitoba (aggregated wheezing severity at 3 months divided by sample size in each area). Winnipeg-specific proportions are shown on the right panel (region codes are given in Table A2).
Stats 08 00039 g002
Figure 3. Predicted average number of wheezing episodes in Manitoba using the final area-level (new) ZIP model. Winnipeg-specific proportions are shown on the right panel (region codes are given in Table A2).
Figure 3. Predicted average number of wheezing episodes in Manitoba using the final area-level (new) ZIP model. Winnipeg-specific proportions are shown on the right panel (region codes are given in Table A2).
Stats 08 00039 g003
Figure 4. Raw number of wheezing episodes obtained from the data in Manitoba (aggregated wheezing episodes at 3 months). Winnipeg-specific proportions are shown on the right panel (region codes are given in Table A2).
Figure 4. Raw number of wheezing episodes obtained from the data in Manitoba (aggregated wheezing episodes at 3 months). Winnipeg-specific proportions are shown on the right panel (region codes are given in Table A2).
Stats 08 00039 g004
Table 1. Unit -level binary logistic model (odds ratios were calculated by taking the exponent of the covariate parameter estimates).
Table 1. Unit -level binary logistic model (odds ratios were calculated by taking the exponent of the covariate parameter estimates).
ParameterEstimate95%
Credible
Interval (Lower, Upper)
Odds Ratio
Intercept−2.74(−3.25, −2.27)-
Mother asthma (prenatal)1.20(0.63, 1.76)3.31
Mother smoked (prenatal)1.05(−0.02, 2.01)2.85
Farm (3 months)−0.36(−1.47, 0.60)0.70
Furry pets (3 months)−0.28(−0.86, 0.28)0.76
σ v 2 0.37(0.04, 0.95)-
Table 2. Simple unit-level ZIP model (rate ratios were calculated by taking the exponent of the covariate parameter estimates).
Table 2. Simple unit-level ZIP model (rate ratios were calculated by taking the exponent of the covariate parameter estimates).
ParameterEstimate95%
Credible
Interval (Lower, Upper)
Rate Ratio
Intercept−1.22(−2.17, −0.35)-
Mother asthma (prenatal)0.84(0.18, 1.52)2.32
Mother wheezed (prenatal)0.02(−0.63, 0.65)1.02
Mother smoked (prenatal)1.01(−0.38, 2.44)2.75
Farm (3 months)0.90(0.10, 1.78)2.45
γ 0 1.45(1.01, 1.88)-
1 π 0.19(0.13, 0.27)-
σ v 2 1.37(0.86, 2.07)-
Table 3. Final area-level binary logistic model.
Table 3. Final area-level binary logistic model.
ParameterEstimate95%
Credible
Interval (Lower, Upper)
Odds Ratio
Intercept−3.27(−4.09, −2.52)-
Mother wheezed (prenatal)0.01(−0.01, 0.04)1.01
Mother asthma (prenatal)0.02(0.01, 0.05)1.02
Mother smoked (prenatal)0.00(−0.03, 0.03)1.00
Farm−0.01(−0.03, 0.00)0.99
σ v 2 0.25(0.03, 0.84)-
Table 4. Final area-level ZIP model.
Table 4. Final area-level ZIP model.
ParameterEstimate95%
Credible
Interval (Lower, Upper)
Rate Ratio
Intercept0.95(−0.02, 1.90)-
Farm−0.03(−0.60, 0.01)0.97
Mother asthma (prenatal)0.01(−0.02, 0.03)1.01
σ v 2 1.17(0.76, 1.75)-
DIC164.7--
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Singh, C.; Torabi, M. Theoretical Advancements in Small Area Modeling: A Case Study with the CHILD Cohort. Stats 2025, 8, 39. https://doi.org/10.3390/stats8020039

AMA Style

Singh C, Torabi M. Theoretical Advancements in Small Area Modeling: A Case Study with the CHILD Cohort. Stats. 2025; 8(2):39. https://doi.org/10.3390/stats8020039

Chicago/Turabian Style

Singh, Charanpal, and Mahmoud Torabi. 2025. "Theoretical Advancements in Small Area Modeling: A Case Study with the CHILD Cohort" Stats 8, no. 2: 39. https://doi.org/10.3390/stats8020039

APA Style

Singh, C., & Torabi, M. (2025). Theoretical Advancements in Small Area Modeling: A Case Study with the CHILD Cohort. Stats, 8(2), 39. https://doi.org/10.3390/stats8020039

Article Metrics

Back to TopTop