Bayesian Meta-Analysis for Binary Data and Prior Distribution on Models

In meta-analysis, the structure of the between-sample heterogeneity plays a crucial role in estimating the meta-parameter. A Bayesian meta-analysis for binary data has recently been proposed that measures this heterogeneity by clustering the samples and then computing the posterior probabilities of the cluster models through model selection. The meta-parameter is then estimated using Bayesian model averaging. Although an objective Bayesian analysis is available for each heterogeneity structure, this paper focuses on the prior over the models. We consider four alternative priors motivated by reasonable but different assumptions. A frequentist validation with simulated data is carried out to analyze the properties of each prior for different numbers of studies and sample sizes. The results show the importance of choosing an adequate model prior, as the posterior probabilities of the models are very sensitive to it. The hierarchical Poisson prior and the hierarchical uniform prior perform well when the true model is homogeneity, or when the sample sizes are large enough. However, the uniform prior can detect the true model when it is an intermediate model (neither homogeneity nor heterogeneity), even for small sample sizes and few studies. An illustrative example with real data is also given, showing the sensitivity of the estimate of the meta-parameter to the model prior.


Introduction
Meta-analysis has been widely applied in many research areas and is of particular importance in healthcare. When several randomized controlled clinical trials (or studies) of a particular medical treatment exist, a meta-analysis may be conducted to determine what overall conclusion the studies support on the key question: the effectiveness of the treatment.
One of the cases that has received most attention in the literature is meta-analysis for binary data [1]. On the one hand, effectiveness is very often measured through a binary variable recording whether or not a certain objective has been achieved (survival, absence of relapse, or a low viral load); Charles et al. [2] found that half of trials calculated their sample size based on a binary outcome. On the other hand, binary outcomes raise different statistical considerations from continuous outcomes. The Bayesian random-effects model for meta-analysis given by Sutton and Abrams [3],

$$x_i \mid \theta_i \sim N(\theta_i, \sigma_i^2), \qquad \theta_i \mid \theta, \tau^2 \sim N(\theta, \tau^2), \qquad i = 1, \ldots, k,$$

is not suitable for modeling binary data directly. Here $x_i$ denotes the observed effect in each of the $k$ studies, $\sigma_i^2$ its within-study variance, $\theta$ the pooled effect, and $\tau^2$ the between-study variance. For binary data, this normal hierarchical model has been applied to the logit transformation $y_i = \log[x_i/(n_i - x_i)]$ with the reparametrization $\log[\theta_i/(1 - \theta_i)]$, where $x_i$ denotes the number of successes in the $i$th study [4][5][6][7]. However, this normal approximation does not work properly when the sample sizes $n_i$ are small or when the number of successes is zero, even if a continuity correction is applied to the original data, as shown by Sweeting et al. [8].
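To make the zero-count problem concrete, the following minimal Python sketch computes the empirical logit used by the normal approximation. The function name is hypothetical, and the correction values (0.5, a common convention, and 0.25) are illustrative assumptions rather than the choices examined in [8].

```python
import math


def empirical_logit(x, n, correction=0.0):
    """Return log[(x + c) / (n - x + c)] for x successes out of n trials.

    With c = 0 the transform is undefined when x = 0 or x = n; adding a
    continuity correction c makes it finite, but the result then depends
    on the arbitrary value of c.
    """
    a, b = x + correction, n - x + correction
    if a <= 0 or b <= 0:
        raise ValueError("logit undefined: zero successes or zero failures")
    return math.log(a / b)


print(empirical_logit(3, 20))         # ordinary case
print(empirical_logit(0, 10, 0.5))    # zero successes, corrected with c = 0.5
print(empirical_logit(0, 10, 0.25))   # another c gives another value
try:
    empirical_logit(0, 10)            # no correction: undefined
except ValueError as err:
    print(err)
```

The last three calls show why a correction is unsatisfactory for small studies with no successes: the transformed value is determined largely by the constant chosen.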
Moreno et al. [9] proposed an objective Bayesian meta-analysis model for binary data in which no continuity correction is required. The Bayesian model proposed for study $i$ is based on the binomial distribution, $\{M_i: \mathrm{Bin}(x_i \mid n_i, \theta_i), \pi(\theta_i)\}$, and the linking distribution $\pi(\theta_i, \theta)$ between the parameter $\theta_i$ of each study and the meta-parameter $\theta$ belongs to the Fréchet class of bivariate distributions with fixed marginals $\pi(\theta_i)$ and $\pi(\theta)$. The objective Bayesian analysis assumes that these marginals are uniform, $\mathrm{Unif}(\theta_i \mid 0, 1)$ and $\mathrm{Unif}(\theta \mid 0, 1)$.
The model parameters are therefore the parameters of the $k$ studies, $\theta_1, \ldots, \theta_k$, and the meta-parameter $\theta$. However, if some of the $\theta_i$'s are equal, the dimension of the model is reduced. In [10], the authors proposed to study the between-sample heterogeneity as a model selection problem, clustering the parameters $\theta_1, \ldots, \theta_k$ based on the samples $(x_1, n_1), \ldots, (x_k, n_k)$. They adopted a Bayesian approach based on the product partition models proposed in [11,12]. This Bayesian model selection process requires the specification of a prior over the models.
In the absence of information about the models, the uniform prior is the most common choice in Bayesian model selection. However, this prior ignores the structure of the clustering problem, and alternative model priors are possible, such as placing a uniform distribution on each level of the cluster hierarchy, or the Poisson-intrinsic prior proposed by Casella et al. [13], which penalizes the number of clusters. Although all these priors can be regarded as non-informative, in the sense that they add no information to that provided by the data, they assign different prior probabilities to each partition. Because the estimate of the meta-parameter is sensitive to the chosen partition, in this paper we analyze the characteristics of these model priors and the cases in which each one may be preferable.
A frequentist evaluation is carried out with simulated data, considering different numbers of studies, sample sizes, and true clusters. The rest of the paper is organized as follows. Section 2 presents the binomial Bayesian model, together with the Bayesian procedure for clustering the samples, the likelihood of the meta-parameter, and the four model priors to be compared. Section 3 describes the simulated data and the results of the frequentist validation. Section 4 provides an illustrative example with a real dataset. Finally, Section 5 summarizes the main conclusions and presents some concluding remarks.

The Bayesian Binomial Model
Assume a meta-analysis involving $k$ studies that provide $k$ independent discrete samples following binomial distributions $\{\mathrm{Bin}(x_i \mid n_i, \theta_i), i = 1, \ldots, k\}$, where, for study $i$, $\theta_i$ represents the treatment effectiveness, $n_i$ the number of patients, and $x_i$ the number of successful treatments. We assume weak prior information on the conditional treatment effectiveness $\theta_i$; accordingly, the uniform prior $\mathrm{Unif}(\theta_i \mid 0, 1)$ is used [14,15]. The Bayesian sampling model $M_i$ for study $i = 1, \ldots, k$ is then given by

$$M_i: \{\mathrm{Bin}(x_i \mid n_i, \theta_i), \ \mathrm{Unif}(\theta_i \mid 0, 1)\},$$

where

$$\mathrm{Bin}(x_i \mid n_i, \theta_i) = \binom{n_i}{x_i} \theta_i^{x_i} (1 - \theta_i)^{n_i - x_i} \mathbf{1}_{\{0, 1, \ldots, n_i\}}(x_i), \qquad \mathrm{Unif}(\theta_i \mid 0, 1) = \mathbf{1}_{(0,1)}(\theta_i),$$

and $\mathbf{1}_A$ is the indicator function, which takes the value 1 on all elements of $A$ and 0 elsewhere. The meta-model is defined for a patient in a virtual study, which is not affected by between-study variability. The variable $x$ is a binary latent variable, and the meta-parameter $\theta$ is the probability of success for this virtual patient. The distribution of this meta-variable $x$ is the Bernoulli meta-model $\mathrm{Ber}(x \mid \theta)$, where the meta-parameter $\theta$ represents the true (unconditional) treatment effect. The objective Bayesian meta-model $M$ is then given by

$$M: \{\mathrm{Ber}(x \mid \theta) = \theta^{x} (1 - \theta)^{1 - x} \mathbf{1}_{\{0, 1\}}(x), \ \mathrm{Unif}(\theta \mid 0, 1)\}.$$
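As a sanity check on these definitions, the sketch below codes the binomial sampling density with its indicator, the Bernoulli meta-model, and the marginal of $x_i$ under the uniform prior. The function names are hypothetical. The marginal $\binom{n}{x} B(x + 1, n - x + 1)$ equals $1/(n + 1)$ for every $x$, which is one way to see that the uniform prior favours no count a priori.

```python
import math


def binom_pmf(x, n, theta):
    """Bin(x | n, theta); the indicator 1_{0,...,n}(x) sets it to 0 elsewhere."""
    if not (isinstance(x, int) and 0 <= x <= n):
        return 0.0
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)


def bernoulli_pmf(x, theta):
    """Ber(x | theta) for the latent binary meta-variable x of the meta-model M."""
    return binom_pmf(x, 1, theta)


def marginal_uniform(x, n):
    """Integral of Bin(x | n, theta) * Unif(theta | 0, 1) over (0, 1).

    Closed form C(n, x) * B(x + 1, n - x + 1), computed on the log scale.
    """
    log_beta = math.lgamma(x + 1) + math.lgamma(n - x + 1) - math.lgamma(n + 2)
    return math.comb(n, x) * math.exp(log_beta)


for x in (0, 3, 10):
    print(x, marginal_uniform(x, 10))  # approximately 1/11 for every x
```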

The Linking Distribution
A distribution $\pi(\theta_i \mid \theta)$ is needed to link the experimental parameters $\theta_i$ and the meta-parameter $\theta$. This linking distribution should ensure coherence between the conditional and marginal distributions of the experimental parameters and the meta-parameter, which requires the corresponding bivariate distribution to belong to the class of bivariate distributions with given marginals. The class of bivariate distributions solving this problem is the Fréchet class

$$\mathcal{F} = \left\{ \pi(\theta_i, \theta) : \int_0^1 \pi(\theta_i, \theta)\, d\theta = \pi(\theta_i), \ \int_0^1 \pi(\theta_i, \theta)\, d\theta_i = \pi(\theta) \right\}.$$

Following Moreno et al. [9], a candidate $\pi(\theta_i, \theta)$ is constructed using intrinsic priors for model selection [16]. The conditional intrinsic linking distributions $\{\pi^I(\theta_i \mid \theta, t), t = 1, 2, \ldots\}$ arise from the model comparison between the meta-model $M$ and the experimental model $M_i$. For any positive integer $t$, the intrinsic method gives the conditional intrinsic prior as a Beta-Binomial mixture. In general, the bivariate intrinsic prior $\pi^I(\theta_i, \theta \mid t)$ enjoys two useful properties. First, it belongs to the Fréchet class with uniform marginals $\pi(\theta_i)$ and $\pi(\theta)$. Second, the concentration of $\pi^I(\theta_i \mid \theta, t)$ around $\theta$ is controlled by the training sample size $t$: the larger $t$, the more concentrated the distribution. Note that the correlation coefficient between $\theta_i$ and $\theta$ is $\rho = t/(t + 1)$. In practice, the hyperparameter $t$ is fixed so that the correlation between $\theta_i$ and $\theta$ is large enough. In our examples we assume a correlation of 0.98, which implies $t = 48$. For notational simplicity, we write the linking distribution as $\pi^I(\theta_i \mid \theta)$ rather than $\pi^I(\theta_i \mid \theta, t)$.
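The Beta-Binomial mixture can be explored numerically. The sketch below assumes the form $\pi^I(\theta_i \mid \theta, t) = \sum_{y=0}^{t} \mathrm{Beta}(\theta_i \mid y + 1, t - y + 1)\, \mathrm{Bin}(y \mid t, \theta)$, which is our reading of [9] and is consistent with the ${}_3F_2$ parameters $a_j = (-t, -t, s_j + 1)$ used later, but is not written out in this section. It checks the two properties stated above: the $\theta_i$-marginal is uniform when $\theta$ is uniform, and the conditional spread shrinks as $t$ grows. Function names and grid sizes are arbitrary choices.

```python
import math


def beta_pdf(u, a, b):
    """Beta(u | a, b) density for integer a, b >= 1 and 0 < u < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * u ** (a - 1) * (1 - u) ** (b - 1)


def intrinsic_linking_pdf(theta_i, theta, t):
    """ASSUMED form: sum_y Beta(theta_i | y+1, t-y+1) * Bin(y | t, theta)."""
    return sum(
        beta_pdf(theta_i, y + 1, t - y + 1)
        * math.comb(t, y) * theta**y * (1 - theta) ** (t - y)
        for y in range(t + 1)
    )


def midpoints(grid):
    """Midpoint-rule nodes on (0, 1) and their common weight."""
    h = 1.0 / grid
    return [(j + 0.5) * h for j in range(grid)], h


def marginal_theta_i(theta_i, t, grid=2000):
    """Integral over theta of pi^I(theta_i | theta, t) * Unif(theta | 0, 1); should be 1."""
    nodes, h = midpoints(grid)
    return h * sum(intrinsic_linking_pdf(theta_i, th, t) for th in nodes)


def conditional_sd(theta, t, grid=2000):
    """Standard deviation of theta_i given theta under pi^I(. | theta, t)."""
    nodes, h = midpoints(grid)
    w = [intrinsic_linking_pdf(u, theta, t) * h for u in nodes]
    mean = sum(u * wi for u, wi in zip(nodes, w))
    return math.sqrt(sum((u - mean) ** 2 * wi for u, wi in zip(nodes, w)))


print(marginal_theta_i(0.3, t=48))                               # close to 1
print([round(conditional_sd(0.5, t), 3) for t in (5, 20, 48)])   # decreasing in t
```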
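As $\theta_1, \ldots, \theta_k$ are assumed to be conditionally independent given $\theta$, the linking distribution of $\theta_1, \ldots, \theta_k$ conditional on $\theta$ is given by

$$\pi^I(\theta_1, \ldots, \theta_k \mid \theta) = \prod_{i=1}^{k} \pi^I(\theta_i \mid \theta).$$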

Clusters
The previous section assumes that there are $k$ experimental parameters $\theta_i$, $i = 1, \ldots, k$, to be estimated. However, the dimension of the experimental model can be reduced if some of the $\theta_i$'s are equal. Following Moreno et al. [10], model estimation in this parametric setting is a problem of clustering the parameters $\theta_1, \ldots, \theta_k$ based on the samples $x_1, \ldots, x_k$ from the experiments. We first define what is meant by a cluster: the samples $x_i$ and $x_j$, $i \neq j$, from $f(x \mid \theta_i, n)$ and $f(x \mid \theta_j, n)$, respectively, are said to be in the same cluster if $\theta_i = \theta_j$. The between-sample heterogeneity is then determined by the number of clusters and by the allocation of the samples $(x_1, n_1), \ldots, (x_k, n_k)$ to these clusters.
To cluster the samples, we adopt the product partition model approach proposed by Barry and Hartigan [12], together with a Bayesian model selection procedure based on Bayes factors computed with intrinsic priors for the model parameters.
Following [13], we use the following notation. For a given $p$, a partition of the samples into $p$ clusters is denoted by the vector $r_p = (r_1, \ldots, r_k)$, where $r_i$, $i = 1, \ldots, k$, is an integer between 1 and $p$ indicating the cluster to which $x_i$ is assigned. For example, with $k = 3$, $r_2 = (1, 1, 2)$ places $x_1$ and $x_2$ in the same cluster and $x_3$ in a second one. Figure 1 shows the possible clustering structures for $k = 3$ and their corresponding $r_p$.
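The vectors $r_p$ can be enumerated mechanically. The hypothetical helper below lists every partition of $k$ samples exactly once, using a canonical labelling (an assumption of this sketch) in which $r_1 = 1$ and each new cluster receives the next unused label, so that $p$ is the largest label.

```python
def partitions(k):
    """All partitions of k samples as label vectors r_p = (r_1, ..., r_k).

    Canonical labelling: r_1 = 1 and each further sample either joins an
    existing cluster 1..p or opens cluster p + 1, so each partition appears once.
    """
    out = [(1,)]
    for _ in range(k - 1):
        out = [r + (label,) for r in out for label in range(1, max(r) + 2)]
    return out


for r in partitions(3):
    print(r, "p =", max(r))
```

For $k = 3$ this prints the five structures $(1,1,1)$, $(1,1,2)$, $(1,2,1)$, $(1,2,2)$, $(1,2,3)$. The number of partitions grows quickly: 15 for $k = 4$, 52 for $k = 5$, and 4140 for $k = 8$ (the Bell numbers).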

The Likelihood of θ for a Particular Partition
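The likelihood of $\theta$ depends on the partition of the samples. Given a partition $r_p = (r_1, \ldots, r_k)$, the sampling distribution of $x = (x_1, \ldots, x_k)$ under the binomial models $M_i$ is

$$f(x \mid p, r_p, \theta_p) = \prod_{i=1}^{k} \binom{n_i}{x_i} \prod_{j=1}^{p} \theta_j^{s_j} (1 - \theta_j)^{m_j - s_j},$$

where $\theta_p = (\theta_1, \ldots, \theta_p)$ is an unknown parameter of dimension $p$, the component $\theta_j$ corresponds to the studies with $r_i = j$, and $m_j = \sum_{i: r_i = j} n_i$ and $s_j = \sum_{i: r_i = j} x_i$ are the sample size and the number of successes of cluster $j$. For example, the heterogeneity partition $r_k = (1, 2, 3, \ldots, k)$ has the likelihood function

$$f(x \mid k, r_k, \theta_k) = \prod_{i=1}^{k} \binom{n_i}{x_i} \theta_i^{x_i} (1 - \theta_i)^{n_i - x_i},$$

and the homogeneity partition $r_1 = (1, 1, \ldots, 1)$ has the likelihood function

$$f(x \mid 1, r_1, \theta_1) = \prod_{i=1}^{k} \binom{n_i}{x_i} \, \theta_1^{\sum_{i} x_i} (1 - \theta_1)^{\sum_{i} (n_i - x_i)}.$$

Now, integrating out $\theta_p$ with respect to the intrinsic linking distribution $\pi^I(\theta_1, \ldots, \theta_p \mid \theta)$, whose mixture over $\theta$,

$$\pi(\theta_p \mid p, r_p) = \int_0^1 \pi^I(\theta_1, \ldots, \theta_p \mid \theta)\, \mathbf{1}_{(0,1)}(\theta)\, d\theta,$$

is the intrinsic prior of $\theta_p$, we obtain the likelihood of $\theta$ conditional on the cluster model $(p, r_p)$,

$$f(x \mid \theta, p, r_p) = \int f(x \mid p, r_p, \theta_p)\, \pi^I(\theta_1, \ldots, \theta_p \mid \theta)\, d\theta_p,$$

which has a closed form in terms of ${}_3F_2(v, w, z)$, the generalized hypergeometric function with argument $z$ and vector parameters $v$ and $w$ of dimensions 3 and 2. In this case, the parameters $a_j = (-t, -t, s_j + 1)$ and $b_j = (1, -m_j - t + s_j)$ are related to the number of 1's and 0's in cluster $j$, respectively.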
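A minimal sketch of these quantities follows, assuming the Beta-Binomial intrinsic linking distribution $\pi^I(\theta_j \mid \theta, t) = \sum_{y=0}^{t} \mathrm{Beta}(\theta_j \mid y + 1, t - y + 1)\, \mathrm{Bin}(y \mid t, \theta)$ of [9]. Integrating each cluster's kernel against this mixture gives a finite sum over $y$; under this assumption that terminating series should coincide with the ${}_3F_2$ expression above, but the code evaluates the sum directly instead of calling a hypergeometric routine. The counts and function names are hypothetical. As a check, for the homogeneity partition the integral of $f(x \mid \theta, 1, r_1)$ over $\theta$ must equal $\prod_i \binom{n_i}{x_i} B(\sum_i x_i + 1, \sum_i (n_i - x_i) + 1)$, because the intrinsic prior has uniform marginals.

```python
import math


def cluster_stats(x, n, r):
    """Pooled sample size m_j and successes s_j for clusters j = 1..p."""
    p = max(r)
    m, s = [0] * p, [0] * p
    for xi, ni, ri in zip(x, n, r):
        m[ri - 1] += ni
        s[ri - 1] += xi
    return m, s


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def make_lik_theta(x, n, r, t=48):
    """Return theta -> f(x | theta, p, r_p) under the ASSUMED intrinsic prior.

    Cluster j contributes sum_y C(t, y) theta^y (1 - theta)^(t - y)
        * B(s_j + y + 1, m_j - s_j + t - y + 1) / B(y + 1, t - y + 1).
    """
    m, s = cluster_stats(x, n, r)
    const = math.prod(math.comb(ni, xi) for xi, ni in zip(x, n))
    # The coefficients do not depend on theta, so compute them once per cluster.
    coefs = [
        [math.comb(t, y) * math.exp(log_beta(sj + y + 1, mj - sj + t - y + 1)
                                    - log_beta(y + 1, t - y + 1))
         for y in range(t + 1)]
        for mj, sj in zip(m, s)
    ]

    def lik(theta):
        value = const
        for c in coefs:  # product over clusters: theta_j independent given theta
            value *= sum(cy * theta**y * (1 - theta) ** (t - y) for y, cy in enumerate(c))
        return value

    return lik


def integrate01(f, grid=2000):
    """Midpoint-rule integral of f over (0, 1)."""
    h = 1.0 / grid
    return h * sum(f((j + 0.5) * h) for j in range(grid))


x, n = (2, 3, 9), (20, 20, 20)             # hypothetical counts, k = 3
print(cluster_stats(x, n, (1, 1, 2)))       # ([40, 20], [5, 9])
homog = make_lik_theta(x, n, (1, 1, 1))
exact = math.prod(math.comb(ni, xi) for xi, ni in zip(x, n)) * math.exp(log_beta(15, 47))
print(integrate01(homog) / exact)           # close to 1
```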

The Likelihood of θ and the Prior Distribution over the Partitions
To derive the likelihood function of $\theta$, we need to integrate the conditional likelihood $f(x \mid \theta, p, r_p)$ with respect to a discrete prior on $(p, r_p)$. The (unconditional) likelihood of $\theta$ for the data $x$ is given by

$$f(x \mid \theta) = \sum_{p=1}^{k} \sum_{r_p \in R_p} f(x \mid \theta, p, r_p)\, \pi(p, r_p \mid k),$$

where $R_p$ denotes the set of partitions into $p$ clusters. The prior distribution on the partitions $\pi(p, r_p \mid k)$ plays an important role in the estimation of the parameter $\theta$ [13]. We consider here four priors on $(p, r_p)$ motivated by reasonable but different assumptions. All four assume the absence of prior information about the models, but they range from assigning high prior probability to the boundary cases $p = 1$ and $p = k$ (the homogeneity and heterogeneity structures, respectively), through intermediate situations that moderate the prior weight given to these two structures, to treating all partitions as equally probable. • The Uniform prior (U). The first prior gives the same probability to every model, that is,

$$\pi^{U}(p, r_p \mid k) = \frac{1}{B_k},$$

where $B_k$, the Bell number, is the number of ways a set of size $k$ can be partitioned. Figure 2 shows the prior probabilities of each partition when four studies are considered; in this example, the Bell number is 15. This choice does not take into account the level of complexity of each partition. • The Hierarchical Uniform Prior with 2 levels (HU2). As recommended by Casella et al. [13], a hierarchical uniform prior can be used to account for the different levels of complexity of the partitions. This prior distinguishes two levels of complexity. The first level is given by the number of clusters $p$ into which the $k$ samples are grouped. The second level is given by the possible partitions of the $k$ samples into $p$ clusters. Let $R_p$ denote this set of partitions into $p$ clusters, which we call the cluster class. The number of partitions in $R_p$ is the Stirling number of the second kind $S(k, p)$, which can be written as

$$S(k, p) = \sum_{(k_1, \ldots, k_p)} \frac{k!}{k_1! \cdots k_p! \, \prod_{l} c_l!},$$

where the sum runs over the vectors of cluster sizes $k_1 \le \cdots \le k_p$ with $k_1 + \cdots + k_p = k$, and the factor $\prod_l c_l!$, with $c_l$ the number of components of $(k_1, \ldots, k_p)$ equal to $l$, corrects the count for the redundant orderings of clusters of equal size. For instance, to calculate $S(4, 2)$ there are two possible vectors $(k_1, k_2)$, namely $(1, 3)$ and $(2, 2)$, and the Stirling number is

$$S(4, 2) = \frac{4!}{1!\,3!} + \frac{4!}{2!\,2!\,2!} = 4 + 3 = 7,$$

which is the number of possible partitions for $p = 2$ and $k = 4$. The hierarchical uniform distribution with 2 levels is given by the decomposition

$$\pi^{HU2}(p, r_p \mid k) = \pi(r_p \mid R_p, k)\, \pi(p \mid k) = \frac{1}{S(k, p)} \cdot \frac{1}{k}.$$

Figure 3 shows the prior probabilities of each partition under the hierarchical uniform prior with 2 levels for 4 studies. Note that this hierarchical distribution assigns a higher prior probability to the homogeneity and heterogeneity cases. • The Hierarchical Uniform Prior with 3 levels (HU3). Following Casella et al. [13] and Moreno et al. [10], the prior specification for $(p, r_p)$ can be decomposed into three levels:

$$\pi^{HU3}(p, r_p \mid k) = \pi(p, r_p \mid R_{p;k_1,\ldots,k_p}, k)\, \pi(R_{p;k_1,\ldots,k_p} \mid p, k)\, \pi(p \mid k).$$
Unlike the previous prior, the hierarchical uniform prior with 3 levels also considers the number of ways the integer $k$ can be written, ignoring order, as a sum of $p$ positive integers (the cluster sizes). We call these the configuration classes within each $R_p$ and denote their number by $b(k, p)$. In our illustrative example with $k = 4$, this value equals 1 for $p = 1, 3, 4$ ($b(4, 1) = b(4, 3) = b(4, 4) = 1$); only the cluster class $p = 2$ has two configuration classes, corresponding to the configurations x|xxx and xx|xx, so $b(4, 2) = 2$.
The hierarchical uniform prior with 3 levels assigns a uniform distribution at each level:

$$\pi^{HU3}(p, r_p \mid k) = \frac{k_1! \cdots k_p! \, \prod_{l} c_l!}{k!} \cdot \frac{1}{b(k, p)} \cdot \frac{1}{k},$$

where $(k_1, \ldots, k_p)$ are the cluster sizes of $r_p$ and $c_l$ is the number of clusters of size $l$, so that the first factor is the reciprocal of the number of partitions sharing the configuration of $r_p$. Figure 4 shows the prior probabilities of each partition under the hierarchical uniform prior with 3 levels and 4 studies. • The Hierarchical Poisson Prior with 3 levels (HP3). Casella et al. [13] argue that, when analyzing a clustering problem with sample size $k$, the extreme case of $k$ clusters should be given a smaller prior probability than any other case. Extending this argument to any number of clusters, it may be reasonable to take the prior distribution on the number of clusters $\pi(p \mid k)$ to be a truncated Poisson distribution $P(p \mid \lambda)$, where $\lambda$ is an unknown parameter. For $\lambda$ we can assume an intrinsic prior $\pi^I(\lambda \mid \lambda_0 = 1)$, constructed by testing the Poisson null hypothesis $H_0: \lambda = \lambda_0$ versus $H_1: \lambda \in \mathbb{R}^+$ [17]; its expression involves ${}_0F_1(1/2, \lambda)$, the confluent hypergeometric limit function. The reason for taking $\lambda_0 = 1$ is that the one-cluster model is the reference model throughout the analysis. We denote by $\pi^I(p \mid k)$ the resulting marginal intrinsic distribution for $p$. A Poisson distribution cannot be assumed for the other two levels of the hierarchical structure because there is no clear ordering of them by complexity; for this reason, a uniform distribution is assumed for those two levels. The hierarchical Poisson prior is then given by

$$\pi^{HP3}(p, r_p \mid k) = \pi(p, r_p \mid R_{p;k_1,\ldots,k_p}, k)\, \pi(R_{p;k_1,\ldots,k_p} \mid p, k)\, \pi^I(p \mid k).$$

Figure 5 shows the prior probabilities of each partition under the hierarchical Poisson prior with 4 studies. The prior probability of the homogeneity cluster is more than four times the prior probability of the heterogeneity case. Finally, combining the unconditional likelihood of $\theta$ with each of the four priors above, the (unconditional) likelihood of $\theta$ for the data $x$ is given by

$$f^{(\cdot)}(x \mid \theta) = \sum_{p=1}^{k} \sum_{r_p \in R_p} f(x \mid \theta, p, r_p)\, \pi^{(\cdot)}(p, r_p \mid k), \qquad (\cdot) \in \{U, HU2, HU3, HP3\}.$$
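The three priors with fully explicit levels (U, HU2, HU3) can be tabulated with a short script. This sketch assumes, as in the hierarchical uniform priors, $\pi(p \mid k) = 1/k$; the Poisson-intrinsic level $\pi^I(p \mid k)$ of HP3 is not reproduced here, although it could replace the factor $1/k$ in the HU3 line. All function names are hypothetical.

```python
from collections import Counter


def partitions(k):
    """All partitions of k samples as canonical label vectors r_p (r_1 = 1)."""
    out = [(1,)]
    for _ in range(k - 1):
        out = [r + (label,) for r in out for label in range(1, max(r) + 2)]
    return out


def stirling2(k, p):
    """Stirling number of the second kind via S(k,p) = p S(k-1,p) + S(k-1,p-1)."""
    if k == p:
        return 1
    if p == 0 or p > k:
        return 0
    return p * stirling2(k - 1, p) + stirling2(k - 1, p - 1)


def configuration(r):
    """Configuration class of r_p: sorted cluster sizes, e.g. (1, 3) for x|xxx."""
    return tuple(sorted(Counter(r).values()))


def model_priors(k):
    """Return {r_p: (U, HU2, HU3)} prior probabilities, assuming pi(p | k) = 1/k."""
    parts = partitions(k)
    bell = len(parts)                                        # B_k
    class_size = Counter(configuration(r) for r in parts)   # partitions per configuration
    b = Counter(len(c) for c in class_size)                  # b(k, p): configurations per p
    priors = {}
    for r in parts:
        p = max(r)
        u = 1 / bell
        hu2 = 1 / (k * stirling2(k, p))
        hu3 = 1 / (k * b[p] * class_size[configuration(r)])
        priors[r] = (u, hu2, hu3)
    return priors


for r, (u, hu2, hu3) in model_priors(4).items():
    print(r, round(u, 4), round(hu2, 4), round(hu3, 4))
```

For $k = 4$ this gives $1/15$ to every partition under U and $1/4$ to homogeneity and to heterogeneity under both hierarchical priors; HU2 gives $1/28$ to every two-cluster partition, whereas HU3 gives $1/32$ to each x|xxx partition and $1/24$ to each xx|xx partition. For $k = 3$ the HU2 and HU3 columns coincide, since $b(3, p) = 1$.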

Bayesian Model Averaging in the Meta-Analysis
The BMA approach to meta-analysis involves averaging over all the possible models (heterogeneity structures or partitions) when making inferences about the parameter of interest θ.
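The posterior probability of each heterogeneity structure, given by a pair $(p, r_p)$, is

$$\Pr(p, r_p \mid x) = \frac{m_{r_p}(x \mid p, r_p)\, \pi(p, r_p \mid k)}{\sum_{p'=1}^{k} \sum_{r_{p'} \in R_{p'}} m_{r_{p'}}(x \mid p', r_{p'})\, \pi(p', r_{p'} \mid k)},$$

where $m_{r_p}(x \mid p, r_p) = \int f(x \mid p, r_p, \theta_p)\, \pi(\theta_p \mid p, r_p)\, d\theta_p$ is the marginal of the data $x$ conditional on model $(p, r_p)$, with $f(x \mid p, r_p, \theta_p)$ the sampling distribution of the partition and $\pi(\theta_p \mid p, r_p)$ the intrinsic prior of $\theta_p$. These posterior model probabilities $\Pr(p, r_p \mid x)$ are the weights for the meta-inference. The posterior distribution of the parameter of interest $\theta$ becomes

$$\pi(\theta \mid x) = \sum_{p=1}^{k} \sum_{r_p \in R_p} \pi(\theta \mid x, p, r_p)\, \Pr(p, r_p \mid x),$$

where

$$\pi(\theta \mid x, p, r_p) = \frac{f(x \mid p, r_p, \theta)\, \mathbf{1}_{(0,1)}(\theta)}{\int_0^1 f(x \mid p, r_p, \theta)\, d\theta}.$$

This BMA posterior distribution is computed numerically using Wolfram Mathematica (see the code in the Supplementary Material Section).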
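The averaging step reduces to a few lines once the conditional likelihoods are available. The sketch below, with hypothetical names, works on a midpoint grid for $\theta$ (a numerical choice of this sketch, not the Mathematica implementation) and uses the fact that, with the uniform prior on $\theta$, $m_{r_p}(x \mid p, r_p) = \int_0^1 f(x \mid p, r_p, \theta)\, d\theta$. To keep it short, the usage example feeds in two toy likelihood kernels rather than the intrinsic-prior likelihoods.

```python
def bma(cond_lik, priors, grid=2000):
    """Posterior model probabilities and BMA posterior mean of theta.

    cond_lik: {model: function theta -> f(x | p, r_p, theta)}
    priors:   {model: pi(p, r_p | k)}
    With a Unif(0, 1) prior on theta, the marginal m(x | p, r_p) is the
    integral of f(x | p, r_p, theta) over (0, 1); the midpoint rule is used.
    """
    h = 1.0 / grid
    nodes = [(j + 0.5) * h for j in range(grid)]
    marg, cond_mean = {}, {}
    for model, f in cond_lik.items():
        vals = [f(th) for th in nodes]
        marg[model] = h * sum(vals)
        cond_mean[model] = h * sum(th * v for th, v in zip(nodes, vals)) / marg[model]
    evidence = sum(marg[mo] * priors[mo] for mo in cond_lik)
    post = {mo: marg[mo] * priors[mo] / evidence for mo in cond_lik}
    bma_mean = sum(post[mo] * cond_mean[mo] for mo in cond_lik)
    return post, cond_mean, bma_mean


# Toy kernels (hypothetical), not the intrinsic-prior likelihoods of the paper.
toy = {"A": lambda th: th**3 * (1 - th) ** 7, "B": lambda th: th**6 * (1 - th) ** 4}
post, cond_mean, mean = bma(toy, {"A": 0.5, "B": 0.5})
print(round(post["A"], 4), round(mean, 4))  # 0.6364 0.4242
```

Because the BMA mean is a convex combination of the conditional posterior means, it always lies between the smallest and the largest of them.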

Simulated Data
This section presents the simulated data used in the frequentist validation. The data were simulated from binomial distributions, varying across scenarios the number of studies included in the meta-analysis ($k$), the true partition of the data ($r_p$), and the sample size within each study ($n$).
The values considered for the number of studies $k$ in the meta-analysis are 3 and 5. Larger values of $k$ are obviously possible; for instance, we also ran the case $k = 8$ (see the Supplementary Material Section), where the conclusions are similar to those for $k = 3$ and $k = 5$. Therefore, to facilitate the reading of Table 1, we only present the cases $k = 3$ and $k = 5$. These numbers of studies keep the simulation exercise manageable and are also realistic: Davey et al. [18] conducted an extensive review of the Cochrane Database of Systematic Reviews (CDSR) and pointed out that just under 75% of the meta-analyses contained five or fewer studies.
The sample size within each study is also a crucial parameter of the simulation. For simplicity, we assume a common sample size for the $k$ studies, taking the values 10, 30, 100, and 300. Finally, different "true" partitions are considered for each $k$: the heterogeneity and homogeneity cases are always included, and one or two intermediate cases are also analyzed. Table 1 shows the parameters of the simulated data. The $\theta_i$ parameters used in the simulation differ enough between clusters that, with moderate sample sizes, the Bayesian selection process can be expected to detect the true model. For each simulation scenario, 500 simulations were performed. To analyze the effect of the prior distributions on the posterior probabilities of the partitions, we report the proportion of times the true model is found to be the model with the highest posterior probability, and the mean posterior probability of the true model over those cases in which it is found to be the most probable. We also report the number of cases in which the homogeneity cluster ($r_1 = (1, 1, \ldots, 1)$) and the heterogeneity case ($r_k = (1, 2, 3, \ldots, k)$) are found to be the most probable model. Figures 6 and 7 show the results of the frequentist validation for $k = 3$ and true partitions $r_1 = (1, 1, 1)$, $r_2 = (1, 1, 2)$, and $r_3 = (1, 2, 3)$, corresponding to homogeneity, intermediate heterogeneity, and heterogeneity, respectively. As expected, when the true case is homogeneity ($r_1$), the uniform prior performs worse, as it is the only one that does not assign a higher prior probability to homogeneity. Nevertheless, the uniform prior reaches a proportion of correct choices close to 70% with sample sizes greater than 100. The mean posterior probabilities exceed 30% and 40% for sample sizes of 100 and 300, respectively. Observe that $b(3, p) = 1$ for $1 \le p \le 3$, so the HU2 and HU3 priors coincide and give identical results.
The hierarchical Poisson prior, which assigns a higher prior probability to the homogeneity case, reaches a proportion of correct choices higher than 95% even with sample sizes of 10, and posterior probabilities higher than 50%. All the prior distributions perform well for large sample sizes. When the true partition is an intermediate one, the results vary. Figure 6 shows the results for the true case $r_2 = (1, 1, 2)$. The uniform prior performs best, although its proportion of correct choices is below 50% for $n = 10$; its mean posterior probability reaches 80% with a sample size of 300. With the hierarchical uniform priors, the true model is never chosen as the most probable for $n = 10$; in this case, the mean posterior probability of the true partition when it is the most probable is undefined and is shown as 0 in the figure. For small sample sizes, these priors select the heterogeneity case as the most probable (Figure 7). The hierarchical uniform priors reach 50% of correct choices only for a sample size of 300. The hierarchical Poisson prior improves on the hierarchical uniform priors, although for a small sample size of $n = 10$ it chooses the homogeneity case in more than 85% of the simulations (Figure 7). When the true model is the heterogeneity case, $r_3 = (1, 2, 3)$, the hierarchical uniform priors (HU2 and HU3) perform best, even with small sample sizes. Surprisingly, the uniform prior gives worse results than the hierarchical Poisson prior. This can be explained by the higher prior probabilities that the uniform prior assigns to intermediate partitions, which hinder the identification of the heterogeneity case as the true model. The proportion of correct choices and the mean posterior probability of the true model are close to 100% for all the model priors when the sample size is 300.
Figures 8 and 9 show the results of the frequentist validation for $k = 5$ and the true partitions $r_1 = (1, 1, 1, 1, 1)$ (homogeneity), $r_2 = (1, 1, 1, 2, 2)$ and $r_3 = (1, 1, 2, 2, 3)$ (intermediate situations), and $r_5 = (1, 2, 3, 4, 5)$ (heterogeneity).
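The summaries reported for each scenario can be computed as in the sketch below: the share of replicates whose most probable model is the true partition, the mean posterior probability of the true partition over those replicates (reported as 0 when there are none), and how often selected partitions such as $r_1$ and $r_k$ come out on top. The data-generating helper draws $x_i \sim \mathrm{Bin}(n, \theta_{r_i})$ with a common $n$; the $\theta$ values used in the paper are those of Table 1 and are not reproduced here, so the values in the example, the hand-made posteriors, and all function names are hypothetical.

```python
import random
from collections import Counter


def simulate_counts(r_true, theta_by_cluster, n, rng):
    """Draw x_i ~ Bin(n, theta_{r_i}) for each study, with common sample size n."""
    return [sum(rng.random() < theta_by_cluster[ri - 1] for _ in range(n)) for ri in r_true]


def top_model(post):
    """Partition with the highest posterior probability in {r_p: Pr(p, r_p | x)}."""
    return max(post, key=post.get)


def summarize(posteriors, r_true):
    """(share of replicates selecting r_true, mean Pr(r_true | x) when selected, else 0)."""
    hits = [post[r_true] for post in posteriors if top_model(post) == r_true]
    share = len(hits) / len(posteriors)
    mean_prob = sum(hits) / len(hits) if hits else 0.0
    return share, mean_prob


def selection_shares(posteriors, models):
    """Share of replicates in which each listed model (e.g. r_1 and r_k) is the top model."""
    counts = Counter(top_model(post) for post in posteriors)
    return {m: counts[m] / len(posteriors) for m in models}


rng = random.Random(2024)
print(simulate_counts((1, 1, 2), [0.2, 0.7], n=30, rng=rng))  # hypothetical thetas
# Hand-made posteriors for three replicates with k = 3 (illustration only).
posteriors = [
    {(1, 1, 1): 0.6, (1, 1, 2): 0.3, (1, 2, 3): 0.1},
    {(1, 1, 1): 0.2, (1, 1, 2): 0.7, (1, 2, 3): 0.1},
    {(1, 1, 1): 0.5, (1, 1, 2): 0.2, (1, 2, 3): 0.3},
]
print(summarize(posteriors, (1, 1, 1)))
print(selection_shares(posteriors, [(1, 1, 1), (1, 2, 3)]))
```

In a full run, each replicate's posterior dictionary would come from the model-averaging step, repeated 500 times per scenario.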

Frequentist Evaluation
The analysis of the homogeneity case with $k = 5$ shows a poor performance of the uniform prior, even worse than for $k = 3$: the proportion of correct choices exceeds 50% only for a sample size of 300. As shown in Figure 9, when the uniform prior misses the true model it selects intermediate models, since the heterogeneity case is never chosen. As for $k = 3$, the hierarchical Poisson prior performs better than the HU2 and HU3 priors, and their performances become similar as the sample size increases. The HU2 and HU3 priors give quite similar results, overlapping in some cases.
Once again, the intermediate cases with $k = 5$ behave similarly to those with $k = 3$. The uniform prior reaches a higher proportion of correct choices, although for the case of two clusters ($p = 2$) the hierarchical Poisson prior improves on it for sample sizes greater than 100. With a moderate number of clusters ($p = 3$), all the model priors have difficulty choosing the true model with small sample sizes: the hierarchical Poisson prior chooses the homogeneity case, while the uniform prior chooses other intermediate models (Figure 9). The hierarchical uniform priors never choose the true model with sample sizes smaller than 300, showing a preference for the heterogeneity case.
For the heterogeneity case $r_5 = (1, 2, 3, 4, 5)$, the proportion of correct choices shows a U-shape for all model priors except the uniform prior. For small sample sizes, the higher prior probability assigned to the extreme cases leads to a preference for the heterogeneity case (the homogeneity case is never chosen, as shown in Figure 9). As the sample size increases, the influence of the prior is reduced and other intermediate partitions are chosen, probably because of the small differences in the true probabilities of success among the 5 studies (see Table 1). Finally, with a sample size of 300, all model priors choose the true model.
An additional analysis for the case k = 8 is shown in the Supplementary Material Section. The results are very similar to those obtained for the case k = 5.

An Illustrative Example with Real Data
In this section, we present an illustrative example with real data to analyze the impact of the model priors on the estimation of the meta-parameter $\theta$. Stanworth et al. [19] conducted a meta-analysis to determine the effectiveness of granulocyte transfusions, compared with no granulocyte transfusions, in reducing mortality when treating infections in patients with neutropenia or disorders of neutrophil function. The dataset in Table 2 is extracted from Stanworth et al. [19] and corresponds to the mortality analysis in four studies ($k = 4$; subgroup analysis of studies transfusing more than $1 \times 10^{10}$ granulocytes, at days 20-22) of patients with neutropenia or neutrophil dysfunction treated with granulocyte transfusions (Treatment). This dataset is well suited to the model proposed by Moreno et al. [9], as the number of cases is small and one study has no cases at all. For $k = 4$, there are 15 possible partitions, and the estimate of the meta-parameter $\theta$ depends on the partition considered. Figure 10 shows the posterior mean conditional on each partition. The posterior mean varies from 0.1929 for the partition $r_2 = (1, 1, 1, 2)$ to 0.1344 for the partition $r_3 = (1, 2, 3, 2)$. The four model priors are applied to this dataset; the prior probabilities they assign to each partition for $k = 4$ are those described in Section 2. To include model uncertainty in the analysis, the conditional posterior distributions of $\theta$ are averaged into the BMA posterior distribution, which depends on the model prior assumed. Table 3 shows the top cluster models for the four model priors. The top cluster models and their posterior probabilities are sensitive to the model prior. For the uniform prior, the most probable model is $\{x_1 x_2 x_3 \mid x_4\}$, with a posterior probability of 0.19.
However, this model is only the second most probable for the hierarchical uniform priors and the third for the Poisson prior; these priors select the heterogeneity case as the most probable model. This analysis also highlights the importance of BMA estimation, as it incorporates model uncertainty into the estimate of the meta-parameter. The BMA estimate of the meta-parameter is less sensitive to the choice of model prior, ranging from 0.159 for the hierarchical uniform priors to 0.164 for the uniform prior.

Conclusions
Bayesian methods for the design, analysis, and synthesis of clinical trials have been developed in several areas, including meta-analysis, where the structure of the between-sample heterogeneity is essential for estimating the meta-parameter. Within this Bayesian framework, we address the question from a different standpoint, arguing that between-sample heterogeneity is a clustering problem and that model uncertainty can be incorporated into the inference through a Bayesian procedure. Under this procedure, the posterior probabilities of the cluster models are computed, and the definition of the prior distribution over the models takes on special importance.
Meta-analysis for binary data is an increasingly used tool for estimating the effectiveness of a treatment. It presents statistical challenges that have been addressed in the literature, such as the presence of zero counts, which make it difficult to apply logit transformations to the data [20,21]. The definition of an objective Bayesian meta-analysis for binary data that does not require transforming the data was an advance in the literature [9].
The objectivity of this analysis comes from the prior distributions assumed for the experimental parameters ($\theta_i$) and the meta-parameter ($\theta$) [22]. However, when the between-sample heterogeneity is treated as a problem of clustering the experimental parameters ($\theta_i$), objectivity is in doubt, since Bayesian model selection requires prior distributions over the models. Because of the hierarchical structure of the clusters, the uniform prior cannot be assumed to be the best or the only option. Moreno et al. [10] proposed the hierarchical uniform prior with three levels, but other options are possible.
In this paper, we analyze the properties of four model priors that assume no prior information about the cluster model: the uniform prior, the hierarchical uniform priors with two and three levels, and the hierarchical Poisson prior. Other priors have been proposed in the literature for clustering problems, such as the Ewens-Pitman prior [23][24][25] or the Jensen-Liu prior [26]. However, these priors require the elicitation of a hyperparameter reflecting prior information about the models.
A first conclusion of the frequentist validation is that none of the prior distributions is completely non-informative: the posterior probabilities of the models are very sensitive to the model prior, even with moderately large sample sizes. A useful guideline for practice could be as follows. If homogeneity is considered likely, the hierarchical Poisson prior for small sample sizes, or the hierarchical uniform priors for moderately large sample sizes, are the best options. If heterogeneity is considered likely, the hierarchical uniform priors are preferable. Finally, if the true cluster structure is thought to be neither homogeneity nor heterogeneity, the uniform prior can be used even for small numbers of studies and small sample sizes.
A second conclusion is that carrying out a meta-analysis based on a single partition (even the partition with the highest posterior probability) can yield biased results, as it ignores an important part of the uncertainty in the estimation of the meta-parameter. The BMA procedure offers a natural way to incorporate this model uncertainty into the estimation of $\theta$ [27].
As the BMA procedure requires estimating the meta-parameter under all possible partitions, and the number of partitions (the Bell number $B_k$) grows very quickly with $k$, computation becomes time-consuming when the number of studies $k$ is moderately large. In that case, a set of good cluster models can be found using a stochastic search algorithm [13].

Funding: Financial support for this study was provided in part by grant ECO2017-85577-P (Ministerio de Ciencia, Innovación y Universidades, Agencia Estatal de Investigación, Spain).

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
Mathematica code implementing the simulated and real-data experiments is available in the Supplementary Material Section.