Bayesian Hierarchical Framework from Expert Elicitation in the South African Coal Mining Industry for Compliance Testing

Occupational exposure assessment is important in preventing occupational coal worker’s diseases. Methods have been proposed to assess compliance with exposure limits which aim to protect workers from developing diseases. A Bayesian framework with informative prior distribution obtained from historical or expert judgements has been highly recommended for compliance testing. The compliance testing is assessed against the occupational exposure limits (OEL) and categorization of the exposure, ranging from very highly controlled to very poorly controlled exposure groups. This study used a Bayesian framework from historical and expert elicitation data to compare the posterior probabilities of the 95th percentile (P95) of the coal dust exposures to improve compliance assessment and decision-making. A total of 10 job titles were included in this study. Bayesian framework with Markov chain Monte Carlo (MCMC) simulation was used to draw a full posterior probability of finding a job title to an exposure category. A modified IDEA (“Investigate”, “Discuss”, “Estimate”, and “Aggregate”) technique was used to conduct expert elicitation. The experts were asked to give their subjective probabilities of finding coal dust exposure of a job title in each of the exposure categories. Sensitivity analysis was done for parameter space to check for misclassification of exposures. There were more than 98% probabilities of the P95 exposure being found in the poorly controlled exposure group when using expert judgments. Historical data and non-informative prior tend to show a lower probability of finding the P95 in higher exposure categories in some titles unlike expert judgments. Expert judgements tend to show some similarity in findings with historical data. We recommend the use of expert judgements in occupational risk assessment as prior information before a decision is made on current exposure when historical data are unavailable or scarce.


Introduction
Coal is the second largest energy source in the world, contributing a quarter of the world's energy sources, and the consumption of coal has been on the rise in recent years. By 2040, worldwide coal consumption is expected to rise at a rate of about 1.5% per year [1]. South Africa is the fourth largest coal producer in the world, employing more than 92,000 workers in 2019 [2]. Coal mining areas in South Africa are geographically found in the KwaZulu-Natal, Mpumalanga, and Limpopo provinces. In coal mining activities, coal dust is produced. Inhalation of coal dust is highly associated with the development of Coal Mine Dust Lung Disease (CMDLD). CMDLD is life-threatening to coal mine workers in developing countries [3,4].

Study Design and Data Collection
The study was a survey collection of respirable coal mine dust exposure in underground coal mines in South Africa. The coal mine workers were only males. Workers included in this study were from 10 job titles. The inclusion criteria for the selection of these job titles were based on the availability of previous or historical (past) exposure data and current data. Job titles must have exposure from both the previous and current or the latest year to be included in the analysis. The data from the previous year were used as prior data. Personal sampling of respirable dust was conducted by using cyclones attached to a sampling pump followed by gravimetric analysis [16]. Further details on sampling are available from the earlier paper [6]. Exposures were measured over a full shift and expressed as eight-hour time-weighted average dust concentration (TWA8h concentration).

Expert Elicitation Process
In this study, a modification of the "Investigate", "Discuss", "Estimate", and "Aggregate" (IDEA) structured protocol was used [17]. The first step in the IDEA protocol is the background information on experts, followed by an investigation where experts are requested to individually answer questions and provide reasons for their answers. During the discussion, an expert was shown anonymous answers from each participant to resolve varying interpretations of the questions. Each expert had to estimate for the second and final time. During the post-elicitation process, aggregation of the data takes place, where the mean of experts' second-round responses is obtained. Many studies have adopted the IDEA protocol seamlessly. The protocol was evaluated in public health, ecology, and conservation studies, and recently in the US Intelligence Advanced Research Projects Activity [18,19]. The advantage of the IDEA protocol is that it minimizes bias resulting from overconfidence, availability, and representativeness. It is also possible to apply for remote elicitation, which is cost-effective. We invited a group of experts to take part in the study. The experts were given time to investigate and understand the questions, and then give their feedback on the questions. The modification from the IDEA protocol is that we did not allow experts to receive feedback on their judgements from other experts. The experts were not encouraged to discuss their results to provide second estimates. We instead adopted and simplified the IDEA protocol four-step elicitation questions in Appendix B. The advantage of the modification is that it prevents more experienced expert(s) from influencing the discussion which may skew the final judgements toward just one dominant expert. The expert elicitation process was done remotely.
In the four-step process we used in this study, the credible intervals (range) were the minimum and maximum probability of exposure in each of the exposure categories. These ranges were standardized using linear extrapolation [18,20].
The minimum standardized percentage: B − ((B − L) × (S/C)) The maximum standardized percentage: B + ((U − B) × (S/C)) Where B is the best possible guess, L is the minimum percentage, U is the maximum percentage, S is the level of credible intervals to be standardized, and C is the level of confidence (how sure is the expert). The distributions were truncated for extreme values if the adjusted credible intervals fall outside the probability bounds of 0 and 100%. The credible intervals were standardized to reduce overconfidence.
The expert's judgements were combined to form a joint probability distribution. The distribution was used as an informative prior, updating current data from respective job titles to produce posterior estimates when compared to the OEL. The individual expert probability distribution can be mathematically or behaviourally aggregated to produce the joint distribution [21]. For this study, we used equal weighting to aggregate the data.

Selection of Experts
Experts were invited through email to take part in this study. The average year of experience was 13 years. The experts were occupational hygienists, mine ventilation engineers, supervisors, mine engineers, and geologists. The experts were found by the lead of occupational health and hygiene in the coal mining industry. The identified experts then recommended other experts through snowballing. All experts who accepted the invitation were asked to voluntarily take part. Experts had a range of experience associated with various responsibilities and/or connections to occupational health and safety in coal mining.
Before the elicitation, experts were trained on the uncertainty, the P95, and an outline of the tasks to be performed. They were also provided with an information sheet to obtain their professional background in the field. Each job title was described in detail to the experts and an example of how the probability of exposure to each exposure category was shown to familiarize them with the elicitation. An expert was allowed to consult at any time if he/she is confused or lost in the elicitation. According to Highlights of the Expert Judgement Policy Symposium and Technical Workshop conference, held in the year 2006 [22], a sample size between six to twelve experts is recommended for expert elicitation. In this study, a sample size of six experts was included.

Statistical Methods
Statistical analysis was conducted in STATA version 14.1 [23], and R version 4.1.1 (R Core Team, Auckland, New Zealand), using RStan and bayestestR packages for the Bayesian model [24][25][26]. A summary of the current and historical data was presented as arithmetic mean (AM), standard deviation (SD), geometric mean (GM), and geometric standard deviations (GSD). Each HEG is said to be unique [6], however the same job title and work activity can be found in different HEGs. The median difference in job title exposure across HEGs was assessed by using the Kruskal-Wallis Test, a non-parametric version of the analysis of variance (ANOVA). A job title with a statistically significant p-value of less than 0.05 at the 95% confidence interval was not selected for the Bayesian analysis, as this would mean the exposure distribution is different across HEGs. Such a job title is not regarded as having a similar exposure profile.

Producing Joint Distribution from Experts' Elicitation
We used the Sheffield Elicitation Framework (SHELF) tool to produce the joint probability distribution. SHELF is an open-source tool developed by the University of Sheffield [27].
The SHELF tool finds the distribution that best fits the expert's elicited judgement. The SHELF tool works by using lease squares procedures by finding a distribution that best fits the given expert's data inputs by minimizing the sum of squares of the residuals. Recall that occupational exposure data are best fitted by a lognormal distribution. In this study, we used lognormal distribution to produce our prior means and variance for the Bayesian framework. In the SHELF tool, we used the quartile method to elicit our expert's judgement in a continuous quantity [28].
The Quartile methods ask for experts' median and quartiles. The quartiles are presented as lower (Q1) and upper quartiles (Q3). The lower (L) and upper (U) limits during the elicitation were 0 and 100%. The Q1 is a value between L and M, where M is the median value, while Q3 is a value between M and U. In the expert's elicitation questionnaire in Appendices B and C, we collected the expert's best guess in form of percentages to distribute the P95 grouping according to the exposure categories. Additionally, we asked about their minimum and maximum percentages on the probability of the P95 coal dust exposure. This was important to capture the expert's uncertainty when they may not be sure about their best guess. Therefore, in the Quartile methods, the best guess was used as the median, while their minimum and maximum percentages were represented as the Q1 and Q3, respectively. The cumulative probabilities were set at 0.25, 0.5, and 0.75.
A minimum informative non-parametric distribution then spread the mass uniformly between estimates to make sure the joint distribution fully is the expert assessment [29]. The arithmetic mean of the lognormal mean and variance was obtained with the assumption that each expert contributed equally. The mean and variance were then used prior to the Bayesian framework for the exposure compliance assessment.

Bayesian Framework
Consistent and similar coal dust exposure from expert judgements data of the job titles were used to update current monitoring data. The posterior geometric mean (GM), geometric standard deviations (GSD), P95, and the probabilities of finding the P95 exposure in each of the exposure control categories were obtained. For the prior specification, we randomly selected a prior sample size of five out of the six experts' data collected, as recommended from earlier studies for occupational exposure assessment [30,31]. From the occupational exposure perspective, the prior sample size should be from 10% to 40% of the current data so that the posterior distribution learned more from the current likelihood data than from the prior distribution. Therefore, the sample size of five was used to keep the focus of the posterior distribution on the current data, as the sample size of the current data increases. In Bayesian statistics, the posterior distribution is a compromise between the information from the prior and the current data, but the distribution must be seen from the current data to a good measure as the sample size increases [32]. For the likelihood function, we took all the available current monitoring data.

Model Specification Using Current Likelihood Data
The likelihood was specified as µ and σ which is the GM and GSD, respectively. The likelihood function is presented below in Equation (1).
y i is the log-normal (LN) current monitoring data, n is the number of observations. The OEL exposure categories were added as a random variable in the model directly to produce the posterior probability distribution of the P95 to each of the categories [30,31].

Model Specification Using a Non-Informative Prior Distribution, and Informative Prior from the Historical Data and Expert Judgements
The non-informative prior follows a uniform distribution with GM and GSD. For the GM, µ = LnGM ∼ Unif(a µ b σµ ), and for the GSD, , where a and b are the lower and upper bounds of the prior distribution, respectively.
For the informative prior, the GM µ takes the form as shown below in Equation (2).
where − y 0 and s s y0 are the prior mean and variance, respectively, and n 0 is the prior sample size. For the GSD, the variance is given by Equation (3).
For n 0 > 1, where IG (a, b) is an inverse gamma distribution in Equation (4) with shape parameter a, and scale parameter b Therefore, Further details on the prior specification and full conditional for µ and σ 2 are available at Made et al. [33], and Quick, 2017 [31].

The Sensitivity Analysis for the Parameter Space
In this study, we placed restrictions on the lower bounds of the µ and σ 2 and P95 using the suggestions from Bayesian decision analysis (BDA) [15]. For the upper bound, we took motivation from Quick, 2017 [31], and allowed it to vary iteratively. The upper bound was not placed on any restriction to avoid being unfairly skewed towards a more favorable or unfavorable result. Despite this, we believed that the use of the parameter space might misclassify an exposure to a wrong exposure category. Therefore, we randomly took two job titles, namely shuttle car operator and pump attendant. We changed the parameter space and compared their findings.
In shuttle car operator, the GM and GSD's lower bound was set at 0.00001, assuming that the coal dust exposure cannot be zero. Their upper bound is set to be just above the GM and GSD values calculated in this study data. The GM is 1.08 so the upper bound was placed at 1.09; the GSD is 2.62 and upper bound was placed at 2.63, to make sure the exposures estimate fall within.
For the pump attendant, we placed the upper bound of the GM to be 0.42, just above the actual value of 0.41, while GSD was placed at 3.42 above the actual value of 3.41.
We also ran analysis on these two job titles without placing any restrictions on the parameter space. In other words, no lower and upper bounds were specified.
The full posterior conditional distribution was drawn using the Markov chain Monte Carlo (MCMC) of the Gibbs sampler [34]. The Gibbs sampler was used because of computational flexibility. The Gelman-Rubin convergence diagnostics was used to assess the stationary of the posterior distributions including the reliability of the posterior samples. Please refer to Made et al. [33] for more details on the Gibbs sampler and convergence diagnostics.

Results
A summary of the job titles and their corresponding exposures are presented in Table 1. A total of 10 job titles were included in this study. The job titles were chosen on the basis that they have both monitoring (current) data and earlier (historical) data in the HEG they belong to. There are 455 observations in the monitoring data and 108 from historical data. Six job titles in the current data showed high exposure variability according to the GSD (GSD greater than 3), while from the historical data, eight job titles showed high exposure variability according to the GSD. All the job titles showed statistically non-significant differences in median exposure variation across their parent HEGs (p < 0.05).  Table 2 shows the posterior GM, GSD, and P95 from the Bayesian analysis. The GM from the non-informative and informative prior distribution were all below the OEL. Only one job title had a GM above the OEL in the Bayesian findings generated by informative prior from expert judgments. For variability of exposures, four and five job titles showed high variability according to the GSD in the non-informative and informative prior from historical data, respectively. The posterior GSD from the expert judgments showed that six job titles had a high variability of exposure. Three of the job titles showed consistently high variability of exposures across all the methods. Figure 1 shows the comparison of the posterior P95 by job titles. The findings from the use of expert judgments as prior indicated higher P95 in all job titles exposure. Only for pump attendants, roofbolt operators, safety officers, and shuttle car drivers, the P95 is lower from historical data than from the non-informative prior distribution. Table 3 estimates the probabilities of grouping exposure in each exposure category. Overall, there were high probabilities (above 75%) of finding exposures in the highest category (4) (above OEL/very poorly controlled). The pump attendants' exposures had a higher probability of being in exposure category 4 from non-informative prior (79.69%) than historical data (73.09%). However, there were higher probabilities of being in exposure category 3 when using historical data compared to the non-informative. The findings from the experts showed more than a 99.9% probability of all the exposures in the poorly controlled group (exposure category 4).  Table 4 shows the results of the sensitivity analysis. The findings of the parameter space we employed versus putting no restriction on the bound and using different parameter values were compared (see Section 2). Using no parameter space revealed 100% probability of finding the P95 in the highest exposure category. Using parameter values described in the Section 2 showed similar findings with the values inspired by the BDA on the probabilities of locating the P95 in each exposure category. The posterior probabilities when using experts' judgment as prior were similar (99.94% versus 99.91% for exposure category 4). The findings from the non-informative and informative prior from historical data showed slightly higher probabilities of the exposure in category 3 when using different parameter values than values from the BDA. Since we used a lower parameter space for the lower and upper bound, this resulted in lower probabilities of finding exposures in category 4.

Discussion
This study aimed to compare the posterior probabilities of grouping the P95 according to the COP exposure control categories or groups. We derived informative prior distributions from expert judgments and historical data for each job title. Since the same job titles were found in more than one HEG or mining area, we included those that were similar in exposure distribution across HEG. This is important, as each HEG could have different exposure control methods that could influence exposure variability. It is only natural that the exposure distribution of a job title should be similar regardless of the area, since they perform the same work activity [6]. This is important, because their findings can be generalized back to the workers that have yet to be selected. In occupational exposure assessment, incorporating expert judgments in a Bayesian framework for exposure categories was first used and proposed by Hewett et al. [15]. The use of expert judgments in this study is necessary because if historical data are unavailable or inadequate, expert knowledge can always be used to answer questions about the quantity of interest. The expert elicitation was conducted using a modified IDEA [17]. We changed the IDEA technique by not allowing expert judgments to be shared among each other. It was good to avoid a more experienced expert having too much influence over the final expert prior to distribution. The confidence level in individual expert judgments was adjusted using the 70% upper credible limit from the Committee of European Normalization approaches [35]. The joint probability lognormal distribution for the prior mean and variance from the individual experts were derived using the SHELF package [27]. The lognormal parameters were generated so that the lower point is the 2.5th percentile and the highest value is the 97.5th percentile. SHELF was used to conduct the elicitation of uncertain quantities in the form of probabilities from a group of experts. Instead of running expert group interaction and sharing information to get a consensus, we only used SHELF to produce the lognormal distribution of the mean and variance of the uncertain quantities. These parameters were used as informative priors in a Bayesian framework.
Our study found that the posterior GM from non-informative priors and the prior derived from historical and expert data were all below the OEL (OEL = 2 mg/m 3 ), showing no overexposure risk when considering the GM as a parameter to estimate the risk ( Table 2). It is important to know that exposure management decisions are based on the P95 of the lognormal exposure distribution [36]. The posterior GSD showed more job titles had very high exposure variability in the expert prior than in the historical and weak priors. The distribution of the median P95 (95% CrI) across all methods is in Figure 1. The P95 estimates the upper bound of the TWA8h exposures and can be achieved for 95% of the workers [37]. Compared to the other two approaches, the P95 was mostly lower in the non-informative prior. The 95% CrIs were more expansive in the non-informative prior and the prior derived from experts' judgments. That means that a risk decision may not be taken with high confidence, since the level of uncertainty is high. The reason the experts' data had a wider 95% CrI may also be due to the different beliefs of experts about their uncertainty and level of confidence.
The posterior probabilities of finding the P95 in a specific exposure control category are shown in Table 3. From the informative prior from historical data, there were at least 75% probabilities of finding the exposures in the highly exposed exposure category 4 similarly to the non-informative prior distribution. The findings from using expert judgments showed more than 95% probabilities of exposures in the highest exposure category. This may show the experts either overstated their beliefs or their exposures were higher. The results from the experts are inconsistent with those from previous studies where qualitative exposure assessments showed underprediction of exposure by experts [38,39]. The results generated by the experts and historical data tend to show more consistency. These higher probabilities of finding the P95 in the highest exposure category when using expert judgments might be attributed to overconfidence on the part of some experts, despite reducing their confidence level by 30%. Some of the GSDs indicated low variability of exposure, but the respective probabilities of locating the P95 were found in higher exposure control categories. These contrasting results might show that incorrect priors updated the posterior P95. Therefore, more sampling might be needed to repeat the process and reach a decision. Our results also contrast with the earlier study, which used HEGs; here, we found that the probabilities of the exposures being in the higher category (poorly controlled exposure) were greater than 95% [33]. This clearly explains why grouping by using a job title can give a truer exposure distribution than grouping according to HEGs. Despite having only some of the data for compliance testing, exposure determinants, e.g., specific activities (often associated with job titles), workplace conditions, and production volume, are highly needed in exposure assessment [40,41]. Therefore, job titles are important to consider as a determinant of exposure variation.
The BDA motivated the lower and upper bounds placed on the parameter space in this study (see Table 4) [15]. The use of bounds in the parameter space is essential to avoid the exposure being classified in an unfavorable exposure category, either in the lowest exposure category, the highest exposure category, or infinity. According to the BDA, the GM and GSD cannot be any values, no matter how large or near-impossible they can be like in frequentist statistics. Thus, the minimum and maximum values were set as bounds for GM and GSD. However, in this study, specifying bounds for the parameter space might misclassify exposure towards the favorable or unfavorable exposure control category. We conducted a sensitivity analysis to compare the bounds adopted in this study with if no bounds are specified, or different bounds are used. The findings suggest that using no bounds tends to place exposures in the poorly controlled exposure category compared to using parameter values in the parameter space. The use of another bound shows comparable results to the one we adopted. It is also natural that exposure measurement values in occupational settings cannot be extreme or at infinity, thus the use of bounds. An earlier study revealed that because of the small sample size of the data, the use of no bound could have placed the exposure in the highest exposure category [31].
We draw strength from this study because Bayesian statistics naturally allow earlier information from historical data or expert judgments to be incorporated into the model with the current data for informed decision-making [7]. Even with a small sample size, the results are robust and easy to interpret, even by non-statisticians. One limitation of the study is that, since we had inadequate data from experts, we could not apply weighted aggregation, which would have improved the accuracy of the prior experts. Experts' judgments can also be very biased, and their responses are dependent on their knowledge of the activities of a specific job title. Human judgments can be complex and biased. Sometimes experts rely on the Rule of Thumb to make decisions or judgments about unknown quantities, including probability assessments. The Rule of Thumb does not synthesize information like the algorithmic process to arrive at a particular judgment [42,43]. Therefore, biases might be common with the use of expert judgments. Expert training is highly encouraged in an expert elicitation process to minimize biases and improve the cognitive interpretation of the information needed. Finally, it is important to note that Bayesian analysis also offers confidence in whether to use experts' judgment as a prior or not, as it can compare expert prior updates with the non-informative prior.

Conclusions
Expert judgments are particularly useful when data are scarce or the available data are inadequate. It is well known that experts' judgments can be very biased, but the use of a robust method and a sensitivity analysis may minimize the bias. Our findings suggest that prior expert judgments can produce similar posterior probabilities of exposure when compared to historical data. This study seeks to advance the prevention of overexposure to coal dust. Therefore, mine ventilation engineers, occupational hygienists, and safety officers are encouraged to incorporate our findings into their routine risk assessments. Future occupational exposure compliance assessments should be based on this Bayesian framework, where historical data or, if unavailable, experts' judgments should be used to reach an informed decision. Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.
Acknowledgments: Our gratitude goes to Salome Makgato from the Zibulo Colliery for giving us a full description of the data collection. We also thank our guess experts for their unwavering support and contributions.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix B. Data Collection Questionnaire
Experts Prior Elicitation Instruction for the experts Based on the 95th percentiles (P95) value of the eight-hour time-weighted average (TWA8h) coal dust concentration distribution for a specific job title, question (a) will ask you to rate (in percentages) where you think the chances are (or the likelihood is) that the P95 would be in each of the occupational exposure category. Instruction: If you think that it is absolutely clear that it will fit in exposure category 4, you fill in 100% and leave the rest of the categories empty; if you think that the highest chance is (most likely) that the P95 will be in exposure category 4, e.g., 80%, but there also a chance that it will be in category 3, say 20%, you fill in 80% for category 4 and 20% for category 3. The total sum of the percentages should be equal to 100. Therefore, you have to distribute your ratings in categories in a way that the total is 100%.
The next question (b) asks you to give the range (minimum-maximum) of the percentages that you answered to question a (your best estimate).
Question (c) is about your level of confidence that the answer(s) you gave to questions an and b are correct. Please bear in mind that this is about your level of certainty and is different from the estimated distribution of the P95 over exposure categories, as was asked in questions (a) and (b). Table 1 is the description of the exposure categories. Below are the three questions that you can use to put your rating in percentages in Table 2 for each job title. For question a, please make sure your answers are equal to 100%.
Expert elicitation questions for Table 2 (please give all answers in percentages).
(a) Based on your experience in the industry, what would be your best guess in percentages, of the P95 of exposure concentrations being found in each of the exposure categories? (b) What would be the maximum and minimum percentages in each of the exposure categories? (c) How sure are you that the above answers are correct?
Appendix C