1. Introduction
Safety analysis focuses on identifying the most significant factors that affect the occurrence of an incident [1,2]. Analyzing occupational incidents based on industry and injury characteristics is important for finding causes of accidents, and managing prevention planning [3]. Occupational injury management is significant from organizational, engineering, and economic points of view in an industry [4]. Applying data analytics to identify risk groups in an organization results not only in optimizing workers’ productivity, but also in safety improvement through the targeted control of occupational injuries and illnesses [5]. Injury risk analysis allows investigating modifiable occupational injuries by focusing on intervention [6].
The prediction of occupational incidents is an important task for any industry [7]. Since occupational incidents affect workers’ lives, both in and out of work, and impose a considerable economic burden on employers, employees, medical care systems, insurance companies, and society, taking planned actions to reduce the frequency of occupational incidents is necessary [8]. With the purpose of reducing the frequency and severity of occupational incidents and improving occupational safety, researchers aim to identify the causes and mechanisms of occupational incidents [9,10]. Introducing any preventive measure to reduce the risk of occupational injuries is based on a correct assessment of risk components using quantitative methods [11]. Data mining techniques contribute to deriving actionable conclusions from empirical data to improve workplace safety [4]. Understanding the factors influencing occupational incidents is the first step in the process of preventing incidents and improving workplace safety [9]. In addition, analyzing trends and patterns of occupational incidents helps to develop effective and actionable incident prevention strategies and reduce or eliminate workplace injuries [2]. Incident severity prediction models are crucial for improving safety [12]. In incident and injury severity analysis, identifying subgroups of incidents with homogeneous categorical variables is significant in determining factors that contribute to incident severity [13]. Furthermore, one of the principal objectives in occupational safety analysis is to identify the key factors that affect the severity of an incident. Most datasets with information about incidents have the issue of heterogeneity in the data [14]. Thus, clustering is a beneficial method for analyzing incident factors and gaining information about those variables that are statistically significant [15].
Latent class clustering for identifying injury/incident patterns has been used mostly in the field of traffic accidents or crash severity analysis. An analysis of cyclist–motorist crashes between 2007 and 2011 in Denmark revealed 13 distinct latent classes, and contributed to investigating their prevalence and severity [14]. This analysis showed that the features distinguishing the latent classes included incident factors such as speed limit, helmet-wearing behavior, and road surface conditions. After determining the latent classes of cyclist–motorist crashes, the severity of injuries per latent class was analyzed. A similar study performed in Italy [13] on cyclist crashes between 2011 and 2013 applied latent class clustering to identify distinct subgroups of crashes with categorical variables including road infrastructure, road user, vehicle, and environmental and time period variables. The authors segmented the cyclist crashes into 19 subgroups, each representing a different crash type. Another study used crash data on highway–railway crossings between 1997 and 2006 in the USA [16], and segmented the injury risk groups of such crashes using the latent class clustering approach. The results showed that the most influential factors separating the injury risk groups include the driver’s age, the presence of rain or snow, the time of the crash, and the motorist’s actions prior to the incident [16]. Several other studies applied latent class clustering for incident pattern recognition, and yielded useful information regarding clusters/subgroups of such incidents and injuries [17,18,19,20,21,22,23]. Although the latent class clustering approach has been popular in crash severity analysis, its application in occupational safety and health analysis has been limited. Virtanen et al. [24] performed a study on 2,445 employees with diabetes to segment them into separate clusters based on potential risk factors for future work disability. Another study used patients’ data and applied the latent class clustering method to identify factors associated with the risk of low back pain [15].
Despite ongoing improvement in coordinated prevention measures, improved technology, training, and higher education of the workforce [25], agriculturally-related industries are among the most hazardous work environments [26]. Understanding injury patterns and the underlying mechanisms of incidents with respect to a specific industry may produce effective insights for boosting policymaking, training, and incident/injury intervention and prevention efforts [27]. The aim of this study is to identify distinct and meaningful subclasses of occupational injuries with inflated costs in agribusiness industries based on workers’ compensation claims data. The inflated costs are indicative of the severity of the injury; the higher the claim-incurred amount in dollars, the more severe the injury. The novelty of this study lies in applying latent class clustering to the segmentation of high-severity occupational incidents in agribusiness industries. The results will contribute to informed decision-making that could help either prevent the occurrence or reduce the frequency of severe injuries with inflated costs in agribusiness industries.
3. Results
This section includes a discussion of the model fit statistics used to determine the best number of latent classes, and an explanation of the characteristics of the selected latent classes as well as the most statistically significant classifiers of latent classes. An analysis of the relationship between the latent class members and injury outcomes and costs completes this section. The terms “class”, “cluster”, and “latent class” are used interchangeably.
3.1. Summary of Latent Class Analysis
The LCA is conducted to identify statistically distinctive and meaningful risk subgroups of occupational incidents in agribusiness industries based on injury type, class codes, injured body part(s), cause, and nature of the injury. In the first step, latent class analysis is employed as an exploratory method for pattern recognition in the data, fitting eight models with 3 to 10 latent classes. AIC and BIC are used as the relative fit measures. Lower values of BIC and AIC indicate a better fit to the data. The fit statistics for models with different numbers of classes are shown in Table 3. The changes in BIC and AIC indicate that the model with three classes is the best fit. Based on the values of BIC and AIC, three classes with different injury patterns are found: class one (44.32% of the population), class two (34.31% of the population), and class three (21.37% of the population).
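The model comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log-likelihoods, the number of levels per variable, and the sample size are hypothetical placeholders (not values from Table 3), the helper names are invented for this sketch, and the parameter count assumes an unrestricted latent class model, which may differ from how the software used in the study counts parameters.

```python
import math

def lca_num_params(n_classes, levels_per_variable):
    """Free parameters of an unrestricted latent class model (assumed form)."""
    prevalence_params = n_classes - 1  # class prevalences sum to 1
    response_params = n_classes * sum(r - 1 for r in levels_per_variable)
    return prevalence_params + response_params

def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

def best_by(criterion_values):
    """Return the number of classes with the smallest criterion value."""
    return min(criterion_values, key=criterion_values.get)

# Hypothetical inputs for illustration only (NOT the study's data).
levels = [4, 6, 10, 12, 9]   # assumed number of levels of each input variable
n_obs = 1000                 # assumed number of claims
loglik = {3: -9000.0, 4: -8980.0, 5: -8970.0}  # assumed fitted log-likelihoods

aic_values = {c: aic(ll, lca_num_params(c, levels)) for c, ll in loglik.items()}
bic_values = {c: bic(ll, lca_num_params(c, levels), n_obs) for c, ll in loglik.items()}

print("best by AIC:", best_by(aic_values))
print("best by BIC:", best_by(bic_values))
```

With these made-up numbers the gain in log-likelihood from adding classes is too small to offset the penalty for extra parameters, so both criteria favor the smallest model.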
To assign each data row, which contains specific levels of the input variables, to a latent class (class 1, 2, or 3), the probability of class membership is calculated for each latent class. The class with the highest of the three probabilities is the latent class to which that data row is assigned. The statistical details for calculating the per-class probability follow [38].
Let j = 1, ..., J index the observed columns (Y) of input variables. For this study, those Y columns are the input variables of injury type, class codes, injured body part(s), cause of injury, and nature of injury. Denote the number of levels for column j by $R_j$. A multidimensional contingency table of the J variables contains $W = R_1 \times \cdots \times R_J$ cells. Each of these cells is defined by its response pattern for the J variables. Therefore, each response pattern is a J-length vector of the form $\mathbf{y} = (y_1, \ldots, y_J)$. Define Y to be the W by J array of all the response patterns considered as row vectors. Each element, $\mathbf{y}_w$, in Y has a probability $\Pr(\mathbf{y}_w)$. These probabilities sum to 1, as given in Equation (4):
$$\sum_{w=1}^{W} \Pr(\mathbf{y}_w) = 1 \qquad (4)$$
Consider the following notation:
C is the number of clusters in the latent class model.
$\gamma_c$ is the probability of membership in cluster c (the $\gamma_c$ are the latent class prevalences). These parameters sum to 1.
$r_{j,k}$ is the kth level of the jth response.
$\rho_{j,k|c}$ is the probability of observing response $r_{j,k}$ in column j conditional on membership in class c (the $\rho_{j,k|c}$ are the item-response probabilities). For a given cluster c and response variable j, the sum of the $\rho_{j,k|c}$ over k is 1.
$I(y_j = r_{j,k})$ is an indicator function that equals 1 when the $y_j$ response is the kth level of the jth response, and 0 otherwise.
As presented in Equation (5), the probability of observing a specific vector of responses $\mathbf{y}_w = (y_1, \ldots, y_J)$ is the sum, over the C latent classes, of the class prevalence multiplied by the conditional probability of observing that vector of responses in that class:
$$\Pr(\mathbf{y}_w) = \sum_{c=1}^{C} \gamma_c \prod_{j=1}^{J} \prod_{k=1}^{R_j} \rho_{j,k|c}^{\,I(y_j = r_{j,k})} \qquad (5)$$
Thus, Equation (5) is the denominator of the probability formula that is saved to the dataset for each row. The final formula for the probability per latent class gives $\Pr(\text{Cluster} = c \mid \mathbf{y}_w)$, which equals $\Pr(\mathbf{y}_w, \text{Cluster} = c) / \Pr(\mathbf{y}_w)$, where $\Pr(\mathbf{y}_w, \text{Cluster} = c) = \gamma_c \prod_{j=1}^{J} \prod_{k=1}^{R_j} \rho_{j,k|c}^{\,I(y_j = r_{j,k})}$ is the term for class c in Equation (5).
For example, for a data row that describes a permanent partial disability in the chauffeurs and helpers class code, with the knee as the injured body part, strain or injury by as the cause, and strain or tear as the nature of injury, the probability of belonging to latent classes 1, 2, and 3 is calculated as 0.9982, 0.0017, and almost zero, respectively. Thus, this row is labeled as latent class 1. This process continues until all data rows are labeled as latent class 1, 2, or 3. Then, the frequency of each class is calculated. Based on the results, class one includes 44.32%, class two includes 34.31%, and class three includes 21.37% of the data rows.
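The row-labeling step described above can be illustrated with a minimal Python sketch. It assumes the class prevalences and the item-response probabilities of a fitted model are already available; the two-class, two-variable numbers below are hypothetical and are not the fitted values of this study, and the function names are invented for the illustration.

```python
def posterior_class_probs(row, prevalences, item_probs):
    """Posterior probability of each latent class for one data row.

    row:         dict mapping variable name -> observed level
    prevalences: list of class prevalences (sums to 1)
    item_probs:  one dict per class, variable -> {level: probability}
    """
    joint = []
    for prevalence, class_probs in zip(prevalences, item_probs):
        p = prevalence
        for variable, level in row.items():
            p *= class_probs[variable][level]  # conditional independence within class
        joint.append(p)                        # Pr(row, Cluster = c)
    total = sum(joint)                         # Pr(row), the denominator
    return [p / total for p in joint]

def assign_class(posteriors):
    """Return the 1-based label of the class with the highest posterior."""
    return max(range(len(posteriors)), key=posteriors.__getitem__) + 1

# Hypothetical fitted parameters for illustration only.
prevalences = [0.6, 0.4]
item_probs = [
    {"nature": {"strain": 0.8, "fracture": 0.2}, "body_part": {"back": 0.7, "hand": 0.3}},
    {"nature": {"strain": 0.1, "fracture": 0.9}, "body_part": {"back": 0.2, "hand": 0.8}},
]
row = {"nature": "strain", "body_part": "back"}

posteriors = posterior_class_probs(row, prevalences, item_probs)
label = assign_class(posteriors)
print(posteriors, label)
```

Applying `assign_class` to every row and tabulating the labels gives class shares of the kind reported above.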
Furthermore, the analysis of the three-class model shows that medical injuries, major permanent partial disability, minor permanent partial disability, and permanent total disability are not present in any of the three classes. However, permanent partial disability and temporary total or partial disability injuries are the most prevalent in all three classes. Class three is the only class in which fatality has a non-negligible probability (0.16). Chauffeurs or helpers, grain elevator operations, gas and oil dealers, hay grain or feed dealers, grain milling, and farm machinery operations are the class codes present in all three classes with various probabilities (only those class codes with a probability higher than 0.06 are shown in the class tables). The most statistically distinctive factor is the nature of the injury, which differs in each class with a sizable probability. Injured body parts and cause of injury also differ in each class, though with lower probabilities. The mean total costs of claims are also different, with class one having the lowest mean of $205,583 and class three having the highest mean of $374,783. The mean total cost of claims for injuries in class two is $289,086.
3.2. Contributing Factors in Differentiating Classes
Based on the statistical details given in Section 2.3, the effect size per input variable and its corresponding Logworth values are calculated and shown in Table 4. Considering the values of LR Logworth, all the input variables are statistically significant classifiers of latent classes for the selected three-class model, with nature of the injury as the most influential factor in segmenting occupational incidents.
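As a small worked illustration of how Logworth values rank variables, the sketch below assumes the usual definition of Logworth as the negative base-10 logarithm of a test's p-value, so larger values mean stronger evidence. The p-values and variable names are hypothetical placeholders, not the values in Table 4.

```python
import math

def logworth(p_value):
    """Logworth of a p-value, assumed here to be -log10(p)."""
    return -math.log10(p_value)

# Hypothetical likelihood-ratio p-values for illustration only.
p_values = {
    "nature of injury": 1e-40,
    "cause of injury": 1e-25,
    "injured body part": 1e-18,
    "class code": 1e-4,
}

# Rank variables from most to least influential by Logworth.
ranking = sorted(p_values, key=lambda name: logworth(p_values[name]), reverse=True)
for name in ranking:
    print(f"{name}: {logworth(p_values[name]):.1f}")
```

On this scale a p-value of 0.01 corresponds to a Logworth of 2, so values well above 2 indicate a variable that separates the classes clearly.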
3.3. Characteristics of Latent Class Members
Class one is characterized by a very high probability (0.78) of strain or tear as the nature of the injury. The dominant type of injury is permanent partial disability (0.75), followed by temporary total or partial disability with a much lower probability of 0.24. Such injuries occurred in the lower back area (0.37), shoulders (0.29), and knees (0.11). The dominant causes of injuries in this class include lifting with a probability of 0.22 and strain with a probability of 0.16, followed by fall, slip, or trip (0.08), injury on ice or snow, twisting and repetitive motions (0.05). Class codes with the highest probability are chauffeurs and helpers (0.12), grain elevator operations (0.097), and gas and oil dealers (0.08). The specific probabilities of this class are shown in Table 5.
Class two consists of injuries with a 0.75 probability of permanent partial disability. This class is characterized by fracture and contusion as the nature of injury, with probabilities of 0.47 and 0.23, respectively. The most significant cause of injury is falling; fall from a different level (elevation) has the highest probability of 0.19, followed by slip or trip (0.10) and fall from ladder or scaffolding (0.09). Motor vehicle, falling or flying objects, and falling on snow or ice are less prevalent causes of injury in this class. Multiple body parts have a probability of 0.12, while the knees, ankles, and shoulders have an equal probability of 0.08. Hips, soft tissues, and the skull have the lowest probabilities of 0.06, 0.05, and 0.05, respectively. Class codes with the highest probability are chauffeurs and helpers (0.12), grain elevator operations (0.098), and hay grain or feed dealers (0.08). The specific probabilities of this class are shown in Table 6.
As shown in Table 7, class three is characterized by the nature of injury categories of all other specific injuries, amputation, laceration, fracture, burn, concussion, and crushing, affecting multiple body parts, the hand, lower leg, foot, fingers, and skull, and caused mainly by machine or machinery, vehicle upset, and being caught in, under, or between. Class three is different from the other two classes in that it is the only one that includes death with a non-negligible probability of 0.16. However, the probabilities of permanent partial disabilities (0.64) and temporary total or partial disabilities (0.16) are lower than in the other two classes. Class codes with the highest probability are grain elevator operations (0.11), hay grain or feed dealers (0.08), grain milling (0.06), and farm machinery operations (0.06). Injuries in multiple body parts have a probability of 0.30, with specific injuries having a probability of 0.20.
As discussed previously, all the same class codes are present in all classes with slightly different probabilities. However, looking at the mean total cost of claims for each class code within each class shows the noticeable differences depicted in Table 8.
3.4. Association of Class Membership and Injury Outcomes
Based on the data in Table 7, the financial risk is calculated for the expected losses of the workers’ compensation claims in the classes selected by the LCA model in the previous section. The financial risk is defined here as the frequency of losses (the number of incidents in each class) multiplied by the severity of losses (the mean of the total cost of claims incurred per class code in each class). Figure 1 shows the results of the financial risk calculation. This provides a simple framework for estimating future losses based on the historical data and the latent class analysis. As Figure 1 shows, the biggest claim costs between 2008 and 2016 were from chauffeurs or helpers, grain milling, and grain elevator operations in latent classes one, two, and three, respectively. The occupational injuries (or fatalities) among grain elevator operations in class three and the grain-milling class codes have the highest mean total claim cost compared to classes one and two.
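The frequency-times-severity calculation described above is simple enough to sketch directly. The claim counts, mean costs, and class code labels below are hypothetical placeholders chosen for illustration, not the figures behind Figure 1, and the helper name is invented.

```python
def expected_loss(frequency, mean_severity):
    """Financial risk as defined in the text: frequency of losses x severity of losses."""
    return frequency * mean_severity

# Hypothetical (number of claims, mean total claim cost in dollars)
# per (latent class, class code); for illustration only.
claims = {
    ("class 1", "code A"): (40, 200_000.0),
    ("class 1", "code B"): (25, 260_000.0),
    ("class 2", "code A"): (30, 280_000.0),
    ("class 3", "code B"): (15, 400_000.0),
}

risk = {key: expected_loss(n, mean_cost) for key, (n, mean_cost) in claims.items()}

# Largest expected loss overall.
top_key = max(risk, key=risk.get)
for key, value in sorted(risk.items(), key=lambda kv: kv[1], reverse=True):
    print(key, f"${value:,.0f}")
```

Because the measure multiplies frequency by severity, a class code with many moderate claims can rank above one with a few very costly claims, which is why both components are reported.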
3.5. ANOVA Test for Mean Total Claim Costs Per LCA
In addition, analysis of variance (ANOVA) is used to test whether the differences in average claim monetary values among the three latent classes are statistically significant. This analysis helps confirm that the differences in incident cost among the classes do not occur purely at random, but are due to underlying variables in each class. As discussed above, the mean total costs of claims are also different. Latent class one has the lowest mean of $205,583, while latent class three has the highest mean of $374,783. The mean total cost of claims for injuries in latent class two is $289,086. As shown in Table 9, an injury has a cost of $232,000 to $305,000 in class one, $225,000 to $312,000 in class two, and $233,000 to $335,000 in class three. According to Table 10, the p-value < 0.05 indicates that the pairwise differences in average cost among the classes are also statistically significant.
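For readers unfamiliar with the test, the one-way ANOVA F statistic can be computed by hand as the ratio of between-class to within-class mean squares. The sketch below uses a tiny hypothetical sample of claim costs (in thousands of dollars), not the study's data, and stops at the F statistic; obtaining the p-value additionally requires the F distribution with the returned degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df) for one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical claim costs in $1000s for three latent classes (illustration only).
class_costs = [
    [200.0, 210.0, 220.0],
    [280.0, 290.0, 300.0],
    [360.0, 370.0, 380.0],
]

f_stat, df_b, df_w = one_way_anova_f(class_costs)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F value means the class means differ by much more than the scatter within classes would explain, which is the pattern the test in this section is checking for.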
4. Discussion
The results of the present study suggest that occupational injuries in major agribusiness industries in the Midwest of the United States consist of segments characterized by distinct patterns in the nature of injury and in occupation classes. The insight gained through this study can be used to define a different categorization in the workers’ compensation field based on injury characteristics for severe injuries. This helps risk managers and safety professionals design and implement occupation-specific preventive measures and strategies to achieve the goal of fewer and less severe injuries. This work provides a basis for analyzing severe injuries in a high-hazard industrial environment. The results of this study have significant applications for safety practitioners. Reducing the total cost of risk is a major goal for risk managers, and for claim and safety professionals in any organization. The results of the study have significant implications for determining which ergonomic investments will have the greatest impact on a company’s losses. LCA modeling showed that the driving factors of loss include strain, tear, fracture, contusion, amputation, laceration, burn, concussion, and crushing when leading to permanent partial disabilities in the lower back area, shoulders, knees, soft tissues, hip, lower leg, ankle, skull, finger, foot, hand, and multiple body parts. In addition, such injuries created excessive costs when the injured workers were working in the grain elevator operations, grain milling, hay grain or feed dealers, chauffeurs or helpers, and gas or oil dealers class codes. The causes of those injuries were caught in, under, or between; vehicle upset; machine or machinery; falling or flying objects; motor vehicles; fall from ladder or scaffolding; fall, slip, or trip; fall from a different level (elevation); strain or injury by, pushing, pulling, or twisting; and injury on ice or snow.
As shown in Table 11, the average age of injured workers is 45 to 50 years old for all three classes. Even though age was not as important a variable in the prediction of severe injuries, the analysis shows that a higher age of workers imposes higher medical and indemnity costs on employers, employees, and insurance companies. This is consistent with prior research showing that hazards in the workplace may exacerbate age-related disorders. One limitation of the study is that the dataset does not provide any information about the medical history of the injured workers. Having access to prior records of injury per worker, which raises specific ethical issues, would provide more information about the high medical costs. This underlines the importance of ergonomics and health data collection in agribusiness industries to reduce the total cost of risk.
The LCA models help obtain more information about the most important characteristics in each cluster and the interaction between various variables. Overall, the nature of injury, cause of injury, and occupation are the classifiers that most differentiate the clusters.
Cause of injury is a preventable factor, as it exists in the workplace prior to incident occurrence, while the nature of injury is determined only after the incident occurs. Therefore, identifying causes of injuries is significant in reducing the likelihood and frequency of injuries, while identifying the nature of injuries can help in health care cost planning and management.
The claims with the highest costs were incurred on the injuries in cluster 3 with an average total incurred value of $375,000. Considering the causes of injuries in cluster 3, the main predicted causes were motor vehicle, crash of rail vehicle, vehicle upset, animal or insect, temperature extremes, slip or trip, electric current, hand tools (not powered), absorption/inhalation/ingestion, moving parts of machines, caught in/under/between, and struck or injured by.
The next large claims were incurred on injuries in cluster 2, with an average of $289,000. It was predicted that injuries in cluster 2 were caused by cold objects/substances, explosion or flare back, fall from elevation, objects being lifted or handled, from liquid or grease spills, and striking against or stepping on. Freezing was the predicted cause in both cluster 1 and 2 with an equal probability of occurrence.
Deriving the specific causes of injuries can direct the focus of prevention measures to decrease the chance of future incident occurrence by removing the sources of risks. Using the injury information from this study, safety and health training and educational programs can focus on the identified causes for high-cost injuries to decrease hazard exposures and reduce the probability and costs of potential occupational incidents. Risk management control alternatives can be employed including risk avoidance, loss prevention and reduction, setting standards for defining acceptable performance, comparing the actual results with the standards, and modifying actual results to comply with standards.
Considering injury nature, the predicted prevalent nature of injury in cluster 3 includes vision loss, hearing loss or impairment, strain or tear, puncture, asphyxiation, amputation, laceration, carpal tunnel syndrome, concussion, rupture, electric shock, and respiratory disorders with probability over 80%. The nature of injuries that were predicted to occur with more than 80% probability include contusion, dislocation, and fracture in cluster 2, and sprain or tear, and inflammation in cluster 1.
Such insight informs technical and managerial decisions about planning and executing risk management programs in agribusiness industries. Technical decisions answer the question of what action should be taken in which areas, while managerial decisions address the questions of who should take action, and how. A risk management program includes the stages of identifying risk exposures, measuring and estimating risk exposures, selecting risk mitigation strategies, and continuously evaluating the performance of those strategies.
5. Conclusions
Using extensive workers’ compensation claims data on injuries, the aim of this paper was to identify distinctive and meaningful classes of occupational incidents. Based on latent class analysis, three main classes were identified, each with its own injury details. The results from the analysis of variance confirmed that the differences in average incident cost among the classes do not occur purely at random and are due to underlying variables in each class. The occupational injury analysis carried out in this study can be repeated systematically each year to identify sources of safety risk, analyze the underlying causes of injuries, and decide on proper safety measures to avoid the occurrence of similar incidents.
The study has several limitations arising from the nature of the data. First, there is inconsistency in the data collection or recording processes. Not all incident reports included accurate information on the age and tenure of the injured workers, owing to wrong entries, missing data, or human error. Second, claims are recorded based on the injured workers’ information and general industries. Having access to detailed data about the injury history in specific industries could make the analysis more focused and useful. Finally, the data do not provide any information about working hours and days away from work. With access to such detailed information, the probability of future injuries could be calculated using alternative clustering analyses that fit numerical variables as well as categorical factors. Furthermore, added variables such as working hours and days away from work would help identify their potential associations and trends within each cluster. In addition, the availability of data on days away from work could help develop similar models to predict days away from work and indemnity costs in future research.
Although the focus of this study was on analyzing severe injuries in agribusiness industries, a similar approach is useful in analyzing and determining patterns of severe injuries in other manufacturing industries. In addition, this study highlights the value of ergonomic and health data collection and analysis. The results suggest that when specific medical and health information of the injured workers is available, quantitative analyses are reliable in estimating loss cost and addressing the bottlenecks in inflated claims. Future work can focus on studying the possibilities and tools for collecting ergonomic and health data for specific industries and occupations. The more detailed and reliable the available data, the more realistic, reliable, and applicable the quantitative analyses and models will be for application in injury prediction and reduction.