Article

Risk Factors for Musculoskeletal Disorders in Korean Farmers: Survey on Occupational Diseases in 2020 and 2022

1
Department of Applied Statistics, University of Suwon, Hwaseong 18323, Republic of Korea
2
Department of Occupational & Environmental Medicine, Wonjin Green Hospital, Seoul 02221, Republic of Korea
3
Department of Data Science, University of Suwon, Hwaseong 18323, Republic of Korea
*
Author to whom correspondence should be addressed.
Healthcare 2024, 12(20), 2026; https://doi.org/10.3390/healthcare12202026
Submission received: 3 September 2024 / Revised: 27 September 2024 / Accepted: 2 October 2024 / Published: 11 October 2024

Abstract

Background/Objectives: This study investigated factors influencing the prevalence of musculoskeletal disorders (MSDs) resulting from agricultural work, utilizing the 2020 and 2022 occupational disease survey data collected by the Rural Development Administration. The combined data from these years indicated a 6.02% prevalence of MSDs, reflecting a significant class imbalance in the binary response variable. This imbalance could lead to classifiers overlooking rare events, potentially inflating accuracy assessments. Methods: We evaluated five distinct models, comparing their performance on the original and synthetic data and across the methods used to generate the synthetic data. In the multivariate logistic model, we focused on the main effects of the covariates as there were no statistically significant second-order interactions. Results: Focusing on the random over-sampling examples (ROSE) method, gender, age, and pesticide use were particularly impactful. The odds of experiencing MSDs were 1.29 times higher for females than males. The odds increased with age: 2.66 times higher for those aged 50–59, 4.60 times higher for those aged 60–69, and 7.16 times higher for those aged 70 or older, compared to those under 50. Pesticide use was associated with 1.26 times higher odds of developing MSDs. Among body part usage variables, all except wrists and knees were significant. Farmers who frequently used their necks, arms, and waists showed 1.27, 1.11, and 1.23 times higher odds of developing MSDs, respectively. Conclusions: The accuracy of the raw method was high, but the ROSE method outperformed it for precision and F1 score, and both methods showed similar AUC.

1. Introduction

To understand the extent of diseases and injuries caused by agricultural work among Korean farmers, and to develop an agricultural safety and health policy to improve farmers’ quality of life, the Rural Development Administration (RDA) initiated the “survey of occupational diseases and injuries of farmers” in 2009, designating it as a nationally approved statistic [1]. This survey, which alternates between disease and injury assessments, is conducted every other year. Selected farmers are visited directly and interviewed about diseases or injuries related to agricultural work that occurred in the past year. Since the survey relied on farmers’ recollections, it took the form of a recall survey. While research on farmers’ injuries is relatively active, both domestically and internationally, studies focusing on diseases are relatively rare. In South Korea, research on farmers’ diseases has typically been limited to specific conditions, particular types of agriculture, or specific regions, with comprehensive studies on farmers’ diseases being nearly non-existent.
A literature review was conducted on occupational diseases among farmers, which highlighted conditions that are more prevalent in agriculture than in other occupational groups [2]. Several studies focused on musculoskeletal disorders (MSDs), which represent the largest proportion of diseases among farmers [3,4,5,6,7,8]. In Poland, data on diseases of agricultural workers were analyzed by extracting information from Polish industrial accident records from 2000 to 2004 [9]. Due to the comprehensive registration of all patient information in the administrative system, research in Poland can leverage the full datasets obtained through administrative reports. Similarly, compensation claims for farm injuries and diseases were analyzed using time-series data from Finnish farmers insured by industrial accident insurance over 26 years (1982–2008) [10]. MSDs have emerged as the most common diseases related to agricultural work among Korean farmers. According to survey data, the prevalence of MSDs was 5.18% in 2020 and increased to 6.72% in 2022. Despite being the most frequently reported condition, the overall prevalence of MSDs in farming populations remains relatively low. This study aimed to identify the risk factors associated with MSDs in agricultural work and to develop basic guidelines aimed at reducing the incidence of MSDs among farmers.
However, a significant disparity between the incidence of events and non-events, such as the prevalence of MSDs among Korean farmers, can lead to an overestimation of the accuracy of classifiers [11]. This occurs because rare events are often overlooked and misclassified as more common during the model estimation stage for the training dataset. Studies have shown that logistic regression models underestimate the conditional probability of rare events [12]. Similarly, linear discriminant analysis (LDA) often relies on the dispersion of the prevalent class, as the common covariance matrix is estimated as a weighted average of the covariance matrices of each class [13]. Both parametric methods, such as logistic regression models and LDA, and more flexible non-parametric methods, such as decision trees and support vector machines, struggle with response variables that have a skewed distribution and optimize objective functions, such as accuracy [3,14]. To address the issue of imbalanced classification, focusing on the model estimation stage has been found to be an effective approach. In an attempt to mitigate this imbalance [15,16], techniques such as “under-sampling” and “over-sampling” have been proposed to create more balanced samples between the two classes [17]. Under-sampling involves extracting samples from each category without replacement so that the number of samples is the same as that of the least frequent category. The least frequent category remains unchanged, and under-sampling is only performed on the remaining categories. In contrast, over-sampling involves extracting samples using the replacement sampling method to maintain equal numbers of samples in each category. The original data remain unchanged, and over-sampling is only performed on the remaining categories, excluding the most frequent one. However, under-sampling may ignore useful samples, while over-sampling may lead to duplicated samples and overfitting [18]. 
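The two basic resampling schemes described above can be illustrated with a minimal Python sketch. The function names and the toy label vector (6 events among 100 samples) are hypothetical and are not taken from the survey data or from any package used in this study.

```python
import random
from collections import Counter

def group_indices(labels):
    """Map each class label to the list of sample indices carrying it."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    return groups

def random_undersample(labels, rng):
    """Cut every class down to the rarest class size, sampling without replacement."""
    groups = group_indices(labels)
    target = min(len(idxs) for idxs in groups.values())
    kept = []
    for idxs in groups.values():
        # The rarest class is kept whole; larger classes are thinned.
        kept.extend(idxs if len(idxs) == target else rng.sample(idxs, target))
    return sorted(kept)

def random_oversample(labels, rng):
    """Top every class up to the largest class size, sampling with replacement."""
    groups = group_indices(labels)
    target = max(len(idxs) for idxs in groups.values())
    kept = []
    for idxs in groups.values():
        # All original rows are kept; the extra rows are randomly drawn duplicates.
        extra = [rng.choice(idxs) for _ in range(target - len(idxs))]
        kept.extend(idxs + extra)
    return sorted(kept)

# Hypothetical toy labels: 6 rare events (1) among 100 samples.
labels = [1] * 6 + [0] * 94
rng = random.Random(0)
under_counts = Counter(labels[i] for i in random_undersample(labels, rng))
over_counts = Counter(labels[i] for i in random_oversample(labels, rng))
print(under_counts, over_counts)
```

In this toy case under-sampling discards 88 of the 94 majority rows, while over-sampling repeats each minority row many times, which mirrors the information loss and duplication issues noted above.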
To overcome these challenges, it has been suggested that noise be added with a normal distribution to rare event samples [19] and samples be randomly selected from the line segment connecting the nearest sample in feature space and the rare event sample, known as the synthetic minority over-sampling technique (SMOTE) [20]. The adaptive synthetic (ADASYN) sampling approach [21] dynamically adjusts the number of synthetic samples created based on the difficulty of classifying rare event samples. The more difficult a sample is to classify, the more synthetic samples are generated around it. This is determined by the density of the nearest neighbors of the rare event. A strategy to lessen the effect of class imbalance in a categorical response variable on model estimates and model evaluations was provided with the random over-sampling examples (ROSE) approach [11], which is based on the smoothing bootstrap methodology [22]. Unlike methods such as SMOTE or ADASYN, ROSE applies random perturbations to both rare and common event samples. This makes it a more general over-sampling technique that enhances the diversity of the data rather than focusing exclusively on rare event boundaries. Meanwhile, more recent variants such as borderline-SMOTE and K-means SMOTE have been developed to generate more sophisticated borderline samples around minority classes or to address noise issues [23]. More recently, research on loss functions and model structures has been conducted. Focal loss, which learns from unbalanced data by underweighting the prevalent event class and overweighting the rare event class, was the focus of [24]. In contrast, generative adversarial networks, which provide an effective method for learning features from rare event classes, were studied by [25].
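As a rough illustration of the SMOTE idea mentioned above, the following Python sketch places one synthetic point on the segment between a rare event sample and one of its nearest rare event neighbours. The helper name `smote_point`, the choice of `k`, and the four toy points are assumptions made for illustration only; this is not the implementation used in the cited works.

```python
import math
import random

def smote_point(sample, minority, k, rng):
    """Return one synthetic point between `sample` and one of its k nearest minority neighbours."""
    others = [m for m in minority if m is not sample]
    others.sort(key=lambda m: math.dist(sample, m))  # nearest neighbours first
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # position along the segment, in [0, 1)
    return [s + gap * (nb - s) for s, nb in zip(sample, neighbour)]

# Hypothetical rare event samples with two numeric predictors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
rng = random.Random(1)
new_point = smote_point(minority[0], minority, k=2, rng=rng)
print(new_point)
```

Because the new point always lies between two existing rare event samples, SMOTE only fills in the region already occupied by the minority class, whereas ROSE perturbs samples from both classes.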
In this study, due to the imbalance in the distribution of the response variables related to the prevalence of MSDs in agricultural work, a resampling method such as ROSE, which includes both under- and over-sampling, was employed to generate synthetic or augmented data. Among various statistical analysis approaches, a parametric additive logistic regression model was applied to identify the risk factors influencing the prevalence of MSDs among Korean farmers.
The remainder of this article is structured as follows. Section 2 provides a brief introduction to the ROSE algorithm [11] and the R package implementation [26], outlines the data collection procedure, and introduces the response variable and predictor variables used in this study. Section 3 presents the results of the association test between the presence of MSDs related to agricultural work and the predictor variables, along with the outcomes of the logistic regression model based on both original and synthetic data. Additionally, the sensitivity, precision, F1 score, and accuracy are compared across different threshold values to evaluate the logistic regression models. Finally, Section 4 offers guidelines for reducing the prevalence of MSDs among farmers, focusing on sociodemographic factors and the frequency of body part usage; it also discusses the limitations of this study.

2. Materials and Methods

2.1. Study Design and Settings

In this study, we used survey data on occupational diseases among farmers collected by the National Academy of Agricultural Sciences under the RDA in 2020 and 2022 (data source: https://kosis.kr/index/index.do, accessed on 25 April 2023). The dataset consisted of 14,075 participants in 2020 and 16,473 participants in 2022. After excluding 1205 individuals whose primary type of agriculture was not classified, 29,343 participants were selected for the analysis. To investigate the factors influencing the prevalence of MSDs among farmers, we considered study variables describing the characteristics of household members, the characteristics of farmers, and exposure to risk factors.
Household characteristics included sex, age, duration of agricultural activity in the past year, and pesticide use. The past year refers to 2019 in the 2020 survey and 2021 in the 2022 survey. Age was categorized as “less than 50”, “50 to 59”, “60 to 69”, and “70 or older”, while the duration of farming activity was classified as “less than 6 months” or “6 months or more”. Pesticide use was defined as “yes” if pesticides were used directly or indirectly, and “no” otherwise. The farmers’ characteristics considered in this study included primary types of agriculture and household income. The types of agriculture were grouped into five categories: “rice”, “dry field”, “orchards”, “greenhouses”, and “livestock”. Household income was divided into four categories: “less than US$3800”, “US$3800–US$14,999”, “US$15,000–US$37,999”, and “US$38,000 or more”.
The characteristics of risk factor exposure included whether participants’ body parts were in uncomfortable postures while performing agricultural work. Specifically, this encompassed whether the neck was bent or hyper-extended, if the arms were raised above shoulder level, whether fingers or wrists were used repeatedly, whether there was a posture where the waist was bent or twisted to the side, and whether postures such as squatting or bending the knees were assumed. Additionally, we assessed whether participants engaged in work involving lifting objects weighing 10–19 kg or more than 20 kg. Responses of always and frequently were classified as “yes”, while responses of sometimes or rarely were classified as “no”. The outcome variable was defined as “yes” if the individual had experienced at least one MSD either caused or worsened by agricultural work over the past year and “no” if no such condition was reported.

2.2. Statistical Methods

2.2.1. Review on the ROSE Resampling Strategy

$T_n$ was used as a training set consisting of $n$ samples. Here, $T_n$ represented a collection of realizations for pairs $(Y, \mathbf{x})$, with $Y$ as a categorical response variable and $\mathbf{x}$ as a vector of predictors $(x_1, x_2, \ldots, x_d)$. Specifically, $T_n = \{(\mathbf{x}_i, y_i),\ i = 1, 2, \ldots, n\}$. Here, $\mathbf{x}_i \in \mathbb{R}^d$ followed an unknown probability density function $f(\mathbf{x})$. For convenience, the category labels of $Y$ were denoted with $J$ labels $L_1, L_2, \ldots, L_J$. Additionally, $n_j$ (where $n_j < n$) represented the number of samples corresponding to the category label $L_j$ for $j = 1, 2, \ldots, J$, such that $n = \sum_{j=1}^{J} n_j$. The following steps were required to generate a synthetic training set:
Step 1: $y^*$ was selected with probability $\pi_j = 1/J$. Let $y^* = L_j$.
Step 2: One sample from the training dataset $T_n$ was randomly selected among individuals whose labels were $y^*$, with a probability of $p_j = 1/n_j$. That is, $(\mathbf{x}_i, y_i)$, where $y_i = y^*$.
Step 3: Synthetic samples were generated by using the smooth kernel estimation method around the feature space of the extracted training samples $(\mathbf{x}_i, y_i)$. Specifically, $(\mathbf{x}^*, y^*)$, where $\mathbf{x}^*$ was randomly chosen from a normal distribution centered at $\mathbf{x}_i$ with a covariance matrix $H_j$ (i.e., using a Gaussian kernel). At this stage, the smooth diagonal matrix was defined as $H_j = \mathrm{diag}\left(h_1^{(j)}, h_2^{(j)}, \ldots, h_d^{(j)}\right)$ with
$$h_q^{(j)} = \left(\frac{4}{(d+2)\,n}\right)^{1/(d+4)} \hat{\sigma}_q^{(j)}, \quad q = 1, 2, \ldots, d,$$
where $\hat{\sigma}_q^{(j)}$ represented the sample standard deviation of the $q$-th predictor of training samples belonging to the class label $L_j$ [27].
Step 4: The process outlined in Steps 1–3 was iteratively performed until there was a nearly equivalent number of synthetic samples in each category and the overall count of synthetic samples closely matched the number of samples in $T_n$. Finally, we set $T_m^*$ representing the synthetic training set.
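The four steps above can be sketched in a few lines of Python. This is a simplified illustration for numeric predictors only, not the R package code: the function name `rose_sample` and the toy data are hypothetical, and each bandwidth is treated as the standard deviation of the Gaussian perturbation, which is an assumption of this sketch.

```python
import random
import statistics

def rose_sample(X, y, m, rng):
    """Draw m synthetic (x*, y*) pairs following Steps 1-4 (numeric predictors only)."""
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    rows = {c: [x for x, lab in zip(X, y) if lab == c] for c in classes}
    # Constant shared by all predictors: (4 / ((d + 2) n))^(1 / (d + 4)).
    const = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
    # Bandwidth per class and predictor: constant times the class-wise sample SD.
    h = {c: [const * statistics.stdev([r[q] for r in rows[c]]) for q in range(d)]
         for c in classes}
    X_new, y_new = [], []
    for _ in range(m):                      # Step 4: repeat until m samples are drawn
        c = rng.choice(classes)             # Step 1: class chosen with probability 1/J
        centre = rng.choice(rows[c])        # Step 2: one sample of that class, probability 1/n_j
        # Step 3: Gaussian perturbation around the chosen sample
        # (assumption: the bandwidth is used as the perturbation's standard deviation).
        X_new.append([rng.gauss(centre[q], h[c][q]) for q in range(d)])
        y_new.append(c)
    return X_new, y_new

# Hypothetical toy data: 100 samples, 2 numeric predictors, 6 rare events.
X = [[float(i), float(i % 7)] for i in range(100)]
y = [1 if i < 6 else 0 for i in range(100)]
X_syn, y_syn = rose_sample(X, y, m=1000, rng=random.Random(0))
print(sum(y_syn), len(y_syn) - sum(y_syn))
```

Because the class is drawn first with equal probability, the synthetic set is roughly balanced regardless of how rare the event is in the original data. Categorical predictors, which dominate the survey variables, would need separate handling that this sketch does not attempt.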
For model evaluation, the R package ROSE integrated the leave-K-out cross-validation (LKOCV) and bootstrap methods, replacing the resubstitution and validation set methods. The resubstitution method may lead to an overestimation of the model accuracy, while the validation set method may exhibit high variability due to random division. The LKOCV method was preferred because, similar to the validation set method, it involved random division but showed less variability. However, the LKOCV method required a longer computation time as it involves repeated model estimation, unlike the validation set or holdout methods.
Step 1: The training dataset $T_n$ was randomly divided into $Q = n/K$ folds, denoted as $T_K^1, T_K^2, \ldots, T_K^Q$, where each fold contained $K$ samples. In cases where $n$ was not a multiple of $K$, $Q - 1$ folds had the same number of samples, and the remaining fold consisted of the leftover samples.
Step 2: One fold out of the $Q$ folds (denoted as $T_K^i$) was chosen and the remaining $Q - 1$ folds were grouped (denoted as $T_n^{-i} = T_n \setminus T_K^i$).
Step 3: The ROSE method was used by setting $T_n^{-i}$ as the training set, and a synthetic training set from $T_n^{-i}$ (denoted as $T_m^{-i}$) was generated according to the procedure described in Section 2.2.1.
Step 4: The model was estimated using the synthetic training set $T_m^{-i}$ and predictions for the samples in the holdout set $T_K^i$ were obtained (denoted as $P_K^i$).
Step 5: For each of the $Q$ folds, Steps 2 through 4 were repeated to generate predictions for the samples in each fold (i.e., $P_K^1, P_K^2, \ldots, P_K^Q$). The accuracy was calculated based on these predictions.
Therefore, the accuracy of the LKOCV method was defined as:
$$\frac{1}{n} \sum_{q=1}^{Q} \sum_{j \in T_K^q} I\left(y_j = P_{K,j}^q\right),$$
where $P_{K,j}^q$ represented the predicted value for the $j$-th sample in the set $T_K^q$ $(q = 1, 2, \ldots, Q;\ j = 1, 2, \ldots, K)$.
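The leave-K-out procedure and its accuracy formula can be sketched as follows. The `resample` and `fit_predict` arguments are hypothetical placeholders for the ROSE generator and the classifier; in the usage example they are replaced by an identity resampler and a majority-class predictor purely so that the sketch runs on its own.

```python
import random

def leave_k_out_accuracy(X, y, K, fit_predict, resample, rng):
    """Accuracy = (1/n) * number of correct hold-out predictions over all folds."""
    n = len(y)
    order = list(range(n))
    rng.shuffle(order)                                   # Step 1: random division
    folds = [order[s:s + K] for s in range(0, n, K)]     # the last fold keeps the leftovers
    correct = 0
    for hold in folds:                                   # Steps 2 and 5: each fold held out once
        hold_set = set(hold)
        train = [i for i in order if i not in hold_set]
        # Step 3: resample the training part only, never the hold-out fold.
        X_syn, y_syn = resample([X[i] for i in train], [y[i] for i in train], rng)
        # Step 4: fit on the synthetic set and predict the hold-out samples.
        preds = fit_predict(X_syn, y_syn, [X[i] for i in hold])
        correct += sum(int(p == y[i]) for p, i in zip(preds, hold))
    return correct / n

def identity_resample(X_train, y_train, rng):
    """Stand-in for a resampler: returns the training data unchanged."""
    return X_train, y_train

def majority_fit_predict(X_train, y_train, X_test):
    """Stand-in classifier: always predicts the most frequent training label."""
    top = max(set(y_train), key=y_train.count)
    return [top] * len(X_test)

# Hypothetical toy data: 6 rare events among 100 samples.
X = [[float(i)] for i in range(100)]
y = [1 if i < 6 else 0 for i in range(100)]
acc = leave_k_out_accuracy(X, y, 10, majority_fit_predict, identity_resample, random.Random(0))
print(acc)  # 0.94: every non-event is correct and every event is missed
```

The key design point is that resampling happens inside the loop, on the training folds only, so the hold-out fold keeps the original class proportions.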
While the error rate (or accuracy) is a commonly used measure for model evaluation, it has limitations, particularly in the context of imbalanced data. In such cases, models may exhibit a bias towards the majority class, leading to an overestimation of accuracy (or an underestimation of the error rate) [28]. For example, in a dataset with a rare event rate of 1%, a “naive” classifier that predicts all samples as the majority class would yield an error rate of just 1%, thus underestimating the true error rate.
This underscores the importance of selecting appropriate evaluation metrics when dealing with unbalanced data. Unlike the error rate, which emphasizes false positives and false negatives as shown in the confusion matrix, metrics such as precision and recall focus on true positives. Precision is defined as the proportion of true positives among all samples predicted as positive, while recall (or sensitivity) is the proportion of actual positive samples correctly identified.
In addition, the F1 score, which is the harmonic mean of precision and recall, is often used to provide a balanced measure of a model’s performance [29]. Another widely used metric is the receiver operating characteristic (ROC) curve, which plots the true-positive rate against the false-positive rate at various threshold settings. The area under the curve (AUC) represents the area under the ROC curve, with values closer to 1 indicating a better-performing model. These measures, particularly in the context of imbalanced data, offer a more profound understanding of model performance than simple accuracy or error rates.
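To make the metric definitions concrete, the sketch below computes accuracy, recall, precision, and F1 score from confusion-matrix counts and applies them to the "naive" classifier discussed above. The sample size of 1000 with 10 rare events is a hypothetical choice matching the 1% example; undefined ratios (zero denominators) are returned as `None`.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, recall, precision, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) > 0 else None       # undefined without actual positives
    precision = tp / (tp + fp) if (tp + fp) > 0 else None    # undefined without predicted positives
    if precision is None or recall is None:
        f1 = None
    elif precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}

# Hypothetical 1% event rate: 10 events among 1000 samples, all predicted as non-events.
naive = classification_metrics(tp=0, fp=0, fn=10, tn=990)
print(naive)
```

The naive classifier reaches 99% accuracy while its recall is 0 and its precision and F1 score are undefined, which is why accuracy alone is a poor guide for rare events.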

2.2.2. Statistical Analysis

Pearson’s large-sample chi-square test was used to determine if the study variables were marginally associated with MSDs. To further investigate factors influencing the prevalence of MSDs, a multivariate logistic regression analysis was conducted. Given the imbalance in the dataset, where the number of subjects with MSDs was significantly lower than those without, resampling methods [11,26] were employed. These methods generated synthetic or artificial samples, balancing the number of subjects with and without MSDs, followed by logistic regression analysis on these balanced samples. The performance of the proposed regression models was evaluated using various metrics, such as Nagelkerke’s R-squared statistic [30], precision, recall, F1 score, accuracy, and AUC. These metrics offer a comprehensive comparison of the models’ predictive capabilities regarding the prevalence of MSDs.
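For readers unfamiliar with Nagelkerke's R-squared, the following sketch shows how it is usually computed from the log-likelihoods of the intercept-only and fitted models. The null log-likelihood is derived from the event counts reported in Section 3 (1765 with and 27,578 without MSDs); the fitted log-likelihoods used below are limiting cases chosen for illustration, not results from this study.

```python
import math

def nagelkerke_r2(loglik_null, loglik_model, n):
    """Cox-Snell R-squared rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    upper_bound = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / upper_bound

# Intercept-only log-likelihood from the reported class counts.
n_event, n_nonevent = 1765, 27578
n = n_event + n_nonevent
p = n_event / n
loglik_null = n_event * math.log(p) + n_nonevent * math.log(1.0 - p)

# Limiting cases (illustrative only): no improvement, and a perfect fit.
r2_no_gain = nagelkerke_r2(loglik_null, loglik_null, n)
r2_perfect = nagelkerke_r2(loglik_null, 0.0, n)
print(round(100 * p, 2), r2_no_gain, r2_perfect)
```

The rescaling makes the statistic range from 0 (no better than the intercept-only model) to 1 (perfect fit), which is what allows the comparison across models fitted to differently sized samples.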

3. Results

Table 1 shows the contingency tables and association tests between each study variable and MSDs. MSDs were rare, with an estimated prevalence of 6.02%. Prevalence was strongly associated with sex, age, and household income. Among the body parts exposed to risks during agricultural work, the neck, arms, waist, and knees, but not the wrists, were substantially associated with MSDs. Lifting objects weighing 10 to 19 kg or over 20 kg was closely associated with the presence of MSDs. The duration of farming activity, type of farming, and pesticide use showed only a weak association with MSDs.
Table 2 presents the point estimates and 95% confidence intervals (CIs) for the odds ratios related to the prevalence of MSDs, while Table 3 presents the estimated performance measures for various models. In this study, we evaluated five distinct models to assess their comparative performance using both original and synthetic data. We also examined the performance of the models based on the method of synthetic data generation. In the multivariate logistic model, we focused on the main effects of the covariates, as there were no statistically significant second-order interactions. The “raw” column shows the regression coefficients derived from the original dataset, which included 29,343 individuals (1765 with and 27,578 without MSDs). In contrast, the “synthetic” column encompasses the results from four resampling methods: “under”, “over”, “both”, and “ROSE”, which were applied to generate balanced datasets with nearly equal numbers of individuals with and without MSDs.
In the “under” method, the number of individuals without MSDs was reduced to match those with MSDs (3529 individuals: 1765 with and 1764 without MSDs). The “over” method increased the number of individuals with MSDs to match the number of those without MSDs, resulting in 55,263 individuals (27,685 with and 27,578 without MSDs). The “both” and “ROSE” methods involved under- and over-sampling, resulting in 29,343 synthetic individuals each. For the “both” and “ROSE” methods, the number of samples from each class (with and without MSDs) was set to be equal, with sampling probabilities set to 0.5, to optimize model performance. This choice was supported by the higher precision, recall, and AUC in unreported results.
Goodness-of-fit measures were calculated by applying the models to the entire set of 29,343 individuals, using a threshold value of 0.5 for classification. Figure 1 displays the recall (sensitivity), precision, F1 score, and accuracy of each model at different thresholds, increasing by 0.05 from 0.1 to 0.9. The ROC curves for the five models indicated that the AUCs were similar across the models (Figure 2 and Table 3).
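The threshold sweep behind Figure 1 can be sketched as below. The predicted probabilities and true labels are hypothetical, and classifying a sample as positive when its probability is greater than or equal to the threshold is an assumption of this sketch; the source does not state how ties were handled.

```python
def sweep_thresholds(probs, truth, thresholds):
    """Return (threshold, recall, precision) for each classification threshold."""
    rows = []
    for t in thresholds:
        pred = [int(p >= t) for p in probs]  # assumption: ties count as positive
        tp = sum(1 for pr, tr in zip(pred, truth) if pr == 1 and tr == 1)
        fp = sum(1 for pr, tr in zip(pred, truth) if pr == 1 and tr == 0)
        fn = sum(1 for pr, tr in zip(pred, truth) if pr == 0 and tr == 1)
        recall = tp / (tp + fn) if (tp + fn) > 0 else None
        precision = tp / (tp + fp) if (tp + fp) > 0 else None
        rows.append((t, recall, precision))
    return rows

# Thresholds from 0.1 to 0.9 in steps of 0.05, as in Figure 1.
thresholds = [round(0.10 + 0.05 * i, 2) for i in range(17)]

# Hypothetical predicted probabilities and true labels.
probs = [0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95]
truth = [0, 0, 1, 0, 1, 1, 1]
results = sweep_thresholds(probs, truth, thresholds)
for t, recall, precision in results:
    print(t, recall, precision)
```

Raising the threshold can only remove predicted positives, so recall never increases along the sweep, while precision typically rises; the F1 score summarizes that trade-off at each threshold.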
The models, generated using synthetic data, were comparable across all four performance measures. However, despite demonstrating high accuracy, the raw model had a recall of 0 and an undefined precision and F1 score. The estimated probability of MSDs in the raw model for each of the 29,343 individuals was lower than the threshold value of 0.5, resulting in all individuals being classified as not having MSD. This highlights the misleading nature of the high accuracy of the raw model, which stems from the fact that the majority group (without MSDs) was correctly classified, but no cases of MSD were identified.
The models in the synthetic column outperformed the raw model on Nagelkerke’s R-squared statistic, which was approximately two times higher. Among the resampling methods, no significant difference was observed in model performance. However, the “under” sampling method yielded the highest Nagelkerke’s R-squared statistic value. The “over” and “both” sampling methods achieved the highest sensitivity, while the “both” sampling method showed the highest precision and F1 score. Interestingly, the “ROSE” method demonstrated the highest accuracy. Most variables were highly significant (p-value < 0.01) in the ROSE column, except for household income (p-value = 0.077), wrist (p-value = 0.039), and knee (p-value = 0.159). Specifically, females showed 1.29 times higher odds of experiencing MSDs compared to males. The odds increased with age: 2.66 times higher for those aged 50–59, 4.60 times higher for those aged 60–69, and 7.16 times higher for those aged 70 or older, compared to those under 50. Farmers with less than 6 months of agricultural activities had 1.25 times greater odds of developing MSDs than those with 6 months or more. Pesticide use was associated with 1.26 times higher odds of developing MSDs.
For different farming types, the odds of MSD occurrence were 1.15 times higher for greenhouse farming than for rice farming. In contrast, livestock, dry field, and orchard farming had odds ratios of 0.97, 0.90, and 0.75, respectively, relative to rice farming, indicating lower odds of MSDs. Households with incomes ranging from USD 3800 to USD 14,999 and USD 15,000 to USD 37,999 had odds ratios of 0.96 and 0.92, respectively, compared to households with incomes less than USD 3800. Finally, farmers who frequently used their necks, arms, and waists demonstrated 1.27, 1.11, and 1.23 times higher odds of developing MSDs, respectively. Lifting objects weighing more than 20 kg was associated with 1.14 times higher odds of developing MSDs, while, paradoxically, farmers who rarely lifted objects of 10–19 kg had 1.19 times higher odds of MSDs than those who frequently lifted them.
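The odds ratios reported above are exponentiated logistic regression coefficients. The sketch below shows that conversion together with a Wald-type 95% confidence interval. The coefficients are back-calculated from the reported odds ratios, and the standard error of 0.05 is a hypothetical value used only to make the example run; it is not taken from Table 2.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Return the odds ratio exp(beta) and its Wald interval exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Coefficient implied by the reported odds ratio for females versus males.
beta_female = math.log(1.29)
or_female, ci_low, ci_high = odds_ratio_with_ci(beta_female, se=0.05)  # se is hypothetical

# An odds ratio below 1 (orchard versus rice farming) corresponds to a negative coefficient.
beta_orchard = math.log(0.75)
print(round(or_female, 2), round(ci_low, 2), round(ci_high, 2), round(beta_orchard, 3))
```

An odds ratio above 1 therefore reflects a positive coefficient and higher odds than the reference category, while a value such as 0.75 reflects a negative coefficient and odds that are 25% lower.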

4. Discussion

In this study, we analyzed the factors influencing the prevalence of MSDs among agricultural workers, using data from the 2020 and 2022 occupational disease surveys conducted by the RDA. The combined data from 2020 and 2022 revealed a 6.02% prevalence of MSDs associated with agricultural work, indicating a significant class imbalance in the binary response variable. To address this class imbalance, we applied resampling methods [11,26] to improve the accuracy of our model estimation. The following trends were identified by analyzing the prevalence of MSDs in farmers without considering other risk factors. Farmers in their 60s and 70s exhibited prevalence rates that were 3.8% and 6.3% higher, respectively, than those in their 50s, with a steady increase in prevalence with age. This finding has been commonly observed in research related to MSDs in agricultural work [8,31]. The mechanism of MSDs associated with agricultural work is attributed to the exposure to cumulative ergonomic risk factors. Aging exacerbates this phenomenon by increasing the duration of exposure and reducing physical capacity, thereby leading to a higher prevalence of MSDs [32]. Farmers routinely work beyond the standard retirement age, often farming at the age of 70 or more. MSDs related to work exposure are enhanced by age-related musculoskeletal changes, increasing the chances of having high-severity MSDs [33].
The prevalence of MSDs was 2% higher in women than in men, primarily because of their physical sensitivity to ergonomic risk factors, such as handling heavy loads and the repetitive nature of work [34]. Beyond physical conditions, socioeconomic characteristics such as lower household income and single-person households contribute to the reduced use of agricultural machinery, lower healthcare access, and an increased burden of domestic chores. These factors further increase the risk of MSDs [35]. Although the MSD prevalence did not differ significantly by farming type, it was higher among those engaged in rice paddy, field, or greenhouse farming than those involved in livestock or orchard farming. There was little variation in the level of exposure to ergonomic risk factors across farming types [36]. Additionally, because many agricultural workers grew multiple types of crops, there was a minimal difference in the prevalence of MSDs by the type of farming. However, in livestock and orchards, the relatively younger age of agricultural workers and greater use of agricultural machinery contributed to a comparatively lower prevalence of MSDs.
Farmers earning less than USD 3800 had a 2.5% higher prevalence rate of MSDs compared to those with incomes exceeding USD 38,000. To prevent MSDs, it is essential to use agricultural machinery and convenient equipment to reduce exposure to ergonomic risk factors. However, in low-income countries, making preventative investments is difficult. Additionally, limited income restricts access to healthcare services, impeding early management of MSDs. Furthermore, limitations in agricultural work performance due to MSDs lead to reduced productivity, creating a vicious cycle in which decreased income results in fewer resources for prevention and management [37].
The waist showed the most significant difference in prevalence rates among body parts. Farmers who frequently bent or twisted their waists had a 1.5% higher prevalence of MSDs compared to those who did not. The prevalence of low back pain (LBP) among agricultural workers is approximately 50%, significantly higher than that among industrial workers, who have a prevalence of approximately 37% [38]. The annual incidence of chronic LBP (lasting more than three months) is approximately 10% [39]. The prevalence of musculoskeletal pain among Korean farmers is exceptionally high at 97.2%, with LBP accounting for 58.7% of the cases [40]. Various occupational risk factors contribute to low back disorders, key risk factors of which include handling heavy loads and maintaining a bent back posture during work [41]. Despite advancements in agricultural mechanization in South Korea, many agricultural tasks are still performed manually. In rice paddies, fields, and greenhouse farming, the height of crops is typically below waist level, leading to most tasks being performed in a bent-over posture. Additionally, the lack of lightweight agricultural materials results in frequent handling of heavy items, contributing to the high prevalence of MSDs in the lumbar region [42,43].
Compared to those who rarely lifted heavy objects, the prevalence rate of MSDs was 1.5% higher among farmers who regularly lifted objects weighing less than 20 kg, and 0.9% higher for those lifting objects weighing more than 20 kg. The incidence of MSDs in the lumbar region among agricultural workers increased significantly when regularly lifting weights of approximately 20 kg or higher [44]. Many tasks performed by agricultural workers involve postures that can induce MSDs in the lumbar region, with lifting heavy objects identified as the most hazardous posture [45].
Agriculture in South Korea is dominated by small-scale self-employed farmers. Unlike other industries, the agricultural sector has been slow to adopt automation and mechanization. Additionally, the rural population in South Korea is aging at a rate faster than the urban population [46]. These socioeconomic characteristics likely contribute to the risk factors for MSDs in Korean farmers. Similar trends have been observed in studies of MSDs in Southeast Asian farmers, who share similar agricultural characteristics with Korea. A systematic review of MSD studies in 11 Southeast Asian countries revealed common risk factors, including increasing age, female gender, and heavy lifting [3]. A meta-analysis of 64 MSD studies from 23 low- and middle-income countries found that lumbar spine disorders were the most common, with heavy lifting being a major risk factor [47].
To prevent MSDs, measures should be taken to address risk factors. Currently, a national pilot program in South Korea provides specialized health screenings for some female farmers that include musculoskeletal disease screening. This program should be expanded to include all women farmers and be linked not only to screening but also to education and early treatment for MSDs. Social security for low-income farmers should be strengthened. Reductions in insurance fees for low-income farmers are needed for agricultural workers’ compensation insurance, and coverage for MSDs should be strengthened. Additionally, reducing the volume and weight of fertilizers, pesticides, and other materials used during farming operations and agricultural products to less than 10 kg can help reduce the burden of heavy lifting. The development of agricultural machinery and material handling equipment that can be easily used by older farmers should be prioritized, especially in paddy, field, and greenhouse farming.
Among the resampling methods used to address class imbalance, the over-sampling method resulted in the largest sample size, while the under-sampling method had the smallest sample size. Despite the differences in the training samples, the model evaluation metrics remained consistent across the methods. However, the ROSE method demonstrated significantly different performance compared to the raw method. The raw method classified all samples as “no MSDs”, resulting in undefined precision and F1 score. Although the raw method had higher accuracy, the ROSE method outperformed it for recall, precision, and F1 scores, with both methods showing similar AUC.
In the ROSE method, the key sociodemographic variables, namely sex, age, farming type, household income, farming activity period, and pesticide use, were statistically significant. Among these factors, sex, age, and pesticide use were particularly important. The prevalence of MSDs was higher among women, farmers who used pesticides (directly or indirectly), and older age groups (in descending order: 70s or older, 60s, 50s, and under 50). Among the body part usage variables, all except the wrists and knees were significant, with the neck, waist, and lifting objects weighing less than 20 kg being highly significant. The generalized variance inflation factor (GVIF) was calculated to ascertain the presence of multicollinearity among the covariates [48]. The GVIF values for all variables, including age, were approximately 1 and were consistent across the five evaluated models, indicating the absence of collinearity among the covariates.
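As an illustration of the collinearity check, the sketch below computes the GVIF of a multi-column term as det(R11)·det(R22)/det(R), where R is the correlation matrix of all regressors, R11 the block for the term of interest, and R22 the block for the remaining regressors, following the definition of Fox and Monette [48]. The data are simulated and hypothetical (a four-level age factor and an independent binary pesticide indicator), and the function name `gvif` is an assumption for this example; it is not the authors' code.

```python
import numpy as np


def gvif(X: np.ndarray, term_cols: list) -> float:
    """Generalized VIF for one model term.

    X holds the regressors without an intercept column; term_cols lists the
    column indices belonging to the term (e.g. all dummies of one factor).
    At least one column must lie outside term_cols.
    """
    R = np.corrcoef(X, rowvar=False)        # correlation matrix of all regressors
    other = [j for j in range(X.shape[1]) if j not in term_cols]
    R11 = R[np.ix_(term_cols, term_cols)]   # block for the term of interest
    R22 = R[np.ix_(other, other)]           # block for the remaining regressors
    return float(np.linalg.det(R11) * np.linalg.det(R22) / np.linalg.det(R))


# Simulated, hypothetical data (not the survey data).
rng = np.random.default_rng(0)
n = 5000
age_group = rng.integers(0, 4, size=n)
age_dummies = (age_group[:, None] == np.array([1, 2, 3])).astype(float)
pesticide = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([age_dummies, pesticide])

gvif_age = gvif(X, [0, 1, 2])                # the age term has df = 3
gvif_age_scaled = gvif_age ** (1 / (2 * 3))  # GVIF^(1/(2*df)), comparable across terms
print(gvif_age, gvif_age_scaled)
```

Because the simulated covariates are independent, the GVIF is close to 1, which is the pattern described above. For a term with a single column, the same formula reduces to the ordinary variance inflation factor.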
Various methods, including resubstitution, validation set (holdout), cross-validation (CV), and bootstrapping, were considered for model evaluation. The bootstrap method is the most computationally intensive because it re-estimates the model B times (for example, B = 100), whereas the CV method re-estimates it once per fold. The other methods estimate the model only once. In this study, the resubstitution method was applied. In Step 2, as outlined in Section 2.1, the selection probabilities for each class were specified through a grid search, and the model performance measures were determined at the optimal value.
Since this survey was conducted in a 1:1 face-to-face manner, with an interviewer visiting the farmers, there was no option for non-response. When unit non-response occurred, a replacement was made during follow-up. Additionally, incorrect responses were filtered out through logical editing before analysis, and the affected values were replaced by nearest neighbor imputation. However, the impact of imputation was statistically negligible.
The survey for this study was conducted twice: first in 2020 and then again in 2022, during the peak of the global COVID-19 pandemic. As a result, additional efforts were necessary, including the establishment of safety protocols for field surveys and the provision of safety gear. In the 2022 survey, COVID-19 infection was also included as a separate category of disease type. Of 16,473 respondents in 2022, 1005 contracted COVID-19 and experienced symptoms, but none attributed it to farming. Therefore, it is challenging to determine whether COVID-19 directly impacted agricultural diseases.
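The model evaluation schemes named above (resubstitution, holdout, cross-validation, and bootstrapping) differ mainly in how many times the model has to be refit. The following sketch generates the training and evaluation index sets for each scheme and counts the resulting fits. All function names, the 30% holdout fraction, the 5 folds, and the out-of-bag evaluation used for the bootstrap are assumptions chosen for illustration; they are not taken from the study.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, evaluation indices)


def resubstitution(n: int) -> Iterator[Split]:
    """Fit once and evaluate on the same observations."""
    idx = list(range(n))
    yield idx, idx


def holdout(n: int, test_frac: float = 0.3, seed: int = 0) -> Iterator[Split]:
    """Fit once on a random subset and evaluate on the remainder."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(n * (1 - test_frac)))
    yield sorted(idx[:cut]), sorted(idx[cut:])


def k_fold(n: int, k: int = 5, seed: int = 0) -> Iterator[Split]:
    """Fit k times; each fold serves once as the evaluation set."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def bootstrap(n: int, B: int = 100, seed: int = 0) -> Iterator[Split]:
    """Fit B times on resamples drawn with replacement; evaluate out-of-bag."""
    rng = random.Random(seed)
    for _ in range(B):
        train = [rng.randrange(n) for _ in range(n)]
        in_bag = set(train)
        yield train, [j for j in range(n) if j not in in_bag]


n_obs = 50  # small illustrative sample size
schemes = {
    "resubstitution": resubstitution(n_obs),
    "holdout": holdout(n_obs),
    "k_fold": k_fold(n_obs, k=5),
    "bootstrap": bootstrap(n_obs, B=100),
}
n_fits = {name: sum(1 for _ in splits) for name, splits in schemes.items()}
print(n_fits)
```

Resubstitution is the cheapest option because it requires a single fit, but it evaluates the model on the data used to estimate it.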
This study has limitations. First, it is a recall survey that relies on respondents’ memories. In Korea, there is a high proportion of elderly farmers, who primarily guessed their responses when asked to remember the situation of the past year. In some cases, they were not sure about the year of the event. This may have had a slight effect on the accuracy of their responses. To obtain more accurate and less biased information on healthcare utilization behaviors, future research should consider integrating data from the Health Insurance Review & Assessment Service database, rather than solely depending on the participants’ recall data. Second, the sampling error inherent in sample surveys is a crucial limitation. The survey was conducted on 16,473 out of approximately 2.3 million farmers in Korea. Even after excluding non-sampling errors, sampling error remains. Considering that the prevalence of MSDs caused by agricultural work is around 4%, the sample size of 16,473 is not optimal. However, due to budget constraints, increasing the sample size was challenging. It is important to acknowledge the presence of sampling error in the findings. Lastly, the models proposed in this study were not estimated entirely from the original survey data; rather, they were estimated from data produced by resampling methods that expanded or reduced the existing data. Synthetic resampling methods were introduced to overcome the substantial loss of sensitivity that occurs when a model is based only on the survey data. Therefore, it is necessary to recognize this limitation and interpret the results accordingly.
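To give a rough sense of the sampling error mentioned above, the following sketch computes the standard error and the 95% margin of error of an estimated proportion. It assumes simple random sampling and ignores any design effect or finite population correction, which may not match the actual survey design; the values 0.04 and 16,473 are the prevalence and sample size quoted in the text, and the function name is hypothetical.

```python
import math
from typing import Tuple


def proportion_margin_of_error(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Return (standard error, margin of error) of a proportion under simple random sampling."""
    se = math.sqrt(p * (1 - p) / n)
    return se, z * se


# Assumed inputs: prevalence of about 4% and 16,473 respondents.
se, moe = proportion_margin_of_error(p=0.04, n=16473)
print(f"standard error: {se:.5f}, 95% margin of error: {moe:.5f}")
```

Under these simplifying assumptions the margin of error is about 0.3 percentage points; a complex sampling design would generally change this figure.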

5. Conclusions

The accuracy of the raw method was high, but the ROSE method outperformed it for precision and F1 score, and both methods showed similar AUC.

Author Contributions

Conceptualization, J.K.; methodology, J.K. and J.P.; software, J.K.; validation, J.P. and K.Y.; formal analysis, J.K.; investigation, J.P. and K.Y.; resources, J.P. and K.Y.; data curation, J.P. and K.Y.; writing—original draft preparation, J.K., J.P. and K.Y.; writing—review and editing, J.K., J.P. and K.Y.; visualization, J.K.; supervision, J.P.; project administration, J.P.; funding acquisition, J.K. and J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all the participants involved in the study during the data collection process of the original survey.

Data Availability Statement

The raw data supporting the conclusions of this article can be downloaded from https://kosis.kr/index/index.do (accessed on 25 April 2023).

Acknowledgments

We thank the National Academy of Agricultural Sciences under the Rural Development Administration of South Korea for creating the data source used in this study, which was partially supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No.RS-2023-00239958) (Jinheum Kim) and by the National Academy of Agricultural Sciences under Grant RS-2024-00343939 (Jinwoo Park).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Park, J.; Kim, J.; Youn, K.; Lee, M. A study on the prevalence and characteristics of occupational diseases in Korean farmers. Surv. Res. 2023, 24, 27–41.
  2. Lee, S.-J. The Occupational Diseases of Agricultural Workers. Hanyang Med. Rev. 2010, 30, 305–312.
  3. Akbar, K.A.; Try, P.; Viwattanakulvanid, P.; Kallawicha, K. Work-Related Musculoskeletal Disorders Among Farmers in the Southeast Asia Region: A Systematic Review. Saf. Health Work 2023, 14, 243–249.
  4. Holmberg, S.; Stiernström, E.L.; Thelin, A.; Svärdsudd, K. Musculoskeletal symptoms among farmers and non-farmers: A population-based study. Int. J. Occup. Environ. Health 2002, 8, 339–345.
  5. Lee, M.; Kim, K.; Choi, D. Factors Affecting Musculoskeletal Symptoms among Korean Farmers: Focusing on the Sociodemographic Characteristics. J. Agric. Med. Community Health 2022, 47, 255–267.
  6. Lee, S.-H.; Lee, J.-Y.; Cho, Y.-C. Musculoskeletal Disorder Symptoms and Related Factors among Male Workers in Small-scale Manufacturing Industries. J. Korea Acad.-Ind. Coop. Soc. 2012, 13, 4025–4035.
  7. Min, D.; Baek, S.; Park, H.W.; Lee, S.A.; Moon, J.; Yang, J.E.; Kim, K.S.; Kim, J.Y.; Kang, E.K. Prevalence and Characteristics of Musculoskeletal Pain in Korean Farmers. Ann. Rehabil. Med. 2016, 40, 1–13.
  8. Osborne, A.; Blake, C.; Fullen, B.M.; Meredith, D.; Phelan, J.; McNamara, J.; Cunningham, C. Risk factors for musculoskeletal disorders among farm owners and farm workers: A systematic review. Am. J. Ind. Med. 2012, 55, 376–389.
  9. Szeszenia-Dąbrowska, N.; Świątkowska, B.; Wilczyńska, U. Occupational diseases among farmers in Poland. Med. Pr. 2016, 67, 163–171.
  10. Karttunen, J.P.; Rautiainen, R.H. Distribution and characteristics of occupational injuries and diseases among farmers: A retrospective analysis of workers’ compensation claims. Am. J. Ind. Med. 2013, 56, 856–869.
  11. Menardi, G.; Torelli, N. Training and assessing classification rules with unbalanced data. Data Min. Knowl. Discov. 2012, 28, 92–122.
  12. King, G.; Zeng, L. Logistic Regression in Rare Events Data. Political Anal. 2001, 9, 137–163.
  13. Hand, D.J.; Vinciotti, V. Choosing k for two-class nearest neighbour classifiers with unbalanced classes. Pattern Recognit. Lett. 2003, 24, 1555–1562.
  14. Chawla, N.V. C4.5 and imbalanced data sets: Investigating the effect of sampling method, probabilistic estimate, and decision tree structure. ICML 2003, 3, 36.
  15. Kuhn, M. Building Predictive Models in R Using the caret Package. J. Stat. Softw. 2008, 28, 1–26.
  16. Torgo, L. Data Mining with R: Learning with Case Studies, 2nd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2016.
  17. Kuhn, M. Caret: Classification and Regression Training. Astrophys. Source Code Libr. 2015, ascl-1505.
  18. McCarthy, K.; Zabar, B.; Weiss, G. Does cost-sensitive learning beat sampling for classifying rare classes? In Proceedings of the 1st International Workshop on Utility-Based Data Mining, Chicago, IL, USA, 21 August 2005; pp. 69–77.
  19. Lee, S.S. Noisy replication in skewed binary classification. Comput. Stat. Data Anal. 2000, 34, 165–191.
  20. Chawla, N.; Bowyer, K.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. arXiv 2002, arXiv:1106.1813.
  21. Haibo, H.; Yang, B.; Garcia, E.A.; Shutao, L. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
  22. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; Taylor & Francis: Oxfordshire, UK, 1994.
  23. Douzas, G.; Bacao, F.; Last, F. Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Inf. Sci. 2018, 465, 1–20.
  24. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
  25. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
  26. Lunardon, N.; Menardi, G.; Torelli, N. ROSE: A Package for Binary Imbalanced Learning. R J. 2014, 6, 79.
  27. Bowman, A.W.; Azzalini, A. Software. In Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations; Oxford University Press: Oxford, UK, 1997.
  28. He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
  29. Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1015–1021.
  30. Nagelkerke, N.J.D. A note on a general definition of the coefficient of determination. Biometrika 1991, 78, 691–692.
  31. Greggi, C.; Visconti, V.V.; Albanese, M.; Gasperini, B.; Chiavoghilefu, A.; Prezioso, C.; Persechino, B.; Iavicoli, S.; Gasbarra, E.; Iundusi, R.; et al. Work-Related Musculoskeletal Disorders: A Systematic Review and Meta-Analysis. J. Clin. Med. 2024, 13, 3964.
  32. Tonelli, S.; Culp, K.; Donham, K. Work-related musculoskeletal disorders in senior farmers: Safety and health considerations. Workplace Health Saf. 2014, 62, 333–341.
  33. Xiao, H.; McCurdy, S.A.; Stoecklin-Marois, M.T.; Li, C.S.; Schenker, M.B. Agricultural work and chronic musculoskeletal pain among Latino farm workers: The MICASA study. Am. J. Ind. Med. 2013, 56, 216–225.
  34. Gjesdal, S.; Bratberg, E.; Mæland, J.G. Gender differences in disability after sickness absence with musculoskeletal disorders: Five-year prospective study of 37,942 women and 26,307 men. BMC Musculoskelet. Disord. 2011, 12, 37.
  35. Lee, H.; Cho, S.Y.; Kim, J.S.; Yoon, S.Y.; Kim, B.I.; An, J.M.; Kim, K.B. Difference in health status of Korean farmers according to gender. Ann. Occup. Environ. Med. 2019, 31, 7.
  36. Baek, S.; Park, J.; Kyoung Kang, E.; Kim, G.; Kim, H.; Park, H.W. Association Between Ergonomic Burden Assessed Using 20-Item Agricultural Work-Related Ergonomic Risk Questionnaire and Shoulder, Low Back, and Leg Pain in Korean Farmers. J. Agromedicine 2023, 28, 532–544.
  37. Lizer, S.K.; Petrea, R.E. Health and safety needs of older farmers: Part II. Agricultural injuries. Aaohn J. 2008, 56, 9–14.
  38. Walker-Bone, K.; Palmer, K.T. Musculoskeletal disorders in farmers and farm workers. Occup. Med. 2002, 52, 441–450.
  39. Brackbill, R.M.; Cameron, L.L.; Behrens, V. Prevalence of chronic diseases and impairments among US farmers, 1986–1990. Am. J. Epidemiol. 1994, 139, 1055–1065.
  40. Kee, D.; Haslam, R. Prevalence of work-related musculoskeletal disorders in agriculture workers in Korea and preventative interventions. Work 2019, 64, 763–775.
  41. Bernard, B.P. Musculoskeletal Disorders and Workplace Factors: A Critical Review of Epidemiologic Evidence for Work-Related Musculoskeletal Disorders of the Neck, Upper Extremity, and Low Back; The National Institute for Occupational Safety and Health: Washington, DC, USA, 1997.
  42. Hong, C.; Lee, C.G.; Song, H. Characteristics of lumbar disc degeneration and risk factors for collapsed lumbar disc in Korean farmers and fishers. Ann. Occup. Environ. Med. 2021, 33, e16.
  43. Park, K.H.; Baek, S.; Kang, E.K.; Park, H.W.; Kim, G.; Kim, S.H. The Association Between Sagittal Plane Alignment and Disc Space Narrowing of Lumbar Spine in Farmers. Ann. Rehabil. Med. 2021, 45, 294–303.
  44. Meyers, J.M.; Faucett, J.; Tejeda, D.G.; Kabashima, J.; Miles, J.A.; Janowitz, I.; Duraj, V.; Smith, R.; Weber, E. High Risk Tasks for Musculoskeletal Disorders in Agricultural Field Work. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 2000, 44, 616–619.
  45. Allread, W.G.; Wilkins, J.R., 3rd; Waters, T.R.; Marras, W.S. Physical demands and low-back injury risk among children and adolescents working on farms. J. Agric. Saf. Health 2004, 10, 257–274.
  46. Lee, J.; Oh, Y.-G.; Yoo, S.-H.; Suh, K. Vulnerability assessment of rural aging community for abandoned farmlands in South Korea. Land Use Policy 2021, 108, 105544.
  47. Shivakumar, M.; Welsh, V.; Bajpai, R.; Helliwell, T.; Mallen, C.; Robinson, M.; Shepherd, T. Musculoskeletal disorders and pain in agricultural workers in Low- and Middle-Income Countries: A systematic review and meta-analysis. Rheumatol. Int. 2024, 44, 235–247.
  48. Fox, J.; Monette, G. Generalized Collinearity Diagnostics. J. Am. Stat. Assoc. 1992, 87, 178–183.
Figure 1. Sensitivity, precision, F1 score, and accuracy plots based on raw and synthetic samples.
Figure 2. Receiver operating characteristic curves based on raw and synthetic samples.
Table 1. Association test between each predictor and disease prevalence caused by musculoskeletal disorders.

| Predictor | Category | No MSDs, n (%) | MSDs, n (%) | p-value |
|---|---|---|---|---|
| Total | - | 27,578 (94.0) | 1765 (6.0) | - |
| Sex | Male (ref) | 13,623 (95.0) | 718 (5.0) | <0.0001 |
| | Female | 13,955 (93.0) | 1047 (7.0) | |
| Age, yrs | <50 (ref) | 903 (98.9) | 10 (1.1) | <0.0001 |
| | 50–59 | 2820 (97.2) | 80 (2.8) | |
| | 60–69 | 7760 (95.1) | 398 (4.9) | |
| | ≥70 | 16,095 (92.6) | 1277 (7.4) | |
| FAP, months | 0–5 (ref) | 1822 (93.1) | 135 (6.9) | 0.0889 |
| | 6–12 | 25,756 (94.0) | 1630 (6.0) | |
| Pesticide | No (ref) | 4491 (94.6) | 254 (5.4) | 0.0362 |
| | Yes | 23,087 (93.9) | 1511 (6.1) | |
| Types of farming | Rice (ref) | 10,521 (93.6) | 784 (6.4) | 0.0921 |
| | Dry field | 12,126 (94.1) | 763 (5.9) | |
| | Orchard | 2587 (94.9) | 139 (5.1) | |
| | Greenhouse | 1063 (94.2) | 65 (5.8) | |
| | Livestock | 281 (95.2) | 14 (4.8) | |
| Income, US dollars | <3800 (ref) | 9329 (93.1) | 690 (6.9) | <0.0001 |
| | 3800–14,999 | 10,568 (93.9) | 680 (6.1) | |
| | 15,000–37,999 | 5656 (94.9) | 302 (5.1) | |
| | ≥38,000 | 2025 (95.6) | 93 (4.4) | |
| Neck | No (ref) | 13,324 (94.5) | 770 (5.5) | 0.0001 |
| | Yes | 14,254 (93.5) | 995 (6.5) | |
| Arms | No (ref) | 15,924 (94.4) | 952 (5.6) | 0.0017 |
| | Yes | 11,654 (93.5) | 813 (6.5) | |
| Wrists | No (ref) | 9649 (94.2) | 589 (5.8) | 0.1671 |
| | Yes | 17,929 (93.8) | 1176 (6.2) | |
| Waist | No (ref) | 5631 (95.3) | 279 (4.7) | <0.0001 |
| | Yes | 21,947 (93.7) | 1486 (6.3) | |
| Knees | No (ref) | 6710 (94.9) | 361 (5.1) | 0.0003 |
| | Yes | 20,868 (93.7) | 1403 (6.3) | |
| Lifting: 10–19 kg | No (ref) | 15,091 (93.3) | 1086 (6.7) | <0.0001 |
| | Yes | 12,487 (94.8) | 679 (5.2) | |
| Lifting: ≥20 kg | No (ref) | 18,752 (93.7) | 1265 (6.3) | 0.0013 |
| | Yes | 8826 (94.6) | 500 (5.4) | |

FAP: Farming activity period (unit: months).
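As a worked example of how the counts in Table 1 relate to an odds ratio, the sketch below computes the crude (unadjusted) odds ratio for sex with a Wald 95% confidence interval. This is a simplified illustration using only the male and female counts from Table 1; it ignores all other covariates, so it need not equal the adjusted estimates from the multivariable models, and the function name `crude_odds_ratio` is hypothetical.

```python
import math
from typing import Tuple


def crude_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio with a Wald confidence interval from a 2x2 table.

    a: exposed with disease, b: exposed without disease,
    c: reference with disease, d: reference without disease.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts from Table 1: females (1047 with MSDs, 13,955 without)
# versus males as the reference (718 with MSDs, 13,623 without).
or_sex, lower_sex, upper_sex = crude_odds_ratio(a=1047, b=13955, c=718, d=13623)
print(f"crude OR = {or_sex:.2f} (95% CI {lower_sex:.2f}, {upper_sex:.2f})")
```

The crude odds ratio is about 1.42, and its interval lies entirely above 1, in line with the significant association between sex and MSD prevalence reported in Table 1.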
Table 2. Odds ratio with the 95% confidence interval of each predictor on disease prevalence caused by musculoskeletal disorders from multivariable logistic regression analysis based on raw and synthetic samples.

| Predictor | Category | Raw OR (95% CI) | Raw p-value | Under OR (95% CI) | Under p-value | Over OR (95% CI) | Over p-value | Both OR (95% CI) | Both p-value | ROSE OR (95% CI) | ROSE p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sex | Female | 1.31 (1.17, 1.45) | <0.0001 | 1.42 (1.22, 1.64) | <0.0001 | 1.42 (1.22, 1.64) | <0.0001 | 1.42 (1.22, 1.64) | <0.0001 | 1.32 (1.26, 1.39) | <0.0001 |
| Age, yrs (Ref: <50) | 50–59 | 2.51 (1.30, 4.87) | <0.0001 | 1.93 (0.92, 4.06) | <0.0001 | 1.93 (0.92, 4.06) | <0.0001 | 1.93 (0.92, 4.06) | <0.0001 | 2.63 (2.05, 3.38) | <0.0001 |
| | 60–69 | 4.43 (2.35, 8.33) | | 3.40 (1.68, 6.88) | | 3.40 (1.68, 6.88) | | 3.40 (1.68, 6.88) | | 4.55 (3.59, 5.77) | |
| | ≥70 | 6.64 (3.54, 12.5) | | 4.69 (2.32, 9.45) | | 4.69 (2.32, 9.45) | | 4.69 (2.32, 9.45) | | 6.69 (5.28, 8.47) | |
| FAP (Ref: 0–5) | 6–12 | 0.79 (0.66, 0.95) | 0.0151 | 0.84 (0.64, 1.09) | 0.1923 | 0.84 (0.64, 1.09) | 0.1923 | 0.84 (0.64, 1.09) | 0.1923 | 0.88 (0.80, 0.96) | 0.0053 |
| Pesticide | Yes | 1.24 (1.08, 1.43) | 0.0020 | 1.27 (1.04, 1.54) | 0.0165 | 1.27 (1.04, 1.54) | 0.0165 | 1.27 (1.04, 1.54) | 0.0165 | 1.23 (1.15, 1.32) | <0.0001 |
| Types of farming | Dry field | 0.88 (0.79, 0.98) | 0.0429 | 0.78 (0.67, 0.90) | 0.0208 | 0.78 (0.67, 0.90) | 0.0208 | 0.78 (0.67, 0.90) | 0.0208 | 0.86 (0.81, 0.90) | <0.0001 |
| | Orchard | 0.83 (0.68, 1.01) | | 0.87 (0.66, 1.15) | | 0.87 (0.66, 1.15) | | 0.87 (0.66, 1.15) | | 0.80 (0.73, 0.88) | |
| | Greenhouse | 1.12 (0.86, 1.46) | | 0.93 (0.65, 1.34) | | 0.93 (0.65, 1.34) | | 0.93 (0.65, 1.34) | | 1.12 (0.99, 1.28) | |
| | Livestock | 1.16 (0.67, 2.01) | | 1.25 (0.59, 2.68) | | 1.25 (0.59, 2.68) | | 1.25 (0.59, 2.68) | | 1.06 (0.81, 1.38) | |
| Income, US dollars (Ref: <3800) | 3800–14,999 | 0.93 (0.83, 1.04) | 0.4436 | 1.05 (0.89, 1.23) | 0.8459 | 1.05 (0.89, 1.23) | 0.8459 | 1.05 (0.89, 1.23) | 0.8459 | 0.90 (0.86, 0.96) | 0.0010 |
| | 15,000–37,999 | 0.90 (0.77, 1.04) | | 0.96 (0.78, 1.18) | | 0.96 (0.78, 1.18) | | 0.96 (0.78, 1.18) | | 0.92 (0.85, 0.99) | |
| | ≥38,000 | 0.90 (0.71, 1.14) | | 1.00 (0.73, 1.38) | | 1.00 (0.73, 1.38) | | 1.00 (0.73, 1.38) | | 0.85 (0.76, 0.95) | |
| Neck | Yes | 1.22 (1.08, 1.38) | 0.0015 | 1.19 (1.00, 1.42) | 0.0447 | 1.19 (1.00, 1.42) | 0.0447 | 1.19 (1.00, 1.42) | 0.0447 | 1.20 (1.13, 1.28) | <0.0001 |
| Arms | Yes | 1.14 (1.01, 1.29) | 0.0336 | 1.11 (0.94, 1.32) | 0.2219 | 1.11 (0.94, 1.32) | 0.2219 | 1.11 (0.94, 1.32) | 0.2219 | 1.19 (1.12, 1.26) | <0.0001 |
| Wrists | Yes | 0.93 (0.82, 1.06) | 0.2877 | 0.95 (0.80, 1.15) | 0.6204 | 0.95 (0.80, 1.15) | 0.6204 | 0.95 (0.80, 1.15) | 0.6204 | 0.94 (0.89, 1.00) | 0.0642 |
| Waist | Yes | 1.26 (1.07, 1.49) | 0.0046 | 1.42 (1.22, 1.64) | 0.1529 | 1.42 (1.22, 1.64) | 0.1529 | 1.42 (1.22, 1.64) | 0.1529 | 1.27 (1.18, 1.38) | <0.0001 |
| Knees | Yes | 1.05 (0.91, 1.22) | 0.4906 | 1.93 (0.92, 4.06) | 0.2692 | 1.93 (0.92, 4.06) | 0.2692 | 1.93 (0.92, 4.06) | 0.2692 | 1.05 (0.98, 1.13) | 0.1657 |
| Lifting: 10–19 kg | Yes | 0.78 (0.68, 0.90) | 0.0006 | 3.40 (1.68, 6.88) | 0.0022 | 3.40 (1.68, 6.88) | 0.0022 | 3.40 (1.68, 6.88) | 0.0022 | 0.74 (0.69, 0.79) | <0.0001 |
| Lifting: ≥20 kg | Yes | 1.15 (0.98, 1.34) | 0.0828 | 4.69 (2.32, 9.45) | 0.4183 | 4.69 (2.32, 9.45) | 0.4183 | 4.69 (2.32, 9.45) | 0.4183 | 1.17 (1.09, 1.26) | <0.0001 |

OR: odds ratio; CI: confidence interval; FAP: farming activity period; Under: under-sampling; Over: over-sampling; Both: under- and over-sampling; ROSE: random over-sampling examples.
Table 3. Measures of model performance from multivariable logistic regression analysis based on raw and synthetic samples.

| Measure | Raw | Under | Over | Both | ROSE |
|---|---|---|---|---|---|
| Nagelkerke’s R-square | 0.029 | 0.075 | 0.064 | 0.065 | 0.074 |
| Sensitivity | 0 | 0.679 | 0.657 | 0.659 | 0.671 |
| Precision | - | 0.077 | 0.080 | 0.079 | 0.079 |
| F1 score | - | 0.069 | 0.071 | 0.071 | 0.070 |
| Accuracy | 0.940 | 0.508 | 0.524 | 0.518 | 0.508 |
| AUC | 0.619 | 0.615 | 0.618 | 0.615 | 0.618 |

OR: odds ratio; CI: confidence interval; FAP: farming activity period; Under: under-sampling; Over: over-sampling; Both: under- and over-sampling; ROSE: random over-sampling examples; AUC: area under curve.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Kim, J.; Youn, K.; Park, J. Risk Factors for Musculoskeletal Disorders in Korean Farmers: Survey on Occupational Diseases in 2020 and 2022. Healthcare 2024, 12, 2026. https://doi.org/10.3390/healthcare12202026
