Novel Ensemble Learning Algorithm for Early Detection of Lower Back Pain Using Spinal Anomalies

Abstract: Lower back pain (LBP) is a musculoskeletal condition that affects millions of people worldwide and significantly limits their mobility and daily activities. Appropriate ergonomics and exercise are crucial preventive measures that play a vital role in managing and reducing the risk of LBP. Individuals with LBP often exhibit spinal anomalies, which can serve as valuable indicators for early diagnosis. We propose an advanced machine learning methodology for LBP detection that incorporates data balancing and bootstrapping techniques. Leveraging the features associated with spinal anomalies, our method offers a promising approach for the early detection of LBP. Our study utilizes a standard dataset comprising 310 patient records, including spinal anomaly features. We propose an ensemble method called the random forest gradient boosting XGBoost ensemble (RGXE), which integrates the combined power of the random forest, gradient boosting, and XGBoost methods for LBP detection. Experimental results demonstrate that the proposed ensemble method, RGXE Voting, outperforms state-of-the-art methods, achieving a high accuracy of 0.99. We fine-tuned each method and validated its performance using k-fold cross-validation, in addition to determining the computational complexity of the methods. This innovative research holds significant potential to revolutionize the early detection of LBP, thereby improving patients' quality of life.


Introduction
Lower back pain (LBP) is a prevalent musculoskeletal problem that affects millions of people globally [1], leading to discomfort and, in severe cases, debilitating pain. The complexity of LBP stems from factors such as mechanical strain, disc degeneration, muscle imbalance, and psychological stress [2]. Understanding and addressing these root causes present challenges, necessitating comprehensive approaches for effective management. The multifaceted nature of LBP affects daily activities [3] and quality of life [4].
This widespread issue has significant drawbacks [5], including diminished mobility, reduced workplace productivity, and heightened emotional distress. Sleep disturbances [6] and potential reliance on medications further contribute to a decline in the overall quality of life. In severe instances, there is a risk of disability [7], which increases the burden faced by individuals. The associated care costs create financial challenges, emphasizing the need for holistic strategies that consider both well-being and economic implications.
LBP is the primary cause of disability worldwide, affecting approximately 619 million individuals [2]. This prevalence raises significant public health concerns, as the repercussions extend beyond personal suffering to a substantial decrease in work productivity. The financial burden on affected individuals [8] and society underscores the importance of addressing the widespread impact of LBP through comprehensive strategies.
Shifting the focus to healthcare, machine learning [9] transforms the landscape by analyzing extensive patient data for disease diagnosis and treatment personalization [10]. This technology accelerates diagnostics and ensures faster and more efficient healthcare delivery. Its versatility is evident in addressing complex medical challenges and reshaping medicine with innovative solutions that prioritize precision, speed, and personalized care. The potential benefits of machine learning extend to the medical community, providing a quick, dependable, and efficient approach for disease detection and diagnosis [11].
In contrast to manual disease detection processes that are prone to human error, machine learning-based algorithms consistently achieve high accuracy in prediction tasks. This reliability offers an efficient alternative for medical diagnoses, minimizing the risks associated with human error. The integration of machine learning has the potential to benefit the medical community [12] by ensuring quick and precise disease detection while advancing the overall effectiveness and accessibility of healthcare services.
Our contributions can be summarized as follows:
• We propose a novel ensemble method called RGXE that integrates the combined power of the random forest (RF), gradient boosting (GB), and XGBoost (XGB) methods for LBP detection.
• We implemented advanced classification methods (logistic regression (LR), Gaussian naïve Bayes (GNB), RF, decision tree (DT), support vector machine (SVM), k-nearest neighbors (KNN), GB, and XGB) to evaluate the proposed scheme against state-of-the-art approaches.
• We improved precision through hyperparameter optimization and k-fold cross-validation and demonstrated exceptional performance compared to existing studies.
The remainder of this paper is organized as follows: Section 2 presents a comparative analysis of the literature. Section 3 elaborates on the proposed methodology. The experimental evaluations conducted in this study are described in detail in Section 4. Finally, Section 5 encapsulates the study's findings and conclusions.

Literature Review
We conducted a broad assessment of earlier studies on the detection of LBP. This review presents the different scopes of studies that utilize different strategies for identifying and assessing LBP. Through a careful examination of the existing literature, we intend to acquire knowledge, distinguish patterns, and pinpoint gaps in previous studies. A summary of previous studies is presented in Table 1.
In previous research [13], feature selection based on a genetic algorithm (GA) was used to identify significant parameters. The study compared predictive models that included feature selection with those that did not, assessing performance measures such as accuracy, precision, F1 score, recall, and area under the receiver operating characteristic (ROC) curve (AUC). A dataset comprising 310 observations and 12 features was obtained from Kaggle. This study aimed to predict early LBP symptoms using machine learning techniques, emphasizing feature selection for enhanced model performance, and achieved an accuracy of 85.2%. In another study [14], the proposed multi-layer perceptron (MLP)-type artificial neural network (ANN) computed the likelihood of surgery based on the identified attributes in a model that mimicked surgical decision-making. Fifty-five criteria were found to be predictive of surgical progression. The medical records of each patient (n = 483) who presented with a lumbar spine complaint at a single Australian tertiary hospital between 2013 and 2019 were examined, and relevant information was gathered. The model achieved a remarkable accuracy of 92.1% in predicting surgical candidacy. The excellent discriminative ability (AUC = 0.90) and good data fit in the calibration analysis demonstrated its reliability.
In another study [15], a pioneering approach to LBP classification was introduced, utilizing RF as the classification algorithm. With an impressive accuracy of 85.80%, which significantly surpassed the initial 71.25%, the improvement was attributed to the application of parameter tuning. This marked RF as a leading contender for enhancing LBP classification, showcasing its transformative potential in medical diagnostics. Data were collected from Kaggle. Meticulous methodology and the breakthrough incorporation of parameter tuning have set the stage for future advancements in precision medicine for LBP.
In a previous study [16], the proposed GNB machine learning model predicted the risk of chronic LBP (CLBP). General characteristics, including sex, age, BMI, and physical activity level, were collected from all participants using the Global Physical Activity Questionnaire. Data collected from the CLBP and matched NLBP subjects adhered to ethical standards. Patients with CLBP (n = 20) were recruited from a hospital according to specific criteria, while subjects with NLBP (n = 20) were matched and recruited using the defined exclusion criteria. The model achieved an accuracy of 79% in predicting the risk of LBP.
In a previous study [17], stacked ensemble machine learning was proposed to investigate LBP. This study underlined the importance of early LBP detection and presented a classification system based on various algorithms. The intricate anatomy of the lumbar spine and its vulnerability to pain were highlighted. Kasula et al. used a dataset from Kaggle and employed hyperparameter tuning to optimize the method. The dataset contained 310 observations. The proposed method achieved an accuracy of 76.34%. They proposed a stacking ensemble classifier as an automated tool to predict the LBP tendency of a patient.
Lamichhane et al. [18] used cortical thickness (CT) as a feature to train an SVM to accurately classify participants into two groups: LBP and healthy control (HC). Achieving a classification accuracy of 74.51%, an AUC of 0.787, a sensitivity of 74.07%, and a specificity of 75.00%, the model was effective in distinguishing between the two conditions. ROC curves provide a visual representation of classification performance while pinpointing the cortical regions involved in the classification process. These regions are depicted on a brain mesh surface, enriching the depth of the findings. The approach not only showcases the potential of CT in discriminating LBP but also provides insights into the specific neuroanatomical regions associated with the condition, contributing to a comprehensive understanding of the neural correlates of LBP.
Liew et al. [19] utilized functional data boosting to evaluate predictive ensemble models aimed at distinguishing between different subtypes of LBP and HCs during low-load lifting. The study included 49 participants with different LBP statuses. Three models exhibited notable accuracy: Model 1 (control vs. LBP) achieved an AUC of 90.4%, Model 2 (control vs. recurrent LBP) achieved an AUC of 91.2%, and Model 3 (recurrent LBP vs. LBP) demonstrated an impressive AUC of 96.7%. Influential predictors such as the biceps femoris, deltoid, and iliocostalis muscles underscore the potential for targeted interventions in LBP management, marking a significant advancement in predictive modeling accuracy for nuanced subtype classification.
Mao et al. [20] explored the role of the habenula in chronic LBP (cLBP) using resting-state functional connectivity (rsFC) and effective connectivity, revealing enhanced connectivity patterns in patients with cLBP. The combination of rsFC pathways, including the habenula-left superior frontal cortex, habenula-pons, and habenula-thalamus, achieved an accuracy of 75.9% in distinguishing HCs from patients with cLBP using an SVM. The dataset for the study comprised 52 individuals diagnosed with cLBP and an equivalent group of 52 HCs, which were utilized to investigate the rsFC and effective connectivity of the habenula. These findings suggest abnormal habenular connectivity in patients with cLBP, emphasizing the potential of machine learning to discriminate between chronic pain conditions.
Yu et al. [21] highlighted the potential of combining ultrasound and shear wave elastography (SWE) features, particularly the significant role of SWE elasticity in improving the automatic classification of patients with non-specific LBP (NSLBP). This advancement aids in enhancing diagnostic accuracy and intervention planning. Using a sample of 52 subjects from the University of Hong Kong-Shenzhen Hospital, the authors employed an SVM model to analyze 48 selected features. The SVM model achieved accuracy, precision, and sensitivity of 0.85, 0.89, and 0.86, respectively, surpassing the previous MRI-based values.
Shim et al. [22] focused on creating machine learning models designed to accurately forecast the likelihood of cLBP. Data were utilized from the Sixth Korea National Health and Nutrition Examination Survey (KNHANES VI-2, 3), including 6119 patients, with 1394 experiencing LBP. The study employed various classification models, including k-nearest neighbors, RF, naïve Bayes, DT, GB machine, LR, SVM, and ANN. The ANN model emerged as the most effective, with an AUROC of 0.716, surpassing the other algorithms. The study underscores the potential of machine learning, particularly the ANN model, for identifying populations at high risk of cLBP. This offers a promising approach for targeted interventions and preventive strategies.

Research Gap
Following a thorough examination of the current literature, our analysis highlights specific areas of research that require further exploration.
Previously, researchers typically utilized classical machine learning methods to detect LBP. However, a growing need exists for more sophisticated ensemble machine learning approaches. Furthermore, the diagnostic performance scores in recent studies have been less than optimal.

Proposed Methodology
Our novel research methodology for detecting LBP in humans is shown in Figure 1. We obtained a dataset of LBP symptoms from Kaggle and conducted comprehensive preprocessing to handle null values. To address class imbalance, we utilized the synthetic minority over-sampling technique (SMOTE) [23] and further augmented the dataset through bootstrapping. After these enhancements, the data were carefully split into training and testing sets of 80% and 20%, respectively, to ensure a thorough evaluation of the model's performance. Using this well-prepared and expanded dataset, we applied a machine learning model to predict LBP symptoms.
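The null-handling preprocessing step can be sketched in a few lines. The fragment below is illustrative only: the record layout, feature names, and the choice to drop incomplete rows are assumptions, not the study's actual code.

```python
# Sketch of null-value preprocessing (hypothetical records; the real
# dataset has 12 spinal-anomaly features per patient).

def drop_incomplete(records):
    """Remove any record that contains a missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]

records = [
    {"pelvic_incidence": 63.0, "pelvic_tilt": 22.5},
    {"pelvic_incidence": None, "pelvic_tilt": 10.1},  # incomplete -> dropped
    {"pelvic_incidence": 38.5, "pelvic_tilt": 16.4},
]

clean = drop_incomplete(records)
print(len(clean))  # 2 complete records remain
```

Imputation (e.g., filling with a column mean) is a common alternative when dropping rows would shrink an already small dataset.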


Lower Back Pain Symptoms Data
This study employed a comprehensive dataset [24] featuring 310 rows and 12 distinct features, categorizing instances into two classes: normal and abnormal. The initial 12 columns encapsulated various features, including pelvic incidence, pelvic radius, pelvic tilt, lumbar lordosis angle, sacral slope, degree of spondylolisthesis, pelvic slope, direct tilt, thoracic slope, cervical tilt, sacral angle, and scoliosis slope. Each of these features provides valuable information for the analysis. The last column of the dataset serves as the target variable, indicating whether the case falls into the "normal" or "abnormal" category. This structured dataset forms the foundation for a comprehensive investigation of factors influencing spinal health and abnormalities.



Synthetic Minority Over-Sampling Technique (SMOTE)-Based Data Resampling
Upon recognizing an imbalance in the dataset, in which 210 instances were labeled abnormal and 100 normal, we took proactive measures to address this issue. We used SMOTE [23] to augment the representation of the minority class, specifically the normal class, by generating synthetic instances (see Figure 2a). This strategic enhancement aims to create a more balanced distribution of abnormal and normal classes, which is a critical step in ensuring unbiased model training [23]. Several previous studies have addressed the issue of imbalanced data by applying data sampling techniques both before [25,26] and during model validation [27,28]. Blagus and Lusa [29] discussed the possibility of overoptimism when applying data sampling techniques. Overoptimism refers to a positive bias in the estimation of a model's performance. Furthermore, Santos et al. explored two approaches (before and during model validation) for handling imbalanced datasets in more detail [30]. They elaborated on the potential for overoptimism and overfitting when dealing with imbalanced datasets, considering data complexity, cross-validation approaches, and data sampling methods. The study revealed that overfitting might occur when using the oversampling method alone. Regarding overoptimism, the study indicated that this issue might arise in cases of high data complexity, such as overlapping individual feature values, class separability, and the geometry and topology of the data. Additionally, previous work [31] has employed the same techniques applied in this study.
Therefore, we guided the dataset through this transformation in the present study, striving to achieve a fair representation of both classes and enhancing the model's ability to discern patterns across various instances of spinal health.The resulting balanced dataset is illustrated in Figure 2b, showcasing improved distribution after the application of SMOTE.
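The interpolation idea at the heart of SMOTE can be shown with a minimal sketch. This is a simplified toy (pure-Python distance search, 2-D points, default parameters chosen here, not taken from the paper); in practice a library implementation such as imbalanced-learn's SMOTE would be used.

```python
import math
import random

def smote(minority, n_synthetic, k=3, rng=None):
    """Minimal SMOTE sketch: each synthetic point is an interpolation
    between a random minority sample and one of its k nearest
    minority-class neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (brute force)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

# Toy 2-D points standing in for the minority ("normal") class.
normal = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(normal, n_synthetic=110)  # 100 -> 210 would balance the classes
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority points, SMOTE stays inside the minority class's local geometry rather than duplicating records verbatim.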

Bootstrapping-Based Data Sampling
Recognizing the limitations of the small dataset, we employed a bootstrapping technique [32] to increase the data volume. By generating multiple resamples from the existing data, we effectively amplified the dataset size, providing the model with a more diverse set of instances to learn from. This approach enhances the performance of the model by exposing it to a broad range of scenarios and patterns. Bootstrapping is a strategic step for optimizing the learning process and improving the overall robustness of the model.
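Bootstrapping amounts to drawing resamples of the data with replacement. The sketch below illustrates the mechanic on stand-in integer records; the number of resamples and the seed are arbitrary choices for the example, not values from the paper.

```python
import random

def bootstrap(data, n_resamples, rng=None):
    """Draw `n_resamples` resamples, each the size of `data`,
    sampling with replacement."""
    rng = rng or random.Random(42)
    return [[rng.choice(data) for _ in data] for _ in range(n_resamples)]

dataset = list(range(310))          # stand-in for the 310 patient records
resamples = bootstrap(dataset, n_resamples=5)
augmented = [row for sample in resamples for row in sample]
print(len(augmented))  # 5 x 310 = 1550 rows
```

Sampling with replacement means individual records repeat across (and within) resamples, which is what exposes the model to varied compositions of the same underlying data.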



Data Splitting
To ensure an effective evaluation of the performance of our model, we carefully divided the dataset into a training set containing 80% of the data and a testing set containing the remaining 20%. This division allowed us to train the model on a significant percentage of the data, which helped identify patterns and relationships efficiently. The distinct testing set evaluated how well the model generalizes to new, untested data and acts as an unbiased standard.
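The 80/20 split can be sketched as a shuffle followed by a slice. This mimics what a library helper such as scikit-learn's `train_test_split` does; the seed here is arbitrary and the code is an illustration, not the study's implementation.

```python
import random

def train_test_split(data, test_frac=0.2, seed=0):
    """Shuffle a copy of `data` and split it into train/test portions."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))             # stand-in record indices
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```

Shuffling before slicing matters: without it, a class-ordered file would put one class disproportionately into the test set.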

Novel Proposed Ensemble Method
The design of the proposed RGXE Voting model is shown in Figure 3. By introducing the RGXE Voting Classifier, an innovative ensemble model designed to classify LBP, we integrated the strengths of RF, GB, and XGB. By leveraging the unique capabilities of each algorithm, RGXE Voting ensures a robust classification framework that captures the intricate patterns associated with LBP symptoms. This ensemble approach [17] signifies a strategic collaboration of powerful algorithms collectively aimed at achieving heightened predictive accuracy and reliability in discerning patterns within the spinal health realm. The strategic combination of RF, GB, and XGB within the RGXE Voting model underscores the concerted effort to bolster predictive accuracy and reliability. Each algorithm brings unique strengths: RF adeptly handles high-dimensional data and complex interactions, GB iteratively minimizes errors, and XGB efficiently optimizes predictive performance. By harnessing the complementary capabilities of these algorithms, our ensemble method ensures thorough exploration of the feature space, leading to enhanced classification outcomes. Moreover, the ensemble nature of RGXE Voting fortifies against overfitting and augments generalization performance by amalgamating diverse model predictions. This integration of powerful algorithms reflects our dedication to crafting a nuanced classification solution capable of tackling the multifaceted challenges in LBP diagnosis.
Our proposed RGXE model employs a hard voting mechanism for LBP detection, where three classifiers are used to make individual predictions.Each classifier votes for a predicted class, and the class that receives the majority of votes is chosen as the final prediction.This approach leverages the strengths of different classifiers to enhance overall accuracy and robustness for LBP detection.
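The hard-voting rule described above can be sketched as a majority vote over per-model predictions. The predictions below are made-up stand-ins rather than outputs of trained RF/GB/XGB models; only the voting mechanic is illustrated.

```python
from collections import Counter

def hard_vote(predictions_per_model):
    """Majority (hard) vote: for each sample, the label predicted by
    most of the base classifiers becomes the final prediction."""
    final = []
    for sample_preds in zip(*predictions_per_model):
        final.append(Counter(sample_preds).most_common(1)[0][0])
    return final

# Hypothetical per-model predictions for four patients (1 = abnormal).
rf_preds  = [1, 0, 1, 1]
gb_preds  = [1, 1, 1, 0]
xgb_preds = [0, 0, 1, 1]
print(hard_vote([rf_preds, gb_preds, xgb_preds]))  # [1, 0, 1, 1]
```

With three voters and two classes there is always a strict majority, so no tie-breaking rule is needed in this configuration.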

Machine Learning (ML) Methods
Following the practices of previous work [32], we briefly describe the machine learning methods utilized in our study as follows:
• The RF algorithm [33], a potent ensemble-learning technique, was applied to classify LBP within a dataset. This approach builds a group of DTs and uses a voting mechanism to aggregate their output. Each DT was trained using a different subset of the dataset. The RF model makes predictions based on the mode of the individual DT predictions. The final predicted class represents the culmination of the individual predictions. This algorithm introduces randomness during both feature selection and dataset bootstrapping, thereby promoting diversity among the constituent trees. This diversity enhances the ability of the model to generalize to new, unseen data and helps prevent overfitting. The RF's collective decision making through majority voting contributes to a robust classification model adept at identifying patterns related to LBP symptoms.

• The GB ensemble method [34] for classifying LBP within a dataset harnesses the power of an iterative technique for predictive modeling. GB creates DTs in a sequence, with each tree focusing on rectifying the errors of the previous one. This process optimizes the predictive accuracy of the model by minimizing residual errors. Mathematically, the prediction of a GB model is expressed as the sum of the predictions from all individual trees weighted by the learning rate. By iteratively improving the performance of the model, GB offers a robust approach for classifying LBP instances. The ability of the algorithm to capture complex relationships within data enhances its suitability for discerning nuanced patterns associated with spinal health. Ultimately, the application of GB contributes to the creation of a sophisticated and accurate classifier tailored to the specific challenges posed by LBP classification.

• The DT model [35] on the LBP dataset involves the deployment of a tree-like structure to partition the data based on features to create a predictive model for classifying instances. The DT serves as a versatile and interpretable tool for classification tasks by dividing the decision-making process into a series of straightforward conditions. The algorithm iteratively selects the most informative features at each node and optimizes the model to effectively discriminate between normal and abnormal lower back conditions. Through its hierarchical structure, the DT learns to make decisions by evaluating different feature thresholds, resulting in a clear and interpretable set of rules for classification. This model is particularly useful for medical diagnosis and provides insights into the factors contributing to LBP.
• SVM [36] is capable of performing both linear and nonlinear classification tasks on the LBP dataset. SVM is particularly suitable for medical diagnosis [31], providing strong performance in detecting patterns within complex datasets. By transforming data points into a higher-dimensional space and identifying the optimal hyperplane for classification, SVM aims to maximize the margin between different classes. This methodology facilitates the creation of a robust classifier for distinguishing between normal and abnormal instances of lower back conditions. SVM is reliable for classifying and analyzing LBP and is suitable for dealing with the complexities of spinal health data.

• The KNN algorithm [37] on the LBP dataset utilizes a versatile and intuitive model for classification tasks. KNN works on the principle of proximity and classifies instances according to the majority class among their nearest neighbors. This method works well with many types of data, such as spinal health data, because it does not require knowledge of the exact distribution of the data. The model computes the distances between data points and assigns a class label based on the consensus of the k nearest neighbors. In the domain of LBP classification, KNN offers a simple yet effective approach for capturing local patterns within the data. Its adaptability makes it a valuable tool for recognizing patterns and detecting abnormal spinal conditions.
• LR [38] is well suited for classification tasks, providing a probabilistic framework for discerning patterns within data. By applying a logistic function, the model estimates the probability of an instance belonging to a specific class. This simplicity and interpretability make LR particularly valuable for medical diagnoses, such as classifying LBP instances as normal or abnormal. This method is helpful in healthcare analysis because it provides a good understanding of complex connections and useful information regarding the factors affecting spinal health.

• The XGB algorithm [39] for the LBP dataset utilizes an advanced ensemble-learning technique renowned for its efficiency in classification tasks. XGB builds a robust predictive model through the sequential training of DTs, with each tree correcting the errors made by the previous ones. The ensemble approach combines the strengths of the individual trees, resulting in a highly accurate and resilient classifier. By optimizing a predefined objective function, XGB efficiently handles imbalances in class distribution, making it particularly well suited for medical datasets such as LBP analysis. The ability of this model to capture intricate patterns and exhibit high predictive performance contributes to its prominence in healthcare analytics, providing valuable insights into spinal health conditions.

• The GNB algorithm [40] on the LBP dataset leverages a probabilistic model based on Bayes' theorem. GNB assumes that features are conditionally independent, allowing for a straightforward and effective classification approach. In the context of LBP classification, GNB models the probability distribution of features for each class and assigns the most likely class based on the observed feature values. Despite its simplicity, GNB has proven particularly valuable in healthcare analytics, offering interpretable insights into the likelihood of instances being classified as normal or abnormal based on the given features. Its efficient computation and ability to handle continuous data make GNB a practical choice for discerning patterns within medical datasets.
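As an illustration of the probabilistic reasoning in the GNB description above, the sketch below estimates per-class Gaussian parameters and classifies by maximum log-posterior. The data are toy one-feature points invented for the example; the study itself used a library implementation.

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate log-priors plus per-class feature means and variances."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    n, stats = len(y), {}
    for c, rows in by_class.items():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        vars_ = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                 for col, m in zip(zip(*rows), means)]
        stats[c] = (math.log(len(rows) / n), means, vars_)
    return stats

def predict_gnb(stats, x):
    """Pick the class maximizing log-prior + sum of Gaussian log-likelihoods."""
    def log_post(c):
        prior, means, vars_ = stats[c]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, vars_)
        )
    return max(stats, key=log_post)

# Toy 1-feature data: "normal" clustered near 1, "abnormal" near 5.
X = [[1.0], [1.2], [0.9], [5.0], [5.3], [4.8]]
y = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]
model = fit_gnb(X, y)
print(predict_gnb(model, [1.1]), predict_gnb(model, [5.1]))
```

The conditional-independence assumption appears in the code as a plain sum of per-feature log-likelihoods.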

Parameter Settings
The best hyperparameters [41] for the ML methods are listed in Table 2. These hyperparameters were optimized for each method using k-fold cross-validation, involving multiple rounds of training and testing. Our results show that, by fine-tuning the hyperparameters, we achieved remarkable performance scores in LBP detection.
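The k-fold scheme underlying this validation can be sketched as index bookkeeping. The fold count below is an arbitrary example value (the paper does not state k here), and the code mirrors what scikit-learn's `KFold` provides.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs: each fold serves once as the
    held-out test set while the remaining folds form the training set."""
    indices = list(range(n_samples))
    # distribute any remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, k=5))
print([len(test) for _, test in folds])  # [2, 2, 2, 2, 2]
```

Averaging a metric across the k held-out folds gives a less optimistic estimate than a single train/test split, which is why it is the standard setting for hyperparameter tuning.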

Results and Discussion
Our research findings and discussion demonstrate the outcomes of implementing a novel method for identifying LBP. This section evaluates the performance metrics used to measure the effectiveness of the proposed approach compared with established methods.

Experimental Settings
In our study, we conducted experiments on a computer with the specifications listed in Table 3. To evaluate the effectiveness of our machine learning models, we relied on four metrics: accuracy, precision, recall, and F1 score.
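The four metrics follow directly from the confusion counts. The sketch below uses the standard definitions on made-up labels; it is illustrative, not the study's evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from raw binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

# Made-up labels (1 = abnormal) purely to exercise the formulas.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting precision, recall, and F1 alongside accuracy matters for this dataset, where the 210/100 class split would let a trivial majority classifier score well on accuracy alone.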

Performance Analysis before Bootstrapping
A comparison of the performance of the applied machine learning models before bootstrapping is presented in Table 4. The accuracy scores reveal that RGXE was the top-performing model with 0.95 accuracy, followed by XGB at 0.90. RF and GB also performed well, with accuracies of 0.92 and 0.94, respectively. LR and SVM achieved a solid accuracy of 0.85. However, GNB and KNN exhibited slightly lower accuracies of 0.82 and 0.84, respectively. DT trailed, with an accuracy of 0.83. These results provide valuable insight into the relative performance of each model, aiding informed model selection for classification tasks. We further analyzed the performance of our proposed model using the original dataset, as presented in Table 5. The results confirmed that the ensemble learning model achieved moderate success in detecting lower back pain. These findings suggest the need for further experiments focused on data balancing and boosting techniques. Figure 4 shows a performance comparison using histogram-based bar charts, contrasting the proposed technique with established machine learning techniques before bootstrapping. The graph accentuates the performance of the new method, shedding light on its effectiveness compared with traditional methods. This analysis offers valuable insights into the relative efficacy of the proposed approach and aids in understanding its potential advantages over conventional techniques.

Performance Analysis after Bootstrapping and Proposed Approach
Table 6 presents the performance disparities observed across various machine learning methodologies when implemented in conjunction with the bootstrapping technique. Following bootstrapping, our examination of the models on the LBP dataset revealed substantial improvements in classification results. Notably, the RGXE Voting model stands out with an impressive 99% accuracy, demonstrating exceptional performance scores for both the normal and abnormal classes. This advancement underscores the effectiveness of the RGXE Voting ensemble, which combines RF, GB, and XGB.

Individual models, such as RF and GB, also exhibited heightened accuracy and improved performance after bootstrapping, and the DT and KNN models likewise improved. SVM experienced slight gains in accuracy and precision after bootstrapping, whereas GNB and LR maintained their performance levels.
The RGXE Voting model proposed in this study emerges as the top-performing model after bootstrapping, as shown in Table 7, which also reveals that the accuracy of most methods increased by up to 5%. Its resilience and efficacy in addressing class imbalance and capturing intricate patterns within the LBP dataset are evident from its remarkable accuracy and well-balanced precision, recall, and F1 scores. These attributes make RGXE Voting a robust and reliable choice for classifying LBP.
In Figure 5, the radar chart visualizes the performance of the different methods, showing that our proposed approach consistently leads to superior outcomes across all evaluated techniques. The chart demonstrates the broader spectrum of performance accuracy achieved using our novel method, underscoring its success in improving the accuracy scores across the board for the methods examined.
Figure 6 depicts the outcomes of the evaluation performed using a confusion matrix for the ML methods when employing the bootstrapping technique. The results demonstrate the outstanding performance of the proposed RGXE approach, with a notably high rate of accurate classification and minimal errors compared with alternative methods. Figure 7 illustrates a performance comparison using histogram-based bar charts, delineating the effectiveness of the newly proposed approach against established machine learning techniques. The graph highlights the performance of the new method and offers insights into its efficacy relative to conventional methods, underscoring the potential superiority of the proposed approach for achieving desirable outcomes.

10-Fold Cross-Validation Analysis
Following the application of 10-fold cross-validation, the validation results outlined in Table 8 highlight the superiority of the RGXE method. The analysis shows that the RGXE model consistently achieves a high k-fold accuracy, surpassing 99%, with minimal standard deviation. This strong validation supports the generalizability of our proposed RGXE method for the identification of LBP.
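The 10-fold protocol can be sketched with scikit-learn's `cross_val_score`. A single RF classifier and synthetic data serve as placeholders here; the paper applies the same scheme to each of its tuned models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data matching the dataset's shape (310 rows, 12 features).
X, y = make_classification(n_samples=310, n_features=12, random_state=42)

clf = RandomForestClassifier(random_state=42)
# cv=10 splits the data into 10 folds; each fold is held out once for testing.
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the mean together with the standard deviation, as Table 8 does, indicates both the expected accuracy and how stable it is across folds.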

Assessment of Computational Complexity Performance
The assessment results for the computational complexity of the implemented ML methods are listed in Table 9. The runtimes varied across methods: RF required 0.034 s, GB 0.187 s, and DT completed its computations in 0.009 s. SVM had a longer runtime of 1.014 s, whereas the proposed RGXE method exhibited the highest runtime at 2.406 s. These findings highlight the diverse computational efficiencies of the methods, with RGXE demonstrating a longer runtime but potentially offering distinct advantages in detecting LBP.
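A runtime measurement of this kind can be sketched with Python's `time.perf_counter`; absolute timings depend on hardware, so the value printed below is illustrative, not a reproduction of Table 9.

```python
import time

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Placeholder data matching the dataset's shape.
X, y = make_classification(n_samples=310, n_features=12, random_state=42)

start = time.perf_counter()           # high-resolution monotonic clock
DecisionTreeClassifier(random_state=42).fit(X, y)
elapsed = time.perf_counter() - start
print(f"DT fit time: {elapsed:.3f} s")
```

`perf_counter` is preferred over `time.time` for benchmarking because it is monotonic and has the highest available resolution.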

Comparison with Previous Studies
In this sub-section, we further compare the performance of our proposed method with the results of previous work on the same dataset, as shown in Table 10. For a thorough comparison, we evaluated the efficacy of our RGXE model with bootstrapping against established methodologies. Previous studies relied predominantly on classical machine learning, achieving a maximum accuracy of 94%. By contrast, our RGXE model, which leverages a novel bootstrapping approach, exceeds these benchmarks with a performance score of 99%. This analysis establishes the superiority of our proposed methodology over existing state-of-the-art methods.

Additionally, we validated our proposed model on an independent dataset, "Anemia Types Classification" [44]. This dataset comprises more heterogeneous data, featuring 15 broad demographic characteristics across 1281 instances. The performance results are presented in Table 11: the proposed model predicted the type of anemia with an accuracy of up to 98%. This additional experiment confirms that our proposed model performs well even on new datasets.

Limitations of the Study
The proposed RGXE for the early detection of lower back pain using spinal anomalies demonstrates promising results; however, it has certain limitations. One notable limitation is its relatively high computational time, recorded at 2.41 s, which can be a constraint in scenarios requiring real-time analysis or the rapid processing of large datasets. Future work can focus on optimizing the algorithm to reduce computational time, potentially through techniques such as parallel processing or more efficient data handling. Additionally, the current model utilizes the original features for building the detection models. While this approach has yielded significant results, there is potential for further enhancement through advanced feature engineering: incorporating more sophisticated features derived from the raw data could improve the model's accuracy and robustness, leading to more reliable early detection of lower back pain.
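One route to the parallel processing mentioned above, assuming the scikit-learn components of the ensemble, is the `n_jobs` parameter, which distributes tree fitting across CPU cores; this is a sketch of the mechanism, not a measured speed-up.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder data matching the dataset's shape.
X, y = make_classification(n_samples=310, n_features=12, random_state=42)

# n_jobs=-1 fits the forest's trees in parallel on all available cores,
# one way to reduce the runtime cost noted above.
rf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=42)
rf.fit(X, y)
print(len(rf.estimators_))  # 200
```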

Conclusions
This study introduces a novel ML method called RGXE for the early detection of LBP. Using a Kaggle dataset with 310 rows and 12 columns, including spinal anomaly features, we balanced the data using SMOTE and bootstrapping techniques. Advanced ML models were then applied to the bootstrapped data to evaluate their performance, and validation was conducted via 10-fold cross-validation to ensure the reliability of our results. Furthermore, we report the computational complexity of each method to understand its efficiency in terms of runtime. Our extensive experiments demonstrate that the proposed RGXE Voting outperforms other ML methods as well as previously reported results, achieving an impressive accuracy of 0.99. Fine-tuning was performed for each method to further optimize performance. This study makes a substantial contribution to the field of early LBP detection by presenting a robust and efficient approach with the potential to revolutionize healthcare practices.
In future work, we aim to advance our method by decreasing its computational runtime and by exploring feature engineering as well as explainable artificial intelligence, not only to improve model performance but also to provide the logic and explanation behind each decision. This will ultimately increase health practitioners' trust in the decisions made by the model.

Figure 1 .
Figure 1. Architecture of our novel proposed research methodology.

Figure 3 .
Figure 3. Structure of our proposed ensemble model.

Figure 4 .
Figure 4. Histogram-based comparisons of all models' performances before bootstrapping.

Figure 5 .
Figure 5. Radar chart-based comparison of ML model performance.

Figure 6 .
Figure 6. Confusion matrix results of ML methods.

Figure 7 .
Figure 7. Histogram-based comparisons of all models' performances after bootstrapping.

Table 1 .
Literature summary of previously published works examined for comparative analysis.

Table 2 .
Analysis of hyperparameter tuning of applied machine learning models.

Table 3 .
Specification details of our experimental setup.

Table 4 .
Performance of ML models before bootstrapping.

Table 5 .
Performance of ML model with original data.

Table 6 .
Performance of ML models after bootstrapping.

Table 7 .
Performance of ML models before and after bootstrapping.

Table 8 .
Performance evaluation based on variations in the 10-fold cross-validation mechanism.

Table 9 .
Analysis of computational complexity in terms of runtime for the applied approaches.

Table 10 .
Performance comparison of our proposed RGXE with previous studies using the same dataset.

Table 11 .
Performance comparison of our proposed RGXE on an independent dataset.