Machine Learning Identifies Chronic Low Back Pain Patients from an Instrumented Trunk Bending and Return Test

Nowadays, the better assessment of low back pain (LBP) is an important challenge, as it is the leading musculoskeletal condition worldwide in terms of years of disability. The objective of this study was to evaluate the relevance of various machine learning (ML) algorithms and Sample Entropy (SampEn), which assesses the complexity of motion variability in identifying the condition of low back pain. Twenty chronic low-back pain (CLBP) patients and 20 healthy non-LBP participants performed 1-min repetitive bending (flexion) and return (extension) trunk movements. Analysis was performed using the time series recorded by three inertial sensors attached to the participants. It was found that SampEn was significantly lower in CLBP patients, indicating a loss of movement complexity due to LBP. Gaussian Naive Bayes ML proved to be the best of the various tested algorithms, achieving 79% accuracy in identifying CLBP patients. Angular velocity of flexion movement was the most discriminative feature in the ML analysis. This study demonstrated that: supervised ML and a complexity assessment of trunk movement variability are useful in the identification of CLBP condition, and that simple kinematic indicators are sensitive to this condition. Therefore, ML could be progressively adopted by clinicians in the assessment of CLBP patients.


Introduction
Low back pain (LBP) is the leading cause of a high number of years lived with disability worldwide. In both 10-24 and 50-74-year-old age groups, LBP is typically responsible for the loss of an entire year of full health [1]. A better understanding of this musculoskeletal condition and its complexity is therefore essential for clinicians involved in the care of patients with LBP.
LBP, especially chronic LBP (CLBP), leads to a fear of movement and causes patients to limit their activities of daily living and social participation to avoid pain [2,3]. The sedentary 1.
The use of a set of three IMUs should provide more information about the b&r test and allows the whole lumbopelvic-hip complex to be examined. Some authors placed the IMUs at L2 and S2 vertebrae to measure low lumbar flexion-extension movements [22], while others chose T10-12 and S2 to be able to measure the local dynamic stability, coordination, and variability of the lumbar spine in repeated flexionextension movements [23]. We chose T12 and S2 so to consider the entire lumbar spine for a given movement. Differences between the time series of different sensors can provide information about the relative angular velocity and acceleration between two anatomical landmarks. Typically, for sagittal plane angular velocities, the difference between the data from a sensor placed at the twelfth thoracic vertebra and a sensor placed at the second sacral vertebra should provide an estimate of the lumbar angular velocity. From a clinical perspective, the use of three IMUs placed at different points may help to identify the most relevant location for a single IMU, which may be useful when time constraints apply.

2.
Two main techniques were used in this study. The first is a standard statistical analysis, which consists of computing the SampEn from the IMUs time series in two groups-NLBP subjects and CLBP patients-and comparing them. The second technique is ML, which has been used for about 40 years in the study of LBP [13], especially in the field of medical imaging and clinical data analysis for diagnostic and decision-making purposes [14,15]. Note that SampEn will be part of the data used by ML to identify CLBP patients. 3.
We believe that a first step toward integrating the clinical interpretation of a test, such as the b&r test, into a biopsychosocial model must be to examine its ability to discriminate between CLBP patients and NLBP subjects. A recent study using a particular supervised ML algorithm (Support Vector Machine, SVM) to analyze IMU data has already shown that a kinematic test of the lumbar spine is able to discriminate NLBP subjects from LBP patients and classify them according to their risk of chronicity, i.e., between high risk and medium to low risk, with an accuracy of >75% [12]. Moreover, it has been shown in [24] that SVM can detect neck pain from rotational head movements with an accuracy of 82%. These last two studies show that a diagnostic analysis using ML algorithms supplied with kinematic parameters is a promising way to investigate these spinal conditions further. Our work focuses on the kinematic signature of patients suffering from CLBP and, more specifically, on the complexity of their variability during repetitive movements of the trunk along the lines of [20].
The main question addressed by this study is: "Can an instrumented b&r test identify CLBP patients by resorting to ML algorithms?" Our hypothesis is that algorithms such as SVM can accurately discriminate CLBP patients from NLBP subjects using raw data from three IMUs and SampEn values.

Population
Data were collected from a group of CLBP patients and from a matched group of healthy NLBP subjects. The study protocol was approved by the Intercommunale de Santé Publique du Pays de Charleroi Ethics Committee (ISPPC/OM008) under the number B325-2020-43666 and complied with the Helsinki Declaration on the Ethical Principles for Medical Research Involving Human Beings. All of the patients and subjects received an information sheet explaining the purpose of the study and gave informed consent before participation.
Twenty CLBP patients were recruited on a voluntary basis between the 3 November and the 1 December 2020 from the pool of patients treated for CLBP at the University Hospital of Charleroi (CHUC) in the "Sport Santé" department in Monceau-sur-Sambre, Belgium. The inclusion criteria were: the presence of LBP diagnosed by a physician and lasting longer than three months, ability to perform three trunk flexions with the lower limbs extended, aged between 18 and 65 years, body mass index (BMI) between 18 and 35 kg·m −2 , and a pain score of less than 8/10 on a verbal Numerical Rating Scale (NRS) the day of the test. The exclusion criteria were: the presence of tumor, fracture, neurological signs (loss of strength and/or sensitivity), decrease or abolition of reflexes, and presence of pain of neuropathic origin evaluated by a positive DN4 ("diagnostic de Douleur Neuropathique", i.e., neuropathic pain diagnostic) questionnaire [25], recent trauma, surgery at the spinal level, vertigo and balance disorders due to positional changes, musculoskeletal disorders in another region that could interfere with the b&r test, or systemic and metabolic disorders.
Twenty healthy NLBP subjects were recruited on a voluntary basis between 1 December and 15 December 2020 from CHUC staff or their personal acquaintances. They were matched with the CLBP patients using the following criteria: age, BMI, and physical activity level. For these subjects, the exclusion criteria were: the presence of LBP in the past year, history of spinal surgery, musculoskeletal disorders in another region that could interfere with the b&r test, systemic and metabolic disorders, and the use of analgesics.

Protocol, Data Collection and Preprocessing
The general characteristics were collected from all participants: gender, age, BMI, and physical activity level were collected through the Global Physical Activity Questionnaire (GPAQ) [26]. Two experienced examiners (physiotherapists) were involved in the protocol. A first examiner (examiner #1) gave standardized instructions to each participant and showed a video demonstrating how to perform the b&r test. The same examiner (#1) cleaned the skin with cotton wool and ether at the sites where the three IMUs were taped. These were attached with hypoallergenic double-sided adhesive tape ( Figure 1): Sensor #1 (SENS1) on the opposite to the spinous process of the twelfth thoracic vertebra (T12); Sensor #2 (SENS2) opposite to the second sacral vertebra (S2) on a horizontal line connecting the posterior-superior iliac spines; and Sensor #3 (SENS3) on the lateral side of the thigh, 10 cm below the greater trochanter.
center of a stool with the fingertips of both hands, followed by an immediate trunk extension to return to the starting position while focusing on a target in front of them ( Figure  1). This movement was repeated continuously, as quickly and comfortably as possible. The test was performed once and lasted 70 s. The participants were allowed to stop the test before the end of the 70 s if they felt unable to continue; however, in this case, they were then excluded from the study. The examiner was present near the participant during the test but gave no verbal instructions. A second examiner (examiner #2) recorded the data from the IMUs on a computer located less than 3 m from the patient or subject, with the IMUs connected to the computer via a cable [27]. Examiner #2 informed the participant of the time remaining every 10 s. The computer screen was visible to neither examiner #1 nor the participant. The data were stored on the computer's hard disk drive for later analysis.
The IMUs used are part of a homemade system called DYSKIMOT (for Motor Dyskinesia), which has been presented in detail previously in [27]. The DYSKIMOT system software (v. 3.1) recorded the data from the three IMUs with a sampling frequency of 100 Hz. The system is based on commercial IMUs (LSM9DS1, SparkFun Electronics, Niwot, CO, USA) equipped with a triaxial accelerometer, gyroscope, magnetometer, and thermometer. Time series of accelerations (Acc X, Y, Z) along each of the 3 axes of each sensor and angular velocities (Gyr X, Y, Z) were recorded. The DYSKIMOT software also computed the angular accelerations (AccA X, Y, Z) in real-time by numerical differentiation (method) and the corrected angles (AngC X, Y, Z) by numerical integration and linear drift subtraction of the angular velocities. Typical traces are shown in Figure 2. The DYSKIMOT IMU Each participant was then placed in a standing position with their arms by their sides, as shown in Figure 1. To perform the b&r test, the participants were instructed to bend the trunk forward (flexion) without flexing their lower limbs and to touch a target in the center of a stool with the fingertips of both hands, followed by an immediate trunk extension to return to the starting position while focusing on a target in front of them ( Figure 1). This movement was repeated continuously, as quickly and comfortably as possible. The test was performed once and lasted 70 s. The participants were allowed to stop the test before the end of the 70 s if they felt unable to continue; however, in this case, they were then excluded from the study. The examiner was present near the participant during the test but gave no verbal instructions.
A second examiner (examiner #2) recorded the data from the IMUs on a computer located less than 3 m from the patient or subject, with the IMUs connected to the computer via a cable [27]. Examiner #2 informed the participant of the time remaining every 10 s. The computer screen was visible to neither examiner #1 nor the participant. The data were stored on the computer's hard disk drive for later analysis.
The IMUs used are part of a homemade system called DYSKIMOT (for Motor Dyskinesia), which has been presented in detail previously in [27]. The DYSKIMOT system software (v. 3.1) recorded the data from the three IMUs with a sampling frequency of 100 Hz. The system is based on commercial IMUs (LSM9DS1, SparkFun Electronics, Niwot, CO, USA) equipped with a triaxial accelerometer, gyroscope, magnetometer, and thermometer. Time series of accelerations (Acc X, Y, Z) along each of the 3 axes of each sensor and angular velocities (Gyr X, Y, Z) were recorded. The DYSKIMOT software also computed the angular accelerations (AccA X, Y, Z) in real-time by numerical differentiation (method) and the corrected angles (AngC X, Y, Z) by numerical integration and linear drift subtraction of the angular velocities. Typical traces are shown in Figure 2. The DYSKIMOT IMU casing was 3D printed using PLA (Polylactic Acid), the most widely used polymer in 3D printing. Other polymers may have been used, such as ABS (Acrylonitrile butadiene styrene) or TPU (Thermoplastic polyurethane). The latter choice made it possible to produce flexible parts that would presumably be more comfortable for the user. casing was 3D printed using PLA (Polylactic Acid), the most widely used polymer in 3D printing. Other polymers may have been used, such as ABS (Acrylonitrile butadiene styrene) or TPU (Thermoplastic polyurethane). The latter choice made it possible to produce flexible parts that would presumably be more comfortable for the user. As in [2], we considered the first 10 s of the time series as the warm-up period, and any data collected after 70 s were omitted from analyses. Therefore, only data between 10 and 70 s were used for the analysis; all of the time series lasted 1 min.

SampEn and Complexity Factors
An R (v 4.1.0, R Core Team, Auckland, New Zealand) routine was developed to calculate SampEn for all recorded time series. In accordance with Yentes [19], we used the following parameters in the algorithm (R-Studio) for calculating the SampEn: vector length m = 2, tolerance r = 0.2 SD and the number of points in the data series N = 6000. The details can be found in [20]. The values obtained from Gyr and Acc, respectively, were referred to as SampEn Gyr X, Y, Z and SampEn Acc X, Y, Z.
In the case of the b&r test, the Gyr Y time series should have considerable practical implications since it follows the direction of the performed movement. Therefore, specific computations were performed for this time series. The Gyr Y of SENS2 was subtracted from the same time series of SENS1 (Gyr Y) and SENS3 (−Gyr Z; SENS3 has a different orientation, see Figure 1). SampEn was computed for the resulting time series. The calculation of SampEn using these data should provide a relevant estimation of the complexity of angular velocity variations in lumbar and hip -flexion during the b&r test. The SampEn values obtained in this way were termed the lumbar flexion complexity factor (LCF) and the hip flexion complexity factor (HCF), and together formed what we called complexity factors (CF).
Statistical comparisons between all SampEn and CF in CLBP patients and NLBP subjects were performed with a t-test or with a Mann-Whitney rank sum when the Shapiro-Wilk normality test failed. The mean, SD, standard error of measurement ( where N is the sample size), 95% confidence interval (CI denotes half-width), and minimum detectable change ( . √ ) were calculated for the differences in parameter values between the two groups. Statistical tests were performed with SigmaPlot software (v. 14.0, Systat Software, San Jose, CA, United States of America) with a significance level of 5%. As in [2], we considered the first 10 s of the time series as the warm-up period, and any data collected after 70 s were omitted from analyses. Therefore, only data between 10 and 70 s were used for the analysis; all of the time series lasted 1 min.

SampEn and Complexity Factors
An R (v 4.1.0, R Core Team, Auckland, New Zealand) routine was developed to calculate SampEn for all recorded time series. In accordance with Yentes [19], we used the following parameters in the algorithm (R-Studio) for calculating the SampEn: vector length m = 2, tolerance r = 0.2 SD and the number of points in the data series N = 6000. The details can be found in [20]. The values obtained from Gyr and Acc, respectively, were referred to as SampEn Gyr X, Y, Z and SampEn Acc X, Y, Z.
In the case of the b&r test, the Gyr Y time series should have considerable practical implications since it follows the direction of the performed movement. Therefore, specific computations were performed for this time series. The Gyr Y of SENS2 was subtracted from the same time series of SENS1 (Gyr Y) and SENS3 (−Gyr Z; SENS3 has a different orientation, see Figure 1). SampEn was computed for the resulting time series. The calculation of SampEn using these data should provide a relevant estimation of the complexity of angular velocity variations in lumbar and hip -flexion during the b&r test. The SampEn values obtained in this way were termed the lumbar flexion complexity factor (LCF) and the hip flexion complexity factor (HCF), and together formed what we called complexity factors (CF).
Statistical comparisons between all SampEn and CF in CLBP patients and NLBP subjects were performed with a t-test or with a Mann-Whitney rank sum when the Shapiro-Wilk normality test failed. The mean, SD, standard error of measurement (SEM = SD √ N , where N is the sample size), 95% confidence interval (CI denotes half-width), and minimum detectable change (MDC = 1.96 √ 2 SEM) were calculated for the differences in parameter values between the two groups. Statistical tests were performed with SigmaPlot software (v. 14.0, Systat Software, San Jose, CA, United States of America) with a significance level of 5%.

Data Segmentation
The first way to apply ML is to use and analyze a given time series as a whole. A second way is to divide the time series into cycles and use them as independent data related to a particular participant. Since we have explored both options, further details on segmentation into cycles are provided below. The different b&r cycles can be clearly seen in Figure 2. Each cycle corresponds to the repetition of b&r trunk movements carried out by a participant. The start and end of each cycle were determined by identifying the successive minima of the Acc Z time series for SENS1, i.e., the time series with the highest signal-to-noise ratio.
The segmentation algorithm, which divides the time series into distinct cycles, consists of five steps: (1) averaging the original Acc Z time series using a rolling centered window (width: 25 points or 0.25 s); (2) calculating the global amplitude and the exact value of the threshold (40% of the amplitude above the minimum); (3) extracting data below this threshold to avoid the processing of local minimum above this threshold; (4) detecting the minima and computing the cycle limits as the average position of two consecutive minima; (5) finally, all cycles were normalized by linear interpolation to 450 points, which is the maximum number of points for a cycle in our data set. The last 450 points were referred to as "itime".
In the Results section, ML analysis based on 1-min time series is referred to as "whole sequences", and ML analysis based on individual cycles is referred to as "cycle segmentation".

Discrimination of NLBP and CLBP Participants
The discriminating power of classification by the ML algorithm was assessed by computing accuracy and Area Under Curve (AUC) values for binary classification. Several classification algorithms were used, all belonging to the supervised ML, as each participant was labeled with either NLBP or CLBP condition. The ML algorithms studied were Linear Support Vector Machines (Linear SVM), Non-linear Support Vector Machine Radial Basis Function (SVM RBF), Random Forest (RF), Gaussian naive Bayes (GaussianNB), K-neighbors (KNN), Adaptive Boosting (AdaBoost), and Decision Tree (DT) [28]. The grid-search method [29] was used to optimize the hyperparameters for each ML algorithm. The hyperparameters are listed in Table 1. The ML algorithms and the grid search are implemented by the v1.1 Scikit-learn library.  Statistical describers were maximum, minimum, mean, median, 1st quartile, 3rd quartile, and SD for each available time series (each component X, Y, and Z was treated as an independent time series). These parameters, referred to as raw data features, were used as discriminators for ML algorithms. The performance evaluation of each ML algorithm was measured after an n-fold cross-validation process, scaling of the time series, and 100 training repetitions with a random ordering of NLBP and CLBP.
The computed SampEn and CF were also included as features in the ML algorithms in a second step. The accuracy and AUC of the ML algorithms were computed using only raw data features and only SampEn or CF. The added value of SampEn and CF in identifying CLBP was assessed.

Most Discriminative Features
Two methods were used to identify the best discriminating feature: Sequential Feature Selector forward (SFS Forward) and Sequential Feature Selector backward (SFS Backward) [30]. The SFS Forward accumulates the best-performing features one by one and creates a hierarchy from the best-performing feature to the worst-performing feature. The SFS Backward starts with all features and removes the worst-performing features one by one. This results in a hierarchy from the longest-remaining feature to the least-remaining feature. Each SFS was run 700 times (7 algorithms × 100 training repetitions).

SampEn & CF
Four of the 18 calculated SampEn values (six time series and 3 IMUs) showed significant differences between CLBP and NLBP groups. HCF showed significant differences between CLBP and NLBP groups, whereas LCF did not. These statistically significant results are shown in Table 2, while the others have been omitted for simplicity. for: the Y-axis of the gyroscope from the sensor 1 (Gyr Y SENS1), the Z-axis of the gyroscope from the sensor 2 (Gyr Z SENS2), Hip Complexity Factor (HCF), the Y-axis of the gyroscope from the sensor 2 (Gyr Y SENS2), the X-axis of the accelerometer from the sensor 2 (Acc X SENS2).

Cycle Segmentation
The result of the segmentation process was a data set of 1678 cycles labeled with their respective index (CLBP patient or healthy NLBP subject). Figure 3 show the principle of segmentation. Figure 4 shows all the computed cycles for Gyr Y of SENS2 as a function of time in the form of a mean cycle and a shaded area indicating the cycle's SD.

Machine Learning
The optimal hyperparameter values for each ML algorithm are highlighted in bold in Table 1. The performance indicators of all the ML algorithms used are listed in Table 3.  gyroscope from the sensor 2 (Gyr Z SENS2), Hip Complexity Factor (HCF), the Y-axis of the gyroscope from the sensor 2 (Gyr Y SENS2), the X-axis of the accelerometer from the sensor 2 (Acc X SENS2).

Cycle Segmentation
The result of the segmentation process was a data set of 1678 cycles labeled with their respective index (CLBP patient or healthy NLBP subject). Figure 3 show the principle of segmentation. Figure 4 shows all the computed cycles for Gyr Y of SENS2 as a function of time in the form of a mean cycle and a shaded area indicating the cycle's SD.   gyroscope from the sensor 2 (Gyr Z SENS2), Hip Complexity Factor (HCF), the Y-axis of the gyroscope from the sensor 2 (Gyr Y SENS2), the X-axis of the accelerometer from the sensor 2 (Acc X SENS2).

Cycle Segmentation
The result of the segmentation process was a data set of 1678 cycles labeled with their respective index (CLBP patient or healthy NLBP subject). Figure 3 show the principle of segmentation. Figure 4 shows all the computed cycles for Gyr Y of SENS2 as a function of time in the form of a mean cycle and a shaded area indicating the cycle's SD.   The best classification accuracy was obtained using the whole sequences with the GaussianNB algorithm. In this case, an accuracy of 79% and an AUC value of 0.85 were obtained. Gaussian naive Bayes process each feature, computing a normal distribution with their mean and standard deviation. To predict the class of a new subject, the algorithm computes the likelihood of the subject's raw data belonging to the gaussian distribution of NLBP and CLBP. The Gaussian naïve Bayes could compare the scores obtained for each class and decide the best-fitted class for the subject. As the raw data were on a times series, it seems reasonable that the data fit well with the GaussianNB algorithm. Table 4 shows the effects of using SampEn or CF as features instead of raw IMU data for the whole sequences. The performance globally decreased with SampEn. The best algorithm (GaussianNB) achieved 64% Accuracy and an AUC value of 0.69. Using CF, the best algorithm (SVM RBF) achieved 74% Acc and an AUC value of 0.80. Overall, cycle segmentation produced the worst results, so for simplicity, we have omitted these data from the results table. Table 4. Accuracy and AUC scores for CLBP-NLBP classification using the whole sequences with different features.

Most Discriminative Features
The minimum value of Gyr Y measured by SENS2 was the most discriminative feature (first 355 times in 700 runs), followed by the Q3 of Acc X measured by SENS3. The SD of Acc Y measured by SENS2 was the most discriminating feature of the second test (114 times in 700 runs). The complete results can be found in Table 5.

Discussion
Our results showed that ML was able to discriminate CLBP patients from NLBP subjects with good accuracy, especially for the GaussianNB algorithm. The identification of the most discriminating features shows that Gyr Y SENS2 min is the best parameter: CLBP patients have a lower flexion angular velocity (in norm) in the return phase of the b&r test than NLBP subjects; the angular velocity was measured by a sensor placed in front of the second sacral vertebra. A wide variety of models for the kinematic analysis of the lumbar region are available in the literature [31]. Several authors have proposed positioning two sensors, one near T12 and the other near S2, to characterize the movements of the lumbar spine in the sagittal plane. Some used this model with an optoelectronic system [32], while others used IMUs with or without comparison to an optoelectronic system [9,11,32,33]. In the positioning of our IMUs, SENS2 (placed at S2) provided the maximum number of discriminative features. SENS1 (placed at T12) alone did not provide us with discriminative data, but the HCF, calculated from SENS2 (S2) and SENS3, did. The optimal number of sensors remains an open discussion, but SENS2 should be preferred for clinical applications. According to our results, the single SENS2 seems to be sufficient to clinically discriminate between CLBP and NLBP subjects; this result may seem attractive for clinical use by reason of its simplicity; however, due to the limited sample of our study, this finding should be treated with caution. Nonetheless, we advocate the use of multiple sensors to analyze lumbopelvic and hip movements in clinical research in order to expand our understanding of the disturbances of these movements in the clinical setting.
Due to the early onset of fatigue, the duration of the b&r test (70 s) could be a source of bias if a CLBP patient is severely physically deconditioned. However, the test can be interrupted after 20 s, and reliable SampEn values can be obtained despite a three-fold shorter time series [20]. Future clinical studies should be conducted to understand the effects of spinal erector muscle fatigue on the b&r test. The use of the b&r test in acute or sub-acute patients has also not been clinically studied.
Some values of SampEn and CF were significantly different between CLBP patients and NLBP subjects when the b&r test was performed. SampEn showed a significant difference for the Gyr Y of SENS1, located at the lower thoracic level, and Gyr Y SENS2, located at the upper sacral level. These differences suggest a lower complexity of variability in angular velocity measured about the transverse axis, which corresponds to the main b&r movement. In other words, CLBP patients show more stereotyped movements of the lumbo-pelvic complex, which is consistent with the optimal variability paradigm of [21,34]: the healthier an individual, the more variability they can show in performing a movement, which is an indication of their adaptability. HCF also showed a significant decrease in CLBP patients, as did SampEn for Acc X of SENS2. In contrast, SampEn for Gyr Z showed an increase. The former describes the complexity of the pelvic tilting movement about an antero-posterior axis, and the latter the complexity of the vertical pelvic displacement, which is an obvious part of the b&r test. As with lumbar muscle activity in a previous study [35], the observed behaviors suggest a tendency toward stereotyped movements in CLBP patients. The literature suggests that the increase or decrease in complexity may indicate two different pathological phenotypes: instability or hypercontrol [36]. It is quite possible that the observed phenomenon of either upward or downward changes in CF is responsible for the less and even non-discriminative results in distinguishing CLBP patients from NLBP participants. In future studies, it would be appropriate to pre-classify CLBP patients using existing relevant clinical tests [37][38][39] to investigate the ability of the b&r test to highlight these two phenotypes.
Despite these observed significant differences, the mean differences between SampEn and HCF in CLBP patients compared with NLBP subjects are so close to those of SEM and MDC that the clinical interest of SampEn can be questioned. Furthermore, although SampEn is easy to compute with any modern computer, a deviation of this parameter from a norm not yet established may be very difficult or impossible to detect by a clinician in practice. It is important to identify more standard kinematic parameters (position, velocity, acceleration) representative of the CLBP population. Visualization of all cycles performed during the b&r test in a single graphical representation has already shown that NLBP participants, for example, achieved overall higher values for angular velocity about the flexion axis than CLBP patients. This component of angular velocity proved to be the most discriminating feature in our study. Therefore, focusing on flexion movement velocity may be a good recommendation for clinicians when managing CLBP patients.
Regardless of the most discriminating features, the ML algorithms were able to accurately identify CLBP patients when performing a b&r test. The algorithm with the best performance is GaussianNB, which is supplied with the whole time series recorded by the IMUs. It achieves an accuracy of 79% and an AUC value of 0.85. It is worth noting that good accuracy can also be achieved when an SVM RBF algorithm is provided with CF from the whole time series. In this case, an accuracy of 74% and an AUC value of 0.80 are achieved. The better performance of the SVM algorithm might be related to a well-known property of this method, which tends to perform better than other ML algorithms when the number of features is small [40].
Abdollahi et al. were able to distinguish LBP patients from NLBP subjects during a kinematic test using ML analysis-specifically SVM-of time series collected via an IMU attached to the sternum [12]. They were also able to classify them according to the risk of chronicity [12]. Similar results with lower accuracy were obtained with the same test but using classical statistical analysis (linear discriminant analysis) [41]. In our study, we found that the SENS2 had higher discriminating power in both classical statistical analysis and ML analysis. However, we did not investigate whether our method is able to classify patients according to their risk of chronicity. The differences between [12] and our study are mainly due to the different experimental protocols. Although we use a sagittal plane test, our test is performed over a longer period of time (60 s of recording) with a sensor recording frequency (100 Hz) five times higher than [12]. The amount of data collected this way allowed us to analyze the complexity of the movement variation. In our study, we did not measure the center of mass motion, and our NLBP population consisted of only 20 subjects, which severely limits the learning ability of the ML process. However, both studies demonstrate the interest in using ML in spinal kinematic analysis to identify and assess CLBP patients. It remains to be shown that these tools are also useful for the therapeutic follow-up of these patients.
As a counterpoint to using biomechanics to better understand LBP conditions, a metamodel created with the help of a panel of 27 multidisciplinary experts is used to illustrate the number of factors that contribute to LBP, disability, quality of life, and other outcomes as well as the number and strength of their interactions [42]. With a problem as multifactorial and complex as LBP, the authors of this counterpoint emphasize the need to integrate the interactions of biopsychosocial factors into research to improve the management of patients with LBP. This suggests that in future works, it will be essential to integrate these biopsychosocial factors and the state of the internal environment into the raw IMU data and the variability indicators computed from them when performing a b&r test to better understand its discriminating power. This may be achieved via analysis from non-invasive portable electrochemical sensors [43] of fluids such as sweat and its relevant components. An ML algorithm can help achieve this goal, as both quantitative and qualitative data can be used.

Conclusions
The objective of this study was to evaluate the relevance of various machine ML algorithms and SampEn in the identification of LBP conditions. Using the raw data from three IMUs and the SampEn values obtained during a b&r test, the results showed better abilities to discriminate CLBP patients from NLBP subjects for the GaussianNB ML algorithm than the SampEn discriminant values alone. This study demonstrated that: supervised ML and a complexity assessment of trunk movement variability are useful in the identification of CLBP conditions, and that simple kinematic indicators are sensitive to the latter condition.
Regardless of the pathology studied, our study can shed light on the advantages and disadvantages of "standard" statistical analysis and ML-based approaches in clinical applications. In standard statistical analysis, the experimenter/clinician a priori identifies relevant indices and computes them. If the indices are well chosen, they will show statistically significant differences between groups, typically between healthy subjects and patients. By accepting arbitrariness in the choice of indices, such a methodology can show significant results for small populations, typically groups of 20 subjects, which can be easily recruited from a clinician's patients and acquaintances. One of the advantages of ML is that the algorithm itself finds the most discriminative indices in a non-arbitrary way, but the population has to be larger than in usual statistical analysis, which can make ML complicated to apply in daily clinical practice. What we have shown is that ML may already give good results in discriminating small but accurately selected groups and that it may also identify the main discriminative features of the performed measurements. The latter features might, in turn, be used in quick clinical tests based on the measurement of a single parameter and threshold value. Hence, ML may provide key parameters to be included in more standard approaches already used in daily clinical practice.