Article

Using Natural Language Processing to Identify Low Back Pain in Imaging Reports

1 Research Competency Milestones Program of School of Medicine, CHA University School of Medicine, Bundang-gu, Seongnam-si 13488, Republic of Korea
2 Department of Biomedical Informatics, CHA University School of Medicine, Bundang-gu, Seongnam-si 13488, Republic of Korea
3 Department of Neurosurgery, CHA University School of Medicine, CHA Bundang Medical Center, Seongnam-si 13497, Republic of Korea
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2022, 12(24), 12521; https://doi.org/10.3390/app122412521
Submission received: 17 October 2022 / Revised: 29 November 2022 / Accepted: 6 December 2022 / Published: 7 December 2022
(This article belongs to the Special Issue Intervertebral Disc Regeneration II)

Abstract

A natural language processing (NLP) pipeline was developed to identify lumbar spine imaging findings associated with low back pain (LBP) in X-radiation (X-ray), computed tomography (CT), and magnetic resonance imaging (MRI) reports. From a dataset of 18,640 reports, a balanced sample of 300 X-ray, 300 CT, and 300 MRI reports was randomly drawn (stratified by imaging modality). A total of 23 radiologic findings potentially related to LBP were defined, and their presence was extracted from the radiology reports. In developing the NLP pipeline, section and sentence segmentation of the radiology reports was performed using a rule-based method, including regular expressions with negation detection. The dataset was randomly split into 80% for development and 20% for testing, and the model's extraction performance was evaluated using recall, precision, accuracy, and the F1 score. All four parameters were greater than 0.9 for all 23 radiologic findings, and all four were 1.0 for 10 findings (listhesis, annular fissure, disc bulge, disc extrusion, disc protrusion, endplate edema or Type 1 Modic change, lateral recess stenosis, Schmorl’s node, osteophyte, and any stenosis). For the seven potentially clinically important radiologic findings, the F1 score ranged from 0.9882 to 1.0. In this study, a rule-based NLP system identifying 23 findings related to LBP from X-ray, CT, and MRI reports was developed, and it showed good performance on all four scoring parameters.

1. Introduction

Low back pain (LBP) is defined as pain and discomfort localized under the ribs and above the inferior gluteal folds, with or without leg pain [1]. A total of 70–80% of adults experience LBP in some form during their lives [2]. More than 85% of people under 45 years of age experience at least one LBP symptom that requires medicine or interventional treatment [3]. LBP can be classified as acute or chronic, depending on its onset. Acute LBP usually occurs suddenly, and if the pain persists for more than 3 months, it can be called chronic LBP [4]. LBP is one of the most common causes of hospital visits and the second-leading cause of sick leave [5]. Because of its high direct and indirect costs, the health, social, and economic impacts on individuals, families, and society are significant [6]. Particularly, between 5% and 10% of chronic low back pain (CLBP) cases require high costs and long-term care [7]. It is important to develop tools to help patients with recently developed back pain accurately predict whether persistent pain will occur [8].
The most common methods to evaluate chronic LBP are X-radiation (X-ray) examinations, computed tomography (CT), and magnetic resonance imaging (MRI) [9]. The radiological findings identified from reviewing these images are interpreted by clinicians and recorded in a radiology report. A radiology report is a formal interpretation of a radiological examination and contains radiological findings related to the clinical diagnosis and treatment decisions [10,11]. However, many radiology reports are written in free text and remain unstructured, requiring a manual review to harvest clinical information. Manual extraction of information is labor-intensive and impractical for large-scale research. It is therefore important to create a pipeline for extracting clinical information from radiology reports.
Natural language processing (NLP) is defined as the understanding, analysis, and extraction of meaningful information from text (natural language) by computers [12]. NLP, a computer technology specializing in text processing, can be used to extract critical information from unstructured text in electronic health records [13]. NLP systems can be classified into three approaches: rule-based, machine learning-based, and hybrid [12]. For example, a Bidirectional Encoder Representations from Transformers (BERT)-based NLP pipeline (a state-of-the-art deep learning model for language processing) was used to identify important findings in intensive care chest radiograph reports [14]. Arnaud et al. developed an NLP system based on the BERT model, using free-text records from the University Hospital of Amiens-Picardy in France; the model learned contextual embeddings, and the utilization of the BERT method in the medical care field was discussed [15]. Tan et al. developed a rule-based NLP system and a machine learning-based NLP system to identify lumbar spine imaging findings related to LBP and compared their performance [16].
Previous studies left some points to be improved. First, while the machine learning approach has been actively studied, the rule-based approach has received less attention in NLP systems for extracting terms associated with LBP. The set-up cost of the rule-based approach is lower than that of the machine learning approach, and it requires a smaller training dataset. In previous studies, the rule-based approach presented high specificity but moderate sensitivity [16]. This study developed a rule-based NLP system improved over that used in previous research. Second, there is a need to develop NLP systems for extracting terms associated with LBP from CT radiology reports, as in previous studies NLP systems were not trained or tested on CT radiology reports [16,17,18,19,20,21]. In this study, the utilization of the NLP system was expanded to CT radiology reports for extracting terms associated with LBP.
This study aimed to develop an NLP system to recognize radiologic findings associated with LBP in X-ray, CT, and MRI radiology reports. A rule-based NLP system was trained and evaluated on radiology reports from patients who sought medical care for LBP. Such a system makes it possible to analyze the free text of radiology reports unique to a medical institution. Unstructured clinical information from the radiology reports was processed into a structured data form, which will be used for data-driven research on LBP. This study laid the foundation for extracting clinical data from free-text radiology reports and using it in various clinical studies.

2. Materials and Methods

2.1. Dataset

This retrospective study of lumbar spine imaging reports enrolled patients who visited CHA University Bundang CHA Medical Center between January 2011 and December 2020. A dataset was assembled from 18,640 reports, and a sample of 300 X-ray, 300 CT, and 300 MRI reports was randomly extracted. With reference to the sampling method used in a previous NLP study of LBP, approximately 5% of the reports were extracted from the original dataset and used for this study [17]. The dataset was split into training and test datasets in an 8:2 ratio, maintaining the ratios of the three modalities. The training dataset was used for developing the NLP system, and the test dataset was used to validate its information extraction performance. For any cases of discrepancy, a consensus was reached through discussion. This study was approved by the Institutional Review Board of the Bundang CHA Medical Center (IRB No. 2021-04-212).
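The stratified 8:2 split described above can be sketched as follows. This is a minimal illustration, not the authors' code; the modality labels ("XR", "CT", "MRI") and the random seed are assumptions.

```python
import random

def stratified_split(reports, train_frac=0.8, seed=42):
    """Split reports 80/20 while preserving the per-modality ratios.

    `reports` is a list of (modality, report_text) tuples; the modality
    labels used here are illustrative, not the study's actual codes.
    """
    rng = random.Random(seed)
    by_modality = {}
    for modality, text in reports:
        by_modality.setdefault(modality, []).append((modality, text))

    train, test = [], []
    for modality, items in by_modality.items():
        rng.shuffle(items)
        cut = int(len(items) * train_frac)  # 240 of 300 per modality
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test

# 300 reports per modality, as in the study's sampled dataset
reports = [(m, f"{m} report {i}") for m in ("XR", "CT", "MRI") for i in range(300)]
train, test = stratified_split(reports)
print(len(train), len(test))
```

Shuffling within each modality before cutting keeps the three modalities in the same 8:2 proportion in both partitions.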

2.2. Inclusion Criteria

The extraction of clinical information by the NLP system was carried out according to the inclusion criteria. The criteria comprised radiologic findings in the form of terms describing spine pathology, structural problems of the spine, and spinal disease. A total of 23 radiologic findings known from previous studies to be related to LBP were established, and each report was annotated by two physicians specializing in LBP diagnosis [18,22] (Table 1). Of the 23 radiologic findings, 7 items—disc extrusion, endplate edema or Type 1 Modic, any stenosis, central stenosis, foraminal stenosis, nerve root displaced/compressed, and lateral recess stenosis—represent the potentially clinically important findings [16].

2.3. NLP System

This NLP pipeline was developed using Python (3.8.10). Section and sentence segmentation of the radiology reports was performed using a rule-based approach with regular expressions and negation detection. A regular expression is a sequence of characters specifying a search pattern in text.
The “re” module is the Python standard library module for regular expressions. Using “re”, syntactic analysis was performed to develop clinical information extraction rules. The training dataset was analyzed to construct a customized dictionary of terms associated with the 23 predefined LBP findings. From the X-ray, CT, and MRI radiology reports, the NLP pipeline was developed to extract only terms relevant to LBP diagnosis. The pipeline extracted LBP information in three steps: text preprocessing, concept mapping, and summarizing.
First, in the text preprocessing step, tokenization was performed. The pipeline refined the unstructured raw text to remove unexpected white spaces and punctuation and filtered out sentences unrelated to LBP; in this step, section segmentation, sentence segmentation, and normalization were performed. Second, LBP concepts were mapped by referring to the constructed dictionary and linguistic rules. Finally, in the summarizing step, the mapped concepts were summarized in a predefined format (Figure 1).
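The concept-mapping and negation-detection steps can be illustrated with a minimal sketch using the “re” module. The term dictionary, negation cues, and sentence-splitting rule below are simplified assumptions, not the study's actual dictionary or rules (which cover all 23 findings).

```python
import re

# Illustrative term dictionary: 3 of the 23 predefined findings.
FINDING_TERMS = {
    "disc bulge": [r"disc\s+bulg(?:e|ing)", r"bulging\s+disc"],
    "listhesis": [r"(?:spondylo)?listhesis"],
    "central stenosis": [r"central\s+(?:canal\s+)?stenosis"],
}

# Simple negation cues, checked in the same sentence before the match.
NEGATION = re.compile(r"\b(?:no|without|negative\s+for|absence\s+of)\b")

def extract_findings(report):
    """Map each predefined finding to True/False for one report."""
    findings = {name: False for name in FINDING_TERMS}
    # Minimal sentence segmentation on sentence-ending punctuation.
    for sentence in re.split(r"[.;\n]+", report.lower()):
        for name, patterns in FINDING_TERMS.items():
            for pat in patterns:
                m = re.search(pat, sentence)
                if m and not NEGATION.search(sentence[:m.start()]):
                    findings[name] = True
    return findings

report = "L4-5: diffuse disc bulging. No central canal stenosis."
print(extract_findings(report))
```

A finding mentioned after a negation cue in the same sentence ("No central canal stenosis") is left as absent, which is the essence of rule-based negation detection.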
The whole process was iterated to improve the performance of the NLP system, and k-fold cross-validation was used for model validation. The NLP pipeline version with the best performance was validated on the test dataset. Finally, the terms associated with LBP were extracted from the X-ray, CT, and MRI reports using the patterned regular expressions, and the extracted findings were expressed in a matrix. The source code of the NLP system can be found at https://github.com/YJK96/Low-Back-Pain-NLP.git (accessed on 28 November 2022).
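The summarizing step (findings expressed as a matrix) and the k-fold validation loop can be sketched as follows; the finding names, fold count, and per-report results are illustrative assumptions, not the authors' implementation.

```python
import random

# Toy per-report extraction results (in the study these come from the
# regex pipeline; only 3 of the 23 findings are shown).
FINDINGS = ["disc bulge", "listhesis", "central stenosis"]
extracted = [
    {"disc bulge": True, "listhesis": False, "central stenosis": False},
    {"disc bulge": False, "listhesis": True, "central stenosis": True},
]

def findings_matrix(per_report, finding_names):
    """One row per report, one 0/1 column per predefined finding."""
    return [[int(d.get(name, False)) for name in finding_names] for d in per_report]

def k_fold_indices(n, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

print(findings_matrix(extracted, FINDINGS))  # [[1, 0, 0], [0, 1, 1]]
```

Each development-set report becomes one binary row of the matrix, and each fold of the cross-validation holds out a disjoint subset of those rows for validation.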

2.4. Statistical Analysis

The chi-square test was used to compare ratios, and the t-test was used to compare quantitative variables. The performance of the NLP system was measured using recall, precision, accuracy, and the F1 score.
The recall is defined as TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.
The precision is defined as TP / (TP + FP), where FP is the number of false positives.
The accuracy is defined as (TP + TN) / (TP + TN + FP + FN), where TN is the number of true negatives.
The F1 score is defined as (2 × recall × precision) / (precision + recall) and is used to estimate performance. This score is the harmonic mean of precision and recall, and it is considered more useful than accuracy because of the prevalence of class imbalance in text classification [23].
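The four definitions above can be computed directly from binary labels; this small sketch (with made-up example labels, not the study's data) shows the calculation.

```python
def classification_scores(y_true, y_pred):
    """Recall, precision, accuracy, and F1 from binary labels,
    following the definitions given above."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * recall * precision / (recall + precision)) if recall + precision else 0.0
    return recall, precision, accuracy, f1

# e.g., one finding annotated over 10 hypothetical test reports
r, p, a, f = classification_scores([1, 1, 1, 0, 0, 0, 0, 0, 1, 1],
                                   [1, 1, 0, 0, 0, 0, 0, 1, 1, 1])
print(round(r, 2), round(p, 2), round(a, 2), round(f, 2))
```

Because F1 is the harmonic mean of recall and precision, a finding scores 1.0 only when the extractor makes no false positives and no false negatives, as with the 10 perfectly extracted findings reported below.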

3. Results

3.1. Dataset Characteristics

The dataset (n = 900) was divided into a training dataset (n = 720) and a test dataset (n = 120). The median age and standard deviation of the patients, and the proportion of male and female patients, were calculated for each dataset.
In the dataset evaluation, the training and test datasets were characterized by the 23 radiologic findings. The prevalence of radiologic findings ranged from 1.4% (disc sequestration) to 56.3% (disc bulge) in the training set and from 0% (disc herniation, facet hypertrophy) to 53.9% (disc bulge) in the test set. In both sets, disc bulge was the most common finding (56.3% and 53.9%, respectively), spondylosis the second-most common (39.7% and 42.8%, respectively), and listhesis the third-most common (36% and 33.3%, respectively) (Table 2). Between the training and test datasets, 4/23 radiologic findings (facet hypertrophy, lateral recess stenosis, any stenosis, disc sequestration) showed a statistically significant difference (p-value < 0.05) (Figure 2).

3.2. NLP System Performances

To evaluate the performance of the NLP system, the model was trained and tested on the datasets, and recall, precision, accuracy, and the F1 score were calculated. The raw data of the NLP system are available at https://github.com/YJK96/Low-Back-Pain-NLP.git (accessed on 28 November 2022).
In the test datasets (n = 120) of the rule-based NLP model, all 23 radiologic findings had scores of more than 0.9 for recall, precision, accuracy, and the F1 score. These four scores were 1.0 for 10 radiologic findings—listhesis, annular fissure, disc bulge, disc extrusion, disc protrusion, endplate edema or type 1 Modic change, lateral recess stenosis, Schmorl’s node, osteophyte, and any stenosis. The lowest F1 score was 0.9802 for facet hypertrophy.
The highest recall score was 1.0 for 14 radiologic findings—listhesis, fracture, annular fissure, disc bulge, disc extrusion, disc herniation, disc protrusion, endplate edema or type 1 Modic, facet hypertrophy, lateral recess stenosis, spondylolysis, Schmorl’s node, osteophyte and any stenosis. The lowest recall score was 0.9840 for central stenosis.
The highest precision score was 1.0 for 16 radiologic findings—listhesis, scoliosis, spondylosis, annular fissure, disc bulge, disc extrusion, disc protrusion, endplate edema or type 1 Modic, central stenosis, nerve root displaced/compressed, lateral recess stenosis, Schmorl’s node, osteophyte, disc space narrowing, any stenosis, and disc sequestration. The lowest precision score was 0.9611 for facet hypertrophy. The highest accuracy score was 1.0 for 10 radiologic findings—listhesis, annular fissure, disc bulge, disc extrusion, disc protrusion, endplate edema or type 1 Modic, lateral recess stenosis, Schmorl’s node, osteophyte, and any stenosis. The lowest accuracy score was 0.9611 for facet hypertrophy (Table 3).
The F1 score for the potentially clinically important 7 radiologic findings is as follows: 1.0 for disc extrusion, 1.0 for endplate edema or Type 1 Modic, 1.0 for any stenosis, 0.9919 for central stenosis, 0.9882 for foraminal stenosis, 0.9969 for nerve root displaced/compressed, and 1.0 for lateral recess stenosis.

4. Discussion

The purpose of this study was the development of an NLP system for extracting clinical findings associated with LBP from X-ray, CT, and MRI radiology reports. For NLP system modeling, the training dataset (n = 720; 240 X-ray, 240 CT, and 240 MRI) and the test dataset (n = 120; 40 X-ray, 40 CT, and 40 MRI) were extracted from 18,640 radiology reports. Between the training and test datasets, 19/23 radiologic findings showed no statistically significant differences.
In this study, we developed a rule-based NLP pipeline for identifying 23 radiologic findings in X-ray, CT, and MRI reports. In detail, regular expressions and negation detection were used for modeling. The performance of this NLP system was assessed with four validation parameters—recall, precision, accuracy, and F1 score—and the system presented an accuracy of at least 0.9611. To correct for the effects of dataset bias, the system was also evaluated using the F1 score, the most important of these parameters for NLP system evaluation because it reflects both recall and precision. In this NLP system, all radiologic findings had an F1 score of 0.9802 or higher. Recall corresponds to statistical sensitivity; the developed NLP system had a sensitivity of at least 0.984. Compared to the previous rule-based NLP system [16], the sensitivity of this rule-based NLP pipeline was improved.
From the previous study, the potentially clinically important radiologic findings were defined—disc extrusion, endplate edema or Type 1 Modic, any stenosis, central stenosis, foraminal stenosis, nerve root displaced/compressed, and lateral recess stenosis [16]. In this study, these 7 findings had F1 scores ranging from 0.9882 to 1.0. Moreover, the developed NLP system presented higher sensitivity for the potentially clinically important radiologic findings than the previous rule-based NLP system did [16]. In this study, a unique NLP system was developed for extracting the clinical findings associated with LBP from the radiology reports of CHA University Bundang CHA Medical Center.
The imaging findings associated with LBP are not explicitly coded in the medical databases that form part of electronic health records. An NLP system automatically identifies such findings from free-text radiology reports, reducing the burden of manual extraction [16]. In this study, a rule-based NLP system was developed and validated for identifying radiologic findings related to LBP in radiology reports. Radiology reports constitute a substantial amount of the content within electronic health records and are used for communication and documentation of imaging. The same radiologic finding can be expressed in different words across reports, and many radiologic findings remain unstructured in medical databases. A trained person can perform the information extraction task, but manual extraction is labor-intensive, costly, and time-consuming; an NLP system can instead select useful information from unstructured text in electronic medical records [13].
Numerous NLP systems now exhibit accuracy similar to that of humans and have already been used for radiology reports. Knirsch et al. compared an NLP system with specialist review for chest reports [24]: the chest reports of patients with a positive culture and suspected tuberculosis were identified, the focus was on identifying six keywords in the chest reports, and a consensus of 89–92% was achieved. Chapman et al. developed an NLP system for disease chronicity, certainty, presence, and examination technical quality [25]; the system showed high sensitivity (86–98%) and high specificity (89–93%), but did not achieve such high results for chronicity (60% and 99%). Jujjavarapu et al. compared different preprocessing and featurization methods of NLP pipelines for classifying radiologic findings associated with LBP from X-ray and MRI radiology reports. In that study, if the NLP pipeline was developed in a single system (e.g., one healthcare institution or EMR system), N-grams were the preferred NLP method; if NLP pipelines were developed across multiple systems, document embeddings were considered the best method [17]. Travis Caton et al. investigated the relationship between the radiologic findings of degenerative spinal stenosis on lumbar MRIs (LMRI) and patient characteristics (e.g., age, sex) using rule-based NLP systems; the NLP system identified the patterns of lumbar spine degeneration [19]. Caton et al. investigated lumbar spine MRIs to identify a performance metric for measuring the global severity of lumbar spine degenerative disease (LSDD) [20]. Huhdanpaa et al. developed a rule-based NLP system to identify patients with Type 1 Modic endplate changes from lumbar MRI reports [18]. Lewandrowski et al. developed deep learning neural network models to identify features from MRI digital imaging and communications in medicine (DICOM) datasets to produce automatic MRI reports [26]. Galbusera et al. developed a BERT-based NLP system to generate annotations for radiographic images from lumbar X-ray reports [21]. In many previous studies, there were no NLP systems for classification or annotation generation from CT radiology reports.
Some previous studies used clinical notes or electronic health records to assist in clinical diagnosis or treatment decisions for LBP patients. Miotto et al. developed a convolutional neural network model for classifying acute LBP episodes from free-text clinical notes [27]. Walsh et al. developed a support vector machine model for identifying axial spondyloarthritis (SpA) concepts from electronic medical records [28]. Additionally, Walsh et al. developed an NLP system for identifying patients with axial SpA from clinical charts [29]. Zhao et al. developed NLP algorithms to classify axial SpA patients from the free-text data of electronic health records [30]. NLP systems have also been applied to electronic medical records and operation notes recorded during surgery or post-operation. Ehresman et al. developed an NLP system for identifying incidental durotomies from intra-operative electronic health records [31], and Karhade et al. developed an NLP system for identifying incidental durotomies from free-text operation notes [32]. Additionally, a few previous studies identified the complications of spinal surgery from operative notes or electronic medical records. Karhade et al. developed machine learning algorithms for the prediction of intra-operative vascular injury (VI) and NLP systems for identifying VI from free-text operative notes [33]. Karhade et al. developed NLP algorithms for identifying post-operative wound infections requiring reoperation after lumbar discectomy from free-text operation notes [34]. Karhade et al. investigated NLP algorithms for predicting the 90-day unplanned readmission of lumbar spine fusion patients from free-text notes written during the hospitalization period [35]. Dantes et al. developed an NLP system based on IDEAL-X to identify venous thromboembolism from electronic medical records [36].
Free-text clinical notes, operation notes, and electronic health records could thus be used to develop NLP algorithms for identifying or predicting specific spinal diseases or complications of spinal surgery (Table 4) [12].
Recently, NLP research on clinical reports written in official national languages has been conducted globally. Arnaud et al. developed an NLP system for learning contextual embeddings from free-text clinical records in France [15]. Kim et al. developed an NLP system for a Korean medical corpus using BERT models [37]. Dahl et al. developed an NLP system for classifying Norwegian pediatric CT radiology reports; a bidirectional recurrent neural network model, a convolutional neural network model, and a support vector machine model were used for training and testing [38]. Fink et al. developed an NLP system for identifying oncologic outcomes from structured oncology reports created in the German language [39].
In previous research, machine learning- or deep learning-based NLP systems predicted multiple findings using the same datasets, and machine learning-based NLP systems were more scalable than rule-based models. However, machine learning- or deep learning-based NLP systems require larger datasets and a higher set-up cost than rule-based NLP systems [16].
In this study, the NLP system was developed using the rule-based approach. Clinicians determined and set the regular expressions using “re”, and the various patterns of the free-text clinical reports could be defined as regular expressions. Because the rules are set by the user, the rule-based method is easier to debug and can achieve higher precision than the machine learning-based method. Several limitations of previous studies were overcome. First, this NLP system was trained with a relatively smaller dataset (n = 720) than those used in previous studies. Second, the NLP system showed improved sensitivity (0.984 to 1.0), which had been pointed out as low in previous rule-based NLP systems [16]. Third, the utilization of the NLP system was extended to CT radiology reports for extracting terms associated with LBP.
There are several limitations to this study. First, the radiologic findings were collected from one medical center, so the pipeline may not properly process reports with different styles of reporting or expression. In future work, numerous radiology reports with different reporting styles should be obtained, as well as datasets from multiple institutions. Second, only the rule-based method was used for developing the NLP model; in future work, other featurization methods (e.g., N-grams, document embeddings) should be used and compared. Despite these limitations, the feasibility of the NLP system for identifying lumbar imaging findings associated with LBP from sampled radiology reports was confirmed.
In the future, the NLP system will be used to identify lumbar imaging findings among subjects without LBP, to predict whether LBP will become chronic, and to investigate the accuracy of predictions of chronic LBP persistence alongside other test results, such as hematological tests. Additionally, transformer models (i.e., the BERT model) can be used in further studies: datasets from patients with and without LBP will be used to train the BERT model, through which we expect to discover sentences or words highly correlated with LBP. More keywords, in addition to the 23 radiologic findings used in the current study, may be found.

5. Conclusions

A rule-based NLP system was developed to identify 23 findings associated with LBP from X-ray, CT, and MRI radiology reports sampled from CHA University Bundang CHA Medical Center. This rule-based NLP system presented good performance. In particular, the seven potentially clinically important radiologic findings exhibited high F1 scores.
Through this research, an NLP system was developed to identify and extract the necessary terms from the free-text clinical data used in medical institutions, covering not only X-ray and MRI reports but also CT reports.
The utilization of this NLP system will be expanded to extract data needed for medical research or data that becomes a marker for diagnosis or treatment from free-text form clinical data of the medical institution. This NLP system has potential for use in clinical diagnosis, treatment decisions, and large-scale research.
In future studies, the criteria of radiologic findings will be subdivided according to a lesion location. Additionally, results of each patient’s complete blood count (CBC) test and blood chemistry test should be used for the prediction of chronic LBP persistency, along with the radiologic findings from the rule-based NLP system. Moreover, the correlation between the degree of pain and the radiologic findings from the NLP system will be investigated.

Author Contributions

Conceptualization and methodology, H.-W.H. and I.H.; writing, Y.K., S.B.K., G.S. (Gyuseon Song) and C.S. (Chanyoung Song); data acquisition, Y.K., S.B.K. and G.S. (Gyuseon Song). All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Korean Health Technology Research and Development Project, the Ministry for Health and Welfare Affairs (HR16C0002 and HH21C0027), and the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00224, AIM: AI based Next-Generation Security Information Event Management Methodology for Cognitive Intelligence and Secure-Open Framework).

Institutional Review Board Statement

This study was approved by the Institutional Review Board of the Bundang CHA Medical Center (IRB No. 2021-04-212).

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in [repository name e.g., FigShare] at [doi], reference number [reference number].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Andersson, G.B. Epidemiology of low back pain. Acta Orthop. Scand. Suppl. 1998, 281, 28–31.
  2. Deyo, R.A.; Cherkin, D.; Conrad, D.; Volinn, E. Cost, controversy, crisis: Low back pain and the health of the public. Annu. Rev. Public Health 1991, 12, 141–156.
  3. Otluoğlu, G.D.; Konya, D.; Toktas, Z.O. The Influence of Mechanic Factors in Disc Degeneration Disease as a Determinant for Surgical Indication. Neurospine 2020, 17, 215–220.
  4. Atlas, S.J.; Deyo, R.A. Evaluating and managing acute low back pain in the primary care setting. J. Gen. Intern. Med. 2001, 16, 120–131.
  5. Deyo, R.A.; Weinstein, J.N. Low back pain. N. Engl. J. Med. 2001, 344, 363–370.
  6. Dionne, C.E.; Dunn, K.M.; Croft, P.R. Does back pain prevalence really decrease with increasing age? A systematic review. Age Ageing 2006, 35, 229–234.
  7. Meucci, R.D.; Fassa, A.G.; Faria, N.M. Prevalence of chronic low back pain: Systematic review. Rev. Saude Publica 2015, 49, 1.
  8. Jarvik, J.J.; Hollingworth, W.; Heagerty, P.; Haynor, D.R.; Deyo, R.A. The Longitudinal Assessment of Imaging and Disability of the Back (LAIDBack) Study: Baseline data. Spine 2001, 26, 1158–1166.
  9. Li, A.L.; Yen, D. Effect of increased MRI and CT scan utilization on clinical decision-making in patients referred to a surgical clinic for back pain. Can. J. Surg. 2011, 54, 128–132.
  10. Birkmeyer, N.J.; Weinstein, J.N.; Tosteson, A.N.; Tosteson, T.D.; Skinner, J.S.; Lurie, J.D.; Deyo, R.; Wennberg, J.E. Design of the Spine Patient Outcomes Research Trial (SPORT). Spine 2002, 27, 1361–1372.
  11. Sistrom, C.L.; Langlotz, C.P. A framework for improving radiology reporting. J. Am. Coll. Radiol. 2005, 2, 159–167.
  12. Bacco, L.; Russo, F.; Ambrosio, L.; D’Antoni, F.; Vollero, L.; Vadalà, G.; Dell’Orletta, F.; Merone, M.; Papalia, R.; Denaro, V. Natural language processing in low back pain and spine diseases: A systematic review. Front. Surg. 2022, 9, 957085.
  13. Cai, T.; Giannopoulos, A.A.; Yu, S.; Kelil, T.; Ripley, B.; Kumamaru, K.K.; Rybicki, F.J.; Mitsouras, D. Natural Language Processing Technologies in Radiology Research and Clinical Applications. Radiographics 2016, 36, 176–191.
  14. Bressem, K.K.; Adams, L.C.; Gaudin, R.A.; Tröltzsch, D.; Hamm, B.; Makowski, M.R.; Schüle, C.Y.; Vahldiek, J.L.; Niehues, S.M. Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports. Bioinformatics 2021, 36, 5255–5261.
  15. Arnaud, E.; Elbattah, M.; Gignon, M.; Dequen, G. Learning Embeddings from Free-text Triage Notes using Pretrained Transformer Models. In Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies—Vol. 5: Healthinf, Lisbonne, Portugal, 9 February 2022; pp. 835–841.
  16. Tan, W.K.; Hassanpour, S.; Heagerty, P.J.; Rundell, S.D.; Suri, P.; Huhdanpaa, H.T.; James, K.; Carrell, D.S.; Langlotz, C.P.; Organ, N.L.; et al. Comparison of Natural Language Processing Rules-based and Machine-learning Systems to Identify Lumbar Spine Imaging Findings Related to Low Back Pain. Acad. Radiol. 2018, 25, 1422–1432.
  17. Jujjavarapu, C.; Pejaver, V.; Cohen, T.A.; Mooney, S.D.; Heagerty, P.J.; Jarvik, J.G. A Comparison of Natural Language Processing Methods for the Classification of Lumbar Spine Imaging Findings Related to Lower Back Pain. Acad. Radiol. 2022, 29 Suppl S3, S188–S200.
  18. Huhdanpaa, H.T.; Tan, W.K.; Rundell, S.D.; Suri, P.; Chokshi, F.H.; Comstock, B.A.; Heagerty, P.J.; James, K.T.; Avins, A.L.; Nedeljkovic, S.S.; et al. Using Natural Language Processing of Free-Text Radiology Reports to Identify Type 1 Modic Endplate Changes. J. Digit. Imaging 2018, 31, 84–90.
  19. Travis Caton, M., Jr.; Wiggins, W.F.; Pomerantz, S.R.; Andriole, K.P. Effects of age and sex on the distribution and symmetry of lumbar spinal and neural foraminal stenosis: A natural language processing analysis of 43,255 lumbar MRI reports. Neuroradiology 2021, 63, 959–966.
  20. Caton, M.T., Jr.; Wiggins, W.F.; Pomerantz, S.R.; Andriole, K.P. The Composite Severity Score for Lumbar Spine MRI: A Metric of Cumulative Degenerative Disease Predicts Time Spent on Interpretation and Reporting. J. Digit. Imaging 2021, 34, 811–819. [Google Scholar] [CrossRef]
  21. Galbusera, F.; Cina, A.; Bassani, T.; Panico, M.; Sconfienza, L.M. Automatic Diagnosis of Spinal Disorders on Radiographic Images: Leveraging Existing Unstructured Datasets With Natural Language Processing. Glob. Spine J. 2021, 21925682211026910. [Google Scholar] [CrossRef]
  22. Takahashi, K.; Miyazaki, T.; Ohnari, H.; Takino, T.; Tomita, K. Schmorl’s nodes and low-back pain. Analysis of magnetic resonance imaging findings in symptomatic and asymptomatic individuals. Eur. Spine J. 1995, 4, 56–59. [Google Scholar] [CrossRef] [PubMed]
  23. Syed, K.; Sleeman, W.T.; Hagan, M.; Palta, J.; Kapoor, R.; Ghosh, P. Automatic Incident Triage in Radiation Oncology Incident Learning System. Healthcare 2020, 8, 272. [Google Scholar] [CrossRef] [PubMed]
  24. Knirsch, C.A.; Jain, N.L.; Pablos-Mendez, A.; Friedman, C.; Hripcsak, G. Respiratory isolation of tuberculosis patients using clinical guidelines and an automated clinical decision support system. Infect. Control Hosp. Epidemiol. 1998, 19, 94–100. [Google Scholar] [CrossRef] [PubMed]
  25. Chapman, B.E.; Lee, S.; Kang, H.P.; Chapman, W.W. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm. J. Biomed. Inform. 2011, 44, 728–737. [Google Scholar] [CrossRef] [Green Version]
  26. Lewandrowsk, I.K.; Muraleedharan, N.; Eddy, S.A.; Sobti, V.; Reece, B.D.; Ramírez León, J.F.; Shah, S. Feasibility of Deep Learning Algorithms for Reporting in Routine Spine Magnetic Resonance Imaging. Int. J. Spine Surg. 2020, 14, S86–S97. [Google Scholar] [CrossRef]
  27. Miotto, R.; Percha, B.L.; Glicksberg, B.S.; Lee, H.C.; Cruz, L.; Dudley, J.T.; Nabeel, I. Identifying Acute Low Back Pain Episodes in Primary Care Practice From Clinical Notes: Observational Study. JMIR Med. Inform. 2020, 8, e16878. [Google Scholar] [CrossRef] [PubMed]
  28. Walsh, J.A.; Shao, Y.; Leng, J.; He, T.; Teng, C.C.; Redd, D.; Treitler Zeng, Q.; Burningham, Z.; Clegg, D.O.; Sauer, B.C. Identifying Axial Spondyloarthritis in Electronic Medical Records of US Veterans. Arthritis Care Res. 2017, 69, 1414–1420. [Google Scholar] [CrossRef] [Green Version]
  29. Walsh, J.A.; Pei, S.; Penmetsa, G.; Hansen, J.L.; Cannon, G.W.; Clegg, D.O.; Sauer, B.C. Identification of Axial Spondyloarthritis Patients in a Large Dataset: The Development and Validation of Novel Methods. J. Rheumatol. 2020, 47, 42–49. [Google Scholar] [CrossRef]
  30. Zhao, S.S.; Hong, C.; Cai, T.; Xu, C.; Huang, J.; Ermann, J.; Goodson, N.J.; Solomon, D.H.; Cai, T.; Liao, K.P. Incorporating natural language processing to improve classification of axial spondyloarthritis using electronic health records. Rheumatology 2020, 59, 1059–1065. [Google Scholar] [CrossRef]
  31. Ehresman, J.; Pennington, Z.; Karhade, A.V.; Huq, S.; Medikonda, R.; Schilling, A.; Feghali, J.; Hersh, A.; Ahmed, A.K.; Cottrill, E.; et al. Incidental durotomy: Predictive risk model and external validation of natural language process identification algorithm. J. Neurosurg. Spine 2020, 33, 342–348. [Google Scholar] [CrossRef]
  32. Karhade, A.V.; Bongers, M.E.R.; Groot, O.Q.; Kazarian, E.R.; Cha, T.D.; Fogel, H.A.; Hershman, S.H.; Tobert, D.G.; Schoenfeld, A.J.; Bono, C.M.; et al. Natural language processing for automated detection of incidental durotomy. Spine J. 2020, 20, 695–700. [Google Scholar] [CrossRef] [PubMed]
  33. Karhade, A.V.; Bongers, M.E.R.; Groot, O.Q.; Cha, T.D.; Doorly, T.P.; Fogel, H.A.; Hershman, S.H.; Tobert, D.G.; Srivastava, S.D.; Bono, C.M.; et al. Development of machine learning and natural language processing algorithms for preoperative prediction and automated identification of intraoperative vascular injury in anterior lumbar spine surgery. Spine J. 2021, 21, 1635–1642. [Google Scholar] [CrossRef] [PubMed]
  34. Karhade, A.V.; Bongers, M.E.R.; Groot, O.Q.; Cha, T.D.; Doorly, T.P.; Fogel, H.A.; Hershman, S.H.; Tobert, D.G.; Schoenfeld, A.J.; Kang, J.D.; et al. Can natural language processing provide accurate, automated reporting of wound infection requiring reoperation after lumbar discectomy? Spine J. 2020, 20, 1602–1609. [Google Scholar] [CrossRef] [PubMed]
  35. Karhade, A.V.; Lavoie-Gagne, O.; Agaronnik, N.; Ghaednia, H.; Collins, A.K.; Shin, D.; Schwab, J.H. Natural language processing for prediction of readmission in posterior lumbar fusion patients: Which free-text notes have the most utility? Spine J. 2022, 22, 272–277. [Google Scholar] [CrossRef]
  36. Dantes, R.B.; Zheng, S.; Lu, J.J.; Beckman, M.G.; Krishnaswamy, A.; Richardson, L.C.; Chernetsky-Tejedor, S.; Wang, F. Improved Identification of Venous Thromboembolism From Electronic Medical Records Using a Novel Information Extraction Software Platform. Med. Care 2018, 56, e54–e60. [Google Scholar] [CrossRef]
  37. Kim, Y.; Kim, J.H.; Lee, J.M.; Jang, M.J.; Yum, Y.J.; Kim, S.; Shin, U.; Kim, Y.M.; Joo, H.J.; Song, S. A pre-trained BERT for Korean medical natural language processing. Sci. Rep. 2022, 12, 13847. [Google Scholar] [CrossRef]
  38. Dahl, F.A.; Rama, T.; Hurlen, P.; Brekke, P.H.; Husby, H.; Gundersen, T.; Nytrø, Ø.; Øvrelid, L. Neural classification of Norwegian radiology reports: Using NLP to detect findings in CT-scans of children. BMC Med. Inform. Decis. Mak. 2021, 21, 84. [Google Scholar] [CrossRef]
  39. Fink, M.A.; Kades, K.; Bischoff, A.; Moll, M.; Schnell, M.; Küchler, M.; Köhler, G.; Sellner, J.; Heussel, C.P.; Kauczor, H.U.; et al. Deep Learning-based Assessment of Oncologic Outcomes from Natural Language Processing of Structured Radiology Reports. Radiol. Artif. Intell. 2022, 4, e220055. [Google Scholar] [CrossRef]
Figure 2. Frequency of the radiologic findings between the training and test datasets. (**) indicates a statistically significant difference (p-value < 0.05).
Table 1. The 23 radiologic findings of the study.
Radiologic Findings
Listhesis
Scoliosis
Fracture
Spondylosis
Annular fissure
Disc bulge
Disc degeneration
Disc extrusion
Disc herniation
Disc protrusion
End plate edema or type 1 Modic change
Facet hypertrophy
Central stenosis
Foraminal stenosis
Nerve root displaced/compressed
Lateral recess stenosis
Spondylolysis
Schmorl’s node
Osteophyte
Disc space narrowing
Any stenosis
Disc sequestration
Intradiscal vacuum
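The abstract describes extracting these findings with rule-based regular expressions and negation detection. The following is a minimal sketch of that approach, not the authors' published rules: the three patterns, the negation cues, and the five-token negation window (in the style of the NegEx algorithm) are all illustrative assumptions.

```python
import re

# Illustrative patterns for three of the 23 findings (hypothetical; the
# authors' actual extraction rules are not reproduced in this excerpt).
FINDING_PATTERNS = {
    "listhesis": re.compile(r"\blisthesis\b", re.IGNORECASE),
    "disc bulge": re.compile(r"\b(disc\s+)?bulg(e|ing)\b", re.IGNORECASE),
    "schmorl's node": re.compile(r"\bschmorl'?s?\s+node", re.IGNORECASE),
}

# Simple pre-negation cues, in the spirit of the NegEx algorithm.
NEGATION = re.compile(r"\b(no|without|absence of|negative for)\b", re.IGNORECASE)

def extract_findings(sentence):
    """Return the findings asserted (i.e., present and not negated) in one
    report sentence."""
    found = set()
    for name, pattern in FINDING_PATTERNS.items():
        match = pattern.search(sentence)
        if not match:
            continue
        # Treat the finding as negated when a cue appears within the five
        # tokens immediately preceding the matched phrase.
        window = " ".join(sentence[: match.start()].split()[-5:])
        if NEGATION.search(window):
            continue
        found.add(name)
    return found
```

For example, `extract_findings("No disc bulge at L4-5, but grade 1 listhesis.")` keeps `listhesis` while discarding the negated `disc bulge`; the token window prevents the leading "No" from negating findings mentioned later in the sentence.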
Table 2. Characteristics of training and testing datasets for the development of the natural language processing pipeline.
| Characteristics | Training (N = 720) | Testing (N = 180) | p-Value |
|---|---|---|---|
| Mean age (SD) | 62.5 (15.9) | 61.3 (15.4) | 0.402 |
| Sex (n) | | | 0.066 |
| Male | 291 (40.4) | 67 (37.2) | |
| Female | 429 (59.6) | 113 (62.8) | |
| Listhesis (%) | 259 (36.0) | 60 (33.3) | 0.508 |
| Scoliosis (%) | 147 (20.4) | 34 (18.9) | 0.647 |
| Fracture (%) | 130 (18.1) | 37 (20.6) | 0.440 |
| Spondylosis (%) | 286 (39.7) | 77 (42.8) | 0.455 |
| Annular fissure (%) | 58 (8.1) | 10 (5.6) | 0.256 |
| Disc bulge (%) | 405 (56.3) | 97 (53.9) | 0.568 |
| Disc degeneration (%) | 17 (2.4) | 2 (1.1) | 0.297 |
| Disc extrusion (%) | 82 (11.4) | 25 (13.9) | 0.354 |
| Disc herniation (%) | 15 (2.1) | 0 (0.0) | 0.051 |
| Disc protrusion (%) | 152 (21.1) | 43 (23.9) | 0.418 |
| Endplate edema or type 1 Modic change (%) | 22 (3.1) | 10 (5.6) | 0.105 |
| Facet hypertrophy (%) | 19 (2.6) | 0 (0.0) | 0.028 |
| Central stenosis (%) | 215 (29.9) | 55 (30.6) | 0.856 |
| Foraminal stenosis (%) | 218 (30.3) | 51 (28.3) | 0.610 |
| Nerve root displaced/compressed (%) | 54 (7.5) | 17 (9.4) | 0.387 |
| Lateral recess stenosis (%) | 18 (2.5) | 10 (5.6) | 0.035 |
| Spondylolysis (%) | 18 (2.5) | 5 (2.8) | 0.833 |
| Schmorl’s node (%) | 46 (6.4) | 17 (9.4) | 0.151 |
| Osteophyte (%) | 42 (5.8) | 11 (6.1) | 0.887 |
| Disc space narrowing (%) | 181 (25.1) | 51 (28.3) | 0.381 |
| Any stenosis (%) | 29 (4.0) | 1 (0.6) | 0.020 |
| Disc sequestration (%) | 10 (1.4) | 8 (4.4) | 0.009 |
| Intradiscal vacuum (%) | 62 (8.6) | 19 (10.6) | 0.415 |
Values are the number (%) of reports in which each finding was identified, among the 720 training and 180 testing reports.
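The 80/20 development/testing split described in the abstract yields the 720/180 report counts shown in Table 2. A minimal sketch of such a split follows; the seeded shuffle is an assumption made here for reproducibility, not a documented detail of the authors' procedure.

```python
import random

def train_test_split(reports, test_fraction=0.2, seed=42):
    """Shuffle the report list and split it into (training, testing) sets.

    With the 900 sampled reports (300 X-ray, 300 CT, 300 MRI) and
    test_fraction=0.2, this produces 720 training and 180 testing reports,
    matching the dataset sizes in Table 2.
    """
    rng = random.Random(seed)  # fixed seed: illustrative assumption
    shuffled = list(reports)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(range(900))
```

Note that a plain random split like this does not guarantee balanced finding frequencies between the two sets, which is why Table 2 reports per-finding p-values comparing them.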
Table 3. Performance of the natural language processing pipeline in the test dataset (N = 180).
| Radiologic Findings | Recall | Precision | Accuracy | F1 Score |
|---|---|---|---|---|
| Listhesis | 1 | 1 | 1 | 1 |
| Scoliosis | 0.9932 | 1 | 0.9944 | 0.9966 |
| Fracture | 1 | 0.986 | 0.9889 | 0.993 |
| Spondylosis | 0.9903 | 1 | 0.9944 | 0.9951 |
| Annular fissure | 1 | 1 | 1 | 1 |
| Disc bulge | 1 | 1 | 1 | 1 |
| Disc degeneration | 0.9944 | 0.9944 | 0.9889 | 0.9944 |
| Disc extrusion | 1 | 1 | 1 | 1 |
| Disc herniation | 1 | 0.9889 | 0.9889 | 0.9944 |
| Disc protrusion | 1 | 1 | 1 | 1 |
| Endplate edema or Type 1 Modic change | 1 | 1 | 1 | 1 |
| Facet hypertrophy | 1 | 0.9611 | 0.9611 | 0.9802 |
| Central stenosis | 0.984 | 1 | 0.9889 | 0.9919 |
| Foraminal stenosis | 0.9921 | 0.9844 | 0.9833 | 0.9882 |
| Nerve root displaced/compressed | 0.9939 | 1 | 0.9944 | 0.9969 |
| Lateral recess stenosis | 1 | 1 | 1 | 1 |
| Spondylolysis | 1 | 0.9886 | 0.9889 | 0.9943 |
| Schmorl’s node | 1 | 1 | 1 | 1 |
| Osteophyte | 1 | 1 | 1 | 1 |
| Disc space narrowing | 0.9922 | 1 | 0.9944 | 0.9961 |
| Any stenosis | 1 | 1 | 1 | 1 |
| Disc sequestration | 0.9942 | 1 | 0.9944 | 0.9971 |
| Intradiscal vacuum | 0.9874 | 0.9874 | 0.9778 | 0.9874 |
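The four scores in Table 3 follow directly from per-report confusion counts (true/false positives and negatives against the manual annotations). A small helper illustrating the standard definitions, written here for clarity rather than taken from the authors' code:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute recall, precision, accuracy, and F1 from a 2x2 confusion
    matrix. Assumes nonzero denominators (i.e., at least one actual
    positive and one predicted positive)."""
    recall = tp / (tp + fn)                      # sensitivity
    precision = tp / (tp + fp)                   # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # overall agreement
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}
```

With hypothetical counts of 90 true positives, 10 false positives, 95 true negatives, and 5 false negatives, for instance, F1 reduces to 2TP/(2TP + FP + FN) = 180/195 ≈ 0.923; the perfect rows in Table 3 (e.g., listhesis, disc bulge) correspond to FP = FN = 0.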
Table 4. Classification of previous studies that developed NLP systems for LBP, spinal disease, and complications of spinal surgery.
| Study | Topic (Findings) | Source | Year |
|---|---|---|---|
| Datasets from radiological image reports | | | |
| Tan et al. [16] | Radiologic findings associated with LBP. | Lumbar MRI reports and X-ray reports | 2018 |
| Jujjavarapu et al. [17] | Radiologic findings associated with LBP. | Lumbar MRI reports and X-ray reports | 2022 |
| Travis Caton et al. [19] | Relationship between radiologic findings of degenerative spinal stenosis on lumbar MRI (LMRI) and patient characteristics. | Lumbar MRI reports | 2021 |
| Caton et al. [20] | Performance metric for measuring global severity of lumbar degenerative disease (LSDD). | Lumbar MRI reports | 2021 |
| Huhdanpaa et al. [18] | Patients with Type 1 Modic endplate changes. | Lumbar MRI reports | 2018 |
| Lewandrowski et al. [26] | Producing automatic MRI reports. | MRI DICOM datasets | 2020 |
| Galbusera et al. [21] | Generating annotations for radiographic images. | Lumbar X-ray reports | 2021 |
| Datasets from free-text type reports | | | |
| Miotto et al. [27] | Classification of acute LBP episode. | Free-text clinical notes | 2020 |
| Walsh et al. [28] | Identifying the axial SpA concepts. | Electronic medical records | 2017 |
| Walsh et al. [29] | Identifying the axial SpA. | Clinical chart database | 2020 |
| Zhao et al. [30] | Classification of the axial SpA patients. | Electronic health records | 2020 |
| Ehresman et al. [31] | Identifying incidental durotomy. | Intra-operative electronic health records | 2020 |
| Karhade et al. [32] | Identifying incidental durotomy. | Free-text operation notes | 2020 |
| Karhade et al. [33] | Prediction of intra-operative vascular injury (VI) and identifying VI. | Free-text operation notes | 2021 |
| Karhade et al. [34] | Identifying the post-operative wound infection that requires reoperation after lumbar discectomy. | Free-text operation notes | 2020 |
| Karhade et al. [35] | Predicting 90-day unplanned readmission of the lumbar spine fusion patients. | Free-text notes during hospitalization period | 2022 |
| Dantes et al. [36] | Venous thromboembolism. | Electronic medical records | 2018 |
Kim, Y.; Song, C.; Song, G.; Kim, S.B.; Han, H.-W.; Han, I. Using Natural Language Processing to Identify Low Back Pain in Imaging Reports. Appl. Sci. 2022, 12, 12521. https://doi.org/10.3390/app122412521