1. Introduction
Low back pain represents one of the most prevalent musculoskeletal complaints worldwide. In Japan, the burden is particularly high: the National Health and Nutrition Survey consistently identifies back pain as the leading symptom-based complaint, with prevalence rates of 91.6 per 1000 men and 111.9 per 1000 women [1]. This substantial disease burden underscores the critical importance of accurate and efficient diagnostic imaging techniques to identify treatable etiologies. Among the most clinically significant causes of back pain in young individuals are lumbar spondylolysis and its sequela, spondylolisthesis.
Spondylolysis is a stress fracture of the pars interarticularis of the lumbar vertebra, occurring in approximately 5–7% of the general Japanese population based on CT examination [2]. Its prevalence is markedly higher among adolescent athletes experiencing back pain. The condition exhibits a notable sex disparity, with males showing higher incidence rates than females [3]. Vertebral level distribution is non-uniform: L5 is affected in approximately 90% of cases, followed by L4. The natural history of spondylolysis is consequential: L5 spondylolysis progresses to spondylolisthesis in approximately 30% of cases, while L4 shows a 90% progression rate, emphasizing the critical need for early and accurate diagnosis [4]. Without appropriate management, progressive vertebral slippage can result in radiculopathy, neurogenic claudication, and significant disability.
The diagnostic gold standard for detecting lumbar spondylolysis remains the oblique radiograph, which reveals the characteristic “Scotty dog sign”—a disruption in cortical continuity at the pars interarticularis that appears as a collar on the neck of the Scotty dog silhouette formed by the vertebral elements. Clinically recommended oblique projection angles range from 30° to 45°, with specific institutional protocols suggesting 35°, 40°, or 45° projections depending on patient anatomy. Despite the widespread use of CT and MRI for definitive diagnosis, plain oblique radiographs remain an indispensable first-line screening tool, particularly in outpatient settings and facilities where advanced cross-sectional imaging is not immediately available.
Achieving and verifying the appropriate projection angle in oblique radiography presents several challenges. First, the lack of standardization across institutions leads to inconsistent image quality. Second, subjective assessment of image adequacy frequently results in repeat examinations, increasing patient radiation exposure, examination costs, and workflow inefficiencies. Third, even when radiographers intend to position a patient at a target angle, the actual anatomical projection angle may differ from the intended value due to variability in body habitus, table positioning, and tube angulation. The accurate verification of projection angles is therefore a clinically meaningful problem that has, until recently, lacked an automated, objective solution. Beyond rejecting inadequate images, automated angle knowledge enables several downstream applications: real-time acquisition feedback allowing immediate repositioning before the patient leaves the table; automatic annotation of the verified projection angle in PACS records, which supports consistent radiologist interpretation and medico-legal documentation; retrospective institutional audit of positioning consistency; and geometric correction or normalization of diagnostic measurements—such as slip percentage in spondylolisthesis grading—that depend on a known projection geometry [5]. In surgical planning contexts, knowing the precise oblique angle from preoperative radiographs allows the surgeon to account for the projection geometry when estimating pars defect dimensions or screw trajectory [6].
Artificial intelligence (AI) and deep learning have demonstrated transformative potential in medical imaging, with convolutional neural networks (CNNs) achieving human-level performance across a wide range of clinical decision- and workflow-support tasks [7,8,9,10,11]. The YOLOX anchor-free object detector has been particularly influential, offering real-time detection with strong performance on multi-scale targets [12]. Within spinal imaging specifically, deep learning methods have been applied to vertebral body detection and segmentation [13], spondylolisthesis classification [14], and the automated assessment of degenerative spinal conditions from plain radiographs [15]. These prior works collectively demonstrate that anatomical structures in spine radiographs can be reliably localized using modern object detection frameworks, even in the presence of complex overlapping anatomy.
A key insight motivating the present work is the predictable geometric relationship between vertebral bodies and pedicles as a function of projection angle. As the oblique angle changes, the relative horizontal offset of the pedicle with respect to the vertebral body changes systematically. This change can be quantified using the Vertebral–Pedicle Ratio (VPR), a normalized geometric index derived from bounding box coordinates detected by an object detection model. If the VPR can be reliably estimated from the detected anatomy, projection angle can be estimated through linear regression, enabling fully automatic, quantitative quality control of lumbar oblique radiography.
The use of synthetic X-ray images generated from CT data provides a solution to the training data problem. Digitally reconstructed radiographs (DRRs) produced from CT volumes by ray-sum projection at controlled angles yield large annotated datasets with precisely known ground-truth angles [16]. Several studies have demonstrated that models trained on synthetic spine radiographs can transfer effectively to clinical images [17], and the approach is now recognized as a general strategy for data augmentation and model development in projection radiography.
Initial experiments indicated that simultaneous detection of three anatomical classes—vertebral region, vertebral body, and pedicle—from the full radiograph yielded inconsistent pedicle localization, particularly at the smaller scales at which pedicles appear in whole-image views. This motivated the design of a two-stage strategy in which a dedicated pedicle detector operates within a pre-localized vertebral body crop, reducing the detection search space and increasing the relative size of the target structure. Model1 was therefore retained both as a standalone comparison baseline and as the first-stage detector within the Model2 pipeline. The purpose of this study was to develop and systematically evaluate a deep learning-based framework for automated projection angle estimation in lumbar oblique radiography using a Vertebral–Pedicle Ratio (VPR) approach. Two object detection pipelines were compared: a single-stage three-class detector (Model1) that simultaneously identifies the L2–L4 vertebral region, vertebral bodies, and pedicles from the full radiograph; and a two-stage detector (Model2) in which vertebral body localization by Model1 is followed by dedicated single-class pedicle detection within the cropped vertebral body region. The impact of these contrasting detection strategies on both object detection accuracy and downstream angle estimation error was assessed through five-fold cross-validation.
Novelty and contributions: (i) We introduce a geometry-informed Vertebral–Pedicle Ratio (VPR) computed from detector outputs to estimate the lumbar oblique projection angle via regression; (ii) we propose a two-stage detection pipeline that improves small-structure (pedicle) localization by operating within vertebral-body crops; (iii) we leverage CT-derived digitally reconstructed radiographs (DRRs) with precisely known projection angles for systematic five-fold cross-validation; and (iv) we discuss practical deployment scenarios, including near-real-time acquisition feedback versus post-acquisition quality control.
2. Materials and Methods
2.1. System Configuration and Development Environment
All model development, training, and evaluation were performed on a high-performance workstation equipped with an Intel Core i9-10980XE processor (3.0 GHz), dual NVIDIA RTX A6000 GPUs (48 GB each), and 64 GB DDR4 RAM. Algorithm development and implementation were conducted in MATLAB 2024a (MathWorks Inc., Natick, MA, USA) using the Deep Learning Toolbox and Computer Vision Toolbox. An overview of the entire study pipeline, from data preparation through angle estimation output, is provided in Figure 1.
2.2. Training Data and Target Scope
Training data consisted of JPEG images paired with identically named text annotation files recording class labels and bounding box coordinates in YOLO format (normalized center x, center y, width, height per class). The three annotation classes used for Model1 were: (i) L2–L4 vertebral region as a single encompassing bounding box, (ii) individual vertebral bodies, and (iii) individual pedicles. Each image was accompanied by its projection angle label, which was assigned automatically from the filename suffix (e.g., _n20 indicating a negative-angle projection of −20°, and _p45 indicating a positive-angle projection of +45°), where the prefix “n” denotes a third-oblique (negative-angle) projection and “p” denotes a fourth-oblique (positive-angle) projection. Angle estimation analysis was restricted to images with absolute projection angles between 20° and 60°, corresponding to the clinically recommended range for lumbar oblique radiography.
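For illustration, the filename-suffix convention above can be parsed as follows. This is a minimal Python sketch (the study implementation was in MATLAB); the function names are ours, not from the original code.

```python
import re

def angle_from_filename(name):
    """Parse the projection-angle suffix from an image filename:
    '_n20' -> ('n', -20), a third-oblique (negative) projection;
    '_p45' -> ('p', +45), a fourth-oblique (positive) projection."""
    m = re.search(r"_([np])(\d+)", name)
    if m is None:
        raise ValueError(f"no angle suffix in {name!r}")
    group, deg = m.group(1), int(m.group(2))
    return group, -deg if group == "n" else deg

def in_analysis_range(angle_deg, lo=20, hi=60):
    """Restrict analysis to the clinically recommended 20-60 degree range."""
    return lo <= abs(angle_deg) <= hi
```

The group label ("n" or "p") is reused later to select the group-specific regression coefficients.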
2.3. Dataset Acquisition and Synthetic Image Generation
2.3.1. CT Data Source
Synthetic X-ray images were generated from publicly available CT datasets obtained from The Cancer Imaging Archive (TCIA), focusing on scans with clear lumbar vertebral anatomy. From this repository, 100 high-quality CT datasets were selected. Synthetic radiograph generation followed a ray-sum projection protocol: starting from a patient-specific reference angle defined by the line connecting the spinal canal center and the spinous process, oblique projections were generated at 5° increments from 20° to 60° (nine angles per patient), yielding 900 synthetic radiographs with precisely known ground-truth projection angles. Post-processing steps included window-level optimization to replicate typical radiographic appearance.
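The ray-sum projection can be sketched as follows in simplified parallel-beam form: each axial CT slice is rotated in-plane by the oblique angle and the attenuation values are summed along the ray direction. This Python sketch uses nearest-neighbour rotation for brevity (the actual pipeline was implemented in MATLAB and included window-level post-processing, omitted here); all names are illustrative.

```python
import numpy as np

def rotate_slice(slice2d, angle_deg):
    """Nearest-neighbour rotation of one axial slice about its centre
    (a minimal stand-in for a proper interpolating rotation)."""
    h, w = slice2d.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    th = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: source coordinate sampled for each output pixel.
    sx = np.cos(th) * (xs - cx) + np.sin(th) * (ys - cy) + cx
    sy = -np.sin(th) * (xs - cx) + np.cos(th) * (ys - cy) + cy
    sxi, syi = np.round(sx).astype(int), np.round(sy).astype(int)
    ok = (sxi >= 0) & (sxi < w) & (syi >= 0) & (syi < h)
    out = np.zeros_like(slice2d)
    out[ok] = slice2d[syi[ok], sxi[ok]]
    return out

def ray_sum_drr(ct_volume, angle_deg):
    """Parallel-beam ray-sum projection: rotate every axial slice of the
    (z, y, x) volume by the oblique angle, then integrate along x."""
    return np.stack([rotate_slice(s, angle_deg).sum(axis=1) for s in ct_volume])

def generate_series(ct_volume, reference_deg=0.0):
    """Nine oblique projections per patient: 20-60 deg in 5 deg steps,
    offset by the patient-specific reference angle."""
    return {a: ray_sum_drr(ct_volume, reference_deg + a) for a in range(20, 61, 5)}
```

With 100 patients and nine angles each, this scheme yields the 900 synthetic radiographs described above, each with an exactly known ground-truth angle.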
2.3.2. Data Augmentation
To enhance model robustness and prevent overfitting, a comprehensive augmentation pipeline was applied to all training images before each epoch. Augmentation operations included: horizontal flipping, rotation at discrete angles between −15° and +15°, and intensity variation at multiple brightness levels. Each augmentation was applied independently, generating a substantially larger effective training dataset for both the whole-image detection (Model1) and the vertebra-specific pedicle detection (Model2) tasks. The augmentation parameters were kept identical for both models to ensure a fair comparison.
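Two of these operations are worth noting in code because they must also transform the annotations: a horizontal flip mirrors the normalized YOLO box centers, and brightness scaling must be clamped to the valid intensity range. The following Python sketch illustrates both (function names are ours; the study used MATLAB, and rotation is omitted here for brevity).

```python
def hflip_with_boxes(image, yolo_boxes):
    """Horizontally flip an image (list of pixel rows) together with its
    YOLO-format boxes (normalized cx, cy, w, h): the flipped cx' = 1 - cx."""
    flipped = [list(row[::-1]) for row in image]
    boxes = [(1.0 - cx, cy, w, h) for cx, cy, w, h in yolo_boxes]
    return flipped, boxes

def adjust_brightness(image, factor):
    """Intensity variation: scale pixel values and clamp to [0, 1]."""
    return [[min(1.0, max(0.0, v * factor)) for v in row] for row in image]
```

Because each augmentation is applied independently, every source image contributes one variant per flip, per rotation angle, and per brightness factor.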
2.4. Object Detection Model Architecture (Model1 and Model2)
Both models employed the YOLOX anchor-free detection architecture [12], which is schematically illustrated in Figure 2. YOLOX adopts a CSPDarkNet53 backbone to extract multi-scale feature maps, a Feature Pyramid Network (FPN) combined with a Path Aggregation Network (PAN) neck to fuse features across scales, and a decoupled detection head that separately handles classification and bounding box regression tasks at each of three output scales (P3/8×, P4/16×, P5/32×). Unlike anchor-based predecessors, YOLOX uses an anchor-free prediction strategy whereby each feature map grid point directly predicts an object’s bounding box center and dimensions without reference to pre-defined anchor sizes. Label assignment during training is performed by the SimOTA strategy, which globally optimizes the matching between predicted boxes and ground-truth boxes based on classification and regression costs. Non-maximum suppression (NMS) is applied as a post-processing step to remove duplicate detections.
Model1 is a three-class detector that receives whole radiographs and simultaneously detects L2–L4 vertebral region, vertebral body, and pedicle. Model2 is a single-class pedicle detector applied in a two-stage pipeline: Model1 first detects the vertebral body region; this region is then cropped and input to Model2, which detects pedicles exclusively within the crop. Model2 thus operates on a substantially smaller and more homogeneous search space than Model1, which is the principal mechanism underlying its performance advantage. Both models were trained with the following hyperparameters: 10 epochs, learning rate 0.0001, batch size 128, and momentum 0.9, with the same augmentation strategy applied to both.
2.5. Detection Performance Evaluation
Detection performance was assessed using 5-fold cross-validation applied uniformly to both models. Two complementary metrics were computed. First, Average Precision at an IoU threshold of 0.5 (AP@0.5) was used as the primary detection quality metric, following standard object detection benchmarking practice. Second, a bounding-box-unit Dice Similarity Coefficient (DSC) was computed at the level of individual matched bounding box pairs: DSC = 2|A ∩ B|/(|A| + |B|), where A and B are the areas of a predicted and ground-truth bounding box, respectively. Predicted and ground-truth boxes were matched by greedy IoU-based assignment. Per-fold, per-class DSC statistics (mean, median, SD, IQR) were computed across all matched pairs. For Model1 (three classes), all per-class metrics were averaged with equal weights to yield macro-average values. For Model2 (single class), the macro-average is identical to the class-level result.
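The bounding-box-unit DSC and the greedy IoU-based matching described above can be sketched as follows; this is an illustrative Python implementation (the study used MATLAB), with boxes given as corner coordinates (x1, y1, x2, y2).

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def box_dice(a, b):
    """Bounding-box-unit Dice: 2|A intersect B| / (|A| + |B|)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return 2.0 * inter / (area(a) + area(b))

def greedy_match(preds, gts, iou_thr=0.5):
    """Greedy IoU-based assignment: repeatedly take the highest-IoU
    (prediction, ground-truth) pair above threshold, removing both."""
    pairs = sorted(((iou(p, g), i, j) for i, p in enumerate(preds)
                    for j, g in enumerate(gts)), reverse=True)
    used_p, used_g, matches = set(), set(), []
    for v, i, j in pairs:
        if v < iou_thr:
            break
        if i not in used_p and j not in used_g:
            matches.append((i, j))
            used_p.add(i)
            used_g.add(j)
    return matches
```

Per-fold, per-class DSC statistics are then accumulated over the pairs returned by `greedy_match`.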
2.6. Evaluation Metrics
The Vertebral–Pedicle Ratio (VPR) is a normalized geometric index that captures the horizontal offset of the pedicle relative to the corresponding vertebral body as observed in the oblique projection, as illustrated in Figure 3B. It is formally defined as:

VPR = (x_pd − x_vb) / w_vb,

where x_pd is the left-edge x-coordinate of the pedicle bounding box in full-image pixel space, x_vb is the left-edge x-coordinate of the corresponding vertebral body bounding box, and w_vb is the width of the vertebral body bounding box. Normalization by w_vb makes the VPR invariant to image resolution and to the absolute size of the vertebral body, which varies across patients and levels. As the projection angle increases within the n-group or p-group range, the pedicle shifts progressively relative to the vertebral body in a predictable direction, yielding a monotonic and near-linear VPR–angle relationship.
Because each radiograph was expected to contain vertebral levels L2, L3, and L4, up to three VPR measurements were available per image. The image-level VPR was computed as the arithmetic mean of the level-wise VPRs. Individual vertebral body candidates were accepted if their bounding box center fell within the L2–L4 region detected by Model1; pedicle candidates were accepted if their bounding box center fell within the accepted vertebral body bounding box. Images in which all three vertebral levels (L2, L3, L4) could not be successfully assigned were excluded from the angle estimation analysis.
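The level-wise and image-level VPR computation reduces to a few lines. The following Python sketch (illustrative; the study implementation was in MATLAB) uses boxes in (x_left, y_top, width, height) form and returns None for images that fail the three-level requirement.

```python
def vpr(pedicle_box, vertebral_box):
    """VPR = (x_pd - x_vb) / w_vb, with boxes as (x_left, y_top, w, h)
    in full-image pixel coordinates."""
    x_pd = pedicle_box[0]
    x_vb, w_vb = vertebral_box[0], vertebral_box[2]
    return (x_pd - x_vb) / w_vb

def image_vpr(level_pairs):
    """Image-level VPR: arithmetic mean of level-wise VPRs at L2, L3, L4.
    Returns None (image excluded) when fewer than three levels matched."""
    if len(level_pairs) < 3:
        return None
    return sum(vpr(pd, vb) for pd, vb in level_pairs) / len(level_pairs)
```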
2.7. Inference Protocol
The complete inference protocol, illustrated schematically in Figure 3A, proceeded as follows for each test image.
Step 1 (Image Loading and Preprocessing): The input image was loaded from disk, resized to the model input dimensions, and pixel values were normalized to the range required by the YOLOX architecture. For the Model2 pipeline, this step also generated the cropped vertebral body region (see Step 3 below).
Step 2 (Forward Pass through YOLOX): The preprocessed image was passed forward through the YOLOX network, comprising the CSPDarkNet53 backbone, the FPN-PAN neck, and the decoupled detection head. At each of the three output scales (P3, P4, P5), the decoupled head simultaneously produced class score predictions and bounding box regression predictions for all grid positions. Raw predictions were converted to absolute bounding box coordinates and confidence scores by decoding the model outputs.
Step 3 (NMS Post-Processing and Geometric Filtering): Raw detections were first filtered by a confidence threshold, then subjected to Non-Maximum Suppression (NMS) to eliminate duplicate bounding boxes with high overlap. The NMS IoU threshold was set consistently for both models. After NMS, geometric filtering was applied: vertebral body candidates were retained if their bounding box center lay within the detected L2–L4 region, and pedicle candidates were retained if their bounding box center lay within the retained vertebral body bounding box. Final assignments were performed by greedy IoU-based matching between vertebral bodies and pedicles. For the Model2 pipeline specifically, the following additional steps were applied between Step 2 and Step 3: (a) Model1 was first used to detect vertebral body bounding boxes on the full image; (b) for each accepted vertebral body, a cropped subimage was extracted from the full radiograph using the Model1 vertebral body bounding box coordinates; (c) Model2 was applied independently to each cropped subimage to detect the pedicle within the crop; and (d) the pedicle bounding box coordinates were transformed back to full-image coordinate space by adding the offset of the crop within the full image (x_pd_full = x_pd_crop + x_vb_offset; y_pd_full = y_pd_crop + y_vb_offset), after which the standard geometric filtering and VPR computation proceeded on full-image coordinates.
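The crop extraction, crop-to-full coordinate transform, and center-containment filter used in the Model2 pipeline can be sketched as follows (an illustrative Python version; the study used MATLAB, and boxes are (x_left, y_top, width, height)).

```python
def crop_box(image, box):
    """Extract the vertebral-body subimage; image is a 2-D array-like
    (list of pixel rows), box = (x, y, w, h) from Model1."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def to_full_coords(pedicle_box_crop, vb_box):
    """Map a pedicle box detected inside the crop back to full-image
    coordinates by adding the crop offset:
    x_pd_full = x_pd_crop + x_vb_offset; y_pd_full = y_pd_crop + y_vb_offset."""
    x_c, y_c, w, h = pedicle_box_crop
    return (x_c + vb_box[0], y_c + vb_box[1], w, h)

def contains_center(outer, inner):
    """Geometric filter: keep `inner` only if its center lies inside `outer`."""
    cx = inner[0] + inner[2] / 2.0
    cy = inner[1] + inner[3] / 2.0
    return (outer[0] <= cx <= outer[0] + outer[2]
            and outer[1] <= cy <= outer[1] + outer[3])
```

After `to_full_coords`, the standard geometric filtering and VPR computation proceed on full-image coordinates, exactly as for Model1.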
Step 4 (VPR Computation): For each successfully matched vertebral body–pedicle pair at L2, L3, and L4, the level-wise VPR was computed using the formula defined in Section 2.6. The image-level VPR was obtained as the mean of available level-wise VPRs. Images for which fewer than three levels were successfully matched were excluded from downstream angle estimation.
Step 5 (Group Assignment and Angle Estimation): The projection group (n-group or p-group) was determined from the filename suffix of each test image. Based on the assigned group, the corresponding linear regression coefficients (a_n, b_n or a_p, b_p) estimated during training were applied to compute the projection angle estimate θ̂ = a · VPR + b. The absolute difference between the estimated angle and the ground-truth angle from the filename was recorded as the image-level absolute error, and the mean absolute error (MAE) was computed across all images within a fold.
Step 6 (Processing Speed Measurement): Inference processing speed was measured in frames per second (FPS) for each fold. The mean FPS was then calculated from the second frame onward across all test images within the fold, and fold-level FPS values were summarized as mean ± SD across five folds.
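The throughput measurement in Step 6 can be sketched in a few lines; this is an illustrative Python version (the study used MATLAB), with per-frame wall-clock times supplied in seconds.

```python
def mean_fps(frame_times_s):
    """Mean throughput in frames per second, computed from the second
    frame onward to exclude first-frame warm-up overhead."""
    steady = frame_times_s[1:]
    return len(steady) / sum(steady)

def mean_sd(values):
    """Fold-level summary as (mean, sample SD), e.g. across five folds."""
    n = len(values)
    m = sum(values) / n
    sd = (sum((v - m) ** 2 for v in values) / (n - 1)) ** 0.5
    return m, sd
```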
2.8. Linear Regression for Angle Estimation
The relationship between image-level VPR and projection angle was modelled separately for the n-group and the p-group using ordinary least-squares (OLS) linear regression: θ̂ = a · VPR + b, where a is the regression slope and b is the intercept. This separation is necessary because the direction of pedicle offset relative to the vertebral body is opposite for third-oblique (n-group) and fourth-oblique (p-group) projections, meaning that a single combined regression would be non-linear and poorly fitting. Regression coefficients (a_n, b_n) and (a_p, b_p) and coefficients of determination (R2_n, R2_p) were estimated per fold using training data only. Fold-level coefficients were summarized across five folds (mean, median, SD, IQR). In addition, a pooled regression was computed by concatenating all training data from all five folds to obtain globally stable coefficient estimates.
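The per-group OLS fit, angle estimation, and fold-level MAE can be sketched as follows (an illustrative Python version with hypothetical names; the study implementation was in MATLAB).

```python
def ols_fit(vprs, angles):
    """Ordinary least-squares fit of angle = a * VPR + b for one group
    (n-group or p-group), using training data only."""
    n = len(vprs)
    mx, my = sum(vprs) / n, sum(angles) / n
    sxx = sum((x - mx) ** 2 for x in vprs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(vprs, angles))
    a = sxy / sxx
    return a, my - a * mx

def estimate_angle(vpr_value, coeffs):
    """Apply the group-specific regression: angle = a * VPR + b."""
    a, b = coeffs
    return a * vpr_value + b

def mae(estimates, truths):
    """Mean absolute error over all images in a fold."""
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(estimates)

# Separate coefficients per projection group, fitted per fold, e.g.:
# coeffs = {"n": ols_fit(vpr_n, ang_n), "p": ols_fit(vpr_p, ang_p)}
```

The pooled regression simply calls `ols_fit` on the concatenated training data of all five folds.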
4. Discussion
This study demonstrates that a two-stage object detection approach combining vertebral body localization and dedicated pedicle detection achieves substantially better angle estimation accuracy compared with a single-stage three-class detector for lumbar oblique radiography. The overall MAE of 5.42° achieved by Model2 is close to the clinically tolerated positioning variation of approximately ±5° commonly accepted in routine spinal radiographic practice. This level of accuracy supports our methodological approach in a radiographic context.
The rationale for developing two models was grounded in the expectation that simultaneous multi-class detection of small structures such as pedicles in whole-image context would be inherently more challenging than single-class detection within a constrained crop. Model1 was first developed as a straightforward end-to-end pipeline; its relatively high fold-to-fold variability in AP@0.5 (range: 0.553–0.890) confirmed this hypothesis and directly motivated the two-stage Model2 design. The superiority of Model2 over Model1 in detection performance and angle estimation can be primarily attributed to the reduction in the detection search space. When pedicle detection is performed within a cropped region already containing the target vertebral body, the relative size of the pedicle in the image is substantially larger, reducing the challenges inherent in detecting small objects in full-field radiographs. This finding aligns with the broader principle that hierarchical or region-of-interest-based detection pipelines improve performance for anatomical structures with large inter-scale variability [13,18]. Our two-stage approach mirrors strategies applied in other spinal imaging studies—such as the YOLOv8-then-classification pipeline reported for lumbar spondylolisthesis detection [14]—and extends this concept to the projection radiograph domain.
The VPR metric proved to be a robust geometric index for projection angle estimation. High R2 values (0.832 for n-group, 0.870 for p-group) in both per-fold and pooled regressions confirm that the horizontal offset of the pedicle relative to the vertebral body width changes predictably and nearly linearly with projection angle across the clinically relevant 20–60° range. The separate treatment of n-group and p-group projections is essential: because the VPR–angle relationship operates in opposing directions for these two projection types, combining both into a single regression would substantially reduce estimation accuracy. In the present study, the angle group label was derived directly from the filename suffix of the synthetic training and test images. In a fully automated clinical pipeline, this label could be reliably assigned by a lightweight oblique-direction classifier or by exploiting DICOM header metadata.
The fold-to-fold variability in regression slope coefficients reflects sensitivity to the composition of each training fold. However, the near-identical R2 values in pooled regression versus per-fold estimates demonstrate that the linear VPR–angle relationship is a stable property of the underlying anatomy rather than an artifact of any particular data split. This stability provides confidence that the regression coefficients estimated from the synthetic training data will generalize to new synthetic images and, by extension, to clinical images in which the same vertebral–pedicle geometric relationship holds.
The processing speed trade-off between the two models has direct implications for clinical deployment scenarios. Model1 operates at 18.3 FPS, which is sufficient for near-real-time feedback during image acquisition. Model2 processes at approximately 2.9 FPS due to the sequential two-stage inference and the overhead of coordinate transformation. While this throughput is adequate for post-acquisition quality control workflows, it may not support real-time intraoperative guidance. Future engineering approaches including GPU-batched multi-vertebra crop inference, model quantization, and asynchronous pipeline execution could substantially improve Model2 throughput while preserving its accuracy advantage.
The use of synthetic radiographs generated from CT data is a principled solution to the training data bottleneck [17,19]. Ray-sum projection of CT volumes at controlled angles produces images with precisely known ground-truth projection angles, enabling supervised regression without the need to acquire multiple radiographs from the same patient at varying angles. The consistent R2 values across folds confirm that overfitting to any particular CT dataset is minimal. Extension to clinical radiographs would require adaptation strategies—such as domain adaptation or fine-tuning with a small set of annotated clinical images—to bridge the appearance gap between synthetic and real radiographs.
Limitations and Assumptions
This study has several limitations and underlying assumptions. Because training and validation were performed on CT-derived DRRs, a domain gap may exist relative to clinical radiographs (e.g., noise/scatter and exposure variability) [20]; therefore, clinical validation using lumbar oblique radiographs with reference projection angles (derived from calibrated acquisition geometry or CT-based reconstruction) will be required [21]. The VPR-based formulation also assumes that the vertebral–pedicle geometric relationship is sufficiently consistent across subjects; this assumption may be violated in severe deformity or post-surgical instrumentation, motivating confidence-based rejection and/or implant-aware filtering as future work. In addition, the current implementation excludes cases where all three levels (L2–L4) cannot be matched, which improves stability but reduces coverage; we will implement a fallback strategy that fuses one- or two-level VPR estimates with reliability weighting and will report both MAE and exclusion rate with and without this fallback. Finally, our results indicate a practical speed–accuracy trade-off (Model1: 18.27 FPS; Model2: 2.87 FPS), supporting a hybrid workflow in which Model1 provides near-real-time feedback during acquisition and Model2 provides high-precision post-acquisition quality control [22].
Despite these limitations, the present results establish VPR-based two-stage detection as a sound methodological foundation for automated lumbar oblique radiographic quality control. Beyond detection accuracy, the clinical value of automated angle estimation lies in its potential for integration across the radiographic workflow. At the point of acquisition, real-time angle feedback can prompt immediate repositioning, eliminating the need for repeat examinations and reducing patient radiation dose—a particular concern in the predominantly young, athletic population affected by spondylolysis [2,3]. After acquisition, automatic PACS annotation of the verified projection angle supports consistent radiological interpretation, because the apparent morphology of the Scotty dog sign varies with projection angle; a documented record of the actual angle allows the reporting radiologist to contextualize borderline findings. In longitudinal follow-up, angle-normalized comparison of serial radiographs reduces measurement variability attributable to positioning differences. In surgical planning, the verified oblique angle from preoperative radiographs enables geometrically corrected estimation of pars defect dimensions and screw trajectory [6], improving operative precision. Finally, from a broader AI-in-radiology perspective, this work contributes to the growing evidence that synthetic data generation combined with appropriate geometric modeling and hierarchical detection pipelines can produce clinically meaningful automated assessments of radiographic acquisition quality [7,15]. Integration of such tools into routine radiography workflows could ultimately reduce repeat examinations, lower patient radiation exposure, and support the standardization of imaging protocols across institutions.