Automatic Hip Detection in Anteroposterior Pelvic Radiographs—A Labelless Practical Framework

Automated detection of the region of interest (ROI) is a critical step in the two-step classification system in several medical image applications. However, key information such as model parameter selection, image annotation rules, and ROI confidence score are essential but usually not reported. In this study, we proposed a practical framework of ROI detection by analyzing hip joints seen on 7399 anteroposterior pelvic radiographs (PXR) from three diverse sources. We presented a deep learning-based ROI detection framework utilizing a single-shot multi-box detector with a customized head structure based on the characteristics of the obtained datasets. Our method achieved average intersection over union (IoU) = 0.8115, average confidence = 0.9812, and average precision with threshold IoU = 0.5 (AP50) = 0.9901 in the independent testing set, suggesting that the detected hip regions appropriately covered the main features of the hip joints. The proposed approach featured flexible loose-fitting labeling, customized model design, and heterogeneous data testing. We demonstrated the feasibility of training a robust hip region detector for PXRs. This practical framework has a promising potential for a wide range of medical image applications.


Introduction
The deep convolutional neural network (DCNN) has shown a significant breakthrough in many aspects of commercial image differentiation and identification. In recent years, DCNNs have also played important roles in medical image analysis [1,2]. For example, the ChestX-ray8 [3] and MURA [4] are two representative studies utilizing the state-of-the-art DCNN classification and visualization models to detect and locate disease patterns in the chest and musculoskeletal radiographs.
Some studies employ a more delicate "two-step" classification strategy, which first detects specific ROIs [5][6][7][8][9], followed by conventional classification methods [10][11][12]. A seminal work is the automatic knee osteoarthritis diagnosis in lateral knee radiographs, where knee regions are first identified [13], followed by classification and heatmap visualization [14]. The advantage of this "two-step" approach is its capability to identify subtle localized abnormalities, and it has gradually become the mainstream technology, especially for the analysis of PXRs, including fracture subclass identification [15], hip osteoarthritis grading [16], and avascular necrosis detection [17]. Nonetheless, the above studies barely mentioned the model parameter settings and selection criteria, and none of them reported the confidence score for the detected ROIs, although the confidence score is a crucial metric indicating the likelihood that the predicted ROI contains the correct object.
A critical component for a successful "two-step" classification system is accurate ROI detection, which falls into computer vision object detection tasks [18], usually tackled by different strategies [19]. Among these methods, the bounding-box-based methodology is advantageous for its lower annotation workload and simple implementation, which is proven to be effective in popular computer vision applications in other sectors. In order to identify multiple objects across different scales in one image, one must generate anchor boxes of varied sizes and aspect ratios for hyper-parameter optimization. However, there is usually a small number of non-overlapping objects in medical images. It is not optimal to apply the same object detection parameters on different underlying applications.
In this work, we propose a labor-less practical framework of ROI detection and parameter selection in medical images. To the best of our knowledge, this is the first work that provides a systematic guideline for parameter selection based on the obtained datasets, and it has promising potential for a wide range of medical image applications for further personalized medicine.

Dataset Acquisition
This retrospective study analyzed hip joints seen on 7399 PXRs from three diverse sources: the Chang Gung Memorial Hospital Osteoarthritis (CGOA) dataset containing 4290 high-resolution radiographs; the Osteoarthritis Initiative Hip (OAIH) dataset, a pelvic radiograph dataset extracted from a subset of data from the OAI [20], containing 3008 radiographs with relatively lower resolutions; and the Google Image Search (GIS) dataset containing 101 heterogeneous radiographs. Table 1 lists the summary statistics of these datasets. This experimental design, which uses radiographs from diverse sources with different imaging protocols, resolutions, and ethnicities, helps ensure that model generalization can be achieved. Figure 1 shows the overview of the proposed framework. Clinical readings on etiology and grading of all CGOA images were performed by one physician with 15 years of clinical experience. To annotate hip regions of interest, we employed three annotators trained to place square bounding boxes approximately centered at the femoral head or the artificial hip joint with customized GUI software. Identifying a complete round femoral head in healthy hips is relatively straightforward; however, for cases with disrupted hip conditions and collapsed femoral heads, we labeled in a loose-fitting manner to make sure every hip joint lay appropriately in the bounding box. All the labeled ROIs in the CGOA dataset were visually reviewed by physicians, and the ROI annotators used the same rules to annotate the remaining OAIH and GIS datasets.

Proposed SSD Model Architecture for ROI Detection in Hip Radiographs
The proposed hip region detection architecture simplifies the existing SSD model architecture (Figure 2) [9], which was originally developed for detecting multiple objects with different sizes and aspect ratios in general object detection applications. For ROI detection in medical images, we replaced the SSD VGG-16 backbone with a ResNet-101 [11] backbone pre-trained on ImageNet [21]. All these modifications could reduce the number of candidate ROI detections from several thousand to a few hundred, decreasing training time and complexity while increasing detection accuracy and confidence.

To best determine the anchor box parameter settings, we first defined the ROI ratio as the side length of the square ROI divided by the length of the long side of the input image (zero-padded to a square if needed). This ratio serves as a normalizer, making the anchor boxes and ROI instances compatible across different datasets. Next, we analyzed the image size distributions (Figure 3A) and the ROI ratio distributions (Figure 3B) of the three available heterogeneous datasets, where the ratios lie mostly between 10% and 30%.
We specified an input image size of 224 × 224 pixels split into 7 × 7 grid cells, where each grid cell is 32 × 32 pixels. We set 6 equally spaced scale parameters {0.7, 1.0, 1.3, 1.6, 1.9, 2.2} (Figure 3C) so that the side lengths of the smallest and largest anchor boxes cover 10% and 31.4% of the image side length, respectively. This design ensures that the designed anchor boxes can identify appropriate hip ROIs in the datasets.
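The anchor arithmetic described here can be checked with a minimal sketch. The input size, grid, and scale values come from the text; the function names, the choice to center one anchor per scale on every grid cell, and the use of a single prediction layer are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of the anchor-scale arithmetic (names are illustrative, not from the paper).
INPUT_SIZE = 224                          # input image side in pixels (from the text)
GRID = 7                                  # 7 x 7 grid cells (from the text)
SCALES = [0.7, 1.0, 1.3, 1.6, 1.9, 2.2]   # scale parameters (from the text)


def anchor_side_ratios(input_size=INPUT_SIZE, grid=GRID, scales=SCALES):
    """Return each square anchor's side length as a fraction of the image side."""
    cell = input_size / grid              # 32 pixels per grid cell
    return [s * cell / input_size for s in scales]


def square_anchors(input_size=INPUT_SIZE, grid=GRID, scales=SCALES):
    """Enumerate 1:1 anchors as (cx, cy, side) in pixels.

    Assumption: one anchor per scale is centered on every grid cell.
    """
    cell = input_size / grid
    anchors = []
    for row in range(grid):
        for col in range(grid):
            cx, cy = (col + 0.5) * cell, (row + 0.5) * cell
            for s in scales:
                anchors.append((cx, cy, s * cell))
    return anchors


ratios = anchor_side_ratios()
print([round(r, 3) for r in ratios])      # smallest is 0.1, largest is 0.314
print(len(square_anchors()))              # 7 * 7 * 6 anchors
```

Under these assumptions the smallest and largest anchors span 10% and about 31.4% of the image side, and the layout yields 294 candidate boxes, which is in line with the "few hundred" detections mentioned for the simplified architecture.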

Data Preprocessing, Training, and Evaluation
For data preprocessing, each radiograph was zero-padded to a square image and resized to 224 × 224 pixels with 8-bit grayscale before being fed into the model. The model was implemented with the fastai v0.7 library [22] and Python 3.6.4. We randomly split the combined CGOA and OAIH dataset once into 90% for training and 10% for validation, and used all 101 GIS radiographs as the independent test dataset. We fixed the same training and validation images in either the combined dataset or each individual dataset in all experiments for fair comparison. For evaluation, we used the standard IoU metric for comparing the predicted bounding box B_pred and the ground truth bounding box B_gt:
$$\mathrm{IoU}(B_{\mathrm{pred}}, B_{\mathrm{gt}}) = \frac{|B_{\mathrm{pred}} \cap B_{\mathrm{gt}}|}{|B_{\mathrm{pred}} \cup B_{\mathrm{gt}}|}$$
where ∩ and ∪ denote intersection and union, respectively. For each predicted bounding box, we reported the associated confidence, which denotes the likelihood that the anchor box contains an object. We also reported the average IoU, average confidence, minimal confidence, and AP50; an IoU below the 0.5 cutoff indicates poor ROI detection, which may cause issues for downstream analysis.
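As an illustration of the IoU metric used for evaluation, the following is a minimal sketch for axis-aligned boxes. The (x_min, y_min, x_max, y_max) box format, the function names, and the example coordinates are hypothetical. The `hit_rate` helper is only a simplified stand-in for counting detections at the 0.5 cutoff, not a full average-precision computation.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hit_rate(ious, threshold=0.5):
    """Fraction of predictions with IoU >= threshold (simplified, not full AP)."""
    return sum(v >= threshold for v in ious) / len(ious)


# Hypothetical 50 x 50 pixel square ROI and a prediction shifted by 5 pixels.
b_gt = (50, 60, 100, 110)
b_pred = (55, 65, 105, 115)
print(round(iou(b_gt, b_pred), 4))  # intersection 2025, union 2975
```

In this example the two boxes share 2025 of 2975 pixels, an IoU of about 0.68, so the prediction would count as a hit at the 0.5 cutoff.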

Demographics of the Study Population
The original CGOA cohort contained 4643 high-resolution radiographs, including 3013 patients who underwent hip surgery, with an average age of 63.06 ± 15.72 years and 40.8% being male, and 1630 control cases from the emergency room who did not undergo hip surgery, with an average age of 44.88 ± 20.46 years and 68.2% being male. Among the 3013 surgical patients, 353 cases with severe fractures were excluded due to completely different morphology and treatment options. The remaining 2660 trauma patients, including hundreds of occult fracture cases, and the 1630 control cases constituted the CGOA dataset. The second dataset, OAIH, was a consolidated pelvic radiograph dataset extracted from a subset of data from the OAI project, which recruited 4796 participants from February 2004 to May 2006 to form a baseline cohort (58% female, ranging in age from 45 to 79 years at the time of recruitment). The third dataset, GIS, was acquired through the Google image search engine, and its demographics are not available.

Model Performance and Visualization
In Table 2, we take a closer look at the best performance results and carefully examine those cases where hip ROIs had IoU < 0.5. As the AP50 was 1 in both the training and validation sets and 0.9901 in the independent GIS test set, we identified only two cases below the IoU 0.5 cutoff, which may indicate poor ROI detection and cause issues for downstream analysis.
Table 2. Detailed performance metrics with the optimal parameters using the proposed hip region detection architecture. [Table columns: Datasets; Number of Images]
We further examined other radiographs in the heterogeneous test set, and the hip ROI detection showed several representative results, as Figure 4 presents. Figure 4A shows a radiograph with some text outside the key hip area. Figure 4B shows a dislocation of the left hip, where the detected hip ROI still covers most key features of the left hip. Figure 4C shows a radiograph with plates on the left pubic ramus and acetabulum, where the hips are detected correctly. Figure 4D shows a radiograph of a pediatric patient. Figure 4E shows that a left artificial hip can be detected correctly. In Figure 4F, the hip ROI covers a right proximal femoral fracture. Figure 4G shows a right temporal cemented prosthesis fracture and a left total hip replacement, and the hip ROIs can be detected. Finally, as shown in Figure 4H, the hip ROI was detected for a right acetabular fracture with plate fixation and a destroyed femoral head. These results suggest that our model, with specially designed anchors and trained on diverse datasets, is a general and robust hip region detector that is applicable to a wide range of heterogeneous datasets with different qualities and resolutions and is potentially useful for automated assessment of many hip bone conditions.

Discussion
In this work, we have demonstrated a practical framework for detecting regions of interest in medical images. With the case study of hip detection in PXRs, we achieved an average IoU over 80% and an average confidence higher than 95%. The independent test set showed promising ROI detection results on GIS radiographs with heterogeneous resolutions and appearance. The proposed hip region detection architecture simplified the existing SSD model architecture, which was originally developed for detecting multiple objects with different sizes and aspect ratios in general object detection applications. For ROI detection in medical images, there are usually only one or two important organs in one radiograph. It is therefore feasible to use a simplified SSD architecture with only one feature layer as the only convolutional predictor, with an appropriate receptive field size, one aspect ratio (1:1 for the hip ROI), and a small set of scales.
Compared to traditional object detection tasks, which need to recognize multiple objects with different sizes and aspect ratios in images and videos, the proposed SSD architecture has the advantages of a simpler structure, higher IoU accuracy, and reliable confidence. Determining these otherwise empirical parameter settings now relies on basic statistics of the available datasets to generate enough anchor boxes. Our results suggest that more anchors do not necessarily lead to higher IoU and may even decrease the prediction performance. The proposed method provides a more effective approach for anchor design and parameter optimization.
Annotation by doctors is time-consuming and is usually the bottleneck for medical image analysis. The approximate identification of hip regions by automated and accurate ROI detection is critical for automated computer-assisted analysis for screening and diagnostics. The proposed framework provides a guideline for parameter settings in anchor-based object detection algorithms, and it is especially useful for applications such as joint identification in medical images. Several studies have reported good results [14][15][16][17]. However, the heavy labeling workload and the cost of physicians' labels are further considerations that have limited the wider adoption of this method. Our study provided a manual annotation method based on approximate identification of hip regions that can be performed effectively and inexpensively.
Medical artificial intelligence is progressing toward changing the healthcare system, and various DCNNs have shown that it is feasible to detect lesions from pathologic images [23] and radiographs [24]. These algorithms achieved outstanding results in disease detection and prediction, with performance not inferior to that of physicians [23][24][25]. These results suggest that DCNNs might help individuals in the healthcare sector in different ways. However, the development of medical AI remains hard to access because of several limitations. Data cleaning and accurate labels are considered fundamental for deep learning because of the limited size and data quality of medical images [19] and the high cost of having a medical expert perform labeling [26]. Moreover, the hip ROI detection system can help the physician label lesions in a weakly supervised way: it picks out the hip regions and saves the physician the time needed to crop and copy the images. Lowering the barrier for outsiders and attracting more physicians and scientists to this emerging technological field are other issues to be considered in the real world. In this study, we developed a diagnostic assistance system and created a useful tool for reducing the workload during data collection and tuning. With our tool, we can simplify the labeling workload, minimize the computational requirements, and eventually let physicians use it in the way they need. There are numerous existing programs [27,28] that can help orthopedic surgeons plan their surgical strategy. Our algorithm might accelerate these programs by reducing computational requirements in the future. The utility of such ROI detection approaches highly depends on the downstream applications.
With input from clinical physicians' expertise, this automated hip ROI detection enables applications such as fracture identification, osteoarthritis assessment, osteoporosis, and even surgical prediction in the future. The evaluation of such applications and integrated systems remains an open research topic for future work.

Limitation
Our study provided a feasible framework of automated ROI labeling. However, there are still some limitations in the existing method. First, the manual hip annotation with loose-fitting criteria is not unique and can vary from person to person, especially for cases with destroyed hips. In these situations, a closer visual examination is needed. Second, because of the data distribution, we excluded most images from patients with endomedullary prostheses to keep the training data consistent. Therefore, we did not have these kinds of images for further validation, which might limit the usability of this algorithm. Lastly, the limited amount of medical image data might influence the performance of this algorithm. Adding data from other sources might improve performance and reduce the risk of overfitting.

Conclusions
In conclusion, with the proposed DCNN framework, we can identify the hip joint with high accuracy, reliability, and reproducibility. It offers a clear approach for ROI detection in plain radiographs and has practical usefulness for future applications in medical imaging. Additional data and analysis of destroyed hips might improve the performance of this algorithm. The downstream application of hip ROI detection is a further research direction; with our tool, we can simplify the labeling workload and eventually adjust the algorithm to fulfil physicians' needs, toward the aim of personalized healthcare.