The spine, which consists of vertebrae, is the main load-bearing component of the body, and its skeletal status influences a person’s quality of life. Osteoporotic fractures, particularly vertebral fractures, can be associated with chronic disabling pain and can even directly affect a person’s survival and life expectancy. Clinical diagnosis of osteoporosis and assessment of fracture risk are mainly based on the areal bone mineral density (BMD) of trabecular bone in the spine and/or hip measured using dual-energy X-ray absorptiometry (DEXA) [1]. However, a number of clinical studies have demonstrated the limitations of BMD measurements. It has been recognized that BMD can account for only 60% of the variation in bone strength [2]. Recently, researchers found that deterioration of the bone structure, especially structural changes in trabecular bone, accompanies the loss of bone mass [3]. Structural deterioration and loss of bone mass both reduce bone quality and increase fracture susceptibility, indicating that bone structure also plays a key role in bone strength.
Microcomputed tomography (micro-CT), the gold standard for measuring bone microstructure, is an imaging system with micron-level resolution that can generate three-dimensional (3D) images of internal microstructures [4]. However, micro-CT scanners cannot be applied to objects larger than 10 cm in diameter (e.g., the human torso), which precludes their use for in vivo imaging and diagnosis. Clinical multidetector computed tomography (MDCT) is widely used in the imaging diagnosis of spinal diseases, but it does not allow accurate measurement of bone microstructure. Previously published in vitro studies have investigated the feasibility of using MDCT to measure bone structure, with some parameters exhibiting only a moderate correlation with those from micro-CT [5]. The trabecular thickness (Tb.Th) is approximately 100 microns, which is far smaller than the finest detail that MDCT images can resolve, approximately 200–500 microns [8]. An ideal imaging instrument for analyzing trabecular structure would therefore need to resolve details finer than the thinnest trabeculae [9]. However, there is still a lack of suitable in vivo methods for measuring the microstructure of vertebrae. Therefore, we sought a method to enhance the resolution of MDCT images and thereby obtain more information about patients’ bone structure, which would help improve the accuracy of osteoporosis diagnosis and related fracture prediction.
In the medical imaging field, image enhancement methods have recently been used to improve the visualization of important details [10], for example, defects of retinal blood vessels [12] and indications of tuberculosis [13]. Essentially, three kinds of methods are used in medical image enhancement: example-based methods [14], convolutional neural network (CNN)-based methods [15] and conditional generative adversarial network (CGAN)-based methods [19]. However, most of these methods can accommodate only mappings of local regions or low-resolution images and lack stability when applied to high-resolution images [21]. To enhance vertebral images with the goal of accurately measuring bone microstructure, we needed to map the structure, orientation and other specific features of trabecular bone between two sets of images (i.e., micro-CT and MDCT) with large resolution differences. Pix2pixHD [24], a CGAN-based method, consists of a coarse-to-fine generator and multiscale discriminators and is designed to generate high-detail, high-resolution images of more than 2048 × 1024 pixels, which fits our research needs. Therefore, we used pix2pixHD to enhance MDCT images of vertebrae.
In this study, intact vertebrae from human cadavers were imaged using clinical MDCT and micro-CT imaging protocols. Our aims were to (1) train the pix2pixHD model, with micro-CT images as the gold standard and the corresponding MDCT images as inputs, to enhance vertebral images and obtain micro-CT-like images; (2) use objective image quality metrics to compare the performance of the proposed model with that of two other models, pix2pix and CRN, to determine which method is the most suitable for enhancing vertebral images; (3) compare pix2pixHD-derived micro-CT-like images with micro-CT images by subjective assessment to evaluate the quality of the micro-CT-like images; and (4) assess the accuracy of trabecular bone structure metrics generated from pix2pixHD-derived micro-CT-like images, using micro-CT images as the gold standard, to further examine whether the proposed method is clinically applicable.
2. Materials and Methods
This study was performed with 5 sets of lumbar spines (segments L1 to L5, 25 vertebrae in total) harvested from 5 formalin-fixed human cadavers (3 males and 2 females; mean age, 75 years; age range, 68–88 years). The donors had donated their bodies for educational and research purposes to the local Institute of Anatomy prior to death, in compliance with local institutional and legislative requirements. Lumbar vertebrae with significant compression fractures, bone neoplasms or other causes of significant bone destruction were to be excluded; none of the specimens met these criteria, so all 25 were included in the experiment. The lumbar spine with surrounding muscle was cut into individual segments using a band saw, with the pedicles and vertebral appendages preserved as much as possible. The samples were immersed in phosphate-buffered saline (PBS) solution at 4 °C for 24 h prior to scanning to minimize trapped gas. The study protocol was reviewed and approved by the local institutional review boards.
2.2. Imaging Techniques
The specimens were scanned by micro-CT (Inveon, Siemens, Erlangen, Germany) and MDCT (SOMATOM Definition Flash, Siemens, Erlangen, Germany). The parameters of micro-CT imaging were 80 kVp/500 mAs, the field of view on the xy plane was 80 × 80 mm², the standard matrix size used was 1536 × 1536 pixels, the number of slices was 1024 at an effective pixel size of 52 μm and the exposure time was 1500 ms in each of the 360 rotational steps. The MDCT imaging parameters were 120 kVp/250 mAs, the field of view was 100 × 100 mm², the slice thickness was 0.6 mm, the slice interval was 0.1 mm, the pitch was 0.8 and the standard matrix size used was 512 × 512 pixels. The two scans provided image stacks on the axial plane that covered the entire vertebrae.
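As a quick consistency check on the acquisition parameters above, the in-plane pixel size follows from the field of view divided by the matrix size. The short sketch below is illustrative only; the function name is hypothetical and is not part of either scanner's software.

```python
def pixel_size_um(fov_mm: float, matrix: int) -> float:
    """In-plane pixel size in micrometres: field of view divided by matrix size."""
    return fov_mm / matrix * 1000.0


# Values taken from the imaging parameters reported in the text.
micro_ct_px = pixel_size_um(80.0, 1536)   # about 52 um, in line with the reported pixel size
mdct_px = pixel_size_um(100.0, 512)       # about 195 um
ratio = mdct_px / micro_ct_px             # how much larger an MDCT pixel is per side

print(round(micro_ct_px, 1), round(mdct_px, 1), round(ratio, 2))
```

Under these parameters, one MDCT pixel spans several micro-CT pixels per side, which is the resolution gap the enhancement model has to bridge.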
2.3. Image Co-Registration
Because the two scans were acquired independently, the MDCT and micro-CT images were not aligned. To obtain micro-CT-like images, the first step is to achieve slice matching between MDCT and micro-CT axial images [25]. The micro-CT slice interval was approximately 0.05 mm, and the MDCT slice interval was approximately 0.1 mm. For each of the 25 vertebrae, after removing images with incomplete vertebral structures and images involving the upper and lower endplates, we selected a region 2.5 cm in height on the vertebra, obtaining approximately 500 micro-CT images and 250 MDCT images. There were thus twice as many micro-CT images as MDCT images.
Subsequently, the MDCT and micro-CT images were compared one by one, and the best image mappings were obtained using the dynamic time warping (DTW) algorithm [27] and the scale-invariant feature transform (SIFT) [28]. Then, the MDCT images were duplicated according to the mapping relationships to obtain MDCT and micro-CT image pairs. Applying this method to each of the 25 vertebrae yielded a total of 25 × 500 = 12,500 image pairs. The image pairs were stored in database_0. Figure 1 illustrates the process described above.
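The slice-matching idea can be sketched with a plain DTW alignment over a slice-to-slice dissimilarity matrix. This is a minimal sketch, not the authors' pipeline: in the study the dissimilarity would come from SIFT feature matching, whereas here each slice is reduced to a single made-up scalar descriptor, and the names `dtw_path`, `mdct_desc` and `micro_desc` are hypothetical.

```python
def dtw_path(cost):
    """Return the minimal-cost monotonic alignment path for a cost matrix.

    cost[i][j] is the dissimilarity between MDCT slice i and micro-CT slice j.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] holds the best accumulated cost of aligning the first i and j slices.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + best_prev
    # Backtrack from the last cell to recover which slices were paired.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


# Toy descriptors (assumed): 3 MDCT slices and 6 micro-CT slices of the same region.
mdct_desc = [0.0, 2.0, 4.0]
micro_desc = [0.0, 0.0, 2.0, 2.0, 4.0, 4.0]
cost = [[abs(a - b) for b in micro_desc] for a in mdct_desc]
pairs = dtw_path(cost)
print(pairs)
```

In this toy case every MDCT slice index appears twice in `pairs`, which corresponds to duplicating each MDCT image so that it is paired with two micro-CT images.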
2.4. Construction of Training Set and Testing Set
In our study, we assumed that images from the obtained 12,500 image pairs can be treated as individual samples from the micro-CT domain and MDCT domain. In other words, the relationship between these samples and the vertebrae to which they belong was ignored during training and testing. This assumption is supported by the following reasons:
Characteristics of the selected model. In this paper, we intended to map MDCT images to micro-CT-like images using an image-to-image method named pix2pixHD. This method is a supervised paired image learning method that maps images from the source MDCT domain to the target micro-CT domain and does not consider the continuity within the image domain. Image pairs are randomly selected for tuning the model during training, and no images of a particular vertebra are fed into the training as a set. In other words, in the framework of the selected technique, all image pairs are considered independent during training, and the correlation between different slices of images within a vertebra is ignored.
Diversity within each vertebra. Because the images differ from slice to slice inside a vertebra (see Figure 2), the images within a vertebra do not follow the same distribution. This diversity is even more pronounced in the presence of vertebral appendages. For effective training, we therefore used the image pairs from all slices of the vertebrae as the basic units for model training.
Therefore, the training and testing processes operated on image pairs only, with no notion of a “vertebra”. Constructing the training and test sets by random sampling further breaks up any sequential information. The training and test sets obtained on this basis can be considered to be independent.
Based on the above analysis, we obtained the test set and training set by random sampling. So that every vertebra was represented in both sets, for each vertebra, 100 image pairs (20%) were randomly selected for the testing set, and the remaining 400 image pairs (80%) were used for the training set [29]. Random sampling ensured that sequential information was removed and that the training and test sets covered most parts of the vertebrae, which reduces the risk of underfitting or overfitting in the trained model. In this way, the 12,500 image pairs in database_0 were divided into training (dataset_training) and test (dataset_testing) sets.
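The per-vertebra 80/20 sampling described above can be sketched as follows. This is an illustrative sketch under stated assumptions: image pairs are represented by hypothetical `(vertebra, slice)` identifiers rather than actual images, and the function name, seed and data layout are not taken from the study.

```python
import random


def split_per_vertebra(pairs_by_vertebra, test_fraction=0.2, seed=0):
    """Randomly assign a fixed fraction of each vertebra's image pairs to the test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for _, pairs in sorted(pairs_by_vertebra.items()):
        shuffled = list(pairs)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Toy stand-in for database_0: 25 vertebrae x 500 (vertebra, slice) identifiers.
database_0 = {v: [(v, s) for s in range(500)] for v in range(25)}
dataset_training, dataset_testing = split_per_vertebra(database_0)
print(len(dataset_training), len(dataset_testing))
```

Sampling within each vertebra, rather than over the pooled 12,500 pairs, guarantees that each vertebra contributes exactly 400 training and 100 test pairs.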
2.5. Model Training
Pix2pixHD [24] is a model based on a CGAN that can generate high-resolution micro-CT-like images from input MDCT images by learning the complex mapping function between the two. The framework of pix2pixHD consists of a coarse-to-fine generator and multiscale discriminators. The coarse-to-fine generator contains a global generator network and a local enhancer network, where the global generator network focuses on coarse and global features of images (such as external contours and geometric structures) and the local enhancer network focuses on local details (such as the texture and direction of bone trabeculae). Similar ideas but different architectures can be found in [30]. The multiscale discriminators, which are used to train the coarse-to-fine generator, are three identically structured networks that focus on different scales of detail. The network framework of the pix2pixHD model is shown in Figure 3.
The pix2pixHD model was trained using PyTorch on a Windows Server 2019 workstation with two Nvidia A6000 graphics processing units (GPUs). The batch size was set to 10. The maximum number of epochs was set to 200, and there were 200 iterations in each epoch. We compared our method with two other mature methods: CRN [33] and pix2pix [21]. We trained these two models with their default settings.
2.6. Objective Assessment of Image Quality
After training, the pix2pixHD model was validated by objective metrics based on the designed testing set (dataset_testing), as were the pix2pix and CRN methods. The objective metrics are described below.
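Structural similarity index measure (SSIM) [34]: The SSIM computes the perceptual distance between micro-CT-like images and the gold standard (i.e., micro-CT images). In this paper, we used the simplified version of the SSIM:
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
where $\mu_x$ and $\mu_y$ are the average values of input images $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ are their variances, $\sigma_{xy}$ is their covariance, $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$ are small constants (the default values of $K_1$ and $K_2$ are 0.01 and 0.03, respectively), and $L$ is the dynamic range of the pixel values (255 for 8-bit grayscale CT images).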
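As an illustration, the SSIM can be computed from global image statistics as sketched below. The sketch assumes the standard simplified form, with constants C1 = (K1·L)² and C2 = (K2·L)², K1 = 0.01, K2 = 0.03 and L = 255. It uses a single window covering the whole image, whereas practical implementations usually average the index over local windows, so it is not necessarily the computation used in the study. The function name and toy pixel values are hypothetical.

```python
def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM of two equally sized images given as flat lists of pixel values."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Unbiased sample variances and covariance.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Toy 8-bit "images" (assumed values), flattened to lists.
reference = [10, 200, 30, 180, 50, 220, 15, 190]
inverted = [255 - p for p in reference]
print(ssim_global(reference, reference), ssim_global(reference, inverted))
```

An image compared with itself yields an SSIM of 1, and any structural disagreement lowers the value, which is why a higher SSIM indicates greater similarity.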
Fréchet inception distance (FID) [35]: The FID measures the distance between a generated micro-CT-like image and the corresponding micro-CT image by extracting a feature vector with 2048 elements using a trained Inception-V3 model. The FID formula is as follows:
$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)$$
where $\mu_r$ and $\mu_g$ are the mean values of the features of the real and generated images, respectively, and $\Sigma_r$ and $\Sigma_g$ are the covariance matrices of the features of the real and generated images, respectively.
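The Fréchet distance between two sets of feature vectors can be sketched with NumPy as below, assuming the standard form: the squared distance between the feature means plus Tr(Σr + Σg − 2(ΣrΣg)^(1/2)). The sketch starts from already-extracted features; the Inception-V3 feature extraction is not shown, and the 8-dimensional random features stand in for the 2048-element vectors purely for illustration. The function and variable names are hypothetical.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets (rows = samples)."""
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    # Tr((Sigma_r Sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of Sigma_r Sigma_g, which are real and non-negative for
    # covariance matrices; tiny numerical negatives are clipped to zero.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = float(np.sum((mu_r - mu_g) ** 2))
    return mean_term + float(np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt)


# Toy features (assumed): 200 samples of 8-dimensional vectors.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 8))
shifted = real + 3.0  # same covariance, mean shifted by 3 in every dimension

print(frechet_distance(real, real), frechet_distance(real, shifted))
```

Identical feature sets give a distance of approximately zero, and a pure mean shift contributes only through the first term, which illustrates why a lower FID indicates greater similarity.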
These two indexes evaluate the similarity between two images from different perspectives. The SSIM tends to evaluate similarity in terms of structure, and a higher SSIM indicates a higher similarity of the images [36]. In contrast, the FID tends to evaluate similarity in terms of details, and a lower FID indicates a higher similarity of the images [35]. The above two objective metrics validated the generated micro-CT-like images from a computer imaging perspective. By comparing the two metrics from the results of the three methods (pix2pixHD, pix2pix and CRN), we could ascertain the effectiveness of the three methods and determine which method better enhances vertebral images.
2.7. Subjective Assessment of Image Quality
Subjective assessment of image quality was performed by three radiologists (Observer 1, J.D., 6 years of experience in musculoskeletal imaging; Observer 2, Z.Q., 5 years of experience in musculoskeletal imaging; Observer 3, W.C., 3 years of experience in musculoskeletal imaging) through image scoring. The detailed procedure was as follows: to prevent visual fatigue of the observers, which could affect the fairness of the scoring results, we randomly selected 30 micro-CT images and 30 pix2pixHD-derived micro-CT-like images and arranged them in a sequence as the experimental collection. Each image was assigned a unique identification number. The sequence was anonymized and presented to the three observers independently in a blinded and random fashion. To provide comparable results, all images were displayed using the same graphics software, and all images were consistent in size, window level and window width. Contrast was rated on a 3-point scale, and noise, sharpness, shadow and texture were rated on a 5-point scale to assess image quality. These ratings are further described in Table 1.
2.8. Assessment of the Trabecular Bone Microstructure
To measure the bone microstructure, we needed to obtain continuous axial images to form a cylindrical volume of interest (VOI). After training the model, we input all the original MDCT images of the 25 vertebrae from database_0 into the pix2pixHD model to obtain continuous micro-CT-like images. We then paired the micro-CT-like images with the original micro-CT images of the 25 vertebrae. Next, two cylindrical VOIs (approximately 15 mm in diameter and 5 mm in height) for each vertebra (n = 50 in total) were defined on both the micro-CT and micro-CT-like images. The positioning of the VOIs is shown in Figure 4. The same VOI setting was also used for MDCT images to calculate bone structure parameters as a control group.
Trabecular microstructure analysis of the micro-CT-like and micro-CT images was performed using the BoneJ plug-in [37] in Fiji [38]. Fiji is a distribution of the image processing package ImageJ2 (National Institutes of Health, USA) [39]. The micro-CT-like images of the vertebrae were processed together with the micro-CT images as 8-bit image stacks in Fiji. The micro-CT and micro-CT-like grayscale image pairs were binarized into bone and marrow phases using a global (histogram-derived) thresholding method, the IsoData algorithm [41]. The underlying assumption of this method is that the histogram intensity distribution is bimodal, exhibiting bone and background peaks. The midpoint between the two peaks was used as the threshold value. Then, the following structural parameters were calculated: bone volume fraction (BV/TV), trabecular thickness (Tb.Th) and trabecular spacing (Tb.Sp). BV/TV was derived through simple voxel counting: all foreground voxels were assumed to represent bone, and the number of foreground voxels was divided by the total number of voxels in the image. Tb.Th and Tb.Sp were calculated as direct measures without model assumptions. Foreground voxels were considered to be trabeculae, and background voxels were regarded as the spacing [42]. BoneJ was used to calculate the mean and the standard deviation of Tb.Th and Tb.Sp directly from the pixel values in the resulting thickness maps.
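The binarization and voxel-counting steps can be sketched as follows. This is a minimal sketch of the classic iterative intermeans (IsoData-style) threshold followed by BV/TV as a foreground fraction; it is not the Fiji or BoneJ implementation, whose details may differ. The function names, convergence tolerance and toy voxel values are assumptions.

```python
def isodata_threshold(values, tol=0.5):
    """Iterate the threshold to the midpoint of the mean intensities of the two classes."""
    t = sum(values) / len(values)  # start from the global mean intensity
    while True:
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if not low or not high:
            return t  # degenerate case: only one intensity class is present
        new_t = 0.5 * (sum(low) / len(low) + sum(high) / len(high))
        if abs(new_t - t) < tol:
            return new_t
        t = new_t


def bone_volume_fraction(values, threshold):
    """BV/TV by voxel counting: foreground (bone) voxels divided by all voxels."""
    foreground = sum(1 for v in values if v > threshold)
    return foreground / len(values)


# Toy VOI (assumed): 70 marrow voxels at gray level 30 and 30 bone voxels at 200.
voxels = [30] * 70 + [200] * 30
threshold = isodata_threshold(voxels)
bv_tv = bone_volume_fraction(voxels, threshold)
print(threshold, bv_tv)
```

For this bimodal toy histogram the threshold settles at the midpoint between the two peaks, and the resulting BV/TV equals the proportion of bone voxels.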
The Kolmogorov–Smirnov test was used to analyze normality, and the Levene test was used to analyze the homogeneity of variance among the measurement data. Data showing a Gaussian distribution are reported as the mean ± standard deviation. For objective image analysis, because the data did not satisfy homogeneity of variance, the Kruskal–Wallis test was used to assess the differences in the SSIM and FID among the three methods. For the subjective assessment, Kendall’s coefficient of concordance (Kendall’s W) was calculated to evaluate interobserver agreement on the subjective image evaluation scores for each of the 5 aspects. We considered Kendall’s W values of less than 0.20 to be indicative of poor agreement, values between 0.20 and 0.40 to indicate fair agreement, values between 0.40 and 0.60 to indicate moderate agreement, values between 0.60 and 0.80 to indicate good agreement and values greater than 0.80 to indicate excellent agreement. Then, the Mann–Whitney U test was performed to compare the subjective assessment scores between micro-CT and pix2pixHD-derived micro-CT-like images. For trabecular bone microstructure analysis, the paired Student’s t-test was used to determine the statistical significance of differences between micro-CT and micro-CT-like images for each structural parameter. Parameters derived from the micro-CT and micro-CT-like images were correlated using Pearson’s correlation coefficient. These statistical analyses were performed using SPSS 26.0 software (SPSS Inc., Chicago, IL, USA), and a p-value < 0.05 was considered statistically significant.
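Kendall's W can be sketched from the rank sums as W = 12S / (m²(n³ − n)), where m is the number of observers, n is the number of images and S is the sum of squared deviations of the per-image rank sums from their mean. The sketch below assigns average ranks to tied scores but omits the tie correction that statistical packages typically apply, so with heavily tied ordinal scores it will generally differ from the reported values. The function names and toy scores are hypothetical.

```python
def average_ranks(scores):
    """Rank scores from 1..n, giving tied scores the average of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for ratings[observer][image], without tie correction."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2.0
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


# Toy scores (assumed): three observers rating five images identically.
perfect = [[1, 2, 3, 4, 5]] * 3
# Two observers who rank three images in exactly opposite order.
opposed = [[1, 2, 3], [3, 2, 1]]
print(kendalls_w(perfect), kendalls_w(opposed))
```

The statistic ranges from 0 (no agreement among observers) to 1 (complete agreement), which is the scale the agreement categories refer to.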
In this paper, we used a deep-learning-based method, pix2pixHD, to find mappings between MDCT and micro-CT axial images to generate micro-CT-like images of vertebrae. Integrating image mapping and texture accuracy enhancement between two sets of images with very different textures and details, such as MDCT images and micro-CT images, is still a challenge, and to our knowledge, this is the first attempt to map micro-CT and MDCT images using the deep-learning-based pix2pixHD method.
Comparison of the three methods using objective image assessment metrics demonstrated that the pix2pixHD method produced superior micro-CT-like images compared to the other two methods, with sufficient similarity between the generated images and the corresponding micro-CT images. This similarity was reflected not only in the overall vertebral body but also in the local details and anatomical subtleties of the images. The pix2pixHD method outperformed the other methods because it adopts a multiscale generator and discriminators that consider both the overall structure and local details. In contrast, the CRN and pix2pix models were not designed for high-resolution, high-detail medical image enhancement; they do not have an adequate receptive field and suffer from severe overlapping shadow and blurring problems when processing high-resolution images [24].
All three observers showed high agreement on all subjective image quality scores, and the contrast and overlapping shadow scores of pix2pixHD-derived micro-CT-like images were not significantly different from those of micro-CT images. This means that the generated images were comparable to micro-CT images in both aspects. This result arises because pix2pixHD’s generator and discriminators are both built on a multiscale architecture and can generate high-detail, high-resolution images of more than 2048 × 1024 pixels, which fully covers our image size.
Micro-CT-like images also have some shortcomings, with slightly inferior performance in terms of noise, sharpness and visualization of trabecular bone texture compared to micro-CT images. We reviewed the micro-CT-like images with relatively low noise scores and found that noise occurred mainly in the vertebral appendages (including the pedicles and laminae), as shown in Figure 7, which are characterized by a thicker bone cortex or markedly heterogeneous increases in bone density at localized positions. This outcome may be due to the complex interleaving of pixels representing bone in these regions. The objective function [43] used by the model was insensitive to noise in this case. Fortunately, osteoporosis is mainly associated with the vertebral body, and noise at the above anatomical positions does not directly affect the accuracy of bone structure measurements in the vertebral body. Nevertheless, the pedicle is the clinical entry point for pedicle screws in spinal decompression and fixation fusion, especially in posterior internal fixation systems. Furthermore, studies have demonstrated that the bone quality of this component determines the stability of internal fixation [44]. Hence, in the future, we plan to use additional auxiliary means to improve the accuracy of bone structure in the vertebral appendages.
In addition, the observers rated the sharpness and trabecular texture of micro-CT-like images lower than those of micro-CT images (p < 0.001), which is consistent with the trend of our objective metric results (SSIM and FID) and trabecular bone measurement results (Tb.Th and Tb.Sp). This result arises because the method is based on image-by-image mappings, with insufficient consideration of the correlation between adjacent images. As a consequence, some trabecular details showed implausible gaps and abnormal textures, which reduced the corresponding scores in the subjective evaluation. To address these problems, in future work we need to increase the number of samples, build models that can extract association information between images and optimize the parameters of the training models.
Regarding all trabecular bone structural measurements (BV/TV, Tb.Th and Tb.Sp) in our study, the correlation between values computed from micro-CT and pix2pixHD-derived micro-CT-like images was very high (R > 0.88) and better than the correlation between values computed from micro-CT and MDCT images. The mean values of the measurements in our study were lower than those of the gold standard. Previously published in vitro studies on the feasibility of bone structure measurements using MDCT on vertebral bodies reported similar results for BV/TV. Issever et al. [5] reported a coefficient of determination (R²) of 0.86 for BV/TV measured in vertebral specimens. However, the correlations for Tb.Th and Tb.Sp in the results of Issever et al. [5] were considerably weaker (R² = 0.19–0.26), which is consistent with our MDCT image results. Guha [6] and Chen [7] explored the correlation between trabecular bone structural measurements from MDCT and the corresponding micro-CT images using in vitro tibial and distal radius specimens. Although the anatomical positions of the study specimens were different, the correlation coefficients of Tb.Th and Tb.Sp were also only moderate (R < 0.80, Pearson). In addition, note that the mean values of MDCT-derived Tb.Th measured in the existing studies were greater than those of the gold standard, which is the same as for our MDCT images but the opposite of the pattern we found for micro-CT-like images. Scholars have concluded [5] that this is a result of the relatively low image resolution of MDCT, which causes the thinner trabeculae to be lost in the images and the trabecular network to be blurred. Unlike existing studies, our method can recover trabecular bone with widths below the resolution limit of MDCT by modeling the implicit mapping relationships between MDCT and micro-CT images. Through this method, we obtained Tb.Th and Tb.Sp values that correlated very strongly with those of the gold standard (R > 0.90). Notably, the bone structure metrics derived from our micro-CT-like images were lower than those of the gold standard. This is mainly because our trained pix2pixHD model still has some deficiencies in the extraction of image features during the generation of the mapped images, making the grayscale values of the pixels in and around the bone trabeculae fluctuate. This fluctuation directly affected the bone structure measurement process; in particular, it caused local disappearance, fragmentation and displacement of trabecular bone during the binarization process. As a result, the measured thickness of trabecular bone was reduced, and the number of bone trabeculae was increased.
In summary, our chosen method is more suitable for the task of generating high-resolution micro-CT-like images than previous methods are. Nevertheless, prior to implementation in clinical practice, the following improvements should be made in future studies. Firstly, the relationship between images needs to be captured by a 3D mapping model. Thus, the fineness of the bone trabecular texture can be further enhanced. Secondly, the relationship between bone structure metrics and bone biomechanical metrics needs to be analyzed. In the future, we plan to perform mechanical experiments on bone samples to determine the relationship between the bone structural metrics of generated micro-CT-like images and bone strength in a more detailed way. This relationship could be used to further enhance the significance of bone structural metrics studies for clinical applications, such as the diagnosis of osteoporotic fragility fractures.
Continued increases in life expectancy are predicted to increase the population with osteoporosis, and associated fracture rates are expected to increase as well. Therefore, it is essential to identify fracture risks to plan therapeutic interventions and monitor treatment responses. In addition, as the age of the population undergoing spinal instrumentation increases, clinicians need to consider bone quality more carefully than ever before and tailor surgical techniques to optimize patient outcomes and reduce the probability of postoperative complications [47]. Although our results are currently at the in vitro stage, with the expansion of the sample size, the inclusion of in vivo experiments and the maturation of the deep learning algorithm, it will be possible to obtain more accurate bone structural parameters while performing conventional CT scans in the future. Additionally, the bone density and bone structure measurements of vertebrae can be obtained simultaneously through the use of a commercial calibration phantom during MDCT scanning. These composite metrics may provide a new predictive basis for osteoporotic fractures and a new reference for surgical planning and drug selection.