Measuring Identiﬁcation and Quantiﬁcation Errors in Spectral CT Material Decomposition

: Material decomposition methods are used to identify and quantify multiple tissue components in spectral CT but there is no published method to quantify the misidentiﬁcation of materials. This paper describes a new method for assessing misidentiﬁcation and mis-quantiﬁcation in spectral CT. We scanned a phantom containing gadolinium (1, 2, 4, 8 mg/mL), hydroxyapatite (54.3, 211.7, 808.5 mg/mL), water and vegetable oil using a MARS spectral scanner equipped with a poly-energetic X-ray source operated at 118 kVp and a CdTe Medipix3RX camera. Two imaging protocols were used; both with and without 0.375 mm external brass ﬁlter. A proprietary material decomposition method identiﬁed voxels as gadolinium, hydroxyapatite, lipid or water. Sensitivity and speciﬁcity information was used to evaluate material misidentiﬁcation. Biological samples were also scanned. There were marked differences in identiﬁcation and quantiﬁcation between the two protocols even though spectral and linear correlation of gadolinium and hydroxyapatite in the reconstructed images was high and no qualitative segmentation differences in the material decomposed images were observed. At 8 mg/mL, gadolinium was correctly identiﬁed for both protocols, but concentration was underestimated by over half for the unﬁltered protocol. At 1 mg/mL, gadolinium was misidentiﬁed in 38% of voxels for the ﬁltered protocol and 58% of voxels for the unﬁltered protocol. Hydroxyapatite was correctly identiﬁed at the two higher concentrations for both protocols, but mis-quantiﬁed for the unﬁltered protocol. Gadolinium concentration as measured in the biological specimen showed a two-fold difference between protocols. In future, this methodology could be used to compare and optimize scanning protocols, image reconstruction methods, and methods for material differentiation in spectral CT.


Introduction
Spectral (multi-energy) computed tomography is an extension of dual-energy computed tomography (DECT) or single energy CT. In spectral CT, a single broad X-ray spectrum is compartmentalized into separate energy bins by photon counting detectors to extract energy We compare two different imaging protocols using one sparse solution material differentiation method for post material decomposition assessment by measuring the sensitivity (true positive rate) and specificity (true negative rate) of individual high-Z material at each concentration in the material image domain.

MARS Spectral Scanner
MARS spectral CT uses the Medipix3RX photon-counting X-ray detector technology [14,[38][39][40]. The scanner is suitable for preclinical spectral imaging of small animal models and human samples of diseased tissue. It consists of a rotating gantry, detector, X-ray source and sample holder shaft, computer hardware and software as shown in the scanner schematic in Figure 1a. It is the photon processing X-ray detector, Medipix3RX [41], that allows for spectral computed tomography imaging by measuring the number and energy of each X-ray photon, to provide specific identification and quantification (g/cm 3 ) of the component material. Medipix3RX allows simultaneous acquisition of up to eight energy bins at a single X-ray exposure. Energy ranges are user-defined, therefore offering flexibility and customization for a wide range of medical and industrial applications. To reduce the effect of charge sharing [42], the charge summing feature is included in the Medipix3RX ASIC which allows the total charge for a single interaction to be summed and allocated to the closest pixel [43]. To do this, each pixel communicates with its four neighbours to locate the pixel with the highest charge for a co-incident event, the total charge is then allocated to this pixel [42]. The aim of this study is to generate a quantitative metric using MD images of a calibration phantom which can be used to quantify correct material identification at various concentrations. We compare two different imaging protocols using one sparse solution material differentiation method for post material decomposition assessment by measuring the sensitivity (true positive rate) and specificity (true negative rate) of individual high-Z material at each concentration in the material image domain.

MARS Spectral Scanner
MARS spectral CT uses the Medipix3RX photon-counting X-ray detector technology [14,[38][39][40]. The scanner is suitable for preclinical spectral imaging of small animal models and human samples of diseased tissue. It consists of a rotating gantry, detector, X-ray source and sample holder shaft, computer hardware and software as shown in the scanner schematic in Figure 1a. It is the photon processing x-ray detector, Medipix3RX [41], that allows for spectral computed tomography imaging by measuring the number and energy of each X-ray photon, to provide specific identification and quantification (g/cm 3 ) of the component material. Medipix3RX allows simultaneous acquisition of up to eight energy bins at a single x-ray exposure. Energy ranges are user-defined, therefore offering flexibility and customization for a wide range of medical and industrial applications. To reduce the effect of charge sharing [42], the charge summing feature is included in the Medipix3RX ASIC which allows the total charge for a single interaction to be summed and allocated to the closest pixel [43]. To do this, each pixel communicates with its four neighbours to locate the pixel with the highest charge for a co-incident event, the total charge is then allocated to this pixel [42]. The threshold settings for the selection of energy bins have a major impact on spectral image quality in terms of image contrast and noise level. The wider the energy bin width is, the lower the noise level is, but the poorer the reconstructed spectral image contrast is. This is mainly because Medipix3RX detector works as a single photon counting detector using user-specified energy thresholds to gather data for a certain energy range. Figure 1b shows the mass attenuation coefficients of calcium, gadolinium (K-edge energy = 50.2 keV), tissue and adipose and simulated x-ray spectrum with an external 0.375 mm brass filter. The threshold settings for the selection of energy bins have a major impact on spectral image quality in terms of image contrast and noise level. The wider the energy bin width is, the lower the noise level is, but the poorer the reconstructed spectral image contrast is. This is mainly because Medipix3RX detector works as a single photon counting detector using user-specified energy thresholds to gather data for a certain energy range. Figure 1b shows the mass attenuation coefficients of calcium, gadolinium (K-edge energy = 50.2 keV), tissue and adipose and simulated X-ray spectrum with an external 0.375 mm brass filter.
The MARS scanner incorporated a microfocus poly-energetic X-ray source and a Medipix3RX detector within a continuous rotating gantry. We used a 2 mm thick CdTe sensor (14 × 14 mm 2 ), bump-bonded at 110 µm to a Medipix3RX readout chip.

Experimental SETUP
The 720 circular projections over 360 • were acquired using a Source-Ray SB-120-350 X-ray tube (Source-Ray Inc., Ronkonkoma, NY, USA) with a tungsten anode having 1.8 mm of aluminium (equivalent) intrinsic filtration; the focal spot size of the X-rays was~50 µm. Two imaging protocols were used for post-MD assessment: protocol-1 employed a 0.375 mm external brass filter to mimic a clinical CT X-ray energy range and protocol-2 featured no external filtration as often used in pre-clinical X-ray CT imaging (shown in Table 1). In this study, four charge summing counters were used for each of the protocols to set low energy thresholds. Vertical dotted lines in Figure 1b is an illustrative example of protocol-1. The source to object (SOD) and source to detector distances (SDD) were 200 and 250 cm, respectively, providing a magnification factor of~1.25. Five vertical positions of the CdTe camera were used to create a virtual detector of the greater area to cover the 31 mm field of view (FOV). The bias voltage applied to the sensor was −600 V. The MARS camera readout system using Medipix3 has been described previously [44]. Threshold equalization with respect to the noise edge and energy calibration of the detector was performed prior to the measurements. Threshold equalization involves per-pixel calibration to reduce the intrinsic variations in the threshold levels associated with each counter and provides a consistent individual pixel response to a uniform X-ray flux [45]. The energy response of the detector was calibrated against the reference energies such as X-ray fluorescence (XRF) emitted from metallic foils (Mo and Pb) and γ-rays emitted from an Am-241 radioisotope [31].

Imaging Samples
Two phantoms were used in this study. The first phantom (as shown in Figure 2) was a calibration phantom (Gd/HA phantom) to demonstrate the application of our proposed technique on two different image acquisition protocols. The second phantom with a biological specimen was used to show the use of the technique to exploit the advantages of monitoring changes in detectability of multiple high-Z materials for their correct identification. The calibration phantom and the two biological specimens were scanned with two imaging protocols as described in Table 1 of 8 mm and length of 7 mm, were used to quantify gadolinium uptake in articular cartilage. The sample preparation setup was similar to that used in a previous study using iodinated contrast [9]. Samples were incubated for 24 h in Gd contrast agent solution (100% MultiHance) at 37 °C and then rinsed in PBS for 30 s. A non-Gd incubated sample was used as a control. Both incubated and control samples were placed side by side in a 15-mL Falcon tube, with PBS infused cotton wool at both ends, sealed to maintain a humid environment during scanning.

Image Processing and Reconstruction
Flat field measurements (1440 for each camera position and energy bin) were taken after the sample scan to correct for fixed-pattern variations in individual pixel responses. The raw data in DICOM format was transferred from the scanner's server to the scanner's inbuilt patient archive and communication system (PACS), where an automated image processing was performed.
The image processing has three stages: pre-reconstruction processing, reconstruction into attenuation volumes, and decomposition into material volumes (see Section 2.5). In this case, the image processing produced slices with 0.1 mm 2 voxels on a 480 × 480 matrix.
The pre-reconstruction processing consists of three steps; the first step is pixel masking to remove any bad pixels; the second step is flat field correction [32,46]; the final step is ring filtration. The ring filter is based on the work by Jan Sijbers and Andrei Postnov [47], but applied to the projection data and adapted into 3D to exploit more information provided by the detector (x, y, and theta).
The reconstruction uses a low-resolution version of the polychromatic form of the Beer-Lambert Law. What this means is that the overlapping, low-threshold counters that measure the data are processed simultaneously to produce attenuation volumes representing non-overlapping energy bins across the measured spectrum. For example, four counters representing the energy range 30-120 keV, 45-120 keV, 60-120 keV and 78-120 keV would simultaneously produce four attenuation volumes representing the energy ranges 30-45 keV, 45-60 keV, 60-78 keV and 78-120 keV.
Processing the counters simultaneously exploits all photons available in the scan. Consider the attenuation volume for 78-120 keV. All four counters contribute to this volume. The counter for 60-120 keV provides specific information about the energy range but has fewer photons (higher statistical noise). The counter for 30-120 keV is less specific but contains all the photons in the scan (lower statistical noise). Given that the counter's measurements are acquired simultaneously, we know each counter maps the same structures. This reconstruction process uses all photons available in the wider counters to reconstruct clean boundaries and structures, while using the narrower counters to guarantee that the correct energy response is obtained.
The reconstruction algorithm itself is a statistical iterative technique. It uses multiplicative correction terms similar to OSEM [48], or MART [49]. The statistical element is introduced by weighting the corrections to the volume by a normalized Poisson distribution to slow down

Image Processing and Reconstruction
Flat field measurements (1440 for each camera position and energy bin) were taken after the sample scan to correct for fixed-pattern variations in individual pixel responses. The raw data in DICOM format was transferred from the scanner's server to the scanner's inbuilt patient archive and communication system (PACS), where an automated image processing was performed.
The image processing has three stages: pre-reconstruction processing, reconstruction into attenuation volumes, and decomposition into material volumes (see Section 2.5). In this case, the image processing produced slices with 0.1 mm 2 voxels on a 480 × 480 matrix.
The pre-reconstruction processing consists of three steps; the first step is pixel masking to remove any bad pixels; the second step is flat field correction [32,46]; the final step is ring filtration. The ring filter is based on the work by Jan Sijbers and Andrei Postnov [47], but applied to the projection data and adapted into 3D to exploit more information provided by the detector (x, y, and theta).
The reconstruction uses a low-resolution version of the polychromatic form of the Beer-Lambert Law. What this means is that the overlapping, low-threshold counters that measure the data are processed simultaneously to produce attenuation volumes representing non-overlapping energy bins across the measured spectrum. For example, four counters representing the energy range 30-120 keV, 45-120 keV, 60-120 keV and 78-120 keV would simultaneously produce four attenuation volumes representing the energy ranges 30-45 keV, 45-60 keV, 60-78 keV and 78-120 keV.
Processing the counters simultaneously exploits all photons available in the scan. Consider the attenuation volume for 78-120 keV. All four counters contribute to this volume. The counter for 60-120 keV provides specific information about the energy range but has fewer photons (higher statistical noise). The counter for 30-120 keV is less specific but contains all the photons in the scan (lower statistical noise). Given that the counter's measurements are acquired simultaneously, we know each counter maps the same structures. This reconstruction process uses all photons available in the wider counters to reconstruct clean boundaries and structures, while using the narrower counters to guarantee that the correct energy response is obtained.
The reconstruction algorithm itself is a statistical iterative technique. It uses multiplicative correction terms similar to OSEM [48], or MART [49]. The statistical element is introduced by weighting the corrections to the volume by a normalized Poisson distribution to slow down convergence when approaching the solution, thereby reducing noise. The reconstruction algorithm also adopts a multi-stage approach where it initially reconstructs voxels that are eight times larger than requested. Later on, this is repeatedly subdivided for a total of four stages until the requested voxel size is reached [50]. This approach allows for the reconstruction to proceed quickly. It is also a weak form of a sparsity constraint as a large voxel is the same as a set of small voxels with the same value. Lastly, the larger the voxel, the more pixels from the projection images will contribute to it. This reduces the effect of dead regions. This is particularly useful during the initial reconstruction stages where the effects of dead regions are the most significant.
To evaluate the spectroscopic response, CT slices corresponding to a thickness of~0.5 mm were averaged and then analyzed. The CT numbers were quantified in spectral Hounsfield Units [51] by normalizing CT attenuation (commonly known as linear attenuation coefficient µ) to water and air to account for the differing attenuation at different energies as defined by Equation (1) HU Energy bin (E) = 1000 where µ mat is the attenuation measurement of the given material in the given energy bin, i.e., The relationship between CT number and energy bins was analyzed graphically by manually selecting ROIs comprising~1409 voxels (N) over each solution insert. The standard error was calculated by the normalized standard deviation (stddev/ √ N). The relationship between signal intensity and material concentration was also determined for all four energy bins by linear regression. This was achieved by plotting CT number as a function of concentration.

Material Decomposition
Material decomposition is the process of converting spectral attenuation (energy information) into information about the constituent materials contributing to that attenuation. This typically involves obtaining material properties inverting the mass attenuation equation for one of its variants (i.e., volume fraction [52], Compton-photoelectric [20], or ρZ [53,54]). The weighted average X-ray attenuation of a compound or mixture of the constituent materials is given by In this equation, µ E is the linear attenuation of some composite material for a given energy range E. The material properties desired are the densities of the constituent materials (indexed by m). These are proportionally connected to the linear attenuation through the mass attenuation of the respective materials (µ/ρ) mE for the given energy ranges. Since attenuation varies differently over energy for different materials, such material properties can be deduced when using multiple energies.
The MARS small animal spectral CT scanner incorporates a heuristic image space material decomposition algorithm (MARS-MD) for converting reconstructed energy bins into sparse material images [55][56][57]. This algorithm follows a three-step process for assigning materials to each voxel: (1) each voxel classified as either high, low, or insignificantly attenuating; (2) a calculation of uniquely constrained least squares decomposition for each feasible pair of materials falling into the respective attenuation category; and (3) a classification of the result which has the smallest regression error. The output of this process is that every voxel is assigned the 1-2 materials with the most significant contribution to the voxel's attenuation. The algorithm was calibrated using mass attenuation coefficients estimated from the reconstructed phantom data. For more details on this algorithm we direct the reader to our draft paper [57]. Using this algorithm, the spectral data analyzed in this study was decomposed at a voxel level into sparse combinations of gadolinium, hydroxyapatite, fat, and water. We are exploiting the single high-Z property of the MD algorithm to generate a quantitative metric in decomposed material images which can be used to both measure and monitor changes in detectability and correct identification of multiple high-Z materials over a wide range of concentrations.
To measure the amount of misidentification between different materials, a metric was generated for quantitative evaluation of post-MD volumes of the calibration phantom which provides sensitivity (true positive rate) and specificity (true negative rate) for correct material identification at various material concentrations. As the composite material contains high-Z material in solution form (i.e., in water), all voxels of high-Z material density images should also be identified as water on the water density image. Therefore, water density image was taken as a reference for our metrics. In addition, voxels in all material images with value >0 were masked for further processing.
In our metric, sensitivity refers to the test's ability to quantify successful voxel identification in the target material's region of interest (ROI). Manual selection of a ROI comprising 1409 voxels (N) over each of the given solution inserts of the target material image was compared with the corresponding voxels in the water density image, as shown in Figure 3. Corresponding voxels with values equal to 1 in both images (target material and water density) were identified as successful voxels (voxels that are identified as the correct material). To measure the amount of misidentification between different materials, a metric was generated for quantitative evaluation of post-MD volumes of the calibration phantom which provides sensitivity (true positive rate) and specificity (true negative rate) for correct material identification at various material concentrations. As the composite material contains high-Z material in solution form (i.e., in water), all voxels of high-Z material density images should also be identified as water on the water density image. Therefore, water density image was taken as a reference for our metrics. In addition, voxels in all material images with value >0 were masked for further processing.
In our metric, sensitivity refers to the test's ability to quantify successful voxel identification in the target material's region of interest (ROI). Manual selection of a ROI comprising 1409 voxels (N) over each of the given solution inserts of the target material image was compared with the corresponding voxels in the water density image, as shown in Figure 3. Corresponding voxels with values equal to 1 in both images (target material and water density) were identified as successful voxels (voxels that are identified as the correct material). The percentage of successful voxels identification in the target material's ROI provides the true positive voxels (TP) such that Here is a voxel and R shows the ROI, TM_v is the target material image voxel, and W_v is the water material image voxel. Similarly, the percentage of unsuccessful voxels provide the false negative voxels (FN) in the target material's ROI such that The sensitivity (identification as a percentage) of every voxel of each material at each of the given concentration was calculated by equation (5).

100
(5) The specificity refers to the test's ability to quantify voxels which are correctly identified as not the target material in the control ROIs. The specificity describes that the control material will be identified in the target material image when misidentification occurs.
Manual selection of ROIs in the target material image comprising 1409 voxels (N) over each of the given solution inserts of the control material was compared with the corresponding voxels in the The percentage of successful voxels identification in the target material's ROI provides the true positive voxels (TP) such that Here v is a voxel and R shows the ROI, TM_v is the target material image voxel, and W_v is the water material image voxel. Similarly, the percentage of unsuccessful voxels provide the false negative voxels (FN) in the target material's ROI such that The sensitivity (identification as a percentage) of every voxel of each material at each of the given concentration was calculated by Equation (5). The specificity refers to the test's ability to quantify voxels which are correctly identified as not the target material in the control ROIs. The specificity describes that the control material will be identified in the target material image when misidentification occurs.
Manual selection of ROIs in the target material image comprising 1409 voxels (N) over each of the given solution inserts of the control material was compared with the corresponding voxels in the water density image as demonstrated in Figure 4. Corresponding voxels in control solutions in the target material image with values not equal to 1 were identified as successful voxels (correctly identified as not containing the target material). water density image as demonstrated in Figure 4. Corresponding voxels in control solutions in the target material image with values not equal to 1 were identified as successful voxels (correctly identified as not containing the target material). The percentage of successful voxels which are correctly identified as not the target material in the control ROI provides the true negative voxels (TN) such that where _ shows control material image voxel. Similarly, the percentage of unsuccessful voxels (incorrectly identified as containing the target material) provide the true false positive voxels (FP) such that Specificity of every voxel of each material at each of the given concentration is calculated by Equation (8) 1 100 It is important to mention that (Equation (5)) and (Equation (8)) should sum to the total number of the selected voxels. However, the specificity of target material to water cannot be determined by Equation (8) as the water basis is used as a reference material. Therefore, percentage of voxels in the composite materials which are only classified as water are calculated as (9) Finally, the relationship between the known and measured concentrations of gadolinium and hydroxyapatite was analyzed in the material domain images by manually selecting ROIs comprising ~1409 voxels (N) over each solution insert. The percentage of successful voxels which are correctly identified as not the target material in the control ROI provides the true negative voxels (TN) such that

Results
where CM_v shows control material image voxel. Similarly, the percentage of unsuccessful voxels (incorrectly identified as containing the target material) provide the true false positive voxels (FP) such that Specificity of every voxel of each material at each of the given concentration is calculated by Equation (8) It is important to mention that TP + FN (Equation (5)) and TN + FP (Equation (8)) should sum to the total number of the selected voxels. However, the specificity of target material to water cannot be determined by Equation (8) Finally, the relationship between the known and measured concentrations of gadolinium and hydroxyapatite was analyzed in the material domain images by manually selecting ROIs comprising 1409 voxels (N) over each solution insert.

Results
To calibrate biological data against the phantom containing known concentrations of known materials, calibration phantom images were evaluated in the energy domain (spectroscopic images) as well as the material domain (material segmentation and quantification) before they could be used for calibration purposes.
The overall image quality produced by the MARS scanner was adequate for the analysis task, i.e., no major artefacts (beam hardening, partial volume effect, photon starvation or undersampling) were observed in the images. A few ring artefacts were observed towards the center of the reconstructed images. Figure 5 shows the spectral response of the transverse slices of the Gd/HA phantom with 4 energy bins for two imaging protocols.
Appl. Sci. 2018, 8, x FOR PEER REVIEW 9 of 20 as well as the material domain (material segmentation and quantification) before they could be used for calibration purposes.
The overall image quality produced by the MARS scanner was adequate for the analysis task, i.e., no major artefacts (beam hardening, partial volume effect, photon starvation or undersampling) were observed in the images. A few ring artefacts were observed towards the center of the reconstructed images. Figure 5 shows the spectral response of the transverse slices of the Gd/HA phantom with 4 energy bins for two imaging protocols.  Figure 6 shows the relationship between CT number and energy bins. The CT number of the gadolinium and hydroxyapatite decreases with energy due to the decreasing influence of the photoelectric effect and increasing Compton scattering, however, for both protocols, a higher value for CT numbers is noticeable due to the influence of k-edge at gadolinium's k-edge energy range. Moreover, the CT number of the lipid (effective Z ≈ 6.5) increases uniformly with energy. The spectral separation between different materials is the key information required for the segmentation task of the material decomposition.
The linear fitted lines in Figure 7 indicate the system's linearity response over a range of concentration and energy bins ( 0.99 . Linearity determines the ability of a system to detect the presence or absence of any material. This information is important for material quantification and is therefore used to correlate between the known (contrast material) and unknowns (suspected regions). It can be affected by random noise, beam hardening, and photon starvation [58]. The non-zero CT number at zero concentration can be related to possible imperfections of the photon counting detectors or contributions of random (quantum) noise and systematic effects of the system.  Figure 6 shows the relationship between CT number and energy bins. The CT number of the gadolinium and hydroxyapatite decreases with energy due to the decreasing influence of the photoelectric effect and increasing Compton scattering, however, for both protocols, a higher value for CT numbers is noticeable due to the influence of k-edge at gadolinium's k-edge energy range. Moreover, the CT number of the lipid (effective Z ≈ 6.5) increases uniformly with energy. The spectral separation between different materials is the key information required for the segmentation task of the material decomposition.
The linear fitted lines in Figure 7 indicate the system's linearity response over a range of concentration and energy bins (R 2 ≥ 0.99). Linearity determines the ability of a system to detect the presence or absence of any material. This information is important for material quantification and is therefore used to correlate between the known (contrast material) and unknowns (suspected regions). It can be affected by random noise, beam hardening, and photon starvation [58]. The non-zero CT number at zero concentration can be related to possible imperfections of the photon counting detectors or contributions of random (quantum) noise and systematic effects of the system. Figures 8 and 9 show identification of hydroxyapatite, gadolinium, lipid and water in Gd/HA phantom with protocol-1 and protocol-2 respectively. Some misidentification in material voxels (crosstalk) is observed between different materials, especially at the lower concentrations of gadolinium and hydroxyapatite decomposed images. It is important to highlight that despite clear material separation in attenuation profiles of gadolinium and hydroxyapatite in energy images (see Figures 6  and 7), the misidentification of material type and its concentration is difficult to assess visually in the images in Figures 8 and 9. It is also observed that the phantom body (PMMA) was significantly misidentified as gadolinium with protocol-2 (see Figure 9c), hence the need for a voxel by voxel assessment method for quantifying the degree of misidentification.   Figure 8 and Figure 9 show identification of hydroxyapatite, gadolinium, lipid and water in Gd/HA phantom with protocol-1 and protocol-2 respectively. Some misidentification in material voxels (crosstalk) is observed between different materials, especially at the lower concentrations of gadolinium and hydroxyapatite decomposed images. It is important to highlight that despite clear   Figure 8 and Figure 9 show identification of hydroxyapatite, gadolinium, lipid and water in Gd/HA phantom with protocol-1 and protocol-2 respectively. Some misidentification in material voxels (crosstalk) is observed between different materials, especially at the lower concentrations of visually in the images in Figure 8 and 9. It is also observed that the phantom body (PMMA) was significantly misidentified as gadolinium with protocol-2 (see Figure 9c), hence the need for a voxel by voxel assessment method for quantifying the degree of misidentification.    visually in the images in Figure 8 and 9. It is also observed that the phantom body (PMMA) was significantly misidentified as gadolinium with protocol-2 (see Figure 9c), hence the need for a voxel by voxel assessment method for quantifying the degree of misidentification.   For post material decomposition evaluation, images corresponding to a thickness of~2 mm were analyzed slice by slice by using Equation (5) for sensitivity, Equation (8) for the specificity of the target material with high-Z materials and Equation (9) for the specificity of the target material with water. Results were then averaged and plotted in the MD misidentification charts as shown in Figures 10  and 11. water. Results were then averaged and plotted in the MD misidentification charts as shown in Figure  10 and 11.
Once the material separation was quantified, the relationship between the known and measured concentrations of gadolinium and hydroxyapatite was determined from material images. Results indicate that despite good spectral and linear correlation of gadolinium in energy images with protocol-2 (Figure 6c and 7c), the measured concentration of gadolinium is underestimated for protocol-2 (Figure 12a) whereas, protocol-1 indicates a good correlation between the known and measured concentrations over the full range of gadolinium 0.99 . Hydroxyapatite concentrations were observed to be linear with both protocols. Material misidentification and misquantification results from Figures 10-12 are summarized in Table 2. These results demonstrate that the methodology can be used to compare how different scanning protocols using one material differentiation method affect the measurement of the concentration of materials. Figure 10. MD identification chart for protocol-1 shows quantitative misidentification of voxels between gadolinium, hydroxyapatite, lipid, and water. For the 8 mg Gd/mL solution insert, not a single voxel was misidentified (100% sensitivity) as a different material. For 4 mg Gd/mL, ~98% voxels were correctly identified as gadolinium and 2% were misidentified as HA. Similarly, for 1 mg Gd/ml solution insert, ~62% voxels were identified as gadolinium, ~18% were misidentified as HA, ~10% misidentified as lipid and water. For HA calibration rods, no misidentification is observed for 808 mg HA/mL and 212 mg HA/mL but ~4% voxels from 54 mg HA/mL was misidentified as Gd. A corresponding table in the right is shown to look up individual value in the chart. Figure 11. MD identification chart for protocol-2 shows quantitative misidentification of voxels between gadolinium, hydroxyapatite, lipid and water. For given gadolinium solutions, sensitivity decreased from ~94% for 8 mg Gd/mL to 42% for 1 mg Gd/mL solution insert. For HA, no misidentification was observed in any of the given calibration rods. A corresponding table in the right is shown to look up individual value in the chart. Figure 10. MD identification chart for protocol-1 shows quantitative misidentification of voxels between gadolinium, hydroxyapatite, lipid, and water. For the 8 mg Gd/mL solution insert, not a single voxel was misidentified (100% sensitivity) as a different material. For 4 mg Gd/mL,~98% voxels were correctly identified as gadolinium and 2% were misidentified as HA. Similarly, for 1 mg Gd/mL solution insert,~62% voxels were identified as gadolinium,~18% were misidentified as HA,~10% misidentified as lipid and water. For HA calibration rods, no misidentification is observed for 808 mg HA/mL and 212 mg HA/mL but~4% voxels from 54 mg HA/mL was misidentified as Gd.
A corresponding table in the right is shown to look up individual value in the chart.
water. Results were then averaged and plotted in the MD misidentification charts as shown in Figure  10 and 11. Once the material separation was quantified, the relationship between the known and measured concentrations of gadolinium and hydroxyapatite was determined from material images. Results indicate that despite good spectral and linear correlation of gadolinium in energy images with protocol-2 (Figure 6c and 7c), the measured concentration of gadolinium is underestimated for protocol-2 (Figure 12a) whereas, protocol-1 indicates a good correlation between the known and measured concentrations over the full range of gadolinium 0.99 . Hydroxyapatite concentrations were observed to be linear with both protocols. Material misidentification and misquantification results from Figures 10-12 are summarized in Table 2. These results demonstrate that the methodology can be used to compare how different scanning protocols using one material differentiation method affect the measurement of the concentration of materials. Figure 10. MD identification chart for protocol-1 shows quantitative misidentification of voxels between gadolinium, hydroxyapatite, lipid, and water. For the 8 mg Gd/mL solution insert, not a single voxel was misidentified (100% sensitivity) as a different material. For 4 mg Gd/mL, ~98% voxels were correctly identified as gadolinium and 2% were misidentified as HA. Similarly, for 1 mg Gd/ml solution insert, ~62% voxels were identified as gadolinium, ~18% were misidentified as HA, ~10% misidentified as lipid and water. For HA calibration rods, no misidentification is observed for 808 mg HA/mL and 212 mg HA/mL but ~4% voxels from 54 mg HA/mL was misidentified as Gd. A corresponding table in the right is shown to look up individual value in the chart. Figure 11. MD identification chart for protocol-2 shows quantitative misidentification of voxels between gadolinium, hydroxyapatite, lipid and water. For given gadolinium solutions, sensitivity decreased from ~94% for 8 mg Gd/mL to 42% for 1 mg Gd/mL solution insert. For HA, no misidentification was observed in any of the given calibration rods. A corresponding table in the right is shown to look up individual value in the chart. Figure 11. MD identification chart for protocol-2 shows quantitative misidentification of voxels between gadolinium, hydroxyapatite, lipid and water. For given gadolinium solutions, sensitivity decreased from~94% for 8 mg Gd/mL to 42% for 1 mg Gd/mL solution insert. For HA, no misidentification was observed in any of the given calibration rods. A corresponding table in the right is shown to look up individual value in the chart.
Once the material separation was quantified, the relationship between the known and measured concentrations of gadolinium and hydroxyapatite was determined from material images. Results indicate that despite good spectral and linear correlation of gadolinium in energy images with protocol-2 (Figures 6c and 7c), the measured concentration of gadolinium is underestimated for protocol-2 (Figure 12a) whereas, protocol-1 indicates a good correlation between the known and measured concentrations over the full range of gadolinium (R 2 ≥ 0.99). Hydroxyapatite concentrations were observed to be linear with both protocols. Material misidentification and mis-quantification results from Figures 10-12 are summarized in Table 2. These results demonstrate that the methodology can be used to compare how different scanning protocols using one material differentiation method affect the measurement of the concentration of materials.  Spectral CT and material component images of ex-vivo bovine cartilage-bone plugs are shown in Figure 13. The bone in the bovine plugs is represented in the HA component images (see Figure  13b,f). The layer of articular cartilage of the incubated knee specimen is represented by the gadolinium density images (see Figure 13c,g) whereas no gadolinium signal from control specimen was detected with either protocol. However, misidentification of gadolinium as water is observed with protocol-2.  Spectral CT and material component images of ex-vivo bovine cartilage-bone plugs are shown in Figure 13. The bone in the bovine plugs is represented in the HA component images (see Figure 13b,f). The layer of articular cartilage of the incubated knee specimen is represented by the gadolinium density images (see Figure 13c,g) whereas no gadolinium signal from control specimen was detected with either protocol. However, misidentification of gadolinium as water is observed with protocol-2. Figure 14a,b shows fused images of gadolinium and hydroxyapatite densities with protocol-1 and protocol-2 respectively. Quantitative variations in the gadolinium concentration are plotted (Figure 14c) along the depth of the cartilage. Contrast quantification is crucial in a biological specimen especially when radiologists determine disease burden and contrast delivery by using high-Z contrast materials. Figure 14c shows that despite good qualitative gadolinium decomposition, measured gadolinium concentration with protocol-2 is over 50% lower than with protocol-1. Which measurement of these two concentrations is more reliable? Protocol-2 underestimates gadolinium concentration in comparison to Protocol-1 in the phantom (Figure 12a), by a similar amount.  Figure 14a,b shows fused images of gadolinium and hydroxyapatite densities with protocol-1 and protocol-2 respectively. Quantitative variations in the gadolinium concentration are plotted (Figure 14c) along the depth of the cartilage. Contrast quantification is crucial in a biological specimen especially when radiologists determine disease burden and contrast delivery by using high-Z contrast materials. Figure 14c shows that despite good qualitative gadolinium decomposition, measured gadolinium concentration with protocol-2 is over 50% lower than with protocol-1. Which measurement of these two concentrations is more reliable? Protocol-2 underestimates gadolinium concentration in comparison to Protocol-1 in the phantom (Figure 12a), by a similar amount. Figure 14. Fused images of bovine knee specimens with protocol-1 (a) and protocol-2 (b) with a gadolinium channel displayed quantitatively using a green color map with concentration ranging from 0 to 50 mg Gd/mL; (c) shows a quantified correlation in gadolinium uptake between the two protocols with respect to cartilage thickness (corresponding to arrow directions in (a,b). It is evident that despite using the same biological specimen, the two imaging protocols are reporting different Figure 13. A single energy spectral CT image of two bovine knee plugs and material component images for hydroxyapatite, gadolinium, and water obtained from the material decomposition with protocol-1 (top row) and protocol-2 (bottom row). The window levels are normalized from 0 to 1 for display. Figure 13. A single energy spectral CT image of two bovine knee plugs and material component images for hydroxyapatite, gadolinium, and water obtained from the material decomposition with protocol-1 (top row) and protocol-2 (bottom row). The window levels are normalized from 0 to 1 for display. Figure 14a,b shows fused images of gadolinium and hydroxyapatite densities with protocol-1 and protocol-2 respectively. Quantitative variations in the gadolinium concentration are plotted (Figure 14c) along the depth of the cartilage. Contrast quantification is crucial in a biological specimen especially when radiologists determine disease burden and contrast delivery by using high-Z contrast materials. Figure 14c shows that despite good qualitative gadolinium decomposition, measured gadolinium concentration with protocol-2 is over 50% lower than with protocol-1. Which measurement of these two concentrations is more reliable? Protocol-2 underestimates gadolinium concentration in comparison to Protocol-1 in the phantom (Figure 12a), by a similar amount. Figure 14. Fused images of bovine knee specimens with protocol-1 (a) and protocol-2 (b) with a gadolinium channel displayed quantitatively using a green color map with concentration ranging from 0 to 50 mg Gd/mL; (c) shows a quantified correlation in gadolinium uptake between the two protocols with respect to cartilage thickness (corresponding to arrow directions in (a,b). It is evident that despite using the same biological specimen, the two imaging protocols are reporting different Figure 14. Fused images of bovine knee specimens with protocol-1 (a) and protocol-2 (b) with a gadolinium channel displayed quantitatively using a green color map with concentration ranging from 0 to 50 mg Gd/mL; (c) shows a quantified correlation in gadolinium uptake between the two protocols with respect to cartilage thickness (corresponding to arrow directions in (a,b). It is evident that despite using the same biological specimen, the two imaging protocols are reporting different concentrations. This is due to greater misidentification of gadolinium as water with protocol-2 (refer to Figures 11 and  12a). For more insight about the biological sample, the reader is referred to [9].

Discussion
Material misidentification affects the overall sensitivity and specificity of the measurements which is the key requirement of specific identification and quantification of multiple tissue components for spectral molecular imaging. The MARS MD algorithm assumes that each voxel consists of primarily one high-Z material. Therefore, in the segmentation task, once a material is identified in any given voxel, the value for this voxel in other basis materials is assumed to be zero. Therefore, this particular analysis method was designed using a sparse solution MD as a reference, only taking into account whether the solution is zero/non-zero in each material and therefore exploits this "binary property" and quantifies correct material identification at various material concentrations. However, for MD algorithms that have noise in all material channels (i.e., every voxel contains a non-zero value for all material channels), modification like low-value thresholding or smoothing using techniques such as compressed sensing can be applied. Figures 10 and 11 show that sensitivity of any given material in the post-MD images is density dependent whereas specificity is not, and therefore specificity ideally should be 100% for every voxel of each material at each of the given concentrations.
There could be various factors that likely to have an effect on sensitivity, specificity and material mis-quantification. The statistical noise of the system could be one of the factors affecting the sensitivity and specificity of the given material. Noise not only affects the ability to detect low-Z materials but also affects the differentiation of low concentrations of high-Z materials [59]. Another noise related factor affecting material decomposition is pulse pileup [33] which contributes negatively to the energy calibration. It has its origin in photon statistics and also depends on the read-out speed of the detector electronics [60,61]. Pileup occurs when two or more photons arrive within the same detector region and within the same integration time of the readout electronics. These events appear to be only one thus resulting in an information loss. It also leads to events in low energy ranges being detected which exceeds the maximum possible photon energy which also complicates the process of energy calibration and results in a loss of energy resolution [31]. Low X-ray flux was used with the brass filter protocol (protocol-1) to reduce the probability of the pileup effect whilst providing adequate statistics in a reasonable exposure time as shown in our previous studies [32,62]. This might be the reason for less material misidentification in protocol-1 than protocol-2. The underestimation of the gadolinium quantity with protocol-2 can be associated with higher gadolinium misidentification as hydroxyapatite. The threshold energy settings for the selection of energy bins could also have a major impact on spectral image quality in terms of image contrast and noise level. Wide energy bins have lower noise but suffer from poor contrast in the reconstructed images. In single photon counting detectors, a photon is only counted if the signal pulse is higher than the user specified energy threshold. However, the signal created by a single X-ray photon could be registered simultaneously by all energy thresholds depending on photon energy. Therefore, the selection of energy ranges needs to be balanced with the requirements for achieving desirable contrast and material's K-edge. Moreover, CT reconstruction for spectral CT imaging needs to be performed in subtracted energy bins to reconstruct the difference of the photon counts between the two subsequent counters.
It is difficult to pinpoint the exact cause of misidentification but factors such as X-ray scatter, Compton scattering, K-fluorescence X-rays from high-Z semiconductor materials and imperfections of the CdTe detector (pulse pileup, charge trapping, semiconductor detector inhomogenities, and crosstalk within the semiconductor material) are likely to play a part. All these effects are intrinsic to detector properties and not all of them can be compensated by flat field correction, especially energy-dependent effects. They contribute to misidentification of materials in material decomposition. Image related artefacts also have negative effects on the material decomposition.
Other groups [25,63,64] have reported noisy regions with incorrect material identification of high-Z materials but they claimed their ability to detect low concentrations of high-Z materials with high accuracy by measuring the concentrations in decomposed images and comparing it to the true concentration value. Our results show that good correlation over the full range of the known and measured concentrations fails to provide information about the misidentification between different materials and the extent of such misidentification in decomposed material images. Results from the misidentification chart can be used to choose contrast materials best suited to a spectral CT imaging task in relation to the anticipated concentration of those materials. We can use the binary information to quantitatively compute sensitivity and specificity performance of the (data acquisition and data processing) system for target materials of different concentrations and composition and scanned with different imaging protocols.
The materials selected for this study are commonly used for radiology related studies due to their attenuation properties. A shortcoming of the study could be the evaluation of a limited number of high-Z materials and limited image acquisition protocols. Despite these limitations, we showed that the technique can be applied to various concentrations of high-Z materials simultaneously and can be used (1) as a quality assurance tool to set up a threshold for lower detectable limits of any given material in the post-MD domain; (2) for optimization of the spectral CT imaging parameters and acquisition protocol to maximize the sensitivity and specificity of the given materials; and (3) to establish a baseline of misidentification measurement for daily QA of spectral CT scanners.
The ground truth for the identity and concentration of all materials in a complex biological specimen is frequently unknown. There are many potential sources of error in spectral CT analysis of compositionally complex specimens, including the MD method itself. This paper describes a new method that measures the net effect of all these sources of error on the ability to identify and quantify materials at spectral CT. We have successfully tested the method in both a phantom and a biological sample. We have demonstrated its utility for one material differentiation method applied to two scanning protocols. In future, this methodology could be used to compare and optimize scanning protocols, image reconstruction methods, and methods for material differentiation in spectral CT.

Conclusions
In summary, we developed and validated a method for quantifying misclassification of the identity and quantity of gadolinium and hydroxyapatite in spectral CT material images. The method can be applied to any chosen material to measure the image quality of the final result. Although developed with multi-energy CT, it is equally applicable to dual-energy CT. These results demonstrate that the methodology can be used to compare how different scanning protocols using one material differentiation method affect material identification and measurement of the concentration of that material for clinically relevant contrast agents and tissues.