Impact of Deltoid Computer Tomography Image Data on the Accuracy of Machine Learning Predictions of Clinical Outcomes after Anatomic and Reverse Total Shoulder Arthroplasty

Background: Despite the importance of the deltoid to shoulder biomechanics, very few studies have quantified the three-dimensional shape, size, or quality of the deltoid muscle, and no studies have correlated these measurements to clinical outcomes after anatomic (aTSA) and/or reverse (rTSA) total shoulder arthroplasty in any statistically/scientifically relevant manner. Methods: Preoperative computer tomography (CT) images from 1057 patients (585 female, 469 male; 799 primary rTSA and 258 primary aTSA) of a single platform shoulder arthroplasty prosthesis (Equinoxe; Exactech, Inc., Gainesville, FL) were analyzed in this study. A machine learning (ML) framework was used to segment the deltoid muscle for 1057 patients and quantify 15 different muscle characteristics, including volumetric (size, shape, etc.) and intensity-based Hounsfield (HU) measurements. These deltoid measurements were correlated to postoperative clinical outcomes and utilized as inputs to train/test ML algorithms used to predict postoperative outcomes at multiple postoperative timepoints (1 year, 2–3 years, and 3–5 years) for aTSA and rTSA. Results: Numerous deltoid muscle measurements were demonstrated to significantly vary with age, gender, prosthesis type, and CT image kernel; notably, normalized deltoid volume and deltoid fatty infiltration were demonstrated to be relevant to preoperative and postoperative clinical outcomes after aTSA and rTSA. Incorporating deltoid image data into the ML models improved clinical outcome prediction accuracy relative to ML algorithms without image data, particularly for the prediction of abduction and forward elevation after aTSA and rTSA. Analyzing ML feature importance facilitated rank-ordering of the deltoid image measurements relevant to aTSA and rTSA clinical outcomes. Specifically, we identified that deltoid shape flatness, normalized deltoid volume, deltoid voxel skewness, and deltoid shape sphericity were the most predictive image-based features used to predict clinical outcomes after aTSA and rTSA. Many of these deltoid measurements were found to be more predictive of aTSA and rTSA postoperative outcomes than patient demographic data, comorbidity data, and diagnosis data. Conclusions: While future work is required to further refine the ML models, including the addition of other shoulder muscles, like the rotator cuff, our results show promise that the developed ML framework can be used to evolve traditional CT-based preoperative planning software into an evidence-based ML clinical decision support tool.


Introduction
The deltoid is the largest muscle in the shoulder and the primary elevator; its size and shape power shoulder motion, particularly abduction and forward elevation. Clinical studies [1-3] suggest that objective measures of deltoid morphology may be prognostic of clinical performance after reverse total shoulder arthroplasty (rTSA); after all, a nonfunctioning deltoid is a contraindication for rTSA.
Despite the importance of the deltoid to shoulder biomechanics [4-6] and rTSA, in particular [6-11], only a few small studies have attempted to quantify deltoid muscle characteristics (e.g., size, shape, and quality) [12-18] and correlate [15,17,19,20] those measures to shoulder range of motion, strength, function, and patient-reported outcome measures. Several challenges have restricted previous efforts from successfully conducting muscle-to-outcomes-related research in any statistically/scientifically relevant manner. First, reliably quantifying soft tissue using different medical imaging modalities, like magnetic resonance imaging (MRI) and computer tomography (CT), can be complex and technically challenging. Second, obtaining high-resolution imaging can be expensive, which often limits the number of patients included in these studies. Third, differences in imaging modalities and scanning protocols, including variations in image resolution and slice thickness, practically limit generalizability. Fourth, manually delineating muscle boundaries on MRI and CT images, which is necessary to obtain accurate three-dimensional (3D) muscle volumes, is tedious and time-consuming, further contributing to subjectivity in interpretations and inconsistency of measurements.
Machine learning (ML) techniques present an opportunity to objectively quantify medical images at scale. ML techniques can be used, after training, to automatically segment images to create 3D volumes of various tissues, such as bone, muscle, tendon, and fat. These segmented images can then be automatically analyzed for size, shape, and other radiomic measures [21]. Importantly, these radiomic measurements can potentially characterize tissue quality by analyzing the distribution of gray-scale voxel intensities that compose these images and 3D volumes. Hounsfield units (HUs) represent the radiation attenuation (i.e., radiodensity) of different tissues in CT images. The HU scale is defined such that the radiodensity of water has an HU value of 0, and the radiodensity of air is typically −1000 HU. Different HU ranges have been suggested for muscle and fat. Most commonly, HU intervals of −190 to −30 characterize fat, and HU intervals of −29 to 150 characterize muscle [22]. By incorporating radiomic analyses into an ML framework that includes muscle segmentation, challenges associated with traditional techniques (i.e., manual segmentation and subjective assessments) are effectively overcome. As an added advantage, because ML techniques can be efficiently scaled, larger sample sizes can be analyzed, facilitating a more granular comparative analysis with greater statistical power.
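For reference, the HU scale described above is conventionally written as a linear rescaling of the measured linear attenuation coefficient. The symbols μ, μ_water, and μ_air are introduced here for illustration only and do not appear in the original text.

```latex
\mathrm{HU} = 1000 \times \frac{\mu - \mu_{\text{water}}}{\mu_{\text{water}} - \mu_{\text{air}}}
```

Substituting μ = μ_water gives 0 HU and μ = μ_air gives −1000 HU, which matches the two anchor points stated above.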
In this study, we aim to utilize a CT-based ML framework to segment the deltoid muscle from a registry of preoperative CT images of shoulder arthroplasty patients enrolled in an IRB-approved multi-center prospective clinical outcome study. From these segmented deltoid images, we aim to quantify various muscle characteristics, including volumetric (size, shape, etc.) and intensity-based HU measurements. By analyzing these objective measures of the deltoid muscle alongside each patient's clinical outcomes after anatomic total shoulder arthroplasty (aTSA) and rTSA, we aim to better understand the relationship of the deltoid to clinical performance. Finally, we aim to utilize these deltoid muscle image data and clinical outcomes data to create a CT-based ML model that predicts clinical outcomes after aTSA and rTSA.
The specific goals of this study are: (1) to quantify and compare the preoperative deltoid muscle characteristics using various radiomic measurements for male and female aTSA and rTSA patients; (2) to quantify and compare the impact of varying CT image convolution kernels on these deltoid measurements; (3) to investigate if these deltoid muscle measurements correlate with preoperative pain, range of motion, and function and also investigate if these muscle measurements impact 2-year minimum postoperative clinical outcomes after aTSA and rTSA; and (4) to incorporate these deltoid muscle measurements into an ML-based predictive model, rank-order the relative importance of each image measurement to predict 2-year minimum clinical outcomes, and quantify the ability of these objective measures to improve the predictive performance of regression and classification ML models for multiple clinical outcome measures at multiple follow-up timepoints after aTSA and rTSA as compared to non-image-based versions of the ML predictive models for each outcome measure.

Materials and Methods
Preoperative CT images from 1057 patients (585 female, 469 male, and 3 unspecified; age = 70.0 ± 7.9 years, range: 38-92; 799 primary rTSA and 258 primary aTSA) enrolled in a multicenter, IRB-approved prospective clinical outcome study of a single platform shoulder arthroplasty prosthesis (Equinoxe; Exactech, Inc., Gainesville, FL, USA) were analyzed in this study. These patients were selected from a larger CT dataset of patients with images acquired using the ExactechGPS CT-scan acquisition protocol. This CT protocol permits slice thicknesses between 0.3 and 1.25 mm (with a recommended thickness of 0.625 mm) and pixel resolutions between 0.3 × 0.3 mm and 1 × 1 mm, and it accepts multiple different convolution kernels from different manufacturers, including but not limited to BONE (GE), B41 (Siemens), FC30 (Toshiba), and L (Philips). Each patient's CT scan included complete acquisition of the deltoid muscle and scapular bone.

Deltoid Image Analysis
An overview of the ML framework is described in Figure 1. First, a CT-based ML segmentation algorithm automatically delineates the deltoid boundaries and creates 3D deltoid masks. After deltoid segmentation, 3D models of the reconstructed deltoid were viewed by two trained evaluators to confirm that the entirety of the deltoid was present and that there were no major errors in the deltoid segmentation that could affect subsequent quantification. Cases with an estimated volumetric error >5% were excluded from further analysis.
Next, a quantification technique was utilized to extract deltoid characteristics, specifically the shape, size, and distribution of voxel intensities. The segmented masks are overlaid onto the raw images [23], and the shape and size features are calculated from the resulting mesh created by the mask and the raw image. These calculated features include deltoid volume, normalized deltoid volume (relative to the scapular bone volume), and normalized deltoid atrophy (relative to deltoids of patients having the same age and gender). Beyond these basic features, first-order radiomic shape features are derived from mesh volume and include deltoid shape flatness, sphericity, length (max 2D diameter column), and width (max 2D diameter row). In addition to these shape and size features, the first-order radiomics were calculated from the distribution of voxel intensities (i.e., distribution of HU values). These radiomic measures include skewness, entropy, uniformity, mean intensity value, root-mean-squared intensity value, 90th percentile intensity, and kurtosis. Radiomics were extracted after resampling images to an isotropic voxel of 1 mm and using a bandwidth of 25 HU. Finally, the voxels within the deltoid (segmented area) were stratified into fat and muscle based on their observed HU values (fat = −190 to −30 HU; muscle = −29 to 150 HU). Fatty infiltration was calculated as the ratio of fat (number of voxels that represent fat) to soft tissue (number of voxels that represent both muscle and fat). The definitions of each of these volumetric-based and HU intensity-based deltoid measurements are described in Table 1.
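As a minimal sketch of the fat/muscle stratification described above, the snippet below computes a fatty infiltration fraction from a flat list of HU values taken from inside a deltoid mask. The function name `fatty_infiltration` and the toy voxel values are hypothetical; only the HU thresholds and the fat / (fat + muscle) ratio come from the text, and this is not the study's actual implementation.

```python
# HU thresholds taken from the text (fat = -190 to -30 HU; muscle = -29 to 150 HU).
FAT_RANGE = (-190, -30)
MUSCLE_RANGE = (-29, 150)


def fatty_infiltration(hu_values):
    """Return fat voxels / (fat + muscle voxels) for HU values inside a mask.

    Voxels outside both ranges are ignored. Returns None when no voxel
    falls inside either range.
    """
    fat = sum(1 for hu in hu_values if FAT_RANGE[0] <= hu <= FAT_RANGE[1])
    muscle = sum(1 for hu in hu_values if MUSCLE_RANGE[0] <= hu <= MUSCLE_RANGE[1])
    soft_tissue = fat + muscle
    if soft_tissue == 0:
        return None
    return fat / soft_tissue


# Hypothetical toy voxels: 2 in the fat range, 6 in the muscle range, 2 ignored.
voxels = [-120, -45, -10, 0, 35, 50, 60, 80, 400, -800]
fraction = fatty_infiltration(voxels)
print(f"fatty infiltration = {fraction:.1%}")  # prints "fatty infiltration = 25.0%"
```

In practice the HU values would come from the CT voxels selected by the 3D segmentation mask; the flat list here only stands in for that selection.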
The ML CT image segmentation algorithm used in this study to automatically segment the deltoid has been previously validated [24]. This deltoid segmentation algorithm was fine-tuned from a pre-trained model using SwinUNETR. The SwinUNETR model is a U-Net architecture that uses Swin transformers instead of convolutional neural networks (CNNs) to extract features. Swin transformers are a type of attention-based neural network architecture that effectively capture long-range dependencies in data, making them well-suited for tasks such as medical image segmentation [25]. The model was fine-tuned using CT scans from 78 patients, from which medical students, under the supervision of an orthopedic surgeon and experienced radiologist, manually delineated the deltoid boundaries on each CT image to generate labeled masks. After training using these masks, the model was deployed and tested on CT images from an additional 20 patients and was demonstrated to produce deltoid masks with a high Dice coefficient of 0.93 ± 0.03 [24].
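To make the reported Dice coefficient concrete, the following sketch computes it for two binary masks using the standard definition, 2|A ∩ B| / (|A| + |B|). The function name and the eight-voxel toy masks are hypothetical, and the masks are flattened to simple lists for brevity; real deltoid masks would be 3D arrays.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for two equal-length binary masks."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * intersection / total


# Hypothetical flattened masks: model output vs. manual delineation.
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
manual = [0, 1, 1, 0, 0, 1, 1, 0]
print(dice_coefficient(predicted, manual))  # prints 0.75
```

A value of 1 indicates identical masks and 0 indicates no overlap, so the reported 0.93 corresponds to close agreement with the manual delineations.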

Table 1. Definitions of the volumetric and HU intensity-based deltoid measurements.

Image Parameter | Type of Image Measurement | Definition
Deltoid Shape Flatness | Volumetric Data | The true flatness of the shape or region of interest is based on spacing between voxels. The less space between voxels, the flatter the object. Value ranges between 1 (non-flat, sphere-like) and 0 (a flat surface or single-slice segmentation).
Normalized Deltoid Volume | Volumetric Data | Deltoid volume normalized or adjusted to the size of the patient's scapula. It is calculated as the ratio of deltoid volume to scapula volume. Larger values imply a larger deltoid.
Normalized Deltoid Atrophy | Volumetric Data | The ratio of individual normalized deltoid volume to the mean normalized deltoid volume across all patients with the same age and gender. Values between 0 and 1 indicate a deltoid smaller than the average deltoid (higher atrophy); values greater than 1 indicate a deltoid larger than the average deltoid (lower atrophy).
Deltoid Shape Sphericity | Volumetric Data | A measure of the roundness of the shape or the region of interest relative to a sphere. It is the ratio of the surface area of a perfect sphere with the same volume to the surface area of the given object. Values range between 1 (perfect sphere) and 0 (a flat surface).
Max 2D Diameter Row | Volumetric Data | Largest distance between two points in the sagittal plane; can be approximated as deltoid length.
Max 2D Diameter Column | Volumetric Data | Largest distance between two points in the coronal plane; can be approximated as deltoid width.
Convolution Kernel | Image Protocol | A process used in CT scanners during image reconstruction to reduce blurring and noise captured in the raw data. Different convolution kernels are used to enhance or suppress certain features in the raw image (e.g., muscle or bone). The kernels for the images in the dataset are BONE, FC30, BONEPLUS, B60s, B31s, and ['I31s', '3'].
Skewness of Deltoid Voxels | Hounsfield-Based Data | The asymmetry of the distribution of Hounsfield unit (HU) values about the average HU of deltoid voxels. The value can be positive or negative depending on the tail of the distribution and the average of the distribution.
Entropy of Deltoid Voxels | Hounsfield-Based Data | The uncertainty/randomness in the HU values across deltoid voxels. More entropy implies more variation in the values of HU. The value can be any real positive number.
Uniformity of Deltoid Voxels | Hounsfield-Based Data | The homogeneity of HU values, where a greater uniformity implies a greater homogeneity or a smaller range of discrete intensity values. Values range between 0 (low uniformity) and 1 (high uniformity).
Mean Deltoid Voxel | Hounsfield-Based Data | The average HU value of the deltoid voxels.
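The intensity-based definitions above can be illustrated with a short sketch that computes mean, skewness, entropy, and uniformity from a list of HU values. It uses common first-order radiomic formulas (population moments for skewness, a base-2 Shannon entropy, and a sum of squared bin probabilities for uniformity). Both the formulas and the reading of the 25 HU setting as a histogram bin width are assumptions here, since the text does not spell them out; the function name and toy values are hypothetical.

```python
import math
from collections import Counter

BIN_WIDTH_HU = 25  # assumption: the 25 HU setting acts as the histogram bin width


def first_order_features(hu_values, bin_width=BIN_WIDTH_HU):
    """Compute mean, skewness, entropy, and uniformity of HU values."""
    n = len(hu_values)
    mean = sum(hu_values) / n
    # Population (biased) second and third central moments.
    m2 = sum((v - mean) ** 2 for v in hu_values) / n
    m3 = sum((v - mean) ** 3 for v in hu_values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0

    # Discretize intensities into fixed-width bins and form bin probabilities.
    counts = Counter(math.floor(v / bin_width) for v in hu_values)
    probs = [c / n for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    uniformity = sum(p * p for p in probs)
    return {"mean": mean, "skewness": skewness,
            "entropy": entropy, "uniformity": uniformity}


# Hypothetical voxels: four values landing in four different 25 HU bins.
spread = first_order_features([0, 30, 60, 90])
# Hypothetical voxels: a perfectly homogeneous region.
flat = first_order_features([50, 50, 50, 50])
print(spread)
print(flat)
```

With these toy inputs, the spread-out region yields higher entropy and lower uniformity than the homogeneous region, consistent with the interpretations given in Table 1.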

Predict+ Background
Predict+ (Exactech, Inc., Gainesville, FL, USA) is an ML-based clinical decision support tool (CDST) that preoperatively predicts personalized aTSA and rTSA outcomes from a "minimal feature set" of preoperative inputs [26,27]. Specifically, ML-based regression predictions for 7 outcome measures (VAS pain, global shoulder function, shoulder arthroplasty smart (SAS) score [28], active abduction, active forward elevation, active external rotation, and internal rotation (IR) score [29]) are made at 6 postoperative timepoints (3-6 months, 6-9 months, 1 year, 2-3 years, 3-5 years, and 5+ years). With additional preoperative inputs, the ASES and Constant score can also be predicted at the same timepoints. Classification predictions are provided 2-3 years after surgery to describe the likelihood that a patient will achieve improvement that exceeds the minimal clinically important difference (MCID) [28,30,31] and substantial clinical benefit (SCB) [28,31,32] patient-satisfaction thresholds associated with each outcome measure. The internal validations describing the accuracy of the Predict+ ML algorithms are published [26,31,33-37], and these algorithms have also been externally validated [27] for up to 2 years. Furthermore, recent research has demonstrated that Predict+ provides fair [36] and unbiased predictions for aTSA and rTSA patients of different ethnicities, sexes, and ages. Additionally, the use of Predict+ has been demonstrated to improve surgeon confidence [38] when deciding between aTSA and rTSA.
Currently, the Predict+ "minimal feature set" [26] of preoperative input requirements does not include any direct CT image data or any data related to the quality of a patient's soft tissue or bone. Due to the widespread clinical use of CT-based preoperative planning software tools for both aTSA and rTSA, ample CT data are readily available in existing clinical workflows to enhance CDST predictions and further support clinical decision-making by including each patient's CT image information related to their bones and soft tissue. However, it is currently unknown which features derived from CT image data are useful to improve the accuracy of the ML-based CDST outcome predictions.

Analysis of Deltoid Image Measurements and Clinical Outcomes
To better understand the relationship between CT-based deltoid image measurements and clinical outcomes after aTSA and rTSA, preoperative and postoperative clinical data from 1057 primary shoulder arthroplasty patients were analyzed with each patient's preoperative CT-based deltoid image data. Preoperative outcome measures and 2-year minimum clinical outcomes were compared with numerous different deltoid image measurements for various patient cohorts and were statistically analyzed using a two-tailed unpaired Student's t-test, with a p-value < 0.05 defining significance. Additionally, a Pearson correlation analysis was performed to measure the strength and direction of linear associations between the deltoid image measurements and preoperative clinical measures and between deltoid image measurements and 2-year minimum clinical outcomes.
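The two statistics named above can be sketched in a few lines. The snippet below computes the pooled-variance t statistic for an unpaired Student's t-test and the Pearson correlation coefficient; it stops short of the p-value, which requires the t distribution and is normally obtained from a statistics package. Function names and the toy numbers are hypothetical and are not study data.

```python
import math


def student_t_statistic(a, b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (mean_a - mean_b) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example: normalized deltoid volume vs. preoperative abduction (deg).
volume = [3.1, 3.6, 4.0, 4.4, 5.2]
abduction = [70, 85, 80, 100, 110]
print(f"r = {pearson_r(volume, abduction):.3f}")

# Hypothetical example: abduction in two cohorts split by normalized volume.
t, dof = student_t_statistic([70, 85, 80], [100, 110, 95])
print(f"t = {t:.3f}, df = {dof}")
```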

Machine Learning Prediction
XGBoost is a supervised, ensemble ML technique of multiple-regression trees that are built by iteratively partitioning the training dataset into multiple small batches using a method called boosting [39,40]. XGBoost was used to construct algorithms that predict 1-year (9-18 months), 2-3-year (18-36 months), and 3-5-year (36-60 months) aTSA and rTSA outcomes for each aforementioned clinical outcome measure both before and after addition of the CT image deltoid data. Specifically, two ML models were constructed: (1) a non-image-based model (which utilizes the minimal feature set of inputs + implant data and bone measurements readily available in CT planning software, e.g., ExactechGPS Equinoxe Planning App v2) and (2) an image-based model (which utilizes all of the inputs from the first model + the CT image-based deltoid measurements).
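To illustrate the general idea of boosting with regression trees, the sketch below implements plain gradient boosting with depth-one trees (stumps) under squared-error loss, then fits it once on a non-image input and once on that input plus one image feature. This is not XGBoost itself, which adds regularization, second-order gradient information, and deeper trees, and it is not the study's pipeline. All names and all numbers are hypothetical, and the toy training errors carry no evidential weight.

```python
def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return None if best is None else best[1:]


def stump_value(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean


def fit_boosted(X, y, n_rounds=20, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        preds = [p + learning_rate * stump_value(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_value(s, row) for s in stumps)


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical rows: [preoperative abduction (deg), normalized deltoid volume].
X_image = [[60, 3.0], [80, 3.5], [90, 4.0], [100, 4.5], [110, 5.0], [70, 5.5]]
X_non_image = [[row[0]] for row in X_image]  # drop the image-derived column
y = [100, 110, 120, 130, 140, 135]  # hypothetical postoperative abduction (deg)

baseline_mae = mae(y, [sum(y) / len(y)] * len(y))
for name, X in (("non-image", X_non_image), ("image", X_image)):
    model = fit_boosted(X, y)
    train_mae = mae(y, [predict(model, row) for row in X])
    print(f"{name}: training MAE = {train_mae:.2f} "
          f"(mean-only baseline {baseline_mae:.2f})")
```

A real comparison would evaluate both models on held-out patients rather than on the training data, as the train/test design described in this study implies.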
The predictive performance of each regression model prediction was quantified by the Mean Absolute Error (MAE), which describes the mean absolute difference between the actual and predicted values of each clinical outcome measure at each of the 1-year, 2-3-year, and 3-5-year timepoints. The predictive performance of each 2-3-year MCID [28,30,31] and SCB [28,31,32] classification model describing if a patient will achieve clinical improvement that exceeds the MCID and SCB improvement thresholds was quantified using the classification metrics of accuracy (which quantifies the ability of a model to correctly predict a class), precision or positive predictive value (which quantifies the ability of a model to not identify a negative as positive), recall or sensitivity (which quantifies the ability of a model to identify a positive as a positive), and the area under the receiver operating characteristic curve (AUROC).
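The metrics defined above can be written out directly. The following sketch computes MAE for a regression prediction and accuracy, precision, recall, and AUROC for a binary classification (for example, whether a patient exceeds an MCID threshold). AUROC is computed here through its pairwise-ranking interpretation: the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half. All names, the 0.5 decision threshold, and the toy values are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Mean absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def classification_metrics(actual, predicted):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def auroc(actual, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical regression example: actual vs. predicted abduction (deg).
print(mean_absolute_error([120, 95, 140], [110, 100, 150]))

# Hypothetical classification example: 1 = improvement exceeded the threshold.
achieved = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
labels = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cut-off
print(classification_metrics(achieved, labels))
print(auroc(achieved, scores))
```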
The relative importance of each preoperative input to predict each 2-3-year clinical outcome measure was quantified by the F-score and the Reciprocal Fusion Rank score. The F-score is determined by the XGBoost ML technique and quantifies the predictive value of an individual feature to the overall algorithm by the frequency that each feature is used as a candidate for the split by the decision tree algorithm [40]. The Reciprocal Fusion Rank score [41] combines the F-score value with the prevalence and uniqueness of that feature within the dataset, deprioritizing features with non-unique and sparse inputs [40,41]. Combined, these feature importance data are useful for interpretability [42,43] to better understand the internal logic of the ML model and review the basis of the predictions, which can be particularly important when evaluating radiomic features as some of these measurements are non-intuitive.
Finally, to investigate the feasibility of constructing an ML clinical outcome model that does not require manual input of preoperative range of motion data or patient subjective pain and function survey responses (to improve ML CDST usability and minimize responder fatigue), we constructed a new ML model to predict clinical outcomes after aTSA and rTSA at 1 year, 2-3 years, and 3-5 years by substituting the deltoid image data for the aforementioned surgeon-measured objective range of motion data and patient subjective survey data. We then quantified the predictive accuracy of this new image-based ML model (which effectively simulates inputs that can be automatically obtained from CT-based preoperative planning software) and compared this predictive accuracy to the non-image-based ML models for each outcome measure at each postoperative timepoint.

Results
The results of this study are presented in three sections. First, we present the deltoid shape, size, and radiomic data for the 1057 shoulder arthroplasty patients and provide a statistical analysis describing the relationship of these muscle measurements to patient demographics and clinical outcomes after aTSA and rTSA. Second, we present the impact of deltoid fatty infiltration and the impact of convolution kernel on clinical outcomes after aTSA and rTSA. Third, we present the predictive accuracy associated with the Predict+ ML models utilizing the deltoid image data and report on the feature importance rankings of each deltoid measurement to predict clinical outcomes after aTSA and rTSA.

Deltoid Shape, Size, and Radiomics
The ML framework successfully segmented and extracted radiomic data, including volumetric and HU intensity-based measurements, from the CT images of 1057 patients without any manual correction. Substantial variability in CT-based deltoid measurements was observed between patient cohorts when stratified by gender, prosthesis type (Table 2), and by convolution kernel (Table 3). As described in Table 2, significant differences by gender and prosthesis type were observed for four of the six mean volumetric deltoid measurements (deltoid shape flatness, normalized deltoid volume, max column, and max row). Additionally, significant differences by gender and prosthesis type were observed for five of the eight mean HU intensity deltoid measurements (deltoid fat %, skewness, mean voxel, root mean square voxel, and 90th percentile voxel). No significant differences in the mean radiomic measurements for normalized deltoid atrophy, entropy, uniformity, or kurtosis were observed between male and female patients for aTSA or rTSA. However, significant differences were observed in mean deltoid measurements when stratified by convolution kernel for every measurement except deltoid shape flatness and kurtosis.
The mean deltoid volume of the 1057 patients in our study was 342.2 ± 113.0 cm³. Mean deltoid volume was observed to decline with patient age by approximately 6% per decade for male patients and 8% per decade for female patients. As described in Table 4, these trends were observed for the direct deltoid volume measurement and the deltoid volume measurements when normalized by scapular bone volume and patient height. Normalized deltoid volume (by scapular bone volume) demonstrated greater relevance to preoperative clinical measures (Table 5) than 2-year minimum postoperative clinical outcome measures (Table 6) after aTSA and rTSA. Prior to surgery, larger normalized deltoid volume was associated with a significantly higher SAS score, significantly more abduction, forward elevation, and internal rotation for male patients and significantly more abduction for female patients. At a minimum of 2 years after aTSA, no differences were observed in any clinical outcome measure between patients with a normalized deltoid volume <4 and patients with a normalized deltoid volume >4. However, some 2-year minimum postoperative differences were observed with rTSA. Male rTSA patients with larger normalized deltoid volumes had significantly higher VAS pain, whereas female rTSA patients with larger normalized deltoid volumes had significantly more abduction but significantly less global shoulder function and significantly lower ASES and SAS scores.

Fatty Infiltration and the Impact of Convolution Kernel
A wide range of deltoid fatty infiltration was observed across patient cohorts. As illustrated in Figure 2, patients with lower deltoid fat percentages tended to have fat along the periphery of the muscle, whereas patients with higher fat percentages tended to have fat distributed throughout the muscle. Considering all convolution kernels, the mean deltoid fat percentage was 20.0 ± 9.9%, though deltoid fat percentage varied on average from 18 to 20% for male patients and from 19 to 23% for female patients across the age cohorts (Table 4). Importantly, the mean deltoid fat percentage was significantly different between male and female patients for both aTSA and rTSA (Table 3) and the mean deltoid fat measurements varied significantly by convolution kernel (Table 3 and Figure 3), particularly between the BONE kernel and the FC30 kernel (Figure 3).
Interestingly, the relationship between deltoid fatty infiltration and patient demographics and comorbidities varied by convolution kernel. As described in Table 7, for BONE kernel CT images, male patients with greater deltoid fatty infiltration had significantly higher BMI and more comorbidities; in particular, a significantly greater percentage of patients with greater fatty infiltration had diabetes. Similarly, for BONE kernel CT images, female patients with greater deltoid fatty infiltration had significantly higher weight, higher BMI, a lower occurrence of CTA, and more comorbidities; in particular, a significantly greater percentage of patients with greater fatty infiltration had inflammatory arthritis, hypertension, and diabetes. In contrast, for FC30 kernel CT images, male patients with greater deltoid fatty infiltration had a significantly lower weight but had significantly more comorbidities. Similarly, for FC30 kernel CT images, female patients with greater deltoid fatty infiltration had significantly more comorbidities.
Deltoid fatty infiltration was demonstrated to have some impact on both preoperative clinical measures (Table 8) and 2-year minimum postoperative clinical outcome measures (Table 9) after aTSA and rTSA. However, this impact varied by gender and by convolution kernel, particularly preoperatively. Prior to surgery, male patients with greater deltoid fatty infiltration had significantly less deltoid volume (for both the BONE and FC30 kernel cohorts), and specifically for FC30 kernel CT images, male patients with high fatty infiltration also had significantly less forward elevation and less strength (as measured by the max weight held in hand measurement from the Constant score). Prior to surgery, female patients with greater fatty infiltration, specifically for BONE kernel CT images, had significantly larger deltoid volumes, more abduction, less internal rotation, more external rotation, more strength, and a higher global shoulder function score; in contrast, for FC30 kernel CT images, females with greater fatty infiltration had a significantly lower SAS score. At a minimum of 2 years after aTSA, patients with greater fatty infiltration had significantly less strength and a significantly lower Constant score (for both the BONE and FC30 kernel cohorts). Specifically, for BONE kernel CT images, aTSA patients with greater fatty infiltration also had significantly less internal rotation. Specifically, for FC30 kernel CT images, aTSA patients with greater fatty infiltration also had significantly less abduction. At a minimum of 2 years after rTSA, patients with greater fatty infiltration had significantly less strength and significantly lower Constant and ASES scores (for both the BONE and FC30 kernel cohorts). Specifically, for BONE kernel CT images, rTSA patients with greater fatty infiltration had significantly less forward elevation and internal rotation and a significantly lower SAS score.

Deltoid Features in Predict+
Both image-based (using deltoid characteristics) and non-image-based XGBoost ML regression models resulted in accurate clinical outcome predictions after aTSA and rTSA (Table 10). The addition of deltoid image data to the 1-year, 2-3-year, and 3-5-year XGBoost ML regression models resulted in modest improvements in predictive accuracy for most outcome measures relative to the ML regression models without image data (Table 10). The largest improvements in predictive accuracy were observed for the active abduction and forward elevation ML regression models. Specifically, for active abduction, the deltoid image-based ML models were associated with lower MAE for both aTSA and rTSA at each timepoint, with the most substantial improvement being a 16.1% reduction in MAE for aTSA at 3-5 years as compared to the ML models without image data. Similarly, for forward elevation, the deltoid image-based ML models were associated with lower MAE for both aTSA and rTSA at each timepoint, with the most substantial improvement being a 10.8% reduction in MAE for aTSA at 2-3 years as compared to the ML models without image data. Some marginal differences in predictive accuracy were also observed in the other outcome measure ML models.
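The percentage reductions quoted above are presumably relative changes in MAE between the two models. Under that assumption, and with subscript names chosen here for illustration, the calculation would be:

```latex
\text{MAE reduction (\%)} = 100 \times \frac{\mathrm{MAE}_{\text{non-image}} - \mathrm{MAE}_{\text{image}}}{\mathrm{MAE}_{\text{non-image}}}
```

A positive value indicates that the image-based model had the lower error for that outcome measure and timepoint.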
The MCID and SCB classification predictions associated with each of the image-based and non-image-based clinical outcome ML models at 2-3 years follow-up are presented in Tables 11 and 12, respectively. Both image-based and non-image-based XGBoost ML classification models resulted in accurate MCID and SCB predictions after aTSA and rTSA, with very little improvement observed by the addition of the deltoid image data. For aTSA patients, the deltoid image-based predictive models achieved 82-93% accuracy in MCID with an AUROC of 0.69-0.80 and 77-91% accuracy in SCB with an AUROC of 0.70-0.91. For rTSA patients, the deltoid image-based predictive models achieved 77-94% accuracy in MCID with an AUROC of 0.69-0.85 and 74-91% accuracy in SCB with an AUROC of 0.72-0.86.
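Accuracy and AUROC measure different things, which is why both are reported for the MCID and SCB classifiers. The sketch below is a minimal, standard-library illustration of the two metrics; the labels, predicted probabilities, and the 0.5 decision threshold are hypothetical assumptions, not values from Tables 11 and 12.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of patients whose thresholded prediction matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auroc(labels, scores):
    """Probability that a random positive case outscores a random negative case.

    Ties count as half a win (the Mann-Whitney formulation of the AUROC).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = improvement exceeded the MCID threshold.
achieved_mcid = [1, 1, 1, 0, 1, 0, 1, 1]
predicted_probability = [0.9, 0.8, 0.4, 0.6, 0.7, 0.2, 0.95, 0.55]

acc = accuracy(achieved_mcid, predicted_probability)
auc = auroc(achieved_mcid, predicted_probability)
print(f"accuracy = {acc:.2f}, AUROC = {auc:.2f}")
```

Because most patients achieve the MCID, accuracy can be high even when the model separates the two classes only moderately well, which is the pattern that the AUROC captures.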
The F-scores and Reciprocal Fusion Rank scores describing the relative feature importance ranking of each CT-based deltoid image parameter to predict 2-3-year clinical outcomes after aTSA and rTSA are presented in Table 13. A review of these F-scores and Reciprocal Fusion Rank score rankings demonstrates that eight out of fourteen deltoid image measurements analyzed in this study were of high predictive value to each 2-3-year clinical outcome model. Deltoid shape flatness was the consensus most predictive deltoid image measurement, being the most utilized feature to predict abduction, external rotation, IR score, ASES, and SAS scores. Deltoid shape flatness was also the second most utilized feature to predict the Constant and VAS pain scores. For reference, deltoids associated with low, medium, and high flatness are depicted in Figure 4. Generally, deltoids with low flatness values were more planar, which may suggest smaller anterior and posterior deltoids (and potentially smaller moment arms), while deltoids with higher flatness values tend to be more curved and/or spherically shaped, with potentially larger anterior and posterior deltoids (and potentially larger moment arms). Normalized deltoid volume was the second most utilized deltoid image measurement and was found to be the most utilized feature to predict the Constant and global shoulder function scores. Other important deltoid image measurements in order of feature importance are deltoid voxel skewness, deltoid shape sphericity, normalized deltoid atrophy, deltoid fat percentage, deltoid voxel entropy, and deltoid voxel uniformity. For additional context as to which preoperative data are used to predict 2-3-year outcomes with each ML model, the features with the top 10 F-scores for each outcome prediction are presented in Table 14. As described in Table 14, preoperative abduction and native glenoid retroversion were generally the most predictive features used to predict 2-3-year outcomes after aTSA and rTSA.
Pearson correlation analysis (Table 15) demonstrated a low linear correlation between these deltoid measurements and each preoperative and 2-year minimum clinical outcome measure for both aTSA and rTSA. Additionally, above- or below-average deltoid shape flatness, being the overall most predictive feature across all ML models, was observed to only modestly discriminate between patient cohorts for mean preoperative and postoperative outcome measures (Tables 16 and 17). Prior to surgery, male patients with deltoid flatness >0.47 had significantly more internal rotation than male patients with deltoid flatness <0.47, whereas female patients with deltoid flatness >0.47 had significantly more internal rotation, significantly less pain, and significantly higher ASES and SAS scores as compared to female patients with deltoid flatness <0.47 (Table 16). At a minimum of 2 years after aTSA, no differences were observed in any clinical outcome measure between patients with deltoid flatness >0.47 and patients with deltoid flatness <0.47 (Table 17). However, some 2-year minimum postoperative differences were observed with rTSA. Male rTSA patients with deltoid flatness >0.47 had significantly less strength and significantly lower Constant scores as compared to male patients with deltoid flatness <0.47, whereas female rTSA patients with deltoid flatness >0.47 had significantly less strength as compared to female patients with deltoid flatness <0.47.
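The consensus ranking across outcome models can be illustrated with a rank-fusion sketch. This section does not state the exact formula behind the fused rank scores, so the code below assumes the commonly used reciprocal rank fusion form, in which each feature receives the sum of 1/(k + rank) over all per-outcome rankings, with the conventional constant k = 60. The feature names and the three per-outcome rankings are hypothetical toy inputs, not the contents of Table 13.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one consensus ranking.

    Each feature scores sum(1 / (k + rank)) over the lists that contain it,
    where rank starts at 1. k = 60 is an assumed, conventional constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, feature in enumerate(ranking, start=1):
            scores[feature] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-outcome feature rankings (most used feature first).
rankings_by_outcome = {
    "abduction": ["flatness", "normalized_volume", "skewness"],
    "constant_score": ["normalized_volume", "flatness", "sphericity"],
    "ases": ["flatness", "skewness", "normalized_volume"],
}

fused = reciprocal_rank_fusion(rankings_by_outcome.values())
consensus_order = [feature for feature, _ in fused]

for feature, score in fused:
    print(f"{feature:18s} {score:.5f}")
```

A feature that ranks near the top in many outcome models accumulates a high fused score even if it is never ranked first everywhere, which is the sense in which a "consensus" most predictive feature is identified.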

Discussion
The results of this 1057-patient study demonstrate that our CT-based ML framework can automatically segment the deltoid from preoperative CT scans utilized for preoperative planning software and then automatically quantify numerous volumetric-based and HU intensity-based deltoid measures from those segmented images. An analysis of these radiomic features demonstrated that several deltoid muscle measurements vary with age, gender, prosthesis type, and CT image kernel; additionally, many of these deltoid measurements (like normalized deltoid volume and deltoid fatty infiltration) were demonstrated to be relevant to preoperative and postoperative clinical outcomes after aTSA and rTSA. Additionally, we constructed ML models using preoperative CT-based deltoid image data and demonstrated that the addition of these image data improves ML model performance, with the largest improvements in accuracy observed for the prediction of abduction and forward elevation after aTSA and rTSA. Finally, we rank-ordered the input features driving those ML models and identified the specific deltoid measurements, as well as the top 10 preoperative features that were most predictive of postoperative clinical outcomes after aTSA and rTSA. In particular, we identified that deltoid shape flatness, normalized deltoid volume, deltoid voxel skewness, and deltoid shape sphericity were the consensus most predictive image-based features used to predict clinical outcomes after aTSA and rTSA. Many of these deltoid measurements were found to be more predictive of aTSA and rTSA postoperative outcomes than patient demographic data, comorbidity data, and diagnosis data.
This study is the largest of its kind to analyze deltoid muscle morphology and correlate those objective image-based measures of deltoid size/shape and muscle quality to clinical outcomes after aTSA and rTSA. This research is also significant because it illustrates how the application of an ML-based framework can evolve traditional CT-based preoperative planning software into an evidence-based ML-CDST. Over the past decade, CT-based preoperative planning software has become widely adopted worldwide for shoulder arthroplasty. This software helps surgeons better appreciate a patient's bony morphology and/or deformity of the scapula and/or humerus, facilitating personalized implant type/size selection and identification of the precise implant position that best fits a patient's bone while avoiding impingement. However, the ideal placement of any implant for any bony deformity is currently unknown, and despite 10+ years of clinical use of preoperative planning software with shoulder arthroplasty, no consensus guidelines exist for how to utilize these tools to optimally position implants for various bony deformities [44,45]. Effectively, the current state only allows surgeons to shape-match a virtual implant model to a bone model however they think best. Furthermore, no preoperative planning software currently provides visualization and/or analysis of a patient's muscles; therefore, the use of these tools and any heuristic derived from their use has been based exclusively on bone visualization. Our study, using the deltoid as an example, demonstrates that an ML framework can, at scale, automatically segment and analyze muscles from preoperative CT images and then input that objective muscle data into an ML-based model that more accurately predicts postoperative clinical outcomes after aTSA and rTSA. Deployment of this ML framework within CT-based preoperative planning software can facilitate a more quantitative assessment of the shoulder that can help to better characterize a patient's diagnosis/pathology on a continuum, as opposed to just a subjective classification. We demonstrated that these deltoid measurements are correlated to and highly predictive of clinical outcomes after aTSA and rTSA, and because these measurements are some of the most commonly used and important features driving each ML clinical outcome model, we demonstrated that these automated CT image-based measurements can be used as a substitute for the "minimal feature set" of manual inputs (i.e., surgeon-measured preoperative range of motion data and patient subjective survey questions related to pain and function) without sacrificing ML predictive accuracy. As such, this framework, when integrated into the CT planning software, effectively creates an automated personalized ML-CDST, requiring little-to-no manual input.
The strength of our study is the large scale of our CT analysis, which quantified numerous radiomic measurements of deltoids from 1057 patients. Nearly all the clinical literature related to the analysis of shoulder muscles has limited generalizability and limited statistical power due to small sample sizes (n ≈ 100 or fewer). Additionally, due to the complexity and technical challenge of quantifying deltoid muscle characteristics from medical imaging, the results of many of these studies are further limited by the simplified 2D methodologies that they deployed to characterize the 3D deltoid morphology and/or by methodologies that only analyze a small portion of the muscle in 3D [14][15][16][17][18]. Our ML framework quantified a mean deltoid volume of 342.2 ± 113.0 cm³; however, we observed differences in mean deltoid volume by age and gender, with male patients having a mean deltoid volume between 397.0 and 478.6 cm³ and females having a mean deltoid volume between 237.1 and 304.1 cm³. Our deltoid volume measurements are similar to the 380.5 ± 157.7 cm³ reported by Holzbaur et al. [12], who analyzed the deltoid volume from MRI scans of 10 healthy patients (5M/5F) with a mean age of 28.6 ± 4.5 years (range: 24-37 years), and similar to the 313.7 ± 77.3 cm³ reported by Vidt et al. [13], who analyzed the deltoid volume from MRI scans of 18 elderly patients with a mean age of 75.1 ± 4.3 years (range: 66-83 years) [12,13]. Small differences in deltoid volume between studies are likely due to the differences in patient age and gender; specifically, regarding the Holzbaur et al. study [12], our patient cohort was substantially older (70.0 ± 7.9 years, range: 38-92 years), and our results demonstrate that deltoid volume declines by 6-8% per decade of patient age.
Our ML framework quantified a mean deltoid fatty infiltration of 20.0 ± 9.9%; however, we observed differences in deltoid fat percentage by age, gender, and, most importantly, by CT image kernel, where, specifically, BONE kernel patients had a deltoid fatty infiltration of 13.0 ± 5.6% and FC30 kernel patients had a deltoid fatty infiltration of 28.4 ± 4.6%. Only a few studies quantified deltoid fatty infiltration as a percentage of volume. Vidt et al. analyzed the fat percentage of all shoulder abductors, including the deltoid, from MRI scans of 18 elderly patients and reported an overall fat percentage of 26.5 ± 2.4%, with males (25.6 ± 2.4%) having a slightly lower mean fat percentage than females (27.6 ± 2.2%) [13]. Kälin et al. analyzed the deltoid fatty infiltration from MRI scans of 76 patients (37M/39F) and reported a deltoid fatty infiltration rate of 6.7 ± 2.7% for females and 6.9 ± 3.5% for males [16]. Similarly, Wiater et al. analyzed the deltoid fatty infiltration from MRI scans of 25 rTSA patients and reported a deltoid fatty infiltration rate of 7.9 ± 4.3% [15]. Given that these measurements are all HU-based, differences in fatty infiltration between studies are most likely related to differences in measurement techniques and also in the acquisition and reconstruction protocols of the medical images from which the measurements are derived. Future work is required to further investigate the impact of different HU-based thresholds and the impact of different CT image kernels on the accuracy of fat volume segmentation.
Both the size/shape and quality of a muscle are important and related to its force production capacity. A larger muscle has a greater cross-sectional area and, therefore, a larger moment arm; as such, it is likely that patients with more muscle volume have improved biomechanics. Specifically related to rTSA, patients with larger deltoid volumes likely have larger deltoid abduction moment arms and greater deltoid efficiency, requiring less deltoid force to elevate the arm. Conversely, rTSA patients with deltoids having high fatty infiltration likely have a reduced force capacity and may have less efficient deltoids, requiring more deltoid force to elevate the arm. These differences in joint loading may have important clinical implications on rTSA complication rates, particularly for acromial and scapular stress fractures, instability, and aseptic loosening.
A few small studies have suggested that deltoid muscle volume and fatty infiltration impact rTSA clinical outcomes. Greiner et al. quantified deltoid fatty infiltration in 18 rTSA patients and reported that patients with greater deltoid fatty infiltration had significantly lower Constant scores [19]. Wiater et al. quantified deltoid volume in 30 rTSA patients and deltoid fatty infiltration in 25 rTSA patients and reported that deltoid volume significantly correlated to 2-year minimum Constant, ASES, and subjective shoulder value scores and reported that greater deltoid fatty infiltration significantly correlated to decreased 2-year minimum ASES scores [15]. Yoon et al. quantified deltoid volume (normalized by BMI) in 35 rTSA patients with 1-year minimum clinical follow-up and reported that deltoid volume was significantly correlated to the Constant score, forward elevation, and external rotation [20]. McClatchy et al. quantified deltoid volume for a small region of the deltoid in 107 patients from a combination of both MRI and CT images and reported that both deltoid volume and deltoid volume normalized by BMI were significantly (positively) associated with satisfactory levels of forward elevation; additionally, a sub-analysis clarified that preoperative deltoid volume was significantly associated with satisfactory levels of forward elevation for rotator cuff deficient patients but not those with an intact rotator cuff [17].
Our large-scale multi-center analysis of deltoids from 1057 patients has the statistical power to investigate the impact of various 3D deltoid radiomic measures across different age and gender cohorts to provide more valid and generalizable results. We observed that normalized deltoid volume was associated with significantly more preoperative abduction (for both male and female patients), significantly more forward elevation and internal rotation, and a significantly higher SAS score, specifically for male patients. However, regarding 2-year minimum rTSA outcomes, larger deltoid volumes were not associated with more shoulder motion or higher clinical outcome scores for either male or female patients. Our analysis related to the impact of fatty infiltration was less clear, at least for preoperative clinical measures, as the direction of the correlations seems to be partially dependent upon CT image kernel. Female patients with BONE kernel CT images having greater fatty infiltration were associated with significantly more preoperative abduction, external rotation, strength, and a significantly higher global shoulder function score but significantly less preoperative internal rotation; in contrast, female patients with FC30 kernel CT images having greater fatty infiltration were associated with a significantly lower preoperative SAS score. Male patients with FC30 kernel CT images having greater fatty infiltration were associated with significantly less forward elevation. However, 2-year minimum postoperative trends were more consistent, as rTSA patients with either BONE kernel or FC30 kernel CT images who had greater fatty infiltration were associated with significantly less strength and significantly lower Constant and ASES scores. Additionally, patients with BONE kernel CT images having greater fatty infiltration also had significantly less forward elevation and internal rotation and a significantly lower SAS score.
The relationship between deltoid muscle atrophy and deltoid fatty infiltration is unclear. It may be that these are independent processes, each generally correlated with loss of muscle strength and shoulder function, or it may be that these processes are related. Generally, loss of muscle mass and increased fatty accumulation are thought to be a result of complex interactions of factors, including decreased muscle repair capacity secondary to metabolic changes, increased insulin resistance, higher percentage/redistribution of body fat, decreased physical activity, changes in hormone levels, nutritional deficits, and chronic inflammation [46][47][48]. Future work is required to investigate etiologic pathways and better understand the relationship between deltoid muscle atrophy and deltoid fatty infiltration for each shoulder arthroplasty-related diagnosis. Future work is also required to investigate the impact of intramuscular fat and extramuscular fat, as well as the impact of diffuse vs. localized fat accumulation on clinical outcomes. Ultimately, an improved understanding of these pathways will likely improve treatment decision-making (including the timing of treatment), as it is likely that the quality and size/shape of the shoulder muscles at the time of intervention, as well as their likelihood of progressive degradation, impact clinical outcomes after aTSA and rTSA.
This multi-phase study (1. CT image analysis, 2. CT image/clinical data collection and analysis, and 3. ML model development) has numerous limitations. First, regarding the limitations of our CT image analysis, our deltoid muscle analysis only characterized the overall deltoid muscle volume and did not analyze the characteristics of any individual deltoid muscle segment. Future work is required to divide the deltoid into different functional segments, such as the anterior deltoid, middle deltoid, and posterior deltoid, and/or sub-segments using the seven different intramuscular tendons, as identified by Sakoma et al. [49], and separately analyze the radiomic measures associated with each segment to further investigate the impact on clinical outcomes after aTSA and rTSA. Similarly, future work should compare relative size differences between deltoid muscle segments to identify the impact of any abnormal muscle imbalance on preoperative function or glenoid bone wear patterns and identify the impact of any abnormal muscle imbalance as a risk factor for complications, like instability or aseptic glenoid loosening. Second, we did not analyze the rotator cuff as others [50][51][52][53][54][55][56][57][58] have attempted, or other muscles in the shoulder, like pectoralis, latissimus, or scapular elevators. It is possible that injuries, degeneration, or abnormalities associated with any of these muscles may have some impact on deltoid morphology and function. Future work will deploy a similar CT-based ML framework to analyze the rotator cuff and other shoulder muscles to investigate any such relationships with the deltoid and better understand the impact that each shoulder muscle group has on clinical outcomes after aTSA and rTSA. Third, regarding the radiomic HU-based measurements, we assumed that HU intervals of −190 to −30 accurately characterized fat and HU intervals of −29 to 150 accurately characterized muscle; we acknowledge that this HU threshold likely included some transitional/hybrid tissue in either the bone or muscle segments. Future work should investigate if tissue with HU intervals of −29 to 29 is normal or should be excluded from the muscle/fat segmentation analysis. Additionally, it is likely that the use of a single HU threshold across convolution kernels is not appropriate given the differences we observed in HU distribution between kernels, where, specifically, we observed that some kernels (e.g., BONE kernel) are associated with lower HU fat percentages than other kernels (e.g., FC30 kernel). Kernels with greater noise, like the FC30 images, may interfere with HU threshold calculations of fat, potentially overstating the fatty infiltration quantified with FC30 images. This may explain the surprising lack of significant correlation between patient BMI and fatty infiltration in the FC30 patients, as was observed for male and female patients with BONE images. Furthermore, it is currently unclear if the same attenuation ranges should be used for HU thresholding for patients of different ages, genders, and ethnicities. Fourth, we quantified the deltoid using fourteen different radiomic features (six volumetric-based measurements and eight HU intensity-based measurements); it should be noted that there is a library of additional radiomic measurements that we could have included, and it is possible that some of these additional measurements not analyzed in our study have some predictive capability. Future work is required to assess the validity of these additional measurements and ensure that the most relevant measurements with the greatest contribution to predicting postoperative range of motion, strength, and function are included in the final deployed image-based predictive models. Fifth, even though we identified many deltoid image features that were highly predictive and useful to the ML models, it is unclear exactly why the ML models utilized some features more than others. For example, deltoid shape flatness was the consensus most important feature utilized across all the ML predictive models; however, few differences in preoperative and postoperative clinical outcomes were observed for aTSA or rTSA when stratifying by this image-based measurement. As such, future work should seek to improve the interpretability of these ML predictions and better understand under which conditions these image-based measures are most predictive of clinical outcomes. Given that the F-score describes the frequency of decision tree splits, it may be that deltoid shape flatness is useful (by itself or in combination with other size/shape and muscle quality measures) to inform the development of a new deltoid morphology classification system that is relevant to aTSA/rTSA clinical outcomes. Future work should attempt a clustering analysis of deltoid measures to identify any relevant relationships/associations. Sixth, our deltoid image-based clinical outcome comparison included patients of multiple different diagnoses, as opposed to a more homogenous cohort. As such, it is unclear from our analysis if these deltoid image features are descriptive of the etiologic mechanisms associated with each of the patient's various disease diagnoses or if these radiomic measures are merely descriptive of secondary symptoms. Future work is necessary to analyze patient cohorts with homogenous diagnoses to better understand the capability of these measures to better characterize a patient's diagnosis/pathology on a continuum for each shoulder arthroplasty-related diagnosis.
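The HU thresholding described in the third limitation above can be sketched as follows. The two HU intervals are those stated in the text; everything else is an assumption made for illustration: the function names are hypothetical, HU values are assumed to be integers, and fat percentage is assumed to be the fat voxel count divided by the combined fat and muscle voxel count, which this section does not specify.

```python
FAT_HU_RANGE = (-190, -30)     # interval assumed in the text to characterize fat
MUSCLE_HU_RANGE = (-29, 150)   # interval assumed in the text to characterize muscle


def classify_voxel(hu):
    """Label one voxel by its Hounsfield value (integer HU assumed)."""
    if FAT_HU_RANGE[0] <= hu <= FAT_HU_RANGE[1]:
        return "fat"
    if MUSCLE_HU_RANGE[0] <= hu <= MUSCLE_HU_RANGE[1]:
        return "muscle"
    return "other"


def fat_percentage(hu_values):
    """Percent of fat voxels among fat + muscle voxels in a segmented deltoid.

    The choice of denominator is an assumption of this sketch.
    """
    counts = {"fat": 0, "muscle": 0, "other": 0}
    for hu in hu_values:
        counts[classify_voxel(hu)] += 1
    tissue_voxels = counts["fat"] + counts["muscle"]
    if tissue_voxels == 0:
        raise ValueError("no fat or muscle voxels in the segmentation")
    return 100.0 * counts["fat"] / tissue_voxels


# Hypothetical HU samples from inside a deltoid segmentation mask.
sample_hu = [-120, -45, -30, -29, 10, 55, 80, 149, 150, 400]
sample_fat_pct = fat_percentage(sample_hu)
print(f"fat percentage: {sample_fat_pct:.1f}%")
```

Because the classification depends only on fixed cut-offs, any kernel-related shift or noise in the HU distribution moves voxels across the −30/−29 boundary, which is one way a single threshold could yield different fat percentages for BONE and FC30 images.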
Regarding the limitations associated with CT image/clinical data collection, first, our study utilized images from multiple clinical sites, which provided CT images acquired from different CT scanner manufacturers using slightly different imaging protocols (though each CT scan met the minimum requirements specified by ExactechGPS protocol; notably, all CT images were collected within 6 months of the patient's surgery, the maximum allowable slice thickness was 1.25 mm, and 65% of CT images had a slice thickness of 0.5 or 0.625 mm). While our results demonstrated that different CT image kernels are associated with different radiomic measures, our results are limited because we did not have an equivalent distribution of kernels within our dataset. Future work should harmonize CT-derived metrics using batch effect correction methods to improve measurement reliability [59]. Second, we did not have any longitudinal CT images for any patients, so we were unable to analyze the repeatability of any deltoid measurements. Third, we did not have multiple CT images of any patient from different CT scanner manufacturers and/or CT images with different convolution kernels; therefore, we were unable to compare any radiomic measurement directly for the same patient at equivalent timepoints between CT image types. However, we did separately analyze radiomic measures for the various convolution kernels and identified numerous differences in measurements between kernels. Finally, we did not have CT images of healthy patients for use as a control for our deltoid measurements; future work should obtain CT images of healthy patients and identify baselines for each deltoid image measurement, ideally for patients of similar age as those analyzed in our study.
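As a rough illustration of what harmonizing a CT-derived metric across kernels could look like, the sketch below standardizes a measurement within each convolution kernel (a per-kernel z-score). This is a deliberately simple stand-in and not the batch effect correction method cited in the text; the record layout, key names, and values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def standardize_within_kernel(records, value_key="fat_pct", kernel_key="kernel"):
    """Return copies of the records with a per-kernel z-score added.

    Each value is centered and scaled using the mean and population standard
    deviation of its own kernel cohort, so cohorts become directly comparable.
    """
    values_by_kernel = defaultdict(list)
    for record in records:
        values_by_kernel[record[kernel_key]].append(record[value_key])

    stats = {
        kernel: (fmean(values), pstdev(values))
        for kernel, values in values_by_kernel.items()
    }

    standardized = []
    for record in records:
        mean, sd = stats[record[kernel_key]]
        z = 0.0 if sd == 0 else (record[value_key] - mean) / sd
        standardized.append({**record, "z_score": z})
    return standardized


# Hypothetical fat percentages for two kernel cohorts (illustration only).
patients = [
    {"kernel": "BONE", "fat_pct": 8.0},
    {"kernel": "BONE", "fat_pct": 13.0},
    {"kernel": "BONE", "fat_pct": 18.0},
    {"kernel": "FC30", "fat_pct": 24.0},
    {"kernel": "FC30", "fat_pct": 28.0},
    {"kernel": "FC30", "fat_pct": 32.0},
]

harmonized = standardize_within_kernel(patients)
for row in harmonized:
    print(row["kernel"], row["fat_pct"], round(row["z_score"], 3))
```

One caveat of this simplification is that it also removes any genuine difference between the cohorts, for example if patients scanned with one kernel truly had more fatty infiltration; dedicated batch effect correction methods attempt to preserve such biological variation.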
Regarding the limitations associated with the ML model development and underlying clinical data, first, the patients in this study were contributed by 18 different clinical sites, including >25 surgeons, and data from each site/surgeon inevitably contain some bias. As such, the derived models will also contain bias [42,43,60]. To reduce collection bias and input variability, all sites were trained to collect data using standardized data forms, and all completed forms were independently verified before computer-scoring on a secured database. Second, surgeons who contributed clinical data were experienced shoulder specialists who had multiple years of experience with the prosthesis utilized in this study; as such, these predictions may not be translatable to less experienced surgeons or to surgeons who have not completed the learning curve with these devices. Third, our clinical database consists only of patients who elected to undergo shoulder arthroplasty, and those patients are primarily elderly, non-Hispanic Caucasians of European descent. For example, we did not collect data on individuals who were candidates for shoulder arthroplasty but elected to forgo surgery due to comorbid illness and financial or personal reasons. Therefore, model predictions may not be representative of the outcomes achieved by patients of different demographics, regions, or ethnicities/races, and model predictions may be biased against patients too sick to safely undergo the procedure or patients whose condition was not sufficiently degenerative to have the procedure. However, recent research by Allen et al. [36] demonstrated that the ML models utilized by Predict+ accurately predict clinical outcomes after aTSA and rTSA for patients of different ethnicities, sexes, and ages. Fourth, our models were developed from a dataset of primary aTSA and primary rTSA patients using one platform shoulder prosthesis, where patients with revisions, humeral fractures, or hemiarthroplasty were excluded; therefore, model predictions may not be appropriate for those excluded indications or other prosthesis types or designs. Fifth, our study utilized one tree-based machine learning technique to construct algorithms that quantify outcomes after shoulder arthroplasty; other techniques, such as deep learning, could achieve better predictive accuracy than XGBoost, as has been shown previously using the Wide and Deep [61] ML technique. Despite these small improvements in accuracy using Wide and Deep, we utilized XGBoost in our study because its predictions are more interpretable, providing an F-score that identifies the most meaningful parameters used by the model.
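The interpretability argument rests on the F-score, which the text describes as the frequency of decision tree splits. The sketch below illustrates that idea on a toy ensemble represented as nested dictionaries. The tree structure, feature names, and leaf values are hypothetical and do not use or reproduce the XGBoost library or the trained study models.

```python
from collections import Counter


def count_splits(node, counts):
    """Recursively count how often each feature is used as a split."""
    if node is None or "feature" not in node:
        return  # leaf node: nothing to count
    counts[node["feature"]] += 1
    count_splits(node.get("left"), counts)
    count_splits(node.get("right"), counts)


def split_frequency(trees):
    """Split-count importance: number of splits per feature over all trees."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    return counts.most_common()


# Hypothetical three-tree ensemble (leaf values are arbitrary).
toy_trees = [
    {
        "feature": "preop_abduction",
        "left": {"feature": "deltoid_flatness",
                 "left": {"leaf": 1.0}, "right": {"leaf": 2.0}},
        "right": {"leaf": 3.0},
    },
    {
        "feature": "deltoid_flatness",
        "left": {"leaf": -1.0},
        "right": {"feature": "normalized_volume",
                  "left": {"leaf": 0.5}, "right": {"leaf": 1.5}},
    },
    {
        "feature": "preop_abduction",
        "left": {"feature": "deltoid_flatness",
                 "left": {"leaf": 0.0}, "right": {"leaf": 0.8}},
        "right": {"leaf": 2.2},
    },
]

importance = split_frequency(toy_trees)
for feature, n_splits in importance:
    print(f"{feature:18s} {n_splits}")
```

A split count of this kind shows how often a feature is consulted, not how much each split improves the prediction, which is consistent with the observation in the limitations that a frequently used feature such as deltoid shape flatness need not separate patient cohorts strongly on its own.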

Conclusions
Our ML framework successfully analyzed CT images from 1057 patients and quantified numerous volumetric-based and HU intensity-based deltoid measures, many of which were demonstrated to be relevant to preoperative and postoperative clinical outcomes after aTSA and rTSA. Incorporating these preoperative CT-based deltoid image measurements into our ML models improved the accuracy of the clinical outcome predictions, particularly for abduction and forward elevation after aTSA and rTSA. While future work is required to further refine the ML models and include additional shoulder muscles, like the rotator cuff, our results show promise that the developed ML framework can be used to evolve traditional CT-based preoperative planning software into an evidence-based ML-CDST.
Author Contributions: All authors contributed to various aspects of this multi-phase study, including conceptualization, study methodology, model validation, analysis, writing, and editing. All authors have read and agreed to the published version of the manuscript.

Figure 1. Deltoid Muscle Model Workflow, From Segmentation of CT Images to Model Creation, Volume Analysis, Hounsfield Unit Thresholding Analysis, and 3D Visualization.

Figure 2. Representative Deltoid Muscle with Low ((a-c): top row), Medium ((d-f): middle row), and High ((g-i): bottom row) Levels of Fatty Infiltration for 3D Volumes Reconstructed from BONE Kernel CT Images.

Figure 3. Histogram Comparison of Normalized Deltoid Volume (left), Deltoid Flatness (middle), and Deltoid Muscle Fatty Infiltration Percentages (right) for Different CT Convolution Kernels. Note the difference in distribution for the Hounsfield thresholding-based measurements of Deltoid fat percentage between BONE and FC30, which are the two most common convolution kernels for CT images in this study.

Figure 4. Representative Deltoid Muscle with Low (top row, more planar), Medium (middle row), and High (bottom row, more curvature) Values of Deltoid Shape Flatness for 3D Volumes Reconstructed from BONE Kernel CT Images.

Funding: No funding was provided to complete this specific study; however, Exactech Inc. (Gainesville, FL) sponsored clinical data collection for the clinical outcomes data and CT image data used in this machine learning analysis.
Institutional Review Board Statement: All clinical image and clinical outcomes data utilized in this multi-center study were collected utilizing an IRB-approved protocol.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data are contained within the article.

Table 1 .
Definitions of Volumetric-Based and Hounsfield (HU) Intensity-Based Deltoid Image Measurements.

Table 4 .
Comparison of Deltoid Volume for Patients of Different Genders and Ages at the Time of Surgery.

Table 5. Impact of Normalized Deltoid Volume on Preoperative Clinical Outcome Measures for Male and Female Patients.

Table 6. Impact of Normalized Deltoid Volume on 2-year Minimum Clinical Outcome Measures for aTSA and rTSA Patients Stratified by Gender.

Table 7. Relationship between Deltoid Fat Percentage and Patient Demographics and Comorbidities for Male and Female Patients Stratified by the BONE Kernel and FC30 Kernel CT Image Cohorts.

Table 8. Impact of Deltoid Fat Percentage on Preoperative Clinical Outcome Measures for Male and Female Patients Stratified by the BONE Kernel and FC30 Kernel CT Image Cohorts.

Table 9. Impact of Deltoid Fat Percentage on 2-year Minimum Clinical Outcome Measures for aTSA and rTSA Patients Stratified by the BONE Kernel and FC30 Kernel CT Image Cohorts.

Table 10. Comparison of the Mean Absolute Error (MAE) Associated with Two Different Machine Learning Models (Image-Based vs. No-Image Data) to Predict Clinical Outcomes at 1 year, 2-3 years, and 3-5 years after aTSA and rTSA.

Table 11. Classification Prediction Performance Associated with Two Different Machine Learning Models (Image-Based vs. No-Image Data) to Predict aTSA and rTSA Clinical Improvement at 2-3 Years Follow-Up Greater Than the MCID Threshold for Multiple Different Outcome Measures.

Table 12. Classification Prediction Performance Associated with Two Different Machine Learning Models (Image-Based vs. No-Image Data) to Predict aTSA and rTSA Clinical Improvement at 2-3 Years Follow-Up Greater Than the SCB Threshold for Multiple Different Outcome Measures.

Table 13. Comparison of the Relative Feature Importance Ranking of Each CT-based Deltoid Image Parameter to Predict 2-3-year Clinical Outcomes after aTSA and rTSA.

Table 14. Comparison of the Top 10 Preoperative Model Inputs (by F-Score Ranking) Used to Predict 2-3-year Clinical Outcomes after aTSA and rTSA.

Table 15. Pearson Correlation for a Selection of Deltoid Image Parameters to Preoperative and 2-Year Minimum Postoperative Outcome Measures for aTSA and rTSA. Moderate Correlations (>±0.3) or Higher Highlighted for Emphasis.

Table 16. Impact of Deltoid Shape Flatness on Preoperative Clinical Outcome Measures for Male and Female Patients.

Table 17. Impact of Deltoid Shape Flatness on 2-year Minimum Clinical Outcome Measures for aTSA and rTSA Patients Stratified by Gender.

Table 18. Comparison of the Mean Absolute Error (MAE) Associated with Two Different Machine Learning Models (Automated Image-Based with No Surgeon Range of Motion Measurements or Patient Subjective Inputs vs. No-Image Data) to Predict Clinical Outcomes at 1 year, 2-3 years, and 3-5 years after aTSA and rTSA.