WindowNet: Learnable Windows for Chest X-ray Classification

Public chest X-ray (CXR) data sets are commonly compressed to a lower bit depth to reduce their size, potentially hiding subtle diagnostic features. In contrast, radiologists apply a windowing operation to the uncompressed image to enhance such subtle features. While it has been shown that windowing improves classification performance on computed tomography (CT) images, the impact of such an operation on CXR classification performance remains unclear. In this study, we show that windowing strongly improves the CXR classification performance of machine learning models and propose WindowNet, a model that learns multiple optimal window settings. Our model achieved an average AUC score of 0.812 compared with the 0.759 score of a commonly used architecture without windowing capabilities on the MIMIC data set.


Introduction
To better differentiate subtle pathologies, chest X-rays (CXR) are commonly acquired with a high bit-depth. For example, the images in the MIMIC data set provide 12-bit gray values (Johnson et al., 2019). However, to reduce file size and save bandwidth, these images are often compressed to a lower bit-depth. The ChestX-ray14 data set, for example, was reduced to 8-bit depth before publication (Wang et al., 2017). Under optimal conditions, the human eye can differentiate between 700 and 900 shades of gray, corresponding to a 9- to 10-bit depth (Kimpe and Tuytschaever, 2007). Hence, radiologists cannot differentiate all 12-bit gray values when inspecting a chest X-ray. To better identify subtle contrasts, they apply a windowing operation to the image: they increase the contrast by limiting the range of displayed gray tones (see Figure 1). These windowing operations can be specified by their center (level) and width.
In contrast to chest radiographs, gray values in computed tomography (CT) images are calibrated to represent a specific Hounsfield unit (HU) (Maier et al., 2018). For example, an HU value of -1000 corresponds to air, 0 HU to distilled water at standard pressure and temperature, and bones range from 400 HU to 3000 HU (Maier et al., 2018). To highlight the lung in a chest CT image, one could apply a window with a level of -600 HU and a width of 1500 HU (Kazerooni and Gross, 2004). In other words, everything below -1350 HU is displayed as black and everything above 150 HU as white. Consequently, more distinct gray tone values can be used for the specified range, resulting in a higher contrast.
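The lung-window numbers above can be verified with a short sketch. The following minimal NumPy example (the function name and the output maximum of 255 are illustrative choices, not from the paper) maps intensities through a window defined by its level and width:

```python
import numpy as np

def apply_window(image, level, width, out_max=255.0):
    """Map pixel intensities through a display window.

    Values below the lower limit (level - width/2) become black (0),
    values above the upper limit (level + width/2) become white (out_max),
    and values in between are stretched linearly across the output range.
    """
    lower = level - width / 2.0
    upper = level + width / 2.0
    return np.clip((image - lower) / width, 0.0, 1.0) * out_max

# Lung window from the text: level -600 HU, width 1500 HU.
# -1350 HU maps to black, 150 HU to white, -600 HU to mid-gray.
hu = np.array([-2000.0, -1350.0, -600.0, 150.0, 1000.0])
print(apply_window(hu, level=-600, width=1500))
```

Stretching the 1500 HU wide range over the full output scale is what yields the higher contrast described above.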
For CT images, several studies showed that windowing improves the classification performance of deep neural networks (Karki et al., 2020; Huo et al., 2019; Lee et al., 2018; Kwon and Choi, 2020). For CXR, no quantitative scale like the Hounsfield unit exists. Nevertheless, radiologists window CXRs for enhanced contrast during inspection. Furthermore, depending on the region of interest, they use different window settings. This observation leads to the following research questions: does windowing affect chest X-ray classification performance, and if so, can windowing improve it? To the best of our knowledge, chest X-rays have so far commonly been processed by deep learning models without applying any windowing operation (for example, Rajpurkar et al., 2017; Wollek et al., 2023a). This study investigates the effect of windowing on chest X-ray classification and proposes a model, WindowNet, that learns optimal windowing settings.
Our contributions are:
• We show that a higher bit-depth (8-bit vs. 12-bit) improves chest X-ray classification performance.
• We demonstrate that applying a window to the chest radiograph as a pre-processing step increases classification performance.
• We propose WindowNet, a chest X-ray classification model that learns optimal windowing settings.

Data Set
To investigate the importance of windowing for chest X-ray classification, we selected the MIMIC data set, as it is the only publicly available, large-scale chest X-ray data set with full bit-depth (Johnson et al., 2019). The MIMIC data set provides chest radiographs in the original Digital Imaging and Communications in Medicine (DICOM) format with 12-bit gray values, comprising 377,110 frontal and lateral images from 65,379 patients. The images have been labeled according to the 14 CheXpert classes: atelectasis, cardiomegaly, consolidation, edema, enlarged cardiomediastinum, fracture, lung lesion, lung opacity, no finding, pleural effusion, pleural other, pneumonia, pneumothorax, and support devices (Irvin et al., 2019). In our experiments, we used the provided training, validation, and test splits. During pre-processing, the images were resized to 224 × 224 pixels.

WindowNet
To incorporate windowing into the model architecture, we extended the baseline architecture by prepending a windowing layer, as illustrated in Figure 2. In the following, we refer to this model as WindowNet.
We implemented the windowing operation as a 1 × 1 convolution with clamping, similar to (Lee et al., 2018). This implementation of windowing with convolutional kernels enables the model to learn and use multiple windows in parallel. As the pre-trained DenseNet-121 expects three input channels, we added an additional 1 × 1 convolution with three output channels after the windowing operation. Following the windowing layer, the images are scaled to the floating point range [0.0, 255.0] and then normalized according to the ImageNet mean and standard deviation.
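As a rough, framework-agnostic illustration of this layer, the NumPy sketch below emulates a bank of clamped 1 × 1 convolutions, one output channel per window. The function name and the specific window values are illustrative assumptions; the weight/bias initialization W = U/WW, b = −(U/WW)·L follows the convention attributed to Lee et al. (2018):

```python
import numpy as np

def windowing_layer(image, levels, widths):
    """Apply several window settings in parallel, one output channel each.

    Each window is an affine map plus clamping -- exactly what a 1x1
    convolution followed by a clamp computes, so the slope (weight) and
    offset (bias) remain learnable parameters inside a network.
    """
    channels = []
    for wl, ww in zip(levels, widths):
        lower, upper = wl - ww / 2.0, wl + ww / 2.0
        weight = upper / ww            # conv weight, W = U / WW
        bias = -weight * lower         # conv bias,  b = -(U / WW) * L
        channels.append(np.clip(weight * image + bias, 0.0, upper))
    return np.stack(channels)          # shape: (num_windows, H, W)

# Two illustrative windows applied to a random 12-bit image.
x = np.random.default_rng(0).integers(0, 4096, size=(224, 224)).astype(float)
y = windowing_layer(x, levels=[2500, 1000], widths=[3000, 500])
print(y.shape)  # (2, 224, 224)
```

In the actual model, the resulting multi-channel stack would then be reduced to three channels by the additional 1 × 1 convolution before entering the DenseNet-121 backbone.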

Training
Both models were trained with binary cross-entropy loss, AdamW optimization with a learning rate of 1e-4 (Loshchilov and Hutter, 2019), and a batch size of 32. During training, the learning rate was divided by a factor of 10 if the validation loss did not improve in three consecutive epochs. Training was stopped if the validation loss did not improve for five consecutive epochs. The final models were selected based on the checkpoint with the highest mean validation area under the receiver operating characteristic curve (AUC).
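This schedule can be sketched as a small state machine. The sketch below is one plausible reading of the text (in particular, whether the learning rate is cut once or repeatedly at a plateau is not specified); the class name is made up:

```python
class PlateauSchedule:
    """Reduce-on-plateau learning rate with early stopping: divide the LR
    by 10 after 3 epochs without validation-loss improvement, and stop
    training after 5 epochs without improvement.
    """
    def __init__(self, lr=1e-4, lr_patience=3, stop_patience=5):
        self.lr = lr
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Update after one epoch; return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs == self.lr_patience:
                self.lr /= 10.0
        return self.bad_epochs >= self.stop_patience

# Usage: feed the validation loss after every epoch.
sched = PlateauSchedule()
for val_loss in [0.70, 0.65, 0.66, 0.66, 0.67]:
    if sched.step(val_loss):
        break
```

After the illustrative loss sequence above, the learning rate has been reduced once (to 1e-5) and training continues, since fewer than five non-improving epochs have elapsed.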

Experiments

8-Bit vs. 12-Bit
As applying a windowing operation in our experiments required a higher initial bit-depth than conventionally used for chest X-ray image classification, we first tested the effect of bit-depth on classification performance. We trained the baseline model with 8-bit and 12-bit depth and compared mean and class-wise AUC scores. In both settings, no windowing operation was applied. However, the 12-bit images were still scaled to the floating point range [0.0, 255.0]. In both settings, the images were normalized according to the ImageNet mean and standard deviation.
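To make the comparison concrete, the toy sketch below contrasts the two input pipelines. The quantization step (dropping the four low-order bits) is our assumed stand-in for an 8-bit conversion; the exact conversion used for published 8-bit data sets may differ:

```python
import numpy as np

def to_model_range(image_12bit, simulate_8bit=False):
    """Scale a 12-bit image to the float range [0.0, 255.0].

    With simulate_8bit=True, the four low-order bits are dropped first
    (an assumed stand-in for lossy 8-bit conversion): neighboring 12-bit
    gray values become indistinguishable.
    """
    image = np.asarray(image_12bit, dtype=float)
    if simulate_8bit:
        image = np.floor(image / 16.0) * 16.0  # quantize 4096 -> 256 levels
    return image * (255.0 / 4095.0)

subtle = np.array([1000.0, 1007.0])  # a subtle 12-bit intensity contrast
print(to_model_range(subtle))                      # two distinct values
print(to_model_range(subtle, simulate_8bit=True))  # collapses to one value
```

The collapsed pair illustrates how bit-depth reduction can hide exactly the subtle contrasts that windowing is meant to enhance.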

Single Fixed Window
To investigate whether windowing has an effect on classification performance, we trained the baseline model with a single fixed windowing operation applied to the 12-bit CXRs. After windowing, the images were scaled to have a maximum value of 255 and normalized according to the ImageNet mean and standard deviation.
For windowing, we used a fixed window level of 100 as well as levels ranging from 250 to 3500 in steps of 250. All levels were combined with fixed window widths of 500, 1000, 1500, 2000, and 3000. For evaluation, we compared the mean and class-wise AUCs of each model to the baseline with no windowing, i.e., a window level of 2048 and a width of 4096.
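For reference, the fixed-window grid described above can be enumerated directly; this sketch only reproduces the settings listed in the text:

```python
# Window levels: 100, plus 250 to 3500 in steps of 250 (15 levels total).
levels = [100] + list(range(250, 3501, 250))
# Window widths combined with every level.
widths = [500, 1000, 1500, 2000, 3000]
settings = [(level, width) for level in levels for width in widths]

# Baseline "no windowing": level 2048, width 4096 covers the full 12-bit range.
baseline = (2048, 4096)
print(len(settings))  # 75 window settings in the grid
```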

Trainable Multi-Windowing
To test whether end-to-end optimized windows improve chest X-ray classification performance, we compared our proposed WindowNet to the baseline and to a modified WindowNet without clamping in the windowing layer (No Windowing), i.e., a conventional 1 × 1 convolutional layer.

Theory
A windowing operation can be described by its center (window level) and width (window width). Formally, the windowing operation applied to a pixel value px can be defined as:

window(px) =
    0                    if px ≤ L
    (U/WW) · (px − L)    if L < px < U
    U                    if px ≥ U

where U = WL + WW/2 is the upper limit and L = WL − WW/2 the lower limit of the window defined by the window level WL and window width WW.
For efficient training, the windowing operation can be re-written as a clamped 1 × 1 convolution, whose output is clamp(W · px + b, 0, U). Here, the weight matrix is initialized as W = U/WW and the bias term as b = −(U/WW) · L, similar to (Lee et al., 2018). To recover the window level and width after training, we compute the lower limit L = −b/W, the width WW = L/(W − 1) (which follows from combining U = W · WW with U = L + WW), and the level WL = L + WW/2.
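The initialization and recovery can be sanity-checked numerically with a round trip. The sketch below assumes the weight/bias convention W = U/WW and b = −(U/WW)·L, with U and L the upper and lower window limits; the function names are illustrative:

```python
def window_to_conv(level, width):
    """Initialize the 1x1-conv weight and bias from a window setting."""
    upper = level + width / 2.0    # U, upper window limit
    lower = level - width / 2.0    # L, lower window limit
    weight = upper / width         # W = U / WW
    bias = -weight * lower         # b = -(U / WW) * L
    return weight, bias

def conv_to_window(weight, bias):
    """Recover the window level and width from learned parameters.

    Note: undefined for weight == 1, i.e., a window whose lower limit is 0.
    """
    lower = -bias / weight                     # L = -b / W
    # From U = W * WW together with U = L + WW: WW = L / (W - 1).
    width = lower / (weight - 1.0)
    return lower + width / 2.0, width          # WL = L + WW / 2

# Round trip for an example window (level 2500, width 3000).
weight, bias = window_to_conv(2500, 3000)
level, width = conv_to_window(weight, bias)
print(round(level), round(width))  # 2500 3000
```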

8-Bit vs. 12-Bit
The classification AUCs when trained with 8-bit or 12-bit depth are shown in Table 1. Training with 12-bit images improved the average classification performance compared to 8-bit images (0.772 vs. 0.759 AUC). Also, most (12/14) class-wise AUCs increased when training with a higher bit-depth. The only exceptions were atelectasis and pleural effusion, where training with 8-bit images resulted in slightly higher AUCs of 0.751 vs. 0.749 and 0.883 vs. 0.879, respectively.

Single Fixed Window
The results of training with fixed-window chest X-rays are reported in Table 2. They demonstrate that windowing improved chest X-ray classification AUCs for most classes (12/14), except for fracture and pneumonia with AUCs of 0.710 vs. 0.706 and 0.698 vs. 0.690, respectively. On average, the window with level 2500 and width 3000 performed slightly better than the full range, with an AUC of 0.775 vs. 0.772. Across all windows, a window width of 3000 performed best, with varying window levels.
A comparison of the four best-performing windows to the baseline is shown in Table 3. All five settings achieved similar average AUC scores. No single window performed consistently better across all classes, suggesting that multiple windows could improve overall classification performance.

Discussion
In this study, we investigated the importance of windowing for chest X-ray classification, inspired by radiologists' practice. Our results show that our proposed multi-windowing model, WindowNet, considerably outperformed a popular baseline architecture, with a mean AUC of 0.812 compared to 0.759 (see Table 4). As a necessary pre-condition, we also demonstrated that the common bit-depth reduction negatively affected classification performance (0.759 vs. 0.772 AUC), as seen in Table 1.
Similar to related work in the CT domain (Lee et al., 2018; Karki et al., 2020; Kwon and Choi, 2020), our results show that windowing is a useful pre-processing step for neural networks operating on chest X-rays. These findings are also in line with the manual windowing performed by radiologists in their daily practice. In addition, just as radiologists apply multiple windows when inspecting a single image, no single window setting was best across all classes, including applying no window at all (see Table 2).
When comparing our proposed WindowNet with the same architecture but without windowing, in other words, a conventional 1 × 1 convolution, our results showed that the windowing operation is an important aspect of the architecture (see Table 4). When inspecting the learned windows (see Figure 3), we found that they converged to 14 different settings. This provides further evidence that multiple windows are important for classification performance.
While our study's results are promising, its limitations include its exploratory nature and the evaluation on a data set from a single institution, owing to the lack of other public high bit-depth data sets. Further research is needed to show generalization to other data sets and institutions. Another limitation is that the model learns general windowing settings. In contrast, radiologists adapt the windowing settings to the specific image. Future work could investigate an image-based window setting prediction layer.
In conclusion, we believe our work offers an important contribution to the fields of computer vision and radiology by demonstrating that multi-windowing strongly improves chest X-ray classification performance, as shown by our proposed model, WindowNet.

Acknowledgments
This work was supported in part by the German Federal Ministry of Health's program for digital innovations for the improvement of patient-centered care in healthcare [grant agreement no. 2520DAT920].

Figure 1 :
Figure 1: Applying a windowing operation enhances the contrast of particular structures in an image. For example, the depicted windowing operation improved cardiomegaly classification performance on the MIMIC data set.

Figure 2 :
Figure 2: Optimal multi-window chest X-ray classification. Our proposed WindowNet architecture learns to optimize multiple windows for improved classification.

Table 2 :
Effect of fixed windowing on chest X-ray classification AUCs. For each finding, the best-performing window and the baseline without windowing are reported. Higher AUC values are highlighted in bold. Enlarged Cardiom. = enlarged cardiomediastinum.
For example, pneumothorax classification AUC improved from 0.802 to 0.886 with windowing. Only for the fracture and pleural other classes did the baseline model perform better, with AUCs of 0.664 vs. 0.615 and 0.823 vs. 0.793, respectively.