3. Methods
3.1. Overview
We propose an ensemble approach for lung segmentation that uses a novel selection mechanism to identify the lung segmentation networks that produce the best segmentations for a given CR. To create the collection of networks to select from, we used U-Nets with depths varying from 1 to 5 and DeepLabv3+ with two base networks, ResNet18 and ResNet50. The Shenzhen dataset was separated into a 400-image partition and a 166-image partition to train the collection of lung segmentation networks. All images were zero-padded to become square and resized to 256 × 256 to match the inputs of the networks. The third dimension of the input was kept at 1 for the U-Nets and expanded to 3 for the DeepLabv3+ networks by duplicating the same image three times.
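The preprocessing described above can be summarized in a short sketch. This is a minimal illustration, assuming NumPy and OpenCV are available; the function and variable names are ours, not part of the original implementation.

```python
import cv2
import numpy as np

def preprocess_cr(image, for_deeplab=False, size=256):
    """Zero-pad a chest radiograph to a square and resize to size x size.

    U-Net inputs keep a single channel; DeepLabv3+ inputs replicate the
    grayscale image into three identical channels.
    """
    h, w = image.shape[:2]
    side = max(h, w)
    # Zero-pad symmetrically so the image becomes square.
    top = (side - h) // 2
    left = (side - w) // 2
    padded = np.zeros((side, side), dtype=image.dtype)
    padded[top:top + h, left:left + w] = image
    resized = cv2.resize(padded, (size, size))
    if for_deeplab:
        # Duplicate the grayscale image into three channels.
        return np.stack([resized] * 3, axis=-1)
    return resized[..., np.newaxis]
```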
The introduced selective method is a deep learning model that classifies whether a certain lung segmentation network is ‘selected’ or ‘not selected’, indicating whether the segmentation produced by that network is accurate, i.e., has a high IoU with the true lung mask. Therefore, the lung segmentations generated by all seven networks for the 166 CR partition were re-labeled based on the resulting lung mask’s IoU with the true lung mask: ‘not selected’ if IoU < 0.9 and ‘selected’ if IoU ≥ 0.9. This labeled dataset formed using the 166 CR partition was used to train the selective model, which is referred to as the selector network classifier throughout this paper. It is important to note that the segmentation masks generated by all the above-mentioned networks are applied to the original image, as shown in the example in Figure 2, to reform the 166 CR partition into the training and testing sets for this selector network. The example in Figure 2 shows a CR, a lung mask with a high IoU generated by DeepLabv3+ with ResNet50 for this CR, and the CR segmented using the mask. This dataset was then divided into training (72%), validation (8%), and test (20%) sets to fine-tune a ResNet18 [39] pre-trained on ImageNet [40] that classifies between ‘selected’ and ‘not selected’ images.
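The labeling rule can be expressed concisely. Below is a minimal sketch assuming binary NumPy masks; the 0.9 threshold comes from the description above, while the function names are illustrative.

```python
import numpy as np

IOU_THRESHOLD = 0.9  # threshold used to label a segmentation as 'selected'

def iou(pred_mask, true_mask):
    """Intersection over Union between two binary masks."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty
    return np.logical_and(pred, true).sum() / union

def label_segmentation(pred_mask, true_mask):
    """Return the class label used to train the selector network classifier."""
    return "selected" if iou(pred_mask, true_mask) >= IOU_THRESHOLD else "not selected"
```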
For a test case, we used the selector network classifier to identify which network(s) generated ‘selected’ segmentations. The test case was then segmented using the identified networks, the resulting pixel scores were averaged, and a threshold of 0.5 was applied to generate a binary mask. If none of the networks generated ‘selected’ segmentations, the corresponding test case was considered not to meet the necessary standards; for such a case, the algorithm does not output a segmentation. Therefore, performance was estimated solely on the selected cases.
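The test-time procedure can be outlined as follows. This is an illustrative sketch under the assumption that each segmentation network exposes a per-pixel probability map, that the selector returns a posterior probability for the ‘selected’ class, and that a provisional 0.5 binarization is used to mask the CR before it is shown to the selector; all names here are ours.

```python
import numpy as np

def segment_with_selection(cr, seg_networks, selector, select_prob_threshold=0.5):
    """Selective ensemble inference for one chest radiograph (CR).

    seg_networks: callables returning a per-pixel lung probability map.
    selector: callable returning P('selected') for a CR masked by a
              candidate segmentation (see Section 3.4).
    """
    selected_scores = []
    for net in seg_networks:
        score_map = net(cr)                  # per-pixel lung probabilities
        candidate_mask = score_map >= 0.5    # provisional binary mask (assumption)
        masked_cr = cr * candidate_mask      # keep only the segmented lung area
        if selector(masked_cr) >= select_prob_threshold:
            selected_scores.append(score_map)

    if not selected_scores:
        return None  # no network was 'selected'; the system gives no output

    fused = np.mean(selected_scores, axis=0)  # average probability scores
    return fused >= 0.5                       # final binary lung mask
```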
3.2. U-Net Lung Segmentation Network Architectures
U-Net is a network that is popular for segmentation tasks related to medical images. The U-Net architecture starts with a contracting path that is similar to a convolutional network architecture and targets capturing the context of an input to the network [21]. The context is captured by downsampling within the contracting path. The contracting path is followed by an expanding path that upsamples the captured context, similar to a decoder, and performs precise localization [21]. The feature maps on the expanding path are concatenated with feature maps from the contracting path via skip connections, regaining information that may have been lost during downsampling.
The basic building block of a downsampling step in the contracting path consists of two 3 × 3 convolutions without padding, each followed by a ReLU layer, and a 2 × 2 max pooling layer with a stride of 2 [21].
Figure 3 shows an example U-Net architecture that exhibits each step of this path. In the expanding path, a block starts with a 2 × 2 up-convolution. This is followed by concatenation with the contracting path’s feature maps and two 3 × 3 convolutions (each followed by a ReLU layer) [21]. The depth of a U-Net can be changed based on the number of such blocks used in the contracting (encoder) path of the architecture; the same number of blocks on the expanding path decodes the encoded features. At the final layer, the third dimension of the feature vector is mapped to the number of classes using a 1 × 1 convolution. We used five U-Net architectures with depths from 1 to 5 in the collection of initial networks considered by the selector network classifier. The example U-Net architecture in Figure 3 has a depth of 3, in which the lowest-resolution feature maps are reached after three downsampling steps.
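For concreteness, the sketch below shows a depth-configurable U-Net in PyTorch. It is a minimal stand-in rather than the exact networks used here: for brevity it uses same-padded convolutions (unlike the unpadded convolutions described above), and the layer widths are illustrative.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    """Two 3x3 convolutions, each followed by ReLU ('same' padding for simplicity)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    """Depth-configurable U-Net (depth = number of downsampling steps)."""

    def __init__(self, depth=3, in_channels=1, num_classes=2, base_width=64):
        super().__init__()
        widths = [base_width * 2 ** i for i in range(depth + 1)]

        # Contracting (encoder) path.
        self.encoders = nn.ModuleList()
        ch = in_channels
        for w in widths[:-1]:
            self.encoders.append(double_conv(ch, w))
            ch = w
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bottleneck = double_conv(widths[-2], widths[-1])

        # Expanding (decoder) path: 2x2 up-convolutions plus skip concatenations.
        self.upconvs = nn.ModuleList()
        self.decoders = nn.ModuleList()
        for w_low, w_skip in zip(widths[:0:-1], widths[-2::-1]):
            self.upconvs.append(nn.ConvTranspose2d(w_low, w_skip, kernel_size=2, stride=2))
            self.decoders.append(double_conv(w_skip * 2, w_skip))

        # 1x1 convolution mapping features to the number of classes.
        self.head = nn.Conv2d(widths[0], num_classes, kernel_size=1)

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.upconvs, self.decoders, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([x, skip], dim=1))
        return self.head(x)
```

For example, `UNet(depth=3, in_channels=1, num_classes=2)` applied to a 1 × 1 × 256 × 256 tensor returns a 1 × 2 × 256 × 256 score map.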
3.3. DeepLabv3+ Lung Segmentation Network Architectures
DeepLabv3+ is another encoder–decoder-based deep convolutional network that is widely used for semantic segmentation of medical images. It uses Atrous Spatial Pyramid Pooling (ASPP) to encode contextual information at multiple scales, and the decoding path effectively recovers the boundaries of the object being segmented [22]. In the DeepLabv3+ architectures used to generate results for the proposed method, pre-trained ResNet18 and ResNet50 CNN models were used as the underlying feature extractors during encoding.
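As an illustration of this configuration, the sketch below instantiates DeepLabv3+ models with ResNet18 and ResNet50 encoders. It assumes the third-party segmentation_models_pytorch package rather than the authors’ original framework, so it is a stand-in, not the exact implementation.

```python
import segmentation_models_pytorch as smp

def build_deeplabv3plus(encoder_name="resnet50", num_classes=2):
    """DeepLabv3+ with an ImageNet-pretrained ResNet encoder (3-channel input)."""
    return smp.DeepLabV3Plus(
        encoder_name=encoder_name,    # "resnet18" or "resnet50"
        encoder_weights="imagenet",   # pre-trained feature extractor
        in_channels=3,                # grayscale CR duplicated to 3 channels
        classes=num_classes,          # lung vs. background
    )

deeplab_resnet18 = build_deeplabv3plus("resnet18")
deeplab_resnet50 = build_deeplabv3plus("resnet50")
```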
3.4. Selector Network Classifier Architecture
An image subjected to the proposed method proceeds to a network classifier that predicts which of the seven initial networks (five U-Net and two DeepLabv3+ versions) outputs the best segmentations. The selection of the best network is made by a pre-trained ResNet18 model that predicts either the ‘selected’ or the ‘not selected’ class as the output corresponding to each network. Therefore, the ResNet18 architecture was altered to classify between two classes. During training, the Adam optimizer was used with a mini-batch size of 16, an initial learning rate of , and validation enabled to select the model. The training and validation data were shuffled at each epoch and before each validation pass, which was run every 50 iterations. Training continued until reaching 30 epochs or until the validation loss was larger than or equal to the previously smallest validation loss on three occasions. We call this classifier network the selector network classifier.
It is important that the segmented portion of the CR is extracted, similar to the image in Figure 2c, for use in the training set or in testing. The labels of the training set for this network classifier are based on the IoU between the segmentation masks produced by the seven initial networks and the true lung mask; an IoU threshold of 0.9 decides whether a segmentation is acceptable. When a segmented CR is tested with the selector network classifier, the posterior probability predicted for the ‘selected’ class is observed and a threshold of 0.5 is applied to decide whether the segmentation belongs to the ‘selected’ class.
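A minimal PyTorch sketch of this fine-tuning setup is shown below. The mini-batch size, validation cadence, and stopping criteria follow the description above, while the learning rate value, data loaders, and helper names are placeholders introduced for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_selector():
    """ResNet18 pre-trained on ImageNet, re-headed for two classes."""
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, 2)  # 'selected' vs. 'not selected'
    return model

def train_selector(model, train_loader, val_loader, lr=1e-4,
                   max_epochs=30, val_every=50, patience=3):
    """Fine-tune with Adam; stop after max_epochs or `patience` non-improving validations."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # lr value is a placeholder
    best_val, bad_vals, step = float("inf"), 0, 0

    for epoch in range(max_epochs):
        for images, labels in train_loader:       # assumes a DataLoader with shuffle=True
            model.train()
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            step += 1

            if step % val_every == 0:             # validate every 50 iterations
                model.eval()
                with torch.no_grad():
                    val_loss = sum(criterion(model(x), y).item()
                                   for x, y in val_loader) / len(val_loader)
                if val_loss < best_val:
                    best_val, bad_vals = val_loss, 0
                    torch.save(model.state_dict(), "best_selector.pt")
                else:
                    bad_vals += 1
                    if bad_vals >= patience:      # early stopping
                        return model
    return model
```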
3.5. Fusion Technique
For a given CR, the segmentation masks produced by the networks selected by the selector network classifier are fused to obtain a final segmentation mask of the lungs. The scores assigned to each pixel on the semantic segmentations generated by each selected network are averaged to obtain a single score distribution for the image. Then, the averaged score distribution is binarized by applying a threshold of 0.5 such that the possible lung area is assigned a binary value of 1. Morphological operations are applied following the binarization to fill any holes within objects and to select the largest two objects within the image.
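The fusion and post-processing steps can be sketched as follows, assuming the per-pixel probability maps from the selected networks are available as NumPy arrays and that SciPy/scikit-image are used for the morphological operations (the original implementation may differ).

```python
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.measure import label, regionprops

def fuse_selected_masks(score_maps, threshold=0.5, keep=2):
    """Average per-pixel scores from the selected networks and post-process."""
    mean_scores = np.mean(score_maps, axis=0)   # average probability scores
    binary = mean_scores >= threshold           # lung pixels -> 1

    binary = binary_fill_holes(binary)          # fill holes within objects
    labeled = label(binary)                     # connected components
    regions = sorted(regionprops(labeled), key=lambda r: r.area, reverse=True)
    final = np.zeros_like(binary, dtype=bool)
    for region in regions[:keep]:               # keep the two largest objects
        final[labeled == region.label] = True
    return final
```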
Figure 4 shows the block diagram for the overall system. A CR is segmented by the 7 initial networks, and each of them is classified as ‘selected’ or ‘not selected’ by the selector network classifier. The networks are shown at the top of Figure 4, and the selector network classifier is shown below and to the right of them, with its inputs indicated by arrows. The arrows that exit the selector network classifier are shown in two colors to represent the ‘selected’ and ‘not selected’ predictions. Note that the set of predictions shown in Figure 4 is one example of what the selector network classifier can output. The probability scores from the networks corresponding to the ‘selected’ cases are then averaged, and a threshold is applied to produce a binary segmentation mask. The large green arrows shown as inputs to the ‘Average Probability Scores’ block in Figure 4 represent the probability score distributions output by the ‘selected’ networks. The input arrow to the ‘Apply Threshold’ block represents the average probability distribution produced for the image. The arrow that exits this block shows the final binary lung mask after a threshold is applied.
5. Discussion and Conclusions
In this paper, we have proposed a novel selective step for ensemble techniques applied to lung segmentation in CRs. A selective process was developed that permits a deep learning network to predict the best-performing segmentation networks for a given case; hence, the selection is patient-specific. The introduced deep learning network is referred to within the scope of this research as the selector network classifier. Seven variations of U-Net and DeepLabv3+, networks known to perform exceptionally well in semantically segmenting lung fields on CRs, are considered for testing the selector network classifier. The encoder depth of the U-Net is varied from 1 through 5 to generate five of the seven networks; the other two networks are DeepLabv3+ with ResNet18 and ResNet50 as backbone architectures, respectively. Technically, the function of the selector network classifier is to predict which of the seven networks generates a segmentation mask with an IoU of at least 0.9 for a given CR. The 166 CR partition of the Shenzhen dataset is used to train the selector network classifier. The training data contain only the information within the lung areas detected by each network, which is ensured by multiplying each segmentation mask by the original CR image; the features within the segmented lungs therefore remain visible, while the remaining image pixels are set to zero.
The proposed selector network classifier follows the ResNet18 model, which is widely used for transfer learning. The model is pre-trained on ImageNet and altered to output two classes. As described in Section 4.3, the fine-tuned ResNet18 model performs well in predicting whether a network generates a segmentation with an IoU of at least 0.9. The selector network classifier shows acceptable performance, making 83.2% of all predictions correctly, with an AUC of 0.9016 for the ROC curve.
With the selector network classifier in place, the ensemble approach to generating lung segmentations exceeds the performance of both individual lung segmentation networks and traditional ensemble methods. Evaluations over ten trials on two public datasets, MC and JSRT, revealed that the proposed method achieves a higher mean IoU than the traditional methods. The MC and JSRT datasets exhibit 13% and 3% increases in mean IoU compared to the AoTE approach, and 13% and 5% increases compared to the AoAE approach, respectively. Compared to all other methods, the test performance on the MC and JSRT datasets shows 21% and 9% increases, respectively. In addition, the mean IoU values produced by the proposed method for the MC dataset were higher than 0.9 in nine out of ten trials; 0.9 is the IoU value used to decide whether a case is considered ‘selected’ during the labeling of the training data. Another key factor of note is the similar disease focus of TB in the training set (Shenzhen) and the MC dataset. The presence of a particular disease is a major factor that varies the features within a specific patient’s lung region on a chest X-ray. JSRT, on the other hand, has a different disease focus, namely, lung nodules; nonetheless, the proposed method shows a mean IoU very close to 0.9 and an improvement over the traditional methods. To account for the presence of disease, a further performance analysis using segmentations generated by the proposed algorithm for TB detection was performed, as shown in Section 4.5. Our approach yields the highest AUC of 0.8 for the corresponding ROC curve compared to both the traditional approaches and the direct use of raw images. This shows the effectiveness of our approach in detecting disease on a CR.
In general, most of the cases in the MC dataset are ‘selected’, while most cases in JSRT are classified as ‘not selected’. The images that are ‘not selected’ do not produce a segmentation mask and are classified as images that do not meet the requirements of the algorithm. Therefore, the system has no output if none of the lung segmentation networks is predicted to produce a segmentation mask with an IoU of at least 0.9 for an input CR. This is a limitation of the proposed system. The major potential requirement for such images is preprocessing, which opens up a possibility for future research. In addition, the performance of the proposed system for a given CR is limited by the best-performing lung segmentation network; if the selector network classifier selects multiple networks, the fusion of their probability scores could lead to a slight performance improvement. Another limitation of the selective ensemble (SE) method is its inability to segment an abnormally shaped lung region, as the selector network classifier is trained with lung segmentations produced for normally shaped lung regions. Moreover, the selective method is not trained for pediatric CRs, which differ from adult CRs in the shape of the lungs and in the presence of organs that are absent or differently shaped in an adult CR.
In addition to image quality improvements via preprocessing, there are other important aspects of this research that can serve as benchmarks for future work. The proposed SE method can be extended to incorporate any number of lung segmentations whenever a new lung segmentation algorithm is introduced. The main component studied here, the selector network classifier, can be implemented with other transfer learning models and has the potential to be used in other applications. More training data could be added to train the selector network classifier. Furthermore, it could be improved to function as a grader instead of a classifier, predicting a weight corresponding to each lung segmentation; the final fusion would then be a combination of weighted probability scores instead of a plain average, as sketched below. As ensemble methods have been introduced in the literature for other applications, both medical and non-medical, and for image types other than CRs, the selector network classifier proposed in this paper can be extended accordingly. In addition, the IoU threshold used to categorize the selector network classifier training data can be adjusted in keeping with the expected accuracy.
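As a purely hypothetical illustration of this grader variant (it is not part of the proposed method), the fusion in Section 3.5 could be replaced by a weighted average, where each weight is the grade predicted for the corresponding segmentation:

```python
import numpy as np

def weighted_fusion(score_maps, grades, threshold=0.5):
    """Hypothetical weighted fusion: grades (e.g., predicted IoUs) weight each
    network's probability map instead of averaging them equally."""
    weights = np.asarray(grades, dtype=float)
    weights = weights / weights.sum()                     # normalize weights
    fused = np.tensordot(weights, np.stack(score_maps), axes=1)
    return fused >= threshold                             # final binary mask
```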
As our results suggest, the selective ensemble method proposed in this paper shows a performance improvement over traditional ensemble methods in lung segmentation in CRs. We believe the selective approach helps to overcome a performance drop caused by the possible incorporation of inapplicable segmentations during an ensemble approach. This paper sets a compelling benchmark for many potential future works and applications of the proposed method.