Article

Implementation of Automatic Segmentation Framework as Preprocessing Step for Radiomics Analysis of Lung Anatomical Districts

Alessandro Stefano, Fabiano Bini, Nicolò Lauciello, Giovanni Pasini, Franco Marinozzi and Giorgio Russo
1 Institute of Bioimaging and Complex Biological Systems—National Research Council (IBSBC-CNR), Contrada Pietrapollastra-Pisciotto, Cefalù, 90015 Palermo, Italy
2 Department of Mechanical and Aerospace Engineering, Sapienza University of Rome, Eudossiana 18, 00184 Rome, Italy
* Author to whom correspondence should be addressed.
These authors share first authorship.
BioMedInformatics 2024, 4(4), 2309-2320; https://doi.org/10.3390/biomedinformatics4040125
Submission received: 9 October 2024 / Revised: 14 November 2024 / Accepted: 2 December 2024 / Published: 11 December 2024
(This article belongs to the Section Imaging Informatics)

Abstract

Background: The advent of artificial intelligence has significantly impacted radiology, with radiomics emerging as a transformative approach that extracts quantitative data from medical images to improve diagnostic and therapeutic accuracy. This study aimed to enhance the radiomic workflow by applying deep learning, through transfer learning, for the automatic segmentation of lung regions in computed tomography scans as a preprocessing step. Methods: Leveraging a pipeline articulated in (i) patient-based data splitting, (ii) intensity normalization, (iii) voxel resampling, (iv) bed removal, (v) contrast enhancement and (vi) model training, a DeepLabV3+ convolutional neural network (CNN) was fine-tuned to perform whole-lung-region segmentation. Results: The trained model achieved high accuracy, with a Dice coefficient of 0.97 and a BF score of 93.06%, and it effectively preserved lung region areas while removing confounding anatomical regions such as the heart and the spine. Conclusions: This study introduces a deep learning framework for the automatic segmentation of lung regions in CT images, leveraging an articulated pipeline and demonstrating excellent model performance, effectively isolating lung regions while excluding confounding anatomical structures. Ultimately, this work paves the way for more efficient, automated preprocessing tools in lung cancer detection, with the potential to significantly improve clinical decision making and patient outcomes.

1. Introduction

According to the American Cancer Society, lung cancer is the second most common form of cancer in the United States in both men and women, with an estimated 238,340 new cases in the United States alone in 2024. Five-year survival for lung cancer varies widely by stage and type, with an overall average of about 23%. Lung cancer is by far the leading cause of cancer death in the United States, accounting for about one in five cancer deaths; each year, more people die from lung cancer than from colon, breast and prostate cancer combined [1]. Lung cancer is divided into two main types: non-small-cell lung cancer (NSCLC) and small-cell lung cancer (SCLC). NSCLC accounts for about 80–85% of cases and includes subtypes such as adenocarcinoma, squamous-cell carcinoma and large-cell carcinoma. SCLC, on the other hand, makes up about 10–15% of cases and is generally more aggressive and associated with a worse prognosis [2]. These data highlight the importance of early detection and of innovative approaches, such as radiomic analysis, to improve the management and prognosis of lung cancer.
Radiomics, a revolutionary approach in radiology, aims to redefine diagnostic images from mere visual representations into complex, data-rich sources of quantitative information [3,4]. This paradigm shift enables the extraction and analysis of information previously obscured, making it increasingly accessible and interpretable through advanced deep learning techniques [5,6]. Radiomics bridges the gap between engineering and medicine, enhancing diagnostic and therapeutic processes through the synergy of visual image interpretation and in-depth data analysis [7,8]. Radiomics has proven to be a powerful tool for analyzing a range of medical imaging modalities, including computed tomography (CT), magnetic resonance imaging (MRI) and positron emission tomography (PET), and it facilitates the extraction of quantitative features that are not readily apparent through visual inspection alone. This data-centric approach supports the creation of predictive models capable of addressing specific clinical questions, thereby enabling more informed and accurate decision making [9,10,11]. A pivotal component of the radiomic workflow is the segmentation process, i.e., delineating regions of interest (ROIs) within medical images. Accurate segmentation is crucial, as it directly impacts the quality and reliability of subsequent analytical steps [12,13]. However, manual segmentation remains the gold standard, and its time-consuming and operator-dependent nature limits the applicability of radiomics and strengthens the need to develop automatic segmentation frameworks.
Current automatic biomedical image segmentation primarily relies on deep neural networks and convolutional neural network (CNN) architectures, achieving high precision and accuracy. However, challenges remain, including the need for large labeled datasets, susceptibility to noise and difficulties in generalizing to new clinical scenarios. Techniques like transfer learning and data augmentation have proven effective in overcoming these limitations in medical imaging. For instance, Kamnitsas et al. [14] demonstrated that these techniques improved accuracy in brain image segmentation, while Bressem et al. [15] proposed hybrid approaches combining pre-trained networks with data augmentation to enhance generalization. Transfer learning is based on a two-phase approach: in the first phase, the model is trained on a large, generic dataset to acquire foundational features; in the second phase, it is fine-tuned for specific tasks.
In lung cancer segmentation, most studies have focused on direct cancer segmentation, discarding information that is unnecessary for the task and keeping only the crucial information that can improve cancer segmentation. In the literature, Primakov et al. [16] adopted a similar approach in which the segmentation network was preceded by a pre-processing pipeline focused on preserving the lung district and eliminating useless information, such as that coming from the heart. Commercially, a similar solution is adopted by the Eclipse treatment planning system, which, leveraging the AI-Rad Companion Lung CAD (Siemens Healthineers) [17] system, enables the segmentation of lung nodules in four steps: (i) pre-processing, (ii) candidate generation, (iii) classification and (iv) post-processing. Indeed, in the pre-processing step, the auto-contouring platform segments the whole lung parenchyma, leveraging a V-Net [18,19] convolutional neural network.
In line with these studies, this research focuses on developing an automated framework to detect and segment the whole lung district, comprising the lungs and the cancer, as a preprocessing step for lung cancer segmentation. This preprocessing step uses the DeepLabV3+ network, thus differing from [16], whose preprocessing stage does not employ a CNN, and from the Eclipse treatment planning system, which differs in both CNN type and clinical objective; indeed, the Eclipse system focuses on detecting lung nodules.
Moreover, this research explores the use of transfer learning, an advanced technique that overcomes the limitations of traditional models, particularly in the medical field, where labeled data are scarce or difficult to obtain [20,21]. Compared to conventional deep learning, which typically requires enormous amounts of data for accurate results, transfer learning reduces dependence on large datasets and minimizes the need for manual annotations, thus saving time and resources [22,23]. By utilizing models pre-trained on large and diverse datasets, researchers can leverage learned features that generalize well to specific medical tasks, allowing for robust performance even with limited data. Additionally, this technique enhances the model's ability to generalize, reducing the risk of overfitting, a common issue in models trained on small datasets. This approach not only accelerates the training process but also improves the accuracy and reliability of segmentation results, which are crucial in clinical settings to ensure consistent and objective outcomes [24]. In this context, transfer learning is ideal for whole-lung-district segmentation, as it enables the reuse of complex structural features already learned from larger datasets, ensuring precise segmentation while minimizing manual intervention.

2. Materials and Methods

This study utilizes a dataset of chest CT images that were processed to ensure uniformity and quality, using intensity normalization, spatial resampling and contrast enhancement procedures. The objective is the automated segmentation of whole lung districts, for which a deep learning model based on the DeepLabV3+ [25] architecture was employed and adapted to segment specific regions of interest. The model's performance was evaluated using standard segmentation metrics. The following sections provide a detailed explanation of the steps involved.

2.1. Image Preprocessing

Preprocessing is a critical step in radiomics aimed at enhancing image quality and standardizing inputs prior to feature extraction and analysis. For this study, preprocessing was implemented in MATLAB, where custom code was developed to handle DICOM files, execute necessary preprocessing tasks and conduct lung segmentation. The main steps involved in preprocessing included the following:
Intensity Normalization: The Hounsfield Unit (HU) values in each CT image were constrained to the range [−1024, 3071] HU to ensure uniformity across different imaging sources and protocols. Values outside this range were clipped, and images were subsequently normalized to a [0, 1] scale at the end of the preprocessing pipeline.
Isotropic Voxel Resampling: The resampling process standardized the spatial resolution of each image to 1 mm³ voxel spacing. This step ensured that each voxel had isotropic dimensions, preserving anatomical details and preventing distortions that might otherwise affect the model's accuracy. Bilinear interpolation was selected as the resampling method, achieving a balance between image quality and computational efficiency and ensuring consistent resolution across all images.
Thresholding: A thresholding method was applied to distinguish the internal background (thoracic cage) and lung areas from other anatomical structures. This technique involved converting images to a binary format by defining a specific HU range that classified pixels as either belonging to lung regions or not. Non-relevant areas outside the thoracic region were removed, producing a mask that effectively isolated the chest region.
Specifically, a preliminary thoracic mask was created by applying a HU range threshold of [−200, 3000] to filter out non-thoracic structures. This mask was used to generate a rough mask of the lungs. Then, the lung mask underwent morphological operations, including erosion with a circular kernel (radius 3 pixels) and dilation with a circular kernel (radius 3 pixels), followed by edge cleaning to remove disconnected regions.
Bed Removal: The presence of the patient’s bed in CT images introduced extraneous artifacts that could interfere with segmentation accuracy. These artifacts were removed using connected component analysis (CCA), ensuring that segmentation focused solely on anatomical features relevant to lung analysis and reducing the influence of unrelated elements [26].
Image Cropping to the Thoracic Region: Images were cropped to focus on the thoracic region, excluding background areas that could potentially confuse the neural network during segmentation. This step concentrated the model’s attention on the area of interest while reducing computational complexity by narrowing the region analyzed.
Contrast Enhancement using CLAHE (Contrast-Limited Adaptive Histogram Equalization): To improve the visibility of finer anatomical details, CLAHE was applied. This technique divided each image into smaller regions, performing histogram equalization within each area to enhance contrast without excessively amplifying noise. CLAHE was essential for accentuating lung boundaries, which in turn increased the model’s effectiveness in delineating structures with high precision [27].
These preprocessing steps were essential in preparing consistent, high-quality images for segmentation. By removing artifacts, standardizing intensity and enhancing contrast, this preprocessing pipeline ensured that each image was optimized for accurate and reliable segmentation by the neural network; a compact sketch of the chain is given below.
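The following MATLAB sketch mirrors the chain described above, assuming the Image Processing Toolbox. The helper name preprocessSlice, the use of bwareafilt for the connected component analysis and the per-slice (2D) formulation are our assumptions; the HU ranges and the 3-pixel kernel radius follow the text.

```matlab
function [slice, chestMask] = preprocessSlice(dicomFile)
% Minimal single-slice sketch of the preprocessing pipeline (Section 2.1).
    info = dicominfo(dicomFile);
    raw  = double(dicomread(dicomFile));
    hu   = raw * info.RescaleSlope + info.RescaleIntercept;  % convert to HU

    % (i) Intensity clipping to the [-1024, 3071] HU range
    hu = min(max(hu, -1024), 3071);

    % (ii) In-plane resampling to 1 mm spacing with bilinear interpolation
    % (through-plane resampling of the volume is analogous)
    scale = double(info.PixelSpacing(:)');                   % mm per pixel
    hu    = imresize(hu, round(size(hu) .* scale), 'bilinear');

    % (iii) Thresholding: rough thoracic mask from the [-200, 3000] HU range
    chestMask = hu >= -200 & hu <= 3000;

    % Morphological cleanup: erosion and dilation with a disk of radius 3
    se        = strel('disk', 3);
    chestMask = imdilate(imerode(chestMask, se), se);

    % (iv) Bed removal via connected component analysis: keep only the
    % largest connected component, assumed to be the thorax
    chestMask = bwareafilt(chestMask, 1);

    % (v) Crop image and mask to the thoracic bounding box
    stats     = regionprops(chestMask, 'BoundingBox');
    slice     = imcrop(hu, stats(1).BoundingBox);
    chestMask = imcrop(chestMask, stats(1).BoundingBox);

    % (vi) Normalize to [0, 1], then apply CLAHE for contrast enhancement
    slice = adapthisteq(rescale(slice, 0, 1));
end
```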

2.2. Lung Segmentation Using Deep Learning and Transfer Learning

This study focuses on developing an automated lung segmentation system using advanced deep learning techniques, particularly transfer learning. DeepLabV3+ was selected for lung segmentation due to its advanced encoder–decoder architecture, which effectively balances the need for both high spatial accuracy and contextual understanding in semantic segmentation tasks.
As an extension of DeepLabV3, it incorporates an encoder based on a modified Xception model, which uses dilated convolutions to capture hierarchical features at multiple levels of granularity without losing resolution. This allows the model to handle the intricate structural details of lung tissue, crucial for distinguishing between healthy regions and pathologies. A central component of the encoder is the Atrous Spatial Pyramid Pooling (ASPP) module, which aggregates information from multiple spatial scales using atrous (dilated) convolutions; a minimal layer-level illustration is given after the list below. This structure, shown in Figure 1, enhances the model's capacity to capture both fine details and broader contextual information, addressing the challenge of variability and heterogeneity in lung tissue representation. Such multi-scale context aggregation is critical in medical imaging, where the model must balance localized details with an overall understanding of lung anatomy [25].
The decoder in DeepLabV3+ refines spatial resolution, particularly around object boundaries, which is essential for medical applications where boundary precision impacts clinical decisions. This refinement is achieved by fusing features from multiple resolutions, combining detailed information from the encoder with convolutional layers specifically tuned for edge clarity. As a result, the model achieves higher segmentation accuracy around complex areas, such as tumor boundaries and lung contours. Moreover, DeepLabV3+ offers several advantages for this task:
Adaptability to Medical Data: The architecture’s ability to capture and retain both local and global features makes it ideal for lung segmentation, where structures are diverse and irregular.
Efficiency with Limited Data: By leveraging transfer learning on pre-trained DeepLabV3+ models, the approach maximizes performance even with smaller labeled datasets, making it practical for clinical applications where data are limited.
Edge Detection and Precision: The combination of ASPP and resolution-preserving convolutions improves edge accuracy, crucial for precise segmentation in complex anatomical regions.
Improved Robustness through Custom Preprocessing: To enhance generalization and robustness, this study incorporates preprocessing techniques, such as intensity normalization and augmentation, ensuring the model’s resilience to variability in clinical imaging data.
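As a minimal layer-level illustration of the atrous convolutions aggregated by the ASPP module, a single dilated 3 × 3 convolution can be declared in MATLAB as follows; the filter count and dilation factor are illustrative, and this is not the full ASPP module.

```matlab
% A 3x3 convolution with dilation factor 6: the effective receptive field
% grows to 13x13 while the number of learnable weights stays that of a
% 3x3 kernel, and 'same' padding preserves the feature-map resolution.
asppBranch = convolution2dLayer(3, 256, 'DilationFactor', 6, 'Padding', 'same');
```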
In summary, DeepLabV3+ is well suited to the demands of lung segmentation, offering an optimized solution for capturing detailed, boundary-aware features necessary in clinical contexts. This approach not only improves segmentation accuracy but also supports reliable outcomes that are crucial in lung pathology analysis.
Modification for Specific Tasks: The model was adapted to classify and segment three specific classes: external background, chest (internal background) and lung districts (areas of interest, including the lungs and the cancer together). The modifications were implemented using the Deep Network Designer tool in MATLAB.
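In MATLAB, this adaptation can be sketched with deeplabv3plusLayers from the Computer Vision Toolbox; the class names, the input size and the use of the 'xception' backbone (matching the encoder described above) are illustrative, and the exact edits made in Deep Network Designer are not reproduced here.

```matlab
% Pretrained DeepLabV3+ layer graph adapted to the three classes of this
% study; grayscale CT slices are assumed replicated to three channels to
% match the pretrained backbone's expected input.
imageSize  = [512 512 3];
classNames = ["externalBackground" "chest" "lungDistrict"];
lgraph = deeplabv3plusLayers(imageSize, numel(classNames), 'xception');
```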

2.3. Experimental Setup

The experimental setup included an Intel Core i5-13600KF CPU, an NVIDIA GeForce RTX 3070 (8 GB) GPU, 64 GB of DDR4 RAM and a B660I AORUS PRO DDR4 motherboard. MATLAB R2024a was used to develop the preprocessing, training, validation and testing pipelines. SGDM was used as the optimizer with a momentum of 0.9; the mini-batch size was set to 8, the initial learning rate to 0.001 and the L2 regularization to 0.005; training lasted 5820 iterations. The dataset was divided into 60% training, 20% validation and 20% testing, a split that provides a substantial amount of data for each phase and has already been used in [28,29]. The division was performed at the patient level, not the image level.
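Under the stated hyperparameters, the training configuration can be sketched as follows; dsTrain and dsVal are placeholder names for combined image/pixel-label datastores, and lgraph is the DeepLabV3+ layer graph from Section 2.2.

```matlab
% SGDM optimizer configured with the reported hyperparameters
opts = trainingOptions('sgdm', ...
    'Momentum',         0.9, ...
    'MiniBatchSize',    8, ...
    'InitialLearnRate', 1e-3, ...
    'L2Regularization', 0.005, ...
    'ValidationData',   dsVal, ...
    'Shuffle',          'every-epoch', ...
    'Plots',            'training-progress');

% Fine-tune the pretrained network on the training set
net = trainNetwork(dsTrain, lgraph, opts);
```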

2.4. Loss Function and Evaluation Metrics

To evaluate and optimize the performance of the segmentation model, several metrics and functions are employed.
Loss Function: The loss function used is the weighted cross entropy, which quantifies the discrepancy between the predicted and true class distributions. This function is particularly useful in scenarios with class imbalance. The formula is defined as follows:
$$L(Y_{\mathrm{true}}, Y_{\mathrm{pred}}) = -\left( w_1 \cdot Y_{\mathrm{true}} \cdot \log Y_{\mathrm{pred}} + w_0 \cdot (1 - Y_{\mathrm{true}}) \cdot \log(1 - Y_{\mathrm{pred}}) \right)$$
where $Y_{\mathrm{true}}$ and $Y_{\mathrm{pred}}$ are, respectively, the true and the predicted value; $w_0$ and $w_1$ are the class weights, which change during the iterations to minimize the loss function.
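In MATLAB's semantic segmentation framework, one way to realize a weighted cross entropy is to attach class weights to the pixel classification layer before training; the inverse-frequency weighting below is a common choice and our assumption, not necessarily the authors' exact scheme.

```matlab
% Derive class weights from pixel frequencies in the training labels
tbl = countEachLabel(pxdsTrain);                  % pxdsTrain: pixelLabelDatastore
w   = median(tbl.PixelCount) ./ tbl.PixelCount;   % inverse-frequency weights

% Attach the weights to the output layer; 'classification' is the name of
% the final layer produced by deeplabv3plusLayers
pxLayer = pixelClassificationLayer('Name', 'classification', ...
    'Classes', tbl.Name, 'ClassWeights', w);
lgraph  = replaceLayer(lgraph, 'classification', pxLayer);
```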
Evaluation Metrics: Accuracy measures the proportion of correct predictions relative to the total number of predictions. It is calculated as follows:
$$\mathrm{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$
Dice Coefficient: evaluates the similarity between two sets, typically used to assess the overlap between the predicted segmentation and the ground truth. The Dice coefficient is calculated as follows:
$$\mathrm{DICE} = \frac{2 \times |A \cap B|}{|A| + |B|}$$
where A and B are the predicted and true segmentations, respectively. An index of 1 indicates perfect overlap, while an index of 0 indicates no overlap.
Jaccard Index: complementary to the Dice coefficient, gives a stringent assessment of overlap between predicted and true segmentations and is calculated as follows:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
BF Score: indicates how well the predicted boundary of each class aligns with the true boundary and is calculated as follows:
$$\mathrm{BF\ Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
and results in a balanced measure of precision and recall.
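For reference, all four quantities can be computed in MATLAB for a single predicted/ground-truth mask pair; dice, jaccard and bfscore are Image Processing Toolbox functions, and pred and truth are assumed to be logical masks of equal size.

```matlab
d   = dice(pred, truth);                  % Dice coefficient
j   = jaccard(pred, truth);               % Jaccard index
bf  = bfscore(pred, truth);               % boundary F1 (BF) score
acc = nnz(pred == truth) / numel(truth);  % pixel-wise accuracy
```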

2.5. Dataset Introduction

The dataset utilized in this study, the “non-small cell lung cancer” (NSCLC-Radiomics) collection [30] sourced from the Cancer Imaging Archive, is critical for evaluating the effectiveness of the proposed methodology for automated lung segmentation. It consists of a representative collection of chest CT images from patients diagnosed with various pulmonary conditions, with a focus on NSCLC, and includes DICOM images from 422 patients whose diagnoses were confirmed through biopsy.

3. Results

This section presents the outcomes of both the image preprocessing steps and the automated lung segmentation process. The preprocessing pipeline, which included intensity normalization, spatial resampling and contrast enhancement, successfully standardized the imaging data, ensuring consistency across different scans and improving the overall image quality. These steps were crucial in facilitating more accurate feature extraction and model performance.
For the lung segmentation task, the DeepLabV3+ model demonstrated strong performance, accurately distinguishing lung regions from surrounding structures. Evaluation metrics, including accuracy and the Dice coefficient, showed a high degree of overlap between the predicted segmentations and the ground truth provided by expert radiologists, confirming the reliability of the proposed method. The following sections provide a more detailed analysis of these results.

3.1. Dataset

Each patient is associated with a CT scan volume at a resolution of 512 × 512 pixels, resulting in a total of 47,919 individual chest CT images, as seen in Table 1.
Additionally, its reliance on standard CT imaging ensures compatibility with clinical practice, facilitating the application of findings in real-world clinical decision making. This dataset also enables high-throughput radiomic analysis essential for the non-invasive assessment of intratumor heterogeneity, a key factor in identifying prognostic biomarkers and advancing personalized medicine.
The 2D images obtained from the CT volumes in the dataset were divided to organize the model building and development phases: training, validation and testing. Sixty percent of the images were used only for training, twenty percent only for validation and the remaining twenty percent for testing the capabilities acquired by the neural network. Therefore, 29,187 images were used for training, 9312 for validation and 9420 for testing, as seen in Table 2.
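A patient-level split of this kind can be sketched as follows; patientIDs, holding one patient identifier per 2D image, is a placeholder, and the random shuffle is an assumption.

```matlab
% Split unique patients 60/20/20, then map the split back to the images
% so that no patient's slices appear in more than one set
ids    = unique(patientIDs);
ids    = ids(randperm(numel(ids)));        % shuffle patients
nTrain = round(0.6 * numel(ids));
nVal   = round(0.2 * numel(ids));

isTrain = ismember(patientIDs, ids(1:nTrain));
isVal   = ismember(patientIDs, ids(nTrain+1 : nTrain+nVal));
isTest  = ismember(patientIDs, ids(nTrain+nVal+1 : end));
```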

3.2. Preprocessing Results

Figure 2 provides an example of the images used in this study. A slice from a CT volume in the dataset was selected as the reference image to illustrate the preprocessing steps. The figure highlights the key stages of preprocessing, showcasing how the raw data are systematically transformed to ensure consistency and improve the quality of input for the model. These steps are crucial to enhance the accuracy and robustness of the subsequent segmentation process by addressing variability in image quality and resolution.
The preprocessing of CT images was a crucial step to ensure that the images were standardized and optimized for further analysis; the main steps included the following:
Isotropic Voxel Resampling: Standardizing resolution to 1 mm³ voxel spacing ensured consistent image scaling and preserved anatomical details, allowing the model to process each image without distortions (Figure 2b).
Thresholding: This method effectively differentiated lung regions from other anatomical structures, reducing the influence of non-relevant areas on thoracic analysis and providing an accurate chest mask (Figure 2c).
Patient Bed Removal: Eliminating the bed artifact prevented extraneous elements from interfering with segmentation, thereby increasing the precision of segmented boundaries (Figure 2d).
Cropping to the Thoracic Region: Focusing on the thoracic area improved the neural network’s ability to concentrate on relevant areas, reduced computational complexity and enhanced segmentation effectiveness (Figure 2e).
CLAHE (Contrast Limited Adaptive Histogram Equalization): By enhancing image contrast, this step increased the visibility of fine anatomical details, leading to more accurate segmentation of lung boundaries (Figure 2f).
These steps contributed to improving the quality of the images provided to the neural network, ensuring that the data were consistent and ready for segmentation analysis. Enhancing contrast and removing artifacts facilitated the accuracy and reliability of the segmentation process.

3.3. Segmentation Results

The automatic segmentation of CT images was carried out using the DeepLabV3+ model, with modifications to classify three categories: lungs, thoracic cage and external background. The model achieved excellent performance in terms of both accuracy and overlap compared to expert manual segmentations.

Evaluation Metrics

The model achieved a Dice index of 0.97 for whole lung district segmentation and 0.99 for the thoracic cage, indicating an almost perfect overlap between the automatic and manual segmentations.
The overall accuracy of the model was 97.56% for the whole lung district segmentation and 98.75% for the thoracic cage, demonstrating the effectiveness of the deep learning approach.
The model achieved a Jaccard index of 93.97% for the whole lung district segmentation, indicating a very good overlap between the automatic and manual segmentations.
BF Score was 93.06% for the whole lung district segmentation, demonstrating the effectiveness of the model.
The segmented images showed excellent distinction between lungs, thoracic cage and external background. In particular, the model's ability to accurately identify lung boundaries was optimized using preprocessing techniques, as seen in Figure 3. Moreover, Figure 4 shows an example of an image before the pipeline and after the application of the automated whole-lung-district segmentation. From the figure, it is evident that the heart and the spine were completely removed.
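Results of this kind can be tabulated by running the trained network over the test set; the sketch below assumes the trained net from Section 2.3 and test datastores imdsTest and pxdsTest.

```matlab
% Segment every test image and compare against the reference labels
pxdsResults = semanticseg(imdsTest, net, ...
    'MiniBatchSize', 8, 'WriteLocation', tempdir);
metrics = evaluateSemanticSegmentation(pxdsResults, pxdsTest);

metrics.DataSetMetrics   % global accuracy, mean/weighted IoU, mean BF score
metrics.ClassMetrics     % per-class accuracy, IoU and mean BF score
```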

3.4. Final Considerations on Segmentation

The results demonstrate that the use of transfer learning applied to the DeepLabV3+ network can provide highly accurate segmentations in a clinical context. This approach has overcome the limitations of traditional methods, particularly in terms of reproducibility and speed, significantly reducing segmentation time and ensuring greater consistency across different imaging sessions. Integrating deep learning into the radiomic workflow can therefore not only improve diagnostic accuracy but also enhance the overall efficiency of clinical processes, allowing for faster and more precise management of medical information.

4. Discussion

The results of this research highlight the effectiveness of the proposed framework for automatic lung segmentation from CT images, integrating radiomic techniques with advanced deep learning models. The implementation of the DeepLabV3+ model, combined with transfer learning, has addressed many limitations traditionally associated with manual or semi-automatic segmentation methods, ensuring greater precision and reliability. The choice of DeepLabV3+ is justified by its key strengths:
Pyramid Spatial Architecture: this feature allows the model to capture anatomical structures at multiple scales, crucial for accurately segmenting complex lung anatomy.
Contextual Awareness: the model effectively maintains the global context during segmentation, enhancing accuracy in areas with intricate details.
Versatility: DeepLabV3+ adapts well to various imaging tasks, making it suitable for broader clinical applications beyond lung segmentation.
Robustness to Noise: its design improves segmentation quality, even in images affected by noise and artifacts.
The high performance achieved, in terms of both the Dice index and overall accuracy, demonstrates the robustness of the adopted approach. Average Dice values exceeding 0.97 indicate an almost perfect overlap between the automatic segmentation and that performed manually by expert radiologists. This accuracy, along with reduced processing time, lays a strong foundation for integrating these tools into clinical settings, enhancing both the speed and consistency of results. Moreover, as shown in Figure 4, the automated pipeline effectively segmented the whole lung districts comprising the lungs and the cancer and removed potentially confounding anatomical structures such as the spine and the heart.
A key aspect emerging from this research is the importance of image preprocessing. Techniques such as isotropic voxel resampling and Contrast-Limited Adaptive Histogram Equalization (CLAHE) improved image quality, enhancing the visibility of anatomical details and facilitating the deep learning model's segmentation task. This positively impacted prediction quality, reducing errors caused by noise or artifacts in the original images. The integration of transfer learning played a crucial role in optimizing model performance by adapting pre-trained weights to a new radiomic dataset. This significantly reduced the need for extensive annotated data during training without compromising the model's ability to generalize effectively to new data, proving advantageous when dealing with smaller datasets.
The use of deep learning techniques for image segmentation has been widely validated across various fields of medical imaging. For example, ref. [31] applied convolutional neural networks (CNNs), particularly U-Net-based architectures, for fully automatic segmentation of the head and neck, achieving encouraging early results with Dice scores ranging from 0.70 to 0.78. This progress has driven research and strategies aimed at further improving results, such as modifications to the loss function to recalibrate the neural network [32]. The evolution of these methodologies has also extended to specific applications, including lesion segmentation and classification tasks, where continuous advancements have emerged thanks to the development of convolutional networks and increasingly precise algorithms.
For instance, segmentation and classification of mandibular lesions using U-Net yielded promising Dice scores of 87.2% [33], while liver lesion segmentation and volume estimation reported Dice scores as high as 0.97 [34]. These results highlight the growing precision of segmentation techniques across different anatomical structures. Focusing specifically on lung segmentation, several studies have shown promising results using deep learning models. For instance, ref. [35] achieved a Dice coefficient of 0.967 using a U-Net-based architecture for lung segmentation on MRI images. Similarly, ref. [36] implemented an automatic segmentation model for lung images, achieving a Dice similarity coefficient of 89.42% with significant improvements in processing time. The same article also discusses enhanced lung image segmentation with U-Net++, achieving an accuracy above 98% and a Dice similarity coefficient of 0.95. Although these results are lower than ours, a straightforward comparison cannot be carried out, since the focus of our research is segmenting the whole lung district (comprising the lungs and the cancer) and not only the lungs.
Moreover, this research tries to address a current issue in lung cancer, and the choice of this specific NSCLC dataset was informed by several criteria closely aligned with the research objectives. Lung cancer exhibits one of the highest rates of incidence and mortality globally, making it a critical area of study. Specifically, as noted in the introduction, 80–85% of lung cancer cases are NSCLC, justifying the focus on this cancer type and the pulmonary anatomical region. The dataset’s large sample size provides a robust basis for analyzing tumor phenotypes, capturing diverse characteristics like intensity, shape and texture.
Furthermore, patient-based dataset splitting, even though the analysis was carried out in 2D, eliminates the possibility of correlated images appearing across the training, validation and test sets, thus enhancing the robustness of the pipeline.
Despite these excellent results, challenges remain. Extending the methodology to larger, more diverse datasets and different clinical contexts requires further validation. Inter-observer variability in manual annotations presents another critical area, as differences in reference labels can affect model performance. Finally, segmenting more complex anatomical structures or irregularly shaped tumors will necessitate developing advanced methodologies, potentially including multi-modal approaches that integrate images from multiple diagnostic sources. In conclusion, our research makes a significant contribution to the field of precision medicine, demonstrating how deep learning technologies for automatic segmentation can enhance the diagnosis and treatment of lung cancer. The solid methodological foundation and the results obtained pave the way for further developments, with potential clinical applications extending to various pathologies and anatomical regions.

5. Conclusions

This paper focused on developing and applying an automatic deep learning network for segmenting whole lung districts. Leveraging a robust pipeline, articulated in patient-based dataset splitting, intensity normalization, voxel resampling, bed removal, contrast enhancement and deep learning through the DeepLabV3+ network, promising results were achieved (Dice = 0.97, accuracy = 97.56%, Jaccard index = 93.97%, BF score = 93.06%). The development of this framework is beneficial for lung cancer segmentation, since it preserves only the relevant information in the image, discarding potentially confounding anatomical structures, such as the heart and the spine. As a future development, the framework should be completed by adding another network in cascade and performing lung cancer segmentation on the preprocessed images, thus allowing for consistent results for radiomics analysis.

Author Contributions

Conceptualization, F.B., A.S. and F.M.; methodology, N.L. and G.P.; software, N.L. and G.P.; validation, F.B., A.S., F.M. and G.R.; formal analysis, N.L. and G.P.; investigation, N.L. and G.P.; resources, N.L. and G.P.; data curation, N.L. and G.P.; writing—original draft preparation, N.L. and F.B.; writing—review and editing, F.B. and A.S.; visualization, N.L. and G.P.; supervision, F.B., A.S., F.M. and G.P.; project administration, F.B., A.S. and F.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki, and it is based on publicly available datasets.

Informed Consent Statement

Informed consent was waived due to the retrospective nature of this study. Moreover, it is based on publicly available datasets.

Data Availability Statement

Data are available upon request to the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Siegel, R.L.; Giaquinto, A.N.; Jemal, A. Cancer Statistics, 2024. CA Cancer J. Clin. 2024, 74, 12–49. [Google Scholar] [CrossRef] [PubMed]
  2. Travis, W.D.; Brambilla, E.; Nicholson, A.G.; Yatabe, Y.; Austin, J.H.M.; Beasley, M.B.; Chirieac, L.R.; Dacic, S.; Duhig, E.; Flieder, D.B.; et al. The 2015 World Health Organization Classification of Lung Tumors: Impact of Genetic, Clinical and Radiologic Advances Since the 2004 Classification. J. Thorac. Oncol. 2015, 10, 1243–1260. [Google Scholar] [CrossRef] [PubMed]
  3. Rogers, W.; Seetha, S.T.; Refaee, T.A.G.; Lieverse, R.I.Y.; Granzier, R.W.Y.; Ibrahim, A.; Keek, S.A.; Sanduleanu, S.; Primakov, S.P.; Beuque, M.P.L.; et al. Radiomics: From Qualitative to Quantitative Imaging. Br. J. Radiol. 2020, 93, 20190948. [Google Scholar] [CrossRef] [PubMed]
  4. van Timmeren, J.E.; Cester, D.; Tanadini-Lang, S.; Alkadhi, H.; Baessler, B. Radiomics in Medical Imaging—“How-to” Guide and Critical Reflection. Insights Imaging 2020, 11, 91. [Google Scholar] [CrossRef] [PubMed]
  5. Piorkowski, A.; Obuchowicz, R.; Najjar, R. Redefining Radiology: A Review of Artificial Intelligence Integration in Medical Imaging. Diagnostics 2023, 13, 2760. [Google Scholar] [CrossRef]
  6. Jha, A.K.; Mithun, S.; Sherkhane, U.B.; Dwivedi, P.; Puts, S.; Osong, B.; Traverso, A.; Purandare, N.; Wee, L.; Rangarajan, V.; et al. Emerging Role of Quantitative Imaging (Radiomics) and Artificial Intelligence in Precision Oncology. Open Explor. 2023, 4, 569–582. [Google Scholar] [CrossRef]
  7. Lambin, P.; Leijenaar, R.T.H.; Deist, T.M.; Peerlings, J.; De Jong, E.E.C.; Van Timmeren, J.; Sanduleanu, S.; Larue, R.T.H.M.; Even, A.J.G.; Jochems, A.; et al. Radiomics: The Bridge between Medical Imaging and Personalized Medicine. Nat. Rev. Clin. Oncol. 2017, 14, 749–762. [Google Scholar] [CrossRef]
  8. Stefano, A. Challenges and Limitations in Applying Radiomics to PET Imaging: Possible Opportunities and Avenues for Research. Comput. Biol. Med. 2024, 179, 108827. [Google Scholar] [CrossRef]
  9. Zhang, W.; Guo, Y.; Jin, Q.; Zhang, W.; Guo, Y.; Jin, Q. Radiomics and Its Feature Selection: A Review. Symmetry 2023, 15, 1834. [Google Scholar] [CrossRef]
  10. Zhang, Y.P.; Zhang, X.Y.; Cheng, Y.T.; Li, B.; Teng, X.Z.; Zhang, J.; Lam, S.; Zhou, T.; Ma, Z.R.; Sheng, J.B.; et al. Artificial Intelligence-Driven Radiomics Study in Cancer: The Role of Feature Engineering and Modeling. Mil. Med. Res. 2023, 10, 22. [Google Scholar] [CrossRef]
  11. Vial, A.; Stirling, D.; Field, M.; Ros, M.; Ritz, C.; Carolan, M.; Holloway, L.; Miller, A.A. The Role of Deep Learning and Radiomic Feature Extraction in Cancer-Specific Predictive Modelling: A Review. Transl. Cancer Res. 2018, 7, 803–816. [Google Scholar] [CrossRef]
  12. Comelli, A.; Stefano, A.; Coronnello, C.; Russo, G.; Vernuccio, F.; Cannella, R.; Salvaggio, G.; Lagalla, R.; Barone, S. Radiomics: A New Biomedical Workflow to Create a Predictive Model. Commun. Comput. Inf. Sci. 2020, 1248, 280–293. [Google Scholar] [CrossRef]
  13. Pasini, G.; Bini, F.; Russo, G.; Comelli, A.; Marinozzi, F.; Stefano, A. MatRadiomics: A Novel and Complete Radiomics Framework, from Image Visualization to Predictive Model. J. Imaging 2022, 8, 221. [Google Scholar] [CrossRef] [PubMed]
  14. Kamnitsas, K.; Ledig, C.; Newcombe, V.F.J.; Simpson, J.P.; Kane, A.D.; Menon, D.K.; Rueckert, D.; Glocker, B. Efficient Multi-Scale 3D CNN with Fully Connected CRF for Accurate Brain Lesion Segmentation. Med. Image Anal. 2017, 36, 61–78. [Google Scholar] [CrossRef]
  15. Bressem, K.K.; Adams, L.C.; Erxleben, C.; Hamm, B.; Niehues, S.M.; Vahldiek, J.L. Comparing Different Deep Learning Architectures for Classification of Chest Radiographs. Sci. Rep. 2020, 10, 13590. [Google Scholar] [CrossRef]
  16. Primakov, S.P.; Ibrahim, A.; van Timmeren, J.E.; Wu, G.; Keek, S.A.; Beuque, M.; Granzier, R.W.Y.; Lavrova, E.; Scrivener, M.; Sanduleanu, S.; et al. Automated Detection and Segmentation of Non-Small Cell Lung Cancer Computed Tomography Images. Nat. Commun. 2022, 13, 3423. [Google Scholar] [CrossRef]
  17. Siemens Healthineers. Instructions for Use—AI-Rad Companion (Pulmonary) VA31. Available online: https://content.doclib.siemens-healthineers.com/rest/v1/view?document-id=930870 (accessed on 4 November 2024).
  18. Siemens Healthineers. AI-Rad Companion Chest CT VA20 Whitepaper—April 2022. Available online: https://marketing.webassets.siemens-healthineers.com/d4d8b5ba29e6d49e/e8eba575c238/siemens-healthineers-dh-AI-rad_chest_ct_whitepaper.pdf (accessed on 4 November 2024).
  19. Milletari, F.; Navab, N.; Ahmadi, S.A. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In Proceedings of the 2016 4th International Conference on 3D Vision, 3DV 2016, Stanford, CA, USA, 25–28 October 2016; pp. 565–571. [Google Scholar] [CrossRef]
  20. Iman, M.; Arabnia, H.R.; Rasheed, K. A Review of Deep Transfer Learning and Recent Advancements. Technologies 2023, 11, 40. [Google Scholar] [CrossRef]
  21. Said, Y.; Alsheikhy, A.A.; Shawly, T.; Lahza, H. Medical Images Segmentation for Lung Cancer Diagnosis Based on Deep Learning Architectures. Diagnostics 2023, 13, 546. [Google Scholar] [CrossRef]
  22. Boudoukhani, N.; Elberrichi, Z.; Oulladji, L.; Dif, N. New Attention-Gated Residual Deep Convolutional Network for Accurate Lung Segmentation in Chest x-Rays. Evol. Syst. 2024, 15, 919–938. [Google Scholar] [CrossRef]
  23. Murugappan, M.; Bourisly, A.K.; Prakash, N.B.; Sumithra, M.G.; Acharya, U.R. Automated Semantic Lung Segmentation in Chest CT Images Using Deep Neural Network. Neural Comput. Appl. 2023, 35, 15343–15364. [Google Scholar] [CrossRef]
  24. Rayed, M.E.; Islam, S.M.S.; Niha, S.I.; Jim, J.R.; Kabir, M.M.; Mridha, M.F. Deep Learning for Medical Image Segmentation: State-of-the-Art Advancements and Challenges. Inf. Med. Unlocked 2024, 47, 101504. [Google Scholar] [CrossRef]
  25. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Computer Vision—ECCV 2018; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 11211, pp. 833–851. [Google Scholar] [CrossRef]
  26. Abd Rahni, A.A.; Mohamed Fuzaie, M.F.; Al Irr, O.I. Automated Bed Detection and Removal from Abdominal CT Images for Automatic Segmentation Applications. In Proceedings of the 2018 IEEE EMBS Conference on Biomedical Engineering and Sciences, IECBES 2018—Proceedings, Sarawak, Malaysia, 3–6 December 2018; pp. 677–679. [Google Scholar] [CrossRef]
  27. Sanagavarapu, S.; Sridhar, S.; Gopal, T.V. COVID-19 Identification in CLAHE Enhanced CT Scans with Class Imbalance Using Ensembled ResNets. In Proceedings of the 2021 IEEE International IOT, Electronics and Mechatronics Conference, IEMTRONICS 2021—Proceedings, Toronto, ON, Canada, 21–24 April 2021. [Google Scholar] [CrossRef]
  28. Zaalouk, A.M.; Ebrahim, G.A.; Mohamed, H.K.; Hassan, H.M.; Zaalouk, M.M.A. A Deep Learning Computer-Aided Diagnosis Approach for Breast Cancer. Bioengineering 2022, 9, 391. [Google Scholar] [CrossRef] [PubMed]
  29. Chen, P.N.; Lee, C.C.; Liang, C.M.; Pao, S.I.; Huang, K.H.; Lin, K.F. General Deep Learning Model for Detecting Diabetic Retinopathy. BMC Bioinform. 2021, 22, 84. [Google Scholar] [CrossRef] [PubMed]
  30. Aerts, H.J.W.L.; Velazquez, E.R.; Leijenaar, R.T.H.; Parmar, C.; Grossmann, P.; Cavalho, S.; Bussink, J.; Monshouwer, R.; Haibe-Kains, B.; Rietveld, D.; et al. Decoding Tumour Phenotype by Noninvasive Imaging Using a Quantitative Radiomics Approach. Nat. Commun. 2014, 5, 4006. [Google Scholar] [CrossRef]
  31. Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.S.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell 2018, 172, 1122–1131.e9. [Google Scholar] [CrossRef]
  32. Wang, Y.; Lombardo, E.; Huang, L.; Avanzo, M.; Fanetti, G.; Franchin, G.; Zschaeck, S.; Weingärtner, J.; Belka, C.; Riboldi, M.; et al. Comparison of Deep Learning Networks for Fully Automated Head and Neck Tumor Delineation on Multi-Centric PET/CT Images. Radiat. Oncol. 2024, 19, 3. [Google Scholar] [CrossRef]
  33. Liu, W.; Li, X.; Liu, C.; Gao, G.; Xiong, Y.; Zhu, T.; Zeng, W.; Guo, J.; Tang, W. Automatic Classification and Segmentation of Multiclass Jaw Lesions in Cone-Beam CT Using Deep Learning. Dentomaxillofac. Radiol. 2024, 53, 439–446. [Google Scholar] [CrossRef]
  34. Gross, M.; Huber, S.; Arora, S.; Ze’evi, T.; Haider, S.P.; Kucukkaya, A.S.; Iseke, S.; Kuhn, T.N.; Gebauer, B.; Michallek, F.; et al. Automated MRI Liver Segmentation for Anatomical Segmentation, Liver Volumetry, and the Extraction of Radiomics. Eur. Radiol. 2024, 34, 5056–5065. [Google Scholar] [CrossRef]
  35. Weng, A.M.; Heidenreich, J.F.; Metz, C.; Veldhoen, S.; Bley, T.A.; Wech, T. Deep Learning-Based Segmentation of the Lung in MR-Images Acquired by a Stack-of-Spirals Trajectory at Ultra-Short Echo-Times. BMC Med. Imaging 2021, 21, 79. [Google Scholar] [CrossRef]
  36. Gite, S.; Mishra, A.; Kotecha, K. Enhanced Lung Image Segmentation Using Deep Learning. Neural Comput. Appl. 2023, 35, 22839–22853. [Google Scholar] [CrossRef]
Figure 1. Semantic segmentation architecture of pretrained DeepLabV3+.
Figure 2. (a) Slice of a CT volume; (b) resampled slice; (c) slice after threshold; (d) slice after bed removal; (e) slice cropped to the thoracic region; and (f) slice after CLAHE.
Figure 3. Comparison of manual (a) and automatic (b) segmentation.
Figure 4. (a) Image before the automated deep learning whole lung district segmentation. (b) Image after the application of the automated deep learning whole lung district segmentation.
Table 1. Description of dataset.
Site  | Patients | Type of Cancer | Images | Resolution [Pixel]
Lungs | 422      | Non-small cell | 47,919 | 512 × 512

Table 2. Division of 2D images belonging to the dataset.
Total 2D Images | Training | Validation | Test
47,919          | 29,187   | 9312       | 9420
