Article

Inhalation Injury Grading Using Transfer Learning Based on Bronchoscopy Images and Mechanical Ventilation Period

1 Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX 79409, USA
2 Department of Surgery, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA
3 School of Medicine, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA
* Author to whom correspondence should be addressed.
Sensors 2022, 22(23), 9430; https://doi.org/10.3390/s22239430
Submission received: 30 October 2022 / Revised: 23 November 2022 / Accepted: 29 November 2022 / Published: 2 December 2022
(This article belongs to the Special Issue Sensors and Signal Processing for Biomedical Application)

Abstract

The abbreviated injury score (AIS) is commonly used as a grading system for inhalation injuries. However, inhalation injury grades have been shown to correlate only inconsistently with the duration of mechanical ventilation, and grading is subjective, relying heavily on the clinician's experience and expertise. Additionally, no correlation has been shown between these patients' inhalation injury grades and outcomes. In this paper, we propose a novel inhalation injury grading method which uses deep learning algorithms to determine the injury grade from the carbonaceous deposits, blistering, and fibrin casts visible in bronchoscopy images. The proposed method adopts transfer learning and data augmentation to enhance accuracy and avoid overfitting. We tested the proposed model on bronchoscopy images acquired from eighteen patients who had suffered inhalation injuries with severity degrees of 1 to 6. As performance metrics, we consider accuracy, sensitivity, specificity, F-1 score, and precision. Experimental results show that the proposed method, with both transfer learning and data augmentation components, achieves an overall accuracy of 86.11%. Moreover, the results show that the proposed method outperforms the same approach without transfer learning or data augmentation.

1. Introduction

A smoke inhalation injury is one of the significant complications of burn injuries [1]. Among the reported burn injuries in the United States (US) in 2022, 7.7% included smoke inhalation injuries [1]. Burn patients with an inhalation injury exhibit higher mortality rates than those without inhalation injuries [2]. For example, from 2009 to 2018, the mortality rate of burn patients without inhalation injuries was 2.9%, while the presence of a smoke inhalation injury increased mortality by 20%, and by up to 60% with concurrent pneumonia [2]. In addition, smoke inhalation also increases the length of hospital stay [3,4]. The pathophysiology of inhalation injury arises from the following three mechanisms: (1) direct thermal injury to the upper airway, (2) chemical irritation of the lower subglottic airway by smoke, and (3) metabolic injury from specific chemicals in the smoke, such as hydrogen cyanide [5]. Airway inflammation and pulmonary shunting after the inhalation injury both lead to hypoxemia [6]. Moreover, inhalation injuries may impair macrophage function, leading to impaired ciliary clearance [7,8]. Inhibition of alveolar macrophages may expose patients to bacterial infections and predispose them to developing pneumonia (a leading cause of death in burn injury) [9]. Furthermore, patients with cutaneous injuries are at additional risk of developing pneumonia [9]. Currently, the most widely used grading system for inhalation injury is the abbreviated injury score (AIS) [10], which categorizes the severity of the injury on a scale of (1) grade 0 (no injury), (2) grade 1 (mild injury), (3) grade 2 (moderate injury), (4) grade 3 (severe injury), and (5) grade 4 (massive injury). Examples of bronchoscopy images acquired from grade 0–4 patients are shown in Figure 1 [11]. However, studies have shown an inconsistent cause-and-effect relationship between inhalation injury grade and the period during which the patient requires mechanical ventilation [12]. Additionally, the inhalation injury grade from the current AIS is not highly correlated with mortality either [10,11,12,13].
The currently widely employed inhalation injury grading systems, such as the AIS, are based on the findings of the initial bronchoscopy examination. The lack of consensus on the diagnosis, grading, and prognosis of inhalation injury stems from the limitations of the AIS grading system. For example, since bronchoscopy cannot easily identify narrow distal airway changes, diagnosis and grading likely depend on the image quality and its interpretation [14]. Additionally, the subjectivity of this grading system makes its accuracy reliant on the examiner's expertise [14].
Deep learning and artificial intelligence have shown promising medical imaging classification capabilities [15]. For example, the convolutional neural network (CNN), a type of artificial neural network, is widely used for medical image classification and feature recognition [16]. Our proposed inhalation injury grading method is based on machine learning and artificial intelligence and aims to address the limitations of currently utilized methods such as the AIS. As a result, the proposed method allows consistency and objectivity in diagnosis. To the best of our knowledge, deep learning has not previously been applied to grading the severity of inhalation injuries.

1.1. Related Work

The diagnosis of inhalation injury currently relies on clinical exam findings and bronchoscopy evidence. The development of grading systems and the use of modalities such as chest computed tomography allow for a more detailed assessment of inhalation injury and enhance predictive power [17]. Currently, to the best of our knowledge, there is no literature discussing the application of deep learning models to diagnose the severity of inhalational injuries or the development of inhalation injury grading systems.
Machine learning has recently been applied in burn care. Similar to the treatment of inhalational injury, the diagnosis and staging of burn injuries are also largely subjective and heavily reliant on clinician experience. Artificial intelligence and automated machine learning models are currently being proposed for the diagnosis and staging of burn wounds, as they would provide clinicians with an automatic and reliable diagnostic tool. Yadav et al. developed a burn classification model using a support vector machine (SVM) to diagnose burn injuries [18]. This model aimed to classify burn injuries into two categories: those that needed grafting and those that did not. Using burn injury skin images from the biomedical image processing database, their model showed an accuracy of 82.43% [18]. Rangel-Olvera et al. proposed a burn detection method based on the sparse representation of feature vectors with over-redundant dictionaries, which provides 95.65% sensitivity and 94.02% precision [19]. Suha et al. applied a CNN [20], and Lee et al. applied an encoder–decoder CNN [21], to burn images to estimate burn severity. In [20], the CNN with transfer learning was trained to classify images according to their burn severity as a first-, second-, or third-degree burn, and the method provided 95.63% accuracy. In [21], the encoder–decoder CNN classified burn depth based on altered tissue morphology and yielded 99% accuracy, 98% sensitivity, and 100% specificity. Moreover, a deep convolutional neural network-based body part-specific burn severity assessment model (BPBSAM) was investigated by Chauhan et al. [22], which predicts the severity of a burn by applying body part-specific SVMs trained with CNN features extracted from images of the burned body part. This model showed 84.85% accuracy on the BI test dataset, which included a blood test and ultrasound examination [22].
Optical coherence tomography (OCT) and Raman spectroscopy (RS) are two non-invasive optical modalities also used to extract textural and spectral wave features [23,24,25]. In [23], SVM and random forest algorithms are applied to OCT and RS, and 85% accuracy was achieved [23]. Spatial frequency domain imaging (SFDI) has also been used to characterize skin burns. An SVM can be trained with SFDI reflectance data at multiple spatial frequencies and can reach an accuracy of 92.5% [24]. These studies show that machine learning effectively diagnoses and manages burn care [25].
While inhalation injuries are a uniquely burn-related complication, imaging techniques on bronchoscopy images are applied in other fields, such as pulmonology [26,27]. Machine learning has been utilized and shown to accurately classify fibrotic lung disease using CT imaging, diagnose and differentiate COVID-19-associated pneumonia from other pneumonia, and even accurately diagnose lung cancer using low-dose CT imaging [26]. One study took this use of machine learning even further. Bronchoscopy imaging is used to visualize lung cancer, but physicians cannot diagnose the specific type of lung cancer. A recent study successfully used machine learning to estimate particular types of lung cancer using bronchoscopy imaging at an accuracy of 86% [27]. The diagnoses made were shown to be specific and accurate [27]. This is something we aimed to emulate in our study. We set out to utilize bronchoscopy imaging to accurately diagnose a specific grade of inhalation injury.
Transfer learning has proven to be effective in handling problems without large datasets, such as medical imaging [28]. Transfer learning retrains deep networks, pre-trained on large amounts of data, to solve the problem of data scarcity. VGG-19 [29], a CNN-based architecture, achieves an F1 score of 0.81–1 in COVID-19 recognition on a dataset consisting of X-ray, ultrasound, and computerized tomography (CT) images [14]. Compared with AlexNet [30], GoogLeNet [31], and VGG-16 [32], ResNet-50 [33] achieves the highest accuracy of 56.09–81.41% in recognizing common thorax diseases on the ChestX-ray8 dataset, which contains chest X-ray images from 32,717 patients with eight common thorax diseases [34]. Moreover, a comprehensive multi-source dataset built from X-ray and CT scan images for COVID-19 detection shows that the AlexNet model can provide an accuracy of up to 98% [35]. ResNet-50 has also proven effective in recognizing malignant and benign tissues on CT scan images, achieving an accuracy of 99% [36]. For cases where the learned features of natural images and medical images mismatch, a novel transfer learning method has been shown to be promising in overcoming this shortcoming by training deep learning models on a large unlabeled medical image dataset and then transferring the knowledge to train a deep learning model on a small number of labeled medical images [37]. The contrast-limited adaptive histogram equalization (CLAHE) method, which enhances the details, textures, and local contrast of images, has been improved with a log-normalization function that normalizes the intensity contrast of images, yielding normalized CLAHE (N-CLAHE) [38]. Self-supervised learning (SSL), federated learning (FL), and generative adversarial network (GAN) methods have also proven applicable in the biomedical area [39]. Additionally, with transfer learning, ResNet-101 achieved the best accuracy of 95% in skin burn diagnosis [40].

1.2. Contribution

The main contributions of this paper are as follows:
  • We propose a novel grading method for evaluating the severity of inhalation injury. Conventional inhalation diagnostic methods have focused on the percentage of inhalation injury, whereas our proposed diagnostic method is novel in that it determines the severity of inhalation injuries using our proposed deep learning algorithm on bronchoscopy images. Moreover, compared to the current manual grading system, which depends on examiners, our proposed method gives quantitative and consistent results that do not depend on examiners' subjective and potentially inconsistent decisions.
  • Our proposed algorithm optimizes the hyperparameters of its deep learning model, including the learning rate, drop period, maximum number of epochs, and mini-batch size, with respect to the prediction accuracy of grading the severity of inhalation injuries. To achieve this, data augmentation and typical CNN-based models were also implemented for comparison with our proposed transfer learning method and for the exploration of higher performance. As a result, the proposed algorithm provides an average testing accuracy of 86.11%, which shows its potential to predict the severity of inhalation injuries.
  • We analyze the impact of data augmentation and transfer learning by including or excluding these components from the algorithm. That is, we evaluate the accuracy performance of the following combinations: (1) transfer learning with data augmentation, (2) transfer learning without data augmentation, (3) non-transfer learning with data augmentation, and (4) non-transfer learning without data augmentation.

1.3. Paper Organization

The rest of this paper is organized as follows. In Section 2, we explain the process of developing the dataset and introduce the methods we applied in this study. In Section 3, we display the experiment results. We analyze the experiment results, compare our proposed method with previous ones, highlight the contribution of this study, and discuss the future steps we could undertake in Section 4.

2. Materials and Methods

2.1. Dataset Development

2.1.1. Image Collection

The images used in this paper were collected by Drs. Griswold and Pang, both burn surgeons at the Timothy J. Harnar Regional Burn Center/Department of Surgery at Texas Tech University Health Sciences Center (TTUHSC), following the protocol (IRB#00000096) approved by the institutional review board for the protection of human subjects. Bronchoscopy is the gold standard for evaluating the airway and diagnosing inhalational injuries. A bronchoscopy, in this instance, is typically performed by the physician or physician trainees (residents), and learning to perform a bronchoscopy is included in medical training.
Additionally, it should be noted that there are few options to visualize an inhalation injury other than bronchoscopy. The diagnosis of an inhalational injury is made by the physician performing the bronchoscopy through the visualization of carbonaceous deposits, blistering, or fibrin casts in the bronchial tree. This diagnosis is entirely subjective and dependent on the physician, which is what primarily led us to develop a program utilizing machine learning and specific parameters to better diagnose and grade inhalation injuries. The way we visualize the injuries is no different from the current standard; the only change we make is implementing an objective measurement (a machine learning model) instead of the traditional subjective form of measurement (physician determination and discretion). During image collection, a thin, flexible, tubular camera (bronchoscope) is passed through the patient's endotracheal tube and into the bronchi, where the bronchoscope takes images of the injured bronchi.
A mechanical ventilator is an assistive tool that respiratory therapists and physicians use to treat patients with respiratory failure, especially after an inhalational injury. It acts as a bellows to move air in and out of the lungs, as shown in Figure 2. Therapists and doctors set the ventilator to control how often it pushes air into the lungs and how much air is delivered.
The current grading systems for the severity of inhalation injuries show an inconsistent cause-and-effect relationship between grades and the period during which patients require mechanical ventilation, and the current AIS grades are not highly correlated with mortality [11,12,13]. However, inhalation injury grades have been shown to be highly associated with mortality, the period during which patients require mechanical ventilation, and fluid resuscitation [12].
In this paper, we collect images from eighteen patients' bronchi and separate them into six groups, corresponding to six categories of inhalation injury severity, according to the period during which the patients required mechanical ventilation. The duration of mechanical ventilation, measured in days, is the objective outcome measurement we used to quantify injury severity: the longer a patient required mechanical ventilation, the more severe the injury. We regard (1) extubation under 24 h as degree 1, (2) extubation between 1–2 days as degree 2, (3) extubation between 3–7 days as degree 3, (4) extubation between 8–14 days as degree 4, (5) extubation between 14–30 days as degree 5, and (6) extubation after 30 days as degree 6, as shown in Figure 3. The eighteen patients comprise (1) two with degree 1, (2) six with degree 2, (3) three with degree 3, (4) two with degree 4, (5) three with degree 5, and (6) two with degree 6.
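As a small illustration of this mapping, the sketch below converts a ventilation duration in days into the proposed degree. It is an illustrative helper only, not part of the paper's pipeline; the boundary at exactly 14 days, where the stated ranges for degrees 4 and 5 overlap, is resolved here by an assumption.

```python
def ventilation_days_to_degree(days: float) -> int:
    """Map duration of mechanical ventilation (days) to the proposed severity degree.
    The 14-day overlap between degrees 4 and 5 in the text is assigned to degree 4 here."""
    if days < 1:
        return 1   # extubated under 24 h
    if days <= 2:
        return 2   # 1-2 days
    if days <= 7:
        return 3   # 3-7 days
    if days <= 14:
        return 4   # 8-14 days
    if days <= 30:
        return 5   # 14-30 days in the text
    return 6       # more than 30 days

assert ventilation_days_to_degree(5) == 3
```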

2.1.2. Image Preprocessing

The initial dataset we collected from the patients has various quality issues, such as blurring and differences in intensity. These various quality issues can cause bias during deep learning model training because of the differences in the image histograms. To minimize the effects of different qualities of the dataset and make the deep learning network focus more on features of inhalation injury, we implement the N-CLAHE algorithm, described by K. Koonsanit et al. [38]. However, before N-CLAHE is used, we convert the initial images into grayscale.
The N-CLAHE algorithm consists of two main steps:
1. Normalization
The intensity contrast of the original images may first need to be corrected, which is done by adjusting the intensity values with the linear normalization function in (1):
$$I_N = (I - \mathrm{Min})\,\frac{\mathrm{newMax} - \mathrm{newMin}}{\mathrm{Max} - \mathrm{Min}} + \mathrm{newMin} \tag{1}$$
where I N is a normalized image, I is an initial image, Max is the maximum intensity value of pixels in the initial image, Min is the minimum intensity value of pixels in the initial image, newMax is the maximum value of the normalized image which we set as 255 in this study, and newMin is the minimum intensity value of the normalized image which we set to 0 in this paper.
2. Contrast-Limited Adaptive Histogram Equalization (CLAHE) [41]
While the normalization step above adjusts the overall intensity, CLAHE further enhances the image in aspects such as textures and local contrast. The CLAHE process contains three stages: first, divide the original image into several nonoverlapping, equal-size regions; second, calculate the histogram of each region and obtain the clip limit for clipping the histograms; third, redistribute each histogram so that its height does not exceed the clip limit. The clip limit β can be obtained as in (2):
$$\beta = \frac{M \times N}{L}\left(1 + \frac{\alpha}{100}\,(s_{\max} - 1)\right) \tag{2}$$
where $M \times N$ is the number of pixels in each region, $L$ is the number of gray levels, $\alpha$ is a clip factor in (0, 100), and $s_{\max}$ is the maximum allowable slope. In this paper, we set the values of $M$, $N$, $L$, $\alpha$, and $s_{\max}$ to 8, 8, 256, 40, and −1.4, respectively, which results in a $\beta$ value of 0.01. When the original images are compared with the preprocessed images, as shown in Figure 4, the intensity variance is reduced and the details of the injuries are enhanced, which helps deep learning models focus on the features of inhalation injuries. A short numerical sketch of these two preprocessing steps is given below.
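The snippet below is a minimal sketch of Equations (1) and (2), using the parameter values reported in the text (newMin = 0, newMax = 255, M = N = 8, L = 256, α = 40, s_max = −1.4); it is illustrative only and not the authors' implementation.

```python
import numpy as np

def linear_normalize(image, new_min=0, new_max=255):
    """Eq. (1): stretch pixel intensities to [new_min, new_max]."""
    old_min, old_max = float(image.min()), float(image.max())
    return (image - old_min) * (new_max - new_min) / (old_max - old_min) + new_min

def clahe_clip_limit(M=8, N=8, L=256, alpha=40, s_max=-1.4):
    """Eq. (2): CLAHE clip limit beta."""
    return (M * N / L) * (1 + (alpha / 100) * (s_max - 1))

img = np.array([[10, 50], [90, 200]], dtype=float)
print(linear_normalize(img))     # intensities stretched to the full [0, 255] range
print(clahe_clip_limit())        # ~0.01, matching the beta value reported above
```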

2.2. Method

Machine learning technologies have been widely used in real-world applications. However, labeled training data can often be expensive, inaccessible, or hard to obtain. Transfer learning has been proposed to solve this problem by training high-performance models with more easily obtained and larger amounts of data [42]. In this paper, we collected 125 bronchoscopy images across the six degrees: (1) 13 images of degree 1, (2) 35 images of degree 2, (3) 24 images of degree 3, (4) 22 images of degree 4, (5) 13 images of degree 5, and (6) 18 images of degree 6. Since the size of our dataset was limited, transfer learning was applied to address this limitation and obtain a deep learning model for recognizing the severity of inhalation injury.

2.2.1. Learning and Testing Pipeline

The learning and testing pipeline is shown in Figure 5. Preprocessing is first applied to the original image dataset: the images are converted into grayscale, and their color maps are saved. Then, we apply the N-CLAHE algorithm to reduce the intensity variance and enhance the details. After N-CLAHE, the grayscale images are converted back into color using the saved color maps. After preprocessing, the images are resized to fit the input layers of the pre-trained models. For example, VGG-16/VGG-19 require an input image size of 224 × 224 × 3, and SqueezeNet requires an input image size of 227 × 227 × 3.
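The following is a minimal sketch of this preprocessing pipeline using OpenCV, under stated assumptions: the paper restores color with the saved color maps, whereas this sketch simply replicates the grayscale channel, and OpenCV's clipLimit is parameterized differently from the β in Equation (2), so the value used here is only illustrative.

```python
import cv2
import numpy as np

def preprocess(path, target_size=(224, 224)):
    """Grayscale -> linear normalization -> CLAHE -> resize -> 3-channel image."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Linear normalization to [0, 255], as in Eq. (1).
    gray = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # CLAHE on 8x8 tiles; OpenCV's clipLimit is not the same beta as in Eq. (2).
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    gray = clahe.apply(gray)

    # Resize to the input size of the chosen pre-trained model (224x224 for VGG).
    gray = cv2.resize(gray, target_size)

    # The paper converts back to color with saved color maps; here we simply
    # replicate the grayscale channel to form a 3-channel input.
    return np.stack([gray] * 3, axis=-1)
```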

2.2.2. Data Augmentation

We train the deep learning networks with the current dataset, which has a limited size. To avoid overfitting caused by the limited dataset size, we perform data augmentation after resizing [43]. We confine data augmentation to rotation and scaling of the bronchoscopy images in order to avoid label changes at large distortion magnitudes. The details of the rotation and scaling used in our data augmentation are described in the following:
3. Image rotation:
The rotated output image $I_{output}$ is obtained by rotating the original image $I_{input}$ clockwise by a positive angle $\theta$ as in Equations (3) and (4):
$$x_{output} = \cos\theta\,(x_{input} - x_o) - \sin\theta\,(y_{input} - y_o) + x_o \tag{3}$$
$$y_{output} = \cos\theta\,(y_{input} - y_o) + \sin\theta\,(x_{input} - x_o) + y_o \tag{4}$$
where $(x_{output}, y_{output})$ are the coordinates of a pixel in the rotated image, $(x_{input}, y_{input})$ are the coordinates of that pixel in the original image, $\theta$ is the angle of rotation, and $(x_o, y_o)$ are the coordinates of the center of rotation.
4. Image scaling:
We implemented vertical scaling, horizontal scaling, and whole-image scaling, with bilinear interpolation applied in each case. When scaling the image in only one direction, horizontally (x) or vertically (y), we apply Equations (5) and (6), shown here for the x direction:
$$Pixel(x, y_1) = \frac{x_2 - x}{x_2 - x_1}\,Pixel(x_1, y_1) + \frac{x - x_1}{x_2 - x_1}\,Pixel(x_2, y_1) \tag{5}$$
$$Pixel(x, y_2) = \frac{x_2 - x}{x_2 - x_1}\,Pixel(x_1, y_2) + \frac{x - x_1}{x_2 - x_1}\,Pixel(x_2, y_2) \tag{6}$$
where $Pixel(x_i, y_i)$ is the pixel value at position $(x_i, y_i)$ of the original image, $x$ is the horizontal coordinate of the interpolated pixel, and $(x_1, y_1)$, $(x_2, y_1)$, $(x_1, y_2)$, $(x_2, y_2)$ are the four nearest points in the original image, with $x_2 = x_1 + 1$ and $y_2 = y_1 + 1$.
When applying the whole-image scaling, we followed Equation (7):
$$Pixel(x, y) = \frac{1}{(x_2 - x_1)(y_2 - y_1)} \begin{bmatrix} x_2 - x & x - x_1 \end{bmatrix} \begin{bmatrix} Pixel(x_1, y_1) & Pixel(x_1, y_2) \\ Pixel(x_2, y_1) & Pixel(x_2, y_2) \end{bmatrix} \begin{bmatrix} y_2 - y \\ y - y_1 \end{bmatrix} \tag{7}$$
where $y$ is the vertical coordinate of the interpolated pixel. A minimal numerical sketch of these rotation and interpolation formulas is given below.
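The sketch below evaluates Equations (3), (4), and (7) on a small toy array; the values and the row/column indexing convention are assumptions for illustration, and this is not the paper's MATLAB implementation.

```python
import numpy as np

def rotate_coords(x_in, y_in, theta_deg, center):
    """Eqs. (3)-(4): rotate a pixel coordinate clockwise about `center`."""
    xo, yo = center
    t = np.deg2rad(theta_deg)
    x_out = np.cos(t) * (x_in - xo) - np.sin(t) * (y_in - yo) + xo
    y_out = np.cos(t) * (y_in - yo) + np.sin(t) * (x_in - xo) + yo
    return x_out, y_out

def bilinear(img, x, y):
    """Eq. (7): bilinear interpolation of img at a non-integer position (x, y)."""
    x1, y1 = int(np.floor(x)), int(np.floor(y))
    x2, y2 = x1 + 1, y1 + 1
    q = np.array([[img[x1, y1], img[x1, y2]],
                  [img[x2, y1], img[x2, y2]]], dtype=float)
    w_x = np.array([x2 - x, x - x1])
    w_y = np.array([y2 - y, y - y1])
    return (w_x @ q @ w_y) / ((x2 - x1) * (y2 - y1))

toy = np.arange(16, dtype=float).reshape(4, 4)
print(rotate_coords(1, 2, theta_deg=30, center=(2, 2)))
print(bilinear(toy, 1.5, 2.25))   # interpolated between pixels (1,2), (1,3), (2,2), (2,3)
```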
Image reflection, either horizontal or vertical, is not applied as data augmentation in this paper: because bronchoscopy images are not symmetrical (centrally, vertically, or horizontally), reflection may cause distortion. Data augmentation extends the dataset from 125 images to 1000 images. We divide these images into 70% training and 30% testing sets; the training set is used to train the selected models, and the testing set is used to evaluate the trained models with performance metrics including precision, sensitivity, specificity, accuracy, and F1 score.
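A sketch of this augmentation and split is shown below using torchvision transforms. The rotation range, scale range, and the number of augmented copies per image are assumptions (the paper does not report the exact magnitudes), chosen so that 125 originals grow to roughly 1000 samples; no reflection is applied, consistent with the text.

```python
import random
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=20),                    # image rotation, Eqs. (3)-(4)
    transforms.RandomAffine(degrees=0, scale=(0.85, 1.15)),   # image scaling, Eqs. (5)-(7)
])

def expand_dataset(paths, labels, copies_per_image=7):
    """Return (image, label) pairs: each original plus 7 augmented copies (125 -> 1000)."""
    samples = []
    for path, label in zip(paths, labels):
        img = Image.open(path).convert("RGB")
        samples.append((img, label))
        for _ in range(copies_per_image):
            samples.append((augment(img), label))
    return samples

def split(samples, train_frac=0.7, seed=0):
    """Shuffle and split into 70% training / 30% testing sets."""
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_frac)
    return samples[:cut], samples[cut:]
```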

2.2.3. Transfer Learning

Transfer learning is an approach that enhances the performance of traditional machine learning methods when the dataset size is small [42]. We consider the ImageNet database, which contains 1000 classes and 14,000,000 labeled images with an average image resolution of 496 × 387 × 3, as the source domain $D_S$. The source learning task $T_S$ is the training process used to obtain the pre-trained model $M_P$. The bronchoscopy image dataset developed in this paper, with an average image resolution of 512 × 512, is regarded as the target domain $D_T$, and the target learning task $T_T$ is learning how to grade the severity of inhalation injuries, where $D_S \neq D_T$ and $T_S \neq T_T$. In this case, transfer learning contributes to learning the target predictive function $f(\cdot)$ from $D_T$ in $T_T$, as shown in Figure 6a, which determines the injury degree from the bronchoscopy image [44] as in Equation (8):
$$D = f(I_{input}) \tag{8}$$
where $D$ is the predicted degree of inhalation injury and $I_{input}$ is the input image fed to the model.
The flow chart of transfer learning training is shown in Figure 6b. Transfer learning here is a "black box" method, meaning that we can use the pre-trained models directly to simplify the training process. For example, if VGG-16 [29] is selected as the pre-trained model, it is divided into two parts: fixed layers and learnable layers. We keep all the weights in the fixed layers, which consist of convolutional layers and ReLU layers, to simplify the training, because these layers carry information from the previous learning task on ImageNet and thus already have the potential to recognize characteristic image features. VGG-16 can recognize 1000 classes, whereas there are six grades of inhalational injury; hence, the fully connected layer and SoftMax layer are resized from 1 × 1 × 1000 to 1 × 1 × 6 and retrained, and these become the last learnable layers. The retrained VGG-16 model then recognizes the grades of the input images.
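Below is a minimal sketch of this fixed-layer/learnable-layer setup using torchvision's ImageNet-pretrained VGG-16 (torchvision ≥ 0.13 for the weights argument). The paper itself used MATLAB's Deep Network Designer; this PyTorch version is only illustrative.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load VGG-16 pre-trained on ImageNet (the source domain D_S).
model = models.vgg16(weights="IMAGENET1K_V1")

# Fixed layers: freeze the convolutional feature extractor so its ImageNet
# weights are kept unchanged during retraining.
for param in model.features.parameters():
    param.requires_grad = False

# Last learnable layer: replace the 1000-class output with 6 injury degrees.
model.classifier[6] = nn.Linear(in_features=4096, out_features=6)

# Only the unfrozen (learnable) parameters are passed to the optimizer.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```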

2.2.4. Model Selection

Deep convolutional neural network (CNN) models are outstanding at analyzing images, and transfer learning based on deep CNNs has proven effective in medical image classification [45]. In this study, we select six pre-trained deep CNN-based models and retrain the weights in the last learnable layers, which predict the probability of an image belonging to each category, using the training set. The testing set is then used to evaluate the performance of the updated models. In choosing candidate models, we focused on models that are widely used in transfer learning and available in packaged form through trusted public libraries, such as Keras. These models are discussed below.
1. VGG-16
VGG-16 is a CNN-based model used in object detection and the classification of up to 1000 different categories with high accuracy. Its architecture has small convolution filters, which make the network converge faster. They also reduce the number of weight parameters so that the tendency to overfit decreases. The fully connected and SoftMax layers were retrained to fit our six categories.
2. VGG-19
VGG-19 [29] is also a CNN-based model with a similar architecture to VGG-16, both of which use 3 × 3 filters to improve accuracy. The difference between VGG-16 and VGG-19 is that VGG-16 has 16 layers in the base model, while VGG-19 has 19 layers. The fully connected and SoftMax layers were retrained to six dimensions, thus fitting the six categories.
3. SqueezeNet
SqueezeNet [46] is a CNN-based model with smaller architecture. The base model was pre-trained on more than a million images from the ImageNet dataset and provided high computational efficiency and achieved AlexNet-level [30] accuracy on ImageNet with fewer parameters. In this paper, we trained the convolution layer at the end of the model and the SoftMax layer to fit six degrees.
4. ResNet-18
ResNet-18 [33] is a CNN-based model which is 18 layers deep. A residual learning framework is present to ease the training of deeper networks. This network has lower complexity with deeper architecture, preventing overfitting. The fully connected layer and the SoftMax layer were trained to recognize six categories.
5. ResNet-50
ResNet-50 [33] is a CNN-based model which is 50 layers deep. Like ResNet-18, more complex residual blocks are also used in ResNet-50 to prevent overfitting and the gradient from vanishing. The fully connected layer and the SoftMax layer were still trained for six categories in this study.
6. GoogLeNet
GoogLeNet [47] is a CNN-based model which is 22 layers deep. It uses inception architecture to reduce the number of parameters, including weights and biases. Global average pooling is applied at the end of the network instead of a fully connected layer, decreasing the number of trainable parameters. The averaging layer and the SoftMax layer at the end of the network were also trained to fit our six categories.
7. CNN-13
We designed two typical CNN models to compare transfer learning and traditional image classifiers: CNN-13 and CNN-25. CNN-13 is 13 layers deep: the 3 convolutional layers, 2 max-pooling layers, and the fully connected layer can learn the features of the image dataset, and the SoftMax layer provides the probability of the image belonging to the corresponding grade. The architecture of CNN-13 is shown in Figure 7a.
In the convolution layers ($L_C$), we applied 3 × 3 kernels ($K$) with a stride ($s$) of 1. A total of 32 kernels form the filter bank, where the kernels are connected to the same region of the previous layer's output (the green square shown in Figure 7b) for collecting features. Batch normalization, a widely used regularization technique, first normalizes its input as in Equation (9):
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2}} \tag{9}$$
where $x_i$ denotes the input elements of the batch normalization layer ($L_B$), which are the output elements of $L_C$, and $\mu_B$ and $\sigma_B^2$ are the mean and variance computed over the dimensions ($d$) of the input. A scale-and-shift step is then applied following Equation (10):
$$\hat{y}_i = \delta \hat{x}_i + \gamma \tag{10}$$
where $\delta$ and $\gamma$ are learnable parameters updated during training.
The ReLU function is then used as the nonlinear activation function after $L_B$, as shown in Equation (11):
$$g(x) = \begin{cases} x & x \geq 0 \\ 0 & x < 0 \end{cases} \tag{11}$$
where $x$ represents each element $\hat{y}_1, \ldots, \hat{y}_N$ of the output matrix $Y_B$ of the batch normalization layer, as shown in Figure 7b, and $N$ is the number of elements in $Y_B$. The ReLU layer thresholds each of its input elements, setting any value less than zero to zero.
The SoftMax function in Equation (12) is used to measure the probability $P_j$ that the input image belongs to degree $j$, based on the output $f_j$ of the fully connected layer:
$$P_j(f) = \frac{e^{f_j}}{\sum_{i=1}^{N_I} e^{f_i}} \tag{12}$$
where $j \in \{1, 2, 3, 4, 5, 6\}$ is the index of the injury degree, $f_j$ is the output of the fully connected layer for degree $j$, $N_I = 6$ is the number of degrees, $f_i$ is the output of the fully connected layer with index $i$, and $P_j(f)$ is the probability of the input image belonging to degree $j$. A sketch of this plain CNN architecture is given after the model list below.
8. CNN-25
Similar to CNN-13, CNN-25 is also a typical CNN-based model, which is 25 layers deep. We inserted six more convolution layers to observe whether the accuracy of a typical CNN-based model increases with a deeper architecture. The architecture of CNN-25 is shown in Figure 8.
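The following is one plausible PyTorch arrangement of the CNN-13 model described above (3 convolution layers with batch normalization and ReLU, 2 max-pooling layers, a fully connected layer, and SoftMax). The exact layer ordering, channel widths, and input size are assumptions; the paper reports only the layer counts and Figure 7a, and CNN-25 follows the same pattern with additional convolution blocks.

```python
import torch
import torch.nn as nn

class CNN13(nn.Module):
    """A 13-layer-style plain CNN: conv/BN/ReLU blocks (Eqs. (9)-(11)),
    max pooling, a fully connected layer, and SoftMax (Eq. (12))."""
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)  # for 224x224 inputs
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.softmax(self.classifier(x))

probs = CNN13()(torch.randn(1, 3, 224, 224))   # per-degree probabilities, shape (1, 6)
```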

2.2.5. Experiment Set Up

In this study, we performed all experiments with MATLAB 2022a and the Deep Network Designer developed by MathWorks. Individual experiments were designed for each base model to achieve the best performance metrics, and multiple hyperparameter ranges were tested for the different models, as shown in Table 1. A larger learning rate makes the model learn faster but may yield suboptimal weights, while a smaller learning rate helps the model reach more optimal weights but takes longer to train. In this study, instead of relying on a fixed learning rate, we apply a learning rate schedule that drops the learning rate during training [48]: the learning rate is updated every drop period (a fixed number of epochs) by multiplying it by a drop factor. Mini-batch training [49], a typical method to smooth gradient descent during training, is also applied, with the data shuffled every epoch. We also performed comparison experiments without data augmentation to analyze the impact of data augmentation.
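The snippet below sketches these two training elements, a step-decay learning-rate schedule and shuffled mini-batches, in PyTorch. The concrete values (initial learning rate, drop period, drop factor, batch size, number of epochs) are placeholders rather than the tuned settings from Table 1, and the tiny linear model stands in for the real networks.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(10, 6)                        # stand-in for a real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

dataset = TensorDataset(torch.randn(700, 10), torch.randint(0, 6, (700,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)   # reshuffled every epoch

for epoch in range(30):                                # "max epochs"
    for xb, yb in loader:                              # mini-batch gradient descent
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
    scheduler.step()                                   # drop the LR by 10x every 10 epochs
```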

3. Results

The accuracy for each degree with the different models is shown in Figure 9, where red, green, blue, yellow, purple, orange, rosy-brown, and khaki dots represent the accuracies of VGG-16, VGG-19, SqueezeNet, ResNet-18, ResNet-50, GoogLeNet, CNN-13, and CNN-25, respectively, and the dotted lines indicate overlapping results. Specifically, Figure 9a,b shows the accuracies with and without data augmentation, respectively. As shown in Figure 9, data augmentation improves accuracy by 25–75% in degree 1, 10–20% in degree 2, 14–86% in degree 3, 14–34% in degree 4, 25–75% in degree 5, and 20–80% in degree 6. In addition, the models trained without data augmentation provide 20%, 10%, 10%, and 20% higher accuracies in degree 2 with VGG-16, SqueezeNet, GoogLeNet, and CNN-25, respectively.
We also use performance metrics consisting of (1) precision, (2) sensitivity, (3) specificity, (4) accuracy, and (5) F-measure (F1-score) to evaluate the optimized models; these indexes are calculated with Equations (13)–(17) for each individual degree. In our multi-class case, the performance metrics are calculated separately, degree by degree, using the one-versus-all method, which regards one degree as degree A and all other degrees as degree B [50]. For example, when we calculate the performance metrics of degree 2, degree 2 is regarded as degree A and the other degrees (1, 3, 4, 5, 6) as degree B. True positive (TP) counts the number of images that the model correctly recognized as degree A; false positive (FP) counts the number of images from degree B that the model misrecognized as degree A; true negative (TN) counts the number of images correctly recognized as degree B; and false negative (FN) counts the number of images from degree A that the model misrecognized as degree B. The sensitivity, in this case, refers to the ability of the network to identify each degree, and the specificity refers to the ability of the network to correctly recognize the images which do not belong to each degree [50].
$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{13}$$
$$\mathrm{sensitivity} = \frac{TP}{TP + FN} \tag{14}$$
$$\mathrm{specificity} = \frac{TN}{TN + FP} \tag{15}$$
$$\mathrm{accuracy} = \frac{TP + TN}{P + N} \tag{16}$$
$$\mathrm{F1\ score} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{sensitivity}}{\mathrm{precision} + \mathrm{sensitivity}} \tag{17}$$
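The following sketch computes these one-versus-all counts and the micro-averaged metrics used later to summarize the models. The label arrays here are made-up examples, not the paper's data.

```python
import numpy as np

def one_vs_all_counts(y_true, y_pred, degree):
    """TP/FP/TN/FN for one degree treated as 'degree A' versus all others."""
    tp = np.sum((y_pred == degree) & (y_true == degree))
    fp = np.sum((y_pred == degree) & (y_true != degree))
    tn = np.sum((y_pred != degree) & (y_true != degree))
    fn = np.sum((y_pred != degree) & (y_true == degree))
    return tp, fp, tn, fn

def micro_metrics(y_true, y_pred, degrees=range(1, 7)):
    """Sum the counts over all degrees, then apply Eqs. (13)-(17)."""
    TP, FP, TN, FN = (sum(c) for c in zip(
        *(one_vs_all_counts(y_true, y_pred, d) for d in degrees)))
    precision = TP / (TP + FP)
    sensitivity = TP / (TP + FN)
    specificity = TN / (TN + FP)
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return dict(precision=precision, sensitivity=sensitivity,
                specificity=specificity, accuracy=accuracy, f1=f1)

y_true = np.array([1, 2, 2, 3, 4, 5, 6, 2])
y_pred = np.array([1, 2, 3, 3, 4, 5, 6, 2])
print(micro_metrics(y_true, y_pred))
```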
Our current dataset is not balanced; its distribution is shown in Figure 9c, where the images in degrees 1, 2, 3, 4, 5, and 6 account for 10.4%, 28%, 19.2%, 17.6%, 10.4%, and 14.4% of the dataset, respectively. In Figure 9d, we compare the specificity and sensitivity of each method on degree 2 without data augmentation, where the networks provide high sensitivity but low specificity at the same time. This suggests that the number of false positive predictions is abnormally high and that the networks tend to classify input images as degree 2 regardless of the degree to which they actually belong.
In Figure 9e, the average values of sensitivity and specificity over all degrees from each model are shown, and it can be seen that the models trained with data augmentation provide higher specificity and sensitivity. This indicates that data augmentation reduces the false negative and false positive rates, which means fewer cases of injury are missed and fewer cases are classified to other degrees.
Besides transfer learning models, we designed and tested two typical CNN-based models: CNN-13 and CNN-25. As shown in Figure 9e, the sensitivity and specificity of CNNs are always lower than those of transfer learning methods with and without data augmentation.
The micro-averaged performance metric values over the six degrees are used to represent the overall performance of each method. The performance metrics of the optimized models are shown in Table 2, where bold values highlight the best performance over all methods; GoogLeNet achieves the best performance, with the highest values for every index. It is also observed that CNN-25 and CNN-13 could not reach the accuracy that transfer learning achieved.
We also compare the performance metrics of each model with and without data augmentation. The performance metrics of the optimized models without data augmentation are shown in Table 3, where bold values highlight the best performance over all methods. Comparing the corresponding values in Table 2 and Table 3, we observe that data augmentation clearly increases the performance of the deep models and the accuracy of severity grading.
Additionally, to determine whether increasing the depth of the models causes overfitting in our case, we compared VGG-16/VGG-19, CNN-13/CNN-25, and ResNet-18/ResNet-50. The accuracy of VGG-19 improves by 19.45% and 22.22% with and without data augmentation, respectively, compared to VGG-16, and the accuracy of ResNet-50 improves by 16.66% and 13.89% with and without data augmentation compared to ResNet-18. However, the accuracy of CNN-25 decreases by 8.33% with data augmentation compared to CNN-13.

4. Discussion

Table 2 shows that the typical CNN-based models' accuracy is between 36.11% and 44.44%, which is always lower than that of transfer learning. The accuracy even declines when the depth of the model increases, since the limited size of our current dataset causes overfitting as model complexity grows. In contrast, this overfitting issue does not occur with the transfer learning methods. For example, when we compare the following pairs of transfer learning methods, (1) VGG-16 versus VGG-19 and (2) ResNet-18 versus ResNet-50, the accuracy increases as the depth grows. This proportional relation between accuracy and depth may arise because the pre-trained weights and biases in the fixed layers, introduced in Section 2.2.3, effectively reduce the complexity of the models during training, so that only 2–3 learnable layers need to be updated.
Figure 9c shows the total number of images used for training or testing the methods. In this paper, grade 2 images are relatively more common (28% of total images) than the other grades 1, 3, 4, 5, and 6. This higher percentage of grade 2 images leads to a higher a priori probability in prediction, which causes these networks to tend to classify the input images to degree 2, no matter what degree the images belong to.
Due to this data imbalance, performance metrics such as sensitivity and specificity may be biased if the data augmentation or transfer learning is not included. Table 2 and Table 3 show data augmentation’s impact on all the performance metrics: accuracy, sensitivity, specificity, F-1 score, and precision. Conversely, Figure 9e only visualizes each method’s average sensitivity and specificity with and without data augmentation among these performance metrics. As shown in Figure 9e, the average sensitivity and specificity increase when data augmentation is applied since the expanded size of the dataset coming from the data augmentation reduces overfitting during training. Moreover, transfer learning employed in VGG-16, VGG-19, SqueezeNet, ResNet-18, ResNet-50, and GoogLeNet gives higher sensitivity and specificity than CNN-13 and CNN-25, which do not have the transfer learning component. Furthermore, more hidden layers improve the sensitivity and specificity of the methods using the transfer learning component, while the more hidden layers did not improve the sensitivity and specificity of the method which does not have the transfer learning component, as shown in Figure 9e.
To analyze only the impact of transfer learning, we did not include the data augmentation component for any method in Figure 9d. As shown in Figure 9d, the specificity values of CNN-13 and CNN-25 without data augmentation are zero, while those of the other methods are not. This is because CNN-13 and CNN-25 are not equipped with the transfer learning component, so their ability to correctly classify non-degree-2 data as non-degree 2 is low; i.e., the specificity value is small when data augmentation is not implemented. Hence, the specificity is seen to be vulnerable to the lack of both transfer learning and data augmentation. Moreover, Figure 9d shows that CNN-13 and CNN-25 give 100% sensitivity but 0% specificity for degree 2 without data augmentation. This indicates that the unbalanced dataset distribution leads the networks to have a higher probability of predicting the degree that accounts for the highest percentage of the dataset.
The novelty of this study and method cannot be overstated. Currently, there is no objective way to diagnose and grade an inhalation injury; this is entirely dependent on the healthcare provider, who makes a subjective determination. This study aims to establish an objective and consistent way for providers to diagnose and grade inhalation injuries. Additionally, establishing a standardized and objective method to diagnose inhalation injuries will allow providers to care for their patients more accurately by understanding the full extent of the inhalation injury. This is especially important given that inhalation injuries are complications historically associated with higher mortality and morbidity in the burn population.
In conclusion, our experimental results show that our proposed transfer learning method performs with higher accuracy than the typical CNN-based model, which is a non-transfer learning method, and that data augmentation improves the average accuracy. Specifically, the proposed algorithm with GoogLeNet provides the highest average accuracy of 86.11%. This indicates that the severity grade of an inhalation injury can be predicted from bronchoscopy images, and that, based on the predicted grade, the period of mechanical ventilation can then be determined by clinicians. Since the dataset in this paper is limited, evaluating and improving our proposed algorithm with a larger dataset will be required before this approach can aid burn surgeons in the clinic. Hence, our future work will involve collecting a larger number of bronchoscopy images from patients who suffer inhalation injuries and training the model with these images; the larger dataset is expected to further improve the proposed method. As a result, the improved method is expected to be a first step toward standardizing therapy by overcoming the current limitations of bronchoscopy-based severity estimation of inhalational injuries, such as the lack of accuracy and consistency in each degree and the unbalanced dataset. This will be achieved by objectifying the process of inhalational injury grading and by developing an assistive tool that determines the period of mechanical ventilation based on objectively predicted injury grades.

Author Contributions

Conceptualization, A.W.P.; Methodology, Y.L.; Software, Y.L.; Validation, Y.L.; Formal analysis, Y.L.; Data curation, A.W.P., J.Z., F.Z., K.M.; Writing—review & editing, A.W.P., J.Z., F.Z., K.M. and J.W.C.; Supervision, J.A.G. and J.W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted under a protocol approved by the Institutional Review Board of Texas Tech University Health Sciences Center (IRB# L18-186, approved on 22 August 2022).

Informed Consent Statement

Patient consent was waived due to the retrospective nature of this study and approval by the Institutional Review Board of Texas Tech University Health Sciences Center.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. American Burn Association. National Burn Repository: 2002 Report Dataset Version 8; American Burn Association: Chicago, IL, USA, 2002. [Google Scholar]
  2. Veeravagu, A.; Yoon, B.C.; Jiang, B.; Carvalho, C.M.; Rincon, F.; Maltenfort, M.; Jallo, J.; Ratliff, J.K. National trends in burn and inhalation injury in burn patients: Results of analysis of the nationwide inpatient sample database. J. Burn Care Res. 2015, 36, 258–265. [Google Scholar] [CrossRef]
  3. American Burn Association. National Burn Repository: 2019 Update Dataset Version 14.0; American Burn Association: Chicago, IL, USA, 2019. [Google Scholar]
  4. Merrel, P.; Mayo, D. Inhalation injury in the burn patient. Crit. Care Nurs. Clin. 2004, 16, 27–38. [Google Scholar] [CrossRef] [PubMed]
  5. Traber, D.; Hawkins, H.; Enkhbaatar, P.; Cox, R.; Schmalstieg, F.; Zwischenberger, J.; Traber, L. The role of the bronchial circulation in acute lung injury resulting from burn and smoke inhalation. Pulm. Pharmacol. Ther. 2007, 20, 163–166. [Google Scholar] [CrossRef] [PubMed]
  6. Shirani, K.Z.; Pruitt, B.A., Jr.; Mason, A.D., Jr. The influence of inhalation injury and pneumonia on burn mortality. Ann. Surg. 1987, 205, 82. [Google Scholar] [CrossRef]
  7. Herlihy, J.; Vermeulen, M.; Joseph, P.; Hales, C. Impaired alveolar macrophage function in smoke inhalation injury. J. Cell. Physiol. 1995, 163, 1–8. [Google Scholar] [CrossRef] [PubMed]
  8. Al Ashry, H.S.; Mansour, G.; Kalil, A.C.; Walters, R.W.; Vivekanandan, R. Incidence of ventilator associated pneumonia in burn patients with inhalation injury treated with high frequency percussive ventilation versus volume control ventilation: A systematic review. Burns 2016, 42, 1193–1200. [Google Scholar] [CrossRef]
  9. Mlcak, R.P.; Suman, O.E.; Herndon, D.N. Respiratory management of inhalation injury. Burns 2007, 33, 2–13. [Google Scholar] [CrossRef]
  10. Endorf, F.W.; Gamelli, R.L. Inhalation injury, pulmonary perturbations, and fluid resuscitation. J. Burn Care Res. 2007, 28, 80–83. [Google Scholar] [CrossRef]
  11. Albright, J.M.; Davis, C.S.; Bird, M.D.; Ramirez, L.; Kim, H.; Burnham, E.L.; Gamelli, R.L.; Kovacs, E.J. The acute pulmonary inflammatory response to the graded severity of smoke inhalation injury. Crit. Care Med. 2012, 40, 1113. [Google Scholar] [CrossRef] [Green Version]
  12. Jones, S.W.; Williams, F.N.; Cairns, B.A.; Cartotto, R. Inhalation injury: Pathophysiology, diagnosis, and treatment. Clin. Plast. Surg. 2017, 44, 505–511. [Google Scholar] [CrossRef]
  13. Mosier, M.J.; Pham, T.N.; Park, D.R.; Simmons, J.; Klein, M.B.; Gibran, N.S. Predictive value of bronchoscopy in assessing the severity of inhalation injury. J. Burn Care Res. 2012, 33, 65–73. [Google Scholar] [CrossRef] [PubMed]
  14. Horry, M.J.; Chakraborty, S.; Paul, M.; Ulhaq, A.; Pradhan, B.; Saha, M.; Shukla, N. COVID-19 detection through transfer learning using multimodal imaging data. IEEE Access 2020, 8, 149808–149824. [Google Scholar] [CrossRef] [PubMed]
  15. Mazurowski, M.A.; Buda, M.; Saha, A.; Bashir, M.R. Deep learning in radiology: An overview of the concepts and a survey of the state of the art with focus on MRI. J. Magn. Reson. Imaging 2019, 49, 939–954. [Google Scholar] [CrossRef] [PubMed]
  16. Yamashita, R.; Nishio, M.; Do, R.K.G.; Togashi, K. Convolutional neural networks: An overview and application in radiology. Insights Into Imaging 2018, 9, 611–629. [Google Scholar] [CrossRef] [Green Version]
  17. Walker, P.F.; Buehner, M.F.; Wood, L.A.; Boyer, N.L.; Driscoll, I.R.; Lundy, J.B.; Cancio, L.C.; Chung, K.K. Diagnosis and management of inhalation injury: An updated review. Crit. Care 2015, 19, 351. [Google Scholar] [CrossRef] [Green Version]
  18. Yadav, D.; Sharma, A.; Singh, M.; Goyal, A. Feature extraction based machine learning for human burn diagnosis from burn images. IEEE J. Transl. Eng. Health Med. 2019, 7, 1800507. [Google Scholar] [CrossRef]
  19. Rangel-Olvera, B.; Rosas-Romero, R. Detection and classification of burnt skin via sparse representation of signals by over-redundant dictionaries. Comput. Biol. Med. 2021, 132, 104310. [Google Scholar] [CrossRef]
  20. Suha, S.A.; Sanam, T.F. A deep convolutional neural network-based approach for detecting burn severity from skin burn images. Mach. Learn. Appl. 2022, 9, 100371. [Google Scholar] [CrossRef]
  21. Lee, S.; Lukan, J.; Boyko, T.; Zelenova, K.; Makled, B.; Parsey, C.; Norfleet, J.; De, S. A deep learning model for burn depth classification using ultrasound imaging. J. Mech. Behav. Biomed. Mater. 2022, 125, 104930. [Google Scholar] [CrossRef]
  22. Chauhan, J.; Goyal, P. BPBSAM: Body part-specific burn severity assessment model. Burns 2020, 46, 1407–1423. [Google Scholar] [CrossRef]
  23. Rangaraju, L.P.; Kunapuli, G.; Every, D.; Ayala, O.D.; Ganapathy, P.; Mahadevan-Jansen, A. Classification of burn injury using Raman spectroscopy and optical coherence tomography: An ex-vivo study on porcine skin. Burns 2019, 45, 659–670. [Google Scholar] [CrossRef] [PubMed]
  24. Rowland, R.A.; Ponticorvo, A.; Baldado, M.L.; Kennedy, G.T.; Burmeister, D.M.; Christy, R.J.; Bernal, N.P.; Durkin, A.J. Burn wound classification model using spatial frequency-domain imaging and machine learning. J. Biomed. Opt. 2019, 24, 056007. [Google Scholar] [PubMed]
  25. Liu, N.T.; Salinas, J. Machine learning in burn care and research: A systematic review of the literature. Burns 2015, 41, 1636–1641. [Google Scholar] [CrossRef] [PubMed]
  26. Chauhan, N.K.; Asfahan, S.; Dutt, N.; Jalandra, R.N. Artificial intelligence in the practice of pulmonology: The future is now. Lung India Off. Organ Indian Chest Soc. 2022, 39, 1. [Google Scholar] [CrossRef] [PubMed]
  27. Feng, P.H.; Lin, Y.T.; Lo, C.M. A machine learning texture model for classifying lung cancer subtypes using preliminary bronchoscopic findings. Med. Phys. 2018, 45, 5509–5514. [Google Scholar] [CrossRef] [Green Version]
  28. Ravishankar, H.; Sudhakar, P.; Venkataramani, R.; Thiruvenkadam, S.; Annangi, P.; Babu, N.; Vaidya, V. Understanding the mechanisms of deep transfer learning for medical images. In Deep Learning and Data Labeling for Medical Applications; Springer: Berlin/Heidelberg, Germany, 2016; pp. 188–196. [Google Scholar]
  29. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  30. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef] [Green Version]
  31. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  32. Huang, S.; Dang, J.; Sheckter, C.C.; Yenikomshian, H.A.; Gillenwater, J. A systematic review of machine learning and automation in burn wound evaluation: A promising but developing frontier. Burns 2021, 47, 1691–1704. [Google Scholar] [CrossRef]
  33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  34. Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; Summers, R.M. Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2097–2106. [Google Scholar]
  35. Maghdid, H.S.; Asaad, A.T.; Ghafoor, K.Z.; Sadiq, A.S.; Mirjalili, S.; Khan, M.K. Diagnosing COVID-19 pneumonia from X-ray and CT images using deep learning and transfer learning algorithms. In Multimodal Image Exploitation and Learning 2021; SPIE: Bellingham, WA, USA, 2021; Volume 11734, pp. 99–110. [Google Scholar]
  36. Sajja, T.; Devarapalli, R.; Kalluri, H. Lung Cancer Detection Based on CT Scan Images by Using Deep Transfer Learning. Traitement Du Signal 2019, 36, 339–344. [Google Scholar] [CrossRef]
  37. Alzubaidi, L.; Al-Amidie, M.; Al-Asadi, A.; Humaidi, A.J.; Al-Shamma, O.; Fadhel, M.A.; Zhang, J.; Santamaría, J.; Duan, Y. Novel transfer learning approach for medical imaging with limited labeled data. Cancers 2021, 13, 1590. [Google Scholar] [CrossRef]
  38. Koonsanit, K.; Thongvigitmanee, S.; Pongnapang, N.; Thajchayapong, P. Image enhancement on digital x-ray images using N-CLAHE. In Proceedings of the 2017 10th Biomedical Engineering International Conference (BMEICON), Hokkaido, Japan, 31 August–2 September 2017; IEEE: Piscataway, NJ, USA; pp. 1–4. [Google Scholar]
  39. Shin, H.; Shin, H.; Choi, W.; Park, J.; Park, M.; Koh, E.; Woo, H. Sample-Efficient Deep Learning Techniques for Burn Severity Assessment with Limited Data Conditions. Appl. Sci. 2022, 12, 7317. [Google Scholar] [CrossRef]
  40. Volety, R.; Jeeva, J. Classification of Burn Images into 1st, 2nd, and 3rd Degree Using State-of-the-Art Deep Learning Techniques. ECS Trans. 2022, 107, 18323. [Google Scholar] [CrossRef]
  41. Zuiderveld, K.J. Contrast Limited Adaptive Histogram Equalization. In Graphics Gems; Elsevier: Amsterdam, The Netherlands, 1994; pp. 474–485. [Google Scholar]
  42. Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data 2016, 3, 9. [Google Scholar] [CrossRef] [Green Version]
  43. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
  44. Lin, Y.-P.; Jung, T.-P. Improving EEG-based emotion classification using conditional transfer learning. Front. Hum. Neurosci. 2017, 11, 334. [Google Scholar] [CrossRef] [Green Version]
  45. Zhao, X.; Qi, S.; Zhang, B.; Ma, H.; Qian, W.; Yao, Y.; Sun, J. Deep CNN models for pulmonary nodule classification: Model modification, model integration, and transfer learning. J. X-Ray Sci. Technol. 2019, 27, 615–629. [Google Scholar] [CrossRef]
  46. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
  47. Ballester, P.; Araujo, R.M. On the performance of GoogLeNet and AlexNet applied to sketches. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
  48. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning (Adaptive Computation and Machine Learning Series); MIT Press: Cambridge MA, USA, 2017. [Google Scholar]
  49. Li, M.; Zhang, T.; Chen, Y.; Smola, A.J. Efficient mini-batch training for stochastic optimization. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 661–670. [Google Scholar]
  50. Lalkhen, A.G.; McCluskey, A. Clinical tests: Sensitivity and specificity. Contin. Educ. Anaesth. Crit. Care Pain 2008, 8, 221–223. [Google Scholar] [CrossRef]
Figure 1. An example of grading of inhalation injury using the abbreviated injury score (AIS). (A)—no injury; (B)—mild injury; (C)—moderate injury; (D)—severe injury; (E)—massive injury.
Figure 2. An example of a mechanical ventilator. The ventilator pulls air and extracts oxygen (O2) from an external source. The patient receives the oxygen from the tube passing into the lungs. The mechanical ventilator also removes CO2 from the patient’s lungs. The ventilator can be adjusted to control the rate and amount of ventilation.
Figure 3. Bronchoscopy images of different degrees based on our proposed grading system. (A) degree 1; (B) degree 2; (C) degree 3; (D) degree 4; (E) degree 5; (F) degree 6.
Figure 4. Preprocessed bronchoscopy images from different degrees, where (a,c,e,g,i,k) are the original images from the 1st to the 6th degree and (b,d,f,h,j,l) are the corresponding preprocessed images.
Figure 5. The pipeline of image preprocessing and model development. The original images are converted to grayscale and the N-CLAHE algorithm is applied, after which the images are converted back to color. The images are then resized to fit the input layers of the different models. Data augmentation, including image rotation and scaling, is applied to mitigate the limited dataset size. Finally, the dataset is divided into training and testing sets to validate the selected models.
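A minimal sketch of this preprocessing pipeline, assuming OpenCV and NumPy, is given below. Plain CLAHE preceded by min-max intensity normalization is used as a stand-in for the N-CLAHE step, and the 224 × 224 input size, clip limit, tile grid, and rotation/scaling values are illustrative assumptions rather than the paper's exact settings.

```python
# Illustrative preprocessing sketch: grayscale -> normalization + CLAHE
# (stand-in for N-CLAHE) -> back to 3 channels -> resize, plus a simple
# rotation/scaling augmentation. Parameter values are assumptions.
import cv2
import numpy as np

def preprocess(image_bgr, size=(224, 224)):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Min-max normalization before CLAHE (the "N" in N-CLAHE); the exact
    # normalization used in the paper may differ.
    gray = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)
    color = cv2.cvtColor(enhanced, cv2.COLOR_GRAY2BGR)  # restore 3 channels
    return cv2.resize(color, size)

def augment(image, angle_deg=15.0, scale=1.1):
    # Rotation + scaling augmentation; angle and scale ranges are illustrative.
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, scale)
    return cv2.warpAffine(image, m, (w, h), borderMode=cv2.BORDER_REFLECT)
```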
Figure 6. (a) The flow chart of transfer learning, in which the target predictive function f(·) is learned from D_T in T_T. (b) The diagram of how the layers in the pre-trained models (M_P) help to predict the degree of inhalation injury (D).
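The sketch below illustrates this transfer-learning setup with a torchvision GoogLeNet: the ImageNet-pre-trained feature layers are reused and only a new final classification layer is trained to predict the six injury degrees. The choice of framework and the layer-freezing strategy are assumptions for illustration, not necessarily the paper's exact configuration.

```python
# Transfer-learning sketch: reuse an ImageNet-pre-trained GoogLeNet and
# replace its classification head with a 6-way layer for the injury degrees.
import torch.nn as nn
from torchvision import models

NUM_DEGREES = 6  # degrees 1-6 in the proposed grading system

model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False              # keep pre-trained feature layers fixed
model.fc = nn.Linear(model.fc.in_features, NUM_DEGREES)  # new trainable head
```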
Figure 7. (a) Architecture of CNN-13, a typical 13-layer-deep CNN model (including 3 convolution layers). (b) Details of how the hidden layers in CNN-13 work.
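For comparison with the pre-trained networks, a baseline CNN along the lines of CNN-13 can be assembled from three convolutional blocks followed by a classifier, as in the sketch below. The exact layer arrangement and filter counts are illustrative assumptions, and CNN-25 follows the same pattern with six convolutional blocks.

```python
# Baseline CNN sketch in the spirit of CNN-13: three conv-bn-relu-pool blocks
# followed by a 6-way classifier. Filter counts are illustrative assumptions.
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

cnn13 = nn.Sequential(
    conv_block(3, 16),    # block 1
    conv_block(16, 32),   # block 2
    conv_block(32, 64),   # block 3
    nn.Flatten(),
    nn.LazyLinear(6),     # 6 output degrees; input size inferred at first call
)
```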
Figure 8. Architecture of CNN-25, a typical 25-layer-deep CNN model (including 6 convolution layers).
Figure 9. The accuracy for each degree with the different models, where the dot colors indicate the model: red (VGG-16), green (VGG-19), blue (SqueezeNet), yellow (ResNet-18), purple (ResNet-50), orange (GoogLeNet), rosy brown (CNN-13), and khaki (CNN-25). (a) Accuracy over each degree with augmentation. (b) Accuracy over each degree without augmentation. (c) The dataset distribution, containing both the testing set and the training set. (d) Comparison of sensitivity and specificity for each method with and without data augmentation on degree 2 only. (e) Comparison of average sensitivity and specificity for each method with and without data augmentation over all degrees.
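The per-degree sensitivity and specificity reported in Figure 9, as well as the overall metrics in Tables 2 and 3, can be derived from a multi-class confusion matrix in a one-versus-rest fashion. The sketch below assumes scikit-learn and NumPy; the function and variable names are illustrative.

```python
# One-versus-rest sensitivity and specificity per injury degree from a
# multi-class confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_sensitivity_specificity(y_true, y_pred, num_classes=6):
    cm = confusion_matrix(y_true, y_pred, labels=list(range(num_classes)))
    sens, spec = [], []
    for k in range(num_classes):
        tp = cm[k, k]
        fn = cm[k, :].sum() - tp          # degree-k samples predicted as other degrees
        fp = cm[:, k].sum() - tp          # other degrees predicted as degree k
        tn = cm.sum() - tp - fn - fp
        sens.append(tp / (tp + fn) if (tp + fn) else 0.0)
        spec.append(tn / (tn + fp) if (tn + fp) else 0.0)
    return np.array(sens), np.array(spec)
```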
Table 1. Experiment settings.
Hyperparameter                     Range
Initial learning rate (l)          10⁻⁵–10⁻³
Learning rate drop period (LP)     5–15
Learning rate drop factor (LF)     0.05–0.2
Max epochs (ME)                    10–50
Mini-batch size (MB)               2–6
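The paper does not state its training framework, but the Table 1 hyperparameters map naturally onto a standard configuration: the drop period and drop factor define a step learning-rate schedule, and the mini-batch size sets the data-loader batch size. The PyTorch sketch below is an illustrative assumption, with concrete values chosen from within the listed ranges and a placeholder model standing in for the retrained networks.

```python
# Illustrative mapping of the Table 1 hyperparameters to a PyTorch setup.
import torch
from torch.optim.lr_scheduler import StepLR

initial_lr = 1e-4   # l:  within 10^-5 to 10^-3
drop_period = 10    # LP: epochs between learning-rate drops (5-15)
drop_factor = 0.1   # LF: multiplicative drop factor (0.05-0.2)
max_epochs = 30     # ME: 10-50
mini_batch = 4      # MB: 2-6

model = torch.nn.Linear(1024, 6)  # placeholder head; see earlier sketches
optimizer = torch.optim.SGD(model.parameters(), lr=initial_lr, momentum=0.9)
scheduler = StepLR(optimizer, step_size=drop_period, gamma=drop_factor)
# loader = torch.utils.data.DataLoader(train_set, batch_size=mini_batch, shuffle=True)
# for epoch in range(max_epochs):
#     ...train one epoch on loader...
#     scheduler.step()  # drop the learning rate every `drop_period` epochs
```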
Table 2. Performance metrics of retrained models and typical CNN models with data augmentation.
Model        Precision   Sensitivity   Specificity   Accuracy   F1 Score
VGG-16       61.11%      61.11%        92.22%        61.11%     61.11%
VGG-19       80.56%      80.56%        96.11%        80.56%     80.56%
SqueezeNet   61.11%      61.11%        92.22%        61.11%     44.44%
ResNet-18    66.67%      66.67%        93.33%        66.67%     66.67%
ResNet-50    83.33%      83.33%        96.67%        83.33%     83.33%
GoogLeNet    86.11%      86.11%        97.22%        86.11%     86.11%
CNN-13       44.44%      44.44%        88.89%        44.44%     44.44%
CNN-25       36.11%      36.11%        87.22%        36.11%     36.11%
Table 3. Performance metrics of retrained models and typical CNN models without data augmentation.
Model        Precision   Sensitivity   Specificity   Accuracy   F1 Score
VGG-16       30.56%      30.56%        86.11%        30.56%     30.56%
VGG-19       52.78%      52.78%        90.56%        52.78%     52.78%
SqueezeNet   44.44%      44.44%        88.89%        44.44%     44.44%
ResNet-18    52.78%      52.78%        90.56%        52.78%     52.78%
ResNet-50    66.67%      66.67%        93.33%        66.67%     66.67%
GoogLeNet    70.27%      70.27%        94.05%        70.27%     70.27%
CNN-13       27.78%      27.78%        85.56%        27.78%     27.78%
CNN-25       27.78%      27.78%        85.56%        27.78%     27.78%