A Survey on AI Techniques for Thoracic Diseases Diagnosis Using Medical Images

Thoracic diseases refer to disorders that affect the lungs, heart, and other parts of the rib cage, such as pneumonia, novel coronavirus disease (COVID-19), tuberculosis, cardiomegaly, and fracture. Millions of people die every year from thoracic diseases. Therefore, early detection of these diseases is essential and can save many lives. Earlier, only highly experienced radiologists examined thoracic diseases, but recent developments in image processing and deep learning techniques are opening the door for the automated detection of these diseases. In this paper, we present a comprehensive review including: types of thoracic diseases; examination types of thoracic images; image pre-processing; models of deep learning applied to the detection of thoracic diseases (e.g., pneumonia, COVID-19, edema, fibrosis, tuberculosis, chronic obstructive pulmonary disease (COPD), and lung cancer); transfer learning background knowledge; ensemble learning; and future initiatives for improving the efficacy of deep learning models in applications that detect thoracic diseases. Through this survey paper, researchers may be able to gain an overall and systematic knowledge of deep learning applications in medical thoracic images. The review investigates a performance comparison of various models and a comparison of various datasets.


Introduction
Thoracic diseases are diseases of the organs within the rib cage, including heart and lung diseases. Lung diseases result in hypoxia and dyspnea. Furthermore, some diseases may cause the failure of the entire respiratory system and thus lead to death [1], such as the novel coronavirus disease (COVID-19), which emerged recently and became a pandemic that poses a threat to the entire world. There are several types of thoracic diseases [2,3]: (i) diseases that affect the airways or lungs, such as asthma, COVID-19, tuberculosis (TB), and chronic obstructive pulmonary disease (COPD); (ii) diseases that affect the heart, such as cardiomegaly and heart failure; (iii) other diseases affecting bones and muscles in the chest, such as fracture and lung metastasis. The World Health Organization (WHO) has classified pneumonia as the third-deadliest disease in the world after heart disease and cerebral palsy. In 2019, pneumonia caused 2.5 million deaths around the world [4] and accounted for 14% of all deaths of children under five years old, killing 672,000 children [5]. In addition, COVID-19 has caused more than 6.5 million deaths around the world since its emergence in 2019 [6]. Tuberculosis (TB) resulted in the death of approximately 1.5 million people in 2020.
Early detection refers to identifying symptomatic patients as early as possible; detection refers to the act of discovering something that was hidden or disguised; and diagnosis refers to the identification of the nature and cause of an illness. Thoracic diseases put human health at serious risk, and early detection of these diseases improves recovery and reduces mortality. The consultation provided by a thoracic expert or radiologist is solely responsible for the patient's diagnosis. However, there may be emergency situations where radiology professionals are too busy, unavailable, or unable to diagnose a large number of thoracic images rapidly [7,8]. Artificial intelligence (AI) systems can be extremely useful in this regard [9].
Diagnostics 2022, 12, 3034
AI is used to analyze, display, and understand complex medical and health data. Artificial intelligence is the ability of computer algorithms to approximate conclusions based solely on input data. The major goal of health-related AI applications is to figure out how clinical procedures affect patient outcomes. AI systems are employed in diagnostics, treatment protocol creation, drug research, personalized medicine, patient monitoring, and care. What distinguishes AI technology from traditional healthcare solutions is its ability to collect and process data and deliver a specific and fast result [10]. Artificial intelligence does this through the use of machine learning (ML) and deep learning (DL) algorithms. These algorithms are able to recognize patterns of behavior and develop their own logic.
ML allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values. There are four basic approaches to ML: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning [11]. The type of algorithm that data scientists choose depends on the type of data they want to predict. The learning techniques can be summarized as follows:
• Supervised learning: data scientists supply algorithms with labeled training data and define the variables they want the algorithm to assess for correlations. Both the input and the output of the algorithm are specified. Some of the most common algorithms in supervised learning include Support Vector Machines (SVM), Decision Trees, and Random Forest;
• Unsupervised learning: involves algorithms that train on unlabeled data. The algorithm scans through datasets looking for any meaningful connection. The data that the algorithms train on as well as the predictions or recommendations they output are predetermined;
• Semi-supervised learning: occurs when only part of the given input data has been labeled. Unsupervised and semi-supervised learning can be more appealing alternatives because relying on domain expertise to label data appropriately for supervised learning can be time-consuming and costly;
• Reinforcement learning: data scientists typically use reinforcement learning to teach a machine to complete a multi-step process for which there are clearly defined rules. Data scientists program an algorithm to complete a task and give it positive or negative cues as it works out how to complete the task. However, for the most part, the algorithm decides on its own what steps to take along the way.
Table 1 illustrates a simplified summary of the four types of ML approaches.
Machine learning models are trained using large amounts of input data in order to provide relevant insights and predictions. Currently, several datasets of thoracic images for different thoracic diseases are publicly available. The doctor's efficiency can be improved by AI systems, especially through DL.
AI is now widely applied in a variety of sectors, including medicine and the rapid detection of diseases. AI played a key role in developing a Coronavirus vaccine in record time [12]. In South Korea, artificial intelligence assisted doctors in learning the statistics of affected persons, allowing them to predict the Coronavirus outbreak at the start of the crisis. At a time when many governments around the world were still considering the idea of imposing a blanket closure owing to the pandemic, a business in Seoul (Seegene) employed artificial intelligence to develop tests to detect the Coronavirus in weeks, whereas it would have taken months without it [13]. "People thought the delta mutant would spread across the African continent, clogging hospitals with patients, but with AI, we can control it," explains a South African AI expert. Artificial intelligence has aided scientists in gaining a better understanding of how quickly the virus can change, as well as in developing and

Method | Advantage | Disadvantage
Supervised Learning | It performs classification and regression tasks. A notion of the output exists throughout the learning process. | It requires a labeled dataset.
Unsupervised Learning | It does not require the training data to be labeled. The classification task is fast. | There is no notion of the output during the learning process.

1. It provides a comprehensive overview of the use of AI in detecting thoracic diseases, including COVID-19;
2. It presents the different types of AI models used to detect thoracic diseases and the databases that include those diseases, as well as the progress of the works and the direction in which researchers have been moving in this domain in recent years;
3. It shows that CNNs have penetrated the field of medical image understanding with high accuracy;
4. It collects many different databases for thoracic diseases with descriptions;
5. It also presents the issues of thoracic disease detection using deep learning found in the literature.

Methodology
The methodology used to conduct this survey of recent thoracic disease detection using DL/ML models is as follows. First, an appropriate dataset containing a large number of images of the related diseases must be selected; this is described in detail in the next section. Second, the DL/ML algorithms that are applied to detect the related diseases must be identified; this is also described in detail in the next section. In the last stage, the performance of the model used in the detection of the disease is determined.

The Taxonomy of State-of-the-Art Work on Thoracic Diseases Detection Using DL/ML
In this section, a taxonomy of the recent work on thoracic diseases detection using DL/ML is presented, which is the first contribution of this paper.
The taxonomy consists of 9 traits that are common in the surveyed articles: image type, dataset description, image pre-processing, deep learning models, ways to train deep learning, the ensemble of classifiers, pre-trained models, type of disease, and evaluation criteria.

Imaging Thoracic Exams
Medical and health protocols recommend thoracic imaging because it is a rapid and painless technique. Infected patients, including children, adults, and the elderly, are now being assessed using image scans. Imaging systems that rely on AI technologies are provided with thousands of medical images so that these systems can identify abnormal masses that could indicate the onset of disease. These systems are then able to identify a specific area on the radiograph for the doctor to examine with greater accuracy, thus integrating AI techniques with the doctors' efforts [30]. Some thoracic diseases require radiological images (such as X-ray or CT scans) to detect the disease. Others require examination of the tissues themselves to confirm the presence of the disease. The most common examination types used to diagnose thoracic diseases [31] are:
• Chest X-ray (CXR): can be used to check for diseases such as pneumonia [32] and a lung infection that causes fluid buildup [33]. It can also be used to detect cancer or pulmonary fibrosis, which is a scar tissue buildup in the lungs. CXR scans are commonly used in clinical practice since they are inexpensive, simple to perform, quickly produce two-dimensional (2D) images of the patient, and can be widely used for the diagnosis and treatment of lung and cardiovascular diseases [34,35]. Although X-rays are frequently used, they have drawbacks such as exposure to ionizing radiation, which is harmful to the human body, and relatively low information content compared with other imaging methods;
• Computerized Tomography (CT): is a more advanced imaging test that can be used to detect disorders such as cancer that an X-ray could miss [36-39]. A CT scan is a series of X-rays taken from various angles that are patched together to create a complete image. While CT scans are more reliable in diagnosing COVID-19, they are less accurate in diagnosing non-viral pneumonia-like consolidation [40]. The CT scan is quick and provides very accurate spatial information, but its disadvantages are that the risk of exposure to radiation is high and that it requires expensive equipment and is therefore not always accessible to everyone;
• Histopathology: often known as histology, is the microscopic examination of organic tissues in order to observe the appearance of diseased cells [41]. A histopathology report describes the tissue that was sent for testing as well as the characteristics of the tumor under the microscope [42]. A histopathology report is also called a biopsy report or a pathology report. It can identify features of what cancer looks like under the microscope, or detect cardiomegaly disease [43]. Histology examination is low cost and allows an evaluation of infection distribution in various tissues. However, it needs 2-7 days of preparation time, might not detect low-level infection, and depends on the expertise of pathologists;
• Sputum Smear Microscopy: refers to the microscopic investigation of sputum [44]. This has been proved to be one of the most effective ways of detecting tuberculosis infection in patients so that treatment can begin [45]. Sometimes, both a chest X-ray and a sputum sample are needed to find out whether a person has tuberculosis [46]. In poor and middle-income countries, sputum smear microscopy has been the major approach for diagnosing pulmonary tuberculosis [47]. Sputum smear microscopy has a long track record, is inexpensive, and is used for the follow-up of patients on treatment. However, it is cumbersome for laboratory staff and patients and needs two samples;
• Magnetic Resonance Imaging (MRI): is a type of scan that uses powerful magnetic fields and radio waves to provide detailed images of the inside of the body. An MRI scanner is a huge tube with powerful magnets within. During the scan, the patient lies inside the tube. MRI scans can be used to investigate practically any region of the body, including the brain, breast, and heart [48]. MRI has the advantages of being a 3D technique, being safer (no ionizing radiation), and providing excellent soft-tissue contrast. However, it has long total scan times (30-75 min), is not as readily accessible, and can cause claustrophobia because of the enclosed space.

Image Pre-Processing
The main goal of image pre-processing and segmentation is to enhance the quality of the input image and reduce the amount of noise. Images must match the network's input size in order to train a network model and make predictions on new data. Images can be re-scaled or cropped to the desired size if they need to be modified to fit the network. Applying random augmentation to the data effectively increases the amount of training data [93]. Image augmentation creates many variations of the original images. The augmentation process may include cropping, flipping, brightness, saturation, and contrast adjustment, rotation, scaling, translation, zooming, and/or adding noise, as shown in Figure 1. The figure illustrates different variations of the input X-ray image, including horizontal and vertical shift, horizontal and vertical flip, rotation, brightness, and zoom, produced using the Keras ImageDataGenerator class. As an example of data augmentation pre-processing, in [94], the authors used data augmentation to diagnose pneumonia disease and achieved an accuracy of 96.61%. For image classification tasks, a deep learning model with image augmentation outperforms a deep learning model without it in terms of training loss, accuracy, and validation loss. Augmentation can also help address the problem of imbalanced classes in binary classification [95]. When training a binary classification model, the resulting model will be biased if one class has more samples than the other. Other advanced methods are used to handle imbalanced datasets, such as the synthetic minority oversampling technique (SMOTE) [96], generative adversarial networks (GAN) [97,98], and the adaptive synthetic sampling method (ADASYN) [99]. In [98], the authors used a GAN-based method to detect lung cancer and achieved an accuracy of 99.86%.
Image segmentation is used to perform operations on images to detect patterns and retrieve information from them. It is the process of splitting a digital image into several regions or objects, each of which is made up of sets of pixels with similar features or attributes and is labeled differently to represent a distinct region or object. The purpose of segmentation is to make an image more understandable and easier to evaluate by simplifying and/or changing its representation, as in [100], which achieved an AUC of 97.7% for segmentation. In [101], the authors achieved an accuracy of 96.47% without segmentation and 98.6% with segmentation.
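The augmentation operations described above can be illustrated with a minimal sketch in plain Python. It is not the Keras ImageDataGenerator implementation; the function names, the tiny 3 × 3 "image", and the parameter ranges are assumptions chosen only to show how each variation is derived from one original image.

```python
import random

def horizontal_flip(img):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]

def vertical_flip(img):
    """Reverse the order of the rows (top becomes bottom)."""
    return [row[:] for row in reversed(img)]

def shift_right(img, k, fill=0):
    """Shift pixels k columns to the right, padding the left edge with `fill`."""
    width = len(img[0])
    k = max(0, min(k, width))
    return [[fill] * k + row[:width - k] for row in img]

def adjust_brightness(img, factor):
    """Scale intensities by `factor`, clipping to the 8-bit range [0, 255]."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]

def augment(img, rng):
    """Return one randomly chosen variation of `img` (hypothetical helper)."""
    ops = [
        horizontal_flip,
        vertical_flip,
        lambda im: shift_right(im, rng.randint(1, 2)),
        lambda im: adjust_brightness(im, rng.uniform(0.7, 1.3)),
    ]
    return rng.choice(ops)(img)

# Toy grayscale "image"; a real chest X-ray would be far larger.
image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
variants = [augment(image, rng) for _ in range(4)]
```

Each call to `augment` yields a new training sample with the same label as the original, which is how augmentation enlarges a dataset or rebalances a minority class.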

Deep Learning Models
Deep learning has become very popular in the field of scientific computing, and its algorithms are widely used to solve complex problems in medical applications. Deep learning algorithms employ several types of neural networks to perform specific tasks such as speech recognition, image recognition, data compression, machine translation, data visualization, and image classification. Deep learning supports the classification, quantification, and identification of medical images. DL is a form of neural network learning characterized by the number of layers [102], and it refers to systems that learn from experience on large data sets. Deep learning is predicated on the concept of extracting features from input data using many layers, each of which finds different elements that are important to the input data. Data classification is very important in the medical field to generate statistics about the causes of illness and causes of death. Many varieties of deep learning algorithms are used in different applications, as the nature of the data determines which algorithms are suitable. The most widely used deep learning algorithms are as follows:

Convolutional Neural Networks (CNNs)
For image classification, CNN is one of the most commonly used deep neural network types [103]. Unlike a conventional artificial neural network (ANN), where the input is a vector, here the input is a multi-channeled image. CNN operates by extracting features from images directly [104]. The essential features are not predefined; they are learned while the network trains on a set of images. This automated feature extraction makes deep learning models more accurate for computer vision tasks such as object classification [105]. CNN learns to detect distinct aspects of an image using many hidden layers. CNN is formed by three main types of layers (convolutional layer, pooling layer, and fully connected layer) [106,107] as shown in Figure 2. The description of these layer types is as follows:
• Convolutional layer has a set of filters (or kernels). A kernel or filter is a small collection of weights that is slid across the input. At each position, it performs a convolution operation, a linear operation in which the weights are multiplied (as a dot product) by the corresponding patch of the input [108]. The products are added together to obtain a single output value;
• Pooling layer is applied to the feature maps produced by a convolutional layer. It provides an approach for downsampling feature maps by summarizing the presence of features in patches of the feature map, which reduces the number of parameters and calculations in the network. It helps the network recognize complex objects in the image and helps prevent overfitting. Average pooling and max pooling are two common pooling algorithms, which summarize the average presence and the most activated presence of a feature, respectively;
• Fully connected layer connects all of the neurons from the previous layer and assigns each connection a weight. Each output node in the output layer represents a class's score.
Multiple convolutional-pooling layers are merged to generate a deep architecture of nonlinear transformations, which helps to create a hierarchical representation of an image, facilitating the learning of complex relationships.
CNN is widely used in image classification because it achieves high accuracy with a low error rate. However, it has some disadvantages: because a CNN has multiple layers, the training process takes a lot of time and requires a large data set [109].
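The convolution and pooling operations described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a framework implementation; the 5 × 5 input and the vertical-edge kernel are hypothetical values, and the sketch assumes stride 1 with no padding for convolution and non-overlapping windows for pooling.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D convolution as used in CNNs.

    Strictly speaking this is cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Dot product between the kernel and the image patch at (i, j).
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Toy image: dark left half, bright right half.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hypothetical hand-set filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d(image, kernel)  # [[3, 3, 0], [3, 3, 0], [3, 3, 0]]
pooled = max_pool2d(feature_map)     # [[3]]
```

In a trained CNN the kernel weights are learned rather than hand-set; the feature map is large where the patch matches the kernel pattern, and pooling keeps only the strongest response per window.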

Recurrent Neural Networks (RNNs)
RNNs are widely employed in image captioning, speech recognition, time series analysis, machine translation, and natural language processing (NLP). RNNs use a form of feedback in which the output is fed back into the input, as shown in Figure 3. This forms a loop that passes data from the output of the network back to its input. The nodes in different layers of the neural network are compressed to form a single recurrent layer. 'x' is the input layer, 'h' is the hidden layer, and 'y' is the output layer. A, B, and C are the network parameters used to improve the output of the model. At any given time t, the current input is a combination of input at x(t) and x(t − 1). The output at any given time is fed back to the network to improve the output. The previous elements of a sequence determine the output of the RNN. Therefore, RNNs are able to remember previous data and use that information in their prediction [110]. An RNN retains information from earlier steps of the sequence and uses this prior information to anticipate subsequent outputs. However, there are some drawbacks: computation is slow, training can be difficult, and very long sequences cannot be processed when ReLU is used as the activation function. Therefore, RNN offers less feature compatibility than CNN [111].
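The feedback loop described above can be made concrete with a minimal scalar sketch. The assignment of A, B, and C to the input, recurrent, and output weights is an assumption made for illustration (the text names the parameters but does not fix their roles), and tanh is assumed as the activation function.

```python
import math

def rnn_forward(xs, A, B, C, h0=0.0):
    """Run a one-unit recurrent network over the sequence `xs`.

    Assumed roles: A weights the current input, B weights the previous
    hidden state (the feedback loop), and C maps the hidden state to the output.
    """
    h = h0
    outputs = []
    for x in xs:
        h = math.tanh(A * x + B * h)  # new state depends on input and old state
        outputs.append(C * h)
    return outputs

with_memory = rnn_forward([1.0, 0.0], A=1.0, B=1.0, C=1.0)
without_memory = rnn_forward([1.0, 0.0], A=1.0, B=0.0, C=1.0)
```

With `B = 1.0` the second output is non-zero even though the second input is zero, because the hidden state carries information from the first step; with `B = 0.0` the feedback is cut and the second output is exactly zero.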

Deep Belief Networks (DBNs)
DBN is a type of deep neural network that comprises a large number of hidden units, with connections between layers but not between units within each layer, as shown in Figure 4. Restricted Boltzmann Machines (RBMs) are a binary variant of factor analysis. Instead of having multiple factors, the network output is determined by a binary variable. DBN can be used to extract the in-depth features of the original data. Object recognition, video sequences, and motion capture data are all processed using DBN applications [112,113].
A deep belief network is especially useful when limited training data are available. DBN is robust in classification to variations in size, position, color, and viewing angle (rotation). The same neural network approach in a DBN can be applied to various applications and data types. However, there are some drawbacks: it requires huge amounts of data to perform better than techniques such as the CNN model, has demanding hardware requirements, and requires classifiers to interpret the output [114].

Multilayer Perceptron (MLP)
MLP is a feedforward neural network made up of multiple layers of perceptrons with activation functions and is a fully connected class of Artificial Neural Network (ANN), where ANN refers to models of human neural networks that are designed to help computers learn. It consists of a large number of highly interconnected processing elements called neurons, and one or more hidden layers. MLPs are made up of at least three fully connected layers: input, hidden, and output layers, as shown in Figure 5. An MLP might have several hidden layers, and MLPs are employed in machine translation software, complex signal processing, speech recognition, and image recognition [115].
The MLP model is one of the simplest and most widely used types of artificial neural networks, and it works well with both small and large input data. However, one of its drawbacks is that the calculation process is difficult and takes a long time [116].
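As a minimal sketch of the input, hidden, and output layers described above, the forward pass of a small MLP can be written in plain Python. The weights below are hand-picked (hypothetical, not trained) so that the network computes XOR, a function that a single perceptron cannot represent; the ReLU hidden activation and sigmoid output are assumptions of this example.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """Fully connected layer: every input feeds every output neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> one hidden layer (ReLU) -> output layer (sigmoid)."""
    hidden = dense(x, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)

# Hand-picked weights (assumption for illustration): two hidden neurons.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUT_W = [[10.0, -20.0]]
OUT_B = [-5.0]

def predict_xor(a, b):
    score = mlp_forward([a, b], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)[0]
    return 1 if score > 0.5 else 0

results = [predict_xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

In practice the weights are learned by backpropagation rather than set by hand; the sketch only shows how data flow through the fully connected layers.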

Ways to Train Deep Learning Models
A pre-trained model is one that has been trained on a large dataset to handle a problem similar to the one we are working on. There are three ways to train a deep learning model: learning from scratch, transfer learning, and fine-tuning.

• Learning from scratch requires collecting a large labeled dataset and designing a network architecture to learn the features that may then be used as input to a model (i.e., feature extractor). Image features may be extracted automatically by a model, as in the CNN model, or manually using hand-crafted methods such as Histogram of Oriented Gradients (HOG), Intensity Histograms (IH), Scale Invariant Feature Transform (SIFT), Local Binary Patterns (LBP), and Edge Histogram Descriptor (EHD) [117]. This strategy is useful for applications with a large number of output classes, but it needs more time to train a model [118];
• Transfer learning is the process of transferring information from one model to the next, allowing for more accurate model creation with less training data, as shown in Figure 6. Instead of starting the learning process from scratch, transfer learning begins with patterns learned while solving a previous problem, allowing for faster progress and improved performance while tackling the second problem [119]. Many studies use transfer learning to enhance their model performance, such as the ones in [94,101,120-122];
• Fine-tuning is a common technique for transfer learning. It consists of making minor adjustments in order to obtain the desired result or performance, using the weights of a pre-trained neural network model as initialization for a new model trained on data from the same domain. Except for the output layer, the target model copies the whole model design and its parameters from the source model and fine-tunes them based on the target dataset. The target model's output layer, on the other hand, must be trained from scratch. Fine-tuning thus reuses the weights of a previously trained deep learning model to build another, similar model, as in [32,123,124]. Because the model already holds crucial knowledge from the previous training, fine-tuning dramatically reduces the time required to develop and train a new deep learning model. Fine-tuning can be used when the amount of data available for the new deep learning model is limited, but only if the datasets of the current model and the new model are similar [125].
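The idea of reusing learned weights while training a new output layer from scratch can be sketched as follows. This is a deliberately simplified illustration in plain Python: the "pre-trained" weights are hypothetical stand-ins for a source model, the reused layer is kept frozen rather than fine-tuned, and the new output layer is a logistic regression trained by stochastic gradient descent on a toy target dataset.

```python
import math

# Hypothetical "pre-trained" weights, standing in for a source model's layer.
PRETRAINED_W = [[1.0, -1.0], [0.5, 0.5]]

def extract_features(x, backbone):
    """Reused layer (ReLU units); its weights are read but never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in backbone]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(data, backbone, epochs=200, lr=0.5):
    """Train only the new output layer, starting from zero weights."""
    w = [0.0] * len(backbone)
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            f = extract_features(x, backbone)
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            g = p - y  # gradient of the log-loss w.r.t. the pre-activation
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(x, backbone, w, b):
    f = extract_features(x, backbone)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)

# Toy target dataset (assumption): four labeled samples, two per class.
target_data = [([2.0, 0.0], 1), ([1.5, 0.5], 1),
               ([0.0, 2.0], 0), ([0.5, 1.5], 0)]

head_w, head_b = train_head(target_data, PRETRAINED_W)
```

Full fine-tuning would additionally update the reused weights, usually with a small learning rate; freezing them, as here, is the cheaper variant that is often preferred when the target dataset is very small.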

Ensemble Learning
Ensemble learning is the process of strategically generating and combining several models, such as classifiers, to solve a specific problem [126]. It is largely used to improve a model's performance (classification, prediction, function approximation, etc.) or to lower the chance of a poor model selection. It can also be used for assigning a confidence level to the model's decision, data fusion, incremental learning, non-stationary learning, selecting optimal (or near-optimal) features, and error correction. The classifiers may be Support Vector Machine (SVM), SoftMax, Decision Trees, or Naïve Bayes classifiers. Voting schemes [127,128], bagging [129], boosting [130,131], and stacking [132,133] are the most commonly used ensemble learning algorithms.
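Of the ensemble schemes listed above, hard (majority) voting is the simplest to illustrate. The sketch below assumes each classifier has already produced one label per sample; the three prediction lists are hypothetical values, not results from any surveyed study.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists by hard (majority) voting.

    `predictions` holds one list of labels per classifier, all of equal
    length. Ties go to the label that appears first among the tied votes.
    """
    combined = []
    for votes in zip(*predictions):  # votes for one sample across classifiers
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical outputs of three classifiers on four chest images.
svm_pred = ["pneumonia", "normal", "normal", "pneumonia"]
tree_pred = ["pneumonia", "pneumonia", "normal", "normal"]
bayes_pred = ["normal", "normal", "normal", "pneumonia"]

ensemble_pred = majority_vote([svm_pred, tree_pred, bayes_pred])
```

Each individual classifier disagrees with the others on at least one image, but the ensemble follows whichever label two of the three agree on, which is how voting reduces the impact of a single poor model.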

Pre-Trained Models
As mentioned before, transfer learning is a machine learning method where we reuse a pre-trained model as the starting point for a model on a new task as shown in Figure 6. The following are many pre-trained models for image classification and segmentation as: • Visual Geometry Group (VGG) is the most familiar model for image classification.
It is a standard CNN with multiple layers [134]. The VGG models are VGG-16 and VGG-19, which supports 16 and 19 convolutional layers, respectively, trained on the ImageNet (ImageNet is a database with over 14 million images divided into 1000 categories). VGG-16 takes a long time to train compared to other models, and this can be a disadvantage when we are using large datasets. The main feature of this architecture is that it focuses on basic 3 × 3 size kernels rather than a large number of hyper-parameters (a kernel is a matrix of weights that are multiplied with the input to improve the output in a preferred manner) in the convolutional layers and the max-pooling layers of 2 × 2 size. Finally, it has two fully connected (FC) layers for output, followed by a Softmax classifier. The VGG's weight configuration is publicly available and has been utilized as a baseline feature extractor in a variety of other applications and challenges. VGG-19 differs from VGG-16 in that each of the three convolutional blocks has an extra layer [135]. The work in [136] used VGG-16 for the classification of 14 different thoracic diseases and the work in [137] used the same model for COVID-19 detection. The work in [138] used VGG-19 for the detection of tuberculosis and the work in [139] used VGG-19 in the detection of pneumonia; • Inception-V3 Szegedy et al. invented a type of CNN in 2014 [140]. Inception v3 is an image recognition model that has been shown to attain greater than 78.1% accuracy on the ImageNet dataset [141]. Inception models are different from typical CNNs in that they are made up of inception blocks, concatenating the results of many filters on the same input tensor. The model itself is made up of symmetric and asymmetric building blocks, including convolutions, average pooling, max pooling, concatenations, dropouts, and fully connected layers. Batch normalization is used extensively throughout the model and applied to activation inputs. Loss is computed using Softmax. 
Inception-V3 is a newer version of the Inception model, first released in 2015. Each block contains parallel convolutional layers with three different filter sizes (1 × 1, 3 × 3, and 5 × 5), together with 3 × 3 max pooling. The outputs are passed on to the next unit in sequence. It accepts an input image size of 299 × 299 pixels [142]. In [119], the authors used this model for the detection of lung nodules;
• ResNet50 is a deep CNN used to classify images. It is a variant of the ResNet model that has 48 convolutional layers along with one max-pooling layer and one average-pooling layer [143]. Its major innovation is the use of residual layers to create a new in-network architecture. ResNet50 comprises five convolution blocks, each having three layers of convolution. It accepts images with a resolution of 224 × 224 pixels and is 50 layers deep [144]. The work in [120,145] used this model in the classification of 14 different thoracic diseases;
• Inception-ResNet-V2 is an ImageNet-trained CNN. The network is 164 layers deep and can classify images into 1000 object categories [141]. It is a hybrid approach that combines the Inception structure with residual connections. It accepts 299 × 299 pixel images and generates a list of estimated class probabilities. The conversion of Inception modules into residual Inception blocks, the addition of more Inception modules, and the creation of a new type of Inception module (Inception-A) following the Stem module are among the advantages of Inception-ResNet-V2 [146];
• DenseNet201 is a 201-layer CNN that receives a 224 × 224 pixel input image. DenseNet201 is a ResNet upgrade that adds dense layer connections: it connects each layer to every subsequent layer in a feed-forward fashion. A DenseNet with L layers has L(L + 1)/2 direct connections, whereas a standard convolutional network with L layers has L connections. In DenseNet, each layer obtains additional inputs from all preceding layers and passes on its own feature maps to all subsequent layers. The inputs are combined by concatenation, so each layer receives a "collective knowledge" from all preceding layers. Since each layer in DenseNet receives all preceding layers as input, it has more diversified features and tends to have richer patterns [147]. By reducing the amount of computation required, encouraging feature reuse, minimizing the number of parameters, and reinforcing feature propagation, DenseNet can enhance the model's performance [148];
• MobileNet-V2 is an improved version of MobileNet-V1 that is trained on the ImageNet database. It contains only 54 layers and takes a 224 × 224 pixel input image. MobileNetV2 contains an initial fully convolutional layer with 32 filters, followed by 19 residual bottleneck layers [149]. Its key distinctive feature is that it uses depthwise separable convolutions instead of a single standard 2D convolution. That is, a depthwise convolution is followed by a 1 × 1 pointwise convolution. As a result, training takes up less memory and requires fewer parameters, resulting in a small and efficient model. There are two types of blocks: a residual block with a stride of 1 and a downsizing block with a stride of 2. Each block has three layers: a 1 × 1 convolution with ReLU6, a depthwise 3 × 3 convolution with ReLU6, and another 1 × 1 convolution without nonlinearity. MobileNetV2 is a mobile-oriented model that can be used to solve a variety of visual recognition tasks (classification, segmentation, or detection) [150]. The work in [151] used MobileNet-V2 in the classification of 14 different thoracic diseases, and the work in [101] used this model for the detection of tuberculosis;
• Xception is a 71-layer CNN presented by Chollet [152]. It features depthwise separable convolutions and is a more advanced version of the Inception architecture: the traditional Inception modules are replaced by depthwise separable convolutions. It outperforms VGG16, ResNet, and Inception on conventional classification tasks. It uses a 299 × 299 pixel input image [152];
• NASNet is a type of convolutional neural network discovered through neural architecture search. It has been trained on over a million images from ImageNet, so the network has learned rich feature representations for a wide variety of images. Normal and reduction cells are the basic building blocks [153]. The network accepts 331 × 331 pixel images as input [154]. The work in [135] used this model in lung cancer detection;
• U-Net is a convolutional network architecture for fast and precise semantic segmentation of images. It is used for biomedical image segmentation [155]. In the U-Net model, the input images go through several stages of convolution and pooling in the down-sampling path, which reduce the height and width of the image while the depth grows after each convolution, followed by fully convolutional layers and several stages of up-sampling to produce the image mask [156]. The segmentation image size is 512 × 512 pixels [157,158]. In [159], the authors used this model for segmentation of thoracic fractures, and in [100], the authors used U-Net in segmentation of cardiomegaly.
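The connection count quoted for DenseNet and the parameter savings attributed to depthwise separable convolutions (MobileNet-V2, Xception) can be checked with simple arithmetic. The following is a minimal sketch, not taken from the surveyed papers: the function names are hypothetical, bias terms are ignored, and a depthwise separable convolution is assumed to be a k × k depthwise convolution followed by a 1 × 1 pointwise convolution.

```python
def dense_connections(num_layers: int) -> int:
    """Direct connections in a densely connected network of L layers: L(L + 1)/2."""
    return num_layers * (num_layers + 1) // 2


def standard_conv_weights(kernel: int, c_in: int, c_out: int) -> int:
    """Weight count of a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_weights(kernel: int, c_in: int, c_out: int) -> int:
    """Weight count of a depthwise convolution plus a 1 x 1 pointwise convolution."""
    depthwise = kernel * kernel * c_in   # one kernel x kernel filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer sizes only; they are not taken from any cited model.
    print(dense_connections(5))
    print(standard_conv_weights(3, 32, 64))
    print(depthwise_separable_weights(3, 32, 64))
```

For a 3 × 3 layer with 32 input and 64 output channels, the separable form needs roughly an eighth of the weights of the standard convolution, which is the source of the memory and parameter savings described above.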

Evaluation Criteria
The final step is to use a loss function or a confusion matrix Cij to determine the number of observations that were classified correctly or incorrectly. The loss function measures the difference between the expected outcome and the predicted output. From the loss function, we can derive the gradients that are used to update the weights. For a data point Yi and its predicted value Yj, where n is the total number of data points in the dataset, the mean squared error (MSE) is defined as in Equation (7). The observed i and predicted j outcome values are compared as shown in Figure 7. The confusion matrix shows the number of correct and incorrect predictions categorized by type of outcome [160]. Recall, Precision, Specificity, Accuracy, Area Under the Curve (AUC), and the Receiver Operating Characteristic (ROC) curve can be measured using a confusion matrix.
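As an illustration of the criteria named above, the sketch below computes the MSE and the standard confusion-matrix metrics for a binary problem. It uses the textbook definitions (an assumption, since the survey's own equations are not reproduced here), and the function names and example counts are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard metrics derived from the four cells of a 2 x 2 confusion matrix."""
    recall = tp / (tp + fn)                  # sensitivity, true positive rate
    precision = tp / (tp + fp)               # positive predictive value
    specificity = tn / (tn + fp)             # true negative rate
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in binary_metrics(tp=80, fp=10, fn=20, tn=90).items():
        print(f"{name}: {value:.4f}")
```

These definitions assume that none of the denominators is zero; a production implementation would need to handle empty classes explicitly.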

Lung Diseases
These chest diseases affect the structure of the lung tissue, airways, or any part of the respiratory system, causing it to become scarred or inflamed, which makes the lungs unable to fully expand [181]. Diseases such as pneumonia, MERS-CoV, edema, and consolidation appear as opacities on chest radiographs [182]. Tables 3-5 summarize the studies on using deep learning to diagnose lung diseases. Table 3 includes the following diseases: Pneumonia, Fibrosis, Lesion, Pleural Thickening, Asbestosis Signs, Edema, Lung Metastasis, and Consolidation.

Pneumonia
Pneumonia is an infection that causes breathing difficulties by inflaming the air sacs in one or both lungs.
Idri et al. [94] compared current deep CNN architectures used for transfer learning (VGG-16, VGG-19, ResNet-50, DenseNet-201, Inception-ResNet-V2, Inception-V3, MobileNet-V2, and Xception) against a retrained baseline CNN to establish the best-performing architecture for two-class categorization (pneumonia and normal) based on X-ray images. The OCT and COVID Chest X-ray datasets were used. They determined that the fine-tuned version of ResNet50 performs exceptionally well, with rapid increases in training and testing accuracy (more than 96%). Dey et al. [139] presented a Deep-Learning System (DLS) to diagnose lung diseases based on X-ray images. The study makes use of traditional chest radiographs as well as chest radiographs that have been processed with a threshold filter. Standard DL models with a SoftMax classifier, including AlexNet, VGG16, VGG19, and ResNet50, are used for the first experimental evaluation on the ChestX-ray8 dataset. The results showed that VGG-19 has the highest classification accuracy, 86.97%, among these approaches. They then used the Ensemble Feature Scheme to modify the VGG19 network to identify pneumonia. VGG19 with an RF classifier reached a higher accuracy of 95.70%. When the same experiment was conducted with chest radiographs that had been processed with a threshold filter, the classification accuracy of VGG19 with the RF classifier was 97.94%.
Sirazitdinov et al. [127] proposed an automated model for detecting and localizing pneumonia on chest X-ray images. For pneumonia identification and localization, they suggested an ensemble of two convolutional neural networks, Mask R-CNN and RetinaNet. RetinaNet is a one-stage object detection model that uses a focal loss function to address class imbalance during training. The RetinaNet backbone uses ResNet and Feature Pyramid Network (FPN) structures. Based on the FPN structure, a top-down pathway and lateral (horizontal) connections are added. Each level of the FPN is connected to fully convolutional subnetworks, namely two independent subnets that are used for classification and regression. Mask R-CNN is a convolutional neural network that is state of the art in image segmentation; it detects objects in an image and generates a high-quality segmentation mask for each instance.
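To make the role of the focal loss concrete, the sketch below evaluates the commonly used binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), for a single prediction. It is a minimal illustration rather than the implementation used in [127]; the function name is hypothetical, and the defaults gamma = 2 and alpha = 0.25 are assumptions chosen for illustration.

```python
import math


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one example.

    p is the predicted probability of the positive class and y is the label (0 or 1).
    gamma down-weights well-classified examples; alpha balances the two classes.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)            # keep log() finite
    p_t = p if y == 1 else 1.0 - p             # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # A confident correct prediction contributes far less than a confident wrong one.
    print(focal_loss(0.9, 1))
    print(focal_loss(0.1, 1))
```

With gamma = 0 the expression reduces to class-weighted cross-entropy, so gamma controls how strongly the abundant easy examples (typically background regions) are suppressed during training.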
For the detection of pneumonia, the Faster R-CNN-based technique was used. They used the Kaggle Pneumonia Detection Challenge dataset, which contains 26,684 X-ray images of pneumonia. The recall score was 79.3%.

Fibrosis
Pulmonary fibrosis is characterized by scarred and damaged lung tissue. This thick, stiff tissue makes it difficult for the lungs to function properly, and as pulmonary fibrosis worsens, patients become increasingly short of breath.
Christe et al. [172] presented a CNN model for the classification and diagnosis of pulmonary fibrosis disease by using CT images. They used three datasets: Lung Tissue Research Consortium Database (LTRC-DB), the Multimedia Database of Interstitial Lung Diseases (MD-ILD), and the Inselspital Interstitial Lung Diseases Database (INSEL-DB). They used the random forest (RF) classifier that was able to recommend a radiological diagnosis. The output accuracy is 81%, and the F1-score is 80%.
Fu et al. [184] developed and tested an elegant convolutional neural network (CNN) for histological image segmentation, particularly those containing Masson's trichrome stain. There are 11 convolutional layers in the network. The CNN model was trained and tested on a 72-image dataset of cardiac histology pictures (labeled fibrosis, myocytes, and background). The segmentation performance of the model was excellent, with a test mean dice similarity coefficient (DSC) of 0.947.
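The Dice similarity coefficient reported above measures the overlap between a predicted mask and a reference mask as 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flattened binary masks; the function name and the toy masks are hypothetical, and returning 1.0 for two empty masks is a convention assumed here.

```python
def dice_coefficient(pred, target) -> float:
    """Dice similarity coefficient between two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]   # toy 4-pixel masks for illustration
    reference = [1, 0, 0, 0]
    print(dice_coefficient(predicted, reference))
```

For multi-class segmentation (for example fibrosis, myocytes, and background), the coefficient is typically computed per class and then averaged, which is one way a mean DSC can be obtained.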

Lesion
Pulmonary lesions, pulmonary nodules, lung nodules, white spots, and lesions are various words for the same thing: an abnormality in the lungs. They are distinct, well-defined spherical opacities with a diameter of less than or equal to 3 cm (about 1.2 in) that are entirely surrounded by lung tissue, do not touch the lung root or mediastinum, and are not associated with enlarged lymph nodes, collapsed lung, or pleural effusion. A pulmonary nodule might be malignant or benign.
Zhang et al. [135] used NASNet for lung cancer detection. Chen et al. [186] introduced a faster region convolutional neural network (Faster R-CNN) that has been effectively used for computed tomography nodule candidate detection. Before nodule detection, they performed nodule enhancement and segmentation.
To categorize pulmonary images, Wang et al. [119] employed a DCNN model based on a pre-trained Inception-v3 to create a viable and practicable computer-aided diagnostic model. The computer-aided diagnostic approach could help clinicians diagnose thoracic disorders more accurately and quickly. They employed the fine-tuned Inception-v3 model based on transfer learning and a variety of classifiers (Softmax, Logistic, and SVM). They worked with the JSRT dataset. The sensitivity of the model was 95.41%, and the specificity was 80.09%.

Pleural Thickening
Pleurisy is a disease that causes thickening of the lung lining (the pleura), which may cause chest pain and difficulty breathing.
Guan et al. [171] proposed an attention-guided convolutional neural network (AG-CNN) that avoids noise and improves alignment by learning from disease-specific regions. AG-CNN is divided into three branches. Five convolutional blocks with batch normalisation and ReLU make up each of the global and local branches. A max pooling layer, a fully connected (FC) layer, and a sigmoid layer are then connected to each of them. Unlike the global branch, the local branch takes as input a local lesion patch that is cropped using the mask formed by the global branch. The fusion branch is then created by concatenating the max pooling layers of these two branches. In training, they first trained the global CNN branch on the global images. Then, they used the attention heat map obtained by the global branch to infer a mask and crop a discriminative region from the image.
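The mask-and-crop step described for the local branch can be sketched as follows. This is an illustration under stated assumptions, not the procedure of [171]: the heat map is assumed to be already resized to the image grid, the mask is obtained with a fixed hypothetical threshold, and the crop is the bounding box of all cells at or above that threshold.

```python
def attention_bounding_box(heatmap, threshold: float):
    """Bounding box (top, left, bottom, right), inclusive, of cells >= threshold.

    Returns None when no cell reaches the threshold.
    """
    rows = [r for r, row in enumerate(heatmap) if any(v >= threshold for v in row)]
    cols = [c for row in heatmap for c, v in enumerate(row) if v >= threshold]
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)


def crop(image, box):
    """Crop a 2D image (list of rows) to an inclusive bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


if __name__ == "__main__":
    # Toy 4 x 4 attention map and image; values are illustrative only.
    heatmap = [
        [0.1, 0.2, 0.1, 0.0],
        [0.2, 0.9, 0.8, 0.1],
        [0.1, 0.3, 0.7, 0.2],
        [0.0, 0.1, 0.2, 0.1],
    ]
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    box = attention_bounding_box(heatmap, threshold=0.7)
    print(box, crop(image, box))
```

The cropped patch would then be resized and passed to the local branch, so the quality of the crop depends directly on how well the global branch's attention aligns with the lesion.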
The ChestX-ray14 dataset was used to train and test the model. The average AUC for AG-CNN was 86.8%; when DenseNet-121 was used, the average AUC was 87.1%.
For clinical applications, abnormalities need to be localized as well as categorized. Further training of these models to locate abnormalities could address this problem; however, doing so accurately will require a significant number of disease localisation annotations from clinical experts.
Ouyang et al. [145] employed a hierarchical attention mining framework that unites activation- and gradient-based visual attention in a holistic manner, as well as an attention-driven weakly supervised algorithm. The three levels of attention mechanisms in the hierarchical attention mining framework are foreground attention, positive attention, and abnormal attention. The ChestX-ray14 and CheXpert datasets were used in their investigation. The average AUC for the ChestX-ray14 dataset is 83.5%. The AUC of ResNet50 and ResNet152 increased to 88.8% and 89.5%, respectively, when transfer learning was used.
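Since several of the studies above report AUC, a small sketch of how it can be computed from labels and scores may be useful. The version below uses the pairwise (Mann-Whitney) interpretation: the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the example data are hypothetical.

```python
def roc_auc(labels, scores) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    labels = [0, 0, 1, 1]                 # illustrative data only
    scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(labels, scores))
```

For multi-label datasets such as ChestX-ray14, an average AUC is usually obtained by computing this quantity separately for each disease label and averaging the results.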

Asbestosis Signs
Asbestosis is a long-term lung illness caused by inhaling asbestos fibers. Long-term exposure to these fibers causes lung tissue scarring and shortness of breath. The disease's symptoms can range from minor to severe, and they normally do not show up for several years following persistent exposure.
Using medical CT data, Myong et al. [167] presented a Long-term Recurrent Convolutional Network (LRCN) model capable of recognizing the existence and severity of asbestosis. The LRCN model combines CNN and RNN models: it processes the variable-length visual input with a CNN, whose outputs are fed into a stack of recurrent sequence models, here long short-term memory (LSTM) networks. The final output from the sequence models is a variable-length prediction. A pre-trained DenseNet161 is used for the CNN model (transfer learning). They used private data from 469 patients who had been screened for asbestosis at Seoul St. Mary's Hospital in Korea. The purpose of this study was to employ LSTM, a special type of RNN, to address the image classification problem with CT data. The model achieved an accuracy of 83.3%, with a true positive rate of 81.578% and a true negative rate of 86%. Additionally, Grad-CAM visualizations were provided so that an expert can inspect the basis of the model's judgement and check its validity.
Kim et al. [187] developed an approach based on lung segmentation and a deep learning model for recognizing patients with asbestosis in segmented computed tomography (CT) images, which could be used as a clinical decision support system (CDSS). They also used the LRCN model to categorize lungs into normal and asbestosis lungs (the CNN extracts image features, and the RNN learns the extracted sequence information). They used a private dataset from Seoul St. Mary's Hospital, which is part of the Catholic University of Korea's College of Medicine (IRB no. KC17ENSI0379). There were a total of 447 patients, with 275 being healthy and 172 having asbestosis. In addition, 87 of the 172 patients with asbestosis were diagnosed in the early stages, while 85 were diagnosed in the advanced stages. The algorithm built with the DenseNet201 pre-trained model performed exceptionally well, with a sensitivity of 96.2%, specificity of 97.5%, accuracy of 97%, AUROC of 96.8%, and F1 score of 96.1%.

Pulmonary Edema
Excess fluid in the lungs causes this disorder. This fluid gathers in the lungs' many air sacs, making breathing harder.
Chauhan et al. [166] used the Medical Information Mart for Intensive Care CXR dataset (MIMIC-CXR) to present a Bidirectional Encoder Representations from Transformers (BERT) neural network model that learns from images and text to assess pulmonary edema severity from chest radiographs. BERT is a deep learning model in which every output element is connected to every input element, and the weightings between them are dynamically calculated based on their connection. Overall, the accuracy is 89%.
Liao et al. [183] also measured the severity level of pulmonary edema in CXR images, but used a Bayesian model trained and tested on the MIMIC-CXR dataset. The root mean squared (RMS) error is 0.66, and the Pearson correlation coefficient (CC) is 0.52.
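Because severity estimation is a regression-style task, it is evaluated with the RMS error and the Pearson correlation coefficient rather than with confusion-matrix metrics. The sketch below shows the standard definitions of both quantities; the function names and sample values are hypothetical and unrelated to the results of [183].

```python
import math


def rms_error(y_true, y_pred) -> float:
    """Root mean squared error between observed and predicted severity scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_correlation(x, y) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


if __name__ == "__main__":
    observed = [0, 1, 2, 3]     # illustrative severity levels
    predicted = [0.5, 1.0, 1.5, 3.5]
    print(rms_error(observed, predicted))
    print(pearson_correlation(observed, predicted))
```

The two measures are complementary: the RMS error reflects the absolute size of the deviations, whereas the correlation coefficient reflects whether predictions rise and fall with the true severity regardless of scale.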

Lung Metastasis
Lung metastasis, or metastatic lung cancer, is a malignant tumor that develops elsewhere in the body and spreads to the lungs through the bloodstream. Breast cancer, colon cancer, prostate cancer, sarcoma, bladder cancer, neuroblastoma, and Wilms' tumor are all common malignancies that metastasize to the lungs. However, any malignancy has the potential to spread to the lungs.
To overcome the difficulty of sparse data, the Generative Adversarial Network (GAN) was presented by Lin et al. [98] to generate computed tomography images of lung cancer. GAN is applied to generate new data automatically. It trains the generator and discriminator networks simultaneously. The former generates new images, and the latter learns to distinguish the fake images from the input of real and generated data. The AlexNet model is applied for the classification of lung cancer into benign or malignant tumors. They used the SPIE-AAPM Lung CT Challenge Data Set that contains 22,489 lung CT images, with 11,407 images of malignant tumors and 11,082 images of benign tumors. The image size is 512 × 512 pixels. The model achieved an accuracy of 99.86%.
Using CT images from the LIDC-IDRI datasets, Ashhar et al. [121] evaluated the performance of five convolutional neural network architectures (ShuffleNet, GoogleNet, SqueezeNet, DenseNet, and MobileNetV2) in categorizing lung tumors into two classes: malignant and benign. They showed that GoogleNet has the best performance for CT lung tumor classification, with a specificity of 99.06%, an accuracy of 94.53%, a sensitivity of 65.67%, and an AUC of 86.84%.

Consolidation
Pulmonary consolidation is an area of normally compressible lung tissue that occurs when that tissue is filled with fluid instead of air.
To detect lung consolidation, Rostami et al. [168] deployed two deep convolutional neural networks (DCNNs), VGG16 and DenseNet121, pre-trained on the ImageNet dataset. The dataset they used was the Pediatric Chest X-ray dataset, which contains two classes, normal and pneumonia/consolidation. The model correctly identified consolidation with an accuracy of 94.67%.
Bhatt et al. [185] developed a CNN classification model based on a pre-trained VGG-19 for detecting COVID-19 pulmonary consolidations in chest X-rays. They examined binary classification to detect consolidation lung disease, followed by multi-class predictions (normal, pneumonia, and SARS-CoV-2). They used the COVIDx dataset, which includes 66 COVID-19 images among its 16,756 chest radiography images. For binary classification, the accuracy was 89.58%, while for multi-class classification, it was 64.58%.

Lung Diseases That Affect Airways
Table 4 includes the following diseases: Asthma, COPD, TB, and COVID-19.

Tuberculosis
Tuberculosis (TB) is a bacterial infection caused by Mycobacterium tuberculosis. The bacteria most commonly attack the lungs, but they can also harm other regions of the body. TB spreads through the air when a person with tuberculosis coughs, sneezes, or talks.
Rahman et al. [101] detected tuberculosis from chest X-ray images using data augmentation, image segmentation, and deep-learning classification approaches. They employed nine distinct deep CNNs for transfer learning (ResNet18, ResNet50, ResNet101, ChexNet, InceptionV3, VGG19, DenseNet201, SqueezeNet, and MobileNet). They used the NIAID TB dataset as well as the RSNA dataset. Without segmentation, the output classification accuracy, precision, and recall for tuberculosis detection were 96
El-Melegy et al. [193] presented a Faster Region-based Convolutional Neural Network (Faster R-CNN) to detect tuberculosis bacilli in sputum smear microscopy images. They employed the ZNSM-iDB public database, which includes auto-focused data, overlapping objects, single or few bacilli, views without bacilli, occluded bacilli, over-stained views with bacilli, and artifacts. The model achieved an F1-Score of 89.7%, a recall of 98.3%, and a precision of 82.6%.

COVID-19
Coronavirus disease 2019 (COVID-19) is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). COVID-19 appeared in late 2019, and it appears as a ground-glass opacity (GGO) on radiographs. In March 2020, COVID-19 was declared a global pandemic by the WHO.
Using chest CT scans from Tabriz's Alinasab Hospital, Sadjadi et al. [137] demonstrated a deep convolutional neural network (DCNN) model for classification of COVID-19 versus healthy individuals, where a DCNN is a CNN that consists of several layers using a three-dimensional neural pattern. The study included 131 COVID-19 patients and 150 normal controls, with a total of 10,979 CT images. The CNN model was based on a pre-trained VGG16. They achieved 92% precision, 90% sensitivity, 91% specificity, 91% F1-Score, and 90% accuracy.
An adaptive feature selection guided deep forest (AFSDF) method was proposed by Sun et al. [131] for COVID-19 classification from chest CT images. This model was built using a high-level representation of the features. A feature selection approach was applied to the trained deep forest model to remove feature redundancy. They used a private dataset that included 1027 patients with community-acquired pneumonia (CAP), i.e., non-COVID-19, and 1495 patients with COVID-19. The model achieved an accuracy of 91.79%, and its specificity, sensitivity, and area under the ROC curve were 89.95%, 93.05%, and 96.35%, respectively.
Mamalakis et al. [122] presented a new deep transfer learning pipeline network (DenResCov-19) based on chest X-ray images to diagnose patients with COVID-19, pneumonia, and tuberculosis. They added an extra layer with CNN blocks to combine two models (DenseNet-121 and ResNet-50) and achieve superior performance over either of them. They tested their proposed network on classification problems with two classes (pneumonia vs. healthy), three classes (including COVID-19), and four classes (including tuberculosis). In all four datasets, the proposed network was able to correctly classify these lung diseases, and it outperformed the benchmark networks, DenseNet and ResNet. For the four classes, precision is 82.90%, AUC is 95%, and F1-Score is 75.75%.
COVID-Net CXR-S was introduced by Aboutalebi et al. [191]. It is a CNN model that uses CXR images to predict the airway severity of SARS-CoV-2-positive patients. With customized macroarchitecture and microarchitecture designs for COVID-19 diagnosis from chest X-ray images, the COVID-Net backbone design demonstrates sparse long-range connectivity and significant architectural diversity. To give better representational capabilities while keeping architectural and computational complexity low, the network architecture uses projection-expansion-projection-expansion (PEPE) patterns, which are light-weight design patterns. The model classifies input images into two levels of severity. They used the RSNA dataset. The model achieved Level 1 sensitivity, Level 2 sensitivity, Level 1 Positive Predictive Value (PPV), Level 2 PPV, and accuracy of 92.3%, 92.85%, 87.27%, 95.78%, and 92.66%, respectively. They showed that the COVID-Net CXR-S model performs better than CheXNet and ResNet-50.
Deb et al. [133] presented a DCNN model to classify COVID-19. They used VGGNet, GoogleNet, DenseNet, and NASNet as pre-trained models. They used two publicly available datasets and one private dataset. They demonstrated that a multi-model ensemble architecture outperforms a single classifier. On a public dataset, the model achieved an accuracy of 88.98% for three-class classification (COVID-19, Normal, and Community-Acquired Pneumonia (CAP)) and an accuracy of 98.58% for binary classification. On the private dataset, the model achieved an accuracy of 93.48%.
To extract visual features from COVID-19-infected areas and deliver an accurate clinical diagnosis while optimizing the pathogenic diagnostic test and cutting down on time, Allioui et al. [192] proposed mask extraction methodologies based on deep reinforcement learning (DRL). DRL was used to reduce lengthy manual mask extraction and to enhance medical image segmentation frameworks. They used a public dataset of CT images. The model achieved a precision of 97.12%, a Dice score of 80.81%, a sensitivity of 79.97%, and a specificity of 99.48%.

Asthma
Asthma is a disorder in which the airways narrow, swell, and produce excess mucus, causing difficulty breathing, coughing, and shortness of breath.
To diagnose adult asthma, a deep neural network (DNN) was presented by Tomita et al. [173], where DNN is a neural network with some level of complexity, usually at least two layers. They used a private dataset derived from clinical records of 566 adult outpatients who presented to Kindai University Hospital for the first time with non-specific respiratory symptoms. The output accuracy result is 98%.
Spyroglou et al. [190] presented a Bayesian Logistic Regression model to predict asthma. Data were gathered from 147 patients by the Pediatric department of the University Hospital of Alexandroupolis, Greece, during the period from 2008 to 2010. The prediction accuracy was 86.3673%, and the sensitivity was 87.25%.

COPD
Chronic Obstructive Pulmonary Disease (COPD) is a set of diseases that cause airflow restrictions in the lungs and breathing difficulties. Emphysema and chronic bronchitis are two conditions that make breathing difficult. The lungs rely on the natural flexibility of the airways and alveoli to remove air from the body. In the case of COPD, the lungs lose their elasticity, which leads to their expansion, which leads to the retention of air inside them [194]. COPD affects millions of people, although it is rarely recognized or treated [195]. Changes in the airways of the lungs are an early sign of COPD. According to WHO estimates, COPD is the third largest cause of mortality worldwide, causing 3.23 million deaths in 2019 [196]. A chest X-ray may not show COPD until it is severe, and the images may show enlarged lungs or airways (bullae), cardiac stenosis, or a flat diaphragm. Thus, doctors may request a computerized tomography (CT) scan after the X-ray scan to obtain a clearer picture to help diagnose them [197].
Using CT scans, Bao et al. [170] presented a 15-direction Multi-View Deep Neural Network (MV-DCNN). To create the MV-DCNN, they used 15 anti-aliased ResNet18 models as well as a classification layer. The three steps of the multi-View DCNN algorithm are as follows: The initial step is to extract images from three-dimensional data from 15 different angles. The second stage is to improve the data in each of these 15 views. To extract and categorize the features, the final step is to build 15 Multi-View DCNN (MV-DCNN) models. They used RFAI's synthetic texture datasets to test the accuracy of 3D texture feature classification techniques. COPD classification has an output accuracy of 97.7%.
A new 3D-cPRM classification approach for COPD grouping was developed by Ho et al. [188] using a 3D-CNN model and the parametric-response mapping (PRM) method. The researchers then utilized a technique called gradient-weighted class activation mapping (Grad-CAM) to highlight the key components of the CNN learning process. They used data from the Institutional Review Boards of Kangwon National University Hospital (KNUH) and Jeonbuk National University Hospital (JNUH). CT scans at KNU and JNU Hospitals yielded 596 patients (204 with COPD and 392 without COPD). The model had a sensitivity of 88.3% and an accuracy of 89.3%.

Emphysema
A symptom of a lung disorder is shortness of breath. In persons with emphysema, the air sacs in the lungs (alveoli) become damaged, and the alveoli's inner walls weaken and burst over time, resulting in bigger air gaps rather than many smaller ones. The surface area of the lungs is reduced, limiting the amount of oxygen that reaches the bloodstream. Emphysema is a part of COPD.
Peng et al. [175] presented a novel deep learning technique for pulmonary emphysema classification using multi-scale deep convolutional neural networks. The findings revealed that (1) the multi-scale technique was far more effective than a single-scale setup; (2) the model outperformed current approaches; and (3) the measured severity of emphysema agreed well with various pulmonary function indices. They worked with a private dataset. The classification accuracy is 92.68%.
Choudhary et al. [189] presented a CNN model used to predict the probability of one of the fifteen diseases, including emphysema. They used the ChestX-ray14 dataset. An overall accuracy of 89.77% was achieved for the classification of the different diseases.

Infiltration
A pulmonary infiltrate is a substance denser than air, such as pus, blood, or protein, that persists within the parenchyma of the lungs. Tuberculosis is associated with pulmonary infiltrates. Table 5 provides a summary of some of the infiltration disease literature.
Abiyev et al. [177] explained the applicability of CNN technology to the classification of chest X-ray diseases. Backpropagation neural networks (BPNNs) and competitive neural networks (CpNNs) with unsupervised learning were also used to diagnose chest diseases. All of the networks were trained and tested using the ChestX-ray14 database. The CNN has a 92.4% output performance, the BPNN has an 80.04% output performance, and the CpNN has an 89.57% output performance.
Hazra et al. [136] first presented a CNN architecture including convolutional, activation, pooling, and fully connected layers, followed by a Softmax layer that delivers the likelihood of the output for each type of disease. Then, a CNN model was trained using the ChestX-ray14 dataset and a pre-trained VGG-16 model. Using Grad-CAM, they were able to see how the model performed on a test image. They obtained 83.671% accuracy with the CNN trained from scratch and 97.81% accuracy with transfer learning.

Atelectasis
Atelectasis occurs when the air sacs of the lung do not inflate properly, so that the blood may be unable to deliver enough oxygen to the organs and tissues. Table 5 provides a summary of some of the atelectasis disease literature.
Wang et al. [169] proposed a ChestNet model for diagnosing chest diseases with X-ray images, which consists of two branches: attention and classification. The attention branch exploits the correlation between class labels and the locations of pathological abnormalities, allowing the model to focus adaptively on the pathologically abnormal regions. The classification branch (a ResNet-152 model) serves as a uniform feature extraction and classification network, freeing users from troublesome handcrafted feature extraction. Six convolutional layers make up the attention branch: 1 × 1, 3 × 3, and 1 × 1 kernels are used in the first three convolutional layers, which are each followed by a ReLU activation function. The ChestX-ray14 dataset was used. ChestNet's overall AUC is 0.7810, while the AUC for atelectasis is 0.7433.
Abdelbaki et al. [151] presented the MobileNet V2 model (CNN + additional neural network layers) for classifying and predicting lung diseases in frontal thoracic X-rays. They used the NIH ChestX-ray14 database. The model achieved an average AUC of 81.1%, an accuracy of more than 90%, and a specificity of 97.3%. For atelectasis, the accuracy was 79.6% and the specificity was 96.8%.

Pneumothorax
A pneumothorax, or collapsed lung, occurs when an abnormal collection of air in the pleural space between the lung and the chest wall causes the lung to collapse. The most common symptoms are dyspnea and severe pain on one side of the chest. A pneumothorax is a complete or partial collapse of the lung that requires immediate medical attention. Table 5 provides a summary of some of the pneumothorax disease literature.
Gooßen et al. [174] compared three distinct deep learning approaches for detecting and localizing pneumothorax in chest X-ray images (CNN, multiple-instance learning, and fully convolutional networks). The CNN model was based on ResNet-50 and trained on the ChestX-ray14 dataset to predict 14 illnesses; its dense layer for pathology prediction was then replaced by a new binary classification layer for pneumothorax identification. Multiple-Instance Learning (MIL) combines classification and localization while only requiring image-level labels for training. Fully Convolutional Networks (FCNs) are more advanced networks that are designed for semantic segmentation. The three approaches (CNN, MIL, and FCN) had AUCs of 96%, 93%, and 92%, respectively. They also combined the separate methods linearly, and the overall classification performance was improved by using the three approaches as an ensemble.
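A linear combination of the outputs of several classifiers, as mentioned above, can be sketched as a weighted average of their per-image scores. The weights and scores below are hypothetical; the survey does not state how the combination coefficients in [174] were chosen, so this is only an illustration of the general idea.

```python
def linear_ensemble(scores, weights) -> float:
    """Weighted average of per-model scores for one image.

    scores:  one probability-like score per model (e.g., CNN, MIL, FCN)
    weights: non-negative weight per model; normalised here to sum to 1
    """
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per model score")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total


if __name__ == "__main__":
    # Hypothetical pneumothorax scores from three models for a single radiograph.
    model_scores = [0.9, 0.6, 0.3]
    print(linear_ensemble(model_scores, weights=[1, 1, 1]))   # plain average
    print(linear_ensemble(model_scores, weights=[2, 1, 1]))   # favour the first model
```

In practice, such weights would be fitted on a validation set so that stronger models contribute more to the final decision.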
Based on the whole 26-layer you only look once (YOLO) model, a CNN model was proposed by Park et al. [198]. The YOLO model was utilized to determine the lesions' bounding boxes. The CNN model was developed using a proprietary dataset that included 1596 chest radiographs of pneumothorax patients of varying severity, as well as 11,137 of normal cases, which were gathered from two tertiary referral hospitals. The CNN model performed well in diagnosing pneumothorax on chest radiographs, with an overall accuracy of 87.3%.

Heart Diseases
Cardiovascular diseases (CVDs), or heart diseases, are diseases that affect the heart's structure or function, such as cardiomegaly and heart insufficiency [199]. CVD, which narrows or blocks blood vessels and causes shortness of breath and chest pain, is the leading cause of death in the world. According to the World Health Organization (WHO), 17.9 million people died from cardiovascular diseases in 2019, accounting for 32% of all fatalities worldwide [200]. Table 6 summarizes the studies on heart disease detection.

Cardiomegaly
Many studies have addressed the detection of cardiomegaly together with other abnormalities in a multi-class setting, predicting all available labels from the datasets provided, as in [100,120,128], while some studies detect cardiomegaly as a binary classification task, as in [201].
Ammar et al. [128] presented an automated pipeline for cardiac segmentation and diagnosis based on a private MRI dataset of 150 patients from the Dijon Hospital (Medical Image Computing and Computer Assisted Intervention in the Post-2017 Era (MICCAI)). They employed a complete CNN model for classification and a UNet deep learning network for segmentation. To classify heart diseases, they utilized a multilayer perceptron (MLP), a support vector machine (SVM), and a random forest (RF). The resulting accuracy was 92%.
Sogancioglu et al. [100] used the publicly available ChestX-ray14 dataset to study the detection of cardiomegaly on frontal chest radiographs using two alternative deep learning approaches: anatomical segmentation and image-level classification. They trained a typical U-net architecture on a separate JSRT dataset to segment the heart and lung areas, and used ResNet18, ResNet50, and DenseNet121 for classification. The AUC was 0.977 for the segmentation-based approach and 0.941 for the classification-based approach. The authors plan to look into applying the segmentation-based method to other diagnostic procedures.
Using the same ChestX-ray14 dataset, Nickisch et al. [120] examined the performance of multiple network architectures, including ResNet-38, ResNet-50, and ResNet-101, for classifying the 14 diseases. ResNet-50 achieved a high average AUC of 0.822. Candemir et al. [201] used a DCNN to automatically detect cardiomegaly in digital chest X-rays. They used and fine-tuned various deep CNN architectures to detect cardiomegaly. Following that, the researchers provided a CXR-based pre-trained model in which they fully trained an architecture (AlexNet, VGG-16, VGG-19, and InceptionV3) with a large CXR dataset. Finally, they investigated the association between the severity of the disease and the Softmax probability of an architecture. The datasets they used were the NLM-Indiana Collection and the NIH-CXR, both of which are freely available. The accuracy reported with the NIH-CXR dataset is 88.24% (training set: NIH set and 30% of Indiana Collection; test set: 70% of Indiana Collection) and 89.86% (training set: NIH set and 30% of Indiana Collection).

Heart Failure
Heart failure, or insufficiency, refers to the heart's inability to properly pump blood throughout the body. This occurs when the heart becomes too weak or stiff. It does not indicate that the heart has stopped working; it only requires some assistance to help it function better.
Nirschl et al. [178] created a CNN classifier to predict clinical heart failure in 209 patients using H&E stained whole-slide images. They used private data from the University of Pennsylvania's Cardiovascular Research Institute and Department of Pathology and Laboratory Medicine, which they received and analyzed. They proved that the CNN model is able to detect patients with heart failure or severe pathology with a sensitivity of 99% and a specificity of 94%.
To predict hospital admission (exacerbation of HF) at 30 and 90 days in patients with heart failure with reduced ejection fraction (HFrEF), Wang et al. [202] employed a sequential model architecture based on bidirectional long short-term memory (Bi-LSTM) layers. They used two sets of data: the HFrEF patient group, which had only 47,498 patients but almost two million medical events or interactions obtained from claims, and the general patient group. The reported AUC is 86.1%.

Others
This class includes diseases that affect the bones or muscles of the chest, such as fracture, hernia, and mass, as shown in Figure 8. Table 7 summarizes the studies on mass, fracture, and hernia detection.

Fracture
Chest fractures are injuries to the chest wall, which includes the bones, skin, fat, and muscles that protect the lungs and the other organs inside the chest.
Wu et al. [159] used a three-dimensional rib segmentation model (U-Net) and a deep learning R-CNN with a pre-trained ResNet50 backbone to recognize rib fractures and their anatomic locations on CT images. First, rib fractures were detected and the ribs segmented section by section using a two-dimensional (2D) detection network. A three-dimensional (3D) network was then used to improve rib segmentation accuracy. On test set 1, the model diagnosed rib fractures with a free-response receiver operating characteristic (FROC) score of 84.3%. On test set 2, the system achieved a detection sensitivity of 84.9%, a precision of 82.2%, and an F1-score of 83.3%. On test set 3, the model achieved an AUC of 93%, a sensitivity of 87.9%, and a specificity of 85.3%. For rib segmentation, the model obtained a Dice score of 82.7% and an accuracy of 96%.
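For readers less familiar with the detection and segmentation metrics quoted above, the following minimal sketch shows the standard definitions of the F1-score and the Dice score. The function names and the toy masks are illustrative assumptions and are not taken from [159].

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


def dice_score(mask_a, mask_b):
    """Dice overlap between two binary masks given as flat 0/1 lists."""
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    # Two empty masks are treated as a perfect match by convention.
    return 2 * intersection / total if total else 1.0


# F1 from the precision and sensitivity quoted for test set 2.
print(f"F1 = {f1_score(0.822, 0.849):.3f}")

# Toy 4-pixel masks (hypothetical): predicted vs. reference rib segmentation.
predicted = [1, 1, 0, 0]
reference = [1, 0, 0, 0]
print(f"Dice = {dice_score(predicted, reference):.3f}")
```

Applied to the rounded test set 2 values, the formula gives roughly 0.835, which is close to but not identical to the reported 83.3%. The reported figure may have been averaged differently, which the text does not specify.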
Zhou et al. [203] also demonstrated an R-CNN model that can detect and categorize rib fractures in computed tomography (CT) images and generate structured reports. First, CNN's raw output was used, and then the merged structured report was used. They used private data from three hospitals. There were 1079 patients in this study. The results indicated that the model does a good job of classifying rib fractures into three classes (old, healing, and fresh fractures). Fresh fractures and healing fractures had higher detection efficiency than old fractures (F1-scores of 84.9%, 85.6%, and 77%, respectively), and the model's robustness was good in the five multicenter/multiparameter validation sets (all mean F1-score 80%). The five radiologists' precision climbed from 80.3% to 91.1%, while their sensitivity increased from 62.4% to 86.3%. The radiologists' diagnosis time was decreased by 73.9 s.

Hernia
The section of a lung that pushes through a tear or bulges through a weak area in the chest wall, neck canal, or diaphragm is called a lung hernia.
To increase model performance, Mo et al. [179] presented an entropy weighting loss that captures inter-label relationships and makes full use of classes with fewer cases than others. They tested three different deep learning models (VGG16, ResNet50, and DenseNet121). On the ChestX-ray14 dataset, DenseNet121 produced the best results, with an average AUC of 84.3%.
Wang et al. [204] presented a triple-attention learning model for computer-aided diagnosis (CAD) of thoracic diseases. The model combines three attention modules into a cohesive framework for element-wise, channel-wise, and scale-wise attention learning, and uses a pre-trained DenseNet121 for feature extraction. It can use element-wise attention to focus on areas with pathological abnormalities and scale-wise attention to rescale feature maps. The dataset used was ChestX-ray14. The model achieved an AUC of 82.6% across 14 different thoracic diseases.

Mass
A lung mass is defined as a spot or abnormal area in the lungs larger than 3 cm (about 1.2 inches) in size.
Liang et al. [180] evaluated the diagnostic performance of a deep learning-based system for the detection of clinically significant lung nodules/masses on chest radiographs. They used the ChestX-ray14 dataset, with 100 patients comprising 47 images with mass and 53 images without mass. They used four algorithms to detect pulmonary nodules/masses: heat map, abnormal probability, nodule probability, and mass probability. They used the QUIBIM Chest X-ray Classifier app module, which assists radiologists in dealing with the vast number of chest radiographs generated in health centers every day by prioritizing potentially problematic instances. The Chest Radiograph Module is a collection of 14 pathology-specific 19-layer convolutional neural networks, followed by a fully connected layer, that takes a chest radiograph and generates a likelihood of disease as well as heat maps indicating the areas of the image that are most indicative of chest disease. For pulmonary nodule/mass detection, the mass probability algorithm exhibited the best predictive performance, with a sensitivity of 76.6%, a specificity of 88.68%, and an AUC of 91.6%.
Li et al. [205] presented a Faster region-based convolutional neural network (Faster R-CNN) with a pre-trained ResNet backbone to diagnose lung mass. They used the JSRT dataset. The model achieved an accuracy of 53.38%.
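As a worked illustration of how the sensitivity and specificity in the Liang et al. evaluation relate to case counts, the sketch below uses the 47 mass and 53 non-mass images mentioned in the text. The split into true/false positives and negatives is an assumption: it was inferred so that it reproduces the reported percentages and is not stated in the source.

```python
def sensitivity(tp, fn):
    """Fraction of diseased cases that the system flags (true positive rate)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of healthy cases that the system clears (true negative rate)."""
    return tn / (tn + fp)


# Assumed confusion counts (inferred, not reported):
# 47 images with mass    -> 36 detected, 11 missed
# 53 images without mass -> 47 correctly cleared, 6 false alarms
tp, fn = 36, 11
tn, fp = 47, 6

print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.2%}")
```

With only 100 images, a single additional missed mass changes the sensitivity by about two percentage points, so these figures should be read with the small sample size in mind.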

Discussion
After analyzing the data from previous studies, we present a trend analysis of recent thoracic disease detection with respect to the following attributes: image type, transfer learning, data augmentation, deep learning model, and ensemble classifier. X-ray images were used in the majority of studies (59%), followed by CT scans (33%), as shown in Figure 9. This is because X-ray imaging is cheaper, technically simpler, involves lower radiation than CT scans, and is widely used by radiologists to identify cracks, infection levels, and abnormal cases. However, it does not provide 3D information.

Figure 9. Image type distribution of thoracic disease detection using deep learning in recent years (X-ray 59%, CT scan 33%, others 8%).

In the majority of the research included in this review, DL models perform excellently when trained and tested on carefully selected datasets that include one or more classes of disease. The use of transfer learning has grown in popularity, as shown in Figure 10. Transfer learning enables features learned while training on a previous task to be applied to a new task, which improves classification accuracy. This could be because the model was trained on a larger number of images, making it more generalized.

Figure 10. With transfer learning: 44%.

Data may be scarce when only a limited number of images is available or when a new disease emerges, such as the recent COVID-19 pandemic. In such cases, one class often has many more samples than another, which leads to the model becoming biased. Data augmentation addresses this problem and can improve the model's performance and image quality when it is employed. As a result, the number of works that use data augmentation has increased over time. Figure 11 illustrates the usage of data augmentation in this survey.

Version November 20, 2022 submitted to Diagnostics

Figure 11. Data augmentation distribution in deep learning-aided thoracic disease detection in recent years (with data augmentation 47%, without data augmentation 53%).

In recent years, CNN has been the most used deep learning algorithm, as shown in Figure 12. More research may show that CNN is preferable to other deep learning algorithms for detecting thoracic diseases. This is due to CNN's robustness, automatic feature extraction, and ability to achieve high classification accuracy.

Figure 12. Model type distribution in deep learning-aided thoracic disease detection in recent years (CNN 69%, others 31%).

Although ensemble learning is a less popular technique, as shown in Figure 13, studies that used it reported better detection performance than those that did not. This shows that ensemble classifiers are still underused for detecting lung illness. The types of ensembles presented in the research are as follows: stacking, boosting, averaging, majority voting, and multi-scale ensemble module. Ensemble models can generate better predictions and accomplish better results than any single contributing model, and can reduce the spread or dispersion of the predictions.

Figure 13. With ensemble: 22%.

This research presented several thoracic diseases that can be detected automatically using deep learning, namely pneumonia, COVID-19, edema, lesion, cohesion, fibrosis, emphysema, atelectasis, asthma, asbestos signs, cardiomegaly, heart failure, chronic obstructive pulmonary disease, pleural thickening, fracture, lung metastasis, hernia, pneumothorax, mass, tuberculosis, and infiltration.
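Two of the ensemble types listed above, majority voting and averaging, can be sketched in a few lines. This is a minimal illustration with hypothetical predictions from three classifiers; the function names and the 0.5 threshold are assumptions and do not correspond to any specific surveyed study.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    With an odd number of binary classifiers there are no ties; otherwise
    the label seen first among the tied ones is returned.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def average_probabilities(probabilities, threshold=0.5):
    """Average per-model disease probabilities, then threshold the mean."""
    return [int(mean(p) >= threshold) for p in zip(*probabilities)]


# Hypothetical hard labels from three classifiers on four images.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
print(majority_vote([model_a, model_b, model_c]))

# Hypothetical disease probabilities from three classifiers on three images.
probs_a = [0.9, 0.2, 0.6]
probs_b = [0.6, 0.4, 0.3]
probs_c = [0.3, 0.3, 0.7]
print(average_probabilities([probs_a, probs_b, probs_c]))
```

Averaging uses the confidence of each model, while majority voting only needs hard labels, which makes voting easier to apply when the base models come from different frameworks.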
Lung segmentation may improve the performance of the model; therefore, some studies apply it for certain diseases, as in [101].

Critical Analysis
There are four major issues in the papers we reviewed: data imbalance, image size handling, dataset availability, and high correlation of errors when employing ensemble techniques. (i) Data imbalance: this arises during classification training. The resulting model will be biased if one class has many more samples than the other, so it is preferable for each class to have the same number of images. Researchers therefore use data augmentation to mitigate this problem. (ii) Image size handling: most studies reduced the original image size during training to save computing costs. Training a very complex model at the original image size is computationally expensive and takes a long time even with the most powerful GPU hardware. (iii) Dataset availability: for training purposes, thousands of images of each class should be collected in order to create a more accurate classifier. The amount of available training data is generally less than optimal because of the limited number of datasets available. As a result, researchers are looking for new ways to produce a good classifier. (iv) High correlation of errors in ensembles: for an ensemble of classifiers to perform well, its members must make different errors, so the correlation between the errors of the base classifiers should be very low. In other words, the base classifiers are supposed to complement one another to give better classification results. In the majority of the experiments surveyed, only classifiers with similar selected features were combined. As a result, the error correlation among the base classifiers is high.
Open issues that must be considered in order to improve the efficiency of deep learning models for thoracic disease diagnosis:
• Publicizing datasets, so that researchers have access to more data and the classifiers developed are more accurate;
• Investigating a wider range of features. When employing ensemble approaches, this can help address the issue of high error correlation. As more features are added, the number of contrasts increases and the model's accuracy improves. The results are often better when merging multiple versions;
• Using ensemble learning, especially in multi-class classification, to improve detection accuracy and reduce training time;
• Localizing and segmenting abnormalities: the majority of the models discussed in this analysis classify rather than localize or segment abnormalities, and this is an area that can be explored further;
• Automated data curation: numerous researchers are investigating it with unsupervised learning approaches such as generative adversarial networks and variational autoencoders.
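The notion of error correlation between base classifiers in issue (iv) can be made concrete with a short sketch. It assumes that correlation is measured as the Pearson correlation between per-sample error indicators, which is one common choice and not necessarily the measure used in the surveyed papers; the data and function names are hypothetical.

```python
from math import sqrt


def error_indicator(predictions, labels):
    """1 where the classifier is wrong, 0 where it is right."""
    return [int(p != y) for p, y in zip(predictions, labels)]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant inputs; reported as 0 here
    return cov / sqrt(var_x * var_y)


# Hypothetical labels and predictions for eight images.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
clf_a = [1, 0, 1, 0, 0, 1, 1, 0]  # wrong on images 5 and 6
clf_b = [1, 0, 1, 0, 0, 1, 1, 0]  # same mistakes as clf_a
clf_c = [0, 1, 1, 0, 1, 0, 1, 0]  # wrong on images 1 and 2

err_a = error_indicator(clf_a, labels)
err_b = error_indicator(clf_b, labels)
err_c = error_indicator(clf_c, labels)
print(f"corr(a, b) = {pearson(err_a, err_b):.2f}")
print(f"corr(a, c) = {pearson(err_a, err_c):.2f}")
```

Classifiers `clf_a` and `clf_b` fail on the same images, so combining them cannot correct either one. `clf_c` fails on different images, so a vote that includes it has a chance of outvoting the shared mistakes.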

Conclusions and Future Work
Medical practitioners and computer scientists all over the world are working collaboratively to develop effective AI-based techniques to diagnose and track thoracic diseases. This paper provides a literature review of recent research on thoracic disease diagnosis and prediction that involves the use of AI techniques. It introduced a new classification of thoracic diseases from the medical point of view and covered many different thoracic diseases, including COVID-19. A comprehensive survey of the diseases belonging to this classification was made in terms of image type, dataset used, model type, ensemble techniques, results, and open issues. Other researchers may use the classification provided to plan their contributions and research activities. Future approaches could increase the efficiency and the number of AI-aided applications for the detection of thoracic diseases.
The suggested future work is the use of multi-modality data, including medical visual data and patient health information, to verify the severity of the disease. Fusion methods will play a pivotal role in determining the severity score of the disease and in handling the varied nature of the data utilized. Future work will employ ML and/or DL algorithms to investigate various fusion techniques to achieve more accurate results, and will use recent models such as vision transformers (ViT), hybrid models, or explainable artificial intelligence (XAI) in the diagnosis of these diseases.