A Novel Method for Detection of Tuberculosis in Chest Radiographs Using Artiﬁcial Ecosystem-Based Optimisation of Deep Neural Network Features

Abstract: Tuberculosis (TB) is an infectious disease that generally attacks the lungs and causes death for millions of people annually. Chest radiography and deep-learning-based image classification techniques can be utilized for TB diagnostics. Convolutional Neural Networks (CNNs) have shown advantages in medical image recognition applications, as they are powerful models for extracting informative features from images. Here, we present a novel hybrid method for efficient classification of chest X-ray images. First, features are extracted from chest X-ray images using MobileNet, a CNN model previously trained on the ImageNet dataset. Then, to determine which of these features are the most relevant, we apply the Artificial Ecosystem-based Optimization (AEO) algorithm as a feature selector. The proposed method is applied to two public benchmark datasets (Shenzhen and Dataset 2) and achieves high performance with reduced computational time. It successfully selected only the best 25 and 19 features (for Shenzhen and Dataset 2, respectively) out of about 50,000 features extracted with MobileNet, while improving the classification accuracy (90.2% for the Shenzhen dataset and 94.1% for Dataset 2). The proposed approach outperforms other deep learning methods, and its results are the best among recently published works on both datasets.


Introduction
Tuberculosis (TB) is considered one of the world's biggest threats to humanity, as it is ranked the fifth leading cause of death globally (1.5 million deaths annually) [1]. For this reason, the World Health Organization encourages joint efforts to root it out, because it is rather easy to cure [2]. Although it has low specificity and is difficult to interpret, posteroanterior chest radiography is one of the preferred TB screening methods. In these circumstances, an automated computer-based diagnosis system for TB could be an efficient method for widespread TB screening. In this work, we apply the AEO algorithm to select only the most relevant features from the largely redundant feature set produced by the CNN. This accordingly minimizes the capacity and resource consumption and improves the classification of chest X-ray images. To the best of our knowledge, AEO has not yet been applied to any real application.
The rest of the article is structured as follows: Section 2 presents the methodology and the techniques used, including the model structure and description. The proposed MobileNet-AEO approach is presented in Section 3. Datasets and evaluation measures are described in Section 4. The experimental results and comparisons with other works are presented in Section 5, while the conclusions are presented in Section 6.

Feature Extraction Using Convolutional Neural Networks
The main concept of transfer learning with very deep neural networks is to re-train a CNN model (on our dataset) that was previously trained on ImageNet [24] (about 1.2 million images). Since that dataset covers a wide range of objects (1000 different categories), the model learns diverse types of features, which can then be reused for other classification tasks.
In this work, we used MobileNet [37] with 88 layers. The MobileNet structure is mainly based on depth-wise separable convolutions (Conv.). Each Conv. layer is followed by Batch Normalization (BN.) [42] and a ReLU. Figure 1 shows that down-sampling is performed with strided Conv. in the depth-wise Conv. layers and in the first layer. A final average pooling then reduces the spatial resolution to 1 before the features are fed to the fully connected layer. Counting depth-wise and 1 × 1 (point-wise) Conv. layers separately, MobileNet contains 28 layers. It is worth mentioning that 95% of its computation time is spent in the 1 × 1 convolutions, which contain 75% of MobileNet's parameters. The whole MobileNet architecture consists of 17 of these blocks, as seen in Figure 1. There are no pooling layers between these depth-wise blocks; instead, a stride of 2 is used to reduce the spatial dimensions.
In our case, the last layer has only 2 channels (TB or not TB). The final layer is a soft-max layer, since this is a binary classification problem. The hidden layers use the rectification non-linearity [20]. To implement transfer learning with MobileNet, we first retrieved the bottleneck features previously extracted by MobileNet, then combined them with the features extracted from our TB images by training MobileNet on the existing training data. Finally, we placed a new classification layer (rather than the 1000-class layer used for ImageNet) on top of the model [43].
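The transfer-learning pipeline described above can be sketched in Keras as follows. This is a minimal illustration, not the authors' exact code: the input size (224 × 224) and the use of `predict` for bottleneck extraction are assumptions, and downloading the ImageNet weights requires network access.

```python
import numpy as np
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.applications.mobilenet import preprocess_input

# MobileNet pre-trained on ImageNet, with the 1000-class top removed.
base = MobileNet(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))

def extract_features(images):
    """Return bottleneck features: the flattened 7x7x1024 output
    of the last convolutional layer (50,176 values per image)."""
    x = preprocess_input(images.astype("float32"))
    maps = base.predict(x, verbose=0)       # shape (n, 7, 7, 1024)
    return maps.reshape(maps.shape[0], -1)  # shape (n, 50176)

# Illustrative call on a random batch standing in for chest X-ray images.
dummy = np.random.randint(0, 256, size=(2, 224, 224, 3))
feats = extract_features(dummy)
```

A new 2-channel soft-max layer would then be trained on top of these features for the TB / not-TB decision.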
One of the drawbacks of CNNs in general, and MobileNet in particular, is that they require substantial computational resources such as memory and storage capacity. To overcome this problem, and to make the proposed approach computationally efficient, the following statistical operations were applied to exclude irrelevant and correlated features [44]:

1. Chi-square is applied to remove features with high correlation by computing the dependence between them. It is calculated between each feature and all classes, as in Equation (1):

χ² = Σ_k (O_k − E_k)² / E_k, (1)

where O_k and E_k refer to the actual and the expected feature value, respectively.

2. A tree-based classifier is used to calculate feature importance and improve the classification, since it has high accuracy, good robustness, and is simple [45].
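The two filtering steps above can be sketched with scikit-learn. This is an illustrative sketch, not the authors' code: the feature counts (`k=30`, top 20) and the choice of `ExtraTreesClassifier` as the tree-based ranker are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
X = rng.random((100, 50))          # stand-in for (non-negative) deep features
y = rng.integers(0, 2, size=100)   # binary labels (TB / not TB)

# Step 1: chi-square scoring keeps the k features most dependent on the class.
X_chi = SelectKBest(chi2, k=30).fit_transform(X, y)

# Step 2: tree-based importances rank the surviving features;
# keep the 20 with the largest importance.
forest = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X_chi, y)
keep = np.argsort(forest.feature_importances_)[::-1][:20]
X_filtered = X_chi[:, keep]
```

Note that `chi2` requires non-negative feature values, which holds for ReLU activations.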

Feature Selection Using Artificial Ecosystem-Based Optimization
Recently, the Artificial Ecosystem-based Optimization (AEO) algorithm has been proposed [41]. It is inspired by the flow of energy in Earth's ecosystems, emulating the actions of living organisms such as production, consumption, and decomposition.
In general, the producers can be any type of green plant that obtains its food through photosynthesis. During this process, sugar (glucose) and oxygen are produced by interactions between water and carbon dioxide in the presence of sunlight. The obtained sugar is then used by plants to make fruits, leaves, roots, and wood. The herbivore and omnivore consumers obtain their essential food from the producers. The consumers are animals: they cannot produce their own food and feed only on the producers or on other consumers. In nature, there are three types of consumers: (1) carnivores, (2) herbivores, and (3) omnivores. Animals that feed only on plants (i.e., producers) are named herbivores, whereas omnivores are animals that can eat both producers and other animals. Animals that feed only on other animals are named carnivores. Decomposers are organisms that feed on dead producers (plants) and consumers (animals), or on the waste from living organisms. Fungi and most bacteria are decomposers; they break down the remains of dead organisms and convert them into simple molecules, for example, minerals, water, and carbon dioxide, which are then absorbed by the producers to generate sugar through photosynthesis.
These actions can be modeled mathematically: the production action controls the trade-off between exploitation and exploration during the optimization process, the consumption action lets the consumers explore the search space, and the decomposition action intensifies the search around the best solution. Through such a system, the plants obtain food using water, carbon dioxide, and sunlight, while bacteria and fungi decompose nourishment.
The update process can be summarized as follows:

1. Production procedure: following the procedure in [41], the producer position is selected randomly, and the producer corresponds to the worst solution, while the best solution is represented by the decomposer. The production can be modeled as follows:

X_1(t + 1) = (1 − a) X_n(t) + a X_rand(t), a = (1 − t/T_max) rand_1, X_rand(t) = rand_2 (ub − lb) + lb, (2)

where t and T_max are the current iteration and the total number of iterations, respectively; ub and lb represent the upper and lower boundaries of the search space; rand_1 and rand_2 are random variables in the interval [0, 1]; a is the weight parameter; X_n(t) is the best solution; and X_rand(t) denotes a solution generated randomly in the search space.

2. Consumption procedure: in this procedure, each consumer feeds on another consumer with a lower energy level or on a producer. Each class of consumers (herbivores, omnivores, and carnivores) has its own mechanism for updating its position, as follows:

(a) The herbivores update their locations with respect to the producer only:

X_i(t + 1) = X_i(t) + C (X_i(t) − X_1(t)), (3)

where X_1 is the location of the producer and C is the consumption factor, determined using the Levy flight as follows:

C = (1/2) Norm(0, 1) / |Norm(0, 1)|, (4)

where Norm(0, 1) is a variable generated from the normal distribution with zero mean and unit variance.
(b) The carnivores update their positions with respect to a randomly chosen consumer with a higher energy level, of index l:

X_i(t + 1) = X_i(t) + C (X_i(t) − X_l(t)), (5)

(c) The position update of the omnivores depends on the producer as well as on a randomly chosen consumer with a higher energy level, of index l:

X_i(t + 1) = X_i(t) + C (rand_3 (X_i(t) − X_1(t)) + (1 − rand_3)(X_i(t) − X_l(t))), (6)

where rand_3 is a random number in [0, 1].

3. Decomposition procedure: this represents the last phase in the ecosystem, in which each agent dies and its remains are broken down. This step represents the exploitation of AEO and is formulated as in [41]:

X_i(t + 1) = X_n(t) + D (e X_n(t) − h X_i(t)), D = 3u, u = Norm(0, 1), e = rand_4 randi([1, 2]) − 1, h = 2 rand_4 − 1. (7)

In Equation (7), the parameter D refers to the decomposition factor, and h and e represent the weight parameters. rand_4 is a random number generated from [0, 1], and randi([1, 2]) returns either 1 or 2 with equal probability.
The steps of AEO are given in Algorithm 1. Moreover, AEO has some advantages over other metaheuristic (MH) techniques: no parameters need to be tuned during the optimization process, and it has a high ability to balance exploration and exploitation, which improves convergence, avoids getting stuck in local optima and, consequently, improves the quality of the output. These characteristics make it well suited for combining with a DNN.
Algorithm 1: Artificial Ecosystem-based Optimization (AEO)
Inputs: N: the number of solutions; T_max: the total number of iterations.
Generate the initial ecosystem X (solutions).
Compute the fitness value Fit_i of each solution; X_1 is the best solution.
repeat
    Update X_1 using Equation (2). (Production)
    for each solution X_i do
        if herbivore: update X_i using Equation (3)
        else if omnivore: update X_i using Equation (6)
        else: update X_i using Equation (5) (Carnivore)
    Compute the fitness of each X_i.
    Find the best solution X_1.
    Update each X_i using Equation (7). (Decomposition)
    Compute the fitness of each X_i.
    Update the best solution X_1.
until t = T_max.
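The update rules of Equations (2)–(7) and the loop of Algorithm 1 can be sketched in a compact numpy implementation. This is an illustrative sketch, not the authors' code: the population size, bounds, the greedy acceptance in the decomposition step, and the global-best bookkeeping are assumed choices, shown here on a simple sphere function.

```python
import numpy as np

def aeo(fitness, dim, n=20, t_max=100, lb=-10.0, ub=10.0, seed=0):
    """Minimise `fitness` with Artificial Ecosystem-based Optimization.
    Compact sketch of Equations (2)-(7)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, dim))
    fit = np.array([fitness(x) for x in X])
    g_best, g_fit = X[np.argmin(fit)].copy(), fit.min()
    for t in range(t_max):
        order = np.argsort(fit)[::-1]        # sort worst-first: X[0] is the producer
        X, fit = X[order], fit[order]
        best = X[-1].copy()                  # the decomposer (best solution)
        # Production, Eq. (2): producer moves between the best and a random point.
        a = (1.0 - t / t_max) * rng.random()
        X[0] = (1.0 - a) * best + a * rng.uniform(lb, ub, dim)
        # Consumption, Eqs. (3)-(6): consumers eat the producer and/or another consumer.
        for i in range(1, n):
            C = 0.5 * rng.standard_normal() / abs(rng.standard_normal())  # Eq. (4)
            j = rng.integers(0, i)           # a random higher-energy individual
            r = rng.random()
            if r < 1 / 3:                    # herbivore, Eq. (3)
                X[i] = X[i] + C * (X[i] - X[0])
            elif r < 2 / 3:                  # omnivore, Eq. (6)
                r2 = rng.random()
                X[i] = X[i] + C * (r2 * (X[i] - X[0]) + (1 - r2) * (X[i] - X[j]))
            else:                            # carnivore, Eq. (5)
                X[i] = X[i] + C * (X[i] - X[j])
        X = np.clip(X, lb, ub)
        fit = np.array([fitness(x) for x in X])
        best = X[np.argmin(fit)].copy()
        # Decomposition, Eq. (7): redistribute every solution around the best one.
        r3 = rng.random(n)
        D = 3.0 * rng.standard_normal(n)                 # decomposition factor
        e = r3 * rng.integers(1, 3, n) - 1.0             # weight parameter e
        h = 2.0 * r3 - 1.0                               # weight parameter h
        X_new = np.clip(best + D[:, None] * (e[:, None] * best - h[:, None] * X), lb, ub)
        fit_new = np.array([fitness(x) for x in X_new])
        better = fit_new < fit                           # greedy acceptance (sketch choice)
        X[better], fit[better] = X_new[better], fit_new[better]
        if fit.min() < g_fit:                            # track the global best
            g_best, g_fit = X[np.argmin(fit)].copy(), fit.min()
    return g_best, g_fit

best_x, best_f = aeo(lambda x: float(np.sum(x ** 2)), dim=5)
```

On the 5-dimensional sphere function the routine converges toward the origin, illustrating how the decomposition step concentrates the population around the best solution.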

Proposed MobileNet-AEO for Chest X-ray Classification Approach
The proposed MobileNet-AEO approach starts by extracting deep features from raw chest X-ray images using MobileNet, as discussed in Section 2.1. About 50 K features are produced, representing the 7 × 7 × 1024 output of the last layer.
Then, the selection of the optimal subset of relevant features is performed using a binary AEO as the feature selection technique. AEO sets the initial values for N solutions, each of dimension equal to the total number of extracted features, Dim. This process is formulated as:

U_i = LB_i + α_i × (UB_i − LB_i), (8)

where α_i ∈ [0, 1] is a randomly generated number [46], and LB_i = 0 and UB_i = 1 indicate the lower and upper boundaries of the search domain. Thereafter, each solution U_i is converted into a binary vector BU_i using Equation (9):

BU_ij = 1 if U_ij > 0.5, and BU_ij = 0 otherwise. (9)
The binary vector is the main device for determining the relevant features: the extracted features corresponding to zeros are removed, and the remaining features represent the relevant ones. For example, assume U_i = [0.58, 0.94, 0.72, 0.21, 0.12, 0.78]. Using Equation (9), BU_i = [1, 1, 1, 0, 0, 1]. Then, the 4th and 5th features are removed, while the remaining features are kept as the relevant ones.
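The thresholding of Equation (9) and the masking step can be sketched directly in numpy, using the worked example above (the small feature matrix is illustrative):

```python
import numpy as np

def binarize(u, threshold=0.5):
    """Eq. (9): map a continuous solution to a binary feature mask."""
    return (np.asarray(u) > threshold).astype(int)

u_i = np.array([0.58, 0.94, 0.72, 0.21, 0.12, 0.78])
bu_i = binarize(u_i)               # -> [1, 1, 1, 0, 0, 1]
selected = np.flatnonzero(bu_i)    # indices of the kept (relevant) features

# Applying the mask to a feature matrix keeps only the selected columns.
features = np.arange(12).reshape(2, 6)   # 2 samples, 6 features
reduced = features[:, selected]          # 2 samples, 4 features
```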
To evaluate the quality of the features selected by the current solution U_i, the following fitness function is used:

Fit_i = λ γ_i + (1 − λ) |BU_i| / Dim, (10)

where |BU_i| is the total number of selected features and γ_i is the classification error. λ balances the classification error (the first part of Equation (10)) against the ratio of selected features (the second part). At this stage, a KNN classifier is used to assess the performance of the selected features on the training set (here, 80% of the whole dataset).
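The fitness function of Equation (10) can be sketched with scikit-learn's KNN. This is an illustrative sketch on synthetic data, not the authors' code: the values of λ, the number of neighbours k, and the internal hold-out split are assumptions (the paper does not state them).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def fitness(bu, X, y, lam=0.99):
    """Eq. (10): Fit = lam * classification error + (1 - lam) * |BU| / Dim."""
    selected = np.flatnonzero(bu)
    if selected.size == 0:
        return 1.0                   # an empty mask is the worst case
    Xtr, Xte, ytr, yte = train_test_split(
        X[:, selected], y, test_size=0.2, random_state=0, stratify=y)
    knn = KNeighborsClassifier(n_neighbors=5).fit(Xtr, ytr)
    error = 1.0 - knn.score(Xte, yte)            # classification error gamma_i
    return lam * error + (1 - lam) * selected.size / X.shape[1]

rng = np.random.default_rng(0)
X = rng.random((200, 30))
y = (X[:, 0] > 0.5).astype(int)                  # class depends only on feature 0
good = fitness(np.eye(30, dtype=int)[0], X, y)   # mask keeping only feature 0
bad = fitness(np.ones(30, dtype=int), X, y)      # mask keeping everything
```

A mask that keeps only the informative feature scores a lower (better) fitness than one that keeps all 30, which is exactly the pressure Equation (10) exerts on AEO.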
The next step is to find the best solution U_b, which has the smallest Fit_b, followed by updating the solutions using the operators of the traditional AEO, as discussed in Section 2.2. The updating process is repeated until the total number of iterations is reached. The best solution U_b is the output of this stage; to evaluate its quality, the testing set (20% of the dataset) is reduced according to it. Again, KNN is used to predict the targets of the testing set, and the classification metrics are computed. The outline of the proposed MobileNet-AEO approach is presented in Figure 2. Table 1 illustrates the variation in each dataset in terms of morphology, structure, shape, and zoom level. The image sizes (height and width) vary from 200 to 4000 pixels, and each dataset has a different image file format.

Evaluation
We used accuracy (Acc), sensitivity (Sens), specificity (Spec) [49] and time consumption as fitness measures. These are defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN), (11)
Sensitivity = TP / (TP + FN), (12)
Specificity = TN / (TN + FP), (13)

where TP (true positives) is the number of TB samples that were labeled correctly, TN (true negatives) is the number of not-TB samples that were labeled correctly, FP (false positives) is the number of not-TB samples that were incorrectly labeled as TB, and FN (false negatives) is the number of TB samples that were misclassified as not TB.
We also calculate the standard deviation (STD) of the fitness measures as follows:

STD = sqrt( (1/r) Σ_{i=1}^{r} (Fit_i − μ)² ), (14)

where r is the number of runs, Fit_i denotes the fitness function value of run i, and μ represents the average fitness value over all r runs.
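The evaluation measures of Equations (11)–(14) translate directly into code; the confusion counts below are illustrative, not taken from the paper's experiments.

```python
import math

def metrics(tp, tn, fp, fn):
    """Eqs. (11)-(13): accuracy, sensitivity and specificity from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)   # true-positive rate over all actual TB samples
    spec = tn / (tn + fp)   # true-negative rate over all actual not-TB samples
    return acc, sens, spec

def std(fits):
    """Eq. (14): standard deviation of a fitness measure over r runs."""
    r = len(fits)
    mu = sum(fits) / r
    return math.sqrt(sum((f - mu) ** 2 for f in fits) / r)

acc, sens, spec = metrics(tp=45, tn=40, fp=5, fn=10)   # illustrative counts
```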

Implementation Environment
The proposed approach was implemented in Python 3 on Windows 10 64-bit using a Core i7 CPU and 8 GB of RAM, together with Google Colaboratory "Colab" [50]. The model was developed using the Keras library [51] with a TensorFlow backend [52]. The feature selection experiments were performed in Matlab 2018b on a computer with a Core i5 CPU and 8 GB of RAM running Windows 10.

Parameters
The proposed approach was trained on 80% of the samples (537 for the Shenzhen dataset and 4160 for Dataset 2), while the remaining 20% (134 for the Shenzhen dataset and 1040 for Dataset 2) were used for testing (external validation of) the model's performance. There is no overlap between the two sets, and all results are given on the testing set. For building MobileNet, we adopted the following parameters: a learning rate of 0.0001, a mini-batch size of 20, and binary cross-entropy as the loss function. We also adopted RMSprop as the optimization algorithm [53]. In total, the model has 4,253,864 trainable parameters.
The parameters of each algorithm are set to their defaults. The input to all feature selection algorithms is the set of features extracted by MobileNet. Tables 2 and 3 show the results of the feature selection process over the two datasets, Shenzhen and Dataset 2. From the presented results, it can be noticed that our AEO-based method outperforms the other methods. For example, in terms of accuracy, it achieves the first rank, followed by SCA and TLBO in second and third rank, respectively. Similarly, in terms of sensitivity, AEO performs well, as it has the highest best, worst, and mean sensitivity values. In terms of specificity, AEO shows a better best value than all other methods; however, SCA and TLBO provide better mean and worst specificity values. These results indicate the high ability of the proposed AEO algorithm to find the relevant features, as reflected in its best values of accuracy, sensitivity, and specificity. In addition, analyzing the worst-case values of the performance measures, one can notice that AEO outperforms the other FS methods in sensitivity, while in accuracy and specificity it takes the third rank after SCA and TLBO, respectively. Regarding the stability of the FS algorithms, as measured by the STD, WOA, AEO, and TLBO are the most stable in terms of accuracy, sensitivity and specificity. Moreover, analyzing the Best, Worst, and STD values for Dataset 2, AEO shows high superiority over the other models, except for the STD of accuracy, where GWO is more stable. Figure 3 shows a comparison between the proposed approach and the other feature selection algorithms. From this figure, it can be seen that the proposed approach shows advantages over the other algorithms in terms of accuracy, sensitivity and specificity in all experiments (best, mean and worst).
Table 4 depicts the performance of AEO in terms of computation time and number of selected features for the Shenzhen dataset and Dataset 2. For the Shenzhen dataset, AEO takes the fourth rank, selecting 24.6 features on average, compared to WOA, which ranked first by selecting the smallest feature subset (only 10.8 features on average). However, the features extracted by WOA are less efficient, as they show lower performance compared to AEO, as seen in Tables 2 and 3. In contrast, on Dataset 2, the proposed AEO provides the smallest and most significant feature set among all methods.

Comparison with Other CNN Models
Here we compare the proposed approach (MobileNet-AEO) to MobileNet and other CNN models based on the classification evaluation criteria (i.e., accuracy, specificity and sensitivity). Table 5 compares the features extracted from MobileNet with those selected by our method: only 0.05% and 0.038% of the MobileNet features were retained for Shenzhen and Dataset 2, respectively. The MobileNet-AEO method, using only 25 and 19 features (for the Shenzhen dataset and Dataset 2, respectively), shows better performance in all classification measures than the basic MobileNet feature set of some 50 K features. Figure 4 presents a comparison between the proposed approach and several efficient CNN architectures: VGGNet (VGG 16 and VGG 19) [34], ResNet [35], NasNet [36], MobileNet [37], Inception [38] and Xception [39]. As shown in Figure 4 (top), the proposed MobileNet-AEO approach outperforms all other CNN architectures in both accuracy and sensitivity, with a slight advantage over the basic version of MobileNet, while VGG 19 comes first in specificity with 91% compared to our 90%. In Figure 4 (bottom), which represents Dataset 2, the proposed approach shows an advantage over all CNNs in all classification criteria. ResNet comes last on both datasets in accuracy, sensitivity and specificity. Note that NasNet was excluded from the second dataset's experiments due to resource limitations: it is the deepest of the CNNs considered and produces about 487 K features.

Comparison with Related Works
Here we compare our results with relevant works. Table 6 shows the most recent published works on both Shenzhen and Dataset 2. As seen in Table 6 (top), the proposed approach has an advantage in accuracy over other recent works. Moreover, the three models with the highest performance all used a CNN as the feature extractor, which suggests that CNNs can extract the most informative features and thereby improve a model's performance. Although Jaeger et al. [17] extracted features from chest X-ray images and then adopted various classification methods, the results they reported were achieved using low-level image features and linear logistic regression classifiers. Lopes et al. [18] adopted the bag-of-features method on features extracted from GoogLeNet, ResNet, and VGG networks and then applied an ensemble of individual SVM classifiers.
On Dataset 2, as shown in Table 6 (bottom), the proposed method also comes first among previous works on the same dataset. Although the authors of [61] claim a classification accuracy of 94.3%, they provided no further details about their proposed model, which they named Sequential CNN.

Conclusions
In this paper, a new hybrid method for tuberculosis X-ray image classification was introduced. The method is based on extracting features from chest X-ray images using a MobileNet deep neural network, and then filtering the resulting huge number of features using the recently proposed Artificial Ecosystem-based Optimization (AEO) algorithm, so as to include only the relevant features and exclude the irrelevant ones.
The classification approach (MobileNet-AEO) was validated on two publicly available chest X-ray datasets, the Shenzhen dataset and Dataset 2. MobileNet-AEO performed well, achieving high classification accuracy while reducing complexity, which positively affects computation time. It successfully reduced the number of features from 50 K to only 25 and 19, achieving accuracies of 90.2% and 94.1% for the Shenzhen dataset and Dataset 2, respectively, while increasing performance at the same time. The proposed MobileNet-AEO approach outperforms all published works on the two datasets, and also showed an advantage when compared to other convolutional network models. Our future work will include building a hybrid approach that combines a transfer learning model and a swarm optimisation algorithm to build a classification model for COVID-19 diagnostics from chest radiographs.

Conflicts of Interest:
The authors declare no conflict of interest.