Automatic Detection of Pulmonary Nodules using Three-dimensional Chain Coding and Optimized Random Forest

Abstract: The detection of pulmonary nodules on computed tomography scans provides a clue for the early diagnosis of lung cancer. Manual detection imposes a heavy radiological workload because it identifies nodules slice by slice. This paper presents a fully automated nodule detection scheme with three significant contributions. First, an automated seeded region growing algorithm is designed to segment the lung regions from the tomography scans. Second, a three-dimensional chain code algorithm is implemented to refine the border of the segmented lungs. Lastly, nodules inside the lungs are detected using an optimized random forest classifier. The experiments for our proposed detection are conducted using 888 scans from a public dataset and achieve a favorable result of 93.11% accuracy, 94.86% sensitivity, and 91.37% specificity, with only 0.0863 false positives per exam.


Introduction
Lung cancer is a disease that mainly affects the lungs, the principal organ of the respiratory system, which performs vital activities for human survival. It usually starts once the epithelial cells of the bronchioles or alveoli grow abnormally. If the lungs cannot carry out their functions properly due to lung cancer, other parts of the body can also be affected, eventually leading to death. This fact makes the mortality rate of lung cancer higher than that of other types of cancer. The global cancer data of the World Health Organization (WHO) state that lung cancer caused the largest number of cancer deaths, 1.8 million, which was 18.4% of the total cancer deaths in 2018 [1]. Effective ways to reduce the mortality of lung cancer include prevention, early detection, and precise treatment. However, most patients notice lung cancer only at the advanced and incurable stages because the disease is asymptomatic in its immature stage.
Importantly, since the advent of medical imaging technologies, computed tomography (CT) has become an effective modality for the early diagnosis of lung cancer. According to the National Lung Screening Trial (NLST), the mortality rate of lung cancer was reduced by up to 15%-20% in patients who underwent low-dose CT screening [2]. CT can provide detailed morphologic information about lung cancer by revealing abnormal lesions inside the lungs, called pulmonary nodules. These are round or oval-shaped opacities on the CT scans with well-defined or irregular margins. Nodules are generally regarded as an early signal of lung cancer. They form due to the inflammation of pulmonary structures and often arise in reaction to infection and disease. However, not all nodules lie fully inside the lungs; some adhere to the lung boundary and are easily lost during segmentation.
If concave regions along the border are detected, they are assumed to be boundary-attached nodules, and enclosing them is attempted. The bidirectional chain code in [6] is an improved version of the original Freeman chain code. Unlike the original, it traces the boundary pixels in both the horizontal and vertical directions using two code-word systems. Compared to other border reconstruction schemes, chain coding algorithms are better guaranteed to detect nodules of varying sizes and shapes because they do not depend on a structuring element (SE) or window. Nevertheless, they need extra effort to enclose the concave regions. For example, [6] used an SVM to detect the critical point pairs that enclose the concave areas along the lung boundary. Since the SVM is a supervised learning method, it needs historical data to train the model and may therefore suffer training-related problems such as overfitting and underfitting. An alternative border reconstruction method, known as contour-marching (CM), was proposed in [22]. It corrects the lung boundary by checking texture features.
If a region suspected to be a border-attached nodule is detected, CM encloses that region into the lungs using the region growing method. However, texture features alone are insufficient to distinguish the border-attached nodules.
The second step, false reduction, aims to discriminate the nodules from a high number of suspicious structures. This task can be accomplished using a classification algorithm; rule-based methods [23], the k-nearest neighbour (KNN) [20], and the support vector machine (SVM) [6-10,12,13] are popular algorithms for false reduction. Similarly, neural network-based classifiers, such as the artificial neural network (ANN) [11,21] and the convolutional neural network (CNN) [24,25], are also found in this step. Besides, ensemble classifiers like the integrated ANN [26] and the random forest (RF) [27] are applicable for false reduction. As there are several classification algorithms, it is difficult to pick the best one because the performance of each classifier depends on the segmentation methods and the dataset applied. However, the recent trend of automated lung nodule detection aims at ever better performance, and a de facto nodule detection scheme that is fully automatic, increases sensitivity, reduces false positives, and ensures the detection of variant nodules is still in demand for practical use.

Related Works
As automated nodule detection has been an active research area since 1980, numerous research papers have been published on this topic, proposing different algorithms on different datasets with different focuses. Among them, we studied related research that shares our objective and processes. As our experiments are based on the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) [28][29][30], we reviewed previous works using the same dataset. An extensive review of nodule detection on other datasets can be found in [31].
In 2015, Demir and Çamurcu [9] proposed computer-aided lung nodule detection using outer surface extraction. For lung region segmentation, they applied multiple thresholding, which uses two threshold values: maximum and minimum. Afterward, possible nodules were detected using labeling and rule-based methods. From each possible nodule, four feature groups, including (i) morphological features, (ii) statistical and histogram features, (iii) outer surface statistical and histogram features, and (iv) outer surface textural features, were extracted. Based on these features, false reduction was conducted using a support vector machine (SVM) optimized by the particle swarm algorithm. Their experiments were performed using 200 CT exams of the LIDC-IDRI dataset. Their findings revealed that using outer surface features can raise performance to 98.03% sensitivity, 87.71% selectivity, 90.12% accuracy, and 2.45 false positives per scan. As a weakness, their work could not detect nodules whose intensities lay outside the maximum-minimum threshold range. Moreover, they did not take into account the detection of nodules attached to the lung boundary.
In addition, Arindra et al. [24] implemented a novel nodule detection using multi-view convolutional networks (ConvNets). They initiated suspicious nodule detection by combining three existing candidate detectors, designated for solid, sub-solid, and large solid nodules, respectively. Subsequently, multiple streams of 2D ConvNets were created using 2D patches of the detected nodule candidates. ConvNets can learn discriminative features automatically from the training data in order to perform the false reduction. Their detection was evaluated using three datasets, and the highest performance (90.1% sensitivity with four false positives per scan) was obtained on the LIDC-IDRI dataset. Even though ConvNets are powerful for feature learning and classification, they demand very high computational costs, especially graphical processing unit (GPU) resources.
Recently, Zhang et al. [13] developed a CAD for nodule detection that applied 3D skeleton features. For lung region segmentation, they used the global optimal active contour model, which can enclose both solitary and boundary-attached nodules. Then, the suspicious nodules scattered in the lungs were segmented using thresholding, morphological operations, and labeling methods. Ten features, including the 3D skeleton features, were extracted, and invalid nodule candidates were eliminated based on prior anatomical knowledge. Finally, the real nodules were segregated by SVM-based false reduction. Their CAD was tested on 71 exams of LIDC-IDRI and reported 89.3% sensitivity, 93.6% accuracy, and 2.1 false positives per subject. Nonetheless, their CAD did not address the detection of small nodules surrounded by vessels.
Moreover, Yuan et al. [25] presented a lung nodule classification scheme based on hybrid features. Instead of using a segmentation method to detect suspicious nodules, they created icosahedrons to sample the spherical structures. From these samples, they estimated nodule radii based on threshold values and created a 3D volume of nodule candidates. Then, convolutional neural network (CNN)-based statistical features and Fisher vector (FV)-based geometrical features were extracted from the volume candidates. These features were fused and fed into a multi-class support vector machine (SVM) for the classification of different nodule types. Their scheme was evaluated using two datasets, namely LIDC-IDRI and ELCAP, achieving classification rates of 93.1% on LIDC-IDRI and 93.9% on ELCAP, respectively. However, their nodule voxel selection was based on a radius threshold; thus, it is challenging to scale the scheme to the detection of variably sized and very tiny nodules.
The primary objective of this paper is to present a fully automated scheme that can detect various types of nodules with high accuracy. Unlike the previous works, we present three significant contributions. For the first contribution, we propose an automated seeded region growing algorithm for the segmentation of lung regions. It is an advance on the original region-growing method and overcomes the weaknesses of the conventional one, in particular, the need for proper initial regions and homogeneity criteria. For the second contribution, we present a three-dimensional chain coding algorithm to detect the boundary-attached nodules. This algorithm traces the voxelized edge of the lungs to find the juxta-pleural nodules. As it is based on a three-dimensional codeword system, it can provide more specific information about the nodules than the two-dimensional methods. Moreover, it does not need an extra algorithm to find the start and end points of the boundary gap. As an advantage, we can eliminate the training-related problems as well as save computational effort. For the third contribution, we develop an optimized random forest (RF) for false reduction. The original RF is a successful ensemble method for nodule detection, and we further improve its performance by combining it with attribute selection algorithms in order to obtain more accurate diagnosis results. The rest of this paper is organized as follows. Section 2 describes the details of the materials and methods used in this work. Section 3 designs the proposed nodule detection and evaluates the experimental results. Lastly, Section 4 summarizes the paper by discussing the scope of future works.

LIDC-IDRI Dataset
The Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) [28][29][30] is a well-known public dataset of thoracic CT scans for lung cancer. This dataset is comprised of 1018 exams, and each exam contains sequential, two-dimensional (2D) CT slices in DICOM format. Each slice has a dimension of 512 × 512 pixels, and the number of slices per exam varies from 65 to 764. Moreover, each exam in the dataset contains an XML file that annotates the diagnosis results. These results are accurate because four experienced thoracic radiologists confirmed them through a two-phase reading process.
In our proposed nodule detection, we applied 888 exams of LIDC-IDRI; the remaining exams contained missing or inconsistent slices and were excluded from our experiments. We applied the annotation files as the ground truths to train and evaluate our method. Instead of detecting the pulmonary nodules directly from the 2D scans, we reconstructed three-dimensional volumetric images by stacking all the 2D slices in a single exam, i.e., each exam gave a volume image, which was used as an input for our nodule detection.

Methods
Our proposed nodule detection has two fundamental steps: (i) suspicious nodule detection and (ii) false reduction, as illustrated in Figure 1. The first step is conducted using three methods, namely, (i) automated seeded region growing, (ii) three-dimensional chain coding, and (iii) preliminary screening. In the second step, there are two methods: (i) 2D and 3D feature extraction and (ii) an improved random forest classifier. We discuss the technical details of these five methods in the following sub-sections.


Automated Seeded Region Growing
The automated seeded region growing is an enhanced version of the original seeded region growing (SRG) algorithm [32]. SRG is famous for segmentation due to its rapid, robust, and easy-to-use procedure. Unlike other region-based algorithms, SRG does not require homogeneity criterion to determine the similarity between the voxels. Instead, it starts the segmentation by assigning sets of voxels called the initial seed points, where each set represents a separate region in the image. In each step of the algorithm, SRG assigns an unlabelled voxel v x,y,z into one of the initial regions. The algorithm terminates when all the voxels in the image are labelled into different regions. For example, assume a volume image I has initial seed regions (S 1 , S 2 , S 3 . . . S n ) and a set of unlabelled voxels V. Then, we want to label each element of V to each S i , where i = 1, 2, 3, . . . n.
where N(v x,y,z ) is the set of 26-connected neighbors of v x,y,z . After the m-th step of the algorithm, N(v x,y,z ) meets exactly one of the initial seed regions S i , i.e., N(v x,y,z ) ∩ S i ≠ ∅. Then, we calculate the grey-value difference between v x,y,z and its adjacent region using the following equation.
That is, ∂ = | g(v x,y,z ) − (1/n r ) Σ p∈r g(p) |, where g(v x,y,z ) is the grey value of the unlabelled voxel and the second term is the mean grey value of all the voxels in the adjacent region r, which contains n r voxels.
If v x,y,z is located on the boundary of two or more regions, we assign it to the region that has the minimum ∂ value.
In this way, SRG allocates all the unlabelled voxels into separate regions. SRG is easy to use, even for an unskilled user who does not have detailed knowledge of the input image. Nevertheless, the segmentation result of SRG strongly depends on the initial seed regions. If the seed regions fall on noisy or outlier voxels, the segmentation will be poor. Moreover, the user-driven seed selection of SRG is also a flaw for fully automated detection.
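The assignment loop described above can be sketched as follows. This is a minimal, hypothetical implementation (function and variable names are ours, not the paper's): voxels adjacent to a labelled region are visited breadth-first over the 26-neighborhood, and a voxel touching several regions is assigned by the minimum grey-value-difference rule.

```python
import numpy as np
from collections import deque

def seeded_region_growing(volume, seeds):
    """Label every voxel of `volume` by growing from the given seed points.

    `seeds` maps a region label (1, 2, ...) to a starting (z, y, x) coordinate.
    Voxels touching several regions go to the region whose mean grey value
    is closest to their own (the minimum-difference rule in the text).
    """
    labels = np.zeros(volume.shape, dtype=int)
    sums = {r: 0.0 for r in seeds}   # running grey-value sum per region
    counts = {r: 0 for r in seeds}   # region sizes, for the running means
    frontier = deque()
    for r, (z, y, x) in seeds.items():
        labels[z, y, x] = r
        sums[r] += volume[z, y, x]
        counts[r] += 1
        frontier.append((z, y, x))
    # the 26-connected neighborhood offsets
    offsets = [(dz, dy, dx) for dz in (-1, 0, 1) for dy in (-1, 0, 1)
               for dx in (-1, 0, 1) if (dz, dy, dx) != (0, 0, 0)]

    def in_bounds(z, y, x):
        return (0 <= z < volume.shape[0] and 0 <= y < volume.shape[1]
                and 0 <= x < volume.shape[2])

    while frontier:
        z, y, x = frontier.popleft()
        for dz, dy, dx in offsets:
            nz, ny, nx = z + dz, y + dy, x + dx
            if not in_bounds(nz, ny, nx) or labels[nz, ny, nx]:
                continue
            # collect the labelled regions adjacent to this unlabelled voxel
            adjacent = set()
            for ddz, ddy, ddx in offsets:
                az, ay, ax = nz + ddz, ny + ddy, nx + ddx
                if in_bounds(az, ay, ax) and labels[az, ay, ax]:
                    adjacent.add(labels[az, ay, ax])
            # minimum grey-value difference to the adjacent region means
            g = volume[nz, ny, nx]
            best = min(adjacent, key=lambda r: abs(g - sums[r] / counts[r]))
            labels[nz, ny, nx] = best
            sums[best] += g
            counts[best] += 1
            frontier.append((nz, ny, nx))
    return labels
```

On a toy volume with one dark and one bright column, the two seeds grow into exactly those two regions, mirroring the low-density/high-density split used for the lungs.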
Therefore, we propose an enhanced version of SRG, which identifies the initial seed points automatically. Seed point selection is a vital task for SRG because correct segmentation results are strongly correlated with the seed points. Each seed point should represent a typical grey value of its region, and it should not be a noisy or outlier voxel. Fortunately, on a CT scan, specific organs exhibit a range of grey values depending on their radiographic density, which is measured in Hounsfield units (HU). Based on this fact, we applied the HU values of the input CT scan for automatic seed point selection.
As one instance, Figure 2a illustrates a histogram showing the distribution of HU values in the input CT volume. In this histogram, the HU values range from −1024 to 2513. Then, we can calculate the mean HU value of the image to find the seed points; the mean HU of the CT volume in this example is 147.78. Based on this mean HU value, we can assign the seed points automatically. In our research, we identify two initial seed points. The first seed point is selected by finding the very first voxel whose value is less than the mean HU. In Figure 2b, the first seed point is denoted by a blue square, and it represents the black (low-density) regions. Likewise, the second seed point is set as the very first voxel whose value is higher than the mean HU (denoted by a red square in Figure 2b), and it represents the grey (high-density) regions. Based on these two seed points, the segmentation of the lungs can be performed by following the steps of SRG.
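The mean-HU seed selection can be sketched in a few lines. This is an illustrative reading of the text (names are ours): the first voxel in scan order below the volume mean seeds the low-density region, and the first voxel above it seeds the high-density region.

```python
import numpy as np

def select_seed_points(volume):
    """Pick two seed points from the mean HU of the whole CT volume:
    the first voxel (in scan order) below the mean represents the
    low-density (dark) regions, and the first voxel above the mean
    represents the high-density (grey) regions."""
    mean_hu = volume.mean()
    flat = volume.ravel()
    low_idx = int(np.argmax(flat < mean_hu))   # first voxel below the mean
    high_idx = int(np.argmax(flat > mean_hu))  # first voxel above the mean
    low_seed = np.unravel_index(low_idx, volume.shape)
    high_seed = np.unravel_index(high_idx, volume.shape)
    return mean_hu, low_seed, high_seed
```

The two returned coordinates can then be passed directly to the region-growing step as the initial seed regions.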

Three Dimensional Chain Coding
The segmentation of the lung regions usually fails when nodules adhere to the boundary of the lungs. Such nodules are known as juxta-pleural nodules and exhibit intensities similar to those of the lung wall. Thus, during segmentation, they are usually excluded together with the lung wall, producing an under-segmented result. As an instance, Figure 3a demonstrates a 2D CT slice containing a juxta-pleural nodule (denoted by a red rectangle), and Figure 3b shows the 3D segmentation result of the lungs using the automated SRG. From this image, we can notice that the juxta-pleural nodule is under-segmented (denoted by a red rectangle).
Appl. Sci. 2020, 10, x FOR PEER REVIEW

Chain coding [4,5] is the most frequently used algorithm to solve the problem of juxta-pleural nodule segmentation. It traces the pixels/voxels along the contour of the lungs and determines whether there is a discrete path or not. If a discrete region is found, it is assumed to be the place of the juxta-pleural nodule and it is enclosed by connecting the lung border. The bi-directional chain coding method in [6] is an advancement to the original chain coding in [5]. It uses two-dimensional coordinate systems to trace the boundary pixels in both the horizontal and vertical directions. It works well for juxta-pleural nodule segmentation, but it needs to operate on each 2D slice of the input CT exams. As a negative consequence, it consumes a lot of processing time and effort. Additionally, it needs a supplementary algorithm, for example, the support vector machine (SVM) [6], to determine and connect the start-point and end-point of the discrete path.
Unlike the chain codes in [4,5], we propose a three-dimensional chain code algorithm [33,34] for juxta-pleural nodule detection. The main difference is that the chain coding in [4,5] works on 2D segments of the lung regions, whereas our proposed 3D chain coding works on the 3D segments of the lung volume. Firstly, we extract the voxelized edge V, i.e., a single-voxel outline, of the segmented lungs. Suppose the boundary V contains N voxels, and V(i) is the i-th voxel of the boundary, where i = {1, 2, . . . , N}. Then, Freeman's F26 codeword system [33] is applied to trace the directional changes between V(i) and its adjacent voxel V(i + 1).
As illustrated in Figure 4, F26 views a boundary voxel as a cubic lattice, and there are three types of adjacency between neighboring voxels. A cube is comprised of six faces, twelve edges, and eight vertices; thus, there are twenty-six possible directions. The associated codeword values of these directions are also described in Figure 4, using the characters a to z. The codeword values are assigned depending on the direction of the movement from a voxel to its adjacent one. A movement between two voxels is defined as a set of codeword values c(i), and each c(i) contains three elements that represent a face, edge, and vertex connection, such that c(i) ∈ {0, 1, −1}. For a visual explanation, Figure 5 demonstrates a path of the voxelized edge, whose associated codeword values can be assigned based on the adjacencies and directions between voxels along that path. The path contains seven voxels; thus, there are six directional movements between its voxels. Then, the associated codeword values of these movements can be defined by six sets, where each set contains three elements, as described below.
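A minimal sketch of this encoding, under our reading of F26: each step between consecutive 26-connected voxels becomes a three-element code word with components in {−1, 0, 1}, and the number of non-zero components tells whether the step crosses a face, an edge, or a vertex of the cubic lattice. Function names are hypothetical.

```python
def chain_code_3d(path):
    """Encode a voxel path as F26-style code words: for each step the
    code word is the (dx, dy, dz) offset between consecutive voxels,
    with every component in {-1, 0, 1}."""
    codes = []
    for (x0, y0, z0), (x1, y1, z1) in zip(path, path[1:]):
        c = (x1 - x0, y1 - y0, z1 - z0)
        assert all(d in (-1, 0, 1) for d in c) and c != (0, 0, 0), \
            "consecutive voxels must be 26-connected"
        codes.append(c)
    return codes

def adjacency_type(code):
    """Face, edge, or vertex adjacency, from the number of moving axes:
    one non-zero component crosses a face, two an edge, three a vertex."""
    return {1: "face", 2: "edge", 3: "vertex"}[sum(d != 0 for d in code)]
```

A path of seven voxels thus yields six code words, exactly as in the Figure 5 example.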
After assigning the codeword values for all voxels along the boundary, we need to detect the points where the convexity changes. These points are called the inflection points (denoted by the red stars in Figure 6b). They can be detected by the differential operation of the codeword values, d(i) = c(i) − c(i − 1), where i is the i-th voxel of the lung boundary and c(i) is the corresponding codeword value of that voxel. The non-zero points after the differential operation of c(i) are then defined as the inflection points f(i). Once the inflection points are detected, we calculate and fill the volume of the gap surrounded by them, as demonstrated in Figure 6. Unlike hole filling using an SE [4,7,10,12,20,21], our border reconstruction using 3D chain coding can accurately fill the volume of the gap, i.e., the exact region of the juxta-pleural nodule is filled without under- or over-filling.
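The differential operation on the code words can be sketched as a first difference, under the assumption (ours) that a code word is the three-element direction tuple of each step: wherever the difference between successive code words is non-zero, the boundary changes direction and the voxel is flagged as a candidate inflection point.

```python
def inflection_points(codes):
    """Detect convexity changes along the boundary by differencing
    successive code words: wherever the first difference of the chain
    code is non-zero, the boundary changes direction, and that index
    is flagged as a candidate inflection point."""
    points = []
    for i in range(1, len(codes)):
        diff = tuple(a - b for a, b in zip(codes[i], codes[i - 1]))
        if any(d != 0 for d in diff):
            points.append(i)  # index of the step where direction changes
    return points
```

Runs of identical code words (straight boundary segments) produce a zero difference and are skipped, so only the direction changes that can bound a concave gap survive.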

Preliminary Screening
Preliminary screening is a strategy to remove potential false positives (FPs) from the image. Generally, the automatic detection of nodules produces a relatively high number of non-nodule lesions, called FPs, because most lung structures, such as airways, bronchi, and blood vessels, possess a grey level similar to that of real nodules. As a negative impact, these structures are usually misclassified as nodules, resulting in a decline in classification accuracy. Additionally, a high FP count also increases the computational workload. Therefore, it is crucial to reduce FPs before the classification process.
In this paper, we initially performed simple thresholding to segment all the structures inside the lungs, such as nodules, airways, blood vessels, and fissures. These structures expose attenuations ranging from −910 HU to −500 HU [35] on the CT scans. We chose the low-density value of −910 HU as a threshold and segmented all the objects above that value. This helps to ensure the detection of nodules with ground-glass opacities, which have very faint intensities.
As an illustration, Figure 7a depicts the volumetric structure of the segmented lung regions using the automated seeded region growing and 3D chain-coding methods. Figure 7b describes the initial segmented lung structures using thresholding, and it contains a high number of FPs. To reduce these FPs, we initially applied 3D connected component labeling [36] and analyzed the shape of each labeled object. Generally, nodules are circular, while other lesions, especially vessels and fissures, are tubular in their structure. Based on this fact, we calculated three criteria, namely, surface area, eccentricity, and voxel remove rate (VRR) [13] in order to check the shape of each candidate object.
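The labeling-and-screening step can be sketched with `scipy.ndimage.label`. This is an illustrative stand-in (names and the use of voxel count in place of the paper's surface area are our assumptions): connected objects are extracted from the thresholded mask, and objects outside an empirical size range, mirroring T1 = 10 and T2 = 3700, are discarded as likely FPs.

```python
import numpy as np
from scipy import ndimage

def screen_candidates(mask, area_min=10, area_max=3700):
    """Label the thresholded lung structures and keep only objects whose
    voxel count lies inside [area_min, area_max]; huge objects (e.g.
    vessel trees) and tiny ones (noise) are discarded as likely false
    positives.  Voxel count is used here as a simple stand-in for the
    surface-area criterion in the text."""
    labeled, num = ndimage.label(mask)
    keep = np.zeros_like(mask, dtype=bool)
    for obj in range(1, num + 1):
        size = int((labeled == obj).sum())
        if area_min <= size <= area_max:
            keep |= labeled == obj
    return keep
```

The eccentricity and VRR criteria would then be applied to the surviving objects in the same per-label loop.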

Let l be a labeled object of the roughly segmented lung structure. Its surface area A can be calculated from the voxel values l(x, y, z) of l at the three-dimensional coordinate points (x, y, z). By checking the surface area, we can exclude huge and tiny objects that have a high probability of being non-nodules. We empirically set two threshold values, T1 = 10 and T2 = 3700, to eliminate the FPs outside that range.
Similarly, we calculated the eccentricity Ecc to check the roundness of each labeled object. The Ecc value ranges from 0 (a circular object) to 1 (a tubular object), and we empirically applied an Ecc threshold of 0.9 to remove the tubular objects.
Then, another criterion, the VRR, was calculated based on the skeleton structure of the labeled object. Skeletonization gives a topological structure by transforming the labeled object into an interconnected, single-voxel-wide centerline. By calculating the VRR, we could exclude elongated skeletal structures, such as the vessels and bronchioles.
The VRR can be expressed as VRR = (n(l) − n(ls))/n(l), where ls is the skeleton of l and n(·) counts the voxels of an object. Then, we empirically set the VRR threshold as 0.9 to remove the vessels and bronchioles.
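The intuition behind the VRR can be sketched numerically. This is a hedged reading of the criterion from [13] (the exact formula is not reproduced in the text): compact blobs collapse to a short centerline under skeletonization, so a large fraction of their voxels is removed, while elongated vessels keep a long skeleton and score lower.

```python
def voxel_remove_rate(n_object_voxels, n_skeleton_voxels):
    """VRR = fraction of voxels removed by skeletonization (a hedged
    reading of [13]).  Compact blobs collapse to a few skeleton voxels,
    so their VRR is close to 1; elongated vessels retain a long
    centreline and score lower."""
    return (n_object_voxels - n_skeleton_voxels) / n_object_voxels

def is_tubular(n_object_voxels, n_skeleton_voxels, threshold=0.9):
    """Flag elongated, vessel-like objects whose VRR falls below the
    empirical threshold (0.9 in the text)."""
    return voxel_remove_rate(n_object_voxels, n_skeleton_voxels) < threshold
```

For example, a 1000-voxel blob reduced to a 20-voxel skeleton has VRR = 0.98 and is kept, while a 100-voxel vessel segment with a 40-voxel skeleton has VRR = 0.6 and is removed.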
After a preliminary screening using these three criteria, the FPs could be eliminated, as illustrated in Figure 7c. However, it could not guarantee the removal of all FPs because the nodules might have attached to the adjacent pulmonary structures, particularly the vessels, in some cases. This type of nodule is called a juxta-vascular nodule, and it may be excluded, together with the tubular structures during preliminary screening. To prevent this, we set very high thresholds (0.9) for Ecc and VRR. Nonetheless, it was not a perfect way to detect all the vessel-attached nodules. A more effective way would be to cut the nodules from the attached vessels.
Fortunately, nodules have blob shapes in the three-dimensional structure; hence, a Hessian-based blob enhancement filter can help to discriminate them from the vessels. Let V be the volumetric image and H be its Hessian matrix, which can be calculated from the second-order partial derivatives of the voxels in V.
The eigenvalues of H are obtained by solving det(H − λI) = 0, where I is the identity matrix and λ = {λ 1 , λ 2 , λ 3 } denotes the associated eigenvalues of H. These three eigenvalues provide information about the shape of the objects: for a bright blob-like structure, all three eigenvalues are large and negative, whereas for a bright tube-like structure, one eigenvalue is close to zero and the other two are large and negative. Depending on the shape that we want to enhance, different mathematical computations can be done using these three eigenvalues. In our proposed method, we intend to detect pulmonary nodules. Hence, we enhance the blob-like structures and suppress the tube-like objects using the equations in [37].
where T is the cutoff threshold value (between 0 and 1), x denotes the coordinate points of V, and s is the Gaussian scale. Using these equations, we can discriminate the juxta-vascular nodules from the attached vessels, as illustrated in Figure 8.
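Computing the per-voxel Hessian eigenvalues can be sketched with finite differences. This is a sketch only, not the filter of [37]: the Gaussian smoothing at scale s and the exact blobness formula are omitted, and the discrete Hessian is built from repeated `np.gradient` calls.

```python
import numpy as np

def hessian_eigenvalues(volume):
    """Eigenvalues of the discrete Hessian at every voxel, sorted by
    increasing magnitude (|l1| <= |l2| <= |l3|).  For a bright blob all
    three are large and negative; for a bright tube one is near zero
    and two are large and negative, which is what a blobness measure
    exploits to enhance nodules and suppress vessels."""
    grads = np.gradient(volume.astype(float))           # first derivatives
    hessian = np.empty(volume.shape + (3, 3))
    for i, g in enumerate(grads):
        for j, gg in enumerate(np.gradient(g)):
            hessian[..., i, j] = gg                     # second derivatives
    eigvals = np.linalg.eigvalsh(hessian)               # ascending order
    order = np.argsort(np.abs(eigvals), axis=-1)        # re-sort by |lambda|
    return np.take_along_axis(eigvals, order, axis=-1)
```

At the centre of a synthetic Gaussian blob, all three eigenvalues come out negative, which is the signature the blob filter keys on.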

Feature Extraction
The features are the fundamental factors for discriminating nodules from non-nodules. Thus, they must contain sufficient and relevant information in order to produce correct diagnosis results. In clinical practice, radiologists manually detect and classify nodules depending on several radiological criteria, such as size, shape, margin, calcification, contrast, and density. Based on these characteristics, we extract image features from each suspicious nodule. We assume that characteristics such as size, shape, margin, and density are associated with the geometric image features, while calcification and contrast are associated with the intensity and contrast features of the image.
We extracted 25 features in total from both the 2D and 3D structures of the nodule candidates. The extracted features and their mathematical calculations can be seen in Table 1. In the formulae, N represents a suspicious nodule in a 2D or 3D structure. For 2D feature extraction, the median slice of the suspicious nodule is used because it has the maximum number of pixels compared to the other slices. The coordinate points are x and y in 2D, and x, y, and z in 3D; dx, dy, and dz represent the diameters of N; G N represents the grey values of N; and n represents the total number of pixels (voxels) in N.
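A handful of the Table 1 features can be sketched as follows. This is illustrative only (the feature names and the function are ours, not the paper's full 25-feature set): the candidate's voxel count, its bounding-box diameters dx, dy, dz, and the mean and standard deviation of its grey values.

```python
import numpy as np

def candidate_features(nodule_mask, grey_values):
    """A few illustrative geometric and intensity features of a 3D
    candidate N: voxel count (volume), bounding-box diameters dx/dy/dz,
    and the mean and standard deviation of the grey values inside N."""
    zs, ys, xs = np.nonzero(nodule_mask)
    g = grey_values[nodule_mask]
    return {
        "volume": int(nodule_mask.sum()),
        "dx": int(xs.max() - xs.min() + 1),
        "dy": int(ys.max() - ys.min() + 1),
        "dz": int(zs.max() - zs.min() + 1),
        "mean_grey": float(g.mean()),
        "std_grey": float(g.std()),
    }
```

Each candidate would contribute one such feature vector, which is then passed to the false-reduction classifier.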

Improved Random Forest
Several authors have proved that the random forest (RF) [38] is a significant machine learning algorithm for the classification of pulmonary nodules. RF is computationally effective and robust against noisy data and overfitting. It is a combination of multiple decision trees h 1 (x), h 2 (x), . . . , h K (x), where x is a group of input samples and K is the total number of trees in the forest. Each tree h i (x), i = {1, 2, . . . , K}, works in parallel and produces its own prediction p i independently. The final prediction of the RF is decided by majority voting, i.e., finding the winning class among the p i . Each individual tree h i (x) is grown on a random subset of samples R i , i = {1, 2, . . . , K}, drawn from the training data T with replacement; the number of samples n in each random subset R i is the same. Owing to the sampling with replacement, each R i contains only about 63.2% of the distinct training samples, and the remaining ~36.8%, the out-of-bag samples, are used for internal validation.
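The bootstrap-plus-out-of-bag behaviour is readily reproduced with scikit-learn. This is a toy illustration on synthetic data, standing in for the nodule/non-nodule feature vectors; it is not the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the candidate feature vectors and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # a learnable toy rule

forest = RandomForestClassifier(
    n_estimators=100,   # K trees, each grown on a bootstrap sample of T
    oob_score=True,     # validate each tree on its out-of-bag samples
    random_state=0,
).fit(X, y)

oob_accuracy = forest.oob_score_          # internal validation estimate
pred = forest.predict(X[:5])              # majority vote over the K trees
```

The `oob_score_` attribute is exactly the internal validation described above: each sample is scored only by the trees whose bootstrap subset did not contain it.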
Suppose R_i(x) is a random subset that has a group of samples x. Each sample in x has attributes A_i, i = {1, 2, ..., a}, where a is the total number of attributes. However, RF does not use all the attributes A_i to grow the trees. Instead, it selects random attributes A_r by calculating the Gini values of the A_i:

Gini(A_i) = sum_{j=1..m_i} p(v_{i,j}) * sum_{l=1..c} p(y_l | v_{i,j})^2 − sum_{l=1..c} p(y_l)^2

where c is the total number of classes and p(y_l) is the probability of the class y_l. Each attribute A_i has values v_{i,j}, j = {1, 2, ..., m_i}, where m_i is the number of values and p(v_{i,j}) is the probability that the attribute A_i takes the value v_{i,j}. Then, p(y_l | v_{i,j}) is the probability of the class y_l conditioned on the attribute A_i having the value v_{i,j}.
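The Gini score above can be written out directly from its definition. The sketch below computes it for a single attribute from its observed values and class labels; the function name is illustrative, and the score form follows the standard Gini-index attribute measure matching the symbols in the text.

```python
from collections import Counter

def gini_score(values, labels):
    """Gini'(A) = sum_j p(v_j) * sum_y p(y|v_j)^2 - sum_y p(y)^2."""
    n = len(values)
    # Prior term: squared class probabilities over the whole sample
    prior = sum((c / n) ** 2 for c in Counter(labels).values())
    score = 0.0
    for v, cnt in Counter(values).items():
        # Class purity among the samples where the attribute equals v
        ys = [y for x, y in zip(values, labels) if x == v]
        score += (cnt / n) * sum((c / len(ys)) ** 2
                                 for c in Counter(ys).values())
    return score - prior

# A perfectly class-separating attribute scores higher than a useless one:
sep  = gini_score(["a", "a", "b", "b"], [0, 0, 1, 1])  # v determines y
rand = gini_score(["a", "b", "a", "b"], [0, 0, 1, 1])  # v is uninformative
```

Note that each attribute is scored in isolation, which is exactly the independence assumption criticized in the next paragraph.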
The Gini calculation is easy and fast because it evaluates the attributes separately, without considering conditional dependencies and interactions with other attributes. As a negative impact, the performance of the RF may degrade when the attributes are strongly dependent on each other. Moreover, using random attributes to grow each tree also makes the individual trees weaker, because random selection cannot guarantee relevant and non-redundant attributes [39].
For these reasons, we aim to improve the conventional RF by combining it with attribute selection algorithms. Instead of selecting random attributes by calculating the Gini value, we select optimal ones to grow each decision tree. We apply four state-of-the-art attribute selection algorithms, namely RELIEFF [40], the genetic algorithm (GA) [41], particle swarm optimization (PSO) [42], and the firefly algorithm (FA) [43], to measure the quality of the attributes. Put simply, we use these four attribute selection algorithms as a pre-processing step before growing the individual trees. Each algorithm estimates the quality of the attributes and selects the r attributes with the highest quality. The decision trees in the RF are then grown using these selected attributes. Figure 9 demonstrates the architecture of our optimized RF, with i decision trees (i = 1, 2, 3, ..., K). Unlike the random attribute selection of the original RF, the attribute selection algorithms in our optimized RF take into account the conditional dependencies between the attributes.
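The pre-selection step can be sketched as follows: rank attributes with a quality measure, keep the r best, and grow every tree on that reduced attribute set. Here a crude class-separation score stands in for RELIEFF/GA/PSO/FA, and all names are illustrative rather than the authors' implementation.

```python
def quality(column, labels):
    """Absolute difference of per-class means: a crude relevance score."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

def select_attributes(X, y, r):
    """X: list of samples (each a list of attribute values); keep top r."""
    a = len(X[0])
    scores = [quality([row[j] for row in X], y) for j in range(a)]
    ranked = sorted(range(a), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:r])          # indices of the r best attributes

# Attribute 1 separates the classes; attributes 0 and 2 are noise:
X = [[5, 0.1, 3], [4, 0.2, 9], [5, 0.9, 9], [4, 0.8, 3]]
y = [0, 0, 1, 1]
keep = select_attributes(X, y, r=1)
```

The trees would then be grown only on the columns in `keep`, replacing the per-split random attribute draw of the original RF.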


Results and Discussions
To design and evaluate the performance of our proposed nodule detection, we randomly divided the input materials (888 CT exams of LIDC-IDRI) into k subsets, where k = 5, and performed k-fold cross-validation. Unlike validation methods that use only one training and one testing subset, k-fold cross-validation makes our proposed nodule detection more robust. Among the k subsets of samples, it uses a single subset for testing and the remaining k−1 subsets for training. In this way, cross-validation iterates the training and testing processes k times, so that each of the k subsets appears as a test set exactly once to validate the performance of the proposed method. The final assessment of cross-validation is then obtained by calculating the average. This idea helps to provide more accurate assessments of the predictability of the proposed nodule detection on unknown samples. Moreover, k-fold cross-validation helps our proposed nodule detection avoid overfitting.
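The five-fold protocol above can be sketched in a few lines: split the exam indices into k folds, hold one fold out per iteration, and average the per-fold scores. `evaluate` is a placeholder for training and testing the detector on a given split, and the random shuffling of exams is omitted for brevity.

```python
def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    idx = list(range(n))
    fold = n // k
    for i in range(k):
        test = idx[i * fold:(i + 1) * fold] if i < k - 1 else idx[i * fold:]
        test_set = set(test)
        train = [j for j in idx if j not in test_set]
        yield train, test

def cross_validate(n, k, evaluate):
    scores = [evaluate(tr, te) for tr, te in k_fold_splits(n, k)]
    return sum(scores) / len(scores)   # final assessment: the average

# Each of the 888 exams appears in a test fold exactly once:
tested = []
for _, te in k_fold_splits(888, 5):
    tested.extend(te)
```

Collecting `tested` confirms the key property of the protocol: the five test folds partition the 888 exams.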
After preparing data for cross-validation, we set the hyperparameters of our proposed RF. As the performance of RF strongly depends on the hyperparameters used, selecting appropriate values is very important. Hyperparameters are values that are predefined before running the model. Generally, these values are set manually by the developer because they cannot be estimated from the data. The common hyperparameters applied in RF are (i) the number of decision trees in the forest, (ii) the maximum number of features selected to develop an individual tree, (iii) the maximum depth of an individual tree, (iv) the minimum number of samples required to split an internal node, and (v) the minimum number of samples at a leaf node. Proper adjustment of these hyperparameter values can make the RF's prediction results more reliable and robust.
In this experiment, we determine the optimal hyperparameter values by trial and error. We initially set a range of possible values for a specific hyperparameter and perform cross-validation. The accuracy score generated by the RF using each value of the hyperparameter is measured, and the value that provides the highest accuracy is picked as the optimal hyperparameter.
For the first hyperparameter, the number of decision trees used in the RF, we define a range from 100 to 500. The accuracy scores using each number of decision trees from this range are then calculated. Subsequently, a validation curve is created to compare the accuracy scores produced using different numbers of decision trees, as illustrated in Figure 10a. From this curve, we can see that the highest accuracy value (0.9304) is reached using 230 decision trees. Hence, we choose 230 as the optimal value for the number of decision trees in the RF.
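The trial-and-error sweep amounts to an argmax over candidate values. In the sketch below, `cv_accuracy` is a placeholder for the full cross-validated run; a toy curve peaking at 230 trees (echoing Figure 10a, but with made-up numbers) stands in for the real measurements.

```python
def best_hyperparameter(candidates, cv_accuracy):
    """Return the candidate value with the highest cross-validated accuracy."""
    scored = [(cv_accuracy(v), v) for v in candidates]
    return max(scored)[1]

# Illustrative stand-in for the measured validation curve (not real data):
candidates = [100, 150, 200, 230, 300, 400, 500]
toy_curve = lambda n_trees: 0.93 - abs(n_trees - 230) / 10000.0
n_trees = best_hyperparameter(candidates, toy_curve)
```

The same loop is repeated for each of the remaining hyperparameters, holding the others fixed.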
In general, most RFs set the second hyperparameter, the maximum number of features, to the square root of the total number of features. In our research, however, we tested a range from 1 to 25 (the total number of extracted features) and chose 15 because it ensured the highest accuracy (0.9232), as demonstrated in Figure 10b. Similarly, the third hyperparameter, the maximum depth of an individual tree, is selected by finding the highest accuracy over a range from 2 to 20. As we can see in Figure 10c, the RF reached the highest accuracy (0.9285) at a depth of 18; thus, we selected that number as the third hyperparameter.
In the same way, the next hyperparameter, the minimum number of samples required to split an internal node, is selected as 5 by testing a range from 2 to 20, as depicted in Figure 10d. Finally, the last hyperparameter, the minimum number of samples at a leaf node, is chosen as 2, as it gave the highest accuracy score (0.9178), as shown in Figure 10e.

To summarize the hyperparameter adjustments, Table 2 lists the optimal values of each hyperparameter used in our proposed RF. All of these hyperparameters are selected using the original version of RF, before combining any attribute selection algorithm. Using these hyperparameter values, we developed five RFs (the original and four improved RFs). All of these RFs are implemented using the same parameter settings and evaluated by k-fold cross-validation. In each iteration of k, we assessed the following statistical measurements and calculated the average for the final evaluation results.
Specificity (Spec) = TN / (TN + FP)    (22)

False positives per exam (FP/exam) = FP / (number of exams)    (23)

F1 score (F1) = 2TP / (2TP + FP + FN)    (24)

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. In addition to these measurements, we also provide a graphical method, the receiver operating characteristic (ROC) curve, to evaluate the performance of the proposed nodule detection at a glance. The ROC is a two-dimensional graph showing the false-positive rate (FPR) (1 − specificity) on the X-axis and the true-positive rate (TPR) (sensitivity) on the Y-axis [44]. The continuous outputs of the RFs, i.e., the scores representing the probability that a sample belongs to a specific class, are used to generate the ROC curve. Firstly, the samples are sorted in descending order of their corresponding scores. Then, a threshold value is set at the maximum (α) and decreased until the minimum (−α). At every decrease, the threshold value is compared to the prediction scores. If the score of a sample is equal to or higher than the threshold, that sample is classified as the positive class (1); otherwise, it is assigned to the negative class (0).
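Equations (22)–(24) translate directly into code once the confusion-matrix counts are known. The counts below are illustrative, not the paper's results.

```python
def detection_metrics(TP, TN, FP, FN, n_exams):
    spec = TN / (TN + FP)                # Eq. (22): specificity
    fp_per_exam = FP / n_exams           # Eq. (23): false positives per exam
    f1 = 2 * TP / (2 * TP + FP + FN)     # Eq. (24): F1 score
    return spec, fp_per_exam, f1

# Illustrative counts over a hypothetical 888-exam test run:
spec, fp_rate, f1 = detection_metrics(TP=90, TN=80, FP=20, FN=10, n_exams=888)
```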
At the maximum threshold value (α), the scores of all samples are less than α; therefore, all samples are classified as negative. As a result, the (FPR, TPR) at that threshold is (0, 0), and we plot it on the graph. Conversely, if the threshold value is at the minimum (−α), the coordinate point (FPR, TPR) reaches (1, 1), meaning that all samples are classified as positive. In this way, the ROC curve is generated by plotting (FPR, TPR) for every threshold value. Figure 11 demonstrates an example of ROC plotting based on different threshold values. Suppose we have ten samples of two classes (five positives and five negatives), arranged according to their prediction scores. At first, the threshold value is set to α, and the FPR and TPR are calculated. Then, the threshold value is reduced until it equals the score of the second sample, and (FPR, TPR) is recomputed by comparing that threshold with the scores. In this way, the threshold is decreased and (FPR, TPR) is computed for all samples. Finally, we can generate the ROC curve by plotting the resulting (FPR, TPR) values on a 2D coordinate system (as shown on the right side of Figure 11). The ROC in the figure appears as a step function because only ten samples are tested in this example; real ROC curves in practice are smoother and more complex [44]. Based on this ROC, we can see that the best operating point of the classification model in this example is reached at (0.2, 0.8), which gives the minimum FPR at the maximum TPR. If an ROC curve reaches the point (0, 1), the model produces an ideal performance because it correctly predicts all positive and negative samples. Our experiment uses five classification models (the original RF and four improved RFs), and each iterates the training and testing process five times for five-fold cross-validation. Therefore, in every iteration, we generate an ROC for each RF.
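The threshold sweep above can be sketched as follows: sort the samples by score, lower the threshold past each score in turn, and record (FPR, TPR) at every step; the AUC (introduced with Table 3) is then the trapezoid area under the resulting curve. The scores and labels below are made up to match the shape of the worked example (ten samples, five positives, best point at (0.2, 0.8)).

```python
def roc_points(scores, labels):
    """Return the (FPR, TPR) points of the ROC, from (0, 0) to (1, 1)."""
    P = sum(labels)                      # number of positives
    N = len(labels) - P                  # number of negatives
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    pts, tp, fp = [(0.0, 0.0)], 0, 0
    for i in order:                      # threshold drops past one score
        if labels[i]:
            tp += 1
        else:
            fp += 1
        pts.append((fp / N, tp / P))
    return pts

def auc(pts):
    """Area under the curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   1,   0,   1,    1,   0,   0,   0,   0]
curve = roc_points(scores, labels)
```

With only ten samples the curve is the step function described in the text; averaging such curves across the five folds gives the mean ROCs of Figure 12.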
After that, the mean ROC curves are generated by interpolating the single ROCs, as illustrated in Figure 12a-e. Finally, the comparison of the mean ROCs of five RFs is shown in Figure 12f.

521
Figure 12. Comparison of ROC curves for five RFs. (a) Original RF, (b) improved RF using the RELIEFF attribute selection algorithm, (c) improved RF using the genetic algorithm (GA) attribute selection algorithm, (d) improved RF using the particle swarm optimization (PSO) attribute selection algorithm, (e) improved RF using the firefly (FA) attribute selection algorithm, and (f) comparison of the mean ROCs of the five RFs with respect to five-fold cross-validation.
However, comparing the ROC curves directly from the graph is difficult, as they are not scalar values. For this reason, we add another assessment measurement, the area under the curve (AUC), to ascertain the specific differences between the ROC curves. As the name implies, the AUC is measured by calculating the area of the trapezoid regions under the ROC curve. Table 3 summarizes all the assessment values of each RF with respect to the five-fold cross-validation. From this table, we can see that the optimized RF that applies firefly attribute selection (RF+FA) provides superior performance compared to its counterparts. Some example outputs of our proposed nodule detection are demonstrated in Figure 13. Figure 13a shows the detection of a sharp solid nodule that resides inside the lung without touching other lesions. Such nodules are easy to detect, and our proposed method detected them well. Figure 13b shows the result for an isolated ground-glass nodule (GGO) with a very faint intensity. This type of nodule is hard to detect due to its hazy structure; however, the proposed method detects them successfully because we used a very low intensity value (−910 HU) during nodule candidate detection. Moreover, our method ensures the detection of juxta-pleural nodules that are connected to the lung boundary. Figure 13c represents the detection of a small juxta-pleural nodule, and Figure 13d the detection of a very large one. Our method can ensure the detection of juxta-pleural nodules regardless of size, owing to the use of the 3D chain coding method for border reconstruction. Lastly, Figure 13e shows the detection of a juxta-vascular nodule, which is also challenging to detect due to nearby pulmonary structures. Our method can successfully detect them with the help of the blob enhancement filter in the preliminary screening.
Therefore, it is evident that our proposed method can detect a wide variety of nodule types.
Moreover, we computed the execution time of our proposed nodule detection, since it is an essential factor in evaluating the performance of an algorithm. Table 4 describes the profile summary, which shows the time taken by each function to detect the pulmonary nodules from a single exam with the maximum number of slices. From this table, we can see that our automated nodule detection takes at most 144.4 seconds (2.4 minutes) to process a single CT exam on a standard processor (Intel(R) Core(TM) i7-6500U CPU @ 2.6 GHz). Lastly, we compared the performance of the proposed nodule detection with some of the state-of-the-art techniques discussed in Section 2.1. While these techniques worked on the same dataset (LIDC-IDRI), they used different numbers of samples and different performance assessment measurements. For a fair comparison, we quantitatively compared each method on the same assessment measurements, as demonstrated in Table 5. From the table, the techniques in [13] and [25] achieved accuracy values higher than or equal to that of our proposed method, but their performances were assessed using fewer samples. The performance of [24] is also quite high, using the same number of samples. However, [24] applied a CNN, which incurs a high computational cost and is only practical on specialized computers with powerful GPUs. Unlike these methods, our proposed method applies the firefly-improved random forest (RF+FA), which is applicable on any standard computer. Moreover, the proposed automated nodule detection is promising for detecting nodules of varying sizes and types with high accuracy.

Conclusions
In this paper, we presented a fully automated nodule detection method for lung cancer diagnosis. Firstly, we proposed an automated seeded region growing (SRG) to segment the lungs from the input CT volume images. The strength of the proposed SRG is that it selects the initial seed regions automatically and performs the segmentation without requiring homogeneity criteria to determine the similarity between pixels. Secondly, we designed a 3D chain code algorithm for the detection of juxta-pleural nodules. It traces the voxelized boundary of the segmented lungs to identify the concave regions along the border and encloses them by detecting the inflection points. Compared to 2D border reconstruction methods, the 3D chain coding takes less effort and provides more accurate results. Moreover, it does not require training data or other classification algorithms to find the start and end points of the nodule gap. Finally, we developed an optimized random forest (RF) classifier using four attribute selection algorithms, namely RELIEFF, the genetic algorithm (GA), particle swarm optimization (PSO), and the firefly algorithm (FA), for false-positive reduction. Our proposed nodule detection was evaluated using 888 exams of LIDC-IDRI, and a favorable performance of 93.11% accuracy, 94.86% sensitivity, and 91.37% specificity, with only 0.0863 false positives per scan, was achieved using the RF optimized with the firefly algorithm. Moreover, our proposed nodule detection ensures the detection of nodules of varying types and sizes using a standard machine. Therefore, it can be applied as a supportive tool for physicians and radiologists to diagnose lung cancer. However, as a limitation, the preliminary screening of our nodule detection depends on some empirical threshold values to reduce false positives. In future work, the pulmonary nodules detected by our proposed method could be further classified into different clinical stages to determine the severity of lung cancer.
Author Contributions: M.P.P. and C.P. conceived and designed this study; M.P.P. performed the experiments, simulations, and draft preparation of the paper; C.P., K.H., S.T. and S.V. reviewed and edited the paper. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.