A Clinical Decision-Support System Based on Three-Stage Integrated Image Analysis for Diagnosing Lung Disease

Thoracic computed tomography (CT) technology has been used for lung cancer screening in high-risk populations, and this technique is highly effective in the identification of early lung cancer. With the rapid development of intelligent image analysis in the field of medical science and technology, many researchers have proposed computer-aided automatic diagnosis methods to assist medical experts in detecting lung nodules. This paper proposes an advanced clinical decision-support system for analyzing chest CT images of lung disease. Three advanced methods are utilized in the proposed system: the three-stage automated segmentation method (TSASM), the discrete wavelet packet transform (DWPT) with singular value decomposition (SVD), and the algorithms of the rough set theory, which together comprise a classification-based method. Two collected medical CT image datasets were prepared to evaluate the proposed system. The CT image datasets were labeled (nodule, non-nodule, or inflammation) by experienced radiologists from a regional teaching hospital. According to the results, the proposed system outperforms other classification methods (decision trees, naïve Bayes, multilayer perceptron, and sequential minimal optimization) in terms of classification accuracy and can be employed as a clinical decision-support system for diagnosing lung disease.


Introduction
In recent years, substantial difficulties have been encountered in the treatment of lung cancer, which have attracted increasing attention in medical research. Since 2008, according to medical statistics, lung cancer has been the cancer with the highest mortality rate. If lung cancer could be diagnosed in its initial stage, the 5-year survival probability of patients would increase to 70% [1]. Therefore, the early diagnosis of lung cancer generally increases the chances for successful treatment. Thoracic computed tomography (CT) is a highly effective tool that facilitates the diagnosis of lung metastases in tumor patients and the assessment of the progression of lung tumors during their treatment. With improvements in CT scanners and image analysis methods, the diagnostic sensitivity of pulmonary nodules has also improved [2,3]. In recent years, CT technology has been used for lung cancer screening in high-risk populations, and this technique is highly effective in the identification of early lung cancer [4].
Since chest CT technology is mainly utilized for the diagnosis of lung disease, medical specialists must spend substantial time and effort analyzing numerous chest CT slices. As a result, expert

Singular Value Decomposition (SVD)
The singular value decomposition (SVD) method is a matrix factorization technique [12] for image analysis and is introduced briefly in this section. SVD can reduce a high-rank matrix to a low-rank matrix while preserving important information; thus, SVD is a dimension reduction method. Suppose the input image is represented by an M × N matrix A with rank r. Via SVD, matrix A can be decomposed as follows [17]:

$$A = U D V^{T}$$

where U and V are orthogonal matrices whose dimensions are M × M and N × N, respectively, and D, which is called the singular matrix, is an M × N diagonal matrix whose diagonal entries are nonnegative real numbers (the singular values of A, of which r are nonzero).

Discrete Wavelet Packet Transform (DWPT)
The wavelet transform (WT) method has been applied in various fields, such as telecommunications, radar target recognition, and texture image classification [11]. The main advantage of the wavelet transform is that it can be applied with various window sizes and to both slow and fast frequencies [18]. Since the window can adapt to the transient state of each scale, the wavelet transform does not require a "stationarity" condition to be satisfied [19,20].
In the traditional approach, the discrete wavelet transform (DWT) recursively decomposes only the low-frequency bands. However, some high-frequency bands should also be decomposed to obtain additional information. The DWPT is an extension of the DWT that enables both detail and approximation results to be decomposed further; it therefore uses collections of low-pass and high-pass filters to obtain more detailed coefficients. In 1-level wavelet packet decomposition, the 'L-L1' cell is the approximation and the other three cells ('L-H1', 'H-L1', and 'H-H1') are the detail results. In 2-level wavelet packet decomposition, the 'L-L2' cell is the approximation, and the other 15 cells are detail results. The main advantage of DWPT is that it can combine various levels of decomposition to generate the optimal time-frequency representation of an original source [21].
The standard 2D-DWPT can be implemented with a low-pass filter h and a high-pass filter g [16]. The 2D-DWPT of an N × M discrete image X up to level p + 1 (p ≤ min(log_2 N, log_2 M)) is defined recursively, from the coefficients at level p, by Equations (1)-(4) [22]:

$$
\begin{aligned}
C^{p+1}_{4k,(i,j)} &= \sum_{m}\sum_{n} h(m)\,h(n)\,C^{p}_{k,(m+2i,\,n+2j)} && (1)\\
C^{p+1}_{4k+1,(i,j)} &= \sum_{m}\sum_{n} h(m)\,g(n)\,C^{p}_{k,(m+2i,\,n+2j)} && (2)\\
C^{p+1}_{4k+2,(i,j)} &= \sum_{m}\sum_{n} g(m)\,h(n)\,C^{p}_{k,(m+2i,\,n+2j)} && (3)\\
C^{p+1}_{4k+3,(i,j)} &= \sum_{m}\sum_{n} g(m)\,g(n)\,C^{p}_{k,(m+2i,\,n+2j)} && (4)
\end{aligned}
$$

where C^0_{0,(i,j)} is image X, and in each step, image C^p_k is decomposed into four quarter-sized images, namely, C^{p+1}_{4k}, C^{p+1}_{4k+1}, C^{p+1}_{4k+2}, and C^{p+1}_{4k+3}, as illustrated in Figure 1.
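The recursion in which every image C^p_k is split into four quarter-sized images can be illustrated with a minimal sketch. It assumes the two-tap Haar filter pair, which is a choice made only for this illustration (the wavelet family is not specified in this section), and the function names `split` and `wavelet_packet` are hypothetical. The high-pass filter is derived from the low-pass one through g(k) = (−1)^k h(1 − k).

```python
import math

# Assumed filter choice for illustration: two-tap Haar low-pass filter h.
H = (1 / math.sqrt(2), 1 / math.sqrt(2))
# High-pass filter from the relation g(k) = (-1)^k * h(1 - k).
G = tuple((-1) ** k * H[1 - k] for k in range(2))


def split(c):
    """Decompose one image C^p_k into its four quarter-sized sub-bands.

    The returned list is ordered as the sub-bands with indices
    4k, 4k+1, 4k+2, 4k+3 (filter pairs hh, hg, gh, gg).
    """
    rows, cols = len(c) // 2, len(c[0]) // 2

    def band(f1, f2):
        # Coefficient (i, j) = sum over m, n of f1(m) * f2(n) * c[m+2i][n+2j].
        return [[sum(f1[m] * f2[n] * c[m + 2 * i][n + 2 * j]
                     for m in range(2) for n in range(2))
                 for j in range(cols)]
                for i in range(rows)]

    return [band(H, H), band(H, G), band(G, H), band(G, G)]


def wavelet_packet(image, depth):
    """Full wavelet packet tree: every sub-band is split again at each level."""
    bands = [image]
    for _ in range(depth):
        bands = [sub for b in bands for sub in split(b)]
    return bands
```

Calling `wavelet_packet(image, 1)` on an image with even side lengths yields 4 sub-bands and `wavelet_packet(image, 2)` yields 16, matching the 1-level and 2-level cell counts described above. Because the Haar pair is orthonormal, the total energy (sum of squared values) of the sub-bands equals that of the input image.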

An orthonormal wavelet basis is selected so that the computed coefficients are independent, each capturing a distinct feature of the original signal [23]. According to Muneeswaran et al. [23], wavelet packets can be represented by basis functions, as in Equations (5) and (6), in which p denotes a scale index, l represents a translation index, h is a low-pass filter, g is a high-pass filter, and g(k) = (−1)^k h(1 − k). The function W_0(x) is the scaling function Φ, and W_1(x) is the mother wavelet Ψ.

Rough Sets Theory
Pawlak (1982) proposed rough sets for extracting rules from a large number of instances to support decision making [13]. This theory can be regarded as a new mathematical approach to vagueness [15]. The theory of "rough sets" is based on the assumption that every object is associated with some information (knowledge). For example, if the object is a patient with a disease, the information is the set of symptoms of that disease. Jothi et al. (2016) proposed a TRSFFQR (tolerance rough set firefly-based quick reduct) for selecting features, and applied it to MRI brain images [16]. The effectiveness of rough sets in the area of medical CT image analysis has been proven. For an introduction to the concept of rough sets, refer to the related literature [13][14][15].
The rough sets method is used to analyze an information system (IS) via a series of logical inference processes. An IS can be regarded as a decision table that is denoted by IS = (U, A, C, D), in which U is the universe of discourse and A is a set that consists of primitive features (characters or variables). Let A = C ∪ D, C ∩ D = ∅, and C, D ⊂ A be two subsets of features, where C is the set of condition features and D is the set of decision features. The inexactness of an approximation classification is expressed by the quality of the approximation of X by B, which refers to the percentage of objects that are correctly classified into class X using the feature B [13]. The quality of the classification is defined in Equation (7). If γ_B(A) = 1, the decision table is consistent; otherwise, it is not consistent. Feature reduction is an important task in rough sets: the set of reduced features must realize the same quality of approximation as the original full set of features. Using feature reduction, rules can be generated for determining the values of a decision feature based on the values of condition features; the rules are represented as "IF condition(s) THEN decision(s)".
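The consistency check and the idea of feature reduction can be sketched on a toy decision table. The sketch assumes the standard Pawlak reading of the quality of approximation, that is, the fraction of objects whose condition-feature values determine the decision unambiguously; the table `toy_table`, its feature values, and the function names are hypothetical, and the exhaustive subset search is for illustration only and is not the reduct algorithm used in the paper.

```python
from collections import defaultdict
from itertools import combinations


def quality(rows, condition, decision):
    """Fraction of objects lying in the positive region of the decision.

    Objects are grouped by their values on the condition features; a group
    counts as correctly classified only if all its members share one decision.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in condition)].append(row[decision])
    consistent = sum(len(d) for d in groups.values() if len(set(d)) == 1)
    return consistent / len(rows)


def reducts(rows, condition, decision):
    """Minimal feature subsets that keep the quality of the full feature set."""
    target = quality(rows, condition, decision)
    found = []
    for size in range(1, len(condition) + 1):
        for subset in combinations(condition, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # a smaller reduct is already contained in this subset
            if quality(rows, subset, decision) == target:
                found.append(subset)
    return found


# Hypothetical, already discretized decision table (not data from the paper).
toy_table = [
    {"range": "high", "mean": "low", "label": "nodule"},
    {"range": "high", "mean": "high", "label": "nodule"},
    {"range": "low", "mean": "low", "label": "non-nodule"},
    {"range": "low", "mean": "high", "label": "non-nodule"},
]
```

In this toy table the full feature set gives a quality of 1 (a consistent table), the feature `mean` alone gives 0, and `range` alone is a reduct, which corresponds to rules of the form "IF range = high THEN nodule".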

Medical Image Datasets
In this research, two medical image datasets were prepared to evaluate the proposed system. One was collected from the lung image database consortium (LIDC), and the other was obtained from a regional teaching hospital (RTH) in Taiwan.

LIDC Image Dataset
The LIDC dataset was collected from five sites in the United States [24]. It is stored in DICOM format and offers high resolution and sensitivity to chest anatomy. The dataset is composed of 100 chest CT images, which come from patients of different genders, ages, and case histories. In addition, the 100 chest images were evaluated by three experienced radiologists from a regional teaching hospital in Taiwan and labeled with three categories: nodule, non-nodule, and inflammation.
Symmetry 2020, 12, 386

RTH Image Dataset
The RTH dataset was collected by the same process as the LIDC dataset and contains 100 CT images. Three radiologists participated in this research and gave evaluation results (nodule or non-nodule) for the 100 images contained in the RTH image dataset. Nodules located in the central and peripheral areas of the CT image are labeled with "nodule". The image format for the RTH dataset is "JPG". Although its resolution is lower than that of the LIDC dataset, we believe that this will not impact the analysis results of the human anatomical images.

Proposed System
In this paper, we have proposed an advanced clinical decision-support system based on the three proposed methods: the three-stage automated segmentation method (TSASM), the discrete wavelet packets transform (DWPT) with singular value decomposition (SVD), and rough set algorithms. The framework of the proposed system is illustrated in Figure 2. The system has four processing blocks, which are introduced briefly in the following.

Image Processing Block (A)
In this block, chest CT images are preprocessed with two sub-processes: (1) adjusting the image contrast based on the density difference between the lung and thoracic cavity areas; and (2) outlining the lung areas of a chest CT image with a box field, using the three-stage automated segmentation method (TSASM) and the region-growing method (RGM) [25,26].

Reconstruction Block (B)
This block applies the SVD method to compute the singular values for the processed lung image from block (A); these values will be utilized in the reconstruction of the lung areas in the chest CT image in the next block.

Feature Extraction Block (C)
This block consists of two sub-processes: (1) constructing images from the wavelet packet coefficients of the chest CT image, for which two reconstruction methods (DWPT and DWPT with SVD) are provided; and (2) employing the "reduct sets" from the rough set theory to select wavelet packet features and reduce the number of features of the chest CT image.

Classification Block (D)
In this block, the algorithms of the rough set theory (LEM2) [27] are employed to generate understandable rules for medical image specialists by extracting and classifying the two image datasets (reconstructed and non-reconstructed with SVD), which have been processed previously by the DWPT method in block (C).

Proposed Procedure
The procedure of the proposed system is composed of six steps: (1) adjusting the image contrast, (2) outlining the lung area, (3) reconstructing the image by SVD, (4) generating the coefficients by DWPT, (5) computing the feature values and reducing features, and (6) classifying the lung image dataset. The detailed steps are introduced as follows.

Step 1: Adjusting Image Contrast
The contrast of an original medical image is sometimes insufficient (see Figure 3). An adjustment process is therefore required for original medical images and is performed in the proposed system. We use the LIDC image dataset to demonstrate this process, which has two stages.
Firstly, the original medical image is automatically adjusted by the contrast adjustment tool to produce a clear image with a high contrast (see Figure 4) and to generate its intensity histogram (see Figure 5). The chest CT images have two main density distribution areas: (1) low-density areas, representing the background air, lungs, and bronchial trees; and (2) high-density areas, representing fats, muscles, and bones.
Secondly, the contrast of the adjusted image from the above process (see Figure 5) is tuned again as follows. The display range for the image is set to start at the beginning of the 'background' and to end just before the 'fat', and the selected range (from the 'background' to 'fat') is expanded to the whole pixel range (from −32,768 to 32,767 for the integer type). Figure 6 illustrates the tuning process for expanding the display range. After this process, the high-density areas (representing fats, muscles, and bones) are shown with one density (a white color) and excluded from the image display (see Figure 7).
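The second stage, expanding a selected display range to the whole signed 16-bit range, can be sketched as a linear window stretch. This is only an illustration under stated assumptions: the function name `stretch_window` is hypothetical, and the window limits used in the usage note are placeholder values, since the paper reads the 'background' and 'fat' positions from the histogram of each image.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def stretch_window(pixels, low, high):
    """Linearly map the window [low, high] onto the full 16-bit integer range.

    Values below `low` saturate at the minimum and values above `high`
    saturate at the maximum, so every density above the window is shown
    with a single (white) level.
    """
    span = high - low
    full = INT16_MAX - INT16_MIN
    stretched = []
    for row in pixels:
        new_row = []
        for value in row:
            clipped = min(max(value, low), high)
            new_row.append(INT16_MIN + round((clipped - low) * full / span))
        stretched.append(new_row)
    return stretched
```

For example, with the placeholder limits `low=-1000` and `high=-200`, a pixel at the lower limit maps to −32,768, while a pixel at the upper limit and any denser pixel both map to 32,767.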
Step 2: Outlining the Lung Area
To clearly delineate the region of interest (ROI), the system uses an image segmentation method, the three-stage automated segmentation method (TSASM), to outline the lung areas with a box field. Figure 8 demonstrates the image processing stages of the proposed method. Each stage of this method is introduced as follows.

Segmenting the Chest CT Image
With this process, we can remove most of the irrelevant areas from the chest CT image produced in Step 1. Figure 9 illustrates the unprocessed and processed images. The algorithm of this process is listed as Algorithm 1.


Removing Irrelevant Background Areas
With this process, we can remove the irrelevant background areas of the CT image and save the lung regions for further diagnostic analysis. Figure 10 demonstrates the unprocessed and processed images. The algorithm for this process is listed as Algorithm 2.


Figure 10. Process for removing the irrelevant background areas.

Outlining the Lung Areas with a Box Field
Through this process, the lung areas can be outlined clearly with a box field. Figure 11 demonstrates the unprocessed and processed images. The algorithm for this process is listed as Algorithm 3. Using the processed images as experimental datasets, we can reduce the computing complexity for the proposed clinical decision-support system.

Step 3: Reconstructing the Image by SVD
This step applies SVD to decompose and reconstruct the lung images. Given an input lung image, IMG-A (see Figure 12; the pixel size is 512 × 512), the SVD method decomposes IMG-A as U × D × V^T, where U and V are both square matrices (512 × 512) and D is a diagonal singular matrix. The diagonal entries of the D matrix are the singular values of IMG-A. Figure 13 shows some of the singular values of IMG-A. The values located toward the upper-left corner of D carry the more important characteristics of IMG-A. After the singular values are generated, we can reconstruct IMG-A with these values. The more singular values are used for reconstruction, the more distinct the reconstructed image is. Figure 14 demonstrates the reconstructed images with various numbers of singular values (10, 20, 30, and all).
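The effect of reconstructing an image from only its leading singular values can be sketched without external libraries. The sketch below obtains one singular triplet at a time by power iteration with deflation; this is a substitute chosen to keep the example dependency-free, not the routine used in the paper, and in practice a library SVD would normally be used. The function names and the small test matrix in the usage note are hypothetical.

```python
import math


def _matvec(a, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]


def _top_triplet(a, iterations=200):
    """Dominant singular triplet (sigma, u, v), by power iteration on A^T A."""
    at = [list(col) for col in zip(*a)]
    v = [1.0 + 0.1 * j for j in range(len(a[0]))]  # arbitrary non-zero start
    for _ in range(iterations):
        w = _matvec(at, _matvec(a, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # nothing left to extract from this matrix
        v = [x / norm for x in w]
    av = _matvec(a, v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av] if sigma > 0.0 else [0.0] * len(a)
    return sigma, u, v


def rank_k_approximation(a, k):
    """Sum of the k leading terms sigma * u * v^T, extracted by deflation."""
    residual = [list(map(float, row)) for row in a]
    approx = [[0.0] * len(a[0]) for _ in a]
    for _ in range(k):
        sigma, u, v = _top_triplet(residual)
        for i in range(len(a)):
            for j in range(len(a[0])):
                term = sigma * u[i] * v[j]
                approx[i][j] += term
                residual[i][j] -= term
    return approx


def frobenius_error(a, b):
    """Frobenius norm of the difference between two equally sized matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

Evaluating `frobenius_error(img, rank_k_approximation(img, k))` for increasing k on a small full-rank matrix `img` gives a shrinking error that reaches numerical zero once k equals the rank, which mirrors the progressively sharper reconstructions with 10, 20, 30, and all singular values.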
Step 4: Generating the Coefficient by DWPT
In this step, the discrete wavelet packet transform (DWPT) algorithm is applied to extract features (coefficients) from the lung images. Two sub-processes are involved:

The DWPT Decomposition Process
In this step, we use the DWPT algorithm to process the lung images, applying a multi-resolution pyramidal structure with depths m = 1 and m = 2. When m = 1, four DWPT coefficient sub-bands (one approximation and three detail) are produced for each image (Figure 15 illustrates lung images with the four DWPT coefficients for two types of images: non-reconstructed and reconstructed by SVD). When m = 2, 16 DWPT coefficient sub-bands (one approximation and 15 detail) are produced for each of the image regions (Figure 16 illustrates the lung images with the 16 DWPT coefficients for the same two types of images). Therefore, the number of DWPT coefficients is 16.

Wavelet Packet Entropy
Entropy is a popular measure that is applied in many research areas, such as image processing and signal processing. For the DWPT coefficients, the wavelet packet norm entropy value is generated by an equation in which p is the power, with 1 ≤ p < 2, N is the size of the lung images, and t(i, j) is the transformed value at position (i, j) of any sub-band (one of L-Li, L-Hi, H-Li, or H-Hi) of size N × N [28]. In this paper, we assign p = 1 to produce the wavelet packet norm entropy values.
Step 5: Computing the Feature Values and Reducing Attributes
This step computes the wavelet packet feature values for the image datasets. In Avci's (2008) method, statistical values are used as inputs for the adaptive-network-based fuzzy inference system (ANFIS) [18]. This paper likewise derives statistical features from the wavelet packet coefficients: mean, median, mode, maximum, minimum, range, standard deviation, and the absolute values of the median and mean. We then employ "reduct sets" to select among the wavelet packet features. A reduct set is a subset of features that preserves all of the discernibility information of the original information system [14].
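The per-sub-band quantities discussed so far, the norm entropy and the statistical features, can be sketched as follows. Several points are assumptions made for this illustration: the norm entropy is taken in the commonly used form of the sum of |t(i, j)|^p, with an optional division by the sub-band side length N because the exact normalization is not reproduced in the text; the standard deviation is the population form; and the feature names follow the labels of Table 1. The function names are hypothetical.

```python
import statistics


def norm_entropy(band, p=1.0, normalize=True):
    """Norm entropy of one N x N sub-band: sum of |t(i, j)|^p, 1 <= p < 2.

    The division by N (the sub-band side length) is an assumption of this
    sketch; pass normalize=False to get the plain sum.
    """
    if not 1 <= p < 2:
        raise ValueError("p must satisfy 1 <= p < 2")
    total = sum(abs(t) ** p for row in band for t in row)
    return total / len(band) if normalize else total


def band_features(band):
    """Statistical features of the coefficients of one sub-band."""
    values = [t for row in band for t in row]
    mean = statistics.fmean(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
        "standard-deviation": statistics.pstdev(values),  # population form (assumed)
        "mean-absolute-deviation": sum(abs(v - mean) for v in values) / len(values),
    }
```

Applying `band_features` to each of the 16 sub-bands of an image gives one row of condition features per image, from which the reduct sets then select a subset.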
Take Table 1 as an example. There are six reducts, each with size = 1 (the size of the reduct), Pos.Reg. = 1 (the positive region; a value of 1 indicates total dependency, with 0 ≤ Pos.Reg. ≤ 1), and SC = 1 (the stability coefficient for the reduct, with 0 ≤ SC ≤ 1). Based on Table 1, there are six features that can be employed as the inputs for the classification method: the range value, mean value, minimum value, maximum value, standard deviation, and mean absolute deviation.

Table 1
No.   Size   Pos.Reg.   SC   Reducts
1     1      1          1    {range}
2     1      1          1    {mean}
3     1      1          1    {min}
4     1      1          1    {max}
5     1      1          1    {standard-deviation}
6     1      1          1    {mean-absolute-deviation}

Step 6: Classifying the Lung Image Dataset
In this step, we apply the intelligent classifier, the rough sets theory (LEM2 algorithm [22]), to classify eight lung image datasets (see Figure 2). The decision attributes of the LIDC image dataset include three classes: nodule, non-nodule, and inflammation, while the RTH dataset contains two classes: nodule and non-nodule. The feature values of the DWPT coefficients are employed as conditional attributes. Using the rough sets algorithm, understandable rules for classifying lung images are generated, and system accuracy is improved for model verification.


Experimental Results and Discussions
In this paper, we used the chest CT images of the LIDC and RTH datasets to conduct experiments for model evaluation. Each image dataset contained 100 chest CT images (with a resolution of 512 × 512). To evaluate the proposed system carefully, we conducted 10 sampling experiments with each image dataset to determine the method's performance. Every experiment employed 40 images, randomly selected from the CT image dataset, as the input dataset, and the training-to-testing ratio was 3:1 (30 images were used for training, and 10 images were used for testing). The average and standard deviation of the classification accuracy over the 10 experiments are used as the performance indicators.
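The sampling protocol can be sketched as a repeated random hold-out. In this sketch the function name `repeated_holdout`, the fixed seed, and the classifier interface (a `classify` callable that receives the training pairs and returns a prediction function) are assumptions for illustration, and the population form of the standard deviation is assumed because the text does not say which form was used.

```python
import random
import statistics


def repeated_holdout(samples, classify, runs=10, subset=40, train=30, seed=0):
    """Mean and standard deviation of accuracy over repeated random splits.

    `samples` is a list of (features, label) pairs. Each run draws `subset`
    samples at random, trains on the first `train` of them, and tests on the
    remaining ones (30 and 10 with the default 3:1 ratio).
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        chosen = rng.sample(samples, subset)  # random order, no repeats
        training, testing = chosen[:train], chosen[train:]
        predict = classify(training)
        correct = sum(predict(x) == y for x, y in testing)
        accuracies.append(correct / len(testing))
    return statistics.fmean(accuracies), statistics.pstdev(accuracies)
```

Any rule generator can be plugged in as `classify`; for instance, a stand-in that ignores the training data and always returns the same rule lets the protocol itself be checked independently of the classifier.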
To verify the proposed system, we adopted comparison methods for each process. In image processing, the region-growing method (RGM) [25] (the processes of the RGM are illustrated in Figure A1 of Appendix A) was used as the comparison method against the proposed three-stage automated segmentation method (TSASM). In feature extraction, the DWPT method was compared with the DWPT-SVD method. In the classification process, we employed four advanced methods that have recently been applied to analyze medical CT images as comparison models: Trees.J48 [29], Naïve Bayes [30], Multilayer Perceptron [31], and Sequential Minimal Optimization (SMO) [32].
After the experiments were completed, many parameters were produced. For SVD reconstruction, the r values differ from image to image, ranging from 255 to 478 across the two image datasets; for the decomposition process, the number of DWPT coefficients is 16. The experimental results (classification accuracy) for the proposed system and the comparison methods are shown in Tables 2-5. Based on the performance data, we report four findings.
Firstly, from the performance data (Tables 2 and 3), it is clear that the proposed system achieves the best classification accuracy among the five listed methods.
Secondly, the performance data (Tables 2 and 3) also show that the proposed image segmentation method (TSASM) improved the classification accuracy of the five listed classification methods in nearly all cases. For the LIDC dataset, every method is more accurate with TSASM than with RGM, with improvements ranging from 1.61% to 8.02% (Rough set theory: 1.61%; Trees.J48: 8.02%; Naïve Bayes: 3.08%; Multilayer Perceptron: 4.68%; SMO: 3.73%). For the RTH dataset, however, the improvement is not as significant as that of the LIDC dataset, ranging from −4.5% to 5.5% (Rough set theory: 1.29%; Trees.J48: 5.5%; Naïve Bayes: −4.5%; Multilayer Perceptron: 0.5.0 %; SMO: 0.40%). Based on this evidence, we argue that both the quality of the medical images and the classification method influence classification accuracy.
Thirdly, based on Tables 4 and 5, replacing DWPT with DWPT-SVD as the signal transformation method yields only small changes in classification accuracy (from 0.22% to 1.08% for the LIDC; from −0.20% to 2.20% for the RTH). Although the improvement is small, DWPT-SVD contributed slightly to diagnostic accuracy. For diseases with a high mortality rate, even this level of improvement matters.
Lastly, as seen in Tables 2-5, the difference in classification accuracy between the RTH and the LIDC for the proposed system is very small (−0.61%) and statistically insignificant. We attribute the slightly lower accuracy for the RTH to differences in image quality between the two datasets (e.g., the radiologist's experience in judging nodules, the patient's cooperation during CT scanning, and the equipment status of the CT scanner).
Although the proposed system has excellent classification accuracy in analyzing chest CT images, further performance comparisons with similarly advanced methods in the related literature are required. Messay et al.'s computer-aided detection (CAD) system correctly identified 80.4% of nodules (115/143) using 40 selected features on the LIDC dataset [24]. Li et al.'s computer-aided diagnosis (CAD) system achieved an average pancreatic cancer identification accuracy of 96.47% on PET/CT data (from the General Hospital of Shenyang Military Area Command) [31]. The sensitivity of nodule candidate detection in the system developed by Xie et al. (2019), based on a 2D convolutional neural network (CNN), was 86.42% [33]. These results indicate that identification accuracy for disease diagnosis with CT images has improved rapidly. Compared with Messay et al.'s (2010) system [24], our proposed system performs outstandingly (99.41% for the LIDC and 98.80% for the RTH). It also compares favorably with Li et al.'s system (96.47%) [31]. We argue that the great improvement in identification accuracy for computer-aided disease diagnosis systems can be explained by three factors: (1) the growing knowledge of medical image specialists, (2) the improvement of image quality by next-generation scanners, and (3) the advances in image analysis algorithms made by researchers.
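The arithmetic behind several of the figures quoted above can be checked with a short script. The following is only an illustrative sketch: every number is copied from the text, the helper names (`gain_range`, `percent`) are hypothetical, and the RTH improvements are omitted because they are not needed for these checks.

```python
# Illustrative check of figures quoted in the text; no new data is introduced.
# Improvement in accuracy (percentage points) from using TSASM instead of RGM
# on the LIDC dataset, as listed in the second finding.
lidc_gain_tsasm_vs_rgm = {
    "Rough set theory": 1.61,
    "Trees.J48": 8.02,
    "Naive Bayes": 3.08,
    "Multilayer Perceptron": 4.68,
    "SMO": 3.73,
}


def gain_range(gains):
    """Return (smallest, largest) improvement across the classifiers."""
    values = list(gains.values())
    return min(values), max(values)


def percent(part, whole):
    """Express part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Accuracy of the proposed system on each dataset, as quoted in the text.
lidc_accuracy = 99.41
rth_accuracy = 98.80

lidc_range = gain_range(lidc_gain_tsasm_vs_rgm)
dataset_gap = round(rth_accuracy - lidc_accuracy, 2)  # RTH minus LIDC
messay_rate = percent(115, 143)  # nodules identified by Messay et al. [24]

print("LIDC improvement range:", lidc_range)
print("RTH - LIDC accuracy gap:", dataset_gap)
print("Messay et al. detection rate (%):", messay_rate)
```

The computed range, gap, and detection rate agree with the values stated in the text (1.61% to 8.02%, −0.61%, and 80.4%, respectively).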

Conclusions
According to the results reported in Tables 2-5, the proposed system clearly achieves the best classification accuracy among the five listed classifiers. Moreover, the standard deviation of its classification accuracy is smaller than that of the comparison methods. This shows that the proposed system is both accurate and robust in terms of classification accuracy. In segmentation, the proposed image segmentation method (TSASM) improved the classification accuracy of the five listed classification methods (Tables 2 and 3). Based on these results, we argue that both medical image quality and the classification method influence classification accuracy. For the signal transformation methods (DWPT and DWPT-SVD), there were no significant improvements in classification accuracy (Tables 4 and 5); nevertheless, DWPT-SVD contributed slightly to diagnostic accuracy. Given the high mortality rate of this disease, even this level of improvement matters. Although the proposed system can efficiently improve classification accuracy and qualifies as a clinical decision-support system for diagnosing lung disease, thereby increasing clinical quality and efficiency, its classification results still need to be verified by medical specialists. From the experimental data, we conclude that the classification algorithm plays a key role in accuracy, while medical image quality plays a supporting role.
We offer two suggestions for future work: (1) CT imaging databases of other human organs can be used to test the proposed system and examine its classification accuracy, and (2) other classification methods (e.g., k-nearest neighbors, random forest) can be applied in the classification process of the proposed system to examine whether its performance improves.