All thyroid nodules used in this study are histopathologically verified as benign or malignant. Sonographic images of the nodules were retrospectively evaluated by a specialist sonographer with more than 25 years of experience in the field, and the sonographic features of the benign and malignant nodules were determined as original data. Classification is carried out with an innovative approach in which the ANFIS model is trained with the GA algorithm for the differential diagnosis of malignant/benign nodules from the signs determined by the sonographer, which is treated as a real-world problem. The performance of this approach is compared with that of other artificial intelligence methods, namely ANFIS trained with the derivative-based back propagation algorithm and a Deep Neural Network (DNN). In addition, the Decision Tree Algorithm is applied to determine the most effective signs in the differential diagnosis of malignant/benign nodules. The study introduces to the literature a new guideline showing which sonographic signs have a significant effect on this differential diagnosis, and the performances of the different methods on this real-world problem are discussed.
2.1. Dataset Description
A data set of 398 histopathologically proven thyroid nodules from 224 patients with thyroid cancer is used in this study; it consists of 284 malignant and 114 benign nodules. Patients were operated on at Güven Hospital between September 2012 and September 2016 with a cytological diagnosis of thyroid cancer. All patients underwent a final preoperative US examination by the same sonographer of the multidisciplinary team, and US images were recorded prospectively. The US images of the patients in the study group were reviewed by the same sonographer, and predefined US characteristics of all nodules, including benign ones, were recorded according to the final histopathology results. This retrospective study was approved by the Güven Hospital Science Committee, which waived the requirement for informed consent.
Sonographic examinations and the collection and re-evaluation of all images were performed by the same sonographer with a Siemens Acuson ultrasound system and a 12 MHz transducer. The sonographic signs of the nodules were classified under 27 categories listed in Table 1.
For P1, there is no consensus in the literature regarding its relationship with malignancy. Examination of P1 itself and its relationship with malignancy when evaluated together with other features are among the areas of interest of this study.
For P2, features that are “completely solid” and “solid-containing microcysts (<5%)” have a high risk of malignancy.
For P3, features that are “markedly hypoechoic” and “hypoechoic” have a high risk of malignancy.
For P4, if “anteroposterior/transverse diameter” is increased, the nodule is likely to be malignant.
For P5, features that are “irregular (coarse lobulation)”, “irregular (microlobulation)”, “irregular (spiculated)” and “ill-defined” have a high risk of malignancy.
For P6, the “punctate” feature has a high risk of malignancy.
For P7 and P8, there is no consensus in the literature regarding their relationship with malignancy. Examination of P7 and P8 themselves and their relationship with malignancy when evaluated together with other features are among the areas of interest of this study.
For P9, the “hypoechoic” feature has a high risk of malignancy.
For P10, P11, P12, and P13, there is no consensus in the literature regarding their relationship with malignancy. Examination of P10, P11, P12, and P13 themselves and their relationship with malignancy, when evaluated together with other features, are among the areas of interest of this study.
For P14, if “interruption in echogenic capsule if there is a capsule relationship” is present or gross extrathyroidal, the nodule is likely to be malignant.
For P15, P16, P17, and P18, there is no consensus in the literature regarding their relationship with malignancy. Examination of P15, P16, P17, and P18 themselves and their relationship with malignancy, when evaluated together with other features, are among the areas of interest of this study.
For P19, features that are “microechogenicity in the solid component” and “eccentric solid component and microechogenicity” have a high risk of malignancy.
For P20, P21, P22, P23, and P24, there is no consensus in the literature regarding their relationship with malignancy. Examination of P20, P21, P22, P23, and P24 themselves and their relationship with malignancy, when evaluated together with other features, are among the areas of interest of this study.
For P25, features that are “suspicious (<5 mm)”, “suspicious (5–10 mm)”, “suspicious (>10 mm)”, “typical metastatic (<5 mm)”, “typical metastatic (5–10 mm)” and “typical metastatic (>10 mm)” have a high risk of malignancy.
For P26, there is no consensus in the literature regarding its relationship with malignancy. Examination of P26 itself and its relationship with malignancy, when evaluated together with other features, are among the areas of interest of this study.
For P27, features that are “suspicious” and “typical” have a high risk of malignancy.
The 224 patients were aged from 16 to 77 years, with a mean age of 40.98 years; 49 (21.875%) of the patients were male and 175 (78.125%) were female.
2.4. Training ANFIS Using the GA Algorithm
The proposed innovative approach for training the ANFIS model with the GA algorithm is explained in detail in this section. Optimization of the premise and consequent parameters in ANFIS has been one of the main problems, mostly because derivative-based algorithms converge slowly, cannot escape local minima, and depend to a large extent on the initial values. A population-based genetic algorithm, a powerful algorithm that avoids these disadvantages of derivative-based algorithms, is therefore used to optimize the parameters of a difficult model such as ANFIS. The premise and consequent parameters of ANFIS, whose initial values are generated by the FCM clustering method, represent a chromosome in the genetic algorithm. During the training of ANFIS, the GA searches for the chromosome, that is, the set of premise and consequent parameters, with the best fitness value. The block diagram of the proposed method is depicted in Figure 3. The root mean square error (RMSE) function is used to calculate the fitness value of a solution.
During the optimization of the premise and consequent parameters, the aim is to minimize the RMSE function. The best RMSE value is the smallest one, which is obtained when the estimated values are closest to the actual values. The GA searches for the best RMSE value until the stopping criterion is met. The number of iterations is used as the stopping criterion in the present study.
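As a minimal sketch of the fitness computation described above, the RMSE between actual and estimated values can be written as follows. The function name `rmse` and the sample labels and outputs are hypothetical and serve only as an illustration; they are not data from this study.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must not be empty")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical labels (1 = malignant, 0 = benign) and model outputs.
labels = [1, 0, 1, 1]
outputs = [0.9, 0.2, 0.7, 1.0]
print(rmse(labels, outputs))
```

A value of zero means that every estimate equals its actual value, so a smaller RMSE corresponds to a fitter chromosome.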
Classification studies have been carried out by training the ANFIS model with the GA algorithm in order to diagnose the nodules as malignant/benign (see Figure 3). In training the ANFIS model with the GA algorithm, the number of parameters to be optimized depends on the number of inputs, the number and type of membership functions, and the number of rules. Classification studies have been conducted for three different cases in this study: three ANFIS models use 27, 13, and 8 sonographic signs as input, respectively. In addition, gaussmf has been used as the membership function, and 10 membership functions have been utilized for each input. Therefore, the total numbers of parameters to be optimized for the three ANFIS models are 820, 400, and 250, respectively.
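The reported totals can be reproduced with a short calculation. The sketch below assumes a first-order Sugeno-type ANFIS with 10 rules (one per FCM cluster), two parameters per Gaussian membership function, and a linear consequent with a bias term for each rule. This decomposition is an assumption that happens to match the stated totals; it is not spelled out in the text, and the function name `anfis_parameter_count` is hypothetical.

```python
def anfis_parameter_count(n_inputs, n_rules, params_per_mf=2):
    """Count tunable parameters of a first-order Sugeno ANFIS.

    Assumes one membership function per input per rule (as produced by
    FCM-style clustering) and a linear consequent with a bias term.
    """
    premise = n_inputs * n_rules * params_per_mf  # Gaussian MF: width and centre
    consequent = n_rules * (n_inputs + 1)         # linear coefficients plus bias
    return premise + consequent

for n_inputs in (27, 13, 8):
    print(n_inputs, anfis_parameter_count(n_inputs, n_rules=10))
```

Under these assumptions, the 27-input model has 540 premise and 280 consequent parameters, which gives the total of 820.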
Training and test datasets are created from the 398 thyroid nodules with 27 sonographic signs by random sampling of the original nodules. The data are split in two different ways, 70–30% and 80–20%. In addition, K-fold cross validation, one of the commonly used cross validation methods, is used to evaluate how well the proposed method generalizes; its objective is to repeat an experiment under independent conditions and to test the validity of its results. Specifically, 5-fold and 10-fold cross validation are used for data splitting in this study.
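The two splitting schemes can be sketched as follows. This is an illustrative implementation using only the Python standard library; the function names, the fixed seed, and the absence of stratification by class are assumptions, since the text does not state how the random sampling was implemented.

```python
import random

def holdout_split(n_samples, test_fraction, seed=0):
    """Randomly split sample indices into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists for k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 70-30% holdout split of the 398 nodules.
train, test = holdout_split(398, test_fraction=0.30)
print(len(train), len(test))

# 5-fold cross validation: every nodule is used for testing exactly once.
for train, test in k_fold_splits(398, k=5):
    print(len(train), len(test))
```

In the K-fold case the model is trained K times, each time on K-1 folds, and the reported performance is typically the average over the K held-out folds.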
The performance of optimization algorithms largely depends on their control parameters. Suitable values for these parameters vary with the problem to which the algorithm is applied, and there is no specific rule or method for choosing them, so many trials are required to determine the most appropriate values. In this context, many trials are performed to determine the control parameters of the GA algorithm. After these trials, the control parameters are set as follows: the number of iterations as 100, the population size as 50, the crossover rate as 0.4, and the mutation rate as 0.15.
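To make the role of these control parameters concrete, the following is a minimal sketch of a real-valued GA that minimizes an RMSE fitness with 100 iterations, a population of 50, a crossover rate of 0.4, and a mutation rate of 0.15. Everything else is an assumption made for illustration: the tournament selection, arithmetic crossover, Gaussian mutation, and elitism operators are not specified in the text, and a toy linear model stands in for ANFIS so that the example stays self-contained.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean square error, used here as the fitness to minimise."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def tournament(population, fitness, rng):
    """Pick the fitter of two randomly drawn chromosomes (assumed operator)."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) <= fitness(b) else b

def run_ga(fitness, n_genes, iterations=100, pop_size=50,
           crossover_rate=0.4, mutation_rate=0.15, seed=0):
    """Minimise `fitness` over real-valued chromosomes; return best and history."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)
    history = [fitness(best)]
    for _ in range(iterations):  # stopping criterion: fixed number of iterations
        children = [best[:]]     # elitism (assumed): keep the best chromosome
        while len(children) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            if rng.random() < crossover_rate:
                w = rng.random()  # arithmetic crossover: blend both parents
                child = [w * g1 + (1 - w) * g2 for g1, g2 in zip(p1, p2)]
            else:
                child = p1[:]
            # Gaussian mutation, applied independently to each gene.
            child = [g + rng.gauss(0.0, 0.1) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = min(population, key=fitness)
        history.append(fitness(best))
    return best, history

# Toy stand-in for ANFIS: a linear model y = w * x + b with chromosome [w, b].
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
targets = [0.5 * x + 0.2 for x in xs]

def toy_fitness(chromosome):
    w, b = chromosome
    return rmse(targets, [w * x + b for x in xs])

best, history = run_ga(toy_fitness, n_genes=2)
print(best, history[0], history[-1])
```

In the actual setting each chromosome would hold the premise and consequent parameters of the ANFIS model, and the fitness would be the RMSE of the ANFIS output on the training data. Because the best chromosome is carried over unchanged in this sketch, the best RMSE never increases from one iteration to the next.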
The ANFIS network is also trained with the derivative-based Back Propagation (BP) and Hybrid (HB) algorithms to evaluate the performance and contribution of the proposed method relative to derivative-based algorithms. In this context, the learning rate and the momentum coefficient of the BP algorithm are chosen as 0.2 and 0.4, respectively. The HB is a combined method that uses least squares estimation together with the BP algorithm. The number of iterations for the BP and HB is set as 100. Moreover, a Deep Neural Network (DNN), a recently popular and highly successful method, is used to diagnose the nodules as malignant/benign for comparison purposes and to highlight the performance of the proposed method. For the simulation studies, the Keras deep learning library is used to create a feed-forward neural network. As with the GA control parameters, many trials are made to decide on the control parameters of the DNN. As a result, four hidden layers with 48, 36, 12, and 6 neurons, respectively, are used, and the sigmoid (logistic) function is applied to the output layer. The RMSprop algorithm is chosen for the optimization of the DNN model. The control parameters used in the DNN model are set as follows: the momentum coefficient as 0.9, the learning rate as 0.03, the weight decay as 0.00005, and the dropout rate as 0.2 to prevent the network from overfitting.
The commonly used accuracy (AC), sensitivity (SN), and specificity (SP) measurements are used to evaluate the performance of the proposed method. To compute these measurements, given in Equations (1)–(3), the correct classifications of the model, TP (true positive) and TN (true negative), and its false classifications, FP (false positive) and FN (false negative), are counted. Accuracy measures the model’s ability to classify samples correctly. Sensitivity is the percentage of actual positives that are correctly classified, while specificity shows how well negative examples are predicted by the model.
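The three measurements can be computed from the four counts as in the sketch below, which uses their standard definitions: AC = (TP + TN) / (TP + TN + FP + FN), SN = TP / (TP + FN), and SP = TN / (TN + FP). The function name and the example counts are hypothetical and are not results of this study; treating malignant nodules as the positive class is likewise an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, and specificity from confusion counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all samples classified correctly
        "sensitivity": tp / (tp + fn),   # share of actual positives detected
        "specificity": tn / (tn + fp),   # share of actual negatives detected
    }

# Hypothetical counts for illustration only (positive = malignant).
metrics = classification_metrics(tp=50, tn=20, fp=5, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters for a data set such as this one, where malignant nodules outnumber benign ones and accuracy alone can hide poor performance on the smaller class.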