Cascaded Deep Learning Neural Network for Automated Liver Steatosis Diagnosis Using Ultrasound Images

Diagnosing liver steatosis is an essential step in detecting hepatocirrhosis and liver cancer at an early stage. However, automatic diagnosis of liver steatosis from ultrasound (US) images remains challenging because of poor visual quality arising from sources such as speckle noise and blurring. In this paper, we propose a fully automated liver steatosis prediction model using three deep learning neural networks, with which liver steatosis can be detected automatically with high accuracy and precision. First, transfer learning is used to semantically segment the liver and kidney (L-K) on parasagittal US images, and the L-K area is then cropped from the original US images. The second neural network also performs semantic segmentation: it checks for the presence of a ring that is typically located around the kidney and crops the L-K area from the original US images. These cropped L-K areas are input to the final neural network, SteatosisNet, which grades the severity of fatty liver disease. The experimental results demonstrate that the proposed model can predict fatty liver disease with a sensitivity of 99.78%, specificity of 100%, PPV of 100%, NPV of 99.83%, and diagnostic accuracy of 99.91%, which is comparable to the results annotated by medical experts.


Introduction
Early diagnosis and treatment of liver steatosis, defined as the abnormal accumulation of fat in more than 5% of liver cells, are critically important [1] to prevent further progression of liver diseases, such as hepatocirrhosis and hepatocellular carcinoma [2][3][4]. Ultrasound (US) is the most widely used imaging technique, particularly for diagnosing liver steatosis [5,6]. However, US images are inevitably degraded by speckle noise, blurring, shading, and other artifacts, which cause adverse effects, sometimes leading to misdiagnosis based on image interpretation [7,8]. The US image quality strongly depends on how effectively the speckle noise is reduced. Thus, many attempts have been made to reduce speckle noise and improve visual quality for better diagnoses [9][10][11][12][13][14][15]. However, despite the improved performance, these methods still suffer from several limitations as they are sensitive to the selected kernel or prone to image blurring. In addition to reducing speckle noise, much work has been carried out to assess the level of liver steatosis more precisely by applying complicated algorithms, statistical models, or image-processing techniques to US images. Among these, the hepatorenal index (HRI) and the gray-level co-occurrence matrix (GLCM) are the most commonly known accurate, simple, and cost-effective tools used in the screening for liver steatosis [16][17][18][19]. However, these methods significantly depend on the skill of choosing the region of interest (ROI) and the experience of physicians performing the examination.
Recently, several deep learning-based artificial intelligence approaches have been introduced in the literature [20][21][22][23][24][25] to overcome the issues and challenges associated with US image quality and operator dependency. Andrea et al. [20] proposed a computer-aided diagnosis (CAD) system based on feature extraction to assist in the classification of liver pathologies. The feature extraction is based on first-order statistics, the co-occurrence matrix, the run-length matrix, and fractal dimensions, and three different classifiers are used to evaluate the features: an artificial neural network, a support vector machine (SVM), and k-nearest neighbor. However, the CAD system achieved an accuracy of 79.77%, which is not sufficient for automated clinical use. In addition, Zhang et al. [21] used a shallow convolutional neural network (CNN)-based model to extract texture features from US images and detect liver steatosis levels. Their experiment was generally based on the unrealistic assumption that the texture of a normal liver US image is uniform, while that of the fatty liver is nonuniform. The actual liver US images acquired from a commercial scanner are too obscure and shaded to confidently classify fatty liver using such a shallow CNN-based model. A deep CNN pretrained through transfer learning was first applied by Byra et al. [22] and compared with the HRI and GLCM, showing that the pretrained CNN produces a better result. Transfer learning is a machine learning technique in which a model trained on one task is repurposed for a second, related task; it applies the knowledge gained while solving one problem to a different but related problem [26,27]. However, considering that the performances of the HRI and GLCM are greatly affected by the selection of ROIs, it is difficult to believe that the pretrained CNN outperforms the HRI- and GLCM-based classification methods.
Cao et al. [23] compared three image-processing techniques: envelope signal, grayscale values, and a neural network. Although the comparison showed that the neural network had the best sensitivity and specificity in assessing the severity of nonalcoholic fatty liver disease, the result of a deep learning neural network was not considered. They used a shallow network architecture with only three convolutional layers and two fully connected layers for the experiment.
As an actual deep learning approach for assessing liver steatosis, a previous study [24] used transfer learning with two pretrained networks, VGG16 and Inception v3, which are currently the most preferred models of deep learning neural networks. According to the results, the transfer deep learning exhibits high accuracy and sensitivity in classifying normal and fatty liver images. Nevertheless, further studies are required for automated patch selection because the assessment still requires the use of patches manually chosen according to physician preference, which could have significantly influenced the results of fatty liver estimates. More recently, research using deep learning networks has been conducted, but some limitations remain. Zamanian et al. [25] used four pretrained networks, specifically Inception v2, GoogLeNet, AlexNet, and ResNet101, to extract features from initial data. All features from the four networks were then summed and classified using an SVM. The results were compared with those from the four individual networks. Although improved results were expected, the actual experimental results show that the individual pretrained networks are more accurate than the proposed algorithm combining the four networks. Specifically, AlexNet and ResNet101 produce better results, but they still contain errors.
In the present work, a cascaded deep learning neural network is proposed to automatically estimate the level of liver steatosis from a US image. The model comprises three deep learning neural networks, which perform liver and kidney (L-K) detection, ring detection, and grading of the disease severity (i.e., SteatosisNet).
(a) L-K detection involves cropping of the L-K area from a given US image. To achieve this, the DeepLabv3+ model [28] is employed for the semantic segmentation [29] of L-K candidate areas. This is combined with transfer learning to speed up the training and improve the performance of the model.
(b) Ring detection verifies the L-K area obtained from a US image by checking for the presence of a ring that typically appears around the kidney. This method is employed for areas that are difficult to detect using only the L-K detection method described above.
(c) SteatosisNet takes the above L-K areas as the input and grades the severity of fatty liver disease. It incorporates transfer learning using a CNN model called Inception v3 [30] with a dataset comprising the obtained cropped L-K areas.
We present very promising results regarding the accuracy, sensitivity, and specificity of the proposed model using a dataset from the Samsung Medical Center (SMC) and the widely adopted Byra dataset [22]. The rest of the paper is organized as follows. Data preparation and preprocessing are explained in Section 2.1 and Section 2.2, respectively. In Section 2.3, the proposed cascaded deep learning neural network is briefly described. Sections 2.4 and 2.5 present the L-K detection and ring detection results, respectively. Using the experimental results, the quality of the proposed cascaded deep learning neural network is illustrated in Section 3. Finally, the work is concluded in Section 4.

Dataset Preparation
The liver US images used in this study were collected at the SMC, one of South Korea's leading hospitals, using a Siemens ACUSON Sequoia 512 US machine with a frequency range of 3-6 MHz, 256 gray levels, and a maximum depth of 36 cm. In addition to this main dataset, we collected liver US images from a public dataset [22]. The whole dataset comprised 3200 images obtained in the parasagittal scanning plane. As shown in Figure 1, the parasagittal scanning plane is where most liver parts, the right kidney, and the diaphragm are well visualized in US imaging.
Medical experts previously annotated the images as normal, mild, moderate, or severe, according to the level of steatosis. The US image dataset was then randomly split into training, validation, and test sets with a 6:2:2 ratio, as listed in Table 1. The training set was used to build an optimized network model through supervised learning to label unknown test examples.
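As an illustration of the 6:2:2 random split, the following is a minimal Python sketch and not the authors' MATLAB code. The function name `split_dataset`, the fixed seed, and the use of integer indices as stand-ins for images are assumptions made for the example. The text does not say whether the split was stratified by steatosis grade, so a plain shuffle is used here.

```python
import random

def split_dataset(items, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle items and split them into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumption)
    n_train = round(ratios[0] * len(shuffled))
    n_val = round(ratios[1] * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Indices 0..3199 stand in for the 3200 images of the whole dataset.
train, val, test = split_dataset(range(3200))
print(len(train), len(val), len(test))  # 1920 640 640
```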

Preprocessing
The images were resized to 960 px × 720 px and converted to the PNG file format for input to our deep learning network. Then, all metadata and unnecessary black parts were removed from the dataset before applying histogram equalization (HE), as shown in Figure 2. HE is an image-processing technique that adjusts image intensities to enhance contrast, which reduces intensity differences between images from different medical devices [31]. This allows our deep learning model to maintain better compatibility with images from different equipment or existing databases.
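To make the HE step concrete, the sketch below implements standard global histogram equalization for an 8-bit grayscale image in dependency-free Python. It is an illustration only: the paper does not specify which HE variant or implementation was used, and the nested-list image representation and the function name `equalize_histogram` are assumptions.

```python
def equalize_histogram(image, levels=256):
    """Globally histogram-equalize a grayscale image (list of rows of ints)."""
    # Count how often each gray level occurs.
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    # Cumulative distribution of gray levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]
    # Map each gray level so that the output CDF is approximately linear.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[value] for value in row] for row in image]

# A low-contrast patch is stretched to span the full gray-level range.
patch = [[50, 50], [51, 52]]
equalized = equalize_histogram(patch)
```

After equalization the darkest level in the patch maps to 0 and the brightest to 255, which is the contrast-stretching effect the text relies on for cross-device compatibility.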

Proposed Cascaded Deep Learning Neural Network
The proposed cascaded deep learning neural network is shown in Figure 3. It consists of three cascaded neural networks.
(i) L-K detection: In this step, a pretrained deep learning neural network was used for cropping the L-K area while classifying parasagittal and non-parasagittal images.
(ii) Ring detection: This step checks the parasagittal images via so-called "ring semantic segmentation (RSS)," where the presence of a ring that is typically located around the kidney was determined in the images.
(iii) Liver steatosis grading: SteatosisNet used an Inception v3 network [32] transfer-learned with cropped L-K images. Once transfer-learned, it took the liver and kidney areas obtained from steps (i) and (ii) as the input and determined the grade of liver steatosis.

The following sections detail the steps required for L-K detection, ring detection, and liver steatosis grading.
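The control flow of the three-stage cascade can be summarized in a short Python sketch. All seven callables passed to `grade_steatosis` are hypothetical placeholders for the trained networks and image operations, and returning `None` for images that remain non-parasagittal after ring detection is an assumption, since the text does not state how such images are handled.

```python
def grade_steatosis(he_image, segment_lk, segment_ring, fill_holes,
                    is_parasagittal, apply_mask, steatosis_net):
    """Run the cascade on one HE image; return a grade, or None if rejected."""
    # (i) L-K detection: semantic segmentation followed by the CNN check.
    lk_labels = segment_lk(he_image)
    if not is_parasagittal(lk_labels):
        # (ii) Ring detection: second chance for images missed in step (i).
        lk_labels = fill_holes(segment_ring(he_image))
        if not is_parasagittal(lk_labels):
            return None  # still non-parasagittal (handling is an assumption)
    # Masking keeps only the L-K region of the HE image.
    roi = apply_mask(he_image, lk_labels)
    # (iii) Grading: normal, mild, moderate, or severe.
    return steatosis_net(roi)
```

Passing the stages in as functions keeps the sketch independent of any particular framework; each stage can be replaced by a real model without changing the control flow.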

Liver and Kidney (L-K) Detection
In this step, a semantic segmentation network (SSN) was used to localize and crop the L-K area, while a CNN was employed to classify the US images into two categories: parasagittal and non-parasagittal. Accordingly, a novel L-K detection method was designed, such that it cascades the SSN to the CNN. Figure 4 illustrates cropping of the L-K area and determination of images as parasagittal or not through the serial connection of the SSN and CNN. The steps involved in L-K detection are summarized as follows.
(a) Cropping of the L-K area: The SSN was employed to obtain an L-K labeled image from a given HE image.
(b) Classifying 1st parasagittal and non-parasagittal images: The output of the SSN was used as the input for the CNN, which then classified the L-K labeled image as a parasagittal or non-parasagittal image.
(c) Masking operation: The logical AND operation between the L-K labeled area and the HE image yielded the cropped L-K image ($\mathrm{ROI}_{LK}^{\mathrm{1st}}$).

Cropping L-K Area
Localizing and cropping the L-K cortex on a US image was the most important step in our study because it offered crucial and rich information for predicting the level of liver steatosis. For this, we used a DeepLabv3+ network with transfer learning, in which a pretrained network was further trained on the specific targets of interest, namely the liver and kidney, to segment the L-K area more effectively. First, the DeepLabv3+ network was initialized with the weights from a pretrained ResNet-18 network and then transferred to the L-K labeled dataset (total of 2650 images) to obtain a new classifier for segmenting the L-K area. Figure 5 presents some semantic segmentation results when the transfer learning network was applied to both parasagittal (top row) and non-parasagittal (bottom row) images. Figure 5b shows the ground truth for the two classes, liver and kidney, labeled with different colors. Meanwhile, Figure 5c shows the corresponding L-K labeled images overlaid onto the original HE images.
When the L-K labeled images were obtained, the L-K area, herein referred to as the "$\mathrm{ROI}_{LK}$ image," could be logically cropped via the masking operation, defined as
$$\mathrm{ROI}_{LK} = (\text{HE image}) \cap (\text{L-K labeled area}),$$
where $\cap$ indicates the operator that performs the masking operation between the HE images and the L-K labeled area. In Figure 5, it is also apparent that the image on the bottom row, compared with those on the top row, is much less segmented into red or green because it is a non-parasagittal image. Thus, the $\mathrm{ROI}_{LK}$ images are easily classified into parasagittal and non-parasagittal images using the CNN, which is trained by supervised learning with a training set collected from the L-K labeled images manually annotated as parasagittal or non-parasagittal.

Classifying Parasagittal and Non-Parasagittal Images
When the L-K labeled images were obtained, they were input to the CNN for classification as parasagittal or non-parasagittal, resulting in the first parasagittal and non-parasagittal images. If an L-K labeled image was classified as parasagittal with high probability, then the corresponding HE image was categorized as a parasagittal image. For this purpose, a CNN model called Inception v3 was transfer-learned using a dataset of L-K labeled images. Figure 6 shows the data split between training, validation, and test sets for the transfer learning of Inception v3, where each set consists of parasagittal and non-parasagittal L-K labeled images. The resulting transfer-learned network achieved an accuracy of 99.90% for parasagittal detection on the test set.

Figure 6. Data split: the dataset listed in Table 1 was divided into training (60%), validation (20%), and testing (20%) sets regarding steatosis level.

Masking Operation
As shown in Figure 7, the masking operation was used to extract the L-K regions from the HE images by taking the logical AND of the L-K labeled area and the HE image, yielding the cropped L-K image ($\mathrm{ROI}_{LK}^{\mathrm{1st}}$). The masking operation removes all components other than the L-K regions, which are unnecessary for assessing the steatosis level of the liver, and thus provides better prediction of liver steatosis severity.
The $\mathrm{ROI}_{LK}^{\mathrm{1st}}$ images were input to SteatosisNet for grading the severity of fatty liver disease as normal, mild, moderate, or severe. As will be explained in Section 3.2, the use of cropped L-K images improved the grading accuracy by approximately 4.5% compared with the use of non-cropped images. This was mainly because SteatosisNet can pay more attention to the L-K features cropped by semantic segmentation [32], without irrelevant information.
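The masking operation amounts to a per-pixel selection, sketched below in dependency-free Python. The label codes (0 for background, 1 for liver, 2 for kidney) and the function name `mask_lk` are assumptions for illustration; the paper does not state how the label map is encoded.

```python
def mask_lk(he_image, lk_labels, background=0):
    """Keep HE pixels where the label map is non-background; zero the rest."""
    return [
        [pixel if label != background else 0
         for pixel, label in zip(image_row, label_row)]
        for image_row, label_row in zip(he_image, lk_labels)
    ]

he = [[10, 20, 30],
      [40, 50, 60]]
labels = [[0, 1, 1],   # 1 = liver, 2 = kidney (hypothetical label codes)
          [0, 2, 0]]
print(mask_lk(he, labels))  # [[0, 20, 30], [0, 50, 0]]
```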

Ring Detection
Ring detection was a further step for identifying parasagittal images that might have been missed during L-K detection. Therefore, the input of ring detection was the 1st non-parasagittal image. One of the outstanding features of parasagittal images is the ring-shaped contour encircling the kidney cortex; thus, if a ring-shaped contour can be found on a US image, then it is most likely a parasagittal image. As described in Figure 8, ring detection includes two steps. The first step is RSS, which is a type of semantic segmentation for identifying ring objects at the pixel level on a given 1st non-parasagittal image. To achieve this, DeepLabv3+ was transfer-learned using the same parasagittal training set presented in Figure 6, but labeled with two ring objects, each encircling the liver or kidney cortex.
After the two rings were semantically segmented on the US image, their inner portions (i.e., hole regions) could be completely filled with the corresponding color labels to readily produce an L-K labeled image. Therefore, the second step is the hole-filling process, in which the morphological closing operation [33] was applied to the ring-segmented image, resulting in an L-K labeled image. As shown in Figure 9, an L-K labeled image derived from a parasagittal image was more likely to be parasagittal. Finally, the CNN and masking operation, as described in the preceding subsections on classification and masking, could again be applied to the L-K labeled image to obtain the 2nd parasagittal and the corresponding $\mathrm{ROI}_{LK}^{\mathrm{2nd}}$ images, forming the set of $\mathrm{ROI}_{LK}$, given by
$$\mathrm{ROI}_{LK} = \mathrm{ROI}_{LK}^{\mathrm{1st}} \cup \mathrm{ROI}_{LK}^{\mathrm{2nd}}.$$
Figure 10 shows an example of the effectiveness of ring detection, where a 1st non-parasagittal image, which should be parasagittal, could be reclassified as parasagittal through ring detection. According to the experimental results, the detection accuracy of parasagittal images increased by 0.07% upon the application of ring detection, and hence a very high performance was achieved.
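The hole-filling step can be illustrated with a small Python sketch. Note that this swaps in a border flood fill rather than the morphological closing operation [33] named in the text: every background pixel reachable from the image border is marked as outside, and everything else (the ring and its enclosed interior) is filled. The binary nested-list mask and the function name `fill_ring` are assumptions for the example.

```python
from collections import deque

def fill_ring(ring_mask):
    """Fill the interior of a closed ring in a binary mask (rows of 0/1)."""
    h, w = len(ring_mask), len(ring_mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the flood fill with every background pixel on the image border.
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and ring_mask[r][c] == 0:
                outside[r][c] = True
                queue.append((r, c))
    # 4-connected flood fill through background pixels.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w
                    and not outside[nr][nc] and ring_mask[nr][nc] == 0):
                outside[nr][nc] = True
                queue.append((nr, nc))
    # Pixels not reachable from the border are ring or enclosed interior.
    return [[0 if outside[r][c] else 1 for c in range(w)] for r in range(h)]

ring = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
filled = fill_ring(ring)  # the center pixel (2, 2) becomes 1
```

A flood fill only fills rings that are fully closed; a ring with a gap would leak, which is one reason a closing operation with a suitably sized structuring element may be preferred in practice.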
The set of $\mathrm{ROI}_{LK}$ images was input to SteatosisNet for grading the severity of fatty liver disease as normal, mild, moderate, or severe.

Results
The proposed deep learning model was implemented in MATLAB on a machine with two GeForce RTX 2080 Ti GPUs (11 GB each). Liver US images were collected from the SMC and from a public dataset (https://zenodo.org/record/1009146#.YL2a5fkzYuU (accessed on 21 May 2021), [24]) to verify the performance of the proposed cascaded deep learning model. The images were categorized based on the level of disease severity: normal, mild, moderate, or severe. In addition, data augmentation techniques [34] were used to generate more training data, where affine transformations, such as a random rotation of ±20° and a random translation of ±5 pixels in the horizontal/vertical direction, were applied to the original dataset. These data augmentations help avoid overfitting during training. As shown by the experiments, the proposed cascaded deep learning neural network yields better performance than the recently reported results (see Table 3), confirming the advantages of the proposed model. The results of L-K detection, RSS, and SteatosisNet are described in detail below.
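The augmentation described above can be sketched as a random rotation about the image center followed by a random integer translation. The following dependency-free Python example uses inverse mapping with nearest-neighbor sampling; the interpolation method, the zero fill value for pixels mapped from outside the image, and the function name `random_affine` are assumptions, and the authors' MATLAB augmentation may differ in these details.

```python
import math
import random

def random_affine(image, max_deg=20.0, max_shift=5, rng=None, fill=0):
    """Randomly rotate (about the center) and translate a grayscale image."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    theta = math.radians(rng.uniform(-max_deg, max_deg))
    tx = rng.randint(-max_shift, max_shift)  # horizontal shift in pixels
    ty = rng.randint(-max_shift, max_shift)  # vertical shift in pixels
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse map: undo the translation, then undo the rotation.
            xs, ys = x - tx - cx, y - ty - cy
            src_x = round(cos_t * xs + sin_t * ys + cx)
            src_y = round(-sin_t * xs + cos_t * ys + cy)
            if 0 <= src_x < w and 0 <= src_y < h:
                out[y][x] = image[src_y][src_x]
    return out

image = [[r * 8 + c for c in range(8)] for r in range(8)]
augmented = random_affine(image, rng=random.Random(42))
```

Iterating over output pixels and sampling from the source avoids the holes that a forward mapping would leave after rotation.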

Performances of L-K Detection and Ring Semantic Segmentation
The resulting performances of semantic segmentation related to L-K detection and ring detection are presented in Table 2. In this work, a cross-entropy loss was used when adjusting the model weights during training of the neural networks, while the semantic segmentation quality was evaluated using metrics such as the mean accuracy, mean intersection over union (IOU), and boundary F-1 score (BF1 score), as shown in Table 2.
Table 2. Semantic segmentation performances of L-K detection and ring detection (IOU: intersection over union, BF1: boundary F-1).
The mean IOU is a common evaluation metric for image semantic segmentation and quantifies the percentage overlap between the ground truth and predicted pixels, whereas the BF1 score measures how closely the boundaries of segmented images match those in the ground truth. Table 2 shows that the BF1 score was relatively low compared with the mean accuracy or the mean IOU. This is because speckle noise is inherently present in medical US images. Fortunately, in assessing liver steatosis, echogenicity and echotexture from the L-K areas are much more important than the boundary. The BF1 score merely indicates how well the predicted boundary aligns with the true boundary, and hence does not significantly affect the prediction accuracy of hepatic steatosis. Therefore, it makes sense to improve the IOU or accuracy metrics rather than the BF1 score by either adjusting training parameters or augmenting data. The mean accuracy, IOU, and BF1 score were lower for ring detection than for the semantic segmentation of the L-K area, but this is not very important because ring detection only determines the edge of the liver and kidney. It was also found that the overall detection accuracy of parasagittal images could be improved to 99.97% upon ring detection.
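For reference, the mean IOU can be computed per class as the number of pixels in both the ground truth and the prediction divided by the number of pixels in either, then averaged over classes. The sketch below is a generic Python illustration with hypothetical label codes (0 = background, 1 = liver, 2 = kidney); it is not the evaluation code used in the paper, and skipping classes absent from both maps is an assumption.

```python
def mean_iou(truth, pred, classes):
    """Mean intersection over union across the given class labels."""
    ious = []
    for cls in classes:
        inter = union = 0
        for truth_row, pred_row in zip(truth, pred):
            for t, p in zip(truth_row, pred_row):
                in_truth, in_pred = t == cls, p == cls
                inter += in_truth and in_pred
                union += in_truth or in_pred
        if union:  # skip classes absent from both maps (assumption)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

truth = [[1, 1, 0],
         [2, 2, 0]]
pred = [[1, 0, 0],
        [2, 2, 2]]
score = mean_iou(truth, pred, classes=(0, 1, 2))
# Per-class IOU is 1/3, 1/2, and 2/3, so the mean is 0.5.
```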

Performance of SteatosisNet
As shown in Figure 11, SteatosisNet classifies $\mathrm{ROI}_{LK}$ images, each falling into one of four categories regarding the level of steatosis: normal, mild, moderate, or severe. SteatosisNet uses the CNN model Inception v3, transfer-learned with 2,0 $\mathrm{ROI}_{LK}$ images, to grade the severity of fatty liver disease. The $\mathrm{ROI}_{LK}$ images were split into four categories according to the steatosis level: normal (0-5%), mild (5-30%), moderate (30-70%), and severe (70-100%). The model was trained with the Adam (adaptive moment estimation) optimizer, along with momentum (momentum rate = 0.9). The initial learning rate was set to 0.001.
The maximum number of epochs, with the termination condition stated as a validation accuracy of <99.98%, was set to 10 to guarantee sufficient training of the model while mitigating network overfitting. Early stopping is a technique used to terminate training before overfitting occurs; the training terminates immediately when the termination condition is satisfied. The training data are shuffled at the beginning of every epoch to help the model converge on the optimal solution sooner. Figure 12a shows the training and validation accuracy (y-axis) over 10 training epochs (x-axis). The corresponding loss is displayed in Figure 12b. Note that the validation set is not used to update the network weights, but to assess whether the model suffers from overfitting. As shown in Figure 12b, the training stops at epoch 10, at which point the validation accuracy is below 99.98%.
The experimental results were assessed using performance evaluation metrics, including the classification accuracy, sensitivity, and specificity. The analytical comparison in Figure 13 shows how much the performance of SteatosisNet is improved with the use of the cropped L-K image compared with the non-cropped image. The performance evaluation metrics improved by approximately 4-5% on average when cropped US images were used as the input to SteatosisNet. This is because the cropped L-K image does not contain unnecessary parts; therefore, SteatosisNet can focus more on liver steatosis-related areas, leading to better results.
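The evaluation metrics named above, together with the PPV and NPV reported in the abstract, follow from the four counts of a binary confusion matrix. The Python sketch below uses the standard definitions; treating the task as binary (fatty versus normal) and the example counts are assumptions for illustration and are not results from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from binary confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts, chosen only to exercise the formulas.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics["sensitivity"], metrics["specificity"], metrics["accuracy"])
# 0.9 0.95 0.925
```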
Figure 13. Performance comparison between use of (a) non-cropped and (b) cropped images for the SMC dataset.
The proposed model was compared with state-of-the-art results reported in the published literature [22][23][24][25][26][27]. As seen in Table 3, the proposed model achieves the best results across the performance evaluation metrics considered, namely accuracy, sensitivity, and specificity. The resulting metrics on the testing dataset reached almost 99-100%. It is also apparent from Table 3 that the proposed model exhibits almost the same performance for both the SMC and Byra datasets. Thus, it can be concluded that the proposed cascaded neural network model is fairly robust across databases and across different US image qualities. The results of this study indicate that the proposed model can serve as a valid and reliable screening tool for estimating the level of steatosis and for identifying patients who require further investigation.

Table 3. Performance comparison regarding classification accuracy, sensitivity, and specificity (%) with recently published state-of-the-art algorithms.

Ablation Study of Our Method on SMC Database
We designed an ablation study to examine the contribution of ring detection and L-K detection on the SMC dataset of US images. SteatosisNet is the deep convolutional neural network that decides the steatosis level at the final stage, so it is essential and is retained in every configuration. We compare the performance of our network under the following configurations: (1) ring detection is ablated, i.e., only L-K detection is used; (2) L-K detection is ablated, i.e., only ring detection is used; and (3) both ring and L-K detections are ablated, i.e., only SteatosisNet is used. These ablated architectures are trained under the same training scheme and tested with the same data.
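The three ablated configurations can be pictured as the same cascade with optional stages switched off. The sketch below is illustrative only: `run_cascade`, `build_configs`, and the callables `grade`, `crop_lk`, and `has_ring` are hypothetical stand-ins for the trained networks, and both the stage order and the choice to return `None` when no ring is found are assumptions rather than details taken from the paper.

```python
from typing import Any, Callable, Dict, Optional


def run_cascade(
    image: Any,
    grade: Callable[[Any], str],
    crop_lk: Optional[Callable[[Any], Any]] = None,
    has_ring: Optional[Callable[[Any], bool]] = None,
) -> Optional[str]:
    """Run the cascade on one image; passing None for a stage ablates it."""
    # Ring detection (assumed here to act as a gate on the input image).
    if has_ring is not None and not has_ring(image):
        return None
    # L-K detection: keep only the liver-kidney area.
    if crop_lk is not None:
        image = crop_lk(image)
    # The final grading stage is present in every configuration.
    return grade(image)


def build_configs(grade, crop_lk, has_ring) -> Dict[str, dict]:
    """Keyword arguments for the full model and the three ablations."""
    return {
        "full model": dict(grade=grade, crop_lk=crop_lk, has_ring=has_ring),
        "(1) ring ablated": dict(grade=grade, crop_lk=crop_lk, has_ring=None),
        "(2) L-K ablated": dict(grade=grade, crop_lk=None, has_ring=has_ring),
        "(3) both ablated": dict(grade=grade, crop_lk=None, has_ring=None),
    }
```

Each configuration would then be evaluated with `run_cascade(image, **config)` over the same test images, so that the only difference between runs is which stages are active.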
Table 4 shows the ablation study results on the dataset. Accuracy is widely used for evaluating classification models; it is the proportion of correctly classified instances, either true positives or true negatives, among all instances. Equation (3) defines the accuracy, where TP stands for true positives, TN for true negatives, FP for false positives, and FN for false negatives.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$
From Equation (3), the most direct way to improve accuracy is to decrease FP and FN. When we crop and extract only the liver and kidney areas through L-K detection, which are the most informative portion of US images for estimating the steatosis level, FP and FN decrease by 97.96% (see Figure 13), which in turn improves accuracy. Table 4 quantifies how much cropping improves the overall performance: the test performance degrades substantially when L-K detection is ablated. We therefore find that L-K detection is necessary in our system. Additionally, Table 4 indicates that, on average, ring detection is effective in reducing the screening inspection cost. Overall, the ablation results show that L-K detection contributes more to performance than ring detection, but both are useful for implementing an effective liver steatosis diagnosis system.
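The accuracy in Equation (3), along with the sensitivity and specificity reported in this paper, can be computed directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the class `ConfusionCounts` and the function names are hypothetical, and any counts passed to them here would be made-up illustrations, not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def accuracy(c: ConfusionCounts) -> float:
    """Proportion of correctly classified instances, as in Equation (3)."""
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)
```

With the total number of test images fixed, every misclassification that is removed moves one count from FP or FN into TN or TP, so the numerator of Equation (3) grows while the denominator stays the same. This is the sense in which reducing FP and FN raises accuracy.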

Discussion and Conclusions
US images are the most commonly used type of image in CAD systems for diagnosing fatty liver disease. In this paper, we proposed a cascaded deep learning neural network model that can automatically predict the level of liver steatosis. The validity of the proposed model was evaluated using both the Samsung Medical Center (SMC) dataset and the Byra database, which is widely adopted in existing studies. Using semantic segmentation of the liver and kidney, the automatic diagnosis task was accomplished via the masking operation and ring detection. Furthermore, the cascaded deep learning network model exhibited excellent performance in terms of sensitivity, specificity, and accuracy in predicting the level of liver steatosis. We achieved an accuracy of 99.91%, sensitivity of 99.78%, and specificity of 100%, which exceed the previously reported results, highlighting the usefulness and feasibility of the model as a screening tool for grading liver steatosis. We attribute this result to the incorporation of the masking operation and ring detection. The former removes all unnecessary components, except the L-K regions, before the steatosis level is assessed, while the latter minimizes the number of US images that must be inspected by a physician.
The masking operation, which passes only the L-K areas to the input of SteatosisNet, gives results that compare well with the annotation consistency of medical experts and outperform the state-of-the-art techniques. By eliminating the portions of the image outside the liver and kidney, the masking operation yields a better prediction of the severity of fatty liver disease. The ring detection, which tries to detect a ring-shaped contour on US images, increases the detection accuracy of parasagittal images by 0.07% and can accordingly reduce the screening inspection cost. Screening an entire US image is labor-intensive and time-consuming for physicians. The proposed model does not require the presence of a physician, so physicians can invest their time in more important tasks and in managing patients in critical condition. Thus, the proposed model is promising and can be widely applicable for screening inspection of fatty liver on US images, with a performance comparable to that of physicians.
However, our method has a limitation. The algorithm only works on ultrasound images captured by the same ultrasonography machine; that is, the network transfer-learned with the SMC dataset only works on the SMC dataset. This does not mean that other types of ultrasonography images cannot be used, but an extra training process is needed whenever the ultrasonography machine changes. In the future, as different kinds of data are accumulated and learned, we expect the model to work on images taken from all kinds of ultrasound devices. We expect the proposed method to be used clinically by radiologists or to assist them.