StoolNet for Color Classification of Stool Medical Images

Abstract: The color classification of stool medical images is commonly used to diagnose digestive system diseases, so it is important in clinical examination. To reduce the heavy burden on laboratorians, digital image processing technologies and deep learning methods are employed in this paper for the automatic color classification of stool images. The region of interest (ROI) is segmented automatically and then classified with a shallow convolutional neural network (CNN) dubbed StoolNet. Thanks to its shallow structure and accurate segmentation, StoolNet converges quickly. Extensive experiments confirm the good performance of StoolNet and show the impact of different training sample numbers on it. The proposed method has several advantages, such as low cost, accurate automatic segmentation, and accurate color classification. Therefore, it can be widely used in artificial intelligence (AI) healthcare.


Introduction
The advantages of artificial intelligence (AI) have brought many achievements to the healthcare field in both industry and academia [1]. For example, convolutional neural networks (CNNs), a research hotspot in recent years, have been used in healthcare to automatically diagnose diseases [2]. For AI healthcare, clinical examination is critical in diagnosing and preventing diseases. Digestive system diseases are seriously harmful to people's health. For instance, among the different kinds of cancers, the incidence and mortality of stomach cancer rank fifth and third, respectively, in the world [3]. As an important clinical examination, stool examination can effectively help diagnose and prevent digestive system diseases. In particular, stool color can reflect many conditions of a patient's digestive system. Unfortunately, the current stool color examination depends heavily on laboratorians' professional skills.
Some researchers have studied human stools based on biotechnology [4]. However, an accurate automatic analysis system for stool color based on computer technologies is still absent [5]. Currently, many clinical diagnosis technologies are designed based on digital medical images, so it is significant to improve the quality of medical images [6] to accurately diagnose diseases based on medical images [7]. X-ray [8], computed tomography [9], and ultrasonic images [10] can generate many important medical images that can be used for disease diagnosis.
Stool examination has two modes, namely appearance examination and microscopic examination. Color and trait are two important attributes in stool appearance examination. The results of color examination can be used to diagnose diseases such as dysentery, enteritis, and rectal diseases.
The contributions of this paper are as follows.
1. The current stool color examination depends heavily on medical laboratorians' professional skills. To our knowledge, we are the first to focus on this factor, and we propose a lightweight, practical, and efficient automatic color classification method that remarkably alleviates the burden on medical laboratorians.
2. The developed preprocessing method can automatically and accurately segment the effective stool region. The designed model, namely StoolNet, converges well and classifies the stool color automatically and accurately.
In this paper, Section 2 introduces the whole framework, including the preprocessing method and StoolNet. The experimental results are shown in Section 3. Conclusions and future work are discussed in Section 4.

Overview of the Proposed Method
The proposed framework is based on digital image processing and CNN, so it not only solves the problem of cross-infection in manual mode but also greatly reduces the heavy burden of manual stool examination. In this paper, neither chemical nor physical processing methods are necessary to deliberately enlarge the stool characteristics. In order to reduce costs, stool samples are classified directly based on the features in original images.
The whole framework consists of two parts, preprocessing and classification. The original images are preprocessed by an automatic segmentation algorithm. The classification mechanism, named StoolNet, is an effective shallow CNN. The experiments show that the accuracy of StoolNet on the training sets can reach 100% after only a few iterations. The whole framework is shown in Figure 1.

Preprocessing Stage
Due to the particularity of stools, sample shapes cannot be fixed. Irregular shapes are very likely to lead to inaccurate ROI localization and to degrade classification accuracy. An effective and efficient method [31] was developed to segment the stool region from the original images. Six features were tested and represented as grayscale maps, as shown in Figure 2. According to our investigation and analysis, the discrimination between stool and background is relatively high in saturation maps.
The main purpose of preprocessing is to obtain the ROI correctly and accurately, because the quality of the ROI directly affects the performance of the subsequent classification algorithm. Threshold-based segmentation is a simple, quick, and efficient way to extract the ROI. The threshold segmentation operation is

$$f(x, y) = \begin{cases} 1, & g(x, y) \geq T \\ 0, & g(x, y) < T \end{cases} \tag{1}$$

where f(x, y) labels the foreground and background pixels and g(x, y) represents the pixel values in the original image. T is the threshold value used to binarize the image. Pixels with values lower than T are classified as background, and pixels with values higher than or equal to T are classified as foreground. An adaptive algorithm is employed to optimize the threshold value for accurate segmentation. Once the optimal threshold is obtained, the difference between the background and foreground regions is maximized. The goal is to maximize the inter-class variance with

$$w_0 = \frac{N_0}{M \times N} \tag{2}$$

where $w_0$ is the proportion of foreground pixels among all the pixels in the image, $N_0$ is the number of foreground pixels, and the size of the image is M × N. Similarly,

$$w_1 = \frac{N_1}{M \times N} \tag{3}$$

where $w_1$ is the proportion of background pixels among all the pixels in the image and $N_1$ is the number of background pixels. $N_0$ and $N_1$ should satisfy

$$N_0 + N_1 = M \times N \tag{4}$$

i.e., their sum should be equal to the total number of pixels in the image. Accordingly, $w_0$ and $w_1$ should satisfy

$$w_0 + w_1 = 1 \tag{5}$$

i.e., their sum should be equal to 1. The average grayscale of the whole image is

$$\mu = w_0 \mu_0 + w_1 \mu_1 \tag{6}$$

where $\mu$ is the average grayscale of the whole image, and $\mu_0$ and $\mu_1$ are the average values of the foreground and background pixels, respectively. The inter-class variance g between the foreground and background regions is given as

$$g = w_0 (\mu_0 - \mu)^2 + w_1 (\mu_1 - \mu)^2 \tag{7}$$

Substituting Equation (6) into Equation (7) yields

$$g = w_0 w_1 (\mu_0 - \mu_1)^2 \tag{8}$$
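The substitution step from Equation (7) to Equation (8) is not spelled out in the text. The following worked derivation is a sketch of the missing algebra, using only Equations (5), (6), and (7) and the symbols defined above.

```latex
\begin{align*}
\mu_0 - \mu &= \mu_0 - (w_0 \mu_0 + w_1 \mu_1)
             = (1 - w_0)\mu_0 - w_1 \mu_1
             = w_1 (\mu_0 - \mu_1), \\
\mu_1 - \mu &= \mu_1 - (w_0 \mu_0 + w_1 \mu_1)
             = (1 - w_1)\mu_1 - w_0 \mu_0
             = -\,w_0 (\mu_0 - \mu_1), \\
g &= w_0 (\mu_0 - \mu)^2 + w_1 (\mu_1 - \mu)^2 \\
  &= w_0 w_1^2 (\mu_0 - \mu_1)^2 + w_1 w_0^2 (\mu_0 - \mu_1)^2 \\
  &= w_0 w_1 (w_0 + w_1)(\mu_0 - \mu_1)^2
   = w_0 w_1 (\mu_0 - \mu_1)^2 .
\end{align*}
```

The first two lines use $1 - w_0 = w_1$ and $1 - w_1 = w_0$ from Equation (5), and the last line uses $w_0 + w_1 = 1$ again.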
For all possible values of T, we select the value that maximizes g in Equation (8). The algorithm is performed to extract the ROI candidates from the six color components, as shown in Figure 3. Saturation is the best feature for foreground segmentation among the six color features. The target is labeled as the foreground region in the binary image, as shown in Figure 4.
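The threshold search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB implementation: the function names (`saturation_map`, `otsu_threshold`, `segment`), the use of the HSV saturation channel from the standard `colorsys` module, the 256 gray levels, and the nested-list image representation are all assumptions made for the example.

```python
import colorsys


def saturation_map(rgb_image):
    """Convert rows of (R, G, B) tuples in 0..255 to saturation values in 0..255.

    Assumption: "saturation" means the S channel of HSV.
    """
    return [
        [round(colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] * 255) for r, g, b in row]
        for row in rgb_image
    ]


def otsu_threshold(gray, levels=256):
    """Return the T that maximizes g = w0 * w1 * (mu0 - mu1)^2 (Equation (8))."""
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)                                  # M x N
    total_sum = sum(v * c for v, c in enumerate(hist))

    best_t, best_g = 0, -1.0
    n1, sum1 = 0, 0                                    # background: values < T
    for t in range(1, levels):
        n1 += hist[t - 1]
        sum1 += (t - 1) * hist[t - 1]
        n0 = total - n1                                # foreground: values >= T
        if n0 == 0 or n1 == 0:
            continue                                   # one class is empty
        w0, w1 = n0 / total, n1 / total
        mu0 = (total_sum - sum1) / n0
        mu1 = sum1 / n1
        g = w0 * w1 * (mu0 - mu1) ** 2
        if g > best_g:
            best_t, best_g = t, g
    return best_t


def segment(gray, t):
    """Binarize as in Equation (1): 1 for foreground (>= T), 0 for background."""
    return [[1 if value >= t else 0 for value in row] for row in gray]


# Toy 2 x 2 image: gray background on the left, brownish pixels on the right.
image = [
    [(200, 200, 200), (120, 70, 20)],
    [(210, 205, 200), (110, 60, 30)],
]
sat = saturation_map(image)
mask = segment(sat, otsu_threshold(sat))
print(mask)  # [[0, 1], [0, 1]]
```

The sketch assumes the stool region has the higher saturation, which matches the rule that pixels at or above T are foreground. The running sums make the search a single pass over the histogram instead of recomputing both class means for every candidate T.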

StoolNet
There are four different colors for stool medical examination, namely brown, black, yellow, and green. An effective shallow CNN, named StoolNet, is carefully designed for color classification. The structure of the proposed StoolNet is shown in Figure 5. The input of StoolNet has been preprocessed, so background information is eliminated. The input is resized to 200 × 200. The convolutional kernels are 5 × 5 with a stride of 1. All of the activation functions in StoolNet are ReLU, defined as

$$f(x) = \max(0, x) \tag{9}$$

The dropout method is used in every layer. The loss function is the cross-entropy, and stochastic gradient descent (SGD) is used to update the weights. The loss function is defined as

$$L = -\frac{1}{M} \sum_{k=1}^{M} \sum_{i=1}^{N} y_{i,k} \log p_{i,k} \tag{10}$$

where M and N are the numbers of samples and classes, respectively. $y_{i,k}$ is the true label of the k-th sample: $y_{i,k} = 1$ if the sample has the i-th label, and $y_{i,k} = 0$ otherwise. $p_{i,k}$ is the predicted probability of the i-th label for the k-th sample.
A benefit of the preprocessing stage is that the background is masked and, hence, the input of StoolNet is more discriminative than the corresponding original image. Since the number of classes is only four, a shallow structure is enough to achieve a satisfactory classification performance. The experiments in Section 3 show that a deeper network does not always yield higher accuracy. Therefore, choosing a suitable depth is a key question for classification tasks. Another advantage of a shallow structure is its low computational complexity. Considering the balance between accuracy and computational complexity, StoolNet is a good choice for meeting these requirements. Accordingly, StoolNet can be applied to several commercial healthcare applications.
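The ReLU activation and the cross-entropy loss described above can be illustrated with a short standard-library Python sketch. It is not the TensorFlow code used in the paper. The function names are hypothetical, the `softmax` step is an assumption about how the class probabilities are produced, and the loss is assumed to be averaged over the M samples.

```python
import math


def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def softmax(logits):
    """Turn raw scores into probabilities (assumed output layer)."""
    shift = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(labels, probs, eps=1e-12):
    """Average cross-entropy over M samples.

    labels[k][i] is 1 if sample k has class i, else 0 (one-hot).
    probs[k][i] is the predicted probability of class i for sample k.
    """
    m = len(labels)
    loss = 0.0
    for y_k, p_k in zip(labels, probs):
        for y, p in zip(y_k, p_k):
            loss -= y * math.log(max(p, eps))  # eps guards against log(0)
    return loss / m


# Four classes in the order brown, black, yellow, green; one sample labeled "black".
labels = [[0, 1, 0, 0]]
probs = [softmax([0.2, 2.5, -1.0, 0.1])]
print(cross_entropy(labels, probs))
```

With a one-hot label, only the probability assigned to the true class contributes to the loss, so a uniform prediction over four classes gives a loss of log 4.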

Experiments
The experimental environment was configured with an Intel Xeon E-2136 CPU @ 3.30 GHz, 16 GB of memory, an NVIDIA Quadro P4000 GPU, and a Windows 10 64-bit operating system. The resolution of the images in the self-collected dataset was 640 × 480 pixels. The proposed method and its testing were implemented using the TensorFlow deep learning framework [32]. The preprocessing code was written in MATLAB R2016a (MathWorks, US). The network-related code, including the random selection of samples for the training and testing sets, was written in Python using the Spyder development environment. For training the network, the learning rate was set to 0.001, the maximum number of iterations to 120, and the batch size to 32.
Stool color classification is not a publicly benchmarked problem, and there is no public database. It is very difficult to collect a stool image dataset because of privacy problems. Even though stool color classification is very important for healthcare, few researchers have ventured into this field because of the particularity of stool samples. In collaboration with a hospital and a medical institution, we collected a stool image dataset containing 110 images. All sample images were carefully labeled by professional doctors, so we believe the conclusions are drawn from reliable medical results. Each image was rotated three times, giving 440 images in total. To the best of our knowledge, this article is the first paper on stool color classification. Since stool color classification is a particular field, StoolNet cannot readily be compared with other methods, but extensive experiments confirm the effectiveness and efficiency of the proposed method.
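The rotation-based enlargement from 110 to 440 images can be sketched as follows. The rotation angles are not stated in the text, so successive 90-degree rotations are an assumption here, as are the function names and the nested-list image representation.

```python
def rotate90(image):
    """Rotate a row-major image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment_with_rotations(images):
    """Return each image followed by its three successive 90-degree rotations."""
    augmented = []
    for image in images:
        current = image
        augmented.append(current)
        for _ in range(3):
            current = rotate90(current)
            augmented.append(current)
    return augmented


# 110 placeholder "images" of size 2 x 2 stand in for the real dataset.
dataset = [[[i, i + 1], [i + 2, i + 3]] for i in range(110)]
print(len(augment_with_rotations(dataset)))  # 440
```

Each original contributes itself plus three rotated copies, which gives the factor of four reported above.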
The proportion of data used to train the network was set to three ratios, 75%, 50%, and 25%, and the rest of the data were used for testing. To reduce the effect of the training sample quality, the experiments were repeated five times. In each experiment session, the training and testing samples were randomly selected in the prescribed ratio. Figure 6 shows the accuracy and loss curves on the training sets and testing sets. Because the network is an effective shallow CNN and its inputs are preprocessed, StoolNet can converge to a good classification result within several epochs for all three prescribed ratios.
The accuracy of Group 5 in (d) of Figure 6 is fairly constant until 10 epochs, and the loss likewise shows no remarkable change until 10 epochs, as shown in Figure 7. A CNN is a probabilistic model containing a certain randomness, so it is possible that the weights are not properly adjusted in the early epochs, which results in fairly constant loss and accuracy.
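The experimental protocol above (three training ratios, five randomly drawn groups each) can be sketched with the standard `random` module. The function names, the fixed seed, and the use of integer sample indices are assumptions made for illustration. The original selection code is not given in the text.

```python
import random


def random_split(samples, train_ratio, rng):
    """Shuffle a copy of the samples and cut it at the prescribed ratio."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def make_groups(samples, train_ratio, n_groups=5, seed=0):
    """Draw n_groups independent random train/test splits."""
    rng = random.Random(seed)                # seeded for reproducibility (assumption)
    return [random_split(samples, train_ratio, rng) for _ in range(n_groups)]


samples = list(range(440))                   # indices of the 440 images
for ratio in (0.75, 0.50, 0.25):
    groups = make_groups(samples, ratio)
    train, test = groups[0]
    print(ratio, len(groups), len(train), len(test))
```

With 440 images, the three ratios give 330, 220, and 110 training samples per group, with the remainder used for testing.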
When the training sample proportion is not large enough, features are learned more slowly. It can be concluded that the fewer samples are used to train the network, the slower the convergence. To confirm the effect of the preprocessing method, the same experiments were performed on the dataset without preprocessing. The results are shown in Figure 8. No matter how many samples are used for training, the network has difficulty converging and the curves oscillate violently. Table 1 shows the numbers of convergence epochs on the training sets with different training sample proportions, together with their averages. Experiments with and without preprocessing were both performed; * means the dataset was preprocessed. The preprocessing method greatly accelerates convergence. The numbers of convergence epochs were recorded when the accuracy reached 100%.
StoolNet can achieve remarkable performance on the training sets, but different training sample proportions can greatly impact the results on the testing sets, as shown in (b), (d), and (f) of Figure 6 and in (b), (d), and (f) of Figure 7. Table 2 reveals the impact of training sample proportions on the accuracy; * means the dataset was preprocessed. The preprocessing method remarkably improves the accuracy, so it is necessary for the performance of StoolNet. The average accuracies of the five groups with different training sample proportions were calculated and are shown in Figure 9. The network can learn more information and features with more training samples. Thus, the larger the training sample number is, the fewer epochs are needed to converge.
Figure 10 reveals the relationship between the network depth and its classification ability. The results show that "deeper is not always better", so the depth of StoolNet was set to two in terms of accuracy and efficiency.
StoolNet was compared with the method proposed in [30]. The results are shown in Figure 11. Five groups were tested on the preprocessed images. The lightweight StoolNet was able to converge in each group, while the method in [30] could not converge well. This is because the structure of the convolutional layers of StoolNet is simpler than that of the method in [30], so StoolNet converges better on a small training set.
Most medical equipment for collecting stool images can properly control illumination. However, illumination conditions can differ and change the color appearance of stool. Since the structure of StoolNet is light, a small- or medium-sized dataset collected under a given illumination can be conveniently used to train StoolNet. We simulated the effect of different illumination scales. The illumination intensity is increased or reduced if the illumination scale is larger or smaller than 1, respectively, as shown in Figure 12.
There are two stages of the proposed method, namely preprocessing and StoolNet. Preprocessing is based on the inter-class variance between the foreground and the background and is only slightly affected by illumination. The foreground and background pixels are labeled as 1 and 0 in the segmentation mask images. The dissimilarity between the original segmentation mask image, denoted as $I_1$, and another one under different illumination, denoted as $I_2$, is measured with the Hamming distance, denoted as D, and computed as follows:

$$g(x, y) = \begin{cases} 0, & I_1(x, y) = I_2(x, y) \\ 1, & I_1(x, y) \neq I_2(x, y) \end{cases} \tag{11}$$

where (x, y) is the pixel coordinate, and

$$D = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} g(x, y)}{m \times n} \tag{12}$$

where m and n are the width and height of the image, respectively.
Table 3 shows the average Hamming distances between the original segmentation mask image and the others under different illuminations. The dissimilarities are all small, so the preprocessing is only slightly affected by illumination change. The proliferated samples with a given illumination scale are conveniently used to train StoolNet. Table 4 shows that all the StoolNets trained under different illumination conditions have good classification performance, so the StoolNet model is robust in different illumination environments.
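The illumination experiment can be sketched in Python as follows. The `hamming_distance` function follows Equations (11) and (12) directly. The `scale_illumination` function is only an assumed model of the illumination scale (each channel multiplied by the scale and clipped to 0..255), since the text does not specify how the scaling is applied. Both function names are hypothetical.

```python
def scale_illumination(image, scale):
    """Multiply every channel by `scale` and clip to 0..255 (assumed model)."""
    return [
        [tuple(min(255, max(0, round(c * scale))) for c in pixel) for pixel in row]
        for row in image
    ]


def hamming_distance(mask1, mask2):
    """Fraction of pixels whose labels differ between two binary masks."""
    if len(mask1) != len(mask2) or any(len(a) != len(b) for a, b in zip(mask1, mask2)):
        raise ValueError("masks must have the same shape")
    total = sum(len(row) for row in mask1)             # m x n
    differing = sum(
        1 for row1, row2 in zip(mask1, mask2) for a, b in zip(row1, row2) if a != b
    )
    return differing / total


# Two 2 x 2 masks that disagree on one of four pixels.
original_mask = [[1, 0], [1, 1]]
relit_mask = [[1, 0], [0, 1]]
print(hamming_distance(original_mask, relit_mask))     # 0.25
print(scale_illumination([[(100, 200, 50)]], 1.5))     # [[(150, 255, 75)]]
```

Because D is normalized by the image size, it lies between 0 (identical masks) and 1 (every pixel differs), so values from images of different sizes remain comparable.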

Conclusions and Future Work
This paper addresses stool color classification, which has not been extensively studied but is a very important field in healthcare. We collected stool images to establish a dataset for appearance detection. The accuracy of the developed classification algorithm was shown to reach 97.5%. Extensive experiments revealed the relationship between the training sample proportion and accuracy, the relationship between the training sample proportion and the number of convergence epochs, and the relationship between network depth and classification ability. The proposed framework can basically meet the requirements of commercial and practical applications, but more experiments should be designed to test the robustness of the algorithm in future work. In addition, we will continue collecting stool images to enlarge the dataset and further improve our method.