Research on the Application of Artificial Intelligence in Public Health Management: Leveraging Artificial Intelligence to Improve COVID-19 CT Image Diagnosis

Since the start of 2020, the outbreak of the Coronavirus disease (COVID-19) has been a global public health emergency, and it has caused unprecedented economic and social disaster. In order to improve the diagnosis efficiency for COVID-19 patients, a number of researchers have conducted extensive studies on applying artificial intelligence techniques to the analysis of COVID-19-related medical images. The automatic segmentation of lesions from computed tomography (CT) images using deep learning provides an important basis for the quantification and diagnosis of COVID-19 cases. For a deep learning-based CT diagnostic method, a set of accurate pixel-level labels is essential for the training process of a model. However, the translucent ground-glass area of the lesion usually leads to mislabeling during the manual labeling operation, which weakens the accuracy of the model. In this work, we propose a method for correcting rough labels; that is, to hierarchize these rough labels into precise ones by analyzing the pixel distribution of the infected and normal areas in the lung. The proposed method corrects the incorrectly labeled pixels and enables the deep learning model to learn the infected degree of each infected pixel. Building on these hierarchical labels, an aiding system (named DLShelper) for COVID-19 CT image diagnosis is also proposed. The DLShelper targets lesion segmentation from CT images, as well as severity grading. The DLShelper assists medical staff in efficient diagnosis by providing rich auxiliary diagnostic information (including the severity grade, the proportion of the lesion and the visualization of the lesion area). A comprehensive experiment based on a public COVID-19 CT image dataset is also conducted, and the experimental results show that the DLShelper significantly improves the accuracy of segmentation for the lesion areas and also achieves a promising accuracy for the severity grading task.


Introduction
Since the start of 2020, the outbreak of the Coronavirus disease (COVID-19) has been a global public health emergency, and it has caused unprecedented economic and social disaster [1]. It features a number of symptoms including endothelial barrier disruption, dysfunctional alveolar-capillary oxygen transmission, reduced oxygen diffusion capacity, alveolar wall thickening, increased vascular permeability and pulmonary oedema [2]. As a major global public health emergency, COVID-19 has once again shown that human beings share a common destiny in the face of major epidemics. The main contributions of this work are summarized as follows:

• In order to improve the performance of segmentation on COVID-19 infection, a label refinement method is proposed to refine the existing labels from rough to precise. The refinement reassigns the incorrectly labeled pixels and enables the network to learn the infection degree of each infected pixel.
• Aiming to assist physicians in the efficient diagnosis of COVID-19, a deep learning-aided system (named DLShelper) using refined hierarchical labels is proposed. DLShelper provides rich auxiliary diagnostic information, including the proposed severity grade, the proportion of the infected area and a visualization of the infected area.
• We validate the accuracy of our method for COVID-19 lesion segmentation and grading on public COVID-19 CT datasets.
The present moment also offers an important opportunity to reform the public health governance system. This study takes the diagnosis of COVID-19 as an example to explore the enabling effect of AI in the management of public health emergencies.
The rest of the paper is organized as follows: Section 2 introduces the related work. Section 3 details the proposed method for COVID-19 CT image diagnosis. Section 4 presents the experiment and discussion. Finally, Section 5 concludes the study.

Related Work
In recent years, the intelligent analysis of medical images based on artificial intelligence has been extensively researched [19]. Santosh et al. [20] proposed a lung feature detection model based on multi-feature parameters, and it achieved an accuracy of up to 91%. Pratondo et al. [21] combined multiple machine learning models and a region-based contouring algorithm for the task of medical image segmentation. Ahmad et al. [22] used a content-based medical image retrieval algorithm for lung segmentation; however, its Jaccard similarity coefficient was only 0.870. Shepherd et al. [23] proposed a statistical model based on shape priors for segmentation, combined with online/offline learning models. Xu et al. [24] proposed a method for lung function assessment based on cough sounds. Shaukat et al. [25] developed a fully automated method to detect lung nodules using a hybrid feature set with an SVM and achieved a promising accuracy. Souza et al. [26] proposed a deep convolutional neural network (DCNN) method for fully automated lung segmentation. Park et al. [27] used a DCNN for lung CT image segmentation. Although DCNNs are capable of learning complex data, their performance depends heavily on the amount and quality of the data used in the training process.
The quality of these public CT image datasets is uneven because they are susceptible to the experience of the labeling physician. In addition, in contrast to typical semantic segmentation targets, the COVID-19 lesion is translucent and of low contrast with its surroundings. The labeling operation is conducted manually; hence, the labeling process unavoidably involves human errors. Some normal pixels are mislabeled as infected ones in situations where: (1) lung parenchyma pixels are entrapped between lesion pixels; or (2) the labeled area contains other tissues such as pulmonary vessels. These mislabeled pixels weaken the performance of model training.

Overview
As shown in Figure 1, the functions of the proposed method include: (1) lung segmentation; (2) lesion label refinement; (3) lesion segmentation; and (4) severity grading. The original CT images and lung parenchyma labels are used to train a two-category semantic segmentation network, which is used for segmenting the lung parenchyma image. With these segmented lung parenchyma images and lesion labels, the infected and normal areas can be identified. By further analyzing the pixel distribution in these two areas, the mislabeled pixels can be corrected, and the pixels can be hierarchized into different levels according to their values, so that the rough lesion labels are finally refined into accurate hierarchical labels. The "level" not only represents the value of a pixel, but also indicates the infected degree of the area in which the pixel is contained. The lung parenchyma images and refined hierarchical labels are then used to train a multi-category semantic segmentation network, which we use to segment the lesion areas. The different output lesion levels are converted into different colors to generate a hierarchical visual map that provides intuitive information for auxiliary diagnosis. We calculate the proportion of each of the three categories in the lesion area; the total proportion of the lesion within the whole lung parenchyma is provided as further auxiliary diagnostic information. Moreover, these four radiological features are used as input parameters for severity grading, which is based on a three-layer multilayer perceptron (MLP). In summary, three types of information are provided to physicians by the proposed system: (1) the hierarchical visual map; (2) the proportion of the lesion in the lung area; and (3) the severity grade.
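The four-stage flow described above can be sketched in a few lines of Python. This is an illustrative sketch only: the network and classifier objects are stand-ins for the trained models, not the actual DLShelper implementation, and stage (2), label refinement, happens at training time and is therefore omitted here.

```python
import numpy as np

def dlshelper_pipeline(ct_image, lung_net, lesion_net, mlp, g=3):
    """Illustrative sketch of the DLShelper inference pipeline."""
    # (1) Lung segmentation: two-category network yields a binary lung mask.
    lung_mask = lung_net(ct_image)            # 0/1 per pixel
    lung = ct_image * lung_mask               # lung parenchyma image

    # (3) Lesion segmentation: multi-category network assigns each pixel
    #     a grade 0..g (0 = background, 1..g = infection levels).
    grades = lesion_net(lung)

    # Radiological features: per-grade proportions within the lesion,
    # plus the lesion's proportion of the whole lung parenchyma.
    lesion_pixels = (grades > 0).sum()
    lung_pixels = lung_mask.sum()
    per_grade = [(grades == i).sum() / max(lesion_pixels, 1)
                 for i in range(1, g + 1)]
    lesion_ratio = lesion_pixels / max(lung_pixels, 1)

    # (4) Severity grading from the four features via the MLP classifier.
    severity = mlp(per_grade + [lesion_ratio])
    return grades, per_grade, lesion_ratio, severity
```

The three returned pieces of auxiliary information (grade map for visualization, lesion proportions and severity grade) correspond to the outputs listed in the text.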

Label Refinement
As discussed in Section 2, with the traditional method, the pixels marked as infected may in fact include normal pixels. Moreover, the traditional strategy for lesion labeling only includes two categories: infected (marked as 1) and normal (marked as 0), which ignores the information contained in the infected pixels; e.g., for each pixel in the infected area, the higher its value, the more serious the infection.
For two lesions with the same area (assuming that the areas of the lung parenchyma in which they are located are also equal and that pixel values fall in the range 0-255), the closer the grayscale distribution is to 255, the more serious the infection is in clinical diagnosis. As shown in Figure 2, we selected four CT images from different severity grades and calculated grayscale histograms of their lesions. The results reveal a positive correlation between the grayscale distribution of a lesion and its severity. However, there is no accurate metric to measure the grayscale distribution. Therefore, we hierarchize the infected area into different levels according to pixel value, and the grayscale distribution can then be described by the percentage of pixels at each level within the lesion.
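As an illustration, the grayscale distribution of a labeled lesion can be summarized with a simple NumPy histogram; this is a sketch, with array shapes and the 0-255 value range as assumptions.

```python
import numpy as np

def lesion_histogram(ct_image, lesion_label, bins=256):
    """Grayscale histogram of the labeled lesion region of a CT slice.

    ct_image: 2-D array of pixel values in [0, 255].
    lesion_label: binary mask of the same shape (1 = infected).
    """
    lesion_values = ct_image[lesion_label > 0]
    hist, _ = np.histogram(lesion_values, bins=bins, range=(0, 256))
    return hist

def lesion_mean(ct_image, lesion_label):
    """Mean lesion value: a crude summary of how far the distribution
    leans toward 255 (i.e., toward more severe infection)."""
    return float(ct_image[lesion_label > 0].mean())
```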

We denote a CT image as $I$ and its corresponding lesion label as $L$.
$L$ has the same size as $I$. We obtain the lung pixels from $I$ by applying lung segmentation and denote the result as $I_{Lung}$. Then, the lesion in $I_{Lung}$ is obtained by the mask operation on $I_{Lung}$ and $L$, and we denote it as $O_{Infected}$. The complement of $O_{Infected}$ in $I_{Lung}$ is $O_{No\text{-}infected}$ (the normal pixels in the lung). These processes are formulized as:

$$I_{Lung} = N_{Two\text{-}category}(I), \quad O_{Infected} = I_{Lung} \otimes L, \quad O_{No\text{-}infected} = \complement_{I_{Lung}}\, O_{Infected}$$

where $N_{Two\text{-}category}$ denotes the network for lung parenchyma segmentation, $\otimes$ denotes element-wise multiplication and $\complement_{B}\, a$ denotes the complement of $a$ in set $B$.

We denote the average value of $O_{No\text{-}infected}$ as $a$ and the maximum of $O_{Infected}$ as $b$, respectively. Given that the pixels of $O_{Infected}$ are divided into $g$ grades, pixels with a value less than $a$ or greater than or equal to $b$ are reassigned to the background. These processes are formulized as:

$$s = \frac{b - a}{g}, \quad R_0 = [0, a) \cup [b, 255], \quad R_i = \left(a + (i-1) \cdot s,\; a + i \cdot s\right], \; 0 < i \le g$$

where $s$ denotes the interval between grades, $R_0$ represents the range of pixel values of the background and $R_i$ represents the range of pixel values of Grade-$i$ ($0 < i \le g$). Finally, we assign the grade of each pixel in $I_{Lung}$; i.e., the value of a Grade-$i$ pixel is set to $i$ and the value of a background pixel is set to 0. Thus, a refined hierarchical label is generated. Of note, the reason that pixels with a value greater than or equal to $b$ are reassigned to the background is that the lung trachea and blood vessels may be contained in these pixels. A value of $g$ that is too small or too large will impact the accuracy of the label refinement; hence, we treat it as a hyper-parameter and set it to 3 (according to the experimental results). As shown in Figure 3c, mislabeled infected pixels are corrected to normal ones, and the infected pixels are hierarchized into different grades. The left part of Figure 3b is the histogram of Figure 3a, and we grade the pixels based on the pixel distribution of the infected and uninfected areas in the lung; Figure 3c shows the lesion labels before and after the refinement.
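The refinement rule can be sketched in NumPy under the definitions in the text ($a$ is the mean of the uninfected lung pixels, $b$ the maximum of the infected pixels, $g$ the number of grades); the exact handling of interval boundaries is an assumption.

```python
import numpy as np

def refine_label(lung_img, lesion_label, g=3):
    """Hierarchize a rough binary lesion label into g grades.

    lung_img: lung parenchyma image (pixel value 0 outside the lung).
    lesion_label: rough binary label (1 = infected, 0 = normal).
    Returns a grade map: 0 = background, 1..g = infection levels.
    """
    infected = lung_img[lesion_label > 0]
    normal = lung_img[(lesion_label == 0) & (lung_img > 0)]
    a = normal.mean()       # mean value of the uninfected lung pixels
    b = infected.max()      # maximum value of the infected pixels
    s = (b - a) / g         # interval between grades

    refined = np.zeros_like(lesion_label)
    for i in range(1, g + 1):
        lo, hi = a + (i - 1) * s, a + i * s
        lower = lung_img >= lo if i == 1 else lung_img > lo
        upper = lung_img < hi if i == g else lung_img <= hi
        refined[(lesion_label > 0) & lower & upper] = i
    # Pixels below a (mislabeled lung parenchyma) and at/above b (bright
    # structures such as the trachea and vessels) remain background (0).
    return refined
```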

Lung and Lesion Segmentation
Traditional segmentation models (especially UNet [28]) have achieved good performance on segmentation tasks for lung and COVID-19 lesions. UNet adopts symmetric encoding and decoding paths to aggregate semantic information and recover spatial information with the help of shortcut connections, and it is suitable for medical image segmentation. Thus, in this study, we adopt UNet for lung and lesion segmentation. In addition, we use a multiple-category training strategy (instead of the traditional two-category strategy) to learn the grades of pixels in the lesion.
With the completion of network training, a CT image will be input to the UNet to segment a lung image. Then, the obtained lung image is input to the multiple-category segmentation network to obtain the COVID-19 lesion. Based on these different categories of lesions, a colorful visualized map is generated hierarchically.
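The hierarchical visual map can be produced by mapping each predicted grade to a color; the palette below is illustrative, not the one used in the paper.

```python
import numpy as np

# Hypothetical palette: background black, then one color per grade
# (light to dark red as the infection level increases).
PALETTE = np.array([
    [0, 0, 0],        # 0: background
    [255, 220, 180],  # 1: Grade-1 infection
    [255, 140, 90],   # 2: Grade-2 infection
    [220, 30, 30],    # 3: Grade-3 infection
], dtype=np.uint8)

def hierarchical_visual_map(grades):
    """Convert a per-pixel grade map (values 0..3) into an RGB image."""
    return PALETTE[grades]
```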


Severity Grading
As reported in [26], the number, quadrant and area of lesions in CT images are important factors for determining the severity of a COVID-19 case. However, as the area of the lung parenchyma varies across a volume of continuous CT image slices, it is inappropriate to use a fixed value as the threshold for determining the severity grade. In this work, we calculate the proportion of all lesions in the lung parenchyma to address this issue. We also found that the higher a pixel's value, the whiter it appears in the lesion area, so the level of the "white" pixels in the lesion area and their density can be taken as indicators for determining the severity grade. With regard to these indicators, a multilayer perceptron (MLP) is used as the classifier for severity grading.
The multilayer perceptron is a feedforward artificial neural network trained with supervised back-propagation, and it is widely used for nonlinear classification. As shown in Figure 4, the MLP in the proposed method consists of an input layer, a hidden layer and an output layer. The ReLU function is used as the activation function of the hidden layer, and the softmax function is used as the activation function of the output layer for classification. The number of neurons in the hidden layer is determined by an empirical formula:

$$k = \sqrt{n + m} + a$$

where $k$ denotes the number of neurons in the hidden layer, $n$ denotes the number of neurons in the input layer, $m$ denotes the number of neurons in the output layer, and $a$ denotes a constant between 1 and 10.
The input features of the MLP network are the proportions of the infected pixels at each of the three levels among all infected pixels, together with the proportion of infected pixels among all lung parenchyma pixels.
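A minimal NumPy sketch of such a three-layer MLP forward pass, together with the empirical hidden-layer rule. The weights here are placeholders; in the proposed system the model is trained with back-propagation, and the choice $a = 4$ is a sample value, not the paper's setting.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max())      # subtract the max for numerical stability
    return e / e.sum()

def mlp_forward(features, W1, b1, W2, b2):
    """Three-layer MLP: input -> hidden (ReLU) -> output (softmax)."""
    hidden = relu(W1 @ features + b1)
    return softmax(W2 @ hidden + b2)

# Hidden-layer size from the empirical rule k = sqrt(n + m) + a, a in [1, 10];
# here n = 4 input features and m = 5 severity grades.
n, m, a = 4, 5, 4
k = int(np.sqrt(n + m)) + a      # number of hidden neurons
```

The four input features are the three per-level lesion proportions plus the overall lesion-to-lung proportion; the five outputs correspond to the severity grades.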

Experiments and Analysis
The dataset [29] used in this study contains about 3500 CT image slices and the corresponding lung and lesion segmentation labels. In addition, we recruited a radiology graduate student to label each CT image with a grade of severity (i.e., normal, mild, moderate, severe or critical). The labels were then verified by an experienced radiology specialist for reliability.


Implementation and Evaluation
A two-stage training strategy is adopted in this experiment: (1) training the segmentation networks for the lung and COVID-19 lesions; and (2) oversampling the training set and then training the MLP. We reproduced all the related networks and modules in the PyTorch framework. When training the segmentation networks, we set the batch size to 1, initialize the network weights with Kaiming initialization, set the network biases to zero and train the positive/negative samples alternately. In addition, the training set is shuffled in each iteration. We use different metrics in different stages. Intersection over union (IoU), sensitivity (SEN), specificity (SPE) and the Dice similarity coefficient (DSC) are used to evaluate the accuracy of the lung segmentation. Besides, the mean intersection over union (mIoU), mean pixel accuracy (mPA) and class pixel accuracy (CPA) are used to evaluate the accuracy of COVID-19 lesion segmentation using the original labels and the refined hierarchical labels. Precision is used to evaluate the accuracy of the severity grading. The above-mentioned metrics can be calculated as:

$$IoU = \frac{TP}{TP + FP + FN}, \quad SEN = \frac{TP}{TP + FN}, \quad SPE = \frac{TN}{TN + FP}, \quad DSC = \frac{2TP}{2TP + FP + FN}$$

where TP denotes true positives, TN denotes true negatives, FP denotes false positives and FN denotes false negatives. We optimize the lung and lesion segmentation networks using a binary cross-entropy loss $L_{lung}$ and a multi-category cross-entropy loss $L_{lesion}$, respectively, and use a mean-squared error loss $L_{mlp}$ to train the MLP. The softmax output for grade $m$ is computed as:
$$f(a)_m = \frac{e^{a_m}}{\sum_{n=0}^{g} e^{a_n}}$$

where $a$ is the vector of network outputs and $g$ is the number of grades in the refined hierarchical labels.
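The pixel-level metric definitions above translate directly into code, e.g.:

```python
def segmentation_metrics(tp, tn, fp, fn):
    """IoU, sensitivity, specificity and Dice from pixel confusion counts."""
    iou = tp / (tp + fp + fn)
    sen = tp / (tp + fn)               # sensitivity (recall)
    spe = tn / (tn + fp)               # specificity
    dsc = 2 * tp / (2 * tp + fp + fn)  # Dice similarity coefficient
    return iou, sen, spe, dsc
```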

Evaluation of Lung Segmentation
As shown in Table 1 and Figure 5, lung segmentation with UNet performs well, with the DSC reaching 96%. Besides, the IoU, SEN and SPE all surpass 90%. The accurate segmentation of the lung parenchyma ensures the quality of the subsequent COVID-19 lesion segmentation and severity grading.


Evaluation of COVID-19 Lesion Segmentation Using Refined Hierarchical Labels
To evaluate the performance of refined hierarchical labels for COVID-19 lesion segmentation, four state-of-the-art networks are selected and trained with original labels and refined hierarchical labels, respectively (as shown in Figure 5). With regard to these widely used metrics (e.g., IoU, DSC, SEN and SPE) for medical image segmentation, an evaluation is carried out. Table 2 shows the values of these four metrics of the model trained with original labels and refined hierarchical labels. Table 3 shows the values of these four metrics, the CPA of each level and the MIoU and MPA of the model trained with refined hierarchical labels.
With the original labels, DeepLabV3+ achieves the best DSC of 82.94% among all the networks, while UNet achieves the worst performance. However, we find that the area marked as a lesion by the original labels contains many normal pixels, such as lung parenchyma and pulmonary vessels. As illustrated in Figure 5, the #2 image is the most mislabeled. By introducing the refined hierarchical labels, the segmentation network can not only accurately identify the infected pixels, but also filter out these mislabeled pixels. Besides, as shown in Table 2, with the introduction of refined hierarchical labels, the models achieve better performance; for example, the DSC of UNet and Attention-UNet reaches 83.47% and 82.35%, respectively. As shown in Table 3, the pixel segmentation performance of UNet (2) is the best. Because the ground truths used are different, we cannot directly compare the performance of models trained on the original labels with that of models trained on the refined hierarchical labels. Experienced radiologists from a hospital in Zhejiang Province confirmed that the refined hierarchical labels bring more precise results.

Evaluation of COVID-19 Severity Grading
There are few samples of mild, severe and critical cases in the dataset. As shown in Table 4, the classification accuracy of these categories is very low, even as low as 0 (mild). To solve this category imbalance problem, we applied the Synthetic Minority Oversampling Technique (SMOTE) [33] to the minority classes. After oversampling, all the sample categories reached a balance; the classification accuracy for mild cases reached 100%, and the classification accuracy for severe and critical cases increased by 19.81% and 10.25%, respectively. Moreover, the overall accuracy was 98.82%.
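SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbors. A minimal sketch of that idea follows; the neighbor count k and the random seed are illustrative, and a production pipeline would use an established implementation instead.

```python
import numpy as np

def smote_sample(X_minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        # distances from X[i] to every minority sample
        d = np.linalg.norm(X - X[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]   # k nearest, excluding X[i] itself
        j = rng.choice(neighbors)
        lam = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append(X[i] + lam * (X[j] - X[i]))
    return np.array(synthetic)
```

Each synthetic point lies on the segment between two real minority samples, so the oversampled class stays inside its original feature region.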

Conclusions
In this study, we propose a method for refining lesion labels from rough to precise. Then, a deep learning-based aiding system for CT image diagnosis using the refined labels is developed. It performs lung and lesion segmentation from CT images, as well as severity grading. A multilayer perceptron is used as the classifier, and the proportion of the lesion to the lung and the proportion of each grade in the lesion are used as input features. Auxiliary diagnostic information, including the severity grade, the proportion of the infected area and a visualization of the infected area, is provided by the DLShelper for physicians in the clinic. A comparative experiment based on public datasets is carried out, and the experimental results show that the proposed method achieves better accuracy in comparison with several state-of-the-art networks. Besides, the proposed method achieves a high accuracy for severity grading. In the future, we will develop a new metric to describe the grayscale distribution features so as to further improve the performance.
In COVID-19 prevention and control, while developing AI and leveraging its positive role, we should be alert to the social risks and ethical challenges brought by AI itself, carry out responsive and principled scientific and technological governance, and strengthen ethical review and data legislation under the principles of "harmony and friendship, fairness, inclusiveness and sharing, respect for privacy, security and controllability, shared responsibility, open cooperation, and agile governance".