Accelerating Super-Resolution and Visual Task Analysis in Medical Images

Abstract: Medical images are acquired at different resolutions depending on clinical goals or available technology. In general, however, high-resolution images with fine structural details are preferred for visual task analysis. Recognizing this significance, several deep learning networks have been proposed to enhance medical images for reliable automated interpretation. These deep networks are often computationally complex and require a massive number of parameters, which restricts them to highly capable computing platforms with large memory banks. In this paper, we propose an efficient deep learning approach, called Hydra, which simultaneously reduces computational complexity and improves performance. The Hydra consists of a trunk and several computing heads. The trunk is a super-resolution model that learns the mapping from low-resolution to high-resolution images. It has a simple architecture and is trained using multiple scales at once to minimize a proposed learning-loss function. We also propose to append multiple task-specific heads to the trained Hydra trunk for simultaneous learning of multiple visual tasks in medical images. The Hydra is evaluated on publicly available chest X-ray image collections to perform image enhancement, lung segmentation, and abnormality classification. Our experimental results support our claims and demonstrate that the proposed approach can improve the performance of super-resolution and visual task analysis in medical images at a remarkably reduced computational cost.


Introduction
Super-resolution (SR) is a well-studied area of computer vision with applications ranging from surveillance imaging to medical imaging. It can be defined as the process of estimating a high-resolution (HR) image from one or multiple low-resolution (LR) images [1,2]. Based on the number of images used to reconstruct the HR image, SR is categorized into single-image SR (SISR) and multi-image SR (MISR) [1]. SISR recovers the HR image from a single LR image, while MISR recovers it from multiple LR images of the same object. MISR methods are less popular than SISR methods because multiple LR images of the same object are often difficult to obtain. In addition, MISR methods are less effective and have higher computational complexity because they require image registration and fusion prior to enhancement [1,2].
Medical imaging is widely used in clinical practice to support accurate and real-time diagnosis [3]. Multiple imaging modalities, such as ultrasound (US), X-ray (XR), computed tomography (CT), and magnetic resonance imaging (MRI), provide structural or functional information at different spatial and temporal resolutions. Although several medical imaging modalities can acquire images at high resolution, there are situations where this is not feasible, for example, because of limitations of the acquisition modality, lack of expertise, or the need to reduce radiation exposure [3,4].

Related Work and Contributions
In image processing, enhancement techniques focus on improving the visual quality of an image, while restoration techniques focus on restoring the image to its original quality. Super-resolution and contrast stretching are examples of image enhancement techniques, while image inpainting is an example of image restoration. Next, we review existing SISR methods and discuss the visual tasks that are typically applied to the output of SISR models.

Super-Resolution
Several SISR methods have been proposed to recover an HR image from a single LR image. These methods can be broadly divided into handcrafted methods and deep learning methods.
Early handcrafted SISR methods include interpolation-based, statistics-based, and example-based methods. Interpolation-based methods, such as bilinear [12], bi-cubic [13], and edge-directed interpolation [14], estimate the HR image by interpolating LR pixels to HR pixel positions. These methods tend to generate overly smooth or blurry images. Statistics-based methods (e.g., [15]) learn the statistical relationship of the gradient profile between HR and LR images, motivated by the idea that the shape statistics of gradient profiles are invariant to image resolution. These methods tend to produce watercolor-like artifacts when applied to visually complex images. Finally, example-based methods use conventional machine learning algorithms, such as Random Forests (RFs) [16][17][18] and Markov Random Fields (MRFs) [19], to learn the mapping from LR to HR images. These methods tend to generate reconstructed images that contain irrelevant (hallucinated) details.
Recently, convolutional neural networks (CNNs) and generative adversarial networks (GANs) have been used to learn the mapping from LR to HR images. For example, Dong et al. [20] proposed SR-CNN, the first CNN-based SR network, to learn this mapping efficiently. Kim et al. [21] proposed a CNN model, known as very deep super-resolution (VDSR), that predicts the mapping from LR to HR images using residual learning. Inspired by the VGG architecture [22], VDSR has 20 layers and is trained using extremely high learning rates. VDSR achieved state-of-the-art performance and outperformed SR-CNN [20]. It also resolved several limitations of SR-CNN [20], such as its reliance on contextual information from small image regions, slow convergence, and the need to train individual scale-dependent models.
Recently, a GAN-based network named P-SRGAN was proposed [23] to learn the mapping from LR to HR images. P-SRGAN consists of a generator network and a discriminator network. The generator takes an LR image as input and generates an HR image, while the discriminator distinguishes the generated image from the real HR image, which drives the generator toward good-quality reconstructions. Although this approach achieved excellent performance, GAN-based networks are hard to train because of the difficulty of reaching a Nash equilibrium. Training is a zero-sum game between the generator (player 1) and the discriminator (player 2), in which the opposing players compete to improve their own objective functions [24]. In addition, GAN-based networks are highly sensitive to hyperparameter selection and often suffer from mode collapse (i.e., the generator maps different inputs to the same output) [24]. To regularize the training of GAN-based SR models and enforce the right mapping between the input and output domains, You et al. [25] proposed the GAN-CIRCLE network for constructing HR CT images from their LR counterparts. GAN-CIRCLE combines four loss functions, viz., adversarial loss, cycle-consistency loss, identity loss, and joint sparsifying transform loss, to stabilize training and enforce the right input-output mapping. An evaluation of GAN-CIRCLE by expert radiologists on three CT datasets demonstrates its ability to construct HR images from noisy LR input images. Although GAN-CIRCLE addresses the training and mapping problems of regular GANs, it is computationally complex, requires a relatively large GPU, and needs a much longer training time than regular GAN networks. Further, the network failed to faithfully recover subtle structures in CT images, as discussed in [25]. We refer the reader to [26,27] for comprehensive reviews of other handcrafted and deep learning SISR methods.
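To make the blurring behaviour of interpolation-based methods concrete, the following NumPy sketch implements plain bilinear upscaling. It is an illustration only; the function name `bilinear_upscale` and the half-pixel coordinate convention are our assumptions, not part of any cited method. Every output pixel is a weighted average of its four nearest LR neighbours, so no output value can leave the range of those neighbours and no new high-frequency detail is created, which is why such methods produce overly smooth results.

```python
import numpy as np


def bilinear_upscale(img: np.ndarray, scale: int) -> np.ndarray:
    """Upscale a 2-D grayscale image by an integer factor (bilinear interpolation)."""
    h, w = img.shape
    # Map each output pixel centre back to a fractional input coordinate.
    ys = np.clip((np.arange(h * scale) + 0.5) / scale - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * scale) + 0.5) / scale - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, shape (H*scale, 1)
    wx = (xs - x0)[None, :]  # horizontal weights, shape (1, W*scale)
    # Interpolate horizontally on the two neighbouring rows, then vertically.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


lr = np.array([[0.0, 1.0], [1.0, 0.0]])
hr = bilinear_upscale(lr, 2)
print(hr.round(3))
```

The sharp 2 × 2 checkerboard becomes a smooth ramp of intermediate values, which is the over-smoothing described above.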
While existing deep learning models achieve excellent performance and successfully construct HR images, they are trained at a single scale (e.g., [20,25]), have deep architectures with a very large number of training parameters (e.g., [21,23,25]), or are hard to train (e.g., [23]). To resolve these issues, we propose a simple SISR model that learns the mapping from LR to HR images. Our customized SISR model has the following virtues:
• Simplicity and Stability: Our customized SISR model, inspired by VDSR (20 conv. layers) [21], has a shallower structure (7 conv. layers) with fewer training parameters. It is easy to train and consists of a single network, unlike GAN-based networks, which are difficult to train and require both generator and discriminator networks. Further, our model is more stable and less sensitive to hyperparameter selection than most GAN-based models. Because large models with a massive number of parameters are restricted to computing platforms with large memory banks and high computing capability, developing smaller, stable networks without losing representational accuracy is important for reducing the number of parameters and the storage size of the networks. This would boost the usage of these networks in limited-resource settings and embedded healthcare systems.
• Multiple Scales Training: Our SISR model is trained with different scale factors at once, and the trained network can then be tested with any scale used during training. As discussed in [21], training a single model with multiple scale factors is more efficient, accurate, and practical than training and storing several scale-dependent models.
• Context: We utilize information from the entire image region. Existing methods rely on the context of either small image regions (e.g., [20]) or large image regions (e.g., [21]), but not the entire image. Our experimental results demonstrate that using the entire image region leads to better overall performance while decreasing computation.
• Raw Image Channels: We propose to compute the residual image directly from the raw image (RGB or grayscale) instead of converting the images to a different color space (e.g., YCbCr [21]). The residual image is computed by subtracting the LR image, upscaled via interpolation to match the size of the reference image, from the HR reference image. The computed residual image contains the image's high-frequency details. The main benefit of working directly in the raw color space is that it decreases the total computational time by dropping two operations: (1) converting from the raw color space to another color space (e.g., YCbCr) and (2) converting the image back to its original color space. Our customized SISR model computes the residual images directly in the original color space and learns to estimate them. To construct an HR image, the estimated residual image is added to the upsampled LR image.
• Combined Learning Loss: We propose to train the SISR model using a loss function that combines the advantages of the mean absolute error (MAE) and the multi-scale structural similarity (MS-SSIM). Our experimental results show that MAE better assesses the average model performance than other loss metrics, and that MS-SSIM preserves contrast in high-frequency regions better than other loss functions (e.g., SSIM). To capture the best characteristics of both, we combine the two loss terms (MAE + MS-SSIM).
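The residual formulation described in the Raw Image Channels item can be sketched as follows. This is a minimal NumPy illustration, assuming nearest-neighbour resizing as a stand-in for the bi-cubic interpolation described in the text; the helper names (`upsample_nearest`, `residual_target`, `reconstruct`) are hypothetical. Because the indexing only touches the two spatial axes, the same code handles grayscale (H, W) and raw RGB (H, W, 3) arrays without any color-space conversion.

```python
import numpy as np


def upsample_nearest(img, out_hw):
    """Resize to out_hw = (H, W) by nearest-neighbour index mapping.

    Works on (H, W) or (H, W, C) arrays; stand-in for bi-cubic interpolation.
    """
    h, w = img.shape[:2]
    rows = (np.arange(out_hw[0]) * h) // out_hw[0]
    cols = (np.arange(out_hw[1]) * w) // out_hw[1]
    return img[rows[:, None], cols[None, :]]


def residual_target(lr, hr):
    """Training target: HR reference minus the upscaled LR image."""
    return hr - upsample_nearest(lr, hr.shape[:2])


def reconstruct(lr, predicted_residual):
    """Inference: add the (predicted) residual back to the upscaled LR image."""
    return upsample_nearest(lr, predicted_residual.shape[:2]) + predicted_residual


rng = np.random.default_rng(0)
hr_rgb = rng.random((8, 8, 3))          # raw RGB channels, no YCbCr round trip
lr_rgb = hr_rgb[::2, ::2]               # toy 2x degradation
res = residual_target(lr_rgb, hr_rgb)   # what the network would learn to predict
```

With a perfect residual estimate, `reconstruct(lr_rgb, res)` returns the HR image exactly; in practice the network's prediction replaces `res`.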
In summary, our proposed SISR model has a simple architecture, is stable, and is trained using the entire raw image and multiple scales at once to minimize a proposed loss function. It reduces the number of training parameters and the storage size without performance degradation. These advantages would boost the usage of SISR models in settings with limited computational resources and facilitate their deployment in clinical settings for potential real-time healthcare applications.

Visual Task Analysis
Existing methods for medical image analysis apply SISR models to construct HR images and then use the constructed HR images as input to individual models for individual tasks. For example, various methods applied deep learning-based SISR models to LR images and used the constructed HR images as input to separate segmentation (e.g., U-Net [23,28,29]) and classification (e.g., VGG [23,28] and DenseNet [30]) models. This traditional approach is not efficient because it involves unnecessary repetition: multiple task-specific models are learned end-to-end in isolation.
Contrary to previous works that separate image enhancement from other visual tasks, we propose to use our customized SISR model as a shared representation to simultaneously learn multiple subsequent visual tasks. Specifically, the weights of our SISR model, which learns the mapping from LR to HR, are directly used to simultaneously learn tasks such as image segmentation and classification.
Using the proposed SISR model as a shared backbone improves generalization and avoids unnecessarily learning visual task models in isolation. This can decrease resource utilization and training time, and boost the use of deep learning models in limited-resource settings.

Contributions
In this paper, we propose the Hydra, a deep learning approach that consists of two components: a shared trunk and computing heads. The trunk is a customized SISR model that learns the mapping from LR to HR. The trained trunk is then appended with task-specific layers to learn multiple visual tasks in medical images. Figure 1 depicts the main difference between the Hydra and existing works. As the figure shows, the Hydra trunk is used as a shared backbone to learn multiple visual tasks (heads). In contrast, most existing methods (the traditional approach to task analysis) use the constructed HR image as input to multiple individual models. The main contributions of this paper can be summarized as follows:
• We propose the Hydra approach for enhancing medical image resolution and visual task analysis. The Hydra consists of two components: a shared trunk and computing heads.
• The Hydra trunk is a customized SISR model that learns the mapping from LR to HR. This SISR model has a simple architecture and is trained using the entire raw image and multiple scales at once to minimize a proposed loss function. Our experimental results show that the proposed SISR model, which requires markedly less training time and fewer parameters, achieves state-of-the-art performance.
• We propose to append multiple computing heads to the customized SISR trunk to learn different visual tasks in medical images. We evaluate our approach on CXR datasets by generating HR representations and jointly performing lung segmentation (visual task 1) and abnormality classification (visual task 2). We focus on these two tasks because classification and segmentation are the key tasks in most medical image analysis applications.
• We empirically demonstrate the superiority and efficiency of our approach, in terms of performance and computation, for SR and medical image analysis compared to the traditional approach.
Next, we present the CXR datasets used to evaluate the proposed Hydra and describe the Hydra trunk and computing heads in detail.

Figure 1. Illustration of the traditional approach and the proposed Hydra for image enhancement and visual task analysis. The SISR model (Hydra trunk) is used to learn the mapping from LR to HR. In the traditional approach, the constructed HR image is used as input to different models, e.g., segmentation and classification models. In the proposed Hydra, the learned SISR model is used as a shared trunk and appended with task-specific heads. The detailed architecture of the Hydra trunk is shown in Figure 2.

Datasets for Training and Testing
We used four publicly available CXR datasets: the Radiological Society of North America (RSNA) [31], Shenzhen [32], Montgomery [32], and Japanese Society of Radiological Technology (JSRT) [33] CXR collections. The RSNA CXR dataset is used to build the customized SISR model (trunk) with 70%, 20%, and 10% patient-level splits for training, validation, and testing, respectively. The Shenzhen, Montgomery, and JSRT datasets are used alternately for training and testing the task-specific heads.
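A patient-level split assigns all images of a given patient to exactly one of the training, validation, and testing sets, so that no patient appears in more than one split. The following is a minimal sketch, assuming a per-image array of patient identifiers; the function name and the toy data are hypothetical and not taken from the Hydra code.

```python
import numpy as np


def patient_level_split(patient_ids, fractions=(0.7, 0.2, 0.1), seed=0):
    """Return (train, val, test) image indices with no patient shared across splits."""
    ids = np.asarray(patient_ids)
    patients = np.unique(ids)
    np.random.default_rng(seed).shuffle(patients)   # shuffle patients, not images
    n_train = int(round(fractions[0] * len(patients)))
    n_val = int(round(fractions[1] * len(patients)))
    groups = (patients[:n_train],
              patients[n_train:n_train + n_val],
              patients[n_train + n_val:])             # remainder goes to test
    return tuple(np.flatnonzero(np.isin(ids, g)) for g in groups)


# Toy example: 100 hypothetical patients with two images each.
ids = np.repeat(np.arange(100), 2)
train_idx, val_idx, test_idx = patient_level_split(ids)
```

Shuffling unique patients rather than individual images is the design choice that prevents images of the same patient from leaking between training and testing.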

RSNA CXR
The RSNA dataset contains 30,000 CXR exams, of which 15,000 had positive findings for pneumonia-related pulmonary opacities. The remaining 15,000 negative exams come from two groups: 7500 exams had no findings, while the rest had pathologies unrelated to pneumonia. The dataset is provided as DICOM files and contains a total of 26,684 frontal images with 1024 × 1024 spatial resolution. For CXRs containing pneumonia-related opacities, ground truth annotations are provided as bounding boxes, created with a commercial annotation system by six board-certified radiologists. Further details of the RSNA CXR dataset can be found in [31].

Shenzhen CXR
The Shenzhen dataset was collected in a collaboration between Shenzhen People's Hospital and Guangdong Medical College, Shenzhen, China. The CXRs were collected from outpatient clinics over a one-month period, mostly in September 2012, using a Philips DR Digital Diagnostic system. This dataset contains 662 frontal CXRs, of which 326 are normal cases and 336 are cases with manifestations of tuberculosis (TB). All the images are provided in Portable Network Graphics (PNG) format. The image size varies but is approximately 3000 × 3000 pixels. Each image is provided with a clinical reading text file that contains the patient's age and gender as well as normal and abnormal labels. In addition, each image has a corresponding binary ground truth mask, generated by manually segmenting the lung regions. Further details of the Shenzhen CXR dataset can be found in [32].

Montgomery CXR
The Montgomery dataset was collected through a collaboration between Montgomery County (Maryland, USA) and the Department of Health and Human Services. The dataset contains 138 frontal chest X-rays from Montgomery County's tuberculosis screening program, acquired using a Eureka stationary CXR machine. Eighty cases are normal, and the rest are cases with TB manifestations. All the images are provided in PNG format with either 4020 × 4892 or 4892 × 4020 spatial resolution. Each image is provided with a clinical reading text file that contains the patient's age and gender as well as normal and abnormal labels. In addition, each image has a corresponding binary ground truth mask, generated by manually segmenting the lung regions. The manual lung segmentation was performed under the supervision of a radiologist, following anatomical landmarks such as the boundary of the heart, the pericardium, and the aortic arch. Further details of the Montgomery CXR dataset can be found in [32].

JSRT CXR
The JSRT dataset was collected by the Japanese Society of Radiological Technology (JSRT) in cooperation with the Japanese Radiological Society (JRS). This dataset contains 247 CXR images, 154 with lung nodules and the rest without lung nodules. All the CXR images have a size of 2048 × 2048 pixels, a spatial resolution of 0.175 mm/pixel, and a 12-bit grayscale depth. Each image is provided with a text file that contains additional information such as the patient's age and gender, the diagnosis (malignant or benign), and the coordinates of the nodule. We used the binary masks available at [34] as the ground truth masks for segmentation. Further details of the JSRT CXR dataset can be found in [33].

Proposed Approach: Hydra
Next, we describe both components of the Hydra: the SISR trunk and the task-specific heads.

Hydra SISR Trunk
The structure of our SISR model is outlined in Figure 2. As shown in the figure, the model is shallower than VDSR [21]. Contrary to previous methods [20,21], the first layer of the network operates on the entire input image (RGB/grayscale). The input image is interpolated via bi-cubic interpolation and fed into a batch normalization layer, which speeds up model training and improves generalization. Our proposed SISR model has seven convolutional layers with a 3 × 3 filter size. The number of filters starts at 8 in the first layer and doubles in subsequent layers. We used zero padding to (1) keep the sizes of all feature maps the same and (2) avoid boundary artifacts caused by the convolution operation. We also used ReLU activations with all convolutional layers to speed up model training, resulting in faster convergence. The 4th, 5th, and 6th convolutional layers of the SISR model use dilated convolutional kernels (dilation rate = 2) that capture a wider context, by enclosing a bigger receptive field, at a reduced computational cost. To construct an HR image, the SISR model takes an interpolated LR image as input and predicts the residual image, which contains the high-frequency details. As discussed in [21], predicting the residual image instead of the non-residual image leads to faster convergence and superior performance. Finally, the predicted residual image is added back to the interpolated LR input to obtain the final HR image. We trained the model using pairs of (LR, HR) CXR images. Specifically, following existing methods in the literature [20,21,23], we used bi-cubic interpolation to down-sample the HR images for training. We trained the SISR model using the Adam optimizer with a learning rate of 1 × 10⁻³, 256 epochs, a batch size of 64, and random weight initialization. The model is trained on multiple scales (i.e., 2, 3, 4, and 8) at the same time.
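For stride-1 convolutions without pooling, each k × k layer with dilation rate d enlarges the receptive field by (k − 1)·d pixels per side. The sketch below applies this rule, assuming that only the 4th, 5th, and 6th of the seven 3 × 3 layers are dilated (rate 2) and that the 7th layer is undilated; the exact layout should be checked against Figure 2. Under this assumption, dilation grows the receptive field from 15 to 21 pixels without adding weights, because a dilated 3 × 3 kernel still has nine weights. For comparison, VDSR's 20 undilated 3 × 3 layers give 41, matching the 41 × 41 receptive field reported for VDSR.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (pixels per side) of stacked stride-1 convolutions.

    Each k x k layer with dilation d widens the field by (k - 1) * d.
    """
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))


# Assumed layout: seven 3x3 layers, dilation 2 on layers 4-6 only.
hydra_like = receptive_field([3] * 7, [1, 1, 1, 2, 2, 2, 1])
undilated = receptive_field([3] * 7, [1] * 7)
vdsr = receptive_field([3] * 20, [1] * 20)
print(hydra_like, undilated, vdsr)  # 21 15 41
```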
Training is carried out to find the optimal parameters by minimizing the learning-loss function, which consists of two terms: MS-SSIM and MAE. The combined learning loss achieved the best results because it captures the best characteristics of both loss functions. Depending on the number of GPUs, the training time of our SISR model ranges from 21,917.64 to 54,311.21 s (roughly 6 to 15 hours).
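The combined objective can be sketched in NumPy as below. This is a simplified illustration, not the Hydra implementation: MS-SSIM is computed from global image statistics at five dyadic scales instead of with the usual 11 × 11 Gaussian windows, the per-scale weights are the standard MS-SSIM weights, and the two terms are weighted equally because the text does not state a weighting. In a Keras/TensorFlow pipeline such as the one used here, `tf.image.ssim_multiscale` would normally supply the MS-SSIM term.

```python
import numpy as np

# Standard MS-SSIM weights for five scales (finest to coarsest).
MS_WEIGHTS = np.array([0.0448, 0.2856, 0.3001, 0.2363, 0.1333])


def _downsample2(x):
    """2 x 2 average pooling; drops an odd trailing row/column."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def _global_terms(x, y, max_val):
    """Luminance and contrast-structure terms from whole-image statistics."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    lum = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    cs = (2 * cov + c2) / (x.var() + y.var() + c2)
    return lum, max(cs, 0.0)  # clip so fractional powers stay real


def ms_ssim_global(x, y, max_val=1.0):
    """Simplified MS-SSIM using global statistics at five scales."""
    score = 1.0
    for i, w in enumerate(MS_WEIGHTS):
        lum, cs = _global_terms(x, y, max_val)
        if i == len(MS_WEIGHTS) - 1:
            score *= (lum * cs) ** w  # luminance enters only at the coarsest scale
        else:
            score *= cs ** w
            x, y = _downsample2(x), _downsample2(y)
    return score


def combined_loss(y_true, y_pred, max_val=1.0):
    """MAE + (1 - MS-SSIM) with equal weights (weighting is an assumption)."""
    mae = np.abs(y_true - y_pred).mean()
    return mae + (1.0 - ms_ssim_global(y_true, y_pred, max_val))


rng = np.random.default_rng(0)
target = rng.random((64, 64))
noisy = np.clip(target + 0.1 * rng.standard_normal((64, 64)), 0.0, 1.0)
print(combined_loss(target, target), combined_loss(target, noisy))
```

The MAE term penalizes average pixel error, while the MS-SSIM term penalizes loss of contrast and structure across scales; both vanish only when the reconstruction matches the reference.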

Hydra Task-Specific Heads
The traditional approach in medical image analysis uses separate deep learning models to perform different visual tasks. As shown in Figure 1, a deep learning model is first used to construct HR images from LR images, and the constructed HR images are then used as inputs to different individual models for different visual tasks. This traditional approach is not efficient because it involves unnecessary repetition, has high resource utilization, and requires a relatively large amount of annotated data.
In this paper, the trained SISR model (trunk) is used as a shared backbone to jointly solve any number of target tasks that use the CXR imaging modality, i.e., target tasks that operate on the same imaging modality or have similar input distributions. To achieve this, we truncated the SISR model at the deepest convolutional layer (the 6th convolutional layer) and appended task-specific heads. Specifically, the optimized SISR weights transfer modality-specific knowledge and serve as a promising initialization that can be repurposed for the target tasks, while the appended task-specific layers learn task-specific features. We believe this approach is suitable for medical images because visual medical tasks often operate on the same input modality and have different outputs.

Experiments and Results
In this section, we evaluate the performance of the Hydra using four publicly available CXR datasets (see Section 3.1). We compare the performance of the Hydra trunk (SISR model) with VDSR [21]. We also compare the performance of the Hydra heads with the traditional approach (individual models for different visual tasks) and with state-of-the-art methods. The code of this work and all the training parameters are available on the Hydra GitHub page.

Hydra Trunk Performance
We evaluated the performance of our customized SISR model using the combined loss (MAE + MS-SSIM), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and MS-SSIM evaluation metrics. Various studies [35,36] found that SSIM provides a better indication of SISR performance than PSNR because it better represents human visual perception. It has also been reported [35,36] that MS-SSIM is better than SSIM because it is calculated over multiple scales through successive stages of sub-sampling. We used the RSNA CXR dataset with 70%, 20%, and 10% patient-level splits for training, validation, and testing, respectively. Following existing methods in the literature [20,21,23], we downscaled all the images (256 × 256 × 1) by scale factors of 2, 3, 4, and 8 using bi-cubic interpolation. We implemented our SISR model using the Keras [37] Python library with the TensorFlow backend. Table 1 shows the results of our customized SISR model and VDSR on the testing set. Because both models are trained on multiple scales (2, 3, 4, and 8) at once, they can be tested with any scale used during training. As can be seen in the table, our SISR model outperformed VDSR in most cases. Recall that VDSR, which contains 20 convolutional layers, is a large model with a relatively large number of parameters compared to our model (7 layers). These results suggest that our customized SISR model can construct HR images faithfully and efficiently. Figure 3 shows examples of SR results with scale factors of 2, 3, 4, and 8 using the Hydra trunk.
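The multi-scale degradation protocol and the PSNR metric can be sketched as follows. This is illustrative only: nearest-neighbour resizing stands in for the bi-cubic interpolation used in the experiments, pixel values are assumed to lie in [0, 1], and the helper names are hypothetical. Each HR image yields one (upscaled LR, HR) pair per scale factor, so a single model sees all four scales during training.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize by index mapping (stand-in for bi-cubic)."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return img[rows[:, None], cols[None, :]]


def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((reference - estimate) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)


def make_multiscale_pairs(hr, scales=(2, 3, 4, 8)):
    """Map each scale factor to an (upscaled LR input, HR target) pair."""
    h, w = hr.shape[:2]
    pairs = {}
    for s in scales:
        lr = resize_nearest(hr, h // s, w // s)       # degrade by factor s
        pairs[s] = (resize_nearest(lr, h, w), hr)     # back onto the HR grid
    return pairs


rng = np.random.default_rng(0)
hr = rng.random((256, 256))
pairs = make_multiscale_pairs(hr)
for s, (lr_up, target) in pairs.items():
    print(s, lr_up.shape, round(psnr(target, lr_up), 2))
```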

Hydra Heads Performance
The trained SISR model (Hydra trunk) can be used as a shared backbone to solve any number of target tasks that operate on the CXR modality. To solve a new visual task, we only need to append task-specific layers to the shared SISR model and train these layers. In this paper, we experimented with segmentation and classification because they represent the key tasks in most medical image analysis applications [38]. However, the Hydra is flexible and can easily be extended to include any number of tasks (e.g., detection). Next, we describe the task-specific heads for CXR abnormality classification and lung segmentation. In all experiments, the Shenzhen, Montgomery, and JSRT datasets are used alternately for training and testing the task-specific heads.

Abnormality Classification
We define the abnormality classification task as categorizing CXR images as normal or abnormal. The abnormal class contains CXR images with different types of pulmonary opacities caused by TB or lung nodules. Deep learning approaches have been widely used for lung abnormality classification.
For example, Li et al. [30] presented patch-based multi-resolution CNNs for CXR lung nodule classification. Prior to deep feature extraction, the CXR images are enhanced using a histogram-based method. Three multi-resolution patches are then extracted from each lung field and used to train three CNNs at different resolutions, and the final classification label is generated by fusing the outputs of the three CNNs. The method achieved a Free-Response Receiver Operating Characteristic (FROC) score of 0.98 when evaluated on the JSRT dataset. For CT images, Han et al. [39] proposed a GAN-based network, called 3D MCGAN, to generate synthetic and diverse nodules in lung CT scans at a desired location. The generated synthetic images are then used as additional training data for a 3D Faster R-CNN for lung nodule detection. The experimental results suggest that this method achieves higher sensitivity and helps overcome the scarcity of medical data. In contrast to these works, we propose to simultaneously learn the pre-processing (enhancement) operation and CXR abnormality classification using a single trunk and a task-specific head.
To create a task-specific head for abnormality classification, we loaded the trained SISR weights, froze them, and appended task-specific layers to the SISR model truncated at its deepest (6th) convolutional layer. The task-specific layers are global average pooling (GAP), fully connected (FC), dropout (D), and Softmax (SM) layers. We then trained the task-specific layers on the Shenzhen dataset to minimize the categorical cross-entropy (CCE) loss, using the Adam optimizer, a learning rate of 1 × 10⁻³, 32 epochs, and a batch size of 8.
As a baseline, we used a model that has six convolutional layers followed by GAP, FC, D, and SM layers. The baseline classification model has the following architecture: input (256 × 256) → Conv2D (16) → … We randomly initialized the model's weights, tuned its hyperparameters, and trained it to minimize the CCE loss. We performed two baseline experiments using the baseline classification model. In the first experiment, we used the original HR images as input to the classification model. In the second experiment, we applied bi-cubic interpolation to generate interpolated images and used them as input to the baseline classification model. Recall that the main difference between the proposed approach and the traditional approach is that our approach directly uses the SISR model and only learns the task-specific classification layers. Table 2 presents the performance of abnormality classification using the baseline approach and the proposed approach (Hydra classification head). We trained the head and the baseline model on the Shenzhen dataset and evaluated their performance on the Montgomery dataset. Performance is reported using accuracy, AUC, sensitivity, precision, and F-score. As shown in the table, the Hydra outperformed the baseline models at all scales. These results suggest that freezing the weights of the trunk while learning the weights of the task-specific head can lead to better overall performance than the traditional approach. Specifically, using the weights of the SISR model improves generalization and decreases the number of training parameters of the classification head (see Section 4.3), which leads to enhanced overall performance.
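The classification head can be illustrated with a minimal NumPy sketch of GAP → FC → softmax trained with CCE on top of frozen trunk features. Several assumptions apply: the frozen trunk is represented only by its precomputed feature maps (random toy data here), dropout is omitted, and plain full-batch gradient descent replaces Adam. Because the trunk output is treated as a constant, only the head's weight matrix and bias are updated, which is what freezing the trunk means in practice.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_head(feature_maps, labels, n_classes=2, lr=0.5, epochs=200, seed=0):
    """Train a GAP -> FC -> softmax head with categorical cross-entropy.

    feature_maps: (N, H, W, C) outputs of a frozen trunk; they are treated as
    constants, so only the head's W and b are updated. Dropout is omitted.
    """
    pooled = feature_maps.mean(axis=(1, 2))             # global average pooling
    onehot = np.eye(n_classes)[labels]
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(pooled.shape[1], n_classes))
    b = np.zeros(n_classes)
    losses = []
    for _ in range(epochs):
        probs = softmax(pooled @ W + b)                  # FC + softmax
        losses.append(-np.mean(np.sum(onehot * np.log(probs + 1e-12), axis=1)))
        grad = (probs - onehot) / len(labels)            # d(CCE)/d(logits)
        W -= lr * pooled.T @ grad                        # full-batch gradient step
        b -= lr * grad.sum(axis=0)
    return W, b, losses


# Toy stand-in for frozen trunk features: channel 0 separates the two classes.
rng = np.random.default_rng(1)
labels = np.arange(40) % 2
maps = rng.normal(size=(40, 4, 4, 3))
maps[..., 0] += np.where(labels == 1, 1.0, -1.0)[:, None, None]
W, b, losses = train_head(maps, labels)
preds = (maps.mean(axis=(1, 2)) @ W + b).argmax(axis=1)
```

Because the head only sees C pooled values per image, its parameter count is C × n_classes + n_classes, independent of the image size.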
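The reported classification metrics follow their standard definitions. The sketch below computes them for binary labels (1 = abnormal), assuming a 0.5 decision threshold on the abnormal-class probability; the function name is hypothetical. AUC is computed via its rank interpretation, i.e., the probability that a randomly chosen abnormal image receives a higher score than a randomly chosen normal one, with ties counted as one half.

```python
import numpy as np


def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, precision, F-score and AUC for binary labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = y_score >= threshold
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    accuracy = np.mean(y_pred == y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall of abnormal class
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_score = (2 * precision * sensitivity / (precision + sensitivity)
               if precision + sensitivity else 0.0)
    # AUC: fraction of (positive, negative) pairs ranked correctly.
    pos, neg = y_score[y_true], y_score[~y_true]
    diff = pos[:, None] - neg[None, :]
    auc = np.mean((diff > 0) + 0.5 * (diff == 0))
    return dict(accuracy=accuracy, sensitivity=sensitivity,
                precision=precision, f_score=f_score, auc=auc)


metrics = binary_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(metrics)
```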

Lung Segmentation
To create a task-specific head for lung segmentation, the trained SISR model was truncated at the deepest convolutional layer, and a symmetrical decoder was constructed and appended. Specifically, the SISR model (Hydra trunk) was truncated at the sixth layer in Figure 2 and appended with the following layers: Conv2D (128) → Conv2D (64) → Conv2D (32) → Conv2D (16) → Conv2D (8) → Conv2D (1). The segmentation head was trained to minimize the binary cross-entropy (BCE) loss, starting from the weights of the SISR model. We trained the segmentation layers using the Adam optimizer, a learning rate of 1 × 10⁻³, and 200 epochs.
As a baseline, we randomly initialized an individual model with the same encoder-decoder structure as the segmentation head and tuned its hyperparameters. The baseline segmentation model has the following architecture: input (256 × 256) → Conv2D (8) → … → Conv2D (1). We trained the baseline segmentation model to minimize the BCE loss. We performed two baseline experiments using the baseline segmentation model. In the first experiment, we used HR images as input to the baseline segmentation model. In the second experiment, we applied bi-cubic interpolation and used the interpolated images as input to the baseline segmentation model. Table 3 presents the performance of lung segmentation using the proposed approach and the baseline model. Both models were trained on the Shenzhen dataset and evaluated on the Montgomery dataset. Performance is reported using pixel accuracy, loss, and intersection over union (IoU). As shown in the table, the segmentation head, which uses the trunk as a backbone, outperformed the baseline models at most scales. Figure 4 presents test examples of the segmentations generated by the segmentation head and the baseline approach, together with the ground truth masks. The figure shows a clear visual improvement in the segmentation obtained by our proposed approach compared to the traditional approach. These results suggest that initializing the segmentation head with the weights of the trunk enhances the overall performance. Recall that the Hydra head uses the trunk weights, while the baseline model is initialized randomly. In addition to the Shenzhen and Montgomery datasets, we alternated among the Shenzhen, Montgomery, and JSRT datasets for training and testing to further evaluate the performance of the segmentation head. The testing sets of the Shenzhen and JSRT datasets contain 100 and 140 images, respectively. Table 4 presents extensive evaluations of the Hydra segmentation head.
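The segmentation head is trained with BCE and evaluated with pixel accuracy and IoU. A minimal NumPy sketch of these three quantities for one binary lung mask follows, assuming per-pixel probabilities from a sigmoid output and a 0.5 threshold (both assumptions, as is the function name).

```python
import numpy as np


def segmentation_metrics(mask_true, prob_pred, threshold=0.5, eps=1e-7):
    """Pixel accuracy, IoU and binary cross-entropy for a binary lung mask."""
    mask_true = np.asarray(mask_true, dtype=bool)
    prob = np.clip(np.asarray(prob_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    mask_pred = prob >= threshold
    pixel_acc = np.mean(mask_pred == mask_true)
    inter = np.logical_and(mask_pred, mask_true).sum()
    union = np.logical_or(mask_pred, mask_true).sum()
    iou = inter / union if union else 1.0  # both masks empty: perfect agreement
    y = mask_true.astype(float)
    bce = -np.mean(y * np.log(prob) + (1 - y) * np.log(1 - prob))
    return pixel_acc, iou, bce


gt = np.zeros((4, 4))
gt[:2, :2] = 1
exact = segmentation_metrics(gt, 0.9 * gt + 0.05)   # confident, correct prediction
shifted = np.zeros((4, 4))
shifted[:2, 1:3] = 1
off = segmentation_metrics(gt, shifted)             # mask shifted by one column
```

The shifted example shows why IoU is stricter than pixel accuracy: accuracy stays at 0.75 because most background pixels remain correct, while IoU drops to 1/3.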
These results indicate strong performance of the proposed approach on well-established datasets. Table 5 compares the performance and computation time for lung segmentation of the proposed Hydra, the anatomical atlas method [40], and an efficient U-Net [41]. As the table shows, the proposed Hydra outperforms these similar approaches in the literature while requiring the lowest computation time per image.
Table 6 presents quantitative measurements, in terms of parameters and time, for the Hydra, the state-of-the-art VDSR, and the traditional baseline approach for visual task analysis. We trained all models on a Windows system with the following configuration: (1) an Intel Xeon E3-1275 v6 CPU at 3.80 GHz and (2) an NVIDIA GeForce GTX 1050 Ti GPU. The Keras deep learning framework with a TensorFlow backend was used for model training and evaluation.
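Tables 5 and 6 report computation time per image. This section does not describe how that time was measured, so the helper below shows only one common protocol and should be read as an assumption. It runs a warm-up prediction, because the first call to a Keras/TensorFlow model may include one-time setup costs. It then averages the wall-clock time of single-image predictions. The name `time_per_image` is hypothetical, and `predict` stands for any single-image callable, such as a wrapper around a trained model.

```python
import time
from statistics import mean
from typing import Any, Callable, Iterable


def time_per_image(predict: Callable[[Any], Any], images: Iterable[Any], warmup: int = 1) -> float:
    """Average wall-clock seconds per single-image prediction (hypothetical helper)."""
    images = list(images)
    if not images:
        raise ValueError("at least one image is required")
    # Warm-up calls absorb one-time costs and are excluded from the average.
    for img in images[:warmup]:
        predict(img)
    durations = []
    for img in images:
        start = time.perf_counter()
        predict(img)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def dummy_predict(img):
    # Stand-in for a trained model's single-image prediction.
    return sum(sum(row) for row in img)


if __name__ == "__main__":
    fake_images = [[[0.0] * 256 for _ in range(256)] for _ in range(3)]
    print(f"{time_per_image(dummy_predict, fake_images) * 1e3:.3f} ms per image")
```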

Training Parameters and Time
As shown in Table 6, the number of training parameters and the training time of the state-of-the-art VDSR are higher than those of the proposed Hydra trunk, and the same holds for the computation time per image. In addition to the SISR models, Table 6 reports the training time, training parameters, and per-image computation time for the Hydra heads and the traditional baseline approach. The training parameters and time of the Hydra classification head are notably lower than those of the baseline model. In the case of segmentation, the Hydra head and the baseline segmentation model have the same number of training parameters. However, the training time of the Hydra segmentation head, which was initialized with the trunk weights, is much lower than that of the baseline model. This suggests that using the weights of the Hydra trunk leads to faster convergence. Finally, although the total training time of the Hydra (trunk and heads) is higher than that of the baseline approach, most of this time comes from the trunk (SISR model). However, the trunk is trained only once and can then be shared to solve N visual tasks in medical images simultaneously.
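The amortization argument above can be stated explicitly. The symbols are introduced here for illustration and do not appear elsewhere in the paper. Let $T_{\mathrm{trunk}}$ be the one-time trunk training time, $T_{\mathrm{head}}^{(i)}$ the training time of the Hydra head for task $i$, and $T_{\mathrm{base}}^{(i)}$ the training time of a standalone baseline for the same task, over $N$ tasks:

```latex
\begin{aligned}
T_{\mathrm{Hydra}}(N) &= T_{\mathrm{trunk}} + \sum_{i=1}^{N} T_{\mathrm{head}}^{(i)},
\qquad
T_{\mathrm{baseline}}(N) = \sum_{i=1}^{N} T_{\mathrm{base}}^{(i)}, \\
T_{\mathrm{Hydra}}(N) < T_{\mathrm{baseline}}(N)
&\iff T_{\mathrm{trunk}} < \sum_{i=1}^{N} \bigl( T_{\mathrm{base}}^{(i)} - T_{\mathrm{head}}^{(i)} \bigr).
\end{aligned}
```

If every head saves at least $\Delta > 0$ relative to its baseline, the right-hand side is at least $N\Delta$. Therefore $N > T_{\mathrm{trunk}} / \Delta$ is a sufficient condition for the shared trunk to pay for itself in total training time.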
Beyond the traditional baseline models, we believe the Hydra also requires substantially less computation time and fewer parameters than existing methods in the literature. Specifically, given that (1) the number of parameters of most SISR models in the literature ranges from 57K to 43M [27] and (2) the number of parameters of standalone state-of-the-art classification models ranges from 7M to 146M [42], we believe the proposed Hydra significantly decreases the training time and parameters for super-resolution image enhancement and visual task analysis in medical images. Besides reducing training cost and speeding up computation, the Hydra, similar to [43], can mitigate the high dependence on a relatively large corpus of labeled data, as follows. The Hydra trunk can be created using unsupervised learning (no manual labeling) or residual learning (SISR) and then used as a shared backbone for learning task-specific layers with relatively few training parameters. That is, freezing the weights of the trunk and learning shallow task-specific layers can enhance performance in visual task analysis, especially for tasks with limited amounts of labeled data.
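The shared-backbone idea can be illustrated without a deep learning framework. In the toy sketch below, all class names and parameter counts are hypothetical placeholders, not values from Table 6. The trunk is evaluated once per input and its features are reused by every head, and freezing the trunk leaves only the head parameters trainable. In Keras, the corresponding step would be to set the `trainable` attribute of the trunk layers to `False` before compiling the model that contains the heads.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Component:
    """A model block: a forward function plus its (hypothetical) parameter count."""
    forward: Callable[[Any], Any]
    num_params: int
    trainable: bool = True


@dataclass
class SharedTrunkModel:
    """Toy Hydra-style model: one shared trunk feeding several task heads."""
    trunk: Component
    heads: Dict[str, Component] = field(default_factory=dict)

    def freeze_trunk(self) -> None:
        # Keep the trunk weights fixed; only the heads would be updated in training.
        self.trunk.trainable = False

    def trainable_params(self) -> int:
        # Count parameters only in components that are still trainable.
        parts = [self.trunk, *self.heads.values()]
        return sum(p.num_params for p in parts if p.trainable)

    def predict(self, x: Any) -> Dict[str, Any]:
        # The trunk runs once; every head consumes the same shared features.
        features = self.trunk.forward(x)
        return {name: head.forward(features) for name, head in self.heads.items()}


if __name__ == "__main__":
    trunk = Component(forward=lambda x: x * 2, num_params=100_000)  # placeholder trunk
    model = SharedTrunkModel(
        trunk,
        {
            "segmentation": Component(lambda f: f + 1, num_params=5_000),
            "classification": Component(lambda f: f > 10, num_params=2_000),
        },
    )
    print(model.trainable_params())  # 107000: trunk and heads all trainable
    model.freeze_trunk()
    print(model.trainable_params())  # 7000: only the two heads
    print(model.predict(3))  # {'segmentation': 7, 'classification': False}
```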
Our experimental results are promising and support the efficiency and effectiveness of the proposed approach for medical image analysis.

Conclusions and Possible Extensions
HR images with fine structural details are preferred for visual task analysis in medical images. Several deep learning networks have been proposed in the literature to enhance the resolution of medical images. However, these networks are very deep and require a massive number of parameters, which restricts them to platforms with large memory banks and high computing capability. To address this problem, we proposed an efficient approach, called the Hydra, to accelerate SISR and visual task analysis in medical images. The Hydra consists of an SISR trunk and task-specific heads. The trunk is a small SISR model that learns the mapping from LR to HR images. The experimental results demonstrated that the performance of the Hydra trunk is comparable to that of the state-of-the-art VDSR model. After the trunk is constructed, task-specific heads are appended to learn task-specific features for the key visual tasks in medical images. The proposed Hydra was evaluated on CXR datasets to perform image enhancement, abnormality classification, and lung segmentation. The experimental results demonstrated that the proposed approach outperformed the baseline approach by a large margin. In addition, quantitative measurements showed that the Hydra substantially reduces resource and computation requirements compared with the traditional baseline approach and methods in the literature.
Although we only demonstrated the performance of our approach in reconstructing HR CXR images followed by classification and segmentation, the Hydra can be extended in several ways. For example, instead of a single LR image, multiple LR images can be fed to the SISR model (Hydra trunk), and the reconstructed images can then be fused, as proposed in [44,45], to generate a single HR image. In addition, the Hydra can be extended to perform pre-processing operations on 3D images by using a 3D CNN (e.g., 3D SISR [46]) as a shared trunk, followed by appending multiple heads to the trained trunk for different 3D tasks. The Hydra can also be applied to other medical imaging modalities. For example, a trunk can be trained to reconstruct HR CT images and then appended with task-specific layers to perform different CT tasks. Similarly, a Hydra trunk can be trained to denoise MRI images, followed by simultaneous visual task analyses, including detecting a brain tumor, localizing it, and segmenting it from the background.