CARNet: Context-Aware Residual Learning for JPEG-LS Compressed Remote Sensing Image Restoration

Abstract: Restoring images compressed with JPEG-LS (a lossless (LS) compression standard developed by the Joint Photographic Experts Group) is a significant problem in remote sensing applications. It faces two challenges: first, bridging small pixel-value gaps within wide numerical ranges; and second, removing banding artifacts when little context information is available. To the best of our knowledge, no existing research addresses these issues. Hence, we develop this initial line of work on JPEG-LS compressed remote sensing image restoration. We propose a novel CNN model called CARNet, whose core idea is a context-aware residual learning mechanism. Specifically, it realizes residual learning for accurate restoration by adopting a scale-invariant baseline. It enables large receptive fields for banding artifact removal through a context-aware scheme. Additionally, it eases the information flow among stages by utilizing a prior-guided feature-fusion mechanism. In addition, we design novel reference (R) image quality assessment (IQA) models that use gradient priors of JPEG-LS banding artifacts to better assess restoration performance in our study. Furthermore, we prepare a new dataset of JPEG-LS compressed remote sensing images to supplement existing benchmark data. Experiments show that our method sets the state of the art for JPEG-LS compressed remote sensing image restoration.


Introduction
Due to limitations in storage space and transmission bandwidth, digital images are usually compressed to remove redundancy [1]. Generally, compression methods trade image quality for higher compression rates. However, image quality degradation is unacceptable for high-performance scientific applications. Thus, near-lossless compression methods are usually employed in remote sensing applications as a trade-off between image quality and compression rate. JPEG-LS is a lossless (LS) image compression standard developed by the Joint Photographic Experts Group (JPEG) that supports both lossless and near-lossless compression [2]; in near-lossless mode, the parameter NEAR bounds the maximum absolute reconstruction error of each pixel and thus controls the information loss. For clarity, JPEG-LS in the following description refers only to its near-lossless mode. Owing to its low complexity, JPEG-LS has been widely used in remote sensing applications [3]. Although an image compressed by JPEG-LS may show milder quality degradation than one compressed by a general lossy scheme (e.g., JPEG), it still exhibits noticeable banding artifacts in some flat image areas. As shown in Figure 1, these banding artifacts not only cause information loss but also produce an unpleasant visual impression, which may severely affect high-performance remote sensing applications. Hence, there is an urgent need for JPEG-LS compressed remote sensing image restoration. The research community has proposed many compressed image restoration methods, including model-based methods [4][5][6][7][8] and learning-based methods [9][10][11][12][13][14][15]. In particular, fully convolutional neural networks (FCNs) [16] have achieved state-of-the-art results in recent years. However, existing methods all focus on lossy compression schemes and social media images. To the best of our knowledge, there is no research specifically on JPEG-LS compressed remote sensing image restoration.
To this end, we develop this initial line of work on JPEG-LS compressed remote sensing image restoration. Our task is much more difficult than general lossy compressed image restoration because of two core problems: (1) Because JPEG-LS must keep information loss low, its compressed images exhibit only slight degradation, that is, small differences in pixel values from their references. However, the pixel values of high-bit remote sensing images vary over a wide range. Thus, our task requires not only high-precision restoration but also bridging small pixel-value gaps within wide numerical ranges. (2) Most remote sensing images contain many flat areas with little context information. However, as shown in Figure 1, the banding artifacts produced by the run-length coding of JPEG-LS [17] usually occur in such flat areas. Thus, little context information may be available when removing JPEG-LS banding artifacts.
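To make both problems concrete, the toy coder below mimics the near-lossless principle on a 1-D signal. It is a simplified sketch, not JPEG-LS itself: it assumes a previous-sample predictor and omits the standard's median edge detector, context modeling, bias correction, and run mode, keeping only the residual quantization with step 2·NEAR + 1. The function names are hypothetical. Even so, it shows that every reconstructed sample stays within ±NEAR of the original (small pixel-value gaps), while a smooth ramp collapses into flat steps, the same staircase behavior that appears as banding in flat regions.

```python
def quantize_residual(err, near):
    """Near-lossless residual quantization with step 2*near + 1 (JPEG-LS style)."""
    step = 2 * near + 1
    if err > 0:
        return (near + err) // step
    return -((near - err) // step)


def toy_near_lossless(signal, near):
    """Toy 1-D DPCM coder: predict from the previous reconstructed sample.

    Simplified: no MED predictor, context modeling, bias correction or run mode.
    """
    recon = [signal[0]]                      # first sample kept exactly
    for x in signal[1:]:
        pred = recon[-1]
        q = quantize_residual(x - pred, near)
        recon.append(pred + q * (2 * near + 1))  # decoder-side reconstruction
    return recon


ramp = list(range(21))                   # slowly varying, "flat-ish" region
coded = toy_near_lossless(ramp, near=2)
print(coded)  # [0, 0, 0, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 15, 15, 15, 15, 15, 20, 20, 20]
print(max(abs(a - b) for a, b in zip(ramp, coded)))  # 2, never more than NEAR
```

Raising `near` widens the steps, which mirrors Figure 1, where larger NEAR values produce more visible bands.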
To deal with the above problems, we propose a novel CNN called CARNet. Its core idea is a context-aware residual learning mechanism, which has the following three key components.
First, we design a scale-invariant baseline to achieve high-accuracy pixel-value recovery. Since the pixel values of high-bit remote sensing images may vary widely, directly learning the latent clean image may amplify the slight degradation caused by near-lossless compression. Hence, we consider residual mapping learning to be more efficient for our task. Inspired by [18], we propose a scale-invariant baseline to mine residual features. Scale-invariant here means learning in a fixed-scale space, which can reduce the pixel-value reconstruction errors caused by image scale changes. Hence, our baseline preserves the spatial accuracy that residual learning needs and enables high-precision recovery of minor degradation.
Second, we propose a context-aware subnet to supplement context information. Our scale-invariant baseline may perform well in learning the residual mapping, but due to its limited receptive fields, it may lack the ability to extract rich context information. Hence, we design a context-aware subnet that focuses on mining context information. It provides large receptive fields for exploring context information and thus shows promising results in JPEG-LS banding artifact removal.
Third, we propose a prior-guided feature fusion mechanism to ease the information flow between the above two stages. Our scale-invariant baseline and context-aware subnet address the two core problems of JPEG-LS compressed remote sensing image restoration, respectively. Directly fusing their features and using the fused features to reconstruct the latent clean image may not work well. Hence, we progressively integrate context features into our scale-invariant baseline. This scheme forms a prior-guided reconstruction that provides better features for restoration.
Further, we notice that the gradient angle values of banding artifacts are mostly π/2 or 3π/2. This is a noteworthy local feature of JPEG-LS compressed remote sensing images. By utilizing this JPEG-LS compression prior, we design special loss functions that strengthen the overall supervision in gradient-value space, further improving restoration performance.
In addition, researchers usually employ two common reference (R) image quality assessment (IQA) algorithms, PSNR and SSIM [19], to quantitatively evaluate restoration performance. However, these metrics are not well suited to our study because of the particularity of JPEG-LS-degraded remote sensing images, which manifests in two aspects: (1) Because JPEG-LS compression is near-lossless, the SSIM scores of JPEG-LS-degraded images under different compression rates are very close (e.g., they differ only from the third decimal place onward). Such tiny numerical differences make it hard to distinguish the perceptual quality of different JPEG-LS-degraded images. (2) R IQA models may become unreliable when the original reference image is degraded [20]. Owing to the particular viewing perspective of remote sensing images, the PSNR and SSIM results in our task may be inconsistent with human judgment. To this end, we propose novel R IQA algorithms, LS-PSNR and LS-SSIM, to provide better quantitative assessment for our research. Specifically, we first design novel R IQA models $Q_g$ in gradient-value space and then combine $Q_g$ with the pixel-value-space R IQA models $Q_p$ (i.e., PSNR and SSIM). LS-PSNR and LS-SSIM may be viewed as conditioning $Q_p$ on $Q_g$, where the predicted $Q_g$ score serves as prior knowledge. Through this conditioning, our R IQA models greatly expand the differences among predicted scores. Additionally, by utilizing $Q_g$ priors, our R IQA models show promise in predicting human quality judgments.
Furthermore, we prepare a new dataset of JPEG-LS compressed remote sensing images to supplement existing benchmark data. Experiments show that our method sets the state of the art for JPEG-LS near-lossless compressed remote sensing image restoration. The contributions of this paper are highlighted as follows:

• We develop the initial line of work on JPEG-LS near-lossless compressed remote sensing image restoration.
• We propose a novel CNN, called CARNet, to deal with the new challenges in this line of work. Its core idea is a context-aware residual learning mechanism. Further, we design special loss functions that utilize JPEG-LS compression priors to improve restoration performance.
• We propose novel R IQA algorithms, called LS-PSNR and LS-SSIM, that utilize the special characteristics of JPEG-LS banding artifacts to provide better assessment results for our research.
• We prepare a new dataset of JPEG-LS compressed remote sensing images to supplement existing benchmark data. Experiments show that our method sets the state of the art for JPEG-LS near-lossless compressed remote sensing image restoration.

Related Work
Some early works [21][22][23] treat compression artifact removal as a denoising problem by modeling compression artifacts as additive noise. These works consider only the smoothness or regularity of pixel intensities, so edges and textures may be smoothed in their restored images. Other works [24][25][26] treat compression artifact removal as an image inverse problem. These methods further consider the nonstationarity of image content, but they ignore the content-correlated nature of compression noise. Further, because the problem is ill-posed, prior knowledge is required to regularize their solutions.
Recently, CNNs have been widely used for low-level image processing problems and have achieved excellent results. CNN-based restoration of compressed images was first introduced by Dong et al. [9]. However, their small-scale network has a limited receptive field, and its training converges slowly. Then, DnCNN [11] boosted performance on general blind image restoration tasks. Later, a wavelet transform-based network, MWCNN [27], brought further improvements. In other explorations, a deep convolutional sparse coding network [28] combines model-based methods with deep CNNs. Additionally, a Dual-domain Multi-scale CNN (DMCNN) [14] was proposed for JPEG compressed image restoration; it enlarges the receptive fields in both the pixel and DCT domains. This model shows promising restoration results, but its network architecture is highly redundant. Alternatively, some works [13,29] propose a feed-forward fully convolutional residual network trained with a generative adversarial framework. However, restoration results produced by such networks are often not vivid, and training a generative adversarial network is usually arduous.
Since the assessment of restoration results consists of quantitative and qualitative evaluation, the most recent works follow two directions. On the one hand, some works focus on improving quantitative accuracy. For example, inspired by spatial-wise convolution for shift-invariance, Fan et al. [30] propose a "scale-wise convolution" that convolves across multiple scales to achieve scale-invariance. Their network shows that properly modeling scale-invariance in neural networks may bring significant benefits to image restoration performance. On the other hand, some works focus on improving qualitative visual quality. Ehrlich et al. [31] propose QGAC, which uses the quantization table so that a single model can correct JPEG artifacts at any compression rate. Additionally, Jiang et al. [32] present FBCNN, which achieves flexible JPEG image restoration through manual control of the compression quality factor. Further, Zamir et al. [15] propose a multi-stage architecture that progressively learns restoration functions for degraded inputs, thereby breaking down the overall recovery process into more manageable steps.
Based on the above research, we develop this initial line of work on JPEG-LS compressed remote sensing image restoration. We refer to the proposed network as CARNet; by adopting a context-aware residual learning mechanism, it achieves accurate restoration while performing well in banding artifact removal.

Method
In this section, we introduce the proposed method: first the CARNet architecture, then the loss function, and finally the novel R IQA algorithms.

CARNet Framework
The framework of the proposed CARNet is shown in Figure 2. The entire network is an end-to-end system that takes a JPEG-LS near-lossless compressed image C as input and directly generates the output image O. The network is fairly straightforward, with each component designed to achieve a specific task. As illustrated, our model contains three components: a scale-invariant baseline, a context-aware subnet, and a prior-guided reconstruction. To describe the learning process conveniently, we use Φ to denote a 3 × 3 convolution and σ to denote the PReLU [33] activation.
Figure 2. The framework of the proposed CARNet. Each component of our network is designed to complete a specific task.

Scale-Invariant Baseline
Since a near-lossless compressed image is only slightly degraded, it shows only small pixel-value differences from its reference, and its restoration requires high accuracy. Generally speaking, learning small residual values may converge more easily than directly regressing large pixel values, especially for remote sensing images, which are usually 10-bit to 12-bit and span wide numerical ranges. Hence, we propose a scale-invariant baseline to achieve residual learning. As shown in Figure 2, it takes the compressed image C as input, extracts a low-level feature F, and then extracts the basic residual features using five res-blocks; the output of the i-th res-block is denoted $F_i$, where i = 1, 2, 3, 4, 5 and $F_0 = F$. Without any downsampling operation, our baseline extracts residual features from full-resolution inputs. This scale-invariant design ensures accurate residual mapping and thus enables high-accuracy restoration. Moreover, because the network learns small pixel-value differences through a deep architecture, it can easily suffer from vanishing gradients; as in [11,34], the residual mapping eases this problem and simplifies convergence.
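The motivation for residual learning can be checked with a few lines of NumPy. The sketch below uses synthetic data and an assumed error model (independent per-pixel errors bounded by NEAR instead of a real JPEG-LS codec; the function name is hypothetical) to compare the value range a network would have to regress directly, the clean 12-bit image, with the range of the residual target I − C.

```python
import numpy as np

def residual_learning_targets(bits=12, near=16, shape=(64, 64), seed=0):
    """Compare direct-regression and residual-learning targets on synthetic data.

    Assumptions (illustrative only): the clean image is random noise spanning
    the full bit depth, and near-lossless coding is modeled as an independent
    per-pixel error in [-near, near] rather than by running a real JPEG-LS codec.
    """
    rng = np.random.default_rng(seed)
    max_val = 2 ** bits - 1
    clean = rng.integers(0, max_val + 1, size=shape)       # stand-in for I
    error = rng.integers(-near, near + 1, size=shape)      # |error| <= NEAR
    compressed = np.clip(clean + error, 0, max_val)        # stand-in for C
    residual = clean - compressed                          # residual target I - C
    # Spans of both targets after normalizing by the full dynamic range.
    direct_span = (clean.max() - clean.min()) / max_val
    residual_span = (residual.max() - residual.min()) / max_val
    return clean, compressed, residual, direct_span, residual_span


clean, compressed, residual, direct_span, residual_span = residual_learning_targets()
print(f"direct target span:   {direct_span:.4f}")
print(f"residual target span: {residual_span:.4f}")   # at most 2 * 16 / 4095
```

Under this model, the residual occupies less than 1% of the normalized range, which is the "small gap in a wide range" that the baseline is designed to learn.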

Context-Aware Subnet
Remote sensing images usually cover large flat areas that may not present obvious context information. However, most JPEG-LS banding artifacts occur in these flat areas. Thus, because it learns in a fixed-scale space, our scale-invariant baseline may lack the receptive field needed to gather enough context information for banding artifact removal in such areas, so additional context information is needed. To this end, we propose the context-aware subnet, which mines context information by adopting several techniques that enlarge receptive fields. As shown in Figure 3, our context-aware subnet is a U-Net-like structure consisting of downsampling convolutions, dilated convolutions, and pixel-shuffle upsampling convolutions. Downsampling and dilated convolutions enlarge the receptive field, and pixel-shuffle upsampling restores the full resolution; together, they greatly expand the receptive field of the whole network. Further, we note that image gradient features contain rich contextual information. Thus, rather than taking only the compressed image as input, our context-aware subnet also takes gradient maps as additional input priors. With these schemes, our context-aware subnet has a large enough receptive field to effectively mine context information for JPEG-LS banding artifact removal in flat areas.
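The receptive-field argument can be quantified with the standard recurrence for stacked convolutions: each layer with kernel size k and dilation d adds (k - 1)·d·j input pixels, where j is the product of the strides of all earlier layers. The layer lists below are assumptions: the text fixes neither the internal layout of a res-block (two 3 × 3 convolutions are assumed) nor the exact layer configuration of the context-aware subnet in Figure 3 (a hypothetical encoder with two stride-2 downsamplings and dilations 2 and 4 is used).

```python
def receptive_field(layers):
    """Receptive field (input pixels per side) of stacked conv layers.

    Each layer is (kernel, stride, dilation); `jump` is the input-pixel spacing
    between neighboring positions of the current feature map.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# Assumed baseline: head conv + 5 res-blocks x two 3x3 convs, all stride 1, no dilation.
baseline = [(3, 1, 1)] * (1 + 5 * 2)
# Hypothetical subnet encoder: two stride-2 downsamplings, then dilations 2 and 4.
subnet_encoder = [(3, 1, 1), (3, 2, 1), (3, 1, 1), (3, 2, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(baseline))        # 23
print(receptive_field(subnet_encoder))  # 61
```

Under these assumptions, the full-resolution baseline sees a 23 × 23 window, which can lie entirely inside a wide flat band, whereas the hypothetical encoder already covers 61 × 61 input pixels with fewer layers.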

Prior-Guided Reconstruction
Since context information supplements the residual features, we do not reconstruct the latent clean image directly from fused features; instead, we use a prior-guided feature fusion mechanism to propagate context information from our context-aware subnet to later stages. As shown in Figure 2, once the context features are obtained, we use them to guide residual feature learning in our baseline. To give full play to the guiding role of the context prior, we first integrate the context features into the basic residual features by concatenation and then extract context-aware residual features $\hat{F}$ using another three res-blocks. Finally, the learned context-aware residual features $\hat{F}$ are fed into three convolutional layers to generate the output image O. This scheme forms a prior-guided reconstruction, which eases the information flow among stages. Hence, our context-aware subnet supplies the context information that residual mapping lacks, which helps the whole network perform well in both accurate restoration and banding artifact removal.
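Concatenating context features with the baseline's full-resolution residual features only works if the subnet returns to full resolution, which its pixel-shuffle upsampling provides. The NumPy sketch below re-implements pixel shuffle for a single feature map with the same element ordering as PyTorch's `torch.nn.PixelShuffle`, then performs the channel-wise concatenation; the function name and all channel and size settings are hypothetical.

```python
import numpy as np

def pixel_shuffle(x, r):
    """(C*r*r, H, W) -> (C, H*r, W*r), same element order as torch.nn.PixelShuffle."""
    c_r2, h, w = x.shape
    if c_r2 % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)


# Hypothetical sizes: 16-channel features on a 32 x 32 patch.
C, H, W = 16, 32, 32
residual_feats = np.zeros((C, H, W))               # baseline: full resolution
context_low = np.ones((C * 4, H // 2, W // 2))     # subnet: half resolution
context_full = pixel_shuffle(context_low, r=2)     # back to (C, H, W)
fused = np.concatenate([residual_feats, context_full], axis=0)
print(context_full.shape, fused.shape)  # (16, 32, 32) (32, 32, 32)
```

The fused tensor would then pass through the three additional res-blocks; concatenation keeps both feature sets intact and lets those blocks learn how much context to use.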

Loss Function
We design a loss function L consisting of three components, which we minimize during network training: a mean squared error (MSE) term $\mathcal{L}_{MSE}$, an L1 term $\mathcal{L}_{1}$, and a gradient angle term $\mathcal{L}_{G}$. We use MSE as the major loss function to supervise the whole network:
$$\mathcal{L}_{MSE} = \frac{1}{n}\sum_{k=1}^{n}\left(I_k - O_k\right)^2,$$
where I is the ground-truth image, O is the network output, and n is the number of pixels.
Considering the difficulty of learning small pixel-value differences, we add an L1 term to strengthen the overall supervision in pixel-value space:
$$\mathcal{L}_{1} = \frac{1}{n}\sum_{k=1}^{n}\left|I_k - O_k\right|.$$
Further, we notice that banding artifacts have a strong local gradient feature: their gradient angle values are mostly π/2 or 3π/2. Based on this prior, we propose the gradient angle loss $\mathcal{L}_{G}$, which penalizes the discrepancy between $A_I$ and $A_O$, the gradient angles of the ground truth and the output, respectively, to further strengthen the overall supervision in gradient-value space. Image gradients are computed with a Sobel operator.
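The three loss terms can be sketched as follows. This NumPy version only illustrates how the terms are evaluated; training requires a differentiable framework such as PyTorch. The text does not give the exact form of the gradient angle loss L_G or the weights of the three terms, so the wrapped absolute angle difference and unit weights below are assumptions, as is the Sobel sign convention; all function names are hypothetical.

```python
import numpy as np

def sobel(img):
    """3x3 Sobel gradients (G_X, G_Y) with edge padding; rows grow downward."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return gx, gy

def gradient_angle(img):
    """Gradient angle mapped to [0, 2*pi); note atan2(0, 0) = 0 in flat areas."""
    gx, gy = sobel(img)
    return np.mod(np.arctan2(gy, gx), 2 * np.pi)

def carnet_style_loss(output, target, w_l1=1.0, w_g=1.0):
    """Hypothetical total loss: MSE + w_l1 * L1 + w_g * gradient-angle loss.

    The weights and the angle-loss form are assumptions, not the paper's values.
    """
    diff = np.asarray(output, dtype=float) - np.asarray(target, dtype=float)
    l_mse = np.mean(diff ** 2)                   # major pixel-space term
    l_1 = np.mean(np.abs(diff))                  # extra pixel-space supervision
    d = np.abs(gradient_angle(output) - gradient_angle(target))
    l_g = np.mean(np.minimum(d, 2 * np.pi - d))  # angular distance with wrap-around
    return float(l_mse + w_l1 * l_1 + w_g * l_g)


# Horizontal bands: intensity changes only from row to row.
band = np.repeat(np.array([0, 0, 0, 5, 5, 5, 0, 0, 0], dtype=float)[:, None], 8, axis=1)
print(np.round(gradient_angle(band)[:, 0] / np.pi, 2))  # multiples of pi: 0, 0.5 or 1.5
```

The band example reproduces the prior: wherever the gradient is nonzero, its angle is exactly π/2 or 3π/2. Because atan2 is unstable where the gradient magnitude is near zero, a practical implementation might mask or weight such pixels.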

R IQA Algorithm
JPEG-LS compression artifacts appear as horizontal bands with distinct image gradient characteristics: (1) as the banding artifacts become more severe, the mean deviation between horizontal and vertical image gradients increases continuously; and (2) most of the gradient angles in banding artifact areas are π/2 or 3π/2. Using these characteristics, we design two novel R IQA models in gradient-value space. Based on the first characteristic and analogous to PSNR, we propose G-PSNR, which computes the difference between C and I in the mean deviation of horizontal and vertical image gradients. Based on the second characteristic and analogous to SSIM, we propose G-SSIM, which computes the structural similarity of the image gradient angles of C and I. G-PSNR and G-SSIM can assess the degradation severity caused by JPEG-LS banding artifacts but are limited in evaluating the pixel-value similarity between C and I. Thus, going one step further, we combine $Q_g$ (G-PSNR and G-SSIM) with $Q_p$ (PSNR and SSIM) to obtain LS-PSNR and LS-SSIM, which fuse image-similarity evaluation and artifact-severity assessment. The specific calculation process is as follows.
As illustrated in Figure 4, given an input image I and its compressed version C, PSNR and SSIM scores are first computed to account for the perceptual quality difference $Q_p$ between I and C in pixel-value space. Then, a gradient component computes the horizontal gradient $G_X$ and vertical gradient $G_Y$ of I and C, followed by the gradient angle A. Finally, G-PSNR and G-SSIM scores are computed to account for the perceptual quality difference $Q_g$ between I and C in gradient-value space. G-SSIM applies the SSIM function $F_S$ to the gradient angle maps $A_I$ and $A_C$ of I and C, that is, $\text{G-SSIM} = F_S(A_I, A_C)$. G-PSNR is computed from dd, the difference between I and C in the mean deviation of the horizontal and vertical gradients, and R, the range of mean deviations; the mean deviation is computed with a window-mean method. LS-PSNR and LS-SSIM then combine these gradient-space scores with the pixel-space scores, where $F_P$ denotes the PSNR function used to predict PSNR scores. In our study, we set α = 0.7 and β = 0.5.
The formal computation of LS-PSNR is given in Algorithm 1. Its final steps are: if R < 0, raise a ValueError ("R must be ≥ 0"); compute dd using Equations (10) and (11); compute G-PSNR using Equation (9); compute LS-PSNR using Equation (13); and return LS-PSNR.
The proposed R IQA models have several merits. They may be viewed as conditioning $Q_p$ on $Q_g$, where the predicted $Q_g$ score serves as "prior" knowledge of JPEG-LS banding artifacts. Hence, the predicted scores of our R IQA models show promise in estimating the severity of JPEG-LS banding artifacts, which could provide a better evaluation of our restoration performance.
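Of the gradient-space scores, G-SSIM is the one whose form the description pins down: SSIM applied to the gradient angle maps A_I and A_C. The sketch below assumes Sobel gradients, angles computed with atan2 and mapped to [0, 2π), and a simplified SSIM with uniform 7 × 7 windows and population statistics, so its values will differ from library SSIM implementations (e.g., those with Gaussian-weighted windows). G-PSNR and the LS-PSNR/LS-SSIM combinations are not sketched because their equations are not given in this text. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gradient_angle(img):
    """Sobel gradient angle in [0, 2*pi) (edge padding, rows grow downward)."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.mod(np.arctan2(gy, gx), 2 * np.pi)

def ssim_uniform(x, y, data_range, win=7):
    """Simplified SSIM: mean over all uniform win x win windows (population stats)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    wx, wy = sliding_window_view(x, (win, win)), sliding_window_view(y, (win, win))
    mx, my = wx.mean(axis=(-2, -1)), wy.mean(axis=(-2, -1))
    vx, vy = wx.var(axis=(-2, -1)), wy.var(axis=(-2, -1))
    cov = (wx * wy).mean(axis=(-2, -1)) - mx * my
    ssim_map = ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))
    return float(ssim_map.mean())

def g_ssim(reference, compressed):
    """Sketch of G-SSIM: SSIM between the gradient angle maps A_I and A_C."""
    return ssim_uniform(gradient_angle(reference), gradient_angle(compressed), data_range=2 * np.pi)


rows = np.arange(40, dtype=float)
clean = np.tile(rows[:, None], (1, 40))               # smooth vertical ramp
banded = np.tile((rows // 5 * 5)[:, None], (1, 40))   # same ramp as 5-row plateaus
print(round(g_ssim(clean, clean), 4))   # 1.0
print(round(g_ssim(clean, banded), 4))  # far below 1
```

In this synthetic example, the banded staircase has gradient angles of 0 in its flat interiors and π/2 at the band edges, whereas the smooth ramp has π/2 everywhere, so G-SSIM drops sharply even though the two images differ by at most 4 gray levels.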

Dataset
We have collected a large dataset of 10-bit and 12-bit panchromatic remote sensing images with resolutions ranging from 5353 × 17,144 to 16,296 × 16,968. For each bit depth, we randomly select some images as test data and use the remaining images as training data. To train our model to adapt to different compression ratios, we prepare the corresponding degraded images using JPEG-LS compression at different NEAR values (i.e., 8, 12, and 16). Further, due to limited computing resources, we crop the high-resolution remote sensing images into patches of a uniform size of 256 × 256. After this processing, we obtain a large remote sensing image dataset consisting of 51,966 10-bit image patches and 14,715 12-bit image patches, each with three JPEG-LS compressed versions at different NEAR values.
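The cropping step can be sketched as below. The text does not state the crop stride or how image borders are handled, so non-overlapping 256 × 256 tiles with leftover borders discarded are assumed, and `crop_patches` and `patch_count` are hypothetical helpers. The same crop positions would be applied to the reference image and to each of its three JPEG-LS versions so that patches stay aligned.

```python
import numpy as np

def crop_patches(image, size=256):
    """Split a 2-D image into non-overlapping size x size patches (borders dropped)."""
    rows, cols = image.shape[0] // size, image.shape[1] // size
    trimmed = image[: rows * size, : cols * size]
    # (rows*size, cols*size) -> (rows, size, cols, size) -> (rows, cols, size, size)
    tiles = trimmed.reshape(rows, size, cols, size).swapaxes(1, 2)
    return tiles.reshape(-1, size, size)  # tiles in row-major order


def patch_count(height, width, size=256):
    """Number of whole patches that fit into an image of the given size."""
    return (height // size) * (width // size)


print(patch_count(16_296, 16_968))  # 63 * 66 = 4158
print(patch_count(5_353, 17_144))   # 20 * 66 = 1320
```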
In addition, to evaluate how well our R IQA models indicate the similarity between a remote sensing image and its JPEG-LS compressed version, we prepared a manually labeled dataset of 200 image pairs. Each pair contains a 12-bit remote sensing image and its JPEG-LS-degraded version (NEAR = 16) and is labeled with a Mean Opinion Score (MOS) indicating the similarity of the two images. Each MOS is the arithmetic mean of three experts' scores. Figure 5 presents the distribution of our MOS-labeled dataset, and Figure 6 shows visual examples with different MOS values.

R IQA Model Performance Evaluation
We evaluate the proposed R IQA models from two aspects: first, whether their predicted scores can distinguish tiny similarity differences; and second, whether their evaluation results agree with human visual judgment. We conduct experiments on our MOS-labeled dataset and use PSNR and SSIM as comparison benchmarks.

Distribution Analysis
We first analyze the distribution of model-predicted scores on our large-scale remote sensing image dataset. Figure 7 shows the distribution of the predicted scores of all R IQA models. Each sub-figure presents a box plot of all models' predicted scores under a specific JPEG-LS compression setting; it clearly shows that the predicted scores of our R IQA models, LS-PSNR and LS-SSIM, have a much wider distribution range than those of the benchmark models. Comparing the three sub-figures in each row, which show the predicted score distributions of one dataset compressed with different JPEG-LS NEAR values, we find that the variation in our models' predicted scores follows the changing compression ratio more closely. These results show that our R IQA models perform well in distinguishing tiny similarity differences.

Performance Analysis
We then analyze the assessment performance of the R IQA models on our MOS-labeled dataset. For quantitative evaluation, we adopt the Root Mean Squared Error (RMSE), Pearson's Linear Correlation Coefficient (PLCC), and Spearman's Rank-Order Correlation Coefficient (SROCC) as indexes. RMSE measures the absolute error between R IQA scores and MOS; a smaller RMSE indicates better performance. PLCC describes the linear correlation between R IQA scores and MOS, and SROCC measures the monotonicity of an R IQA model's predictions; larger PLCC and SROCC values indicate better performance. Following usual practice [35], the R IQA predicted scores are passed through a logistic nonlinearity before RMSE, PLCC, and SROCC are computed.
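The three agreement indexes can be computed with NumPy alone, as sketched below. SROCC is the Pearson correlation of the ranks, with tied values (common when MOS is the average of three experts' scores) given their average rank. The sketch omits the logistic mapping of [35], whose parameters must be fitted to the data (e.g., with `scipy.optimize.curve_fit`), so its RMSE is only meaningful when predictions and MOS share a scale; `scipy.stats.pearsonr` and `scipy.stats.spearmanr` give the same correlations. The function names and toy numbers are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks; tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def iqa_agreement(pred, mos):
    """RMSE, PLCC and SROCC between IQA predictions and MOS (no logistic fit)."""
    pred, mos = np.asarray(pred, dtype=float), np.asarray(mos, dtype=float)
    rmse = float(np.sqrt(np.mean((pred - mos) ** 2)))
    plcc = float(np.corrcoef(pred, mos)[0, 1])
    srocc = float(np.corrcoef(average_ranks(pred), average_ranks(mos))[0, 1])
    return rmse, plcc, srocc


# Toy numbers (not from Table 1): monotone but nonlinear agreement.
rmse, plcc, srocc = iqa_agreement([1, 2, 3, 4], [1, 4, 9, 16])
print(round(srocc, 6), plcc < 1)  # 1.0 True
```

The toy case shows why both correlations are reported: a prediction that orders images perfectly earns SROCC = 1 even when its relation to MOS is nonlinear, which PLCC (and the logistic fit) then captures.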
The quantitative evaluation results are shown in Table 1. PSNR obtains negative PLCC and SROCC scores, which indicates that PSNR-predicted scores are negatively correlated with MOS. Although SSIM has relatively higher PLCC and SROCC scores, it has the highest RMSE, which indicates that its predicted scores deviate considerably from MOS. In contrast, our LS-SSIM achieves the lowest RMSE and the highest PLCC and SROCC scores, followed by our LS-PSNR. These results show that our R IQA models perform well in accuracy, correlation, and monotonicity. Hence, we consider that our R IQA models predict human quality judgments well.
We use the Adam [36] optimizer with an initial learning rate of 0.0001 and multiply the learning rate by 0.9 when the validation loss stops decreasing. We train our model for 200 epochs with a batch size of 16. Our code is implemented in PyTorch [37] and runs on a PC with two Intel(R) Xeon(R) E5-2640 v4 CPUs and one RTX 2080 Ti GPU.
We compare our method with the state-of-the-art restoration networks ARCNN [9], DMCNN [14], SCN [30], and MPRNet [15]. The results of these methods are generated with the code released by their authors using the recommended experimental settings. Since there is no public remote sensing dataset for restoration research, all compared methods are trained and tested on our dataset. As shown in the previous section, our LS-PSNR and LS-SSIM scores are consistent with MOS and can thus evaluate JPEG-LS artifact removal performance well. Further, high PSNR and SSIM scores correspond to high pixel-value similarity. Thus, in addition to our R IQA models, we also use PSNR and SSIM as evaluation indexes to indicate restoration accuracy.
For qualitative evaluation, we adopt a Likert plot as an index. Figure 8 presents a Likert-plot assessment example: an image labeled with an MOS of 5 receives a PSNR-predicted score of 1, an SSIM-predicted score of 3, an LS-PSNR-predicted score of 4, and an LS-SSIM-predicted score of 5. This result is consistent with the quantitative comparison, which shows that the predictions of our R IQA models are much closer to MOS. Hence, based on the above subjective and objective comparisons, we believe our R IQA models are promising for providing better evaluation results in our study.

Objective Comparisons
The quantitative results are shown in Tables 2-5. The proposed CARNet outperforms all the other methods on all evaluation metrics. As shown in Tables 2 and 3, our model clearly surpasses all compared image restoration methods in PSNR and also achieves small gains in SSIM, which indicates that it produces accurate restorations in pixel-value space. As shown in Tables 4 and 5, our model consistently achieves better LS-PSNR and LS-SSIM scores on all datasets, which shows that it performs well in JPEG-LS banding artifact removal. Further, comparing restoration performance across compression rates (NEAR values), we notice that the performance of all models consistently decreases as the NEAR value gets smaller, which indicates that JPEG-LS compressed image restoration becomes harder as the compression rate decreases. Hence, we consider near-lossless compressed image restoration to be more challenging than the commonly studied lossy compressed image restoration, so the small gains in the above evaluation results are still acceptable.
To check whether the improvements of the proposed method are substantial or only borderline, we further analyze the distribution of the above quantitative results. Specifically, Figure 9 shows the distribution of the quantitative results for all evaluation indexes. Each sub-figure presents a box plot of all models' scores for one evaluation index on different dataset types, and the orange line marks the median score across models. In each box, the maximum corresponds to our model's score and the minimum to ARCNN's score. Although the compared methods' scores show different distributions across indexes, our model far surpasses the median of the other models in PSNR and LS-PSNR. In SSIM and LS-SSIM, our model achieves only small gains; however, the gap between our model's score and the median is larger than the disparity among the other state-of-the-art models. Hence, we consider the proposed method to be genuinely better than the state-of-the-art methods.

Subjective Comparisons
For subjective comparisons, Figures 10-12 present some restored results from our large-scale remote sensing image dataset. The JPEG-LS compressed images show severe banding artifacts. Among all restored results, the early work ARCNN, which has limited receptive fields, performs worst, as it removes almost none of the banding artifacts. In contrast, since SCN, MPRNet, and our CARNet use different effective mechanisms to expand the network's receptive fields, they all perform better than ARCNN. Among these three, MPRNet performs relatively poorly and still shows banding artifacts, while SCN smooths too strongly and causes over-smoothing. Our model achieves the best visual quality: it removes most banding artifacts without over-smoothing. More visual results can be found in our supplementary materials.

Ablation Studies
Here, we present ablation experiments to analyze the contribution of each component of our model. We mainly demonstrate the effectiveness of the proposed context-aware mechanism by removing the context-aware subnet from the whole network. We conduct this experiment on our 12-bit remote sensing image dataset with the NEAR value set to 16. The results are shown in Table 6. All evaluation indicators drop when the context-aware subnet is removed; in particular, the PSNR score drops by 0.30 dB, from 47.56 dB to 47.26 dB. Hence, we believe the context-aware mechanism noticeably improves our model's restoration performance by enlarging receptive fields to extract context features.
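To put the ablation result in perspective, a PSNR difference converts directly into an MSE ratio, because PSNR = 10·log10(MAX²/MSE) with the same MAX for both models. The arithmetic below applies this to the reported scores; since the reported PSNR is presumably an average over test patches, the resulting ratio is approximate.

```python
psnr_full, psnr_ablated = 47.56, 47.26       # dB, CARNet with / without the subnet
delta_db = psnr_full - psnr_ablated
# PSNR = 10*log10(MAX^2 / MSE) with the same MAX, so MSE_ablated / MSE_full = 10^(delta/10).
mse_ratio = 10 ** (delta_db / 10)
print(f"{delta_db:.2f} dB -> MSE x {mse_ratio:.3f}")  # 0.30 dB -> MSE x 1.072
```

That is, removing the subnet raises the mean squared error by roughly 7%.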

Color JPEG Image Restoration Performance Evaluation
To better highlight the contribution of the proposed model as a deep learning architecture, we conduct an experiment on three-channel color JPEG images. For fair comparisons, we use libjpeg [38] for compression with the baseline quantization setting. Following [32], we use DIV2K [39] and Flickr2K [40] as training data to generate synthetic JPEG images; all synthetic JPEG images used to train our model are compressed with libjpeg at quality factor (QF) 10. All methods are tested on the commonly used RGB benchmark LIVE1 [41]. Figure 13 shows an example from LIVE1, in which the visual quality of the result restored by our CARNet is far superior to those of DnCNN and MWCNN and comparable overall to that of FBCNN, one of the state-of-the-art methods for JPEG artifact removal. Hence, we consider CARNet a contribution to deep learning architectures for image restoration beyond the JPEG-LS setting.

Conclusions
In this paper, we propose a novel CNN model, CARNet, for the restoration of JPEG-LS compressed remote sensing images. Through a context-aware residual learning mechanism, it shows promise in solving the challenging problems of this task. Specifically, it achieves high-accuracy restoration by adopting a scale-invariant baseline to learn the residual mapping, and it performs well in JPEG-LS banding artifact removal by using a context-aware subnet to enlarge receptive fields. Additionally, it eases the information flow among stages through a prior-guided feature fusion mechanism. We also propose novel R IQA models, LS-PSNR and LS-SSIM, to provide better evaluation results for our study. By using the characteristics of JPEG-LS banding artifacts as priors, our R IQA models predict human quality judgments well and effectively distinguish tiny similarity differences among JPEG-LS-degraded images. Further, we prepare a new dataset of JPEG-LS compressed remote sensing images to supplement existing benchmark data. The evaluation results indicate that our method achieves state-of-the-art performance among the compared CNN-based methods. However, our method requires training a separate model for each compression ratio, which is time-consuming and computationally intensive. Hence, our future work will focus on designing a framework that can accommodate a wide range of JPEG-LS compression ratios.

Figure 1. Artifacts of JPEG-LS compressed remote sensing images. A large NEAR value corresponds to a high compression rate, which results in more severe banding artifacts. To show image details clearly, we present the compression results in 16-bit gray-scale format and use local adaptive histogram equalization to enhance their contrast.

Figure 3. The framework of the context-aware subnet.

Figure 7. Box plots of the predicted score distributions of all R IQA models. In each box, the central orange mark represents the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually with the '+' symbol. To compare the distributions of different R IQA models on the same scale, the SSIM and LS-SSIM scores are multiplied by 100 to reach the same magnitude as PSNR and LS-PSNR.

Figure 8. Visual illustration of R IQA model predicted scores.

Figure 9. Box plots of the quantitative results of all evaluation indexes on our 10-bit dataset. In each box, the central orange mark represents the median, and the bottom and top edges of the box indicate the minimum and maximum values, respectively.

Figure 10. Visual comparison with state-of-the-art methods on a 10-bit test image patch (NEAR = 8).

Figure 11. Visual comparison with state-of-the-art methods on a 10-bit test image patch (NEAR = 12).

Figure 12. Visual comparison with state-of-the-art methods on a 10-bit test image patch (NEAR = 16).

Figure 13. Visual comparison with state-of-the-art methods on a color JPEG image from the LIVE1 dataset.

Table 1. RMSE, PLCC, and SROCC results of different R IQA models. Best results are in bold.

Table 2. PSNR (dB) results. Data type "10-bit (NEAR = 8)" denotes 10-bit remote sensing images compressed by JPEG-LS with NEAR = 8. Best results are in bold.

Table 3. SSIM results. Data type "10-bit (NEAR = 8)" denotes 10-bit remote sensing images compressed by JPEG-LS with NEAR = 8. Best results are in bold.

Table 4. LS-PSNR (dB) results. Data type "10-bit (NEAR = 8)" denotes 10-bit remote sensing images compressed by JPEG-LS with NEAR = 8. Best results are in bold.

Table 5. LS-SSIM results. Data type "10-bit (NEAR = 8)" denotes 10-bit remote sensing images compressed by JPEG-LS with NEAR = 8. Best results are in bold.