Towards Streamlined Single-Image Super-Resolution: Demonstration with 10 m Sentinel-2 Colour and 10–60 m Multi-Spectral VNIR and SWIR Bands

Abstract: Higher spatial resolution imaging data are considered desirable in many Earth observation applications. In this work, we propose and demonstrate the TARSGAN (learning Terrestrial image deblurring using Adaptive weighted dense Residual Super-resolution Generative Adversarial Network) system for Super-resolution Restoration (SRR) of 10 m/pixel Sentinel-2 "true" colour images as well as all the other multispectral bands. In parallel, the ELF (automated image Edge detection and measurements of edge spread function, Line spread function, and Full width at half maximum) system is proposed to achieve automated and precise assessments of the effective resolutions of the input and SRR images. Subsequent ELF measurements of the TARSGAN SRR results suggest an averaged effective resolution enhancement factor of about 2.91 times (equivalent to ~3.44 m/pixel for the 10 m/pixel bands) given a nominal SRR upscaling factor of 4 times. Several examples are provided for different types of scenes from urban landscapes to agricultural scenes and sea-ice floes.


Introduction
Very high spatial resolution imaging data play an important role in many fields of Earth Observation (EO) applications, such as precision agriculture, forestry, urban planning, city intelligence, cartography, geology, oceanography, and energy and utility maintenance. Although there are very high spatial resolution imaging sensors, e.g., the 31 cm/pixel Digital Globe ® WorldView-3 images, the cost of such very high spatial resolution images is generally high, especially when and where large spatial-temporal volumes are required. On the other hand, while improvements in the spatial resolution are gaining priority in the design of new optical-electronic sensors onboard EO satellites, we still need to trade-off spatial resolution against spectral resolution, swath-width, signal-to-noise ratio of the sensor, launch mass, and requested telecommunications bandwidth. Subsequently, using super-resolution restoration (SRR/SR) to enhance existing EO data, especially those open access data, such as the European Space Agency's Copernicus Sentinel systems, is becoming an increasingly attractive alternative, especially if the resultant products can be employed to derive higher spatial resolution products like reflectance and derivatives of reflectance.
SRR refers to the process of enhancing (or increasing) the spatial resolution of images (or video frames) by exploiting non-redundant information from a set of repeat observations, or through a deep learning-based training and inference process. The growing technology interest in SRR, over the past 20 years, has led to the development and subsequent applications of many new algorithms, networks, and/or optimisations [1][2][3][4]. Classically, SRR was based on the idea of combining non-redundant information from multiple overlapping lower resolution (LR) images to produce the best estimation of a higher resolution (HR) image. This process was done either via image sub-pixel stacking [5,6], exploring the shifting and aliasing properties of the frequency domain [7][8][9], image degradation modelling [10][11][12], or multi-angle view modelling [13][14][15].
Over the past ten years, deep learning techniques have been very successful in the field of SRR due to their performance in terms of processing speed and flexibility over different input data. A variety of deep networks have been proposed over this time period to address the SRR problem. This includes the use of residual networks [16][17][18], recursive networks [19,20], selective attention networks [21,22], and Generative Adversarial Networks (GANs) [23][24][25][26][27]. Among these, the most recent works include Wide Activation Deep Residual SR network (WDSR) [17], the Residual Channel Attention Network (RCAN) [22], and the Multi-scale Adaptive weighted dense Residual SR GAN (MARSGAN) [27].
In particular, WDSR [17] improves the Enhanced Deep residual SR network (EDSR) [16], using slim residual blocks that have wider channels (2 to 9 times), while keeping the same parameter complexity. WDSR uses linear low-rank convolutions, which factorise large convolutional kernels into two low-rank convolutional kernels, and weight normalisation, to tackle the issues of the slimmed layer pathway and the training of very deep networks, respectively. RCAN [22] employs a residual-in-residual architecture and the Residual Channel Attention Blocks (RCABs) as its basic residual blocks, in order to rescale features adaptively by considering interdependencies between feature channels. MARSGAN [27] employs a very deep, densely connected, and adaptively weighted residual-in-residual architecture to further improve network capacity and information flow on top of the SR GAN network (SRGAN) [23] and the Enhanced SR GAN (ESRGAN) [25] network.
While many different ideas have been proposed to optimise the existing SRR networks, improvements in SRR performance have become more and more marginal if the modifications are purely based on the network architecture. Therefore, recent studies have been focused on either exploring different loss functions (e.g., exploring the perceptual-pleasing solutions [23][24][25]), or exploring the effect of using more realistic training datasets [28,29].
In particular, the Content Adaptive Re-sampler (CAR) based SRR model [28] employs a separately learned content-adaptive image downscaling model, which produces LR images that could keep the key information for best reproducing the HR images. The authors achieved state-of-the-art SRR performance in 2019/2020 using training datasets produced through CAR with an existing EDSR network [16]. Moreover, the authors in [29] constructed a "real-world" SRR training dataset (called RealSR), where paired LR and HR images are captured by adjusting the focal length of a digital camera, to replace the traditional synthetic training LR images, i.e., the bicubic down-sampled HR. RealSR achieved state-of-the-art SRR performance in 2020/2021 using their "real-world" training dataset with the existing RCAN network as well as a newly proposed Laplacian Pyramid-based Kernel Prediction Network (LP-KPN) [29].
In this paper, we further explore our in-house MARSGAN [27] model that was previously developed for Mars applications, using the Sentinel-2 10 m/pixel colour images and 10-60 m/pixel multi-spectral images. Inspired by [28,29], we propose practical modifications of the loss function, training dataset, and network architecture of MARSGAN, which we call learning Terrestrial image deblurring with Adaptive weighted dense Residual SR GAN (TARSGAN).
We show TARSGAN SRR results using the 10 m/pixel Sentinel-2 "true" colour images over a wide range of different types of natural and artificial surface features. These include SRR over urban sites (buildings, roads, cars, ships, airports), forestry and agriculture sites, and natural sites (mountains, deserts, the sea, the snow, and the sea-ice). Figure 1 gives an example of the 10 m/pixel Sentinel-2 "true" colour image and the 3.44 m/pixel TARSGAN SRR results over a geo-calibration site at Baotou, China. Moreover, we compare the spectral reflectance of the multispectral SRR product against the original Sentinel-2 multispectral product on all available spectral bands to demonstrate spectral invariance of the proposed TARSGAN system. Furthermore, we propose an automated image effective resolution assessment system, using automated image Edge detection and filtering, and automated measurements of Edge Spread Function (ESF), Line Spread Function (LSF), and Full Width at Half Maximum (FWHM); for brevity, this system is hereafter referred to as ELF. ELF is considered essential to building a streamlined and on-demand SRR processing system in the future, which would require automated SRR algorithm selection and performance evaluation.
The ELF measurements suggest a factor of 2.91 times of effective resolution improvement on top of the 10 m/pixel Sentinel-2 "true" colour images using the proposed TARSGAN SRR system. This suggests our Sentinel-2 TARSGAN SRR results have an averaged effective spatial resolution of about 3.44 m/pixel. More importantly, in contrast to other generative SRR networks, TARSGAN does not introduce synthetic textures and artefacts.
With the proposed TARSGAN SRR and ELF effective resolution assessment system, we believe Sentinel-2 global 10-60 m/pixel multispectral images can be transformed into 3-20 m/pixel multispectral SRR images, fully automatically in the near future (readers should refer to the conceptual implementation of a streamlined SRR processing system in Section 4.3), allowing better analytics to be performed in a transformative way.
The layout of this paper is as follows. In Section 2.1, we introduce the training and test dataset. In Sections 2.2 and 2.3, we introduce technical details of the TARSGAN SRR system. In Section 2.4, we describe the ELF image effective resolution assessment system. Experimental results of the ELF system, TARSGAN SRR of 10 m/pixel Sentinel-2 "true" colour images, and TARSGAN SRR of the 10-60 m/pixel multispectral images, are demonstrated in Section 3.1, Section 3.2, and Section 3.3, respectively. In Section 4, we discuss key issues, potential improvements, and future work before drawing conclusions in Section 5.

Datasets for Testing and Training
Our test dataset in this work consists of Sentinel-2 images. The Copernicus Sentinel-2 mission comprises a constellation of two identical polar-orbiting satellites (Sentinel-2A/2B), providing multi-spectral imagery at moderate spatial resolutions from 10 m/pixel to 60 m/pixel in its visible and near-infrared (VNIR) bands and short-wave infrared (SWIR) bands. Details of the Sentinel-2A/2B image spatial resolutions and spectral information can be found in Table 1. Sentinel-2A and Sentinel-2B are placed in the same sun-synchronous orbit, phased at 180° to each other. Sentinel-2A and Sentinel-2B have a wide swath width of 290 km and together provide a high revisit frequency (5 days at the equator and 2-3 days at mid-latitudes). Sentinel-2 data are accessible through the Copernicus open access hub (previously known as Sentinel scientific data hub; https://scihub.copernicus.eu/; accessed on 2 July 2021). In this work, we perform SRR testing with a wide range of Sentinel-2 images over 6 different testing sites covering different types of natural and artificial surface features/targets (see Table 2). The 6 test sites are located over Baotou/China, Dubai/United Arab Emirates, Hainich/Germany, London/UK, Desert Rock/US, and Lincoln Sea/Greenland. These surface features/targets include artificial structures, residential buildings, industrial buildings, farms, countryside roads, highway roads, tower buildings, ships, artificial islands, airports, airplanes, forest, isolated trees, hills, mountains, train stations, urban building blocks, urban landmarks, bridges, deserts, river, sea-ice, leads, open water, and snow-covered surfaces. In this work, we mainly demonstrate SRR results with the level 1 (L1C) images. However, for 2 of the 6 sites (Site-3 Hainich and Site-4 London), where atmospheric clarity is low, we also show SRR results from the level 2 (L2A) images. These form a total of 8 test Sentinel-2 images. Our training dataset is formed from Deimos-2 images provided by Deimos Imaging, S.L.
Deimos-2 is a high-resolution Earth observation satellite, owned and operated by Deimos Imaging, S.L. Deimos-2 collects 0.75 m/pixel panchromatic (PAN) band and 4 m/pixel Multi-Spectral (MS) band images with a swath width of 12 km (at nadir) from an orbit at ~600 km. The MS capability includes 4 channels: the visible Red, Green, and Blue bands and a near-infrared (NIR) band. In this work, our training dataset is constructed from 102 non-repeat and cloud-free Deimos-2 PAN band images (sampled at 1 m/pixel), which yield 300,512 pairs of LR and HR training samples. Instead of simply performing the "standard" bicubic down-sampling of the HR images (Deimos-2 PAN) to produce their LR counterparts, we use bicubic down-sampling followed by an average up-sampling and a Gaussian blurring operation, to form the degraded LR images at the same scale as the HR images (1 m/pixel).
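The LR/HR pair construction described above can be illustrated with a minimal NumPy sketch. It is not the processing chain used in this work: block averaging stands in for the bicubic down-sampling, pixel replication stands in for the up-sampling step, and the scale factor of 4 (taken from the nominal upscaling factor) and the Gaussian sigma of 1.5 pixels are assumptions. The function names are hypothetical.

```python
import numpy as np


def gaussian_kernel1d(sigma: float, radius: int) -> np.ndarray:
    """Normalised 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with reflect padding; output shape equals input shape."""
    radius = max(1, int(round(3 * sigma)))
    k = gaussian_kernel1d(sigma, radius)
    padded = np.pad(img, radius, mode="reflect")
    # Filter along rows, then along columns; mode="valid" removes the padding again.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def make_training_pair(hr: np.ndarray, scale: int = 4, sigma: float = 1.5):
    """Return (degraded LR at HR scale, cropped HR) for a 2-D greyscale patch.

    Assumptions: block-mean down-sampling (in place of bicubic), pixel
    replication up-sampling, and hypothetical values for scale and sigma.
    """
    h, w = hr.shape
    h, w = h - h % scale, w - w % scale          # crop to a multiple of scale
    hr = hr[:h, :w].astype(np.float64)
    low = hr.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    up = np.repeat(np.repeat(low, scale, axis=0), scale, axis=1)
    lr = gaussian_blur(up, sigma)                # blur applied at the HR scale
    return lr, hr


if __name__ == "__main__":
    patch = np.random.default_rng(0).random((64, 64))
    lr, hr = make_training_pair(patch)
    print(lr.shape, hr.shape)
```

Because the degraded image is returned at the HR pixel scale, a network trained on such pairs needs no up-scaling layer, which matches the design choice described in the next section.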

Key Modifications of MARSGAN
In contrast to "photo-enhancing" SRR tasks, the desired SRR outputs of remote sensing applications are fundamentally different. In remote sensing applications, higher signal-to-noise ratio (SNR), minimised artefacts, sharper edges and object outlines, and ultimately, a higher image effective resolution, are much more desirable than "recreating" high-frequency textures and/or objects. The original designs of SRGAN [23] and ESRGAN [25] are based on the idea that human vision does not care if the generated high-frequency textures are not strictly correlated with the ground truth as long as they look realistic. Such generated high-frequency textures can significantly improve the "perceptual sharpness" but are considered artefacts in remote sensing or scientific applications. For example, satellite image users probably do not want a synthetic map even if it looks extremely real. Therefore, we consider perceptual quality-driven SRR techniques unsuitable for direct use in remote sensing applications. In the original work of MARSGAN [27], experiments were made to reduce the weights of the perceptual loss terms, but consequently, the edge sharpness was also lowered as a trade-off for reducing the high-frequency artefacts.
In this work, we base our model on the MARSGAN architecture [27] but abandon the idea of training the model with weighted perceptual loss. Instead, we use a structural similarity loss (see Section 2.3 for details) to reconstruct sharper outlines, without recreating any synthetic textures/objects. Moreover, the authors in [28,29] demonstrated that the information contained within the LR image actually plays an important role in successive SRR restoration on top of purely improving a network architecture. Inspired by this, we propose to bring the LR image into the same passive resolution as the HR image and apply a blurring operation at the HR scale, to better model the fuzzy appearance of an LR image after being upscaled or unsuccessfully super-resolved. This is based on the observation that, although high-frequency components can be effectively learned in the LR space with upscaling convolutions at the end (as shown in the early works of [30,31]), the information about the blurring effect is not well preserved in the LR space. For example, an over-smoothed edge of an image could be seen as a sharp zigzag edge after the down-sampling of the image. Figure 2 shows an example of the TARSGAN training LR image that was created via the proposed down-sampling, up-sampling, and Gaussian blur operations (refer to Section 2.1) of the 1 m/pixel Deimos-2 PAN band image, in comparison to the "standard" training LR image created from a simple bicubic down-sampling operation as generally used in SRR works.
In summary, our goal in this work is to limit the room for the SRR network to learn "synthetic SRR" and encourage the network to learn "deblurring SRR". We focus on training a deblurring-oriented SRR network, i.e., TARSGAN, that fits the goal of remote sensing SRR applications, by modifying the loss function, removing the up-scaling process of the original MARSGAN system [27], and constructing a new training dataset that preserves the blurring information.

The TARSGAN System
The backbone of our proposed TARSGAN system is the MARSGAN model [27], which itself is based on a GAN framework [32][33][34]. GAN provides an efficient framework for learning generative tasks like SRR. As described in the fundamental work of [32,33], GAN trains a generative model for SRR, whilst in parallel, it trains a discriminator model to distinguish the predicted SRR image from the ground-truth HR image. Through alternating updates of the two adversarial networks, the generative model is trained to produce SRR images that are barely distinguishable from the HR images. For TARSGAN, we apply two practical modifications to the original MARSGAN system as follows.
Firstly, the adaptive weighted multi-scale reconstruction block is removed, as the training and testing LR images are pre-upsampled to the same passive resolution as the HR images, as described previously. We denote the SRR image as $I_{SRR}$, the LR image as $I_{LR}$, and the HR ground truth as $I_{HR}$. The TARSGAN generator can be simplified from the MARSGAN generator [27] as $N$ ($N = 16$ in this work) layers of Adaptive-weighted Residual-in-Residual Dense Blocks (AWRRDBs), where these layers exclude the first and the last layers. The first layer has 64 filters of size 3 × 3 for initial feature extraction, denoted as $f_{ext}$, and the last layer has a single filter of size 3 × 3 × 64 for SRR image reconstruction, denoted as $f_{rec}$. Denoting the $n$-th ($n \in N$) AWRRDB unit as $f^{n}_{AWRRDB}$, the TARSGAN generator can be expressed as

$$I_{SRR} = f_{rec}\left(f^{N}_{AWRRDB}\left(f^{N-1}_{AWRRDB}\left(\cdots f^{1}_{AWRRDB}\left(f_{ext}\left(I_{LR}\right)\right)\cdots\right)\right)\right) \tag{1}$$

It should be noted that we use fewer AWRRDB layers in TARSGAN ($N = 16$) in comparison to MARSGAN ($N = 23$), as needed to reduce the computation cost when having the LR image at the same scale as the HR image. However, we empirically found that because TARSGAN is not trained with a perceptual loss term, the improvement from stacking more AWRRDB layers is marginal. For AWRRDBs, we use the same architecture as described in MARSGAN [27], i.e., each AWRRDB contains 3 dense blocks, and each dense block contains 5 convolutional layers (3 × 3 kernels, 32 feature maps, stride 1) and 4 Leaky Rectified Linear Unit (LReLU) activations with a negative slope of 0.2. The generator network architecture of TARSGAN is shown in Figure 3. For a detailed description of the AWRRDB blocks, please refer to [27].
Secondly, we redefine the total loss function, denoted as $l_{total}$, as a weighted sum of the Mean Squared Error (MSE) loss, denoted as $l_{MSE}$, the adversarial loss, denoted as $l_{gen}$, and the Structural Similarity (SSIM) [35] loss, denoted as $l_{SSIM}$. SSIM is a commonly used metric in image reconstruction tasks; in particular, it has been widely used in unsupervised image depth estimation tasks to quantify the differences between a back-projected image and the reference image (e.g., [36,37]), as well as being an evaluation metric in many SRR works, representing the retrieval quality of structural features. SSIM is derived using patterns of pixel intensities among neighbouring pixels with normalised brightness and contrast as introduced in [35]. For the generated target image $I_{SRR}$ and the reference truth $I_{HR}$, $SSIM(I_{SRR}, I_{HR})$ can be formulated as

$$SSIM(I_{SRR}, I_{HR}) = \frac{\left(2\mu_{I_{SRR}}\mu_{I_{HR}} + C_1\right)\left(2\sigma_{I_{SRR},I_{HR}} + C_2\right)}{\left(\mu_{I_{SRR}}^{2} + \mu_{I_{HR}}^{2} + C_1\right)\left(\sigma_{I_{SRR}}^{2} + \sigma_{I_{HR}}^{2} + C_2\right)} \tag{2}$$

where $\mu_{I_{SRR}}$, $\mu_{I_{HR}}$, $\sigma_{I_{SRR}}$, $\sigma_{I_{HR}}$, and $\sigma_{I_{SRR},I_{HR}}$ are the local means, standard deviations, and cross-covariance of $I_{SRR}$ and $I_{HR}$, respectively. $C_1$ and $C_2$ are constants based on the dynamic range of pixel values. As SSIM has an upper bound of 1, $l_{SSIM}$ can be defined as

$$l_{SSIM} = 1 - SSIM(I_{SRR}, I_{HR}) \tag{3}$$

The other two loss terms, i.e., $l_{MSE}$ and $l_{gen}$, are the same as the ones described in [27]. The total loss $l_{total}$ of TARSGAN can be expressed as

$$l_{total} = \gamma\, l_{MSE} + \lambda\, l_{gen} + \eta\, l_{SSIM} \tag{4}$$

where $\gamma$, $\lambda$, and $\eta$ are the weights to balance the pixel-wise MSE loss, the adversarial loss of the discriminator, and the SSIM loss. In practice, TARSGAN is initialised with $\gamma = 1$. The initial learning rate is $10^{-4}$, and standard Adam optimisation [38] is used with $\beta_1 = 0.9$ and $\beta_2 = 0.99$. Training and testing are performed on an Nvidia® RTX 3090 GPU (Graphics Processing Unit).
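To make the loss terms concrete, the following NumPy sketch evaluates a windowed SSIM, the SSIM loss (taken here as one minus SSIM, following the upper bound noted above), and a weighted total loss. It is illustrative only and not the training code of this work: the uniform 8 × 8 sliding window is an assumption, the constants use the conventional defaults of the original SSIM formulation [35] ($K_1 = 0.01$, $K_2 = 0.03$), the adversarial term is passed in as a placeholder scalar because no discriminator is modelled, and the default values of `lam` and `eta` are hypothetical. An actual training loop would use a differentiable framework.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssim(x: np.ndarray, y: np.ndarray, win: int = 8, data_range: float = 1.0) -> float:
    """Mean SSIM over uniform sliding windows of two 2-D images."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    xw = sliding_window_view(x.astype(np.float64), (win, win))
    yw = sliding_window_view(y.astype(np.float64), (win, win))
    mu_x = xw.mean(axis=(-2, -1))
    mu_y = yw.mean(axis=(-2, -1))
    var_x = xw.var(axis=(-2, -1))
    var_y = yw.var(axis=(-2, -1))
    cov_xy = (xw * yw).mean(axis=(-2, -1)) - mu_x * mu_y
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
    return float(ssim_map.mean())


def total_loss(sr, hr, adv_loss, gamma=1.0, lam=0.0, eta=0.0) -> float:
    """Weighted sum of the MSE, adversarial, and SSIM losses.

    adv_loss is a placeholder scalar; lam and eta defaults are hypothetical.
    """
    l_mse = float(np.mean((sr - hr) ** 2))
    l_ssim = 1.0 - ssim(sr, hr)
    return gamma * l_mse + lam * adv_loss + eta * l_ssim


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hr = rng.random((32, 32))
    sr = np.clip(hr + 0.1 * rng.standard_normal(hr.shape), 0.0, 1.0)
    print(ssim(sr, hr), total_loss(sr, hr, adv_loss=0.0, eta=1.0))
```

Setting `eta` to zero reduces the sketch to an MSE-driven objective, while increasing it pushes the optimisation towards structural agreement with the HR reference.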
As discussed in [23,27], the potential texture details from an SRR network are typically synthetic textures (if not absent) and therefore cannot be "pixel-to-pixel" matched with the ground truth HR, thus leading to a smoother solution that averages all potential synthetic solutions when an MSE loss is used. Optimising an SRR network with the MSE loss generally results in a smoothed reconstruction, albeit with fewer synthetic artefacts. In TARSGAN, we initialise the network first towards a smoother solution with respect to the HR image to resolve large-scale and intermediate-scale features. Then the network is refined towards a better structural similarity measurement with respect to the HR image to resolve sharper edges and the shapes/outlines of small objects that are visible (but blurred) in the LR image. For small objects or textures that are fundamentally not visible in the LR image, we do not try to re-create them with TARSGAN.

The ELF System
In parallel to the TARSGAN SRR system, we also propose the ELF automated image effective resolution assessment system. The design of ELF is based on the Imatest® (see https://www.imatest.com/; accessed on 2 July 2021) slanted-edge method and previous collaborative work within the UK Space Agency funded SuperRes-EO project using FWHM to assess the image effective resolutions. ELF measures the averaged FWHM of all detectable slanted edges within an SRR image and compares it against the averaged FWHM of the same edges within the corresponding LR image. The overall workflow of ELF is shown in Figure 4.
ELF takes the SRR image and the reference LR image, which is up-sampled to the same scale as the SRR image, as inputs, and follows 9 processing steps that are briefly described below.
(1) Create a binary image from the input SRR image using the Otsu adaptive thresholding method [39].
(2) Use a Canny edge detector [40] to extract all potential edges.
(3) Use a Hough transform [41] to detect potential lines from the output of (2) and filter for the given thresholds of lengths, gaps, and intersections.
(4) Crop any number of regions of interest (ROIs) centred on the filtered lines and apply the same cropping for the same areas with the same sizes using the corresponding LR image.
(5) Perform image normalisation within each crop for both the crops from SRR and the crops from LR.
(6) Calculate and plot the ESF for each slanted edge within each normalised crop from (5).
(7) Filter each continuous ESF and only leave the peak ESF for each slanted edge.
(8) Calculate and plot the LSF for each ESF from (7).
(9) Calculate the FWHM for each LSF from (8) and calculate the mean FWHM for the SRR and LR images.
The final calculated mean FWHM (MFWHM) over all detected slanted edges is used to assess the image effective resolution with respect to the given native resolution of the LR image. The final effective resolution of the Sentinel-2 SRR image can be estimated by calculating the ratio, denoted as $\beta$, of $\mathrm{MFWHM}(I_{LR})$ to $\mathrm{MFWHM}(I_{HR})$, which is proportional to the resolution enhancement factor, denoted as $\alpha$. This is expressed as

$$\alpha = \frac{\mathrm{Res}(I_{LR})}{\mathrm{Res}(I_{HR})} \propto \beta = \frac{\mathrm{MFWHM}(I_{LR})}{\mathrm{MFWHM}(I_{HR})} \tag{5}$$

where $\mathrm{Res}(I_{LR})$ and $\mathrm{Res}(I_{HR})$ represent the native image resolution of $I_{LR}$ and $I_{HR}$, respectively, and $\mathrm{MFWHM}(I_{LR})$ and $\mathrm{MFWHM}(I_{HR})$ represent the averaged FWHM of all detected slanted edges of the same areas of $I_{LR}$ and $I_{HR}$, respectively. The relationship between α and β can be explored, and validation performed, using the adjacent bands of the 10 m/pixel, 20 m/pixel, and 60 m/pixel Sentinel-2 images.
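Steps (6), (8), and (9) can be sketched on a synthetic edge as follows. This is a minimal illustration under stated assumptions rather than the ELF implementation: the edge detection, cropping, and ESF filtering steps are omitted, the ESF is modelled as a Gaussian-blurred step sampled at a hypothetical 0.1-pixel spacing, and all function names are illustrative. For such an edge the LSF is a Gaussian, whose FWHM is $2\sqrt{2\ln 2}\,\sigma \approx 2.355\,\sigma$, which gives a reference value to check against.

```python
import math

import numpy as np


def synthetic_esf(x: np.ndarray, sigma: float) -> np.ndarray:
    """Hypothetical test edge: an ideal step blurred by a Gaussian of width sigma."""
    return np.array([0.5 * (1.0 + math.erf(v / (sigma * math.sqrt(2.0)))) for v in x])


def lsf_from_esf(esf: np.ndarray, dx: float = 1.0) -> np.ndarray:
    """Line spread function as the absolute first derivative of the ESF."""
    return np.abs(np.gradient(esf, dx))


def fwhm(lsf: np.ndarray, dx: float = 1.0) -> float:
    """FWHM of a single-peaked LSF, using linear interpolation at half maximum."""
    half = lsf.max() / 2.0
    above = np.where(lsf >= half)[0]
    left, right = above[0], above[-1]
    if left == 0 or right == len(lsf) - 1:
        raise ValueError("LSF does not fall below half maximum inside the window")
    # Sub-sample crossing positions on the rising and falling flanks.
    x_left = (left - 1) + (half - lsf[left - 1]) / (lsf[left] - lsf[left - 1])
    x_right = right + (lsf[right] - half) / (lsf[right] - lsf[right + 1])
    return float((x_right - x_left) * dx)


def mfwhm_ratio(fwhm_lr, fwhm_hr) -> float:
    """Ratio of mean FWHM values (LR over HR) across a set of edges."""
    return float(np.mean(fwhm_lr) / np.mean(fwhm_hr))


if __name__ == "__main__":
    x = np.linspace(-10.0, 10.0, 201)      # positions in pixels, 0.1 px spacing
    dx = x[1] - x[0]
    sharp = fwhm(lsf_from_esf(synthetic_esf(x, 1.5), dx), dx)
    blurred = fwhm(lsf_from_esf(synthetic_esf(x, 3.0), dx), dx)
    print(sharp, blurred, mfwhm_ratio([blurred], [sharp]))
```

In this sketch the blurred edge has twice the Gaussian width of the sharp one, so the ratio of the two FWHM values comes out close to 2.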
In Section 3.1, we initially calculate β using the Sentinel-2 20 m/pixel B05 and 10 m/pixel B04 images (for α = 2), and then calculate β (for α = 2, 3, 6) using the 10 m/pixel B08, 20 m/pixel B8A, and 60 m/pixel B09 images to ensure that the inter-comparison is done with spectrally close channels.

Estimation of Image Effective Resolution through ELF
In order to estimate the image effective resolution through the ELF measurements, we perform two experiments based on the original Sentinel-2 10 m/pixel, 20 m/pixel, and 60 m/pixel images. The first experiment (Exp-1) is based on the 10 m/pixel B04 image and 20 m/pixel B05 image. The second experiment (Exp-2) is based on the 10 m/pixel B08 image, 20 m/pixel B8A image, and 60 m/pixel B09 image.
We calculate $\mathrm{MFWHM}(I_{HR})$ and $\mathrm{MFWHM}(I_{LR})$ in Equation (5) using all detectable edges from $I_{HR}$ and the same slanted edges at the same locations from $I_{LR}$, respectively, thus ensuring that the value of β can be computed for a specific image crop. The purpose of Exp-1 (using B05/B04) is to validate against Exp-2 (using B8A/B08), to check if there is any significant difference in the calculated β when α is fixed (α = 2). As the computed β is demonstrated to be similar for B05/B04 and for B8A/B08, we repeat Exp-2 with 4 cropped images of size 8 km × 8 km to calculate the mean values of β for 2 times, 3 times, and 6 times resolution differences, using B08 and B8A (α = 2), B8A and B09 (α = 3), and B08 and B09 (α = 6), respectively.
In Figure 5, we demonstrate two examples (two detected slanted edges) within one 8 km × 8 km crop (at Site-1) using the 10 m/pixel B04 image (I_HR) and the 20 m/pixel B05 image (I_LR). For all measurement records of all detected slanted edges, please refer to the Supplementary Material. There are 29 valid FWHM measurements out of 77 detected slanted edges for this image crop. The calculated MFWHM(I_HR) is 3.19 pixels and MFWHM(I_LR) is 3.31 pixels, which suggests a β of 1.037 for an α of 2. In Figure 6, we demonstrate two other examples (two detected slanted edges) within the same 8 km × 8 km crop (at Site-1) but using three bands, i.e., 10 m/pixel B08 (I_HR), 20 m/pixel B8A (I_LR), and 60 m/pixel B09 (I_LR). The FWHM measurements of the B08 image are compared against the FWHM measurements of the B8A image for α = 2, and also against the FWHM measurements of the B09 image for α = 6. There are 41 and 16 valid FWHM measurements out of 105 detected slanted edges for B08 and B8A and for B08 and B09, respectively. The calculated MFWHM(I_HR) and MFWHM(I_LR) for B08 and B8A are 3.14 pixels and 3.42 pixels (averaged from 41 FWHM measurements), respectively. The calculated MFWHM(I_HR) and MFWHM(I_LR) for B08 and B09 are 3.33 pixels and 4.89 pixels (averaged from 16 FWHM measurements), respectively. These suggest an average β of 1.057 for α = 2, which is close to the measurements in Exp-1 (β = 1.037), and a β of 4.086 for α = 6.
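As a worked example of the ratio β for Exp-1, the short sketch below divides the two mean FWHM values quoted above for B05 (I_LR) and B04 (I_HR). The helper name `beta_ratio` is hypothetical, and the sketch assumes β is simply the ratio of the two published means; the result is approximately 1.04, in line with the reported 1.037, and a small difference would be expected if the reported value was derived from unrounded measurements.

```python
def beta_ratio(mfwhm_lr, mfwhm_hr):
    """Ratio of mean FWHM values, each measured in pixels of its own image."""
    if mfwhm_hr <= 0:
        raise ValueError("mean FWHM of the HR image must be positive")
    return mfwhm_lr / mfwhm_hr


# Exp-1 values quoted in the text: B04 (I_HR) 3.19 pixels, B05 (I_LR) 3.31 pixels.
beta_exp1 = beta_ratio(3.31, 3.19)
print(f"beta for alpha = 2 (B05/B04): {beta_exp1:.3f}")
```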

Demonstration of TARSGAN SRR Results and Subsequent ELF Assessment
In order to demonstrate SRR performance over different features and targets, we show the results of six test sites with eight images, including six Sentinel-2 L1C images and two L2A images. For each Sentinel-2 image, we show four small crops (250 × 250 pixels each, with a nominal spatial resolution of 2.5 m/pixel) that cover a variety of different features and targets of interest, as summarised in Table 2. N.B. for more detail, please refer to the original full-size SRR images provided in the Supplementary Material.
Figure 7 shows cropped examples (625 m × 625 m) of the 10 m/pixel Sentinel-2 "true" colour image (L1C) and TARSGAN SRR result of Site-1, which is located over Baotou, Inner Mongolia, China. It is part of the CEOS-WGCV (Committee on Earth Observation Satellites Working Group on Calibration and Validation) geometric calibration site described in [42,43]. Area-1 shows the artificial geo-calibration targets together with a few buildings and roads. Area-2 shows a dome-shaped building in the centre surrounded by gardens and roads. Area-3 shows farms with linear farm roads. Area-4 shows industrial building blocks. We can observe from the SRR image that the black and white geo-calibration targets in Area-1 and the farms in Area-3 are brought out with clearer outlines. The buildings and roads in Area-1, 2 and 4 can be identified more easily from the SRR image in comparison to the original Sentinel-2 image. No artefacts are found in the SRR image for the four areas of Site-1.
Figure 8 shows cropped examples (625 m × 625 m) of the 10 m/pixel Sentinel-2 "true" colour image (L1C) and TARSGAN SRR result of Site-2, which is located over Dubai, United Arab Emirates. Area-1 shows many tower buildings in the city centre. Area-2 shows two ships sailing on the sea close to the beach. Area-3 shows an artificial island off the nearby beach with flat buildings and roads. Area-4 shows an airport with parked airplanes and viaducts. We can observe from the SRR image that the flat and tower buildings are much clearer, and the roads are well resolved in Area-1, 3 and 4. The ships and airplanes in Area-2 and 4 can be better identified from the SRR image. For Area-2 in particular, fine-scale waves are revealed in the SRR image. No synthetic artefacts (e.g., SRR generated objects/textures) are found in the SRR image for the four areas of Site-2.
Figure 9 shows cropped examples (625 m × 625 m) of the 10 m/pixel Sentinel-2 "true" colour image (L1C) and TARSGAN SRR results of Site-3, which is located over a forestry and agriculture area near Hainich, Germany. Area-1 shows yellow (possibly rapeseed) and green coloured farms and a farmhouse in the centre. Area-2 shows an area of forest. Area-3 shows gridded farms under some thin clouds. Area-4 shows a terraced field with farms on the ground. We can observe from the SRR image that the boundaries of the farms can be clearly identified in Area-1, 2 and 4. The farmhouse in Area-1 can be seen with clear outlines.
Although the atmospheric clarity is very low in the L1C image, some of the individual trees in Area-2 can still be identified from the SRR image. In order to compare with the SRR results from the atmospherically corrected L2A images, we show different cropped areas for the same site in Figure 10, focusing on the agricultural fields. In Figure 10, Area-1 shows some dark green coloured farms with farm roads. Area-2 shows a mixture of light green and dark green coloured farms with farm roads. Area-3 shows a small village surrounded by farms. Area-4 shows light green coloured farms with a road in the middle. We can observe from the SRR image that the boundaries of the farms are clearly brought out and the narrow farm roads are much more visible in comparison to the original Sentinel-2 image. No artefacts are found in the SRR images for the eight areas of Site-3.
Figure 11 shows cropped examples (625 m × 625 m) of the 10 m/pixel Sentinel-2 "true" colour image (L1C) and TARSGAN SRR result of Site-4, which is located over London, UK. Area-1 shows the London Bridge train station and small buildings. Area-2 shows very dense building blocks and bridges crossing the River Thames. Area-3 shows more urban building blocks. Area-4 shows bridges and ships on the River Thames. We can observe from the SRR image that the building blocks in Area-2 and 3 look more realistic than in the original Sentinel-2 image. The outlines of the bridges in Area-2 and 4 are clearer, and cars on the bridges in Area-4 are identifiable in the SRR image. Although image quality is lowered by haze in the L1C image, the edges of the different objects in all four areas are significantly improved with SRR. In comparison, we show an L2A image and its corresponding SRR result in Figure 12 for London over different landmarks. Although the L2A image shows better contrast and more vivid colours, some stretching (intensity clipping) issues caused overexposure over very bright objects. Area-1 shows the millennium wheel ("London Eye") under some thin clouds. Area-2 shows the London Stadium. Area-3 shows the London commercial centre at Canary Wharf with many tower buildings. Area-4 shows Kensington Gardens under some thin clouds. We can observe that the super-resolved landmarks can be more easily identified from the SRR image. In particular, the millennium wheel and the paths on the grass in Area-1 and the garden path of Kensington Gardens in Area-4 can all be better identified from the SRR image. Some buildings in Area-2 and 3 are overexposed, but their shapes and outlines are clearer in the SRR image. No artefacts are found in the SRR images for the eight areas of Site-4.
Figure 13 shows cropped examples (625 m × 625 m) of the 10 m/pixel Sentinel-2 "true" colour image (L1C) and TARSGAN SRR result of Site-5, which is in a rural area over Desert Rock, near Sedona and Flagstaff, Arizona, U.S.A. Area-1 shows a mountain peak and a segment of the hillside road. Area-2 shows a mixture of desert and trees. Area-3 shows a yellow river in the desert. Area-4 shows a rural desert surface. Probably because of the atmospheric clarity and the lack of patterns, the quality of the SRR image for this site is lower than for the other sites. However, the shape of the mountain peak in Area-1, the outlines of the individual trees in Area-2 and 4, and the outline of the river in Area-3 show significant improvements in the SRR image over the original Sentinel-2 image.
The effective resolution enhancement factors (last column of Table 4, Avg. α) are calculated using the averaged value of β for each site (second-to-last column of Table 4, Avg. β) and the calibrated relationship between α and β (α ∝ β) shown in Table 3. It should be noted that the exemplar image crops are very small (250 × 250 pixels); for the forest, sea, and desert crops, as well as some crops that are affected by severe haze or lie under thin clouds, it is impossible to obtain any valid slanted-edge ROIs from ELF (these missing values are marked as "-" in Table 4). The total averaged effective resolution enhancement factor (Avg. α) of 2.91 times suggests that the TARSGAN SRR results have an averaged effective resolution of 3.44 m/pixel, in comparison to the 10 m/pixel Sentinel-2 inputs.
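The conversion from an enhancement factor to an effective resolution is a single division, shown here as a small sketch. The function name `effective_resolution` is hypothetical; the inputs (10 m/pixel native resolution, a nominal upscaling factor of 4, and an averaged effective enhancement factor of 2.91) are the values quoted in the text.

```python
def effective_resolution(native_res_m, enhancement_factor):
    """Resolution (m/pixel) implied by a given enhancement factor."""
    if enhancement_factor <= 0:
        raise ValueError("enhancement factor must be positive")
    return native_res_m / enhancement_factor


nominal = effective_resolution(10.0, 4.0)     # pixel size of the 4x SRR grid
effective = effective_resolution(10.0, 2.91)  # resolution suggested by ELF
print(f"nominal: {nominal:.2f} m/pixel, effective: {effective:.2f} m/pixel")
```

The gap between the two numbers reflects the distinction drawn in the text: the SRR grid is sampled at 2.5 m/pixel, while the measured edge sharpness corresponds to a coarser effective resolution.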

Results from Multispectral Bands
The proposed TARSGAN model can be used to improve the image effective resolution of any of the multispectral bands without changing their spectral properties. In order to demonstrate the spectral invariance of TARSGAN SRR, we test on each individual band over two test images (both cropped to a size of 30 km × 30 km): one from Site-2 (covering different surface features of urban, desert, and water; image ID: S2B_MSIL2A_20210528T064629_N0300_R020_T40RCN_20210528T091914) and the other from Site-3 (covering forestry and agriculture features; image ID: S2B_MSIL2A_20210531T101559_N0300_R065_T32UNB_20210531T140040). Figure 15 shows, for Site-2, intercomparisons of all spectral bands of the SRR product against the original Sentinel-2 L2A surface reflectance product. It should be noted that the SRR images are down-sampled to the same scale as the L2A product in order to make the comparison. For the 60 m bands, 100 × 100 pixels are plotted; for the 20 m bands, 300 × 300 pixels are plotted for the same area; for the 10 m bands, 300 × 300 pixels are plotted for a smaller area. For a detailed comparison of the full area of Site-2, please refer to the multispectral SRR product and corresponding Sentinel-2 L2A product, provided in the Supplementary Material.
Considering the resolution gap, the SRR surface reflectance values all show good correlations against the original Sentinel-2 L2A surface reflectance values, with the majority of the pixels lying on the 1:1 line, as can be observed from the individual scatter plots of Figure 15. Figure 16 shows the Site-3 intercomparisons of all spectral bands of the SRR product against the original Sentinel-2 L2A surface reflectance product. The same sampling rates are used as for the Site-2 intercomparisons. Similarly, a good correlation between the multispectral SRR product and the original Sentinel-2 L2A product can be observed. For a detailed comparison of the full area of Site-3, please refer to the multispectral SRR product and corresponding Sentinel-2 L2A product, provided in the Supplementary Material.
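A band-wise comparison of this kind can be sketched as follows. The text does not state which down-sampling method was used, so block averaging is an assumption here, as are the function names `block_mean` and `compare_bands` and the synthetic arrays that stand in for real reflectance bands.

```python
import numpy as np


def block_mean(img, factor):
    """Down-sample a 2-D array by averaging non-overlapping factor x factor blocks."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor  # trim so the shape divides evenly
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def compare_bands(srr_band, l2a_band, factor=4):
    """Return (Pearson r, mean difference) between a down-sampled SRR band and L2A."""
    srr_lr = block_mean(np.asarray(srr_band, dtype=float), factor)
    ref = np.asarray(l2a_band, dtype=float)
    if srr_lr.shape != ref.shape:
        raise ValueError("down-sampled SRR band and L2A band differ in shape")
    r = float(np.corrcoef(srr_lr.ravel(), ref.ravel())[0, 1])
    bias = float(np.mean(srr_lr - ref))
    return r, bias


# Synthetic stand-ins (not Sentinel-2 data): an "L2A" band and a 4x finer
# "SRR" band that carries the same values plus a small random deviation.
rng = np.random.default_rng(0)
l2a = rng.random((50, 50))
srr = np.kron(l2a, np.ones((4, 4))) + rng.normal(0.0, 0.01, (200, 200))
r, bias = compare_bands(srr, l2a, factor=4)
print(f"correlation: {r:.4f}, mean difference: {bias:.5f}")
```

A correlation close to 1 and a mean difference close to 0 correspond to points lying along the 1:1 line in a scatter plot.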

From MARSGAN to TARSGAN
In this work, we explore the MARSGAN [27] model with the Sentinel-2 L1C and L2A images. Two practical modifications were applied to the original MARSGAN network to form a new model, which we call TARSGAN. The first modification is removing the adaptive weighted multi-scale reconstruction block and using pre-upsampled and blurred training LR images, so that the network better learns the blurring effect of the LR images and thus better resolves small and blurry objects. The second modification is replacing the perceptual loss used in MARSGAN with an SSIM loss to obtain better overall edge sharpness, whilst avoiding adding synthetic or stochastic textures or artefacts to the data. Figure 17 shows some examples from Site-1 demonstrating the impact of the two modifications to MARSGAN, with the original 10 m/pixel Sentinel-2 image, the MARSGAN SRR as described in [27] (MARSGANv0), the MARSGAN SRR trained with the up-sampled and blurred LR dataset using the perceptual loss (MARSGANv1), and the MARSGAN SRR trained with the up-sampled and blurred LR dataset using the SSIM loss (TARSGAN). From Figure 17, we can observe that the proposed training strategy (MARSGANv1) results in the sharpest edges among the three, but with a synthetic appearance, and that replacing the perceptual loss with the SSIM loss (TARSGAN) eliminates the synthetic artefacts while keeping comparable edge sharpness.
Figure 17. Cropped examples of Site-1 of the original 10 m/pixel Sentinel-2 image, the MARSGAN SRR as described in [27] (MARSGANv0), the MARSGAN SRR trained with up-sampled and blurred LR using the perceptual loss (MARSGANv1), and the MARSGAN SRR trained with up-sampled and blurred LR using the SSIM loss (TARSGAN). All subfigures are self contrast stretched and have sizes of 625 m × 625 m.
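To make the idea of an SSIM loss concrete, the sketch below computes a simplified SSIM from global image statistics and turns it into a loss as 1 − SSIM. This is an illustration only: SSIM is commonly evaluated over local windows, network training would use a differentiable framework, and the full TARSGAN objective contains further terms that are not shown here. The function names, the constants 0.01 and 0.03 (the values conventionally used in the SSIM definition), and the synthetic images are assumptions.

```python
import numpy as np


def ssim_global(x, y, data_range=1.0):
    """Simplified SSIM using whole-image means, variances and covariance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


def ssim_loss(sr, hr, data_range=1.0):
    """Loss that is 0 for identical images and grows as structure diverges."""
    return 1.0 - ssim_global(sr, hr, data_range)


# Synthetic stand-ins (not Sentinel-2 data): a reference and a noisy estimate.
rng = np.random.default_rng(1)
hr = rng.random((64, 64))
sr = np.clip(hr + rng.normal(0.0, 0.1, hr.shape), 0.0, 1.0)
print(f"loss(identical): {ssim_loss(hr, hr):.6f}, loss(noisy): {ssim_loss(sr, hr):.6f}")
```

Because SSIM compares luminance, contrast and structure against the reference rather than deep feature activations, it rewards faithful edges without encouraging the network to invent texture, which matches the motivation given above for replacing the perceptual loss.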

Potential Improvements to TARSGAN and ELF
In this paper, we focused on the demonstration of the TARSGAN SRR results for Sentinel-2 "true" colour and multispectral images, alongside a demonstration of the automated image effective resolution assessment. The two proposed modifications of MARSGAN are considered light-touch with respect to the original network architecture. In the future, we would like to test more optimisation ideas to further improve the TARSGAN SRR results. One example is training the network in a cascaded manner, i.e., providing the intermediate 2× resolution HR images as supervision data to better guide the network towards learning artefact-free 4× resolution enhancement. Alternatively, similar to [28], training a separate model to learn a comprehensive set of degradation effects of LR images could help in constructing a better training dataset that does not focus only on the blurring effect of the LR images. Other work, such as separating the training datasets for different features (e.g., separating forestry scenes from urban scenes), may help improve the SRR performance for a particular category of applications.
On the other hand, the proposed ELF system is capable of directly assessing the image effective resolution, which is considered to be a better metric for representing the performance of an SRR algorithm than the commonly used subjective quality metrics or perceptual indexes [44,45]. ELF does not need a reference HR image, which is generally difficult or expensive to obtain. However, ELF is not always applicable if the test images are very small and do not contain sufficient edge features. In the future, we would like to extend the ELF system with automated circle feature detection and ring measurements to improve its robustness over scenes that have insufficient edge features. Moreover, incorporating subjective quality metrics or perceptual indexes, as well as automated detection of the smallest resolvable objects, could also improve the performance of the image effective resolution assessment system.

A Future Streamlined SRR System
Over the last few years, deep learning-based techniques have achieved significant success in the field of SRR due to the richly available training datasets (as no labelling is required) and significantly faster processing speed in comparison to the traditional SRR approaches. However, some suboptimal restorations or occasional false predictions cannot be fully eliminated when using a single deep learning-based SRR network. Even with any state-of-the-art algorithms/models, the quality of the SRR results may still differ from dataset to dataset, from scene to scene, and from one area to another area. Therefore, we propose the concept of employing a streamlined SRR system on a GPU or GPU cloud server that is capable of achieving automated algorithms/networks selection and results assessments, which will provide the optimal solution for automated SRR processing of all sorts of input EO datasets. Figure 18 shows a conceptual implementation of this future streamlined SRR processing system using a processing scheduler with web-based data delivery, a combination of different SRR algorithms/networks as the "core processor", and a combination of the proposed ELF system and other image quality assessment metrics as the "quality assessor".

Figure 18. Proposed future streamlined SRR processing system based on automated SRR and quality assessments.
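The selection logic of such a streamlined system, i.e., several SRR algorithms/networks acting as the "core processor" and a "quality assessor" that scores their outputs, can be sketched in a few lines. Everything in this sketch is hypothetical: the function `select_best_srr`, the toy processors, and the toy scoring function merely stand in for real SRR networks and for an ELF-style effective resolution measurement, and the scheduler and web-based delivery components are not modelled.

```python
from typing import Any, Callable, Dict, Tuple


def select_best_srr(
    lr_image: Any,
    processors: Dict[str, Callable[[Any], Any]],
    assessor: Callable[[Any, Any], float],
) -> Tuple[str, Any, float]:
    """Run every candidate SRR processor and keep the highest-scoring result."""
    if not processors:
        raise ValueError("no SRR processors were supplied")
    best_name, best_result, best_score = "", None, float("-inf")
    for name, run in processors.items():
        result = run(lr_image)              # "core processor" step
        score = assessor(lr_image, result)  # "quality assessor" step
        if score > best_score:
            best_name, best_result, best_score = name, result, score
    return best_name, best_result, best_score


# Toy stand-ins: the "images" are lists and the score is an arbitrary ratio.
toy_processors = {
    "model_a": lambda img: [v * 2 for v in img],
    "model_b": lambda img: [v * 3 for v in img],
}
toy_assessor = lambda lr, sr: sum(sr) / sum(lr)

name, result, score = select_best_srr([1, 2, 3], toy_processors, toy_assessor)
print(f"selected {name} with score {score:.1f}")
```

In a real deployment the score would more plausibly combine several metrics, and a threshold could flag scenes for which no candidate produces an acceptable result.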

Conclusions
In this paper, we introduced the TARSGAN SRR model and the ELF image effective resolution assessment system. We demonstrated TARSGAN SRR results using the 10 m/pixel Sentinel-2 "true" colour images over a wide range of different types of natural and artificial surface features/targets. The ELF measurements show an averaged effective resolution enhancement factor of about 2.91 times, given a nominal SRR upscaling factor of 4 times. This suggests an effective resolution of ~3.44 m/pixel achieved with TARSGAN SRR over the 10 m/pixel bands. In addition, the multispectral properties of the TARSGAN SRR images were demonstrated to have good correlation, considering the resolution gap of 4 times, in comparison to the original Sentinel-2 images for all spectral bands. This suggests that multispectral applications (e.g., calculation of multispectral indices indicative of crop health/stress and potential yield in precision agriculture) could be seamlessly applied using the TARSGAN super-resolved images, but with better precision. We believe the demonstrated Sentinel-2 TARSGAN SRR system has potential for new applications in a variety of different fields, such as planning infrastructure, public services, and monitoring of small urban targets in the field of urban intelligence; providing field-scale mapping and boundary management at a global scale in the field of agriculture; and achieving better detection and classification accuracy of different science targets, e.g., sea-ice leads and melt-ponds, in the fields of oceanology and geology.