MADNet 2.0: Pixel-Scale Topography Retrieval from Single-View Orbital Imagery of Mars Using Deep Learning

: The High-Resolution Imaging Science Experiment (HiRISE) onboard the Mars Reconnaissance Orbiter provides remotely sensed imagery at the highest spatial resolution at 25–50 cm/pixel of the surface of Mars. However, due to the spatial resolution being so high, the total area covered by HiRISE targeted stereo acquisitions is very limited. This results in a lack of the availability of high-resolution digital terrain models (DTMs) which are better than 1 m/pixel. Such high-resolution DTMs have always been considered desirable for the international community of planetary scientists to carry out fine-scale geological analysis of the Martian surface. Recently, new deep learning-based techniques that are able to retrieve DTMs from single optical orbital imagery have been developed and applied to single HiRISE observational data. In this paper, we improve upon a previously developed single-image DTM estimation system called MADNet (1.0). We propose optimisa-tions which we collectively call MADNet 2.0, which is based on a supervised image-to-height estimation network, multi-scale DTM reconstruction, and 3D co-alignment processes. In particular, we employ optimised single-scale inference and multi-scale reconstruction (in MADNet 2.0), instead of multi-scale inference and single-scale reconstruction (in MADNet 1.0), to produce more accurate large-scale topographic retrieval with boosted fine-scale resolution. We demonstrate the improvements of the MADNet 2.0 DTMs produced using HiRISE images, in comparison to the MADNet 1.0 DTMs and the published Planetary Data System (PDS) DTMs over the ExoMars Rosalind Franklin rover’s landing site at Oxia Planum. Qualitative and quantitative assessments suggest the proposed MADNet 2.0 system is capable of producing pixel-scale DTM retrieval at the same spatial resolution (25 cm/pixel) of the input HiRISE images.


Introduction
High-resolution digital terrain models (DTMs) have always been considered as a key geospatial data product for studying a planetary surface such as Mars. For example, DTMs derived from the Mars Express's 12.5-25 m/pixel High-Resolution Stereo Camera (HRSC) [1] are able to provide topographic context of a large area, whilst DTMs derived from the Mars Reconnaissance Orbiter (MRO) 6 m/pixel Context Camera (CTX) [2] and the ExoMars Trace Gas Orbiter (TGO) 4-6 m/pixel Colour and Stereo Surface Imaging System (CaSSIS) [3] allow us to study moderately finer scale topography at the decametre-scale. Moreover, for a small percentage of the Martian surface, DTMs derived from the MRO 25-50 cm/pixel High-Resolution Imaging Science Experiment (HiRISE) [4] targeted stereo acquisitions are able to reveal detailed topographic information of surface features or processes at metre-scale. The high-resolution DTMs from HiRISE have become an invaluable resource for the international community of planetary scientists to carry out geological analysis of the Martian landscapes.
However, due to the spatial resolution being very high, the surface coverage of HiRISE observations is rather limited. According to the product coverage shapefile (https://ode.rsl.wustl.edu/mars/coverage/ODE_Mars_shapefile.html, accessed on 15 October 2021) released on 2 September 2021, the existing HiRISE images have a total coverage of 3.405% of the Martian surface (see Figure 1). The total coverage of HiRISE targeted stereo acquisitions is lower-currently being 0.316%. This means with traditional photogrammetric methods, only 0.316% of the Martian surface can possibly be mapped into 3D at a high-resolution of about 1 m/pixel. For example, the publicly available NASA Planetary Data System (PDS) 1-2 m/pixel HiRISE DTMs (https://www.uahirise.org/dtm/, accessed on 15 October 2021) currently have a total surface coverage of 0.0297%. However, deep learning-based techniques have recently been developed that are able to retrieve DTMs using only a single HiRISE observation as input [5][6][7]. Using deep learning-based single-image DTM retrieval methods, ultra-high-resolution (25-50 cm/pixel) 3D information can now be derived "on-demand" for the remaining 3.098% area of the Martian surface, in which case, meaning scientific analysis that is reliant on high-resolution 3D will become feasible in these remaining areas. In this work, we build on our previous development of the Multi-scale Generative Adversarial U-Net based single-image DTM estimation system (MADNet; hereafter referred to as MADNet 1.0) [6] and propose several key modifications to the original network architecture and the processing pipeline to improve the results and to resolve the issues that are summarised in Section 2.1 of this paper. We propose the new MADNet 2.0 system that employs a simplified and optimised single-image DTM estimation network for pixel-scale DTM retrieval together with a coarse-to-fine reconstruction scheme on top of the DTM inference process. A new set of training data is constructed using selected PDS HiRISE DTMs and iMars CTX DTMs [8] which are available through the ESA's Guest Storage Facility [9] (https://www.cosmos.esa.int/web/psa/ucl-mssl_meta-gsf, accessed on 15 October 2021). We demonstrate the new MADNet 2.0 system with single-view HiRISE images over the ExoMars Rosalind Franklin rover's landing site at Oxia Planum [7,10]. The HRSC and CTX DTM mosaics over the same area (described in [7]) that are co-aligned with the Mars Global Surveyor's Mars Orbiter Laser Altimeter (MOLA) [11,12] global areoid DTM (https://planetarymaps.usgs.gov/mosaic/Mars_MGS_MOLA_DEM_mosaic_glo bal_463m.tif, accessed on 15 October 2021), are used as the baseline reference data. Qualitative assessments using colourised and shaded relief DTMs, and quantitative assessments using semi-automated crater size-counting, and automated slanted-edge image sharpness measurements, are provided for the resultant MADNet 2.0 DTMs, in comparison with the PDS HiRISE DTMs and the MADNet 1.0 HiRISE DTM mosaic product that was produced and described in [7]. In addition, DTM profile and difference measurements are performed for sub-areas to show the height accuracy of the resultant MADNet 2.0 HiRISE DTMs, in comparison with the PDS HiRISE DTMs and the reference CTX DTM.
MADNet 2.0 resolves major issues of the MADNet 1.0 system and produces highquality topography retrieval at the same spatial resolution of the input image (i.e., pixelscale topography retrieval). Figure 2 shows an example of side-by-side comparisons of one of the 25 cm/pixel input HiRISE images, the 1 m/pixel PDS DTM, the 50 cm/pixel MADNet 1.0 DTM, and the 25 cm/pixel DTM that is produced by the proposed MADNet 2.0 single-image DTM estimation system. The layout of the rest of the paper is as follows. In Section 1.1, we review previous work in the field of single-image deep learning-based height/depth estimation. In Section 2.1, we summarise the overall workflow and key remaining issues with the MADNet 1.0 system. This is followed by a discussion of the proposed changes and justifications for the MADNet 2.0 system in Section 2.2 and network training details in Section 2.3. In Section 2.4, we introduce the datasets and the experiments. Results are qualitatively demonstrated in Section 3.1 and quantitative assessments are provided in Section 3.2 and in Section 3.3 where we present profile and DTM difference measurements. In Section 4.1, we demonstrate the extensibility of the MADNet 2.0 system using a single HRSC image and the MOLA DTM as inputs to produce a high-quality pixel-scale HRSC DTM. This is followed by a brief discussion of the remaining limitations and future work in Section 4.2 before the conclusions are drawn in Section 5.

Previous Work
Although photogrammetry and multi-view stereo have a long history in the field of computer vision, height or depth from a single-view image (excluding traditional photoclinometry [13][14][15]) only started being considered feasible in recent years, alongside the great success of the development of deep learning techniques, wherein the term for singleimage DTM/height/depth estimation is generally referred to as "monocular depth estimation" (MDE). With a variety of potential applications in the fields of robotics, autonomous driving, virtual reality, and etc., several hundreds of MDE methods/networks have been proposed [16][17][18][19] over the last 7 years. In terms of training mechanisms, these MDE methods can be classified into supervised methods, requiring ground truth depth image for training, and unsupervised methods, using 3D geometry and requiring only multi-view images as inputs. For ground-based applications, MDE networks are generally trained with outdoor or indoor scenes [20][21][22] either with ground truth measured by LIDAR (Light Detection and Ranging) for supervised learning or with multi-camera or video inputs for calculation of the underlying 3D geometry for unsupervised learning.
In terms of the supervised methods, the earliest successful experiments of applying convolutional neural networks (CNNs) to address MDE was reported in [23,24], wherein the authors proposed a multi-scale CNN architecture, a scale-invariant mean squared error (MSE) and the gradients of the differences (in horizontal and vertical directions) based loss functions. Following these, fully CNNs and residual networks were proposed in [25][26][27] to address MDE, as well as using the Berhu loss function [28] to improve the efficiency and accuracy of the MDE models. Exploring the continuous characteristics of the neighbouring depth with semantic segmentation, [29][30][31][32] combined CNN with the conditional random field (CRF) models and semantic image segmentation to improve their MDE results. More recently, architectures based on the generative adversarial networks (GAN) [33], the encoder-decoder style U-net [34], and the selective attention-based networks, have all shown positive impacts on resolving MDE. Some of the representative methods of these can be found in [35,36] for GAN-based approaches, in [37,38] for U-net-based approaches, and in [39,40] for selective attention-based approaches.
In terms of the unsupervised methods, early breakthrough work was reported in [41,42], wherein the two groups of authors employed the disparity network and the pose network, to regress the transformations between stereo views (namely, "MDE from stereo consistency") and between neighbouring continuous frames (namely, "MDE from multiview"), respectively. Following these, the authors in [43,44], proposed the use of the structural similarity index measurement (SSIM) [45] based appearance loss, the disparity smoothness loss, and the auto-masking loss to overcome a variety of issues caused by using only the photogrammetric loss. For "MDE from stereo consistency", traditional methods of visual odometry and photogrammetry are typically employed [46][47][48], while on the other hand, optical flow methods are generally employed for "MDE from multiview" [49][50][51]. Similar to the supervised MDE methods, GANs were also introduced into unsupervised MDE methods, using reconstructed/reprojected views and real views as the inputs for training the discriminator network [52][53][54].
In addition to the above, some other work focused on the topic of "depth completion" [19], which is different but directly relevant to MDE, aiming at improving the quality of the MDE results. These works generally refer to MDE denoising [55,56] and MDE refinement [57,58]. More recently, new depth completion methods, such as depth super-resolution [59][60][61][62] using subpixel convolutional layers, and multi-resolution fusion [63,64] using content-adaptive depth merging networks, have also shown fairly good impacts on MDE, producing high-quality depth maps with effective spatial resolutions that are very close to those of the input images.
In contrast to the ground-based MDE tasks, single-image DTM estimation tasks using Mars orbital imagery [5][6][7] is different in many aspects. Firstly, the sizes of the target input images are different. For example, a HiRISE image (gigapixel) is much larger in size than a typical ground-based input image (megapixel). This results in extra procedures being required for spatial tiling, mosaicing, as well as multi-scale reconstruction to perform single-image DTM estimation tasks. Secondly, the viewpoints are different, resulting in the typical "global depth cues" that are used in MDE, such as linear perspective, object size and occlusion in relation to object position, texture density, atmospheric effects, being limited or non-existent in single-image DTM estimation tasks. Thirdly, the complexity and appearance of objects/features are different. In general, the surface features on Mars orbital imagery are structurally continuous, resulting in a reduced conflict for keeping structural consistency and bringing out high-frequency details. For example, keeping the object border clear and distinctive is always challenging in ground-based MDE tasks, because of the conflicting interests in keeping the large-scale depth continuous (e.g., a scene of a room or road) whilst making the local depth discrete (e.g., around object borders), but this is considered to be a minor issue in single-image DTM estimation tasks for Mars as the topography of the surface is always "continuous".

The MADNet 1.0 System and Summary of Existing Issues
Of particular relevance to this work, we previously proposed the MADNet 1.0 singleimage DTM estimation system [6] using supervised learning and multi-resolution 3D coalignment. The network core is based on a relativistic GAN framework [65,66] with the generator network based on the adaptively weighted multi-scale U-nets [34]. The overall processing chain of the MADNet 1.0 system is illustrated in Figure 3. It takes a co-registered higher resolution image and a lower resolution reference DTM as inputs, and uses image tiling, multi-scale relative height inference, height reconstruction of each tiled inference output, 3D co-alignment with respect to the reference DTM, and DTM tile mosaicing to produce an output DTM at half the resolution of the input image. For multi-scale relative height inference, the MADNet 1.0 generator uses three U-nets in parallel, operating at different image resolutions, each of which contains a stack of the dense convolution blocks (DCBs) [67] and up-projection blocks (UPBs) [27], and are merged with adaptive weights for estimation of a relative height map for each input image tile. . Simplified flow diagram of the MADNet 1.0 single-image DTM estimation system described in [6]. N.B. lighter coloured boxes represent the inputs and/or outputs. MADNet 1.0 results were demonstrated to be optimal, compared to DTM products processed using major photogrammetric pipelines for HRSC, CTX, CaSSIS, and HiRISE DTM processing, in terms of overall quality and effective resolution, as well as the processing speed [6,7]. However, there are four main issues with the MADNet 1.0 system, which are explained as follows, and demonstrated in detail in Section 3.1 with comparisons with the improved MADNet 2.0 results. (a) Degraded (or weakened) topographic variation/feature at the intermediate scale between the scales of the input image and reference DTM. This is due to the fact that large-scale (relative to the input image) topographic cues cannot be perceived by the trained model using the small-sized (512 × 512 pixels) tiled input. This missing information may also not be recoverable from the reference DTM if it is not large enough in scale for the lower-resolution reference DTM. This is generally not an issue if we use a reference DTM that has a spatial resolution close to (≤3 times) the resolution of the input image. However, if there is a large resolution gap between the input image and reference DTM, intermediate-scale topographic information of the MADNet 1.0 result is likely to be missing or inaccurate. (b) Inherited large-scale (relative to the input image) topographic errors or artefacts. Although we should always use a high-quality or pre-corrected DTM as the lower-resolution reference, some small-scale (relative to the reference DTM) errors are inevitable, and consequently, small-scale errors on the lower-resolution reference DTM can potentially become large-scale errors in the higher-resolution output DTM. Building and refining the MADNet 1.0 DTMs progressively using multi-resolution cascaded inputs could minimise the impact of the inherited photogrammetric errors and artefacts [7], but the issue cannot be fully eliminated, and the issue could become more obvious alongside enlarged resolution gap between the input image and reference DTM. (c) Inconsistent performance (mainly found for high-frequency features) on the DTM inference processes of different tiles of the full-scene input. The effect of this issue is that some of the resultant DTM tiles are sharper and some of the resultant DTM tiles are smoother, and consequently, all these resultant DTM tiles cannot be seamlessly mosaiced together without producing obvious artefacts. Even the smoother ones may still appear to be sharper in comparison to any photogrammetric results, and such issues will still cause discontinuities for local topographic features. Moreover, if there are large and frequent differences in sharpness of adjacent DTM tiles, this creates a patterned gridding artefact on the final DTM result. (d) Tiling artefacts caused by incorrect or inconsistent inference of large-scale slopes of neighbouring tiles. Due to the memory constraint of a graphics processing unit, the size of the tiled input for inference is limited, and consequently, a large image (e.g., HiRISE) needs to be divided into tens of thousands of tiles. As there are not enough "global height cues" within each input image tile, the predicted large-scale topographic information (e.g., a global slope) is highly likely to be incorrect or inaccurate. 3D fitting and overlapped blending were used in [7] to correct the large-scale error and minimise the impact of the inconsistent large-scale topography of adjacent tiles. However, minor height variations (typically being of the order of ~10 cm) still exist at the joints of neighbouring tiles on steep slopes.

The Proposed MADNet 2.0 System
In this paper, we propose the MADNet 2.0 single-image DTM estimation system that is based on the MADNet 1.0 system, but with three key modifications to overcome the aforementioned issues and produce pixel-scale DTM retrieval. The flow diagram of the proposed MADNet 2.0 system is shown in Figure 4, highlighting (in yellow) the modifications, in comparison to the MADNet 1.0 system. The three modifications are listed and described as follows.  There are two reasons for this change. Firstly, as previously explained, global height cues for large-scale features of a large image cannot be perceived by the network with small, tiled inputs, and subsequently, the coarse-scale and intermediate-scale inferences are usually incorrect. Figure 5 shows an example of the inference output from a single tiled input with MADNet 1.0, where the large-scale topography of the raw inference output incorrectly portrays that the high ground is at the bottom of the image. After 3D coalignment using the reference DTM, the high ground is corrected to the top-left corner of the image. However, even the large-scale topographic information can always be corrected via the subsequent 3D co-alignment process using a reference DTM, the multi-scale approach is considered unstable and rather redundant for the orbital image DTM estimation tasks. Secondly, due to the surface features seen in Mars orbital imagery being structurally continuous, global and local height cues are not necessarily needed to be separately considered. This is different from MDE using perspective views of indoor or outdoor scenes, where local depth cues from occlusions and global depth cues from linear perspectives need to be separately treated. It should be noted that any large-scale height variations (e.g., large-scale slopes that are spatially much bigger than the size of the input tile) are also removed from the training datasets so that more local features can be targeted and learnt by the network (please refer to Section 2.3). Figure 5 shows the MADNet 1.0 and MADNet 2.0 results from a single input image tile (512 × 512 pixels) of a HiRISE image. We can observe that the aforementioned issue of the large-scale topography of the MADNet 1.0 inference output being incorrect (from bottom to top for high to low) and is not co-aligned with the large-scale topography that is shown in the PDS HiRISE DTM and MADNet 2.0 DTM (from top-left to bottom-right for high to low). Without being constrained by the coarser scale inference introduced by MADNet 1.0, the MADNet 2.0 inference output shows more details of local features. In addition, we can observe from the MADNet 1.0 DTM result that the large-scale topography still does not agree with the MADNet 2.0 DTM and PDS HiRISE DTM, even though all of the three HiRISE DTMs are co-aligned with the same CTX reference DTM using the same method. This is due to the coarse-scale inference of MADNet 1.0 being incorrect, which has a negative impact on the subsequent 3D correction process. of UPB [27] with concatenation operation at the decoder end, and using the output of each convolution layer (before the pooling layer) of the encoder, instead of using the output of each pooling layer of the encoder, for concatenation with the corresponding output of each UPB of the decoder.
With this proposed network optimisation, MADNet 2.0 is now capable of producing pixel-scale inference, i.e., the output DTM has the same spatial resolution as the input image, instead of being half of the input resolution with MADNet 1.0. The network architecture of MADNet 2.0 is shown in Figure 6. The MADNet 2.0 generator network has an identical structure for its encoder arm with the fine-scale U-net of the MADNet 1.0 generator [6], consisting of a feature extraction layer (7 × 7 kernel, 64 feature maps, stride 2) and a max-pooling layer (3 × 3 kernel, stride 2), followed by a sequence of four DCB [67]convolution (1 × 1 kernel, stride 1, and with increasing numbers of feature maps of 64, 128, 256, …)-pooling layers (2 × 2 kernel, stride 2), which is then connected to its decoder arm. The decoder arm of the proposed MADNet 2.0 generator network consists of a sequence of five convolution (3 × 3 kernel, stride 1, and with decreasing number of feature maps)-UPB [27]-concatenation (with the corresponding output of each convolution layer from the encoder) layers and followed by a final reconstruction layer (3 × 3 kernel, stride 1). Other than the changes made on the decoder network, we use the same relativistic adversarial training procedure [65,66] and total loss function that is a weighted sum of the gradient loss, Berhu loss [28], and adversarial loss, detailed in [6]. (c) A coarse-to-fine multi-scale reconstruction process is implemented on top of the DTM estimation network.
There are two reasons for adding the coarse-to-fine reconstruction module. Firstly, with the proposed coarse-to-fine reconstruction process, intermediate-scale topographic information, which was reliant on the reference DTM in MADNet 1.0, can now be independently and more accurately retrieved in MADNet 2.0. The MADNet 1.0 system [6] only spatially slices the input image, and consequently, if there is a large resolution gap between the input image and the reference DTM, the intermediate-scale and large-scale topographic information that propagates across multiple DTM tiles could be missing or inaccurate (see Figure 7 for some examples). This is because each individual DTM tile is being predicted independently without being guided by its neighbouring content or global context. It should be noted that the MADNet 1.0 multi-scale reconstruction approach within the DTM estimation network cannot address this issue because it is targeted on a single tile. Secondly, the proposed coarse-to-fine reconstruction process can potentially reduce the inherited artefacts from the reference DTM. With multiple-scale reconstruction, artefacts from the reference DTM can be gradually rectified and their impact on the final fine-scale DTM is therefore weakened through each of the coarser-scale reconstruction processes.  Figure 7 demonstrates the issue of not having the proposed coarse-to-fine reconstruction process; intermediate-scale and large-scale topographic information could be lost or be inaccurate, as such information is solely retrieved from the low-resolution reference DTM in MADNet 1.0. For example, we can observe that even though the small-sized craters are satisfactorily retrieved (indicated with the blue arrows) with MADNet 1.0 (in comparison to the PDS HiRISE DTM), the medium-and large-sized craters appear to be blurred and smooth (indicated with the red arrows) with MADNet 1.0 (in comparison to the PDS HiRISE DTM and MADNet 2.0 results).
In MADNet 2.0, the coarse-to-fine reconstruction module is implemented as follows. Firstly, the input image is downscaled by 16 times and 4 times, then spatially tiled, and each image tile is processed into a DTM tile (inference) using the pre-trained MADNet 2.0 generator. Secondly, all the coarse-scale DTM tiles are corrected (using the same 3D coalignment method [6,7,68] that is employed in the MADNet 1.0 system) with respect to the input reference DTM and mosaiced together to produce a low-resolution DTM (1/16 times the input image resolution). Thirdly, all the intermediate-scale DTM tiles are corrected with respect to the low-resolution DTM produced at the previous step and mosaiced together to produce an intermediate-resolution DTM (1/4 times the input image resolution). Finally, all the fine-scale DTM tiles are corrected with respect to the intermediate-resolution DTM produced at the previous step and mosaiced together to produce the final high-resolution DTM, having the same resolution as the input image.

Network Training
The two sets of training data that were used for training of the MADNet 1.0 [6] network consist of 4200 resampled HiRISE ortho-rectified image (ORI) crops (at 4 m/pixel) and DTM crops (at 8 m/pixel) for the first-stage training, and 15,500 resampled HiRISE ORI crops (at 2 m/pixel) and DTM crops (at 4 m/pixel) for the second-stage training. These were formed from 450 publicly available PDS HiRISE DTMs and ORIs, each of which was resampled to match the effective resolution of the photogrammetric DTMs (roughly between 4 m/pixel and 8 m/pixel) and spatially sliced into 512 × 512 pixels (ORI) and 256 × 256 pixels (DTM) crops to match the data dimensions of the network input and output.
As described in Section 2.2, the coarse-scale and intermediate-scale U-nets have been removed in MADNet 2.0, and consequently, the two-stage training process is no longer needed. We have therefore formed a single new training dataset consisting of 4 m/pixel resampled HiRISE ORI and DTM crops, 2 m/pixel resampled HiRISE ORI and DTM crops, and in addition, 36 m/pixel downsampled CTX ORI and DTM crops from the 2300 publicly available iMars (http://www.i-mars.eu/, accessed on 15 October 2021) CTX DTMs and ORIs [8,9] are now included (available at https://www.cosmos.esa.int/web/psa/UCL-MSSL_iMars_CTX_v1.0, accessed on 15 October 2021), all with an identical tile size of 512 × 512 pixels.
In this work, we noticed that even though the HiRISE training datasets used in [6] were roughly screened to remove lower-quality DTMs, there are still about ¼ of them that are either affected by different types of artefacts or lack high-frequency details. We found that not only the quality of each PDS HiRISE DTM is quite varied, but sometimes, different parts of the same image have different DTM quality. The issue of some of the HiRISE DTMs having a lower quality is not necessarily associated with the quality of the corresponding stereo input images, nor does it appear to be related to a certain type of surface feature. We can still observe varied DTM quality (see Figure 8 for examples of these) for the same type of surface feature with similar image quality. Such lower quality training DTM samples prolong the training process and have a negative impact on the accuracy of the inference output.
It should be noted that down-sampling of the training DTMs produced from photogrammetric methods (i.e., PDS HiRISE DTM and iMars CTX DTM sub-image tiles) helps bring their nominal resolution closer to their effective resolution in order to train the network to perform per-pixel image-to-height inference. The down-sampling or averaging process also helps to remove some of the high-frequency photogrammetric artefacts so that we don't need to remove a training crop that has a few erroneous pixels during screening. The fixed spatial resolutions (2 m/pixel, 4 m/pixel, 36 m/pixel) of the training dataset have a reasonable coverage from high-resolution to low-resolution images of the existing Mars orbital imagers, and we do not observe any differences in the inference performance of test datasets that have different spatial resolutions (e.g., 12.5 m/pixel HRSC images or 25 cm/pixel HiRISE images), according to the HiRISE and HRSC tests of this work, as well as the CaSSIS, CTX, HiRISE tests in the previous work [6,7]. This is the basis for deep learning-based methods to produce better DTMs than using photogrammetric methods, even though photogrammetric DTMs were used for training the model. For instance, after successful training of the down-sampled and screened data, the network has learnt how to produce the best DTM result for an "artefact-free" output for comparably larger-sized features (e.g., craters with diameter larger than 100 m using 4 m/pixel images and DTMs), then the learnt parameter sets can be used to produce the most realistic DTM result for similar features that are much smaller in size (e.g., craters with diameter smaller than 10 m from 0.25 m/pixel images), where many artefacts may appear using photogrammetric methods. In this work, we further screened the 4 m/pixel and 2 m/pixel HiRISE training samples (total 19,700 ORI-DTM pairs before screening), as well as the 36 m/pixel CTX training samples (total 13,800 ORI-DTM pairs before screening), using the hill-shaded relief samples, which show more qualitative details. The screening process is assisted with batch calculation and sorting of the mean SSIM values between a smoothed (using bilateral filtering) version of each image crop and the hill-shaded relief crop, from which we empirically found that the lower quality DTM crops tend to have a lower mean SSIM value. We prepared the new training dataset in five steps summarised as follows.  The selected 60,000 training pairs of DTM and ORIs were fed into the network in the final training process to produce the training model that is used to produce the results presented in this paper. It should be noted that at the initial network training and test stage, 1000 training pairs (without data augmentation) were randomly selected to form the test dataset. Root mean squared errors (RMSEs) and mean SSIMs between the test predictions and ground-truth height maps are used as the evaluation metrics and are periodically monitored throughout the initial training process.

Datasets and Experiments Overview
In this work, we base our experiments on the 25 cm/pixel HiRISE images. We demonstrate with five single-strip HiRISE images over the ExoMars Rosalind Franklin rover's landing site at Oxia Planum [7,10], where we previously produced the large area coaligned DTM mosaics using HRSC, CTX and HiRISE [7]. These include a 25 m/pixel HRSC DTM mosaic, a 12 m/pixel CTX DTM mosaic, and a 50 cm/pixel HiRISE DTM mosaic, covering an area of 197 km × 182 km, an area of 114 km × 117 km, and an area of 74.3 km × 86.3 km over Oxia Planum, respectively. We employ the CTX DTM mosaic that is coaligned with HiRISE, HRSC, and MOLA, as the baseline reference, for the HiRISE experiments with MADNet 2.0. The five test HiRISE images (ESP_003195_1985_RED, ESP_036925_1985_RED, ESP_037070_1985_RED, ESP_039299_1985_RED, and ESP_042134_1985_RED) are selected for being overlapped with the existing PDS HiRISE DTMs and the existing MADNet 1.0 HiRISE DTM mosaic. Figure 10 shows an overview of the five test HiRISE images and the reference CTX DTM of this work. It should be noted that both the MADNet 1.0 and the MADNet 2.0 systems are trained with a spatial subset of the published PDS HiRISE ORI-DTM pairs that cover as much as possible of the known Martian surface features and that are of high quality. The training dataset is ~75% of the available PDS HiRISE DTM products after the manual screening process described in [6], and the spatial percentage is further slimmed down to ~35% of the available PDS HiRISE products after the stricter screening process described in this work (please refer to Section 2.  The overall processing chain of the MADNet 2.0 system is described in Section 2.2. For this experiment, MADNet 2.0 takes each of the five 25 cm/pixel HiRISE test images and the 12 m/pixel CTX DTM mosaic as inputs and follows nine processing steps that are briefly listed as follows. In terms of evaluation and assessments of these results, we do not repeat the measurements of 3D co-alignment and image co-registration accuracy that were thoroughly studied in [7]. We provide a qualitative demonstration of overall quality and resolution, as well as the effectiveness of MADNet 2.0 for resolving the four issues that are described in Section 2.1. We also provide quantitative assessments of the effective resolution of the DTM results. In particular, we employ an automatic crater detection method (https://pycda.readthedocs.io/en/latest/index.html, accessed on 15 October 2021) with manual corrections, for crater counting and size analysis, as well as the slant-edge analysis that is described in [69], using the hill-shaded relief images of the DTM results, for quantitative assessments of the effective resolution of the resultant DTMs. In addition, DTM profile and difference measurements are performed for five sub-areas to show the height differences of the MADNet 2.0 HiRISE DTMs, the PDS HiRISE DTMs, and the MADNet 1.0 CTX DTM mosaic (the baseline reference).

Qualitative Assessments
MADNet 2.0 not only provides improved DTM quality and effective resolution but also resolves the major issues (see Section 2.1) arising from MADNet 1.0. In this section, we demonstrate the efficacy of the proposed MADNet 2.0 system with visual inspections, focusing on the areas with the aforementioned issues. We compare the resultant 25 cm/pixel MADNet 2.0 DTMs with the 50 cm/pixel MADNet 1.0 HiRISE DTM mosaic [7] and the publicly available 1 m/pixel PDS HiRISE DTMs. We illustrate with two small exemplar areas (locations shown in Figure 10 for the 10 areas from area-A to area-J) for each of the five test HiRISE images (see Section 2.4). N.B. to examine full-strip HiRISE DTM results, please refer to the Supplementary Materials. Figure 11 shows two zoom-in views (area-A and area-B-refer to Figure 10        , which are generally considered difficult for photogrammetry to produce good quality results. We can observe that the topography of the three small craters is completely missing from the PDS HiRISE DTM, whilst in the MADNet 1.0 DTM, the topography of the three craters is successfully retrieved but with a smoothed appearance and affected by the issue of adjacent DTM tiles having different quality and sharpness (refer to issue (c) of Section 2.1). The MADNet 2.0 result shows the best overall quality with more consistent large-scale topography retrieval and fine-scale details with a similar level of details with the HiRISE image.

Quantitative Assessments
HiRISE is currently the highest-resolution orbital imaging instrument around Mars, and consequently, the effective resolution of any HiRISE derived high-resolution DTMs are difficult to be quantitatively measured without any high-resolution ground-truth being available. In this paper, we use semi-automated crater size-counting and slanted-edge sharpness analysis to assess quantitatively the effective resolution of the resulting DTMs. The two experiments are described as follows.
For crater size-counting, the number of recognisable small craters and the size of the smallest recognisable craters, from a DTM or hill-shaded relief, are key indicators to the quality and effective resolution of the DTM. We apply an open-source crater detection tool (https://pycda.readthedocs.io/en/latest/index.html, accessed on 15 October 2021) for initial automated detection and collection of size (diameter) statistics of the craters using the hill-shaded relief image of the MADNet 2.0, MADNet 1.0, and PDS HiRISE DTMs, as well as using the corresponding HiRISE images. We then manually validate the detection results to exclude any false-positives and to add any true-negatives. Five selected craterrich areas (from area-K to area-O; locations as shown in Figure 10) are used, each from one of the five test HiRISE images (see Section 2.4), to perform the crater size-counting experiment. Figure 16 shows an example of the automatically detected craters (in the red circles) with manually removed false-positives (in the blue circles) and manually added true-positives (in the green circles), over a 250 m × 250 m area (area-K-refer to Figure 10 for the location) on the HiRISE image and the hill-shaded relief images of the DTMs generated from (ESP_003195_1985_RED). We can observe that many more small-and medium-sized   Figure 17 shows an example of automatically detected craters (in red circles) with manually removed false-positives (in blue circles) and manually added true-positives (in green circles), over a 500 m × 500 m larger area (area-L-refer to Figure 10 for the location) on the HiRISE image and the hill-shaded relief images of the DTMs (ESP_036925_1985_RED). We can observe that MADNet 1.0 has successfully retrieved around 50% of the total craters that are shown on the HiRISE image and most of the missing ones are for the small-sized craters (diameter ≤ 10 m). In contrast, MADNet 2.0 has successfully retrieved around 75% of the total craters that are shown in the HiRISE image, and in particular for the retrieval of small-sized craters (diameter ≤ 10 m), MADNet 2.0 shows much better performance in comparison to MADNet 1.0. For the retrieval of the medium-sized craters (diameter between 10 m and 40 m) and large-sized craters (diameter ≥ 40 m), MADNet 2.0 and MADNet 1.0 show similarly good performance for this example.  Figure 18 shows an example of the automatically detected craters (in red circles) with manually removed false-positives (in blue circles) and manually added true-positives (in green circles), over a 1 km × 1 km larger area (area-M-refer to Figure 10 for the location) on the HiRISE image and the hill-shaded relief images of the DTMs (ESP_037070_1985_RED). In this example, we can observe that MADNet 2.0 is able to retrieve more than 60% of the craters that are shown on the HiRISE image, in comparison to about 30% and 10% of successful retrieval from MADNet 1.   Figure 19 shows an example of the automatically detected craters (in red circles) with manually removed false-positives (in blue circles) and manually added true-positives (in green circles), over a 1.5 km × 1.5 km large area (area-N-refer to Figure 10 for the location) on the HiRISE image and the hill-shaded relief images of the DTMs (ESP_039299_1985_RED). We can observe that MADNet 2.0 has retrieved about 60% of the craters that are shown on the HiRISE image. The MADNet 1.0 DTM shows a lower quality for this HiRISE image in comparison to the other scenes. The major difference between MADNet 2.0 and MADNet 1.0 is the retrieval of the small-sized craters (diameter ≤ 10 m) and medium-sized craters (diameter between 20 m and 40 m). Figure 20 shows an example of the automatically detected craters (in red circles) with manually removed false-positives (in blue circles) and manually added true-positives (in green circles), over a 500 m × 500 m area (area-O-refer to Figure 10 for the location) on the HiRISE image and the hill-shaded relief images of the DTMs (ESP_042134_1985_RED). In this example, we can observe that MADNet 2.0 has achieved the best over the aforedemonstrated test areas. More than 90% of the craters that are shown on the HiRISE image are successfully retrieved by MADNet 2.0, compared to the retrieval rate of about 40% and 15% from MADNet 1.0 and PDS DTM, respectively. In particular, MADNet 2.0 is able to retrieve 100% of the medium-and large-sized craters (diameter ≥ 20 m), with only a few small-sized craters (diameter ≤ 20 m) not being revealed.  In the next assessment, we employ slanted-edge sharpness analysis, using the inhouse ELF tool (automated image sharpness measurement via Edge Spread Function (ESF), Line Spread Function (LSF), and Full Width at Half Maximum (FWHM)) [69] that was previously developed for assessing Earth Observation image effective resolution and sharpness. ELF takes a lower-resolution target image and a higher-resolution reference image as inputs, using automated edge detection and profile extraction, and automated calculation and filtering of ESF and LSF, to calculate the averaged FWHM of all LSFs from all available high-contrast slanted edges that are automatically extracted from the inputs. A detailed description of the ELF workflow can be found in [69]. Mean FWHM (M-FWHM) is a key metric for measuring image effective resolution. For example, the larger the M-FWHM value is, the more pixels are involved for a high-contrast slanted-edge to change its intensity values (e.g., from black to white or from white to black), and subsequently, the lower the image effective resolution.
In this work, we use the HiRISE images as the "higher-resolution" references and use the hill-shaded relief images of the MADNet 2.0 DTMs and MADNet 1.0 DTMs as the "lower-resolution" targets. It should be noted that the PDS DTMs are excluded from this assessment, as it is almost impossible to find any common slanted edges (valid candidates) from the PDS DTMs and the corresponding HiRISE images, due to the huge differences in effective resolution. For each of the five test HiRISE images (refer to Section 2.4), we extract four crops (namely test-1, test-2, test-3, and test-4 in Table 2) centred at visually selected large-sized and steep craters (diameter ~200 m), where the image shows a highcontrast crater outline and the DTM has a steep slope (see area-H of Figure 14 as an example) resulting in a high-contrast crater outline on the corresponding hill-shaded relief image. Results of the ELF measurements are summarised in Table 2. To examine the test images and detailed plots from the ELF tool, please refer to the Supplementary Materials. The averaged M-FWHM values shown in Table 2 (the last column), indicate the number of pixels that are needed for representation of a sharp peak in the DTMs. The total averaged M-FWHM values from the 20 test areas of HiRISE, MADNet 1.0, and MADNet 2.0 are 3.128 pixels, 4.122 pixels, and 3.416 pixels, respectively. As we use the hill-shaded relief image of a DTM to compare against a real image, the total averaged M-FWHM values cannot be directly used to calculate the effective resolution of the DTMs. However, M-FWHM is still considered to be an effective indicator for comparing two DTM results and show how close their spatial resolutions are to the reference images. In this experiment, a lower M-FWHM value represents a sharper view of the hill-shaded relief image, and also, a closer M-FWHM value between the hill-shaded relief image and the HiRISE image represents a closer effective resolution between the DTM and the HiRISE image. We can observe from Table 2

DTM Profile and Difference Measurements
According to the qualitative assessments presented in Section 3.1, we observe that the very-fine-scale 3D features retrieved by the MADNet 2.0 system correlate with the 2D features that are displayed in the HiRISE images. This demonstrates that the MADNet 2.0 system does not "invent" any small-scale features which do not exist in the image and the existing retrieval is visually realistic. However, given the absence of any high-resolution "ground-truth" topographic data, it is not feasible to directly measure the correctness of the MADNet 2.0 HiRISE DTMs with respect to "true elevations" or to judge if the veryfine-scale 3D features are correct. We are aware that there exist photoclinometric HiRISE DTMs at similar spatial resolution and photogrammetric DTMs derived from rover stereo imagery at even higher spatial resolution (for very small areas up to a range of 5m around stopping points along the rover traverse). However, both photoclinometry and photogrammetry have their own issues/artefacts (e.g., overshoot/undershoot with photoclinometric DTMs, matching inaccuracy and camera model/triangulation uncertainties with photogrammetry), and consequently, the resultant photoclinometric and photogrammetric heights cannot be considered as sufficiently accurate even when they could have higher spatial resolution. Therefore, these DTMs cannot be used to justify the "true correctness" of the MADNet 2.0 DTMs. In other words, the closer the deep learning-based DTMs are to the photoclinometric or photogrammetric DTMs, doesn't necessarily mean that the deep learning-based DTMs are closer to "the truth".
Nevertheless, the profile and DTM difference measurements presented in this section should provide the readers with an insight into the "upper bound" and "lower bound" of the differences between the MADNet 2.0 HiRISE DTMs, the PDS HiRISE DTMs, and the referencing CTX DTM. We perform these profile measurements using the last five areas that were demonstrated in Section 3.1, i.e., area-F (in Figure 21), area-G (in Figure 22), area-H (in Figure 23), area-I (in Figure 24), and area-J (in Figure 25). For each of the five areas, four DTM profiles are measured, including one longer profile to demonstrate the differences between the HiRISE DTMs and the referencing CTX DTMs (i.e., large-scale topography), and three shorter profiles to detail the differences between the PDS HiRISE DTMs and the MADNet 2.0 HiRISE DTMs (i.e., small-scale topography). Note that if the profile is too short (less than 75 m) for the three shorter profiles, the CTX profiles are excluded in order to show more details for the differences of the higher-resolution HiRISE DTMs. We observe that in general the three measured DTMs are well correlated with each other but show different levels of detail. For large-scale topography, the MADNet 2.0 HiRISE DTMs and reference CTX DTM have differences between ±2 m (for area-F), between −3 m and +48 (for area-G, wherein the reference MADNet 1.0 CTX DTM has failed to retrieve the small hills), between −8 m and +6 m (for area-H), between −1 m and +2 m (for area-I), and between +3 m and +7 m (for area-J, wherein this could either be a smallscale error for the CTX DTM or a large-scale error for the HiRISE DTM). For small-scale topography, we can observe that there are much more fine-scale details (e.g., craters and linear peaks) present in the profiles of MADNet 2.0 HiRISE DTMs. However, to decide whether the fine-scale details are artefacts or real surface features, we need to refer to the original HiRISE images. As previously discussed, most of the fine-scale features that we can observe from the MADNet 2.0 DTMs are also present in the DTM profiles and can be observed in the original HiRISE images. Even though we cannot define their accuracy, the height differences between the MADNet 2.0 HiRISE DTMs and PDS HiRISE DTMs are comparably minor. They are between ±1 m (for area-F), between −3 m and +5 m (for area-G), between −1 m and +1.5 m (for area-H), −0.9 m and +1.3 m (for area-I), and between ±1 m (for area-J). These height variations are reasonable for the small-scale features (e.g., small craters) that are shown on the original HiRISE image.    The difference maps between the MADNet 2.0 HiRISE DTMs and PDS HiRISE DTMs, as well as the difference maps between the MADNet 2.0 HiRISE DTMs and the reference CTX DTM, for the above discussed five areas (i.e., area-F, -G, -H, -I, -J; extents are slightly different due to re-cropping), are shown in Figure 26. Table 3 shows the mean and standard deviations of these difference maps. For the differences between the MADNet 2.0 HiRISE DTMs and the reference CTX DTM, we can observe that except for one outlier for area-G, wherein the CTX DTM has failed to show the topography of the hills, the rest of the four areas have shown a fairly good correlation with means less than 5 m and standard deviations less than 3 m. For the differences between the MADNet 2.0 HiRISE DTMs and PDS HiRISE DTMs, we can observe highly correlated means with standard deviations less than 0.5 m for four areas, i.e., area-F, -H, -I, -J, and a slightly higher standard deviations as 1.073 m for area-G, which is due to the fact that the quality of the reference CTX DTM is very poor in this area.  Table 3. Statistics (mean and standard deviations) of the DTM difference maps that are shown in Figure 26.

MADNet 2.0 without Photogrammetric Inputs
Previously, we demonstrated the MADNet 1.0 multi-resolution DTM products using HRSC, CTX and HiRISE [7], wherein we started from a reference MOLA-corrected HRSC photogrammetric DTM, i.e., the HRSC MC-11W level 5 DTM mosaic (available at http://hrscteam.dlr.de/HMC30/MC11W/, accessed on 15 October 2021). The 50 m/pixel HRSC photogrammetric DTM was employed because it could provide better large-scale topographic input for the 25 m/pixel MADNet 1.0 HRSC DTM, in comparison to using the 463 m/pixel MOLA DTM. Theoretically, the MOLA DTM can always be used as the reference DTM in MADNet 1.0 for any HRSC processing, just like we can use the 12 m/pixel MADNet 1.0 CTX DTM as the reference DTM for the HiRISE processing (the resolution gap between MOLA DTM and HRSC image is similar to the resolution gap between CTX DTM and HiRISE image). However, the results are expected to be more or less affected by the aforementioned issues (refer to Section 2.1). With MADNet 2.0, large resolution gaps between the input image and reference DTM can now be better handled, as demonstrated with the HiRISE examples shown in this paper. This can be similarly applied for the single-input-image HRSC DTM estimation using the global MOLA DTM as the reference input. Figure 27 shows an example of the 12.5 m/pixel MADNet 2.0 HRSC DTM result (using MOLA DTM as the reference), in comparison to the previous 25 m/pixel MADNet 1.0 HRSC DTM result [7] (using the HRSC MC-11W DTM as the reference). It should be noted that the referencing HRSC MC-11W DTM was 3D co-aligned with respect to the global MOLA areoid DTM on top of the standard official HRSC level 4 and level 5 DTM production procedure using bundle adjustment and sequential photogrammetric adjustment [70,71]-for details of the 3D co-alignment process, please refer to [7,68]. We can observe from Figure 27 that not only the large-scale topography is unaffected by using the MOLA DTM as the reference for MADNet 2.0, but also, the quality and resolution of the smallscale topography is greatly improved compared to the MADNet 1.0 result. For each single surface feature that appears on the input HRSC image, MADNet 2.0 is able to produce pixel-scale retrieval of surface topography. We would like to emphasise that the MADNet 2.0 HiRISE DTMs presented in this paper are only derived from single input HiRISE images. The CTX reference DTMs are solely used to rescale the "relative heights" into "absolute heights". The CTX reference DTMs themselves were produced with single input CTX images with an HRSC reference DTM, whilst the HRSC reference DTM was produced with a single input HRSC image using an existing (official) HRSC DTM mosaic as the reference. However, as demonstrated in this section, the HRSC reference DTM (for CTX processing) can also be produced using the MOLA DTM as the reference DTM. The MOLA DTM is also required for photogrammetry to adjust the inaccurate camera triangulation results into "absolute heights". This means with MADNet 2.0 we are not reliant on any existing DTM products produced from the same input source. Therefore MADNet 2.0 should not be considered as a "DTM refinement" method. The overall MADNet 2.0 DTM retrieval process can be achieved via a coarse-to-fine approach, starting from the MOLA reference DTM, a single input lowerresolution image (e.g., HRSC), a single input medium-resolution image (e.g., CTX/CaSSIS), and a single input high-resolution image (e.g., HiRISE).

Limitations and Future Work
There are two limitations to the proposed MADNet 2.0 system. Firstly, the coarse-tofine reconstruction module used in MADNet 2.0 is comparably time-consuming as it involves multiple stages of the 3D co-alignment processing, and consequently, MADNet 2.0 system is about 2-3 times slower in speed, in comparison to the MADNet 1.0 system. Even though the DTM inference process only takes a few minutes (using the Nvidia ® RTX3090 graphics processing unit) to complete for a full HiRISE scene (i.e., 6000-12,000 inference processes are needed), the associated tiling, geo-referencing, height-rescaling, 3D coalignment, and tile mosaicing processes require an extra few hours to complete. In particular, the 3D co-alignment process takes about 85% of the total processing time. In the future, one of the key targets is to explore deep learning-based methods for non-rigid 3D coalignment and merging of multi-scale DTMs in order to speed up the overall process. Secondly, having a higher resolution inference output (from the half resolution of the input image in MADNet 1.0 to the same resolution of the input image in MADNet 2.0) sometimes result in image noise and artefacts being interpreted as small-scale topographic features with the MADNet 2.0 system. An example of this can be found from Figure 2, wherein the HiRISE image shows some linear noise at the centre of the crater, and consequently, the higher resolution MADNet 2.0 DTM has captured this as linear topographic features (they are not very obvious, but are shown at the centre of the crater). In the future, either pre-denoising or post "depth completion" methods could be explored to resolve this issue.
It is always considered desirable to understand the accuracy of a high-resolution planetary DTM, especially for the DTMs that are produced by deep learning, which is much newer than photogrammetry or photoclinometry. However, the only ground truth available for Mars topography is the MOLA profile that has a very low spatial resolution (~330 m/pixel horizontally; ~1 m vertically) [11]. Consequently, measuring the accuracy of the resultant MADNet 2.0 HiRISE DTMs is not considered feasible (see discussions in Section 3.3) for the time being. In the absence of any high-resolution ground-truth of Mars topography, we can only measure the "relative accuracy" of the MADNet 2.0 HiRISE DTM via intercomparisons with DTMs built from different instruments (e.g., CTX, CaS-SIS, or rover/helicopter imagery for the robotic sites) or with DTMs built by different methods (e.g., photogrammetry [7,71,72] or photoclinometry [73][74][75][76]-for general discussions about the advantages and disadvantages of photogrammetry, photoclinometry, and deep learning-based DTM retrieval methods, please refer to [6]). In this paper, we demonstrate with profile measurements that the large-scale differences between the MADNet 2.0 HiRISE DTM and CTX reference DTM are considered minor, and also, the small-scale differences between the MADNet 2.0 HiRISE DTM and PDS HiRISE DTM are even smaller. In the future, we would plan to look at the rover/helicopter imagery of the landing sites of the robotic missions, to better understand the "relative accuracy" of the MADNet 2.0 DTMs. In addition, we plan to apply MADNet 2.0 on repeat HiRISE observations, in the future, to assess the robustness of the method with static features but also to pursue studies on topographic changes that were not considered feasible with non-simultaneous stereo. However, it should be noted that we were able to produce seamless DTM mosaics using overlapping MADNet 1.0 DTMs produced from different overlapping images (for HiRISE and CTX) [7], which to some extent demonstrated robustness of the original MAD-Net 1.0 system with repeat views.

Conclusions
In this paper, we proposed the MADNet 2.0 single-image DTM estimation system that improves on the MADNet 1.0 system to resolve several issues of the latter system and to achieve better DTM quality and effective resolution. A new training dataset is constructed using selected, publicly available PDS HiRISE DTMs and iMars CTX DTMs for the new MADNet 2.0 model. Quantitative assessments using RMSEs and mean SSIMs are provided for the HiRISE test dataset to show efficacy of the proposed MADNet 2.0 network, compared to the MADNet 1.0 network. Followed by these, we demonstrated the proposed MADNet 2.0 system with single-view HiRISE images over the ExoMars Rosalind Franklin rover's landing site at Oxia Planum. Visual comparisons and quantitative assessments are provided in comparison with the PDS HiRISE DTMs and the MAD-Net 1.0 HiRISE DTM mosaic product. The qualitative assessments demonstrate that MADNet 2.0 has effectively resolved the key issues from MADNet 1.0 and is capable of producing improved DTM quality. The quantitative assessments using crater size-counting and slanted-edge sharpness measurements demonstrate the effective resolution of the MADNet 2.0 DTM is very close to the effective resolution of the input image. In addition, DTM profile and difference measurements of the resultant MADNet 2.0 HiRISE DTMs, compared to the reference CTX DTM and the PDS HiRISE DTMs, have shown fairly good relative accuracy for both the large-scale and small-scale topography.