Hubble Meets Webb: Image-to-Image Translation in Astronomy

This work explores the generation of James Webb Space Telescope (JWST) imagery via image-to-image translation from the available Hubble Space Telescope (HST) data. Comparative analysis encompasses the Pix2Pix, CycleGAN, TURBO, and DDPM-based Palette methodologies, assessing the criticality of image registration in astronomy. While the focus of this study is not on the scientific evaluation of model fairness, we note that the techniques employed may bear some limitations and the translated images could include elements that are not present in actual astronomical phenomena. To mitigate this, uncertainty estimation is integrated into our methodology, enhancing the translation’s integrity and assisting astronomers in distinguishing between reliable predictions and those of questionable certainty. The evaluation was performed using metrics including MSE, SSIM, PSNR, LPIPS, and FID. The paper introduces a novel approach to quantifying uncertainty within image translation, leveraging the stochastic nature of DDPMs. This innovation not only bolsters our confidence in the translated images but also provides a valuable tool for future astronomical experiment planning. By offering predictive insights when JWST data are unavailable, our approach allows for informed preparatory strategies for making observations with the upcoming JWST, potentially optimizing its precious observational resources. To the best of our knowledge, this work is the first attempt to apply image-to-image translation for astronomical sensor-to-sensor translation.


Introduction
In this paper, we explore the problem of predicting the visible sky images captured by the James Webb Space Telescope (JWST), hereafter referred to as 'Webb' [1], using the available data from the Hubble Space Telescope (HST), hereinafter called 'Hubble' [2]. There is much interest in this type of problem in fields such as astrophysics, astronomy, and cosmology, encompassing a variety of data types and sources. This includes the translation of observations of galaxies in visible light [3] and predictions of dark matter [4]. The data registered from different sources may be acquired at different times, by different sensors, in different bands, with different resolutions, sensitivities, and levels of noise. The exact underlying mathematical model for transforming data between these sources is very complex and largely unknown. Thus, we will try to address this problem based on an image-to-image translation approach.
Despite the great success of image-to-image translation in computer vision, its adoption in the astrophysics community has been limited, even though there is a lot of data available for such tasks that might enable sensor-to-sensor translation, conversion between different spectral bands, and adaptation among various satellite systems.
Before the launch of missions such as Euclid [5], the radio telescope Square Kilometre Array [6], and others, there has been a significant interest in advancing image-to-image translation techniques for astronomical data to: (i) enable efficient mission planning due to the high complexity and cost of exhaustive space exploration, allowing for the prioritization of specific space regions using existing data; and (ii) generate sufficient synthetic data for machine learning (ML) analysis as soon as the first real images from new imaging missions are available in adequate quantities.
We focus on the images collected by both the Hubble and the Webb telescopes, taken at different times, as illustrated in Figure 1. Thus, we present our work as a proof-of-concept for image-to-image translation, aiming to predict Webb telescope images using those from Hubble. This technique, once validated, could inform the planning of future missions and experiments by enabling the prediction of Webb telescope observations from existing Hubble data.
We assume that, despite the time lapse between Hubble's and Webb's data acquisition, the astronomical scenes of interest have remained relatively stable, conforming to the slow-changing physics of the observed phenomena. However, there is a substantial disparity in the imaging technologies of the two telescopes, affecting not only resolution and signal-to-noise ratio but also the visual representation of the phenomena due to different underlying physical principles and the images being taken at various wavelengths.

[Figure 1: diagram of the translation setup. Labels: Hubble image (2.4 m), Webb image (6.5 m), Loss.]
Figure 1. Image-to-image astronomical setup under study. Given two imaging systems, Hubble and Webb, characterized by different bands, resolutions, orbits, and times of image acquisition, the problem is to predict Webb images $\tilde{x}$ as close as possible to the original Webb images $x$ from the Hubble ones $z$ using a learnable model $g_\theta$. The considered setup is paired but is characterized by inaccurate geometrical synchronization between the paired images.
Our study reveals that Hubble and Webb data are typically misaligned by approximately 3-5 pixels, a discrepancy mainly attributed to synchronization with respect to celestial coordinates during Webb's data pre-processing and to differing resolutions. Although this misalignment is subtle to the naked eye, we found that it significantly impairs the accuracy of paired image-to-image translation, highlighting the critical need for precise data alignment. To address this problem, we introduce two synchronization methods using computer vision keypoints and descriptors: (a) global synchronization applies a single affine transformation to the entire image; (b) local synchronization divides the image into patches and computes an individual affine transformation for each patch. We compare the impact of these synchronization methods on the performance of image-to-image translation against the provided synchronization with respect to celestial coordinates.
We compare several types of image-to-image translation methods: (i) fully paired methods such as Pix2Pix [7] and their variations; (ii) fully unpaired methods such as CycleGAN [8]; (iii) hybrid methods that can be used for fully paired setups, fully unpaired setups, or setups where part of the data is paired and part is unpaired, as advocated by the TURBO approach [9]; (iv) the Palette image-to-image translation method [11], based on denoising diffusion probabilistic models (DDPM) [10]. We investigate the influence of pairing and different types of synchronization for the above methods. We demonstrate that paired methods produce results superior to unpaired ones. At the same time, the paired methods Pix2Pix and TURBO are sensitive to the accuracy of synchronization. Local synchronization produces the most accurate translation results, according to several metrics of performance.
Furthermore, we show that DDPM models offer a high potential for uncertainty estimation in image-to-image translation, since they can produce multiple outputs for one input. This stochastic translation enabled us to establish the regions that appear to be very stable in each run and the ones that are characterized by high variability.
In summary, we run experiments for image-to-image translation on non-synchronized, globally synchronized, and locally synchronized Hubble-Webb pairs. We report the results using multiple metrics: MSE, SSIM [12], PSNR, LPIPS [13], and FID [14]. We use computer vision-based metrics since we are working with telescope images represented as RGB images.
The main focus of this paper is not on the scientific inquiry into the fairness of predictive models. We acknowledge that our results, generated through the image-to-image translation technique, are subject to limitations inherent to such approaches. The data and methods utilized may not be exhaustive or infallible, and the results should therefore be interpreted with caution, as they are not immune to inaccuracies and may contain hallucinated elements which do not correspond to real astronomical phenomena.
Therefore, to enhance the integrity of the image-to-image translation provided in this study, we incorporate uncertainty estimation into our methodology. This feature is designed to assist astronomers by delineating areas within the translated images where the model's predictions are reliable from those where the certainty of prediction remains questionable. Such delineation is crucial in guiding astronomers to discern between regions of high confidence and those that require further scrutiny or could potentially mislead them.
The proposed approach, with its ability to estimate uncertainty, may serve as an instrumental tool for planning future astronomical experiments. In scenarios where observational data from the Webb telescope are not yet available, our model can offer predictive insights based on existing Hubble Space Telescope data. This capability acts as a provisional glimpse into the future, enabling researchers to strategize upcoming observations with the Webb telescope, potentially optimizing the allocation of its valuable observational time.
Our contributions include: (i) the introduction of image-based synchronization for astrophysics data in view of image-to-image translation problems; (ii) a comparison of the image-to-image translation methods for Hubble to Webb translation, and a study of the effect of synchronization on different models; (iii) the introduction of an innovative way of estimating uncertainty in probabilistic inverse solvers or translation methods based on denoising diffusion probabilistic models. In summary, our main contribution is the demonstration of the potential of deep learning-based image-to-image translation in astronomical imaging, exemplified by Hubble to Webb image translation.

Comparison between Webb and Hubble Telescopes
In Figures 2 and 3, the same part of the sky captured by the Hubble and Webb telescopes is shown in the RGB format. The main differences between the Hubble and Webb telescopes are:
(i) Spatial resolution: The Webb telescope, featuring a 6.5-m primary mirror, offers superior resolution compared to Hubble's 2.4-m mirror, which is particularly noticeable in infrared observations [15]. This enables Webb to capture images of objects up to 100 times fainter than Hubble, as evident in the central spiral galaxy in Figure 3.
(ii) Wavelength coverage: Hubble, optimized for ultraviolet and visible light (0.1 to 2.5 microns), contrasts with Webb's focus on infrared wavelengths (0.6 to 28.5 microns) [16]. While this differentiation allows Webb to observe more distant and fainter celestial objects, including the earliest stars and galaxies, it is crucial to note that the IR emission captured by Webb differs inherently from the UV or visible light observed by Hubble. The distinction between Hubble and Webb lies not solely in resolution or sensitivity but also in the varying absorption of light by dust within different galaxy types. However, our proposed image-to-image translation method does not aim to delve into these observational differences. Instead, our focus is to explore whether image-to-image translation can effectively simulate Webb telescope imagery based on the existing data from Hubble. This approach seeks to leverage the available Hubble data to anticipate and interpret the observations that Webb might deliver, without directly analyzing the spectral and compositional differences between the images captured by the two telescopes.
(iii) Light-collecting capacity: Webb's substantially larger mirror provides over six times the light-collecting area compared to Hubble, essential for studying longer, dimmer wavelengths of light from distant, redshifted objects [15]. This is exemplified in Webb's images, which reveal smaller galaxies and structures not visible in Hubble's observations, highlighted in yellow in Figure 3.

Image-to-Image Translation
Image-to-image translation [19] is the task of transforming an image from one domain to another, where the goal is to learn the mapping between an input image and an output image. Image-to-image translation methods have shown great success in computer vision tasks, including transferring different styles [20], colorization [21], super-resolution [22], visible to infrared translation [23], and many others [24]. There are two types of image-to-image translation methods: unpaired [25] (sometimes called unsupervised) and paired [26]. Unpaired setups do not require fixed pairs of corresponding images, while paired setups do. In this paper, we also introduce a hybrid method for image-to-image translation, called TURBO [9], which generalizes the above-mentioned paired and unpaired setups and comes with an information-theoretic interpretation. For the completeness of our study, we also consider newly introduced denoising diffusion probabilistic models (DDPM) as image-to-image translation models [11].

Image-to-Image Translation in Astrophysics
Image-to-image translation has been used in astrophysics for galaxy simulation [3], but so far these methods have mostly been applied to denoising [27] of optical and radio astrophysical data [28]. The task of predicting the images of one telescope from another using image-to-image translation remains largely under-researched.

Metrics
The following metrics were used to evaluate the quality of the generated images:

• Mean square error (MSE) between the original and the generated Webb images;
• Structural Similarity Index (SSIM) [12]: since the MSE is not highly indicative of the perceived similarity of images, we also calculate the SSIM between the original and generated Webb images;
• Fréchet Inception Distance (FID): proposed in [14]. Instead of a simple pixel-by-pixel comparison of images, FID compares the mean and covariance of the activations of one of the deep layers of a pretrained convolutional neural network, computed over the original and the generated images. It has become one of the most widely used metrics for the image-to-image translation task;
• Peak Signal-to-Noise Ratio (PSNR): this metric evaluates the quality of the generated images by comparing the maximum possible power of a signal (original images) to the power of the distortion (the error between the original and generated images). PSNR is often used as a measure of reconstruction quality in image compression and restoration tasks;
• Learned Perceptual Image Patch Similarity (LPIPS): proposed in [13]. LPIPS measures the perceptual similarity between images by using deep features extracted from a pretrained neural network. It is designed to better reflect human perception of image similarity compared to traditional metrics like MSE or PSNR.
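The pixel-based metrics in the list above can be sketched in a few lines of NumPy. This is only an illustration, not the evaluation code used in the study: the function names are hypothetical, the default `max_val=255.0` assumes 8-bit RGB images, and `global_ssim` is a simplified single-window variant (the standard SSIM averages the same expression over local windows). FID and LPIPS require pretrained networks and are not sketched here.

```python
import numpy as np


def mse(x, x_gen):
    """Mean square error between an original and a generated image."""
    x = np.asarray(x, dtype=np.float64)
    x_gen = np.asarray(x_gen, dtype=np.float64)
    return float(np.mean((x - x_gen) ** 2))


def psnr(x, x_gen, max_val=255.0):
    """Peak signal-to-noise ratio in dB (max_val=255 assumes 8-bit images)."""
    err = mse(x, x_gen)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / err))


def global_ssim(x, x_gen, max_val=255.0):
    """Simplified SSIM computed over the whole image as a single window."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(x_gen, dtype=np.float64)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x = np.mean((x - mu_x) ** 2)
    var_y = np.mean((y - mu_y) ** 2)
    cov = np.mean((x - mu_x) * (y - mu_y))
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)
```

Lower MSE and higher PSNR and SSIM indicate a generated image that is closer to the original.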

Dataset
We use images from the Hubble and Webb telescopes as the dataset. In particular, we use images of Galaxy Cluster SMACS 0723 [29]. An example of the image is shown in Figure 2. For Webb, we use post-processed NIRCam images [30], available as RGB images, provided by ESA/NASA/STScI. Webb images are available publicly at [17]. We then select the corresponding Hubble images [18]. Since the Hubble images are smaller than Webb images, we upsampled them using bicubic interpolation for comparison purposes.

Image Registration
Image registration or synchronization is needed to ensure that pixels in different data sources represent the same position in observed space. Even though astronomical data are generally synchronized, there is always room for synchronization improvement, especially at the local level. In this section, we compare three synchronization setups for Hubble to Webb translation: synchronization with respect to celestial coordinates, algorithmic or automated global synchronization, and local synchronization, as schematically shown in Figure 4.
Synchronization with respect to celestial coordinates. In this setup, the data are used directly with the provided synchronization with respect to celestial coordinates.
Global synchronization. The data are synchronized using SIFT [31] feature descriptors and the RANSAC [32] matching algorithm. The feature descriptors are computed for the entire image from both the Hubble and Webb telescopes.
Local synchronization. The data are synchronized using SIFT feature descriptors and the RANSAC matching algorithm, with the feature descriptors being computed from image patches. Specifically, input images from both the Hubble and Webb telescopes are divided into a 3 × 3 grid of nine patches before the cropping process.
The non-synchronized and synchronized Webb and Hubble images can be viewed in our demo: hubble-to-webb.herokuapp.com (accessed on 8 February 2024).
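To make the difference between global and local synchronization concrete, the sketch below estimates affine transforms with a basic RANSAC loop in NumPy. It is an illustration under stated assumptions, not the pipeline used in the study: it assumes that matched keypoint coordinates (for example from SIFT descriptors) are already available as `(x, y)` arrays `src` and `dst`, and all function names, the inlier tolerance, and the iteration count are hypothetical choices.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix A with dst ~= src @ A[:, :2].T + A[:, 2]."""
    design = np.hstack([src, np.ones((src.shape[0], 1))])  # (n, 3)
    sol, *_ = np.linalg.lstsq(design, dst, rcond=None)     # (3, 2)
    return sol.T                                           # (2, 3)


def apply_affine(A, pts):
    """Apply a 2x3 affine matrix to an (n, 2) array of (x, y) points."""
    return pts @ A[:, :2].T + A[:, 2]


def ransac_affine(src, dst, n_iter=200, tol=1.0, seed=0):
    """Global synchronization: one affine transform, robust to bad matches."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        A = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(apply_affine(A, src) - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("not enough inliers to fit an affine transform")
    # Refit on all inliers of the best hypothesis.
    return fit_affine(src[best], dst[best]), best


def local_affines(src, dst, height, width, grid=3, **ransac_kwargs):
    """Local synchronization: one affine transform per cell of a grid x grid split."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    rows = np.minimum((src[:, 1] * grid // height).astype(int), grid - 1)
    cols = np.minimum((src[:, 0] * grid // width).astype(int), grid - 1)
    transforms = {}
    for r in range(grid):
        for c in range(grid):
            mask = (rows == r) & (cols == c)
            if mask.sum() >= 3:  # an affine fit needs at least 3 matches
                transforms[(r, c)] = ransac_affine(src[mask], dst[mask], **ransac_kwargs)[0]
    return transforms
```

With `grid=3`, `local_affines` returns up to nine transforms, one per patch, which can then be used to warp each patch separately.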

TURBO: Mathematical Interpretation
The TURBO framework [9] is based on an auto-encoder (AE) structure and is represented by an encoder $q_\phi(z|x)$ and a decoder $p_\theta(x|z)$ that are deep networks parametrized by the parameters $\phi$ and $\theta$, respectively. A block diagram for the TURBO system is shown in Figure 5. According to the framework we used, given a pair of data samples (Hubble and Webb images) $(x, z) \sim p(x, z)$, where $z$ is a Hubble image and $x$ is a Webb image, the system maximizes the mutual information between $x$ and $z$ for both encoder and decoder in direct and reverse paths.
Two approximations of the joint distribution can be defined as follows:

$$q_\phi(x, z) := q_\phi(z|x)\, p(x), \tag{1}$$

$$p_\theta(x, z) := p_\theta(x|z)\, p(z). \tag{2}$$

The marginal distributions are approximated through reparametrizations involving unknown networks. These are represented as $\tilde{q}_\phi(z) = \int q_\phi(x, z)\, dx$ and $\tilde{p}_\theta(x) = \int p_\theta(x, z)\, dz$, relating to the synthetic variables in latent spaces. Furthermore, in our work, we also utilize two approximated marginal distributions for the reconstructed synthetic variables, denoted as $\hat{q}_\phi(z) = \int \tilde{p}_\theta(x)\, q_\phi(z|x)\, dx$ and $\hat{p}_\theta(x) = \int \tilde{q}_\phi(z)\, p_\theta(x|z)\, dz$.
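The variational approximation is considered for the direct path of the TURBO system based on the maximization of two bounds on mutual information for the latent space and the reconstruction space:

$$I(X; Z) \geq I^{z}_{\phi}(X; Z) := \mathbb{E}_{p(x, z)}\left[\log q_\phi(z|x)\right] - D_{\mathrm{KL}}\left(p(z) \,\|\, \tilde{q}_\phi(z)\right), \tag{3}$$

$$I(X; \tilde{Z}) \geq I^{x}_{\phi,\theta}(X; \tilde{Z}) := \mathbb{E}_{p(x)} \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - D_{\mathrm{KL}}\left(p(x) \,\|\, \hat{p}_\theta(x)\right). \tag{4}$$

Thus, the network is trained to maximize a weighted sum of (3) and (4) in order to find the best parameters $\phi$ and $\theta$ of the encoder and the decoder, respectively. This is achieved in the direct path by minimizing the loss $\mathcal{L}_{\mathrm{direct}}$, represented by the left network shown in Figure 5:

$$\mathcal{L}_{\mathrm{direct}}(\phi, \theta) = L_{\tilde{z}}(z, \tilde{z}) + D_{\tilde{z}}(z, \tilde{z}) + \lambda_D L_{\hat{x}}(x, \hat{x}) + \lambda_D D_{\hat{x}}(x, \hat{x}), \tag{5}$$

where $z$ is the real Hubble image, $x$ is the real Webb image, $\tilde{z}$ is the predicted Hubble image generated by $q_\phi(z|x)$ from the real Webb image $x$, $\hat{x}$ is the Webb image reconstructed from the generated Hubble image $\tilde{z}$, $L_{\tilde{z}}(z, \tilde{z})$ is the reconstruction loss between the real and generated Hubble images, $D_{\tilde{z}}(z, \tilde{z})$ is the discriminator loss for the generated Hubble images, $L_{\hat{x}}(x, \hat{x})$ is the cycle reconstruction loss between the real and reconstructed Webb images, $D_{\hat{x}}(x, \hat{x})$ is the discriminator loss for the reconstructed Webb images, and $\lambda_D$ is a parameter controlling the trade-off between the terms in (3) and (4).

The variational approximation for the reverse path is:

$$I(X; Z) \geq I^{x}_{\theta}(X; Z) := \mathbb{E}_{p(x, z)}\left[\log p_\theta(x|z)\right] - D_{\mathrm{KL}}\left(p(x) \,\|\, \tilde{p}_\theta(x)\right), \tag{6}$$

$$I(\tilde{X}; Z) \geq I^{z}_{\phi,\theta}(\tilde{X}; Z) := \mathbb{E}_{p(z)} \mathbb{E}_{p_\theta(x|z)}\left[\log q_\phi(z|x)\right] - D_{\mathrm{KL}}\left(p(z) \,\|\, \hat{q}_\phi(z)\right). \tag{7}$$

The reverse path loss $\mathcal{L}_{\mathrm{reverse}}(\phi, \theta)$ is represented by the right network shown in Figure 5:

$$\mathcal{L}_{\mathrm{reverse}}(\phi, \theta) = L_{\tilde{x}}(x, \tilde{x}) + D_{\tilde{x}}(x, \tilde{x}) + \lambda_R L_{\hat{z}}(z, \hat{z}) + \lambda_R D_{\hat{z}}(z, \hat{z}), \tag{8}$$

where $\tilde{x}$ is a Webb image generated by $p_\theta(x|z)$ from a real Hubble image $z$, $\hat{z}$ is a Hubble image reconstructed from the generated Webb image $\tilde{x}$, $L_{\tilde{x}}(x, \tilde{x})$ is the reconstruction loss between the real and generated Webb images, $D_{\tilde{x}}(x, \tilde{x})$ is the discriminator loss for the generated Webb images, $L_{\hat{z}}(z, \hat{z})$ is the cycle reconstruction loss between the real and reconstructed Hubble images, $D_{\hat{z}}(z, \hat{z})$ is the discriminator loss for the reconstructed Hubble images, and $\lambda_R$ is a parameter controlling the trade-off between (6) and (7).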
A detailed derivation and analysis of TURBO can be found in [9]. The TURBO method is versatile and adaptable to various setups. It supports a fully paired configuration, utilizing direct and reverse path losses, provided above, which are applicable when data pairs are fully accessible during training. In cases where such pairs are unavailable for training, an unpaired configuration is viable. Additionally, a mixed setup can be employed, combining both paired and unpaired data. This method imposes no constraints on the architecture of the encoder and decoder, offering a broad range of architectural choices.
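As a rough illustration of how the direct and reverse path losses described above are assembled from their four terms each, the following NumPy sketch evaluates the generator-side losses for one paired sample. It is not the TURBO implementation from [9]: the function names and the `discs` dictionary are hypothetical, the reconstruction terms use the mean absolute error, the discriminator terms use the generator side of the LSGAN objective, and applying the weights `lambda_d` and `lambda_r` to both cycle terms is an assumption of this sketch.

```python
import numpy as np


def l1(a, b):
    """Mean absolute error, used here for reconstruction and cycle terms."""
    return float(np.mean(np.abs(a - b)))


def lsgan_generator_term(disc, fake):
    """Generator-side LSGAN term: push discriminator scores on fakes towards 1."""
    return float(np.mean((disc(fake) - 1.0) ** 2))


def turbo_losses(x, z, encoder, decoder, discs, lambda_d=1.0, lambda_r=1.0):
    """Return (direct, reverse) path losses for one Webb/Hubble pair (x, z).

    encoder maps Webb -> Hubble, decoder maps Hubble -> Webb, and discs is a
    dict of four discriminator callables keyed by the image they score.
    """
    # Direct path: x -> z_tilde (generated Hubble) -> x_hat (reconstructed Webb).
    z_tilde = encoder(x)
    x_hat = decoder(z_tilde)
    direct = (
        l1(z, z_tilde)
        + lsgan_generator_term(discs["z_tilde"], z_tilde)
        + lambda_d * (l1(x, x_hat) + lsgan_generator_term(discs["x_hat"], x_hat))
    )
    # Reverse path: z -> x_tilde (generated Webb) -> z_hat (reconstructed Hubble).
    x_tilde = decoder(z)
    z_hat = encoder(x_tilde)
    reverse = (
        l1(x, x_tilde)
        + lsgan_generator_term(discs["x_tilde"], x_tilde)
        + lambda_r * (l1(z, z_hat) + lsgan_generator_term(discs["z_hat"], z_hat))
    )
    return direct, reverse
```

Setting `lambda_r=0` and using only the reverse value leaves a reconstruction term plus a discriminator term on the generated Webb image, which is the Pix2Pix-like configuration.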

Paired Setup: Pix2Pix as Particular Case of TURBO
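The Pix2Pix [7] image-to-image translation method can be viewed as a paired case of the TURBO approach with only the reverse path, where $\lambda_R = 0$ in the reverse path loss, which yields (9):

$$\mathcal{L}_{\mathrm{Pix2Pix}}(\theta) = L_{\tilde{x}}(x, \tilde{x}) + D_{\tilde{x}}(x, \tilde{x}). \tag{9}$$

Thus, the direct path is not used in the training of the encoder-decoder pair, and Pix2Pix uses the deterministic decoder $\tilde{x} = g_\theta(z)$.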

Unpaired Setup: CycleGAN as Particular Case of TURBO
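The CycleGAN [8] image-to-image translation method can be viewed as a particular case of the TURBO approach, with both a direct and a reverse path, with cycle reconstruction losses and discriminator losses for predicted images:

$$\mathcal{L}_{\mathrm{CycleGAN}}(\phi, \theta) = D_{\tilde{z}}(z, \tilde{z}) + \lambda_D L_{\hat{x}}(x, \hat{x}) + D_{\tilde{x}}(x, \tilde{x}) + \lambda_R L_{\hat{z}}(z, \hat{z}). \tag{10}$$

In comparison to TURBO, CycleGAN does not have paired components in the latent space.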

Denoising Diffusion Based Image-to-Image Translation
Conditional denoising diffusion probabilistic models [10] for image-to-image translation apply a denoising process that is conditioned on the input image [11]. Image-to-image diffusion models are conditional models of the form $p_\theta(x|z)$, where $x$ is a generated Webb image, and $z$ is a Hubble image, used as a condition. In fact, the DDPM models are derived from the Variational Autoencoder [33] with the decomposition of the latent space of $z$ as a hierarchical Markov model $z_T \to z_{T-1} \to \cdots \to z_0$ [34].
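In practice, the conditional image is concatenated to the input noisy image. During training, detailed in Algorithm 1, we use a simple DDPM training loss (11):

$$L_{\mathrm{simple}}(\theta) = \mathbb{E}_{x_0, z, \epsilon, t}\left[\left\| \epsilon - \epsilon_\theta\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; z,\; t\right) \right\|^2\right], \tag{11}$$

where $x_0$ is the Webb image, $z$ is the input Hubble image used for conditioning, $\epsilon \sim \mathcal{N}(0, I)$ is zero-mean, unit-variance Gaussian noise added at step $t$, $\epsilon_\theta$ is the conditional DDPM, and $\bar{\alpha}_t$ is the noise scale parameter at step $t$.

In the inference phase of the conditional denoising diffusion probabilistic model, detailed in Algorithm 2, the model starts with an initial noisy sample $x_T$ from a Gaussian distribution $\mathcal{N}(0, I)$; then, the model utilizes a learned denoising function $\epsilon_\theta$, which incorporates the conditioning Hubble image $z$, to iteratively denoise the image at each timestep $t$. The image is updated according to (12):

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, z, t)\right) + \sigma_t\, \epsilon, \tag{12}$$

where $\epsilon$ is sampled from Gaussian noise, $\alpha_t = 1 - \beta_t$ with $\bar{\alpha}_t = \prod_{s \le t} \alpha_s$, and $\sigma_t$ is the standard deviation of the noise injected at step $t$. This denoising process is repeated for $T$ steps until the final image $x_0$ is obtained.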
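The training loss and the iterative denoising loop described above can be sketched with NumPy as follows. This is a minimal illustration rather than the Palette implementation: `eps_model` stands for any callable `eps_model(x_t, z, t)` that returns a noise estimate, the schedule endpoints mirror a linear beta schedule, and the choice $\sigma_t = \sqrt{\beta_t}$ is one common option assumed here, not something confirmed by the text.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-6, beta_end=0.01):
    """Linearly spaced noise variances beta_1..beta_T."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0, t, alpha_bar, eps):
    """Forward diffusion: noisy image x_t obtained from x_0 in one shot."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def training_loss(eps_model, x0, z, t, alpha_bar, rng):
    """Simple DDPM loss: squared error between true and predicted noise."""
    eps = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, alpha_bar, eps)
    return float(np.mean((eps - eps_model(x_t, z, t)) ** 2))


def sample(eps_model, z, betas, rng):
    """Ancestral sampling conditioned on the Hubble image z."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(z.shape)  # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        # No noise is added at the last step (t == 0).
        noise = rng.standard_normal(z.shape) if t > 0 else np.zeros(z.shape)
        eps_hat = eps_model(x, z, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        x = mean + np.sqrt(betas[t]) * noise  # sigma_t = sqrt(beta_t), an assumption
    return x
```

Calling `sample` several times with different random generators and the same `z` gives different outputs for one input, which is the property exploited for uncertainty estimation.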

Uncertainty Estimation
In this section, we show how denoising diffusion probabilistic models can be used for the prediction of uncertainty maps. By design, DDPMs are stochastic generators at each sampling step, so it is possible to generate multiple predictions for the same input. The ensemble of predictions allows us to compute the pixel-wise deviation maps that visualize the uncertainty of the predictions. In Figure 6, we display the true uncertainty map $U$, computed from the target Webb image $x$, the $i$-th predicted Webb images $\tilde{x}_i$, and the averaged predicted image $\bar{x}$ estimated from the $\tilde{x}_i$, where $N$ is the number of generated images. In our experiments, we have used 100 generations to compute the estimated uncertainty map $\hat{U}$.

The uncertainty map can be used for analyzing and evaluating the DDPM results by indicating the regions of low and high variability as a measure of uncertainty in each experiment. It is remarkable that this approach is very discriminating for the different types of space objects: point objects (shown in Figures 7-9), galaxies (shown in Figure 8), and stars (shown in Figure 9). Furthermore, we have found that the method is able to detect the presence of point source objects in the estimated uncertainty maps, while such objects were not usually directly detectable in the Hubble images or in the predicted Webb images (highlighted with orange boxes in Figures 7 and 9). The point sources that were not present in the Hubble images were not completely predicted in the Webb images when considering these images independently. However, the use of an uncertainty map allowed us to spot their presence in the uncertainty maps, which are highlighted with red boxes in the above-mentioned figures.

To further evaluate the performance, we introduce the Peak Signal-to-Uncertainty Ratio (PSUR), computed as

$$\mathrm{PSUR} = 10 \cdot \log_{10}\left(\frac{\mathrm{MAX}_x^2}{\mathrm{mean}(\hat{U})}\right) \ \mathrm{dB},$$

where $\mathrm{MAX}_x$ is the maximum possible pixel value of the image. This metric, analogous to PSNR but using the uncertainty map instead of MSE, offers a measure of how distinguishable the true signal is from the uncertainty inherent in the prediction process. We compute the PSUR value for every uncertainty map shown in Figures 6-9.
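A possible way to compute such maps from an ensemble of predictions is sketched below. The exact definitions of the maps are not reproduced here, so the sketch makes explicit assumptions: the estimated map is taken as the pixel-wise mean squared deviation of the predictions around their average, and the reference map as the pixel-wise mean squared deviation from the target. These choices are consistent with PSUR being analogous to PSNR, but they remain assumptions, as do the function names and the default `max_val=255.0`.

```python
import numpy as np


def uncertainty_maps(samples, target=None):
    """Pixel-wise uncertainty from N stochastic predictions of one input.

    samples: array of shape (N, H, W) or (N, H, W, C).
    Returns (mean prediction, estimated map, reference map or None).
    """
    samples = np.asarray(samples, dtype=np.float64)
    mean_pred = samples.mean(axis=0)
    # Assumed estimated map: spread of the predictions around their mean.
    estimated = np.mean((samples - mean_pred) ** 2, axis=0)
    reference = None
    if target is not None:
        # Assumed reference map: spread of the predictions around the target.
        reference = np.mean((samples - np.asarray(target, dtype=np.float64)) ** 2, axis=0)
    return mean_pred, estimated, reference


def psur(estimated, max_val=255.0):
    """Peak signal-to-uncertainty ratio in dB for an estimated uncertainty map."""
    mean_u = float(np.mean(estimated))
    if mean_u == 0.0:
        return float("inf")  # all predictions identical
    return float(10.0 * np.log10(max_val ** 2 / mean_u))
```

Regions with large values in the estimated map are those where repeated sampling disagrees, so a higher PSUR corresponds to a more stable ensemble.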

Implementation Details
We use the PyTorch 1.12 [35] deep learning framework in all our experiments.
Data. We use crops from Hubble and Webb images of size 256 × 256 pixels in each experiment. All of the images used in training and validation are available at github.com/vkinakh/Hubble-meets-Webb (accessed on 8 February 2024). We apply random horizontal and vertical flipping to each pair of Hubble-Webb images as augmentation.
Pix2Pix and CycleGAN. In the experiments with Pix2Pix and CycleGAN, we use a convolutional architecture consisting of two convolutional layers for downsampling, nine residual blocks, and two transposed convolutional layers for upsampling for both the encoder and decoder. As discriminators, we use PatchGAN [7] with LSGAN loss [36], as provided in the original implementations. During training, we use an Adam [37] optimizer with a learning rate of $2 \times 10^{-4}$ and a linear learning rate policy with decay every 50 steps. Each model is trained for 100 epochs with a batch size of 64. For the experiments, we have used an NVIDIA RTX 2080Ti GPU.
TURBO. In the experiments with TURBO [9], we use the same convolutional architectures for the encoder and decoder as in the Pix2Pix and CycleGAN experiments. TURBO consists of two convolutional generators: the first, $q_\phi(z|x)$, generates Hubble images from Webb ones, and the second, $p_\theta(x|z)$, generates Webb images from Hubble ones. We use four PatchGAN [7] discriminators: one for generated Webb samples $D_{\tilde{x}}(\tilde{x})$, one for reconstructed Webb samples $D_{\hat{x}}(\hat{x})$, one for generated Hubble images $D_{\tilde{z}}(\tilde{z})$, and one for reconstructed Hubble images $D_{\hat{z}}(\hat{z})$. Alternatively, the TURBO model can use only two discriminators: the first, $D_x$, for generated and reconstructed Webb images, and the second, $D_z$, for generated and reconstructed Hubble images. The results using two discriminators are shown in the ablation study in Table 1. As estimation and cycle losses, we use the $\ell_1$ metric. We use the LSGAN discriminator loss [36], as in the Pix2Pix and CycleGAN experiments. Similarly, we use the Adam optimizer with a learning rate of $2 \times 10^{-4}$ and a linear learning rate policy with decay every 50 steps. The model is trained for 100 epochs with a batch size of 64. For the experiments, we have used an NVIDIA RTX 2080Ti GPU.
DDPM. During training, we use a linear beta schedule with 2000 steps, a start value of $10^{-6}$, and an end value of 0.01. During inference, we use a DDPM scheduler with 1000 steps, a start value of $10^{-6}$, and an end value of 0.01. The model is trained for 1000 epochs with a batch size of 32. For the experiments, we have used an NVIDIA A100 GPU.
During inference, since our images exceed 256 × 256 pixels, we employ a method known as stride prediction to predict patches of size 256 × 256 using a selected stride value. This method works systematically across the image: starting from the top-left corner at position (0, 0), we predict the first patch, then move horizontally by stride $s$ to predict the next, proceeding row by row until the entire image is covered. When the right edge is reached, the next row begins back at the left edge, one stride below the previous row; the process ends once the bottom edge is reached. After predicting all patches, we accumulate the predictions and track the prediction count for each pixel. The final pixel value is determined by averaging across all predictions for that pixel, ensuring a seamless image reconstruction.
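The stride prediction procedure described above can be sketched in NumPy as follows. This is an illustrative version, not the code used in the experiments: `predict_patch` is a hypothetical callable that maps a patch to a prediction of the same shape, and adding one extra window flush with the right and bottom borders, so that every pixel is covered when the stride does not divide the image size, is an assumption of this sketch.

```python
import numpy as np


def window_starts(size, patch, stride):
    """Start offsets of sliding windows, with a final window flush to the border."""
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)
    return starts


def stride_predict(image, predict_patch, patch=256, stride=64):
    """Predict a large image patch by patch and average overlapping predictions."""
    image = np.asarray(image, dtype=np.float64)
    height, width = image.shape[:2]
    if height < patch or width < patch:
        raise ValueError("image must be at least as large as one patch")
    accum = np.zeros(image.shape, dtype=np.float64)
    # Per-pixel prediction count, broadcastable over optional channel axes.
    count = np.zeros((height, width) + (1,) * (image.ndim - 2), dtype=np.float64)
    for top in window_starts(height, patch, stride):
        for left in window_starts(width, patch, stride):
            tile = image[top:top + patch, left:left + patch]
            accum[top:top + patch, left:left + patch] += predict_patch(tile)
            count[top:top + patch, left:left + patch] += 1.0
    return accum / count
```

A smaller stride increases the number of predictions averaged per pixel at the cost of proportionally more model evaluations.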
Table 1. Ablation studies on paired models Pix2Pix and TURBO on locally synchronized data. All results are obtained on Galaxy Cluster SMACS 0723. The label "TURBO same D" corresponds to the approach in which the same discriminator is used for generated and reconstructed Webb and Hubble images. The label "LPIPS" denotes adding a perceptual similarity loss.

Results
In this section, we report image-to-image translation results for the prediction of Webb telescope images based on Hubble telescope images. In Table 2, we report results for four setups: (a) unpaired setup; (b) paired setup with the synchronization with respect to celestial coordinates, where images were synchronized by hand; (c) paired setup with global synchronization, where the full image was synchronized using a single affine transform; and (d) paired setup with local synchronization, where the images were split into multiple patches and then each of the Hubble and Webb patches were synchronized individually. For each setup, we have defined a training set that covers approximately 80% of the input image of the galaxy cluster SMACS 0723, and the rest is used as a validation set for results. We make sure that the training and validation sets cover different parts of the sky and never overlap, even for a single pixel. When generating images for evaluation, since the validation images are larger than 256 × 256, we have used the stride prediction described above with a stride of four.

It is shown in Table 2 that the synchronization of the data is very important, as all of the considered models perform best when the data are locally synchronized. This fact was not well addressed in previous studies, to the best of our knowledge. Also, we show that the DDPM-based image-to-image translation model outperforms the CycleGAN, Pix2Pix, and TURBO models in terms of MSE, SSIM, PSNR, FID, and LPIPS metrics. The only downside of the DDPM model is its inference time, which is several hundred times longer than the inference time of Pix2Pix, CycleGAN, and TURBO (Table 3). This might be a serious limitation in practice, considering the size and number of astronomical images.

In Table 3, we compare parameter counts and inference times for a 256 × 256 image from the models considered in the study. The DDPM model is particularly noteworthy for its extensive parameter count, with both trainable and inference parameters reaching 62.641 million. It also necessitates 1000 generation steps, contributing to a longer inference time of approximately 42.77 s. Conversely, Pix2Pix, CycleGAN, and TURBO demonstrate a more streamlined parameter structure. These models employ generators with a uniform parameter count of 11.378 million and discriminators with 2.765 million parameters. Pix2Pix operates with one generator and one discriminator, CycleGAN with two of each, and TURBO with two generators and four discriminators. Despite the architectural differences, these models maintain compact trainable parameters, ranging from 14.143 million to 33.816 million, and achieve notably swift inference times of around 0.07 s. The inference time is averaged over 100 generations for each model on a single RTX 2080 Ti GPU with a batch size of one.

In Table 1, we perform ablation studies on the paired TURBO and Pix2Pix image-to-image translation models. We compare these models trained under various conditions: (a) with the $L_1$ loss, which is the mean absolute error, (b) with the $L_2$ loss, which is the mean squared error, (c) with both $L_1$ and $L_2$ losses, (d) with the $L_1$ loss and the Learned Perceptual Image Patch Similarity (LPIPS) loss using a VGG encoder [13]. We also explore Pix2Pix configurations, such as Pix2Pix with the $L_1$ loss plus a discriminator and Pix2Pix combined with the LPIPS loss and a VGG encoder, along with variations of the TURBO model: TURBO with the LPIPS loss, TURBO operating only in the reverse path, and TURBO using the same discriminator for both generated and reconstructed images. Models are trained and evaluated on locally synchronized data. As Table 1 indicates, Pix2Pix models and those without a discriminator perform better on paired metrics (MSE, PSNR, SSIM), whereas TURBO-based methods excel in image quality metrics (LPIPS, FID). Notably, the DDPM-based image-to-image translation method outperforms the other methods discussed in the ablation study.
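The parameter counts and timings quoted above can be cross-checked with simple arithmetic. The snippet below is only a consistency check under the assumption that the trainable parameter count of each GAN-based model is the plain sum of its generators and discriminators; the constant and function names are hypothetical.

```python
GENERATOR_M = 11.378      # parameters per generator, in millions
DISCRIMINATOR_M = 2.765   # parameters per discriminator, in millions


def trainable_params_m(n_generators, n_discriminators):
    """Total trainable parameters in millions, assuming counts simply add up."""
    total = n_generators * GENERATOR_M + n_discriminators * DISCRIMINATOR_M
    return round(total, 3)


# (generators, discriminators) per model, as described in the text.
MODELS = {"Pix2Pix": (1, 1), "CycleGAN": (2, 2), "TURBO": (2, 4)}
PARAMS_M = {name: trainable_params_m(*cfg) for name, cfg in MODELS.items()}

# Ratio of the quoted inference times for one 256 x 256 image.
DDPM_SLOWDOWN = 42.77 / 0.07
```

Under this assumption the totals are 14.143 million for Pix2Pix and 33.816 million for TURBO, matching the quoted range, and the ratio of the quoted inference times is roughly 600.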

Conclusions
In this paper, we have proposed the use of image-to-image translation approaches for sensor-to-sensor translation in astrophysics for the task of predicting Webb images from Hubble images. The novel TURBO framework serves as a versatile tool that outperforms existing GAN-based image-to-image translation methods, offering better quality in generated Webb telescope imagery and information-theoretic explainability. Furthermore, the application of DDPM for uncertainty estimation introduces a probabilistic dimension to image translation, providing a robust measure of reliability previously unexplored in this context. We show the importance of synchronization in paired image-to-image translation approaches.
This research not only paves the way for improved astronomical observations by leveraging advanced computational techniques but also advocates for the application of these methods in other domains where image translation and uncertainty estimation are crucial. As we continue to venture into the cosmos, the methodologies refined here will undoubtedly become instrumental in interpreting and maximizing the utility of the data we collect from advanced telescopes.

Future Work
Our future research will refine and enhance the methodologies discussed in this paper. A particular focus will be on improving the TURBO model, which, while computationally efficient, currently lags behind DDPM in terms of performance. TURBO model improvement will mostly focus on architectural improvements of the generators. In parallel, we plan to undertake a thorough investigation into the resilience of our applied methods against various data preprocessing techniques, including different forms of interpolation. This study aims to ensure the robustness and adaptability of our models across a spectrum of data manipulation scenarios. Moreover, we will explore existing sampling techniques for DDPMs with the goal of reducing inference times. This is expected to significantly improve the models' efficiency, rendering them more suitable for real-time applications.
The current research specifically focuses on the analysis of RGB pseudocolor images. A significant portion of our future work will be dedicated to the meticulous training and evaluation of the proposed models on raw astrophysical data. This will involve the integration of specialized astrophysical metrics designed to align with the unique properties of such data, thereby assuring that our models are not only statistically sound but also meet the practical demands and intricacies of astrophysical research. We aspire to bridge the gap between theoretical robustness and real-world applicability, setting the stage for transformative developments in the field of image-to-image translation in astrophysical data analysis.
Funding: This research was partially funded by the SNF Sinergia project (CRSII5-193716) Robust deep density models for high-energy particle physics and solar flare analysis (RODEM).

Data Availability Statement:
The code and data used in the study can be accessed at the public repository github.com/vkinakh/Hubble-meets-Webb (accessed on 8 February 2024). The experimental results can be accessed at hubble-to-webb.herokuapp.com (accessed on 8 February 2024).

Conflicts of Interest:
The authors declare no conflicts of interest.

Figure 4. Synchronization setups under investigation in paired image-to-image translation problems: synchronization with respect to celestial coordinates; global synchronization, when images are matched via a global affine transform A; and local synchronization, when images are divided into local blocks and matched via a set of local affine transforms A_i, 1 ≤ i ≤ T.
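The difference between the global and local setups in this caption can be illustrated with a small sketch. It only shows how one transform A versus a set of per-block transforms A_i would be applied to pixel coordinates. The matrix values, the block size, and all function names are hypothetical, and how the transforms are estimated is not shown.

```python
def apply_affine(point, A):
    """Map an (x, y) point through a 2x3 affine matrix A = [[a, b, tx], [c, d, ty]]."""
    x, y = point
    (a, b, tx), (c, d, ty) = A
    return (a * x + b * y + tx, c * x + d * y + ty)


def split_into_blocks(height, width, block):
    """Return the (top, left) corners of the blocks that tile the image."""
    return [(top, left)
            for top in range(0, height, block)
            for left in range(0, width, block)]


# Global synchronization: a single transform A for the whole image.
# Hypothetical values: a pure shift by (+2, -1) pixels.
A_GLOBAL = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]

# Local synchronization: one transform A_i per block, 1 <= i <= T.
blocks = split_into_blocks(512, 512, 256)
T = len(blocks)
# Hypothetical per-block shifts that drift slightly from block to block.
A_LOCAL = [[[1.0, 0.0, 2.0 + 0.5 * i], [0.0, 1.0, -1.0]] for i in range(T)]

point = (10.0, 20.0)  # a coordinate measured relative to its block's corner
global_result = apply_affine(point, A_GLOBAL)
local_results = [apply_affine(point, A_i) for A_i in A_LOCAL]
```

With a single A, every block receives the same correction. With per-block A_i, the same relative coordinate is corrected differently in each block, which is what allows local synchronization to absorb misalignments that vary across the field.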

Figure 7. Uncertainty map for point sources: (a) target Webb image; (b) predicted Webb image; (c) true uncertainty; (d) estimated uncertainty. Point sources that were missed and for which there is no sign in the uncertainty map are highlighted with a red box. Point sources that were missed but for which there is a sign in the uncertainty map are highlighted with an orange box. The estimated PSUR: 26.72 dB.

Figure 9. Uncertainty map for the star: (a) target Webb image; (b) predicted Webb image; (c) true uncertainty; (d) estimated uncertainty. Point sources that were missed and for which there is no sign in the uncertainty map are highlighted with a red box. The point sources are missed in the predicted Webb image, but there is a sign of one of them in the uncertainty map, which means it was present in some of the predictions. The estimated PSUR: 24.44 dB.

Table 2. Hubble-to-Webb results. All results are obtained on a validation set of the galaxy cluster SMACS 0723.

Table 3. Analysis of parameter complexity and inference time in image-to-image translation models.