Multi-scale Reconstruction of Turbulent Rotating Flows with Generative Diffusion Models

We address the problem of data augmentation in a rotating turbulence set-up, a paradigmatic challenge in geophysical applications. The goal is to reconstruct information in two-dimensional (2D) cuts of the three-dimensional flow fields, imagining spatial gaps within each observed 2D slice. We evaluate the effectiveness of different data-driven tools based on diffusion models (DMs), a state-of-the-art generative machine learning protocol, and generative adversarial networks (GANs), previously considered the best-performing method both in terms of point-wise reconstruction and the statistical properties of the inferred velocity fields. We focus on two different DMs recently proposed in the specialized literature: (i) RePaint, based on a heuristic strategy that guides an unconditional DM for flow generation using partial measurement data, and (ii) Palette, a conditional DM trained for the reconstruction task with paired measured and missing data. A systematic comparison shows that (i) DMs outperform the GAN in terms of the mean squared error and/or the statistical accuracy; (ii) the Palette DM emerges as the most promising tool in terms of both point-wise and statistical metrics. An important property of DMs is their capacity for probabilistic reconstruction, providing a range of predictions based on the same measurements and enabling uncertainty quantification and risk assessment.


Italy

a) Electronic mail: alessandrasabina.lanotte@cnr.it

I. INTRODUCTION
In atmospheric and oceanic forecasting, the accurate estimation of the system state from incomplete observations is a challenging task [1][2][3][4][5]. These environments, often characterized by turbulent dynamics, require effective reconstruction techniques to overcome the common problem of temporally or spatially gappy measurements. The challenge arises from factors such as instrument sensitivity, the natural sparsity of observational data, and the absence of direct information, as e.g. in the case of deeper ocean layers [6][7][8][9][10]. Established data assimilation techniques such as variational methods 11,12 and ensemble Kalman filters 13,14 effectively merge time-series observations with model dynamics to attack the inverse problem. When measurements are limited to a single time point, gappy proper orthogonal decomposition (POD) 15 and extended POD 16 deal with spatially incomplete data by exploiting pre-trained statistical relationships between measurements and missing information for data augmentation. These POD-based methods are widely used in fluid mechanics [17][18][19] and geophysical fluid dynamics 20,21 to reconstruct flow fields.
POD-based methods are fundamentally linear, yielding reconstructions with smooth flow properties associated with the few leading POD modes. In the context of turbulent flows, this implies that POD-like methods primarily emphasize large-scale structures 22,23. In recent years, machine learning has led to an increasing number of successful applications in reconstruction tasks for simple and idealized fluid mechanics problems (see 24 for a brief review). We mention super-resolution applications (i.e. recovering high-resolution flow fields from low-resolution data) [25][26][27], inpainting (i.e. reconstructing flow fields with spatial damage) 23,28, and inferring volumetric flows from surface or two-dimensional (2D)-section measurements [29][30][31]. However, much remains to be clarified concerning benchmarks and challenges, and this is even more important for realistic turbulent setups and at increasing flow complexity, e.g. at increasing Reynolds numbers. When dealing with turbulent systems, the quality of reconstruction tasks must be judged according to two different objectives: (i) the point-wise error, given by the success in filling gappy or damaged regions of the instantaneous fields with data close to the ground truth, configuration by configuration; (ii) the statistical error, given by how well the multi-scale and multi-point statistical properties of the system, such as probability distribution functions (PDFs), spectra, etc., are reproduced.
To move from proof-of-concept to quantitative benchmarks, in a previous work 23 we systematically compared POD-based methods with generative adversarial networks (GANs) 32 using both point-wise and statistical reconstruction objectives for fully developed rotating turbulent flows, accounting for different gap sizes and geometries. GANs belong to the large family of generative models, i.e., machine learning algorithms that produce data according to a probability distribution optimized to resemble that of the training data. The learning task is performed by two networks that compete with each other: a first, generative network predicts the data in the gap from the input measurement to obtain a good point-wise reconstruction; then, to overcome the lack of expressiveness in the multi-scale (low energetic content) flow structures, a second, adversarial network, called the discriminator, is used to optimize the statistical properties of the generated data. Contrary to expectations, despite their non-linearity, GANs only matched the best linear POD techniques in point-wise reconstruction. However, GANs showed superior performance in capturing the statistical multi-scale non-Gaussian fluctuations of the three-dimensional (3D) turbulent flow 23.
From our previous comparative study, we also observed that GANs pose many challenges in the training process, due to the presence of instabilities and the need for hyper-parameter fine-tuning to achieve a suitable compromise in the multi-objective task. Furthermore, a common limitation of our GANs and POD-based methods is that they provide only a deterministic reconstruction.
This singular output contrasts with the intrinsic nature of turbulence reconstruction, which is a one-to-many problem with multiple plausible solutions. The ability to generate an ensemble of possible reconstructions is critical for practical atmospheric and oceanic forecasting, e.g., in relation to uncertainty quantification and risk assessment of rare, high-impact events [33][34][35].
More recently, diffusion models (DMs) 36 have emerged as a powerful generative tool, showing exceptional success in domains such as computer vision [36][37][38], audio synthesis 39, and natural language processing 40, notably outperforming GANs in image synthesis 38. Their applications have also extended to fluid dynamics for super-resolution 41, flow prediction 42 and Lagrangian trajectory generation 43. By introducing Markov chains to effectively generate data samples (see Section II B), DMs eliminate the need for the less stable adversarial training of GANs, making DMs generally more robust in the training stage. Another characteristic of DMs is their inherent stochasticity in the generation process, which allows them to produce multiple outputs that adhere to the learned distribution conditioned on the same input.
This study presents a first attempt to use DMs for the reconstruction of 2D velocity fields of rotating turbulence, a complex system characterized by both large-scale vortices and highly non-Gaussian and intermittent small-scale fluctuations [44][45][46][47][48]. Our objectives are twofold: first, we aim to make comprehensive comparisons with the best-performing GAN method from our previous research, and second, we aim to investigate the effectiveness of DMs in probabilistic reconstruction tasks. The paper is organized as follows: in Section II, we introduce the system under consideration and the two adopted strategies for flow reconstruction using DMs. The first is a heuristic conditioning method applied to an unconditional DM designed for flow generation, as demonstrated by RePaint 49. The second strategy uses a supervised approach, training a DM conditioned on measurements, similar to the Palette method 50,51. In Section III, we discuss the performance of the two DMs in point-wise and statistical property reconstruction, in comparison with the previously analyzed GAN method 22. In Section IV, we study the probabilistic reconstruction capacity of the DMs. We end with some comments in Section V.

A. Problem Setup and Data Preparation
This study adopts the same experimental framework as our previous work 23, and explores possible improvements from DMs. We set up a mock field-measurement, imagining to be able to obtain data from a gappy 2D slice of the original 3D volume of rotating turbulence, orthogonal to the axis of rotation. The full 2D image is denoted as (I), the support of the measured domain as (S), and the support of the gap where we miss the data as (G). Here (G) represents a centrally located square gap of variable size, as shown in Figure 1a. We use the TURB-Rot database 52, obtained from direct numerical simulation (DNS) of the incompressible Navier-Stokes equations for a rotating fluid in a 3D periodic domain, which can be written as

∂_t u + u · ∇u + 2Ω × u = −∇p + ν∇²u + f,   ∇ · u = 0,

where u is the incompressible velocity, Ω = Ω x̂_3 is the rotation vector, ν is the kinematic viscosity, and p represents the pressure modified by the centrifugal term. The regular, cubic grid has N^3 = 256^3 points. The statistically homogeneous and isotropic forcing f acts at large scales around k_f = 4, and it is the solution of a second-order Ornstein-Uhlenbeck process 53,54. For further details of the DNS, see 52; a sketch of the original 2D spectrum is also shown in Figure 1b.
Data were extracted from the DNS by sampling the full 3D velocity field (Figure 1a) during the stationary stage at intervals of ∆t_s = 5.41 T_L to reduce temporal correlation. We collected 600 early snapshots for training and 160 later snapshots for testing, with the two collections separated by over 3400 T_L to ensure independence. To manage the data volume while preserving complexity, the resolution of the sampled fields was reduced from 256^3 to 64^3 using Galerkin truncation in Fourier space, with the truncation wavenumber set to k_η. We then selected x_1-x_2 planes at different x_3 levels and augmented them by random shifts exploiting the periodic boundary conditions, resulting in a train/test split of 84,480/20,480 samples.
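The Galerkin truncation used for the downsizing can be illustrated with a minimal 2D sketch; the grid size and cutoff below are illustrative placeholders, not the DNS values:

```python
import numpy as np

def galerkin_truncate(field, k_cut):
    """Zero all Fourier modes with |k| > k_cut of a doubly periodic 2D field.
    A 2D stand-in for the 3D truncation used to reduce the DNS resolution."""
    n = field.shape[0]
    fhat = np.fft.fft2(field)
    k = np.fft.fftfreq(n, d=1.0 / n)              # integer wavenumbers 0..n/2-1, -n/2..-1
    kx, ky = np.meshgrid(k, k, indexing="ij")
    fhat[np.sqrt(kx**2 + ky**2) > k_cut] = 0.0    # Galerkin projection on |k| <= k_cut
    return np.real(np.fft.ifft2(fhat))
```

After the truncation, the field can be safely subsampled onto the coarser grid, since no energy remains above the cutoff wavenumber.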
For a baseline comparison, we use the best-performing GAN tailored for this setup in 23, which showed a point-wise error close to the best POD-based method and good multi-scale statistical properties. In our analyses, we focus only on the velocity magnitude, u(x_1, x_2) = ∥u(x_1, x_2)∥. Briefly, the GAN framework consists of two competing convolutional neural networks: the first network is a generator that transforms input measurements into predictions for the missing or damaged data; the second is a discriminator that works to distinguish generated data from real fields. The training of the generator minimizes a loss function consisting of a mean squared error (MSE) term and an adversarial loss provided by the discriminator, optimizing point-wise accuracy and statistical fidelity, respectively. A more detailed description of the GAN can be found in 23.

B. DM Framework for Flow Field Generation
Before moving to the more difficult task of inpainting a gap conditioned on partial measurements of each given image, we need to define how to generate unconditional flow realizations.
Unlike GANs, which map input noise to outputs in a single step, DMs use a Markov chain to incrementally denoise and generate information through a neural network, see Figure 1c.


In this section, we introduce the DM framework for flow field generation. The velocity magnitude field on the full 2D domain (I) is denoted by V_I = {u(x) | x ∈ I}, and the distribution of this field is represented as p(V_I). In order to train the model, we first need to produce a set of images with larger and larger noise. To do that, the DM framework defines a forward process, or diffusion process, that incrementally adds Gaussian noise to the data until it becomes indistinguishable from white noise after N diffusion steps (Figure 2). This set of diffused images is used for training a network to perform a backward denoising process, starting from the set of pure i.i.d. Gaussian-noise 2D realizations and trying to reproduce the set of images in the training data-set. Once the training is accomplished, one freezes the parameters of the network and uses it to generate brand new images by sampling from any realization of pure random images in the input (see Figure 3a for a sketch summary). The forward diffusion process is expressed in terms of a sequence of N steps, conditioned on the original set of images, i.e. for each image in the training data set we produce N noisy copies with an increasing amount of diffusion:

q(V^(1:N)_I | V^(0)_I) = ∏_{n=1}^{N} q(V^(n)_I | V^(n−1)_I),   q(V^(n)_I | V^(n−1)_I) = N(√(1−β_n) V^(n−1)_I, β_n I),

where V^(0)_I = V_I is the initial magnitude field and {β_n} is the variance schedule.
Each step, n = 1, . . ., N, of the forward process can be directly obtained as

V^(n)_I = √(ᾱ_n) V^(0)_I + √(1 − ᾱ_n) ε,   ε ∼ N(0, I),   ᾱ_n = ∏_{m=1}^{n} (1 − β_m),

which implies sampling from a Gaussian distribution whose mean is the rescaled original field. New samples are then generated starting from pure white noise, V^(N)_I ∼ N(0, I), through a backward process (see Figure 3a) described by

p_θ(V^(n−1)_I | V^(n)_I) = N(μ_θ(V^(n)_I, n), σ_n² I),

where it is important to notice that the stochasticity in the process allows for the production of different final images even when starting from the same noise. In the continuous diffusion limit, characterized by sequences of small values of β_n, the backward process has a functional form identical to that of the forward process, as discussed in 56,57. Consequently, the neural network is tasked with predicting the mean μ_θ(V^(n)_I, n). The neural network is optimized to minimize an upper bound of the negative log likelihood.

REPAINT. The goal of RePaint is to set up a generative process where this backward probability is also conditioned on some measured data, denoted as V_S. In this way, each new sample in the backward direction is generated from the one-step backward conditioned probability, defined as p_θ(V^(n−1)_I | V^(n)_I, V_S). In summary, at any generic backward step n, RePaint approximates the conditional backward probability by composing the new state as

V^(n)_I = V^(n)_I |_G + V^(n)_S.

Here, V^(n)_I |_G represents the projection of the sample generated by the backward process at step n inside the gap region (the central square), while V^(n)_S is the measured data diffused forward to the matching noise level n, see Figure 3c.
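The closed-form forward sampling and the role of the cumulative products ᾱ_n can be sketched as follows; the schedule values below are illustrative assumptions, not the settings of our runs:

```python
import numpy as np

def linear_schedule(N, beta_1=1e-4, beta_N=0.02):
    """Linear variance schedule beta_n and cumulative products abar_n = prod(1 - beta_m)."""
    betas = np.linspace(beta_1, beta_N, N)
    return betas, np.cumprod(1.0 - betas)

def forward_sample(v0, n, alpha_bar, rng):
    """Jump directly from V^(0) to step n: V^(n) = sqrt(abar_n) V^(0) + sqrt(1 - abar_n) eps."""
    eps = rng.standard_normal(v0.shape)
    return np.sqrt(alpha_bar[n - 1]) * v0 + np.sqrt(1.0 - alpha_bar[n - 1]) * eps
```

For large n, ᾱ_n → 0, so V^(n) approaches pure white noise regardless of the starting field.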
The propagation of information from the measurements into the gap happens thanks to the application of the non-linear (and non-local) function approximated by the U-Net employed in the DM. Hence, the output of the U-Net, describing the probability of moving from step n to n − 1, is the result of non-local convolutions mixing information in the two regions (S) and (G). In this way, the model mitigates the discontinuities generated across the gap by merging the generated and the measured data. Furthermore, to allow a deeper propagation of information, improving correlations between the measurements and the generated data, RePaint employs a resampling strategy 49. The idea of resampling, as shown schematically in Figure 3d, is that each sample at step n − 1, extracted from the conditioned probability p_θ(V^(n−1)_I | V^(n)_I, V_S), is not directly used as input to move backward to step n − 2; instead, it is first propagated forward for j steps (by adding more noise) before returning according to the conditioned backward process to step n − 1.
This operation gives the U-Net model the opportunity to iterate the propagation of information from the measured region into the gap. Resampling can be applied multiple times at different steps, resulting in a back-and-forth progression during the generation process, as opposed to a monotonic backward progression from n = N to n = 0. Further details, such as the network architecture and other parameters, can be found in Appendix B. As demonstrated in computer vision applications 49,[58][59][60], this strategy has the advantage of being easily generalizable to diverse tasks, such as free-form inpainting with arbitrary mask shapes. However, it introduces several new challenges in the design of such a convoluted generation protocol, which is trivial neither in its optimization nor in its implementation.
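One conditioned backward step of this kind can be sketched as below. This is a schematic, not our implementation: `denoise` is a placeholder for the trained U-Net mean estimate, and the mask/merging conventions are assumptions made for illustration.

```python
import numpy as np

def repaint_step(v_n, v0_meas, mask, n, betas, alpha_bar, denoise, rng):
    """One conditioned backward step, n -> n-1 (sketch). `denoise` stands in for
    the U-Net returning the mean of p_theta(V^(n-1)|V^(n)); `mask` is 1 in the
    gap (G) and 0 on the measured support (S)."""
    # Unconditional backward sample (kept only inside the gap)
    v_gen = denoise(v_n, n) + np.sqrt(betas[n - 1]) * rng.standard_normal(v_n.shape)
    # Measurements noised forward to the matching diffusion level n-1
    if n > 1:
        ab = alpha_bar[n - 2]
        v_meas = np.sqrt(ab) * v0_meas + np.sqrt(1 - ab) * rng.standard_normal(v_n.shape)
    else:
        v_meas = v0_meas
    # Merge generated gap content with the noised measurements on the support
    return mask * v_gen + (1 - mask) * v_meas

def resample(v_nm1, n, betas, rng):
    """A single forward re-noising jump, the building block of RePaint's
    back-and-forth resampling scheme."""
    return (np.sqrt(1 - betas[n - 1]) * v_nm1
            + np.sqrt(betas[n - 1]) * rng.standard_normal(v_nm1.shape))
```

In the full RePaint loop, `repaint_step` and `resample` alternate j times at each resampling point before the chain continues backward.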
PALETTE. An alternative approach to flow field reconstruction is to train the DM directly to learn the backward probability distribution conditioned on the measured data, p_θ(V^(n−1)_G | V^(n)_G, V_S). This method, called Palette, has been successfully used in various computer vision applications such as image-to-image translation tasks 50,51. The idea is to train a U-Net using the same strategy as any unconditioned DM, but giving the network as input the additional information coming from the measurements at every step during the diffusion process. This allows the model to learn during training how to use the information from the available data to achieve an optimal reconstruction inside the gap. In addition, unlike the RePaint method, Palette always uses the measured data without adding noise. In this way, the forward process is defined as in the pure generation case, but it takes place only within the gap region, while the data on the support, V_S, are frozen throughout the diffusion process and serve as an additional input to the model. A schematic summary of the Palette approach is shown in Figure 4. Once the DM is trained, since the reconstruction process is Markovian as in the standard generative DM, the conditional probability of the reconstructed field, p_θ(V^(0)_G | V_S), is obtained by iterating the one-step conditioned backward probability, starting from Gaussian noise V^(N)_G ∼ N(0, I). To facilitate the comparison with the GAN model implemented in our previous works 23,28, we trained a separate Palette model for each fixed mask size.
Let us stress that both methods are capable of training on a free-form mask 61. More details on Palette are given in Appendix B.
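A minimal sketch of Palette's reconstruction loop, with the measurements passed noise-free at every step; `denoise` again stands in for the conditional U-Net, and all names here are illustrative placeholders:

```python
import numpy as np

def palette_reconstruct(v_meas, mask, betas, denoise, rng):
    """Iterate the conditional backward process from pure noise V^(N)_G to V^(0)_G.
    Only the gap is diffused; `v_meas` enters every step as frozen conditioning."""
    N = len(betas)
    v = rng.standard_normal(v_meas.shape)            # V^(N)_G ~ N(0, I)
    for n in range(N, 0, -1):
        mu = denoise(v, v_meas, n)                   # conditional mean estimate
        noise = rng.standard_normal(v.shape) if n > 1 else 0.0
        v = mu + np.sqrt(betas[n - 1]) * noise       # sample p_theta(V^(n-1)|V^(n), V_S)
    return mask * v + (1 - mask) * v_meas            # stitch the gap into the support
```

Running this loop repeatedly with different noise realizations yields an ensemble of reconstructions conditioned on the same measurements, which is the basis of the probabilistic analysis in Section IV.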

A. Large-scale Information
To quantify the reconstruction error between the predicted velocity magnitude, u^(p)_G, and the true velocity magnitude, u^(t)_G, within the gap region, we introduce the normalized MSE as follows:

MSE(u_G) = ⟨∆_{u_G}⟩ / E_{u_G}.

Here ∆_{u_G} represents the spatially averaged L_2 error in the central, gappy region for a single flow configuration, and it is calculated as

∆_{u_G} = (1/A_G) ∫_G [u^(p)(x) − u^(t)(x)]² dx,

where A_G denotes the area of the gap. The averaging ⟨•⟩ is done over the test data set. The normalization factor, E_{u_G}, is defined as the product of the standard deviations of the predicted and true velocity magnitudes within the gap:

E_{u_G} = σ(u^(p)_G) σ(u^(t)_G),

where σ(u^(p)_G) is the standard deviation of u^(p)_G over the test set and σ(u^(t)_G) is similarly defined. This choice for the normalization term, E_{u_G}, ensures that predictions with significantly low or high energy levels will result in a large MSE.
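The normalized MSE above can be computed as in the following sketch, where the arrays hold the flattened gap pixels of each test configuration (the array layout is an assumption for illustration):

```python
import numpy as np

def normalized_mse(u_pred, u_true):
    """MSE(u_G) = <Delta_{u_G}> / E_{u_G}, with arrays of shape (n_configs, A_G)."""
    delta = np.mean((u_pred - u_true) ** 2, axis=1)  # spatially averaged L2 error per config
    norm = np.std(u_pred) * np.std(u_true)           # product of predicted/true std devs
    return np.mean(delta) / norm
```

Normalizing by the product of the two standard deviations penalizes predictions with anomalously low or high energy content, as intended.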
In our analysis, we use the Jensen-Shannon (JS) divergence to assess the distance between the PDF of a predicted quantity and the PDF of the true data. Specifically, the JS divergence applied to two distributions P(x) and Q(x) defined on the same sample space is

JSD(P∥Q) = (1/2) KL(P∥M) + (1/2) KL(Q∥M),

where M = (1/2)(P + Q) and

KL(P∥Q) = Σ_x P(x) log[P(x)/Q(x)]

is the Kullback-Leibler (KL) divergence. As the two distributions get closer, the value of the JS divergence becomes smaller, with a value of zero indicating that P and Q are identical.

On the other hand, RePaint has a larger MSE for all sizes compared to the other two methods, demonstrating the limitations of the RePaint approach in enforcing correlations between measurements and generated data without being specifically trained on a reconstruction problem like the other two approaches.
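For discrete (histogram-based) distributions, the two divergences above can be computed as in this small sketch:

```python
import numpy as np

def kl_div(p, q):
    """Kullback-Leibler divergence between discrete distributions (natural log),
    assuming q > 0 wherever p > 0."""
    m = p > 0
    return np.sum(p[m] * np.log(p[m] / q[m]))

def js_div(p, q):
    """Jensen-Shannon divergence: 0.5 KL(P||M) + 0.5 KL(Q||M), with M = (P+Q)/2."""
    mid = 0.5 * (p + q)
    return 0.5 * kl_div(p, mid) + 0.5 * kl_div(q, mid)
```

Unlike the KL divergence, the JS divergence is symmetric and bounded by log 2, which makes it convenient for comparing empirical PDFs.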
The red baseline, derived from predictions using randomly shuffled test data, represents the case where the predictions match the exact statistical properties, ⟨u_G⟩ and ⟨(u_G)²⟩, but lose all correlation with the measurements. We now examine the velocity magnitude PDFs as predicted by the different methods and compare them with the true data.
In Figure 5b we present the JS divergence between the predicted and true velocity magnitudes. First of all, it is important to highlight that all the JSD(u_G) values are well below 10^−2, suggesting that there is always a close match between the PDFs of the reconstructed and the true velocity magnitude. The agreement is also shown in Figure 6, where one can see the extremely good performance of all models in closely matching the PDFs of the generated velocity magnitude with the ground truth one.
Going back to the results presented in Figure 5b, it is possible to note that in the small gap region, l/l_0 ≤ 0.4, there is a monotonic behavior of the JS divergence, which tends to decrease as the gap increases. This can be interpreted by the fact that the main contribution to the JS divergence comes from statistical fluctuations in the PDF tails, which are less accurately estimated when the gap is small. This behavior is clearly visible in the results of the RePaint approach, which shows a monotonic decrease in the JS divergence over the whole range of gaps analyzed. The same effect is not visible in the other approaches in the range above l/l_0 = 0.5. The reason is probably that both GAN and Palette rely on different trainings to reconstruct different gap sizes, and the fluctuations due to the training convergence could be underestimated. The non-monotonicity is much more pronounced in the GAN results, as this approach is known to be less stable during training. The analysis shows that, contrary to the other two approaches, RePaint, trained on the pure generation task without any conditioning, is the best method to obtain a statistical representation of the true data.
In Figure 7, we compare the PDFs of the spatially averaged L_2 error, ∆_{u_G}, for different flow configurations. For small and medium gap sizes (Figure 7a,b), the PDFs of GAN and Palette closely match, whereas the PDF of RePaint, although similar in shape, exhibits a range with larger errors. For the largest gap size (Figure 7c), Palette is clearly the most accurate, predicting the smallest errors. Again, RePaint performs the worst, characterized by a peak at high error values and a broad error range.
Finally, Figure 8 provides a qualitative visual impression of the reconstruction of the instantaneous velocity magnitude field by the three adopted models. Differences between the methods become more evident when looking at the gradients of the reconstruction samples, as shown in Figure 11. For small gap sizes (Figure 11a), all three methods produce realistic predictions that correlate well with the original structure. However, for medium and large gap sizes (Figure 11b,c), only Palette is able to generate gradient structures that are well correlated with the ground truth. The better performance of DMs in capturing statistical properties is further demonstrated by a scale-by-scale analysis of the 2D energy spectrum obtained from the reconstructed fields,

E(k) = (1/2) Σ_{k ≤ |k| < k+1} û(k) û*(k).

Here, k = (k_1, k_2) denotes the horizontal wavenumber, û(k) is the Fourier transform of the velocity magnitude, and û*(k) is its complex conjugate. Direct comparisons of the spectra are shown in Figure 12a-c for three gap sizes. In Figure 12d-f, we plot the ratio of the reconstructed to the original spectra, denoted as E(k)/E^(t)(k). Deviations from unity in this ratio better highlight the wavenumber regions where the reconstruction is less accurate. While all methods produce satisfactory energy spectra, a closer examination of the ratio to the original energy spectrum reveals differences at the smallest scales. As a further probe of scale-by-scale, non-Gaussian fluctuations, we measure the flatness of the velocity increments,

F(r) = ⟨(δ_r u)⁴⟩ / ⟨(δ_r u)²⟩²,

where δ_r u = u(x + r) − u(x) and r = (r, 0), with ⟨•⟩ denoting the average over the test data and over x, for points x and x + r where only one, or both of them, are within the gap. The flatness calculated over the entire region of the original field is also shown for comparison. In Figure 13, the flatness results further confirm that RePaint and Palette consistently maintain their high-quality performance across all scales. In contrast, while GAN is effective at small gap sizes, it faces challenges in maintaining similar standards at small scales for medium and large gap sizes.
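The shell-summed 2D spectrum and the increment flatness can be estimated as in the following sketch, assuming a periodic unit box and an FFT normalized so that Parseval's identity holds per grid point:

```python
import numpy as np

def energy_spectrum(u):
    """E(k) = 1/2 sum over the shell k <= |k'| < k+1 of |u_hat(k')|^2."""
    n = u.shape[0]
    uh = np.fft.fft2(u) / n**2
    k = np.fft.fftfreq(n, d=1.0 / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2)
    return np.array([0.5 * np.sum(np.abs(uh[(kmag >= ki) & (kmag < ki + 1)]) ** 2)
                     for ki in range(n // 2)])

def flatness(u, r):
    """F(r) = <(delta_r u)^4> / <(delta_r u)^2>^2, increments along x_1 (periodic)."""
    du = np.roll(u, -r, axis=0) - u
    return np.mean(du**4) / np.mean(du**2) ** 2
```

A Gaussian field gives F(r) ≈ 3 at every scale; values growing above 3 at small r signal intermittent, fat-tailed velocity increments.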

IV. PROBABILISTIC RECONSTRUCTIONS WITH DMS
So far, we have analyzed the performance of the three models in the reconstruction of the velocity magnitude itself and of its statistical properties. In this section, we explore the probabilistic reconstruction capabilities of the DMs.

V. CONCLUSIONS

In this work, we have demonstrated the multi-scale reconstruction of 2D slices of 3D rotating turbulence using conditional DMs, surpassing the previous GAN-based approach.
The better performance of DMs over GANs stems from their iterative, denoising construction process, which builds up the prediction scale by scale, resulting in better performance across all scales. The inherent stochasticity of this iterative process yields a probabilistic set of predictions conditioned on the measurement, in contrast to the unique prediction of the GAN implemented here. Our study opens the way to further applications for risk assessment of extreme events and in support of various data assimilation methods. It is important to note that DMs are significantly more computationally expensive than GANs due to the iterative inference steps. Despite this, many efforts in the computer vision field have been devoted to accelerating this process 62,63. A promising avenue for future studies could focus on flows at higher Reynolds and Rossby numbers, close to the critical transition leading to the inverse energy cascade, a very complex turbulent scenario where both 3D and 2D physics coexist in a multi-scale environment.
The two stacks are connected by an intermediate module, which consists of two residual blocks sandwiching an attention block 65 (see Figure 3a). The model is trained using the AdamW optimizer 66 with a learning rate of 10^−4 over 2 × 10^5 iterations. In addition, an exponential moving average (EMA) strategy with a decay rate of 0.999 is applied to the model parameters. During the reconstruction phase, with a total of N = 2000 diffusion steps, the resampling technique is initiated at n = 990 and continues down to n = 0. Resampling is applied at every 10th step within this range, resulting in its application at 100 different points. At each point the resampling involves a jump size of j = 10, and this procedure is iterated 9 times per resampling point.
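Under one possible accounting of this schedule, the total number of backward U-Net evaluations can be counted as follows; the bookkeeping conventions here are our assumption, not a statement of the exact implementation cost:

```python
def backward_evaluations(N=2000, start=990, stride=10, jump=10, repeats=9):
    """Estimate backward U-Net calls: N base steps, plus, at each resampling point
    (n = start, start - stride, ..., 0), `repeats` extra passes that each redo
    `jump` backward steps after the forward re-noising jump."""
    points = start // stride + 1          # 990, 980, ..., 0  ->  100 points
    return N + points * repeats * jump
```

With the parameters quoted above this estimate gives 2000 + 100 × 9 × 10 evaluations, illustrating why resampling multiplies the inference cost of RePaint.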
In this process, we first uniformly sample n from 1 to N, and then uniformly sample ᾱ in the range from ᾱ_{n−1} to ᾱ_n. This approach allows Palette to use different noise schedules and total numbers of backward steps during inference. In fact, during reconstruction we use a total of 1000 backward steps with a linear noise schedule ranging from β_1 = 10^−4 to β_N = 0.09. The Adam optimizer 67 is used with a learning rate of 5 × 10^−5, training the model for approximately 720 to 750 epochs.

FIG. 1. (a) Visualization of the velocity magnitude from a three-dimensional (3D) snapshot extracted from our numerical simulations. The two velocity planes (in the x_1-x_2 directions) at the top and bottom of the integration domain show the velocity magnitude. In the 3D volume we visualize a rendering of the small-scale velocity filaments developed by the 3D dynamics. The gray square on the top level is an example of the damaged gap area, denoted as (G), while the support where the measurements are assumed to be available is denoted as (S); their union defines the full 2D image, (I) = (S) ∪ (G). A velocity contour around the most intense regions (∥u∥ > 6.35) highlights the presence of the quasi-2D columnar structures (almost constant along the x_3-axis), due to the effect of the Coriolis force induced by the frame rotation. (b) Energy spectra averaged over time. The range of scales where the forcing is active is indicated by the gray band. The dashed vertical line denotes the Kolmogorov dissipative wavenumber. The reconstruction of the gappy area is based on a downsized image on a grid of 64² collocation points, which corresponds to a resolution of the order of 1/k_η. (c) Sketch of the reconstruction protocol of a diffusion model (DM) in the backward phase (see later), which uses a Markov chain to progressively generate information through a neural network.

FIG. 2. Diagram of the forward process in the DM framework. Starting with the original field V_I^(0) = V_I, Gaussian noise is incrementally added over N diffusion steps, transforming the original 64² image into white noise on the same resolution grid, V_I^(N). V_I^(N) ∼ N(0, I) represents the final white-noise state, an ensemble of Gaussian images made of uncorrelated pixels with zero mean and unit variance. The notation V_I^(1:N) is used to denote the entire sequence of generated noisy fields, V_I^(1), ..., V_I^(N).
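The stepwise noising chain of Fig. 2 can be illustrated in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the number of steps and the linear variance schedule are assumptions (the actual schedule is given in Appendix B), and the initial field is a random stand-in for a standardized velocity-magnitude image.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1000                            # number of diffusion steps (assumed)
beta = np.linspace(1e-4, 0.02, N)   # assumed linear variance schedule beta_1..beta_N

# V_I^(0): a stand-in 64x64 "velocity magnitude" field, standardized
V = rng.standard_normal((64, 64)) * 0.5 + 1.0
V = (V - V.mean()) / V.std()

# Markov chain: V^(n) = sqrt(1 - beta_n) V^(n-1) + sqrt(beta_n) * eps
for n in range(N):
    eps = rng.standard_normal(V.shape)
    V = np.sqrt(1.0 - beta[n]) * V + np.sqrt(beta[n]) * eps

# after N steps the field is statistically indistinguishable from
# white noise: zero mean, unit variance, uncorrelated pixels
```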

FIG. 3. Schematic representation of the DM flow-field generation framework used by RePaint for flow reconstruction. Training stage: (a) the neural network architecture, U-Net^55, which takes a noisy flow field as input at step n and predicts a denoised field at step n − 1; (b) the scheme of the forward and backward diffusion Markov processes. The forward process (from right to left) incrementally adds noise over N steps, while the backward process (from left to right), modeled by the U-Net, iteratively reconstructs the flow field by denoising the noisy data. More details on the network architecture can be found in Appendix B. (c, d) Reconstruction stage starting from damaged fields with a square mask of variable size. (c) Conditioning the backward process with the measurement, V_S, involves projecting the noisy state of the entire 2D field, V_I^(n), onto the gap region, V_I^(n)|_G, and combining it with the noisy measurement, V_S^(n), obtained from the forward propagation of the measurements up to step n.

The variance of each forward step is β_n I. The variance schedule β_1, ..., β_N is predefined to allow a continuous transition to the pure Gaussian state. For more details on the variance schedule and other aspects of the DMs used in this study, see Appendix B. The DM trains a neural network to approximate the reverse process of Equation (3), denoted as p_θ(V_I^(n−1) | V_I^(n)). This approximation allows the generation of new velocity-magnitude fields from Gaussian noise, starting from the prior p(V_I^(N)) = N(0, I).
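Because each step adds independent Gaussian noise, the noisy state at any step n can be sampled directly from V_I^(0) through the cumulative product ᾱ_n = ∏_{m≤n}(1 − β_m). A sketch under the same assumed linear schedule (step count and schedule endpoints are illustrative, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(1)

N = 1000
beta = np.linspace(1e-4, 0.02, N)   # assumed linear schedule beta_1..beta_N
alpha_bar = np.cumprod(1.0 - beta)  # cumulative signal retention, alpha_bar_n

def noisy_state(V0, n):
    """Closed-form marginal: V^(n) = sqrt(abar_n) V^(0) + sqrt(1 - abar_n) eps."""
    eps = rng.standard_normal(V0.shape)
    return np.sqrt(alpha_bar[n - 1]) * V0 + np.sqrt(1.0 - alpha_bar[n - 1]) * eps

V0 = rng.standard_normal((64, 64))  # stand-in standardized field
mid = noisy_state(V0, N // 2)       # partially noised state
end = noisy_state(V0, N)            # essentially pure Gaussian noise

# the schedule interpolates continuously from signal (abar ~ 1)
# to the pure Gaussian state (abar ~ 0)
```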

This training objective tends to result in more stable training compared to the tailored loss functions used in GANs. For a detailed derivation of the loss function and insights into the training details, please refer to Appendix A.

C. Flow Field Data Augmentation with DMs: RePaint and Palette Strategies

REPAINT. The RePaint approach aims to reconstruct the missing information in the flow field using a DM that has been trained to generate the full 2D flow field from Gaussian noise, as described in the section above, without any conditioning on measured data and without relying on any further model training. To achieve the correct reconstruction, RePaint enforces the conditioning on the measurements only by redesigning an ad-hoc generation protocol^49. As discussed above, during training the DM learns to approximate the backward transition probability to step onto a sample V_I^(n−1) only from the knowledge of the sample obtained at the previous step, V_I^(n); hence, the DM models the one-step backward transition probability, p_θ(V_I^(n−1) | V_I^(n)), whereas the reconstruction task requires sampling conditioned on the measurements, p(V_I^(n−1) | V_I^(n), V_S). To achieve this goal, RePaint substitutes the DM model input, V_I^(n), with another 2D field, Ṽ_I^(n), given by the union of V_I^(n) projected to have support only inside the gap (G) and the measured data on the support (S) propagated to step n according to the forward process, namely, Ṽ_I^(n) = V_I^(n)|_G ∪ V_S^(n). Here V_S^(n) is the noisy version of the measured data (outside the square gap), obtained by a forward propagation of the measurements up to step n. At this point, Ṽ_I^(n), replacing V_I^(n), is given as input to the model and used to obtain the next sample at step n − 1, V_I^(n−1).
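One such substitution step can be sketched with a mask over the gap: the measured support (S) is re-noised to level n via the forward marginal, while the gap (G) keeps the current backward sample. All names, the mask geometry, and the schedule below are placeholder assumptions; the trained denoiser itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(2)
L = 64
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed schedule

# mask: 1 on the measured support (S), 0 inside the square gap (G)
mask = np.ones((L, L))
mask[20:44, 20:44] = 0.0

V_S = rng.standard_normal((L, L))   # stand-in measurements (only (S) is used)
V_n = rng.standard_normal((L, L))   # current backward sample V_I^(n)
n = 500

# forward-propagate the measurements to step n: V_S^(n)
eps = rng.standard_normal((L, L))
V_S_n = np.sqrt(alpha_bar[n - 1]) * V_S + np.sqrt(1.0 - alpha_bar[n - 1]) * eps

# RePaint substitution: Vtilde^(n) = V^(n)|_G  union  V_S^(n)|_S
V_tilde = mask * V_S_n + (1.0 - mask) * V_n

# V_tilde then replaces V^(n) as input to the trained denoiser,
# which yields the next backward sample V^(n-1)
```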

FIG. 5.

Figure 5a shows the MSE(u_G) as a function of the normalized gap size, l/l_0. Palette achieves an MSE comparable to that of the GAN for most gap sizes; only for the largest gap size, l/l_0 = 62/64, is the MSE of Palette significantly better than that of the GAN. On the other
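The point-wise metric can be illustrated as a mean squared error restricted to the gap region (G). The normalization by the variance of the true field below is an assumption of this sketch; the paper's exact definition of MSE(u_G) is given elsewhere in the text.

```python
import numpy as np

def mse_gap(u_pred, u_true, gap_mask):
    """MSE of the reconstruction, evaluated only inside the gap (G).

    gap_mask is 1 inside the gap, 0 on the measured support (S).
    Normalized by the variance of the true field (assumed convention).
    """
    diff2 = (u_pred - u_true) ** 2
    return diff2[gap_mask == 1].mean() / u_true.var()

rng = np.random.default_rng(3)
u_true = rng.standard_normal((64, 64))
gap = np.zeros((64, 64))
gap[12:52, 12:52] = 1.0             # a central square gap, l/l0 = 40/64

# a perfect reconstruction gives zero error regardless of the gap size
print(mse_gap(u_true, u_true, gap))  # 0.0
```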

FIG. 6. PDFs of the velocity magnitude in the missing region obtained from (a) GAN, (b) RePaint and (c) Palette for a square gap of variable size l/l_0 = 24/64 (triangle), 40/64 (cross), and 62/64 (diamond). The PDF of the true data over the whole region is plotted for reference (solid black line), and σ(u) is the standard deviation of the original data over the full domain.
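The statistical comparison in Fig. 6 rests on PDFs of the velocity magnitude with the axis standardized by the statistics of the original data over the full domain. A minimal histogram-based sketch (bin count and the stand-in data are assumptions, not the paper's setup):

```python
import numpy as np

def standardized_pdf(u, u_ref, bins=50):
    """Histogram-based PDF of u, on an axis standardized by the mean and
    standard deviation of the reference (original, full-domain) data."""
    z = (u - u_ref.mean()) / u_ref.std()
    pdf, edges = np.histogram(z, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, pdf

rng = np.random.default_rng(4)
u_full = np.abs(rng.standard_normal(64 * 64))  # stand-in |u| over the full domain
u_gap = u_full[:1200]                          # stand-in values in the missing region

x, p = standardized_pdf(u_gap, u_full)
width = x[1] - x[0]
# density=True makes the histogram integrate to one over its range
```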

FIG. 8. Examples of reconstruction of an instantaneous field (velocity magnitude) for a square gap of size (a) l/l_0 = 24/64, (b) l/l_0 = 40/64 and (c) l/l_0 = 62/64. The damaged fields are shown in the first column, while the second to fourth columns, circled by a red rectangle, show the reconstructed fields obtained from GAN, RePaint and Palette. The ground truth is shown in the fifth column.

FIG. 11. The gradient of the velocity-magnitude fields shown in Figure 8. The first column shows the damaged fields with a square gap of size (a) l/l_0 = 24/64, (b) l/l_0 = 40/64 and (c) l/l_0 = 62/64. Note that for the case l/l_0 = 62/64, the gap extends almost to the borders, leaving only a single vertical velocity line on both the left and right sides, where the original gradient field is missing. The gradients of the reconstructions from GAN, RePaint and Palette, shown in the second to fourth columns, are surrounded by a red rectangle for emphasis, while the fifth column shows the ground truth.

For the Palette model, the U-Net configuration uses [C, 2C, 4C, 8C] channels across its stages, with C set to 64. Each stage has two residual blocks. Attention mechanisms are implemented only in the intermediate module, with multi-head attention using 32 channels per head, as shown in Figure 4b. The model also incorporates a dropout rate of 0.2 for regularization. Following the approach in Refs. 39, 50, and 51, we train Palette by conditioning the model on the continuous noise level ᾱ instead of the discrete step index n; as a result, the loss function originally formulated in Equation (A12) is modified accordingly.

In the stationary state, with Ω = 8, the Kolmogorov dissipative wavenumber, k_η = 32, is chosen as the scale at which the energy spectrum begins to decay exponentially. An effective Reynolds number is defined as Re_eff = (k_0/k_η)^(−3/4) ≈ 13, with the smallest wavenumber k_0 = 1. The integral length scale is L = E / ∫ k E(k) dk ≈ 0.15 L_0, where L_0 = 2π is the domain length and E is the total energy; the integral time scale is defined correspondingly.

Both DMs are trained with a batch size of 256 on four NVIDIA A100 GPUs for approximately 24 hours. For the RePaint model, the U-Net stages from the highest to the lowest resolution (64 × 64 to 8 × 8) are configured with [C, 2C, 3C, 4C] channels, where C equals 128. Three residual blocks are used at each stage. Attention mechanisms, specifically multi-head attention with four heads, are implemented after each residual block at the 16 × 16 and 8 × 8 resolution stages, and also within the intermediate module (Figure
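The hyperparameters reported above can be collected side by side as a small configuration table. The key names below are illustrative only, not taken from the authors' code; the numerical values are those stated in the text.

```python
# Hyperparameters as reported in the text; key names are illustrative only
UNET_CONFIGS = {
    "RePaint": {
        "base_channels": 128,
        "channel_multipliers": [1, 2, 3, 4],  # stages from 64x64 down to 8x8
        "res_blocks_per_stage": 3,
        "attention": {"resolutions": [16, 8], "heads": 4},
    },
    "Palette": {
        "base_channels": 64,
        "channel_multipliers": [1, 2, 4, 8],
        "res_blocks_per_stage": 2,
        "attention": {"intermediate_only": True, "channels_per_head": 32},
        "dropout": 0.2,
        "conditioning": "continuous noise level alpha_bar",
    },
}

def stage_widths(name):
    """Channel width at each U-Net stage for the given model."""
    cfg = UNET_CONFIGS[name]
    return [cfg["base_channels"] * m for m in cfg["channel_multipliers"]]

# RePaint: [128, 256, 384, 512]; Palette: [64, 128, 256, 512]
print(stage_widths("RePaint"), stage_widths("Palette"))
```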