Review on stochastic approach to inflation

We present a review on the state-of-the-art of the mathematical framework known as stochastic inflation, paying special attention to its derivation and giving references for the readers interested on results coming from the application of the stochastic framework to different inflationary scenarios, especially to those of interest for primordial black hole formation. During the derivation of the stochastic formalism, we will emphasise two aspects in particular: the difference between the separate universe approach and the true long wavelength limit of scalar inhomogeneities and the generically non-Markovian nature of the noises that appear in the stochastic equations.


I. INTRODUCTION
Although cosmological inflation was first introduced as a possible solution to the hot Big Bang model problems [1,2], the study of vacuum fluctuations during this accelerated expansion of the universe also gives an explanation to the anisotropies observed in the Cosmic Microwave Background (CMB). The idea is that the different scales of fluctuations leave the observable universe during inflation, long after it, they reenter the observable universe at different times, being the scales reentering the cosmological horizon (horizon from now on) during the time of recombination the responsible for the CMB anisotropies. In fact, the almost gaussianity and scale invariance of the vacuum quantum fluctuations predicted by Slow Roll (SR) inflation has been remarkably well confirmed by the recent Plank satellite mission [3,4].
It is however true that, although inflation provides a causal mechanism to generate CMB anisotropies, the CMB is only accessible to a restricted range of scales, constraining the inflationary phase only during a limited time interval. In order to be accessible to smaller scales of inflation, that reenter the horizon before the recombination epoch, we must then seek for any other hint that help us to extend the time interval of inflation that we can constraint. Primordial Black Holes (PBHs) is one of those hints [5]. In addition to the fact that PBHs can probe the missing scales of inflation, they represent natural candidates not only for dark matter (DM) [6], but also as the seed of supermassive BHs at the center of massive galaxies [7] and even as the progenitors of some events that radiate the gravitational waves detected by the LIGO/VIRGO collaboration [8].
PBHs are expected to form if the amplitude of density perturbations from inflation is big enough such that when they reenter the horizon they will collapse into a Black Hole (BH). This is not the case for CMB scales, for which the SR approximation holds and where the amplitude of inhomogeneities is too small to form PBHs; however, for smaller scales, this amplitude is less constrained and it could grow until PBHs are possible to form, in this case, SR is not a good approximation and other inflationary regimes arise such as Ultra Slow Roll (USR) and Constant Roll (CR).
Since the over-densities responsible for the formation of PBHs must be large enough, they will be found at the tail of the probability distribution of density perturbations, which makes them exponentially rare [9]. At the same time, a large over-density at small scales can modify the large-scale dynamics of the universe in a non-perturbative way. Thus, in order to predict the abundance of PBHs, a precise statistical knowledge of inflationary perturbations is highly desirable.
The hope of the stochastic approach to inflation is that it incorporates quantum corrections to the inflationary dynamics in a non-perturbative way [10]. In this approach, wavelengths that are well outside the horizon are approximated in powers of spatial gradients rather than on amplitudes (as in linear theory). At the same time though, those modes are influenced by the quantum sector by receiving quantum-kicks from stochastic forces generated by the perturbative sub-horizon modes. The success of the stochastic formalism resides in the fact that it allows to reduce a quantum problem into a statistical one and it has been widely used in the literature .
Since this manuscript is mainly devoted to stochastic inflation, we will briefly review the long history behind it, which will also help us to understand why the paper is organized as it is.
Soon after Starobisnky first proposed the idea of treating the short-wavelength part of the field as a white noise which is included as a source term in the equation of motion for the classical (long-wavelength) field in [10], stochastic inflation immediately gained huge popularity [11][12][13][14][15][16][17][18][19][20][21][22][23][24], manly because of the possibility of solving the Fokker-Planck equation that govern the evolution of the probability distribution of the amplitude of the perturbations of the scalar field during inflation in an exact way for some specific form of the potential, or even for generic potentials with the help of the first time passage analysis [69] as firstly done in [24]. In the references above, the heuristic derivation of Starobinsky, which was only valid in single-field slow-roll inflationary models, was extended beyond SR and multifield inflation.
At the same time, Morikawa et al. put the stochastic formalism on a more firm ground [25,26], where the heuristic derivation of the equations of motion of stochastic inflation done by previous authors was refined. More concretely, they were able to derive the same stochastic equations integrating out the short-wavelength part of the scalar field in the path integral. Although this derivation lead to the same equations of motion as the heuristic argument followed by previous authors, it is very useful in order to understand what are the approximations that lead to the stochastic equations first presented by Starobinsky and its consequences.
The most important of these approximations is probably to compute the variance of the noises in a pure de-Sitter background. This approach has been vastly used in mort applications of the stochastic formalism until today. Examples of this can be found in [27, 29-33, 35, 37-39, 50, 54, 56], but more importantly, with the aid of this approach, the well-known δN formalism [70][71][72][73] was extended to an stochastic δN formalism [36,41,46,47,53,61,62,66] which allow us to compute the probability distribution of the curvature perturbation and hence to connect the results of stochastic inflation with PBH formation as done for example in [52,58,59,63,64,67].
However, computing the noises in a pure de-Sitter background has important consequences, in fact, it was checked in [35,37,40,45,[47][48][49] that, under this approximation, stochastic formalism recovers the standard result of Quantum Field Theory (QFT) for test scalar fields on a fixed inflationary background, which give us a hint that thos approach does not correctly take into account the backreaction of the scalar field in the metric. As pointed out in [74], this method misses the coupling between the long-and short-wavelength sectors; more concretely, the background in which the noises should be computed is not exactly de-Sitter, but it is both slow-roll and stochastically corrected. This means that the results obtained with a stochastic formalism that computes the noises in a pure de-Sitter background should not be taken seriously if we are looking for deviations from de-Sitter shuch as in the spectral index. This important limitation of the stochastic formalism as presented by Starobinsky (which will be further studied in Section VI A 1), led to some authors to present alternative ways of solving the equations of motion of stochastic inflation beyond the usual Fokker-Plank equation with de-Sitter noises, some examples are the recursive stategy presented in [42][43][44] or numerical codes as in [67].
Another important simplification that is usually performed in the stochastic framework is to split long-and shortwavelengths modes via a sharp window function (a step function). This choice leads to white noises which are very easy to handle both analytically and numerically. However, it was noted in [28] that this choice of window function leads to some problems in the asymptotic of the noise correlator at large spatial distances. This is why some works with smooth window functions that lead to coloured noises have also been proposed [34,68].
If we now go back to the historical controversies of stochastic inflation, we can find the issue of the time variable in which this formalism must be formulated. Already in one of the first works by Starobinsky [24], it was pointed out that the number of e-folds, (defined as N = Hdt, where t is the cosmic time and H is the Hubble parameter) seem to be a very appropriate time variable. This choice was later justified by connecting stochastic inflation with results form QFT in curved space times [35,37,47]; however, as also pointed out by these references, the choice of N as time variable is only a consequence of writing the stochastic equations in terms only of the scalar field. In fact, one can check that the only way to write all the scalar degrees of freedom in terms solely of the scalar field is to use the uniform-N gauge in the separate universe approach [60], which justifies the use of N as time variable since, by definition, in this gauge it is unperturbed. It was not until recently [65] when the stochastic formalism was formulated in another gauge, making it clear that the choice of N as time variable is only a consequence of the gauge choice.
Finally, it was lately noticed that the stochastic formalism of inflation was leading to some inconsistencies at nextto-leading order in SR parameters even at linear level. For example, it was checked in [55] that the linearization of the stochastic equations do not exactly reproduce the well-known equation of motion of the scalar linear perturbations in the long-wavelength limit (the Mukhanov-Sasaki equation, see Section IV A). Furthermore, in [57] it was demonstrated that, during an ultra-slow-roll (USR) inflationary regime, some noises that appear in the stochastic equations of motion were incompatible with the rest of the system. The origin of these inconsistencies can be traced back to the use of the separate universe approach in the construction of the stochastic formalism and they can be solved by including the momentum constraint in the separate universe approach [65].
This review will give a detailed derivation of the stochastic formalism to inflation rather than giving a list of results obtained within this formalism. During the derivation we will emphasise all controversial aspects that have been indicated in the above summary of the long history of the stochastic formalism, from the difference between the separate universe approach and the true long-wavelength limit of scalar inhomogeneities to the construction of the noises using an exact de-Sitter background.
The review is organized as follows: after presenting the basic inflationary concepts with the aid of the homogeneous picture of inflation in Section II, we will start studying inflationary inhomogeneities, focusing on its long-wavelength behaviour. In order to do so we will present different approximations to the exact Arnowitt-Deser-Misner (ADM) equations  of  Section  III.  These  approximations  are  linear  perturbation  theory  (in Section IV), gradient expansion (in Section V) and the stochastic formalism (in Section VI), which combines the two approximation schemes presented before. Finally, in Section VII we will give some conclusions.

II. HOMOGENEOUS INFLATIONARY SCENARIOS AND SLOW-ROLL PARAMETERS
As we have already mentioned, PBHs represent natural candidates for dark matter (DM) (latest constraints on this idea can be found in [76]). However, one possibility to statistically generate enough PBHs for this to hold one needs, at least, a power spectrum of primordial curvature perturbations several order of magnitudes larger than the one observed in the cosmic microwave background (CMB). Note that this is not the only possibility; one could statistically generate enough PBHs with a large non-gaussianity in the probability distribution of fluctuations even without enhancing the power spectrum. This scenario is discussed in [75].
It is known that a period of Slow-Roll (SR), of which the predictions of the CMB are based upon, cannot lead to the appropriate power spectrum necessary to generate enough PBHs to match the DM abundance [77] ( For the non-linear relation between the inflationary power spectrum and PBHS abundance, under the assumption of gaussianity, the interested reader can see [78]). Thus, one necessarily needs an inflationary epoch evolving beyond SR. A possibility is the introduction of an inflection point in the inflationary potential [79]. This leads the inflaton to undergo a so-called Ultra-Slow-Roll (USR) phase of inflation [80,81]. Taking into account that the statistics of PBHs from non-gaussian fluctuations has yet to be fully developed. The single field USR option with standard kinetic term seems then to be the best [75]. Such a system is described by the Einstein-Hilbert action with a minimally coupled scalar field i.e: whose homogeneous solution is an universe described by a Friedman-Lemaitre-Robertson-Walker (FLRW) metric: where a(t) represents the scale factor. The equation of motion of the scalar field in the universe described by (2) has the following equation of motion: where ≡ȧ a is the Hubble parameter, and a dot denotes a derivative with respect to the cosmic time t. Finally, the super-script "b" stands for "background", the meaning of which will be clarified later on.
The Friedmann equation is The Slow-Roll (SR) parameters ǫ i define the rate of change of the Hubble parameter: where, to write the final expressions, we have used the Friedmann equation and the equation of motion of the field. We can now define different inflationary regimes depending on the values of the ǫ i s: • Slow-Roll (SR): The field is slowly rolling down a potential with an almost constant velocity, which makes the acceleration negligible. In this case the equation of motion (3) is approximately The SR parameters are much smaller than one (ǫ i ≪ 1) and can be written in terms of the potential as • Ultra-Slow-Roll (USR): The field is moving along an exactly flat potential (V φ = 0), which makes the acceleration relevant. In this case the equation of motion (3) is From (8) one can infer that the velocity of the field (and hence ǫ 1 ) exponentially decreases, which makes some ǫ i ∼ O(1). More precisely: when i¿1 and odd.
As we will show later on, an exponential decrease of ǫ 1 makes the power spectrum of curvature perturbation increase.
• Both SR and USR are, at least approximately, sub-cases of Constant-Roll (CR). Hereφ b H bφb = κ where κ is a constant. SR is realized when κ = 0 while USR when κ = −3. We will not analyse further this generic case.
It is important to remark that, given a potential related to PBH formation, the SR and USR phases alternate. Thus, the equations of motion (6) and (8) will always be an approximation of the system.

III. INHOMOGENEITIES DURING INFLATION: THE ADM FORMALISM
Although it is very useful in order to understand the inflationary dynamics, the homogeneous picture of inflation presented in Section II is not very realistic and some inhomogeneities must be introduced in order to explain the universe we currently observe. Thus, we must define a general metric which is not restricted to homogeneity and isotropy. An option which is very useful in the context of inflation is to work in the so-called Arnowitt-Deser-Misner (ADM) formalism [82]. This framework supposes that the four-dimensional space-time is foliated into a family of three-dimensional hypersurfaces Σ t , labeled by its time coordinate t.
In order to work in the ADM formalism, it is convenient to write the metric as: and the action (1) presented in the previous section becomes: where we have introduced many new terms: • α is the lapse function, which measures the rate of flow of proper time with respect to coordinate time t as one moves normally to Σt.
• β i is the shift vector, which measures how much the local spatial coordinate system shifts tangential to Σt, when moving from Σt to Σ t+δt along the normal direction to Σt.
• γij represents the induced metric on the hypersurface Σt, that we we will decompose as γ ij = a 2 e 2ζγij with detγ ij = 1 so that the scale factor a is explicitely present.
• R ij is the Ricci tensor of the spatial metric, and hence R (3) ≡ γ ij R ij .
• Finally, Kij is the extrinsic curvature (K ≡ γ ij Kij), which is defined as: where ni ≡ (−α, 0, 0, 0) is the unit vector normal to the spatial hypersurfaces and ∇µ and Di are the covariant derivatives with respect to gµν and γij, respectively.
It is also convenient to decompose the extrinsic curvature into its trace and traceless part as: whereγ ijÃ ij = 0. Note that,in the homogeneous limit, i.e., when α = 1, β i = 0 and γ ij = a 2 δij and hence the metric (10) reduces to (2), we can identify the extrinsic curvature with the background Hubble parameter, namely K = −3ȧ a = −3H b . We will then define a more general inhomogeneous Hubble parameter as H ≡ − K 3 . This makes sense because K represents the expansion rate of the constant time hypersurfaces.
In the ADM formalism, the lapse function and the shift vector act as Lagrange multipliers for the action (11), and hence they generate two constraints: the Hamiltonian and the momentum constraints, which are, respectively: where E ≡ Tµν n µ n ν and Ji ≡ Tµjn µ γ j i and Tµν is, in our case, the stress-energy tensor of the scalar field, i.e., The two remaining variables, γij and Kij are the dynamical ones and their evolution equations are given by: • for Kij : where Sij = Tij and S k k = γ kl S lk . Finally, and although it can be recovered using the ADM equations just presented, it is also worthy to present here the Klein-Gordon equation for the evolution of the scalar field: We will present more detail in the following section, but it is worth remarking here that the homogeneous equations of Section II are straightforwardly recovered setting α = 1, β i = 0 and γij = a 2 δij .
Since the ADM equations are equivalent to the Einstein equations but much easier to implement numerically, they are very useful to study inhomogeneous space-times in an exact way; however, the numerical methods capable of solving exactly these equations are computationally expensive [83]. Fortunately there exist some very useful approximations that can be done when studying the inflationary universe that, not only allow us to considerably simplify the numerical way of solving the ADM equations, but they even admit some analytical solutions as we will see in the following.

IV. LINEAR PERTURBATION THEORY
Since, as explained in the introduction, the deviations from an exactly homogeneous and isotropic FLRW universe that we observe are very small, it makes sense to solve the system of ADM Equations (14)-(21) by expanding them around a FLRW background. If this expansion is done up to first order, we call it linear perturbation theory. There are many reviews on this topic [84][85][86][87][88]; however, since we are taking a slightly different point of view from most of them, we will go through linear perturbation theory in some detail. With this in mind, the ADM metric (10) will be written as: where g b µν is the homogeneous and isotropic metric of Section II and δgµν represent the perturbation. This implies that the lapse function, the shift vector and the spatial metric are approximated as: where Eij is traceless. The reason why Eij is traceless is because is the perturbation ofγij, which has unit determinant. Precisely, any matrix with unit determinant can be written as: where Mij is traceless. Note that the last two linearizations in (23) leads to Finally, the scalar field responsible for inflation must also be linearized, i.e., It can be easily shown that, of the linear variables introduced above, A, D and δφ transform as scalars under rotations in the background space-time coordinates, Bi as a 3-vector and Eij as a 3D-tensor. This does not mean that the only scalar components are A, D and δφ. In fact, we know from Euclidean 3D vector calculus that a vector can be decomposed as: and hence where B is some scalar field.
Similarly, for a tensor field we have where E is again a scalar field. The procedure explained above allow us to decompose the perturbations into a scalar, vector and tensor sector. These sectors have the characteristic of evolve independently one from each other at linear order in perturbation theory, which make them easier to handle. During this review we will be mostly focused on scalar perturbations of the metric since they are the ones that couple to the scalar field perturbation δφ. This allows us to write the perturbed metric of (23) as: Similarly to what we have just done with the metric, we can split each one of the ADM equations presented in Section III into an homogeneous and a perturbed part as follows: • Hamiltonian constraint (14).
Note that in (32) the term inside the first square brackets corresponds to the background Hamiltonian constraint (4) of Section II, being the term inside the second square brackets the linearization of the Hamiltonian constraint. We will follow this same notation for the rest of the ADM equations.
As it can be seen in (33), the momentum constraint do not have a contribution at the background level, which is logical due to the presence of spatial derivatives, which cannot appear in an exactly homogeneous and isotropic global background. However, it does have an homogeneous contribution in the perturbative sector due to the presence of a total spatial derivative, this is of crucial importance for the rest of the review.
where we have used the identification K ≡ −3H explained below Equation (13). From (34) we can easily identify the perturbation of the Hubble parameter, i.e., • Evolution equation forÃ ij (18).
Once we have seen how linear perturbation theory works and what are the linear equations that describe small inhomogeneities during an inflationary epoch, it is important to have a physical intuition about what a perturbation of the metric really means. By definition, a perturbation is the difference between the value of a quantity in the real and inhomogeneous space-time and its value on the idealized FLRW background. This seems trivial; however, in order to make such a comparison, it is necessary to compute these two values at the same space-time point. Since the quantities to compare live in different space-times, we require a pointwise correspondence between them, which is given by a coordinate system x µ such that the point P b in the background space-time and the point P in the perturbed space-time, which have the same coordinate values, correspond to each other.
The freedom in the choice among these coordinate systems is called the gauge choice. The way the different gauges are related in linear perturbation theory (for gauge transformations beyond linear perturbation theory see for example [89]) is via an infinitesimal gauge transformation of the coordinates: We can split the vector δx µ into its time an space components δx µ = λ 0 , λ i , and following the same idea as when we decomposed the perturbations in the metric, λ i can be written as λ i = λ i ⊥ + ∂ i η, where λ i ⊥ is a 3-dimensional divergenless vector and η is a scalar function. In terms of these functions, we can impose the gauge invariance of the perturbed metric (31) to deduce the gauge transformation of each one of the scalar variables in the metric: Finally, the scalar field perturbation will transform as: From (41) and (42) we can clearly see that the freedom on the choice of the gauge allow us to set two out of the five scalar perturbations to zero by choosing η and λ 0 accordingly. This reduces the scalar degrees of freedom to three (which further reduces to two when using the ADM equations for a single scalar field), which can be written in terms of gauge invariant, and hence physical, variables: the two Bardeen potentials [90] and the Mukhanov-Sasaki (MS) variable During this review we will pay special attention to the MS variable. In fact, it can be shown that, by rearranging the linearized ADM equations of (32)-(39), the MS variable follows a simple equation of motion which, written in terms of the SR parameters defined in Section II, is: In order to solve (45), we must take into account that subhorizon scales during inflation are microscopic, and hence we must study the behavior of the inflationary scalar field using quantum field theory (QFT). The way to proceed is similar to what we do when quantizing the harmonic oscillator:Q(x, t), which is now a quantum operator, can be expressed in Fourier space as: whereQ and a k and a † k are the usual creation and annihilation operators, that satisfy the usual commutation relation: With this construction, the variable Q k of (47) is the solution of the MS Equation (45) in Fourier space, i.e.,: If we impose the Bunch-Davies vacuum [91] to be recovered at early time, we can only write analitically a solution for (49) if a specific condition is satisfied. In order to see what is this condition we will define the variable z as z = aφ b H b and the conformal time τ as adτ = dt, with this, the analytical solution of (49) in terms of the conformal time only exists if is a constant. It is easy to check that this is the case generically; in fact we can integrate by parts τ = dt a to get: See for example Appendix E of [65] for the derivation. The equation above, together with the fact that (50) to be a constant generically up to O(ǫ1). Note that, since in SR, ǫi is also a constant up to O(ǫ1), in The analytical solution in this case is: where H (1) ν is the Hankel function of first class. In order to finish this reminder of linear perturbation theory, let us introduce a very useful quantity that characterizes the properties of the perturbations: the power spectrum. We will give a physical interpretation of the power spectrum later on and for the moment we will focus only on its mathematical definition. The power spectrum of a generic quantity X(x, t) is defined as the Fourier transform of the two-point correlation function, i.e.: where r = |x1 − x2| and k = |k|. From (52) it is clear that the power spectrum is defined as: Another interesting quantity derived from the power spectrum is the spectral index n X s − 1, which is defined as:

A. Long Wavelength Limit of Linear Perturbation Theory
As explained in the introduction, many quantities of interest such as the power spectrum of the scalar tilt are usually computed in the long wavelength limit, or superhorizon scales. This makes sense because, as long as inflation is taking place, the exponential expansion of the universe stretch the perturbations of the quantum fields from microscopic to cosmological scales. In this way, the inhomogeneities that re-enter the horizon once inflation has finished and hence the ones of interest today, were in its long-wavelength limit when inflation ended. In this section we will take a close look to this limit and its physical consequences. We will study the behaviour of perturbations with characteristic wavelength L much larger that a local Hubble radius H represents a local FLRW universe. With this in mind, let us study the long wavelength limit of perturbations during inflation. In order to do so we will take two different point of view: the first one is very intuitive and we will call it the k → 0 limit, the second one is slightly less intuitive but it is very useful when dealing with non-linear perturbations, this is the so-called linear separate universe approach of k = 0 case. Although they are very similar, the two point of view are not exactly equivalent as we will see in the following.

K → 0 Limit
Since each spatial derivative introduces a factor k in Fourier space, it would be logical to think that in the k → 0 limit of the ADM equations we must neglect any term that contains a spatial derivative (This will be done in the next subsection). However, by doing that, we would be neglecting terms that actually contribute in the k → 0 limit.
In the following we will enumerate the two cases in which neglecting a term with a spatial derivative would be too naive: 1. Non-local terms.
As an example, let us imagine a re-scaling of the spatial coordinates It is easy to realize that if we rewrite δ in terms of the transformations parameters of (41) we get: The fact that a transformation as the one described in (55) is perfectly allowed in a FLRW universe together with (56) and the transformations rules of the perturbed metric variables of (41) immediately tell us that variables like ∇ 2 E cannot generically be neglected in the long wavelength limit even if they contain a Laplacian. In order to avoid this problem, we will only neglect terms that contain extra spatial derivatives, by extra we mean that we will neglect a term like ∂iX if and only if it is compared with X, but we will not neglect it if it appear alone.

Equations with overall spatial derivatives.
A clear example is the momentum constraint of (33). Because it contains a total spatial derivative, it gives non-trivial information even in the k → 0 limit, namely: The same happens with the evolution equation forÃij (38).
Taking these two important point into account one can re-derive the equation of motion for the MS variable Q in the k → 0 limit.
The result is, as expected, the Equation (45) but without the Laplacian, i.e., If we want to know the solution for Equation (58) we can simply take the kτ → 0 limit of the solution (51) obtained before (we will do that later on); however, this would be restricted to ν (defined in (50)) being a constant. In this section we will solve independently the long wavelength equation of motion for Q without any assumption, i.e., valid at all orders in slow-roll parameters. The price to pay is that, since the initial conditions are given at sub-horizon scales and (58) is only valid at super-horizon scales, we will not specify any initial condition here.
It can be checked that (58) can be written as a total derivative as follows: For convenience, we will define the comoving curvature perturbation as: It terms of Rc, (59) takes a very simple form: where C1 is a constant. Therefore, the solution of the MS equation in the long wavelength limit (58) is where C2 is also a constant. Solution (62) is the well-known exact solution in the k → 0 limit for the single component scalar field case [85,92]. The term proportional to C1 is usually known as decaying mode, name inherited from its SR behaviour whereφ b is roughly constant, since a ∼ e −H b t during inflation, we can clearly see that the term proportional to C1 decays as e −3H b t during SR. However, this is not the case beyond SR, for example in USR we haveφ b ∼ e −3H b t (see (8)) and hence ǫ1 ∼ e −6H b t , which makes the term proportional to C1 be approximately constant whereas the one proportional to C2 decays, contrary to what happens in SR.

K = 0 or Linear Separate Universe Approach
The second point of view to solve linear perturbation theory in the long wavelength limit is the linear separate universe approach [93][94][95], in this case we will forget for a moment about the perturbed ADM equations and we will take advantage of the fact that, in the long wavelength limit, each local patch represents an homogeneous and isotropic universe such that its line element is: where, as before, the subscript l stands for local. As a consequence, the constraints and equations of motion will be the same as the ones presented in Section II, i.e., The first thing to realize here is that the momentum constraint is missing. This is because in the separate universe approach we are not only assuming that each patch is homogeneous and isotropic (we were also making this assumption in the k → 0 limit where the momentum constraint plays a role), but we are also assuming that each patch evolve independently from each other. Since the momentum constraint gives information about the interaction between the different FLRW patches due to the presence of a spatial derivative, it makes sense that it does not appear in the linear separate universe approach. Before proceeding, let us clarify here that when we talk about the information about the interaction between different patches encoded in the momentum constraint, which makes these patches evolve in a not completely independent way, we do not claim that what happens in one patch will affect the others, but rather that all the patches as an ensemble must satisfy some conditions (given by the momentum constraint) that are absent if we look to a single patch but that are important when comparing them.
Within this approximation, we can only see the effect of perturbations if we compare different patches between them or with the global background. Knowing that the differences must be perturbative, we can follow our results from linear perturbation theory and write: The term 1 3 ∇ 2 a B can be set to zero without loss of generality, this is because since each patch is independent from each other, we can always choose an orthogonal threading for all of them, in which βi (and hence B) is zero. Another way of reasoning is that, as checked in the k → 0 limit of perturbation theory, a term with a Laplacian can only be important in the long wavelength limit if it contains non-local information, but at the same time, non-local terms would give some information about the interaction between local patches, which is in contradiction with the absence of the momentum constraint (and hence with the separate universe assumption) as explained above.
If we also take into account that the scalar perturbation related with the traceless part of the spatial metric (i.e., E ) does not appear in (65) so neither in (64), we can set in the separate universe approach (as indicated with the superscript "sep"). Note that, although (66) coincides with the Newtonian (or longitudinal) gauge, this is not a gauge choice, but a consequence of the separate universe approach. With this in mind, we can write the equations in (64) in terms of the global background and perturbations over it as follows: As one can see from (67), the linear separate universe approach does not use the whole system of ADM equations presented in Section IV, it only uses the long-wavelength version of: (a) the Hamiltonian constraint (32), (b) the evolution equation for the trace of the extrinsic curvature (37) and (c) the equation of motion of the scalar field (39). All of them setting both ∇ 2 B sep and ∇ 2 E sep to zero accordingly with the separate universe assumption.
In order to see what are the consequences of this reduction in the number of equations and variables when comparing with the k → 0 limit of Section IV A 1, we can follow a similar procedure to what we did when defining the MS variable and write down a "separate-universe" gauge invariant variable [96,97], meaning that it is gauge invariant under time reparametrizations.
Similarly to what we did with the MS Equation (45), we can rearrange the equations of the linear separate universe approach (67) and write a single equation of motion for Q sep , the result, in terms of the SR parameters, is: Comparing (69) with (58) we can now clearly identify two extra terms that appear when assuming that each local patch evolves independently from each other, i.e., when using the separate universe approach. These terms are O(ǫ 1 ǫ 2 ) [55], being strongly dependent on the inflationary regime, for example, they are In order to better quantify this difference we will solve (69) in a similar way as done with (58) and compare the results. We can again write (69) as a total derivative as follows Defining a "separate universe" comoving curvature perturbation as we can write (70) as where C ′ 1 is a constant. Hence, the solution for Q sep is: We can see that the solutions for Q sep of (73) and for Q of (62) differ by an extra term in the decaying mode, which is obviously due to the difference of O(ǫ 1 ǫ 2 ) in the equation of motion. The importance of this extra term depends on the inflationary regime, being more important beyond SR, where the term proportional to C ′ 1 does not decay (see discussion below (62)).
Since the main difference between the linear separate universe approach and the k → 0 limit is the inclusion of the non-trivial information of the momentum constraint at large scales, which is not taken into account in the former, one could think that the two solutions will coincide if we impose the solution (73) to satisfy the momentum constraint; however, this is not the case. In fact, in the following we are going to check that the mode proportional to C ′ 1 , which contains the new extra term, does not represent the k → 0 limit of some solution to the perturbations equations with k = 0. This should not be a surprise since the correct solution for the MS variable in the long wavelength limit is given by (62). In order to check that, let us rearrange equations (65) to write If we want solution (73) to be the k → 0 solution to the perturbation equations with k = 0 we must be sure that our solution satisfies the momentum constraint (although it is lost in the separate universe approach.) because, as we know, this constraint gives non-trivial information even at large scales. What would be the momentum constraint in the linear separate universe approach is which, comparing with (74), immediately implies: From condition (76) and the equation of motion (70) we can see that, if the momentum constraint of (75) is satisfied at initial time, it will be always satisfied so one could think that the problem of ignoring the momentum constraint in the linear separate universe approach is simply solved by an appropriate choice of initial conditions; however, the only mode that satisfies (76) is the mode proportional C ′ 2 in the solution (73), which means that the mode proportional to C ′ 1 therein does not represent a k → 0 solution of the perturbation equations with k = 0 and it must be set to zero. However, setting C ′ 1 = 0 means that we are losing the term proportional to C 1 in the correct solution of (62), which can be important beyond SR. Thus, imposing that the momentum constraint must be satisfied in the separate universe approach does not give the correct solution for the long-wavelength limit of the MS variable and hence we have to use the whole set of ADM equations in the long wavelength limit to describe the correct dynamics (at all orders in SR parameters) of the MS variable at super-horizon scales. Let us indicate here that although there exist some ways to recover the correct k → 0 limit of the MS variable using only the linear separate universe approach (see [96,97]), we will not study them here because they are restricted to linear perturbation theory and, as we will see later on, we are planning to use the long wavelength limit of inhomogeneities beyond perturbation theory.
For clarity purposes, let us summarize the findings of this section until now: • Although both the k → 0 and the linear separate universe approach consider local homogeneous and isotropic universes, the k → 0 limit allow some interaction between them whereas the linear separate universe approach assume that they evolve independently. Mathematically speaking, the linear separate universe approach ignore possible non-local terms and equations with overall spatial derivatives, both present in the k → 0 limit.
• As a consequence, the equation of motion for Q sep differ by terms of O(ǫ 1 ǫ 2 ) from the equation of motion for Q in the long wavelength limit. This difference induces an extra decaying term, namely: • The difference (77) always decays so one could think that it can be safely ignored; however, its importance strongly depends on the inflationary regime. For example, in USR we have Q sep − Q(k → 0) ∼ O(ǫ 1 ) so if we want to be precise enough when studying the long-wavelength limit of perturbation theory, we should use the k → 0 limit rather than the linear separate universe approach.
• Finally, imposing the momentum constraint to be satisfied in the separate universe approach not only does not solve the difference between Q sep and Q(k → 0), but it makes it worse.

B. Linear Perturbation Theory and PBHs
In order to finish this section we will study the behaviour of the power spectrum of the inflationary regimes that can lead to the formation of PBHs in the long wavelength limit (The reader interested in PBHs as probes for the physics of the very early universe in a more detailed way can read the following reviews: [98,99]). We will do that in a very qualitative way and at leading order in SR parameters, this is why we will simply use the kτ → 0 limit of solution (51), where the initial conditions are properly specified. If we expand the Hankel function of first order we have: where Γ[ν] is the Euler gamma.
For the formation of PBH we are interested in the power spectrum of the comoving curvature perturbation, i.e., where we have used τ ≃ − 1 aH b .
For USR and SR we have ν ≃ 3 2 and hence the power spectrum is roughly scale invariant: However, and although the k dependence of the comoving curvature power spectrum for USR and SR is roughly the same, and hence n Rc s − 1 ∼ O(ǫ 1 ) for both, this is not true for the time dependence. In fact we have An exponential growth of the power spectrum at super-horizon scales as it happens in USR has important consequences for the formation of PBH. In fact, by definition of the power spectrum, P Rc give us the variance of the probability distribution that follow the amplitude of the perturbations (R k 1 ) c with characteristic wavenumber k 1 , therefore, a growth in the power spectrum can be interpreted as a growth in the variance or, equivalently, a spreading of the probability distribution. This means that high values for the amplitude of the perturbations placed at the tail of the probability distribution are much more probables if the power spectrum grows. Finally, and since the high values for the amplitude of the curvature perturbations are the ones that collapse to form a PBH when they re-enter the horizon, we can conclude that inflationary regimes beyond SR favor the creation of PBH.
The argument given above is very hand-waving but it is enough to remark the importance of inflationary regimes beyond SR when talking about PBH. However, if we want to make actual predictions about the mass and abundances of PBH we need to study inflationary dynamics in a much more precise way, in fact, the tail of the probability distribution for the amplitude of perturbations, which is where the perturbations responsible for the formation of PBH are located, is very sensitive to any small change in the power spectrum (or in the higher order correlators as the bispectrum). This is the reason why a description of the perturbations at all orders in SR is highly desirable, making the linear separate universe approach not the best approximation to study the tail of the probability distribution of the amplitude of perturbations generated during a inflationary regime beyond SR, i.e., to study the formation of PBH. The reason is that, generically, Q sep − Q(k → 0) ∼ O(ǫ 1 ) as explained above.
Finally, another problem arises when dealing with inhomogeneities large enough to form a PBH, namely, non-linear or even non-perturbative effects can play an important role, reason why we need to go beyond linear perturbation theory. The rest of the review is devoted to the study of inhomogeneities generated during inflation beyond linear perturbation theory.

V. GRADIENT EXPANSION
The gradient expansion is a non-perturbative, in terms of the amplitude of the inhomogeneities, expansion of the ADM equations valid when the characteristic wavelength of the inhomogeneities L is much larger that the Hubble horizon H −1 l [100 -109]. Since this is the same approximation that we do when studying the long wavelength behaviour of the linear perturbations, we will follow the same idea and we will take advantage of the fact that, when L ≫ H −1 l , the universe is locally homogeneous and isotropic to define an expansion parameter σ ≡ H −1 l L such that at leading order in σ, each local patch of the universe of size (σH −1 l ) (we will call this scale the coarse-grained scale) is approximately described by a FLRW universe. Higher order terms in the σ will instead capture local inhomogeneities.
Contrary to the linear theory approach to cosmological perturbations, the gradient expansion is valid for any amplitude of local over-densities as long as the patch is taken small enough for the gradients to be negligible. Note that this assumption on which the gradient expansion is based on implies that a patch can be found such that any spatial gradient would only introduce an order σ. In other words, for any generic function X, ∂ i X ∼ X × O(σ). This is because a function which is approximately homogeneous in local coordinates can be written as X(t, σx i ) with σ ≪ 1. Thus, we have and since ∂ ∂(σx i ) X(t, σx i ) σx i =0 can be of the same order as X(t, σx i ) we can generically write: When we were studying linear perturbation theory, the way of relating different patches was to define a global background over which each patch represented a perturbation. In this case we do not have a physical background metric because we are no longer perturbing anything; however we will still define a fictitious global background metric with coordinates t and x i , i.e., We are then interested in writing each local FLRW patch, which at leading order in σ and in terms of local coordinates is simply (63), in terms of the coordinates t and x i . At leading order in gradient expansion and considering only scalar perturbations (note that, if we want to study inflation in a fully non-perturbative way, we should also take into account vector and tensor perturbations. This is because, although they are independent at linear order in perturbation theory, this is no longer true at higher orders. The reason why we do not include vector and tensor perturbations here is because, although at this level it would be straightforward, it is not possible when applying gradient expansion to stochastic inflation as we will see) we have: with the conditions: where we are using the subscript (0) to remind the reader that we are at leading order in gradient expansion It is important to remark here that the leading order in gradient expansion of each quantity can be different, for example, the leading order for α and ζ is O(σ 0 ) whereas the leading order for β i is O(σ −1 ). Having a quantity whose leading order is gradient expansion is O(σ −1 ) could seem problematic; however this is only telling us that β i is generically a non-local quantity, in fact, its linearization give us the non-local variable B (see (23)) studied in Section IV A 1. Furthermore, we will see that β i always appear together with a spatial derivative in the equations of motion.
One could be worried about the fact that a homogeneous and isotropic metric contains terms outside the diagonal; however, following [110] we know that a space-time is homogeneous and isotropic if: 1. All constant time hypersurfaces Σ t are constant curvature spaces. In our case the hypersurfaces Σ t are simply Euclidean and this condition is trivially satisfied.
2. The extrinsic curvature of the hypersurfaces is homogeneous and isotropic. Using the definition of extrinsic curvature of (12) together with the conditions for (0) α, (0) β i and (0) γ ij specified below (83), we can see that the extrinsic curvature only depends on time and hence this condition is also satisfied.
Different functions for (0) α, (0) β i and (0) ζ will give different FLRW patches as long as they satisfy the conditions given below (83). We can then relate the different locally homogeneous and isotropic patches by knowing the different non-perturbative functions for (0) α, (0) β i and (0) ζ that lead to each one of them. Of course, in the same way as in perturbation theory, the value of (0) α, (0) β i and (0) ζ will depend on the gauge choice and on the solution for the ADM equations.
We can generalize the idea of writing a FLRW metric in terms of x i and t to writing quasi-FLRW metrics using the same coordinates. Following the ADM formalism we can write a metric valid at all orders in gradient expansion. With the identification γ ij = a(t) 2 e 2ζγ ij we have: where the leading order in gradient expansion for each variable is: The last term has been added to take into account the expansion of the scalar field, which is generically non-zero at the background level. It is important to realize here that the conditionγ ij − δ ij ∼ O(σ) implies a further condition on the scalar part ofγ ij , in fact, using the expansion of the exponential of a matrix we can writẽ Now, M ij must be traceless by definition. Focusing only in the scalar part of M ij we can then writẽ where C is a scalar function. This immediataly implies that ∂ i ∂ j − 1 3 δ ij ∇ 2 C ∼ O(σ). Note that, as we will see later on, this condition is not in contradiction with ∇ 2 C ∼ O(σ 0 ).

A. O(σ 0 ) in Gradient Expansion
Having specified the leading order gradient expansion for each one of the non-perturbative variables, we can now expand the ADM equations in terms of σ. As an example we will expand the Hamiltonian constraint at O σ 0 using the spatially flat gauge, i.e., where the subscript f stands for "flat". Since in this gauge we also have R (3) f = 0 we can write the Hamiltonian constraint from (14) as: where Equation (89) can be written in terms of the metric variables α f and (β f ) i of (84) and the scalar field φ f . Using the results for (A f ) ij and K f given by (18) and (17) respectively we have: which is valid at all orders in gradient expansion. Keeping only the O(σ 0 ) terms we get: where by (0) (∂ i φ f ) we mean σ ∂ ∂(σx i ) φ(t, σx i ) σx i =0 as explained before. Note that the first line of (90) is O(σ) because of the condition of (0) β i below Equation (83). Following this procedure, it is straightforward to write all the ADM equations at order O(σ 0 ); however, we know from linear perturbation theory that there are equations with global spatial derivatives that play a role in the k → 0 limit, which translated to the gradient expansion way of thinking means that there are equations that contain an overall factor σ and hence in order to extract its contribution at O(σ 0 ) they must be written up to O(σ). We know that the momentum constraint is one of these equations so let us analyze it. In spatially flat gauge, the momentum constraint (15) is: If we expand (92) up to O(σ) we get: An important aspect regarding the gradient expansion is worthy to remark at this point, note that, in the derivation of (93) we have used ∂ j Ã f ij ∼ O(σ), which seems in contradiction with Ã f ij ∼ O(σ) (as we saw when deriving the Hamiltonian constraint at O(σ 0 )) and the fact that each spatial gradient introduces an order in σ. However, in this case we have an exception due to the traceless nature of Ã f ij . Let us see why: from the condition 2 below (83) we have (β f ) i ≃ b(t, σx l )x i and from (18) in spatially flat gauge: which is clearly O(σ).
The linear version of (93) obviously corresponds to the linear momentum constraint in spatially flat gauge, i.e., As already stressed during Section IV A 1, although (94) is an equation that appears at O(σ), it contains information at O(σ 0 ), namely Unfortunately, things are not that easy when dealing with the fully non-perturbative momentum constraint at O(σ), in fact, if we look at (93), we will easily realize that the total spatial derivative is no longer present. How the O(σ 0 ) information is encoded in (93) in a fully non-perturbative way is beyond the scope of this review, nevertheless, it is very important to be aware that we must take this information into account if we want to correctly describe the non-perturbative and long wavelength dynamics of the inhomogeneities, otherwise we would be making a mistake equivalent as when using the linear separate universe approach of Section IV A 2 instead of the k → 0 limit of Section IV A 1. Nevertheless, due to its usefulness, in the following section we will present the non-perturbative version of the linear separate universe approach.

B. Non-Perturbative Separate Universe Approach
The O(σ 0 ) expression for the Hamiltonian constraint (91) seems very difficult to solve, and we have just argued the difficulties that arise when trying to extract the O(σ 0 ) information from the O(σ) momentum constraint. This is why a further approximation in addition to the O(σ 0 ) gradient expansion is usually performed in the literature [20,111,112]. This new approximation is the separate universe approach, which assumes that each FLRW patch of the universe evolve independently one from each other. We have already studied the linear version of the separate universe approach in Section IV A 2, where we have seen that this assumption implies that neither non-local terms nor the momentum constraint will be present, which is equivalent to state that β i ∼ O(σ 3 ) as done in [109]. Within this approximation, the Hamiltonian constraint in spatially flat gauge of (91) takes a much simpler form: which reminds us of the background Hamiltonian constraint (4). In fact, this coincidence extends also to the equation of motion of the scalar field as one could expect since we are assuming every FLRW to evolve independently from the others. A famous application of the separate universe approach is the δN formalism in order to compute the uniform-density curvature perturbation in a non perturbative way [70][71][72][73]. As a reminder, the linear uniform-density curvature perturbation is defined as where ρ is the energy density of the scalar field, which in inflation is ρ = 3M 2 P L H b 2 and δρ f is the perturbation of the energy density in spatially flat gauge.
In this section we have presented the non-perturbative generalization of the long wavelength linear perturbation theory presented in Section IV, being the O(σ 0 ) in gradient expansion of Section V A the generalization of the k → 0 limit for scalar perturbations of Section IV A 1 and being the separate universe approach the non-linear generalization of the k = 0 (or linear separate universe approach) case of Section IV A 2. For this reason, it is obvious that the O(σ 0 ) gradient expansion and the separate universe approach will give different "decaying" terms, being the ones of the O(σ 0 ) gradient expansion the correct ones, as it happened in linear theory.
Using the results from linear theory again, we can conclude that the error made when using the separate universe approach instead of the O(σ 0 ) gradient expansion will be generically of O(ǫ 1 ), strongly depending on the inflationary regime. This conclusion is deduced of course assuming that higher orders in perturbation theory will not spoil the results at leading order, assumption that can be problematic if the inhomogeneity under study is non-perturbative.
We will finish this section by reminding the reader that setting initial conditions for long wavelength perturbations is problematic because we cannot use the Bunch-Davies vacuum, which is only well defined when k aH → ∞. This is why, although the gradient expansion is a very useful way to study inhomogeneities in a non-perturbative way, we need some other tool to set the initial conditions.

VI. STOCHASTIC APPROACH TO INFLATION
The stochastic approach to inflation combines the two approximations schemes presented until now to study the evolution of inhomogeneities in a non-perturbative way. The idea is to split the variables of interest (let us say X) into two parts: an infrared (IR) part that contains all the inhomogeneities with characteristic wavelength larger that some coarse-grained scale (σH) −1 (σ is the same parameter as the one used in gradient expansion) and a ultraviolet (UV) part, which encompasses inhomogeneities with characteristic scale smaller than (σH) −1 (or characteristic wavenumber k bigger than σaH).
Since the UV part starts evolving well inside the Hubble horizon, we will assume that it is perturbatively small. Thus, one can use linear perturbation theory to describe it, where initial conditions are well defined. The IR part instead can be large; however, since the IR part only contains long wavelengths, the gradient expansion can be used there. As we will see, whenever an UV mode exits the coarse-grained scale, it will act as a kick for the IR part, solving the initial condition problem of gradient expansion. Note that, although we will only study stochastic inflation in the context of Einstein's gravity, the only requirements for the construction of a stochastic formalism are the possibility of a separation between IR and UV modes and a well-behaved perturbative expansion. There is then no reason a priori to think that the stochastic framework can not be applied to modifications of Einstein's gravity such as massive gravity [113,114].
The stochastic formalism is then a mathematical framework that, in principle, allows us to study the inhomogeneities generated during inflation in a non-perturbative way, reason why it is widely used when studying PBHs formation (see for example [115,116]). To see how stochastic inflation works and why it is called "stochastic" we will derive the formalism step by step. From Section V it should be clear now that we will have different stochastic equations depending on if we are using the separate universe approach or the O(σ 0 ) gradient expansion.
The stochastic formalism that uses the separate universe approach is the most widely used in the literature (see for example [10, 14-19, 23, 27, 35, 37, 38, 51-53, 63, 66, 115, 116]) so we will start with its derivation. After that, we will take advantage of the stochastic equations just derived to present a stochastic formalism based on the O(σ 0 ) gradient expansion [65].
Before starting with the derivation and, since it is not as trivial as the spatially flat gauge, let us present the gauge we will be working with: the uniform-N gauge. We define the number of e-folds N as the integrated expansion rate of Σ t hypersurfaces, i.e., where K ≡ −3H l (being H l the local Hubble parameter) is the extrinsic curvature defined in (12), which coincides with the expansion rate of Σ t hypersurfaces as shown in Section III and t l is the local time of (63). In terms of the variables and coordinates of the ADM metric (84) N can be written as: The uniform-N gauge is defined such that N coincides with the number of e-folds defined in the global background of (82), i.e., N ≡ H b dt. From (98), this immediately implies ζ δN = 0 and (β δN ) i = 0 (where the subscript δN specifies the gauge), or D δN = 0 and B δN = 0 in its linear limit (see Equation (23)). The reason behind this gauge choice will be clarified along the following two subsections.

A. Stochastic Formalism Based on the Separate Universe Approach
As we have already indicated, this is the stochastic formalism most widely used in the literature due to its simplicity; however, as we will see, it has some problems. First of all, it is very important to realize that in the separate universe approach, both the uniform-N gauge and the spatially flat gauge are equivalent, which leads to many authors to use the uniform-N gauge for the IR part and the spatially flat gauge for the UV part [60]. Let us see why: As we know, in the separate universe approach, due to the absence of non-local terms, both (γ sep ) ij = δ ij and ∂ i (β sep ) i = 0 are automatically satisfied (see (66)). This means that, under this assumption, we have to add the condition ∂ i (β sep f ) i = 0 to the spatially flat gauge where (γ f ) = δ ij and ζ f = 0. In the same way, we have to add the condition (γ sep δN ) ij = δ ij to the uniform-N gauge, where ∂ i (β δN ) i = 0 and ζ δN = 0. The main consequence of this is that, under the separate universe condition, we can express all the scalar fluctuations only in terms of the field inhomogeneities not only in the spatially flat gauge, but also in the uniform-N gauge. This can be clearly seen when looking at the linear "separate-universe" gauge invariant MS variable of (68). For non-linear generalizations of this variables see [117,118].
Before continuing, let us remind the reader that the equivalence between these two gauges in only valid under the separate universe approach, which, as seen in Section IV A 2, it generically fails at O(ǫ 1 ).
During this section, and although it is not compulsory, we will work using the number of e-folds N ≡ H b dt as time variable (Note that we can use any time variable we want because we will use the coordinates of a fictitious global background, i.e., the coordinates of (84). If we would instead use local coordinates (as in (63)), we would be interested in using an unperturbed time variable, being N the natural choice in the uniform-N gauge). Another important aspect is that, in order not to overload notation, we will suppress the subscript δN indicating the gauge we are using, such that in the following, unless otherwise stated, a variable without a subscript that indicates the gauge is a variable in the uniform-N gauge.
As indicated before, the stochastic formalism presented in this subsection is based on the separate universe approach for the IR part and on linear perturbation theory for the UV part. To illustrate this we will consider in detail the equation of motion for the trace the extrinsic curvature (19) in uniform-N gauge, which can be written using the variables of the ADM metric (84) as: The first thing to do is to split the variables of interest into their IR and UV part. In this case we only have two variables to split: In (100) we are not considering ∂γij ∂N as a variable of interest not only becauseγ ij = δ ij in the separate universe approach, but also because ∂γij ∂N ∂γ ij ∂N ∼ O(σ 2 ) in gradient expansion and quadratic in perturbation theory so it does not play any role even if we were using O(σ 0 ) gradient expansion.
Due to the perturbative nature of the UV variables, we will expand (99) keeping only linear terms in UV and isolate them in the right hand side of the equation getting Note that this is the same as we did in linear perturbation theory of Section IV but using the metric (63) (or equivalently (84)) as background metric. Now, since the IR variables are well outside the Hubble horizon, we will use the separate universe approach for them. Since α IR ∼ O(σ 0 ) and hence D k D k α IR ∼ O(σ 2 ) we have: where we have inserted an extra subindex (0) to indicate that we are at leading order in gradient expansion. Using Fourier analysis we can now define more rigorously the IR and UV modes. If we choose the Heaviside theta as a window function (The choice of the Heaviside theta as window function in the stochastic formalism is the most common one. However it can lead to some problems as indicated in [28]) we have the following decomposition for a generic function X.
where, similarly as in linear perturbation theory (see (47)),X UV k (t, x) is define as the following hermitian operator: where X k (N ) is the solution of the evolution equation for the perturbation X over the local background defined by (83) and a k and a † k are the usual creation and annihilation operators which follow the commutation relation given in (48).
Note that, in the spirit of gradient expansion, the splitting is done in the local cosmological coarse-grained scale (σH l ) −1 , which generically differs form the one of the background, for example in uniform-N gauge we have H l = H b (0) α IR . Inserting the definition of X UV of (103) into (102) we get: where α UV k and ϕ UV k are operators defined as in (104). The right-hand side of (105) has two different terms: 2. The first two integrals, proportional to a Dirac delta, can be seen as boundary conditions and hence they will act as the initial conditions missing when using only gradient expansion.
We then get: where we have defined ξ 1 and ξ 3 as: For the moment we will not characterize the quantities ξ 1 and ξ 3 . If we now follow the procedure for the evolution equation of the trace of the extrinsic curvature just explained with the Hamiltonian constraint we get: which can be solved for Finally, the stochastic equation of motion for the field is obtained in the same way: where ξ 2 is defined similarly to ξ 1 and ξ 3 : As anticipated before, the usage and uniform-N gauge and the separate universe approach ensures that all the scalar inhomogeneities are encoded in the scalar field. This becomes clearer once we realize that we can write the system (106)- (110) in terms only of the scalar field. Inserting (106) and (109) into (110) and neglecting ξ 2 i terms because they are quadratic in perturbation theory we get: which can be conveniently written if we use an auxiliary variable (0) π IR :

Characterization of the Noises
The interpretation of ξ 1 and ξ 2 (note that ξ 3 no longer appears in the final equation of motion) as classical noises is not trivial because they are, strictly speaking, quantum operators. In order to see how they are effectively classical, we can compute the two-point correlation function of ξ 1 for example at equal space point, the result is: where we have used (104) together with the commutation relation (48). From (114) we see that δφ k , which is the solution for the field perturbation over the local patch of size σa H b , is evaluated at the coarse-grained scale, i.e., well outside the Hubble horizon. It can then be shown that at those scales, any perturbation that started from a coherent vacuum state has evolved into a highly squeezed state [119,120], which means that we can consider as the power spectrum of a classical random variable, whose time evolution is consistent with probabilities conserved along classical trajectories.
Once this is clarified, we are now in position to describe ξ 1 as a classical white noise (Its "white" nature is due to the presence of δ (N 1 − N 2 ) in the two-point correlator. Note that this is a consequence of the the choice of the Heaviside theta function as Window function, any other choice would lead to coloured noises, which are much more difficult to deal with, both analytically and numerically) with variance given in (114). Furthermore, since the field fluctuations are Gaussian to a good level of approximation, the variance computed in (114) is enough to fully characterize ξ 1 . Finally, in order to characterize the system (113) we also need: The characterization of ξ 1 and ξ 2 as white noises give us now a intuitive picture of the physics behind the stochastic formalism. As explained before, different functions for (0) α IR and (0) φ IR (in uniform-N gauge) describe the evolution of different FLRW patches, the way of getting these different functions is now clear if we see ξ 1 and ξ 2 as random variables. For example, the evolution of a specific patch, let us call it 1 F LRW , will be given by 1 (0) α IR and 1 (0) φ IR , whose specific form will be determined by the random values that the noises 1 ξ 1 and 1 ξ 2 will pick at each time step. Now, if we want to describe a second patch 2 F LRW , we just have to solve again the stochastic equation with different random values for the noises 2 ξ 1 and 2 ξ 2 , always satisfying the statistics described by (114) and (115). Like this, we are then able to describe the evolution of an ensemble of FLRW patches by solving many times the same stochastic equation with different random values for the noises. The correlators between these patches are then simply described by statistical moments of the IR variables.
The stochastic system is already fully characterized by (113)- (115); however, this system is very difficult to solve because, in order to compute the variance of the noises ξ 1 and ξ 2 , we need to solve the perturbation equations and compute δφ k over a stochastic background every time step, which makes the systen non-Markovian. Although there are numerical algorithms capable of doing so [67], it would be very convenient if we were able to write an analytical solution for δφ k in terms of the IR variables, that would make the system Markovian and easily solvable. Let us try to solve the equation for δφ k over the stochastic local background.
We know that, as a consequence of the separate universe approach used here, the linearized equation for the field perturbation in the uniform-N gauge will be the same as the equation for the "separate-universe" gauge invariant variable Q sep defined in (68), in other words, Q sep = δφ f = δφ δN under the separate universe assumption. Now, since the linearized equation for Q sep only gives the correct solution for the true gauge-invariant MS variable Q defined in (44) if we set ǫ 1 = 0 as checked in Section IV A 2, the equation we will try to solve here is the linearized equation for the gauge invariant quantity Q over a local background in which all the SR parameters have been set to zero (note that, in order for this approximation to work we need − 3 2 ǫ 2 − 1 4 ǫ 2 2 ∼ O(ǫ 1 ) , which is true for SR and USR. This is because we are setting the combination of SR parameters multiplying Q in (45) to zero), in other words, we will try to solve the equation for Q as if the local background were an exact de-Sitter, which is: or, using the Hamiltonian constraint of (109): Now, both the functions (0) π IR and V (0) φ IR are stochastic so, in order to know them we must already know the value for the variable ξ 1 . This makes Equation (117) not analitically solvable, more concretely, the stochastic system of (113) is non-Markovian, meaning that the value of the noises at each time N will depend on the whole evolution of the stochastic patch up to N .
The only option remaining to solve the stochastic system of (113) seems then to be numerically; however, a further and very important approximation is usually done, which consists on assuming that the IR (and stochastic) quantities do not differ much from their global background (and deterministic) counterpart, namely Y IR X UV ≃ Here X UV and Y IR are any UV and IR functions, we then define Y b as the equivalent background function of Y IR . Under this approximation, the last term of (116) is: where we have substituted (0) α IR by its background value, i.e., 1. Under this approximation we can write a analytical solution for (116), that, once evaluated at coarse-grained scale, will correspond to the long-wavelength limit of solution (51) with ν = 3 2 , i.e., and therefore the correlators (114) and (115) can be written as follows: Under this approximation, the stochastic system of (113) simplifies considerably and becomes a Markovian process with additive noises, meaning that ξ 1 depends only on time, which is true as long as we are using the global background to characterize it. The system is: where ξ(N 1 )ξ(N 2 ) = δ(N 1 − N 2 ) and we have dropped O(ǫ 1 ) terms in order to be consistent with the computation of the noises and the reasoning above (116). Note that in SR the acceleration of the field is also of higher order in SR parameters so in this case the stochastic equation would simplify even more: Although the approximation used in order to arrive to (122) seems very useful, it has important consequences, in fact, it is equivalent to state that any Y IR − Y b ∼ O X UV . Thus, we immediately see that if this approximation holds, the system in (122) can only reproduce the results of linear theory at leading order in SR parameters, and hence it does not give any non-perturbative (or even non-linear) information. In fact, (122) is slightly inconsistent. The point is that, by the same approximation adopted on the right hand side, the left hand side should also be linearized (in fact the linearization of the left-hand side of (122) is sometimes performed when recursive methods are used in order to solve the stochastic system as it can be seen in [42][43][44]). This inconsistency can however give some information when comparing the results from the stochastic formalism with the ones of linear perturbation theory, in fact, since we know that the correlations functions calculated with the stochastic system of (122) will coincide, up to second order in perturbation theory, to the ones calculated in linear perturbation theory with QFT methods, any inconsistency between the two approaches will signal the break-down of perturbation theory.
Since the stochastic Equation (116) and its deterministic counterpart with (0) α IR = 1 have the same structure, one could be tempted to write the solution of (116) in a similar way as the solution (119) and hence the stochastic system of (113) would approximately be: The system of (125) represents a Markovian process with non-additive noises (the term proportional to ξ does no longer depends only on time), which is, provided that (124) holds, able to describe the inhomogeneities generated during inflation in a non-perturbative way. Unfortunately, we cannot trivially generalize the solution (119) for a deterministic equation to a solution (124) for a stochastic equation, this is due to the differences between stochastic and deterministic integrals (see for example [121]). This is why, although it is the most common approach found in the literature, we will not trust (125) to describe non-linear inflationary effects in this review.
Having discarded this option, we are left with three different ways of solving the stochastic system of (113): • Pros: It is a Markovian process with additive noises able to describe the linear dynamics of the inflationary perturbations even when they are not approximately described by (119).
• Cons: It is not capable of describing any non-linear effects and it is inconsistent generically at leading order in ǫ 1 due to the use of the separate universe approach. This inconsistency can be clearly seen in the term proportional to ξ 1 in (112), which in the background can be written as 3 − ǫ 1 + ǫ1ǫ2

3−ǫ1
and not (3 − ǫ 1 ) as it should be if it came from the MS equation. In fact 3 − ǫ 1 + ǫ1ǫ2 is precisely the term that appears in the equation for Q sep (69) derived in Section IV A 2, making it clear that this inconsistency is a consequence of the separate universe approach.
Due to the problems that the stochastic formalism presented in this section has (namely its validity only up to leading order in ǫ 1 due to the separate universe assumption and its difficulty to describe non-linear dynamics), it does not seem very promising if we want to describe the non-perturbative dynamics of the inohomogeneities generated during inflationary regimes of interest for PBH formation with enough precision. However, in the next subsection we will present a stochastic formalism (firstly presented in [65]) which is at least able to reproduce linear perturbation theory at all orders in SR parameters, and with the potential to correctly describe the non-perturbative dynamics of the inhomogeneities during any inflationary regime.
However, before doing so, and in order to finish this subsection, it is important to know that the stochastic system of (113) can be straightforwardly derived by splitting into a IR and an UV only the scalar field in the separate universe equations of (64). In this case we would have used the metric (63) for the local patch instead of the metric (83) as we have done here. We have not chosen this option in this review because of three main reasons: 1. It does not make it clear that we have used gradient expansion, and hence the problem of not using the momentum constraint that we solve in the next subsection is difficult to remark.

2.
Since it only works if the time variable is unperturbed, it could lead us to think that the number of e-folds N is the only allowed time variable for a stochastic formalism that describes all the scalar inhomogeneities in terms of the inflaton field. On the contrary, the derivation used in this review is valid for any time variable and makes it clear that the description of inhomogeneities in terms solely of the inflaton field is only a gauge choice.
3. It does not explicitly obtains the linear equations used for the characterization of δφ k , more concretely, for example this derivation would miss the k 2 a 2 α UV k term that multiplies the Heaviside theta in (105).

B. Stochastic Formalism Based on O(σ 0 ) Gradient Expansion
As many times claimed during the review, the separate universe approach generically fails to give the correct long-wavelength evolution of the inhomogeneities at O(ǫ 1 ). The O(σ 0 ) gradient expanision solves this problem by including both non-local terms and the momentum constraint, this is why in this section we will construct a stochastic formalism based on O(σ 0 ) gradient expansion.
First of all, it is important to remark that some of the affirmations we did about the uniform-N gauge at the beginning of Section VI A are no longer correct, more concretely, we cannot longer study the scalar inhomogeneities in terms solely of the inflaton field. This is clear form linear perturbation theory where the MS variable can be written as so if we insist on using the uniform-N gauge we must also take into account the contribution from E when studying scalar perturbations. We could also use spatially flat gauge in this case and forget about E; however, in this case we should take into account all the terms proportional to (0) β f i that appear for example in (90), this is why we will keep using the uniform-N gauge, where β i = 0. One can easily check that the stochastic equations for the evolution of the extrinsic curvature (106), for the Hamiltonian constraint (109) and for the evolution of the field (110) do not change when including E and hence the stochastic system is still the one given by (113).
The only difference will then be given by the inclusion of the momentum constraint (15), which, in uniform-N gauge can be written as: where we have used the evolution Equation forγ ij (18) to eliminateÃ ij . Splitting (127) into IR and UV and using the decomposition ofγ ij explained around (87) to keep only O(σ) terms in the IR part (remember that the O(σ 0 ) information from the momentum constraint can only be extracted if we write the momentum constraint up to O(σ)), we can write the stochastic equation for the momentum constraint: where ξ 4 is defined similarly to ξ 1 , ξ 2 and ξ 3 , i.e., With the addition of the stochastic Equation (128) to the system of (113) obtained before, we have a stochastic formalism able to describe the non-linear evolution of scalar inhomogeneities at all orders in SR parameters.
However, it is not all good news: firstly, since the construction of the gradient expansion in Section V, we have been neglecting possible interactions scalar-tensor or scalar-vector, reason why the stochastic formalism constructed here will not take these interactions into account either. Secondly, we do not exactly know how to extract the O(σ 0 ) information form (128) in a fully non linear way. Finally, we do not know which is the combination of (0) φ IR and ∇ 2 C IR that give us the correct and non-perturbative and gauge invariant quantity that describe scalar inhomogeneities, i.e., we do not have a non-linear generalization of the MS variable. The first issue is beyond the scope of this review. With respect to the third one, it is true that a non-linear gauge invariant variable at leading order in gradient expansion has been defined in [117,118] as: However, the variable above does not include the term proportional to ∇ 2 E in its linearization. Reason why it can be only interpreted as a non-linear generalization of Q sep . This is the reason we will not use the non-linear variable defined above and we will solve the second and third issues, at least approximately, by imposing that both the momentum constraint and the non-linear generalization of the MS variable match their linear counterpart when the global background is subtracted. In this way, the O(σ 0 ) information of (128) can be straightforwardly extracted and the whole system of stochastic equations based on the O(σ 0 ) gradient expansion and hence valid at all orders in SR parameters is: where we have used the Hamiltonian constraint to eliminate (0) α IR in the last equation. Note that ξ 1 ,ξ 2 and ξ 4 are constructed in the uniform-N gauge, which is no longer equivalent to the spatially flat gauge.
To see the gauge transformation between spatially flat and uniform-N gauges in linear theory one can see [60], where it is claimed that the differences between those two gauges is always of higher order in gradient expansion; however, this conclusion is reached by considering the value of ǫ 1 at horizon crossing (ǫ * 1 there) to be constant with k, which is generically not a good approximation beyond SR. In fact in [67] it is shown numerically that the difference between δφ f and δφ δN can be O(ǫ 1 ) in regimes of interest for PBH formation, in agreement with the differences between the separate universe approach and O(σ 0 ) gradient expansion remarked in this paper.
Finally, as suggested in [122][123][124], we can define the non-linear counterpart of the MS variable of (44) at leading order in gradient expansion as: where we remind the reader that (0) φ IR and (0) ∇ 2 C IR are both in the uniform N gauge. The stochastic formalism based on the O(σ 0 ) gradient expansion presented in this section was introduced for the first time, although in a different gauge, in [65], where the authors show numerically that, when computing the noises over a global background, i.e., when using the approximation described around (118), the two point correlator computed using the stochastic formalism perfectly matches with the one computed in linear perturbation theory at all orders in SR parameters, as expected. This already represents an important improvement with respect to any stochastic formalism based on the separate universe approach as the one explained in Section VI A.

C. Stochastic Formalism Versus Linear Perturbation Theory
Once the stochastic formalism has been properly introduced, the simplest way to know if there exists any important non-perturbative effect during inflation is to compare the results obtained within the stochastic framework with the results coming from linear perturbation theory explained in Section IV. In order to do so, it is crucial to realize that the results coming from the stochastic formalism are in real space whereas the result for the linear MS variable in terms of Hankel function of (51) is in Fourier space. Due to the difficulty when trying to express stochastic correlators in Fourier space, we will compare the stochastic two-point correlator with the linear two-point correlator, both in real space.
As explained below (115), the correlators between different FLRW patches are described by statistical moments of IR variables, so the two point correlator of the scalar inhomogeneities is: if we are using the separate universe approach or if we are using O(σ 0 ) in gradient expansion. On the other hand, the two-point correlator in linear perturbation theory is defined as (see (52)): where we are now using the power spectrum evaluated at the same spatial point where Q k in uniform-N gauge is given by (126). The limits of (134) correspond to the selection of modes inside the coarse-grained scale defined by k = σa(N )H b (N ) and they are necessary to correctly compare the two point correlators obtained in both formalisms, in fact, in the stochastic formalism the IR part of the field recieves stochastic kicks from N = 0 onwards. Thus the first k-mode from which the IR field recieves a kick is the one with k = σa(N = 0)H b (N = 0). Whenever P Q (k, N ) does not depend on N , one can do a very useful approximation, which consist on evaluating the power spectrum at coarse-grained scale crossing, i.e., at k = σaH b , and assume that this value does not change with time. This would allow us to write (134) as: In this case one could write the power spectrum as the derivative with respect to the number of e-folds N of the correlator in real space.
However, and although it has been sometimes wrongly used for a time-dependent power spectrum [33], this technique cannot be used if the power spectrum evolves with time, which makes it only valid at zeroth order in O(ǫ 1 ).
Finally, and due to the difficulty in the definition of non-linear gauge invariant variables (remember for example that the "trivial" non-linear generalization of the MS variable of (131) is not guaranteed to be the correct one), some authors have used the so-called stochastic-δN formalism to compute the non-linear curvature perturbation, which is very useful when dealing with PBHs, by studying the statistics in the number of e-folds [36,41,46,50,51]. However, and similarly to what happens with the usual δN formalism, the stochastic-δN formalism uses the separate universe approach and hence it is generically only valid at leading order in SR parameters.

VII. CONCLUSIONS
The stochastic approach to inflation seems the most promising way to study, in a non-perturbative way, the probability distribution of the scalar inhomogeneities generated during inflation and responsible for the creation of PBHs. However, as remarked during the review, there is still a lot of work to do in that direction. Indeed, although very useful, the most common and widely used stochastic formalism has some difficult problems to solve, which can be summed up in three points: 1. As we have seen during the review, the stochastic formalism uses a gradient expansion for the IR part and a perturbative expansion for the UV part in such a way that the IR part, due to the large wavelength of the characteristic inhomogeneities that form this sector, can be described as a local FLRW universe. This description, that we called O(σ 0 ) gradient expansion, relates the different local FLRW patches via non-local terms and the momentum constraint, and describes at all orders in SR parameters the correct dynamics of long wavelength scalar inhomogeneities. However, it presents some problems such as the extraction of the O(σ 0 ) information from the momentum constraint.
This is the reason why it is usually assumed that the different local FLRW universes evolve independently from each other, which is an assumption known as the separate universe approach. Under this approximation, the problem with the momentum constraint disappears (the momentum constraint itself disappears); nevertheless, we checked in Section IV A 2 that this assumption fails to describe the correct long-wavelength dynamics of scalar perturbations generically at O(ǫ 1 ) already in its linear limit. For this reason, the stochastic formalism commonly used presented in Section VI A, which is based on the separate universe approach will fail to describe the non-perturbative dynamics of the scalar inhomogeneities at O(ǫ 1 ).
A stochastic formalism that does not uses the separate universe approach can also be constructed, as we did in Section VI B. However, this option is not without its difficulties, more concretely, it has problems when extracting the long wavelength information from the momentum constraint and when describing the scalar inhomogeneities with a gauge invariant generalization of the MS variable.
2. On the other hand, the UV part of the inhomogeneities is assumed to behave perturbatively, having as a background the local FLRW background defined by the IR part. Since the UV part acts as a stochastic noise for the IR part and the IR part is necessary to characterize the UV noises, the stochastic approach to inflation is generically a non-Markovian process, meaning that the value of the noises depend on the whole history of the local patch over which they are computed.
This does not represent a huge problem when solving the stochastic equations numerically; however, in order to have analytical results we need to do some other approximation that makes the process Markovian. This approximation consists of computing the stochastic noises over the global background instead of over the local one. Unfortunately, doing so is equivalent to assume that all the IR inhomogeneities are linear in perturbation theory.
3. Finally, and although we have not paid too much attention to this problem, any stochastic approach that aims to describe the non-perturbative behaviour of scalar inhomogeneities at the long wavelength limit should also take into account scalar-vector and scalar-tensor interactions, which no longer decouple beyond linear perturbation theory.