Mass and rate of hierarchical black hole mergers in young, globular and nuclear star clusters

Hierarchical mergers are one of the distinctive signatures of binary black hole (BBH) formation through dynamical evolution. Here, we present a fast semi-analytic approach to simulate hierarchical mergers in nuclear star clusters (NSCs), globular clusters (GCs) and young star clusters (YSCs). Hierarchical mergers are more common in NSCs than they are in both GCs and YSCs, because of the different escape velocity. The mass distribution of hierarchical BBHs strongly depends on the properties of first-generation BBHs, such as their progenitor's metallicity. In our fiducial model, we form black holes (BHs) with masses up to $\sim{}10^3$ M$_\odot$ in NSCs and up to $\sim{}10^2$ M$_\odot$ in both GCs and YSCs. When escape velocities in excess of 100 km~s$^{-1}$ are considered, BHs with mass $>10^3$ M$_\odot$ are allowed to form in NSCs. Hierarchical mergers lead to the formation of BHs in the pair instability mass gap and intermediate-mass BHs, but only in metal-poor environments. The local BBH merger rate in our models ranges from $\sim{}10$ to $\sim{} 60$ Gpc$^{-3}$ yr$^{-1}$; hierarchical BBHs in NSCs account for $\sim{}10^{-2}- 0.2$ Gpc$^{-3}$ yr$^{-1}$, with a strong upper limit of $\sim{}10$ Gpc$^{-3}$ yr$^{-1}$. When comparing our models with the second gravitational-wave transient catalog, we find that multiple formation channels are favored to reproduce the observed BBH population.

One of the distinctive signatures of the dynamical scenario is the formation of hierarchical mergers, i.e. repeated mergers of stellar-origin black holes (BHs) that build up more massive ones [84][85][86][87][88]. This process is possible only in dense star clusters, where the merger remnant, which is initially a single BH, can acquire a companion by dynamical exchanges [89]. The main obstacle to the formation of second-generation (2g) BHs via hierarchical mergers is the high relativistic kick that the merger remnant receives at birth, because of radiation of arXiv:2007.15022v2 [astro-ph.HE] 12 Sep 2021 linear momentum through beamed GW emission (e.g. [90][91][92][93]). This kick can be up to several thousand km s −1 and can easily eject the BH remnant from its parent star cluster [76,[94][95][96][97]. Hence, the interplay between the properties of the host star cluster (e.g. its escape velocity), those of the first-generation (1g) BBH population and the magnitude of the kick decides the maximum mass of a merger remnant in a given environment. This might be used to constrain the formation channels of BBHs.
The spins of 1g BHs are one of the critical ingredients, because relativistic kicks are sensitive to spin magnitudes and orientation (e.g. [98,99]). In the zero-spin assumption, more than 10% of merging BBHs from GCs have components formed from previous mergers, accounting for more than 20% of the mergers from GCs detectable by Advanced LIGO and Virgo [100].
The main challenge of studying hierarchical mergers is the computational cost. It is nearly impossible to investigate the relevant parameter space with hybrid Monte Carlo and/or Nbody simulations of star clusters, especially GCs and NSCs. Here, we present a new fast and flexible semi-analytic model to investigate hierarchical mergers in different environments, complementary to dynamical simulations. Our new tool allows us to probe the parameter space (1g masses, spins, delay times, 2g masses, spins and delay times, escape velocity from the parent cluster and kick magnitudes) and to reconstruct the merger rate evolution of each formation channel, with just minimal model assumptions.

Methods
We consider four different environments: i) the field, where hierarchical mergers are not possible, ii) young star clusters (YSCs), which are the main birth site of massive stars in the local Universe (e.g. [127]), iii) globular clusters (GCs), and iv) nuclear star clusters (NSCs). To evaluate the properties of 1g mergers, we start from catalogs of single and binary BHs obtained with population-synthesis simulations. When the 1g BHs merge, we estimate the relativistic kick v kick and the escape velocity from the parent star cluster v esc . If v kick < v esc , we assume that the merger remnant remains bound to its parent star cluster and can pair with another BH dynamically. We estimate the mass and spin of the merger remnant and of its new companion, as detailed below. Then, we randomly draw a new delay time between previous and next merger. If the sum of the new delay time and the previous one is shorter than the Hubble time, we repeat the loop for another generation.

First generation (1g) mergers
We take the mass of 1g BHs from our population synthesis simulations. In particular, we used our code MOBSE [28,30,31]. MOBSE is an upgraded and customized version of BSE [128]. The treatment of stellar winds is one of the key aspects affecting the final mass of BHs and is subject to a number of uncertainties [129]. In MOBSE, mass loss by stellar winds for massive host stars (O-type, B-type, luminous blue variable and Wolf-Rayet stars) is modeled asṀ ∝ Z β , where Z is the metallicity and In eq. 1, Γ e is the Eddington ratio, i.e. the ratio between the luminosity of the star and its Eddington value. This formalism, introduced by [130], is a fit to the models presented in [131]. It accounts for both the Z dependence of line-driven winds [132] and the importance of Thomson scattering when a star is nearly radiation pressure dominated [131,133]. In detail, for O and B-type stars with effective temperature T eff ≥ 12500 K, we use the same fitting formulas as introduced by [132], but we correct the Z dependence as described in equation 1. We model the mass loss rate of Wolf-Rayet stars as [134] The mass loss rate of luminous blue variable stars is [30] Finally, the treatment of mass loss of cold massive stars is the same as originally described by [135]. The effect of core-collapse supernovae on the mass of compact objects is described following the delayed model of [136]. According to this model, stars with final carbon-oxygen mass m CO 11 M collapse to a BH directly. The minimum BH mass is 3 M . Following [137] and [138], we compute neutrino mass loss for both neutron stars and BHs as where m bar is the baryonic mass of the compact object. The resulting gravitational mass of the compact object is m grav = m bar − m ν . Stars with helium core mass (at the end of carbon burning) 32 ≤ m He ≤ 64 and 64 ≤ m He ≤ 135 undergo pulsational pair instability and pair instability supernovae, respectively [104]. Stars that undergo a pair instability supernova leave no compact remnant, while stars going through pulsational pair instability become BHs with mass m BH = α P m no, PPI , where the possible values of α P ≤ 1 are discussed in [107] and m no, PPI is the BH mass from direct collapse, if pulsational pair instability is not accounted for. Finally, electron-capture supernovae are included following [139]. For natal kicks, we adopt the prescription v k ∝ m ej m −1 rem , where m ej is the mass of the ejecta and m rem is the mass of the compact remnant (neutron star or BH, [140]).
Binary evolution processes (wind mass transfer, Roche lobe overflow, common envelope, mergers, tidal evolution, GW decays) are implemented as in [128], with one significant exception. During Roche lobe overflow, the accretion rate is calculated aṡ whereṁ a is the accretion rate,ṁ d is the mass loss rate by the donor,ṁ Edd is the Eddington accretion rate and f MT ∈ (0, 1] is the accretion efficiency. Here, we consider f MT = 0.1, 0.5, 1.0. The original prescriptions by [128] are close to f MT = 1.0. We parametrize common envelope evolution with the parameter α [128]. Here, we consider α = 1, 5, 10, large values of α meaning that the envelope is easily ejected, without much shrinking of the binary. In its original meaning [141], α is the fraction of orbital energy that is transferred to the envelope during the spiral-in phase. Here, we also consider values of α > 1, because the original formalism does not include additional contributions to the energy budget (e.g. [142,143]). Giacobbo et al. (2018, [30]) have shown (e.g. their Figure 4) that with these prescriptions for stellar and binary evolution the maximum mass of a single BH can be as high as m BH ≈ 65 − 70 M . Such massive BHs come from metal-poor stars 1 (Z ∼ 0.0002) with initial mass m ZAMS ≈ 70 − 80 M , which retain most of their hydrogen envelope at the time of collapse and have sufficiently small helium cores to avoid pulsational pair instability [107]. However, the maximum mass of a BH merging within a Hubble time as a result of isolated binary evolution is only m BH ≈ 50 M [30]. This happens because binary stars that are sufficiently tight to merge within a Hubble time by GW emission evolve through mass transfer and common envelope. These processes remove the hydrogen envelope, leading to smaller BH masses. Hence, the resulting BBH cannot have a total mass higher than m TOT ≈ 100 M .
In dynamical environments, exchanges and dynamical hardening might allow even more massive BHs to merge, up to total binary masses m TOT ≈ 130 − 140 M [56]. For this reason, we consider two different sets of models for 1g masses. In our fiducial model A5F05 (conservative approach), the masses of 1g BBHs are randomly drawn from catalogs of BBHs simulated with MOBSE. In this case, the evolution of the semi-major axis A and of the eccentricity e of the BBHs are calculated as [146] dA dt = − 64 5 where G is the gravity constant, c is the speed of light, m 1 and m 2 are the masses of the primary and secondary BH, respectively. In the HIGH_MASS model (optimistic approach), the masses of field BBHs are still taken from catalogs of BBH mergers, while the masses of 1g dynamical BBHs are uniformly drawn from the list of all the BHs formed with MOBSE, which include both single and binary BHs, both merging and non-merging systems. This ensures that the masses of dynamically formed 1g BBHs can reach m TOT ≈ 140 M , while the maximum total mass of field binaries is m TOT ≈ 100 M . In the HIGH_MASS case, we randomly pair the primary and the secondary component and we randomly draw the delay time (i.e., the time elapsed from the formation of the BBH to its merger) from a distribution dN/dt ∝ t −1 between t min = 10 7 yr and t max = 1.4 × 10 10 yr [22,56].
We define the dimensionless spin magnitude a as a ≡ S c/(G m 2 BH ), where S is the spin magnitude in physical units. Spin magnitudes of 1g BHs are randomly drawn from a Maxwellian distribution with fiducial one-dimension root-mean square σ a = 0.2 and truncated at a = 1. We consider also two extreme cases in which σ a = 0.01 (LOW_SPIN model) and σ a = 0.4 (HIGH_SPIN model). This is just a toy model because the uncertainties on BH spin magnitudes from stellar evolution and core-collapse supernova models are still too large to make predictive statements. Angular momentum transport via the magnetic Tayler-Spruit instability might be effective and lead to predominantly low spins (e.g. [147,148]), while binary evolution processes can significantly affect the overall picture [149,150]. Our LOW_SPIN case can be interpreted as the result of the spin distribution inferred by [147]. Spin directions in dynamical BBHs are isotropically distributed over a sphere [62].
Our set of runs is described in Table 1. The initial MOBSE population of each model is obtained running 1.2 × 10 8 binary stars with metallicity Z = 0.02, 0.016, 0.012, 0.008, 0.006, 0.004, 0.002, 0.0016, 0.0012, 0.0008, 0.0004, 0.0002. The initial mass of the primary is drawn from a Kroupa initial mass function [151] between 5 and 150 M . Mass ratios, orbital periods and eccentricities are randomly drawn following the distributions presented in [152]. Column 1: Name of the model. Column 2: parameter α of common envelope for 1g BBHs. Column 3: parameter f MT of accretion efficiency for non-degenerate accretors (eq. 5) in the case of 1g BBHs. Column 4: delay time distribution of 1g BBHs; 'eq. 6' indicates that the delay times were calculated solving eq. 6; 't −1 ' means that delay times were randomly drawn from dN/dt ∝ t −1 . Column 5: one-dimensional root-mean square associated with the Maxwellian distribution used to extract 1g spin magnitudes; we adopted values σ a = 0.2 (fiducial), 0.01 (LOW_SPIN), 0.4 (HIGH_SPIN). Column 6: distribution from which we drew the mass of the secondary component in the Ng BBHs; 'uniform' means that m 2 is uniformly distributed between m MIN = 3 M and m MAX = m 1 (fiducial), 'MOBSE' means that we randomly selected m 2 from catalogs of BHs simulated with MOBSE (used in the SMALL_M2 run). Column 7: t min is the minimum delay time for Ng BBHs. Column 8: mean and standard deviation of the lognormal distribution of escape velocities v esc for NSCs, GCs and YSCs.

Relativistic kicks
We model the magnitude of relativistic kicks according to equation 12 of [98]: where In the above equations, q = m 2 /m 1 with m 2 ≤ m 1 , η = q (1 + q) −2 , A = 1.2 × 10 4 km s −1 , B = −0.93, H = 6.9 × 10 3 km s −1 , (V 1,1 , V A , V B , V C ) = (3678, 2481, 1792, 1506) km s −1 , ξ = 145 • [153], while a 1 and a 2 are the spin vectors of the primary and secondary BHs, respectively. Moreover, a 1 (a 2 ) is the component of the spin of the primary (secondary) BH parallel to the orbital angular momentum of the binary system, while a 1⊥ (a 2⊥ ) is the component of the spin of the primary (secondary) BH lying in the orbital plane. S is the component parallel to the orbital angular momentum of the vector S = 2 ( a 1 + q 2 a 2 )/(1 + q) 2 . Finally, φ ∆ represents the angle between the direction of the infall at merger (which we randomly draw in the BBH orbital plane) and the in-plane component of ∆ ≡ (m 1 + m 2 ) 2 ( a 1 − q a 2 )/(1 + q), while φ is the phase of the BBH, randomly drawn between 0 and 2 π. Equation 7 results from an empirical model for the recoil velocity as a function of the progenitor's parameters (mostly q, a 1 and a 2 ). The basic idea behind it is that the recoil of spinning BHs is mostly produced close to the time of merger [154], and can be modeled by a parametrized dependence of the leading (on spins and mass ratio) post-Newtonian expressions for the linear momentum radiated [155]. The term v m mainly comes from the contribution of the mass ratio q to the linear momentum radiated (the merger of an unequal mass BBH produces a kick even if the two BHs are non spinning), while v ⊥ and v account for the contribution of the spin components aligned and orthogonal to the orbital angular momentum, respectively. The coefficients in eq. 7 are given by fits to full numerical relativity simulations, as detailed in [98]. This formalism yields kicks up to ∼ 4000 km s −1 , but the most common kicks are of the order of a few hundred km s −1 , as shown in Figure 9 of [98] and discussed in Section 3.1.

Escape velocities
For each merger, we calculate the relativistic kick magnitude as in eq. 7 and then compare it with the escape velocity of the host cluster v esc . If v kick < v esc , the remnant is retained inside its host cluster and can undergo another merger. Otherwise, it is ejected and remains a single BH. We randomly draw v esc from a log-normal distribution with median 2) for NSCs, GCs, and YSCs, respectively. This choice is motivated by observations of NSCs [71], GCs [156] and YSCs [127] in the local Universe. Figure 1 shows the distribution of escape velocities in our fiducial case. In the next sections, we show what happens if we change these assumptions. Namely, in the BROAD_VESC (NARROW_VESC) model we assume σ v = 0.3 (0.1) for NSCs, GCs and YSCs.

Nth generation (Ng) mass and spin
We model the mass and spin of merger remnants using the fitting formulas in [157] for quasi-circular non-precessing mergers (see also [76,158,159]). The final mass is ≈ 0.95 the total mass of the two merging BHs, while the final spin magnitude clusters around a f ≈ 0.75. If the merger remnant is retained, it eventually pairs up with another BH. The mass of the companion is selected in two different ways. To account for the fact that the secondary component might be either a 1g or a Ng object with N > 1, we uniformly draw the mass of the secondary BH, m 2 , between m MIN = 3 M and m MAX = m 1 (fiducial model). This assumption favors Ng−Ng mergers with respect to Ng−1g mergers. To account for cases in which the primary component is a Ng merger (with N > 1) and the secondary component is a 1g BH, we draw the mass of the secondary BH from the population-synthesis catalogs of 1g BHs (model SMALL_M2).
The spins of secondary BHs are randomly drawn from a Maxwellian distribution with default one-dimensional root-mean square σ a = 0.2. In the LOW_SPIN (HIGH_SPIN) case, σ a = 0.01 (0.4). This is a simplification, because we do not distinguish whether the secondary component is a Ng or a 1g BH. The spin vectors of both the primary and secondary BH are isotropically distributed over a sphere.
Finally, in all models we check that the mass of the remnant BH is always less than This condition is equivalent to assuming that the most massive BH cannot be more massive than the total mass of all BHs in the star cluster, assuming a Kroupa IMF. If a BH hits this mass threshold, it cannot grow any further by hierarchical merger.

Delay times of Ng mergers
For Ng mergers (where N = 2 or more), we randomly draw the delay times according to a distribution uniform in dN/dt ∝ t −1 [22] and spanning from t min = [0.1, 10, 100] Myr (see column 7 of Table 1) to t max = 1.4 × 10 4 Myr. By adopting dN/dt ∝ t −1 , we have assumed that GW decay is the dominant effect to determine the delay time. Since the GW timescale τ GW ∝ A 4 , where A is the initial semi-major axis of the BBH [146], for an initial semi-major axis distribution f (A) ∝ A −1 , we obtain a delay time distribution dN/dt ∝ t −1 [160][161][162]. This is the crudest assumption in our method, because we neglect the impact of dynamical hardening on the evolution of the BBH semi-major axis. However, this assumption is supported by N−body simulations of dense YSCs [55,56], which show that dynamical BBH mergers follow a trend dN/dt ∝ t −1 . The choice of t min depends on the time for dynamical pairing of the BBH t dyn , i.e. the time needed for a single BH to find a new companion BH via dynamical interactions. This can be estimated as the sum of the dynamical friction timescale (t DF , i.e. the time over which the merger remnant, which is ejected in the outskirts of the star cluster by the relativistic kick, sinks back to the core of the parent cluster by dynamical friction, [163]) and the three-body timescale (t 3bb , i.e. the timescale for BBH formation by three-body encounters, [164]): where M SC and n are the star cluster mass and central number density, while σ SC = v esc /(2 √ 3) is the one-dimensional velocity dispersion. In eq. 10, we assumed that the average mass of a star in the star cluster is 1 M and that, locally, the star cluster is in equipartition [165]. The timescale for dynamical BBH formation is then t dyn = t DF + t 3bb . For most star clusters, we find 0.1 ≤ t dyn /Myr ≤ 100, roughly corresponding to the values of t min we assume in our analysis (column 7 of Table 1).

Summary of the models
In Table 1, A5F05 is our fiducial model. Models with name AiFj (with i = 1, 5, 10 and j = 01, 05, 1) differ from the fiducial model only for the choice of the common envelope parameter α (α = 1, 5, 10 if i = 1, 5, 10) and of the accretion efficiency f MT ( f MT = 0.1, 0.5, 1 if j = 01, 05, 1). The model HIGH_MASS differs from the fiducial model for the choice of the masses of 1g BHs and for their delay time distribution. In the model HIGH_MASS, 1g BH masses in star clusters are uniformly sampled from all BHs generated with MOBSE (including single BHs), mimicking the impact of dynamical exchanges. Delay times are drawn from dN/dt ∝ t −1 .
The SMALL_M2 model differs from the fiducial one for the masses m 2 of secondary BHs in Ng mergers, which are randomly drawn from 1g BHs. Hence, in this model all BBH mergers occur with a 1g secondary BH. In contrast, m 2 is uniformly sampled in [3 M , m 1 ] in all the other models.
The LOW_SPIN and HIGH_SPIN models differ from the fiducial model only for the distribution of spin magnitudes of 1g BHs. The one-dimensional root-mean square σ a is 0.01 and 0.4 in LOW_SPIN and HIGH_SPIN, respectively.
The SHORT_DELAY and LONG_DELAY models differ only for the minimum value of the delay time t min of Ng mergers, which is 0.1 and 100 Myr, respectively. Finally, the BROAD_VESC and NARROW_VESC models differ from the fiducial case for the standard deviation of the log-normal distribution of v esc , which is 0.3 and 0.1 in the former and in the latter case.

Merger rate
We calculate the merger rate by assuming that each channel accounts for a fraction f i (t) of the star formation rate density at a given look-back time t (where i = NSC, GC, YSC or field). In our fiducial model, we assume that f GC (t) is given by where f max,GC = 0.1, t GC = 11.8 Gyr and σ t = 2.5 Gyr. The parameters t GC and σ t are chosen based on the age distribution of Galactic GCs [166][167][168].
NSCs likely are the result of the dynamical assembly of GCs, which sink to the center of the galactic potential well by dynamical friction [169][170][171][172][173][174][175], plus some contribution from in situ star formation [176]. Hence, we assume the same functional form for f NSC (t), with a different normalization: where f max,NSC = 0.01, while t GC and σ t are the same as in eq. 11. The values of both f max,GC and f max,NSC are calibrated to give a mass budget of GCs and NSCs that matches the observed ones at low redshifts [156,177].
To keep the fiducial model as simple as possible, we assume In the following sections, we briefly discuss the impact of changing the f i (t) parameters on the merger rate.
The total merger rate for each channel is then evaluated as where t(z) is the look-back time at redshift z, ψ(z ) is the cosmic star formation rate density at redshift z , f i (z ) is the fraction of the total star formation rate that goes into channel i = NSCs, GCs, YSCs or field at redshift z , Z min (z ) and Z max (z ) are the minimum and maximum metallicity of stars formed at redshift z , η(Z) is the merger efficiency at metallicity Z, and F (z , z, Z) is the fraction of BBHs that form at redshift z from stars with metallicity Z and merge at redshift z, normalized to all BBHs that form from stars with metallicity Z. To calculate the look-back time we take the cosmological parameters (H 0 , Ω M and Ω Λ ) from [178]. The maximum considered redshift in equation 14 is z max = 15, which we assume to be the epoch of formation of the first stars. The merger efficiency η(Z) is estimated as the number of BBHs that merge within a Hubble time in a coeval population of star with initial mass M * and metallicity Z, divided by M * . We take the fitting formula for the star formation rate density ψ(z ) from [179]: and we model the metallicity evolution as described in [180]. Equation 15 is based on data ranging from z = 0 to z ∼ 10 [179]. In future work, we will consider alternative options to model the very high redshift star formation, such as modelling population III stars separately [181] and considering high-redshift long gamma-ray bursts as tracers of star formation ( [182], but see [183] for some caveats).  Figure 2 shows the mass of the primary BH (m 1 ), the mass of the secondary BH (m 2 ), the spin magnitude of the primary BH (a 1 ) and the kick velocity (v kick ) in the fiducial model (A5F05) for Z = 0.0002. The maximum primary and secondary mass strongly depend on the environment: we have 2g BBHs even in YSCs and GCs, but NSCs are more effective in producing hierarchical mergers, because of the larger value of v esc . In the fiducial model, the maximum primary mass is ≈ 100 M in YSCs and GCs, while it is close to ≈ 10 3 M in NSCs. The distribution of primary spin magnitudes shows a clear secondary peak at a 1 ≈ 0.7 − 0.8 in both GCs and NSCs, corresponding to the typical values of Ng merger remnants. The most common kick velocities are v kick ∼ 100 − 300 km s −1 , but larger kicks, up to ∼ 3000 km s −1 , are possible. Figures 3 and 4 compare the primary mass distributions that we obtain by varying the values of α and f MT in the first generation of BHs for metallicity Z = 0.0002 and 0.002, respectively. By comparing Figures 3 and 4, it is apparent that both the BH mass distribution and the maximum BH mass strongly depend on progenitor's metallicity, even in hierarchical mergers. Also, the efficiency of common-envelope ejection α and the efficiency of mass accretion f MT significantly affect the mass distribution of Ng BHs. Hence, the mass distribution of 1g BBHs, which strongly depends on metallicity, has a crucial impact on the mass distribution of Ng BHs.

Properties of hierarchical mergers
In Figure 5, we fix α = 5, f MT = 0.5 and Z = 0.0002, and consider the impact of the other main parameters of our model. Drawing the mass of 1g BBHs from the distribution of all 1g BHs (including single BHs) shifts the entire distribution of dynamical mergers to higher masses. In the model HIGH_MASS, the most common primary mass of dynamical BBHs is ∼ 30 − 50 M , while the primary masses of field BBHs peak at ∼ 10 M . The reason is that MOBSE allows the formation of BHs with mass up to ∼ 65 M , but small BHs  In the SMALL_M2 model, we draw the secondary mass from the distribution of 1g BHs, which is equivalent to assuming that only Ng-1g mergers are possible. Hence, this model differs from the others because of the smaller values of q = m 2 /m 1 in hierarchical mergers. We observe a peculiar trend of m 1 in the NSC case: values of m 1 ∼ 10 4 M are about one order of magnitude more common than m 1 ∼ 200 M . The reason is that relativistic kicks get smaller and smaller if q tends to zero. Hence, the maximum mass of the primary BH in this model is set by the number of hierarchical mergers that happen within a Hubble time, rather than by the relativistic kicks.
If we compare the LOW_SPIN (σ a = 0.01) and HIGH_SPIN model (σ a = 0.4), we see that high-mass BHs are more and more suppressed if the spin distribution moves to higher values, because relativistic kicks get stronger. In contrast, we find only a mild difference between the SHORT_DELAY and LONG_DELAY model, in which we change the minimum delay time t min .
Finally, the escape velocity has a large impact on hierarchical BH masses, especially for NSCs. BHs with mass up to ∼ 10 6 M form if σ v = 0.3 (BROAD_VESC), two orders of magnitude more than if σ v = 0.2 (fiducial case, A5F05) and three orders of magnitude more   than if σ v = 0.1 (NARROW_VESC). This result is consistent with both [102] and [47], who report that the maximum BH mass approaches 10 6 M if v esc ≥ 300 km s −1 are considered. This might be a key ingredient to understand the formation of super-massive BHs and the connection between the mass of the central BH and its parent galaxy mass/central velocity dispersion [177,[184][185][186]. Figure 6 shows the number of BBHs we simulated per each generation Ng in the case of Z = 0.0002 and the maximum primary mass in each generation. We only show NSCs because BHs in GCs and YSCs do not exceed the 5th and 3rd generation, respectively. The maximum number of generations in NSCs ranges from a few to a few thousands. In the fiducial case and in most of the other simulations, the maximum number of generations is N ∼ 10. Only in three cases we obtain a significantly larger number of generations, namely the BROAD_VESC model (≈ 40 generations), the HIGH_MASS model (≈ 50 generations) and the SMALL_M2 model (≈ 5000 generations). The SMALL_M2 case outnumbers all the other models for the number of generations, because of the strong dependence of v kick on q. However, even in this extreme case, the number of Ng mergers with N ≥ 20 is ∼ 10 5 times lower than the number of mergers in the first generation. The upper panel of Fig. 7 shows the fraction of Ng BBH mergers with N > 1 with respect to all BBH mergers, defined as f >1g = (N 2g + N 3g + .. + N Ng )/N BBH , where N 1g , N 2g , N 3g ,..,N Ng is the number of 1g, 2g, 3g,..,Ng BBH mergers and N BBH is the total number of BBH mergers summing up all possible generations including the first one. In this figure, f >1g is only shown for NSCs. In the fiducial model and in NSCs, Ng BBH mergers with N > 1 are about 16% of all the BBH mergers, with a small dependence on metallicity. For other models, the percentage of Ng BBHs can be as low as ∼ 8% (HIGH_SPIN case) or as high as ∼ 40 − 50% (LOW_SPIN case). For GCs and YSCs these percentages should be lowered by a factor of ∼ 30 and ∼ 10 3 , respectively. Table 2 reports the values of f >1g in detail.

BHs in the mass gap and IMBHs
Hierarchical mergers could be responsible for the formation of BHs with mass in the pair instability mass gap (∼ 60 − 120 M ) or even in the IMBH regime (> 100 M ). The bottom left panel of Figure 7 shows f PISN defined as f PISN = N PISN /N BBH , where N PISN is the number of BBH mergers with primary mass in the pair instability mass gap, while N BBH is the number of all BBH mergers. In our fiducial model and in NSCs, ∼ 0.7% of all BBH mergers contain at least one BH in the pair instability mass gap at the lowest metallicity (Z = 0.0002). This percentage decreases as metallicity increases and drops to zero at Z ≥ 0.012. The other models follow the same trend with metallicity. The HIGH_MASS model is the one with the largest value of f PISN : in this case, up to 7.5% of all the BBH mergers contain at least one BH in the pair instability mass gap at the lowest metallicity (Z = 0.0002). These percentages should be lowered by a factor of 10 in GCs and by a factor of ∼ 10 3 in YSCs.
In the bottom right panel of Figure 7, we show the fraction of IMBH mergers f IMBH , defined as f IMBH = N IMBH /N BBH , where N IMBH is the number of BBH mergers with primary mass m 1 > 10 2 M . The fraction of IMBH mergers follows the same trend with metallicity as f PISN : it is higher at lower Z and drops to zero at Z ≥ 4 × 10 −3 . In the fiducial model, f IMBH ∼ 5 × 10 −4 at Z = 0.0002 in NSCs. We find no IMBHs in GCs and YSCs in the fiducial case. The fraction of IMBH mergers is maximum in the SMALL_M2 simulation, where f IMBH ∼ 2 × 10 −2 at Z = 0.0002. Moreover, f IMBH ∼ 6 × 10 −4 and ∼ 3 × 10 −5 at Z = 0.0002 in the HIGH_MASS case for GCs and YSCs, respectively. Table 2 reports the values of f PISN and f IMBH in detail. Figures 8 and 9 show the merger rate density evolution for all our models, calculated as detailed in Section 2.7. The contribution of each channel to the total merger rate density is set by the value of f i (z), because hierarchical mergers are only a small fraction of the total BBH mergers (Figure 7). Since f i (z) is highly uncertain, the relative importance of different channels in Figures 8 and 9 can change wildly and is only indicative. The uncertainty is particularly large for field and YSCs.

Merger rates
Models with α = 1 have a higher merger rate than models with α = 5, 10. The merger rate evolution of dynamical BBHs in the HIGH_MASS case is remarkably different from the other cases. The reason is our choice of the delay time distribution of 1g BBHs (dN/dt ∝ t −1 ), which does not take into account a possible dependence of t delay on the mass and other properties of BBHs. In particular, the delay time distribution obtained with MOBSE tends to deviate from the dN/dt ∝ t −1 trend when t delay < 1 Gyr. Hence, dynamical BBHs in the HIGH_MASS case have shorter delay times than the fiducial case. Figures 8 and 9 also show the BBH merger rate density we obtain if we consider only Ng BBHs in NSCs. In the local Universe, the merger rate density of Ng BBHs in NSCs ranges from ∼ 10 −2 to ∼ 0.2 Gpc −3 yr −1 . For GCs and YSCs we obtain lower values because, even if these star clusters are likely more common than NSCs, the occurrence of Ng BBH mergers in GCs and YSCs is lower than in NSCs (e.g. Section 3.1). Figure 10 shows the total mass distribution of primary BHs in the source frame at redshift z = 1. NSCs are responsible for the high mass tail (m 1 100 M ) at all redshifts and in all models. We show only the distribution at z = 1, because we do not see significant changes of the mass distribution with redshift in all cases but the HIGH_MASS model. In this case, the importance of dynamical BBHs drops at redshift zero because of the different delay time distributions (Figure 9). Figure 11 shows the mass distribution of primary BHs at redshift z = 2 for NSCs only. We separate 1g BBHs from Ng BBHs with N > 1. The maximum mass of 1g BBHs extends up to ∼ 40 M in all simulations but the HIGH_MASS case. In the HIGH_MASS case, 1g BHs with mass up to ∼ 100 M are possible, because this model includes BHs that form with mass in the pair instability gap from the merger of massive stars [56] and acquire companions by dynamical exchanges.

Mass distribution at different redshifts
The mass of Ng BHs extends up to ∼ 100 − 200 M in most models, with the exception of the following runs. In the HIGH_MASS case, we find primary BHs with mass up to ∼ 600 − 10 3 M . In the SMALL_M2 case, the most massive BH reaches ∼ 5 × 10 4 M . Finally, this realization of the BROAD_VESC model produces one single BH with mass ∼ 4.3 × 10 5 M . To obtain the shown distributions, we started from catalogs of ≥ 10 6 BBHs.     confirms that the distribution of Ng BBHs strongly depends not only on the properties of the environment (e.g. v esc ) but also on the mass distribution of 1g BHs.

Comparison with BBHs in GWTC-2
To compare our models against GW events in the first (O1), second (O2) and in the first part of the third observing run (O3a) of the LIGO-Virgo collaboration (hereafter, the GWTC-2 catalog [9]), we use a hierarchical Bayesian approach. In this framework, the posterior for a set of data {h} k observed during an observation time T obs and a model parametrized by λ is well described by an in-homogeneous Poisson process [187,188] where θ are the GW parameters, N λ is the number of events predicted by the astrophysical model, µ λ is the predicted number of detections associated with the model and GW detector, π(λ, N λ ) is the prior distribution on λ and N λ , and L k ({h} k |θ) is the likelihood of the k−th detection.
The predicted number of detections is given by µ(λ) = N λ β(λ), where β(λ) = θ p(θ|λ) p det (θ) dθ is the detection efficiency of the model; p det (θ) is the probability of detecting a source with parameters θ and can be inferred by computing the optimal signal-to-noise ratio and comparing it to a detection threshold [189]. The values for the event's log-likelihood are derived from the posterior and prior samples released by the LIGO-Virgo collaboration, such that the integral in the above equation is approximated with a Monte Carlo approach as where θ k i is the i−th posterior sample for the k−th detection and N k s is the total number of posterior samples for the k−th detection. To compute the prior term in the denominator, we use a Gaussian kernel density estimation.
In practice, for each model, we generate a catalog of a fixed number of sources (fixed to 50000 sources), such that the sources are distributed according to the merger rate density of the model. Each entry of the catalog is represented by a set of parameters θ = {M c , q , χ eff , z}, where M c is the chirp mass of the source, q the mass ratio, χ eff is the effective spin and z the redshift, that was set to take values between 0 and 2. More details on this procedure are described in [188] and [189].
In our analysis, our model distribution is the sum of the contributions from multiple channels (isolated BBHs, dynamical BBHs in YSCs, GCs and NSCs) weighted by mixing fraction hyper-parameters as p(θ|ξ 1 , ξ 2 , , ξ 3 , ξ 4 , λ) = ξ 1 p(θ|Field, λ) + ξ 2 p(θ|YSC, λ) + ξ 3 p(θ|GC, λ) + ξ 4 p(θ|NSC, λ), (18) where ξ 1 , ξ 2 , ξ 3 and ξ 4 are the mixing fractions of the field (Field), young star cluster (YSC), globular cluster (GC) and nuclear star cluster (NSC) scenarios, defined so that ξ 1 + ξ 2 + ξ 3 + ξ 4 = 1. Based on this definition, the mixing fraction for each channel is the fraction of merger events associated with that specific channel. Figure 12 shows the posterior probability distribution of the mixing fractions for the four different channels and for a selection of our models. The results show large variations from one model to another. The strong fluctuations of ξ i from one model to another indicate that the mixing fractions are extremely sensitive to the hyper-parameters λ. Considering the large uncertainties on the astrophysical models, different assumptions about the 1g BBH mass function or other key parameters deeply affect the values of ξ i . However, there is one common feature: the results of GWTC-2 support the co-existence of multiple channels. In fact, the median value of the mixing fraction is significantly higher than zero for at least two of the four channels in each specific model. For example, the posterior distributions of the mixing fraction of the NSC, YSC and Field channels peak at values significantly larger than zero in the A5F05, LONG_DELAY, SHORT_DELAY, HIGH_SPIN, BROAD_VESC, and NARROW_VESC models. This result is in agreement with previous work [189][190][191][192].
Overall, the Field model seems to be associated with the higher values of the mixing fraction, with median values between ξ 1 ∼ 0.3 (HIGH_SPIN model) and ξ 1 ∼ 0.85 (HIGH_MASS model). The rates and the mass function are two key ingredients here: the predicted rates are higher for the Field than for the other channels in all our models and the mass function of Field BBHs has a preference for low values; the BBH population inferred from GWTC-2, after correcting for detection biases, is better represented by a mixture model in which Field binaries give a substantial contribution. The HIGH_SPIN case is the one that maximally "penalizes" the Field case, because of an excess of large positive values of χ eff in this channel. This is the main reason why YSCs and NSCs are expected to contribute significantly to the overall population in this specific model.
In most models, GCs are associated with low mixing fractions. This mostly happens because NSCs "work" better than GCs to explain the most massive events (like GW190521) and thus are preferred by our formalism. In the SMALL_M2 case, both GCs and NSCs are associated with low mixing fractions because they tend to predict too many massive primary BHs with very low mass ratios.
The details of these results might strongly depend on the star formation rate and metallicity evolution model, which can deeply change the merger rate [36,38,193,194]. We will investigate the impact of these quantities in a follow-up study.

Discussion of the main caveats
We presented a new model that can be used to rapidly simulate hierarchical mergers in different environments (NSCs, GCs and YSCs), exploring a broad parameter space (e.g. progenitor's metallicity, binary evolution parameters such as α and f MT , escape velocity from the parent star cluster, delay times and 1g spin distribution). The treatment of dynamical pairing of Ng BBHs is still approximate: we assume that the retained merger remnants find a new companion and merge over a timescale dN/dt ∝ t −1 . This is in agreement with the findings of [102], but could be improved with an analytic treatment of dynamical hardening [115]. Furthermore, we assume that BHs can only be ejected by relativistic kicks, i.e. neglect dynamical recoil via close encounters. Finally, we assume that the star cluster does not evolve with time: it has a constant escape velocity. As shown in previous work [195][196][197][198], the properties of the star cluster might significantly change with time and the growth of an IMBH is strongly linked to the evolution of the host star cluster. For example, if we assume constant cluster mass, the half-mass ratio is expected to grow as r h ∝ t 2/3 and the escape velocity to decrease with time as v esc ∝ t −1/3 [199]. These two effects might slow down or even suppress the growth of an IMBH in the late evolutionary stages [102].
In our fiducial model, we assume that the stellar binaries which give birth to 1g BHs are primordial binaries and are not ionized by dynamical interactions. This assumption is motivated by the properties of such binaries. A BBH merger progenitor has an initial binding energy where A is the initial semi-major axis. The typical kinetic energy of a star in a star cluster is where m is the average stellar mass in the cluster and σ SC is the velocity dispersion. In the example, we consider an extremely high velocity dispersion σ SC = 100 km s −1 . Hence, binaries that will produce BBH mergers are hard binaries even in the most extreme star clusters and should survive ionization. This assumption breaks in the immediate vicinity of a super-massive BH. For example, inside the influence radius of a supermassive BH with mass m BH = 10 6 M , the typical velocities are ∼ 120 km s −1 (a/0.01 pc) −1/2 (m BH /10 6 M ). In this extreme case, even BBHs and their stellar progenitors might be soft binaries and might be broken. On the other hand, dynamical hardening might also be very effective as the BBH gets closer to a supermassive BH by dynamical friction, allowing the BBH to avoid ionization and even speeding up its merger [77].
Here, we make no assumptions about the formation of NSCs. If some of them, if not all, are formed by the hierarchical assembly of GCs [169,170], this might have a crucial impact on the population of BBHs. In fact, the GCs might already be depleted of merger remnants (because of their relatively low escape velocity) before merging to build up a NSC. Moreover, we neglect the AGN disk formation channel [78][79][80][81]200]. Including the physics of AGN disks can boost the contribution of galactic nuclei to the total merger rate and to Ng mergers. AGN disk physics can further speed up the pairing and merger of our BHs. We will include the AGN disk scenario in future work.
Arca Sedda et al. (2020, [76]) found remnant masses only up to ∼ 200 M , significantly lower than the results presented here for most models. The main reason for this difference is that [76] fixed the escape velocity from NSCs to v esc = 100 km s −1 and did not change this parameter. Our results are consistent with other models (e.g. [102]), where higher values of v esc are explored. This result is remarkable when considering that [102] adopt a more accurate model for dynamical interactions than the one presented here. Hence, escape velocities are the key ingredient to understand the mass spectrum of BHs in NSCs.
Finally, we include a simple redshift dependence based on the f i (t) functions. Alternative redshift dependencies can be obtained by changing f i (t). For example, if we assume f NSC = 0.1 (constant with redshift), we obtain an upper limit to the merger rate density associated with NSCs, because they are unlikely to contribute to 10% of the overall cosmic star formation rate. Under such extreme assumption, the local BBH merger rate density from NSCs is R NSC ≈ 7 − 10 Gpc −3 yr −1 , i.e. approximately a factor of 10 higher than the models we presented in Figures 8 and 9.

Summary
Hierarchical mergers in dynamical environments can lead to the formation of BHs with mass higher than the limits imposed by pair instability, core-collapse supernovae and stellar evolution theory. Here, we presented a fast semi-analytic method to draw the main properties (masses, spins, merger rate) of hierarchical BBHs, while probing the relevant parameter space.
In our models, NSCs are the dominant environment for the formation of hierarchical BBHs. In our fiducial model (A5F05), primary BHs with mass up to ∼ 10 3 M can form in NSCs, while the maximum primary BH mass is ∼ 100 M for both GCs and YSCs.
We find that the mass distribution of 1g BBHs has a crucial impact on the mass distribution of Ng BHs with N > 1. The metallicity of the progenitor is a key ingredient to shape the distribution of Ng BBHs, because it affects both the number and the maximum mass of BBHs. The common envelope α parameter and the accretion efficiency f MT also play a role, with smaller values of α leading to higher merger rates and higher values of f MT leading to more top-heavy BH mass functions.
If BHs with mass in the pair instability gap are allowed to form by stellar mergers [56], the mass distribution of Ng BBHs is skewed toward significantly larger masses (HIGH_MASS model). Primary BH masses up to a few ×10 4 M can be obtained in NSCs if only Ng−1g mergers are allowed to take place, i.e. if we prevent the secondary BH from being a merger remnant itself (SMALL_M2 model). The main reason is that relativistic kicks are smaller if the mass ration q = m 2 /m 1 tends to zero.
The escape velocity of the parent star cluster (v esc ) is probably the most important parameter to set the maximum BH mass. If we assume that the distribution of escape velocities from NSCs is log 10 (v esc /km s −1 ) = 2 ± 0.3 (BROAD_VESC model), BHs with mass up to ∼ 10 6 M are allowed to form in the NSCs with the highest escape velocities. This result is consistent with [102] and [47].
While BBHs in GCs and YSCs do not exceed the 5th and the 3rd generation, respectively, we expect at least 10 different BBH generations in NSCs. This number grows up to a few thousands if Ng−1g BBHs are the only way to produce hierarchical mergers (SMALL_M2 model).
BHs in the pair instability mass gap and IMBHs can form via hierarchical mergers. Their fraction is strongly suppressed at high metallicity. At Z = 0.0002 and in our fiducial model, the fraction of BBH mergers with primary BH mass in the pair instability gap is f PISN ∼ 7 × 10 −3 , 3 × 10 −4 and 5 × 10 −6 in NSCs, GCs and YSCs, respectively. In our fiducial model, the fraction of BBH mergers with primary BH mass in the IMBH regime is f IMBH ∼ 5 × 10 −4 in NSCs, while we do not find any IMBH mergers in either GCs or YSCs. These fractions are significantly higher in the SMALL_M2 and HIGH_MASS models (Figure 7).
The local BBH merger rates in our models range from ∼ 10 to ∼ 60 Gpc −3 yr −1 , but Ng BBHs in NSCs only account for 10 −2 − 0.2 Gpc −3 yr −1 in our models. If we assume that 10% of all stars form in NSCs, we find a robust upper limit ∼ 7 − 10 Gpc −3 yr −1 for the local merger rate density of Ng BBHs in NSCs.
We compare our models against LIGO-Virgo data from the second gravitational wave transient catalog (GWTC-2, [9,10]), by estimating the mixing fractions of the four considered channels. Even if the mixing fractions are wildly affected by model hyper-parameters, our analysis suggests that more than one channel is needed to explain the observed population from GWTC-2. This result confirms that the BBHs observed by the LIGO-Virgo collaboration likely are a combination of several different channels and opens new perspectives for the study of BBH formation.  Data Availability Statement: The data underlying this article will be shared on reasonable request to the corresponding authors.

Acknowledgments:
We thank Eugenio Carretta for useful discussion and we thank the internal referee of the LIGO-Virgo collaboration, Fabio Antonini, for his suggestions, which helped us improve this work.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: