B Discrepancies Hold Their Ground

Abstract: This write-up aims at a comprehensive discussion of the status of the so-called B-anomalies, as well as their interpretation from an effective-theory point of view. The focus is on presenting facts and physics arguments using the bare minimum of equations and pointing instead to the relevant literature in each specific case.


Introduction
Certain meson decays have traditionally ranked among the very best probes of new effects beyond the Standard Model (SM), for several good reasons that are worth itemizing from the outset:
• First, many of these decays are forbidden "classically", i.e., they only arise at loop level. Since the mechanism responsible for this suppression does not seem to be a necessary ingredient of the ultimate theory of fundamental interactions, these decays allow probing new interactions occurring at the loop or even the tree level. As a matter of fact, the interaction scale that can thereby be probed is vastly above the Electroweak (EW) scale, e.g., it can reach $10^3$-$10^5$ TeV from K-meson mixing observables [1,2] (this wide range depends on whether or not one includes CP-violating quantities such as $\epsilon_K$, and on whether the new physics is tree-level mediated or else loop-mediated via strongly or weakly coupled new particles);
• It is also noteworthy that many of the decays that do not enjoy this loop suppression, i.e., that arise at the tree level, allow measuring to high precision many of the parameters that enter the SM predictions. This generically allows for a small parametric uncertainty on the loop-level decays;
• Besides, given the hierarchies between the EW scale and the b-quark mass, as well as between the latter and the masses of the light quarks u, d, and s, many of these decays are calculable within Effective-Field Theory (EFT) frameworks, with controlled errors, especially those scaling as powers of $G_F$, of $\Lambda_{\rm QCD}/m_b$, and of the QCD and QED couplings;
• The latter corrections are actually accompanied by logarithms, and since the underlying dynamics involves, as argued, vastly different scales, the products $\alpha_{s,\rm e.m.} \times \log$ need to be summed to all orders for a reliable prediction. Fortunately again, established tools of Renormalization-Group (RG)-improved perturbation theory, as well as systematic expansions along collinear directions, often allow consistently identifying and efficiently summing such effects;
• For the decays where an EFT expansion into local operators is possible, one can also take advantage of nonperturbative techniques, e.g., lattice QCD, to compute the necessary matrix elements of the quark-level operators between the external hadrons; these techniques can now often reach an accuracy of a few percent.
In short, these decays offer a vast zoo of exquisite probes of new effects, because the corresponding SM predictions are often very precise for the above set of reasons.
Incidentally, the decaying hadrons can be produced with huge statistics. In fact, their masses are well below the typical energies that can now be attained at hadron colliders; alternatively, at lepton colliders, one can resonantly produce pairs of such hadrons by tuning the collision energy appropriately. Factoring in the advanced detectors and Monte Carlo tools currently available, one ends up with very accurately measured observables.
The above arguments speak for a field that is very vast and still very topical, in spite of its decades-old tradition. The build-up of a set of discrepancies, or "anomalies", in a coherent ensemble of these decays makes the field even more interesting. This brief review aims at discussing the present status of these anomalies. The focus is on the main underlying messages. In this respect, equations have been reduced to the bare minimum. For them, as well as for the quantitative results, the reader is referred to the original papers quoted.

Phenomenology of Leptonic b → s Modes
Purely leptonic $B_{d,s} \to \ell^+\ell^-$ decays provide especially deep probes of the SM mechanism of flavor mixing. Within the SM, such transitions are Flavor-Changing Neutral Currents (FCNCs) and are helicity suppressed. These two features imply a double suppression mechanism, resulting in extremely small decay rates. Besides, the purely leptonic final state makes these decays theoretically very clean. As a result, these modes provide formidable probes of physics beyond the SM, in particular of nonstandard Higgs sectors and, recently, of models with leptoquarks. Although experimental searches exist for all six of the $B_{d,s} \to \ell^+\ell^-$ modes [3,4] (plus the LFV ones, with $\ell \neq \ell'$, on which we will comment separately), the only established mode is $B_s \to \mu\mu$. The first evidence was provided by the LHCb experiment in 2012 [5], and the first observation was the result of a CMS and LHCb joint analysis [6]. State-of-the-art measurements at the LHC, using about half of their full Run-1 and -2 statistics, are provided in [7-9], and combined in [10]. Very recently, LHCb published an updated analysis performed with the full Run-2 dataset [11,12].
On the theory side, the accuracy of the SM prediction for $B_s \to \mu\mu$ relies on three pillars [13]:

• The evaluation of the "nonradiative" branching fraction $\mathcal{B}^{(0)}$, i.e., the branching fraction in the absence of soft or collinear QED corrections. The current accuracy of the "matching conditions" is NLO in the electroweak coupling and NNLO in the strong coupling [14-16]. Notably, such accuracy makes the dependence on the renormalization scheme for the electroweak parameters negligible;
• The subtraction of soft-photon radiation and the inclusion of collinear radiation. The purely nonradiative mode is a theoretical quantity, whose width vanishes in the full theory with $\alpha_e \neq 0$. The procedure to take soft-photon radiation into account is well known [17,18]. One defines the branching ratio inclusive of an arbitrary number of undetected photons $\gamma_i$ such that $\sum_i E_{\gamma_i} \leq \Delta E$, with $E_{\gamma_i}$ the $\gamma_i$ energy in the decaying-$B$ rest frame and $\Delta E$ a cutoff. This yields:
$$\mathcal{B}(B_s^0 \to \mu\mu + n\gamma)\big|_{\sum_i E_{\gamma_i} \leq \Delta E} = \omega(\Delta E)\, \mathcal{B}^{(0)} , \qquad (1)$$
where $\omega(\Delta E)$ is a multiplicative correction, which tends to unity as $\Delta E \to m_{B_s}/2$, i.e., its kinematical endpoint. The experiment accesses the l.h.s. of Equation (1), and the $\omega$ correction is estimated through a Monte Carlo, typically PHOTOS [19], so that $B_s^0 \to \mu\mu$ measurements are directly comparable with $\mathcal{B}^{(0)}$. More subtle is the inclusion (in $\mathcal{B}^{(0)}$) of photons of arbitrary energy (within their kinematic endpoint), but collinear to a final-state lepton, so that $\ell + \gamma$ is indistinguishable from $\ell$ alone. Such an effect has been calculated in Soft Collinear Effective Theory (SCET) [20,21] and shows how, for such photons, the $B_s^0 \to \mu\mu$ decay really merges into $B_s^0 \to \mu\mu\gamma$: the SCET calculation yields corrections proportional to $C_9$ and $C_7$ that lift the chiral suppression inherent in the $C_{10}$ contribution (this is due to the fact that the energetic photon delocalizes the initial-state light quark (not the $b$ quark!) participating in the weak-transition operator by a distance $1/\sqrt{\Lambda_{\rm QCD}\, m_B}$). However, the $C_7$ and $C_9$ corrections accidentally cancel each other. The seminal result in [20] should be extended to the semileptonic $B$ decay modes discussed next, and only such an endeavor would allow conclusively claiming that there are no unaccounted-for collinear-log corrections in these modes;
• The time dependence of the initial state. At hadron colliders, one measures the time-integrated sum of the $B_s^0(t)$ and $\bar{B}_s^0(t)$ decays. $B_s$-$\bar{B}_s$ oscillation effects make this quantity different from the decay rate computed at the initial time, because of the large width difference between the two $B_s$-system mass eigenstates [22-24]. This correction may, in principle, be affected by new physics [25,26].
All of the above points are addressed in the state-of-the-art theory prediction for $B_s \to \mu\mu$ [14,20,21].
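As an illustration of the third pillar, the relation between the time-integrated branching fraction and the one at the initial time can be sketched numerically. The snippet below assumes a commonly used parameterization, of the kind discussed in [22-24], in terms of $y_s \equiv \Delta\Gamma_s/(2\Gamma_s)$ and of the mass-eigenstate rate asymmetry $A_{\Delta\Gamma}$. Neither symbol appears in the text above, the function name is hypothetical, and the numerical value of $y_s$ is a placeholder chosen for illustration only.

```python
def time_integrated_factor(y_s: float, a_delta_gamma: float = 1.0) -> float:
    """Ratio of the time-integrated branching fraction to the one at t = 0.

    Assumed parameterization: (1 + A_DeltaGamma * y_s) / (1 - y_s**2),
    with y_s = DeltaGamma_s / (2 * Gamma_s) and -1 <= A_DeltaGamma <= 1.
    """
    if not abs(y_s) < 1.0:
        raise ValueError("y_s must satisfy |y_s| < 1")
    if not -1.0 <= a_delta_gamma <= 1.0:
        raise ValueError("A_DeltaGamma must lie in [-1, 1]")
    return (1.0 + a_delta_gamma * y_s) / (1.0 - y_s ** 2)


# Placeholder input, for illustration only (not a fitted value).
Y_S_EXAMPLE = 0.06

if __name__ == "__main__":
    # A_DeltaGamma = +1 is the value usually quoted for the SM.
    print(time_integrated_factor(Y_S_EXAMPLE, 1.0))
    # Other values of A_DeltaGamma, possible with new physics, change the factor.
    print(time_integrated_factor(Y_S_EXAMPLE, 0.0))
```

With $A_{\Delta\Gamma} = +1$ the factor reduces to $1/(1 - y_s)$, so that under this assumption the time-integrated rate exceeds the initial-time one by roughly $y_s$.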

First B Anomaly: Branching Ratio Data
Semileptonic $b \to s$ modes are decays of the type $B \to M\ell^+\ell^-$, with $M$ either a pseudoscalar or a vector meson with strangeness. The presence of $M$ lifts the chiral suppression inherent in the purely leptonic modes, so that semileptonic branching ratios are "less rare", typically $O(10^{-7})$, than leptonic ones. The presence of $M$ also brings a far richer phenomenology, e.g., in the number of independent observables. In fact, the three-body decay makes the final states not monochromatic (in the $B$ rest frame), unlike the case of leptonic decays; besides, if $M$ is a vector, the decay is effectively four-body, with, e.g., $K^* \to K\pi$.
One important difference across the experiments that have measured these modes is the accuracy for $\ell = e$ vs. $\ell = \mu$. At lepton colliders, the two datasets come with comparable accuracy, because efficiencies are comparable for muons and electrons (and results typically combine electronic and muonic data). At hadron colliders, signal electrons are to be fished out of an environment much richer in electrons and photons (and the initial state is also much less known than at lepton colliders). This is less of a problem with muons; hence, at hadron colliders one has to set different $p_T$ thresholds for electrons than for muons, in particular higher ones for the former; see [42] for a recent discussion. Up to Run-2, such a $p_T$ cut was performed at the hardware-trigger level. Starting from Run-3, one would replace such a cut with one on the impact parameter, performed at the software level, after reconstructing the tracks (F. Dettori, private exchanges). Needless to say, this will allow for much greater flexibility, and it is hoped that electron efficiencies will improve to a figure much closer to muon efficiencies [43,44].
Inspection of the above data shows that the SM prediction [45-48] is, in all channels, higher than the respective measurement, at least in the dilepton mass-squared region below the narrow charmonium resonances. A representative example is the channel $B_s^0 \to \phi\mu^+\mu^-$ [37,49], which displays a 3.6σ tension with the SM. Note that this tension is present in dimuon data, but not in dielectron ones, although the latter come with larger errors. This tension in branching ratio $b \to s\mu\mu$ data constitutes the most longstanding "B anomaly". In spite of the impressively coherent "data < SM" pattern, it is difficult to take this B anomaly alone very seriously, because branching ratio predictions are quadratically dependent on form factors, whose estimation is marred by certain inherent limitations, to be discussed next.
The first step towards calculating semileptonic decay amplitudes is to factorize them into local hadronic times leptonic matrix elements; this is possible modulo subleading QED corrections and barring nonlocal contributions, which we will comment upon below. A systematic discussion, also including beyond-SM operators, was presented in, e.g., [50]: one writes the amplitude in terms of leptonic bilinears times matrix elements of the form $\langle M(p_M)|\Gamma_i|B(p_B)\rangle$, with $\Gamma_i$ a quark bilinear (possible Lorentz indices are understood). These matrix elements can then be expressed as sums of "form factors" times all the possible Lorentz structures that are appropriate to the problem, with as many Lorentz indices as $\Gamma_i$ carries. The form factors are functions of $q^2 \equiv (p_B - p_M)^2$ and are real-valued away from kinematic regions where intermediate states with the same invariant mass can be produced. Form factors are inherently nonperturbative objects. For $q^2$ close to the endpoint, they can often be calculated within lattice QCD; conversely, for "small" $q^2$, which corresponds to large energies $E_M$ of the final-state meson in the decaying-$B$ rest frame, one can perform a double expansion, in $1/m_B$ and $1/E_M$, and use QCD factorization to access ratios of form factors [51]. This technique exploits the fact that in the simultaneous large-$m_B$ and large-$E_M$ limit, the dynamics enjoys a larger symmetry; the form factor ratios thus obtained are corrected by $\Lambda/m_B$ and $\Lambda/E_M$ effects, where typically $\Lambda \sim \Lambda_{\rm QCD}$. This technique has also been used to work out "optimized" observables, namely observables with reduced form factor sensitivity [52]. A similar approach can also be applied for small $E_M$, where one exploits heavy-quark symmetry alone [50,53], and in the full kinematic range [54]. Concretely, these observables are designed so that, in the strict large-$m_b$ limit, the form factor dependence drops out. However, the argument is valid up to the above-mentioned power corrections, which then represent the inherent theory error; numerically, this can be as large as 20%.
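The statement that small $q^2$ corresponds to large $E_M$ follows from two-body kinematics in the $B$ rest frame, $E_M = (m_B^2 + m_M^2 - q^2)/(2 m_B)$. A minimal sketch is given below; the meson masses are approximate values inserted by hand as illustrative inputs (they are not taken from the text), and the function names are hypothetical.

```python
# Approximate masses in GeV, illustrative inputs only.
M_B = 5.280      # B0 meson
M_KSTAR = 0.896  # K*(892)0 meson


def q2_max(m_b: float, m_m: float) -> float:
    """Upper kinematic endpoint of q^2 for B -> M l l (zero-recoil point)."""
    return (m_b - m_m) ** 2


def meson_energy(q2: float, m_b: float, m_m: float) -> float:
    """Energy of the final-state meson M in the B rest frame, in GeV."""
    if q2 < 0.0 or q2 > q2_max(m_b, m_m):
        raise ValueError("q2 outside the range [0, (m_B - m_M)^2]")
    return (m_b ** 2 + m_m ** 2 - q2) / (2.0 * m_b)


if __name__ == "__main__":
    # Scan from low q^2 (energetic meson) to the endpoint (meson at rest).
    for q2 in (0.0, 1.0, 6.0, 15.0, q2_max(M_B, M_KSTAR)):
        print(f"q2 = {q2:6.2f} GeV^2 -> E_M = {meson_energy(q2, M_B, M_KSTAR):.3f} GeV")
```

At the upper endpoint the meson is at rest, $E_M = m_M$, which is the regime suited to lattice QCD; towards $q^2 \to 0$ the meson energy approaches $m_B/2$, where the large-$E_M$ expansion applies.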
A uniquely simple example is the case where $M$ is a pseudoscalar meson. In this case, for $\Gamma_i$ a vector bilinear, one has two form factors, because there are two independent Lorentz vectors that can be constructed: $(p_B \pm p_M)^\mu$. The tensor bilinear adds one further structure, whereas any other bilinear vanishes due to Lorentz symmetry or parity. The nonzero form factors have been calculated for $B \to$ light meson decays, including $B_{d,s} \to K$ and $B_d \to \pi$, in lattice QCD [55-63] and also from light cone sum rules [64,65].
For $M$ a vector meson, things become more complicated in two respects: first, the number of possible Lorentz structures and nonzero form factors is larger; second, a vector meson is a resonance, and (in principle) finite-width effects need to be taken into account. A "state-of-the-art" study was provided by [66]. A more standard approach is to adopt the so-called narrow-width approximation, e.g., [67]. Interestingly, for the cases $B \to K^*$ and $B_s \to \phi$, of special interest in the context of B discrepancies, all local form factors are available, both from lattice QCD [46,48,68-71] and from light cone sum rules [45,65,72]. Similarly to the case of a pseudoscalar $M$, the two approaches hold for large and small $q^2$, respectively, the region in between being that of narrow charmonium. The two approaches are mutually compatible within uncertainties, as one can see by suitable extrapolations.
So far in this discussion, we have been confining ourselves to local matrix elements. In certain kinematic regions, further, nonlocal structures become relevant, namely T-products between the e.m. current and either the four-quark operators $O_{1,2}$, or the so-called QCD penguins, or the chromomagnetic-dipole operator, each with flavor indices appropriate for the external $B$ and $M$ mesons.
Formally, these matrix elements are expressed similarly to the local ones, namely as products of "form factors", nonlocal in this case, times Lorentz structures. Such T-products are nonlocal by construction and, thereby, much more challenging to estimate than the local contributions. Estimates exist within approaches based on a local OPE or on a Light-Cone OPE (LCOPE).
The local OPE will be valid only to the extent that all components of the spacetime separation between the two operators in the T-product are much smaller than $1/\Lambda_{\rm QCD}$. This holds for large $q^2$ around the endpoint. One then performs a matching onto local operators [73,74]; the leading-power $O(\alpha_s)$ matrix elements of the dimension-three operators were calculated in [75]. The underlying framework is that of QCD factorization as established in [76,77], which assumes a large $E_M$ as also stated above. The integrated OPE prediction can then be related to the measurements, integrated over large $q^2$. This relation rests on the assumption of quark-hadron duality, although the effects of so-called duality violation imply possibly large and not clearly quantifiable systematic uncertainties; see, e.g., [78].
As regards the approach based on the LCOPE, a benchmark calculation is [79], where the authors calculated the leading contributions from four-quark operators and estimated the subleading ones, which are suppressed by $q^2 - 4m_c^2$. The resulting "charm-loop effect" can be as large as 20% for $B \to K^*$, largely because of the soft-gluon contribution. This result has motivated several phenomenological analyses [80-83] aimed at assessing to what extent the discrepancies observed at low $q^2$ may be explained away through this effect. The calculation in [79] was recently reappraised in Ref. [72], which reported agreement on the operator basis in the LCOPE at next-to-leading power, but noted that the light cone sum rule (LCSR) setup in [79] implied an incomplete basis of Lorentz structures. This per se implies a correction in [72] smaller by a factor of 10, with a further factor of 10 due to updated parametric input, yielding all in all a correction smaller by two orders of magnitude. This result is reassuring as regards the possible role of charm-loop effects in explaining away the "B anomalies". Yet, the "oversensitivity" of the calculation to the parametric input is somewhat unsettling. Calculations such as [72,79] are crucial to disentangle SM nonperturbative systematics from genuine short-distance information. This subject is therefore an active one, and several groups are striving to qualitatively improve the underlying approach towards estimating these nonlocal matrix elements. The main idea [79,81,84] is to calculate the genuinely nonlocal parts, i.e., the form factors, at space-like $q^2$ values, which are free from branch cuts and would allow a faster convergence of the LCOPE expansion. The "boundary conditions" thus calculated would then be related to the physical-region form factors through analytic continuation. On this subject, see [72,78,79,81,84-89].

Second B Anomaly: Differential Data and Analyses
Semileptonic decays with a final-state pseudoscalar or vector meson are three- or effectively four-body decays. Statistics permitting, one can measure the fully differential distributions, which offer a much wider set of observables than the branching fraction alone. The simplest example is $B \to K\ell^+\ell^-$, whose rate may be differentiated w.r.t. $q^2$ and an angle, usually the angle $\theta_\ell$ between $\ell^+$ and $K$ in the dilepton rest frame. The resulting doubly differential distribution is a quadratic polynomial in $\cos\theta_\ell$, and two of the three coefficients represent nontrivial additional observables w.r.t. the branching ratio, expressible in terms of the Wilson coefficients of the leading, dimension-six, effective Hamiltonian [50].
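To make the counting of observables concrete, the sketch below uses one common way of writing the normalized quadratic polynomial, in terms of a forward-backward asymmetry $A_{FB}$ and a "flat term" $F_H$. This specific parameterization and the symbol $F_H$ are assumptions borrowed from the literature rather than from the text above, and the input values are arbitrary placeholders. The code checks numerically that the distribution integrates to one and that the forward-minus-backward integral returns $A_{FB}$.

```python
from typing import Callable


def angular_pdf(cos_theta: float, a_fb: float, f_h: float) -> float:
    """Normalized distribution in cos(theta_l), assumed parameterization."""
    return 0.75 * (1.0 - f_h) * (1.0 - cos_theta ** 2) + 0.5 * f_h + a_fb * cos_theta


def simpson(f: Callable[[float], float], lo: float, hi: float, n: int = 200) -> float:
    """Composite Simpson rule with an even number n of sub-intervals."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0


def normalization(a_fb: float, f_h: float) -> float:
    """Integral of the distribution over the full range of cos(theta_l)."""
    return simpson(lambda c: angular_pdf(c, a_fb, f_h), -1.0, 1.0)


def forward_backward_asymmetry(a_fb: float, f_h: float) -> float:
    """Forward minus backward integral, which isolates the linear coefficient."""
    forward = simpson(lambda c: angular_pdf(c, a_fb, f_h), 0.0, 1.0)
    backward = simpson(lambda c: angular_pdf(c, a_fb, f_h), -1.0, 0.0)
    return forward - backward


if __name__ == "__main__":
    # Arbitrary placeholder values, for illustration only.
    print(normalization(0.1, 0.2), forward_backward_asymmetry(0.1, 0.2))
```

Once the overall normalization is fixed by the rate, the two remaining coefficients ($A_{FB}$ and $F_H$ in this parameterization) are the two additional observables mentioned above.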
The case of $B \to V\ell^+\ell^-$ is more involved, because $V$ is a massive spin-1 particle, implying three physical polarizations, and because it is unstable, implying an effective four-body decay. With, e.g., $B = B_d^0$ and $V = K^{*0}$ (on which we will focus in the following for definiteness, so that the angle conventions are also unambiguous), one has $K^{*0} \to K^+\pi^-$. Note that $K\pi$ includes a spin-0 ("S-wave") component, which represents an irreducible background to be carefully subtracted. The signal decay thus has 3 (polarizations) × 2 (leptonic-current chiralities) physical amplitudes, and the S-wave contribution introduces 1 × 2 additional ones (for their implications in terms of contributions from pure S-wave and S-wave/P-wave interference, see [90]).
The differential decay rate can be described in terms of $q^2$ plus three angles. To these quantities one has to add $m_{K\pi}$ to the extent that the finite width of the $K^*$ is taken into account. Different conventions exist for the three angles; see, e.g., [67,91]. One common convention is that of [92], where the three angles are $\theta_\ell$, the angle between $\ell^+$ and the negative $B_d^0$ direction in the dilepton rest frame; $\theta_K$, the angle between the $K^+$ momentum and the negative $B_d^0$ one, in the $K^{*0}$ rest frame; and $\phi$, the angle between the dilepton decay plane and the $K^+\pi^-$ decay plane in the initial-state rest frame. A dictionary across the different angle conventions can be found in [93].
One ends up with a four-fold differential distribution, expressed as a sum of products of angular functions times "angular coefficients", usually denoted as $I_i(q^2)$ for the $\bar{B}_d^0$ and as $\bar{I}_i(q^2)$ for the $B_d^0$ decay. The $I_i$ and $\bar{I}_i$ are bilinear combinations of the six amplitudes mentioned above and are usefully rearranged into CP-even and CP-odd angular observables $S_i \propto I_i + \bar{I}_i$ and $A_i \propto I_i - \bar{I}_i$. Certain combinations of these observables are more easily accessible, and/or they come with an intuitive physical interpretation, e.g., the forward-backward asymmetry $A_{FB}$ or the different polarization fractions of the vector meson. Besides, and importantly, one can construct ratios of angular observables such that the dependence on hadronic form factors exactly cancels in the $m_b \to \infty$ limit, e.g., the well-known $P_i^{(\prime)}$ observables for low $q^2$ [52,54,94,95] and the $H_T^{(i)}$ observables for large $q^2$ [50,53]. We note that these angular observables generally require identifying the flavor of the parent $B$ meson ("flavor tagging"), which is unambiguous in the above example of $K^{*0}$, but is not straightforward in other decay modes, e.g., $B_s^0 \to \phi\ell^+\ell^-$. Clearly, the latter case would require a time-dependent analysis to disentangle the $B_s^0$ from the $\bar{B}_s^0$ components [24].
The angular distributions discussed in this section have been studied at BaBar [96,97], Belle [30,98], CDF [99], ATLAS [100], CMS [32,101-103], and LHCb [38,104-106]. The extracted angular observables are generally found to be in good agreement with the SM. However, the angular observable $P_5'$ as measured by LHCb [106,107] disagrees with the SM in two low-$q^2$ bins ($[4, 6]$ GeV$^2$ and $[6, 8]$ GeV$^2$), with local significances around 2.5-3σ. Taking into account all the other angular observables, one ends up with a global significance slightly above 3σ [106]. This is the second "B anomaly".
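The rearrangement of angular coefficients into CP-averaged and CP-asymmetric observables, and the construction of a form-factor-optimized ratio, can be sketched as follows. The normalization to the CP-summed rate and the relation $P_5' = S_5/\sqrt{F_L(1 - F_L)}$, with $F_L$ the longitudinal polarization fraction, are conventions commonly used in the literature and are assumed here, since the text above only gives proportionalities. All function names are hypothetical and the inputs are arbitrary placeholders.

```python
import math


def cp_average(i_coeff: float, i_bar_coeff: float, rate_sum: float) -> float:
    """CP-averaged observable S_i = (I_i + Ibar_i) / (CP-summed rate)."""
    if rate_sum <= 0.0:
        raise ValueError("the CP-summed rate must be positive")
    return (i_coeff + i_bar_coeff) / rate_sum


def cp_asymmetry(i_coeff: float, i_bar_coeff: float, rate_sum: float) -> float:
    """CP asymmetry A_i = (I_i - Ibar_i) / (CP-summed rate)."""
    if rate_sum <= 0.0:
        raise ValueError("the CP-summed rate must be positive")
    return (i_coeff - i_bar_coeff) / rate_sum


def p5_prime(s5: float, f_l: float) -> float:
    """Optimized observable P5' built from S_5 and F_L (assumed convention)."""
    if not 0.0 < f_l < 1.0:
        raise ValueError("F_L must lie strictly between 0 and 1")
    return s5 / math.sqrt(f_l * (1.0 - f_l))


if __name__ == "__main__":
    # Arbitrary placeholder inputs, for illustration only.
    s5 = cp_average(-0.15, -0.25, 2.0)
    print(s5, p5_prime(s5, 0.5))
```

The division by $\sqrt{F_L(1 - F_L)}$ is what removes the leading form-factor dependence in this convention; the residual dependence is the power-correction uncertainty discussed earlier.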
The above subject is evolving rapidly, in the first place experimentally. Important updates including the full Run-1 and Run-2 datasets are expected soon from the LHC experiments. Besides, LHCb released a first analysis that is unbinned in $q^2$ [87], and more involved cases are being proposed [88,108-110].
One avenue of improvement possibly addressing the issue of theoretical errors consists of measuring the differences of angular observables between two leptonic channels, e.g., the quantities $Q_i \equiv P_i^{\prime\,\mu} - P_i^{\prime\,e}$ [111-113]. Interestingly, Belle has published data allowing the first such test for $i = 4, 5$ [98], in the low-$q^2$ region relevant for the anomalies. The analysis finds $P_5^{\prime\,\mu}$ to be 2.6σ below the SM and $P_5^{\prime\,e}$ closer to the SM prediction (1.3σ). Once again, the pattern is coherent with the rest of the anomalies.
Further measurements of the $Q_i$ observables will arguably be feasible at Belle II. Aside from providing crucial cross-checks from an independent experimental setup, Belle II will take advantage of similar efficiencies for dimuon and dielectron modes, as already discussed. This will also offer additional tests of lepton universality violation, i.e., ratio observables, to be discussed in more detail next.

Third B Anomaly: Lepton-Universality-Violating Ratios
These kinds of tests are represented by ratios of the branching ratios discussed in Section 3.1, the only difference being the lepton flavor in the numerator vs. denominator. In particular, the dilepton invariant mass range is the same between the numerator and denominator.
These ratios are by construction tests of one near-symmetry of the SM, lepton universality. As is well known, within the SM, leptons couple universally to gauge interactions; hence, the only nonuniversal dynamics comes from Yukawa interactions and is proportional to the mass of the concerned lepton. In ratios such as:
$$R_M[q^2_{\min}, q^2_{\max}] \equiv \frac{\int_{q^2_{\min}}^{q^2_{\max}} dq^2\, \frac{d\mathcal{B}(B \to M\mu^+\mu^-)}{dq^2}}{\int_{q^2_{\min}}^{q^2_{\max}} dq^2\, \frac{d\mathcal{B}(B \to M e^+e^-)}{dq^2}} , \qquad (2)$$
such effects are minuscule [114,115] (for small enough $q^2_{\min}$, one gets close to the lower endpoint in the muon channel and there is LUV by lack of phase space). QED effects may also lead to Lepton Universality Violation (LUV), notably through corrections due to collinear photons of arbitrary energy (within the kinematic limit), yielding corrections $\propto \alpha_e \log(m_\ell/\Delta)$, with $\Delta$ denoting any other scale in the problem. Note that this may include inherently physical scales such as $m_B$ or $\Lambda_{\rm QCD}$, as well as scales induced by the definition of the actual observable, including $q^2_{\min,\max}$. The effects of photons with energy well below $\Lambda_{\rm QCD}$, i.e., unable to resolve the internal structure of the external mesons, were discussed in [116,117] in the framework of a point-like meson Lagrangian and compared with those from PHOTOS [19].
$R_M$ ratios as in Equation (2) are advantageous from both an experimental and a theoretical point of view. In fact, most of the form-factor-induced uncertainties that mar branching ratios (see the discussion in Section 3.1) cancel to a large extent in such ratios; experimentally, these ratios imply ratios of efficiencies, so that many sources of systematics cancel. The ratios $R_K$ and $R_{K^*}$ have been measured at BaBar [118], Belle [119,120], and LHCb [121-123]. It is worth quoting explicitly the latest LHCb result [123]: $R_K = 0.846^{+0.044}_{-0.041}$, as measured in the dilepton mass-squared region $q^2 \in [1.1, 6.0]$ GeV$^2$, departing from unity at 3.1σ, thereby providing evidence for LUV. It is encouraging that, with respect to the previous measurement, the statistics went from 5/fb to 9/fb; the statistical error (the dominant component of the total) went down accordingly (by a factor of 1.3); and the significance went up accordingly (from 2.5σ to 3.1σ) (here, we are referring to the significance of $R_K$ alone, not of the combination of all $b \to s\mu\mu$ data). Including the analogous discrepancy reported in $R_{K^*}$ (using about half of the full Run-1 and -2 statistics, LHCb has recently reported the analogue of $R_K$ for the hyperon channel $\Lambda_b \to pK\ell^+\ell^-$ [124]), these measurements represent the third "B anomaly", probably the most representative one, given the theoretical cleanliness of the $R_M$ ratios.
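The quoted factor of 1.3 is what one expects if the statistical error scales as the inverse square root of the integrated luminosity. The check below assumes exactly this naive scaling (same efficiencies and running conditions across datasets), which is a simplification; the function name is hypothetical.

```python
import math


def stat_error_reduction(lumi_old: float, lumi_new: float) -> float:
    """Factor by which a purely statistical error shrinks, assuming 1/sqrt(L) scaling."""
    if lumi_old <= 0.0 or lumi_new <= 0.0:
        raise ValueError("integrated luminosities must be positive")
    return math.sqrt(lumi_new / lumi_old)


if __name__ == "__main__":
    # Luminosities in 1/fb, as quoted in the text: 5/fb -> 9/fb.
    print(round(stat_error_reduction(5.0, 9.0), 2))  # prints 1.34
```

The same scaling gives only a rough guide for future datasets, since trigger and reconstruction changes are expected to affect the electron efficiencies, as discussed above.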
Because of the reasons discussed above, most of the uncertainty in the $R_M$ ratios is statistical. Projected uncertainties on $R_K$ are about 2% with the 50/fb of the LHCb upgrade; below 1% for the data sample expected by Upgrade II [125]; and 3.6% at Belle II with 50/ab (i.e., the full data sample) [126]. Assuming that the LHCb discrepancy stays, the Belle II measurement will be an important validation, also because Belle II has similar acceptances for electrons and muons, and is thereby naturally suited for LUV tests, as already discussed.
Before concluding this part, it is worthwhile to pause on the challenges inherent in $R_M$-ratio measurements at hadron colliders, in particular the already mentioned differences in electron vs. muon efficiencies. Clearly, in a measurement of a dimuon-to-dielectron ratio, one basic reconstruction challenge is related to the fact that electrons emit much more bremsstrahlung than muons, and the reconstructed momentum is the momentum after emission, not the "true" leptonic momentum. This problem has been well known since the initial $R_K$ measurement and has been the subject of enormous internal scrutiny before $R_K$ updates were released. The numerous tests performed suggest that the effect is understood.
Two crucial examples of these tests are: (1) $R_K$ in the control region where the dilepton is emitted by $c\bar{c}$ resonances (the $J/\psi$ and the $\psi(2S)$) and (2) electron efficiencies $\epsilon_e$ in these control regions vs. in the signal region. Specifically, the $\epsilon_e$ are calibrated in the $J/\psi \to ee$ region and extrapolated to the signal region. However, the kinematic properties of electrons in these two regions are very similar (see Figure 10, right, in [123]), and $R_{J/\psi}$ obtained for electrons with kinematics in either of these two regions is well compatible with unity (see Figure 10, left, in [123]). Finally, note that in none of the 16 bins is $R_{J/\psi}$ anywhere close to 0.8, the central-value figure of the last $R_K$ measurement [123].
Besides the above considerations, and maybe most importantly, all electronic data, notably all BRs to dielectrons, are SM-like, whereas it is muonic data that are not SM-like, in particular all BRs to dimuons. Hence, dismissing the $R_M$ measurements on the grounds of possible uncontrolled systematics in electrons is not straightforward, because it is unclear how such systematics would not manifest itself in $\mathcal{B}(b \to see)$ data, which are SM-like, and instead result in $\mathcal{B}(b \to s\mu\mu)$ below the SM predictions, in basically all the main channels. In other words, ratios $\mathcal{B}(b \to s\mu^+\mu^-)/\mathcal{B}(b \to se^+e^-)$ would be suspicious if the denominator were above the SM prediction, but instead, it is the numerator that is below the SM, and not only in $B \to K$, but in basically all other measured channels, including hyperon modes [33-41,124].

Lepton Flavor Violation
To summarize the discussion so far, and pending further experimental scrutiny, the three B anomalies discussed in Sections 3.1-3.3 suggest new dynamics that couples differently to the different lepton families, i.e., LUV. Besides, the fact that the discrepant measurements are all and only those to dimuons, whereas modes to dielectrons are SM-like within errors and modes to ditaus do not yet pose strong constraints, points to new physics hierarchically coupled to the different lepton generations [127].
In the presence of LUV dynamics, one generally expects Lepton-Flavor-Violating (LFV) dynamics as well. (Here, "generally" means that one can of course construct models that forbid nonstandard LFV and are concurrently able to explain $R_K$. However, this requires additional assumptions, i.e., a dynamical or a symmetry mechanism preventing LFV while allowing LUV. One avenue in this respect is to extend the peculiar SM lepton-flavor symmetries to the new model; see, e.g., [128,129].) The question is whether a general argument exists as to why a measurable LUV effect would imply likewise measurable LFV effects. This connection was discussed in [127], and certain aspects were further developed through examples in [130,131]. One may take as a starting point the observation that all $b \to s$ data are explained at one stroke by a four-fermion operator composed of a Left-Handed (LH) quark current times an LH lepton current, with a lepton-family-dependent Wilson coefficient, larger for $\mu^+\mu^-$ than for $e^+e^-$ [132]. Such a pattern suggests a purely third-generation interaction of this LH × LH form, generated at a scale larger than the EWSB scale. Therefore, the fields in this interaction are not in the mass eigenbasis, and the unitary redefinitions leading to this basis will in general misalign the initial interaction across generations and yield LUV, but also LFV. In particular, one can parametrically relate the measured LUV (through $R_K$) to measurable LFV decays such as $B \to K\tau\mu$. Plugging in the numbers, one sees that LFV BRs are generically expected around $10^{-8}$ [127]. As discussed in that paper and made more explicit in [131], one can understand this order of magnitude as $\mathcal{B}(B \to K\mu\mu) \approx 4 \times 10^{-7}$, times the departure of $R_K$ from unity, squared, times the ratios of the products of the above-mentioned unitary rotations that lead to the mass eigenbasis.
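The order-of-magnitude argument in the last sentence can be turned into a back-of-the-envelope number. The sketch below simply multiplies the three factors named in the text; the ratio of mixing-matrix entries is model dependent and is set to one here purely as a placeholder assumption, the value of $R_K$ is the LHCb central value quoted earlier, and the function name is hypothetical. Possible O(1) prefactors in the actual formulas of [127,131] are not included.

```python
def lfv_branching_estimate(
    br_kmumu: float = 4e-7,     # B(B -> K mu mu), as quoted in the text
    r_k: float = 0.846,         # R_K central value quoted in the text
    mixing_ratio: float = 1.0,  # placeholder for the ratio of unitary rotations
) -> float:
    """Rough scaling: B(B -> K mu mu) * (1 - R_K)^2 * (ratio of mixing entries)."""
    if br_kmumu < 0.0 or mixing_ratio < 0.0:
        raise ValueError("branching ratio and mixing ratio must be non-negative")
    return br_kmumu * (1.0 - r_k) ** 2 * mixing_ratio


if __name__ == "__main__":
    # With the placeholder mixing ratio, the estimate lands near 1e-8.
    print(f"{lfv_branching_estimate():.2e}")
```

The result is only as meaningful as the assumed mixing ratio: a hierarchical mixing pattern could move the estimate by orders of magnitude in either direction.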
Certain LFV decays represent strong constraints. In fact, if one starts with the mentioned effective interaction, but properly $SU(2)_L$-symmetrized [133], closes the quark loop, and attaches a gauge boson decaying to two further leptons, one obtains LFV effects in the decays of leptons, e.g., $\tau \to 3\mu$ [134,135].
Following the above line of argument, extensive literature has further explored the topic of LFV in semileptonic B decays, from both theoretical and experimental viewpoints, e.g., an extensive LFV phenomenology was discussed in [136][137][138][139][140] for B and even K decays. Interestingly, a detailed program of experimental searches has likewise blossomed. Recent searches at LHCb include [141,142], and more are expected to come, since several aspects of the underlying analyses are completely analogous to the lepton-flavor-conserving modes (Section 3.1).

$B_s \to \mu\mu\gamma$
It has been pointed out that the dimuon dataset used for the $B_s^0 \to \mu^+\mu^-$ measurement may be used to "parasitically" extract suitable $B_s^0 \to \mu^+\mu^- X$ channels, provided the corresponding backgrounds are sufficiently under control [143]. This observation was applied to the example of $B_s^0 \to \mu^+\mu^-\gamma$. This decay is interesting because the additional photon lifts the chiral suppression in $B_s^0 \to \mu^+\mu^-$ [144-147] (for the $ee$ channel, this means an enhancement of five orders of magnitude, making it comparable with the $\mu^+\mu^-$ mode). The removal of the chiral suppression implies that $B_s^0 \to \mu^+\mu^-\gamma$ can probe a richer set of Wilson coefficients than $B_s^0 \to \mu^+\mu^-$, in principle all those that are interesting in the light of B anomalies.
The direct measurement of $B_s^0 \to \mu^+\mu^-\gamma$ (i.e., with a detected $\gamma$) is quite challenging at hadron colliders, because of the ubiquitous $\pi^0$ and stray $\gamma$. In fact, there is no PDG entry whatsoever for $B_s^0 \to \ell\ell\gamma$. In light of this challenge, Reference [143] suggested extracting $B_s^0 \to \mu^+\mu^-\gamma$ from the $B_s^0 \to \mu^+\mu^-$ event sample, basically by enlarging the dimuon invariant-mass window below the $B_s^0$ peak, with the essential precondition that the other backgrounds that populate this enlarged window be under control at the level of the signal yield one is aiming at.
One crucial limitation of $B_s^0 \to \mu^+\mu^-\gamma$ at present is the poor knowledge of the necessary $B_s^0 \to \gamma$ form factors. By construction, the method of [143] measures the $B_s^0 \to \mu^+\mu^-\gamma$ spectrum for photons close to their lower endpoint, i.e., for large $q^2$ close to $q^2_{\max}$. This is the part of the spectrum most sensitive to the Wilson coefficients pointed to by current B anomalies, and the form factors are in the $q^2$ range preferred for lattice QCD estimations. As a matter of fact, substantial progress has been made recently in this domain. In particular, for large $E_\gamma$, it was shown that the required LQCD correlator (which is the insertion, between the $B$ and the vacuum, of a weak and of an e.m. current) always has the desired large-time behavior [148]. Besides, several calculations of $B_s^0 \to \gamma$ form factors based on QCD factorization and soft-collinear effective theory [149] or on light-cone sum rules [150–152] have recently appeared. One arduous obstacle to overcome, especially for low $q^2$, is the presence of resonances. Even if one restricts to low enough $q^2$ to avoid charmonium, there are resonant contributions at around 1 GeV from the $\phi$ (in the $B_s^0$ case). With a Breit-Wigner ansatz for these contributions, one can make a prediction of the total BR integrated from the lower endpoint, through the resonance region, up to 6 GeV$^2$, finding ballpark values of $10^{-8}$. If one excludes the $\phi$, one finds $3 \times 10^{-10}$. Therefore, the prediction is totally dominated by the resonant contributions, and a phenomenological parameterization of their shapes looks inescapable. For high $q^2$ above the $\psi(2S)$ resonance, the problem is substantially milder [146].
For small $E_\gamma$, the problem arises of defining IR-safe LQCD quantities. A new benchmark approach to this problem was proposed and first applied in [153,154]. The idea is to use the width calculated in the continuum within scalar QED in order to cancel IR divergences, and to do this for each photon momentum considered within the lattice simulation. The main limitation is the assumption of scalar QED, i.e., of point-like mesons. This implies a cutoff on $E_\gamma$ well below $\Lambda_{\rm QCD}$.
The limitations inherent in $B_s^0 \to \gamma$ form factors, as well as in resonance pollution, motivated the proposal of $B_s^0 \to \ell\ell\gamma$ observables with lesser sensitivity to long-distance physics. For example, in [146], the resonant ansatz was used to trade the resonant region for the measured $B_s^0 \to \phi\gamma$ decay. Then, the main focus was the large-$q^2$ region, i.e., above narrow charmonium. Although one has broad-charm pollution here, it can be estimated with a similar ansatz, and the underlying uncertainty is substantially tamed if one defines a suitable ratio of the $B_s^0 \to \mu^+\mu^-\gamma$ to the $B_s^0 \to e^+e^-\gamma$ differential branching ratios, akin to $R_K$ [146,147].
Actually, natural ratio observables arise in the context of CP violation. This is the case of, for example, $A_{\Delta\Gamma}$ [22,23]. Accordingly, Reference [155] constructed this observable in the context of $B_s^0 \to \mu^+\mu^-\gamma$ and performed an explicit comparison among the different $B_s^0 \to \gamma$ form factor parameterizations available [145,147,149,156–158].

$b \to s\tau^+\tau^-$
A dedicated, if short, comment is deserved for $b \to s\tau^+\tau^-$ modes such as $B_s^0 \to \tau\tau$ and $B \to K\tau\tau$. As mentioned, B anomalies suggest new effects hierarchically coupled to leptonic generations, and largest for the third one. It is clear that $b \to s\tau^+\tau^-$ modes provide the most immediate smoking guns of such a possibility [127,159]. The existing limits on these modes are from Belle [160] and LHCb [4]. Belle II anticipates reaching the level of $10^{-5}$ on these branching ratios [126].

Phenomenology of Semileptonic $b \to c$ Modes
These are decays where an initial b-flavored hadron (a $B$ meson or a $\Lambda_b$ hyperon) decays to a c-flavored one ($D$ or $\Lambda_c$) plus a charged leptonic current $\ell^\pm \overset{(-)}{\nu}$. The main qualitative difference for phenomenology is that these decays proceed through tree SM diagrams, i.e., they are not loop-suppressed. As a consequence, assuming the absence of beyond-SM "pollution" (for the status of departures from universality in muon vs. electron modes, see [161–165]), these decays, for $\ell = e, \mu$, have been used as standard candles for the determination of $V_{cb}$. This parameter is often one of the four "standard" parameters describing the full CKM matrix (see, e.g., [166]). Interestingly, the ratios $R(D^{(*)}) \equiv \mathcal{B}(B \to D^{(*)}\tau\nu)/\mathcal{B}(B \to D^{(*)}\ell\nu)$ display some disagreement with the SM predictions, at the level of 3.1σ as of spring 2019 [167] and more recently upgraded to 4σ, following the reappraisal in [168,169]. This represents the fourth "B anomaly".
The "ideal" setup to measure $R(D^{(*)})$ is B-factories, because the missing kinematic information due to the elusive neutrinos (also the $\tau$ decay produces at least one) can be made up for by the knowledge of the initial state, the clean decay environment, and the large angular coverage. B-factory measurements include [170–175]. Further measurements of $R(D^*)$ come from the LHCb collaboration [176–178]. In this case, the momentum of the decaying B-meson is inferred from information on its decay direction, in turn reconstructed from the decay vertex. Then, $R(D^*)$ is inferred from a multidimensional fit, whose variables depend on whether the $\tau$ decays hadronically or leptonically. (Yet another test of the same underlying discrepancy is the ratio $R(J/\psi) \equiv \mathcal{B}(B_c \to J/\psi\,\tau\nu)/\mathcal{B}(B_c \to J/\psi\,\ell\nu)$: LHCb has performed such a measurement with $\ell = \mu$ [179]. The experimental result departs from the quite precise SM prediction [180,181] in a way that is intriguingly comparable, in magnitude and sign, to the $R(D^{(*)})$ case.) Needless to say, a rich program of measurements of these and other $b \to c$ modes is foreseen at Belle II [126] and LHCb [125]. Belle II will exploit the same well-known advantages of leptonic colliders over hadronic ones, which have already been advertised, and should also be able to access the $\tau$ and $D^*$ polarizations. It could also possibly perform inclusive measurements, which were recently discussed in [182,183]. LHCb's strength will be the ability to access many $R$-measurements, including also the $\Lambda_c$, the $D_s$, and the already mentioned $J/\psi$ channel. The ultimate experimental precision will be of a few percent.
The $R(D^{(*)})$ discrepancy is typically interpreted as due to new effects in $\mathcal{B}(B \to D^{(*)}\tau\nu)$. It should be noted in fact that, to the extent that new physics is present in $b \to s$ transitions and is caused by above-EWSB-scale dynamics, new effects should "spill over" to some degree also in $b \to c\tau\nu$ transitions, especially if one starts with the assumption that the new interaction is dominantly coupled to the 3rd generation [127]. In this case, as shown in [129,133], the $b \to s$ and $b \to c$ anomalies are related to first approximation by $SU(2)_L$ symmetry.
From the EFT point of view, the description of $b \to c\ell\nu$ decays is similar to the $b \to s$ case: after integrating out EW-scale particles, the decay is described by a leptonic times a quark ($\bar{c}\,\Gamma b$) current, and the latter needs to be evaluated between hadronic external states, depending on form factors that are functions of the leptonic invariant mass squared $q^2$.
The determination of these form factors is a very active subject of research, encompassing both first-principles QCD approaches, i.e., lattice QCD, and QCD-inspired approaches, commonly denoted as Light Cone Sum Rules (LCSRs) and grounded on the assumption of so-called quark-hadron duality. In the cases $\bar{B} \to D$ and $\bar{B} \to D^*$, lattice studies are best suited for the kinematic endpoint $q^2 = q^2_{\max}$; see [184,185] and [186,187], respectively. Conversely, LCSR calculations [65,188] provide results for negative $q^2$, i.e., below the $q^2_{\min}$ value for these decays. It is noteworthy that f.f.s have been calculated (and interesting phenomenological applications performed) also for the hyperon case [189,190] and even for the $B_c \to J/\psi$ case [180].
One qualification is in order here. The $\bar{B} \to D\ell\nu$ decay prediction depends on two f.f.s only, usually chosen to be the vector and the scalar f.f.; hence, the f.f. dependence is, in principle, much simpler than for $\bar{B} \to D^*\ell\nu$. Nonetheless, the $R(D^*)$ prediction may be more robust, as discussed in [191]. In fact, in $\bar{B} \to D\tau\nu$, the contribution from the scalar f.f. is sizeable as compared to the vector f.f., whereas it is negligible in the case of light leptons. Reference [191] showed that, if one assumes an O(10%) departure in the scalar f.f. (e.g., for some suitable $q^2$ interval below $q^2_{\max}$) with respect to the current lattice evaluation [184,185], the $R(D)$ prediction would be in better agreement with the experimental figure.
We next give a few words on how the f.f. calculations are connected to $R(D^{(*)})$ predictions through global analyses. We saw that, aside from specific cases, f.f.s are calculated for certain $q^2$ regions, so that extrapolations are required to estimate them in the full $q^2$ range relevant for the decay. Some reference approaches exist towards these extrapolations. A first such approach is due to Boyd, Grinstein, and Lebed (BGL) [192]. The underlying idea (see, e.g., [193–205]) is to start from the f.f. normalization provided by heavy-quark symmetry ([206–209]; for reviews, see [210,211]), then constrain the f.f. shapes as functions of the momentum transfer by means of dispersion relations in momentum space. The latter relate an inclusive production rate on one side to a two-point function reliably calculable in perturbative QCD on the other side. The inclusive production rate may be estimated as a sum over channels, and BGL showed that inclusion of higher states substantially improves the shape constraints for $b \to c$ transitions. To maximize the constraining power of this relation, one performs global analyses [212].
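The text above does not spell out how such dispersive constraints are used in practice. As a minimal sketch, assuming the standard conformal variable $z(q^2)$ commonly used in this class of parameterizations, the snippet below maps the semileptonic $q^2$ range of $\bar{B} \to D$ onto a small interval in $z$ and checks a unitarity-type bound on the expansion coefficients. The meson masses are approximate inputs, the coefficients are made-up numbers, and the prefactors that multiply the series in a real analysis (pole factors and outer functions) are deliberately omitted.

```python
import math

M_B = 5.2797  # GeV, approximate B-meson mass (assumed input)
M_D = 1.8648  # GeV, approximate D-meson mass (assumed input)

T_PLUS = (M_B + M_D) ** 2   # pair-production threshold
T_MINUS = (M_B - M_D) ** 2  # maximal q^2 in the semileptonic decay


def z_variable(q2, t0=T_MINUS):
    """Conformal map of the cut q^2 plane onto the unit disk."""
    a = math.sqrt(T_PLUS - q2)
    b = math.sqrt(T_PLUS - t0)
    return (a - b) / (a + b)


def truncated_series(coeffs, q2):
    """Power series sum_n a_n z^n; pole and outer-function factors omitted."""
    z = z_variable(q2)
    return sum(a * z ** n for n, a in enumerate(coeffs))


def satisfies_unitarity(coeffs):
    """Unitarity-type bound on the coefficients: sum_n a_n^2 <= 1."""
    return sum(a * a for a in coeffs) <= 1.0


toy_coeffs = [0.01, -0.05, 0.1]  # hypothetical coefficients, illustration only
print("z at q2 = 0:      ", round(z_variable(0.0), 4))
print("z at q2 = q2_max: ", z_variable(T_MINUS))
print("bound satisfied:  ", satisfies_unitarity(toy_coeffs))
```

The point of the exercise is that $|z|$ stays below roughly 0.07 over the whole physical range, so a series truncated after a few terms, with coefficients bounded as above, already constrains the f.f. shape tightly.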
The predictivity of the f.f. parameterization can be further enhanced by using unitarity, analyticity, and perturbative QCD scaling as in Bourrely, Caprini, and Lellouch (BCL) [213]. This overall approach aims at a model-independent f.f. parameterization based on general QFT constraints, namely unitarity and analyticity, without resorting to explicit expansions, whether in $\alpha_s$ or in inverse powers of the heavy-quark mass. Recent applications of this approach include [214–216], aimed at the simultaneous determination of $R(D^{(*)})$ (affected by the above-mentioned "fourth B anomaly") and of $V_{cb}$ (whose exclusive vs. inclusive determinations tend to disagree, in what is known as the $V_{cb}$ puzzle).
A somewhat separate approach consists of starting from the heavy-quark symmetry f.f.s and systematically including terms of $O(\alpha_s)$, as well as power-suppressed terms in $1/m_b$ and in $1/m_c$, i.e., considering also $c$ as a heavy quark [168,169,217–219]. Applications include again the simultaneous determination of $V_{cb}$ and $R(D^{(*)})$ [217–219]. The very recent [168,169] also included in the above expansion terms of $O(1/m_c^2)$ (interestingly, the size of the calculated $1/m_c^2$ coefficient provides an argument in favor of the convergence of the series) and focused on the f.f. determination, including the most recent calculations from QCD LCSRs, constraints from lattice QCD, three-point functions determined through QCD sum rules, and unitarity.

Interpretation
It is quite remarkable that B-decay anomalies hold their ground at the time of this writing. As already mentioned in Section 3.3, the last $R_K$ update by LHCb is especially encouraging in that the significance of the discrepancy went up proportionally to the increase in statistics. This fact, as well as the degree of coherence across the three different $b \to s$ "anomalies" (Sections 3.1-3.3) and the $b \to c$ ones (Section 5), clearly raises the question of whether a common, "natural" interpretation exists. This question is especially nontrivial because of the wealth of collider data, particularly at the energy frontier, which, on the other hand, show no significant discrepancies with respect to the SM.
In the discussion to follow, we will focus on the current understanding from an effective-theory point of view. We will not discuss specific models, as the landscape is too vast. We will thereby focus on the b → s anomalies only. In fact, a quantitative inclusion of the b → c anomalies requires assumptions to relate Wilson coefficients in different flavor sectors.
In great synthesis, one can say that the $b \to s$ anomalies lend themselves to an explanation in terms of semileptonic four-fermion interactions of either of the two forms given in Equation (4). The effective scale hinted at by the fits (for recent ones, see [83,155,220–235]) is $\Lambda \approx 35$ TeV. It is noteworthy that the measurements of the branching ratio for the $B_s^0 \to \mu^+\mu^-$ decay keep showing a trend similar to that of the branching ratios of semileptonic decays: the experiment is below the SM prediction by O(15%). The last LHCb measurement, performed with the full Run-2 dataset, yields $\mathrm{BR}(B_s^0 \to \mu^+\mu^-) = (3.09^{+0.46\,+0.15}_{-0.43\,-0.11}) \times 10^{-9}$ [11,12], to be compared with the SM prediction of $(3.67 \pm 0.15) \times 10^{-9}$ [229]; see [20,21] for the state-of-the-art calculation. The last world average [10] of the ATLAS, CMS, and LHCb results [7–9] came with an error about 30% smaller than the last LHCb measurement alone and hinted at a ≈2σ tension with respect to the SM prediction. This tension consistently increases to 2.3σ if one uses the updated world average [229], which includes the latest LHCb measurement.
The possible patterns of New Physics (NP) contributions to the $b \to s$ anomalies can be studied systematically within the weak effective theory (WET) at the b-quark scale. Restricting to $|\Delta B| = |\Delta S| = 1$ contributions that can be generated in the SM effective theory (SMEFT) at dimension ≤6, the corresponding NP effective Hamiltonian is given in Equation (5), with $\mathcal{N} = \frac{4 G_F}{\sqrt{2}} V_{tb} V_{ts}^* \frac{e^2}{16\pi^2}$, and the operators are defined in Equation (6). For fuller details on the conventions, see [225]. Contributions from the operators $O_{S,P}^{(\prime)bs\mu\mu}$ are strongly constrained by the $B_s^0 \to \mu^+\mu^-$ decay. SMEFT implies, however, well-known relations among the corresponding Wilson coefficients [236], Equation (7). (Discussions of the case where the relations (7) are violated can be found in [89,237]. For a detailed numerical study, including also tensor operators, see [238].) With these relations enforced (at the SMEFT scale), the corresponding operators do not lead to sizeable contributions in semileptonic $b \to s\mu\mu$ transitions [239].
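The normalization factor $\mathcal{N}$ also gives a quick handle on the effective scale quoted above. The sketch below assumes the identification $1/\Lambda^2 = \mathcal{N}\,|C|$ for an operator with Wilson coefficient $C$, and uses rounded inputs that are not taken from this write-up: $|V_{tb} V_{ts}^*| \approx 0.04$ and $e^2/(4\pi) \approx 1/133$ near the b-quark scale.

```python
import math

G_F = 1.1663788e-5      # GeV^-2, Fermi constant
CKM_FACTOR = 0.04       # |V_tb V_ts^*|, rounded illustrative value (assumption)
ALPHA_EM = 1.0 / 133.0  # e^2 / (4 pi) near the b-quark scale (assumption)


def normalization():
    """N = 4 G_F / sqrt(2) * |V_tb V_ts^*| * e^2 / (16 pi^2), in GeV^-2.

    Uses e^2 / (16 pi^2) = alpha_em / (4 pi).
    """
    return 4.0 * G_F / math.sqrt(2.0) * CKM_FACTOR * ALPHA_EM / (4.0 * math.pi)


def effective_scale_tev(wilson_coefficient=1.0):
    """Scale Lambda defined by 1 / Lambda^2 = N * |C|, converted to TeV."""
    lam_gev = 1.0 / math.sqrt(normalization() * abs(wilson_coefficient))
    return lam_gev / 1000.0


print(f"N = {normalization():.2e} GeV^-2")
print(f"Lambda(|C| = 1) = {effective_scale_tev():.1f} TeV")
```

For a Wilson coefficient of order one, this gives a scale in the mid-30s of TeV, consistent with the figure hinted at by the fits; smaller coefficients push the scale up as $|C|^{-1/2}$.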
The Wilson coefficients of the operators (6) can subsequently be fit to all available data. This can be done, e.g., by constructing a $\chi^2$ function with all the observables of interest and the corresponding theory predictions, which carry a dependence on the Wilson coefficients to be fit. One crucial aspect of such an approach is a correct estimation of the covariance matrix. The latter is the sum of experimental and theoretical components. One customary approximation is to evaluate the theoretical component at the SM point ($C_i^{\rm NP} = 0$), i.e., to neglect the possible contribution that NP induces on the theory covariance matrix. This specific point was recently reappraised quantitatively in [229]. The approximation of evaluating the theory covariance matrix at the SM point is deemed valid to the extent that NP contributions are "small". However, there are observables whose theory uncertainty is completely negligible with respect to the experimental one only in the absence of NP. This is the case for lepton-flavor universality tests such as $R_{K^{(*)}}$, which represent some of the pivotal quantities for the assessment of the significance of B anomalies. In the case of LUV NP contributions, by construction, the cancellation of long-distance theory uncertainties becomes less and less efficient the larger these contributions. In such cases, the theory covariance matrix has to be evaluated at the specific NP point considered.
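The structure of such a fit can be illustrated with a deliberately small toy. The sketch below assumes two observables whose predictions depend linearly on a single Wilson coefficient; all numbers (SM values, slopes, measurements, covariances) are invented for illustration and do not correspond to any real observable. The $\chi^2$ uses the sum of an experimental and a theoretical covariance matrix, here both held fixed, as in the customary SM-point approximation.

```python
def chi_squared(measured, predicted, cov_exp, cov_th):
    """chi^2 = d^T (cov_exp + cov_th)^(-1) d for two observables."""
    d0 = measured[0] - predicted[0]
    d1 = measured[1] - predicted[1]
    # Total covariance matrix [[a, b], [b, c]]
    a = cov_exp[0][0] + cov_th[0][0]
    b = cov_exp[0][1] + cov_th[0][1]
    c = cov_exp[1][1] + cov_th[1][1]
    det = a * c - b * b
    # Closed-form inverse of a symmetric 2x2 matrix, contracted with d
    return (c * d0 * d0 - 2.0 * b * d0 * d1 + a * d1 * d1) / det


def predictions(wc, sm=(1.0, 1.0), slopes=(-0.20, -0.25)):
    """Toy linear dependence of the two observables on one coefficient."""
    return [sm[0] + slopes[0] * wc, sm[1] + slopes[1] * wc]


MEASURED = [0.85, 0.80]                        # hypothetical measurements
COV_EXP = [[0.05 ** 2, 0.0], [0.0, 0.08 ** 2]] # hypothetical exp. covariance
COV_TH = [[0.01 ** 2, 0.0], [0.0, 0.01 ** 2]]  # theory part, SM-point value


def chi2_of(wc):
    return chi_squared(MEASURED, predictions(wc), COV_EXP, COV_TH)


grid = [i * 0.01 for i in range(201)]  # scan the coefficient from 0 to 2
best_wc = min(grid, key=chi2_of)
print(f"chi2 at SM point: {chi2_of(0.0):.2f}")
print(f"best-fit coefficient: {best_wc:.2f}, chi2 = {chi2_of(best_wc):.3f}")
```

Evaluating the theory covariance at the NP point instead would amount to making `COV_TH` a function of the coefficient being scanned, which is the refinement discussed in the text.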
In the specific case of $R_K$, the SM uncertainty is quoted to be 1% [116,117], whereas the experimental uncertainty is currently 5% [123]. An O(15%) LUV effect multiplying a long-distance contribution known to O(30%) (from f.f.s squared) would yield a theory error again around 5%. A similar warning holds for other LFU tests, in particular the quantities introduced in [111–113] and also mentioned in Section 3.2, although, at present, experimental errors are vastly dominant [98], and the prospects for these quantities to reach an accuracy of a few percent [240] are longer term than for $R_K$ and siblings.
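The estimate above is a one-line product, but it is instructive to see how quickly the theory error grows with the size of the LUV effect. The sketch below uses the figures quoted in the text and assumes, purely for illustration, that the SM-point error and the NP-induced error add in quadrature.

```python
import math

SM_THEORY_ERROR = 0.01      # relative theory error on R_K at the SM point
LONG_DISTANCE_ERROR = 0.30  # O(30%) uncertainty on the long-distance piece
EXPERIMENTAL_ERROR = 0.05   # current relative experimental error


def theory_error(luv_shift):
    """Relative theory error on R_K for a given relative LUV shift.

    The NP-induced piece is luv_shift * LONG_DISTANCE_ERROR; combining it in
    quadrature with the SM-point error is an assumption of this sketch.
    """
    return math.hypot(SM_THEORY_ERROR, luv_shift * LONG_DISTANCE_ERROR)


for shift in (0.0, 0.05, 0.15):
    err = theory_error(shift)
    print(f"LUV shift {shift:.2f}: theory error {err:.3f} "
          f"(experimental {EXPERIMENTAL_ERROR:.3f})")
```

For a 15% shift the theory error comes out between 4% and 5%, i.e., comparable to the present experimental one, whereas at the SM point it stays at the 1% level.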
As of Moriond 2021, several new measurements have been made available, in particular the $R_K$ update with the full LHCb dataset as of Run-2 [123], which triggered a string of new assessments of the status of B anomalies [155,229–235]. Besides the $R_K$ update, many additional recent analyses are worthy of note: the $B_s^0 \to \mu^+\mu^-$ update from CMS, which includes the $B_s^0 \to \mu^+\mu^-$ effective lifetime [8]; the very recent LHCb counterpart [11,12], which remarkably includes $B_s^0 \to \mu^+\mu^-$, its effective lifetime (already measured at LHCb in [9]), $B_d \to \mu^+\mu^-$, and even the first (ever) limit on $B_s^0 \to \mu\mu\gamma$ using the method proposed in [143]; two $B \to K^*\mu^+\mu^-$ angular analyses by LHCb, namely an update of the $B^0$ channel with 2016 data [106] and a new analysis of the $B^\pm$ channel [107].
With these data in hand, one can perform several tests, thereby accessing different useful pieces of information. Below, a few specific directions are itemized, along with their motivation. Note that not all these tests are mutually exclusive.

(a) Focus on a one-(real-)parameter scenario. This test is useful to address the question whether the observed discrepancies are due to one dominant (weak-effective-theory) operator, with a phase aligned with the SM one. This greatly reduces the SMEFT space of possible contributions and excludes sizeable new phases, although this can only be established through CP-sensitive tests; see below;
(b) Consider two-real-WC scenarios instead. There are certain two-parameter choices that lend themselves to well-defined interpretations in the UV. A first prominent example is $C_9^{bs\ell\ell}$ vs. $C_{10}^{bs\ell\ell}$, for $\ell = \mu$ (see the corresponding operators in Equation (6)). Such a fit serves to establish whether the second interaction in Equation (4) emerges naturally from the data as a solution, i.e., whether the data are constraining enough already at present. Encouragingly, this exercise tends to return opposite signs for the two WCs, although the magnitude of $C_9^{bs\mu\mu}$ (< 0), mostly determined by the $b \to s\mu\mu$ data, tends to be more than twice as large as that of $C_{10}^{bs\mu\mu}$ (> 0), which is mostly determined by the slight tension in $B_s^0 \to \mu^+\mu^-$. A second popular example among model-builders is the case $C_9^{\rm univ}$ vs. $\Delta C$. This scenario is interesting because $C_9^{\rm univ}$ could be generated by the running of semitauonic operators able to explain $R_{D^{(*)}}$, as pointed out in [241]. Intriguingly, the size preferred by the fit is precisely the one needed for this purpose, as first quantified in [225]. It should however be noted that a $C_9^{\rm univ}$ shift could also arise from four-quark operators [225] and/or it could just be the effect of underestimated hadronic effects;
(c) Perform fits to different subsets of observables, for example: (i) BR plus angular data only (Sections 3.1 and 3.2); (ii) ratio data only or ratio plus $B_s^0 \to \mu^+\mu^-$ data only (Sections 2 and 3.3); (iii) a truly global fit, including (i) and (ii). In the case of (i), one would focus on "less clean" observables. If the anomalies were seen here only, it would be very difficult to unambiguously claim new physics of the observed size. This can instead be done, unambiguously, with the observables of case (ii). It is remarkable that both sets of observables do point to new physics of the very same size and in the same EFT directions; see Equation (4). Hence, estimating to the best of our knowledge the errors for the observables of set (i), it makes sense to perform a global fit that includes (i) and (ii) alike;
(d) Consider complex Wilson coefficients. As has been well known since [242], in the $b \to s$ case, the most constraining observables (branching ratios, as well as CP-averaged angular observables) are such that only NP contributions that are aligned in phase with the SM can interfere with the SM counterparts. This implies that if NP carries a "nonstandard" (i.e., not aligned with the corresponding SM direction) CP phase, it will be constrained more weakly than NP with a "CKM-like" phase. See [229] for a state-of-the-art discussion. For the Wilson coefficients of greatest interest in connection with $b \to s$ anomalies, namely $C_9$, $C_{10}$, and to some extent, $C_7$, the constraints on the imaginary part end up being looser than on the real part [155]. It should also be noted that the thus-obtained constraints on the imaginary parts are all compatible with zero, and the best-fit values for the real parts are similar whether the corresponding imaginary parts are switched on or off;
(e) Compare NP significances evaluated with theory errors at the SM point vs. theory errors at the given NP point. This is an important issue and is clearly separate from that of evaluating theory errors with robust assumptions on the relevant long-distance physics, in particular form factors and resonant contributions. This issue was already described below Equation (7) and was addressed in [229]. The impact of including theory errors evaluated at the NP point is at present "moderate": the best-fit points tend to change a little in the case of fits performed with observables of set (i) in item (c) above, mostly for the $C_9$-only case, and changes are well within the errors attached to the best-fit points; on the other hand, for fits performed with theoretically clean observables only (set (ii) in item (c) above), the solution basically does not change at all. This issue however warrants further consideration as data accumulate and experimental uncertainties decrease.
Different aspects of the above points have been addressed in the post-Moriond-2021 global analyses [155,229–235]. From several of the above fits, it emerges that a few scenarios, corresponding to assumption (b) or (d) above, fit the data with comparable pulls with respect to the SM assumption. These scenarios include C and $C_9^{bs\mu\mu}$ [229]. An interesting question addressed by Altmannshofer and Stangl is whether these scenarios may be told apart by the numerous upcoming measurements. Among the latter, and focusing on the theoretically cleanest, one may quote $R_{K,K^*,\phi}$ for both low and high $q^2$ (in the case of $R_{K^*}$, one also considers the very-low-$q^2$ range [0.045, 1.1] GeV$^2$ going down to the dimuon endpoint); the muon-minus-electron differences of $P_i$ observables in $B \to K^*\ell\ell$ analyses; CP-asymmetries. Of all these quantities, the only observable clearly able to discriminate among the above-mentioned scenarios is, somewhat expectedly, $Q_5 \equiv D_{P_5'}$ [229]. It is actually remarkable that the combined information from this quantity, ratio observables, and selected branching ratio data (notably $B_s^0 \to \mu^+\mu^-$) stands a real chance of telling apart the above scenarios in the short/medium term.
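Since scenarios are compared through their pulls with respect to the SM, it may help to recall how a pull is typically obtained from a fit. The sketch below assumes the common convention in which the difference $\Delta\chi^2 = \chi^2_{\rm SM} - \chi^2_{\rm best}$ is converted into a two-sided Gaussian-equivalent significance, for one or two fitted parameters; the function name and the sample value of $\Delta\chi^2$ are illustrative and not taken from any of the cited analyses.

```python
import math
from statistics import NormalDist


def pull_in_sigma(delta_chi2, n_params):
    """Gaussian-equivalent pull for chi2_SM - chi2_best with 1 or 2 parameters."""
    if n_params == 1:
        # For one degree of freedom the pull is simply sqrt(delta chi^2)
        return math.sqrt(delta_chi2)
    if n_params == 2:
        # chi^2 survival function for two degrees of freedom
        p_value = math.exp(-0.5 * delta_chi2)
        # Two-sided conversion of the p-value into standard deviations
        return -NormalDist().inv_cdf(0.5 * p_value)
    raise ValueError("this sketch only handles one or two parameters")


sample_delta_chi2 = 16.0  # hypothetical value, for illustration only
print(f"1 parameter:  {pull_in_sigma(sample_delta_chi2, 1):.2f} sigma")
print(f"2 parameters: {pull_in_sigma(sample_delta_chi2, 2):.2f} sigma")
```

The same improvement in $\chi^2$ corresponds to a smaller pull when two parameters are fitted, which is why pulls of one- and two-parameter scenarios should be compared with this conversion in mind.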

Conclusions
Semileptonic decays of $B$ mesons (as well as $\Lambda_b$ hyperons) to strange or charm counterparts have provided, already for some time, the most tangible chance of consolidating the presence of new effects in collider data. These observables cannot all be taken on a similar footing as far as the control of theory errors is concerned; however, theoretically clean vs. less clean datasets lead to similar conclusions. The most prominent conclusion is that, parameterizing the new effects most generally through weak-scale effective operators, there exist a handful of scenarios that describe all data in a coherent way and improve over the SM description substantially, by O(4σ) or more, depending on the considered data and on the fit methodology. The main underlying physics assumption is that the putative new physics lies above the EW scale, but not much above. The most distinctive feature of this new physics is (or at least has been for some time) somewhat unexpected: the violation of lepton universality. If the above picture should consolidate more and more with upcoming data, ideally as several O(5σ) discrepancies in the cleanest among the above-mentioned quantities, it would likely provide a handle, whose importance for the future of our field is difficult to overstate, on the question of what lies beyond the SM. Amusingly, particle physics would thereby experience something it has not for decades: a genuinely unexpected discovery.