The Leiden Atomic and Molecular Database (LAMDA): Current Status, Recent Updates, and Future Plans

The Leiden Atomic and Molecular Database (LAMDA) collects spectroscopic information and collisional rate coefficients for molecules, atoms, and ions of astrophysical and astrochemical interest. We describe the developments of the database since its inception in 2005, and outline our plans for the near future. Such a database is constrained both by the nature of its uses and by the availability of accurate data: we suggest ways to improve the synergies among users and suppliers of data. We summarize some recent developments in computation of collisional cross sections and rate coefficients. We consider atomic and molecular data that are needed to support astrophysics and astrochemistry with upcoming instruments that operate in the mid- and far-infrared parts of the spectrum.


Introduction
Although baryons constitute only ≈5% of matter in the Universe, the formation of stars and planets out of baryonic matter is a key astrophysical process. In the standard model of cosmology, the Universe expanded, and matter cooled and recombined to an almost neutral, transparent state without luminosity sources (the 'Dark Ages'), while some initial density fluctuations grew to form structures: clusters, galaxies, and stars. The first generation of stars and the earliest active galactic nuclei re-ionized much of the intergalactic and interstellar gas, into its present ionized transparent state. Within galaxies, the cycle of stellar birth, stellar evolution, and stellar death continues to the present day, with stellar birth accompanied by the formation of planets. Nuclear fusion reactions inside stars enrich their composition in heavy elements, which winds and explosions deliver to the interstellar medium. The dilute matter in interstellar space plays a crucial role in this cosmic recycling scheme, both as a source and as a reservoir. Astronomers use spectroscopy from radio to X-ray wavelengths to follow the physical and chemical evolution of the interstellar medium (ISM), which is typically in a state far removed from thermodynamic equilibrium. Quantitative analysis of atomic and molecular spectra thus requires information about the myriad processes that form and destroy molecules and that redistribute the populations of their internal states.
Quantitative analysis of astronomical spectra requires comparison with models. In general, a model of dilute gas consists of (a) a set of coupled differential equations (rate equations) that characterize the [...] molecular absorption database [5] extends through near- and mid-infrared wavelengths. All three of these databases are curated and critically reviewed, and the accuracy of radiative data rarely limits the analysis in quantitative astronomical spectroscopy. A useful tool to access the CDMS and JPL databases is Splatalogue, which allows filtering of data and conversion of units. Some astrophysical applications at temperatures above ∼1000 K require much more extensive molecular models and line lists. The ExoMol database [6] is very useful in this regime, although caution is needed as the data for most molecules are based on theoretical term energies, so that transition frequencies are generally not of spectroscopic accuracy.
Table 1. Levels of accuracy in the LAMDA database (a).

Class | Accuracy | Type of Calculation
A | ∼30% | Quantum (close coupling or coupled states methods)
B | factor of ∼2 | IOS, QCT/statistical calculations; Born/Coulomb-Born/recoupling approximation
C | factor of ∼3-4 | direct scaling He → H₂, O → S
D | factor of ∼10 | indirect scalings for similar systems; reduced dimension approaches for reactive systems
(a) Isotopologues are in the same class as their parents except for H→D substitutions (where the mass correction is >30%), and symmetry breaking systems (which require their own PES).
The focus of LAMDA is on non-LTE calculations, which provide more accurate estimates of column densities, and allow determinations of the kinetic temperature and volume density of the gas. Non-LTE models require cross sections (integrated over velocity into rate coefficients) of the observed species for collisions with the bulk gas component (usually hydrogen in molecular or atomic form). For a detailed formulation of the non-LTE problem, see [7]. For the ∼200 molecules which have to date been detected in interstellar space, collisional data exist for ∼50 species (not including atoms), although for very few molecules have all relevant collision partners been considered. For inclusion into astrophysical radiative transfer models, these data need to be matched to the available spectroscopic data in their coverage of temperatures and energy levels. In addition, existing collisional data do not always resolve hyperfine (and other) interactions, so that spectroscopy at matching resolution is needed. Besides this matching of spectroscopy with collisional data, the added value of LAMDA lies in its homogeneous data format and its easy access with all species in one place.
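The integration of cross sections over velocity can be illustrated numerically. The sketch below assumes a Maxwell-Boltzmann distribution of collision energies and uses the standard thermal average k(T) = [8/(π μ (k_B T)³)]^(1/2) ∫ σ(E) E exp(−E/k_B T) dE. The function name, the cgs units, and the trapezoidal integration grid are illustrative choices and are not part of LAMDA.

```python
import math

K_B = 1.380649e-16    # Boltzmann constant in erg/K
AMU = 1.66053907e-24  # atomic mass unit in g

def rate_coefficient(sigma, mu_amu, temp, xmax=50.0, nsteps=20000):
    """Thermally average a cross section into a rate coefficient (cm^3/s).

    sigma   : function returning the cross section (cm^2) at collision energy E (erg)
    mu_amu  : reduced mass of the collision system in amu
    temp    : kinetic temperature in K
    The integral is done in the dimensionless variable x = E/(k_B T)
    with a simple trapezoidal rule (an illustrative choice).
    """
    kt = K_B * temp
    mean_speed = math.sqrt(8.0 * kt / (math.pi * mu_amu * AMU))  # cm/s
    step = xmax / nsteps
    total = 0.0
    for i in range(nsteps + 1):
        x = i * step
        weight = 0.5 if i in (0, nsteps) else 1.0
        total += weight * sigma(x * kt) * x * math.exp(-x)
    return mean_speed * total * step
```

For an energy-independent cross section the integral reduces to the cross section times the mean relative speed, which gives a quick sanity check of any implementation.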
Users of the LAMDA database are expected to refer to the paper by Schöier et al. [1]. When individual molecules are considered, references to the original papers providing the spectroscopic and collisional data must also be made. The interpretation of astronomical spectra depends on the availability of accurate data; conversely, data providers depend on citations and credit for their funding. References to the database should include the date when LAMDA was consulted, to make clear which version of the data was used. In the future, a more advanced versioning system may be implemented.
Besides the database, the LAMDA team also maintains the public version of the widely-used RADEX program for fast non-LTE analysis of interstellar line spectra [7]. The program comes in two versions: an online calculator for quick checks, and a stand-alone version for extensive calculations. For publication-quality results, authors should always use the stand-alone version, which gives full control over all input parameters and molecular datafiles. We encourage users to write a script calling RADEX, to ensure that their results are reproducible.
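A scripted RADEX run could look like the following minimal sketch. It assumes that the stand-alone program reads its parameters from standard input in the order used in the RADEX documentation (datafile, output file, frequency range, kinetic temperature, collision partners with densities, background temperature, column density, line width, and a final flag for another calculation); users should check this order against their installed version. The function names and the executable name "radex" are hypothetical.

```python
import subprocess

def radex_input(molfile, outfile, fmin_ghz, fmax_ghz, tkin, partners,
                tbg=2.73, column_density=1e13, line_width=1.0):
    """Build the input text for a single RADEX calculation.

    partners maps a collision partner name (e.g. "H2") to its density in cm^-3.
    The order of the entries is an assumption to be checked against the
    documentation of the installed RADEX version.
    """
    lines = [molfile, outfile, f"{fmin_ghz} {fmax_ghz}", str(tkin),
             str(len(partners))]
    for name, density in partners.items():
        lines += [name, f"{density:.3e}"]
    # The final "0" means: no further calculation after this one.
    lines += [str(tbg), f"{column_density:.3e}", str(line_width), "0"]
    return "\n".join(lines) + "\n"

def run_radex(input_text, executable="radex"):
    """Feed the input text to RADEX on standard input (executable name assumed)."""
    subprocess.run([executable], input=input_text, text=True, check=True)

# Example: keep the input text together with the results for reproducibility.
text = radex_input("hco+.dat", "hco+.out", 50, 500, 20.0, {"H2": 1e4})
```

Storing the generated input text alongside the output file documents exactly which parameters and which datafile produced a given result.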
After reviewing current methods to calculate molecular collisional data (Section 2), this paper describes the current status of LAMDA (Section 3) and currently planned updates (Section 4). Subsequently, we discuss strategies to deal with missing collisional data (Section 5) and the molecular data needs for the 2020s (Section 6). The paper focuses on developments since our previous review [8]. Recent advances in atomic and molecular spectroscopy are discussed by [9,10]. The theory behind calculations of collision cross-sections is reviewed by [11,12]. For a primer about estimating molecular column densities from observational spectra, see [13]. Analysis of X-ray spectra is beyond the scope of LAMDA, but SPEX is a well-known tool for this [14]. A complementary database for molecular collisional data is BASECOL [15].

Potential Energy Surface and Scattering Calculations
The theoretical study of collision-induced rovibrational energy transfer has received much attention over the past 50 years. Astrophysical applications played a significant role in the development of this research field. Excitation studies of interstellar molecules began in the early 1970s following the establishment of quantum scattering theory. However, the computational resources available at that time severely limited these studies, and most of the collisional data available were quite approximate. In addition, the helium atom was usually used as a proxy for H₂ as a collision partner. Over the last two decades, however, the huge increase in computational resources and the significant progress in ab initio quantum chemistry have led to a spectacular improvement in accuracy and to a large variety of studied systems. Direct comparisons between theoretical and experimental state-to-state collisional cross sections show that the typical accuracy of modern calculations is 10-20% [30-33], even for small polyatomic molecules colliding with H₂. Such accuracy is required to guarantee high confidence in radiative transfer calculations and to allow a robust determination of molecular column densities and physical conditions. The computation of collisional rate coefficients takes place within the Born-Oppenheimer approximation for the separation of electronic and nuclear motions. Scattering cross sections are thus obtained by solving for the motion of the nuclei on an electronic potential energy surface. This corresponds to the fixed-nuclei approximation in electron-molecule collision studies. The next two subsections describe recent advances in the computation of interaction potentials and in scattering calculations.

Potential Energy Surface (PES)
The theoretical study of any scattering process requires the prior determination of an interaction potential between the particles involved. This so-called potential energy surface (PES) must be accurate, since dynamical calculations are very sensitive to the PES quality and it is impossible to obtain accurate collisional data if the PES is not of high quality. In practice, the PES accuracy should be a fraction of the kinetic energy, i.e., about 1 K for interstellar applications. The most accurate approaches are those based on ab initio quantum chemistry methods. For non-reactive systems, and because of the low temperatures in the ISM, the excitation of a molecule by a collision with a neutral projectile generally involves van der Waals systems in their electronic ground state. As these systems can often be adequately represented by a single electronic configuration, mono-configurational ab initio methods such as the coupled-cluster methods [34] are preferred for the determination of the PESs. The coupled-cluster method with single, double, and perturbative triple substitution [CCSD(T)] is considered the gold standard [33,35]. The explicitly-correlated CCSD(T)-F12 approaches [36] are nowadays the methods of choice for PES generation since they allow the use of small atomic basis sets while maintaining high accuracy. They were successfully applied to many-electron collision systems [37,38]. We note that in rotational excitation studies most of the PESs are computed by freezing the molecules in their vibrationally averaged geometries. This so-called rigid-rotor approximation was recently validated against full-dimensional scattering calculations for the CO-H₂ system [39]. Flexible intermolecular potentials are thus required only for vibrational excitation studies.
For electron-molecule collisions, multi-configurational ab initio techniques are preferred because many electron collision processes involve electronic excitation of the target. A variety of theoretical approaches are available for treating such collisions, such as the Schwinger multichannel, the complex Kohn variational, and the R-matrix methods (see Tennyson and Faure [40] and references therein). The UK Molecular R-matrix method is one of the most successful [41]. Initially developed in the 1940s to treat nuclear reactions, the R-matrix approach was subsequently adapted to the study of electron-atom and electron-molecule collisions in the early 1970s. The R-matrix method relies on the division of space into an inner and an outer region. The inner region is designed to be large enough to contain the entire electron density of the molecular target. In the inner region, the scattering electron and target electrons are treated as being indistinguishable and all interactions (polarization, exchange, and correlation) are explicitly considered through multi-reference techniques. Conversely, in the outer region, the scattering electron is only affected by the long-range potential and the physics is much simpler. A significant advantage of the R-matrix approach is that the inner region problem needs to be solved only once and the energy dependence is entirely obtained from the rapid outer-region calculations.

Scattering Calculations
The most accurate approach to treat inelastic scattering remains the close-coupling method [60]. When combined with a high-level PES, it provides highly accurate data. However, for bimolecular collisions involving 'complex' organic molecules (>4 atoms) or reactive species such as ions or radicals, the increasingly large number of equations to be solved simultaneously in the close-coupling method prevents its use, even within the coupled-states approximation. Several methods have been developed over recent years to circumvent this issue. The quasi-classical trajectory (QCT) method, which has been used with success to generate rate coefficients for rotationally inelastic transitions [61,62], can be used for de-excitation transitions as long as the molecule is linear or has certain symmetry properties (see the discussion in [61]). Elevated temperatures do not seem to be required, since for HC₃N good agreement with close-coupling calculations was found down to 10 K [63].
More recently, some progress has been made in the development of alternative quantum methods to treat inelastic scattering. In particular, the wave packet-based Multi Configurational Time-Dependent Hartree (MCTDH) method [64] and the mixed quantum-classical theory (MQCT) [65] have shown promise in the calculation of inelastic cross sections.
Another quantum alternative is provided by the statistical approach to bimolecular collisions. The general idea is to assume the formation of a long-lived intermediate complex during the collision, resulting in a statistical redistribution of the population of the internal molecular levels. A particular class of statistical approaches consists of adiabatic capture theories, in which a series of rotationally or vibrationally adiabatic potentials is constructed using the long-range anisotropic potential. One such method is the statistical adiabatic channel model (SACM) introduced by [66]. This method was recently combined with global PESs and benchmarked against close-coupling calculations by [67]. For systems where the well depth of the interaction potential is larger than a few hundred cm⁻¹, the statistical method was found to reproduce the close-coupling results within a factor of 2 up to room temperature. A more elaborate (but more CPU time consuming) statistical treatment was also employed by [68] in the case of CH+H.
For electron-molecule collisions, simple dynamical approaches such as the adiabatic-nuclei-rotational (ANR) approximation (equivalent to the infinite-order-sudden approximation; e.g., [69]) can be used because the electron collision time is shorter than a typical rotational period down to the near-threshold regime [70]. The R-matrix method combined with the ANR approximation was checked against experiment for the rotational cooling of HD⁺, and the theoretical de-excitation rate coefficients at 10 K were verified to within 30% [71]. The ANR approximation was also checked against full rovibrational multichannel quantum defect theory (MQDT) calculations. It was demonstrated that ANR rotational cross-sections are accurate down to threshold for the molecular ions H₃⁺ [72], HD⁺ [73], and HeH⁺ [74]. We note that recent developments include the extension of the ANR approach to open-shell targets, e.g., CN [75] and OH⁺ [76]. A review of recent electron-molecule calculations of astrophysical interest can be found in [40].

Current Status of LAMDA
The LAMDA database combines spectroscopic information with collisional data for atoms and molecules of astrophysical interest. To ensure quality and accuracy, the LAMDA database only contains collisional data which result from quantum mechanical, quasi-classical, or statistical calculations, and which have appeared (or are about to appear) in the refereed literature. While published radiative data are usually based on laboratory measurements of line frequencies combined with Hamiltonian models, published collisional data are mostly the result of (quantum-mechanical) calculations, and experimental tests are the exception.
When LAMDA was started in the early 2000s, computational capabilities limited most collision calculations to the use of He as partner, rather than H₂, the most common species in the cold ISM. The workaround was to scale the He rates to H₂, correcting only for the change in reduced mass between the X-He and X-H₂ systems in the thermal averaging. This scaling is reasonably accurate only for para-H₂, which limits its applicability to temperatures ≲78 K, i.e., the point where the ortho/para ratio of H₂ drops below unity. While calculations with H₂ have long become the new standard [8], the scaling procedure is still sometimes used, although the spherical symmetry of He and its lower polarizability compared with H₂ tend to underestimate the long-range interaction [11], especially with ions. Collisional data with He are still useful though, as ignoring He collisions leads to errors in derived column and volume densities of order 20%, which is the He/H₂ abundance ratio.
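The reduced-mass correction described above can be sketched as follows, assuming identical cross sections for He and H₂ so that the rate coefficients differ only through the mean collision speed, which scales as μ^(-1/2). The function names are illustrative and the masses are rounded values.

```python
import math

M_HE = 4.0026  # mass of He in amu (rounded)
M_H2 = 2.0159  # mass of H2 in amu (rounded)

def reduced_mass(m1, m2):
    """Reduced mass of a two-body system, in the units of the inputs."""
    return m1 * m2 / (m1 + m2)

def he_to_h2_factor(m_target):
    """Scaling factor k(X-H2)/k(X-He) for a target X of mass m_target (amu).

    Assumes equal cross sections for both partners, so that only the
    thermal speed (proportional to mu**-0.5) changes.
    """
    mu_he = reduced_mass(m_target, M_HE)
    mu_h2 = reduced_mass(m_target, M_H2)
    return math.sqrt(mu_he / mu_h2)

def scale_he_rates(rates_he, m_target):
    """Apply the reduced-mass scaling to a list of He rate coefficients."""
    factor = he_to_h2_factor(m_target)
    return [factor * k for k in rates_he]

factor_co = he_to_h2_factor(28.0)  # a target of mass 28 amu, such as CO
```

The factor stays below the heavy-target limit sqrt(M_HE/M_H2), about 1.41, so the correction itself is modest; the larger uncertainty lies in the assumption that the cross sections are equal.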
In recent years, the field of collisional calculations has grown more sophisticated, and the commonly accepted method to calculate intermolecular interaction potentials is now the coupled-cluster (CC) formalism, usually at the CCSD(T) level, its explicitly correlated variant, or better [33]; see Section 2. As a first consequence, most calculations now have H₂ as a collision partner, and often resolve its ortho (J = 1) and para (J = 0) moieties. Second, hyperfine selective calculations are becoming commonplace, which is important especially for species such as HCN [92] and CN [55]. Third, older calculations are often limited to low temperatures (≲100 K) and require extrapolation for applicability in warm star-forming regions. Modern calculations cover temperatures up to ∼300 K, which largely removes the need for extrapolation and its associated uncertainty.
To guide users, the LAMDA database distinguishes four levels of accuracy for its collisional data, which are summarized in Table 1. Most modern PESs are based on CCSD(T) calculations or better, so the accuracy is mostly limited by the scattering treatment. For most symmetry-conserving isotopologues, we recommend using the collisional data for the main isotopologue; only for symmetry-breaking systems and for H→D substitutions does the isotopologue need its own PES, i.e., a separate calculation. In particular for hydrides, differences can exceed a factor of 2 due to both kinematics and PES effects (see Scribano et al. [93]). For further discussion of this issue, see Section 4.
Tables 2 to 5 list the collisional data which are included in the LAMDA database as of March 2020. For atoms and ions, the database includes the far-infrared fine-structure lines of C, C⁺, O, and N⁺. Most included molecules are di- and triatomic species, but some larger molecules with 4-6 atoms are also included. The collision partner is mostly H₂, although for some species only scaled He rates are available. In some cases, collisional data with H and electrons exist, which are useful to interpret observations of diffuse parts of the ISM, and regions with high radiation fields. The last columns of Tables 2 to 5 give the quality labels as defined in Table 1. We add a * symbol to the label if more accurate data are available; see Section 4 for details.
The LAMDA data format is used by several well-known radiative transfer programs such as RADEX [7], RATRAN [94], LIME [95], and DESPOTIC [96]. The radiative transfer part of the CLOUDY program [97] also uses this format. The format provides a transparent way for astronomers to use atomic and molecular data without expert knowledge of the physical and chemical literature. However, proper interpretation of astronomical spectra does require basic knowledge of quantum physics.

The LAMDA Data Format
The format of the LAMDA datafiles is straightforward, flexible, and easy to use (Table 6). Lines starting with a ! sign are to inform human readers, and are not supposed to be read by computer programs.
Lines 1-2 name the atomic or molecular species, optionally with a reference for the spectroscopic data. Lines 3-4 give the weight of the species in atomic mass units (amu). Lines 5-6 give the number of energy levels in the datafile (NLEV). Lines 7 to 7+NLEV list the energy levels: level number (starting with 1), level energy (cm⁻¹), and statistical weight (to calculate partition functions). In most datafiles, this information is followed by quantum numbers, which is informative for human readers but not required by computer programs. The levels must be listed in order of increasing energy. Spectroscopic accuracy is not required for excitation calculations but is necessary for spectral synthesis applications (Section 6.2).
Lines 8+NLEV and 9+NLEV give the number of radiative transitions (NLIN). Lines 10+NLEV to 10+NLEV+NLIN list the radiative transitions: transition number, upper level, lower level, and spontaneous decay rate (s⁻¹). In many data files, these numbers are followed by line frequencies and upper level energies. These line frequencies are not read by radiative transfer programs, and do not have to be of spectroscopic accuracy. However, spectroscopic frequencies help human readers to identify the lines and to locate errors in the file. Many authors also add the upper level energy (in K) after the frequency, which again is informative but not required.
Lines 11+NLEV+NLIN and 12+NLEV+NLIN give the number of collision partners (NPART). This is followed by NPART blocks of collisional data:
Lines 13+NLEV+NLIN and 14+NLEV+NLIN give the collision partner ID and reference. Valid identifications are: 1 = H₂, 2 = para-H₂, 3 = ortho-H₂, 4 = electrons, 5 = H, 6 = He, 7 = H⁺.
Lines 15+NLEV+NLIN and 16+NLEV+NLIN give the number of transitions for which collisional data exist (NCOL).
Lines 17+NLEV+NLIN and 18+NLEV+NLIN give the number of temperatures for which collisional data are given (NTEMP).
Lines 19+NLEV+NLIN and 20+NLEV+NLIN list the NTEMP values of the temperature for which collisional data are given.
Lines 21+NLEV+NLIN to 21+NLEV+NLIN+NCOL list the collisional transitions: transition number, upper level, lower level, and rate coefficients (cm³ s⁻¹) at each temperature. The data files only tabulate downward collisional rate coefficients. It is presumed that detailed balance (microscopic reversibility) applies to the computation of the corresponding upward rate coefficients from the tabulated values. Because of the absence of an energy threshold, downward rate coefficients tend to vary less with temperature, so that interpolation and extrapolation are simpler and more robust.
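The layout described above can be read with a short script. The following is a minimal sketch, not the reader used by any of the programs named in this paper: it skips every line starting with "!", keeps only the required columns, and applies detailed balance, C_lu = C_ul (g_u/g_l) exp[−hc(E_u − E_l)/(k T)], to obtain an upward rate coefficient. The function names and the two-level sample file with made-up values are hypothetical.

```python
import math

HC_OVER_K = 1.438777  # hc/k in K per cm^-1 (second radiation constant)

def parse_lamda(text):
    """Parse LAMDA-format text into a dictionary (required columns only)."""
    # Lines starting with '!' are for human readers only, so drop them.
    stripped = (ln.strip() for ln in text.splitlines())
    rows = iter([ln for ln in stripped if ln and not ln.startswith("!")])
    data = {"name": next(rows), "weight": float(next(rows).split()[0])}
    nlev = int(next(rows).split()[0])
    data["levels"] = []
    for _ in range(nlev):
        f = next(rows).split()
        data["levels"].append({"id": int(f[0]), "energy": float(f[1]),
                               "g": float(f[2])})
    nlin = int(next(rows).split()[0])
    data["lines"] = []
    for _ in range(nlin):
        f = next(rows).split()
        data["lines"].append({"up": int(f[1]), "low": int(f[2]),
                              "A": float(f[3])})
    npart = int(next(rows).split()[0])
    data["partners"] = []
    for _ in range(npart):
        partner_id = int(next(rows).split()[0])  # e.g. 2 = para-H2
        ncol = int(next(rows).split()[0])
        ntemp = int(next(rows).split()[0])
        temps = [float(t) for t in next(rows).split()]
        if len(temps) != ntemp:
            raise ValueError("number of temperatures does not match NTEMP")
        rates = []
        for _ in range(ncol):
            f = next(rows).split()
            rates.append({"up": int(f[1]), "low": int(f[2]),
                          "k": [float(x) for x in f[3:3 + ntemp]]})
        data["partners"].append({"id": partner_id, "temps": temps,
                                 "rates": rates})
    return data

def upward_rate(k_down, g_up, g_low, e_up, e_low, temp):
    """Upward rate coefficient from detailed balance (energies in cm^-1)."""
    return k_down * (g_up / g_low) * math.exp(-HC_OVER_K * (e_up - e_low) / temp)

# Hypothetical two-level datafile with made-up numbers, for illustration only.
SAMPLE = """!MOLECULE
XX
!MOLECULAR WEIGHT
28.0
!NUMBER OF ENERGY LEVELS
2
!LEVEL + ENERGIES(cm^-1) + WEIGHT + J
1 0.0 1.0 0
2 3.845 3.0 1
!NUMBER OF RADIATIVE TRANSITIONS
1
!TRANS + UP + LOW + EINSTEINA(s^-1) + FREQ(GHz) + E_u(K)
1 2 1 7.2e-08 115.27 5.53
!NUMBER OF COLL PARTNERS
1
!COLLISIONS BETWEEN
2 XX-pH2 (made-up values)
!NUMBER OF COLL TRANS
1
!NUMBER OF COLL TEMPS
2
!COLL TEMPS
10.0 20.0
!TRANS + UP + LOW + COLLRATES(cm^3 s^-1)
1 2 1 3.0e-11 3.2e-11
"""

data = parse_lamda(SAMPLE)
```

Reading the file sequentially, rather than by absolute line number, means that a mismatch between a declared count and the actual number of rows surfaces as a parsing error instead of silently shifting the data.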

Common Mistakes and How to Avoid Them
The format of LAMDA datafiles is designed to be straightforward so that users of modeling codes can create such files on their own. This 'FAQ' section offers guidance on some issues that frequently occur in the construction and use of datafiles:
• Partner IDs are wrong, so that the program uses collisional data for the wrong partner. See Section 3.1 for the correct IDs.
• The actual number of lines does not match NLIN; the same can happen for NLEV, NCOL, NPART, and NTEMP. To check for such mismatches, RADEX has a 'debug' option.
• Transitions occur between levels with the same energy. This happens, for example, if the spectroscopic data have insufficient frequency resolution. The obvious solution is to use higher resolution data.
• Datafiles contain false metastable states. This occurs especially due to incomplete line lists. In particular, the tables of energy levels and transitions must be tested to ensure that there are no levels that completely lack radiative transitions to any lower states, aside from true metastable states. Moreover, true metastable states should always be connected to at least one lower state by a tabulated collisional process. Otherwise, the solution of rate equations might suffer convergence problems, especially when chemical source and sink terms are ignored.
• Collisional data are practically always more limited in frequency and energy coverage than spectroscopic data, so to match the two, the spectroscopy needs to be trimmed. Care must be taken to avoid undesired side effects.
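Some of the checks listed above can be automated. The sketch below is one possible implementation, with a hypothetical function name and plain tuples as input; it tests the ordering of the levels, looks for transitions between levels of equal energy, and flags levels without any radiative decay to a lower state. It cannot decide whether a flagged level is a true metastable state, which remains a matter of judgment.

```python
def check_datafile(levels, lines, collisions):
    """Return a list of warnings for a LAMDA-style dataset.

    levels     : list of (level_id, energy) in the order given in the file
    lines      : list of (upper, lower) radiative transitions
    collisions : list of (upper, lower) collisional transitions
    """
    problems = []
    energy = dict(levels)
    energies = [e for _, e in levels]
    if energies != sorted(energies):
        problems.append("levels are not in order of increasing energy")
    for up, low in lines:
        if energy[up] == energy[low]:
            problems.append(f"radiative transition {up}-{low} connects "
                            "levels of equal energy")
    ground = min(energy, key=energy.get)
    # Levels that can decay to a lower state radiatively or collisionally.
    rad_decay = {up for up, low in lines if energy[low] < energy[up]}
    col_decay = {up for up, low in collisions if energy[low] < energy[up]}
    for lev in energy:
        if lev == ground or lev in rad_decay:
            continue
        if lev in col_decay:
            problems.append(f"level {lev} has no radiative decay: "
                            "check that it is a true metastable state")
        else:
            problems.append(f"level {lev} has neither a radiative nor a "
                            "collisional link to a lower level")
    return problems

# Example with three levels where level 3 lacks a radiative transition.
warnings = check_datafile([(1, 0.0), (2, 3.8), (3, 11.5)],
                          [(2, 1)], [(2, 1), (3, 2)])
```

Running such a check after trimming the spectroscopy to the coverage of the collisional data helps to catch levels that were accidentally disconnected.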

Planned Updates of LAMDA
Recently, a number of collisional calculations have appeared in the literature which need incorporation into LAMDA. This section groups the currently foreseen updates of the database into four types, ordered roughly by urgency from high to low. The actual priority for updating specific collisional data sets depends also on their added value compared to existing data, and on demand from the community, which in turn depends on how often a species (or transition) is observed, and in what type of environment (which determines which temperatures and collision partners are appropriate).
The second type of updates consists of extensions of existing collisional data to more transitions or higher temperatures. These can be extensions to higher rotational levels, the addition of hyperfine structure, or extensions to vibrational or even electronic transitions. This category includes rovibrational CO-H₂ [122], rovibrational CO-H [123], HCO⁺-H₂ to 500 K [124], hyperfine N₂H⁺-H₂ [125], and rovibrational SiO-He [126].
The fourth type of updates consists of data for molecular isotopologues, which come in several varieties. Some isotopic substitutions fundamentally change the molecular symmetry, such as H₂D⁺-H₂ [136] and ND₂H-H₂ [137]. The LAMDA database treats such isotopologues as species on their own. Other substitutions cause a minor change in the molecular symmetry, such as ¹²C-¹³C in C₃H₂; in such cases, the potential energy surface of the interaction is basically unchanged. Significant effects on collisional properties occur mostly for H→D substitutions, especially for hydrides: DCO⁺-H₂ [138], ND-He [113], ND₃-H₂ [137], D₂O-H₂ [85], and N₂D⁺-H₂ [139]. If isotopic substitution does not change the symmetry of the molecule at all, the effect on its collisional properties is small (<30%); this category includes ¹³CN-H₂ and C¹⁵N-H₂ [140], and N¹⁵NH⁺-H₂ [141]. This latter category is the least urgent to update because, for symmetry-conserving isotopic substitutions, simple scaling by (reduced) mass is quite accurate.
Isotope-like scaling is also sometimes used for O→S and certain other substitutions [8], although quantitative tests do not exist. The expected accuracy is no better than for He→H₂ substitution, as the change in rotational constants for O→S is much more pronounced than for, e.g., ¹²C→¹³C. Recent results for the H₂S-H₂ system [142] indicate that the propensity rules are quite different from those of water, which may be expected since H₂S is a near-oblate asymmetric top while H₂O is closer to prolate.
For symmetry-conserving isotopic substitutions, using collisional data for the main isotopologue is often a reasonable approximation, with ≈20% accuracy, but exceptions exist. One case is ND₃, where the substitution does not change the symmetry, but where more states are allowed than in NH₃; see [143]. Another case is H₂¹⁸O, where some quantum numbers are inverted with respect to H₂O, so that upward transitions become downward. This effect occurs especially for high quantum numbers, when higher order terms in the Hamiltonian overtake the lower orders. The best known example is the 3₁₃-2₂₀ transition [144], but others exist too; for a recent astrophysical study, see [145].
The H₂O molecule is prone to special isotopic effects as it is an asymmetric top with C2v symmetry, which means that all transitions are b-type and change both K quantum numbers (Ka and Kc). As a result, there is a balance between spectroscopic constants that multiply powers of J, i.e., (B + C)/2, and those that multiply powers of K, i.e., A − (B + C)/2. Both types of constants become smaller when substituting a heavier atom, but not equally. The 3₁₃-2₂₀ transition belongs to an rP branch, with ΔJ = −1 and ΔKa = +1. The J-effect and the K-effect will be different (and different at each J and K), so that for each transition of this type, whether the frequency goes up or down depends on which effect is larger. The usual notion of all transitions going to lower frequency with a heavier isotope is generally true for rR branches and rQ branches, as long as the molecular symmetry is not affected.
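The competition between the J-effect and the K-effect can be made explicit with a deliberately crude model. The sketch below assumes a rigid near-prolate top with E(J, Ka) = B̄ J(J+1) + (A − B̄) Ka², where B̄ = (B + C)/2, so that the 3₁₃-2₂₀ energy difference becomes 6B̄ − 3(A − B̄). Water is far from the prolate limit, so this model does not predict real line frequencies; it only shows how the sign of the shift depends on which constant decreases more. All numerical constants are hypothetical values in arbitrary units.

```python
def prolate_energy(j, ka, a_const, b_bar):
    """Rigid near-prolate top energy: b_bar*J(J+1) + (A - b_bar)*Ka**2."""
    return b_bar * j * (j + 1) + (a_const - b_bar) * ka ** 2

def gap_313_220(a_const, b_bar):
    """Energy difference E(J=3, Ka=1) - E(J=2, Ka=2) in this model."""
    return (prolate_energy(3, 1, a_const, b_bar)
            - prolate_energy(2, 2, a_const, b_bar))

# Hypothetical constants for a parent species (not measured values).
parent = gap_313_220(a_const=27.90, b_bar=11.90)

# Heavier isotopologue, case 1: A - b_bar shrinks more (K-effect dominates).
k_dominated = gap_313_220(a_const=27.58, b_bar=11.88)

# Heavier isotopologue, case 2: b_bar shrinks more (J-effect dominates).
j_dominated = gap_313_220(a_const=27.58, b_bar=11.60)

shift_up = k_dominated > parent    # gap increases
shift_down = j_dominated < parent  # gap decreases
```

In this model both constants decrease on substitution in either case, yet the energy difference moves in opposite directions, which is the behaviour described above for transitions of the rP type.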

Notes on Individual Cases
Hyperfine HCN rates: The integer non-zero spin of the ¹⁴N nucleus leads to hyperfine structure in the rotational lines of HCN, which can be resolved for the lowest-J lines towards clouds which have low intrinsic line broadening due to thermal or turbulent processes. The intensity ratios of the hyperfine components are useful to estimate the optical depth of the line without assumptions on the spatial extent of the emission. While LTE is often assumed, the optical depth estimates are more accurate if collisional de-excitation is taken into account. The first calculations of hyperfine-resolved collisional data for the HCN-H₂ system were made by [146] with reduced dimensionality for the potential energy surface due to computational limitations. Upgrading the PES to full dimensionality leads to significant changes in the collisional propensities, as demonstrated by [83]. See Braine et al. (submitted to A&A) for further discussion and tests.
Rovibrational CO: Thanks to the efforts of Castro et al. [122], there now exists a set of cross sections and rate data for ortho-H₂ and para-H₂ collisions with CO that is complete for vibrational quanta v = 0-5 and rotational quanta J ≤ 40. In addition, Li et al. [147] have published a careful re-evaluation of vibration-rotation line lists for nine isotopologues of CO. We are preparing a LAMDA-format data file that incorporates all of these molecular data. Furthermore, Song et al. [123] have calculated rate coefficients for the rovibrational excitation of CO in collisions with H atoms. Woitke et al. [148] have shown that such collisions are important in the upper layers of protoplanetary disks.

Spectroscopic Updates
Another area for which LAMDA could be extended is the role of electronically excited species in the ISM. One example is O, where Krems et al. [149] have investigated the H + O(³P) → H + O(¹D) process. Even the downward rate coefficients are relatively slow; however, the de-excitation process is a means for producing kinetically hot H atoms if O(¹D) is otherwise excited in warm neutral gas. In a future update of the data file, the ¹D and ¹S states should be included, together with some of the UV resonance lines, and the data should be extended to higher temperature. For consistency, the electron and proton collisions should be included too, since they tend to be even more important in controlling the balance between the ³P, ¹D, and ¹S terms of the ground configuration of atomic oxygen. In addition, collisions with protons involve both inelastic and charge-transfer processes.
The two most recent quantum mechanical studies of the H⁺ + O ↔ H + O⁺ charge transfer process disagree with each other [28,150]. This process is extremely important in the interstellar medium and planetary ionospheres. It affects both the ion chemistry and the fine-structure excitation of O. Its low-temperature behavior needs to be known accurately because the fine-structure splittings in the ground term of atomic O are comparable to interstellar temperatures. A definitive investigation is urgently needed.

What to Do If Collisional Data Are Missing
For some commonly observed molecules, collision rates with H₂ are still lacking so that scaled He rates are being used. Examples include OCS, NO, and CH₃CN (although for the latter, calculations are being performed by M. Ben Khalifa). The CF⁺ ion also lacks H₂ rates but is less often observed; the PES was recently calculated by Desrousseaux et al. [151]. Between He and H₂, the changes in the collisional rate coefficients are usually a factor of 2-4, or up to an order of magnitude for hydrides or ions. The resulting changes in the derived abundances are usually less, but having H₂ collisional data for these species would ensure that their abundances have the same level of quality as other non-LTE estimates.
In the astrophysical literature, scalings, estimates, or educated guesses are sometimes used if actual collisional data are lacking. One example is the set of scaled radiative rates for H 2 D + introduced in Ref. [152] and used by Ref. [153] and Ref. [154]; a similar scaling was adopted in Ref. [155] for H 2 O + . Briefly, the collision rates of the radiatively allowed transitions are approximated as Q 0 * S ij , where Q 0 is a typical downward rate coefficient and S ij is the normalized line strength out of initial (upper) level i summed over all final (lower) states. This factor S ij also enters the calculation of the Einstein A coefficient from an observed microwave intensity; see, e.g., the CDMS website 9 for details. The choice of Q 0 depends on the molecule and its collision partner. Species with high dipole moments (µ ≳ 1 D), especially ions, should exhibit strong coupling to H 2 , suggesting values for Q 0 as high as 10 −10 cm 3 s −1 . Apolar species (µ ≲ 1 D), and especially neutrals, are expected to couple more weakly to H 2 , leading to ∼5-10× lower Q 0 values. Atomic H as a collision partner tends to show higher rates than H 2 , at least for light (reactive) hydrides, where we recommend high Q 0 values like those for H 2 on a polar neutral. For heavier neutral species such as NH 3 and H 2 O colliding with H, lower Q 0 values are in order, as for H 2 collisions.
The validity of scaling with S ij critically depends on the dipole selection rule that only ∆J = 1 transitions are allowed. Non-radiative transitions are assumed to have negligible collisional rate coefficients, which is of course a simplification. For certain cases, this scaling does not work at all: NH 2 is an example [116].
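The Q 0 * S ij recipe can be sketched in a few lines. The example below is a minimal illustration, not the procedure of Refs. [152-155]: it assumes that "normalized" means the line strengths out of each upper level sum to one, and the three-level system and its line strengths are made up.

```python
def scaled_rates(line_strengths, q0):
    """Approximate downward rate coefficients as Q0 * S_ij.

    line_strengths: dict mapping (upper, lower) -> line strength of a
        radiatively allowed transition (arbitrary units).
    q0: typical downward rate coefficient in cm^3 s^-1 (a guess).
    Assumption: S_ij is normalized so that, for each upper level i,
    the values sum to 1 over all lower levels j.
    """
    totals = {}
    for (upper, _lower), s in line_strengths.items():
        totals[upper] = totals.get(upper, 0.0) + s
    return {
        (upper, lower): q0 * s / totals[upper]
        for (upper, lower), s in line_strengths.items()
    }

# Hypothetical three-level system with made-up line strengths.
strengths = {(1, 0): 1.0, (2, 1): 3.0, (2, 0): 1.0}
rates = scaled_rates(strengths, q0=1e-10)
```

Transitions that are absent from the input, i.e., radiatively forbidden ones, receive no rate at all, which is exactly the simplification noted above.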
In the case of inelastic collisions between electrons and molecules, the Born approximation works well when quantum mechanical calculations are not available (see the review in [156]). The Born approximation cross sections for rotational transitions can be written in terms of radiative transition probabilities [157], as there is a strong propensity for dipole selection rules. For electron-ion collisions, the Coulomb-Born approximation is used. Both approximations work best for large dipoles (µ ≳ 2 D); otherwise, the dipole-forbidden transitions have non-negligible rates. See [40] for further discussion.
To interpret spectroscopic observations of comets and estimate the production rates of various species in the coma, most existing collisional data are not suitable since the main collision partner in this case is H 2 O. For this situation, the 'Boltzmann' recipe is often used, where collisions are supposed to redistribute the molecule according to the Boltzmann distribution [158].
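One way to read the 'Boltzmann' recipe is that the rate from any level into level j is a total collision rate multiplied by the Boltzmann population of j at the kinetic temperature. The sketch below assumes that reading; the single total rate (in a coma it would follow from the H 2 O density, a cross section, and a mean velocity), the rotor constant, and the function names are illustrative assumptions.

```python
import math

K_B_CM = 0.6950348  # Boltzmann constant in cm^-1 per K

def boltzmann_fractions(energies_cm, weights, t_kin):
    """Fractional level populations at kinetic temperature t_kin (K)."""
    terms = [g * math.exp(-e / (K_B_CM * t_kin))
             for e, g in zip(energies_cm, weights)]
    partition = sum(terms)
    return [t / partition for t in terms]

def boltzmann_recipe_rates(energies_cm, weights, t_kin, total_rate):
    """Rate from any level into level j, taken as total_rate * p_j(T).

    Assumption: every level is depopulated at the same total rate
    (s^-1), and the products follow the Boltzmann distribution.
    """
    fractions = boltzmann_fractions(energies_cm, weights, t_kin)
    return [total_rate * p for p in fractions]

# Hypothetical linear rotor: E_J = B J(J+1), g_J = 2J+1, B = 1.5 cm^-1.
levels = range(5)
energies = [1.5 * j * (j + 1) for j in levels]
weights = [2 * j + 1 for j in levels]
fractions = boltzmann_fractions(energies, weights, t_kin=50.0)
rates = boltzmann_recipe_rates(energies, weights, 50.0, total_rate=1e-3)
```

By construction, the ratio of the rates into two levels equals the ratio of their Boltzmann populations, so detailed balance holds and collisions alone drive the populations to T kin .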
Although calculated rates are always preferred over scalings and estimates, astronomers are regularly observing molecules for which no collisional data exist. This situation is unlikely to change anytime soon; however, where 10 years ago the gap between theory and observation was widening rather than closing [8], it seems that today the rates of new molecular detections in space and of newly computed collisional data have converged at a few species per year. When observing species without collisional data, guessed collision rates are clearly preferred over guessed excitation temperatures, which are assumed to hold for all transitions of a species. In turn, guessed excitation temperatures are clearly preferred over quasi-LTE (T ex = T kin ) which tends to hold only for low-J transitions in regions with high gas densities. Therefore, we consider scaled collisional data useful in cases where detailed calculations are infeasible, such as for rovibrational and/or vibronic transitions, especially for large molecules.

9 https://cdms.astro.uni-koeln.de/classic/predictions/description.html

Collisional Data
As discussed in Section 2, it now seems possible to compute highly accurate collisional rate coefficients for many simple non-reactive species in collision with electrons, He, H, and H 2 . Despite two decades of theoretical efforts, however, there are still some collisional data to update. In particular, molecules for which only He has been considered as a collision partner (e.g., NO, CH 2 , HCS + , ...) require new scattering studies with H 2 as the projectile.
Next, the newly developed methods should make it possible to provide collisional data for reactive species such as H 2 O + . The collisional excitation of complex organic molecules can also be studied using approximate scattering approaches such as MQCT. This method was compared to close-coupling results for the rotational excitation of methyl formate (HCOOCH 3 ) by He [120,159].
Finally, extension of the available data to higher temperatures and to a larger number of rotationally excited levels is desirable and sometimes even critical. With increasing temperature, vibrational excitation thresholds open and become a new challenge to overcome. In the last decade, pioneering studies have considered the excitation of the torsional motion of CH 3 OH [160] and the bending motion of triatomics such as HCN [161] by He. Recent studies have also treated the rovibrational excitation of diatomics by H 2 (see e.g., [162]). Future work will need to address the coupling of the vibration(s) of nonlinear polyatomic targets with the H 2 rotation.

Radio and Far-Infrared Data
From an astrophysical point of view, several types of collisional data are needed to interpret observations from upcoming telescopes. The first are collisional data for large organic species. Currently, the largest species in LAMDA are CH 3 OH and CH 3 CN, but molecules with up to 13 atoms have been detected in the ISM 10 , with c-C 6 H 5 CN being the current record [163]. In addition, the 'buckyball' molecules C 60 , C 60 + , and C 70 have been detected in circumstellar shells and nebulae [164]. Spectral surveys of star-forming regions with ALMA now reveal myriads of lines from organic molecules with ≈10 atoms, which are complex by astrophysical (if not by chemical) standards [165,166]. Collisional data are needed to carry out non-LTE analysis for these species, but which data are needed depends on the situation. For studies of dark clouds, collisional data for low-J transitions at low temperatures would be sufficient for the species detected in such clouds (e.g., [167,168]). For warm and dense star-forming regions, quasi-LTE models give good fits to the data, and molecular column densities are accurate to ≈30% [169] within reasonable ranges of T ex , although the assumption of a single T ex for all lines may give some additional uncertainty. Furthermore, to interpret the inferred temperatures (kinetic or radiative) and to determine the importance of infrared pumping, collisional data are essential. Detailed quantum mechanical calculations exist for CH 3 CHO [120] and are underway for NH 2 CHO, CH 3 OCH 3 , and CH 3 CHCH 2 O [170]. For larger molecules, such calculations are currently infeasible, but at the temperatures of ∼100-300 K of star-forming regions, approximate recipes and extrapolations for their collision rates would be adequate. First steps toward this goal were taken by [171] and further worked out by [87] for the rovibrational excitation of H 2 O.
Second, collisional data are needed for several small species:
• HeH + , which is a key species in the chemistry of the early Universe [172] and was recently detected in a planetary nebula [173,174]. Collisional data with electrons and H atoms exist [108,109], but data with H 2 are still missing, although the HeH + zone contains little H 2 .
• PH 3 , a key species in interstellar phosphorus chemistry [175,176], which also has applications to Jupiter and exoplanet atmospheres [177]. Its structure resembles that of NH 3 , but the barrier to inversion (umbrella-mode vibration) is so much higher in PH 3 that the inversion splitting is unmeasurably small in the ground state and thus does not lead to any inversion transitions [178]. Hence, inversion-resolved collision rates for NH 3 cannot be applied as such to PH 3 , although inversion-resolved rates could be averaged and summed to provide rotational rates.
• H 2 O + , a key species in the chemistry of H 2 O-like ions, which is a useful probe of the cosmic-ray ionization rate [179], and for which collisional data are underway as part of the ERC Consolidator Grant COLLEXISM project (PI F. Lique).
• HC 5 N, which is important to study the formation of interstellar carbon chains. Work is underway by Lique and Dawes. Ideally, a recipe should be developed to extend the calculations to HC n N with n = 7, 9, 11, ...
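The averaging and summing of inversion-resolved rates mentioned for PH 3 could look roughly as follows. This is a sketch under stated assumptions: the initial inversion sublevels are given equal weight, the input table is complete, and the level labels and rate values are made up rather than taken from any NH 3 data file.

```python
def rotational_rates(inversion_resolved):
    """Collapse inversion-resolved rates to rotational rates.

    inversion_resolved: dict mapping
        ((rot_i, inv_i), (rot_f, inv_f)) -> rate coefficient.
    Rates are summed over final inversion sublevels and averaged over
    initial ones. Assumptions: initial sublevels carry equal weight,
    and the table lists every sublevel that exists.
    """
    summed = {}  # (rot_i, inv_i, rot_f) -> sum over final sublevels
    for ((rot_i, inv_i), (rot_f, _inv_f)), k in inversion_resolved.items():
        if rot_i == rot_f:
            continue  # pure inversion transitions have no PH3 analogue
        key = (rot_i, inv_i, rot_f)
        summed[key] = summed.get(key, 0.0) + k
    grouped = {}  # (rot_i, rot_f) -> list of per-sublevel sums
    for (rot_i, _inv_i, rot_f), k in summed.items():
        grouped.setdefault((rot_i, rot_f), []).append(k)
    return {pair: sum(v) / len(v) for pair, v in grouped.items()}

# Made-up rates (cm^3 s^-1) between two hypothetical rotational levels.
table = {
    (("J1", "+"), ("J0", "+")): 2e-11,
    (("J1", "+"), ("J0", "-")): 1e-11,
    (("J1", "-"), ("J0", "+")): 3e-11,
    (("J1", "-"), ("J0", "-")): 2e-11,
    (("J1", "+"), ("J1", "-")): 9e-11,  # inversion only, skipped
}
collapsed = rotational_rates(table)
```

In this toy case the two initial sublevels give summed rates of 3e-11 and 5e-11, so the rotational rate is their mean. A statistical weighting of the sublevels would be a natural refinement.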
Third, a recipe for scalable collisional data for isotopic species would be useful to improve abundance estimates for commonly observed isotopologues such as DCO + and DCN. For isotopes of C, N, O, and heavier atoms, the relative change in mass is too small to affect the collisional properties of the molecules significantly. However, simple recipes are only likely to apply to substitutions which conserve molecular symmetry and which do not involve spectral complications (see Section 4).
Fourth, collisional data with water as partner are needed to interpret cometary observations. Full quantum data for HCN and CO have just appeared [180,181] and more work is underway. The difficulty with molecule-H 2 O collisions is twofold: first, the potential well is much deeper than in molecule-H 2 systems and, second, H 2 O has a denser rotational spectrum than H 2 . This means that full close-coupling calculations are prohibitively expensive due to the excessively large number of angular couplings. As a result, the data of [180] were obtained using partially converged coupled-states calculations while those of [181] were computed within the SACM approximation. These two pioneering studies have shown, however, that propensity rules are less pronounced than in molecule-H 2 collisions and that pair correlations and resonant energy transfer between the target and H 2 O play a critical role. We note that, for comets at large heliocentric distances (typically R h > 3 au), CO becomes another important or even dominant collider because of its low sublimation temperature.

Near- and Mid-Infrared Data
The fifth and final type of needed data are collisional data for atoms and small molecules to interpret spectra from upcoming mid-infrared observatories, especially JWST/MIRI [182], ELT/METIS [183], and SPICA/SMI [184]. These facilities will probe the physical conditions and the chemical composition of the inner parts of protoplanetary disks. Calculated rate coefficients already exist for CO rovibrational and H 2 and HD rotational lines (Tables 2 to 5). In Nijmegen, measurements are ongoing for the ν 2 ('umbrella') mode of NH 3 [185], and calculations for the rovibrational transitions of CO 2 [186] and C 2 H 2 (Selim et al., in prep). Recently, rovibrational calculations have also been made for H 2 O-H 2 [187] and HCN-He [161], while for species like SO 2 and OH that are commonly observed at mid-IR wavelengths, no data for their mid-IR transitions exist yet. Particularly important are symmetric molecules without a dipole moment such as CH 4 [196] that cannot be observed at (sub)mm wavelengths, but are key players in the organic chemistry of inner disks [188]. In addition, atomic line data are needed, in particular for the mid-IR fine structure lines of Ne + , Fe, and Fe + . For the ionic species, data exist for collisions with electrons, which tend to dominate the excitation. The corresponding data files are in preparation [189,190]. Likewise, Pelan and Berrington [191] have calculated e-impact rates included in the upcoming LAMDA file for Fe, although their work applies only to the two lowest terms of even parity, viz. a 5 D and a 5 F. For collisions of atoms and ions with H and especially H 2 , the calculations are complicated as reactions compete with inelastic (de)excitation.

Spectroscopic Data and Radiative Transfer Tools
Finally, a number of developments in spectroscopy and radiative transfer are needed for proper interpretation of astronomical spectra from upcoming facilities:
• The interpretation of laboratory spectra through Hamiltonian models and the creation of synthetic line lists for comparison with observed spectra is commonly done with the programs SPFIT and SPCAT, developed by Herb Pickett. Currently, the use of these programs requires significant specialization and training. User-friendly versions of these programs are needed to ensure proper interpretation of laboratory spectra in the future. The PGOPHER 11 package [192] is a major step in this direction.
• (Sub-)mm laboratory spectra of large/complex organic molecules are needed, including vibrational modes and isotopic species, to beat line confusion in ALMA spectral surveys such as PILS [169] and ReMoCa [193].
• An update to RADEX is in preparation, which is able to include formation/destruction processes in the radiative transfer problem. Such calculations require state-to-state reactive collisional data, which exist only for some cases such as CH + [194] and OH + [131]. This new version of RADEX is also capable of multi-molecular spectral synthesis, similar to the programs XCLASS [195] and CASSIS 12 .

Conclusions
In conclusion, it is encouraging to see that, for many species of astrophysical interest, accurate collisional data now either exist or are underway. Thanks to one to two decades of dedicated work, the calculation of collisional data is now keeping pace with the rate of new molecular detections. With many updates planned or ongoing, we consider the LAMDA database ready for the 2020s.

Conflicts of Interest:
The authors declare no conflict of interest.