A Decade with VAMDC: Results and Ambitions

Abstract: This paper presents an overview of the current status of the Virtual Atomic and Molecular Data Centre (VAMDC) e-infrastructure, including the current status of the VAMDC-connected (or to be connected) databases, updates on the latest technological developments within the infrastructure and a presentation of some application tools that make use of the VAMDC e-infrastructure. We analyse the past 10 years of VAMDC development and operation, and assess their impact both on the field of atomic and molecular (A&M) physics itself and on heterogeneous data management in international cooperation. The highly sophisticated VAMDC infrastructure and the related databases developed over this long term make them a perfect resource of sustainable data for future applications in many fields of research. However, we also discuss the current limitations that prevent VAMDC from becoming the main publishing platform and the main source of A&M data for user communities, and present possible solutions under investigation by the consortium. Several user application examples are presented, illustrating the benefits of VAMDC in current research applications, which often need A&M data from more than one database. Finally, we present our vision for the future of VAMDC.


Introduction
The Virtual Atomic and Molecular Data Centre (VAMDC) has been developed to interconnect atomic and molecular databases, thus providing a single location where users can access atomic and molecular (A&M) data. The VAMDC portal currently provides access to 38 databases containing a wide range of data, from atomic spectroscopy (The Vienna Atomic Line Database (VALD), Section 2.1.2) to polycyclic aromatic hydrocarbon (PAH) theoretical data (PAH, Section 2.2.2). This paper presents the current status of the VAMDC project and its underlying databases, and gives examples of the exploitation of VAMDC by exemplar user communities; we also describe open issues in the project and our vision for future developments.
The VAMDC project (http://www.vamdc-project.vamdc.eu/) originated as a European Union Framework 7 (FP7) research infrastructure project [1] that was funded between 2009 and 2012, and was subsequently extended through the SUP@VAMDC project (http://www.sup-vamdc.vamdc.org/) [2] between 2012 and 2014. Since 2014, the VAMDC Consortium has operated as an independent body comprising some 35 research groups. This consortium has continued to operate and develop the data centre, as well as pursuing other related objectives in data science. The VAMDC consortium published its original aims in 2010 [1], which were updated in 2016 following the formal launch of the independent consortium [3]. The public entry point of the VAMDC consortium is its website (http://www.vamdc.org), which provides access to data and documentation, as well as consortium-related information such as current membership and how to join us (http://www.vamdc.org/structure/how-to-join-us/).

Current Status of VAMDC Connected Databases
The VAMDC nodes are distributed, each being hosted at a member's or partner's site. At present, most of the databases included in the VAMDC e-infrastructure are used primarily in astrophysics. The common output format is the XML Schema for Atoms, Molecules and Solids (XSAMS) (see Section 3.1.1), which uses the tree-structured form of the data model. All of the current VAMDC databases are listed on the VAMDC portal (https://portal.vamdc.eu/vamdc_portal/nodes.seam), which also provides, for each database, a short introduction, a link to its public graphical interface and contact details of a member of the database team to whom inquiries may be directed. In addition, it is possible to list the atoms and the molecules that can be queried from those databases.
Table 1 provides a summary of the 38 databases currently accessible via VAMDC. It includes eight databases that are in the process of joining the consortium. It should be noted that a database might be offline for a period of time due to maintenance issues.
Table 1. Databases connected (C) to the Virtual Atomic and Molecular Data Centre (VAMDC) and to be connected (TC). User fields are labelled as follows for applications in astrophysics and planetary science: stellar physics (STEL), solar physics (SOL), interstellar medium (ISM), earth (E), planets (PL), exoplanets (EXO), brown dwarfs (BDW), comets (C). Partnerships are indicated as Full Members (FM) according to the VAMDC consortium MoU and partners following the VAMDC technical quality chart only (P).

Database        Data Classification                             Applications
DREAM (d)       radiative data for rare earths [39]             STEL, SOL, plasmas, lighting industry
IAMDB (d)       A+M spectroscopy, atomic collision (a)          Astrophysics, other
PEARL (d)       atomic processes [40]                           STEL, SOL, plasma, fusion
Clusters (d)    cluster size distributions, condensation (a)    ISM, P, biology
(a) Paper in preparation; (b) FM, C; (c) P, C; (d) P, TC; (e) FM, TC.
Among the databases already present in the VAMDC e-infrastructure in 2016 and cited in our last publication [3], some have evolved with respect to their data content, functionalities and internal structure, and improved their interoperability with the VAMDC ecosystem. Some of them have not evolved, and two have been disconnected from the VAMDC ecosystem. Ten databases have been added. Among these new databases is a new node at The National Institute for Fusion Science (NIFS) in Japan.

Evolution of VAMDC Nodes/Connected Databases Since 2016
Each individual database incorporated in VAMDC will now be discussed, highlighting recent developments and improvements.

NIFS Databases
The National Institute for Fusion Science (NIFS) has compiled and developed an extensive atomic and molecular numerical database on collision processes, which has been open for public access (http://dbshino.nifs.ac.jp) since 1997. A data compilation on collisional cross sections for hydrogen, its isotopes and helium [41,42] was initiated in 1973 by a collaborative working group of atomic and plasma researchers from Japanese universities, organised at the Institute of Plasma Physics, Nagoya University. Subsequent compilations were published in Institute of Plasma Physics (IPPJ)-Atomic and Molecular Data (AM) reports in 1977-1989 and as NIFS-DATA (atomic, molecular, and plasma material interaction data) reports since 1989, covering electron impact ionisation and excitation cross sections, heavy particle collision cross sections and plasma-wall interaction properties.
The first retrieval and display database system, the Atomic and Molecular Data Interactive System (AMDIS), was constructed on a mainframe computer in 1981 for electron impact ionisation and excitation cross sections of atoms and atomic ions [43], and was subsequently extended to include other collision processes. Currently, the database system consists of eight sub-databases, accessible via the internet; full details are described by Murakami et al. [44].
Numerical data on collision cross sections and rate coefficients for various collision processes are compiled mainly from the published literature, together with bibliographic data; additional information, such as the experimental or theoretical methods used to derive the data, is attached. Initially, data were collected for elements that were important mainly for fusion science; later, the database was extended to include many more atoms and molecules relevant to various plasma applications.
Usually, several data sets obtained from different publications are stored for the same collision process. Users can compare those data sets to check their reliability. Data can be queried by element, charge state or various other query fields such as initial and final states, author and publication year, and the retrieved data are displayed as a numerical table or as a graph.
One of the sub-databases, AMDIS-IONIZATION, has been available from the VAMDC portal with search functionality by element since 2017 [4]. AMDIS-IONIZATION holds electron impact ionisation cross sections and rate coefficients for atoms and atomic ions. Data sets of cross sections as a function of collision energy, or of rate coefficients as a function of electron temperature, are available for single or multiple ionisation processes. Initial and final electronic states are given where the original data source provides them. In the future, the addition of electron impact excitation cross sections and rate coefficients for atoms and atomic ions (AMDIS-EXC) to the VAMDC portal is planned.
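The two kinds of data mentioned above are related: a rate coefficient at a given electron temperature is the cross section averaged over the electron velocity distribution. The following is a minimal sketch of that relation, assuming a non-relativistic Maxwellian distribution and a cross section tabulated on an energy grid. The function name, units and integration scheme are illustrative assumptions and are not taken from the NIFS system.

```python
import math

C_CM_S = 2.99792458e10   # speed of light in cm/s
ME_C2_EV = 510998.95     # electron rest energy in eV


def maxwellian_rate_coefficient(energies_ev, cross_sections_cm2, kt_ev):
    """Return <sigma*v> in cm^3/s for Maxwellian electrons (hypothetical helper).

    energies_ev        -- increasing grid of collision energies in eV
    cross_sections_cm2 -- cross section at each grid energy in cm^2
    kt_ev              -- electron temperature expressed as kT in eV
    """
    if len(energies_ev) != len(cross_sections_cm2) or len(energies_ev) < 2:
        raise ValueError("need two equally long grids with at least 2 points")
    if kt_ev <= 0:
        raise ValueError("kT must be positive")

    def integrand(energy, sigma):
        # Non-relativistic electron speed at this kinetic energy.
        speed = C_CM_S * math.sqrt(2.0 * energy / ME_C2_EV)
        # Maxwellian energy distribution function, normalised to 1.
        f_e = 2.0 * math.sqrt(energy / math.pi) * kt_ev ** -1.5 * math.exp(-energy / kt_ev)
        return sigma * speed * f_e

    values = [integrand(e, s) for e, s in zip(energies_ev, cross_sections_cm2)]
    # Trapezoidal integration over the tabulated grid.
    total = 0.0
    for i in range(1, len(values)):
        total += 0.5 * (values[i] + values[i - 1]) * (energies_ev[i] - energies_ev[i - 1])
    return total
```

The grid should extend well beyond kT (and start at or below the ionisation threshold) for the integral to converge; with a constant cross section the result reduces to the cross section times the mean thermal speed, which is a convenient sanity check.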

VALD
The Vienna Atomic Line Database (VALD) compilation is a critically assessed collection of radiative transition data aimed primarily at the stellar astrophysics community. VALD contains data on energy levels, wavelengths, oscillator strengths and line broadening parameters for nearly all stable elements in the periodic table and for a few simple molecules. For atoms, VALD includes six ionisation stages. The VALD interface allows a search of the whole data collection or of a restricted subset that is based on high-precision laboratory measurements. The Moscow VAMDC node serves high-precision data only, while the Uppsala VAMDC node provides the most complete VALD data set.
VALD also includes atomic data for individual rare-earth elements/ions, also provided by the Database of Rare Earths at Mons University (DREAM) database (https://hosting.umons.ac.be/html/agif/databases/dream.html) [45] (see Section 2.3.4). These atomic data are extracted via VALD extraction tools following an adopted quality ranking. Therefore, only a part of the DREAM data collection is accessible through the VALD-VAMDC nodes.
VALD is under continuous development, both in terms of functionality and data content. New data sets are systematically compared with existing ones and with experimental data in order to establish data quality rankings. For any given transition, all data elements are merged according to the ranking, offering the best quality result for the end user while preserving all relevant bibliographic references. Recent developments include the gradual introduction of energy level and spectral line data for individual isotopes and isotopologues. Currently the work is complete for Li, Ca, Ti, Cu, Ga, Ba, Eu atoms and for isotopologues of CN, TiO, C₂, CH, CO, OH, MgH and SiH. VALD now also provides data for the hyperfine-structure splitting of ⁶Li I, ⁷Li I, ²³Na I, ²⁷Al I, ²⁷Al II, ³⁹K I, ⁴⁰K I, ⁴¹K I, ⁴⁵Sc I, ⁴⁵Sc II, ⁴⁷Ti I, ⁴⁹Ti I, ⁴⁷Ti II, ⁴⁹Ti II, ⁵⁰V I, ⁵¹V I, ⁵¹V II, ⁵⁵Mn I, ⁵⁵Mn II, ⁵⁷Fe I, ⁵⁹Co I, ⁵⁹Co II, ⁶¹Ni I, ⁶³Cu I, ⁶⁵Cu I, ⁶⁷Zn I, ⁶⁷Zn I, ⁶⁹Ga I, ⁶⁹Ga I, ⁷¹Ga I, ⁷¹Ga II, ⁸⁵Rb I, ⁸⁷Rb I, ⁸⁷Sr II, ⁸⁹Y II, ⁹³Nb II, ⁹⁵Mo II, ⁹⁷Mo II, ¹²⁷I I, ¹²⁷I I, ¹³⁵Ba II, ¹³⁷Ba II, ¹³⁹La II, ¹⁴¹Pr I, ¹⁴¹Pr II, ¹⁵¹Eu II, ¹⁵³Eu II, ¹⁵⁵Gd II, ¹⁵⁷Gd II, ¹⁵⁹Tb II, ¹⁶⁵Ho I, ¹⁶⁵Ho II, ¹⁷¹Yb II, ¹⁷³Yb II, ¹⁷⁵Lu I, ¹⁷⁵Lu II, ¹⁷⁶Lu I, ¹⁸¹Ta I, ¹⁸¹Ta II, ¹⁹¹Ir I, ¹⁹³Ir I, ²⁰³Tl I, ²⁰⁵Tl I and ²⁰⁹Bi III. The latest developments of VALD and its interface to VAMDC have been presented in several refereed publications [46-49] and conference proceedings [50-52]. It should be noted that the VALD database is linked to the VAMDC Query Store (see Section 3.2.2).

NIST ASD
The Atomic Spectra Database (ASD) [7] at the National Institute of Standards and Technology (USA) contains critically evaluated atomic data including energy levels, radiative transition probabilities and oscillator strengths, ionisation potentials, and observed and accurately calculated wavelengths of spectral lines between the hard X-ray and infrared regions of spectra. As of July 2020, ASD provides data on more than 112,000 energy levels, 280,000 spectral lines and 118,000 radiative transition probabilities for elements from H (Z = 1) to Ds (Z = 110). The majority of the data on energies and spectral lines were collected and evaluated from experimental papers, and in most cases realistic uncertainties are provided as well. ASD is directly linked to the set of the National Institute of Standards and Technology (NIST) Atomic Spectroscopy Bibliographic Databases [53], which allows immediate access to the original sources of data. In addition to a tabular output of data in various formats, ASD also offers rich graphical services, e.g., online-generated Grotrian diagrams of levels and transitions. ASD serves as the basis for the NIST Laser-Induced Breakdown Spectroscopy (LIBS) Database [54], which allows on-the-fly generation of realistic spectra for Saha-Boltzmann plasmas that are used for the analysis of elemental abundances and diagnostics of terrestrial and astrophysical plasmas.

Spectr-W³
The Spectr-W³ project (http://spectr-w3.snz.ru) is a long-term collaboration between the Russian Federal Nuclear Centre-All-Russian Institute of Technical Physics (RFNC-VNIITF) and the Joint Institute for High Temperatures of the Russian Academy of Sciences (JIHT RAS) [8]. At present, Spectr-W³ is the largest available database providing information on the spectral properties of multicharged ions. The database contains over 450,000 records of experimental, theoretical and compiled numerical data on ionisation potentials, energy levels, wavelengths, radiative and autoionisation widths, and satellite-line intensity factors for free atoms and ions, as well as (optionally) fitting parameters and analytic formulae to represent electron-collisional cross sections and rates. References to the original sources and comments on the methods of data acquisition, etc., are also provided. Since 2016, a new section of Spectr-W³ providing graphical data on X-ray emission spectra (densitograms) recorded from various plasma sources has been made available to users. Each densitogram image is characterised by a set of fields that allow queries by the shortest and longest wavelengths of interest, by up to five chemical elements and ionic isosequences whose emission contributes to the recorded spectrum, by the upper and lower principal quantum numbers of the optical single-electron transitions corresponding to the spectral lines shown, and by the reference to the relevant publication. Densitogram database records are also supplied with comments elucidating details of the experimental measurements.

CHIANTI
CHIANTI (www.chiantidatabase.org), first released in 1996 [55], is a well-established atomic database and modelling code for optically thin plasmas. For a recent overview of the status of the database and future plans, see [9]. The database consists of a series of American Standard Code for Information Interchange (ASCII) files with all the relevant atomic collisional and radiative rates (collisional excitation rates by electron and proton impact, transition probabilities, as well as theoretical/experimental wavelengths) required to calculate spectral line emissivities. The database originally only included ions of astrophysical importance, but, in recent versions, data for minor ions and some neutrals have also been added.
Radiative transition rates and wavelengths from CHIANTI versions 6 and 7.1 were included in VAMDC. The CHIANTI database also has ionisation/recombination rates and various other types of data that are specific to the modelling codes, but these were not included in VAMDC. CHIANTI version 8 [56] included new atomic rates for ions in the Li, B and Ne isoelectronic sequences, plus atomic data for several iron ions participating in the emission of light from the solar corona. Excitation rates among all states have been included (for the modelling of high-density plasma), which has increased the size of the database significantly. Version 8 also changed the format of some of the main ASCII files. CHIANTI version 9 [57] then introduced a significant change in the structure of the data for several ions to allow the calculation of the emissivities of satellite lines, with autoionising states and rates having been added to those of the bound levels.
These changes mean that a straightforward update to include all the changes of CHIANTI version 8 onwards within VAMDC could not be carried out, so at present the VAMDC portal only accesses CHIANTI version 7.1. CHIANTI version 10 is now in development, and the integration of this version, with the latest CHIANTI data, into VAMDC is planned.

TIPbase-TOPbase
The Iron Project (IP) and The Opacity Project (OP) databases provide energy levels/terms, radiative transition probabilities, photoionisation cross sections and collision strengths for a large selection of ions in the range from H to Ni. TOPbase (The OP database) contains the OP data for radiative processes [11,58,59], and TIPbase the IP data for collisional and radiative processes [10]. The data have been calculated using state-of-the-art computer programs developed and maintained by the IP-OP members, including the R-matrix suite of codes [60,61] and the AUTOSTRUCTURE code [62]. These data are relevant for experimental analysis, theoretical comparisons and for various astrophysical or laboratory experimental applications. The OP radiative data are used to calculate monochromatic and mean opacities [63] required in stellar codes and for the analysis of experiments. TOPbase and TIPbase are hosted at the University of Strasbourg in France (http://cdsweb.u-strasbg.fr/OP.htx). Sets of new data (of a type not previously present in the database) and recalculated data (obtained with an improved methodology) are being implemented in TOPbase for Fe I (new), Fe II (new) and Fe XVII (recalculated), as well as for all the Ni ions (new); other recalculated ions will follow. A new service for opacity tables is available on the TIPTOP webserver [64] (but is not yet accessible through VAMDC). It should be noted that the TIPbase-TOPbase databases are linked to the VAMDC Query Store (see Section 3.2.2).

Stark-B
Stark-B [12,65-67] is devoted to the modelling and spectroscopic diagnostics of various plasmas in astrophysics, laboratory experiments, laser equipment design, laser-produced plasma analysis and inertial fusion research. It contains Stark widths and shifts of isolated lines of atoms and ions due to collisions with electrons, protons, ionised helium (the most important colliders for stellar atmospheres) and other ions. The data are calculated using the semi-classical perturbation [68] and modified semi-empirical methods [69]. Stark-B is continuously updated by adding new data and by introducing new facilities for their use. Under the website option "Data history", there are "New data sets" and "Updated data sets", which make available a description of newly added data, including the date of import and details of the modifications for revised data. In order to enable inclusion of data from Stark-B in the computer codes used for stellar atmosphere modelling and other numerical calculations, the tabulated data can be fitted as a function of temperature. In particular, a fitting formula for interpolation of the displayed data for different temperatures has been derived [70]. Thus, to each table of Stark widths and shifts, an additional table has been added with the coefficients needed for the fitting formula. The latest updates are further described in the current issue [13]. It should be noted that the Stark-B database is linked to the VAMDC Query Store (see Section 3.2.2).
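To illustrate how a table of fitting coefficients can be used in a modelling code, the sketch below evaluates a Stark width at an arbitrary temperature. It assumes that the fit is a quadratic polynomial in log10 T, i.e., log10 w = a0 + a1 log10 T + a2 (log10 T)^2; this functional form, the function name and the coefficient values are assumptions made for illustration only, and the actual formula and coefficients should be taken from [70] and from the Stark-B tables.

```python
import math


def stark_width_from_fit(temperature_k, a0, a1, a2):
    """Evaluate a Stark width from an assumed log-quadratic fit.

    Assumed form (illustrative, see lead-in):
        log10(w) = a0 + a1*log10(T) + a2*log10(T)**2
    The width is returned in whatever unit the coefficients were fitted in.
    """
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    x = math.log10(temperature_k)
    return 10.0 ** (a0 + a1 * x + a2 * x * x)


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
example_width = stark_width_from_fit(1.0e4, 1.0, -0.5, 0.0)
```

A fit of this kind should only be evaluated inside the temperature range covered by the original table, since a polynomial in log T can behave poorly when extrapolated.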

CDMS and JPL Spectral Line Catalog
Both these databases (https://cdms.astro.uni-koeln.de, https://spec.jpl.nasa.gov) provide spectral data for molecular species that are or may be observed in various astronomical sources, usually by radio astronomical means, or that are of use in remote sensing. The Jet Propulsion Laboratory (JPL) catalog [15,71,72] and the Cologne Database for Molecular Spectroscopy (CDMS) [14,72-74] have been previously described to some extent in the literature. Briefly, the database content is generally restricted to effective Hamiltonian predictions and associated assigned experimental data for quantum transitions, with entry fields including line position with accuracy, intensity, lower state energy and quantum numbers. These restrictions facilitate the transfer of high-precision laboratory data into comprehensive predictions within the associated range of quantum numbers, and the reconciliation and/or extension of models from multiple laboratory studies. Separate entries exist for different isotopic species and usually also for different vibrational states. Updates and new entries to the spectral data are made at irregular intervals, with a focus on detection in the interstellar medium as well as on candidate target molecules for terrestrial remote sensing.
The incorporation of these two databases in VAMDC, and thus the application of its standards (see Section 3.1.1), not only greatly simplified the interoperability between the databases and improved the readability of the content (e.g., quantum number format), but also provided additional features and information. A listing of energy levels as well as all files that have been used to generate the data are commonly provided. Partition functions are now tabulated for 110 temperatures up to 1000 K. In the context of VAMDC, a Python library (vamdclib) has been developed that allows queries of data via VAMDC protocols, with the information being stored in a local SQLite3 database for use in third-party applications (e.g., XCLASS; see the use cases section below). Both databases are linked to the VAMDC Query Store (Section 3.2.2), and thus support VAMDC's generation of digital object identifiers (DOIs) for individual queries that can be used to cite these data in publications.
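A tabulated partition function usually has to be evaluated at temperatures that fall between the grid points. The following is a minimal sketch of one common approach, linear interpolation in log Q versus log T, which is exact for a power law in T. It does not use vamdclib, whose interface is not described here; the function name and the table values are hypothetical.

```python
import bisect
import math


def interpolate_partition_function(temperatures_k, q_values, t_k):
    """Interpolate a tabulated partition function in log-log space.

    temperatures_k -- strictly increasing grid of temperatures in K
    q_values       -- partition function value at each grid temperature
    t_k            -- temperature at which Q is wanted (inside the grid)
    """
    if len(temperatures_k) != len(q_values) or len(temperatures_k) < 2:
        raise ValueError("need two equally long tables with at least 2 points")
    if not temperatures_k[0] <= t_k <= temperatures_k[-1]:
        raise ValueError("temperature outside the tabulated range")

    # Index of the first grid temperature that is >= t_k.
    i = bisect.bisect_left(temperatures_k, t_k)
    if temperatures_k[i] == t_k:
        return q_values[i]

    t0, t1 = temperatures_k[i - 1], temperatures_k[i]
    q0, q1 = q_values[i - 1], q_values[i]
    fraction = (math.log(t_k) - math.log(t0)) / (math.log(t1) - math.log(t0))
    return math.exp(math.log(q0) + fraction * (math.log(q1) - math.log(q0)))


# Hypothetical table following Q ~ T**1.5, as for a rigid asymmetric rotor.
grid_t = [10.0, 100.0, 1000.0]
grid_q = [t ** 1.5 for t in grid_t]
q_300 = interpolate_partition_function(grid_t, grid_q, 300.0)
```

The function deliberately refuses to extrapolate, because partition functions can rise much faster than a power law once vibrationally excited states become populated.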
Creating new entries or updating existing ones is an important part of the work for the CDMS. As of July 2020, there are 1020 entries in the CDMS, up from 808 four years ago. Recent activities have emphasised species that are or may be observable with the Atacama Large Millimeter/submillimeter Array (ALMA), the Northern Extended Millimeter Array (NOEMA) and similar facilities. Examples include metal-containing di- and triatomics, which are relevant for the circumstellar envelopes of late-type stars, small to moderately sized organics, including cyclic molecules, which were or may be detected in star-forming regions, and several radicals or cations. Some effort has been made to make vibrationally excited state data available, since transitions pertaining to low-lying excited vibrational states have been observed for several small to mid-sized organics (<15 atoms) in the dense and warm parts of star-forming regions. Very highly excited states of diatomics and some larger molecules, such as HCN, have been detected, in particular in the envelopes of C- and O-rich late-type stars. A detailed account of these activities is planned in the near future.

HITRAN
The High-Resolution Transmission Molecular Absorption Database (HITRAN) [16] is a canonical compilation of the molecular spectroscopic parameters that are required as input to radiative transfer codes. HITRAN provides a unique set of the best available parameter values, assimilated from both experimental and theoretical studies. Before adopting data into the database, the HITRAN group performs independent evaluations (controlled laboratory, atmospheric retrievals and theoretical analyses) wherever possible. While the target audience of HITRAN is scientists who study terrestrial and planetary atmospheres, the applications of HITRAN span a great many fields of science, engineering and medicine. The database was first established several decades ago [75] and has been under continuous development since then [16,76-83]. The latest edition of the database is HITRAN2016 [16], which is distributed through HITRANonline [84] and is accessible through the VAMDC portal. The VAMDC portal provides access to the traditional line-by-line high-resolution section of HITRAN, which contains spectral parameters for 49 molecules along with their significant isotopologues appropriate for terrestrial and planetary atmospheric applications. HITRAN provides references to the original sources for the majority of the parameters for every transition. The details of how such referencing was incorporated are described by Skinner et al. [85] in this Special Issue. It is worth pointing out that, with the exception of some diatomic molecules (including hydrogen halides) for which the line lists can be used at temperatures up to 5000 K or higher, the majority of the HITRAN line lists are intended to be used at the lower temperatures encountered in the terrestrial atmosphere.
For higher temperature applications, it is recommended to use the High-Temperature Molecular Database (HITEMP) [86], which mimics the format of HITRAN but contains a substantially larger number of lines. However, at the present time, the HITEMP database provides data for only eight molecules [86-89], and is not currently accessible through VAMDC due to the large number of transitions it contains.
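Because line intensities in such line lists are stored at a reference temperature, a user normally rescales them to the temperature of interest. The sketch below applies the expression commonly used for this purpose in the HITRAN literature, which combines the ratio of partition sums, the Boltzmann factor of the lower state and the stimulated emission correction. The function name is hypothetical, and the partition sums are supplied by the caller rather than computed here.

```python
import math

C2_CM_K = 1.4387769  # second radiation constant hc/k in cm K


def scale_line_intensity(s_ref, wavenumber_cm, e_lower_cm, q_ref, q_t,
                         temperature_k, t_ref_k=296.0):
    """Rescale a line intensity from t_ref_k to temperature_k.

    s_ref         -- intensity at the reference temperature
    wavenumber_cm -- transition wavenumber in cm^-1
    e_lower_cm    -- lower-state energy in cm^-1
    q_ref, q_t    -- total partition sums at t_ref_k and temperature_k
    """
    if temperature_k <= 0 or t_ref_k <= 0:
        raise ValueError("temperatures must be positive")
    # Change in the lower-state Boltzmann population.
    boltzmann = math.exp(-C2_CM_K * e_lower_cm * (1.0 / temperature_k - 1.0 / t_ref_k))
    # Change in the stimulated emission correction.
    stimulated = ((1.0 - math.exp(-C2_CM_K * wavenumber_cm / temperature_k))
                  / (1.0 - math.exp(-C2_CM_K * wavenumber_cm / t_ref_k)))
    return s_ref * (q_ref / q_t) * boltzmann * stimulated
```

The rescaling is only as complete as the underlying list: lines that fall below the intensity cutoff at the reference temperature are absent, which is why a room-temperature list loses opacity when it is pushed to high temperatures.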
HITRAN also provides a section that contains high-resolution experimental cross sections for molecules with very dense spectra that are not amenable to a full quantum-mechanical description. A recent update [90] provided spectra of almost 300 molecules at various pressures and temperatures. There is also a section of the compilation that provides collision-induced absorption cross sections, which has been updated recently [91]. These data are not accessible through VAMDC yet but can be accessed through HITRANonline (https://hitran.org) or through the HITRAN Application Programming Interface (HAPI) [92]. A very extensive effort is currently underway to release the new HITRAN2020 edition of the database.

S&MPO
Spectroscopy and Molecular Properties of Ozone (S&MPO) is an information system [17] jointly developed by Reims University (France), the Institute of Atmospheric Optics (Russia) and Tomsk State University (Russia). The line-by-line list of vibration-rotation transitions, energy levels, transition moments and other related information can be requested from S&MPO via the VAMDC portal. Since the previous edition of VAMDC [3], considerable new spectroscopic information has been included using experimental data and extended analysis of ozone bands in the infrared [93] and isotopic spectra enriched by ¹⁷O and ¹⁸O oxygen [94,95]. Supplementary information concerning theoretical spectra simulation, comparisons with experimental records, dipole moment functions [96] and ab initio intensities for strong lines [97] can be accessed via the Reims (http://smpo.univ-reims.fr) and Tomsk (http://smpo.iao.ru) sites. The S&MPO interactive software was completely rewritten to make it relevant to current trends in internet application development. A new version (3.0) of the S&MPO relational database and information system is now operational and provides supplementary functionalities for the fully updated graphical interface. More information about the version 3.0 S&MPO information system can be found on its public website (https://smpo.univ-reims.fr/news/en_2019-09-25_02-). Applications of S&MPO may include education/training in molecular and atmospheric physics, studies of radiative processes and spectroscopic analysis.

MeCaSDa, ECaSDa, TFMeCaSDa, SHeCaSDa, GeCaSDa, RuCaSDa, TFSiCaSDa, UHeCaSDa
These databases contain calculated rovibrational transitions (mostly infrared absorption and also some Raman scattering lines) for highly symmetrical molecules. They result from the analysis and fitting of effective Hamiltonians and transition moments using experimental spectra recorded at high resolution. The calculation uses a model [98] and programs [99] developed by the group at the Laboratoire Interdisciplinaire Carnot de Bourgogne, Dijon, France, and based on group theory and tensorial formalism [100].
The MeCaSDa database [101] contains methane (CH₄) lines, which constitute the main research subject of the Dijon group. It is Dijon's biggest database and is of importance for atmospheric and planetary applications. Its contents have recently been updated, and it now has more than 16 million lines [18].
Two new databases have been added recently: TFSiCaSDa [19], concerning the SiF₄ molecule with applications to volcanic gases, and UHeCaSDa (UF₆), for applications in the nuclear industry. The latter database is an exception: no experimental data are publicly available for this peculiar radioactive molecule and calculations are only based on literature data [102,103]. More databases of such highly symmetrical molecules may be developed in the near future. The databases SHeCaSDa, MeCaSDa, GeCaSDa, TFMeCaSDa and RuCaSDa are linked with the Query Store service (Section 3.2.2), while the databases ECaSDa, TFSiCaSDa and UHeCaSDa are currently being connected to the Query Store.

CDSD-296, CDSD-1000, CDSD-4000, ASD-1000, NOSD-1000 and NDSD-1000
Six molecular databanks in VAMDC of atmospheric and astrophysical interest have been provided by the Laboratory of Theoretical Spectroscopy, V.E. Zuev Institute of Atmospheric Optics (IAO), Siberian Branch, Russian Academy of Sciences, Russia. These include three versions of the Carbon Dioxide Spectroscopic Databank (CDSD-296, CDSD-1000, CDSD-4000), the Acetylene Spectroscopic Databank (ASD-1000), the Nitrous Oxide Spectroscopic Databank (NOSD-1000) and the Nitrogen Dioxide Spectroscopic Databank (NDSD-1000). These databanks provide positions, intensities, air- and self-broadened half-widths, coefficients of temperature dependence of the half-widths and quantum numbers associated with the transitions. The line positions and intensities are calculated within the framework of the method of effective operators. The line shape parameters are calculated using different theoretical approaches or empirical equations in terms of the rotational quantum numbers.
CDSD-296 and CDSD-1000 have been described in our previous paper [3]. CDSD-296 has been updated recently [20] to improve the accuracy of the line parameters. CDSD-4000 contains more than 628 million lines of the four most abundant isotopologues of carbon dioxide and covers the 226-8310 cm⁻¹ wavenumber range [21]. The reference temperature is T_ref = 296 K and the intensity cutoff is 10⁻²⁷ cm⁻¹/(molecule cm⁻²) at 4000 K. ASD-1000 has more than 30 million lines of the principal isotopologue of acetylene in the 3-10,000 cm⁻¹ spectral range [24]. The database is adapted for temperatures up to 1000 K with an intensity cutoff of 10⁻²⁷ cm⁻¹/(molecule cm⁻²). The line intensities and pressure broadening coefficients are given for T_ref = 296 K. NOSD-1000 contains more than 1.4 million lines of the principal isotopologue of nitrous oxide and covers the 260-8310 cm⁻¹ region [22]. The intensity cutoff is at 10 […]. Finally, NDSD-1000 has more than 1 million lines in the 466-4776 cm⁻¹ wavenumber range [23]. The intensity cutoff is at 10⁻²⁵ cm⁻¹/(molecule cm⁻²) at T_ref = 1000 K. The broadening parameters are given for two reference temperatures: […].
Currently, the above-mentioned databanks are presented in VAMDC as separate nodes with a similar basic set of "restrictable" parameters (wavenumber, wavelength, line strength, InChIKey and state energies); restrictables refer to the type of data queried by VAMDC (https://standards.vamdc.eu/dictionary/restrictables.html). The common set of returnable entities includes radiative transitions, sources, isotopic species, environment signatures and function descriptions for the temperature and pressure dependencies of line shape parameters. The parametric content of these returnables, however, depends on the original structure of the particular databank. Unification of these databanks, which is a necessary step towards merging them into a single VAMDC node, is a subject for future work.
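To make the role of restrictables more concrete, the sketch below assembles a query in which each condition constrains one restrictable, and wraps it in a synchronous request URL. It assumes the VSS2 query syntax ("SELECT ALL WHERE ...") and the request parameters LANG, REQUEST, FORMAT and QUERY of the VAMDC-TAP protocol; the keyword names RadTransWavenumber and InchiKey should be checked against the dictionary page cited above, the node address is a placeholder, and both helper functions are hypothetical. No request is actually sent.

```python
from urllib.parse import urlencode


def build_vss2_query(conditions):
    """Build a VSS2-style query from (restrictable, operator, value) tuples."""
    clauses = []
    for name, operator, value in conditions:
        if isinstance(value, str):
            # Quote string literals and escape embedded single quotes.
            value = "'" + value.replace("'", "''") + "'"
        clauses.append(f"{name} {operator} {value}")
    return "SELECT ALL WHERE " + " AND ".join(clauses)


def build_sync_url(node_tap_url, query):
    """Return the URL of a synchronous request for XSAMS output."""
    params = {"LANG": "VSS2", "REQUEST": "doQuery", "FORMAT": "XSAMS", "QUERY": query}
    return node_tap_url.rstrip("/") + "/sync?" + urlencode(params)


# Carbon dioxide lines between 2300 and 2400 cm^-1 (illustrative values).
query = build_vss2_query([
    ("RadTransWavenumber", ">=", 2300.0),
    ("RadTransWavenumber", "<=", 2400.0),
    ("InchiKey", "=", "CURLTUGMZLYLDI-UHFFFAOYSA-N"),  # InChIKey of CO2
])
url = build_sync_url("https://node.example.org/tap/", query)  # placeholder node
```

Since the databanks share the same basic set of restrictables, the same query string could in principle be sent to each node in turn, with only the node address changing.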
Apart from being available through VAMDC, the databanks can be downloaded from the IAO ftp server (ftp://ftp.iao.ru/pub/), where they are presented in the original tabular format.

SESAM
Spectroscopy Database Dedicated to Electronic Spectra of Diatomic Molecules (SESAM) (http://sesam.obspm.fr/) is restricted to the electronic spectra of molecular hydrogen and its deuterated substitutes, as well as CO. The hydrogen spectra include the Lyman and Werner band systems as well as the B'-X and D-X electronic bands. The SESAM database allows queries, within a specific wavelength range, about the properties of the available transitions of a selected molecule. The transitions are in the vacuum ultraviolet (VUV) range. It is also possible to download the full range of data for a particular purpose. Several additions are being considered, e.g., querying these molecular transitions at any redshift, which can be of interest for extragalactic observations where the spectrum is shifted into the visible. The CO molecule has been added since 2016. Further information and the latest updates are described in this Special Issue [13]. It should be noted that the SESAM database is linked to the VAMDC Query Store (Section 3.2.2).
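The redshifted query mentioned above rests on the standard relation λ_obs = λ_rest (1 + z). The following minimal Python sketch illustrates how an observed wavelength window could be mapped back to the rest-frame window in which VUV transitions would be searched. The function names and the numerical values are illustrative assumptions and are not part of SESAM.

```python
def observed_wavelength_nm(rest_wavelength_nm: float, redshift: float) -> float:
    """Shift a rest-frame wavelength to the observer frame: lambda_obs = lambda_rest * (1 + z)."""
    if redshift <= -1.0:
        raise ValueError("redshift must be greater than -1")
    return rest_wavelength_nm * (1.0 + redshift)


def rest_frame_window_nm(obs_min_nm: float, obs_max_nm: float, redshift: float):
    """Return the rest-frame wavelength window corresponding to an observed window."""
    if redshift <= -1.0:
        raise ValueError("redshift must be greater than -1")
    factor = 1.0 + redshift
    return obs_min_nm / factor, obs_max_nm / factor


# Illustrative values only: a VUV line at 100 nm seen at z = 3 falls in the visible.
shifted = observed_wavelength_nm(100.0, 3.0)
window = rest_frame_window_nm(400.0, 800.0, 3.0)
```

With these illustrative numbers, an instrument covering 400-800 nm would probe rest-frame wavelengths of 100-200 nm at z = 3, which is the range that would then be submitted to the database.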

W@DIS
The W@DIS databases are part of the information system [25] designed to provide access to both tabular and graphical data, as well as information [104] ("information" is interpreted in accordance with the same term defined in the given reference) and ontologies [105] for quantitative molecular spectroscopy, which are necessary for solving fundamental and applied problems in a number of subject areas: atmospheric spectroscopy of planets and exoplanets, astronomy, etc. The semantic information system W@DIS is the next-generation molecular spectroscopy information system, based on the application of Semantic Web technologies to its tabular and graphical information resources [106,107]. The W@DIS information system is hosted at the V.E. Zuev Institute of Atmospheric Optics in Tomsk, Russia (http://wadis.saga.iao.ru). This system can serve as a prototype for semantic information systems in atomic, ionic and solid-state spectroscopy to be used by the VAMDC consortium. Meanwhile, some data such as transitions are accessible from the VAMDC portal; they can be displayed via the "Molecular XSAMS to HTML" visualisation tool.

KIDA
The KInetic Database for Astrochemistry (KIDA) (http://kida.astrophy.u-bordeaux.fr) [26,27] is a compilation of kinetic data (chemical reactions and associated rate coefficients) used to model chemistry in astrophysical environments (interstellar medium, protoplanetary disks, planetary atmospheres, etc.). In addition to detailed information on each reaction (e.g., temperature range of validity of the rate coefficients, reference and uncertainty), particular attention is given to the quality of the data, which is evaluated by a group of experts in the field. Since 2016, KIDA also compiles data used to compute chemical reactions occurring on the surface of interstellar dust grains (branching ratios, activation energies and barrier widths) and desorption/diffusion of species on these surfaces (desorption and diffusion energies).

UDfA
The University of Manchester Institute of Science and Technology (UMIST) Database for Astrochemistry (UDfA), first released to the public in 1991 [108], contains basic chemical kinetic data and associated software codes and documentation for modelling the chemical evolution of interstellar clouds and the circumstellar envelopes of evolved Asymptotic Giant Branch (AGB) stars. The core of the data, which can be accessed via its website (http://www.udfa.net) in addition to VAMDC, consists of reaction rate coefficients of several thousand gas-phase reactions and is supplemented by more restricted data sets concerning the chemistry of deuterium fractionation. The database does not contain any surface chemistry, but a file of surface binding energies, which allows processes such as reaction and desorption to be taken into account, is supplied. Where possible, and in line with VAMDC policy, a great deal of effort has been made to identify the precise source of each datum entry through the application of its DOI, in particular for those rate coefficients measured experimentally. Software codes for calculating the chemical evolution of interstellar and circumstellar regions are also provided, as are codes that generate UDfA output files in the form needed for radiative transfer codes such as RADEX (https://personal.sron.nl/~vdtak/radex/index.shtml), RATRAN (https://personal.sron.nl/~vdtak/ratran/frames.html) and RADMC-3D (www.ita.uni-heidelberg.de/~dullemond/software/radmc-3d/), which are used to calculate emergent molecular line profiles from these regions. A major revision of the 2013 release [28], including the review of current data and the identification of new reactions, particularly those associated with chlorine chemistry and with the formation of metal oxides, hydroxides and chlorides in oxygen-rich AGB stars, is underway with a public release due by the end of 2020.

BASECOL
The Rovibrational Collisional Database (BASECOL) collects, from the refereed literature, the rate coefficients for the excitation of rotational, vibrational and rovibrational levels of molecules by atoms, molecules and electrons. The processes are described in the temperature ranges relevant to the interstellar medium, to circumstellar atmospheres and to cometary atmospheres. The BASECOL database is currently the sole VAMDC-connected database that implements the Java version of the node software (www.vamdc.org/activities/research/software/java-nodesoftware/). It can be displayed from the VAMDC portal with the "Collisional data XSAMS to HTML" processor and is accessible from the SPECTCOL tool [109]. Since its last review paper in 2013 [29], the scientific content of the database has been updated with published data. Since the last VAMDC review paper in 2016 [3], the technical components of the database have been entirely replaced [30]: the internal structure of the database and the data ingestion files have been made compliant with the metadata necessary for VAMDC interoperability. The public graphical interface has been changed to a simpler system. The connection of BASECOL with the Query Store (Section 3.2.2) is in the final testing phase. A complete description of the new BASECOL technical design and updates can be found in this Special Issue [30].

MolD
The MolD VAMDC node [31,32] provides data for plasma modelling, e.g., for modelling different stellar atmospheres, early Universe chemistry and analysis of the kinetics of laboratory plasma. MolD contains photodissociation cross sections for individual rovibrational states of diatomic molecular ions as well as corresponding data on molecular species and molecular state characterisations calculated using a quantum mechanical method described in [110]. Since the previous VAMDC review [3], large amounts of new data concerning alkali molecular ions have been included [111].
The node is hosted at the Belgrade Astronomical Observatory (http://servo.aob.rs/mold). It provides easy access to data (in tabulated and graphical form) on thermally averaged photodissociation cross sections across the available spectrum at a requested temperature in order to facilitate atmospheric modelling and other numerical calculations [112]. Future plans are to include new (i.e., complex) molecules of astrophysical importance. The MolD database is linked to the Query Store service (Section 3.2.2).

BeamDB
The Belgrade Electron-Atom/Molecule Database (BeamDB) is a collisional database, in which electrons are projectiles while targets are considered to be atoms and molecules [33]. The interactions of electrons with atoms and molecules are presented as both differential and integral cross sections for processes such as elastic scattering, excitation and ionisation [113]. Since the previous review of VAMDC [3], a considerable volume of new collisional data on metal atom targets has been included from the published sources (e.g., for bismuth [114] or zinc [115]) and selected molecules (e.g., for methane [112] or nitrous oxide [32]). Curation and maintenance of electron collisional data is relevant in many research areas such as astrophysics [116], plasma [117], radiation damage [118] or in lighting applications [119]. The plan for the future expansion of the database is to include ions as new targets [120]. BeamDB (http://servo.aob.rs/emol) is hosted at the Belgrade Astronomical Observatory and is linked to the Query Store service (Section 3.2.2).

IDEADB
The Innsbruck Dissociative Electron Attachment (DEA) database node collects relative partial cross sections for dissociative electron attachment processes of the form AB + e⁻ → A⁻ + B, where AB is a molecule. Queried identifiers are searched in both products and reactants of the processes. XSAMS files (see Section 3.1.1) are then returned, which describe the processes found, including numeric values for the relative partial cross sections of the dissociative electron attachment reactions. Additionally, a visual representation of the cross sections can be viewed on the website. Since 2016, the possibility to add cross sections for cationic products has been introduced and several minor issues have been resolved. There is a plan to modify the database structure, so that measurements in matrices such as water or helium nanodroplets can be added. The node (https://ideadb.uibk.ac.at/) is hosted and maintained by the group of Paul Scheier at the University of Innsbruck, Austria.

VAMDC Data Nodes that Have Not Evolved Since 2016
There have been no changes to some databases and nodes since 2016. These are described in this section, and more information can be found in our previous publication [3]. It should be noted that these databases contain species and processes that are quite unusual for the VAMDC e-infrastructure, for which the current VAMDC visualisation tools and even the current VAMDC standards lack features that would allow a full display of the databases' contents. Therefore, at present, VAMDC provides only a simple view of those databases.

LASp
The Laboratorio di Astrofisica Sperimentale (LASp) database is hosted at the Catania Astrophysical Observatory in Italy (http://vamdclasp.oact.inaf.it/GUI/index) and additional information can be found at its URL (http://www.oact.inaf.it/weboac/labsp/index.html). LASp spectra are taken by using in situ techniques and equipment, specially developed to analyse the effects of irradiation (ion and/or UV photons) and thermal cycling (down to 10 K) by infrared, Raman and UV-VIS-NIR spectroscopy. The analysed materials include frozen gases, solid samples and meteorites. The main application field up until now has been in astrophysics and, over the years, many hundreds of ice mixtures of various compositions and of solids have been studied.
Through the VAMDC portal, transmission and optical depth data for water-ice experiments are available [121,122]. The public database also includes optical constants of solid-state CO, CO₂ and CH₄ deposited at various temperatures.

PAH
The Cagliari-Toulouse PAH theoretical spectral database [36] is a joint effort by the groups of G. Mulas (INAF-OAC) and C. Joblin (University Toulouse III/CNRS-IRAP) aimed at providing all the "ingredients" needed for modelling the photophysics of individual polycyclic aromatic hydrocarbons (PAHs) in space, mainly in photon-dominated interstellar and circumstellar environments. It includes the basic structural properties of PAHs in four charge states (−1, 0, +1 and +2), ionisation potentials and electron affinities, harmonic vibrational analyses and vertical photoabsorption electronic spectra. The link to the "old" database with flat files, which includes more data than those available through VAMDC, is still online (https://astrochemistry.oa-cagliari.inaf.it/database/). The web interface to the relational database, holding the data available through VAMDC, is hosted at the Cagliari Observatory in Italy (https://qchitool-pah-dev.oa-cagliari.inaf.it/). An effort is underway to develop import tools that will feed the relational database, and we expect some substantial changes within a year's time. Currently, from the VAMDC portal, it is possible to obtain molecular structures, corresponding energies and vibrational analyses for a number of PAHs in different ionisation states. The electronic photoabsorption spectra are not available yet through VAMDC but will be in the near future.

ExoMolOP
The ExoMol project provides molecular line lists for exoplanet and other atmospheres [123] with a particular emphasis on studies of hot atmospheres. Apart from line lists, which are stored as states and transition files [124], the ExoMol database (http://www.exomol.com) stores a variety of other associated data, including partition functions, state lifetimes, cooling functions, Landé g-factors, temperature-dependent cross sections, opacities, pressure broadening parameters, k-coefficients and transition dipoles. These data and the associated data structure are described in the database release papers [125,126]. The line lists provided by ExoMol are too large to be handled by the VAMDC portal; this issue is described below.
Recently, a new offshoot of the ExoMol project called ExoMolOP has been created, which contains opacity cross sections and k-tables [127] for molecules of astrophysical interest [37].This database is built on ExoMol data but contains input from HITRAN [16], the empirical MolList database of Bernath [128] and NIST for selected atoms.ExoMolOP provides data on a grid of 22 pressures and 27 temperatures on a grid of wavelengths for each species.By comparison with the unprocessed ExoMol data, these provide a comparatively compact representation of the absorption property of each species.An implementation of the ExoMolOP data within the VAMDC portal is currently in progress.

SSHADE in VAMDC
Currently, VAMDC allows access to the original Grenoble Astrophysics and Planetology Solid Spectroscopy and Thermodynamics (GhoSST) database (https://ghosst.osug.fr/) to allow searches for a few pure and mixed molecular solids through their constituent species and to retrieve their infrared spectra, either as absorption coefficients or as optical constants. Solid Spectroscopy Hosting Architecture of Databases and Expertise (SSHADE) (http://www.sshade.eu) [35] is a database infrastructure of solid-state spectroscopy that hosts spectral data of many different types of solids, including ice, snow, minerals, carbonaceous matter, meteorites, interplanetary dust particles and other cosmo-materials, covering a wide range of wavelengths from X-rays to millimetre wavelengths. The data are collected from a consortium of partners (https://wiki.sshade.eu/sshade/databases), each of which provides its data in its own database of the SSHADE infrastructure. Currently, a "band list" database of molecular solids is under development in the frame of the Europlanet-2024 research infrastructure program. It will host critical compilations of the position, intensity, width and vibration modes of absorption bands (visible-infrared or Raman active) of pure and mixed molecular solids as well as for several types of molecular compounds such as hydrates and clathrate hydrates. Both the SSHADE spectral databases and the band list database will be linked to VAMDC, but this will require an upgrade of the XSAMS data model (see Section 3.1.1) in order to describe the fundamental solid constituents better. With such databases, VAMDC will allow the user to retrieve and compare the band parameters of molecular species in both gaseous and solid states and, therefore, to determine which one actually contributes to the observed absorption features. This capability is particularly useful in environments where both phases coexist, such as planetary atmospheres with aerosols.

AMBDAS
The Atomic and Molecular Bibliographic Data System (AMBDAS) (https://amdis.iaea.org/databases/) is a library of around 50,000 references to publications in the scientific literature concerning collisional and plasma-material interactions of relevance to nuclear fusion energy research.An online, browser-based searchable interface allows the database to be queried by reactant species, charge state and process type as well as by author, journal and title keyword.
The integration of AMBDAS within the VAMDC infrastructure is planned for release in 2020 as part of an upgrade to the database software, which includes a recently developed DOI-centred reference management library [85] and an updated classification of plasma processes [129]. In addition, a new search interface, VSS2 queries and XSAMS output (see Section 3.1.1) are supported. The AMBDAS system will be queried by species and processes, and thus will be accessible from the species database. In addition, since the AMBDAS system is a bibliographic database, it will be queried through the new bibliographic service described in Section 6.5.

DREAM-DESIRE
The Database of Rare Earths At Mons University (DREAM) contains information concerning the radiative parameters (wavelengths, transition probabilities and oscillator strengths) for more than 72,000 spectral lines belonging to the lower ionisation stages of lanthanide elements (Z = 57 to 71), from neutral to triply ionised species. This database, originally created by Biémont et al. [45] and recently updated by Quinet and Palmeri [39], is hosted by Mons University (http://hosting.umons.ac.be/html/agif/databases/dream.html). All the data tabulated in DREAM have been determined from detailed pseudo-relativistic Hartree-Fock calculations, including core polarisation effects [130,131], carried out by the Atomic Physics and Astrophysics group at Mons University, Belgium. The accuracy of the theoretical results has been assessed through comparisons with radiative lifetimes measured experimentally using laser-induced fluorescence spectroscopy. The Database on Sixth Row Elements (DESIRE) contains the same type of information as DREAM, but is dedicated to the elements of the sixth row of the periodic table (Z = 72 to 86). This database (http://hosting.umons.ac.be/html/agif/databases/desire.html) is described in [38].

IAMDB
The volume of high-precision data generated by the Indian atomic and molecular community is quite substantial. This is evident from the number of articles published by various groups. However, such data are not well organised and, hence, are very difficult to retrieve for further use. For this reason, an atomic and molecular data repository in India has been envisaged for many years. The Indian Atomic and Molecular DataBase (IAMDB) (www.iamdb.org.in) is the outcome of such an effort. With the help of the VAMDC interface, the data generated and gathered in IAMDB can be easily retrieved by any user.
In the last few years a great deal of electron scattering data have been produced in India. In particular, partial and total electron ionisation cross sections have been measured over an extended energy regime for about two dozen organic molecules important to radiation biology [132,133]. Since many of these systems are in solid form at room temperature, a new experimental technique was employed for the measurements. The results for some important ones, such as the DNA and RNA bases, are reported in [132,133]. The same group has also measured dissociative electron attachment cross sections for several organic molecules [134]. On the other hand, Antony and coworkers have produced a large quantity of electron scattering data, in particular for those molecules and radicals that are difficult to measure [135,136]. Recent calculations from this group include positron scattering from a variety of atoms and molecules [137,138] and photoionisation cross sections for polyatomic molecules [139,140]. Once these electron/positron/photon collision data are incorporated into IAMDB, the database will be integrated into VAMDC.

PEARL
The Photonic Electronic Atomic Reaction Laboratory (PEARL) database at the Korea Atomic Energy Research Institute (KAERI) includes electron impact ionisation (EII), recombination and photoionisation data for atoms and ions. The EII [141] and the dielectronic recombination (DR) [142] have been calculated using a relativistic distorted wave approximation, and the photoionisation [143] has been calculated using a non-iterative eigenchannel R-matrix method for the ground and lower excited levels of atomic ions below Z = 30, which are of interest in astrophysics. The EII [144] and the DR [145] calculations have also been performed for tungsten (W, Z = 74) ions, which are essential in fusion tokamak research. The calculated cross section data can be graphically displayed on the PEARL website (http://pearl.kaeri.re.kr), together with other available experimental and theoretical data for comparison; the numerical data can also be downloaded. Recently a collisional-radiative model (CRM) for low-temperature plasma has been developed and the electron impact excitation (EIE) data for the levels considered in the model have been calculated and compiled. The calculated line ratios of He I can be displayed as a function of the electron temperature and density on the website. The CRM results for Ar I [146] will be uploaded in the near future.

Clusters
Clusters are complexes of two up to several million atoms and/or molecules which bridge the gap between molecular physics and solid-state physics. The addition or removal of a single atom or molecule from a cluster may dramatically change its properties. Interesting attributes in cluster physics are, e.g., cluster structures, bond lengths and bond dissociation energies. These properties can be extracted by combining experimental techniques such as mass spectrometry and/or spectroscopy with quantum chemical simulations.
Clusters are not yet addressed by VAMDC. Inclusion of clusters in VAMDC requires the addition of a section to the VAMDC schema to describe the cluster data. We propose to develop a two-layer implementation of the node. The first layer, which will also be available in the portal, is for selecting the dominant species of interest (for example "(CO₂)(H₂O)"). This will return all data sets with clusters of "(CO₂)ₙ(H₂O)ₘ" (m, n > 0). Often, the produced mass spectra display features of many different cluster ions, because impurities are attached to the clusters of interest. The fragmentation of larger molecules may also display features of further non-stoichiometric cluster progressions. Thus, different queries return the same data set. On the website of the node itself, we will offer more filter options:

• Method of cluster formation (supersonic expansion, seeded beam, gas aggregation, electrospray ionisation, helium nanodroplets, etc.)
• Method of ionisation (electrospray ionisation, matrix-assisted laser desorption/ionisation, electron impact, photoionisation, etc.)
• Steps in between (tandem mass spectrometry, collision-induced dissociation, etc.)
• Analysis method (time-of-flight, quadrupole, ion cyclotron resonance, etc.)
• Environment (temperature, pressure, etc.)
• Others (evaluation of data, publication, magic numbers, solvation effect, etc.)
The most important returned data will be mass spectra (which can also be visualised in a browser). Published papers related to the species asked for, as well as possible evaluation programs [147] and their outputs, will be made available. Larger files will be available via a download link found in the XSAMS data file (see Section 3.1.1). Since 1986, a group in Innsbruck [148,149] has been producing mass spectra of clusters via different approaches. These results are planned to be made available in this database.
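The first-layer selection proposed for the cluster node can be illustrated with a short Python sketch. It assumes, purely for illustration, that each data set is described by a list of cluster formulae written as "(X)n(Y)m" and that a selection matches a cluster when both contain exactly the same constituents with non-zero counts. The formula syntax, the function names and the example data sets are hypothetical and do not describe an existing node.

```python
import re
from typing import Dict, List

# One constituent in parentheses followed by an optional count, e.g. "(CO2)3".
_UNIT = re.compile(r"\(([^()]+)\)(\d*)")


def parse_cluster(formula: str) -> Dict[str, int]:
    """Parse '(CO2)3(H2O)5' into {'CO2': 3, 'H2O': 5}; a missing count means 1."""
    units: Dict[str, int] = {}
    for species, count in _UNIT.findall(formula):
        units[species] = units.get(species, 0) + (int(count) if count else 1)
    return units


def matches(selection: str, cluster: str) -> bool:
    """True if the cluster is built from exactly the selected constituents (assumed rule)."""
    return set(parse_cluster(selection)) == set(parse_cluster(cluster))


def select_data_sets(selection: str, data_sets: Dict[str, List[str]]) -> List[str]:
    """Return the names of data sets containing at least one matching cluster."""
    return [name for name, clusters in data_sets.items()
            if any(matches(selection, c) for c in clusters)]


# Hypothetical data sets: each mass spectrum shows several cluster ions.
EXAMPLE_DATA_SETS = {
    "run_A": ["(CO2)3(H2O)5", "(CO2)2"],
    "run_B": ["(H2O)4"],
    "run_C": ["(CO2)(H2O)2", "(H2O)7"],
}
```

In this sketch, selecting "(CO2)(H2O)" returns run_A and run_C, while selecting "(H2O)" returns run_B and run_C, so the same data set is reached by different queries, as described above.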

Overview of the VAMDC e-Infrastructure Components
The e-infrastructure currently connects, in an interoperable way, 38 heterogeneous databases with atomic and molecular data. By providing data producers and compilers with a large dissemination platform for their work, VAMDC successfully removes the bottleneck between data producers and the wide community of atomic and molecular data users. The "V" of VAMDC stands for "virtual" in the sense that the e-infrastructure itself does not contain the data: it is a wrapper that exposes the heterogeneous databases in a unified way. The wrapping software, called the node software [150], integrates a stand-alone database into a VAMDC federated database to become a data node. Each data node accepts queries submitted in a standard grammar (see Section 3.1.1) and provides an output in a standard format. Each data node is entered into the VAMDC registry (see Section 3.1.2), which enables a standardised application programming interface (API) to discover the available resources.

Data Nodes, Query Language and Data Formats
A data node is a database, either pre-existing or created for the purpose of VAMDC, wrapped in the node software that implements the web-service protocol VAMDC-TAP (in this context, a web service is a data source on the World Wide Web designed for access by application software, cf. a web page designed to be interpreted by a human reader). VAMDC-TAP is derived from the International Virtual Observatory Alliance (IVOA) (http://ivoa.net/) Table Access Protocol (TAP) (http://www.ivoa.net/documents/TAP/20190927/). TAP enables an application to query a remote database.
All VAMDC-TAP services support a common data model, query language and output format. The data model, expressed in the VAMDC Dictionary, represents the data both in a tree structure and as a standardised set of virtual tables. The query language VAMDC SQL Subset 2 (VSS2) (http://vamdc.eu/documents/standards/queryLanguage/vss2.html) operates on these virtual tables. The common output format is the XML Schema for Atoms, Molecules and Solids (XSAMS) (https://standards.vamdc.eu/#data-access-protocol-query-language-and-dictionaries) [151], which uses the tree-structured form of the data model. All VAMDC-TAP services provide XSAMS output, while the nodes may optionally support other formats. XSAMS is highly flexible and expressive, but other formats may be preferred for compactness.
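As an illustration of how a VSS2 query is submitted to a data node, the following Python sketch builds a synchronous VAMDC-TAP request URL. It is a sketch under stated assumptions: the node address is a placeholder, the "/sync" endpoint and the REQUEST, LANG, FORMAT and QUERY parameters reflect our reading of the VAMDC-TAP standard, and the restrictable keywords used in the query (RadTransWavelength, assumed to be in ångström, and InchiKey) should be checked against the VAMDC Dictionary before use.

```python
from urllib.parse import urlencode


def build_vamdc_tap_url(node_tap_url: str, vss2_query: str,
                        output_format: str = "XSAMS") -> str:
    """Build a synchronous VAMDC-TAP request URL for a VSS2 query (no network access)."""
    params = {
        "REQUEST": "doQuery",     # assumed request type for a data query
        "LANG": "VSS2",           # query language named in the text
        "FORMAT": output_format,  # XSAMS is the common output format
        "QUERY": vss2_query,
    }
    return node_tap_url.rstrip("/") + "/sync?" + urlencode(params)


# Placeholder node address; real addresses are obtained from the VAMDC registry.
EXAMPLE_NODE = "https://node.example.org/tap"
EXAMPLE_QUERY = ("SELECT * WHERE RadTransWavelength >= 1500 "
                 "AND RadTransWavelength <= 8000")
example_url = build_vamdc_tap_url(EXAMPLE_NODE, EXAMPLE_QUERY)
```

The resulting URL can be fetched with any HTTP client; the response is an XSAMS document unless another supported format is requested.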
Python libraries for the node software are provided by the VAMDC consortium. The node owner configures these libraries for the database of choice by adding small translation functions from VSS2 to SQL operating on the actual database, and from the query results to XSAMS or other formats. Additional information and examples of best practices were described by Regandell et al. [150].
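The kind of translation function mentioned above can be sketched as follows. This is not the API of the VAMDC node software: the mapping table, the function name and the "transitions" table are hypothetical, and the VSS2 query is assumed to have already been parsed into (keyword, operator, value) conditions. The sketch only shows the principle of mapping standard restrictable keywords onto the columns of a local database.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Hypothetical mapping from restrictable keywords to local column names.
RESTRICTABLE_TO_COLUMN = {
    "RadTransWavelength": "wavelength",
    "InchiKey": "inchikey",
}
ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">="}


def translate_conditions(conditions: Sequence[Tuple[str, str, object]]) -> Tuple[str, List[object]]:
    """Translate parsed VSS2 conditions into a parameterised SQL query on the local table."""
    clauses, params = [], []
    for keyword, operator, value in conditions:
        if keyword not in RESTRICTABLE_TO_COLUMN:
            raise KeyError(f"restrictable not supported by this node: {keyword}")
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"operator not supported: {operator}")
        clauses.append(f"{RESTRICTABLE_TO_COLUMN[keyword]} {operator} ?")
        params.append(value)
    sql = "SELECT wavelength, inchikey FROM transitions"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def run_on_local_database(connection: sqlite3.Connection,
                          conditions: Sequence[Tuple[str, str, object]]) -> list:
    """Execute the translated query; values are bound as parameters, never interpolated."""
    sql, params = translate_conditions(conditions)
    return connection.execute(sql, params).fetchall()
```

Keeping the keyword-to-column mapping in a single table means that the same standard query can be served by databases with very different internal layouts, which is the purpose of the node software.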

Registry
The VAMDC metadata registry (http://registry.vamdc.eu/registry-12.07/main/index.jsp) lists the details of the VAMDC data nodes. Applications use the registry to decide which databases should be queried and to locate the services for those databases on the internet. The VAMDC registry is based on the work of Astrogrid, which was the UK's Virtual Observatory development project from 2001 to 2010 [152]. They developed a registry whose interface is based on the then-current IVOA standard. To simplify the access to this registry, the VAMDC consortium provides Java and Python libraries that are used by the VAMDC portal, among other applications.

The Portal
The VAMDC portal (https://portal.vamdc.eu/vamdc_portal/home.seam) [153] relies on the infrastructure elements previously described in this section to provide seamless access to the inter-connected VAMDC databases. Through this single interface, a user can query any database member of the VAMDC infrastructure and can retrieve data in the common shared file format VAMDC-XSAMS (see Section 3.1.1). The page displaying the resulting data recalls the exact query processed by the infrastructure to produce the data (for example, see Figure 1). The portal embeds processors to convert data from the XSAMS format into several formats (chosen by the user) and has several graphical tools to visualise the extracted data. Moreover, the portal implements the IVOA-SAMP protocol [154], created to connect scientific tools when working with multiple data types. VAMDC data can be directly piped from the portal into any tool implementing the SAMP protocol, e.g., TOPCAT (http://www.star.bris.ac.uk/~mbt/topcat/).

Services Built over Existing VAMDC Infrastructure Since 2016

The Species Database
To overcome the problem of species identification, the VAMDC consortium created, in 2016, a centralised chemical species repository called the species database. Updated daily by automated collation of data from the VAMDC data nodes, it contains a list of all the species in each VAMDC database. Every species is identified uniquely by an InChIKey [155], a hash code generated from an InChI description (https://iupac.org/who-we-are/divisions/division-details/inchi/). In the species database each InChIKey is associated with the different ways of identifying a species, e.g., its chemical names, formula, stoichiometric formula and Chemical Abstracts Service Registry Number. By adding a representational state transfer (REST) API (a RESTful API is an interface between a web-based client and a server that exploits REST constraints) and a web graphical interface to this species database (https://species.vamdc.eu), we provide a versatile tool to explore the species content of the atomic and molecular VAMDC-connected databases. Using this REST API of the species database, the VAMDC portal provides both an autocomplete suggestion for species names and identifiers and a feature to discover the isotopologues of a species. Thus, it is possible to specify, very precisely, which species is the most relevant to a user's search.
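The role of the InChIKey as a unique key linking several ways of naming a species can be illustrated with a small Python sketch. A standard InChIKey consists of 27 characters in three hyphen-separated blocks of 14, 10 and 1 upper-case letters; the check below tests this layout only and does not recompute the hash. The record structure and the lookup function are illustrative assumptions and do not reproduce the schema or the REST API of the species database.

```python
import re
from typing import Dict, List

# Layout check only: 14 + 10 + 1 upper-case letters separated by hyphens.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def is_wellformed_inchikey(key: str) -> bool:
    """Return True if the string has the layout of a standard InChIKey."""
    return bool(INCHIKEY_PATTERN.match(key))


# Illustrative records: each InChIKey maps to alternative identifiers of the species.
SPECIES_INDEX: Dict[str, Dict[str, str]] = {
    "XLYOFNOQVPJJNP-UHFFFAOYSA-N": {
        "name": "water", "formula": "H2O", "cas": "7732-18-5"},
    "CURLTUGMZLYLDI-UHFFFAOYSA-N": {
        "name": "carbon dioxide", "formula": "CO2", "cas": "124-38-9"},
}


def find_inchikeys(term: str) -> List[str]:
    """Return the InChIKeys of all species with an identifier equal to the search term."""
    wanted = term.strip().lower()
    return [key for key, record in SPECIES_INDEX.items()
            if wanted in (value.lower() for value in record.values())]
```

A search by name, formula or registry number thus resolves to the same key, which can then be used as the species restriction in queries to the data nodes.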

The Query Store
By using the VAMDC facilities, scientists can easily discover atomic and molecular resources and access their data in a unified and practical way. However, the adoption of VAMDC by a wider range of communities has revealed a new set of challenging issues linked to citation and data reuse:

• The VAMDC infrastructure data are dynamic. A database exposed through the VAMDC infrastructure may evolve over time: more recent and more precise versions of given data may replace older ones. We therefore needed mechanisms to allow for the citation of dynamic data.
• The data set provided by the VAMDC infrastructure always contains the references of the papers used for compiling the data sets. However, the citation process may become cumbersome when the extracted data sets come from many sources.
VAMDC is addressing these issues at the data-community level and, in 2014, VAMDC joined the Research Data Alliance (RDA). The RDA, through its Data Citation Working Group and RDA/WDS Scholarly Link Exchange (Scholix) Working Group, has defined new citation models for the digital era. We succeeded in implementing the RDA recommendations to provide the VAMDC users with a Query Store [156], a tool to facilitate the citation of the data extracted from VAMDC for scientific reproducibility and for giving due credit to data producers. Through the Query Store:
• each query served by the infrastructure is identified by a unique, persistent and resolvable identifier;
• the query-produced data may be assigned a DOI (digital object identifier, a formal name for a document or data set in a standard format intelligible by software);
• data become directly citable by their DOI.
When registering a DOI, the authors of the papers used for compiling the data appear in the "references" part of the DOI metadata schema. The data set sources become the references of the DOI. The authors/papers referenced in the VAMDC extracted data set will automatically get credit when the data set is cited (using the DOI) in a paper.
Table 2 lists the VAMDC databases currently integrated with the Query Store service, whose queries may be discovered by accessing the URL (https://cite.vamdc.eu); this service is the most complete collection of VAMDC queries that may be cited as examples of the queries performed through the infrastructure.
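The principle of identifying each query persistently while the underlying data evolve can be sketched in Python as follows. This is a minimal illustration in the spirit of the RDA data citation recommendations (query normalisation, timestamping, verification of the result set by a checksum, assignment of an identifier) and not the implementation of the VAMDC Query Store; the class, its methods and the record fields are hypothetical.

```python
import hashlib
import uuid
from datetime import datetime, timezone
from typing import Dict, Tuple


class QueryStoreSketch:
    """Toy in-memory store assigning an identifier to each distinct query result."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}
        self._index: Dict[Tuple[str, str, str], str] = {}

    @staticmethod
    def normalise(query: str) -> str:
        """Collapse whitespace so that trivially different spellings compare equal."""
        return " ".join(query.split())

    def register(self, node: str, query: str, result: bytes) -> str:
        """Return the identifier for (node, query, result); reuse it if already known."""
        checksum = hashlib.sha256(result).hexdigest()
        key = (node, self.normalise(query), checksum)
        if key in self._index:
            return self._index[key]
        identifier = str(uuid.uuid4())
        self._records[identifier] = {
            "node": node,
            "query": key[1],
            "result_sha256": checksum,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        self._index[key] = identifier
        return identifier

    def resolve(self, identifier: str) -> dict:
        """Return the stored metadata for an identifier."""
        return self._records[identifier]
```

In this sketch, repeating a query on unchanged data yields the same identifier, whereas the same query on an updated database yields a new one, so a citation always points to the version of the data that was actually used.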

Pending Technical Issues
Despite the advancements of the individual databases and of VAMDC as a whole, there are still technical issues that need to be addressed in the future.

Treatment of Big Data
The growth of large data sets, which is common to all areas of science, also affects A&M physics. VAMDC uses a tightly defined and rigorous data structure that is, as a result, relatively verbose. Consequently, the data returned by queries to some databases, such as VALD and KIDA, are already limited by the software so as not to cause problems. Moreover, while many current VAMDC molecular spectroscopic databases (HITRAN, JPL, CDMS, etc.) contain a relatively limited number of spectral lines for each molecule, some others (MeCaSDa, etc.) contain extensive calculated line lists; in this case, the number of lines for a given species can be very large (several million at least) and could increase further with subsequent updates. At present, there is a limit on the data volume that can be retrieved through VAMDC. Queries producing large data extracts time out or are truncated by the data nodes to avoid timeouts. This prevents downloads of line lists over a wide spectral range or for many molecules at once. The actual limits vary from node to node according to the computing resources invested and the details of the database.
For example, when asked for all data connected with lines between 150 nm and 800 nm, seven nodes produced a complete result, three produced a truncated result, two failed and one timed out; the others quickly reported that they had no relevant data. The largest response was for 67 million lines in 218 MB of XSAMS, and this is close to the practical limit of the system. For this data set, three of the recommended displays and format converters were eventually able to process the data, one failed and one produced no results within 10 minutes. This is a problem for applications in planetary atmospheric physics and exoplanet studies. A solution currently under investigation is the possibility of asynchronous queries that run slowly in the background and leave their results cached and accessible by an ephemeral URL.
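The asynchronous pattern mentioned above can be sketched as follows. This is an illustration only, not a design adopted by VAMDC: the class name `AsyncQueryService`, the base URL and the time-to-live policy are all assumptions, and `run_query` stands for whatever slow function a node would use to produce the data.

```python
import secrets
import time
from concurrent.futures import ThreadPoolExecutor


class AsyncQueryService:
    """Run slow queries in the background and hand back an ephemeral result URL."""

    def __init__(self, run_query, ttl_seconds=3600.0,
                 base_url="https://example.org/results/"):  # placeholder URL
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._base_url = base_url
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._jobs = {}  # token -> (future, submission time)

    def submit(self, query):
        # An unguessable token makes the URL act as a capability for the result.
        token = secrets.token_urlsafe(16)
        future = self._pool.submit(self._run_query, query)
        self._jobs[token] = (future, time.monotonic())
        return self._base_url + token

    def fetch(self, url):
        token = url.rsplit("/", 1)[-1]
        entry = self._jobs.get(token)
        if entry is None:
            return ("unknown", None)
        future, submitted = entry
        # Simplification: the lifetime is counted from submission, not completion.
        if time.monotonic() - submitted > self._ttl:
            del self._jobs[token]
            return ("expired", None)
        if not future.done():
            return ("pending", None)
        return ("done", future.result())
```

A client would call `submit` once and then poll `fetch` with the returned URL until the status is `"done"`; a production service would also need persistent storage and clean-up of cached results.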
There are also whole databases that are currently too big to be usefully probed via the VAMDC portal. A particular area that produces very large data sets is the provision of molecular line lists for studies of hot atmospheres. The ExoMol [125,126], TheoReTS [157] and NASA Ames (e.g., [158]) groups all employ theoretical methods to compute very extensive lists of molecular transitions, particularly for high-temperature applications in astrophysics. Many of these line lists are huge; for example, the ExoMol line list for CH3Cl contains 166 billion transitions for each of the two main isotopologues [159]. Even at room temperature, complete line lists for relatively heavy, long-lived greenhouse molecules with low-frequency vibrations, like CF4 [160] or NF3 [161], require billions of transitions to converge opacity calculations over infrared wavelengths. Data sets of this size are completely outside the current capabilities of the VAMDC project. Some work has been performed on data compaction, in particular using the so-called super-lines approach [162,163]. Another method of big-data compression is to use "effective" lines, as implemented in a recent addition of methane [89] to the HITEMP database [86]. Use of these protocols can lead to a very significant reduction in the size of the effective line list, but these compressed lists still contain many millions of lines, making it difficult for the current VAMDC infrastructure to process or download the full line list even after it is compacted. We note that the ExoMol line lists have recently been used to generate a set of temperature- and pressure-dependent opacities [37], creating the so-called ExoMolOP database. These opacity functions provide a much more compact representation of the molecular data, and an implementation of the ExoMolOP database within VAMDC is currently being explored. A similar direction is currently being followed for the TheoReTS information system (http://theorets.univ-reims.fr, http://theorets.tsu.ru), which contains a theoretical and experimental atlas of methane absorption/emission cross sections [164] up to 1000 K.
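The idea behind line-list compaction by binning can be illustrated with a simplified sketch. It is not the algorithm of [162,163]: it only assumes that line intensities have already been evaluated at one chosen temperature and that all lines falling into the same cell of a uniform wavenumber grid are replaced by a single line carrying their summed intensity. The function name `super_lines` and the grid choice are hypothetical.

```python
from collections import defaultdict


def super_lines(lines, grid_step):
    """Compress (wavenumber, intensity) pairs onto a uniform grid.

    lines      -- iterable of (wavenumber, intensity) at one fixed temperature
    grid_step  -- width of a grid cell, in the same unit as the wavenumbers
    Returns a list of (cell centre, summed intensity), sorted by wavenumber.
    """
    cells = defaultdict(float)
    for wavenumber, intensity in lines:
        cells[int(wavenumber // grid_step)] += intensity  # total intensity is conserved
    return [((index + 0.5) * grid_step, total) for index, total in sorted(cells.items())]
```

Because intensities depend on temperature, a compressed list of this kind has to be rebuilt for each temperature of interest, and the number of output lines is bounded by the number of grid cells rather than by the number of input transitions.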

Selection and Comparison of Atomic States
It is a long-standing goal of VAMDC to be able to combine data from different databases that relate to a given state or energy level of an atom or molecule. This allows comparison of measurements or theoretical values from different sources, and also permits a fuller set of data to be built up: e.g., energies for radiative transitions from one database combined with broadening values from another. This comparison needs a common labelling scheme for states and energy levels.
This common labelling is already achieved for simple molecules in the "case by case" formalism of XSAMS. The tool SPECTCOL [109], which takes spectral data from CDMS and collisional excitation rates from BASECOL, exploits this. The labelling of states simplifies the current theoretical understanding of the molecules sufficiently for the software to suggest states that might be treated as equivalent: i.e., it performs a "fuzzy" match from which an expert user can pick states to treat as true matches.
It is currently much harder to combine data on atomic states and levels. This part of the XSAMS data model allows greater precision and flexibility in the description of a level, at the cost of easy matching between databases. We hope to retain the current accuracy of the data model while augmenting it with extra notations that are less precise but more consistent between databases. Thus, we may recover the ability to make "fuzzy" matches between databases. The restricted notations introduced by the PyValem software package (https://github.com/xnx/pyvalem) may be relevant here.
We emphasize that we are not seeking exact and complete quantum-mechanical descriptions of all states. We understand that such a description may not be possible with the current theoretical models of atoms, which are themselves approximate and incomplete (except for the simplest atoms). Furthermore, state descriptions precise to the limits of current theory may not be unique, e.g., due to underlying assumptions about coupling and state mixing. Thus, the VAMDC consortium is exploring hierarchical structures for describing atomic terms and electronic configurations. At the highest, most precise level, these would describe the atoms to the full extent allowed by current theory. At the lower, more approximate levels, they would allow automatic matching of data that probably describe the same states, leaving final interpretation to expert users. As a bonus, we hope to make it possible to select only the data relating to particular atomic levels and states from the databases.
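A hierarchical description of this kind could support "fuzzy" matching along the following lines. The sketch is purely illustrative and does not reflect an agreed VAMDC or XSAMS structure: the class `LevelLabel`, its fields and the ordering of the hierarchy are assumptions. The ordering places parity and total angular momentum J before term and configuration because the latter depend more strongly on the adopted coupling scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LevelLabel:
    species: str                          # e.g. "Fe I"
    parity: Optional[str] = None          # "even" or "odd"
    j: Optional[float] = None             # total angular momentum
    term: Optional[str] = None            # e.g. "5D"
    configuration: Optional[str] = None   # e.g. "3d6.4s2"


# Hypothetical hierarchy, from the coarsest to the most precise descriptor.
HIERARCHY = ("species", "parity", "j", "term", "configuration")


def match_depth(a, b):
    """Count the hierarchy levels, starting from the coarsest, on which two labels agree."""
    depth = 0
    for name in HIERARCHY:
        value_a, value_b = getattr(a, name), getattr(b, name)
        if value_a is None or value_b is None or value_a != value_b:
            break
        depth += 1
    return depth


def candidate_matches(target, others, min_depth=3):
    """Return (depth, label) pairs that probably describe the same state, best first."""
    scored = [(match_depth(target, other), other) for other in others]
    kept = [pair for pair in scored if pair[0] >= min_depth]
    return sorted(kept, key=lambda pair: -pair[0])
```

The returned candidates are suggestions only; as stated above, the final interpretation is left to expert users.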

Updating of Node Software
The common Python software on which most data nodes are built relies on a small suite of third-party libraries. We wish to keep this infrastructure baseline constant as long as possible to enhance stability and reduce maintenance for the node operators. However, it is now necessary to upgrade to newer versions of the libraries. Notably, Django (https://www.djangoproject.com/) v2, which is no longer supported, must be replaced with v3. This may force some changes in the customisation of each node, since Django upgrades are rarely compatible in all APIs.
Provided that old and new versions of the node software support the same version of VAMDC standards, we do not require all the data nodes to be upgraded at the same time.

Applications and User Case Studies
A database lives through the use of its data: the more it is used, the better. Many applications need concerted access to different databases, which is precisely what VAMDC provides. Here, we present a few of these applications that demonstrate the use and benefit of VAMDC.

[165] VO service from IVOA registries) and identifies atomic and molecular species through its link to databases such as JPL [15] and CDMS [74] via a SQLite database. Species identification may also proceed via a direct access to VAMDC for any available spectroscopic database using the Table Access Protocol (TAP), which is based on the IVOA format.
Modelling is performed in Kelvin units, and a conversion tool for Jansky units is available. Local Thermodynamic Equilibrium (LTE) and non-LTE modelling tools are provided. CASSIS gives ∼80 collision files constructed from the LAMDA [166] and BASECOL [29,30] databases, as well as from local files, such that their quantum numbers, Einstein coefficients, upper energy levels and rest frequencies match the request made within CASSIS. Indeed, some molecular species appear both in JPL and CDMS with sometimes different spectroscopic parameters, hindering the request for the modelling of line profiles. In the near future, CASSIS will use the SPECTCOL tool [109], developed by the VAMDC consortium, to associate spectroscopic data provided by spectroscopic databases with collisional data provided by collisional databases to obtain the most up-to-date rate coefficients. Analysis modules for line and baseline fitting, resampling, rotational diagram analysis and χ² minimisation with Jython (https://www.jython.org) scripts are available to find the best-fit model parameters, such as column density, T_k (or T_ex), n(H2) or source size. Figure 2 shows an example of an observed spectrum (in black) of a CO transition compared with LTE modelling (red) of the CO transition using the CDMS node in VAMDC. The vertical blue bar corresponds to the requested CO transition, not shifted by the V_LSR (velocity relative to the local standard of rest). The vertical green bars correspond to the transitions that appear in VAMDC with the thresholds on the upper energy levels and Einstein coefficients, and the shift in V_LSR (right side of the figure). The spectroscopic parameters are listed in blue for the observations and in red for the model, with the resulting opacity and excitation temperature values. When VAMDC proposes different frequencies for a given transition (derived, predicted, experimental values) that may even originate from the same database, CASSIS selects the frequency with the minimum uncertainty. The right side of the figure shows the species available in the database and the possible thresholds on the Einstein coefficients and upper energy levels. The V_LSR is also given for the shift applied to the databases' frequencies, and the green bars below the spectrum correspond to the possible transitions within the frequency range.

XCLASS-eXtended CASA Line Analysis Software Suite
XCLASS (https://xclass.astro.uni-koeln.de/) [167] is a fully message passing interface (MPI) parallelised toolbox for the Common Astronomy Software Applications package (CASA), aimed at fitting spectral line data from astronomical sources observed with either interferometers or single-dish telescopes. XCLASS models a synthetic spectrum that is automatically compared to the data with the aim of providing a measurement of physical quantities, such as the temperature, molecular abundance and velocity. Molecular data required by XCLASS are taken from an embedded SQLite3 database containing entries from CDMS [74] and JPL [15] that is populated and updated via the Python library VAMDCLIB, which queries the data directly from the VAMDC nodes. XCLASS offers the possibility of describing molecules and radio recombination lines (RRLs) in LTE and non-LTE conditions using the RADEX (https://personal.sron.nl/~vdtak/radex/index.shtml) formalism. Local overlap of lines can be taken into account as well. Furthermore, non-Gaussian line profiles such as Lorentzian, Voigt (including pressure broadening for RRLs) and Horn can be used. Different continuum contributions like dust, free-free and synchrotron emissions can also be modelled. Finally, complex source structures can be described by using sub-beam descriptions and component stacking. The toolbox contains an interface for the model optimiser package Modelling and Analysis Generic Interface for eXternal numerical codes (MAGIX) [168], which helps to find the best description of the data using a certain model, i.e., finding the parameter set that most closely reproduces the data. XCLASS can also automatically identify the molecules producing a given spectrum.
In XCLASS, the myXCLASSFit function can be used to fit multiple spectra, i.e., to fit multiple frequency ranges simultaneously in multiple files (see Figure 3). The function returns the optimised model parameters and the corresponding modelled spectra. In addition, the XCLASS interface contains the myXCLASSMapFit function that fits one or more complete (FITS) data cubes. For this, the myXCLASSMapFit function reads in the data cube(s), extracts the spectra for each pixel and fits each spectrum separately. At the end of the whole fitting procedure, the myXCLASSMapFit function creates FITS images for each free parameter of the best fit, where each pixel corresponds to the value of the optimised parameter taken from the best fit for that pixel (see Figure 4). Some applications of this include temperature maps, as well as first and second moment maps, which are based on the simultaneous fitting of many lines and are fairly robust against line confusion and blending of single lines; this can be a severe issue in many ALMA data sets with line-rich sources. The new CubeFit function is an extension to the myXCLASSMapFit function, and can be used to describe data cubes by physical models.

Use of Stark-B Data
From an analysis of applications of Stark broadening parameters [171], calculated with a semi-classical perturbation method [68], it was shown that the main users of Stark-B data are astronomers, who use them for the investigation of A and B type stars, white dwarfs and hot stars in late evolution stages, for example PG1159-type stars (stars with a hydrogen-deficient atmosphere that are in transition between being the central stars of planetary nebulae and hot white dwarfs, named after their prototype, PG1159-035) [172]. The data most often used for this purpose primarily concern He I and Si II. An additional usage of these data is for plasma modelling in physics and technology, primarily for laser-produced plasma, with some emphasis on Stark broadening parameters of Zn I. An analysis of the citations within the Stark-B database shows that its data are also used for different analyses of regularities and systematic trends, as a source of a large number of homogeneous data obtained in the same way [173]. The Stark-B database is often cited in studies of spectral lines within laboratory plasma.

Examples of Concrete User Issues
Through the federation of resources and the adoption of common standards, the key features of the VAMDC e-infrastructure are its ability to query all resources with the same type of queries and to retrieve data identified with the same metadata. In principle, this is a very big step towards speeding up the retrieval of data in a safe and secure way. As an example, through VAMDC, it is impossible to make a mistake when querying a given atomic species with its given ionisation stage: all databases will return data concerning that specific query. On the other hand, it is easy to query the wrong ionisation stage by going directly to the different databases. For example, a question from a user was: "I searched Ca+ in the wavelength range from 4800 Å to 4950 Å directly from NIST, from VALD, and from VAMDC/NIST and VAMDC/VALD. None of the transition wavelengths seem to be identical, what shall I take?" It turned out that the user had in fact queried Ca+ in NIST and Ca in VALD. The VAMDC query was obviously the same both for VAMDC/NIST and VAMDC/VALD, but the values seemed to differ between NIST and VAMDC/NIST. It turned out that VAMDC transfers by default wavelengths in vacuum, while a direct request to NIST provides wavelengths in air and a direct request to VALD provides wavelengths in vacuum. This simple example shows the difficulty that users might encounter.
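The size of the air/vacuum discrepancy in this example can be estimated with a short sketch. It assumes one widely used dispersion formula for standard dry air (Morton 2000), valid above roughly 2000 Å; which formula each database actually applies is not stated here, so the function names and the choice of formula are assumptions for illustration.

```python
def refractive_index_air(vacuum_wavelength):
    """Refractive index of standard dry air; wavelength in Angstrom (assumed formula)."""
    s2 = (1.0e4 / vacuum_wavelength) ** 2  # squared wavenumber in inverse micrometres
    return 1.0 + 0.0000834254 + 0.02406147 / (130.0 - s2) + 0.00015998 / (38.9 - s2)


def vacuum_to_air(vacuum_wavelength):
    """Convert a vacuum wavelength (Angstrom) to its value in air."""
    return vacuum_wavelength / refractive_index_air(vacuum_wavelength)


def air_to_vacuum(air_wavelength, iterations=5):
    """Invert vacuum_to_air by fixed-point iteration (the index varies very slowly)."""
    vacuum = air_wavelength
    for _ in range(iterations):
        vacuum = air_wavelength * refractive_index_air(vacuum)
    return vacuum
```

In the 4800 Å to 4950 Å range of the user's query, the two conventions differ by more than 1 Å, which is far larger than typical wavelength uncertainties and explains why none of the values appeared to agree.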
In spite of the major advances achieved by the VAMDC federation, we have identified several user issues with such an e-infrastructure, which, in turn, might prevent its wider adoption by several communities. Users have their habits with queries and output formats; to a lesser extent, they question the data directly obtained from long-established databases. Databases sometimes offer science services that facilitate the users' work; for example, helping the user to choose the right set of data from the database for their application. Other databases may include recommendations of their own data whenever, for identical processes, the database contains different values. For example, the BASECOL database keeps historical collisional sets even when new data sets are included for the same process, and data sets are labelled as recommended or not. This local feature is not currently transferred through VAMDC. Some knowledgeable users would like the VAMDC portal to provide the functionality to choose the databases prior to specific queries on species and processes, thus avoiding unnecessary waiting times. Users deploying applications such as CASSIS (Section 4.1) would appreciate the option to make total or partial copies of the databases locally in order to speed up their application. Such copies would be coupled to a rapid search for the availability of updates. Some users would also appreciate a lighter format than the XSAMS format.
We can therefore identify the following non-exhaustive and general issues to address in the future: (i) trust in a new numerical environment; (ii) the ability to quickly understand and compare data; (iii) the ability of users to choose the right data for their own usage and for the improvement/development of tools. Some of the future lines of work are outlined in Section 6. Closer collaborations between the VAMDC community and different user communities will be addressed through a VAMDC users committee whose representative will be part of our board.

Node Data Standardisation and Quality Control
An interesting impact that VAMDC has on participating databases is the pressure to introduce accurate and complete descriptions of data elements stored on individual nodes. Such descriptions are necessary for the uniform representation of A&M data in XSAMS. In practice, the construction of the translation dictionary between the local node and XSAMS requires unambiguous interpretation of data elements. While this sounds trivial for physical parameters like energy and frequency, the task becomes increasingly difficult when dealing with energy level classification, quantum numbers, mixed states, etc. For the nodes that store data compilations from different sources and authors (e.g., VALD), the connection to VAMDC provides a good motivation for setting internal standards. Without it, researchers maintaining the local node would be much less interested in spending time, for example, on converting or creating atomic energy level descriptions in the appropriate coupling scheme, following a format that is unique and can be automatically analysed to get the main quantum numbers. Once this work is done, the XSAMS description of energy levels becomes straightforward. Standardisation also opens opportunities for the verification of the basic quantum mechanics selection rules, cross-comparisons between data from different authors and/or between nodes, as well as trust assessments of expert data [174]. Correcting the mistakes, typos and misinterpretations discovered in this process significantly improves the quality of the data products and reduces the number of errors introduced between data production and its extraction through the VAMDC portal. Standards introduced by VAMDC play a decisive role in this improvement, while the practical work, of course, falls on the shoulders of people maintaining local nodes.
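Once the main quantum numbers can be extracted automatically, a consistency check of the kind mentioned above becomes simple to write. The sketch below tests the standard electric dipole (E1) selection rules on total angular momentum and parity; the function name and the representation of parity as a string are assumptions, and a flagged transition is not necessarily an error, since it may be a genuine forbidden (e.g., M1 or E2) line.

```python
def e1_rule_violations(j_lower, parity_lower, j_upper, parity_upper):
    """List the electric dipole selection rules that a transition breaks.

    An empty list means the transition is E1-allowed as far as J and parity go.
    """
    problems = []
    if parity_lower == parity_upper:
        problems.append("parity does not change")
    if abs(j_upper - j_lower) > 1:
        problems.append("|delta J| > 1")
    if j_lower == 0 and j_upper == 0:
        problems.append("J = 0 to J = 0")
    return problems
```

Running such a check over a compilation flags entries for expert review, for instance a line labelled as an allowed transition whose level assignments imply a parity-conserving jump.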

VAMDC and the FAIR Principles
VAMDC experts have been involved for several years in international data-sharing organisations (see Section 5.5) and anchored the Findable, Accessible, Interoperable and Reusable (FAIR) principles [175] into the design of the VAMDC infrastructure ante litteram (an early definition of the FAIR principles was given by Larry Lannon from CNRI in 2012 when he introduced the DAIR principles, in which the D, standing for Discoverable, would later be replaced by the F for Findable). The FAIR principles are implemented with a fine-grained granularity:

• Findable: data coming from the infrastructure are assigned persistently unique identifiers (Section 3.2.2), are described by rich metadata schemes and are indexed into public registries (Section 3.1.2).
• Accessible: the extraction query relies on open, documented standards (Section 3.1.1).
• Interoperable: the data extracted from VAMDC are formatted using the XSAMS standard (Section 3.1.1). Moreover, VAMDC implements widely adopted international data interoperability standards (Sections 3.1.3 and 3.2.2).
• Re-usable: the provenance and sources of all the data are documented in each data set extracted from VAMDC. Data tools are provided to convert VAMDC data into widely adopted community data formats (Section 3.1.3).

GrafOnto Collection of Scientific Plots
The construction of databases, extracted from tabular resources in spectroscopy, required VAMDC to pay attention to the spectral functions presented in the form of plots and figures. The basis for creating a database of spectral functions was a digital library of more than a thousand articles with plots and figures, containing spectral functions for the problem of continuum absorption [176], the properties of weakly bound complexes, including atmospheric molecules [177], and the absorption cross sections in the near and far ultraviolet ranges. In these subject domains, the amount of graphical information is much larger than the amount of tabular information. We identified typical examples of plots and figures under study and created ontologies [105] of graphical resources, tools for integrating plots and means for citation quality assessment [107].
At present, the collection contains about 3500 original and 1055 cited primitive plots combined into 980 composite plots and 166 composite figures. The uploaded plots describe the properties of 19 molecules, 44 complexes and 66 mixtures. In total, 2338 primitive plots characterise the properties of a water molecule and water dimer. The GrafOnto graphics system is available at its URL (http://wadis.saga.iao.ru/rdf/plot/plot.rdf/).

New On-Line Databases and Community Portal
The VAMDC project has directly stimulated the creation of new online databases such as TFSiCaSDa and UHeCaSDa [19], and the deployment of the theoretical PAH database within VAMDC [3,36], while motivating the development of an experimental database for PAHs. The VAMDC infrastructure deployment also encouraged a similar strategy for experimental databases for solid-state data in SSHADE (Section 2.3.2). Collaborations with VAMDC directly led to a renewed effort to compile local data in India (see Section 2.3.5). The RADAM collaboration adopted the VAMDC standards and designed a RADAM portal (http://radamdb.mbnresearch.com/#) that offers its community access to the VAMDC-connected databases, in addition to some other databases.
The Belgian repository of fundamental atomic data and stellar spectra (BRASS) [178], which aims to provide the largest systematic and homogeneous quality assessment of atomic data to date in terms of wavelength, atomic and stellar parameter coverage, retrieved atomic data from various repositories and cross-matched these data. The majority of repositories were queried using the VAMDC e-infrastructure, and the BRASS consortium thus acknowledged the efforts of the VAMDC team in homogenising the repositories, as this has helped to expedite their comparisons and cross-match work.

Sustainability Issues
Today's science often seeks to treat complex applications such as the modelling of planetary atmospheres. The large amount of data needed to make realistic simulations is in sharp contrast to what a single researcher or institution is able to collect. Therefore, there is a common need and interest for large repositories of data and their accessibility by the community. Thus, the need for sustainability of scientific data has also been understood by funding agencies and, on a larger scale, by science policymakers. All national and international calls for scientific projects now routinely demand that applicants provide a sustainable way to make validated scientific data from the project available to the general community in the long term, often specifying open access.
The sustainability issue occurs on different levels. At the lowest level, original data from laboratory measurements, computations or field measurements, e.g., astronomical observation data, are stored in a repository such that at least their validity can be checked after many years. The next level concerns processed data being kept in a certain format to become available for later analysis in order to avoid unnecessary repetitions of work. All of this requires a minimum level of documentation in order to retrieve the data. In that respect, the association of data with a DOI or other identifiers is currently being developed and becoming available on national and international levels, but it requires infrastructure and organisation down to the level of individual institutions. Common standards such as the DOI are important means to reach a reasonable level of documentation and accessibility.
Databases connected to VAMDC offer a reliable and established way of supplying data that are processed even further and, therefore, are among the most advanced answers to the concern of sustainability. The VAMDC consortium has agreed on common standards and descriptions, which serve as a one-stop shop for a plethora of atomic and molecular data. By defining common schemes for the data stored in the various databases, VAMDC has reached an invaluable level of interoperability and, therefore, a unique level of sophistication. VAMDC is an open platform that welcomes databases from all over the world and, as such, provides a focal point for atomic and molecular data. As a result, VAMDC serves as a role model to structure research data on international, national and local levels. We can cite, for instance, the DAT@OSU portal that gathers metadata for all research databases in the Bourgogne Franche-Comté region in France [179].
Sustainability issues are also related to human training and capabilities. In that respect, VAMDC has trained engineers and post-docs who are still involved in software development, database management and the development of standards, for the benefit of the public sector (in the VAMDC collaboration and in other domains) as well as of the private sector.

Impact on Open Science Initiatives and International Data Alliances
VAMDC delegates are active and have a high reputation in international data alliances, where they bring the VAMDC requirements and their expertise about open-data sharing. The current Executive Director of the Consortium represents VAMDC in the Group of European Data Experts in RDA (GEDE-RDA), and is one of the two co-chairs of this group. The aim of GEDE-RDA is to promote, foster and drive the discussions and consensus-forming in creating guidelines, core components and concrete data fabric configuration-building based on a bottom-up process. To achieve these goals, GEDE-RDA is composed of a group of European data professionals appointed by invitation from various European Research Infrastructures and some specialists from the Research Data Alliance. The core group includes delegates from 47 European research infrastructures. The chairs from the European Strategy Forum on Research Infrastructures (ESFRI) and the e-Infrastructures Reflection Group (eIRG) agreed to take an observer role. Through our participation in GEDE-RDA, both the VAMDC experience and vision directly contribute to the definition of international standards [182] and to the definition of the European Open Science agenda [183]. Since November 2019, the VAMDC Portal, the Species Database and the Query Store (see, respectively, Sections 3.1.3, 3.2.1 and 3.2.2) have been registered as parts of the European Open Science Cloud (EOSC) hub and VAMDC is a service provider for the EOSC marketplace.
Our involvement in Open Science has been acknowledged at the French national level, where the VAMDC executive director has been invited to join the Research Data College, which is responsible for defining and implementing the French Open Science data agenda.

Visions of the Future
When the VAMDC project was first conceived, the thinking was that if we could introduce a strict and complete standard for describing atomic and molecular data, and work out the transport protocol, the (standard) interface to the individual nodes and the common user portal, it would solve most of the problems limiting user access to the variety of data collections and data formats available on the web. VAMDC has made enormous progress in establishing the XSAMS standard for describing energy levels in atomic and molecular species, the node software (with its dictionaries for quick and reliable interfacing to the individual nodes) and the portal. VAMDC introduced a registry for the regular indexing of node status and data content, publishing tools to encourage data providers to publish new data in VAMDC, query storage and many other tools. Essentially, all the initial goals have been achieved and we have demonstrated all the functionality and reliability aspects of the new infrastructure. Now, after several years of using and promoting VAMDC to other users, we see that our original goals, while ambitious, were not sufficient to make VAMDC the ultimate source of A&M data. The problems that an average VAMDC user faces include difficulties in automatically analysing XSAMS files, restrictions on their size, complications in cross-identification between XSAMS files generated by different nodes, and some missing functionality that the nodes offer through their native interfaces. Thus, the work on VAMDC is far from complete, but it has a strong foundation and an extensive toolbox that can be built upon in the future.
In order to enhance the usefulness of the VAMDC system in the delivery of A&M data to a wider community of users, VAMDC needs a more "understanding" interface, much more flexible means for extracted data manipulation and new complex data selection constraints, for example, based on models of the environment. In the following sections, we discuss our plans to develop VAMDC in the future.

User Interface
Our attempts to be very precise in describing atomic and molecular configurations with XML schema have had an unexpected adverse effect on the user interface implemented in the portal. At first sight, the portal seems to have an amazing flexibility (one can start from specific energy levels or processes), but the flexibility is massively reduced by the rigidity of the XSAMS requirements and the "restrictables" supported by each node (restrictables refer to the data types for which selection criteria can be included in the VAMDC query, https://standards.vamdc.eu/dictionary/restrictables.html). The only field that has some rudimentary attempt at translating user inputs into alternative representations is the field "species", which allows auto-completion. This is made possible through the species database described in Section 3.2.1. In the future, VAMDC will explore the possibility of creating/adopting an engine for the interpretation of much more relaxed formulations of the user's request, using iterations that gradually narrow the requested data set to the user's needs. This more intuitive approach is complemented by the final query format available to the user (e.g., through the Query Store described in Section 3.2.2), allowing for the quick automation of the process.

Complex Restrictions in VAMDC Request
Allowing for fuzzy definitions in the user interface through the portal may (and will) require additional levels of complexity in the request. For example, the interaction with the user may result in more than one complementary request being previewed before a final decision is made. This can be further extended to model-based selection, where constraints are set on properties sensitive to the environment, for example, spectral line strengths in the Earth's atmosphere or in a given experimental or industrial setup. Estimating such derived properties requires additional information, such as the equation of state and partition functions, which, at the moment, are not offered through the VAMDC interface. Another example of such a complexity layer that could be useful is the ability to select multiple processes such as Raman transitions. Some molecular databases contain Raman transitions (e.g., MeCaSDa and SHeCaSDa).
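To illustrate why model-based selection depends on partition functions, the following minimal sketch rescales a line intensity from a reference temperature to the temperature of the environment under local thermodynamic equilibrium, using the standard temperature-scaling relation found in line-list databases such as HITRAN. The function names, the sample lines and the partition-function values are illustrative assumptions; none of them is VAMDC data or part of an existing VAMDC interface.

```python
import math

C2 = 1.4387769  # second radiation constant hc/k, in cm K


def scale_intensity(s_ref, nu, e_lower, q_ref, q_t, t, t_ref=296.0):
    """Rescale an LTE line intensity from t_ref to t (temperatures in K).

    s_ref      : line intensity at t_ref
    nu         : transition wavenumber (cm^-1)
    e_lower    : lower-state energy (cm^-1)
    q_ref, q_t : total partition function at t_ref and at t
    """
    boltzmann = math.exp(-C2 * e_lower / t) / math.exp(-C2 * e_lower / t_ref)
    stimulated = (1.0 - math.exp(-C2 * nu / t)) / (1.0 - math.exp(-C2 * nu / t_ref))
    return s_ref * (q_ref / q_t) * boltzmann * stimulated


def select_lines(lines, q_ref, q_t, t, threshold):
    """Keep the lines whose rescaled intensity reaches the threshold at t."""
    return [
        ln for ln in lines
        if scale_intensity(ln["s_ref"], ln["nu"], ln["e_lower"], q_ref, q_t, t) >= threshold
    ]


# Placeholder lines (not real data): one low-energy and one high-energy lower state.
lines = [
    {"nu": 2100.0, "e_lower": 50.0, "s_ref": 1.0e-20},
    {"nu": 2150.0, "e_lower": 1500.0, "s_ref": 1.0e-22},
]

# Hypothetical partition-function values at 296 K and 1000 K.
strong_at_1000k = select_lines(lines, q_ref=100.0, q_t=400.0, t=1000.0, threshold=1.0e-30)
```

Without the two partition-function values the rescaling cannot be evaluated, which is why a selection of this kind needs more information than the current interface returns.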

Large Data Sets
As discussed above, the current VAMDC portal is not designed to easily handle databases with many millions or even billions of transitions. At the smaller end of these, it is possible to include the database but limit the amount of data that can be downloaded. At present, very large databases are simply excluded. A possible solution could be to allow asynchronous queries followed by an e-mail sent to the user with a link to the result.
More concise data sets may require sacrificing the completeness of XSAMS. Formats that represent projections of XSAMS structures onto single or multiple tables, and independent tables such as FITS, are not able to carry the complex inter-relations of the data model. Indeed, FITS is an open standard mainly designed for image transport (as its name indicates). It may still be used in specific cases when requested data can be arranged in a table, for example, a line list with data for computing opacity tables. In this case, the FITS headers could provide the metadata explaining the meaning of numerical elements. In more general cases, the complexity of relations in atomic and molecular physics calls for tree-based formats like XSAMS, that is, a graph-oriented data model.
Relational sets of tables could be the right compromise. The most concise but generally useful format may be an SQLite file containing the query results in the form of an independent, relational database. Supplying extracted data as miniature databases also provides an easy way to query within the data subset stored on the users' own computers. SQLite is a well-established tool with widespread support in scientific computing. Ultimately, this approach allows a user to obtain a database that is a cached copy of the data in a VAMDC node, and this can readily be refreshed from that node as data are added or changed in the node. The VAMDC infrastructure will be developed to support these cached copies and to embed them into applications, as has been demonstrated in the user case studies discussed above in Section 4.
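As a rough illustration of the miniature-database idea, the sketch below stores a query result as two related tables in SQLite and then queries it locally with Python's standard `sqlite3` module. The table layout, column names and numerical values are assumptions made for this example; they are not a schema defined by VAMDC, and the rows are placeholders rather than real data.

```python
import sqlite3


def build_cache(path, species, transitions):
    """Write extracted data into a small relational SQLite database."""
    con = sqlite3.connect(path)
    con.executescript(
        """
        CREATE TABLE IF NOT EXISTS species (
            id      INTEGER PRIMARY KEY,
            formula TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS transitions (
            id         INTEGER PRIMARY KEY,
            species_id INTEGER NOT NULL REFERENCES species(id),
            wavenumber REAL NOT NULL,  -- cm^-1
            einstein_a REAL            -- s^-1
        );
        """
    )
    con.executemany("INSERT INTO species (id, formula) VALUES (?, ?)", species)
    con.executemany(
        "INSERT INTO transitions (species_id, wavenumber, einstein_a) VALUES (?, ?, ?)",
        transitions,
    )
    con.commit()
    return con


# Placeholder rows standing in for data extracted from a node.
# ":memory:" keeps the example self-contained; a file path would create a reusable cache.
con = build_cache(
    ":memory:",
    species=[(1, "CO"), (2, "H2O")],
    transitions=[(1, 2143.27, 34.0), (1, 2147.08, 33.0), (2, 1594.7, 20.0)],
)

# Local query within the cached subset: CO lines in a wavenumber window.
rows = con.execute(
    """
    SELECT s.formula, t.wavenumber
    FROM transitions AS t
    JOIN species AS s ON s.id = t.species_id
    WHERE s.formula = ? AND t.wavenumber BETWEEN ? AND ?
    ORDER BY t.wavenumber
    """,
    ("CO", 2140.0, 2150.0),
).fetchall()
```

Because the relation between species and transitions is kept as a foreign key, the file carries part of the structure of the data model while remaining a single portable artefact.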

XSAMS Manipulation
We have previously considered creating a library for XSAMS manipulations, but we found this option to be insufficiently flexible. We will adopt a scripting language capable of handling one or several XSAMS data sets, allowing the creation of any simplified projection of XSAMS, unit conversion, cross-identification between XSAMS, formatting of the output, handling conditions, loops and data structures. The language should be simple enough for general users to learn (e.g., resembling Python or other popular programming/scripting languages). The language will be supported by a library of critical functions such as automatically analysing XSAMS, unit conversion, etc. This will provide a different level of flexibility for end users that could be integrated with asynchronous queries to the VAMDC request or with automated sequences of multiple requests.
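The kind of operation such a language would have to express can be sketched in plain Python: projecting a tree-structured document onto a flat table and converting units along the way. The XML below is a toy document with simplified, hypothetical element names; it does not follow the real XSAMS schema, and the function is an illustration rather than a planned VAMDC API.

```python
import xml.etree.ElementTree as ET

# Toy XSAMS-like document (element and attribute names are assumptions).
TOY_XML = """
<Data>
  <Species id="S1" formula="CO"/>
  <Transition species="S1">
    <Wavenumber units="1/cm">2143.0</Wavenumber>
  </Transition>
  <Transition species="S1">
    <Wavenumber units="1/cm">2147.0</Wavenumber>
  </Transition>
</Data>
"""

C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def project(xml_text):
    """Flatten the tree into rows of (formula, wavenumber in cm^-1, frequency in GHz)."""
    root = ET.fromstring(xml_text)
    # Resolve species references once, so each transition row is self-describing.
    formulas = {s.get("id"): s.get("formula") for s in root.iter("Species")}
    rows = []
    for tr in root.iter("Transition"):
        wavenumber = float(tr.find("Wavenumber").text)
        frequency_ghz = wavenumber * C_CM_PER_S / 1e9  # nu = c * wavenumber
        rows.append((formulas[tr.get("species")], wavenumber, frequency_ghz))
    return rows


table = project(TOY_XML)
```

The projection deliberately discards everything except the columns requested, which is the trade-off between conciseness and completeness discussed for tabular formats.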

Prototype of a New Bibliographic Service
We note that all the VAMDC data have a direct reference to the publications where the data were originally presented, described and explained. The idea behind the new bibliographic service we have designed and are now prototyping is to use this bibliographic link "in the reverse direction" and discover data through publications. The resulting digital library portal, inspired by the Astrophysics Data System (https://ui.adsabs.harvard.edu), provides the community with a unique one-stop shop for the bibliographic information contained in VAMDC; the entry point is a tool where users may look for publications. The proposed filtering criteria for refining the search are the author(s), title and year of publication. The results corresponding to the submitted bibliographic query are displayed in tabular form, where each line corresponds to a publication. For each publication:
• VAMDC provides the main bibliographic information complemented by the list of the VAMDC nodes containing data related to that particular publication;
• when supported by the node, VAMDC provides direct link(s) for extracting the XSAMS data related to the publication from the VAMDC node.
This new service may be used as a tool for identifying data set overlaps between VAMDC nodes.
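The reverse use of the bibliographic link, and its application to overlap detection, can be sketched as an inverted index from publications to nodes. The node names and reference identifiers below are hypothetical placeholders, and the functions illustrate the idea only; they are not the implementation of the prototype service.

```python
from collections import defaultdict

# Hypothetical (node, reference) pairs, as they might be harvested from node metadata.
records = [
    ("node_a", "doi:10.0000/example.1"),
    ("node_b", "doi:10.0000/example.1"),
    ("node_a", "doi:10.0000/example.2"),
]


def index_by_publication(records):
    """Invert the data-to-reference link: map each publication to the nodes citing it."""
    index = defaultdict(set)
    for node, reference in records:
        index[reference].add(node)
    return index


def shared_publications(index):
    """Publications cited by more than one node, i.e. candidate data set overlaps."""
    return {ref: sorted(nodes) for ref, nodes in index.items() if len(nodes) > 1}


overlaps = shared_publications(index_by_publication(records))
```

A shared reference only indicates a possible overlap; whether the nodes hold the same quantities still has to be checked on the data themselves.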

Semantic Search of Qualitative Tabular and Graphic Resources in Quantitative Spectroscopy
A semantic search for information resources is possible for systematic data, provided there is an explicit description of their properties in the form of ontologies (logical theories) [105]. In the Semantic Web approach [184], the rules for constructing such ontologies are defined in the languages of their specification [185]. In VAMDC, the quality of spectral resources is determined by the data standardisation. This includes their analysis, validity control and trust assessment (see Section 5.1). A semantic search in molecular spectroscopy was implemented in 2007-2019 to find qualitative information resources in quantitative spectroscopy of atmospheric molecules [186], qualitative energy levels and molecular transitions (http://wadis.saga.iao.ru/saga2/ontology-3atomic-molecules/) [187,188], as well as spectral functions [189], used to solve the problems of continuum absorption and to describe the properties of weakly bound complexes. The accumulated experience allowed systematic organisation of the semantic searches for high-quality spectral resources, states and transitions, as well as spectral functions in atomic, ionic and solid-state spectroscopy that describe the processes of absorption and emission.
A crucial task in the coming years is the organisation of a semantic search for spectral data in applied fields (astronomy and atmospheric optics of exoplanets), in which atoms, ions, molecules and complexes play an important role. The systematisation of these subjects will simplify the search for necessary spectral information. The main challenge that needs to be overcome is the construction of a template that allows the user to submit their requirements to be directly implemented as a set of queries to a semantic site. For this, it is necessary to build typical ontologies [105] of applied domains. The crucial task here is a reduction problem, defining a structure for each of the applications, which requires cooperation with experts in these applied domains.

Visualisation and Data Access from Python
Good data visualisation is important, particularly in data set exploration. For larger data sets, dynamic display becomes essential, wherein the user scrolls through the data and zooms into areas of interest. Currently, VAMDC provides some static displays of data (generated on the servers as web pages) and some dynamic displays (generated by a program on the user's computer using data downloaded from VAMDC). Neither of these approaches works well with large data sets; static pages are not sufficient and downloading large and rich data sets to plot only a small fraction of the data is frustratingly slow.
Plotting large data sets is a common problem in scientific computing, and, in recent years, some solutions have been developed. The Python libraries Bokeh (https://bokeh.org), Holoviews (http://holoviews.org) and Datashader (https://datashader.org) can be combined to build plots that can be generated on a server, viewed in a web browser and that are usefully interactive even with large data. The graphics packages also work well in Jupyter notebooks (https://jupyter.org/). To exploit this, we should provide code that allows such a notebook to extract data from VAMDC, either via a downloaded XSAMS file or directly from data nodes.
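Much of the benefit of server-side plotting comes from aggregating the data before anything is sent to the browser, so that the amount transferred depends on the size of the plot rather than on the size of the data set. The standard-library sketch below shows that idea in its simplest form, reducing a synthetic line list to one value per horizontal bin and recomputing the bins when the user zooms. It is not Datashader's implementation and uses none of its API; the function name and the random data are assumptions made for the illustration.

```python
import random


def bin_max(positions, intensities, lo, hi, n_bins):
    """Reduce a line list to n_bins values: the strongest line in each bin of [lo, hi)."""
    width = (hi - lo) / n_bins
    strongest = [0.0] * n_bins
    for x, y in zip(positions, intensities):
        if lo <= x < hi:
            # Guard against floating-point rounding at the upper edge.
            i = min(int((x - lo) / width), n_bins - 1)
            strongest[i] = max(strongest[i], y)
    return strongest


# Synthetic line list standing in for a large data set held on the server.
random.seed(0)
positions = [random.uniform(0.0, 1000.0) for _ in range(100_000)]
intensities = [random.random() for _ in positions]

# Whatever the zoom level, only 800 numbers need to be sent to the browser.
overview = bin_max(positions, intensities, 0.0, 1000.0, 800)
zoomed = bin_max(positions, intensities, 100.0, 110.0, 800)
```

Keeping the maximum per bin preserves strong isolated lines that plain subsampling could drop; other reductions (counts, sums) would suit other kinds of plot.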

Link Data to Evaluation Ratings
We foresee handling the issue of linking data to evaluation ratings performed by review panels. The intention is to facilitate user choices among data sets provided by the various databases. This will involve a scientific organisation within VAMDC, collaborations with evaluation bodies and panels and, finally, technical developments in order to use the reviewed information.

Open Science for the Future
VAMDC is collaborating with the EOSC Secretariat to study an innovative Flexible Semantic Mapping Framework to achieve scientific and interdisciplinary interoperability between the services within the European Open Science Cloud. The VAMDC consortium has explored the development of innovative services for automatic data quality assessment based on the interlinking between data and scientific papers (see Section 3.2.2) in combination with Artificial Intelligence and semantic techniques. In the context of Open Science, this approach may be adopted by any other discipline. The VAMDC consortium is evaluating the opportunity to put these ideas into effect as part of a future European project proposal.

Quality Status and Future
It should be stressed that the VAMDC infrastructure does not collectively address the issue of the scientific quality of the data. The scientific quality of the data taken from the e-infrastructure is that of the individual databases. The VAMDC infrastructure is currently a technical platform that ensures that the exchanged data are compliant with the standards, in particular with the data format standards. As described in Section 5.1, the standardisation imposed quality/coherence checks at the individual nodes, and those checks were beneficial to the overall quality of the outputs. The data nodes must be compliant with an internal quality chart voted on by the board (http://www.vamdc.org/structure/how-to-join-us/internal-regulations/). Specific quality requirements are that references must be attached to the data, that the transferred data are timestamped and that the various queries are stored in the "Query Store" to ensure the tracing of the data. When uncertainties are available in the databases, this information is transferred through VAMDC.
In addition, as described above, there is a plan to attach external rating reviews to data: rating reviews might concern the comparison of several sets of data for a single range of applications or might concern a single set of data to provide its range of applicability. Obviously, we should be aware that reviews are only a guide; they depend upon the group of people who performed the review, they can rapidly become out of date and should be updated. VAMDC offers an open infrastructure that allows third parties to establish some data assessment and to provide such services. VAMDC collects data and facilitates the comparison of data from different databases that might contain identical quantities: since all VAMDC nodes produce XSAMS-formatted output data, it is easy for a user to identify overlaps between databases, e.g., by identifying when the same data units have different values.
We would like to stress that VAMDC provides support to the work of scientists, but that users should be pro-active with VAMDC to enrich the scientific information. The system and service can be steadily improved and augmented by collecting and providing more central statistics about the content of the attached databases. Currently, the species database allows for the display of the species contained in individual databases, but it could additionally list the processes and their range of applicability. The bibliographic service will display the papers attached to the data from the different databases, thus providing another type of central information system. Additional useful central information would be to display what fraction of data content in each database is accessible through VAMDC (sometimes it is 100% of the data, sometimes it is not).
Finally, some users would certainly like to have a "one-stop shop" for their application and such an effort can be made by users since VAMDC offers the technical possibility to easily build secondary data sets (see the discussion of impact on the BRASS database in Section 5.3.2). The VAMDC infrastructure could also provide the means to display such "user oriented secondary data collections" if the users wished to make them publicly available. The VAMDC consortium could also be the place where such demands are made and then collaborative projects could provide such "user oriented data sets". The communities should be aware that the VAMDC effort has already been huge, that those who currently maintain the databases and infrastructure do so efficiently and with little manpower and financial resources, and that communities must get involved/engaged with VAMDC if they want to ensure the service they want.

Sustainability
The above technological and scientific innovations are driving forces for the sustainability of the VAMDC e-infrastructure. However, the VAMDC consortium must address the long-term issues of renewing the scientific and technical people involved in the curation of atomic and molecular data and of maintaining the leadership of these activities at/across institutes and in/across countries.

Conclusions
VAMDC is a sophisticated infrastructure that makes large sets of data publicly available in a common format.
A fundamental feature of VAMDC is its ability to connect more databases from atomic and molecular domains already described in VAMDC, but also from other domains. Therefore, it serves as an ideal platform for the sustainability of scientific data.
A crucial key to the evolution of VAMDC is its usage in application software and the involvement of users in the orientation of the VAMDC e-infrastructure platform and services. Here, we have presented a few user case studies to illustrate this.
A striking point in the evolution of VAMDC is its involvement in data management at the highest international level. This places the VAMDC e-infrastructure as an international prototype and example, where general concepts are developed and applied, thus providing to the international and national communities diverse high-level expertise in data management and handling.
In this paper, we have demonstrated once again the strength and coherence of the VAMDC concept and the quality and innovation in the development of the standards and software solutions that the VAMDC consortium has delivered. This summarises the advancement of a ten-year effort of a large group of leading scientists. The paper has also shown how the VAMDC continues to question the current status and to identify current weaknesses, while outlining how it will upgrade its services (Sections 4.4 and 6) and make future improvements (see Section 6).
Finally, we provide help to newcomers and to users through detailed documentation and tutorials (http://www.vamdc.org/activities/research/). A 'helpdesk' (email: support@vamdc.eu), led by experts, is available to answer questions, to link the user with the appropriate organisers of the relevant databases, and to support the inclusion of new nodes into VAMDC.

Figure 1. Example of the results displayed by the Virtual Atomic and Molecular Data Centre (VAMDC) portal after performing a given query. The processed query is highlighted in the Your Request box.

Figure 2. Example of a request made within Centre d'Analyse Scientifique de Spectres Instrumentaux et Synthétiques (CASSIS) for an observed transition of CO (black) using the CDMS node of VAMDC for the line identification and Local Thermodynamic Equilibrium (LTE) modelling (red) (see text). The right side of the figure shows the species available in the database and the possible thresholds on the Einstein coefficients and upper energy levels. The V_LSR is also given for the shift applied to the databases' frequencies, and the green bars below the spectrum correspond to the possible transitions within the frequency range.

Figure 3. Spectra towards the high-mass star-forming region G327.3−0.6, with the intensity in brightness temperature units (T_B), observed with the Atacama Large Millimeter/submillimeter Array (ALMA). The myCLASSFit function was used to model the observational data shown in black. The synthetic spectrum, including all identified molecular species, is shown in grey, with its intensity multiplied by a factor of −1. This study aimed at finding and quantifying ethyl cyanide (C2H5CN) in these dense spectra. The contributions from the C2H5CN v = 0 and C2H5CN v13 + v21 = 1 transitions are highlighted in orange and blue, respectively (Reproduced from [169]; copyright Elsevier (2020)).

Figure 4. Example of parameter maps created by the myXCLASSMapFit function using nine transitions of CH3OCHO simultaneously, taken from an ALMA data set of the core of G24.78 [170].

Table 2. List of databases connected with the Query Store service. The databases marked with a star (*) are currently being connected to the Query Store and the test phase is in progress.
4.1. CASSIS Software
Centre d'Analyse Scientifique de Spectres Instrumentaux et Synthétiques (CASSIS; http://cassis.irap.omp.eu) is standalone software written in Java, freely delivered to the community to help in visualising, analysing and modelling observations from ground- or space-based observatories. It has been developed at the Institut de Recherche en Astrophysique et Planétologie (IRAP) since 2005 and is part of the Observatoire Virtuel du Grand Sud-Ouest (OVGSO) data centre, which aims to promote VO technology. It displays any spectra (ASCII, fits one or more complete (FITS) or GILDAS/CLASS (https://www.iram.fr/IRAMFR/GILDAS/) format or the result from a query to any SSAP (http://www.ivoa.net/documents/SSA/20120210/) VO or EPN-Table Access Protocol (TAP)

• [181] has been active in the Research Data Alliance since 2014 (Section 3.2.2), when VAMDC became an early pilot for the data-citation recommendation. Starting from 2016, VAMDC took a leading role in the RDA-Federated Identity Management Interest Group: the RDA recommendations produced by this Interest Group [180] incorporate ideas and needs coming from the authentication, authorisation and accounting strategy we developed for the VAMDC consortium [181].
• Since 2016, we have worked with the IVOA on converging VAMDC and the IVOA atomic and molecular standards (http://www.ivoa.net/documents/SLAP/, http://www.ivoa.net/documents/SLAP/).