Cosmic-ray database update: ultra-high energy, ultra-heavy, and anti-nuclei cosmic-ray data (CRDB v4.0)

We present an update on CRDB (https://lpsc.in2p3.fr/crdb), the cosmic-ray database for charged species. CRDB is based on MySQL, queried and sorted by jquery and tsorter libraries, and displayed via PHP web pages through the AJAX protocol. We review the modifications made to the structure and outputs of the database since the first release (Maurin et al., 2014). For this update, the most important feature is the inclusion of ultra-heavy nuclei ($Z>30$), ultra-high energy nuclei (from $10^{15}$ to $10^{20}$ eV), and limits on anti-nuclei fluxes ($Z\leq -1$ for $A>1$); more than 100 experiments, 350 publications, and 40000 data points are now available in CRDB. We also revisited and simplified how users can retrieve data and submit new data. For questions and requests, please contact crdb@lpsc.in2p3.fr.


Introduction
Cosmic-Ray (CR) physics was established more than a century ago, and measurements have kept improving throughout this period. Besides preserving the measurements for their historical value, the motivations for a CR database are numerous.
Firstly, GeV CRs vary with the Solar cycle, and past data are still of interest; for an illustration of the usage of CR data going back to the 1950s, see Ghelfi et al. [1]. Secondly, it is infeasible to design and build CR experiments that measure all species at all energies, so that multi-species studies over different energy ranges have to rely on many datasets from as many experiments; for an illustration of the usage of data samples of H to Ni elements over 45 years, see Shen et al. [2]. Thirdly, new synergies with other fields of astrophysics make the collection of even the rarest CR data desirable. For instance, ultra-heavy elements (UHCRs, Z > 30) have very small fluxes and are difficult to measure. They require dedicated experiments and so far the data are very sparse (e.g. Donnelly et al. [3] and Binns et al. [4]). UHCRs have not been a very active topic in the last decade, but the situation is likely to change: the first detection of gravitational waves from a binary neutron star inspiral [5], and in particular its optical follow-ups, indicated that neutron star mergers could be a major contributor to r-process nucleosynthesis for UHCRs [6,7]. Fourthly, in the last two decades, a growing number of high-precision experiments were developed to measure leptons, antimatter, and Z < 30 nuclei in the tens of MeV to hundreds of TeV range (AMS-02, PAMELA, TRACER, Voyager, etc.). These balloon-borne or space-based experiments make it possible to address many astrophysics questions [e.g. 8], including searches for dark matter [e.g. 9]. In the context of these searches, upper limits on

Database structure and updates
Throughout the paper, we separate the data into two broad categories, data and meta-data (data about data). Clearly separating these two categories is important to define the format to submit new data (see Sect. 3). The categories comprise the following items:
• Data: CR data points, including energy range and data uncertainties;
• Meta-data: CR data-related information that includes data taking periods, experiments that provide the data, publications from which the data were taken, etc.
The first CRDB publication [22] focused on the rationale behind the design choices made for the database structure and organisation (in MySQL). In this update, we follow a complementary presentation path. We start at the data level, moving up to the experiments and then publications, underlining some salient aspects of the data themselves and how they are tied to the other meta-data tables. Along the way, we highlight the changes made between CRDB v2.1 [22] and CRDB v4.0.
The database structure is shown in Fig. 1. By design, each entry has a unique ID in all tables, so that all tables can be accessed by other tables; this is particularly useful to build bridge tables (i.e. building associations between table entries). The different tables, their contents, and the associations between data and meta-data tables are further detailed in the following subsections.

DATA table
A data point corresponds to a quantity measurement (with uncertainties) at a given energy. We list below what keys are needed for a full data entry description in the DATA table (see Fig. 1). The only change made since CRDB v2.1 is enabling data upper limits (new key IS_UPPER_LIMIT).
• ID: unique identifier for a data point.
• NUM and DEN: direct detection CR experiments measure fluxes (in units of $E_{\rm unit}^{-1}$ m$^{-2}$ s$^{-1}$ sr$^{-1}$) or ratios (no unit), with $E_{\rm unit}$ in GeV, GeV/n, or GV in CRDB. Depending on the detector charge and isotopic resolution, the measured quantities are isotopes (e.g. $^{12}$C), elements (e.g. C), or even groups of elements; the latter case mostly applies to UHCR and UHECR data (see next sections). Indirect detection CR experiments also provide shower-related quantities (with their specific units) that we plan to add in the future. To allow for this flexibility, we define a CR_QUANTITY table which contains a list of names including CR isotope names, element names, and names for relevant groups of elements or other significant measurable quantities (e.g. AllParticles, see Table A2 for new names in CRDB v4.0): a data quantity is thus a single item (NUM) or a ratio of two items (NUM and DEN).
• E-AXIS: depending on the detector type (calorimeter, spectrometer, etc.), different energy types are measured in experiments and provided in publications. A conversion into a common unit is not possible for all data sets and therefore CRDB stores the data in the original energy type, which is one of the following: total ($E_{\rm tot}$) or kinetic energy ($E_k = E_{\rm tot} - m$) in GeV; rigidity ($R = p/(Ze)$) in GV; kinetic energy per nucleon ($E_{k/n} = E_k/A$) in GeV/n; or total energy per nucleon ($E_{\rm tot/n} = E_{\rm tot}/A$) in GeV/n.
• E_MEAN, E_BIN_L, and E_BIN_U: fluxes are measured by counting events in energy bins. In CRDB, we include the energy interval range $[E_{\rm lo}, E_{\rm up}]$ when it is published. The measured flux obtained from a finite bin corresponds exactly neither to the differential flux at the bin centre nor to that at the mean energy $\langle E \rangle$, unless a correction is applied [25]. While it is possible to approximately correct the data, the exact solution is to instead integrate a flux model over the bin width before comparing it to data [26]. If only a single value $\langle E \rangle$ is provided, we set $E_{\rm lo} = E_{\rm up} = \langle E \rangle$; conversely, if only the bin range is provided, we set $\langle E \rangle = \sqrt{E_{\rm lo} E_{\rm up}}$.
• VALUE, VALUE_ERRSYST_L, . . . , IS_UPPER_LIMIT: a measurement has a value and statistical and systematic uncertainties. In CRDB, we store the value and the possibly asymmetric statistical and systematic uncertainties all individually; see App. A.2 for subtleties and caveats concerning the gathering of uncertainties from publications. In CRDB v4.0, upper limits are values whose key IS_UPPER_LIMIT is set to 1.
• SUBEXP_PUBLI_ID: a datum is always attached to a unique publication and a sub-experiment (see next sections). This key bridges the data ID (this table) to the subexp-publi ID (SUBEXP_PUBLI
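The energy-bin conventions described for E_MEAN, E_BIN_L, and E_BIN_U can be illustrated with a minimal Python sketch. It is not CRDB code: the function names and the power-law flux model (with an arbitrary index) are assumptions chosen only for illustration. The sketch computes the geometric-mean energy used when only the bin range is published, and the average of a model flux over the bin, which is the quantity that should be compared to a binned measurement.

```python
import math


def geometric_mean_energy(e_lo, e_up):
    """Representative energy sqrt(E_lo * E_up) of the bin [E_lo, E_up]."""
    if e_lo <= 0 or e_up < e_lo:
        raise ValueError("expected 0 < e_lo <= e_up")
    return math.sqrt(e_lo * e_up)


def power_law(energy, norm=1.0, index=2.7):
    """Illustrative differential flux model (assumption): norm * E^-index."""
    return norm * energy ** (-index)


def bin_averaged_power_law(e_lo, e_up, norm=1.0, index=2.7):
    """Average of the power-law model over [E_lo, E_up], from its analytic integral."""
    if e_lo <= 0 or e_up <= e_lo:
        raise ValueError("expected 0 < e_lo < e_up")
    if index == 1.0:
        integral = norm * math.log(e_up / e_lo)
    else:
        integral = norm * (e_up ** (1.0 - index) - e_lo ** (1.0 - index)) / (1.0 - index)
    return integral / (e_up - e_lo)
```

For a pure $E^{-2}$ spectrum, the bin-averaged flux equals the differential flux at the geometric mean exactly; for other spectral indices the two differ, which is why integrating the model over the bin is the safer comparison.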

EXP and SUBEXP tables and meta-data
CR data are collections of data points at different energies, from an experiment measuring one or several quantities. In CRDB, for flexibility, we make the distinction between an experiment and sub-experiment (hereafter sub-exp for short). The experiment is defined by its name and starting date. The sub-exp is linked to the experiment, but is more closely related to the data in order to uniquely tag the dataset released by this experiment. The tag is here to inform us that the data correspond to (i) analyses from different data taking periods, (ii) data from the same period but from different analysis techniques or different sub-detectors, (iii) data reconstructed relying on different third-party model assumptions (e.g. using different Monte Carlo generators for air showers in UHECRs).
Experiment and sub-experiment data are meta-data (attached to CR data) stored in the EXP and SUBEXP tables respectively (see Fig. 1); they are briefly described below.

EXP table
Only one change has been made for CRDB v4.0, and it is related to the experiment type (key TYPE, see below).
• ID: unique identifier for an experiment.
• EXPNAME: unique name associated with the main instrument (AMS-01, KASCADE, etc.). Most of the data before the 1990s come from balloons, and to ensure a unique identifier, CRDB uses the syntax: Balloon (YYYY) for a single flight, Balloon (1966,1967,...) for balloons flown several times, and Balloon (1967+1968+...) for experiments that report only the combined analysis of several flights.
The SUBEXP table contains detailed information on possible sub-detector(s) from which the data were obtained; this includes the data taking conditions and, in some cases, the analysis technique. In CRDB v4.0, a new key was added to track the energy-scale uncertainty (i.e. a calibration uncertainty on the energy measurement), which is especially important for ground-based experiments where it exceeds 10 %. We list all the keys in the SUBEXP table (see Fig. 1) and then comment on the effect of Solar modulation on multi-GeV GCR data [27]. In particular, we stress how the last three keys listed below are necessary inputs for Solar modulation calculations.
• ID: unique identifier for a sub-experiment.
• EXP_ID: a subexp is always attached to a unique experiment (i.e. ID in the EXP
$(\Delta E/E)_{\rm scale}$, which applies to all energies measured by the sub-experiment. We refer the reader to App. A.3 for more details on the origin of this uncertainty.
• DATES: list of data taking periods for the instrument, taken from the publication or, if not reported in old balloon flights, retrieved from the StratoCat database.
• DISTANCE: most of the experiments are located on Earth, or are orbiting Earth. However, a few satellites (e.g. Ulysses, Voyager, etc.) took data at various locations in the Solar system. We provide in CRDB the average distance to the Sun in a.u. (an astronomical unit is the Sun-Earth distance) during the data taking period, as provided in the publications.
• SMALL_PHI: accurate Solar modulation models are complex and depend on many parameters. The Force-Field approximation [28-30] depends on a single parameter $\phi(t)$. It remains useful for many applications despite its limitations [31]; we therefore provide the mean modulation level $\phi_{\rm FF}$, calculated externally from $\phi$ time series (see below) over the sub-experiment data taking periods.
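To make concrete how a single modulation level enters a calculation, here is a minimal Python sketch of the Force-Field approximation in its commonly quoted form, written in kinetic energy per nucleon. It is an illustration only, not the code used for CRDB: the function name, the common nucleon mass of 0.938 GeV applied to all nuclei, and the interstellar (IS) flux passed by the caller are assumptions.

```python
M_NUCLEON = 0.938  # GeV; approximate nucleon mass, used for all nuclei (assumption)


def force_field_toa(j_is, ekn, phi, z, a):
    """Top-of-atmosphere flux at kinetic energy per nucleon `ekn` [GeV/n].

    j_is : callable giving the interstellar flux as a function of E_k/n
    phi  : Force-Field modulation level [GV]
    z, a : charge and mass number of the CR species
    """
    if ekn <= 0:
        raise ValueError("ekn must be positive")
    shift = abs(z) / a * phi          # mean energy loss per nucleon [GeV/n]
    ekn_is = ekn + shift              # corresponding interstellar energy
    factor = ekn * (ekn + 2.0 * M_NUCLEON) / (ekn_is * (ekn_is + 2.0 * M_NUCLEON))
    return j_is(ekn_is) * factor
```

With `phi = 0` the function returns the IS flux unchanged; for a falling IS spectrum, a positive `phi` lowers the flux, most strongly at low energy.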
In CRDB v4.0, we provide these three values in the SMALL_PHI keys (of the SUBEXP

PUBLI and SUBEXP_PUBLI tables and meta-data
CR data are published by experimental teams and collaborations in scientific publications. To save time and avoid confusion, we have a strong preference for including only final data published in peer-reviewed journals, and most data in CRDB come from refereed journals. An important exception is the proceedings of the biennial International Cosmic-Ray Conference (ICRC, started in 1947): until the 1990s, many balloon flight results were published in the ICRC proceedings only, and even nowadays these proceedings still often contain the latest preliminary results and updates of previously published results.

PUBLI table
A publication is described by the following keys.
• ID: unique identifier for a publication.
• HTML: unique ID for the reference, taken from the Astrophysics Data System (ADS), e.g. 2014A&A...569A..32M.
• REF and BIBTEX: these two keys provide a full publication reference and BibTeX entry, automatically retrieved from the ADS ID thanks to the ADS API (application programming interface).
• SUPERSEDED_BY: in some cases, the same data (same quantity from the same sub-exp) are re-analysed and updated in a subsequent publication. This key stores the ID (in the PUBLI

Deprecated keys in CRDB v4.0
One major modification in the PUBLI table in this release was to stop book-keeping Solar modulation information from the publication, i.e. Solar modulation levels calculated by the authors for their data. This feature was introduced in CRDB v2.1 (requiring many extra keys in the PUBLI table, see Maurin et al. [22]) in order to highlight the various assumptions behind the modulation levels provided. The motivation was to ease comparisons between modulation levels for data taken at similar periods but provided by different experimental teams. However, picking this information from the publication was time consuming for very little practical use in retrospect$^{15}$. Moreover, as stressed in the previous section, modern CR propagation model studies either make use of $\phi_{\rm FF}$ or rely on public Solar modulation codes, and the information stored in the SUBEXP table allows both practices (see Sect. 2.2.2). A major benefit of not considering the modulation from the publication is that it simplifies the structure of the database, the extraction of data, and the submission of new data.

SUBEXP_PUBLI bridge table
To avoid duplicates in cases where different data refer to the same publication or the same sub-experiment, a bridge table (SUBEXP_PUBLI, see Fig. 1) links entries from the SUBEXP and PUBLI tables. Its keys are as follows.
• ID: unique identifier for a subexp-publi association.
• QUANTITIES: lists all the quantities (fluxes, ratios, etc.) provided by a given sub-exp/publi association.
• DATA_UPLOAD: stores the date on which the corresponding data were uploaded.
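The role of the bridge table can be sketched with an in-memory SQLite database in Python. This is a simplified, hypothetical schema and not the actual CRDB MySQL structure: only a few keys per table are kept, the `subexp_id` and `publi_id` columns of the bridge table are assumptions, and the inserted rows are placeholders rather than real measurements.

```python
import sqlite3

# Simplified, hypothetical schema: column names loosely follow the keys in the text.
SCHEMA = """
CREATE TABLE exp (id INTEGER PRIMARY KEY, expname TEXT UNIQUE);
CREATE TABLE subexp (id INTEGER PRIMARY KEY, exp_id INTEGER REFERENCES exp(id), name TEXT);
CREATE TABLE publi (id INTEGER PRIMARY KEY, html TEXT UNIQUE);
CREATE TABLE subexp_publi (
    id INTEGER PRIMARY KEY,
    subexp_id INTEGER REFERENCES subexp(id),
    publi_id INTEGER REFERENCES publi(id),
    quantities TEXT,
    data_upload TEXT
);
CREATE TABLE data (
    id INTEGER PRIMARY KEY,
    subexp_publi_id INTEGER REFERENCES subexp_publi(id),
    num TEXT, den TEXT, e_mean REAL, value REAL,
    is_upper_limit INTEGER DEFAULT 0
);
"""


def build_demo_db():
    """Create the tables and fill them with placeholder rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO exp VALUES (1, 'DemoExp')")
    conn.execute("INSERT INTO subexp VALUES (1, 1, 'DemoExp (2000/01)')")
    conn.execute("INSERT INTO publi VALUES (1, 'demo-ads-id')")
    conn.execute("INSERT INTO subexp_publi VALUES (1, 1, 1, 'B/C', '2020-01-01')")
    conn.executemany(
        "INSERT INTO data (subexp_publi_id, num, den, e_mean, value) VALUES (1, ?, ?, ?, ?)",
        [("B", "C", 1.0, 0.3), ("B", "C", 10.0, 0.2)],
    )
    return conn


def points_with_metadata(conn, num, den):
    """Return data points joined to their experiment, sub-experiment, and publication."""
    query = """
        SELECT exp.expname, subexp.name, publi.html, data.e_mean, data.value
        FROM data
        JOIN subexp_publi ON data.subexp_publi_id = subexp_publi.id
        JOIN subexp ON subexp_publi.subexp_id = subexp.id
        JOIN exp ON subexp.exp_id = exp.id
        JOIN publi ON subexp_publi.publi_id = publi.id
        WHERE data.num = ? AND data.den = ?
        ORDER BY data.e_mean
    """
    return conn.execute(query, (num, den)).fetchall()
```

Each data row carries a single foreign key; the sub-experiment and publication meta-data are stored once and reached through the bridge table, which is what avoids duplicates.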

New data in CRDB v4.0
We now turn to the database content. As much as possible, data are added in CRDB on a continuous basis (by one of us); new data can also be submitted (see Sect. 4.4). Thanks to the feedback of many users, typos and mismatches in the data and meta-data are regularly corrected; the list of corrections and of the people who helped us (see acknowledgements) is available online (see CRDB log file).
In this section, we highlight the main datasets added since the first release [22], in particular in the two most recent releases CRDB v3.1 ( §3.1) and v4.0; for the latter we detail the motivation and specifics of UHCR data ( §3.2), anti-nuclei upper limits ( §3.3), and UHECR data ( §3.4). The complete list of experiments and publications (presently in CRDB) is gathered in App. B.

Relevant data updates before CRDB v4.0
Since the first release of CRDB seven years ago, two major data updates are worth highlighting. The first, tagged as CRDB v3.0, added H and He data from a large number of balloon flights going back to the 1950s. These data sets were used to derive Solar modulation levels over sixty years, which were compared to values derived from neutron monitor data in Ghelfi et al. [1]. The second major upload, tagged CRDB v3.1, was made a few weeks before preparing this release. The update is the result of a tremendous concentrated effort to include a wealth of data released in the last two years from various experiments (AMS-02, CALET, NUCLEON, TRACER, Voyager, and a few others); they correspond to some of the most interesting and high-precision CR data ever recorded. In addition, time-dependent fluxes and ratios from AMS-02 [47,48] and PAMELA [49] were added.
We emphasise at this point that CRDB has been open for data submissions from experiments since its first release, and contributions are very welcome. We further simplified the submission interface, presented in Sect. 4.4, to encourage more data submissions in the future.
$^{15}$ Indeed, the IS spectra and the Solar modulation models used are too patchy. For IS fluxes, publications used spectra derived from their data, from other authors, or from inputs of GCR propagation models. For Solar modulation, the models found in the publications were (i) for very old data, the outdated diffusion/convection model and its

Very heavy and ultra-heavy CRs
The theory of stellar nucleosynthesis to explain Solar system abundances goes back to the fifties [50]. In the following decade, Z ≥ 30 [51,52] and even a few Z > 90 (actinides) [53-55] CRs were detected. The abundances of these UHCRs are driven by three processes, corresponding to three categories of stable heavy nuclides: the p-, r-, and s-nuclides, respectively located at the neutron-deficient side, neutron-rich side, and bottom of the valley of nuclear stability [56]. How much the p- [56,57], r-, and s-processes [58,59] contribute to the various UHCRs, and which astrophysical sites are involved, remain debated. Moreover, the abundances of UHCRs differ slightly from the Solar system ones, and these differences are likely to be related to specific acceleration [e.g. 60] or transport processes [e.g. 61].
Measuring UHCR data is challenging because the fluxes are low. From Z = 26 (Fe) to Z = 30 (Zn) and then Z = 40 (Zr), the flux is suppressed by a factor $\sim 10^{3}$ and $\sim 10^{5}$, respectively [4]. The rarest CRs, the actinides Z ≥ 90 (Th and U), are about seven orders of magnitude less abundant than Fe [62]. Standard techniques used for measurements of Z < 30 CRs can still be pushed to cover the 30 ≤ Z ≤ 60 region [4], but come slightly short for the heaviest ones [63,64]. An alternative consists of passive detectors exposed for long durations (several years). The detection principle is based on the chemical modification made in a solid state nuclear track detector when ionising particles go through it. After a long exposure, the detector is etched in a chemical agent; each track left is then analysed, and its properties can be related to the charge and velocity of the CR.
We added in CRDB v4.0 most of the CR data found for Z > 30. Most measurements do not resolve elements, and we needed to create new names for groups of charges, for instance Pt-group (74 ≤ Z ≤ 80), Pb-group (81 ≤ Z ≤ 87), Subactinides (74 ≤ Z ≤ 87), and Actinides (Z ≥ 88); see Table A2. We note that in the literature, different groups sometimes used slightly different charge ranges in their definitions, but we stick to the ones above for all implemented data, because data uncertainties are larger than the error made from using slightly enlarged or reduced charge groups. The data themselves mostly consist of ratios of elements (or of groups of elements) in an unspecified energy band (related to the detector capabilities). In practice, CR fluxes are expected to be maximal at a few GeV/n, and several experiments reported no energy evolution (within the uncertainties) of their measured ratios. For this reason, and for simplification, we set all UHCR data Z > 50 to a generic energy bin around 1.5 GeV/n.
More than sixty years after their discovery, only very few UHCR data are available and almost no new experiment is running or planned, except for the relatively recent SuperTIGER experiment in the Z = 30-40 range. The solution to the origin of these elements may come from yet another route: recently, optical follow-up observations of a gravitational wave event have shown that binary neutron star mergers could be a major r-process contributor to UHCRs [75]. More such events are likely to be detected in the future, and they will provide a complementary view to that of CR data.
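The charge groups quoted above can be written down as a small lookup, sketched here in Python. The group names and charge ranges are those given in the text; the dictionary layout and the function name are illustrative assumptions, not part of CRDB.

```python
# Charge ranges (Z_min, Z_max) of the UHCR groups quoted in the text;
# None means the range is open-ended.
CHARGE_GROUPS = {
    "Pt-group": (74, 80),
    "Pb-group": (81, 87),
    "Subactinides": (74, 87),
    "Actinides": (88, None),
}


def groups_for_charge(z):
    """Return the names of all groups whose charge range contains Z."""
    names = []
    for name, (z_min, z_max) in CHARGE_GROUPS.items():
        if z >= z_min and (z_max is None or z <= z_max):
            names.append(name)
    return names
```

A given charge can belong to more than one group, since the Subactinides range is the union of the Pt-group and Pb-group ranges.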

Antimatter CRs
So far, anti-protons are the only anti-nuclei found in CRs. They were first detected at the end of the seventies [76,77] and soon interpreted by Silk and Srednicki [78] as possible dark matter signatures. The pioneering theoretical aspects and subsequent efforts were acknowledged by the 2019 Cosmology Gruber prize: [...] Silk recognised dark matter's indirect signatures such as anti-protons in cosmic rays and high energy neutrinos from the Sun. This exploration continues with modern experiments, which have detected several tens of thousands of anti-protons [79].
The quest for Z < −1 CRs is also very much alive, but the expected fluxes are very low. Indeed, most if not all CR anti-protons can be attributed to nuclear production (H and He CRs on the interstellar gas) during CR propagation in the Galaxy. The cost to create and fuse together an extra anti-proton or anti-neutron from colliding protons is roughly a $10^{-4}$ suppression factor [e.g. 80]. As a consequence, the ratio of astrophysical CR anti-protons to protons is at most $\sim 10^{-4}$ [e.g. 79], and the ratio of astrophysical CR anti-nuclei (of mass number A) to protons is expected to go very roughly as $10^{-4A}$. So far, only upper limits have been derived on the latter [81]. CR anti-deuterons are particularly difficult to detect because they must be isotopically separated from the $\bar{p}$ tail, and also because charge confusion rejection from p should be at the level of $10^{8}$. The AMS-02 experiment on the ISS or the GAPS experiment, based on a novel detection technique, could detect a few events in the near future [11]. Although at a much lower level, anti-helium events may have been seen in very preliminary AMS-02 analyses [12]. Whether these events are real, whether they are compatible with no detection of anti-deuterons, and whether they can be accounted for by a physics scenario is still at the exploratory stage [82-84]; a confirmation of these events would have huge consequences.
To tackle upper limits, a new key (IS_UPPER_LIMIT) was added in the DATA table (see Sect. 2.1). Limits on anti-elements are also usually derived with respect to elements, and we added in CRDB v4.0 new names to cover all associated data (see Table A2). The best limits so far come from the BESS balloon flights [10] and the PAMELA mission [103], as shown in Fig. 2 with results from other experiments derived over the years (all present in CRDB v4.0).

Ultra-high energy CRs
A major extension of the already comprehensive CR database is the inclusion of data from air shower experiments. High-energy CRs are measured indirectly with ground-based experiments. They detect the particle cascades initiated by CRs in Earth's atmosphere (air showers). Since the CR flux drops rapidly with increasing energy, the air-shower technique is the only feasible way to obtain CR data above PeV energies. The air-showers allow for large aperture experiments due to their extensive footprint on the ground in charged particles and Cherenkov light, and their visibility from a distance in ultra-violet light.
The energy and direction of a CR can be well inferred from air showers, but the identity of each individual CR cannot be determined accurately. The identity, more specifically the mass A of the CR, has to be inferred from air shower properties that naturally fluctuate, like the slant depth X max of shower maximum measured from the top of the atmosphere or the total number N µ of muons produced in the shower. These natural fluctuations make it impossible to distinguish species on an event-by-event basis. Fluxes of individual elements or isotopes therefore cannot be determined.
Experiments most commonly report the all-particle flux. This includes in principle also stable electromagnetically and weakly interacting particles, but the flux of these components at the PeV scale and above is negligible compared to nuclei [104,105]. In addition, some experiments report the fluxes of mass groups by splitting the observed mass range into two groups (proton-helium group and the rest), four groups (proton, helium, oxygen-, and iron-group), or five (as before, but adding a silicon group). These groups span roughly equal intervals in ln A, since the mass sensitivity is roughly constant in this variable. Besides the usual individual element or isotope names, the list of CR quantity names in CRDB v4.0 was expanded to handle these groups (e.g. H-He-group, Fe-group, AllParticles, see Table A2). We note that these results are obtained by simulating air showers initiated by only the leading element in each group and by fitting the sum of the air shower response distributions to the observed distribution [106,107]. Representing a whole mass group by a single element is a great simplification, but unavoidable since the relative fluxes of species within a mass group are unknown. The analysis approach works in practice because air shower fluctuations are large compared to the small differences in the response to individual elements within a group.
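The statement that mass groups span roughly equal intervals in ln A can be checked with a few lines of Python. This is only a sketch: the mass numbers are those of the most abundant isotope of each leading element, and the function name is an assumption made for illustration.

```python
import math

# Mass number of the most abundant isotope of each leading element.
LEADING_MASS = {"H": 1, "He": 4, "O": 16, "Si": 28, "Fe": 56}


def ln_a_spacings(elements):
    """Differences in ln A between consecutive leading elements."""
    values = [math.log(LEADING_MASS[name]) for name in elements]
    return [upper - lower for lower, upper in zip(values, values[1:])]


# Four-group case: proton, helium, oxygen-group, iron-group.
four_group_spacings = ln_a_spacings(["H", "He", "O", "Fe"])
```

For the four-group split, the three spacings are ln 4, ln 4, and ln 3.5, i.e. all between about 1.25 and 1.39.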
Another peculiarity of high-energy data is that one raw data set often has several interpretations, in which the fluxes of mass groups and, to a lesser degree, also the all-particle flux vary. These parallel interpretations are reported by the experiments due to significant theoretical uncertainties in simulated air showers. Detailed air shower simulations are required to infer the mass A from air shower properties like $X_{\rm max}$ and $N_\mu$, and these simulations use hadronic interaction models that extrapolate fixed-target and collider measurements of particle interactions using theory and phenomenology. Air showers are dominated by soft QCD interactions, which so far cannot be predicted accurately from first principles. Therefore the interpreted measurements depend on the hadronic interaction model used in air shower simulations, which is listed with the interpreted data.
[Figure caption, partially recovered: ... [108], Pierre Auger Observatory [109] for two different Monte Carlo event generators, and TUNKA-133 [110].]
For this update, we focus on adding flux measurements to the CRDB, but we are also considering including measurements of air shower properties, like the average of $X_{\rm max}$ or $N_\mu$ as a function of the shower energy, in the future. In the current update, we include air-shower based flux data from the Pierre Auger Observatory, Telescope Array, the IceCube Neutrino Observatory, the KASCADE-Grande experiment, the TUNKA-133 Array, and the H.E.S.S. observatory. An illustration of the evolution with energy of the ratio of two groups of elements is shown in Fig. 3.

User-interface updates and new submission form
CRDB is hosted by the LPSC laboratory website. It runs on a LAMP solution, i.e. a stack of free open-source software: Linux (operating system), Apache HTTP (server), MySQL (database), and PHP (hypertext pre-processor language). The web interface relies on several third-party libraries (jquery, jquery-ui, jquery.cluetip, and table-sorter) that are used to sort and display the database content, as briefly presented in the various sections below. For efficiency and speed, all web pages make use of AJAX (Asynchronous JavaScript and XML) web development techniques.
The data in CRDB can be accessed in two ways, either via the CRDB web pages or via a REST (representational state transfer) interface. While the web pages give access to the fully contextualised data and meta-data (thanks to several options to sort, select, display, and download data), the REST interface is useful for users who want to access and retrieve data from the command line (i.e. from their terminal or codes, without going through the CRDB web pages).
Below, we discuss separately the various web interfaces: those with minor changes in CRDB v4.0 ( §4.1), the interface for extracting $\phi_{\rm FF}$ time series ( §4.2), the new help page for the REST interface ( §4.3), and the simplified procedure (and format) to submit new data ( §4.4).

Web user interface
The address https://lpsc.in2p3.fr/crdb leads the user to a choice of various tabs, as illustrated in Fig. 4. Among them, two tabs provide differently contextualised information on CR experiments and data, while other tabs provide general information on CRDB, external CR resources, etc. We give below a brief description of these tabs, highlighting the most salient features and novelties of this version, and providing a few snapshots as illustrations.

'Welcome' tab
This is the unique entry point of the website (see Fig. 4). It contains a brief description of the database content and structure, and information on CRDB including versions, log, logo, and publications. In particular, the log file tracks the changes made in the various releases (data corrections and past and ongoing developments).

'Experiments/Data' tab
This is the tab for users looking for a specific experiment, its meta-data, and the data it gathered; experiments can be sorted by name, starting date, or type. This tab lists the quantities they measured and provides links to the experiment web page and associated publications, as well as pictures of the sub-detectors (by clicking on the 'magnifying glass' icon); see Fig. 5 for an illustration. Clicking on '[data]' for any given sub-exp gives further information on the origin of the data: data taking period, how data were retrieved (numbers from tables, extracted from figures, or from personal communications), details on the detector, etc.
We stress that the data presented in this tab are 'native' data only, i.e. data as provided in publications and uploaded in the database without modification (which is not the case in the 'Data extraction' tab). Accordingly, the energy axes for the data are as provided in the publications.

'Data extraction' tab
This is the main interface to select, extract and plot data, displaying also some meta-data (sub-exp names, links and BibTeX references for publications matching the selection), with the possibility to save the outputs in different formats (images, ASCII files, etc.). The web page consists of a selection box to choose a CR quantity (numerator and denominator) and to display all native and combined data (see App. A.1); for a caution about the usage of overlapping data taking periods for data from the same experiment, we refer the reader to App. A.5. More selection criteria make it possible to ask for matching sub-exp (partial or full) names, specific dates and energy range, and flux rescaling with energy. In the context of long time series provided by both the AMS-02 [47,48,111] and PAMELA [49] experiments, CRDB v4.0 allows the user to select only (or discard) data from time series, in order not to overcrowd the display of data. This is illustrated on a 'time-series only' selection in Fig. 6, where we show PAMELA H monthly averages from 2006 to 2010 [112]. This selection triggers on the presence of the word 'average' (in yearly, monthly, daily, Bartels rotation. . . averages) in the 'subexp-info' key in the SUBEXP table (see Sect. 2.2.2). A critical choice is that of the energy axis (among $E_{k/n}$, $R$, $E_{\rm tot}$, etc.): if no native data exist for this energy axis, conversions are performed to rescale the data from their native axis to the queried one. However, this conversion is only possible for fluxes (not ratios), and is exact for CR isotopes and leptons (whose A, Z, and m are identified) but approximate for elements and groups of elements (see App. A.4).
Hitting the 'Extract Selection' button pops up a new window displaying: (i) a plot of the data, (ii) a list of meta-data (per sub-exp) associated with the data, (iii) a summary of the extraction process (i.e. whether combinations and approximations were made), and (iv) a full listing of all retrieved data points (energy, values, and uncertainties). In CRDB v4.0, we improved the browsability between these pieces of information. From the plot, we can export the data in a format compliant with the USINE and GALPROP propagation codes, or as a csv file or tarball of ASCII files. We can also directly retrieve images or ROOT-compliant macros (.root, .C). A 'replot' button allows the user to further trim the data and modify the plot axes and look, with the same options (as in the original plot) to save the results.
To conclude on this tab, we are also pleased to provide in CRDB v4.0 an additional selection option that displays a comma-separated list of quantities; selection criteria (specific energy range, dates, sub-experiments, etc.) are applied to all quantities in the list. This is illustrated in Fig. 7, where we show the full CR spectrum from MeV to EeV energies from a selection of recent experiments (top panel), and a selection of all nuclear fluxes from the AMS-02 experiment (bottom panel).
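The energy-axis conversion discussed above can be sketched in Python for the case where it is exact, i.e. a species of known A, Z, and mass m. The sketch is illustrative and is not the CRDB implementation (see App. A.4 for that): the function names are assumptions, and natural units (c = 1, energies in GeV, rigidity in GV) are used. A flux changes axis through the Jacobian $dE_{k/n}/dR = (|Z|/A)\,p/E_{\rm tot}$, which is why a flux, unlike a ratio taken at a fixed native energy, can be rescaled point by point.

```python
import math


def ekn_to_rigidity(ekn, a, z, mass):
    """Rigidity R = p/|Z| [GV] from kinetic energy per nucleon [GeV/n]."""
    e_tot = a * ekn + mass
    momentum = math.sqrt(e_tot ** 2 - mass ** 2)
    return momentum / abs(z)


def flux_ekn_to_rigidity(flux_ekn, ekn, a, z, mass):
    """Convert a flux per unit E_k/n into a flux per unit rigidity.

    Uses dJ/dR = dJ/dE_k/n * dE_k/n/dR, with dE_k/n/dR = (|Z|/A) * p/E_tot.
    """
    e_tot = a * ekn + mass
    momentum = math.sqrt(e_tot ** 2 - mass ** 2)
    return flux_ekn * abs(z) / a * momentum / e_tot
```

For elements or groups of elements, A and m are not unique, so any such conversion requires an assumed isotopic composition and becomes approximate.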

'Admin' tab
This tab can be accessed by authenticated users only (CRDB maintainers). In this page, various scripts provide internal checks of the database content (missing images for the detectors, orphan ID in the various tables, list of superseded publications, etc.). This page used to handle validation forms for submitted data, but this feature has been removed thanks to the new submission form in CRDB v4.0 (see Sect. 4.4). We can also monitor from this page the traffic on CRDB, stored in a specific LOG_QUERIES table (connection date, IP address, and page visited, see Fig. 1). This traffic since August 2013 is shown in Fig. 8. CRDB received more than a quarter million queries, from ∼20,000 different IP addresses in over a hundred countries; most queries originate from Germany, USA, Italy, France, China, Switzerland, and Japan, i.e. countries also strongly involved in recent CR experiments.

'Useful links' tab
This tab gathers links to many online CR resources. These links include propagation codes for Solar, Galactic, and extra-galactic CRs, other useful CR databases, and CR-related websites. If you feel that some important resources are missing, please contact us (crdb@lpsc.in2p3.fr); we will be happy to add them.

'Solar modulation' tab
This tab was added in CRDB v3.0 and is thus not described in Maurin et al. [22]. Its purpose is to provide the Solar modulation level $\phi_{\rm FF}$ in the Force-Field approximation [e.g. 31] for any date between the 50's and today. The calculation is based on neutron monitor (NM) data and a reconstruction algorithm detailed for instance in Ghelfi et al. [1].
Neutron monitors are devices developed in the 50's to monitor Solar activity [125]. They combine a very good time resolution (a few minutes) and a good stability over decades. For this reason, they are well suited to provide time series over long time periods [37]; they are used to fill $\phi_{\rm FF}$ for any CRDB sub-experiment, as discussed in Sect. 2.2.2. As illustrated in Fig. 9, users can select in this tab NM stations and a time period (top panel) in order to get the average $\phi_{\rm FF}$ (not shown) or the $\phi(t)$ time series (bottom panel) at different time resolutions (from the finest 10 min resolution to monthly averages).
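To make the role of the modulation level more concrete, the sketch below applies the standard textbook form of the Force-Field approximation to an interstellar flux. It is not taken from the CRDB code: the function names, the toy interstellar spectrum, and the use of the proton mass as the nucleon mass for all nuclei are assumptions made for illustration only.

```python
# Nucleon mass in GeV (proton mass used for all nuclei: an approximation).
M_NUCLEON = 0.938272


def toy_lis(ekn):
    """Hypothetical interstellar flux: a pure power law in total energy per nucleon."""
    return (ekn + M_NUCLEON) ** -2.7


def force_field(ekn, phi_ff, Z=1, A=1, lis=toy_lis):
    """Modulated (top-of-atmosphere) flux at kinetic energy per nucleon ekn [GeV/n].

    phi_ff is the modulation level in GV; the mean energy loss per nucleon
    is Phi = |Z|/A * phi_ff [GeV/n].
    """
    big_phi = abs(Z) / A * phi_ff
    e_is = ekn + big_phi  # energy the particle had in interstellar space
    factor = ekn * (ekn + 2.0 * M_NUCLEON) / (e_is * (e_is + 2.0 * M_NUCLEON))
    return lis(e_is) * factor


# Ratio of modulated to interstellar proton flux at 1 GeV/n for phi_FF = 0.5 GV.
suppression = force_field(1.0, 0.5) / toy_lis(1.0)
```

In this approximation a single number per date, $\phi_{\rm FF}$, is enough to modulate any species, which is why a time series of this quantity is a convenient product to serve.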
In practice, scripts on LPSC servers retrieve and process NM data from NMDB. Behind the scenes, they produce ASCII files for all available stations at a 10 min time resolution. Then, for each CRDB query, another layer of scripts reads these files to calculate and display the user selection. To ensure continuous service and updates, a cron 18 time-based job scheduler runs on a daily basis.

'REST access' tab
The database offers a REST 19 interface, which makes it easy to download data sets according to selection criteria from another program. The interface has been available since version CRDB v1.3, but the new help page in CRDB v4.0 makes this feature more prominent.
Data sets can be selected by quantity (element, isotope, electron or positron, or mass group), experiment, energy range, and time range. Users must specify the energy type in their query. As mentioned in previous sections, data sets are stored in their native energy type, which can be kinetic energy, kinetic energy per nucleon, total energy, or rigidity. The native type is converted to match the request; details on the conversion are given in App. A.4. Other parameters of the interface control the output format, and whether ratios should be synthesised from individual flux measurements.
The available parameters are documented in Table 1 (also reproduced on the website). To give an example, the boron-to-carbon flux ratio as a function of the kinetic energy per nucleon can be queried with the command-line utility curl as follows:

    curl -L 'http://lpsc.in2p3.fr/crdb/rest.php?num=B&den=C&energy_type=EKN' > db.dat

The output in USINE format (table with columns separated by spaces) is stored in the file db.dat. Queries can be made from any general purpose programming language. For Python users, we provide a simple module to run queries from Python and return the result as a numpy table. The code and a tutorial with example usage are available on GitHub 20 and linked to from the CRDB website.

Table 1 (excerpt).
[...]: Add combinations via ratio or product of native data (from the same sub-exp at the same energy) that match quantities in list (e.g. compute B/C from native B and C). Three levels of combos are enabled: 0 (native data only, no combo), 1 (exact combos), or 2 (exact and approximate combos): in level 1, the mean energy (or energy bin) of the two quantities must be within 5%, whereas for level 2, it must be within 20%.
energy_convert_level^c: Add data obtained from an exact or approximate energy_type conversion (from native to queried). Three levels of conversion are enabled: 0 (native data only, no conversion), 1 (exact conversion only, which applies to isotopic and leptonic fluxes), and 2 (exact and approximate conversions, the latter applying to flux of elements and of groups of elements).

Table 2. The table shows the column ID, the associated keyword in the database structure (see Fig. 1), the expected content of the column, and an example (and associated unit) for this column; entries corresponding to starred keywords can be left empty (i.e. " ") if the user is unsure. For more details on CRDB data and meta-data definitions, see §2.
ID   Keyword         Content                      Example
1    EXP-NAME        Experiment name              "AMS02"
2    EXP-TYPE        balloon, ground, or space    "space"
3    EXP-HTML        Experiment official website  "http://www.ams02.org"
4    EXP-STARTYEAR   Experiment starting year     "2011"
5    SUBEXP-NAME     Name specific to analysis ( §2.2.
* Columns that can be left empty (but not omitted) if the user is unsure about what to write.
† To help pick the correct denomination for the measured quantities, we list in the webpage all currently defined names. If necessary, quantities not defined yet will be added along with the submitted data.
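For readers who prefer to script the curl query without any extra dependency, a minimal standard-library sketch is given below. It is not the module distributed by the CRDB team; the helper names (build_query, parse_table, fetch) are hypothetical, and the parsing assumes only what is stated in the text (space-separated columns) plus the assumption that lines starting with '#' are comments.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

REST_URL = "http://lpsc.in2p3.fr/crdb/rest.php"  # endpoint quoted in the text


def build_query(**params):
    """Build the REST URL from keyword parameters (num, den, energy_type, ...)."""
    return REST_URL + "?" + urlencode(params)


def parse_table(text):
    """Split a space-separated table into rows of strings.

    Blank lines and lines starting with '#' are skipped (assumed comment style).
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append(line.split())
    return rows


def fetch(**params):
    """Download and parse a query; urlopen follows redirects, like 'curl -L'."""
    with urlopen(build_query(**params)) as response:
        return parse_table(response.read().decode("utf-8", errors="replace"))


# Same boron-to-carbon query as the curl example (no network access here).
url = build_query(num="B", den="C", energy_type="EKN")
```

Calling fetch(num="B", den="C", energy_type="EKN") would then return the rows as lists of strings, leaving the interpretation of each column to the user.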

'Submit data' tab: new submission form
In CRDB v2.1, the data submission interface was based on a sequential online 4-step procedure. The user had to fill in the meta-data first (experiment, sub-experiment, and then publication) and then the data. Each step had to be completed and validated by one of us (via the admin interface) before the user could move on to the next step. In practice, because of the large number of steps, a procedure that was probably not documented well enough, and the delay between validation steps, this interface was almost never used.
In CRDB v4.0, we decided to completely change the approach and only ask for a single csv file, whose expected columns are described in Table 2 (and also reported on the CRDB web page); files must be sent to crdb@lpsc.in2p3.fr. One of us then uses dedicated scripts on these files to finalise the upload on CRDB. In this new format, information on the data and meta-data is still required, but to simplify the submitter's task, many entries can be left empty (and will be filled by us), as indicated by stars in Table 2. We hope that this simplified submission procedure and the reduced number of meta-data entries to provide will convince experimentalists to take the extra step of submitting their data to CRDB after they submit the paper to a journal.
20 https://github.com/crdb-project/tutorial

Summary and future developments
We have presented in this article CRDB v4.0, a new version of the CR database for charged species, highlighting several improvements on the database structure and content, and on the user interfaces. We summarise the changes below:
• Database structure: a few tables were simplified and three became deprecated (Solar modulation from the publication, and the users and validation tables for the old data submission interface). In one key, predefined values were modified (experiment types are now balloon, ground, or space), and in another they were extended (energy axis type now includes ETOTN). Two new keys were added, to keep track of the data origin (retrieved from a table or figure in the publication, etc.), and to store a possible energy scale uncertainty in the data (single number for a given sub-experiment).
• Database content: the database now contains more than 100 experiments and 350 publications, covering 14 decades in energy (from $10^{6}$ to $10^{20}$ eV).
• User interfaces: the many improvements we implemented should provide users with more options to retrieve data, along with a clearer presentation of the various interfaces. These changes include: (i) more data selection options (e.g. on time series) and plotting/export options (.pdf, .csv), including the possibility to display several CR quantities on the same plot; (ii) a new help page and an example Python notebook to retrieve data via the REST interface; (iii) an updated interface to retrieve the Solar modulation level $\phi_{\rm FF}$ time series (or average) for any time period since 1950, with its dedicated REST interface; (iv) a simplified file format to submit new data; (v) an extended list of links to online CR resources. We hope in particular that the new submission format, with clearer explanations about the data and meta-data to fill, will help us keep CRDB up-to-date in the future.

Final thoughts on CRDB evolution
CRDB was conceived from the start as a service to the CR community. In an ideal world, we would like experimentalists to submit their data to CRDB as soon as their results appear on the arXiv preprint server 21 or in a peer-reviewed journal.
There are several possible directions along which CRDB could be improved. On the technical side, we are planning to make the website responsive for a better rendering on diverse electronic devices. On the data content side, we will obviously continue to upload new data as they appear, as quickly as we can. We would also be grateful for any help to extend the datasets to new quantities. Indeed, without any change in the database structure, we could quite easily include (i) more relevant quantities for UHECRs, i.e. the mean logarithmic mass $\langle \ln A \rangle$, the mean depth of shower maximum $\langle X_{\max} \rangle$, etc.; (ii) dipole anisotropy spectral data as presented in Ahlers and Mertsch [126]; (iii) upper limits or data on the neutrino CR spectrum discovered by the IceCube Collaboration [127]. As a first step in this direction, near the completion of this article, a preliminary agreement was reached with the KCDC team [24] to include their data in a forthcoming CRDB release. Finally, we could also imagine CRDB as a portal to gather more CR-related data, for instance hosting the many nuclear cross-section data presented in Génolini et al. [128]. These future developments will depend on the workforce available and on feedback from the CR community.
To conclude, any help and feedback to further expand the database is welcome. Comments, questions, suggestions, and corrections can be sent to crdb@lpsc.in2p3.fr.

Appendix A Tips and caveats on data and their extraction
CRDB users retrieve data from the online 'Data extraction' tab or via the REST interface. However, they generally overlook how the extraction was made and may miss possible caveats with some of the data. Below, we go through a few important items that users should keep in mind. We stress that the REST interface goes through exactly the same steps to extract data from CRDB, and thus the same cautions apply.

Appendix A.1 Difference between 'native' and 'combined' data
We recall that the core data of CRDB are 'native' only, i.e. measurements made available in the publications (in tables or extracted from figures using DataThief III 22 ). However, many useful combinations can be formed from native data. For instance, starting from C and O native fluxes in CRDB, the user can extract C/O. In order to get meaningful results, this procedure has to ensure that C and O are from the same energy range or use the same central energy point. As described in detail in App. A of Maurin et al. [22], priority rules exist to form such ratios, and tick boxes in the 'Data extraction' interface allow the user to select the energy tolerance within which CRDB forms the ratios, in order to have exact (energy points within 5%) or approximate (energy points differing by 5% to 20%) combinations.
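The energy-tolerance rule can be sketched as follows. This is an illustration only, not the CRDB implementation: the function names are hypothetical, the priority rules of Maurin et al. [22] are not reproduced, and the relative difference is assumed here to be measured with respect to the larger of the two energies.

```python
def combo_level(e_num, e_den):
    """Return 1 (exact), 2 (approximate), or None (no combination allowed).

    Assumption: 'within X%' is read as |E1 - E2| / max(E1, E2) <= X/100.
    """
    rel = abs(e_num - e_den) / max(e_num, e_den)
    if rel <= 0.05:
        return 1
    if rel <= 0.20:
        return 2
    return None


def ratio_if_allowed(flux_num, e_num, flux_den, e_den, max_level=1):
    """Form the ratio only if the energy points match at the requested level."""
    level = combo_level(e_num, e_den)
    if level is None or level > max_level:
        return None
    return flux_num / flux_den


# Energy points differing by ~13%: rejected as exact, accepted as approximate.
strict = ratio_if_allowed(2.0, 1.0, 4.0, 1.15, max_level=1)
loose = ratio_if_allowed(2.0, 1.0, 4.0, 1.15, max_level=2)
```

The point of keeping two levels is that the user, not the database, decides whether a ratio built from slightly mismatched energy points is acceptable for a given analysis.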

Appendix A.2 Data uncertainties in CRDB
Most pre-80's experiments were sensitivity-limited, meaning that the error budget was dominated by statistical uncertainties. When retrieved from CRDB, these data only show statistical errors and no systematics.
With improved detectors, the practice in the literature shifted towards paying more and more attention to systematic uncertainties. The latter are of various origins (inefficiencies, fragmentation in the detector, etc.). They are provided either as a raw text description in the publication (see the example of HEAO-3 data discussed in App. A.2 of Maurin et al. [22]), or as a single number per data point in tables (separately from the statistical uncertainties). In the latter case, CRDB can directly provide the statistical and systematic uncertainties as given in the publication. In the former case, we use the text description to estimate the systematic uncertainties, combining quadratically the various sources of systematics. When the description of the systematics is not clear enough, or when data are indirectly retrieved (i.e. taken from plots), all the uncertainties are assumed to be of statistical origin 23 .
22 http://datathief.org
With the advent of high-precision experiments (e.g. AMS-02), the situation has become even trickier. It is now routine for publications to provide tables with statistical uncertainties and several systematic uncertainties listed separately. Furthermore, as highlighted in Derome et al. [26], systematic uncertainties are likely to be correlated between nearby energy bins. We have not yet considered modifications of CRDB to handle more than one systematic uncertainty in the extraction, let alone to account for covariance matrices of systematics. Until further notice, all systematics are combined quadratically and provided as a single number per data point in CRDB.
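As a minimal illustration of the quadratic combination mentioned above, the sketch below sums several systematic contributions in quadrature. The function name and the numbers are hypothetical; note that this treatment assumes the contributions are independent and discards any bin-to-bin correlation.

```python
import math


def combine_in_quadrature(errors):
    """Combine independent uncertainty contributions into a single number."""
    return math.sqrt(sum(e * e for e in errors))


# Hypothetical systematic contributions for one data point (same units as the flux).
total_sys = combine_in_quadrature([3.0, 4.0])
```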

Appendix A.3 Energy-scale uncertainty
Energy measurements of CR detectors suffer from two types of distortions, random and systematic. The random distortions differ from event to event and average to zero over many events. In flux measurements, they lead to a softening of the measured flux compared to the true flux. Correcting for them is possible with unfolding methods, which is the responsibility of each experiment. In the following, we only discuss the second type. Systematic distortions originate from the residual uncertainty of the energy calibration of the detector. They are the same for each event, do not average out, and lead to shifts in the measured flux compared to the true flux.
In general, the energy distortion itself varies with the (true) energy, but experiments are designed and controlled to keep the relative calibration of energies within the covered energy range to higher accuracy. Therefore, the uncertainty can usually be summarised by a single number for each experiment, the relative energy-scale uncertainty. Since CR fluxes are mostly steeply falling power laws ($\propto E^{-3}$), even a small energy-scale uncertainty has a significant impact on the flux uncertainty: an uncertainty of 10% in the energy scale translates to an uncertainty of about 30% on the flux.
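The order of magnitude quoted above can be checked with a few lines of code. The sketch below only rescales the energy argument of a pure power law; it is a simplification under stated assumptions (hypothetical function name, spectral index of 3, and no account of the change in energy-bin width that a full reanalysis would include).

```python
def flux_ratio_after_energy_shift(delta, gamma=3.0):
    """Ratio J(E * (1 + delta)) / J(E) for a power law J proportional to E**-gamma."""
    return (1.0 + delta) ** (-gamma)


# Effect of a +10% and a -10% shift of the energy scale on an E^-3 spectrum.
ratio_up = flux_ratio_after_energy_shift(+0.10)
ratio_down = flux_ratio_after_energy_shift(-0.10)
```

Both directions give a change of a few tens of percent, i.e. several times larger than the energy shift itself, which is why the energy-scale uncertainty is worth storing separately.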
In practice, experiments either include this as a systematic uncertainty for each data point, or quote the energy-scale uncertainty separately (and do not propagate it to the flux). The latter is especially common for air shower experiments, which have energy-scale uncertainties of 10% to 24%. Whenever this number is available in the publications, we store it in CRDB, so that it can be retrieved by users. However, we stress that this uncertainty is never accounted for in CRDB displays.

Appendix A.4 Energy axes and conversion to other energy axes
Most GCR data in CRDB are in kinetic energy per nucleon, as it was the standard for a long time. However, from an instrumental point of view, spectrometers measure rigidities and calorimeters the total energy, etc. With the advent of high-precision experiments, most data are published in their 'natural' energy axis, and CRDB enables the following energy axes: $R$, $E_k$, $E_{k/n}$, $E_{\rm tot}$, and $E_{{\rm tot}/A}$.
For isotopic fluxes, the mass, atomic number, and charge are uniquely defined, so that conversions between different energy axes are exact and enabled in CRDB; both the energy and the flux are modified. We list in Table A1 the formulae used to move from a native energy axis $E$ into a queried axis $E^*$, and how the converted CR flux $dJ/dE^*$ is linked to the original data $dJ/dE$.
For measurements that can only resolve fluxes from elements (or groups of elements), the conversion is no longer possible, because the (energy-dependent) isotopic content required to do so is unknown. Nevertheless, assuming that elements (or groups of elements) are dominated by a single isotope, an approximate conversion can be provided. As this proves useful for several users, this approximate conversion has been enabled since CRDB v1.2. It takes as the dominant isotope the most abundant one in the Solar system [129], and this is defined, along with the associated values for $A$, $Z$, and $m$, in the ISOTOPE_PROXY table of the database (see Fig. 1). We list in Table A2 the proxies enabled for the new CR_QUANTITY names introduced in CRDB v4.0.

Table A1. Conversion between CRDB energy axes. The columns are (i) the native energy axis ($E$), (ii) the queried axis $E^*$ to which to convert, (iii) the relation between $E^*$ and $E$, (iv) the relation between the converted flux $dJ/dE^*$ and the native flux $dJ/dE$, and (v) $\beta = v/c$ expressed as a function of the native energy axis $E$. In the formulae below, $m$ is the CR mass in GeV, and energies are in GeV ($E_k$ and $E_{\rm tot}$), GeV/n ($E_{k/n}$), GV ($R$), or GeV/A ($E_{{\rm tot}/A}$).
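As an illustration of an exact conversion for a single isotope, the sketch below converts a data point from kinetic energy per nucleon to rigidity using standard relativistic kinematics. The formulae are not copied from Table A1, and the function name and the proton example are assumptions; for an element, the same function would be called with the $A$, $Z$, and $m$ of the proxy isotope, which makes the result approximate.

```python
import math


def ekn_to_rigidity(ekn, flux_ekn, A, Z, m):
    """Convert (E_k/n [GeV/n], dJ/dE_k/n) to (R [GV], dJ/dR) for one isotope.

    A: mass number, Z: charge, m: mass in GeV. Also returns beta = v/c.
    """
    ek = A * ekn                           # total kinetic energy [GeV]
    e_tot = ek + m                         # total energy [GeV]
    p = math.sqrt(ek * (ek + 2.0 * m))     # momentum [GeV]
    beta = p / e_tot
    rigidity = p / abs(Z)
    # dR/dE_k/n = A / (|Z| beta), hence dJ/dR = dJ/dE_k/n * |Z| beta / A.
    flux_r = flux_ekn * abs(Z) * beta / A
    return rigidity, flux_r, beta


# Proton at 1 GeV/n with a unit flux (arbitrary normalisation).
R, flux_R, beta = ekn_to_rigidity(1.0, 1.0, A=1, Z=1, m=0.938272)
```

Both coordinates of the data point change: the energy through the kinematics, and the flux through the Jacobian of the transformation.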

Appendix A.5 Data taking period (possible) overlap
By default, for a given sub-exp, the data extraction tool returns all requested data, regardless of their data taking periods. We caution the user to check for any overlap between dates. Because of the time-dependent Solar activity, datasets from different but overlapping data taking periods can be of interest in the context of Solar modulation studies. However, if one is only interested in the accumulated statistics, one should only consider the dataset accumulated over the longest time period.
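A simple way to perform this check on retrieved data is sketched below. The function name and the dates are hypothetical and do not correspond to actual CRDB entries; the sketch only assumes that each dataset comes with a start and an end date.

```python
from datetime import date


def periods_overlap(start_a, end_a, start_b, end_b):
    """True if the two closed date intervals share at least one day."""
    return start_a <= end_b and start_b <= end_a


# Two hypothetical data taking periods of the same sub-exp.
overlapping = periods_overlap(
    date(2011, 5, 1), date(2013, 11, 30),
    date(2011, 5, 1), date(2016, 5, 31),
)
```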