AtomPy: An Open Atomic Data Curation Environment for Astrophysical Applications

We present a cloud-computing environment, referred to as AtomPy, based on Google-Drive Sheets and Pandas (Python Data Analysis Library) DataFrames to promote community-driven curation of atomic data for astrophysical applications, a stage beyond database development. The atomic model for each ionic species is contained in a multi-sheet workbook, tabulating representative sets of energy levels, A-values and electron impact effective collision strengths from different sources. The relevant issues that AtomPy intends to address are: (i) data quality by allowing open access to both data producers and users; (ii) comparisons of different datasets to facilitate accuracy assessments; (iii) downloading to local data structures (i.e., Pandas DataFrames) for further manipulation and analysis by prospective users; and (iv) data preservation by avoiding the discard of outdated sets. Data processing workflows are implemented by means of IPython Notebooks, and collaborative software developments are encouraged and managed within the GitHub social network. The facilities of AtomPy are illustrated with the critical assessment of the transition probabilities for ions in the hydrogen and helium isoelectronic sequences with atomic number Z ≤ 10.


Introduction
This report is concerned with problems of atomic data assessment and the development of cloud-computing tools to make it more efficient in the current data-intensive and collaborative research enterprise. In more than one way, two of us (C.M. and M.A.B.) were early converts to this new scientific order, usually referred to as e-Science [1], by participating since the 1980s in the international Opacity Project (OP) [2,3] and Iron Project (IP) [4][5][6]. The OP was concerned with the computation of the massive atomic data required to estimate astrophysical opacities, while the IP computed large radiative and collisional datasets for plasma diagnostics based on iron-group ions. Data dissemination initiatives within these two consortia led to the development of some of the first online atomic databases, namely TOPbase [7][8][9] and TIPbase [10], and the astrophysical opacities web service referred to as OPserver [11,12].
Today e-Science is transforming the whole research cycle, especially data assessment methodologies, due to the large volumes involved, distributed repositories with heterogeneous data models and the widespread use of innovative and sophisticated information and communication technologies (ICT) that are difficult to assimilate by traditional scientists [13][14][15]. Atomic data production is not alien to this revolution. Apart from the seminal NIST (National Institute of Standards and Technology) Atomic Spectra Database [16,17], we have seen the proliferation of atomic databases: CHIANTI [18][19][20], AtomDB [21,22], uaDB [23], NORAD-Atomic-Data [24], the MCHF/MCDHF database [25] and VALD [26][27][28], to name a few. These databases vary greatly in terms of structure, data models and nomenclature. They also differ in data content, completeness, accuracy and provenance, and are hardly interoperable, which complicates user data searches. A remarkable attempt to improve integration and interoperability among atomic and molecular databases was launched in 2009 by the European Union: the Virtual Atomic and Molecular Data Centre (VAMDC) [29][30][31][32], aimed at implementing a cyberinfrastructure capable of interconnecting several (more than 15) such databases. In this project an international group of physicists and computer scientists defined and implemented interoperability standards and protocols, virtual node architectures, XML schemata, query languages and web portals.
In spite of all these data activities, the issue of atomic data accuracy among astrophysical modelers remains inscrutable. The implementation of any spectral modeling code generally involves lengthy searches of atomic parameters in the aforementioned databases in order to piece together a master dataset that is as complete and accurate as possible. Such a task is usually reserved for experts capable of reviewing the available data, a job that may take several years. There are many disadvantages to this approach: the master dataset becomes outdated even before its release; the process is expensive and unsustainable in the long term; newer atomic data are not taken into account and tested until they are incorporated into the databases; it is error-prone, from mistakes in data entry to expert misjudgment in data selection; there is scanty user feedback to the data producers and collectors; and much of the data are replaced, discarded or simply lost (together with the knowledge and investment that went into their production). With regard to expert judgment on the available atomic data, it must be emphasized that there are no exact solutions to multi-electron atomic structures; thus, atomic physicists must follow their own experience and rules of thumb when evaluating data worthiness from published papers, but that is no guarantee that the chosen datasets are indeed the most accurate.
To address some of these issues, we propose in Sections 2 and 3 a new data curation model driven by an open virtual research community of both data producers and users. In a previous experience, the evaluation of oxygen atomic data presented in Appendix A of the book Oxygen in the Universe [33] was efficiently supported by open Google-Drive spreadsheets that avoided clogging the text with an excessive number of tables. We have therefore been encouraged to extend this approach to the implementation of a cloud-computing data curation environment, referred to hereafter as AtomPy, based on the functionalities of both Google Sheets and Pandas (Python Data Analysis Library) DataFrames. This system is described in Section 4 and fully used in Section 5 to evaluate the radiative rates for ions in the hydrogen and helium isoelectronic sequences with atomic number Z ≤ 10. Finally, some conclusions and recommendations are discussed in Section 6.

Virtual Research Communities
Collaborative data-intensive science essentially lives in the virtual space (cyberspace) brought about by a second-generation Internet, where the geographical distances between researchers and their everyday facilities (instrumentation, computers, libraries, editors, etc.) are no longer a limitation [34]. In addition to a marked increase in the number and scale of international research ventures, new patterns of collaboration are quickly emerging: social networks, virtual research communities (VRCs) and interdisciplinary research projects [35]. The new dynamics are characterized by alternative communication channels, different divisions of labor, standardization of working habits, collaborative continuity, competency matching, specialization and cooperation.
As shown in Figure 1, the VRC allows a dispersed but networked group of researchers to work together effectively through the use of ICT. Within VRC cyberspace, they share data, software tools, facilities and information resources, intercommunicating and producing joint results. The productivity of a VRC depends on the development of certain capabilities (community management; support for distributed research and collaboration; human-computer interfaces and interoperability) that inevitably require an open and flexible middleware of computer tools and services [36].
The VRC, particularly one consisting of both data producers and users, offers singular potential for efficient and sustainable data assessment. It presents, in fact, a unique opportunity to nucleate and consolidate both camps using data quality as the new currency to debug and streamline the whole research chain. It must be mentioned that cohesiveness between atomic data producers and users has not been easy to broker in the past, in spite of periodic international conferences such as ICAMDATA [37], joint symposia (e.g., the Tenerife meeting in October 2010 on "Uncertainties in atomic data and how they propagate in chemical abundances" [38,39]), capacity development schools (e.g., NebulAtom [40]) and the AstroAtom blog [41]. However, there are exceptions to this inherent detachment, in particular the extensive astrophysical benchmarks of iron EUV and X-ray lines that have been carried out for the CHIANTI database [42][43][44][45][46][47][48][49][50][51][52][53][54][55][56].

Community-Driven Data Curation
With the onset of the data deluge, it was soon realized that the scale and complexity of scientific data would require active management during their complete life cycle in order to ensure trustworthiness, integrity, access, fitness for use and reusability [57][58][59]. These activities are now part of an emerging and growing discipline referred to as data curation, dedicated to the present and future maximization of data potential, in particular of knowledge-centric data. It then becomes clear that the process of data assessment implies a stage beyond database development, one very much associated with the realms of this new field, which is the main concern of the present work.
In an extensive analysis of digital data curation [58], an integrated three-level model has been proposed to address the demands of e-Science (see Figure 2). The black arrows denote data pathways in the traditional research process (Level 1): primary data are used to derive secondary and tertiary data, the latter finally resulting in papers that are published in the usual peer-review journals, eventually reaching a wider audience (other scientists, the general public, industry and libraries). Primary and secondary data may also be archived, enabling alongside (red arrows) data-based research (Level 2) and, hopefully, new discoveries. A further level of long-term curation (Level 3) involves, apart from the Level-2 archives, more sophisticated and dynamic data repositories with extensive metadata and hyperlinks (blue arrows) to promote collection-based research. The traditional data archivist now becomes the data curator, and the publication process experiments with alternative media. We are of the opinion that data assessment activities should be managed in Level 3 of this model.
Since data assessment is in general a long-term activity, it is limited by contemporary funding time scales that are mostly short-term and project-based; its sustainability is consequently always fragile. In this respect, we are exploring the feasibility of a self-sustainable model based on an open VRC as described in Section 2 and on community-driven data curation similar to Wikipedia [60], ChemSpider [61] and WikiGenes [62,63]. In a comprehensive analysis of this approach [64], its effectiveness is discussed when the availability of updated, large-scale and dynamic datasets is critical, and its success depends on several well-defined practices: data producer-user involvement; the promotion of outreach activities; member incentives; and a governance model based on meritocracy. The need for robust and standardized data representations, for a balance between human- and computer-based curation methods, and for data provenance and preservation is also emphasized therein. Moreover, the participation of the data users in the curation process is key to the development of modern repositories, and curation activities would need to start early in the research cycle [65]. This is certainly the case for atomic data, where accuracy must be reinforced through diagnostic benchmarks carried out mostly by plasma modelers. While Wikipedia uses the wiki as its main content building block, we propose the Google Sheet and its counterpart in the Python environment, the Pandas DataFrame, as the basic structures for data assessment due to their powerful and well-known data manipulation functions. Therefore, the AtomPy software is fairly simple: it is essentially confined to an Application Programming Interface (API) for data downloading from Google Drive to the user disk space and a series of Python utilities developed by community members to be shared in the GitHub [66] social network.

AtomPy
AtomPy is a cloud environment for atomic data curation rather than an atomic database, in the sense that a prospective user, apart from being able to search for data, is encouraged to contribute datasets and utilities that facilitate data interfaces, comparisons, assessments and, ultimately, preservation. The AtomPy [67] atomic data and metadata reside in spreadsheets in Google Drive (see Section 4.1), where they can be openly accessed, modified and downloaded by any user. Data downloading can be carried out through the different format options offered by Google Sheets (Section 4.2) or to local Pandas DataFrames (Section 4.3) by means of the AtomPy API (Section 4.4) for further manipulation. Data uploading by prospective contributors to existing or new spreadsheets is at present managed through the usual Google-Drive channels. In order to facilitate user interaction, workflows for different data manipulation procedures are to be implemented with IPython notebooks (Section 4.5), which can be accessed from the GitHub repository (Section 4.6); that is, the end user is especially encouraged to contribute to the notebook and module pools in GitHub. Details of the installation of the AtomPy Python modules are given in Section 4.7.

AtomPy Spreadsheet Structure
AtomPy contains three reference spreadsheets with useful atomic information:
• elements: lists the names, symbols and atomic weights of the chemical elements, indexed by the atomic number Z ≤ 118;
• ions: lists the symbol, ground electronic configuration, ground spectroscopic term, total angular momentum (J) and ionization potential (in eV) for each ionic species, indexed by the atomic-number and electron-number tuple (Z, N), where 1 ≤ Z ≤ 110 and 1 ≤ N ≤ Z;
• isotopes: lists symbols, atomic weights and fractions for isotopes, indexed by the atomic-number and mass-number tuple (Z, M) for Z ≤ 118.
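The (Z, N) indexing of the ions sheet can be made concrete with a minimal mock in Python; the records and column names below are illustrative only and do not reproduce the actual AtomPy spreadsheet contents.

```python
# Hypothetical mock of the AtomPy "ions" reference sheet: each ionic
# species is keyed by the (atomic number, electron number) tuple (Z, N).
# All numeric values here are illustrative.
IONS = {
    (2, 2): {"symbol": "He I",  "ground_config": "1s2",
             "ground_term": "1S", "J": 0.0, "ip_eV": 24.587},
    (2, 1): {"symbol": "He II", "ground_config": "1s",
             "ground_term": "2S", "J": 0.5, "ip_eV": 54.418},
}

def ion(z, n):
    """Return the reference record for the species (Z, N)."""
    return IONS[(z, n)]

print(ion(2, 2)["symbol"])  # He I
```

The same tuple index reappears throughout AtomPy, e.g., in the transition labels A(Z, N, k, i) discussed below.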
In Figure 3 we show the ions sheet, where it may be appreciated that each ionization potential is given as both a value and an uncertainty. The source reference, in this case NIST, is also specified and hyperlinked.
The atomic data (energy levels, radiative A-values, collision strengths and electron effective collision strengths) for each ionic species (Z, N) are stored in the Google workbook zz_nn.Xi, where zz and nn are two-character strings associated with the atomic and electron numbers, respectively. X denotes the spreadsheets included in this workbook:
• E: level energies of the atomic model;
• A: radiative transition probabilities (A-values) and, in some cases, f-values;
• O: energy tabulations of collision strengths;
• U: temperature tabulations of effective collision strengths.
There may be more than one spreadsheet for each data type (i.e., i = 0, 1, ..., where i = 0 is the default); for example, when LS and intermediate-coupling atomic models are both considered or when allowed and forbidden transitions are analyzed separately. Each ionic workbook is displayed in both the IsonuclearSequences/zz and IsoelectronicSequences/nn subdirectories of AtomPy. An important point here is the atomic model selected for each species, which is mainly determined by astrophysical requirements and is limited to a set of levels for which both radiative and collisional data have been reported.
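The zz_nn.Xi naming convention above can be sketched as a small helper; the function name is ours, and the zero-padding of zz and nn is inferred from the 02_02 example given below.

```python
def workbook_name(z, n, sheet="E", i=0):
    """Build an AtomPy workbook/sheet label zz_nn.Xi: zz and nn are
    two-character, zero-padded atomic and electron numbers, X is the
    data type (E, A, O or U) and i is the sheet index (0 by default)."""
    return f"{z:02d}_{n:02d}.{sheet}{i}"

print(workbook_name(2, 2))           # 02_02.E0 (He I level energies)
print(workbook_name(2, 2, "A", 1))   # 02_02.A1
```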
The workbook for He I, for example, is labeled 02_02. In Figure 4 we show: (i) the 02_02.E0 spreadsheet listing level energies for a 49-level atomic model and (ii) 02_02.A0 with A-values for transitions with upper level k ≤ 7. The key advantage of AtomPy in data evaluation with respect to regular atomic databases is that, for each atomic attribute (e.g., level energies E(Z, N, i) or radiative transition probabilities A(Z, N, k, i); see the indexing in E0 and A0), it displays side by side values from several sources, which can then be statistically or graphically compared using the versatile functions of spreadsheets and DataFrames. Furthermore, the general policy is not to replace older datasets as new ones appear, thus contributing to data preservation for future reuse.
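A minimal Pandas sketch of this side-by-side layout follows; the source labels ("S1", "S2") and A-values are made up for illustration and are not the contents of any AtomPy sheet.

```python
import pandas as pd

# Illustrative A-values (s-1) for a few transitions, tabulated side by
# side from two hypothetical sources, as in an A0 sheet. Each row is
# indexed by the transition tuple (Z, N, k, i).
df = pd.DataFrame(
    {"S1": [1.0216e9, 1.7989e9, None],
     "S2": [1.0220e9, 1.7994e9, 5.66e7]},
    index=pd.MultiIndex.from_tuples(
        [(2, 2, 3, 1), (2, 2, 4, 1), (2, 2, 5, 2)],
        names=["Z", "N", "k", "i"]))

# Relative difference between sources; missing entries propagate as NaN.
df["rel_diff"] = (df["S2"] - df["S1"]) / df["S1"]
print(df)
```

A statistical comparison between sources then reduces to one-line column arithmetic, which is precisely the kind of operation used for the assessments in Section 5.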

Google Sheets
The data processing capabilities of AtomPy are boosted by the extensive functionality of Google Sheets [68], essentially a cloud-based spreadsheet application and data repository. Its functions regarding data manipulation, formatting and exporting, formula editing and chart plotting are comparable to those of the more familiar desktop spreadsheet packages; additionally, it opens up attractive new possibilities for the VRC dynamics and community-based data curation described in Sections 2 and 3:
• simultaneous distributed editing;
• URL access;
• dynamic embedding in websites and blogs.
Google Sheets offers different data-sharing profiles, but to comply with the precepts of Section 3, full open access with editing capabilities has been adopted. Therefore, one could easily envision a geographically distributed group of researchers jointly evaluating, in real time, data tabulated in a single, common, editable spreadsheet stored in the cloud and displayed at each member's site through a web browser. URL access has also enabled, through the HyperText Transfer Protocol (HTTP), the development of the AtomPy API, which is described in Section 4.4. On the other hand, Google Sheets does impose data volume restrictions and response times can be slow, but in our opinion, its other positive features amply compensate for these present shortcomings.
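As an illustration of the URL access just mentioned, a publicly shared sheet can be fetched over HTTP through Google's CSV-export endpoint; the helper below only builds the URL (the workbook key is a placeholder, not a real AtomPy identifier), and this is a sketch of the general mechanism rather than the AtomPy API itself.

```python
def sheet_csv_url(key, gid=0):
    """CSV-export URL for a publicly shared Google Sheet: 'key' is the
    workbook identifier taken from the sheet's URL and 'gid' selects
    the individual tab within the workbook."""
    return (f"https://docs.google.com/spreadsheets/d/{key}"
            f"/export?format=csv&gid={gid}")

# With network access, a Pandas user could then read a sheet directly:
#   df = pd.read_csv(sheet_csv_url("WORKBOOK_KEY"))
print(sheet_csv_url("WORKBOOK_KEY"))
```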

Pandas DataFrames
The Python Data Analysis Library (Pandas [69]) provides high-level data structures and computational tools for large-scale data analysis and modeling in the Python environment, and is built on top of the NumPy package for high-performance scientific computing. Within Pandas, DataFrames are two-dimensional, spreadsheet-like data structures with integrated indexing. In a similar fashion to spreadsheets, they provide extensive built-in functions for data manipulation; e.g., selection, filtering, indexing and re-indexing, mapping, sorting, ranking, uploading, storage, plotting and entry dropping. In practice, DataFrames facilitate interfacing with modeling codes, particularly if model sensitivity to different atomic datasets is to be regularly tested.
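Two of the built-in operations listed above (filtering and sorting) are shown below on a toy level-energy table; the column names and energies are illustrative, not an AtomPy E0 sheet.

```python
import pandas as pd

# Toy He I-like level table (columns and values are illustrative) to
# demonstrate DataFrame filtering and sorting.
levels = pd.DataFrame({
    "config": ["1s2", "1s2s", "1s2s", "1s2p"],
    "term":   ["1S",  "3S",   "1S",   "3Po"],
    "E_cm1":  [0.0, 159856.0, 166277.0, 169087.0]})

triplets = levels[levels["term"].str.startswith("3")]    # filtering
ordered  = levels.sort_values("E_cm1", ascending=False)  # sorting/ranking
print(len(triplets), ordered.iloc[0]["config"])
```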

API
The AtomPy API allows the user to download data from the Google-Drive cloud space into local Python data structures, in particular tuples and DataFrames.Its modules reside in both the Python Package Index (PyPI [70]) and GitHub [71] repositories from where they can be downloaded for local use (see Section 4.7 for details regarding installation and module dependencies).
AtomPy is invoked within a Python shell or IPython [72] interactive environment (see Section 4.5) with the command

In [1]: import atompy
Initializing AtomPy...
AtomPy ready!

Data from the elements, ions and isotopes reference sheets (see Section 4.1) can be addressed with commands of the type

In [2]: atompy.element(2)
Out[2]: ('Helium', 'He')

to list the name and symbol of the chemical element with atomic number Z = 2, or its atomic weight. It must be noted that the system returns a tuple rather than a single datum; in the case of the atomic weight, the tuple includes both the nominal value and its uncertainty. For the He I system, (Z, N) = (2, 2), ions would provide

In [4]: …

The local df DataFrame now contains data for four spreadsheets, namely E0, A0, A1 and U0. Attributes for the first 10 levels of E0, for example, can be listed with the Pandas command …, and the A-values for the electric dipole (E1) transitions in sheet A0 for source "S5" are obtained with …, where each transition is now indexed with the tuple (Z, N, k, i). It may be seen that the DataFrame structure can handle empty items, labeling them with NaN. Furthermore, the metadata for the source references in A0 can also be obtained with a similar command.

We have tried to give in this section a brief overall view of the possibilities of AtomPy in the Python sphere. What must be emphasized is that most of the data manipulation commands that have been shown (and there are many more) actually belong to Pandas rather than to our API; hence, the latter is a fairly concise yet powerful piece of software intended to promote further module development by both atomic data producers and users.
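To make the value-plus-uncertainty tuple convention concrete, here is a minimal mock of the interface just described; it is not the real atompy module (which resides on PyPI and GitHub), and the signatures and numbers are illustrative.

```python
# Hypothetical mock of the atompy query interface. The real module
# downloads these data from Google Drive; here they are hard-coded,
# and the weight uncertainty is illustrative.
ELEMENTS = {2: ("Helium", "He")}
WEIGHTS  = {2: (4.002602, 0.000002)}   # (nominal value, uncertainty)

def element(z):
    """Name and symbol of the element with atomic number z."""
    return ELEMENTS[z]

def weight(z):
    """Atomic weight of element z as a (value, uncertainty) tuple."""
    return WEIGHTS[z]

print(element(2))  # ('Helium', 'He')
print(weight(2))   # (4.002602, 2e-06)
```

Returning a tuple rather than a bare number keeps the uncertainty attached to every datum, which is the convention the AtomPy sheets follow for ionization potentials and atomic weights alike.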

IPython Notebook
Due to large and diverse data volumes and the mushrooming of distributed repositories, the introduction of scalable methods for data management and analysis, such as workflow tools, is becoming pivotal [73,74]. A workflow is the blueprint of a multi-step scientific process that facilitates its automation, validation and reproduction, and it must therefore integrate a cadre of distributed computational services, applications and databases without the need for relocation and low-level programming.
There are several workflow tools currently available, but in the present context we have implemented and recommend the IPython Notebook [75] (installation details are given in Section 4.7). It is a web-based interactive computational environment for writing documents that include hypertext, mathematics, graphs, images, video and, most importantly, dynamic input/output data and code execution. Such documents can then be shared, run and modified by a community of users who are more interested in pursuing scientific endeavors than in getting immersed in the technical aspects of the procedures involved. A Notebook file (myfile.ipynb, say) can be imported and run locally in the IPython shell with the command ipython notebook myfile or displayed as a static web page with the IPython Notebook Viewer [76].
Interactive instructions for running AtomPy, or any other utility that makes use of its API, written in the Notebook format can be stored in the AtomPy GitHub repository (see Section 4.6) for general downloading.

GitHub
A salient feature of data-intensive science is that the end user has evolved from an isolated individual into a social network, and GitHub [66] is essentially a social network of programmers that offers software repositories (both public and private), code sharing, publishing services and project management tools for collaborative code development. It is built around the Git version control system, and its functionality is based on three methods: the fork, the pull request and the merge. Forking allows the copying of a repository from one account to another such that its code can be modified at will. If the intention is to share the new changes, a pull request is made to the original owner, who can then decide to merge them into the mainstream project.
As previously discussed (see Sections 2 and 3), the idea behind AtomPy is to promote joint activities among the atomic data-producer and astrophysical communities, where the sharing of data processing utilities is among the most attractive. In this respect, the AtomPy API can then act as a pipeline between the atomic data worksheet repositories and prospective spectral modeling codes. In a similar fashion, IPython can be installed with the command pip install ipython, which should take care of all the prerequisites for the Notebook option.

Radiative Data Assessment
We are interested in compiling and assessing atomic models containing both radiative and collisional data for astrophysical applications. Therefore, the number of energy levels in the models is determined, on the one hand, by user requirements and, on the other, by what is actually available from the data producers who have responded to such demand; for the simpler systems, this is usually limited to electron configurations with principal quantum number n ≤ 5. To illustrate the possibilities of AtomPy, we review here the fairly large datasets of level energies and radiative rates that have been computed for ions of the hydrogen and helium isoelectronic sequences with Z ≤ 10 by means of well-established structure and scattering codes: the Breit-Pauli, configuration-interaction (CI) codes SUPERSTRUCTURE [81], AUTOSTRUCTURE [82,83] and CIV3 [84]; the multiconfiguration Hartree-Fock code MCHF [85]; the multiconfiguration Dirac-Fock code GRASP [86]; and the electron-ion scattering R-matrix package in both its LS and intermediate-coupling (IC) versions [87].

H Sequence
Highly accurate radiative transition probabilities for both allowed and forbidden lines of the hydrogen isotopes (H, D and T) have been critically compiled by Wiese and Fuhr [88], who recommend scaling laws for other members of the H isoelectronic sequence with low Z. Also, a noteworthy measurement in one-electron systems is the lifetime of the 2p1/2 level in He II at 99.717 ± 0.075 ps, which is in excellent agreement with theory (99.6891 ps), thus confirming basic radiation theory at the 0.075% level [89].
Since the relativistic A-values for E1 transitions listed in the online tables [99] of Jitrik and Bunge [100,101] agree with those in [88] to five significant figures and provide a more complete treatment of the forbidden transitions, we adopt their datasets as the standard for comparison. The relativistic A-values in [102] are not included in this study since only E1 transitions for selected ions were therein considered, nor are the two independent relativistic calculations [103,104] of the 2s−1s two-photon transition, as they are in almost perfect accord.
Table 1. Average permyriad (1/10^4) differences between the A-values computed with GRASP [98] for hydrogenic ions and the standard [100,101]. Allowed (E1) and forbidden (E2, E3, M1, M2 and M3) transitions between levels with principal quantum number n ≤ 5 in (Z, N) = (1−7, 1) are considered.

In Table 1, we tabulate average relative differences between the A-values computed with GRASP [98] for transitions involving levels with principal quantum number n ≤ 5 in H-like ions (Z ≤ 7) and those in the standard datasets [100,101]. In this comparison, we only include transitions with log A(Z, N, k, i) > −10, and excellent agreement (a few parts in 10^4) is found except for M1 transitions. For H-like ions with low Z, such transitions have very small line strengths and, thus, very small A-values; furthermore, the M1 transition operator in some of the computer packages is coded in its exact relativistic form, while in others the Breit two-body corrections are added to the zero-order term ∑_m [l(m) + σ_m], where l(m) is the m-th electron angular momentum operator and σ_m is twice its spin operator [105,106]. These higher-order corrections can lead to remarkable contributions (orders of magnitude in some cases) to the matrix element in H-like systems. In particular, four M1 transitions, namely 4s1/2−3d3/2, 5s1/2−3d3/2, 5s1/2−4d3/2 and 5p3/2−4f5/2, have very small A-values showing discrepancies as large as an order of magnitude. It may also be appreciated in Table 1 that the differences subside as Z increases, and by Z > 7 they are expected to be less than 1 part in 10^4.

Statistical comparisons with the A-values in the NORAD database for the H isoelectronic sequence, namely allowed and forbidden transitions between states with n ≤ 4 in (Z, N) = (1−2, 1), (6−8, 1) and (10−11, 1) [90][91][92][93][94][95][96], are impaired by the low number (i.e., three) of significant figures with which they have been tabulated. Therefore, for the E1 and E2 transitions, we can only assert that their accuracy is around or better than 1%. For the E3, M1 and M2 transitions, on the other hand, we have found some problems that are perhaps associated with an early, untested prototype of SUPERSTRUCTURE used to compute these radiative data. While the A-values for some transitions are reasonably accurate, for others significant discrepancies appear and many are missing. For instance, the listed A(1, 1, k, i) E3 are found to be accurate to better than 1% except for A(1, 1, 3d5/2, 2p1/2) E3, which differs by 60% (see Table 2), and A(1, 1, 3d3/2, 2p3/2) E3, A(1, 1, 3d5/2, 2p3/2) E3 and A(1, 1, 4d3/2, 3p3/2) E3 are not quoted. Only five M2 transitions are given, and as shown in Table 2, their A-values are a factor of four too high, suggesting an algebraic bug. These inconsistencies have been verified with A-values we computed with AUTOSTRUCTURE, which are also tabulated in Table 2. Furthermore, the problems previously discussed involving M1 transitions also manifest themselves in this comparison; for the M1 transitions listed in Table 2, there are definite correlations among the four datasets but also some outstanding mismatches.
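The "average permyriad difference" statistic used in Table 1 can be sketched in a few lines: the mean of the relative differences |A1 − A2|/A2 over a set of transitions, expressed in parts per 10^4. The A-values below are made up for illustration and are not the GRASP or standard data.

```python
def avg_permyriad_diff(a_test, a_ref):
    """Mean relative difference between two A-value lists, in parts
    per 10^4 (permyriad), as in Table 1."""
    rel = [abs(t - r) / r for t, r in zip(a_test, a_ref)]
    return 1.0e4 * sum(rel) / len(rel)

# Illustrative A-values (s-1) for three hypothetical transitions.
a_ref  = [6.2649e8, 6.2647e8, 1.6725e9]
a_test = [6.2650e8, 6.2644e8, 1.6727e9]
print(avg_permyriad_diff(a_test, a_ref))
```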

He Sequence
As can be confirmed in the 02_02 AtomPy workbook, the IC energies computed with the combined R-matrix-MCHF method in [118] for He I differ on average by −18 ± 2 cm−1 from those listed in the NIST tables. Also, their A-values for E1 transitions are within 2% of the standard [88] except for intercombination transitions with very small rates, log A(2, 2, k, i) E1 < 3, for which the differences are somewhat larger (≲10%). In general, this may be regarded as outstanding agreement. The NORAD LS term energies computed with the R-matrix method for He I [95] are also found to be below the NIST spectroscopic values, in this case by ∆E(2, 2) = −3640 ± 70 cm−1. This comparison can be extended to the A-values computed with the same method for this system in the OP [129] and listed in TOPbase [9], resulting in a somewhat smaller average difference of ∆E(2, 2) = −2220 ± 60 cm−1. In our opinion, this noticeable improvement is the result of the inclusion of pseudo-states in addition to spectroscopic states, namely {1s, 2s, 2p, 1d, 3p} in the OP hydrogenic target models (the pseudo-orbitals 3p and 1d are introduced to account for the dipole and quadrupole polarizabilities of the 1s, respectively). In contrast, NORAD only considered the spectroscopic target states {1s, 2s, 2p, 3s, 3p, 3d, 4s, 4p, 4d, 4f} for this sequence. Nonetheless, the accuracy ranking of the A-values obtained in these two calculations is comparable (≲5% with respect to the standard [88]) except for the transitions listed in Table 3. The discrepant OP transitions are between ∆n = 0 states that are energetically very close and thus subject to cancellation effects and wavelength corrections. The problematic transitions in NORAD are between D−F states that would probably need further CI to attain convergence.
Table 3. A-values (s−1) for E1 transitions in He I that show discrepancies larger than 10% with respect to the critical compilation of [88] (WF). NORAD: rates from the NORAD database computed with the R-matrix method in a 10-state approximation [95]. OP: results from the OP [129] listed in TOPbase. Wavelengths are determined from the NIST term values.
The NORAD database also lists LS terms and radiative data for E1 transitions in higher members of the He sequence, namely (6−8, 2), computed with the R-matrix method [119,120], where a healthy situation similar to (2, 2) is encountered (see the AtomPy 06_02, 07_02 and 08_02 workbooks).
The level energies E(6−8, 2) and E(10, 2) computed with the relativistic CI method of [121] are of sufficient accuracy to encourage an evaluation of the spectroscopic data in the NIST database (v5.1). For (6, 2), the NIST level energies originate from the measurements of [130], and on average, the difference with theory is found to be a remarkable ∆E(6, 2) = 51 ± 14 cm−1. For the other species, they are taken from the unpublished level list in [131] that has not been critically assessed by NIST, and although the agreement with theory is not as good, it enables the detection of poor measurements. As shown in Figure 5, the differences with theory for Ne IX are within the band ∆E(10, 2) = ±1000 cm−1 except for six levels (3 1D2, 4 1Fo3, 5 1Fo3 and the three 5 3DJ levels) whose experimental energy positions may be due for revision.
Figure 5. Level energy differences in Ne IX between the theoretical values of [121] and the experimental data listed in the NIST database (v5.1). Such differences are mostly bound to the interval ±1000 cm−1 except for six levels whose spectroscopic positions are open to revision.
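The outlier screen behind Figure 5 amounts to flagging levels whose theory-minus-experiment difference falls outside a ±1000 cm−1 band; a sketch follows, with level labels and differences that are illustrative and not the actual Ne IX data.

```python
def flag_outliers(diffs, band=1000.0):
    """Return the labels of levels whose energy difference (cm-1)
    exceeds the +/- band (cm-1) in magnitude."""
    return [label for label, d in diffs.items() if abs(d) > band]

# Hypothetical theory-minus-experiment differences (cm-1) per level.
diffs = {"2 3S1": -120.0, "3 1D2": 2450.0,
         "4 1Fo3": -1800.0, "2 1Po1": 310.0}
print(sorted(flag_outliers(diffs)))  # ['3 1D2', '4 1Fo3']
```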
For E1 transitions, the A(6−8, 2) E1 computed in [121] are within 0.7% of the standard, except for A(7−8, 2, n 1 P o 1 , n 1 D 2 ) E1 and A(7−8, 2, n 3 D J , n 3 P o J ) E1 ; that is, ∆n = 0 transitions subject to the strong cancellation effects mentioned above. Furthermore, their n = 2 forbidden transitions are also in good accord (≲ 1%), except for the sensitive intercombination transitions A(6−8, 2, 2 3 P o 1 , 1 1 S 0 ) E1 , which differ by ∼ 4%. The most revealing comparison, in fact, is of the electric quadrupole rates A(Z, 2, 3−5 1 D 2 , 1 1 S 0 ) E2 computed by Savukov et al. [121] with those by Godefroid and Verhaegen [108] using the MCHF method and by Cann and Thakkar [112] by means of explicitly correlated wave functions. Apart from satisfactory agreement (≲ 5%), it brings out incorrect data for A(6−8, 2, 3 1 D 2 , 1 1 S 0 ) E2 in the NIST database (v5.1); as indicated on the NIST web page, the A-values by Cann and Thakkar [112] are listed for this transition for 2 ≤ Z ≤ 5 and those by Godefroid and Verhaegen [108] for 6 ≤ Z ≤ 8. However, as demonstrated in Table 4, the NIST A-values for 6 ≤ Z ≤ 8 are low by a factor of 2/3, which appears to indicate the use of incorrect statistical weights in the conversion from gf-values to A-values; the table also brings out the excellent overall agreement that prevails among the theoretical data.
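The statistical-weight pitfall can be illustrated with the standard conversion from absorption gf-values to transition probabilities, A = 6.6702 × 10¹⁵ gf / (g_k λ²) with λ in Å and g_k the upper-level weight: a mis-assigned g_k rescales every converted A-value by the same constant factor. The gf-value, wavelength and weights below are illustrative placeholders, not the actual NIST entries.

```python
# Sketch of the gf -> A conversion and of how a wrong upper-level statistical
# weight g_k shifts all A-values by a constant factor (the pattern behind the
# 2/3 discrepancy discussed in the text). All numbers here are illustrative.
def a_value(gf, wavelength_angstrom, g_upper):
    """A (s^-1) from the absorption gf-value: A = 6.6702e15 * gf / (g_k * lambda^2)."""
    return 6.6702e15 * gf / (g_upper * wavelength_angstrom**2)

gf = 0.01    # hypothetical gf-value
lam = 40.0   # hypothetical wavelength in Angstroms

correct = a_value(gf, lam, g_upper=2)  # conversion with the proper weight
wrong = a_value(gf, lam, g_upper=3)    # a mis-assigned weight: A comes out low
print(wrong / correct)                 # ratio of the two conversions, ~2/3
```

Any constant offset of this kind in a tabulated dataset is thus a strong hint of a bookkeeping error in the conversion rather than a genuine physical discrepancy.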
Table 4. A-values (s⁻¹) for the A(Z, 2, 3 1 D 2 , 1 1 S 0 ) E2 transitions, showing the incorrectly assigned data in the NIST database (v5.1) for 6 ≤ Z ≤ 8. GV: [108], computed with the MCHF method. CT: [112], using explicitly correlated functions. SJS: [121], computed with a relativistic CI method. Excellent agreement is otherwise found among the computed A-values. Wavelengths are determined from the NIST level energies.

Level energies have also been computed for He-like ions with GRASP and with SUPERSTRUCTURE, e.g., for (10, 2) [127,128]; these datasets are contained in the AtomPy zz_02 workbooks. In Table 5, we tabulate for these species the average differences between the spectroscopic level energies listed in the NIST database (v5.1) and those obtained with GRASP and SUPERSTRUCTURE; it must be noted that, for (3−9, 2), they have been calculated both with and without the Breit and QED corrections. For (3−9, 2), the GRASP average differences are approximately constant at ∆E(3−9, 2) ∼ −22 × 10³ cm⁻¹, and the inclusion of the Breit and QED corrections does not lead to significant reductions. By contrast, for (10, 2) the GRASP average difference is an order of magnitude smaller (1.5 × 10³ cm⁻¹), and so are those obtained with SUPERSTRUCTURE for other ions, a level of agreement more compatible with that encountered in the calculation of [121]. It must also be appreciated that, for (10, 2), the SUPERSTRUCTURE energy differences are both positive and negative depending on authorship. In conclusion, the general outcome of this exercise seems to indicate that a variety of strategies for atomic-model optimization have been employed, some more successful than others, and given their bearing on the final data products, a great deal of care and effort must go into the representation of the atomic system.
Moreover, we have found that only a fraction of the published A-values for He-like ions computed with GRASP are in reasonable agreement (within 10%, say) with the standard and the accurate dataset of [121]. This fraction is ∼ 40−60% for E1 transitions and somewhat lower or similar for the forbidden transitions: ∼ 30−50% for E2 and M1 and ∼ 50−60% for M2; it tends to increase slowly with Z. The A-values calculated in [127] for (10, 2) with SUPERSTRUCTURE are mainly for 17 strong E1 transitions (log A > 7) that are found to be in good agreement (better than 10%) with the standard, except for A(10, 2, 3 1 P o , 1 1 S).

Figure 6. Ratio of theoretical A-values for selected transitions relative to the standard [110,113]. Star: calculation with GRASP for (10, 2) [125]. Squares: calculation with AUTOSTRUCTURE, present work. Crosses: relativistic CI calculation [121]. Diamonds: A-values from the NORAD database computed with the R-matrix method [95,119,120]. Triangles: OP [129].
In order to illustrate the problems in such datasets, we plot in Figure 6 the ratios relative to the standard [110,113] of A-values computed with different methods for three transitions along the isoelectronic sequence: the A(2−10, 2, 2 1 P o , 1 1 S) resonance (E1) transition, the A(2−10, 2, 2 3 S, 1 1 S) M1 transition within n = 2 and the A(2−10, 2, 3 1 D, 1 1 S) E2 transition. For the E1 transition, it may be seen that, in comparison with the R-matrix (including OP) [95,119,120,129] and relativistic CI [121] results, the data computed with GRASP are noticeably higher (≳ 20%) for low Z, which seems to indicate insufficient correlation. We have also calculated A-values for the He-like ions (2−10, 2) with AUTOSTRUCTURE using atomic models containing the configurations with orbital angular momenta 0 ≤ l ≤ n − 1, and as shown in Figure 6a, for A(Z, 2, 2 1 P o , 1 1 S) E1 they are not much different from those obtained with GRASP. More comparable values are obtained with AUTOSTRUCTURE for low Z by adopting the CI expansions of [108] with 1 ≤ n ≤ 4, 2 ≤ n ≤ 4 and 3 ≤ n ≤ 4; however, the A-values are found to be very sensitive to the choice of the λ 1s scaling parameter of the statistical model potential used to generate the 1s orbital, and it is not easy to arrive at an optimized value variationally. Single-excitation CI expansions of the type in Equation (4) have been extensively used for He-like targets in scattering calculations, while those in Equations (5) and (6), which include double excitations, would rapidly lead to unmanageable collisional targets. However, the quality of a collisional target is determined to a great extent by the accuracy of its radiative signatures; therefore, targets built with poor CI expansions can lead to unreliable collision strengths. Furthermore, the situation for the M1 and E2 transitions in Figure 6b,c is not much different, i.e., distinctively discrepant A-values for low Z, although in this case AUTOSTRUCTURE performs somewhat better than GRASP. Again, more accurate A-values for low Z can be obtained with AUTOSTRUCTURE with double-excitation CI expansions of the type [108]

|2 3 S⟩ = {1s2s, 3s4s, 2p3p, 4p5p, 3d4d}.

This unconverged correlation in He-like collisional targets becomes more acute for excited states, as shown in Figure 7, where we plot A(Z, 2, n 1 P o , 1 1 S) E1 for 2 ≤ n ≤ 9 and Z = 6 and Z = 10. Firstly, there is excellent agreement between the OP A-values [129] and those obtained with the relativistic CI calculation of [121]. On the other hand, the differences with the data computed with GRASP for C V [124] and Ne IX [125] are significant and grow with n; for C V, the relative difference between GRASP and [121] for n = 5 is a factor of three (see Figure 7). It may also be seen that the A-values computed with AUTOSTRUCTURE show a similarly discouraging behavior.
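Sequence-wide comparisons of this kind amount to tabulating ratios against the standard and flagging outliers. A minimal Pandas sketch follows; the column names and rates are hypothetical placeholders, not the published values.

```python
# Sketch: flagging discrepant A-values along an isoelectronic sequence by
# their ratio to a standard dataset, in the spirit of Figure 6.
# All columns and numbers are illustrative placeholders.
import pandas as pd

rates = pd.DataFrame({
    "Z":        [2, 6, 10],
    "standard": [1.80e9, 8.87e11, 8.85e12],   # reference A-values (s^-1)
    "method_x": [2.20e9, 9.10e11, 8.90e12],   # dataset under assessment
})

# Ratio to the standard; entries deviating by more than 10% are flagged
rates["ratio"] = rates["method_x"] / rates["standard"]
flagged = rates[(rates["ratio"] - 1).abs() > 0.10]
print(flagged["Z"].tolist())
```

With real workbook columns in place of the placeholders, the same filter yields the discrepancy tables discussed in the text.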
There have been extensive measurements of level lifetimes in He-like ions which, taking advantage of the high accuracy of the theoretical data, have led to useful benchmarks and progressive refinement of a variety of experimental methods. In Tables 6 and 7 we give a representative, but by no means exhaustive, comparison between experiment and theory for (2−10, 2). It may be appreciated that the portfolio for He I includes excited states with n ≤ 5 while for Z > 2 it is limited to n = 2, but in most cases the theoretical lifetimes lie within the experimental error bars. Outstanding agreement (better than 1%) is found for τ(2, 2, 2−5 1 P o 1 ) and τ(2−10, 2, 2 3 S 1 ), except for Z = 3, where it deteriorates to ∼ 6% due to experimental limitations of the heavy-ion storage ring [132]. The accord for τ(6−9, 2, 2 3 P o 1 ) and τ(4−9, 2, 2 3 P o 2 ) is found to be within 5%. As shown in Table 8, measurements have also been reported for the transition rates A(2−9, 2, 2 3 P o 1 , 1 1 S 0 ) E1 and A(2, 2, 2−5 1 P o 1 , 1 1 S 0 ) E1 , where the agreement with theory is within 3% except for A(8, 2, 2 3 P o 1 , 1 1 S 0 ) E1 .
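The theoretical lifetimes in these tables follow from the A-values as the inverse of the total decay rate of the upper level, τ(k) = 1/Σᵢ A(k, i). A minimal sketch with hypothetical rates:

```python
# Sketch: a level lifetime as the inverse of the summed transition
# probabilities over all decay channels. The A-values (s^-1) below are
# hypothetical placeholders, not measured or published rates.
a_values = [1.0e9, 2.5e8, 5.0e7]   # decay channels of one upper level
tau = 1.0 / sum(a_values)          # lifetime in seconds
print(f"tau = {tau:.3e} s")
```

Comparing such derived lifetimes with beam-foil or storage-ring measurements is what underpins the experiment-theory benchmarks of Tables 6 and 7.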

Conclusions and Recommendations
In the present work we have attempted to bring out and discuss new scientific research directions that are driven by collaborative, data-intensive projects and, in this context, some of the problems that compromise reliable atomic data assessments for the astrophysical community. We have thus proposed a new scheme based on an open virtual research community of both atomic data producers and users engaged in an ongoing data-curation pursuit where data quality is the prevalent currency. For this purpose we have developed a working prototype of a cloud-computing data curation environment on Google Drive, referred to as AtomPy, to promote the use, reuse, assessment and preservation of atomic datasets. AtomPy is based on the powerful functionalities of Google Sheets and Pandas DataFrames, and opens a door to the GitHub social network for the development and sharing of atomic data manipulation workflows and Python utilities. The concept of the workflow representation as the blueprint of a research process is innovative and likely to play a major role in data-intensive science; among the many workflow tools that are becoming available, and after the experience of the present work, we recommend the use of IPython Notebooks.
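As an illustration of this workflow, a published Google Sheet can be pulled into a Pandas DataFrame through Google's CSV export endpoint. The sheet key below is a placeholder, and the columns in the offline stand-in are hypothetical, not an actual AtomPy schema.

```python
# Sketch: loading an AtomPy-style Google Sheet into a Pandas DataFrame.
# SHEET_KEY is a placeholder; a real workbook would supply its own key.
import pandas as pd

SHEET_KEY = "YOUR_SHEET_KEY"
url = f"https://docs.google.com/spreadsheets/d/{SHEET_KEY}/export?format=csv"
# df = pd.read_csv(url)   # network call; uncomment once a real key is in place

# Offline stand-in for a downloaded A-value sheet, with hypothetical columns
df = pd.DataFrame({"k": [2, 3], "i": [1, 1], "A": [1.8e9, 5.7e8]})
strong = df[df["A"] > 1e9]   # e.g., select the strong transitions
print(strong)
```

From here, the DataFrame can be merged, filtered and compared against other sources within an IPython Notebook, which is the usage pattern AtomPy is built around.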
With the aid of AtomPy we have reviewed the energy structures and radiative rates of ions in the hydrogen and helium isoelectronic sequences with atomic number Z ≤ 10, an interesting opportunity to gauge the accuracy level that can actually be attained with the main body of atomic computational codes, among them SUPERSTRUCTURE, AUTOSTRUCTURE, GRASP, MCHF, CIV3 and R-matrix. As a result, we intend to denote our recommended datasets in the AtomPy metadata by upgrading their source identifications from Sn to Rn, where n is simply an integer label.
The critical compilations of radiative rates for isonuclear sequences with Z ≤ 5 [88,149] have set the stage for the present dataset comparisons by establishing the standard references for the H [100,101] and He [105,107–116] sequences. For the H sequence, we find that the A-values for allowed (E1) and forbidden (E2, E3, M1, M2, M3) transitions computed with GRASP [97,98] are highly accurate (a few parts in 10⁴), the larger differences arising among the magnetic dipole (M1) transitions with small line strengths. The relativistic A-values for E1 transitions in selected H-like ions of [102] also display this accuracy ranking. On the other hand, the A-values listed in the NORAD database for this sequence and computed with SUPERSTRUCTURE [90–96] are found to be accurate (∼ 1%) for E1 and E2 transitions, but large, unexpected discrepancies appear for some E3, M1 and M2 transitions, probably due to computer bugs.
Regarding the He isoelectronic sequence, we have found that in general the E1 A-values calculated with the R-matrix method for the neutral atom and higher members, in both LS [90,95,119,120] and intermediate [118] coupling and involving highly excited states (n ≤ 10), are of satisfactory accuracy. We should also include in this ranked group the older rates from the OP [129], which reinforce the longstanding effectiveness of the R-matrix method for the computation of large and accurate radiative datasets for allowed transitions; furthermore, taking into account the impressive accuracy reached in the IC data for He I [118] with a variant of this method, it can certainly be extended to the more sensitive intercombination (∆S ≠ 0) transitions.
We also find that the energies and radiative rates for E1, E2, M1 and M2 transitions in selected He-like ions computed with a fully relativistic CI approach [121] are of outstanding accuracy (A-values to better than 1%), which has allowed us to evaluate the consistency of other energy and radiative datasets, including some in the NIST database: inaccurate E2 A-values and spectroscopic level energies in Ne IX that are open to revision.
Most of the radiative datasets that have been computed with GRASP [122–125] and SUPERSTRUCTURE [127] (A-values from [126,128] were not available) for He-like ions have used CI expansions streamlined for collisional targets rather than for obtaining accurate A-values. Such CI is therefore not fully converged, and the accuracy for both allowed and forbidden transitions in low-Z ions and those involving excited states can be poor. In the case of the datasets computed with GRASP, just over half of the A-values are within 10% of the standard, the particularly discrepant transitions being those in the principal series and E2 transitions in ions with Z ≤ 8. The situation is not as critical for the A-values for Ne IX listed in [127], since only strong E1, low-n transitions were considered.
We have carried out a fairly extensive comparison of experimental and theoretical lifetimes for He-like ions that brings out two important points. Firstly, the high degree of experimental precision for metastable states, e.g., τ(Z, 2, 2 3 S), and the excellent agreement with theoretical estimates have thoroughly validated the finer grain of radiation theory (e.g., Breit M1 corrections). Secondly, the quality of the theoretical radiative rates for He-like ions has been useful to benchmark a wide variety of experimental techniques that target excited states with much shorter lifetimes.
We would like to close this review by recalling that the datasets analyzed in the present work are openly available from AtomPy [67], and by making a plea to data producers regarding publishing formats. The latter do not often facilitate machine reading, or even manual transcription, and consequently become a burden for compilations and sound curation. A general effort to make atomic data available at least in digital format would indeed be welcome.

… assessment for ionic species of the H and He sequences presented here. Josiah Boswell developed, coded and tested the different versions of the AtomPy prototype and, together with David Ajoku, implemented utilities to download massive datasets from several websites and publications, making them available in practical digital formats for further data processing.

Figure 1. Example of a virtual research community. Image source: Figure 1 of [36].

Figure 2. Three-level data curation model proposed for e-Science specifications. Image source: Figure 10 of [58].

Figure 3. Google reference sheet ions of AtomPy listing NIST ionization potentials (IP) for elements with Z ≤ 110 (only those with Z ≤ 8 are displayed). Note that IP uncertainties are also included.

Figure 4. Workbook 02_02 for the (2, 2) ionic system (i.e., He I), showing the 02_02.E0 sheet with a 49-level atomic model (only the first 15 levels are depicted) and the 02_02.A0 sheet with A-values for transitions with upper level k ≤ 7.

Table 8. Experimental and theoretical transition rates (s⁻¹) for He-like ions.