PyAtomDB: Extending the AtomDB Atomic Database to Model New Plasma Processes and Uncertainties

The AtomDB project provides models of X-ray and extreme ultraviolet emitting astrophysical spectra for optically thin, hot plasma. We present the new software package, PyAtomDB, which now underpins the entire project, providing access to the underlying database, collisional radiative model calculations, and spectrum generation for a range of models. PyAtomDB is easily extensible, allowing users to build new tools and models for use in analysis packages such as XSPEC. We present two of these, the kappa and ACX models for non-Maxwellian and Charge-Exchange plasmas respectively. In addition, PyAtomDB allows for full open access to the apec code, which underlies all of the AtomDB spectra and has enabled the development of a module for estimating the sensitivity of emission lines and diagnostic line ratios to uncertainties in the underlying atomic data. We present these publicly available tools and results for several X-ray diagnostics of Fe L-shell ions and He-like ions as examples.


History of the AtomDB Project
The AtomDB project was originally designed to model X-ray emission from collisionally ionized, optically thin plasma in thermal equilibrium. It built on a history of development, starting with the Raymond-Smith code [1] in 1977 which created spectra for the most abundant 12 elements in a hot plasma using tabulated Gaunt factors and effective collision strengths. In 1985, this was updated by Brickhouse, Raymond, and Smith [2] to improve both the code and atomic data, particularly for the iron L-and K-shell ions. Effective collision strengths were updated and, significantly, the atomic data was separated from the code into independent data files.
In 2001, the first version of AtomDB was released [3]. This included the first release of the publicly accessible atomic database (APED: Astrophysical Plasma Emission Database), complete with defined filetypes as standardized FITS files [4]. In addition, the thermal plasma model, the Astrophysical Plasma Emission Code (APEC), was written and used to convert this database into line and continuum emissivities, which were and are still used extensively in spectral analysis tools such as XSPEC [5] and Sherpa [6]. This was accompanied by another large scale update to the atomic data in APED, in particular for modeling He-like and H-like ions.
In 2012, AtomDB version 2 was released [7]. This again included a large update to atomic data in APED: Ionization and recombination rates were updated, along with new excitation data for He-like and H-like ions. In addition, APED was expanded to include all the elements up to Nickel (previously it had covered only the 14 most abundant elements).
Four years later, AtomDB version 3 was released. This involved a major expansion of the code, the database and tools used to interpret the spectra to allow for the modeling of non-equilibrium ionization plasma, as well as updates to existing atomic data. In particular, there was a need to handle much larger data files (inner shell processes opened up many new levels), calculations for one ion at a time, and new input and output formats suitable for more flexible spectral analysis.
This large scale restructuring of the code exposed inflexibility in the APEC code, which had been designed to run from start to finish and produce one complete output file for the APEC model. To allow for future flexibility, the entire project was shifted to python with the release of the open source PyAtomDB (https://github.com/AtomDB/pyatomdb), which now underpins all aspects of the AtomDB project.
During and subsequent to this, we discovered that new data formats allowed for a significant improvement in AtomDB's capabilities, enabling a wide range of additional models to be produced. In this paper, we will outline the PyAtomDB project, the new data formats created for non-equilibrium modeling, the new models which this has enabled for charge exchange (CX), non-Maxwellian electrons (Kappa), and the ability to perform uncertainty studies.

The Astrophysical Plasma Emission Database (APED)
The AtomDB database consists of a selection of data for every ion from H to Ni, all stored as binary FITS (http://fits.gsfc.nasa.gov/fits_standard.html) table files. The data are stored in several different files for each ion, separated by process. Throughout the database, all ions are indexed by their ion charge plus one, so for example He_1 is neutral Helium, and files related to it are stored in the directory $ATOMDB/APED/he/he_1/. There are up to eight different file types for each ion, as listed in Table 1. In addition, APED contains several multi-element files, which hold data covering many ions together, as described in Table 2.  Data in the APED is stored as close to the original data format as possible. As a result, there is no single standard for values that are frequently represented as functions, such as collision strengths. Instead there are 10 or more different formats ranging from temperature/value pairs to fit coefficients, each of which can then be interpreted by the APEC code to produce the actual rate coefficient or other value required. This can be confusing for an end user, so part of the motivation for PyAtomDB was to provide a unified interface allowing users to request a piece of atomic data and get the actual rate or coefficient back, instead of the raw fit parameters, etc., which require further interpretation. This accessibility improvement makes using data in other models significantly easier.

The Astrophysical Plasma Emission Code (APEC)
APEC reads APED and processes the data using a collisional-radiative model to produce line and continuum emissivities on a set temperature and density grid. The plasma is assumed to be optically thin. For each ion at each grid node the excitation rates are calculated and combined with the spontaneous emission coefficients to form a collisional radiative matrix connecting the populations of all levels. Solving this gives level populations through simple multiplication and line emissivities per ion. Ionization and recombination into excited levels is also included in this matrix.
These line emissivities are then multiplied by the elemental abundance (by default [8]), and the ion population in equilibrium to produce the emissivity ε ij for each line: where A ij is the spontaneous emission coefficient from level i to j, N i is the number density of ions in excited state i, N z1 is the number density of the ion, N Z is the number density of the element, and N H is the number density of hydrogen. Therefore these terms from left to right are the spontaneous emission coefficient, the level population, the ion fraction, the elemental abundance relative to hydrogen, and the hydrogen density. APEC then produces two output files for the plasma, storing two kinds of emissivity data. The line file contains the ε ij , wavelength, ion and transition identification for each line above a set ε ij (by default, 10 −20 ph cm 3 s −1 ). The coco files contain the continuum emissivities from bremsstrahlung, radiative recombination and two photon decay, in ph cm 3 keV −1 s −1 , compressed onto an emissivity vs energy grid by an algorithm [9] which ensures that when the table is linearly interpolated the result is within 1% of the original emissivity value at every point. This is stored on an element by element basis, assuming equilibrium ionization fractions. The coco file also contains the sum of the emission from weak lines not contained in the line file, known as the pseudocontinuum, which is compressed in the same way as the continuum. Within these two files, both of which are standard FITS files, the first Header/Data Unit (HDU) is the list of temperatures, then each subsequent HDU contains the line or continuum emission for one of these temperatures. This data can then be rapidly read by fitting codes such as XSPEC and used to analyze data, interpolating between neighboring temperatures in log(T) − log(ε) space.
With AtomDB 3, the introduction of the non-equilibrium ionization (NEI) model means that the ionization fraction, N z1 /NZ, could not be assumed to be fixed. There are therefore new nei versions of the same data files that do not have this factor included in their emissivity, and for each line and continuum component specify the driving ion which created the feature, as opposed to just the element or ion of the line. For example, the forbidden line of He-like O can originate from the excitation of O 6+ , ionization of O 5+ , or recombination of O 7+ . Creating a model spectrum then involves calculating the ionization fraction for the plasma in question and then multiplying the tabulated emissivities by this to the result to obtain the final emissivity for the model.

PyAtomDB
The introduction of AtomDB 3 created a vastly larger database to handle all of the underlying atomic processes involved in non-equilibrium processes. In particular, inner-shell excitation and ionization, along with excitation-autoionization, were included in the database for the first time and it was necessary to include ions that are not significant emitters in the X-ray region in equilibrium plasmas, such as M-shell ions.
Initially, APEC was modified to handle these data sets, although it became clear that this was not an efficient method for handling this. The original code started by loading all of the atomic data in the database, then iterating over each temperature, density, element, and ion to produce the spectra. Due to the code's structure, changing the memory storage alone (required due to the entire database no longer fitting in a reasonable machine's memory) required a complete re-write of the code. While doing this, it was also converted to Python due to the ease of development, support across different systems, widespread existing use in astronomy, and existing relevant packages such as Astropy [10,11] which PyAtomDB now uses. This conversion enables several long term goals of the project: Users can now run the code themselves, direct access to the APED is possible for those wishing to integrate atomic data lookups into their code, and spectral models based on APED can be developed by users and integrated easily into analysis tools. The resulting six different modules of the PyAtomDB package are listed in Table 3. A full manual of the capabilities of PyAtomDB is beyond the scope of this paper. We will instead highlight some of the capabilities of the spectrum module, creating spectra of interest to end users, and extensions to it for modeling different, non-standard plasma types and investigating the effects of uncertainties on atomic data.

Spectral Modeling with PyAtomDB
The spectrum module in PyAtomDB contains routines for creating collisional ionization equilibrium (CIE) spectra of thermal plasmas, complete with applying instrument responses. These tools are also provided with wrappers, allowing them to be used in PyXspec (https://heasarc. gsfc.nasa.gov/xanadu/xspec/python/html/).

Non-Equilibrium Ionization Modeling
In addition to simple CIE plasmas, there are extensive models for NEI plasmas within PyAtomDB. These models represent a plasma where the electron temperature has rapidly changed, however the charge state distribution of the ions has not yet had time to reach a new equilibrium at this temperature. As the timescale for ionization and recombination is significantly longer than that for individual level populations to change, there is significant emission from ions at temperatures where they would not normally be observed in CIE. Such plasmas can be found in any place with shock heating-for example young Supernova Remnants (SNR) where the reverse shock rapidly heats the electrons leaving the plasma in an ionizing state until the plasma reaches equilibrium again e.g., [12]. Similarly, a recombining plasma, where the electrons have undergone a rapid cooling, can be found in many Mixed Morphology SNR, where rapid expansion leads to rapid cooling of the electrons and a subsequent recombination dominated plasma [13]. Figure 1 shows an equilibrium, ionizing from kT e = 0.07 keV and recombining from kT e = 7.0 keV spectrum for a kT e = 0.7 keV plasma at an ionization timescale of 10 10 cm −3 s. The ionizing case is dominated by He-like lines of Si and Ne, while the recombining case shows significant emission from He-and H-like Fe, as well as the H-like Si and Mg ions. This model is identical to the existing rnei model in XSPEC, but has additional flexibility such as allowing the user to identify strong lines in a selected wavelength region of a non-equilibrium plasma. There is a similar model for a plane-parallel shock (pshock) [12].

Non-Maxwellian Modeling
Non-Maxwell-Boltzmann electron energy distributions can occur in a range of astrophysical plasmas such as the Earth's magnetosphere [14], supernova shock waves (e.g., [15]), or solar flares (e.g., [16]). In these cases, the electron energy distribution is not well represented by a Maxwellian distribution due to an excess of high energy electrons. These can be modeled by a κ distribution: where: When κ approaches its lower limit of 3/2, the distribution is highly non-Maxwellian, when κ = ∞ it is once again Maxwellian.
Most electron energy dependent atomic data is, however, collated assuming the electron energies follow a Maxwellian distribution, as it saves substantial disk space, speeds computation of emissivities, and is relevant to the majority of applications. In theory the parameters such as electron collision strengths could be recalculated from first principles for non-Maxwellian plasmas, however the underlying cross section data either no longer exists for the vast majority of ions or, if they are, are too unwieldy for practical use. As a result, representing the κ distribution as a sum of Maxwellian plasmas presents an attractive compromise. We have implemented [17] the decomposition method of [18] to create spectra for non-Maxwellian plasma, wherein several different Maxwellian temperatures are added to closely approximate a true κ energy distribution.
Initially this was to be performed as a complete rewrite of the underlying model, however with the flexibility of PyAtomDB and the new data formats, we are able to achieve this using a simple 2-step process: The effective ionization and recombination rates are calculated by appropriately summing the Maxwellian rates as shown in Figure 2, the charge state distribution is calculated, and then the non-equilibrium emissivities can be combined with these to provide a non-Maxwellian spectrum.

Charge Exchange
Charge exchange occurs when a donor ion or atom with one or more electron still attached interacts with a recombining ion, which is missing at least one electron, resulting in the transfer of the electron from the donor to the receiver: where R is the recombining ion, and D is the donor. In most situations where charge exchange is postulated (e.g., the solar wind), the donor is neutral hydrogen as the only element with a significant enough abundance for the interaction to be observable. Typically, the exchanged electron is captured into an excited state of the ion, and these are often significantly higher n and l shells than those populated by electron impact excitation. As the ion relaxes to its ground state, the line emission is therefore significantly different from that observed in a typical thermal plasma. In some cases this can allow for the direct identification of CX, but in many more cases the foreground CX emissions adds an additional nuisance component to the spectrum of interest. We released version 1 of the AtomDB Charge Exchange (ACX) model in 2014. This used a combination of analytical nl shell capture distribution formulae with a cascade to ground calculated through the APED database to create spectra useful for identifying CX emission [19], and successfully used it to model the spectrum of the hot and cool solar wind components. However, the model had several shortcomings. On the physics side, the CX process was modeled without any velocity dependence, with the user selecting from one of four analytical cross section models to see which worked best. On the implementation side, the model relied on a computationally expensive process combining the AtomDB database to produce the spectra and line lists of interest. This meant that as the APED database was updated in the future, the changes in line wavelength and A-values were not reflected in the ACX model. To expand the model, for ACX version 2 theoretical charge exchange cross sections (as opposed to the analytical nl distributions), including velocity dependent effects, were obtained from the Kronos database [20][21][22]. PyAtomDB was then used to create spectra for capture into each nl shell, and subsequent cascade to the ground state. By treating each of these as independent spectra, as if they were separate ions in the NEI model, spectra can be rapidly created in two steps: First, the cross section for capture into each nl shell is calculated, then the spectrum for each nl shell is multiplied by this cross section and then summed: where ε CX is the total CX emissivity, nl is the shell captured into, E and v are the energy and velocity of the center of mass of the donor-recombining ion, and ε nl is the emissivity for capture into that one specific nl.

Cross Section Data Selection
The Kronos database contains only nlS selective cross sections for collisions with a hydrogen donor and a H-like recombining ion, and also for the bare recombining ion of C, N, O, and Ne. For all other bare ions, and for He as the donor, data is only resolved to the n shell.
In the case of nlS resolved data, the total capture into all the states which match the nlS is split evenly. For n resolved data the l shell distribution is handled by one of the analytical distribution functions described for ACX version 1 (the user may choose which one). For all other ions, where there is no data in the Kronos database, the model falls back to the ACX 1 models, with the total cross section fixed at 3 × 10 −15 cm 2 .
Within the Kronos database, there are often different data sets present for the same ion. As the authors of the database themselves advised, we choose the data which exists from the best available technique if no specific set is recommended. The techniques are ranked from quantum molecular orbital close coupling (QMOCC) [23], atomic orbital close coupling (AOCC) [24], classical trajectory monte carlo (CTMC) [25,26], to multi-channel Landau Zener (MCLZ) [27].

Spectral Generation
For each nl or nlS to which capture is possible given the cross section in the Kronos database, we calculate the spectrum by creating a radiative matrix assuming no further collisional excitation as the electron cascades to the ground state. The A-values are taken from the AtomDB database for all ions where they exist.
For some heavier ions, the n shell for capture ends up being very high, e.g., 16 for the donor Fe 25+ . This is beyond the AtomDB holdings, which stop at n = 10 for He-and H-like ions and at n ≤ 7 for all other ions. For these cases we have topped up the existing AtomDB A-values and wavelengths using calculations from AUTOSTRUCTURE [28]. Where possible, the wavelengths for these calculations have been shifted to match the NIST ASD [29] values. The new energy levels were matched to the existing AtomDB levels using energy-order, symmetry, and degeneracy matching. For the Li-, He-, and H-like ions these matches were relatively trivial, though in the middle of the Fe L shell ions this matching becomes more problematic. Fortunately, the inaccuracies introduced by poor wavelengths at high energies in these complex ions can be largely discounted due to the fact that the sheer number of lines prevents any one line from producing a strong spectral signature. By contrast, a simple ion such as O 7+ has a straightforward series of lines with well established wavelengths, and therefore discrepancies would be more pronounced.
Constructing the model in PyAtomDB has two distinct advantages: It allows the model to be rapidly developed (as it reuses most of the tools for NEI plasma) and it enables users to deploy it using the same commands.

Varying Atomic Data
Estimating the magnitude of the uncertainties on rates for fundamental atomic processes is a difficult task with no obvious correct solution. The source data comes from a wide range of theoretical and experimental sources, often lacking sufficient information on the associated uncertainties to make a meaningful estimate. Nevertheless, there is significant effort being applied to estimating atomic data uncertainties: Most atomic data publications in the past decade have some effort to make reasonable uncertainty assessments, and there are several groups e.g., [30,31] that are working on trying to estimate uncertainties on the existing atomic data.
Given the magnitude of this task, being able to direct the efforts by identifying the atomic data which line emission and ratio diagnostics are most sensitive to is essential for more accurate data analysis. We have developed the variableapec (https://github.com/AtomDB/variableapec) module, which makes use of routines in PyAtomDB to identify which atomic processes most strongly affect the resulting spectra. The user can vary any fundamental atomic data in the AtomDB database by a set amount and re-run emissivity calculations to identify which lines are sensitive to the changed parameter. This allows users to investigate the sensitivity of line and line ratio emissivities over a range of temperatures and densities. The module also checks for other lines affected from perturbing a single line.
Varying an atomic rate of a single transition may have cascading effects for other transitions, depending on the strength of the line varied or the levels of the transition. The user can input a range of wavelengths and cutoff threshold for the fractional emissivity change and plot lines sensitive to the transitions changed. This is an important diagnostic of lines that are sensitive to cascade effects as well as a diagnostic of correlated errors. Throughout this work we provide errors as the mean value of the positive and negative uncertainties for the fractional emissivity change. Line ratio errors are half the range of the maximum and minimum ratios obtained when applying both positive and negative uncertainties to the individual atomic data. Figure 3 shows Fe XVII lines sensitive in the range of 10-20 Å to a ±30% change in A value in key emission lines at low density and the temperature where Fe XVII peaks in abundance. Varying the A value for the 3E transition alone resulted in a greater than 0.01% change in emissivity for not only the 3E transition but also four other lines in the range of 10-20 Å. Note that at a low density, the emissivities of the lines are almost only sensitive to the rate of population into the level (i.e., the'effective collision strengths) as opposed to the rate out (i.e., the A values). 3E also has the greatest fractional change in emissivity of the six key transitions with A values perturbed by 30%. The value of 30% chosen for the uncertainty on the A-value is deliberately large for demonstration purposes, although it is in line with the uncertainties on the 3C and 3D line ratios reported in Bernitt et al. [32], and NIST Atomic Spectral Database uncertainties for these lines are class C (±18%) and D (±50%). fundamental atomic data in the AtomDB database by a set amount and re-run emissivity calculations 229 to identify which lines are sensitive to the changed parameter. This allows users to investigate the 230 sensitivity of line and line ratio emissivities over a range of temperatures and densities.

251
The new variability module also allows the user to vary the atomic rates of two separate transitions 252 and recalculate diagnostics for that line ratio, such as the Fe XVII 3C/3D ratio shown in Figure 4. Line 253 ratios can be sensitive to either temperature or density, making them useful diagnostics of estimating a 254 plasma's temperature or density. It is more informative to add uncertainties to two lines individually 255 and recalculate line ratios rather than adding a blanket uncertainty to the final line ratio. The new variability module also allows the user to vary the atomic rates of two separate transitions and recalculate diagnostics for that line ratio, such as the Fe XVII 3C/3D ratio shown in Figure 4. Line ratios can be sensitive to either temperature or density, making them useful diagnostics of estimating a plasma's temperature or density. It is more informative to add uncertainties to two lines individually and recalculate line ratios rather than adding a blanket uncertainty to the final line ratio.  variableapec can also apply uncertainties for more complicated line ratios, such as the G and R ratios in the n = 2 → 1 transition lines of He-like ions [33]. These are the ratios of the forbidden plus intercombination lines to the resonance lines (G) and the forbidden to intercombination (R) lines, which are temperature and density dependent respectively. It adds the same uncertainty to the forbidden, intercombination, and resonance lines and then calculates the error propagation on the individual rate changes to produce an error on the final line ratio. Figure 5 and 6 show the effect of perturbing these key emission lines for the O VII R ratio and Fe XXV G ratio respectively. We applied a constant error over a temperature or density range, however the errors on the underlying data can be significantly temperature dependent, e.g., effective collision strengths. Future steps include recalculating line emissivities and line ratios with representative temperature-dependent errors.
The tool allows the user to recalculate line diagnostics at multiple temperatures or densities. It is informative to estimate the fractional error on the resulting line ratio after varying the A value or direct excitation rate of a density-or temperature-sensitive line. The O VII R ratio at three different plasma temperatures in Figure 5 shows that the magnitude of fractional error varies for different plasmas. Fractional Error Figure 6. Fe XXV G ratio with a ±15% uncertainty on the direct excitation rates of the forbidden, resonance, and intercombination lines at low density. The final error on the line ratio is ∼15% for the entire temperature range plotted.
When a pair of lines is blended, it is more useful to add the uncertainty onto the three lines individually and recalculate the blended line ratio. Figure 7 shows the blended line ratio of (209.62 + 209.92) Å/213.77 Å for Fe XIII, a density-sensitive line ratio often used in solar physics [34].

Summary
The new PyAtomDB package had been released as open source code. It is now possible for the astronomy community to obtain all of the data, thermal plasma codes, and spectral models based on the project, and to develop extensions based on it.
We have developed modules for charge exchange and non-Maxwellian electrons, which are compatible with PyAtomDB and analysis tools such as PyXspec. The code can easily be extended to include new models the community may contribute.
In addition, we have used the newly available codes to develop a tool for estimating the sensitivity of spectral diagnostics to uncertainties in the underlying atomic data, which is immediately applicable to a range of plasmas for upcoming high resolution missions such as XRISM.

Appendix A. Summary of Filetypes in APED
This section contains all the filetypes in AtomDB as of version 3.0.9. The format notations are from the FITS standard, so J is an integer, E is a float, D is a double precision float, and A is a string. So 20E denotes a 20 float array. The atomic number of an ion is denoted by Z, while z is the ion charge and z1 is the ion charge plus one. Table A1. APED IR (ionization and recombination rate) file definition. Excluded data types (XI, XR, and XD) are used to populate excited levels of ions but are not used when calculating the charge state distribution. The effect of these rates is included in the CI, EA, DR, and RR total rates used for calculating the charge state distribution. Notable header keywords: IONPOT contains the ionization potential in eV.

Name
Format Description    The files are data covering not just a specific ion, but covering a range. For example, there is one file which contains the abundances of all the elements.

Name Format Description
Source 10A description of source H 1D Abundance of H in log(12) notation He 1D Abundance of He in log(12) notation Li 1D Abundance of Li in log(12) notation Be 1D Abundance of Be in log(12) notation B 1D Abundance of B in log(12) notation . . . . . . . . .

Ni 1D
Abundance of Ni in log(12) notation Cu 1D Abundance of Cu in log(12) notation Zn 1D Abundance of Zn in log(12) notation Table A10. APED eigenvector file definition. These can be in separate file for each ion or a separate Header Data Unit (HDU) in the same file for each. Data is stored at 1251 temperatures from 10 4 to 10 9 K, logarithmically spaced. Eigenvector and eigenvalue data is stored for solving non-equilibrium ionization balances, using the method of [53].

Name Format Description
FEQB (Z+1)D Equilibrium ionization fraction for each ion (Z = atomic number of element) EIG (Z)D Eigenvalues for each ion VR Z 2 Right eigenvectors for each ion VL Z 2 Left eigenvectors for each ion Table A11. APED filemap file definition. This is a text file which denotes which files were used in an APEC run to create the emissivity files. It is used to identify the current recommended data in the database. Each line indicates another file, with the contents of each line being four columns, in the format "I2XI2XI2XS". I2 is a two character integer, S is a string of unspecified length, and X is a white space character. Each column represents, from left to right: