Upgraded Thoth: Software for Data Visualization and Statistics

Thoth is a free desktop/laptop software application with a friendly graphical user interface that facilitates routine data-visualization and statistical-calculation tasks for astronomy and astrophysical research (and other fields where numbers are visualized). This software has been upgraded with many significant improvements and new capabilities. The major upgrades consist of: (1) six new graph types, including 3D stacked-bar charts and 3D surface plots made with the Orson Charts library; (2) new saving and loading of graph settings; (3) a new batch-mode or command-line operation; (4) new graph-data annotation functions; (5) new options for data-file importation; and (6) a new built-in FITS-image viewer. Thoth now requires Java 1.8 or higher. Many other minor upgrades and bug fixes have also been made to Thoth. The newly implemented plotting options make it relatively easy to construct and reuse graphs without writing computer code. The illustrative astronomy case study in this paper demonstrates one of the many ways the software can be used. These new software features and refinements help astronomers work more efficiently in elucidating their data.


Introduction
In the first paper on Thoth (http://web.ipac.caltech.edu/staff/laher/thoth, accessed on 6 December 2022) [1], hereafter referred to as Paper I, Thoth was introduced as a standalone desktop/laptop software application with a graphical user interface (GUI) that makes it easy to query/import, display, visualize, and analyze tabular data stored in relational databases and data files. A rich set of table-column statistics can be computed with just a couple of mouse clicks, making it eminently useful for data-quality assurance (QA). Special software features improve its utility in astronomy and astrophysical research, such as the handling of FITS-formatted image and table files [2]. The astronomy-specific capabilities of Thoth motivated and will continue to drive its development, although some of what Thoth does is quite universal. The software can be used as a general tool in any field where numbers are visualized, as well as in business and education. Thoth's major strength is its ability, via user interaction, to rapidly set up and generate various kinds of graphs. It currently supports seventeen different graph types, and no scripts need to be written to make plots. The fully functioning software application can be downloaded free of charge (see http://web.ipac.caltech.edu/staff/laher/thoth, accessed on 6 December 2022), and there is no license governing its use. The online instructions cover installation of the software on the major types of computing platforms (Macs, Windows, Linux, and Unix systems with an available Java Virtual Machine).
Since the publication of Paper I, several noteworthy upgrades have been made. This report apprises users of Thoth's recently implemented capabilities and aims to promote new interest in the software. The software version documented in this report is v. 5.5, the latest version at the time of this writing; Paper I covered v. 3.6. Several intermediate versions with incremental upgrades and bug fixes were released along the way, in order to respond quickly to user feedback and make software improvements available as soon as possible.
Aside from its data-visualization and statistical capabilities, Thoth is useful for viewing tables, especially the wide tables common in astronomical catalogs, which can be larger than a normal computer screen. Seeing all columns of a wide table quickly is important both for understanding the data and for debugging software. Tables from database queries and file imports are displayed in tabbed panes in the lower part of Thoth's main GUI panel. The GUI has a horizontal scroll bar, so long rows need not be wrapped. Moreover, each table is displayed redundantly in two subordinate tabbed panes, titled "Cells" and "Text", which contain, respectively, a cell-based spreadsheet and a text area with tabular-formatted plain text.
Thoth can examine FITS tables (and headers) of both the ASCII and binary types. Thoth has recently been given the novel facility of importing FITS-image data into a Thoth data table. This has multifarious uses, such as examining the image values and making scatter plots of selected image rows or columns. A FITS-image viewer has also been built into Thoth as a complementary analysis tool. Section 2 describes the major upgrades made to Thoth since Paper I was published. An astronomy case study is presented in Section 3 to demonstrate one of the many ways the software can be used. A performance update and previously undocumented limitations of Thoth are discussed briefly in Section 4. The conclusions of this report are given in Section 5.

New Graph Types
Since Paper I, Thoth has been upgraded with six new graph types, so there are now 17 different graph types Thoth can produce. The new graph types are 3D stacked-bar charts and 3D surface plots, as well as 2D stacked-bar, area, ring, and spider-web charts. Examples of these are given below (Paper I has examples of the previously available graphs). Except for the 3D stacked-bar chart and 3D surface plot (discussed in more detail in the next paragraphs), the new chart functionality is provided by the mature, open-source JFreeChart library (https://www.jfree.org/jfreechart, accessed on 6 December 2022).
Implemented for the first time in Thoth is the capability of generating 3D stacked-bar charts, which is made possible by the open-source Orson Charts library (http://www.object-refinery.com/orsoncharts, accessed on 6 December 2022). Figure 1 is the example reproduced from David Gilbert's marketing literature, based on the data compiled by Hall [3]. In particular, Figure 1 was made using version 1.6 of the Orson Charts no-FX package. The data table used to make Figure 1 is presented in Table 1 for users who want to try to reproduce Figure 1 themselves with Thoth; the data table can be saved in a space-delimited, plain-text file and then imported into Thoth via the Import Table button on the main GUI panel.
Figure 2 is an example of the new 3D surface plot made by Thoth, showing a source in a FITS image. The control panel (not shown) has text boxes for the user to manually specify the lower and upper limits of the data range in the color scale if the default data minimum and maximum are too extreme. Two other color-scale schemes are available for the 3D surface plot: cubehelix and custom. The cubehelix scheme was devised by Dave Green to display image intensity with a monotonic increase in the perceived brightness of the colors used [4]. It is popular among some astronomers and is, in general, a useful and aesthetically pleasing color scale. The custom color scale is a gradation from one color at the low end of the data range to another color at the high end, and the user is free to choose both colors.
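To make the color-scale discussion concrete, the sketch below maps a data value into [0, 1] using user-supplied lower and upper limits (clamping values outside them) and then converts the result to RGB with either Green's cubehelix formula [4] or a two-color linear gradation. It is a minimal illustration in Java, not Thoth's code: the class and method names are hypothetical, and linear interpolation in RGB for the custom scale is an assumption.

```java
/**
 * Sketch of two color scales for a surface plot: Green's cubehelix and a
 * two-color linear ("custom") gradation. Names are hypothetical.
 */
final class ColorScales {

    private ColorScales() {}

    /** Maps a value to [0, 1] using user-chosen lower/upper limits, clamping outliers. */
    static double normalize(double value, double lower, double upper) {
        double f = (value - lower) / (upper - lower);
        return Math.max(0.0, Math.min(1.0, f));
    }

    /**
     * Cubehelix color for fraction f in [0, 1]; start, rotations, hue, and gamma
     * are the scheme's four parameters. Returns {r, g, b}, each clipped to [0, 1].
     */
    static double[] cubehelix(double f, double start, double rotations,
                              double hue, double gamma) {
        double fg = Math.pow(f, gamma);                   // gamma-corrected gray level
        double phi = 2.0 * Math.PI * (start / 3.0 + rotations * f);
        double amp = hue * fg * (1.0 - fg) / 2.0;         // departure from the gray line
        double cos = Math.cos(phi);
        double sin = Math.sin(phi);
        double r = fg + amp * (-0.14861 * cos + 1.78277 * sin);
        double g = fg + amp * (-0.29227 * cos - 0.90649 * sin);
        double b = fg + amp * (1.97294 * cos);
        return new double[] {clip(r), clip(g), clip(b)};
    }

    /** Linear RGB gradation from the low-end color to the high-end color (assumption). */
    static double[] custom(double f, double[] low, double[] high) {
        double[] rgb = new double[3];
        for (int i = 0; i < 3; i++) {
            rgb[i] = low[i] + f * (high[i] - low[i]);
        }
        return rgb;
    }

    private static double clip(double x) {
        return Math.max(0.0, Math.min(1.0, x));
    }

    public static void main(String[] args) {
        double f = normalize(850.0, 100.0, 1100.0);
        double[] c = cubehelix(f, 0.5, -1.5, 1.0, 1.0);   // Green's default parameters
        System.out.printf("f=%.2f rgb=(%.3f, %.3f, %.3f)%n", f, c[0], c[1], c[2]);
    }
}
```

With Green's default parameters (start 0.5, rotations −1.5, hue 1, gamma 1), a fraction of 0 maps to black and a fraction of 1 maps to white.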
The 3D surface is computed via bilinear interpolation. In the user preferences (save the preferences after running this tool interactively, in order to persist the settings), the keywords imageSurfaceChart3DGridAxisSamplesX and Y control the number of samples in the interpolation grid along each dimension; the default value is 70. A higher number of interpolation-grid samples makes the rotation response slower. Holding down the Alt (or Option) key on the keyboard while dragging the mouse on the surface plot moves the plot within the display panel. The mouse wheel zooms the viewing distance of the 3D surface plot in and out.
The aforementioned new 2D stacked-bar chart now supported by Thoth is shown in Figure 3, with an example derived from labor-force statistics downloaded from the U.S. Bureau of Labor Statistics website (https://www.bls.gov/web/empsit/cpseea03.pdf, accessed on 6 December 2022). This counterpart of the regular bar chart is useful for visualizing mutually exclusive categories that add up to a more general population (such as "females" plus "males" giving the total number of people).
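Returning to the 3D surface plot, the bilinear resampling controlled by the grid-sample preferences can be sketched as follows: pixel values are interpolated onto an evenly spaced grid of samplesX × samplesY points spanning the image region (70 × 70 by default, per the text). This is an illustrative Java sketch, not Thoth's implementation; the even grid spacing, the edge clamping, and the names are assumptions. The number of surface cells grows as samplesX × samplesY, which is consistent with the slower rotation response at higher sample counts.

```java
/**
 * Sketch of bilinear resampling of image pixels onto a regular grid,
 * as might feed a 3D surface plot. Names are hypothetical.
 */
final class BilinearGrid {

    private BilinearGrid() {}

    /** Bilinearly interpolates z (indexed [row][col], at least 2 x 2) at (x = col, y = row). */
    static double interpolate(double[][] z, double x, double y) {
        int ny = z.length;
        int nx = z[0].length;
        // Clamp the lower-left corner so that (x0 + 1, y0 + 1) stays inside the array.
        int x0 = Math.max(0, Math.min(nx - 2, (int) Math.floor(x)));
        int y0 = Math.max(0, Math.min(ny - 2, (int) Math.floor(y)));
        double tx = x - x0;
        double ty = y - y0;
        double bottom = (1 - tx) * z[y0][x0] + tx * z[y0][x0 + 1];
        double top = (1 - tx) * z[y0 + 1][x0] + tx * z[y0 + 1][x0 + 1];
        return (1 - ty) * bottom + ty * top;
    }

    /** Samples z on a samplesY-by-samplesX grid spanning the whole array (counts >= 2). */
    static double[][] resample(double[][] z, int samplesX, int samplesY) {
        int ny = z.length;
        int nx = z[0].length;
        double[][] grid = new double[samplesY][samplesX];
        for (int j = 0; j < samplesY; j++) {
            double y = j * (ny - 1) / (double) (samplesY - 1);
            for (int i = 0; i < samplesX; i++) {
                double x = i * (nx - 1) / (double) (samplesX - 1);
                grid[j][i] = interpolate(z, x, y);
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        double[][] img = new double[5][5];
        for (int r = 0; r < 5; r++) {
            for (int c = 0; c < 5; c++) {
                img[r][c] = Math.exp(-((r - 2) * (r - 2) + (c - 2) * (c - 2)) / 2.0);
            }
        }
        double[][] grid = resample(img, 70, 70);  // default sample count from the text
        System.out.println(grid.length + " x " + grid[0].length);
    }
}
```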
Area charts are now available in Thoth. An example is shown in Figure 4, which is based on Mouawad et al.'s data [5]. Thoth has controls to adjust the area color and its opacity for each data series separately, in order to help distinguish curve areas in the case of partial overlap. The legend is generated automatically, as for all the chart types, and the curve labels can be customized. An area chart is closely related to a line chart (Figure 13 in Paper I gives a line-chart example). In this example, the categories are number labels. The difference between the two graph types is that area charts have coloring or shading under their curves, and the curve outlines are not drawn, so as to emphasize the area under each curve.
Ring charts are available in Thoth for the first time. Figure 5 gives a ring-chart example. The controls for customizing ring charts are similar to those for pie charts, except that ring thickness is a new adjustment that can be changed with a slider widget, and an additional label can be put in the ring center. The ring-chart slice labels with category, quantity, and percentage in the yellow boxes are optional, as they are for pie charts. The ring-chart legend gives a color key to the displayed-slice categories. The slice colors on ring charts can be easily customized, as for pie charts in Thoth.
Finally, Thoth now supports spider-web charts, and an example is given in Figure 6. Spider-web charts have categories in the azimuthal dimension, unlike polar charts, which have floating-point numbers instead of categories. As with bar charts, the number of categories allowed for spider-web charts is arbitrary.
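Because spider-web charts place categories in the azimuthal dimension, their geometry reduces to giving each of n categories an equal angular step and scaling each value radially against a maximum. The sketch below computes the data-polygon vertices for any n; the starting direction (straight up), the clockwise ordering, and the class name are assumptions chosen for illustration, not Thoth's or JFreeChart's code.

```java
/**
 * Sketch of spider-web (radar) chart geometry: n categories at equal
 * azimuthal steps, each value scaled radially. Names are hypothetical.
 */
final class SpiderWeb {

    private SpiderWeb() {}

    /**
     * Returns {x, y} vertices of the data polygon. Category k sits at angle
     * 90 deg - k * 360/n, i.e. the first spoke points up and the rest follow clockwise.
     */
    static double[][] vertices(double[] values, double maxValue, double radius) {
        int n = values.length;
        double[][] pts = new double[n][2];
        for (int k = 0; k < n; k++) {
            double angle = Math.PI / 2 - 2 * Math.PI * k / n;
            double r = radius * values[k] / maxValue;   // radial position of this value
            pts[k][0] = r * Math.cos(angle);
            pts[k][1] = r * Math.sin(angle);
        }
        return pts;
    }

    public static void main(String[] args) {
        double[][] pts = vertices(new double[] {3, 5, 2, 4, 1}, 5.0, 100.0);
        for (double[] p : pts) {
            System.out.printf("(%.1f, %.1f)%n", p[0], p[1]);
        }
    }
}
```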

Saving/Loading Graph Settings
Thoth has new functionality for saving graph settings to a file for permanent storage and loading them back later. This is especially useful for complex charts that require time-consuming setup. Once a graph is made, it is often useful as a template for graphing similar data sets. In the authors' experience, this new capability is a huge time-saver, as a graph created and saved months earlier can be retrieved in an instant, provided that the graph-settings file has been memorably named (more on this below).
The control panels for each of the seventeen graph types now include prominent Save Settings and Load Settings buttons. If a graph setting is changed, one simply clicks on the Save Settings button to overwrite the old settings with the new ones, or the new settings can be saved to a different file. The graph settings are keyword = value pairs of graph attributes stored in a property file. A default convention has been established to store these files in graph-type subdirectories under the plots subdirectory of the .Thoth hidden directory. This way, the graph-settings files are in a known place and can be found easily later (assuming the filenames are sufficiently differentiated and specific).
In the graph-settings filename, users can employ keywords of their own choosing to customize the dot-separated filename according to a particular project or task name and/or date. Note that, as illustrated in the above example, the software forces users to conform to filenames having the following form: where the asterisk can be one or more dot-separated keywords, and similarly for the other graph types.
It is important to note that the saved graph settings are completely divorced from the graph data: no reference to any particular graph data is made in the graph-settings file. This makes it possible to reuse the graph settings as a template for building similar graphs for other data sets.
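Since the graph settings are keyword = value pairs stored in a property file, the save/load cycle maps naturally onto java.util.Properties. The following is a minimal sketch under stated assumptions: the keyword names (chartTitle, xAxisLabel), the example filename, and placing the .Thoth directory under the user's home directory are hypothetical, and Thoth's actual keywords and required filename form are not reproduced here. As in Thoth, only graph attributes are written, never the graph data.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Sketch of saving/loading graph settings as a property file. Names are hypothetical. */
final class GraphSettings {

    private GraphSettings() {}

    /** Hypothetical default location: ~/.Thoth/plots/<graphType>/<fileName>. */
    static Path defaultPath(String graphType, String fileName) {
        return Paths.get(System.getProperty("user.home"), ".Thoth", "plots", graphType, fileName);
    }

    /** Writes the settings, overwriting any existing file (the "Save Settings" case). */
    static void save(Path file, Map<String, String> settings) throws IOException {
        Properties props = new Properties();
        props.putAll(settings);
        Files.createDirectories(file.toAbsolutePath().getParent());
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            props.store(out, "graph settings (no graph data stored)");
        }
    }

    /** Reads the keyword = value pairs back for reuse as a template. */
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("thoth").resolve("example.properties");
        Map<String, String> settings = new HashMap<String, String>();
        settings.put("chartTitle", "M92 color-magnitude diagram");  // hypothetical keyword
        save(file, settings);
        System.out.println(load(file).getProperty("chartTitle"));
    }
}
```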

Batch Mode
A batch or command-line mode has been implemented in Thoth for generating graphs noninteractively, such as in a data-processing pipeline. In batch mode, Thoth must be executed separately for each graph to be made, and each execution produces a PNG image of the graph. Any of the seventeen available graph types can currently be chosen in batch mode. More graph types will be added in the future. The available Thoth command-line options can be listed by executing the command Thoth -h in a terminal window. The software's built-in documentation has examples of batch-mode commands for all graph types.

Graph Annotations
It is common to annotate graph data with labels, lines, and colored or shaded areas to draw attention to salient features of the data. Annotations furnish a graph with an extra layer of sophistication and customization, and transform ordinary graphs into extraordinary ones. Thoth documentation and GUI components refer to these annotations as graph labels, marklines, and markareas. A markline is defined as a horizontal or vertical line that cuts across the entire data domain or range displayed. Markareas are more general than marklines, and are defined as line segments and two-dimensional shapes whose size can be arbitrarily specified to be smaller than the data domain or range displayed.
Thoth has been upgraded to allow arbitrary placement of graph labels, marklines, and markareas on many of its available chart types (notable exceptions at this time are the new 3D stacked-bar chart and 3D surface plot). An example of a Hess diagram made by Thoth is given in Figure 7, which has red labels (various kiloparsec distances), black-dashed marklines, red-line markareas, red-filled-square markareas, and blue-unfilled-circle markareas. This example is a variant of Figure 2 of Soraisam et al. [6].
Among the other examples in this paper, Figure 3 has graph labels and marklines, and Figure 4 has graph labels. Currently, the menu of choices for markarea shapes includes line segment, rectangle, ellipse, triangle, and arrow. A markarea specified as a line segment can have an arbitrary length and slope: its end-point coordinates specify opposite corners of a rectangle whose diagonal is the line segment.
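The line-segment markarea described above is defined by two corner points, so its length, slope, and enclosing rectangle follow directly from them, whereas a markline spans the whole displayed domain or range. The sketch below illustrates that geometry in data coordinates; the class and method names are hypothetical and this is not Thoth's code.

```java
/**
 * Sketch of markarea/markline geometry. A line-segment markarea is given by two
 * end points that are also opposite corners of a rectangle; the segment is that
 * rectangle's diagonal. Names are hypothetical.
 */
final class MarkGeometry {

    final double x1, y1, x2, y2;   // end points in data coordinates

    MarkGeometry(double x1, double y1, double x2, double y2) {
        this.x1 = x1;
        this.y1 = y1;
        this.x2 = x2;
        this.y2 = y2;
    }

    /** Segment length, i.e. the rectangle's diagonal. */
    double length() {
        return Math.hypot(x2 - x1, y2 - y1);
    }

    /** Slope of the segment; +/-Infinity if vertical, NaN if both points coincide. */
    double slope() {
        return (y2 - y1) / (x2 - x1);
    }

    /** Rectangle spanned by the two corners, as {xMin, yMin, width, height}. */
    double[] rectangle() {
        return new double[] {Math.min(x1, x2), Math.min(y1, y2),
                             Math.abs(x2 - x1), Math.abs(y2 - y1)};
    }

    /** A horizontal markline at y cuts across the entire displayed domain [xMin, xMax]. */
    static MarkGeometry horizontalMarkline(double y, double xMin, double xMax) {
        return new MarkGeometry(xMin, y, xMax, y);
    }

    public static void main(String[] args) {
        MarkGeometry seg = new MarkGeometry(1, 1, 4, 5);
        System.out.println("length=" + seg.length() + " slope=" + seg.slope());
    }
}
```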

Improved Table Importation, etc.
There have been upgrades to Thoth's capability of importing tables from data files in various formats. The FITS-table importer has been modified so that it can read in tables containing columns with the byte data type and can also import LDAC (Leiden Data Analysis Center) FITS binary tables. The file-importer code was further refined to allow blanks in the data columns of plain-text, non-space-delimited data-table files and of IPAC-table files. New functions have been added to import and export PostgreSQL query files, which use a plain-text, pipe-delimited data format similar to the output of a database query executed via the psql command. A new option is the ability to import image data from a FITS file for display as a Thoth table and, conversely, any Thoth table can now be exported as a FITS-image file. Additionally, code was added to trim leading and trailing spaces from table rows when importing delimited, plain-text tables. Finally, code was included to skip the non-standard marker lines in the output photometry table generated by Aperture Photometry Tool or APT (https://www.aperturephotometry.org, accessed on 6 December 2022) [7], which is otherwise a space-delimited, plain-text table, in order to make Thoth compatible with APT for astronomy-education programs. These upgrades make importing (exporting) tabular data from (to) files more flexible and robust. Table 2 gives the file formats that Thoth handles for importing data tables from files; nearly all of these formats are also available for exporting tables to files (the exception is that there is no option for exporting to an LDAC-formatted file). The new capability of importing a FITS image as a Thoth data table is accessed under the menu revealed by the Import Table button on the main GUI panel. This functionality is handy for making image-slice plots through rows (if the image is optionally transposed) or columns of an image.
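Returning to the PostgreSQL query-file format mentioned above, psql's default aligned output consists of a header row, a dashed separator line, pipe-delimited data rows, and a row-count footer. The sketch below parses that layout and trims leading and trailing spaces from each cell, keeping blank cells as empty strings. It assumes Thoth's files follow psql's aligned layout closely (the text only says the format is similar), and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of reading psql-style, pipe-delimited query output. The dashed
 * separator line and the "(N rows)" footer are skipped; cells are trimmed.
 */
final class PsqlTableReader {

    private PsqlTableReader() {}

    /** Returns the header as element 0, followed by the data rows. */
    static List<String[]> parse(List<String> lines) {
        List<String[]> table = new ArrayList<String[]>();
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty()) continue;                         // blank line
            if (t.matches("-+(\\+-+)*")) continue;             // separator, e.g. "----+-----"
            if (t.matches("\\(\\d+ rows?\\)")) continue;       // footer, e.g. "(2 rows)"
            String[] cells = line.split("\\|", -1);            // -1 keeps trailing empty cells
            for (int i = 0; i < cells.length; i++) {
                cells[i] = cells[i].trim();                    // drop padding around values
            }
            table.add(cells);
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(" id | name", "----+------", "  1 | alpha", "(1 row)");
        for (String[] row : parse(lines)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```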
The checkbox option of transposing the image is available for taking full advantage of Thoth's column-wise plotting capabilities. Another option is to include a row-number column, named "rownum", as the first column of the table of image data, which facilitates the generation of image-slice scatter plots, etc. The text representation of the table includes a listing of the entire FITS header (for all header-data units therein). This capability was tested on images with BITPIX = 8, 16, −32, and −64. A 3 K × 3 K-pixel image with BITPIX = −32 took a couple of minutes to be read in and displayed as a Thoth table. A 9 K × 9 K-pixel image with BITPIX = −64 was also tested, and it took 12 min for the same. Images this large are quite impractical to work with in Thoth because of the long load time, but a scatter plot, for example, can eventually be made from one if the user is exceedingly patient (and if the software does not exceed the memory limit). Keep in mind that only the first 2000 columns will be loaded into the corresponding SQLite table newly created in the scratch database (refer to Paper I, Subsection 2.1, for more information about Thoth's scratch database). The user is responsible for ensuring that adequate machine memory is available and that the allocated Java heap size is sufficient (more memory can be allocated by changing -Xmx8192M to a larger value in Thoth.csh, assuming Thoth will be launched from a terminal window via the execution of this script).
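The transpose and "rownum" options, together with the 2000-column limit, can be sketched as a small array transformation. In this illustrative sketch (not Thoth's code), image pixels are indexed [row][column], rownum values start at 1, and the column cap applies only to pixel columns; all three choices, and the names, are assumptions. The helper pixelBytes gives a lower bound on the heap needed just to hold the pixels as doubles, which can help when choosing an -Xmx value.

```java
/**
 * Sketch of converting FITS image pixels (indexed [row][col]) into table rows,
 * with optional transposition, an optional leading "rownum" column, and a cap
 * on the number of pixel columns kept. Names are hypothetical.
 */
final class ImageTable {

    private ImageTable() {}

    static double[][] toTable(double[][] image, boolean transpose,
                              boolean includeRownum, int maxColumns) {
        double[][] src = transpose ? transpose(image) : image;
        int nRows = src.length;
        int nCols = Math.min(src[0].length, maxColumns);   // drop columns beyond the cap
        int offset = includeRownum ? 1 : 0;
        double[][] table = new double[nRows][nCols + offset];
        for (int r = 0; r < nRows; r++) {
            if (includeRownum) {
                table[r][0] = r + 1;                        // 1-based row number (assumption)
            }
            System.arraycopy(src[r], 0, table[r], offset, nCols);
        }
        return table;
    }

    /** Swaps rows and columns so that image rows become table columns. */
    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int r = 0; r < a.length; r++) {
            for (int c = 0; c < a[0].length; c++) {
                t[c][r] = a[r][c];
            }
        }
        return t;
    }

    /** Lower bound on heap needed for the pixel values alone, stored as doubles. */
    static long pixelBytes(long nRows, long nCols) {
        return nRows * nCols * Double.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(pixelBytes(9000, 9000) + " bytes for 9000 x 9000 doubles");
    }
}
```

For example, a 9000 × 9000 image held as doubles needs 9000 × 9000 × 8 = 648,000,000 bytes (about 648 MB) before any table or text-rendering overhead.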
Another new Thoth option is the creation of a custom data table with Excel formulas. This capability is useful for quickly generating tables with an arbitrary number of rows computed from simple equations, although Excel permits substantial complexity in its formulas. The underlying Excel engine is provided by Apache POI, version 3.17, which is the de facto standard Java application programming interface for Microsoft documents (https://poi.apache.org, accessed on 6 December 2022). Thus, the Excel formulas that can be programmed into this tool are limited to those handled by Apache POI. Thoth's Import Table menu has been augmented with the Create Custom Table option for this purpose.

FITS-Image Viewer
A FITS-image viewer has been added to Thoth. It was constructed from the Aperture Photometry Tool code base [7], pruned of its aperture-photometry capability, and reworked and distilled into its present form. Its main purpose is to provide basic image-viewing capability and equip the user for exploring various image regions and reading off image-data values at pixel coordinates of interest. When a FITS image is imported as a table into Thoth, users can easily see how positions on the image correspond to numbers in the Thoth table.
The viewer panel can be brought up from the Image menu on the menu bar or when importing a FITS-image file into a Thoth table. The software handles FITS-image files containing one or more image extensions, whether tile-compressed or uncompressed. When a multi-extension FITS-image file is loaded, the user is asked to choose which image in the file is to be displayed (the selected and total numbers of header-data units in the FITS file and the pixel dimensions of the selected image are displayed as [m : n; j × i] after the FITS filename at the bottom of the viewer panel). The image data are statistically analyzed at load time in order to display the image optimally. On the test machine described in Paper I, it takes about a minute to load a 6 K × 6 K-pixel Pan-STARRS1 image into the viewer.
Thoth's user-preferences menu currently has a tabbed pane for the FITS-image viewer, with options for specifying whether scientific notation will be used for displaying numbers and whether celestial coordinates will be shown in sexagesimal or decimal representation.
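The sexagesimal option amounts to converting decimal degrees into hours (right ascension, dividing by 15) or degrees (declination), plus minutes and seconds. The sketch below is a generic illustration, not Thoth's formatter; the method names and the numbers of decimal places are assumptions. Rounding is done once, in units of the last displayed digit, so that, for example, 59.996 seconds carries into the next minute instead of printing as "60.00".

```java
import java.util.Locale;

/** Sketch of decimal-to-sexagesimal formatting of celestial coordinates. Names are hypothetical. */
final class Sexagesimal {

    private Sexagesimal() {}

    /** Right ascension in decimal degrees to "hh:mm:ss.ss" (hours = degrees / 15). */
    static String formatRa(double raDeg) {
        double deg = ((raDeg % 360.0) + 360.0) % 360.0;   // normalize to [0, 360)
        return format(deg / 15.0, 2, false, 24);
    }

    /** Declination in decimal degrees to "+dd:mm:ss.s". */
    static String formatDec(double decDeg) {
        return format(decDeg, 1, true, 0);
    }

    /**
     * Splits |value| into whole units, minutes, and seconds, rounding once in units
     * of the last displayed seconds digit. If wrap > 0, whole units are taken modulo wrap.
     */
    private static String format(double value, int secDecimals, boolean signed, long wrap) {
        String sign = value < 0 ? "-" : (signed ? "+" : "");
        long scale = (long) Math.pow(10, secDecimals);
        long ticks = Math.round(Math.abs(value) * 3600.0 * scale);
        long perMinute = 60 * scale;
        long perUnit = 3600 * scale;
        long whole = ticks / perUnit;
        if (wrap > 0) {
            whole %= wrap;                                  // e.g. 24h wraps to 0h
        }
        long minutes = (ticks % perUnit) / perMinute;
        double seconds = (ticks % perMinute) / (double) scale;
        int width = secDecimals > 0 ? 3 + secDecimals : 2;  // "ss" plus "." and decimals
        String secPattern = "%0" + width + "." + secDecimals + "f";
        return String.format(Locale.ROOT, "%s%02d:%02d:" + secPattern, sign, whole, minutes, seconds);
    }

    public static void main(String[] args) {
        System.out.println(formatRa(187.5) + "  " + formatDec(-30.5));
    }
}
```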
New external dependencies required by the viewer include the following packages: NASA GSFC nom.tam.fits, version 1.15.1; and Apache Commons Compress, version 1.18.

Binned Statistics
Thoth's Statistics panel has been augmented with a tool to compute a table of binned statistics. The data column to be binned, which generally will be different from the data column for which statistics are to be computed, can be selected via a pull-down menu. The other input parameters are the bin-start value, bin-end value, bin width, and lower/upper limits for outlier rejection. Clicking on the Generate Binned-Statistics Table button causes a binned-statistics table to be computed and then displayed on the main GUI panel in a new tabbed pane. A box-and-whisker chart can be created for binned statistical quantities versus bin-center values (as long as the number of bins is limited to around 10, so that the bins can be labeled properly on the domain axis).
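The binned-statistics computation can be sketched as follows: rows are assigned to half-open bins of the binned column, values of the statistics column outside the rejection limits are discarded, and each bin reports its center, count, mean, and median. This is a minimal sketch, not Thoth's code; applying the outlier limits to the statistics column (rather than to the binned column), the half-open bin convention, the particular statistics reported, and the names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of a binned-statistics table. Names are hypothetical. */
final class BinnedStats {

    private BinnedStats() {}

    /** One row of the output table; mean and median are NaN for an empty bin. */
    static final class Bin {
        final double center;
        final int count;
        final double mean;
        final double median;

        Bin(double center, int count, double mean, double median) {
            this.center = center;
            this.count = count;
            this.mean = mean;
            this.median = median;
        }
    }

    /** x: column to be binned; y: column whose statistics are computed. Bins are half-open. */
    static List<Bin> compute(double[] x, double[] y, double binStart, double binEnd,
                             double binWidth, double rejectLow, double rejectHigh) {
        int nBins = (int) Math.ceil((binEnd - binStart) / binWidth);
        List<List<Double>> members = new ArrayList<List<Double>>();
        for (int b = 0; b < nBins; b++) {
            members.add(new ArrayList<Double>());
        }
        for (int i = 0; i < x.length; i++) {
            if (x[i] < binStart || x[i] >= binEnd) continue;       // outside the binned range
            if (y[i] < rejectLow || y[i] > rejectHigh) continue;   // outlier rejection on y
            int b = Math.min(nBins - 1, (int) ((x[i] - binStart) / binWidth));
            members.get(b).add(y[i]);
        }
        List<Bin> table = new ArrayList<Bin>();
        for (int b = 0; b < nBins; b++) {
            double[] v = new double[members.get(b).size()];
            for (int k = 0; k < v.length; k++) {
                v[k] = members.get(b).get(k);
            }
            Arrays.sort(v);
            int n = v.length;
            double sum = 0.0;
            for (double value : v) {
                sum += value;
            }
            double mean = n == 0 ? Double.NaN : sum / n;
            double median = n == 0 ? Double.NaN
                    : (n % 2 == 1 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]));
            table.add(new Bin(binStart + (b + 0.5) * binWidth, n, mean, median));
        }
        return table;
    }

    public static void main(String[] args) {
        double[] x = {0.5, 1.5, 1.7, 2.2};
        double[] y = {10, 20, 40, 1000};
        for (Bin bin : compute(x, y, 0, 3, 1, 0, 100)) {
            System.out.println(bin.center + " " + bin.count + " " + bin.mean + " " + bin.median);
        }
    }
}
```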

Astronomy Case Study
This section provides a quick, practical example of how Thoth can be incorporated into a foundational lesson in astronomy. In particular, the construction of a Hertzsprung-Russell (HR) diagram for the globular cluster Messier 92 (M92) is demonstrated (or, more specifically, a raw g − r color-magnitude diagram containing mostly cluster stars and relatively small numbers of contaminating foreground stars and background objects). An r-band cut-out image of M92, made from a FITS file (the FITS file shown is a stack of 22 40-second exposures: rings.v3.skycell.2206.075.stk.r.unconv.fits) downloaded from the Pan-STARRS1 data archive (https://outerspace.stsci.edu/display/PANSTARRS, accessed on 6 December 2022), is given in Figure 8. The diagram also uses catalog data from the Pan-STARRS1 data archive [8]. The resulting observational HR diagram clearly shows the main features of stellar evolution: a main sequence (where most of the stars lie), the turn-off point to the giant branch (upper right), a less-prominent horizontal branch (HB) with an instability-strip gap at the top of the diagram, and a "blue hook" of extremely hot stars at the left end of the HB.

Performance and Limitations
Text-table objects are now created in the background. Moreover, to speed up the software, a text-table object is no longer created automatically for a table with 1 M rows or more. However, if the user clicks on the Text tabbed pane next to the Cells tabbed pane on the main GUI panel, then the object will be created as a background task, and the message "Creating text table now; please wait. . . " will be displayed until it is replaced with the actual text table. Both the conditional generation of the text table, depending on the number of table rows, and the recoding of this action as a background task have made Thoth potentially more responsive. Because of this, along with other small improvements, the wall-clock time required to complete the performance test on the 10 M-row table described in Paper I has been reduced from ≈10 min to ≈6 min when benchmarked on the same laptop computer under similar conditions (with an upgraded operating system).
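The deferred, background construction of the text table can be sketched with java.util.concurrent.CompletableFuture: below the 1 M-row threshold the text is built eagerly on a background thread, above it only on demand, and a placeholder message is returned until the result is ready. This is an illustrative sketch, not Thoth's code; the class name is hypothetical, and a Swing GUI would more typically use SwingWorker so that the pane is updated on the event-dispatch thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Sketch of lazily building a table's text rendering in the background. Names are hypothetical. */
final class LazyTextTable {

    static final long AUTO_BUILD_ROW_LIMIT = 1_000_000L;   // 1 M rows, as in the text
    static final String PLACEHOLDER = "Creating text table now; please wait...";

    private final Supplier<String> builder;
    private CompletableFuture<String> pending;              // guarded by "this"

    LazyTextTable(long rowCount, Supplier<String> builder) {
        this.builder = builder;
        if (rowCount < AUTO_BUILD_ROW_LIMIT) {
            startBuild();   // small table: build right away, still off the calling thread
        }
    }

    /** Starts the background build once; later calls reuse the same future. */
    private synchronized void startBuild() {
        if (pending == null) {
            pending = CompletableFuture.supplyAsync(builder);
        }
    }

    /** Called when the user selects the Text pane; never blocks. */
    synchronized String textOrPlaceholder() {
        startBuild();
        return pending.isDone() ? pending.join() : PLACEHOLDER;
    }

    /** Blocks until the text is ready (e.g. in tests or noninteractive use). */
    String awaitText() {
        startBuild();
        return pending.join();
    }

    public static void main(String[] args) {
        LazyTextTable big = new LazyTextTable(10_000_000L, () -> "col1 col2\n1 2\n");
        System.out.println(big.textOrPlaceholder());   // usually the placeholder at first
        System.out.println(big.awaitText());
    }
}
```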
Loading a large FITS image into a Thoth table can lead to memory or responsiveness issues if an insufficient amount of memory is allocated. Before the aforementioned 6 K × 6 K-pixel Pan-STARRS1 image was loaded, Thoth was launched using the -Xmx16G Java option, in order to have the software successfully create a text-table object without running out of Java heap space.

Conclusions
Thoth has undergone substantial upgrading since Paper I was published. The newly implemented major functionality includes graph types not previously supported. The other areas of major functionality implemented in Thoth since Paper I are: the saving/loading of graph settings, which is especially useful for complex charts that require time-consuming setup; a batch mode for convenient noninteractive graph generation in automated processing; the arbitrary placement of graph labels, marklines, and markareas in the plot area to highlight interesting portions of the data; more versatile and accommodating importation of data tables from files; and a built-in FITS-image viewer. Additionally, a number of minor software improvements have been made, which have, overall, buttressed the functional cohesiveness of the software. Many of these refinements improve the software's appearance and usability and generally make it less complicated to augment the software with new functionality.
A potential further Thoth enhancement would be to import/export tables from/to Excel spreadsheet files. Another would be to implement the capability of 2D stacked-area charts. In addition, the other 3D graph types in the Orson Charts library will be implemented in Thoth. There are many other things that could be done to improve Thoth further. Community involvement is essential for Thoth's continued development. Eager users are encouraged to engage the authors in discussions about pressing upgrades that may be required.