Enhancing Automatic Prediction of Spirality using SpArcFiRe's Spiral Arm Analysis and Random Forests

Automated machine classifications of galaxies are necessary because the size of upcoming surveys will overwhelm human volunteers. We improve upon existing machine classification methods by adding the output of SpArcFiRe to the inputs of a machine learning model. We use the human classifications from Galaxy Zoo 1 (GZ1) to train a random forest of decision trees to reproduce the human vote distributions of the Spiral class. We prefer the random forest model over other black box models like neural networks because it allows us to trace post hoc the precise reasoning behind the classification of each galaxy. We find that, across a sample of 470,000 Sloan galaxies that are large enough that details could be seen if they were there, the combination of SpArcFiRe outputs with existing SDSS features provides a better machine classification than either one alone on comparison to Galaxy Zoo 1. We suggest that adding SpArcFiRe outputs as features to any machine learning algorithm will likely improve its performance.


Motivation
The Hubble Ultra Deep Field (HUDF) represents about 1/13,000,000 of the celestial sphere and contains about 10,000 galaxies, at least 100 of which have visible structure (by our own estimate), suggesting that the entire sky contains upwards of 10^9 galaxies with visible structure at the resolution and depth of the HUDF. Classifying and quantitatively understanding this number of galaxies will require automated methods.
SpArcFiRe [1][2][3] is an algorithm designed to automatically extract structural information from images of spiral galaxies. It was tested on a sample of 29,250 spiral galaxies from the Sloan Digital Sky Survey (SDSS), as selected by one of the PIs of the Galaxy Zoo project. The selection criteria were: (GZ1 P_S + GZ1 P_Z) > 0.8 OR (GZ2 FeaturesOrDisk > 0.7 AND GZ2 NotEdgeOn > 0.7 AND GZ2 spiral > 0.8), where P_S is the fraction of human votes for S-wise (counterclockwise) spiral, P_Z is the fraction of human votes for Z-wise (clockwise) spiral, and spiral is the sum of P_S and P_Z, for each object, in either Galaxy Zoo 1 (GZ1) or Galaxy Zoo 2 (GZ2). This sample used the same magnitude limit as GZ1 (17.7 in the red band).
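The selection criteria quoted above can be written as a single boolean test. The following is a minimal sketch; the function and argument names are hypothetical stand-ins for the corresponding GZ1 and GZ2 catalog columns, not names taken from either catalog.

```python
def in_spiral_sample(gz1_ps, gz1_pz, gz2_features_or_disk=0.0,
                     gz2_not_edge_on=0.0, gz2_spiral=0.0):
    """Return True if a galaxy passes the sample selection quoted in the text.

    All arguments are vote fractions in [0, 1]. The argument names are
    hypothetical; they are not the actual catalog column names.
    """
    # GZ1 branch: combined S-wise plus Z-wise spiral vote fraction above 0.8.
    gz1_cut = (gz1_ps + gz1_pz) > 0.8
    # GZ2 branch: all three vote fractions must pass their thresholds.
    gz2_cut = (gz2_features_or_disk > 0.7
               and gz2_not_edge_on > 0.7
               and gz2_spiral > 0.8)
    return gz1_cut or gz2_cut
```

A galaxy with no GZ2 entry can be tested with the default arguments, in which case only the GZ1 branch can select it.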
Even though some galaxy images (e.g., elliptical galaxies or low-resolution spirals) do not have visible arms, we do not know in advance which images exhibit arms. For this reason, we run SpArcFiRe on every galaxy image, and our goal is to figure out when SpArcFiRe's output is meaningful, preferably using the output of SpArcFiRe itself. SpArcFiRe's job is to find spiral arms in spiral galaxies; often it also marks noise as spiral structure. Thus, we wish to recognize when a galaxy image has visible spiral structure. Although ultimately we hope to develop an objective, quantitative, continuous measure of galaxy morphology, for now we focus on the simple task of reproducing what we call the spirality of a galaxy image: from the GZ1 catalog [4,5], we define the spirality to be P_SP = (GZ1 P_S + GZ1 P_Z), representing the probability that there is any spiral structure visible for each object. We emphasize that spirality is a measure of the image, not the object. We are not trying to classify galaxies; we are trying to discern whether a particular image exhibits spiral structure that is unlikely to be caused by noise. For example, although elliptical galaxies should be assigned a spirality of zero, an edge-on disk should also be assigned a spirality of zero, because spiral structure is not visible; thus, we wish to detect in both cases that SpArcFiRe's output should not be interpreted as representing spiral structure.
Since humans introduce certain types of biases into the classification scheme (for example the chirality bias [5][6][7]), we also wish to "dilute" such biases even though we train our method on human classifications. We do this by carefully choosing which inputs we allow our code to use. For example, we allow SpArcFiRe's measured pitch angle of spiral arms to be used as input to our machine learning classifier, but not the sign of the pitch angle [7], thus reducing chirality discrepancies to about 2 parts in 10,000 [3,7]. Our work follows up on existing work published in the astronomical literature [8][9][10][11][12][13].
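As a small illustration of the two ideas above, the sketch below computes the spirality target from the two GZ1 vote fractions and strips the sign from a pitch angle before it is used as an input. The function names are hypothetical, and taking the absolute value is only one simple way to discard the sign; the text does not state how SpArcFiRe's pipeline does it.

```python
def spirality(gz1_ps, gz1_pz):
    """Target variable P_SP: combined S-wise and Z-wise GZ1 vote fractions."""
    return gz1_ps + gz1_pz


def chirality_blind_pitch(pitch_angle_deg):
    """Discard the sign (winding direction) of a pitch angle.

    Assumption: the sign encodes chirality, so the magnitude alone is
    passed on as an input feature.
    """
    return abs(pitch_angle_deg)
```

With this choice, two otherwise identical galaxies that wind in opposite directions present the same pitch-angle feature to the model.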

Related Work
We compare against the most impactful and successful classifiers published in the astronomical literature: Banerji et al. [10] and Dieleman et al. [12]. The former was one of the first to apply Machine Learning to reproduce the human classifications of the GZ1 catalog [4], and the latter focuses specifically on reproducing the vote distribution of the GZ2 catalog [14], a regression problem, exactly like the approach we explore in this paper. The main difference is that they are not concerned with the bias present in the dataset, so the smaller their Root Mean Squared Error (RMSE) is, "the better" their results are. In the recent Galaxy Zoo dataset releases, there has been an increased effort to eliminate human biases, but Hayes et al. [7] have shown that these datasets, in particular GZ1, still contain biases, so there is a trade-off between lowering the RMSE of a model and avoiding the introduction of such biases into the prediction.
Banerji et al. [10] present good results using neural networks. They classified Sloan galaxies into one of three categories: spiral, elliptical, and point sources/artifacts, using a neural network with inputs listed in Table 1. They found that on the entire sample of about 900,000 Sloan galaxies, they could reproduce the human GZ1 classifications in 92% of cases. Across a sample of brighter galaxies (r < 17), they correctly classify about 94% of galaxies. They do even better for a sample called the "Gold sample", in which galaxies are only included if the humans are themselves more than 80% confident in the classification. We do not believe the Gold sample comparison is meaningful, however, because it is crucial to know how good the machine learning classifier is when it thinks it is confident but is, in fact, mistaken, and the Gold sample completely disregards this aspect (footnote 3).
Kaggle.com, a website devoted to machine learning competitions, offered £10,000 (GBP) for the algorithm that best minimized the RMSE between the automatic classification scheme and the human vote distribution for Galaxy Zoo 2. The winning entry was a Deep Learning algorithm using convolutional Neural Networks [12]. It had an RMSE of about 0.07 relative to the human GZ2 vote distribution. Although this result is closer to the human votes than our result presented below, we are concerned about the professional use of deep learning techniques for several reasons:
• We do not understand exactly what they are doing or how they are doing it, and research to better understand this aspect is still in its infancy [15,16]. Although we have some control over neural networks, we cannot learn from what they have learned, or learn from how they make their decisions, because a neural net is a near-complete "black box".
• We would prefer an objective, quantitative system, with parameters that are understood and can be modified by professional astronomers, and decision trees seem better suited to this task.
• Decision trees are often used to measure the quality of features (footnote 4) used to make a decision and thus are more suitable for our goals in this paper. This is not the case for Deep Neural Networks, which do not yet easily provide a similar measure for the features they use.
For these reasons, we prefer a method that can be understood and dissected, and whose individual decisions can also be understood and dissected, if necessary. Understanding these decisions can teach us about galaxy characteristics and morphology in ways that "black box" machine learning classifiers cannot. Figure 1 provides an example, giving a quantitative flavor of how color and magnitude provide information, both separately and together, about separating spirals and ellipticals, in ways a Neural Network cannot.
Other interesting works published recently include Abd Elfattah et al. [18], which uses Neural Networks and Empirical Mode Decomposition to perform galaxy classification but uses a very small test set of 108 objects, so it is hard to predict how their models would fare when trying to classify a much larger set of objects, like our test set. Kuminski et al. [11] make a case for using "high-quality data", but we believe this will have the same issues as the "Gold sample" of Banerji et al. [10]. Applebaum and Zhang [19] use an ensemble of Support Vector Machines to classify GZ2 galaxies, achieving good results. Ferrari et al. [13] use Linear Discriminant Analysis to classify galaxies from a couple of different surveys.
Footnote 3: In essence, comparing against the Gold sample says "look how well we do when the humans pre-select the easy ones for us!" More formally, it disregards false positives: galaxies for which the prediction is confident but is actually way off.
Footnote 4: It is important to clarify that throughout this paper we use the term feature(s) to describe an individual measurable property or characteristic of a phenomenon being observed [17], as is commonly done in the machine learning literature, as opposed to features as seen in an image, like globular clusters.
All of the aforementioned works present good accuracies (≥ 90%) but, except for Dieleman et al. [12], they tackle this problem from a different point of view: they all perform classification rather than regression. There have been tremendous advances in Machine Learning towards improving classifiers, and most of these papers make use of those techniques, but that is not the goal of our work. Whereas in classification one is concerned with finding a line that best separates two or more classes (in our case, spiral and non-spiral galaxies), in regression we seek to learn about the underlying distribution, in this case, how to put a probability on a galaxy being spiral (footnote 5), and at present the GZ votes are the best way to do that. Usually more information is gleaned from a continuous distribution than from a discrete classification. In particular, a user of the output can choose a confidence threshold for classification that is more suitable for a certain task rather than relying on the table creator's subjective determination of where that threshold should lie. Peng et al. [20], for example, used regression for a task where they needed to analyze how spirality prediction degraded as a function of redshift, a task for which classification gives limited information.
For the sake of comparison we can turn our regressor into a classifier by choosing a boundary for the decision. If we choose that boundary to be 0.5, we will make our decision based on the majority vote, which mimics the choice of the Galaxy Zoo researchers in some releases [4]. That would give our regressor an accuracy of approximately 93% based on the test set presented in Table 5 (footnote 6).
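Turning a regressor into a classifier with a decision boundary can be sketched as follows. This is an illustrative helper with hypothetical names, not the code used to obtain the accuracy quoted above; how values exactly equal to the boundary are treated is an assumption.

```python
def threshold_accuracy(human_psp, predicted_psp, boundary=0.5):
    """Fraction of galaxies on the same side of the boundary in both lists.

    human_psp and predicted_psp are equal-length sequences of spirality
    values in [0, 1]. Assumption: a value strictly above the boundary
    counts as "spiral".
    """
    if len(human_psp) != len(predicted_psp) or not human_psp:
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum((h > boundary) == (p > boundary)
                for h, p in zip(human_psp, predicted_psp))
    return agree / len(human_psp)
```

Because the boundary is an argument, a user of the regression output can pick the threshold that suits their task rather than a fixed majority vote.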

Methods
We are mostly concerned with correctly predicting spirality (the probability of an image of a galaxy having visible spiral structure) for images of galaxies that have a reasonably high resolution and in which spiral structure is visible. In particular, since SpArcFiRe is designed to discern spiral structure in disk galaxies, we are most interested in isolating disk galaxies in which spiral structure is visible. By a judicious eyeball study of images at the low end of resolution, we have subjectively determined that spiral structure is invisible in Sloan galaxies if the full major axis of the observable image is less than about 13 pixels, so we ignore any galaxy smaller than this. This is similar to the cutoff of 4.5 arcseconds Petrosian radius used by the GZ1 team for galaxies with visible structure [4]. Also following GZ1, we cut off galaxies dimmer than magnitude 17.7 in the R band. This leaves about 470,000 Sloan galaxies.
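The two sample cuts described above amount to a simple filter. A minimal sketch, with hypothetical constant and argument names, and assuming the "about 13 pixels" limit is applied as a hard cutoff:

```python
MIN_MAJOR_AXIS_PIXELS = 13   # below this, spiral structure is judged invisible
MAX_MAGNITUDE = 17.7         # larger magnitude means a dimmer galaxy


def passes_sample_cuts(major_axis_pixels, magnitude):
    """Keep galaxies that are both large enough and bright enough."""
    return (major_axis_pixels >= MIN_MAJOR_AXIS_PIXELS
            and magnitude <= MAX_MAGNITUDE)
```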
We created models using Weka [21], which provides many machine learning algorithms, an easy-to-use interface, and the ability to create sophisticated standalone command-line classifiers once the model has been trained. Weka provides, among many algorithms, a Neural Network algorithm and a Random Forest algorithm. Neural Networks have been used with success in similar tasks, like the convolutional model used by Dieleman et al. [12]. These models excel in tasks where the input is spatially or temporally correlated, like images or audio, so we briefly used the Neural Network algorithm to roughly reproduce the results of [10], having downloaded the same data they used from the Galaxy Zoo 1 survey [4,5], which was a treated sample of the Sloan Digital Sky Survey Data Release 6 (SDSS DR6) [22]. Since our machine learning algorithm uses the data only after SpArcFiRe has processed it, we found that Weka's Random Forest model had a lower RMSE, and as described in the previous section, a Random Forest model (described below) makes decisions that are easier for us to dissect and learn from. For the most advanced tasks, we recreated the same random forest models using Julia [23]; the results using Weka and Julia are virtually identical since the underlying mechanisms are the same.
Footnote 5: High spirality is a strong indicator of a galaxy being spiral, but it is not a necessary condition. Galaxies with low spirality may be edge-on spirals, ellipticals, low-resolution spirals, or even disk galaxies without spiral structure, such as the Sombrero Galaxy.
Footnote 6: A higher accuracy can be achieved if we use a boundary below 0.5. Note that since there were 6 choices in GZ1, any choice receiving more than 1/6 of the votes can be a winning vote; for example, a vote of 40% could be considered a classification if all the other choices had less than 40% of votes. It is also possible to get better accuracy, using the same features, if we build a classifier rather than a regressor, but that is outside the scope of this paper.
To provide context, we explain the general idea of random forests.The "forest" part refers to a set of decision trees.Each decision tree has a set of input parameters.At each level of the tree, one asks if a particular parameter is in a specific range.For example, one level of the decision tree may ask if the galaxy has an absolute magnitude brighter than 18; another level may ask if it has a color redder than 0. The tree can be very deep, and once we arrive at a leaf node, we have a set of galaxies that satisfy an exact set of characteristics across the parameters that lie along the decision path to that node.The process of optimizing the decision tree is beyond the scope of this paper, but the goal is to optimize the leaf nodes to precisely define whatever output characteristic we are trying to reproduce.In our case, we are trying to reproduce the GZ1 human vote distribution.For example, one leaf may represent all galaxies where the human votes for (elliptical, spiral, other) are close to (0.80, 0.19, 0.01).This helps us determine what characteristics lead a decision tree to classify a galaxy as spiral, elliptical, or other.
The "random" part of a random forest refers to the fact that each decision tree's input parameters are chosen randomly from a larger set of input parameters provided by the user. The number of parameters to use for each tree is itself an integer parameter (fixed, in our case), as is the number of trees to use. Each tree effectively constitutes an "expert" in galaxy classification using its chosen set of parameters, and the forest is then a "mixture of experts", in which a voting mechanism is used to come up with the final classification. A mixture of experts generally results in a much better classification than a single tree trained on all parameters, because the signal of each expert reinforces all the others, while the noise of the experts tends to cancel out. ([24] provides an excellent introduction to this idea.)
Figure 1 is a simple example of a two-parameter decision tree. In this example, we will apply it only to galaxies that are clearly either spiral or elliptical. However, rather than a discrete classification, our goal is to provide just one number for each galaxy: the probability that it is a spiral galaxy. We use two familiar parameters: color and absolute magnitude. It is well known that elliptical galaxies tend to be both brighter and redder than spirals. Given a training set of galaxies that are truly either spiral or elliptical, and given the colors and magnitudes of each, we perform the following set of operations to generate a 2-parameter decision tree:
• Compute the mean magnitudes M_s, M_e for spirals and ellipticals, respectively.
• Compute the mean colors C_s, C_e for spirals and ellipticals, respectively.
• Compute a threshold color T_C intended to separate spirals from ellipticals; we will simply use the midpoint T_C = (C_s + C_e)/2.
• Similarly compute a threshold magnitude T_M = (M_s + M_e)/2.
• Now for each galaxy, first ask which side of the threshold its color is on, and then ask which side of the threshold its magnitude is on.
• This bins each galaxy into one of four leaf nodes, as in Figure 1.
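The steps listed above can be sketched in a few lines. This is a toy illustration under stated assumptions: the data layout (tuples of color, magnitude, and a boolean spiral label), the function names, and the choice to assign each leaf the fraction of training spirals it contains are all ours, not taken from the authors' code.

```python
from statistics import mean


def fit_two_parameter_tree(galaxies):
    """Build the two-level tree from (color, magnitude, is_spiral) tuples.

    Assumes the training set contains at least one spiral and one elliptical.
    """
    spirals = [g for g in galaxies if g[2]]
    ellipticals = [g for g in galaxies if not g[2]]
    # Midpoint thresholds T_C and T_M between the two class means.
    t_color = (mean(g[0] for g in spirals) + mean(g[0] for g in ellipticals)) / 2
    t_mag = (mean(g[1] for g in spirals) + mean(g[1] for g in ellipticals)) / 2
    # Bin every galaxy into one of four leaves keyed by the two comparisons.
    leaves = {}
    for color, mag, is_spiral in galaxies:
        key = (color > t_color, mag > t_mag)
        leaves.setdefault(key, []).append(1.0 if is_spiral else 0.0)
    # Each leaf's value is the fraction of training galaxies that are spiral.
    leaf_prob = {key: mean(labels) for key, labels in leaves.items()}
    return t_color, t_mag, leaf_prob


def predict_spiral_probability(tree, color, mag):
    """Look up the leaf for a new galaxy; None if that leaf was never filled."""
    t_color, t_mag, leaf_prob = tree
    return leaf_prob.get((color > t_color, mag > t_mag))
```

For example, training on a handful of blue, faint spirals and red, bright ellipticals yields a tree whose leaves separate the two groups; real data would give leaf fractions between 0 and 1.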
As we can see, the results are correlated with the correct answers but not strongly so: dim, blue-ish galaxies have only a slightly greater than 50% chance of being spiral, although it is true that bright, reddish galaxies are correctly measured as unlikely to be spiral. Table 2 reiterates this fact in more detail, and provides an example of another pair of features that provide a better classification scheme, although still only about 75% "correct" in total. We begin to see significantly better results when we start to add features and levels in an individual tree. Table 3 lists the extra features, both from SpArcFiRe and elsewhere, that we use. Table 4 demonstrates how much better the classification gets as we increase the number of features and number of trees in the forest.
Table 3. Outputs from SpArcFiRe that are used as input features for our model, in addition to those from Table 1. See Davis and Hayes [2] for full descriptions of these parameters. Parameters labeled "DCO" are measured only across arcs of "dominant chirality only"; that is, arcs of the "wrong" chirality, which are likely to be noise, are not included. The parameter "arcLenAt50%" means: lay arcs end-to-end sorted longest to shortest, resulting in a line of total length L, and measure the length of the arc that lies at the point L/2 along the line. A short arc at this point tends to suggest the galaxy is either flocculent or non-spiral, whereas a long arc at this point suggests a more grand-design spiral. The "rankAt50%" feature is similar, except this is the integer rank of the arc touching the L/2 point. If the ratio ((diskAxisRatio) / (bulgeAxisRatio)) is close to 1, it is suggestive of an elliptical galaxy, whereas if this ratio is significantly less than one it suggests a spiral galaxy (since the bulge axis ratio tends to be 1 from any vantage point, but not so for the disk).
Although the total number of features is fixed, each decision tree chooses some random set of features. We will look at how both of these parameters, the number of trees in the forest and the number of features per tree, change the results. Presumably, the more features a particular tree uses, the better that tree will be, although more care needs to be put into training these models to avoid overfitting. Figure 2 plots the Pearson correlation between the GZ1 human vote proportion for P_SP, and our reproduction of that proportion, as a function of how many features are used by each tree. As can be seen, increasing the number of features used by each tree generally results in improvement. However, since each tree chooses a random subset of features, there is a bit of noise in the curve. It becomes less obvious that there is an improvement beyond about 35 features per tree, so we use 35 in our final results below. We also see that the entire curve moves up as the number of trees in the forest increases.
Similarly, we would expect that as the number of trees in the forest is increased, the result would get better. Essentially, the more "experts" weigh in on the decision, the better the results should be. Figure 3 demonstrates that this is indeed the case. Furthermore, unlike the case of choosing features, the curve is very nearly monotonically increasing: it seems that more trees are always better [24]. In our results below, we use 150 total trees, each using 35 features out of our total set of 101 combined features from SpArcFiRe and SDSS.
The advantage of this method over other more opaque methods such as Neural Networks or SVM is that once we get to a leaf node of the decision tree, we know exactly why each galaxy is in that node: we can follow the decisions down the tree and build a boolean expression that describes all the galaxies at that node. If we wish, we can then ask ourselves if the decision path makes sense; we can look at the galaxies at that node, and ask if they form an interesting set. This kind of detailed, explicit decision-making analysis is (currently) absent in other machine learning methods, although very recent work has begun to study this question [15,16], and it is what allows us to be more confident that biases are unlikely to creep into the classification scheme.
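The "arcLenAt50%" and "rankAt50%" parameters defined in the Table 3 caption lend themselves to a short worked example. The sketch below follows that verbal definition; the function name, the 1-based rank, and the handling of an arc that ends exactly at L/2 are assumptions, and SpArcFiRe's own implementation may differ in such details.

```python
def arc_stats_at_half(arc_lengths):
    """Return (arcLenAt50, rankAt50) for a list of arc lengths in pixels.

    Arcs are laid end to end from longest to shortest; the arc covering
    the midpoint L/2 of the resulting line is reported together with its
    1-based rank. Returns None when there are no arcs.
    """
    if not arc_lengths:
        return None
    ordered = sorted(arc_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0.0
    for rank, length in enumerate(ordered, start=1):
        running += length
        # Assumption: an arc ending exactly at L/2 counts as touching it.
        if running >= half:
            return length, rank
    return ordered[-1], len(ordered)  # not reached for non-negative lengths
```

A galaxy dominated by one long arc gives a large length at rank 1, while many short arcs of similar length push the rank up and the length down, matching the grand-design versus flocculent reading given in the caption.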

Results
As stated before, our goal is to test whether adding SpArcFiRe's features to the set of input features will improve our ability to reproduce the vote distribution of GZ1 for spiral galaxies, so instead of classification, we are using regression to achieve our results. This means that rather than having a galaxy fall under a class (spiral, elliptical, or other), our output is the probability of an image of a galaxy having spiral structure. This value, between 0 and 1, is represented by the fraction of humans that agree that a certain galaxy has visible spiral structure. We represent this idea by using the sum of GZ1 values P_S + P_Z as our target variable, and this is what we train our machine to reproduce while simultaneously striving to eliminate the known P_S bias [6,7].

Measuring the quality of SpArcFiRe features
In the era of big data, machine learning scientists tend to agree that more is always better [24], but in some cases this is not true. Just adding features to a model does not guarantee that it will get better. Additional features might represent redundant information, which would not translate into more accurate classifiers for certain machine learning models, or worse, they would contribute to the curse of dimensionality [25]. In order to make sure we are adding meaningful information, we further analyzed our features.
We built three different random forest models using the same hyperparameters (150 total trees, each using 35 features) but with different feature sets. Model 1 used only SDSS features, Model 2 used only SpArcFiRe features, and Model 3 used both sets of features (this is the model we discuss throughout the paper). We ran a 10-fold cross validation [26] on each of them to get a more accurate measure of how those sets performed individually. Model 1 had a mean RMSE of 0.1518, Model 2 had a mean RMSE of 0.1522, and Model 3 had a mean RMSE of 0.1404. For the tests and analysis in this paper, we used the model with the lowest RMSE among the 10 folds of Model 3's cross validation, which had an RMSE of 0.1374. This demonstrates that SpArcFiRe features alone are about as good as SDSS features alone at predicting spirality. Furthermore, combining both sets increases the accuracy of our models. This is already an indication that there is valuable information in both feature sets.
Now let's study our results in more detail. Table 5 shows our results, using both SDSS and SpArcFiRe's features, for the test set in a 10x10 confusion matrix. Each row represents one of 10 bins holding galaxies in which a certain fraction of humans voted for that value of spirality; each column represents one of 10 identical bins containing the predicted spirality from our method. Thus, "correct" predictions (within 10% of the human vote) appear along the diagonal of the matrix. The first off-diagonal elements represent where our prediction was 10%-20% off, etc.; the far corners represent our worst predictions.
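For readers unfamiliar with the procedure, a k-fold cross validation scored by RMSE can be sketched as below. This is a generic, stdlib-only illustration with hypothetical function names and a hypothetical `fit` interface; it uses contiguous folds without shuffling for simplicity and is not the Weka or Julia code used for the numbers above.

```python
import math


def rmse(targets, predictions):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


def cross_validated_rmse(features, targets, fit, k=10):
    """Mean RMSE over k folds.

    fit(train_features, train_targets) must return a callable that maps
    one feature row to a prediction (a hypothetical interface).
    """
    scores = []
    for train, test in k_fold_indices(len(targets), k):
        model = fit([features[i] for i in train], [targets[i] for i in train])
        predictions = [model(features[i]) for i in test]
        scores.append(rmse([targets[i] for i in test], predictions))
    return sum(scores) / len(scores)
```

Running this once per feature set with the same `fit` routine gives comparable mean RMSE values, which is the kind of comparison made between Models 1, 2, and 3.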
Notice that our model has high sensitivity and specificity rates, which means that when it predicts that an object is spiral or non-spiral with high confidence, the prediction is very likely correct. For example, let's look at the case where our model predicts that an object is spiral with more than 90% confidence, the penultimate column of Table 5. If we consider a decision for spiral or non-spiral object being made above or below the 0.5 threshold, this gives us a sensitivity rate of more than 98%. A similar case happens for non-spiral predictions with more than 90% confidence (where P_SP ≤ 0.1), the second column of the same table, in which, also considering a 0.5 threshold for a decision, our model gets a specificity rate of more than 99%.
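The binned confusion matrix and the per-column agreement discussed above can be illustrated as follows. This is a minimal sketch with hypothetical names; the treatment of values that fall exactly on a bin edge, and of the 0.5 boundary as the start of the sixth bin, are assumptions rather than details taken from Table 5.

```python
def bin_index(value, n_bins=10):
    """Map a value in [0, 1] to a bin; 1.0 is placed in the last bin."""
    return min(int(value * n_bins), n_bins - 1)


def confusion_matrix(human, predicted, n_bins=10):
    """Rows are bins of the human vote fraction, columns bins of the prediction."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for h, p in zip(human, predicted):
        matrix[bin_index(h, n_bins)][bin_index(p, n_bins)] += 1
    return matrix


def human_spiral_fraction_in_column(matrix, column):
    """Among galaxies predicted in `column`, the fraction whose human vote
    lies in the upper half of the rows (vote fraction of about 0.5 or more)."""
    half = len(matrix) // 2
    total = sum(row[column] for row in matrix)
    if total == 0:
        return None
    return sum(matrix[r][column] for r in range(half, len(matrix))) / total
```

Evaluating the last function on a high-confidence prediction column gives the kind of agreement rate quoted in the paragraph above.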
In order to check which features seem to be the most important overall, we also created a feature ranking. As we have depicted in Figure 1, each node in a decision tree is a condition that splits the decision tree in two based upon a threshold in one variable. The measure used to make that decision is called impurity, and it is usually entropy for classification trees and variance for regression trees. It basically encodes how much information a particular feature, upon selection, adds to the decision process. The better a feature separates the outputs, the more it decreases the impurity of the decision tree. So, we compute how much each feature decreases the weighted impurity of a tree. In our case, since we are using random forests, the impurity decrease from each feature can be averaged, and the features are ranked according to this measure [27]. Table 6 shows the top 10 features ranked by their importance, along with the standard deviations of that score, since this is an average over 150 decision trees. We can see that of the top 10 features, 5 come from SDSS and 5 from SpArcFiRe, suggesting again that the two feature sets contribute roughly equally to the quality of the results. The 5 best SpArcFiRe features are all related to the number of arcs with length greater than or equal to a certain number of pixels, which is, not surprisingly, a strong indicator of the presence of spiral structure. Interestingly, in SpArcFiRe's favor, the best feature overall is the number of dominant-chirality-only arms of length 120 or longer, which is 30% more relevant than the most relevant feature from SDSS.
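For a regression tree, the impurity decrease of a single split is the drop in weighted variance of the target. The sketch below shows that calculation for one feature and one threshold; the function names are hypothetical, and a full feature ranking would sum such decreases over every split that uses a feature and average over all trees, which is omitted here.

```python
def variance(values):
    """Population variance of a non-empty list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def impurity_decrease(feature, target, threshold):
    """Variance of the target before a split minus the size-weighted
    variance of the two children produced by `feature <= threshold`."""
    left = [t for f, t in zip(feature, target) if f <= threshold]
    right = [t for f, t in zip(feature, target) if f > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing removes no impurity
    n = len(target)
    weighted = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(target) - weighted
```

A threshold that cleanly separates low-spirality from high-spirality galaxies yields a larger decrease than one that leaves both kinds mixed in a child node.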
Another way to visualize our results is to look at the Pearson correlation between our results and the GZ1 votes. Figure 4 shows this correlation represented in a graph where the x-axis represents the human votes and the y-axis our algorithm output, this time for all the 470,000 galaxies. Each red point represents one galaxy, and its (x,y) position represents our level of agreement. When x equals y, we are in complete agreement with the human votes. The clustering around the line y = x suggests good agreement with GZ1. It is also notable that more than 98% of the galaxies have |x − y| ≤ 0.3 and approximately 95% of the objects fall under |x − y| ≤ 0.2.
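The two summary statistics used above, the Pearson correlation and the fraction of galaxies with |x − y| within a tolerance, can be computed as in this short sketch. The function names are hypothetical, and the inputs are assumed to be equal-length lists with non-constant values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def fraction_within(x, y, tolerance):
    """Fraction of pairs whose absolute difference is at most `tolerance`."""
    return sum(abs(a - b) <= tolerance for a, b in zip(x, y)) / len(x)
```

Here `x` would hold the human vote fractions and `y` the predicted spiralities, so that `fraction_within(x, y, 0.2)` corresponds to the |x − y| ≤ 0.2 statistic.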
In Figure 5 we show some of our correctly classified objects. Those objects were cases where our model had a high agreement with the classifications provided by GZ1, and looking at the images we understand why. In 5a we display some of the spiral objects detected, while in 5b we show the non-spiral objects detected, which belong to the other classes of objects in GZ1: Elliptical, Merger, and Artefact, respectively.
It is important to understand what is going on in the 2% of objects that are outside of that scope. The most extreme of these objects are in the opposite corners of the off-diagonal in Table 5: 4 objects from the bottom left corner and 1 from the top right corner. These are the objects with a high disagreement: |x − y| ≥ 0.9. From our total of 45802 galaxies in the test set, only 5 fall under this margin, and we show all of them in Figure 6.
The top 4 rows depict the same problem: very faint arms that SpArcFiRe entirely failed to detect during the disk detection phase, so that it zoomed in past the arms, making it impossible for the arm detection code to find anything useful. This is a rare occurrence; we are aware of the issue and are working on improving this specific step of the algorithm. The object on the bottom row is clearly a merger, and arm-like features are present, so our machine predicts a high spirality. One could argue that this is a correct prediction that the galaxy is not an elliptical galaxy, but the GZ1 humans correctly marked it as a merger and thus not a spiral at all. Since our machine has not been trained to detect mergers, it is unclear whether this should count as a misclassification (footnote 8).

Conclusion
Our results show that it is possible to have a solid model that is in agreement with human classifiers above 90% of the time and also deal with the winding bias problem which was addressed in more detail in [7].In this sense, we "filter" the errors made by humans while still retaining the useful knowledge provided by the Galaxy Zoo.
What differentiates this from previous work is the addition of SpArcFiRe's output, which adds more information about the objects we are discriminating and helps to decrease the amount of bias present in the classifications provided in GZ1. These results demonstrate that SpArcFiRe adds valuable, rather than redundant, information, which can be used by automatic machine learning classifiers and regressors to achieve better results. We provided some insights into what these models find more descriptive for spiral galaxies by presenting the most important parameters used by random forests in Table 6.
Further experimentation with SpArcFiRe information could contribute even more to automatic classification since this work focused on showing that its information could be useful when added to
Footnote 8: One might argue that perhaps our "spirality" measure is more aptly called "non-ellipticity".

Figure 2. The Pearson correlation between the fraction of GZ1 humans voting for spiral, and our reproduction of that vote fraction, as a function of the number of features per tree that are chosen at random from the entire feature set. The three curves correspond to the cases where the total number of trees is 10, 20, or 50.

Figure 3. Similar to Figure 2, the Pearson correlation between the fraction of GZ1 humans voting for spiral, and our reproduction of that fraction, as a function of the number of trees in the forest. The three curves also show how the results change when the number of features per tree is 10, 20, or 50.

Figure 4. Scatter plot of our predicted spirality (vertical) vs. the fraction of GZ1 humans voting for spiral (horizontal). The Pearson correlation is 0.86, and the points cluster around the line y = x, depicting good agreement. Additionally, more than 98% of the galaxies have |x − y| ≤ 0.3 and approximately 95% of the objects fall under |x − y| ≤ 0.2. The vertical white lines appear because the fraction of human voters is a ratio of discrete integers.

Figure 6. Grossly Misclassified Objects. In sets of 3, from left to right column, the images show the Original Input Image, the same image automatically cropped by SpArcFiRe, and the spiral Arcs detected on the image (if any). The SDSS Object IDs, the GZ1 Spirality prediction (P_SP), and our Random Forest Prediction (RF_SP) are shown above each trio of images. In all but the last, the problem is low-surface-brightness arms, a known issue that we are working on. Despite the disagreement in the 5th object, a merger, spiral structure is indeed present.

Table 1. SpArcFiRe input parameters we used, identical to those used in Banerji et al. [10], except for the absolute Magnitudes that also come from SDSS.
provides an example, giving a quantitative flavor of how color and magnitude provide information -both separately and together -about separating spirals and ellipticals, in ways a Neural Network cannot.

Table 2. Classification results for two-level, two-feature trees like that in Figure 1. Columns p_i represent the average fraction P_SP, across galaxies in leaf node i, of GZ1 humans who voted that object to be a spiral galaxy, across the training set. This value is then the assigned P_SP for any non-training-set galaxy placed in this leaf node. correctAll: assuming P_SP > 0.5 represents a positive spiral classification, the percentage across all galaxies of correct classifications. SPcapture: the fraction of true spirals that are captured by this classification scheme. SPcontam: the fraction of galaxies classified as spiral that are incorrectly classified. Top row: exactly the tree of Figure 1. Second row: a pair that arguably performs better because it has a higher total correct classification, primarily because it has far less contamination of non-spirals, even though it has a smaller capture fraction. It demonstrates that we can have 75% correct classifications even with just a two-parameter, two-level tree. See Table 1 for the meaning of the input variables.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 18 June 2018 doi:10.20944/preprints201806.0279.v1. Peer-reviewed version available at Galaxies 2018, 6, 95; doi:10.3390/galaxies6030095
We now explore in depth how many total trees should be in the forest, and how many randomly chosen features should be in each tree. Recall that the total number of features is fixed (and is 101 in our case).

Table 4. Illustration of how the results of the classification improve as we allow more complex trees and larger forests.

Table 5. Predictions Confusion Matrix. The rows represent the number of objects that have a GZ1 spirality within a specific interval. The columns represent how many of those our Random Forest predicted in the same and different intervals. Notice that these numbers are only for the test set, a total of 45802 objects, which gives a more accurate measure of how our Random Forest would perform in real-world situations.

Table 6. Top 10 best features for spirality prediction in decreasing order of importance. The standard deviation is measured across the 150 decision trees.