Automated quantification of galaxy morphology is necessary because the size of upcoming sky surveys will overwhelm human volunteers. Existing classification schemes are inadequate because (a) their uncertainty increases near the boundary of classes and astronomers need more control over these uncertainties; (b) galaxy morphology is continuous rather than discrete; and (c) sometimes we need to know not only the type of an object, but whether a particular image of the object exhibits visible structure. We propose that regression is better suited to these tasks than classification, and focus specifically on determining the extent to which an image of a spiral galaxy exhibits visible spiral structure. We use the human vote distributions from Galaxy Zoo 1 (GZ1) to train a random forest of decision trees to reproduce the fraction of GZ1 humans who vote for the “Spiral” class. We prefer the random forest model over other black box models like neural networks because it allows us to trace post hoc the precise reasoning behind the regression of each image. Finally, we demonstrate that using features from SpArcFiRe—a code designed to isolate and quantify arm structure in spiral galaxies—improves regression results over and above using traditional features alone, across a sample of 470,000 galaxies from the Sloan Digital Sky Survey.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited