1. Introduction
The exploration of black hole activity in the universe has become an important topic in contemporary astronomy. Observations have shown that supermassive black holes exist in the centers of the most massive galaxies; however, only a small number of them show high activity levels, a phenomenon known as an active galactic nucleus (AGN) [1]. This activity is characterized by persistent emission at different wavelengths, including X-rays, ultraviolet, and visible light, and is in some cases accompanied by broad absorption lines in the spectrum. Among AGNs, broad absorption line (BAL) quasars have attracted considerable interest because of their unique properties and their potential to provide insights into the physics of active galactic nuclei and accretion onto black holes.
The discovery of quasars marked a crucial moment in the early 20th century, significantly shaping the field of extragalactic astronomy. The first quasars were identified through spectral observations that revealed unusual features in their light. In 1917, the American astronomer Harlow Shapley observed apparent absorption lines in the spectra of some galaxies, likely caused by the absorption of background radiation by stars and surrounding matter [2]. This observation served as a basis for further work. In 1922, the American astronomer Walter S. Adams discovered that the spectral lines of quasars were very different from those of stars. This made it possible to recognize that quasars must function according to a unique set of physical principles that distinguish them from ordinary stars and galaxies [3].
Advances in observation techniques have significantly improved our knowledge of quasars. Originally, such research focused on visible and ultraviolet wavelengths, but the advent of X-ray and infrared observations allowed astronomers to better understand the properties of quasars. In 1967, the astronomer Roger J. Taylor of Nash University, United Kingdom, detected for the first time the X-rays emitted by quasars [4]. This discovery provided strong evidence of the presence of high-energy, high-temperature particles inside quasars, likely produced by the accretion of matter onto the supermassive black holes at their centers. It was essential for improving our understanding of the extreme physical processes that occur in these objects.
Quasars, or quasi-stellar objects (QSOs), represent a fascinating subclass of AGNs, known for their exceptional brightness and intense, high-energy emissions. The discovery of quasars by Maarten Schmidt in 1963 marked a turning point in astronomical research and catalyzed a series of advances in the field of astrophysics [5]. Schmidt's work, which identified quasars as distant, high-energy objects with unusual spectral characteristics, not only reshaped our understanding of the most powerful phenomena in the universe but also paved the way for later discoveries about the nature of black holes, the evolution of galaxies, and the large-scale structure of the universe. For example, the study of quasars has greatly improved our understanding of the coevolution of supermassive black holes (SMBHs) and their host galaxies [6,7,8,9], the processes that regulate the large-scale structure of galaxies [10,11], the interstellar and intergalactic media across a range of redshifts [12,13], and cosmic reionization and accretion mechanisms [14,15,16].
BAL quasars represent a distinct subclass within the broader category of quasars, characterized by blue-shifted absorption troughs in their spectra. These features indicate high-velocity gas outflows that originate from a quasar's central region and move along the line of sight. These outflows can exceed velocities of 2000 km s⁻¹, serving as a potent feedback mechanism within an AGN [17]. The absorption lines observed in BAL quasars predominantly originate from highly ionized atoms, such as C IV, Si IV, N V, and O IV, providing valuable insights into large-scale galactic processes [18]. BAL quasars have therefore attracted widespread interest as an important subclass in quasar research. They exhibit broad, deep absorption lines in their spectra, which may be produced by high-speed gas clouds moving around the black hole. A team headed by the British astronomer Patrick B. Palit first characterized BAL quasars in detail in 1991 [19]. They found that the spectra of BAL quasars differed markedly from those of typical quasars, particularly in the widths and depths of the absorption lines. These distinctive features point to the physical processes occurring within a quasar's central region and offer valuable insights into the dynamics of the material surrounding supermassive black holes.
With the continuous development of observational techniques, several survey programs, such as the Sloan Digital Sky Survey (SDSS), have provided more observational data for quasar research. However, with the rapid increase in the amount of data, traditional manual classification has become impractical and remains subjective. Automated classification methods have therefore become particularly important. Principal component analysis (PCA), spectral template matching, and deep learning techniques are now widely employed tools in quasar classification. These methods enable researchers to efficiently process and analyze vast amounts of observational data. By leveraging these computational approaches, astronomers can extract key features from quasar spectra, enabling more rapid and accurate classification while also uncovering subtle patterns and correlations that may otherwise be overlooked [20,21,22].
This study applied deep learning techniques to classify quasar spectra from the sixteenth SDSS data release (DR16), using labels that distinguished BAL quasars from non-BAL quasars. We investigated several data preprocessing approaches, including a range of dimensionality reduction methods, to optimize the inputs used to train the deep learning models. The goal was to build robust models that could accurately classify these quasar samples and that could be applied to new datasets as they become available. Through this study, we sought to improve our ability to differentiate quasar types, which, in turn, would deepen our understanding of the activity of supermassive black holes and open up new possibilities for studying complex phenomena in the most dynamic regions of the universe.
2. Data
2.1. Dataset
The Sloan Digital Sky Survey (SDSS) is a major astronomical survey project that uses the Sloan Foundation 2.5 m telescope at the Apache Point Observatory (APO) and the du Pont 2.5 m telescope at the Las Campanas Observatory (LCO). It has provided multi-band images and spectroscopic data for more than three million objects. Since the start of its official survey in 2000, the SDSS has gone through five phases: SDSS-I (2000–2005), SDSS-II (2005–2008), SDSS-III (2008–2014), SDSS-IV (2014–2020), and SDSS-V (2020–2025). Each phase has had different observational and scientific objectives [23].
SDSS-IV DR16 is the core dataset of this study. DR16, the fourth data release of SDSS-IV, was published in 2020 and includes quasar spectra covering wavelengths of 3600–10,400 Å at a spectral resolution of ~2000. The DR16 Quasar Catalog (DR16Q) (Lyke et al., 2020) contains 750,414 quasars with 920,110 observations. DR16Q is the final SDSS-IV/eBOSS catalog and the largest sample of spectroscopically confirmed quasars available to date [22,23]. Bolton et al. (2012) described the classification and redshift measurement methods applied to the SDSS data [24]. SDSS DR16, documented by Ahumada et al. (2020), provides the extensive spectroscopic data used to analyze and classify quasars in this study [23].
This study uses the rest-frame wavelength range of 1260–2400 Å. This range was chosen because of the significance of key spectral lines, such as C IV and Si IV, and their positions within the spectrum. The selection ensures adequate coverage of the C IV absorption region, which is crucial for studying BAL quasars because it carries the absorption signature of high-velocity ions. In addition, restricting the range minimizes the impact of unrelated spectral regions, enabling a more precise analysis of the absorption features.
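The wavelength selection described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `select_rest_frame_window`, the flat toy spectrum, and the redshift `z = 2.0` are assumptions made for the example; only the 1260–2400 Å window and the 3600–10,400 Å observed range come from the text.

```python
import numpy as np

REST_MIN, REST_MAX = 1260.0, 2400.0  # rest-frame window quoted in the text (Å)


def select_rest_frame_window(obs_wave, flux, z):
    """Shift observed wavelengths to the rest frame and keep 1260-2400 Å."""
    obs_wave = np.asarray(obs_wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    rest_wave = obs_wave / (1.0 + z)  # lambda_rest = lambda_obs / (1 + z)
    mask = (rest_wave >= REST_MIN) & (rest_wave <= REST_MAX)
    return rest_wave[mask], flux[mask]


# Hypothetical toy spectrum on the SDSS observed range, for a quasar at z = 2.0
obs_wave = np.linspace(3600.0, 10400.0, 4000)
flux = np.ones_like(obs_wave)
rest_wave, rest_flux = select_rest_frame_window(obs_wave, flux, z=2.0)
```

At lower redshifts the observed range no longer covers the blue end of the window, so a real pipeline would also need a rule for incomplete coverage; the text does not specify one.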
Figure 1 shows example one-dimensional spectra from the BAL and non-BAL samples.
In constructing the dataset, the BAL_PROB parameter provided in the DR16Q quasar catalog was used to distinguish BAL quasars from non-BAL quasars. BAL_PROB is a probability value calculated by fitting the C IV absorption features in the spectrum. A BAL_PROB value of 1 indicates that the quasar exhibits significant broad absorption line features and is therefore a BAL quasar, whereas a value of 0 indicates that no broad absorption line features are present and the quasar is a non-BAL quasar. This calculation combines physical model fitting with statistical analysis to support the accuracy of the classification [21]. To keep the classes balanced and avoid biasing the model toward the majority class, non-BAL quasars equal in number to the BAL quasars were randomly selected to form the final training and test datasets. The exact sample sizes are presented in Table 1.
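The balanced selection described above might look like the following sketch. The helper `balanced_indices`, the fixed random seed, and the toy `bal_prob` array are hypothetical; the only facts taken from the text are that BAL_PROB = 1 marks BAL quasars, BAL_PROB = 0 marks non-BAL quasars, and the non-BAL sample is drawn at random to match the BAL count.

```python
import numpy as np


def balanced_indices(bal_prob, seed=0):
    """Return row indices and 0/1 labels for a class-balanced sample."""
    bal_prob = np.asarray(bal_prob, dtype=float)
    bal_idx = np.flatnonzero(bal_prob == 1)  # BAL quasars
    non_idx = np.flatnonzero(bal_prob == 0)  # non-BAL quasars
    rng = np.random.default_rng(seed)
    # Draw as many non-BAL rows as there are BAL rows, without replacement
    non_sample = rng.choice(non_idx, size=bal_idx.size, replace=False)
    idx = np.concatenate([bal_idx, non_sample])
    labels = np.concatenate([
        np.ones(bal_idx.size, dtype=int),
        np.zeros(bal_idx.size, dtype=int),
    ])
    return idx, labels


# Toy catalog column: three BALs, six non-BALs, one intermediate value
bal_prob = np.array([1, 0, 0, 0.5, 1, 0, 0, 0, 1, 0])
idx, labels = balanced_indices(bal_prob)
```

In this sketch, rows with intermediate BAL_PROB values are simply left out; whether the authors treated such objects the same way is not stated in the text.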
2.2. Data Preprocessing
To place all spectral data on a consistent scale, linear interpolation was applied to resample the flux data. This process not only guaranteed that all spectra had the same number of data points but also helped to reduce the errors caused by uneven sampling. Additionally, to eliminate flux differences between spectra, each spectrum was normalized by dividing its flux by its average flux, so that the average flux of every spectrum equaled 1. The formula for this normalization is as follows:

$$\hat{f}_i = \frac{f_i}{\frac{1}{m}\sum_{j=1}^{m} f_j}$$

where $f_i$ and $f_j$ are the $i$-th and the $j$-th data points of the spectral flux sequence, $\hat{f}_i$ is the normalized flux, and $m$ is the number of data points in the spectral flux sequence. The normalized spectral flux data were used in the subsequent research.
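A short sketch of the two preprocessing steps, resampling by linear interpolation and division by the mean flux, is given below. The grid length of 1426 is borrowed from the CNN input length quoted later in the paper and is an assumption here, as are the function name `resample_and_normalize` and the synthetic input spectrum.

```python
import numpy as np


def resample_and_normalize(wave, flux, n_points=1426, wmin=1260.0, wmax=2400.0):
    """Resample onto a common wavelength grid, then divide by the mean flux."""
    grid = np.linspace(wmin, wmax, n_points)
    # np.interp performs piecewise-linear interpolation; wave must be increasing
    resampled = np.interp(grid, wave, flux)
    return grid, resampled / resampled.mean()


# Synthetic, unevenly sampled spectrum that covers the target window
wave = 1250.0 + 1160.0 * np.linspace(0.0, 1.0, 900) ** 1.3
flux = 5.0 + np.sin(wave / 50.0)
grid, norm_flux = resample_and_normalize(wave, flux)
```

After this step every spectrum has the same length and a mean flux of 1, so differences between inputs reflect spectral shape rather than overall brightness.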
3. Method
3.1. Data Dimensionality Reduction
Spectral data from BAL and non-BAL quasars were used in this study. While these datasets contain valuable information from astronomical observations, their high dimensionality poses major challenges. Dimensionality reduction converts high-dimensional data into a lower-dimensional representation [25,26]. This process preserves the essential features of the data while discarding information that contributes little to the analysis or to model construction, making the data more suitable for subsequent classification tasks.
Therefore, this study applied several dimensionality reduction techniques, including PCA, t-SNE (t-distributed stochastic neighbor embedding), and manifold learning algorithms, to extract features. The reduced data were used as inputs to the deep learning algorithm for quasar classification. This approach allowed for more efficient processing of the high-dimensional data and improved both model training efficiency and prediction accuracy.
3.1.1. Principal Component Analysis (PCA)
PCA is a common dimensionality reduction algorithm that extracts the most important information from high-dimensional data [27]. It reduces the number of features while preserving as much variance as possible. PCA uses a linear transformation to map a high-dimensional dataset to a lower-dimensional space whose new features are called principal components. The first principal component captures the highest variance in the data, the second captures the second highest, and so on, with each subsequent component capturing progressively less variance. By selecting the top few principal components, we could reduce the dimensionality of the data while still retaining most of the dataset's key information [28]. This approach is supported by previous work (e.g., Guo and Martini, 2019), which achieved high accuracy in detecting BAL quasars by combining a CNN with PCA [29]. Ferreras et al. (2006) used PCA to analyze the spectra of galaxies, helping to identify the characteristics of stellar populations and highlighting the effectiveness of PCA in reducing the dimensionality of astronomical datasets [30]. In addition, Fathivavsari (2020) applied deep learning methods to emission-line prediction, demonstrating the effectiveness of deep learning in processing complex spectral data [31].
The data were subjected to a principal component analysis. The first eight principal components accounted for 8.7%, 4.6%, 3.5%, 3.4%, 2.3%, 1.9%, 1.7%, and 1.3%, respectively, of the total variance. The cumulative explained variance was 27.4% for the first eight principal components and 55% for the first 100. The first 100 principal components, which retained 55% of the variance, were therefore used as input features in the deep learning model.
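The explained-variance figures above follow from the standard PCA computation, which can be sketched with a singular value decomposition. The function `pca_fit_transform` and the random toy matrix are illustrative assumptions; the authors' software is not described in the text. The last lines only check the arithmetic of the eight ratios quoted above.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """Project centered data onto the leading principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = S ** 2 / (X.shape[0] - 1)        # variance along each component
    ratio = variances / variances.sum()          # explained-variance ratio
    scores = Xc @ Vt[:n_components].T            # reduced representation
    return scores, ratio[:n_components]


# Toy data: 200 samples, 50 features with decreasing scale
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50)) * np.linspace(3.0, 0.1, 50)
scores, ratio = pca_fit_transform(X, n_components=8)
cumulative = np.cumsum(ratio)

# Arithmetic check of the ratios quoted in the text (percent)
quoted = [8.7, 4.6, 3.5, 3.4, 2.3, 1.9, 1.7, 1.3]
quoted_total = sum(quoted)
```

For the spectra in this study, the same call with `n_components=100` would yield the 100-dimensional feature vectors described in the text.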
3.1.2. t-Distributed Stochastic Neighbor Embedding (t-SNE)
t-SNE is a non-linear dimensionality reduction algorithm designed to map high-dimensional data into a two- or three-dimensional space for data visualization and feature extraction. The central idea is to calculate the similarity between samples in the high-dimensional space and then convert these similarities into probability distributions.
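The conversion of pairwise similarities into a probability distribution can be illustrated as follows. This sketch covers only the high-dimensional half of t-SNE and uses a single fixed Gaussian bandwidth `sigma`, which is a simplifying assumption: the full algorithm tunes a separate bandwidth for every point from a perplexity setting and then optimizes a low-dimensional embedding, neither of which is shown here.

```python
import numpy as np


def gaussian_affinities(X, sigma=1.0):
    """Symmetric joint probabilities p_ij from Gaussian similarities."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    logits = -d2 / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)             # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p_cond = np.exp(logits)
    p_cond /= p_cond.sum(axis=1, keepdims=True)   # rows are p_{j|i}
    n = X.shape[0]
    return (p_cond + p_cond.T) / (2.0 * n)        # symmetrized p_ij


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
P = gaussian_affinities(X)
```

Nearby points receive a large share of the probability mass, which is why the method emphasizes local structure.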
The main advantage of t-SNE is its ability to reveal class structure in high-dimensional data by focusing on local structure and the similarity between data points. In astronomy, t-SNE is widely used for spectral classification, stellar classification, and the analysis of anomalous spectra, revealing the intrinsic structure of high-dimensional astronomical data [32,33]. In this study, we used the t-SNE algorithm to reduce the data to two dimensions (as shown in Figure 2) and fed the result into a deep learning model.
3.1.3. Manifold Learning Algorithms
Locally linear embedding (LLE) and isometric mapping (ISOMAP) are two common manifold learning algorithms used for nonlinear dimensionality reduction. They are effective at preserving the intrinsic structure of data while mapping high-dimensional data to a lower-dimensional space. The LLE algorithm reduces dimensionality by preserving local linear relationships within the data [34]. ISOMAP, in contrast, captures the manifold structure of the data by preserving geodesic distances between points. We applied both LLE and ISOMAP to the preprocessed data to better capture its intrinsic features, reducing the original data to 20 dimensions, which were then used as inputs for the deep learning model.
3.1.4. Data Transformation and CNN Input
In the method presented in this paper, the feature dimensions resulting from the different dimensionality reduction techniques (PCA, t-SNE, LLE, and ISOMAP) were inconsistent. To make the reduced features compatible with the CNN input format (a fixed length of 1426), we designed an extension method that maps the reduced features to a uniform length while retaining as much information as possible from the reduced results.
The result of the PCA dimensionality reduction was a 100-dimensional feature set, representing the principal components that captured the most information from the original spectrum. However, the CNN requires input data with a fixed length of 1426. To adapt the PCA result to the CNN input format, the 100-dimensional features were extended through linear interpolation, treating each principal component as an independent one-dimensional feature sequence. Linear interpolation was applied to each sequence, expanding the data points from 100 to 1426. This method maintained the smoothness of the feature distribution during interpolation while avoiding excessive information loss. After interpolation, all extended principal components were combined along the time steps to form a single-channel input sequence of length 1426. This process preserved the integrity of the reduced features while meeting the CNN architecture's requirement for fixed-length inputs.
The result of the t-SNE dimensionality reduction is two-dimensional and is intended to represent the embedded neighborhood relationships of the high-dimensional data. To make this compatible with the CNN input format, we applied a repeat-padding method to extend the 2D features generated by t-SNE, filling each dimension until the total number of data points reached 1426. Repeat padding extended the feature length while avoiding the potential distortion that interpolation could introduce. After extension, the two-dimensional features were combined into a single-channel input sequence of length 1426. The padding and combination process kept the feature information independent and consistent, providing a sufficient representation of the embedded relationships for the classification task.
The results of the LLE and ISOMAP dimensionality reductions are 20-dimensional, capturing the local linear relationships or the global manifold structure of the original data. Using a combination of interpolation and padding, each feature sequence was first extended through linear interpolation, expanding the data points from 20 to 1426. If the number of interpolated points was less than 1426, the length was completed by circular padding or repetition. After interpolation and padding, all dimensional features were combined along the time steps to form a single-channel input sequence of length 1426. This method retained the key feature information from the reduced results while ensuring consistency with the CNN input format.
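One possible reading of the extension step is sketched below. The text leaves some details open (for example, exactly how repeated values are arranged), so the two helpers `extend_by_interpolation` and `extend_by_repetition` are assumptions rather than the authors' implementation: the first stretches a reduced feature vector to 1426 points by linear interpolation (PCA, LLE, ISOMAP), and the second repeats each value in turn until the target length is reached (t-SNE).

```python
import numpy as np

TARGET_LEN = 1426  # fixed CNN input length quoted in the text


def extend_by_interpolation(features, target_len=TARGET_LEN):
    """Stretch a 1D feature vector to target_len points by linear interpolation."""
    features = np.asarray(features, dtype=float)
    src = np.linspace(0.0, 1.0, features.size)
    dst = np.linspace(0.0, 1.0, target_len)
    return np.interp(dst, src, features)


def extend_by_repetition(features, target_len=TARGET_LEN):
    """Repeat each feature value in turn, then trim to target_len points."""
    features = np.asarray(features, dtype=float)
    reps = int(np.ceil(target_len / features.size))
    return np.repeat(features, reps)[:target_len]


rng = np.random.default_rng(2)
pca_vec = rng.normal(size=100)   # stand-in for one sample's 100 PCA scores
tsne_vec = rng.normal(size=2)    # stand-in for one sample's 2D t-SNE embedding
lle_vec = rng.normal(size=20)    # stand-in for one sample's 20D LLE features

pca_input = extend_by_interpolation(pca_vec)
tsne_input = extend_by_repetition(tsne_vec)
lle_input = extend_by_interpolation(lle_vec)
```

Neither operation adds information: the extended sequence carries exactly what the reduced vector contained, only in a shape the network accepts.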
3.2. Deep Learning Model
In Section 3.1, we applied various dimensionality reduction methods to extract features. Next, we used the reduced data as inputs to a deep learning classification model and trained this model to classify quasars. A convolutional neural network (CNN) is a highly successful deep learning model for image processing and computer vision tasks. It automatically learns the features of images through convolutional, pooling, and fully connected layers, allowing tasks such as image classification and object detection to be performed [35]. For the classification of quasars, spectral data can be treated as a particular type of one-dimensional image, allowing a CNN to learn the features within a spectrum to distinguish BAL quasars from non-BAL quasars. In astronomy, a CNN can be used for various tasks such as galaxy classification, object detection, and spectral classification [36,37,38]. For example, in galaxy image classification, a CNN can learn to distinguish different types of galaxies, such as elliptical and spiral galaxies, using images observed through a telescope [36]. The convolutions capture the texture and shape features of galaxies, allowing for high-precision classification. In object detection tasks, a CNN can be used to detect targets such as planets, stars, and galaxies in sky survey data [39]. By sliding a convolution kernel over an image, a CNN can automatically locate astronomical objects. In spectral classification tasks, a CNN can learn the features of spectral data in different wavelength regions and perform tasks similar to image classification. For example, with quasar spectral data, a CNN can distinguish spectra with broad absorption lines from those without them [40].
Therefore, the deep learning model of this study used a CNN as its basic architecture, the structure of which is illustrated in Figure 3. This model took one-dimensional spectral data as inputs and extracted features using three convolutional blocks. Each block was composed of a convolutional layer, an activation function, and a pooling layer. The convolutional layers used a 1 × 3 convolution kernel, and the number of channels increased from 64 to 128 and then to 256, allowing the model to learn deeper representations of more complex features. The activation function used in the model was the rectified linear unit (ReLU), while the pooling layers used 1 × 2 max pooling to compress the feature maps. After feature extraction, the features were passed through a dropout layer that randomly dropped 50% of the units, including their connections, to mitigate the risk of overfitting. The data were then propagated to the fully connected layer, which output the classification results. Our model was trained for 15 epochs, which was sufficient to achieve a balance between underfitting and overfitting for this dataset, and we used the Adam optimizer because of its adaptive learning rate. In addition, we applied a learning rate reduction strategy in which the learning rate was dynamically adjusted as training progressed.
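To make the data flow through this architecture concrete, the sketch below runs a NumPy forward pass through three convolution, ReLU, and max-pooling blocks followed by a fully connected layer. It is an untrained illustration with random weights, not the authors' model: zero "same" padding, output channel counts of 64, 128, and 256 for the three blocks, two output classes, and all function names are assumptions. Dropout is omitted because it is inactive at inference time.

```python
import numpy as np


def conv1d_same(x, w, b):
    """1D convolution with zero 'same' padding. x: (C_in, L), w: (C_out, C_in, K)."""
    k = w.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # windows has shape (C_in, L, K): one length-K window per output position
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)
    return np.einsum("clk,ock->ol", windows, w) + b[:, None]


def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the length axis."""
    length = (x.shape[1] // size) * size
    return x[:, :length].reshape(x.shape[0], -1, size).max(axis=2)


def init_params(rng, channels=(1, 64, 128, 256), kernel=3, length=1426, n_classes=2):
    """Random (untrained) weights for three conv blocks and one dense layer."""
    blocks = []
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        scale = np.sqrt(2.0 / (c_in * kernel))
        blocks.append((rng.normal(scale=scale, size=(c_out, c_in, kernel)), np.zeros(c_out)))
        length //= 2  # each 1 x 2 max-pooling step halves the length
    fc_w = rng.normal(scale=0.01, size=(n_classes, channels[-1] * length))
    return {"blocks": blocks, "fc_w": fc_w, "fc_b": np.zeros(n_classes)}


def forward(x, params):
    """Conv -> ReLU -> max pool (three times), then a fully connected layer."""
    h = x
    for w, b in params["blocks"]:
        h = max_pool1d(np.maximum(conv1d_same(h, w, b), 0.0))
    logits = params["fc_w"] @ h.reshape(-1) + params["fc_b"]
    return h, logits


rng = np.random.default_rng(3)
params = init_params(rng)
spectrum = rng.normal(size=(1, 1426))  # one single-channel input of length 1426
features, logits = forward(spectrum, params)
```

Under these assumptions the sequence length shrinks from 1426 to 713, 356, and 178 across the three blocks, so the fully connected layer receives 256 × 178 values per spectrum.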
4. Results and Discussion
To evaluate the classification performance of the model, we used the accuracy, precision, recall, and F1 score, calculated as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP (true positive) is the number of cases where the sample was actually positive (a BAL quasar) and the model correctly predicted it as positive, TN (true negative) is the number of cases where the sample was actually negative (a non-BAL quasar) and the model correctly predicted it as negative, FP (false positive) is the number of cases where the sample was actually negative and the model incorrectly predicted it as positive, and FN (false negative) is the number of cases where the sample was actually positive and the model incorrectly predicted it as negative.
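The four metrics follow directly from the confusion-matrix counts defined above, as the short sketch below shows. The counts passed in the example call are made-up numbers for illustration and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only: 100 actual positives and 100 actual negatives
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

Because F1 is the harmonic mean of precision and recall, a model that misses many BAL quasars (a high FN count) is penalized even when its precision is high.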
In this paper, we used PCA, t-SNE, LLE, and ISOMAP, respectively, to extract features from the original data. These features were fed to the deep learning model (Figure 3), and the classification results were evaluated using the accuracy, precision, recall, and F1 scores described above. The results indicated that the deep learning model using PCA-based feature extraction performed best on both the training (Table 2) and test (Table 3) datasets.
During our classification experiments, we found that the performance of the deep learning models varied considerably depending on the dimensionality reduction algorithm used to extract the features. Among the four algorithms (PCA, LLE, ISOMAP, and t-SNE), PCA performed the best on the training and test datasets. Specifically, the model using PCA achieved accuracy rates of 99.7% and 99.1% on the training and test datasets, respectively. In addition, the recall, precision, and F1 results for the PCA-based model were all greater than 99.0%, indicating that it classified the vast majority of BAL and non-BAL quasars correctly. This performance shows that PCA effectively captured the key data features while reducing the dimensionality of the data. In doing so, it helped the deep learning model generalize to unseen data, highlighting the robustness of the approach in distinguishing types of quasars.
The model using LLE also performed well, although it was slightly behind the PCA-based model. It had an accuracy of 98.5% for the training set and 98.9% for the test set. Its performance indicators (recall, precision, and F1 score) were almost comparable to those of PCA, at approximately 99% on the test data. LLE's emphasis on maintaining local relationships in the data allowed it to obtain strong classification results. However, its performance was slightly lower than that of PCA, likely because the emphasis on local rather than global structure led to a less complete representation of the global feature space. The model using ISOMAP showed a more significant performance decline, with an accuracy of 94.2% for the training data and 94.5% for the test data. Although ISOMAP was effective in preserving the global manifold structure, its computational complexity and the nature of the data may have limited its ability to fully capture the complex relationships between the different quasars. Despite the precision of 95.8% obtained on the test data, the recall was 93.0% and the F1 score was 94.4%, which indicated that the model had more difficulty in correctly identifying all positive cases (BAL quasars), resulting in a higher number of false negatives (209 FNs on the test data). Although commonly used for data visualization, the t-SNE algorithm performed the worst in this classification task, with 91.1% accuracy for the training set and 90.6% for the test set. The model had a recall of 88.3% and an F1 score of 90.4% on the test dataset. As shown by the relatively high number of false negatives (351 FNs on the test dataset), the model had more difficulty in identifying positive cases.
The fact that t-SNE focuses on preserving local structure without maintaining global relationships may have limited its ability to extract the most relevant features for classification. This was also reflected in the higher number of false positives and false negatives, indicating that the model had difficulty distinguishing between the two categories of quasars (BAL and non-BAL). LLE focuses on preserving local relationships, allowing it to capture the patterns and features of the dataset. Compared to ISOMAP, LLE places more weight on local structure, requires less computing power, and is suitable for larger datasets. ISOMAP, on the other hand, focuses on the global manifold structure and may require more computing resources, especially for large datasets, making it more suitable for cases with large distances between data points or pronounced global patterns. Overall, the experimental results emphasize the importance of choosing an appropriate dimensionality reduction method when extracting the input features for a deep learning model. The ability of PCA to capture global patterns in the data allowed for more accurate classification, while the other approaches, particularly t-SNE, had difficulty maintaining the necessary balance between representing local and global features.
Since PCA showed the best performance, we used the PCA-derived features as the inputs for the CNN model. In the CNN-based classification task, we demonstrated the effectiveness of deep learning methods for spectral data classification, achieving up to 99.1% accuracy on the test dataset, as shown in Table 4. This showed the strong capability of deep learning models in distinguishing between BAL quasars and non-BAL quasars.
Using PCA for the dimensionality reduction, Table 4 compares the performance of our best deep learning model with the XGBoost model previously developed by Kao et al. [20]. Our model outperformed Kao's model in both accuracy and classification metrics for distinguishing between BAL and non-BAL quasars. Overall, this research yielded promising results in using deep learning methods to differentiate between BAL quasars and non-BAL quasars. While the various dimensionality reduction methods differed in how they captured data features, patterns, and computational costs, optimizing deep learning models played a critical role in achieving high accuracy in the classification tasks.
5. Conclusions
This study examined the effectiveness of deep learning methods in distinguishing BAL quasars from non-BAL quasars. The impact of different dimensionality reduction techniques on the classification of astronomical data was studied, and their respective performances were compared by reducing the dimensionality of the spectral data and applying a CNN classifier.
The PCA, t-SNE, LLE, and ISOMAP dimensionality reduction methods were assessed comprehensively. The experimental results showed significant differences between these methods in capturing the features and patterns of the underlying data. PCA is a widely used linear reduction technique that effectively reduces data dimensionality while preserving the most important variance, making it well suited to providing the input features needed for subsequent deep learning models. LLE and ISOMAP focus on capturing local relationships and global manifold structures, respectively, and their performance depends on the specific features and patterns of the data. Although t-SNE may be less effective for classification tasks, it has proven valuable in visualizing data distributions and revealing intrinsic structures in high-dimensional spaces.
High-dimensional data often contain large amounts of redundant information and noise, which poses considerable challenges in computation and generalization for traditional classification models. As a linear dimensionality reduction method, PCA effectively reduces data dimensionality by preserving key overall features, reducing model complexity, and improving training efficiency. However, principal component analysis is essentially linear and cannot fully capture the potentially non-linear patterns in spectral data. CNNs have powerful feature extraction capabilities that allow them to learn local and global features automatically through their hierarchical structures. This study combined the dimensionality reduction capability of PCA with a CNN's ability to learn features to obtain complementary benefits. PCA extracted key feature components from the spectral data and efficiently filtered out redundant information. The CNN then extracted deep feature patterns from the PCA-reduced data, improving the model's classification accuracy. The experimental results showed that the combined method not only improved the classification performance but also significantly reduced the computational cost, making it particularly suitable for processing large-scale spectral datasets.
The spectral signatures of broad absorption line (BAL) quasars are often complex and varied, and conventional classification methods have difficulty capturing the subtle features of high-dimensional data. The method proposed here optimizes the classification task in two respects. The first is the selection of features: principal component analysis was used to select the first 100 principal components, which ensured that the main physical information of the spectrum, such as the absorption characteristics of C IV and Si IV, was preserved during dimensionality reduction. The second is an efficient model architecture: the CNN used a one-dimensional convolution kernel to process the spectral data in combination with pooling layers and activation functions, which improved the model's ability to capture local features while avoiding overfitting.
This study offers new approaches and perspectives for the classification of astronomical data, providing astronomers with tools to select the dimensionality reduction algorithms and deep learning frameworks best suited to their specific needs. These methods have the potential to replace traditional manual classification techniques and significantly improve the efficiency of astronomical surveys. In addition, they could be applied to the analysis of large-scale astronomical data, such as data from the LAMOST and DESI surveys, where high-speed automated classification methods can play a key role in improving our understanding of the universe.
Author Contributions
Conceptualization, S.P., H.K. and Y.Z.; data curation, S.P. and Y.Z.; formal analysis, S.P.; funding acquisition, Y.Z.; investigation, S.P., W.K. and Y.Z.; methodology, S.P., H.K. and Y.Z.; project administration, H.K. and Y.Z.; resources, S.P. and Y.Z.; software, S.P., H.K., Z.L. and W.K.; supervision, H.K. and Y.Z.; validation, S.P., W.K. and Y.Z.; visualization, S.P., H.K. and Z.L.; writing—original draft, S.P., H.K. and Z.L.; writing—review and editing, S.P., H.K., Z.L. and Y.Z. All authors have read and agreed to the published version of the manuscript.
Funding
The research was funded by the National Natural Science Foundation of China under grants nos. 12273076 and 12133001 and the China Manned Space Project under science research grant nos. CMS-CSST-2021-A04 and CMS-CSST-2021-A06.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
Acknowledgments
The authors acknowledge the SDSS databases. Funding for the Sloan Digital Sky Survey IV was provided by the Alfred P. Sloan Foundation, the U.S. Department of Energy Office of Science, and the Participating Institutions. SDSS-IV acknowledges the support and resources from the Center for High-Performance Computing at the University of Utah. The SDSS website is www.sdss.org (accessed on 21 May 2024). SDSS-IV is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS Collaboration, which includes the Brazilian Participation Group, the Carnegie Institution for Science, Carnegie Mellon University, the Chilean Participation Group, the French Participation Group, the Harvard-Smithsonian Center for Astrophysics, the Instituto de Astrofísica de Canarias, Johns Hopkins University, the Kavli Institute for the Physics and Mathematics of the Universe (IPMU)/University of Tokyo, the Lawrence Berkeley National Laboratory, the Leibniz Institut für Astrophysik Potsdam (AIP), the Max-Planck-Institut für Astronomie (MPIA Heidelberg), the Max-Planck-Institut für Astrophysik (MPA Garching), the Max-Planck-Institut für Extraterrestrische Physik (MPE), the National Astronomical Observatories of China, New Mexico State University, New York University, the University of Notre Dame, Observatário Nacional/MCTI, Ohio State University, Pennsylvania State University, the Shanghai Astronomical Observatory, the United Kingdom Participation Group, the Universidad Nacional Autónoma de México, the University of Arizona, the University of Colorado Boulder, the University of Oxford, the University of Portsmouth, the University of Utah, the University of Virginia, the University of Washington, the University of Wisconsin, Vanderbilt University, and Yale University.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Antonucci, R. Unified Models for Active Galactic Nuclei and Quasars. Annu. Rev. Astron. Astrophys. 1993, 31, 473–521.
- Shapley, H. Note on the Magnitudes of Novae in Spiral Nebulae. Publ. Astron. Soc. Pac. 1917, 29, 213–217.
- Adams, W.S.; Joy, A.H. A Spectroscopic Method of Determining the Absolute Magnitudes of A-Type Stars and the Parallaxes of 544 Stars. Astrophys. J. 1922, 56, 242.
- Mestel, L.; Pagel, B.E.J. Roger John Tayler, O.B.E. 25 October 1929–23 January 1997. Biogr. Mem. Fellows R. Soc. 1998, 44, 405–416.
- Hazard, C.; Mackey, M.B.; Shimmins, A.J.; Schmidt, M.; Greenstein, J.L.; Matthews, T.A.; Oke, J.B. 120. The Discovery of Quasars Investigation of the Radio Source 3C 273 by the Method of Lunar Occultations. Nature 1963, 197, 1037–1039.
- Di Matteo, T.; Springel, V.; Hernquist, L. Energy Input from Quasars Regulates the Growth and Activity of Black Holes and Their Host Galaxies. Nature 2005, 433, 604–607.
- King, A. Black Holes, Galaxy Formation, and the MBH-σ Relation. Astrophys. J. 2003, 596, L27–L29.
- Kormendy, J.; Ho, L.C. Coevolution (or Not) of Supermassive Black Holes and Host Galaxies. Annu. Rev. Astron. Astrophys. 2013, 51, 511–653.
- Silk, J.; Rees, M.J. Quasars and Galaxy Formation. arXiv 1998, arXiv:astro-ph/9801013.
- Dawson, K.S.; Kneib, J.-P.; Percival, W.J.; Alam, S.; Albareti, F.D.; Anderson, S.L.; Armengaud, E.; Aubourg, É.; Bailey, S.J.; Bautista, J.E.; et al. The SDSS-IV Extended Baryon Oscillation Spectroscopic Survey: Overview and Early Data. Astron. J. 2016, 151, 44.
- Eisenstein, D.J.; Weinberg, D.H.; Agol, E.; Aihara, H.; Allende Prieto, C.; Anderson, S.F.; Arns, J.A.; Aubourg, É.; Bailey, S.; Balbinot, E.; et al. SDSS-III: Massive Spectroscopic Surveys of the Distant Universe, the Milky Way, and Extra-Solar Planetary Systems. Astron. J. 2011, 142, 72.
- Hassan, S.; Davé, R.; Mitra, S.; Finlator, K.; Ciardi, B.; Santos, M.G. Constraining the Contribution of Active Galactic Nuclei to Reionization. Mon. Not. R. Astron. Soc. 2017, 473, 227–240.
- Weymann, R.J.; Carswell, R.F.; Smith, M.G. Absorption Lines in the Spectra of Quasistellar Objects. Annu. Rev. Astron. Astrophys. 1981, 19, 41–76.
- Jin, X.; Zhang, Y.; Zhang, J.; Zhao, Y.-H.; Wu, X.-B.; Fan, D. Efficient Selection of Quasar Candidates Based on Optical and Infrared Photometric Data Using Machine Learning. Mon. Not. R. Astron. Soc. 2019, 485, 4539–4549.
- Lovelace, R.V.E.; Li, H.; Colgate, S.A.; Nelson, A. Rossby Wave Instability of Keplerian Accretion Disks. Astrophys. J. 1999, 513, 805–810.
- Shakura, N.I.; Sunyaev, R.A. Black Holes in Binary Systems: Observational Appearances. Symp. Int. Astron. Union 1973, 55, 155–164.
- Morris, S.L.; Weymann, R.J.; Savage, B.D.; Gilliland, R.L. First Results from the Goddard High-Resolution Spectrograph—The Galactic Halo and the Ly-Alpha Forest at Low Redshift in 3C 273. Astrophys. J. 1991, 377, L21–L24.
- Turnshek, D.A.; Grillmair, C.J.; Foltz, C.B.; Weymann, R.J. QSOs with PHL 5200-like Broad Absorption Line Profiles. Astrophys. J. 1988, 325, 651.
- Francis, P.J.; Hewett, P.C.; Foltz, C.B.; Chaffee, F.H.; Weymann, R.J.; Morris, S.L. A High Signal-To-Noise Ratio Composite Quasar Spectrum. Astrophys. J. 1991, 373, 465.
- Kao, W.-B.; Zhang, Y.; Wu, X.-B. Efficient Identification of Broad Absorption Line Quasars Using Dimensionality Reduction and Machine Learning. Publ. Astron. Soc. Jpn. 2024, 76, 653–665.
- Lyke, B.W.; Higley, A.N.; McLane, J.N.; Schurhammer, D.P.; Myers, A.D.; Ross, A.J.; Dawson, K.; Chabanier, S.; Martini, P.; Busca, N.G.; et al. The Sloan Digital Sky Survey Quasar Catalog: Sixteenth Data Release. Astrophys. J. Suppl. Ser. 2020, 250, 8.
- Pâris, I.; Petitjean, P.; Aubourg, É.; Myers, A.D.; Streblyanska, A.; Lyke, B.W.; Anderson, S.F.; Armengaud, É.; Bautista, J.; Blanton, M.R.; et al. The Sloan Digital Sky Survey Quasar Catalog: Fourteenth Data Release. Astron. Astrophys. 2018, 613, A51.
- Ahumada, R.; Prieto, C.A.; Almeida, A.; Anders, F.; Anderson, S.F.; Andrews, B.H.; Anguiano, B.; Arcodia, R.; Armengaud, E.; Aubert, M.; et al. The 16th Data Release of the Sloan Digital Sky Surveys: First Release from the APOGEE-2 Southern Survey and Full Release of eBOSS Spectra. Astrophys. J. Suppl. Ser. 2020, 249, 3.
- Bolton, A.S.; Schlegel, D.J.; Aubourg, É.; Bailey, S.; Bhardwaj, V.; Brownstein, J.R.; Burles, S.; Chen, Y.-M.; Dawson, K.; Eisenstein, D.J.; et al. Spectral Classification and Redshift Measurement for the SDSS-III Baryon Oscillation Spectroscopic Survey. Astron. J. 2012, 144, 144.
- Sorzano, C.O.S.; Vargas, J.; Montano, A.P. A Survey of Dimensionality Reduction Techniques. arXiv 2014, arXiv:1403.2877.
- Van Der Maaten, L.; Postma, E.; Van den Herik, J. Dimensionality reduction: A comparative. J. Mach. Learn. Res. 2009, 10, 13.
- Hasan, B.M.S.; Abdulazeez, A.M. A Review of Principal Component Analysis Algorithm for Dimensionality Reduction. J. Soft Comput. Data Min. 2021, 2, 20–30.
- Bailey, S. Principal Component Analysis with Noisy And/or Missing Data. Publ. Astron. Soc. Pac. 2012, 124, 1015–1023.
- Guo, Z.; Martini, P. Classification of Broad Absorption Line Quasars with a Convolutional Neural Network. Astrophys. J. 2019, 879, 72.
- Ferreras, I.; Rogers, B.; Lahav, O. Principal Component Analysis as a Tool to Explore Star Formation Histories. arXiv 2006, arXiv:astro-ph/0611456.
- Fathivavsari, H. Deep Learning Prediction of the Broad Lyα Emission Line of Quasars. Astrophys. J. 2020, 898, 114.
- Traven, G.; Matijevic, G.; Zwitter, T.; Žerjal, M.; Kos, J.; Asplund, M.; Bland-Hawthorn, J.; Casey, A.R.; De Silva, G.; Freeman, K.; et al. The GALAH Survey: Classification and Diagnostics with t-SNE Reduction of Spectral Information. Astrophys. J. Suppl. Ser. 2017, 228, 24.
- Verma, M.; Matijevič, G.; Denker, C.; Diercke, A.; Dineva, E.; Balthasar, H.; Kamlah, R.; Kontogiannis, I.; Kuckein, C.; Pal, P.S. Classification of High-Resolution Solar Hα Spectra Using T-Distributed Stochastic Neighbor Embedding. Astrophys. J. 2021, 907, 54.
- Roweis, S.T. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science 2000, 290, 2323–2326.
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions. J. Big Data 2021, 8, 53.
- Becker, B.; Vaccari, M.; Prescott, M.; Grobler, T. CNN Architecture Comparison for Radio Galaxy Classification. Mon. Not. R. Astron. Soc. 2021, 503, 1828–1846.
- Zhu, X.-P.; Dai, J.-M.; Bian, C.-J.; Chen, Y.; Chen, S.; Hu, C. Galaxy Morphology Classification with Deep Convolutional Neural Networks. Astrophys. Space Sci. 2019, 364, 55.
- Liu, W.; Zhu, M.; Dai, C.; He, D.; Yao, J.; Tian, H.; Wang, B.Y.; Wu, K.; Zhan, Y.; Chen, B.; et al. Classification of Large-Scale Stellar Spectra Based on Deep Convolutional Neural Network. Mon. Not. R. Astron. Soc. 2018, 483, 4774–4783.
- He, Z.; Qiu, B.; Luo, A.-L.; Shi, J.; Kong, X.; Jiang, X. Deep Learning Applications Based on SDSS Photometric Data: Detection and Classification of Sources. Mon. Not. R. Astron. Soc. 2021, 508, 2039–2052.
- Xu, L.; Xie, J.; Cai, F.; Wu, J. Spectral Classification Based on Deep Learning Algorithms. Electronics 2021, 10, 1892.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).