Abstract
Astronomical spectra, which encode rich astrophysical and chemical information, are fundamental to understanding celestial objects and universal laws. The advent of large-scale spectroscopic surveys, generating tens of millions of spectra, presents significant challenges for efficient data processing and analysis. To address these challenges, we develop an AI-powered platform (named “SpecZoo”) for spectral visualization and analysis. This platform integrates modern information technology and machine learning to lower the barrier to spectral data utilization and enhance research efficiency. Its core functionalities include interactive visualization, automated spectral classification, physical parameter measurement, spectral annotation, and multi-band/multi-modal data fusion, all supported by flexible user and data management systems. It has become an essential tool for the National Astronomical Data Center, directly supporting spectral data processing and research for major projects including LAMOST, SDSS, DESI, and so on. Furthermore, the platform demonstrates strong potential for science-education integration, providing a novel resource for cultivating talent in astronomy and data science.
1. Introduction
Spectroscopic observations provide a fundamental means of probing the physical conditions of celestial objects. The detailed shapes and strengths of spectral features allow precise measurements of chemical abundances, effective temperatures, surface gravities, stellar ages, kinematics, and so on. These parameters are crucial for understanding the formation and evolution of stars and stellar populations, constraining the structure of the Milky Way, and tracing the assembly histories of galaxies across cosmic time.
Modern large-scale spectroscopic surveys have transformed observational astrophysics by delivering unprecedented volumes of high-quality spectral data. Examples include the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST; [1]), the Sloan Digital Sky Survey (SDSS; [2]), the Dark Energy Spectroscopic Instrument (DESI; [3]) and the Global Astrometric Interferometer for Astrophysics (Gaia; [4]). SDSS, through successive generations and most recently SDSS-V (2020–2025), is obtaining optical and near-infrared spectra for more than six million objects, enabling detailed investigations of Galactic structure, stellar populations, exoplanets, compact objects, and even the Universe. Complementing these, Gaia provides astrometric, photometric, and spectroscopic data for over 1.8 billion sources, delivering precise radial velocities and stellar parameters that are essential for studying the kinematics, chemistry, and formation history of the Milky Way. LAMOST, China’s first major national spectroscopic facility, combines a large aperture with a wide field of view and has released over twenty million spectra, providing an unparalleled resource for Milky Way archaeology and stellar parameter inference. DESI, operational since 2021, deploys 5000 robotically positioned fibers and collects over 100,000 spectra per night; its first-year data yielded a three-dimensional map containing 18.7 million galaxies, quasars, and stars, offering a transformative dataset for cosmology. Despite these advances, the vast and complex data produced by modern surveys pose new challenges for storage, visualization, and analysis.
To date, there remains no comprehensive international platform for spectroscopic data visualization, analysis, and management that simultaneously meets the needs of astronomical research, education, and public outreach. Many early-generation software packages, such as VOSpec [5], SpecView [6], SPLAT [7], and CASSIS [8], were developed on Java-based architectures. These tools suffer from cumbersome deployment, strong dependence on local computing environments, and limited functionality; moreover, they have not evolved into integrated research–education platforms, which has limited their adoption in modern astronomical workflows. SkyPortal [9] excels in managing time-domain targets and collaborative workflows, but as a general-purpose observation management framework its core design is not optimized for the deep interactive analysis of spectroscopic data or research-grade parameter extraction. It lacks built-in, advanced AI-powered capabilities dedicated to spectral features, such as automated classification, parameter estimation, and anomaly detection. Furthermore, its collaboration model is generally confined to task assignment and discussion within project teams and does not establish an open ecosystem for spectroscopic analysis that seamlessly integrates professional research with public participation.
In comparison to general-purpose citizen science platforms such as Zooniverse [10], existing spectral tools lack sufficient depth in specialization and interactive analysis. While Zooniverse has successfully enabled multi-project management and public participation, its design is not optimized for complex spectral data analysis and research-level parameter measurement. Our developed platform, SpecZoo, aims to bridge this gap by providing a specialized web-based platform focused on the domain of spectroscopy, integrating artificial intelligence with collaborative functionalities.
In recent years, machine learning methods have demonstrated significant advantages in astrophysical spectral classification, parameter estimation, and the discovery of rare celestial objects, leading to the proliferation of algorithms and models based on techniques such as deep learning and transfer learning. However, these studies have largely focused on the algorithms themselves and have not been adequately integrated into user-friendly, open, and collaborative analysis platforms, resulting in barriers to their application within practical scientific workflows.
To address these challenges, we develop an AI-powered, web-based platform (named “SpecZoo”) for spectral data visualization and analysis. Built on a modern front-end/back-end architecture, the platform provides full functionality through a standard web browser, eliminating technical barriers and simplifying access. SpecZoo integrates a suite of AI algorithms for automated spectral classification, parameter estimation, redshift measurement, and anomaly detection and supports advanced capabilities such as multi-modal data fusion, interactive visualization, and collaborative annotation workflows. In scientific applications, SpecZoo enables targeted object searches, sample construction, and efficient validation of rare or peculiar celestial objects. For educational use, the platform facilitates research-oriented teaching, allowing students to explore authentic spectral datasets and perform detailed data annotation.
As the foundational article of the SpecZoo series, this work describes the platform’s architectural design, core functionalities, and operational capabilities. Building upon this foundation, subsequent studies will leverage machine-learning techniques and the SpecZoo platform to systematically search for or validate rare celestial objects. These research activities will be integrated into research-oriented teaching, with the ultimate goal of establishing SpecZoo as a comprehensive spectral “zoo” for both scientific and educational communities.
The remainder of this paper is organized as follows: Section 2 describes the design philosophy and modular architecture of SpecZoo, outlining its foundational pillars of scientific empowerment, educational innovation, and sustainable data ecosystem development; Section 3 details the platform’s core functionalities and technological advancements, covering basic role and data management, spectral template libraries, and advanced AI-powered features; Section 4 illustrates several scientific use cases; and Section 5 discusses the platform’s educational applications, focusing on its integration into structured spectral identification training and research-oriented teaching modules that cultivate practical data analysis skills. Finally, we give the conclusion in Section 6.
2. Design of SpecZoo
SpecZoo is an AI-powered spectral platform that enhances previous system architectures [11] by integrating large-scale spectral data analysis with research and educational applications. In this section, we describe the platform’s design philosophy, modular architecture, and technical details, highlighting how SpecZoo addresses challenges in managing, visualizing, and analyzing massive spectroscopic datasets.
2.1. Design Philosophy
SpecZoo is built on three foundational pillars: scientific research empowerment, educational innovation, and sustainable data ecosystem development. These principles guide both the architectural and functional design of the platform and, as illustrated in Figure 1, together support a sustainable research-teaching ecosystem. The scientific research pillar enables users to extract novel insights from each observational dataset. The sustainable data pillar establishes a robust infrastructure for efficient data management, sharing, and long-term preservation. Building on these foundations, the educational pillar facilitates research-oriented teaching by enabling students to work directly with authentic spectral data, thereby cultivating essential analytical skills.
Figure 1.
SpecZoo design philosophy. The platform integrates AI-powered analysis, educational innovation, and sustainable data management to address the challenges posed by large-scale spectral data.
2.2. Modular Architecture of SpecZoo
SpecZoo adopts a modular, front-end/back-end separation architecture, consisting of four core layers: the User Roles Layer, the Visualization and Label Layer, the Data Node Layer, and the AI Layer, as shown in Figure 2.
Figure 2.
Architectural design of SpecZoo. The platform consists of four layers: User Roles, Visualization and Label, Data Node, and AI, enabling interactive analysis, data management, and AI-powered processing.
The User Roles Layer manages user authentication, permissions, task assignments, annotation review, and statistical reporting. Single sign-on via the OAuth protocol [12] allows registered users of the National Astronomical Data Center (NADC) to access the platform without additional registration steps.
The Visualization and Label Layer provides interactive online spectral visualization and analysis. It supports spectral map display, spectral line annotation, multi-band data fusion, AI-assisted classification recommendations, template matching, and redshift or velocity measurements. To enhance data quality, the layer incorporates wavelet-based noise reduction and a tool for removal of 3σ sky lines. Visualization interfaces are implemented with ECharts [13], enabling multi-terminal interaction.
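The 3σ removal step can be illustrated with a minimal sketch. The text does not specify the algorithm used by SpecZoo, so the approach below (a running median plus a robust scatter estimate) and the function name `clip_outliers` are assumptions chosen for illustration only; a production sky-line tool would likely also restrict the search to known sky-line wavelengths.

```python
from statistics import median

def clip_outliers(flux, window=5, nsigma=3.0):
    """Replace pixels that deviate by more than nsigma from a running median.

    Returns (cleaned_flux, flagged_indices). Hypothetical helper, not SpecZoo's API.
    """
    half = window // 2
    # Running median as a crude estimate of the local continuum.
    smooth = [median(flux[max(0, i - half): i + half + 1]) for i in range(len(flux))]
    resid = [f - s for f, s in zip(flux, smooth)]
    # Robust scatter: 1.4826 * MAD approximates the Gaussian standard deviation.
    centre = median(resid)
    sigma = 1.4826 * median([abs(r - centre) for r in resid])
    if sigma == 0:
        return list(flux), []  # no measurable scatter, nothing to clip
    flagged = [i for i, r in enumerate(resid) if abs(r - centre) > nsigma * sigma]
    cleaned = list(flux)
    for i in flagged:
        cleaned[i] = smooth[i]  # fill the flagged pixel with the local median
    return cleaned, flagged

# Synthetic spectrum with one spike at index 6 (values invented for the demo).
flux = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 9.0, 1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
cleaned, flagged = clip_outliers(flux)
```

Using the median absolute deviation rather than the plain standard deviation keeps the spike itself from inflating the threshold, which is why the single outlier is still caught in such a short array.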
The Data Node Layer handles storage and access control for user-submitted spectral data and star catalogs. Data visibility is managed through three modes: PublicDB (public database), GroupDB (group-shared database), and MyDB (user-private database). Large files are uploaded using segmentation and breakpoint-resume mechanisms, ensuring stability and efficiency.
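The segmentation and breakpoint-resume idea can be sketched as follows. This is an illustrative model only: the `FakeServer` class, the `upload` function, the chunk size, and the use of SHA-256 checksums are all assumptions, since the text does not describe the actual upload protocol.

```python
import hashlib

class FakeServer:
    """In-memory stand-in for an upload endpoint (hypothetical)."""

    def __init__(self):
        self.chunks = {}  # chunk index -> payload already stored

    def receive(self, index, payload, digest):
        # Reject a chunk whose checksum does not match what the client sent.
        if hashlib.sha256(payload).hexdigest() != digest:
            raise ValueError("checksum mismatch")
        self.chunks[index] = payload

    def assemble(self, total):
        return b"".join(self.chunks[i] for i in range(total))

def upload(data, server, chunk_size=4, fail_at=None):
    """Send the chunks the server does not hold yet; return the chunk count."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, payload in enumerate(chunks):
        if index in server.chunks:
            continue  # already stored, skipped when resuming
        if index == fail_at:
            raise ConnectionError("simulated interruption")
        server.receive(index, payload, hashlib.sha256(payload).hexdigest())
    return len(chunks)

server = FakeServer()
data = b"0123456789abcdef!"
try:
    upload(data, server, fail_at=2)  # first attempt breaks at the third chunk
except ConnectionError:
    pass
stored_before_resume = sorted(server.chunks)
total = upload(data, server)  # second attempt sends only the missing chunks
```

Because the server records which chunk indices it already holds, a retry transfers only the remainder, which is the property that makes large uploads tolerant of unstable connections.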
The AI Layer integrates machine learning methods for tasks including spectral classification, stellar feature extraction, calculation of stellar atmospheric parameters, and redshift measurements for galaxies and quasars. Computationally intensive operations run on a dedicated server, with results transmitted to the front-end for visualization and further analysis.
The front-end is implemented using the Vue.js framework [14], chosen for its interactivity and maintainability. The back-end employs Spring Boot [15], simplifying configuration management and dependency handling. Data persistence is handled by MyBatis [16], which maps Java objects to MySQL database records. In its technical design, SpecZoo emphasizes stable data acquisition and management, visualization, long-term maintainability, and scalability. Vue.js was selected for its low learning curve and well-structured component architecture, facilitating efficient development of complex data interfaces. Spring Boot streamlines back-end implementation with its convention-over-configuration approach and robust ecosystem. MySQL provides a reliable and cost-efficient solution for structured metadata storage. MyBatis is employed to maintain explicit SQL control, which is crucial for performance-sensitive astronomical queries and complex data operations. Collectively, these established technologies constitute a sustainable and extensible stack that aligns with the collaborative and evolving requirements of scientific research. This architecture is designed for high performance, scalability, and system stability. SpecZoo is currently operational at the NADC.
3. Implementation of SpecZoo
3.1. Basic Functions
Role Management. The Workgroup function serves as the cornerstone for collaborative work within SpecZoo. SpecZoo employs a hierarchical user-role system with four levels: administrators, group leaders, group managers, and regular users. Group leaders form teams and define spectroscopic tasks, group managers assist with task assignment and monitoring, and regular users have the flexibility to join multiple workgroups, where they can share data and engage in cooperation, thereby fostering a dynamic scientific community. This structure ensures organized collaboration and efficient task management. Specific permissions are as shown in Figure 3.
Figure 3.
Specific permissions of each user role.
Spectral Templates and Characteristic Lines. SpecZoo provides predefined spectral templates for stars, galaxies, quasars, and other common astronomical objects, alongside a library of characteristic spectral lines. As an example, a quasar template is illustrated in Figure 4. SpecZoo currently hosts a collection of 334 standard templates. These resources can be flexibly updated by administrators via the backend and are readily accessible to users, who may actively employ them for visual searches of specific or rare celestial objects. They serve to standardize spectral classification and line identification, thereby improving analytical consistency and lowering the entry barrier for less experienced users.
Figure 4.
Spectral template of a quasar from the Sloan Digital Sky Survey (SDSS).
Catalogue Data Management and Visualization. Users can upload catalogue data from LAMOST, SDSS, DESI, APOGEE [17] and other sources in the standard CSV format. Uploaded data are automatically validated and parsed, enabling browsing, spectral analysis, and dataset comparison. SpecZoo automatically recognizes the unique identifiers of major sky surveys (e.g., OBSID for LAMOST; refer to the official survey manuals for identifier specifications of different projects) and downloads the corresponding spectral files via their URLs. Furthermore, the platform supports an Extralinks feature: if the uploaded CSV file contains RA and DEC coordinate fields, the system automatically reads them, allowing users to click on a survey identifier to directly navigate to its dedicated data query interface. Advanced visualization features include spectral line annotation, local feature zoom, and wavelet-based noise reduction.
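The validation and parsing step for an uploaded catalogue can be sketched as below. This is a minimal illustration under stated assumptions: the function name `inspect_catalogue`, the returned dictionary layout, and the coordinate range checks are hypothetical, and only the LAMOST `OBSID` identifier is taken from the text.

```python
import csv
import io

# Only OBSID (LAMOST) is named in the text; further survey keys would be added here.
ID_COLUMNS = {"obsid": "LAMOST"}

def inspect_catalogue(text):
    """Validate an uploaded CSV and report identifier and coordinate columns."""
    reader = csv.DictReader(io.StringIO(text))
    # Map lower-cased header names back to their original spelling.
    columns = {name.strip().lower(): name for name in (reader.fieldnames or [])}
    id_key = next((key for key in ID_COLUMNS if key in columns), None)
    has_coords = "ra" in columns and "dec" in columns
    valid, rejected = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            if has_coords:
                ra = float(row[columns["ra"]])
                dec = float(row[columns["dec"]])
                if not (0.0 <= ra < 360.0 and -90.0 <= dec <= 90.0):
                    raise ValueError("coordinates out of range")
            valid.append(row)
        except (TypeError, ValueError):
            rejected.append(line_no)  # remember which input line failed
    return {
        "survey": ID_COLUMNS.get(id_key),
        "has_coords": has_coords,
        "valid": valid,
        "rejected_lines": rejected,
    }

# Invented rows for the demo; the identifiers are not real LAMOST entries.
sample = "OBSID,RA,DEC\n101,150.25,2.5\n102,400.0,10.0\n103,abc,5\n"
report = inspect_catalogue(sample)
```

Reporting rejected line numbers, rather than failing the whole upload, lets a user correct a few bad rows without re-preparing the entire catalogue.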
Spectral Task Configuration and Annotation. Within the same workgroup, group leaders can assign research or educational tasks, and users can label the physical parameters of selected spectral samples. Typical parameters include redshift, emission line width, and continuum flux for quasars; redshift, mean age, metallicity, and total mass for galaxies; and radial velocity, surface gravity, and metallicity for stars. Figure 5 presents the task release interface, which allows multiple annotators to inspect the same spectrum independently. Parameters can be dynamically adjusted, allowing tasks to be tailored to celestial objects while maintaining annotation rigor. SpecZoo supports multiple rounds of annotation by different users on the same spectrum, as well as the concurrent assignment of multiple spectra to a single annotator.
Figure 5.
Interface for spectral recognition task labeling (Asterisk indicates required field), enabling users to annotate physical parameters of selected spectral samples.
3.2. Advanced AI-Powered Features
SpecZoo integrates multiple machine learning and deep learning frameworks to support automated spectral classification and parameter estimation.
AI-powered Spectral Classification. MSPC-Net [18] is a convolutional neural network designed for automated spectral classification. It employs multi-scale feature extraction to capture both fine-grained local features and overall spectral structure, partial convolution to reduce overfitting by operating on channel subsets, and grouped convolution to enhance computational efficiency. This combination allows MSPC-Net to robustly classify large spectral datasets with consistently high accuracy (e.g., 87–91% on stellar classification and up to 85% on subclass tasks) and strong generalization, handling diverse object types including stars, galaxies, and quasars.
Stellar Atmospheric Parameter Prediction. SLAM (Stellar Label Machine) [19] leverages Support Vector Regression to extract fundamental stellar parameters from spectral surveys. It dynamically adapts model complexity to different spectral types and signal-to-noise levels, making it suitable for both large-sample surveys and small-sample, resource-constrained studies. On the LAMOST DR5 dataset, SLAM achieves accuracies of 50 K for Teff, 0.09 dex for log g, and 0.07 dex for [Fe/H] on high S/N spectra. The method has been successfully applied to combined LAMOST-APOGEE data, demonstrating scalability and reliability across heterogeneous datasets.
Integration with Large Language Models. SpecCLIP [20], developed jointly by Zhejiang Lab and NAOC, combines natural language processing techniques with spectral analysis, enabling large-scale feature extraction and automated parameter estimation. SpecCLIP can process complex spectral patterns and low signal-to-noise data, complementing SLAM in a hybrid framework. This integration supports end-to-end analyses, anomaly detection, and rapid processing of extensive survey datasets. SpecCLIP achieves accuracies of 132.669 K for Teff, 0.079 dex for log g, and 0.056 dex for [Fe/H] on LAMOST low-resolution spectra.
GaSNet-III Spectral Analysis. GaSNet-III [21] is a generative deep learning model for spectral reconstruction, redshift estimation, and anomaly detection. Its architecture combines an autoencoder variant for interpretable feature extraction (analogous to PCA) with a U-Net structure for noise reduction, achieving high-fidelity reconstruction in logarithmic wavelength space. The model explores the space efficiently to identify global minima across object types and redshifts. For stellar spectra, classification accuracy exceeds 99.9%, with overall accuracy above 98%. Compared with traditional template-matching methods, GaSNet-III significantly improves computational efficiency and scalability for upcoming large surveys such as 4MOST [22].
The output parameters of the AI-powered functionalities in SpecZoo are specified in Table 1. The symbol represents the error rate. For the specific calculation method, please refer to the detailed literature on the three AI functions [19,20].
Table 1.
Physical and chemical parameters derived by SpecZoo. The table lists key stellar properties and asteroseismic quantities output by the platform.
3.3. An Example for Displaying the Main Interface and Validation of the AI Functionalities
Based on the LAMOST survey for target selection, Li et al. [23] performed a homogeneous elemental abundance analysis of 385 very metal-poor stars using high-resolution spectra obtained with the Subaru Telescope [24] and incorporating Gaia parallax data, resulting in a target catalog. Here, we select a source (LAMOST obsid: 203105173) to showcase the interface and functionalities of SpecZoo, as shown in Figure 6. The core interface of SpecZoo is organized into three main tabs. The Visualization tab provides functions for spectral display, denoising, and template fitting; it also features four dedicated AI buttons (as detailed in Section 3.2) that enable automated classification, parameter estimation, and feature extraction, alongside other fundamental tools for interactive spectral inspection. The Data Browser tab presents the dataset from uploaded CSV files in tabular form and incorporates filtering and bookmarking capabilities to facilitate rapid data retrieval; selected entries can be directly visualized in the corresponding spectral interface. The Superposition tab is dedicated to spectral superimposition, enabling comparative analysis of multiple spectra from repeated observations of a single source as well as spectral comparisons across similar astrophysical objects.
Figure 6.
Core interface of SpecZoo illustrated with a metal-poor star. The interface provides three main tabs: ‘Data Browser’ for navigating through all rows in the dataset; ‘Visualization’ for detailed spectral analysis; and ‘Superposition’ for comparing multiple spectra simultaneously. The four dedicated ‘AI buttons’ enable the execution of distinct AI-powered functionalities, facilitating automated classification, parameter estimation, and feature extraction.
Table 2 lists the stellar parameter values obtained using the AI functionalities, which are close to the values derived from the high-resolution spectra by Li et al. [23].
Table 2.
A metal-poor star (LAMOST obsid: 203105173) used as an example to demonstrate three algorithms. MSPC-Net classifies it as an F0 star, while SLAM and SpecCLIP measure its atmospheric parameters.
4. Use Cases for Science
SpecZoo facilitates both independent and collaborative scientific investigations, particularly in the identification of rare objects and construction of statistically significant samples. The platform efficiently integrates multi-band spectral data, enabling rapid analysis and visualization for large datasets. For example, it has been demonstrated that the average time for graduate students to visually inspect a single spectrum can be reduced by ∼30% (based on the team’s calculation of average screening time for special objects such as quasars and carbon stars using SpecZoo), allowing researchers to focus more on scientific interpretation and parameter extraction.
4.1. Identification of Strong Gravitational Lens Candidates
Strong gravitational lensing, a direct prediction of general relativity, occurs when a massive foreground object, such as a galaxy or black hole, bends the light from a more distant source, producing magnification, distortion, or multiple images. While most galaxy-scale lenses are traditionally identified through high-resolution imaging, automated detection using spectroscopic data remains a developing research frontier.
SpecZoo facilitates the efficient identification of strong gravitational lens candidates in large spectroscopic surveys by integrating spectral analysis, interactive visualization, and cross-survey data access. Within the platform, researchers can examine spectra to detect multiple redshift components and identify characteristic emission lines from background galaxies (e.g., [O iii], Lyα). Its cross-survey linkage module enables rapid retrieval of high-resolution imaging data (e.g., Hubble Space Telescope (HST) deep-field observations [25]), allowing confirmation of lensing morphology such as Einstein rings, arcs, or multiple images.
As an illustrative example of manual visual screening strategies, [26] proposed a four-step workflow for large spectroscopic surveys: (1) assigning lens-candidate probabilities to all spectra; (2) selecting high-probability candidates above a predefined threshold; (3) estimating redshifts for both foreground and background objects, requiring the background redshift to exceed the foreground by at least ; and (4) conducting manual visual inspection to verify candidate authenticity. While SpecZoo does not directly implement this specific framework, its combination of interactive spectral tools and candidate management capabilities supports analogous workflows, enabling researchers to efficiently perform manual validation of strong lens candidates. Specifically, SpecZoo facilitates rapid processing during the visual inspection phase; after uploading a catalog of candidates, researchers can leverage its interactive visualization tools to compare multiple spectra and annotations simultaneously, improving inspection efficiency several-fold.
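Steps (2) and (3) of this workflow amount to a simple filter that prepares the list for manual inspection. The sketch below is illustrative only: the `Candidate` record, the function `select_for_inspection`, and the numerical thresholds in the demo are hypothetical, and the minimum redshift separation is deliberately left as a required argument rather than given a default value.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    spec_id: str        # hypothetical spectrum identifier
    probability: float  # step (1): lens probability from an external classifier
    z_fore: float       # step (3): foreground redshift estimate
    z_back: float       # step (3): background redshift estimate

def select_for_inspection(candidates, p_min, min_dz):
    """Keep candidates above the probability threshold whose background
    redshift exceeds the foreground redshift by at least min_dz."""
    kept = [
        c for c in candidates
        if c.probability >= p_min and (c.z_back - c.z_fore) >= min_dz
    ]
    # Highest-probability candidates first, ready for step (4): visual inspection.
    return sorted(kept, key=lambda c: c.probability, reverse=True)

# Invented values for the demo.
pool = [
    Candidate("A", 0.95, 0.30, 0.90),
    Candidate("B", 0.40, 0.20, 0.80),  # fails the probability threshold
    Candidate("C", 0.99, 0.50, 0.52),  # redshifts too close together
    Candidate("D", 0.97, 0.10, 0.60),
]
queue = select_for_inspection(pool, p_min=0.9, min_dz=0.1)
```

The resulting queue could then be exported as a catalogue and uploaded for the visual inspection phase.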
Figure 7 shows a representative strong gravitational lensing system identified using SpecZoo, accompanied by high-resolution HST imaging (insets). The foreground galaxy is a quiescent red galaxy exhibiting prominent absorption features, including K, H, Mg I, Na I, and H, with a measured redshift of . Superimposed on this spectrum are emission lines from a background galaxy (Figure 8) at , confirming the coexistence of foreground and background components and demonstrating the characteristic double-redshift signature of a strong lensing system.
Figure 7.
Spectra (cyan curve) and HST images (insets) of a source exhibiting strong gravitational lensing. The foreground galaxy is an old red galaxy, with absorption lines of K, H, Ge, Mg I, Na I, and H, corresponding to a redshift of .
Figure 8.
Emission lines from the background galaxy overlay the spectrum of the strongly lensed source (red curve). The inset shows a magnified view around 8000 Å, highlighting the emission features corresponding to a redshift of .
This example illustrates how SpecZoo facilitates both the identification and preliminary validation of strong lens candidates. The platform supports: (1) interactive detection of multiple redshift components within a single spectrum, (2) integration with high-resolution imaging to verify morphological features such as arcs or Einstein rings, and (3) efficient candidate management that combines automated screening with manual verification.
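The double-redshift signature in point (1) follows from z = λ_obs / λ_rest − 1 applied to each identified line: lines from the same object share one redshift, so two distinct groups indicate two objects along the line of sight. The sketch below is a minimal illustration; the function names, the grouping tolerance, and the redshifts used in the demo are assumptions, and the rest wavelengths are standard air values in Å.

```python
# Rest wavelengths in Å (air).
REST = {"CaII_K": 3933.66, "CaII_H": 3968.47, "Hbeta": 4861.33, "OIII": 5006.84}

def redshift(observed, rest):
    """Redshift implied by one line: z = lambda_obs / lambda_rest - 1."""
    return observed / rest - 1.0

def group_redshifts(measurements, tol=0.002):
    """Group (line_name, observed_wavelength) pairs whose redshifts agree.

    Returns a list of (mean_redshift, [line names]) sorted by redshift.
    """
    zs = sorted((redshift(w, REST[name]), name) for name, w in measurements)
    groups = []
    for z, name in zs:
        # Extend the current group if this line lies within tol of the previous one.
        if groups and z - groups[-1][-1][0] <= tol:
            groups[-1].append((z, name))
        else:
            groups.append([(z, name)])
    return [(sum(z for z, _ in g) / len(g), [n for _, n in g]) for g in groups]

# Synthetic line positions built from two invented redshifts, 0.2 and 0.6.
measured = [
    ("CaII_K", 3933.66 * 1.2),
    ("OIII", 5006.84 * 1.6),
    ("CaII_H", 3968.47 * 1.2),
    ("Hbeta", 4861.33 * 1.6),
]
systems = group_redshifts(measured)
```

Here the absorption lines cluster at the lower redshift and the emission lines at the higher one, mirroring the foreground and background components described for the lens system.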
Our team is developing a machine-learning pipeline on the SpecZoo platform to systematically identify strong gravitational-lens candidates within DESI spectroscopic data, primarily by detecting multiple, spatially unresolved redshift components within single-fiber spectra (Li et al., in preparation). This spectroscopic search provides a strategic precursor catalog for next-generation imaging surveys, most notably the China Space Station Telescope (CSST) [27]. The synergy between DESI spectroscopy and CSST high-resolution imaging will enable definitive confirmation and the construction of a robust, multi-wavelength lens sample, exemplifying a forward-looking approach to maximizing the scientific return of complementary survey technologies.
4.2. White Dwarf–Main Sequence Binary Systems
White dwarf–main-sequence (WDMS) binaries are a common class of compact binaries in the Milky Way, each consisting of a white dwarf (WD) and a low-mass main-sequence (MS) companion. These systems typically form from main-sequence binary evolution. In wide binaries, the two components evolve largely independently with little mass transfer. In close binaries, the more massive star may expand during its red giant or asymptotic giant branch phase, engulfing its companion and initiating a common-envelope (CE) phase. Tides within the envelope reduce the orbital separation and remove energy and angular momentum, eventually leading to envelope ejection and leaving behind a post-common-envelope binary (PCEB). Owing to their abundance and relative ease of identification, WDMS binaries serve as key testbeds for studying common-envelope evolution [28].
SpecZoo facilitates the identification and analysis of WDMS systems by combining interactive spectral analysis, template fitting, and cross-survey data integration. The platform’s two-template fitting module allows decomposition of composite spectra, assigning separate spectral templates to distinct wavelength ranges. This enables disentangling of overlapping WD and main-sequence features, even when the angular separation is too small for fiber spectroscopy to resolve the components individually. Cross-matched astrometric data, including Gaia parallaxes and proper motions, can be retrieved within SpecZoo to verify spatial and kinematic consistency, providing an additional layer of confirmation for physically bound systems. Figure 9 illustrates a representative WDMS system identified using SpecZoo. The figure shows the SDSS optical image together with the composite LAMOST spectrum (cyan curve). The WD component is matched to the blue portion of the spectrum (purple curve), while the M-type main-sequence companion is matched to the red portion (blue curve), demonstrating effective spectral disentanglement. Gaia parallaxes and proper motions for both components confirm their spatial and kinematic coherence, validating the physical association.
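The idea of assigning separate templates to distinct wavelength ranges can be sketched with a piecewise least-squares scaling. The fitting method used by SpecZoo is not described in the text, so the functions `best_scale` and `two_template_fit`, the split wavelength, and the toy templates below are assumptions for illustration.

```python
def best_scale(flux, template):
    """Least-squares amplitude a minimizing sum((flux - a * template)^2)."""
    numerator = sum(f * t for f, t in zip(flux, template))
    denominator = sum(t * t for t in template)
    return numerator / denominator

def two_template_fit(wave, flux, blue_template, red_template, split):
    """Scale one template blueward of `split` and another redward of it.

    All arrays share the same wavelength grid. Returns (a_blue, a_red, model).
    """
    blue = [i for i, w in enumerate(wave) if w < split]
    red = [i for i, w in enumerate(wave) if w >= split]
    a_blue = best_scale([flux[i] for i in blue], [blue_template[i] for i in blue])
    a_red = best_scale([flux[i] for i in red], [red_template[i] for i in red])
    model = [
        a_blue * blue_template[i] if w < split else a_red * red_template[i]
        for i, w in enumerate(wave)
    ]
    return a_blue, a_red, model

# Toy grid and templates (invented): a blue-dominated and a red-dominated shape.
wave = [4000.0, 5000.0, 6000.0, 7000.0, 8000.0, 9000.0]
wd_template = [4.0, 3.0, 2.0, 1.0, 0.5, 0.25]
m_template = [0.5, 1.0, 2.0, 4.0, 6.0, 8.0]
flux = [8.0, 6.0, 4.0, 2.0, 3.0, 4.0]
a_wd, a_m, model = two_template_fit(wave, flux, wd_template, m_template, split=6500.0)
```

A fuller treatment would fit both templates jointly over the overlap region, since each component contributes some flux at all wavelengths; the piecewise version is kept here because it matches the description of range-wise template assignment.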
Figure 9.
Analysis of a WDMS binary using SpecZoo. The LAMOST composite spectrum (cyan curve) is overlaid with the SDSS optical image; the purple and blue curves represent the white dwarf and M-type main-sequence templates, respectively, fitted via SpecZoo’s two-template decomposition. Gaia parallaxes and proper motions for both components enable verification of their physical association.
This approach can be readily applied to other types of binary systems. By leveraging SpecZoo’s capabilities for multimodal data integration and intelligent spectral analysis, researchers can efficiently identify and assemble large samples of binaries, facilitating statistical studies and evolutionary modeling. Systematic searches using spectra from SDSS, LAMOST, and other surveys are essential for investigating the evolution of compact binaries, particularly the physical processes governing the common envelope phase [29]. SpecZoo’s combination of spectral decomposition, template fitting, and cross-survey data access enables rapid identification, classification, and validation of rare or complex binary systems.
During the visual inspection of WDMS binaries, our team identified a peculiar cataclysmic variable system that exhibits pronounced eclipsing behavior in its light curve, confirmed by combined spectral and time-domain analysis. We are currently modeling its orbital parameters and geometric structure (Feng et al., in preparation).
4.3. Efficient Identification and Subtype Classification of Carbon Stars
Carbon stars constitute a distinct population of late-type giants (spectral types G, K, and M) whose atmospheres are enriched in carbon relative to oxygen. This enhancement produces prominent molecular absorption features, including C2, CN, and CH bands, which distinguish them from typical late-type giants of similar temperature and luminosity. Carbon stars play a critical role as tracers of stellar evolution and Galactic chemical enrichment. They are further subclassified into spectral types such as C–H, C–R, C–N, and Ba stars, reflecting variations in molecular absorption features, surface composition, and formation pathways, often linked to binary mass-transfer events [30].
Traditionally, carbon star identification relies heavily on manual spectral inspection. For instance, [31] applied a multi-stage workflow to 10,599,979 low-resolution LAMOST DR7 spectra, including (1) pre-selecting spectra with i-band signal-to-noise ratio ; (2) screening candidates using molecular line indices; (3) incorporating 2MASS infrared photometry to separate “warm” and “cold” subgroups; and (4) performing final manual verification. While effective, this approach requires extensive manual effort, yielding over 100,000 candidate spectra that demand labor-intensive inspection.
SpecZoo significantly enhances both efficiency and reliability in the identification and subtype classification of carbon stars. Candidate spectra pre-selected via molecular line indices and infrared colors can be interactively analyzed within the platform. SpecZoo enables detailed visualization of diagnostic molecular absorption features, including the C2 bands (4737 Å, 5165 Å, 5635 Å), CN bands (7065 Å, 7820 Å), and continuum characteristics (e.g., flux cutoffs at Å in C–N stars). Subtype classification is further facilitated through semi-automated identification of key spectral line features following established criteria [30]. By integrating automated pre-selection, spectral decomposition, and interactive validation, SpecZoo drastically reduces manual workload and supports the rapid assembly of large, robust carbon star samples.
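As a rough illustration of how the diagnostic bands listed above could be quantified, the sketch below measures a simple fractional depth at each quoted wavelength relative to two flanking reference windows. The window widths and placement, the function names, and the synthetic test spectrum are assumptions made for this example; they are not the index definitions of [30] or [31], nor SpecZoo's own measurement code.

```python
import math

# Band positions quoted in the text (Angstrom).
BAND_CENTERS = {
    "C2 4737": 4737.0,
    "C2 5165": 5165.0,
    "C2 5635": 5635.0,
    "CN 7065": 7065.0,
    "CN 7820": 7820.0,
}


def mean_flux(wave, flux, lo, hi):
    """Mean flux of the pixels with lo <= wavelength <= hi."""
    vals = [f for w, f in zip(wave, flux) if lo <= w <= hi]
    if not vals:
        raise ValueError(f"no pixels between {lo} and {hi}")
    return sum(vals) / len(vals)


def band_depth(wave, flux, center, half_width=20.0, gap=60.0):
    """Fractional depth 1 - F_band / F_ref, with F_ref from two side windows."""
    band = mean_flux(wave, flux, center - half_width, center + half_width)
    blue = mean_flux(wave, flux, center - gap - half_width, center - gap + half_width)
    red = mean_flux(wave, flux, center + gap - half_width, center + gap + half_width)
    return 1.0 - band / (0.5 * (blue + red))


# Synthetic spectra on a 4500-8000 Angstrom grid: flat, and flat with Gaussian dips.
wave = [4500.0 + 2.0 * i for i in range(1751)]
flat = [1.0 for _ in wave]
toy_carbon = [
    1.0 - sum(0.4 * math.exp(-0.5 * ((w - c) / 15.0) ** 2) for c in BAND_CENTERS.values())
    for w in wave
]

depths = {name: band_depth(wave, toy_carbon, c) for name, c in BAND_CENTERS.items()}
flat_depths = {name: band_depth(wave, flat, c) for name, c in BAND_CENTERS.items()}
for name, d in depths.items():
    print(f"{name}: depth = {d:.2f}")
```

Real molecular bands are asymmetric about their band heads, so a practical index would place the band and reference windows according to the shape of each band rather than symmetrically.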
Figure 10 illustrates a representative carbon star candidate identified using SpecZoo. The spectrum, obtained from LAMOST survey data, exhibits strong C2 absorption bands at 4737 Å and 5165 Å, along with pronounced s-process element lines, including Sr II 4077 Å and Ba II 4554 Å and 6496 Å, indicating a Ba-type carbon star. Interactive analysis with SpecCLIP quantifies the carbon-to-oxygen ratio, confirming a super-solar C/O abundance ratio. This example demonstrates how, following automated pre-selection by external pipelines, SpecZoo combines spectral decomposition and quantitative assessment to efficiently process, identify, and classify carbon star candidates, thereby enabling large-scale studies of their formation history and chemical enrichment patterns.
Figure 10.
The spectrum of a carbon star candidate from LAMOST reveals strong C2 molecular absorption bands at 4737 Å and 5165 Å along with strong absorption lines of Sr II (4077 Å) and Ba II (4554 Å, 6496 Å), identifying it as a Ba-type carbon star. The results returned by SpecCLIP (subplot) indicate a super-solar C/O abundance ratio.
5. Use Cases for Education
SpecZoo also serves educational purposes, supporting the training of future astronomers and astronomy teaching that progresses from the elementary to the advanced level.
5.1. Assist in Foundational Teaching
In the first-year undergraduate course Astronomical Data Processing at China West Normal University, SpecZoo was fully integrated into the module on stellar spectral classification. The platform supports a structured workflow in which students first employ the MSPC-Net AI classification module to predict the spectral types (e.g., F, G, K, and M) of unknown stars from large surveys such as LAMOST or SDSS. These preliminary AI-powered predictions provide efficient and reliable guidance for subsequent analysis. Students then perform manual validation by comparing the target spectra with the platform’s comprehensive library of spectral templates based on MKCLASS [32] standards. Overlaying the spectra with templates allows detailed inspection of the line morphology, intensity, and wavelength positions of diagnostic absorption features (e.g., Balmer series, metallic lines), facilitating a quantitative understanding of stellar physical properties such as effective temperature and surface gravity.
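The template comparison that students carry out by eye has a simple numerical analogue, sketched below: each template is rescaled to the target by a best-fit multiplicative factor, and the templates are ranked by their summed squared residuals. The toy templates (a power-law continuum with a single Balmer-like line) and the function names are assumptions for illustration only; they are not the MKCLASS templates or the platform's matching routine.

```python
import math


def scaled_chi2(flux, template):
    """Sum of squared residuals after the best-fit multiplicative scaling."""
    scale = sum(f * t for f, t in zip(flux, template)) / sum(t * t for t in template)
    return sum((f - scale * t) ** 2 for f, t in zip(flux, template))


def rank_templates(flux, templates):
    """Return template names ordered from best to worst match."""
    return sorted(templates, key=lambda name: scaled_chi2(flux, templates[name]))


wave = [4000.0 + 5.0 * i for i in range(601)]  # 4000-7000 Angstrom grid


def toy_star(slope, balmer_depth):
    """Synthetic spectrum: power-law continuum with one line near 4861 Angstrom."""
    return [
        (w / 5500.0) ** slope
        * (1.0 - balmer_depth * math.exp(-0.5 * ((w - 4861.0) / 10.0) ** 2))
        for w in wave
    ]


# Toy trend only: bluer continuum and deeper Balmer line toward earlier types.
templates = {
    "F": toy_star(-2.0, 0.50),
    "G": toy_star(-1.0, 0.35),
    "K": toy_star(0.5, 0.20),
    "M": toy_star(2.5, 0.05),
}

# An "unknown" star close to the G template, observed at an arbitrary flux scale.
target = [1.8 * f for f in toy_star(-1.1, 0.33)]
ranking = rank_templates(target, templates)
print("best match:", ranking[0])
```

A ranking of this kind can serve as a cross-check on an AI-predicted type, while the visual overlay remains the place where individual diagnostic lines are inspected.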
The integration of AI significantly accelerates the learning process. In practice, students achieved an average classification accuracy of 69% after only 16 h of training (see Figure 11), compared to approximately 40% for traditional astronomical pedagogy. This dual approach—AI-assisted pre-classification followed by manual template verification—lowers the entry barrier for spectral analysis and reduces the time required compared with traditional fully manual classification. Importantly, it transforms abstract theoretical concepts into hands-on practice, enabling students to systematically develop spectral diagnostic skills and deepen their understanding of the connection between stellar physical states and observed spectral features. With continued practice, students are able to rapidly classify stellar spectra by recognizing key absorption features.
Figure 11.
Visualization of teaching effectiveness in the undergraduate course “Astronomical Data Processing”, based on 16 h of structured AI-assisted training.
5.2. Support for Research-Oriented Teaching
In the Research Training and Innovative Practice course launched for the Physics Elite Class (see Note 2) at Hangzhou Dianzi University, the curriculum is designed around the AI-assisted analysis of massive spectroscopic datasets, guiding students in the systematic search for rare celestial objects (e.g., Be stars [33], WDMS binaries [28], carbon stars [31], and strong gravitational lenses [26]). The course uses the SpecZoo expert spectroscopic platform for hands-on training in spectral visualization and analysis. Students are divided into 4–6 research groups (with 4–5 students per group), each tasked with replicating the complete search pipeline for a distinct type of celestial object, such as carbon stars, quasars, or WDMS binaries. This process covers the full research cycle, including literature review, data processing, model construction, and result validation. Through an integrated teaching model combining theory, practice, and discussion, the course aims to cultivate foundational research competencies among students. Promising projects are further encouraged to extend into undergraduate thesis work or graduate research, fostering a continuous talent development pipeline.
Based on pedagogical assessments conducted during the 2024 and 2025 academic years, over 95% of students entered the course with no prior background in spectral analysis. Following a structured 64 h training program supported by the integrated tools of the SpecZoo platform, all student groups achieved the ability to independently conduct spectral analyses and fully reproduce the object-discovery or special object validation workflows documented in the reference literature. Notably, several groups extended their work to produce original research outcomes, for instance,
- (1) The gravitational-lens group performed a systematic search of 220,000 LAMOST galaxy spectra with a machine-learning approach, identifying 170 candidate strong lenses. Subsequent manual verification on SpecZoo yielded around 20 high-probability lens candidates.
- (2) The carbon-star group manually reviewed and subclassified existing carbon-star catalogs via SpecZoo, optimized the deep-learning algorithm from He et al. [34], and applied it to the latest LAMOST data release.
- (3) The WDMS group reproduced the methodology of Pérez-Couto et al. [35] and successfully adapted its unsupervised machine-learning approach to low-resolution LAMOST spectra for WDMS candidate detection.
These findings and methodological developments show academic merit and publication potential. The student teams are currently organizing their results and preparing manuscripts.
These results demonstrate that, through hands-on use of the SpecZoo platform, students can effectively acquire and apply contemporary research methodologies to real astronomical data. All final projects have been compiled into course papers and are publicly accessible on the SpecZoo platform homepage, illustrating the platform’s role in facilitating research-oriented education and enabling novice learners to transition into active contributors in spectral data analysis.
6. Conclusions
Spectroscopic observations provide fundamental insights into the physical properties and chemical compositions of astronomical objects, serving as a cornerstone for studies of stellar populations, galactic structure, and cosmic evolution. We developed an integrated spectral analysis platform, SpecZoo, that combines artificial intelligence, interactive visualization, and multimodal data integration to support both scientific research and education. The platform offers automated spectral classification, precise stellar parameter estimation, spectral decomposition, cross-survey data access, interactive visualization, and manual annotation, enabling efficient identification of astrophysical objects, construction of large statistical samples, and systematic analyses.
The scientific capabilities of SpecZoo are illustrated through three representative applications. First, in the identification of strong gravitational lens candidates, the platform enables spectroscopic detection of multiple redshift components within a single observation, allowing researchers to disentangle foreground and background sources and efficiently screen and validate candidate systems. While high-resolution imaging can provide morphological confirmation, the primary identification relies on the characteristic spectral redshift signatures. Second, for WDMS binaries, SpecZoo facilitates spectral decomposition and template fitting to separate overlapping signals, which, when combined with parallax and proper motion measurements, confirms physically bound systems. This enables the assembly of large, statistically robust samples for studies of binary evolution and the common envelope phase. Third, in the classification of carbon stars, the platform supports interactive visualization of molecular absorption features and s-process element lines, automated subtype identification, and cross-survey confirmation, greatly accelerating the construction of large, reliable samples for investigations of stellar evolution and galactic chemical enrichment. Each of these examples demonstrates how SpecZoo integrates automated analysis, human expertise, and multi-survey data to streamline discovery and validation workflows.
In educational contexts, SpecZoo converts theoretical concepts into hands-on practice. Through foundational teaching, students can quickly get started and significantly improve their spectral classification accuracy; through research-oriented teaching, students can rapidly become proficient and engage in actual scientific research tasks. Specifically, students use AI-assisted pre-classification (e.g., via MSPC-Net) to obtain preliminary spectral types, followed by interactive template verification and manual inspection, comparing spectral morphology, line strengths, and wavelength positions. This workflow not only lowers the barrier to spectral analysis but also systematically develops students’ skills in spectral diagnostics, interpretation of stellar physical parameters, and scientific reasoning. With this dual AI–human approach, students gain both operational experience and enhanced understanding of the relationships between stellar spectra and underlying astrophysical properties.
By supporting collaborative annotation and feature identification, SpecZoo also generates high-quality labeled datasets that improve automated classification accuracy and enable iterative refinement of AI algorithms. Building upon this foundation, future work will leverage machine-learning techniques and the SpecZoo platform to systematically search for rare celestial objects. Future developments will incorporate advanced neural network architectures and generalized modeling frameworks to enhance multi-band and multimodal analysis. Expanded collaborations with institutions (e.g., NADC and Zhejiang Laboratory) will enable seamless cross-platform integration, broaden accessibility, and facilitate the construction of large, robust datasets. This research activity will be deeply integrated into research-oriented teaching, with the ultimate goal of establishing SpecZoo as a comprehensive spectral “zoo” for both scientific and educational communities. Overall, SpecZoo exemplifies the potential of AI-powered spectroscopy to advance astronomical research, support systematic studies of rare and complex objects, and cultivate the next generation of researchers through immersive, data-intensive learning experiences.
Author Contributions
Conceptualization, H.T.; methodology, Y.P. and G.L.; software, G.L. and Y.X.; validation, Y.P., G.L. and H.T.; formal analysis, H.T. and Y.X.; investigation, Y.P., Y.X. and H.T.; resources, Y.P. and H.T.; data curation, Y.X. and Y.P.; writing—original draft preparation, Y.P.; writing—review and editing, X.C., Y.P. and H.T.; visualization, G.L.; supervision, H.T.; project administration, H.T. and Y.P.; funding acquisition, H.T. and X.C. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (under Grant Nos. 12373033, 12447204, 12403037) and the first batch of scientific research projects of the China Space Station Telescope (CSST) (under Grant Nos. CMS-CSST-2021-A09, A08).
Data Availability Statement
The data presented in this study are available upon reasonable request from the corresponding author. The spectral data analyzed in this work can be accessed through the SpecZoo system at https://nadc.china-vo.org/speczoo-system (accessed on 1 January 2026) after registration and login. Due to the proprietary nature of the database and access restrictions, the raw data are not publicly available but can be obtained following the platform’s data access protocol.
Acknowledgments
The authors thank Feng Wang, Jian-Rong Shi, Dong-Wei Fan, Chang-Hua Li, Yan-Xia Zhang, Chen-Zhou Cui and Hua-Xi Chen for helpful discussions, and Jing-Jing Wu, Fu-Cheng Zhong, and Han Wu for the deployment of the AI services. H.J.T. acknowledges support from the Key Project of Zhejiang Provincial Natural Science Foundation (No. ZCLZ25A0301). Y.X. acknowledges support from the Young Data Scientist Program of the National Astronomical Data Center (NADC).
Conflicts of Interest
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Notes
1. https://nadc.china-vo.org/speczoo-system/ (accessed on 22 January 2026).
2. The class, which enrolls around 30 junior physics majors each year, prepares students for careers in scientific research.
References
- Zhao, G.; Zhao, Y.H.; Chu, Y.Q.; Jing, Y.P.; Deng, L.C. LAMOST Spectral Survey—An Overview. Res. Astron. Astrophys. 2012, 12, 723–734.
- Kollmeier, J.; Anderson, S.F.; Blanc, G.A.; Blanton, M.R.; Covey, K.R.; Crane, J.; Drory, N.; Frinchaboy, P.M.; Froning, C.S.; Johnson, J.A.; et al. SDSS-V: Pioneering Panoptic Spectroscopy. Bull. Am. Astron. Soc. 2019, 51, 274.
- Levi, M.E.; Allen, L.E.; Raichoor, A.; Baltay, C.; Benzvi, S.; Beutler, F.; Bolton, A.; Castander, F.J.; Chuang, C.H.; Cooper, A.; et al. The Dark Energy Spectroscopic Instrument (DESI). Bull. Am. Astron. Soc. 2019, 51, 57.
- Lindegren, L.; Perryman, M. GAIA: Global astrometric interferometer for astrophysics. Astron. Astrophys. Suppl. Ser. 1996, 116, 579–595.
- Laruelo, A.; Barbarisi, I.; Salgado, J.; Osuna, P. VOSpec Spectral Analysis Tools. In Proceedings of the Astronomical Data Analysis Software and Systems XVII, London, UK, 23–26 September 2007; Astronomical Society of the Pacific: San Francisco, CA, USA, 2008; Volume 394, p. 513.
- Busko, I. SPECVIEW: An interactive java tool for visualization and analysis of spectral data. In Proceedings of the Astronomical Data Analysis Software and Systems IX, Waikoloa Village, HI, USA, 3–6 October 2000; Astronomical Society of the Pacific: San Francisco, CA, USA, 2000; Volume 216, p. 79.
- Škoda, P.; Draper, P.W.; Neves, M.C.; Andrešič, D.; Jenness, T. Spectroscopic analysis in the virtual observatory environment with SPLAT-VO. Astron. Comput. 2014, 7, 108–120.
- Lebouteiller, V.; Barry, D.; Spoon, H.; Bernard-Salas, J.; Sloan, G.; Houck, J.; Weedman, D. CASSIS: The Cornell Atlas of Spitzer/infrared spectrograph sources. Astrophys. J. Suppl. Ser. 2011, 196, 8.
- van der Walt, S.J.; Crellin-Quick, A.; Bloom, J.S. SkyPortal: An astronomical data platform. J. Open Source Softw. 2019, 4, 1247.
- Simpson, R.; Page, K.R.; De Roure, D. Zooniverse: Observing the world’s largest citizen science platform. In Proceedings of the 23rd International Conference on World Wide Web, Seoul, Republic of Korea, 7–11 April 2014; pp. 1049–1054.
- Lei, G.; Xu, Y.; Niu, C.; Tian, H.; Zhang, Y.; Cui, C.; Zhao, Y. Design and Implementation of an Expert Platform for Spectral Inspection. Astron. Res. Technol. 2018, 15, 216–224.
- Hardt, D. (Ed.) The OAuth 2.0 Authorization Framework. RFC 6749, Internet Engineering Task Force (IETF), October 2012, 76p. Available online: https://www.rfc-editor.org/info/rfc6749 (accessed on 1 January 2026).
- Li, D.; Mei, H.; Shen, Y.; Su, S.; Zhang, W.; Wang, J.; Zu, M.; Chen, W. ECharts: A declarative framework for rapid construction of web-based visualization. Vis. Inform. 2018, 2, 136–146.
- Hanchett, E.; Listwon, B. Vue.js in Action; Simon and Schuster: New York, NY, USA, 2018.
- Boaglio, F. Spring Boot: Acelere o Desenvolvimento de Microsserviços; Casa do Código: Lisbon, Portugal, 2017.
- Ho, C. Using MyBatis in Spring. In Pro Spring 3; Springer: Berlin/Heidelberg, Germany, 2012; pp. 397–435.
- Prieto, C.A.; Majewski, S.; Schiavon, R.; Cunha, K.; Frinchaboy, P.; Holtzman, J.; Johnston, K.; Shetrone, M.; Skrutskie, M.; Smith, V.; et al. APOGEE: The Apache point observatory galactic evolution experiment. Astron. Nachrichten Astron. Notes 2008, 329, 1018–1021.
- Wu, J.; He, Y.; Wang, W.; Qu, M.; Jiang, B.; Zhang, Y. Classification of Astronomical Spectra Based on Multiscale Partial Convolution. Astron. J. 2024, 167, 260.
- Zhang, B.; Liu, C.; Deng, L.C. Deriving the Stellar Labels of LAMOST Spectra with the Stellar LAbel Machine (SLAM). Astrophys. J. Suppl. Ser. 2020, 246, 9.
- Zhao, X.; Huang, Y.; Xue, G.; Kong, X.; Liu, J.; Tang, X.; Beers, T.C.; Ting, Y.S.; Luo, A.L. SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars. arXiv 2025, arXiv:2507.01939.
- Zhong, F.; Napolitano, N.R.; Heneka, C.; Krogager, J.K.; Demarco, R.; Bouché, N.F.; Loveday, J.; Fritz, A.; Verdier, A.; Roukema, B.F.; et al. Galaxy Spectra Networks (GaSNet). III. Generative Pre-trained Network for Spectrum Reconstruction, Redshift Estimate and Anomaly Detection. arXiv 2024, arXiv:2412.21130.
- De Jong, R.S.; Agertz, O.; Berbel, A.A.; Aird, J.; Alexander, D.A.; Amarsi, A.; Anders, F.; Andrae, R.; Ansarinejad, B.; Ansorge, W.; et al. 4MOST: Project Overview and Information for the First Call for Proposals. arXiv 2019, arXiv:1903.02464.
- Li, H.; Aoki, W.; Matsuno, T.; Xing, Q.; Suda, T.; Tominaga, N.; Chen, Y.; Honda, S.; Ishigaki, M.N.; Shi, J.; et al. Four-hundred very metal-poor stars studied with LAMOST and Subaru. II. Elemental abundances. Astrophys. J. 2022, 931, 147.
- Kaifu, N. Subaru telescope. In Proceedings of the Advanced Technology Optical/IR Telescopes VI. SPIE, Kona, HI, USA, 23–25 March 1998; Volume 335, pp. 14–22.
- Brammer, G.B.; Van Dokkum, P.G.; Franx, M.; Fumagalli, M.; Patel, S.; Rix, H.W.; Skelton, R.E.; Kriek, M.; Nelson, E.; Schmidt, K.B.; et al. 3D-HST: A Wide-field Grism Spectroscopic Survey with the Hubble Space Telescope. Astrophys. J. Suppl. Ser. 2012, 200, 13.
- Zhong, F.; Li, R.; Napolitano, N.R. Galaxy Spectra neural Networks (GaSNets). I. Searching for strong lens candidates in eBOSS spectra using Deep Learning. Res. Astron. Astrophys. 2022, 22, 065014.
- Miao, H.; Gong, Y.; Chen, X.; Huang, Z.; Li, X.D.; Zhan, H. Cosmological constraint precision of photometric and spectroscopic multi-probe surveys of China Space Station Telescope (CSST). Mon. Not. R. Astron. Soc. 2023, 519, 1132–1148.
- Ren, J.; Luo, A.L.; Zhao, Y. Search and Research Progress on White Dwarf–Main-Sequence Binaries. Prog. Astron. 2014, 32, 462–480.
- Willems, B.; Kolb, U. Detached White Dwarf Main-sequence Star Binaries. Astron. Astrophys. 2004, 419, 1057–1076.
- Barnbaum, C.; Stone, R.P.S.; Keenan, P.C. A Moderate-Resolution Spectral Atlas of Carbon Stars: R, J, N, CH, and Barium Stars. Astrophys. J. Suppl. Ser. 1996, 105, 419.
- Li, L.; Zhang, K.; Cui, W.; Shi, J.; Ji, W.; Huo, Z.; Gao, Y.; Zhang, S.; Sun, M. Identification of Carbon Stars from LAMOST DR7. Astrophys. J. Suppl. Ser. 2024, 271, 12.
- Jaschek, C.; Conde, H.; De Sierra, A.C. Catalogue of Stellar Spectra Classified in the Morgan-Keenan System; Universidad Nacional de La Plata: La Plata, Argentina, 1964.
- Slettebak, A. The Be stars. Space Sci. Rev. 1979, 23, 541–580.
- He, Y.; Cao, Z.; Deng, H.; Wang, F.; Mei, Y.; Tan, L. Identification of Carbon Stars in LAMOST DR9 Based on Deep Learning. Astrophys. J. Suppl. Ser. 2024, 274, 6.
- Pérez-Couto, X.; Manteiga, M.; Villaver, E. Finding White Dwarfs’ Hidden Companions using an Unsupervised Machine Learning Technique. arXiv 2025, arXiv:2503.04672.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.










