1. Introduction
As of June 2025, more than 14,600 active and inactive satellites and approximately 15,000 cataloged debris fragments orbit Earth [
1]. The rapid proliferation of space objects, driven by mega-constellation deployments and increased commercial activity, has heightened the probability of on-orbit collisions [
2]. Each collision can generate thousands of secondary fragments, potentially initiating a cascading sequence of impacts known as the Kessler Syndrome [
3]. These developments underscore the critical importance of Space Situational Awareness (SSA), which involves monitoring, characterizing, and predicting the behavior of Resident Space Objects (RSOs) to ensure the safety and sustainability of the orbital environment. SSA relies on multiple sensing modalities. Radar systems provide robust detection and tracking capability in Low Earth Orbit, but their sensitivity decays with the fourth power of distance, limiting their usefulness at higher altitudes and restricting their ability to infer attitude states or surface properties. Laser-ranging systems offer high-precision distance measurements but require active illumination, strict safety protocols, and precise pointing control [
4].
In contrast, optical systems, both ground- and space-based, provide passive, scalable monitoring of RSOs across Low Earth Orbit (LEO), Medium Earth Orbit (MEO), Geostationary Orbit (GEO), and Highly Elliptical Orbit (HEO). Optical sensors measure reflected solar radiation and can reveal dynamical and structural information without requiring cooperative instrumentation [
5]. After calibration, optical measurements are converted into photometric time series known as light curves. The light curve of an RSO reflects a complex interplay of physical and observational factors, including geometry [
6,
7], material composition [
8], surface reflectivity [
9,
10], size [
11], solar-panel configuration [
8], attitude motion [
12,
13], phase angle [
14], and atmospheric conditions [
9]. Periodic signatures can reveal whether an RSO is three-axis stabilized, spin-stabilized, or tumbling, while irregular patterns can indicate structural anomalies or operational events [
15,
16]. For these reasons, light-curve analysis has long been recognized as a key non-cooperative technique for inferring the physical state and dynamic behavior of RSOs within large orbital catalogs [
17]. An overview of the end-to-end optical photometry workflow for producing machine-learning-ready light curves is shown in
Figure 1. Beyond sensing, recent SSA work also advances AI-enabled mission planning and trajectory optimization [
18,
19].
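As context for the workflow in Figure 1, the calibration step that produces a light curve maps measured fluxes to apparent magnitudes via the standard Pogson relation, m = -2.5 log10(F) + ZP. A minimal sketch (the zero point and flux values below are illustrative, not taken from any surveyed pipeline):

```python
import math

def flux_to_magnitude(flux, zero_point=0.0):
    """Convert a calibrated flux measurement to an apparent magnitude
    using the standard Pogson relation m = -2.5 * log10(F) + ZP."""
    if flux <= 0:
        raise ValueError("flux must be positive")
    return -2.5 * math.log10(flux) + zero_point

def light_curve(fluxes, zero_point=0.0):
    """Map a photometric flux time series to a magnitude light curve."""
    return [flux_to_magnitude(f, zero_point) for f in fluxes]

# A factor-of-100 flux ratio corresponds to exactly 5 magnitudes.
delta = flux_to_magnitude(1.0) - flux_to_magnitude(100.0)
```

Note that magnitudes are inverted and logarithmic: brighter observations yield smaller (more negative) values, which matters when interpreting amplitude-based features later in the pipeline.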
In recent years, the rapid expansion of global optical surveillance networks has produced large volumes of photometric data, rendering manual interpretation increasingly impractical. This has driven the adoption of machine learning (ML) and deep learning (DL) techniques for automated light-curve analysis [
20]. ML/DL methods enable scalable inference and can learn temporal structure that is difficult to capture with handcrafted rules or simplified analytical models [
20]. A simplified taxonomy of existing light-curve-based RSO studies is illustrated in
Figure 2, comprising three broad methodological categories: classification methods, physical characterization frameworks, and hybrid pipelines that integrate characterization results into ML classifiers. A more detailed discussion of light-curve physics, problem formulations, and classification objectives is deferred to
Section 2.
While this taxonomy highlights diverse research directions, most existing studies focus on narrow and visually well-separated categories such as distinguishing stable from tumbling objects or classifying object types. A key challenge, not systematically addressed in prior surveys, is the semantic gap between the multi-attribute information required for operational SSA and the single-task label spaces supported by existing light-curve datasets. In practice, SSA decisions benefit from several semantic axes, such as object type (payload, rocket body, and debris), coarse shape proxies (box-like, cylindrical, plate-like, and irregular), attitude regimes (three-axis stabilized, spin-stabilized, slow/fast tumbling, chaotic rotation, and nutation), and operational state (active, inactive, maneuvering, and malfunctioning), each of which can influence orbit prediction, anomaly detection, and conjunction assessment [
21].
However, most publicly available light-curve datasets expose only a small number of classes (typically 3–8) for supervised learning, often coarse and visually separable (e.g., stable vs. tumbling, or object type). Moreover, no public dataset provides unified, track-level, multi-label annotations that jointly encode object type, attitude, shape, and operational state in a standardized ML-ready format [
22]. This mismatch limits the operational relevance of current ML-based classification frameworks and underscores the need for richer datasets, improved labeling practices, and more robust learning paradigms.
This disconnect between operational SSA requirements and available dataset semantics motivates a structured reassessment of light-curve-based RSO classification. Accordingly, this paper presents a systematic scoping review of ML/DL approaches for light-curve-based RSO classification, with particular emphasis on the semantic gap between operational SSA needs and existing dataset label spaces. Specifically, we (1) survey existing photometric datasets and analyze their labeling schemes and limitations; (2) synthesize state-of-the-art ML/DL approaches for classification, characterization, and hybrid modeling; and (3) propose a three-tier semantic roadmap describing what current light-curve methods can achieve, what researchers should target in the near term, and what SSA ultimately requires for operational capability. The insights presented here aim to guide the development of next-generation benchmarks, datasets, and learning frameworks capable of supporting scalable, reliable, and mission-relevant SSA systems.
The remainder of this article is organized as follows.
Section 2 provides background on light-curve physics and classification objectives.
Section 3 outlines the study selection methodology.
Section 4 reviews publicly available light-curve datasets and evaluates their annotation practices.
Section 5 synthesizes ML and DL approaches to RSO classification.
Section 6 discusses the current challenges, including semantic limitations and dataset biases.
Section 7 outlines future research directions for advancing light-curve-based SSA toward operational capability. Finally,
Section 8 concludes the paper by summarizing key findings and their implications for scalable and reliable SSA systems.
2. Context and Taxonomy of Light-Curve-Based RSO Classification
Before reviewing specific datasets and machine learning methods, this section establishes the background context and terminology for light-curve-based RSO classification and introduces a taxonomy used to organize prior work.
Light-curve analysis has played a foundational role in astronomy for more than a century, where variations in observed brightness have been used to infer the physical and dynamical properties of celestial objects. Examples range from determining the periods of variable stars [
23] to estimating the rotation and amplitude of asteroids [
24] and detecting exoplanets through periodic transit signatures. As Earth’s orbital environment has become increasingly congested, these established photometric techniques have been adapted to address the needs of SSA, providing the conceptual foundation for modern light-curve-based classification studies. Within this context, light curves offer a non-cooperative mechanism for assessing the physical condition, operational status, and attitude behavior of RSOs, thereby providing essential information for monitoring an environment characterized by rapid growth in the number and diversity of orbiting objects.
To provide a coherent conceptual structure for interpreting the diverse body of research that uses light curves for RSO analysis,
Figure 2 presents a high-level taxonomy of methodological directions in the literature. Although studies often span multiple categories, three recurring approaches can be identified. One category focuses on classification, in which RSOs are assigned discrete labels such as attitude state, object type, spacecraft family, or platform. A second category seeks to characterize physical or optical properties—including spin rate, reflectivity, spectral features, and approximate size—directly from photometric signatures. A third category combines these two aims, using characterization outputs as features within machine learning classification pipelines. While these distinctions are not always rigid, the taxonomy in
Figure 2 provides a structured lens for interpreting the diverse analytical strategies reported in the literature.
Figure 3 complements this taxonomy by visualizing the distribution and co-occurrence of classification labels across prior studies. The labels span multiple semantic levels, including attitude states (e.g., stable or tumbling), spacecraft platforms (e.g., Starlink, Iridium, OneWeb, and Globalstar), program names (e.g., Nimbus), and standardized bus families (e.g., A2100, HS-601, HS-702, DFH-3, DS-2000, and LS-400). Node size reflects the frequency of each label, while link density indicates how often labels co-occur within the same study. This visualization reveals a highly uneven distribution of ground-truth labels, with most studies emphasizing attitude-state discrimination and comparatively few addressing higher-level semantic categories such as program lineage, bus architecture, or structural surrogates. Together, these visualizations motivate the need for consistent semantic definitions and inform the methodological comparisons presented in later sections.
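The co-occurrence statistics visualized in Figure 3 reduce to simple counting: per-label frequency gives node size, and pairwise co-occurrence within a study gives link density. A sketch with hypothetical per-study label sets (invented for illustration, not the actual surveyed corpus):

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-study label sets (illustrative only).
studies = [
    {"stable", "tumbling"},
    {"stable", "tumbling", "Starlink"},
    {"HS-601", "DS-2000"},
    {"stable", "tumbling", "HS-601"},
]

# Node size: how many studies use each label.
label_freq = Counter(lbl for s in studies for lbl in s)

# Link density: how often two labels appear in the same study.
pair_freq = Counter(
    pair for s in studies for pair in combinations(sorted(s), 2)
)
```

Even on this toy input, the attitude-state pair dominates the co-occurrence counts, mirroring the uneven label distribution described above.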
Attitude classification remains one of the most recurrent themes in the literature. A stable RSO maintains a controlled and predictable orientation either through spin stabilization, where the rotation is about a principal axis at nearly constant angular velocity, or through three-axis stabilization, where the spacecraft maintains a fixed orientation relative to an inertial frame or to Earth. Conversely, a tumbling object exhibits uncontrolled, multi-axis rotation without a consistent principal axis [
25]. Some works refine this definition into subcategories such as nadir-pointing, sun-pointing, velocity-aligned, anti-velocity, and zenith orientations. These hierarchical relationships reveal how the stable and tumbling categories branch into more specific operational states. Beyond attitude state, the taxonomy also captures program-level, platform-level, and bus-level designations. Program names such as Nimbus refer to families of spacecraft sharing a common mission lineage. In contrast, satellite bus labels such as DFH-3, HS-601, DS-2000, and LS-400 denote standardized spacecraft architectures upon which different payloads can be mounted. Recurring platform types—such as Starlink, Iridium, OneWeb, Globalstar, NOAA JPSS, Yaogan-31, Navstar, Haiyang-2, Fengyun-3, Shijian-3-4, DMSP 5B/5C, and Yunhai-2—appear frequently as classification labels and are treated consistently following the hierarchy defined by [
26]. Classification labels were cross-validated against the satellite catalog maintained by [
27] to maintain consistency in distinguishing between program names, satellite buses, operational platforms, and individual spacecraft. Although the orbital regime (e.g., LEO, MEO, GEO, and HEO) is not typically used as a direct classification label in machine learning models, it appears in many studies as contextual information that helps interpret the physical and functional distribution of RSOs. Including orbital regime information therefore provides additional context for interpreting how classification labels relate to orbital behavior. The growing heterogeneity of RSO classification labels in the literature underscores the need for a standardized taxonomy. Without such a framework, comparing machine learning and deep learning approaches becomes difficult, particularly given the wide variability in label granularity and annotation practices across existing datasets. Together,
Figure 2 and
Figure 3 establish a unified conceptual foundation for comparing classification objectives, interpreting reported results, and assessing the semantic limitations that shape current machine learning evaluations as examined in subsequent sections.
3. Methodology
This review follows a systematic literature survey workflow guided by the PRISMA Extension for Scoping Reviews (PRISMA-ScR) framework [
28] to identify, screen, and synthesize studies applying machine learning or deep learning methods to the classification of RSOs using optical light curves. The resulting corpus forms the basis for the analyses presented in
Section 5. The methodology comprises database retrieval, citation-based expansion, multi-stage screening using explicit eligibility criteria, and structured data extraction to enable cross-study comparison.
3.1. Search Strategy and Database Coverage
A primary literature search was conducted using Google Scholar and Elsevier’s Engineering Village, selected for their broad coverage of aerospace engineering, astronomy, and machine learning venues. Engineering Village was used to access indexed databases including Compendex and Inspec. The primary database search was conducted in March 2025. To reduce the likelihood of missed studies due to inconsistent indexing or disciplinary fragmentation, the search was supplemented with Connected Papers for co-citation analysis as well as backward and forward citation chaining from highly cited seed papers. A final targeted update search was performed on 1 February 2026 to identify newly published studies meeting the same inclusion criteria. Search queries were constructed using four keyword families summarized in
Table 1: object terms, signal terms, task terms, and method terms. The exact database-specific search strings and the dates on which each search was executed are provided in
Appendix A.
These families were combined using Boolean AND operators to construct database-specific query strings of the form (object terms) AND (signal terms) AND (task terms) AND (method terms), with synonymous terms within each family joined by OR.
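As an illustration of this combination rule, the following sketch ORs terms within each keyword family and ANDs the families together (terms abbreviated and illustrative; see Table 1 and Appendix A for the exact database-specific strings):

```python
# Abbreviated, illustrative keyword families.
families = {
    "object": ["resident space object", "satellite", "space debris"],
    "signal": ["light curve", "photometric time series"],
    "task":   ["classification", "characterization"],
    "method": ["machine learning", "deep learning"],
}

def build_query(families):
    """OR quoted terms within a family, then AND the families together."""
    groups = [
        "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
        for terms in families.values()
    ]
    return " AND ".join(groups)

query = build_query(families)
```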
3.2. Screening Workflow and Study Selection
All retrieved records were exported into a structured spreadsheet and deduplicated using title, author, year, and publication venue metadata. Screening proceeded in two stages. First, titles and abstracts were reviewed to remove studies that were clearly out of scope. Second, the remaining articles were assessed through full-text review against the inclusion and exclusion criteria described below. A PRISMA-style flow diagram summarizing the identification, screening, exclusion reasons, and final inclusion counts is provided in
Figure 4. A total of 297 records were identified through database searching and citation chaining. After the removal of 152 duplicate records arising from overlapping database coverage and inconsistent bibliographic metadata, 145 unique records remained for title and abstract screening. Of these, 38 studies met the inclusion criteria and were selected for full-text assessment. Only 29 studies satisfied the eligibility requirements and constitute the final corpus of works systematically reviewed in this paper. The included studies are [
15,
16,
20,
22,
26,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43,
44,
45,
46,
47,
48,
49,
50,
51,
52].
Although a substantial body of literature addresses photometric characterization or attitude estimation of RSOs, only a limited subset explicitly formulates light-curve analysis as a supervised or representation-learning classification problem, which explains the relatively small number of studies retained in the final review corpus. This review follows a systematic survey workflow inspired by PRISMA but does not perform a quantitative meta-analysis, as the included studies employ heterogeneous datasets, label definitions, and evaluation protocols that preclude statistically normalized performance aggregation. Title/abstract screening and full-text eligibility assessment were conducted by the primary reviewer (M.H.), with inclusion decisions discussed and validated with co-authors when ambiguity arose. Consistent with PRISMA-ScR guidance for scoping reviews, no formal quality appraisal of individual studies was performed.
3.3. Inclusion and Exclusion Criteria
Studies were included if they (i) were peer-reviewed journal or conference publications from 2014 to 2025 (with preprints excluded unless a peer-reviewed version was available), (ii) used optical light curves or photometric time series as the primary input signal, and (iii) applied machine learning or deep learning methods to assign discrete labels to RSOs, such as attitude state (e.g., stable vs. tumbling), object type, platform or bus family, rocket-body subtype, or related categorical taxonomies.
Studies were excluded if they focused exclusively on physical parameter estimation without categorical labeling (e.g., reflectivity or Bidirectional Reflectance Distribution Function (BRDF) modeling), addressed only image-level preprocessing or light-curve extraction without ML/DL-based classification, investigated astronomical targets outside the SSA context, or applied light curves solely for anomaly detection without a classification objective. A limited number of doctoral theses were included when they introduced foundational datasets or methodological pipelines frequently cited but not fully documented in peer-reviewed venues; in this review, this included the doctoral thesis by [
52]. These sources were treated as limited supporting gray literature rather than as substitutes for the peer-reviewed core review corpus.
3.4. Data Extraction and Coding Framework
For each included study, a structured set of attributes was extracted to enable consistent comparison across heterogeneous tasks and datasets. These included publication metadata (authors, year, and venue), classification objective and label granularity (including number of classes), dataset source (simulated, measured, or mixed) and sensor details when reported, preprocessing and representation strategy (raw sequences, phase-folded curves, handcrafted features), model family (feature-based ML, CNN/RNN/Transformer architectures, self-supervised learning, and domain adaptation), evaluation protocol (train/test split strategy and cross-validation), and performance metrics (accuracy, F1 score, precision, and per-class recall when available). This coding framework supported both quantitative summaries and qualitative synthesis of methodological trends. Following study inclusion, data extraction and charting were conducted by the primary reviewer (M.H.) using a predefined coding framework developed for this review. Inclusion decisions were finalized prior to extraction, and no additional eligibility judgments were made during the data extraction stage. Because the objective of this scoping review was exploratory and descriptive, calibrated extraction forms and inter-reviewer reliability procedures were not employed. No formal review protocol was prospectively registered prior to conducting the study. Review materials and methodological details were later documented on the Open Science Framework [
53].
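One way to make the coding framework concrete is a structured record per study; the field names below paraphrase the extracted attributes, and the example values are invented rather than drawn from any included study:

```python
from dataclasses import dataclass, field

@dataclass
class StudyRecord:
    """One row of the data-extraction sheet (field names are illustrative)."""
    authors: str
    year: int
    venue: str
    task: str              # e.g., "attitude state", "object type"
    n_classes: int
    data_source: str       # "simulated", "measured", or "mixed"
    representation: str    # e.g., "raw sequence", "handcrafted features"
    model_family: str      # e.g., "RF", "CNN", "Transformer"
    metrics: dict = field(default_factory=dict)

# Invented example record.
record = StudyRecord(
    authors="Doe et al.", year=2023, venue="Example Conf.",
    task="attitude state", n_classes=2, data_source="measured",
    representation="handcrafted features", model_family="RF",
    metrics={"accuracy": 0.91},
)
```

Keeping metrics in a free-form mapping accommodates the heterogeneous evaluation protocols noted above, at the cost of deferring normalization to analysis time.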
3.5. Limitations of the Methodology
Several limitations should be acknowledged. First, restricting the search to English-language publications may omit relevant studies reported in other languages. Second, the review is limited to the publicly available literature and does not include proprietary or restricted studies, which may report additional operational results not accessible in the open domain. Third, reported performance metrics vary across studies and are often evaluated under non-standardized protocols, limiting direct comparability and precluding quantitative meta-analysis. These limitations should be considered when interpreting reported performance trends and cross-study comparisons.
4. Light-Curve Databases
Publicly available RSO light-curve databases were examined to assess the distribution, coverage, and observational biases of photometric measurements used in machine learning and deep learning studies. Particular attention was given to the orbital regimes, object classes, and sensor characteristics represented in each dataset, as these factors directly influence the feasibility and generalizability of data-driven classification approaches. Representative light curves from the examined databases are shown in
Figure 5. A summary of the sensor and observational characteristics of each database is provided in
Table 2.
The Mini-Mega TORTORA (MMT) system currently forms the foundation of most open research on machine learning and deep learning classification of RSO light curves [
54]. Its large-scale coverage, long operational history, and public accessibility have made it the primary source of labeled photometric data used in published studies. In contrast, the Space Debris Light-Curve Database (SDLCD) [
55], though providing high-quality photometric measurements, has not yet been adopted as a primary training dataset in ML/DL classification research. A smaller number of works rely on light curves from the Ukrainian Database and Atlas [
56], particularly for LEO-focused classification experiments [
22]. Commercial optical databases operated by organizations such as Electro Optic Systems (EOS) [
57] and the ExoAnalytic Global Telescope Network (EGTN) [
58] were excluded, as access requires paid subscriptions or restrictive licensing, preventing independent benchmarking and reproducibility.
4.1. Mini-Mega TORTORA (MMT)
MMT is a wide-field, ground-based optical monitoring facility operated at the Special Astrophysical Observatory of the Russian Academy of Sciences [
54]. The system consists of nine co-mounted optical channels (MMT-9), each equipped with an Andor Neo sCMOS detector, enabling high-cadence, wide-area sky monitoring. The system operates in a wide-field survey mode (∼900 deg²) and a narrow-field follow-up mode (∼100 deg²). Observations are typically acquired using Johnson–Cousins B, V, and R filters, as well as unfiltered white-light and polarimetric configurations. The publicly accessible MMT archive contains light curves for 12,932 unique RSOs, with object identities cross-referenced to NORAD Two-Line Elements (TLEs) and the McCants classified satellite catalog [
59]. Notably, observations of objects associated with Commonwealth of Independent States (CIS) country codes are not publicly released, introducing a geopolitical bias into the dataset.
4.2. Space Debris Light-Curve Database (SDLCD)
SDLCD is operated by the Astronomical and Geophysical Observatory in Modra, Slovakia, using the AGO70 telescope [
55]. The system employs an FLI ProLine PL1001 Grade 1 CCD camera and is optimized for targeted observations of objects in GEO, Geostationary Transfer Orbit (GTO), and Molniya orbits. Observations are conducted through Johnson–Cousins BVRI filters with typical exposure times ranging from 1.0 s to 5.0 s under terrestrial tracking. The archive contains 2224 light curves corresponding to 791 unique RSOs. Compared to MMT, SDLCD provides a smaller, curated collection intended primarily for photometric analysis and periodicity studies rather than large-scale ML/DL classification tasks.
4.3. Ukrainian Database and Atlas
The Ukrainian Database and Atlas was established by the Astronomical Observatory of Odessa I.I. Mechnikov National University to archive photometric observations of RSOs collected between 2012 and 2020 [
56]. Observations were conducted using the KT-50 telescope equipped with a Watec-902H2 CCD camera. The database contains light curves for approximately 340 LEO objects [
60]. Although the dataset was explicitly designed to support RSO classification tasks, the public archive has been offline since 2022, limiting its accessibility and long-term utility for reproducible benchmarking.
4.4. Actionable Datasets Versus Benchmark-Ready Datasets
Although the three databases examined are publicly accessible, none constitute benchmark-ready datasets in the ML sense. Here, we distinguish between an actionable dataset and a benchmark-ready dataset. An actionable dataset contains sufficient photometric signal, object identity information, and auxiliary metadata to support ML/DL experimentation after study-specific preprocessing, metadata reconciliation, and label curation. A benchmark-ready dataset goes further by enabling reproducible comparison across studies through standardized data preparation and evaluation design.
In the context of RSO light-curve classification, a benchmark-ready dataset should satisfy, at minimum, the following requirements: (i) stable public access and versioned data release; (ii) standardized train/validation/test splits, ideally defined at both the object and track levels; (iii) a consistent and documented label taxonomy; (iv) transparent preprocessing and inclusion/exclusion rules; (v) track-level observational metadata sufficient to interpret acquisition context, such as cadence, filter, phase angle, and observing geometry; and (vi) a recommended metric suite and evaluation protocol for reproducible model comparison.
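Requirement (ii) exists chiefly to prevent object-level leakage, where different tracks of the same RSO land in both the training and test sets and inflate reported accuracy. A minimal sketch of an object-level split (object and track identifiers are hypothetical):

```python
import random

def object_level_split(tracks, test_fraction=0.2, seed=0):
    """Split (object_id, track) pairs so that no object appears in both
    the training and test sets, avoiding object-level leakage."""
    object_ids = sorted({obj for obj, _ in tracks})
    rng = random.Random(seed)
    rng.shuffle(object_ids)
    n_test = max(1, int(len(object_ids) * test_fraction))
    test_objects = set(object_ids[:n_test])
    train = [(o, t) for o, t in tracks if o not in test_objects]
    test = [(o, t) for o, t in tracks if o in test_objects]
    return train, test

# Hypothetical corpus: 5 objects with 4 tracks each.
tracks = [(f"sat{i % 5}", f"track{i}") for i in range(20)]
train, test = object_level_split(tracks)
train_objs = {o for o, _ in train}
test_objs = {o for o, _ in test}
```

A fixed seed and a deterministic object ordering make the split reproducible, which is the property a versioned benchmark release would need to guarantee.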
By this definition, current public archives such as MMT, SDLCD, and the Ukrainian Database should be viewed as valuable public source archives rather than benchmark-ready community datasets. MMT provides the largest publicly available archive and underpins much of the published ML/DL literature, but it exhibits strong class and orbital imbalance and still requires additional curation before standardized use. SDLCD offers high-quality targeted observations, but its scale and design are better suited to photometric analysis than to community-wide ML benchmarking. The Ukrainian Database is particularly relevant for LEO classification studies, but its long-term reproducibility is limited because the public archive has been offline since 2022.
As a result, most published ML and DL studies still rely on study-specific preprocessing, metadata reconciliation, label mapping, and split design, which limits direct comparability across reported results. This gap between data availability and benchmark readiness remains a major obstacle to systematic progress in light-curve-based RSO classification.
4.5. Database Characterization
To characterize the distribution of RSOs represented in the examined light-curve databases, each object was cross-referenced by NORAD ID with metadata obtained from DISCOS (Database and Information System Characterising Objects in Space) and Space-Track.org. Orbital regimes were categorized as LEO, MEO, GEO, and HEO following standard definitions. Objects lacking valid orbital parameters were assigned to an “Unknown” category. Object class labels (payload, rocket body, and debris) were obtained from Space-Track.org [
61].
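Operationally, this cross-referencing step amounts to a join on NORAD ID between light-curve records and catalog metadata, with a fallback "Unknown" category for unmatched objects. A sketch using invented catalog rows (not actual DISCOS or Space-Track values):

```python
# Hypothetical catalog rows keyed by NORAD ID (illustrative values only).
catalog = {
    25544: {"object_class": "payload", "regime": "LEO"},
    43013: {"object_class": "rocket body", "regime": "GTO"},
}

def annotate(light_curves, catalog):
    """Attach object class and orbital regime to each light-curve record;
    objects missing from the catalog fall into an 'Unknown' category."""
    unknown = {"object_class": "unknown", "regime": "Unknown"}
    return [{**lc, **catalog.get(lc["norad_id"], unknown)} for lc in light_curves]

annotated = annotate([{"norad_id": 25544}, {"norad_id": 99999}], catalog)
```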
Figure 5 summarizes the aggregated databases by object class and orbital regime.
Across all three archives, payload objects dominate the available observations (
Figure 5A), with payloads comprising over 80% of MMT observations. This pronounced class imbalance motivates the use of class-balanced subsets or cost-sensitive learning strategies in prior work. Rocket bodies are comparatively underrepresented in MMT but constitute the majority of SDLCD targets, reflecting that database’s emphasis on high-altitude debris and disposal orbits.
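A common cost-sensitive remedy reweights classes inversely to their frequency, in the spirit of the "balanced" heuristic popularized by scikit-learn. A pure-Python sketch using the approximate 80/20 payload share noted above:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency class weights, w_c = N / (K * n_c), where N is the
    sample count, K the number of classes, and n_c the count of class c."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

# An 80/20 payload-heavy label distribution, as in the MMT archive.
labels = ["payload"] * 80 + ["rocket body"] * 20
weights = balanced_class_weights(labels)
```

Under this weighting, misclassifying a rocket body costs four times as much as misclassifying a payload, counteracting the archive's class imbalance during training.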
Orbital-regime coverage also differs substantially across datasets (
Figure 5B). MMT is heavily concentrated in LEO (89.2%), with smaller fractions in MEO (6.4%) and limited representation of higher orbits. In contrast, SDLCD provides broader coverage of higher-altitude regimes (43.8% MEO, 12.7% GEO, 26.8% HEO), while the Ukrainian Database and Atlas is almost exclusively LEO focused (94.7%), reflecting its design for short-duration, high-cadence observations of low-altitude objects. These differences highlight the strong observational biases introduced by sensor capability, survey strategy, and target selection, which directly affect the generalizability of ML/DL classification models trained on any single archive. These dataset-level biases likely contribute to the variability in reported model performance across studies, particularly when models trained on one archive are evaluated under different orbital, sensor, or label distributions.
In addition to orbital and class distributions, object size information was examined using geometric cross-sectional area estimates from ESA’s DISCOS database [
62]. Geometric cross-sectional area represents the projected surface area of an RSO and serves as a natural size-based discriminator that can improve classification performance. However, DISCOS does not provide cross-sectional information for most debris fragments. Consequently, the size analysis in
Figure 6 is restricted primarily to payloads and rocket bodies, potentially underrepresenting the smaller end of the size distribution. MMT primarily observes larger objects (median ∼10 m²), while incorporating SDLCD and the Ukrainian Database broadens the observed size range to approximately 1–100 m², yielding a more diverse training distribution that can reduce size-related biases in learned decision boundaries.
5. Methods for Light-Curve-Based RSO Classification
Prior research on light-curve-based classification of RSOs has progressed through three methodological paradigms: (i) physics- and estimation-driven inference pipelines that recover attitude-related parameters and apply thresholding or rule-based labeling; (ii) supervised machine learning using handcrafted features extracted from photometric time series; and (iii) DL approaches that learn task-relevant representations directly from raw or minimally processed light curves. These paradigms differ in the degree of prior knowledge they assume, the amount of manual feature design required, and their robustness to real-world effects such as irregular sampling, atmospheric turbulence, and sensor-dependent calibration.
This section synthesizes classification-oriented ML/DL methods used in the reviewed studies and organizes them around two recurring design choices: (1) modeling approach and input representation, i.e., whether the pipeline relies on handcrafted features with conventional ML classifiers or learns representations end-to-end from raw or transformed light curves; and (2) task definition (label space), i.e., which semantic categories are being inferred (e.g., attitude state, coarse object type, platform/bus family, or hierarchical label structures).
Table 3,
Table 4,
Table 5 and
Table 6 consolidate datasets, algorithm families, target label spaces, and reported accuracies. The reviewed studies are grouped into simulated studies and real-data studies. This organization helps distinguish results obtained under synthetic conditions from those evaluated on observational datasets. When multiple datasets are evaluated within a single study, accuracies are reported per dataset; when multiple algorithms are evaluated on the same dataset, only the best-performing result is shown. Across
Table 3,
Table 4,
Table 5 and
Table 6, classification tasks are described using a consistent taxonomy: attitude state (e.g., stable vs. tumbling), object type (e.g., payload, rocket body and debris), platform/family (e.g., Starlink, Iridium, and standardized bus families), shape/size proxies, rotation state/dynamics, and hierarchical or hybrid tasks combining multiple semantic levels. The performance values presented in the tables correspond to the best-reported result from each study, with the best-performing model highlighted in bold in the algorithm column. Wherever available, mean accuracy or test accuracy is reported; otherwise, alternative evaluation metrics provided by the original study are included.
Table 3 summarizes traditional ML approaches for RSO classification using simulated light curves reported in the literature. The simulated data are generated using several photometric and rendering models, including the Ashikhmin–Shirley [
63], Cook–Torrance BRDF [
64], Blender-based simulations [
65], and other synthetic observational scenarios. Across these studies, a broad range of algorithms has been explored, including bagged trees [
66], Random Forest (RF) [
67], Logistic Regression (LR) [
68], Naive Bayes (NB) [
69], k-Nearest Neighbors (k-NN) [
70], Neural Networks (NNs) [
71], CN2 [
72], Decision Trees (DT) [
73], Support Vector Machines (SVMs) [
74] and XGBoost [
75] combined with Wavelet Scattering Transform (WST) [
76]. Among these methods, RF and SVMs are the most commonly used and consistently among the strongest performers. The corresponding classification tasks include object type, attitude state, and shape/configuration, with most simulated-data studies reporting high performance, often exceeding 90% accuracy.
Table 4 tabulates traditional machine learning approaches applied to real observational light curves. The studies draw on several public databases, including MMT, EOS, EGTN, IWF SPARC, and the Ukrainian database, while some also use internally curated private light-curve datasets. In addition to the methods listed in
Table 3, these studies explore a broader set of algorithms, including Stochastic Gradient Descent (SGD) [
77], Cost-Sensitive Random Forest (CSRF) [
78], Linear Discriminant Analysis (LDA) [
79], subspace K-NN [
80], Feedforward Neural Networks (FFNNs) [
81], Hidden Markov Model–Random Forest (HMM-RF) [
82], 1-NN with Euclidean Distance (ED) [
83], and 1-NN with Dynamic Time Warping (DTW) [
84]. Across the observational studies, RF and SVMs are the most commonly used and consistently among the strongest performers on real light curves as well. The classification tasks are broadly similar to those considered for simulated light curves, including object type, attitude state, and shape/configuration, although several real-data studies also place greater emphasis on platform- and family-level discrimination. Overall, the reported performance on real light curves is generally lower than that obtained on simulated datasets, reflecting the greater noise, variability, and labeling challenges present in observational data.
Table 5 provides an overview of deep learning approaches for simulated light-curve-based RSO classification. In addition to the simulation and modeling frameworks summarized in
Table 3, these studies also employ Phong [
63] and Beard–Maxwell [
85] reflectance models. The simulated light curves are generated predominantly for GEO scenarios. The reviewed papers investigate a broad range of DL architectures, including convolutional neural networks (CNNs) [
86], fully connected neural networks (FCNNs) [
87], deep neural networks (DNNs) [
88], ENDE/ENCLA variants [
89,
90,
91], long short-term memory networks (LSTMs) [
92], multi-scale convolutional neural networks (MCNNs) [
93], LSTM-FCNs [
94], and convolutional autoencoder (CAE)-CNN variants [
95]. The associated classification tasks are broadly consistent with those in the traditional ML literature, spanning object type, attitude state, and shape/configuration, with several DL studies placing greater emphasis on explicit shape-class prediction. Reported accuracies are generally high, often exceeding 95%, although these values are not directly comparable because of differences in datasets, preprocessing, and evaluation settings.
Table 6 summarizes the deep learning approaches developed for RSO classification using observational light curves. Beyond the datasets introduced earlier, the reviewed studies also incorporate IWF SPARC and a private dataset transformed into short-time Fourier transform (STFT) representations. Transfer learning has additionally been explored, particularly through the adaptation of models pretrained on Blender-simulated light curves to the EOS dataset. Recent studies have introduced a range of advanced architectures, including model-agnostic meta-learning (MAML) [
96], ConvLSTM-CNN models [
97], CoAtNet [
98], HRCNN [
99], Transformer [
100], 1D-ResNet [
101], LC-VAE [
102], and Barlow Twins [
103]. Across the reviewed studies, MMT is the most frequently used observational dataset, and CNN-based variants remain the most common and often the best-performing approaches. Nevertheless, classification performance on observational light curves is generally lower than that reported for simulated data, as expected given the greater measurement noise, class overlap, and variability present in real observations.
Taken together,
Table 3,
Table 4,
Table 5 and
Table 6 illustrate a clear progression in light-curve-based RSO classification, from feature-engineered pipelines with conventional learners to end-to-end deep learning models that operate on raw or transformed photometric sequences. While both paradigms report strong performance under controlled conditions, their assumptions, data requirements, and failure modes differ substantially. Notably, approaches explicitly targeting domain shift (e.g., transfer learning and meta-learning) report improved performance under synthetic-to-real transfer relative to naive training on limited real data, indicating that robustness, not only architecture choice, drives practical gains under operational conditions. The following subsections examine these approaches in detail, beginning with supervised ML methods based on handcrafted features.
5.1. Supervised ML with Handcrafted Features
Early and classical ML pipelines convert a light curve into a fixed-length feature vector and classify it using conventional supervised learners such as DT, RF, SVM, and k-NN. This design is attractive in SSA settings because it supports smaller datasets, yields compact representations, and enables partial interpretability by linking features to physically meaningful signal properties (e.g., periodicity strength, amplitude statistics, or regression coefficients). Accordingly, feature-based ML remains common for low-cardinality label spaces such as binary attitude-state discrimination (stable vs. tumbling) or coarse object-type classification (payload vs. rocket body vs. debris) [
16,
22,
29,
30,
31,
35].
Feature families: Handcrafted features span multiple domains: (i) spectral and periodic descriptors, including dominant frequency components, peak structure, and cepstral representations to capture rotational signatures [
29]; (ii) regression and parametric fits, where coefficients from polynomial, spline, or Fourier-series fits become discriminative attributes [
16,
30,
33]; (iii) summary statistics (variance, skewness, and robust dispersion measures), sometimes computed via time-series feature toolkits [
31] or TSFresh [
51]; and (iv) multi-resolution representations such as wavelet coefficients, wavelet scattering transforms, or empirical mode decomposition to capture nonstationary behavior [
22,
42,
47].
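As a concrete illustration of families (i)–(iii), the sketch below gives a minimal, hypothetical feature extractor (pure Python, not from any reviewed study): a dominant-frequency estimate via a naive DFT power scan, dispersion and skewness statistics, and a least-squares linear-trend coefficient.

```python
import math
import statistics

def extract_features(lc, dt=1.0):
    """Hypothetical minimal feature extractor for a light curve `lc`
    (brightness samples taken every `dt` seconds)."""
    n = len(lc)
    mean = statistics.fmean(lc)
    centered = [x - mean for x in lc]
    # (iii) summary statistics: dispersion and asymmetry of brightness
    std = statistics.pstdev(lc)
    skew = (sum(c ** 3 for c in centered) / n) / (std ** 3) if std else 0.0
    # (i) spectral descriptor: dominant frequency via a naive DFT power scan
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2):
        re = sum(c * math.cos(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    dom_freq = best_k / (n * dt)  # cycles per second
    # (ii) parametric fit: slope of a least-squares linear trend
    t_mean = (n - 1) / 2
    slope = (sum((i - t_mean) * c for i, c in enumerate(centered))
             / sum((i - t_mean) ** 2 for i in range(n)))
    return [std, skew, dom_freq, slope]

# A 0.05 Hz sinusoid: the spectral feature should recover the spin frequency.
lc = [math.sin(2 * math.pi * 0.05 * i) for i in range(100)]
features = extract_features(lc, dt=1.0)
```

In the reviewed pipelines, such fixed-length vectors would then feed a conventional classifier such as RF or SVM.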
Strengths and limitations: Across studies, feature-based ML can achieve high accuracy on simulated or carefully curated datasets [
15,
29,
34]. However, performance is sensitive to preprocessing choices (normalization, detrending, and phase folding), feature definitions, and class imbalance. Overall accuracy can be inflated when majority classes dominate, motivating cost-sensitive learning, resampling (e.g., SMOTE), or class-balanced subsets [
16,
31]. These patterns align with the database-level imbalance described in
Section 4.5.
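One common cost-sensitive remedy is to reweight the loss by inverse class frequency, the same heuristic behind scikit-learn's class_weight='balanced' option; a minimal sketch with hypothetical class counts:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so that
    minority classes contribute as much total loss as majority classes."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * m) for cls, m in counts.items()}

# An imbalanced label set resembling payload-heavy public RSO catalogs:
labels = ["payload"] * 80 + ["rocket_body"] * 15 + ["debris"] * 5
weights = inverse_frequency_weights(labels)
# The rarest class (debris) receives the largest weight: 100 / (3 * 5) ~ 6.67
```

These weights can be passed to most conventional learners (e.g., as per-sample weights) to counteract majority-class dominance.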
5.2. Deep Learning for End-to-End Representation Learning
Deep learning methods reduce reliance on manual feature design by learning hierarchical representations from raw sequences or lightly processed inputs (e.g., normalized magnitude sequences or time–frequency transforms). CNNs are widely adopted because convolutional filters capture local motifs in brightness variation and can be applied in one dimension to time series or in two dimensions to time–frequency images (e.g., short-time Fourier transform (STFT) spectrograms) [
20,
36,
37,
40]. Recurrent models, particularly LSTM, explicitly model temporal dependencies and have been used for classification from short observation windows, including LEO stable/tumbling discrimination [
22]. More recent work explores attention-based and hybrid architectures that combine convolutional encoders with temporal modeling components [
38,
47,
49].
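To make the "local motif" intuition concrete, the toy sketch below hand-codes a single 1-D convolutional filter (hypothetical values, not from any reviewed model) that responds strongly to a sharp specular glint and stays silent on a flat light curve; a trained CNN learns banks of such filters from data.

```python
def conv1d(x, kernel):
    """Valid-mode 1-D cross-correlation, as computed by a CNN layer."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def relu(xs):
    return [max(0.0, v) for v in xs]

# A hand-crafted "spike detector": large response to a one-sample flash.
glint_filter = [-1.0, 2.0, -1.0]

flat_curve = [1.0] * 8                                   # steady brightness
glint_curve = [1.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0, 1.0]   # specular flash

score_flat = max(relu(conv1d(flat_curve, glint_filter)))    # 0.0
score_glint = max(relu(conv1d(glint_curve, glint_filter)))  # 6.0
```

The ReLU-plus-max step mirrors the activation and global-pooling stages that turn a filter's response map into a scalar evidence score.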
5.3. Sim-to-Real Generalization and Domain Transfer
A persistent theme across DL studies is the performance gap between models trained on simulated light curves and their deployment on measured observations. This
sim-to-real gap refers to the challenge of transferring models learned in simulation to real-world environments under distribution shift, where discrepancies between simulated and measured data lead to degraded target-domain performance [
104]. In the context of light curves, synthetic data are generated under simplified assumptions about shape, material, attitude dynamics, and noise, whereas real measurements reflect complex illumination geometry, atmospheric effects, calibration differences, and sensor-specific artifacts. Empirically, multiple studies report near-ceiling accuracy on simulated datasets but substantial degradation on real data (
Table 5 and
Table 6) [
20,
39]. Two families of strategies recur in the reviewed literature.
Transfer learning (supervised domain adaptation): Transfer learning typically pretrains a neural network on a large source domain (often synthetic light curves) and then fine-tunes the model on a smaller labeled target domain (observed light curves). More broadly, this setting can be viewed as supervised domain adaptation, which seeks to minimize performance degradation under distribution shift by aligning source and target domains [
105]. This strategy leverages representation reuse and can reduce labeled-data requirements in the observational domain [
26,
41]. A representative example is provided by [
26], who applied transfer learning to a 1D convolutional neural network by pretraining on simulated Blender light curves and fine-tuning on real observational data. When transferring from simulation to the EOS dataset, the approach achieved 78.3% classification accuracy, improving performance by 3% relative to a baseline CNN trained only on EOS. This result provides empirical evidence that simulation-to-real transfer can mitigate domain shift and reduce labeled-data requirements in real light-curve classification.
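The pretrain-then-fine-tune recipe can be illustrated at toy scale. The sketch below is a hypothetical stand-in (a 1-D logistic regression rather than a CNN, with made-up Gaussian "domains"): it trains on a large synthetic source set, then warm-starts further training on a handful of shifted "real" samples, mirroring the structure of the sim-to-real setup described above.

```python
import math
import random

def train_logreg(data, w=None, b=0.0, lr=0.1, epochs=100):
    """SGD logistic regression; pass `w`/`b` to warm-start (fine-tune)."""
    dim = len(data[0][0])
    w = list(w) if w is not None else [0.0] * dim
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))
            g = p - y  # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def accuracy(data, w, b):
    correct = sum(
        ((sum(wi * xi for wi, xi in zip(w, x)) + b) > 0) == (y == 1)
        for x, y in data)
    return correct / len(data)

random.seed(0)
# Large, clean "synthetic" source domain (well-separated classes).
source = [([random.gauss(mu, 0.3)], y)
          for y, mu in ((0, -1.0), (1, 1.0)) for _ in range(200)]
# Tiny "real" target domain with a shifted feature distribution.
target = [([random.gauss(mu, 0.3)], y)
          for y, mu in ((0, -0.4), (1, 1.6)) for _ in range(5)]

w_src, b_src = train_logreg(source)            # pretrain on synthetic
w_ft, b_ft = train_logreg(target, w=w_src, b=b_src,
                          lr=0.05, epochs=50)  # fine-tune on few real labels
```

In the deep-learning setting, the warm-started parameters correspond to reused encoder weights, and only a small labeled observational set is needed for adaptation.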
Few-shot and self-supervised learning: Few-shot learning targets regimes where only a handful of labeled examples exist for new objects or classes and aims to generalize to new tasks from limited supervision by exploiting prior knowledge learned across related tasks [
106]. MAML learns an initialization enabling rapid adaptation to new tasks from limited labeled samples [
39]. Self-supervised learning reduces dependence on labels by training representations using pretext objectives on unlabeled data. Barlow Twins is a redundancy-reduction method that learns invariant representations from augmented views of the same input and has been applied to RSO light curves to improve classification under limited labeled data and varying track durations [
52,
103].
5.4. Model Performance and Evaluation
Table 3,
Table 4,
Table 5 and
Table 6 report accuracies as stated in the reviewed subset of the literature. Accuracy is defined as
Accuracy = (TP + TN) / (TP + TN + FP + FN),
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. While accuracy is widely reported, it is not sufficient for the imbalanced label spaces typical of public RSO datasets (
Section 4.5). A classifier that predicts the majority class can attain deceptively high accuracy without meaningful discriminative capability. Accordingly, when studies reported additional metrics (e.g., precision, recall, F1-score, or per-class performance), these results were considered qualitatively during synthesis, even though the tables focus on accuracy for consistency. This issue is compounded when repeated observations of the same RSO are split at the track level rather than at the object level. In such cases, the model may encounter highly similar light curves from the same object in both training and test sets, inflating reported accuracy by rewarding recognition of previously observed targets rather than true generalization to unseen RSOs. For this reason, object-disjoint splits and class-sensitive metrics such as macro F1, balanced accuracy, and per-class recall provide a more informative assessment of operationally relevant performance than overall accuracy alone. The lack of standardized evaluation protocols and reporting conventions across studies remains a barrier to direct comparability, reinforcing the need for benchmark-ready datasets with defined splits and metric suites.
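The gap between overall and balanced accuracy, and the mechanics of an object-disjoint split, can be sketched as follows (hypothetical labels and object IDs):

```python
from collections import defaultdict

def object_disjoint_split(tracks, test_objects):
    """Split (object_id, light_curve, label) tracks so that no object
    appears in both sets, preventing the train/test leakage described above."""
    train = [t for t in tracks if t[0] not in test_objects]
    test = [t for t in tracks if t[0] in test_objects]
    return train, test

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: immune to majority-class inflation."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# A majority-class predictor evaluated on a 90/10 imbalanced test set:
y_true = ["payload"] * 90 + ["debris"] * 10
y_pred = ["payload"] * 100
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 0.90
balanced = balanced_accuracy(y_true, y_pred)                         # 0.50
```

The degenerate predictor scores 90% overall accuracy yet only 50% balanced accuracy, which is exactly the failure mode that object-disjoint splits and class-sensitive metrics are meant to expose.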
5.5. Summary of Method–Task Alignment
Across the reviewed corpus, model choice is closely tied to label-space complexity. Feature-based ML methods are most effective when categories are coarse and decision boundaries are separable using engineered descriptors (e.g., stable vs. tumbling, or coarse object type) [
16,
22,
29,
31]. Deep learning approaches dominate tasks with higher intra-class variability and nonlinear temporal behavior, including platform- or family-level labeling and higher-granularity classification where discriminative cues are distributed across time and frequency scales [
41,
47,
49]. However, even sophisticated architectures remain constrained by (i) the limited availability of labeled observational data, (ii) repeated observations of the same objects inflating apparent performance, and (iii) heterogeneous preprocessing and evaluation protocols. These constraints motivate the challenges and future directions discussed in
Section 6 and
Section 7.
8. Conclusions
Light-curve-based classification is a promising data-driven approach for characterizing RSOs from passive optical observations. Since 2014, the field has progressed from handcrafted feature extraction and rule-based inference toward deep learning models that learn discriminative representations directly from photometric sequences. This systematic scoping review synthesizes 29 studies (
Table 3,
Table 4,
Table 5 and
Table 6) and consolidates the past decade of ML/DL efforts into a unified reference for current capabilities and remaining gaps. Collectively, the reviewed methods demonstrate potential for distinguishing object type, platform/family, and attitude state, particularly when training data are sufficiently diverse and evaluation is conducted under realistic split assumptions.
Reported success often does not generalize beyond specific datasets and observing conditions due to persistent challenges in data scarcity, class imbalance, multiplicity bias, and limited reproducibility (
Section 6). At present, only three publicly documented repositories are widely referenced in the open literature (MMT, SDLCD, and the Ukrainian Database and Atlas), and the Ukrainian archive has been offline since 2022. Consequently, many reported accuracies primarily reflect dataset-specific properties rather than robust performance across sensors, illumination geometries, and orbital regimes.
This review clarifies the current capabilities and limitations of light-curve-based RSO classification by consolidating evidence from both simulation-based and observational studies. Progress in this field will be driven not primarily by increasingly complex architectures, but by the availability of representative open datasets and consistent benchmarks (
Section 7). Benchmark-ready resources incorporating measured light curves across multiple observation platforms, including ground-based, stratospheric, and space-based sensors, would enable reproducible comparison and provide a common testbed for domain adaptation, few-shot learning, and physics-informed modeling.
As space activity increases and orbital congestion intensifies, reliable, interpretable, and scalable classification pipelines will become increasingly important for space-domain awareness and orbital safety. By addressing the gaps identified here, the community can unlock the full potential of data and AI-driven light-curve classification and accelerate its transition from proof-of-concept demonstrations to operational capability within next-generation SSA frameworks.