Article

Integrating Exploratory Data Analysis and Explainable AI into Astronomy Education: A Fuzzy Approach to Data-Literate Learning

by Gabriel Marín Díaz 1,2

1 Faculty of Statistics, Complutense University, Puerta de Hierro, 28040 Madrid, Spain
2 Science and Aerospace Department, Universidad Europea de Madrid, Villaviciosa de Odón, 28670 Madrid, Spain
Educ. Sci. 2025, 15(12), 1688; https://doi.org/10.3390/educsci15121688
Submission received: 11 November 2025 / Revised: 2 December 2025 / Accepted: 4 December 2025 / Published: 15 December 2025

Abstract

Astronomy provides an exceptional context for developing data literacy, critical thinking, and computational skills in education. This paper presents a project-based learning (PBL) framework that integrates exploratory data analysis (EDA), fuzzy logic, and explainable artificial intelligence (XAI) to teach students how to extract and interpret scientific knowledge from real astronomical data. Using open-access resources such as NASA’s JPL Horizons and ESA’s Gaia DR3, together with Python libraries like Astroquery and Plotly, learners retrieve, process, and visualize dynamic datasets of comets, asteroids, and stars. The methodology follows the full data science pipeline, from acquisition and preprocessing to modeling and interpretation, culminating with the application of the FAS-XAI framework (Fuzzy-Adaptive System for Explainable AI) for pattern discovery and interpretability. Through this approach, students can reproduce astronomical analyses and understand how data-driven methods reveal underlying physical relationships, such as orbital structures and stellar classifications. The results demonstrate that combining EDA with fuzzy clustering and explainable models promotes deeper conceptual understanding and analytical reasoning. From an educational perspective, this experience highlights how inquiry-based and computationally rich activities can bridge the gap between theoretical astronomy and data science, empowering students to see the Universe as a laboratory for exploration, reasoning, and discovery. This framework thus provides an effective model for incorporating artificial intelligence, open data, and reproducible research practices into STEM education.

1. Introduction

In recent years, the integration of real scientific data and computational methods into education has become increasingly relevant for fostering analytical thinking and scientific literacy among students. Astronomy provides a powerful context for this integration, as it combines mathematical reasoning, data interpretation, and the use of open scientific databases. The growing availability of astronomical archives, such as NASA’s JPL Horizons (Giorgini et al., 1996), the Minor Planet Center (Minor Planet Center [MPC], 2024), and the European Space Agency’s Gaia mission (Tanga et al., 2023), has transformed the way in which learners can engage directly with authentic scientific data. This change allows students to visualize, analyze, model, and interpret celestial phenomena using the same tools employed in research.
Within this framework, project-based learning (PBL) has emerged as a highly effective pedagogical strategy for promoting student engagement and inquiry-based exploration (Krajcik & Blumenfeld, 2005). By working on real-world problems, learners develop computational and analytical competencies that bridge theoretical concepts and practical applications. In the context of astronomy, PBL allows students to explore orbital mechanics, photometric properties, and stellar classification through datasets that reflect the complexity and uncertainty of real scientific observations. Moreover, the combination of programming and data analysis supports the development of data literacy, a key competency in modern STEM education.
The first phase of this study focuses on the analysis of comets and asteroids, which serve as a gateway for understanding orbital dynamics and observational geometry. Using the Python library Astroquery (Ginsburg et al., 2019), students retrieve orbital vectors, heliocentric coordinates, magnitudes, and solar elongations from open-access repositories. Through exploratory data analysis (EDA), they learn to visualize heliocentric trajectories, identify temporal patterns, and quantify relationships among key variables. This hands-on process transforms abstract orbital parameters into dynamic representations, reinforcing both conceptual understanding and technical skills.
In the second phase, the approach is extended to stellar data from Gaia DR3, allowing students to investigate the physical and photometric characteristics of stars. Here, the Hertzsprung–Russell (HR) diagram becomes a central analytical tool to study stellar evolution and luminosity–temperature relationships (Langer & Kudritzki, 2014). The dataset is enriched with derived features such as absolute magnitude and color indices, and students apply statistical and computational models to identify natural groupings within the data. The use of fuzzy clustering (Fuzzy C-Means) introduces the concept of soft classification, enabling learners to reason about uncertainty and gradual transitions between stellar types (Marín Díaz et al., 2024).
To guide interpretation and foster deeper understanding, the FAS-XAI (Fuzzy-Adaptive System for Explainable Artificial Intelligence) framework is implemented (Marín Díaz, 2025a). FAS-XAI integrates fuzzy logic for uncertainty modeling (Sugeno & Yasukawa, 1993), adaptive scoring for pattern recognition, and explainable AI (XAI) techniques for model interpretability (Molnar, 2019). In this context, machine learning algorithms such as XGBoost are trained to predict fuzzy memberships derived from clustering, while LIME (Local Interpretable Model-agnostic Explanations) and feature importance analyses provide transparent explanations of the model’s reasoning. The resulting framework helps students connect mathematical models with physical meaning, linking predictive accuracy with interpretability.
Overall, this study demonstrates how open astronomical data, combined with computational modeling and interpretability techniques, can serve as a transformative resource for STEM education. By integrating scientific practice into a pedagogical structure, the project contributes to improving participation and data-based reasoning. It also illustrates how fuzzy and explainable methods can shed light on the uncertainty and complexity inherent in the universe, turning it into both an object of study and a laboratory for learning.
The methodological structure of this study follows a five-phase process inspired by the Knowledge Discovery in Databases (KDD) paradigm (Shafique & Qaiser, 2014):
  • Phase 1 involves the selection and acquisition of data from open astronomical repositories such as NASA JPL Horizons, the Minor Planet Center, and ESA Gaia DR3, focusing first on comets and asteroids and later stellar datasets.
  • Phase 2 addresses data preprocessing and enrichment, including the computation of orbital distances, magnitudes, and derived photometric parameters.
  • Phase 3 consists of an EDA to uncover statistical patterns, temporal trends, and spatial relationships within the heliocentric and stellar contexts.
  • Phase 4 applies Fuzzy C-Means clustering to model uncertainty and identify fuzzy groupings of celestial objects based on their photometric and positional attributes.
  • Phase 5 introduces the FAS-XAI framework, combining fuzzy logic, adaptive scoring, and XAI techniques such as LIME and feature importance analysis to interpret the results.
This progressive structure provides a coherent pathway from raw data to interpretable insights, linking computational modeling with educational objectives in data literacy and scientific reasoning. The work is guided by the following questions:
  • How can EDA, fuzzy logic, and XAI be combined to create a reproducible and pedagogically meaningful workflow for astronomy education?
  • What types of scientific and interpretative insights can students obtain when working directly with real orbital and stellar datasets?
  • How does the FAS-XAI framework support students in understanding uncertainty, classification, and interpretability in data-driven contexts?

2. Related Work

2.1. Astronomy as a Gateway to Scientific Learning

Astronomy has long been recognized as an interdisciplinary domain for fostering scientific literacy, curiosity, and conceptual understanding in education. Through the observation and interpretation of celestial phenomena, students naturally engage with fundamental concepts of physics, mathematics, and data analysis, developing reasoning and abstraction skills that transfer across disciplines.
Recent studies highlight how astronomy-based learning enhances students’ quantitative and scientific literacy, even among non-STEM undergraduates. Uzpen et al. (2019) demonstrated that quantitative literacy correlates strongly with scientific literacy, emphasizing that improving one dimension requires supporting the other, particularly in introductory physics and astronomy courses that attract students with diverse academic backgrounds.
Moreover, astronomy serves as an entry point to scientific culture, where curiosity-driven exploration motivates learners to think critically about evidence and uncertainty. Large-scale longitudinal studies, such as those by Buxner et al. (2018), show that even brief exposure to astronomy courses can positively influence students’ attitudes toward science and their ability to evaluate scientific information, reinforcing astronomy’s value as a “gateway science.”
At the institutional level, initiatives such as the one described by Lee (2017) and national curricular analyses (Rodrigues et al., 2025) underscore that astronomy’s presence in formal education remains limited, despite its high potential for interdisciplinary integration. Projects such as Mission to Stars (Domenech-Casal & Ruiz-Espana, 2017) illustrate how inquiry and project-based learning can transform astronomy into an authentic context for scientific investigation, linking conceptual understanding with creativity, spatial reasoning, and data analysis.
Furthermore, innovative programs such as the one reported by Costa et al. (2025) demonstrate how teacher training initiatives centered on astronomy democratize access to science, particularly in underrepresented communities, by promoting inquiry-based pedagogies and cross-curricular collaboration. In a complementary line of work, Huerta-Cancino and Ale-Silva (2024) explored the use of augmented reality (AR) tools to overcome students’ alternative conceptions about celestial phenomena, reinforcing astronomy’s potential to develop critical thinking and scientific literacy through emerging technologies.
Finally, cultural and historical perspectives, such as those examined by Ferreira et al. (2025) through Greek Astromythology, show how astronomy can bridge science, history, and philosophy, enabling students to contextualize scientific knowledge within broader human narratives. These approaches position astronomy as both a scientific discipline and a cultural gateway to epistemological reflection and interdisciplinary dialog.
In summary, the literature converges on the idea that astronomy provides a unique educational ecosystem where observation, modeling, and inquiry naturally intertwine. When combined with modern pedagogical frameworks, particularly project-based and data-driven learning, it becomes a catalyst for developing scientific reasoning, data literacy, and a reflective understanding of science as a human enterprise.

2.2. Data-Driven Learning and the Integration of Computational Tools

The emergence of open-access astronomical archives from NASA and ESA has enabled educators to incorporate authentic, research-grade datasets into coursework. When combined with Python, Jupyter notebooks, and libraries such as Astroquery and Astropy, students can perform end-to-end workflows of data retrieval, cleaning, modeling, and visualization, engaging with astronomy as a living data science rather than a static body of facts (Akeson et al., 2013; Ginsburg et al., 2019; Giorgini et al., 1996; Tanga et al., 2023).
Recent initiatives have strengthened this approach by developing platforms and infrastructures that facilitate access to professional astronomical data for educational use. For instance, SciServer (Taghizadeh-Popp et al., 2020) provides a collaborative, cloud-based environment integrating Python, R, and SQL within Jupyter notebooks, enabling users, from researchers to students, to analyze data directly from large archives such as the Sloan Digital Sky Survey. Its architecture, designed for scalability and resource sharing, demonstrates how data-intensive astronomy can be effectively adapted for both research and classroom contexts.
Similarly, the renovated Thacher Observatory (Swift et al., 2022) illustrates the pedagogical potential of integrating automated observatories and computational workflows into high-school education. The observatory’s automated photometric and follow-up programs enable students to collect, process, and interpret real data on variable stars, supernovae, and exoplanet transits, fostering an authentic scientific experience grounded in open-source computational tools. These examples showcase how combining physical infrastructure with data-driven computational environments promotes inquiry, critical reasoning, and collaboration.
Pedagogically, these practices align closely with PBL and inquiry-based approaches: learners formulate questions, retrieve data from Gaia, MPC, or JPL Horizons through Astroquery, develop scripts to analyze and visualize the data, and discuss findings in the light of measurement uncertainty and observational constraints.
Such experiences cultivate data literacy, computational thinking, and critical interpretation, competences central to contemporary STEM education (Marín Díaz, 2025b). However, despite the abundance of open data and mature tools, their structured integration into formal educational frameworks remains limited; most initiatives appear as outreach or extracurricular programs rather than systematically embedded curricula. This gap motivates the present work: a reproducible, data-centered educational framework (FAS-XAI) that operationalizes authentic archives and computational tools within PBL to develop critical thinking about evidence, uncertainty, and inference.

2.3. Fuzzy Logic and Uncertainty Modeling in Education

While the use of machine learning and artificial intelligence in education has expanded rapidly, the application of fuzzy logic and fuzzy clustering remains relatively underexplored. Fuzzy systems have been widely used to model uncertainty, partial membership, and human-like reasoning in learning analytics, student assessment, and adaptive systems, offering a more nuanced representation of cognitive processes than crisp classification (Sugeno & Yasukawa, 1993). These approaches can capture gradations in learner performance, confidence, or conceptual understanding, yet their pedagogical deployment is still fragmented.
In contrast, within astronomical and astrophysical research, fuzzy clustering has become a consolidated methodological tool to manage noisy, incomplete, or overlapping data. Applications range from solar flare prediction using hybrid supervised–unsupervised fuzzy models (Benvenuto et al., 2018), to segmentation of solar EUV images for coronal structure analysis (Barra et al., 2009), solar radio spectrogram segmentation (Liu et al., 2024), and asteroid and exoplanet classification (Colazo et al., 2022; Szabó et al., 2023). Similarly, fuzzy clustering methods have been applied to study star-forming regions and molecular clouds, identifying partially overlapping physical regimes (Offner et al., 2022), as well as to galaxy morphology when combined with explainable AI techniques (Marín Díaz et al., 2024). These examples illustrate the effectiveness of fuzzy methods in domains where uncertainty is intrinsic and categorical boundaries are fluid, a condition that mirrors real-world scientific reasoning.
However, despite their success in modeling uncertainty across physical sciences, fuzzy methods remain largely absent from educational practice. Their integration into teaching could provide students with a tangible means of understanding how scientific data often embody degrees of membership, ambiguity, and uncertainty, rather than absolute categories. Introducing fuzzy clustering or inference systems in classroom settings, using open astronomical datasets and accessible tools such as Python and Astroquery, can help students explore uncertainty quantitatively while developing critical and epistemological awareness about the nature of scientific inference.
This alignment between fuzzy reasoning, open data, and interpretability directly supports the objectives of data-driven education and XAI, making fuzzy approaches a promising pedagogical bridge between machine reasoning and human understanding of uncertainty.

2.4. XAI in Educational Contexts

The rapid growth of artificial intelligence applications across educational settings has intensified the demand for transparency, interpretability, and accountability, giving rise to Explainable Artificial Intelligence. In education, XAI has been examined both as a pedagogical tool to help learners understand how predictive systems reach their decisions, and as a didactic principle to foster reflective reasoning and epistemic awareness toward algorithmic processes (Khosravi et al., 2022).
Recent studies in learning analytics and intelligent tutoring systems show that explainable models increase trust and metacognitive engagement, allowing students to visualize how data features affect predictions (Karpouzis, 2024; Lin et al., 2023). Methods such as LIME, SHAP, and feature-importance visualization make the internal logic of models explicit, bridging algorithmic reasoning and human interpretation (Marín Díaz et al., 2025).
Despite this progress, a search in the Web of Science (2025) combining “Explainable Artificial Intelligence”, “Education”, and “Astronomy/Astrophysics” yielded no results. This absence confirms that the intersection between XAI, fuzzy systems, and astronomy-based learning remains unexplored. Yet such a combination could offer an unprecedented methodological bridge: interpretable fuzzy models capable of revealing and explaining complex patterns in astronomical data while cultivating students’ reasoning about uncertainty, evidence, and inference. Within this perspective, the FAS-XAI framework introduces explainability not merely as a computational requirement but as a pedagogical strategy for fostering scientific literacy and interpretive thought in data-driven astronomy education.

2.5. Summary and Research Gap

The reviewed literature shows that astronomy has long been valued as a gateway discipline for promoting scientific literacy, curiosity, and interdisciplinary learning. Recent developments in open-access astronomical archives and computational tools, such as Python, Astroquery, and Jupyter notebooks, have opened new opportunities for authentic, data-driven learning experiences. These resources allow students to engage directly with real data, bridging scientific inquiry and education through project-based learning environments.
At the same time, the rapid evolution of Artificial Intelligence (AI) in education has stimulated interest in adaptive systems and explainability. Fuzzy logic provides a powerful conceptual and mathematical framework to model uncertainty and partial reasoning, while XAI contributes interpretive transparency to machine learning outcomes. Yet, despite advances in each area, there is currently no integrated framework that combines open astronomical data, fuzzy modeling, and XAI for educational purposes. The absence of results in the Web of Science linking XAI, fuzzy systems, and astronomy education confirms the novelty of this intersection.
The FAS-XAI framework addresses this gap by integrating data-driven astronomy with interpretable AI methods in a pedagogically coherent way. Its educational contribution lies in shifting the focus from using AI as a generative or computational aid to understanding AI as an epistemic and reasoning tool. Students are encouraged not only to generate code but also to explain how models work, justify their analytical decisions, and relate model behavior to scientific evidence.
By engaging with real astronomical datasets, where uncertainty, ambiguity, and complexity are inherent, learners develop critical thinking, metacognitive reflection, and an authentic appreciation of the scientific process.

3. Methodology

This study applies the FAS-XAI framework as an educational and analytical methodology that bridges astronomy, data science, and artificial intelligence through PBL.
The objective is to help students acquire data literacy, computational reasoning, and interpretability skills by working directly with real astronomical data. The methodological process integrates open-access databases, Python programming, fuzzy logic, and explainable models, guiding learners through the full data analysis pipeline, from raw data acquisition to interpretable knowledge extraction.
The study was implemented in two complementary contexts. The first focuses on the orbital dynamics of comets and asteroids, using positional and photometric data from NASA’s JPL Horizons and the Minor Planet Center. The second addresses the classification of stars, employing astrometric and photometric parameters from the Gaia DR3 catalog. All datasets were retrieved programmatically using the Astroquery Python library, which provides access to professional astronomical databases via API queries. This ensures both reproducibility and alignment with real-world scientific workflows. All analyses were executed in Google Colab using Python 3.12 and standard scientific libraries.
Following the Knowledge Discovery in Databases (KDD) paradigm, the methodology unfolds in sequential yet interconnected phases (Figure 1).
In the data selection and extraction phase, representative subsets of Solar System objects and bright stars (Gmag < 13) were obtained, filtered, and standardized. For comets and asteroids, heliocentric and geocentric vectors were extracted using astroquery.jplhorizons and astroquery.mpc, while stellar data were collected through astroquery.gaia and astroquery.vizier from Gaia DR3. Each dataset was cleaned to remove incomplete records and harmonized in terms of units and timestamps, producing a consistent analytical base.
During the preprocessing and feature engineering phase, additional variables were computed to describe the physical and geometric properties of celestial bodies. For Solar System objects, derived features included heliocentric distance (r), geocentric distance (Δ), solar elongation, and angular velocity. For stars, the absolute magnitude (M_G) and color index (BP_RP) were calculated, allowing the construction of Hertzsprung–Russell (HR) diagrams that reveal intrinsic relationships between luminosity and temperature. These transformations converted the raw observational data into meaningful scientific descriptors, enabling statistical and machine learning analysis.
The EDA stage provided a descriptive and visual understanding of the datasets. Statistical summaries, correlation matrices, and temporal plots of brightness, elongation, and distance helped to identify observational patterns and orbital periodicities in comets and asteroids. In the stellar dataset, two-dimensional HR diagrams displayed natural groupings corresponding to main-sequence stars, giants, and subgiants. This phase served both analytical and pedagogical purposes, encouraging students to identify relationships and anomalies through direct visual reasoning.
To model these latent structures, the Fuzzy C-Means (FCM) algorithm was applied to the stellar data. Using the color index ($BP - RP$) and the absolute magnitude ($M_G$), three fuzzy clusters were identified, each representing a distinct stellar population. Unlike traditional crisp clustering, FCM assigns a degree of membership ($\mu_0$, $\mu_1$, $\mu_2$) to each observation, capturing gradual transitions between stellar types. The resulting fuzzy partition demonstrated strong internal coherence, as indicated by a high partition coefficient (FPC). The centroid positions corresponded to physically interpretable regions of the HR diagram, approximately representing giants, main-sequence stars, and high-luminosity objects.
In the predictive and explainable modeling phase, the fuzzy results were validated and interpreted using machine learning. An XGBoost classifier was trained to predict discrete cluster assignments, and three XGBoost regressors were developed to estimate the fuzzy memberships of each star. Both achieved high consistency, with correlations between predicted and true memberships exceeding 0.9 for all clusters. Feature importance analysis revealed that the color index ($BP - RP$) and the absolute magnitude ($M_G$) were the most discriminative variables, a result consistent with astrophysical theory. Local explanations using LIME further clarified how specific features contributed to each classification, providing interpretable reasoning for both students and educators.
Finally, the FAS-XAI framework integrates these computational results into a broader educational objective. By engaging students in authentic data analysis and interpretation, it promotes an inquiry-driven learning process where AI is not a black box but an interpretable tool for scientific understanding. The iterative workflow, from data extraction and cleaning to fuzzy modeling and explainability, helps learners develop both technical proficiency and conceptual depth. This methodological design exemplifies how open astronomical data and explainable AI can jointly foster critical thinking, data literacy, and curiosity about the Universe in educational contexts.

3.1. Data Collection and Dataset Construction

The data used in this study were obtained entirely from open-access astronomical repositories, ensuring both reproducibility and educational transparency. Two complementary datasets were constructed to illustrate different analytical and pedagogical dimensions of the FAS-XAI framework: one focused on small Solar System bodies (comets and asteroids), and the other on stellar objects derived from the Gaia mission. Together, these datasets allow students to experience the complete data-science cycle, from raw data acquisition to the interpretation of explainable models, using real scientific data.
For the first dataset, heliocentric and geocentric vectors of selected comets and asteroids were retrieved from NASA’s Jet Propulsion Laboratory (JPL) Horizons service and the Minor Planet Center database. The extraction was performed programmatically through the Astroquery Python library, which provides a direct API connection to these professional repositories. The queries included a representative set of objects, five comets (e.g., Halley, Encke, Churyumov–Gerasimenko, Hartley 2, Tsuchinshan–ATLAS) and five asteroids (e.g., Ceres, Vesta, Eros, Phaethon, Ryugu), for a time span of six years (five years backward and one forward from the current epoch). For each object, the retrieved parameters included position vectors (x, y, z) in astronomical units, velocities (vx, vy, vz), heliocentric distance (r), geocentric distance (Δ), solar elongation (degrees), and apparent magnitude. These were stored in structured CSV and Parquet formats under a unified naming convention (vectors_helio_6y_7d.csv), forming the foundation for subsequent visualization and exploratory analysis.
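The retrieval step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the study's actual notebook: the helper names (`shift_years`, `build_epochs`, `fetch_heliocentric_vectors`) are hypothetical, the 7-day step is only inferred from the file name vectors_helio_6y_7d.csv, and the Sun-centred location code "500@10" is one common choice for heliocentric vectors. The Horizons call needs network access, so Astroquery is imported only inside the function that uses it.

```python
from datetime import date


def shift_years(day, years):
    """Return `day` moved by `years`; 29 Feb falls back to 28 Feb if needed."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


def build_epochs(today, back_years=5, forward_years=1, step="7d"):
    """Build the start/stop/step dictionary accepted by astroquery's Horizons.

    The 7-day step is an assumption, not a value stated in the text.
    """
    return {
        "start": shift_years(today, -back_years).isoformat(),
        "stop": shift_years(today, forward_years).isoformat(),
        "step": step,
    }


def fetch_heliocentric_vectors(target, today):
    """Query JPL Horizons for Sun-centred state vectors (requires network)."""
    from astroquery.jplhorizons import Horizons  # lazy import: optional dependency

    query = Horizons(id=target, location="500@10", epochs=build_epochs(today))
    # The result is an astropy Table with columns such as x, y, z, vx, vy, vz, range.
    return query.vectors()


# Offline part only: show the six-year window for an example date.
print(build_epochs(date(2025, 12, 15)))
```

Keeping the time-window logic separate from the network call makes the window easy to check without contacting the service; some comet designations may need a more specific identifier than a plain name.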
The second dataset was constructed using data from the Gaia DR3 catalog, accessed via the astroquery.vizier module. A sample of 8000 bright stars with apparent G magnitude below 13 was selected to ensure high photometric accuracy and minimize parallax uncertainty. The retrieved attributes included right ascension (RA), declination (Dec), parallax, proper motion components (pmRA, pmDec), and photometric magnitudes (G, BP, RP). Derived quantities were then computed, such as distance in parsecs (from parallax inversion), absolute magnitude ($M_G$), and color index ($BP - RP$). These parameters were consolidated into a clean, analysis-ready table named stars_gaia_dr3_bright_clean.csv, serving as the input for fuzzy clustering and explainable modeling.
The entire data-acquisition process was conducted within a reproducible Jupyter/Colab environment, allowing students and instructors to replicate every step, from the formulation of database queries to the generation of analysis-ready datasets. The materials developed for this activity include interactive visualizations (e.g., 3D heliocentric orbit animations created with Plotly), designed to support hands-on exploration and promote critical engagement with astronomical data.

3.2. Exploratory Data Analysis (EDA)

The EDA phase served as the bridge between raw astronomical datasets and interpretable scientific insights, allowing students to visualize, question, and understand the structure of real observations. From an educational perspective, EDA functions as a cognitive laboratory, where learners move beyond descriptive statistics to engage in pattern recognition, hypothesis generation, and critical reasoning about the data itself.
Comets and Asteroids
For the small-body dataset, the analysis began by inspecting heliocentric and geocentric coordinates retrieved from NASA’s JPL Horizons and the MPC (Giorgini et al., 1996; Minor Planet Center [MPC], 2024). Students learned how to compute orbital positions and visibility conditions by analyzing quantities such as heliocentric distance $r$, geocentric distance $\Delta$, and solar elongation $\epsilon$.
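As a minimal sketch of how these three quantities follow from position vectors, the snippet below assumes that the object's and the Earth's positions are both given in astronomical units in the same Sun-centred frame. The function names and the sample vectors are illustrative and do not come from the study.

```python
import math


def norm(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


def viewing_geometry(obj_helio, earth_helio):
    """Return (r, delta, elongation_deg) from heliocentric positions in au.

    r          : Sun-object distance
    delta      : Earth-object distance
    elongation : angle at the Earth between the Sun and the object
    """
    to_obj = [o - e for o, e in zip(obj_helio, earth_helio)]  # Earth -> object
    to_sun = [-e for e in earth_helio]                        # Earth -> Sun
    r = norm(obj_helio)
    delta = norm(to_obj)
    cos_eps = sum(a * b for a, b in zip(to_obj, to_sun)) / (delta * norm(to_sun))
    cos_eps = max(-1.0, min(1.0, cos_eps))  # guard against rounding
    return r, delta, math.degrees(math.acos(cos_eps))


# Illustrative case: Earth on the x-axis, object directly "outside" it.
r, delta, eps = viewing_geometry((2.0, 0.0, 0.0), (1.0, 0.0, 0.0))
print(f"r = {r:.2f} au, delta = {delta:.2f} au, elongation = {eps:.1f} deg")
```

An elongation near 180 degrees corresponds to opposition, when the object is best placed for night-time observation, while small elongations mean the object sits close to the Sun in the sky.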
Orbital motion was contextualized through Kepler’s third law,
$$T^2 = \frac{4\pi^2}{GM} a^3,$$
where $T$ is the orbital period, $a$ the semi-major axis, $G$ the gravitational constant, and $M$ the solar mass.
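A short worked example of Kepler's third law may help here. The sketch below assumes a body of negligible mass orbiting the Sun and uses standard values for the solar gravitational parameter and the astronomical unit. The function name and the sample semi-major axes (roughly Earth-like, Ceres-like, and Halley-like) are illustrative approximations, not values taken from the study's dataset.

```python
import math

GM_SUN = 1.32712440018e20        # solar gravitational parameter G*M, m^3 s^-2
AU = 1.495978707e11              # astronomical unit, m
JULIAN_YEAR = 365.25 * 86400.0   # seconds


def orbital_period_years(a_au):
    """Orbital period from Kepler's third law, period = 2*pi*sqrt(a^3 / (G*M))."""
    a_m = a_au * AU
    return 2.0 * math.pi * math.sqrt(a_m ** 3 / GM_SUN) / JULIAN_YEAR


# Approximate, illustrative semi-major axes in au.
for a in (1.0, 2.77, 17.8):
    print(f"a = {a:5.2f} au -> period = {orbital_period_years(a):6.2f} yr")
```

In solar units the relation reduces to period in years equal to the semi-major axis in au raised to the power 3/2, which students can verify against the printed values.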
By plotting $r(t)$ and apparent magnitude $m_V(t)$ over time, learners visualized the periodic nature of cometary visibility and how geometric alignment affects observational feasibility. Interactive 3D visualizations of orbital trajectories (using Plotly in Python/Google Colab) enabled direct exploration of synodic cycles and orbital inclination.
Stellar Dataset
For the stellar dataset extracted from Gaia DR3 (Tanga et al., 2023), the EDA focused on constructing and interpreting the Hertzsprung–Russell (HR) diagram, a cornerstone in stellar astrophysics and an ideal educational context for data-driven reasoning.
Students calculated the absolute magnitude $M_G$ from the apparent magnitude $G$ and the parallax $\pi$ (in arcseconds) via the distance modulus:
$$M_G = G + 5 \log_{10}(\pi) + 5$$
They also derived the color index ($BP - RP$) to identify stellar populations, revealing the main sequence, red giants, and white dwarfs. Data filtering was performed using the renormalized unit weight error ($\mathrm{RUWE} < 1.6$), helping students to reflect on measurement quality and uncertainty propagation.
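The feature-engineering step can be illustrated with a small sketch. It assumes parallaxes in milliarcseconds (the unit used in the Gaia catalog), converts them to arcseconds before applying the distance modulus, and keeps only stars below the RUWE threshold quoted in the text. The dictionary keys, function names, and sample stars are hypothetical and chosen only for illustration.

```python
import math


def absolute_magnitude(g_mag, parallax_mas):
    """Absolute G magnitude from apparent magnitude and parallax in mas."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive for a simple inversion")
    parallax_arcsec = parallax_mas / 1000.0
    return g_mag + 5.0 * math.log10(parallax_arcsec) + 5.0


def enrich(stars, ruwe_max=1.6):
    """Filter by RUWE and add distance, absolute magnitude and color index."""
    rows = []
    for s in stars:
        if s["ruwe"] >= ruwe_max or s["parallax_mas"] <= 0:
            continue  # drop poor astrometric solutions and unusable parallaxes
        rows.append({
            "distance_pc": 1000.0 / s["parallax_mas"],  # parallax inversion
            "M_G": absolute_magnitude(s["g"], s["parallax_mas"]),
            "bp_rp": s["bp"] - s["rp"],                 # color index BP - RP
        })
    return rows


# Made-up sample records, not Gaia data.
sample = [
    {"g": 10.0, "bp": 10.4, "rp": 9.6, "parallax_mas": 100.0, "ruwe": 1.0},
    {"g": 12.0, "bp": 12.9, "rp": 11.2, "parallax_mas": 5.0, "ruwe": 2.3},
]
for row in enrich(sample):
    print(row)
```

A star at 10 pc has equal apparent and absolute magnitudes, which gives students a quick sanity check on the formula. Simple parallax inversion is only reasonable for bright, nearby stars with small relative parallax errors.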
Through guided exercises, learners analyzed distributions of parallax, proper motion, and apparent brightness, linking statistical properties (mean, variance, skewness) to their physical interpretation. These analyses fostered discussions about data completeness, observational bias, and error characterization, central skills in modern data literacy.
Pedagogically, this EDA stage represented a structured exploration environment, where inquiry and visualization preceded formal modeling. By modifying queries and visualization parameters, students experienced the iterative nature of scientific discovery, seeing immediate feedback from the data. This hands-on exploration embodies the essence of PBL: curiosity-driven investigation supported by computational reasoning and reproducible workflows in Python/Google Colab.
The quantitative results of this stage, together with the EDA, are presented and discussed in Section 4 (Results and Discussion).

3.3. Fuzzy C-Means Clustering

After the exploratory phase, the next step involved the application of FCM clustering to identify latent structures within the stellar dataset. This algorithm is particularly suited for educational purposes, as it introduces students to the concept of soft classification, where each object can belong to multiple groups with different degrees of membership. Such reasoning closely mirrors the uncertainty inherent in scientific modeling, reinforcing the development of critical thinking and data literacy skills.
The FCM algorithm minimizes the following objective function:
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m}\, \lVert x_i - c_j \rVert^2,$$
where N is the number of samples, C the number of clusters, x_i the feature vector of the i-th object, c_j the centroid of cluster j, and u_ij the degree of membership of x_i in cluster j. The parameter m > 1 controls the fuzziness of the classification.
Membership degrees and cluster centroids are updated iteratively according to:
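$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}},$$
$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}.$$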
The quality of the fuzzy partition is evaluated using the Fuzzy Partition Coefficient (FPC):
$$\mathrm{FPC} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{2},$$
with higher values indicating clearer cluster separation.
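The update rules and the FPC above can be condensed into a short NumPy routine. The sketch below is an illustrative implementation written for this section, not the code used in the study; the function name, the random initialization, and the stopping tolerance are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Minimal Fuzzy C-Means: returns (centers, U, fpc).
    X has shape (N, d); U has shape (N, C) with rows summing to 1."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Random initial memberships, normalized per sample
    U = rng.random((N, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Centroid update: weighted mean with weights u_ij^m
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Membership update from distances to every centroid
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Final centroid update so centers match the returned memberships
    Um = U ** m
    centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
    fpc = float(np.sum(U ** 2) / N)  # Fuzzy Partition Coefficient
    return centers, U, fpc
```

Because the Euclidean distance is scale-sensitive, features with different ranges (such as color and absolute magnitude) would typically be standardized before calling such a routine; whether the study did so is not stated here.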
In this study, FCM was applied to the Gaia DR3 stellar sample using color (BP − RP) and absolute magnitude (M_G) as input features. The model aimed to reproduce the main structural patterns of the Hertzsprung–Russell diagram, specifically, the main sequence, red giants, and white dwarfs.
Students explored how modifying parameters such as the fuzzifier m or the number of clusters C alters the resulting classification, thereby understanding the balance between model flexibility and interpretability.
Through interactive visualizations, the FCM stage became an opportunity to conceptualize uncertainty as a continuous property of natural systems, rather than a source of error. In doing so, students engaged with fuzzy logic as both a computational technique and a philosophical lens for reasoning under ambiguity.
The specific results of the clustering process, including centroid positions, membership distributions, and visual interpretations, are presented in Section 4 (Results and Discussion).

3.4. Predictive and Explainable Modeling

The final stage of the FAS-XAI methodological framework combines supervised machine learning with XAI to evaluate and interpret the fuzzy classification results. After the unsupervised phase (Fuzzy C-Means) identified latent structures within the dataset, predictive modeling was employed to validate these groupings and quantify the discriminative relevance of the underlying variables.

3.4.1. Predictive Layer: XGBoost Classifier and Regressors

The predictive component was implemented using the Extreme Gradient Boosting (XGBoost) algorithm, chosen for its interpretability, computational efficiency, and robustness to non-linear relationships. Two complementary modeling strategies were designed:
1. Multiclass Classification:
A gradient boosting classifier (objective = multi:softprob) was trained to predict the most probable cluster label identified by the fuzzy segmentation. The model received as input the same feature space used in the clustering process, such as stellar color index (BP − RP) and absolute magnitude (M_G), and produced a probability distribution over all clusters.
Its training objective minimized the multinomial logistic loss:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{C} y_{ik} \log(p_{ik}),$$
where y_ik is the true class indicator (1 if sample i belongs to class k, 0 otherwise) and p_ik the model-predicted probability.
This predictive stage allowed students to verify how well the fuzzy-derived categories could be learned by a standard machine learning model, connecting unsupervised discovery with supervised generalization.
2. Regression-Based Fuzzy Reconstruction:
In parallel, a set of XGBoost regressors (C models) was trained to predict the fuzzy membership degrees (μ_ij) for each cluster. This approach provided a continuous validation of the fuzzy logic layer, quantifying how accurately the model could reproduce partial memberships instead of discrete labels. The reconstruction consistency was later assessed through metrics such as R², RMSE, and Pearson correlation.
Together, these two predictive paths, classification and regression, illustrate how learners can evaluate both the accuracy and fidelity of computational models to the underlying fuzzy structures.
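The evaluation quantities named in the two paths above can be written out directly. The sketch below computes the multinomial logistic loss and the reconstruction metrics (R², RMSE, Pearson correlation) with plain NumPy; it does not train any XGBoost model, and the function names are hypothetical helpers introduced for illustration.

```python
import numpy as np

def multinomial_log_loss(y_onehot, p, eps=1e-12):
    """L = -(1/N) sum_i sum_k y_ik log(p_ik), with probabilities clipped for stability."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    y = np.asarray(y_onehot, dtype=float)
    return float(-np.mean(np.sum(y * np.log(p), axis=1)))

def reconstruction_metrics(mu_true, mu_pred):
    """R^2, RMSE and Pearson correlation between true and predicted memberships."""
    mu_true = np.asarray(mu_true, dtype=float)
    mu_pred = np.asarray(mu_pred, dtype=float)
    resid = mu_true - mu_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    ss_res = float(np.sum(resid ** 2))
    ss_tot = float(np.sum((mu_true - mu_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    pearson = float(np.corrcoef(mu_true, mu_pred)[0, 1])
    return {"r2": r2, "rmse": rmse, "pearson": pearson}

# Toy check: a classifier that predicts uniform probabilities over 3 classes
y = np.eye(3)
p_uniform = np.full((3, 3), 1.0 / 3.0)
print(multinomial_log_loss(y, p_uniform))  # equals log(3) for uniform predictions
```

Reporting R² and Pearson correlation together is useful because a regressor can be perfectly correlated with the memberships yet systematically offset, which R² and RMSE will penalize.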

3.4.2. Explainability Layer: From Feature Importance to Local Interpretation

To promote interpretability and foster reflective thinking, the framework incorporated XAI tools in two dimensions:
  • Global explanations, using feature importance metrics derived from the XGBoost ensemble, allowed students to identify which astrophysical variables most strongly influenced the model’s predictions. Bar plots and aggregated importance scores provided visual insight into how color, brightness, or parallax contribute to stellar classification.
  • Local explanations, based on LIME, were used to explore the behavior of the model around specific examples, representative stars or ambiguous data points. For each instance, LIME builds a locally weighted linear surrogate model:
$$\hat{f}(x) = w_0 + \sum_{j=1}^{p} w_j x_j,$$
where f̂(x) approximates the global model f(x) near the point x, and w_j represents the local contribution of feature j. This helped learners visualize how small variations in physical parameters could change the predicted cluster membership.
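The idea behind the local surrogate can be illustrated without the LIME library itself. The following simplified sketch perturbs a point, weights the perturbed samples by proximity, and fits a weighted linear model; the Gaussian perturbation, the kernel, and all parameter values are assumptions made for illustration and differ from the library's actual sampling and feature-selection steps.

```python
import numpy as np

def local_linear_surrogate(predict_fn, x0, n_samples=500, scale=0.1,
                           kernel_width=0.25, seed=0):
    """Fit f_hat(x) = w0 + sum_j w_j x_j around x0 by weighted least squares.
    predict_fn maps an (n, p) array to n predictions."""
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    # Perturbed neighbours of the instance being explained
    Z = x0 + rng.normal(0.0, scale, size=(n_samples, x0.size))
    y = np.asarray(predict_fn(Z), dtype=float)
    # Proximity weights: closer samples count more
    d = np.linalg.norm(Z - x0, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)
    # Weighted least squares via rescaling rows by sqrt(weight)
    A = np.hstack([np.ones((n_samples, 1)), Z])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]  # intercept w0, local weights w_j

# Toy black box (hypothetical): exactly linear, so the surrogate should recover it
black_box = lambda Z: 1.0 + 2.0 * Z[:, 0] - 3.0 * Z[:, 1]
w0, w = local_linear_surrogate(black_box, x0=[1.0, 4.5])
print(w0, w)
```

For a nonlinear classifier the recovered weights depend on the neighbourhood size, which is a useful discussion point when students compare explanations for typical and ambiguous stars.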
Through these XAI techniques, students were able to move beyond predictive performance metrics to understand and justify model behavior. The inclusion of interpretability within the FAS-XAI structure supported a deeper educational outcome: not merely achieving accurate models but comprehending why those models behave as they do.
From a pedagogical standpoint, the predictive and explainable modeling stage illustrates the integration of data-driven reasoning and critical interpretation. Students transition from descriptive exploration to analytical validation, learning how to reconcile the probabilistic nature of fuzzy logic with the transparency demanded by explainable machine learning.
The quantitative results, including classification accuracy, membership reconstruction quality, and feature interpretability analysis, are presented and discussed in Section 4 (Results and Discussion).

3.5. Strategic Interpretation

The methodological process concludes by integrating two complementary analytical dimensions that illustrate the educational and scientific scope of the study.
On the one hand, the analysis of comets and asteroids, based on data retrieved from NASA’s JPL Horizons and the Minor Planet Center, enables the visualization of heliocentric trajectories and the prediction of future transits of small Solar System bodies. This dynamic modeling allows students to understand orbital behavior, motion patterns, and the predictive value of ephemerides over multi-year intervals.
On the other hand, the FAS-XAI framework is applied to stellar data from Gaia DR3 to perform fuzzy clustering, identify stellar typologies, and explore the relationships between physical parameters through explainable machine learning. This approach combines fuzzy logic for uncertainty modeling with interpretable predictive models, providing insights into how data-driven methods can describe complex astrophysical structures.
Together, these two perspectives illustrate a coherent methodological workflow, linking data acquisition, visualization, fuzzy analysis, and interpretability. This dual framework allows learners and researchers to move from physical representation to computational explanation, bridging astronomical modeling and explainable artificial intelligence.
The resulting analyses and visualizations are presented in Section 4 (Results and Discussion).

4. Results and Discussion

This section presents the main outcomes derived from the application of the proposed methodological framework. The results are organized according to the two analytical perspectives described in Section 3: the exploration and prediction of comet and asteroid trajectories, and the fuzzy and explainable modeling of stellar classifications using Gaia data. The first part focuses on Exploratory Data Analysis, emphasizing descriptive statistics, correlations, and visualizations that help understand the physical and spatial behavior of the observed bodies. The second part introduces the FAS-XAI modeling results, showing how fuzzy clustering and interpretable machine learning contribute to identifying stellar patterns and typologies. From an educational standpoint, both components highlight how real astronomical data and data-driven methods can be used to foster critical reasoning and project-based learning in STEM contexts. Each subsection connects technical outcomes with their pedagogical implications, demonstrating how students can learn by exploring, modeling, and interpreting phenomena in a reproducible scientific workflow.

4.1. EDA of the Case Study

The exploratory data analysis phase aimed to characterize the behavior of the selected astronomical objects and identify significant patterns in their physical and observational parameters. Two complementary analyses were performed: one focused on comets and asteroids within the Solar System, and another on stellar data from the Gaia mission.

4.1.1. Comets and Asteroids

The first dataset comprised a representative selection of five asteroids (Ceres, Vesta, Eros, Phaethon, and Ryugu) and five comets (1P/Halley, 2P/Encke, 67P/Churyumov–Gerasimenko, 103P/Hartley 2, and C/2023 A3 Tsuchinshan–ATLAS).
Ephemerides for each object were obtained using the NASA JPL Horizons API via Astroquery, covering a six-year time window, five years backward and one forward, sampled at seven-day intervals. This yielded a total of 314 epochs per object, resulting in a balanced temporal dataset suitable for multi-object orbital comparison.
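The epoch count can be checked with a few lines of standard-library Python. The sketch below assumes a window from 1 January 2020 to 1 January 2026; these bounds are hypothetical (the exact dates used in the study are not stated), and the dictionary at the end only mirrors the start/stop/step form in which such a window is usually passed to Astroquery's Horizons interface.

```python
from datetime import date, timedelta

def weekly_epochs(start, stop, step_days=7):
    """List of dates from start to stop (inclusive) at a fixed step in days."""
    epochs = []
    current = start
    while current <= stop:
        epochs.append(current)
        current += timedelta(days=step_days)
    return epochs

# Hypothetical six-year window (assumed bounds, not taken from the paper)
start, stop = date(2020, 1, 1), date(2026, 1, 1)
epochs = weekly_epochs(start, stop)

# Same window expressed as a start/stop/step dictionary (assumed request format)
horizons_epochs = {"start": start.isoformat(), "stop": stop.isoformat(), "step": "7d"}
print(len(epochs), horizons_epochs)
```

With these assumed bounds the grid contains 314 dates, matching the number of epochs per object reported above.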
Each observation included heliocentric Cartesian coordinates ( x , y , z ) and derived orbital parameters describing spatial position, apparent motion, and angular geometry.
The main variables integrated into the dataset are represented in Table 1. The correlation matrix between variables is shown in Figure 2.
The strongest positive correlations were observed between r_au and Δ_au (ρ ≈ 1.0), reflecting the geometric coupling between heliocentric and geocentric distances.
Conversely, negative correlations emerged between distance and orbital speed (ρ ≈ −0.58), consistent with Kepler’s second law, as orbital velocity increases near perihelion. Spatial coordinates x, y, z exhibited strong mutual correlations (ρ > 0.85), describing coherent heliocentric trajectories within the ecliptic plane.
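The sign of the distance–speed correlation can be reproduced for an idealized two-body orbit using the vis-viva relation, v² = GM(2/r − 1/a). The sketch below is illustrative only: the orbital elements are assumed round values for an eccentric, short-period orbit, and the orbit is sampled uniformly in true anomaly rather than in time, so the coefficient will not match the value reported for the real dataset.

```python
import numpy as np

GM_SUN = 4.0 * np.pi ** 2  # solar GM in AU^3 / yr^2

def orbit_distance_speed(a_au, ecc, n=360):
    """Heliocentric distance r (AU) and speed v (AU/yr) along a Keplerian ellipse."""
    nu = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)   # true anomaly
    r = a_au * (1.0 - ecc ** 2) / (1.0 + ecc * np.cos(nu))  # conic equation
    v = np.sqrt(GM_SUN * (2.0 / r - 1.0 / a_au))            # vis-viva relation
    return r, v

# Assumed illustrative elements (eccentric, short-period orbit)
r, v = orbit_distance_speed(a_au=2.2, ecc=0.85)
rho = np.corrcoef(r, v)[0, 1]
print(f"Pearson correlation between r and v: {rho:.2f}")
```

Since v decreases monotonically with r on a given orbit, the correlation is negative for any eccentric ellipse; pooling several objects with different orbits, as in the study, weakens its magnitude.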
Figure 3 illustrates the temporal evolution of the geocentric distance (Δ) and solar elongation (Elong) for ten representative small Solar System bodies, including five asteroids and five comets, within a six-year observation window (2020–2026).
The solid line represents the geocentric distance (Δ, in AU), while the dashed line corresponds to the solar elongation (in degrees). The combined analysis of both variables allows for the identification of opposition events (Elong ≈ 180°), where the object is at its maximum visibility from Earth, and of conjunctions (Elong ≈ 0°), where the object becomes optically hidden by the Sun.
Asteroids in the main belt, such as Ceres and Vesta, exhibit regular and sinusoidal oscillations in both Δ and Elong, typical of short and stable orbits (~4.6 years). The peaks of solar elongation coincide with minima in geocentric distance, marking optimal observational windows. Their highly periodic patterns reflect the dynamic stability of the main asteroid belt.
Near-Earth asteroids (NEAs), represented by Eros and Phaethon, show asymmetric and more pronounced oscillations in geocentric distance. These objects experience close encounters with Earth (Δ ≈ 0.5 AU for Eros), followed by longer periods of separation. Their elongation cycles are broader and more irregular, consistent with their eccentric and inclined orbital geometries and frequent opposition events every ~2 years.
The asteroid Ryugu (type Apollo) displays a less periodic and more irregular pattern, with Δ fluctuating between 0.5 and 2 AU. The non-synchronous peaks between Δ and Elong indicate a high orbital inclination, where visibility and distance are not strongly correlated, in agreement with its known oblique orbit.
Comets (Halley, Encke, Hartley 2, Churyumov–Gerasimenko, and Tsuchinshan–ATLAS) show markedly irregular and asymmetric curves, characteristic of high-eccentricity trajectories. Halley presents large peaks (Δ > 5 AU) within this time window, representing a distant orbital segment with no perihelion passage during the studied period. Encke exhibits a compact orbit (~3.3 years), with stable and recurrent Δ minima. Churyumov–Gerasimenko and Hartley 2 reveal moderate periodicity and broad elongation ranges, consistent with their well-studied cometary activity cycles. Tsuchinshan–ATLAS (C/2023 A3) shows a progressive decrease in Δ until 2024, corresponding to its approach toward perihelion, consistent with its predicted trajectory by the Minor Planet Center.
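The link between distances and elongation follows from the Sun–Earth–object triangle: by the law of cosines, r² = R² + Δ² − 2RΔ cos(Elong), where R is the Earth–Sun distance. The sketch below applies this relation and labels oppositions and conjunctions; the fixed R = 1 AU, the 10° tolerance, and the function names are simplifying assumptions for illustration.

```python
import numpy as np

def solar_elongation_deg(r_au, delta_au, earth_sun_au=1.0):
    """Sun-Earth-object angle (degrees) from heliocentric distance r and
    geocentric distance delta, assuming a fixed Earth-Sun distance."""
    cos_e = (earth_sun_au ** 2 + delta_au ** 2 - r_au ** 2) / (2.0 * earth_sun_au * delta_au)
    return float(np.degrees(np.arccos(np.clip(cos_e, -1.0, 1.0))))

def visibility_label(elong_deg, tol_deg=10.0):
    """Coarse label for the observing geometry (tolerance is an assumed value)."""
    if elong_deg >= 180.0 - tol_deg:
        return "near opposition"
    if elong_deg <= tol_deg:
        return "near conjunction"
    return "intermediate"

# Object at r = 2 AU: on the far side of Earth from the Sun (delta = 1 AU)
# and behind the Sun as seen from Earth (delta = 3 AU)
for delta in (1.0, 3.0):
    e = solar_elongation_deg(2.0, delta)
    print(delta, round(e, 1), visibility_label(e))
```

In practice the elongation is returned directly with the ephemerides, so a check like this mainly helps students verify that the tabulated columns are geometrically consistent.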
The histograms in Figure 4 represent the frequency of observations in each bin and reveal distinct dynamical behaviors between the two populations. Asteroids cluster tightly between 0.5 and 3 AU, reflecting their confinement within the inner Solar System, particularly in the main asteroid belt. In contrast, comets exhibit a broader and multimodal distribution, with some exceeding 35 AU, consistent with their highly eccentric and extended orbits originating in trans-Neptunian regions.
Regarding solar elongation, asteroids display a pronounced concentration at low elongation angles (<30°), gradually decreasing toward opposition (≈180°). This indicates that their apparent position is often near the Sun, reducing the time windows for optimal ground-based observation. Comets, however, present a flatter and wider elongation distribution, extending up to 180°, which highlights their greater observational variability. This behavior arises from their inclined and elongated orbital geometries, enabling them to become visible at diverse solar angles during perihelion approaches.
In addition to the temporal analysis of orbital parameters, the spatial motion of each body was rendered in three dimensions using Astroquery and Plotly in Python.
Figure 5 illustrates three snapshots of heliocentric trajectories for the ten selected objects (five asteroids and five comets) obtained from JPL Horizons. Each panel corresponds to a representative epoch within the study window, July 2022, May 2024, and August 2026, depicting the evolution of orbital positions over time. The paths are expressed in astronomical units (AU) relative to the Sun (orange point at the center).
Asteroids such as Ceres and Vesta maintain compact, coplanar orbits typical of the main belt, whereas comets like Halley and Tsuchinshan–ATLAS exhibit highly eccentric and inclined trajectories that extend well beyond the ecliptic. These visualizations highlight the contrast between the dynamical stability of asteroids and the long-period excursions characteristic of comets.
Interactive 3D versions of these trajectories allow students and educators to manipulate the viewpoint and observe orbital evolution dynamically, fostering an active and exploratory understanding of celestial mechanics. Interactive HTML visualization is available in Supplementary Materials, allowing readers to explore the full 3D orbital motion with greater clarity.

4.1.2. Stellar Data—Gaia Mission

The stellar dataset used in this study was obtained from the Gaia DR3 archive through the astroquery.gaia module, which enables direct SQL-like queries via the Astronomical Data Query Language (ADQL). The goal was to extract a representative, high-quality sample of nearby and bright stars to analyze their physical properties and to introduce students to the principles of stellar astrometry, photometry, and classification.
The ADQL query selected 8000 sources with apparent magnitude G < 13, parallax signal-to-noise ratio > 5.0, and RUWE < 1.6, ensuring reliable distance and photometric measurements. The resulting dataset contains 19 variables encompassing positional (ra, dec), geometric (parallax, distance_pc), kinematic (pmra, pmdec), and photometric (g_mag, bp_mag, rp_mag) parameters. Derived attributes such as the color index bp_rp (BP_RP) and the absolute magnitude M_G were calculated to construct the Hertzsprung–Russell diagram, one of the central tools in stellar astrophysics.
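A query of this kind might look like the sketch below. It is a reconstruction from the selection criteria stated above, not the author's exact query: the column list and aliases are assumptions, and the helper `run_gaia_query` (hypothetical name) requires the astroquery package and network access, so it is defined but not called here.

```python
def build_gaia_query(n_sources=8000, g_max=13.0, snr_min=5.0, ruwe_max=1.6):
    """ADQL text selecting bright, well-measured Gaia DR3 sources.
    Column selection and aliases are illustrative assumptions."""
    return f"""
    SELECT TOP {n_sources}
        source_id, ra, dec, parallax, parallax_error, parallax_over_error,
        pmra, pmdec, ruwe,
        phot_g_mean_mag AS g_mag,
        phot_bp_mean_mag AS bp_mag,
        phot_rp_mean_mag AS rp_mag,
        bp_rp
    FROM gaiadr3.gaia_source
    WHERE phot_g_mean_mag < {g_max}
      AND parallax_over_error > {snr_min}
      AND ruwe < {ruwe_max}
    """

def run_gaia_query(query):
    """Submit the query to the Gaia archive (needs astroquery and a network connection)."""
    from astroquery.gaia import Gaia  # imported lazily so the sketch loads without it
    job = Gaia.launch_job_async(query)
    return job.get_results()

query = build_gaia_query()
print(query)
```

Derived columns such as distance_pc and M_G would then be computed locally from the returned parallax and g_mag values.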
Table 2 summarizes the structure, physical meaning, and educational interpretation of each variable included in the dataset. This combination of raw and derived features allows both scientific and pedagogical exploration: students can connect mathematical relationships (e.g., magnitude–distance–luminosity) with their physical significance and visual representation in data analysis workflows.
The histograms in Figure 6 summarize the main physical properties of the selected stellar sample.
The main patterns observed in the distributions are as follows:
  • The color index (BP_RP) shows a clear bimodal structure, with a dominant peak near 1.0 mag, corresponding to solar-type (G–K) stars, and a smaller group at BP_RP < 0.5, associated with hotter A–F type stars. This distribution reflects the prevalence of mid-temperature stars in the local Galactic neighborhood.
  • The absolute magnitude (M_G) histogram spans from –5 to +10 mag, with two visible concentrations: a bright group (M_G ≈ 0) representing giants and subgiants, and a broader peak around M_G ≈ 4, corresponding to main-sequence stars. This separation anticipates the morphology observed in the Hertzsprung–Russell diagram, Figure 7.
  • The distance distribution follows an exponential-like decay, concentrated below 2000 parsecs, consistent with the imposed quality filters (parallax_over_error > 5.0) that favor nearby, well-measured stars.
The HR diagram in Figure 7 displays the classical structure of stellar populations in the solar neighborhood, derived directly from Gaia DR3 data.
The vertical axis represents the absolute magnitude (M_G), corresponding to intrinsic luminosity, while the horizontal axis represents the color index (BP_RP), a proxy for effective temperature.
A well-defined main sequence extends diagonally from the upper left (hot, luminous blue stars) to the lower right (cool, faint red stars). Above it, a distinct concentration corresponds to the red giant branch, populated by evolved, high-luminosity stars. A smaller, diffuse region below the main sequence contains subdwarfs and white dwarfs, reflecting diversity in stellar evolution stages.
The color scale indicates distance, revealing that nearby stars (in purple) dominate the main sequence, while more distant ones (in green and yellow tones) are mostly giants, a geometric selection effect due to their brightness.
The correlation matrix in Figure 8 highlights the interdependence between the key stellar parameters used in subsequent modeling. The strongest relationships emerge between distance, parallax, and absolute magnitude, as expected from their geometric and photometric definitions.
The most relevant correlations are summarized below:
  • M_G and distance show a strong negative correlation (ρ = −0.75), reflecting that distant stars tend to appear intrinsically brighter in the sample, due to the observational bias favoring luminous objects at large distances.
  • M_G correlates positively with parallax (ρ = +0.6), consistent with the inverse relationship between distance and parallax.
  • The color index (BP_RP) correlates moderately with M_G (ρ = −0.53) and distance (ρ = +0.57), indicating that the redder stars in this sample tend to be luminous giants detected at larger distances, whereas the nearby population is dominated by intrinsically fainter main-sequence stars.
  • Proper motion components (pmra, pmdec) show negligible correlations with the photometric and geometric variables, confirming their statistical independence for this bright-magnitude subsample.
These visual analyses provide an accessible entry point for students into stellar population analysis. By linking photometric and geometric parameters to physical properties such as temperature, luminosity, and spatial distribution, learners can intuitively grasp how astronomical data reflect underlying physical laws. The HR diagram and correlation structures thus serve as pedagogical bridges between observational data and the conceptual understanding of stellar classification and evolution.

4.2. Fuzzy—XAI

To extend the exploratory analysis into an interpretable modeling framework, FAS-XAI was applied to the Gaia DR3 stellar dataset. This phase integrates fuzzy logic, supervised learning, and XAI techniques to analyze stellar typology and evaluate the interpretability of data-driven models.
The process begins with FCM clustering, used to identify latent stellar groups within the color–magnitude space defined by (BP_RP, M_G). Three fuzzy clusters were initialized, corresponding to the main astrophysical categories of the Main Sequence, Giants, and White Dwarfs, thus enabling soft membership assignments for each star. The fuzzy partition coefficient (FPC) was used to evaluate the quality of the clustering, while membership degrees (μ_i) allowed a probabilistic interpretation of stellar transitions between classes.
Subsequently, an XGBoost classifier was trained to predict the discrete cluster labels obtained from FCM, validating the coherence between unsupervised fuzzy segmentation and supervised classification. The probabilistic outputs of XGBoost were compared against the fuzzy membership vectors (μ), achieving high correlations across clusters (ρ ≈ 0.93–0.95), confirming the consistency between fuzzy and predictive perspectives.
Finally, explainability techniques such as LIME were employed to explore local model interpretability, allowing students to visualize how each photometric attribute contributes to the classification of individual stars.

4.2.1. Fuzzy C-Means Clustering Results

The FCM algorithm was applied to the stellar dataset using the photometric variables BP_RP and M_G, which together define the Hertzsprung–Russell (HR) diagram.
Following astrophysical convention, the number of clusters was set to c = 3, corresponding to the major evolutionary regions of the HR diagram: Main Sequence, Subgiant/Transitional Zone, and Giant Branch. This configuration provides a physically interpretable segmentation that balances model simplicity and astrophysical realism, while maintaining smooth transitions between classes, a key motivation for the use of fuzzy clustering rather than crisp partitioning.
Each panel in Figure 9 represents the membership degree (μ) of every star to one of the three clusters discovered by the FCM model. The x-axis shows the color index (BP_RP), a proxy for effective temperature, while the y-axis displays the absolute magnitude M_G (inverted scale, increasing upward with luminosity).
The three clusters identified by Fuzzy C-Means can be interpreted as follows:
  • Cluster 0—Intermediate Main Sequence:
The left panel reveals a continuous yellow-orange band along the central HR diagonal (0.5 ≲ BP_RP ≲ 1.8, 2 ≲ M_G ≲ 6). This region corresponds to F-, G-, and K-type stars, characterized by stable hydrogen fusion in their cores. The gradual decrease of μ toward lower luminosities reflects stars leaving the main sequence toward subgiant stages.
  • Cluster 1—Transitional Zone (Subgiants/Late Dwarfs):
In the central panel, μ peaks slightly below the main sequence (BP_RP ≈ 1.0–2.0, M_G ≈ 4–7). These stars likely represent objects in evolutionary transition, beginning to exhaust hydrogen and evolve toward cooler, less luminous phases. The diffuse overlap with Cluster 0 confirms the continuity of stellar evolution, where transitions occur gradually rather than through discrete boundaries.
  • Cluster 2—Red and Luminous Giants:
The right panel highlights high μ values (yellow tones) in the upper-right region (BP_RP > 1.5, M_G < 2). These stars are red giants and supergiants, showing low surface temperature but high intrinsic luminosity due to their expanded outer envelopes. The fuzzy overlap with Cluster 0 indicates early expansion signs in intermediate stars, reinforcing the interpretive power of fuzzy boundaries.
From both physical and methodological perspectives, this segmentation demonstrates that stellar evolution is inherently continuous and better captured through degrees of membership rather than rigid class assignments. Within the FAS-XAI framework, the fuzzy clustering acts as an unsupervised explainable model, revealing interpretable linguistic rules such as:
If (BP_RP ≈ 1.0 and M_G ≈ 4.5), then μ1 ≈ 0.7, indicating a likely subgiant.
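A rule of this form amounts to evaluating the FCM membership formula at a single point. The sketch below does this for a star at (BP_RP, M_G) = (1.0, 4.5); the centroid coordinates are hypothetical placeholders chosen for illustration, not the fitted centroids of the study, so the resulting memberships are not expected to reproduce the value quoted above.

```python
import numpy as np

def fcm_membership(x, centers, m=2.0):
    """Membership of a single point x in each cluster, given fixed centroids:
    u_j = d_j^(-2/(m-1)) / sum_k d_k^(-2/(m-1))."""
    d = np.linalg.norm(np.asarray(centers, dtype=float) - np.asarray(x, dtype=float), axis=1)
    d = np.fmax(d, 1e-12)  # avoid division by zero when x coincides with a centroid
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

# Hypothetical centroids in (BP_RP, M_G) space for clusters 0, 1, 2
centers = np.array([[0.8, 3.5],
                    [1.1, 5.0],
                    [1.6, 0.5]])
mu = fcm_membership([1.0, 4.5], centers)
print(np.round(mu, 2))
```

Varying the fuzzifier m in such a helper shows directly how memberships flatten toward 1/C as m grows and sharpen toward crisp labels as m approaches 1.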
Table 3 lists representative Gaia DR3 stars with the highest fuzzy membership per cluster (4 per cluster). The μ-vectors confirm the interpretability of the three regions in the HR diagram (main sequence, transition/subgiants, and red giants), providing transparent examples for replication.

4.2.2. Predictive and Explainable Modeling

To validate the fuzzy segmentation obtained in the previous step, two supervised models were trained using XGBoost: a classifier to predict the dominant fuzzy cluster label, and a regressor to estimate the continuous membership degrees (μ0, μ1, μ2).
This dual approach was chosen to ensure both categorical interpretability (clear cluster identification) and quantitative coherence (preservation of fuzzy membership continuity).
The predictors included astrometric, photometric, and quality-related variables extracted from Gaia DR3:
  • Photometric features: bp_rp, M_G, g_mag;
  • Geometric features: parallax, distance_pc, parallax_error, parallax_over_error;
  • Kinematic features: pmra, pmdec;
  • Quality indicator: ruwe.
The target variables were:
  • For the classifier: cluster (0 = main sequence, 1 = subgiant/transition, 2 = red giant);
  • For the regressors: μ0, μ1, μ2, representing the membership degrees obtained by Fuzzy C-Means.
The XGBoost Classifier achieved near-perfect consistency with the fuzzy partitions (Figure 10), yielding an accuracy of 0.994 and an F1-score of 0.994. Misclassifications were rare and occurred mainly between adjacent clusters (C0–C1), reflecting natural overlap between evolutionary phases.
The model reproduces the fuzzy clusters with minimal error, primarily along the transition boundary between main-sequence and subgiant stars.
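The agreement metrics can be computed from a confusion matrix, which also shows where the rare errors fall. The sketch below uses macro-averaged F1 as an assumption, since the averaging scheme behind the reported F1-score is not specified; the labels are toy values, not results from the study.

```python
import numpy as np

def accuracy_and_macro_f1(y_true, y_pred, n_classes):
    """Confusion matrix (rows: true, columns: predicted), accuracy, macro-averaged F1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    denom = precision + recall
    f1 = np.where(denom > 0, 2 * precision * recall / np.maximum(denom, 1e-12), 0.0)
    return cm, float(tp.sum() / cm.sum()), float(f1.mean())

# Toy labels: one star of cluster 0 predicted as the adjacent cluster 1
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]
cm, acc, f1 = accuracy_and_macro_f1(y_true, y_pred, n_classes=3)
print(cm, acc, f1, sep="\n")
```

Off-diagonal entries between clusters 0 and 1 in such a matrix correspond to the boundary confusions described above.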
For the XGBoost Regressors (Table 4), the results confirm an almost one-to-one correspondence with the fuzzy membership functions.
This dual modeling approach allows students to contrast discrete and continuous learning paradigms in astrophysics. While the classifier emulates crisp categorization (typical of conventional ML), the regressors preserve the gradual transitions inherent to stellar evolution, thus reinforcing the conceptual value of fuzzy logic in scientific modeling.

4.2.3. Model Explainability via LIME

To assess the interpretability of the predictive model, the LIME technique was applied from both global and local perspectives. LIME decomposes the XGBoost classifier’s decisions into locally linear contributions, allowing students to visualize which stellar parameters most strongly influence each cluster assignment.
Global Feature Importance
Figure 11 (LIME global) reveals that the absolute magnitude M_G overwhelmingly dominates the model’s decisions, followed by the color index BP_RP.
This aligns perfectly with the astrophysical interpretation of the Hertzsprung–Russell (HR) diagram, where luminosity defines the vertical axis and color (temperature) the horizontal one. The remaining variables, parallax, proper motions, and measurement errors, have minor but stabilizing roles, refining positional and geometric corrections without affecting the main physical classification.
M_G and BP_RP jointly determine the cluster membership, while geometric variables act as secondary refinement.
Local Explanations by Cluster
Local explanations for representative stars in each fuzzy cluster confirm the model’s physical consistency (Figures 12–14):
  • Cluster C0—Subgiants/Transition Region:
0.6 < M_G < 1.9 with intermediate color and low parallax error. These correspond to evolving main-sequence stars beginning their expansion into the giant branch.
Figure 12. Local LIME explanations for the XGBoost classifier, Cluster 0.
  • Cluster C1—Main Sequence:
1.9 < M_G < 3.5 and moderate BP_RP values.
Stars with balanced color–luminosity combinations are classified as stable hydrogen-fusing dwarfs (F–K types).
Figure 13. Local LIME explanations for the XGBoost classifier, Cluster 1.
  • Cluster C2—Red Giants and Supergiants:
M_G ≤ 0.6 and BP_RP > 1.5.
LIME assigns high positive weights to low M_G values (high luminosity), identifying cool, bright stars.
Figure 14. Local LIME explanations for the XGBoost classifier, Cluster 2.
Both global and local interpretability confirm that luminosity and temperature are the principal discriminators of stellar types, as expected from the Hertzsprung–Russell framework. LIME reproduces the same physical hierarchy detected by Fuzzy C-Means, but in a rule-based and human-readable format, for example:
  • If M_G ≤ 0.6 and BP_RP > 1.5; then class ≈ C2 (giant star).
  • If 1.9 < M_G < 3.5; then class ≈ C1 (main sequence).
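The rules listed above, together with the C0 interval reported for the local explanations (0.6 < M_G < 1.9), can be encoded as a small function for classroom use. This is a didactic paraphrase of the thresholds quoted in this section; the function name and the fallback label are assumptions, and the rules are local summaries rather than the classifier itself.

```python
def lime_style_rule(m_g, bp_rp):
    """Approximate class suggested by the human-readable rules quoted in the text."""
    if m_g <= 0.6 and bp_rp > 1.5:
        return "C2"            # giant star
    if 0.6 < m_g < 1.9:
        return "C0"            # subgiant / transition region
    if 1.9 < m_g < 3.5:
        return "C1"            # main sequence
    return "undetermined"      # outside the intervals covered by the rules

# Three illustrative stars given as (M_G, BP_RP)
for star in [(0.2, 1.8), (1.2, 1.0), (2.8, 0.9)]:
    print(star, lime_style_rule(*star))
```

Stars falling outside the quoted intervals return "undetermined", a reminder that such rules describe only the regions where the explanations were sampled.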
Fuzzy C-Means captures a continuous region spanning the intermediate main sequence and its gradual transition toward subgiant phases, while LIME highlights the most discriminative frontiers, where stellar evolution begins to deviate from hydrogen-core fusion equilibrium. This coherence between fuzzy and explainable perspectives reinforces the interpretive robustness of the FAS-XAI framework.
This phase exemplifies how Explainable AI bridges abstract machine-learning models and physical interpretation, enabling students to connect computational reasoning with astrophysical principles in a transparent and conceptually grounded way.

4.2.4. Integrative Interpretation of FAS-XAI Results

The combined application of fuzzy clustering, supervised prediction, and explainable AI reveals a consistent and physically meaningful structure across the stellar sample. Each method highlights a complementary perspective: Fuzzy C-Means delineates continuous evolutionary regions, XGBoost captures the discriminative geometry of these regions, and LIME expresses the decision logic in human-readable terms. Together, they form a transparent framework that mirrors the continuity and transitions inherent in stellar evolution.
Cluster 0—Subgiants and Transitional Stars
In the Fuzzy C-Means representation, Cluster 0 extends along the central diagonal of the Hertzsprung–Russell diagram (0.5 ≲ BP_RP ≲ 1.8, 2 ≲ M_G ≲ 6), encompassing F–K type stars undergoing stable hydrogen fusion. The gradual fading of the membership degree (μ) toward higher M_G values reflects stars that are beginning to evolve off the main sequence. Conversely, LIME narrows this interval to 0.6 < M_G < 1.9, isolating the transitional boundary where stars initiate their expansion into the subgiant branch. The discrepancy is thus not a contradiction but a refinement: Fuzzy C-Means captures the entire evolutionary continuum, while LIME highlights the critical inflection zone where the physical state begins to change.
Cluster 1—Main Sequence Stars
Cluster 1, in the fuzzy model, spans the range 4 ≤ M_G ≤ 7, representing the cooler end of the main sequence (late F to early M dwarfs). These stars maintain high stability and density in the HR plane. LIME redefines this region within 1.9 < M_G < 3.5, focusing on the statistically most discriminative subset, hydrogen-fusing dwarfs of moderate luminosity and color. While FCM perceives the main sequence as a broad continuum of stellar stability, LIME extracts its most distinctive core, where stellar parameters remain tightly correlated and predictable. This alignment confirms the model’s ability to reproduce the canonical shape of the main sequence through data-driven reasoning.
Cluster 2—Red Giants and Supergiants
Both methods converge in identifying a distinct region of high luminosity and red color (M_G ≤ 0.6, BP_RP > 1.5), corresponding to late K–M giants and supergiants. In the fuzzy model, this cluster exhibits the highest partition coefficient and minimal overlap, indicating strong separability from other stellar types. LIME reinforces this result by assigning consistently high positive weights to low M_G values (high intrinsic brightness), confirming that luminosity is the dominant factor driving the classification. This coherence across techniques demonstrates that the fuzzy and explainable layers of FAS-XAI both recover the giant branch as a physically well-defined locus in the HR diagram.
Overall, FAS-XAI succeeds in linking unsupervised discovery, supervised validation, and explainable interpretation within a unified educational workflow. Fuzzy C-Means represents the continuity of stellar evolution; XGBoost provides predictive structure; and LIME translates it into conceptual rules accessible to students and researchers alike. The combination of these views enables a pedagogically rich and scientifically rigorous exploration of stellar populations, illustrating how AI, when made explainable, can reconstruct the same physical laws that astronomers derive from observation.
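To make the idea of a gradually fading membership degree concrete, the standard Fuzzy C-Means membership formula can be evaluated by hand for a single star. The sketch below is illustrative and rests on stated assumptions: the three centers in the (BP_RP, M_G) plane are hypothetical values chosen near the representative stars of Table 3, not the fitted centroids; the fuzzifier m = 2 is a common default rather than a value confirmed by the study; and no feature scaling is applied, although a real analysis would normally standardize the variables first.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard Fuzzy C-Means membership of one point in each cluster.

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), with d_i the Euclidean
    distance from the point to center i.
    """
    dists = [math.dist(point, c) for c in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # A point lying exactly on a center belongs fully to that center.
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


# Hypothetical centers (BP_RP, M_G) for clusters 0, 1 and 2; illustrative only.
CENTERS = [(1.23, 1.37), (0.87, 4.05), (1.95, -0.58)]

if __name__ == "__main__":
    core_star = (1.25, 1.40)    # close to the assumed center of cluster 0
    border_star = (1.00, 2.70)  # roughly midway between clusters 0 and 1
    print([round(u, 4) for u in fcm_memberships(core_star, CENTERS)])
    print([round(u, 4) for u in fcm_memberships(border_star, CENTERS)])
```

A star near a center receives a membership close to 1 in that cluster, whereas a star between two centers splits its membership between them, which is the behaviour the fuzzy layer uses to represent evolutionary transitions.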

5. Conclusions and Future Work

5.1. Conclusions

This study demonstrates the educational potential of integrating real astronomical data and explainable artificial intelligence within a PBL framework. By combining datasets from NASA JPL Horizons, the Minor Planet Center, and the Gaia DR3 mission through Astroquery, students engage with authentic scientific workflows, retrieving, analyzing, and visualizing orbital and photometric data in Python.
The experience shows that astronomy provides an ideal gateway for fostering data literacy, scientific reasoning, and critical thinking, enabling learners to move beyond the passive use of generative tools toward an understanding of how models work and why results emerge.
From an educational standpoint, two key outcomes have been observed:
  • First, the use of open astronomical data significantly enhances student motivation and participation. Working with comets, asteroids, and stellar populations transforms abstract computational exercises into meaningful, context-rich explorations of the Universe.
  • Second, the incorporation of the FAS-XAI framework, which combines fuzzy logic, predictive modeling, and interpretability, helps students to grasp the link between computational reasoning and physical principles.
Instead of treating AI as a “black box,” learners are encouraged to explain, visualize, and validate their models using both global and local interpretability techniques such as LIME. This analytical transparency strengthens their understanding of uncertainty, classification, and decision boundaries.
The implementation across two academic programs, Mathematics (Open Data) and Physics (Advanced Scientific Computing and Data Mining) at the Universidad Europea de Madrid, has been notably successful.
Students expressed high engagement, curiosity, and enjoyment, emphasizing the sense of working like “real scientists.” The integration of interactive visualizations and dynamic simulations further enhanced their immersion, allowing them to explore orbital dynamics and stellar clustering intuitively.
The results obtained in the classroom reveal that when students are given control over real data and interpretative tools, they acquire technical skills and develop a reflective and critical approach to knowledge construction.

5.2. Future Work

Future work will expand the current framework in these main directions:
  • First, by extending the analysis to astronomical imaging datasets, incorporating data from SkyView and personal observations obtained with the Seestar S50 telescope. These will enable students to perform image stacking, evaluate noise-reduction techniques, and apply fuzzy and explainable models to pattern recognition in galaxies, nebulae, and star clusters.
  • Second, by contrasting AI-based enhancement methods with instrumental stacking from the Seestar S50, students will explore how explainable models can assist in noise interpretation and signal recovery from low-light astrophotography.
  • Third, by integrating the FAS-XAI methodology into broader STEM curricula, the project aims to bring interpretability-driven scientific inquiry to diverse domains, demonstrating that explainable AI can serve both as a research tool and a pedagogical catalyst for curiosity-driven learning.
  • Ultimately, the FAS-XAI framework can be extended far beyond astronomy. Once students have mastered the workflow, from data acquisition and fuzzy modeling to explainable interpretation, they can transfer these processes to any domain they feel most connected to.
Final projects are intentionally open-ended: learners are encouraged to select their own topics, apply the methodology creatively, and explore the subjects they are passionate about.
This autonomy transforms technical competence into genuine scientific agency, empowering students not only to analyze data, but to understand, explain, and discover through the lens of interpretable artificial intelligence. In doing so, they cultivate a mindset grounded in curiosity and critical thinking, one that guides their approach to any learning or research endeavor.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/educsci15121688/s1.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in this study are publicly available from open astronomical archives. Orbital and photometric data for comets and asteroids were obtained from NASA’s JPL Horizons system (Giorgini et al., 1996) and the International Astronomical Union’s Minor Planet Center (Minor Planet Center [MPC], 2024). Stellar parameters were retrieved from the ESA Gaia mission Data Release 3 (Tanga et al., 2023) using the Astroquery Python library (Ginsburg et al., 2019). All data are openly accessible and reproducible through direct queries. All Python scripts, preprocessing routines, and dynamic visualizations developed for this study are available from the corresponding author upon reasonable request. Because these files include classroom materials and educational resources, unrestricted public release is not possible.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Akeson, R. L., Chen, X., Ciardi, D., Crane, M., Good, J., Harbut, M., Jackson, E., Kane, S. R., Laity, A. C., Leifer, S., Lynn, M., McElroy, D. L., Papin, M., Plavchan, P., Ramírez, S. V., Rey, R., von Braun, K., Wittman, M., Abajian, M., … Zhang, A. (2013). The NASA exoplanet archive: Data and tools for exoplanet research. Publications of the Astronomical Society of the Pacific, 125(930), 989–999.
  2. Barra, V., Delouille, V., Kretzschmar, M., & Hochedez, J. F. (2009). Fast and robust segmentation of solar EUV images: Algorithm and results for solar cycle 23. Astronomy and Astrophysics, 505(1), 361–371.
  3. Benvenuto, F., Piana, M., Campi, C., & Massone, A. M. (2018). A hybrid supervised/unsupervised machine learning approach to solar flare prediction. The Astrophysical Journal, 853(1), 90.
  4. Buxner, S. R., Impey, C. D., Romine, J., & Nieberding, M. (2018). Linking introductory astronomy students’ basic science knowledge, beliefs, attitudes, sources of information, and information literacy. Physical Review Physics Education Research, 14(1), 010142.
  5. Colazo, M., Alvarez-Candal, A., & Duffard, R. (2022). Zero-phase angle asteroid taxonomy classification using unsupervised machine learning algorithms. Astronomy and Astrophysics, 666, A77.
  6. Costa, I. A., Morais, C., Aguiar, T., & Silva, A. (2025). Democratizing Astronomy through teacher training in Portuguese-speaking contexts. Open Astronomy, 34(1), 20250019.
  7. Domenech-Casal, J., & Ruiz-Espana, N. (2017). Mission to Stars: A Science and Technology educational project on astronomy, spatial missions and scientific research. Revista Eureka Sobre Enseñanza y Divulgación de las Ciencias, 14(1), 98–114.
  8. Ferreira, M., da Fonseca, M. O., Batista, M. C., da Silva Filho, O. L., & Strapasson, A. (2025). Greek Astromythology: Intersections between mythology history and modern astronomy education. Frontiers in Education, 10, 1431336.
  9. Ginsburg, A., Sipőcz, B. M., Brasseur, C. E., Cowperthwaite, P. S., Craig, M. W., Deil, C., Guillochon, J., Guzman, G., Liedtke, S., Lim, P. L., Lockhart, K. E., Mommert, M., Morris, B. M., Norman, H., Parikh, M., Persson, M. V., Robitaille, T. P., Segovia, J.-C., Singer, L. P., … Woillez, J. (2019). Astroquery: An astronomical web-querying package in Python. The Astronomical Journal, 157(3), 98.
  10. Giorgini, J. D., Yeomans, D. K., Chamberlin, A. B., Chodas, P. W., Jacobson, R. A., Keesey, M. S., Lieske, J. H., Ostro, S. J., Standish, E. M., & Wimberly, R. N. (1996). JPL’s on-line solar system data service. Bulletin of the American Astronomical Society, 28, 1158.
  11. Huerta-Cancino, L., & Ale-Silva, J. (2024). Augmented astronomy for science teaching and learning. In J. Wei, & G. Margetis (Eds.), Human-centered design, operation and evaluation of mobile communications, Pt I, mobile 2024 (Vol. 14737, pp. 235–253). Springer Nature.
  12. Karpouzis, K. (2024). Explainable AI for intelligent tutoring systems (pp. 59–70). Springer Nature.
  13. Khosravi, H., Shum, S. B., Chen, G., Conati, C., Tsai, Y. S., Kay, J., Knight, S., Martinez-Maldonado, R., Sadiq, S., & Gašević, D. (2022). Explainable artificial intelligence in education. Computers and Education: Artificial Intelligence, 3(May), 100074.
  14. Krajcik, J. S., & Blumenfeld, P. C. (2005). Project-based learning. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (pp. 317–334). Cambridge University Press.
  15. Langer, N., & Kudritzki, R. P. (2014). The spectroscopic Hertzsprung-Russell diagram. Astronomy & Astrophysics, 564, A52.
  16. Lee, K. M. (2017). Astronomy education awards in the IUSE:EHR portfolio. Physics Teacher, 55(1), 58–60.
  17. Lin, C. C., Huang, A. Y. Q., & Lu, O. H. T. (2023). Artificial intelligence in intelligent tutoring systems toward sustainable education: A systematic review. Smart Learning Environments, 10(1), 41.
  18. Liu, Y., Shen, Y. P., Song, H. Q., Yan, F. B., & Su, Y. R. (2024). Solar radio spectrogram segmentation algorithm based on improved fuzzy C-means clustering and adaptive cross filtering. Physica Scripta, 99(4), 45005.
  19. Marín Díaz, G. (2025a). Fuzzy C-means and explainable AI for quantum entanglement classification and noise analysis. Mathematics, 13, 1056.
  20. Marín Díaz, G. (2025b). Supporting reflective AI use in education: A fuzzy-explainable model for identifying cognitive risk profiles. Education Sciences, 15(7), 923.
  21. Marín Díaz, G., Gómez Medina, R., & Aijón Jiménez, J. A. (2024). Integrating fuzzy C-means clustering and explainable AI for robust galaxy classification. Mathematics, 12(18), 2797.
  22. Marín Díaz, G., Gómez Medina, R., & Aijón Jiménez, J. A. (2025). A methodological framework for business decisions with explainable AI and the analytic hierarchical process. Processes, 13(1), 102.
  23. Minor Planet Center (MPC). (2024). MPC database of comets and minor planets. Available online: https://minorplanetcenter.net/ (accessed on 10 October 2025).
  24. Molnar, C. (2019). Interpretable machine learning. In A guide for making black box models explainable (Book, 247). Lean Publishing. Available online: https://christophm.github.io/interpretable-ml-book (accessed on 10 October 2025).
  25. Offner, S. S. R., Taylor, J., Markey, C., Chen, H. H. H., Pineda, J. E., Goodman, A. A., Burkert, A., Ginsburg, A., & Choudhury, S. (2022). Turbulence, coherence, and collapse: Three phases for core evolution. Monthly Notices of the Royal Astronomical Society, 517(1), 885–909.
  26. Rodrigues, L., Meneses, A., Montenegro, M., & Cortes, C. (2025). Direct and indirect opportunities to learn astronomy within the Chilean science curriculum. International Journal of Science and Mathematics Education, 23(1), 169–191.
  27. Shafique, U., & Qaiser, H. (2014). A comparative study of data mining process models (KDD, CRISP-DM and SEMMA). International Journal of Innovation and Scientific Research, 12(1), 217–222. Available online: http://www.ijisr.issr-journals.org/ (accessed on 10 October 2025).
  28. Sugeno, M., & Yasukawa, T. (1993). A fuzzy-logic-based approach to qualitative modeling. IEEE Transactions on Fuzzy Systems, 1(1), 7–20.
  29. Swift, J. J., Andersen, K., Arculli, T., Browning, O., Ding, J., Edwards, N., Fanning, T., Geyer, J., Huber, G., Jin-Ngo, D., Kelliher, B., Kirkpatrick, C., Kirkpatrick, L., Klink, D., Lavine, C., Lawrence, G., Lawrence, Y., Cyrus Leung, F. L., Luebbers, J., … Hedrick, R. (2022). The renovated Thacher Observatory and first science results. Publications of the Astronomical Society of the Pacific, 134(1033), 035005.
  30. Szabó, G. M., Kálmán, S., Borsato, L., Hegedus, V., Mészáros, S., & Szabó, R. (2023). Sub-Jovian desert of exoplanets at its boundaries: Parameter dependence along the main sequence. Astronomy and Astrophysics, 671, A132.
  31. Taghizadeh-Popp, M., Kim, J. W., Lemson, G., Medvedev, D., Raddick, M. J., Szalay, A. S., Thakar, A. R., Booker, J., Chhetri, C., Dobos, L., & Rippin, M. (2020). SciServer: A science platform for astronomy and beyond. Astronomy and Computing, 33, 100412.
  32. Tanga, P., Pauwels, T., Mignard, F., Muinonen, K., Cellino, A., David, P., Hestroffer, D., Spoto, F., & Berthier, J. (2023). Gaia Data Release 3: The Solar System survey. Astronomy and Astrophysics, 674, A12.
  33. Uzpen, B., Houseal, A. K., Slater, T. F., & Nuhfer, E. B. (2019). Scientific and quantitative literacy: A comparative study between STEM and non-STEM undergraduates taking physics. European Journal of Physics, 40(3), 035701.
Figure 1. FAS-XAI educational framework.
Figure 2. Correlation matrix among orbital parameters.
Figure 3. Temporal evolution of geocentric distance (Δ)–solar elongation (Asteroids–Comets).
Figure 4. Histograms of geocentric distance (Δ) and solar elongation for asteroids and comets.
Figure 5. Heliocentric trajectories of the selected asteroids and comets, shown as three snapshots across a seven-year interval (6 years of history and 1 year ahead).
Figure 6. Distributions of key stellar parameters derived from the Gaia DR3 dataset.
Figure 7. Hertzsprung–Russell (HR) diagram for the Gaia DR3 stellar sample.
Figure 8. Correlation heatmap among the main stellar parameters derived from Gaia DR3.
Figure 9. Fuzzy clustering of stellar populations in the Hertzsprung–Russell diagram, showing membership degrees (μ) for the three main evolutionary regions.
Figure 10. Confusion matrix of the XGBoost Classifier validating fuzzy segmentation.
Figure 11. Global LIME explanations for the XGBoost classifier.
Table 1. Structure and variable description of the comet and asteroid ephemerides dataset.

| Variable | Description | Units |
| --- | --- | --- |
| date | Observation epoch (UTC) | ISO format |
| x, y, z | Heliocentric coordinates | AU |
| r_au | Heliocentric distance (r = √(x² + y² + z²)) | AU |
| delta_au | Geocentric distance | AU |
| elong_deg | Solar elongation (Sun–Earth–object angle) | degrees |
| speed_au_d | Apparent heliocentric velocity | AU/day |
| lambda_deg | Ecliptic longitude | degrees |
| beta_deg | Ecliptic latitude | degrees |
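The derived columns of Table 1 follow from the Cartesian position by elementary geometry, which students can verify themselves. The sketch below is a minimal illustration under one explicit assumption: that x, y and z are expressed in a heliocentric ecliptic frame, so that the longitude and latitude obtained are ecliptic. The function name `heliocentric_derived` is introduced here for illustration and is not taken from the study's scripts.

```python
import math


def heliocentric_derived(x, y, z):
    """Return (r_au, lambda_deg, beta_deg) from heliocentric coordinates in AU.

    Assumes an ecliptic reference frame and a non-zero position vector.
    """
    r = math.sqrt(x * x + y * y + z * z)             # heliocentric distance
    lam = math.degrees(math.atan2(y, x)) % 360.0     # longitude in [0, 360)
    beta = math.degrees(math.asin(z / r))            # latitude in [-90, 90]
    return r, lam, beta


if __name__ == "__main__":
    # An object 1 AU along x and 1 AU along y, lying in the ecliptic plane.
    print(heliocentric_derived(1.0, 1.0, 0.0))
```

Using `atan2` rather than `atan(y / x)` keeps the longitude in the correct quadrant, and the modulo operation maps negative angles into the 0–360° range.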
Table 2. Description and educational interpretation of variables included in the Gaia DR3 stellar dataset.

| Variable | Type | Units | Description | Physical/Educational Interpretation |
| --- | --- | --- | --- | --- |
| source_id | Identifier | | Unique Gaia DR3 identifier of each star. | Used for data traceability (non-analytical). |
| ra | Positional | degrees (°) | Right Ascension: horizontal coordinate in the equatorial system (celestial longitude). | Locates the star in the sky (0–360°). |
| dec | Positional | degrees (°) | Declination: vertical coordinate in the equatorial system (celestial latitude). | Together with ra, defines the star’s position on the celestial sphere. |
| parallax | Geometric | milliarcseconds (mas) | Trigonometric parallax: apparent displacement due to Earth’s orbital motion. | Inversely proportional to distance. Distance (pc) ≈ 1000/parallax (mas). |
| parallax_error | Geometric | mas | Standard uncertainty of parallax measurement. | Used as a quality indicator for distance_pc. |
| parallax_over_error | | Dimensionless | Signal-to-noise ratio of the parallax. | Indicates measurement reliability; values >5 imply high accuracy. |
| pmra, pmdec | Kinematic | mas yr⁻¹ | Proper motion components in right ascension and declination. | Quantify the star’s apparent motion across the sky due to real spatial velocity. |
| ruwe | Quality | | Renormalized Unit Weight Error: statistical indicator of astrometric fit quality. | RUWE ≈ 1: good fit; RUWE > 1.6: possible binarity or systematic errors. |
| g_mag | Photometric | magnitudes | Integrated magnitude in the Gaia G (white light) band. | Represents total brightness; lower values indicate higher luminosity. |
| bp_mag, rp_mag | Photometric | magnitudes | Magnitudes in the blue (BP) and red (RP) bands. | Used to compute the stellar color index (bp_rp), related to temperature. |
| bp_rp | Derived (Color) | magnitudes | Color index defined as BP − RP. | Photometric color: direct indicator of effective temperature (cool stars are redder, hot stars are bluer). |
| distance_pc | Derived (Geometric) | parsecs (pc) | Estimated distance from the Sun, derived from parallax. | Converts apparent brightness to intrinsic luminosity. |
| M_G | Derived (Luminosity) | absolute magnitudes | Absolute magnitude in the G band, corrected for distance modulus. | Reflects intrinsic stellar power; plotted against bp_rp in the HR diagram. |
| random_index | Technical | | Gaia random index used for sampling. | No physical meaning; ensures reproducibility. |
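The two derived quantities in Table 2 can be reproduced from the raw Gaia columns with the relations given there: distance (pc) ≈ 1000/parallax (mas), followed by the distance-modulus correction M_G = g_mag − 5 log10(d) + 5. The sketch below is a minimal illustration, not the study's preprocessing code. The function names are introduced here, interstellar extinction is ignored, and treating the Table 2 thresholds (parallax_over_error > 5, RUWE ≤ 1.6) as a hard quality cut is an assumption.

```python
import math


def distance_pc(parallax_mas):
    """Distance in parsecs from a parallax in milliarcseconds (simple inversion)."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive for a simple inversion")
    return 1000.0 / parallax_mas


def absolute_magnitude_g(g_mag, parallax_mas):
    """Absolute G magnitude via the distance modulus; extinction is ignored."""
    return g_mag - 5.0 * math.log10(distance_pc(parallax_mas)) + 5.0


def passes_quality(parallax_over_error, ruwe, min_snr=5.0, max_ruwe=1.6):
    """Illustrative quality cut built from the thresholds listed in Table 2."""
    return parallax_over_error > min_snr and ruwe <= max_ruwe


if __name__ == "__main__":
    # First representative star of Table 3: g_mag = 12.631, parallax = 0.56946 mas.
    print(distance_pc(0.56946))
    print(absolute_magnitude_g(12.631, 0.56946))
```

Applied to the first representative star of Table 3, these relations give a distance close to 1756 pc and an absolute magnitude close to the tabulated 1.407, which offers students a quick consistency check on the dataset.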
Table 3. Representative Gaia DR3 stars with their fuzzy memberships (μ0, μ1, μ2).

| Cluster | Source_id | bp_rp | M_G | g_mag | Distance_pc | Parallax | pmra | pmdec | ruwe | μ0 | μ1 | μ2 | μ_sel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 267581781428301 | 1.2650 | 1.407 | 12.631 | 1756 | 0.56946 | −5.8592 | 3.908 | 1.0073 | 0.999380 | 0.000239 | 0.000381 | 0.999380 |
| 0 | 611459980378139 | 1.2128 | 1.3589 | 12.652 | 1813.6 | 0.5514 | −21.187 | 8.1636 | 0.97565 | 0.999310 | 0.000257 | 0.000433 | 0.999310 |
| 0 | 397499133909490 | 1.2451 | 1.3624 | 10.313 | 616.86 | 1.6211 | −35.69 | −3.0069 | 0.92931 | 0.999300 | 0.000256 | 0.000444 | 0.999300 |
| 0 | 349367820418688 | 1.2143 | 1.3575 | 12.76 | 1907.7 | 0.5242 | −7.1835 | 0.50699 | 1.1093 | 0.999280 | 0.000268 | 0.000452 | 0.999280 |
| 1 | 455360710412683 | 0.8956 | 4.0592 | 11.242 | 273.19 | 3.6605 | −26.73 | −9.237 | 1.3457 | 0.000120 | 0.999850 | 0.000030 | 0.999850 |
| 1 | 438964284495378 | 0.8754 | 4.0697 | 12.839 | 567.32 | 1.7627 | 5.977 | −10.865 | 0.96552 | 0.000130 | 0.999829 | 0.000041 | 0.999829 |
| 1 | 446865467824336 | 0.8497 | 4.0215 | 11.58 | 324.8 | 3.0788 | −4.6838 | −5.0712 | 0.74249 | 0.000140 | 0.999820 | 0.000040 | 0.999820 |
| 1 | 632484518742924 | 0.8463 | 4.055 | 12.042 | 395.82 | 2.5264 | −6.4286 | −1.596 | 1.1458 | 0.000160 | 0.999790 | 0.000050 | 0.999790 |
| 2 | 524151857697601 | 1.9216 | −0.61984 | 12.276 | 3795.5 | 0.26347 | −5.1328 | 2.4405 | 0.89483 | 0.000530 | 0.000106 | 0.999364 | 0.999364 |
| 2 | 533562039136869 | 1.9951 | −0.5636 | 11.313 | 2372.8 | 0.42144 | 1.353 | −2.3059 | 1.0184 | 0.000540 | 0.000107 | 0.999353 | 0.999353 |
| 2 | 531918010690120 | 2.0026 | −0.57335 | 12.078 | 3391 | 0.2949 | −1.0219 | 4.4415 | 1.0145 | 0.000660 | 0.000131 | 0.999209 | 0.999209 |
| 2 | 534731396375339 | 1.888 | −0.57185 | 12.574 | 4257.9 | 0.23486 | −5.526 | 3.8552 | 0.97037 | 0.000850 | 0.000166 | 0.998984 | 0.998984 |
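The membership columns of Table 3 can be turned into a hard cluster label and a simple measure of fuzziness. The sketch below is illustrative only: it assumes that μ_sel is the largest of the three memberships, which is consistent with every row of the table, and it uses the classical partition coefficient (the mean of the squared memberships summed per star) as one possible definition of the separability index mentioned in the text. The names `TABLE3_ROWS`, `select_cluster` and `partition_coefficient` are introduced here, and only one row per cluster is copied for brevity.

```python
# One representative row per cluster from Table 3: (cluster, mu0, mu1, mu2).
TABLE3_ROWS = [
    (0, 0.999380, 0.000239, 0.000381),
    (1, 0.000120, 0.999850, 0.000030),
    (2, 0.000530, 0.000106, 0.999364),
]


def select_cluster(memberships):
    """Return (index, value) of the largest membership degree."""
    best = max(range(len(memberships)), key=lambda k: memberships[k])
    return best, memberships[best]


def partition_coefficient(membership_rows):
    """Mean over stars of the sum of squared memberships (1 means fully crisp)."""
    total = sum(sum(u * u for u in row) for row in membership_rows)
    return total / len(membership_rows)


if __name__ == "__main__":
    for cluster, *mu in TABLE3_ROWS:
        label, mu_sel = select_cluster(mu)
        print(cluster, label, mu_sel, round(sum(mu), 6))
    print(partition_coefficient([row[1:] for row in TABLE3_ROWS]))
```

Because these rows are the most representative stars of each cluster, their memberships are nearly crisp; stars in transition regions would yield lower values of both μ_sel and the partition coefficient.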
Table 4. Predictive performance of XGBoost regressors (μ0, μ1, μ2).

| Target | R² | RMSE | Corr |
| --- | --- | --- | --- |
| μ0 | 0.999 | 0.011 | 1.000 |
| μ1 | 0.999 | 0.009 | 1.000 |
| μ2 | 0.999 | 0.010 | 1.000 |
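The three columns of Table 4 measure different things, and computing them by hand helps students read the table critically. The sketch below is a minimal illustration using the textbook definitions of the coefficient of determination, the root-mean-square error and the Pearson correlation; it assumes these are the definitions behind the reported values, and the toy arrays are invented for demonstration rather than taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, Pearson correlation) for two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)             # total sum of squares
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    corr = cov / math.sqrt(ss_tot * ss_pred)
    return r2, rmse, corr


if __name__ == "__main__":
    # Toy memberships: predictions are compressed toward 0.5 but perfectly ordered.
    true_mu = [0.0, 0.5, 1.0]
    pred_mu = [0.1, 0.5, 0.9]
    print(regression_metrics(true_mu, pred_mu))
```

In the toy example the predictions are a linear rescaling of the true values, so the correlation equals 1 while R² stays below 1. This is why a correlation of 1.000 can coexist with an R² of 0.999: correlation is insensitive to scale and offset errors, whereas R² and RMSE are not.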
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
