This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

A Hierarchical Classification Framework for Earth Science Data Based on Large Language Models and Label Graph Constraints

1 College of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
2 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(11), 5230; https://doi.org/10.3390/app16115230 (registering DOI)
Submission received: 8 April 2026 / Revised: 20 May 2026 / Accepted: 22 May 2026 / Published: 23 May 2026

Abstract

The rapid growth of Earth science observation and simulation data has made efficient data classification increasingly challenging, particularly when annotation resources are limited and data semantics evolve continuously. Conventional classification methods rely heavily on large-scale labeled datasets, which are costly to construct and difficult to adapt to dynamic classification systems. This paper proposes a hierarchical classification framework for Earth science data that leverages large language models (LLMs) and explicitly incorporates hierarchical label relationships to constrain model inference and improve classification consistency across complex, domain-specific semantic spaces. The framework further integrates retrieval-augmented generation (RAG) and knowledge graph (KG) techniques to introduce external domain knowledge and explicit semantic constraints, improving contextual understanding, interpretability, and adaptability to semantic evolution. A benchmark dataset with a two-level hierarchical label structure is constructed from official NASA metadata. Experimental results show that, by integrating few-shot learning and label space optimization strategies, the proposed framework consistently outperforms a range of baseline methods in hierarchical classification tasks. Compared with the BERT-BiLSTM model, it achieves absolute improvements of 8.68% in Micro-F1 and 29.92% in Macro-F1 on the overall hierarchical paths. The framework shows clear advantages on long-tailed data distributions, particularly for minority classes, highlighting its potential for scalable annotation and efficient management of large-scale Earth science datasets.
Keywords: earth-science; metadata governance; GCMD; LLMs; hierarchical classification
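
As a rough illustration of two ideas the abstract relies on, the sketch below shows (a) how a two-level label graph can restrict the second-level prediction to children of the chosen first-level label, and (b) why Macro-F1 on full hierarchical paths is more sensitive to minority classes than Micro-F1. This is not the authors' implementation: the label graph, the category names, the scores, and the helper names `constrain_path` and `f1_scores` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical two-level label graph (parent -> allowed children).
# The category names are illustrative, not the paper's benchmark labels.
LABEL_GRAPH = {
    "ATMOSPHERE": ["AEROSOLS", "CLOUDS"],
    "OCEANS": ["SEA ICE", "OCEAN TEMPERATURE"],
}


def constrain_path(parent_scores, child_scores):
    """Pick the best parent, then the best child among that parent's children."""
    parent = max(parent_scores, key=parent_scores.get)
    allowed = LABEL_GRAPH[parent]
    # Children outside the chosen parent's subtree are never considered.
    child = max(allowed, key=lambda c: child_scores.get(c, float("-inf")))
    return (parent, child)


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions on full paths."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F1 pools counts over all classes; Macro-F1 averages per-class F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# (a) Unconstrained, "SEA ICE" has the highest child score, but it does not
# belong to the chosen parent, so the constrained path falls back to "CLOUDS".
path = constrain_path(
    {"ATMOSPHERE": 0.7, "OCEANS": 0.3},
    {"SEA ICE": 0.9, "CLOUDS": 0.6, "AEROSOLS": 0.2},
)

# (b) A long-tailed toy set: 8 majority paths, 2 minority paths, and a
# classifier that always predicts the majority path.
major = ("ATMOSPHERE", "CLOUDS")
minor = ("OCEANS", "SEA ICE")
micro_f1, macro_f1 = f1_scores([major] * 8 + [minor] * 2, [major] * 10)
```

In this toy case Micro-F1 stays high because the majority path dominates the pooled counts, while Macro-F1 drops sharply because the missed minority path contributes a per-class F1 of zero. That gap is one reason a large Macro-F1 gain is usually read as better minority-class handling.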

Share and Cite

MDPI and ACS Style

Zhao, L.; Chen, Z.; Li, G.; Guo, H.; Li, J. A Hierarchical Classification Framework for Earth Science Data Based on Large Language Models and Label Graph Constraints. Appl. Sci. 2026, 16, 5230. https://doi.org/10.3390/app16115230

AMA Style

Zhao L, Chen Z, Li G, Guo H, Li J. A Hierarchical Classification Framework for Earth Science Data Based on Large Language Models and Label Graph Constraints. Applied Sciences. 2026; 16(11):5230. https://doi.org/10.3390/app16115230

Chicago/Turabian Style

Zhao, Le, Zugang Chen, Guoqing Li, Hengliang Guo, and Jing Li. 2026. "A Hierarchical Classification Framework for Earth Science Data Based on Large Language Models and Label Graph Constraints" Applied Sciences 16, no. 11: 5230. https://doi.org/10.3390/app16115230

APA Style

Zhao, L., Chen, Z., Li, G., Guo, H., & Li, J. (2026). A Hierarchical Classification Framework for Earth Science Data Based on Large Language Models and Label Graph Constraints. Applied Sciences, 16(11), 5230. https://doi.org/10.3390/app16115230

