A Hierarchical Classification Framework for Earth Science Data Based on Large Language Models and Label Graph Constraints
by Le Zhao 1, Zugang Chen 1,2,*, Guoqing Li 1,2, Hengliang Guo 1 and Jing Li 2
1 College of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
2 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(11), 5230; https://doi.org/10.3390/app16115230 (registering DOI)
Submission received: 8 April 2026 / Revised: 20 May 2026 / Accepted: 22 May 2026 / Published: 23 May 2026
Abstract
The rapid growth of Earth science observation and simulation data has made efficient data classification increasingly challenging, particularly when annotation resources are limited and data semantics continue to evolve. Conventional classification methods rely heavily on large-scale labeled datasets, which are costly to construct and difficult to adapt to dynamic classification systems. This paper proposes a hierarchical classification framework for Earth science data that leverages large language models (LLMs) and explicitly incorporates hierarchical label relationships to constrain model inference and improve classification consistency across complex, domain-specific semantic spaces. The framework further integrates retrieval-augmented generation (RAG) and knowledge graph (KG) techniques to introduce external domain knowledge and explicit semantic constraints, improving contextual understanding, interpretability, and adaptability to semantic evolution. A benchmark dataset with a two-level hierarchical label structure is constructed from official NASA metadata. Experimental results show that, by integrating few-shot learning and label space optimization strategies, the proposed framework consistently outperforms a range of baseline methods in hierarchical classification tasks. Compared with the BERT-BiLSTM model, it achieves an absolute improvement of 8.68% in Micro-F1 and 29.92% in Macro-F1 on the overall hierarchical paths. The framework shows clear advantages on long-tailed data distributions, particularly for minority classes, highlighting its potential for scalable annotation and efficient management of large-scale Earth science datasets.
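The gap between the Micro-F1 and Macro-F1 gains reported above is easier to read with the two metrics side by side. The following is a minimal sketch, not the authors' evaluation code: it assumes each full hierarchical path is scored as a single label string, and the label names and toy data are hypothetical. Micro-F1 pools counts over all samples, so frequent classes dominate it. Macro-F1 averages per-class F1, so a missed minority class pulls it down sharply.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label, multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical long-tailed toy data: each label is a "level1/level2" path.
y_true = ["TopicA/SubX"] * 8 + ["TopicB/SubY"] * 2
# A classifier that always predicts the majority path.
y_pred = ["TopicA/SubX"] * 10

micro, macro = f1_scores(y_true, y_pred)
print(f"Micro-F1 = {micro:.4f}, Macro-F1 = {macro:.4f}")
```

On this toy data the majority-only classifier keeps a high Micro-F1 while its Macro-F1 is roughly halved, which is why improvements on minority classes show up mainly in Macro-F1.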