Large Language Models in Low-Altitude Economy: A Novel Framework for Empowering Aerial Operations and Services †
Abstract
1. Introduction
2. Literature Review
2.1. Mathematical Problems
2.2. Textual Content
2.3. Graphical Information
2.4. Dynamic Imagery
3. Simulation and Comparison
3.1. MMBench Framework
3.2. Data Structure
- Method: The name of the model or method, which identifies each entry in the dataset and allows the performance of different models on the same task to be compared.
- Release date: The model's release date. This information is crucial for understanding model progress and evolution; comparing models with different release dates traces performance improvements over time.
- Parameters: The number of model parameters, in billions. Parameter count is an important index of model complexity: a model with more parameters generally has stronger learning and representation ability, but it also incurs higher computational cost and a greater risk of overfitting.
- Language model: The underlying language model used by the model. Language models are a foundational technology in natural language processing with a strong influence on performance, so comparing this field across entries reveals how the choice of language model affects results.
- Visual model: The underlying visual model used by the model. Visual models play a key role in computer vision tasks; this field records their performance and applicability in cross-modal tasks.
- Average score: The model's overall score across all tasks. This score summarizes overall performance and supports comparison between models.
- Logical reasoning: The model's score on the logical reasoning task. Logical reasoning is an important index of model intelligence, covering logical inference and judgment.
- Attribute inference: The model's score on the attribute inference task, i.e., inferring the attributes or characteristics of objects or events from available information. This field assesses the model's application potential in different scenarios.
- Relational reasoning: The model's score on the relational reasoning task, i.e., understanding and inferring relationships between objects, an ability of great significance in complex scenarios.
- Single-object perception: The model's score on recognizing and perceiving a single object.
- Multi-object perception: The model's score on recognizing and perceiving multiple objects, which is critical in complex scenes.
- Coarse-grained perception: The model's score on understanding objects or scenes at a larger scale or higher level of abstraction. (A minimal schema sketch of these fields follows this list.)
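To make the record layout concrete, the following is a minimal sketch of one dataset row as a Python dataclass. All field names (`method`, `params_b`, and so on) are illustrative choices that mirror the column descriptions above, not the column names of the original data file.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BenchmarkRecord:
    """One row of the model-comparison dataset (illustrative field names)."""
    method: str                        # model or method name
    release_date: str                  # e.g., "2023-06"
    params_b: Optional[float]          # parameter count in billions (may be unreported)
    language_model: str                # underlying language model
    visual_model: str                  # underlying visual model
    avg_score: float                   # AS: overall score across all tasks
    logical_reasoning: float           # LR
    attribute_inference: float         # AI
    relational_reasoning: float        # RR
    single_object_perception: float    # SOP
    multi_object_perception: float     # MOP
    coarse_grained_perception: float   # CGP
```

Each benchmark entry then maps onto one `BenchmarkRecord` instance, which makes the per-task scores directly comparable across models.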
3.3. Data Collection and Variables
3.4. Regression Analysis
3.5. Correlation Matrix Heat Map
- Negative correlation of Pa_b
- High correlation of MOP
- Relationship between visual and language models
- Significant correlations with AS (a heat-map sketch follows this list)
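Since this subsection centers on a correlation-matrix heat map over the variables listed above, a minimal sketch of how such a figure can be produced may be useful. The DataFrame `df` and the pandas/matplotlib calls are illustrative assumptions, not the authors' plotting code; the column names follow the abbreviations in the descriptive-statistics table below.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_correlation_heatmap(df: pd.DataFrame) -> None:
    """Draw a heat map of pairwise Pearson correlations between benchmark variables."""
    cols = ["AS", "LR", "AI", "RR", "SOP", "MOP", "CGP", "Pa_b", "LM_a", "VM_a"]
    corr = df[cols].corr()  # pairwise Pearson correlation matrix
    fig, ax = plt.subplots(figsize=(8, 6))
    im = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(cols)), labels=cols, rotation=45, ha="right")
    ax.set_yticks(range(len(cols)), labels=cols)
    fig.colorbar(im, ax=ax, label="Pearson r")
    ax.set_title("Correlation matrix of benchmark variables")
    fig.tight_layout()
    plt.show()
```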
3.6. Structural Equation Model and Path Dependence
4. Mechanism Flowchart
4.1. Impact Mechanism of LLMs
4.2. Flowchart of LLMs
5. Applications of LLMs in Low-Altitude Economy
5.1. Dynamic Mission Planning
5.2. Natural Language Interfaces for UAV Control
5.3. Automated Data Annotation and Analysis
5.4. Customer Service and Support
5.5. Challenges and Mitigation Strategies
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Variable | Observations | Mean | SD | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- |
| AS | 153 | 67.87 | 15.30 | 1.2 | 86 |
| LR | 153 | 48.87 | 18.15 | 0.5 | 81.5 |
| AI | 153 | 77.06 | 14.32 | 1.2 | 91 |
| RR | 153 | 64.60 | 16.03 | 4.3 | 85.7 |
| SOP | 153 | 72.33 | 17.65 | 0.5 | 93.3 |
| MOP | 153 | 61.89 | 16.78 | 0 | 89.5 |
| CGP | 153 | 71.14 | 13.86 | 1.3 | 83.5 |
| Pa_b | 128 | 14.69 | 18.99 | 1 | 88.6 |
| LM_a | 153 | 2.60 | 1.82 | 0 | 5 |
| VM_a | 153 | 2.20 | 1.81 | 0 | 6 |

Abbreviations: AS = average score; LR = logical reasoning; AI = attribute inference; RR = relational reasoning; SOP = single-object perception; MOP = multi-object perception; CGP = coarse-grained perception; Pa_b = parameters (billions); LM_a = language model; VM_a = visual model.
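For reproducibility, a summary of this form could be regenerated along the following lines. The file name `benchmark.csv` is a stand-in, and the column names are assumptions that mirror the abbreviations in the table, not the authors' actual data file.

```python
import pandas as pd

# Illustrative: load the benchmark dataset described in Section 3.2.
df = pd.read_csv("benchmark.csv")
cols = ["AS", "LR", "AI", "RR", "SOP", "MOP", "CGP", "Pa_b", "LM_a", "VM_a"]

# describe() yields count/mean/std/min/max per column; transpose to one row per variable.
summary = df[cols].describe().T[["count", "mean", "std", "min", "max"]]
summary.columns = ["Observations", "Mean", "SD", "Minimum", "Maximum"]
print(summary.round(2))  # matches the layout of the descriptive-statistics table
```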
| | (1) LM_a | (2) LM_a | (3) VM_a | (4) VM_a |
| --- | --- | --- | --- | --- |
| AI | −0.0359 | −0.0359 | 0.125 *** | 0.125 *** |
| | (−0.99) | (−0.96) | (3.50) | (3.40) |
| RR | 0.0549 | 0.0549 | 0.0264 | 0.0264 |
| | (1.56) | (1.67) | (0.75) | (0.67) |
| SOP | −0.00100 | −0.00100 | −0.0338 | −0.0338 |
| | (−0.03) | (−0.03) | (−0.94) | (−0.77) |
| MOP | −0.0138 | −0.0138 | 0.0294 | 0.0294 |
| | (−0.33) | (−0.31) | (0.72) | (0.75) |
| CGP | 0.00997 | 0.00997 | −0.117 ** | −0.117 ** |
| | (0.24) | (0.24) | (−2.84) | (−2.90) |
| Pa_b | −0.0165 | −0.0165 * | −0.0177 * | −0.0177 * |
| | (−1.98) | (−2.13) | (−2.13) | (−2.11) |
| Constant | 2.732 ** | 2.732 ** | 0.558 | 0.558 |
| | (3.28) | (2.64) | (0.68) | (0.85) |
| N | 128 | 128 | 128 | 128 |

Note: t statistics in parentheses; *, **, and *** denote increasing levels of statistical significance.
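The duplicated coefficient columns with slightly different t statistics are consistent with each regression being reported twice, once with conventional and once with heteroskedasticity-robust standard errors; that reading is an assumption. Under it, a minimal statsmodels sketch of the estimation follows. The file name and the exact specification are illustrative, not the authors' estimation code; the regressor set is read off the row labels above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative: keep the 128 rows with a reported parameter count (Pa_b non-missing).
df = pd.read_csv("benchmark.csv").dropna(subset=["Pa_b"])

# One OLS specification per outcome, mirroring the table's column layout.
for outcome in ["LM_a", "VM_a"]:
    model = smf.ols(f"{outcome} ~ AI + RR + SOP + MOP + CGP + Pa_b", data=df)
    plain = model.fit()                 # conventional standard errors
    robust = model.fit(cov_type="HC1")  # heteroskedasticity-robust standard errors
    print(plain.summary().tables[1])
    print(robust.summary().tables[1])
```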
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).