This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

Aerial Image Analysis: When LLMs Assist (And When Not)

Dipartimento di Matematica e Informatica, University of Catania, 95125 Catania, Italy
*
Author to whom correspondence should be addressed.
Future Internet 2026, 18(2), 77; https://doi.org/10.3390/fi18020077 (registering DOI)
Submission received: 15 December 2025 / Revised: 19 January 2026 / Accepted: 20 January 2026 / Published: 1 February 2026

Abstract

Large language models (LLMs) have shown remarkable results in analyzing and producing text and images, and in captioning images. Aerial images differ from other images because they contain many natural objects with a highly variable color range and no clear contours. This paper reports to what extent an LLM, namely Llama-4, can identify and caption natural objects in aerial images, such as tree categories and uncultivated land, as well as some man-made objects, such as roads. Such automation is valuable for scanning large areas and detecting the parts that require urgent maintenance or emergency intervention. The chosen LLM was tested against a custom image dataset, built to overcome the limited availability of domain-specific aerial image sets. To evaluate the identification and captioning results, accuracy, precision, and recall were computed. The results given by a cutting-edge variant of Llama-4, namely Maverick, reveal its strengths and weaknesses in this context. Although it is remarkable that an out-of-the-box tool can assist in such a complex observation and detection task, substantial progress is needed before such a model can serve as reliable support: accuracy is at most 58.6% and recall is at most 56.1%.
Keywords: captioning; aerial images; LLMs; LLaMA-4; Roboflow
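
The abstract names accuracy, precision, and recall as the evaluation metrics but does not spell out how they are computed. The following is a minimal sketch of the standard multi-class definitions, not the authors' evaluation code. The function name `evaluate` and the labels `tree`, `road`, and `uncultivated` are hypothetical and chosen only for illustration; the toy data is not from the paper.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return overall accuracy and per-class precision/recall.

    Assumes one ground-truth label and one predicted label per image
    (standard single-label, multi-class definitions).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    accuracy = sum(tp.values()) / len(y_true)

    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]  # times the class was predicted
        actual = tp[label] + fn[label]     # times the class truly occurred
        per_class[label] = {
            "precision": tp[label] / predicted if predicted else 0.0,
            "recall": tp[label] / actual if actual else 0.0,
        }
    return accuracy, per_class


# Hypothetical toy labels, for illustration only.
y_true = ["tree", "tree", "road", "uncultivated", "road"]
y_pred = ["tree", "road", "road", "uncultivated", "tree"]
accuracy, per_class = evaluate(y_true, y_pred)
print(accuracy)
print(per_class)
```

Reporting precision and recall per class, rather than a single aggregate, makes it easier to see which object categories the model confuses with one another.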

Share and Cite

MDPI and ACS Style

Calcagno, S.; Scaletta, E.; Tramontana, E.; Verga, G. Aerial Image Analysis: When LLMs Assist (And When Not). Future Internet 2026, 18, 77. https://doi.org/10.3390/fi18020077

Chicago/Turabian Style

Calcagno, Salvatore, Erika Scaletta, Emiliano Tramontana, and Gabriella Verga. 2026. "Aerial Image Analysis: When LLMs Assist (And When Not)" Future Internet 18, no. 2: 77. https://doi.org/10.3390/fi18020077