VideoARD: An Analysis-Ready Multi-Level Data Model for Remote Sensing Video
Highlights
- A multi-level VideoARD model formalizes scene–object–event entities with standardized metadata and provenance.
- A spatiotemporal VideoCube links frame facts to spatial, temporal, product, quality, and semantic dimensions to enable OLAP-style queries and cross-sensor integration.
- The standardized representation streamlines workflows, reducing preprocessing burden and improving reproducibility for detection, tracking, and event analysis.
- Benchmarks show lower query latency and resource usage with non-inferior or slightly improved task accuracy versus frame-level baselines.
Abstract
1. Introduction
1. Compared to directly applying algorithms to raw videos, VideoARD performs a one-time preprocessing step to generate reusable analysis-ready data, significantly reducing redundant computation in downstream tasks.
2. The multi-level semantic abstraction enables accurate extraction of scene, object, and event information, improving detection and recognition performance.
3. The spatiotemporal cube framework organizes data across varying resolutions and time scales, greatly enhancing query and statistical efficiency.
4. The effectiveness and scalability of VideoARD are validated through three representative case studies (vessel speed monitoring, forest fire detection, and 3D scene reconstruction), comparing it with baseline methods in terms of latency, accuracy, and resource consumption.
2. Related Works
2.1. Definition and Standardization Progress of ARD
- CEOS Analysis-Ready Data for Land (CARD4L): Developed by the CEOS Working Group on Calibration and Validation (WGCV), CARD4L [26] defines ARD product specifications for land remote sensing, covering optical, infrared, and radar sensors. It imposes strict requirements on geometric accuracy, radiometric calibration, quality masking, and metadata to ensure comparability and interoperability across satellites and sensors [27].
- USGS Landsat Collection 2 ARD: The USGS Collection 2 ARD products [28] for the Landsat series provide tiled surface-reflectance data embedded with SpatioTemporal Asset Catalog (STAC) metadata and quality masks, supporting cloud-based analysis and download.
- Sentinel-2 Level-2A: Although not explicitly labeled as ARD, the European Space Agency’s Sentinel-2 L2A products under the Copernicus program implement atmospheric correction, scene classification masks, and standardized projections, consistent with ARD principles. These products are distributed on cloud platforms such as CREODIAS in an ARD-compatible manner [18,29,30].
- Open Data Cube (ODC): The ODC initiative offers a data cube framework for ARD distribution and management, enabling the loading and cataloging of ARD products from diverse sources and bands [27]. It supports spatiotemporal linked retrieval and analysis, facilitating large-scale time series remote sensing research.
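For illustration, loading ARD through the ODC Python API follows the pattern below; the product name, spatial extent, and measurement names are placeholders rather than a specific catalog entry:

```python
# Minimal Open Data Cube query sketch. The product id, extent, and
# measurement names are placeholders, not a real catalog entry.
import datacube

dc = datacube.Datacube(app="ard-example")
ds = dc.load(
    product="ls8_ard_example",            # hypothetical ARD product id
    x=(114.2, 114.5), y=(30.4, 30.7),     # longitude/latitude extent
    time=("2020-01-01", "2020-12-31"),
    measurements=["red", "nir"],
    output_crs="EPSG:32649",
    resolution=(-30, 30),
)
# Because the returned xarray Dataset is analysis-ready, a time-series
# index such as NDVI is a one-liner over the whole cube:
ndvi = (ds.nir - ds.red) / (ds.nir + ds.red)
```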
2.2. Management and Analysis of Remote Sensing Video Data
3. VideoARD Framework
3.1. VideoARD: A Multi-Level Data Model
3.1.1. Characteristics of Remote Sensing Video Data and ARD Readiness Criteria
3.1.2. Semantic Abstraction Framework
- S: Scene-level entities, each representing a spatial–temporal segment of the video.
- O: Object-level entities, each corresponding to a tracked physical object.
- E: Event-level entities, each denoting high-level semantic phenomena involving one or more objects and/or scenes.
- f_S: V → S, the mapping from the original video V to scene entities.
- f_O: V → O, the mapping from the video to object entities.
- f_E: S × O → E, the mapping from scenes and objects to event entities.
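To make the abstraction concrete, the following minimal Python sketch models the three entity levels and the object mapping; the class and field names are illustrative and do not reproduce the VideoARD implementation:

```python
# Illustrative sketch of the scene-object-event abstraction (Section 3.1.2).
# Class and field names are our own, not the reference implementation.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SceneEntity:                 # S: a spatial-temporal segment of the video
    scene_id: int
    time_range: Tuple[float, float]            # (start, end), seconds
    extent: Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), ratios
    scene_type: str                            # e.g., "Waterbodies"

@dataclass
class ObjectEntity:                # O: a tracked physical object
    object_id: int
    object_type: str                           # e.g., "Ship"
    trajectory: List[Tuple[float, float, float]] = field(default_factory=list)  # (t, x, y)

@dataclass
class EventEntity:                 # E: a high-level semantic phenomenon
    event_id: int
    event_type: str                            # e.g., "Abnormal Path"
    object_ids: List[int] = field(default_factory=list)
    period: Tuple[float, float] = (0.0, 0.0)   # time range, seconds

def f_O(detections) -> List[ObjectEntity]:
    """f_O: group per-frame detections (track_id, t, x, y, cls) into objects."""
    tracks = {}
    for track_id, t, x, y, cls in detections:
        obj = tracks.setdefault(track_id, ObjectEntity(track_id, cls))
        obj.trajectory.append((t, x, y))
    return list(tracks.values())
```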
3.1.3. Multi-Level Model Design and Implementation
3.2. VideoCube: A Spatiotemporal Data Cube for VideoARD Management
3.2.1. Architecture and Implementation of VideoCube
1. Video Frame Fact: VideoARD tiles, generated by temporal segmentation and spatial gridding, serve as the measures of the video frame fact and are linked with the temporal, spatial, product, and semantic dimensions.
2. Temporal Dimension: To harmonize varying temporal resolutions, this dimension is designed with a precision higher than the maximum among the data sources and is linked to the frame fact.
3. Spatial Dimension: Using a unified grid reference, heterogeneous spatial resolutions are standardized into a 2D grid, where each cell corresponds to a spatial member.
4. Product Dimension: This records product metadata, including source, type, and processing level, and is linked to the frame fact.
5. Quality Dimension: This captures scene-level quality indicators (e.g., scene classification, object detection accuracy, cloud masking) and is aggregated into the product dimension.
6. Semantic Dimension: This encodes higher-level video semantics, including scenes, objects, and events, and is linked to the frame fact.
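As a toy illustration of this star schema, the sketch below materializes a frame fact table and two dimensions with pandas; the table and column names loosely follow Appendix B and are not the production schema:

```python
# Toy star schema for the VideoCube; column names loosely follow
# Appendix B and are illustrative only.
import pandas as pd

frame_fact = pd.DataFrame({
    "id": [1, 2, 3],
    "product_key": [1, 1, 2],
    "space_key": [10, 11, 10],
    "time_key": [100, 101, 100],
    "semantic_key": [7, 7, 8],
    "tile_id": [5001, 5002, 5003],
})
time_dim = pd.DataFrame({
    "time_key": [100, 101],
    "time": pd.to_datetime(["2024-05-01T08:00:00Z", "2024-05-01T08:00:01Z"]),
})
space_dim = pd.DataFrame({"space_key": [10, 11],
                          "grid_id": ["G_32_18", "G_32_19"]})

# Join the fact table with its dimensions, then filter, as a query engine would.
cube = frame_fact.merge(time_dim, on="time_key").merge(space_dim, on="space_key")
print(cube[cube["grid_id"] == "G_32_18"])
```

In the actual system the same joins are executed by the PostgreSQL/HBase back end rather than pandas, but the query shape is the same.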
3.2.2. EOLAP Operations on VideoCube
4. Experimental Evaluation
4.1. Efficiency Benchmarking
4.1.1. Experimental Environment and Dataset
4.1.2. Benchmarking Methodology
- Query time (ms): The elapsed time between a user request and the system’s data response.
- Data throughput (Mbps): The maximum data volume the system can process per unit time.
- Resource utilization: CPU consumption under the benchmark workload.
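A client-side measurement of these three metrics could be collected as sketched below; the endpoint URL is hypothetical, and the reported experiments sample CPU on the cluster nodes rather than on the client:

```python
# Client-side measurement sketch for the three benchmark metrics.
# The endpoint URL and query parameter are placeholders.
import time
import psutil
import requests

URL = "http://example-videocube/api/query"   # hypothetical query endpoint

psutil.cpu_percent(interval=None)            # prime the CPU counter
t0 = time.perf_counter()
resp = requests.get(URL, params={"tile_id": 5001}, timeout=30)
elapsed_ms = (time.perf_counter() - t0) * 1000.0

# Throughput in megabits per second over the response window.
throughput_mbps = (len(resp.content) * 8 / 1e6) / (elapsed_ms / 1000.0)
cpu_util = psutil.cpu_percent(interval=None)  # CPU since the priming call

print(f"query time: {elapsed_ms:.1f} ms, "
      f"throughput: {throughput_mbps:.2f} Mbps, CPU: {cpu_util:.1f}%")
```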
4.1.3. Results and Observations
1. Enhanced data modeling: The proposed method incorporates a multi-dimensional hierarchical structure on top of distributed storage, making it better suited to complex analytical tasks. In contrast, object storage methods focus primarily on block-based storage and redundancy, without supporting complex data-dimensional models.
2. Structured and fine-grained management: The proposed method extends data across spatial, temporal, and semantic dimensions, facilitating rapid queries and supporting complex applications. Object storage, however, is oriented toward unstructured data management and lacks support for complex queries.
3. High concurrency support: By using a distributed architecture and multi-dimensional data partitioning, the proposed method supports parallel processing and high-concurrency access. The VideoCube design leverages load balancing and parallel mechanisms to maintain low latency under heavy load. While traditional file systems and object storage can handle concurrency, their performance can degrade for complex queries or large-scale concurrent access.
4. Optimized query efficiency: The proposed method emphasizes efficient multi-dimensional query and analysis through the indexing, caching, and pre-processing of key frames. Traditional methods primarily optimize for data transfer speed and storage scalability rather than complex query performance.
5. Application-specific design: The method is tailored to complex remote sensing video scenarios, enabling multi-dimensional queries and rapid analytics. Object storage solutions are better suited to general-purpose tasks such as video-on-demand or data backup, but are less effective for real-time analysis or multi-dimensional retrieval.
| Comparison Dimension | Group A (Ours) | Group B |
|---|---|---|
| Architecture | Multi-level analytical model combining spatiotemporal cubes and semantic dimensions; supports complex queries and analytics | Distributed file system or object storage architecture; emphasizes storage and data transfer efficiency |
| Data Management | Multi-level management (scene, object, event); structured and fine-grained | Simple file-block and metadata-based management |
| Concurrency Support | Multi-dimensional partitioning and parallel processing; suitable for high-concurrency, multi-dimensional queries | High concurrency capability, but limited support for complex queries |
| Performance Optimization | Preprocessing key frames and semantic data; caching and indexing improve access speed | Relies on block storage and redundancy; low efficiency for complex queries |
| Application Scenarios | Multi-user remote sensing video analysis, dynamic analytics, event detection | Simple queries and storage, e.g., video-on-demand, backup |
4.2. Application Case Studies
4.2.1. Case 1: Vessel Speed Monitoring (Object-Level Case)
4.2.2. Case 2: Wildfire Detection and Monitoring (Event-Level Case)
4.2.3. Case 3: Near-Real-Time 3D Scene Reconstruction (Scene-Level Case)
4.2.4. Case-Level Summary and Practical Remarks
4.2.5. Accuracy Consistency Evaluation
5. Discussion
6. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Moving Features Instances of the Object-Level VideoARD
| Object | Timestamp | Relative Coordinate X (Ratio) | Relative Coordinate Y (Ratio) |
|---|---|---|---|
| Ship2 | 0 | 0.584115 | 0.460185 |
| Ship2 | 1 | 0.584115 | 0.460648 |
| Ship2 | 2 | 0.584375 | 0.461111 |
| Ship2 | 3 | 0.584896 | 0.462037 |
| Ship2 | 4 | 0.585677 | 0.462963 |
| Ship2 | 5 | 0.585938 | 0.463426 |
| Ship2 | 6 | 0.586458 | 0.463889 |
| Ship2 | 7 | 0.586719 | 0.464352 |
| Ship2 | 8 | 0.586979 | 0.464815 |
| Ship2 | 9 | 0.587500 | 0.465278 |
| Ship2 | 10 | 0.587760 | 0.466204 |
| Ship2 | 11 | 0.588021 | 0.466204 |
| Ship2 | 12 | 0.588802 | 0.466667 |
| Ship2 | 13 | 0.589323 | 0.467593 |
| Ship2 | 14 | 0.589583 | 0.468981 |
| Ship2 | 15 | 0.590104 | 0.469907 |
| Ship2 | 16 | 0.590365 | 0.470370 |
| Ship2 | 17 | 0.590885 | 0.470833 |
| Ship2 | 18 | 0.591406 | 0.471296 |
| Ship2 | 19 | 0.591667 | 0.472222 |
| Ship2 | 20 | 0.592188 | 0.472685 |
| Ship2 | 21 | 0.592448 | 0.472685 |
| Ship2 | 22 | 0.592969 | 0.473148 |
| Ship2 | 23 | 0.593490 | 0.473611 |
| Ship2 | 24 | 0.593750 | 0.474074 |
| Ship2 | 25 | 0.594271 | 0.474074 |
| Ship2 | 26 | 0.595052 | 0.475000 |
| Ship2 | 27 | 0.595573 | 0.475000 |
| Ship2 | 28 | 0.596354 | 0.475463 |
| Ship2 | 29 | 0.596875 | 0.475926 |
| Ship2 | 30 | 0.597656 | 0.476852 |
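Given such a trajectory, a vessel's ground speed can be estimated by rescaling the relative coordinates to meters; the sketch below assumes a 4096 × 2160 px frame (consistent with the pixel/ratio pairs in the next table), a hypothetical 1 m ground sample distance, and timestamps in seconds:

```python
# Speed estimate from the Ship2 trajectory above. Frame size, GSD, and
# the 1 s timestamp interval are assumptions for illustration.
import math

FRAME_W, FRAME_H = 4096, 2160   # px, matching the pixel/ratio pairs below
GSD = 1.0                        # m/px, hypothetical ground sample distance

track = [(0, 0.584115, 0.460185), (1, 0.584115, 0.460648),
         (29, 0.596875, 0.475926), (30, 0.597656, 0.476852)]  # (t, x, y)

def speed_mps(p, q):
    """Mean ground speed between two (t, x_ratio, y_ratio) samples."""
    (t0, x0, y0), (t1, x1, y1) = p, q
    dx = (x1 - x0) * FRAME_W * GSD   # meters east-west
    dy = (y1 - y0) * FRAME_H * GSD   # meters north-south
    return math.hypot(dx, dy) / (t1 - t0)

print(f"mean speed over the clip: {speed_mps(track[0], track[-1]):.2f} m/s")
```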
| Object | Pixel Coordinate X | Pixel Coordinate Y | Relative Coordinates X, Y (Ratio) |
|---|---|---|---|
| ship_1 | 1032 | 2027 | 0.251953,0.938425 |
| ship_2 | 238 | 1773 | 0.058105,0.820833 |
| ship_3 | 825 | 1720 | 0.201416,0.796296 |
| ship_3 | 513 | 1534 | 0.125244,0.710185 |
| ship_4 | 1851 | 1598 | 0.451904,0.739814 |
| ship_4 | 2342 | 1912 | 0.571777,0.885185 |
| ship_5 | 910 | 1007 | 0.222167,0.466203 |
| ship_5 | 1470 | 1365 | 0.358886,0.631944 |
| ship_6 | 2872 | 1416 | 0.701171,0.655555 |
| ship_6 | 3145 | 1595 | 0.767822,0.738425 |
| ship_7 | 2132 | 1112 | 0.520507,0.514814 |
| ship_7 | 2149 | 1125 | 0.524660,0.520830 |
| ship_7 | 2162 | 1127 | 0.527832,0.521759 |
| ship_7 | 2379 | 1272 | 0.580810,0.588890 |
| ship_7 | 2527 | 1365 | 0.616943,0.631944 |
| ship_8 | 4087 | 1965 | 0.997802,0.909722 |
| ship_9 | 3962 | 1855 | 0.967285,0.858796 |
| ship_10 | 3817 | 1772 | 0.931884,0.820370 |
| ship_11 | 3685 | 1687 | 0.899658,0.781018 |
| ship_12 | 3570 | 1570 | 0.871582,0.726851 |
| ship_13 | 3685 | 957 | 0.899658,0.443055 |
| ship_14 | 3217 | 1410 | 0.785400,0.652777 |
| ship_15 | 2010 | 1290 | 0.490722,0.597222 |
| ship_16 | 2815 | 1122 | 0.687255,0.519444 |
| ship_17 | 2142 | 747 | 0.522949,0.345833 |
| ship_18 | 1685 | 470 | 0.411376,0.217592 |
| ship_19 | 1510 | 350 | 0.368652,0.162037 |
| ship_20 | 1415 | 285 | 0.345458,0.131944 |
| ship_21 | 1315 | 217 | 0.321044,0.100462 |
| ship_22 | 1095 | 110 | 0.267333,0.050925 |
| ship_23 | 1010 | 50 | 0.246582,0.023148 |
| ship_24 | 2866 | 1042 | 0.699707,0.482407 |
| ship_25 | 2895 | 1048 | 0.706787,0.485185 |
| ship_26 | 2905 | 982 | 0.709228,0.454629 |
| ship_27 | 2937 | 927 | 0.717041,0.429166 |
| ship_28 | 2968 | 871 | 0.724609,0.403240 |
Appendix B. Database Design of VideoCube
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| id | int4 | PK | Fact id of the video fact |
| product_key | int4 | FK | Product key of the video fact |
| space_key | int4 | FK | Spatial key of the video fact |
| time_key | int4 | FK | Temporal key of the video fact |
| semantic_key | int4 | FK | Semantic key of the video fact |
| tile_id | int4 | FK | Tile id of the video fact |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| product_key | int4 | PK | Product key of the product |
| product_name | varchar | None | Product name of the product |
| product_type | varchar | None | Product type of the product |
| sensor_key | int4 | FK | Sensor key of the video fact |
| video_key | int4 | FK | Video key of the video fact |
| level_key | int1 | FK | Level key of the video fact |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| level_key | int1 | PK | Level key of the product |
| level_type | enum | None | Level type of the product: one of L0, L1, L2, and L3 |
| level_name | varchar | None | Level name: one of original-level, scene-level, object-level, and event-level |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| video_key | int4 | PK | Video key of the video fact |
| URL | varchar | None | Video file path |
| tile_ids | varchar | FK | The ID set of all video tiles associated with the video file |
| video_resolution | varchar | None | Video resolution |
| video_fps | int2 | None | Frames per second of video file |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| sensor_key | int4 | PK | Sensor key of the video fact |
| sensor_name | varchar | None | Sensor name |
| platform | varchar | None | Name of the platform on which the sensor is mounted |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| time_key | int4 | PK | Time key of the video fact |
| time_standard | varchar | None | Time reference framework |
| time | timestamp8 | None | Time with time zone, e.g., 2003-04-12T04:05:06Z |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| space_key | int4 | PK | Spatial key of the video fact |
| grid_key | int4 | FK | Spatial grid key |
| grid_ids | varchar | None | Grid id sets of the video fact |
| city_key | int4 | FK | City key of the video fact |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| grid_key | int4 | PK | Grid key of the video fact |
| grid_type | varchar | None | Type of the grid reference |
| grid_size | int4 | None | Size of the grid |
| crs | varchar | None | Coordinate reference system of the grid |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| city_key | int4 | PK | City key of the video fact |
| city_name | varchar | None | City name |
| province_name | varchar | None | Province name |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| semantic_key | int4 | PK | Semantic key of the video fact |
| level_key | int4 | FK | Level key of the video fact |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| object_key | int4 | PK | Object key of the video fact |
| semantic_key | int4 | FK | Semantic key of the video fact |
| object_type | int4 | None | Type of the object |
| object_name | varchar | None | Class of the object |
| trajectory_key | int4 | FK | Trajectory key of the object |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| trajectory_key | int4 | PK | Trajectory key of the object |
| trajectory | varchar | None | Trajectory, a set of the object’s coordinates |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| scene_key | int4 | PK | Scene key of the video fact |
| semantic_key | int4 | FK | Semantic key of the video fact |
| scene_type | varchar | None | Type of the scene |
| limit | varchar | None | Limits of the scene |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| event_key | int4 | PK | Event key of the video fact |
| semantic_key | int4 | FK | Semantic key of the video fact |
| event_type | varchar | None | Type of the event |
| event_name | varchar | None | Event name |
| object_keys | int8 | None | The target collection associated with the event |
| moving_features_url | varchar | None | Moving Features files path |
| period | int4 | None | The time range (s) |
| extent | varchar | None | The spatial range, represented by the upper left and lower right corner coordinates. |
| Field Name | Type | Foreign/Primary Key | Description |
|---|---|---|---|
| product_key | int4 | FK | Product key of the video |
| sensor_key | int4 | FK | Sensor key of the product |
| quality | varchar | None | Quality info of the video fact |
| Rowkey | Timestamp | RasterTile | Metadata |
|---|---|---|---|
| tile_id | long | tile (Byte) | Json |
| Path | File | Description |
|---|---|---|
| background_path | Json | Scene-level Moving Features file path |
| object_path | Json | Object-level Moving Features file path |
| event_path | Json | Event-level Moving Features file path |
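For orientation, an object-level Moving Features file in the spirit of OGC MF-JSON could encode a trajectory roughly as follows; the layout is schematic and may differ from the actual files referenced by object_path:

```python
# Schematic MF-JSON-style record for an object-level Moving Features file.
# Field layout is simplified for illustration only.
import json

record = {
    "type": "Feature",
    "properties": {"object_name": "Ship2", "object_type": "Ship"},
    "temporalGeometry": {
        "type": "MovingPoint",
        "datetimes": ["2024-05-01T08:00:00Z", "2024-05-01T08:00:01Z"],
        "coordinates": [[0.584115, 0.460185], [0.584115, 0.460648]],
        "interpolation": "Linear",
    },
}
print(json.dumps(record, indent=2))
```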
References
1. Picoli, M.C.A.; Camara, G.; Sanches, I.; Simões, R.; Carvalho, A.; Maciel, A.; Coutinho, A.; Esquerdo, J.; Antunes, J.; Begotti, R.A.; et al. Big earth observation time series analysis for monitoring Brazilian agriculture. ISPRS J. Photogramm. Remote Sens. 2018, 145, 328–339.
2. Li, D.; Zhang, L.; Xia, G. Automatic analysis and mining of remote sensing big data. Acta Geod. Cartogr. Sin. 2014, 43, 1211–1216.
3. Elliott, J. Earth observation for the assessment of earthquake hazard, risk and disaster management. Surv. Geophys. 2020, 41, 1323–1354.
4. McCabe, M.F.; Rodell, M.; Alsdorf, D.E.; Miralles, D.G.; Uijlenhoet, R.; Wagner, W.; Lucieer, A.; Houborg, R.; Verhoest, N.E.; Franz, T.E.; et al. The future of Earth observation in hydrology. Hydrol. Earth Syst. Sci. 2017, 21, 3879–3914.
5. Zhao, Z. Jilin-1 satellite constellation. Satell. Appl. 2015, 11, F0003.
6. Guo, Z. Satellite Video Processing and Applications. J. Appl. Sci. 2016, 34, 361–370.
7. Pelapur, R.; Candemir, S.; Bunyak, F.; Poostchi, M.; Seetharaman, G.; Palaniappan, K. Persistent target tracking using likelihood fusion in wide-area and full motion video sequences. In Proceedings of the 2012 15th International Conference on Information Fusion, Singapore, 9–12 July 2012; pp. 2420–2427.
8. Davey, S.J.; Gaetjens, H.X. Tracking in Full Motion Video. In Track-Before-Detect Using Expectation Maximisation: The Histogram Probabilistic Multi-Hypothesis Tracker: Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2018; pp. 287–318.
9. Wu, R.; Chen, Y.; Blasch, E.; Liu, B.; Chen, G.; Shen, D. A container-based elastic cloud architecture for real-time full-motion video (FMV) target tracking. In Proceedings of the 2014 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 14–16 October 2014; pp. 1–8.
10. Kant, S. Activity-based exploitation of full motion video (FMV). In Full Motion Video (FMV) Workflows and Technologies for Intelligence, Surveillance, and Reconnaissance (ISR) and Situational Awareness; SPIE: Baltimore, MD, USA, 25 May 2012; pp. 78–88.
11. Asraf Mohamad Sharom, M.A.; Ahmad Fauzi, M.F.; Sipit, A.R.; Mat Azmi, M.Z. Development of video data post-processing technique: Generating consumer drone full motion video (FMV) data for intelligence, surveillance and reconnaissance (ISR). Def. S&T Tech. Bull. 2021, 14, 70.
12. Faraj, F. Object Detection and Pattern of Life Analysis from Remotely Piloted Aircraft System Acquired Full Motion Video. Master’s Thesis, Queen’s University, Kingston, ON, Canada, 2021.
13. Macior, R.E.; Bright, G.A.; Walter, S.M. Adapting full motion video data for the real world. Electro-Opt. Infrared Syst. Technol. Appl. V 2008, 7113, 361–370.
14. Gong, J.; Li, G. China’s Earth Observation Data Resources Development Report (2019); National Earth Observation Science Data Center: Beijing, China, 2019.
15. Lewis, A.; Lacey, J.; Mecklenburg, S.; Ross, J.; Siqueira, A.; Killough, B.; Szantoi, Z.; Tadono, T.; Rosenqvist, A.; Goryl, P.; et al. CEOS Analysis Ready Data for Land (CARD4L) Overview. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 7407–7410.
16. Dwyer, J.L.; Roy, D.P.; Sauer, B.; Jenkerson, C.B.; Zhang, H.K.; Lymburner, L. Analysis ready data: Enabling analysis of the Landsat archive. Remote Sens. 2018, 10, 1363.
17. Zhu, Z. Science of Landsat analysis ready data. Remote Sens. 2019, 11, 2166.
18. Frantz, D. FORCE—Landsat + Sentinel-2 analysis ready data and beyond. Remote Sens. 2019, 11, 1124.
19. Landsat Collection 2 U.S. Analysis Ready Data. Available online: https://www.usgs.gov/ (accessed on 27 August 2025).
20. Pahlevan, N.; Schott, J.R.; Franz, B.A.; Zibordi, G.; Markham, B.; Bailey, S.; Schaaf, C.B.; Ondrusek, M.; Greb, S.; Strait, C.M. Landsat 8 remote sensing reflectance (Rrs) products: Evaluations, intercomparisons, and enhancements. Remote Sens. Environ. 2017, 190, 289–301.
21. Pahlevan, N.; Mangin, A.; Balasubramanian, S.V.; Smith, B.; Alikas, K.; Arai, K.; Barbosa, C.; Bélanger, S.; Binding, C.; Bresciani, M.; et al. ACIX-Aqua: A global assessment of atmospheric correction methods for Landsat-8 and Sentinel-2 over lakes, rivers, and coastal waters. Remote Sens. Environ. 2021, 258, 112366.
22. Rosenqvist, A.; Tadono, T.; Shimada, M.; Itoh, T. JAXA global SAR mosaics—Assessing compliance with CEOS Analysis Ready Data for Land (CARD4L) specifications. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 5545–5548.
23. Yuan, F.; Repse, M.; Leith, A.; Rosenqvist, A.; Milcinski, G.; Moghaddam, N.F.; Dhar, T.; Burton, C.; Hall, L.; Jorand, C.; et al. An operational analysis ready radar backscatter dataset for the African continent. Remote Sens. 2022, 14, 351.
24. Geographic Information—Schema for Moving Features. Available online: https://www.iso.org/standard/41445.html (accessed on 27 August 2025).
25. Hayashi, H.; Asahara, A.; Kim, K.; Shibasaki, R.; Ishimaru, N. OGC Moving Features Access. Version 1.0. Available online: https://repository.oceanbestpractices.org/handle/11329/1009 (accessed on 27 August 2025).
26. Bachmann, M.; Alonso, K.; Carmona, E.; Gerasch, B.; Habermeyer, M.; Holzwarth, S.; Krawczyk, H.; Langheinrich, M.; Marshall, D.; Pato, M.; et al. Analysis-ready data from hyperspectral sensors—The design of the EnMAP CARD4L-SR data product. Remote Sens. 2021, 13, 4536.
27. Lewis, A.; Oliver, S.; Lymburner, L.; Evans, B.; Wyborn, L.; Mueller, N.; Raevksi, G.; Hooke, J.; Woodcock, R.; Sixsmith, J.; et al. The Australian Geoscience Data Cube—Foundations and lessons learned. Remote Sens. Environ. 2017, 202, 276–292.
28. Earth Resources Observation and Science (EROS) Center. Landsat 8-9 Operational Land Imager/Thermal Infrared Sensor Level-2 Collection 2. Available online: https://www.usgs.gov/centers/eros/science/usgs-eros-archive-landsat-archives-landsat-8-9-olitirs-collection-2-level-2 (accessed on 27 November 2020).
29. Ørka, H.O.; Gailis, J.; Vege, M.; Gobakken, T.; Hauglund, K. Analysis-ready satellite data mosaics from Landsat and Sentinel-2 imagery. MethodsX 2023, 10, 101995.
30. Chini, M.; Pelich, R.; Hostache, R.; Matgen, P.; López-Martinez, C. Towards a 20 m global building map from Sentinel-1 SAR data. Remote Sens. 2018, 10, 1833.
31. Khosravi, M.R.; Tavallali, P. Real-time statistical image and video processing for remote sensing and surveillance applications. J. Real-Time Image Process. 2021, 18, 1435–1439.
32. Schowengerdt, R.A. Remote Sensing: Models and Methods for Image Processing; Elsevier: Amsterdam, The Netherlands, 2006.
33. Sharma, V.; Gupta, M.; Kumar, A.; Mishra, D. Video processing using deep learning techniques: A systematic literature review. IEEE Access 2021, 9, 139489–139507.
34. Kastrinaki, V.; Zervakis, M.; Kalaitzakis, K. A survey of video processing techniques for traffic applications. Image Vis. Comput. 2003, 21, 359–381.
35. AWS Ground Station—Easily Control Satellites and Ingest Data. Available online: https://aws.amazon.com/cn/ground-station/ (accessed on 27 August 2025).
36. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
37. Zhang, R.; Zhang, J.; Wang, W. Remote sensing imaging analysis and ubiquitous cloud-based mobile edge computing based intelligent forecast of forest tourism demand. Distrib. Parallel Databases 2023, 41, 95–116.
38. Jiang, Q.; Zheng, L.; Zhou, Y.; Liu, H.; Kong, Q.; Zhang, Y.; Chen, B. Efficient on-orbit remote sensing imagery processing via satellite edge computing resource scheduling optimization. IEEE Trans. Geosci. Remote Sens. 2025, 63, 1000519.
39. Koubaa, A.; Ammar, A.; Abdelkader, M.; Alhabashi, Y.; Ghouti, L. AERO: AI-enabled remote sensing observation with onboard edge computing in UAVs. Remote Sens. 2023, 15, 1873.
40. Hu, W.; Li, W.; Zhou, X.; Kawai, A.; Fueda, K.; Qian, Q.; Wang, J. Spatio-temporal graph convolutional networks via view fusion for trajectory data analytics. IEEE Trans. Intell. Transp. Syst. 2022, 24, 4608–4620.
41. Zhang, X.; Reichard-Flynn, W.; Zhang, M.; Hirn, M.; Lin, Y. Spatiotemporal graph convolutional networks for earthquake source characterization. J. Geophys. Res. Solid Earth 2022, 127, e2022JB024401.
42. Cheng, G.; Xie, X.; Han, J.; Guo, L.; Xia, G. Remote sensing image scene classification meets deep learning: Challenges, methods, benchmarks, and opportunities. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3735–3756.
43. Phan, T.C.; Nguyen, T.T.; Hoang, T.D.; Nguyen, Q.V.H.; Jo, J. Multi-scale bushfire detection from multi-modal streams of remote sensing data. IEEE Access 2020, 8, 228496–228513.
44. Gu, Y.; Wang, Y.; Li, Y. A survey on deep learning-driven remote sensing image scene understanding: Scene classification, scene retrieval and scene-guided object detection. Appl. Sci. 2019, 9, 2110.
45. Cheng, G.; Han, J.; Lu, X. Remote sensing image scene classification: Benchmark and state of the art. Proc. IEEE 2017, 105, 1865–1883.
46. Guttman, A. R-trees: A dynamic index structure for spatial searching. In Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, Boston, MA, USA, 18–21 June 1984; pp. 47–57.
47. Finkel, R.A.; Bentley, J.L. Quad trees: A data structure for retrieval on composite keys. Acta Inform. 1974, 4, 1–9.
48. Li, S.; Sun, X.; Gu, Y.; Lv, Y.; Zhao, M.; Zhou, Z.; Guo, W.; Sun, Y.; Wang, H.; Yang, J. Recent advances in intelligent processing of satellite video: Challenges, methods, and applications. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 6776–6798.
49. Byeong-Ho, K. A review on image and video processing. Int. J. Multimed. Ubiquitous Eng. 2007, 2, 49–64.
50. Li, S.; Cui, Y.; Liu, M.; He, H.; Ravan, S. Integrating global open geo-information for major disaster assessment: A case study of the Myanmar flood. ISPRS Int. J. Geo-Inf. 2017, 6, 201.
51. ISO 19144-2:2023; Geographic Information—Classification Systems—Part 2: Land Cover Meta Language (LCML). International Organization for Standardization: Geneva, Switzerland, 2023. Available online: https://www.iso.org/standard/81259.html (accessed on 27 August 2025).
52. Lin, T.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
53. Thomsen, E. OLAP Solutions: Building Multidimensional Information Systems; John Wiley & Sons: Hoboken, NJ, USA, 2002.
54. Chaudhuri, S.; Dayal, U. An overview of data warehousing and OLAP technology. ACM SIGMOD Rec. 1997, 26, 65–74.
55. Bachmann, P. Datacube standards and their contribution to analysis-ready data. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 2051–2053.
56. Varghese, R.; Sambath, M. YOLOv8: A novel object detection algorithm with enhanced performance and robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6.
57. Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649.
58. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.; et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 4015–4026.
59. Wang, M.; Yu, D.; He, W.; Yue, P.; Liang, Z. Domain-incremental learning for fire detection in space-air-ground integrated observation network. Int. J. Appl. Earth Obs. Geoinf. 2023, 118, 103279.
60. Leotta, M.J.; Long, C.; Jacquet, B.; Zins, M.; Lipsa, D.; Shan, J.; Xu, B.; Li, Z.; Zhang, X.; Chang, S.; et al. Urban semantic 3D reconstruction from multiview satellite imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019.
| Product/Standard | Data Type | Provider | Temporal Support | Semantic Granularity | Video Preprocessing |
|---|---|---|---|---|---|
| Landsat C2 U.S. | Optical (30 m) | USGS | Multi-temporal (since 1982) | Pixel-level (reflectance, QA) | × |
| CARD4L | Optical, SAR | CEOS | Time-series compatible | Pixel-level (reflectance, backscatter) | × |
| CEOS-ARD * | Thematic (Land, Ocean) | CEOS | - | Thematic-level (domain classes) | Conceptual |
| Sentinel-2 L2A | Optical (10–60 m) | ESA | 5-day revisit | Pixel-level (surface reflectance) | × |
| Sentinel-1 RTC | SAR (C-band, 20 m) | ESA | 6-day revisit | Pixel-level (terrain-corrected backscatter) | × |
| Planet ARPS | Optical (3 m) | Planet Labs | Near-daily | Pixel-level (four-band reflectance) | × |
| Open Data Cube | Multi-source | ODC | Multi-temporal archive | Pixel-level (gridded data) | × |
| VideoARD (Ours) | Optical video | WHU | Sub-second continuous | Scene/object/event-level | Stabilization, SR |
| Data Level | Product Name | Description |
|---|---|---|
| L0 | Original video | Video products representing surface radiance values after geometric correction, radiometric correction, video stabilization, and super-resolution reconstruction, accompanied by essential auxiliary information |
| L1 | Scene-level VideoARD | Processed video imagery mapped onto a unified temporal and spatial scale, ensuring integrity and consistency of variables, supplemented with scene-level information |
| L2 | Object-level VideoARD | Extends L1 data by incorporating auxiliary information that describes object-level information within the video |
| L3 | Event-level VideoARD | Builds upon L2 data by further adding auxiliary information that characterizes event-level information in the video |
| Metadata Parameter | Description | Minimum Configuration | Optimal Configuration |
|---|---|---|---|
| Video keyframe encoding | Video keyframe sequence number | Video keyframe ID | The ID of the video keyframe after thinning. The ID reflects the time of the video keyframe in the video. |
| Video spatial range | The spatial range of the surface covered by the video data | All video frames use a unified spatial range | Each video frame has a unique spatial range, which is manually verified |
| Video scene information | Scene information in the video | Type of static ground objects in the video | Manually verified background information in the video, consisting of object type, spatial range, and limit value |
| Scene information storage | Saves extracted scene information | Metadata fields | Scene information file |
| Metadata Parameter | Description | Minimum Configuration | Optimal Configuration |
|---|---|---|---|
| Object detection | Detects objects in video frames | Axis-aligned bounding boxes | Oriented bounding boxes |
| Object positioning | Extracts the location of objects in video frames | Objects’ relative position within the video frame | Objects’ actual surface coordinates |
| Object classification | Classifies detected objects | Primary classification | Secondary classification with human interpretation assistance |
| Object information storage | Saves extracted object information | Metadata fields | Object information files |
| Metadata Parameter | Description | Minimum Configuration | Optimal Configuration |
|---|---|---|---|
| Object tracking | Tracks the same object across different video frames | Assigns distinct object numbers | Eliminates interference, ensuring the same object keeps the same number while different objects receive different numbers |
| Trajectory tracking | Extracts the trajectory of an object within the video | A set of relative positions of the object within the video | A set of real-world coordinate points corresponding to the time |
| Abnormal behavior limits | Scenario-specific abnormal behaviors | Default restricted behaviors | User-configured restricted behaviors |
| Abnormal behavior perception | Determines whether the object’s behavior in the video is abnormal | Detects default abnormal behaviors | Detects user-configured abnormal behaviors |
| Event information storage | Records the detected abnormal behavior information | Records default abnormal event information (event type, time period, spatial scope, and associated objects) in metadata | Records complete abnormal event information (event number, event type, time period, spatial scope, associated objects, object trajectory, and object status) in an abnormal event file |
| Level 1 | Level 2 | Level 3 |
|---|---|---|
| Scene | Artificial Surfaces | Building, Bridge, Transportation, Railway, Tunnel, Vegetation, City Furniture, Land Use, Relief, Road, City Object Group |
| | Cultivated Areas | Cultivated Areas |
| | Waterbodies | Ocean, Sea, Lake, River, Pond, Swamp |
| | Permanent Snow/Ice | Snow/Ice |
| | Forest Areas | Forest |
| | Grassland | Grassland |
| | Wetlands | Wetlands |
| | Bare Areas | Bare Areas |
| Object | Person | Person |
| | Vehicle | Bicycle, Car, Motorcycle, Bus, Truck, Train |
| | Ship | Commercial Vessels, Military Vessels, Leisure Vessels, Special Purpose Vessels, Workboats, Support Vessels |
| | Airplane | Commercial Aircraft, Cargo Aircraft, Military Aircraft, General Aviation Aircraft |
| | Animal | Dog, Cat, Cow, Horse, Elephant, Sheep, Bear, Giraffe, Zebra, … |
| | Micro-Object | Sports Ball |
| | Others | Chair, Dining Table, TV, Bed, … |
| Event | Motion | Stationary, Constant Speed, Acceleration, Deceleration, Turning, Loitering, Abnormal Path |
| | Interaction | Meeting, Collision, Separation, Following, Parallel Movement |
| | State Change | Appearance Change, Disappearance, Emergence |
| | Environmental Monitoring | Wildfire, Flood, Landslide, Earthquake, Tsunami, … |
| | Statistical Aggregated | Object Count Threshold Reached, Abnormal Object Density, … |
| Query Type | Functional Description |
|---|---|
| Slice | Selects a single member from a specified dimension, producing a sub-cube restricted to that member. |
| Trim | Selects multiple members from a specific dimension to form a subset of the data cube. |
| Roll-up | Aggregates measures along a dimension to produce a new cube with coarser granularity. |
| Drill-down | Performs the inverse of roll-up, retrieving data at finer granularity from an aggregated cube. |
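Over a cube flattened into a fact table, these four operations reduce to filtering and aggregation; a pandas sketch under the illustrative column names from Section 3.2.1:

```python
# EOLAP operations expressed over a flattened cube; column names are
# the illustrative ones from the star-schema sketch, not the real schema.
import pandas as pd

cube = pd.DataFrame({
    "time": pd.to_datetime(["2024-05-01 08:00:00", "2024-05-01 08:00:01",
                            "2024-05-01 08:00:02", "2024-05-01 08:00:03"]),
    "grid_id": ["G_32_18", "G_32_18", "G_32_19", "G_32_19"],
    "object_count": [3, 4, 2, 5],
})

slice_ = cube[cube["grid_id"] == "G_32_18"]                # Slice: one member
trim = cube[cube["grid_id"].isin(["G_32_18", "G_32_19"])]  # Trim: member subset
rollup = cube.groupby("grid_id")["object_count"].sum()     # Roll-up: coarser grain
# Drill-down: return from the aggregate to the per-frame rows of one member.
drilldown = cube[cube["grid_id"] == "G_32_19"]

print(rollup)
```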
| Node Role | Operating System | CPU | Memory | Storage |
|---|---|---|---|---|
| Master | CentOS 7.9.2009 | Intel Xeon CPU E5-2692 v2 @ 2.20 GHz | 94 GB | PB-scale |
| Worker #0 | CentOS 7.9.2009 | Intel Xeon CPU E5-2692 v2 @ 2.20 GHz | 128 GB | |
| Worker #1 | CentOS 7.9.2009 | Intel Xeon CPU E5-2692 v2 @ 2.20 GHz | 128 GB | |
| Worker #2 | CentOS 7.9.2009 | Intel Xeon CPU E5-2692 v2 @ 2.20 GHz | 128 GB |
| Software | Version | Purpose |
|---|---|---|
| MySQL | 5.5.62 | Storage and retrieval of transactional data from the legacy system |
| PostgreSQL | 9.4.4 | Storage and retrieval of remote sensing video metadata |
| Hadoop | 2.7.4 | Distributed file system for large-scale data storage |
| HBase | 1.4.13 | Distributed storage for structured scene-level and object-level data |
| MinIO | 8.3.3 | Distributed object storage for unstructured video data |
| JDK | 1.8.0_131 | Java runtime environment |
| Tomcat | 8.5.57 | Application server for system service deployment |
| Apache | 2.4.10 | Web server for system front-end delivery |
| FFmpeg | 6.0 | Video preprocessing |
| OpenCV | 4.8.0.76 | Object detection and tracking |
| YOLO | v8 | Object detection, classification, and tracking in video streams |
| Datasets | Content |
|---|---|
| VideoARD#1 | 21 Jilin-1 satellite videos and 5 Shuangqing-1 satellite videos |
| VideoARD#2 | 50.4 GB of UAV fire videos and 42.5 GB of UAV forest videos |
| Stage | Duration (s) | Variance (±) | Description |
|---|---|---|---|
| Cube Creation | 32 | 1 | Including preprocessing and storage |
| Data Query | 3.2 | 0.25 | Query and index reading |
| Event Creation | 15 | 5 | Create the event |
| Alert Push | 3.0 | 0.25 | Triggering and push |
| Stage | Duration (s) | Variance (±) | Description |
|---|---|---|---|
| Cube Creation | 10 | 1 | Including preprocessing and storage |
| Data Query | 3.2 | 0.25 | Query and index reading |
| Event Creation | 8 | 5 | Create the event |
| Alert Push | 2.9 | 0.2 | Triggering and push |
| Stage | Duration (s) | Variance (±) | Description |
|---|---|---|---|
| Cube Creation | 10 | 1 | Including preprocessing and storage |
| Data Query | 3.5 | 0.25 | Query and index reading |
| Event Creation | 305 | 150 | Create the event |
| Alert Push | 910 | 1 | Triggering and push |
| Case | Algorithm | Metric | Group A | Group B | Δ (A − B) |
|---|---|---|---|---|---|
| #1 | YOLOv8 + DeepSORT | MOTA | 59.6 | 58.7 | +0.9 |
| | | IDF1 | 64.9 | 64.1 | +0.8 |
| #2 | DEnet | Precision | 85.3 | 84.5 | +0.8 |
| | | Recall | 83.4 | 82.7 | +0.7 |
| #3 | Danesfield | Z-RMSE | 1.29 | 1.32 | −0.03 |
| | | H-RMSE | 1.91 | 2.06 | −0.15 |