1. Introduction
The identification of human behavior patterns can play an important role in detecting and characterizing transportation events of interest. Among the more readily observable human behaviors, vehicular activity can provide insight into ongoing or planned activities. The ability to track specific vehicles over time within a region of interest is a key enabling technology for vehicle-based transportation analysis. In this work, our goal is to leverage several multi-view vehicle datasets to train neural networks capable of linking vehicle observations across viewpoints and inferring longer-term patterns. We validate our method with data collected by unmanned aerial vehicles (UAVs) from varying elevations, angles, and distances to simulate different sensor perspectives. The goal of vehicle re-identification (re-ID) is to match vehicle images against previously collected images to identify re-occurrences of the same vehicle.
The ability to track vehicles across a broad suite of sensors provides the added benefit of being able to re-identify a vehicle that visits a specific location but takes a different route each time and thereby passes a different set of sensors. Vehicle detection from UAV imagery has attracted extensive research attention in recent years. UAVs have been increasingly applied to traffic monitoring and utilized as a platform for vehicle detection [1,2,3]. The topic of vehicle re-ID has also been studied extensively, and various approaches have used specific discriminative vehicle features [4,5], spatio-temporal information [6,7], and combinations of vehicle shape and license plate characteristics [8]. In our recent paper [9], we introduced a decision fusion framework using features extracted from vehicle images and their wheel patterns. We configured Siamese networks to select crucial signatures from vehicular pairs of images. The proposed model examined the dependence between features generated from side-view vehicle images to correctly match pairs of vehicles. To that end, we collected more than 500,000 side-view vehicle images with ground-level sensors on the Oak Ridge National Laboratory (ORNL) campus.
Vehicle re-ID over multiple-view camera networks is challenging in intelligent transportation systems because of the largely uncontrolled nature of the camera viewing angles. Vehicle re-ID models can be utilized in many crucial real-world applications, including searching for suspicious vehicles, tracking vehicles across cameras, automating toll collection, managing road-access restrictions, analyzing traffic behavior, counting vehicles, estimating travel times, and supporting access and border control. While both person and vehicle re-identification are challenging, vehicle re-ID is especially difficult since changes in viewpoint cause large variations in a vehicle’s visual appearance. In other words, images of the same vehicle from different viewpoints share much less overlapping visual information than images of the same person, since many discriminative vehicle features are viewpoint-dependent.
Due to ongoing improvements in computational resources, deep network architectures, and data availability, automatic vehicle re-ID for large-scale urban surveillance is now a realizable goal. Vehicle re-ID approaches can be grouped into two categories: single-view (identification from a known, fixed camera viewpoint) and multi-view (identification across different camera viewpoints). Single-view approaches assume that a vehicle’s appearance in an image will not change significantly between subsequent observations; thus, these approaches are suitable for specific types of sensor installations. For example, highway and overpass-mounted cameras capture vehicles from predictable angles, allowing vision-based re-ID methods to assume fixed viewpoints. Such models, however, do not generalize well to uncontrolled scenarios where camera positions and perspectives vary.
In general, surveillance cameras installed at different locations provide distinct viewing angles. Consequently, a multi-view approach is necessary for vehicle re-ID to leverage information from all available cameras. License plate-based multi-view vehicular re-ID models can achieve high accuracy because license plates are unique identifiers. Nevertheless, license plates can be unreliable because of occlusion, challenging illumination conditions, the absence of a license plate in some scenarios, or even deliberate deception. Furthermore, license plates are visible only in the rear or, in some jurisdictions, the front view of a vehicle, which limits their usefulness in realistic scenarios where many surveillance cameras capture top-down or side views.
To the best of our knowledge, few efforts have addressed uncontrolled multi-view vehicle re-ID. Most existing approaches are designed for limited or fixed viewpoints, making viewpoint variation a central challenge. Several studies have attempted to mitigate this issue. Wang et al. [7] proposed an orientation-invariant feature embedding that extracts local region features of different orientations based on 20 key point locations. Prokaj et al. [10] introduced a method for recognizing a vehicle’s make and model in a video clip taken from an arbitrary viewpoint using a pose estimation-based approach. Zhou et al. [11] designed an end-to-end deep learning architecture to learn the different viewpoints of a vehicle using long short-term memory. However, these approaches remain limited to constrained settings or controlled datasets and have not generalized to truly uncontrolled multi-view scenarios.
Generative adversarial networks (GANs) have supported a variety of applications, including object re-ID, since their introduction in 2014 [12]. Multiple frameworks have adopted GANs for vehicle re-ID. Lou et al. [13] proposed an adversarial learning network that performs end-to-end embedding and can generate samples localized in the embedding space. The proposed adversarial learning scheme automatically produces artificial cross-view samples that enhance the network’s ability to discriminate among similar vehicles. Zhou et al. [14] proposed a framework for image generation using convolutional GANs. Their deep network architecture, namely cross-view GAN, generates vehicles in different viewpoints based on a single input image to re-identify vehicles and infer cross-view images. A common limitation of these approaches is their reliance on densely sampled camera viewpoints, which are rarely available in practical surveillance settings. Moreover, existing datasets exhibit low diversity, which leads to poor generalization performance for the trained models.
2. Contribution
The objective of this work is to extend the described matching concept to re-identify vehicles across a wide range of viewing angles and distances, such as those acquired from satellite imagery, UAVs, surveillance cameras, and ground sensors. To further enhance the robustness and efficiency of the proposed re-ID approach, we integrate synthetic data into the training set. Appropriate inclusion of synthetic data, generated under a wide range of simulated viewing angles and environmental conditions, improves overall re-ID accuracy beyond what is achievable using only collected data.
To diversify our training data, we created a large dataset of both real and synthetic vehicle images with varied vehicle types and camera perspectives. We rendered synthetic high-resolution photorealistic vehicular images using a custom rendering pipeline. The generated images span diverse conditions of lighting, elevation angles, and camera positions. Examples of synthetic images generated by our software are shown in Figure 1.
The synthetic and real imagery datasets were combined to create a robust training dataset. Furthermore, we developed an algorithm for matching multi-view vehicle images and evaluated its performance using both simulated and real data, examining parameters such as viewpoint variation, illumination, and vehicle type. UAV platforms were leveraged to collect multi-perspective real vehicle imagery for training and validation, and imagery from surveillance and traffic cameras as well as ground sensors was also incorporated for both purposes. This data acquisition simulated both fixed surveillance cameras and overhead sensing assets over a wide variety of viewing configurations. Compared with fixed sensors, on-demand UAV-based collection offers greater flexibility in image resolution, direction, distance, and angle, and provides a continuum of viewpoints for evaluation. When used for wide-area surveillance, UAV imagery also offers the opportunity to observe vehicles from multiple angles as they traverse the field of view, enabling improved identification accuracy through richer visual information. Our approach aims to extend the limits of vehicle re-ID across disparate sensor views and deliver a more robust tracking capability.
3. Dataset
Reliable data that reflect practical surveillance conditions are crucial for training and evaluating any vehicle re-ID algorithm. The training dataset used in this study comprises three components: preexisting datasets, a synthetic dataset generated with Blender, and a collected dataset obtained during this project.
3.1. Preexisting Datasets
In this task, we utilized publicly available surveillance benchmark datasets of real imagery (Table 1). These datasets included VeRi-776 [15], PKU VehicleID [16], and VERI-Wild [17], as well as two UAV-collected vehicular datasets, namely VRAI [18] and UAV-VeID [19]. VeRi-776 [15] has been widely used by the computer vision community for vehicle re-ID applications. The dataset contains 50,000 images collected by 18 surveillance cameras for 776 different vehicles. Each vehicle has 2 to 18 viewpoints with varying resolutions, occlusion scenarios, and illumination conditions. Each image is annotated with vehicle color, type, model, and license plate information. The PKU VehicleID dataset [16] was assembled by the National Engineering Laboratory for Video Technology (NELVT). The dataset contains daytime data collected by multiple surveillance cameras in a small city in China. In total, the dataset comprises 26,267 vehicles and 221,763 images. The VERI-Wild dataset [17] contains 416,314 images of 40,671 vehicles captured under varying viewpoints, illumination conditions, occlusions, and backgrounds. We also utilized the Profile Images and Annotations for Vehicle Re-identification Algorithms (PRIMAVERA) dataset [20], which contains a comprehensive collection of side-view vehicular images previously employed in research on vehicle re-ID [9,21]. This dataset contains 636,246 images of 13,963 distinct vehicles, captured during both daytime and nighttime conditions over several years.
The VRAI dataset [18] was collected using UAVs and divided into separate training and testing sets. The training set contains 66,113 images of 6302 vehicle identities, and the test set includes 71,500 images of 6720 identities. The UAV-VeID dataset [19] was captured at multiple locations under diverse background, lighting, viewpoint, scale, and partial-occlusion conditions. The UAVs operated at altitudes ranging from 15 to 60 m to produce vehicle images with varying scales and resolutions. This dataset comprises 41,917 vehicle bounding boxes corresponding to 4601 unique vehicles.
3.2. Synthetic Dataset
In addition to real imagery, we generated a synthetic dataset of high-resolution, photorealistic vehicle images using Blender version 4.0.2, scripted with Python 3.10.12 and rendered with the Cycles engine. The purpose of this dataset was to supplement real data and fill gaps in the parameter space by systematically varying scene conditions. A custom Python script automated the rendering of diverse vehicle models under different speeds, camera viewing angles, solar positions, and illumination levels. The simulator was configured to mimic both drone and overhead perspectives while reproducing realistic degradations such as camera jitter, motion blur, and atmospheric distortion. This ensured that the synthetic data complemented real imagery rather than idealizing it, while also extending coverage to viewpoints and lighting conditions that were underrepresented in the collected dataset.
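As a minimal illustration of this kind of pipeline, the Blender (bpy) loop below sweeps camera elevation, azimuth, range, and sun elevation and renders a still for each combination with Cycles. The object names ("Camera", "Sun"), parameter grids, and resolution are illustrative assumptions; the production script also randomized effects such as motion blur and camera jitter, which are omitted here.

```python
# Sketch of a bpy rendering sweep; object names and grids are assumptions, not the
# exact production script. The camera is assumed to carry a Track To constraint
# that keeps it aimed at the vehicle model.
import math
import bpy

scene = bpy.context.scene
scene.render.engine = 'CYCLES'          # photorealistic path tracer
scene.render.resolution_x = 1920
scene.render.resolution_y = 1080

camera = bpy.data.objects["Camera"]
sun = bpy.data.objects["Sun"]           # sun lamp controlling illumination

# Sweep elevation, azimuth, range, and sun elevation to cover the viewpoint grid.
for elev_deg in (15, 30, 45, 60, 75):
    for azim_deg in range(0, 360, 30):
        for dist_m in (10, 25, 50):
            for sun_elev_deg in (20, 45, 70):
                elev, azim = math.radians(elev_deg), math.radians(azim_deg)
                camera.location = (
                    dist_m * math.cos(elev) * math.cos(azim),
                    dist_m * math.cos(elev) * math.sin(azim),
                    dist_m * math.sin(elev),
                )
                # Tilt the sun lamp away from vertical by (90 - elevation) degrees.
                sun.rotation_euler = (math.radians(90 - sun_elev_deg), 0.0, azim)
                scene.render.filepath = (
                    f"//renders/e{elev_deg}_a{azim_deg}_d{dist_m}_s{sun_elev_deg}.png"
                )
                bpy.ops.render.render(write_still=True)
```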
The rendering pipeline was further tuned to target specific regions of the parameter space where real data was sparse, and the collection was expanded to include commercial trucks for added diversity. Synthetic renders were incorporated into both the training and validation sets, where their inclusion improved model robustness to varied conditions and revealed preprocessing refinements that enhanced the utility of the synthetic imagery. By combining realism with systematic variability, the synthetic dataset played a crucial role in strengthening the adaptability and performance of the vehicle re-ID model across a wide range of operating conditions.
3.3. Collected Dataset
To complement preexisting and synthetic datasets, we collected a novel dataset on the ORNL campus using roadside cameras and UAVs. The objective was to construct a multi-perspective vehicle image corpus in which each vehicle is consistently labeled across ground-based and aerial viewpoints. Over a six-month period, six collection events were conducted at three campus sites, producing approximately 20 h of UAV video. These collections yielded several thousand vehicle detections. Because plate reads were occasionally unsuccessful and because some vehicles reappeared across events, the final dataset contains on the order of 10³ unique vehicles. During early, limited collections conducted to validate the synchronization pipeline, approximately 300 unique roadside detections were automatically aligned with coincident UAV frames, yielding multiple aerial perspectives for each vehicle. Although the pipeline was developed to support the broader effort rather than as a primary research contribution, it operated with a high degree of automation and demonstrated the feasibility of large-scale multi-view association, even if not every possible match was exhaustively captured.
To manage the data volume and complexity of the collections, we developed and deployed a highly automated processing pipeline, comprising the following steps. Roadside footage was ingested and parsed frame by frame, and vehicles with legible license plates were detected using the Ultimate Automatic License Plate Recognition (ALPR) SDK, version 3.13.0 (Doubango Telecom). Validation logic required consistent multi-frame plate reads before a vehicle was registered. All detections were then organized into subfolders by license plate, which formed the organizing principle for the dataset. Because both ingressing and egressing traffic was recorded, some vehicles were seen multiple times—entering and exiting, or reappearing across events. The folder structure preserved these distinctions while unifying all occurrences under the vehicle’s plate identifier. Each record was further enriched with structured metadata, including make, model, color, and approximate year, inferred by OpenAI’s GPT-4o model accessed through its API.
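The sketch below illustrates the shape of this ingestion step. The read_plate helper is a stand-in for the UltimateALPR SDK call (its real API is not reproduced here), and the three-consecutive-read requirement is an assumed instance of the multi-frame validation logic.

```python
# Illustrative roadside ingestion: frames are read with OpenCV, plates come from a
# hypothetical read_plate wrapper, and a vehicle is registered only after
# consistent multi-frame reads. Crops are filed into one subfolder per plate.
from collections import deque
from pathlib import Path
import cv2

MIN_CONSISTENT_READS = 3          # assumed validation requirement

def ingest_roadside(video_path: str, out_root: str, read_plate) -> None:
    """read_plate(frame) -> (plate_text or None, vehicle_crop or None)."""
    recent = deque(maxlen=MIN_CONSISTENT_READS)
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        plate, crop = read_plate(frame)
        recent.append(plate)
        # Register only when the last N reads agree on a non-empty plate string.
        if plate and list(recent).count(plate) == MIN_CONSISTENT_READS:
            plate_dir = Path(out_root) / plate        # one subfolder per license plate
            plate_dir.mkdir(parents=True, exist_ok=True)
            cv2.imwrite(str(plate_dir / f"roadside_{frame_idx:06d}.jpg"), crop)
        frame_idx += 1
    cap.release()
```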
The pipeline also established correspondences with UAV video, helping establish ground truth for the collected dataset. Temporal alignment was achieved by applying empirically determined temporal offsets between roadside and drone recordings, after which candidate aerial frames were extracted around the coincident timestamps. Each aerial frame was processed with You Only Look Once (YOLO), version 8.3.140 (Ultralytics), which returned bounding boxes for all visible vehicles. These bounding boxes were used to crop the detected vehicles, and the resulting aerial crops were submitted in the same query with a cropped roadside vehicle image to OpenAI’s GPT-4o API. The model returned a probability that each aerial crop corresponded to the roadside vehicle; these probabilities were then thresholded to identify the most likely match, if any, for that frame. UltimateALPR and YOLO were applied as described in the pipeline in accordance with their respective strengths: UltimateALPR excelled at license plate recognition in ground-level imagery but struggled with overhead views, whereas YOLO proved effective for detecting vehicles in UAV frames where plates were rarely legible. Confirmed matches were cropped, annotated, and integrated into the dataset under the appropriate license plate identity.
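The aerial side of this correspondence step can be sketched as follows, using the Ultralytics YOLO Python API to detect and crop vehicles in frames sampled around the time-aligned timestamp. The offset handling, the ±2 s search window, and the COCO class filter are illustrative assumptions.

```python
# Pull candidate aerial frames near a time-aligned roadside detection and crop
# vehicles with Ultralytics YOLO; window and step sizes are illustrative.
from ultralytics import YOLO
import cv2

yolo = YOLO("yolov8n.pt")  # any YOLOv8 detection weights

def aerial_candidates(uav_video: str, roadside_t_s: float, offset_s: float,
                      window_s: float = 2.0, step_s: float = 0.5):
    """Yield (timestamp, vehicle crop) pairs from UAV frames near the aligned time."""
    cap = cv2.VideoCapture(uav_video)
    t = roadside_t_s + offset_s - window_s
    while t <= roadside_t_s + offset_s + window_s:
        cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000.0)
        ok, frame = cap.read()
        if ok:
            det = yolo(frame)[0].boxes
            for (x1, y1, x2, y2), c in zip(det.xyxy.cpu().numpy().astype(int),
                                           det.cls.cpu().numpy().astype(int)):
                if c in (2, 5, 7):        # COCO car / bus / truck classes
                    yield t, frame[y1:y2, x1:x2]
        t += step_s
    cap.release()
```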
For clarity, GPT-4o served only as a pairwise scoring component within the ground-truthing workflow. Each query supplied exactly two RGB crops, a roadside image and a UAV crop, together with a short prompt requesting a single probability that the two images depict the same vehicle type. The LLM therefore operated strictly on individual image pairs and played no role in resolving multi-view variability. Fusion of viewpoint differences was addressed later by our dedicated multi-view re-identification model during both training and inference. Although a given UAV frame could contain several candidate vehicles, the likelihood of multiple visually similar candidates appearing simultaneously was low, so coarse type-level cues were sufficient for reliably supporting ground-truth assignment. Accordingly, all crops were downsampled to a maximum of 256 pixels on the longest side, which preserved the attributes needed for reliable scoring while reducing bandwidth, latency, and API cost.
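A minimal version of the scoring query looks like the following, using the standard OpenAI chat-completions API with base64-encoded crops downsampled to 256 px on the longest side. The prompt wording and the strict float parsing are illustrative assumptions rather than the exact production query.

```python
# Pairwise same-vehicle-type scoring sketch: downsample both crops, base64-encode
# them, and ask GPT-4o for a single probability. Prompt text is an assumption.
import base64
import cv2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_crop(img, max_side: int = 256) -> str:
    scale = max_side / max(img.shape[:2])
    if scale < 1.0:
        img = cv2.resize(img, (int(img.shape[1] * scale), int(img.shape[0] * scale)))
    ok, buf = cv2.imencode(".jpg", img)
    return base64.b64encode(buf.tobytes()).decode()

def same_vehicle_probability(roadside_crop, aerial_crop) -> float:
    content = [{"type": "text",
                "text": "Return only a probability in [0,1] that these two images "
                        "show the same type of vehicle (make, body style, color)."}]
    for img in (roadside_crop, aerial_crop):
        content.append({"type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{encode_crop(img)}"}})
    resp = client.chat.completions.create(model="gpt-4o",
                                          messages=[{"role": "user", "content": content}])
    # A production version would parse or constrain the reply more defensively.
    return float(resp.choices[0].message.content.strip())
```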
Although GPT-4o proved effective for establishing matches between roadside images and a limited number of candidate vehicles appearing in coincident UAV frames, this approach does not scale to the broader problem of vehicle re-ID. In our setting, the task involved comparing a single reference image against only a handful of candidates, and the GPT-4o large language model (LLM) demonstrated strong accuracy even when viewpoints differed substantially. However, re-ID at scale requires matching a query image against hundreds or thousands of reference vehicles captured under diverse perspectives. Current LLMs are not suited for this task for the following reasons:
Lack of offline availability for frontier models, raising operational security concerns;
Non-negligible response latency;
Nondeterministic outputs, reducing reproducibility;
Black-box operation, with no transparent assurance of which representations are being applied;
Limited control of embedding strategies, with no guarantee that API instructions are faithfully followed; and
Susceptibility to hallucinations when asked to introspect on internal processes.
These limitations formed a central rationale for undertaking this project. Even in the context of rapidly evolving LLM capabilities, we recognized the need for a dedicated vehicle re-ID model that could deliver reproducibility, scalability, and transparent control. This dedicated approach ensures that the system addresses core research needs while avoiding the operational and methodological constraints of relying solely on vision-enabled LLMs.
All processing was executed locally, ensuring both reproducibility and operational security. In addition to generating multi-perspective records for approximately 300 vehicles, an incidental benefit from the project was a demonstration that the pipeline above can reliably automate the process of synchronizing roadside and UAV imagery. The resulting labeled dataset spans a wide range of vehicle types, including sedans, SUVs, fleet vehicles, vans, box trucks, and trucks with specialized mounted equipment. It integrates annotated roadside images, aerial captures, and structured metadata into a resource that would be infeasible to assemble manually. This work establishes a framework for scalable, reproducible dataset generation and highlights best practices for multi-perspective data collection in support of vehicle re-ID research and related applications.
4. Experimental Setup
A key factor that limits the performance of any vehicle re-ID model is inadequate training data. Obtaining enough data is essential for learning both inter- and intra-class variability. Most publicly available datasets consist of non-overlapping viewpoints or poses with either a limited number of classes or few instances per class. Two nearly identical vehicles may differ only in subtle aspects of their visual appearance (i.e., inter-class similarity). Conversely, the same vehicle can appear different under varying conditions such as environment, camera settings, roadside position, illumination, image resolution, and viewing angle. Another challenge for vehicle re-ID approaches lies in data annotation, as building a strong supervised model requires sufficient labeled data. Yet, manually labeling a large dataset is often prohibitively costly. Moreover, having too few images per vehicle or class makes it difficult for a model to learn the intra-class variability. Our goal is to generate a sufficient and complementary dataset that accurately reflects real-world surveillance conditions.
In this work, we used UAV platforms to collect multi-view vehicle imagery on the ORNL campus. Multi-view vehicular data were acquired using the Parrot Anafi drone, a compact, low-cost commercial UAV with a 26–78 mm focal-length camera capable of capturing 21 MP still images and 4K UHD (3840 × 2160) video at 30 fps. Owing to its small size, affordability, and ease of operation, the Parrot Anafi is well suited for capturing imagery from diverse viewing angles for off-board analysis. The platform, illustrated in Figure 2, offers a flight endurance of approximately 25 min, facilitating efficient multi-view data acquisition.
This study investigates vehicle re-ID from a new perspective, adopting a vision-based approach that seeks to extend the limits of vehicle re-ID across disparate sensor views and provide a more robust capability for tracking and pattern analysis. We introduce a deep multi-view vehicle re-ID framework that can match pairs of vehicles based on their visual cross-view appearance. The proposed method leverages recent advances in deep learning, particularly convolutional neural networks, to learn features of vehicle imagery captured by UAVs from disjoint viewpoints.
To match pairs of features corresponding to a pair of vehicles, we trained a bank of Siamese networks. Siamese networks [22] are a class of neural networks comprising two or more identical sub-networks. These sub-networks share identical parameters and weights, and the weights are updated in the same manner during training. The purpose of Siamese networks is to measure the resemblance between inputs by comparing their feature representations. A key benefit of a Siamese model is that it does not require retraining or modification when a new object class is introduced or an existing one is removed from the dataset. In addition, Siamese networks handle class imbalance effectively, as a small number of images per class can be sufficient for training. In our recent paper [9], we introduced a decision fusion framework using features extracted from vehicle images and their wheel patterns. We designed Siamese networks to capture distinctive feature representations from pairs of vehicle images, and we investigated the level of dependency among the features derived from side-view vehicle images to effectively merge multiple similarity measures and yield a more accurate matching score between two vehicles. The objective of this work is to expand that matching concept to re-identify vehicles across a wide variety of viewing angles and distances, such as those acquired from satellite imagery, UAVs, surveillance cameras, and ground sensors.
Figure 3 shows sample UAV multi-view vehicle images captured at ORNL, illustrating the diversity of perspectives in our dataset.
Network Structure
Building on the Siamese matching approach described above, this subsection details the network architecture used to learn and compare vehicle image features. A deep network was trained to compare pairs of multi-view vehicle images and produce a matching score representing the distance between their feature embeddings. The matching score indicates whether the pair of images belongs to the same vehicle and ranges from 0 to 1, where lower values indicate greater similarity. Re-identification accuracy was evaluated using a threshold value of 0.5, with any score below this threshold considered a positive match.
As noted earlier in this work, our framework adopts a Siamese architecture. The multi-view matching network consists of seven layers: five convolutional layers followed by two dense layers. The first convolutional layer uses 64 filters of size 10 × 10, the second uses 64 filters of size 7 × 7, the third uses 64 filters of size 4 × 4, the fourth uses 64 filters of size 3 × 3, and the fifth uses 64 filters of size 2 × 2. Each convolutional layer is followed by a max-pooling operation and batch normalization. The output of the fifth convolutional layer is flattened and passed to two consecutive dense layers, each containing 400 neurons with a sigmoid activation function. The outputs of the second dense layer from both branches are subtracted and passed to a final single-neuron dense layer that computes the similarity score. The structure of each branch is listed in Table 2.
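To make the layer listing concrete, the following Keras sketch shows one way to realize a single branch and the pairwise head described above. The padding choice, the 2 × 2 pooling windows, and the use of a signed subtraction before the final neuron are assumptions where the text does not pin them down; the exact production model may differ.

```python
# Sketch of one interpretation of the Siamese branch and pairwise head; pooling
# size, padding, and the subtraction merge are assumptions.
from tensorflow.keras import layers, Model

def build_branch(input_shape=(234, 234, 3)) -> Model:
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters, k in [(64, 10), (64, 7), (64, 4), (64, 3), (64, 2)]:
        x = layers.Conv2D(filters, k, activation="relu")(x)
        x = layers.MaxPooling2D(pool_size=2)(x)        # assumed 2x2 pooling
        x = layers.BatchNormalization()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(400, activation="sigmoid")(x)
    x = layers.Dense(400, activation="sigmoid")(x)
    return Model(inp, x, name="branch")

def build_siamese(input_shape=(234, 234, 3)) -> Model:
    branch = build_branch(input_shape)                 # weights shared across both inputs
    img_a = layers.Input(shape=input_shape)
    img_b = layers.Input(shape=input_shape)
    diff = layers.Subtract()([branch(img_a), branch(img_b)])
    score = layers.Dense(1, activation="sigmoid")(diff)  # matching score in [0, 1]
    return Model([img_a, img_b], score, name="multiview_siamese")
```

Under these assumptions, the fifth convolutional block leaves a 5 × 5 × 64 feature map, which the dense layers reduce to a 400-dimensional embedding per branch.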
The architecture of the network is illustrated in Figure 4. For training and validation, we used all datasets described in Section 3, including PRIMAVERA, VeRi-776, VERI-Wild, PKU VehicleID, VRAI, UAV-VeID, and our synthetic data. These datasets cover a wide range of viewing conditions, sensor types, and vehicle appearances, allowing the model to learn from diverse examples. By combining both real and synthetic imagery captured from ground cameras, surveillance systems, and UAV platforms, we ensured that the network was exposed to variations in perspective, resolution, lighting, and occlusion. This diversity was essential for developing a model capable of generalizing across different environments and re-identifying vehicles under uncontrolled and challenging scenarios. We randomly sampled image pairs representing either the same vehicle or different vehicles from the datasets. Aside from resizing all images to 234 × 234 × 3 while preserving the aspect ratio, no additional preprocessing was applied. This minimal preprocessing allowed the model to learn to compare vehicle appearances across disparate viewpoints and distances without relying on prior knowledge of the camera perspective or acquisition conditions.
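The pair construction and minimal preprocessing can be summarized with the short sketch below. The letterbox padding used to reach 234 × 234 while preserving aspect ratio, the label convention (0 for same vehicle), and the images_by_id structure are illustrative assumptions rather than details specified above.

```python
# Pair sampling and minimal preprocessing sketch; letterboxing and the label
# convention are assumptions.
import random
import numpy as np
import cv2

TARGET = 234

def letterbox(img: np.ndarray) -> np.ndarray:
    """Resize so the longest side is 234 px and pad to a square canvas."""
    scale = TARGET / max(img.shape[:2])
    resized = cv2.resize(img, (int(img.shape[1] * scale), int(img.shape[0] * scale)))
    canvas = np.zeros((TARGET, TARGET, 3), dtype=resized.dtype)
    y0 = (TARGET - resized.shape[0]) // 2
    x0 = (TARGET - resized.shape[1]) // 2
    canvas[y0:y0 + resized.shape[0], x0:x0 + resized.shape[1]] = resized
    return canvas

def sample_pair(images_by_id: dict, positive: bool):
    """images_by_id maps a vehicle identity to a list of image file paths."""
    ids = list(images_by_id)
    if positive:
        vid = random.choice([v for v in ids if len(images_by_id[v]) >= 2])
        a, b = random.sample(images_by_id[vid], 2)
    else:
        va, vb = random.sample(ids, 2)
        a, b = random.choice(images_by_id[va]), random.choice(images_by_id[vb])
    label = 0.0 if positive else 1.0   # assumed convention: low score = same vehicle
    return letterbox(cv2.imread(a)), letterbox(cv2.imread(b)), label
```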
5. Results
This section presents the performance of the vehicular re-ID network. To ensure consistency across experiments, we used a fixed validation set randomly selected from data not included in training. We randomly selected pairs from all datasets for each training epoch, ensuring an equal number of positive and negative vehicle pairs in each batch. The multi-view network was trained using the Adam optimizer [23] with a learning rate of 0.005. A binary cross-entropy loss function was applied with a batch size of 100, corresponding to 100 vehicle pairs. Our model was trained for 90,000 epochs, and its performance was assessed on a fixed validation set of 10,000 varied vehicle pairs every 200 epochs. Training accuracy was measured by averaging results across 100 batches. Matching accuracy was evaluated with a threshold of 0.5 on the similarity scores, where scores below this threshold were considered true matches. Model accuracy was computed as the fraction of correct predictions.
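As a rough illustration of this configuration, the sketch below wires the architecture and pair sampler sketched earlier to the stated optimizer, loss, and batch size. The names train_images_by_id, val_pairs, and val_labels are placeholders, and treating the 90,000 "epochs" as per-batch updates is our reading of the procedure rather than a confirmed detail.

```python
# Training-configuration sketch using the build_siamese and sample_pair sketches
# above; dataset variables are placeholders.
import numpy as np
import tensorflow as tf

model = build_siamese()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
              loss="binary_crossentropy",
              metrics=["accuracy"])

def make_batch(images_by_id, batch_size=100):
    """Half positive, half negative pairs, matching the balanced batches described above."""
    pairs_a, pairs_b, labels = [], [], []
    for i in range(batch_size):
        a, b, y = sample_pair(images_by_id, positive=(i % 2 == 0))
        pairs_a.append(a); pairs_b.append(b); labels.append(y)
    return [np.stack(pairs_a), np.stack(pairs_b)], np.array(labels)

for step in range(90_000):                 # interpreted here as per-batch updates
    x, y = make_batch(train_images_by_id)
    model.train_on_batch(x, y)
    if step % 200 == 0:                    # periodic check on the fixed validation pairs
        scores = model.predict(val_pairs, verbose=0).ravel()
        val_acc = np.mean((scores < 0.5) == (val_labels == 0.0))
        print(f"step {step}: validation accuracy {val_acc:.3f}")
```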
Table 3 illustrates the peak accuracy reached in both training and validation. After 79,000 epochs, we saved the vehicle-image pairs that were incorrectly classified in a separate folder and added them to the retraining set used to further fine-tune the network.
Figure 5 depicts vehicle pairs that were correctly matched, and Figure 6 shows pairs that were initially mismatched.
The performance of the retrained model was evaluated using the same validation set employed in the previous step. Results after retraining are shown in Table 3. These results indicate that retraining the network significantly enhances its robustness when handling challenging or visually similar vehicle pairs.
To evaluate the model’s performance under varying thresholds, we calculated the true positive rate (TPR) and false positive rate (FPR) at threshold values ranging from 0 to 1 with increments of 0.1, using a test set consisting of 10,000 vehicle image pairs.
Figure 7 presents the ROC curve of the retrained model across the different thresholds applied to the distance metric. The ROC curve demonstrates that the model performs well overall, and it indicates that a threshold of 0.5 provides the best balance between true positives and false positives.
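For reference, the TPR/FPR sweep described above amounts to the following computation over the pairwise scores, assuming the same convention that scores below the threshold are predicted matches.

```python
# TPR/FPR sweep behind the ROC curve: thresholds from 0 to 1 in steps of 0.1.
import numpy as np

def roc_points(scores: np.ndarray, same_vehicle: np.ndarray):
    """scores: model outputs in [0,1]; same_vehicle: boolean ground truth per pair."""
    points = []
    for thr in np.arange(0.0, 1.01, 0.1):
        pred_match = scores < thr
        tp = np.sum(pred_match & same_vehicle)
        fp = np.sum(pred_match & ~same_vehicle)
        fn = np.sum(~pred_match & same_vehicle)
        tn = np.sum(~pred_match & ~same_vehicle)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.append((thr, tpr, fpr))
    return points
```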
To complement the quantitative results, the following figures present qualitative examples illustrating the model’s matching performance.
Figure 8 illustrates correctly matched pairs of the same vehicles, with each subfigure showing example pairs and their corresponding distance measures.
Figure 8b,c highlight cases in which the model successfully re-identified vehicles captured under different illumination conditions.
Figure 9 depicts correctly identified non-matching pairs, that is, different vehicles that were accurately distinguished by the model, along with their distance metrics. These examples demonstrate the model’s ability to discriminate between visually similar vehicles of the same make, model, and color, as illustrated in Figure 9b–d.
Figure 10 presents examples of false negatives, in which the same vehicles were incorrectly classified as different. These cases often result from improper vehicle detection, such as partial cropping, which leads to geometric inconsistency between the images (Figure 10a,c). Severe illumination differences (Figure 10d) or added decorations and loads at different collection times (Figure 10b) can also contribute to misclassification.
Figure 11 shows examples of false positives, where different vehicles were mistakenly matched as the same. These errors are often caused by poor lighting conditions (Figure 11a,b) or by vehicles that share nearly identical visual characteristics such as make, model, and color (Figure 11c,d).
6. Conclusions
This study introduced a deep learning model for vehicle re-identification across a wide range of viewing angles and distances, including perspectives obtainable from satellite imagery, UAVs, surveillance cameras, and ground sensors. To enhance overall performance, we augmented the training set with synthetic data, improving re-identification accuracy beyond what was achievable using only real imagery. The model was trained using a combination of UAV and ground-based camera data, with UAV imagery used to simulate diverse sensor vantage points for training and validation. Results demonstrate that the proposed approach can robustly re-identify vehicles across uncontrolled viewpoints and varying illumination conditions.
Although large multimodal models such as GPT-4o showed strong accuracy in limited cross-view matching tasks, they remain unsuitable for vehicle re-identification at scale due to issues of reproducibility, latency, transparency, and controllability. These limitations motivated the development of our dedicated re-ID framework, which provides a scalable, reproducible, and transparent alternative for operational and research applications.
Author Contributions
Conceptualization, S.G., J.H.H.II and R.A.K.; methodology, S.G., J.H.H.II and R.A.K.; software, S.G. and J.H.H.II; validation, S.G.; formal analysis, S.G.; investigation, S.G.; resources, S.G., J.H.H.II and R.A.K.; data curation, S.G. and J.H.H.II; writing—original draft preparation, S.G.; writing—review and editing, S.G., J.H.H.II and R.A.K.; visualization, S.G. and J.H.H.II; supervision, S.G., J.H.H.II and R.A.K.; project administration, S.G.; funding acquisition, S.G. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U. S. Department of Energy. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (
https://www.energy.gov/doe-public-access-plan (accessed on 1 December 2025)).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Acknowledgments
The authors would like to express their sincere gratitude to Andrew Duncan, Jairus Hines, and Zach Ryan for their invaluable assistance with drone data collection and their expert help in the field.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Moranduzzo, T.; Melgani, F. Detecting cars in UAV images with a catalog-based approach. IEEE Trans. Geosci. Remote Sens. 2014, 52, 6356–6367. [Google Scholar] [CrossRef]
- Zhao, T.; Nevatia, R. Car detection in low resolution aerial images. Image Vis. Comput. 2003, 21, 693–703. [Google Scholar] [CrossRef]
- Shao, W.; Yang, W.; Liu, G.; Liu, J. Car detection from high-resolution aerial imagery using multiple features. In Proceedings of the 2012 IEEE International Geoscience and Remote Sensing Symposium, Munich, Germany, 22–27 July 2012; pp. 4379–4382. [Google Scholar]
- Wang, H.; Sun, S.; Zhou, L.; Guo, L.; Min, X.; Li, C. Local feature-aware siamese matching model for vehicle re-identification. Appl. Sci. 2020, 10, 2474. [Google Scholar] [CrossRef]
- He, B.; Li, J.; Zhao, Y.; Tian, Y. Part-regularized near-duplicate vehicle re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3997–4005. [Google Scholar]
- Shen, Y.; Xiao, T.; Li, H.; Yi, S.; Wang, X. Learning deep neural networks for vehicle re-id with visual-spatio-temporal path proposals. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1900–1909. [Google Scholar]
- Wang, Z.; Tang, L.; Liu, X.; Yao, Z.; Yi, S.; Shao, J.; Yan, J.; Wang, S.; Li, H.; Wang, X. Orientation invariant feature embedding and spatial temporal regularization for vehicle re-identification. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 379–387. [Google Scholar]
- de Oliveira, I.O.; Fonseca, K.V.; Minetto, R. A two-stream siamese neural network for vehicle re-identification by using non-overlapping cameras. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 669–673. [Google Scholar]
- Ghanem, S.; Kerekes, R.A.; Tokola, R. Decision-based fusion for vehicle matching. Sensors 2022, 22, 2803. [Google Scholar] [CrossRef] [PubMed]
- Prokaj, J.; Medioni, G. 3-D model based vehicle recognition. In Proceedings of the 2009 Workshop on Applications of Computer Vision (WACV), Snowbird, UT, USA, 7–8 December 2009; pp. 1–7. [Google Scholar]
- Zhou, Y.; Liu, L.; Shao, L. Vehicle re-identification by deep hidden multi-view inference. IEEE Trans. Image Process. 2018, 27, 3275–3287. [Google Scholar] [CrossRef] [PubMed]
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
- Lou, Y.; Bai, Y.; Liu, J.; Wang, S.; Duan, L.Y. Embedding adversarial learning for vehicle re-identification. IEEE Trans. Image Process. 2019, 28, 3794–3807. [Google Scholar] [CrossRef] [PubMed]
- Zhou, Y.; Shao, L. Cross-view GAN based vehicle generation for re-identification. In Proceedings of the BMVC, London, UK, 4–7 September 2017; Volume 1, pp. 1–12. [Google Scholar]
- Liu, X.; Liu, W.; Ma, H.; Fu, H. Large-scale vehicle re-identification in urban surveillance videos. In Proceedings of the 2016 IEEE International Conference on Multimedia and Expo (ICME), Seattle, WA, USA, 11–15 July 2016; pp. 1–6. [Google Scholar]
- Liu, H.; Tian, Y.; Yang, Y.; Pang, L.; Huang, T. Deep relative distance learning: Tell the difference between similar vehicles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2167–2175. [Google Scholar]
- Lou, Y.; Bai, Y.; Liu, J.; Wang, S.; Duan, L. Veri-wild: A large dataset and a new method for vehicle re-identification in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 3235–3243. [Google Scholar]
- Jiao, B.; Yang, L.; Gao, L.; Wang, P.; Zhang, S.; Zhang, Y. Vehicle Re-Identification in Aerial Images and Videos: Dataset and Approach. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 1586–1603. [Google Scholar] [CrossRef]
- Teng, S.; Zhang, S.; Huang, Q.; Sebe, N. Viewpoint and scale consistency reinforcement for UAV vehicle re-identification. Int. J. Comput. Vis. 2021, 129, 719–735. [Google Scholar] [CrossRef]
- Kerekes, R. Profile Images and Annotations for Vehicle Re-Identification Algorithms (PRIMAVERA); Technical Report; Oak Ridge National Lab. (ORNL): Oak Ridge, TN, USA, 2022.
- Ghanem, S.; Kerekes, R.A. Robust wheel detection for vehicle re-Identification. Sensors 2022, 23, 393. [Google Scholar] [CrossRef] [PubMed]
- Chicco, D. Siamese neural networks: An overview. In Artificial Neural Networks; Springer: Berlin, Germany, 2021; pp. 73–94. [Google Scholar]
- Kingma, D.P. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]