Recent Advances in Multi-Camera Computer Vision for Industry 4.0 and Smart Cities: A Systematic Review

Fierro-Silva, Carlos Julio; Del-Valle-Soto, Carolina; Mostafa, Samih M.; Varela-Aldás, José

doi:10.3390/a19040249

Open Access Systematic Review

by Carlos Julio Fierro-Silva ¹, Carolina Del-Valle-Soto ², Samih M. Mostafa ³ and José Varela-Aldás ¹,*

¹ Centro de Investigación MIST, Facultad de Ingenierías, Universidad Tecnológica Indoamérica, Ambato 180103, Ecuador
² Facultad de Ingeniería, Universidad Panamericana, Zapopan 45010, Mexico
³ Computer Science Department, Faculty of Computers and Information, South Valley University, Qena 83523, Egypt
* Author to whom correspondence should be addressed.

Algorithms 2026, 19(4), 249; https://doi.org/10.3390/a19040249

Submission received: 20 February 2026 / Revised: 20 March 2026 / Accepted: 22 March 2026 / Published: 25 March 2026

(This article belongs to the Special Issue Algorithmic Innovations: Bridging Theoretical Foundations and Practical Applications (2nd Edition))

Abstract

The rapid deployment of surveillance cameras in urban, industrial, and domestic environments has intensified the need for intelligent systems capable of analyzing video streams beyond the limitations of single-camera setups. Unlike traditional single-camera approaches, multi-camera systems expand spatial coverage, reduce blind spots, and enable consistent tracking of people and objects across non-overlapping views, thereby improving robustness against occlusions and viewpoint changes. This article presents a comprehensive review of multi-camera vision systems published between 2020 and 2025, covering application domains including public security and biometrics, intelligent transportation, smart cities and IoT, healthcare monitoring, precision agriculture, industry and robotics, pan–tilt–zoom (PTZ) camera networks, and emerging areas such as retail and forensic analysis. The review synthesizes predominant technical approaches, including deep-learning-based detection, multi-target multi-camera tracking (MTMCT), re-identification (Re-ID), spatiotemporal fusion, and edge computing architectures. Persistent challenges are identified, particularly in inter-camera data association, scalability, computational efficiency, privacy preservation, and dataset availability. Emerging trends such as distributed edge AI, cooperative camera networks, and active perception are discussed to outline future research directions toward scalable, privacy-aware, and intelligent multi-camera infrastructures.

Keywords: multi-camera; multi-view; surveillance; re-identification; edge computing; computer vision

1. Introduction

In recent years, a massive deployment of surveillance cameras has been observed in urban, industrial, and domestic environments, driving the demand for intelligent systems capable of automatically analyzing video streams from multiple cameras. Unlike single-camera scenarios, multi-camera systems expand spatial coverage, reduce blind spots, and coordinate multiple views, enabling continuous tracking of people or objects across different locations with minimal human intervention [1,2]. In public security, these networks support large-area surveillance and the recognition of critical events, including anomalous behaviors [3,4]. In intelligent transportation, distributed cameras enable vehicle tracking, speed enforcement, and traffic analysis in real-world scenarios [5,6]. In healthcare, multi-camera systems have been proposed for fall detection and activity recognition [7,8,9]. Furthermore, recent applications include industry and mining (e.g., mine or process monitoring), smart cities and IoT, as well as precision agriculture for animal behavior monitoring [10,11,12].

Figure 1 illustrates the conceptual difference between single-camera and multi-camera detection scenarios. In a single-camera setup, object detection and tracking rely exclusively on one viewpoint, which may be affected by occlusions, a limited field of view, or unfavorable perspectives. In contrast, a multi-camera configuration provides complementary viewpoints of the same scene, improving detection robustness, spatial consistency, and identity preservation across perspectives. The availability of multiple synchronized views reduces ambiguity and enhances the reliability of object localization and tracking, particularly in dynamic and crowded environments.

Despite their advantages, multi-camera systems present significant technical challenges. A central issue is robust multi-target multi-camera tracking while preserving identity, particularly in non-overlapping camera networks with illumination variations, viewpoint changes, and occlusions, which increase the complexity of inter-camera association compared to single-camera scenarios [13,14,15]. Additionally, the massive volume of data generated by camera networks demands efficient computational architectures for real-time processing and resource allocation [16,17,18]. In this context, edge computing has become essential to reduce latency, bandwidth consumption, and exposure of sensitive data, enabling deployments in which analysis occurs close to the capture point [19,20]. Moreover, integration issues with IoT and device–object pairing in multi-camera environments introduce additional challenges related to interoperability and consistency [21].

Recent advances in computer vision and deep learning have driven more sophisticated solutions for multi-camera systems. In particular, modern approaches integrate detection, tracking, and re-identification (Re-ID), including training strategies that improve cross-camera generalization and lightweight variants for edge deployment [22,23,24]. In biometrics, multi-camera approaches have been explored for pose-robust face recognition and gait analysis, broadening the identification spectrum in surveillance contexts [25,26,27]. At the system level, the convergence of IoT and AI has fostered distributed and collaborative architectures, including coordination and trust mechanisms for multi-camera tracking in edge computing environments [28,29,30].

These capabilities build on broader progress in computer vision and deep learning. The evolution of object detection techniques, as comprehensively analyzed in recent surveys, has provided a strong foundation for accurate recognition and localization tasks in dynamic environments [31]. Emerging paradigms based on transformer architectures have introduced new opportunities for modeling long-range dependencies and multi-view representations, improving cross-camera understanding and scene interpretation [32]. In parallel, edge computing has become a key enabler for real-time multi-camera analytics, reducing latency and bandwidth consumption while allowing distributed processing closer to data sources [33]. These advances are particularly relevant in large-scale surveillance and smart city applications, where distributed intelligent systems must efficiently coordinate multiple data streams [34]. Additionally, recent research in multi-target multi-camera tracking and re-identification has demonstrated the importance of robust feature representation and cross-view consistency to ensure reliable tracking across non-overlapping camera networks [35].

Although there are reviews focused on subproblems, such as multi-target multi-camera tracking [13,36] or anomaly detection in video surveillance [3], a comprehensive perspective consolidating applications and emerging trends in multi-camera systems during the 2020–2025 period is still needed. Therefore, this article presents a comprehensive review of the recent literature, organized by application domains (security and biometrics, intelligent transportation, smart cities/IoT, healthcare, agriculture/environment, industry/robotics, mobile/PTZ cameras, and other emerging areas), with the aim of identifying predominant techniques, significant advances, and knowledge gaps to guide future research [37].

In line with this perspective, this review is not limited to Industry 4.0 and cyber–physical production environments. Instead, multi-camera computer vision systems are examined as a transversal technology that enables a wide range of intelligent infrastructures, including smart cities, transportation systems, healthcare monitoring, environmental analysis, and retail analytics. Consequently, the focus is placed on the technological evolution of these systems across multiple domains rather than on purely industrial applications.

2. Methodology

This systematic review was conducted following the PRISMA 2020 framework guidelines to ensure transparency, traceability, and reproducibility in the identification, selection, and synthesis of the scientific literature [38]. The methodological process was designed to structure the analysis in a manner comparable to recent reviews on multi-camera systems and computer vision [13,36,37].

2.1. Search Strategy

In line with the PRISMA 2020 guidelines (see Supplementary Materials), literature searches were conducted in major scientific databases relevant to engineering and computer vision research, including IEEE Xplore, Scopus, and Web of Science. The review was not prospectively registered. The search terms consisted of combinations of keywords related to multi-camera systems and their primary application domains. Specifically, the search strategy was structured using Boolean operators as follows: (“multi camera” OR “multi-camera” OR “multiple cameras” OR “multi camera tracking” OR “multi view” OR “multi-view” OR “camera network”) AND (“surveillance” OR “smart city” OR “intelligent transportation”).
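For readers who wish to reuse the search string, it can be assembled programmatically. The following is a minimal sketch under stated assumptions: the term lists are copied from the search strategy described here, while the helper names `or_group` and `build_query` are hypothetical and do not correspond to any database API. Field restrictions and database-specific syntax are not modeled.

```python
# Term groups copied from the search strategy described in the text.
CAMERA_TERMS = [
    "multi camera", "multi-camera", "multiple cameras",
    "multi camera tracking", "multi view", "multi-view", "camera network",
]
DOMAIN_TERMS = ["surveillance", "smart city", "intelligent transportation"]


def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*groups):
    """Join OR-groups with AND (hypothetical helper, not a database API)."""
    return " AND ".join(or_group(g) for g in groups)


query = build_query(CAMERA_TERMS, DOMAIN_TERMS)
print(query)
```

Keeping the term groups as plain lists makes it straightforward to adapt the same query to each database's own syntax.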

The search strategy was designed to identify relevant studies published between January 2020 and December 2025, and all databases were last searched in January 2026. Database-specific filters were applied to restrict the results to journal articles and conference papers within the subject areas of engineering, computer science, and automation. In addition, a backward snowballing procedure was conducted by examining the reference lists of selected studies to identify potentially relevant publications that may not have been retrieved during the initial database search, which is a common practice in technical reviews in this field [3].

2.2. Selection Process

All retrieved references were managed using a reference management tool to remove duplicates. Subsequently, a two-phase screening process was applied: (i) title and abstract evaluation to exclude studies outside the scope, and (ii) full-text review to verify compliance with the inclusion criteria.
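The deduplication and two-phase screening described above can be illustrated with a short sketch. It is not the tool used in this review: the record fields (`doi`, `title`, `cameras`) and the two predicate functions are assumptions made for the example, and in practice the full-text phase relies on human judgment against the inclusion criteria.

```python
import re


def normalize_title(title):
    """Lowercase and collapse non-alphanumerics so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each DOI (or normalized title if no DOI)."""
    seen, unique = set(), []
    for rec in records:
        key = (rec.get("doi") or "").lower() or normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def screen(records, passes_abstract, passes_full_text):
    """Phase (i): title/abstract check; phase (ii): full-text check."""
    phase_one = [r for r in records if passes_abstract(r)]
    return [r for r in phase_one if passes_full_text(r)]


# Toy records (invented for illustration only).
records = [
    {"doi": "10.1/a", "title": "Multi-Camera Tracking", "cameras": 4},
    {"doi": "10.1/A", "title": "Multi-camera tracking", "cameras": 4},
    {"doi": None, "title": "Single-Camera Detection", "cameras": 1},
]
unique = deduplicate(records)
included = screen(
    unique,
    passes_abstract=lambda r: "camera" in r["title"].lower(),
    passes_full_text=lambda r: r["cameras"] >= 2,  # minimum of two cameras
)
```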

After systematic filtering and evaluation, a total of 93 studies that met all established criteria were included (Figure 2). These studies were organized into a structured matrix documenting the application domain, main task, multi-camera configuration, key technologies, and central contribution, enabling a coherent comparative analysis across domains.

Some studies addressed multiple application domains (e.g., intelligent transportation and smart cities). In such cases, classification was performed according to the primary application context emphasized by the authors. When the contribution was clearly multidisciplinary, the study was assigned to the domain where the multi-camera system played the most central functional role in the proposed architecture.

2.3. Inclusion Criteria

Studies were considered if they:

Were peer-reviewed and published between 2020 and 2025.
Explicitly addressed multi-camera systems or camera networks (minimum of two cameras).
Proposed technical advances in detection, tracking, re-identification, data fusion, or edge/cloud architectures.
Applied multi-camera configurations in domains such as surveillance, transportation, healthcare, agriculture, industry, or smart cities.

2.4. Exclusion Criteria

Studies were excluded if they:

Focused exclusively on single-camera scenarios.
Did not present relevant technical contributions or experimental validation.

2.5. Research Questions

The review was structured around the following questions:

What are the main application domains of multi-camera systems during 2020–2025?
What techniques and technologies are used to address the specific challenges of these systems?
What advances and limitations have been recently reported?
What research gaps remain, and what future directions are proposed?

These questions guided the systematic extraction of information and the organization of results by application domain, enabling a cross-domain analysis of the state of the art.

3. Results

3.1. Overview of Included Studies

After applying the methodological procedure described above, a total of 93 studies published between 2020 and 2025 were ultimately selected, covering a broad spectrum of domains and applications for multi-camera systems. The findings are organized into thematic subsections according to the predominant application domain, considering that some studies may belong to more than one category.

The most represented domains include public security and biometric surveillance, intelligent transportation and traffic, smart cities and IoT, healthcare and monitoring of vulnerable individuals, precision agriculture and environmental monitoring, industry and robotics, as well as emerging applications involving mobile cameras, drones, and pan–tilt–zoom (PTZ) systems (see Table 1).

It is important to note that the domain classification presented in Table 1 is not mutually exclusive. Several studies address multiple application contexts simultaneously due to the transversal nature of multi-camera vision technologies. Based on our analysis, approximately one third of the reviewed studies contribute to more than one domain.

The most common overlaps occur between Public Security and Intelligent Transportation, as well as between Smart Cities and IoT-based urban monitoring applications. These intersections largely reflect the general applicability of core technologies such as multi-target multi-camera tracking (MTMCT), cross-camera re-identification, and distributed video analytics across different operational environments.

3.2. Public Security and Biometric Surveillance

Multi-camera surveillance oriented toward public security constitutes one of the most consolidated and technically mature domains in the recent literature. These systems are designed not only to prevent crime and detect suspicious behavior but also to maintain continuous and coherent tracking of individuals across heterogeneous urban environments such as city streets, transportation hubs, airports, campuses, and densely crowded public spaces. A central research task in this context is Multi-Target Multi-Camera Tracking (MTMCT), typically complemented by person re-identification (Re-ID) techniques to preserve an individual’s identity while moving across cameras with non-overlapping or partially overlapping fields of view [13,14]. Together, MTMCT and Re-ID enable long-term trajectory reconstruction, cross-camera identity consistency, and forensic-level traceability.

Early approaches focused on inter-camera association through collaborative probabilistic models and robust visual descriptors designed to encode color, texture, and geometric information [2,39]. While these methods introduced important mechanisms for cross-view matching, they were often sensitive to illumination changes, pose variation, and occlusions. Subsequently, the integration of deep learning significantly improved identity discrimination under challenging conditions, including illumination variability, pose changes, and partial occlusions [22,23]. Deep feature embeddings enabled end-to-end representation learning, strengthening robustness across viewpoints.

More recent models incorporate fine-grained spatiotemporal constraints, graph-based optimization, and distributed processing strategies to enhance global tracking consistency across extended camera networks [24,40,41]. These approaches move beyond appearance-based similarity and integrate motion dynamics, temporal coherence, and contextual reasoning, reflecting a shift toward holistic multi-view scene modeling.
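To make the combination of appearance similarity and spatiotemporal constraints concrete, the following sketch associates tracks leaving one camera with tracks entering another. It is a simplified illustration, not the method of any cited work: the greedy matching stands in for the graph-based optimization mentioned above, and the thresholds `min_sim` and `max_gap_s`, the track dictionary layout, and the toy embeddings are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def associate(tracks_a, tracks_b, min_sim=0.7, max_gap_s=60.0):
    """Greedy one-to-one matching of tracks exiting camera A to tracks entering camera B."""
    candidates = []
    for ta in tracks_a:
        for tb in tracks_b:
            gap = tb["t_enter"] - ta["t_exit"]
            if not 0.0 <= gap <= max_gap_s:
                continue  # spatiotemporal gate: implausible transit time
            sim = cosine_similarity(ta["embedding"], tb["embedding"])
            if sim >= min_sim:
                candidates.append((sim, ta["id"], tb["id"]))
    matches, used_a, used_b = {}, set(), set()
    # Accept the most similar pairs first, each track used at most once.
    for sim, ida, idb in sorted(candidates, key=lambda c: c[0], reverse=True):
        if ida not in used_a and idb not in used_b:
            matches[ida] = idb
            used_a.add(ida)
            used_b.add(idb)
    return matches


# Toy two-dimensional embeddings (real Re-ID features have hundreds of dimensions).
cam_a = [
    {"id": "a1", "embedding": [1.0, 0.0], "t_exit": 10.0},
    {"id": "a2", "embedding": [0.0, 1.0], "t_exit": 12.0},
]
cam_b = [
    {"id": "b1", "embedding": [0.1, 0.9], "t_enter": 20.0},
    {"id": "b2", "embedding": [0.9, 0.1], "t_enter": 25.0},
    {"id": "b3", "embedding": [1.0, 0.0], "t_enter": 500.0},
]
matches = associate(cam_a, cam_b)
```

In this toy case, track `b3` has an identical appearance to `a1` but is rejected by the time gate, which is the kind of ambiguity that appearance-only matching cannot resolve.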

In real-world deployment contexts, architectural efficiency has become a decisive factor. Edge-computing-based architectures have been proposed to execute Re-ID and tracking modules directly on local gateways or AIoT devices, thereby reducing latency, bandwidth consumption, and exposure of sensitive data [16,19]. Likewise, distributed and collaborative approaches, including blockchain-based mechanisms and active perception strategies in mobile camera networks, have been introduced to improve scalability, resilience, and trust in complex urban environments [29,30].

At the architectural level, distributed tracking approaches integrating Re-ID modules in multi-camera scenarios have also been reported, along with practical implementations that combine modern detectors (e.g., YOLO variants) with tracking algorithms to build complete end-to-end pipelines [42,43,44]. These unified frameworks demonstrate operational feasibility and near-real-time performance in surveillance infrastructures.

Beyond traditional tracking, additional biometric techniques have been explored in multi-camera networks to reinforce identification reliability. Multi-camera face recognition has been employed to improve identification rates in scenarios characterized by partial views or uncontrolled environmental conditions [25,26]. In parallel, gait recognition has emerged as a non-intrusive biometric alternative capable of identifying individuals at a distance even when facial information is unavailable [27]. Studies highlight that viewing angle, clothing variability, and illumination significantly affect accuracy, motivating the deployment of multiple cameras to obtain more invariant and complementary representations.

Regarding anomaly and threat detection, the availability of multiple synchronized perspectives has been shown to improve robustness against occlusions and reduce false alarm rates. Recent reviews document the rapid growth of anomaly detection techniques in video surveillance, combining classical modeling approaches with deep spatiotemporal networks [3]. Multi-camera-specific proposals include weakly supervised frameworks such as MC-MIL [45] and deep spatiotemporal models for detecting anomalous behaviors in dense urban environments [46]. Additionally, the integration of violent activity recognition using optimized YOLO-based models in multi-camera configurations has been explored in educational environments [47].

Other works have addressed precise pedestrian detection and localization through multi-camera extrinsic calibration and three-dimensional reconstruction, strengthening spatial coherence across views and enabling metric-level consistency [48,49]. Hybrid approaches combining HOG descriptors and CNN architectures have also been developed to improve detection rates in non-overlapping camera networks [15].

Overall, the 2020–2025 literature demonstrates significant progress in multi-camera surveillance, primarily driven by deep learning, distributed architectures, and spatiotemporal fusion strategies. However, challenges remain related to scalability as the number of cameras increases substantially, the limited availability of labeled multi-camera datasets for anomaly detection, and the need to balance performance and privacy in large-scale biometric applications.

While convolutional neural networks (CNNs) remain the dominant backbone in most multi-camera vision systems due to their computational efficiency and strong performance in real-time detection tasks, recent research has begun exploring Transformer-based architectures such as Vision Transformers (ViT) and Swin Transformers. These models enable improved modeling of global contextual relationships and long-range feature dependencies across camera views, which can be particularly beneficial for tasks such as cross-camera tracking and person re-identification (Re-ID).

However, despite their promising capabilities, Transformer-based models generally require higher computational resources, which currently limits their adoption in real-time and edge-based multi-camera deployments. Consequently, lightweight CNN-based architectures continue to dominate practical implementations in surveillance and smart city environments.

3.3. Intelligent Transportation and Traffic

The domain of intelligent transportation systems (ITS) represents one of the most active application areas for multi-camera networks, as vehicle traffic management and road safety require continuous observation from multiple angles. In complex urban environments, including intersections, highways, tunnels, and parking facilities, a single camera is insufficient to cover the entire scene. Therefore, coordinated multi-camera configurations are deployed to enable continuous tracking of vehicles and pedestrians across different segments of the road network [50,51].

One of the most developed research lines is multi-camera multi-object tracking (MC-MOT) applied to vehicles. Several studies have incorporated deep-learning-based vehicle re-identification techniques to maintain vehicle identities across non-overlapping cameras [52,53]. In particular, approaches combine YOLO-type detectors with feature extraction networks (e.g., OSNet or attention-based variants) to strengthen inter-camera association and improve identity preservation under viewpoint changes [54].

The TIMS system (Traffic Informed Multi-Camera Sensing) was proposed to incorporate contextual information about vehicle flow to improve detection association between nearby cameras, thereby optimizing temporal tracking coherence [50]. Likewise, recent proposals have demonstrated real-time vehicle tracking in congested scenarios, such as drive-through environments, while maintaining identity despite prolonged occlusions [55]. These results confirm that multi-camera collaboration reduces ambiguities that would be difficult to resolve in single-camera configurations.

Regarding road safety and violation detection, multi-camera systems have been developed for average speed enforcement, dangerous driving detection, and anomalous trajectory analysis over extended highway segments [56,57]. The integration of multiple views enables the reconstruction of complete trajectories and the detection of events such as sudden braking, abrupt lane changes, or illegal maneuvers. Additionally, the use of heat maps and field-of-view overlap reasoning has improved the robustness of multi-vehicle tracking in complex environments [53].
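The principle behind average speed enforcement is simple arithmetic: once a vehicle has been re-identified at two cameras a known road distance apart, its mean speed is that distance divided by the elapsed time. The sketch below illustrates the calculation only. The function names, the tolerance parameter, and the example values are assumptions and do not describe any cited system.

```python
def average_speed_kmh(distance_m, t_first_s, t_second_s):
    """Mean speed between two camera sightings, converted from m/s to km/h."""
    elapsed = t_second_s - t_first_s
    if elapsed <= 0:
        raise ValueError("second timestamp must be later than the first")
    return distance_m * 3.6 / elapsed


def exceeds_limit(distance_m, t_first_s, t_second_s, limit_kmh, tolerance_kmh=0.0):
    """True when the mean speed is above the limit plus an optional tolerance."""
    speed = average_speed_kmh(distance_m, t_first_s, t_second_s)
    return speed > limit_kmh + tolerance_kmh


# Example: cameras 2000 m apart, vehicle seen 60 s apart (about 120 km/h).
speed = average_speed_kmh(2000.0, 0.0, 60.0)
violation = exceeds_limit(2000.0, 0.0, 60.0, limit_kmh=100.0, tolerance_kmh=5.0)
```

The accuracy of such an estimate depends on clock synchronization between cameras and on the correctness of the cross-camera identity match.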

Detection under adverse conditions has also been investigated. For example, methods have been proposed that fuse vehicle parts detected by different cameras to improve nighttime detection and reduce false alarms [58]. In high-speed scenarios, specific techniques have been introduced to compensate for motion blur using regression models and feature fusion prior to inter-camera association [59]. These solutions illustrate how the particular challenges of the ITS domain require specialized strategies in multi-camera environments.

Beyond individual tracking, multi-camera networks enable macro-level traffic analytics. Pedestrian re-identification across cameras has been used to estimate origin–destination (O–D) matrices in transportation infrastructures, allowing the inference of mobility patterns and dwell times [60]. Similarly, graph-based frameworks have been proposed that model the road network and multi-camera detections as dynamic graphs to predict congestion and optimize traffic management [61]. These approaches integrate visual sensing with predictive analytics, extending the scope of ITS beyond simple detection.
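As an illustration of how re-identified sightings can be turned into an origin–destination matrix, the following sketch counts, for each identity, the camera of its first and last sighting. It is a simplification under stated assumptions: the tuple layout and camera names are invented, identities are taken as already resolved by a Re-ID module, and the elapsed time between first and last sighting is used only as a rough proxy for time spent in the facility.

```python
from collections import Counter


def od_matrix(sightings):
    """Count origin-destination pairs from (identity, camera, timestamp_s) tuples.

    Origin is the camera of the earliest sighting, destination the camera of
    the latest one. Returns (pair counts, elapsed seconds per identity).
    """
    first, last = {}, {}
    for identity, camera, t in sightings:
        if identity not in first or t < first[identity][0]:
            first[identity] = (t, camera)
        if identity not in last or t > last[identity][0]:
            last[identity] = (t, camera)
    counts = Counter()
    elapsed = {}
    for identity in first:
        counts[(first[identity][1], last[identity][1])] += 1
        elapsed[identity] = last[identity][0] - first[identity][0]
    return counts, elapsed


# Toy sightings (invented for illustration).
sightings = [
    ("p1", "gate", 0), ("p1", "hall", 40), ("p1", "platform", 90),
    ("p2", "gate", 5), ("p2", "platform", 65),
    ("p3", "hall", 10),
]
counts, elapsed = od_matrix(sightings)
```

An identity seen by only one camera, such as `p3` here, yields a pair with identical origin and destination. Whether to keep or discard such entries is a design choice for the analyst.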

Another relevant application is intelligent parking management. Adaptive fusion of multiple cameras enables the estimation of parking occupancy with greater robustness to illumination variations and viewing angles [62]. These systems dynamically adjust the weight assigned to each camera according to environmental conditions, improving metrics such as IoU and reducing false positives in nighttime scenarios.
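One simple way to picture adaptive camera weighting is a weighted average of per-camera occupancy probabilities, where each weight reflects how trustworthy a view currently is. The sketch below is purely illustrative and is not the fusion rule of the cited system: the brightness-based weighting function and its thresholds are hypothetical stand-ins for whatever environmental cues a real deployment would use.

```python
def brightness_weight(mean_brightness, low=40.0, high=120.0):
    """Hypothetical weighting: trust a camera more when its image is brighter."""
    return min(1.0, max(0.0, (mean_brightness - low) / (high - low)))


def fuse_occupancy(probabilities, weights):
    """Weighted average of per-camera occupancy probabilities for one slot."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one camera must have positive weight")
    return sum(p * w for p, w in zip(probabilities, weights)) / total


# Two cameras watch the same slot; the second one is poorly lit.
probs = [0.9, 0.2]
weights = [brightness_weight(130.0), brightness_weight(50.0)]
fused = fuse_occupancy(probs, weights)
occupied = fused >= 0.5
```

With equal weights the two views would average to 0.55, whereas down-weighting the dark view moves the estimate toward the better-lit camera.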

Recent reviews emphasize that combining multiple cameras and complementary sensors is essential to achieving comprehensive coverage in traffic systems and autonomous vehicles [63]. Furthermore, cooperative edge–cloud architectures have been implemented to distribute the computational load of multi-camera vehicle tracking, reducing latency and bandwidth requirements [18,51]. Additionally, previous works have explored vehicle re-identification with tracking context in highways and multi-camera environments, reinforcing cross-view association and temporal consistency [64,65,66].

Overall, multi-camera systems applied to intelligent transportation during 2020–2025 have demonstrated significant advances in city-scale continuous tracking, violation detection, dangerous driving monitoring, adverse-condition perception, and advanced mobility analytics. However, challenges remain related to scalability in large urban networks, interoperability across heterogeneous infrastructures, and legal implications arising from the combined use of vehicle and biometric recognition in public spaces.

3.4. Smart Cities and IoT: Distributed Multi-Camera Networks

With the consolidation of the Smart Cities paradigm, multi-camera systems have become an essential component of interconnected urban infrastructure. In this context, cameras no longer operate as isolated devices but are integrated into IoT ecosystems and edge–cloud architectures that enable ubiquitous, scalable, and resource-efficient surveillance [11,37]. Smart cities deploy distributed cameras in streets, public transportation systems, buildings, and open spaces not only for security purposes but also for urban service management, including traffic control, energy optimization, and infrastructure monitoring.

The main challenge lies in coordinating the large volumes of data generated by heterogeneous cameras in real time while ensuring low latency, energy efficiency, and data protection. A widely adopted strategy is edge computing, in which detection and tracking tasks are executed on nodes close to the capture source, reducing reliance on centralized data centers [16,17]. In this direction, cooperative cloud–edge architectures have been proposed in which primary analysis occurs locally and only relevant events or metadata are transmitted to central urban platforms [11]. To better illustrate the structural organization of multi-camera systems in smart city environments, Figure 3 presents a layered architecture integrating sensing, communication, distributed processing, and application services. This framework reflects common design patterns observed in urban surveillance systems, where data is processed across edge, fog, and cloud layers to enable scalable and real-time decision-making.

A representative example is the implementation of re-identification microservices on AIoT gateways, which balance computational load and preserve privacy by avoiding the transmission of raw images to the cloud [19]. These systems employ dynamic orchestration and lightweight virtualization techniques to scale according to the number of active cameras, demonstrating feasibility in real urban deployments.
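The idea of transmitting only metadata rather than raw images can be sketched as a small filtering step on the edge node. This is a minimal illustration under stated assumptions: the detection dictionary layout, the `crop` field holding pixel data, the confidence threshold, and the JSON message shape are all hypothetical and not taken from any cited system.

```python
import json


def to_event_metadata(camera_id, timestamp_s, detections, min_confidence=0.6):
    """Keep only confident detections and drop pixel data before upload."""
    events = [
        {"label": d["label"], "confidence": round(d["confidence"], 2), "bbox": d["bbox"]}
        for d in detections
        if d["confidence"] >= min_confidence
    ]
    if not events:
        return None  # nothing relevant: transmit nothing
    return json.dumps({"camera": camera_id, "t": timestamp_s, "events": events})


# Toy detections; "crop" stands in for image bytes that never leave the node.
detections = [
    {"label": "person", "confidence": 0.91, "bbox": [10, 20, 50, 120], "crop": b"\x00\x01"},
    {"label": "person", "confidence": 0.30, "bbox": [200, 40, 30, 90], "crop": b"\x02\x03"},
]
payload = to_event_metadata("cam-07", 1700000000, detections)
```

Because the message is built from an explicit list of fields, image data cannot leak into the payload by accident, which is the privacy property the architectural approach relies on.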

While most privacy preservation strategies in multi-camera smart city systems rely on architectural approaches such as edge computing and local processing, recent research has begun exploring algorithmic privacy-enhancing technologies (PETs). Approaches such as federated learning and differential privacy enable collaborative model training across distributed camera nodes without sharing raw visual data.

However, the adoption of these techniques in multi-camera deployments remains limited due to several practical challenges, including additional computational overhead, synchronization requirements among distributed camera nodes, and potential performance degradation in real-time detection and tracking tasks. Consequently, most current urban surveillance systems still prioritize edge-based data reduction and metadata transmission as primary privacy-preserving mechanisms.
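The core aggregation step of federated learning can be shown in a few lines: each camera node trains locally and shares only model parameters, which a coordinator averages in proportion to each node's number of training samples. The sketch below illustrates this FedAvg-style averaging with flat parameter lists. The function name and toy values are assumptions, and real systems add secure communication, multiple training rounds, and often differential-privacy noise, none of which is modeled here.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of model parameters across client nodes."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two camera nodes with toy two-parameter models; node 2 saw three times more data.
global_model = federated_average(
    client_weights=[[1.0, 2.0], [3.0, 4.0]],
    client_sizes=[1, 3],
)
```

Only the parameter lists cross the network in this scheme, so raw video stays on each node, at the cost of the synchronization and overhead issues noted above.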

The convergence between multi-camera video and IoT infrastructures has also driven self-organization and distributed coordination schemes. Recent proposals explore intelligent interconnection mechanisms among cameras through spatial optimization and collaborative strategies [28]. Furthermore, distributed frameworks supported by blockchain technologies have been introduced to ensure integrity and traceability in decentralized urban surveillance environments [29].

The integration of cameras with other IoT sensors expands multimodal analytics capabilities. In complex urban scenarios, multi-camera systems can be complemented with acoustic, environmental, or traffic sensors to detect critical events with greater accuracy. In particular, recent anomaly-detection frameworks in smart cities combine multiple video views with deep spatiotemporal processing to recognize anomalous behaviors in densely populated environments [3,46].

Representative smart city applications illustrate how multi-camera systems support large-scale urban monitoring and management tasks. For instance, distributed camera networks can be used for public infrastructure monitoring, detecting anomalies such as flooding, structural damage, or vandalism in streets and public facilities [67].

Another important use case involves crowd management during large-scale public events, where fixed and mobile cameras are combined to estimate crowd density, monitor pedestrian flows, and detect potentially dangerous situations [41].

Beyond security and transportation, multi-camera systems in smart cities enable the monitoring of pedestrian flows in public spaces, the estimation of occupancy densities, and support for real-time decision-making. For example, bird’s-eye-view projection systems combine multiple views to estimate social distancing and crowd density [68]. Similarly, occupancy and flow estimation in smart buildings have relied on multi-camera detection and tracking techniques [69,70]. In parallel, scene composition and mosaicking from multiple calibrated cameras have been addressed to improve coverage and global scene understanding, especially in traffic and urban surveillance contexts [71].
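Bird's-eye-view distance estimation typically rests on a planar homography that maps image pixels to ground-plane coordinates, after which distances between people can be measured in meters. The sketch below illustrates that step under stated assumptions: the homography `H` is a made-up pure scaling (1 cm per pixel) rather than a calibrated matrix, the foot points are invented, and the 2 m threshold is an example value. In a multi-camera setup each camera would have its own homography into the shared ground plane.

```python
import math


def to_ground_plane(H, point):
    """Apply a 3x3 homography H (nested lists) to an image point (u, v)."""
    u, v = point
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)


def close_pairs(ground_points, min_distance_m=2.0):
    """Return index pairs whose ground-plane distance is below the threshold."""
    pairs = []
    for i in range(len(ground_points)):
        for j in range(i + 1, len(ground_points)):
            if math.dist(ground_points[i], ground_points[j]) < min_distance_m:
                pairs.append((i, j))
    return pairs


# Hypothetical homography: pure scaling, 0.01 m per pixel.
H = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
foot_points = [(100, 200), (250, 200), (900, 900)]  # bottom-center of each bounding box
ground = [to_ground_plane(H, p) for p in foot_points]
violations = close_pairs(ground)
```

Crowd density follows from the same projection by counting ground-plane points per unit area.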

Another emerging domain is energy management and the optimization of urban resources. Presence detection through multi-camera networks enables the dynamic adjustment of public lighting or HVAC systems in smart buildings, integrating computer vision with automated control systems. The literature emphasizes the importance of designing camera networks while considering integration with communication infrastructure and other intelligent devices, prioritizing scalability, efficiency, and interoperability [18].

Additionally, semantically guided multi-camera pedestrian detection approaches and trajectory forecasting models based on multiple cameras have been proposed, extending analytics capabilities beyond instantaneous tracking [72,73].

In summary, recent developments point toward distributed, cooperative multi-camera networks empowered by edge AI, capable not only of observation but also of generating autonomous local actions. However, challenges remain related to interoperability among heterogeneous systems, the distributed updating of AI models, and data governance in large-scale urban environments, all of which are critical aspects for consolidating intelligent surveillance as a central component of future cities.

3.5. Healthcare and Monitoring for Vulnerable Individuals

The healthcare and assisted-care domain has progressively adopted multi-camera systems for medical emergency detection, home monitoring, and epidemiological surveillance. Between 2020 and 2025, applications such as fall detection for older adults and social distancing monitoring during the COVID-19 pandemic have been particularly prominent, reflecting the growing role of intelligent visual systems in public health and assisted-living environments.

Automatic fall detection constitutes a critical application in geriatric care facilities and hospital environments, where a rapid response can significantly reduce morbidity and mortality. Although wearable accelerometer-based devices have been widely used, they present limitations related to user comfort, incomplete spatial coverage, battery dependency, and potential non-compliance. In contrast, vision-based systems provide a non-invasive alternative; however, single-camera configurations often suffer from occlusions, blind spots, and limited coverage range. Multi-camera configurations mitigate these drawbacks by expanding spatial coverage, reducing blind areas, and enabling multi-view confirmation of critical events.

Shu and Shu [7] developed an eight-camera fall detection system deployed in a home environment, capable of recognizing different types of falls at significantly greater distances compared with single-camera approaches. By fusing multiple viewpoints, the system minimized occlusion effects and achieved high detection accuracy using local processing on low-cost hardware, demonstrating feasibility for smart-home integration. Similarly, Ezatzadeh et al. [8] proposed a multi-camera fusion framework for fall detection that integrates spatial and temporal information to improve robustness against illumination variations and viewpoint changes. These studies demonstrate how visual redundancy across cameras enhances both sensitivity and specificity while reducing false alarms.
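As a minimal illustration of multi-view confirmation, the following Python sketch fuses per-camera fall scores by requiring agreement among the views that can actually see the person. It is not the method of [7] or [8]; the function name `confirm_fall`, the score threshold, and the vote count are assumptions chosen for illustration.

```python
from typing import Optional, Sequence


def confirm_fall(scores: Sequence[Optional[float]],
                 score_threshold: float = 0.6,
                 min_votes: int = 2) -> bool:
    """Return True when enough unoccluded views agree that a fall occurred.

    scores holds one hypothetical fall probability per camera; None marks a
    camera whose view of the person is occluded or out of range.
    """
    visible = [s for s in scores if s is not None]
    votes = sum(1 for s in visible if s >= score_threshold)
    # Never demand more votes than there are views that can see the person.
    required = min(min_votes, len(visible))
    return required > 0 and votes >= required
```

Under these assumptions, a single high score from one camera is not enough when other cameras see the person and disagree, which is one simple way redundancy can reduce false alarms.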

Integration with the IoMT (Internet of Medical Things) paradigm has further expanded these capabilities. Hussain et al. [9] introduced a human-centric attention framework based on deep multi-scale fusion, combining information from multiple cameras with contextual data for activity recognition in medical environments. Such multimodal solutions enable the correlation of visual information with physiological or environmental variables, thereby improving the detection of critical events such as collapses or anomalous behaviors.

During the COVID-19 pandemic, multi-camera networks were widely employed for social-distancing monitoring and contact tracing. Tseng et al. [74] proposed a deep-learning-based person retrieval approach for video surveillance, enabling the identification of prolonged proximity between individuals on campuses, in hospitals, and in public spaces. Likewise, bird’s-eye-view projection approaches combined multiple perspectives to estimate occupancy densities and detect interpersonal distance violations with higher geometric consistency [68].
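The geometric idea behind bird’s-eye-view distance checks can be sketched as follows. This is an illustrative Python example, not the implementation in [68]: it assumes each camera has a known 3×3 image-to-ground homography `H` obtained from calibration, that each person is represented by the pixel at their feet, and that the 1.5 m threshold is a placeholder value.

```python
import math
from itertools import combinations


def to_ground_plane(H, u, v):
    """Map pixel (u, v) to ground-plane coordinates with homography H (3x3)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)  # divide by the homogeneous coordinate


def distance_violations(points, min_distance_m=1.5):
    """Return index pairs of ground-plane points closer than min_distance_m."""
    pairs = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        if math.dist(p, q) < min_distance_m:
            pairs.append((i, j))
    return pairs
```

Because every camera projects into the same ground-plane frame, detections from different views can be pooled into one list before the pairwise check.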

In hospital and emergency settings, multi-camera systems have also been implemented for detecting anomalous behaviors or risky situations involving vulnerable patients, using deep spatiotemporal models that simultaneously analyze multiple views [46]. Recent reviews on anomaly detection highlight the increasing incorporation of multi-camera architectures in healthcare contexts [3]. Although it targets livestock rather than patients, the work of Salau and Krieter [75], who applied instance segmentation with Mask R-CNN in a multi-camera environment to detect and localize dairy cows, demonstrates the effectiveness of instance-level segmentation in complex scenes with frequent occlusions.

A critical aspect in this domain is privacy preservation. Since these systems operate in domestic or clinical environments, many studies prioritize local inference on edge devices, avoiding the transmission of raw video to external servers [7]. Furthermore, visual anonymization techniques, such as silhouettes, pose maps, or skeletal representations instead of full RGB imagery, have been proposed to protect patient identity while maintaining detection capability.

Overall, the recent literature demonstrates that multi-camera systems significantly improve spatial coverage, reduce detection latency, and increase reliability in assisted-care applications. Nevertheless, challenges remain regarding ethical acceptance, regulatory compliance in the handling of sensitive health data, and the balance between high accuracy and strict privacy constraints. Continued advancements in edge hardware, lightweight deep learning models, and IoMT integration suggest that these systems will become increasingly accessible, enabling proactive remote assistance and the early detection of critical medical events.

3.6. Precision Agriculture and Environmental Monitoring

In the agricultural and environmental sectors, multi-camera systems have emerged as strategic tools to enhance productivity, animal welfare, and ecological surveillance. The digitalization of agriculture (AgTech) increasingly incorporates computer vision for livestock tracking, animal segmentation, crop monitoring, and environmental assessment, where multiple cameras enable the coverage of extensive areas or provide complementary viewpoints for more robust analysis.

A prominent application is livestock monitoring. In farming environments, multi-camera configurations allow the supervision of large pens or barns without reliance on wearable sensors, which may cause stress or require maintenance. Salau and Krieter applied instance segmentation based on Mask R-CNN in a dairy farming context using multiple cameras, demonstrating that cows can be segmented and counted even in scenes with partial overlap [76]. The integration of multi-view segmentation enhances counting accuracy and reduces identity switching under occlusions.

More recently, multi-camera fusion with bird’s-eye-view projection has been proposed for continuous monitoring of cattle in large enclosures, integrating perimeter camera views into a unified top-down spatial representation [12]. This approach facilitates the analysis of movement patterns, feeding behavior, and anomalous activities, contributing to the early detection of stress or disease.

Advances in detection models have also been evaluated for individual animal identification. Borwarnginn et al. [77] conducted comparative analyses of YOLO architectures in precision livestock scenarios, highlighting the importance of well-annotated multi-camera datasets to improve robustness against illumination variability and morphological similarity among animals. These findings underscore the need for standardized benchmarks tailored to agricultural contexts.

In environmental monitoring, multi-camera networks have been applied to water-level detection and flood prevention. Borwarnginn et al. implemented a system using CCTV cameras to estimate river levels through deep learning, demonstrating feasibility in repurposing existing infrastructure for early warning systems [78]. Multi-point observation enhances reliability under perspective distortion and adverse weather conditions.

Additionally, aerial and fluvial datasets combining fixed cameras and unmanned aerial vehicle (UAV) platforms have been introduced for semantic segmentation of riverbeds and riparian vegetation, enabling the training of models that integrate ground and aerial viewpoints [67]. This hybrid approach expands spatial coverage and supports continuous ecosystem monitoring.

A critical technical requirement in outdoor environments is accurate multi-camera calibration. Tripicchio et al. [48] proposed a real-time extrinsic calibration method for distributed cameras deployed in large open areas, enabling consistent 3D reconstruction and coherent spatial tracking. Such calibration is essential in structural monitoring applications, including dam, bridge, or infrastructure displacement analysis.

Experimental evaluation of CNN-based positioning and detection systems using fixed cameras has further provided empirical evidence regarding accuracy limitations and deployment constraints in real-world conditions [79]. In the UAV domain, photorealistic multi-camera simulators have been developed for drone-based perception research, facilitating the training and benchmarking of algorithms under complex agricultural and environmental scenarios [80]. These simulation environments enable the modeling of interactions between mobile and fixed cameras for integrated monitoring strategies.

Overall, multi-camera applications in agriculture and environmental monitoring have demonstrated measurable benefits in livestock tracking, early disease detection, hydrological monitoring, and ecological observation. Nonetheless, challenges persist regarding hardware durability under harsh outdoor conditions, connectivity limitations in rural areas, and computational constraints in resource-limited sites. Despite these constraints, multi-camera fusion remains a promising strategy for expanding situational awareness and supporting data-driven decision-making in agricultural and environmental domains.

3.7. Industry and Robotics

In industrial and automation environments, multi-camera systems have become key components for improving operational efficiency, safety, and robotic perception within the framework of Industry 4.0. Multiple cameras are typically integrated with cyber-physical systems, autonomous mobile robots, and IIoT platforms, providing multi-angle observation for monitoring, quality control, process supervision, and navigation tasks. The redundancy and geometric diversity offered by multi-camera configurations enhance robustness in complex and dynamic industrial settings.

In mining and heavy industry, multi-camera configurations are primarily deployed to expand visual coverage in hostile and confined environments. Bai et al. [10] developed a real-time video stitching system for surveillance in underground mines, combining multiple cameras into a continuous panoramic mosaic. Through hybrid image-registration techniques and geometric alignment, the system mitigated adverse lighting conditions and airborne particles, significantly improving situational awareness in narrow tunnels where individual camera fields of view are severely limited.

Within smart-factory environments, multi-camera networks enable distributed tracking of objects, mobile robots, and material flows across production lines. Decentralized architectures integrating edge computing with blockchain-based trust mechanisms have been proposed to securely share tracking information among industrial nodes [29]. This approach enhances data integrity, traceability, and resilience against cyberattacks, which are critical requirements in IIoT ecosystems.

Collaboration between fixed cameras and mobile robotic platforms represents another relevant advancement. Casao et al. [30] introduced a distributed active-perception framework in which multiple cameras, including sensors mounted on robots, cooperate to maintain persistent tracking of targets in large industrial spaces. This hybrid strategy ensures tracking continuity when an object exits the field of view of a mobile robot and is subsequently captured by fixed cameras, or vice versa, thereby reducing tracking fragmentation.

To provide a clearer understanding of how multi-camera systems interact with robotic platforms in industrial environments, Figure 4 illustrates a simplified workflow integrating distributed visual processing and industrial applications such as robot navigation and process monitoring.

In industrial navigation and logistics, the precise localization of autonomous guided vehicles (AGVs) using multi-camera networks has shown promising improvements in positioning accuracy. A recent study optimized PnP-based localization through regression modeling and multi-camera fusion, significantly reducing root-mean-square error compared with single-camera solutions [81]. The geometric redundancy provided by multiple synchronized views enables the correction of calibration errors and enhances robustness against partial occlusions or dynamic obstacles.
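To make the benefit of geometric redundancy concrete, the sketch below fuses per-camera 2D position estimates by inverse-variance weighting and computes the root-mean-square error against reference positions. It is a generic illustration in Python, not the regression-based PnP optimization of [81]; the per-camera variances are assumed to be known, for example from calibration residuals.

```python
import math


def fuse_positions(estimates):
    """Fuse [((x, y), variance), ...] by inverse-variance weighting.

    Each entry is one camera's position estimate and an assumed scalar
    variance (must be > 0); more certain cameras receive larger weights.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    x = sum(w * pos[0] for w, (pos, _) in zip(weights, estimates)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(weights, estimates)) / total
    return (x, y)


def rmse(predicted, truth):
    """Root-mean-square Euclidean error between two lists of (x, y) points."""
    squared = [(px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(predicted, truth)]
    return math.sqrt(sum(squared) / len(squared))
```

With equal variances the fusion reduces to a plain average, and a camera with three times the variance contributes one third of the weight.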

The optimal design of multi-camera networks in industrial environments has also been extensively studied. Camera-placement optimization algorithms have been proposed to maximize coverage while minimizing deployment cost [82,83]. More recent approaches incorporate energy constraints into adaptive coverage-optimization strategies, improving sustainability in large-scale facilities [84]. Efficient planning is particularly relevant in expansive industrial complexes where infrastructure decisions directly impact operational expenditures.
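The trade-off between coverage and deployment cost can be illustrated with a simple greedy heuristic. The Python sketch below is not one of the algorithms in [82,83,84]; it assumes the floor plan has been discretized into cell identifiers, that each candidate camera pose has a known positive cost and a precomputed set of visible cells, and that a fixed budget applies. Greedy selection is a heuristic and does not guarantee an optimal layout.

```python
def greedy_placement(candidates, budget):
    """Pick camera poses by best newly-covered-cells-per-cost ratio.

    candidates maps a pose name to (cost, set_of_covered_cell_ids).
    Returns (chosen_names, covered_cells, total_cost).
    """
    chosen, covered, spent = [], set(), 0.0
    remaining = dict(candidates)
    while remaining:
        best, best_ratio = None, 0.0
        for name, (cost, cells) in remaining.items():
            gain = len(cells - covered)  # cells this pose would add
            if gain == 0 or spent + cost > budget:
                continue
            ratio = gain / cost
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:  # nothing affordable adds coverage
            break
        cost, cells = remaining.pop(best)
        chosen.append(best)
        covered |= cells
        spent += cost
    return chosen, covered, spent
```

An energy-aware variant could be obtained by folding an assumed per-camera energy term into the cost value.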

Intelligent control of PTZ cameras using multi-agent reinforcement learning has emerged as an active research direction. Yang et al. [85] proposed a hierarchical reinforcement-learning framework to optimize PTZ camera trajectories and orientations in dynamic tracking tasks. Such adaptive control mechanisms are especially suitable for automated warehouses and manufacturing plants characterized by high object mobility and dynamic reconfiguration.

In industrial aerial robotics, UAVs equipped with multiple cameras are increasingly used for infrastructure inspection, inventory auditing, and high-level monitoring. The photorealistic MCS Sim simulator facilitates the training and validation of multi-camera algorithms for UAV-based inspection in complex industrial settings prior to real-world deployment [80]. These tools reduce operational risks and accelerate development cycles.

Regarding physical system design, specialized structures for volumetric surveillance and cost-optimized deployment of multi-camera arrays have been proposed, complementing algorithmic camera-placement optimization [86]. Overall, multi-camera systems in industry and robotics have enabled advancements in visual mosaicking, secure distributed tracking, precise localization, collaborative perception, and energy-efficient coverage. Nevertheless, challenges persist concerning robustness under adverse industrial conditions (dust, vibration, fluctuating illumination), interoperability among heterogeneous platforms, and cybersecurity protection in critical infrastructure environments.

3.8. Mobile Cameras, Drones, and Active Perception

A significant subfield within multi-camera systems involves the integration of mobile or actuated cameras, including autonomous PTZ (pan-tilt-zoom) cameras and sensors mounted on unmanned aerial vehicles (UAVs). Unlike static configurations, these systems introduce the paradigm of active perception, in which cameras dynamically adjust orientation or trajectory to optimize coverage, tracking continuity, or observation quality.

In the PTZ domain, Kumari et al. [87] proposed a dynamic scheduling scheme for an autonomous PTZ camera integrated within a fixed camera network. The scheduling algorithm determines the optimal orientation and zoom level at each time step to maximize event detection probability while accounting for movement costs and reorientation latency. Experimental results demonstrated that a strategically controlled PTZ camera can effectively fill coverage gaps that would otherwise require additional fixed sensors.
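The scheduling trade-off described above can be sketched as a one-step greedy rule: at each time step, pick the preset whose expected detection probability, minus a weighted movement cost, is highest. This Python example is a simplified illustration and not the algorithm of Kumari et al. [87]; the preset names, probabilities, costs, and the `cost_weight` parameter are hypothetical.

```python
def next_ptz_preset(current, detection_prob, move_cost, cost_weight=0.5):
    """Choose the next PTZ preset by utility = probability - weighted cost.

    detection_prob maps preset name -> assumed event detection probability.
    move_cost maps (from_preset, to_preset) -> assumed reorientation cost;
    staying at the current preset costs nothing.
    """
    def utility(preset):
        cost = 0.0 if preset == current else move_cost[(current, preset)]
        return detection_prob[preset] - cost_weight * cost

    return max(detection_prob, key=utility)
```

Raising `cost_weight` makes the camera more reluctant to move, which models a higher penalty for reorientation latency.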

The evolution of this approach has incorporated multi-agent reinforcement learning for cooperative control of multiple mobile cameras. Yang et al. [85] developed a hierarchical reinforcement-learning framework to optimize trajectories and orientation policies for PTZ cameras in dynamic multi-target tracking scenarios. In this framework, cameras coordinate to distribute targets efficiently, minimizing redundant field-of-view overlap while improving tracking persistence.

In UAV-based surveillance, aerial mobility significantly extends spatial coverage and adaptability. Gonchigsumlaa et al. [88] formulated an entropy- and coverage-driven optimal-control model for cooperative multi-camera UAV systems. Their results demonstrated substantial performance gains compared with static patrol patterns, particularly in large-perimeter monitoring and critical-infrastructure inspection scenarios.

The validation of these cooperative strategies has been supported by photorealistic simulation environments. MCS Sim provides a virtual testbed for evaluating dynamic calibration, target handoff, and cooperative tracking among drones prior to physical deployment [80]. Simulation-based validation reduces operational risks and facilitates the analysis of occlusion handling and coordination policies in complex environments.

Hybrid integration between fixed and mobile cameras represents another important advancement. In distributed active-perception frameworks, fixed cameras detect initial events and trigger intervention by mobile sensors (PTZ units or UAVs) for closer inspection and persistent tracking [30]. This layered model integrates continuous passive surveillance with adaptive response mechanisms, effectively closing the perception–action loop.

Active collaboration strategies supported by pose estimation have also been proposed, demonstrating that pose-informed coordination improves tracking robustness in dynamic scenes [89]. Overall, mobile multi-camera systems enhance spatial coverage, adaptability, and response time relative to purely static networks. However, they introduce additional technical challenges, including dynamic recalibration between mobile and static cameras, energy-management constraints in UAV platforms, and secure coordination protocols to avoid redundancy or collision. Ongoing research addresses these issues through optimal-control theory, distributed learning, and cooperative multi-agent architectures.

3.9. Other Emerging Applications: Retail, Education, and Forensic Analysis

Beyond traditional domains such as security and transportation, multi-camera systems have expanded into emerging applications including retail analytics, educational environments, and forensic video analysis. In these contexts, the integration of multiple viewpoints enhances interpretability, coverage, and analytical depth.

In the retail sector, smart stores deploy multi-camera networks to analyze customer behavior patterns and optimize operational efficiency. Trajectory-based re-identification systems have been used to map customer movement paths in shopping centers without relying on explicit biometric identification [90]. Such systems enable the estimation of flow between commercial zones and dwell-time analysis while prioritizing metadata-based processing over raw video storage.

In retail logistics and drive-through service management, multi-camera vehicle tracking has demonstrated improvements in throughput and congestion reduction [55]. Robust cross-camera association supports the optimization of service times and the mitigation of bottlenecks in high-demand environments.

In educational settings, multi-camera systems have been applied for student detection and counting in classrooms, mitigating occlusion through complementary viewing angles [91]. These configurations improve detection accuracy in dense environments and have been extended to safety applications, including automated recognition of violent activities using deep multi-camera models [92]. While these implementations offer safety benefits, they also raise ethical and regulatory considerations related to surveillance in academic institutions.

In forensic analysis, multi-camera networks have enabled advanced tools for automatic video summarization and efficient search across large datasets. Veesam and Satish [93] proposed an integrated multi-camera summarization framework combining object detection and multimodal fusion for crime-scene investigation applications. This approach significantly reduces manual review time while preserving critical evidence.

Probabilistic identification methods in multi-camera environments have also been developed for scenarios in which cameras are unsynchronized or exhibit temporal inconsistencies [94]. These methods employ inference based on visual attributes and spatiotemporal context to accelerate re-identification in complex forensic investigations.

Recent reviews have synthesized machine-learning techniques applied to multi-camera networks, highlighting both technical advancements and practical limitations [37]. Key challenges include computational cost, hardware requirements, and efficient integration with edge and fog architectures.

From an IoT integration perspective, efficient encoding mechanisms and device–object pairing strategies have been proposed to improve interoperability between cameras and smart infrastructure components [21,95]. These contributions facilitate coordinated operation in commercial and urban ecosystems.

Overall, emerging applications demonstrate the continued expansion of the multi-camera paradigm into diverse operational domains. Although each sector presents unique ethical or regulatory considerations, they share common technical challenges related to identity association, spatiotemporal fusion, scalability, and computational efficiency. The technological maturity achieved in security and transportation domains increasingly serves as the foundation for these new applications. Finally, multi-camera approaches for suspicious object localization and for end-to-end event image stitching and edge detection have been explored, reinforcing the role of multi-view perception in demanding operational contexts [96,97] (see Table 2).

4. Discussion

Across the reviewed studies, the evaluation protocols show notable variability depending on the application domain. In surveillance and multi-target tracking research, benchmark datasets such as DukeMTMC, CityFlow, and other multi-camera tracking datasets are commonly used, with evaluation metrics including IDF1, MOTA, and tracking accuracy. In object detection-oriented applications, metrics such as mean Average Precision (mAP) and detection rate remain dominant.
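For reference, the two tracking metrics named above follow the standard definitions MOTA = 1 − (FN + FP + IDSW) / GT and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The Python sketch below computes both from counts that are assumed to have been accumulated already over all frames and cameras; the identity-level counts for IDF1 presuppose a prior matching of predicted to ground-truth trajectories, which is not shown.

```python
def mota(false_negatives, false_positives, id_switches, gt_objects):
    """Multiple Object Tracking Accuracy from counts summed over all frames."""
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / gt_objects


def idf1(idtp, idfp, idfn):
    """Identity F1 score from identity-level true/false positives/negatives."""
    return 2 * idtp / (2 * idtp + idfp + idfn)
```

MOTA can become negative when the number of errors exceeds the number of ground-truth objects, whereas IDF1 always lies between 0 and 1.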

However, in several emerging domains such as agriculture, healthcare, and industrial monitoring, the absence of standardized multi-camera datasets often leads researchers to construct custom experimental datasets, which limits reproducibility and cross-study comparison. This heterogeneity highlights the need for more standardized evaluation protocols and shared datasets for multi-camera computer vision research.

Based on the comprehensive review of the literature published between 2020 and 2025, it is evident that multi-camera systems have reached a substantial level of technical maturity across a wide spectrum of domains, including public security, transportation, healthcare, agriculture, industry, and emerging smart-city ecosystems. Nevertheless, despite these advances, several structural and cross-domain challenges persist that limit scalability, long-term robustness, and broader societal acceptance.

One of the most recurrent and technically complex issues is cross-camera data association. Despite significant progress in re-identification and multi-object tracking, maintaining the consistent identity of an individual or object across multiple spatially distributed cameras remains an open challenge [13,14]. Identity fragmentation, appearance ambiguity, and trajectory discontinuities become particularly problematic in uncontrolled environments characterized by variable illumination, prolonged occlusions, crowd density, and non-overlapping fields of view. Recent surveys emphasize that combining appearance descriptors, dynamic motion modeling, and spatiotemporal constraints enhances robustness; however, determining the optimal integration of these components remains an unresolved research direction [13,14].

Another relevant technical factor influencing the performance of multi-camera systems is the configuration of the camera network itself. The reviewed studies employ diverse deployment strategies, including overlapping and non-overlapping camera fields of view, distributed urban camera networks, and hybrid configurations combining fixed and mobile cameras.

Camera placement directly affects the complexity of cross-camera association tasks. Overlapping views simplify spatial matching through geometric constraints, whereas non-overlapping configurations require stronger appearance-based re-identification and spatiotemporal reasoning. Furthermore, temporal synchronization among cameras plays an important role in ensuring coherent trajectory reconstruction and avoiding identity fragmentation in multi-target tracking scenarios.
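For non-overlapping views, the combination of appearance-based re-identification and spatiotemporal reasoning can be sketched as follows. This Python example is a simplified illustration rather than any specific method from the reviewed studies: it assumes each track carries a nonzero appearance embedding, that plausible travel times between the two cameras are known, and it uses greedy matching where a global assignment solver could be substituted.

```python
import math


def cosine(a, b):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def associate(exits, entries, min_travel_s, max_travel_s, min_similarity=0.7):
    """Match tracks leaving camera A to tracks entering camera B.

    exits and entries are lists of (track_id, timestamp_s, embedding).
    A pair is a candidate only if its transit time is plausible and its
    appearance similarity reaches min_similarity.
    """
    candidates = []
    for exit_id, t_out, f_out in exits:
        for entry_id, t_in, f_in in entries:
            if not (min_travel_s <= t_in - t_out <= max_travel_s):
                continue  # spatiotemporal gate
            similarity = cosine(f_out, f_in)
            if similarity >= min_similarity:
                candidates.append((similarity, exit_id, entry_id))
    candidates.sort(reverse=True)  # most similar pairs first
    matches, used_exits, used_entries = {}, set(), set()
    for _, exit_id, entry_id in candidates:
        if exit_id in used_exits or entry_id in used_entries:
            continue
        matches[exit_id] = entry_id
        used_exits.add(exit_id)
        used_entries.add(entry_id)
    return matches
```

The time gate also shows why synchronization matters: a clock offset between cameras shifts every transit time and can push correct pairs outside the admissible window.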

Geometric calibration is another critical requirement for several multi-camera applications, particularly in bird’s-eye-view projection, multi-view fusion, and trajectory estimation tasks. Calibration errors or inconsistent spatial alignment can significantly degrade tracking accuracy and cross-view consistency, especially in large-scale camera networks deployed in urban or industrial environments.

From an algorithmic perspective, deep learning has become the dominant approach for addressing detection, tracking, and re-identification tasks in multi-camera systems. Convolutional neural networks are widely used for object detection and feature extraction, while deep embedding networks enable robust person or vehicle re-identification across different viewpoints.

These models learn discriminative appearance representations that help mitigate challenges such as illumination variation, viewpoint changes, and partial occlusions. In addition, several recent works integrate spatiotemporal constraints with deep feature embeddings to improve cross-camera identity association and trajectory reconstruction in large-scale camera networks.

In this context, formulations based on spatiotemporal reasoning and robust cross-camera association have been explored to improve global matching performance in realistic deployment scenarios [40,41]. Complementarily, in intelligent transportation systems, learned representations combined with edge AI architectures have enabled cooperative vehicle tracking across distributed camera networks [51]. These efforts illustrate the trend toward integrating perception algorithms with distributed architectural intelligence.

Another critical axis concerns scalability and computational efficiency. As the number of cameras increases and video resolutions continue to rise, real-time processing becomes a significant bottleneck. Although edge and fog computing architectures, along with resource allocation strategies, have been proposed to distribute computational load [16,18,37], efficient orchestration of heterogeneous resources remains a non-trivial challenge. The need to balance latency, energy consumption, and bandwidth constraints becomes especially relevant in large-scale urban and industrial deployments. In this direction, encoding and compression mechanisms tailored for IoT-based surveillance have also been introduced to accelerate transmission and processing [95].

5. Conclusions

Between 2020 and 2025, multi-camera vision systems have consolidated their position as a key technology across numerous domains, providing environmental perception capabilities that are difficult to achieve with isolated cameras. In this review, the main applications have been identified and analyzed, ranging from public security surveillance, traffic monitoring in intelligent transportation systems, and smart city management to emerging fields such as assisted healthcare, precision agriculture, Industry 4.0, and retail environments, demonstrating that the versatility of camera networks spans a wide range of sectors [36,37]. Each domain leverages intrinsic advantages of multiple viewpoints: enhanced spatial coverage, reduced blind spots, continuous tracking of objects across different scenarios, and redundancy against occlusions [13,98].

The predominant techniques and approaches enabling these advances have also been summarized. The rise of deep learning has been a cross-cutting factor, with detectors and re-identification models significantly improving detection accuracy and identity association in complex environments [22,23,24]. Complementarily, optimization and planning algorithms for mobile cameras, edge computing to distribute processing loads, and information fusion strategies to enhance situational awareness have been integrated into modern systems [16,85,87]. Overall, many systems combine multi-object tracking, biometrics (e.g., face or gait recognition), and behavioral analysis within hybrid architectures tailored to specific contexts [13,27].

Despite these achievements, the review reveals persistent challenges. Robust cross-camera object association in arbitrary situations remains complex, especially under prolonged occlusions, dense crowds, or abrupt appearance changes, keeping the MTMCT problem open in uncontrolled scenarios [13,14]. Scaling systems to dozens or hundreds of cameras while maintaining real-time processing is another major obstacle: distributed solutions and edge computing mitigate some limitations but introduce orchestration complexity, network requirements, and the need for efficient resource allocation [16,17,18]. Furthermore, privacy and security are critical concerns; large-scale deployment requires technical safeguards (local processing, data minimization, integrity, and traceability) as well as robust protection mechanisms against threats [19,29]. Finally, knowledge gaps remain in less explored subareas (e.g., multi-camera anomaly detection and extreme scenarios), requiring further research attention and suitable datasets for evaluation [3,45].

Looking ahead, multi-camera systems are expected to become increasingly integrated with IoT/IIoT infrastructures and cooperative edge–cloud architectures, leading to more intelligent and collaborative networks capable of delivering comprehensive services in cities and factories [11,18]. A trend toward active perception and hybrid networks combining fixed cameras with mobile cameras (PTZ, robots, or UAVs) is also evident, aiming to improve coverage and event response through intelligent control and simulation support [30,80,88].

In summary, during the reviewed period, multi-camera systems have evolved from promising research topics into practical implementations across diverse domains, although technical and data governance limitations remain. Addressing the research questions posed: (1) the main identified domains include security surveillance, transportation and traffic, smart cities, healthcare, agriculture/livestock, industry/robotics, and commercial environments; (2) key techniques include deep learning for detection, tracking, and re-identification, distributed edge computing architectures, active camera control, and IoT-based fusion; (3) recent achievements show improvements in accuracy and coverage, yet limitations persist in cross-camera association, computational cost, and privacy; and (4) future opportunities include more robust association algorithms, efficient scalability through edge AI, integrated security and privacy mechanisms, and new active perception schemes [13,37].

In conclusion, multi-camera systems are on track to become fundamental components of intelligent infrastructures, enhancing safety, efficiency, and situational awareness. Their future development will require a multidisciplinary approach combining technical innovation with ethical considerations and public policy frameworks, ensuring that these networks become more autonomous, collaborative, and privacy-aware.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/a19040249/s1: the PRISMA checklist, the abstract checklist, and the flow diagram prepared in accordance with the PRISMA guidelines.

Author Contributions

Conceptualization, J.V.-A. and S.M.M.; methodology, C.J.F.-S.; validation, J.V.-A. and C.D.-V.-S.; formal analysis, C.J.F.-S. and S.M.M.; investigation, C.J.F.-S. and J.V.-A.; writing—original draft preparation, J.V.-A. and C.D.-V.-S.; writing—review and editing, S.M.M. and C.D.-V.-S.; visualization, J.V.-A. and C.D.-V.-S.; supervision, S.M.M.; funding acquisition, J.V.-A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors acknowledge the support of Universidad Tecnológica Indoamérica for the development of this research. Generative AI (ChatGPT Thinking 5.2) was used solely for minor language editing, including grammar correction and improvement of sentence clarity in English. The scientific review, analysis, and conclusions of the manuscript were developed entirely by the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ortega, J.D.; Cañas, P.N.; Nieto, M.; Otaegui, O.; Salgado, L. Challenges of large-scale multi-camera datasets for driver monitoring systems. Sensors 2022, 22, 2554.
2. Ye, S.; Bohush, R.P.; Chen, H.; Zakharava, I.Y.; Ablameyko, S.V. Person Tracking and Reidentification for Multicamera Indoor Video Surveillance Systems. Pattern Recognit. Image Anal. 2020, 30, 827–837.
3. Samaila, Y.A.; Sebastian, P.; Singh, N.S.S.; Shuaibu, A.N.; Ali, S.S.A.; Amosa, T.I.; Mustafa Abro, G.E.; Shuaibu, I. Video anomaly detection: A systematic review of issues and prospects. Neurocomputing 2024, 591, 127726.
4. Wang, X. Intelligent multi-camera video surveillance: A review. Pattern Recognit. Lett. 2013, 34, 3–19.
5. Iguernaissi, R.; Merad, D.; Aziz, K.; Drap, P. People tracking in multi-camera systems: A review. Multimed. Tools Appl. 2019, 78, 10773–10793.
6. Wang, Y.; Lu, K.; Zhai, R. Challenge of multi-camera tracking. In Proceedings of the 2014 7th International Congress on Image and Signal Processing, Dalian, China, 14–16 October 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 32–37.
7. Shu, F.; Shu, J. An eight-camera fall detection system using human fall pattern recognition via machine learning by a low-cost android box. Sci. Rep. 2021, 11, 2471.
8. Ezatzadeh, S.; Keyvanpour, M.R.; Shojaedini, S.V. A human fall detection framework based on multi-camera fusion. J. Exp. Theor. Artif. Intell. 2021, 34, 905–924.
9. Hussain, A.; Khan, S.U.; Rida, I.; Khan, N.; Baik, S.W. Human centric attention with deep multiscale feature fusion framework for activity recognition in Internet of Medical Things. Inf. Fusion 2024, 106, 102211.
10. Bai, Z.; Li, Y.; Chen, X.; Yi, T.; Wei, W.; Wozniak, M.; Damasevicius, R. Real-Time Video Stitching for Mine Surveillance Using a Hybrid Image Registration Method. Electronics 2020, 9, 1336.
11. Ismail, M.G.; Tarabay, F.H.; El-Masry, R.; El Ghany, M.A.; Salem, M.A.M. Smart Cloud-Edge Video Surveillance System. In Proceedings of the 2022 11th International Conference on Modern Circuits and Systems Technologies (MOCAST), Bremen, Germany, 8–10 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–4.
12. Nasir, M.F.; Fuentes, A.; Han, S.; Liu, J.; Jeong, Y.; Yoon, S.; Park, D.S. Multi-camera fusion and bird-eye view location mapping for deep learning-based cattle behavior monitoring. Artif. Intell. Agric. 2025, 15, 724–743.
13. Amosa, T.I.; Sebastian, P.; Izhar, L.I.; Ibrahim, O.; Ayinla, L.S.; Bahashwan, A.A.; Bala, A.; Samaila, Y.A. Multi-camera multi-object tracking: A review of current trends and future advances. Neurocomputing 2023, 552, 126558.
14. Zhang, P.; Lei, W.M.; Zhao, X.L.; Dong, L.J.; Lin, Z.N. A Survey on Multi-Target Multi-Camera Tracking Methods. Chin. J. Comput. 2024, 47, 287–309.
15. Kalake, L.; Dong, Y.; Wan, W.; Hou, L. Enhancing Detection Quality Rate with a Combined HOG and CNN for Real-Time Multiple Object Tracking across Non-Overlapping Multiple Cameras. Sensors 2022, 22, 2123.
16. Kumar, P.P.; Pal, A.; Kant, K. Resource Efficient Edge Computing Infrastructure for Video Surveillance. IEEE Trans. Sustain. Comput. 2022, 7, 774–785.
17. Kunpeng, Y.; Shan, H.; Sun, T.; Hu, R.; Wu, Y.; Yu, L.; Zhang, Z.; Quek, T.Q. Reinforcement Learning-based Mobile Edge Computing and Transmission Scheduling for Video Surveillance. IEEE Trans. Emerg. Top. Comput. 2021, 10, 1142–1156.
18. Du, L.; Huo, R.; Sun, C.; Wang, S.; Huang, T. Collaborative Video Processing of Multiple Cameras in Smart Transportation: Content Analysis and Resource Allocation. IEEE Trans. Mob. Comput. 2025, 24, 9965–9979.
19. Chen, C.H.; Liu, C.T. Person Re-Identification Microservice over Artificial Intelligence Internet of Things Edge Computing Gateway. Electronics 2021, 10, 2264.
20. Singh, R.P.; Srivastava, H.; Gautam, H.; Shukla, R.; Dwivedi, R.K. An Intelligent Video Surveillance System using Edge Computing based Deep Learning Model. In Proceedings of the 2023 International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT), Bengaluru, India, 5–7 January 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 439–444.
21. Tong, K.L.; Wu, K.R.; Tseng, Y.C. The Device–Object Pairing Problem: Matching IoT Devices with Video Objects in a Multi-Camera Environment. Sensors 2021, 21, 5518.
22. Zhang, T.; Xie, L.; Wei, L.; Zhang, Y.; Li, B.; Tian, Q. Single Camera Training for Person Re-Identification. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12878–12885.
23. Herzog, F.; Ji, X.; Teepe, T.; Hormann, S.; Gilg, J.; Rigoll, G. Lightweight Multi-Branch Network For Person Re-Identification. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1129–1133.
24. Dang, T.L.; Hoang, M.H.; Ngo, V.A.; Duong, M.Q.; Ha, H.H.; Nguyen, T.A.; Le, H. Real-time person re-identification and tracking on edge devices with distributed optimization. Pattern Anal. Appl. 2025, 28.
25. Nualtim, W.; Suwansantisuk, W.; Kumhom, P. Face Recognition Based on Multiple Video Cameras. In Proceedings of the 2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Phuket, Thailand, 24–27 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 324–330.
26. Badave, H.; Kuber, M. Head Pose Estimation Based Robust Multicamera Face Recognition. In Proceedings of the 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), Coimbatore, India, 25–27 March 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 492–495.
27. Mughal, A.B.; Khan, R.U.; Bermak, A.; Rehman, A.u. Person Recognition via Gait: A Review of Covariate Impact and Challenges. Sensors 2025, 25, 3471.
28. Li, C.; Li, J.; Xie, Y.; Nie, J.; Yang, T.; Lu, Z. Multi-camera joint spatial self-organization for intelligent interconnection surveillance. Eng. Appl. Artif. Intell. 2022, 107, 104533.
29. Wang, S.; Sheng, H.; Zhang, Y.; Yang, D.; Shen, J.; Chen, R. Blockchain-Empowered Distributed Multicamera Multitarget Tracking in Edge Computing. IEEE Trans. Ind. Inform. 2024, 20, 369–379.
30. Casao, S.; Serra-Gómez, A.; Murillo, A.C.; Böhmer, W.; Alonso-Mora, J.; Montijano, E. Distributed multi-target tracking and active perception with mobile camera networks. Comput. Vis. Image Underst. 2024, 238, 103876.
31. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. Proc. IEEE 2023, 111, 257–276.
32. Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.K.; Khan, F.S. Transformers in Vision: A Survey. ACM Comput. Surv. 2022, 54, 1–41.
33. Shi, W.; Cao, J.; Zhang, Q.; Li, Y.; Xu, L. Edge Computing: Vision and Challenges. IEEE Internet Things J. 2022, 10, 450–465.
34. Zhang, C.; Patras, I.; Haddadi, H. Deep Learning in Mobile and Wireless Networking: A Survey. IEEE Commun. Surv. Tutor. 2019, 21, 2224–2287.
35. Ristani, E.; Tomasi, C. Features for Multi-Target Multi-Camera Tracking and Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2021; pp. 6036–6046.
36. Olagoke, A.S.; Ibrahim, H.; Teoh, S.S. Literature Survey on Multi-Camera System and Its Application. IEEE Access 2020, 8, 172892–172922.
37. Dharan, A.M.; Mukhopadhyay, D. A comprehensive survey on machine learning techniques to mobilize multi-camera network for smart surveillance. Innov. Syst. Softw. Eng. 2023, 21, 313–332.
38. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
39. Zhou, Z.; Yin, D.; Ding, J.; Luo, Y.; Yuan, M.; Zhu, C. Collaborative Tracking Method in Multi-Camera System. J. Shanghai Jiaotong Univ. (Sci.) 2020, 25, 802–810.
40. Liu, W.; Wei, G.; Wang, Y.; Wu, R. Indoor Multipedestrian Multicamera Tracking Based on Fine Spatiotemporal Constraints. IEEE Internet Things J. 2023, 10, 10012–10023.
41. Sakaguchi, S.; Amagasaki, M.; Kiyama, M.; Okamoto, T. Multi-Camera People Tracking With Spatio-Temporal and Group Considerations. IEEE Access 2024, 12, 36066–36073.
42. Kumar, P.; C, S.; Desai, P. Person Tracking with Re-Identification in Multi-Camera Setup: A Distributed Approach. In Proceedings of the 2022 2nd International Conference on Intelligent Technologies (CONIT), Hubli, India, 24–26 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–5.
43. Pandya, N.A.; Chauhan, N.C. Multi-Camera Person Tracking: Integrating YOLOv8 with ByteTrack. Int. J. Electr. Electron. Eng. 2024, 11, 53–60.
44. Gautam, V.; Prasad, S.; Sinha, S. YOLORe-IDNet: An Efficient Multi-camera System for Person-Tracking. In Communications in Computer and Information Science; Springer: Cham, Switzerland, 2024; pp. 185–197.
45. Pereira, S.S.L.; Maia, J.E.B. MC-MIL: Video surveillance anomaly detection with multi-instance learning and multiple overlapped cameras. Neural Comput. Appl. 2024, 36, 10527–10543.
46. Veesam, S.B.; Rao, B.T.; Begum, Z.; Patibandla, R.S.M.L.; Dcosta, A.A.; Bansal, S.; Prakash, K.; Faruque, M.R.I.; Al-mugren, K.S. Multi-camera spatiotemporal deep learning framework for real-time abnormal behavior detection in dense urban environments. Sci. Rep. 2025, 15, 26813.
47. Duc, N.P.; Thanh, H.D.N.; Minh, K.H.; Trang, N.L.H.; Thanh, H.T. Enhancing Violent Behavior Recognition in Schools Through YOLOv8 Optimization Using LSTM with Multi-Camera Model. In Proceedings of the 2024 15th International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 16–18 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 2078–2081.
48. Tripicchio, P.; D’Avella, S.; Camacho-Gonzalez, G.; Landolfi, L.; Baris, G.; Avizzano, C.A.; Filippeschi, A. Multi-Camera Extrinsic Calibration for Real-Time Tracking in Large Outdoor Environments. J. Sens. Actuator Netw. 2022, 11, 40.
49. Lima, J.P.; Roberto, R.; Figueiredo, L.; Simões, F.; Thomas, D.; Uchiyama, H.; Teichrieb, V. 3D pedestrian localization using multiple cameras: A generalizable approach. Mach. Vis. Appl. 2022, 33, 61.
50. Yang, H.; Cai, J.; Zhu, M.; Liu, C.; Wang, Y. Traffic-Informed Multi-Camera Sensing (TIMS) System Based on Vehicle Re-Identification. IEEE Trans. Intell. Transp. Syst. 2022, 23, 17189–17200.
51. Yang, H.F.; Cai, J.; Liu, C.; Ke, R.; Wang, Y. Cooperative multi-camera vehicle tracking and traffic surveillance with edge artificial intelligence and representation learning. Transp. Res. Part C Emerg. Technol. 2023, 148, 103982.
52. Jin, T.; Ye, X.; Li, Z.; Huo, Z. Identification and Tracking of Vehicles between Multiple Cameras on Bridges Using a YOLOv4 and OSNet-Based Method. Sensors 2023, 23, 5510.
53. Zhang, H.; Fang, R.; Li, S.; Miao, Q.; Fan, X.; Hu, J.; Chan, S. Multi-Camera Multi-Vehicle Tracking Guided by Highway Overlapping FoVs. Mathematics 2024, 12, 1467.
54. Tseng, Y.S.; Su, Y.F.; Lin, D.T. Enhancing multi-target multi-camera vehicle tracking with YOLOv9 and attention mechanisms for smart city traffic monitoring. Multimed. Tools Appl. 2025, 84, 45095–45117.
55. Gellida-Coutiño, C.; Rios-Cabrera, R.; Maldonado-Ramirez, A.; Sanchez-Orta, A. Real-Time Multi-Camera Tracking for Vehicles in Congested, Low-Velocity Environments: A Case Study on Drive-Thru Scenarios. Electronics 2025, 14, 2671.
56. Zwemer, M.H.; Groot, H.G.J.; Wijnhoven, R.; Bondarev, E.; de With, P.H.N. Multi-Camera Vessel-Speed Enforcement by Enhancing Detection and Re-Identification Techniques. Sensors 2021, 21, 4659.
57. Zhao, L.; Fu, Z.; Yang, J.; Zhao, Z.; Wang, P. Multi-Adjacent Camera-Based Dangerous Driving Trajectory Recognition for Ultra-Long Highways. Appl. Sci. 2024, 14, 4593.
58. Zhang, X.; Story, B.; Rajan, D. Night Time Vehicle Detection and Tracking by Fusing Vehicle Parts From Multiple Cameras. IEEE Trans. Intell. Transp. Syst. 2022, 23, 8136–8156.
59. Chan, S.; Ni, S.; Guo, B.; Hu, J.; Tang, T.; Zhou, X.; Hao, P. Deformable Blur Sensing and Regression Analysis ReID Feature Fusion for Multitarget Multicamera Tracking Systems in Highway Scenarios. IEEE Trans. Comput. Soc. Syst. 2025, 12, 738–748.
60. Li, Y.; Sarvi, M.; Khoshelham, K.; Zhang, Y.; Jiang, Y. Pedestrian Origin–Destination Estimation Based on Multi-Camera Person Re-Identification. Sensors 2022, 22, 7429.
61. Liu, B.; Lam, C.T.; Ng, B.K.; Yuan, X.; Im, S.K. A Graph-Based Framework for Traffic Forecasting and Congestion Detection Using Online Images From Multiple Cameras. IEEE Access 2024, 12, 3756–3767.
62. Lassen, V.; Lübke, M.; Franchi, N. Parking–Occupancy Detection Through Adaptive Multisensor Camera-CNN Fusion. IEEE Sens. Lett. 2025, 9, 7004404.
63. El-Alami, A.; Nadir, Y.; Mansouri, K. A review of object detection approaches for traffic surveillance systems. Int. J. Electr. Comput. Eng. (IJECE) 2024, 14, 5221.
64. Liu, X.; Dong, Y.; Deng, Z. Deep Highway Multi-Camera Vehicle Re-ID with Tracking Context. In Proceedings of the 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 12–14 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 2090–2093.
65. Ay, S.; Karabatak, M. A enhanced vehicle tracking and detection in multi-camera surveillance systems using advanced optical flow and deep learning techniques. Signal Image Video Process. 2025, 19, 1146.
66. Shim, K.; Ko, K.; Hwang, J.; Jang, H.; Kim, C. Fast online multi-target multi-camera tracking for vehicles. Appl. Intell. 2023, 53, 28994–29004.
67. Wang, Z.; Mahmoudian, N. Aerial Fluvial Image Dataset for Deep Semantic Segmentation Neural Networks and Its Benchmarks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 4755–4766.
68. Montero, D.; Aranjuelo, N.; Leskovsky, P.; Loyo, E.; Nieto, M.; Aginako, N. Multi-camera BEV video-surveillance system for efficient monitoring of social distancing. Multimed. Tools Appl. 2023, 82, 34995–35019.
69. Garcia-Quilachamin, W.; Concepción, L.P.; Herrera-Tapia, J.; Salazar, R.J.; Toala-Mero, W. Validation of an Algorithm for the Detection of the Image of a Person Using Multiple Cameras. In Communications in Computer and Information Science; Springer: Cham, Switzerland, 2020; pp. 486–501.
70. Guo, Y.; Liu, Z.; Luo, H.; Pu, H.; Tan, J. Multi-person multi-camera tracking for live stream videos based on improved motion model and matching cascade. Neurocomputing 2022, 492, 561–571.
71. Wu, F.; Song, H.; Dai, Z.; Wang, W.; Li, J. Multi-camera traffic scene mosaic based on camera calibration. IET Comput. Vis. 2021, 15, 47–59.
72. López-Cifuentes, A.; Escudero-Viñolo, M.; Bescós, J.; Carballeira, P. Semantic-driven multi-camera pedestrian detection. Knowl. Inf. Syst. 2022, 64, 1211–1237.
73. Styles, O.; Guha, T.; Sanchez, V. Multi-Camera Trajectory Forecasting with Trajectory Tensors. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 8482–8491.
74. Yaghi, M.; Basmaji, T.; Salim, R.; Yousaf, J.; Zia, H.; Ghazal, M. Real-time Contact Tracing During a Pandemic using Multi-camera Video Object Tracking. In Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), Sakheer, Bahrain, 8–9 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 872–876.
75. Tseng, C.H.; Hsieh, C.C.; Jwo, D.J.; Wu, J.H.; Sheu, R.K.; Chen, L.C. Person Retrieval in Video Surveillance Using Deep Learning–Based Instance Segmentation. J. Sens. 2021, 2021, 9566628.
76. Salau, J.; Krieter, J. Instance Segmentation with Mask R-CNN Applied to Loose-Housed Dairy Cows in a Multi-Camera Setting. Animals 2020, 10, 2402.
77. Bumbálek, R.; Ufitikirezi, J.d.D.M.; Umurungi, S.N.; Zoubek, T.; Kuneš, R.; Stehlík, R.; Bartoš, P. Computer vision in precision livestock farming: Benchmarking YOLOv9, YOLOv10, YOLOv11, and YOLOv12 for individual cattle identification. Smart Agric. Technol. 2025, 12, 101208.
78. Borwarnginn, P.; Haga, J.H.; Kusakunniran, W. Water Level Detection from CCTV Cameras using a Deep Learning Approach. In Proceedings of the 2020 IEEE REGION 10 CONFERENCE (TENCON), Osaka, Japan, 16–19 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1283–1288.
79. Partanen, T.; Muller, P.; Collin, J.; Bjorklund, J. Implementation and Accuracy Evaluation of Fixed Camera-Based Object Positioning System Employing CNN-Detector. In Proceedings of the 2021 9th European Workshop on Visual Information Processing (EUVIP), Paris, France, 23–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
80. Qi, Q.; Wang, G.; Pan, Y.; Fan, H.; Li, B. MCS-Sim: A Photo-Realistic Simulator for Multi-Camera UAV Visual Perception Research. Drones 2025, 9, 656.
81. Pawako, S.; Khaewnak, N.; Kosiyanurak, A.; Wanglomklang, T.; Srisertpol, J. Optimizing Multi-Camera PnP Localization via Gaussian Process Regression for Intelligent AGV Navigation. Int. J. Intell. Eng. Syst. 2025, 18, 673–687.
82. Chebi, H. Novel greedy grid-voting algorithm for optimisation placement of multi-camera. Int. J. Sens. Netw. 2021, 35, 170.
83. Hanel, M.L.; Schonlieb, C.B. Efficient Global Optimization of Non-Differentiable, Symmetric Objectives for Multi Camera Placement. IEEE Sens. J. 2022, 22, 5278–5287.
84. Sureshms, M.S.; Menon, V.; Govindaraju, V. Adaptive Coverage Optimization for Energy-Constrained Multicamera Surveillance Networks. IEEE Sens. J. 2024, 24, 35528–35537.
85. Yang, Z.; Liu, H.; Fang, H.; Li, J.; Jiang, Y. Multi-Agent Hierarchical Reinforcement Learning for PTZ Camera Control and Visual Enhancement. Electronics 2025, 14, 3825.
86. Alqaysi, H.; Lawal, N.; Fedorov, I.; Thornberg, B.; O’Nils, M. Cost Optimized Design of Multi-Camera Dome for Volumetric Surveillance. IEEE Sens. J. 2021, 21, 3730–3737.
87. Kumari, P.; Nandyala, N.; Teja, A.K.S.; Goel, N.; Saini, M. Dynamic Scheduling of an Autonomous PTZ Camera for Effective Surveillance. In Proceedings of the 2020 IEEE 17th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), Delhi, India, 10–13 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 437–445.
88. Gonchigsumlaa, K.; Kim, Y.I.; Yeo, K.M.; Park, S.H.; Lee, Y.T. Strategic Optimal Control of Multi-Camera Trajectories for UAV Capture Using Entropy and Coverage Approaches. In Bulletin of the Polish Academy of Sciences Technical Sciences; Polish Academy of Sciences: Warsaw, Poland, 2025; p. 153828.
89. Li, J.; Xu, J.; Zhong, F.; Kong, X.; Qiao, Y.; Wang, Y. Pose-Assisted Multi-Camera Collaboration for Active Object Tracking. Proc. AAAI Conf. Artif. Intell. 2020, 34, 759–766.
90. Mendes, D.; Correia, S.; Jorge, P.; Brandão, T.; Arriaga, P.; Nunes, L. Multi-Camera Person Re-Identification Based on Trajectory Data. Appl. Sci. 2023, 13, 11578.
91. Wang, Q.; Li, W.; Liu, H.; Shan, L. A Robust Approach for Students Detection via Multi Cameras with Mask-RCNN. In Proceedings of the 2021 2nd International Conference on Computers, Information Processing and Advanced Education; ACM: New York, NY, USA, 2021; pp. 24–28.
92. Shivaprasad Yadav, S.G.; Itagi, S.; Krishna Suresh, B.V.N.V.; K.L, H.; A C, R. Human Illegal Activity Recognition Based on Deep Learning Techniques. In Proceedings of the 2023 IEEE International Conference on Integrated Circuits and Communication Systems (ICICACS), Raichur, India, 24–25 February 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–7.
93. Veesam, S.B.; Satish, A.R. Design of an Integrated Model for Video Summarization Using Multimodal Fusion and YOLO for Crime Scene Analysis. IEEE Access 2025, 13, 25008–25025.
94. Li, Z.; Xie, L.; Lü, Y.; Wang, G. Probabilistic Approach to Identifying Same Suspected Target from Multiple Cameras in Video Investigation. Forensic Sci. Technol. 2022, 47, 24–34.
95. Siddique, A.A.; Mohy-Ud-Din, Z.; Qadri, M.T. Real Time Image Encoding for Fast IOT (Internet of Things) Based Video Vigilance System. Wirel. Pers. Commun. 2020, 114, 995–1008.
96. Srinath, R.; Vrindavanam, J.; Vasudev, V.P.; Supreeth, S.; Raj, H.; Kesarwani, A. A Machine Learning Approach for Localization of Suspicious Objects using Multiple Cameras. In Proceedings of the 2020 IEEE International Conference for Innovation in Technology (INOCON), Bangluru, India, 6–8 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
97. Moosmann, J.; Mandula, J.; Li, J.; Mayer, P.; Benini, L.; Magno, M. End-to-end multicamera event-image stitching and object detection on the edge. In Proceedings of the Target and Background Signatures XI: Traditional Methods and Artificial Intelligence; SPIE: Bellingham, WA, USA, 2025; p. 19.
98. Suresh, M.S.S.; Narayanan, A.; Menon, V. Maximizing Camera Coverage in Multicamera Surveillance Networks. IEEE Sens. J. 2020, 20, 10170–10178.

Figure 1. Conceptual comparison between single-camera and multi-camera detection. (Top): single-camera scenario with a limited viewpoint and potential blind regions. (Bottom): multi-camera configuration providing complementary perspectives for improved detection and tracking robustness.

Figure 2. PRISMA flow diagram of the identification, screening, eligibility, and inclusion process.

Figure 3. Layered architecture of distributed multi-camera vision systems for smart city environments. The framework integrates acquisition, communication, distributed processing (edge, fog, and cloud), multi-camera intelligence, and application services such as real-time alerts and urban monitoring. Source: Authors’ own elaboration.

Figure 4. Simplified workflow of a multi-camera system in an Industry 4.0 environment, showing the integration of distributed processing (detection, tracking, and re-identification) with industrial applications such as robot navigation, process monitoring, and safety. Source: Authors’ own elaboration.

Table 1. Distribution of the 93 studies by application domain (2020–2025).

Application Domain	Number of Studies
Public security and biometric surveillance	21
Intelligent transportation and traffic	18
Smart cities and IoT	14
Healthcare and monitoring of vulnerable individuals	10
Precision agriculture and environmental monitoring	8
Industry and robotics	10
Mobile cameras, drones, and active perception	9
Emerging applications (retail, education, forensics)	8
Total	93

Note: The review covers 93 unique studies. The per-domain counts sum to 98 because individual studies may contribute to more than one domain; the categories are therefore not mutually exclusive.

Table 2. Summary of the 93 reviewed studies by application domain (2020–2025). Categories are non-exclusive.

Application Domain	Dominant Techniques	Primary Objective	Key Findings/Contributions	Research Trend
Public security and biometric surveillance	MTMCT, ReID, anomaly detection (CNN/MIL), spatiotemporal fusion	Cross-camera identity preservation and threat detection	Multi-view evidence improves identity consistency and reduces false alarms; distributed and edge designs reduce latency and privacy exposure	High maturity; scalable and privacy-aware MTMCT
Intelligent transportation and traffic	Detection and tracking, vehicle ReID, FoV overlap reasoning, graph models	Vehicle tracking, violation detection, traffic analytics	Robust association under night and motion-blur conditions is critical; strong trend toward cooperative edge–cloud collaboration	Strong growth; edge–cloud collaborative architectures
Smart cities and IoT	Edge/fog/cloud pipelines, orchestration, BEV projection, secure distributed frameworks	Scalable urban monitoring and infrastructure management	Edge AI reduces bandwidth consumption and latency; interoperability and governance remain open challenges	Emerging operational deployments with distributed intelligence
Healthcare and monitoring of vulnerable individuals	Multi-view fall detection, spatiotemporal deep models, privacy-aware edge processing	Emergency detection and assisted-care monitoring	Multi-camera configurations reduce blind spots and increase sensitivity; privacy constraints drive on-device inference	Experimental to pilot-stage implementations
Precision agriculture and environmental monitoring	Instance segmentation, multi-view fusion, calibration methods	Livestock tracking and environmental observation	Fusion improves large-area coverage; outdoor variability and calibration complexity remain key limitations	Growing adoption; dataset scarcity persists
Industry and robotics	Camera-placement optimization, distributed tracking, multi-view localization	Operational monitoring and robotic perception	Improved situational awareness and navigation robustness; IIoT cybersecurity is critical	Industry 4.0 integration trend
Mobile cameras, drones, and active perception	PTZ scheduling, multi-agent reinforcement learning, optimal control, hybrid fixed/mobile sensing	Adaptive coverage and persistent tracking	Mobile viewpoints compensate for coverage gaps; challenges include energy constraints and dynamic recalibration	Rapid growth; hybrid adaptive architectures
Other emerging applications (retail, education, forensics)	Trajectory analytics, multi-view summarization, detection pipelines	Behavior analysis and post-event investigation	Multi-camera systems enhance interpretation in crowded environments; ethical and regulatory constraints are significant	Early-stage but steadily expanding
