1. Introduction
In recent decades, society has witnessed the rapid advancement of information and communication technologies, which has significantly transformed safety [
1,
2], health [
3,
4], and environmental monitoring [
5]. Innovations in the Internet of Things (IoT), artificial intelligence (AI)/machine learning (ML), and data analytics have enabled more sophisticated and real-time monitoring capabilities [
6,
7]. Wearable devices, environmental sensors, and intelligent infrastructure are now commonplace, collecting vast amounts of data that can be analyzed to improve public safety and individual well-being [
8,
9].
Concurrently, the emergence of the Metaverse represents a paradigm shift in how humans interact with digital and physical environments [
10]. Extending the traditional IoT to the Internet of Physical–Virtual Things (IoPVT), the Metaverse creates a seamless integration between physical objects and their virtual counterparts. This integration allows for real-time synchronization and interaction within the Metaverse, a collective virtual shared space that merges augmented reality (AR), virtual reality (VR), and the Internet [
11,
12]. Using the capabilities of the IoPVT, it becomes possible to create dynamic digital twins (DTs) of physical environments [
13]. These digital representations can be used for advanced analytics, predictive modeling, and immersive simulations, opening new horizons in proactive safety and health monitoring [
14].
While traditional IoT monitoring systems have significantly improved our ability to collect environmental and infrastructural data in various settings [
15,
16], these systems typically function within a limited paradigm, i.e., sensing physical phenomena and transmitting data for analysis in separate, often disconnected computational environments. In contrast, our proposed IoPVT framework represents a fundamental paradigm shift through the seamless integration of physical sensing with virtual representation and interaction. Unlike the traditional IoT, which largely maintains a separation between physical and digital domains, the IoPVT creates a continuous bidirectional flow where physical reality and its digital counterpart are constantly synchronized. This integration enables passive monitoring and active intervention through virtual simulation before physical implementation. Where conventional IoT systems might provide isolated snapshots of conditions, DTs in the IoPVT allow for a holistic, contextual understanding of complex environments while facilitating predictive analytics and immersive visualization that traditional monitoring approaches cannot achieve.
Figure 1 highlights the comparison between traditional IoT and IoPVT approaches for outdoor safety monitoring. Traditional IoT systems operate with siloed components and one-way information flow, primarily providing reactive monitoring capabilities. In contrast, the IoPVT creates a seamless integration between physical and virtual environments through digital twins and immersive analytics, enabling proactive monitoring and multidimensional context awareness with continuous feedback loops.
In addition, despite technological advancements, monitoring the safety, health, and environmental conditions in outdoor environments remains challenging [
17,
18]. Traditional systems often focus on indoor settings or controlled environments, leaving a gap in monitoring capabilities in outdoor, dynamic, and complex settings. On the one hand, outdoor environments are subject to changing weather conditions, natural disasters, and varying terrain, all of which affect the reliability of sensors and monitoring equipment. On the other hand, rural and coastal areas may have limited infrastructure for data transmission and sensor deployment. In addition, the unpredictability of human behavior in public spaces adds complexity to monitoring efforts. Furthermore, collecting and integrating data from various sources, such as environmental sensors, biometric devices, and infrastructure monitoring systems, requires sophisticated data fusion techniques. These challenges are particularly pronounced in urban, rural, and coastal settings, with each presenting unique obstacles and requirements for effective safety monitoring.
Comprehensive solutions are needed to address the complexities of outdoor safety monitoring in different environments [
19]. New applications and tools enable interdisciplinary approaches while addressing ethical and social implications. Integrating humans, technology, and the environment is crucial to measuring and improving safety, biohealth, and climate resilience. By focusing on these three dimensions, it is possible to develop systems that are not only technologically advanced but also socially responsible and environmentally sustainable. The successful development of a comprehensive solution will bring many benefits to society. Real-time monitoring and predictive analytics can reduce accidents and improve emergency response, and the ability to monitor biohealth indicators can help prevent health crises and promote healthy behaviors.
Human activity recognition (HAR) focuses on identifying and analyzing human actions through sensor data [
20]. Integrating HAR with IoPVT in the Metaverse allows systems to collect data in real time, create virtual replicas of physical environments, and allow for timely interventions and informed decision making to prevent accidents and emergencies [
21]. This integration promises to revolutionize safety monitoring by providing a comprehensive, interactive, and predictive approach that adapts to the specific needs of different environments.
This paper aims to explore and articulate a future in which outdoor safety monitoring is significantly enhanced through the integration of HAR with IoPVT in the Metaverse. The proposed system addresses various challenges and illustrates the versatility of the physical–virtual intertwined approach. We focus on the interaction between humans, technology, and the environment to create a holistic monitoring system that leverages edge-based sensor data fusion, fog computing, and cloud analytics to enable real-time data processing and decision-making [
22]. By presenting a cohesive vision, the paper aims to contribute to the advancement of outdoor safety monitoring technologies and establish a roadmap for future research and development efforts.
Specifically, this paper will make the following contributions:
Present a Conceptual HARISM Framework: This vision paper outlines a framework for HAR with IoPVT for safety monitoring (HARISM), which focuses on outdoor safety applications, integrating HAR with IoPVT and Metaverse to proactively safeguard public spaces across different environments.
Explore Technological Innovations: A thorough discussion is presented on the advancements in sensor technologies, AI-driven HAR techniques, and computing architectures like edge, fog, and cloud computing.
Examine Societal Benefits: The paper highlights the potential for proactive health monitoring, enhanced emergency response, and contributions to smart city and community initiatives.
Address Challenges and Research Directions: This paper identifies technical, ethical, and policy-related challenges, emphasizing the need for interdisciplinary collaboration.
The remainder of this paper is structured as follows. Section 2 gives an overview of the conceptual framework and the enabling technologies for outdoor monitoring systems enabled with the IoPVT. Section 3 thoroughly discusses HAR, the core technology of the envisioned paradigm. Section 4 emphasizes the social benefits and impacts. Section 5 illustrates the applications in urban, rural, and coastal cities as case studies. Section 6 indicates the challenges and opportunities and highlights critical research directions in the near future. Finally, Section 7 concludes the paper with some final thoughts.
2. The Vision of IoPVT-Enabled Outdoor Health Monitoring
In this paper, we envision a framework for HAR within the IoPVT for safety monitoring (HARISM), involving an integration of immersive virtual environments, sensor technology, and human-centered design to realize a vision of IoPVT-enabled outdoor safety monitoring. Our vision is that by integrating the HAR with IoPVT systems within the Metaverse, data-driven insights can actively drive immediate or preventive actions in the larger community to strengthen public health, safety, and climate resilience. With the envisioned physical and digital ecosystem, where sensors in physical environments interact with their virtual twins, such systems will enable rich and continuous monitoring of urban, rural, and coastal areas in context. Instead of reacting to accidents or dangerous situations, the system would preemptively identify and minimize hazards before they become real tragedies, leading to safer neighborhoods, more agile responses by first responders, and data-driven development in urban space. The HARISM vision goes beyond traditional monitoring approaches, seeking to establish an integrated ecosystem of technologies and actors that collaboratively contribute to creating more flexible, equitable, and sustainable environments.
2.1. Conceptual Framework
Figure 2 conceptually presents the HARISM framework for outdoor safety monitoring enabled by the IoPVT. The HARISM framework centers on creating a dynamic interplay between the physical environment, its virtual representations, and advanced analytics tailored to human activities and environmental conditions. At its core, the HARISM framework connects three key dimensions—humans, technology, and the environment—through continuous data exchange and interactive feedback loops.
The Physical Layer is the foundation of the entire framework. It is a distributed network of heterogeneous sensors and devices embedded across various outdoor settings, including urban centers, rural landscapes, and coastal areas. These sensors capture a wide spectrum of data: from human motion and physiological signals (e.g., wearable biosensors, camera-based activity recognition, etc.) to environmental conditions (e.g., air quality, temperature, humidity, etc.), infrastructure integrity (e.g., structural sensors on staircases or buildings), and climate indicators (e.g., tide levels, wind speeds, precipitation, etc.). Edge computing nodes process this incoming data locally, enabling immediate responses and reducing latency.
The Virtual Layer is where the Metaverse integration is built. The virtual counterparts of these physical objects, digital twins (DTs), reside in the Metaverse. In this immersive and interactive environment, sensor data are continuously synchronized with their virtual representation. The Metaverse visualizes real-time conditions and simulates potential scenarios, allowing stakeholders to experiment with interventions, preventive measures, or changes in infrastructure design [
12]. As DTs evolve, they incorporate predictive models that project future risks, population flows, and environmental changes, offering a forward-looking perspective on safety and health [
23].
The HARISM framework relies on a robust data fusion and analytics engine that synthesizes data streams from diverse sources. Advanced ML and AI-driven HAR algorithms interpret human activities, detect anomalies (such as sudden falls or suspicious behaviors), and classify complex patterns in real time [
24]. Additional predictive analytics models integrate environmental and infrastructure data, identify emerging hazards (e.g., ice formation on stairs, degraded air quality near industrial zones, etc.), and recommend proactive interventions. With the capacity to learn from historical trends and adapt to changing conditions, the analytics engine ensures that the system becomes increasingly accurate and context-aware over time [
25].
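As an illustration of the anomaly detection step, the following is a minimal sketch of a threshold-based fall detector. It assumes a tri-axial accelerometer stream expressed in units of g. The function name `detect_fall` and all threshold values are hypothetical choices made for this example, not part of the HARISM framework, and a deployed system would more likely use a learned model.

```python
import math


def detect_fall(samples, free_fall_g=0.4, impact_g=2.5, max_gap=10):
    """Return the index of the impact sample if a free-fall dip is followed
    by an impact spike within max_gap samples; otherwise return None.

    samples: iterable of (x, y, z) accelerations in g.
    The thresholds are illustrative assumptions, not calibrated values.
    """
    last_dip = None
    for i, (x, y, z) in enumerate(samples):
        magnitude = math.sqrt(x * x + y * y + z * z)
        if magnitude < free_fall_g:
            # Near-weightlessness suggests the body is falling.
            last_dip = i
        elif (magnitude > impact_g and last_dip is not None
              and i - last_dip <= max_gap):
            # A sharp spike shortly after the dip suggests an impact.
            return i
    return None
```

A spike that is not preceded by a dip (for example, a sensor being tapped) is ignored by this rule, which is one simple way to reduce false alarms.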
A critical aspect of the conceptual HARISM framework is the integration of feedback loops. The alerts, insights, and recommendations generated by the analytics engine are sent to multiple stakeholders: emergency responders, city planners, public health authorities, environmental agencies, and community members [
26,
27]. This bidirectional communication enables swift action, such as dispatching emergency services, issuing public safety advisories, updating evacuation routes, or adjusting resource allocation [
28]. Concurrently, stakeholders can contribute their expertise and context-specific knowledge to the system, refining predictive models, optimizing sensor placement, and informing policy decisions.
To fully realize the HARISM vision, scalability and interoperability are among the essential concerns [
14,
29]. Employing standardized communication protocols, open data formats, and modular architectures ensures that new sensor types, analytics tools, and visualization techniques can be integrated seamlessly as they emerge. Equally important is the ethical dimension, where privacy, data governance, and equitable access must be considered [
30,
31]. Mechanisms for anonymizing sensitive data, secure encryption, and transparent user consent processes help maintain public trust. At the same time, equitable infrastructural investments ensure that well-resourced urban centers and underrepresented rural or coastal communities benefit equally.
In summary, the conceptual HARISM framework envisions an ecosystem where physical sensors, virtual representations, and advanced analytics converge to produce actionable intelligence. HARISM lays the foundation for a transformative approach to outdoor safety monitoring, enabling continuous adaptation, risk mitigation, and collaborative decision making that ultimately improve the well-being and resilience of human communities.
2.2. Three-Dimensional Integration: Humans, Technology, and Environment
A fundamental principle underlying this vision for IoPVT-enabled outdoor safety monitoring is the understanding that humans, technology, and the environment constitute an interdependent triad. To effectively monitor health and safety outside the four walls of buildings, we need to consider how these three pillars are connected, how they each shape the other and, ultimately, how they form the environment in which residents thrive or not.
Human individuals and communities whose well-being and safety the system aims to protect and enhance are the core of this three-dimensional integration. HAR serves as a critical element, focusing on identifying and interpreting movement patterns, behavior, and physiological signals [
32]. By continuously monitoring individuals’ activities, whether walking up an icy staircase in Montreal, commuting to work along rural roads, or navigating a coastal city’s rising sea levels, the system gains insights into people’s needs, vulnerabilities, and responses to their surroundings. In the long run, human-focused data lead to faster, context-dependent action, greater personalization, and more just safety policies that respect individual privacy and cultural contexts.
The technological layer refers to the tools, platforms, and analytic capabilities that enable proactive outdoor safety monitoring. This includes not only the physical sensors and devices deployed on the ground, such as environmental monitors, wearable health sensors, and structural integrity gauges, but also the underlying computing backbone [
33,
34]. Edge nodes, fog computing platforms, and cloud-based analytics engines work together to aggregate data, process them efficiently, and transform raw inputs into actionable intelligence. AI-driven algorithms mine this integrated dataset for patterns and anomalies, and the Metaverse environment is used to visualize and simulate potential interventions [
35]. This technology enablement layer should be designed to evolve with advances in hardware, newer AI techniques, and better communication networks so that it becomes more scalable, reliable, and resilient.
The environment, which includes both built and natural surroundings, is complex due to the many factors that influence human health and safety [
19,
36]. Urban landscapes might include complex infrastructure and seasonal risks. At the same time, rural communities can grapple with localized industrial or agricultural issues, and coastal areas have to deal with the impacts of climate change, such as flooding and heavy storms. The system embeds sensors in different terrains to track environmental conditions such as air quality, temperature, wind, precipitation, and structural integrity, and it models these dynamic factors in the Metaverse to build a holistic awareness of environmental stressors [
37]. This perspective guides risk evaluations and strengthens the formulation of context-sensitive risk reduction measures, ensuring that interventions are appropriately tailored to regional ecological and infrastructural contexts.
Adaptability and improvement rely on continuous feedback loops across these three dimensions of integrated solutions [
38]. Understanding human behavior helps decide where and how sensors should be placed and which predictive models to fine-tune. Environmental changes that stress the technological infrastructure require adjustments to sensor placement, data fusion techniques, and analytical approaches. Technology, in turn, improves the quality, timeliness, and usefulness of data, allowing systems to meet human needs more effectively and adapt to changing environmental conditions. Through this cyclical reinforcement, the three elements remain in equilibrium, with each one informing and refining the others.
Framing outdoor safety monitoring as a synergy of humans, technology, and the environment lays the groundwork for a responsive, context-aware ecosystem. Rather than treating data in isolation, the system acknowledges the multifaceted interplay among these dimensions, leading to more robust, inclusive, and future-proof solutions to improve public safety and health in the outdoor world.
2.3. Technological Innovations Enabling the Vision
The realization of IoPVT-based outdoor safety monitoring is driven by multiple technological advances, and by their convergence, which together improve data collection, processing, modeling, visualization, and decision making. These innovations span a broad continuum, from state-of-the-art sensors and next-generation networking infrastructures to sophisticated analytics, immersive virtual platforms, and intelligent automation. By strategically combining these elements, society moves closer to an environment where proactive, data-driven approaches to health and safety become the norm rather than the exception. The key innovations are described as follows:
Advanced Sensor Technologies: The first pillar of the technological foundation lies in the continuous evolution of sensor hardware. Miniaturized, energy-efficient, and ever-more-affordable sensors are now able to capture a wide range of signals, from human biometrics and structural integrity indicators to environmental parameters like temperature, humidity, particulate matter, and noise levels [
39,
40]. Thanks to innovations in materials science and nanotechnology, there are now sensors that can withstand the rigors of outdoor environments—freezing winters, salt-laden air on the coast, or dusty countryside roads—allowing for reliable and longer-lasting deployments [
41,
42]. Improved sensor precision and new sensing modalities ultimately deepen the system’s overall situational awareness, enabling the detection of subtle changes that may eventually lead to safety hazards.
Ubiquitous Connectivity and Networking: The second critical layer for efficiently transporting collected sensing data is robust communication networks. The proliferation of 5G/6G networks, low-power wide-area networks (LPWANs), and new IoT protocols provides scalable and low-latency connectivity even in challenging terrain [
43,
44]. Such ubiquitous connectivity ensures that data flow quickly from edge devices to processing nodes, allowing for real-time responses. In the rural and coastal contexts, satellite communications, mesh networks, and specialized relay nodes can extend coverage and maintain constant monitoring despite geographical and infrastructural challenges.
Edge, Fog, and Cloud Computing Architectures: Orchestrating an equilibrium of edge, fog, and cloud computing resources is at the heart of effective data processing [
45,
46]. Some sensor data filtering and aggregation are done locally in the edge devices, allowing for immediate response to urgent local events [
47]. Fog nodes near the network edge that serve as intermediaries between sensors and the cloud can do complex analytics, cache essential data, and lower backhaul costs [
48,
49]. Cloud platforms offer immense computational power for historical trend analysis, long-term storage, and global optimization. With a multitiered architecture, the data flow can be optimized, allowing for faster decisions and providing a scalable, resilient system that accommodates changing requirements.
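The division of labor among the three tiers can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: the names `edge_process`, `fog_aggregate`, and `CloudStore`, the batch size, and the alert threshold are all hypothetical, and a real deployment would distribute these steps across networked devices.

```python
from statistics import mean

URGENT_THRESHOLD = 100.0  # hypothetical alert level for a generic sensor reading


def edge_process(readings, threshold=URGENT_THRESHOLD):
    """Edge tier: discard invalid readings and flag urgent ones locally."""
    valid = [r for r in readings if r is not None and r >= 0]
    alerts = [r for r in valid if r > threshold]
    return valid, alerts


def fog_aggregate(valid, batch_size=4):
    """Fog tier: summarize batches so that less data crosses the backhaul."""
    summaries = []
    for start in range(0, len(valid), batch_size):
        batch = valid[start:start + batch_size]
        summaries.append({"mean": mean(batch), "max": max(batch), "count": len(batch)})
    return summaries


class CloudStore:
    """Cloud tier: keep long-term history for trend analysis."""

    def __init__(self):
        self.history = []

    def ingest(self, summaries):
        self.history.extend(summaries)

    def long_term_mean(self):
        # Weight each batch mean by its sample count.
        total = sum(s["mean"] * s["count"] for s in self.history)
        samples = sum(s["count"] for s in self.history)
        return total / samples if samples else None
```

In this sketch the urgent reading is detected at the edge without waiting for the cloud, while the cloud only ever receives compact batch summaries.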
AI-Driven Analytics and Predictive Modeling: AI lies at the core of the decision-making engine. ML algorithms analyze multimodal data, recognize intricate activity patterns through human activity recognition [
50], and discover anomalies that signal safety threats, whether slippery surfaces on city staircases or degrading air quality in rural industrial areas [
51]. These algorithms learn from historical and user feedback data, so their predictions become increasingly accurate over time, allowing for proactive interventions. Predictive modeling tools like digital twins and scenario simulations in the Metaverse enable stakeholders to visualize potential future states, test response strategies, and fine-tune policies well before an emergency arises [
52,
53].
Metaverse Integration and Immersive Visualization: The fusion of physical and virtual worlds through the Metaverse fundamentally changes how data are understood and acted upon [
54,
55]. Rather than combing through static graphs or disparate data streams, stakeholders build interactive, three-dimensional environments that mirror real-time conditions. Weather patterns, human movement, infrastructure stressors, and environmental quality come together in a shared virtual space. Decision makers can “walk” through digital twins of neighborhoods, identify high-risk zones, and simulate interventions, such as rerouting pedestrian traffic, adjusting energy usage, or optimizing evacuation plans, gaining insights that would be hard to discern from traditional analytics alone [
56].
Automation, Control, and Feedback Mechanisms: As the system matures, some time-sensitive tasks will be carried out by automated mechanisms. Intelligent signage could route pedestrians around hazardous areas, autonomous drones could inspect infrastructure before it fails, and remotely operated environmental controls could help mitigate hazards before they escalate. Importantly, an increasing degree of automation does not marginalize human judgment. On the contrary, it augments and enhances human decision-making capabilities [
57,
58]. Continuous feedback loops supported by bidirectional communication channels provide ongoing responsiveness to community input, expert advice, and governance principles and ensure that accountability and transparency are upheld [
59].
In essence, the technology that makes this vision possible is a stacked, synergistic ecosystem. The sensor and network layer, the compute and AI-driven analytics layer, and the immersive interface layer each provide distinct capabilities. Together, these tools offer an opportunity to move beyond reactive, fragmented approaches toward proactive, integrated strategies that protect human health, build ecological resilience, and reimagine how we live, work, and move outdoors.
3. Human Activity Recognition Techniques
Human activity recognition (HAR) techniques have evolved significantly over the past decades, starting with manual observation and advancing to sophisticated vision-based and sensor-based systems. Early efforts relied on vision-based systems and later on wearable sensors to capture motion data, which were analyzed using basic machine learning algorithms [
60]. With advances in sensing technologies, computational power, and the emergence of deep learning, HAR now incorporates data from multimodal sources, enabling highly accurate and real-time activity recognition [
61]. These techniques have had transformative impacts in various domains: in defense, HAR is vital for surveillance, threat detection, and situational awareness [
62]; in industry, it enhances workplace safety, productivity monitoring, and automation [
63]; and in healthcare, it improves patient monitoring and facilitates smart home technologies [
64]. The continuous advancement of HAR is driven by the growing need for systems that can operate robustly in complex, dynamic, and real-world environments, addressing challenges like missing data, unreliable inputs, and diverse human behaviors. This drive is further fueled by the potential for HAR to revolutionize emerging areas such as human–robot interaction, augmented reality, and personal well-being.
3.1. Sensing Techniques
Accurate and reliable HAR is based on high-quality data, which form the foundation for feature extraction and the development of machine learning models. This section provides an overview of the sensing techniques commonly used for HAR.
3.1.1. Data Sources and Sensing Modalities
HAR data can be collected from various sources, including wearable sensors, environmental monitoring stations, mobile devices, smart appliances, and IoT devices. Common wearable sensors include accelerometers, gyroscopes, magnetometers, and physiological sensors. These sensors are compact, affordable, and suitable for real-time HAR, though they may cause discomfort if worn for extended periods. Environmental sensors, or ambient sensors, are placed in the surroundings to capture activity indirectly. These sensors include cameras, infrared sensors, pressure sensors, passive radio frequency sensors, and microphones. While environmental sensors are nonintrusive, they may raise privacy concerns, especially with visual and audio data. Smartphones and other mobile devices are rich sources of data for HAR. Examples include inertial measurement unit (IMU) sensors, such as accelerometers, gyroscopes, and magnetometers, which are widely integrated into smartphones. GPS data can contextualize activities like walking, running, or commuting. Microphone data can capture audio cues such as footsteps, voices, or environmental sounds. Touchscreen data can reveal interaction patterns, such as swiping or typing behavior, from which activities can be inferred. Barometer data capture changes in altitude during activities like climbing a mountain.
Devices designed for gaming or virtual reality provide detailed motion data. Motion controllers can track hand gestures and physical interactions, while head-mounted displays (HMDs) provide head movement and gaze-tracking data. Treadmills and simulators can capture data on walking, running, or driving movements. User-generated data from social media or apps can complement HAR. Check-ins and location tags indicate activities like visiting a gym or restaurant. Activity logs from apps such as fitness trackers record specific actions like jogging or cycling. Data from connected devices in a smart ecosystem also contribute to HAR. Fitness equipment tracks activities on treadmills, cycles, or rowing machines. Smart TVs detect viewing patterns and associated user behaviors. Kitchen appliances’ usage patterns can indicate activities like cooking or eating. These diverse data sources can be used individually or in combination, depending on the requirements of the HAR application. Multimodal approaches are particularly powerful in improving accuracy and robustness.
3.1.2. Data Preprocessing and Feature Extraction
The raw data collected from various sensor modalities are often noisy and high dimensional, necessitating robust preprocessing and feature extraction techniques before being applied for training machine learning algorithms and inference. This section discusses the critical steps and techniques in data preprocessing and feature extraction for HAR, considering the diverse sensor modalities involved.
Data preprocessing is the first and essential step in HAR pipelines, and it aims to prepare raw data for feature extraction and classification. Due to data heterogeneity, the major challenges include the varying sampling rates, formats, and noise characteristics associated with different sensors. The main preprocessing tasks include data cleaning, synchronization, segmentation, and transformation. Data cleaning addresses noise, missing values, and outliers that often exist in sensor data. Techniques vary depending on the sensor modality. For wearable sensors, signal noise caused by movement artifacts can be reduced by filters such as Butterworth or Savitzky–Golay. Missing data can be imputed using interpolation or predictive models. For environmental sensors, redundant or conflicting signals caused by overlapping sensor ranges can be removed using statistical methods or clustering. GPS signals collected by mobile devices are commonly smoothed with Kalman filters to account for location inaccuracies. Low-pass filters are often applied to address jitter in motion capture data from gaming devices. For social media data, incomplete posts or erroneous timestamps can be handled by natural language processing (NLP) techniques.
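Two of the cleaning steps mentioned above, interpolation of missing samples and Kalman filtering of a noisy one-dimensional measurement, can be sketched as follows. This is a minimal illustration that assumes missing samples are marked as `None` and that the underlying quantity follows a random-walk model. The function names and the variance parameters are hypothetical, and a Butterworth or Savitzky–Golay filter would normally come from a signal processing library rather than being written by hand.

```python
def interpolate_missing(values):
    """Fill None gaps by linear interpolation between the nearest valid
    samples; gaps at either end copy the nearest valid value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no valid samples to interpolate from")
    filled = list(values)
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            filled[i] = values[a] + t * (values[b] - values[a])
    return filled


def kalman_1d(measurements, process_var=1e-3, meas_var=0.5):
    """Scalar Kalman filter with a random-walk state model.
    process_var and meas_var are illustrative assumptions."""
    estimate, error = measurements[0], 1.0
    smoothed = [estimate]
    for z in measurements[1:]:
        error += process_var                 # predict: uncertainty grows
        gain = error / (error + meas_var)    # weight given to the new measurement
        estimate += gain * (z - estimate)    # update the state estimate
        error *= (1.0 - gain)                # update the uncertainty
        smoothed.append(estimate)
    return smoothed
```

A typical use would be to call `interpolate_missing` first and then pass the gap-free series to `kalman_1d`, for instance on one coordinate of a GPS track.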
Data synchronization and segmentation can ensure that data from different sensors align temporally and divide continuous data streams into smaller, meaningful segments. For example, timestamps can be used for the synchronization of wearable and environmental sensors. Cross-correlation can identify and correct time lags. For data segmentation, the sliding window is a common approach suitable for continuous and repetitive activities like walking or running. Event-based segmentation, which detects specific events, such as phone calls in mobile data or hand movements in VR systems, is particularly helpful in scenarios where human activities are triggered by discrete events or involve task-specific actions. Finally, data transformation, such as normalization and standardization, prepares the data for feature extraction.
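The synchronization, segmentation, and transformation steps described above can be sketched in a few lines. This is a minimal illustration that assumes two equally sampled signals; the function names, window size, and step are hypothetical choices for the example.

```python
from statistics import mean, pstdev


def estimate_lag(reference, other, max_lag):
    """Return the shift k (in samples) that maximizes the cross-correlation
    sum(reference[i] * other[i + k]); a positive k means `other` is delayed."""
    def score(k):
        return sum(
            reference[i] * other[i + k]
            for i in range(len(reference))
            if 0 <= i + k < len(other)
        )
    return max(range(-max_lag, max_lag + 1), key=score)


def sliding_windows(signal, size, step):
    """Split a stream into fixed-size windows; step < size gives overlap."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def zscore(window):
    """Standardize a window to zero mean and unit variance."""
    mu, sigma = mean(window), pstdev(window)
    if sigma == 0:
        return [0.0 for _ in window]
    return [(x - mu) / sigma for x in window]
```

For example, a window of 4 samples with a step of 2 yields 50% overlap between consecutive segments, a common choice for repetitive activities such as walking.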
Feature extraction transforms preprocessed data into representative attributes that capture the essence of human activities. Features can be broadly categorized into time domain, frequency domain, and domain-specific features. Time domain features are computed directly from the signal samples and are straightforward to interpret. Common time domain features include statistical measures, such as mean, median, variance, and standard deviation; the signal magnitude area (SMA), which captures the overall activity level; the zero-crossing rate, which is useful for cyclic activities; peak intensity, such as steps in walking data; and autocorrelation, which measures the similarity between signal segments at different time lags to detect periodic patterns.
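Several of the time domain features listed above can be computed with a few lines of code. The sketch below assumes each axis is given as a plain list of samples within one window; the function names are chosen for this example, and the zero-crossing rate is computed on the mean-removed signal, which is one common convention among several.

```python
from statistics import mean


def signal_magnitude_area(x, y, z):
    """SMA: average of |x| + |y| + |z| over the window."""
    return sum(abs(a) + abs(b) + abs(c) for a, b, c in zip(x, y, z)) / len(x)


def zero_crossing_rate(signal):
    """Fraction of consecutive sample pairs whose mean-removed values
    have opposite signs."""
    mu = mean(signal)
    centered = [s - mu for s in signal]
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return crossings / (len(signal) - 1)


def autocorrelation(signal, lag):
    """Normalized autocorrelation at the given lag (1.0 at lag 0)."""
    mu = mean(signal)
    centered = [s - mu for s in signal]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0
    num = sum(centered[i] * centered[i + lag] for i in range(len(signal) - lag))
    return num / denom
```

A strongly periodic signal, such as the vertical acceleration during steady walking, shows a high autocorrelation at the lag corresponding to one step period.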
Frequency domain features are often derived using fast Fourier transform (FFT) or wavelet transform. These features are crucial for recognizing periodic and oscillatory activities. The power spectral density (PSD), dominant frequency, and spectral entropy are often calculated for repetitive activities, while wavelet coefficients are useful for transient activities.
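The frequency domain features named above can be illustrated with a small, dependency-free sketch. It uses a direct discrete Fourier transform in place of an FFT (same result, but quadratic cost, so suitable only for short windows) and an unnormalized periodogram as the power spectrum estimate. The function names and the sampling rate in the usage note are assumptions made for this example.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """Return (frequencies in Hz, power) for the positive-frequency bins,
    using a direct DFT of the mean-removed signal."""
    n = len(signal)
    mu = sum(signal) / n
    centered = [s - mu for s in signal]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def dominant_frequency(freqs, power):
    """Frequency of the bin with the highest power."""
    return freqs[max(range(len(power)), key=power.__getitem__)]


def spectral_entropy(power):
    """Shannon entropy (bits) of the normalized power distribution;
    low for a single dominant tone, high for broadband signals."""
    total = sum(power)
    if total == 0:
        return 0.0
    probs = [p / total for p in power]
    return -sum(q * math.log2(q) for q in probs if q > 0)
```

For a window sampled at 50 Hz that contains a 2 Hz oscillation, roughly the cadence of brisk walking, `dominant_frequency` should return a value close to 2 Hz and the spectral entropy should be close to zero.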
Machine learning models benefit from feature-rich datasets, emphasizing domain-specific feature extraction. Domain-specific features leverage knowledge of the application or sensor modality to extract meaningful attributes. For wearable sensors, gait parameters, such as stride length, cadence, or joint angles using inertial data, are often extracted. The occupancy of the room or the activity context can be inferred from environmental sensor features, such as temperature, light, and humidity patterns. Combining GPS data with accelerometer readings on mobile devices can help to identify walking, driving, or stationary states. For gaming and VR devices, pose and motion trajectories can be extracted from 3D position data to recognize gestures or interactions. Analyzing text features, such as word frequency and sentiment, or posting patterns on social media can correlate online behavior with physical activities.
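As one example of a domain-specific feature, the combination of GPS and accelerometer data mentioned above can be sketched as a simple rule. The snippet assumes two GPS fixes given as (latitude, longitude, time in seconds) and a window of accelerometer magnitudes in g. The function names and all thresholds are hypothetical values chosen for illustration, not calibrated parameters.

```python
import math
from statistics import pvariance


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in degrees."""
    radius = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def classify_motion(fix_a, fix_b, accel_magnitudes,
                    drive_speed=7.0, walk_speed=0.5, walk_var=0.05):
    """Label a window as 'driving', 'walking', or 'stationary' from GPS
    speed (m/s) and accelerometer variance. Thresholds are assumptions."""
    dt = fix_b[2] - fix_a[2]
    if dt <= 0:
        raise ValueError("GPS fixes must be in increasing time order")
    speed = haversine_m(fix_a[0], fix_a[1], fix_b[0], fix_b[1]) / dt
    variance = pvariance(accel_magnitudes)
    if speed > drive_speed:
        return "driving"
    if speed > walk_speed and variance > walk_var:
        return "walking"
    return "stationary"
```

The accelerometer variance helps separate walking from slow GPS drift: a device lying on a table may show small apparent movement in its GPS fixes, but its acceleration stays nearly constant.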
3.2. AI-Driven HAR Algorithms
Advancements in artificial intelligence (AI) have revolutionized HAR systems, enhancing their accuracy and adaptability. The ability to operate in real-time and adapt to diverse environments makes AI-driven HAR algorithms indispensable for a wide range of user-centric technologies. One of their primary strengths is the ability to learn complex patterns from high-dimensional data. Deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have significantly advanced HAR by automating the feature extraction process [
65]. CNNs excel in capturing spatial features from data, such as video frames or sensor signals, while RNNs are well-suited for sequential data, such as time series signals from wearable devices. These models not only improve recognition accuracy but also adapt to a diverse set of activities without extensive manual intervention.
In addition to supervised learning, unsupervised and semi-supervised learning methods are gaining traction in HAR. These approaches are particularly valuable for scenarios where labeled data are limited or difficult to obtain. Techniques like clustering and generative models can uncover hidden patterns in unlabeled data, enabling robust activity classification with minimal human input. Furthermore, advancements in transfer learning allow HAR systems to leverage pretrained models, reducing the computational cost and time required for deployment.
3.2.1. Traditional Machine Learning Models for HAR
Traditional machine learning models were among the first AI-driven approaches to be applied to HAR. These models typically require hand-crafted features extracted from raw sensor data. Sensors like accelerometers and gyroscopes are typically used to capture motion and orientation data, which are then preprocessed and transformed into features such as mean, variance, and frequency domain components. Decision trees, support vector machines, and random forests have been widely used for HAR.
Decision trees are interpretable models that partition feature space into regions based on a series of hierarchical splits. Studies such as [
66] demonstrated their effectiveness in recognizing basic activities such as sitting, standing, and walking. Support vector machines (SVMs) have been widely used for HAR due to their ability to handle high-dimensional data and nonlinear relationships. Kernel functions play a critical role in enhancing their performance in complex activity datasets. For example, Laxmi et al. proposed a fuzzy proximal kernel SVM as a robust classifier, which transforms samples into a higher dimensional space and assigns their membership degree to reduce the effect of noise and outliers [
67]. The study focuses on recognizing walking, sitting, standing, lying down, and transitioning between these states. Random forest is a popular ensemble learning method used for HAR due to its robustness and ability to handle large datasets with high dimensionality. It works by constructing multiple decision trees during training and outputting the mode of the classes or mean prediction of the individual trees. In [
68], Wang et al. investigated the effectiveness of a random forest machine learning algorithm to identify complex human activities using data collected from wearable sensors, demonstrating its potential for applications such as health care monitoring and activity tracking.
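The bagging-and-voting idea behind random forests can be illustrated with a deliberately reduced sketch. It replaces full decision trees with one-split "stumps", each trained on a bootstrap sample and a randomly chosen feature, and predicts the mode of their votes. This is a simplification for illustration and not the method used in the cited studies; all names and the toy feature vectors are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]
Stump = Tuple[int, float, str, str]  # (feature index, threshold, label if <=, label if >)


def majority(labels: Sequence[str]) -> str:
    """Most frequent label (the mode)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(data: List[Sample], feature: int) -> Stump:
    """Choose the midpoint threshold on one feature that misclassifies the fewest samples."""
    values = sorted({x[feature] for x, _ in data})
    fallback = majority([y for _, y in data])
    best_errors, best = len(data) + 1, (feature, values[0], fallback, fallback)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in data if x[feature] <= t]
        right = [y for x, y in data if x[feature] > t]
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if errors < best_errors:
            best_errors, best = errors, (feature, t, l_lab, r_lab)
    return best


def fit_forest(data: List[Sample], n_trees: int = 15, seed: int = 0) -> List[Stump]:
    """Train each stump on a bootstrap sample and one randomly selected feature."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(data) for _ in data]
        forest.append(fit_stump(bootstrap, rng.randrange(n_features)))
    return forest


def predict(forest: List[Stump], x: Sequence[float]) -> str:
    """Output the mode of the individual stump predictions."""
    votes = [l_lab if x[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return majority(votes)


# Hypothetical features per window: (acceleration variance, dominant frequency in Hz).
TRAINING_DATA: List[Sample] = [
    ([0.02, 0.1], "sitting"), ([0.03, 0.2], "sitting"),
    ([0.01, 0.0], "sitting"), ([0.04, 0.1], "sitting"),
    ([0.90, 1.8], "walking"), ([1.10, 2.0], "walking"),
    ([0.80, 1.9], "walking"), ([1.00, 2.1], "walking"),
]
```

In practice, each tree would be grown to multiple levels and would consider a random subset of features at every split; an odd number of trees is used here to avoid tied votes in the two-class case.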
3.2.2. Deep Learning Approaches
While traditional models have proven effective for simple tasks, their reliance on feature engineering limits their scalability and generalizability to diverse activities and sensor modalities. Deep learning techniques are known for their ability to extract high-level features directly from raw data. Neural networks, particularly CNNs and RNNs, excel in capturing spatial and temporal patterns in activity data and have become the backbone of modern HAR systems. In [
69], Islam et al. discussed the performances, strengths, and hyperparameters of CNN architectures for HAR along with data sources and challenges. The study in [
70] evaluated a pretrained CNN feature extractor for HAR using an inertial measurement unit (IMU) and audio data. While CNNs are particularly effective in capturing spatial dependencies in inertial data, RNNs and long short-term memory networks (LSTMs) are widely used for modeling temporal sequences in human motion signals [
71]. Pienaar et al. [
72] demonstrated an LSTM–RNN model for HAR on the WISDM dataset [
73]. The dataset includes activities such as jogging, walking, sitting, standing, ascending stairs, and descending stairs captured using accelerometer data on smartphones. The advent of multimodal sensor fusion has further amplified the potential of these techniques by integrating data from diverse sources, such as accelerometers, gyroscopes, and cameras, to improve recognition accuracy.
Multimodal sensor fusion, enabled by deep learning, plays a pivotal role in addressing the limitations of single-modality systems, such as sensor noise and modality-specific failures. Techniques like attention mechanisms and graph neural networks (GNNs) are increasingly employed to model the relationships and complementarity between modalities. For example, attention-based architectures selectively emphasize informative signals across modalities, thus mitigating the impact of unreliable or missing data. Ma et al. proposed AttnSense [
74], a neural network model incorporating a multi-level attention mechanism. The study utilizes data from accelerometers and gyroscopes to identify activities, including running, walking, and sitting. The model’s combination of a CNN and a gated recurrent unit (GRU) network captures dependencies in the spatial and temporal domains, leading to improved performance. GNNs, by contrast, provide a framework for representing multimodal data as a graph structure, capturing complex interdependencies between sensors. Yan et al. proposed a graph-inspired deep learning approach to HAR tasks [
75]. The GNN framework shows good meta-learning ability and transferability for activities like walking, running, and transitional movements.
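To make the idea of attention-based fusion concrete, the sketch below weights per-modality feature vectors with generic scaled dot-product attention and sums them. It is not the AttnSense architecture: the query vector would normally be learned, whereas here it is a fixed, hypothetical input, and a missing modality is handled simply by leaving it out of the dictionary.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features: Dict[str, Sequence[float]],
                   query: Sequence[float]) -> Tuple[List[float], Dict[str, float]]:
    """Fuse modality vectors into one vector using attention weights.

    features: modality name -> feature vector (all of the same length as query).
    Returns the fused vector and the weight assigned to each modality.
    """
    names = list(features)
    dim = len(query)
    # Relevance of each modality: scaled dot product with the query.
    scores = [sum(q * v for q, v in zip(query, features[n])) / math.sqrt(dim) for n in names]
    weights = softmax(scores)
    fused = [sum(w * features[n][i] for w, n in zip(weights, names)) for i in range(dim)]
    return fused, dict(zip(names, weights))
```

Because the weights always sum to one over the modalities that are present, dropping an unreliable sensor redistributes its weight to the remaining ones rather than leaving a gap in the fused representation.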
3.2.3. IoPVT-Based HAR
Emerging techniques in human activity recognition (HAR) are increasingly leveraging the Internet of Physical–Virtual Things (IoPVT) to enhance the accuracy and efficiency of activity detection [
76,
77]. In IoPVT-based HAR systems, the data collected by physical sensors are transmitted to virtual platforms, where advanced algorithms and machine learning models process and analyze the information to recognize and classify different activities. In addition to the aforementioned deep learning models, IoPVT-based HAR systems rely on data synthesis techniques and transfer learning. Data synthesis combines data from physical sensors with virtual data sources, such as simulated sensor outputs or data extracted from videos, to create comprehensive datasets for training HAR models. Transfer learning then applies the knowledge gained from virtual environments to real-world scenarios, enabling HAR systems to generalize better across different contexts and reducing the need for extensive real-world data collection.
Studies have demonstrated that the fusion of wearable sensors with environmental sensors and virtual sensor data, e.g., from videos, simulations, or digital twins, enhances the robustness of HAR systems, particularly in real-world settings with diverse activities and environmental conditions [
78,
79,
80]. Cross-domain learning will allow models to generalize better across different environments and individuals. Digital twins of human activities can be created using the IoPVT to simulate and predict behaviors in different environments, making them useful for applications in rehabilitation, ergonomics, and personalized training. For example, distinguishing “walking upstairs” from “climbing a ladder” might be tough with just a wrist sensor, but the IoPVT could cross-reference room layout data, pressure sensors on stairs, and a virtual simulation of the user’s typical movements.
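The stairs-versus-ladder example can be sketched as a simple probabilistic fusion step, assuming that each contextual source can be expressed as a likelihood per candidate activity and that the sources are conditionally independent (a naive Bayes assumption). The function name and all probabilities in the usage note are hypothetical.

```python
from typing import Dict


def fuse_context(sensor_probs: Dict[str, float],
                 *context_likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Combine wrist-sensor class probabilities with contextual evidence.

    Each context source maps an activity to the likelihood of its observation
    given that activity; activities missing from a source are left unchanged.
    """
    scores = dict(sensor_probs)
    for likelihood in context_likelihoods:
        for activity in scores:
            scores[activity] *= likelihood.get(activity, 1.0)
    total = sum(scores.values())
    if total == 0:
        raise ValueError("context evidence rules out every candidate activity")
    return {activity: score / total for activity, score in scores.items()}
```

For instance, an ambiguous wrist-sensor output of 0.5 for each of the two activities, combined with a stair pressure reading that is nine times more likely under "walking upstairs" and room layout data that is four times more likely under it, yields a posterior of roughly 0.97 for "walking upstairs".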
While AI-driven HAR offers significant advancements in accuracy and adaptability, it is not without limitations. Data bias remains a critical concern, as models trained on nonrepresentative datasets may fail to accurately recognize activities across diverse populations, demographics, or cultural contexts, potentially leading to skewed outcomes or exclusion of underrepresented groups. Additionally, model generalizability poses a challenge, as algorithms optimized for specific environments, such as urban settings, may struggle to perform effectively in rural areas with differing activity patterns, sensor availability, or environmental conditions. Furthermore, computational constraints in real-time processing can hinder deployment, particularly on resource-limited edge devices, where high-dimensional data and complex deep learning models demand significant power and memory, potentially introducing latency or reducing responsiveness in time-sensitive safety applications.
IoPVT-based HAR systems have the potential to address these limitations. Traditional AI models are often static, trained offline, and deployed without continuous updates. The IoPVT’s interconnected nature allows for continuous learning across devices. Edge nodes, such as a smartwatch, could handle immediate processing, while cloud-based virtual components update models in real time based on new data from the network. This could reduce latency and make HAR more responsive to unexpected scenarios.
Conventional HAR struggles when scaling beyond single devices or users. The IoPVT framework, rooted in the principles of the IoT, is designed for interoperability. Consider a hospital setting where patient wearables, bed sensors, and staff devices all feed into a unified HAR system. The virtual layer could aggregate and analyze these data, recognizing group activities, such as a team lifting a patient, that a single-device AI might miss. This scalability could make the IoPVT more robust for large-scale applications like smart cities or industrial monitoring.
Deep learning models in conventional HAR can be computationally heavy, often requiring powerful hardware or offloading to the cloud, which introduces delays. The IoPVT uses edge computing, processing data closer to the source while using virtual components in the cloud for heavier tasks like model optimization. Edge-driven HAR has already shown promising accuracy with lower resource use. The IoPVT could take this further, dynamically balancing workloads to optimize power and speed.
Digital twins, virtual replicas of physical entities, could simulate activities in real time, predicting outcomes or filling in gaps when sensor data are noisy or incomplete. Conventional HAR rarely incorporates such predictive modeling. For instance, if a sensor fails in mid-activity, the IoPVT could use a virtual twin trained on past data to infer the activity, which static AI models cannot easily do.
Additionally, the virtual layer of the IoPVT supports cross-domain learning and transfer learning, allowing models trained in well-monitored urban environments to be adapted for less-resourced rural or coastal areas. By simulating rare or region-specific activities, digital twins enable models to generalize better by anticipating edge cases that might not be captured in limited physical datasets.
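The sensor-failure scenario described above can be illustrated with a very small stand-in for a digital twin: a first-order Markov model of activity transitions learned from past labels, which is then used to fill gaps in the observed sequence. A real twin would be far richer; the model choice, function names, and the convention that `None` marks a missing reading are assumptions for this sketch.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def fit_transitions(history: Sequence[str]) -> Dict[str, Counter]:
    """Count how often each activity follows another in past data."""
    counts: Dict[str, Counter] = {}
    for prev, nxt in zip(history, history[1:]):
        counts.setdefault(prev, Counter())[nxt] += 1
    return counts


def fill_gaps(observed: Sequence[Optional[str]],
              counts: Dict[str, Counter]) -> List[Optional[str]]:
    """Replace None entries with the most likely successor of the previous label.

    A gap stays None when there is no previous label or no transition data for it.
    """
    filled: List[Optional[str]] = []
    for label in observed:
        if label is None and filled and filled[-1] in counts:
            label = counts[filled[-1]].most_common(1)[0][0]
        filled.append(label)
    return filled
```

Given a history in which "stand" is most often followed by "walk" and "walk" by "walk", the observed sequence `["stand", None, None, "stand"]` would be completed as `["stand", "walk", "walk", "stand"]`.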
We believe that IoPVT-enabled HAR is a game-changer, pushing the boundaries of context-aware AI and ubiquitous computing. Using advances in AI, digital twins, and next-generation IoT networks, HAR systems will become more intelligent, efficient, and privacy-aware, making them indispensable in healthcare, smart environments, and industrial applications.
3.3. Predictive Analytics and Hazard Identification
Using historical and real-time data, predictive analytics aims to forecast future events or conditions. Human activity recognition contributes to this field in multiple ways. In healthcare monitoring and disease prediction, HAR can track movement patterns and detect anomalies to provide early warnings of diseases like Parkinson’s, Alzheimer’s, or cardiovascular conditions. Gait analysis can predict fall risks in elderly patients, enabling preventive interventions. For individuals with disabilities, HAR can predict assistance needs, optimizing caregiver responses. In industrial and office environments, HAR can analyze worker postures and movements to predict fatigue, musculoskeletal disorders, and potential productivity declines [
81,
82]. Activity data can be studied to suggest ergonomic adjustments or break schedules to prevent injuries. Furthermore, daily activity patterns extracted through HAR can be integrated into smart home and building management, automating lighting, heating, and security systems based on user behavior [
83]. Finally, in sports and performance analytics, HAR tracks athlete movements to identify inefficiencies, predict performance trends, and suggest training modifications for injury prevention [
84].
HAR is widely used in safety-critical environments to detect hazardous conditions and prevent accidents. In hazardous environments such as construction sites, HAR systems can analyze deviations from standard workflows; detect risky behaviors like improper lifting, lack of safety gear, or proximity to heavy machinery; and trigger real-time alerts [
85]. In chemical plants and warehouses, HAR monitors worker interactions with hazardous substances, ensuring compliance with safety protocols. It can predict exposure risks and suggest precautionary measures based on movement patterns. In transportation, HAR is used in driver monitoring systems to detect fatigue and analyze aggressive driving behaviors to prevent traffic violations [
86]. In public spaces, HAR helps detect suspicious activity, helping security personnel identify potential threats or criminal behavior. In emergency response scenarios, HAR can detect panic behaviors or sudden movements that indicate distress.
By combining HAR with predictive analytics, we gain the ability to identify and mitigate risks before they escalate. From healthcare and workplace safety to transportation and security, HAR-driven insights enable proactive interventions, ultimately improving efficiency, safety, and overall quality of life.
4. Societal Benefits and Impact
Integrating human activity recognition with IoPVT-enabled outdoor monitoring has profound implications for societies worldwide. At its core, the HARISM concept extends beyond technical innovation, seeking to empower communities with safer public spaces, healthier living conditions, and more informed responses to environmental challenges [
87]. By capturing and analyzing data on human behavior, environmental changes, and infrastructural integrity, systems like HARISM can prevent risks, ensure timely interventions, and facilitate better resource allocation. In doing so, they not only enhance immediate safety outcomes, such as fewer accidents, faster emergency response times, and more resilient infrastructure, but also contribute to broader, long-term benefits. These include reinforcing public trust in civic technologies, informing equitable policy development, fostering social cohesion around shared challenges, and guiding responsible urban planning in the face of evolving climate and demographic pressures. Ultimately, the societal benefits and impact of this integrated approach reach well beyond technology, shaping inclusive, adaptive, and thriving communities for generations to come.
4.1. Proactive Health and Safety Monitoring
The IoPVT-enabled HARISM framework aims to be a proactive health and safety monitoring paradigm, shifting from a crisis response to a prevention-based model, potentially stopping disasters before they happen. Traditionally, outdoor hazards, such as slippery walkways, compromised stairs, deteriorating air quality, and extreme weather events, have been reactively addressed, with solutions deployed only after accidents or noticeable degradation. With continuous sensing, HAR, and predictive analytics integrated, community leaders and individuals can anticipate challenges in advance and make timely, targeted decisions to deal with them.
A critical part of this proactive approach is the ability to spot early indicators of risks and defuse them before any real damage is done. For instance, environmental sensors placed along outdoor staircases in cities like Montreal can detect subtle changes in temperature and moisture that indicate ice formation, prompting maintenance crews or automated de-icing systems to intervene well before pedestrians face slipping hazards. In the same way, wearable biosensors and camera-based HAR solutions can recognize distress patterns, such as an elderly person falling while climbing steps or a sudden change in gait that suggests a health emergency, and trigger a cascade of alerts and responses from city officials, medical services, or neighbors [
88,
89].
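The ice-formation example above can be expressed as a simple rule over recent temperature and surface moisture readings. The sketch below is illustrative only: the thresholds, the linear temperature projection, and the three status labels are assumptions, and a deployed system would calibrate them against local conditions.

```python
from typing import Sequence


def ice_risk(temps_c: Sequence[float], moisture: Sequence[float],
             freeze_c: float = 0.5, wet_threshold: float = 0.3,
             horizon_steps: int = 3) -> str:
    """Classify ice risk on a stair surface from recent sensor readings.

    temps_c: surface temperatures in degrees Celsius, oldest first.
    moisture: relative surface wetness in [0, 1], oldest first.
    Returns "ice_likely", "ice_expected", or "clear".
    """
    if len(temps_c) < 2 or not moisture:
        raise ValueError("need at least two temperature readings and one moisture reading")
    if moisture[-1] < wet_threshold:
        return "clear"  # a dry surface cannot ice over
    if temps_c[-1] <= freeze_c:
        return "ice_likely"
    # Project the temperature forward with the average change per reading.
    slope = (temps_c[-1] - temps_c[0]) / (len(temps_c) - 1)
    projected = temps_c[-1] + slope * horizon_steps
    return "ice_expected" if projected <= freeze_c else "clear"
```

An "ice_expected" status is the case that enables intervention before pedestrians are exposed: the surface is wet and the temperature trend points below the freezing margin within the projection horizon.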
Proactive monitoring helps with long-term resilience and planning beyond short-term intervention. Historical data and predictive models can reveal patterns in infrastructure wear and tear, population movements, environmental factors, etc. These can influence the placement of sensors, the design of public spaces, the allocation of resources, and other aspects of urban planning [
90,
91]. By integrating these insights into policymaking and urban planning, cities can adapt more effectively to emerging challenges, such as shifting demographics, increased tourism, or the increasing impacts of climate change.
This approach facilitates a culture of preparedness and shared responsibility for health and safety. As communities increasingly trust that their surroundings are being actively observed and cared for, public confidence in civic technology grows. This trust, in turn, fosters greater collaboration between stakeholders, from city planners and public health experts to residents and entrepreneurs, who can use shared data and insights to drive innovative safety interventions, build health infrastructure, and optimize quality of life. Rather than settling for a reactive status quo, proactive monitoring embodies a forward-thinking mindset that supports a sustainable culture of safety for current and future generations.
4.2. Enhanced Emergency Response
In addition to mitigating accidents and health-related incidents, IoPVT-based outdoor safety monitoring has the potential to redefine the emergency response landscape. Traditionally, emergency services depend on delayed or incomplete information, such as calls from people who witnessed accidents, on-site assessments, and best guesses about conditions, before they can dispatch resources. This lag in situational awareness frequently results in slower response times, suboptimal allocation of emergency personnel, and problems with coordination across agencies and geographies.
By linking next-generation sensors, HAR algorithms, and immersive virtual environments in the Metaverse, emergency responders can overcome this challenge and obtain near-instantaneous access to critical information [
12,
92]. The continuous flow of real-time data ranges from environmental sensors detecting structural failures or accelerating rises in floodwater to HAR systems identifying individuals who have fallen or are showing signs of physical distress. Using this continuous data stream, IoPVT-based outdoor safety monitoring systems can locate and categorize incidents within seconds for emergency operations centers. This accelerated detection process shortens the interval between onset and intervention, potentially saving lives and reducing the severity of injuries or property damage.
In urban environments like Montreal, for example, the response to a hazardous icy staircase incident would no longer rely solely on bystanders reporting the event. Instead, the system’s advanced analytics would recognize a high-risk condition or an actual fall, automatically alerting local emergency services and dispatching rescue teams to the exact location. Meanwhile, Metaverse-based visualization tools allow dispatchers and responders to navigate a real-time digital twin of the scene, assessing hazards, planning safe routes for paramedics, and anticipating what equipment might be necessary [
93]. In rural settings, where distances between facilities and incidents can be long, this improved situational awareness translates into more efficient resource deployment and coordinated efforts across fire departments, ambulance services, and community-based first aid responders.
Furthermore, in coastal areas vulnerable to climate impacts, such as storm surges or flash flooding, linking IoPVT data with meteorological and oceanographic modeling provides emergency managers with actionable information. By simulating different response strategies in the Metaverse, decision makers can strategically preposition medical units, rescue boats, and shelters to ensure that assistance reaches vulnerable populations quickly and safely. Over time, historical data and lessons learned can be used to fine-tune evacuation protocols, predict impact zones, and recommend resilience-building measures to local authorities.
Ultimately, systems like HARISM have two-fold impacts on emergency response. They will improve immediate reaction time and strategic decision making during crises and lay a foundation for continuous improvement in preparedness and resilience. As this cycle of informed action and subsequent learning continues, communities become better equipped to handle emergencies, strengthening public trust in the systems designed to protect them and reinforcing the collective safety net for current and future generations.
4.3. Contributions to Smart City and Community Initiatives
Integrating outdoor safety monitoring enabled by the IoPVT into the larger landscape of urban governance and community development can be a pivotal step toward realizing the vision of truly “smart” cities [
94]. Smart city initiatives frequently emphasize the importance of connectivity, data-driven decision making, and citizen engagement [
95]. By weaving together human activity recognition, environmental sensing, and predictive analytics within immersive virtual platforms, the approach directly aligns with the core principles of sustainable urban planning and collective well-being.
4.3.1. Integration with Existing Smart City Infrastructure
The HARISM framework is designed to complement and enhance existing smart city infrastructures rather than replace them. This integration can occur at multiple levels. The physical layer of our proposed system can interface with existing sensor networks that many cities have already deployed. For instance, cities with established traffic monitoring cameras, environmental sensors, or public Wi-Fi infrastructure can repurpose these assets as data collection points for our HAR-enabled safety monitoring. This repurposing maximizes return on investment while minimizing the need for redundant sensor deployments.
Most progressive cities have implemented some form of a data center or urban management platform. Our IoPVT system is designed with standard API interfaces that allow for bidirectional data exchange with platforms like IBM’s Intelligent Operations Center (IOC) [
96] or open-source alternatives like FIWARE [
97]. This interoperability ensures that the safety insights generated by our system can flow into existing urban dashboards while also consuming relevant contextual data from other city systems.
Real-time alerts and predictive analytics from our safety monitoring framework can be injected directly into established emergency response systems. For example, when potential hazards are detected, automated notifications can be routed through existing Computer-Aided Dispatch (CAD) systems used by police, fire, and emergency medical services, accelerating response while maintaining established protocols and workflows.
4.3.2. Data-Informed Urban Governance and Planning
At the policymaking level, IoPVT data can inform long-term strategies by giving city officials and planners access to abundant valuable insights. Cities often struggle to identify priority areas for infrastructure investment, particularly when competing needs must be balanced against limited resources. Our system provides evidence-based metrics that can guide these decisions from multiple points of view.
By mapping accident frequencies, near-misses, and environmental hazards across urban spaces, planners can identify high-risk zones that warrant immediate attention. For example, analyzing where pedestrians gather, identifying accident hotspots, and monitoring localized health risks among communities would all inform and improve the design of infrastructure. This geospatial intelligence allows for targeted interventions rather than blanket approaches.
The continuous monitoring of infrastructure usage patterns and environmental impacts can guide both maintenance schedules and redesign priorities. Planners could prioritize upgrading dangerous staircases, using more durable construction materials in areas vulnerable to climate impacts, and adding adaptive lighting or signage in areas with poor visibility. For instance, if certain pedestrian crossings consistently show risky behavior patterns despite signage, the system could suggest alternative designs based on successful implementations elsewhere.
The 24/7 monitoring capability provides insights into how urban safety risks evolve throughout daily cycles, seasonal changes, and in response to events or festivals. This temporal dimension enables dynamic resource allocation and adaptive management strategies that conventional periodic audits cannot achieve.
The DT and Metaverse components of our framework allow urban planners to simulate proposed changes before physical implementation. This capability enables virtual A/B testing of infrastructure modifications, such as evaluating multiple staircase designs for slip resistance during simulated winter conditions or assessing how redesigned public spaces might affect pedestrian flow during emergency evacuations.
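Virtual A/B testing of this kind can be sketched as a Monte Carlo comparison. The example below assumes that each staircase design is summarized by a single per-step slip probability under simulated winter conditions and that steps are independent; both are strong simplifications, and the design names and probabilities in the usage note are hypothetical.

```python
import random
from typing import Dict


def simulate_fall_rate(slip_prob_per_step: float, n_steps: int,
                       n_pedestrians: int, rng: random.Random) -> float:
    """Fraction of simulated pedestrians who slip at least once on the staircase."""
    falls = 0
    for _ in range(n_pedestrians):
        if any(rng.random() < slip_prob_per_step for _ in range(n_steps)):
            falls += 1
    return falls / n_pedestrians


def compare_designs(designs: Dict[str, float], n_steps: int = 10,
                    n_pedestrians: int = 5000, seed: int = 42) -> Dict[str, float]:
    """Estimate the fall rate of each design under the same simulated conditions."""
    rng = random.Random(seed)
    return {name: simulate_fall_rate(p, n_steps, n_pedestrians, rng)
            for name, p in designs.items()}
```

Calling `compare_designs({"bare_concrete": 0.02, "heated_treads": 0.005})` would return an estimated fall rate per design, letting planners rank the alternatives before any physical change is made. In a fuller digital twin, the slip probability would itself be derived from surface, weather, and gait models rather than supplied as a constant.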
As time goes on, with these data-driven interventions layering on top of each other, cities become safer, healthier, and more responsive to the changing needs of their people. The cumulative effect is a more resilient urban environment that proactively adapts to evolving challenges rather than reacting to failures.
4.3.3. Citizen Engagement and Participatory Safety Management
Furthermore, a live monitoring system that guarantees privacy promotes community engagement through transparency and communication [
98]. Modern smart city initiatives increasingly recognize that technological solutions must be accompanied by robust citizen participation to be truly effective and equitable. Public dashboards, data-driven storytelling, and educational outreach programs can help inform citizens about how their city’s infrastructure and services are prepared to respond to emerging risks. Unlike traditional “black box” systems, our framework emphasizes accessible visualizations and plain-language interpretations of safety analytics, enabling meaningful public discourse about risk priorities.
The system can incorporate citizen-reported observations through mobile applications or community portals, creating a hybridized monitoring approach that combines automated sensing with human intelligence. This feature transforms residents from passive subjects of surveillance into active participants in the safety ecosystem. Local residents, businesses, and other advocacy groups could use these insights to call for investments in public health resources or to suggest community-led programs that address local challenges, from forming neighborhood watch groups to creating networks of volunteers trained to help vulnerable populations in times of crisis. The system can even help match community needs with appropriate resources, such as connecting vulnerable elderly residents with nearby volunteer networks during extreme weather events.
By making safety metrics and response times transparent, the system creates natural accountability for service providers and city departments. Citizens can track progress on safety initiatives and hold officials accountable for addressing identified hazards, while officials gain tools to demonstrate the impact of their investments.
4.3.4. Extensibility to Broader Urban Challenges
As the IoPVT ecosystem matures, there are opportunities for its capabilities, beyond safety and health, to be leveraged in areas such as energy management, environmental sustainability, cultural heritage, etc. Data collected for safety applications can help inform energy usage patterns to better balance consumption and production or reduce the carbon footprint of critical infrastructure [
99,
100]. For example, lighting systems monitoring pedestrian activity for safety can automatically adjust brightness levels based on actual usage patterns rather than fixed schedules, conserving energy while maintaining safety.
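As a minimal sketch of usage-based lighting control, the function below maps observed pedestrian activity to a brightness level between a safety floor and full output. The linear mapping, the parameter names, and the default values are assumptions chosen for illustration.

```python
def target_brightness(pedestrians_per_min: float, min_level: float = 0.2,
                      max_level: float = 1.0, saturation: float = 10.0) -> float:
    """Map pedestrian activity to a lamp brightness in [min_level, max_level].

    min_level is a floor kept for safety even when no one is detected;
    saturation is the activity level at which lamps reach full brightness.
    """
    if pedestrians_per_min < 0:
        raise ValueError("pedestrian count cannot be negative")
    usage = min(pedestrians_per_min / saturation, 1.0)
    return min_level + (max_level - min_level) * usage
```

With these defaults, an empty walkway is lit at 20% brightness, five pedestrians per minute raise it to about 60%, and ten or more bring it to full brightness.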
In coastal communities most affected by climate change, these data can guide adaptation steps, from strengthening sea walls to redesigning green spaces and galvanizing joint climate action within the community. The predictive capabilities of our system can model the compound effects of extreme weather events combined with infrastructure weaknesses, enabling proactive resilience planning. The same infrastructure that monitors safety can be extended to track environmental health indicators such as air quality, noise pollution, or urban heat islands. These factors have significant impacts on public health, and their continuous monitoring provides valuable inputs to healthcare planning and environmental policy.
Safety monitoring systems in historic districts or cultural sites can be augmented to provide tourism information, crowd management during events, and digital preservation of heritage sites through detailed digital twins. This integration enhances both the safety and experiential value of cultural spaces. By connecting these dots, the aggregated data not only help to mitigate risks but also form a foundation for holistic urban resilience and socioeconomic vitality. The multipurpose nature of the system maximizes return on investment while creating synergies across traditionally siloed urban management domains.
4.3.5. Toward Humanistic Smart Cities
At its core, participation in smart community initiatives goes beyond immediate improvements in health and safety; it opens the doors to inclusivity, participatory governance, and long-term sustainability. These integrated monitoring systems, built on the foundation of aligning high-tech with community values, help create a fabric of citizens, policies, and the environment we inhabit together. Our framework includes specific provisions for ensuring that technological benefits reach underserved communities. Mobile interfaces, multilingual support, and simplified access points ensure that safety information remains accessible regardless of technological literacy or socioeconomic status.
Unlike many surveillance-heavy approaches to urban safety, our system emphasizes privacy preservation through techniques such as edge computing (processing data locally before transmission), automated anonymization, and transparent opt-in mechanisms for more sensitive data collection. This approach ensures that safety enhancement does not come at the cost of civil liberties. The system is designed to learn from both successes and failures, incorporating feedback from multiple stakeholders to continuously improve. This adaptive approach ensures that technology remains responsive to changing urban needs and emerging challenges rather than becoming obsolete or irrelevant over time.
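The privacy-preserving techniques named above can be sketched as a preprocessing step that runs on the edge device before any transmission. The example replaces the device identifier with a keyed hash, coarsens the location, and forwards a sensitive field only with explicit opt-in. The event field names are hypothetical, and keyed hashing is pseudonymization rather than full anonymization, so it would be one layer among several in practice.

```python
import hashlib
import hmac
from typing import Any, Dict


def pseudonymize(device_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash that cannot be reversed without the key."""
    digest = hmac.new(secret_key, device_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize_event(event: Dict[str, Any], secret_key: bytes,
                    coord_decimals: int = 3, opted_in: bool = False) -> Dict[str, Any]:
    """Build the record that leaves the edge device.

    Coordinates are rounded (three decimals is roughly 100 m of latitude), and
    the heart rate is forwarded only if the user has opted in.
    """
    out: Dict[str, Any] = {
        "subject": pseudonymize(event["device_id"], secret_key),
        "activity": event["activity"],
        "lat": round(event["lat"], coord_decimals),
        "lon": round(event["lon"], coord_decimals),
    }
    if opted_in and "heart_rate" in event:
        out["heart_rate"] = event["heart_rate"]
    return out
```

Because the same device always maps to the same pseudonym under a given key, downstream analytics can still link events from one subject without learning who that subject is.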
These design features ensure that, at every point we collect data, the nature of our “smartness” remains about human interaction rather than just hardware and software. In this way, our proposed IoPVT framework represents not just a technological advance but a philosophical shift in how we conceptualize and implement smart city initiatives: moving from technology-centered to human-centered, from data collection to meaningful insight and from reactive response to proactive prevention.
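The privacy-preserving edge processing described in this subsection (local processing and automated anonymization before transmission) can be illustrated with a minimal sketch. The record fields (`device_id`, `lat`, `lon`, `event`), the key handling, and the rounding precision are hypothetical choices made for illustration only; they are not part of the proposed framework.

```python
import hashlib
import hmac


def anonymize_reading(reading: dict, secret_key: bytes, precision: int = 3) -> dict:
    """Return a reduced copy of `reading` intended for transmission off the edge node.

    Assumed (hypothetical) input fields: device_id, lat, lon, event.
    Any other field, such as raw imagery, is dropped and never transmitted.
    """
    # Keyed hash (HMAC-SHA256) so the raw device identifier stays on the node.
    pseudonym = hmac.new(
        secret_key, reading["device_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()[:16]
    return {
        "pseudonym": pseudonym,
        # Three decimal places coarsen the position to the order of 100 m.
        "lat": round(reading["lat"], precision),
        "lon": round(reading["lon"], precision),
        "event": reading["event"],
    }
```

Because only an allowlisted set of fields is copied, anything not explicitly needed for the safety alert is discarded by default, which is one simple way to approximate the opt-in behavior described above.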
4.4. Case Study: UCL Partners’ Proactive Health Monitoring System
To illustrate the real-world benefits of proactive monitoring systems similar to our proposed framework, we examined the integrated approach of UCL Partners to the management of hypertension in East London [
101]. This initiative addressed a situation where only 60% of the hypertension cases were diagnosed and only 40% were effectively controlled. The program implemented continuous monitoring protocols in multiple care settings with integrated data systems that enabled a comprehensive risk assessment. Through this proactive approach, more than 7000 cases of previously undiagnosed hypertension were identified, and treatment adherence improved significantly among existing patients.
Critical parallels to our IoPVT framework include the following: (1) continuous data collection to identify risks before they escalate, (2) standardized response protocols based on data thresholds, (3) the integration of previously siloed information streams, and (4) targeted interventions for high-risk scenarios. The success of the program in five diverse London boroughs demonstrates how proactive monitoring systems can operate effectively at scale in complex urban environments.
While this case study focuses on health care rather than outdoor safety, the underlying principles of early detection and prevention align directly with our proposed approach. The demonstrated results suggest similar benefits could be achieved by applying these principles to outdoor safety monitoring through our IoPVT framework.
5. Scenarios and Illustrative Applications
Settings in which ubiquitous edge-based IoVT sensing can increase safety and security include urban dwellings, rural industrial complexes, and coastal infrastructures, among others. Smart edge technology [
102] can mitigate risk by monitoring pedestrians, understanding environmental conditions, and analyzing infrastructure in extreme weather conditions such as ice, fire, and hurricanes. As shown in
Figure 3, the intersection of HAR (people), IoVT (technology), and scenarios (weather) can facilitate future smart planning for responsive action.
Figure 3 illustrates the foundational conceptual framework of our proposed HARISM system by visualizing the critical three-dimensional intersection between people, technology, and the environment that allows for complete outdoor safety monitoring. This Venn diagram representation serves as a conceptual cornerstone for understanding how these three domains must interact harmoniously for effective safety monitoring. The people dimension encompasses users, communities, teams, organizations, and their associated tasks and actions; the technology dimension incorporates computers, networks, and various processing methods; and the environment dimension captures weather, infrastructure, culture, and terrain considerations. The labeled intersections highlight how these domains converge to create DATA (environment–technology), TASKS (people–technology), and ACTIONS (people–environment). Most critically, the central overlap where all three dimensions converge represents the core of our HARISM framework, where human activity recognition becomes contextualized within both technological capabilities and environmental conditions. This visualization is essential to our vision, as it emphasizes that outdoor safety monitoring cannot succeed by focusing on any single dimension in isolation. Effective systems must account for human behavior, leverage appropriate technologies, and adapt to diverse environmental contexts simultaneously.
To ground our vision in practical applications, we explore three distinct settings—urban, rural, and coastal—where IoPVT-enabled outdoor safety monitoring can be implemented. These scenarios were selected to reflect a variety of challenges and opportunities that arise from different environmental, infrastructure, and societal contexts. Urban settings, such as Montreal, exemplify a high-density environment with complex infrastructure and new risks (e.g., icy staircases, crowded public spaces, etc.). The case of Harrisburg illustrates how rural areas face unique challenges due to dispersed infrastructure and industrial activities impacting public health, necessitating specialized environmental and biohealth monitoring methods. Coastal communities, in contrast, must face the harsh realities of climate change, with rising sea levels and extreme weather events making it essential to develop adaptive strategies for everything from evacuation to the resilience of roads and infrastructure. These scenarios demonstrate how integrated sensor networks, advanced HAR, and immersive virtual analytics can be evaluated to assess their potential to be customized for specific local challenges within broader outdoor safety monitoring frameworks.
5.1. Urban Setting: Montreal, Canada
Urban environments inherently present a unique set of challenges for outdoor safety monitoring [
103]. High population density, complex infrastructure, and dynamic public spaces all contribute to an environment where the potential for accidents increases. Variable microclimates, widespread air pollution, and elevated ambient noise levels complicate the calibration of the sensors and the quality of data [
104]. Furthermore, high accuracy requirements call for robust, scalable monitoring systems that are fully integrated into the existing urban framework. These complexities must be taken into account when designing IoPVT systems, which need to be dynamic and adaptable to the multifaceted nature of urban life; doing so will help improve both short-term urban safety processes and long-term urban planning initiatives.
One of the emerging developments in HAR video surveillance is in outdoor settings near buildings. Outdoor monitoring systems have become quite popular in front of doors to alert for package deliveries, ensure safety surveillance for someone approaching the door, and monitor activity in case of an intrusion. Additionally, an outdoor IoVT system is utilized by convenience stores, pedestrian pathways (especially busy intersections), and public places such as subways to enhance public safety and security. For discussion, another unique scenario for future analysis involves external stairways that can harness existing doorway IoVT developments for urban safety and health monitoring. For instance, Montreal, Canada, has a historical development of external staircases intended to provide safety through community surveillance; however, they can be hazardous in winter conditions, as shown in
Figure 4. Ice and snow can make the staircases slippery, creating challenges for elderly individuals who climb the stairs during winter. Therefore, external video surveillance can be employed not only to alert about package deliveries but also as a method to ensure quick responses in case someone slips and falls on the stairs. While this example is a unique case, dense urban centers can benefit from IoVT technology in public spaces with distinctive entrance access.
IoPVT systems can install smart sensors along stairs and pedestrian areas to monitor environmental conditions in real time, including temperature drops and ice formation. HAR algorithms can detect abnormal movements, such as slips or hesitations, which can lead to hazardous situations. From a design perspective, precise sensor placement is crucial: devices must be integrated into the cityscape without being unattractive, and their installation must consider factors such as exposure to pollution, potential vandalism, and access to power sources. In addition to IoT considerations, there are network connectivity challenges to face in densely populated urban areas, where sensors must rely on robust, low-latency communication channels that can withstand interference from numerous devices and urban infrastructure.
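As a minimal sketch of how the environmental readings described above might be turned into an icing risk level for an external staircase, consider the following rule-based classifier. The `StairReading` fields and the 0 °C and 2 °C thresholds are illustrative assumptions, not calibrated values; a deployed system would likely replace these rules with a model validated against local conditions.

```python
from dataclasses import dataclass


@dataclass
class StairReading:
    surface_temp_c: float      # surface temperature in degrees Celsius
    moisture_detected: bool    # True if the sensor reports water or snow on the surface
    temp_trend_c_per_h: float  # negative values mean the surface is cooling


def icing_risk(reading: StairReading) -> str:
    """Classify icing risk as 'high', 'moderate', or 'low' (hypothetical thresholds)."""
    # A wet surface at or below freezing is treated as likely ice.
    if reading.moisture_detected and reading.surface_temp_c <= 0.0:
        return "high"
    # A wet surface near freezing that is still cooling may freeze soon.
    if (
        reading.moisture_detected
        and reading.surface_temp_c <= 2.0
        and reading.temp_trend_c_per_h < 0.0
    ):
        return "moderate"
    return "low"
```

A "moderate" result could, for example, trigger a preventive de-icing request, while "high" could raise the sensitivity of fall detection for that staircase.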
By entering these data into a dynamic digital twin of Montreal’s urban environment, emergency services and municipal maintenance teams can receive timely alerts, allowing them to deploy de-icing or repair crews before accidents happen. Moreover, urban planners can utilize historical aggregated data to pinpoint chronic issues and design lasting interventions, such as more durable stairs or enhanced pedestrian pathways. Additionally, the system should be developed with a robust user interface that offers decision makers visual and analytical insights in real time, along with integration into existing urban management systems and smart city solutions. Other desirable design features include energy efficiency, modularity, and ease of maintenance, allowing sensors and related infrastructure to be easily scaled or upgraded as technologies advance. This comprehensive approach not only improves public safety in the short term but also promotes the long-term development of the urban landscape in line with Montreal’s specific climate and infrastructure needs.
In urban environments like Montreal, the HARISM framework can be customized through targeted sensor deployment at high-density pedestrian zones, transit hubs, and critical infrastructure like external staircases and pedestrian bridges. Urban implementations would emphasize interconnectivity with existing smart city infrastructure, including traffic management systems, public Wi-Fi networks, and municipal emergency services. The framework would be configured for high-density data processing using edge computing nodes distributed throughout the urban landscape, enabling real-time hazard detection despite potential interference from building structures and electromagnetic noise. Urban customization would also include specialized HAR algorithms trained to identify complex activity patterns in crowded settings, such as distinguishing between normal commuter movements and distress behaviors that might indicate falls, medical emergencies, or safety hazards. Additionally, the system would integrate with building management systems and public transportation networks to coordinate responses across multiple stakeholders typical of dense urban environments while maintaining strict privacy protocols essential in areas with high surveillance sensitivity.
5.2. Rural Setting: Harrisburg, Pennsylvania, USA
In rural areas like Harrisburg, PA, safety monitoring takes on a unique focus compared to urban settings. Harrisburg has historically faced environmental challenges related to industrial activity, which have led to higher rates of cancer and other health problems, primarily due to pollution from nearby power plants. In this context, IoPVT systems can be utilized to continuously monitor air quality, radiation levels, and other important environmental factors using advanced chemical and biosensors. Alongside these devices, wearable sensors and localized monitors keep track of ambient conditions in conjunction with residents’ health metrics, collectively offering a comprehensive view of community well-being.
HAR algorithms used in these contexts can analyze data from stationary and mobile sensors to identify anomalies that may indicate exposure to hazardous pollutants or other environmental risks. When integrated into a virtual representation of Harrisburg, these data help in planning energy consumption, ensuring that industrial facilities operate within safe limits, informing public health initiatives aimed at reducing exposure, and guiding remediation efforts. This proactive monitoring framework seeks to mitigate health risks, drive targeted industrial regulation, and strengthen community resilience against ongoing industrial challenges.
In addition to environmental monitoring, rural IoPVT applications extend to various industrial contexts, including warehouses, farms, loading docks, oil pipelines, electric power plants, and train stations. As rural industrial complexes grow and residential developments encroach, overlapping zones require robust safety monitoring. For example, in the extreme case of a nuclear power plant (NPP), as illustrated in
Figure 5, IoVT systems can provide integrated surveillance of both the facility and its surrounding area. In the event of industrial accidents or severe weather—such as power plant or power line failures that trigger fires—the system would enable rapid responses not only within the industrial complex but also for nearby rural communities. This comprehensive approach to IoVT in rural settings highlights its crucial role in safeguarding both industrial operations and the populations that depend on them, ensuring resilience and safety amid diverse environmental challenges.
A vital point worthy of further discussion is that deploying IoPVT safety monitoring systems in rural areas goes beyond being a technical task; it is a community initiative. When residents are involved as partners, technology becomes more intelligent, more trusted, and far more effective. Policymakers in cities and counties, such as those on the rural outskirts of Harrisburg, have the responsibility to integrate these systems with larger development plans, aligning them with environmental monitoring programs, industrial safety regulations, and resilience-building efforts. With transparent operations, strong privacy protections, and a commitment to equity, IoPVT innovations can gain community support and provide widespread benefits. Ultimately, an approach that centers on people in the IoPVT realm through participation, co-ownership, and equitable access will improve both the sustainability and safety of rural communities in the face of future challenges. Engaged deployments of this nature can serve as models for how technology and local knowledge work together to foster rural resilience in an increasingly connected world.
In rural contexts such as the Harrisburg region, the HARISM framework would be adapted to address the challenges of dispersed populations, limited connectivity infrastructure, and industrial–environmental interfaces. The deployment of sensors would prioritize critical junctions between residential areas and industrial facilities, focusing on environmental monitoring capabilities to detect hazardous emissions, water contamination, or air quality degradation from industrial activities. The system architecture would be modified to operate with intermittent connectivity, incorporating store-and-forward data transmission protocols and enhanced local processing capabilities to maintain functionality even during communication outages. Solar or other self-sustaining energy solutions would be integrated to overcome the limitations of the power infrastructure common in rural areas. HAR components would be calibrated to detect activities relevant to rural settings, such as agricultural machinery operation, resource extraction work, and outdoor recreational activities near industrial zones. Additionally, the framework would incorporate longer-range monitoring technologies, such as drone-based sensors and satellite imagery integration, to cost-effectively cover expansive geographical areas while maintaining situational awareness of scattered population centers.
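The store-and-forward behavior mentioned above for intermittent rural connectivity can be sketched as a bounded local queue that is drained whenever the uplink is available. The class name, the `send` callback, and the drop-oldest policy are assumptions made for illustration; the framework itself does not prescribe a particular protocol.

```python
from collections import deque
from typing import Callable


class StoreAndForwardBuffer:
    """Buffer readings locally and forward them in order when the link allows."""

    def __init__(self, send: Callable[[dict], bool], max_items: int = 1000):
        # `send` is a hypothetical uplink function returning True on success.
        self._send = send
        # Bounded queue: when full, the oldest reading is silently discarded.
        self._queue = deque(maxlen=max_items)

    def submit(self, reading: dict) -> None:
        """Store a new reading, then try to forward everything pending."""
        self._queue.append(reading)
        self.flush()

    def flush(self) -> int:
        """Forward pending readings oldest-first; stop at the first failure."""
        sent = 0
        while self._queue:
            if not self._send(self._queue[0]):
                break  # link is down; keep the reading for a later attempt
            self._queue.popleft()
            sent += 1
        return sent

    def pending(self) -> int:
        return len(self._queue)
```

Dropping the oldest readings when the buffer fills favors recent situational awareness over completeness; a deployment that needs a full historical record would instead spill to persistent storage.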
5.3. Coastal Setting
Coastal environments are particularly vulnerable to the effects of climate change, such as rising sea levels, extreme weather events, and increased stress on infrastructure. As with the health and safety concerns raised by winter conditions and by industrial accidents that can cause fires, hurricanes and flooding in coastal areas pose significant risks. With weather services offering forecasting and nowcasting, the IoPVT can be utilized not only locally for housing but also globally for community needs and socially for response planning. Two relevant examples are hurricanes and flooding. The initial destruction caused by a hurricane involves high winds, and IoVTs can aid in safety analysis by assessing potential structural damage to bridges, power line towers, and critical facilities like water towers, as illustrated in
Figure 6.
Using digital twin technology and understanding the safety limits of certain structures such as water towers and bridges [
105], we can monitor the IoVT of these structures when forecasted winds pose a risk of destruction. This approach allows for real-time updates on the potential for structural collapse. With insights from the IoVT regarding system failures, alerts can be dispatched to the public to ensure safety. Furthermore, in the aftermath of such disasters, the same IoVT systems on bridges and highways can provide critical information about water flow, road blockages, and rising water levels. If first responders cannot reach these areas, the city’s IoVT can supply rescue teams with vital information on safe routes to enhance the survival of residents. While UAVs could assist in these efforts, they would only be operational after the storm has passed. Therefore, IoVTs near bridges are essential not only for managing traffic flow under normal conditions and monitoring structural integrity but also for facilitating community-wide disaster response during events like hurricanes, particularly those leading to flooding.
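The comparison of forecasted winds against known structural safety limits can be sketched as follows. The function name, the wind-speed limits in the usage note, and the 0.8 safety margin are hypothetical placeholders; real limits would come from engineering assessments of each structure.

```python
def structures_at_risk(forecast_gust_ms: float,
                       design_limits_ms: dict,
                       safety_margin: float = 0.8) -> list:
    """Return structure names whose forecast gust reaches margin * design limit.

    Results are ordered from the most to the least stressed structure.
    All wind speeds are in metres per second.
    """
    flagged = []
    for name, limit in design_limits_ms.items():
        ratio = forecast_gust_ms / limit  # fraction of the design limit reached
        if ratio >= safety_margin:
            flagged.append((ratio, name))
    return [name for ratio, name in sorted(flagged, reverse=True)]
```

For example, with assumed limits of 50 m/s for a water tower, 70 m/s for a bridge, and 45 m/s for a power line tower, a forecast gust of 42 m/s would flag the power line tower first and the water tower second, while the bridge would remain below the margin.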
Beyond immediate emergencies, the IoPVT generates valuable data that contribute to long-term climate adaptation and resilience planning for coastal cities. Planners and policymakers can analyze trends captured by sensors and connected devices to make informed decisions about adapting the built environment and communities to a changing climate. Key areas of integration include resilient urban development, infrastructure reinforcement, and managed retreat decisions. To fully realize the IoPVT’s benefits, policymakers must ensure its equitable deployment in high-risk and underserved coastal communities. Often, the communities most vulnerable to climate disasters, such as low-income neighborhoods or remote coastal villages, have the least access to advanced technologies and infrastructure. This digital divide can compound disaster inequities if not addressed. In fact, smart city research warns that if new innovations like the IoT and 5G are implemented without inclusivity, tech-disadvantaged groups could struggle to utilize or benefit from these services, thus widening inequalities and undermining sustainable resilience efforts. A truly climate-resilient coastal city must therefore plan for digital inclusion from the outset.
For coastal settings, the HARISM framework would be specially engineered to withstand harsh maritime conditions while monitoring the unique interface between human activities, infrastructure, and dynamic coastal processes. Sensor systems would be ruggedized against saltwater exposure, high humidity, and extreme weather events, with redundant communication pathways available to maintain operation during storms. The framework would incorporate specialized environmental sensors for tide levels, wave heights, wind velocities, and salinity correlated with infrastructure stress monitoring on seawalls, bridges, and coastal evacuation routes. HAR algorithms would be trained to recognize specific behaviors of coastal risk, such as swimming in dangerous conditions or accessing unstable coastal structures during storms. Critically, coastal implementation would emphasize predictive modeling that integrates real-time data with climate forecasts and historical patterns to anticipate compound events where human activity coincides with environmental hazards. The digital twin component would be enhanced to simulate progressive coastal changes due to climate change, allowing communities to visualize future risks and test adaptation strategies before implementation, creating a powerful planning tool that extends beyond immediate safety monitoring to long-term coastal resilience.
These three cases, each representing an extreme weather scenario, highlight the potential for IoVT to support HAR in outdoor settings. Moreover, as large infrastructure projects increasingly consider both the people who use them and extreme events, leveraging IoVT technology could save lives at minimal cost. Furthermore, the proposed outdoor IoVT technology would also be used during normal operations for safety, security, and inspection, with HAR applied according to the public spaces in which the infrastructure is located.
6. Challenges and Opportunities
Although the significant promise of IoPVT-enabled outdoor safety monitoring has been widely recognized, there are still hurdles to overcome to realize its transformative potential fully. These challenges range from purely technical challenges, such as ensuring that sensor networks remain robust and scalable under diverse conditions, to broader issues related to policy design, interdisciplinary collaboration, and public trust [
12,
106]. At the same time, these very obstacles present unique opportunities to refine technology, develop more inclusive governance models, and push the boundaries of interdisciplinary research. By carefully balancing innovation with ethical stewardship and engaging stakeholders across sectors and disciplines, the path to a safer and more resilient future is feasible and ripe with potential for scientific breakthroughs and social benefit. The following subsections explore critical areas of technical scalability and reliability, policy and design collaboration, ethical and privacy considerations, and future research directions that promise to shape the field.
6.1. Technical Scalability and Reliability
Building a robust and expansive IoPVT infrastructure for outdoor safety monitoring requires addressing multiple interrelated challenges related to scalability and reliability. As the number of sensors, connected devices, and participating stakeholders deployed grows, so does the complexity of ensuring consistent data collection and analysis across diverse environments: urban high rises, rural farmlands, and coastal ecosystems each present distinct conditions that strain network connectivity and sensor performance. In addition, the sheer volume of data generated by real-time monitoring can quickly overwhelm centralized processing infrastructures, calling for distributed computing strategies at the edge and fog layers to handle localized analytics and reduce bottlenecks.
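One way to reduce the data volume sent to centralized infrastructure is to summarize raw samples at the edge and forward only the summary. The sketch below assumes a hypothetical fixed window of numeric samples and a single alert threshold; it is an illustration of localized analytics, not the framework's prescribed pipeline.

```python
from statistics import mean
from typing import Sequence


def summarize_window(samples: Sequence[float], alert_threshold: float) -> dict:
    """Collapse one window of raw sensor samples into a compact summary record."""
    if not samples:
        raise ValueError("window must contain at least one sample")
    peak = max(samples)
    return {
        "count": len(samples),
        "min": min(samples),
        "max": peak,
        "mean": mean(samples),
        # The alert flag lets the node escalate immediately without
        # shipping the raw samples upstream.
        "alert": peak >= alert_threshold,
    }
```

Under this scheme, a node sampling once per second and reporting once per minute transmits one record instead of sixty, at the cost of losing within-window detail.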
On the hardware side, sensors must be engineered to withstand harsh environmental conditions, including extreme temperatures, high humidity, and corrosive elements such as salt in coastal regions [
107]. Cities like Montreal, for example, have harsh winters that require devices that can still operate through ice, snow, and subzero temperatures. Coastal areas exposed to salty air, storms, and fluctuating tide heights present a different kind of durability challenge, but one that is equally demanding. To limit deterioration, materials and enclosures must be weatherproofed and must withstand long-term exposure. Another key consideration is power efficiency [
108]. Remote or autonomous sensor deployments often use batteries or energy harvesting solutions to maintain continuous operation, which depends on factors such as temperature or the seasonal availability of sunlight. The strategic placement of sensors, optimizing access to maintenance while enabling successful repairs/migration, further complicates the large-scale deployment of IoPVT systems.
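The power budget of a battery-operated sensor can be estimated to a first approximation from its duty cycle. The sketch below assumes a simple two-state model (active and sleep) with constant currents and ignores the temperature effects, self-discharge, and energy harvesting mentioned above; all parameter names and values are illustrative.

```python
def estimated_battery_life_hours(capacity_mah: float,
                                 active_ma: float,
                                 sleep_ma: float,
                                 duty_cycle: float) -> float:
    """Estimate runtime in hours from battery capacity and average current draw.

    duty_cycle is the fraction of time (0 to 1) the node spends in the active state.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    # Time-weighted average of the active and sleep currents.
    average_ma = duty_cycle * active_ma + (1.0 - duty_cycle) * sleep_ma
    if average_ma <= 0.0:
        raise ValueError("average current must be positive")
    return capacity_mah / average_ma
```

For instance, an assumed 2000 mAh battery feeding a node that draws 50 mA when active and 0.05 mA when asleep would last 40 hours if always active, but roughly 3640 hours (about five months) at a 1% duty cycle, which illustrates why aggressive duty cycling matters for remote deployments.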
On the software side, scalability lies in the capacity to process enormous data streams without sacrificing responsiveness or security [
109]. The heterogeneous sensor feeds must be ingested, stored, and indexed efficiently in data management systems to allow for near-real-time analysis and the retrieval of relevant information. Also critical is fault tolerance, since a single point of failure at the network layer or within the cloud infrastructure can create surveillance gaps, slow notifications, or even complete system outages. To distribute computational loads and decrease latency (critical for fog networks in remote regions or in bandwidth-limited scenarios), developers can consider introducing redundant network architectures, adding load-balancing mechanisms, or utilizing edge/fog analytics. In addition, end-to-end encryption, secure communication protocols, and other security measures are needed to protect sensitive information, from health data to location data. Finally, integrating a wide range of sensors, third-party APIs, and different IoPVT layers requires open standards and adaptable middleware solutions to promote interoperability. This level of interoperability is crucial for future-proofing the system so that new technologies can be integrated and incremental upgrades made over time.
Ultimately, ensuring technical scalability and reliability is more a matter of strategic planning and design than of technological upgrades. System designers must therefore consider topics such as optimal sensor placement, communication protocols appropriate for local conditions, and a flexible architecture that can be easily changed. With these aspects in mind, IoPVT-enabled outdoor safety monitoring can meet the needs of more users and more environments without degrading data quality or system responsiveness.
6.2. Interdisciplinary Collaboration for Design and Policy
Implementing IoPVT-enabled outdoor safety monitoring on a much larger scale requires much more than technical expertise; it requires seamless interdisciplinary cooperation among a diverse range of stakeholders. Urban planners, public health experts, software and hardware engineers, policymakers, emergency responders, and community members each bring essential perspectives and priorities. This cooperation, in turn, can help ensure that the solutions we craft are not only technologically sound but also socially responsible, ethically appropriate, and sustainable in the long term.
At the heart of this vision is a user-centric design philosophy. This will require systems architects and engineers to collaborate closely with urban planners to ensure that sensor deployments and data pipelines can be neatly integrated into the existing and planned infrastructure of the city. Not only does this save resources, but it also allows systems to be built that are scalable and adaptable. Data can be one way to understand what communities need. Public health professionals and social scientists deepen this understanding by providing insight into community behavior patterns so that responses match local needs and values.
For example, sensors must be placed carefully in sensitive areas such as public schools and hospitals [
110]. Balancing technological efficacy with privacy and ethical concerns requires an inclusive dialogue among all parties. In this context, user-centric design is not just a buzzword. It is a foundational principle that ensures that IoPVT solutions remain context-aware and sensitive to cultural and social nuances [
111]. Whether deployed in a densely populated urban center or a quiet rural community, these solutions must be versatile enough to meet varied local demands.
On the policy side, legislators and regulatory authorities are vital. They create transparent, well-defined processes that define what data are collected, who owns them, and how they are used. Regulations should define the roles and responsibilities of all stakeholders, from local government agencies to private sector service providers. Strong regulation not only protects the rights of citizens, especially around data ownership, data use, and data privacy, but also creates a fundamental framework for innovation to grow and thrive without the fear of running into a legal gray area.
Public–private partnerships (PPPs) are among strategies with distinct potential to accelerate the development of IoPVT platforms [
112]. However, there must be transparent oversight mechanisms behind these collaborations to ensure accountability, trust, and cross-jurisdictional collaboration [
113]. Since the responsibility to manage systems is often distributed across local governments, regional bodies, etc., cross-jurisdictional collaboration across neighboring municipalities or utilities in a region can enhance system interoperability. Partnerships can generate vast data ecosystems that provide information throughout the community on public safety and health trends that ultimately benefit all communities.
Interdisciplinary cooperation is the foundation for IoPVT systems that can benefit their communities, combining technical insight with forward-thinking urban forms, ethical constraints, and robust policy structures [
114]. This holistic approach is about more than optimizing hardware and software. It draws on various perspectives to address the complexities of the real world. In doing so, it fosters stakeholder confidence, accelerates the adoption of emerging technologies, and significantly improves safety outcomes in diverse outdoor environments. The result would be a powerful and flexible system capable of meeting the technical requirements of contemporary city life while being attuned to wider social, ethical, and regulatory considerations.
6.3. Ethical and Privacy Considerations
Although IoPVT-enabled safety monitoring systems consider various use cases to detect unsafe outdoor conditions at both the individual and community levels, these systems are increasingly integrated with advanced personal and environmental data acquisition devices, many of which are now commonplace in our daily lives. Sensors in public and semi-public spaces can collect detailed information about individual location, movement, and behavior data that can threaten personal freedoms or inadvertently perpetuate harmful discrimination if misused by bad actors. Therefore, balancing proactive security measures and the right to privacy regarding personal data is a critical challenge that requires sound policymaking, robust technical safeguards, and thoughtful public consultation [
115].
One of the biggest questions for these systems is what data should be collected, who owns those data, and how they may be used. Although fine-grained data yield the most exact information, they also generate more privacy threats [
116]. Clear lines should be drawn that define the permissible use of data by emergency services, health interventions, or infrastructure improvements and that leave no room for unauthorized or unethical use, such as intrusive surveillance or commercial profit without public benefit. Communities can adopt transparent data governance models that specify retention periods, anonymization standards, and data-sharing protocols, which can create a sense of security for local organizations and help maintain trust in the technology.
The potential for surveillance overreach represents a significant ethical concern. Excessive monitoring could create a chilling effect on public behavior, discourage the use of public spaces, or enable discriminatory profiling. To mitigate these risks, we advocate for implementing strict purpose limitations on data collection, establishing independent oversight committees with diverse representation, and conducting regular privacy impact assessments with public disclosure of findings. Systems must be designed with “privacy by default” principles [
117], collecting only what is necessary and automatically discarding nonessential data.
Equally concerning are the equity implications of IoPVT deployment. Historical patterns suggest that beneficial technologies often reach affluent neighborhoods first, while low-income and marginalized communities experience significant delays despite often facing greater safety risks. This digital divide could exacerbate existing social inequalities if not explicitly addressed in deployment strategies. We recommend developing equity-focused deployment frameworks that prioritize underserved areas, establish minimum service levels across all neighborhoods regardless of socioeconomic status, and include community benefit agreements that ensure technology investments create tangible improvements for local residents. Implementation plans should include specific metrics for measuring equitable coverage and require corrective action when disparities are identified.
Outdoor monitoring covers broad geographic spaces, complicating the mechanisms of individual consent, compared to more controlled spaces like hospitals or research laboratories [
118]. Local authorities and system developers can improve public trust in a system by making it more transparent. Clear signage indicating areas under sensor surveillance, publicly available dashboards featuring aggregated data, and outreach efforts to inform citizens of what data are being collected and how could alleviate these concerns. Bringing local communities into the conversation, for example by holding town hall meetings or setting up online forums, also offers citizens an opportunity to raise concerns, suggest improvements, and feel a sense of co-ownership in how the system is deployed and evolves.
AI-powered human activity recognition and predictive analytics can unintentionally systematize bias and exclusion if training data or algorithmic designs do not represent diverse populations and contexts [
119,
120]. This can result in the uneven distribution of safety resources, the misidentification of innocuous behaviors as suspicious, or the systematic exclusion of specific demographics from early warning systems. To combat these risks, system developers need to use inclusive datasets, regularly audit their algorithms for disparate impacts, and include a diverse group of stakeholders in the model development and vetting process. Responsible AI best practices, such as explainable models and continuous monitoring for potential bias, can be incorporated into IoPVT platforms to ensure that the safety benefits are delivered fairly and equitably.
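One simple form such an audit could take is a comparison of false positive rates across demographic or geographic groups. The sketch below is only an assumption about how this might be done: the group labels, the tuple layout, and both function names are hypothetical, and a real audit would likely use several fairness metrics rather than this one.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Compute the false positive rate per group.

    `records` is an iterable of (group, predicted_alert, actual_hazard) tuples.
    Only cases with no actual hazard count toward the denominator.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted_alert, actual_hazard in records:
        if not actual_hazard:
            negatives[group] += 1
            if predicted_alert:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items() if n > 0}


def max_fpr_gap(rates):
    """Largest difference in false positive rate between any two groups."""
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())
```

A large gap would indicate that innocuous behavior in one group is flagged more often than in another, which is the kind of disparity a periodic audit would need to surface and correct.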
There are also significant environmental justice considerations that must be addressed. Communities with historical exposure to environmental hazards may be justifiably skeptical of new monitoring systems. Implementation teams should acknowledge this history and work with community leaders to establish trust. Furthermore, the environmental footprint of the technology itself—including energy consumption, electronic waste, and resource extraction for manufacturing components—must be minimized and transparently reported as part of a comprehensive ethical framework.
Ultimately, ethical and privacy considerations are not simply additional constraints but rather foundational pillars of a sustainable IoPVT ecosystem. When handled transparently and collaboratively, these measures protect citizens’ rights, increase public trust, and facilitate the long-term viability of outdoor safety monitoring as a force for communal well-being. By centering equity, privacy, and ethical governance in system design from the outset, we can ensure that technological benefits are distributed justly and do not come at the expense of civil liberties or social cohesion.
6.4. Future Research Directions
The rapidly evolving landscape of IoPVT-enabled outdoor safety monitoring opens a wide range of opportunities for continued investigation and innovation. As systems scale in complexity and global reach, researchers and practitioners must push the boundaries of sensor technologies, data analytics, and policy frameworks to ensure that solutions remain robust, adaptable, and ethically grounded. This section details several key areas that warrant in-depth exploration.
6.4.1. Advanced Sensor Development and Integration
One of the most challenging paths from a research perspective is designing and improving sensor hardware robust enough to withstand various, often severe, environmental conditions. In addition to temperature and humidity sensors, other modalities such as chemical, biosensing, and acoustic sensing could introduce a new dimension to outdoor safety monitoring. For example, volatile organic compound (VOC) detection or particulate matter detection technologies could provide invaluable information about health risks related to air quality, particularly in urban and industrial areas [
121]. Furthermore, exploring the integration of wearable sensors that monitor vital signs, such as heart rate variability, blood oxygen levels, or stress indicators, could ultimately lead to more personalized and preventive health interventions.
In challenging environments, the durability of the sensors remains crucial [
122,
123]. Researchers can investigate new materials or packaging methods for sensor casings to extend their lifespan in extreme climates, such as the frigid winters of Montreal or the salty air of coastal regions. Investing in power efficiency is just as vital: advances in battery technology, wireless energy transfer, or energy-harvesting solutions (solar, kinetic, etc.) could significantly reduce maintenance costs and enhance long-term reliability. Meanwhile, increased miniaturization allows sensors to seamlessly integrate into infrastructure, such as building facades, streetlights, or public transportation stations, to optimize coverage while minimizing visual impact and installation challenges.
Another exciting frontier is the integration of more intelligence directly into the sensor nodes themselves [
124,
125]. This innovation allows these sensors, through a high level of onboard signal processing or lightweight AI capabilities, to autonomously filter out noise, flag anomalies, and even perform initial activity recognition rather than simply transmitting raw data for later collection and processing. This local intelligence minimizes network congestion by reducing the volume of data sent upstream for further analysis, thus enhancing response times and facilitating faster and more context-aware decision making. For example, a smart sensor in a remote coastal watchtower could identify and categorize unusual vibration patterns indicating structural stress or wave surges, sending timely alerts about these conditions instead of simply streaming raw data on every vibration the device logs. With advances in embedded hardware power and energy efficiency, these capabilities will not only conserve bandwidth and extend battery life but will also support a more scalable and distributed approach to monitoring outdoor safety.
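The idea of sending alerts instead of raw streams can be sketched with a rolling z-score filter running on the sensor node. This is a minimal sketch under stated assumptions: the class name, window size, warm-up length, and threshold are all hypothetical, and a deployed node might use a learned model rather than this simple statistic.

```python
from collections import deque
from statistics import mean, pstdev


class EdgeAnomalyFilter:
    """Flag readings that deviate strongly from a rolling baseline."""

    def __init__(self, window=20, threshold=3.0, warmup=5):
        self.history = deque(maxlen=window)  # recent normal readings
        self.threshold = threshold           # z-score needed to raise an alert
        self.warmup = warmup                 # readings required before alerting

    def process(self, value):
        """Return an alert dict for an anomalous reading, else None.

        None means nothing needs to be transmitted upstream.
        """
        alert = None
        if len(self.history) >= self.warmup:
            baseline = mean(self.history)
            spread = pstdev(self.history)
            if spread > 0 and abs(value - baseline) / spread > self.threshold:
                alert = {"value": value, "baseline": baseline}
        if alert is None:
            # Only normal readings update the baseline, so a single spike
            # does not distort later comparisons.
            self.history.append(value)
        return alert
```

Because anomalous readings are excluded from the baseline, a sustained level shift would keep raising alerts in this sketch; a production design would need a policy for re-baselining after a confirmed change.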
6.4.2. Next-Generation AI and Machine Learning
Many HAR algorithms are trained on standardized datasets that do not fully capture the complexity of real-world interactions [
69,
126]. Future research may need to focus on situation-specific AI systems capable of adapting to diversity in climate, geography, culture, and demographics. For example, what qualifies as “abnormal activity” in a crowded urban setting could differ significantly from that in a sparsely populated rural area. Adapting models to local conditions can reduce false positives, making actual safety threats harder to overlook.
Another well-recognized open problem for centralized data processing is the challenge of bandwidth, latency, and privacy. Federated learning offers a valuable approach to addressing this issue by enabling local devices and edge nodes to collaboratively train a model without sending the original data to the cloud [
127,
128]. This can enhance privacy and alleviate network constraints. Moreover, on-device analytics, including partial or complete inference on edge hardware, can improve responsiveness, which is especially beneficial in time-sensitive scenarios like accident detection or emergency alerts.
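The aggregation step of federated learning can be sketched with federated averaging (FedAvg), one common rule in which the server combines locally trained weights in proportion to each client's sample count. The text does not specify an aggregation algorithm, so FedAvg is an assumption here, as are the function name and the flat list-of-floats weight format.

```python
def federated_average(client_updates):
    """Sample-weighted average of client model weights.

    `client_updates` is a list of (weights, num_samples) pairs, where
    `weights` is a list of floats of equal length for every client.
    Only weights and counts reach the server; raw data stay on the device.
    """
    total_samples = sum(n for _, n in client_updates)
    if total_samples == 0:
        raise ValueError("at least one client must contribute samples")
    dimension = len(client_updates[0][0])
    averaged = [0.0] * dimension
    for weights, num_samples in client_updates:
        if len(weights) != dimension:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * num_samples / total_samples
    return averaged
```

For instance, a client with three times as many samples as another pulls the shared model three times as strongly toward its local weights.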
Alongside these advancements, privacy-preserving AI/ML techniques should be prioritized in future development and research [
129]. As HAR models become capable of recognizing fine-grained behaviors, they also become more likely to capture sensitive personal information. Techniques such as differential privacy, secure enclaves, or encrypted computation can protect user identity and mitigate data leakage, so that individuals do not lose control over the data from which such models learn. More generally, establishing robust data governance frameworks, providing clarity around consent handling, and implementing role-based access protocols can help protect against improper use or sharing of sensitive data. This approach promotes ethical practice while still supplying the diverse data necessary for cutting-edge models and insights in next-generation AI and machine learning.
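As one concrete instance of differential privacy, the Laplace mechanism adds calibrated noise to an aggregate before it is published. The sketch below applies it to a counting query, whose sensitivity is 1 because adding or removing one person changes the count by at most 1. The function name and the example of publishing a pedestrian count are hypothetical.

```python
import random


def private_count(true_count, epsilon, rng=None):
    """Release a count with Laplace noise of scale 1/epsilon.

    Smaller `epsilon` means stronger privacy and noisier output.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = 1.0 / epsilon  # sensitivity of a counting query is 1
    # The difference of two independent exponential samples with mean `scale`
    # follows a Laplace distribution with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise
```

A dashboard could then publish, say, the noisy number of pedestrians detected in a zone per hour, so that the presence or absence of any single individual has only a bounded effect on the released figure.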
6.4.3. Interoperable Architectures and Open Standards
The growing size and complexity of IoPVT systems increase their risk of fragmentation. Without common communication protocols or standardized data formats, achieving interoperability between diverse devices, networks, and administrative domains is becoming increasingly difficult [
130]. Exploring open standards, potentially within existing IoT frameworks, could enhance interoperability, lower integration costs, and promote wider adoption of these technologies.
Addressing large-scale environmental and disaster scenarios will likely require cooperation across multiple regions or even internationally. Standardized architectures and protocols could enable data sharing between different municipalities or countries. This alignment would improve collective safety response mechanisms and facilitate more comprehensive ecological data sets for comparative studies and the implementation of best practices worldwide.
6.4.4. Human-Centered Design and User Acceptance
Widespread adoption depends on public trust and acceptance [
131]. Future efforts should explore collaborative design strategies that involve community stakeholders, advocacy groups, and representatives of vulnerable populations (e.g., the elderly and individuals with disabilities). This approach ensures that technologies are developed with empathy for user needs and sociocultural factors. Pilot programs, focus groups, and living labs can provide insight into how citizens view these systems and what improvements could increase their comfort and engagement.
User-friendly interfaces, from mobile and web dashboards to AR/VR platforms, can clarify the data obtained from IoPVT systems and allow citizens to make informed decisions about their safety. Research can investigate innovative ways to visualize complex environmental data or activity metrics, potentially gamifying community participation. With intuitive control panels, citizens and policymakers can work together to establish thresholds, customize alerts, and actively shape their local safety landscape.
6.4.5. Enhanced Security and Ethical Frameworks
As IoPVT systems merge with essential urban infrastructure, the risks of a potential cyber attack increase [
106,
132]. To maintain data integrity, researchers must develop robust multilayered security solutions, including real-time anomaly detection, low-power device-optimized encryption protocols, and distributed ledger technologies (e.g., blockchain). Zero-trust architectures, in which every device and service is continually authenticated and monitored, can further enhance the overall system’s resilience.
An emerging vital aspect of securing IoPVT data, especially as they transition between physical and virtual realms, is the establishment of verifiable anchors that authenticate the integrity of digital representations against real-world conditions [
133,
134]. A promising approach involves using environmental fingerprints, such as distinctive patterns of noise, temperature variations, or even microchanges in barometric pressure, as unique identifiers that link the digital environment to actual physical-world data [
135,
136]. If environmental signatures are embedded in the data flow, then an attempt to tamper with or spoof a sensor and alter its readings can be detected, because the “environmental fingerprints” in the presented data will not match what is expected. This additional level of verification is crucial for digital twins and immersive platforms, where false or manipulated images or scenarios can ultimately lead to poor decisions or ineffective resource allocation. Cross-validation between virtual analytics and physical anchors thus helps ensure trust and coherence in IoPVT ecosystems.
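One way such a consistency check might look is to correlate the ambient trace attached to a data stream with a trusted reference trace recorded nearby over the same interval. This is a simplified sketch and not the method of the cited works: Pearson correlation is a stand-in for a real fingerprinting scheme, and the function names, the 0.9 threshold, and the barometric-pressure example are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def fingerprint_matches(reported, reference, min_corr=0.9):
    """True if the reported ambient trace tracks the trusted reference trace."""
    return pearson(reported, reference) >= min_corr
```

A stream whose embedded pressure trace is constant or unrelated to the reference would fail the check, flagging the associated readings for review before they update the digital twin.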
If not carefully audited, AI-driven decision making can inadvertently marginalize certain groups [
137,
138]. Future research should investigate strategies to reduce algorithmic bias, such as using diverse and representative training datasets or implementing continual fairness checks throughout model lifecycles. As global and regional regulations around data privacy (e.g., General Data Protection Regulation (GDPR)) and responsible AI intensify, research must explore frameworks that harmonize these technologies with evolving legal landscapes.
6.4.6. Real-World Pilot Studies and Longitudinal Evaluations
Academics, industry partners, local governments, and community organizations can collaborate to create real-world testbeds, or ‘living labs’, where IoPVT solutions are deployed and evaluated iteratively. This enables researchers to improve algorithms and optimize both sensor and user interface placements based on actual usage patterns, real-time feedback, and tangible results.
To advance from conceptual frameworks to practical implementations, we suggest several specific experimental designs and testable hypotheses. A multisite comparative deployment would establish a controlled study across three distinct environments (urban, rural, and coastal) with matched infrastructure components but customized sensing arrays. Each deployment would include a control zone with traditional monitoring systems alongside an intervention zone with full IoPVT implementation, with standardized metrics tracked over 12–24 months. We hypothesize that areas with IoPVT monitoring will demonstrate statistically significant reductions in accident rates compared to control zones. Furthermore, we predict that emergency response times in zones enabled by the IoPVT will noticeably decrease compared to traditional monitoring zones. A third hypothesis suggests that each type of environment will require different optimal sensor density configurations to achieve equivalent hazard detection accuracy.
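The first hypothesis could be evaluated, for example, with a one-sided Wald z-test on the difference between incident rates in the control and intervention zones. This sketch assumes that incidents are approximately Poisson distributed and that counts are large enough for a normal approximation; the function name and parameters are hypothetical, and a real study would likely also adjust for confounders.

```python
from math import erfc, sqrt


def rate_reduction_test(control_events, control_exposure,
                        iopvt_events, iopvt_exposure):
    """One-sided Wald test that the IoPVT zone has a lower incident rate.

    Exposure is the observation time (e.g., months) for each zone.
    Returns (z, p_value); a small p_value supports a reduction.
    """
    rate_control = control_events / control_exposure
    rate_iopvt = iopvt_events / iopvt_exposure
    # For a Poisson count x over exposure t, Var(x / t) is estimated by x / t**2.
    variance = (control_events / control_exposure ** 2
                + iopvt_events / iopvt_exposure ** 2)
    if variance == 0:
        raise ValueError("no incidents observed in either zone")
    z = (rate_control - rate_iopvt) / sqrt(variance)
    p_value = 0.5 * erfc(z / sqrt(2))  # upper tail of the standard normal
    return z, p_value
```

With invented counts of 40 incidents in a control zone and 20 in an intervention zone over 12 months each, the statistic is about 2.58 and the one-sided p-value falls below 0.01.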
An incremental technology integration protocol would introduce the IoPVT components sequentially to isolate their individual contributions. This phased experimental approach would begin with physical sensors only, then add digital twin capabilities, followed by predictive analytics integration, and finally implement full Metaverse visualization. We hypothesize that the addition of digital twin capabilities will improve the accuracy of hazard prediction compared to physical sensors alone. Furthermore, stakeholder decision quality (measured by simulation exercises) will likely improve most significantly after the introduction of Metaverse visualization, with expected improvements in decision speed and accuracy. Despite the increased capabilities with each phase, we anticipate that system maintenance costs will decrease progressively as the integration becomes more sophisticated and efficient.
The adaptive response validation framework would test the system’s ability to evolve with changing conditions by introducing controlled environmental variables such as simulated weather events and crowd gatherings. Researchers would measure the system’s adaptation in sensing parameters, alert thresholds, and response recommendations on multiple time scales, from immediate to seasonal adjustments. We hypothesize that the IoPVT system will automatically adjust the monitoring parameters within 30 min of significant environmental changes. As the system learns and refines its operations, false positive rates for hazard detection should decrease significantly after three months compared to the initial deployment. Community stakeholder trust, measured through standardized surveys, would likely increase in proportion to the adaptation capabilities demonstrated by the system.
These experiments should employ rigorous quantitative and qualitative evaluation methodologies that include safety outcome metrics (incident rates, near-miss frequency, response times), technical performance metrics (accuracy, precision, recall, latency, uptime), user experience metrics (stakeholder surveys, usability testing, decision quality assessment), and economic indicators (implementation costs, maintenance requirements, return on investment). Short-term pilot programs provide helpful snapshots of system capabilities, but understanding the full impact requires longitudinal studies spanning multiple seasons or years. By systematically tracking these metrics over extended periods, researchers can quantify how effectively IoPVT systems reduce accidents, improve public perception, and adapt to shifting population patterns or climate conditions. These longitudinal data will also facilitate the refinement of predictive models and optimal intervention strategies.
The experimental designs and hypotheses outlined above provide a structured approach to transitioning the HARISM framework from a theoretical concept to evidence-based implementation. By establishing clear metrics and testable predictions, future research can systematically validate the framework’s effectiveness across diverse environments and use cases, addressing the practical implementation concerns raised by reviewers.
7. Conclusions
This paper has presented a comprehensive vision for IoPVT-enabled outdoor safety monitoring, illustrating how the combination of HAR, sensors, predictive analytics, and virtual environments can significantly improve public safety and well-being. By examining the three-dimensional interactions among people, technology, and the environment, we emphasized the need for a more holistic response, one that addresses the unique challenges of local contexts, whether urban, rural, or coastal. In this way, we demonstrated how state-of-the-art sensors, AI-powered analytics, immersive interfaces, and robust infrastructures not only connect but also integrate to create a system of data-driven actions that can mitigate risks and respond more quickly in times of need. At the heart of this framework is the emphasis on scalability, reliability, and ethical deployment. As technological architecture evolves, from edge computing and federated learning to sensor miniaturization, our understanding of data governance, privacy, and interdisciplinary collaboration must also evolve. The discussion of challenges and areas for future research highlights the necessity of forming partnerships among technologists, policymakers, community stakeholders, and domain experts. In doing so, we can ensure that the solutions we propose to address safety challenges not only meet urgent needs but also align with societal values and inclusivity.
The framework proposed in this paper has significant implications for safety monitoring in diverse contexts around the world. As urbanization accelerates globally, with projections indicating that 68% of the world’s population will live in urban areas by 2050 [
139], the need for robust outdoor safety systems transcends geographical and developmental boundaries. The integration of HAR with the IoPVT offers a scalable and adaptable solution that can be implemented in both developed and developing nations, addressing universal public safety challenges while accommodating local environmental, cultural, and infrastructural variations. In high-income regions, this approach can enhance existing smart city initiatives and provide more sophisticated risk prevention. In emerging economies, it offers opportunities for technological leapfrogging by implementing advanced safety systems that require less physical infrastructure than traditional approaches. Furthermore, as climate change intensifies extreme weather events globally, the predictive capabilities of our framework become increasingly valuable for communities around the world facing unprecedented environmental challenges.
The envisioned system is not merely about surveillance and sobering statistics; it is about mobilizing citizens to pursue new solutions while sustaining the system’s democratic character. Using real-time data, predictive modeling, and collective intelligence, communities can better predict emerging hazards, allocate resources more efficiently, and create policies that promote resilience and vitality. As the IoPVT ecosystem continues to evolve in terms of complexity and scale, the lessons learned from this vision paper will aid in conducting research, piloting implementations, and informing policies that enable cities and communities worldwide to transform into safer, healthier, and more sustainable places to live.