1. Introduction
The Internet of Things (IoT) market is growing rapidly, with the number of connected devices projected to reach 40 billion by 2030 [1]. Among them, wireless IoT devices have become particularly popular due to their flexibility and ease of deployment. However, their widespread availability and low cost have also raised serious security and privacy concerns. A prominent example is hidden cameras, often disguised as everyday household items, which are designed to record individuals without their knowledge or consent. Many of these devices now support Wi-Fi connectivity, allowing remote monitoring and further complicating detection. A 2024 survey by smart home company Vivint found that 8% of short-term rental property owners admitted to installing hidden cameras in their listings [2].
Currently, hidden IoT devices pose significant security threats, such as privacy breaches, yet existing detection technologies often yield unsatisfactory results. Commercial detection tools perform poorly in real-world environments, struggling to accurately identify target devices and often producing high rates of both false positives and false negatives [3]. Traditional wireless camera detection methods, such as infrared lens scanning and manual physical searches, require thorough inspection of indoor environments. However, these methods have limited detection ranges and low efficiency, making it difficult to effectively identify hidden devices. Detection technologies for hidden IoT devices are increasingly moving toward automation and intelligence, with a core focus on identifying distinctive operational signatures emitted by the devices. Such methods can be uniformly categorized as feature-based detection methods, including those based on electromagnetic radiation, sensor detection, and network traffic analysis.
Electromagnetic radiation-based detection methods: IoT devices generate electromagnetic radiation signals with specific patterns during operation. Camera devices process captured raw images (e.g., through encoding and compression) in their built-in read/write storage and transmit the images to the target receiver or storage medium, during which corresponding electromagnetic radiation signals are generated. By analyzing the electromagnetic radiation characteristics during this process, devices can be detected and located [4]. Additionally, fluctuations in electromagnetic radiation caused by changes in the camera’s field of view can also serve as an important indicator for determining its spatial location [5]. The operating frequency of the device’s internal processor clock also affects the overall electromagnetic radiation characteristics [6]. Devices equipped with DDR3 or DDR4 memory modules can be effectively distinguished based on their differing radiation characteristics [7].
Sensor-based detection methods: Such detection systems rely on various integrated sensors to identify and locate devices, including millimeter-wave sensors [8,9], radio frequency detectors [10,11], thermal imaging sensors [12], and time-of-flight sensors [13]. By acquiring and analyzing signal responses generated by target devices under different physical conditions, it is possible to effectively identify specific devices or their operational states.
Traffic-based detection methods: Detection strategies based on traffic analysis offer diverse implementation modes and data carriers. Detection environments can be divided into two major categories: access networks and non-access networks, each further subdivided into encrypted traffic and non-encrypted traffic scenarios. In practical applications, non-access network scenarios are more common, and the encryption status of traffic is typically unknown in advance. In such environments, research focuses on identifying network traffic changes caused by covert devices (e.g., hidden cameras) to infer their locations [14,15]. To this end, some studies have analyzed network information at different layers, such as extracting traffic features from physical or MAC layer headers [16,17], identifying discriminative statistical features from encrypted traffic [18,19], or utilizing Channel State Information (CSI) to model device behavior and environmental distribution with greater precision [20,21].
Although detection technologies based on electromagnetic radiation and sensors perform well in controlled experimental settings, such technologies require specialized equipment, and users must incur significant costs to use them, which raises the barrier to adoption. On the other hand, although network traffic analysis methods show theoretical and experimental applicability, they typically require a lengthy data collection process, thereby increasing users’ time-related detection costs. As such, these techniques are unsuitable for real-time detection of hidden cameras and other IoT devices.
In the modern information technology environment, portable smart devices such as smartphones, tablets, and laptops offer an effective and convenient means for detecting and identifying potential hidden devices. Recent research findings have validated the feasibility of implementing such detection on smartphones or personal computers [22,23,24,25]. Wi-Fi-based detection technology can capture various types of information, including Received Signal Strength Indicator (RSSI), Time of Flight (ToF), and CSI. These signals not only enable the identification of IoT devices but also play a crucial role in determining device locations. However, such methods require collecting large volumes of signal data, which increases the time required for data collection.
Moreover, some detection methods have specific requirements for the devices themselves, such as the need for special lenses, which limits their applicability to certain device types and makes it difficult to conduct universal detection in unfamiliar environments. Additionally, due to the limitations of their detection principles, such methods can typically only identify individual devices within a limited range, making them unsuitable for use in non-cooperative and network-disconnected unfamiliar environments. To address these challenges, this paper proposes an RSSI-based detection and localization method for hidden IoT devices and develops a prototype system that can quickly detect, identify, and localize hidden IoT devices in unfamiliar environments without requiring the system to access the network environment where the hidden devices are located. This is achieved through active sniffing of non-connected networks. Additionally, the prototype system incorporates augmented reality (AR) technology and includes an easy-to-install Android application that can be conveniently used on user terminals such as smartphones.
Table 1 presents a comparative analysis of existing technologies and the work described in this paper. The distance error represents the average error between the final prediction and the actual location, while the time overhead denotes the time required for detection and localization. Entries marked with an asterisk (*) give the duration needed to localize a single known hidden device, whereas entries without an asterisk give the average time required for a complete scan of the entire room area to detect potentially hidden devices. It should be noted that the time consumption of room scanning varies depending on factors such as the number and distribution of devices. Therefore, the values provided here are averages under the given experimental settings and are primarily used for relative comparisons between methods.
The system design aims to simplify the operation process, improve detection accuracy, and enhance applicability. To address the issues of high time consumption, dependence on specific device features, and low localization accuracy in current mainstream methods, the following three main designs are proposed:
Non-cooperative active data collection mechanism: To address the issue of excessive time consumption caused by reliance on passive data collection in existing detection methods, this study proposes a non-cooperative active packet sniffing technique based on the RTS/CTS interaction mechanism. This technique dynamically adjusts its signal collection strategy based on the number of received packets, enabling efficient acquisition of sufficient data from all hidden devices within a short data collection cycle, thereby providing data support for subsequent detection and analysis.
Efficient device identification algorithm: Due to the limited number of feature fields carried in CTS frames and some Wi-Fi Beacon frames, traditional methods struggle to meet the requirements for information dimensions and statistical characteristics. To address this issue, this paper designs an IoT device classification algorithm based on machine learning. This algorithm uses collected channel fingerprints as input features, effectively enabling identification and classification of different types of IoT devices, significantly improving the system’s discrimination accuracy in complex environments.
High-Precision Device Localization Method: To address the limitations of existing methods in terms of device localization accuracy, a distance estimation model based on RSSI is constructed to accurately estimate the distance between the user and hidden IoT devices. Combined with an iterative localization algorithm, this approach guides users to gradually approach the hidden IoT devices, thereby identifying their locations.
The structure of this paper is arranged as follows: Section 2 provides a brief introduction to the system scenario. Section 3 presents the system design and architecture in detail. Section 4 describes the key implementation techniques. Section 5 evaluates the system performance. Finally, Section 6 concludes the paper and outlines directions for future work.
2. System Overview
2.1. Threat Scenario
In this threat scenario, a malicious attacker covertly deploys multiple hidden IoT devices in an indoor environment, such as a rental room, with the intent to monitor and eavesdrop on unsuspecting users. These users, unfamiliar with the environment and lacking professional security knowledge, are often unaware of the presence of such devices.
As illustrated in Figure 1, a user enters an unfamiliar room that has been deliberately configured by an attacker. Multiple concealed IoT devices are deployed throughout the environment with the intent to monitor user behavior and eavesdrop on conversations.
The following sections introduce the threat scenario in detail from the perspectives of the network environment, attacker capabilities, and user capabilities.
Network Environment: In an unfamiliar environment, attackers first deploy hidden IoT devices with Wi-Fi functionality and wireless routers. Attackers use the Wi-Fi network to remotely control the hidden IoT devices. Once ordinary users enter this environment, they cannot connect to the hidden devices’ Wi-Fi network or detect their network traffic using detection tools.
Attacker Capabilities: Attackers possess advanced network control privileges, enabling them to independently select and configure encryption protocols (such as WPA2, WPA3) and restrict users’ access to the wireless network within the room, thereby creating a closed and opaque network environment. Additionally, attackers can hide the SSID, modify Beacon frames, enable client isolation, and implement access control policies to reduce the likelihood of detection from the source. Through device spoofing, signal interference, and anonymous encrypted channels, attackers can further enhance their stealth and anti-detection capabilities, rendering traditional sensing-based detection methods ineffective.
User capability limitations: As victims, users typically lack the professional knowledge and technical means necessary to identify covert devices. These devices are often carefully disguised, making them difficult to detect with the naked eye. Additionally, users cannot access the network where the devices are located and can only use their own portable electronic devices (e.g., smartphones or laptops), which lack the necessary monitoring and analysis tools. The device models, types, communication channels, and password information are all unknown.
2.2. System Workflow
When users enter this unfamiliar environment, they can run the system developed in this study on a smart device and move freely around the room to collect channel data from devices in the current environment. This process provides the data foundation for the system’s subsequent device identification and localization. The channel data collection mechanism leverages the “polite behavior” characteristic of IoT devices communicating over a Wi-Fi network: when these devices receive a “Request to Send” (RTS) frame addressed to their MAC address, they respond with a “Clear to Send” (CTS) frame. Each captured CTS frame is accompanied by an RSSI value indicating its received signal strength. By statistically analyzing the RSSI values of these CTS frames and applying a machine-learning-based classification model, the system identifies the type of device emitting the signal. Subsequently, the system estimates the location of hidden devices using an RSSI-based ranging model and multi-point positioning algorithms. Finally, the system employs AR visualization to display the detected device locations in real time on the user’s device, enabling ordinary users to perceive and locate hidden IoT devices in unfamiliar environments without professional expertise.
3. System Composition
This section presents the architecture of the implemented hidden device detection and localization system, which is efficient and deployable on smart terminals via an Android application. The system has the ability to actively detect wireless network traffic and includes three functional modules: a data sniffing module, a device fingerprint recognition module, and a hidden device localization module. The system architecture is shown in Figure 2.
3.1. Data Sniffing Module
The data sniffing module is designed to rapidly detect and collect data from potential hidden devices in a wireless environment. Its core task is to obtain sufficient wireless communication characteristic data within a short timeframe to support subsequent device identification and localization. Traditional passive sniffing methods primarily rely on Beacon frames periodically broadcast by Wi-Fi devices for identification. Beacon frames are management frames actively sent by access points (APs) to notify their presence, transmit network identifiers (such as SSID), channel information, and encryption methods, among other basic network parameters. However, since the transmission frequency of Beacon frames is controlled by the devices themselves, it is challenging to collect sufficient data within a short timeframe, particularly in scenarios involving low-activity devices such as hidden cameras.
To address this issue, the system employs an active sniffing technique based on the RTS/CTS mechanism, leveraging the “courtesy response” feature of the Wi-Fi physical layer [26] to induce target devices to quickly return CTS frames without requiring device authorization. Specifically, the detecting device sends a large number of validly structured RTS frames to the target MAC address, prompting it to return CTS frames, thereby enabling high-frequency data response collection in an extremely short timeframe. Additionally, the system simultaneously collects Wi-Fi Beacon frames and terminal Visual-Inertial Odometry (VIO) data to enhance overall data dimensionality and spatial perception capabilities. VIO is applied in AR to enable real-time estimation of a device’s three-dimensional position information and motion trajectory in unknown environments.
The complete sniffing process consists of the following steps: (1) rapidly identifying the MAC addresses of active devices through channel scanning, with an average monitoring time of 3–5 s per channel; (2) constructing targeted RTS frames and switching to the corresponding channel for transmission; and (3) receiving and recording the CTS frames returned by the target devices. Compared to traditional passive sniffing, this active sniffing method can flexibly adjust the RTS frame transmission frequency based on actual requirements, enabling more precise control over data collection rates. It does not depend on the target device’s configuration and only requires RTS/CTS functionality at the MAC layer to operate, thereby providing higher adaptability and data collection efficiency.
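As a minimal sketch of steps (2) and (3), the following Python builds the 16-byte header of an IEEE 802.11 RTS control frame (without the frame check sequence, which the radio normally appends) and recognizes a returned CTS frame. The helper names `mac_to_bytes`, `build_rts`, and `parse_cts_receiver`, the duration value, and the example addresses are illustrative assumptions; frame injection through a monitor-mode interface is not shown.

```python
import struct

# Frame Control values (little-endian 16-bit): protocol version 0,
# type 1 (control), subtype 11 (RTS) or 12 (CTS), no flags set.
RTS_FRAME_CONTROL = 0x00B4
CTS_FRAME_CONTROL = 0x00C4
TYPE_SUBTYPE_MASK = 0x00FC


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)


def build_rts(receiver: str, transmitter: str, duration_us: int = 100) -> bytes:
    """Build an RTS header: Frame Control, Duration, RA (target), TA (sender)."""
    header = struct.pack("<HH", RTS_FRAME_CONTROL, duration_us)
    return header + mac_to_bytes(receiver) + mac_to_bytes(transmitter)


def parse_cts_receiver(frame: bytes):
    """Return the RA of a CTS frame, or None if the frame is not a CTS.

    A CTS carries only a receiver address, which equals the transmitter
    address of the RTS that triggered it.
    """
    if len(frame) < 10:
        return None
    frame_control, _duration = struct.unpack_from("<HH", frame)
    if frame_control & TYPE_SUBTYPE_MASK != CTS_FRAME_CONTROL:
        return None
    return ":".join(f"{b:02x}" for b in frame[4:10])
```

Because a CTS frame names only its receiver, a sniffer built along these lines would attribute each CTS to a target through the transmitter address it placed in the corresponding RTS.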
3.2. Device Fingerprinting Module
In non-cooperative environments, the device fingerprinting module is designed to automatically identify IoT device types based on wireless frame behavioral characteristics. It specifically addresses the limitation of traditional methods, which rely on the rich information contained in Beacon frames but lack adaptability to hidden devices. Since this system primarily sniffs CTS frames, which have a relatively simple structure and carry limited information, it faces challenges such as sparse identification features and low device variability. Therefore, accurate and comprehensive feature collection and preprocessing form the critical foundation of the module design. The module’s workflow is illustrated in Figure 3. The system constructs features primarily in three parts: frame attribute features, RSSI statistical features, and behavioral features, which are used as channel fingerprints for training.
The system first performs high-frequency sniffing of CTS frames to extract all metadata fields, including frame arrival time, frame length, channel number, RSSI values, and other frame attribute features. To enhance the discriminative capability of the features, the system filters fields with similar value distributions across devices (such as certain AP identifier fields) and retains only frame attributes with distinct differences, thereby constructing the original frame attribute feature set.
Additionally, the system repeatedly collects the RSSI sequence of the same device at multiple spatial locations to form sufficiently stable spatial signal distribution features, providing high-quality samples for subsequent model training and constructing an RSSI statistical feature dataset.
After completing the initial feature screening and RSSI data collection, the system proceeds with behavior feature extraction. The system characterizes the device’s response strength and activity level based on the ratio of received CTS frames to transmitted RTS frames. This ratio exhibits significant differences across different device types, effectively aiding in device differentiation.
Finally, all frame attribute features, RSSI statistical features, and behavioral features are integrated into a standardized feature vector, which is then fed into an XGBoost classification model for training and prediction, enabling precise identification of typical IoT device types such as cameras, smart doorbells, and smart plugs within a wireless environment. Experimental results demonstrate that this method maintains high recognition accuracy even in low-information frame environments, exhibiting excellent robustness and generalization capabilities.
3.3. Hidden Device Location Module
The hidden device localization module enables precise localization of wireless hidden devices in complex indoor environments, addressing the performance issues caused by large RSSI fluctuations and low localization accuracy in current methods. Some existing approaches employ a “maximum RSSI value localization” strategy, which identifies the location of the device as the point with the strongest signal. However, this method is only effective at extremely close distances and is susceptible to obstruction and signal reflection, resulting in insufficient robustness.
To address this, the proposed module combines RSSI ranging with multi-point localization, incorporating VIO information to enhance overall localization stability and accuracy. In spatial localization, the position of a target can be determined using the distances and coordinates of at least three reference points. Based on this principle, an RSSI-based ranging model is trained to estimate the distance to the hidden device using the RSSI sequence collected at the current location. By combining data from multiple locations, the system can ultimately determine the position of the hidden device. The overall workflow of the hidden device localization module is shown in Figure 4.
The feature space for the ranging model is constructed by analyzing the distribution characteristics of RSSI values, and a supervised learning approach is employed to train the model. This model enables real-time distance estimation for device localization. During the positioning process, the system combines RSSI measurement values with user position information constructed using VIO to form a multidimensional data pair: (x, y, z, RSSI). Among these, the coordinates (x, y, z) represent the user’s current spatial displacement relative to the initial startup position—the system uses the user’s location when the application is launched as the origin of the coordinate system, and the VIO module records the user’s three-dimensional displacement relative to that point in real time to construct a local coordinate system; RSSI represents the signal strength received at the current location. To ensure data alignment, the system synchronously records the timestamps of each data point and uses this to fuse RSSI data with location information. Once a sufficient number of data points are collected, the system uses the pre-trained ranging model to estimate the distance from each point to the target device. These estimates are then fed into a least squares algorithm to compute the initial position of the hidden device.
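The timestamp-based fusion of RSSI readings with VIO positions can be sketched as a nearest-timestamp join. This is an illustrative sketch only: the function name `fuse_by_timestamp`, the tuple layouts, and the `max_gap` tolerance are assumptions, not details given in the paper.

```python
from bisect import bisect_left


def fuse_by_timestamp(vio_samples, rssi_samples, max_gap=0.05):
    """Pair each RSSI reading with the VIO pose closest in time.

    vio_samples:  list of (t, x, y, z), sorted by timestamp t (seconds)
    rssi_samples: list of (t, rssi)
    max_gap:      assumed tolerance; readings with no pose within this
                  many seconds are dropped
    Returns a list of (x, y, z, rssi) tuples.
    """
    times = [sample[0] for sample in vio_samples]
    fused = []
    for t, rssi in rssi_samples:
        i = bisect_left(times, t)
        # Only the poses just before and just after t can be the nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(times[k] - t))
        if abs(times[j] - t) <= max_gap:
            _, x, y, z = vio_samples[j]
            fused.append((x, y, z, rssi))
    return fused
```

Dropping readings that have no pose within the tolerance keeps stale positions from being attached to fresh RSSI values, at the cost of discarding some samples.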
4. Key Technology Implementation
4.1. Data Collection Based on RTS/CTS Active Sniffing
In non-cooperative environments, the primary challenge in detecting and localizing wireless covert devices is the lack of a controllable, stable source of frame-level data input. Traditional sniffing strategies rely mainly on Beacon frames broadcast by IoT devices. However, the transmission intervals of these management frames are highly unpredictable, affected by factors such as device activity, channel congestion, and power-saving mechanisms. This is particularly problematic in low-activity scenarios like hidden cameras, where severe data shortages and prolonged convergence times often occur.
To overcome these limitations, this paper proposes an active sniffing technique based on the RTS/CTS mechanism. The core idea is to induce target devices to generate and broadcast response CTS frames by sending structured RTS frames, without relying on the cooperation of target devices. Since the RTS/CTS mechanism is a standard feature of the Wi-Fi protocol for collision avoidance, devices with this function enabled will automatically respond with a CTS frame upon receiving a valid RTS frame, even when not connected to a network, thereby generating a large number of identifiable frames in a short time, significantly improving the timeliness and frame density of data collection.
The key to implementing this mechanism lies in the validity of RTS frame construction and the controllability of response packets. In this design, the Receiver Address field in the RTS frame header is always populated with the target MAC address during frame construction, and the frame format, frame sequence number, and control fields are ensured to comply with the IEEE 802.11 [27] protocol specifications, thereby maximizing the probability of response packets. To address the nonlinear relationship between RTS transmission frequency and CTS response, a feedback-driven frequency adjustment algorithm is designed. This algorithm uses the number of valid CTS responses received per unit time as feedback, employing the following iterative update strategy:
The update rule is expressed in terms of the RTS frame transmission frequency in the t-th round, the lower and upper response thresholds, two adjustment factors, the frequency fine-tuning step size, and the minimum frequency limit allowed by the system. This strategy effectively mitigates the risk of channel congestion caused by excessive RTS transmission frequency while maintaining a high CTS response rate. A frequency rollback condition (the third case of the update rule) is also set to address the following scenario: if the number of CTS responses does not increase after the frequency is increased, the current frequency is considered to have approached the channel bottleneck, and the system actively reduces the frequency to avoid channel overload. This mechanism not only enhances the stability of the frequency adjustment process but also adapts to the minimum frame interval constraints of different devices, improving the system’s robustness in high-density device environments.
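A minimal sketch of a feedback-driven rate controller consistent with the description above is given here. The branch order, the multiplicative updates, and every name (`next_rts_rate`, `low`, `high`, `up`, `down`, `step`, `min_rate`) are assumptions made for illustration and may differ from the paper's exact update rule.

```python
def next_rts_rate(rate, cts_count, prev_cts_count, rate_was_increased, *,
                  low, high, up, down, step, min_rate):
    """Return the RTS transmission rate for the next round.

    rate:               current RTS rate (frames per second)
    cts_count:          valid CTS responses received in this round
    prev_cts_count:     valid CTS responses received in the previous round
    rate_was_increased: True if the rate was raised before this round
    low, high:          lower and upper response thresholds
    up, down:           multiplicative adjustment factors (up > 1 > down > 0)
    step:               fine-tuning step used for the rollback
    min_rate:           minimum rate allowed by the system
    """
    # Rollback: the rate went up but responses did not, so the channel is
    # assumed to be near its bottleneck.
    if rate_was_increased and cts_count <= prev_cts_count:
        return max(min_rate, rate - step)
    # Too few responses: probe faster to gather data sooner.
    if cts_count < low:
        return rate * up
    # More responses than needed: slow down to reduce channel load.
    if cts_count > high:
        return max(min_rate, rate * down)
    # Within the target band: keep the current rate.
    return rate
```

Checking the rollback condition first means a failed increase is undone before any further increase is attempted, which is one simple way to avoid driving the channel into congestion.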
4.2. Efficient Device Type Identification Based on CTS Frame Response Behavior
In non-cooperative wireless sniffing scenarios, accurately identifying hidden device types in the absence of rich control and management frames (e.g., Beacon, Probe, Association) poses a major challenge for building a universal device identification system. Most existing methods rely on high-dimensional features such as vendor information fields, device names, and frame identification numbers in Beacon frames for classification. However, in this system, the sniffing mechanism primarily triggers devices to respond with CTS frames, which have an extremely simplified structure, containing only a fixed-length MAC layer control field and signal strength (RSSI) information, significantly limiting the extraction space for traditional fingerprint features.
To address this issue, this paper proposes a device fingerprint modeling method suitable for low-redundancy frame environments. The core idea is to construct a feature set with distinct differences under the minimal frame structure and use gradient-boosted decision trees (XGBoost) as the classifier to achieve type recognition of typical IoT devices such as cameras, doorbells, and sockets.
4.2.1. Feature Extraction and Frame Behavior Modeling
The system first continuously collects CTS frames emitted by target devices at multiple spatial locations and extracts frame-level raw metadata, including fields such as reception timestamp, RSSI value, and frame length. Since the CTS frame structure is fixed, the system focuses on three types of information sources during the feature engineering phase:
Frame attribute features: Metadata information extracted from CTS frames, such as frame arrival time, length, channel number, RSSI values, etc., reflecting the transmission characteristics of the frame. The system retains fields with device-specific differences and removes features with similar distributions to construct a distinctive raw feature set.
Spatial signal distribution features: For each device’s RSSI sequence generated along its spatial sampling trajectory, the system calculates statistical characteristics (mean, standard deviation, skewness, kurtosis, etc.) to form a channel stability fingerprint.
Frame response behavior features: Define the response rate metric as the ratio $N_{\mathrm{CTS}} / N_{\mathrm{RTS}}$, where $N_{\mathrm{RTS}}$ is the number of actively transmitted RTS frames and $N_{\mathrm{CTS}}$ is the corresponding number of received CTS frames. This metric reflects the device’s response strength to active probing, with significant differences observed across different device types in this dimension, such as stable responses from smart plugs and intermittent responses from cameras. During the feature fusion stage, the system filters out fields with minimal variation across devices (e.g., control bits with fixed values) and retains only features with distributional differences across categories to avoid classification noise introduced by “pseudo-features.”
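The RSSI statistical features and the response-rate feature described above can be sketched in plain Python. The function names and the choice of population (rather than sample) moments are assumptions; the paper does not specify which estimators it uses.

```python
import math


def rssi_statistics(rssi):
    """Return (mean, std, skewness, kurtosis) of an RSSI sequence.

    Population moments are used; skewness and kurtosis are set to 0.0
    for a constant sequence, where they are undefined.
    """
    n = len(rssi)
    if n == 0:
        raise ValueError("empty RSSI sequence")
    mean = sum(rssi) / n
    m2 = sum((v - mean) ** 2 for v in rssi) / n
    m3 = sum((v - mean) ** 3 for v in rssi) / n
    m4 = sum((v - mean) ** 4 for v in rssi) / n
    std = math.sqrt(m2)
    if std == 0.0:
        return mean, 0.0, 0.0, 0.0
    return mean, std, m3 / std ** 3, m4 / std ** 4


def response_rate(n_cts, n_rts):
    """Ratio of received CTS frames to transmitted RTS frames."""
    if n_rts <= 0:
        raise ValueError("no RTS frames were transmitted")
    return n_cts / n_rts


def build_feature_vector(rssi, n_cts, n_rts):
    """Concatenate RSSI statistics and the response rate into one list."""
    return [*rssi_statistics(rssi), response_rate(n_cts, n_rts)]
```

Frame attribute features such as frame length or channel number would be appended to the same vector before standardization.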
4.2.2. Classification Model Design and Training Strategy
All constructed features are standardized and combined into a feature vector, which is input into the pre-trained XGBoost classification model. The reasons for selecting this model include: (1) strong expressive power for non-linear feature distributions; (2) robustness to missing values and outlier features, enabling adaptation to differences in data responses across devices; (3) built-in feature importance assessment mechanism, facilitating explanation of model decision logic.
During the training data collection phase, multiple representative spatial locations were selected, and response frames of the target devices were collected from different angles. Device labels were used as supervisory information to form the training sample set. To enhance generalization capabilities, the training samples included multiple devices of the same type (e.g., cameras of different brands) under various scenarios, and interference factors such as occlusion, rotation, and direction changes were added to enhance the model’s adaptability to real-world deployment environments.
The resulting training sample set consists of feature vectors paired with their device-type labels.
4.3. Iterative Localization of Hidden Devices Based on RSSI-Based Ranging Model
In complex indoor environments, the precise localization of wireless hidden devices poses significant challenges. Although RSSI theoretically has an inverse relationship with device distance (i.e., higher RSSI values indicate closer distances), this relationship is disrupted in practical applications by various factors such as device characteristics, environmental obstructions, and signal multipath effects, leading to substantial errors in distance estimation. As shown in Figure 5, the RSSI value does not decrease smoothly with distance due to environmental noise and interference. Additionally, RSSI data is inherently one-dimensional discrete data. If the mean or median is directly used for estimation, it often fails to accurately map the actual distance, limiting its application in high-precision localization.
To overcome these uncertainties, this paper proposes a distance estimation model based on RSSI probability distribution modeling, supplemented by a multi-point iterative localization strategy, to enhance the stability and accuracy of localization. The core idea is to replace the instantaneous RSSI value with the distribution probability characteristics of RSSI values, extracting the steady-state statistical patterns of signals at different spatial positions to perform robust distance estimation between the user and the target device.
4.3.1. RSSI Probability Feature Modeling
Assuming that RSSI samples collected at a fixed distance d follow an unknown distribution, this paper introduces a local neighborhood probability accumulation strategy to construct a discrete feature vector. Specifically, the occurrence probability of each RSSI value is defined following [28]. Starting from the minimum RSSI, the following features are constructed:
This structure can effectively resist RSSI disturbances and noise, demonstrating superior generalization capabilities in experiments. By collecting RSSI sample sequences at multiple known distance locations and calculating their local probability characteristics, a mapping relationship between RSSI distribution and distance can be constructed, which is then used to train a supervised learning ranging model. The model adopts the XGBoost architecture, leveraging its strong expressive capabilities for nonlinear relationships and discrete inputs to achieve efficient fitting between RSSI spatial signals and actual physical distances.
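One possible reading of the local neighborhood probability accumulation is sketched here: the empirical probability of each integer RSSI value is summed over consecutive windows, starting from a fixed minimum RSSI. The function name, the fixed range of -100 to -20 dBm, and the 5 dB window width are assumptions for illustration, not values taken from the paper.

```python
from collections import Counter


def rssi_probability_features(samples, rssi_min=-100, rssi_max=-20, width=5):
    """Turn RSSI samples into a fixed-length vector of windowed probabilities.

    Each feature is the fraction of samples whose rounded RSSI falls in
    [start, start + width), for start = rssi_min, rssi_min + width, ...
    Samples outside [rssi_min, rssi_max) contribute to no window.
    """
    if not samples:
        raise ValueError("no RSSI samples")
    counts = Counter(int(round(s)) for s in samples)
    total = len(samples)
    features = []
    for start in range(rssi_min, rssi_max, width):
        mass = sum(counts.get(v, 0) for v in range(start, start + width))
        features.append(mass / total)
    return features
```

Summing probabilities over a neighborhood rather than using single values means a reading that jitters by one or two dB usually stays in the same window, which is the kind of noise tolerance the text attributes to this feature structure.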
4.3.2. Spatial Sampling and Multi-Point Positioning Strategy
To further achieve spatial positioning, this paper designs a positioning method based on multi-point 3D RSSI data fusion. The system integrates a VIO module into the user terminal to record the terminal’s position change vector $\mathbf{p}_i$ in the local 3D coordinate system in real time. At each sampling point, the system synchronously collects RSSI values and combines them with the position information to form data pairs $(\mathbf{p}_i, \mathrm{RSSI}_i)$. After data from at least three sampling positions have been collected, the RSSI-based ranging model is used to calculate the estimated distance to the target device, forming the following nonlinear optimization problem:
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \sum_{i=1}^{N} \left( \lVert \mathbf{x} - \mathbf{p}_i \rVert - d_i \right)^2$$
where $\mathbf{x}$ is the device position to be estimated, $\mathbf{p}_i$ is the position of the $i$-th sampling point, $d_i$ is the distance between that point and the device estimated by the RSSI model, and $N$ is the number of sampling points. This optimization problem is solved using the least squares method to obtain an initial localization estimate.
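The multi-point localization step can be sketched as a small nonlinear least-squares solver. The example below assumes the objective is the sum of squared differences between the geometric distance from a candidate device position to each sampling point and the corresponding ranging estimate. It uses Gauss-Newton iterations, which is one common way to solve such problems and not necessarily the solver used by the authors. The names `locate` and `solve3`, the centroid initialization, and the small damping term are illustrative assumptions.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate geometry: position is not constrained")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= factor * M[col][c]
    x = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        s = M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = s / M[i][i]
    return x


def locate(points, dists, iters=50, tol=1e-9):
    """Gauss-Newton minimisation of sum_i (||x - p_i|| - d_i)^2 in 3D."""
    if len(points) < 3 or len(points) != len(dists):
        raise ValueError("need at least three (position, distance) pairs")

    # Assumed initial guess: centroid of the sampling positions.
    x = [sum(p[k] for p in points) / len(points) for k in range(3)]

    for _ in range(iters):
        JTJ = [[0.0] * 3 for _ in range(3)]
        JTr = [0.0] * 3
        for p, d in zip(points, dists):
            diff = [x[k] - p[k] for k in range(3)]
            norm = math.sqrt(sum(v * v for v in diff)) or 1e-9
            residual = norm - d                 # range error at this point
            grad = [v / norm for v in diff]     # d(residual)/dx
            for a in range(3):
                JTr[a] += grad[a] * residual
                for b in range(3):
                    JTJ[a][b] += grad[a] * grad[b]
        for a in range(3):
            JTJ[a][a] += 1e-9                   # tiny damping for stability
        step = solve3(JTJ, [-v for v in JTr])
        x = [x[k] + step[k] for k in range(3)]
        if math.sqrt(sum(s * s for s in step)) < tol:
            break
    return x


if __name__ == "__main__":
    device = (2.0, 1.0, 0.5)
    pts = [(0, 0, 0), (4, 0, 0), (0, 3, 0), (0, 0, 2), (3, 3, 1)]
    d = [math.dist(device, p) for p in pts]     # noise-free ranges for the demo
    print([round(v, 3) for v in locate(pts, d)])
```

Note that three sampling points generally leave a mirror ambiguity in 3D. Using four or more non-coplanar points, as in the demo, makes the estimate unique when the ranges are accurate.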
4.3.3. Iterative Localization and Convergence Strategy
In practical applications, the signal strength emitted by devices typically fluctuates continuously and is influenced by various factors, such as device type, deployment environment, and measurement location. Even at the same distance, the RSSI measurements of the same device may vary significantly depending on the specific location. Nevertheless, RSSI generally exhibits certain consistent patterns: the closer the distance to the device, the stronger the signal strength and the smaller the fluctuation range; conversely, the farther the distance, the weaker the signal strength and the greater the fluctuation.
Based on this characteristic, this paper proposes an iterative localization and convergence strategy to improve positioning accuracy and enhance the system’s adaptability to environmental variability. The strategy begins by estimating a predicted location of the device using the initial RSSI-VIO measurement results and guides the user to move toward that predicted position. As the user approaches, the system collects new RSSI-VIO data to update the estimated location and compares it with the previously collected data to assess whether the user is gradually moving closer to the target device. This process continues iteratively until the prediction error converges to a predefined threshold or the difference between two consecutive iterations falls below a specified value.
The advantages of this strategy are twofold. First, users can perceive whether they are approaching the target device through real-time feedback during the iteration process (e.g., when the signal strength increases or reaches a local peak). Second, by guiding users to move continuously and adjusting the predicted location based on changes in signal strength, the strategy reduces dependence on any single RSSI measurement, thereby improving the system’s resilience to interference in complex environments. Consequently, this iterative localization strategy enhances the robustness and general applicability of the localization system.
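The iterative strategy of this subsection can be sketched as a simple control loop. In the sketch below, `estimate_fn` is a hypothetical callback standing in for one round of RSSI-VIO sampling followed by multi-point localization. The step length, tolerance, and iteration cap are assumed values. Only the stopping rule based on the difference between two consecutive estimates is shown; the prediction-error threshold mentioned in the text would need ground truth or a separate error proxy.

```python
import math


def iterate_localization(estimate_fn, start, step=1.0, tol=0.2, max_iters=20):
    """Move toward successive position estimates until they stop changing.

    estimate_fn(user_pos) -> (x, y, z) is a hypothetical stand-in for one
    sampling-and-localization round performed at the user's current position.
    Returns (final_estimate, iterations_used).
    """
    user = tuple(start)
    previous = None
    for i in range(1, max_iters + 1):
        estimate = tuple(estimate_fn(user))

        # Stop when two consecutive estimates differ by less than `tol` metres.
        if previous is not None and math.dist(estimate, previous) < tol:
            return estimate, i

        # Guide the user at most `step` metres toward the current estimate.
        gap = math.dist(estimate, user)
        if gap > 0:
            frac = min(1.0, step / gap)
            user = tuple(u + frac * (e - u) for u, e in zip(user, estimate))
        previous = estimate
    return previous, max_iters


if __name__ == "__main__":
    true_device = (3.0, 0.0, 0.0)

    def fake_estimate(user_pos):
        # Toy model: the estimate error shrinks as the user gets closer,
        # mimicking the smaller RSSI fluctuation at short range.
        return tuple(t + 0.3 * (u - t) for t, u in zip(true_device, user_pos))

    est, n = iterate_localization(fake_estimate, start=(0.0, 0.0, 0.0))
    print(n, [round(v, 3) for v in est])
```

The toy `fake_estimate` function only exists to make the loop runnable. In a real deployment the callback would return the output of the ranging and localization steps described earlier.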
6. Conclusions
In this paper, we propose an easily deployable and efficient method for detecting and localizing hidden IoT devices that improves detection accuracy while minimizing operational complexity. The system uses an RTS/CTS-based active sniffing mechanism to trigger device responses, enabling rapid identification of target devices (typically within five minutes) and efficient scanning of indoor environments. To ensure broad applicability, a prototype was developed on the Android platform and optimized to minimize hardware resource consumption. Experimental results show that the system is stable and reliable, achieving an average localization error of 0.77 m and a device type recognition accuracy of 98.1%.
Although the system has achieved encouraging experimental results, several limitations remain. First, the current evaluation primarily targets typical indoor environments and does not yet cover scenarios with highly congested wireless networks, such as apartment buildings with numerous overlapping networks. Further testing in such environments is needed to assess the system’s performance and generalization capabilities more comprehensively. Second, the system’s final positioning accuracy is influenced by several factors, including prediction errors in the RSSI ranging model and positioning drift introduced by AR applications. Further analysis and experimentation are needed to quantify the impact of these factors and to provide a basis for targeted optimizations in future work. Finally, detecting hidden IoT devices effectively across diverse deployment environments and heterogeneous device platforms requires better hardware adaptability and scalability, which remains a critical challenge for future research.