1. Introduction
In large-scale Internet of Things (IoT) systems [1, 2], devices and services are typically highly distributed. Efficient resource discovery (RD) and the supported queries are critical for ensuring system operation and scalability [3, 4]. Within the broader Internet environment, multiple mechanisms such as Domain Name System (DNS), UPnP, mDNS, and LDAP have been employed for resource management. These methods follow different design philosophies (for example, hierarchical vs. flat namespaces and symmetric vs. asymmetric roles), but none of them is directly applicable to resource-constrained Message Queuing Telemetry Transport (MQTT)-based IoT scenarios.
MQTT is a lightweight communication protocol widely adopted in the IoT domain. Following a publish/subscribe architecture, it consists of three roles—broker, publisher, and subscriber. Message subscription and forwarding are based on the core concept of topics. MQTT is well suited for constrained devices due to its low bandwidth requirements and implementation simplicity. Nevertheless, MQTT does not natively provide any resource directory functionalities.
To address this gap, Pereira et al. [1] proposed MQTT-RD, an MQTT-based resource directory. This approach organizes topic-based MQTT resources into a searchable directory and allows users to obtain complete directory information by subscribing to specific topics. While this method provides a certain degree of resource discovery capability, both Pereira et al. [1] and our experiments indicate that, as the system scales up, the full-mesh synchronization adopted by MQTT-RD results in excessive traffic and processing load; the system stalls even at modest scales, for example, with only 20 sniffers (a sniffer is an RD operating entity). This is due to the symmetric roles of the sniffers, which inevitably lead to disproportionate growth in connections and message overhead. In short, this symmetric design makes the system unscalable.
To overcome these limitations, this paper introduces hierarchical MQTT resource discovery and distribution (HMQTT-RDD). In contrast to MQTT-RD’s symmetric mesh architecture, HMQTT-RDD adopts the asymmetric hierarchy-and-partition management concept of DNS to arrange the sniffers and their management domains. This design reduces synchronization overhead and improves scalability in resource-constrained IoT environments.
The main contributions of this paper are summarized as follows:
Detailed evaluations and analysis of MQTT-RD: We experiment with the MQTT-RD implementation and evaluate its overhead both qualitatively and quantitatively. We highlight the key structural weaknesses that cause its communication overhead and latency bottlenecks.
The new HMQTT-RDD framework: We provide a hierarchical management principle that reduces cross-domain synchronization overhead.
Implementation and validation of effectiveness: We implement HMQTT-RDD and deploy it in a virtualized IoT environment to demonstrate its effectiveness.
Performance analysis: We evaluate MQTT-RD and HMQTT-RDD in terms of background traffic, synchronization latency, and directory query efficiency.
The remainder of this paper is organized as follows:
Section 2 reviews related work;
Section 3 presents our qualitative analysis and quantitative evaluations of MQTT-RD;
Section 4 presents HMQTT-RDD;
Section 5 discusses the experimental results;
Section 6 concludes the paper.
2. Related Work
Resource discovery (RD) in the Internet has been extensively studied, covering network names, services, and access points [5, 6, 7, 8, 9]. Resource discovery mechanisms can be characterized by their namespace/data organization (e.g., flat vs. hierarchical with delegation) and their authority/role models (e.g., authoritative–recursive vs. peer-symmetric) [10, 11, 12]. Representative mechanisms include the following:
Domain Name System (DNS) [13]: DNS is used for mapping between domain names and IP addresses. It adopts a hierarchical namespace and an asymmetric role design between authoritative servers and recursive resolvers for global name resolution.
Universal Plug and Play (UPnP) [14]: UPnP represents a set of protocols allowing devices (computers, smart TVs, printers, IoT devices, etc.) to automatically discover each other and establish functional network services with zero configuration. UPnP adopts a flat namespace limited to the local network and an asymmetric role model in which control points discover and manage devices, but no direct interaction is defined between control points.
Multicast DNS (mDNS) [15]: mDNS is a protocol that provides name resolution within local networks by allowing devices to map hostnames to IP addresses without relying on a central DNS server. The protocol is symmetric, as all participating devices can both send and respond to queries. mDNS adopts a flat namespace with a peer-to-peer communication pattern, where devices exchange information directly within the LAN.
Lightweight Directory Access Protocol (LDAP) [7]: LDAP provides a directory service that stores and retrieves structured information such as user credentials, organizational hierarchies, and access rights. It adopts a hierarchical data model (Directory Information Tree, DIT), where each directory server is authoritative for its assigned naming context.
Web Crawling and Search Indexing [16]: Web crawling is a mechanism for automatically discovering and retrieving resources across the Internet, which are then organized into search indexes for efficient retrieval. Crawlers are responsible for collecting and indexing distributed web content, while search engines maintain the centralized indexes and provide responses to user queries.
While effective in Internet-scale environments, these methods were designed for highly diverse resource discovery needs, supporting a wide range of resource types and abundant computational capabilities. In contrast, MQTT-based IoT systems focus on lightweight devices and uniform topic-based resources, where discovery must be efficient, low-overhead, and directly integrated into the publish/subscribe model.
MQTT-specific RD leverages the protocol’s topic mechanism. The pioneering MQTT-RD [1] introduced a distributed directory maintained through full-mesh synchronization among sniffers. MQTT-RD establishes the baseline for topic-based RD in MQTT, yet it also reveals its scalability weaknesses. Compared to the extensive research on RD in Internet and IoT domains, MQTT-based RD challenges have seldom been explored, with MQTT-RD serving as an early attempt in this direction. Other extensions, such as TD-MQTT [12] and MQTT-SN with CoAP [17], focus on interoperability, broker scalability, or communication protocol simplification; they do not tackle the resource discovery challenges in MQTT systems.
To tackle the MQTT resource discovery challenges and overcome the weaknesses of MQTT-RD, we propose HMQTT-RDD, a hierarchical and asymmetric RD design inspired by DNS architecture and concepts.
Table 1 summarizes several RD mechanisms, highlighting their architectures, update schemes, and application and suitability. Architecture/update describes entity-to-entity organization and interactions, while query denotes client-to-entity interactions; suitability reflects both.
3. Detailed Analysis of MQTT-RD
MQTT-RD introduces a resource directory (RD) service maintained by specialized clients called sniffers, which record the devices, the supported topics, and the sniffers, along with related attributes such as status and broker information. The broker continues to function solely as a message forwarder, while regular clients (publishers and subscribers) retain their original roles. With this design, MQTT-RD provides directory-based resource discovery fully compatible with the MQTT protocol standards.
Building on this foundation, MQTT-RD exhibits several defining characteristics. First, it leverages the MQTT topic mechanism to emulate registration and discovery operations. Second, it relies on a full-mesh synchronization model, where each sniffer exchanges update messages with every other sniffer to maintain global directory consistency. Third, all Internet sniffers are symmetric in role, handling registration, deletion, query, and synchronization alike. While these traits simplify design and deployment, they introduce critical drawbacks: as the number of sniffers increases, traffic grows drastically and system operation progressively deteriorates. Both the RD maintenance messages and sniffer status-probing traffic such as PingAlive burden the system, limiting scalability and stability. To ensure directory consistency across sniffers, MQTT-RD adopts a full-mesh synchronization mechanism. Its core design can be summarized into the following elements:
As illustrated in Figure 1, the MQTT-RD architecture consists of a cloud broker, Internet sniffer, local sniffer, local broker, and devices; they all communicate with each other via MQTT to achieve resource registration, synchronization, and query functionalities.
Pereira et al. [1] implemented and simulated MQTT-RD. In their preliminary study, they reported that as the system scales up to 20 Internet sniffers, the volume of synchronization messages grows tremendously, leading to excessive background traffic and system stall.
According to the MQTT-RD literature, when the number of sniffers exceeds 20, the system behaves abnormally or even crashes. We therefore simulate MQTT-RD and analyze its code and system behavior in detail to understand its limitations and weaknesses [24]. Our results confirm that, when the number of sniffers approaches 20, the synchronization among sniffers crashes, which prevents the resource directory from maintaining consistency.
As illustrated in Figure 2, each Internet sniffer in MQTT-RD communicates with all others in a full-mesh topology. During synchronization, every Internet sniffer must transmit messages to the remaining $N-1$ Internet sniffers, where $N$ is the number of Internet sniffers. The two main message types are RD synchronization updates and PingAlive packets. Since each of the $N$ sniffers sends to the other $N-1$, the number of packets per synchronization round for each message type can be expressed as $N(N-1)$.
From this observation, two major weaknesses can be identified:
These trends are quantified in Table 3 and illustrated in Figure 3, which presents the estimated communication overhead of MQTT-RD for the various numbers of sniffers. The data show that both registration and PingAlive overheads grow rapidly as the system scales up, which confirms the scalability weaknesses of the full-mesh design.
4. Design of Hierarchical MQTT Resource Discovery and Distribution
HMQTT-RDD adopts three design principles inspired by DNS—hierarchical structure, domain-based management, and recursive queries. These principles replace the flat full-mesh synchronization of MQTT-RD, which leads to dramatic traffic growth and latency.
Tree structure: HMQTT-RDD replaces the flat full-mesh of MQTT-RD with a hierarchical tree to avoid tremendous overhead growth and to distribute responsibilities across layers.
Domain-based management: HMQTT-RDD assigns each sniffer a domain-based responsibility.
Recursive queries: Queries are resolved locally whenever possible, with only unresolved requests forwarded upward. This follows the recursive model of DNS but is tailored for IoT, where low latency is critical in contrast to DNS’s tolerance for higher delays.
Compared with MQTT-RD, HMQTT-RDD eliminates excessive synchronization traffic, reduces query delay by localizing lookups, and maintains eventual consistency in dynamic environments. It further improves availability by allowing regional service continuity and supports lightweight deployment suited to constrained IoT devices.
Unlike DNS, which maintains relatively static domain-to-IP mappings with periodic updates, HMQTT-RDD must handle dynamic IoT resources that change frequently. To address this, it relies on event-driven synchronization among sniffers. Similarly, while DNS depends on authoritative servers deployed in stable infrastructures, sniffers in HMQTT-RDD are lightweight clients designed to operate in volatile IoT environments. Furthermore, recursive queries in DNS can tolerate higher latencies, whereas HMQTT-RDD ensures that most queries are resolved locally to satisfy the low-latency requirements of IoT.
Table 4 compares the design orientations of DNS, MQTT-RD, and HMQTT-RDD across naming, management objective, update model, and topology. MQTT-RD adopts a flat, full-mesh arrangement of sniffers with event-driven peer-to-peer synchronization and a global, non-partitioned directory. By contrast, HMQTT-RDD inherits DNS’s hierarchical and asymmetric principles but modifies the update model and query mechanism to better match IoT dynamics—partitioning responsibilities across a sniffer hierarchy and using layer-based, event-driven synchronization with local-first lookups. These adaptations clearly improve upon MQTT-RD’s flat full-mesh design and demonstrate that HMQTT-RDD is a tailored solution for IoT rather than a direct copy of DNS.
4.1. Hierarchical Management Protocol
In HMQTT-RDD, sniffers are organized into a hierarchical structure (Figure 4). Within this protocol, a client query is first processed by its local sniffer; if the requested information is outside the domain and not in the cache (caching is not yet implemented), the query is escalated to the root, which maintains the global directory. Cross-root coordination is beyond the current scope and left as future work. The three sniffer types are defined as follows:
Root: Located at the top layer, the root sniffer aggregates resource information from all sniffers and provides global resource query services.
Intermediate: Sniffers in the intermediate levels receive synchronization messages from the lower-level sniffers in their domains, update their directories, handle registration, and report the results to the upper layer.
Leaf: Sniffers in the leaf levels directly manage devices within their domain, handle registration and status updates, and report them to the upper layer.
Figure 4.
Hierarchical architecture of HMQTT-RDD.
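The three roles and the layer-by-layer reporting described above can be sketched in a few lines of Python. This is only an illustrative model, not the authors' implementation: the class name `Sniffer`, its fields, the sniffer names, and the one-leaf-per-intermediate layout are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Sniffer:
    """One RD entity in the tree (hypothetical model; field names are illustrative)."""
    name: str
    parent: Optional["Sniffer"] = field(default=None, repr=False)
    children: List["Sniffer"] = field(default_factory=list)
    # device id -> name of the sniffer that directly manages the device
    directory: Dict[str, str] = field(default_factory=dict)

    def add_child(self, child: "Sniffer") -> "Sniffer":
        """Attach a lower-level sniffer to this sniffer's domain."""
        child.parent = self
        self.children.append(child)
        return child

    @property
    def role(self) -> str:
        """Derive the role from the position in the tree."""
        if self.parent is None:
            return "root"
        return "intermediate" if self.children else "leaf"

    def register(self, device_id: str) -> int:
        """Record a device locally, then report it upward one layer at a time.

        Returns the number of upward messages the registration caused.
        """
        self.directory[device_id] = self.name
        hops = 0
        node = self.parent
        while node is not None:
            node.directory[device_id] = self.name
            hops += 1
            node = node.parent
        return hops


# Assumed layout: one root, two intermediates, one leaf under each intermediate.
root = Sniffer("root")
mid_a = root.add_child(Sniffer("mid-a"))
mid_b = root.add_child(Sniffer("mid-b"))
leaf_a = mid_a.add_child(Sniffer("leaf-a"))
leaf_b = mid_b.add_child(Sniffer("leaf-b"))

hops = leaf_a.register("sensor-1")
print(hops, sorted(root.directory), sorted(mid_b.directory))
```

In this sketch a registration at a leaf touches only its own ancestors, so the sibling domain `mid-b` never sees the update, while the root still ends up with a global view.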
4.2. Synchronization and PingAlive Protocols
HMQTT-RDD employs a step-wise synchronization mechanism for resource updates (Figure 5). In this approach, each sniffer aggregates resource information within its domain and forwards the aggregated results to its upper layer, instead of performing full-mesh synchronization with every other sniffer as in MQTT-RD.
To monitor the liveness of sniffers, HMQTT-RDD employs a unidirectional hierarchical PingAlive (contrasted with the mesh-based PingAlive in MQTT-RD). Each sniffer sends PingAlive messages only to its parent (upper layer). The parent implements a timeout mechanism to verify the liveness of its child sniffers, confining failures within their domains.
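A parent-side timeout check of this kind might look like the following sketch. The class name `ChildLivenessMonitor`, the 30 s timeout, and the child identifiers are hypothetical; the paper does not state the timeout value or the bookkeeping it uses. Timestamps are passed in explicitly so that the example is deterministic.

```python
from typing import Dict, List


class ChildLivenessMonitor:
    """Parent-side bookkeeping for unidirectional PingAlive messages (illustrative)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s  # assumed value, not taken from the paper
        self.last_seen: Dict[str, float] = {}

    def on_ping_alive(self, child_id: str, now: float) -> None:
        """Record the arrival time of a PingAlive from a child sniffer."""
        self.last_seen[child_id] = now

    def expired_children(self, now: float) -> List[str]:
        """Return the children whose last PingAlive is older than the timeout."""
        return sorted(
            child
            for child, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = ChildLivenessMonitor(timeout_s=30.0)
monitor.on_ping_alive("leaf-a", now=0.0)
monitor.on_ping_alive("leaf-b", now=0.0)
monitor.on_ping_alive("leaf-a", now=25.0)  # leaf-b stays silent from here on
print(monitor.expired_children(now=40.0))
```

Because only the parent tracks its own children, a silent child is detected inside its domain without any other sniffer exchanging probe traffic.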
4.3. Resource Query Protocol
In HMQTT-RDD, RD-exchange messages and queries are transmitted through MQTT topics, with the following optimization strategies:
Publication interval adjustment with retained messages: The publication interval of /getlist is extended to 60 s to reduce the traffic overhead caused by frequent broadcasts. At the same time, the MQTT retained-message mechanism lets clients obtain the most recent data even between publications, which balances query timeliness and traffic reduction.
Localized directory maintenance: Each sniffer maintains only the resource directory of its own domain rather than the global directory (global directory cache will be implemented in future work). This avoids unnecessary cross-domain synchronization and transmission; it further improves system scalability.
Based on this design, clients first query their local sniffer for resource information. If the query misses, the client then redirects the request to the root sniffer according to the directory information (Figure 6). Thus, even in the worst case, the client only needs to perform one additional upward query. This process ensures both query efficiency and system scalability while avoiding the overhead associated with global queries.
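The local-first lookup with a single fallback to the root can be illustrated with plain dictionaries standing in for the two directories. The function name `two_step_query`, the resource names, and the broker addresses are hypothetical; in the real system both lookups would be MQTT topic subscriptions rather than dictionary reads.

```python
from typing import Dict, Optional, Tuple


def two_step_query(
    resource: str,
    local_directory: Dict[str, str],
    root_directory: Dict[str, str],
) -> Tuple[Optional[str], int]:
    """Look up a resource locally first, then at the root on a miss.

    Returns (result, number_of_queries_issued).
    """
    if resource in local_directory:
        return local_directory[resource], 1
    # Local miss: exactly one additional upward query to the root sniffer.
    return root_directory.get(resource), 2


# Hypothetical directories: resource name -> broker that serves it.
local_dir = {"room1/temp": "broker-a.example"}
root_dir = {"room1/temp": "broker-a.example", "room9/door": "broker-b.example"}

print(two_step_query("room1/temp", local_dir, root_dir))
print(two_step_query("room9/door", local_dir, root_dir))
print(two_step_query("unknown/topic", local_dir, root_dir))
```

The query count never exceeds two, which mirrors the worst case described above.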
4.4. Resource Directory Data Structure
The resource directory is one of the core components of the HMQTT-RDD architecture; it is responsible for supporting the storage, maintenance, and information exchange of sniffers. Its main contents, summarized in Table 5, include sniffer identification information, hierarchical roles, key communication parameters, and the list of managed devices.
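As a rough illustration of such a record, the sketch below groups the four kinds of content into one Python dataclass and serializes it to JSON, as one plausible payload format for an MQTT message. All field names, the example values, and the choice of JSON are assumptions; the actual fields are those listed in Table 5.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DirectoryRecord:
    """Hypothetical resource directory entry for one sniffer."""
    sniffer_id: str            # identification information
    role: str                  # hierarchical role: "root", "intermediate", or "leaf"
    parent_id: Optional[str]   # None for the root
    broker_host: str           # communication parameter (assumed)
    broker_port: int           # communication parameter (assumed)
    devices: List[str] = field(default_factory=list)  # managed devices

    def to_payload(self) -> str:
        """Serialize the record to a JSON string."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_payload(cls, payload: str) -> "DirectoryRecord":
        """Rebuild a record from its JSON representation."""
        return cls(**json.loads(payload))


record = DirectoryRecord(
    sniffer_id="leaf-a",
    role="leaf",
    parent_id="mid-a",
    broker_host="broker-a.example",
    broker_port=1883,
    devices=["sensor-1", "sensor-2"],
)
payload = record.to_payload()
print(payload)
restored = DirectoryRecord.from_payload(payload)
```

Keeping the record flat and self-describing makes it straightforward for a parent to merge the entries reported by its children.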
5. Experiments and Performance Evaluation
This section evaluates the efficiency of the proposed HMQTT-RDD architecture. The evaluation is conducted through experiments in three aspects:
Synchronization cost: Observing the communication overhead incurred during synchronization under different architectures.
PingAlive mechanism: Comparing the status-probing traffic between MQTT-RD and HMQTT-RDD.
Client query performance: Analyzing the latency and hit rate of single-query and two-step query processes.
5.1. Experimental Environment
The current experiment adopts a binary tree as the baseline scenario for evaluating the feasibility and performance differences of HMQTT-RDD (other hierarchical structures could be used as well). Such a configuration provides a simple environment that clearly highlights the advantages of the hierarchical design compared to traditional MQTT-RD, and this baseline could be extended for subsequent comparisons. We note that HMQTT-RDD is a general hierarchical design: it can be flexibly deployed with other topologies, such as ternary trees or multi-root backup mechanisms, to accommodate different scales and application requirements in IoT environments.
The experiments were conducted on a virtual platform to simulate hierarchical management scenarios under the HMQTT-RDD architecture.
Table 6 summarizes the specifications of the virtualized platform and software environment used in the experiments.
Table 7 details the sniffer configuration, including hierarchical roles and parent–child relations. The hierarchical structure of the entire setup is depicted in Figure 7.
5.2. Evaluation Metrics
To evaluate the proposed approach, this study adopts three performance metrics: registration overhead, PingAlive overhead, and client query delay.
Table 8 summarizes these metrics, which capture the primary communication behaviors of the system and serve as the basis for comparing different architectures. Importantly, these metrics are not limited to the binary configuration used in this study but can also be applied to other hierarchical structures (e.g., ternary trees or multi-root mechanisms), ensuring generality of the evaluation.
5.3. Experimental Procedure
5.3.1. Registration and PingAlive Overhead
In the designed hierarchical topology (Figure 7, consisting of one root, two intermediate, and two leaf sniffers), all sniffers complete initialization before entering the synchronization test. In each test round, every sniffer processes one synchronization message, iterated over 10 rounds; this results in a total of 50 synchronization events. The system records the time required for each message to complete synchronization. After the synchronization phase, the system enters a fixed stabilization period, during which the last 20 PingAlive messages are collected to compute the average delay. The above process is repeated four times under the same environment, and the aggregated results are reported as the evaluation outcome.
5.3.2. Client Query Delay
This test measures the delay between a client’s RD-topic subscription request and the receipt of the response from the sniffer. Two scenarios are tested: a local query, in which two machines within the same domain interact, and a root query, which involves one local client and a root sniffer (here, public remote brokers on the Internet act as the root sniffer). A local query involves a single interaction, while a complete remote-root lookup consists of one local query followed by one remote-root query. All tests are repeated multiple times, and the average values are reported to ensure the stability and reliability of the results.
5.4. Results and Analysis
The experiment results for registration and PingAlive overhead are summarized in Table 9. For registration overhead, HMQTT-RDD achieves a latency reduction of approximately 15–29%, and it also significantly decreases the number of processed messages. This demonstrates that the hierarchical design effectively reduces the synchronization workload. For PingAlive overhead, the latency is reduced by more than 75%, and it confirms that the zone-based mechanism in HMQTT-RDD can substantially reduce background maintenance traffic and delay.
Now we evaluate the asymptotic performance of registration overhead, assuming a packet size of 370 bytes. For MQTT-RD, each sniffer synchronizes its information with all others, yielding the total traffic in Equation (1). In contrast, in HMQTT-RDD, each sniffer only forwards its information upward through the hierarchy to the root, where the cost depends on the hop count to the root, as defined in Equation (2).
The results are summarized in Table 10 and Figure 8, showing that the hierarchical design significantly reduces synchronization traffic compared to the full-mesh approach.
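To make the contrast concrete, the sketch below computes registration traffic under two simplifying assumptions that are ours rather than the paper's: the full mesh costs one 370-byte packet per ordered pair of sniffers, and the hierarchy is a binary tree filled level by level in which each sniffer's update costs one packet per hop to the root. The function names are hypothetical, and the printed figures are illustrative only; they are not claimed to reproduce Table 10.

```python
PACKET_BYTES = 370  # registration packet size assumed in the text


def full_mesh_bytes(n: int, packet_bytes: int = PACKET_BYTES) -> int:
    """Traffic when each of n sniffers sends one packet to the other n - 1."""
    return n * (n - 1) * packet_bytes


def complete_binary_tree_hops(n: int) -> int:
    """Sum of hop counts to the root for n sniffers filling a binary tree level by level."""
    # With 1-based heap numbering, node i sits at depth floor(log2(i)).
    return sum(i.bit_length() - 1 for i in range(1, n + 1))


def hierarchical_bytes(n: int, packet_bytes: int = PACKET_BYTES) -> int:
    """Traffic when each sniffer forwards one packet per hop up to the root."""
    return complete_binary_tree_hops(n) * packet_bytes


for n in (5, 20, 100):
    print(n, full_mesh_bytes(n), hierarchical_bytes(n))
```

Under these assumptions the mesh cost grows quadratically in the number of sniffers, whereas the tree cost grows roughly as the number of sniffers times the tree depth.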
The asymptotic performance of PingAlive overhead assumes a packet size of 114 bytes. In MQTT-RD, each sniffer transmits PingAlive messages to all the other N − 1 sniffers, resulting in the total traffic defined in Equation (3). In HMQTT-RDD, only non-root sniffers transmit PingAlive messages to their upper layer, leading to the reduced traffic shown in Equation (4).
The results are presented in Table 11 and Figure 9, confirming that the hierarchical design minimizes background maintenance traffic and scales efficiently with the number of sniffers.
As shown in Table 10 and Table 11, when the number of sniffers scales up to around 1000, the communication cost of HMQTT-RDD remains only 2.895 MB for registration synchronization and 0.140 MB for PingAlive. In contrast, under the same scenarios, MQTT-RD incurs 369.639 MB and 143.859 MB, respectively. These results clearly demonstrate that HMQTT-RDD can significantly reduce communication overhead in large-scale deployments while maintaining the cost within an acceptable range.
In the client query experiments, the average latency of a local query is 47.44 ms, while that of a root query is 862.19 ms, resulting in a total latency of 909.63 ms for a two-step query in the HMQTT-RDD architecture. Assuming that the hit rate of the first query is p, the expected query latency T can be expressed by Equation (5):
$$T = p \times 47.44 + (1 - p) \times 909.63 \ \text{(ms)} \qquad (5)$$
In contrast, MQTT-RD supports only a single query, which corresponds to the local query with an average latency of approximately 47.44 ms. The query latency of HMQTT-RDD under different hit rates is summarized in Table 12. Since most device resources are deployed in the local or lower-level domains, the local hit rate in HMQTT-RDD is typically greater than 50%. Even under a hit rate as low as 50%, the expected latency remains around 0.5 s, which is acceptable and shows that the overhead of two-step queries has only a limited impact on overall system performance.
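The dependence of the expected latency on the hit rate can be checked numerically. The sketch below assumes the natural reading of Equation (5): a hit costs one local query, and a miss costs a local query plus a root query. The function name `expected_latency_ms` and the chosen hit rates are illustrative; the two latency constants are the measured averages quoted above.

```python
T_LOCAL_MS = 47.44   # measured average latency of a local query
T_ROOT_MS = 862.19   # measured average latency of a root query


def expected_latency_ms(
    p: float, t_local: float = T_LOCAL_MS, t_root: float = T_ROOT_MS
) -> float:
    """Expected query latency for local hit rate p (assumed form of Equation (5))."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("hit rate p must lie in [0, 1]")
    # Hit: local query only. Miss: local query followed by a root query.
    return p * t_local + (1.0 - p) * (t_local + t_root)


for p in (0.5, 0.7, 0.9):
    print(f"p={p:.1f}  T={expected_latency_ms(p):.2f} ms")
```

Under this reading, the latency falls linearly from the two-step cost at p = 0 to the local-only cost at p = 1, so every gain in local hit rate translates directly into lower average delay.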
5.5. Discussion
MQTT-RD suffers from structural limitations due to its symmetric and flat design. This results in tremendous growth of the synchronization overhead and background traffic, which cause the system to become unstable or even crash once the number of sniffers exceeds about twenty.
HMQTT-RDD addresses this issue through a hierarchical and asymmetric design inspired by DNS. By assigning domain-based responsibilities and reducing redundant synchronization, it significantly reduces communication costs and ensures scalability. The experiments and the asymptotic analysis indicate that HMQTT-RDD reduces both registration and PingAlive overhead, keeps communication cost manageable at around one thousand sniffers, and keeps the expected query latency below 0.5 s even at a 50% local hit rate. These findings highlight that architectural choices, not parameter tuning, are the decisive factor in enabling efficient and scalable resource discovery.
Table 13 situates the proposed architecture within the broader context of related approaches. Since other methods such as LDAP, mDNS, UPnP, and web crawling/indexing do not target MQTT-based resource discovery and the targeted resources are quite different, there are no corresponding experiments to enable a fair performance comparison; therefore, we specify NA (Not Applicable) in the last row for these approaches.
In the second row, “Function”, we specify the design goals of these resource discovery mechanisms; among them, only MQTT-RD and HMQTT-RDD target MQTT-based IoT resource discovery. In the third row, we specify the resource types handled by each mechanism. Notably, all the resources in an MQTT-based IoT system can be expressed as topics, which makes MQTT a natural choice for transmitting both normal MQTT messages and RD messages. That is why, in the fifth row, “Communication protocol”, both MQTT-RD and HMQTT-RDD use only MQTT for both kinds of messages, and why MQTT-based IoT systems need customized resource discovery mechanisms.
In the fourth row, we specify the RD entity architectures of the different RD approaches. Here, we highlight that the hierarchical and asymmetric design of HMQTT-RDD is what allows it to significantly outperform MQTT-RD.
5.6. Remaining Challenges
Although the experiments confirm that HMQTT-RDD effectively improves upon MQTT-RD, several challenges remain.
One major challenge is that the current evaluation was conducted only in controlled test environments. In real-world IoT deployments with unstable links and frequent device churn, the scalability and stability of the system still require further validation.
The second challenge is that further improvement in traffic reduction is still possible. To address this challenge, lightweight consistency checks (for example, hash comparison of RD entries), re-synchronization mechanisms (for example, incremental updates when mismatches are detected), and caching (for example, caching across layers or across domains) could be investigated to assess their improvement potential.
Another major challenge is lightweight security for HMQTT-RDD. Currently, both MQTT-RD and HMQTT-RDD could apply SSL/TLS to secure authentication and transmission privacy. However, as the number of sniffers and the volume of traffic increase, the overhead of SSL/TLS would be amplified. Therefore, customized lightweight security mechanisms would be necessary and promising.
A final challenge is how to design and integrate RD mechanisms across different kinds of IoT systems (for example, MQTT-based, CoAP-based, and HTTP-based systems). Different environments and applications will likely call for different kinds of IoT systems, so integrating these systems and their RD mechanisms will be an important problem.
6. Conclusions and Future Work
This work has proposed HMQTT-RDD, a hierarchical resource directory architecture inspired by DNS, to tackle the scalability limitations of MQTT-RD. By shifting from a flat, fully symmetric structure to a hierarchical and asymmetric model, HMQTT-RDD reduces communication overhead, improves synchronization efficiency, and maintains acceptable query latency in large-scale IoT deployments. Asymptotic performance analysis and experimental evaluation confirm that this design offers clear benefits for scalable and reliable resource discovery. Based on this promising approach, some further improvements and challenges are worth studying.
The first is evaluation in large-scale real-world deployments to investigate how uncontrolled environments affect performance. The second is to further investigate performance improvement mechanisms such as caching, lightweight consistency checks, and re-synchronization mechanisms.
The third is lightweight security for IoT systems, especially for lightweight MQTT-based IoT systems. The fourth is the integration of various IoT systems and their corresponding RD mechanisms.
Finally, we note that, because HMQTT-RDD can effectively and efficiently manage various resources in MQTT-based IoT systems, one interesting extension is to broaden its resource types; the current implementation only covers resources about brokers, sniffers, and devices. We might extend the resource types to cover all valuable information in IoT systems so that the large amount of collected data could serve as a foundational source for AI reasoning and digital twin applications.