1. Introduction
The increasing complexity and scale of IoT systems have created a pressing need for messaging solutions that combine lightweight protocols with the ability to handle high device density, low latency, and reliable delivery [1,2]. However, in high-throughput IoT deployments, these requirements remain a challenge for many MQTT brokers—particularly under point-to-point (P2P) communication patterns that impose strict performance and persistence guarantees. With billions of connected devices exchanging data across domains such as industrial automation, urban infrastructure, and healthcare, efficient device-to-device communication is becoming increasingly important in scenarios that demand direct, low-latency delivery between endpoints [3,4].
Real-world demand for high-throughput P2P messaging is especially pronounced in large-scale Industrial IoT (IIoT) environments. For example, modern automotive manufacturing lines often rely on direct, low-latency communication between robotic welders, PLCs, and quality control stations. They exchange thousands of messages per second across hundreds of synchronized device pairs. These interactions must tolerate intermittent disconnections, demand strict ordering, and cannot rely on centralized fan-in architectures without incurring latency or bottlenecks. Similar requirements exist in warehouse automation systems operated by major e-commerce providers, where fleets of autonomous mobile robots (AMRs) continuously coordinate motion plans and route updates with specific peers. These systems often generate millions of P2P message events per second per site, making throughput, reliability, and scalability essential at the messaging layer. Such workloads clearly highlight the need for broker architectures that are optimized not for broadcast, but for targeted, high-volume, reliable delivery between endpoints.
As the ecosystem evolved and performance demands increased, the ThingsBoard development team—well known in the academic community due to the platform’s widespread adoption in scientific research, as evidenced by the mapping study of Di Felice and Paolone [5] and numerous applied studies [4,6,7,8,9,10,11]—identified the need for a specialized messaging solution capable of providing scalability, fault tolerance, and real-time delivery.
This led to the creation of TBMQ, a distributed MQTT broker optimized for high-performance communication in the Internet of Things. The system was initially developed in 2020, deployed in production environments by 2021, and released as open source software in 2023. Internal testing demonstrated its ability to support over 100 million concurrent connections and process several million messages per second.
The majority of MQTT brokers face significant bottlenecks in terms of scalability and latency, primarily due to their reliance on disk-based storage systems aimed at maximizing crash durability. While MQTT brokers typically excel in one-to-many [12] and many-to-one [13] communication patterns, they often encounter performance bottlenecks in high-throughput one-to-one (point-to-point, P2P) messaging scenarios. These challenges become more pronounced in large-scale deployments, where high concurrency, low latency, and efficient resource utilization are critical. Common architectural limitations include blocking I/O operations, centralized routing logic, and poor scalability under concurrent load.
The scientific problem addressed in this work is how to design an MQTT broker architecture that overcomes these scalability and latency limitations in the context of high-throughput P2P messaging. At the same time, the system must ensure reliable message persistence and support for horizontal scalability. While distributed message brokers have been studied extensively, there remains a lack of horizontally scalable architectures that implement multi-layered persistence models capable of delivering low-latency message processing while preserving system resilience and fault tolerance.
This research addresses that gap by presenting the architectural evolution of TBMQ, an open source MQTT broker that was originally designed to reliably aggregate data from IoT devices and forward it to back-end systems using Kafka for durable message routing. To support high-throughput point-to-point messaging at scale, the persistence layer of TBMQ was redesigned by transitioning from a PostgreSQL-based architecture to a horizontally scalable, in-memory model based on Redis. This shift eliminated key performance bottlenecks associated with disk-based storage and enabled low-latency message delivery with reliable session management. This paper details how this transformation was implemented and demonstrates its practical benefits in distributed IoT environments where efficient resource usage, resilience, and reliable message delivery are essential.
The scientific contribution of this study lies in the design and implementation of a horizontally scalable, multi-layered persistence architecture for MQTT brokers. This architecture combines Redis for low-latency in-memory session management with Kafka for fast and reliable message routing. Unlike traditional approaches that rely on monolithic, disk-based databases for session and message persistence, this architecture offloads operational workloads to distributed, in-memory storage layers. This enables true horizontal scalability for high-throughput point-to-point (P2P) messaging patterns in MQTT systems, while preserving reliability and message ordering.
The goal of this study is to demonstrate how the architectural decisions and performance improvements introduced in the recent 2.0.x releases of TBMQ [14] optimize persistent session handling, enhance P2P messaging, and improve overall system efficiency within a scalable IoT architecture.
2. Related Work
The architectural scalability of MQTT brokers has been extensively studied in recent years. Spohn [15] provides a comprehensive analysis of typical MQTT scalability bottlenecks and explores clustering and federation techniques to mitigate broker overload in distributed environments. Hmissi and Ouni [2] proposed SDN-DMQTT, a reconfigurable MQTT architecture that leverages software-defined networking to dynamically adapt broker topologies, thereby improving resilience and adaptability. Akour et al. [16] introduced a multi-level elasticity model that enables brokers to scale responsively based on workload demands. Despite these advancements, most existing work focuses on one-to-many or many-to-one communication patterns and does not explicitly address point-to-point (P2P) messaging scenarios.
Several researchers have conducted benchmark-driven evaluations of MQTT brokers under load. Mishra et al. [17,18] performed stress testing of various brokers under concurrent workloads, identifying key factors influencing throughput and latency. Dizdarevic et al. [19] experimentally benchmarked multiple open source MQTT brokers, revealing significant performance variation across implementations. Kashyap et al. [20] analyzed EMQX’s internal architecture to demonstrate how it supports scalability and high availability. However, these studies rarely examine one-to-one communication or persistence strategies under high-throughput conditions.
Recent efforts have explored integrating MQTT brokers with modern stream-processing and caching backends such as Kafka and Redis. Gupta [21] investigated a Redis–Kafka architecture for time-series ingestion, highlighting its suitability for latency-sensitive workloads. Likewise, EMQX [22] demonstrated how Redis-based extensions can enhance MQTT with real-time data processing capabilities. These integrations have informed the design of TBMQ, which employs Redis for in-memory session persistence and Kafka for durable message routing.
Point-to-point messaging is increasingly critical for direct device-to-device communication in IoT systems. However, it remains underrepresented in MQTT research. Mishra and Kertesz [1] surveyed MQTT usage in M2M and IoT contexts but did not focus on the architectural challenges of P2P messaging. Ma et al. [23] applied queueing theory to model server availability and contention in P2P content delivery, showing its impact on performance and energy efficiency. Shen and Ma [24] extended this analysis to networks with malicious peers, proposing repairable breakdown models to maintain robustness. Dong and Chen [25] proposed ARPEC, an adaptive routing strategy for edge P2P systems using graph-based optimization, though its computational complexity may hinder real-time deployment. Security concerns were addressed by Dinculeană and Cheng [3], who proposed lightweight alternatives to conventional encryption, such as Value-to-HMAC, to reduce overhead in resource-constrained P2P deployments.
Despite these contributions, horizontally scalable architectures that explicitly support reliable, high-throughput point-to-point messaging in MQTT systems remain underexplored. To address this gap, the present study introduces TBMQ—an open source MQTT broker enhanced with a multi-layered persistence design combining Kafka for durable message routing and Redis for low-latency in-memory session state management. This architecture is intended to support scalable and fault-tolerant P2P communication in distributed IoT environments. Preliminary results and architectural motivations were previously outlined in a short paper presented at CEUR Workshop Proceedings [26]. This article extends that work with a more detailed analysis, broader experimental validation, and a structured discussion of architectural trade-offs, particularly concerning horizontal scalability, persistent session handling, and reliable point-to-point messaging at scale.
3. Materials and Methods
This section describes the architectural and implementation details of TBMQ, the configuration and optimization of the persistence layer, and the experimental environment used to evaluate the system’s performance.
3.1. Architecture and Implementation Details
While the TBMQ 1.x version can handle 100 million clients [27] at once and dispatch 3 million msg/s [28], as a high-performance MQTT broker it was primarily designed to aggregate data from IoT devices and deliver it to back-end applications reliably (QoS 1). This architecture is based on operational experience accumulated by the TBMQ development team through IIoT and other large-scale IoT deployments, where millions of devices transmit data to a limited number of applications.
These deployments highlighted that IoT devices and applications follow distinct communication patterns. IoT devices or sensors publish data frequently but subscribe to relatively few topics or updates. In contrast, applications subscribe to data from tens or even hundreds of thousands of devices and require reliable message delivery. Additionally, applications often experience periods of downtime due to system maintenance, upgrades, failover scenarios, or temporary network disruptions.
To address these differences, TBMQ introduces a key feature: the classification of MQTT clients as either standard (IoT devices) or application clients. This distinction enables optimized handling of persistent MQTT sessions for applications. Specifically, each persistent application client is assigned a separate Kafka topic. This approach ensures efficient message persistence and retrieval when an MQTT client reconnects, improving overall reliability and performance. Additionally, application clients support MQTT’s shared subscription feature, allowing multiple instances of an application to efficiently distribute message processing.
Kafka [29] serves as one of the core components. Designed for high-throughput, distributed messaging, Kafka efficiently handles large volumes of data streams, making it an ideal choice for TBMQ [30]. The latest Kafka versions are capable of managing a huge number of topics, which makes the architecture well suited for enterprise-scale deployments. Kafka’s robustness and scalability have been validated across diverse applications, including real-time data streaming and smart industrial environments [30,31,32].
Figure 1 illustrates the full fan-in setup in a distributed TBMQ cluster.
In TBMQ 1.x, standard MQTT clients relied on PostgreSQL for message persistence and retrieval, ensuring that messages were delivered when a client reconnected. While PostgreSQL performed well initially, it had a fundamental limitation—it could only scale vertically. It was anticipated that, as the number of persistent MQTT sessions grew, PostgreSQL’s architecture would eventually become a bottleneck. To address these scalability limitations, more robust alternatives were investigated to meet the increasing performance demands of TBMQ. Redis was quickly chosen as the best fit due to its horizontal scalability, native clustering support, and widespread adoption.
Unlike the fan-in, the point-to-point (P2P) communication pattern enables direct message exchange between MQTT clients. Typically implemented using uniquely defined topics, P2P is well-suited for private messaging, device-to-device communication, command transmission, and other direct interaction use cases.
One of the key differences between fan-in and point-to-point MQTT messaging is the volume and flow of messages. In a P2P scenario, subscribers do not handle high message volumes, making it unnecessary to allocate dedicated Kafka topics and consumer threads to each MQTT client. Instead, the primary requirements for P2P message exchange are low latency and reliable message delivery, even for clients that may go offline temporarily. To meet these needs, TBMQ optimizes persistent session management for standard MQTT clients, which include IoT devices.
Figure 2 illustrates a performance bottleneck within the TBMQ architecture that relies on PostgreSQL for persistent session storage.
3.2. PostgreSQL Usage and Limitations
To fully understand the reasoning behind this shift, it is important to first examine how MQTT clients operated within the PostgreSQL architecture. This architecture was built around two key tables.
The device_session_ctx table was responsible for maintaining the session state of each persistent MQTT client:
(Schema listing of the device_session_ctx table.)
The key columns are last_packet_id and last_serial_number, which are used to maintain message order for persistent MQTT clients:
last_packet_id represents the packet ID of the last MQTT message received.
last_serial_number acts as a continuously increasing counter, preventing message order issues when the MQTT packet ID wraps around after reaching its limit of 65535.
The device_publish_msg table was responsible for storing messages that must be published to persistent MQTT clients (subscribers).
(Schema listing of the device_publish_msg table.)
The key columns to highlight:
time—captures the system time (timestamp) when the message is stored. This field is used for periodic cleanup of expired messages.
msg_expiry_interval—represents the expiration time (in seconds) for a message. This is set only for incoming MQTT 5 messages that include an expiry property. If the expiry property is absent, the message does not have a specific expiration time and remains valid until it is removed by time or size-based cleanup.
While this design ensured reliable message delivery, it also introduced performance constraints. To better understand its limitations, prototype testing was performed to evaluate PostgreSQL’s performance under the P2P communication pattern. Using a single instance with 64 GB RAM and 12 CPU cores, message loads were simulated with a dedicated performance testing tool [33] capable of generating MQTT clients and simulating the desired message load. The primary performance metric was the average message processing latency, measured from the moment the message was published to the point it was acknowledged by the subscriber. The test was considered successful only if there was no performance degradation, meaning the broker consistently maintained an average latency in the two-digit millisecond range.
Prototype testing ultimately revealed a throughput limit of 30 k msg/s when using PostgreSQL for persistent message storage. Throughput refers to the total number of msg/s, including both incoming and outgoing messages, as seen in Figure 3.
Based on the TimescaleDB blog post [34], vanilla PostgreSQL can handle up to 300 k inserts per second under ideal conditions. However, this performance depends on factors such as hardware, workload, and table schema. While vertical scaling can provide some improvement, PostgreSQL’s per-table insert throughput eventually reaches a hard limit. This experiment confirmed a fundamental scalability limit inherent to PostgreSQL’s vertically scaled architecture. Although PostgreSQL has demonstrated strong performance in benchmark studies under concurrent read-write conditions [35], and has been used in large-scale industrial systems handling hundreds of thousands of transactions per second [36], its architecture is primarily optimized for single-node operation and lacks built-in horizontal scaling. As the number of persistent sessions in TBMQ deployments scaled into the millions, this model created a bottleneck. Confident in Redis’s ability to overcome this bottleneck, the migration process was initiated to achieve greater scalability and efficiency, beginning with an evaluation of Redis data structures that could replicate the essential logic implemented with PostgreSQL.
3.3. Redis as a Scalable Alternative
The decision to migrate to Redis was driven by its ability to address the core performance bottlenecks encountered with PostgreSQL. Unlike PostgreSQL, which relies on disk-based storage and vertical scaling, Redis operates primarily in memory, significantly reducing read and write latency. Additionally, Redis’s distributed architecture enables horizontal scaling, making it an ideal fit for high-throughput messaging in P2P communication scenarios [37]. Recent studies demonstrate the successful application of Redis in cloud and IoT scenarios. These include enhancements through asynchronous I/O frameworks such as io_uring, which further boost throughput under demanding conditions [38]. In addition, migration pipelines from traditional relational databases to Redis have been validated as efficient strategies for improving system responsiveness and scalability [39].
Figure 4 illustrates the updated architecture, where Redis replaces PostgreSQL as the persistence layer for standard MQTT clients.
With these benefits in mind, the migration process was initiated. It started with evaluating data structures capable of preserving the functionality of the PostgreSQL approach while aligning with Redis Cluster constraints to enable efficient horizontal scaling. This also presented an opportunity to improve certain aspects of the original design, such as periodic cleanups, by leveraging Redis features like built-in expiration mechanisms.
3.3.1. Redis Cluster Constraints
During the migration from PostgreSQL to Redis, it was identified that replicating the existing data model would require multiple Redis data structures. These were needed to efficiently handle message persistence and ordering. This, in turn, meant using multiple keys for each persistent MQTT client session (see Figure 5).
Redis Cluster distributes data across multiple slots to enable horizontal scaling. However, multi-key operations must access keys within the same slot. If the keys reside in different slots, the operation triggers a cross-slot error, preventing the command from executing. The persistent MQTT client ID is embedded as a hash tag in key names to address this. By enclosing the client ID in curly braces, Redis ensures that all keys for the same client are hashed to the same slot. This guarantees that related data for each client stays together, allowing multi-key operations to proceed without errors.
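As a brief illustration, the CLUSTER KEYSLOT command can be used to verify this behavior. The key names below follow the naming convention introduced in the following subsections and are illustrative rather than TBMQ’s exact keys:
CLUSTER KEYSLOT {clientA}_messages
CLUSTER KEYSLOT {clientA}_messages_1
CLUSTER KEYSLOT {clientA}_last_packet_id
All three commands return the same slot number, because only the text inside the curly braces ("clientA") is hashed; keys for a different client may map to a different slot.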
3.3.2. Atomic Operations with Lua Scripting
Consistency is critical in a high-throughput environment like TBMQ, where many messages can arrive simultaneously for the same MQTT client. Hashtagging helps to avoid cross-slot errors, but without atomic operations, there is a risk of race conditions or partial updates. This could lead to message loss or incorrect ordering. It is important to make sure that operations updating the keys for the same MQTT client are atomic, as seen in Figure 5.
While Redis ensures atomic execution of individual commands, updating multiple data structures for each MQTT client required additional handling. Executing these commands sequentially without atomicity opens the door to inconsistencies if another process modifies the same data in between. That is where Lua scripting comes in. A Lua script executes as a single, isolated unit; during script execution, no other commands can run concurrently. This ensures that the operations inside the script happen atomically.
Based on this information, for operations such as saving messages or retrieving undelivered messages upon reconnection, a separate Lua script is executed per client session. The hash tags guarantee that all keys accessed within a single Lua script reside in the same hash slot, preserving atomicity and consistency.
3.3.3. Choosing the Right Redis Data Structures
One of the key requirements of the migration was maintaining message order, a task previously handled by the serial_number column in PostgreSQL’s device_publish_msg table. An evaluation of Redis data structures identified sorted sets (ZSETs) as the most suitable replacement.
Redis sorted sets naturally organize data by score, enabling quick retrieval of messages in ascending or descending order. While sorted sets provided an efficient way to maintain message order, storing full message payloads directly in sorted sets led to excessive memory usage. Redis does not support per-member expiration (TTL) within sorted sets. As a result, messages persisted indefinitely unless explicitly removed. Periodic cleanups using ZREMRANGEBYSCORE were required, similar to the approach used in PostgreSQL, to remove expired messages. This operation carries a complexity of O(log(N) + M), where N is the number of elements in the set and M is the number of elements removed. To address this limitation, message payloads were stored in string data structures. The sorted set maintained references to these string keys.
Figure 6 illustrates this structure, where client_id is a placeholder for the actual client ID and the curly braces around it form a hash tag.
As Figure 6 shows, the score continues to grow even when the MQTT packet ID wraps around. First, the reference for the message with MQTT packet ID 65534 was added to the sorted set:
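In Redis terms, this step corresponds to a ZADD command of the following form (a simplified sketch; in TBMQ the command is issued from within a Lua script):
ZADD {client_id}_messages 65534 {client_id}_messages_65534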
Here, client_id_messages is the sorted set key name, where client_id acts as a hash tag derived from the persistent MQTT client’s unique ID. The suffix _messages is a constant added to each sorted set key name for consistency. Following the sorted set key name, the score value 65534 corresponds to the MQTT packet ID of the message received by the client. Finally, the reference key links to the actual payload of the MQTT message. Similar to the sorted set key, the message reference key uses the MQTT client’s ID as a hash tag, followed by the _messages suffix and the MQTT packet ID value.
In the following step, the message reference with a packet ID of 65535 is added to the sorted set. This is the maximum packet ID, as the range is limited to 65535.
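The corresponding sketch for this step is:
ZADD {client_id}_messages 65535 {client_id}_messages_65535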
Since the MQTT packet ID wraps around after 65535, the next message will receive a packet ID of 1. To preserve the correct sequence in the sorted set, the score is incremented beyond 65535 using the following Redis command:
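A simplified form of this command (the exact invocation inside TBMQ’s Lua script may differ):
ZADD {client_id}_messages 65536 {client_id}_messages_1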
Thus, at the next iteration the MQTT packet ID equals 1, while the score continues to grow and reaches 65536.
This approach ensures that the message’s references will be properly ordered in the sorted set regardless of the packet ID’s limited range.
Message payloads are stored as string values using SET commands with expiration (the EX option), providing O(1) complexity for both writes and expirations:
SET {client_id}_messages_1 "{
\"packetType\":\"PUBLISH\",
\"payload\":\"eyJkYXRhIjoidGJtcWlzYXdlc29tZSJ9\",
\"time\":1736333110026,
\"clientId\":\"client\",
\"retained\":false,
\"packetId\":1,
\"topicName\":\"europe/ua/kyiv/client/0\",
\"qos\":1
}" EX 600
In addition to supporting efficient updates and TTL-based expiration, message payloads can also be accessed or deleted with constant time complexity, leaving the sorted set structure unchanged:
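For example (a minimal sketch using the key from the listing above):
GET {client_id}_messages_1
DEL {client_id}_messages_1
Both operations complete in O(1) time and do not touch the {client_id}_messages sorted set; any stale reference left behind is cleaned up later, as described in Section 3.3.5.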
Another very important aspect of the Redis-based persistence design is the use of a string key to record the last processed MQTT packet ID:
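A minimal sketch of this key (the key name is illustrative; it follows the same hash-tag convention as the other per-client keys):
SET {client_id}_last_packet_id 1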
This mechanism serves the same function as its counterpart in the PostgreSQL approach. Upon client reconnection, the server needs to identify the appropriate packet ID for the next message to be stored in Redis. An initial approach involved using the highest score in the sorted set as a reference. However, because scenarios may arise where the sorted set is empty or removed, storing the last packet ID separately was identified as the most reliable solution.
3.3.4. Managing Sorted Set Size Dynamically
This hybrid approach eliminates the need for time-based periodic cleanups by applying per-message TTLs. Additionally, to maintain consistency with the PostgreSQL design, the sorted set is cleaned up when the number of stored messages exceeds the configured limit.
The configured limit plays a key role in managing and predicting memory usage per persistent MQTT client. For instance, a client may briefly connect—initiating a persistent session—and then disconnect shortly after. In such cases, it is crucial to cap the number of retained messages to avoid unbounded memory growth while waiting for the client to reconnect.
if (messagesLimit > 0xffff) {
throw new IllegalArgumentException(
"Persisted messages limit can’t be greater than 65535!");
}
Since MQTT packet IDs are limited to 16 bits, the number of persisted messages per client is capped at 65535 to align with the protocol’s constraints. To enforce this limit, the Redis implementation dynamically manages the size of the sorted set, as shown in the sketch below. When new messages are added, older entries are trimmed once the limit is exceeded, and the associated string values are removed to free up memory (Figure 7).
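Conceptually, the trimming step can be sketched with the following commands, shown one entry at a time for clarity (in TBMQ the whole sequence runs inside a Lua script and removes whatever excess exists beyond the configured limit):
ZCARD {client_id}_messages
ZRANGE {client_id}_messages 0 0
ZREMRANGEBYRANK {client_id}_messages 0 0
DEL {client_id}_messages_65534
ZCARD checks the current number of references, ZRANGE looks up the oldest reference, ZREMRANGEBYRANK removes it from the sorted set, and DEL frees the associated payload key (here the oldest reference is assumed to be {client_id}_messages_65534).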
3.3.5. Message Retrieval and Cleanup
The design not only manages the sorted set size during message persistence; it also performs cleanup during message retrieval, which happens when a device reconnects to receive undelivered messages. The cleanup removes references to expired entries, keeping the sorted set clean (Figure 8). A simplified version of this retrieval path is sketched below.
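The sketch uses the illustrative key names from above; in TBMQ these steps are executed atomically inside a Lua script:
ZRANGE {client_id}_messages 0 -1
GET {client_id}_messages_65535
ZREM {client_id}_messages {client_id}_messages_65535
ZRANGE returns the message references in score order, GET fetches each payload, and any reference whose payload has already expired (GET returns nil) is removed from the sorted set with ZREM.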
By combining Redis sorted sets and strings with Lua scripting for atomic operations, TBMQ ensures both efficient message handling and automated cleanup during storage and retrieval. This design effectively addresses the scalability challenges present in the earlier PostgreSQL-based solution.
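To summarize the write path, the following EVAL sketch combines the pieces described above into a single atomic unit. It is a conceptual example rather than TBMQ’s exact script: the key names, argument order, and the 600-second TTL are illustrative, and the command is split across lines only for readability:
EVAL "redis.call('ZADD', KEYS[1], ARGV[1], KEYS[2]);
      redis.call('SET', KEYS[2], ARGV[2], 'EX', ARGV[3]);
      return redis.call('SET', KEYS[3], ARGV[4])"
    3 {client_id}_messages {client_id}_messages_1 {client_id}_last_packet_id
    65536 "<message payload JSON>" 600 1
Because all three keys share the {client_id} hash tag, the script touches a single hash slot, so Redis Cluster accepts the multi-key operation and executes it atomically.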
The following sections present the performance comparison between the new Redis-based architecture and the original PostgreSQL solution.
3.4. Migration from Jedis to Lettuce
Jedis and Lettuce are two widely used Java clients for interacting with Redis. Jedis is a synchronous client that executes commands sequentially, blocking the calling thread until a response is received. In contrast, Lettuce is an asynchronous, non-blocking client built on top of Netty, designed for high-concurrency, event-driven workloads. These architectural differences significantly influence Redis performance in high-throughput scenarios and motivated the transition described below.
As previously discussed, a prototype test revealed the limit of 30 k msg/s throughput when using PostgreSQL for persistent message storage. At that time, Redis was already integrated into the system for cache-related operations using the Jedis client. Given this, Jedis was initially repurposed to support message persistence for MQTT clients. However, the prototype testing results of the new Redis-based implementation with Jedis were unexpected. While it was anticipated that Redis would significantly outperform PostgreSQL, the performance improvement was modest, reaching only 40 k msg/s throughput compared to the 30 k msg/s limit with PostgreSQL (Figure 9).
This observation triggered an investigation into performance bottlenecks, which revealed that Jedis was a limiting factor. While reliable, Jedis operates synchronously, processing each Redis command sequentially. This behavior causes the system to wait for each operation to finish before starting the next, which, in high-throughput scenarios, significantly restricts Redis’s performance and prevents full utilization of system resources.
To overcome this limitation, the migration to Lettuce, an asynchronous Redis client built on top of Netty, was performed [41,42,43]. With Lettuce, throughput increased to 60 k msg/s, demonstrating the benefits of non-blocking operations and improved parallelism, as seen in Figure 10.
Lettuce enables multiple commands to be dispatched and processed in parallel, fully exploiting Redis’s capacity for concurrent workloads. As a result, the migration delivered the expected performance improvements, laying the foundation for successful P2P testing at scale.
To provide a consolidated view of this performance progression, Table 1 summarizes the results of prototype testing across the three persistence configurations. These results were obtained under identical conditions using the same test harness and workload parameters. The table illustrates how architectural changes—particularly the migration from a synchronous to an asynchronous Redis client—progressively improved throughput and reduced latency, laying the foundation for the subsequent large-scale evaluation.
4. Experiments and Results
With Redis and Lettuce fully integrated, the next step was to validate TBMQ’s capability to support large-scale P2P messaging in a distributed environment. To simulate real-world conditions, TBMQ was deployed on AWS Elastic Kubernetes Service (EKS) [44], enabling dynamic scaling and stress testing of the system.
4.1. Test Methodology
To assess TBMQ’s ability to handle point-to-point communication at scale, five tests were conducted to measure performance, efficiency, and latency, with a maximum throughput of 1 M msg/s. Throughput refers to the total number of messages per second, including both incoming and outgoing messages. The performance test environment was deployed on an AWS EKS cluster and scaled horizontally as the workload increased. This made it possible to evaluate how TBMQ handles growing demands while maintaining reliable performance.
Each test ran for 10 min, using an equal number of publishers and subscribers. Both publishers and subscribers operated with QoS 1, ensuring reliable message delivery. Subscribers were configured with clean_session=false, ensuring that messages were retained and delivered even during offline periods. Published messages were 62 bytes in size. They were assigned to unique topics such as "europe/ua/kyiv/$number", with corresponding subscriptions to "europe/ua/kyiv/$number/+". Here, $number identified each publisher–subscriber pair.
4.2. Test Agent Setup
To evaluate TBMQ’s performance under increasing message traffic, a test agent architecture was designed to simulate large-scale publisher and subscriber activity. The test agent consisted of two main components: runner pods and an orchestrator pod. Each component was deployed on Amazon Elastic Compute Cloud (EC2) instances, a scalable virtual computing service provided by AWS [44,45]. EC2 enables users to provision virtual machines with configurable CPU, memory, and network capacity, offering flexibility to handle varying workloads [46].
Runner Pods
Runner pods were dedicated to either publishing or subscribing. Each publisher pod generated the same number of clients as its corresponding subscriber pod. This one-to-one symmetry in client allocation ensured a balanced message exchange and consistent load distribution throughout the test. All pods were deployed on EC2 instances, and both the number of pods per instance and the number of instances were scaled in accordance with the target throughput. The configuration was designed to maximize resource efficiency while remaining within the operational constraints of the system.
A critical constraint in scaling was the number of available TCP ports on each runner pod instance. Each MQTT client requires a separate TCP connection. Therefore, each pod was explicitly configured to handle a maximum of 59,975 clients, matching the system’s defined ephemeral port range and file descriptor limits. As a result, the number of pods per instance was increased for higher message throughput scenarios to remain within this TCP port capacity.
Table 2 shows that throughput increases were managed by scaling the number of EC2 instances or pods per instance. For example, a throughput of 1 million messages per second required four instances, each hosting five pods. This flexible setup allowed the test agent to scale with increasing traffic while accounting for infrastructure constraints, including ephemeral port limitations.
Orchestrator Pod
The orchestrator pod managed the execution and coordination of runner pods and was hosted on a dedicated EC2 instance. This instance also supported auxiliary monitoring tools, including:
Kafka Redpanda Console: For real-time broker monitoring [47].
Redis Insight: For analyzing database performance [48].
This modular architecture allowed the test agent to adapt dynamically to increasing traffic demands. By effectively distributing workloads across EC2 instances, it maintained consistent performance and reliable message delivery, even at high throughput levels.
4.3. Infrastructure Overview
This section provides an overview of the test infrastructure, highlighting the hardware specifications of the services utilized in the EKS cluster. EKS is a managed platform that simplifies the deployment and management of containers using the popular Kubernetes system [49]. In the test environment, services such as TBMQ, Kafka, and Redis were deployed in containers within an EKS cluster. These containers were distributed across AWS EC2 virtual machines. This setup ensured optimal resource allocation, scalability, and performance during the testing process.
AWS RDS (Amazon Web Services Relational Database Service) is used for managing the PostgreSQL database. This database stores various TBMQ entities such as users, user credentials, MQTT client credentials, statistics, WebSocket connections, WebSocket subscriptions, and others.
Table 3 below presents the hardware specifications for the services used in the tests:
Note: To minimize costs during the load testing phase, only the Redis master nodes were used without replicas. This configuration enabled us to focus on achieving the target throughput without excessive resource provisioning.
Instance scaling was adjusted during each test to match workload demands, as described in the next section.
4.4. Performance Tests
To evaluate performance and prove that the system can scale efficiently, testing started with 200,000 msg/s. In each iteration, the load was increased by 200,000 msg/s. In each phase, the number of TBMQ brokers and Redis nodes was scaled to handle the growing traffic while maintaining system stability. For the 1 M msg/s test, the number of Kafka brokers was also increased to accommodate the corresponding workload (Table 4).
In addition to resource scaling, each increase in load required careful tuning of Kafka topic partitions and Lettuce command batching. These adjustments helped maintain balanced traffic distribution and stable latency, avoiding bottlenecks during system growth (Table 5).
The target of 1 million msg/s was successfully reached, confirming TBMQ’s ability to support high-throughput, reliable P2P messaging. To illustrate the setup and outcomes of this final test, Figure 11 provides a visual overview of the architecture used.
4.5. Results
During testing, key performance indicators, including CPU usage, memory consumption, and message latency, were continuously monitored. One notable strength of TBMQ, as demonstrated in the P2P tests, is its high message throughput per CPU core. In comparison with public benchmarks of other brokers, TBMQ consistently achieved greater throughput with fewer resources, underscoring its efficiency in large-scale environments.
The key takeaways from the tests include the following:
Scalability: TBMQ exhibited linear scalability, with reliable performance maintained as message throughput increased from 200 k to 1 M msg/s through the incremental addition of TBMQ nodes, Redis nodes, and Kafka nodes.
Efficient Resource Utilization: CPU utilization on TBMQ nodes remained at a consistent level across all test phases, indicating that the system effectively used available resources without overconsumption.
Latency Management: The observed end-to-end latency across all tests remained within the two-digit millisecond range. This was predictable given the QoS 1 level chosen for the test, applied to both publishers and persistent subscribers. The average acknowledgment latency for publishers was also tracked, and it stayed within the single-digit millisecond range across all test phases.
High Performance: TBMQ’s one-to-one communication pattern showed excellent efficiency, processing about 8900 msg/s per CPU core. This was calculated by dividing the total throughput by the total number of CPU cores used in the setup.
Additionally, Table 6 provides a comprehensive summary of the key elements and results of the final 1 M msg/s test.
TBMQ CPU usage: The average CPU utilization across all TBMQ nodes.
P2P latency: The average duration from when a PUB message is sent by the publisher to when it is received by the subscriber.
Publish latency: The average time elapsed between the PUB message sent by the publisher and the reception of the PUBACK acknowledgment.
Figure 12 demonstrates CPU utilization (%) across five managed nodes in the TBMQ cluster during the 1 million messages-per-second performance evaluation. The peak load occurred at 16:40 UTC, with the highest observed CPU usage reaching 96.4% on one node. The subsequent decline in CPU utilization reflects the scheduled end of the 10-min load test. After the test duration elapsed, publishers and subscribers were terminated, leading to a rapid drop in system activity.
Figure 13 shows the Java Management Extensions (JMX) monitoring of the Central Processing Unit (CPU) confirming steady CPU load.
The system handles the load effectively even under high activity levels (approximately 90% CPU usage).
The absence of GC (Garbage Collection) activity confirms the stability and efficiency of Java Virtual Machine (JVM) performance during the tests.
The current low CPU usage after the test completion indicates that the system quickly releases resources and returns to its normal operational state.
Figure 14 shows the Java Management Extensions (JMX) monitoring of the RAM usage on one of the TBMQ nodes during the test, confirming no memory leaks after the warm-up period.
Initial Growth of Heap Memory: The initial increase in heap memory usage (from 2 GB to 12 GB) indicates the start of the performance test and the allocation of memory required for handling the workload.
Cyclic Memory Usage Patterns: Following stabilization, cyclic patterns in memory usage are observed, reflecting the regular activity of the Garbage Collection (GC) process. GC effectively frees unused memory, maintaining a stable memory footprint.
No Memory Overflow: The total allocated heap size (25 GB) was sufficient, as the memory usage never reached a critical threshold, preventing any OutOfMemory errors.
Stable Performance Under Load: The consistent memory usage patterns without significant spikes confirm the system’s ability to handle high workloads efficiently while maintaining GC effectiveness.
Return to Baseline: After test completion, memory usage gradually decreased, demonstrating the system’s capability to release resources promptly and return to its baseline state.
These results highlight the system’s optimized JVM configuration and heap management, ensuring reliable performance under intensive load conditions.
For a more detailed explanation of the testing architecture, methodology, and results, refer to the dedicated performance evaluation article [50].
5. Discussion
The results demonstrate that TBMQ achieves linear scalability and low-latency message delivery in large-scale P2P scenarios. It reaches throughput levels of 1 million messages per second while maintaining approximately 8900 messages per second per CPU core and two-digit millisecond end-to-end delivery latencies. The Redis-based session persistence layer enabled efficient handling of persistent MQTT sessions, while Kafka provided reliable backend message durability and routing.
Compared with EMQX and HiveMQ, TBMQ shows clear advantages in infrastructure cost efficiency and architectural simplicity. While EMQX and HiveMQ demonstrate substantial progress in scaling MQTT brokers using one-to-one communication patterns, their underlying persistence architectures and infrastructure requirements differ significantly from those of TBMQ. Both EMQX and HiveMQ rely primarily on disk-based persistence (RocksDB and file-based storage, respectively) to ensure high data durability. In contrast, TBMQ leverages an in-memory persistence model via Redis Cluster, combined with Kafka-based routing, to prioritize low-latency delivery and infrastructure cost efficiency.
Table 7 summarizes the architectural differences between TBMQ, EMQX, and HiveMQ.
These results suggest that TBMQ provides a compelling alternative for edge-oriented IoT deployments where minimizing latency and operational overhead is critical. A visual comparative analysis of resource consumption (CPU, memory, disk I/O) between TBMQ, EMQX, and HiveMQ under equivalent messaging workloads is a valuable direction for future work. Establishing standardized test suites for such evaluations would provide additional insights into infrastructure efficiency and help practitioners make informed architectural decisions [51,52].
However, several limitations should be acknowledged. First, Redis’s in-memory nature makes it inherently more volatile than disk-based storage. In the context of TBMQ, this volatility is acceptable and well-managed due to the system’s multi-layered persistence design. TBMQ does not rely on Redis Append-Only File (AOF) persistence, which was intentionally disabled. Instead, it employs a default Redis snapshot (RDB) strategy to support baseline fault recovery. This decision is grounded in the fact that AOF introduces unnecessary write amplification without providing significant resilience benefits in this architecture. Specifically, TBMQ persists messages to Kafka at each processing stage and only commits offsets after Redis acknowledges successful writes. This guarantees that any transient Redis failure does not result in data loss, as messages can be deterministically replayed from Kafka. In addition, Redis memory usage is tightly bounded via explicit per-client message limits and per-message TTLs, ensuring predictable memory allocation even in large-scale IoT deployments. While Redis replication and snapshots contribute to system robustness, Kafka remains the core durability layer, allowing Redis to be optimized for speed and responsiveness without sacrificing reliability.
Future work could focus on optimizing Redis utilization further by adjusting Lua scripting strategies. Currently, Lua scripts operate per client session to comply with Redis Cluster’s slot boundaries. By grouping multiple clients into the same hash slot, batch processing could be achieved, reducing scripting overhead and improving Redis efficiency. Additionally, exploring dynamic payload-aware routing strategies between Redis and Kafka could further enhance TBMQ’s flexibility. This would improve performance for a broader range of IoT messaging workloads.
6. Conclusions
This study presented the architectural transformation of TBMQ, an open source MQTT broker, to support high-throughput, low-latency point-to-point (P2P) messaging at scale. TBMQ migrated from a PostgreSQL-backed persistence layer to a horizontally scalable Redis-based architecture and further optimized data access using the Lettuce client and Lua scripting. These changes allow TBMQ to overcome critical limitations faced by traditional MQTT brokers under high-concurrency loads. The redesigned architecture effectively separates session state management and message routing concerns, using Redis and Kafka, respectively, to achieve both performance and reliability.
The results of comprehensive performance tests show that the new architecture provides true horizontal scalability without compromising QoS guarantees. During these tests, TBMQ reached 1 million messages per second while maintaining sub-100 ms end-to-end latency and 8900 msg/s per CPU core. The system maintained high CPU efficiency, stable memory usage, and consistent delivery guarantees even under sustained peak load.
The findings confirm that TBMQ can serve as a robust and scalable foundation for next-generation IoT deployments, particularly those requiring reliable P2P communication between massive numbers of connected devices. This work also contributes to the broader research on distributed messaging systems by demonstrating a viable, cloud-native, multi-layered persistence model for MQTT brokers.
Future work will focus on further performance optimizations such as cross-session batching in Redis and tighter integration with external systems. One promising direction involves enhancing embedded integration support within TBMQ, enabling direct traffic routing from IoT devices to various platforms without intermediate transformation. This capability opens the door to advanced features like dynamic payload serialization (e.g., Protocol Buffers), which can be leveraged to reduce overhead and improve interoperability. Preliminary considerations for such enhancements are aligned with recent research on dynamic data serialization in IoT platforms [53].
Author Contributions
Conceptualization, A.S.; methodology, V.A.; software, D.S.; validation, A.S. and D.S.; formal analysis, D.S.; investigation, D.S.; resources, A.S.; writing—original draft preparation, D.S.; writing—review and editing, A.S. and V.A.; visualization, D.S.; supervision, V.A.; project administration, A.S.; funding acquisition, A.S. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by ThingsBoard, Inc., which provided the cloud infrastructure necessary for conducting the performance tests. The authors gratefully acknowledge their contribution to this research.
Data Availability Statement
Acknowledgments
The authors would like to thank the team at ThingsBoard, Inc. for their support in the development and evaluation of the TBMQ platform. During the preparation of this manuscript, the authors used ChatGPT-4 (OpenAI, 2024) and Grammarly for grammar and spelling checks, sentence refinement, and improving overall clarity. The authors have reviewed and edited the output and take full responsibility for the content of this publication.
Conflicts of Interest
The authors are employees of ThingsBoard, Inc., which supported this research by providing access to infrastructure and computing resources. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
AWS | Amazon Web Services
CPU | Central Processing Unit
EKS | Elastic Kubernetes Service
IoT | Internet of Things
JMX | Java Management Extensions
MQTT | Message Queuing Telemetry Transport
P2P | Point-to-Point
pub/sub | Publish/Subscribe
QoS | Quality of Service
RAM | Random Access Memory
TCP | Transmission Control Protocol
References
- Mishra, B.; Kertész, A. The Use of MQTT in M2M and IoT Systems: A Survey. IEEE Access 2020, 8, 201071–201086. [Google Scholar] [CrossRef]
- Hmissi, F.; Ouni, S. SDN-DMQTT: SDN-Based Platform for Re-configurable MQTT Distributed Brokers Architecture. In Mobile and Ubiquitous Systems: Computing, Networking and Services; Zaslavsky, A., Ning, Z., Kalogeraki, V., Georgakopoulos, D., Chrysanthis, P.K., Eds.; Springer: Cham, Switzerland, 2024; Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering; Volume 594, pp. 393–411. [Google Scholar] [CrossRef]
- Dinculeană, D.; Cheng, X. Vulnerabilities and Limitations of MQTT Protocol Used between IoT Devices. Appl. Sci. 2019, 9, 848. [Google Scholar] [CrossRef]
- De Paolis, L.T.; De Luca, V.; Paiano, R. Sensor Data Collection and Analytics with ThingsBoard and Spark Streaming. In Proceedings of the 2018 IEEE Workshop on Environmental, Energy, and Structural Monitoring Systems (EESMS), Salerno, Italy, 21–22 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6. [Google Scholar] [CrossRef]
- Di Felice, P.; Paolone, G. Papers Mentioning Things Board: A Systematic Mapping Study. J. Comput. Sci. 2024, 20, 574–584. [Google Scholar] [CrossRef]
- ThingsBoard. ThingsBoard IoT Platform. 2016. Available online: https://thingsboard.io (accessed on 1 March 2024).
- Aghenta, L.O.; Iqbal, M.T. Design and implementation of a low-cost, open source IoT-based SCADA system using ESP32 with OLED, ThingsBoard and MQTT protocol. AIMS Electron. Electr. Eng. 2019, 4, 57–86. [Google Scholar] [CrossRef]
- Bestari, D.N.; Wibowo, A. An IoT-Based Real-Time Weather Monitoring System Using Telegram Bot and Thingsboard Platform. Int. J. Interact. Mob. Technol. 2023, 17, 4–19. [Google Scholar] [CrossRef]
- Casillo, M.; Colace, F.; De Santo, M.; Lorusso, A.; Mosca, R.; Santaniello, D. VIOTLab: A Virtual Remote Laboratory for Internet of Things Based on ThingsBoard Platform; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2021; Volume 2021. [Google Scholar] [CrossRef]
- Jang, S.I.; Kim, J.Y.; Iskakov, A.; Fatih Demirci, M.; Wong, K.S.; Kim, Y.J.; Kim, M.H. Blockchain Based Authentication Method for ThingsBoard. Lect. Notes Electr. Eng. 2021, 715, 471–479. [Google Scholar] [CrossRef]
- Okhovat, E.; Bauer, M. Monitoring the Smart City Sensor Data Using Thingsboard and Node-Red; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2021; pp. 425–432. [Google Scholar] [CrossRef]
- Team, A.I.C. MQTT Communication Patterns: One-to-Many (Broadcast). 2023. Available online: https://docs.aws.amazon.com/whitepapers/latest/designing-mqtt-topics-aws-iot-core/mqtt-communication-patterns.html#broadcast (accessed on 1 May 2025).
- Team, A.I.C. MQTT Communication Patterns: Many-to-One (Fan-In). 2023. Available online: https://docs.aws.amazon.com/whitepapers/latest/designing-mqtt-topics-aws-iot-core/mqtt-communication-patterns.html#fan-in (accessed on 1 May 2025).
- ThingsBoard. TBMQ Release Notes v2.0.1 (31 December 2024). 2022. Available online: https://thingsboard.io/docs/mqtt-broker/releases/#v201-december-31-2024 (accessed on 1 May 2025).
- Spohn, M.A. On MQTT Scalability in the Internet of Things: Issues, Solutions, and Future Directions. J. Electron. Electr. Eng. 2022, 1, 1–11. [Google Scholar] [CrossRef]
- Akour, M.; Rousan, I.A.; Hassanein, H. Multi-Level Just-Enough Elasticity for MQTT Brokers of Internet of Things Applications. Clust. Comput. 2022, 25, 1025–1045. [Google Scholar] [CrossRef]
- Mishra, B. Performance Evaluation of MQTT Broker Servers; Springer: Berlin/Heidelberg, Germany, 2018; pp. 599–609. [Google Scholar] [CrossRef]
- Mishra, B.; Mishra, B.; Kertesz, A. Stress-Testing MQTT Brokers: A Comparative Analysis of Performance Measurements. Energies 2021, 14, 5817. [Google Scholar] [CrossRef]
- Dizdarevic, J.; Michalke, M.; Jukan, A. Engineering and Experimentally Benchmarking Open Source MQTT Broker Implementations. arXiv 2023, arXiv:2305.13893. [Google Scholar]
- Kashyap, M.; Dev, A.; Sharma, V. Implementation and Analysis of EMQX broker for MQTT Protocol in the Internet of Things. e-Prime-Adv. Electr. Eng. Electron. Energy 2024, 10, 100846. [Google Scholar] [CrossRef]
- Gupta, A. Processing Time-Series Data with Redis and Apache Kafka. 2021. Available online: https://redis.io/blog/processing-time-series-data-with-redis-and-apache-kafka/ (accessed on 3 May 2025).
- Team, E. MQTT and Redis: Creating a Real-Time Data Statistics Application for IoT. 2023. Available online: https://www.emqx.com/en/blog/mqtt-and-redis (accessed on 3 May 2025).
- Ma, Z.; Yan, M.; Wang, R.; Wang, S. Performance Analysis of P2P Network Content Delivery Based on Queueing Model. Clust. Comput. 2023, 27, 2901–2915. [Google Scholar] [CrossRef]
- Shen, Y.; Ma, Z. The Analysis of P2P Networks with Malicious Peers and Repairable Breakdown Based on Geo/Geo/1+1 Queue. J. Parallel Distrib. Comput. 2025, 195, 104979. [Google Scholar] [CrossRef]
- Dong, B.; Chen, J. An Adaptive Routing Strategy in P2P-Based Edge Cloud. J. Cloud Comput. 2024, 13, 13. [Google Scholar] [CrossRef]
- Shvaika, D.I.; Shvaika, A.I.; Landiak, D.I.; Artemchuk, V.O. Scalable and Reliable MQTT Messaging: Evaluating TBMQ for P2P Scenarios. CEUR Workshop Proc. 2025, 58–66. Available online: https://ceur-ws.org/Vol-3943/paper12.pdf (accessed on 3 May 2025).
- ThingsBoard. TBMQ 1.0: 100 Million Connections Performance Test. 2022. Available online: https://thingsboard.io/docs/mqtt-broker/reference/100m-connections-performance-test/ (accessed on 1 May 2025).
- ThingsBoard. TBMQ 1.0: 3 Million Messages Per Second Throughput on a Single Node. 2022. Available online: https://thingsboard.io/docs/mqtt-broker/reference/3m-throughput-single-node-performance-test/ (accessed on 1 May 2025).
- Apache Software Foundation. Apache Kafka Documentation. 2025. Available online: https://kafka.apache.org/documentation (accessed on 1 April 2025).
- Vyas, S.; Tyagi, R.K.; Jain, C.; Sahu, S. Performance Evaluation of Apache Kafka—A Modern Platform for Real Time Data Streaming. In Proceedings of the 2022 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM), Pradesh, India, 23–25 February 2022; Volume 2, pp. 465–470. [Google Scholar] [CrossRef]
- Park, S.; Huh, J.H. A Study on Big Data Collecting and Utilizing Smart Factory Based Grid Networking Big Data Using Apache Kafka. IEEE Access 2023, 11, 96131–96142. [Google Scholar] [CrossRef]
- Elshoubary, E.E.; Radwan, T. Studying the Efficiency of the Apache Kafka System Using the Reduction Method, and Its Effectiveness in Terms of Reliability Metrics Subject to a Copula Approach. Appl. Sci. 2024, 14, 6758. [Google Scholar] [CrossRef]
- ThingsBoard Inc. TBMQ Performance Tests: P2P Messaging Benchmark Suite. 2024. Available online: https://github.com/thingsboard/tb-mqtt-perf-tests/tree/p2p-perf-test (accessed on 3 May 2025).
- Timescale Team. PostgreSQL + TimescaleDB: 1000x Faster Queries, 90% Data Compression, and Much More. 2018. Available online: https://www.timescale.com/blog/postgresql-timescaledb-1000x-faster-queries-90-data-compression-and-much-more (accessed on 1 May 2025).
- Salunke, S.; Ouda, A. A Performance Benchmark for the PostgreSQL and MySQL Databases. Future Internet 2024, 16, 382. [Google Scholar] [CrossRef]
- Ünal, H.T.; Mendi, A.F.; Mete, S.; Özkan, Ö.; Vurgun, Ö.U.; Nacar, M.A. PostgreSQL Database Management System: ODAK. In Proceedings of the 2023 Innovations in Intelligent Systems and Applications Conference (ASYU), Sivas, Turkey, 11–13 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar] [CrossRef]
- Redis. Redis Pub/Sub. 2024. Available online: https://redis.io/docs/latest/develop/interact/pubsub/ (accessed on 1 April 2025).
- Chen, L.G.; Li, Y.; Laohakangvalvit, T.; Sugaya, M. Asynchronous I/O Persistence for In-Memory Database Servers: Leveraging io_uring to Optimize Redis Persistence. In Proceedings of the CLOUD Computing–CLOUD 2024, Bangkok, Thailand, 16–19 November 2024; Wang, Y., Zhang, L.J., Eds.; Springer: Cham, Switzerland, 2025; pp. 11–20. [Google Scholar]
- Muradova, G.; Hematyar, M.; Jamalova, J. Advantages of Redis in-memory database to efficiently search for healthcare medical supplies using geospatial data. In Proceedings of the 2022 IEEE 16th International Conference on Application of Information and Communication Technologies (AICT), Washington, DC, USA, 12–14 October 2022; pp. 1–5. [Google Scholar] [CrossRef]
- Redis. Lua Scripting Documentation. 2024. Available online: https://redis.io/docs/latest/commands/eval/ (accessed on 1 April 2025).
- Lettuce Project Contributors. Lettuce: Scalable Redis Client. 2024. Available online: https://lettuce.io/ (accessed on 1 April 2025).
- Lee, T. Netty: Asynchronous Event-Driven Network Application Framework. 2021. Available online: https://netty.io/ (accessed on 1 April 2025).
- Maurer, N.; Wolfthal, M.A. Netty in Action; Manning Publications: Shelter Island, NY, USA, 2015. [Google Scholar]
- AWS. Amazon Web Services. 2023. Available online: https://aws.amazon.com/ (accessed on 1 April 2025).
- Wittig, A.; Wittig, M. Amazon Web Services in Action: An In-Depth Guide to AWS; Simon and Schuster: New York, NY, USA, 2023. [Google Scholar]
- ThingsBoard Inc. How to Repeat the 1 M msg/s Throughput Test. 2024. Available online: https://thingsboard.io/docs/mqtt-broker/reference/1m-throughput-p2p-performance-test/#how-to-repeat-the-1m-msgsec-throughput-test (accessed on 3 May 2025).
- Redpanda Data. Redpanda Console: A Developer-Friendly UI for Kafka. 2024. Available online: https://www.redpanda.com/redpanda-console-kafka-ui (accessed on 3 May 2025).
- Redis Ltd. RedisInsight: Developer Tool for Managing Redis. 2024. Available online: https://redis.io/docs/latest/operate/redisinsight/ (accessed on 3 May 2025).
- Kubernetes. Kubernetes Documentation. 2023. Available online: https://kubernetes.io/ (accessed on 1 April 2025).
- ThingsBoard. Scaling P2P Messaging to 1 M Msg/s with Persistent MQTT Clients. 2022. Available online: https://thingsboard.io/docs/mqtt-broker/reference/1m-throughput-p2p-performance-test/ (accessed on 1 May 2025).
- EMQX Team. Reaching 100 M MQTT Connections with EMQX 5.0. 2022. Available online: https://www.emqx.com/en/blog/reaching-100m-mqtt-connections-with-emqx-5-0 (accessed on 1 May 2025).
- HiveMQ Team. HiveMQ Increases MQTT Per-Core Throughput. 2023. Available online: https://www.hivemq.com/blog/hivemq-increases-mqtt-per-core-throughput/ (accessed on 1 May 2025).
- Shvaika, D.I.; Shvaika, A.I.; Artemchuk, V.O. Advancing IoT interoperability: Dynamic data serialization using ThingsBoard. J. Edge Comput. 2024, 3, 126–135. [Google Scholar] [CrossRef]
Figure 1. Overview of the TBMQ fan-in communication pattern in a distributed cluster.
Figure 2. TBMQ architecture using PostgreSQL for MQTT session persistence, illustrating the performance bottleneck caused by vertically scaled, disk-based storage.
Figure 3. The graph reflects 75 k tuples inserted per 5 s, corresponding to 15 k msg/s for persistent MQTT clients, or half of the measured 30 k msg/s throughput.
Figure 4. Updated TBMQ architecture using Redis for session persistence in high-throughput P2P messaging scenarios.
Figure 5. Redis Cluster slot-based sharding model. Each key is hashed to one of 16,384 slots and routed to the corresponding shard [40].
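For readers who want to reproduce the slot mapping illustrated in Figure 5, the sketch below derives a key's Redis Cluster slot in Java following the documented scheme (CRC16 in the XMODEM variant, taken modulo 16,384, with optional {hash tag} extraction). The key names and hash-tag usage are illustrative assumptions, not TBMQ source code.

```java
// Illustrative sketch only (not TBMQ source code): slot = CRC16(key) mod 16,384,
// where keys containing a non-empty {hash tag} are hashed on the tag alone so
// that related keys are co-located on one shard.
import java.nio.charset.StandardCharsets;

public final class RedisSlotSketch {

    private static final int SLOT_COUNT = 16_384;

    // CRC16/XMODEM: polynomial 0x1021, initial value 0x0000, no reflection.
    static int crc16(byte[] bytes) {
        int crc = 0x0000;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // If the key contains a non-empty {hash tag}, only the tag is hashed.
    public static int slotFor(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        // Hypothetical key names: both share the {clientA} tag and thus one slot.
        System.out.println(slotFor("{clientA}:messages"));
        System.out.println(slotFor("{clientA}:msg:42"));
    }
}
```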
Figure 6. Redis sorted set structure used for MQTT message ordering.
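As a minimal sketch of the ordering idea behind Figure 6, the snippet below uses the Lettuce synchronous API with a monotonically increasing sequence number as the sorted-set score. The key layout ("{clientA}:messages") and member values are illustrative assumptions, not the broker's actual data model.

```java
// Sketch: per-client sorted set keyed by an assumed name, ordered by sequence number.
import io.lettuce.core.Range;
import io.lettuce.core.RedisClient;
import io.lettuce.core.api.sync.RedisCommands;

public class SortedSetOrderingSketch {
    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");
        RedisCommands<String, String> redis = client.connect().sync();

        String key = "{clientA}:messages";

        // The sequence number serves as the score, so range queries return
        // pending messages in publish order.
        redis.zadd(key, 1, "msg-ref-1");
        redis.zadd(key, 2, "msg-ref-2");
        redis.zadd(key, 3, "msg-ref-3");

        // On client reconnect, pending messages are read back in order.
        System.out.println(redis.zrangebyscore(key, Range.unbounded()));

        client.shutdown();
    }
}
```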
Figure 7. Lua snippet for cleaning up Redis data structures during message persistence.
Figure 8. Lua snippet for message retrieval with cleanup of expired references in Redis.
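Figures 7 and 8 show the actual Lua scripts; the sketch below only illustrates how such cleanup logic can be executed atomically server-side via EVAL from Java with Lettuce. The Lua body, key name, and ARGV threshold are simplified assumptions and do not reproduce the scripts in the figures.

```java
// Sketch: atomic server-side cleanup of a per-client sorted set and its payload keys.
import io.lettuce.core.RedisClient;
import io.lettuce.core.ScriptOutputType;
import io.lettuce.core.api.sync.RedisCommands;

public class LuaCleanupSketch {

    // Remove acknowledged references from the sorted set and delete the
    // corresponding payload keys in a single atomic step.
    private static final String CLEANUP_SCRIPT =
            "local acked = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1]) " +
            "for _, ref in ipairs(acked) do redis.call('DEL', ref) end " +
            "return redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])";

    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");
        RedisCommands<String, String> redis = client.connect().sync();

        Long removed = redis.eval(
                CLEANUP_SCRIPT,
                ScriptOutputType.INTEGER,
                new String[]{"{clientA}:messages"}, // KEYS[1]: assumed key name
                "100");                             // ARGV[1]: last acknowledged sequence number

        System.out.println("Removed entries: " + removed);
        client.shutdown();
    }
}
```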
Figure 9. RedisInsight shows ~66 k commands/s per node, aligning with TBMQ’s 40 k msg/s, as Lua scripts trigger multiple Redis operations per message.
Figure 10. At 60 k msg/s, RedisInsight shows ~100 k commands/s per node, aligning with the expected increase from 40 k msg/s, which produced ~66 k commands/s per node.
Figure 11. TBMQ architecture and traffic distribution during the 1 million msg/s test. Each TBMQ node handled 100 k publishers and 100 k subscribers using persistent MQTT sessions with QoS 1. Redis nodes provided session storage, while Kafka brokers handled message routing.
Figure 12. CPU utilization (%) of five managed nodes in the TBMQ cluster.
Figure 13. JMX monitoring of CPU on the TBMQ node.
Figure 14. JMX monitoring of RAM on the TBMQ node.
Table 1. Prototype testing results comparing persistent message throughput across PostgreSQL and Redis-based implementations. PostgreSQL results are also visualized in Figure 3.

| Backend Implementation | Throughput (msg/s) | Average Latency (ms) | Architecture Notes |
|---|---|---|---|
| PostgreSQL | 30,000 | ~30–50 | Vertically scaled, disk-based, no clustering |
| Redis + Jedis | 40,000 | ~25–40 | In-memory, synchronous client, no batching |
| Redis + Lettuce | 60,000 | ~20–30 | In-memory, asynchronous client with command batching |
Table 2. Runner pod deployment configuration across test scenarios.

| Throughput (msg/s) | Pods per Instance | EC2 Instances | MQTT Clients per Pod |
|---|---|---|---|
| 200,000 | 5 | 1 | 20,000 |
| 400,000 | 5 | 1 | 40,000 |
| 600,000 | 10 | 1 | 30,000 |
| 800,000 | 5 | 2 | 40,000 |
| 1,000,000 | 5 | 4 | 25,000 |
Table 3. Hardware specifications for the services used in the tests.

| Service Name | TBMQ | Kafka | Redis | AWS RDS (PostgreSQL) |
|---|---|---|---|---|
| Instance Type | c7a.4xlarge | c7a.large | c7a.large | db.m6i.large |
| vCPU | 16 | 2 | 2 | 2 |
| Memory (GiB) | 32 | 4 | 4 | 8 |
| Storage (GiB) | 20 | 30 | 8 | 20 |
| Network Bandwidth (Gbps) | 12.5 | 12.5 | 12.5 | 12.5 |
Table 4. Scaling configuration for P2P throughput evaluation.

| Throughput (msg/s) | Publishers | Subscribers | TBMQ Brokers | Redis Nodes | Kafka Brokers |
|---|---|---|---|---|---|
| 200 k | 100 k | 100 k | 1 | 3 | 3 |
| 400 k | 200 k | 200 k | 2 | 5 | 3 |
| 600 k | 300 k | 300 k | 3 | 7 | 3 |
| 800 k | 400 k | 400 k | 4 | 9 | 3 |
| 1 M | 500 k | 500 k | 5 | 11 | 5 |
Table 5. Kafka and Redis tuning parameters at different throughput levels.

| Throughput (msg/s) | Kafka Partitions | Lettuce Batch Size |
|---|---|---|
| 200 k | 12 | 150 |
| 400 k | 12 | 250 |
| 600 k | 12 | 300 |
| 800 k | 16 | 400 |
| 1 M | 20 | 500 |
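The "Lettuce Batch Size" column refers to client-side command batching with manual flushing. The sketch below shows the general pattern under the same assumptions as the earlier snippets (key names, fixed batch size of 500 as used at 1 M msg/s); it is not TBMQ's producer code.

```java
// Sketch of Lettuce command batching: auto-flush is disabled and queued
// commands are written to the socket in batches.
import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.async.RedisAsyncCommands;

public class LettuceBatchingSketch {
    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");
        StatefulRedisConnection<String, String> connection = client.connect();
        RedisAsyncCommands<String, String> async = connection.async();

        int batchSize = 500; // value used at 1 M msg/s in Table 5
        connection.setAutoFlushCommands(false); // queue commands instead of writing immediately

        for (int i = 1; i <= 10_000; i++) {
            // In real code the returned futures would be tracked for completion and errors.
            async.zadd("{clientA}:messages", i, "msg-ref-" + i);
            if (i % batchSize == 0) {
                connection.flushCommands(); // one network write for the whole batch
            }
        }
        connection.flushCommands(); // flush any remainder
        connection.setAutoFlushCommands(true);

        connection.close();
        client.shutdown();
    }
}
```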
Table 6. Summary of performance metrics at 1 M msg/s throughput.

| QoS | P2P Latency (ms) | Publish Latency (ms) | TBMQ CPU Usage (avg) | Payload (bytes) |
|---|---|---|---|---|
| 1 | ~75 | ~8 | 91% | 62 |
Table 7. Comparison of persistence models and infrastructure focus across MQTT brokers.

| Feature | TBMQ | EMQX | HiveMQ |
|---|---|---|---|
| Primary Storage | Redis (in-memory) | RocksDB (disk) | File system (disk) |
| Latency | Very low | Moderate (SSD-dependent) | Moderate (SSD-dependent) |
| Crash Durability | Medium (depends on Redis snapshot/AOF) | High (RocksDB durability) | Very high (file persistence) |
| Memory Usage | High | Lower | Lower |
| Persistence Overhead | Minimal (in-memory ops) | Higher (disk writes) | Higher (disk writes) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).