Stress-Testing MQTT Brokers: A Comparative Analysis of Performance Measurements

Presently, Internet of Things (IoT) protocols are at the heart of Machine-to-Machine (M2M) communication. Irrespective of the radio technologies used for deploying an IoT/M2M network, all independent data generated by IoT devices (sensors and actuators) rely heavily on the special messaging protocols used for M2M communication in IoT applications. As the demand for IoT services grows, so does the need to reduce the power consumption of IoT devices and services to ensure a sustainable environment for future generations. Message-Queuing Telemetry Transport (MQTT) is a widely used IoT protocol. It is a low-resource-consuming messaging solution based on the publish-subscribe communication model. This paper aims to assess the performance of several MQTT broker implementations (also known as MQTT servers) using stress testing, and to analyze their relationship with system design. The brokers are evaluated in a realistic test scenario, and the test results are analyzed with three different metrics: CPU usage, latency, and message rate. As the main contribution of this work, we analyzed six MQTT brokers (Mosquitto, ActiveMQ, HiveMQ, Bevywise, VerneMQ, and EMQ X) in detail, and classified them using their main properties. Our results showed that Mosquitto outperforms the other considered solutions in most metrics; however, ActiveMQ is the best performing one in terms of scalability due to its multi-threaded implementation, while Bevywise has promising results for resource-constrained scenarios.

Mishra) and B.M. (Biswaranjan Mishra); writing (original draft preparation), B.M. (Biswajeeban Mishra); writing (review and editing), A.K., B.M. (Biswajeeban Mishra); visualization, B.M. (Biswajeeban Mishra); supervision, A.K.; project administration, A.K.; funding acquisition, A.K. All authors have read and agreed to the published version of the manuscript.


Introduction
In recent times, as the cost of sensors and actuators continues to fall, the number of Internet of Things (IoT) devices is growing rapidly and becoming part of our lives. As a result, the IoT footprint is noticeable everywhere. It is hard to find any industry that has not been revolutionized by the advent of this promising technology. A recent report [1] states that there will be around 125 billion IoT devices connected to the Internet by 2030. IoT networks use several radio technologies such as WLAN, WPAN, etc., for communication at a lower layer. Regardless of the radio technology used to create an M2M network, the end device or machine (IoT device) must make its data accessible through the Internet [2,3]. IoT devices are usually resource-constrained, which means that they operate with limited computation, memory, storage, energy storage (battery), and networking capabilities [4,5]. Hence, the efficiency of M2M communications largely depends on the underlying special messaging protocols designed for M2M communication in IoT applications. MQTT (Message-Queuing Telemetry Transport) [6], CoAP (Constrained Application Protocol), AMQP (Advanced Message-Queuing Protocol), and HTTP (Hypertext Transfer Protocol) are a few of the protocols in the M2M communication protocol segment [4,5]. Among these IoT Protocols, MQTT is a free, simple to deploy,

Basics of a Publish/Subscribe Messaging Service
These are the terms we often come across while working with a publish/subscribe (Pub/Sub) system. "Message" refers to the data that flows through the system. "Topic" is an object that represents a message stream. "Publisher" creates messages and sends them to the messaging service on a particular topic. The act of sending messages to the messaging service is called "Publishing". A publisher is also referred to as a Producer. "Subscriber", otherwise known as "Consumer", receives the messages on a specific subscription. "Subscription" refers to an interest in receiving messages on a particular topic. In a Pub/Sub system, producers of the event-driven data are usually decoupled from the consumers of the data [14,15]: publishers and subscribers are independent components that share information by publishing event-driven messages and by subscribing to event-driven messages of choice [14]. The central component of this system is called an event broker. It keeps a record of all the subscriptions. A publisher usually sends a message to the event broker on a specific topic, and the event broker then sends it to all the subscribers that previously subscribed to that topic. The event broker basically acts as a postmaster that matches, notifies, and delivers events to the corresponding subscribers. Figure 1 describes the overall architecture of a Pub/Sub system [16].
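The roles described above can be illustrated with a minimal in-memory sketch. It is not an MQTT implementation; the class name `EventBroker`, the callback signature, and the topic names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A subscriber is modelled as a callback receiving (topic, message).
Callback = Callable[[str, bytes], None]


class EventBroker:
    """Toy event broker: records subscriptions and fans out messages."""

    def __init__(self) -> None:
        # topic -> callbacks of the subscribers interested in that topic
        self._subscriptions: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register interest in receiving messages on a topic."""
        self._subscriptions[topic].append(callback)

    def publish(self, topic: str, message: bytes) -> int:
        """Deliver a message to every subscriber of the topic.

        Returns the number of deliveries; the publisher never learns
        who the subscribers are, which is the decoupling described above.
        """
        subscribers = self._subscriptions.get(topic, [])
        for callback in subscribers:
            callback(topic, message)
        return len(subscribers)


received = []
broker = EventBroker()
broker.subscribe("sensors/humidity", lambda t, m: received.append(("A", t, m)))
broker.subscribe("sensors/humidity", lambda t, m: received.append(("B", t, m)))

delivered = broker.publish("sensors/humidity", b"41")    # two subscribers
unmatched = broker.publish("sensors/temperature", b"22")  # no subscriber
```

A message published on a topic without subscribers is simply not delivered in this sketch; real brokers may additionally retain or queue such messages.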

Overview of MQTT Architecture
MQTT is a simple, lightweight, TCP/IP-based Pub/Sub type messaging protocol [11]. MQTT supports one-to-many, two-way, asynchronous communication [7]. Its binary header makes MQTT a lightweight protocol for carrying telemetry data between constrained devices [17] over unreliable networks [18].
It has three constituent components:
• A Publisher or Producer (an MQTT client).
• A Broker (an MQTT server).
• A Consumer or Subscriber (an MQTT client).
In MQTT, a client that is responsible for opening a network connection, creating messages, and sending them to the server is called a publisher. The subscriber is a client that subscribes to a topic of interest in advance to receive messages. It can also unsubscribe from a topic to delete a request for application messages, and close the network connection to the server [19] as needed. The server, otherwise known as a broker, acts as a post office between the publisher and the subscriber. It receives messages from the publishers and forwards them to all the subscribers of the corresponding topics. Figure 2 presents a basic model of MQTT [20].

Any application message carried by the MQTT protocol across the network to its destination contains a quality of service (QoS), payload data, a topic name [21], and a collection of properties. An application message can carry a payload of up to 256 MB [3]. A topic is a label attached to a message. Topic names are UTF-8 encoded strings and can be freely chosen [21]. Topic names can represent a multilevel hierarchy of information using a forward slash (/). For example, this topic name can represent a humidity sensor in the kitchen: "home/sensor/humidity/kitchen". We can have other topic names for other sensors that are present in other rooms: "home/sensor/temperature/livingroom", "home/sensor/temperature/kitchen", etc. Figure 3 shows an example of a topic tree.

MQTT offers three QoS (Quality of Service) levels for sending messages to an MQTT broker or a client, ranging from 0 to 2; see Figure 4. With QoS level 0, the sender does not store the message, and the receiver does not acknowledge its receipt. This method requires only one message, and once the message is sent to the broker by the client, it is deleted from the message queue. Since the message is never resent, QoS 0 eliminates the chance of duplicate messages; it is also known as the "fire and forget" method. It provides the minimal and least reliable level of message transmission, but offers the fastest delivery.

With QoS 1, the delivery of a message is guaranteed at least once, but the message may be sent more than once if necessary. This method needs two messages. The sender sends a message and waits for an acknowledgment (PUBACK message). If it receives the PUBACK message from the receiver, it deletes the message from the outbound queue. If it does not receive a PUBACK message, it resends the message with the duplicate flag (DUP flag) set.

The QoS 2 level guarantees exactly-once delivery of a message. This is the slowest of all the levels and needs four messages. At this level, the sender sends a message and waits for an acknowledgment (PUBREC message), which the receiver sends in response. If the sender fails to receive the PUBREC message, it sends the message again with the DUP flag set. Upon receiving the PUBREC message, the sender transmits the message release message (PUBREL). If the receiver does not receive the PUBREL message, it resends the PUBREC message. Once the receiver receives the PUBREL message, it forwards the message to all the subscribing clients. Thereafter, the receiver sends a publish complete (PUBCOMP) message. If the sender does not receive the PUBCOMP message, it resends the PUBREL message. Once the sending client receives the PUBCOMP message, the transmission process is marked as completed and the message can be deleted from the outbound queue [13].
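The packet exchanges described above can be summarized in a small sketch. The table lists the control packets per QoS level when nothing is lost, and `send_qos1` mimics the sender side of QoS 1 under a deliberately simple, hypothetical loss model in which the first few attempts are never acknowledged. The names `HANDSHAKE`, `Publish`, and `send_qos1` are illustrative only and do not belong to any MQTT library.

```python
from dataclasses import dataclass
from typing import List

# Control packets exchanged per QoS level in the loss-free case.
HANDSHAKE = {
    0: ["PUBLISH"],
    1: ["PUBLISH", "PUBACK"],
    2: ["PUBLISH", "PUBREC", "PUBREL", "PUBCOMP"],
}


@dataclass
class Publish:
    payload: bytes
    dup: bool = False  # DUP flag: set on every retransmission


def send_qos1(payload: bytes, acks_lost: int) -> List[Publish]:
    """Return the PUBLISH packets a QoS 1 sender emits when the first
    `acks_lost` attempts are never acknowledged (assumed loss model)."""
    sent: List[Publish] = []
    attempt = 0
    while True:
        # The first attempt has DUP cleared; every resend has it set.
        sent.append(Publish(payload, dup=attempt > 0))
        if attempt >= acks_lost:
            # PUBACK received: the message leaves the outbound queue.
            return sent
        attempt += 1


packets = send_qos1(b"41", acks_lost=2)
dup_flags = [p.dup for p in packets]
```

With two lost acknowledgments the receiver may see the same payload up to three times, which is the "at least once" behaviour of QoS 1; the extra PUBREC/PUBREL/PUBCOMP steps of QoS 2 exist to filter out such duplicates.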

Scalability and Types of MQTT Broker Implementations
System scalability can be defined as the ability to expand to meet an increasing workload [22]. The scalability of any message broker can be enhanced in two principal ways: the first is to improve the performance of a single system, and the second is to use clustering. In an MQTT message broker deployment, the performance of a broker running on a single system can be improved by using an event-driven I/O mechanism for the CPU cores when dispatching TCP connections from MQTT clients [21]. The other way of achieving better scalability is clustering, where an MQTT broker cluster is used in a distributed fashion. In this case, the cluster appears to the user as a single logical broker, but in reality, multiple physical MQTT brokers share the same workload [23]. There are two types of message broker implementations: non-scalable implementations that use a single thread or a fixed number of threads, and scalable multi-thread or multi-process implementations that can efficiently use all available resources in a system [16]. For example, Mosquitto and Bevywise MQTT Route are non-scalable broker implementations that cannot use all system resources, whereas ActiveMQ, HiveMQ, VerneMQ, and EMQ X are scalable [23]. It should be noted that Mosquitto provides a "bridge mode" that can be used to form a cluster of message brokers. In this mode, multiple cores are used according to the number of Mosquitto processes running in the cluster. However, the drawback of this mode is that the communication overhead between the processes inside the cluster results in poorer overall performance of the system [16].

Evaluating the Performance of a Messaging Service
Google, in its "Cloud Pub/Sub" product guide [14], describes the parameters for judging the performance of any publish/subscribe type messaging service. The performance of such a service can be measured along three factors: "latency", "scalability", and "availability". However, these three aspects frequently conflict with each other, and boosting two of them often involves compromising on the third. The following paragraphs shed some light on these terms from the perspective of a pub/sub type messaging service.

Latency
Latency is a time-based metric for evaluating the performance of a system. A good messaging service must optimize and reduce latency wherever possible. For a publish/subscribe service, the latency metric can be defined as follows: it denotes the time the service takes to acknowledge a sent message, or the time the service takes to send a published message to its subscriber. Latency can also be defined as the time taken by a messaging service to send a message from the publisher to the subscriber [14].

Scalability
Scalability usually refers to the ability to scale up with the increase in load. A robust scalable service can handle the increased load without an observable change in latency or availability. One can define load in a publish/subscribe type service by referring to the number of topics, publishers, subscribers, subscriptions or messages, as well as to the size of messages or the payload, and to the rate of sent messages, called throughput [14].

Availability
Systems can fail for many reasons. A failure may occur due to a human error while building or deploying software or configurations, or it may be caused by hardware failures such as malfunctioning disk drives or network connectivity issues. Sometimes a sudden increase in load results in a resource shortage and thus causes a system failure. The "sound availability of a system" usually refers to the ability of the system to handle different types of failure in a manner that is unobservable at the customer's end [14].

Related Work
There have been numerous works on the performance evaluation of various IoT communication protocols. In this section, we briefly summarize some of the notable works published in recent years. Table 1 presents a comparison of related works according to their main contributions. In 2014, Thangavel, Dinesh, et al. [24] conducted multiple experiments using a common middleware to test the bandwidth consumption and end-to-end delay of the MQTT and CoAP protocols. Their results showed that CoAP messages experienced higher delay and packet loss rates than MQTT messages.
Chen, Y., and Kunz, T., in 2016 [25], evaluated the performance of MQTT, CoAP, and DDS (Data Distribution Service) in a medical test environment, compared to a custom, UDP-based protocol. They used a network emulator, and their findings showed that DDS consumes higher bandwidth than MQTT, but it performs significantly better for data latency and reliability. DDS and MQTT, being TCP-based protocols, produced zero packet loss under degraded network conditions. The custom UDP and UDP-based CoAP showed significant data loss under similar test conditions. Mishra, B., in 2019 [18], investigated the performance of several public and locally deployed MQTT brokers in terms of subscription throughput. The performance of MQTT brokers was analyzed under normal and stressed conditions. The test results showed an insignificant difference between the performance of several MQTT brokers in normal deployment cases, but the performance of the brokers varied significantly under stressed conditions.
Pham, M. L., Nguyen, et al. in 2019 [26], introduced an MQTT benchmarking tool named MQTTBrokerBench. The tool is useful to analyze the performance of MQTT brokers by manually specifying load saturation points for the brokers.
Bertrand-Martinez, Eddas, et al. [27], in 2020, proposed a method for the classification and evaluation of IoT brokers. They performed qualitative evaluation using the ISO/IEC 25000 (SQuaRE) set of standards and Jain's process for performance evaluation. The authors validated the feasibility of their methodological approach with a case study on 12 different open source brokers.
Koziolek H, Grüner S, et al. [28], in 2020, compared three distributed MQTT brokers in terms of scalability, performance, extensibility, resilience, usability, and security. In their edge gateway, cluster-based test scenario, EMQX had the best performance, HiveMQ showed no message loss, and VerneMQ managed to deliver up to 10 K msg/s. The authors also proposed six decision points to be taken into account by software architects for deploying MQTT brokers.
In this work, we compare both scalable and non-scalable MQTT brokers and analyze the performance of six MQTT brokers in terms of message processing rate at 100% process/system CPU use, normalized message rate at unrestricted resource (CPU) usage, and average latency. We also analyze how each broker performs in single-core and multi-core processor environments. For a better analysis of the performance of MQTT brokers, we conducted this experiment in a low-end local testing environment as well as in a comparatively high-end cloud-based testing environment. This experiment deals with an important problem: the relation between MQTT broker system design and its performance under stress testing. Although it is well known that modular systems perform better on scalable and elastic requirements, we lack experiment-based information about that relationship. Therefore, the results obtained in this study should be helpful to developers of real-time systems and services.

Local and Cloud Test Environment Settings and Benchmarking Results
This section presents the setup of our realistic testbed in detail. To conduct stress tests on various MQTT brokers, we built two emulated IoT environments:
• a local testing environment, and
• a cloud-based testing environment.
The local testbed was created using an Intel NUC (NUC7i5BNB), a Toshiba Satellite B40-A laptop PC, and an Ideapad 330-15ARR laptop PC. To reduce network bottleneck issues, the devices were connected through a Gigabit Ethernet switch. The Intel NUC7i5BNB was configured as a server running an MQTT broker, the Ideapad 330-15ARR laptop was used as the publisher machine, and the Toshiba Satellite B40-A was used as the subscriber machine. The Ideapad 330-15ARR (publisher machine), with 8 hardware threads, is capable of firing messages at high rates. Table 2 presents a summary of the specifications of the hardware and software used to build our local evaluation environment. The cloud testbed was configured on Google Cloud Platform (GCP) [29]. We created three c2-standard-8 virtual machine (VM) instances, each with 8 vCPUs, 32 GB of memory, and a 30 GB local SSD, to act as publisher, subscriber, and server, respectively. All the VM instances were placed within a Virtual Private Cloud (VPC) network subnet using Google's high-performing premium tier network service [30]. Table 3 presents a summary of the specifications of our cloud test environment [31]. In this experiment, we used a high message publishing rate with multiple publishers, and the overall CPU usage on the publisher machine stayed below 70%. CPU usage on the subscriber side did not exceed 80%. We observed no swap usage on the subscriber, broker, or publisher machines during the evaluation.
For this experiment, we developed from scratch a benchmarking tool called MQTT Blaster [33], based on the Paho Python MQTT library [32], to send messages at very high rates to the MQTT server from the publisher machine. The subscriber machine used "mosquitto_sub" command-line subscribers; "mosquitto_sub" is an MQTT client for subscribing to topics and printing the received messages. During this empirical evaluation, the "mosquitto_sub" output was redirected to the null device (/dev/null), so that no resources were consumed writing messages. Each subscriber was configured to subscribe to the available published topics. In this way, the server reached its threshold at reasonable message publishing rates. Figure 5 presents the evaluation environment topology.
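As a rough illustration of the subscriber side, the following sketch builds a "mosquitto_sub" command line and starts it with its output discarded. It is not the script used in the experiment; the helper names, the broker address, and the topic names are hypothetical, and only the standard `-h` (host), `-t` (topic), and `-q` (QoS) options of "mosquitto_sub" are assumed.

```python
import subprocess
from typing import List


def mosquitto_sub_command(host: str, topics: List[str], qos: int = 0) -> List[str]:
    """Build the argument list for one mosquitto_sub subscriber."""
    command = ["mosquitto_sub", "-h", host, "-q", str(qos)]
    for topic in topics:
        # -t may be repeated to subscribe to several topics
        command += ["-t", topic]
    return command


def start_subscriber(host: str, topics: List[str], qos: int = 0) -> subprocess.Popen:
    """Start a subscriber whose printed messages are discarded.

    Requires the mosquitto_sub binary to be installed on the machine.
    """
    return subprocess.Popen(
        mosquitto_sub_command(host, topics, qos),
        stdout=subprocess.DEVNULL,  # same effect as redirecting to /dev/null
    )


# Hypothetical broker address and topic names, for illustration only.
cmd = mosquitto_sub_command(
    "192.0.2.10", ["bench/topic1", "bench/topic2", "bench/topic3"], qos=1
)
```

Calling `start_subscriber` several times would emulate a group of subscribers on one machine; the sketch only constructs the command so that it can be inspected without a running broker.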

Evaluation Scenario
This experiment was conducted on four widely used scalable and two non-scalable MQTT broker implementations. The other criteria for the selection of brokers were ease of availability and configurability. The tested brokers are: "Mosquitto 1.4.15" [34], "Bevywise MQTT Route 2.0" [35], "ActiveMQ 5.15.8" [36], "HiveMQ CE 2020.2" [37], "VerneMQ 1.10.2" [38] and "EMQ X 4.0.8" [39]. Out of these MQTT brokers, Mosquitto and Bevywise MQTT Route are non-scalable implementations, and the rest are scalable in nature. It is to be mentioned that Mosquitto is a single-threaded implementation, and Bevywise MQTT Route uses a dual thread approach, in which the first thread acts as an initiator of the second that processes messages. Table 4 presents an overview of the brokers.

Evaluation Conditions
All the brokers were configured to run under the test conditions listed in Table 5, with no authentication method enabled and the RETAIN flag set to true. Note that an increase in the number of subscribers, the number of topics, or the message rate results in an increased load on the broker. In our test environment, with the combination of 3 different publishing threads (1 topic per thread) and 15 subscribers, we were able to push the broker to 100% process usage while keeping the CPU usage on the publisher and subscriber machines below 70% and 80%, respectively.

Latency Calculations
Latency is defined as the time taken by a system to transmit a message from a publisher to a subscriber [13]. This experiment tries to simulate a realistic scenario of a client trying to publish a message, when the broker is overloaded with many messages on various topics from different clients. To achieve this, a different topic was used to send messages for latency calculations from the topics on which messages were fired to overload the system. It is noteworthy that an ideal broker implementation should always be able to efficiently process messages irrespective of the rate of messages fired to it.
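A minimal sketch of how an average publisher-to-subscriber latency could be computed from recorded timestamps is given below. It assumes that send and receive times are taken from the same (or a synchronized) clock and are keyed by a message identifier; the function name and the sample values are hypothetical and are not measurements from this study.

```python
from statistics import mean
from typing import Dict


def average_latency_ms(sent_at: Dict[int, float], received_at: Dict[int, float]) -> float:
    """Mean publisher-to-subscriber latency in milliseconds.

    Both dictionaries map a message id to a timestamp in seconds.
    Messages that were never received are ignored.
    """
    latencies = [
        (received_at[msg_id] - sent_at[msg_id]) * 1000.0
        for msg_id in sent_at
        if msg_id in received_at
    ]
    if not latencies:
        raise ValueError("no message was both sent and received")
    return mean(latencies)


# Hypothetical timestamps (seconds) on the dedicated latency topic.
sent = {1: 10.000, 2: 10.500, 3: 11.000}
received = {1: 10.002, 2: 10.504}  # message 3 was not received
avg = average_latency_ms(sent, received)  # about 3 ms
```

Because the latency probes travel on their own topic, they can be separated from the load-generating traffic when the timestamps are collected.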

Message Payload
Using the MQTT protocol, all messages are transferred using a single telemetry parameter [9]. Bearing this in mind, we used a small payload size so as not to overload the server memory. We used a message payload size of 64 bytes for the entire testing.

Benchmarking Results
We separate our experimental results into three distinct segments for better interpretation and understanding. We took 3 samples for each QoS level in each segment, and the best result, with the maximum rate of message delivery and zero message drop, was considered for comparison. The three different categories are:

1. Projected message processing rates of non-scalable brokers at 100% process CPU usage; see Tables 6 and 7.
2. Projected message processing rates of scalable brokers at 100% system CPU usage; see Tables 8 and 9.
3. Latency comparison of all the brokers (both scalable and non-scalable brokers); see Tables 10 and 11.

Table 9. Projected message processing rates of scalable brokers at 100% system CPU usage (cloud test results). All the brokers listed in this table are scalable in nature and can use all cores available in the system.

Table 11. Latency comparison of all the brokers in the cloud evaluation environment (columns: brokers, average latency in ms).

Local Evaluation Results
In Table 6, we present a comparative performance analysis of non-scalable MQTT brokers. For non-scalable brokers such as Mosquitto and Bevywise MQTT Route, the projected message rate at 100% CPU usage (R_ns) can be calculated with Equation (1), which uses the average process CPU usage. The CPU usage of a process (process CPU usage) is the percentage of the CPU's cycles committed to that process while it is running. Average process CPU usage is the observed average CPU usage of the process during the experiment [40].
In this segment, Mosquitto 1.4.15 beats Bevywise MQTT Route 2.0 in terms of projected message processing rate at approximately 100% process CPU usage across all the QoS categories; see Figure 6.
In a multi-core or distributed environment, a scalable broker implementation scales up to use the maximum system resources available. Hence, the CPU usage data sum up the CPU usage of the process group consisting of all sub-processes/threads. The process group CPU usage for scalable brokers can reach up to 100 × n% (where n is the number of cores available in the system). In this test environment, n = 4, so the CPU usage of the deployed brokers could go up to 400%. This comparison gives a fair idea of how various brokers scale up and perform when they are deployed on a multi-core setup. For scalable brokers, Equation (2) calculates the projected message rate at unrestricted resource (CPU) usage (R_s), using the average system CPU usage. System CPU usage refers to how much the available processors in a system, whether real or virtual, are being used. Average system CPU usage is the observed average system CPU usage during the experiment [41].
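The projection to 100% CPU usage can be sketched as follows, under the assumption that it is a simple linear extrapolation of the measured message rate by the observed average CPU usage. The function names and the sample numbers are hypothetical and are not taken from the result tables; the conversion from process group usage to system usage (dividing by the number of cores) is likewise an assumption of this sketch.

```python
def projected_rate(peak_rate: float, avg_cpu_percent: float) -> float:
    """Linearly extrapolate a measured message rate to 100% CPU usage.

    Assumed form: rate * 100 / average CPU usage (in percent).
    """
    if not 0 < avg_cpu_percent <= 100:
        raise ValueError("average CPU usage must be in (0, 100]")
    return peak_rate * 100.0 / avg_cpu_percent


def system_cpu_percent(process_group_cpu_percent: float, cores: int) -> float:
    """Convert process group CPU usage (up to 100 x n %) to system CPU usage."""
    return process_group_cpu_percent / cores


# Non-scalable broker: 20,000 msg/s observed at 80% process CPU usage.
r_ns = projected_rate(peak_rate=20_000, avg_cpu_percent=80.0)

# Scalable broker on 4 cores: 30,000 msg/s at 300% process group CPU usage,
# which corresponds to 75% system CPU usage.
r_s = projected_rate(
    peak_rate=30_000,
    avg_cpu_percent=system_cpu_percent(300.0, cores=4),
)
```

Under these assumptions the two hypothetical brokers would be projected at 25,000 and 40,000 messages per second, respectively; a linear projection ignores any saturation effects that may appear close to full CPU usage.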
In terms of the projected message processing rate at 100% system CPU usage, EMQ X leads at QoS0, while at QoS1 and QoS2 ActiveMQ shows the best performance among all the brokers tested; see Figure 8.

Cloud-Based Evaluation Results
In this subsection, we discuss the performance of MQTT brokers in the Google Cloud test environment. Note that the stress testing of MQTT brokers in the cloud environment was done with the latest available versions of the brokers. Table 7 lists average latency and projected message processing rates of non-scalable brokers at 100% CPU usage. In terms of projected message processing rate and average latency, Mosquitto 2.0.7 beats Bevywise MQTT 3.1-build 0221-01; see Figures 9 and 10.

To summarize our evaluation experiments, we can state that ActiveMQ scales well and beats all other brokers' performance on our local testbed (using a 4 core/8GB machine) and cloud testbed (on an 8 vCPU/32GB machine). It is the best scalable broker implementation we have tested so far. EMQ X, VerneMQ, and HiveMQ CE also perform reasonably well in our test environment. On the other hand, if the hardware is resource-constrained (CPU/Memory/IO/Performance) or has a lower specification than the local testbed used in this experiment, then Mosquitto or Bevywise MQTT Route can be better choices than the scalable brokers. Another important observation is that when we moved from the local testing environment to the cloud testing environment, with a stronger hardware specification in terms of number of cores and memory, each of the brokers showed a significant improvement in latency.

Conclusions
M2M protocols are the foundation of Internet of Things communication. Many M2M communication protocols are available, such as MQTT, CoAP, AMQP, and HTTP. In this work, we reviewed and evaluated the performance of six MQTT brokers in terms of message processing rate at 100% process group CPU use, normalized message rate at unrestricted resource (CPU) usage, and average latency by putting the brokers under stress test.
Our results showed that broker implementations such as Mosquitto and Bevywise could not scale up automatically to make use of the available resources, yet they performed better than the scalable brokers in a resource-constrained environment. Mosquitto was the best performing broker in the first evaluation scenario, followed by Bevywise. However, in a distributed/multi-core environment, ActiveMQ performed the best. It scaled well, and showed better results than all other scalable brokers we tested. The findings of this research highlight the significance of the relationship between MQTT broker system design and its performance under stress testing. This work aims to fill the gap in test-driven information on the topic, and to help real-time system developers in building and deploying smart IoT solutions.
In the future, we would like to continue our evaluations in a more heterogeneous cloud deployment, and further study the scalability aspects of bridged MQTT broker implementations.

Author Contributions:
Conceptualization