1. Introduction
The rapid proliferation of the Internet of Things (IoT), driven by advancements in information and communication technology, has resulted in the interconnection of billions of devices, forming expansive networks. The IoT is employed in various fields, including healthcare, industry, and domestic applications, and its convergence with big data, cloud services, and artificial intelligence fosters novel business opportunities and value creation. IoT devices tasked with real-time data acquisition and processing are often geographically dispersed and deployed on a large scale, rendering physical access challenging. Consequently, firmware over-the-air (FOTA) technology has become indispensable for remote device monitoring, diagnostics, and efficient firmware distribution and patching, including timely responses to security vulnerabilities [1].
The expanding influence of the IoT across daily life and various industries is paralleled by a surge in security threats. Notably, man-in-the-middle (MITM) attacks targeting gateways, application/firmware suppliers, and manufacturers, which act as intermediaries between IoT devices and firmware providers, pose an increasing risk [2]. After intercepting a transmitted firmware file, an attacker can falsify it or inject malicious code, which can disrupt device operation or leak sensitive information. The FOTA mechanism is particularly susceptible to MITM attacks owing to its reliance on wireless communication in environments with limited physical control.
Various security enhancement technologies are being studied to address these attacks. End-to-end encryption (E2EE) is commonly employed as a security system in commercial networks. E2EE systems establish secure communication by encrypting data between the client and server, preventing eavesdropping by potential attackers [3]. Over-the-air rekeying (OTAR) is a wireless update technology that decentralizes equipment management. It enables remote key updates for communication security equipment, reducing the frequency of new key transmissions, mitigating key-related threats during transit, and accelerating key update procedures [4]. Furthermore, OTAR allows for isolating compromised nodes within the network, followed by remote key updates on the remaining nodes [4]. Encryption algorithms such as the advanced encryption standard (AES) [5], SEED [6], and the triple data encryption standard (3DES) [7], frequently utilized in OTAR, offer robust security. However, they often do not account for the resource constraints inherent in lightweight IoT devices. As a result, many lightweight cryptographic algorithms suitable for resource-constrained IoT environments have been developed. Two typical lightweight encryption algorithms illustrate the trade-offs. ChaCha20 is faster than AES on CPUs without hardware acceleration [8]. However, because ChaCha20 is a stream cipher, it depends on a nonce that must never be reused with the same key; nonce management is difficult in IoT environments, and a reused nonce compromises the confidentiality of the cipher. PRESENT-80 can be implemented with minimal energy per bit on hundreds of bytes of RAM/ROM, particularly if hardware accelerators are available [9]. However, an 80-bit key provides a comparatively small key space and is therefore more vulnerable to brute-force attacks.
Therefore, this study proposes a methodology for reducing firmware file size using server-side lossless compression techniques and countering MITM attacks through a lightweight and secure FOTA mechanism that combines dual-XOR operations with two keys and multichannel transmission.
The main contributions of this study can be summarized as follows:
Compression techniques within the lightweight FOTA update process are utilized to reduce latency and enhance security concurrently.
A lightweight encryption technique is proposed to mitigate MITM attacks.
A comparative evaluation of the proposed and conventional techniques for firmware files was conducted.
The remainder of this paper is organized as follows: Section 2 analyzes existing techniques designed to counter MITM attacks. Section 3 details the proposed technique. Section 4 describes the performance evaluation environment and analyzes the results. Finally, Section 5 presents the conclusions.
3. Proposed Mechanism
Figure 1 illustrates the operational workflow of the proposed FOTA mechanism, which comprises two primary phases: key table sharing and the FOTA update process. During the key table sharing phase, both the server and the IoT device receive the key table via broadcast communication through a router. The FOTA update phase encompasses a seven-step operational process through which the server transmits the firmware file to the IoT device. Step 1 entails the preparation of a firmware binary file. In Step 2, the binary file undergoes lossless compression using the DEFLATE algorithm. Firmware is composed mainly of binary files [16]; if any bits are lost during compression, transfer, or decompression, the original file cannot be restored correctly, so the compression must be lossless. The DEFLATE algorithm is a popular lossless compression algorithm used in practice, and DEFLATE-based libraries (e.g., QATzip) are open-source packages that can be easily integrated into various applications [17]. In addition, DEFLATE can be applied dynamically with lower compression levels in delay-critical channel environments, which makes it a good fit for future multichannel network environments [17]. Therefore, we used DEFLATE, a lossless algorithm that combines sliding-window compression (LZ77) with Huffman encoding, to compress redundant data. Step 3 involves the encryption of the compressed file via a dual-XOR operation. Equations (1)–(3) exemplify the dual-XOR operation method:
$$D = B_i \oplus K_{1,i} \tag{1}$$
$$E_i = D \oplus K_{2,i} \tag{2}$$
$$E = E_1 \,\Vert\, E_2 \,\Vert\, \cdots \,\Vert\, E_n \tag{3}$$
After dividing the firmware file into multiple blocks, one such block is denoted as $B_i$. The size of each block is the same as the key length. If the last block of the file is shorter than the key, the remaining space is filled by padding with the character 0. Each block $B_i$ undergoes successive XOR operations with $K_{1,i}$ and $K_{2,i}$, where each key differs from those used in preceding blocks. The XOR operation between block $B_i$ of the compressed firmware file and $K_{1,i}$ transforms $B_i$ into $D$. Subsequently, $E_i$, representing the final encrypted file block, is derived from the XOR operation between $D$ and $K_{2,i}$. Consequently, applying Equation (3) results in the entire firmware file being encrypted, as a total of $n$ firmware file blocks are encrypted by applying Equations (1) and (2) for each block. In this case, the keys for blocks $1$ through $n$ do not depend on each other consecutively; instead, each key is derived based on a seed value (private key) that is generated using a random number generator (RNG). Because XOR is a bitwise operation, a bit corrupted during transmission affects only the corresponding bit of the decrypted block and does not propagate to other blocks. In Step 4, the encrypted file and a private key are fragmented for transmission across multiple channels. The private key serves as a seed value for selecting $K_{1,i}$ and $K_{2,i}$ from the key table. The 6-GHz channel, which exhibits less interference and a higher data rate than the 5- and 2.4-GHz channels, is prioritized. Accordingly, $E$ and the private key are partitioned and transmitted over optimized channels. The encrypted file is transmitted via the 6-GHz channel, whereas the private key is split into Private_Key1 and Private_Key2, which are transmitted over the 5- and 2.4-GHz channels, respectively. During this process, the private key is encrypted using the AES-128 algorithm. Therefore, even if an attacker eavesdrops on a packet over the channel, the original private key cannot be obtained because the packet is encrypted. The use of multiple channels increases the complexity of the attack: an MITM attacker who wants to obtain the private key must successfully eavesdrop on both key channels to recover the complete key. In particular, the proposed FOTA mechanism uses multilink operation simultaneous transmit and receive (MLO-STR), a mode of operation in which two or more channels can be used simultaneously. When channels are symmetrically occupied, MLO-STR can achieve up to 90% lower latency than single-link operation (SLO) [18]. In Step 5, the IoT device receives Private_Key1 and Private_Key2 from the server, retrieves $K_{1,i}$ and $K_{2,i}$, and decrypts the encrypted file. The FOTA update concludes with file decompression in Steps 6 and 7. The proposed method enhances security by distributing the RNG-based seed value of the encryption key (private key) across multiple channels. Moreover, the dual-XOR operation increases the number of key combinations necessary for a brute-force attack. Furthermore, file size and encryption complexity are reduced through the combined use of a dual-XOR operation and the DEFLATE compression algorithm.
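The server-side and device-side steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names (`make_key_table`, `select_keys`, `split_seed`, and so on) are hypothetical, the key table is simply a list of random 128-bit keys, `random.Random` stands in for the RNG that maps the seed to table entries (it is not a cryptographic generator and may pick the same key twice), padding uses zero bytes, the AES-128 protection of the seed is omitted, and `zlib` produces a DEFLATE stream inside a zlib wrapper.

```python
import random
import secrets
import zlib

KEY_LEN = 16  # 128-bit keys, matching the key length used in the evaluation


def make_key_table(num_keys: int) -> list[bytes]:
    """Hypothetical key table: random 128-bit keys shared in advance."""
    return [secrets.token_bytes(KEY_LEN) for _ in range(num_keys)]


def select_keys(table: list[bytes], seed: int, n_blocks: int) -> list[tuple[bytes, bytes]]:
    """Pick two table keys per block, driven only by the seed (private key)."""
    rng = random.Random(seed)  # assumption: stand-in for the paper's RNG
    return [(rng.choice(table), rng.choice(table)) for _ in range(n_blocks)]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def dual_xor(data: bytes, keys: list[tuple[bytes, bytes]]) -> bytes:
    """XOR every block with its two keys and concatenate the results."""
    out = bytearray()
    for i, (k1, k2) in enumerate(keys):
        block = data[i * KEY_LEN:(i + 1) * KEY_LEN]
        intermediate = xor(block, k1)   # first XOR
        out += xor(intermediate, k2)    # second XOR
    return bytes(out)


def encrypt(firmware: bytes, table: list[bytes], seed: int) -> tuple[bytes, int]:
    """Steps 2-3: compress, pad the last block, then dual-XOR."""
    compressed = zlib.compress(firmware, 9)
    padded = compressed + b"\x00" * ((-len(compressed)) % KEY_LEN)
    keys = select_keys(table, seed, len(padded) // KEY_LEN)
    return dual_xor(padded, keys), len(compressed)


def decrypt(ciphertext: bytes, table: list[bytes], seed: int, compressed_len: int) -> bytes:
    """Steps 5-7: re-derive the keys, undo the XORs, strip padding, decompress."""
    keys = select_keys(table, seed, len(ciphertext) // KEY_LEN)
    padded = dual_xor(ciphertext, keys)  # XOR is its own inverse
    return zlib.decompress(padded[:compressed_len])


def split_seed(seed_bytes: bytes) -> tuple[bytes, bytes]:
    """Step 4: split the seed into two halves, one per key channel."""
    half = len(seed_bytes) // 2
    return seed_bytes[:half], seed_bytes[half:]


def join_seed(part1: bytes, part2: bytes) -> int:
    """Device side: reassemble the seed from both channel fragments."""
    return int.from_bytes(part1 + part2, "big")


if __name__ == "__main__":
    table = make_key_table(1024)
    key1, key2 = split_seed(secrets.token_bytes(16))
    seed = join_seed(key1, key2)
    firmware = b"example firmware image " * 4096
    ciphertext, clen = encrypt(firmware, table, seed)
    print(decrypt(ciphertext, table, seed, clen) == firmware)
```

Because the same seed and key table reproduce the same key sequence on both sides, only the seed fragments and the ciphertext need to cross the network in this sketch.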
4. Evaluation Results and Analysis
This study evaluated the latency, accuracy, and security of the proposed model and compared the results with those of the conventional models.
Table 2 presents each model’s encryption method and compression type. The conventional model (AES-Non-Compression) [19] encrypts the firmware file using AES-128 encryption without compression. The conventional model (AES-Compression) [20] employs both firmware file compression and AES-128 encryption. The proposed model (Dual-XOR-Compression) compresses the firmware file and subsequently encrypts it using a dual-XOR operation. This study employed five commercially available firmware images of varying sizes: Vueguera VN-850NHD navigation firmware (1.2 MB) [21], Hyundai Phoneus A300 black box firmware (27.6 MB) [22], SKYREX SKY-3004F v0.8.0c CCTV firmware (2.4 MB) [23], ipTIME 14.28.8 router firmware (42.2 MB) [24], and Hyundai TNR UNIQ500 V5.0.6A black box firmware (27.6 MB) [25]. In a real-world scenario with channel noise, data or private keys may need to be retransmitted multiple times, resulting in increased delay. However, our performance evaluations for all models, namely, AES-Non-Compression, AES-Compression, and Dual-XOR-Compression, were carried out under the assumption of a noise-free network environment. The experimental code was written in Python 3.12 and run on a laptop with a 13th Gen Intel(R) processor and 32.0 GB of RAM running Windows 11 Home. The AES implementation is based on open-source Python code [26] and the crypto module provided by Python 3.12.
Figure 2, Figure 3, Figure 4 and Figure 5 were generated using the output data obtained from our implementation, and the graphs were plotted using matplotlib in Python.
The metrics defined in Equations (4)–(6) were used to assess the security of the proposed model. Here, $N_{key}$ denotes the number of keys within the key table, that is, the key table size divided by the key length, and $N_{block}$ represents the total number of blocks in the firmware file, that is, the firmware file size divided by the key length. Both $N_{key}$ and $N_{block}$ are defined with the key length as the denominator because each encrypted block has the same size as a key. The “number of seed patterns” refers to the number of potential seed combinations generated by the current key table. When an attacker sits on the transmission channel between the server and the IoT device, the value that can actually be eavesdropped is the seed value (the private key), which is the basis for extracting the keys from the key table. Accordingly, the higher the number of seed patterns, the harder it is for an attacker to obtain the private key through a brute-force attack.
Figure 2 presents the security evaluation results of the proposed model calculated using Equation (6). A Firmware_file_size of 15,885 kb, equivalent to that of a commercially available firmware file, was used. This analysis defined the key length as 128 bits, in line with current NIST cryptographic recommendations, which propose increasing the minimum required security strength from 112 bits to 128 bits [27]. As the number of keys in the key table increases with the key table size, the number of seed patterns in the proposed model correspondingly increases. This increase expands the range of possible key combinations, thereby increasing the complexity of attacks that attempt to deduce the seed value through random key extraction from the key table. Because the attacker does not know the private key, the attacker must brute-force the keys used in each block to obtain the original data. This increases the complexity of the attack and helps preserve the confidentiality of the original data.
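The two counts that feed the security metric, the number of keys in the key table and the number of firmware blocks, are both a size divided by the key length. The short sketch below works through that arithmetic. The function name `units_of_key_length` is hypothetical, the firmware size assumes that "15,885 kb" means 15,885 × 1024 bytes, and the 5 MB key table is an illustrative value rather than a configuration reported here. The seed-pattern count of Equation (6) is not reproduced.

```python
KEY_LENGTH_BITS = 128


def units_of_key_length(size_bytes: int, key_length_bits: int = KEY_LENGTH_BITS) -> int:
    """Number of key-sized units in size_bytes, rounding up for a padded last unit."""
    key_length_bytes = key_length_bits // 8
    return -(-size_bytes // key_length_bytes)  # ceiling division


# Assumption: 15,885 kb is read as 15,885 * 1024 bytes.
firmware_size_bytes = 15_885 * 1024
# Assumption: an illustrative 5 MB key table.
key_table_size_bytes = 5 * 1024 * 1024

n_blocks = units_of_key_length(firmware_size_bytes)
n_keys = units_of_key_length(key_table_size_bytes)

print(n_blocks)  # 1016640 blocks of 16 bytes
print(n_keys)    # 327680 keys of 16 bytes
```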
Table 3 presents the number of candidate keys an attacker must try to determine the encryption key by brute force without knowing the key table. The conventional (AES-128) model has a complexity of $2^{128}$ because it encrypts the entire file with one 128-bit key. By contrast, the proposed (Dual-XOR) model has a higher complexity because it encrypts each block with a different 128-bit key. Consequently, an attacker must make more brute-force attempts against the proposed (Dual-XOR) model than against the conventional (AES-128) model to obtain the key, which increases attack complexity.
Figure 3 illustrates the latency under constrained memory usage during a brute-force attack on both the conventional model (AES-Compression) and the proposed model (Dual-XOR-Compression), conducted over 1000 iterations. A random 128-bit string was encrypted using a 128-bit key, and multiple random keys were tested to determine decryption feasibility. The proposed model (Dual-XOR-Compression) exhibited an approximately 3.4-fold lower latency than the conventional model (AES-Compression). However, the attack recovered the encryption key for neither model within the imposed 250-MB memory constraint. This indicates that both the conventional model (AES-Compression) and the proposed model (Dual-XOR-Compression) resist brute-force attacks carried out with limited memory. The proposed model (Dual-XOR-Compression) is less secure than the conventional model (AES-Compression) when the attacker's resources and time are unlimited, but it is more lightweight while maintaining security when memory and time are limited.
Figure 4 presents the latency evaluation results for the conventional (AES-Non-Compression and AES-Compression) and proposed models as a function of firmware file size. Latency was measured as the duration from firmware file compression and encryption to decryption and decompression. Each model was tested 1000 times, and the average was computed. For all models, latency increased with increasing file size. However, the proposed model demonstrated reduced latency regardless of firmware file size, averaging 0.71% and 0.72% of that observed for the AES-Non-Compression and AES-Compression models, respectively.
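The latency definition above (compression and encryption through decryption and decompression, averaged over repeated runs) can be measured along the following lines. This is a hedged sketch rather than the benchmark code behind Figure 4: `xor_with_key`, `round_trip`, and `mean_latency_ms` are hypothetical names, and a single repeating key per XOR pass stands in for the per-block keys drawn from the key table.

```python
import os
import time
import zlib


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (placeholder for per-block table keys)."""
    if not data:
        return b""
    stream = (key * (len(data) // len(key) + 1))[:len(data)]
    mixed = int.from_bytes(data, "big") ^ int.from_bytes(stream, "big")
    return mixed.to_bytes(len(data), "big")


def round_trip(firmware: bytes, key1: bytes, key2: bytes) -> bytes:
    """Compress, dual-XOR encrypt, dual-XOR decrypt, decompress."""
    compressed = zlib.compress(firmware, 9)
    encrypted = xor_with_key(xor_with_key(compressed, key1), key2)
    decrypted = xor_with_key(xor_with_key(encrypted, key2), key1)
    return zlib.decompress(decrypted)


def mean_latency_ms(firmware: bytes, key1: bytes, key2: bytes, runs: int = 1000) -> float:
    """Average wall-clock time of one full round trip, in milliseconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        restored = round_trip(firmware, key1, key2)
        total += time.perf_counter() - start
        assert restored == firmware  # correctness check, outside the timed region
    return 1000.0 * total / runs


if __name__ == "__main__":
    image = b"example firmware image " * 50_000  # stand-in for a firmware file
    print(mean_latency_ms(image, os.urandom(16), os.urandom(16), runs=10))
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.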
Table 4 and Figure 5 illustrate the interrelationships among entropy, compression ratio, and latency categorized by dividing the entropy values of the firmware file into upper, middle, and lower ranges.
Table 4 was obtained based on Equations (7) and (8) using three selected firmware files from the previously mentioned commercial firmware samples. Entropy, in the context of a firmware file, quantifies the randomness of the data and represents the average information content per symbol given a specific probability distribution, as expressed in Equation (7):
$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) \tag{7}$$
where $p(x_i)$ represents the probability of symbol $x_i$ being generated, and $n$ denotes the number of possible symbols within the data. Equation (8) defines the lower and upper bounds of entropy [28]:
$$0 \le H(X) \le \log_2 n \tag{8}$$
The lower bound condition occurs when only one symbol has a probability of occurrence of 1, while all other symbols have a probability of 0. This implies that all data can be compressed into a single symbol as the file’s entropy approaches 0. Conversely, the upper-bound condition is met when all symbols have equal probabilities of occurrence ($p(x_i) = 1/n$). In this scenario, uncertainty is maximized, all symbols are equally represented within the file, and further compression becomes challenging. This study set the upper entropy limit to 8, assuming data representation in bytes, where 1 byte can assume 256 distinct values. Additionally, the entropy range was partitioned into three intervals: lower (0–2.67), middle (2.68–5.33), and upper (5.34–8).
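As an illustration of the byte-level entropy described above, the following sketch computes the Shannon entropy of a byte string and assigns it to one of the three ranges. The function names `shannon_entropy` and `entropy_range` are hypothetical, and the thresholds are taken as one third and two thirds of the 8-bit maximum, which is an assumption consistent with the stated interval boundaries.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Average information per byte, in bits (0 for constant data, 8 for uniform bytes)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


def entropy_range(entropy: float) -> str:
    """Map an entropy value in [0, 8] to the lower, middle, or upper range."""
    if entropy <= 8 / 3:       # about 2.67
        return "lower"
    if entropy <= 16 / 3:      # about 5.33
        return "middle"
    return "upper"


print(shannon_entropy(b"\x00" * 1024))         # 0.0: a single symbol
print(shannon_entropy(bytes(range(256)) * 4))  # 8.0: all 256 byte values equally likely
```

The two printed cases correspond to the lower and upper bounds: one symbol with probability 1, and 256 symbols each with probability 1/256.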
A compression algorithm reduces data size by identifying repetitive patterns or regularities. Consequently, higher entropy values, indicating greater randomness, hinder pattern identification and result in lower compression ratios. This, in turn, increases the amount of data to be processed, leading to increased latency. As shown in Table 4, the compression ratio was 93.16% at an entropy value of 1.42, decreasing to 64.28% at 5.13 and further to 0.62% at 8.0. As shown in Figure 5, the conventional model exhibited latencies of 36,710 ms, 110,430 ms, and 616,800 ms at entropies of 1.42, 5.13, and 8.0, respectively. In contrast, the proposed model significantly reduced these latencies to 5.98 ms, 33.27 ms, and 117.6 ms, respectively. The average latency of the proposed model was only 0.72% of that of the conventional model, showing better performance regardless of firmware entropy.
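The dependence of compression ratio on entropy can be reproduced qualitatively with the standard-library `zlib` module. In this sketch the compression ratio is assumed to mean the percentage of space saved, (1 − compressed size / original size) × 100; that definition and the function name `compression_ratio` are assumptions, chosen because they are consistent with high values for low-entropy data and near-zero or slightly negative values for high-entropy data.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Percentage of space saved by zlib (DEFLATE) at the given level."""
    if not data:
        return 0.0
    compressed = zlib.compress(data, level)
    return (1.0 - len(compressed) / len(data)) * 100.0


low_entropy = b"\x00" * 100_000     # one repeated symbol
high_entropy = os.urandom(100_000)  # close to 8 bits of entropy per byte

print(f"low entropy:  {compression_ratio(low_entropy):.2f}%")
print(f"high entropy: {compression_ratio(high_entropy):.2f}%")
```

For random input the ratio can fall slightly below zero, because the compressed stream carries header and block overhead without finding any redundancy to remove.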
Figure 6 presents the accuracy results for both lossy compression (LC) and lossless compression (LZ) scenarios, comparing the conventional and proposed models. Assuming an ideal environment without channel noise, the AES-Non-Compression model involved only encryption and decryption processes. However, the AES-Compression and proposed models incorporated both encryption/decryption and lossless (de)compression processes. The JPEG algorithm was employed for LC, with the compression quality parameter set to 75, which is considered an optimal value [29, 30]. The DEFLATE algorithm was used for LZ, with the compression level parameter set to 9, representing maximum compression.
Accuracy was measured using byte comparison and SHA-256 hash value comparison methods. The byte comparison method directly evaluates the file size and byte-level correspondence, as expressed in Equations (9) and (10). A single original block is denoted as $O_i$ and a decompressed block as $R_i$. If $O_i = R_i$, the comparison returns 1; otherwise, it returns 0. The error rate is calculated by comparing the original and decompressed data in 16-byte blocks and counting the number of mismatched blocks. The SHA-256 hash function is valuable for verifying data integrity owing to the avalanche effect, where a minor input change produces a significantly different output [31]. Hash value matches are represented as binary values: 0 for mismatch and 1 for match.
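Both accuracy checks can be expressed compactly in Python. The sketch below is an illustration under assumptions: the function names `block_agreement` and `sha256_match` are hypothetical, agreement is reported as the percentage of matching 16-byte blocks, and a length mismatch is handled by counting the blocks that exist in only one file as mismatches.

```python
import hashlib

BLOCK_SIZE = 16  # bytes per compared block


def block_agreement(original: bytes, restored: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Percentage of block positions whose contents are identical in both inputs."""
    longest = max(len(original), len(restored))
    if longest == 0:
        return 100.0
    offsets = range(0, longest, block_size)
    matches = sum(
        original[i:i + block_size] == restored[i:i + block_size]
        for i in offsets
    )
    return 100.0 * matches / len(offsets)


def sha256_match(original: bytes, restored: bytes) -> int:
    """1 if the SHA-256 digests are equal, 0 otherwise."""
    return int(hashlib.sha256(original).digest() == hashlib.sha256(restored).digest())


original = bytes(64)            # four all-zero blocks
corrupted = bytearray(original)
corrupted[0] ^= 0x01            # flip one bit in the first block

print(block_agreement(original, bytes(corrupted)))  # 75.0: three of four blocks match
print(sha256_match(original, bytes(corrupted)))     # 0: digests differ
print(sha256_match(original, original))             # 1: digests are equal
```

The block comparison localizes damage to individual blocks, whereas the hash comparison gives a single pass/fail verdict for the whole file.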
The experimental results indicate that, under LC, the conventional model exhibits byte agreement rates of 0.85, 0.72, 4.5, and 1.25%, whereas the proposed model shows rates of 0.72, 0.90, 3.9, and 1.25%. Hash values for both models converge to 0, indicating data mismatch. Conversely, under LZ, both models achieve 100% byte agreement, and hash values converge to 1. For a 2.4 MB file, LC achieved a compression ratio of 79.82%, whereas LZ achieved 0.63%. For a 10.5 MB file, the LC ratio was 79.89%, and the LZ ratio was −0.03%. The compression ratios for the 19.5- and 42.2-MB files were 80.20 and 51.65% for LC and 80.01 and 3.03% for LZ, respectively. Both the conventional and proposed models demonstrated lower compression ratios under LZ but higher accuracy. Therefore, the proposed model maintains the accuracy of the conventional model, reduces latency, and enhances security.
Table 5 presents the power consumption results for the conventional (AES-Non-Compression), conventional (AES-Compression), and proposed (Dual-XOR-Compression) models. The base power of the Intel Core™ i7-1360P processor is defined as 28 W [32]. Accordingly, we define power consumption, expressed as energy in joules (J), as the product of the execution time of each model and the base power of 28 W. The proposed model consumed approximately 261 times less power than the conventional model (AES-Non-Compression) and 179 times less power than the conventional model (AES-Compression). These results demonstrate that the proposed Dual-XOR-Compression model achieves improved energy efficiency compared to the conventional AES-128 models. This enhancement highlights its suitability for deployment in lightweight and energy-constrained IoT environments.
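The power consumption metric above is the execution time multiplied by a fixed base power, so it can be computed directly from measured latencies. The sketch below shows the arithmetic; the function names are hypothetical, and the execution times in the example are made-up placeholders rather than values from Table 5.

```python
BASE_POWER_W = 28.0  # processor base power used as the constant power draw


def energy_joules(execution_time_s: float, power_w: float = BASE_POWER_W) -> float:
    """Energy in joules: constant power (W) multiplied by execution time (s)."""
    return power_w * execution_time_s


def times_less(reference_time_s: float, candidate_time_s: float) -> float:
    """How many times less energy the candidate uses than the reference."""
    return energy_joules(reference_time_s) / energy_joules(candidate_time_s)


# Hypothetical execution times, for illustration only.
print(energy_joules(2.0))     # 56.0 J for a 2-second run
print(times_less(10.0, 0.5))  # 20.0: the shorter run uses 20 times less energy
```

Because the base power is constant, the energy ratio between two models equals the ratio of their execution times.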
Table 6 presents the memory usage results for the conventional (AES-Non-Compression), conventional (AES-Compression), and proposed (Dual-XOR-Compression) models. To evaluate memory usage, the psutil library in Python was employed to measure the memory consumption of both the conventional and proposed models during their entire execution process. Measurements were taken at 0.1 ms intervals, and memory usage was accumulated throughout the execution. This procedure was repeated for 100 iterations, and the average memory usage across all iterations was calculated. The proposed model reduced memory usage by approximately 23.8 MB compared to the conventional AES model without compression and by 8.4 MB compared to the AES model with compression. Given this lower memory usage, the proposed model is more suitable for resource-constrained environments than the conventional models.
5. Conclusions
The increasing utilization of IoT devices across diverse industrial sectors, coupled with their wireless interconnection, necessitates robust security measures. MITM attacks in wireless networks present a substantial threat, including within IoT environments. Therefore, a secure FOTA (S-FOTA) mechanism capable of mitigating MITM attacks is urgently required. Conventional encryption algorithms are often unsuitable for the resource-constrained, lightweight environments characteristic of many IoT devices. Consequently, this study proposed a FOTA update method designed to counter MITM attacks within such constrained environments. The proposed method minimizes file size and encryption overhead by employing LZ of firmware files and subsequent encryption via dual-XOR operations. As a result, the proposed method reduced latency across file sizes to an average of approximately 0.71% of that of conventional methods, while latency across entropy levels was reduced to approximately 0.72%. The proposed model consumed approximately 261 times less power than the conventional AES-128 model without compression and 179 times less than the model with compression. In addition, memory usage was reduced by approximately 23.8 MB and 8.4 MB compared to the AES-Non-Compression and AES-Compression models, respectively. Furthermore, security was enhanced by distributing the RNG-based key table seed value (private key) across multiple transmission channels. When an attacker possesses knowledge of the key table and attempts a brute-force attack, each 5 MB increase in the key table size leads to approximately exponential growth in attack complexity. When the attacker does not consider the key table and instead attempts a brute-force attack, the proposed model requires more attempts than the conventional model to obtain the correct key because a different 128-bit key is used for each block.
In a practical brute-force attack attempt, a random 128-bit string was encrypted using a 128-bit key, and multiple random keys were tested. The encryption key was recovered for neither AES-128 nor dual-XOR within the imposed 250 MB memory constraint. This result indicates that both AES-128 and dual-XOR are resistant to brute-force attacks under limited memory conditions. The FOTA mechanism demonstrated 100% accuracy, equivalent to that of conventional models, while maintaining security against brute-force attacks. Additionally, it simplifies the encryption process, thereby reducing latency, energy consumption, and memory usage. Therefore, the proposed method is optimized for secure and lightweight FOTA updates in resource-constrained environments. This work presents a multichannel transmission mechanism that performs a novel random seed-based XOR operation using a key table for FOTA updates while maintaining compatibility with existing technologies through a combination of well-known methods such as XOR, DEFLATE, and random seeds. Furthermore, it is not limited to the DEFLATE algorithm; therefore, it can support other lightweight, lossless algorithms better suited for lightweight IoT environments.