Securing IoT Data Using Steganography: A Practical Implementation Approach

Abstract: Adding network connectivity to any “thing” can certainly provide great value, but it also brings along potential cybersecurity risks. To fully benefit from the capabilities of an Internet of Things (IoT) system, the validity and accuracy of transmitted data should be ensured. Due to the constrained environment of IoT devices, practical security implementation presents a great challenge. In this paper, we present a noise-resilient, low-overhead, lightweight steganography solution adequate for use in the IoT environment. The accuracy of hidden data is tested against corruption using multiple modulation and coding schemes (MCSs). Additive white Gaussian noise (AWGN) is added to the modulated data to simulate the noisy channel as well as several wireless technologies such as cellular, WiFi, and vehicular communications that are used between communicating IoT devices. The presented scheme is capable of hiding a high payload in audio signals (e.g., speech and music) with a low bit error rate (BER), high undetectability, low complexity, and low perceptibility. The proposed algorithm is evaluated using well-established performance evaluation techniques and has been demonstrated to be a practical candidate for the mass deployment of IoT devices.


Introduction
Over recent years, many definitions of the Internet of Things (IoT) have been presented. They generally refer to trends in integrating digital capabilities, including network connectivity, with physical devices and systems. The author of [1] views IoT as "a world of interconnected things that are capable of sensing, actuating, and communicating among themselves and with the environment (i.e., smart things or smart objects). In addition, IoT provides the ability to share information and autonomously respond to real/physical world events by triggering processes and creating services with or without direct human intervention". In contrast, Ref. [2] defines IoT as "a network that connects uniquely identifiable 'things' to the Internet. The 'things' have sensing/actuation and potential programmability capabilities. Through the exploitation of unique identification and sensing, information about the 'things' can be collected, and the state of the 'thing' can be changed from anywhere, any time, by anything". Similarly, in Ref. [3], IoT is defined as "the network of physical objects or 'things' embedded with electronics, software, sensors, and connectivity to enable objects to exchange data with the manufacturer, operator and/or other connected devices. The Internet of Things (IoT) refers to devices that are often constrained in communication and computation capabilities, now becoming more commonly connected to the Internet, and to various services that are built on top of the capabilities these devices jointly provide". IoT networks comprise billions of connected devices exchanging an exponentially growing global volume of data. As these devices handle sensitive data, ensuring their protection should be paramount. Steganography and cryptography techniques can be used to address cybersecurity concerns in IoT networks. 
In steganography, the interest is in concealing the existence of a message from a third party, while in cryptography, the purpose is to make a message unreadable to a third party. Additionally, the main objective of steganographic systems is to provide a secure, undetectable, and imperceptible way to conceal a high rate of data in a digital medium. It is used under the assumption that it will not be detected if no one is attempting to uncover it. Steganography techniques manipulate the characteristics of digital media files and use them as carriers (covers) to hide secret information (payload). Covers can be images [4,5], audio [6][7][8][9][10], videos [11] and text [12,13]. Protocols [14,15] and storage devices [16] can also be used as carriers for hidden data. The payload is hidden in any type of digital cover using a key to secure the data and produce a stego file. Figure 1 illustrates a brief comparison between these two techniques. Steganography applications are not limited to data protection. They have also been used maliciously by criminals, hackers, terrorists, and spies [17,18]. To defeat malicious interventions when communicating secretly, steganalysis techniques have been actively researched to counter steganography algorithms [19,20]. Steganalysis aims essentially to detect the existence of the payload and does not necessarily consider its successful extraction. Steganalysis techniques have a dual role: (1) they are regarded as attacks to break steganographic algorithms and (2) they are used to measure the strength of steganographic algorithms. Steganalysis attempts to detect or destroy the payload using audio/image processing and statistical analysis approaches. The work of a steganalyst can be very challenging, especially when the only information available is the stego file (blind steganalysis). Most of the steganalysis techniques presented lately are based on learning to differentiate between the cover and stego audio signals.
The learning process is performed by machine learning, for example, by using a support vector machine (SVM) [21], on a dataset fed with a set of features extracted from the cover and stego signals. A decision is then made on whether the tested signal is a cover or a stego file (Figure 2). Well-selected features will strengthen the discriminatory power between the cover and stego audio signals. The features are intended to capture the differences between the cover and stego signals due to data embedding. In blind steganalysis, the only available information is the received signal. In this case, a reference signal is created to provide an estimate of the cover signal.
The reference signal could be created by applying a de-noising method to the input signal or by applying a second steganographic embedding using steganographic tools [22][23][24][25]. Existing data protection solutions such as steganography and cryptography are ill-suited to direct adoption in IoT due to their computational complexity, application specificity, and inflexibility. Furthermore, the IoT environment is characterized by limited device capability, high data rate traffic, massive scalability and a large spectrum of heterogeneous devices [26]. IoT cybersecurity challenges, considering the aforementioned limitations, can be addressed using lightweight audio steganography algorithms. Image and video steganography requires a high data rate, which in turn increases the transmission energy, processing time, and storage requirements. Audio signals (e.g., speech and music), in this case, are better candidates for accommodating the constrained resources of IoT devices. In addition, the number of voice-enabled IoT devices is expanding across industries, and voice has become a standard feature in connected devices such as mobiles, tablets, sensors, wearable devices, and smart speakers. The shift from touch to voice is a need that is highlighted by the current pandemic to improve safety. In health care, there are hospitals in the US that allow parents to gain access to high-quality clinical information and specific treatment protocols on Alexa-enabled devices. Manufacturing plants, smart agriculture, construction sites and production lines also require hands-free mobility that IoT voice recognition systems can provide. These voice recordings are transmitted over untrusted public networks and then stored and processed on untrusted third-party cloud-based infrastructures. It is important to protect the information fed to or received by the voice-enabled IoT device, such as authentication data and personal and private information.
While many research proposals have been presented in the steganography literature, audio steganography schemes addressing cybersecurity issues in IoT devices are still limited in number, with modest contributions. In this paper, we propose an audio steganography solution designed for IoT implementation through a scalable, noise-resilient, and lightweight algorithm that can accommodate mass deployment in an IoT environment. The remaining part of this paper is organized as follows. The related work is discussed in Section 2 and then our methodology for secure IoT communication is presented in Section 3. The experimental setup and performance evaluation results are given in Section 4 and we conclude the paper and state some future directions in Section 6.

Related Work
Several steganography techniques have been developed recently. Some of them adopt data hiding in the time domain [27,28], frequency domain (e.g., Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT)) [6,29], or encoded domain (e.g., AMR, G.723.1, G.729) [30,31]. Most of these techniques are based on least significant bit (LSB) encoding. In its naive implementation, the LSB technique could be used as a simplified demonstration of the data hiding process using steganography. Figure 3 shows how we generated a stego signal by replacing the first LSB layer of the cover signal with the digital payload "2018". LSB allows the embedding of a high payload capacity with low perceptibility. However, since the embedding is applied in the LSB plane of the cover signal, the secret message could be easily removed by an unauthorized user. To improve the undetectability of secret data, Ref. [32] proposed a framework based on Generative Adversarial Networks (GANs) to implement optimal embedding for audio steganography in the temporal domain, whereas the authors of [33] proposed a generalized joint adaptive intra-frame and adaptive inter-frame steganography method (called AHCM) within compressed audio streaming and implemented an AdaMP3Stego algorithm in MP3 audio based on the psychoacoustic model. To ensure hidden data robustness against LSB removal and re-sampling attacks, Ref. [34] embedded secret audio into the cover audio by separating the processing of the amplitudes and signs of the secret audio. Ref. [35] combined scrambling and steganography to provide high security for speech hiding in speech. Similarly, Ref. [36] proposed hiding data under the hearing mask in high-frequency samples and [6] proposed a perturbation minimizing algorithm by finding an optimal embedding path and an optimal modification strategy. Discrete wavelet transform and sparse decomposition are used in [37].
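The naive LSB replacement described above can be sketched in a few lines. This is a minimal illustration only, assuming 16-bit PCM samples and an ASCII encoding of the payload "2018"; the helper names (`text_to_bits`, `lsb_embed`, `lsb_extract`) are hypothetical and do not come from the paper.

```python
import numpy as np


def text_to_bits(text):
    """ASCII bytes of `text` as a flat array of 0/1 values (MSB first)."""
    return np.unpackbits(np.frombuffer(text.encode("ascii"), dtype=np.uint8))


def bits_to_text(bits):
    """Inverse of text_to_bits."""
    return np.packbits(bits).tobytes().decode("ascii")


def lsb_embed(cover, bits):
    """Replace the first LSB layer of the first len(bits) samples."""
    if len(bits) > len(cover):
        raise ValueError("payload does not fit in the cover signal")
    stego = cover.copy()
    n = len(bits)
    # Clear bit 0 of each sample, then write the payload bit into it.
    stego[:n] = (stego[:n] & ~1) | bits.astype(cover.dtype)
    return stego


def lsb_extract(stego, n_bits):
    """Read back the first LSB layer of the first n_bits samples."""
    return (stego[:n_bits] & 1).astype(np.uint8)


# Assumed toy cover: 64 random 16-bit samples standing in for audio.
rng = np.random.default_rng(0)
cover = rng.integers(-2000, 2000, size=64, dtype=np.int16)
payload_bits = text_to_bits("2018")
stego = lsb_embed(cover, payload_bits)
recovered = bits_to_text(lsb_extract(stego, len(payload_bits)))
```

Each sample changes by at most one quantization step, which is why the distortion is hard to perceive, and also why simply rewriting the LSB plane is enough for a third party to erase the message.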
There are also several audio steganography tools that are freely available over the Internet, e.g., Xiao Steganography [22], Steghide [23], S-Tools [24] and Camouflage [25]. In audio steganalysis, a second-order difference of pitch delay features was used in [38]. The authors of [39] modeled the deviation between the reference and the input signals by reversible Mel-cepstrum coefficients (R-MFCC), while the authors of [40] presented a steganalysis technique specifically designed for the steganographic method developed by [41]. A more generalized multi-layer architecture for the steganalysis of all MP3 encoders has been presented by [19]. The first layer of the architecture detects the encoder and the second layer performs the steganalysis. To distinguish between the cover and the stego signal, the authors of [20] used the entropy and the energy of the signal as the discriminator features. The authors claim that this combination enhances the performance of the classifier. A steganalysis method for quantization index modulation (QIM) steganography in low-bit-rate encoded speech streams such as G.723.1 and G.729 is presented by [42]. The authors used the correlation characteristics of split vector quantization (VQ) codewords of linear predictive coding filter coefficients, arguing that it leads to a stronger correlation network. They showed that this technique improves the steganalysis results for low-bit-rate encoded speech streams.
Although audio steganography techniques for cyber-physical systems (CPSs)/IoT are absent from the state of the art, a few CPS/IoT image-based steganography attempts have been made. Ding et al. [43] used mobile edge computing to implement image steganography in IoT, assuming that mobile edge computing fulfills the high computing power, data storage capacity, and bandwidth requirements of the IoT through edge servers that process data close to data sources or users. To efficiently protect external product packing in IoT against counterfeiting, Pu et al. [44] used fractional-order spatial steganography (FSS) and blind steganalysis. To promote secure data transfer in a smart IoT environment, the security scheme in [45] combines lightweight cryptography with variable least significant bit substitution steganography to conform to the intrinsic constraints of IoT devices. Elhoseny et al. [46] proposed a hybrid security model for securing the diagnostic text data in medical images. Covington et al. [47] discuss the special characteristics, challenges, and peculiarities of ensuring security in IoT systems. Among these peculiarities are the distributed deployment infrastructure, the interoperability and heterogeneity of devices, and the large traffic volume of IoT elements. The authors state that these specific aspects of IoT play a major role in the increased probability of security attacks in IoT when compared to other systems in a controlled environment with well-defined security policies and tools. Owing to the intrinsic challenges imposed by the IoT characteristics, several proposals are presented in the current literature.
Researchers in [48,49] have proposed schemes not specifically designed for IoT, but rather for securing data in devices constrained in processing power and memory size, such as mobile phones and embedded devices, using simple cryptography and steganography algorithms of low computational complexity. The work presented in [50] attempts to design a security framework for IoT. This framework consists of two algorithms: AES and image steganography. The authors addressed IoT security by proposing a two-tier security scheme. However, they appropriated a classical image steganography algorithm without any adaptation to meet the IoT-specific challenges. Similarly to [50], the authors of [46] proposed an integrated model between a steganography technique and a hybrid encryption scheme to ensure the security of medical data transmission. However, their solution accommodates IoT security better only at the cost of the heavy processing required by the presented scheme.
The proposals above are not explicitly designed to resolve cybersecurity issues in IoT but are instead designed to accommodate devices with low processing power, memory size, or battery constraints in traditional networks. In particular, none of these techniques have addressed the issue of embedded data distortion during transmission due to the noisy transmission environment. Additionally, these techniques are image-based, leaving IoT audio steganography unexplored. Audio signals contain a high level of redundancy, allowing a high payload capacity with minimum disturbance. The number of voice-enabled IoT devices is increasing rapidly, as this feature has become a standard in connected devices such as mobiles, tablets, sensors, wearable devices, and smart speakers. In contrast to images and video, audio files require lower data rates and less energy for transmission, resulting in a shorter processing time, lower energy consumption, and fewer storage requirements. This facilitates the accommodation of constrained IoT devices. This work proposes an IoT audio steganography scheme, recognizing the importance of audio in the current digital society and the application trends that depend heavily on cheap, low-power, and intelligent signal processing algorithms and systems such as autonomous vehicles, health care, Intelligent Transportation Systems (ITS), smart grid, environment, and smart cities. In addition, IoT cybersecurity includes new requirements and challenges that must first be addressed before the deployment of these devices:

1. Device capabilities: We adopted the Fast Fourier Transform (FFT), whose complexity is O(N log N) [51], used a small FFT size, and hid data through bit replacement, resulting in minimal overhead. In addition, data are hidden in the phase components of the audio signal, resulting in a better signal-to-noise ratio [52] and reduced re-transmission probability. Thereby, the proposed scheme addresses the processing limitations of IoT devices, suggesting it is a better candidate for the mass implementation of IoT devices.
2. Interoperability: There is a myriad of nodes in IoT networks (laptops, sensors, RFID, etc.). The proposed security techniques should be capable of accommodating the full spectrum of heterogeneous devices without impeding their functionality.
3. Scalability: The presented solution must be scalable since IoT networks include a large number of devices.
4. Data traffic rate: The scheme is designed to achieve a high payload capacity. Even if some IoT devices require low data rates, the collective volume of data sent by the large number of these devices sums up to a massive amount.
We also extend our work [26] by investigating the survival of hidden data transmitted over noisy channels such as WiFi, Bluetooth, Radio Frequency Identification (RFID), and other IoT communication technologies. To cover several digital communication techniques used in IoT networks, we consider multiple modulation and coding schemes (MCSs) such as quadrature phase-shift keying (QPSK), binary phase-shift keying (BPSK), and quadrature amplitude modulation (16-QAM or 64-QAM), and the noisy environment is modeled as additive white Gaussian noise (AWGN), thus facilitating practical implementation in IoT networks. We further evaluate the proposed algorithm by measuring the bit error rate, throughput performance, time complexity, and statistical undetectability rate using state-of-the-art steganalysis methods.

Methodology
The proposed IoT steganographic scheme hides data (payload) in a cover signal ($s_c$) to generate a stego signal ($s_s$), which is then sent over the IoT network. For audio cover signals such as speech and music, low-frequency components are typically much stronger than high-frequency components. They exhibit better signal-to-noise ratios (SNRs) and have higher energy, making them more tolerant of data embedding. To further attenuate the noise induced by hidden data and enhance the stego signal quality, we use the signal spectrum properties whereby noise values 13 dB less than the original signal spectrum (frequency mask) are inaudible for all frequencies [53]. Hence, data embedding is prioritized at lower frequencies, higher energy bins, and at least 13 dB under the frequency mask. To satisfy steganography system criteria in terms of the payload (hiding capacity), statistical undetectability, stego signal quality, and security of embedded data against intentional removal attempts, we used a multi-key combination described in Figure 4. The detailed rationale is explained in the following:
- Payload: the hiding frequency band limit (FBL) is a key set to define where data will be hidden in each frequency frame. FBL is delimited by $F_{HDmin}$ and $F_{HDmax}$, representing, respectively, the lower and higher frequency bin values used for data hiding. Similar to the sliding window concept, FBL could be increased or decreased to control the embedding location and the payload capacity. Additionally, we used variable least significant bit (SLB) as a second key to increase the payload capacity. SLB represents the lower limit of the embedding locations at each frequency frame.
- Stego signal quality: to increase the stego signal quality, we used only high-energy magnitude frequency bins for data hiding. For this, we defined a threshold value that acts as a third key and sets the minimum energy required for a frequency bin to be elected for data embedding.
A high threshold value is expected to maintain good stego signal quality. Yet, there exists a trade-off between high threshold values and the resulting payload capacity and stego signal quality. To ensure a good quality of the stego-audio signal, we also defined a distortion level ($\Delta$) as an additional key to model the upper limit of the embedding areas. The noise added by the embedded data can be modeled as the difference between the stego ($s_s$) and cover ($s_c$) signals, $\Delta = s_s - s_c$; $\Delta$ can also be represented as a scaling of the cover signal by a factor $\alpha$, where $\Delta = \alpha \cdot s_c$. As an example, if we set the frequency mask ($\rho$) to 13 dB, the expected SNR value between the cover and the stego signals is $\mathrm{SNR} = 20 \log_{10}(|s_c| / |\Delta|) = -20 \log_{10}(\alpha) = \rho = 13$ dB. Hence, the value of $\rho$ could be increased to enhance the SNR value. The $\Delta$ value, on the other hand, has a dual effect on the stego signal: (1) it preserves the statistical nature of the modified signal by shaping the noise created by the hidden data into an audio-like spectrum and (2) keeping the embedded data below $\Delta$ preserves the stego signal quality. In addition to securing the embedded data locations and preserving the quality of the stego signal, a high payload and the undetectability of the hidden data must be achieved. Once the embedding locations are defined in the magnitude spectrum using the aforementioned keys, we created their replicas in the phase spectrum to hide our secret data. This is motivated by the following reasons:
- Only a few bits in each selected frequency component are modified, which results in a smooth transition while preserving the phase continuity. This means a minimum overhead with low perceptibility.
- When phase coding is used, it gives a better signal-to-noise ratio [52], thereby reducing the probability of re-transmission, which in turn makes the proposed scheme a better candidate for the mass implementation of IoT devices.
- Phase coding is robust to common linear signal manipulation such as amplification, attenuation, filtering, and re-sampling [52], which provides better quality with no additional complexity cost.
- It allows opportunities to increase the hiding capacity to accommodate a large number of devices and higher IoT traffic.
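To make the role of the keys more concrete, the following sketch converts a mask level ρ into the scaling factor α of the noise model ∆ = α · s_c and selects candidate embedding bins from FBL and the threshold. It is a hedged illustration rather than the authors' implementation: the function names are hypothetical, and measuring the threshold in dB relative to the strongest bin of the frame is an assumption, since the text does not state the dB reference.

```python
import numpy as np


def alpha_from_rho(rho_db):
    """Scaling factor alpha such that noise = alpha * cover lies rho_db
    below the cover, i.e. 20*log10(1/alpha) = rho_db."""
    return 10.0 ** (-rho_db / 20.0)


def candidate_bins(frame, f_min, f_max, threshold_db):
    """Indices k with f_min <= k <= f_max (the FBL key) whose magnitude,
    in dB relative to the frame's strongest bin (assumed reference),
    is at least threshold_db (the threshold key)."""
    magnitude = np.abs(np.fft.rfft(frame))
    level_db = 20.0 * np.log10(magnitude / magnitude.max() + 1e-12)
    k = np.arange(len(magnitude))
    return k[(k >= f_min) & (k <= f_max) & (level_db >= threshold_db)]


alpha = alpha_from_rho(13.0)
snr_db = 20.0 * np.log10(1.0 / alpha)

# Assumed toy frame: 64 samples (4 ms at 16 kHz) holding a pure tone in bin 5.
n = np.arange(64)
frame = np.cos(2.0 * np.pi * 5.0 * n / 64.0)
bins = candidate_bins(frame, f_min=1, f_max=28, threshold_db=-20.0)
```

For ρ = 13 dB the factor α is about 0.224, and in the toy frame only bin 5 carries enough energy to qualify, which illustrates how a high threshold trades payload capacity for signal quality.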

Hiding Algorithm
The proposed IoT steganographic scheme generates a stego signal which is sent over the IoT network. To accommodate several wireless technologies used in IoT networks (e.g., cellular, WiFi and vehicular communications), the stego signal is modulated using orthogonal frequency division multiplexing (OFDM), and one of the multiple modulation and coding schemes (MCSs), such as QPSK, BPSK, 16-QAM or 64-QAM, is chosen to obtain the maximum bandwidth while considering the capability of the transmitting IoT device. The stego signal is then sent over an additive white Gaussian noise (AWGN) channel to model the noisy environment. At the receiver, the signal is demodulated and the embedded data are extracted.
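The transmission chain described above can be illustrated with a stripped-down QPSK/OFDM link over AWGN. The sketch below makes several simplifying assumptions that are not taken from the paper: 64 subcarriers, Gray-coded QPSK, no cyclic prefix, no channel coding, and SNR defined as the ratio of average signal power to noise power per sample. All function names are hypothetical.

```python
import numpy as np


def qpsk_map(bits):
    """Map bit pairs to Gray-coded QPSK symbols with unit average power."""
    pairs = np.asarray(bits).reshape(-1, 2)
    return ((1 - 2 * pairs[:, 0]) + 1j * (1 - 2 * pairs[:, 1])) / np.sqrt(2)


def qpsk_demap(symbols):
    """Hard decision: the sign of each component gives one bit."""
    bits = np.empty((len(symbols), 2), dtype=np.uint8)
    bits[:, 0] = symbols.real < 0
    bits[:, 1] = symbols.imag < 0
    return bits.ravel()


def ofdm_awgn_link(bits, snr_db, rng, n_subcarriers=64):
    """Send bits through QPSK/OFDM over an AWGN channel and return the
    received bits. len(bits) must be a multiple of 2 * n_subcarriers."""
    symbols = qpsk_map(bits).reshape(-1, n_subcarriers)
    # Unitary IFFT keeps the average power per sample equal to 1.
    tx = np.fft.ifft(symbols, axis=1, norm="ortho")
    noise_power = 10.0 ** (-snr_db / 10.0)
    noise = np.sqrt(noise_power / 2.0) * (
        rng.standard_normal(tx.shape) + 1j * rng.standard_normal(tx.shape)
    )
    rx = np.fft.fft(tx + noise, axis=1, norm="ortho")
    return qpsk_demap(rx.ravel())


rng = np.random.default_rng(1)
sent = rng.integers(0, 2, size=1280)
received = ofdm_awgn_link(sent, snr_db=20.0, rng=rng)
n_errors = int(np.sum(sent != received))
```

At 20 dB the decision margin is roughly ten noise standard deviations, so bit errors are not expected in this short run; lowering `snr_db` shows how channel noise alone starts to corrupt the payload.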
To generate the stego signal, we divided the cover signal into $M$ frames of 4 ms, each containing $N$ samples: $s_c(m, n)$, $1 \le m \le M$ and $1 \le n \le N$. The FFT was applied to each frame to obtain its frequency-domain representation, $S_c(m, k) = \mathrm{FFT}(s_c(m, n))$, and the magnitude spectrum $|S_c(m, k)|$ was isolated. The hiding band range is specified using $F_{HDmin} \le k \le F_{HDmax}$, where $F_{HDmin}$ and $F_{HDmax}$ are the minimum and the maximum of FBL (e.g., for a sampling frequency of 16 kHz, $F_{HDmin}$ and $F_{HDmax}$ are 1 and 28, respectively). The threshold value was set as the minimum energy required for a magnitude frequency component to be selected for data hiding. The distortion level of the magnitude spectrum, $\Delta(m, k)$, was set at $\rho$ dB below the magnitude spectrum, where $\rho$ is the amount of attenuation from the original spectrum and is approximately 13 dB.
The embedding process in the phase and the generation of the stego channel using QPSK/OFDM modulation are described in Algorithm 1. The embedding in a given phase component is defined as follows: $|\varphi_c(m, k)| = a_n 2^n + a_{n-1} 2^{n-1} + a_{n-2} 2^{n-2} + \dots + a_2 2^2 + a_1 2^1 + a_0 2^0$, where $a_n \in \{0, 1\}$ represents the bit value of the cover phase at a given LSB position, $n \in \{8, 16\}$ depending on the quantization value of the audio signal, and $\delta(m, k)$ is the binary representation of the payload that will be injected into a given phase component. The value of the stego phase is recalculated as $|\varphi_s(m, k)| = a_n 2^n + a_{n-1} 2^{n-1} + d_i 2^i + \dots + d_1 2^1 + d_0 2^0 + a_1 2^1 + a_0 2^0$, where the $d_i$ are the bits of $\delta(m, k)$. Finally, the new phase is combined with its magnitude to produce the stego spectrum, $S_s(m, k) = |S_c(m, k)| e^{j \varphi_s(m, k)}$. The inverse FFT (iFFT) is applied to the segment to obtain the new stego signal segment $s_s(m, n)$. The principal parts of our audio hiding technique are represented in Figure 5, where the embedding area blocks illustrate steps 3 to 6 in Algorithm 1.
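The bit replacement inside one quantized phase component can be illustrated on a plain integer. The sketch below is an assumption-laden simplification: the phase is taken to be already quantized to a 16-bit word, the payload bits occupy a contiguous field starting at `start_bit`, and both function names are hypothetical rather than taken from Algorithm 1.

```python
def embed_in_word(phase_word, payload_bits, start_bit):
    """Overwrite len(payload_bits) bits of phase_word, starting at bit
    position start_bit (0 = least significant), with the payload bits
    given most significant first."""
    width = len(payload_bits)
    payload = 0
    for bit in payload_bits:
        payload = (payload << 1) | bit
    mask = ((1 << width) - 1) << start_bit
    # Clear the target field, then write the payload into it.
    return (phase_word & ~mask) | (payload << start_bit)


def extract_from_word(phase_word, width, start_bit):
    """Read back `width` bits starting at start_bit, most significant first."""
    value = (phase_word >> start_bit) & ((1 << width) - 1)
    return [(value >> i) & 1 for i in reversed(range(width))]


cover_word = 0b1011_0110_1100_1011   # assumed 16-bit quantized phase value
stego_word = embed_in_word(cover_word, [1, 0, 1], start_bit=2)
recovered = extract_from_word(stego_word, width=3, start_bit=2)
```

Here the three payload bits replace bits 2 to 4 of the word, turning 0xB6CB into 0xB6D7, while the two lowest bits and all higher bits of the cover phase are left untouched.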

Data Extraction Algorithm
At the receiver side, we first demodulated the stego channel and then extracted the embedded data from the phase components of the stego signal. To achieve this, we first located the embedding locations in the magnitude spectrum $|S_s(m, k)|$ using the embedding keys: Threshold, $\Delta(m, k)$, SLB and FBL. Then, we mapped the embedding locations to the phase spectrum. Secret data segments were extracted and then reassembled as shown in Algorithm 2, which ends with the following steps:
    if $\Delta(m, k)_{dB} \ge SLB_{dB}$ then
        extract $\delta(m, k)$ from $|\varphi_s(m, k)|$
    return $\delta(m, k)$
Data extraction from the phase spectrum is shown by the block diagram presented in Figure 6.
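A compact round trip through embedding and extraction on a single frame is sketched below. It is not Algorithm 1 or Algorithm 2: the ∆ and SLB keys are reduced to a fixed bit field (`width`, `start_bit`), the threshold is measured relative to the frame peak, the phase is quantized to 16 bits, and the stego frame stays in floating point (re-quantization to PCM and channel noise are not modeled). All names are hypothetical. What the sketch does show is the property the extraction relies on: because only phases are modified, the magnitude spectrum is unchanged, so the receiver recomputes the same embedding locations from the stego frame alone.

```python
import numpy as np

N_BITS = 16                # assumed width of the quantized phase word
LEVELS = 1 << N_BITS


def quantize_phase(phase):
    """Map angles (radians) to integers in [0, LEVELS), wrapping at 2*pi."""
    q = np.rint(np.mod(phase, 2 * np.pi) / (2 * np.pi) * LEVELS)
    return q.astype(np.int64) % LEVELS


def select_bins(magnitude, f_min, f_max, threshold_db):
    """Bins in [f_min, f_max] at least threshold_db relative to the peak."""
    level_db = 20 * np.log10(magnitude / magnitude.max() + 1e-12)
    k = np.arange(len(magnitude))
    return k[(k >= f_min) & (k <= f_max) & (level_db >= threshold_db)]


def embed_frame(frame, bits, f_min=1, f_max=28, threshold_db=-20.0,
                width=2, start_bit=4):
    """Hide bits in the phases of the selected bins; magnitudes are kept."""
    spectrum = np.fft.rfft(frame)
    magnitude, phase = np.abs(spectrum), np.angle(spectrum)
    mask = ((1 << width) - 1) << start_bit
    used = 0
    for k in select_bins(magnitude, f_min, f_max, threshold_db):
        if used + width > len(bits):
            break
        value = 0
        for b in bits[used:used + width]:
            value = (value << 1) | int(b)
        word = (int(quantize_phase(phase[k])) & ~mask) | (value << start_bit)
        phase[k] = word * 2 * np.pi / LEVELS
        used += width
    stego = np.fft.irfft(magnitude * np.exp(1j * phase), n=len(frame))
    return stego, used


def extract_frame(stego, n_bits, f_min=1, f_max=28, threshold_db=-20.0,
                  width=2, start_bit=4):
    """Recompute the embedding bins from the stego magnitudes, read the bits."""
    spectrum = np.fft.rfft(stego)
    words = quantize_phase(np.angle(spectrum))
    out = []
    for k in select_bins(np.abs(spectrum), f_min, f_max, threshold_db):
        if len(out) >= n_bits:
            break
        value = (int(words[k]) >> start_bit) & ((1 << width) - 1)
        out.extend((value >> i) & 1 for i in reversed(range(width)))
    return out[:n_bits]


rng = np.random.default_rng(0)
frame = rng.standard_normal(64)      # stand-in for a 4 ms frame at 16 kHz
payload = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
stego, used = embed_frame(frame, payload)
recovered = extract_frame(stego, used)
```

Using the real-input FFT (`rfft`/`irfft`) keeps the stego frame real-valued without having to mirror the phase changes onto the negative frequencies by hand.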

Scenarios
Four scenarios were designed to analyze the adequacy of the proposed algorithm for the IoT environment and to evaluate its performance by conducting a comparative study between the stego channel and the cover signal.

1. Perceptual undetectability against payload capacity: In this scenario, the perceptual similarity between the original cover and the stego channel is evaluated against the payload capacity using the PESQ test and SegSNR. We aimed to maximize the payload capacity (Kbps) while maintaining a good stego signal quality to determine the scheme's capability to send a high payload capacity. The PESQ is used to measure the quality of the stego signal. The output of the test is a number in [1, 4.5]. An output value of 4.5 indicates that the stego signal is the same as the cover signal. A value of 1 indicates the severest degradation. Achieving a high payload capacity while maintaining perceptual undetectability confirms that the algorithm is capable of accommodating a high data traffic rate summed up from the mass of IoT device transmissions.
2. Resilience to channel noise: In this scenario, we measured the bit error rate (BER) in the stego channel due to noise induced by the injected payload and the channel. A lower BER corresponds to a lower retransmission probability and therefore to the scheme's capability to send a high traffic load.
3. Scheme application dependencies: We propose a new performance measure to evaluate the average embedding ratio (AER) against different types of applications (e.g., speech, music, and video). An analysis of the AER values will allow us to show that our algorithm is not application-specific and can accommodate heterogeneous and high traffic.
4. Statistical undetectability and time complexity: In this scenario, we measure the performance of our algorithm against steganalysis attacks to determine how well the system can distinguish between stego and cover signals. The accuracy of our predictions is measured by F-measure and the receiver operating characteristic (ROC). In this scenario, the time complexity of our scheme is also measured.

Performance Metrics
The following performance evaluation metrics were adopted to evaluate the proposed algorithm:
• Perceptual evaluation of speech quality (PESQ) measure, defined in ITU-T P.862.2 [54].
• Segmental signal-to-noise ratio (SegSNR) in dB (Equation (5)).
• Bit error rate (BER): the number of bits retrieved after demodulating the stego channel that have been altered by noise, divided by the total number of bits in the transmitted payload, expressed as a percentage.
• Average embedding ratio (AER): the ratio of the embedded data to the cover signal size, defined as
$$\text{Embedding rate (\%)} = \frac{\text{total embedded bits}}{\text{total bits of the signal}} \cdot 100 \qquad (1)$$
• F-measure and the area under the ROC curve: two of the most popular computational methods to find a balance between false positives and false negatives. The F-measure is calculated using precision ($P$) and recall ($R$) as $F = \frac{2 \cdot P \cdot R}{P + R}$.
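The count-based metrics above reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the F-measure is taken to be the standard balanced form (harmonic mean of precision and recall), and the numbers in the example are made up to exercise the formulas, not results from the paper.

```python
def average_embedding_ratio(embedded_bits, total_bits):
    """AER (%) = total embedded bits / total bits of the signal * 100."""
    return 100.0 * embedded_bits / total_bits


def bit_error_rate(sent_bits, received_bits):
    """BER (%) = altered bits / total transmitted payload bits * 100."""
    errors = sum(s != r for s, r in zip(sent_bits, received_bits))
    return 100.0 * errors / len(sent_bits)


def f_measure(true_pos, false_pos, false_neg):
    """Balanced F-measure from detection counts (stego = positive class)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2.0 * precision * recall / (precision + recall)


# Made-up example values.
aer = average_embedding_ratio(embedded_bits=32, total_bits=128)
ber = bit_error_rate([0, 1, 1, 0], [0, 1, 0, 0])
f1 = f_measure(true_pos=40, false_pos=10, false_neg=10)
```

For steganalysis, an F-measure close to 0.5 on a balanced test set means the classifier does little better than guessing, which is the outcome a steganographic scheme aims for.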

Scenario 1: Perceptual Undetectability and Payload Capacity
In this scenario, the effectiveness of the proposed algorithm is evaluated on signal frames sampled at 64. We set the algorithm keys' values to maximize the hiding capacity while maintaining the speech quality, i.e., threshold = −20 dB, $\rho$ = 15 dB, and $F_{HDmin}$ and $F_{HDmax}$ set to 1 and 28 for a 4 ms frame length. The distortion between the stego and cover signals is calculated and averaged over several frames. The SegSNR value for one modified speech frame of 4 ms is given by Equation (5), where the summation is performed over the signal on a frame basis. To evaluate the results, the following criteria were used. First, the capability of embedding a larger quantity of secret data (Kbps) was sought while the naturalness of the stego signal was retained. Second, the hidden data had to be fully recovered from the stego channel after being sent over the wireless channel. Figure 7 shows that the SegSNR values range from 42 to 48 dB and the PESQ values from 4.38 to 4.41, while registering a payload of up to 24 Kbps. These results indicate a high payload with very good quality of the stego channel regardless of the LSB layer depth.
Further analysis of Figure 7 indicates that the quality of the stego channels is maintained even when the embedding rate is maximized. Hence, to achieve the higher secret traffic rate required in IoT applications while maintaining the stego channel quality, hiding at the first LSB layer should be adopted, provided that the payload survives channel noise, as we discuss in the next section.
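For readers who want to reproduce a SegSNR figure, the sketch below uses the common textbook definition: the per-frame ratio of cover energy to distortion energy, in dB, averaged over frames. Whether this matches Equation (5) term by term is an assumption, and the function name and test signal are hypothetical.

```python
import numpy as np


def seg_snr_db(cover, stego, frame_len):
    """Mean over frames of 10*log10(sum(cover^2) / sum((cover - stego)^2)).
    Frames with zero cover energy or zero distortion are skipped."""
    n_frames = len(cover) // frame_len
    c = np.asarray(cover[:n_frames * frame_len], dtype=float)
    s = np.asarray(stego[:n_frames * frame_len], dtype=float)
    c = c.reshape(n_frames, frame_len)
    s = s.reshape(n_frames, frame_len)
    signal = np.sum(c ** 2, axis=1)
    noise = np.sum((c - s) ** 2, axis=1)
    valid = (signal > 0) & (noise > 0)
    return float(np.mean(10.0 * np.log10(signal[valid] / noise[valid])))


# Made-up signals: a constant cover and a stego offset by 1% of its amplitude.
cover = np.ones(128)
stego = cover + 0.01
value = seg_snr_db(cover, stego, frame_len=64)
```

In this toy case the distortion is 1% of the signal amplitude in every frame, so the result is 40 dB; the 42 to 48 dB range reported for Figure 7 corresponds to even smaller relative distortion.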

Scenario 2: Resilience to Noise
In this scenario, we designed two experiments to assess the robustness of the proposed algorithm for implementation in IoT. The first experiment computes the bit error rate (BER) on our stego channel in the AWGN channel and the second experiment measures the dependency of the expected throughput of the stego channel on the signal-to-noise ratio (SNR). In each experiment, all modulations and coding schemes such as BPSK, QPSK, 16 − QAM, and 64 − QAM are tested. Our results show that the proposed algorithm while using 64 − QAM and 16 − QAM modulations achieves a better average throughput performance than other modulation schemes (Figure 8a), i.e., at SNR = 10 dB, we registered almost 1 Mbps using 64 − QAM modulation against 0.5 Mbps in QPSK modulation. In all modulations schemes, throughput values increase as SNR increases. A further analysis of the behavior of the proposed algorithm using all modulation and coding schemes in a noisy environment (Figure 8b) shows that the BER level proportionally increases when the noise level alleviates in all tested modulation schemes. In addition, Figure 9 demonstrates that the algorithm is resilient to noise even at low SNR and is almost equivalent to the performance of the channel without engaging the stego channel. There is, however, an inherited percentage of uncontrollable erroneous bits due to channel noise, which is not related to our algorithm. Figure 9a-d show that the proposed algorithm did not induce additional BER, since the BER rate is almost the same with and without engaging our algorithm. The BER shown in the above figures is due to the channel condition which is not the focus of this paper. In BPSK modulation, Figure 9a, for instance, at SNR = −10 dB, we registered 0.3274% of BER while using the channel without engaging data hiding versus 0.3298% in the presence of the stego channel. The BER values overlap, in the channel with and without the proposed steganographic algorithm, starting from SNR = −5.4 dB. 
This behavior was observed for all modulation schemes. There will be, however, a trade-off between the maximum throughput and the channel noise level for successful payload retrieval. This trade-off does not impact the quality of the retrieved hidden data if the latter are a signal or an image, since the human visual and auditory systems tolerate a certain level of degradation. However, a text or numeric payload, while using 16-QAM or 64-QAM with SNR = −10 dB and throughput around 0.25 Mbps (Figure 8a), will survive at a relatively higher AWGN power, i.e., 6 dB and up. Overall, our algorithm shows resilience to noise across SNR values, from low to high. Therefore, the BER analysis demonstrates the scheme's capability to reduce the IoT retransmission probability and, as a result, to increase the traffic load that can be carried.
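As a point of reference for the channel-only BER baseline discussed above, the following is a minimal sketch of uncoded BPSK over an AWGN channel. It is not the stego channel or the simulation setup used in the experiments; the function names are hypothetical, and it assumes the SNR is expressed as Eb/N0 with unit-energy symbols. It compares the closed-form BER, 0.5·erfc(√(Eb/N0)), with a small Monte Carlo estimate.

```python
import math
import random


def bpsk_ber_theory(ebn0_db):
    """Closed-form BER of uncoded BPSK in AWGN: Q(sqrt(2*Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)  # convert dB to linear scale
    return 0.5 * math.erfc(math.sqrt(ebn0))


def bpsk_ber_simulated(ebn0_db, n_bits=200_000, seed=0):
    """Monte Carlo BER estimate (assumption: Eb = 1, noise variance N0/2)."""
    rng = random.Random(seed)
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    sigma = math.sqrt(1.0 / (2.0 * ebn0))  # noise standard deviation
    errors = 0
    for _ in range(n_bits):
        bit = rng.getrandbits(1)
        symbol = 1.0 if bit else -1.0               # BPSK mapping
        received = symbol + rng.gauss(0.0, sigma)   # AWGN channel
        decided = 1 if received > 0.0 else 0        # hard decision
        errors += decided != bit
    return errors / n_bits


if __name__ == "__main__":
    for snr_db in (-10, -5, 0, 5):
        print(snr_db, bpsk_ber_theory(snr_db), bpsk_ber_simulated(snr_db))
```

Under these assumptions, the closed form gives a bit error fraction of roughly 0.327 at −10 dB, and the value falls as the SNR rises. A baseline of this kind is what a with-versus-without-stego comparison would be plotted against.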

Scenario 3: Average Embedding Ratio (AER)
We propose another performance measure to evaluate the average embedding performance of the algorithm across different types of applications (speech, music, and video) and its adaptability to high traffic load. The AER results presented in Figure 10 are almost the same for all tested signals. We registered embedding ratios of 31.96%, 35%, and 36.2% in speech, music, and video, respectively. These results indicate that our algorithm is not application-specific and can accommodate heterogeneous traffic, which makes it suitable for IoT. In addition, the high AER values show that the proposed algorithm is a good candidate for IoT, since it can accommodate high traffic and enable a practical IoT implementation.

Scenario 4: Time Complexity and Statistical Undetectability Rate
In this experiment, we assessed the proposed IoT audio steganography scheme in terms of time complexity and undetectability rate. Our datasets consist of 500 training and 500 testing audio samples (speech and music) sampled at 44.1 kHz and quantized at 16 bits. The duration of each signal is 10 s. All covers were embedded with random messages using the proposed IoT steganography algorithm. We employed the SVM library tool [21] with radial basis function (RBF) kernels [55]. We used two successful audio steganalysis methods: the first is 2D-Mel (second-order derivative-based Mel-cepstrum) [56], which is an efficient method, and the second is EE-AS (Energy-Entropy Audio Steganalysis) [20], which has been shown to increase the steganography detection rate. The accuracy of our predictions was measured by the F-measure and the receiver operating characteristic (ROC).
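For readers unfamiliar with the F-measure, a minimal sketch of how it can be computed from a steganalyzer's decisions is given here. The function name and the example counts are hypothetical and are not taken from our experiments; the balanced (F1) form is assumed.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true-positive, false-positive and
    false-negative counts; returns 0.0 when it is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: stego files flagged correctly (tp),
    # covers flagged wrongly (fp), stego files missed (fn).
    print(f_measure(tp=300, fp=200, fn=200))
```

A score near 0.5 on a balanced test set means the steganalyzer does little better than guessing, which is the desirable outcome from the steganographer's point of view.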
In the absence of a practical implementation of an IoT audio steganography technique, we compared our method with the well-known audio steganography tools Steghide and S-tools in terms of undetectability. In Table 1, we record the overall accuracy, where higher scores indicate a higher detection rate. We registered 59.3% for our method compared to 73.2% for Steghide and 81.7% for S-tools (average score between EE-AS and 2D-Mel). These results and the ROC curves in Figure 11 demonstrate that we achieved a higher level of undetectability. Our scheme also has low time complexity since it is based on the FFT. For N samples, the FFT has a time complexity of O(N log N) [51]. In addition, the computational time needed is small, considering that we only use audio frames of 4 ms, which require a small FFT size.
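To make the frame-size argument concrete, the sketch below works out the per-frame FFT size and an N log2 N operation estimate. It assumes the 44.1 kHz sampling rate of the test dataset and a frame rounded up to the next power of two; the actual FFT size used in the implementation is not stated here, and the function name is hypothetical.

```python
import math


def fft_frame_cost(sample_rate_hz, frame_ms):
    """Return (samples per frame, next power-of-two FFT size,
    N*log2(N) operation estimate) for one audio frame."""
    samples = int(round(sample_rate_hz * frame_ms / 1000.0))
    fft_size = 1 << (samples - 1).bit_length()  # round up to a power of two
    ops = fft_size * int(math.log2(fft_size))   # order-of-magnitude estimate
    return samples, fft_size, ops


if __name__ == "__main__":
    samples, fft_size, ops = fft_frame_cost(44100, 4)
    frames = int(10_000 / 4)  # number of 4 ms frames in a 10 s signal
    print(samples, fft_size, ops, frames)
```

Under these assumptions, a 4 ms frame holds 176 samples and fits a 256-point FFT, i.e., on the order of 2048 basic operations per frame, with 2500 frames in a 10 s signal.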

Conclusions
In this paper, we designed an audio steganography scheme to address the security issues of IoT, taking into account the collective characteristics of IoT devices. The algorithm is lightweight, noise-resilient, and provides a high payload, which makes it suitable for deployment in IoT systems. The proposed scheme is based on locating areas to hide data in the high frequencies of the audio phase, leading to an increased payload capacity, reduced cover signal disturbance, and preserved naturalness of the stego file. The scheme is resilient to noise, its payload capacity is independent of the signal type, and it has low computational complexity, making it appropriate for capability-constrained IoT devices. We utilized audio signals as covers to expand the IoT steganography application range, currently limited to images, and to allow data protection within the growing number of voice-enabled devices. The performance results show that we achieved a low detectability rate of 59.3% and a high payload capacity of 24 kbps at 32 dB SNR. By varying the LSB depths, the SegSNR and PESQ scores gradually increased from 42 to 48 dB and from 4.38 to 4.41, respectively. The simulation and implementation results also indicate that our algorithm (in the presence of the stego channel) is resilient to noise and that its performance is very close to that of the channel with BPSK, QPSK, 16-QAM, and 64-QAM modulation schemes in the absence of the stego channel. In all, the proposed scheme can meet IoT steganography requirements and challenges by providing data protection, surviving IoT communication channel noise, and accommodating a large number of devices and the IoT's large traffic volume.