Article
Peer-Review Record

Secured Audio Framework Based on Chaotic-Steganography Algorithm for Internet of Things Systems

Computers 2025, 14(6), 207; https://doi.org/10.3390/computers14060207
by Mai Helmy 1 and Hanaa Torkey 2,3,*
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 10 April 2025 / Revised: 13 May 2025 / Accepted: 21 May 2025 / Published: 26 May 2025
(This article belongs to the Special Issue Edge and Fog Computing for Internet of Things Systems (2nd Edition))

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Dear authors,

Please find the remarks I noticed.

The most important is the section "Related works". It must be rewritten by the authors, not AI.

Literature positions [7], [8], [9], [10] do not exist. This proves that at least part of the article was created by generative AI. Although the use of generative AI is not forbidden by MDPI (refer to ethics), authors are required to declare its use and check the generated text. The authors of this manuscript didn't check if the sources exist and if the generated text is true, and didn't declare AI use in the acknowledgement.

Line 40, There should be OFDM rather than IoT.
Line 60, Authors cite two articles which are about data processing due to GDPR regulations, not about wireless networking, as suggested at the beginning of the paragraph.
Line 77, Authors cite articles, which are on cryptographic algorithms (RC6 and image encryption), and have nothing on the topic of the paragraph (limitations of constrained IoT Edge devices)
Line 80, unwanted "and" at the end of the line.
Line 200, literature knows chaotic maps as "baker", not "backer" maps.
Line 204, the beginning of the sentence sounds erroneous: "Chaotic Maps Chaos theory..."
Line 212, Backer
Line 218, one of the articles cited [22] is about the modulation techniques and has nothing in common with baker maps and permutations, as mentioned in the sentence
Line 220, it should be described what x and y are in the equations (1) and (2)
Line 238, Describe symbols used in equations
Line 241, Explain what IV is
Line 244, should be "in Figure 4"
Line 247, Describe symbols used in equations.
Line 360, The description of figure 9 is unclear. It's written that horizontal lines for histograms start with zero, but in figure 9 they start with -1. There are two types of pictures in figure 9, and they should be clearly explained with a description of both axes and the contents of the images. Every part of the image should have the individual designator and description. It's not explained what the colours (purple, yellow) of the vertical bars mean.
Line 373, figure 9 should be reorganised, because it's unclear which histogram shows results of which configuration.
Line 404, It refers to grayscale images, not audio files
Line 480, The table caption is unfinished
Line 504, The table caption is unfinished

Position [30] is available with a different title. There is "images" instead of "audios" in the title.
Position [33] description is continued in [34]; they should be merged. Additionally, there are again "audios" instead of "images".

I also had trouble understanding the configuration of the system for simulations. I think a short introductory subchapter would be OK.

The paper requires careful editing work, including:

- Some texts in tables partially disappear.

- Subchapters' numbering in chapter 8 is incorrect. There are different font styles used, and there are two 8.4 sections.

Comments on the Quality of English Language

The paper should be carefully read for English language and formatting.

Author Response

The most important is the section "Related works". It must be rewritten by the authors, not AI.

Response:  We appreciate the reviewer’s emphasis on the importance of the "Related Works" section and the concern regarding authorship and originality. In response, we have thoroughly revised Section 2 to ensure that it reflects a critical and original synthesis written solely by the authors.

The updated section now:

  • Clearly contextualizes prior studies in lightweight cryptography, chaotic systems, and steganography in relation to our proposed hybrid framework.
  • Highlights specific limitations in existing works and how our method addresses these gaps, particularly regarding robustness under wireless impairments in IoT systems.
  • Ensures all references are interpreted, compared, and integrated in the authors’ own words, maintaining coherence with the paper’s overall contributions.

We have taken care to ensure that the writing is original and directly supports the novelty and motivation of our research. We thank the reviewer again for their insight, which has helped us enhance the scholarly integrity and depth of the manuscript.

*************************************************************************************************

Literature positions [7], [8], [9], [10] do not exist. This proves that at least part of the article was created by generative AI. Although the use of generative AI is not forbidden by MDPI (refer to ethics), authors are required to declare its use and check the generated text. The authors of this manuscript didn't check if the sources exist and if the generated text is true, and didn't declare AI use in the acknowledgement.

Response:  We sincerely thank the reviewer for bringing this critical issue to our attention. After a careful review, we acknowledge that references [7], [8], [9], and [10] were erroneously included and do not correspond to verifiable, published sources. This was an oversight on our part during the initial manuscript preparation. We take full responsibility for not verifying the validity of these citations and have now removed and replaced them with accurate, peer-reviewed sources that are directly relevant to the discussion.

Regarding the use of generative AI:
We confirm that parts of the manuscript draft, including preliminary summaries in the related works section, were assisted by generative AI tools for language refinement. However, all technical content, methodology, simulations, and analysis were conducted and authored by us. We understand and support MDPI’s ethics guidelines and have now explicitly acknowledged the use of AI for writing assistance in the revised version of the manuscript. We appreciate the reviewer’s observation, which helped us address this issue and improve the integrity and clarity of our work. A corrected version of the literature review and a formal AI use declaration will be included in the revised submission.

*************************************************************************************************

Line 40, There should be OFDM rather than IoT.

Response:  We thank the reviewer for pointing out this important clarification. We have corrected the terminology on line 40 by rewriting the abstract section. The revised statement now correctly describes the OFDM system as the focus of the transmission performance discussion.

We appreciate the reviewer’s careful reading and valuable feedback.

*************************************************************************************************

Line 60, Authors cite two articles which are about data processing due to GDPR regulations, not about wireless networking, as suggested at the beginning of the paragraph.

Response:  Thank you for your careful review. We acknowledge the mismatch between the content of the cited references and the context of the paragraph discussing wireless networking technologies. The cited GDPR-related articles ([1], [2]) are indeed not relevant to the topic of wireless IoT communication.

In the revised manuscript, we have replaced these citations with appropriate and domain-relevant references that specifically address wireless networking in IoT environments, including technologies such as Wi-Fi, Bluetooth, 4G/5G, and related security challenges.

  1. Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., and Ayyash, M., "Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications," IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.
  2. Palattella, M. R., Dohler, M., Grieco, A., Rizzo, G., Torsner, J., Engel, T., and Ladid, L., "Internet of Things in the 5G Era: Enablers, Architecture, and Business Models," IEEE Journal on Selected Areas in Communications, vol. 34, no. 3, pp. 510–527, 2016.

We appreciate the reviewer’s attention to detail, which has helped improve the accuracy and relevance of the manuscript.

************************************************************************************************

Line 77, Authors cite articles, which are on cryptographic algorithms (RC6 and image encryption), and have nothing on the topic of the paragraph (limitations of constrained IoT Edge devices)

Response:  We appreciate the reviewer’s observation. Upon review, we agree that the cited articles at line 77 focus on RC6 and image encryption and do not directly address the limitations of resource-constrained IoT edge devices, which is the core topic of the paragraph. We have revised the paragraph to include more appropriate and topic-aligned references that discuss computational, memory, and energy limitations in IoT edge environments. This correction ensures coherence between the text and the supporting literature. The new references are:

  1. Clemente-Lopez, J. de J. Rangel-Magdaleno, and J. M. Muñoz-Pacheco, "A lightweight chaos-based encryption scheme for IoT healthcare systems," Internet of Things, vol. 25, p. 101032, 2024.
  2. Shruti, S. Rani, M. Shabaz, A. K. Dutta, and E. A. Ahmed, "Enhancing privacy and security in IoT-based smart grid system using encryption-based fog computing," Alexandria Engineering Journal, vol. 102, pp. 66–74, 2024.

Thank you for helping us improve the technical accuracy and integrity of the manuscript.

*************************************************************************************************
Line 80, unwanted "and" at the end of the line.

Response:  Thank you for pointing out this typographical error. We have removed the unnecessary "and" at the end of line 80 to improve the clarity and grammatical correctness of the sentence.

*************************************************************************************************
Line 200, literature knows chaotic maps as "baker", not "backer" maps.

Response:  We thank the reviewer for highlighting this important correction. You are absolutely right—“Baker” is the correct term used in the literature to refer to this class of chaotic maps. We have corrected all instances of “Backer” to “Baker” throughout the manuscript to ensure consistency with established terminology.

We appreciate your careful reading and helpful feedback.

*************************************************************************************************

Line 204, the beginning of the sentence sounds erroneous: "Chaotic Maps Chaos theory..."

Response:  Thank you for pointing this out. We agree that the original phrasing was unclear and redundant. We have revised the sentence to begin more clearly and grammatically, as follows:

"Chaos theory describes the behavior of certain nonlinear dynamic systems that exhibit sensitivity to initial conditions and complex dynamics."

This revision improves readability and eliminates redundancy while maintaining the intended meaning. We appreciate your attention to clarity and style.

*************************************************************************************************
Line 212, Backer

Response:  Thank you for your continued attention to accuracy. As noted earlier, "Backer" is a typographical error. We have corrected "Backer" to the proper term "Baker" at line 212 and ensured consistency throughout the manuscript.

We appreciate your meticulous review.

*************************************************************************************************

Line 218, one of the articles cited [22] is about the modulation techniques and has nothing in common with baker maps and permutations, as mentioned in the sentence

Response:  Thank you for catching this inconsistency. We agree that reference [22] relates to modulation techniques and not to Baker maps or permutations. The citation has been removed from this context, and a more appropriate reference on Baker maps has been added instead:

  1. Faragallah O, El-Samie F, Ahmed H, Elashry I, Mai Helmy, El-Rabaie E, Alshebeili S, "Image encryption: a communication perspective", CRC Press, 2013.

*************************************************************************************************
Line 220, it should be described what x and y are in the equations (1) and (2)

Response:  Thank you for the valuable feedback. We have revised the manuscript accordingly.

In equations (1) and (2) of the Baker map, we now clarify that the variables x and y are normalized spatial coordinates within the unit square, each taking values in [0, 1].

In context:
- x: the horizontal coordinate of a point in the unit square.
- y: the vertical coordinate of the same point.
Together, (x, y) identifies a specific point in a 2D space (or grid), which corresponds to a data element, such as a pixel in an image or a sample point in audio, depending on the application.
In practical use (e.g., for encryption):
- The unit square is a continuous model. For real digital data such as an N × N audio matrix, x and y are normalized indices, calculated as:
    x = i / N,     y = j / N
  where i and j are integer indices of rows and columns (or time/frequency axes in audio).
- After transformation using the Baker map, the coordinates are mapped to new positions, effectively permuting the data in a chaotic but reversible way.
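
The index normalization and the reversible mapping described in this response can be sketched as follows. This is only an illustration: equations (1) and (2) are not reproduced in this record, so the sketch assumes the standard two-branch continuous Baker map on the unit square, and the names `normalize`, `baker`, and `baker_inverse` are hypothetical. The manuscript's own (possibly discretized) variant may differ.

```python
from fractions import Fraction

def normalize(i, j, n):
    """Map integer indices (i, j) of an n x n matrix to x = i/n, y = j/n."""
    return Fraction(i, n), Fraction(j, n)

def baker(x, y):
    """Assumed standard Baker map: stretch in x, squeeze in y, then stack."""
    if x < Fraction(1, 2):
        return 2 * x, y / 2
    return 2 * x - 1, (y + 1) / 2

def baker_inverse(x, y):
    """Inverse of the map above; the branch is selected by y."""
    if y < Fraction(1, 2):
        return x / 2, 2 * y
    return (x + 1) / 2, 2 * y - 1

x, y = normalize(3, 1, 4)   # x = 3/4, y = 1/4
u, v = baker(x, y)          # second branch gives (1/2, 5/8)
assert baker_inverse(u, v) == (x, y)   # the mapping is reversible
```

Exact fractions are used so that the round trip can be checked without floating-point error. Note that the continuous map does not generally land on grid points (5/8 is not a multiple of 1/4 here), which is why practical permutation schemes work with a discretized version of the map.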

*************************************************************************************************
Line 238, Describe symbols used in equations

Response:  Thank you for your helpful suggestion. We have added a detailed explanation of all the symbols used in the encryption equations, including definitions for variables such as P_i, C_i, E_k(⋅), D_k(⋅), k, IV, and the XOR operation (⊕). These descriptions are now included in the revised manuscript to enhance clarity and improve the reader's understanding of the encryption process.

*************************************************************************************************

Line 241, Explain what IV is

Response:  We have clarified in the text that IV refers to the Initialization Vector used in the encryption algorithm, which ensures unique ciphertexts even when the same plaintext and key are used.

************************************************************************************************

Line 244, should be "in Figure 4"

Response:  Thank you for pointing this out. We have corrected the phrasing to "in Figure 4" as suggested, to ensure proper and consistent reference formatting throughout the manuscript.

*************************************************************************************************
Line 247, Describe symbols used in equations.

Response:  Thank you for your valuable feedback. We have now added clear definitions of all symbols used in the relevant equations, including P_j, C_j, I_j, E_k(⋅), k, IV, and the XOR operation (⊕). These definitions are provided alongside the description of the CFB mode to enhance clarity and ensure the equations are fully understandable to the reader.
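
For readers of this record, the way these symbols typically relate in CFB mode can be sketched as below. The sketch assumes full-block CFB, where I_1 = IV, C_j = P_j ⊕ E_k(I_j), and I_{j+1} = C_j; the manuscript's exact equations are not shown here and may differ. The function `e_k` is a hypothetical keyed stand-in built from HMAC-SHA256, not a real block cipher and not the authors' cipher, and the 16-byte block size is likewise an assumption.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes (assumed for illustration)

def e_k(key, block):
    """Stand-in for E_k: a keyed function, NOT a real block cipher."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def xor(a, b):
    """Bytewise XOR, truncated to the shorter input."""
    return bytes(p ^ q for p, q in zip(a, b))

def cfb_encrypt(key, iv, plaintext):
    """C_j = P_j XOR E_k(I_j), with I_1 = IV and I_{j+1} = C_j."""
    out, reg = [], iv
    for j in range(0, len(plaintext), BLOCK):
        c = xor(plaintext[j:j + BLOCK], e_k(key, reg))
        out.append(c)
        reg = c  # the ciphertext block feeds the next input register
    return b"".join(out)

def cfb_decrypt(key, iv, ciphertext):
    """P_j = C_j XOR E_k(I_j); only the forward function E_k is needed."""
    out, reg = [], iv
    for j in range(0, len(ciphertext), BLOCK):
        c = ciphertext[j:j + BLOCK]
        out.append(xor(c, e_k(key, reg)))
        reg = c
    return b"".join(out)

key = b"k" * BLOCK
sample = b"audio-sample-blk" * 2           # two 16-byte blocks of toy data
c1 = cfb_encrypt(key, bytes(BLOCK), sample)
c2 = cfb_encrypt(key, b"\x01" * BLOCK, sample)
assert cfb_decrypt(key, bytes(BLOCK), c1) == sample
assert c1 != c2  # same key and plaintext, different IV, different ciphertext
```

The last assertion illustrates the role of the IV stated in the response to the comment on line 241: changing only the IV changes the ciphertext even when the key and plaintext are identical.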

*************************************************************************************************
Line 360, The description of figure 9 is unclear. It's written that horizontal lines for histograms start with zero, but in figure 9 they start with -1.

Response:  Thank you for your detailed feedback. We have revised the caption and body text related to Figure 9 to clearly distinguish the two types of images, spectrograms and histograms. The histogram of a spectrogram normally begins at 0, unless the data have been transformed (e.g., log-scaled or normalized to include negative values). In this figure the horizontal axis starts at -1 because the spectrogram data were normalized rather than plotted in raw form.

*************************************************************************************************

There are two types of pictures in figure 9, and they should be clearly explained with a description of both axes and the contents of the images. Every part of the image should have the individual designator and description. It's not explained what the colors (purple, yellow) of the vertical bars mean.

Response:  Thank you for your valuable feedback. We acknowledge the need for clearer explanations and more detailed labeling of the spectrogram and histogram figures. Accordingly, we have revised the manuscript to include the necessary clarifications, with all modifications highlighted in red for easy reference.

Figure Explanation and Labeling:

  1. Spectrogram Images (in Figure 9):
    • X-axis (Time in seconds): Represents the progression of the audio signal over time.
    • Y-axis (Normalized Frequency × π radians/sample): Shows the normalized frequency components of the signal.
    • Color Intensity: Reflects the magnitude of the frequency components; brighter (yellow) areas indicate higher energy at those frequencies and times.
    • Purpose: This spectrogram visualizes the time-frequency representation of the audio after applying the encryption and steganography embedding.
  2. Histogram Images (in Figure 9):
    • X-axis: Displays the binned values of the normalized spectrogram magnitudes, ranging from -1 to 1. This normalization was applied during preprocessing to enable consistent comparison.
    • Y-axis: Represents the frequency count (number of occurrences) of values within each bin.
    • Bar Colors: The histogram shows the statistical distribution of the normalized spectrogram magnitudes, highlighting the preservation of the signal's overall structure. The colors only differentiate bars at different magnitude values and carry no further meaning. This color distinction helps visually demonstrate the effectiveness of the methodology in flattening the histogram and thereby obscuring the signal characteristics.
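
To make the histogram axes concrete, the sketch below shows one way values can end up on a -1 to 1 axis and be counted per bin. It assumes a simple min-max rescaling to [-1, 1]; the response does not state which normalization was actually used, and the function names, bin count, and sample values are made up for illustration.

```python
def normalize_pm1(values):
    """Linearly rescale so the minimum maps to -1 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: no spread to rescale
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]

def histogram(values, bins=4, lo=-1.0, hi=1.0):
    """Count values in `bins` equal-width bins covering [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = min(int((v - lo) / width), bins - 1)  # v == hi goes in the last bin
        counts[k] += 1
    return counts

magnitudes = [0.0, 1.0, 2.0, 3.0, 4.0]   # made-up spectrogram magnitudes
scaled = normalize_pm1(magnitudes)        # [-1.0, -0.5, 0.0, 0.5, 1.0]
counts = histogram(scaled)                # [1, 1, 1, 2]
```

Here the x-axis of the histogram corresponds to the bin edges between -1 and 1, and the y-axis to the entries of `counts`.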

*************************************************************************************************
Line 373, figure 9 should be reorganized, because it's unclear which histogram shows results of which configuration.

Response:  Figure 9 has been reorganized and relabeled. The histogram results are now grouped and labeled according to the corresponding configurations, making it clear which data belong to which case.

*************************************************************************************************
Line 404, It refers to grayscale images, not audio files  

Response:  This was an oversight. We have corrected the term from “audio files” to “grayscale images.”

*************************************************************************************************
Lines 480 and 504, the table captions are unfinished

Response:  Both table captions have been completed to clearly describe the data presented and the context of each table.

*************************************************************************************************

Position [30] is available with a different title. There is "images" instead of "audios" in the title

Response:  We have updated the reference title in position [30] to reflect the correct title, replacing “audios” with “images.”

*************************************************************************************************

Position [33] description is continued in [34]; they should be merged. Additionally, there are again "audios" instead of "images".

Response:  References [33] and [34] have been merged into a single citation and the incorrect term “audios” has been replaced with “images” where applicable.

*************************************************************************************************

I also had trouble understanding the configuration of the system for simulations. I think a short introductory subchapter would be OK.

Response:  We appreciate this suggestion. A new subsection at the beginning of the simulation section has been added to clearly outline the system configuration used for the simulations.

*************************************************************************************************

The paper requires careful editing work, including:

- Some texts in tables partially disappear.

- Subchapters' numbering in chapter 8 is incorrect. There are different font styles used, and there are two 8.4 sections.

Response: We have carefully edited the manuscript to ensure consistent font styles throughout. All tables have been reformatted to ensure no text is missing. Subchapter numbering in Chapter 8 has been corrected, and duplicate “8.4” sections have been resolved.

*************************************************************************************************

The paper should be carefully read for English language and formatting.

Response: Thank you for your valuable feedback. The entire manuscript has been thoroughly revised to improve English language clarity, grammar, and formatting consistency. All modifications and corrections made in response to this comment are highlighted in red throughout the revised manuscript for your easy reference.

*************************************************************************************************

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper studies a secured audio framework for Internet of Things systems. Some algorithms and experiments are presented. The reviewer has the following comments.

  • The paper sometimes talks about security and encryption, and sometimes about orthogonal channels (which are for avoiding interference). You need to develop your paper around a focused motivation.
  • We already have networking protocols for security and encryption, e.g., TLS. The paper simply mentions that they cannot be used for advanced IoT systems. The authors should clearly explain why current networking protocols like TLS cannot encrypt and secure audio data for IoT applications.
  • It is also unclear what audio data you work on. It would be great if you could give some application examples of such IoT-based audio applications.
  • Also, many studies have proposed AI-based threat prediction and security protection. The paper should clearly justify why these studies are not good enough for IoT audio data. For example, you may refer to: Characterizing and Classifying IoT Traffic in Smart Cities and Campuses, A lightweight Intrusion Detection for Internet of Thing-Based smart building, etc.
  • Most figures are of low quality. The experimental results are hard to understand. Please explain the performance metrics that you collect data for. Justification in line with your study motivation is important to keep the paper coherent.

Author Response

The paper sometimes talks about security and encryption, and sometimes about orthogonal channels (which are for avoiding interference). You need to develop your paper around a focused motivation.

Response:  We appreciate the reviewer’s observation. The manuscript has now been revised to clarify the central motivation and ensure a cohesive narrative. The core contribution of the paper is the development of a secure audio transmission system for IoT environments. To achieve this, we introduce a hybrid encryption method based on chaotic systems and steganography, which serves as the primary security mechanism.

The discussion of Orthogonal Frequency Division Multiplexing (OFDM) is included not as a separate focus, but as the wireless transmission medium through which our proposed secure system is evaluated. The use of OFDM aligns with the real-world transmission scenarios for IoT devices, which are often affected by interference and noise. Therefore, we study the robustness of the proposed encryption approach within such a realistic communication framework to demonstrate its practical applicability and resilience.

To improve clarity and focus, we have revised the abstract and introduction to emphasize the security goal and clarified that OFDM is a transmission method used to test and validate the effectiveness of our proposed secure audio communication technique. Additional linking sentences and transitional explanations were also added in the relevant sections to better integrate the encryption and transmission aspects of the system. We also updated other parts of the manuscript (e.g., Section 7 and the Conclusion) to ensure they align with this focused motivation. All edits are highlighted in red.

*************************************************************************************************

We already have networking protocols for security and encryption, e.g., TLS. The paper simply mentions that they cannot be used for advanced IoT systems. The authors should clearly explain why current networking protocols like TLS cannot encrypt and secure audio data for IoT applications.

Response:  We thank the reviewer for this important observation. We agree that widely adopted protocols such as TLS provide strong security guarantees for conventional networked systems. However, our manuscript specifically targets resource-constrained IoT environments where standard protocols like TLS may not be feasible or optimal. In the revised version of the manuscript, we have added a detailed explanation to clarify this point.

Specifically, TLS involves complex handshake mechanisms, certificate management, and relatively high computational and memory overhead, which are unsuitable for many IoT devices that operate with limited CPU capacity, memory, and energy resources. Furthermore, TLS secures communication at the transport layer, but does not offer end-to-end protection across all layers, particularly when intermediate nodes (e.g., gateways or edge devices) may perform data processing or storage.

Additionally, TLS is not designed for the granular data-level security needed in applications such as audio streaming in IoT, where data may be cached, segmented, or transmitted intermittently due to network variability. In contrast, our proposed method applies encryption and concealment at the data level (i.e., on the audio itself), ensuring that the content remains protected regardless of the underlying communication protocol.

To reflect this clarification, we have revised the introduction and discussion sections to explicitly address the limitations of TLS in IoT settings and motivate the need for lightweight, data-centric security solutions such as our proposed chaotic-steganography approach. The edits are highlighted in red.

*************************************************************************************************

It is also unclear what audio data you work on. It would be great if you could give some application examples of such IoT-based audio applications.

Response:  We thank the reviewer for pointing out the need for clarity regarding the type of audio data used and its application context. In the revised manuscript, we have now specified that the audio data used in our simulations includes standard voice and environmental audio samples, representative of typical IoT applications such as smart home voice control systems, industrial sound monitoring, and healthcare audio surveillance (e.g., patient distress detection).

To enhance clarity and relevance, we have added examples of real-world IoT-based audio applications that benefit from secure transmission, including:

  • Smart Home Assistants (e.g., Alexa, Google Home), where user voice commands must be securely transmitted and protected from eavesdropping.
  • Industrial IoT (e.g., predictive maintenance systems), where machinery noise patterns are monitored and transmitted to detect failures or anomalies.
  • Healthcare IoT (e.g., remote elderly care), where devices capture ambient or vocal audio to detect falls, breathing irregularities, or emergency situations.

These use cases demand real-time, low-power, and secure transmission of audio data—conditions under which traditional encryption protocols struggle. Our proposed chaotic-steganography framework is particularly well-suited for these scenarios, offering lightweight, embedded security without compromising audio quality or system responsiveness.

We have included these clarifications and examples in the updated introduction and system model sections.

*************************************************************************************************

Also, many studies have proposed AI-based threat prediction and security protection. The paper should clearly justify why these studies are not good enough for IoT audio data. For example, you may refer to: Characterizing and Classifying IoT Traffic in Smart Cities and Campuses, A lightweight Intrusion Detection for Internet of Thing-Based smart building, etc.

Response:  We appreciate your insightful comment regarding AI-based security approaches in IoT systems. We acknowledge that AI-driven intrusion detection and traffic classification methods have made significant contributions to IoT security, particularly in areas such as anomaly detection and traffic pattern analysis. However, these techniques primarily operate at the network or transport layer, aiming to detect threats based on traffic behavior, device signatures, or metadata patterns.

Our work, in contrast, addresses a different but complementary security dimension—data-level protection, specifically for audio content. AI-based systems are generally reactive, identifying threats after network activity has been analyzed. They also rely heavily on high-quality datasets and continuous training, which may not be feasible for all IoT deployments, especially those with limited computational resources, intermittent connectivity, or real-time performance constraints.

Moreover, such AI-based frameworks do not directly secure the audio payload itself. Once an attacker bypasses or evades the detection mechanism, the raw audio data—if unencrypted—can be exposed. In contrast, our proposed hybrid chaotic-steganography approach ensures that even if a network is compromised, the audio content remains encrypted and concealed at the source, making it inaccessible without the proper decryption mechanism.

We have added a discussion in the revised manuscript (Introduction and Related Works sections) to highlight this distinction and explain why lightweight, proactive, data-level encryption methods like ours are essential for scenarios where AI-based systems alone may not provide sufficient protection for sensitive IoT audio data.

*************************************************************************************************

Most figures are of low quality. The experimental results are hard to understand. Please explain the performance metrics that you collect data for. Justification in line with your study motivation is important to keep the paper coherent.

Response: Thank you for your valuable feedback. We have addressed your concerns as follows:

  1. Figure Quality:
    All figures have been revised to ensure higher resolution and clarity.
  2. Experimental Results Presentation:
    We have reorganized and clarified the presentation of our experimental results to enhance readability. Key trends and comparisons are now explicitly highlighted in the figure captions, tables, and main text.
  3. Performance Metrics Explanation:
    We have added a detailed explanation of the performance metrics used in our study in Section 8. This helps readers understand how these metrics relate to our system’s effectiveness.
  4. Justification and Coherence:
    To strengthen the coherence of the paper, we have explicitly linked each performance metric and result back to our study’s motivation and research objectives. These connections are now clearly articulated in the revised introduction and discussion sections.

We believe these revisions significantly improve the quality, clarity, and coherence of our manuscript.

Author Response File: Author Response.pdf
