Review
Peer-Review Record

Audio Watermarking: Review, Analysis, and Classification of the Most Recent Conventional Cutting-Edge Results

Appl. Sci. 2025, 15(21), 11514; https://doi.org/10.3390/app152111514
by Carlos Jair Santin-Cruz * and Gordana Jovanovic Dolecek *
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 18 September 2025 / Revised: 10 October 2025 / Accepted: 25 October 2025 / Published: 28 October 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript surveys 41 articles in the field of audio watermarking from 2016 to 2023 and categorizes the algorithms proposed in these articles based on their processing workflows. Given that the most recent comprehensive review in this field was published in 2016, the motivation behind this work is commendable.

However, there are still some issues in the manuscript that must be improved.

  1. The authors should discuss the Motivation and Novelty of this work in the Introduction, while moving the performance of Audio Watermarking to Section 2, such as imperceptibility, robustness, and reversibility. Additionally, the applications of audio watermarking should also be placed in Section 2. Overall, the structure of the first three sections needs to be reorganized.
  2. In lines 91–95, the statement "...audio watermarking should be imperceptible..." contradicts the description in lines 41–44: "This classification is based on whether the goal is for the watermark to be noticeable to any user." If watermarks should be imperceptible, there should not exist watermarking algorithms that entirely abandon this goal.
  3. The descriptions of imperceptibility in lines 100–104 overlap with those in lines 41–44 and 91–95. Two of these three parts should be deleted.
  4. Lines 100–119 introduce some performance criteria of audio watermarking. I believe the evaluation methods or metrics for these criteria should also be included. Similarly, abbreviations such as ODG and SNR appear in the manuscript without explanation (not all readers are well-trained).
  5. The paper classifies audio watermarking algorithms based on their processes. I do not think this classification appropriate, for two reasons.

First, since most watermarking algorithms involve multiple processes in the flow, this classification splits a single algorithm into multiple parts, introducing them across different subsections. This makes it inconvenient for readers to grasp the motivation, innovations, or even the workflow of a specific algorithm, which contradicts the goal of a survey paper.

Second, many algorithms are proposed to address shortcomings in existing ones, so a survey paper should leverage the background of each algorithm to showcase the field's concerns, challenges, and progress over time, something this classification fails to achieve.

  6. The hierarchical classification in the paper is quite confusing. For instance, in subsection 4.1, the hierarchical tree should be structured as follows:

├──Time Domain
└──Transform Domain
     ├──Wavelet Transform
     ├──Cosine Transform
     ├──Matrix Decomposition
     │   ├──SVD
     │   └──LUD
     ├──Fourier Transform
     ├──Spikegram
     ├──FrCMT
     ├──GBT
     └──SSA

However, the authors present all categories at the same level, which is perplexing.

  7. The paper covers relatively few papers in the field, omitting highly influential methods like Spread Spectrum and currently trending deep learning approaches. I believe these algorithms are just as important as QIM.

Author Response

This manuscript surveys 41 articles in the field of audio watermarking from 2016 to 2023 and categorizes the algorithms proposed in these articles based on their processing workflows. Given that the most recent comprehensive review in this field was published in 2016, the motivation behind this work is commendable.

However, there are still some issues in the manuscript that must be improved.

Q1. The authors should discuss the Motivation and Novelty of this work in the Introduction, while moving the performance of Audio Watermarking to Section 2, such as imperceptibility, robustness, and reversibility. Additionally, the applications of audio watermarking should also be placed in Section 2. Overall, the structure of the first three sections needs to be reorganized.

Answer:

We appreciate your constructive comment. We agree with the suggestion, and the manuscript's structure has been reorganized accordingly. The motivation and novelty of the work are presented in the Introduction. At the same time, the applications of audio watermarking and the performance criteria are discussed in Section 2, which now focuses on the fundamentals of audio watermarking.

 

Q2. In lines 91–95, the statement "...audio watermarking should be imperceptible..." contradicts the description in lines 41–44: "This classification is based on whether the goal is for the watermark to be noticeable to any user." If watermarks should be imperceptible, there should not exist watermarking algorithms that entirely abandon this goal.

The descriptions of imperceptibility in lines 100–104 overlap with those in lines 41–44 and 91–95. Two of these three parts should be deleted.

Answer:

Thank you for pointing it out. The sentences in lines 91–95 have been removed to eliminate redundancy and avoid contradiction. In the remaining passages, the description of imperceptibility has been clarified. While the term 'imperceptibility' is commonly used to describe the ideal condition of an audio watermark being inaudible to listeners, it is, in practice, a subjective characteristic that can vary depending on the signal, listening conditions, and human perception. Therefore, the revised text now explains that different degrees of imperceptibility may be observed rather than an absolute condition.

 

Q3. Lines 100–119 introduce some performance criteria of audio watermarking. I believe the evaluation methods or metrics for these criteria should also be included. Similarly, abbreviations such as ODG and SNR appear in the manuscript without explanation (not all readers are well-trained).

Answer:

We appreciate your helpful comment. In response to this and similar suggestions from other reviewers, a new section has been added to the manuscript providing a detailed explanation of the evaluation metrics used to assess the performance criteria of audio watermarking. This section also includes the reported results from various studies, allowing readers to better understand and compare the effectiveness of different approaches. In addition, abbreviations such as ODG and SNR are now explicitly defined upon first mention to improve clarity for all readers.
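
For reference, SNR (signal-to-noise ratio) is usually taken as the ratio of host-signal power to embedding-distortion power, expressed in dB, while ODG (Objective Difference Grade) is the output of the PEAQ perceptual model, ranging from 0 (imperceptible difference) to −4 (very annoying). The following is a minimal sketch of the SNR computation only, not the definition used in the manuscript; the tone, the sampling rate, and the ±0.01 distortion are hypothetical values chosen for illustration.

```python
import math

def snr_db(host, watermarked):
    """SNR in dB, treating (host - watermarked) as the embedding distortion."""
    if len(host) != len(watermarked):
        raise ValueError("signals must have the same length")
    signal_power = sum(x * x for x in host)
    noise_power = sum((x - y) ** 2 for x, y in zip(host, watermarked))
    if noise_power == 0:
        return math.inf  # identical signals: no distortion at all
    return 10.0 * math.log10(signal_power / noise_power)

# Hypothetical host: one second of a 440 Hz tone sampled at 8 kHz.
host = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
# Hypothetical embedding distortion: an alternating +/-0.01 offset.
watermarked = [x + (0.01 if n % 2 == 0 else -0.01) for n, x in enumerate(host)]

print(f"SNR = {snr_db(host, watermarked):.2f} dB")
```

A higher SNR means less embedding distortion, but SNR does not model human hearing, which is why perceptual measures such as ODG are commonly reported alongside it.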

 

Q4. The paper classifies audio watermarking algorithms based on their processes. I do not think this classification appropriate, for two reasons.

First, since most watermarking algorithms involve multiple processes in the flow, this classification splits a single algorithm into multiple parts, introducing them across different subsections. This makes it inconvenient for readers to grasp the motivation, innovations, or even the workflow of a specific algorithm, which contradicts the goal of a survey paper.

Second, many algorithms are proposed to address shortcomings in existing ones, so a survey paper should leverage the background of each algorithm to showcase the field's concerns, challenges, and progress over time, something this classification fails to achieve.

Answer:

We respectfully disagree. A process-centric taxonomy is appropriate for audio watermarking because modern algorithms are inherently multi-stage and many of their key advances arise at the module level (e.g., psychoacoustic masking, feature extraction, embedding, synchronization, error control, and error correction). Collapsing an algorithm into a single label obscures these contributions. By contrast, classifying by processes exposes the design space and enables readers to compare like with like (e.g., different synchronization strategies or detectors), which is a central goal of a survey. The approach mirrors digital communications, where transceivers are studied via standard blocks (modulation, coding, synchronization, equalization). Watermarking pipelines have reached similar complexity; treating blocks as first-class elements is therefore both natural and informative.

 

Q5. The hierarchical classification in the paper is quite confusing. For instance, in subsection 4.1, the hierarchical tree should be structured as follows:

├──Time Domain
└──Transform Domain
     ├──Wavelet Transform
     ├──Cosine Transform
     ├──Matrix Decomposition
     │   ├──SVD
     │   └──LUD
     ├──Fourier Transform
     ├──Spikegram
     ├──FrCMT
     ├──GBT
     └──SSA

However, the authors present all categories at the same level, which is perplexing.

Answer:

Thank you for pointing it out. The given hierarchical classification represents the proper structure. However, due to formatting and paragraph alignment issues, the intended hierarchy was not clearly displayed in the previous version. The indentation and layout have now been adjusted to ensure that the hierarchical organization is clear and correctly represented.

 

Q6. The paper covers relatively few papers in the field, omitting highly influential methods like Spread Spectrum and currently trending deep learning approaches. I believe these algorithms are just as important as QIM.

Answer:

We thank you for this valuable comment. We acknowledge that deep learning–based watermarking algorithms are important contributions within the field. However, as clarified in the revised Introduction, this paper is specifically dedicated to conventional audio watermarking schemes, which continue to play a vital role in protecting intellectual property, ensuring data integrity, and maintaining authenticity in the context of audio content creation and distribution. For this reason, the review focuses on conventional methods and their performance criteria.

A new paragraph has been added to the Introduction explaining the methodology used to select the works included in this survey. This paragraph clarifies that the selection was based on conventional approaches reported from 2016 onward, following criteria such as relevance, technical completeness, and research impact. Regarding Spread Spectrum watermarking, while it was widely adopted in earlier systems, our analysis shows that it has become less prevalent in recent works compared to other conventional techniques such as QIM.
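
As a point of reference for the QIM (quantization index modulation) technique named in this exchange, the basic scalar form can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme of the manuscript or of any surveyed paper: the function names, the step size `delta`, the host coefficients, and the 0.01 perturbation are all hypothetical.

```python
def qim_embed(sample, bit, delta):
    """Quantize `sample` onto the lattice for `bit` (0 or 1).

    Bit 0 uses multiples of delta; bit 1 uses the same lattice
    shifted by delta / 2.
    """
    offset = bit * delta / 2.0
    return delta * round((sample - offset) / delta) + offset

def qim_extract(sample, delta):
    """Blind detection: pick the bit whose lattice lies closest to `sample`."""
    d0 = abs(sample - qim_embed(sample, 0, delta))
    d1 = abs(sample - qim_embed(sample, 1, delta))
    return 0 if d0 <= d1 else 1

def bit_error_rate(sent, received):
    """Fraction of positions where the recovered bit differs."""
    return sum(s != r for s, r in zip(sent, received)) / len(sent)

# Hypothetical host coefficients and payload.
host = [0.137, -0.482, 0.905, 0.021, -0.333, 0.678, -0.759, 0.250]
bits = [1, 0, 1, 1, 0, 0, 1, 0]
delta = 0.1

marked = [qim_embed(x, b, delta) for x, b in zip(host, bits)]
noisy = [y + 0.01 for y in marked]  # perturbation below delta / 4
recovered = [qim_extract(y, delta) for y in noisy]
print("BER:", bit_error_rate(bits, recovered))
```

The step size sets the trade-off in this sketch: a larger `delta` tolerates larger perturbations (anything below `delta / 4` is decoded correctly) at the cost of more embedding distortion.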

Reviewer 2 Report

Comments and Suggestions for Authors

1. Define a transparent review protocol. Add a subsection detailing databases searched, time window, keywords, inclusion/exclusion criteria, and screening procedure (e.g., PRISMA‑style flow), so readers can assess coverage and replicate/extend the survey. 

2. State a consolidated threat model. Provide a table enumerating the attack set and parameter ranges used in the surveyed papers (e.g., MP3/AAC bitrates, resampling factors, TSM ±2–5%, pitch‑shift semitones, AWGN SNR levels, filtering bandwidths, re‑recording conditions), and map each algorithm family’s robustness profile to this model. 


3. Add an application‑oriented summary table. For each representative method: list family, domain(s), detector type (blind/semi/non‑blind), typical payload (bits/s), imperceptibility metrics (ODG/PEAQ or SNR/LSD), robustness highlights, computational footprint, and best‑fit applications (e.g., broadcast monitoring, rights management, tamper detection, content ID, streaming). Cite Tables 1–2 for domains but extend them with metrics and use‑case guidance.

4. Clarify the contribution over prior surveys. Add a paragraph explicitly contrasting this stage‑based taxonomy with established taxonomies (domain‑based, detector‑based, robust/fragile) and include a small mapping table showing where classic categories reside within Fig. 1’s stages.

5. Strengthen coverage of desynchronization handling. Consolidate synchronization/resynchronization strategies into a guidance box that compares markers, pilot designs, and recovery flows against cropping, jitter, TSM, and time‑warping, with brief pros/cons. 

6. Add a short section on misuse risks (covert tracking, watermark detectability vs. privacy), limits of forensic inference, key management, and collusion/oracle attacks, with definitions of “security” distinct from “robustness.” 

7. Even if scoped as “conventional,” include a short subsection summarizing deep watermarking trends (autoencoder‑style, differentiable DSP for speech/music, TTS/diffusion watermarking), explaining scope boundaries and citing a few anchors for readers. Reference the “future work” statement in conclusions and make the scope explicit in the introduction.

Author Response

Q1. Define a transparent review protocol. Add a subsection detailing databases searched, time window, keywords, inclusion/exclusion criteria, and screening procedure (e.g., PRISMA‑style flow), so readers can assess coverage and replicate/extend the survey.

Answer:

We appreciate your valuable suggestion. Following this recommendation, two new paragraphs have been added to the Introduction to describe the methodology adopted for the survey. These paragraphs detail the databases searched, the time window, the keywords used, and the inclusion/exclusion criteria.

Q2. State a consolidated threat model. Provide a table enumerating the attack set and parameter ranges used in the surveyed papers (e.g., MP3/AAC bitrates, resampling factors, TSM ±2–5%, pitch‑shift semitones, AWGN SNR levels, filtering bandwidths, re‑recording conditions), and map each algorithm family’s robustness profile to this model.

Q3. Add an application‑oriented summary table. For each representative method: list family, domain(s), detector type (blind/semi/non‑blind), typical payload (bits/s), imperceptibility metrics (ODG/PEAQ or SNR/LSD), robustness highlights, computational footprint, and best‑fit applications (e.g., broadcast monitoring, rights management, tamper detection, content ID, streaming). Cite Tables 1–2 for domains but extend them with metrics and use‑case guidance.

Answer:

We appreciate your insightful recommendation and fully agree that presenting the corresponding performance metrics is essential to better visualize the impact of the surveyed methods. Accordingly, a new section (Section 5) on performance criteria has been added, in which thresholds are also proposed to avoid ambiguous descriptions. In addition to this new section and its discussion, the reported results are presented in Tables 4, 5, 6, 7, and 8.

 

Q4. Clarify the contribution over prior surveys. Add a paragraph explicitly contrasting this stage‑based taxonomy with established taxonomies (domain‑based, detector‑based, robust/fragile) and include a small mapping table showing where classic categories reside within Fig. 1’s stages.

Answer:

We appreciate your valuable recommendation. In response, a new paragraph has been added to clearly differentiate the proposed stage-based taxonomy from previously established classifications, such as those based on the domain or on the technique used in the embedding process. This paragraph explicitly explains how the proposed approach organizes algorithms according to their functional stages rather than according to a single descriptive attribute.

 

Q5. Strengthen coverage of desynchronization handling. Consolidate synchronization/resynchronization strategies into a guidance box that compares markers, pilot designs, and recovery flows against cropping, jitter, TSM, and time‑warping, with brief pros/cons.

Answer:

We thank the reviewer for this insightful suggestion. We agree that synchronization and resynchronization strategies play a crucial role in addressing desynchronization issues, such as cropping, jitter, time-scale modification (TSM), and time-warping. However, including a comparative table or guidance box may not be entirely appropriate in this case, as synchronization methods are often described only briefly in the reviewed papers and are not always presented with sufficient technical detail for consistent comparison. Moreover, the reported results are frequently evaluated using different metrics, which prevents an objective and uniform assessment across studies. For these reasons, a qualitative discussion of synchronization and resynchronization techniques is maintained throughout the text.

 

Q6. Add a short section on misuse risks (covert tracking, watermark detectability vs. privacy), limits of forensic inference, key management, and collusion/oracle attacks, with definitions of “security” distinct from “robustness.”

Answer:

We appreciate your valuable suggestion. In response, a new section has been added that describes in detail the concept of robustness and the classical attacks reported in the literature. This section discusses the main categories of attacks and their relationship to robustness evaluation, providing a clearer distinction between robustness and security within the context of audio watermarking.

 

Q7. Even if scoped as “conventional,” include a short subsection summarizing deep watermarking trends (autoencoder‑style, differentiable DSP for speech/music, TTS/diffusion watermarking), explaining scope boundaries and citing a few anchors for readers. Reference the “future work” statement in conclusions and make the scope explicit in the introduction.

Answer:

We appreciate your valuable suggestion. In response, a new section has been added that describes in detail the concept of robustness and the classical attacks reported in the literature. This section discusses the main categories of attacks and their relationship to robustness evaluation, providing a clearer distinction between robustness and security within the context of audio watermarking.

Reviewer 3 Report

Comments and Suggestions for Authors

This manuscript provides an updated review of conventional audio watermarking techniques published from 2016, with a particular focus on proposing a novel process-oriented classification that distinguishes between preprocessing, embedding and recovery, watermark processing, adaptive mechanisms, and auxiliary signals. The paper is timely and it succeeds in compiling and describing a significant number of recent works, including some that employ emerging tools such as quantum transforms, evolutionary computation, and machine learning. The language is technically sound, though somewhat dense in places, and the overall structure follows the Applied Sciences format with abstract, introduction, motivation, classification, and conclusions. The references are numerous, properly formatted, and include both foundational and very recent works.

The contribution of the paper lies more in systematizing existing new techniques through a classification framework that is process-based rather than criterion-based. This perspective is useful, and it does add value for readers trying to understand design trends. However, the review remains largely descriptive. It would benefit from more comparative analysis of the surveyed methods, for example through tables that summarize and contrast metrics such as robustness, imperceptibility, bit error rate, and computational complexity.

The abstract is somewhat generic and could be revised to emphasize concrete findings of the survey, such as the high prevalence of blind watermarking approaches or the dominance of wavelet-domain methods. The discussion and conclusions are brief, and the manuscript would be stronger if it expanded on future directions, especially the integration of watermarking with copyright enforcement frameworks and the challenges posed by generative AI. The connection to intellectual property (the most important aspect of the issue in the reviewer's opinion), although mentioned, remains underdeveloped and could be highlighted more explicitly in terms of legal applications.

Overall, the paper is a solid and useful survey that will be of interest to the multimedia security and digital rights management community. It is well suited for publication in Applied Sciences after minor to moderate revision, with the main priorities being clarification and enrichment of the abstract, addition of comparative elements, and expansion of the discussion on applications and future perspectives.

Author Response

This manuscript provides an updated review of conventional audio watermarking techniques published from 2016, with a particular focus on proposing a novel process-oriented classification that distinguishes between preprocessing, embedding and recovery, watermark processing, adaptive mechanisms, and auxiliary signals. The paper is timely and it succeeds in compiling and describing a significant number of recent works, including some that employ emerging tools such as quantum transforms, evolutionary computation, and machine learning. The language is technically sound, though somewhat dense in places, and the overall structure follows the Applied Sciences format with abstract, introduction, motivation, classification, and conclusions. The references are numerous, properly formatted, and include both foundational and very recent works.

 

Q1. The contribution of the paper lies more in systematizing existing new techniques through a classification framework that is process-based rather than criterion-based. This perspective is useful, and it does add value for readers trying to understand design trends. However, the review remains largely descriptive. It would benefit from more comparative analysis of the surveyed methods, for example through tables that summarize and contrast metrics such as robustness, imperceptibility, bit error rate, and computational complexity.

Answer:

We appreciate your insightful recommendation and fully agree that presenting the corresponding performance metrics is essential to better visualize the impact of the surveyed methods. Accordingly, a new section (Section 5) on performance criteria has been added, in which thresholds are also proposed to avoid ambiguous descriptions. In addition to this new section and its discussion, the reported results are presented in Tables 4, 5, 6, 7, and 8.

 

Q2. The abstract is somewhat generic and could be revised to emphasize concrete findings of the survey, such as the high prevalence of blind watermarking approaches or the dominance of wavelet-domain methods. The discussion and conclusions are brief, and the manuscript would be stronger if it expanded on future directions, especially the integration of watermarking with copyright enforcement frameworks and the challenges posed by generative AI. The connection to intellectual property (the most important aspect of the issue in the reviewer's opinion), although mentioned, remains underdeveloped and could be highlighted more explicitly in terms of legal applications.

Answer:

We sincerely thank you for this thoughtful and constructive comment, which has undoubtedly contributed to the improvement of the paper's quality. In response, the abstract and the conclusions have been revised to highlight the main findings identified in the survey, as well as to emphasize possible directions for future research and the emerging threats associated with generative AI. Additionally, the connection between watermarking and intellectual property has been strengthened, providing a clearer perspective on its practical applications.

 

Q3. Overall, the paper is a solid and useful survey that will be of interest to the multimedia security and digital rights management community. It is well suited for publication in Applied Sciences after minor to moderate revision, with the main priorities being clarification and enrichment of the abstract, addition of comparative elements, and expansion of the discussion on applications and future perspectives.

Answer:

We sincerely appreciate your comment. We did our best to improve the paper accordingly.  

Reviewer 4 Report

Comments and Suggestions for Authors

The manuscript gives a comprehensive review of recent developments in audio watermarking. The authors of the manuscript give a process-based classification which is a significant contribution and offers clear value to researchers and practitioners by structuring the field into distinct stages (preprocessing, embedding/recovery, watermark process, adaptive process, auxiliary signals).

 

The paper is well-structured and easy to follow, even for someone not in this field. The authors provide a logical flow from motivation to classification and concluding remarks. It cites a wide range of recent sources, although more recent references could be added as well.

This Reviewer believes that the paper can be improved by the following:

Fig. 2 could be improved for readability and visual clarity. Higher resolution and more consistent formatting would benefit the reader. The authors may consider using some other chart rather than a pie chart.

Adding a short comparative discussion, in a summary table, and highlighting which approaches perform best for specific applications.

The paper is strong and contributes meaningfully to the literature in audio watermarking. With improvements in figures/tables and minor language polishing, it will be suitable for publication.

Author Response

The manuscript gives a comprehensive review of recent developments in audio watermarking. The authors of the manuscript give a process-based classification which is a significant contribution and offers clear value to researchers and practitioners by structuring the field into distinct stages (preprocessing, embedding/recovery, watermark process, adaptive process, auxiliary signals).

 

The paper is well-structured and easy to follow, even for someone not in this field. The authors provide a logical flow from motivation to classification and concluding remarks. It cites a wide range of recent sources, although more recent references could be added as well.

This Reviewer believes that the paper can be improved by the following:

Fig. 2 could be improved for readability and visual clarity. Higher resolution and more consistent formatting would benefit the reader. The authors may consider using some other chart rather than a pie chart.

Thank you for the suggestion. The pie chart in Figure 2 has been replaced with a bar chart to improve clarity and readability.

 

Adding a short comparative discussion, in a summary table, and highlighting which approaches perform best for specific applications.

We appreciate your helpful suggestion. Since each application may have specific requirements, it is difficult to include a single summary table of recommendations. However, following this comment and those from other reviewers, Tables 4, 5, 6, 7, and 8 have been added to present the reported results according to performance criteria under the proposed thresholds.

The paper is strong and contributes meaningfully to the literature in audio watermarking. With improvements in figures/tables and minor language polishing, it will be suitable for publication.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors fixed all remarks.
