Perspectives on Generative Sound Design: A Generative Soundscapes Showcase
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
I have doubts regarding the part which discusses historical context of generative music, chapter 1.2.1. This section cites correct examples of early generative music; however, it seems to be irrelevant here as the article is supposed to discuss generative sound instead. It seems that the author(s) identifies(-y) generating sound and generating music as the same field, while in my opinion these are completely separate, nearly unrelated fields. Perhaps the author(s) misinterpret(s) sound design as creation of musical structure. For instance: the example of J. S. Bach's works with basso continuo applies to crafting a musical piece "from notes" – I would say – regardless of instruments used, even if the article discusses crafting sound (or soundscapes). Of course, free interpretation of basso continuo, even while keeping functional harmony unchanged, leads to many variations in chords arrangement, which in consequence may cause certain qualities of timbre to vary, but – to my mind – the effect is not pronounced enough to qualify as a part of sound crafting field.
Although sound design and music composition are fields that overlap each other to some extent, there is explicit boundary in the sense that in most cases musical piece is rather about building a structure regardless of the material used, while designing or generating sound is about projecting the material itself from certain components, like dynamic or spectral envelope.
I have a similar reservation to the section 1.2.2 – the example of Xenakis who "shaped musical structures" while the section's title is "Analog[ue] electronics for generative sound". Again, the concepts of musical structure and sound itself are mixed.
Referring to my aforementioned thoughts, I would suggest an alteration. I think the author(s) should change the title of the section 1.2.1 from "The beginnings of generative music" to "The beginnings of generative sound" and then replace examples of musical composition with examples of so-called early sound design such as inventions of musical instruments, their evolution from the perspective of achieving desired sound qualities (my proposition of a decent example would be: development of pipe organ stops which are named after "original" instruments like flute, trumpet, oboe, vox humana and many others – it would be a great demonstration how inventors strove to achieve specific properties of the sound by experimenting with physical properties of pipes and materials they are made of).
The section 1.2.2 discusses the beginnings of generative sound in the context of analogue electronics. The author(s) uses the term "analog", which is common but not strictly correct. The correct form is "analogue" as opposed to "digital" or "discrete". This minor error should be corrected before publication of the paper.
I think – even if it's not very important for the discussed topic – many ingenious electronics designers should be mentioned among others in the section 1.2.2. The art of analogue circuits in sound design and engineering (filters and synthesizers in particular) is a very essential base for further developments in history. Moreover, the invention of operational amplifier is worth mentioning, as it is a type of circuit that unleashed rapid progress in electronics and thus revolutionized the fields of sound and music.
These are the only substantial reservations I have about the paper. Although I would not consider them as serious mistakes – in my opinion they are just inaccuracies that can be easily corrected – as a reviewer I would request for alteration of the chapters 1.2.1 and 1.2.2 in the manner I described.
Author Response
Comments 1:
„I have doubts regarding the part which discusses historical context of generative music, chapter 1.2.1. This section cites correct examples of early generative music; however, it seems to be irrelevant here as the article is supposed to discuss generative sound instead. It seems that the author(s) identifies(-y) generating sound and generating music as the same field, while in my opinion these are completely separate, nearly unrelated fields. Perhaps the author(s) misinterpret(s) sound design as creation of musical structure. For instance: the example of J. S. Bach's works with basso continuo applies to crafting a musical piece "from notes" – I would say – regardless of instruments used, even if the article discusses crafting sound (or soundscapes). Of course, free interpretation of basso continuo, even while keeping functional harmony unchanged, leads to many variations in chords arrangement, which in consequence may cause certain qualities of timbre to vary, but – to my mind – the effect is not pronounced enough to qualify as a part of sound crafting field.
Although sound design and music composition are fields that overlap each other to some extent, there is explicit boundary in the sense that in most cases musical piece is rather about building a structure regardless of the material used, while designing or generating sound is about projecting the material itself from certain components, like dynamic or spectral envelope.
I have a similar reservation to the section 1.2.2 – the example of Xenakis who "shaped musical structures" while the section's title is "Analog[ue] electronics for generative sound". Again, the concepts of musical structure and sound itself are mixed.
Referring to my aforementioned thoughts, I would suggest an alteration. I think the author(s) should change the title of the section 1.2.1 from "The beginnings of generative music" to "The beginnings of generative sound" and then replace examples of musical composition with examples of so-called early sound design such as inventions of musical instruments, their evolution from the perspective of achieving desired sound qualities (my proposition of a decent example would be: development of pipe organ stops which are named after "original" instruments like flute, trumpet, oboe, vox humana and many others – it would be a great demonstration how inventors strove to achieve specific properties of the sound by experimenting with physical properties of pipes and materials they are made of).”
Response 1:
The author acknowledges that the analogy between generative music and generative sound may initially appear imprecise and perhaps overstated. The primary focus of this article, however, lies in the procedural construction of sonic messages within defined frameworks and rule-based systems. The concept of media generation is used here as a broader umbrella that encompasses various forms of generative content, including both music and sound. In the context of AI-generated audio, the author refers to this analogy in line with interpretations proposed by authors such as Marcus du Sautoy in The Creativity Code: How AI is Learning to Write, Paint and Think (2019), where the composer is described as someone who defines the generative framework, while the final sonic or musical output is determined by chance, variation, or stochastic processes.
The author maintains that the examples included in section 1.2.1 serve to illustrate the procedural logic underpinning generative content, which conceptually aligns with the core theme of this study. However, the author also acknowledges the reviewer's important distinction between music composition and sound design, particularly regarding the autonomy of sound design and soundscape practices as distinct from traditional musical structures.
In response to this concern, the author has decided to retain the subsection on generative music as a reference point for procedural thinking, while simultaneously expanding the historical narrative with a new subsection titled “1.2.2. Stochastic soundscapes origins.” This added section presents early examples of sound design rooted in the use of natural and unpredictable elements such as wind and water — for example, aeolian harps and sea organs — which serve as non-musical but generative systems of sound production. These examples highlight how human perception and environmental variability have historically shaped non-notated sonic structures, thereby clarifying the distinction between music and sound design while enriching the article’s conceptual framework.
Text excerpt after revision (Section 1.2.1 remains unchanged. Section 1.2.2 has been added):
1.2. Historical context
1.2.1. The beginnings of generative music
In the history of music, composition has involved meticulous preplanning and crafting of fixed sequences of notes, with classical composers like Bach and Beethoven establishing structured frameworks that combined careful planning with space for interpretation (Taruskin and Gibbs 2013). Bach’s basso continuo notation, for instance, provided a harmonic foundation while allowing performers to interpret and elaborate on the given material, leading to subtle variations between performances. Similarly, dynamics, phrasing, and ornamentation were shaped by a performer’s individual style, introducing an element of “human randomness” into otherwise structured compositions.
An early example of deliberately incorporating stochastic processes into music can be found in Mozart's Musikalisches Würfelspiel (Musical Dice Game). This compositional tool used dice rolls to assemble measures of music from predefined options, creating a unique piece with each play. While Mozart provided the building blocks, the element of chance determined the final outcome, blending deterministic composition with probabilistic execution, an approach that continues to inspire modern composers (Wayne 2022).
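The dice-game procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-dice roll, the 16-bar length, and the placeholder measure labels are hypothetical choices made for the example and are not taken from the manuscript.

```python
import random

NUM_BARS = 16  # assumed piece length, chosen for illustration


def roll_two_dice(rng):
    """Return the total of two six-sided dice (2..12)."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def assemble_piece(measure_table, rng):
    """Pick one precomposed measure per bar according to a dice roll.

    measure_table[i] maps each possible dice total to the measure
    that may be played in bar i.
    """
    return [options[roll_two_dice(rng)] for options in measure_table]


# Hypothetical lookup table: the labels stand in for precomposed measures.
table = [
    {total: f"bar{bar + 1}-opt{total}" for total in range(2, 13)}
    for bar in range(NUM_BARS)
]

# The composer fixes the table; chance decides which measures are heard.
piece = assemble_piece(table, random.Random(2024))
print(piece)
```

The table plays the role of the deterministic building blocks, while the random number generator supplies the probabilistic execution; passing a different seed yields a different piece from the same material.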
1.2.2. Stochastic soundscapes origins
Outside of the musical tradition, generative soundscapes also have a rich historical precedent. Listening to natural phenomena such as wind, waves, or rain inspired a form of sonic awareness rooted in change and irregularity. In many cultures, wind chimes and water-driven instruments were created to channel these forces into sound-producing structures. These devices may exemplify an early form of procedural sound design: their creators set physical parameters, but the output was determined by environmental variability, such as wind speed or water flow (Toop 1995; Schafer 1993).
One notable example is the aeolian harp, a stringed instrument played by the wind, dating back to ancient Greece and later popularized in the Romantic era (Blesser and Salter 2007). Its ethereal tones, shaped by airflow rather than human agency, represent an early fusion of natural indeterminacy with intentional sonic construction. Similarly, sea organs such as the modern Sea Organ in Zadar, Croatia, transform the irregular force of ocean waves into evolving harmonic textures by channeling air through tuned pipes (Stamać 2005).
Comments 2:
„The section 1.2.2 discusses the beginnings of generative sound in the context of analogue electronics. The author(s) uses the term "analog", which is common but not strictly correct. The correct form is "analogue" as opposed to "digital" or "discrete". This minor error should be corrected before publication of the paper.”
Response 2:
The author thanks the reviewer for identifying this linguistic inconsistency. The observation is valid and the term “analog” has been corrected to the British English spelling “analogue” throughout the manuscript. In the interest of maintaining stylistic consistency across the article, additional corrections were also made: “behavior” was changed to “behaviour” in section 2.2 (Generative soundscapes), and “centered” was corrected to “centred” in section 3.1 (Main assumptions).
Comments 3:
„I think – even if it's not very important for the discussed topic – many ingenious electronics designers should be mentioned among others in the section 1.2.2. The art of analogue circuits in sound design and engineering (filters and synthesizers in particular) is a very essential base for further developments in history. Moreover, the invention of operational amplifier is worth mentioning, as it is a type of circuit that unleashed rapid progress in electronics and thus revolutionized the fields of sound and music.”
Response 3:
The author appreciates the reviewer’s thoughtful suggestion and agrees that acknowledging the contributions of electronics designers provides important historical and technological context. In response, section 1.2.3 (Analogue electronics for generative sound) has been expanded to include a discussion of the foundational role of analogue circuit design in the evolution of sound synthesis. Specifically, the revised section highlights the contributions of pioneers such as Robert Moog and Don Buchla, whose work on modular synthesizers introduced artists to a new range of voltage-controlled tools, including oscillators, filters, and envelope generators.
Text excerpt after revision (third paragraph has been added):
1.2.3 Analogue electronics for generative sound
The advent of analogue technology in the 20th century expanded the boundaries of sound creation. Composers such as Edgard Varèse and Karlheinz Stockhausen began experimenting with electronic sound generators, tape manipulation, and feedback systems (Griffiths 2010). These innovations introduced the concept of semi-automated sound creation, where the composer defined a system or process but left room for variability within its execution.
This era marked a philosophical shift, as composers like John Cage embraced indeterminacy, surrendering control over certain musical parameters to let systems or performers introduce randomness. Iannis Xenakis expanded this approach by applying mathematical models, such as probability theory and Markov chains, to shape musical structures (Xenakis and Kanach 1971).
However, this creative explosion would not have been possible without the parallel innovations of electronics engineers. The development of modular synthesizers, pioneered by figures such as Robert Moog and Don Buchla, brought voltage-controlled oscillators, filters, envelope generators, and operational amplifiers into the hands of artists, giving them even more room for creative exploration, parametrization, and randomization (Pinch and Trocco 2009).
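As a rough illustration of how a Markov chain can shape a sequence of sonic events, as mentioned in this section, consider the minimal Python sketch below. It does not reproduce any specific historical method; the state names and transition probabilities are hypothetical values chosen for the example.

```python
import random


def generate_sequence(transitions, start, length, rng):
    """Walk a first-order Markov chain for `length` steps (length >= 1).

    transitions[state] maps each possible next state to its probability.
    """
    state = start
    sequence = [state]
    for _ in range(length - 1):
        next_states = list(transitions[state].keys())
        weights = list(transitions[state].values())
        # The next state depends only on the current one.
        state = rng.choices(next_states, weights=weights, k=1)[0]
        sequence.append(state)
    return sequence


# Hypothetical texture states with assumed transition probabilities.
transitions = {
    "sparse": {"sparse": 0.6, "dense": 0.3, "silent": 0.1},
    "dense": {"sparse": 0.5, "dense": 0.4, "silent": 0.1},
    "silent": {"sparse": 0.7, "dense": 0.3},
}

sequence = generate_sequence(transitions, "sparse", 12, random.Random(0))
print(sequence)
```

The person designing the system sets the transition table, and the resulting order of events varies from run to run, which matches the idea of defining a process while leaving room for variability within its execution.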
Comments 4:
These are the only substantial reservations I have about the paper. Although I would not consider them as serious mistakes – in my opinion they are just inaccuracies that can be easily corrected – as a reviewer I would request for alteration of the chapters 1.2.1 and 1.2.2 in the manner I described.
Response 4:
The author considers the reviewer’s observations to be valuable and constructive, and has taken them into account in the final version of the manuscript. The revisions to sections 1.2.1 and 1.2.2 have been implemented as suggested, in order to improve conceptual clarity and better reflect the distinction between generative music and generative sound. Text excerpts have been provided in earlier responses.
The author would like to express sincere gratitude to the reviewer for their thoughtful, constructive, and detailed feedback. The comments provided have been instrumental in refining both the conceptual and technical aspects of the manuscript. The author greatly appreciates the reviewer's engagement with the topic and the valuable suggestions that have contributed to strengthening the overall quality and clarity of the work.
Reviewer 2 Report
Comments and Suggestions for Authors
Figure 2. The picture is not legible enough. The characters are very small, it can cause discomfort in receiving the content. I suggest to submit a graphic with higher resolution and quality.
Author Response
Comments 1:
„Figure 2. The picture is not legible enough. The characters are very small, it can cause discomfort in receiving the content. I suggest to submit a graphic with higher resolution and quality.”
Response 1:
The author presented a screenshot from the widely used Digital Audio Workstation (DAW) Reaper to illustrate a set of waveform layers. However, the author agrees that certain interface elements may not have been sufficiently legible and thanks the reviewer for pointing out this issue.
In response, the original screenshot-based figure has been replaced with a newly created diagram. This updated version presents a clearer and more legible waveform visualization of the generative soundscape pack. Each layer corresponds to a distinct sound concept within a forest environment, and all sound files have been normalized for visual consistency. The new figure ensures improved resolution, readability, and overall clarity for the reader.
The author made minor text changes to section 5.1 to align it with the revised version of Figure 2. The updated description reflects the new waveform visualization with corresponding nametags and normalization for graphical clarity.
Text excerpt after revision:
5 Preliminary testing and empirical research
Initial testing and empirical research have yielded preliminary conclusions and observations regarding the proposed solution.
5.1 Highlights
The proposed pipeline effectively facilitates the generation of multi-layered soundscapes, combining atmospheric backgrounds with distinct point sounds. This approach has demonstrated substantial potential in creating complex auditory representations of specific locations or conceptual ideas, offering valuable tools for sound designers.
An example of the generated output is shown in Fig. 2, which presents a sample generative soundscape pack featuring forest sounds visualized as waveforms with corresponding nametags. The figure illustrates the arrangement of atmospheric and point sound layers in one 10-second-long soundscape.
Figure 2. Waveform visualization of a generative forest soundscape pack. Each waveform represents a distinct 10-second sound layer corresponding to a specific conceptual element within the soundscape. All sound files were normalized for the purpose of graphical representation (source: own research).
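The caption does not state which normalization was applied. One common option for making layers visually comparable is peak normalization, which scales each waveform so that its largest absolute sample reaches a target value. The sketch below assumes that method; the function name, the layer names, and the sample values are hypothetical.

```python
def peak_normalize(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak.

    A silent layer (all zeros, or empty) is returned unchanged to
    avoid division by zero.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


# Hypothetical layers: a quiet atmosphere and a louder point sound.
layers = {
    "wind": [0.02, -0.05, 0.04],
    "birdsong": [0.3, -0.6, 0.45],
}

normalized = {name: peak_normalize(wave) for name, wave in layers.items()}
for name, wave in normalized.items():
    print(name, wave)
```

After this step every layer spans the same vertical range in a plot, so quiet atmospheric layers remain as readable as louder point sounds. Note that this changes only the display scale and says nothing about the relative loudness used in the final mix.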
Reviewer 3 Report
Comments and Suggestions for Authors
File: review.docx
Author Response
Comments 1:
„Weaknesses and recommendations:
- Ethical reflection could be deepened (chapter 2.5 merely hints at the topic),
- Consider adding a separate subchapter describing specific use cases (e.g. VR project, game, installation) – at the author's discretion.”
Response 1:
The author agrees with the reviewer’s observation regarding the limited scope of ethical discussion in the original manuscript. Ethical concerns surrounding generative technologies indeed represent a broad and ongoing field of debate. While ethics is not the primary focus of this paper, the author acknowledges its relevance and has therefore expanded section 2.5 (Ethical discourse) to include a more in-depth reflection on issues such as authorship, transparency, data bias, and the risks associated with deepfake audio. This addition aims to provide a more balanced perspective while maintaining the article’s core emphasis on technical and creative applications of generative sound systems.
Regarding the second suggestion to include specific use cases (e.g., VR project, game, installation), the author appreciates the recommendation and considers it valuable. However, due to the scope of the current article, the author has decided to reserve a detailed use-case analysis for future work.
Text excerpt after revision (three additional paragraphs have been added):
2.5 Ethical discourse
An important consideration in the application of artificial neural network technologies is the ethical implications surrounding their use (Barnett 2023). The fair use of audio samples and the responsibility of model creators to source training data ethically and legally are significant issues in the discourse on the development and application of such tools. However, this article focuses on exploring the potential applications of these technologies rather than establishing ethical or legal norms.
In the domain of generative sound, further ethical challenges emerge. For instance, when AI models replicate stylistic elements or voices of real composers, musicians, or sound designers, the boundaries of artistic authorship and identity become blurred. Similarly, the use of AI-generated audio in contexts such as media, advertising, or video games may obscure the origin and authorship of content, leading to reduced accountability, as seen in other media domains (Ray 2023).
Another concern is bias in training data, which may result in unrepresentative outputs or the perpetuation of cultural stereotypes. Deepfake audio technology further complicates ethical considerations, as it enables the synthesis of realistic yet deceptive content, potentially leading to misinformation or the erosion of trust in recorded media.
While this article focuses on the technical and creative potentials of generative transformers for sound design, further research is needed to establish ethical guidelines and verification frameworks for sound generation systems that ensure fairness, consent, and accountability throughout their lifecycle.
Comments 2:
"Overall assessment:
The article is at a very high substantive and technological level. I strongly recommend its publication, after taking into account the minor additions indicated above.
I recommend the article for publication in the Arts magazine – this article will certainly significantly enrich the volume being published. As I wrote in the comments earlier, I ask for one annotation to be supplemented – ethical issues, while the second one I leave to the author(s) to assess whether he/she wants to supplement."
Response 2:
The author sincerely thanks the reviewer for the positive and insightful evaluation of the manuscript. The recognition of the work’s relevance, clarity, and originality is greatly appreciated. Such encouraging feedback affirms the chosen research direction and motivates further development of the topic in future studies. The author is particularly grateful for the appreciation of the article’s contribution to both theoretical reflection and practical application in the field of generative sound design.