1. The Three Sirens: Context and Description
Over the past decade, the field of artificial intelligence (AI) has seen the fastest industrialization in its history, largely due to the emergence of deep learning, a subfield of machine learning that relies on simplified computational models of animal brains. These developments have had such a colossal impact on the creative environment of electronic art and music that it is easy to lose track of the broader conceptual, historical, and practical frameworks in which contemporary AI artistic approaches and related imaginaries are embedded.
As artist and scholar Simon Penny notes, the 1990s were marked by a consolidation of computational art practices that nonetheless remained in a fluid and open-ended phase, following the more loosely exploratory gaseous phase of the 1980s and preceding the current crystallization of new media into stabilized forms through disciplinary fields such as video games, virtual reality, and, more recently, so-called “AI-art” (Penny 2017). The groundbreaking work of Nicolas Baginsky with artificial neural networks exemplifies that era of creative explorations. In 1992, Baginsky launched an artistic project that was visionary due to the originality of both its form and method. In Berlin, he created the robot guitar Aglaopheme (Figure 1a), which became the first performer of a self-learning robotic band developed throughout the 1990s, soon joined by the robot drum Thelxiepeia (Figure 1b), robot bass Peisinoe, and eventually other artificial agents, forming the autonomous robotic band The Three Sirens (Figure 2).
Baginsky created this work during the second AI Winter, at a time when interest in artificial neural networks had faded due to computational and theoretical limitations. In the late 1980s, researchers proposed supervised learning approaches applied to music by training neural networks on music scores (Todd 1989; Fernandez and Vico 2013). These models generated new compositions in the style of the scores on which they were trained. Baginsky’s goal, however, was to create completely autonomous artificial agents that could invent their own music through direct improvisation, without any direct human intervention.
From the outset, Baginsky rejected an anthropomorphic approach to robotic design. The Sirens were not robots that played instruments; rather, each was both musician and instrument, created from scratch by the artist and combining the practices of both the engineer and luthier. In addition, just as Baginsky had no interest in anthropomorphizing his robots, he also refused anthropocentric supervised learning approaches in which the robots would learn from human scores because to him, this tactic was devoid of any creative potential and could only lead to pastiche—in other words, mimicking music that already existed. Baginsky had a more ambitious goal: by creating nonhuman self-learning agents and letting them improvise outside of any human control, he believed it would be possible to create another kind of music.
This motivation to be surprised by the unexpected and to consider instruments as self-learning non-human agents is also reflected in the name of the band: The Three Sirens. There are many similarities between the Greek mythological sirens and the robot-instruments created by Baginsky. The legend behind those dangerous half-human, half-animal creatures able to enchant sailors by their enthralling music (and then devour them) informs the relationship between Baginsky and his creations, as well as his conception of artificial intelligence and machine creativity. Indeed, myths, legends, and fictional and non-fictional references “form the backdrop against which AI systems are being developed, and against which these developments are interpreted and assessed” (Cave et al. 2020, p. 7).
Our review, in thus looking for the unexpected, will hint at a certain nostalgia for the future,
1 which, in the contexts of artificial intelligence and surprising musical effects, will push us to reimagine such concepts as musical creativity and improvisation within algorithmic composition.
2. Approaches to Sounds and Technology in the Three Sirens
One of the particularities of the work of Baginsky lies precisely in the choice of the music-generating algorithm: the idea of opting for an unsupervised learning neural network approach is one of the key elements of The Three Sirens. The neural networks used by Baginsky, called self-organizing maps (SOMs) (Kohonen 2001), are biologically inspired neural network models that represent raw data in a more compact way by extracting regularities in the data while preserving its topological structure. In other words, these models autonomously create a “map” of a data set without direct human intervention. The use of SOMs in The Three Sirens allows the robots to self-regulate in order to produce sound forms that are virtually independent of preexisting structures.
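The general principle of a self-organizing map described above can be illustrated with a minimal sketch. This is a generic, one-dimensional textbook SOM written for illustration only; it is not Baginsky’s implementation, and the function names, parameters (`n_units`, `lr0`, `radius0`), and toy “pitch” data are all hypothetical assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def train_som(samples, n_units=8, epochs=50, lr0=0.5, radius0=3.0, seed=0):
    """Train a 1-D SOM on a list of equal-length vectors (no labels used)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Every unit starts with a random weight vector in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                     # learning rate decays
            radius = max(radius0 * (1.0 - frac), 0.5)   # neighbourhood shrinks
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Units near the winner on the 1-D grid move more strongly,
                # which is what preserves the topology of the data.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


if __name__ == "__main__":
    # Hypothetical toy data: normalised "pitches" forming two clusters.
    data = [[0.08], [0.10], [0.12], [0.88], [0.90], [0.92]]
    som = train_som(data)
    for sample in data:
        print(sample, "-> unit", best_matching_unit(som, sample))
```

No target output is ever supplied: the map is shaped only by the regularities of the input, which is the sense in which the approach is unsupervised.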
By contrast, many attempts to generate music with machine learning instead use supervised learning algorithms. A recent example is the robot-controlled cellos in Empty Vessels (2022) by Montreal Life Support and Woulg.
2 In this piece, in which three real cellos are augmented by robotic playing systems (Figure 3), the authors decided to influence the musical output by training the system on scores of their own compositions. This approach results in musical structures broadly inspired by the tradition in which the authors themselves were trained. The advantage of supervised learning algorithms is that they produce a more controllable output, which is likely to sound more familiar because it reuses musical structures that are already known.
3 One can also mention serious projects from Magenta (Google)
4 or Flow Machines (Sony),
5 both of which aim to reproduce, as faithfully as possible, a particular musical style using existing human-made musical corpora as a learning base. In fact, since such supervised learning techniques were pioneered in 1989 by Peter M. Todd (Todd 1989), there have been several similar projects demonstrating impressive results in score generation based on styles ranging from blues (Eck and Schmidhuber 2002) to Celtic folk (Sturm et al. 2016) to that of a particular composer, such as Johann Sebastian Bach (Hadjeres et al. 2016; Liang et al. 2017). We are now able to build continuous music generators that can play endlessly in a particular style,
6 bringing us closer to the future envisioned by composer and musician Brian Eno, in which recorded music would disappear in favor of generated music.
7
While these projects have strong scientific and commercial relevance, they suffer from the same limitations in terms of musical innovation, which originally led Baginsky to create The Three Sirens in the search for new musical forms. Baginsky’s project is important because it contrasts the use of AI in music for utilitarian purposes with a more open-ended and experimental approach. Since there is no imposed structure, the performances of his robots are not guided (at least directly or consciously) by any repertoire or musical tradition, which grants them the potential to generate truly novel music. Baginsky’s approach to generative music therefore steps away from the world of supervised composition/scores, and ventures towards that of unsupervised improvisation/performance. The Sirens have no example of what a “good” piece of music is. Rather, they listen to their environment and respond to it in real time, influencing each other and evolving as an ensemble through an ongoing feedback loop. This conception of music is emergent, autonomous, and embodied. It is situated in the materiality of their experience of the world.
To the question “If The Three Sirens learn, what are the criteria?”, Baginsky responds “No criteria, no predefined musical knowledge, the acoustic world is the teacher”, adding that “The musical systems that we humans developed in the last couple thousands of years did not come out of thin air” (Baginsky 2022a, para. 6). The artist thus allows us to experience systems in which the machine influences the human, rather than the other way around. He allows us to imagine that robots could teach us new ways of playing the guitar. To do this, one must resist the temptation of control and favor techniques that allow a greater amount of letting go, even if it means the results might be unsettling. This does not imply, however, that one must accept any result, since human sensitivity remains, and one will always have a certain number of expectations; but it commends a more balanced relationship between humans and machines in AI-based artistic practices, one that values the uncanny forms of organization and structure that can emerge from algorithms.
Another important element is the fact that the band’s “compositions” are based on live performances by robots. These performances are not attempts to imitate a human band but occasions for the robots to play in their own way. For instance, the robots possess abilities that surpass those of humans: they never get tired and can play with extreme speed and precision for hours without developing carpal tunnel syndrome. On their own, these properties affect the sound output throughout the performances. On the other hand, the robots’ sensorimotor capacities are limited compared with those of human performers. Therefore, the sounds they can make are not as elegant or sophisticated as those a dexterous human musician could perform using the dozens of muscles and myriad nerve receptors in their body. Instead, the robo-musicians have their own abilities, which give rise to new sound forms. Some of these properties can be exploited: one could imagine, for example, the percussionist robot playing at a rhythm that is impossible for a human being to keep.
Once we understand how the Sirens use unsupervised learning to go beyond usual compositional frameworks and generate their own musical style, we might expect the band to create uncanny-sounding music beyond any known musical genre. Yet Baginsky notes that the robots spontaneously adopt the musical structures found in blues music: “When listening to the robot’s play, it soon becomes apparent that the machine has a strong preference for the blues (in the Hendrix’s sense)” (Baginsky 2022b, para. 6). This is partly because of how Aglaopheme’s guitar strings are tuned. Indeed, Baginsky decided to tune the strings according to the notes of the D#maj7 chord, used in blues harmonies, which marked an important direction in the project.
While this constraint cannot be ignored, it still leaves room for a large number of possible interpretations and improvisations. This also leads to another question about intrinsic harmonic structures in music: if an algorithm is allowed to create music by itself, with as few constraints as possible, will it necessarily recreate forms that have already been created by humans? Do harmonic structures impose certain musical forms by themselves? This last question has long been debated, but we have known since Pythagoras that the intervals are based on mathematical relationships, and it is therefore not surprising that an algorithm that seeks to adapt itself according to correspondences between the frequencies of harmonics tends to reproduce these patterns and relationships. If robots listen to the harmonic frequencies produced by different instruments and try to play music according to the frequencies they hear, such as in The Three Sirens, then one could expect that the music they produce would respect not cultural or inherited patterns, but the very structure of the sound wave and its physical properties. As further explained by Baginsky himself:
By looking at the first few harmonics in the overtone spectrum of a vibrating string one finds that the first and third harmonic (unequal the fundamental) are the fifth (tone 5 above fundamental) and that the seventh harmonic (unequal the fundamental) is the fourth (tone 4 above fundamental). Both intervals are essential for the blues scheme. Another interesting candidate is the fourth harmonic (unequal the fundamental). This tone turns an ordinary chord into a major7 chord when added. This type of chord is heavily used in blues music.
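The relationship between overtones and musical intervals discussed above can be checked with a short calculation. The sketch below is standard acoustics arithmetic rather than Baginsky’s own analysis. It assumes an ideal string whose n-th harmonic has n times the fundamental frequency, counts the fundamental as harmonic 1 (a convention that may differ from the counting used in the quoted passage), and rounds each harmonic to the nearest interval of twelve-tone equal temperament, which is itself a simplification. The function and constant names are hypothetical.

```python
import math

# Interval names for 0..11 semitones above the fundamental (12-tone equal temperament).
INTERVAL_NAMES = [
    "unison", "minor second", "major second", "minor third",
    "major third", "perfect fourth", "tritone", "perfect fifth",
    "minor sixth", "major sixth", "minor seventh", "major seventh",
]


def harmonic_interval(n):
    """Describe harmonic n of an ideal string, folded into a single octave.

    Returns (cents above the fundamental, nearest equal-tempered interval
    name, deviation of the harmonic from that interval in cents).
    """
    cents = (1200.0 * math.log2(n)) % 1200.0   # fold octaves away
    nearest = round(cents / 100.0)             # nearest semitone step
    deviation = cents - 100.0 * nearest
    return cents, INTERVAL_NAMES[nearest % 12], deviation


if __name__ == "__main__":
    for n in range(1, 13):
        cents, name, deviation = harmonic_interval(n)
        print(f"harmonic {n:2d}: {cents:7.1f} cents, {name} ({deviation:+.1f})")
```

Under these assumptions, harmonics 3 and 6 fall almost exactly on the perfect fifth, while harmonic 7 lies roughly a third of a semitone below the equal-tempered minor seventh, the kind of “in-between” pitch often associated with blues intonation.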
3. Mythological Archetypes Represented by the Three Sirens: Enchantment, Inspiration, Creativity, and Improvisation
Behind Baginsky’s robo-rock band and its technological design lies a subtle dialogue between science and technology, the imaginary, fiction, art, and music. AI systems are nourished by archetypal, mythological, story-telling tendencies (in other words, the imagination) of their human creators. We understand the imaginary as associated with the “unreal”
8 (
Bottici 2014;
Castoriadis 1998) as well as fictional depictions of ourselves and our societies in myths, legends, fairy tales, and other stories. Imagination, which is a process by which one borrows sense from other objects (
Sartre et al. 2004), plays a key role in designing AI (
Jasanoff and Kim 2015;
Cave et al. 2020) and understanding machine creativity. While speculating about the future, creative AI projects such as The Three Sirens make us reflect on who we are and who we might become, as well as how we relate to intelligent machines. Our relationships as artists or participants in artistic performances are embedded in our cultural memory. In other words, sociotechnical imaginaries are used in building our perceptions and multiple understandings of AI, and they “encode not only visions of what is attainable through science and technology but also of how life ought, or ought not, to be lived; in this respect, they express a society’s shared understanding of good and evil” (Jasanoff and Kim 2015, p. 4). Artificial intelligence, therefore, reflects the existing questions in older stories, legends, and fairytales: our imagination influences what we think intelligent machines should and should not do, what they may or may not become, and ultimately determines our perception of how creative those machines will be.
To understand the imaginaries that sustain The Three Sirens, it is interesting to explore the meaning and the story behind the name of the robo-rock band. Based on the Greek myth of the three sirens, the robots invoke concepts such as danger, distraction, femininity, animality, and creativity.
The mythological sirens (Peisinoe, who played the cithara; Aglaope, who sang; and Thelxiepeia, who played the flute) were said to be the half-women, half-bird daughters of the river god Achelous and the muse Melpomene. They attempted to lure sailors to their island with their beautiful music, and depending on the version of the myth being used as a source, they either devoured the sailors or, with their enchantments, prevented them from leaving. Homer adds the story of the hero Ulysses, who, as he was sailing by their island, plugged the ears of his comrades with wax to protect their minds from the sirens’ music but had himself tied to the mast with his ears unplugged (Figure 4). Fully exposed to their performance, he experienced an irresistible temptation to swim towards them; but inasmuch as his restraints held him firmly to his ship, the sirens, out of desperation, committed suicide by falling into the sea and drowning.
Baginsky’s project is thus a challenge to our own era with respect to the lure of new AI technologies. We can plug our ears and reject the notion that the sirens’ sonic productions belong to the domain of music. Alternatively, we can listen to their voices and appreciate them, while binding ourselves to our human-centric conceptions of art and music, unable to escape their limitations. If, however, we boldly embrace their outlandish sounds as genuine works of music, we do so at the risk of losing humanity’s hold on the territory of art, irreversibly plunging into the abyss of more-than-human creation.
The choice of this metaphor (three dangerous half-human, half-animal sirens) is pertinent because, as already noted, the sirens’ mother was said to be one of the nine Greek muses. Traditionally conceived of as the source of artistic inspiration, which they whispered into the ears of musicians, sculptors, and playwrights, the muses were the beautiful daughters of Zeus and Mnemosyne (“Memory”) and were frequently represented as playing harps and dancing on Mount Olympus. The sirens, on the other hand, are creatures with animal instincts; guided by their hunger, they devour men rather than inspiring them. As such, when Baginsky states that musical instruments are for him essentially feminine, he is invoking an understanding of this term as including animality and the unreal, dangerous, deceptive, and unexpected.
The sirens’ enchantment radically differs from the muses’ inspiration. “Inspiration” (from the Latin
inspirare) implies the divine—the “immediate influence of God or a god”.
9 It also hints at guiding in the sense of controlling. In the case of the sirens, the word “enchantment” is inseparable from “the act of magic or witchcraft” and is related to the Latin incantare, to “cast a magic spell upon”. It is also etymologically connected with the Proto-Indo-European root kan-, which in modern European languages became “sing”.
10 The sirens, therefore, sing to enchant and to distract. Their music forces sailors to lose control. As anthropologist Stefan Helmreich underlines in his groundbreaking study of artificial life (ALife) movements:
Since the creators of automata have almost always been fully grown white men existing in a world in which women, children, dark-skinned people, people from the “East”, and animals were marked as primitive, such automata have been cast as facsimiles of white children or women playing music, “negro minstrels” strumming guitars, “Turkish” people playing chess, and ducks drinking water.
As such, robots often remain supervised and managed by their creators. This is the case, for example, for Montreal Life Support and Woulg’s cello trio and other score-based supervised learning neural network systems presented earlier, all of which rely on training machine learning systems on preexisting human-made scores. Baginsky’s Sirens use unsupervised learning neural nets and are not built to be controlled, to be taught musical phrases, to imitate existing composers or genres, or even to excel in virtuosity by “perform[ing] musical pieces known to be extremely difficult at the limit of human capacities” (Pachet 2012, p. 117). Rather, the artist completely embraces the element of surprise. Baginsky, like Ulysses, is enthralled with the idea of being enchanted and pulled towards the abyss. This willingness to be subsumed is reflected in how he describes his relationship with the robotic instruments: “I’m their technical assistant, not their creator”. He always treats them as mythical creatures, free to do as they please.
Similarly, one cannot ignore his fatherly approach when he describes, for example, the times when the machines have been “stubborn” and have wanted to express their own musical creativity. Yet, he also allows them to explore the world of music, like a good parent would do with their children, allowing them to experiment and explore. In other words, their machine learning experience “compares with the process of aging” (
Audry 2021, p. 272) and relates to our perceptions around temporality: the future, the present, and the past.
4. Conclusions
The dream of an autonomous creative agent may not yet have been achieved,
11 but Baginsky’s The Three Sirens provides a rare example of how it can be seriously pursued. Many boundaries are no longer as clear-cut as they once were, and this type of experience forces us to rethink the role of the artist in creation.
The tensions between human authorship and machine autonomy are articulated in the imaginaries and nostalgia surrounding the work itself. Indeed, Baginsky’s The Three Sirens invokes the story of the mythological sirens, which provides an appealing metaphor that broadly applies to machine-learning art and artificial creativity. The Sirens’ enthralling chants force us into a liminal state of being between curiosity and fear, in which we are faced with a choice: either plug our ears and refuse the possibility that the robots are genuinely creative; accept their chants as real music at the risk of losing human exceptionalism as the only true creative species; or try listening to the music while binding ourselves to our human-centric worldview.
This convoluted relationship with the nonhuman is at the core of Baginsky’s motivation in creating The Three Sirens. Lurking at the frontier of human and machine, of musician and instrument, and of artist and artwork, the Sirens make us consider such notions as musical growth, creativity, virtuosity, and improvisation in relation to both human and robotic musicians. Over the years, the band regularly rehearsed in preparation for their performances, jamming to develop their own style. Even when Baginsky occasionally had them listen to music by human musicians such as Jimi Hendrix and Buddy Holly, the relationship between the robots and these artists was completely different than if they had been trained directly on their scores using supervised learning. Indeed, the robots were not trained to imitate the style of these musicians but rather to listen and respond to them in real time—in short, to be inspired by them.
These reflections hint at a nostalgia for the future of music, in which our speculations are tinged with nostalgic feelings about sounds and musicians we already know. We may then ask ourselves what new musical forms could emerge beyond the human realm, moving from “music-as-it-is” to “music-as-it-could-be”.