Using Formal Grammars as Musical Genome

In this paper, we explore a generative music method that can compose atonal and tonal music in different styles. One of the main differences between regular engineering problems and artistic expression is that goals and constraints are usually ill-defined in the latter case; in fact, the rules here could, or should, be transgressed more often. For this reason, our approach does not use a pre-existing dataset to imitate or extract rules from. Instead, it uses formal grammars as a representation method that retains only the basic features common to any form of music (e.g., the appearance of rhythmic patterns, or the evolution of tone or dynamics during the composition). Exploring different musical spaces is the responsibility of a program interface that translates musical specifications into the fitness function of a genetic algorithm. This function guides the evolution of those basic features, enabling the emergence of novel content. In this study, we assess the outcome of a particular musical specification (guitar ballad) in a controlled real-world setup. As a result, the generated music can be considered similar to human-composed music from a perceptual perspective. This supports our approach to tackling the arts algorithmically, as it is able to produce novel content that complies with human expectations.


Introduction
Creativity is a complex concept that was forged in the modern era, when it began to be perceived as a capability of great men, as opposed to mere productivity [1]. There are a variety of definitions, but here we understand it as the ability to generate novel and valuable ideas [2]. Music composition is considered an expression of human creativity, even if composers (like artists in general) take inspiration from other sources, such as natural sounds and, above all, other authors. Similarly, algorithmic music composition usually follows an imitative approach, feeding a computer system with a large corpus of existing (human-made) scores. Here, however, we investigate music composition from a different perspective: as an iterative process to discover aesthetically pleasing musical patterns.

Automated Composition
The digital era brought yet another step in the evolution of musical genres [3] and the development of algorithmic composition [4]. Some early works were: an unpublished work by Caplin and Prinz in 1955 [5], with an implementation of Mozart's dice game and a generator of melodic lines using stochastic transitional probabilities; Hiller and Isaacson's Illiac Suite [6], based on Markov chains and rule systems; and MUSICOMP by Baker [7], implementing some of Hiller's methods as well as Xenakis's stochastic algorithms in the early 1960s, working as CAAC [7]. In 1961, a dedicated computer was able to compose new melodies related to previous ones by using Markov processes [8]. Appl. Sci. 2021, 11, 4151 3 of 25 of natural language discourse, while keeping the blind indistinguishability of the TT. As such, they must be considered surveys of musical judgments, not a measure of thought or intelligence.
Another important aspect of generative music systems is their lack of experimental methodology [31] and the fact that there is typically no evaluation of the output by real experts. Delgado [32] considered a compositional system based on expert systems, with information about emotions as input, and described an interesting study evaluating the extent to which listeners identify the original emotions; however, the test was completed by only 20 participants. Monteith [33] presented another system and tested it in an experiment where participants had to identify the emotions evoked by each sample, as well as assess how real and unique they sounded as musical pieces. There were two major limitations in this case: the sample contained only 13 individuals, and they knew in advance that some pieces were computer-generated. A more recent example was described by Roig [34], with a questionnaire including three sets of compositions, each containing two compositions made by humans and a third generated by their machine learning system, which the participants had to identify. That experiment had a bigger sample (88 subjects), but it suffered from a lack of control, since the test was performed online, and from oversimplification, because participants only had to say which of the three samples was computer-generated.
Affective algorithmic composition [35] has promoted new strategies for evaluating the output of automatic composition systems against human-produced music. Here, emotions are understood as relatively short and intense responses to potentially important events that change quickly in the external and internal environment (generally of a social nature). This implies a series of effects (cognitive changes, subjective feelings, expressive behavior or action tendencies) occurring in a more or less synchronized way [36]. This theory of musical emotion [37] asserts that music can elicit emotions through two different involuntary mechanisms of human perception and interpretation: (a) the (conscious or unconscious) assessment of musical elements; and (b) the activation of mental representations related to memories with a significant emotional component [38,39]. Even though self-report has been associated with some obstacles, such as its lexical demands or the presence of biases, it is considered one of the most effective ways of identifying and describing the elicited emotions [36]. This work has opted for an open-answer modality of this method, since establishing explicit emotional dimensions might itself condition their appearance through human empathy [36].
In Section 3.2 we describe an MDtT designed to assess the compositional and synthesis capabilities of the developed system. The experiment involved more than 200 participants, both professional musicians and non-musicians, who listened to and assessed two different pieces of music: one from our generator of tonal music and another composed by a musician. Compared to previous works, the main contributions here are the rigor of the study and the significance of the results.
The outcome of this endeavor is Melomics, a tool that is able to produce objectively novel and valuable musical scores without using any knowledge of pre-existing music, instead modeling a creative process.

Melodies and Genomics
In order to produce full music compositions through evolutionary processes, we need to encode the structure of the music: not only the individual pitches and other musical variables, but also the relations among them. That is, there must be a higher-level structure, more compact than the entire composition. In a sense, it is a compressed version of the composition, which implies the presence of some kind of repetition and structured behavior, since truly random data is not compressible.
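As a rough illustration of this compressibility argument, the Python sketch below compares how well zlib compresses a string made of a repeated motif with how well it compresses random bytes of the same length. General-purpose compression is only a stand-in for a compact generative encoding here (it is not how Melomics encodes music), and the motif text and sizes are arbitrary assumptions.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more structure)."""
    return len(zlib.compress(data, 9)) / len(data)

# Hypothetical "score": the same four-note motif repeated 250 times.
structured = ("C4 D4 E4 C4 " * 250).encode("ascii")

# Random bytes of the same length, from a seeded generator for reproducibility.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(structured)))

print(f"repeated motif: {compression_ratio(structured):.3f}")
print(f"random bytes:   {compression_ratio(noise):.3f}")
```

The repeated motif shrinks to a small fraction of its size, while the random bytes do not shrink at all (zlib even adds a few bytes of overhead).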
While computers are capable of sound synthesis and procedural generation of sound, we are interested in producing a music score in traditional staff notation. Some systems aim to reproduce a particular style of a specific artist, period, or genre by using a corpus of compositions from which recurring structures can be extracted. We pursue music composition from scratch, modeling a creative process able to generate music without imitating any given corpus. The ability to create music in a specific style is enforced by a combination of two important constraints:
• The encoding, which introduces a bias in the search space. That is, by changing the way we represent music we also change what is easy to write and what is difficult, or even impossible, to express. By forcing music to be expressed as a deterministic L-system or a deterministic context-free grammar, regular structures are easier to represent and will appear more often than completely unrelated fragments of music. A set of rules may appear to contain no repeated structure, yet its product might contain a large amount of repetition and self-similar structures. This is, indeed, an essential aspect of both biology and music [40]. The structure of an entire piece of music is unequivocally encoded, including performing directions, and unfolds through a series of derivations. This provides a way to generate repetitions in the product from a simple axiom, without the encoding itself being repetitive.
• The fitness function, which associates a measure of quality with each composition, determined by looking at some of its high-level features, such as duration, amount of dissonance, etc.
The dual effect of an encoding that restricts the search to a more structured space, where individual changes in the genome can produce a set of organized alterations in the phenotype, combined with the filtering of a fitness function, helps the generation process "converge" to music respecting a particular style or set of conditions, without any imitation.

Atonal Music
The atonal system uses an encoding based on a deterministic L-system (genetic part) and some global parameters used during the development phase (epigenetic part). An L-system is defined as a triple \((V, S, P)\), where \(V\) is the alphabet (a non-empty, finite set of symbols), \(S \in V\) is the axiom or starting symbol, and \(P \subseteq V \times V^{*}\) is the set of production rules of the form \(A \to x\), with \(A \in V\) and \(x \in V^{*}\). Each production rule rewrites every appearance of the symbol \(A\) in the current string into \(x\). Because the system is deterministic, there is only one rule \(A \to x\) for each symbol \(A \in V\). In our model, we divide the symbols of the alphabet \(V\) into two types:
• Operators, which are reserved symbols represented by a sequence of characters of the form @i with \(i \in \mathbb{Z}\), or $j with \(j \in \{1, \dots, m\}\). No rules are explicitly written for them; the rule \(A \to A\) is assumed instead. They control the current state of the abstract machine, modulating the musical variables: pitch, duration, onset time, effects, volume, tempo, etc.
• Instruments, of the form #k with \(k \in \{0, \dots, p\}\). These symbols have an explicit production rule with them on the left-hand side and can represent either a musical structural unit (composition, phrase, idea, etc.) or, if they appear in the final string, a note played by the instrument associated with that symbol.
Since the rewriting process could potentially be infinite, we introduce the following stopping mechanism: each production rule has an associated value \(r_i\) indicating whether that rule can still be applied in the next rewriting iteration. A global parameter \(T\) serves as the initial value for all \(r_i\), and each production rule has a weight \(I_i\) used to compute the new value of \(r_i\) from the previous time step, \(r_i^{t} = f(r_i^{t-1}, I_i)\). All \(r_i\) progressively decrease until reaching 0, at which point the associated rule is no longer applied. The formula is applied to a rule if that rule is used at least once in the current iteration. To illustrate the process, let us introduce some of the operators that we use:
• $1 increases the pitch value one step in the scale.
• $2 decreases the pitch value one step in the scale.
• $5 saves the current pitch and duration values in the pitch stack.
• $6 returns to the last pitch and duration values stored in the pitch stack.
• $7 saves the current time position in the time stack.
• $8 returns to the last time position saved in the time stack.
• $96 applies the dynamic mezzo-forte.
• @60$102 applies the tempo "quarter equals 60".
Now, let us consider the following simple, handmade L-system, Ga, with P consisting of the rules:
Below we show the whole developmental process to obtain the composition generated by Ga, considering \(T = 1\) and \(I_i = 1\) for all the rules, and the following global parameters:
• Scale: C major
• Tempo: 80 bpm
• Default duration: quarter note
• Default dynamic: mezzo-piano
• Initial pitch: middle C
• Instruments: piano (0), rest (1), church organ (2 and 3), cello (4)
The vector r shows the current values of \(r_i\) for all the production rules.

Iteration 0
With no iterations, the resulting string is the axiom, which is interpreted as a single note played by its associated instrument, the piano, with all the musical parameters still at their default values.
String: #0
To better illustrate the interpretation procedure, Figure 1 shows the score resulting from the current string, assuming that the rewriting counter of the first rule were already 0 (\(r_0 = 0\)) at this point. Since this is not the case, the rewriting process continues.
Figure 1. Resulting score from Iteration 0 (see Audio S1).

Iteration 1
For the first iteration, the rule associated with the only symbol in the string needs to be applied and then, applying the formula, \(r_0\) is set to 0.
String: @60$102$96#2$1$1$1$1$1$1$1#2
r = [0, 1, 1, 1, 1]
This is interpreted from left to right as: change the tempo to "quarter equals 60", apply the dynamic mezzo-forte, play the current note on instrument 2, raise the pitch seven steps on the given scale and finally play the resulting note on instrument 2. As before, Figure 2 shows the resulting score.

Iteration 2
The previous score showed the hypothetical outcome assuming \(r_2 = 0\) already. Since that symbol is in the string and \(r_2 = 1\), the rule has to be applied once.
String: @60$102$96$7#3$8#4$1$1$1$1$1$1$1$7#3$8#4
r = [0, 1, 0, 1, 1]
Figure 3 shows the hypothetical score for Iteration 2. At this point, symbol #2, as opposed to what would have happened if the process had stopped at Iteration 1, no longer acts as a playing instrument, but rather as a compositional block that gives rise to two instruments. These instruments play at the same time due to the use of the time stack operators $7 and $8. The "synchronization" between the two instruments is an emergent property of the indirect encoding.

Figure 3. Resulting score from Iteration 2 (see Audio S3).
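To make the role of the operators concrete, here is a minimal Python sketch of a left-to-right interpreter for the atonal operators listed above. It rests on assumptions and is not the Melomics interpreter: pitch is tracked as a number of scale steps above the initial pitch, every instrument symbol plays one note of the current duration and advances the time cursor, $6 and $8 pop their stacks, and all other operators (tempo, dynamics, and so on) are ignored. The `Note` class and the `interpret` function are hypothetical names.

```python
import re
from dataclasses import dataclass

TOKEN = re.compile(r"[@$#]-?\d+")  # @i and $j operators, #k instrument symbols

@dataclass
class Note:
    instrument: int   # k in #k
    onset: float      # in beats (quarter notes)
    degree: int       # scale steps above the initial pitch
    duration: float   # in beats

def interpret(string, default_duration=1.0):
    """Read an atonal string left to right and return the notes it plays."""
    degree, duration, time = 0, default_duration, 0.0
    pitch_stack, time_stack, notes = [], [], []
    for tok in TOKEN.findall(string):
        if tok[0] == "#":                    # instrument symbol: play a note
            notes.append(Note(int(tok[1:]), time, degree, duration))
            time += duration
        elif tok == "$1":
            degree += 1
        elif tok == "$2":
            degree -= 1
        elif tok == "$5":
            pitch_stack.append((degree, duration))
        elif tok == "$6":
            degree, duration = pitch_stack.pop()
        elif tok == "$7":
            time_stack.append(time)
        elif tok == "$8":
            time = time_stack.pop()
        # tempo, dynamics and other operators are ignored in this sketch
    return notes

for note in interpret("@60$102$96$7#3$8#4$1$1$1$1$1$1$1$7#3$8#4"):
    print(note)
```

Run on the Iteration 2 string, the sketch places #3 and #4 at the same onsets (beats 0 and 1), with the second pair seven scale steps higher, which is the polyphony that $7 and $8 produce.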

Iteration 3
The rules for #3 and #4 now need to be applied. The music has acquired a structure: the initial rewriting resulted in a change of tempo and dynamics and an upward pitch shift of one octave for the second part of the composition. The following rewriting developed two instruments playing in polyphony, and the latest one provided the final melody for each instrument.
There is a fourth iteration that rewrites symbol #1 into itself and sets \(r_1 = 0\). The score shown in Figure 4 for Iteration 3 is therefore also the final score in this case.
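The developmental process walked through above can be sketched in Python as follows. The rules for #0 and #2 reproduce the strings of Iterations 1 and 2. The text does not give the rules for #3 and #4, so the ones below are hypothetical placeholders. The update f(r, I) = max(0, r - I) is also an assumed choice of f that matches the example with T = 1 and I_i = 1.

```python
import re

TOKEN = re.compile(r"[@$#]-?\d+")  # @i and $j operators, #k instrument symbols

def develop(axiom, rules, T=1.0, weights=None, max_iter=100):
    """Parallel rewriting of a deterministic L-system with per-rule counters.

    Hypothetical update: r_i <- max(0, r_i - I_i), applied once per iteration
    to every rule that was used at least once in that iteration.
    """
    weights = weights or {sym: 1.0 for sym in rules}
    r = {sym: T for sym in rules}
    string = TOKEN.findall(axiom)
    history = [("".join(string), dict(r))]
    for _ in range(max_iter):
        used, out = set(), []
        for tok in string:
            if r.get(tok, 0) > 0:             # rule exists and is still active
                out.extend(TOKEN.findall(rules[tok]))
                used.add(tok)
            else:                             # operators and exhausted rules: A -> A
                out.append(tok)
        if not used:                          # nothing was rewritten: stop
            break
        for sym in used:
            r[sym] = max(0.0, r[sym] - weights[sym])
        string = out
        history.append(("".join(string), dict(r)))
    return history

rules = {
    "#0": "@60$102$96#2$1$1$1$1$1$1$1#2",  # from the worked example
    "#1": "#1",                              # the rest rewrites into itself
    "#2": "$7#3$8#4",                        # from the worked example
    "#3": "#3$1#3",                          # hypothetical placeholder
    "#4": "#4#1#4",                          # hypothetical placeholder
}
for i, (string, r) in enumerate(develop("#0", rules)):
    print(i, string, [r[k] for k in sorted(r)])
```

With these placeholder rules, the process stops after the fourth iteration, in which #1 is rewritten into itself and its counter drops to 0, as described above.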

Supplementary material Scores S10 shows two of the first actual pieces that were generated with the atonal system.

Tonal Music
While the system used to produce atonal music can also generate tonal music, several changes were introduced to the original system to favor the emergence of structures and elements that are conventionally present in popular Western music.
In order to establish a compositional structure, as is common in a traditional composing process, and inspired by Bent's Analysis [41], the production rules are explicitly structured in five hierarchical levels:
• Composition. This is the most abstract level and is formed by a sequence of similar or different kinds of periods, possibly with music operators (alterations in tone, harmony, tempo, macro-dynamics, etc.) between them.
• Period. This is the highest structural subdivision of a composition. There can be more than one type of period, built independently, becoming separate musical units recognizable in the composition.
• Phrase. This is the third structural level, the constituent material of the periods.
• Idea. This constitutes the lowest abstract level in the structure of a composition. A phrase can be composed of different ideas that can be repeated in time, with music operators in between. A musical idea is a short sequence of notes generated independently for each role, using many different criteria (harmony, rhythm, pitch intervals, relationship with other roles, etc.).
• Notes. This is the most concrete level, composed only of operators and notes played by instruments.
Global parameters are introduced to force the occurrence of tonal music. Some of these are applied straight away when creating a new genome, by filtering the use of certain strings as the right-hand side of a production rule, and others are used as part of the fitness function (see Section 2.2.2 below). These parameters can be grouped in the following categories:
• Texture. This group of parameters allows defining rules for the inclusion of roles; different types of dependencies between them; the compositional units where they are forced, allowed or prohibited to appear; and the general evolution of the presence of instruments.
There are a few new operators to improve harmony management, some of them with a more complex interpretation process and dependencies on other operators and global parameters, like the operator M to create chords (see the development example below).
Since we enforce a strict hierarchical structure, it is useful to distinguish the symbols that represent compositional elements, which will be rewritten during development, from those that have the same specific meaning at any given moment. For this reason we use an encoding based on a deterministic context-free grammar, defined as \((V, \Sigma, S, P)\), where \(V\) is the set of non-terminal symbols, \(\Sigma\) is the set of terminal symbols (the alphabet), \(S \in V\) is the axiom or starting symbol, and \(P \subseteq V \times (V \cup \Sigma)^{*}\) is the set of production rules. The symbols in \(V\) can be identified with the structural units (composition, periods, phrases and ideas), while the symbols in \(\Sigma\) represent notes and music operators, such as modulators of pitch, duration, current harmonic root or current chord. In our implementation, the right-hand side of a production rule contains only terminal symbols or non-terminal symbols from the next level down in the hierarchy.
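The constraint that a right-hand side may only contain terminals or non-terminals from the next level down can be checked mechanically. The Python sketch below does this for a hypothetical genome layout: a dictionary of rules plus a map from each non-terminal to its level. The symbols and rules in the example are invented for illustration. They are not the rules of Gt, which are not reproduced in the text.

```python
LEVELS = ["composition", "period", "phrase", "idea"]  # notes are terminals

def hierarchy_violations(rules, level_of, terminals):
    """Return messages for right-hand-side symbols that break the strict
    top-down hierarchy (an empty list means the grammar is valid)."""
    problems = []
    for lhs, rhs in rules.items():
        expected = LEVELS.index(level_of[lhs]) + 1  # only the next level down
        for sym in rhs:
            if sym in terminals:
                continue
            if sym not in level_of:
                problems.append(f"{lhs}: unknown symbol {sym!r}")
            elif LEVELS.index(level_of[sym]) != expected:
                problems.append(f"{lhs} ({level_of[lhs]}) -> {sym} ({level_of[sym]})")
    return problems

# Hypothetical grammar using the reserved terminals introduced in the text.
terminals = {"a", "b", "s", "N", "n", "[", "]", "<", ">", "W4.0", "M0.0.0.0"}
level_of = {"Z": "composition", "A": "period", "B": "phrase", "E": "idea", "F": "idea"}
rules = {
    "Z": ["A", "W4.0", "A"],
    "A": ["B", "B"],
    "B": ["E", "N", "F"],
    "E": ["<", "M0.0.0.0", "b", ">", "a", "N", "a"],
    "F": ["a", "s", "a"],
}
print(hierarchy_violations(rules, level_of, terminals))  # prints []

broken = dict(rules, Z=["B"])  # a composition pointing straight at a phrase
print(hierarchy_violations(broken, level_of, terminals))
```

A check of this kind is one way to enforce the proviso that tonal mutations must not alter the five-level hierarchical development.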
To illustrate the rewriting process, let us introduce some of the reserved terminal symbols and their interpretation:
• N increases the pitch and harmonic root counters by one unit.
• n decreases the pitch and harmonic root counters by one unit.
• [ saves in a stack the current values of pitch, harmonic root and duration.
• ] restores from the stack the last saved values of pitch, harmonic root and duration.
• < saves in a stack the current time position and the current values of pitch, harmonic root and duration.
• > restores from the stack the last saved time position and values of pitch, harmonic root and duration.
• W4.0 applies the macro-dynamic mezzo-forte.
• M0.0.0.0 makes the next symbol linked to an instrument play the root note of the current chord, instead of the current pitch.
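The following minimal Python sketch shows how a final string over these symbols could be read from left to right. It handles only N, n, [, ], < and >. It treats every other lower-case letter as an instrument symbol (with s as the rest), measures pitch and harmonic root in scale steps from their initial values and time in beats, and ignores the W and M operators. All of these are simplifying assumptions, not the Melomics interpreter. Because N and n move both counters together, pitch and root stay equal in this sketch.

```python
import re

# Lower-case letters other than "n" are instrument symbols (like a, b, s in Gt);
# "n" is matched first as an operator because of the alternation order.
TOKEN = re.compile(r"[Nn\[\]<>]|[a-z]")

def interpret_tonal(string, rest="s"):
    """Return (instrument, onset, pitch, root) events; the rest only advances time."""
    pitch = root = 0
    duration, time = 1.0, 0.0
    value_stack, time_stack, events = [], [], []
    for tok in TOKEN.findall(string):
        if tok == "N":
            pitch, root = pitch + 1, root + 1
        elif tok == "n":
            pitch, root = pitch - 1, root - 1
        elif tok == "[":
            value_stack.append((pitch, root, duration))
        elif tok == "]":
            pitch, root, duration = value_stack.pop()
        elif tok == "<":
            time_stack.append((time, pitch, root, duration))
        elif tok == ">":
            time, pitch, root, duration = time_stack.pop()
        else:  # instrument symbol: play (or rest) for the current duration
            if tok != rest:
                events.append((tok, time, pitch, root))
            time += duration
    return events

print(interpret_tonal("<b>aNa[Na]as"))
```

In the example, `<b>` makes b sound together with the first a, and the `[Na]` group raises one note temporarily before the saved pitch is restored.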
Let us define Gt, a simple handmade grammar, to illustrate the development using this new model, with P consisting of the following rules:
The last idea, F, is interpreted only by the instrument linked to a, while the idea E is performed by the instruments associated with a and b in polyphony, the latter always playing a harmony consisting of the root note of the current chord. To interpret the final string we use the following values:
• Initial scale: C major
• Tempo: 80 bpm
• Default duration: quarter note
• Default dynamic: mezzo-piano
• Initial pitch: middle C
• Initial chord: major triad
• Initial root: I
• Instruments: violin (symbol a), double bass (symbol b), musical rest (symbol s)
See the resulting composition in Figure 5. Supplementary material Theme S11 provides the genome, the auxiliary MIDI and the MP3 of an actual piece for clarinet generated with the tonal system.

Composing Process
Melomics combines formal grammars, which represent musical concepts at varying degrees of abstraction, with evolutionary techniques, which evolve the set of production rules. The system can compose both atonal and tonal music and, although it uses slightly different encoding methods and a much stronger set of constraints in the second case, both share a similar structure and execution workflow (see Figure 6). From a bio-inspired perspective, this can be thought of as an evolutionary process that operates over a developmental procedure carried out by the formal grammar. The music "grows" from the initial seed, the axiom, through the production rules into an internal symbolic representation (similar to a MIDI file), where the compositions are finally tested against the set of constraints provided. The execution workflow can be described as follows:
• Filling in the desired input parameters. These parameters represent musical specifications or directions at different levels of abstraction, such as the instruments that can appear, the amount of dissonance, the duration of the composition, etc., with no need for creative input from the user.
• The initial gene pool is created randomly, using part of the input parameters as boundaries and filters.
• Each valid genome, based on a deterministic grammar and stored as a plain text file, is read and developed into a resulting string of symbols.
• The string of symbols is then rewritten by adjustment processes of different kinds: cleaning of the string, for example by removing sequences of idempotent operators; adjustments due to physical constraints of the instruments, like the maximum number of simultaneous notes playing; or the suppression (or emergence) of particular musical effects.
• Each symbol in the final string has a musical meaning with a low level of abstraction, which is interpreted by the system through a sequential reading from left to right and stored in an internal representation.
• Once again, the musical information is adjusted and stabilized: for example, pitches are shifted to satisfy constraints on the instruments' tessituras, note durations are discretized, and so on.
• The input directions are used to assess the produced composition, which is either discarded or passes the filter; in the latter case it is saved as a "valid" composition.
• A discarded theme's genome is replaced by a new random genome in the gene pool. On the other hand, a "valid" genome is taken back to the gene pool after being iteratively subjected to random mutations and possible crossover with another genome in the gene pool (see Section 2.2.1), until it passes the filters defined at the genome level (the same as with a random genome).
• If desired, any composition (usually one that passes the filter) can be translated, using the different implemented modules, into standard musical formats that can be symbolic (MIDI, MusicXML, PDF) or audio (WAV, MP3, etc.), the latter after executing a synthesis procedure with virtual instruments, which is also led by the information encoded in the composition's genome.
The average time to build a new valid genome varies greatly depending on the style and the constraints imposed at this level. Using a single core of an Intel Xeon E645, for a very simple style it can be ready in less than one second, while for the most complex style, symphonic orchestra in the atonal system, the process takes 283 s on average. Given a built gene pool, the time to obtain a new valid composition, which includes possibly executing the developmental process repeatedly until it passes the filters, also varies widely: it can take 6 s for the simplest atonal style, while it takes 368 s on average for the most complex style in the tonal system.
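The gene-pool loop in the workflow above can be sketched generically in Python. This is a hedged illustration, not the Melomics code. `develop`, `passes_filter`, `mutate`, `crossover` and `genome_ok` are hypothetical callables standing in for grammar development plus adjustments, the style filters, and the genetic operators. The toy stand-ins (a five-digit "genome" whose "composition" is its sum) exist only to make the sketch runnable. The sketch also skips the genome-level filtering of fresh random genomes, and its retry loop assumes `genome_ok` can eventually be satisfied.

```python
import random

def evolve(random_genome, develop, passes_filter, mutate, crossover,
           genome_ok=lambda g: True, pool_size=20, generations=10,
           p_crossover=0.5, seed=0):
    """Hypothetical sketch of the gene-pool loop; returns valid compositions."""
    rng = random.Random(seed)
    pool = [random_genome(rng) for _ in range(pool_size)]
    valid = []
    for _ in range(generations):
        for i, genome in enumerate(pool):
            composition = develop(genome)        # development plus adjustments
            if not passes_filter(composition):   # discarded theme:
                pool[i] = random_genome(rng)     # replace with a fresh random genome
                continue
            valid.append(composition)
            child = mutate(genome, rng)          # valid genome goes back, altered
            if rng.random() < p_crossover:
                child = crossover(child, rng.choice(pool), rng)
            while not genome_ok(child):          # genome-level filters
                child = mutate(child, rng)
            pool[i] = child
    return valid

# Toy stand-ins (hypothetical): a genome is 5 digits, its "composition" is
# their sum, and the "style" filter accepts sums between 20 and 30.
def random_genome(rng):
    return [rng.randint(0, 9) for _ in range(5)]

def mutate(genome, rng):
    child = list(genome)
    child[rng.randrange(len(child))] = rng.randint(0, 9)
    return child

def crossover(a, b, rng):
    return [rng.choice(pair) for pair in zip(a, b)]

valid = evolve(random_genome, sum, lambda c: 20 <= c <= 30, mutate, crossover)
print(len(valid), "valid compositions")
```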

Mutation and Crossover
Genomes can be altered in any way within the defined set of valid symbols, by removing, adding or altering symbols in any position; the developmental procedure will always give rise to a valid musical composition. However, the disruption introduced by the genetic operators (mutation and crossover) should be balanced for the system to converge properly to the given directions. In Melomics, a small amount of disruption, combined with the hierarchical structure granted by the encoding, makes the system produce reasonable results from early iterations, typically fewer than 10.
For atonal music we allow stronger alterations, since the styles addressed are in general less constraining than the popular styles pursued with the tonal system. The implemented mutations are: (1) changing a global parameter (e.g., default tempo or dynamics); (2) changing the instrument associated with a symbol in the grammar; (3) adding, removing or changing a symbol on the right-hand side of a production rule; (4) removing a production rule; and (5) adding a new production rule, either copying an existing one or generating it randomly, and introducing the new symbol at random positions in some of the existing production rules. The implemented crossover mechanism builds a new genome by taking elements from two others. The values of the global parameters and the list of instruments are taken randomly from either source genome. The set of production rules is taken from one of the parent genomes, and then the right-hand sides can be replaced with material from the other parent. The disruption introduced by these mechanisms is still too high, even for atonal music; hence mutation operations (4) and (5) are used with lower probability, and the rule with the symbol associated with the musical rest is always kept unaltered. Audio S6 shows a mutation of the Nokia tune (https://www.youtube.com/results?search_query=nokia+tune, accessed on 30 April 2021), reverse-engineered into the system, where the instrument has been altered as well as the rules at the lowest level of abstraction, resulting in some notes being changed while the more abstract structure is maintained.
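The following hedged Python sketch implements the five atonal mutation operators. It assumes a genome stored as a dictionary of global parameters, instrument assignments and production rules (as token lists). Several details are illustrative assumptions: the mutation weights, the one-unit parameter nudge, the instrument list, and the symbols chosen as axiom and rest. Only two properties come from the text: (4) and (5) are rarer than the other mutations, and the rest rule is never altered.

```python
import copy
import random

AXIOM, REST = "#0", "#1"        # hypothetical: axiom and musical-rest symbols
OPERATORS = ["$1", "$2", "$5", "$6", "$7", "$8"]
INSTRUMENTS = ["piano", "cello", "church organ", "violin"]

def mutate_atonal(genome, rng):
    """Return a mutated copy of `genome`; the rest rule is never touched."""
    g = copy.deepcopy(genome)
    rules = g["rules"]
    editable = [s for s in rules if s != REST]
    kind = rng.choices(
        ["param", "instrument", "symbol", "remove_rule", "add_rule"],
        weights=[3, 3, 6, 1, 1],             # (4) and (5) are the rarest
    )[0]
    if kind == "param":                      # (1) nudge a numeric global parameter
        key = rng.choice(sorted(g["params"]))
        g["params"][key] += rng.choice([-1, 1])
    elif kind == "instrument":               # (2) re-assign an instrument
        sym = rng.choice(sorted(s for s in g["instruments"] if s != REST))
        g["instruments"][sym] = rng.choice(INSTRUMENTS)
    elif kind == "symbol":                   # (3) edit a right-hand side
        rhs = rules[rng.choice(editable)]
        new_sym = rng.choice(OPERATORS + sorted(rules))
        action = rng.choice(["add", "remove", "change"]) if rhs else "add"
        if action == "add":
            rhs.insert(rng.randrange(len(rhs) + 1), new_sym)
        elif action == "remove":
            del rhs[rng.randrange(len(rhs))]
        else:
            rhs[rng.randrange(len(rhs))] = new_sym
    elif kind == "remove_rule":              # (4) drop a rule (never axiom or rest)
        candidates = [s for s in editable if s != AXIOM]
        if candidates:
            del rules[rng.choice(candidates)]
    else:                                    # (5) add a copied or random rule
        used = set(rules) | set(g["instruments"])
        new = f"#{max(int(s[1:]) for s in used) + 1}"
        if rng.random() < 0.5:
            rules[new] = list(rules[rng.choice(editable)])
        else:
            rules[new] = [rng.choice(OPERATORS), new]
        g["instruments"][new] = rng.choice(INSTRUMENTS)
        target = rules[rng.choice(editable)]
        target.insert(rng.randrange(len(target) + 1), new)
    return g

genome = {
    "params": {"tempo": 80, "dynamic": 3},
    "instruments": {"#0": "piano", "#1": "rest", "#2": "church organ"},
    "rules": {"#0": ["@60", "#2", "$1", "#2"], "#1": ["#1"], "#2": ["$7", "#1", "$8", "#2"]},
}
rng = random.Random(1)
g = genome
for _ in range(300):
    g = mutate_atonal(g, rng)
print(g["rules"]["#1"], g["instruments"]["#1"])
```

After hundreds of mutations, the rest rule and its instrument are untouched, and the original genome is unchanged because every mutation works on a deep copy.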
For tonal music, the mutation and crossover operations are similar, but more restricted, and fewer of them are executed at each iteration. The allowed mutations are: (1) changing a global parameter; (2) changing an instrument for another one valid for the same role; (3) adding, removing or changing a non-terminal symbol or a reserved terminal symbol (operators, with no instrument associated) on the right-hand side of a production rule, provided that it does not alter the five-level hierarchical development; and (4) removing or adding a non-reserved terminal symbol (instrument). When adding, appearances are duplicated and polyphony with an existing instrument is forced (the new symbol is enclosed with < and > and placed to its left), and an instrument of the same role is assigned. The crossover is also similar: it takes the genome of one of the parents and replaces only a few rules with material of the same structural level from the other. The global parameters are taken randomly from either parent, and likewise the instruments, respecting role constraints. Audio S7 shows a sample generated in the style DiscoWow2, and Audio S8 shows a crossover of it with the SNSD SBS Logo Song (https://www.youtube.com/results?search_query=SNSD+SBS+Logo+Song, accessed on 30 April 2021) of Korean TV, reverse-engineered into the tonal system.
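The tonal crossover can be sketched in Python under a hypothetical genome layout, in which each non-terminal has a recorded structural level and instruments are keyed by role. One simplification is chosen here: only idea-level rules are swapped, because their right-hand sides contain only terminals, so the child never refers to a non-terminal it does not define. The text allows material from any structural level. Taking parameters and instruments per key from either parent keeps each role's instrument valid for that role.

```python
import copy
import random

def crossover_tonal(parent_a, parent_b, rng, n_swaps=2):
    """Copy parent_a, then replace up to n_swaps idea-level rules with
    idea-level material from parent_b (hypothetical genome layout)."""
    child = copy.deepcopy(parent_a)
    ideas_a = sorted(s for s, lv in child["levels"].items() if lv == "idea")
    ideas_b = sorted(s for s, lv in parent_b["levels"].items() if lv == "idea")
    if ideas_b:
        for lhs in rng.sample(ideas_a, min(n_swaps, len(ideas_a))):
            child["rules"][lhs] = list(parent_b["rules"][rng.choice(ideas_b)])
    for key in child["params"]:              # each global parameter from either parent
        if key in parent_b["params"]:
            child["params"][key] = rng.choice([parent_a, parent_b])["params"][key]
    for role in child["instruments"]:        # one instrument per role, from either parent
        if role in parent_b["instruments"]:
            child["instruments"][role] = rng.choice([parent_a, parent_b])["instruments"][role]
    return child

parent_a = {
    "levels": {"Z": "composition", "A": "period", "C": "phrase", "E": "idea", "F": "idea"},
    "rules": {"Z": ["A", "A"], "A": ["C"], "C": ["E", "F"],
              "E": ["<", "b", ">", "a"], "F": ["a", "N", "a"]},
    "params": {"tempo": 80, "mode": "major"},
    "instruments": {"melody": "violin", "bass": "double bass"},
}
parent_b = {
    "levels": {"Z": "composition", "P": "period", "Q": "phrase", "X": "idea", "Y": "idea"},
    "rules": {"Z": ["P"], "P": ["Q", "Q"], "Q": ["X", "Y"],
              "X": ["a", "n", "a"], "Y": ["s", "a"]},
    "params": {"tempo": 120, "mode": "minor"},
    "instruments": {"melody": "flute", "bass": "cello"},
}
child = crossover_tonal(parent_a, parent_b, random.Random(3))
print(child["rules"], child["params"], child["instruments"])
```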

Fitness
Both the atonal and the tonal systems rely on a set of parameterized rules to guide the composing process, allowing the development of those compositions that comply with the rules while filtering the rest out. There are (a) global rules that basically constitute physical constraints of the musical instruments, such as the impossibility for a single instrument to play more than a certain number of notes simultaneously, or to play a note that is too short or too long; and (b) style-based constraints that encode expert knowledge and are used to ensure the emergence of a particular kind of music, resembling the way a human musician is asked to create music in a certain style. This latter kind of rule, in general looser in the atonal system, can be grouped as follows:
• Duration. For example, lower and upper bounds on the duration of a composition, which exist in both the atonal and tonal systems and are assessed at the end of the developmental process, on the phenotype, when the musical information is explicitly written in the internal symbolic format. There are also duration filters for other compositional levels, depending on the style.
• Structure. These are constraints on the number and distribution of compositional blocks.
• Texture and dynamics. In the atonal system, both polyphonic density and global dynamics are checked at every point of the musical piece, according to the function specified at the input, with a margin of tolerance. For the tonal system, the procedure is more restrictive and applied at the genotype level: valid musical roles, dynamics and instrumental density are defined for each type of compositional block.
• Instruments. Both systems check the instruments chosen to build a valid genome in a defined style.
• Harmony and rhythm. In the atonal system there is a measure of the amount of dissonance, a count of pitch steps and a count of changes of note duration.
Each of these properties has a lower and an upper threshold that must be complied with over a specified time window that moves along the composition. The tonal system is more restrictive: for each defined style there is the list of optional parameters described in Section 2.1.2, some of which translate directly into filters of the fitness function.
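The moving-window thresholds can be sketched in Python as below. The text does not state how a property is aggregated within the window. Measuring it as a count per beat and summing over the window is an assumption, and so are the bounds in the example. The hypothetical `passes_atonal_filters` wrapper adds a duration bound to show how such filters combine.

```python
def window_counts_ok(counts, window, low, high):
    """True if every run of `window` consecutive per-beat counts sums to a
    value within [low, high]. Assumes 1 <= window <= len(counts)."""
    total = sum(counts[:window])
    if not low <= total <= high:
        return False
    for i in range(window, len(counts)):
        total += counts[i] - counts[i - window]   # slide the window by one beat
        if not low <= total <= high:
            return False
    return True

def passes_atonal_filters(duration_s, step_counts, *, min_s, max_s, window, low, high):
    """Hypothetical combination of a duration bound and one windowed property."""
    return min_s <= duration_s <= max_s and window_counts_ok(step_counts, window, low, high)

counts = [1, 2, 0, 1, 3, 0, 0, 1]        # e.g. pitch steps per beat (invented)
print(window_counts_ok(counts, 4, 2, 5))  # one window sums to 6, above the bound
print(window_counts_ok(counts, 4, 2, 6))
```

Updating a running total keeps the check linear in the length of the piece, rather than re-summing each window.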
The main filters regarding harmony and rhythm are: a set of valid modes and tones; valid measure types; valid rhythmic modes for the melody; valid chords for each role type; valid "harmonic transitions", as a way to assess and filter out certain chord progressions; and valid rhythmic patterns, expressed in terms of the current measure type and including rests. All the filters have a tolerance value associated and most of them can be defined globally or at the level of compositional block (period, phrase or idea).
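The "harmonic transitions" filter can be sketched in Python as below. The text does not define how tolerance is measured. Here it is assumed to be the maximum fraction of consecutive chord pairs that may fall outside the valid set. The set of valid transitions in the example is invented for illustration.

```python
def transitions_ok(chords, valid_transitions, tolerance):
    """Accept a chord sequence if the share of consecutive chord pairs that
    are not listed as valid transitions does not exceed `tolerance`."""
    pairs = list(zip(chords, chords[1:]))
    if not pairs:
        return True
    invalid = sum(pair not in valid_transitions for pair in pairs)
    return invalid / len(pairs) <= tolerance

VALID = {("I", "IV"), ("IV", "V"), ("V", "I"), ("I", "V"), ("V", "vi"), ("vi", "IV")}
print(transitions_ok(["I", "IV", "V", "I"], VALID, tolerance=0.0))  # True
print(transitions_ok(["I", "V", "IV", "I"], VALID, tolerance=0.1))  # False
```

In the second progression, two of the three transitions (V to IV and IV to I) are not in the valid set, so the sequence is rejected unless the tolerance allows about two thirds of the transitions to be invalid.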

Preliminary Assessments
We used music information retrieval tools to measure the similarity of compositions generated under common musical requirements, and then compared these themes with others generated in a different style and with music created by humans.
To handle music symbolically, we used the open-source application jMIR [42], in particular its jSymbolic component, which extracts information from MIDI files. We set up the tool to extract features in the categories of instrumentation, texture, rhythm, dynamics, pitch statistics, melody and chords, 111 features in total. For the atonal system, we picked a collection of 656 pieces of contemporary classical music for different ensembles and computed the average value of each feature, obtaining a sort of centroid of the style. We then added to the set 10 more pieces from the same contemporary classical substyle, 10 pieces in the style Disco02 produced by the tonal system, and 25 pieces created by human composers from the Bodhidharma dataset [43] tagged as "Modern Classical", for a total of 701 pieces.
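The centroid-distance computation just described can be sketched in a few lines, assuming the feature vectors (for example, the 111 jSymbolic features per piece) have already been loaded as lists of floats. The text does not state which distance or normalization was used, so Euclidean distance on raw values and the population standard deviation are assumptions of this sketch.

```python
import math
import statistics


def centroid(vectors):
    """Feature-wise mean of a list of equal-length feature vectors."""
    return [statistics.fmean(column) for column in zip(*vectors)]


def distance_summary(vectors, center):
    """Mean and population standard deviation of Euclidean distances to `center`."""
    distances = [math.dist(v, center) for v in vectors]
    return statistics.fmean(distances), statistics.pstdev(distances)


# Toy 2-feature vectors standing in for the 111-feature jSymbolic vectors.
reference = [[0, 0], [2, 0], [0, 2], [2, 2]]
center = centroid(reference)                          # [1.0, 1.0]
print(distance_summary([[1, 1], [4, 5]], center))     # (2.5, 2.5)
```

Each group of added pieces (same substyle, other style, human-made) would be summarized separately against the same centroid, which yields per-group µ and σ values of the kind reported for Figure 7.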
For the tonal system we performed a similar test. We chose 220 themes from Melomics's DancePop style and set aside 14 of them, computing the centroid from the remaining themes. We then added these 14 pieces, 10 pieces from Melomics's Disco02 style, which we consider the predecessor of DancePop, and 10 more from Melomics's atonal DocumentarySymphonic style.
For both cases, Figure 7 shows the distance of each theme to the computed centroid. In the atonal case, the additional contemporary pieces appear scattered around the centroid of the previously generated group (mean and standard deviation: µ = 1819.7, σ = 1603.2 and µ = 1842.54, σ = 1838.3, respectively); the human-made compositions in similar styles appear close but shifted to greater distances (µ = 2101.88, σ = 2158.76), while the music generated with different specifications (Disco02) lies farthest from the centroid (µ = 2363.76, σ = 1755.02). In the second case, the pieces of the same style that were added later are close to the centroid of the reference group. We also used the API provided by The Echo Nest [44] (acquired by Spotify in 2014) to place Melomics music in context, this time analyzing audio files alongside another set of musical pieces, natural sounds and noise; extracting some of the musical properties defined by this API (loudness, hotness, danceability, energy) yielded similar results.

Experiment in the Real World
The aim, through an MDtT design, is to quantify to what extent Melomics music and human music are perceived as the same entity, and how similarly they affect the emotions of a human listener (musician or naive). The hypothesis is that a computer-made piece is indeed equivalent to conventional music in terms of the reactions it elicits in a suitable audience.
To test this hypothesis, an experiment recorded the mental representations and emotional states elicited in an audience while listening to samples of computer-generated music, human-generated music and environmental sounds. It involved 251 participants (half of them professional musicians), who reported the mental representations and emotions evoked while listening to the various samples. The subjects were also asked to distinguish the piece composed by the computer from the one created by a human composer.

Methodology
A musical piece in the same specific style was commissioned from a human composer and from the Melomics tonal system. In the first stage, the two pieces were compared using an approach focused on human perception: listeners were asked which mental images and emotions each sample evoked. In this phase of the experiment the listeners were not aware that some of the samples had been composed by a computer; to gauge whether both works were perceived in a similar way, the nature of the composer was not revealed until the second stage. Data were analysed with fourfold (two-by-two) contingency tables, and differences between groups were evaluated with the chi-squared test with continuity correction. Fisher's exact test was used only in those cases where an expected frequency was lower than 5. The significance level was set at p < 0.05. In the second phase of the study, the sensitivity (i.e., the ability to correctly identify the pieces composed by the computer) and the specificity (i.e., the correct classification of pieces composed by humans) were also evaluated for both musicians and non-musicians, with a confidence level of 95%.
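The testing protocol above can be sketched in plain Python: a Yates-corrected chi-squared test for two-by-two tables, with Fisher's exact test used whenever any expected frequency falls below 5. This is an illustrative reimplementation, not the authors' analysis code, and the example tables are hypothetical; in practice `scipy.stats.chi2_contingency` (with `correction=True`) and `scipy.stats.fisher_exact` provide the same tests.

```python
import math


def expected_counts(table):
    """Expected cell frequencies of a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]


def yates_chi2(table):
    """Chi-squared statistic with Yates continuity correction and its p-value (1 df)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def fisher_two_sided(table):
    """Two-sided Fisher exact p-value: sum of table probabilities <= the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    total = math.comb(r1 + r2, c1)

    def prob(x):  # hypergeometric probability of top-left cell = x, margins fixed
        return math.comb(r1, x) * math.comb(r2, c1 - x) / total

    observed = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    return sum(p for p in map(prob, support) if p <= observed * (1 + 1e-7))


def compare(table, alpha=0.05):
    """Use Fisher when any expected frequency is below 5, else Yates' chi-squared."""
    if min(min(row) for row in expected_counts(table)) < 5:
        test, p = "fisher", fisher_two_sided(table)
    else:
        test, p = "chi2_yates", yates_chi2(table)[1]
    return test, p, p < alpha


print(compare([[20, 10], [10, 20]]))  # chi-squared path, p of about 0.02
print(compare([[3, 1], [1, 3]]))      # Fisher path, p of about 0.49
```

The small relative tolerance in the Fisher sum guards against floating-point ties between tables that are exactly as likely as the observed one.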

The Audio Samples
The specifications for the musical pieces were kept simple and their duration short, to ease their assessment, especially by non-musician participants:
• style: guitar ballad
• instruments: piano, electric piano, bass guitar and electric guitar
• bar: 4/4
• BPM: 90
• duration: 120 s
• structure: A B Ar
• scale: major
The guitar ballad style was also chosen because its compositional rules were already coded within the system. The final audio files were obtained in MP3 format with a constant bitrate of 128 kbps and shortened to a duration of 1 min 42 s; that is, the beginnings and endings of the pieces (around 9 s each) were removed, since in popular music they typically follow fixed, formulaic patterns. After composition, both the human- and the computer-composed ballads were instantiated in two ways: by a professional player's performance and by computerized reproduction. For the performance, the human player interpreted the scores of both works, while the computer automatically synthesized and mixed a MIDI representation of both pieces using a similar configuration of virtual instruments and effects; the four combinations yielded the corresponding music samples. Table 1 shows how the pieces were labelled: HH stands for human-composed and human-performed, CC for computer-composed and computer-synthesized, HC for human-composed and computer-synthesized, CH for computer-composed and human-performed, and NS for natural sounds. Unlike the musical samples, this last sample was introduced to gauge the listeners' response to non-musical sounds; it consists of a two-minute excerpt of natural sounds (Jungle River, Jungle birdsong and Showers from The Sounds of Nature Collection [45]) combined with animal sounds [46]. Audio S9 contains the five audio samples used in the study.
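The trimming figures above are mutually consistent, as a quick check shows when the approximate 9 s per cut is taken at face value; the file size is only an estimate that ignores MP3 framing and metadata overhead.

```latex
\begin{aligned}
t_{\text{final}} &= 120\,\mathrm{s} - 2 \times 9\,\mathrm{s} = 102\,\mathrm{s} = 1\,\mathrm{min}\ 42\,\mathrm{s},\\
\text{size} &\approx \frac{128\,000\,\mathrm{bit/s} \times 102\,\mathrm{s}}{8\,\mathrm{bit/byte}} = 1\,632\,000\,\mathrm{bytes} \approx 1.6\,\mathrm{MB}.
\end{aligned}
```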

Participants
The experiment was carried out at two facilities in Malaga (Spain): the Museum for Interactive Music and the Music Conservatory. Subjects were recruited through posters in the museum and internal calls among students and educators at the conservatory. Selected participants ranged in age from 20 to 60, and their answers were processed separately according to musical expertise: subjects with five or more years of training were labelled as 'musicians', while subjects with less training or none were classified as 'non-musicians'. Musicians are assumed to process music in a far more elaborate manner and to possess a wider knowledge of musical structure, so they would be expected to outperform non-musicians in a musical classification task.
The final sample consisted of 251 subjects, the mean age being 30.24 (SD = 10.7) years, with more musicians (n = 149) than non-musicians (n = 102). By gender the sample featured marginally more women (n = 127) than men (n = 124) and, in terms of nationality, the majority of the participants were Spanish (n = 204), the remainder being from other, mainly European countries (n = 47).

Materials
In both facilities the equipment used to perform the test was assembled in a quiet, isolated room to prevent the participants from being disturbed. Each of the three stations used for the experiment consisted of a chair, a table and a tablet (iPad 2). The tablet contained both the web application that provided the audio samples and the questionnaire, which was completed using the touch screen. The text content (presentation, informed consent, questions and possible answers) was presented primarily in Spanish, with English translations located below each paragraph. The device was configured with a Wi-Fi connection to store the data in a cloud document and was equipped with headphones (Sennheiser EH-150), both for listening to the recordings and to avoid external interference. The iPad was configured to perform only this task, and users were redirected to the home screen when all the answers had been saved. The possibility of exiting the app was also disabled. The test is publicly available at http://www.geb.uma.es/mimma/, accessed on 30 April 2021.

Procedure
Neuroscientific studies [47] support the view that judgements of computer-made compositions are liable to be affected by anti-computer prejudice when the listener knows in advance that the author is not human. Hence, during the test, each subject was told that they were taking part in an experiment in music psychology, but the fact that it involved computer-composed music was not mentioned at the beginning, so as not to bias the results [48]. Participants were also randomly assigned to one of two groups, and the compositions were distributed between these groups so that each subject listened to the musical pieces as rendered by the same interpreter (Table 2). In this way, the responses were independent of the quality of the execution: each subject heard both the human and the computer compositions interpreted either by the artist or by the computer, so that potential differences in performance could not confound the comparison between compositions.
Table 2. Distribution of musical pieces into groups according to the interpreter.

Phase      Group A     Group B
Phase I    HH/CH/NS    CC/HC/NS
Phase II   HH/CH       CC/HC

The workflow of the test, detailed in Figure 8, shows that the subject is first introduced to the presentation and informed consent screens and is then prompted for personal data relevant to the experiment (see the specific questions in Table 3). The subject is then assigned to either group A or group B, listens to five sequential audio recordings and answers a number of questions. The listening and answering process is divided into two phases:

1. During Phase I, each subject in group A listens to three pieces in random order: HH and CH, composed by a human and by our computer system, respectively, and both performed by a human musician, together with the natural sounds recording (NS). Subjects assigned to group B proceed similarly, listening to the same compositions, but in this case synthesized by computer (CC and HC), and to the same NS recording. After listening to each piece, the subject is asked whether the excerpt could be considered music and which mental representations or emotional states it evoked. This final question requires an open answer, as the subject is not given a list of specific responses.

2. In Phase II, the subject listens to the same musical pieces again (but not the natural sounds), after which they are asked whether they think each piece was composed by a human or by a computer. As stated above, it is important that the question of authorship is withheld until the second phase, so that subjects can assess the music in Phase I without the potential bias of Phase II, in which they become aware that the music might have been composed by a computer.

Figure 8. Test workflow. After the initial pages, the participants are randomly assigned to one group (A or B). Then, the samples are played and evaluated in a random order. Finally, the human or artificial nature of the musical pieces is assessed.

In summary, a subject assigned to group A might proceed as follows: (1) presentation screen; (2) participant's information; listening to HH (labelled as M1, and then S1), followed by the perception assessment of S1; the same listening and assessment steps for the remaining Phase I samples; and, in Phase II, listening again to HH and to CH, up to the final question, Q6, at step 13.

Table 4 shows the results, as percentages, of the yes/no questions in the first (blind) phase of the experiment. All the questions referred to the piece that had just been heard: in random order, HH, CH and NS for subjects in group A, and CC, HC and NS for group B. None of the contrasts performed, for the total sample or for each type of listener, was significant. Almost all subjects answered the first question (Q1) affirmatively after listening to a musical sample (as opposed to NS), regardless of who composed or interpreted it. By contrast, the NS sample was classified as non-music, although by a narrower margin. Regarding the second question (Q2), all the musical pieces elicited mental images, with global percentages ranging from 55.6% to 70.1% and no significant differences depending on who composed or interpreted the pieces (p > 0.05). Natural sounds, meanwhile, elicited mental images in more than 90% of cases, more than any of the musical samples. Regarding the responses elicited by the recordings (Q4), subjects answered affirmatively (up to 80%) for both musical pieces, regardless of the composer or the interpreter, and for the natural sounds. Only for the sample composed by computer and played by human musicians did the global percentage approach 90%. Most of the answers (179) were given in Spanish. Among the responses provided in English (23), three were given by US citizens, three by UK citizens, one by a Canadian and the remaining sixteen by subjects from non-English-speaking countries. One subject answered in German, and the rest (48) left the text boxes empty.

Experimental Results
Regarding mental representations, three categories were established: 'nature/naturaleza', 'self/sí mismo' and 'others/otros'. This taxonomy aims to clearly distinguish whether the represented object is associated with the subjective realm or with the world [49]. Regarding emotions, we adopted for classification the model proposed by Diaz & Flores [50], which contains 28 polarized categories and an associated thesaurus for grouping terms. The aim was, first, to provide enough classes to include most of the participants' subjective characterizations while avoiding oversimplification and, second, to reduce the number of classes in the analysis where possible. This clustering model can be described as a combination of Plutchik's affective wheel [51], in which terms are arranged together by similarity, and the valence-arousal plane [52], which sets the polarity of each term.
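The thesaurus-based grouping can be illustrated with a small sketch that maps each free-text answer to a (category, polarity) pair, marks blank answers as empty and unmatched ones as unclassified, and then collapses categories into the three analysis classes used later ("calm", "sadness" and "other emotions"). The thesaurus entries, the "joy" and "tension" categories and the first-match rule are hypothetical; the actual 28-category model of Diaz & Flores [50] is not reproduced here.

```python
from collections import Counter

# Hypothetical excerpt of a thesaurus: surface term -> (category, polarity).
THESAURUS = {
    "calm": ("calm", "pleasant"), "relaxed": ("calm", "pleasant"),
    "tranquilidad": ("calm", "pleasant"), "sad": ("sadness", "unpleasant"),
    "tristeza": ("sadness", "unpleasant"), "joy": ("joy", "pleasant"),
    "alegría": ("joy", "pleasant"), "nervous": ("tension", "unpleasant"),
}


def classify(answer):
    """Map a free-text answer to (category, polarity) via the first known term."""
    words = answer.lower().replace(",", " ").split()
    if not words:
        return "empty", None
    for word in words:
        if word in THESAURUS:
            return THESAURUS[word]
    return "unclassified", None


def tally(answers):
    """Count the three analysis classes, the valence and the unusable entries."""
    classes, valence, unusable = Counter(), Counter(), Counter()
    for answer in answers:
        category, polarity = classify(answer)
        if polarity is None:
            unusable[category] += 1
            continue
        classes[category if category in ("calm", "sadness") else "other emotions"] += 1
        valence[polarity] += 1
    return classes, valence, unusable


print(tally(["Calm", "sad, tired", "", "alegría", "a river"]))
```

Keeping polarity alongside the category makes it possible to run both analyses (category frequencies and valence) from a single coding pass over the answers.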
To analyse the frequency distributions in the descriptive data, the groups were compared using a chi-squared test with continuity correction in two-by-two tables, with the significance level set at p < 0.05. Among the 753 descriptive entries for mental representations (251 subjects × 3 samples), 501 were classified into the categories defined above: nature (n = 300), others (n = 112) and self (n = 89); 23 could not be classified, and in 229 cases subjects gave no response (the box was left empty). Of the 753 entries for emotional states, 600 were classified, 9 could not be classified and 144 were left empty. Table 5 shows the contingency table for the five audio samples and the 501 classified mental representations, among which descriptions relating to 'nature' (59.9%) are the most abundant. The distributions differ significantly, because the mental representations elicited by the NS sample are mostly described with terms relating to 'nature' (91.7%). In the remaining (musical) samples, the percentages are distributed more evenly among the categories, with a slight predominance of 'others'. As for the described emotions, the 600 valid descriptions of emotional stimuli were classified into 20 of the 28 established categories. The most frequent emotional state was "calm" (57.5%), followed by "sadness" (13.3%), with the rest appearing at notably smaller percentages. Because there are many categories and study groups, the categories were collapsed into three classes for hypothesis testing: the two most frequent emotional states, "calm" and "sadness", and "other emotions". Table 6 shows the results. There is a significant difference for the NS group (p < 0.001), since most of its emotional descriptions are associated with "calm" (89.7%), while the remaining groups show similar percentages.
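The reported counts can be cross-checked with a short script: the classified, unclassified and empty entries should add up to 251 × 3 = 753 for both mental representations and emotions, and the 'nature' share should reproduce the 59.9% figure. The 'others' and 'self' percentages printed below are derived from the reported counts rather than quoted from the text.

```python
# Counts reported in the text for mental representations (251 subjects x 3 samples).
SUBJECTS, SAMPLES = 251, 3
MENTAL = {"nature": 300, "others": 112, "self": 89}
MENTAL_UNCLASSIFIED, MENTAL_EMPTY = 23, 229
EMOTION_CLASSIFIED, EMOTION_UNCLASSIFIED, EMOTION_EMPTY = 600, 9, 144


def shares(counts):
    """Percentage of each category over the classified entries, one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


entries = SUBJECTS * SAMPLES
classified = sum(MENTAL.values())
assert classified + MENTAL_UNCLASSIFIED + MENTAL_EMPTY == entries == 753
assert EMOTION_CLASSIFIED + EMOTION_UNCLASSIFIED + EMOTION_EMPTY == entries
print(classified, shares(MENTAL))  # 501 {'nature': 59.9, 'others': 22.4, 'self': 17.8}
```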
In addition, the emotional states were analysed according to their polarity (or valence) along the 14 defined axes. Table 7 shows the results of this arrangement. The majority of emotional descriptions fall into the 'pleasant' category, i.e., positive valence (75.7%). Nevertheless, there is again a significant difference for the NS group (p < 0.001), since 96.1% of its descriptions fall under 'pleasant', while for the music samples this figure ranges from 60.7% to 71.7%. In the analysis, no differences were observed with respect to the composer or the interpreter of the music (human or computer), and this lack of bias was observed for both professional musicians and non-musicians. The results are shown in Table 9. Musicians showed marginally higher values and a slight tendency to classify the piece CC as computer-made. In general, subjects failed to correctly identify the compositional source of the two musical samples.
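Sensitivity and specificity, as defined in the Methodology, can be computed together with 95% confidence intervals as sketched below. The text does not say which interval method was used; the Wilson score interval is a swapped-in choice here, and the counts are hypothetical, not the study's data.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_interval(successes, n, z=Z95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def sensitivity_specificity(tp, fn, tn, fp):
    """tp/fn: computer pieces judged computer/human; tn/fp: human pieces judged human/computer."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts, not the study's data.
print(sensitivity_specificity(tp=60, fn=40, tn=45, fp=55))
```

With two response options, an interval that contains 0.5 indicates identification performance compatible with guessing.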

Discussion and Conclusions
In this paper we have presented a system that combines formal grammars and evolutionary algorithms to compose music in both atonal and tonal styles. We have described how it works internally and how, unlike many other approaches, it does not require a pre-existing dataset of compositions. This property brings several important advantages:

• The product is innovative, since it is not based on imitation and has complete freedom to explore the search space defined by the (more restrictive or looser) input rules, which act merely as a check that the new samples comply with the commission. Apart from the fitness function and the evolutionary mechanisms, the other essential part of the system that enables this free search is its implicit encoding based on formal grammars. Grammars impose a hierarchical structure and favor behaviours such as the repetition of musical units, satisfying some of the basic requirements of music composition.
• The syntax used to write the genomes (and hence the music) is (a) highly expressive, since it can represent any piece of music that can be expressed in common music notation; (b) flexible, since any musical sequence can be written in infinitely many forms; (c) compact, meaning that despite including all the compositional and performance information, it consumes between a half and a third less storage than the equivalent in MIDI format, between a third and a quarter of a corresponding MusicXML file, and far less than any audio-based format; and (d) robust, meaning that if a genome is altered in any way, not only does it still produce a valid piece of music, but that piece also shares many elements with the original, as a mutation of it.
• The system is affordable: (a) to set up, since there is no need to search for and obtain samples from any external source; (b) in memory space, since there is no need to store or move large amounts of training data; and (c) in computation: while it does require an expert to input the rules and achieve convergence to a particular style, once this is done the execution is not very computationally demanding. Using a single CPU thread on a current computer, both the atonal and the tonal systems produce roughly one genuine composition in their most complex style every 6.5 min, and as many of these tasks as desired can be run in parallel.
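The hierarchy, repetition and robustness properties listed above can be illustrated with a toy rewriting grammar. This is a hypothetical sketch, not the Melomics genome syntax: keys are nonterminals, any other symbol is a note token, and a mutation replaces one right-hand-side symbol at random. Because every expansion bottoms out in note tokens and a depth cap stops runaway recursion, any mutated genome still yields a valid (if different) note sequence.

```python
import random

# Hypothetical toy grammar: keys are nonterminals, everything else is a note token.
GRAMMAR = {
    "S": ["A", "B", "A"],      # ternary form; block A is reused
    "A": ["M", "M", "c4"],     # motif M repeated inside block A
    "B": ["M", "e4", "g4"],
    "M": ["c4", "d4", "e4"],   # a three-note motif
}
NOTES = ["c4", "d4", "e4", "f4", "g4"]


def expand(symbol, grammar, depth=0, max_depth=8):
    """Rewrite `symbol` recursively into note tokens; the depth cap guarantees termination."""
    if symbol not in grammar:
        return [symbol]
    if depth >= max_depth:
        return []  # drop runaway recursion introduced by a mutation
    return [note for child in grammar[symbol]
            for note in expand(child, grammar, depth + 1, max_depth)]


def mutate(grammar, rng):
    """Copy the grammar and replace one right-hand-side symbol at random."""
    mutated = {key: list(rhs) for key, rhs in grammar.items()}
    key = rng.choice(sorted(mutated))
    position = rng.randrange(len(mutated[key]))
    mutated[key][position] = rng.choice(sorted(grammar) + NOTES)
    return mutated


print(expand("S", GRAMMAR))
print(expand("S", mutate(GRAMMAR, random.Random(1))))
```

Because the motif M is shared, a single edit to its rule changes all five of its occurrences at once, which mirrors in miniature how a grammar encoding favors coherent variation over scattered note-level noise.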
At the beginning, the system can be set up, if desired, using just a few simple, highly abstract parameters, such as the amount of dissonance, repetitiveness or the duration of the composition, which are then translated to the genomes that undergo the evolutionary process. It requires neither existing creative input from the user nor iterative interaction with them (i.e., an interactive fitness function). For a more specific purpose, people with musical knowledge can constrain the system to produce music in a particular style, in roughly ten minutes, through a set of global parameters (harmony, rhythm, instruments, etc.) that act as the fitness function and can still be considered highly abstract musical directions, no more than what would be given to an expert musician asked to compose.
However, the composition of music is normally regarded as a creative process through which humans can express and evoke sensations. Could this artificially created music ever be considered actual music? Defining music is a tricky question, even if only instrumental music is considered. For example, thinking of it as an organized set of sounds may be too broad, as it includes collections of sounds that, while organized, are not music; at the same time, it risks excluding the compositions made by Melomics, since until they are synthesized (or performed) they are just scores. Two properties usually required in the definition of music are tonality (or the presence of certain musical features) and an appeal to esthetic properties. The first can be guaranteed by Melomics through the encoding and the fitness function. The second is certainly harder to satisfy, since we cannot ascribe any esthetic intention to the software; but if we move the focus from the composer to the listener, the fact that the composer is a machine becomes less relevant. Among the various comments on Melomics music, Peter Russell's [53] positive judgement was of particular interest to us: without any knowledge of the origin of the composition, he expressed no doubt about the musical nature of the piece. This encouraged us to adopt this listener-centred understanding to assess whether the system generated actual music. The properties mentioned are not the only way to define music and, indeed, they have problems capturing all and only what humans usually consider music. There are multiple possible definitions, each with strengths and weaknesses, and their discussion belongs to the philosophy of music [54]. We therefore took a more practical approach: instead of examining Melomics's compositions from a philosophical perspective, we considered the opinion of critics and of the general public.
If they considered the end result as music, then it could be considered music.
The TT was designed to identify traces of thought in computer processing, and its interactive nature makes it difficult to adapt to a musical version. Nevertheless, the underlying principle of Turing's approach remains a valid inspiration for new tests that measure how close artificial music is to human music. In contrast with previous works, the experiment presented here follows a controlled and rigorous methodology for a trial of this nature, performed on a large sample of participants. The first question of the questionnaire was intended to measure potential differences in perception between the music sample composed by a musician and the other samples. In this regard, it is worth noting that the natural sounds sample was classified as music by 41.7% of the professional musicians, whereas both music samples were classified as music by most subjects (over 90%). The capability of eliciting mental images also supported the hypothesis of this experiment, as no significant differences were found among the musical samples, which elicited images in around 50% of the subjects. For the natural sounds sample, this measure rose to 90% (both in the specific question and in the presence of such terms in the descriptions). This is not surprising: natural sounds are associated with concrete and recognizable physical sources, whereas music is produced by musical instruments, and images arise in the form of memories not directly related to the perceived sounds, following a more abstract channel. The study of the qualitative data confirmed this: most of the terms used to describe the natural sounds fit the category "nature", unlike those used to describe the musical samples, which do not point to any single defined category.
These results highlight the difference between natural sounds and the music presented, which appears to generate a more complex and wider set of mental images, regardless of the listener's musical training or of who composed or interpreted the pieces. With respect to the evoked emotional states, one of the most revealing results was that a significant 89.7% of the descriptions of natural sounds were assigned to the state "calm", in contrast to the music recordings, which reached at most 47.3% in this category, even though the musical style presented was arguably calm. As with mental images, all the musical pieces seemed to elicit a wider range of emotions, regardless of the listener, the composer or the interpreter. The second interesting result came from the study of valence in the descriptions: they turned out to be significantly positive (p < 0.001) for the sounds of nature, whereas the music, with no relevant differences among the study groups, also elicited unpleasant feelings. The final part of the test confirmed that the subjects were unable to distinguish the source of the composition. Although only two musical pieces were used, the sample was large, and the fact that about half of it consisted of professional musicians indicates the robustness of the conclusions, which confirm the hypothesis and suggest that computer compositions might be used as "true music". This is, of course, a first result in this line of quality assessment. The model can be extended to other musical styles, provided that the automatic music synthesis module for each particular style is considered good enough by subjects such as those involved in the present experiment.
Many everyday situations could benefit from automatically produced music, especially those that are not the focus of composers. Automatic composition can make music creation accessible to more people, even those with little or no knowledge of the process, which could eventually lead to new styles. One application we would like to develop further in the future is adaptive music. Generative music that meets human expectations makes it possible to create tailored music (specific genres, structure, instrumentation, tempo, rhythm, etc.) that takes into account personal preferences as well as particular needs or goals, such as responding in real time to the evolution of physiological signals. This can be used, for example, in therapy (pain relief, sleep disorders, stress, anxiety) or to assist during physical activity.

Data Availability Statement:
The relevant data generated in this study is contained within the article or provided as supplementary material.