Article
Peer-Review Record

Combining Real-Time Extraction and Prediction of Musical Chord Progressions for Creative Applications

Electronics 2021, 10(21), 2634; https://doi.org/10.3390/electronics10212634
by Tristan Carsault *, Jérôme Nika *, Philippe Esling * and Gérard Assayag
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 31 August 2021 / Revised: 24 October 2021 / Accepted: 25 October 2021 / Published: 28 October 2021
(This article belongs to the Special Issue Machine Learning Applied to Music/Audio Signal Processing)

Round 1

Reviewer 1 Report

Having reviewed the work, I believe this is a very interesting study and it should be accepted for publication given some minor changes.

As far as the technical aspects go, the intention behind this work is clearly explained and the necessity of the work done is well justified. Below are a few remarks regarding the sections of the article.

Sections 1, 2, and 3 explain the intention (and some of the methodology) behind the work well. Moreover, the literature review not only supports the motivation for the work but also provides relevant background on the topics involved. However, one important note about these sections is that the discussion in Section 2.3 needs to be improved. Given that the DYCI2 library is used directly in this work (as explained in Section 6), I think it should be explained in more technical detail. I believe this is essential for a better understanding of the real-time system proposed in Section 6.

In Sections 4 and 5, the methodology and the evaluation of the ACE module and the Chord Sequence Predictor are well explained. The authors give a strong justification for why it is important to evaluate the performance of such systems from a musical point of view. The discussions provided in Sections 4 and 5 are very important and relevant.

In Section 5, I believe the selected architectures serve the purpose of the argument. Also, the parameter selections, as well as the design decisions, are detailed enough. Using mostly MLP models in this context helps the reader focus on the importance of evaluation in selecting musically suitable models for a generative task, rather than on utilizing “better” architectures without establishing what “better” means in this context.

For me, the only section in this article that was relatively hard to follow is Section 6. This section fails to properly detail how the two modules were implemented in the real-time system; I believe that more detail needs to be provided. That being said, the authors clearly mention that the real-time system detailed in this section is used to demonstrate a possible real-time use case of the modules detailed in Sections 4 and 5, and that more evaluation of the proposed real-time system still needs to be done. As mentioned above, providing more detail about the DYCI2 library would perhaps also improve the readability of this section.

As a final note, there are some typos and grammatical errors that definitely need to be corrected. Moreover, some parts of Section 2 would greatly benefit from language editing to improve readability. Some of these issues are highlighted in the attached annotated version of the article.

As a final remark, I highly recommend accepting this article for publication, provided that the minor changes mentioned above are addressed in the revision.

 

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

Thank you for your constructive and helpful review, which has greatly improved the quality of our paper.

Please see the attachment.

 

Best regards

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper deals with a tool that recognizes played chords acoustically in real time (every beat) and predicts the upcoming chords. This way, a computer could "improvise" together with musicians or accompany them by playing the predicted chords... 

Along the way, they compare different loss functions that either treat a chord as simply right or wrong, measure the distance (in edges) along a Tonnetz, or count the number of correctly identified notes of a chord. Lastly, they analyze how information about the music, such as the key and the position of the first beat, affects the prediction results.
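[Editor's note: as context for these three comparison criteria, here is a minimal illustrative sketch. It is not the authors' implementation; the function names are hypothetical, chords are assumed to be pitch-class sets, and the Tonnetz is simplified to chord roots connected by fifth and third moves, whereas the paper's loss functions operate on full model outputs.]

# Illustrative sketch only: chords as pitch-class sets (0 = C, 1 = C#, ..., 11 = B).
from collections import deque

C_MAJ = {0, 4, 7}   # C E G
A_MIN = {9, 0, 4}   # A C E

def exact_match(pred, ref):
    """Binary criterion: the prediction is either entirely right or entirely wrong."""
    return int(pred == ref)

def shared_notes(pred, ref):
    """Number of pitch classes the predicted and reference chords have in common."""
    return len(pred & ref)

def tonnetz_root_distance(root_pred, root_ref, max_steps=6):
    """Edge count between two roots on a simplified Tonnetz, where each edge moves
    the root by a fifth (+/-7 semitones) or a third (+/-3 or +/-4 semitones)."""
    queue, seen = deque([(root_pred, 0)]), {root_pred}
    while queue:
        node, dist = queue.popleft()
        if node == root_ref:
            return dist
        if dist < max_steps:
            for step in (7, -7, 4, -4, 3, -3):
                nxt = (node + step) % 12
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return max_steps

print(exact_match(A_MIN, C_MAJ))        # 0: wrong under the binary criterion
print(shared_notes(A_MIN, C_MAJ))       # 2: yet two of three notes are correct
print(tonnetz_root_distance(9, 0))      # 1: A and C roots are one Tonnetz edge apart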

This is a fine paper and definitely a whole lot of work was necessary. I really appreciate that you try to make machine learning for music more musical (this is still too rare in the field of MIR)! But I'm afraid the benefit of the new loss functions and of adding key information is marginal. The benefit of your suggested musical loss functions (Tonnetz and correct notes) is marginal in diatonic music and counter-productive in non-diatonic music. Likewise, information on the key is only helpful in diatonic music but counter-productive in non-diatonic music.

More importantly: when you really think of improvised music, the key may be unknown when you start to jam. Furthermore, I wonder how reliably your beat detection works. You do refer to [61], so I assume that paper describes a beat detection algorithm (if so, you should state that more clearly). But how reliable is it on hand-made, improvised music, where the tempo can change, the drums are not looped, and the instrumentation starts sparse and thin but grows when more people join in, then thins out when someone starts a virtuosic solo, and so on? I think looking at the key is not the right research question; the right question may rather be looking at real improvised music.

How did you deal with music that has multiple chords per beat? Which chord is right and which is wrong, and what is the Tonnetz or Euclidean distance in that case?

I find it dubious that the authors cite so many preprints of articles on musical chord recognition, transcription and automatic composition (18, 21, 27, 28, 45, 60, 64)... why are there no papers published in reliable journals?


Many additional, mostly minor things:

Section 1: I find it weird that you have no Introduction section

line 29: What do these references refer to? The definition of structure? Please make that clear.

Section 2: You refer too much to your own previous work: 5, 13, 14, 15, 17, 18... I think you should reduce that, as you mostly list some exemplary works. You could include some other researchers' works. Have you ever heard of Alexandraki's approach? https://doi.org/10.1080/09298215.2015.1131990 and http://ediss.sub.uni-hamburg.de/volltexte/2014/7100/
She used machine learning to recognize what musicians play in real time and to predict what will be played next. Her application idea is networked music performance, in which latencies make synchronous ensemble playing impossible; so her tool predicts the upcoming note and sends it to the others before it has actually been played.

line 201: I am pretty sure you never introduced the abbreviation "ACE" before.

line 272: Even in Western tonal music there are only 12 pitch classes if you refer to equal temperament and enharmonic equivalency... 

line 273: you should write "for example in jazz music", as there are more examples in which chords other than A0 exist.

line 290: Please don't write that you "chose" the ACE Analyzer... it's your own tool, so please explicitly say so.

line 292: why is the website anonymous?

line 322: should read "analyses"

Section 4.2.4: You should cite literature on Euler's Tonnetz, either the original work
[Euler, Leonhard (1739). Tentamen novae theoriae musicae ex certissimis harmoniae principiis dilucide expositae (in Latin). Saint Petersburg Academy. p. 147.] or some secondary literature that explains it well.

lines 575-576: I disagree with the statement that chord changes mainly occur on the downbeat; it took me one minute to come up with 11 EDM examples where the chord changes are anticipated, occur on beat 3 (in 4/4 time), or change even more often ->
https://www.youtube.com/watch?v=k4Ge2BkaieI
https://www.youtube.com/watch?v=RB1CGIPvfFc
https://www.youtube.com/watch?v=wlFx7pPvr1I
https://www.youtube.com/watch?v=fwC9HB-vHyc
https://www.youtube.com/watch?v=NlTiihho9mg
https://www.youtube.com/watch?v=nFg5uIvl6YY
https://www.youtube.com/watch?v=YK3ZP6frAMc
https://www.youtube.com/watch?v=9c5yPIQ3LQI
https://www.youtube.com/watch?v=fcsvE1zv1ek
https://www.youtube.com/watch?v=uuH_nYKXqHM
https://www.youtube.com/watch?v=_5IYhxNH8fo

and in rock music, too:
https://www.youtube.com/watch?v=77xeMlN7ng4
https://www.youtube.com/watch?v=Bz7IGr3hWog
https://www.youtube.com/watch?v=S9tKwSboJeg
https://www.youtube.com/watch?v=UNo2-viKfW8
https://www.youtube.com/watch?v=SCCoO0_CSZA
https://www.youtube.com/watch?v=xy5S8s6rTH0
https://www.youtube.com/watch?v=FhBnW7bZHEE


line 578: should read "shown"

line 589: But what is the "key" worth in non-diatonic music, like some jazz or intelligent drum & bass music? Do you exclude 5 out of 12 chromatic notes? How do you deal with the harmonic and melodic minor scales? Do you consider them as being diatonic or not... or the other way round: does E minor have an F# for you or not?

Table 6: your key prediction accuracy is pretty low; I am sure there are already more successful models out there.

line 651: A naive baseline is not helpful; I suggest removing it.

lines 661-662: "A beam search is used during the inference, it that saves only the top 100 states at each step" -> this sentence is weird.

lines 741-742: "This can probably be explained by the fact that in this type of corpus, the underlying harmony often varies every 2 or 4 beats." -> this contradicts your statement that chords mostly change on the downbeat. Even though you cited a reference to underline the downbeat statement, I suggest removing it.

Table 8: There is a stray dot above the table. Also, instead of a capital "K" you should use a lowercase "k" to abbreviate 1,000, as K means kelvin and k means kilo in SI.

Table 9: "inclusions or" should read "inclusions of"

Section 5.4.2.2: It seems that only ⊂Maj occurs frequently, probably because the dominant is usually a major chord that often uses a seventh and sometimes a ninth, too. Of course, accompanying with a C:Maj chord when the bar should have a C:Maj7 is totally acceptable. So I agree that these 7-8% of errors are mild ones that are totally acceptable. However, I disagree with the statement "To conclude, the ratio of weak errors increases when using high-level musical information."... the differences between the approaches (MLP-K, MLP-B, MLP-KB, ...) are too subtle to draw such a conclusion.

line 871: "Indeed, as passing chords, they are often used at the same positions in turnarounds, cadences, and other classical sequences, which may explain why the information on downbeat helps identify them." -> Yes, this is very convincing for the given dataset

Section 6: Why is this section located here? You should probably mention these details earlier or relegate this material to the appendix.

lines 908-909: "However, most songs have a tempo located broadly between 40 and 200." -> you mean between 40 and 200 bpm(!)... Personally, the slowest song I own has a tempo of 60 bpm (a slow reggae tune) and the fastest around 220 bpm (ska and some punk rock), except for Moby's "Thousand", which goes up to 1,000 bpm.

Appendix: You should either
1. refer to the appendix in the main paper, or
2. give more explanation of the content of the appendix and the reason why you included it, or
3. remove it.


Overall, you should be careful with generalizations (e.g., that the most frequent non-diatonic chord is the double dominant), as your dataset is very narrow and mostly covers a bit of beat/rock/pop/psychedelic music.


Your list of references is a mess that you should clean up:
- most of them are missing a DOI or another permalink, like URI, URN or a linked arXiv-code
- you should link whenever possible the arXiv code, the DOI and the URL/URI/URN
- sometimes you write ICMC, sometimes "Proceedings of ICMC" instead of writing "Proceedings of the International Computer Music Conference", the same with "International Symposium on Music Information Retrieval" (37) and "Proceedings of ISMIR" (38) and "ISMIR" (51)... you should be consistent and not abbreviate everything
- 15, 26 and many more are proceedings papers, so please indicate this
- 20 "ieee" uses capital letters
- MIREX or mirex? you should be consistent
- What is MLSP?

Author Response

Dear Reviewer,

Thank you for your constructive and helpful review, which has greatly improved the quality of our paper.

Please see the attachment.

 

Best regards

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Dear authors,

Thank you for addressing most of my concerns carefully and in detail.

There are only some minor things left:

lines 49-51: "However, they could take a step forward by being able to infer this sequence dynamically as a plays." -> This sentence seems misleading to me. I think you want to state that this ability would be great. But it reads as if they were able to achieve this already... 

line 887: "that one single beat" should read "than one single beat"

Your reference list still lacks links that make the references easy to find, either via HTTP, a DOI, a URI/URL/URN permalink or alike:

[1] has a permalink http://hdl.handle.net/2027/spo.bbp2372.2017.037
[4] has a DOI: https://doi.org/10.1145/1178723.1178742

...

and so on and so forth

As I stated before:
- you should link whenever possible the arXiv code, the DOI and the URL/URI/URN

Author Response

Dear reviewer,

Thank you again for your careful reviews.

Your comments have been incorporated into the revised manuscript, and DOI/permalink/arXiv links have been added to the references.

Best regards
