A Two-Step Approach for Classifying Music Genre on the Strength of AHP Weighted Musical Features

Abstract: Music is a series of harmonious sounds well arranged by musical elements including rhythm, melody, and harmony (RMH). Since music digitalization has resulted in a wide variety of new musical applications used in daily life, music genre classification (MGC), especially MGC automation, is increasingly playing a key role in the development of novel musical services. However, achieving satisfactory performance of MGC automation is a practical challenge. This paper proposes a two-step approach for music genre classification (called TSMGC) on the strength of analytic hierarchy process (AHP) weighted musical features. Compared with other MGC approaches, the TSMGC has three strong points for better performance: (1) various musical features extracted from the RMH and the calculated entropy are comprehensively considered, (2) the weights of the features and their impact values determined by AHP are applied on the basis of the Exponential Distribution function, (3) music can be accurately categorized into a main class and further sub-classes through a two-step classification process. The conducted experiment exhibits an accuracy rate of 87%, which demonstrates the potential for the proposed TSMGC method to meet the emerging needs of MGC automation.


Introduction
Music is an essential part of human civilization. Different types of music composed of various elements (e.g., rhythm, timbre, and tempo) are played on numerous and various occasions, depending on the situation. In recent years, the digitalization of music has dramatically extended the range of available music applications. According to [1], since users prefer to browse music by genre rather than by other criteria, the technique of differentiating music by genre (music genre classification, MGC) has become a popular tool for choosing appropriate music for an occasion or application. In particular, MGC automation addresses the need to deal with a large amount of music within a specified time; for example, MGC is used for musical treatments [2].
Corrêa and Rodrigues [3] presented a review of the most important studies on MGC. In an effort to compare the pros and cons of the published research literature, they investigated the most common music features and classification techniques used. In the field of music information retrieval, MGC methods are generally categorized into those that analyze song lyrics and those that analyze musical content [4]. Although analyzing song lyrics is a straightforward way to identify the emotional expression of music [5][6][7][8], its performance is practically limited for songs with few lyrics. For this

Entropy Analysis Method
The entropy analysis method is applied to calculate the entropy of the analyzed data and is used to generate an indicator of complexity as a feature factor for classification. From a realistic point of view, many effective solutions to real-world problems rely on non-linear approaches. Accordingly, non-linear entropy analysis methods, including approximate entropy (ApEn) and sample entropy (SampEn), are frequently applied as information-gaining methods to obtain vital data features.
The ApEn method, a non-linear approach used to measure the complexity of a data series, was proposed in 1991. ApEn is often adopted to present the complexity of time series data as a non-negative number which indicates the probability of a new message being created from the time series data. When the complexity of the time series is high, the ApEn value is large [21]. Suppose that the dataset of a time series consisting of N values is recorded as u(1), u(2), . . . , u(N); an m-dimensional vector X(i), where m is the embedding dimension (the number of requested data points), is defined as [u(i), u(i+1), . . . , u(i+m−1)], where 1 ≤ i ≤ N − m + 1. The ApEn value, ApEn(m, r, N), can be obtained by Equations (1) to (4). First, Equation (1) is used to calculate the distance between X(i) and X(j), denoted d[X(i), X(j)]. Next, letting r be a threshold, Equation (2) computes the value of C_i^m(r) from the condition that d[X(i), X(j)] is less than or equal to r. Then, Equation (3) determines the value of the defined Φ^m(r). Finally, the ApEn value, ApEn(m, r, N), is obtained by Equation (4).
ApEn(m, r, N) = Φ^m(r) − Φ^(m+1)(r) (4)

SampEn, proposed in 2000, is considered an enhanced ApEn. Compared with ApEn, SampEn has better classification performance, especially in terms of accuracy and consistency [22]. SampEn also adopts Equations (1) and (2) to calculate the distance between X(i) and X(j) and the value of C_i^m(r). However, the functions for computing Φ^m(r) and SampEn(m, r, N) are given as Equations (5) and (6).
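The two entropy measures above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration; it assumes the distance of Equation (1) is the Chebyshev (maximum coordinate) distance and that natural logarithms are used, both standard choices for ApEn and SampEn.

```python
import math

def _phi(u, m, r):
    """Phi^m(r) of Equations (2)-(3): the average log-fraction of templates
    X(j) lying within distance r of each template X(i)."""
    X = [u[i:i + m] for i in range(len(u) - m + 1)]
    C = [sum(1 for Xj in X
             if max(abs(a - b) for a, b in zip(Xi, Xj)) <= r) / len(X)
         for Xi in X]
    return sum(math.log(c) for c in C) / len(C)

def apen(u, m, r):
    """Equation (4): ApEn(m, r, N) = Phi^m(r) - Phi^{m+1}(r)."""
    return _phi(u, m, r) - _phi(u, m + 1, r)

def sampen(u, m, r):
    """SampEn(m, r, N) = -ln(A/B) (Equations (5)-(6)): B and A count the
    template pairs (i != j) of lengths m and m + 1 within distance r.
    Unlike ApEn, self-matches are excluded."""
    nt = len(u) - m  # the same number of templates is used for both lengths

    def count(length):
        X = [u[i:i + length] for i in range(nt)]
        return sum(1 for i in range(nt) for j in range(nt)
                   if i != j
                   and max(abs(a - b) for a, b in zip(X[i], X[j])) <= r)

    B, A = count(m), count(m + 1)
    return -math.log(A / B) if A > 0 and B > 0 else float("inf")
```

A perfectly regular series (e.g., a constant or a repeating two-note pattern) yields values near zero, while irregular series yield larger values, matching the interpretation above.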

Exponential Distribution
In order to estimate the impact value of the selected musical features in the later feature extraction operation, the Exponential Distribution is used. The Exponential Distribution is a probability distribution that describes the time (or distance) between events that occur continuously and independently at a constant average rate. In probability theory, a probability density function (PDF) is a function whose value at any given sample in the sample space can be interpreted as the relative likelihood that the value of the random variable would equal that sample. The ratio of the PDF values at two different samples indicates how much more likely the random variable is to equal one sample compared with the other. The formulation of the PDF is described as Equation (7).
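Assuming Equation (7) is the standard exponential PDF, f(x; λ) = λe^(−λx) for x ≥ 0, it can be sketched directly:

```python
import math

def exp_pdf(x, lam):
    """PDF of the Exponential Distribution with rate lam (Equation (7),
    assumed here to be the standard form): lam * exp(-lam * x) for x >= 0,
    and 0 otherwise. The mean of this distribution is 1 / lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0
```

The rate parameter λ is the reciprocal of the mean, which is how the class means enter the feature extraction step later in the paper.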

Analytic Hierarchy Process
According to Bishop [23], a feature is an individual measurable property or characteristic of a phenomenon being observed. Choosing informative, discriminating, and independent features is a crucial step for effective algorithms in pattern recognition and classification, but not all features contribute to the results. The analytic hierarchy process (AHP) proposed by Saaty [24] is a quantitative method capable of evaluating and analyzing multiple factors for decision-making applications through a meaningful and repeatable process. The use of AHP can help in selecting feature vectors that are helpful for the classification result, improving the classification accuracy and the operational efficiency of feature-oriented classification applications.
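As an illustrative sketch (not the paper's exact computation), AHP priority weights can be derived from a pairwise comparison matrix using the common row geometric-mean approximation of the principal eigenvector:

```python
import math

def ahp_weights(M):
    """Approximate AHP priority weights from a pairwise comparison matrix M,
    where M[i][j] is the judged importance of criterion i relative to
    criterion j and M[j][i] = 1 / M[i][j]. The row geometric means are
    normalized to sum to 1, a standard approximation of Saaty's principal
    eigenvector method."""
    n = len(M)
    gm = [math.prod(row) ** (1.0 / n) for row in M]  # geometric mean per row
    total = sum(gm)
    return [g / total for g in gm]
```

For example, a 2 x 2 matrix stating that the first criterion is three times as important as the second yields weights of 0.75 and 0.25.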

Machine Learning
Machine learning is a field of computer science that uses statistical techniques to give computer systems the ability to progressively improve performance on a specific task with data, without being explicitly programmed. There are many machine learning techniques [25] applied to MGC, such as artificial neural network (ANN), k-nearest neighbors (kNN), support vector machine (SVM) [26] and deep learning [27]. Of these approaches, ANN and kNN are the most used in the MGC field.
ANN systems [25] are computing systems roughly based on the biological neural networks that constitute animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons. Each connection between artificial neurons can transmit a signal from one to another. The artificial neuron that receives the signal can process it and then signal artificial neurons connected to it. In common ANN implementations the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is calculated by a non-linear function of the sum of its inputs. Artificial neurons and connections typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that only if the aggregate signal crosses that threshold is the signal sent. Typically, artificial neurons are organized in layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first (input), to the last (output) layer, possibly after traversing the layers multiple times.
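The layered computation described above can be illustrated with a minimal forward pass (an illustrative sketch, not the network used in this paper; the logistic sigmoid is assumed as the non-linear function):

```python
import math

def ann_forward(x, W1, b1, W2, b2):
    """Forward pass of a minimal fully connected network with one hidden
    layer: each artificial neuron applies a non-linear function (here the
    logistic sigmoid) to the weighted sum of its inputs plus a bias."""
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(W2, b2)]
```

During training, the weights in W1 and W2 are the quantities that are adjusted to strengthen or weaken each connection.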
kNN [25] is one of the simplest supervised learning algorithms among machine learning technologies. It is a non-parametric method frequently used for classification and regression. In kNN classification, the input consists of the k closest training examples in the feature space, and the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its single nearest neighbor.
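The majority-vote rule can be sketched as follows (a minimal illustration assuming Euclidean distance in the feature space):

```python
from collections import Counter
import math

def knn_classify(train, query, k):
    """Classify `query` by majority vote among its k nearest training
    examples; `train` is a list of (feature_vector, label) pairs and the
    distance is Euclidean."""
    neighbors = sorted(train,
                       key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to nearest-neighbor assignment, as noted above.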

Design Highlights
When a machine learning based MGC method, like those in [2,14-16], selects a single factor to perform the classification operation, it will struggle to classify music into the proper classes if the chosen musical feature does not differ significantly among the evaluated compositions. In order to perform MGC effectively and efficiently, the following design concepts should be considered. First, in input analysis, multiple features with reasonable weight settings are necessary to differentiate the music. Second, a two-step process applying hierarchical classes (i.e., more genre levels) will help to precisely categorize the examined musical composition into the correct class.

The Proposed TSMGC Approach
This section introduces the proposed two-step MGC approach based on AHP weighted musical features (TSMGC). The core of the TSMGC approach is an AHP-supported classification model based on accessing several musical features of the music content. Before introducing the design details of the proposed TSMGC, the notations used for presenting the related operations, including musical content retrieval (MCR), musical content analysis (MCA), feature extraction (FE), and two-step classification (TSC), are defined in Table 1.

Table 1. The definition of parameters.

N: the number of musical compositions
n: the total number of classes
nm: the number of main classes
ns_i: the number of sub-classes belonging to the ith main class, where 1 ≤ i ≤ nm
m: the number of features
nn_s: the number of musical notes in the sth musical composition
p_i^s: the pitch value of the ith musical note of the sth musical composition, where 1 ≤ i ≤ nn_s
i_{i,i+1}^s: the musical interval between the ith and the (i+1)th musical notes of the sth musical composition, where 1 ≤ i ≤ nn_s − 1
P^s: the set of pitches of the sth musical composition
I^s: the set of musical intervals of the sth musical composition
C^s: the set of chords of the sth musical composition
R^s: the set of rhythms of the sth musical composition
E^s: the set of pitch entropies of the sth musical composition
f_i^s: the value of the ith feature of the sth musical composition, where 1 ≤ i ≤ m
w_i^s: the weight of the ith feature of the sth musical composition, where 1 ≤ i ≤ m

The Operation of the TSMGC

TSMGC includes three operating phases: model training, model testing, and music classification, as shown in Figure 1. In the model training phase, musical contents are first retrieved by MCR from the respective training data. An MCA process then generates the features, including pitch, musical interval, chord, rhythm, and pitch entropy. Next, the weight of every feature is measured and determined by the FE method, which helps to build a machine learning-based MGC model. The purpose of model testing is to evaluate the performance of the constructed MGC model; the testing data (music) excluded from the training dataset are used for the MCR, MCA, FE, and TSC processes. If the testing result is acceptable, the MGC model is confirmed; otherwise, the approach returns to the model training phase to build a new MGC model.
The music classification phase executes MCR, MCA, FE, and TSC based on the confirmed MGC model to categorize the music to be classified into a proper main class (main genre) and corresponding sub-classes (sub-genres).

Mathematics 2018, 6, x FOR PEER REVIEW 6 of 19

Figure 1. The procedure of the two-step approach for music genre classification (called TSMGC) operation. MCR: musical content retrieval; MGC: music genre classification.

Musical Content Retrieval
The purpose of MCR is to retrieve the musical content (e.g., the set of musical notes) from the target data (music). In this study, jMusic [28] is used to analyze and retrieve the musical notes in MIDI (musical instrument digital interface) format. Each musical note consists of two musical elements, pitch and rhythm. The output of MCR is a set of numerical values transformed from the musical content of the target data in accordance with the musical elements; this is used in the MCA operation.

Musical Content Analysis
The MCA process measures the numerical values of the features including pitch, musical interval, chord, rhythm, and the entropy of the pitch. The process of determining the respective features for analysis is as follows.
(1) Pitch. The pitch of each musical note can be generated as a numerical value according to the defined numerical value of a musical alphabet (as shown in Table 2). Any two notes that have the same pitch but different musical alphabets are treated as enharmonic equivalents. For instance, the notes with alphabet values C and B♯ are enharmonic notes; thus, the transformed numerical values of these two notes are both equal to 1. Based on this rule, a total of 78 pitches can be referenced in a musical composition. The value of each pitch-oriented feature is the frequency with which the corresponding pitch appears. Furthermore, since the highest and lowest pitch values are also considered, 80 values in terms of pitch-oriented features can be collected.
(2) Musical Interval. An n-gram segmentation approach is applied to transform a note into a gram.

Assuming the gram size (n) equals 2, Equation (8) is used to measure the value of the musical interval between two successive notes. Traditional music theory defines the basic musical intervals, but the influence of semitones and enharmonic notes should also be considered. The use of a semitone may form an additional musical interval [29]; on the other hand, enharmonic notes may produce equivalent intervals. For instance, the interval from C to D♯ is an augmented second (A2), while that from C to E♭ is a minor third (m3). Since D♯ and E♭ are enharmonic notes, the musical intervals A2 and m3 are equivalent. Furthermore, transforming a compound interval into a simple interval significantly simplifies the identification of musical intervals. Based on these rules, the numerical value of every musical interval can be calculated. For example, on a semitone basis, the interval from C (i.e., pitch value 1) to F♯ (i.e., pitch value 7) is an augmented fourth (A4), so the numerical value of this interval is set to 6 (i.e., 7 minus 1). Table 2 shows the numerical values of the defined musical intervals. The value of each musical interval feature is the number of times it appears in the musical composition. As a result, 12 musical interval-oriented features are determined.
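The interval counting described above can be sketched as follows. This is an illustrative sketch of Equation (8) only: it assumes pitches are already the numerical values of Table 2, measures each interval as the difference between successive pitch values, and folds compound intervals into simple ones modulo 12.

```python
from collections import Counter

def interval_features(pitches):
    """Count the intervals between successive notes on a semitone basis,
    folded modulo 12 so that compound intervals become simple ones,
    yielding the 12 interval-oriented feature values."""
    counts = Counter(abs(b - a) % 12 for a, b in zip(pitches, pitches[1:]))
    return [counts.get(i, 0) for i in range(12)]
```

Each of the 12 returned values is the number of times the corresponding interval appears in the composition, matching the feature definition above.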
(3) Chord. The set of pitches from a musical composition can be used to determine its tonality type by comparison with tonality characteristics. Each musical composition has a different set of chords according to its tonality; there are seven basic chords in a tonality. Take the 12 Variations on "Ah, vous dirai-je, Maman" by Mozart as an example; since it contains no sharp or flat accidentals (as shown in Figure 2), the tonality of this musical composition is perceived as a major scale based on C. In the first measure, only chord I is included. The second measure contains chords IV and I, the third measure contains chords vii and I, and the fourth measure contains chords V and I. The number of times each chord appears in a musical composition is recorded as the value of the respective chord-oriented feature. In this part, 7 chord-oriented features can be generated.
(4) Rhythm. According to the Oxford English Dictionary II, a rhythm generally means a "movement marked by the regulated succession of strong and weak elements, or of opposite or different conditions." In this study, 3 rhythm-related features are defined to record the lowest note value (i.e., the shortest duration), the highest note value (i.e., the longest duration), and the average note value of the musical composition.
(5) Entropy of the pitch. The ApEn and SampEn methods are adopted in this analysis. First, the pitches of a musical composition are transformed into a time series; the dataset of this time series is the core material for entropy computation. If 3 is chosen as the threshold and 2 as the number of dimensions, then ApEn(2, 3, nn_s) and SampEn(2, 3, nn_s) can be calculated as the pitch entropies.
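The three rhythm-oriented features above can be computed directly (a minimal sketch; durations are assumed to be note values, e.g., in beats):

```python
def rhythm_features(durations):
    """The 3 rhythm-oriented features described above: the lowest note
    value (shortest duration), the highest note value (longest duration),
    and the average note value of a musical composition."""
    return (min(durations),
            max(durations),
            sum(durations) / len(durations))
```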

Table 2. Musical alphabet, enharmonic note, and numerical value.

Feature Extraction
The main purpose of feature extraction is to determine the weight (impact value) of each of the 104 features. Significant features can be determined according to their feature weights. In this process, the exponential distribution is assumed to describe the distribution of the feature values; the impact value of a feature can be estimated by calculating the intersection area under the two curves representing the probability density functions (PDFs) of the feature in two selected classes. Figure 3 shows the intersection area under two PDF curves for a feature in Classes 1 and 2. The solid line in Figure 3 represents the PDF of Class 1, and the dashed line represents the PDF of Class 2. The intersection point of these two PDF curves is X, and the shaded area denotes the intersection area under the two curves. The smaller the intersection area, the more significant the feature. Conversely, a larger intersection area implies that the feature has less impact.
Figure 2. The chord mapping in the first four measures of 12 Variations on "Ah, vous dirai-je, Maman" by Mozart.

In order to estimate the intersection area, the PDFs of Classes 1 and 2 are defined as Equations (9) and (10), respectively. The mean of the feature values in Class 1 is 1/λ_1, and that in Class 2 is 1/λ_2. The intersection point x can be measured by Equation (11). Next, Equation (12) can be used to estimate the intersection area under these two PDF curves.
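Under the exponential assumption, the intersection point and area of Equations (11) and (12) have a closed form, sketched below. This is a minimal illustration: λ1 and λ2 are the rate parameters of the two class PDFs (the reciprocals of the class means), and the two curves are assumed to follow the standard exponential PDF.

```python
import math

def overlap_area(lam1, lam2):
    """Intersection area under two exponential PDFs with rates lam1 and
    lam2. The curves cross at x = ln(lam1/lam2) / (lam1 - lam2)
    (cf. Equation (11)); the overlap is the integral of the smaller PDF
    on each side of that point (cf. Equation (12)). A smaller area means
    a more discriminative feature."""
    if lam1 == lam2:
        return 1.0  # identical distributions overlap completely
    hi, lo = max(lam1, lam2), min(lam1, lam2)
    x = math.log(hi / lo) / (hi - lo)  # intersection point of the two PDFs
    # Below x the lower-rate PDF is the smaller curve; above x it is the
    # higher-rate PDF, whose tail decays faster.
    return (1.0 - math.exp(-lo * x)) + math.exp(-hi * x)
```

For example, with class means of 0.5 and 1 (rates 2 and 1), the curves cross at x = ln 2 and the overlap area is 0.75.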
Next, AHP is applied to adjust the feature weights as well as to rank the features by analyzing the number of records in each class and the weight of each feature. Given a set of n compared attributes denoted as {a_1, a_2, ..., a_n} and a set of weights denoted as {w_1, w_2, ..., w_n}, the matrices M and W for the AHP computation can be built using Equation (13) [21].
Based on the above, the final weight matrix W_F can be calculated using Equation (16). Thus, the impact value of every feature (i.e., ω_k for the kth feature) can be determined. The above AHP-related operations give a specific weighting value for each feature, so the features most important to the classification results can be screened out to improve the MGC accuracy and reduce the time required for classification.

Two-Step Classification
The core concept of this two-step classification (TSC) is that music can be grouped not just into a few main classes (e.g., classical music, popular music, and rock music) but also into expanded sub-classes (e.g., medieval music, baroque music, classical era music, romantic music, and modern music for classical music). For this purpose, a TSC-enabled MGC model is required. For training and testing such an MGC model, this study adopts both kNN and ANN [30]. In theory, there will be n_p models to be trained for classifying the main classes and n_i models for classifying the sub-classes of the ith main class. Based on the confirmed MGC model, the TSC identifies the proper main class and further determines the corresponding sub-classes.
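The TSC dispatch itself can be sketched as follows. This is a minimal illustration, not the paper's exact implementation; the models are hypothetically assumed to expose a scikit-learn-style predict() taking a batch of samples.

```python
def two_step_classify(x, main_model, sub_models):
    """Two-step classification (TSC) sketch: the confirmed main-class model
    first picks the main genre, then the sub-class model trained for that
    genre picks the sub-genre. `sub_models` maps each main-class label to
    its sub-class classifier."""
    main = main_model.predict([x])[0]       # step 1: determine the main class
    sub = sub_models[main].predict([x])[0]  # step 2: sub-class within it
    return main, sub
```

The design choice here is that only the sub-class model belonging to the identified main class is ever consulted, which is what allows sub-genres from different main genres to share feature values without confusing the classifier.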

Experiments and Analyses
This section presents a TSMGC experiment with the details of experimental settings including tools, samples, evaluation factor, and the execution. Based on the experiment results, the performance of TSMGC is analyzed and compared with other MGC methods.

Tools for Experiment Implementation
The Java and Matlab languages were adopted to develop and implement this experiment. The Java-based jMusic [28] program (version 1.6.4) was applied to perform the MCR processes. Eclipse (version 4.4.2), an integrated development environment, was used to implement MCA functions capable of generating and accessing the values of the features associated with pitch, musical interval, chord, rhythm, and pitch entropy. Matlab (version R2014a) was adopted to implement the proposed FE and TSC operations.

Samples and the Classes
The main classes for music defined in this experiment include classical music, popular music, and rock music. The classical music main class contains five sub-classes: medieval music, baroque music, classical era music, romantic music, and modern music. The popular music main class consists of five sub-classes: pop 1960s music, pop 1970s music, pop 1980s music, pop 1990s music, and pop 2000s music. The rock music main class consists of the 1960s music, classic rock music, hard rock music, psychedelic rock music, and 2000s music sub-classes. Table 3 describes the 141 selected samples and their class settings.

The Evaluation Factor
The evaluation factor for this experiment is the accuracy rate computed by Equation (17), where N is the total number of sample cases and MGC_c is the number of correct classifications (i.e., the accuracy rate is MGC_c divided by N). A higher accuracy rate indicates better performance of the adopted MGC approach.

The Experiment Execution
In this study, the data distribution of the extracted features is theoretically assumed to be exponential. In order to verify whether the data distribution of the extracted features is exponential, the chi-squared test was used to verify the goodness-of-fit between the real distribution of the sample data and the expected one. For example, Figure 4 shows the feature data distribution of Chord V based on 48 pieces of popular music. The circle points denote the PDF of the practical data distribution (Real), and the rectangle points denote the PDF of the exponential distribution (Exp). Because the test result is χ² = 3.24 < χ²(n−1 = 19) = 30.144 at α = 0.05 [31], no significant difference was observed; this implies that the feature follows an exponential distribution.
In terms of arranging the training data and test data, a k-fold cross-validation method was used [26]. When a data sample is chosen as test data, the remaining 140 data samples are used as training data. Assuming that a selected test sample belongs to the medieval music sub-class in the classical music category, the training data contains 50 classical music, 48 popular music, and 42 rock music samples. Based on the above rule, a total of 141 executions were used to build the MGC models.
For each training round, the 140 training data samples were used to build four MGC models: one for main class classification and the other three for sub-class classification (i.e., for classical, popular, and rock, respectively). The features significant to the four classes were individually handled by the MCA and FE operations. After this, the kNN and ANN machine learning approaches were used to extract the feature data, including values and weights, to train the MGC models. In the experiments, the value of k was 2 for the kNN method, and the structure of the ANN included one hidden layer with ten neurons. Next, the model testing operation was performed to evaluate the trained models until the accuracy rate of the models was accepted and thus confirmed. The confirmed main class classification model was able to determine the main class of the musical composition (i.e., classical, popular, or rock music). In both model testing and music classification, the confirmed sub-class classification models were able to classify the evaluated data into appropriate sub-classes based on the identified main class.

Results
The results of the different scenarios are listed in Table 4. CI is the case identifier. SC is the class set, which consists of Main, Classical, Popular, Rock, and All-mixed. The All-mixed type is a combination of all the classes, without differentiating between main class and sub-class. SN represents the number of samples used. LN is the number of labels applied for indicating the classification output. Taking the Main class as an example: since it is composed of three output labels (Classical, Popular, and Rock), its LN is 3. The remaining LN values of the sub-classes can be deduced by analogy. MM denotes the adopted machine learning method, either kNN or ANN. FE indicates whether or not feature extraction is performed. FW indicates whether or not AHP is carried out to weight the features. TS indicates whether or not the TSC process is performed. Finally, AC is the accuracy rate stating the performance of the MGC for a specific scenario. This study considers the uses of feature weights, feature extraction, and the two-step method for comparison. Table 4 shows that the proposed feature extraction method with AHP feature weights can improve the accuracy of music classification. For instance, the accuracy of music classification based on the proposed method with ANN can be improved to 87.23% for the main class. For two-step classification, the proposed method can extract the important features and give them AHP weights, and its accuracy of 57.19% is higher than that of the other cases.

Discussions
Referring to the review summary in article [3], the proposed TSMGC approach belongs to the machine learning enabled, symbolic data-based MGC methods that use global features. However, TSMGC can enhance the performance of MGC through the following points: (1) it combines various musical features extracted from the RMH with the calculated entropy, (2) it applies the weights of the features and their AHP-determined impact values on the basis of the probability density function, and (3) it performs a machine learning based two-step classification process to more accurately categorize a musical composition into a main class and further sub-classes.
Based on the experimental results shown in Table 4, this paragraph compares the performance (i.e., accuracy) of the proposed TSMGC method with those of MGC methods whose feature extraction uses musical interval [2], pitch [32], or RMH only. Table 5 implies that TSMGC, applying AHP weighted RMH features, achieved higher accuracy for each classification model. That is, the proposed TSMGC method, considering multiple features, delivers better performance than the methods using single features [2,32]. Furthermore, extracting the feature weights by AHP improves the MGC performance beyond that of directly using RMH without weights. In addition, since the TSMGC is a two-step classification (TSC) method capable of precisely categorizing the result into not only a main class but also corresponding sub-classes, it is worth evaluating whether TSC provides better performance than one-step classification (OSC) methods [2,32]. In theory, the OSC method trains a single classification model and completes the whole classification operation in one step, while TSC first determines one model for the main classes and then several models for the sub-classes. Table 6 presents the accuracy comparison between OSC and TSC. It shows that the accuracy rate of TSC is higher than that of OSC for all FE methods. In particular, TSC in conjunction with RMH and an AHP based FE method delivers an accuracy rate of 57%, which is significantly higher than the 19% produced by OSC, even with RMH and AHP supported features.

Table 6. Comparison of classification accuracy of one-step and two-step methods.

Conclusions and Future Work
This study proposes a two-step MGC approach based on AHP weighted musical features, called TSMGC. The TSMGC approach adopts an FE method to analyze the distribution of the values of each of the RMH features and to calculate the intersection area of the distributions in different classes in order to extract significant features. Additionally, AHP is applied to adjust the weight of each extracted feature in order to improve MGC performance. Moreover, the TSC algorithm helps to determine the main class and the corresponding sub-class of the target music. Experimental results show that TSMGC performance is superior to that of traditional MGC methods.
Although the overall accuracy rate of TSMGC is 1 to 2 times better than that of traditional MGC methods, the accuracy rate of its sub-class classification is still lower than 50% due to the similarity of features among the classes. The extraction of more musical features, such as timbre, would address this problem. Since a musical composition can present different timbres according to different spectra and waveforms, it causes different stimuli to the human senses; this is why humans can distinguish arpeggios performed by different people, as well as sounds produced by different musical instruments. In reality, each music genre has its frequently used instrument combinations or specific singing voices. Therefore, including timbre as a feature in future investigations is expected to improve MGC accuracy. In addition, the sample size of the datasets and the number of music genres can be extended for TSMGC enhancement. Furthermore, techniques such as restricted Boltzmann machines, deep belief nets, deep Boltzmann machines, ensemble techniques, and cross-validation can be applied to avoid overfitting in further extended TSMGC investigations.
With the rapid growth of music applications, effective and efficient automatic MGC mechanisms are becoming increasingly important for novel music applications, such as online music platforms. The proposed TSMGC, composed of an RMH multi-feature-based MCR and MCA, an AHP-supported FE, and an ML-based TSC, is capable of performing MGC efficiently and accurately. TSMGC addresses the important research direction of extracting AHP weighted features for use in the MGC process. The experiment conducted in this study also provides a valuable and practical reference for designing and implementing new systems for automatic MGC applications.