Mining Characteristic Patterns for Comparative Music Corpus Analysis

Abstract: A core issue of computational pattern mining is the identification of interesting patterns. When mining music corpora organized into classes of songs, patterns may be of interest because they are characteristic, describing prevalent properties of classes, or because they are discriminant, capturing distinctive properties of classes. Existing work in computational music corpus analysis has focused on discovering discriminant patterns. This paper studies characteristic patterns, investigating the behavior of different pattern interestingness measures in balancing coverage and discriminability of classes in top k pattern mining and in individual top ranked patterns. Characteristic pattern mining is applied to the collection of Native American music by Frances Densmore, and the discovered patterns are shown to be supported by Densmore’s own analyses.


Introduction
Advances in music data mining and the creation of annotated music corpora [1][2][3] have supported a renewed interest in comparative music analysis [4,5]. Analyses of class-labeled music datasets have explored a range of data mining paradigms, including descriptive methods such as subgroup discovery and emerging pattern mining [6][7][8][9][10][11][12] and predictive methods such as decision tree and classification rule induction [13][14][15][16][17][18]. These studies generally focus on identifying discriminant properties, which distinguish different classes. Highly discriminant patterns, however, may only describe a few examples in a class. In fact, emerging pattern mining will discover contrasts between classes even if patterns are infrequent [19]; applications to music corpora consequently reveal emerging patterns that sometimes cover only a small proportion of examples in a class [12]. In contrast to discriminant patterns, characteristic patterns capture properties that are prevalent in a class: ideally, a characteristic pattern is complete, i.e., it covers all, or almost all, examples of a class [20]; completeness does not require characteristic patterns to also discriminate between classes. Other methods for characteristic pattern discovery consider characteristic patterns the more interesting the more specific they are to a certain class [21,22]. This paper explores characteristic pattern mining for music corpus analysis and investigates the trade-off between completeness and discriminant power of descriptive patterns.
Two examples illustrate characteristic patterns in music. These examples are based on analyses by Frances Densmore (1867-1957), one of the most prolific collectors of Native American music. Through comparative analyses of feature distributions in different tribal repertoires, Densmore sought to identify common and distinctive properties of these repertoires. The two selected patterns are both shared by a majority of songs in a class: among the songs of the Chippewa, 67% end on the keynote; of Teton Sioux songs, 70% contain one or more rhythmic units. However, when compared against songs of other tribes (Ute, Mandan, Hidatsa, Papago, Pawnee, and Menominee), only the first of these patterns shows different distributions in different tribes: "Tribes differ in the location of the keynote; for example, the Chippewa usually build their melodies above the keynote, while the Papago more frequently place the melody partly above and partly below the keynote. The percentage of Chippewa songs ending on the keynote is 67, while the Papago group contains 41 per cent with this ending" [23] (pp. 19-20). The occurrence of rhythmic units, on the other hand, is prevalent not only in Sioux music, but also in the songs of other tribes: "The tribes show little difference in this respect [use of a rhythmic unit], and 72 per cent of the entire group contains one or more rhythmic units" [23] (p. 20). Thus, both patterns are characteristic for their class (describing a majority of songs in the class), but the ending on the keynote is more discriminant (describing proportionately fewer songs in other classes) than the use of rhythmic units (Figure 1). Densmore's analyses present a remarkable opportunity to compare, qualitatively, the results of computational pattern mining to published musicological findings.
[Figure 1: proportion of songs covered by lastNoteReKey:keynote (Chippewa 67%, other 49%) and by rhythmicUnits:oneOrMore (Teton Sioux 70%, other 72%).]
To separate potentially interesting patterns from trivial results, different pattern interestingness measures have been proposed [24]. In this paper, we analyze interestingness measures for characteristic patterns, investigating their ability to balance class coverage and discriminant power of patterns (Section 2). Top k pattern mining is applied to the Densmore collection of Native American music (Section 3); the behavior of top k patterns mined with different interestingness measures is compared, and top ranked example patterns are discussed with reference to observations by Frances Densmore (Section 4).

Descriptive Patterns
Descriptive pattern discovery focuses on symbolic data analysis, which extracts comprehensible and potentially interesting patterns intended for interpretation [25].

Class Association Patterns
Characteristic patterns describe properties that are common to most examples in a class. Hence, to place this work in the context of music pattern mining, we are interested in inter-opus patterns (recurring across multiple music pieces [8]), each of which is associated with a class. Let D be a dataset organized into classes of examples. Further, let X denote a pattern, interpretable as a Boolean predicate on examples: the pattern is true for an example if it describes a property of the example [26]. The set of examples for which a pattern X is true is said to be covered by the pattern. A class association pattern X ∧ C covers those examples in class C that are described by pattern X.
For music corpus analysis, different pattern representation languages have been employed, including representation by global features or by sequences of event features [27]. Following Densmore's quantitative analyses of Native American songs, the current study focuses on global-feature patterns: a global feature is an attribute-value pair a : v describing a song as a whole, e.g., lastNote : keynote or rhythmicUnits : no. A global feature covers a song if the value v of attribute a is true for the song. A global-feature pattern is a set of global features, e.g., {lastNote : keynote, rhythmicUnits : no}. A global-feature pattern covers a song if all global features included in the pattern cover the song.
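The coverage relation for global-feature patterns can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the dictionary-based song record and the names `feature_covers` and `pattern_covers` are assumptions introduced here.

```python
from typing import Dict, FrozenSet, Tuple

Feature = Tuple[str, str]      # attribute-value pair a : v
Pattern = FrozenSet[Feature]   # a global-feature pattern is a set of features
Song = Dict[str, str]          # hypothetical song record: attribute -> value


def feature_covers(feature: Feature, song: Song) -> bool:
    """A global feature a : v covers a song if the song has value v for a."""
    attribute, value = feature
    return song.get(attribute) == value


def pattern_covers(pattern: Pattern, song: Song) -> bool:
    """A pattern covers a song if every feature in the pattern covers it."""
    return all(feature_covers(feature, song) for feature in pattern)


# Toy songs (invented for illustration, not taken from the Densmore data).
song_a = {"lastNote": "keynote", "rhythmicUnits": "no"}
song_b = {"lastNote": "keynote", "rhythmicUnits": "oneOrMore"}
pattern = frozenset({("lastNote", "keynote"), ("rhythmicUnits", "no")})

print(pattern_covers(pattern, song_a))
print(pattern_covers(pattern, song_b))
```

Under this representation the empty pattern covers every song, and adding a feature to a pattern can only shrink the set of covered songs.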
The association X ∧ C between a pattern X and a class C can be summarized in a 2 × 2 contingency table (Figure 2). The table shows the pattern's presence (X) or absence (¬X) in a target class C and the background of remaining classes ¬C. More specifically, the table cells list the respective support counts: the support count of pattern X, written n(X), is the number of examples in the dataset that are covered by pattern X, while the support count of pattern X in class C, written n(X ∧ C), is the number of examples in class C covered by pattern X. Further, n(C) gives the number of examples in class C, and N denotes the total number of examples in the dataset. Then, the empirical probability of a pattern X occurring in the dataset D is defined as P(X) = n(X)/N, and the conditional probability of pattern X occurring given class C is P(X|C) = n(X ∧ C)/n(C).
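As an illustration of these definitions, the following Python sketch fills the four cells of the contingency table from class-labeled examples and derives P(X) and P(X|C) from them. The class `Contingency`, the function `contingency`, and the toy data are hypothetical names and values introduced for this example only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple, TypeVar

E = TypeVar("E")


@dataclass(frozen=True)
class Contingency:
    n_x_c: int        # n(X ∧ C): pattern present, target class
    n_x_notc: int     # n(X ∧ ¬C): pattern present, background
    n_notx_c: int     # n(¬X ∧ C): pattern absent, target class
    n_notx_notc: int  # n(¬X ∧ ¬C): pattern absent, background

    @property
    def n_x(self) -> int:
        return self.n_x_c + self.n_x_notc

    @property
    def n_c(self) -> int:
        return self.n_x_c + self.n_notx_c

    @property
    def total(self) -> int:
        return self.n_x + self.n_notx_c + self.n_notx_notc

    def p_x(self) -> float:
        """Empirical probability P(X) = n(X)/N."""
        return self.n_x / self.total

    def p_x_given_c(self) -> float:
        """Conditional probability P(X|C) = n(X ∧ C)/n(C)."""
        return self.n_x_c / self.n_c


def contingency(labelled: Iterable[Tuple[E, str]],
                covers: Callable[[E], bool],
                target: str) -> Contingency:
    """Count the four cells for pattern X (given as a predicate) and class C."""
    cells = [0, 0, 0, 0]
    for example, label in labelled:
        in_x, in_c = covers(example), label == target
        cells[(0 if in_x else 2) + (0 if in_c else 1)] += 1
    return Contingency(*cells)


# Toy data: each example is just the value of one attribute (invented).
toy = [("keynote", "Chippewa"), ("keynote", "Chippewa"), ("other", "Chippewa"),
       ("keynote", "Papago"), ("other", "Papago")]
table = contingency(toy, lambda value: value == "keynote", "Chippewa")
print(table, table.p_x(), table.p_x_given_c())
```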

Pattern Interestingness
From the contingency table for a class association pattern X ∧ C (Figure 2), measures of pattern interestingness can be computed [24]. Interestingness measures guide the pattern discovery towards patterns that are of potential interest and should be presented. They can be used during pattern mining to prune uninteresting pattern candidates or during post-processing to rank or select patterns.
The issue of identifying interesting association patterns has been extensively studied, and a large number of interestingness measures have been proposed. Tan et al. [28] analyzed 21 interestingness measures for association analysis; the survey of Geng and Hamilton [24] covered 38 measures; while Belohlavek et al. [29] studied 61 measures. In this paper, we consider seven interestingness measures that have been used in characteristic pattern mining; the measures and their definitions are listed in Table 1.
- Typicality: The measure of typicality evaluates relative pattern frequency in the target class, i.e., the proportion of examples in class C that are covered by pattern X. The measure was originally proposed to quantify disjunctive patterns which together completely cover a class [30], but is here applied to evaluate individual characteristic patterns.
- Utility: The utility measure was proposed in order to rank as more interesting those patterns which are more specifically related to the target class than to other classes [22]. The measure returns a positive value when pattern and class cannot be considered statistically independent, i.e., when P(X ∧ C) > P(X) × P(C). Utility is weighted by pattern frequency (scaled by a), favoring more general patterns with higher relative frequency P(X) (for a > 0).
- Novelty: Similarly to utility, novelty (applied to characteristic patterns in [31]) is based on comparing the joint and individual probabilities of a pattern and a class: it returns a positive value when the co-occurrence of pattern and class is more frequent, measured by P(X ∧ C), than expected given the individual probabilities P(X) and P(C). Novelty can be rewritten as weighted relative accuracy, P(X) × [P(C|X) − P(C)], combining generality of the pattern with added value of the class probability given the pattern [32].
- Laplace estimate: The Laplace estimate has been used as an alternative to novelty for assessing characteristic patterns [31]. The measure quantifies the pattern's frequency in the target class relative to its frequency in the complete dataset. The value of the Laplace estimate is maximal if all examples covered by the pattern are examples of the target class. Due to the constant summands (1 in the numerator and 2 in the denominator), the measure implicitly penalizes less frequent patterns with smaller n(X) and n(X ∧ C) [33].
- Relative risk: The measure of relative risk has been employed to select among typical patterns those patterns which are more predictive of the target class [21]. Relative risk assesses the conditional probability of the class C given the pattern X against the class probability in the case of the pattern being absent.
- IC ++ : The IC ++ measure [34] not only considers the pattern's absence in the target class, but also in the background, with ¬X observed in the target class rather than background suggesting C to be less plausible. Different from utility and novelty, the IC ++ measure, increasing with higher P(C), is biased towards frequent classes rather than frequent patterns.
- F1 score: As a further measure, we borrow the F1 score from predictive data mining, which is designed to balance recall P(X|C) and precision P(C|X) by calculating their harmonic mean. While recall (corresponding to typicality) measures the relative frequency of the pattern in the target class irrespective of the pattern's occurrence in other classes, precision quantifies the proportion of pattern occurrences which are observed in the target class rather than other classes.
Hence, while typicality only considers the pattern's frequency in the target class C independently of its occurrence in the background ¬C, the other measures listed in Table 1 select in various ways characteristic patterns that are specific to the target class C, relative to their overall distribution. Among class-specific patterns, the measures of utility, novelty, and Laplace favor more frequent patterns, while the IC ++ measure is biased towards larger classes.
Table 1. Interestingness measures for characteristic patterns.
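The measures whose form follows directly from the descriptions above can be sketched in Python from the four support counts. This is an illustrative sketch under stated assumptions, not the authors' code: the function names and the toy counts are invented here, the Laplace estimate is written in its usual form (n(X ∧ C) + 1)/(n(X) + 2), and relative risk is taken as P(C|X)/P(C|¬X). Utility and IC ++ are left out because their exact definitions are not spelled out in the running text.

```python
import math


def typicality(n_xc: int, n_x: int, n_c: int, n: int) -> float:
    """P(X|C): share of the target class covered by the pattern."""
    return n_xc / n_c


def novelty(n_xc: int, n_x: int, n_c: int, n: int) -> float:
    """P(X ∧ C) - P(X) * P(C); positive if X and C co-occur more than expected."""
    return n_xc / n - (n_x / n) * (n_c / n)


def weighted_relative_accuracy(n_xc: int, n_x: int, n_c: int, n: int) -> float:
    """P(X) * [P(C|X) - P(C)], the rewritten form of novelty."""
    return (n_x / n) * (n_xc / n_x - n_c / n)


def laplace(n_xc: int, n_x: int, n_c: int, n: int) -> float:
    """Assumed standard form: (n(X ∧ C) + 1) / (n(X) + 2)."""
    return (n_xc + 1) / (n_x + 2)


def relative_risk(n_xc: int, n_x: int, n_c: int, n: int) -> float:
    """Assumed form: P(C|X) / P(C|¬X)."""
    if n == n_x:               # pattern covers everything: P(C|¬X) undefined
        return math.nan
    p_c_given_not_x = (n_c - n_xc) / (n - n_x)
    if p_c_given_not_x == 0:   # class never occurs without the pattern
        return math.inf
    return (n_xc / n_x) / p_c_given_not_x


def f1_score(n_xc: int, n_x: int, n_c: int, n: int) -> float:
    """Harmonic mean of recall P(X|C) and precision P(C|X)."""
    return 2 * n_xc / (n_x + n_c)


# Hypothetical counts: n(X ∧ C) = 30, n(X) = 50, n(C) = 40, N = 100.
counts = (30, 50, 40, 100)
for measure in (typicality, novelty, laplace, relative_risk, f1_score):
    print(measure.__name__, round(measure(*counts), 3))
```

For any counts with n(X) > 0, `novelty` and `weighted_relative_accuracy` return the same value up to floating-point error, which mirrors the rewriting mentioned in the list above.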

Analysis Criteria
The analysis in this paper focuses on the trade-off between completeness and discriminant power for different pattern interestingness measures. A pattern X is Γ% complete with respect to a class C if it covers Γ% of the examples in C; that is, completeness is quantified by the pattern's sensitivity [34,35]:
$$\mathrm{sensitivity}(X, C) = P(X \mid C) = \frac{n(X \wedge C)}{n(C)} \qquad (1)$$
Note that the definition of the typicality measure for characteristic patterns (Table 1) corresponds to Equation (1); considering typicality in this study hence provides a benchmark of maximal completeness for the given datasets. A pattern X is ∆% discriminant for class C if it covers (100 − ∆)% of the examples that are not members of class C; that is, the discriminant power of a pattern with respect to a class is given by its specificity [34,35]:
$$\mathrm{specificity}(X, C) = P(\neg X \mid \neg C) = \frac{n(\neg X \wedge \neg C)}{n(\neg C)} \qquad (2)$$
Equations (1) and (2) quantify completeness and discriminant power of individual patterns. Descriptive statistics (such as the mean, median, and maximum) can then be calculated to analyze output sets of patterns.
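The two analysis criteria, and the descriptive statistics over a set of patterns, can be sketched as follows. The function names and the toy counts are assumptions made for illustration; discriminant power is computed as one minus the share of background examples covered by the pattern, which equals the specificity P(¬X|¬C).

```python
from statistics import mean, median
from typing import Dict, List


def completeness(n_xc: int, n_x: int, n_c: int, n: int) -> float:
    """Sensitivity P(X|C) = n(X ∧ C) / n(C)."""
    return n_xc / n_c


def discriminant_power(n_xc: int, n_x: int, n_c: int, n: int) -> float:
    """Specificity P(¬X|¬C) = 1 - n(X ∧ ¬C) / n(¬C)."""
    covered_background = n_x - n_xc
    background = n - n_c
    return 1.0 - covered_background / background


def summarize(values: List[float]) -> Dict[str, float]:
    """Descriptive statistics for a set of patterns."""
    return {"mean": mean(values), "median": median(values), "max": max(values)}


# Hypothetical patterns as (n(X ∧ C), n(X), n(C), N); not taken from the paper.
toy_patterns = [(30, 50, 40, 100), (36, 80, 40, 100), (10, 12, 40, 100)]
gammas = [completeness(*p) for p in toy_patterns]
deltas = [discriminant_power(*p) for p in toy_patterns]
print(summarize(gammas))
print(summarize(deltas))
```

In the toy data, the second pattern is the most complete but the least discriminant, while the third shows the opposite profile.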

Data
For the current study, two subsets from the Densmore collection of Native American music were selected (Section 3.1), and Densmore's own music content features were employed to represent songs (Section 3.2).

Datasets
Case Study 1: Tribes
Predominantly, Densmore's comparative analysis "seeks to ascertain in what respects the music of one tribe (or linguistic family) resembles and differs from another" [23] (p. 19). The first dataset in the current study comprises songs of four tribes, considering all songs for which Densmore provided music content descriptors: Chippewa (326 songs), Teton Sioux (240), Pawnee (86), and Papago (167). These tribes are expected to reveal both resemblances and differences in their music: all four tribes belong to the Plains musical area, historically considered the most typical style of North American Native music; within the larger style area, however, several sub-areas can be identified whose styles differ from each other, including the Eastern Woodlands and Great Lakes (here represented by the Chippewa), the Plains and Northern Prairies (Teton Sioux), and the Southern Prairies (Pawnee). Included as well is the Pima-Papago style, although more marginal to the area than other sub-areas [36]. In addition, the four selected tribes represent different linguistic families: Algonquian (Chippewa), Siouan (Sioux), Caddoan (Pawnee), and Piman (Papago) [23].
Case Study 2: Song types
In Densmore's study of Chippewa music, "the principal tabulated analysis is made on the basis of the class or use of the song" [37] (p. 1). In addition to the tabular analysis, the publication provides a narrative description of each song class and a synopsis of resemblances between classes. Densmore distinguished 11 classes of songs, including one group of unclassified songs. Of these, we consider in our analysis song types represented by at least 10 songs: Mǐde' songs (92 songs), war songs (87), dream songs (51), love songs (26), and moccasin game songs (14).
Table 2 summarizes the two datasets in terms of their size (number of songs) and partitioning (number of classes).
Table 2. Datasets, selected from Densmore's collection of Native American music [37][38][39][40][41].

Music Content Features
Underlying Densmore's quantitative analyses are music content features, i.e., attribute-value pairs, which describe melodic and rhythmic-metric aspects of the music. The selection of attributes and the partitioning of their values varies across Densmore's published analyses, ranging, for the repertoires considered in this paper, from 237 features (derived from 22 attributes) in the analysis of Chippewa songs [37] to 137 features (derived from 17 attributes) in the analysis of Papago songs [41]. The current study employs a subset of Densmore's features. First, attributes that in Densmore's analyses were applied to complete datasets rather than individual songs (e.g., average interval size or distribution between ascending and descending intervals) were ignored. Second, attributes with highly fragmented values and generally low feature counts (e.g., tempo based on metronome readings) were discarded. For the remaining attributes, some infrequent values were aggregated and, for the cross-tribe analysis (Case Study 1), common attributes were selected and different partitionings of values were mapped onto the same value ranges. Table 3 summarizes the feature vocabulary for the two case studies, the comparison of songs by four Plains tribes (Case Study 1) and the comparison of five song types within Chippewa music (Case Study 2).

Results
The two datasets described above were mined for class association patterns, employing the seven interestingness measures introduced in Table 1. In this section, we analyze sets of top k discovered patterns; in particular, we study how the different interestingness measures trade off completeness against discriminant power (Section 4.1). For both case studies, comparing Plains tribes and comparing Chippewa song types, the top ranked pattern for each interestingness measure is then presented, and its completeness and discriminant power are discussed in the context of Densmore's quantitative analyses of the repertoires (Section 4.2). To reduce the risk of spurious patterns and to facilitate the interpretation of the mining output, pattern discovery was run with a minimum support threshold of 3% [42] and a maximum pattern size constraint of four features [43].

Distribution of Completeness and Discriminant Power for Different Interestingness Measures
The box plots in Figure 3 show, for each measure, a box spanning the first to third quartile, with whiskers from the minimum to the maximum completeness or discriminant power. The plots reveal differences between the interestingness measures in balancing completeness and discriminant power. At the same time, overall similar behavior of the measures relative to each other can be observed in both case studies, suggesting that the principal observations can be generalized across datasets.
[Figure 3: box plots of completeness and discriminant power per interestingness measure; panels "tribes: discriminant power" and "song types: discriminant power".]
Unsurprisingly, patterns ranked high by the typicality measure are located high in the completeness space but predominantly low with respect to discriminant power. In the first case study, the comparison of tribes, the maximum completeness (indicated by the highest ranked pattern according to typicality) is also reached by the IC ++ measure, where it is found on rank five. In the second case study, the comparison of Chippewa song types, utility (with higher a), novelty, and F1 score additionally include the maximally complete pattern among the top 30 patterns; for utility (with a = 1), it is ranked first. For the utility measure, generally, the bias towards more frequent patterns with higher a favors more complete patterns, while low a leads to more discriminant patterns.
The remaining measures proposed for discovering characteristic patterns seek to increase discriminant power, frequently at the cost of lower completeness. The Laplace estimate generally appears to favor highly discriminant patterns, while relative risk shows a larger variance especially in the first case study, reaching relatively high, but also relatively low completeness values for some among the top ranked patterns. Novelty, like utility biased towards more frequent patterns, achieves higher completeness than Laplace and utility (with a = 0) and predominantly higher discriminant power than IC ++ and utility (with higher a). These findings are in line with alternative uses of these measures: Laplace has proven successful in associative classification using patterns [44]. The measure is similar to confidence in association rule mining, which was proposed as a measure for discriminant as opposed to characteristic pattern induction in early work on data mining and knowledge discovery [20]. Novelty, also known as Piatetsky-Shapiro (or PS) and leverage in association rule mining, has been used for mining differences between groups in combination with a statistical test metric [45] and (as mentioned above) can be translated into weighted relative accuracy in subgroup discovery, which has been adapted for contrast set mining [46]. When integrated into rule-based classification, novelty has been found to yield more general patterns with higher support but lower predictive accuracy than a rule-based classifier employing the Laplace estimate [47].
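The mining setup behind these results (top k ranking, a 3% minimum support threshold, and at most four features per pattern) can be illustrated with a brute-force Python sketch. It is not the algorithm used in the study, which is not described at this level of detail: the function `mine_top_k`, the toy songs, and the choice of novelty as the ranking measure are assumptions, and the support threshold is assumed here to apply to n(X)/N over the whole dataset.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Feature = Tuple[str, str]
Measure = Callable[[int, int, int, int], float]


def novelty(n_xc: int, n_x: int, n_c: int, n: int) -> float:
    """P(X ∧ C) - P(X) * P(C)."""
    return n_xc / n - (n_x / n) * (n_c / n)


def mine_top_k(songs: Sequence[Dict[str, str]], labels: Sequence[str],
               target: str, measure: Measure, k: int = 30,
               min_support: float = 0.03, max_size: int = 4
               ) -> List[Tuple[float, Tuple[Feature, ...]]]:
    """Enumerate all feature sets up to max_size and rank them by measure."""
    n = len(songs)
    n_c = sum(1 for label in labels if label == target)
    features = sorted({(a, v) for song in songs for a, v in song.items()})
    scored = []
    for size in range(1, max_size + 1):
        for pattern in combinations(features, size):
            # Two values of the same attribute can never hold together.
            if len({attribute for attribute, _ in pattern}) < size:
                continue
            covered = [label for song, label in zip(songs, labels)
                       if all(song.get(a) == v for a, v in pattern)]
            n_x = len(covered)
            if n_x / n < min_support:   # assumed: support relative to N
                continue
            n_xc = sum(1 for label in covered if label == target)
            scored.append((measure(n_xc, n_x, n_c, n), pattern))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Toy corpus (invented): two classes, two attributes.
toy_songs = [{"lastNote": "keynote", "meterChange": "yes"},
             {"lastNote": "keynote", "meterChange": "yes"},
             {"lastNote": "other", "meterChange": "yes"},
             {"lastNote": "other", "meterChange": "no"}]
toy_labels = ["A", "A", "B", "B"]
top = mine_top_k(toy_songs, toy_labels, "A", novelty, k=2)
for score, pattern in top:
    print(round(score, 3), pattern)
```

Exhaustive enumeration is only feasible for small feature vocabularies; it is used here because it keeps the two constraints and the ranking step easy to see.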

Example Patterns
The following two sections present example patterns discovered in the two case studies on Densmore's collection of Native American songs. They serve to illustrate the findings of the previous section at the level of individual patterns, to assess the pattern discovery against related observations by Densmore, and to demonstrate characteristic pattern mining for music corpus analysis. Table 4 lists the class association pattern ranked first by each interestingness measure for comparing songs of the four tribes Chippewa, Teton Sioux, Pawnee, and Papago. Several interestingness measures agree on the same top ranked pattern, lastNoteReCompass : above_lowest for Papago songs: this pattern is ranked first in computational pattern mining using either utility (with a = 1, pattern A3), novelty (A5), or relative risk (A7), i.e., measures that according to the findings of Section 4.1 tend to balance completeness and discriminant power to various degrees. In fact, this pattern is both characteristic (Γ = 0.89) and discriminant (∆ = 0.82), for example in contrast to Sioux songs, which generally end on the lowest tone (A8, A4), and was also highlighted by Densmore: "The difference between the songs in various tribes is clearly shown [in the ending relative to the compass]. In the Pawnee songs 78 per cent end on the lowest tone of the compass, while in the Papago group 90 per cent contain tones lower than the final tone" [40] (p. 15). In combination with additional features (beginning and ending on the fifth above the keynote, with melodic tones based on a major pentatonic scale; A2), discriminant power increases to 0.99, though at the cost of lower completeness (Γ = 0.16); this specialized pattern is ranked first by utility (a = 0), which tends to favor discriminant patterns.
As for utility (a = 0), the pattern ranked first by the Laplace estimate, another measure biased towards discriminant patterns, is a more specialized pattern comprising four features: major tonality, harmonic structure, beginning on the twelfth above the keynote, and ending on the keynote (A6). Individually, the most discriminant of the features is the beginning on the twelfth (Γ = 0.29, ∆ = 0.93), while the ending on the keynote (Γ = 0.67, ∆ = 0.47) is characteristic but less discriminant for Chippewa songs when compared against Sioux, Papago, and Pawnee music. Indeed, in her analysis of individual features, Densmore commented on the "relatively large proportion" [48] (p. 52) of Chippewa songs beginning on the twelfth, while in Pawnee music, the "highest percentages of initial tones are on the tenth, octave, fifth and keynote" [40] (p. 14) and "[n]o Papago songs begin on a tone higher than the tenth above the keynote" [41] (p. 12). The trade-off between completeness and discriminant power is also clearly illustrated by the first pattern listed for the class of Teton Sioux songs, ranked top by typicality (A1): the characteristic pattern meterChange : yes is prevalent in this group (93% of the Sioux songs contain one or more meter changes) but is also found frequently in other tribal repertoires and is thus not discriminant for Teton Sioux music (∆ = 0.16); in fact, the pattern occurs among associations discovered with typicality also for the other tribes included in the analysis, on the second rank for Papago (92%), on the eighth rank for Chippewa (83%), and on the 17th rank for Pawnee (74%). This finding corresponds to the analyses of Densmore, who observed "a change of measure lengths, as indicated by accented tones, to be a prevailing characteristic of Indian songs" [49] (p. 183).
Table 4. Top-ranked patterns (by interestingness measure) in Case Study 1, with class support count n(C), pattern support count n(X), association support count n(X ∧ C), interestingness I (according to the respective interestingness measure), completeness Γ, and discriminant power ∆.

Measure | Class (Support) | Pattern (Support) | n(X ∧ C) | I | Γ | ∆

Table 5 presents example patterns for Case Study 2, comparing different song types in the Chippewa repertoire. As for Case Study 1 above, the top ranked pattern for each of the studied interestingness measures was selected. The examples confirm the observations already supported by the mining results in the first case study: high completeness at the cost of low discriminant power is achieved by measures such as IC ++ (pattern B8) and utility (with a = 2, B4), while the Laplace estimate yields a top-ranked pattern with high discriminant power but low completeness (B6). In this case, the pattern ranked highest by utility (with a = 0, B2) shows higher completeness than in Case Study 1, but still has higher discriminant power than completeness. Similar to Case Study 1, highly discriminant patterns tend to be more specialized, comprising multiple features. Of these, the pattern ranked first by relative risk (B7) is the most complete; curiously, exactly this combination of features was also referenced by Densmore in her description of Chippewa moccasin game songs: "Directness is shown in the accented beginnings of the songs and their endings on the tonic, but this is contradicted by the small percentage of songs containing a rhythmic unit" [37] (p. 45). Among the features contributing to the other pattern for moccasin game songs listed in Table 5 (ranked highest by utility with a = 0, B2), individually, both the ending on the keynote and the absence of accidentals are highly characteristic (Γ = 0.93, with ∆ = 0.36 and ∆ = 0.18, respectively), and minor tonality is observed in a majority of moccasin game songs (Γ = 0.79, ∆ = 0.62), as also pointed out by Densmore: "In the analysis of these songs may be noted a large proportion in minor tonality [...]. Eighty-four per cent begin on, and all end on, either the tonic or dominant. [...] Only one song contains an accidental" [37] (p. 44).
In combination, the features are still shared by 57% of the moccasin game songs, but are also strongly discriminant for this group of songs when compared against Mǐde', war, dream, and love songs. The top ranked patterns characterizing Mǐde' songs make it possible to observe directly the effect of pattern specialization on decreasing completeness and increasing discriminant power: an initial downward progression is found in 91% of Mǐde' songs (B8), while 70% of the Mǐde' songs begin with a downward progression and are melodic in structure (B5, B9); the latter pattern is more discriminant of Mǐde' songs (∆ = 0.69) than the former (∆ = 0.36). The specialized pattern for war songs ranked first by the Laplace estimate (B6) has 100% discriminant power: it is not found at all in the other groups. The general pattern for love songs ranked highest by typicality (B1) and utility (with a = 1, B3), on the other hand, has 100% completeness: "The love songs were unaccompanied by any instrument" [37] (p. 41); thus, all love songs in the analysis corpus were recorded without the drum. In comparison, songs of other types were usually accompanied by the drum (Mǐde', war, dream, and moccasin game songs) or rattle (Mǐde' songs) [37,38].
Table 5. Top-ranked patterns (by interestingness measure) in Case Study 2, with class support count n(C), pattern support count n(X), association support count n(X ∧ C), interestingness I (according to the respective interestingness measure), completeness Γ, and discriminant power ∆.

Discussion and Conclusions
A central issue in pattern discovery research is the challenge of separating interesting patterns from trivial results. Patterns that are characteristic of a class, covering most examples in the class, may not be discriminant. On the other hand, highly discriminant patterns may have low coverage, may be sensitive to noise, and may be prone to overfit a few examples in the data [50]. In this paper, we studied, for a range of pattern interestingness measures and on real musicological data, the trade-off between completeness and discriminant power in mining characteristic patterns. The empirical findings confirm the considerations underlying different choices of measure: the typicality measure, used to discover characteristic patterns that cover all or most examples in a class [20,30], achieves high completeness without controlling for discriminant power. The IC ++ measure was motivated by the view that those characteristic patterns are most interesting that also discriminate a target class from background classes [34]; consequently, class association patterns ranked high by the IC ++ measure are generally more discriminant than patterns ranked high by typicality, while still retaining relatively high completeness. Similarly, novelty has been selected as an interestingness measure to discover patterns that are frequent in the target class but infrequent in the background, thus trading off sensitivity and specificity of the pattern [31]. Other measures, such as relative risk, suggested to prune patterns that are frequent in a class but not predictive of the class [21], tend to more strongly favor discriminant patterns at the cost of decreasing completeness. The results reported in this paper were also found to align with alternative uses of certain measures in contrast pattern mining (e.g., novelty) or pattern-based classification (e.g., Laplace estimate).
Characteristic pattern discovery was applied to the Densmore collection of Native American music. In her publications, Densmore herself included quantitative analyses of collected songs organized into classes such as tribal repertoires or song types, which together with her narrative descriptions provide a valuable reference for assessing the results of the computational analysis. Without explicitly referring to characteristic and discriminant patterns (and thus not providing a ground truth for quantitatively evaluating computational pattern discovery), Densmore's findings include patterns that are prevalent in a group of songs, and thus can be considered characteristic patterns, and patterns that distinguish one group of songs from others, and hence can be considered discriminant patterns. Revisiting the two Densmore examples introduced at the beginning of the paper: while both patterns (the ending on the keynote in Chippewa songs and the occurrence of one or more rhythmic units in Sioux songs) are similarly prevalent in the respective groups (Γ = 0.67 and Γ = 0.70, respectively), the ending on the keynote in the Chippewa songs is more discriminant (∆ = 0.53) than the use of rhythmic units in the Sioux songs (∆ = 0.28). This analysis of completeness and discriminant power of the two example patterns can be related to Densmore's observation that tribes differed in the location of the keynote (with Chippewa songs exhibiting a comparatively high proportion of songs ending on the keynote), but showed little difference in the use of rhythmic units. When computational pattern discovery is applied to two subsets of Densmore's collection, comparing songs of four Plains tribes and comparing five song types within Chippewa music, the top-ranked patterns can in many cases be shown to match the corresponding observations in Densmore's comments on these repertoires. Thus, the interestingness measures for characteristic pattern mining explored in this paper indeed reveal patterns that appear of interest in comparative music corpus analysis.
The pattern evaluation strategies considered in the current study attempt to balance completeness and discriminant power in a single interestingness measure. This can make measure values difficult to interpret. Indeed, an application of subgroup discovery to music, using novelty, additionally presented other measures (including sensitivity, corresponding to typicality) to facilitate the interpretation of results [6]. Rather than a single integrated measure, multiple measures could be evaluated during the pattern mining. For example, employing frequency and significance constraints, Ali et al. [21] applied a χ² test of association in addition to a support threshold in order to ensure that the relation between a characteristic pattern and a class was statistically significant. Brijs et al. [35] set thresholds for minimum coverage in the target class (constraining sensitivity or completeness) and maximum coverage in the background (constraining specificity or discriminant power), though at the level of pattern sets rather than individual patterns. The explicit use of multiple measures potentially gives additional control over the trade-off between completeness and discriminant power and may enhance the interpretation of pattern interestingness results, but requires the specification of one or more measure thresholds.
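How several constraints might be combined for a single pattern can be sketched as follows. This is a hypothetical per-pattern filter written for illustration, not the procedure of [21] or [35]: the function names, the thresholds, and the toy counts are assumptions. It uses the uncorrected Pearson χ² statistic for a 2 × 2 table and the critical value 3.841 (one degree of freedom, significance level 0.05).

```python
CHI2_CRITICAL_DF1_ALPHA_05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05


def chi_square_2x2(n_xc: int, n_x: int, n_c: int, n: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for X versus C."""
    a = n_xc                      # X and C
    b = n_x - n_xc                # X and not C
    c = n_c - n_xc                # not X and C
    d = n - n_x - n_c + n_xc      # not X and not C
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:          # a row or column is empty: no association
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def passes_constraints(n_xc: int, n_x: int, n_c: int, n: int,
                       min_completeness: float,
                       min_discriminant_power: float) -> bool:
    """Keep a pattern only if it meets both thresholds and is significant."""
    completeness = n_xc / n_c
    discriminant_power = 1.0 - (n_x - n_xc) / (n - n_c)
    significant = chi_square_2x2(n_xc, n_x, n_c, n) > CHI2_CRITICAL_DF1_ALPHA_05
    return (completeness >= min_completeness
            and discriminant_power >= min_discriminant_power
            and significant)


# Hypothetical counts: n(X ∧ C) = 30, n(X) = 50, n(C) = 40, N = 100.
counts = (30, 50, 40, 100)
print(round(chi_square_2x2(*counts), 3))
print(passes_constraints(*counts, min_completeness=0.5, min_discriminant_power=0.5))
```

Separate thresholds make the trade-off explicit, at the price noted above: each threshold has to be chosen by the analyst.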
The challenge of selecting interestingness measures has attracted considerable attention in data mining research, and several principles to characterize the behavior of measures under different conditions have been proposed [24,28,29,51]. The choice of a measure will always depend on the specific analysis interest in the task at hand [52]. The case studies in this paper provide examples in comparative music corpus analysis, studying class-labeled music data. As Densmore's analyses demonstrate, both characteristic patterns, which describe prevalent properties of classes of songs, and discriminant patterns, which capture contrasting properties that distinguish one class from other classes, can be of interest. In the first case, interestingness measures such as typicality, IC ++ , or utility (with higher a) will yield patterns with a bias towards completeness; on the other hand, measures such as relative risk, the Laplace estimate, and utility (with a = 0) will predominantly prioritize discriminant but potentially infrequent patterns. If both characteristic and discriminant patterns are studied or for an initial exploration of a dataset before specifying further analysis questions, measures such as novelty or F1 score, which balance completeness and discriminant power, may present a suitable compromise. Besides quantitative criteria such as completeness and discriminant power, other principles may be considered, e.g., intelligibility of a measure [52]. Attempts have thus been made to develop multi-criteria decision aids for selecting interestingness measures [52]. Ultimately, the choice of interestingness measure in a specific task is a subjective decision and may even be adapted in iterative data analysis; empirical studies comparing computational measures and human assessment of pattern interestingness have observed considerable differences for different human analysts or datasets [53].
Class-labeled datasets have been, and continue to be, prominent in music data mining. Most existing research has focused on identifying discriminant patterns, which alone do not provide an exhaustive picture of the data. The current study complements previous work by systematically investigating interestingness measures for characteristic patterns and illustrating their implications in a detailed discussion of discovered patterns.
Author Contributions: Conceptualization, K.N. and D.C.; methodology, K.N. and D.C.; software, K.N.; formal analysis and validation, K.N. and D.C.; data curation, K.N.; visualization, D.C.; writing, original draft preparation, K.N.; writing, review and editing, K.N. and D.C. All authors read and agreed to the published version of the manuscript.