Chinese Traditional Musical Instrument Evaluation Based on a Smart Microphone Array Sensor

Abstract: For Chinese traditional musical instruments, the conventional subjective evaluation by expert musicians is neither cost-effective nor sustainable, since such experts are becoming increasingly scarce, while a clear physical law linking acoustics to quality has proved very hard for physicists to establish. Considering the effectiveness of artificial neural networks (ANNs) for complex systems, this paper applies, in a Chinese lute case study, a neural network based on an 8-microphone array to correlate objective instrument acoustic features with expert subjective evaluations. The acoustic signals were recorded by a microphone array sensor, and the constant-Q transform coefficients, Mel-frequency cepstral coefficients, and correlation coefficients between microphones were extracted as the ANN input. The establishment of the acoustic library, the acoustic feature extraction, and the deep learning model for Chinese lute evaluation are reported in this paper.


Introduction
Chinese traditional music, a living traditional culture, is a significant part of world music and represents the accumulation of national history and ideology. However, with the rapid influx and influence of mass culture, many techniques related to traditional music are declining and gradually disappearing, especially in the inheritance and development of traditional musical instruments. The general method for evaluating musical instruments is the subjective evaluation of expert musicians, who score instruments according to national standards on aspects such as definition, brightness, smoothness, and harmony, and then form a comprehensive evaluation [1,2]. This method usually requires the synchronous evaluation of several experts to eliminate personal preference, which is neither cost-effective nor easy to arrange given the presence of fewer and fewer experts. Physicists therefore wish to find relatively simple laws correlating the objective acoustic quality of a musical instrument with the subjective judgments made by experts.
Existing studies of musical instrument evaluation mainly concentrate on the physical characteristics of instruments (e.g., materials, component sizes) [3,4] and on subjective evaluation methods [5]. The literature correlating scientific measurements of instrument sound with the subjective evaluations of experts is quite sparse, since objective instrument evaluation is a significant challenge [6]. In his review article, Fritz [7] compared the evaluation effectiveness of sound-source characteristics and of instrument acoustic quality, and highlighted that the latter is closer to the evaluations made by experts.
Although a clear relationship between the subtle, complex quality judgments of experts and objective instrument acoustic quality is surprisingly hard to establish [5], artificial neural networks (ANNs) may offer an effective solution to this complex problem, since they can mimic the behavior of the human brain.
In this paper, for a Chinese lute case study, a neural network based on an 8-microphone array is applied to bridge objective instrument acoustic features and expert subjective evaluations. The acoustic signals were recorded by a microphone array sensor, and the constant-Q transform coefficients (CQTs), Mel-frequency cepstral coefficients (MFCCs), and correlation coefficients between microphones (CCs) were extracted as the ANN input.

Microphone Array Sensor
As the acoustic field is a radiation field reverberated by reflection and refraction, the acoustic signals at different positions in front of a played musical instrument vary. A microphone array is therefore applied to collect more information about the acoustic field, since a single microphone sensor collects the signal at only one point. Based on the spatial sampling theorem, the distance between any two adjacent microphones should be shorter than d = λ/2 to avoid spatial aliasing, where λ is the acoustic wavelength. A non-uniform linear microphone array, as shown in Figure 1, has been shown to perform well for acoustic signal acquisition [8]. The array, composed of eight AUDIOA4 microphones, is employed for data acquisition in this paper.
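The spacing bound above can be sketched numerically. This is a minimal illustration, assuming a speed of sound of 343 m/s (air at roughly 20 °C) and an illustrative upper frequency of 4186 Hz (the highest piano note); neither value is specified in the paper.

```python
# Maximum inter-microphone spacing for alias-free spatial sampling.
# The speed of sound c = 343 m/s and f_max = 4186 Hz are assumptions
# used only for illustration, not values taken from the paper.

def max_mic_spacing(f_max_hz, c=343.0):
    """Return d = lambda/2 = c / (2 * f_max), in metres."""
    return c / (2.0 * f_max_hz)

d = max_mic_spacing(4186.0)
print(f"max spacing: {d * 100:.2f} cm")  # max spacing: 4.10 cm
```

Note that the bound is set by the shortest wavelength of interest: the higher the maximum frequency to be captured, the closer the microphones must be placed.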

Acoustic Library Establishment
According to the China national musical instrument acoustics standard [1], 144 Chinese lutes covering almost all ranks, and a classic piece, The Crazy Snake Dancing, covering almost all tested musical notes, were selected for building the acoustic library. Six professional musicians agreed to play the Chinese lutes, and five lute experts agreed to evaluate their quality. The quality of a Chinese lute is ranked as excellent (E), very good (VG), good (G), fair (F), or poor (P), corresponding to scores of 8~10, 6~8, 4~6, 2~4, and 0~2, respectively. The acoustic signals of the played lutes were collected by the 8-microphone array in a concert hall at North University of China, and the subjective evaluations of the five experts were scored according to the national acoustic quality evaluation standard for musical instruments [1]. The established acoustic library is shown in Table 1.
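The score-to-rank mapping above can be written as a small helper. This is a sketch only; the handling of boundary scores (intervals closed at their lower bound) is an assumption, as the standard's exact boundary convention is not stated here.

```python
# Map an expert score in [0, 10] to the five quality ranks described
# in the text: 8~10 -> E, 6~8 -> VG, 4~6 -> G, 2~4 -> F, 0~2 -> P.
# Treating each interval as closed at its lower bound is an assumption.

def score_to_rank(score):
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must be in [0, 10]")
    for lower, rank in ((8, "E"), (6, "VG"), (4, "G"), (2, "F"), (0, "P")):
        if score >= lower:
            return rank

print(score_to_rank(9.2))  # E
print(score_to_rank(5.0))  # G
```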

Acoustic Features Extraction
From a musician's perspective [2], the elements of a musical note are pitch, length, loudness, and timbre, which correspond to intonation, attenuation, intensity, and dynamic range in acoustics, and to baseband frequency, time-domain characteristics, amplitude, and frequency-domain characteristics in physical quantities. To capture as much acoustic field information of the Chinese lutes as possible, the constant-Q transform coefficients (CQTs), Mel-frequency cepstral coefficients (MFCCs), and correlation coefficients between microphones (CCs) are utilised to represent the acoustic features. The CQTs [9] are a time-frequency representation of a music signal, containing 88 elements that correspond to 88 frequency bins (equally spaced on a logarithmic frequency axis) in this paper. The MFCCs [10] are short-term spectral features imitating the human auditory system, containing 12 elements in this paper. The CCs [8] are a spatial representation of the instrument's acoustic field, containing 8 elements in this paper. The acoustic features of the five ranks of Chinese lutes are shown in Figure 2.
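Assembling these features into one vector (88 CQTs + 12 MFCCs + 8 CCs = 108 values) can be sketched as follows. The CQT and MFCC values below are random placeholders; in practice they would come from an audio analysis library. Defining each CC as the correlation of one channel against the first (reference) microphone is an assumption, since the paper does not spell out its pairing scheme.

```python
# Sketch of building the 108-dimensional feature vector. The CQT and
# MFCC entries are placeholder values; only the inter-microphone
# correlation step is shown concretely (one coefficient per channel
# against a reference channel, which is an assumption).
import numpy as np

rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 4096))  # 8 microphone channels (simulated)

cqt = rng.standard_normal(88)    # placeholder constant-Q coefficients
mfcc = rng.standard_normal(12)   # placeholder Mel-frequency cepstral coefficients

# One CC per microphone: correlation with the first (reference) channel.
ccs = np.array([np.corrcoef(frames[0], frames[m])[0, 1] for m in range(8)])

features = np.concatenate([cqt, mfcc, ccs])
print(features.shape)  # (108,)
```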

Deep Learning
The 108 values of the acoustic features, comprising the CQTs, MFCCs, and CCs, are used as classification features in our lute case. As illustrated in Table 1 and Figure 2, the acoustic features of lutes of different ranks vary significantly. We utilise a back-propagation (BP) neural network implementation provided by the MATLAB Deep Learning Toolbox [11] for classification. Our 144 instances are randomly divided into three parts: 110 instances for network training, 24 instances for testing the trained network, and 10 instances for validation.
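The 110/24/10 random split can be sketched as below. This is a Python illustration of the partitioning step only (the paper's actual implementation uses MATLAB), and the fixed seed is an assumption added for reproducibility.

```python
# Sketch of randomly partitioning the 144 instances into training,
# testing, and validation subsets (110/24/10). The seed is an
# assumption; the paper does not specify its shuffling procedure.
import random

def split_indices(n=144, n_train=110, n_test=24, n_val=10, seed=42):
    assert n_train + n_test + n_val == n
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return (idx[:n_train],
            idx[n_train:n_train + n_test],
            idx[n_train + n_test:])

train, test, val = split_indices()
print(len(train), len(test), len(val))  # 110 24 10
```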

Results and Discussions
We evaluated all testing and validation combinations and averaged the results. For the lute set of five ranks, the trained BP neural network achieves a mean accuracy of 92.84%. The classification accuracies and errors for each lute rank are shown in Figure 3. The trained BP neural network identifies ranks VG, G, and F with high accuracy, but its accuracy is poor for ranks E and P.