# Data-Dependent Feature Extraction Method Based on Non-Negative Matrix Factorization for Weakly Supervised Domestic Sound Event Detection

## Abstract

## 1. Introduction

## 2. Background

#### 2.1. Problem Description

#### 2.2. Non-Negative Matrix Factorization

## 3. Proposed System

#### 3.1. Strategy for the Frequency Basis Learning

#### 3.2. Iterative and Non-Iterative Feature Extraction Methods

#### 3.3. Classifier

#### 3.4. Post-Processing

## 4. Evaluation

#### 4.1. Evaluation Settings

#### 4.2. Comparison of Various Features

#### 4.3. Effect of the Training Data on the Frequency Basis Learning

#### 4.4. Thresholding Singular Values for Calculating the Pseudo-Inverse Matrix

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

**Figure 2.** Schematic diagram of applications of non-negative matrix factorization (NMF) methods to acoustic signal processing systems.

**Figure 3.** Block diagrams for: (**a**) the learning of the frequency basis; (**b**) the learning of the composition of the data matrix from strongly-labeled data; and (**c**) the learning of the composition of the data matrix from weakly-labeled data.

**Table 1.** Averaged results of various features. The performance of the Cornell system is provided as a reference for comparison. Boldface indicates the best performance for each measure.

Feature | F1-Score [%] (Micro), w/o Event-Wise Post-Processing | F1-Score [%] (Macro), w/o Event-Wise Post-Processing | F1-Score [%] (Micro), w/ Event-Wise Post-Processing | F1-Score [%] (Macro), w/ Event-Wise Post-Processing
---|---|---|---|---
NMF (iterative) | 35.06 | 31.58 | 40.12 | 39.23
NMF (non-iterative) | 34.87 | 30.16 | 40.02 | 38.45
MelSpec | 34.41 | 32.31 | 40.41 | 39.72
Log-Mel | 30.27 | 29.88 | 35.11 | 36.60
GAM | 32.15 | 33.09 | 37.23 | 39.81
CQT | 32.25 | 28.76 | 37.28 | 35.36
Cornell et al. [46] (with own post-processing) | - | - | 42.48 | 39.56
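
The tables report both micro- and macro-averaged F1-scores. As a minimal sketch of how the two averages differ, assuming per-class counts of true positives, false positives, and false negatives are available, the snippet below computes both. The helper names and the counts are hypothetical and are not taken from the paper's evaluation code or results.

```python
def f1(tp, fp, fn):
    """F1-score from true positive, false positive, and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts):
    """Return (micro, macro) F1 for a dict of class -> (tp, fp, fn)."""
    # Macro: average the per-class F1-scores, so every class weighs the same.
    per_class = [f1(tp, fp, fn) for tp, fp, fn in counts.values()]
    macro = sum(per_class) / len(per_class)
    # Micro: pool the counts over all classes, then compute a single F1-score.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn), macro


# Hypothetical counts for illustration only (not values from the paper).
counts = {"Speech": (8, 2, 2), "Dishes": (1, 1, 3)}
micro, macro = micro_macro_f1(counts)
```

Because the micro average pools the counts, frequent classes dominate it, whereas the macro average is pulled down by any class that is detected poorly, however rare.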

**Table 2.** Class-wise F1-scores [%] of various features. Boldface indicates the best performance for each class.

Feature | Electric Shaver | Speech | Dishes | Cat | Running Water | Dog | Frying | Blender | Alarm Bell | Vacuum Cleaner
---|---|---|---|---|---|---|---|---|---|---
NMF (iterative) | 32.9 | 46.8 | 18.0 | 39.7 | 22.4 | 21.5 | 28.9 | 27.5 | 36.3 | 41.7
NMF (non-iterative) | 24.1 | 43.9 | 21.9 | 39.5 | 25.7 | 22.1 | 22.8 | 22.1 | 39.6 | 39.9
MelSpec | 35.7 | 45.0 | 18.0 | 39.1 | 30.9 | 17.4 | 24.0 | 32.0 | 32.0 | 49.0
Log-Mel | 38.3 | 37.3 | 14.2 | 36.7 | 27.4 | 13.7 | 23.5 | 23.0 | 35.8 | 49.0
GAM | 29.5 | 34.2 | 24.6 | 41.7 | 28.3 | 19.7 | 31.4 | 33.9 | 39.4 | 48.2
CQT | 34.6 | 46.7 | 18.4 | 35.9 | 21.7 | 16.5 | 9.1 | 24.8 | 24.1 | 55.6

**Table 3.** Comparison results with different frequency basis matrices from various parts of the database.

Basis Training Data | NMF (Iterative), F1-Score [%] (Micro) | NMF (Iterative), F1-Score [%] (Macro) | NMF (Non-Iterative), F1-Score [%] (Micro) | NMF (Non-Iterative), F1-Score [%] (Macro)
---|---|---|---|---
STR | 40.12 | 39.23 | 40.02 | 38.45
WEAK(U) | 40.02 | 38.89 | 38.98 | 37.65
WEAK | 41.53 | 38.28 | 39.01 | 37.76
STR + WEAK(U) | 38.91 | 37.50 | 38.39 | 38.79
STR + WEAK | 38.51 | 37.39 | 39.97 | 38.40

**Table 4.** Comparison results with various thresholds of singular values for calculating the pseudo-inverse.

Threshold | F1-Score [%] (Micro), w/o Event-Wise Post-Processing | F1-Score [%] (Macro), w/o Event-Wise Post-Processing | F1-Score [%] (Micro), w/ Event-Wise Post-Processing | F1-Score [%] (Macro), w/ Event-Wise Post-Processing
---|---|---|---|---
$\gamma =0.001$ | 27.26 | 24.56 | 35.16 | 34.24
$\gamma =0.005$ | 33.59 | 29.95 | 39.59 | 37.94
$\gamma =0.01$ | 34.87 | 30.16 | 40.02 | 38.45
$\gamma =0.05$ | 32.52 | 28.35 | 38.51 | 36.23
$\gamma =0.1$ | 21.45 | 10.88 | 26.45 | 17.39
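
Table 4 varies the threshold $\gamma$ applied to the singular values when the pseudo-inverse is calculated. The sketch below illustrates one way such a thresholded pseudo-inverse could be computed with NumPy and then applied to a spectrogram in a single matrix product. It rests on several assumptions that the text here does not confirm: that $\gamma$ is relative to the largest singular value, that the pseudo-inverse is taken of the frequency basis matrix, and that negative activations are clipped to zero. The matrices `W` and `V` are random placeholders, not data from the paper.

```python
import numpy as np


def thresholded_pinv(W, gamma):
    """Pseudo-inverse of W that discards small singular values.

    Assumption: gamma is a threshold relative to the largest singular value.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    keep = s > gamma * s.max()          # singular values that survive the threshold
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]         # invert only the retained singular values
    return (Vt.T * s_inv) @ U.T         # V * diag(s_inv) * U^T


rng = np.random.default_rng(0)
W = rng.random((8, 3))   # hypothetical frequency basis (frequency bins x components)
V = rng.random((8, 5))   # hypothetical magnitude spectrogram (frequency bins x frames)

W_pinv = thresholded_pinv(W, gamma=0.01)
# Clipping to keep the activations non-negative is an assumption of this sketch.
H = np.maximum(W_pinv @ V, 0.0)
```

A larger $\gamma$ discards more singular values and gives a lower-rank, more heavily regularized inverse, while a smaller $\gamma$ retains directions with tiny singular values whose inverses can amplify noise. This trade-off is consistent with the scores in Table 4 peaking at an intermediate threshold.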

