# Collaborative Filtering to Predict Sensor Array Values in Large IoT Networks

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Collaborative Filtering Methods

#### 2.2. Datasets Description

#### 2.3. Dataset Synthesis via Copulas

Observe that, by construction, the correlation matrix and marginal histograms of the synthetic dataset closely match those shown in Figure 3 and Figure 5.
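
The copula construction can be illustrated with a minimal Python sketch. It assumes a Gaussian copula with empirical marginals, which this section does not confirm as the authors' choice, and the helper `gaussian_copula_sample` is hypothetical. Each column is mapped to normal scores through its ranks, the correlation of those scores is estimated, correlated normal vectors are sampled, and each margin is mapped back through the quantiles of the original column. The synthetic marginals and rank correlations therefore track the real ones by construction.

```python
import numpy as np
from scipy import stats


def gaussian_copula_sample(data, n_samples, seed=0):
    """Draw synthetic rows that keep the marginals and dependence of `data`.

    Assumption: Gaussian copula + empirical marginals (illustrative choice).
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # 1. Pseudo-observations: per-column ranks rescaled into the open interval (0, 1).
    u = stats.rankdata(data, axis=0) / (n + 1)
    # 2. Normal scores and their correlation matrix (the copula parameter).
    z = stats.norm.ppf(u)
    corr = np.corrcoef(z, rowvar=False)
    # 3. Sample correlated standard normals and map them back to uniforms.
    z_new = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u_new = stats.norm.cdf(z_new)
    # 4. Invert each empirical marginal: quantiles of the original column.
    synth = np.empty((n_samples, d))
    for j in range(d):
        synth[:, j] = np.quantile(data[:, j], u_new[:, j])
    return synth


# Usage on a toy 3-sensor array with linear and non-linear dependence.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
real = np.column_stack([x,
                        2 * x + rng.normal(scale=0.5, size=2000),
                        np.exp(x + rng.normal(scale=0.3, size=2000))])
fake = gaussian_copula_sample(real, 5000)
print(np.corrcoef(real, rowvar=False).round(2))
print(np.corrcoef(fake, rowvar=False).round(2))
```

Because the marginals are inverted from empirical quantiles, synthetic values never leave the observed range of each sensor; a parametric or smoothed marginal (for instance a kernel density estimate like the one in Figure 3) would be needed to extrapolate beyond it.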

## 3. Experimental Results

## 4. Conclusions

**Figure 3.** Distribution of the first 8 features in the [39] dataset. Each subplot shows the distribution of one reading of the sensor array: the histogram of the feature in grey and a Gaussian kernel density estimate in orange.

**Figure 4.** Boxplot of the normalized range of values for the 16 sensors in the [39] dataset. x-axis: sensor ID; y-axis: normalized range of values.

**Figure 5.** Correlation matrix for the 16 sensors in the [39] dataset. Warm colors (up to light orange) indicate strong positive correlation, and cool colors (up to light blue) indicate strong negative correlation; black indicates no linear correlation.

**Figure 6.** Correlation matrix for the 128 sensors in the [40] dataset. Warm colors (up to light orange) indicate strong positive correlation, and cool colors (up to light blue) indicate strong negative correlation; black indicates no linear correlation.

**Table 1.** Best hyper-parameters found by grid search on the [39] dataset.

Sparsity | PMF factors | PMF $\gamma$ | PMF $\lambda$ | BiasedMF factors | BiasedMF $\gamma$ | BiasedMF $\lambda$ | NMF factors | RowKNN neighbors | ColKNN neighbors
---|---|---|---|---|---|---|---|---|---
0.1 | 20 | 0.070 | 0.005 | 15 | 0.070 | 0.005 | 10 | 25 | 5
0.2 | 20 | 0.070 | 0.005 | 15 | 0.070 | 0.005 | 5 | 25 | 5
0.3 | 15 | 0.070 | 0.005 | 15 | 0.070 | 0.005 | 5 | 25 | 5
0.4 | 10 | 0.070 | 0.010 | 20 | 0.070 | 0.015 | 5 | 25 | 5
0.5 | 15 | 0.070 | 0.015 | 20 | 0.070 | 0.020 | 5 | 25 | 5
0.6 | 10 | 0.070 | 0.020 | 20 | 0.070 | 0.030 | 15 | 25 | 5
0.7 | 10 | 0.035 | 0.005 | 15 | 0.070 | 0.040 | 15 | 25 | 5
0.8 | 15 | 0.040 | 0.005 | 15 | 0.070 | 0.070 | 15 | 25 | 5
0.9 | 15 | 0.070 | 0.070 | 15 | 0.070 | 0.070 | 15 | 25 | 5
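
A grid search like the one summarized in Tables 1 to 3 can be sketched as follows. This is a minimal illustration under several assumptions not stated in this section: sparsity is read as the fraction of hidden cells, $\gamma$ is treated as the SGD learning rate and $\lambda$ as the L2 regularization weight (a common convention for PMF-style models), the model is a plain unbiased matrix factorization rather than the exact PMF, BiasedMF, or NMF variants, and `fit_mf`, `grid_search`, and the grid values are hypothetical.

```python
import itertools

import numpy as np


def fit_mf(R, observed, factors, gamma, lam, epochs=30, seed=0):
    """Plain matrix factorization R ~ P @ Q.T fitted by SGD on observed cells.

    Assumption: gamma is the SGD learning rate and lam the L2 penalty.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    P = 0.1 * rng.standard_normal((n, factors))
    Q = 0.1 * rng.standard_normal((m, factors))
    rows, cols = np.nonzero(observed)
    for _ in range(epochs):
        for idx in rng.permutation(rows.size):
            i, j = rows[idx], cols[idx]
            err = R[i, j] - P[i] @ Q[j]
            p_i = P[i].copy()  # keep the pre-update row for Q's gradient
            P[i] += gamma * (err * Q[j] - lam * P[i])
            Q[j] += gamma * (err * p_i - lam * Q[j])
    return P @ Q.T


def grid_search(R, sparsity, grid, seed=0):
    """Hide a `sparsity` fraction of cells; return ((factors, gamma, lam), rmse)."""
    rng = np.random.default_rng(seed)
    observed = rng.random(R.shape) >= sparsity  # True = visible during training
    hidden = ~observed
    best = None
    for factors, gamma, lam in itertools.product(*grid):
        pred = fit_mf(R, observed, factors, gamma, lam)
        # Score only on the hidden cells, i.e. the values to be predicted.
        rmse = np.sqrt(np.mean((pred[hidden] - R[hidden]) ** 2))
        if np.isfinite(rmse) and (best is None or rmse < best[1]):
            best = ((factors, gamma, lam), rmse)
    return best


# Toy example: a rank-2 "sensor" matrix (40 readings x 8 sensors), 30% hidden.
rng = np.random.default_rng(42)
R = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 8))
best = grid_search(R, sparsity=0.3, grid=([2, 4], [0.02, 0.05], [0.005, 0.05]))
print(best)
```

Repeating the search for each sparsity level yields one row of hyper-parameters per level, which is the shape of the tables above; divergent settings (non-finite RMSE) are simply skipped.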

**Table 2.** Best hyper-parameters found by grid search on the [40] dataset.

Sparsity | PMF factors | PMF $\gamma$ | PMF $\lambda$ | BiasedMF factors | BiasedMF $\gamma$ | BiasedMF $\lambda$ | NMF factors | RowKNN neighbors | ColKNN neighbors
---|---|---|---|---|---|---|---|---|---
0.1 | 20 | 0.045 | 0.005 | 20 | 0.045 | 0.005 | 20 | 25 | 5
0.2 | 20 | 0.050 | 0.005 | 20 | 0.050 | 0.005 | 20 | 25 | 5
0.3 | 20 | 0.055 | 0.005 | 20 | 0.060 | 0.005 | 20 | 25 | 5
0.4 | 20 | 0.060 | 0.005 | 20 | 0.065 | 0.005 | 15 | 25 | 5
0.5 | 20 | 0.070 | 0.005 | 20 | 0.070 | 0.005 | 20 | 25 | 5
0.6 | 20 | 0.070 | 0.005 | 15 | 0.070 | 0.005 | 20 | 25 | 5
0.7 | 20 | 0.070 | 0.005 | 20 | 0.070 | 0.005 | 5 | 25 | 5
0.8 | 5 | 0.070 | 0.005 | 20 | 0.070 | 0.010 | 5 | 25 | 5
0.9 | 5 | 0.070 | 0.005 | 5 | 0.070 | 0.010 | 5 | 25 | 5

**Table 3.** Best hyper-parameters found by grid search on the synthetic dataset.

Sparsity | PMF factors | PMF $\gamma$ | PMF $\lambda$ | BiasedMF factors | BiasedMF $\gamma$ | BiasedMF $\lambda$ | NMF factors | RowKNN neighbors | ColKNN neighbors
---|---|---|---|---|---|---|---|---|---
0.1 | 20 | 0.070 | 0.005 | 20 | 0.070 | 0.005 | 5 | 25 | 5
0.2 | 20 | 0.070 | 0.005 | 20 | 0.070 | 0.005 | 5 | 25 | 5
0.3 | 20 | 0.070 | 0.005 | 20 | 0.070 | 0.010 | 10 | 25 | 5
0.4 | 15 | 0.070 | 0.015 | 20 | 0.070 | 0.015 | 10 | 25 | 5
0.5 | 15 | 0.070 | 0.020 | 10 | 0.070 | 0.020 | 5 | 25 | 5
0.6 | 10 | 0.030 | 0.005 | 10 | 0.070 | 0.030 | 15 | 25 | 5
0.7 | 10 | 0.040 | 0.005 | 10 | 0.002 | 0.065 | 10 | 25 | 5
0.8 | 10 | 0.045 | 0.005 | 10 | 0.030 | 0.070 | 15 | 25 | 5
0.9 | 10 | 0.070 | 0.070 | 15 | 0.070 | 0.070 | 20 | 25 | 5
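
The RowKNN and ColKNN baselines in Tables 1 to 3 are neighborhood methods. The sketch below shows one plausible reading, assumed rather than taken from the paper: RowKNN fills a missing cell $(i, j)$ with the mean of column $j$ over the $k$ rows closest to row $i$ (root-mean-square difference on co-observed columns), and ColKNN applies the same procedure to the transposed matrix. The function names `row_knn_impute` and `col_knn_impute` and the distance measure are hypothetical choices.

```python
import numpy as np


def row_knn_impute(R, observed, k):
    """Fill unobserved cells of R with the mean of the k most similar rows.

    Assumption: similarity = RMS difference over co-observed columns.
    """
    filled = R.astype(float).copy()
    n, _ = R.shape
    for i in range(n):
        missing_cols = np.nonzero(~observed[i])[0]
        if missing_cols.size == 0:
            continue
        # Distance from row i to every other row over columns both have observed.
        dists = np.full(n, np.inf)
        for r in range(n):
            common = observed[i] & observed[r]
            if r != i and common.any():
                dists[r] = np.sqrt(np.mean((R[i, common] - R[r, common]) ** 2))
        for j in missing_cols:
            # Candidate neighbors must have column j observed.
            cand = np.nonzero(observed[:, j] & np.isfinite(dists))[0]
            if cand.size == 0:
                # Fallback: column mean of observed cells, or 0.0 if none exist.
                filled[i, j] = R[observed[:, j], j].mean() if observed[:, j].any() else 0.0
                continue
            nearest = cand[np.argsort(dists[cand])[:k]]
            filled[i, j] = R[nearest, j].mean()
    return filled


def col_knn_impute(R, observed, k):
    """Column variant: the same procedure applied to the transposed matrix."""
    return row_knn_impute(R.T, observed.T, k).T


R = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, np.nan],
              [10.0, 20.0, 30.0]])
observed = ~np.isnan(R)
print(row_knn_impute(R, observed, k=1)[1, 2])  # 3.0 (nearest row is row 0)
```

Rows and columns play symmetric roles in this sketch, which is why a single routine covers both variants; in Tables 1 to 3 only the neighbor count differs (25 for RowKNN, 5 for ColKNN).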

