# Explorative Multidimensional Analysis for Energy Efficiency: DataViz versus Clustering Algorithms


## Abstract


## 1. Introduction

#### 1.1. Motivation and Problem Identification

#### 1.2. Current Paper Aim and Structure

## 2. Large Scale Buildings Energy Monitoring Methods

#### 2.1. Data Visualization

#### 2.1.1. The Scatter Plot Matrix

#### 2.1.2. The Parallel Coordinates

#### 2.2. Data Clustering Algorithms

#### 2.2.1. Distance Metrics

#### 2.2.2. Evaluation

- (i) The within-cluster sum of squares [54]:

  $$Q_T = \frac{1}{k}\sum_{j=1}^{k}\sigma_j = \frac{1}{k}\sum_{j=1}^{k}\sum_{i=1}^{|Z_j|}\frac{d(x_i^j, c_j)}{|Z_j|}$$

- (ii) The Davies–Bouldin index [55]:

  $$DB = \frac{1}{k}\sum_{i=1}^{k}\max_{j \ne i}\left(\frac{\sigma_i + \sigma_j}{d(c_i, c_j)}\right)$$

- (iii) The silhouette index [56]:

  $$S = \frac{1}{k}\sum_{j=1}^{k}S_j = \frac{1}{k}\sum_{j=1}^{k}\frac{1}{|Z_j|}\sum_{i=1}^{|Z_j|}\frac{b_i^j - a_i^j}{\max\left[a_i^j, b_i^j\right]}$$

  where

  $$a_i^j = \frac{1}{|Z_j|}\sum_{l=1,\, l \ne i}^{|Z_j|} d\left(x_i^j, x_l^j\right) \quad \text{and} \quad b_i^j = \min_{p=1,\dots,k;\; p \ne j}\left[\frac{1}{|Z_p|}\sum_{l=1}^{|Z_p|} d\left(x_i^j, x_l^p\right)\right]$$
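The three indices can be computed directly from a labelled dataset. The following is a minimal sketch that follows the formulas as written above, assuming Euclidean distance for $d$ and at least two clusters; the function name `cluster_indices` and its arguments are illustrative and do not come from the paper. Note that the silhouette here is averaged per cluster first and then over clusters, and that $a_i^j$ is divided by $|Z_j|$, so the values may differ slightly from library implementations that use other conventions.

```python
import numpy as np

def cluster_indices(X, labels):
    """Return (Q_T, DB, S) for points X and integer cluster labels.

    Assumptions (not from the source): Euclidean distance, k >= 2.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    k = len(ids)
    clusters = [X[labels == j] for j in ids]
    centroids = np.array([Z.mean(axis=0) for Z in clusters])

    # sigma_j: mean distance between the points of cluster j and its centroid
    sigma = np.array([
        np.linalg.norm(Z - c, axis=1).mean()
        for Z, c in zip(clusters, centroids)
    ])
    q_t = sigma.mean()

    # Davies-Bouldin: for each cluster, take the worst (largest) ratio
    db = 0.0
    for i in range(k):
        db += max(
            (sigma[i] + sigma[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(k) if j != i
        )
    db /= k

    # Silhouette: per-point score, averaged per cluster, then over clusters
    s_clusters = []
    for j, Z in enumerate(clusters):
        s_points = []
        for x in Z:
            # distance of x to itself is 0, so the sum covers l != i
            a = np.linalg.norm(Z - x, axis=1).sum() / len(Z)
            b = min(
                np.linalg.norm(clusters[p] - x, axis=1).mean()
                for p in range(k) if p != j
            )
            s_points.append((b - a) / max(a, b))
        s_clusters.append(np.mean(s_points))
    s = np.mean(s_clusters)

    return float(q_t), float(db), float(s)

# Toy example: two well-separated pairs of points
X = [[0, 0], [0, 2], [10, 0], [10, 2]]
labels = [0, 0, 1, 1]
q_t, db, s = cluster_indices(X, labels)
```

Lower values of $Q_T$ and $DB$ and higher values of $S$ indicate more compact, better separated clusters.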

#### 2.2.3. Clustering Algorithms

## 3. Methodology

#### 3.1. Dataset and Indices Description

#### 3.2. k-Means Algorithm

## 4. Results

#### 4.1. Cluster Identification

#### 4.1.1. Cluster Hypothesis

#### 4.1.2. Data Visualization Techniques

#### 4.1.3. Clustering Algorithm

#### 4.1.4. Comparison between DataViz and k-Means Clusters

#### 4.2. Setting Thresholds

#### 4.3. Monitoring Trends

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Acknowledgments

## Conflicts of Interest

## Nomenclature

## List of Abbreviations

Abbreviation | Meaning |
---|---|
GHG | Greenhouse Gas |
HVAC | Heating, Ventilation and Air Conditioning |
TRNSYS | Transient System Simulation Tool |
EEI | Energy Efficiency Index |
DataViz | Data Visualization |
TOE | Tonne of Oil Equivalent |
kWh | Kilowatt hour |
WSS | Within-Cluster Sum of Squares |

Symbol | Description |
---|---|
$X, A, B$ | Multidimensional dataset |
$n$ | Number of elements in dataset $X$ |
$m$ | Number of attributes/dimensions of dataset $X$ |
$x_i$ | Observation $i$ of dataset $X$ |
$x_{ij}$ | Real value of attribute $j$ of observation $i$ |
$i, l$ | Observation subscripts |
$D_{il}$ | Minkowski (or Mahalanobis) distance between observations $i$ and $l$ |
$p$ | Minkowski order |
$J_{\delta}(A,B)$ | Jaccard distance |
$Q_T$ | Within-cluster sum of squares |
$DB$ | Davies–Bouldin index |
$S$ | Silhouette index |
$k$ | Number of clusters |
$x_i^j$ | Observation $i$ lying in cluster $j$ |
$c_x$ | Centroid of cluster $x$ |
$\sigma_x$ | Mean distance between the points of cluster $x$ and its centroid |
$\lvert Z_x \rvert$ | Number of points in cluster $x$ |
$d(x_i, x_j)$ | Distance between points $x_i$ and $x_j$ |
$x_{min,j}, x_{max,j}$ | Alert thresholds for attribute $j$ |
$EEI_{year,kWh,night/day}$ | Annual night/day electrical energy efficiency index |
$E_{i,kWh,day}$ | Electrical energy consumption during working days for month $i$ |
$E_{i,kWh,night}$ | Electrical energy consumption during nights and weekends for month $i$ |

**Figure 1.** Scatter plot matrix for the Unito building stock with respect to four attributes: type of building (1–9), the night/day energy efficiency index, the energy consumption per user and the energy consumption per square meter.

**Figure 2.** Parallel coordinates plot for the Unito building stock for two building functions, (**a**) Humanities Depts. and (**b**) Agrarian Depts., with respect to four attributes: type of building (1–9), the night/day energy efficiency index, absolute annual energy consumption and the energy consumption per square meter.

**Figure 3.** Elbow method. The plot shows the within-cluster sum of squares vs. k (number of clusters). The appropriate value of k lies between 9 and 10.

**Figure 4.** Interactive data visualization tool to monitor historical trends, based on the parallel coordinates method.

k | WSS | DB Index | Silhouette Index |
---|---|---|---|
3 | 0.597 | 2.165 | 0.407 |
4 | 0.426 | 1.985 | 0.505 |
5 | 0.321 | 1.966 | 0.479 |
6 | 0.271 | 1.726 | 0.466 |
7 | 0.228 | 1.701 | 0.490 |
8 | 0.213 | 1.694 | 0.411 |
9 | 0.146 | 1.501 | 0.680 |
10 | 0.140 | 1.539 | 0.531 |
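As a quick cross-check of the table, the sketch below picks the k with the lowest Davies–Bouldin index and the k with the highest silhouette index, and lists how much the WSS falls at each step. The values are copied from the table; the variable names are illustrative only.

```python
# (k, WSS, DB index, silhouette index), copied from the table above
rows = [
    (3, 0.597, 2.165, 0.407),
    (4, 0.426, 1.985, 0.505),
    (5, 0.321, 1.966, 0.479),
    (6, 0.271, 1.726, 0.466),
    (7, 0.228, 1.701, 0.490),
    (8, 0.213, 1.694, 0.411),
    (9, 0.146, 1.501, 0.680),
    (10, 0.140, 1.539, 0.531),
]

# Lower DB is better; higher silhouette is better
best_k_db = min(rows, key=lambda r: r[2])[0]
best_k_sil = max(rows, key=lambda r: r[3])[0]

# Decrease in WSS when moving from k-1 to k (elbow heuristic)
wss_drops = {
    cur[0]: round(prev[1] - cur[1], 3)
    for prev, cur in zip(rows, rows[1:])
}

print(best_k_db, best_k_sil)
print(wss_drops)
```

Both indices select k = 9, and the WSS decrease from k = 9 to k = 10 is much smaller than the one from k = 8 to k = 9, which agrees with the elbow reported in Figure 3.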

Rand Index | Fowlkes Index |
---|---|
0.769 | 0.645 |
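Both indices compare two partitions of the same objects by counting pairs of objects that are grouped together or apart in each partition. The sketch below uses the standard pair-counting definitions of the Rand index and the Fowlkes–Mallows index; it is not the authors' code, and the function name `pair_counting_indices` and the toy labels are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pair_counting_indices(labels_a, labels_b):
    """Return (Rand index, Fowlkes-Mallows index) for two labelings."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1      # together in both partitions
        elif same_b:
            fp += 1      # together only in partition b
        elif same_a:
            fn += 1      # together only in partition a
        else:
            tn += 1      # apart in both partitions
    rand = (tp + tn) / (tp + fp + fn + tn)
    denom = sqrt((tp + fp) * (tp + fn))
    fowlkes = tp / denom if denom > 0 else 0.0
    return rand, fowlkes

# Toy example: e.g. visually assigned groups vs. k-means labels
visual = [0, 0, 1, 1]
kmeans = [0, 0, 1, 2]
rand, fowlkes = pair_counting_indices(visual, kmeans)
```

Both indices equal 1 when the two partitions coincide, so the values in the table indicate substantial but not complete agreement.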

Building type | kWh/(year·m$^{2}$) | $EEI_{night/day}$ |
---|---|---|
Scientific Depts without lab | 30–50 | 0.8–1.1 |
Scientific Depts with lab | 70–110 | 1.1–1.9 |
Humanities Depts | <50 | 0.6–1.1 |
Agrarian Depts | 20–70 | 1.5–2.5 |
Medical Depts | 50–70 | 1.2–1.5 |
Administrative Offices | <50 | 0.4–1 |
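One way such ranges could be used as alert thresholds ($x_{min,j}$, $x_{max,j}$) is sketched below. The ranges are copied from the table; the dictionary layout, the function name `out_of_range`, and the treatment of bounds as inclusive are assumptions made for illustration. An entry such as "<50" is stored with no lower bound.

```python
# Threshold ranges copied from the table; None means no lower bound ("<50").
# Keys and structure are hypothetical, chosen for this example only.
THRESHOLDS = {
    "Scientific Depts without lab": {"kwh_per_m2": (30, 50), "eei_night_day": (0.8, 1.1)},
    "Scientific Depts with lab": {"kwh_per_m2": (70, 110), "eei_night_day": (1.1, 1.9)},
    "Humanities Depts": {"kwh_per_m2": (None, 50), "eei_night_day": (0.6, 1.1)},
    "Agrarian Depts": {"kwh_per_m2": (20, 70), "eei_night_day": (1.5, 2.5)},
    "Medical Depts": {"kwh_per_m2": (50, 70), "eei_night_day": (1.2, 1.5)},
    "Administrative Offices": {"kwh_per_m2": (None, 50), "eei_night_day": (0.4, 1.0)},
}

def out_of_range(building_type, kwh_per_m2, eei_night_day):
    """Return the names of the attributes that fall outside their range."""
    values = {"kwh_per_m2": kwh_per_m2, "eei_night_day": eei_night_day}
    alerts = []
    for name, (x_min, x_max) in THRESHOLDS[building_type].items():
        x = values[name]
        # Assumption: bounds are inclusive, so a value equal to a bound is fine
        if (x_min is not None and x < x_min) or x > x_max:
            alerts.append(name)
    return alerts

# A humanities building consuming 65 kWh/(year*m2) with EEI 0.9
alerts = out_of_range("Humanities Depts", 65, 0.9)
```

A building that returns a non-empty list would be a candidate for closer inspection, for example with the parallel coordinates tool shown in Figure 4.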

