# Fibers of Failure: Classifying Errors in Predictive Processes


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Topological Data Analysis (TDA)

**Coordinate invariance:** TDA considers only the distances between data points as a notion of similarity (or dissimilarity). This means that a topological model can be rotated freely in space to aid the visual analysis of the data. Compare this property with a common data analysis tool such as principal component analysis (PCA), which fixes the visual outcome by projecting the data onto the two or three directions of maximum variance. Figure 1a illustrates the coordinate invariance property for an arbitrary dataset.
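Coordinate invariance can be checked directly: pairwise distances, the only input a distance-based topological model uses, are unchanged by rotation. A minimal NumPy sketch (illustrative only, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))  # an arbitrary 2-D dataset

# Rotate the dataset roughly 100 degrees, as in Figure 1a.
theta = np.deg2rad(100)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X_rot = X @ R.T

def pairwise_dist(A):
    """Euclidean distance matrix."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# The distance matrices agree up to floating-point error, so any
# distance-based topological model of X and X_rot is identical.
assert np.allclose(pairwise_dist(X), pairwise_dist(X_rot))
```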

**Deformation invariance:** Topology describes shapes differently than geometry does. For example, a sphere and a cube are identical (homeomorphic) according to topology; likewise, a circle and an ellipse are identical. White noise is inherent to any dataset and can be considered a deformation of the underlying distribution of the dataset. Due to the deformation invariance property, TDA is a suitable method for analyzing noisy datasets and thus presents a more accurate visualization of the underlying dataset. An example of deformation invariance is the figure-8-shaped dataset in Figure 1b.

**Compression:** This property enables TDA to represent large datasets in a simple manner. Imagine a dataset of millions of data points shaped like the letter Y; see Figure 1c. The compression property enables TDA to approximate the dataset using 4 nodes, which contain the data points, and 3 edges, which express the relations between the data points. This property makes TDA highly scalable.

#### 2.2. Mapper

- Choose a collection of maps ${f}_{1},\cdots ,{f}_{k}:X\to \mathbb{R}$, or equivalently some $f:X\to {\mathbb{R}}^{k}$. These are usually chosen to be statistically meaningful quantities such as variables in the dataset, density or centrality estimates, or outputs from a dimensionality reduction algorithm such as PCA or MDS. These are usually referred to as lenses or filters.
- Choose a covering $\mathbb{U}=\{{U}_{1},\cdots \}$ of ${\mathbb{R}}^{k}$: an overlapping family of sets that covers all possible filter value combinations.
- Pull the covering back to a covering $\mathbb{V}=\{{V}_{i}\}$ of X, where ${V}_{i}={f}^{-1}\left({U}_{i}\right)$.
- Refine the covering $\mathbb{V}$ to a covering $\widehat{\mathbb{V}}$ by clustering each ${V}_{i}$.
- Create the nerve complex of the covering $\widehat{\mathbb{V}}$: as vertices of the complex we choose the indexing set of $\widehat{\mathbb{V}}$, and a simplex $[{i}_{0},\cdots ,{i}_{j}]$ is included if ${\widehat{V}}_{{i}_{0}}\cap \cdots \cap {\widehat{V}}_{{i}_{j}}\ne \varnothing $. If we are only interested in the underlying Mapper graph, it suffices to add an edge connecting any two vertices whose corresponding sets of data points share some data point.
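The steps above can be sketched in code. The following is a minimal, illustrative 1-D Mapper (one filter, with a trivial one-cluster-per-preimage stub in place of a real clustering step via the optional `cluster` argument); it is not the implementation used in the paper:

```python
import numpy as np

def mapper_graph(X, lens, n_intervals=10, overlap=0.5, cluster=None):
    """Minimal 1-D Mapper sketch: cover the lens range with
    overlapping intervals, cluster each preimage, and connect
    clusters that share data points."""
    f = lens(X)
    lo, hi = float(f.min()), float(f.max())
    length = (hi - lo) / n_intervals
    nodes, edges = [], set()
    for s in range(n_intervals):
        # Interval s of the cover, enlarged on both sides.
        a = lo + s * length - overlap * length / 2
        b = lo + (s + 1) * length + overlap * length / 2
        idx = np.where((f >= a) & (f <= b))[0]
        if idx.size == 0:
            continue
        # Clustering stub: one cluster per interval unless a real
        # clustering function is supplied.
        labels = cluster(X[idx]) if cluster else np.zeros(idx.size, dtype=int)
        for c in np.unique(labels):
            nodes.append(set(idx[labels == c].tolist()))
    # Nerve (graph level): connect nodes sharing a data point.
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if nodes[i] & nodes[j]:
                edges.add((i, j))
    return nodes, edges
```

For points spread evenly along a line, this produces a chain of nodes connected through the overlapping intervals.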

**Nerve Lemma:** If X is an arbitrary topological space and $\mathbb{U}=\left\{{U}_{i}\right\}$ is a good cover, then $X\simeq N\left(\mathbb{U}\right)$, where the nerve $N\left(\mathbb{U}\right)$ contains the simplex $[{i}_{0},\cdots ,{i}_{d}]$ if and only if ${\bigcap}_{k=0}^{d}{U}_{{i}_{k}}\ne \varnothing$.

- For each $i=1,\dots ,k$, select (1) a positive integer ${N}_{i}$ and (2) a positive real number ${p}_{i}$, with $0<{p}_{i}<1$.
- For each filter ${f}_{i}$, $i=1,\dots ,k$, let ${\min}_{i}$ and ${\max}_{i}$ denote the minimum and maximum values taken by ${f}_{i}$, and construct the unique covering of the interval ${J}^{i}=[{\min}_{i},{\max}_{i}]$ by ${N}_{i}$ subintervals ${J}_{s}^{i}\subseteq {J}^{i}$ of equal length $\frac{{\max}_{i}-{\min}_{i}}{{N}_{i}}$. Enlarge each interior interval by moving its right and left endpoints $\frac{{p}_{i}}{2}\cdot \frac{{\max}_{i}-{\min}_{i}}{{N}_{i}}$ to the right and to the left, respectively. For the leftmost (respectively rightmost) interval, perform the same enlargement only on the right (respectively left) endpoint. Denote the resulting intervals by ${J}_{1}^{i},\dots ,{J}_{{N}_{i}}^{i}$, from left to right.
- Construct the covering $\mathbb{U}$ of X by all “cubes” of the form ${({f}_{1}\times \cdots \times {f}_{k})}^{-1}({J}_{{s}_{1}}^{1}\times \cdots \times {J}_{{s}_{k}}^{k})$ where $1\le {s}_{i}\le {N}_{i}$. Note that this is a covering of X by overlapping sets.
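The interval construction above can be written as a short helper. `interval_cover` is a hypothetical name used here for illustration only:

```python
def interval_cover(fmin, fmax, N, p):
    """Cover [fmin, fmax] with N equal-length intervals, then
    enlarge each interior endpoint by (p/2) * length, following
    the construction in Section 2.2. The outer endpoints of the
    leftmost and rightmost intervals stay fixed."""
    length = (fmax - fmin) / N
    pad = (p / 2) * length
    intervals = []
    for s in range(N):
        a = fmin + s * length
        b = a + length
        left = a if s == 0 else a - pad
        right = b if s == N - 1 else b + pad
        intervals.append((left, right))
    return intervals
```

With `N = 5` and `p = 0.5` on the interval [0, 10], every pair of consecutive intervals overlaps by `p * length = 1`.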

#### 2.3. FiFa: The General Case

- Create a Mapper model that uses a measure of prediction failure as a filter.
- Classify hotspots of prediction failure in the Mapper model as distinct failure modes.
- Use the identified failure modes to construct a model correction layer or to provide guidance for model refinement.

#### 2.3.1. Mapper on Prediction Failure

#### 2.3.2. Extracting Subgroups

#### 2.3.3. Quantitative: Model Correction Layer

**Train classifiers.** For our illustrative examples, we demonstrate several “one versus rest” binary classifier ensembles, where each classifier is trained to recognize one of the failure modes (extracted subgroups) from the Mapper graph.

**Evaluate bias.** A classifier trained on a failure mode may well capture larger parts of the test data than expected. As long as the space identified as a failure mode has a consistent bias, it remains useful for model correction: by evaluating the bias in the data captured by a failure-mode classifier, we can calibrate the correction layer.

**Adjust model.** The actual correction on new data is a type of ensemble model, with flexibility in how to reconcile the bias prediction with the original model prediction, or even how to reconcile several bias predictions with each other. In the example cases used in this paper, we showcase two different methods for adjusting the model: on the one hand, replacing a classifier prediction with the most common class in the observed failure mode, and on the other hand, using the mean error as an offset.
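The two adjustment strategies can be sketched as follows. `apply_correction` is a hypothetical helper, not the paper's implementation, and it assumes the mean error is defined as prediction minus ground truth:

```python
import numpy as np

def apply_correction(y_pred, in_mode, mode_value, method="offset"):
    """Sketch of the two correction strategies.
    y_pred: original model predictions.
    in_mode: boolean mask from a failure-mode classifier.
    mode_value: the failure mode's most common class ('replace',
    classification case) or its mean error ('offset', regression
    case, assuming error = prediction - ground truth)."""
    y_adj = np.asarray(y_pred, dtype=float).copy()
    if method == "replace":
        y_adj[in_mode] = mode_value
    elif method == "offset":
        y_adj[in_mode] = y_adj[in_mode] - mode_value
    return y_adj
```

Only the predictions flagged by the failure-mode classifier are touched; the original model's output is kept everywhere else.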

**Note on Type S and Type M errors.**

#### 2.3.4. Qualitative: Model Inspection

#### 2.4. Statistical Modeling

#### 2.4.1. Performance Metrics

#### 2.4.2. Kolmogorov–Smirnov (KS) Statistic

#### 2.5. MNIST Data with Added Noise

#### 2.6. Electric Arc Furnace

#### 2.7. Selected Models for Analysis

#### 2.7.1. CNN Model Predicting Handwritten Digits

#### 2.7.2. ANN Model Predicting the EE Consumption of an EAF

#### 2.8. FiFa on the MNIST Model

#### 2.8.1. Quantitative

**Filters:** Principal component 1, probability of the predicted digit, probability of the ground truth digit, and the ground truth digit. Our measure of predictive error is the probability of the ground truth digit. By including the ground truth digit itself, we separate the model on ground truth, guaranteeing that any one failure mode has a consistent ground truth that can be used for corrections.

**Metric:** Variance-normalized Euclidean.

**Variables:** 9472 network activations: all activations after the dropout layer that finishes the convolutional part of the network and before the softmax layer that provides the final predictions. These are the layers with 9216, 128, and 128 nodes displayed in Figure 4.

**Instances:** We used 16,000 data points (5-fold-training): a selection of 4 of the 5 folds from a randomized mix of the MNIST-test and C-MNIST-test datasets. See Figure 5 for an illustration.

**Ten activations** from the Dense-10 layer, consisting of the probabilities for each digit, 0–9.

**Seven hundred and eighty-four pixel values** representing the flattened MNIST image of size 28 × 28 × 1.

**Six variables:** prediction by the CNN model, ground truth digit, corrupt or original data (binary), correct or incorrect prediction (binary), probability of the predicted digit (highest value of the Dense-10 layer), and probability of the ground truth digit.

#### 2.8.2. Qualitative

#### 2.9. FiFa on the EE Consumption Model

#### 2.9.1. Quantitative

**Filters:** principal component 1, the model error, and the true EE consumption.

**Metric:** interquartile range (IQR) normalized Euclidean.

**Variables:** forty-eight variables; see Table A4.

**Instances:** the same data points, from 9533 heats, used to train the ANN model for predicting the EE consumption.

- The average error of the data points captured on the training data, $\Delta {E}_{El}^{Tr}$, must have the same sign (type S error) as the average group error, $\Delta {E}_{El}^{Gr}$. This ensures that the errors of the data points captured by each classifier are consistent with the errors of the group it was trained to predict.
- The error after adjustment of the group data cannot be worse than the group error: $|\Delta {E}_{El}^{Gr}-\Delta {E}_{El}^{Tr}|<|\Delta {E}_{El}^{Gr}|$. This verifies that the classifier identifies data points whose average error is reasonably similar to that of the group it is trained to identify.
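The two acceptance criteria can be expressed as a small check. `classifier_accepted` is a hypothetical helper; the group error and the training-capture error are passed as plain numbers:

```python
def classifier_accepted(err_group, err_train):
    """Acceptance criteria for a failure-mode classifier:
    err_group: average error of the group the classifier targets.
    err_train: average error of the points it captures on training data.
    Accept only if the signs agree (type S check) and adjusting by
    err_train does not overshoot the group error."""
    same_sign = err_group * err_train > 0
    no_overshoot = abs(err_group - err_train) < abs(err_group)
    return same_sign and no_overshoot
```

For example, a group with an average error of 2160 kWh/heat accepts a classifier capturing points with an average error of 1500 kWh/heat, but rejects one at −500 (wrong sign) or 5000 (overshoot).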

#### 2.9.2. Qualitative

## 3. Results and Discussion

#### 3.1. MNIST Model

#### 3.1.1. Quantitative

#### 3.1.2. Qualitative

#### 3.2. EE Consumption Model

#### 3.2.1. Quantitative

#### 3.2.2. Qualitative

## 4. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
SHAP | Shapley Additive Explanations |
AHCL | Agglomerative Hierarchical Clustering |
PCA | Principal Component Analysis |
ANN | Artificial Neural Network |
CNN | Convolutional Neural Network |
EAF | Electric Arc Furnace |
EE | Electrical Energy |
FiFa | Fibres of Failure |
KS | Kolmogorov–Smirnov |
LR | Logistic Regression |
TDA | Topological Data Analysis |
CDF | Cumulative Distribution Function |

## Appendix A

**Table A1.** The parameters that were varied and the resulting total number of models used in the grid-search.

Parameter | Values | Description | Number of Parameters |
---|---|---|---|
Delay variables | See Table A4. | 5 unique segmentations of the delays imposed during each heat | 5 |
Hidden layers | $h\in \{1,2\}$ | Number of layers between the input and output layers | 2 |
Hidden nodes | $o\in \{1,2,\dots ,24\}$ | Number of nodes in each hidden layer | 24 |
Total number of models: | | | 240 |

**Table A2.** Parameters that were not varied in the grid-search. Specifications can be seen in the Scikit-learn package MLPRegressor that was used to create the models. See Table A3 for the software used to create the models.

Parameter | Value | Description |
---|---|---|
Output variable | Electrical energy | The goal of the model is to optimize for the true EE consumption |
No. training data points | 9533 | Selected from approximately 2 subsequent years of production |
No. test data points | 2384 | Selected in chronological order with respect to the training data |
Activation function | Logistic | Function applied between layers in the network |
Tolerance | ${10}^{-8}$ | Minimum change before training stops |
Max iterations | 5000 | Maximum number of iterations before training stops |
No. iterations with no change | 20 | Number of iterations with no change (below tolerance) before training stops |
Validation fraction | 0.2 | Fraction of training data used as validation set |
Optimizer | Adam | First moment vector = 0.9; second moment vector = 0.999 [80] |
Learning rate | 0.001 | Constant learning rate |

**Table A3.** The software used to create the models.

Software | Purpose |
---|---|
Anaconda | Python software bundle [81] |
Scikit-learn | Python package providing basic machine learning models |
Keras | Python package providing deep learning modeling; CNN |
Keras-vis | Python package providing visualisation tools for Keras |
Pandas | Python package for handling tabular data |
Matplotlib | Python package for plotting and drawing of Mapper graphs |
Ayasdi SDK | Python package used to retrieve results from Mapper, provided by Ayasdi, Inc. [82] |

**Table A4.** All variables used in the EAF case. The “Count” column shows the number of variables representing the parameter. The ANN/FiFa column indicates whether the variables were used as input variables in their respective models: “x” indicates the variable(s) is part of the model, while “-” indicates the variable is absent. Variables absent from both the ANN and FiFa models were still part of the qualitative analysis using the KS-statistic to identify groups in the Mapper graph.

Parameter(s) | Description | Unit | Count | ANN/FiFa |
---|---|---|---|---|
Electrical energy | The electrical energy consumption logged in the transformer system | kWh | 1 | -/- |
Error | The error from the ANN model predicting the electrical energy consumption | kWh | 1 | -/- |
Pre-heater energy | The calculated energy provided to the scrap by the scrap pre-heater | kWh | 1 | -/x |
Other energy variables | Related to the proprietary energy model used to determine when the molten steel is ready for tapping | kWh | 12 | -/- |
Total Weight | Total input weight of charged material | kg | 1 | x/x |
Metal Weight | Total input weight of metallic material | kg | 1 | -/x |
Slag Weight | Total input weight of oxide material | kg | 1 | -/x |
Oxide composition | Weight percent of 11 oxides and 1 fluoride | wt% | 12 | -/x |
Metal composition | Weight percent of 26 alloying elements | wt% | 26 | -/x |
Additive Propane | Total input of propane | ${m}^{3}$ | 1 | x/x |
Additive O2 | Total input of oxygen gas through lance | ${m}^{3}$ | 1 | x/x |
Additive O2 Burner | Total input of oxygen gas through burner | ${m}^{3}$ | 1 | x/x |
Additive N2 | Total input of nitrogen gas through lance | ${m}^{3}$ | 1 | -/x |
Rawtypes | Total input weight of each raw material category | kg | 7 | x/x |
Process Time | Defined as the start of the heat to the end of the heat | min | 1 | x/x |
Tap-To-Tap Time | Defined in the proprietary logging system | min | 1 | x/x |
Power-on Time | Defined as the total time the electrical energy source is powered on | s | 1 | -/- |
Power-off Time | Defined as the total time the electrical energy source is powered off | s | 1 | -/- |
First to last power-on | Time between the first and last power-on of the electrical energy source | min | 1 | -/x |
Charging Time | Total time spent charging scrap into the furnace | min | 1 | -/x |
Melting Remelting Time | Total time in the melting and remelting stages of the process | min | 1 | -/x |
Melting Time | Total time in the melting stage of the process | min | 1 | -/x |
Refining Time | Total time in the refining stage of the process | min | 1 | -/x |
Tapping Time | Total time spent in the tapping stage | min | 1 | -/x |
Preparation Time | Total preparation time before the start of the process | min | 1 | -/x |
All Delays | Includes all delays imposed on the heat | s | 1 | x/x |
Only wait delays | Includes only delays classified as “wait” type | s | 1 | x/x |
Rare delays⋆ | All delays excluding those that are expected to frequently occur | s | 1 | x/x |
Very rare delays† | All delays excluding those that are very rare | s | 1 | x/x |
Common delays | All delays except those defined in ⋆ and † | s | 1 | x/x |
Steeltype categories | The categorization of steel types produced in the EAF | - | 8 | -/- |
Production indices | To keep order of the heats relative to the production supply chain | - | 6 | -/- |

## References

- Box, G.E. Robustness in the strategy of scientific model building. In Robustness in Statistics; Elsevier: Amsterdam, The Netherlands, 1979; pp. 201–236. [Google Scholar]
- Nicolau, M.; Levine, A.J.; Carlsson, G. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proc. Natl. Acad. Sci. USA
**2011**, 108, 7265–7270. [Google Scholar] [CrossRef][Green Version] - Li, L.; Cheng, W.Y.; Glicksberg, B.S.; Gottesman, O.; Tamler, R.; Chen, R.; Bottinger, E.P.; Dudley, J.T. Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Sci. Transl. Med.
**2015**, 7, 311ra174. [Google Scholar] [CrossRef][Green Version] - Hinks, T.; Brown, T.; Lau, L.; Rupani, H.; Barber, C.; Elliott, S.; Ward, J.; Ono, J.; Ohta, S.; Izuhara, K.; et al. Multidimensional endotyping in patients with severe asthma reveals inflammatory heterogeneity in matrix metalloproteinases and chitinase 3–like protein 1. J. Allergy Clin. Immunol.
**2016**, 138, 61–75. [Google Scholar] [CrossRef] [PubMed][Green Version] - Schneider, D.S.; Torres, B.Y.; Oliveira, J.H.M.; Tate, A.T.; Rath, P.; Cumnock, K. Tracking resilience to infections by mapping disease space. PLoS Biol.
**2016**, 14, e1002436. [Google Scholar] - Romano, D.; Nicolau, M.; Quintin, E.M.; Mazaika, P.K.; Lightbody, A.A.; Hazlett, H.C.; Piven, J.; Carlsson, G.; Reiss, A.L. Topological methods reveal high and low functioning neuro-phenotypes within fragile X syndrome. Hum. Brain Mapp.
**2014**, 35, 4904–4915. [Google Scholar] [CrossRef][Green Version] - Carlsson, G. The shape of biomedical data. Curr. Opin. Syst. Biol.
**2017**, 1, 109–113. [Google Scholar] [CrossRef] - Cámara, P.G. Topological methods for genomics: Present and future direction. Curr. Opin. Syst. Biol.
**2017**, 1, 95–101. [Google Scholar] [CrossRef] [PubMed][Green Version] - Savir, A.; Toth, G.; Duponchel, L. Topological data analysis (TDA) applied to reveal pedogenetic principles of European topsoil system. Sci. Total Environ.
**2017**, 586, 1091–1100. [Google Scholar] - Bowman, G.; Huang, X.; Yao, Y.; Sun, J.; Carlsson, G.; Guibas, L.; Pande, V. Structural Insight into RNA Hairpin Folding Intermediates. JACS Commun.
**2008**, 130, 9676–9678. [Google Scholar] [CrossRef] - Duponchel, L. Exploring hyperspectral imaging data sets with topological data analysis. Anal. Chim. Acta
**2018**, 1000, 123–131. [Google Scholar] [CrossRef] - Duponchel, L. When remote sensing meets topological data analysis. J. Spectr. Imaging
**2018**, 7, a1. [Google Scholar] [CrossRef][Green Version] - Lee, Y.; arthel, S.D.B.; Dlotko, P.; Moosavi, S.M.; Hess, K.; Smit, B. Quantifying similarity of pore-geometry in nanoporous materials. Nat. Commun.
**2017**, 8, 1–8. [Google Scholar] [CrossRef] [PubMed][Green Version] - Lum, P.Y.; Singh, G.; Lehman, A.; Ishkanov, T.; Vejdemo-Johansson, M.; Alagappan, M.; Carlsson, J.; Carlsson, G. Extracting insights from the shape of complex data using topology. Sci. Rep.
**2013**, 3. [Google Scholar] [CrossRef][Green Version] - Brüel Gabrielsson, R.; Carlsson, G. Exposition and interpretation of the topology of neural networks. In Proceedings of the 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 1069–1076. [Google Scholar]
- Saul, N.; Arendt, D.L. Machine Learning Explanations with Topological Data Analysis. Available online: https://sauln.github.io/blog/tda_explanations/ (accessed on 25 May 2020).
- Carrière, M.; Michel, B. Approximation of Reeb spaces with Mappers and Applications to Stochastic Filters. arXiv
**2019**, arXiv:1912.10742. [Google Scholar] - Zhou, Y.; Song, S.; Cheung, N.M. On Classification of Distorted Images with Deep Convolutional Neural Networks. arXiv
**2017**, arXiv:1701.01924. [Google Scholar] - Dodge, S.; Karam, L. Understanding How Image Quality Affects Deep Neural Networks. In Proceedings of the 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX), Lisbon, Portugal, 6–8 June 2016. [Google Scholar]
- Cisse, M.; Adi, Y.; Neverova, N.; Keshet, J. Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples. In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Yuan, X.; He, P.; Zhu, Q.; Bhat, R.R.; Li, X. Adversarial Examples: Attacks and Defenses for Deep Learning. IEEE Trans. Neural Netw. Learn. Syst.
**2019**, 30, 2805–2824. [Google Scholar] [CrossRef][Green Version] - Moosavi-Dezfooli, S.M.; Fawzi, A.; Frossard, P. DeepFool: A simple and accurate method to fool deep neural networks. arXiv
**2015**, arXiv:1511.04599. [Google Scholar] - Chen, S.; Xue, M.; Fan, L.; Hao, S.; Xu, L.; Zhu, H.; Li, B. Automated poisoning attacks and defenses in malware detection systems: An adversarial machine learning approach. Comput. Secur.
**2018**, 73, 326–344. [Google Scholar] [CrossRef] - Wilson, A.G.; Kim, B.; Herlands, W. Interpretable Machine Learning for Complex Systems, NIPS 2016 Workshop. arXiv
**2016**, arXiv:1611.09139. [Google Scholar] - Tosi, A.; Vellido, A.; Alvarez, M. Transparent and Interpretable Machine Learning in Safety Critical Environments. 2017. Available online: https://sites.google.com/view/timl-nips2017 (accessed on 25 May 2020).
- Wilson, A.G.; Yosinski, J.; Simard, P.; Caruana, R.; Herlands, W. Interpretable ML Symposium. arXiv
**2017**, arXiv:1711.09889. [Google Scholar] - Varshney, K.; Weller, A.; Kim, B.; Malioutov, D. Human Interpretability in Machine Learning, ICML 2017 Workshop. arXiv
**2017**, arXiv:1708.02666. [Google Scholar] - Gunning, D. Explainable Artificial Intelligence (XAI). DARPA Broad Agency Announcement DARPA-BAA-16-53. 2016. Available online: https://www.aaai.org/ojs/index.php/aimagazine/article/view/2850 (accessed on 25 May 2020).
- Hara, S.; Maehara, T. Finding Alternate Features in Lasso. arXiv
**2016**, arXiv:1611.05940. [Google Scholar] - Wisdom, S.; Powers, T.; Pitton, J.; Atlas, L. Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery. arXiv
**2016**, arXiv:1611.07252. [Google Scholar] - Hayete, B.; Valko, M.; Greenfield, A.; Yan, R. MDL-motivated compression of GLM ensembles increases interpretability and retains predictive power. arXiv
**2016**, arXiv:1611.06800. [Google Scholar] - Tansey, W.; Thomason, J.; Scott, J.G. Interpretable Low-Dimensional Regression via Data-Adaptive Smoothing. arXiv
**2017**, arXiv:1708.01947. [Google Scholar] - Smilkov, D.; Thorat, N.; Nicholson, C.; Reif, E.; Viégas, F.B.; Wattenberg, M. Embedding Projector: Interactive Visualization and Interpretation of Embeddings. arXiv
**2016**, arXiv:1611.05469. [Google Scholar] - Selvaraju, R.R.; Das, A.; Vedantam, R.; Cogswell, M.; Parikh, D.; Batra, D. Grad-CAM: Why did you say that? arXiv
**2016**, arXiv:1611.07450. [Google Scholar] - Thiagarajan, J.J.; Kailkhura, B.; Sattigeri, P.; Ramamurthy, K.N. TreeView: Peeking into Deep Neural Networks Via Feature-Space Partitioning. arXiv
**2016**, arXiv:1611.07429. [Google Scholar] - Gallego-Ortiz, C.; Martel, A.L. Interpreting extracted rules from ensemble of trees: Application to computer-aided diagnosis of breast MRI. arXiv
**2016**, arXiv:1606.08288. [Google Scholar] - Krause, J.; Perer, A.; Bertini, E. Using Visual Analytics to Interpret Predictive Machine Learning Models. arXiv
**2016**, arXiv:1606.05685. [Google Scholar] - Zrihem, N.B.; Zahavy, T.; Mannor, S. Visualizing Dynamics: From t-SNE to SEMI-MDPs. arXiv
**2016**, arXiv:1606.07112. [Google Scholar] - Handler, A.; Blodgett, S.L.; O’Connor, B. Visualizing textual models with in-text and word-as-pixel highlighting. arXiv
**2016**, arXiv:1606.06352. [Google Scholar] - Krakovna, V.; Doshi-Velez, F. Increasing the Interpretability of Recurrent Neural Networks Using Hidden Markov Models. arXiv
**2016**, arXiv:1611.05934. [Google Scholar] - Reing, K.; Kale, D.C.; Steeg, G.V.; Galstyan, A. Toward Interpretable Topic Discovery via Anchored Correlation Explanation. arXiv
**2016**, arXiv:1606.07043. [Google Scholar] - Samek, W.; Montavon, G.; Binder, A.; Lapuschkin, S.; Müller, K.R. Interpreting the Predictions of Complex ML Models by Layer-wise Relevance Propagation. arXiv
**2016**, arXiv:1611.08191. [Google Scholar] - Hechtlinger, Y. Interpretation of Prediction Models Using the Input Gradient. arXiv
**2016**, arXiv:1611.07634. [Google Scholar] - Lundberg, S.; Lee, S.I. An unexpected unity among methods for interpreting model predictions. arXiv
**2016**, arXiv:1611.07478. [Google Scholar] - Vidovic, M.M.C.; Görnitz, N.; Müller, K.R.; Kloft, M. Feature Importance Measure for Non-linear Learning Algorithms. arXiv
**2016**, arXiv:1611.07567. [Google Scholar] - Whitmore, L.S.; George, A.; Hudson, C.M. Mapping chemical performance on molecular structures using locally interpretable explanations. arXiv
**2016**, arXiv:1611.07443. [Google Scholar] - Ribeiro, M.T.; Singh, S.; Guestrin, C. Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance. arXiv
**2016**, arXiv:1611.05817. [Google Scholar] - Singh, S.; Ribeiro, M.T.; Guestrin, C. Programs as Black-Box Explanations. arXiv
**2016**, arXiv:1611.07579. [Google Scholar] - Phillips, R.L.; Chang, K.H.; Friedler, S.A. Interpretable Active Learning. arXiv
**2017**, arXiv:1708.00049. [Google Scholar] - Ribeiro, M.T.; Singh, S.; Guestrin, C. Model-Agnostic Interpretability of Machine Learning. arXiv
**2016**, arXiv:1606.05386. [Google Scholar] - Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. arXiv
**2016**, arXiv:1602.04938. [Google Scholar] - Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774. [Google Scholar]
- Carlsson, L.S.; Samuelsson, P.B.; Jönsson, P.G. Interpretable Machine Learning—Tools to Interpret the Predictions of a Machine Learning Model Predicting the Electrical Energy Consumption of an Electric Arc Furnace. Steel Research International. 2000053. Available online: http://xxx.lanl.gov/abs/https://onlinelibrary.wiley.com/doi/pdf/10.1002/srin.202000053 (accessed on 25 May 2020).
- Offroy, M.; Duponchel, L. Topological data analysis: A promising big data exploration tool in biology, analytical chemistry and physical chemistry. Anal. Chim. Acta
**2016**, 910, 1–11. [Google Scholar] [CrossRef] [PubMed] - Singh, G.; Mémoli, F.; Carlsson, G. Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. Available online: https://research.math.osu.edu/tgda/mapperPBG.pdf (accessed on 25 May 2020).
- Carlsson, G. Topology and data. Am. Math. Soc.
**2009**, 46, 255–308. [Google Scholar] [CrossRef][Green Version] - Müllner, D.; Babu, A. Python Mapper: An Open-Source Toolchain for Data Exploration, Analysis and Visualization. Available online: http://danifold.net/Mapper (accessed on 10 September 2018).
- Saul, N.; van Veen, H.J. MLWave/Kepler-Mapper: 186f (Version 1.0.1); Zenodo: Geneva, Switzerland, 2017. [Google Scholar] [CrossRef]
- Pearson, P.; Muellner, D.; Singh, G. TDAMapper: Analyze High-Dimensional Data Using Discrete Morse Theory; CRAN: Vienna, Austria, 2015. [Google Scholar]
- Edwards, A.W.; Cavalli-Sforza, L.L. A method for cluster analysis. Biometrics
**1965**, 21, 362–375. [Google Scholar] [CrossRef] - Murtagh, F.; Contreras, P. Algorithms for hierarchical clustering: An overview. Wiley Interdiscip. Rev. Data Min. Knowl. Discov.
**2012**, 2, 86–97. [Google Scholar] [CrossRef] - Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp.
**2008**, 2008, P10008. [Google Scholar] [CrossRef][Green Version] - Sexton, H.; Kloke, J. Systems and Methods for Capture of Relationships Within Information. U.S. Patent 10,042,959, 7 August 2018. [Google Scholar]
- Gelman, A.; Tuerlinckx, F. Type S error rates for classical and Bayesian single and multiple comparison procedures. Comput. Stat.
**2000**, 15, 373–390. [Google Scholar] [CrossRef] - Dodge, Y. The Concise Encyclopedia of Statistics; Springer: New York, NY, USA, 2009; pp. 283–287. ISBN 978-0-387-32833-1. [Google Scholar]
- Carlsson, L.S.; Samuelsson, P.B.; Jönsson, P.G. Using Statistical Modeling to Predict the Electrical Energy Consumption of an Electric Arc Furnace Producing Stainless Steel. Metals
**2020**, 10, 36. [Google Scholar] [CrossRef][Green Version] - Pratt, J.; Gibbons, J. Concepts of Nonparametric Theory; Springer: New York, NY, USA, 1981; pp. 318–344. ISBN 978-1-4612-5931-2. [Google Scholar]
- LeCun, Y.; Cortes, C.; Burges, C. The MNIST Dataset of Handwritten Digits (Images); NYU: New York, NY, USA, 1999. [Google Scholar]
- Mu, N.; Gilmer, J. MNIST-C: A robustness benchmark for computer vision. arXiv
**2019**, arXiv:1906.02337. [Google Scholar] - World Steel Association. Steel Statistical Yearbook 2018. Available online: https://www.worldsteel.org/steel-by-topic/statistics/steel-statistical-yearbook.html (accessed on 29 April 2020).
- Kirschen, M.; Badr, K.P.H. Influence of Direct Reduced Iron on the Energy Balance of the Electric Arc Furnace in Steel Industry. Energy
**2011**, 36, 6146–6155. [Google Scholar] [CrossRef] - Sandberg, E. Energy and Scrap Optimisation of Electric Arc Furnaces by Statistical Analysis of Process Data. Ph.D. Thesis, Luleå University of Technology, Luleå, Sweden, 2005. [Google Scholar]
- Pfeifer, H.; Kirschen, M. Thermodynamic analysis of EAF electrical energy demand. In Proceedings of the European Electric Steelmaking Conference, Venice, Italy, 26–29 May 2002; Volume 7. [Google Scholar]
- Steinparzer, T.; Haider, M.Z.F.E.G.M.H.A. Electric Arc Furnace Off-Gas Heat Recovery and Experience with a Testing Plant. Steel Res. Int.
**2014**, 85, 519–526. [Google Scholar] [CrossRef] - Keplinger, T.; Haider, M.S.T.T.P.P.A.H.M. Modeling, Simulation, and Validation with Measurements of a Heat Recovery Hot Gas Cooling Line for Electric Arc Furnaces. Steel Res. Int.
**2018**, 89, 1800009. [Google Scholar] [CrossRef][Green Version] - Carlsson, L.S.; Samuelsson, P.B.; Jönsson, P.G. Predicting the Electrical Energy Consumption of Electric Arc Furnaces Using Statistical Modeling. Metals
**2019**, 9, 59. [Google Scholar] [CrossRef][Green Version] - Zeiler, M.D. ADADELTA: An Adaptive Learning Rate Method. arXiv
**2012**, arXiv:1212.5701. [Google Scholar] - Vejdemo-Johansson, M.; Carlsson, G.; Carlsson, L. Supplementary Material for Fibres of Failure; Figshare: Boston, MA, USA, 2018. [Google Scholar] [CrossRef]
- Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv
**2013**, arXiv:1312.6034. [Google Scholar] - Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv
**2014**, arXiv:1412.6980. [Google Scholar] - Anaconda Distribution for Python. Available online: https://www.anaconda.com/products/individual (accessed on 25 May 2020).
- Ayasdi Python SDK Documentation Suite. Available online: https://platform.ayasdi.com/sdkdocs/ (accessed on 10 September 2018).

**Figure 1.** Frameworks TDA inherits from topology. (**a**) Coordinate invariance. The dataset has been rotated approximately 100 degrees clockwise. (**b**) Deformation invariance. The dataset has been stretched along the y = x line. (**c**) Compression of a dataset into 4 nodes and 3 edges.

**Figure 2.** Illustration of Mapper for the case when $k=1$; i.e., $f:X\to {\mathbb{R}}^{1}$. Refer to the descriptions of the Mapper algorithm in Section 2.2 for each of the steps (**a**–**d**).

**Figure 3.** The two-sample Kolmogorov–Smirnov (KS) test illustrated for the random variables X and Y, where $X\sim Norm(200,25)$ and $Y\sim Norm(200,35)$ [66]. **Left:** The cumulative distribution functions (CDFs) of X and Y. ${D}_{{n}_{1},{n}_{2}}$, calculated using Equation (5), is shown as the difference between the upper and lower dashed lines; 100 samples were drawn from each distribution. **Right:** The probability density functions of X and Y.

**Figure 4.** The topology for the CNN model. The numbers display the dimensions of the layers in the model. The abbreviations, such as Conv2D, describe the specific transformations performed between layers in the model. The activation function for the classification layer was “softmax”, and for the other layers it was “ReLU”. The optimizer used was “Adadelta” [77].

**Figure 5.** An illustration of how all datasets relate to the steps in FiFa for the MNIST model case. The asterisk (*) emphasizes that the 5 Mapper graphs and the corresponding LR correction classifiers were created using 5 folds of a randomized combination of the MNIST-test and C-MNIST-test datasets. These datasets are shown at the bottom of the figure. The names in parentheses are the names of the datasets as they are referred to in the text and subsequent figures and tables.

**Figure 6.** One of the five Mapper graphs created from the activations of the CNN model on four of the five folds, i.e., 16,000 images (5-fold-training), as explained in Section 2.8.1. The graph is colored by the probability of predicting the ground truth digit; the colorbar indicates the values of the coloring. The circled nodes and edges are groups 30, 40, 47, and 50. The other four Mapper graphs are shown in the Supplementary Material [78].

**Figure 7.**The failure modes for a ground truth of five. We see the distributions of predictions for the three failure modes: only group 30 attaches any significant likelihood to the digit 5 at all, while all three favor eight. For group 40, the digits two and three are also commonly suggested; this happens somewhat more rarely in groups 30 and 47.

**Figure 8.** Examples of noisy images and saliency maps for activations in the penultimate dense layer for the three main failure modes identified for noisy 5s. The two leftmost images were chosen as the clearest saliency maps with respect to digits; the two rightmost were selected for their unclear/noisy saliency maps. All saliency maps are from images classified as members of the respective failure mode group. All saliency maps can be found in our Supplementary Material [78].

**Figure 9.** The Mapper graph. Groups 1247, 1248, 1250, and 1252 have highlighted nodes and are marked in the figure. The colorbar represents the values of $\Delta {E}_{El}$, with the corresponding coloring in the graph; the values are in kWh/heat. The line between the adjacent groups 1248 and 1252 is present for interpretability purposes. Number of data points and $\Delta {E}_{El}$ for each group; **group 1247:** 165, 2160 kWh/heat. **Group 1248:** 202, −2740 kWh/heat. **Group 1250:** 213, 2350 kWh/heat. **Group 1252:** 355, −2970 kWh/heat.

**Figure 10.** Distribution plots for the 5 variables with the highest (**left**) and lowest (**right**) KS-values, respectively. Truncated distributions (values within the 1000-quantiles) for the group (black) and the rest (gray) are shown. The dashed lines indicate the means of the respective distributions. The values in parentheses show the KS-values. Number of data points and $\Delta {E}_{El}$ for each group; **group 1247:** 165, 2160 kWh/heat. **Group 1248:** 202, −2740 kWh/heat. **Group 1250:** 213, 2350 kWh/heat. **Group 1252:** 355, −2970 kWh/heat.

| | Energy Factor | Percentage |
|---|---|---|
| In | Electric | 40–66% |
| | Oxidation | 20–50% |
| | Burner/fuel | 2–11% |
| Out | Liquid steel | 45–60% |
| | Slag and dust | 4–10% |
| | Off-gas | 11–35% |
| | Cooling | 8–29% |
| | Radiation and electrical losses | 2–6% |

| Variable(s) | Description | Unit |
|---|---|---|
| Total Weight | Total input weight of charged baskets | kg |
| Raw material types | Total input weight of each of 7 raw material categories | kg |
| Additive Propane | Total input of propane through burners | Nm${}^{3}$ |
| Additive O2 Burner | Total input of oxygen through burner | Nm${}^{3}$ |
| Additive O2 | Total input of oxygen through lance | Nm${}^{3}$ |
| Process Time | From the start of the heat to the end of the heat | min |
| Tap-To-Tap Time | From the end of the previous heat to the end of the current heat | min |
| All Delays | Includes all delays imposed on the heat | s |

**Table 3.** Performance of the CNN compared to the CNN with FiFa-driven improvements, both on the average of the five folds of test data (5-fold-test) and on entirely corrupted test data (C-MNIST-eval). The improvements by the classifier ensemble are for the best-performing parameters. The FiFa-driven improvement produces an 18.43 percentage point increase in accuracy on the C-MNIST-eval dataset, which consists of only corrupt MNIST images. In addition, only 0.21% of the predictions adjusted by the correction layer were of clean images.

| Dataset | 5-Fold-Test | C-MNIST-Eval |
|---|---|---|
| (Number of images) | (4000) | (10,000) |
| CNN | 69.40% | 41.14% |
| CNN+LR | 75.45% | 59.57% |

**Table 4.** The percentage of blank saliency maps for each of the five neurons with the highest absolute KS-values (compared to group 50) in the Dense-128 layer. The percentages only include saliency maps from images classified as members of the respective failure mode group. Blank saliency maps do not contribute to the subsequent layers in the network because the activation is zero. The neuron numbers in bold font are the neurons qualitatively identified as encoding digit five. We observe that the neurons encoding digit five have predominantly larger percentages of blank saliency maps. This means that digit five receives a lower probability for the share of images corresponding to the percentage of blank saliency maps.

**Group 30**

| Neuron | 24 | 33 | 81 | 89 | 124 |
|---|---|---|---|---|---|
| %Blank | 36.1% | 26.2% | 60.7% | 0% | 8.2% |

**Group 40**

| Neuron | 24 | 81 | 89 | 99 | 119 |
|---|---|---|---|---|---|
| %Blank | 82.2% | 91.8% | 0% | 4.1% | 17.9% |

**Group 47**

| Neuron | 24 | 49 | 81 | 89 | 122 |
|---|---|---|---|---|---|
| %Blank | 70.6% | 54.3% | 84.3% | 0.5% | 3.6% |
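The %Blank figures above follow from a simple count: a map is "blank" when every entry is zero, i.e., the neuron's activation is zero for that image. A minimal sketch of that count (the function name and the list-of-lists map representation are our own choices):

```python
def percent_blank(saliency_maps):
    """Percentage of saliency maps that are entirely zero. A blank map
    means the neuron contributes nothing to the subsequent layers
    for that image, because its activation is zero."""
    blank = sum(
        1 for m in saliency_maps
        if all(v == 0 for row in m for v in row)
    )
    return 100.0 * blank / len(saliency_maps)

# Tiny example: one blank map and one map with a nonzero entry -> 50%.
blank_map = [[0, 0], [0, 0]]
active_map = [[0, 1], [0, 0]]
share = percent_blank([blank_map, active_map])
```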

**Table 5.** Results before and after the adjustments on the test data using logistic regression with $C=1$. The chosen logistic regression classifier ensemble was the one that, to the greatest extent, reduced the standard deviation of the error and increased ${R}^{2}$. See Appendix C for the results from the logistic regression classifier ensembles with the other C-values.

| Type | ${\mathit{R}}^{2}$ | $\mathbf{Avg}.\mathbf{\Delta}{\mathit{E}}_{\mathbf{El}}$ | $\mathbf{Std}.\mathbf{\Delta}{\mathit{E}}_{\mathbf{El}}$ | $\mathbf{Min}.\mathbf{\Delta}{\mathit{E}}_{\mathbf{El}}$ | $\mathbf{Max}.\mathbf{\Delta}{\mathit{E}}_{\mathbf{El}}$ |
|---|---|---|---|---|---|
| Unit | - | kWh | kWh | kWh | kWh |
| Original | 0.50 | −70 | 1988 | −13,954 | 6520 |
| Adjusted | 0.56 | −81 | 1867 | −12,284 | 7123 |
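The summary columns in Table 5 are standard error statistics. A minimal sketch of how such a row could be computed, under the assumptions (ours, not stated in this excerpt) that $\Delta {E}_{El}$ is predicted minus actual energy and ${R}^{2}$ is the usual coefficient of determination:

```python
from statistics import mean, stdev

def error_stats(actual, predicted):
    """Summary statistics for the prediction error dE = predicted - actual:
    R^2 (coefficient of determination), mean, standard deviation, min, max."""
    errors = [p - a for a, p in zip(actual, predicted)]
    actual_mean = mean(actual)
    ss_res = sum(e * e for e in errors)                      # residual sum of squares
    ss_tot = sum((a - actual_mean) ** 2 for a in actual)     # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "Avg": mean(errors),
        "Std": stdev(errors),
        "Min": min(errors),
        "Max": max(errors),
    }
```

Perfect predictions yield R² = 1 with zero mean error; the "Adjusted" row's higher R² and lower Std indicate the correction layer tightened the error distribution.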

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Carlsson, L.S.; Vejdemo-Johansson, M.; Carlsson, G.; Jönsson, P.G.
Fibers of Failure: Classifying Errors in Predictive Processes. *Algorithms* **2020**, *13*, 150.
https://doi.org/10.3390/a13060150
